Free Python Library to Extract Data from Archives & Binary Blobs
A fast, open source Python library that enables Python developers to load, extract, and parse data from more than 30 archive, compression, and file-system formats
What is Unblob Library?
Unblob is a Python library and command-line tool designed to simplify and accelerate extraction from binary blobs, that is, files containing data in unknown or proprietary formats commonly found in firmware images, memory dumps, network packets, or encrypted archives. Originally developed and maintained by ONEKEY for use in their production analysis platform, Unblob parses these complex files by supporting more than 30 archive, compression, and file-system formats. It processes formats such as ZIP, TAR, GZIP, BZIP2, XZ, LZMA, SquashFS, CramFS, JFFS2, UBI/UBIFS, and YAFFS2, recursively extracting their content and carving out any chunks it cannot identify.
Free to use under the MIT license, Unblob is both highly efficient and incredibly versatile. It delivers blazing-fast performance by leveraging multi-processing, efficient code, memory-mapped files, and the Hyperscan high-performance matching library. The tool minimizes false positives and overlapping chunks by identifying start and end offsets using battle-tested rules and format standards. Additionally, Unblob features an easy-to-handle API that allows users to write custom format handlers and extractors quickly, and it supports dynamically loaded plugins to extend its core functionality. This makes Unblob an indispensable asset for anyone needing to analyze complex, multi-layered binary data containing compression, encryption, or embedded file systems.
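The idea of working with start and end offsets, discarding overlapping chunks, and carving out whatever remains unidentified can be illustrated with a small, self-contained sketch. This is not Unblob's implementation; the `Chunk` tuple and both function names are hypothetical, and end offsets are assumed to be exclusive.

```python
from typing import List, Tuple

# Hypothetical representation: (start_offset, end_offset), end exclusive.
Chunk = Tuple[int, int]


def drop_nested(chunks: List[Chunk]) -> List[Chunk]:
    """Keep only outermost chunks; discard any chunk fully inside another."""
    # Sort by start ascending, then by end descending, so that an enclosing
    # chunk is always visited before the chunks it contains.
    ordered = sorted(chunks, key=lambda c: (c[0], -c[1]))
    kept: List[Chunk] = []
    for start, end in ordered:
        if kept and end <= kept[-1][1]:
            continue  # lies entirely inside the previously kept chunk
        kept.append((start, end))
    return kept


def unknown_gaps(chunks: List[Chunk], file_size: int) -> List[Chunk]:
    """Return the byte ranges of the file not covered by any known chunk."""
    gaps: List[Chunk] = []
    cursor = 0
    for start, end in drop_nested(chunks):
        if start > cursor:
            gaps.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < file_size:
        gaps.append((cursor, file_size))
    return gaps
```

For a 100-byte file with known chunks at (10, 50), (20, 30), and (60, 80), the nested chunk (20, 30) is dropped and the uncovered ranges are (0, 10), (50, 60), and (80, 100).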
Getting Started with Unblob
The recommended way to install Unblob is from PyPI (pypi.org) using pip. Use the following command for a smooth installation.
Install Unblob Library via PyPI
pip3 install unblob
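After installation, a quick check along the following lines can confirm that the package and the command-line tool are both discoverable. It is a minimal sketch using only the standard library; the function name `unblob_install_status` is hypothetical, and it assumes the package and the executable are both named `unblob`, as the pip command above suggests.

```python
import importlib.util
import shutil


def unblob_install_status() -> dict:
    """Report whether the unblob package and CLI are discoverable."""
    return {
        # True when "import unblob" would find a package in this environment.
        "python_package": importlib.util.find_spec("unblob") is not None,
        # True when an "unblob" executable is on the current PATH.
        "cli_on_path": shutil.which("unblob") is not None,
    }


if __name__ == "__main__":
    print(unblob_install_status())
```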
Clone the unblob repository from GitHub
git clone https://github.com/onekey-sec/unblob.git
It is also possible to install it manually; download the latest release files directly from the GitHub repository.
Extract Recognized Formats using Python Library
The open source Unblob library supports extracting content from many file formats inside Python applications, covering more than 30 archive, compression, and file-system formats. To use the library with every supported format, all external extractors need to be installed. Various options customize the behavior of the library, such as the output directory, the recursion depth, the verbosity level, and the report format. In the sample below, the extract_chunk method extracts the content of each chunk to a specified output directory. The snippet shows how to extract content to an output directory inside Python applications.
How to Extract Content to an Output Directory using Python Library?
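import unblob

# Scan a file for known formats.
# Note: scan_file and extract_chunk are the names used in this article;
# confirm that they exist in your installed unblob version before relying on them.
file_path = "some_file.bin"
chunks = unblob.scan_file(file_path)

# Extract each chunk to an output directory
output_dir = "some_dir"
for chunk in chunks:
    unblob.extract_chunk(chunk, output_dir)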
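The customization options mentioned above (output directory, recursion depth, verbosity, report file) are also exposed by the command-line tool, so one option is to drive it from Python. The sketch below builds the argument list and leaves execution to a separate helper. The helper names `build_unblob_command` and `run_unblob` are hypothetical, and the flag names `--extract-dir`, `--depth`, `--report`, and `--verbose` reflect my understanding of the CLI; confirm them with `unblob --help` for your installed version.

```python
import subprocess
from typing import List, Optional


def build_unblob_command(
    input_file: str,
    extract_dir: str,
    depth: Optional[int] = None,
    report: Optional[str] = None,
    verbose: bool = False,
) -> List[str]:
    """Assemble an unblob command line (flag names are assumptions, see above)."""
    cmd = ["unblob", "--extract-dir", extract_dir]
    if depth is not None:
        cmd += ["--depth", str(depth)]  # limit recursion depth
    if report is not None:
        cmd += ["--report", report]  # write a JSON report to this path
    if verbose:
        cmd.append("--verbose")
    cmd.append(input_file)  # the file to analyze goes last
    return cmd


def run_unblob(cmd: List[str]) -> int:
    """Run the command and return its exit code (requires unblob installed)."""
    return subprocess.run(cmd, check=False).returncode
```

Building the command separately from running it keeps the argument logic easy to inspect and test on a machine where Unblob is not installed.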
Perform ELF File Analysis via Python API
The open source Unblob library makes it easy for software developers to detect and extract ELF files from unknown binary blobs, as well as identify their capabilities and features. The library supports both 32-bit and 64-bit ELF files, as well as different architectures such as x86, x86_64, ARM, MIPS, and PowerPC. It can also handle different types of ELF files, such as executables, shared libraries, object files, core dumps, and kernel modules.
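To show what distinguishing 32-bit from 64-bit ELF files, architectures, and file types involves, here is a minimal sketch that reads those fields from the first bytes of an ELF header using the standard `struct` module. It is an illustration of the ELF format itself, not Unblob's handler code; `describe_elf_header` and the lookup tables are hypothetical names, and the tables cover only a few common values.

```python
import struct

ELF_CLASSES = {1: "32-bit", 2: "64-bit"}
ELF_ENDIANNESS = {1: "little", 2: "big"}
ELF_TYPES = {1: "relocatable object", 2: "executable", 3: "shared object", 4: "core dump"}
ELF_MACHINES = {3: "x86", 8: "MIPS", 20: "PowerPC", 40: "ARM", 62: "x86_64", 183: "AArch64"}


def describe_elf_header(data: bytes) -> dict:
    """Decode class, byte order, type, and machine from an ELF header."""
    if len(data) < 20 or data[:4] != b"\x7fELF":
        raise ValueError("not an ELF header")
    ei_class, ei_data = data[4], data[5]
    if ei_data not in ELF_ENDIANNESS:
        raise ValueError("invalid ELF data encoding")
    byte_order = "<" if ei_data == 1 else ">"
    # e_type and e_machine are 16-bit fields at offsets 16 and 18.
    e_type, e_machine = struct.unpack_from(byte_order + "HH", data, 16)
    return {
        "class": ELF_CLASSES.get(ei_class, "unknown"),
        "endianness": ELF_ENDIANNESS[ei_data],
        "type": ELF_TYPES.get(e_type, "other"),
        "machine": ELF_MACHINES.get(e_machine, "other"),
    }
```

Kernel modules are relocatable objects in the ELF sense, so a sketch like this would report them as "relocatable object".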
Metadata Extraction via Python API
The open source Unblob library can generate a metadata file describing what it extracted. When running Unblob as a command-line tool, pass the --report option followed by the name and location of the JSON file to write, for example metadata.json. The JSON file contains information about the extracted files, such as their path, size, type, magic, entropy, and chunk details. Users can then view or analyze the file using any JSON viewer or parser.
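Once a report exists, it can be summarized with a few lines of Python. Because the exact report schema can differ between Unblob versions, the sketch below makes no assumption about nesting: it walks the whole JSON structure and counts the values found under a key you choose. The function names are hypothetical, and the key `"magic"` in the usage note is an assumption based on the field names listed above; inspect your own report to pick a suitable key.

```python
import json
from collections import Counter
from typing import Any, Iterator


def iter_dicts(node: Any) -> Iterator[dict]:
    """Yield every JSON object found anywhere inside the given structure."""
    if isinstance(node, dict):
        yield node
        for value in node.values():
            yield from iter_dicts(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_dicts(item)


def count_by_key(report: Any, key: str) -> Counter:
    """Count the values stored under `key` across all objects in the report."""
    return Counter(str(d[key]) for d in iter_dicts(report) if key in d)


def load_report(path: str) -> Any:
    """Load a JSON report file from disk."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)
```

A call such as `count_by_key(load_report("metadata.json"), "magic")` would then tally how often each value appears under that key, assuming the report contains it.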