Free Python Library to Create & Manage Large ZIP-Archives

A powerful, fast open source Python library that enables developers to load, extract, and parse data from more than 30 archive, compression, and file-system formats.

Binary blobs are files that contain data in an unknown or proprietary format. They are often found in firmware images, memory dumps, network packets, or encrypted archives. Analyzing binary blobs can be challenging, as they may contain multiple layers of compression, encryption, or file systems that obscure the actual content. Unblob was originally developed and is currently maintained by ONEKEY, and it is used in production in the ONEKEY analysis platform. The library can handle various formats such as ZIP, TAR, GZIP, BZIP2, XZ, LZMA, SquashFS, CramFS, JFFS2, UBI/UBIFS, YAFFS2, and many more.

Unblob is a Python library and a command-line tool that aims to make the extraction of binary blobs easier and faster. It can scan unknown binary blobs for more than 30 different archive, compression, and file-system formats, extract their content recursively, and carve out unknown chunks that have not been accounted for. It is free to use, licensed under the MIT license, and can be used as a Python library or a standalone tool.
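
To make the idea of "unknown chunks that have not been accounted for" concrete, the following is a minimal sketch, not Unblob's own code. It assumes the recognized chunks are available as (start, end) byte offsets with an exclusive end, and the function name unknown_ranges is hypothetical. It returns the regions of the file that no recognized chunk covers.

```python
from typing import List, Tuple

Range = Tuple[int, int]  # (start, end) byte offsets, end exclusive


def unknown_ranges(known: List[Range], file_size: int) -> List[Range]:
    """Return the byte ranges of a file not covered by any known chunk."""
    gaps: List[Range] = []
    cursor = 0  # first byte not yet covered by a known chunk
    for start, end in sorted(known):
        if start > cursor:
            gaps.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < file_size:
        gaps.append((cursor, file_size))
    return gaps


if __name__ == "__main__":
    # Hypothetical layout: two recognized chunks inside a 100-byte blob.
    print(unknown_ranges([(10, 40), (60, 90)], 100))
```

Sorting the known ranges first lets a single pass handle overlapping or out-of-order chunks.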

Unblob is very easy to handle and exposes an API that can be used to write custom format handlers and extractors in no time. It is blazing fast, as it uses multi-processing by default, efficient code, memory-mapped files, and Hyperscan as a high-performance matching library. It also supports plugins that can be loaded dynamically to extend its functionality. The library identifies chunk start and end offsets using battle-tested rules and format standards, minimizing false positives and overlapping chunks.
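
As an illustration of how chunk start offsets can be located in a memory-mapped file, here is a simplified sketch using only the Python standard library. It is an assumption-laden stand-in: it uses plain byte searches instead of Hyperscan, the signature table is a small illustrative subset, and find_candidates is a hypothetical name rather than part of Unblob.

```python
import mmap
from pathlib import Path
from typing import Dict, List, Tuple

# A few well-known magic byte sequences (illustrative subset only).
SIGNATURES: Dict[str, bytes] = {
    "zip": b"PK\x03\x04",
    "gzip": b"\x1f\x8b\x08",
    "xz": b"\xfd7zXZ\x00",
    "bzip2": b"BZh",
}


def find_candidates(path: Path) -> List[Tuple[int, str]]:
    """Return sorted (offset, format) pairs where a signature occurs."""
    hits: List[Tuple[int, str]] = []
    if path.stat().st_size == 0:
        return hits  # mmap cannot map an empty file
    with open(path, "rb") as fh:
        with mmap.mmap(fh.fileno(), 0, access=mmap.ACCESS_READ) as data:
            for name, magic in SIGNATURES.items():
                pos = data.find(magic)
                while pos != -1:
                    hits.append((pos, name))
                    pos = data.find(magic, pos + 1)
    return sorted(hits)
```

A signature match is only a candidate. Short signatures such as the bzip2 one match by chance in random data, which is why a real handler validates the header that follows before accepting a chunk.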


Getting Started with Unblob

The recommended way to install Unblob is from PyPI (pypi.org). Please use the following command for a smooth installation.

Install Unblob Library via PyPI

 pip3 install unblob 

Clone the unblob repository from GitHub

git clone https://github.com/onekey-sec/unblob.git 
It is also possible to install Unblob manually by downloading the latest release files directly from the GitHub repository.
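
After installing, it can be useful to confirm from Python that the package is visible to the interpreter. The sketch below uses the standard importlib.metadata module (Python 3.8 or later); the helper name installed_version is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("unblob")
    print(f"unblob {found}" if found else "unblob is not installed")
```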

Extract Recognized Formats using Python Library

The open source Unblob library supports extracting content from more than 30 archive, compression, and file-system formats inside Python applications. Please remember that all external extractors need to be installed in order to use the library with every supported format. Users can customize the behavior of the library with options such as the output directory, the recursion depth, the verbosity level, or the report format. Developers can use the extract_chunk method to extract the content of each chunk to a specified output directory. Below is a sample code snippet that shows how to use the library to extract content to an output directory inside Python applications.

How to Extract Content to an Output Directory using Python Library?

import unblob

# Scan a file for known formats
file_path = "some_file.bin"
chunks = unblob.scan_file(file_path)

# Extract each chunk to an output directory
output_dir = "some_dir"
for chunk in chunks:
    unblob.extract_chunk(chunk, output_dir)
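
Before a chunk can be unpacked, its bytes have to be cut out of the surrounding blob. The following standard-library sketch shows one way that carving step might look. It is not Unblob's implementation: the function name carve and the "start-end.bin" naming scheme are assumptions made for illustration.

```python
from pathlib import Path


def carve(blob: Path, start: int, end: int, out_dir: Path) -> Path:
    """Copy bytes [start, end) of blob into out_dir and return the new path."""
    if not 0 <= start < end:
        raise ValueError("expected 0 <= start < end")
    out_dir.mkdir(parents=True, exist_ok=True)
    target = out_dir / f"{start}-{end}.bin"  # hypothetical naming scheme
    with open(blob, "rb") as src, open(target, "wb") as dst:
        src.seek(start)
        remaining = end - start
        while remaining > 0:
            # Copy in blocks of at most 1 MiB to keep memory use bounded.
            block = src.read(min(remaining, 1 << 20))
            if not block:
                break  # blob is shorter than the requested range
            dst.write(block)
            remaining -= len(block)
    return target
```

Copying in bounded blocks keeps memory use flat even when a chunk is several gigabytes in size.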

Perform ELF File Analysis via Python API

The open source Unblob library makes it easy for software developers to detect and extract ELF files from unknown binary blobs, as well as identify their capabilities and features. The library supports both 32-bit and 64-bit ELF files, as well as different architectures such as x86, x86_64, ARM, MIPS, and PowerPC. It can also handle different types of ELF files, such as executables, shared libraries, object files, core dumps, and kernel modules.
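
To show where the class, type, and architecture information of an ELF file comes from, here is a small sketch that decodes the first 20 bytes of an ELF header with the standard struct module. It is independent of Unblob; the function name describe_elf is hypothetical, and the lookup tables cover only a handful of the values defined by the ELF specification.

```python
import struct
from typing import Dict, Optional

ELF_MAGIC = b"\x7fELF"
ELF_TYPES = {1: "relocatable object", 2: "executable", 3: "shared object", 4: "core dump"}
ELF_MACHINES = {3: "x86", 8: "MIPS", 20: "PowerPC", 40: "ARM", 62: "x86_64", 183: "AArch64"}


def describe_elf(header: bytes) -> Optional[Dict[str, str]]:
    """Decode the leading fields of an ELF header, or return None if not ELF."""
    if len(header) < 20 or header[:4] != ELF_MAGIC:
        return None
    ei_class, ei_data = header[4], header[5]
    if ei_class not in (1, 2) or ei_data not in (1, 2):
        return None
    endian = "<" if ei_data == 1 else ">"
    # e_type and e_machine are 16-bit fields right after the 16-byte e_ident.
    e_type, e_machine = struct.unpack_from(endian + "HH", header, 16)
    return {
        "class": "32-bit" if ei_class == 1 else "64-bit",
        "byte_order": "little-endian" if ei_data == 1 else "big-endian",
        "type": ELF_TYPES.get(e_type, f"unknown ({e_type})"),
        "machine": ELF_MACHINES.get(e_machine, f"unknown ({e_machine})"),
    }
```

Kernel modules are stored as relocatable objects, so they would be reported with that type here.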

Metadata Extraction via Python API

The open source Unblob library can generate a metadata file describing what it extracted. Users can pass the --report option when running Unblob as a command-line tool. This option generates a JSON file that contains information about the extracted files, such as their path, size, type, magic, entropy, and chunk details. Users can specify the name and location of the JSON file as an argument to the --report option, and can then view or analyze the resulting file (for example, metadata.json) using any JSON viewer or parser.
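
The sketch below shows one way to work with such a report from Python. Only the --report option comes from the description above; the exact layout of the JSON file is not assumed, so the helper walks the document generically and yields every object that contains a given key. The names build_command and find_records are hypothetical.

```python
import json
from pathlib import Path
from typing import Any, Iterator, List


def build_command(blob: Path, report: Path) -> List[str]:
    """Assemble an unblob command line that writes a JSON report."""
    return ["unblob", "--report", str(report), str(blob)]


def find_records(node: Any, key: str) -> Iterator[dict]:
    """Yield every dict, at any nesting depth, that contains the given key."""
    if isinstance(node, dict):
        if key in node:
            yield node
        for value in node.values():
            yield from find_records(value, key)
    elif isinstance(node, list):
        for item in node:
            yield from find_records(item, key)


if __name__ == "__main__":
    report = Path("metadata.json")
    print(" ".join(build_command(Path("some_file.bin"), report)))
    # Summarize the report only if a previous run has produced it.
    if report.exists():
        data = json.loads(report.read_text())
        for record in find_records(data, "path"):
            print(record["path"], record.get("size"))
```

Walking the structure generically keeps the script usable if the nesting of the report differs between Unblob versions.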