Go Library for Parallel Compression and Decompression
Golang API for programmatically generating and reading standard GZIP files. Compress large files by splitting them into blocks and performing compression and decompression in parallel.
pgzip is an open source library that provides parallel gzip compression and decompression for the Go language. It is most useful for large amounts of data: the input is divided into blocks, and the blocks are compressed or decompressed in parallel. The library is widely used in the Go community and lets Go apps read compressed files directly with a few lines of code.
The library is stable and lets developers programmatically generate and read standard GZIP files. To get the best out of it, compress or decompress large amounts of data (more than 1 MB at a time). Supported features include compressing files, decompressing files, and opening and reading GZIP files.
Getting Started with pgzip
The recommended way to install pgzip is from GitHub. Use the following command to install it.
Install pgzip via command
go get github.com/klauspost/pgzip/...
Compress Large Files via Go API
The open source pgzip library can compress large amounts of data with a couple of lines of Go code. The API splits a large input into smaller blocks (1 MB by default) and can process as many blocks at once as there are CPU threads. You can control both the block size and the number of blocks processed in parallel to suit your needs. To see a performance gain, compress more than 1 megabyte of data at a time.
Decompressing Files via Go API
The free pgzip library lets software developers decompress files inside their own Go applications. As with compression, decompression can be tuned by customizing the block size. You can create your own reader and specify its read-ahead: for that reader, you define the block size and the maximum number of blocks to decode ahead.
Performance Improvement
pgzip can outperform gzip when you have large amounts of data. Because pgzip compresses blocks in parallel, it has a speed advantage over single-threaded gzip compressors on multi-core machines. It is well suited to high-throughput, highly compressible material such as logs, JSON, and CSV data. One further advantage during decompression is that you can do other work while the decompression is taking place.