Free Ruby Library to Parse Large Excel XLSX Files Remotely
A Powerful Open Source Ruby Library to Parse Large Excel XLSX and XLSM Files Remotely. It Supports Parsing Remote Files and Images, Mapping Headers, and So on.
What is Creek Library?
In the world of data processing, handling large Excel files can be a daunting task, often leading to performance bottlenecks and high memory consumption. Creek is an open-source Ruby library designed to parse large Excel files efficiently. Its key features include stream parsing, support for XLSX and XLSM files, and flexible usage in both standalone scripts and Rails applications. It also supports image parsing and remote files, making it a valuable resource for developers looking to optimize their data processing workflows.
Creek is distributed as a Ruby gem that provides a fast and simple way to read and parse large .xlsx and .xlsm files. It uses stream parsing, which means it reads the file piece by piece instead of loading the entire file into memory. This keeps memory usage low and makes Creek well suited to applications that deal with massive datasets. Whether you're working on a standalone Ruby script or a Rails application, Creek is straightforward to integrate. If your project involves large spreadsheets, images, metadata, or Rails file uploads, Creek provides much of the needed functionality with minimal overhead.
Getting Started with Creek
The recommended way to install the Creek library is through RubyGems. Use the following command to install it.
Install Creek via RubyGems
$ gem install creek
Parsing Large Excel Files via Ruby
The cornerstone of the open source Creek library is its stream parsing capability. This feature allows you to process large Excel files without exhausting memory. By reading the file in chunks, Creek helps your application remain responsive and stable, even when handling files with hundreds of thousands of rows. The most common use case is to open a file and read data from its worksheets. The following example shows how software developers can parse an Excel file with the Ruby library.
How to Parse Large Excel XLSX Files via Ruby Library?
require 'creek'

# Open the Excel file
creek = Creek::Book.new 'path/to/your/sample.xlsx'

# Get the first sheet
sheet = creek.sheets[0]

# Loop through rows with cell coordinates
sheet.rows.each do |row|
  puts row
  # => {"A1"=>"Content 1", "B1"=>nil, "C1"=>"Content 2"}
end

# Loop through rows without cell coordinates
sheet.simple_rows.each do |row|
  puts row
  # => {"A"=>"Content 1", "B"=>nil, "C"=>"Content 2"}
end
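Because rows are consumed one at a time, header mapping and batching can be layered on top without loading the whole sheet. The following is a minimal sketch, assuming that sheet.simple_rows yields hashes keyed by column letter (as in the example above) and that the first row holds the header names. The with_header_keys helper and the sample rows are hypothetical stand-ins for illustration and are not part of Creek.

```ruby
# Hypothetical helper (not part of Creek): re-keys column-letter hashes,
# such as those yielded by sheet.simple_rows, using the first row as headers.
# Returns a lazy enumerator, so rows are converted only as they are consumed.
def with_header_keys(rows)
  Enumerator.new do |out|
    headers = nil
    rows.each do |row|
      if headers.nil?
        headers = row # first row: remember the column => header mapping
        next
      end
      # Fall back to the column letter when a header cell is empty.
      out << row.to_h { |column, value| [headers[column] || column, value] }
    end
  end.lazy
end

# Stand-in for sheet.simple_rows so the sketch runs without a workbook.
sample_rows = [
  { "A" => "name",  "B" => "qty" },
  { "A" => "bolt",  "B" => 40 },
  { "A" => "nut",   "B" => 15 },
  { "A" => "screw", "B" => 7 }
].each

# Process the stream in batches of two rows.
with_header_keys(sample_rows).each_slice(2) do |batch|
  puts batch.inspect
end
```

With a real workbook, passing sheet.simple_rows in place of sample_rows should work the same way, provided it behaves as an enumerable of hashes as assumed here.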
Image Parsing & Extracting via Ruby Library
While not enabled by default to conserve memory, the Creek library can parse images from your Excel files. By using the with_images method, you can preload and extract images from cells. The images are returned as an array of Pathname objects, making them easy to work with. The following example shows how software developers can parse and extract images from an Excel spreadsheet with the Ruby library.
How to Parse and Extract Images from an Excel Worksheet via Ruby Library?
require 'creek'

book = Creek::Book.new 'presentation.xlsx'
sheet = book.sheets.first

sheet.with_images.rows.each do |row|
  row.each do |coord, value|
    if value.is_a?(Array)
      # this cell has images
      puts "Images at #{coord}: #{value.inspect}"
    else
      puts "#{coord}: #{value}"
    end
  end
end

# Images at a specific cell
images = sheet.images_at('B2')
if images
  images.each do |path|
    puts "Found image file: #{path}"
  end
else
  puts "No image at B2"
end
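Since the images arrive as Pathname objects, they can be copied to a location of your choice with the standard library. The sketch below assumes the returned paths may point to temporary files that you want to keep, which is worth verifying for your Creek version. The save_images helper and the fake image file are hypothetical and only illustrate the idea.

```ruby
require 'fileutils'
require 'pathname'
require 'tmpdir'

# Hypothetical helper (not part of Creek): copies image files, such as the
# Pathname objects returned by sheet.images_at, into dest_dir and returns
# the new locations. A nil argument (no image at the cell) yields [].
def save_images(paths, dest_dir)
  dest = Pathname.new(dest_dir)
  FileUtils.mkdir_p(dest)
  Array(paths).map do |path|
    target = dest.join(Pathname.new(path).basename)
    FileUtils.cp(path, target)
    target
  end
end

# Demo with a stand-in file so the sketch runs without a workbook.
Dir.mktmpdir do |tmp|
  fake_image = Pathname.new(tmp).join("image1.png")
  fake_image.binwrite("placeholder bytes, not a real PNG")
  saved = save_images([fake_image], File.join(tmp, "exported"))
  puts saved.map(&:to_s)
end
```

With a real workbook, a call such as save_images(sheet.images_at('B2'), 'exported_images') would follow the same pattern.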
Remote File Parsing via Ruby Library
Need to parse an Excel file from a URL? The Creek library has you covered. By setting the remote: true option, you can parse files directly from a remote server without adding a separate download step to your own code. You can also parse files from URLs or paths that do not end in .xlsx or .xlsm: pass check_file_extension: false to skip the extension check. Here is a simple example of parsing an Excel file remotely with the Ruby library.
How to Parse Excel XLSX File Remotely via Ruby Library?
require 'creek'

remote_url = 'http://example.com/sample.xlsx'
creek = Creek::Book.new remote_url, remote: true
# ... process the file
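The two options described above can be chosen per source. The following sketch builds an options hash from a path or URL. It assumes the extension check looks at the end of the string passed to Creek::Book.new, so anything not ending in .xlsx or .xlsm disables the check. The creek_options_for helper and the sample sources are hypothetical and not part of Creek.

```ruby
# Hypothetical helper (not part of Creek): picks the options described above
# (remote and check_file_extension) for a given path or URL.
def creek_options_for(source)
  options = {}
  # Treat http(s) sources as remote files.
  options[:remote] = true if source.match?(%r{\Ahttps?://}i)
  # Assumption: the check inspects the raw string, so a query string or a
  # missing extension means the check should be skipped.
  unless %w[.xlsx .xlsm].include?(File.extname(source).downcase)
    options[:check_file_extension] = false
  end
  options
end

p creek_options_for('http://example.com/sample.xlsx')
p creek_options_for('https://example.com/download?id=42')
p creek_options_for('uploads/report.tmp')
```

The resulting hash could then be passed along as Creek::Book.new(source, creek_options_for(source)). Skipping the extension check means Creek no longer rejects non-Excel input up front, so validate untrusted sources separately.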
XLSX and XLSM Files Parsing via Ruby
The open source Ruby library Creek supports both the standard XLSX and the macro-enabled XLSM file formats, providing flexibility for various use cases. This means you can handle both formats without needing multiple libraries.