Free Ruby Library for Parsing Large Excel XLSX Files Remotely

A powerful, open source Ruby library for parsing large Excel XLSX and XLSM files. It supports parsing remote files and images, header mapping, and other features.

What Is the Creek Library?

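In data processing, handling large Excel files can be a daunting task that often leads to performance bottlenecks and high memory consumption. Creek is an open source Ruby library designed to parse large Excel files efficiently. Its key features include stream parsing, support for more than one file type, and flexible use in both standalone scripts and Rails applications. It covers basic operations, image parsing, and remote file handling, which makes it a valuable resource for developers who want to optimize their data processing workflows.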

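Creek is a focused open source Ruby gem that provides a fast and simple way to read and parse large .xlsx and .xlsm Excel files. It uses stream parsing, which means it reads the file piece by piece instead of loading the whole file into memory. This approach keeps memory use low and makes Creek well suited to applications that handle large datasets. Whether you are working in a standalone Ruby script or a Rails application, Creek integrates easily. If your project involves large spreadsheets, images, metadata, or Rails file uploads, Creek provides much of the required functionality with minimal overhead.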


Getting Started with Creek

The recommended way to install the Creek library is through RubyGems. Use the following command to install it.

Install Creek via RubyGems

$ gem install creek

Parse Large Excel Files with Ruby

The cornerstone of the open source Creek library is its stream parsing capability. This feature lets you process large Excel files without loading the whole file into memory. By reading the file in chunks, Creek helps your application remain responsive and stable, even when handling files with hundreds of thousands of rows. The most common use case is to open a file and read data from its worksheets. Here is a simple example that demonstrates how software developers can parse an Excel file with the Ruby library.

How to Parse Large Excel XLSX Files with the Ruby Library?

require 'creek'

# Open the Excel file
creek = Creek::Book.new 'path/to/your/sample.xlsx'

# Get the first sheet
sheet = creek.sheets[0]

# Loop through rows with cell coordinates
sheet.rows.each do |row|
  puts row
  # => {"A1"=>"Content 1", "B1"=>nil, "C1"=>"Content 2"}
end

# Loop through rows without cell coordinates
sheet.simple_rows.each do |row|
  puts row
  # => {"A"=>"Content 1", "B"=>nil, "C"=>"Content 2"}
end
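
Because rows arrive one at a time, they can also be handled in fixed-size batches so that only one batch is held in memory at once. The following is a minimal sketch rather than part of Creek: `each_row_batch` is a hypothetical helper, the batch size is an arbitrary assumption, and the sample rows are stand-in data shaped like the `simple_rows` output above so the sketch runs without a spreadsheet.

```ruby
# Hypothetical helper (not part of Creek): yields rows in fixed-size batches.
# `rows` can be any Enumerable of row hashes, for example sheet.simple_rows.
def each_row_batch(rows, batch_size = 1000, &block)
  rows.each_slice(batch_size, &block)
end

# Stand-in data shaped like simple_rows output (assumption for illustration).
sample_rows = [
  { "A" => "Content 1", "B" => nil },
  { "A" => "Content 2", "B" => "x" },
  { "A" => "Content 3", "B" => "y" }
]

batch_sizes = []
each_row_batch(sample_rows, 2) do |batch|
  # Replace this with real work, such as a bulk insert of the batch.
  batch_sizes << batch.size
  puts "Processing #{batch.size} rows"
end
```

With a real workbook, the same helper could presumably be given `sheet.simple_rows` in place of `sample_rows`.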

Parse and Extract Images with the Ruby Library

Image parsing is not enabled by default, to conserve memory, but the Creek library can parse images from your Excel files. By using the with_images method, you can preload and extract images from cells. The images are returned as an array of Pathname objects, making them easy to work with. Here is a simple example that demonstrates how software developers can parse and extract images from an Excel spreadsheet with the Ruby library.

How to Parse and Extract Images from an Excel Worksheet with the Ruby Library?

require 'creek'

book = Creek::Book.new 'presentation.xlsx'
sheet = book.sheets.first

sheet.with_images.rows.each do |row|
  row.each do |coord, value|
    if value.is_a?(Array)
      # this cell has images
      puts "Images at #{coord}: #{value.inspect}"
    else
      puts "#{coord}: #{value}"
    end
  end
end

# Images at a specific cell
images = sheet.images_at('B2')
if images
  images.each do |path|
    puts "Found image file: #{path}"
  end
else
  puts "No image at B2"
end
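
If the image cells need to be gathered in one place, the rows can be folded into a lookup table keyed by cell coordinate. This is a minimal sketch under assumptions: `image_index` is a hypothetical helper that is not part of Creek, it relies on image cells appearing as arrays as in the example above, and the sample rows and file path are made-up stand-ins so the sketch runs without a workbook.

```ruby
require 'pathname'

# Hypothetical helper (not part of Creek): maps cell coordinates to image paths.
# `rows` is any Enumerable of row hashes where image cells hold an Array.
def image_index(rows)
  index = {}
  rows.each do |row|
    row.each do |coord, value|
      index[coord] = value if value.is_a?(Array)
    end
  end
  index
end

# Stand-in data shaped like the rows in the example above (assumption).
sample_rows = [
  { "A1" => "Title", "B1" => nil },
  { "A2" => "Logo",  "B2" => [Pathname.new("/tmp/image1.png")] }
]

index = image_index(sample_rows)
index.each do |coord, paths|
  puts "#{coord}: #{paths.map(&:to_s).join(', ')}"
end
```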

Parse Remote Files with the Ruby Library

Need to parse an Excel file from a URL? By setting the remote: true option, you can pass a URL straight to Creek and parse the file from a remote server, with no need to download it yourself first. Creek can also parse files from URLs or paths that do not end in .xlsx or .xlsm: pass check_file_extension: false to skip the extension check. Here is a simple example of parsing an Excel file remotely with the Ruby library.

How to Parse Excel XLSX Files Remotely with the Ruby Library?

require 'creek'

remote_url = 'http://example.com/sample.xlsx'
creek = Creek::Book.new remote_url, remote: true
# ... process the file
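
When sources can be local paths or URLs, with or without a recognised extension, the two options described above can be derived from the source string. This is a minimal sketch, not part of Creek: `creek_options_for` and `KNOWN_EXTENSIONS` are hypothetical names, and treating only http and https sources as remote is an assumption.

```ruby
require 'uri'

# Extensions named in this article as supported by Creek.
KNOWN_EXTENSIONS = %w[.xlsx .xlsm].freeze

# Hypothetical helper (not part of Creek): builds an options hash using the
# remote: and check_file_extension: options described in the text above.
def creek_options_for(source)
  uri = begin
    URI.parse(source)
  rescue URI::InvalidURIError
    nil # Not a parseable URI (for example a path with spaces): treat as local.
  end

  remote = uri.is_a?(URI::HTTP) # URI::HTTPS is a subclass of URI::HTTP
  path = remote ? uri.path : source
  known = KNOWN_EXTENSIONS.include?(File.extname(path).downcase)

  # Keep the extension check only when the extension is one Creek expects.
  { remote: remote, check_file_extension: known }
end

puts creek_options_for('http://example.com/sample.xlsx').inspect
puts creek_options_for('https://example.com/download?id=42').inspect
```

The resulting hash could then be passed as the second argument, for example `Creek::Book.new(source, creek_options_for(source))`. Skipping the check only disables extension validation, so the file itself must still be a valid XLSX or XLSM workbook.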

Parse XLSX and XLSM Files with Ruby

The open source Ruby library Creek supports both the standard XLSX format and the macro-enabled XLSM format, so you can handle both kinds of Excel workbook without needing a second library.
