Free Ruby Library to Read, Write & Parse HTML Files

An open source Ruby library for handling HTML and XML documents. It supports reading, writing, modifying, and querying documents through a Ruby API.

In the world of web development, parsing and manipulating HTML and XML documents are common tasks. The Ruby programming language offers a powerful tool for handling these tasks through the Nokogiri library. Nokogiri provides an elegant and efficient way to parse and manipulate HTML and XML documents, making it an essential tool in any Ruby developer's toolbox. Whether you're a beginner or an experienced developer, the library simplifies the process of working with structured documents in Ruby.

Nokogiri is an open source Ruby gem that serves as a bridge between the Ruby programming language and the libxml2 and libxslt libraries. It allows software developers to parse and manipulate both HTML and XML documents with ease. Whether you're scraping data from websites, extracting information from XML-based APIs, or performing transformations on document structures, it simplifies these tasks. The library supports document search via XPath 1.0, XSD schema validation, XSLT transformation, DOM and SAX parsers for XML and HTML4, and a DOM parser for HTML5.

Nokogiri includes an easy-to-understand API for reading, writing, modifying, and querying documents. The library also provides full support for document encodings; to handle encoding reliably, it is recommended to set the encoding explicitly when parsing. Its ease of use, powerful selection methods, and efficient manipulation capabilities make it a strong choice for tasks like web scraping, XML processing, and HTML parsing in your Ruby projects.


Getting Started with Nokogiri

The recommended and easiest way to install Nokogiri is with RubyGems, the package manager for Ruby. Use the following command for a smooth installation.

Install Nokogiri via RubyGems

$ gem install nokogiri

You can also install it manually by downloading the latest release files directly from the GitHub repository.

Parsing HTML Documents via Ruby API

Nokogiri supports loading and parsing HTML as well as XML documents inside Ruby applications, and parsing is the cornerstone of its capabilities. The library provides two main ways to parse a document: from a file or from a string. First call Nokogiri::HTML to create a document object; you can then traverse the document using various methods and CSS or XPath selectors. The following example shows how software developers can parse and query a document using Ruby commands.

How to Parse & Query an HTML Document inside Ruby Apps?

#! /usr/bin/env ruby

require 'nokogiri'
require 'open-uri'

# Fetch and parse HTML document
doc = Nokogiri::HTML(URI.open('https://nokogiri.org/tutorials/installing_nokogiri.html'))

# Search for nodes by css
doc.css('nav ul.menu li a', 'article h2').each do |link|
  puts link.content
end

# Search for nodes by xpath
doc.xpath('//nav//ul//li/a', '//article//h2').each do |link|
  puts link.content
end

# Or mix and match
doc.search('nav ul.menu li a', '//article//h2').each do |link|
  puts link.content
end

Modifying HTML/XML Documents via Ruby

Nokogiri makes it easy to load and manipulate HTML or XML documents from Ruby code. Developers can modify a document by adding, updating, or deleting elements and attributes, which is particularly useful when you need to rewrite web pages or generate dynamic content. The following example shows how to add a new element to an HTML document inside a Ruby application.

How to Add a New Element to an HTML Document via Ruby API?

# Create a new <p> element that belongs to the parsed document
new_element = Nokogiri::XML::Node.new('p', doc)
new_element.content = 'This is a new paragraph.'

# Add the new element to the end of the document body
doc.at_css('body').add_child(new_element)

Use XPath and CSS Selectors via Ruby

Nokogiri provides a wide array of methods for finding specific elements within a document. Both XPath expressions and CSS selectors can be used to navigate and select elements in a parsed document, which makes it easy to extract specific information or manipulate the document's structure. The following example shows how to select elements using XPath as well as CSS selectors in Ruby.

How to Select Elements using XPath or CSS Selector via Ruby API?

# Select every link that has an href attribute using XPath
doc.xpath('//a[@href]').each do |link|
  puts link['href']
end

# Select the first h1 element using a CSS selector
heading = doc.at_css('h1')
puts heading.text if heading