Free Ruby Library to Sanitize, Scrub & Parse HTML Files
Open Source HTML Library that allows Ruby Developers to Read, Sanitize, Parse, Scrub and Manipulate HTML and XML Documents inside Ruby applications.
In the world of web development, handling HTML content can be a daunting task. Ensuring that user-generated HTML is safe and properly formatted is crucial to prevent security vulnerabilities and maintain a consistent user experience. User inputs, especially in the form of HTML, can carry malicious scripts and vulnerabilities that can jeopardize the security and integrity of your web application. To mitigate this risk, developers turn to tools and libraries designed to sanitize and clean HTML input, and one such powerful library in the Ruby ecosystem is Loofah.
Loofah is a Ruby library designed for sanitizing and manipulating HTML and XML documents. It is a flexible tool that allows developers to parse, transform, and clean up HTML content with little code. Loofah is built on top of Nokogiri, the popular XML and HTML parsing library for Ruby, so the full Nokogiri API is available on the documents it produces. Important features of the library include HTML sanitization, HTML content manipulation, custom scrubbers, CSS sanitization, HTML document parsing, and more.
Loofah is designed to work seamlessly with various Ruby frameworks and libraries, including Ruby on Rails. Integrating it into your existing projects is straightforward, making it an excellent choice for web developers. Loofah is an indispensable tool for Ruby developers working with HTML content. Whether you need to sanitize user-generated input, parse web pages, or manipulate HTML documents programmatically, Loofah provides a straightforward and powerful solution. By leveraging its features, you can enhance the security and functionality of your Ruby applications while saving time and effort.
At A Glance
An overview of Loofah features.
- Sanitize HTML/XML
- Clean HTML
- Web Scraping
- Extract HTML Elements
- Parse Documents from String
- Extract Text
- Manipulate HTML Documents
- Parse HTML from URL
- Search Data
- Parse HTML
- HTML rendering
- HTML Viewer
- HTML to PDF
Loofah supports HTML file format as well as industry-standard formats for export.
Loofah is tested with the following runtimes:
- Ruby >= 2.7
- JRuby >= 22.214.171.124
Getting Started with Loofah
The recommended way to install Loofah is with the RubyGems package manager. Use the following command to install it.
Install Loofah via RubyGems
$ gem install loofah
You can also install it manually by downloading the latest release files directly from the GitHub repository.
HTML Sanitization via Ruby Library
The open source Loofah library allows software developers to load and sanitize HTML documents inside their Ruby applications. At its core, Loofah excels at HTML sanitization. It offers a range of options for removing potentially harmful elements, attributes, and protocols from HTML documents. Elements and attributes on the allowlist are kept, while everything else is stripped out, so that user-generated content is safe to render on your web pages. The following example demonstrates how to sanitize an HTML fragment with just a couple of lines of code.
How to Sanitize HTML Documents inside Ruby Applications?
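require "loofah"

# Example input: an unsafe <script> element followed by safe content
html = "<script>alert('XSS')</script><p>Safe content</p>"

# :prune removes unsafe elements together with their contents
sanitizer = Loofah.scrub_fragment(html, :prune)
cleaned_html = sanitizer.to_s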
Create Customizable Scrubbers via Ruby API
The open source Loofah library allows software developers to create customizable scrubbers that clean and format HTML content according to their needs. Loofah's scrubbing capabilities go beyond basic sanitization. For example, you can ensure that all links have the correct target attribute or add additional attributes to specific elements. The library lets you create custom scrubbers, which are classes that define how elements and attributes should be sanitized. This level of customization is invaluable when dealing with specific requirements or content structures. The following example shows how to create a custom scrubber in Ruby.
How to Create Custom Scrubber via Ruby API?
require "loofah"

# Custom scrubber example
class CustomScrubber < Loofah::Scrubber
  def initialize
    # Visit parent nodes before their children
    @direction = :top_down
  end

  # Called once for every node in the document
  def scrub(node)
    node.remove if node.text? && node.text =~ /malicious_pattern/
  end
end

document = Loofah.fragment("Some text with malicious_pattern")
document.scrub!(CustomScrubber.new)
Manipulate & Parse HTML in Ruby Apps
Loofah provides a simple and consistent interface for parsing HTML documents, allowing software developers to extract information from web pages, emails, or any other HTML source. This feature is particularly useful for web scraping and data extraction tasks. Moreover, the library enables software professionals to manipulate HTML content easily. Software developers can add, modify, or delete elements and attributes within the document. This is handy when users need to customize the appearance or structure of HTML documents programmatically.