Free C# .NET API for Parsing HTML Documents

Open Source C# .NET library that enables software developers to parse HTML documents, manipulate HTML elements, and extract relevant data

HTML parsing, web scraping, and data extraction are essential but often challenging tasks for web and software engineers. The Html Agility Pack (HAP) library makes them considerably easier. HAP is a powerful open-source library that simplifies the process of parsing, manipulating, and querying HTML documents, making it a valuable asset for web developers and data enthusiasts alike.

The Html Agility Pack is an open-source library for .NET that allows developers to parse HTML documents easily. It provides a convenient object model and a robust set of APIs to navigate and manipulate HTML elements programmatically. Whether you need to extract data from websites, scrape information, or perform any other HTML-related task, HAP comes to the rescue with its intuitive interface and extensive functionality. The HAP library can be easily integrated into your .NET applications using NuGet. Simply install the package and start using its features in your code.

Using the Html Agility Pack (HAP) library, software developers can interact with HTML elements through a simple and intuitive object model. Elements can be selected, modified, and queried using familiar syntax such as XPath, which makes it straightforward to navigate and manipulate HTML documents programmatically. By hiding much of the complexity of working with HTML documents, HAP lets software developers focus on extracting meaningful data and building robust applications.

At A Glance

An overview of Html Agility Pack features.

Features Overview

Robust HTML Parsing
Manipulate HTML Files
Render Office documents
Extract Images from HTML
Open HTML
Read HTML
Modify HTML Files
HTML rendering
HTML Viewer
HTML to PDF
Extract TOC
Extract plain text

Html Agility Pack

Html Agility Pack supports the HTML file format as well as industry-standard formats for export.

Reader

HTML

Writer

TXT, HTML, PDF

Platform Independence

Html Agility Pack supports .NET Core >= 1.0 and .NET Framework >= 4.6

.NET Framework >= 4.6
.NET Standard >= 1.3

Getting Started with Html Agility Pack

The recommended way to install Html Agility Pack (HAP) is using NuGet. Use the following command for a smooth installation.

Install Html Agility Pack via NuGet

NuGet\Install-Package HtmlAgilityPack -Version 1.11.46

You can also install it manually by downloading the latest release files directly from the GitHub repository.

Robust HTML Parsing via C# API

The open source Html Agility Pack (HAP) library includes very useful features for loading and parsing HTML inside C# applications. The HAP library is designed to handle malformed HTML and can parse even very complex HTML documents. It performs automatic tag balancing, supports self-closing tags, and adjusts to tag soup situations. There are various ways to load and parse HTML, such as from a file, a string, the web, or a browser. The following code shows several ways of loading and parsing HTML inside .NET applications.

How to Load and Parse files inside .NET applications via C# Library?

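using HtmlAgilityPack;

// From File (filePath is the path to a local HTML file)
var docFromFile = new HtmlDocument();
docFromFile.Load(filePath);

// From String (html holds the markup to parse)
var docFromString = new HtmlDocument();
docFromString.LoadHtml(html);

// From Web (downloads the page, then parses it)
var url = "http://html-agility-pack.net/";
var web = new HtmlWeb();
var docFromWeb = web.Load(url);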
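
The tolerance for malformed markup described above can be illustrated with a small sketch. It assumes the HtmlAgilityPack NuGet package is referenced; the class name MalformedHtmlSketch, the Parse helper, and the sample markup are made up for illustration. The sketch loads a fragment with unclosed tags, lists whatever problems the parser recorded in ParseErrors, and prints the resulting tree. The exact errors reported and the exact shape of the repaired markup depend on the library version, so no output is shown here.

```csharp
using System;
using HtmlAgilityPack;

// Illustrative helper, not part of the library.
public static class MalformedHtmlSketch
{
    // Parses an HTML string and returns the resulting document.
    public static HtmlDocument Parse(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        return doc;
    }

    public static void Main()
    {
        // Sample markup (an assumption): <b> and <p> are never closed.
        var doc = Parse("<div><b>bold text</div><p>next paragraph");

        // ParseErrors lists the problems the parser noticed while loading.
        foreach (var error in doc.ParseErrors)
        {
            Console.WriteLine($"Line {error.Line}: {error.Reason}");
        }

        // The document is still usable and can be queried or serialized.
        Console.WriteLine(doc.DocumentNode.OuterHtml);
    }
}
```

Checking ParseErrors after loading is optional, but it can be a useful signal when scraped pages are expected to be well formed.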

Manipulate HTML Documents via C# API

The free Html Agility Pack (HAP) library includes very powerful features for working with HTML documents and elements inside C# applications. HAP allows you to modify the HTML structure by adding, modifying, or removing elements. The library provides several important operations, such as creating a duplicate of a node, inserting a node immediately before or after a reference node, removing all the children of a node, adding a node to the end of the list of children, creating an HTML node from a string of literal HTML, and many more. You can update attributes, change text content, or even clone elements as per your requirements. The following example shows how to manipulate HTML documents using C# code.

Load and Manipulate HTML Documents via .NET API

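using HtmlAgilityPack;

var doc = new HtmlDocument();
doc.LoadHtml(html); // html holds the markup to parse

// InnerHtml: the markup between the node's start and end tags
var innerHtml = doc.DocumentNode.InnerHtml;

// InnerText: the text content with the tags stripped
var innerText = doc.DocumentNode.InnerText;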
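
The structural operations listed above (creating a node from literal HTML, appending, updating attributes, removing, and cloning) might be combined as in the following sketch. It assumes the HtmlAgilityPack NuGet package is referenced; the class name ManipulationSketch, the Run helper, and the sample list markup are hypothetical and chosen only for illustration.

```csharp
using System;
using HtmlAgilityPack;

// Illustrative helper, not part of the library.
public static class ManipulationSketch
{
    // Applies a few edits to a sample list and returns the resulting markup.
    public static string Run()
    {
        var doc = new HtmlDocument();
        // Sample markup (an assumption made for this sketch).
        doc.LoadHtml("<ul id='list'><li>first</li><li>second</li></ul>");

        // Select the list element with an XPath expression.
        var list = doc.DocumentNode.SelectSingleNode("//ul[@id='list']");

        // Create a node from literal HTML and add it to the end of the children.
        var third = HtmlNode.CreateNode("<li>third</li>");
        list.AppendChild(third);

        // Add or update an attribute on the list.
        list.SetAttributeValue("class", "items");

        // Remove the first list item from the document.
        list.SelectSingleNode("./li[1]").Remove();

        // Deep-clone the new item and append the copy as well.
        var copy = third.CloneNode(true);
        list.AppendChild(copy);

        return doc.DocumentNode.OuterHtml;
    }

    public static void Main()
    {
        Console.WriteLine(Run());
    }
}
```

After these edits the list no longer contains the "first" item and ends with two "third" items. Attribute quoting in the serialized markup can differ from the input, so it is safer to compare node content than raw strings.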