Open Source .NET API for OCR To Process Text & Images

Open Source .NET Optical Character Recognition (OCR) API used to convert images (scanned images & PDF files) containing text into machine-readable text.

What is Tesseract for .NET?

For software and web developers seeking a robust .NET OCR API, Tesseract provides an open source solution. The engine lets you perform OCR on scanned images and convert images to text directly within C# applications. It is designed to recognize text in image files across numerous formats, including JPEG, PNG, TIFF, and BMP. This makes it a good fit for tasks like document digitization, data extraction, and automating text recognition workflows, and offers a free and accessible way to integrate OCR capabilities into any .NET project.

The strength of the Tesseract OCR C# API lies in its extensive feature set and reliability. It can convert scanned images and PDFs containing text into machine-readable text, with support for over 100 languages. The underlying engine was originally developed by Hewlett-Packard, and its development was later sponsored by Google. It provides the tools necessary for creating searchable documents, archiving content, and tuning OCR performance for specific use cases. Its customizability and long track record make it a strong choice for building scalable applications that require accurate and efficient text recognition.


Getting Started with Tesseract

The recommended way to install Tesseract is via NuGet. Use the following command in the Package Manager Console.

Install Tesseract via NuGet

 Install-Package Tesseract 

Install Tesseract via GitHub

 git clone https://github.com/charlesw/tesseract.git 

Extract Basic Text from an Image via C#

The open source C# library Tesseract enables software developers to extract text from an image inside their own .NET applications. It makes it easy to retrieve the text content of scanned documents or images for further processing or analysis. To do this, import the Tesseract namespace in your code file and create an instance of the Tesseract engine. The following example shows how to extract the text from an image and output it to the console.

How to Extract the Basic Text from Image via C# API?

using System;
using Tesseract;

namespace MyNamespace
{
    class Program
    {
        static void Main(string[] args)
        {
            // "./tessdata" must contain the trained data for the chosen language (eng.traineddata).
            using (var engine = new TesseractEngine("./tessdata", "eng", EngineMode.Default))
            // Pix.LoadFromFile avoids a dependency on System.Drawing.
            using (var image = Pix.LoadFromFile(@"C:\path\to\your\image.jpg"))
            using (var page = engine.Process(image))
            {
                // The using blocks dispose the page, image, and engine in that order.
                Console.WriteLine(page.GetText());
            }
        }
    }
}

Convert Image to Searchable PDF via C# .NET

The open source C# library Tesseract includes features for converting images to searchable PDF documents using C# code. It supports several output formats, including plain text, hOCR (HTML), PDF, invisible-text-only PDF, TSV, and ALTO. OCR accuracy depends heavily on input quality, so developers should improve the quality of the images they provide to Tesseract. The following example runs recognition on an image and prints the recognized text together with its mean confidence, which is a useful check before rendering the result to a PDF.

How to Convert Image to Searchable PDF using C# .NET?

using System;
using Tesseract;

// Placeholder path; point this at the image to recognize.
var testImagePath = @"C:\path\to\your\image.jpg";

using (var engine = new TesseractEngine(@"./tessdata", "eng", EngineMode.Default))
using (var img = Pix.LoadFromFile(testImagePath))
using (var page = engine.Process(img))
{
    var text = page.GetText();

    // Mean confidence is reported as a value between 0 and 1.
    Console.WriteLine("Mean confidence: {0}", page.GetMeanConfidence());
    Console.WriteLine("Text (GetText): \r\n{0}", text);
}
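
To go from a recognized page to an actual searchable PDF file, the wrapper exposes result renderers. The following is a minimal sketch, not a definitive implementation. It assumes the ResultRenderer.CreatePdfRenderer(outputFilename, fontDirectory, textOnly) signature found in the 4.x and later NuGet packages (older releases may differ), and it assumes that the tessdata folder also contains the pdf.ttf font file. The class name, paths, and document title are hypothetical.

```csharp
using System;
using Tesseract;

class SearchablePdfSketch
{
    static void Main()
    {
        // Hypothetical paths for illustration; adjust to your environment.
        string inputImagePath = @"C:\path\to\your\image.jpg";
        string tessDataPath = "./tessdata";   // assumed to also contain pdf.ttf
        string outputBase = "searchable";     // base name; the renderer appends ".pdf"

        using (var engine = new TesseractEngine(tessDataPath, "eng", EngineMode.Default))
        // Assumed signature: CreatePdfRenderer(outputFilename, fontDirectory, textOnly).
        using (var renderer = ResultRenderer.CreatePdfRenderer(outputBase, tessDataPath, false))
        // BeginDocument returns a disposable that finishes the document when disposed.
        using (renderer.BeginDocument("OCR result"))
        using (var img = Pix.LoadFromFile(inputImagePath))
        using (var page = engine.Process(img, inputImagePath))
        {
            // AddPage writes the page image with the recognized text as a hidden layer.
            bool added = renderer.AddPage(page);
            Console.WriteLine(added
                ? "Page written to " + outputBase + ".pdf"
                : "Page could not be added.");
        }
    }
}
```

Passing true as the last argument would request the invisible-text-only PDF variant mentioned above. For multi-page documents, the same renderer can receive several AddPage calls between BeginDocument and its disposal.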