Aspose.OCR for Java

Java OCR API for Optical Character Recognition

Develop applications with Optical Character Recognition (OCR) capabilities using a Java API. Recognize text from scanned documents, images, and other sources.

For software developers seeking a powerful Java OCR API, Aspose.OCR for Java stands out as a premier solution for integrating sophisticated text recognition capabilities. This robust library is designed to perform OCR on scanned images and recognize text from images across a vast array of file formats, including JPEG, PNG, BMP, TIFF, and PDF. It provides a comprehensive toolkit to convert images to text seamlessly within any Java application, making it an indispensable asset for automating data extraction from scanned documents, digitizing archives, and processing user-uploaded content. Its ease of integration allows developers to quickly add advanced Java Optical Character Recognition features without extensive overhead.

The engine's strength lies in its use of advanced algorithms that accurately recognize text from scanned documents, even when dealing with low-quality scans, smartphone photos, or files with background noise. Beyond basic extraction, this Java OCR API offers a suite of powerful features, including support for over 50 languages, the ability to recognize text from files and specific image regions, handwritten text recognition, and the creation of searchable PDFs. This functionality is crucial for enhancing data accessibility, building automated processing pipelines, and transforming static images into editable and machine-readable text, solidifying its role as a top-tier OCR solution for the Java ecosystem.

At A Glance

An overview of Aspose.OCR for Java features.

Features Overview

OCR Operations
Add OCR Capabilities
Recognize Image text
Convert images of text
Recognized Font text
Search PDF
27 Recognition Languages
Create OCR apps
Save to browser
Extract Text
Multi-threading Support

Recognize rotated Image
Pre-processing filters
PDF to Images
Recognizes Chinese characters
Detects Popular typefaces
Processes whole image
Rotated images Support
Batch Recognition
Built-in Spell Checker
Split PDF
PDF to Excel
PDF to SVG

Aspose.OCR for Java

The API mainly supports the PDF format but can export PDF documents to a number of other formats.

Reader

PDF, PDF/A, TEX, XPS, SVG

Writer

PDF, TXT, PNG, JPEG, PDF/A, DOC, DOCX, TEX, XPS, SVG, XLSX, PPTX

Aspose.OCR for Java

Platform Independence

Aspose.OCR for Java can work with any Java-based programming language.

Java Runtime

Aspose.OCR for Java

Getting Started with Aspose.OCR for Java

The recommended way to install Aspose.OCR for Java is through the Maven repository. Add the following repository configuration to your pom.xml.

Install Aspose.OCR for Java via Maven Repository

<repositories>
    <repository>
        <id>AsposeJavaAPI</id>
        <name>Aspose Java API</name>
        <url>http://repository.aspose.com/repo/</url>
    </repository>
</repositories>

You can also download the library directly from the Aspose.OCR product page.

Extract Text from Images via Java API

Aspose.OCR for Java includes features that let software developers extract text from various types of images inside Java applications. The library reads text from raster images such as JPEG, PNG, WBMP, BMP, GIF, and many more. Other text extraction features include reading text from multi-page TIFF images, extracting text from a pixel array, reading images in the fastest recognition mode, recognizing a single line, and extracting text from receipts. The following example shows how to extract text from an image using Java code.

How to Extract Text from Image using Java API?

AsposeOCR api = new AsposeOCR();
// Customize recognition
RecognitionSettings recognitionSettings = new RecognitionSettings();
recognitionSettings.setLanguage(Language.Ukr);
// Extract text from image
RecognitionResult result = api.RecognizePage("source.png", recognitionSettings);
// Show non-critical recognition problems
result.warnings.forEach((w) -> {
	System.out.println(w);
});
// Get recognition results as JSON
String resultJson = result.GetJson();
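
The section above mentions extracting text from a pixel array. As a minimal sketch of how such an array could be obtained with the standard Java imaging classes, the snippet below converts a BufferedImage into row-major grayscale values. The class name PixelArraySketch and the method toGrayscale are hypothetical names used for illustration only; they are not part of Aspose.OCR, and the exact array layout that the library expects is not stated here.

```java
import java.awt.image.BufferedImage;

public class PixelArraySketch {

    /** Returns the image as row-major 8-bit grayscale values (0 = black, 255 = white). */
    static int[] toGrayscale(BufferedImage image) {
        int width = image.getWidth();
        int height = image.getHeight();
        // Packed ARGB pixels, one int per pixel, row by row.
        int[] argb = image.getRGB(0, 0, width, height, null, 0, width);
        int[] gray = new int[argb.length];
        for (int i = 0; i < argb.length; i++) {
            int r = (argb[i] >> 16) & 0xFF;
            int g = (argb[i] >> 8) & 0xFF;
            int b = argb[i] & 0xFF;
            // Integer approximation of the Rec. 601 luma weights (0.299, 0.587, 0.114).
            gray[i] = (299 * r + 587 * g + 114 * b) / 1000;
        }
        return gray;
    }

    public static void main(String[] args) {
        // Synthetic 2x1 image: one white pixel, one black pixel.
        BufferedImage image = new BufferedImage(2, 1, BufferedImage.TYPE_INT_RGB);
        image.setRGB(0, 0, 0xFFFFFF);
        image.setRGB(1, 0, 0x000000);
        int[] gray = toGrayscale(image);
        System.out.println(gray[0] + " " + gray[1]);
    }
}
```

In a real application the image would typically be loaded with javax.imageio.ImageIO.read rather than created in memory.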

Read Specific Areas of an Image via Java API

Aspose.OCR for Java lets software developers find and read only particular areas of an image, rather than all of its text, with a couple of lines of Java code. This is especially helpful when batch processing uniform documents such as visas, driver’s licenses, and ID cards. Supported features include extracting text inside an image rectangle, extracting lines with coordinates, and automatic search for word and line bounding boxes. The following example shows how to extract text inside a rectangle using Java code.

How to Extract Text inside a Rectangle via Java API?

AsposeOCR api = new AsposeOCR();
// Define image regions
ArrayList<Rectangle> regions = new ArrayList<>();
regions.add(new Rectangle(231,101,430,42));
regions.add(new Rectangle(546,224,123,26));
// Specify recognition settings
RecognitionSettings recognitionSettings = new RecognitionSettings();
recognitionSettings.setAutoDenoising(true);
recognitionSettings.setRecognitionAreas(regions);
// Extract text from selected regions
RecognitionResult result = api.RecognizePage("source.png", recognitionSettings);
System.out.println("Name: " + result.recognitionAreasText.get(0));
System.out.println("Expiry: " + result.recognitionAreasText.get(1));
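
To make the rectangle arguments concrete: java.awt.Rectangle takes (x, y, width, height), measured in pixels from the top-left corner of the image. The sketch below uses only standard Java classes to crop the same two regions out of a page image, clamping each region to the page bounds first. It illustrates the geometry only and is not a description of how Aspose.OCR processes recognition areas internally; RegionCropSketch and cropRegions are hypothetical names, and the 800x600 page size is an assumption.

```java
import java.awt.Rectangle;
import java.awt.image.BufferedImage;
import java.util.ArrayList;
import java.util.List;

public class RegionCropSketch {

    /** Clamps each region to the image bounds and returns the cropped sub-images. */
    static List<BufferedImage> cropRegions(BufferedImage page, List<Rectangle> regions) {
        Rectangle bounds = new Rectangle(0, 0, page.getWidth(), page.getHeight());
        List<BufferedImage> crops = new ArrayList<>();
        for (Rectangle region : regions) {
            Rectangle clipped = region.intersection(bounds);
            if (clipped.isEmpty()) {
                continue; // region lies entirely outside the page
            }
            crops.add(page.getSubimage(clipped.x, clipped.y, clipped.width, clipped.height));
        }
        return crops;
    }

    public static void main(String[] args) {
        // Blank stand-in for a scanned page (assumed size).
        BufferedImage page = new BufferedImage(800, 600, BufferedImage.TYPE_INT_RGB);
        List<Rectangle> regions = new ArrayList<>();
        regions.add(new Rectangle(231, 101, 430, 42));
        regions.add(new Rectangle(546, 224, 123, 26));
        for (BufferedImage crop : cropRegions(page, regions)) {
            System.out.println(crop.getWidth() + "x" + crop.getHeight());
        }
    }
}
```

For uniform documents such as ID cards, the same list of rectangles can be reused for every page in a batch, provided all scans share the same size and layout.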

Select Document Specific Areas via Java API

A document image may contain numerous blocks of various content, such as text paragraphs, drawings, diagrams, formulas, tables, and maps. Aspose.OCR for Java enables software developers to detect and select particular areas of interest on a page and perform OCR on them. The library supports automatic area detection, which you can override by manually selecting areas of interest. The following example demonstrates how to enable automatic document area detection inside a Java application.

Java Code for Automatic Document Areas Detection via Free API

// Create instance of OCR API
AsposeOCR api = new AsposeOCR();
// Enable automatic document areas detection
RecognitionSettings recognitionSettings = new RecognitionSettings();
recognitionSettings.setAllowedCharacters(CharactersAllowedType.LATIN_ALPHABET);
recognitionSettings.setDetectAreas(true);
// Extract text from image
RecognitionResult result = api.RecognizePage("source.png", recognitionSettings);
System.out.println("Recognition result:\n" + result.recognitionText + "\n\n");