Aspose.OCR for Java

Java API for Optical Character Recognition

Develop applications with Optical Character Recognition (OCR) capabilities using a Java API. Recognize text from scanned documents, images, and other sources.

Aspose.OCR for Java is an OCR (Optical Character Recognition) engine that lets software developers integrate text recognition into their Java applications. It is designed to be simple to use: developers can recognize text from scanned documents, images, and other sources, and extract text from various file formats, including JPEG, PNG, BMP, TIFF, HTML, PDF and many others.

Aspose.OCR for Java uses advanced OCR algorithms to accurately recognize text from images, even in low-quality scans or photographs. It supports more than 50 languages, including English, Spanish, French, German and Chinese. Important features of the library include handwritten text recognition, text extraction from screenshots, text extraction from specific areas of an image, creation of searchable PDFs, support for smartphone photos, noise removal, image binarization, and many more.

The library can also improve accessibility by converting scanned documents into searchable digital documents. With its OCR algorithms, multi-language support, and easy integration, Aspose.OCR for Java is a strong choice for developers looking to add OCR functionality to their Java applications.

At A Glance

An overview of Aspose.OCR for Java features.

Features Overview

OCR Operations
Add OCR Capabilities
Recognize Image text
Convert images of text
Recognized Font text
Search PDF
27 Recognition Languages
Create OCR apps
Save to browser
Extract Text
Multi-threading Support

Recognize rotated Image
Pre-processing filters
PDF to Images
Recognizes Chinese Characters
Detects Popular typefaces
Processes whole image
Rotated images Support
Batch Recognition
Built-in Spell Checker
Split PDF
PDF to Excel
PDF to SVG

The API mainly supports the PDF format but can export PDF documents to a number of other formats.

Reader

PDF, PDF/A, TEX, XPS, SVG

Writer

PDF, TXT, PNG, JPEG, PDF/A, DOC, DOCX, TEX, XPS, SVG, XLSX, PPTX

Platform Independence

Aspose.OCR for Java works with any Java-based programming language.

Java Runtime

Getting Started with Aspose.OCR for Java

The recommended way to install Aspose.OCR for Java is through the Maven repository. Add the following repository configuration to your pom.xml.

Install Aspose.OCR for Java via Maven Repository

<repositories>
    <repository>
        <id>AsposeJavaAPI</id>
        <name>Aspose Java API</name>
        <url>http://repository.aspose.com/repo/</url>
    </repository>
</repositories>

You can also download the library directly from the Aspose.OCR product page.

Extract Text from Images via Java API

Aspose.OCR for Java lets software developers extract text from various types of images inside Java applications. The library can read text from raster images such as JPEG, PNG, WBMP, BMP, GIF and many more. Other text extraction features include reading text from multi-page TIFF images, extracting text from a pixel array, reading images in the fastest recognition mode, recognizing a single line, extracting text from receipts and many more. The following example shows how to extract text from an image using Java code.

How to Extract Text from Image using Java API?

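import com.aspose.ocr.AsposeOCR;
import com.aspose.ocr.Language;
import com.aspose.ocr.RecognitionResult;
import com.aspose.ocr.RecognitionSettings;

// Create an instance of the OCR API
AsposeOCR api = new AsposeOCR();
// Customize recognition: set the recognition language to Ukrainian
RecognitionSettings recognitionSettings = new RecognitionSettings();
recognitionSettings.setLanguage(Language.Ukr);
// Extract text from image
RecognitionResult result = api.RecognizePage("source.png", recognitionSettings);
// Show non-critical recognition problems
result.warnings.forEach((w) -> {
    System.out.println(w);
});
// Get recognition results as JSON
String resultJson = result.GetJson();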
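The paragraph above mentions extracting text from a pixel array. As a minimal sketch of how such an array can be obtained with the standard Java library alone, the hypothetical helper below flattens a BufferedImage into packed ARGB integers, row by row. The class and method names are illustrative and not part of Aspose.OCR, and how the array is then handed to the library is not covered here. The example builds a small in-memory image; in a real application the image would more likely come from ImageIO.read.

```java
import java.awt.image.BufferedImage;

// Hypothetical helper (not part of Aspose.OCR): turns an image into a pixel array.
final class PixelArrayExample {

    /** Returns the image's pixels as packed ARGB ints, row by row. */
    static int[] toArgbPixels(BufferedImage image) {
        int width = image.getWidth();
        int height = image.getHeight();
        // scansize == width, so pixel (x, y) ends up at index y * width + x
        return image.getRGB(0, 0, width, height, null, 0, width);
    }

    public static void main(String[] args) {
        // Small in-memory image used as a stand-in for a scanned page
        BufferedImage image = new BufferedImage(4, 2, BufferedImage.TYPE_INT_RGB);
        image.setRGB(1, 0, 0xFFFFFF); // one white pixel on a black background

        int[] pixels = toArgbPixels(image);
        System.out.println("Pixel count: " + pixels.length);
        System.out.println("Pixel (1,0): " + Integer.toHexString(pixels[1]));
    }
}
```

Keeping the row-major layout explicit makes it easy to pass the width and height alongside the array to whatever consumes it.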

Read Specific Areas of an Image via Java API

Aspose.OCR for Java lets software developers find and read only particular areas of an image, rather than all of its text, with a couple of lines of Java code. This is especially helpful when batch processing uniform documents such as visas, driver’s licenses, ID cards, and so on. Supported features include extracting text inside an image rectangle, extracting lines with coordinates, automatic search for word and line bounding boxes, and many more. The following example shows how to extract text inside a rectangle using Java code.

How to Extract Text inside a Rectangle via Java API?

import java.awt.Rectangle;
import java.util.ArrayList;

AsposeOCR api = new AsposeOCR();
// Define image regions as Rectangle(x, y, width, height)
ArrayList<Rectangle> regions = new ArrayList<>();
regions.add(new Rectangle(231, 101, 430, 42));
regions.add(new Rectangle(546, 224, 123, 26));
// Specify recognition settings
RecognitionSettings recognitionSettings = new RecognitionSettings();
recognitionSettings.setAutoDenoising(true);
recognitionSettings.setRecognitionAreas(regions);
// Extract text from selected regions
RecognitionResult result = api.RecognizePage("source.png", recognitionSettings);
// Results are returned in the same order as the regions were added
System.out.println("Name: " + result.recognitionAreasText.get(0));
System.out.println("Expiry: " + result.recognitionAreasText.get(1));
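When the same regions are reused across a batch of uniform documents, addressing results by list index (get(0), get(1)) becomes error-prone. The sketch below is one possible way to keep field names and rectangles together and to turn the per-region texts into a labeled map. RegionTemplate is a hypothetical helper written only with the standard library; it is not part of Aspose.OCR, and the list of recognized texts in main is a stand-in for the values that a recognition call would return for each region.

```java
import java.awt.Rectangle;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not part of Aspose.OCR): pairs field names with image regions.
final class RegionTemplate {
    private final List<String> labels = new ArrayList<>();
    private final List<Rectangle> regions = new ArrayList<>();

    /** Registers a named region; the insertion order defines the result order. */
    RegionTemplate add(String label, Rectangle region) {
        labels.add(label);
        regions.add(region);
        return this;
    }

    /** Regions in insertion order, ready to be passed to the recognition settings. */
    ArrayList<Rectangle> regions() {
        return new ArrayList<>(regions);
    }

    /** Maps each label to the text recognized in the region at the same position. */
    Map<String, String> label(List<String> areaTexts) {
        if (areaTexts.size() != labels.size()) {
            throw new IllegalArgumentException(
                "Expected " + labels.size() + " texts but got " + areaTexts.size());
        }
        Map<String, String> fields = new LinkedHashMap<>();
        for (int i = 0; i < labels.size(); i++) {
            fields.put(labels.get(i), areaTexts.get(i).trim());
        }
        return fields;
    }

    public static void main(String[] args) {
        RegionTemplate idCard = new RegionTemplate()
            .add("Name", new Rectangle(231, 101, 430, 42))
            .add("Expiry", new Rectangle(546, 224, 123, 26));

        // Stand-in for the per-region texts produced by an OCR call
        List<String> areaTexts = Arrays.asList("JANE DOE\n", "01/2030\n");

        System.out.println(idCard.label(areaTexts)); // prints {Name=JANE DOE, Expiry=01/2030}
    }
}
```

The size check fails early if a region is added to the template without a matching result, which helps catch mismatches before values are written to the wrong field.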

Select Document Specific Areas via Java API

A document image may contain numerous blocks of various content such as text paragraphs, drawings, diagrams, formulas, tables, maps and so on. Aspose.OCR for Java enables software developers to detect and select particular areas of interest on a page and perform OCR on them. The library supports automatic area detection, which you can override by manually selecting areas of interest. The following example demonstrates how software developers can enable automatic document areas detection inside their Java applications.

Automatic Document Areas Detection via Java API

// Create instance of OCR API
AsposeOCR api = new AsposeOCR();
// Configure recognition settings
RecognitionSettings recognitionSettings = new RecognitionSettings();
// Limit recognition to the Latin alphabet
recognitionSettings.setAllowedCharacters(CharactersAllowedType.LATIN_ALPHABET);
// Enable automatic document areas detection
recognitionSettings.setDetectAreas(true);
// Extract text from image
RecognitionResult result = api.RecognizePage("source.png", recognitionSettings);
System.out.println("Recognition result:\n" + result.recognitionText + "\n\n");