Open Source Java Library for OCR Text & Image Processing
A Free Java Library that allows Software Developers to Add OCR Capabilities to Java apps & Perform OCR on Scanned Images & PDF Files.
For software developers in need of a robust and accessible Java OCR API, the Asprise OCR SDK offers a powerful open source Java OCR solution. This SDK is engineered to efficiently perform OCR operations on images, PDFs, and various document types, enabling applications to recognize text from images and scanned files with high accuracy. Its primary function is to conduct optical character recognition on scanned images, converting them into structured data. A key advantage is its ability to extract text in plain text format, as well as searchable PDF and Word documents, making it an invaluable free Java OCR library for projects involving document digitization, data entry automation, and content processing within Java applets, web apps, and enterprise systems.
The power of this SDK for Java optical character recognition lies in its advanced feature set designed to handle real-world complexity. It supports over 100 languages and utilizes image enhancement facilities to improve accuracy on low-quality scans or difficult fonts. This allows developers to reliably recognize text from files regardless of origin, from crisp PDFs to photographs of documents. By providing a straightforward API that simplifies integration, the Asprise OCR SDK empowers developers to quickly build sophisticated applications that transform static images into editable, searchable, and machine-readable text, significantly streamlining workflows that depend on data extraction.
Getting Started with Asprise OCR SDK for Java
The recommended way to install Asprise OCR SDK for Java is via Maven. Add the following dependency to your project's pom.xml for a smooth installation.
Maven Dependency for Asprise OCR SDK for Java
<dependencies>
<dependency>
<groupId>com.asprise.ocr</groupId>
<artifactId>java-ocr-api</artifactId>
<version>[15,)</version>
</dependency>
</dependencies>
Install Asprise OCR SDK for Java via GitHub
git clone https://github.com/Asprise/java-.net-ocr-api-library
Extract Text in Plain Text Format via Java
Asprise OCR SDK for Java provides complete functionality for extracting text from images in plain text format. The library lets users retrieve the text content of scanned documents or images and use it for further processing or analysis. To extract plain text, first load the image from a file, input stream, or URL, then run OCR recognition on it through the API, requesting plain text as the output format. The following example loads an image, recognizes its text as plain text, and prints the result to the console.
How to Load an Image & Recognize Text via Java API?
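import java.io.File;

import com.asprise.ocr.Ocr;

public class OCRTest {
    public static void main(String[] args) throws Exception {
        Ocr.setUp(); // one-time setup before the first engine is created
        // Create an OCR engine and start it for English
        Ocr ocr = new Ocr();
        ocr.startEngine("eng", Ocr.SPEED_FASTEST);
        // Recognize the image file and request plain text output
        String recognizedText = ocr.recognize(new File[] {new File("image.png")},
                Ocr.RECOGNIZE_TYPE_TEXT, Ocr.OUTPUT_FORMAT_PLAINTEXT);
        // Print the plain text output
        System.out.println("Recognized Text: " + recognizedText);
        // Release the engine's resources
        ocr.stopEngine();
    }
}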
Perform Various OCR Operations in Java Apps
Asprise OCR SDK for Java allows software developers to carry out a range of OCR (Optical Character Recognition) operations on different types of documents. Supported operations include OCR on image files, PDF files, handwritten text, and text in multiple languages, as well as OCR on part of an image, on multiple input files in one shot, on a specific page of a TIFF file, and in batch mode. Together these options make the library a flexible tool for extracting text quickly and accurately from many kinds of documents.
How to Perform OCR on Multiple Files via Java Library?
String s = ocr.recognize("test.png;test2.jpg", -1, 0, 0, 400, 200,
Ocr.RECOGNIZE_TYPE_TEXT, Ocr.OUTPUT_FORMAT_PLAINTEXT);
Perform OCR on a PDF input file:
String s = ocr.recognize("test.pdf", -1, 100, 100, 400, 200,
Ocr.RECOGNIZE_TYPE_TEXT, Ocr.OUTPUT_FORMAT_PLAINTEXT);
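The multi-file example above passes its inputs as one semicolon-separated string. When the file names come from a list, that string can be built with a small helper. The following is a minimal sketch; `OcrFileList` and `join` are hypothetical names that are not part of the Asprise API, and the `;` separator is taken from the example above.

```java
import java.util.List;

// Hypothetical helper, not part of the Asprise API.
final class OcrFileList {
    private OcrFileList() {}

    /** Joins file paths with ';', the separator used in the multi-file example. */
    static String join(List<String> paths) {
        for (String path : paths) {
            // A path containing the separator would be split into two inputs.
            if (path.contains(";")) {
                throw new IllegalArgumentException("Path contains ';': " + path);
            }
        }
        return String.join(";", paths);
    }

    public static void main(String[] args) {
        // prints "test.png;test2.jpg"
        System.out.println(join(List.of("test.png", "test2.jpg")));
    }
}
```

The resulting string could then be passed as the first argument of the recognize call shown above.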
Multi-threading Support using Asprise OCR
Asprise OCR SDK for Java includes support for multi-threading, which allows developers to process multiple OCR tasks simultaneously. Distributing OCR work across several threads, which can run concurrently on multiple cores or processors, improves the throughput of OCR applications. The number of threads and OCR engines can be tuned to match the available resources and processing requirements. Overall, this gives developers a flexible tool for building high-performance OCR applications that process large volumes of text quickly and efficiently.
How to Run OCR Tasks in Parallel without Writing Tricky Thread Management Code?
// Requires java.io.File, java.util.Arrays, java.util.List and java.util.concurrent.Future
OcrExecutorService oes =
    new OcrExecutorService("eng", Ocr.SPEED_FASTEST, 4); // 4 threads
// Submit one task per input file; futures are returned in task order
List<Future<String>> futures = oes.invokeAll(Arrays.asList(
    new OcrExecutorService.OcrCallable(
        new File[] {new File("test1.png")},
        Ocr.RECOGNIZE_TYPE_ALL, Ocr.OUTPUT_FORMAT_XML),
    new OcrExecutorService.OcrCallable(
        new File[] {new File("test2.png")},
        Ocr.RECOGNIZE_TYPE_ALL, Ocr.OUTPUT_FORMAT_XML)
));
// get() blocks until the corresponding task has finished
System.out.println("Result of test1.png: " + futures.get(0).get());
System.out.println("Result of test2.png: " + futures.get(1).get());
oes.shutdown(); // stops all OCR engines and disposes all threads
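For readers new to the invokeAll and Future pattern used above, the same flow can be sketched with the standard java.util.concurrent classes alone. This is an illustration under stated assumptions, not a description of how OcrExecutorService is implemented: `ParallelOcrSketch`, `recognizeAll`, and `fakeRecognize` are hypothetical names, and `fakeRecognize` is a stand-in for a real OCR call so that the example runs without the SDK.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch, not part of the Asprise API.
final class ParallelOcrSketch {
    private ParallelOcrSketch() {}

    /** Stand-in for a real OCR call; returns a placeholder string. */
    static String fakeRecognize(String fileName) {
        return "text of " + fileName;
    }

    /** Runs one task per file on a fixed pool and returns results in input order. */
    static List<String> recognizeAll(List<String> fileNames, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<String>> tasks = new ArrayList<>();
            for (String name : fileNames) {
                tasks.add(() -> fakeRecognize(name));
            }
            // invokeAll blocks until all tasks finish and keeps task order.
            List<Future<String>> futures = pool.invokeAll(tasks);
            List<String> results = new ArrayList<>();
            for (Future<String> future : futures) {
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for OCR tasks", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("An OCR task failed", e.getCause());
        } finally {
            pool.shutdown(); // release the worker threads
        }
    }

    public static void main(String[] args) {
        for (String result : recognizeAll(List.of("test1.png", "test2.png"), 2)) {
            System.out.println(result);
        }
    }
}
```

Shutting the pool down in a finally block mirrors the oes.shutdown() call above, so the worker threads are released even if a task fails.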