Open Source Java Library for OCR Text & Image Processing
A Free Java Library that allows Software Developers to Add OCR Capabilities to Java apps & Perform OCR on Scanned Images & PDF Files.
Asprise OCR SDK for Java is a powerful open source Java SDK that performs optical character recognition (OCR) on scanned images, PDF files, and other documents. With its easy-to-use Java API, the SDK helps developers add OCR capabilities to their Java applications quickly. OCR is a technology that enables computers to recognize text in images or documents: it converts scanned images of text into digital text that can be edited, searched, or processed by a computer.
The Asprise OCR SDK includes several important features for OCR-related tasks, such as fast OCR processing, support for multiple languages, image enhancement, output of recognized text in a variety of formats, and many more. The SDK is based on advanced OCR technology that can recognize text in a wide variety of fonts and languages. The library can be used to develop Java applets, web applications, Swing/JavaFX components, and JEE enterprise applications.
The Asprise OCR SDK allows software developers to output recognized text in a variety of formats, including plain text, searchable PDF, and Microsoft Word. The SDK can recognize text in over 100 languages, including English, Chinese, Japanese, Arabic, and many more. With its advanced OCR technology, comprehensive language support, and easy-to-use API, this SDK can help developers save time and effort when building OCR applications.
Getting Started with Asprise OCR SDK for Java
The recommended way to install Asprise OCR SDK for Java is via Maven. Add the following dependency to your project's pom.xml for a smooth installation.
Maven Dependency for Asprise OCR SDK for Java
<dependencies>
<dependency>
<groupId>com.asprise.ocr</groupId>
<artifactId>java-ocr-api</artifactId>
<version>[15,)</version>
</dependency>
</dependencies>
Install Asprise OCR SDK for Java via GitHub
git clone https://github.com/Asprise/java-.net-ocr-api-library
Extract Text in Plain Text Format via Java
Asprise OCR SDK for Java provides complete functionality for extracting text from images in plain text format. The library lets users retrieve the text content of scanned documents or images and use it for further processing or analysis. To extract plain text, first load the image from a file, input stream, or URL, then run OCR recognition on it through the API and request plain text output. The following example loads an image from a file, recognizes its text as plain text, and prints the result to the console.
Load an Image and Recognize Text via Java API
import com.asprise.ocr.Ocr;
import java.io.File;

public class OCRTest {
    public static void main(String[] args) throws Exception {
        Ocr.setUp(); // one-time setup of the OCR library
        // Create an OCR engine and start it for English
        Ocr ocr = new Ocr();
        ocr.startEngine("eng", Ocr.SPEED_FASTEST);
        // Recognize the image file and return the result as plain text
        String recognizedText = ocr.recognize(new File[] {new File("image.png")},
                Ocr.RECOGNIZE_TYPE_TEXT, Ocr.OUTPUT_FORMAT_PLAINTEXT);
        // Print the plain text output
        System.out.println("Recognized Text: " + recognizedText);
        ocr.stopEngine(); // release the engine when done
    }
}
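Since the recognized text is typically passed on for further processing or analysis, a small post-processing step can be useful. The following is a minimal sketch using only the Java standard library; the class name OcrTextCleaner and its methods are hypothetical helpers, not part of the Asprise API, and the sample string merely stands in for the value returned by ocr.recognize(...).

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper (not part of the Asprise SDK) for tidying OCR output.
final class OcrTextCleaner {

    /** Collapses runs of spaces/tabs, trims each line, and drops blank lines. */
    static List<String> cleanLines(String recognizedText) {
        return Arrays.stream(recognizedText.split("\\R"))
                .map(line -> line.replaceAll("[ \\t]+", " ").trim())
                .filter(line -> !line.isEmpty())
                .collect(Collectors.toList());
    }

    /** Case-insensitive keyword search across the cleaned lines. */
    static boolean containsKeyword(String recognizedText, String keyword) {
        String needle = keyword.toLowerCase();
        return cleanLines(recognizedText).stream()
                .anyMatch(line -> line.toLowerCase().contains(needle));
    }

    public static void main(String[] args) {
        // Stand-in for the string returned by ocr.recognize(...)
        String recognizedText = "  Invoice   No. 1042 \n\n Total:\t 99.50 EUR  \n";
        System.out.println(cleanLines(recognizedText));
        System.out.println(containsKeyword(recognizedText, "total"));
    }
}
```

Keeping the cleanup separate from the OCR call means it can be tested without an engine running.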
Perform Various OCR Operations in Java Apps
Asprise OCR SDK for Java allows software developers to carry out various OCR (Optical Character Recognition) operations on different types of documents. The library supports OCR on image files, PDF files, handwritten text, and multiple languages, as well as OCR on part of an image, on multiple input files in one call, on a specific page of a TIFF file, batch processing, and many more. Together these make it a flexible tool for quickly and accurately extracting text from your documents.
Perform OCR on Multiple Files via Java Library
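// Arguments: input files (semicolon-separated), page index (-1 for all pages),
// then the region to recognize: start X, start Y, width, height
String s = ocr.recognize("test.png;test2.jpg", -1, 0, 0, 400, 200,
        Ocr.RECOGNIZE_TYPE_TEXT, Ocr.OUTPUT_FORMAT_PLAINTEXT);
Perform OCR on a PDF input file:
String s = ocr.recognize("test.pdf", -1, 100, 100, 400, 200,
        Ocr.RECOGNIZE_TYPE_TEXT, Ocr.OUTPUT_FORMAT_PLAINTEXT);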
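For batch processing, the semicolon-separated input string shown above can be built from a list of paths. The sketch below uses only the Java standard library; OcrBatchInput and joinForRecognize are hypothetical names, and the extension list is an illustrative assumption rather than the SDK's official list of supported formats.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

// Hypothetical helper (not part of the Asprise SDK) for preparing batch input.
final class OcrBatchInput {

    /**
     * Keeps only paths ending in one of the given extensions (case-insensitive)
     * and joins them with ';', the separator used in the multi-file example.
     */
    static String joinForRecognize(List<String> paths, List<String> extensions) {
        return paths.stream()
                .filter(path -> {
                    String lower = path.toLowerCase(Locale.ROOT);
                    return extensions.stream().anyMatch(lower::endsWith);
                })
                .collect(Collectors.joining(";"));
    }

    public static void main(String[] args) {
        List<String> paths = Arrays.asList("scan1.png", "notes.txt", "scan2.JPG", "report.pdf");
        // Assumed extension filter, for illustration only
        String files = joinForRecognize(paths, Arrays.asList(".png", ".jpg", ".pdf"));
        System.out.println(files); // prints "scan1.png;scan2.JPG;report.pdf"
    }
}
```

The resulting string could then be passed as the first argument of a recognize call like the ones above.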
Multi-threading Support using Asprise OCR
Asprise OCR SDK for Java includes complete support for multi-threading, which allows developers to process multiple OCR tasks simultaneously. Distributing OCR processing across multiple threads, which can run concurrently on multiple cores or processors, improves the performance of OCR applications. The multi-threading support is customizable: developers can tune the number of threads and OCR engines to match the available resources and processing requirements. Overall, it gives developers a flexible tool for building high-performance OCR applications that process large volumes of text quickly and efficiently.
Run OCR Tasks in Parallel with OcrExecutorService via Java
// Requires java.io.File, java.util.Arrays, java.util.List and java.util.concurrent.Future
OcrExecutorService oes =
        new OcrExecutorService("eng", Ocr.SPEED_FASTEST, 4); // 4 threads
// Submit one task per input file; each future holds that file's XML result
List<Future<String>> futures = oes.invokeAll(Arrays.asList(
        new OcrExecutorService.OcrCallable(
                new File[] {new File("test1.png")},
                Ocr.RECOGNIZE_TYPE_ALL, Ocr.OUTPUT_FORMAT_XML),
        new OcrExecutorService.OcrCallable(
                new File[] {new File("test2.png")},
                Ocr.RECOGNIZE_TYPE_ALL, Ocr.OUTPUT_FORMAT_XML)
));
System.out.println("Result of test1.png: " + futures.get(0).get());
System.out.println("Result of test2.png: " + futures.get(1).get());
oes.shutdown(); // stops all OCR engines and disposes all threads
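To illustrate the general pattern of distributing tasks over a fixed number of threads, here is a minimal sketch built on the standard java.util.concurrent ExecutorService rather than the Asprise API. ParallelOcrSketch and recognizeAll are hypothetical names, and the recognizer function is a placeholder for a real OCR call.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical sketch (not the Asprise implementation) of a fixed-size OCR worker pool.
final class ParallelOcrSketch {

    /** Runs the recognizer on every file using a fixed-size thread pool. */
    static List<String> recognizeAll(List<String> files,
                                     Function<String, String> recognizer,
                                     int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<String>> tasks = new ArrayList<>();
            for (String file : files) {
                tasks.add(() -> recognizer.apply(file));
            }
            List<String> results = new ArrayList<>();
            // invokeAll blocks until all tasks finish and returns futures in task order
            for (Future<String> future : pool.invokeAll(tasks)) {
                results.add(future.get());
            }
            return results;
        } finally {
            pool.shutdown(); // always release the worker threads
        }
    }

    public static void main(String[] args) throws Exception {
        // Placeholder recognizer; a real one would call an OCR engine
        Function<String, String> fakeRecognizer = file -> "text of " + file;
        System.out.println(recognizeAll(Arrays.asList("test1.png", "test2.png"), fakeRecognizer, 4));
    }
}
```

Because results come back in the same order as the input list, callers can match each result to its source file by index.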