PHP Library to Extract Image Text in Multiple Languages

An open source PHP Optical Character Recognition (OCR) API that lets developers load and scan images or documents, then recognize and extract text from images in multiple languages inside PHP apps.

Optical Character Recognition (OCR) has become an essential tool for extracting text from images and documents. With the rise of digital transformation, the need for efficient and accurate OCR solutions keeps growing. OcrPHP is an open source PHP library that helps software developers build robust and scalable OCR applications. It is built on the Tesseract OCR engine, a widely used open source engine originally developed at Hewlett-Packard and later sponsored by Google. The library's features include document scanning, extracting text from images, text extraction in a specific language, extracting text from PDFs, and more.

The OcrPHP library includes advanced image preprocessing techniques, such as deskewing, despeckling, and binarization, to improve OCR accuracy. It can perform OCR in multiple languages, including English, Spanish, French, German, Italian, Portuguese, Chinese, Japanese, and many more. Software developers can customize the OCR process by adjusting parameters such as language, page segmentation mode, and OCR engine settings. It also includes error handling mechanisms so that failed OCR operations can be detected and handled. With multi-language support, advanced image scanning, custom configurations, and straightforward integration, it lets developers create versatile text-recognition tools with little effort and at low cost.

At A Glance

An overview of OcrPHP features.

Features Overview

Perform OCR
Add OCR Capabilities
Recognize text in many languages
Convert images of text
Recognize font text
Search PDF
Other Languages
Create OCR apps
Save to browser
Extract Text
Multi-threading Support

OcrPHP

OcrPHP supports the popular image file formats listed below.

Reader

PNG, JPEG, BMP, TIFF, TGA, DICOM

Writer

PNG, JPEG, BMP, TIFF

Platform Independence

OcrPHP requires only the PHP runtime.

PHP 5.1 and above.

Getting Started with OcrPHP

The recommended way to install OcrPHP is with Composer. Use the following command.

Install OcrPHP via Composer

composer require fizzday/ocrphp

Install OcrPHP via GitHub

git clone https://github.com/fizzday/OcrPHP.git

You can download the compiled shared library from the GitHub repository.

Recognize and Extract Text from an Image via PHP

The open source OcrPHP library makes it easy for software developers to load various types of images and extract text from them with just a couple of lines of PHP code. The following simple example uses the Imagick library to load an image file and creates an instance of the OcrPHP class. It then sets the language and page segmentation mode before performing OCR on the image with the recognize() method. Finally, it prints the extracted text using the getText() method.

How to Extract Text from an Image using PHP Library?

require_once 'OcrPHP/autoload.php';

// Load the image file
$image = new Imagick('path/to/image.jpg');

// Create an instance of the OcrPHP class
$ocr = new OcrPHP();

// Set the language and OCR engine settings
$ocr->setLanguage('eng');
$ocr->setPageSegmentationMode(OcrPHP::PSM_SINGLE_BLOCK);

// Perform OCR on the image
$result = $ocr->recognize($image);

// Print the extracted text
echo $result->getText();

Recognize Text in a Specific Language via PHP

The OcrPHP library supports multiple languages for OCR operations inside PHP applications. Whether your text is in English, Chinese, or any other supported language, OcrPHP can handle it. To extract text in a specific language, pass the language code as a parameter, and make sure the corresponding Tesseract language model is installed. The following example shows how developers can extract text from an image in Chinese inside PHP applications.

How to Extract Text from an Image in Chinese Language via PHP?

require 'vendor/autoload.php';

use Fizzday\Ocr\Ocr;

$imagePath = __DIR__ . '/example-image-chinese.png';

$ocr = new Ocr();

// Extract text in Chinese
$text = $ocr->scan($imagePath, 'chi_sim'); // Use 'eng' for English

echo "Extracted Text (Chinese): \n" . $text;
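Before running OCR in a language other than English, it can help to confirm that the language model is actually available. The sketch below is not part of OcrPHP. It assumes PHP 7 or later and that the tesseract command-line tool is on the PATH, and it relies on Tesseract's --list-langs option. The function names parseTesseractLanguages() and isTesseractLanguageInstalled() are hypothetical helpers introduced for illustration.

```php
<?php
/**
 * Parse the text printed by `tesseract --list-langs` into a list of
 * language codes. (Hypothetical helper, not an OcrPHP function.)
 */
function parseTesseractLanguages(string $output): array
{
    $languages = [];
    foreach (preg_split('/\R/', $output) as $line) {
        $line = trim($line);
        // Language codes look like "eng", "chi_sim" or "script/Latin".
        // The header line contains spaces, so it does not match.
        if ($line !== '' && preg_match('~^[A-Za-z0-9_/\-]+$~', $line)) {
            $languages[] = $line;
        }
    }
    return $languages;
}

/**
 * Ask the local Tesseract installation whether a language model is present.
 * Assumes the `tesseract` binary is on the PATH.
 */
function isTesseractLanguageInstalled(string $language): bool
{
    // Older Tesseract versions print the list to stderr, so capture both streams.
    $output = shell_exec('tesseract --list-langs 2>&1');
    return in_array($language, parseTesseractLanguages((string) $output), true);
}
```

A script could call isTesseractLanguageInstalled('chi_sim') before scanning and stop with a clear message when it returns false, instead of failing inside the OCR call.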

Batch Processing and OCR Automation via PHP

For software developers building document processing applications, batch processing can be a valuable feature. OcrPHP makes it easy to loop through a directory of image files and extract text from each one automatically, which suits tasks such as scanning invoices, receipts, or books. The following example scans all .png files in the specified directory, extracts text from each, and prints it. You can extend it to save the output to a file or database.

How to Extract Text from Multiple Images via PHP Library?

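require 'vendor/autoload.php';

use Fizzday\Ocr\Ocr;

// Directory that holds the images to process
$directory = __DIR__ . '/images/';
$ocr = new Ocr();

// glob() returns false on error, so fall back to an empty list
$images = glob($directory . '*.png') ?: [];

// Extract and print the text of each PNG file
foreach ($images as $imagePath) {
    $text = $ocr->scan($imagePath);
    echo "Text from {$imagePath}: \n" . $text . "\n\n";
}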
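As one possible way to save the output to files, the sketch below writes the text of each image to a matching .txt file. It assumes PHP 7 or later. The function name saveOcrResults() and its $scan parameter are hypothetical: $scan stands in for whatever OCR call you use, so the helper itself does not depend on any particular library API.

```php
<?php
/**
 * Run an OCR callable over every PNG in $imageDir and write each result to
 * a .txt file in $outputDir. Returns a map of image path => text file path.
 * (Hypothetical helper; $scan is any callable that takes an image path and
 * returns the recognized text.)
 */
function saveOcrResults(string $imageDir, string $outputDir, callable $scan): array
{
    if (!is_dir($outputDir) && !mkdir($outputDir, 0775, true)) {
        throw new RuntimeException("Cannot create output directory: {$outputDir}");
    }

    $written = [];
    $images = glob(rtrim($imageDir, '/') . '/*.png') ?: [];

    foreach ($images as $imagePath) {
        try {
            $text = $scan($imagePath);
        } catch (Throwable $e) {
            // Log and continue so one unreadable image does not abort the batch.
            error_log("OCR failed for {$imagePath}: " . $e->getMessage());
            continue;
        }

        // invoice-01.png becomes invoice-01.txt in the output directory.
        $target = rtrim($outputDir, '/') . '/' . pathinfo($imagePath, PATHINFO_FILENAME) . '.txt';
        file_put_contents($target, $text);
        $written[$imagePath] = $target;
    }

    return $written;
}
```

With the $ocr object from the example above, the callable could be function ($path) use ($ocr) { return $ocr->scan($path); }. Passing the OCR step in as a callable also makes the loop easy to test with a stub.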

Custom Configuration & Integration Support

The open source OcrPHP is a versatile and developer-friendly library that simplifies the integration of OCR capabilities into PHP projects. The library allows you to specify custom Tesseract configurations, such as language, page segmentation mode, and image preprocessing parameters, offering flexibility to tailor OCR results.
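To make these settings more concrete, the sketch below shows how language, page segmentation mode, and OCR engine mode map onto Tesseract's own command-line options (-l, --psm, --oem). It is an illustration only, not OcrPHP's implementation. It assumes PHP 7 or later and Tesseract 4 or later, and buildTesseractCommand() is a hypothetical helper name.

```php
<?php
/**
 * Build a Tesseract command line that prints the recognized text to stdout.
 * (Hypothetical helper for illustration; not part of OcrPHP.)
 *
 * $psm: page segmentation mode, 0-13 (3 = fully automatic, 6 = single uniform block of text)
 * $oem: OCR engine mode, 0-3 (3 = default, based on what is available)
 */
function buildTesseractCommand(
    string $imagePath,
    string $language = 'eng',
    int $psm = 3,
    int $oem = 3
): string {
    if ($psm < 0 || $psm > 13) {
        throw new InvalidArgumentException("Page segmentation mode must be 0-13, got {$psm}");
    }
    if ($oem < 0 || $oem > 3) {
        throw new InvalidArgumentException("OCR engine mode must be 0-3, got {$oem}");
    }

    // escapeshellarg() protects against spaces and shell metacharacters
    // in file names and language codes.
    return sprintf(
        'tesseract %s stdout -l %s --psm %d --oem %d',
        escapeshellarg($imagePath),
        escapeshellarg($language),
        $psm,
        $oem
    );
}
```

The resulting string could be passed to shell_exec() to run Tesseract and capture the recognized text. Validating the numeric modes up front gives a clearer error than a failed Tesseract run.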