
PHP Library to Extract Image Text in Multiple Languages

Open Source PHP Optical Character Recognition API that Lets You Load & Scan Images or Documents, and Recognize & Extract Text from Images in Multiple Languages inside PHP Apps.

What is OcrPHP?

Optical Character Recognition (OCR) has become a key tool for digitizing and managing data. OcrPHP is an open-source OCR library designed specifically for PHP environments that helps developers build scalable text extraction solutions. It is built on Google's Tesseract OCR engine and simplifies extracting text from images and converting static PDF documents into editable data. Its features range from document scanning to text recognition, which makes it a good fit for applications that need to turn raw files into usable digital text.

Beyond basic extraction, OcrPHP improves output quality through image preprocessing such as automated deskewing, despeckling, and binarization, which clean up the input before recognition. The library supports multiple languages, including English, Chinese, French, and German, and lets developers fine-tune results through custom parameters such as page segmentation modes and engine settings. Combined with its error handling mechanisms, this makes the library straightforward to integrate and dependable to run, so development teams can deploy cost-effective text recognition tools with little development overhead.


Getting Started with OcrPHP

The recommended way to install OcrPHP is with Composer. Use the following command to install it.

Install OcrPHP via Composer

composer require fizzday/ocrphp

Install OcrPHP via GitHub

git clone https://github.com/fizzday/OcrPHP.git 

You can also download the compiled shared library from the GitHub repository.

Recognize and Extract Text from an Image via PHP

The open source OcrPHP library lets applications load various types of images and extract their text with just a few lines of PHP code. The simple example below uses the Imagick library to load an image file and creates an instance of the OcrPHP class. It then sets the language and OCR engine settings (here, the page segmentation mode) before performing OCR on the image with the recognize() method. Finally, it prints the extracted text using the getText() method.

How to Extract Text from an Image using PHP Library?

<?php
// Requires the Imagick PHP extension for loading the image
require_once 'OcrPHP/autoload.php';

// Load the image file
$image = new Imagick('path/to/image.jpg');

// Create an instance of the OcrPHP class
$ocr = new OcrPHP();

// Set the language and OCR engine settings
$ocr->setLanguage('eng');
$ocr->setPageSegmentationMode(OcrPHP::PSM_SINGLE_BLOCK);

// Perform OCR on the image
$result = $ocr->recognize($image);

// Print the extracted text
echo $result->getText();

Recognize Text in a Specific Language via PHP

The OcrPHP library supports multiple languages for OCR operations inside PHP applications. Whether your text is in English, Chinese, or any other supported language, pass the language code as a parameter to extract text in that language, and make sure the corresponding Tesseract language model is installed. The following example shows how developers can extract text from an image in Chinese inside PHP applications; chi_sim is the Tesseract code for Simplified Chinese.

How to Extract Text from an Image in Chinese Language via PHP?

<?php
require 'vendor/autoload.php';

use Fizzday\Ocr\Ocr;

$imagePath = __DIR__ . '/example-image-chinese.png';

$ocr = new Ocr();

// Extract text in Chinese
$text = $ocr->scan($imagePath, 'chi_sim'); // Use 'eng' for English

echo "Extracted Text (Chinese): \n" . $text;
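
Because recognition depends on the matching Tesseract language model being installed, it can help to check for it before scanning. The sketch below is not part of OcrPHP: parseTesseractLanguages() and hasTesseractLanguage() are hypothetical helper names, and the code assumes the tesseract command-line tool is installed and that its --list-langs option prints one language code per line after a header line.

```php
<?php
// Hypothetical helper (not part of OcrPHP): parses the text printed by
// `tesseract --list-langs` into a list of language codes.
function parseTesseractLanguages(string $output): array
{
    $languages = [];
    foreach (preg_split('/\R/', trim($output)) as $line) {
        $line = trim($line);
        // Skip blank lines and the "List of available languages ..." header.
        if ($line === '' || stripos($line, 'List of available') === 0) {
            continue;
        }
        $languages[] = $line;
    }
    return $languages;
}

// Hypothetical helper: true when $code (e.g. 'chi_sim') appears in the list.
function hasTesseractLanguage(string $code, string $listLangsOutput): bool
{
    return in_array($code, parseTesseractLanguages($listLangsOutput), true);
}

// Usage, assuming the tesseract binary is on the PATH. Some Tesseract
// versions print the list to stderr, hence the 2>&1 redirect:
// $output = (string) shell_exec('tesseract --list-langs 2>&1');
// if (!hasTesseractLanguage('chi_sim', $output)) {
//     exit("The chi_sim language model is not installed.\n");
// }
```

Keeping the parsing separate from the shell call means the check can be exercised with a fixed string, without Tesseract installed.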

Batch Processing and OCR Automation via PHP

For software developers building document processing applications, batch processing can be a valuable feature. OcrPHP makes it easy to loop through a directory of image files and extract text from each one automatically, which suits tasks such as scanning invoices, receipts, or books. The following example scans all .png files in the specified directory, extracts text from each, and prints it. You can extend it to save the output to a file or database.

How to Extract Text from Multiple Images via PHP Library?

<?php
require 'vendor/autoload.php';

use Fizzday\Ocr\Ocr;

$directory = __DIR__ . '/images/';
$ocr = new Ocr();

foreach (glob($directory . '*.png') as $imagePath) {
    $text = $ocr->scan($imagePath);
    echo "Text from {$imagePath}: \n" . $text . "\n\n";
}
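
One way to extend the loop above so that results are saved rather than printed is sketched here. It is an illustration under stated assumptions, not OcrPHP functionality: saveOcrResults() is a hypothetical helper, and the OCR step is passed in as a callable so that any extractor (for instance a wrapper around the scan() call shown above) can be plugged in.

```php
<?php
// Hypothetical helper (not part of OcrPHP): runs $extract on every .png file
// in $inputDir and writes each result to "<image name>.txt" in $outputDir.
// $extract is any callable that takes an image path and returns text, e.g.
// function (string $path) use ($ocr) { return $ocr->scan($path); }
function saveOcrResults(string $inputDir, string $outputDir, callable $extract): array
{
    if (!is_dir($outputDir) && !mkdir($outputDir, 0775, true) && !is_dir($outputDir)) {
        throw new RuntimeException("Cannot create output directory: {$outputDir}");
    }

    $written = [];
    $images = glob(rtrim($inputDir, '/') . '/*.png') ?: [];
    foreach ($images as $imagePath) {
        try {
            $text = (string) $extract($imagePath);
        } catch (Throwable $e) {
            // Log the failure and continue so one bad image does not stop the batch.
            error_log("Skipping {$imagePath}: " . $e->getMessage());
            continue;
        }
        $name = pathinfo($imagePath, PATHINFO_FILENAME);
        $target = rtrim($outputDir, '/') . '/' . $name . '.txt';
        file_put_contents($target, $text);
        $written[] = $target;
    }

    // Paths of the text files that were written.
    return $written;
}
```

Catching errors per image keeps a long batch running when a single file is unreadable; the returned list shows which images produced output.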

Custom Configuration & Integration Support

OcrPHP is designed to simplify adding OCR capabilities to PHP projects. The library lets you specify custom Tesseract configurations, such as language, page segmentation mode, and image preprocessing parameters, so you can tailor OCR results to your documents.
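
To make the language and page segmentation settings concrete, here is a minimal sketch of how they map onto the Tesseract command line. It calls the tesseract CLI directly and does not show OcrPHP's own configuration API: buildTesseractCommand() is a hypothetical helper, and the --psm spelling assumes Tesseract 4 or later (older releases used -psm).

```php
<?php
// Hypothetical helper (not part of OcrPHP): builds a Tesseract CLI command
// from a language code and a page segmentation mode (PSM).
function buildTesseractCommand(string $imagePath, string $language = 'eng', int $psm = 3): string
{
    // Tesseract defines page segmentation modes 0 through 13.
    if ($psm < 0 || $psm > 13) {
        throw new InvalidArgumentException("Unsupported page segmentation mode: {$psm}");
    }

    // Using "stdout" as the output base makes tesseract print the text
    // instead of writing it to a file.
    return sprintf(
        'tesseract %s stdout -l %s --psm %d',
        escapeshellarg($imagePath),
        escapeshellarg($language),
        $psm
    );
}

// Usage, assuming the tesseract binary is on the PATH.
// PSM 6 treats the image as a single uniform block of text:
// $text = shell_exec(buildTesseractCommand('scan.png', 'eng', 6));
```

Escaping the path and language code with escapeshellarg() matters when either value comes from user input.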
