PHP Library for Extracting Text from Images in Multiple Languages

An open source PHP Optical Character Recognition (OCR) API that lets you load and scan images or documents, then recognize and extract text from images in multiple languages inside PHP applications.

Optical Character Recognition (OCR) technology has become an essential tool for extracting text from images and documents. As digital transformation accelerates, the need for efficient and accurate OCR solutions keeps growing. OcrPHP is an open source, PHP-based OCR library that helps software developers build robust and scalable OCR applications. It uses the Tesseract OCR engine, a widely used OCR technology originally developed at Hewlett-Packard and later sponsored by Google. The library offers numerous features, such as document scanning, extracting text from images, text extraction in a specific language, extracting text from PDFs, and many more.

The OcrPHP library includes advanced image preprocessing techniques, such as deskewing, despeckling, and binarization, to improve OCR accuracy. It supports OCR in multiple languages, including English, Spanish, French, German, Italian, Portuguese, Chinese, Japanese, and many more. Software developers can customize the OCR process by adjusting parameters such as language, page segmentation mode, and OCR engine settings. It also includes error handling mechanisms so that OCR operations run smoothly and efficiently. With features like multi-language support, advanced image scanning, custom configurations, and straightforward integration, it lets developers create versatile text-recognition tools with little effort and at low cost.
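
To make the preprocessing step more concrete, here is a minimal sketch of binarization, the simplest of the three techniques named above. It is not OcrPHP's implementation: the function name binarize, the fixed threshold of 128, and the array-of-rows image representation are all assumptions chosen for illustration, and the scalar type hints assume PHP 7 or later.

```php
<?php
// Hypothetical helper, not part of OcrPHP: fixed-threshold binarization.
// $gray is a 2D array of grayscale values from 0 (black) to 255 (white).
function binarize(array $gray, int $threshold = 128): array
{
    $out = [];
    foreach ($gray as $y => $row) {
        foreach ($row as $x => $value) {
            // Pixels at or above the threshold become white, the rest black.
            $out[$y][$x] = ($value >= $threshold) ? 255 : 0;
        }
    }
    return $out;
}

// A 2x3 "image": light background pixels around darker text pixels.
$page = [
    [250, 30, 240],
    [120, 10, 200],
];

print_r(binarize($page));
```

Real preprocessing usually picks the threshold adaptively per image or per region; a fixed value is used here only to keep the idea visible.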

Quick Overview

An overview of OcrPHP features.

Feature Overview

Perform OCR
Add OCR functionality
Recognize text in many languages
Convert images of text
Recognize printed font text
Search in PDF
Other languages
Create OCR apps
Save to browser
Extract text
Multithreading support

OcrPHP

OcrPHP supports the popular image file formats listed below.

Read

PNG, JPEG, BMP, TIFF, TGA, DICOM

Write

PNG, JPEG, BMP, TIFF

Platform Independence

OcrPHP requires only the PHP runtime.

PHP 5.1 and later.

Getting Started with OcrPHP

The recommended way to install OcrPHP is via Composer. Use the following command for a smooth installation.

Install OcrPHP via Composer

composer require fizzday/ocrphp

Install OcrPHP via GitHub

git clone https://github.com/fizzday/OcrPHP.git

You can download the library from the GitHub repository.

Recognize and Extract Text from an Image via PHP

The open source OcrPHP library makes it easy for software developers to load various types of images and extract text from them with just a couple of lines of PHP code. The following simple example uses the Imagick library to load an image file and creates an instance of the OcrPHP class. It then sets the language and OCR engine settings before performing OCR on the image with the recognize() method. Finally, it prints the extracted text using the getText() method.

How to Extract Text from an Image Using the PHP Library?

require_once 'OcrPHP/autoload.php';

// Load the image file
$image = new Imagick('path/to/image.jpg');

// Create an instance of the OcrPHP class
$ocr = new OcrPHP();

// Set the language and OCR engine settings
$ocr->setLanguage('eng');
$ocr->setPageSegmentationMode(OcrPHP::PSM_SINGLE_BLOCK);

// Perform OCR on the image
$result = $ocr->recognize($image);

// Print the extracted text
echo $result->getText();

Recognize Text in a Specific Language via PHP

The OcrPHP library supports multiple languages for OCR operations inside PHP applications. Whether your text is in English, Chinese, or any other supported language, OcrPHP can handle it. To extract text in a specific language, pass the language code as a parameter, and make sure the corresponding Tesseract language model is installed. The following example shows how developers can extract text from an image in Chinese inside PHP applications.

How to Extract Text from an Image in the Chinese Language via PHP?

require 'vendor/autoload.php';

use Fizzday\Ocr\Ocr;

$imagePath = __DIR__ . '/example-image-chinese.png';

$ocr = new Ocr();

// Extract text in Chinese
$text = $ocr->scan($imagePath, 'chi_sim'); // Use 'eng' for English

echo "Extracted Text (Chinese): \n" . $text;
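
Since the example above only works when the chi_sim language model is installed, it can help to check for it first. The sketch below parses the output of the Tesseract command `tesseract --list-langs`, which prints a header line followed by one language code per line. The helper names parseTesseractLanguages and isLanguageInstalled are hypothetical and are not part of OcrPHP; the sample output is made up for illustration.

```php
<?php
// Hypothetical helper: parse the output of `tesseract --list-langs`.
// The first line is a header; each following line names one installed language.
function parseTesseractLanguages(string $output): array
{
    $lines = preg_split('/\R/', trim($output));
    array_shift($lines); // drop the "List of available languages ..." header
    return array_values(array_filter(array_map('trim', $lines), 'strlen'));
}

// Returns true when the given language code appears in the parsed list.
function isLanguageInstalled(string $code, string $listLangsOutput): bool
{
    return in_array($code, parseTesseractLanguages($listLangsOutput), true);
}

// Sample output for illustration. On a real system you could obtain it with
// shell_exec('tesseract --list-langs 2>&1'), assuming tesseract is on the PATH.
$sample = "List of available languages (3):\nchi_sim\neng\nosd\n";

var_dump(isLanguageInstalled('chi_sim', $sample));
var_dump(isLanguageInstalled('deu', $sample));
```

Keeping the parsing separate from the shell call makes the check easy to test without Tesseract installed.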

Batch Processing and OCR Automation via PHP

For software developers building document processing applications, batch processing can be a valuable feature. The open source OcrPHP library makes it easy to loop through a directory of image files and extract text from each one automatically, which suits tasks like scanning invoices, receipts, or books. The following example scans all .png files in the specified directory, extracts text from each, and prints it. You can extend it to save the output to a file or database.

How to Extract Text from Multiple Images via the PHP Library?

require 'vendor/autoload.php';

use Fizzday\Ocr\Ocr;

$directory = __DIR__ . '/images/';
$ocr = new Ocr();

foreach (glob($directory . '*.png') as $imagePath) {
    $text = $ocr->scan($imagePath);
    echo "Text from {$imagePath}: \n" . $text . "\n\n";
}
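
One way to extend the loop above so that it saves each result to a text file is sketched below. The function saveOcrResults is hypothetical and not part of OcrPHP. It takes the OCR step as a callable so the file handling can be tried without the library, and it assumes PHP 7 or later.

```php
<?php
// Hypothetical helper: run an OCR callable over every PNG in $inputDir and
// write each result to "<image name>.txt" in $outputDir.
// $scan is any callable that takes an image path and returns the recognized text.
function saveOcrResults(string $inputDir, string $outputDir, callable $scan): array
{
    if (!is_dir($outputDir) && !mkdir($outputDir, 0775, true)) {
        throw new RuntimeException("Cannot create output directory: {$outputDir}");
    }

    $written = [];
    // glob() can return false on error, so fall back to an empty list.
    foreach (glob(rtrim($inputDir, '/') . '/*.png') ?: [] as $imagePath) {
        $name = pathinfo($imagePath, PATHINFO_FILENAME);
        $target = rtrim($outputDir, '/') . '/' . $name . '.txt';
        if (file_put_contents($target, $scan($imagePath)) === false) {
            throw new RuntimeException("Cannot write: {$target}");
        }
        $written[] = $target;
    }
    return $written; // paths of the text files that were created
}
```

With the $ocr object from the example above, the callable could be `function ($path) use ($ocr) { return $ocr->scan($path); }`.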

Custom Configuration and Integration Support

The open source OcrPHP is a versatile and developer-friendly library that simplifies the integration of OCR capabilities into PHP projects. The library allows you to specify custom Tesseract configurations, such as language, page segmentation mode, and image preprocessing parameters, so you can tailor OCR results to your documents.
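
To show what language and page segmentation mode mean at the Tesseract level, the sketch below builds a Tesseract command line with the `-l` and `--psm` options (Tesseract 4 or later). It is not OcrPHP's API: the function buildTesseractCommand and its defaults are assumptions for illustration, and how OcrPHP passes these settings internally is not shown in this document.

```php
<?php
// Hypothetical helper, not OcrPHP's API: build a Tesseract CLI command line
// from a language code and a page segmentation mode (PSM).
function buildTesseractCommand(string $imagePath, string $language = 'eng', int $psm = 3): string
{
    if ($psm < 0 || $psm > 13) {
        throw new InvalidArgumentException("PSM must be between 0 and 13, got {$psm}");
    }
    // "stdout" as the output base makes tesseract print the text
    // instead of writing it to a file.
    return sprintf(
        'tesseract %s stdout -l %s --psm %d',
        escapeshellarg($imagePath),
        escapeshellarg($language),
        $psm
    );
}

// PSM 6 treats the image as a single uniform block of text.
echo buildTesseractCommand('scan.png', 'chi_sim', 6), PHP_EOL;
```

The resulting string could be passed to shell_exec(), assuming the tesseract binary is installed and on the PATH. Escaping both arguments keeps unusual file names from breaking the command.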