Free PHP API to Convert PDF or Specific Part to Text

Open Source PHP Library that Allows Software Developers to Extract Text from an Entire PDF File or from Specific Pages inside PHP Applications.

What is PDF-to-Text?

PDFs are a standard format for digital documents across every industry, from finance and healthcare to education and legal. However, extracting meaningful text content from PDFs can be a challenge for software developers. Spatie's PDF-to-Text library makes this task simple and efficient for PHP developers. This open source tool provides a clean and lightweight wrapper around the pdftotext binary from the poppler-utils package. The library has an intuitive interface that allows developers to extract text with just a few lines of code. It supports multi-page documents, PDFs with columns, embedded fonts and images, scanned PDFs (if OCR is pre-applied) and much more.

Spatie PDF-to-Text is an open source PHP library that allows software developers to extract plain text from PDF files. It acts as a PHP wrapper around the command-line utility pdftotext, part of the Poppler suite. This means it does not parse the PDF in PHP itself but instead delegates to a mature, battle-tested binary that is fast and accurate. Although the library is written in PHP, it works across different operating systems (Linux, macOS, and Windows) as long as the pdftotext binary is available. It is lightweight and doesn't pull in any unnecessary third-party libraries. This makes it faster, easier to maintain, and less prone to breaking due to updates in unrelated dependencies.
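To make the wrapper idea concrete, here is a minimal sketch of how PHP code can delegate text extraction to the pdftotext binary. It is not the library's source code: buildPdfToTextArgs() and runPdfToText() are hypothetical names chosen for illustration, and the sketch assumes PHP 7.4 or newer (for the array form of proc_open) with pdftotext installed.

```php
<?php
// Illustrative sketch only; NOT the implementation used by spatie/pdf-to-text.
// buildPdfToTextArgs() and runPdfToText() are hypothetical helper names.

/**
 * Turn options written without a leading hyphen (e.g. 'layout', 'f 1')
 * into an argument list for the pdftotext binary.
 */
function buildPdfToTextArgs(string $binary, string $pdfPath, array $options = []): array
{
    $args = [$binary];

    foreach ($options as $option) {
        // 'f 1' becomes ['-f', '1']; 'layout' becomes ['-layout'].
        $pieces = explode(' ', trim($option), 2);
        $args[] = '-' . ltrim($pieces[0], '-');
        if (isset($pieces[1])) {
            $args[] = $pieces[1];
        }
    }

    $args[] = $pdfPath;
    $args[] = '-'; // "-" as the output file makes pdftotext write to stdout

    return $args;
}

/**
 * Run pdftotext and return whatever it printed on stdout.
 */
function runPdfToText(string $binary, string $pdfPath, array $options = []): string
{
    // Passing the command as an array (PHP 7.4+) bypasses the shell,
    // so no manual escaping of file names is needed.
    $process = proc_open(
        buildPdfToTextArgs($binary, $pdfPath, $options),
        [1 => ['pipe', 'w']], // capture stdout only; stderr is inherited
        $pipes
    );

    if (!is_resource($process)) {
        throw new RuntimeException('Could not start pdftotext.');
    }

    $text = stream_get_contents($pipes[1]);
    fclose($pipes[1]);

    $exitCode = proc_close($process);
    if ($exitCode !== 0) {
        throw new RuntimeException("pdftotext exited with code {$exitCode}.");
    }

    return $text;
}
```

Calling runPdfToText('pdftotext', 'example.pdf', ['layout']) would run the binary with the -layout flag. The real library adds its own validation and exceptions on top of this basic pattern.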


Getting Started with PDF-to-Text

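To run PDF-to-Text you need a PHP version supported by the package (Composer verifies this during installation) and the pdftotext binary from the poppler-utils package available on your system. The easiest way to install PDF-to-Text is via Composer. Please use the following command for a smooth installation.

Install PDF-to-Text via Composer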

composer require spatie/pdf-to-text

Install PDF-to-Text via Git Command

git clone https://github.com/spatie/pdf-to-text.git
cd pdf-to-text
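Because the library depends on the pdftotext binary, it can help to confirm that the binary is reachable before the first extraction. The following is a rough sketch of such a check; findExecutable() is a hypothetical helper and not part of spatie/pdf-to-text, and it only looks in the directories listed in the PATH environment variable.

```php
<?php
// Hypothetical helper for illustration; not part of spatie/pdf-to-text.

/**
 * Search the directories in PATH for an executable.
 * Returns the full path of the first match, or null if none is found.
 */
function findExecutable(string $name, ?string $pathEnv = null): ?string
{
    $pathEnv = $pathEnv ?? (getenv('PATH') ?: '');

    // Windows executables usually carry an extension; other systems use the bare name.
    $suffixes = PHP_OS_FAMILY === 'Windows' ? ['.exe', ''] : [''];

    foreach (explode(PATH_SEPARATOR, $pathEnv) as $dir) {
        if ($dir === '') {
            continue;
        }
        foreach ($suffixes as $suffix) {
            $candidate = rtrim($dir, '/\\') . DIRECTORY_SEPARATOR . $name . $suffix;
            if (is_file($candidate) && is_executable($candidate)) {
                return $candidate;
            }
        }
    }

    return null;
}

$binary = findExecutable('pdftotext');

echo $binary === null
    ? "pdftotext not found; install poppler-utils or pass a custom binary path.\n"
    : "pdftotext found at {$binary}\n";
```

If the binary lives outside PATH, its location can be passed to the Pdf constructor instead.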

Extract Text from PDF File using PHP API

The open source Spatie PDF-to-Text library is minimal by design: it focuses on one job, extracting text from PDFs, and does it well. Its simplicity makes it a good fit for custom software solutions such as invoice scanners, academic research tools, legal document processors, or content migration systems. The following example demonstrates how software developers can set the PDF file path and then extract all of its text content. They can also customize the path to the pdftotext binary if needed.

How to Extract Text from PDF File using PHP API?

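require 'vendor/autoload.php'; // Composer autoloader

use Spatie\PdfToText\Pdf;

// Extract all text using the default pdftotext binary
$text = (new Pdf())
    ->setPdf('example.pdf')
    ->text();

echo $text;

// Point the library at a pdftotext binary in a custom location
$text = (new Pdf('/custom/path/to/pdftotext'))
    ->setPdf('report.pdf')
    ->text();

echo $text;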
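In a real application the PDF may be missing or the binary may fail, so extraction is usually wrapped in some error handling. The sketch below shows one possible approach; extractTextSafely() is a hypothetical helper, the autoloader location is an assumption about the project layout, and the catch block deliberately catches any Throwable rather than naming specific exception classes.

```php
<?php
use Spatie\PdfToText\Pdf;

// Assumption: the package was installed with Composer next to this script.
$autoload = __DIR__ . '/vendor/autoload.php';
if (is_file($autoload)) {
    require $autoload;
}

/**
 * Hypothetical helper: return the text of a PDF, or null when the file
 * is unreadable or the extraction fails for any reason.
 */
function extractTextSafely(string $pdfPath, ?string $binaryPath = null): ?string
{
    // Fail early on missing or unreadable files instead of invoking the binary.
    if (!is_file($pdfPath) || !is_readable($pdfPath)) {
        return null;
    }

    try {
        $pdf = $binaryPath === null ? new Pdf() : new Pdf($binaryPath);

        return $pdf->setPdf($pdfPath)->text();
    } catch (\Throwable $e) {
        // Log the reason and let the caller decide how to proceed.
        error_log('PDF text extraction failed: ' . $e->getMessage());

        return null;
    }
}

$text = extractTextSafely('example.pdf');
echo $text ?? "No text could be extracted.\n";
```

Returning null keeps the calling code simple; an application that needs to distinguish between failure causes could rethrow the exception instead.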

Extract Specific Part of the PDF via PHP

The library exposes the options of the pdftotext command-line tool, so you can customize the text extraction process to suit your needs. For instance, you can maintain the layout of the original PDF, restrict extraction to a particular page range, or set the resolution. Options are passed without their leading hyphen. The following example shows how software developers first set the layout and r (resolution) options and then add the f option, which specifies the first page of the PDF to convert.

How to Set Various Options for Extracting PDF Text via PHP API?

use Spatie\PdfToText\Pdf;

$text = (new Pdf())
    ->setPdf('table.pdf')
    ->setOptions(['layout', 'r 96']) // keep the original layout; resolution of 96 DPI
    ->addOptions(['f 1'])            // start converting at page 1
    ->text();

echo $text;
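To extract only a specific part of a document, the f option can be combined with pdftotext's l option, which sets the last page to convert. The sketch below wraps this in two hypothetical helpers, pageRangeOptions() and extractPages(); neither is part of the library, and the code assumes the Composer autoloader has already been loaded before extractPages() is called.

```php
<?php
use Spatie\PdfToText\Pdf;

/**
 * Hypothetical helper: build the pdftotext options that select
 * pages $first through $last (1-based, inclusive).
 */
function pageRangeOptions(int $first, int $last): array
{
    if ($first < 1 || $last < $first) {
        throw new InvalidArgumentException('Expected 1 <= first <= last.');
    }

    // "f" is the first page to convert, "l" is the last page to convert.
    return ["f {$first}", "l {$last}"];
}

/**
 * Hypothetical helper: extract the text of a page range while keeping the layout.
 */
function extractPages(string $pdfPath, int $first, int $last): string
{
    return (new Pdf())
        ->setPdf($pdfPath)
        ->setOptions(['layout'])
        ->addOptions(pageRangeOptions($first, $last))
        ->text();
}
```

For example, extractPages('report.pdf', 2, 3) would ask pdftotext for pages 2 and 3 only. Validating the range up front avoids invoking the binary with page numbers it cannot use.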