Open Source Python Library to Process PDF Files

This free Python API allows linearizing PDFs and accessing encrypted PDFs. It supports creating PDFs from scratch, copying pages from one PDF to another, splitting or merging PDFs, and much more.

What is PikePDF Library?

PikePDF is a simple Python PDF library that lets software developers work with PDF files inside Python applications. It is based on QPDF, a powerful PDF manipulation and repair library. PikePDF is a PDF content transformation library that provides low-level access to PDF files, so users need some knowledge of PDF internals and familiarity with the PDF specification. The library is open source and is available under the MPL-2.0 license.

PikePDF supports linearizing PDFs and accessing encrypted PDFs. It includes a powerful set of PDF management features, such as creating PDFs from scratch, copying pages from one PDF to another, splitting or merging PDFs, extracting images or text, replacing content, repairing PDFs, configuring page settings, managing PDF metadata, working with password-protected files, editing XMP metadata, transforming existing PDFs, and many more.

At A Glance

An overview of PikePDF features.

Features Overview

Generate PDF
Copy PDF pages
Extract Images
PDF repairing
Extract text
Split PDFs
Merge PDFs
Rotating PDFs
Concatenating PDFs
Embedding hyperlinks
Insert circles
Add complex shapes
Replacing content
Data extraction
Text kerning
PDF form
Embedding images

PikePDF

PikePDF supports PDF file format as well as industry-standard formats for export.

Reader

Writer

TXT, HTML

Platform Independence

PikePDF is tested with Python 3.6 and higher.

Getting Started with PikePDF

PikePDF requires Python 3.6 or higher. You can install PikePDF using pip with the following command.

Install PikePDF via pip

 pip install pikepdf

Copy Pages from One PDF to Other via Python

The open source PikePDF library enables software developers to copy pages from one PDF to another with just a couple of lines of Python code. Copying pages between PDF objects creates a shallow copy of the source page within the target PDF file, so modifying the copied pages does not affect the original PDF document. It is also possible to replace specific pages with custom content and to copy pages within a single PDF.

How to Open & Manipulate PDF Documents via Python Library?

  # PDF Documents Manipulation
  from pikepdf import Pdf

  # Create a new, empty PDF
  new_pdf = Pdf.new()

  # Open an existing PDF and save a copy of it
  with Pdf.open('sample.pdf') as pdf:
      pdf.save('output.pdf')

  # Copying pages from other PDFs: append all pages of one PDF to another
  pdf = Pdf.open('../tests/resources/fourpages.pdf')
  appendix = Pdf.open('../tests/resources/sandwich.pdf')
  pdf.pages.extend(appendix.pages)

PDF Splitting & Merging via Python

The PikePDF library lets software developers open existing PDF files and split them into multiple PDF files with ease. Splitting only requires that each new PDF holds its destination pages. The library also transfers the data associated with each page, so that every page stands on its own. It also supports merging (concatenating) multiple PDF documents into a single one, and the order of the pages can be reversed with just a couple of lines of code.

How to Split & Merge PDF Documents via Python Library?

  # PDF Splitting: write each page to its own single-page file
  from pikepdf import Pdf

  pdf = Pdf.open('../tests/resources/fourpages.pdf')
  for n, page in enumerate(pdf.pages):
      dst = Pdf.new()
      dst.pages.append(page)
      dst.save(f'{n:02d}.pdf')

  # Combine multiple PDFs into a single one
  from glob import glob

  pdf = Pdf.new()
  for file in glob('*.pdf'):
      src = Pdf.open(file)
      pdf.pages.extend(src.pages)
  pdf.save('merged.pdf')

Manage Images inside PDF Document via Python

The PikePDF library makes it easy for software developers to handle images inside a PDF file using Python commands. The library includes several important image-handling functions, such as copying images within a PDF page, opening and viewing images, resizing images, manipulating images in a PDF, extracting images from a PDF, replacing images, deleting an image from a PDF, and many more.

How to Extract Image & Replace It in PDF via Python?

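  # Extract Image & Replace PDF Images
  import zlib
  from pikepdf import Pdf, PdfImage, Name

  pdf = Pdf.open('sample.pdf')
  page = pdf.pages[0]
  # Take the first image on the page (assumes the page contains at least one)
  name = list(page.images.keys())[0]
  pdfimage = PdfImage(page.images[name])
  rawimage = pdfimage.obj
  # Convert to a 32x32 greyscale image with Pillow
  pillowimage = pdfimage.as_pil_image()
  greyscale = pillowimage.convert('L')
  greyscale = greyscale.resize((32, 32))
  # Write the new pixel data back and update the image properties to match
  rawimage.write(zlib.compress(greyscale.tobytes()), filter=Name("/FlateDecode"))
  rawimage.ColorSpace = Name("/DeviceGray")
  rawimage.Width, rawimage.Height = 32, 32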

PDF Metadata Handling via Python

PDF metadata includes very useful information about a PDF document, such as the author's name, dates of creation and modification, keywords, copyright information, and so on. The PikePDF library includes complete functionality for accessing and reading metadata, extracting metadata, and deleting metadata entries from PDF documents. The following code example shows how to extract metadata from PDF documents.

How to Extract PDF Metadata via Python?

 # Extract PDF Metadata
  import pikepdf
  import sys

  # get the target pdf file from the command-line arguments
  pdf_filename = sys.argv[1]
  # read the pdf file
  pdf = pikepdf.Pdf.open(pdf_filename)
  docinfo = pdf.docinfo
  for key, value in docinfo.items():
    print(key, ":", value)