Open Source Python Library for Converting PDF Files

This free Python API lets developers export, rotate, merge, and concatenate PDF files, and extract data and elements from PDFs.

pdfrw is an open source, pure Python library that lets software developers read and write PDF files without installing any external software. The library is simple to use, and its source code is well documented and easy to understand. It includes proper Unicode support for text strings in PDFs, as well as what the project describes as the fastest pure Python PDF parser.

The pdfrw library supports several important PDF operations, such as merging PDFs, modifying metadata, concatenating multiple PDFs, extracting images, printing PDFs, rotating pages, creating new PDFs, adding a watermark image, and many more.


Getting Started with pdfrw

pdfrw supports Python 2.6, 2.7, 3.3, 3.4, 3.5, and 3.6. You can install it with pip using the following command.

Install pdfrw via pip

 python -m pip install pdfrw  

Create PDF Documents via Python Library

The pdfrw library lets software developers create PDF documents inside their own Python applications with just a couple of lines of code. It also supports accessing and modifying existing PDF files: you can insert new pages, graphics components, or text elements into an existing PDF. pdfrw can find the pages in the PDF files you read in and write a set of pages back out to a new PDF file.

Create & Alter PDF Documents via Python

  # Read an existing PDF, change its title, and write the result to a new file
  import sys
  import os
  
  from pdfrw import PdfReader, PdfWriter
  
  # Expect exactly one command-line argument: the input PDF path
  inpfn, = sys.argv[1:]
  outfn = 'alter.' + os.path.basename(inpfn)
  
  trailer = PdfReader(inpfn)
  trailer.Info.Title = 'My New Title Goes Here'
  PdfWriter(outfn, trailer=trailer).write()
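
The paragraph above also mentions writing a set of pages back out to a new PDF file. A minimal sketch of that idea follows. The helper names `pick_pages` and `write_subset` are illustrative and not part of pdfrw; the sketch assumes only `PdfReader(...).pages`, `PdfWriter.addpages`, and `PdfWriter.write`.

```python
def pick_pages(pages, page_numbers):
    """Return the pages at the given 1-based page numbers, skipping any out of range."""
    return [pages[n - 1] for n in page_numbers if 1 <= n <= len(pages)]


def write_subset(in_path, out_path, page_numbers):
    """Copy the selected pages of in_path into a new PDF at out_path."""
    # Imported here so pick_pages stays usable without pdfrw installed.
    from pdfrw import PdfReader, PdfWriter

    reader = PdfReader(in_path)
    writer = PdfWriter()
    writer.addpages(pick_pages(reader.pages, page_numbers))
    writer.write(out_path)
```

For example, `write_subset('input.pdf', 'selected.pdf', [3, 1])` would write page 3 followed by page 1 of a hypothetical `input.pdf`.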

Reading PDF Files via Python

The pdfrw library lets software developers access and read different parts of PDF documents inside Python applications, and gives easy access to the entire document. The library supports retrieving file information, size, and more. It creates a special attribute named pages, which lets users list all the pages of a PDF document. It also exposes the document information object, which you can use to pull out details such as the author and title.

Access & Read PDF Files via Python

  # Read a PDF and print basic information about it
  from pdfrw import PdfReader

  def get_pdf_info(path):
    pdf = PdfReader(path)

    print(pdf.keys())       # top-level trailer keys
    print(pdf.Info)         # document information dictionary
    print(pdf.Root.keys())  # keys of the document catalog
    print('PDF has {} pages'.format(len(pdf.pages)))

  if __name__ == '__main__':
    get_pdf_info('w9.pdf')
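
To pull out individual fields such as the author or title, the document information object can be turned into a plain dictionary. The following is a rough sketch, not the library's own approach: `summarize_info` and `print_pdf_info` are hypothetical helpers, and the sketch assumes that pdfrw string values offer a `decode()` method, falling back to `str()` for anything else.

```python
def summarize_info(info):
    """Map an Info-style dictionary ('/Title' -> value) to plain names and text."""
    summary = {}
    for key, value in (info or {}).items():
        name = str(key).lstrip('/')  # '/Title' -> 'Title'
        # Assumption: pdfrw string objects provide decode(); other values use str().
        text = value.decode() if hasattr(value, 'decode') else str(value)
        summary[name] = text
    return summary


def print_pdf_info(path):
    """Print each document information field of the PDF at path."""
    # Imported here so summarize_info stays usable without pdfrw installed.
    from pdfrw import PdfReader

    pdf = PdfReader(path)
    for name, text in sorted(summarize_info(pdf.Info).items()):
        print('{}: {}'.format(name, text))
```

A PDF without a document information object yields an empty dictionary, so `print_pdf_info` prints nothing in that case.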

Adding or Modifying Metadata

pdfrw lets software developers add or modify the metadata of PDF files inside their own Python applications. You can alter a single metadata item in a PDF and write the result to a new PDF, or concatenate multiple files and add metadata to the combined output PDF file.

Modify PDF Metadata via Python

  # Change the Title metadata of a PDF and write the result to a new file
  import sys
  import os

  from pdfrw import PdfReader, PdfWriter

  inpfn, = sys.argv[1:]
  outfn = 'alter.' + os.path.basename(inpfn)

  trailer = PdfReader(inpfn)
  trailer.Info.Title = 'My New Title Goes Here'
  PdfWriter(outfn, trailer=trailer).write() 
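
The second case described above, concatenating several files and attaching metadata to the output, might look like the following sketch. It is modeled on the pattern used in pdfrw's own example scripts (`addpages` followed by assigning `writer.trailer.Info`). The helper names `build_metadata` and `concatenate` and the choice of metadata fields are assumptions made for illustration.

```python
def build_metadata(title=None, author=None, subject=None):
    """Return only the metadata fields that were actually supplied."""
    fields = {'Title': title, 'Author': author, 'Subject': subject}
    return {name: value for name, value in fields.items() if value}


def concatenate(input_paths, out_path, **metadata):
    """Append the pages of every input PDF to one output PDF with new metadata."""
    # Imported here so build_metadata stays usable without pdfrw installed.
    from pdfrw import PdfReader, PdfWriter, IndirectPdfDict

    writer = PdfWriter()
    for path in input_paths:
        writer.addpages(PdfReader(path).pages)

    # Set the Info dictionary after all pages have been added.
    writer.trailer.Info = IndirectPdfDict(**build_metadata(**metadata))
    writer.write(out_path)
```

For example, `concatenate(['a.pdf', 'b.pdf'], 'combined.pdf', title='Combined report')` would merge two hypothetical input files and set only the title.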

Splitting PDF Documents

pdfrw lets software developers split PDF documents programmatically inside their applications. A user may need to extract a specific part of a PDF book, or divide it into multiple PDFs instead of storing everything in one file. This is easy with pdfrw: you provide the input PDF file path, the number of pages to extract, and the output path.

 

Split PDF File to Multiple PDFs via Python

  # Copy the first number_of_pages pages of a PDF into a new file
  from pdfrw import PdfReader, PdfWriter

  def split(path, number_of_pages, output):
    pdf_obj = PdfReader(path)
    total_pages = len(pdf_obj.pages)

    writer = PdfWriter()

    # Stop at the last page if the file has fewer pages than requested
    for page in range(min(number_of_pages, total_pages)):
      writer.addpage(pdf_obj.pages[page])

    writer.write(output)

  if __name__ == '__main__':
    split('reportlab-sample.pdf', 10, 'subset.pdf')
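
The example above extracts one leading range of pages into a single file. Dividing a document into several PDFs of a fixed size could be sketched as follows. `chunk_ranges` and `split_into_files` are hypothetical helpers, and the output naming scheme (`prefix` plus a running number) is an assumption.

```python
def chunk_ranges(total_pages, pages_per_file):
    """Return (start, stop) index pairs that cover total_pages in fixed-size chunks."""
    if pages_per_file < 1:
        raise ValueError('pages_per_file must be at least 1')
    return [(start, min(start + pages_per_file, total_pages))
            for start in range(0, total_pages, pages_per_file)]


def split_into_files(path, pages_per_file, prefix):
    """Write consecutive chunks of the PDF at path to prefix1.pdf, prefix2.pdf, ..."""
    # Imported here so chunk_ranges stays usable without pdfrw installed.
    from pdfrw import PdfReader, PdfWriter

    pages = PdfReader(path).pages
    outputs = []
    for number, (start, stop) in enumerate(chunk_ranges(len(pages), pages_per_file), 1):
        out_path = '{}{}.pdf'.format(prefix, number)
        writer = PdfWriter()
        writer.addpages(pages[start:stop])
        writer.write(out_path)
        outputs.append(out_path)
    return outputs
```

The last chunk is simply shorter when the page count is not a multiple of `pages_per_file`.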