Create, Edit & Convert PDF to Images via Python API

Free Python API for creating and editing PDFs and for converting PDFs to images and vice versa. Merge and split PDFs, convert PDF to text, and rotate, trim, or crop PDFs.

PDFsuite is a Python library that provides complete functionality for creating and managing PDF documents. It is stable and runs on macOS (OS X), where it uses the system's Quartz and PDFKit frameworks through pyobjc. PDFsuite is easy to use and covers nearly every common PDF task with a short script.

The library is feature-rich and covers the most important PDF handling tasks, such as converting documents to PDF, merging or splitting PDF documents, converting images to PDF, rotating PDFs, adding page numbers, inserting watermarks, drawing graphics, applying Quartz filters, reading metadata, exporting PDF pages as images, counting pages, and more. The PDFsuite scripts are open source, completely free for use in any kind of project, and can be adapted in any way.

At A Glance

An overview of PDFsuite features.

Features Overview

Create PDF
Trim PDF Pages
PDF to Images
PDF to text
Image to PDF
Split PDFs
Merge PDFs
PDF rotation
Draw graphics
Add shapes
Rotate PDFs
Font embedding
Encrypt PDF
Embedding images
PDF Pages as images
Read metadata

PDFsuite

PDFsuite reads the PDF file format and exports to several industry-standard formats.

Reader: PDF
Writer: TXT, HTML, PNG, JPEG, TIFF


Platform Requirements

PDFsuite runs on macOS and is tested with Python 3.0 and higher.



Getting Started with PDFsuite

To run PDFsuite, first install Python 3. Then install the pyobjc library by typing the following command in Terminal and pressing Return.

Install pyobjc via pip

 pip3 install pyobjc

The PDFsuite scripts themselves can be installed manually: download the latest release files directly from the project's GitHub repository.

Convert PDF to Image & Other Formats

PDFsuite includes complete functionality for converting PDF documents to popular image file formats such as PNG, JPEG, and TIFF. It can render each page of a PDF document as a bitmap image, and each page image is then saved to disk under its own file name. Resolution, transparency, and other parameters can be adjusted. PDF files can also be converted to text and other file formats.
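The page-to-image conversion described above can be sketched with pyobjc's Quartz bindings: draw each page into a bitmap context, then write the bitmap with ImageIO. This is a minimal sketch, not PDFsuite's own script; the function names `page_image_name`, `pixel_size` and `export_pages_as_png`, the `"<name> page N.png"` naming scheme, and the white default background are assumptions for illustration. The Quartz imports sit inside the export function so the two small helpers also work on machines without pyobjc.

```python
import os


def page_image_name(pdf_path, page_num, ext="png"):
    """Hypothetical naming scheme: 'report.pdf', page 3 -> 'report page 3.png'."""
    return f"{os.path.splitext(pdf_path)[0]} page {page_num}.{ext}"


def pixel_size(width_pt, height_pt, dpi):
    """PDF units are points (1/72 inch), so pixels = points * dpi / 72."""
    return round(width_pt * dpi / 72), round(height_pt * dpi / 72)


def export_pages_as_png(pdf_path, dpi=150, transparent=False):
    """Render every page of pdf_path to its own PNG file (macOS + pyobjc only)."""
    # Imported here so the pure helpers above also work without pyobjc.
    import Quartz
    from Foundation import NSURL

    doc = Quartz.CGPDFDocumentCreateWithURL(NSURL.fileURLWithPath_(pdf_path))
    if doc is None:
        raise ValueError(f"cannot open {pdf_path}")
    color_space = Quartz.CGColorSpaceCreateDeviceRGB()
    scale = dpi / 72
    written = []
    for page_num in range(1, Quartz.CGPDFDocumentGetNumberOfPages(doc) + 1):
        page = Quartz.CGPDFDocumentGetPage(doc, page_num)
        box = Quartz.CGPDFPageGetBoxRect(page, Quartz.kCGPDFMediaBox)
        width, height = pixel_size(box.size.width, box.size.height, dpi)
        # bytesPerRow=0 lets Core Graphics pick the row stride itself.
        ctx = Quartz.CGBitmapContextCreate(
            None, width, height, 8, 0, color_space,
            Quartz.kCGImageAlphaPremultipliedLast)
        if not transparent:
            # Fill with white; otherwise unpainted areas stay transparent.
            Quartz.CGContextSetRGBFillColor(ctx, 1.0, 1.0, 1.0, 1.0)
            Quartz.CGContextFillRect(ctx, Quartz.CGRectMake(0, 0, width, height))
        # Map PDF points to pixels, then move the media box origin to (0, 0).
        Quartz.CGContextScaleCTM(ctx, scale, scale)
        Quartz.CGContextTranslateCTM(ctx, -box.origin.x, -box.origin.y)
        Quartz.CGContextDrawPDFPage(ctx, page)
        image = Quartz.CGBitmapContextCreateImage(ctx)
        out_path = page_image_name(pdf_path, page_num)
        dest = Quartz.CGImageDestinationCreateWithURL(
            NSURL.fileURLWithPath_(out_path), "public.png", 1, None)
        Quartz.CGImageDestinationAddImage(dest, image, None)
        Quartz.CGImageDestinationFinalize(dest)
        written.append(out_path)
    return written
```

Because PDF units are points, the `dpi` argument sets the output resolution: a 612 × 792 pt US Letter page at 144 dpi becomes a 1224 × 1584 pixel image. The sketch draws the media box as-is and does not apply any rotation stored on the page.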

Combine Multiple PDF Files using Python Scripts

Have you ever needed to combine several PDF documents into a new PDF file? Organizations often need to merge multiple PDF files into a single document. PDFsuite lets users combine multiple PDF documents into one with just a couple of lines of Python code, and it adds a table of contents entry for each component file. The library also supports splitting large PDF documents into smaller ones inside Python apps.

Merge (Overlay) a Watermark PDF onto Every Page via Python API

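import os
import sys

import Quartz
from Foundation import NSURL
from Quartz import PDFDocument

def createPDFDocumentWithPath(path):
	# NSURL is toll-free bridged to CFURL, so Core Graphics accepts it directly.
	return Quartz.CGPDFDocumentCreateWithURL(NSURL.fileURLWithPath_(path))

def createOutputContextWithPath(path, metaDict):
	# mediaBox=None: each page supplies its own box in CGContextBeginPage.
	return Quartz.CGPDFContextCreateWithURL(NSURL.fileURLWithPath_(path), None, metaDict)

def getDocInfo(filename):
	# Copy basic metadata, renaming PDFKit keys to the keys Core Graphics expects.
	pdfDoc = PDFDocument.alloc().initWithURL_(NSURL.fileURLWithPath_(filename))
	if pdfDoc is None:
		return None
	attrs = pdfDoc.documentAttributes() or {}
	keyMap = {
		Quartz.PDFDocumentTitleAttribute: Quartz.kCGPDFContextTitle,
		Quartz.PDFDocumentAuthorAttribute: Quartz.kCGPDFContextAuthor,
		Quartz.PDFDocumentSubjectAttribute: Quartz.kCGPDFContextSubject,
		Quartz.PDFDocumentCreatorAttribute: Quartz.kCGPDFContextCreator,
	}
	return {cgKey: attrs[pdfKey] for pdfKey, cgKey in keyMap.items() if pdfKey in attrs}

def merge(filename, watermark):
	# Draw page 1 of `watermark` on top of every page of `filename`.
	shortName = os.path.splitext(filename)[0]
	outFilename = shortName + "+wm.pdf"
	metaDict = getDocInfo(filename)

	writeContext = createOutputContextWithPath(outFilename, metaDict)
	readPDF = createPDFDocumentWithPath(filename)
	mergePDF = createPDFDocumentWithPath(watermark)

	if writeContext is None or readPDF is None or mergePDF is None:
		print("A valid input file, watermark file and output file must be supplied.")
		sys.exit(1)

	# The overlay page is the same for every page, so fetch it once.
	mergepage = Quartz.CGPDFDocumentGetPage(mergePDF, 1)
	numPages = Quartz.CGPDFDocumentGetNumberOfPages(readPDF)
	for pageNum in range(1, numPages + 1):  # Core Graphics pages are 1-based
		page = Quartz.CGPDFDocumentGetPage(readPDF, pageNum)
		if page:
			mediaBox = Quartz.CGPDFPageGetBoxRect(page, Quartz.kCGPDFMediaBox)
			if Quartz.CGRectIsEmpty(mediaBox):
				mediaBox = None
			Quartz.CGContextBeginPage(writeContext, mediaBox)
			Quartz.CGContextSetBlendMode(writeContext, Quartz.kCGBlendModeOverlay)
			Quartz.CGContextDrawPDFPage(writeContext, page)
			Quartz.CGContextDrawPDFPage(writeContext, mergepage)
			Quartz.CGContextEndPage(writeContext)
	# Closing the context finishes writing the output file.
	Quartz.CGPDFContextClose(writeContext)
	del writeContext

if __name__ == "__main__":
	# Usage: python3 <script> overlay.pdf input1.pdf [input2.pdf ...]
	watermark = sys.argv[1]
	for filename in sys.argv[2:]:
		merge(filename, watermark)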
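The watermark script overlays one page onto another; concatenating whole files and splitting them, as described at the start of this section, can instead be sketched with PDFKit's `PDFDocument` (`insertPage_atIndex_`, `pageAtIndex_`, `writeToFile_`). This is a hypothetical sketch rather than PDFsuite's own code: `combine_pdfs`, `split_pdf`, `split_ranges` and the `"<name> part N.pdf"` output names are illustrative, and unlike the library's merge it does not add table of contents entries.

```python
import os


def split_ranges(page_count, pages_per_file):
    """Zero-based (start, stop) page ranges, stop exclusive, covering all pages."""
    if pages_per_file < 1:
        raise ValueError("pages_per_file must be at least 1")
    return [(start, min(start + pages_per_file, page_count))
            for start in range(0, page_count, pages_per_file)]


def _open_pdf(path):
    # Imported lazily so split_ranges also works without pyobjc.
    from Foundation import NSURL
    from Quartz import PDFDocument

    doc = PDFDocument.alloc().initWithURL_(NSURL.fileURLWithPath_(path))
    if doc is None:
        raise ValueError(f"cannot open {path}")
    return doc


def _append_pages(target, source, start, stop):
    # Copy each page so the source document keeps its own page objects.
    for i in range(start, stop):
        target.insertPage_atIndex_(source.pageAtIndex_(i).copy(), target.pageCount())


def combine_pdfs(paths, out_path):
    """Append every page of each input file, in order, to one new PDF."""
    from Quartz import PDFDocument

    combined = PDFDocument.alloc().init()
    for path in paths:
        source = _open_pdf(path)
        _append_pages(combined, source, 0, source.pageCount())
    return combined.writeToFile_(out_path)


def split_pdf(path, pages_per_file):
    """Write 'name part 1.pdf', 'name part 2.pdf', ... with pages_per_file pages each."""
    from Quartz import PDFDocument

    source = _open_pdf(path)
    short_name = os.path.splitext(path)[0]
    ranges = split_ranges(source.pageCount(), pages_per_file)
    for part, (start, stop) in enumerate(ranges, start=1):
        chunk = PDFDocument.alloc().init()
        _append_pages(chunk, source, start, stop)
        chunk.writeToFile_(f"{short_name} part {part}.pdf")
```

For example, splitting a 10-page file with `pages_per_file=4` produces three files holding pages 1 to 4, 5 to 8, and 9 to 10.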

Convert PDF Files into Text File via Python

The open source PDFsuite library supports converting PDF documents to several file formats. One important feature is extracting the text content of a PDF file into an external text file saved to the location of your choice. It can also save the text of each page as a separate file with its own name.

Convert PDF Documents to Text File via Python API

import os
import sys

# NSURL lives in Foundation (not CoreFoundation); PDFKit classes are exposed via Quartz.
from Foundation import NSURL
from Quartz import PDFDocument

def main():
	for filename in sys.argv[1:]:
		shortName = os.path.splitext(filename)[0]
		outputfile = shortName + " text.txt"
		pdfURL = NSURL.fileURLWithPath_(filename)
		pdfDoc = PDFDocument.alloc().initWithURL_(pdfURL)
		if pdfDoc:
			# string() returns None when the PDF has no extractable text (e.g. scans).
			text = pdfDoc.string() or ""
			with open(outputfile, "w", encoding="utf-8") as f:
				f.write(text)

if __name__ == "__main__":
	main()
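To save the text of each page as a separate file, the same PDFKit approach can be applied page by page through `PDFPage.string()`. A minimal sketch, assuming the hypothetical names `export_page_texts` and `page_text_name` and a `"<name> page N.txt"` naming scheme; none of these come from PDFsuite itself.

```python
import os


def page_text_name(pdf_path, page_num):
    """Hypothetical scheme: 'report.pdf', page 2 -> 'report page 2.txt'."""
    return f"{os.path.splitext(pdf_path)[0]} page {page_num}.txt"


def export_page_texts(pdf_path):
    """Write the text of each page to its own UTF-8 file; return the paths."""
    # Imported here so page_text_name can be used without pyobjc.
    from Foundation import NSURL
    from Quartz import PDFDocument

    doc = PDFDocument.alloc().initWithURL_(NSURL.fileURLWithPath_(pdf_path))
    if doc is None:
        raise ValueError(f"cannot open {pdf_path}")
    written = []
    for i in range(doc.pageCount()):
        # PDFPage.string() is None when a page has no text layer.
        text = doc.pageAtIndex_(i).string() or ""
        out_path = page_text_name(pdf_path, i + 1)  # 1-based numbers for readers
        with open(out_path, "w", encoding="utf-8") as f:
            f.write(text)
        written.append(out_path)
    return written
```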

Rotate, Trim, Crop PDFs or Pages in Python Apps

PDFsuite includes several functions for handling PDF files. It allows programmers to rotate, trim, crop, tint, watermark, scale, and rinse PDF documents inside their own Python applications. There are two ways to rotate a page or a complete file. The first is to create a new PDF context, graphically transform each page of the original, and save the result. The second is simply to adjust the 'rotation' property of each page.

How to Trim PDF Files via Python API

import os
import sys

# NSURL lives in Foundation (not CoreFoundation); PDFKit classes are exposed via Quartz.
from Foundation import NSURL
from Quartz import PDFDocument, kPDFDisplayBoxMediaBox, kPDFDisplayBoxTrimBox, CGRectEqualToRect

mediabox = kPDFDisplayBoxMediaBox
trimbox = kPDFDisplayBoxTrimBox

def trimPDF(filename):
	# Shrink each page's media box to its trim box wherever the two differ.
	hasBeenChanged = False
	shortName = os.path.splitext(filename)[0]
	outFilename = shortName + " TPS.pdf"
	pdfURL = NSURL.fileURLWithPath_(filename)
	pdfDoc = PDFDocument.alloc().initWithURL_(pdfURL)
	if pdfDoc:
		for p in range(pdfDoc.pageCount()):
			page = pdfDoc.pageAtIndex_(p)
			mediaBoxSize = page.boundsForBox_(mediabox)
			trimBoxSize = page.boundsForBox_(trimbox)
			if not CGRectEqualToRect(mediaBoxSize, trimBoxSize):
				page.setBounds_forBox_(trimBoxSize, mediabox)
				hasBeenChanged = True
		# Only write a new file when at least one page was trimmed.
		if hasBeenChanged:
			pdfDoc.writeToFile_(outFilename)

if __name__ == '__main__':
	for filename in sys.argv[1:]:
		trimPDF(filename)
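The second rotation method described above, changing each page's rotation property, can be sketched with PDFKit's `PDFPage.rotation()` and `setRotation_()`. The function names `new_rotation` and `rotate_pdf` and the `" rotated.pdf"` suffix are assumptions for illustration, not part of PDFsuite. The PDF format only allows page rotations in multiples of 90 degrees, so the helper checks that before normalising the result to 0, 90, 180, or 270.

```python
import os


def new_rotation(current, delta):
    """Add delta degrees to a page rotation and normalise to 0..270."""
    if current % 90 or delta % 90:
        raise ValueError("rotation must be a multiple of 90 degrees")
    return (current + delta) % 360


def rotate_pdf(pdf_path, delta=90):
    """Rotate every page by delta degrees and save a copy; return its path."""
    # Imported here so new_rotation can be used without pyobjc.
    from Foundation import NSURL
    from Quartz import PDFDocument

    doc = PDFDocument.alloc().initWithURL_(NSURL.fileURLWithPath_(pdf_path))
    if doc is None:
        raise ValueError(f"cannot open {pdf_path}")
    for i in range(doc.pageCount()):
        page = doc.pageAtIndex_(i)
        # Only the page's rotation attribute changes; the content is not redrawn.
        page.setRotation_(new_rotation(page.rotation(), delta))
    out_path = os.path.splitext(pdf_path)[0] + " rotated.pdf"
    doc.writeToFile_(out_path)
    return out_path
```

Because only the stored rotation changes, this approach is fast and leaves page content untouched; the first method, redrawing into a new context, is the one to use when the content itself must be transformed.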