Create, Edit & Convert PDF to Images via Python API
Free Python API for creating, editing, and converting PDFs to images and vice versa. Merge, split, and convert PDFs to text; rotate, trim, and crop PDFs.
PDFsuite is an open source Python library for creating and managing PDF documents. It is stable and runs on macOS (OS X). It is easy to use and puts most common PDF tasks within easy reach.
The library covers a wide range of PDF handling tasks, such as converting documents to PDF, merging or splitting PDF documents, converting images to PDF, rotating pages, adding page numbers, inserting watermarks, drawing graphics, applying Quartz filters, reading metadata, exporting PDF pages as images, counting pages, and more. The PDFsuite scripts are open source, free to use in any kind of project, and can be adapted in any way.
Getting Started with PDFsuite
To run PDFsuite, first install Python 3 and the pyobjc library. To install pyobjc, type the following command in the Terminal and press Return.
Install pyobjc via pip
pip3 install pyobjc
You can also install the library manually by downloading the latest release files directly from the GitHub repository.
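To confirm that the installation worked, you can check that the modules the scripts import are visible to your interpreter. The following is a minimal sketch and not part of PDFsuite; `REQUIRED_MODULES` and `missing_modules` are hypothetical names chosen for illustration.

```python
import importlib.util

# Top-level modules imported by the PDFsuite examples in this article.
REQUIRED_MODULES = ["Quartz", "CoreFoundation"]


def missing_modules(names):
    """Return the names from `names` that this interpreter cannot locate."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
        print("Try: pip3 install pyobjc")
    else:
        print("pyobjc modules found; the example scripts should import.")
```

The check only locates the modules without importing them, so it is quick to run.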
Convert PDF to Image & Other Formats
PDFsuite can convert PDF documents to various image file formats, including PNG, JPEG, TIFF, and other popular formats. It can also create a bitmap image from each page of a PDF document. Each page image must be given its own file name when it is saved to disk. Resolution, transparency, and other parameters can be adjusted. PDF files can also be converted to text and other file formats.
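The per-page export described above can be sketched as follows. This is an illustration under stated assumptions, not PDFsuite's own script: it goes through AppKit's `NSImage` and `NSBitmapImageRep` rather than a hand-built bitmap context, it renders at the image's default size, and `page_image_path` and `export_pages_as_png` are hypothetical names.

```python
import os


def page_image_path(pdf_path, page_index, extension="png"):
    """Build a distinct output name for one page (page_index is zero-based)."""
    stem = os.path.splitext(pdf_path)[0]
    return "{} p{:03d}.{}".format(stem, page_index + 1, extension)


def export_pages_as_png(pdf_path):
    """Write each page of pdf_path as a PNG next to the source file."""
    # Imported here so the naming helper above works without pyobjc.
    from AppKit import NSBitmapImageRep, NSImage, NSPNGFileType
    from Foundation import NSURL
    from Quartz import PDFDocument

    pdf_doc = PDFDocument.alloc().initWithURL_(NSURL.fileURLWithPath_(pdf_path))
    if pdf_doc is None:
        return []
    written = []
    for index in range(pdf_doc.pageCount()):
        page = pdf_doc.pageAtIndex_(index)
        # Load the single-page PDF data as an image, then re-encode it as PNG.
        image = NSImage.alloc().initWithData_(page.dataRepresentation())
        bitmap = NSBitmapImageRep.imageRepWithData_(image.TIFFRepresentation())
        png_data = bitmap.representationUsingType_properties_(NSPNGFileType, {})
        out_path = page_image_path(pdf_path, index)
        if png_data.writeToFile_atomically_(out_path, True):
            written.append(out_path)
    return written
```

Zero-padding the page number keeps the exported files in page order when they are sorted by name.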
Combine Multiple PDF Files using Python Scripts
Have you ever needed to combine several PDF documents into a new PDF file? Organizations often need to merge multiple PDF files into a single document. PDFsuite lets you do this with just a couple of lines of Python code, and it adds a table of contents entry for each component file. The library also supports splitting large PDF documents into smaller ones inside Python apps.
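As a rough illustration of combining and splitting, the sketch below uses PDFKit's `PDFDocument` through pyobjc. It is not PDFsuite's own implementation and does not add table of contents entries; `chunk_ranges`, `merge_pdfs`, and `split_pdf` are hypothetical names.

```python
import os


def chunk_ranges(page_count, pages_per_file):
    """Return zero-based (start, stop) pairs, stop exclusive, covering all pages."""
    if pages_per_file < 1:
        raise ValueError("pages_per_file must be at least 1")
    return [(start, min(start + pages_per_file, page_count))
            for start in range(0, page_count, pages_per_file)]


def _open_pdf(path):
    # Imported lazily so chunk_ranges can be used without pyobjc installed.
    from Foundation import NSURL
    from Quartz import PDFDocument
    return PDFDocument.alloc().initWithURL_(NSURL.fileURLWithPath_(path))


def merge_pdfs(input_paths, output_path):
    """Append the pages of every readable input, in order, to one new PDF."""
    from Quartz import PDFDocument
    merged = PDFDocument.alloc().init()
    for path in input_paths:
        source = _open_pdf(path)
        if source is None:
            continue  # skip files that cannot be opened as PDFs
        for index in range(source.pageCount()):
            merged.insertPage_atIndex_(source.pageAtIndex_(index), merged.pageCount())
    return merged.writeToFile_(output_path)


def split_pdf(input_path, pages_per_file):
    """Write consecutive groups of pages to separate, numbered PDF files."""
    from Quartz import PDFDocument
    source = _open_pdf(input_path)
    if source is None:
        return []
    stem = os.path.splitext(input_path)[0]
    outputs = []
    ranges = chunk_ranges(source.pageCount(), pages_per_file)
    for number, (start, stop) in enumerate(ranges, 1):
        part = PDFDocument.alloc().init()
        for index in range(start, stop):
            part.insertPage_atIndex_(source.pageAtIndex_(index), part.pageCount())
        out_path = "{} part{}.pdf".format(stem, number)
        if part.writeToFile_(out_path):
            outputs.append(out_path)
    return outputs
```

For example, `merge_pdfs(["a.pdf", "b.pdf"], "combined.pdf")` would write the pages of `a.pdf` followed by those of `b.pdf`, and `split_pdf("combined.pdf", 10)` would write files of at most ten pages each.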
Merge Multiple PDF Files via Python API
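import os
import sys

import Quartz
from CoreFoundation import NSURL

# Assumption: placeholder path of the PDF whose first page is laid over
# every page of the input; change it to suit your setup.
watermark = os.path.expanduser("~/Desktop/WM.pdf")

# The three helpers below are minimal stand-ins (assumptions) for functions
# that merge() calls but that are not shown in this excerpt.
def createPDFDocumentWithPath(path):
    return Quartz.CGPDFDocumentCreateWithURL(NSURL.fileURLWithPath_(path))

def createOutputContextWithPath(path, metaDict):
    return Quartz.CGPDFContextCreateWithURL(NSURL.fileURLWithPath_(path), None, metaDict)

def getDocInfo(filename):
    pdfDoc = Quartz.PDFDocument.alloc().initWithURL_(NSURL.fileURLWithPath_(filename))
    return pdfDoc.documentAttributes() if pdfDoc else None

def merge(filename):
    # "Merge" here means drawing page 1 of the watermark PDF over each page.
    shortName = os.path.splitext(filename)[0]
    outFilename = shortName + "+wm.pdf"
    metaDict = getDocInfo(filename)
    writeContext = createOutputContextWithPath(outFilename, metaDict)
    readPDF = createPDFDocumentWithPath(filename)
    mergePDF = createPDFDocumentWithPath(watermark)
    if writeContext is not None and readPDF is not None and mergePDF is not None:
        numPages = Quartz.CGPDFDocumentGetNumberOfPages(readPDF)
        # Core Graphics numbers PDF pages from 1.
        for pageNum in range(1, numPages + 1):
            page = Quartz.CGPDFDocumentGetPage(readPDF, pageNum)
            mergepage = Quartz.CGPDFDocumentGetPage(mergePDF, 1)
            if page:
                mediaBox = Quartz.CGPDFPageGetBoxRect(page, Quartz.kCGPDFMediaBox)
                if Quartz.CGRectIsEmpty(mediaBox):
                    mediaBox = None
                Quartz.CGContextBeginPage(writeContext, mediaBox)
                Quartz.CGContextSetBlendMode(writeContext, Quartz.kCGBlendModeOverlay)
                Quartz.CGContextDrawPDFPage(writeContext, page)
                Quartz.CGContextDrawPDFPage(writeContext, mergepage)
                Quartz.CGContextEndPage(writeContext)
        Quartz.CGPDFContextClose(writeContext)
        del writeContext
    else:
        print("A valid input file, watermark file and output file must be supplied.")
        sys.exit(1)

if __name__ == "__main__":
    for filename in sys.argv[1:]:
        merge(filename)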
Convert PDF Files into Text File via Python
The open source PDFsuite library includes several features for converting PDF documents to other supported file formats. One of them extracts the text content of a PDF file into an external text file and saves it to a location of your choice. It is also possible to save each page of a PDF document as a separate file with its own name.
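Saving each page to its own file can be sketched as follows, using the `string()` method of PDFKit's `PDFPage`. This is an illustration only and is not taken from PDFsuite; `page_text_path` and `export_page_texts` are hypothetical names.

```python
import os


def page_text_path(pdf_path, page_index):
    """Build a per-page output name; page_index is zero-based."""
    stem = os.path.splitext(pdf_path)[0]
    return "{} page {}.txt".format(stem, page_index + 1)


def export_page_texts(pdf_path):
    """Write the text of each page of pdf_path to its own UTF-8 file."""
    # Imported here so the naming helper above works without pyobjc.
    from Foundation import NSURL
    from Quartz import PDFDocument

    pdf_doc = PDFDocument.alloc().initWithURL_(NSURL.fileURLWithPath_(pdf_path))
    if pdf_doc is None:
        return []
    written = []
    for index in range(pdf_doc.pageCount()):
        # string() can return None for a page without extractable text.
        text = pdf_doc.pageAtIndex_(index).string() or ""
        out_path = page_text_path(pdf_path, index)
        with open(out_path, "w", encoding="utf-8") as handle:
            handle.write(str(text))
        written.append(out_path)
    return written
```

Pages that contain only scanned images produce empty files here, because no text layer is available to extract.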
Convert PDF Documents to Text File via Python API
import os
import sys

from Quartz import PDFDocument
from CoreFoundation import (NSURL, NSString)

# Can't seem to import this constant, so manually creating it.
NSUTF8StringEncoding = 4

def main():
    for filename in sys.argv[1:]:
        # Write the text next to the source file, e.g. "report text.txt".
        shortName = os.path.splitext(filename)[0]
        outputfile = shortName + " text.txt"
        pdfURL = NSURL.fileURLWithPath_(filename)
        pdfDoc = PDFDocument.alloc().initWithURL_(pdfURL)
        # initWithURL_ returns None when the file cannot be opened as a PDF.
        if pdfDoc:
            pdfString = NSString.stringWithString_(pdfDoc.string())
            pdfString.writeToFile_atomically_encoding_error_(outputfile, True, NSUTF8StringEncoding, None)

if __name__ == "__main__":
    main()
Rotate, Trim, Crop PDFs or Pages in Python Apps
The PDFsuite library includes several functions for handling PDF files. It allows programmers to rotate, trim, crop, tint, watermark, scale, and rinse PDF documents inside their own Python applications. It provides two ways to rotate a PDF page or a complete file. The first is to create a new PDF context, graphically transform each page of the original, and save the file. The second is simply to adjust the 'rotation' parameter on each page.
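The second approach, adjusting each page's rotation value, can be sketched with the `rotation()` and `setRotation_()` methods of PDFKit's `PDFPage`. The code below is a minimal illustration and not PDFsuite's own script; `normalized_rotation` and `rotate_pdf` are hypothetical names.

```python
def normalized_rotation(current, delta):
    """Add delta to current and fold the result into 0, 90, 180 or 270."""
    if delta % 90 != 0:
        raise ValueError("rotation must change in steps of 90 degrees")
    return (current + delta) % 360


def rotate_pdf(input_path, output_path, delta=90):
    """Rotate every page of input_path by delta degrees and save a copy."""
    # Imported here so normalized_rotation works without pyobjc installed.
    from Foundation import NSURL
    from Quartz import PDFDocument

    pdf_doc = PDFDocument.alloc().initWithURL_(NSURL.fileURLWithPath_(input_path))
    if pdf_doc is None:
        return False
    for index in range(pdf_doc.pageCount()):
        page = pdf_doc.pageAtIndex_(index)
        page.setRotation_(normalized_rotation(page.rotation(), delta))
    return pdf_doc.writeToFile_(output_path)
```

Because only the rotation attribute changes, the page content itself is not redrawn, which is what distinguishes this approach from transforming each page in a new PDF context.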
How to Trim PDF Files via Python API
import sys
import os

from Quartz import PDFDocument, kPDFDisplayBoxMediaBox, kPDFDisplayBoxTrimBox, CGRectEqualToRect
from CoreFoundation import NSURL

mediabox = kPDFDisplayBoxMediaBox
trimbox = kPDFDisplayBoxTrimBox

def trimPDF(filename):
    # Shrink each page's media box to its trim box and save a copy.
    hasBeenChanged = False
    shortName = os.path.splitext(filename)[0]
    outFilename = shortName + " TPS.pdf"
    pdfURL = NSURL.fileURLWithPath_(filename)
    pdfDoc = PDFDocument.alloc().initWithURL_(pdfURL)
    if pdfDoc:
        pages = pdfDoc.pageCount()
        # PDFKit page indexes start at 0.
        for p in range(0, pages):
            page = pdfDoc.pageAtIndex_(p)
            mediaBoxSize = page.boundsForBox_(mediabox)
            trimBoxSize = page.boundsForBox_(trimbox)
            if not CGRectEqualToRect(mediaBoxSize, trimBoxSize):
                page.setBounds_forBox_(trimBoxSize, mediabox)
                hasBeenChanged = True
        # Only write a new file if at least one page was actually trimmed.
        if hasBeenChanged:
            pdfDoc.writeToFile_(outFilename)

if __name__ == '__main__':
    for filename in sys.argv[1:]:
        trimPDF(filename)