Develop Apps to Work with PDFs via Python Library
Open source Python API for splitting, merging, cropping, and transforming the pages of PDF files, and for adding custom data and passwords to PDFs.
PyPDF2 is an open source, pure-Python library for working with PDF files inside Python applications without any external dependencies. It supports many important PDF features, such as merging multiple PDF files, extracting the content of a PDF file, rotating pages by a given angle, scaling and transforming pages, extracting images from pages, and more.
PyPDF2 is easy to use, and its source code is well documented and easy to understand. The library lets developers read and extract PDF file metadata such as the number of pages, author, creator, and creation and last-updated times. It also supports encrypting and decrypting PDF files with just a couple of lines of Python code.
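As a rough sketch of the encryption and decryption support mentioned above, the following assumes PyPDF2 2.x or 3.x, where the classes are named PdfReader and PdfWriter (older 1.x releases use PdfFileReader and PdfFileWriter with camelCase methods). The helper names copy_encrypted, unlock, and encrypt_file are illustrative and not part of the library.

```python
def copy_encrypted(reader, writer, password):
    """Copy every page from reader into writer, then password-protect the result."""
    for page in reader.pages:
        writer.add_page(page)
    # The first positional argument of encrypt() is the user password.
    writer.encrypt(password)
    return writer


def unlock(reader, password):
    """Return True if the document is readable, decrypting it when needed."""
    if not reader.is_encrypted:
        return True
    # decrypt() returns 0 when the password does not match.
    return reader.decrypt(password) != 0


def encrypt_file(src_path, dst_path, password):
    """Write a password-protected copy of src_path to dst_path."""
    # Assumption: PyPDF2 2.x/3.x is installed (python -m pip install pypdf2).
    from PyPDF2 import PdfReader, PdfWriter

    writer = copy_encrypted(PdfReader(src_path), PdfWriter(), password)
    with open(dst_path, "wb") as handle:
        writer.write(handle)
```

A call such as encrypt_file("report.pdf", "report_locked.pdf", "s3cret") would then produce the protected copy; the file names and password here are placeholders.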
At A Glance
An overview of PyPDF2 features.
- Create PDF
- Transform PDF Pages
- Split PDFs
- Merge PDFs
- Embed hyperlinks
- Insert circles
- Add custom data
- Add shapes
- Unicode support
- Embed fonts
- Encrypt PDF
- Embed images
- Add passwords
PyPDF2 is tested with Python 2.6 and higher.
Getting Started with PyPDF2
PyPDF2 doesn’t come as a part of the Python Standard Library, so you will need to install it yourself. The preferred way to do so is to use pip.
Install PyPDF2 via pip
python -m pip install pypdf2
Extract Text and Metadata from PDF
The PyPDF2 library can programmatically extract both metadata and text from PDF files via Python. It lets developers retrieve information about the pages in a PDF file along with its author, title, creator application, and creation date. To get the text content of a page, call the extractText() method on the page object (newer PyPDF2 releases rename it to extract_text()).
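A minimal sketch of page-by-page text extraction might look like the following. It assumes PyPDF2 2.x or 3.x, where the reader class is PdfReader and the page method is extract_text(); in the older 1.x API the equivalents are PdfFileReader and extractText(). The helper names are illustrative only.

```python
def extract_page_texts(reader):
    """Return the text of each page of an already opened reader, in page order."""
    # extract_text() may return an empty string for scanned, image-only pages.
    return [page.extract_text() for page in reader.pages]


def read_pdf_text(path):
    """Open the PDF at path and return a list with one string per page."""
    # Assumption: PyPDF2 2.x/3.x is installed (python -m pip install pypdf2).
    from PyPDF2 import PdfReader

    return extract_page_texts(PdfReader(path))
```

For example, looping over read_pdf_text("report.pdf") yields the text of each page in turn, where "report.pdf" stands in for any local PDF file.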
Reading PDF Files via Python
The PyPDF2 library lets software developers open and read different parts of PDF documents inside Python applications. Developers can retrieve information such as the file size, the number of pages, and other document properties. You can also retrieve text and metadata from PDFs, as well as merge entire files together.
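To illustrate reading basic properties, here is a small sketch that reports the file size and page count. It assumes PyPDF2 2.x or 3.x (PdfReader and its pages attribute); the file size comes from the standard library rather than from PyPDF2, and describe_pdf is a hypothetical helper name.

```python
import os


def describe_pdf(path):
    """Return the size in bytes and the page count of the PDF at path."""
    # Assumption: PyPDF2 2.x/3.x is installed (python -m pip install pypdf2).
    from PyPDF2 import PdfReader

    reader = PdfReader(path)
    return {
        "size_bytes": os.path.getsize(path),  # size on disk, via the standard library
        "page_count": len(reader.pages),      # pages behaves like a sequence
    }
```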
Merge or Split PDF Documents
Have you ever been in a situation where you needed to merge two or more PDF files into a single document? Organizations often need to combine multiple PDF files into one. The PyPDF2 library can combine PDF files with just a couple of lines of Python code. Developers can also split large PDF documents into smaller ones according to their needs, for example by extracting a specific part of a PDF book or dividing it into multiple PDFs.
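Merging and splitting can be sketched as copying pages from readers into writers. The example below assumes PyPDF2 2.x or 3.x (PdfReader, PdfWriter, add_page); the helper names and the output file pattern are illustrative, not part of the library.

```python
def merge_pages(readers, writer):
    """Append every page of each reader to writer, keeping the given order."""
    for reader in readers:
        for page in reader.pages:
            writer.add_page(page)
    return writer


def split_ranges(page_count, chunk_size):
    """Return (start, stop) index pairs cutting page_count pages into chunks."""
    return [(start, min(start + chunk_size, page_count))
            for start in range(0, page_count, chunk_size)]


def merge_files(paths, out_path):
    """Merge the PDFs listed in paths into a single file at out_path."""
    # Assumption: PyPDF2 2.x/3.x is installed (python -m pip install pypdf2).
    from PyPDF2 import PdfReader, PdfWriter

    writer = merge_pages([PdfReader(p) for p in paths], PdfWriter())
    with open(out_path, "wb") as handle:
        writer.write(handle)


def split_file(path, chunk_size, pattern="part_{:03d}.pdf"):
    """Split the PDF at path into files of at most chunk_size pages each."""
    from PyPDF2 import PdfReader, PdfWriter

    reader = PdfReader(path)
    out_paths = []
    ranges = split_ranges(len(reader.pages), chunk_size)
    for number, (start, stop) in enumerate(ranges, start=1):
        writer = PdfWriter()
        for index in range(start, stop):
            writer.add_page(reader.pages[index])
        out_path = pattern.format(number)  # hypothetical naming scheme
        with open(out_path, "wb") as handle:
            writer.write(handle)
        out_paths.append(out_path)
    return out_paths
```

Keeping the page-range arithmetic in split_ranges separate from the file handling makes it easy to check the chunk boundaries before any file is written.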
Extract Metadata from PDF Files
The PyPDF2 library can extract metadata from PDF documents with a couple of Python commands. You can get information such as the author, the creator application, the number of pages, the document title, and the creation date, and then use it according to your needs.
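The metadata fields listed above could be collected along these lines. This is a sketch assuming PyPDF2 2.x or 3.x, where the reader exposes a metadata attribute (1.x uses getDocumentInfo() instead); summarize_metadata and read_metadata are hypothetical helper names.

```python
def summarize_metadata(reader):
    """Collect the page count and common document-information fields."""
    summary = {"pages": len(reader.pages)}
    info = reader.metadata  # None when the file carries no document information
    if info is not None:
        summary.update(
            author=info.author,
            title=info.title,
            creator=info.creator,
            # Raw PDF date string, for example "D:20200101000000"
            created=info.get("/CreationDate"),
        )
    return summary


def read_metadata(path):
    """Open the PDF at path and return its metadata summary."""
    # Assumption: PyPDF2 2.x/3.x is installed (python -m pip install pypdf2).
    from PyPDF2 import PdfReader

    return summarize_metadata(PdfReader(path))
```

Fields that a document does not set come back as None, so callers may want to filter those out before displaying the summary.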