Java API for PDF Document Processing
Open source Java library to create, print, split, and merge PDF documents inside Java applications.
Apache PDFBox is an open source, pure-Java library for working with PDF documents. With it, Java developers can write programs that create new PDF documents, manipulate existing ones, and read and extract their content. PDFBox also includes a command line utility, shipped as a Jar file, for performing various operations on PDF documents.
The Portable Document Format (PDF) is a file format that presents data independently of application software, hardware, and operating systems. Apache PDFBox supports creating, rendering, printing, splitting, merging, altering, and verifying PDF files, as well as extracting their text and metadata.
At A Glance
An overview of Apache PDFBox features.
- PDF to text extraction
- Merge PDF Documents
- PDF Encryption
- PDF Decryption
- PDF Searching
- Work with FDF form
- Create PDF from text
- PDF to image export
- Print PDF document
Apache PDFBox requires only a Java runtime.
Getting Started with Apache PDFBox
First of all, you need to download the latest release from the PDFBox download page. To build PDFBox successfully, you need to install Java 7 or higher and Maven 3. Use the following build command:
mvn clean install
By default, the command compiles the Java sources and packages the binary classes into jar files.
Java API to Create and Modify New PDF Documents
Apache PDFBox allows programmers to generate a new PDF document from scratch and then save it to the desired location. PDF is one of the most commonly used file formats today. PDF documents are compatible across platforms and represent a document independently of the hardware, operating system, and application software used to create it. PDFBox also lets developers modify existing PDF documents: they can add new pages to a document as well as text to an existing page.
Create PDF Document - Java
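import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;

// Create a new PDF document
PDDocument document = new PDDocument();
// Add a blank page; most viewers will not open a document with no pages
document.addPage(new PDPage());
// Save document
document.save("fileformat.pdf");
// Close document
document.close();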
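The paragraph above also mentions modifying an existing document. The following is a minimal sketch of that case, assuming the PDFBox 2.x API (in 3.x, loading moved to `Loader.loadPDF`). The class name `AppendPageSketch`, the helper `appendTextPage`, the output file name, and the text position are illustrative choices, not part of PDFBox.

```java
import java.io.File;
import java.io.IOException;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.font.PDType1Font;

public class AppendPageSketch {

    // Hypothetical helper: appends one page holding a single line of text.
    static void appendTextPage(PDDocument document, String text) throws IOException {
        PDPage page = new PDPage();
        document.addPage(page);
        // The content stream must be closed before the document is saved
        try (PDPageContentStream contents = new PDPageContentStream(document, page)) {
            contents.beginText();
            contents.setFont(PDType1Font.HELVETICA, 12);
            contents.newLineAtOffset(70, 700); // position in points from the lower-left corner
            contents.showText(text);
            contents.endText();
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumes "fileformat.pdf" was written by the previous example
        try (PDDocument document = PDDocument.load(new File("fileformat.pdf"))) {
            appendTextPage(document, "Added with PDFBox");
            // Save under a new name so the loaded source file is left untouched
            document.save("fileformat-modified.pdf");
        }
    }
}
```

Saving to a separate file is a deliberate choice here: it avoids writing over a file that the loaded document may still be reading from.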
Splitting and Merging PDF Documents using Java Library
Apache PDFBox can merge multiple PDF documents into a single PDF document. To merge documents, first set the path of the destination file, then add the source PDF files in the order in which they should appear in the merged file, and finally run the merge. PDFBox can also do the reverse: its Splitter class splits a given PDF document into several separate documents.
Merge PDF Documents - Java
import java.io.File;
import org.apache.pdfbox.io.MemoryUsageSetting;
import org.apache.pdfbox.multipdf.PDFMergerUtility;

// Initialize PDFMergerUtility object
PDFMergerUtility pdfMergerUtility = new PDFMergerUtility();
// Set output file path
pdfMergerUtility.setDestinationFileName("merged.pdf");
// Add source documents in the order they should appear
pdfMergerUtility.addSource(new File("document1.pdf"));
pdfMergerUtility.addSource(new File("document2.pdf"));
// Merge documents, buffering in main memory only
pdfMergerUtility.mergeDocuments(MemoryUsageSetting.setupMainMemoryOnly());
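The Splitter class mentioned above can be used roughly as follows. This is a sketch assuming the PDFBox 2.x API; the class name `SplitSketch`, the helper `splitDocument`, and the `part-N.pdf` output names are hypothetical.

```java
import java.io.File;
import java.io.IOException;
import java.util.List;
import org.apache.pdfbox.multipdf.Splitter;
import org.apache.pdfbox.pdmodel.PDDocument;

public class SplitSketch {

    // Hypothetical helper: splits a document into parts of at most pagesPerPart pages.
    static List<PDDocument> splitDocument(PDDocument source, int pagesPerPart) throws IOException {
        Splitter splitter = new Splitter();
        splitter.setSplitAtPage(pagesPerPart);
        return splitter.split(source);
    }

    public static void main(String[] args) throws IOException {
        // Assumes "merged.pdf" exists, e.g. from the merge example
        try (PDDocument source = PDDocument.load(new File("merged.pdf"))) {
            List<PDDocument> parts = splitDocument(source, 1);
            int index = 1;
            for (PDDocument part : parts) {
                // Save each part while the source is still open, then release it
                part.save("part-" + index + ".pdf");
                part.close();
                index++;
            }
        }
    }
}
```

Each part is saved and closed before the source document is closed, because the split documents may still share data with the source.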
Add and Extract Images in PDF Documents inside Java Apps
Apache PDFBox lets Java developers insert images into a PDF document. Images add real value to a piece of content: they help readers learn, grab attention, explain concepts, and inspire. To insert an image, load it as an image object tied to the document and draw it onto a page through the page's content stream. The API also enables developers to extract images from an existing PDF document and store them on the local disk.
Add Images in PDF - Java
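import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject;

// Create a new PDF document
PDDocument document = new PDDocument();
// Create a new page
PDPage page = new PDPage();
// Add page
document.addPage(page);
// Load the image file as a PDImageXObject bound to this document
PDImageXObject pdImage = PDImageXObject.createFromFile("logo.png", document);
// Initialize PDPageContentStream object
PDPageContentStream contents = new PDPageContentStream(document, page);
// Draw the image with its lower-left corner at (70, 250)
contents.drawImage(pdImage, 70, 250);
// Close contents before saving
contents.close();
// Save document
document.save("image.pdf");
// Close document
document.close();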
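For the extraction direction described above, one possible approach is to walk each page's resources and pick out the image XObjects. The sketch below assumes the PDFBox 2.x API; `ExtractImagesSketch`, `extractImages`, and the output file names are hypothetical. It only finds images referenced directly from a page, not inline images or images nested inside form XObjects.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import javax.imageio.ImageIO;
import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDResources;
import org.apache.pdfbox.pdmodel.graphics.PDXObject;
import org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject;

public class ExtractImagesSketch {

    // Hypothetical helper: collects images referenced directly by each page's resources.
    static List<BufferedImage> extractImages(PDDocument document) throws IOException {
        List<BufferedImage> images = new ArrayList<>();
        for (PDPage page : document.getPages()) {
            PDResources resources = page.getResources();
            if (resources == null) {
                continue; // page has no resources, so no images
            }
            for (COSName name : resources.getXObjectNames()) {
                PDXObject xObject = resources.getXObject(name);
                if (xObject instanceof PDImageXObject) {
                    images.add(((PDImageXObject) xObject).getImage());
                }
            }
        }
        return images;
    }

    public static void main(String[] args) throws IOException {
        // Assumes "image.pdf" exists, e.g. from the example above
        try (PDDocument document = PDDocument.load(new File("image.pdf"))) {
            int index = 1;
            for (BufferedImage image : extractImages(document)) {
                // Store each extracted image on the local disk as PNG
                ImageIO.write(image, "png", new File("extracted-" + index + ".png"));
                index++;
            }
        }
    }
}
```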
Print PDF Documents in Various Ways using Java Library
Apache PDFBox enables Java developers to print a PDF document using the standard Java printing API, and it supports several ways of doing so. Developers can print the document at its actual size, which is the recommended way to print. Printing with a print dialog and with custom attributes is supported as well. Developers can also print PDF documents using a custom page size and custom margins.
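The printing options above can be sketched as follows, assuming the PDFBox 2.x API together with `java.awt.print`. `PrintSketch` and `customPaperBook` are hypothetical names, and the A4 size (595 x 842 points) and 36-point margins appear only as sample values.

```java
import java.awt.print.Book;
import java.awt.print.PageFormat;
import java.awt.print.Paper;
import java.awt.print.PrinterException;
import java.awt.print.PrinterJob;
import java.io.File;
import java.io.IOException;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.printing.PDFPageable;
import org.apache.pdfbox.printing.PDFPrintable;
import org.apache.pdfbox.printing.Scaling;

public class PrintSketch {

    // Hypothetical helper: wraps the document in a Book that uses a custom
    // paper size and equal margins on all sides (all values in points).
    static Book customPaperBook(PDDocument document, double width, double height, double margin) {
        Paper paper = new Paper();
        paper.setSize(width, height);
        paper.setImageableArea(margin, margin, width - 2 * margin, height - 2 * margin);

        PageFormat pageFormat = new PageFormat();
        pageFormat.setPaper(paper);

        Book book = new Book();
        // Shrink pages that do not fit into the imageable area
        book.append(new PDFPrintable(document, Scaling.SHRINK_TO_FIT),
                pageFormat, document.getNumberOfPages());
        return book;
    }

    public static void main(String[] args) throws IOException, PrinterException {
        // Assumes "image.pdf" exists; any PDF file works
        try (PDDocument document = PDDocument.load(new File("image.pdf"))) {
            PrinterJob job = PrinterJob.getPrinterJob();

            // Print at actual size
            job.setPageable(new PDFPageable(document));

            // Alternative: custom page size and margins (sample values)
            // job.setPageable(customPaperBook(document, 595, 842, 36));

            // Show the system print dialog and print only if the user confirms
            if (job.printDialog()) {
                job.print();
            }
        }
    }
}
```

The print dialog requires a graphical environment; on a headless server, `job.print()` can be called directly instead.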