
Apache PDFBox

Java API for PDF Document Processing

Open Source Java Library to Create, Print, Split, and Merge PDF Documents inside Java Applications.

Apache PDFBox is an open source, pure-Java library for working with PDF documents. With it, Java developers can create new PDF documents, manipulate existing ones, and read and extract their content. PDFBox also ships a command-line utility, packaged as a runnable JAR file, for performing various operations on PDF documents.

The Portable Document Format (PDF) is a file format that presents data independently of application software, hardware, and operating systems. Apache PDFBox supports several advanced features: creating, rendering, printing, splitting, merging, altering, and verifying PDF files, as well as extracting their text and metadata.
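As an illustration of the text and metadata extraction mentioned above, here is a minimal sketch assuming the PDFBox 2.0.x API (PDDocument.load, PDFTextStripper, PDDocumentInformation). The class name PdfTextExtractor and its method names are hypothetical and chosen only for this example.

```java
import java.io.File;
import java.io.IOException;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDDocumentInformation;
import org.apache.pdfbox.text.PDFTextStripper;

// Hypothetical helper class, not part of PDFBox
final class PdfTextExtractor {

    /** Returns the text of every page in the given PDF file. */
    static String extractText(File pdf) throws IOException {
        // PDDocument.load is the PDFBox 2.0.x entry point
        try (PDDocument document = PDDocument.load(pdf)) {
            return new PDFTextStripper().getText(document);
        }
    }

    /** Returns the title from the document information, or null if none is set. */
    static String extractTitle(File pdf) throws IOException {
        try (PDDocument document = PDDocument.load(pdf)) {
            PDDocumentInformation info = document.getDocumentInformation();
            return info.getTitle();
        }
    }

    public static void main(String[] args) throws IOException {
        File pdf = new File(args[0]); // path supplied by the caller
        System.out.println("Title: " + extractTitle(pdf));
        System.out.println(extractText(pdf));
    }
}
```

Both methods open and close the document themselves, so callers do not need to manage the PDDocument lifecycle.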

Getting Started with Apache PDFBox

First, download the latest release from the PDFBox download page. To build PDFBox from source, you need Java 7 or higher and Maven 3. Use the following build command:

Installation command

 mvn clean install

The command compiles the Java sources and, by default, packages the binary classes into JAR packages.

Apache PDFBox Maven Dependency

<dependency>
    <groupId>org.apache.pdfbox</groupId>
    <artifactId>pdfbox</artifactId>
    <version>2.0.17</version>
</dependency>

Java API to Create and Modify New PDF Documents

Apache PDFBox allows programmers to generate a new PDF document from scratch and save it to the desired location. PDF is one of the most commonly used file formats: PDF documents are compatible across platforms and represent a document independently of the hardware, operating system, and application software used to create it. PDFBox also lets developers modify existing PDF documents, for example by adding new pages or adding text to an existing page.

Create PDF Document - Java

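import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;

// Create a new PDF document
PDDocument document = new PDDocument();
// Add a blank page; many viewers reject a PDF that has no pages
document.addPage(new PDPage());
// Save document
document.save("fileformat.pdf");
// Close document
document.close();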
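Modifying an existing document follows the same pattern. The following is a minimal sketch, assuming the PDFBox 2.0.x API, that loads a PDF, appends a new page with one line of text, and saves the result to a second file. The class name PdfPageAppender, the method name, and the text position are illustrative assumptions.

```java
import java.io.File;
import java.io.IOException;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.font.PDType1Font;

// Hypothetical helper class, not part of PDFBox
final class PdfPageAppender {

    /** Loads source, appends a page containing one line of text, and saves to target. */
    static void appendPageWithText(File source, File target, String text) throws IOException {
        try (PDDocument document = PDDocument.load(source)) {
            PDPage page = new PDPage();
            document.addPage(page);
            // The content stream must be closed before the document is saved
            try (PDPageContentStream contents = new PDPageContentStream(document, page)) {
                contents.beginText();
                contents.setFont(PDType1Font.HELVETICA, 12);
                contents.newLineAtOffset(70, 700); // offset in points from the lower-left corner
                contents.showText(text);
                contents.endText();
            }
            document.save(target);
        }
    }
}
```

The standard Helvetica font used here only covers a limited character set, so text outside that set would need an embedded font instead.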

Splitting and Merging PDF Documents using Java Library

Apache PDFBox can merge multiple PDF documents into a single PDF document. To merge, set the path of the destination file and then add the source PDF files in the order in which they should appear in the merged result. PDFBox can also split a PDF document into multiple PDF files: the Splitter class splits a given document into several separate documents.

Merge PDF Documents - Java

import java.io.File;
import org.apache.pdfbox.io.MemoryUsageSetting;
import org.apache.pdfbox.multipdf.PDFMergerUtility;

// Initialize PDFMergerUtility object
PDFMergerUtility pdfMergerUtility = new PDFMergerUtility();
// Set output file path
pdfMergerUtility.setDestinationFileName("merged.pdf");
// Add source documents in the order they should appear in the result
pdfMergerUtility.addSource(new File("document1.pdf"));
pdfMergerUtility.addSource(new File("document2.pdf"));
// Merge documents
pdfMergerUtility.mergeDocuments(MemoryUsageSetting.setupMainMemoryOnly());
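Splitting can be sketched in a similar way. The example below assumes the PDFBox 2.0.x Splitter class, which by default produces one document per page. The class name PdfSplitter, the method name, and the page-N.pdf output naming are hypothetical.

```java
import java.io.File;
import java.io.IOException;
import java.util.List;

import org.apache.pdfbox.multipdf.Splitter;
import org.apache.pdfbox.pdmodel.PDDocument;

// Hypothetical helper class, not part of PDFBox
final class PdfSplitter {

    /** Splits source into one file per page and returns the number of files written. */
    static int splitIntoPages(File source, File outputDir) throws IOException {
        // The source document stays open until every part has been saved
        try (PDDocument document = PDDocument.load(source)) {
            Splitter splitter = new Splitter(); // default: one page per output document
            List<PDDocument> parts = splitter.split(document);
            int index = 1;
            for (PDDocument part : parts) {
                // Close each part after saving it
                try (PDDocument p = part) {
                    p.save(new File(outputDir, "page-" + index + ".pdf"));
                }
                index++;
            }
            return parts.size();
        }
    }
}
```

To produce larger chunks, Splitter.setSplitAtPage(n) can be called before split so that each output document holds n pages.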

Add and Extract Images to PDF Documents inside Java Apps

Apache PDFBox lets Java developers insert images into an existing PDF document. Images add real value to content: they help readers learn, grab attention, explain concepts, and inspire. To insert an image, load it as a PDImageXObject and draw it on a page through a page content stream. The API also enables developers to extract images from an existing PDF document and store them on the local disk.

Add Images in PDF - Java

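import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject;

// Create a new PDF document
PDDocument document = new PDDocument();
// Create a new page
PDPage page = new PDPage();
// Add page
document.addPage(page);
// Initialize PDImageXObject object
PDImageXObject pdImage = PDImageXObject.createFromFile("logo.png", document);
// Initialize PDPageContentStream object
PDPageContentStream contents = new PDPageContentStream(document, page);
// Draw the image with its lower-left corner at (70, 250)
contents.drawImage(pdImage, 70, 250);
// Close contents
contents.close();
// Save document
document.save("image.pdf");
// Close document
document.close();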
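Image extraction, the reverse operation, might look like the sketch below. It assumes the PDFBox 2.0.x API and only inspects image XObjects referenced directly from each page's resources, so images nested inside form XObjects are not visited. The class name PdfImageExtractor and the image-N.png output naming are hypothetical.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;

import javax.imageio.ImageIO;

import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDResources;
import org.apache.pdfbox.pdmodel.graphics.PDXObject;
import org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject;

// Hypothetical helper class, not part of PDFBox
final class PdfImageExtractor {

    /** Writes every top-level page image as a PNG file and returns the number written. */
    static int extractImages(File pdf, File outputDir) throws IOException {
        int count = 0;
        try (PDDocument document = PDDocument.load(pdf)) {
            for (PDPage page : document.getPages()) {
                PDResources resources = page.getResources();
                if (resources == null) {
                    continue; // page has no resources, hence no images
                }
                for (COSName name : resources.getXObjectNames()) {
                    PDXObject xObject = resources.getXObject(name);
                    if (xObject instanceof PDImageXObject) {
                        BufferedImage image = ((PDImageXObject) xObject).getImage();
                        count++;
                        ImageIO.write(image, "png", new File(outputDir, "image-" + count + ".png"));
                    }
                }
            }
        }
        return count;
    }
}
```

An image that is referenced from several pages is written once per reference in this sketch; deduplication is left out for brevity.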

Print PDF Documents in Various Ways using Java Library

Apache PDFBox enables Java developers to print a PDF document using the standard Java printing API, and it supports several printing modes. Printing the document at its actual size is the recommended approach. PDFBox also supports printing with a print dialog, with custom print attributes, and with a custom page size and custom margins.
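The printing modes described above can be sketched as follows, assuming the PDFBox 2.0.x printing classes (PDFPageable, PDFPrintable, Scaling) together with java.awt.print. The class name PdfPrinter, its method names, and the choice of SHRINK_TO_FIT scaling for the custom page size are assumptions made for this example.

```java
import java.awt.print.Book;
import java.awt.print.PageFormat;
import java.awt.print.Paper;
import java.awt.print.PrinterException;
import java.awt.print.PrinterJob;
import java.io.File;
import java.io.IOException;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.printing.PDFPageable;
import org.apache.pdfbox.printing.PDFPrintable;
import org.apache.pdfbox.printing.Scaling;

// Hypothetical helper class, not part of PDFBox
final class PdfPrinter {

    /** Prints the file at its actual size, optionally showing the system print dialog first. */
    static void printActualSize(File pdf, boolean showDialog) throws IOException, PrinterException {
        // The document must stay open until printing has finished
        try (PDDocument document = PDDocument.load(pdf)) {
            PrinterJob job = PrinterJob.getPrinterJob();
            job.setPageable(new PDFPageable(document));
            // printDialog() returns false when the user cancels
            if (!showDialog || job.printDialog()) {
                job.print();
            }
        }
    }

    /** Builds a Book that prints every page on paper of the given size (in points) with equal margins. */
    static Book customPageSizeBook(PDDocument document, double widthPt, double heightPt, double marginPt) {
        Paper paper = new Paper();
        paper.setSize(widthPt, heightPt);
        paper.setImageableArea(marginPt, marginPt,
                widthPt - 2 * marginPt, heightPt - 2 * marginPt);

        PageFormat pageFormat = new PageFormat();
        pageFormat.setPaper(paper);

        Book book = new Book();
        // Shrink pages that are larger than the imageable area; smaller pages keep their size
        book.append(new PDFPrintable(document, Scaling.SHRINK_TO_FIT),
                pageFormat, document.getNumberOfPages());
        return book;
    }
}
```

The Book returned by customPageSizeBook can be passed to PrinterJob.setPageable in place of the PDFPageable, after which the job is printed in the same way.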