Open Source Java API for Word DOCX Documents

Create, Read, Edit and Convert Microsoft Word DOCX files, and add text & tables, via a Java library.

DOCX4J is a JAXB-based, open source (Apache v2) library for manipulating Microsoft Office file formats. It provides the functionality to read, write, edit, and save documents in the Microsoft Word 2007 DOCX file format.

DOCX4J is similar to Microsoft's Open XML SDK, but for Java. It uses JAXB to create the in-memory object representation of a document. Using the API you can generate Microsoft Office documents, edit them, format text & paragraphs, insert tables & images, manage other form elements, and much more. Its emphasis is on power: if the file format supports a feature, you can generally use it through the API.
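Because the document is held in memory as a tree of JAXB objects, you can inspect that tree directly. The following is a minimal sketch (the class name InspectJaxbTree is hypothetical) that builds a one-paragraph document in memory, prints the Java class of each body-level element, and then marshals the main document part back to WordprocessingML XML with XmlUtils.marshaltoString.

```java
import org.docx4j.XmlUtils;
import org.docx4j.openpackaging.packages.WordprocessingMLPackage;
import org.docx4j.openpackaging.parts.WordprocessingML.MainDocumentPart;

// Hypothetical example class: shows that a DOCX body is a tree of JAXB objects
public class InspectJaxbTree {
    public static void main(String[] args) throws Exception {
        // Build a small document in memory (no input file needed)
        WordprocessingMLPackage wordPackage = WordprocessingMLPackage.createPackage();
        MainDocumentPart mainDocumentPart = wordPackage.getMainDocumentPart();
        mainDocumentPart.addParagraphOfText("Hello from docx4j");

        // Each body-level element is a JAXB object (a paragraph is an org.docx4j.wml.P)
        for (Object child : mainDocumentPart.getContent()) {
            Object unwrapped = XmlUtils.unwrap(child); // strips a JAXBElement wrapper, if present
            System.out.println(unwrapped.getClass().getName());
        }

        // Marshal the JAXB tree of the main part back to XML (no declaration, pretty-printed)
        System.out.println(XmlUtils.marshaltoString(mainDocumentPart.getJaxbElement(), true, true));
    }
}
```

Editing the document is therefore a matter of changing these Java objects; docx4j takes care of serializing them back to XML when the package is saved.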


Getting Started with DOCX4J

First of all, you need to have the Java Development Kit (JDK) installed on your system. Referencing DOCX4J in a Maven-based Java project is then straightforward: add the following dependency to your pom.xml and let your IDE fetch and reference the DOCX4J JAR files.

DOCX4J Maven Dependency




Add Paragraph, Image & Table to Word Documents

DOCX4J allows developers to add paragraphs & images to Word documents. The API also supports adding tables to DOCX documents, making it possible to create both simple and nested tables with user-defined data.

Create a DOCX File Using DOCX4J - Java

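import java.io.File;
import org.docx4j.openpackaging.packages.WordprocessingMLPackage;
import org.docx4j.openpackaging.parts.WordprocessingML.MainDocumentPart;

// Create word package
WordprocessingMLPackage wordPackage = WordprocessingMLPackage.createPackage();
// Get the main document part
MainDocumentPart mainDocumentPart = wordPackage.getMainDocumentPart();
// Add paragraph
mainDocumentPart.addParagraphOfText("Open Source Java API for Word DOCX Documents");
// Save file
wordPackage.save(new File("FileFormat.docx"));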
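Tables, including nested ones, are built the same way: as JAXB objects added to the main document part. The following is a minimal sketch, assuming docx4j's TblFactory helper; the class name AddTables and the output file Tables.docx are hypothetical. It fills a simple 2 x 2 table with text and nests a second table inside the last cell.

```java
import java.io.File;
import java.util.List;

import org.docx4j.jaxb.Context;
import org.docx4j.model.table.TblFactory;
import org.docx4j.openpackaging.packages.WordprocessingMLPackage;
import org.docx4j.openpackaging.parts.WordprocessingML.MainDocumentPart;
import org.docx4j.wml.Tbl;
import org.docx4j.wml.Tc;
import org.docx4j.wml.Tr;

// Hypothetical example class: a simple table with a nested table in one cell
public class AddTables {
    public static void main(String[] args) throws Exception {
        WordprocessingMLPackage wordPackage = WordprocessingMLPackage.createPackage();
        MainDocumentPart mainDocumentPart = wordPackage.getMainDocumentPart();

        // Usable page width (in twips) of the first section, shared between the columns
        int writableWidthTwips = wordPackage.getDocumentModel().getSections().get(0)
                .getPageDimensions().getWritableWidthTwips();
        int columns = 2;

        // Simple 2 x 2 table; TblFactory creates the rows (Tr) and cells (Tc)
        Tbl outer = TblFactory.createTable(2, columns, writableWidthTwips / columns);
        List<Object> rows = outer.getContent();
        for (int i = 0; i < rows.size(); i++) {
            List<Object> cells = ((Tr) rows.get(i)).getContent();
            for (int j = 0; j < cells.size(); j++) {
                Tc cell = (Tc) cells.get(j);
                cell.getContent().add(mainDocumentPart.createParagraphOfText(
                        "Row " + (i + 1) + ", column " + (j + 1)));
            }
        }

        // Nested table: a smaller 2 x 2 table placed inside the last cell of the outer table
        Tbl inner = TblFactory.createTable(2, 2, writableWidthTwips / (columns * 2));
        Tr lastRow = (Tr) rows.get(rows.size() - 1);
        Tc lastCell = (Tc) lastRow.getContent().get(columns - 1);
        lastCell.getContent().add(inner);
        // Word expects a table cell to end with a paragraph, so add an empty one after the nested table
        lastCell.getContent().add(Context.getWmlObjectFactory().createP());

        mainDocumentPart.getContent().add(outer);
        wordPackage.save(new File("Tables.docx"));
    }
}
```

Splitting the writable page width across the columns keeps the table inside the page margins; the inner table simply uses a narrower cell width.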

Extract Text from DOCX

DOCX4J makes it possible to extract text from Microsoft Word DOCX documents with just a few lines of code, for example by querying the main document part with an XPath expression. In the same way, it can also extract headings, footnotes, table data, and so on from a Word file.

Extract Text from DOCX Using DOCX4J - Java

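import java.io.File;
import java.util.List;
import javax.xml.bind.JAXBElement; // newer, Jakarta-based docx4j releases use jakarta.xml.bind.JAXBElement
import org.docx4j.openpackaging.packages.WordprocessingMLPackage;
import org.docx4j.openpackaging.parts.WordprocessingML.MainDocumentPart;
import org.docx4j.wml.Text;

// Load document
WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage.load(new File("FileFormat.docx"));
// Get the main document part
MainDocumentPart mainDocumentPart = wordMLPackage.getMainDocumentPart();
// Select all w:t (text) nodes via XPath
String textNodesXPath = "//w:t";
List<Object> textNodes = mainDocumentPart.getJAXBNodesViaXPath(textNodesXPath, true);
// Print text
for (Object obj : textNodes) {
  Text text = (Text) ((JAXBElement<?>) obj).getValue();
  String textValue = text.getValue();
  System.out.println(textValue);
}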
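Headings and table data can be pulled out by walking the JAXB tree instead of using XPath. The following is a minimal sketch, not the library's dedicated extraction API; the class name ExtractHeadingsAndTables is hypothetical, and it assumes that heading paragraphs use style IDs beginning with "Heading" (as "Heading1", "Heading2" do in English-language Word templates). It only looks at top-level body content, so tables inside content controls are skipped.

```java
import java.io.File;

import org.docx4j.XmlUtils;
import org.docx4j.openpackaging.packages.WordprocessingMLPackage;
import org.docx4j.openpackaging.parts.WordprocessingML.MainDocumentPart;
import org.docx4j.wml.ContentAccessor;
import org.docx4j.wml.P;
import org.docx4j.wml.Tbl;
import org.docx4j.wml.Tc;
import org.docx4j.wml.Text;
import org.docx4j.wml.Tr;

// Hypothetical example class: prints headings and table rows from a DOCX body
public class ExtractHeadingsAndTables {

    // Concatenates every w:t value found below the given node
    static String textOf(Object node) {
        StringBuilder sb = new StringBuilder();
        collect(node, sb);
        return sb.toString();
    }

    private static void collect(Object node, StringBuilder sb) {
        Object o = XmlUtils.unwrap(node); // JAXBElement -> underlying object
        if (o instanceof Text) {
            sb.append(((Text) o).getValue());
        } else if (o instanceof ContentAccessor) {
            for (Object child : ((ContentAccessor) o).getContent()) {
                collect(child, sb);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage.load(new File("FileFormat.docx"));
        MainDocumentPart mainDocumentPart = wordMLPackage.getMainDocumentPart();

        for (Object child : mainDocumentPart.getContent()) {
            Object o = XmlUtils.unwrap(child);
            if (o instanceof P) {
                P p = (P) o;
                // Assumption: headings are paragraphs whose style ID starts with "Heading"
                if (p.getPPr() != null && p.getPPr().getPStyle() != null
                        && p.getPPr().getPStyle().getVal().startsWith("Heading")) {
                    System.out.println("Heading: " + textOf(p));
                }
            } else if (o instanceof Tbl) {
                // Print each table row as tab-separated cell text
                for (Object rowObj : ((Tbl) o).getContent()) {
                    Object row = XmlUtils.unwrap(rowObj);
                    if (!(row instanceof Tr)) {
                        continue;
                    }
                    StringBuilder line = new StringBuilder();
                    for (Object cellObj : ((Tr) row).getContent()) {
                        Object cell = XmlUtils.unwrap(cellObj);
                        if (cell instanceof Tc) {
                            if (line.length() > 0) {
                                line.append('\t');
                            }
                            line.append(textOf(cell));
                        }
                    }
                    System.out.println(line);
                }
            }
        }
    }
}
```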

Generate & Edit Word Documents using Java API

DOCX4J enables software programmers to create new Word documents in the DOCX file format. Developers can also load an existing Microsoft Word DOCX file and edit it according to their application's needs. It allows you to add new paragraphs, insert text, apply text alignment & borders, change text styling, and more.
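Editing an existing file follows the load, modify, save pattern. The following is a minimal sketch; the class name EditExistingDocx and the output file FileFormat-edited.docx are hypothetical, and it assumes FileFormat.docx already exists. It appends a paragraph whose run is bold and whose alignment is centered, built from docx4j's generated WordprocessingML classes.

```java
import java.io.File;

import org.docx4j.jaxb.Context;
import org.docx4j.openpackaging.packages.WordprocessingMLPackage;
import org.docx4j.openpackaging.parts.WordprocessingML.MainDocumentPart;
import org.docx4j.wml.BooleanDefaultTrue;
import org.docx4j.wml.Jc;
import org.docx4j.wml.JcEnumeration;
import org.docx4j.wml.ObjectFactory;
import org.docx4j.wml.P;
import org.docx4j.wml.PPr;
import org.docx4j.wml.R;
import org.docx4j.wml.RPr;
import org.docx4j.wml.Text;

// Hypothetical example class: load a DOCX, append a styled paragraph, save a copy
public class EditExistingDocx {
    public static void main(String[] args) throws Exception {
        WordprocessingMLPackage wordPackage = WordprocessingMLPackage.load(new File("FileFormat.docx"));
        MainDocumentPart mainDocumentPart = wordPackage.getMainDocumentPart();
        ObjectFactory factory = Context.getWmlObjectFactory();

        // Text goes inside a run (R)
        Text text = factory.createText();
        text.setValue("Appended with docx4j");
        R run = factory.createR();
        run.getContent().add(text);

        // Run properties: bold
        RPr runProperties = factory.createRPr();
        runProperties.setB(new BooleanDefaultTrue());
        run.setRPr(runProperties);

        P paragraph = factory.createP();
        paragraph.getContent().add(run);

        // Paragraph properties: centered alignment
        PPr paragraphProperties = factory.createPPr();
        Jc justification = factory.createJc();
        justification.setVal(JcEnumeration.CENTER);
        paragraphProperties.setJc(justification);
        paragraph.setPPr(paragraphProperties);

        // Append to the end of the body and save under a new name
        mainDocumentPart.getContent().add(paragraph);
        wordPackage.save(new File("FileFormat-edited.docx"));
    }
}
```

Character formatting (bold, italics, color) lives on the run properties, while alignment and borders belong to the paragraph properties.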

Convert Microsoft Word Docx Documents to PDF

The open source Java library docx4j supports generating Microsoft Word DOCX documents and converting them to various popular formats. It offers 3 different ways to convert Microsoft Word DOCX documents to PDF. The following example uses documents4j (running remotely) to convert a DOCX file to PDF.

Word DOCX Documents Conversion to PDF via Java

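import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import org.docx4j.openpackaging.exceptions.Docx4JException;
// Documents4jRemoteServices and DocumentType come from docx4j's documents4j integration (imports omitted)
public class DocxFileToPDF {
  public static void main(String[] args) throws IOException, Docx4JException {
    File output = new File(System.getProperty("user.dir") + "/result.pdf");
    try (FileOutputStream fos = new FileOutputStream(output)) { // closes the stream when done
      Documents4jRemoteServices exporter = new Documents4jRemoteServices();
      exporter.export(new File(System.getProperty("user.dir") + "/../docx4j-samples-docx4j/sample-docs/sample-docx.docx"), fos, DocumentType.MS_WORD);
    }
  }
}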
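Another of docx4j's conversion routes runs in-process through XSL-FO rather than through a remote service. The following is a minimal sketch, assuming the docx4j-export-fo module (with its Apache FOP dependency) is on the classpath; the class name DocxToPdfViaFo and the file names are hypothetical. It uses the Docx4J.toPDF facade method.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.OutputStream;

import org.docx4j.Docx4J;
import org.docx4j.openpackaging.packages.WordprocessingMLPackage;

// Hypothetical example class: DOCX -> PDF via docx4j's XSL-FO export
public class DocxToPdfViaFo {
    public static void main(String[] args) throws Exception {
        // Assumption: docx4j-export-fo is on the classpath; without it toPDF cannot render
        WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage.load(new File("FileFormat.docx"));
        try (OutputStream os = new FileOutputStream(new File("FileFormat.pdf"))) {
            // Render the loaded package to PDF and write it to the stream
            Docx4J.toPDF(wordMLPackage, os);
        }
    }
}
```

This route needs no Microsoft Word installation, whereas documents4j delegates the conversion to Word itself, so layout fidelity can differ between the two approaches.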