Word Processing File Format APIs

Apache POI XWPF

 
 

Java API for Word OOXML Documents

Open Source solution to Create, Read, Edit and Convert Microsoft Word DOCX files in Java applications.

 

Apache POI XWPF reads and writes the Microsoft Word 2007 DOCX (OOXML) file format. It has a fairly stable core API that gives access to the main parts of a DOCX file. It can be used for full or targeted text extraction, header and footer manipulation, and text editing and styling.

Apache POI XWPF is best known for generating and editing Microsoft Word files, formatting text and paragraphs, inserting images, creating and parsing tables, mail merge features, managing form elements and much more.

Getting Started with Apache POI XWPF

First of all, you need the Java Development Kit (JDK) installed on your system. If you already have it, go to the Apache POI download page and get the latest stable release as an archive. Extract the contents of the ZIP file into any directory from which the required libraries can be linked to your Java program. That is all!

Referencing Apache POI in a Maven-based Java project is even simpler. Add the following dependency to your pom.xml and let your IDE fetch and reference the Apache POI JAR files.

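Apache POI Maven Dependency

<!-- https://mvnrepository.com/artifact/org.apache.poi/poi-ooxml -->
<!-- The XWPF classes ship in poi-ooxml, which pulls in the core poi artifact transitively -->
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi-ooxml</artifactId>
    <version>4.1.0</version>
</dependency>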

Generate & Edit Word Documents using Java API

Apache POI XWPF enables programmers to create new Word documents in the DOCX file format. Developers can also load an existing Microsoft Word DOCX file and edit it to suit their application's needs. The API lets you add new paragraphs, insert text, apply text alignment and borders, change text styling and more.

Generate a DOCX file from scratch

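import java.io.FileOutputStream;
import java.io.IOException;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
import org.apache.poi.xwpf.usermodel.XWPFRun;

public class CreateDocument {
    public static void main(String[] args) throws IOException {
        // initialize a blank document and open the output file;
        // try-with-resources closes both even if writing fails
        try (XWPFDocument document = new XWPFDocument();
             FileOutputStream out = new FileOutputStream("document.docx")) {
            // create a new paragraph with a single run of text
            XWPFParagraph paragraph = document.createParagraph();
            XWPFRun run = paragraph.createRun();
            run.setText("File Format Developer Guide - " +
                "Learn about computer files that you come across in " +
                "your daily work at: www.fileformat.com");
            // write the document to the file
            document.write(out);
        }
    }
}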
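
Edit an existing DOCX file and style the new text

The following is a minimal sketch of the editing and styling features mentioned above, not a definitive recipe. It assumes Apache POI 4.1.0 with the poi-ooxml artifact on the classpath. The class name StyledParagraphExample, its two helper methods, and the sample text and color are hypothetical and chosen only for illustration. To stay self-contained it keeps the document in memory instead of reading from disk.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;

import org.apache.poi.xwpf.usermodel.Borders;
import org.apache.poi.xwpf.usermodel.ParagraphAlignment;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
import org.apache.poi.xwpf.usermodel.XWPFRun;

class StyledParagraphExample {

    // Hypothetical helper: builds a one-paragraph document and returns its bytes.
    static byte[] createDocument(String text) throws IOException {
        try (XWPFDocument document = new XWPFDocument();
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            document.createParagraph().createRun().setText(text);
            document.write(out);
            return out.toByteArray();
        }
    }

    // Hypothetical helper: re-opens an existing DOCX and appends a centered,
    // bottom-bordered paragraph whose text is bold and colored.
    static byte[] appendStyledParagraph(byte[] docx, String text) throws IOException {
        try (XWPFDocument document = new XWPFDocument(new ByteArrayInputStream(docx));
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            XWPFParagraph paragraph = document.createParagraph();
            paragraph.setAlignment(ParagraphAlignment.CENTER);
            paragraph.setBorderBottom(Borders.SINGLE);

            XWPFRun run = paragraph.createRun();
            run.setBold(true);
            run.setFontSize(14);
            run.setColor("1F4E79"); // hex RGB, an arbitrary sample color
            run.setText(text);

            document.write(out);
            return out.toByteArray();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] original = createDocument("First paragraph");
        byte[] edited = appendStyledParagraph(original, "Added later");
        // Re-open the edited bytes and print every paragraph's text
        try (XWPFDocument check = new XWPFDocument(new ByteArrayInputStream(edited))) {
            for (XWPFParagraph p : check.getParagraphs()) {
                System.out.println(p.getText());
            }
        }
    }
}
```

To edit a file on disk instead, the same calls should work with a FileInputStream for loading and a FileOutputStream for saving.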

Add Paragraph, Image & Table to Word Documents

Apache POI XWPF allows developers to add paragraphs and images to Word documents. The API can also add tables to a DOCX document, both simple and nested, filled with user-defined data.

Create a new DOCX file with a table

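import java.io.FileOutputStream;
import java.io.IOException;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFTable;
import org.apache.poi.xwpf.usermodel.XWPFTableRow;

public class CreateTable {
    public static void main(String[] args) throws IOException {
        // initialize a blank document and open the output file;
        // try-with-resources closes both even if writing fails
        try (XWPFDocument document = new XWPFDocument();
             FileOutputStream out = new FileOutputStream("table.docx")) {
            // create a new table (it starts with one row holding one cell)
            XWPFTable table = document.createTable();

            // fill the first row, adding two more cells for the header
            XWPFTableRow tableRowOne = table.getRow(0);
            tableRowOne.getCell(0).setText("Serial No");
            tableRowOne.addNewTableCell().setText("Products");
            tableRowOne.addNewTableCell().setText("Formats");

            // create second row (gets the same number of cells as the first)
            XWPFTableRow tableRowTwo = table.createRow();
            tableRowTwo.getCell(0).setText("1");
            tableRowTwo.getCell(1).setText("Apache POI XWPF");
            tableRowTwo.getCell(2).setText("DOCX, HTML, FO, TXT, PDF");

            // create third row
            XWPFTableRow tableRowThree = table.createRow();
            tableRowThree.getCell(0).setText("2");
            tableRowThree.getCell(1).setText("Apache POI HWPF");
            tableRowThree.getCell(2).setText("DOC, HTML, FO, TXT");

            // write the document to the file
            document.write(out);
        }
    }
}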
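
Insert an image into a DOCX file

Image insertion, mentioned at the start of this section, can be sketched as follows. This is an illustrative example assuming Apache POI 4.1.0 with poi-ooxml on the classpath. The class ImageInsertExample, its helper methods, the file name "logo.png" and the picture size are hypothetical. The PNG is generated in memory with the standard Java imaging classes so that the example does not depend on an external image file.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import javax.imageio.ImageIO;

import org.apache.poi.openxml4j.exceptions.InvalidFormatException;
import org.apache.poi.util.Units;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFRun;

class ImageInsertExample {

    // Hypothetical helper: draws a solid-color PNG in memory.
    static byte[] samplePng(int width, int height) throws IOException {
        BufferedImage image = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = image.createGraphics();
        g.setColor(Color.ORANGE);
        g.fillRect(0, 0, width, height);
        g.dispose();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        ImageIO.write(image, "png", out);
        return out.toByteArray();
    }

    // Hypothetical helper: creates a document whose single paragraph
    // contains a short label followed by the picture.
    static byte[] createDocumentWithImage(byte[] png)
            throws IOException, InvalidFormatException {
        try (XWPFDocument document = new XWPFDocument();
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            XWPFRun run = document.createParagraph().createRun();
            run.setText("Logo: ");
            // Width and height are given in EMUs; Units.toEMU converts from points
            run.addPicture(new ByteArrayInputStream(png),
                    XWPFDocument.PICTURE_TYPE_PNG, "logo.png",
                    Units.toEMU(100), Units.toEMU(50));
            document.write(out);
            return out.toByteArray();
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] docx = createDocumentWithImage(samplePng(200, 100));
        // Re-open the result and report how many pictures it holds
        try (XWPFDocument check = new XWPFDocument(new ByteArrayInputStream(docx))) {
            System.out.println("Pictures stored: " + check.getAllPictures().size());
        }
    }
}
```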

Extract Text from Word OOXML Document

Apache POI XWPF provides a specialized class, XWPFWordExtractor, that extracts text from Microsoft Word DOCX documents with just a few lines of code. In the same way, the API can also extract headings, footnotes, table data and so on from a Word file.

Extract text from a Word file

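import java.io.FileInputStream;
import java.io.IOException;
import org.apache.poi.xwpf.extractor.XWPFWordExtractor;
import org.apache.poi.xwpf.usermodel.XWPFDocument;

public class ExtractText {
    public static void main(String[] args) throws IOException {
        // load and open the DOCX file; both are closed automatically
        try (FileInputStream fis = new FileInputStream("document.docx");
             XWPFDocument file = new XWPFDocument(fis)) {
            // read and display the text
            XWPFWordExtractor ext = new XWPFWordExtractor(file);
            System.out.println(ext.getText());
        }
    }
}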
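
Read table data from a Word file

Table data can be read by walking the document's tables, rows and cells instead of using the extractor. The sketch below assumes Apache POI 4.1.0 with poi-ooxml on the classpath. The class TableReadExample and its helper methods are hypothetical, and the sample table is built in memory only so that the example has something to read.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFTable;
import org.apache.poi.xwpf.usermodel.XWPFTableCell;
import org.apache.poi.xwpf.usermodel.XWPFTableRow;

class TableReadExample {

    // Hypothetical helper: builds a 2x2 sample table and returns the DOCX bytes.
    static byte[] createSampleTable() throws IOException {
        try (XWPFDocument document = new XWPFDocument();
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            XWPFTable table = document.createTable(2, 2);
            table.getRow(0).getCell(0).setText("Serial No");
            table.getRow(0).getCell(1).setText("Products");
            table.getRow(1).getCell(0).setText("1");
            table.getRow(1).getCell(1).setText("Apache POI XWPF");
            document.write(out);
            return out.toByteArray();
        }
    }

    // Hypothetical helper: returns the text of every cell, one inner list per row,
    // across all top-level tables in the document.
    static List<List<String>> readTables(byte[] docx) throws IOException {
        List<List<String>> rows = new ArrayList<>();
        try (XWPFDocument document = new XWPFDocument(new ByteArrayInputStream(docx))) {
            for (XWPFTable table : document.getTables()) {
                for (XWPFTableRow row : table.getRows()) {
                    List<String> cells = new ArrayList<>();
                    for (XWPFTableCell cell : row.getTableCells()) {
                        cells.add(cell.getText());
                    }
                    rows.add(cells);
                }
            }
        }
        return rows;
    }

    public static void main(String[] args) throws IOException {
        for (List<String> row : readTables(createSampleTable())) {
            System.out.println(String.join(" | ", row));
        }
    }
}
```

Returning plain lists of strings keeps the reading code independent of POI types, which may be convenient when the data is passed on to other parts of an application.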

Add Custom Header & Footer to DOCX Documents

Headers and footers are an important part of a Word document. They usually carry extra information such as dates, page numbers, the author's name and footnotes, which helps keep longer documents organized and easier to read. Apache POI XWPF allows Java developers to add custom headers and footers to Word documents.
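
Create a DOCX file with a header and footer

One possible way to add a default header and footer is sketched below. It assumes Apache POI 4.1.0 with poi-ooxml on the classpath. The class HeaderFooterExample, its helper method and the sample texts are hypothetical and serve only as an illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;

import org.apache.poi.wp.usermodel.HeaderFooterType;
import org.apache.poi.xwpf.usermodel.ParagraphAlignment;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFFooter;
import org.apache.poi.xwpf.usermodel.XWPFHeader;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;

class HeaderFooterExample {

    // Hypothetical helper: creates a document with body text plus a default
    // header and a centered default footer, and returns the DOCX bytes.
    static byte[] createDocument(String headerText, String footerText) throws IOException {
        try (XWPFDocument document = new XWPFDocument();
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            document.createParagraph().createRun().setText("Body text");

            // DEFAULT applies to every page that has no first-page or even-page variant
            XWPFHeader header = document.createHeader(HeaderFooterType.DEFAULT);
            header.createParagraph().createRun().setText(headerText);

            XWPFFooter footer = document.createFooter(HeaderFooterType.DEFAULT);
            XWPFParagraph footerParagraph = footer.createParagraph();
            footerParagraph.setAlignment(ParagraphAlignment.CENTER);
            footerParagraph.createRun().setText(footerText);

            document.write(out);
            return out.toByteArray();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] docx = createDocument("File Format Developer Guide", "www.fileformat.com");
        // Re-open the result and print the stored header and footer text
        try (XWPFDocument check = new XWPFDocument(new ByteArrayInputStream(docx))) {
            System.out.println(check.getHeaderList().get(0).getText());
            System.out.println(check.getFooterList().get(0).getText());
        }
    }
}
```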