Process Microsoft Word Binary Documents

Create, Read, Manipulate & Convert DOC files via Open Source Java Library.

Apache POI HWPF is the Apache POI port of the Microsoft Word DOC (binary) file format to pure Java. It reads and writes DOC files without needing any additional libraries, and it offers limited read-only support for the older Word 6 and Word 95 file formats. At this stage, HWPF is mainly concerned with formatted text: it provides basic text extraction, specific text extraction, access to headers and footers, and changing text features.

It helps developers create MS Word documents, with the ability to manipulate paragraphs, apply different styles to text, add a table, extract text, and much more.

Getting Started with Apache POI HWPF

First of all, you need to have the Java Development Kit (JDK) installed on your system. If you already have it, proceed to the Apache POI download page and get the latest stable release as an archive. Extract the contents of the ZIP file into any directory from which the required libraries can be linked to your Java program. That is all!

Referencing Apache POI in your Maven-based Java project is even simpler. All you need is to add the following dependency in your pom.xml and let your IDE fetch and reference the Apache POI Jar files.

Apache POI Maven Dependency

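<!-- https://mvnrepository.com/artifact/org.apache.poi/poi -->
<!-- note: the HWPF classes ship in the separate poi-scratchpad artifact of the same group -->
<dependency>
  <groupId>org.apache.poi</groupId>
  <artifactId>poi</artifactId>
  <!-- poi.version is a placeholder property; set it to the release you want to use -->
  <version>${poi.version}</version>
</dependency>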
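
Once the JAR files are linked, manually or through Maven, a quick sanity check is to ask the class loader whether the main HWPF class can be found. The following is a minimal sketch that uses only the JDK; the ClasspathCheck class and its isOnClasspath method are hypothetical names chosen for this illustration and are not part of Apache POI.

```java
class ClasspathCheck {
    // Returns true if the named class can be located by this class's class loader.
    // The class is looked up without being initialized.
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }
}
```

Calling ClasspathCheck.isOnClasspath("org.apache.poi.hwpf.HWPFDocument") should return true once the libraries are set up correctly, and false if the JAR that contains HWPF is missing.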

Create and Modify Word Documents using Java APIs

Apache POI HWPF enables programmers to produce new Word documents in the DOC file format and to modify existing Word documents according to their own needs. Because the library cannot create a DOC file from scratch, a new document starts from an existing (for example, empty) DOC file. The API supports adding paragraphs to a Word document, applying text alignment and font styles, and much more.

Modify DOC file - Java

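import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.OutputStream;
import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.usermodel.CharacterRun;
import org.apache.poi.hwpf.usermodel.Range;

// open an empty DOC file; Apache POI cannot create the .doc file format from scratch
HWPFDocument doc = new HWPFDocument(new FileInputStream("empty.doc"));
Range range = doc.getRange();
// insert text
CharacterRun run = range.insertAfter("File Format Developer Guide - " +
"Learn about computer files that you come across in " +
"your daily work at: www.fileformat.com ");
// save document (HWPF writes the DOC format, so the output keeps the .doc extension)
OutputStream out = new FileOutputStream("document.doc");
doc.write(out);
out.close();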

Convert Word Documents to Other Formats using Java

Apache POI HWPF enables software developers to convert Microsoft Word documents to other supported file formats with ease. At the moment, Java developers can convert Word documents to HTML, FO (XSL-FO), and plain text. The org.apache.poi.hwpf.converter package contains the Word-to-HTML and Word-to-FO converters.

Convert DOC to HTML

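import java.io.FileInputStream;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.apache.poi.hwpf.HWPFDocumentCore;
import org.apache.poi.hwpf.converter.WordToHtmlConverter;
import org.apache.poi.hwpf.converter.WordToHtmlUtils;
import org.w3c.dom.Document;

// load document
HWPFDocumentCore wordDocument = WordToHtmlUtils.loadDoc(new FileInputStream("document.doc"));
// create the empty DOM document that will receive the HTML tree
Document newDocument = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
// initialize WordToHtmlConverter
WordToHtmlConverter wordToHtmlConverter = new WordToHtmlConverter(newDocument);
// process document
wordToHtmlConverter.processDocument(wordDocument);
// serialize the generated DOM tree as HTML
StringWriter stringWriter = new StringWriter();
Transformer transformer = TransformerFactory.newInstance().newTransformer();
transformer.setOutputProperty(OutputKeys.INDENT, "yes");
transformer.setOutputProperty(OutputKeys.ENCODING, "utf-8");
transformer.setOutputProperty(OutputKeys.METHOD, "html");
transformer.transform(
    new DOMSource(wordToHtmlConverter.getDocument()),
    new StreamResult(stringWriter));
// get html
String html = stringWriter.toString();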
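
The last step of the conversion, turning a DOM tree into an HTML string, relies only on the JDK's XML APIs, so it can be tried in isolation without a DOC file. The sketch below assumes a hypothetical helper class named DomToHtml; its sampleDocument method merely builds a small stand-in tree, whereas in a real conversion the DOM would come from the converter's getDocument() call.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class DomToHtml {
    // Serializes a DOM tree to a string using the "html" output method.
    static String serialize(Document dom) {
        try {
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.METHOD, "html");
            transformer.setOutputProperty(OutputKeys.ENCODING, "utf-8");
            StringWriter writer = new StringWriter();
            transformer.transform(new DOMSource(dom), new StreamResult(writer));
            return writer.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("HTML serialization failed", e);
        }
    }

    // Builds a tiny stand-in document (html > body > p) for demonstration only.
    static Document sampleDocument() {
        try {
            Document dom = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element html = dom.createElement("html");
            Element body = dom.createElement("body");
            Element paragraph = dom.createElement("p");
            paragraph.setTextContent("Hello from a DOM tree");
            body.appendChild(paragraph);
            html.appendChild(body);
            dom.appendChild(html);
            return dom;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("Could not create a DOM document", e);
        }
    }
}
```

Wrapping the checked exceptions keeps the helper easy to call from other code; whether that is appropriate depends on how the surrounding application handles errors.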

Read Text From DOC File

Apache POI HWPF provides the WordExtractor class to read text from Microsoft Word DOC files. You can extract the text of a file with only a few lines of code.

Extract text from a DOC file

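import java.io.File;
import java.io.FileInputStream;
import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.extractor.WordExtractor;

// load DOC file
FileInputStream fis = new FileInputStream(new File("document.doc"));
// open file
HWPFDocument doc = new HWPFDocument(fis);
// read text
WordExtractor extractor = new WordExtractor(doc);
// display text
System.out.println(extractor.getText());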
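
HWPF handles the binary DOC format, which is stored inside an OLE2 compound file. Before loading an arbitrary file, it can help to check the file signature so that unrelated input (for example, a ZIP-based file) is rejected early. The following is a minimal sketch that uses only the JDK (Java 11 or later for readNBytes); the Ole2Check class and its methods are hypothetical names, not part of Apache POI. A matching signature only shows that the container is OLE2, not that the file is a Word document.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

class Ole2Check {
    // Eight-byte signature found at the start of every OLE2 compound file.
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    // Checks whether the given leading bytes start with the OLE2 signature.
    static boolean looksLikeOle2(byte[] head) {
        return head.length >= OLE2_MAGIC.length
            && Arrays.equals(Arrays.copyOf(head, OLE2_MAGIC.length), OLE2_MAGIC);
    }

    // Reads the first eight bytes of the file and checks them.
    static boolean looksLikeOle2(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            return looksLikeOle2(in.readNBytes(OLE2_MAGIC.length));
        }
    }
}
```

A caller could run Ole2Check.looksLikeOle2(Path.of("document.doc")) first and only construct the HWPFDocument when it returns true.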

Add Custom Header & Footer to DOC

Apache POI HWPF enables Java developers to access and manage custom headers and footers inside Word documents; the HeaderStories class returns the header and footer text that applies to a given page. Apache POI HWPF is described as "moderately functional": it supports basic text extraction, specific text extraction, access to headers and footers, and changing text features. For body text, the getText() method of WordExtractor returns the text of all the paragraphs, while getParagraphText() fetches the text of each paragraph in turn.

Manage Custom Header & Footer in Word DOC File

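import java.io.FileInputStream;
import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.usermodel.HeaderStories;
import org.apache.poi.poifs.filesystem.POIFSFileSystem;

// The path to the documents directory (Utils.getDataDir is a helper from the sample project, not part of Apache POI).
String dataDir = Utils.getDataDir(ApacheHeaders.class);

// open the DOC file
POIFSFileSystem fs = new POIFSFileSystem(new FileInputStream(dataDir + "MyHeader.doc"));
HWPFDocument doc = new HWPFDocument(fs);

int pageNumber = 1;

// read the header and footer text that apply to the given page
HeaderStories headerStore = new HeaderStories(doc);
String header = headerStore.getHeader(pageNumber);
String footer = headerStore.getFooter(pageNumber);

System.out.println("Header Is: " + header);
System.out.println("Footer Is: " + footer);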