Apache POI HDGF
Open Source Java Library for Visio Diagrams
Read and extract text stored in the Microsoft Visio binary format via free Java APIs.
Apache POI HDGF (Horrible DiaGram Format) is a pure Java implementation of the Microsoft Visio binary (VSD) file format. The module is young and its capabilities are currently limited. It provides low-level access to the streams, chunks, and chunk commands of a file, along with a way to extract the file's textual content.
Getting Started with Apache POI HDGF
First of all, you need to have the Java Development Kit (JDK) installed on your system. If you already have it, proceed to the Apache POI download page and get the latest stable release as an archive. Extract the contents of the ZIP file into any directory from which the required libraries can be linked to your Java program. That is all!
Referencing Apache POI in a Maven-based Java project is even simpler. All you need to do is add the following dependency to your pom.xml and let your IDE fetch and reference the Apache POI JAR files. HDGF ships in the poi-scratchpad artifact.
Apache POI Maven Dependency
<!-- https://mvnrepository.com/artifact/org.apache.poi/poi-scratchpad -->
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi-scratchpad</artifactId>
    <version>4.1.0</version>
</dependency>
Extract Textual Content from Visio Diagram via Java
Apache POI provides basic text extraction for all the file formats the project supports, and POI-HDGF does the same for Visio files. To cover the text of a whole diagram, every part of the file has to be visited. The VisioTextExtractor class does this by locating all the text entries in a Visio file and returning their contents. In the returned text of the file, each textual object's text is separated by a newline.
Extract Text from VSD - Java
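import java.io.FileInputStream;
import org.apache.poi.hdgf.extractor.VisioTextExtractor;
// open the VSD file; try-with-resources closes the extractor afterwards
try (VisioTextExtractor extractor = new VisioTextExtractor(new FileInputStream("sample.vsd"))) {
    // getText() returns one newline-separated String; getAllText() returns a String[]
    System.out.println(extractor.getText());
}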
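The individual text entries are also available as a String array from getAllText(), which is handy when the entries need tidying before use. The following is a minimal sketch of such post-processing, not part of Apache POI: the class name VisioTextEntries and its methods are hypothetical, and it assumes only that the entries arrive as a String[].

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Apache POI: tidies up raw text entries
// such as the String[] returned by VisioTextExtractor.getAllText().
final class VisioTextEntries {
    private VisioTextEntries() {}

    // Trims each entry and drops entries that are null or blank after trimming.
    static List<String> clean(String[] entries) {
        List<String> cleaned = new ArrayList<>();
        for (String entry : entries) {
            if (entry == null) {
                continue;
            }
            String trimmed = entry.trim();
            if (!trimmed.isEmpty()) {
                cleaned.add(trimmed);
            }
        }
        return cleaned;
    }

    // Joins the cleaned entries into a single String, one entry per line.
    static String joinLines(String[] entries) {
        return String.join("\n", clean(entries));
    }
}
```

With an extractor opened as above, a call such as VisioTextEntries.joinLines(extractor.getAllText()) would yield the diagram text without blank entries or trailing line breaks.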
Java APIs to Access & Read Microsoft Visio Diagrams
Apache POI-HDGF enables programmers to access Visio documents in the VSD file format and read the contents of a Visio diagram. Because the API is at a very early stage, the available features are currently limited.
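Since HDGF reads only the binary VSD format, which is stored as an OLE2 compound file, it can help to check a file's leading bytes before handing it to the library. The sketch below uses only the Java standard library; the class VsdFormatCheck and its methods are hypothetical names, not POI APIs. The check is necessary but not sufficient, because other OLE2 documents (for example legacy Word or Excel files) start with the same signature.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of Apache POI.
final class VsdFormatCheck {
    // Signature bytes found at the start of every OLE2 compound file.
    private static final byte[] OLE2_SIGNATURE = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    private VsdFormatCheck() {}

    // Returns true if the given bytes start with the OLE2 signature.
    static boolean hasOle2Signature(byte[] header) {
        if (header == null || header.length < OLE2_SIGNATURE.length) {
            return false;
        }
        for (int i = 0; i < OLE2_SIGNATURE.length; i++) {
            if (header[i] != OLE2_SIGNATURE[i]) {
                return false;
            }
        }
        return true;
    }

    // Reads the first eight bytes of the file and checks them against the signature.
    static boolean looksLikeOle2File(Path file) throws IOException {
        byte[] header = new byte[OLE2_SIGNATURE.length];
        try (InputStream in = Files.newInputStream(file)) {
            int read = 0;
            while (read < header.length) {
                int n = in.read(header, read, header.length - read);
                if (n < 0) {
                    break; // file is shorter than the signature
                }
                read += n;
            }
            return read == header.length && hasOle2Signature(header);
        }
    }
}
```

A file that fails this check, such as a ZIP-based VSDX document, is not in the binary format that HDGF targets.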