Free Optical Character Recognition API for Chinese Manuscripts

An open source C++ OCR library for Chinese manuscripts that uses template-based matching, where characters are compared pixel-wise against known samples.

What is Free C++ OCR?

Optical Character Recognition (OCR) for non-Latin scripts, especially Chinese, presents unique challenges because of the complexity and variety of the characters. The open-source OCR C++ API by Sebastian Starke offers a clean, extensible solution for recognizing printed or handwritten Chinese characters, especially from manuscript sources. Designed with simplicity and adaptability in mind, this lightweight API lets software developers experiment with character recognition using classical image processing methods rather than heavy machine learning models.

This project isn't a full-scale neural OCR engine like Tesseract. Instead, it takes a different route: it uses template-based matching, where characters are matched pixel-wise against known samples. That makes it well suited to educational use, controlled environments, or specific recognition tasks like historical manuscript analysis, traditional Chinese typesets, or simplified handwriting datasets. Because it does not rely on heavyweight dependencies or deep learning libraries, the project is particularly suitable for developers working in low-resource environments, such as embedded Linux systems, Raspberry Pi, or industrial scanning devices.

At A Glance

An overview of OCR features.

Features Overview

Chinese OCR
Add OCR Capabilities
Recognize Image Text
Convert Images of Text
Recognize Font Text
Other Languages Support
Create OCR apps
Image Preprocessing
Extract Text
Multi-threading Support

OCR

OCR supports the popular image file formats listed below.

Reader

PNG, JPEG, BMP, TIFF, TGA, DICOM

Writer

PNG, JPEG, BMP, TIFF

Platform Independence

OCR can be used in any C++-based application.

C++ runtime.

Getting Started with OCR

The recommended way to install OCR is from GitHub. Use the following command to clone the repository.

Install OCR API via GitHub

 git clone https://github.com/sebastianstarke/OCR.git

You can also install it manually by downloading the latest release files directly from the GitHub repository.

Template-Based OCR Engine

The open source C++ API for OCR on Chinese manuscripts is built around a template-based OCR engine that can be used inside C++ apps. At the heart of this library lies a classic image comparison system: character images are binarized and then compared against templates using a distance metric (typically a pixel-wise comparison). For Chinese, this is particularly useful when dealing with consistent calligraphy or printed manuscripts, where the same character looks nearly identical each time it appears.

How to Perform Template-Based OCR in C++ Apps?

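#include <string>
// Also include the library's own header here (its name is not given in this article).

OCR::TemplateCollection templates;
templates.loadFromFolder("templates/"); // Load the preprocessed character templates

OCR::Recognizer recognizer(templates); // Recognizer that matches against those templates
std::string recognizedText = recognizer.recognizeFromImage("scanned_page.png");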
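
To make the idea of pixel-wise comparison more concrete, here is a minimal, dependency-free sketch of template matching on binarized glyphs. It is not the library's implementation: the names BinaryGlyph, GlyphTemplate, pixelDistance and matchGlyph are hypothetical and exist only for this illustration. It assumes every glyph has already been binarized and scaled to the same size as the templates, and it uses the count of differing pixels (Hamming distance) as the distance metric.

```cpp
#include <cstddef>
#include <limits>
#include <string>
#include <vector>

// Hypothetical types for illustration only; not part of the library's API.
struct BinaryGlyph {
    int width;
    int height;
    std::vector<unsigned char> pixels; // row-major, 1 = ink, 0 = background
};

struct GlyphTemplate {
    std::string label; // UTF-8 text of the character this template represents
    BinaryGlyph glyph;
};

// Counts the pixels that differ between two glyphs (Hamming distance).
// Returns the maximum size_t value when the glyphs are not the same size.
std::size_t pixelDistance(const BinaryGlyph& a, const BinaryGlyph& b) {
    if (a.width != b.width || a.height != b.height ||
        a.pixels.size() != b.pixels.size()) {
        return std::numeric_limits<std::size_t>::max();
    }
    std::size_t differing = 0;
    for (std::size_t i = 0; i < a.pixels.size(); ++i) {
        if ((a.pixels[i] != 0) != (b.pixels[i] != 0)) {
            ++differing;
        }
    }
    return differing;
}

// Returns the label of the closest template, or an empty string when no
// template has the same dimensions as the input.
std::string matchGlyph(const BinaryGlyph& input,
                       const std::vector<GlyphTemplate>& templates) {
    std::size_t bestDistance = std::numeric_limits<std::size_t>::max();
    std::string bestLabel;
    for (const GlyphTemplate& candidate : templates) {
        const std::size_t d = pixelDistance(input, candidate.glyph);
        if (d < bestDistance) {
            bestDistance = d;
            bestLabel = candidate.label;
        }
    }
    return bestLabel;
}
```

A caller would build one GlyphTemplate per known character, then pass each segmented glyph to matchGlyph. Because the comparison is a linear scan over all templates, the cost grows with the size of the character set, which is one reason this approach fits controlled, limited vocabularies better than general-purpose recognition.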

Image Preprocessing Support

The open source OCR library supports image preprocessing inside C++ applications. It offers basic operations such as thresholding and cropping to clean up noisy inputs. Chinese manuscripts are often written on aged paper, so image cleanup is essential for accurate results. The following example shows how software developers can perform image preprocessing with just a couple of lines of C++ code.

How to Perform Image Preprocessing before OCR Operations inside C++ Apps?

#include <opencv2/core.hpp> // cv::Mat

OCR::ImageProcessor processor;
cv::Mat cleanImage = processor.binarize("raw_scan.png"); // Threshold the raw scan to black and white
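
For readers who want to see what thresholding and cropping amount to, the following is a small sketch in plain C++ with no external dependencies. It is an illustration under stated assumptions, not the library's code: GrayImage, Box, thresholdToMask and inkBounds are hypothetical names, the input is assumed to be an 8-bit grayscale image with dark ink on a light background, and the threshold is a fixed value chosen by the caller.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical 8-bit grayscale image: 0 = black, 255 = white, row-major.
struct GrayImage {
    int width;
    int height;
    std::vector<std::uint8_t> pixels;
};

// Fixed-threshold binarization: pixels darker than `threshold` become
// ink (1), everything else becomes background (0).
std::vector<std::uint8_t> thresholdToMask(const GrayImage& img,
                                          std::uint8_t threshold) {
    std::vector<std::uint8_t> mask(img.pixels.size());
    std::transform(img.pixels.begin(), img.pixels.end(), mask.begin(),
                   [threshold](std::uint8_t v) {
                       return static_cast<std::uint8_t>(v < threshold ? 1 : 0);
                   });
    return mask;
}

// Inclusive bounding box of all ink pixels; right < left when there is no ink.
struct Box {
    int left;
    int top;
    int right;
    int bottom;
};

// Finds the smallest rectangle containing every ink pixel, which is the
// region a cropping step would keep.
Box inkBounds(const std::vector<std::uint8_t>& mask, int width, int height) {
    Box box{width, height, -1, -1};
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            if (mask[static_cast<std::size_t>(y) * width + x] != 0) {
                box.left = std::min(box.left, x);
                box.top = std::min(box.top, y);
                box.right = std::max(box.right, x);
                box.bottom = std::max(box.bottom, y);
            }
        }
    }
    return box;
}
```

A fixed threshold is the simplest choice. On aged paper with uneven staining, a threshold computed from the image itself (for example from its histogram) may work better, at the cost of more code.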

Morphological Transformations Support

The OCR process begins with a series of morphological transformations. These are fundamental image processing operations that modify the geometry of features in an image. In this context, they are used to clean up the manuscript image by removing noise and preparing the characters for segmentation.
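
As an illustration of what such transformations do, the sketch below implements the two basic operations, erosion and dilation, with a 3x3 square window on a binary mask, and combines them into an "opening" (erosion followed by dilation), a common way to remove isolated specks. This is a generic textbook version written for this article, not the library's code: Mask, apply3x3, erode, dilate and openMask are hypothetical names, and pixels outside the image are assumed to be background.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

using Mask = std::vector<std::uint8_t>; // row-major, 1 = ink, 0 = background

// Shared 3x3 pass. With requireAll = true a pixel is ink only if every pixel
// in its 3x3 window is ink (erosion); with requireAll = false it is ink if
// any pixel in the window is ink (dilation). Pixels outside the image count
// as background.
static Mask apply3x3(const Mask& in, int width, int height, bool requireAll) {
    Mask out(in.size(), 0);
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            bool all = true;
            bool any = false;
            for (int dy = -1; dy <= 1; ++dy) {
                for (int dx = -1; dx <= 1; ++dx) {
                    const int nx = x + dx;
                    const int ny = y + dy;
                    const bool ink =
                        nx >= 0 && nx < width && ny >= 0 && ny < height &&
                        in[static_cast<std::size_t>(ny) * width + nx] != 0;
                    all = all && ink;
                    any = any || ink;
                }
            }
            out[static_cast<std::size_t>(y) * width + x] =
                (requireAll ? all : any) ? 1 : 0;
        }
    }
    return out;
}

// Erosion shrinks ink regions and deletes features smaller than the window.
Mask erode(const Mask& in, int width, int height) {
    return apply3x3(in, width, height, true);
}

// Dilation grows ink regions by one pixel in every direction.
Mask dilate(const Mask& in, int width, int height) {
    return apply3x3(in, width, height, false);
}

// Opening: erosion followed by dilation. Removes small specks while roughly
// preserving the shape of larger strokes.
Mask openMask(const Mask& in, int width, int height) {
    return dilate(erode(in, width, height), width, height);
}
```

Note that a 3x3 opening also erases strokes thinner than three pixels, so on fine brushwork a smaller window, or dilation before erosion (a "closing") to bridge gaps, may be the better choice.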