Open Source Python API to Build Smart OCR Apps

Free Python OCR API to Detect and Recognize Text from Images, Including Natural Scenes, Forms, and Scanned Documents inside Python Apps.

MonkeyOCR is an end-to-end Optical Character Recognition (OCR) system based on deep learning techniques. It allows software developers to detect and recognize text in images, including natural scenes, forms, and scanned documents. Developed by Yuliang Liu, the library offers a flexible way to integrate OCR into applications. Its modular, scalable architecture pairs deep learning models with a practical inference pipeline, which suits a wide range of real-world text recognition tasks. Typical applications include invoice and receipt scanning, ID card and passport reading, text extraction from natural scenes (e.g., signboards), multilingual OCR projects, and PDF-to-structured-data pipelines.

Built with flexibility and modularity in mind, MonkeyOCR is designed for software engineers who want to build intelligent document processing systems without relying on commercial or closed-source OCR engines. Its features include a modular OCR pipeline, configuration via YAML files, batch inference, text box output with coordinates, text detection using DBNet++ or PSENet, text recognition using CRNN or SAR models, and configurable preprocessing and post-processing pipelines. Together, these make it well suited to building real-world OCR applications ranging from document automation to mobile-based scene recognition.

Getting Started with MonkeyOCR

The recommended way to install MonkeyOCR is with pip. Use the following command to install it.

Install MonkeyOCR via pip

 pip install MonkeyOCR 

Install MonkeyOCR via GitHub

 git clone https://github.com/Yuliang-Liu/MonkeyOCR.git 

You can also install it manually by downloading the latest release files directly from the GitHub repository.

Extracting Text from a Receipt Image via Python

Software developers working on apps that scan documents, IDs, receipts, or license plates can plug MonkeyOCR directly into their backend pipeline. Thanks to its modular design, you can use the detection model on its own or combine it with recognition to extract structured text from images. The following simple example demonstrates how to extract text from a receipt image using the Python API.

How to Extract Text from a Receipt Image via Python API?

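from monkey_ocr.ocr_predict import OCRPredictor

# Load the detection (DBNet) and recognition (CRNN) weights
ocr = OCRPredictor(det_model_path="weights/dbnet.pth", rec_model_path="weights/crnn.pth")

# Run detection and recognition on the receipt image
results = ocr.predict("receipt.jpg")

# Print the recognized text of each detected line
for line in results:
    print(line['text'])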

Custom OCR Pipelines for Specific Use Cases

One of the biggest strengths of the open source MonkeyOCR library is its modular architecture. Software developers can mix and match components such as detection, recognition, and classification models based on their application requirements. For example, a document scanning app can pair a lightweight detection model such as DBNet with CRNN for recognition to balance speed and accuracy.

How to Build Custom OCR Pipelines via Python API?

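from monkey_ocr.ocr_predict import OCRPredictor

# Choose the detection and recognition models for this pipeline
ocr = OCRPredictor(
    det_model_path="weights/dbnet.pth",
    rec_model_path="weights/crnn.pth"
)

# Each result carries the recognized text and its bounding box
results = ocr.predict("form_image.jpg")
for item in results:
    print(item["text"], item["box"])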
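The mix-and-match idea can also be sketched independently of MonkeyOCR's internals. The example below is a minimal illustration, not the library's actual API: `OcrPipeline`, `fake_detector`, and `fake_recognizer` are hypothetical names, and the `(x_min, y_min, x_max, y_max)` box layout is an assumption. It shows how a detection stage and a recognition stage can be swapped freely as long as each keeps the same call signature.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Assumed box layout for this sketch: (x_min, y_min, x_max, y_max)
Box = Tuple[int, int, int, int]


@dataclass
class OcrPipeline:
    """Hypothetical two-stage pipeline: any detector plus any recognizer."""
    detect: Callable[[str], List[Box]]
    recognize: Callable[[str, Box], str]

    def run(self, image_path: str) -> List[Dict[str, object]]:
        # Detect text regions first, then recognize the text in each region
        results = []
        for box in self.detect(image_path):
            text = self.recognize(image_path, box)
            results.append({"text": text, "box": box})
        return results


# Stand-in components so the sketch runs without any model weights
def fake_detector(image_path: str) -> List[Box]:
    return [(10, 10, 120, 30), (10, 40, 90, 60)]


def fake_recognizer(image_path: str, box: Box) -> str:
    return f"text@{box[0]},{box[1]}"


pipeline = OcrPipeline(detect=fake_detector, recognize=fake_recognizer)
pipeline_results = pipeline.run("form_image.jpg")
for item in pipeline_results:
    print(item["text"], item["box"])
```

Replacing `fake_detector` or `fake_recognizer` with a wrapper around a real model changes the pipeline's behavior without touching the rest of the application code.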

Integration with Business Software

The open source MonkeyOCR library can also be plugged into enterprise document workflows, such as automating data entry in ERP or CRM systems. Software developers can run MonkeyOCR in the background to process scanned PDFs or image-based documents uploaded by users, automatically extracting structured information. By configuring MonkeyOCR with a config.yaml file, teams can maintain consistency across different deployments.
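A background job of this kind might look roughly like the sketch below. It assumes only that the OCR step is a callable that takes an image path and returns a list of dictionaries with a `text` key, as in the examples above; `find_images`, `process_uploads`, and `stub_predict` are hypothetical helpers introduced for illustration, and the stub stands in for a real predictor.

```python
from pathlib import Path
from typing import Callable, Dict, Iterable, List, Tuple

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".tif", ".tiff"}


def find_images(folder: Path) -> List[Path]:
    """Return the image files in an upload folder, sorted by name."""
    return sorted(
        p for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )


def process_uploads(
    paths: Iterable[Path],
    predict: Callable[[str], List[dict]],
) -> Tuple[Dict[str, List[str]], List[str]]:
    """Run OCR on each file; collect text per file and the names that failed."""
    extracted: Dict[str, List[str]] = {}
    failed: List[str] = []
    for path in paths:
        try:
            extracted[path.name] = [item["text"] for item in predict(str(path))]
        except Exception:
            # One unreadable upload should not stop the whole batch
            failed.append(path.name)
    return extracted, failed


# Stub predictor (an assumption) so the sketch runs without model weights
def stub_predict(image_path: str) -> List[dict]:
    if image_path.endswith("bad.png"):
        raise ValueError("unreadable image")
    return [{"text": "Invoice 42", "box": (0, 0, 10, 10)}]


texts, failures = process_uploads([Path("a.jpg"), Path("bad.png")], stub_predict)
print(texts, failures)
```

In a real deployment, `find_images` would supply the paths from the upload directory, and the extracted text would be handed to the ERP or CRM import step.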

Build Automated Form Readers

By combining MonkeyOCR’s text detection with positional data (bounding boxes), developers can design intelligent form readers that locate fields (e.g., “Name”, “Date”, “Amount”) and extract associated data. This is ideal for tax documents, medical forms, or surveys.
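One simple way to pair labels with values is sketched below. It assumes OCR results arrive as dictionaries with `text` and `box` keys, with each box laid out as `(x_min, y_min, x_max, y_max)`; that layout, the helper names `vertical_overlap` and `read_field`, and the sample data are assumptions made for illustration. The heuristic takes the nearest text box to the right of a label on the same line, which fits simple single-column forms but not every layout.

```python
from typing import Dict, List, Optional, Sequence


def vertical_overlap(a: Sequence[int], b: Sequence[int]) -> float:
    """Fraction of the shorter box's height that the two boxes share."""
    overlap = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    shorter = min(a[3] - a[1], b[3] - b[1])
    return overlap / shorter if shorter > 0 else 0.0


def read_field(items: List[Dict], label: str) -> Optional[str]:
    """Return the text of the nearest box to the right of `label`, if any."""
    label_item = next(
        (it for it in items
         if it["text"].strip().rstrip(":").lower() == label.lower()),
        None,
    )
    if label_item is None:
        return None
    lb = label_item["box"]
    # Keep boxes that start at or after the label's right edge on the same line
    candidates = [
        it for it in items
        if it is not label_item
        and it["box"][0] >= lb[2]
        and vertical_overlap(lb, it["box"]) >= 0.5
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda it: it["box"][0] - lb[2])["text"]


# Hand-written sample data standing in for OCR output
items = [
    {"text": "Name:", "box": (10, 10, 60, 30)},
    {"text": "Jane Doe", "box": (70, 11, 160, 31)},
    {"text": "Date:", "box": (10, 40, 60, 60)},
    {"text": "2024-01-15", "box": (70, 41, 170, 61)},
    {"text": "Amount:", "box": (10, 70, 80, 90)},
    {"text": "120.50", "box": (90, 70, 150, 90)},
]

fields = {label: read_field(items, label) for label in ("Name", "Date", "Amount")}
print(fields)
```

Forms that place values below their labels would need a different neighbor rule, for example searching downward with a horizontal-overlap check instead.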