Open Source Python API to Build Smart OCR Apps
Free Python OCR API to Detect and Recognize Text from Images, Including Natural Scenes, Forms, and Scanned Documents inside Python Apps.
What is MonkeyOCR?
MonkeyOCR is an advanced, end-to-end Optical Character Recognition system built on deep learning for software developers seeking a powerful and flexible solution. Developed by Yuliang Liu, this library enables the precise detection and recognition of text from diverse sources, including natural scenes, forms, and scanned documents. Its modular and scalable architecture merges cutting-edge deep learning techniques with a robust inference pipeline, making it exceptionally suited for real-world text recognition tasks. Practical applications range from invoice scanning and ID card reading to extracting text from signboards and constructing multilingual OCR or PDF-to-data pipelines.
Built for flexibility, MonkeyOCR lets software engineers create intelligent document processing systems without depending on commercial OCR engines. Its features include a fully modular OCR pipeline, simple YAML file configuration, and efficient batch inference support. The system outputs text boxes with coordinates, using modern models such as DBNet++ for detection and CRNN for recognition, all within a configurable pre- and post-processing framework. This combination of modular design, support for contemporary models, and ease of configuration makes MonkeyOCR well suited to real-world applications ranging from enterprise document automation to mobile scene text recognition.
Getting Started with MonkeyOCR
The recommended way to install MonkeyOCR is with pip. Use the following command for a smooth installation.
Install MonkeyOCR via pip
pip install MonkeyOCR
Install MonkeyOCR via GitHub
git clone https://github.com/Yuliang-Liu/MonkeyOCR.git
You can also install it manually by downloading the latest release files directly from the GitHub repository.
Extracting Text from a Receipt Image via Python
The open source MonkeyOCR library is an end-to-end Optical Character Recognition system based on deep learning techniques. Software developers working on apps that scan documents, IDs, receipts, or license plates can plug MonkeyOCR directly into their backend pipeline. Thanks to its modular design, you can use the detection model on its own or combine it with recognition to extract structured text from images. Here is a simple example that demonstrates how to extract text from a receipt image using the Python API.
How to Extract Text from a Receipt Image via Python API?
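from monkey_ocr.ocr_predict import OCRPredictor

# Load the detection and recognition weights
ocr = OCRPredictor(det_model_path="weights/dbnet.pth", rec_model_path="weights/crnn.pth")

# Run detection and recognition on the receipt image
results = ocr.predict("receipt.jpg")

# Print each recognized line of text
for line in results:
    print(line['text'])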
Custom OCR Pipelines for Specific Use Cases
One of the biggest strengths of the open source MonkeyOCR library is its modular architecture. Software developers can mix and match components such as detection, recognition, and classification models based on their application requirements. For example, a document scanning app can pair a lightweight model like DBNet for detection with CRNN for recognition to balance speed and accuracy.
How to Build Custom OCR Pipelines via Python API?
from monkey_ocr.ocr_predict import OCRPredictor

ocr = OCRPredictor(
    det_model_path="weights/dbnet.pth",
    rec_model_path="weights/crnn.pth"
)
results = ocr.predict("form_image.jpg")
for item in results:
    print(item["text"], item["box"])
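Detection results are not guaranteed to arrive in reading order, so a small post-processing step is often useful. The following is a minimal sketch, not part of MonkeyOCR itself. It assumes each result is a dict with a "text" key and a "box" key holding [x_min, y_min, x_max, y_max] in pixels; the actual box layout may differ, so adapt the indices accordingly. The function name sort_reading_order and the line_tolerance parameter are hypothetical.

```python
from typing import Dict, List


def sort_reading_order(items: List[Dict], line_tolerance: float = 10) -> List[Dict]:
    """Order OCR items top-to-bottom, then left-to-right within each text line.

    Assumes item["box"] is [x_min, y_min, x_max, y_max] (an assumption, not a
    documented MonkeyOCR format).
    """
    # Sort by the top edge so items on the same visual line end up adjacent.
    by_top = sorted(items, key=lambda it: it["box"][1])

    lines: List[List[Dict]] = []
    for item in by_top:
        top = item["box"][1]
        # Compare against the first item of the current line to avoid drift.
        if lines and abs(top - lines[-1][0]["box"][1]) <= line_tolerance:
            lines[-1].append(item)
        else:
            lines.append([item])

    ordered: List[Dict] = []
    for line in lines:
        # Within a line, read from left to right.
        ordered.extend(sorted(line, key=lambda it: it["box"][0]))
    return ordered


sample = [
    {"text": "World", "box": [120, 12, 200, 30]},
    {"text": "Hello", "box": [10, 10, 100, 30]},
    {"text": "Total", "box": [10, 60, 80, 80]},
]
print(" ".join(it["text"] for it in sort_reading_order(sample)))
```

The tolerance controls how far apart two top edges may be while still counting as the same line; a value close to half the typical text height is a reasonable starting point.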
Integration with Business Software
The open source MonkeyOCR library can also be plugged into enterprise document workflows, such as automating data entry in ERP or CRM systems. Software developers can run MonkeyOCR in the background to process scanned PDFs or image-based documents uploaded by users and automatically extract structured information. By configuring MonkeyOCR through a shared config.yaml file, teams can keep behavior consistent across deployments.
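A background job of this kind can be sketched as a function that walks an upload folder and writes the recognized text to a CSV file for import into another system. This is an illustrative sketch only: the function name export_folder_to_csv, the CSV layout, and the list of image suffixes are assumptions, and the OCR step is passed in as a callable (for example, the predict method of a configured predictor) that is assumed to return dicts with a "text" key.

```python
import csv
from pathlib import Path
from typing import Callable, Dict, Iterable

# Hypothetical set of file types to treat as images.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".tif", ".tiff"}


def export_folder_to_csv(
    folder: str,
    csv_path: str,
    recognize: Callable[[str], Iterable[Dict]],
) -> int:
    """Run `recognize` on every image in `folder` and write one CSV row per text line.

    Returns the number of text rows written (excluding the header).
    """
    rows_written = 0
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["file", "text"])
        # Sort for a deterministic processing order.
        for image in sorted(Path(folder).iterdir()):
            if image.suffix.lower() not in IMAGE_SUFFIXES:
                continue  # skip non-image uploads
            for item in recognize(str(image)):
                writer.writerow([image.name, item["text"]])
                rows_written += 1
    return rows_written
```

Passing the OCR step in as a callable keeps the export logic independent of any particular OCR engine and makes it easy to test with a stub.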
Build Automated Form Readers
By combining MonkeyOCR’s text detection with positional data (bounding boxes), developers can design intelligent form readers that locate fields (e.g., “Name”, “Date”, “Amount”) and extract associated data. This is ideal for tax documents, medical forms, or surveys.
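One simple way to pair a field label with its value is to look for the nearest text box to the right of the label on the same line. The sketch below illustrates that idea under stated assumptions: each OCR result is a dict with "text" and "box" keys, and "box" is [x_min, y_min, x_max, y_max] in pixels. The function find_field_value and its line_tolerance parameter are hypothetical helpers, not part of MonkeyOCR.

```python
from typing import Dict, List, Optional


def find_field_value(
    items: List[Dict], label: str, line_tolerance: float = 10
) -> Optional[str]:
    """Return the text of the closest box to the right of `label`, or None.

    Assumes item["box"] is [x_min, y_min, x_max, y_max] (an assumption).
    """
    wanted = label.strip().rstrip(":").lower()
    # Find the box whose text matches the label, ignoring case and a trailing colon.
    label_item = next(
        (it for it in items if it["text"].strip().rstrip(":").lower() == wanted),
        None,
    )
    if label_item is None:
        return None

    label_right = label_item["box"][2]
    label_center = (label_item["box"][1] + label_item["box"][3]) / 2

    candidates = []
    for it in items:
        if it is label_item:
            continue
        center = (it["box"][1] + it["box"][3]) / 2
        same_line = abs(center - label_center) <= line_tolerance
        to_the_right = it["box"][0] >= label_right
        if same_line and to_the_right:
            candidates.append(it)

    if not candidates:
        return None
    # The value is taken to be the nearest box to the right of the label.
    return min(candidates, key=lambda it: it["box"][0])["text"]


form = [
    {"text": "Name:", "box": [10, 10, 60, 30]},
    {"text": "Jane Doe", "box": [70, 11, 160, 31]},
    {"text": "Date:", "box": [10, 50, 60, 70]},
    {"text": "2024-01-15", "box": [70, 50, 170, 70]},
]
print(find_field_value(form, "Name"))
```

Forms that place values below their labels would need a different rule, such as searching for the nearest box underneath the label instead of to its right.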