Detect & Extract Marked Areas from Scanned Sheets via Python

Free, Open Source Python-Based OMR Engine. It Supports Processing OMR Sheets, Template-Based Detection, Handling Multi-Page Forms and More.

What is PyOMR Library?

Optical Mark Recognition (OMR) is a powerful technology used to capture human-marked data from documents like surveys, tests, and ballots. While commercial OMR solutions exist, developers looking for a free, open-source alternative can leverage PyOMR, a Python library designed for OMR processing. Developed by pattyjogal, PyOMR enables software developers to detect marked bubbles on scanned forms using simple Python code and standard image processing techniques. It supports detecting and extracting marked areas from scanned sheets, processing multiple-choice forms, surveys, and assessments as well as customizing OMR templates for various use cases.

PyOMR is a Python-based OMR engine designed to detect filled-in marks on predefined forms, such as MCQ test sheets or survey responses. It leverages libraries like OpenCV for image processing and provides a template-based system to define expected mark regions on the sheet. Whether you're a software developer building a grading system, a teacher automating test evaluations, or an organization collecting survey data, PyOMR allows you to plug OMR functionality directly into your app. With a template-based approach, OpenCV-powered detection, and full scriptability, it’s an ideal choice for developers who want to build automated form processing tools without relying on third-party services.

At A Glance

An overview of PyOMR features.

Features Overview

Detect Marked Areas
Extract Marked Areas
Exam Bubble Sheet Scorer
Handling Multi-Page Forms
Recognize Marks Image
Use Rejection Mechanism
Images of Text Export
Recognized Font Text
Printable Multiple Choice Sheet
Save Data to Browser
Extract Text
Multi-Language Support
Noise Handling and Preprocessing

OMR

PyOMR supports popular image file formats listed below.

Reader

PNG, JPEG, BMP, TIFF, TGA, DICOM

Writer

PNG, JPEG, BMP, TIFF

Platform Independence

PyOMR works with the Python programming language:

Python 2.6 and above.

Getting Started with PyOMR

The recommended way to install the PyOMR library is from GitHub. Use the following commands for a smooth installation.

Install PyOMR Library via GitHub

git clone https://github.com/pattyjogal/pyomr.git
cd pyomr

Install Required Libraries via pip

pip install opencv-python numpy

You can download the library directly from GitHub

Automatic Image Preprocessing via Python

The open source PyOMR library includes support for automatic image preprocessing inside Python applications. It provides built-in functions for deskewing (correcting tilted scans), thresholding (converting the image to black and white for better mark detection), and noise removal (improving accuracy on low-quality scans). These steps help the tool handle common issues in scanned forms such as lighting inconsistencies, low contrast, or minor misalignment. The following code example shows how to process an image inside Python applications.

How to Process an Image inside Python Apps?

from omr import omr_engine

# File paths
template_path = "config.json"
image_path = "answer_sheet.jpg"

# Run the OMR process
results = omr_engine.process_image(image_path, template_path)

# Output the results
for i, response in enumerate(results, 1):
    print(f"Q{i}: {response if response else 'No mark detected'}")
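To make the thresholding step more concrete, here is a minimal sketch in plain Python of Otsu's method, a common way to pick a black-and-white cutoff automatically. It is an illustration only and is not taken from PyOMR's source; the function names otsu_threshold and binarize are hypothetical, and the image is assumed to be a list of rows of 0-255 grayscale values.

```python
def otsu_threshold(gray):
    """Return the 0-255 cutoff that maximizes between-class variance (Otsu)."""
    hist = [0] * 256
    for row in gray:
        for value in row:
            hist[value] += 1
    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_threshold, best_variance = 0, -1.0
    weight_bg, sum_bg = 0, 0
    for t in range(256):
        weight_bg += hist[t]            # pixels with value <= t
        if weight_bg == 0:
            continue
        weight_fg = total - weight_bg   # pixels with value > t
        if weight_fg == 0:
            break
        sum_bg += t * hist[t]
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        variance = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if variance > best_variance:
            best_variance, best_threshold = variance, t
    return best_threshold


def binarize(gray, threshold):
    """Mark dark pixels (ink) as 1 and light pixels (paper) as 0."""
    return [[1 if value <= threshold else 0 for value in row] for row in gray]


# Tiny hypothetical scan: light paper with a few dark pencil pixels.
gray = [
    [250, 250, 20],
    [240, 30, 25],
    [245, 250, 250],
]
threshold = otsu_threshold(gray)
binary = binarize(gray, threshold)
```

In practice the same idea is available in OpenCV through cv2.threshold with the cv2.THRESH_OTSU flag, which is far faster on full-size scans.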

Template-Based Detection via Python

PyOMR uses a JSON template to define the layout of your answer sheet. The template includes the number of questions, the number of options per question, the start position of the first bubble, the spacing between rows and columns, and the bubble dimensions. This makes the system flexible and reusable across different layouts. Here is a simple example showing how to create a JSON template that specifies bubble positions and load it using the Python library.

How to Create a JSON Template to specify Bubble Positions & Load it via Python?

{
  "bubbles": [
    {"question": "Q1", "options": ["A", "B", "C", "D"], "coords": [100, 150, 30, 30]},
    {"question": "Q2", "options": ["A", "B", "C", "D"], "coords": [100, 200, 30, 30]}
  ],
  "reference_marks": [[50, 50], [500, 50], [50, 700]]
}

# Load the template (assumes the JSON above is saved as template.json)
omr.load_template("template.json")
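As a rough illustration of how such a template could be turned into per-option bubble rectangles, the sketch below parses the same JSON with the standard json module. It does not use PyOMR itself. The helper bubble_boxes and its gap parameter are hypothetical, and the code assumes that "coords" means [x, y, width, height] of the first option's bubble with the other options laid out to the right; the source does not state the exact meaning of these fields.

```python
import json

TEMPLATE_JSON = """
{
  "bubbles": [
    {"question": "Q1", "options": ["A", "B", "C", "D"], "coords": [100, 150, 30, 30]},
    {"question": "Q2", "options": ["A", "B", "C", "D"], "coords": [100, 200, 30, 30]}
  ],
  "reference_marks": [[50, 50], [500, 50], [50, 700]]
}
"""


def bubble_boxes(template, gap=10):
    """Map each question to {option: (x, y, w, h)}.

    Assumption: "coords" is the first option's bubble, and each following
    option sits `gap` pixels to the right of the previous one.
    """
    boxes = {}
    for entry in template["bubbles"]:
        x, y, w, h = entry["coords"]
        boxes[entry["question"]] = {
            option: (x + index * (w + gap), y, w, h)
            for index, option in enumerate(entry["options"])
        }
    return boxes


template = json.loads(TEMPLATE_JSON)
boxes = bubble_boxes(template)
print(boxes["Q1"]["C"])  # (180, 150, 30, 30)
```

Keeping the layout in data rather than in code is what lets one detection routine serve many different sheet designs.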

Handling Multi-Page Forms via Python

The PyOMR library includes support for handling multi-page forms inside Python applications. Whether processing single-page answer sheets or multi-page surveys, PyOMR can be configured to handle different layouts. For multi-page forms, it is recommended to process each page separately. The following code example shows how software developers can handle multi-page forms inside Python applications.

How to Handle Multi-Page Forms via Python Library?

# For multi-page forms, process each page separately

for page in omr.multi_page_scan("multi_page_form.pdf"):
    page.preprocess()
    answers = page.read_answers()
    print(answers)
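When each page is processed separately, the per-page results usually need to be merged so that question labels from different pages do not collide. The sketch below shows one way to do that in plain Python. The function collect_answers and the read_page callable are hypothetical stand-ins for whatever page reader is actually used, and the sample pages are made-up data.

```python
def collect_answers(pages, read_page):
    """Process pages one at a time and merge the results.

    pages:     iterable of page objects (for example, one image per page)
    read_page: callable returning {question: answer} for a single page
    """
    merged = {}
    for page_number, page in enumerate(pages, start=1):
        for question, answer in read_page(page).items():
            # Key by (page, question) so "Q1" on page 2 does not overwrite page 1.
            merged[(page_number, question)] = answer
    return merged


# Stand-in data: each "page" is already a dict of answers, so the reader is trivial.
fake_pages = [{"Q1": "A", "Q2": "C"}, {"Q1": "B", "Q2": None}]
answers = collect_answers(fake_pages, read_page=lambda page: page)

for (page_number, question), answer in sorted(answers.items()):
    print(page_number, question, answer if answer else "No mark detected")
```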

Noise Handling and Preprocessing

The open source PyOMR library provides support for image preprocessing and noise handling inside Python applications. It includes preprocessing steps such as grayscale conversion, thresholding, and contour detection to identify marks reliably, even with noisy scans or imperfect printing.
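One simple way to stay robust against stray noise pixels after thresholding is to measure how much of each bubble is dark and to reject questions with no clear single mark. The following is a minimal sketch of that idea in plain Python, not PyOMR's actual detection code. The names fill_ratio and read_question are hypothetical, the 0.5 cutoff is an assumed value that would need tuning on real scans, and the image is assumed to be a list of rows of 0/1 values where 1 means ink.

```python
def fill_ratio(binary, box):
    """Fraction of dark pixels inside box = (x, y, width, height)."""
    x, y, w, h = box
    dark = sum(
        binary[row][col]
        for row in range(y, y + h)
        for col in range(x, x + w)
    )
    return dark / (w * h)


def read_question(binary, option_boxes, min_fill=0.5):
    """Return the single marked option, or None if zero or several are marked."""
    marked = [
        option
        for option, box in option_boxes.items()
        if fill_ratio(binary, box) >= min_fill
    ]
    return marked[0] if len(marked) == 1 else None


# Hypothetical 2x6 binary image with three 2x2 bubbles side by side.
binary = [
    [1, 1, 0, 0, 0, 0],
    [1, 1, 0, 0, 1, 0],  # the lone 1 in the last bubble is scan noise
]
option_boxes = {"A": (0, 0, 2, 2), "B": (2, 0, 2, 2), "C": (4, 0, 2, 2)}

answer = read_question(binary, option_boxes)
print(answer)  # A
```

Returning None for ambiguous questions, instead of guessing, lets a grading application flag those sheets for manual review.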