Aspose.OCR Cloud SDK for Python

Python OCR API to Read & Extract Text from Images

Read and extract text from images, photos, screenshots, scanned documents, and PDF files via a leading Python OCR library.

What is Aspose.OCR Cloud SDK for Python?

The Aspose.OCR Cloud SDK for Python empowers software developers with a flexible optical character recognition solution, free from external dependencies. This advanced OCR tool seamlessly extracts text from images, photos, screenshots, scanned documents, and PDFs across numerous European, Cyrillic, and Eastern scripts. Delivering results in popular document formats, the API simplifies integrating robust OCR functionality into nearly any device or platform—from netbooks and mini PCs to entry-level smartphones.

Aimed at Python developers, the SDK covers reading entire images or scanned PDFs, extracting text from specific regions, processing receipts and tables, and converting results to natural speech. It is built on the Aspose.OCR Cloud API, whose cloud-based engine supports 45 recognition languages, including English, French, German, Spanish, Chinese, Japanese, and Arabic. Python programmers can embed OCR with a few lines of code: upload an image, run recognition, and retrieve the text.

At A Glance

An overview of Aspose.OCR Cloud SDK for Python features.

Features Overview

Perform OCR Operations
Add OCR Capabilities
Recognize Image text
Convert images of text
Recognized Font text
Search PDF
27 Recognition Languages
Create OCR apps
Save to browser
Extract Text
Multi-threading Support

Recognize rotated Image
Pre-processing filters
PDF to Images
Recognizes Chinese Characters
Detects Popular typefaces
Processes whole image
Rotated images Support
Batch Recognition
Built-in Spell Checker
Split PDF
PDF to Excel
PDF to SVG

The API mainly supports the PDF format but can export PDF documents to several other formats.

Reader

PDF, PDF/A, TEX, XPS, SVG

Writer

PDF, TXT, PNG, JPEG, PDF/A, DOC, DOCX, TEX, XPS, SVG, XLSX, PPTX

Platform Independence

Aspose.OCR Cloud SDK for Python can be used in any Python-based application or environment.

Python 3.5 and above.

Getting Started with Aspose.OCR Cloud SDK for Python

The recommended way to install Aspose.OCR Cloud SDK for Python is with pip. Use the following command to install it.

Install Aspose.OCR Cloud SDK for Python via pip

pip install aspose-ocr-cloud

You can also download the SDK directly from the Aspose.OCR Python Cloud SDK product page.
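After installing, it can be handy to confirm that the package is visible to the interpreter you intend to use. The following is a minimal sketch using only the standard library; the helper name `is_installed` is hypothetical, and `asposeocrcloud` is assumed to be the import name because that is what the examples on this page import.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # "asposeocrcloud" is the import name used in this page's examples (assumed).
    name = "asposeocrcloud"
    if is_installed(name):
        print(f"{name}: available")
    else:
        print(f"{name}: missing; try: pip install aspose-ocr-cloud")
```

The check only locates the module; it does not import it or contact the cloud service.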

Image Recognition using Python SDK

Aspose.OCR Cloud SDK for Python lets software developers run OCR to recognize text in images from within their own Python applications. Because recognition runs in the cloud, it can be performed from any platform with Internet access. You can use the OCR REST API to send images for recognition, fetch the results, and store them in any supported file format with just a couple of lines of code. The following example shows how to perform an OCR operation on an image using Python code.

How to Perform OCR Operations on an Image inside Python Apps?

import asposeocrcloud

# create an instance of the OCR client
client = asposeocrcloud.OcrApi(api_key='your_api_key', app_sid='your_app_sid')

# read the image file
with open('image.jpg', 'rb') as image_file:
    image_data = image_file.read()

# call the OCR API to extract text from the image
result = client.post_ocr(image_data=image_data, language='eng', use_default_dictionaries=True)

# print the extracted text
print(result.text)
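The example above places the credentials directly in the source. A common alternative is to read them from environment variables, as the text-to-speech example on this page does with `CLIENT_ID` and `CLIENT_SECRET`. The sketch below is not part of the SDK: `load_credentials` is a hypothetical helper that fails early with a clear message when a variable is missing.

```python
import os
from typing import Mapping, Tuple


def load_credentials(env: Mapping[str, str] = os.environ) -> Tuple[str, str]:
    """Read the client ID and secret from a mapping of environment variables.

    The variable names CLIENT_ID and CLIENT_SECRET follow the text-to-speech
    example on this page; adjust them to match your own setup.
    """
    missing = [name for name in ("CLIENT_ID", "CLIENT_SECRET") if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return env["CLIENT_ID"], env["CLIENT_SECRET"]
```

Passing the mapping as a parameter keeps the helper easy to test, for example `load_credentials({"CLIENT_ID": "id", "CLIENT_SECRET": "secret"})`.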

Extract Text from PDF Files via Python API

Portable Document Format (PDF), developed by Adobe in 1992 to present documents, is one of the world's most popular business document formats. Aspose.OCR Cloud SDK for Python includes a feature for extracting text from PDF files inside Python applications. To do so, upload the PDF file to Aspose cloud storage and then run OCR recognition on the uploaded file. The following example shows how software developers can extract text from a PDF file using Python code.

How to Extract Text from a PDF File via Python API?

import asposeocrcloud
from asposeocrcloud.apis.ocr_api import OcrApi
from asposeocrcloud.configuration import Configuration

configuration = Configuration(api_key='your_api_key', app_sid='your_app_sid')
api = OcrApi(asposeocrcloud.ApiClient(configuration))

# Upload the PDF file to the Aspose cloud storage

with open('your_pdf_file.pdf', 'rb') as file:
    api.upload_file(path='your_pdf_file.pdf', file=file)

# Perform the OCR recognition on the uploaded PDF file
result = api.post_recognize_ocr_from_url_or_content(file_path='your_pdf_file.pdf')

# Store and print the recognized text
recognized_text = result['text']
print(recognized_text)
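Before uploading, it may be worth checking locally that the file really is a PDF, so that an obviously wrong file does not cost a round trip to the cloud. This is a minimal sketch, independent of the SDK, that relies on the fact that PDF files begin with the marker `%PDF-`; the function name `looks_like_pdf` is hypothetical.

```python
from pathlib import Path


def looks_like_pdf(path: str) -> bool:
    """Return True if the file exists and starts with the PDF header '%PDF-'."""
    file_path = Path(path)
    if not file_path.is_file():
        return False
    with file_path.open("rb") as handle:
        # Only the first five bytes are needed to see the header marker.
        return handle.read(5) == b"%PDF-"
```

The check inspects only the header, so it will not catch a truncated or otherwise damaged PDF.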

Convert Text to Speech via Python API

Aspose.OCR Cloud SDK for Python enables software developers to convert text in images to speech without installing any third-party software. Using the API, programmers can convert the recognition results into a natural human voice that can be played in the background or downloaded. First, send the image to the Aspose OCR Cloud server and extract its text; then convert the text to speech using the Aspose OCR Cloud Text-to-Speech API. After a successful conversion, you can save the speech file to disk.

How to Convert Text to Speech using Python API?

import os
from asposeocrcloud import OcrApi, OcrClient, SpeechApi

client_id = os.environ['CLIENT_ID']
client_secret = os.environ['CLIENT_SECRET']
ocr_api = OcrApi(OcrClient(client_id, client_secret))
speech_api = SpeechApi(OcrClient(client_id, client_secret))

# Upload the image containing the text
filename = 'image.png'
with open(filename, 'rb') as file:
    response = ocr_api.post_recognize_from_content(file.read(), language='English', use_default_dictionaries=True)

# Extract the recognized text

text = ''
for result in response.parts:
    for line in result.lines:
        for word in line.words:
            text += word.text + ' '

# Convert the text to speech
response = speech_api.post_recognize_from_text(text, language='en-US', voice_name='Ben')

# Save the speech file to disk

with open('output.wav', 'wb') as file:
    file.write(response.content)
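The text-gathering step in the example above walks a nested structure of parts, lines, and words. The sketch below shows one way to flatten such a structure with `str.join`. The `Word`, `Line`, and `Part` classes are hypothetical stand-ins that only mirror the attribute names used in the example (`lines`, `words`, `text`); they are not the SDK's response types. Unlike the loop above, this variant keeps line breaks and avoids a trailing space.

```python
from dataclasses import dataclass, field
from typing import Iterable, List


# Hypothetical stand-ins for the nested response used in the example above.
@dataclass
class Word:
    text: str


@dataclass
class Line:
    words: List[Word] = field(default_factory=list)


@dataclass
class Part:
    lines: List[Line] = field(default_factory=list)


def flatten_text(parts: Iterable[Part]) -> str:
    """Join words with spaces and lines with newlines."""
    lines = []
    for part in parts:
        for line in part.lines:
            lines.append(" ".join(word.text for word in line.words))
    return "\n".join(lines)
```

For instance, a single part holding the lines "Hello world" and "again" would flatten to those two lines separated by a newline.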