Extracting text from images using Python is a widely used technique in OCR-driven workflows such as document digitization, form recognition, and invoice processing. Many important documents still exist only as scanned images or photos, making it essential to convert visual information into machine-readable text.
With the help of powerful Python libraries, you can easily extract text from image files, even from multilingual documents or layout-sensitive content. In this article, you’ll learn how to extract text from an image with Python through practical OCR examples, useful tips, and proven methods to improve recognition accuracy.
The guide is structured as follows:
- Powerful Python Library to Extract Text from Image
- Step-by-Step: Python Code to Extract Text from Image
- Real-World Use Cases for Text Extraction from Images
- Supported Languages and Image Formats
- How to Improve OCR Accuracy (Best Practices)
- FAQ
Powerful Python Library to Extract Text from Image
Spire.OCR for Python is an OCR library especially suited for applications that require structured layout extraction and multilingual support. It supports:
- Text recognition with layout and position information
- Multilingual support (English, Chinese, French, etc.)
- Multiple image formats, including JPG, PNG, BMP, GIF, and TIFF
Setup: Install Dependencies and OCR Models
Before extracting text from images using Python, you need to install the spire.ocr library and download the OCR model files compatible with your operating system.
1. Install the Spire.OCR Python Package
Use pip to install the Spire.OCR for Python package:
```shell
pip install spire.ocr
```
2. Download the OCR Model Package
Download the OCR model files based on your OS:
- Windows: win-x64.zip
- Linux: linux.zip
- macOS: mac.zip
After downloading, extract the files and set the model path in your Python script when configuring the OCR engine.
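If the same script has to run on several operating systems, you can pick the model folder at runtime instead of hard-coding it. The sketch below assumes all three archives were extracted side by side under one base directory into folders named `win-x64`, `linux`, and `mac` (matching the archive names above; adjust the mapping if your extracted folders differ). `resolve_model_path` is a hypothetical helper, not part of Spire.OCR; its return value is what you would assign to `ConfigureOptions.ModelPath`.

```python
import os
import platform

# Assumed folder names after extracting win-x64.zip, linux.zip and mac.zip
# into the same base directory (hypothetical layout; adjust to yours).
MODEL_FOLDERS = {
    "Windows": "win-x64",
    "Linux": "linux",
    "Darwin": "mac",  # platform.system() reports macOS as "Darwin"
}


def resolve_model_path(base_dir, system=None):
    """Return the OCR model folder for the current (or given) OS.

    Raises RuntimeError for an unmapped OS and FileNotFoundError if the
    expected folder has not been extracted yet.
    """
    system = system or platform.system()
    folder = MODEL_FOLDERS.get(system)
    if folder is None:
        raise RuntimeError(f"No OCR model package mapped for OS: {system}")
    path = os.path.join(base_dir, folder)
    if not os.path.isdir(path):
        raise FileNotFoundError(f"OCR model folder not found: {path}")
    return path


# Example (hypothetical base directory):
# configureOptions.ModelPath = resolve_model_path(r"D:\OCR")
```

Failing early with a clear error is usually easier to debug than letting the OCR engine start with a wrong model path.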
Step-by-Step: Python Code to Extract Text from Image
In this section, we’ll walk through different ways to extract text from images using Python, starting with simple plain-text extraction and then moving on to more advanced structured recognition.
Basic OCR Text Extraction (Image to Plain Text)
Here’s how to extract plain text from an image using Python:
```python
from spire.ocr import *

# Create an OCR scanner instance
scanner = OcrScanner()

# Configure the OCR model path and recognition language
configureOptions = ConfigureOptions()
configureOptions.ModelPath = r'D:\OCR\win-x64'  # folder extracted from the model package
configureOptions.Language = 'English'
scanner.ConfigureDependencies(configureOptions)

# Perform OCR on the image
scanner.Scan(r'Sample.png')

# Append the extracted text to output.txt (mode 'a' keeps earlier results)
text = scanner.Text.ToString()
with open('output.txt', 'a', encoding='utf-8') as file:
    file.write(text + '\n')
```
Optional: Clean and Preprocess Extracted Text (Post-OCR)
After OCR, the output may contain empty lines or noise. This snippet shows how to clean the text:
```python
# Clean the extracted text (the `text` string from the previous example):
# strip each line and drop empty or very short lines (2 characters or fewer)
clean_lines = [line.strip() for line in text.split('\n') if len(line.strip()) > 2]
cleaned_text = '\n'.join(clean_lines)

# Save the cleaned version to a separate file
with open('output_clean.txt', 'w', encoding='utf-8') as file:
    file.write(cleaned_text)
```
Use Case: Post-processing OCR output before feeding it into NLP tasks or storing it in a database.
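For NLP or database use, you may also want to collapse repeated whitespace and rejoin words that OCR split across lines with a trailing hyphen. Below is a minimal plain-Python sketch; `clean_ocr_text` is a hypothetical helper, and the hyphen-joining rule is a heuristic that will also merge genuinely hyphenated compounds that happen to break at a line end.

```python
import re


def clean_ocr_text(raw, min_len=3):
    """Normalize raw OCR output (heuristic post-processing).

    - rejoins words split across lines with a trailing hyphen ("recog-" / "nition")
    - collapses runs of spaces/tabs inside each line
    - drops lines shorter than min_len characters after stripping
    """
    # Rejoin "word-<newline>word" splits, only when a lowercase letter follows
    text = re.sub(r"(\w)-\n([a-z])", r"\1\2", raw)
    lines = []
    for line in text.splitlines():
        line = re.sub(r"[ \t]+", " ", line).strip()
        if len(line) >= min_len:
            lines.append(line)
    return "\n".join(lines)
```

With the basic example above, `cleaned_text = clean_ocr_text(text)` can stand in for the list comprehension; the default `min_len=3` keeps the same "longer than 2 characters" rule.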
Here’s an example of plain-text OCR output using Spire.OCR:
Extract Text from Image with Coordinates
In forms or invoices, you may need both the text content and its layout. The code below outputs each text block together with its bounding box (x, y, width, height):
```python
from spire.ocr import *

scanner = OcrScanner()
configureOptions = ConfigureOptions()
configureOptions.ModelPath = r'D:\OCR\win-x64'
configureOptions.Language = 'English'
scanner.ConfigureDependencies(configureOptions)

scanner.Scan(r'sample.png')
text = scanner.Text

# Extract block-level text with its position and size
block_text = ""
for block in text.Blocks:
    rectangle = block.Box
    block_info = f'{block.Text} -> x: {rectangle.X}, y: {rectangle.Y}, w: {rectangle.Width}, h: {rectangle.Height}'
    block_text += block_info + '\n'

# Append the block information to output.txt
with open('output.txt', 'a', encoding='utf-8') as file:
    file.write(block_text + '\n')
```
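The block coordinates also let you rebuild a visual reading order, which helps when form fields come back in an unexpected order. The sketch below is a simple heuristic, not a Spire.OCR feature: it groups blocks whose vertical centers lie within a tolerance into one row, then sorts each row left to right. It works on plain tuples, so you would first copy `block.Text` and the `Box` values from the loop above. The `Block` type, `group_into_rows`, `rows_to_text`, and the 10-pixel default tolerance are all hypothetical and should be tuned to your images.

```python
from typing import List, NamedTuple


class Block(NamedTuple):
    """Hypothetical container for one OCR text block and its bounding box."""
    text: str
    x: float
    y: float
    w: float
    h: float

    @property
    def center_y(self) -> float:
        return self.y + self.h / 2


def group_into_rows(blocks: List[Block], tolerance: float = 10.0) -> List[List[Block]]:
    """Group blocks into rows by vertical center, then sort each row by x."""
    rows: List[List[Block]] = []
    for block in sorted(blocks, key=lambda b: b.center_y):
        # Same row if close to the last block added to the current row
        if rows and abs(block.center_y - rows[-1][-1].center_y) <= tolerance:
            rows[-1].append(block)
        else:
            rows.append([block])
    return [sorted(row, key=lambda b: b.x) for row in rows]


def rows_to_text(rows: List[List[Block]], sep: str = "\t") -> str:
    """Join each row's texts with a separator, one row per line."""
    return "\n".join(sep.join(b.text for b in row) for row in rows)


# Building the input from the Spire.OCR loop shown above:
# blocks = [Block(b.Text, b.Box.X, b.Box.Y, b.Box.Width, b.Box.Height)
#           for b in scanner.Text.Blocks]
# print(rows_to_text(group_into_rows(blocks)))
```

Tab-separated rows can be pasted straight into a spreadsheet, which is often enough for simple forms.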
Extract Text from Multiple Images in a Folder
You can also batch process a folder of images:
```python
import os
from spire.ocr import *


def extract_text_from_folder(folder_path, model_path):
    # Configure the scanner once and reuse it for every image
    scanner = OcrScanner()
    config = ConfigureOptions()
    config.ModelPath = model_path
    config.Language = 'English'
    scanner.ConfigureDependencies(config)

    for filename in os.listdir(folder_path):
        # Only PNG and JPEG files are processed here
        if filename.lower().endswith(('.png', '.jpg', '.jpeg')):
            image_path = os.path.join(folder_path, filename)
            scanner.Scan(image_path)
            text = scanner.Text.ToString()

            # Save each result as a separate file (in the current working directory)
            output_file = os.path.splitext(filename)[0] + '_output.txt'
            with open(output_file, 'w', encoding='utf-8') as f:
                f.write(text)


# Example usage
extract_text_from_folder(r'D:\images', r'D:\OCR\win-x64')
```
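The loop above only picks up `.png`, `.jpg`, and `.jpeg` files and writes its results to the current working directory. If you also want the other supported formats (BMP, GIF, TIFF) and a dedicated output folder, a small planning step like the hypothetical `plan_batch` below can build the list of (image, output file) pairs first. The extension set simply mirrors the format list in this article and is an assumption, not something queried from Spire.OCR.

```python
from pathlib import Path
from typing import List, Tuple

# Extensions for the formats listed in this article (JPG/JPEG, PNG, BMP, GIF, TIFF)
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp", ".gif", ".tif", ".tiff"}


def plan_batch(folder: str, output_dir: str, recursive: bool = False) -> List[Tuple[Path, Path]]:
    """Return sorted (image_path, output_txt_path) pairs for every supported image."""
    root = Path(folder)
    out = Path(output_dir)
    pattern = "**/*" if recursive else "*"
    pairs = []
    for image in sorted(root.glob(pattern)):
        if image.is_file() and image.suffix.lower() in IMAGE_EXTENSIONS:
            # Keep the relative sub-folder so same-named images don't collide
            relative = image.relative_to(root).with_suffix("")
            target = out / relative.parent / (relative.name + "_output.txt")
            pairs.append((image, target))
    return pairs


# Inside a function like extract_text_from_folder, after configuring `scanner`:
# for image, target in plan_batch(r'D:\images', r'D:\ocr_output'):
#     target.parent.mkdir(parents=True, exist_ok=True)
#     scanner.Scan(str(image))
#     target.write_text(scanner.Text.ToString(), encoding='utf-8')
```

Separating "what to process" from "how to process it" makes the file selection easy to test without running OCR.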
The recognized text blocks with position information, produced by the coordinate-extraction example, are shown below:
Real-World Use Cases for Text Extraction from Images
Python-based OCR can be applied in:
- ✅ Invoice and receipt scanning
- ✅ Identity document OCR (passport, license)
- ✅ Business card digitization
- ✅ Form and survey data extraction
- ✅ Multilingual document indexing
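For the invoice and receipt case, plain OCR text often needs only light parsing. As a rough example, the hypothetical helpers below pull currency-like amounts and the total line out of OCR output with regular expressions. The patterns assume a dot as the decimal separator and exactly two decimal places, so they will need adjusting for other locales or receipt layouts.

```python
import re
from typing import List, Optional

# Amounts with two decimals, e.g. 6.21 or 1,234.50 (dot decimal separator assumed)
AMOUNT_RE = re.compile(r"\d[\d,]*\.\d{2}\b")


def find_amounts(text: str) -> List[float]:
    """Return all two-decimal amounts in the text, e.g. '1,234.50' -> 1234.5."""
    return [float(m.replace(",", "")) for m in AMOUNT_RE.findall(text)]


def find_total(text: str) -> Optional[float]:
    """Return the last amount on the last line mentioning 'total' (not 'subtotal')."""
    total = None
    for line in text.splitlines():
        lowered = line.lower()
        if "total" in lowered and "subtotal" not in lowered:
            amounts = find_amounts(line)
            if amounts:
                total = amounts[-1]
    return total
```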
Tip: For text extraction from PDF documents instead of images, you might also want to explore this tutorial on extracting text from PDF using Python.
Supported Languages and Image Formats
Spire.OCR supports multiple languages and a wide range of image formats for broader application scenarios.
Supported Languages:
- English
- Simplified / Traditional Chinese
- French
- German
- Japanese
- Korean
You can set the language using configureOptions.Language.
Supported Image Formats:
- JPG / JPEG
- PNG
- BMP
- GIF
- TIFF
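In folders collected from many sources, a file's extension does not always match its contents (for example, PNG data saved with a `.jpg` name). As an optional pre-check before scanning, you can read the first bytes of each file and compare them with the well-known signatures ("magic numbers") of the formats listed above. `sniff_image_format` and `sniff_file` are hypothetical standard-library helpers; they only identify the container format and do not validate the image data.

```python
from typing import Optional

# Standard leading-byte signatures for the formats listed above
_SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
    (b"BM", "BMP"),
    (b"II*\x00", "TIFF"),  # little-endian TIFF
    (b"MM\x00*", "TIFF"),  # big-endian TIFF
]


def sniff_image_format(header: bytes) -> Optional[str]:
    """Return the format name for the given leading bytes, or None if unknown."""
    for signature, name in _SIGNATURES:
        if header.startswith(signature):
            return name
    return None


def sniff_file(path: str) -> Optional[str]:
    """Read the first 8 bytes of a file and identify its image format."""
    with open(path, "rb") as f:
        return sniff_image_format(f.read(8))
```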
How to Improve OCR Accuracy (Best Practices)
To get more accurate results when extracting text from images with Python, follow these tips:
- Use high-resolution images (≥300 DPI)
- Preprocess with grayscale, thresholding, or denoising
- Avoid skewed or noisy scans
- Match the OCR language with the image content
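The grayscale, denoising, and thresholding tips can be sketched with Pillow, a separate package (`pip install pillow`) that is not part of Spire.OCR. Here the threshold is picked automatically with Otsu's method, a common choice that is swapped in as an example; other binarization methods may suit your scans better. `preprocess_for_ocr` and `otsu_threshold` are hypothetical helpers, and the size-3 median filter is an assumption to tune per image. The cleaned file can then be passed to `scanner.Scan()`.

```python
from typing import List


def otsu_threshold(histogram: List[int]) -> int:
    """Pick the gray level that maximizes between-class variance (Otsu's method).

    `histogram` holds 256 pixel counts, one per gray level 0-255. Pixels at or
    below the returned level form one class, the rest the other.
    """
    total = sum(histogram)
    sum_all = sum(level * count for level, count in enumerate(histogram))
    weight_bg = 0
    sum_bg = 0
    best_level, best_variance = 0, -1.0
    for level, count in enumerate(histogram):
        weight_bg += count
        if weight_bg == 0:
            continue
        weight_fg = total - weight_bg
        if weight_fg == 0:
            break
        sum_bg += level * count
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        variance = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if variance > best_variance:
            best_level, best_variance = level, variance
    return best_level


def preprocess_for_ocr(src: str, dst: str) -> int:
    """Grayscale -> median denoise -> Otsu binarization; saves to dst, returns the threshold."""
    from PIL import Image, ImageFilter  # Pillow, imported only when needed

    with Image.open(src) as img:
        gray = img.convert("L")
    denoised = gray.filter(ImageFilter.MedianFilter(size=3))
    level = otsu_threshold(denoised.histogram())
    binary = denoised.point(lambda p: 255 if p > level else 0)
    binary.save(dst)
    return level


# preprocess_for_ocr('Sample.png', 'Sample_clean.png')
# scanner.Scan(r'Sample_clean.png')
```

Binarization tends to help most on low-contrast or unevenly lit scans; on clean, high-resolution images it may make little difference, so compare results before adding it to every run.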
FAQ
How to extract text from an image in Python code?
To extract text from an image using Python, you can use an OCR library like Spire.OCR for Python. With just a few lines of Python code, you can recognize text in scanned documents or photos and convert it into editable, searchable content.
What is the best Python library to extract text from image?
Spire.OCR for Python is a Python OCR library that offers high-accuracy recognition, multilingual support, and layout-aware output. It also works with other Spire.Office components, so you can automate follow-up steps such as saving extracted text to Excel, Word, or searchable PDFs. Depending on your needs and preferences, you can also explore open-source tools such as Tesseract (via pytesseract) for your image-to-text projects.
How to extract data (including position) from image in Python?
Spire.OCR returns not only the recognized text but also the bounding box coordinates of each text block, which makes it well suited to structured content such as tables, forms, and receipts.
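As an illustration of using those coordinates on a receipt or form, the sketch below finds the block to the right of a label such as "Total" on roughly the same line. It is a simple geometric heuristic, not a Spire.OCR API: `Block` and `find_value_right_of` are hypothetical, and you would fill the tuples from `block.Text` and `block.Box` as in the coordinate example.

```python
from typing import List, NamedTuple, Optional


class Block(NamedTuple):
    """Hypothetical container for one OCR text block and its bounding box."""
    text: str
    x: float
    y: float
    w: float
    h: float


def find_value_right_of(blocks: List[Block], label: str) -> Optional[str]:
    """Return the text of the nearest block to the right of `label` on the same line."""
    # Match the label case-insensitively, ignoring a trailing colon ("Total:")
    label_block = next(
        (b for b in blocks if b.text.strip().rstrip(":").lower() == label.lower()),
        None,
    )
    if label_block is None:
        return None
    candidates = []
    for b in blocks:
        # Same line: the vertical extents of the two boxes overlap
        overlaps = b.y < label_block.y + label_block.h and label_block.y < b.y + b.h
        # Starts at or after the label's right edge
        to_right = b.x >= label_block.x + label_block.w
        if b is not label_block and overlaps and to_right:
            candidates.append(b)
    if not candidates:
        return None
    return min(candidates, key=lambda b: b.x).text
```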
How to extract text using Python from scanned PDF files?
To extract text from scanned PDF files with Python, first convert each PDF page into an image, then run OCR on those images with Spire.OCR for Python. For the conversion step, we recommend Spire.PDF for Python: it can save PDF pages as images or extract the embedded images from scanned PDFs, which makes it easy to integrate with your OCR pipeline.
Conclusion: Efficient Text Extraction from Images with Python
Thanks to powerful libraries like Spire.OCR, text extraction from images in Python is both fast and reliable. Whether you're processing receipts or building an intelligent OCR pipeline, this approach gives you precise control over both content and layout.
If you want to remove usage limitations of Spire.OCR for Python, you can apply for a free temporary license.