Mistral OCR API Documentation
This document provides detailed information about the Mistral OCR API response format and how to work with it in your applications.
API Response Format
The Mistral OCR API returns a JSON response with the following structure:
```json
{
  "metadata": {
    "title": "Document Title",
    "author": "Document Author",
    "creation_date": "2023-01-01",
    "page_count": 5
  },
  "pages": [
    {
      "index": 0,
      "markdown": "# Page Content\n\nThis is the content of page 1...",
      "images": [
        {
          "id": "image-1",
          "image_base64": "base64-encoded-image-data"
        }
      ]
    },
    {
      "index": 1,
      "markdown": "## Page 2 Content\n\nThis is the content of page 2...",
      "images": []
    }
  ]
}
```
Document Metadata
The metadata object contains document-level information:
| Field | Type | Description |
|---|---|---|
| title | String | The document title, if available |
| author | String | The document author, if available |
| creation_date | String | The document creation date in ISO format (YYYY-MM-DD), if available |
| page_count | Integer | The total number of pages in the document |
Note that some metadata fields may be empty or missing if the information cannot be extracted from the document.
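Because any of these fields can be absent, it can help to normalize the metadata once before using it. The following is a minimal sketch of that idea; the helper name `read_metadata` and the fallback values it chooses are illustrative assumptions, not part of the API.

```python
from datetime import date
from typing import Any, Dict, Optional


def read_metadata(ocr_data: Dict[str, Any]) -> Dict[str, Any]:
    """Return metadata with explicit fallbacks for missing or empty fields.

    Hypothetical helper: the fallback values are assumptions for illustration.
    """
    meta = ocr_data.get("metadata") or {}

    # creation_date is documented as YYYY-MM-DD; tolerate absent or malformed values.
    created: Optional[date] = None
    raw_date = meta.get("creation_date")
    if raw_date:
        try:
            created = date.fromisoformat(raw_date)
        except (ValueError, TypeError):
            created = None

    return {
        "title": meta.get("title") or "Untitled Document",
        "author": meta.get("author") or None,
        "creation_date": created,
        # Fall back to counting the pages array when page_count is absent.
        "page_count": meta.get("page_count") or len(ocr_data.get("pages", [])),
    }
```

Callers can then rely on every key being present, with `None` marking values that could not be extracted.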
Pages
The pages array contains objects representing each page in the document:
| Field | Type | Description |
|---|---|---|
| index | Integer | Zero-based page index |
| markdown | String | The extracted text content in Markdown format |
| images | Array | An array of image objects found on the page |
Images
Each image object in the images array has the following structure:
| Field | Type | Description |
|---|---|---|
| id | String | A unique identifier for the image |
| image_base64 | String | Base64-encoded image data (only included if include_images is specified) |
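If you use type hints, the structure described in the tables above could be captured roughly as follows. This is only a sketch: the class names (`OcrImage`, `OcrPage`, `OcrMetadata`, `OcrResponse`) and the `count_images` helper are hypothetical, and every key is marked optional because fields may be missing.

```python
from typing import List, TypedDict


class OcrImage(TypedDict, total=False):
    id: str
    image_base64: str  # present only when image data was requested


class OcrPage(TypedDict, total=False):
    index: int  # zero-based
    markdown: str
    images: List[OcrImage]


class OcrMetadata(TypedDict, total=False):
    title: str
    author: str
    creation_date: str  # YYYY-MM-DD
    page_count: int


class OcrResponse(TypedDict, total=False):
    metadata: OcrMetadata
    pages: List[OcrPage]


def count_images(response: OcrResponse) -> int:
    """Total number of image objects across all pages."""
    return sum(len(page.get("images", [])) for page in response.get("pages", []))
```

`TypedDict` adds no runtime validation; it only lets a type checker flag misspelled keys.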
Working with the API Response
Parsing the JSON Response
Here's an example of how to parse the JSON response in Python:
```python
import json

# Load the JSON response
with open('ocr_results.json', 'r') as f:
    ocr_data = json.load(f)

# Access metadata
title = ocr_data.get('metadata', {}).get('title', 'Untitled Document')
page_count = ocr_data.get('metadata', {}).get('page_count', 0)

# Access page content
for page in ocr_data.get('pages', []):
    page_index = page.get('index', 0)
    page_content = page.get('markdown', '')
    print(f"Page {page_index + 1}:")
    print(page_content)
    print("-" * 40)
```
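When you need the whole document as one Markdown string rather than page-by-page output, the pages can be joined in index order. A minimal sketch, assuming a horizontal rule is an acceptable page separator (the function name `combine_pages` and the separator are illustrative choices):

```python
from typing import Any, Dict


def combine_pages(ocr_data: Dict[str, Any], separator: str = "\n\n---\n\n") -> str:
    """Join per-page Markdown into one string, ordered by page index."""
    pages = sorted(ocr_data.get("pages", []), key=lambda p: p.get("index", 0))
    return separator.join(p.get("markdown", "") for p in pages)
```

Sorting by `index` guards against pages arriving out of order; if the array is always ordered, the sort is a no-op.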
Handling Images
The Mistral OCR CLI provides two approaches for handling images:
1. Embedded Images
When using the --images flag without --extract-images, images are embedded directly in the markdown as base64 data. If you've included images in the response (using the --include-images flag), you can extract and save them manually:
```python
import base64
import os

# Create a directory for images
os.makedirs('extracted_images', exist_ok=True)

# Extract images from each page
for page in ocr_data.get('pages', []):
    page_index = page.get('index', 0)
    for img_index, image in enumerate(page.get('images', [])):
        img_id = image.get('id', f'unknown-{img_index}')
        img_data = image.get('image_base64', '')
        if img_data:
            # Remove data URL prefix if present
            if ',' in img_data:
                img_data = img_data.split(',', 1)[1]
            # Decode and save the image
            img_bytes = base64.b64decode(img_data)
            with open(f'extracted_images/page{page_index}_{img_id}.jpg', 'wb') as img_file:
                img_file.write(img_bytes)
```
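The example above writes every image with a .jpg extension, but the decoded bytes may be in another format. One way to pick a matching extension is to inspect the leading bytes of the decoded data. The helper below is a sketch of that approach; `guess_image_extension` is a hypothetical name, and it recognizes only the four image formats listed under API Limitations.

```python
def guess_image_extension(img_bytes: bytes, default: str = ".bin") -> str:
    """Guess a file extension from well-known image file signatures."""
    if img_bytes.startswith(b"\x89PNG\r\n\x1a\n"):
        return ".png"
    if img_bytes.startswith(b"\xff\xd8\xff"):
        return ".jpg"
    if img_bytes[:6] in (b"GIF87a", b"GIF89a"):
        return ".gif"
    # WebP is a RIFF container with "WEBP" at byte offset 8.
    if img_bytes[:4] == b"RIFF" and img_bytes[8:12] == b"WEBP":
        return ".webp"
    return default
```

You could call it on `img_bytes` after decoding and use the result in place of the hard-coded extension.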
2. Extracted Images
Alternatively, you can use the --extract-images flag with the CLI to automatically extract images to separate files. This approach:
- Saves each image as a separate file in the specified directory (or output_dir/images by default)
- Updates the markdown to reference these image files instead of embedding base64 data
- Results in smaller, more manageable markdown files
Example command:
```shell
mistral-ocr markdown document.pdf --images --extract-images --image-dir custom_images
```
If you're working with the API directly and want to implement similar functionality, here's how you might do it:
```python
import base64
import os
import re

def extract_images_from_ocr_data(ocr_data, image_dir='images'):
    """Extract images from OCR data and update markdown references."""
    # Create image directory
    os.makedirs(image_dir, exist_ok=True)

    # Process each page
    for page in ocr_data.get('pages', []):
        page_index = page.get('index', 0)
        markdown = page.get('markdown', '')

        # Extract and save images
        for img_index, image in enumerate(page.get('images', [])):
            img_id = image.get('id', f'unknown-{img_index}')
            img_data = image.get('image_base64', '')
            if img_data:
                # Generate filename
                filename = f"{img_id.replace(' ', '_')}.jpg"
                filepath = os.path.join(image_dir, filename)

                # Remove data URL prefix if present
                if ',' in img_data:
                    img_data = img_data.split(',', 1)[1]

                # Save the image
                with open(filepath, 'wb') as img_file:
                    img_file.write(base64.b64decode(img_data))

                # Update markdown to reference the file:
                # ![img_id](img_id) becomes ![img_id](image_dir/filename)
                pattern = f"!\\[{re.escape(img_id)}\\]\\({re.escape(img_id)}\\)"
                replacement = f"![{img_id}]({os.path.join(image_dir, filename)})"
                # Use a function so backslashes in the path are not treated as regex escapes
                markdown = re.sub(pattern, lambda _match: replacement, markdown)

        # Update the page's markdown
        page['markdown'] = markdown

    return ocr_data
```
Working with Markdown Content
The OCR results are provided in Markdown format, which makes it easy to convert to other formats or display in applications:
```python
import markdown

# Convert markdown to HTML
for page in ocr_data.get('pages', []):
    page_content = page.get('markdown', '')
    html_content = markdown.markdown(page_content)
    # Now you can use the HTML content in your application
    # For example, save it to an HTML file
    with open(f'page_{page.get("index", 0)}.html', 'w') as f:
        f.write(html_content)
```
Error Handling
When working with the API, you may encounter various errors. Here are some common error scenarios and how to handle them:
API Key Errors
```
API key must be provided or set as MISTRAL_API_KEY environment variable
```
Ensure your API key is correctly set as an environment variable or provided with the --api-key flag.
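If you call the API from your own script, the same lookup could be sketched as below. The function name `resolve_api_key` is hypothetical, and the precedence shown (an explicitly passed key wins over the environment variable) is an assumption rather than documented CLI behavior.

```python
import os
from typing import Optional


def resolve_api_key(cli_value: Optional[str] = None) -> str:
    """Return an explicitly passed key, else the MISTRAL_API_KEY environment variable.

    Assumption: an explicit value takes precedence over the environment.
    """
    api_key = cli_value or os.environ.get("MISTRAL_API_KEY")
    if not api_key:
        raise ValueError(
            "API key must be provided or set as MISTRAL_API_KEY environment variable"
        )
    return api_key
```

Failing early here gives a clearer message than an authentication error returned after the upload has started.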
File Size Errors
```
File is too large (55.00 MB). Maximum allowed size is 52.00 MB
```
The Mistral API has a file size limit of 52 MB. For larger files, consider splitting them into smaller documents.
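Checking the size locally before uploading avoids a wasted request. The following is a minimal sketch; `check_file_size` is a hypothetical helper, and it assumes that 1 MB means 1024 × 1024 bytes, which this document does not specify.

```python
import os

MAX_FILE_SIZE_MB = 52  # limit stated in this document


def check_file_size(path: str, max_mb: float = MAX_FILE_SIZE_MB) -> float:
    """Return the file size in MB, raising ValueError if it exceeds max_mb.

    Assumption: 1 MB is taken as 1024 * 1024 bytes.
    """
    size_mb = os.path.getsize(path) / (1024 * 1024)
    if size_mb > max_mb:
        raise ValueError(
            f"File is too large ({size_mb:.2f} MB). "
            f"Maximum allowed size is {max_mb:.2f} MB"
        )
    return size_mb
```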
Rate Limiting
```
API returned error status: 429 - Rate limit exceeded
```
The API has rate limits. Implement exponential backoff and retry logic in your application:
```python
import time
import random

def api_request_with_retry(func, max_retries=5, initial_delay=1):
    retries = 0
    while retries < max_retries:
        try:
            return func()
        except Exception as e:
            if "429" in str(e) and retries < max_retries - 1:
                # Exponential backoff with jitter
                delay = initial_delay * (2 ** retries) + random.uniform(0, 1)
                print(f"Rate limited. Retrying in {delay:.2f} seconds...")
                time.sleep(delay)
                retries += 1
            else:
                raise
```
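To see what the defaults above imply, the base delays (before jitter) can be computed directly: with max_retries=5 there are at most four waits, each double the previous one. The sketch below also caps each delay, which the retry function above does not do; both `backoff_schedule` and the `max_delay` parameter are illustrative additions.

```python
from typing import List


def backoff_schedule(max_retries: int = 5, initial_delay: float = 1.0,
                     max_delay: float = 30.0) -> List[float]:
    """Base delays (without jitter) before each retry, capped at max_delay."""
    # max_retries attempts allow at most max_retries - 1 waits between them.
    return [min(initial_delay * (2 ** n), max_delay) for n in range(max_retries - 1)]


print(backoff_schedule())  # [1.0, 2.0, 4.0, 8.0]
```

A cap matters mostly for larger retry counts, where uncapped doubling quickly reaches waits of several minutes.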
API Limitations
- Maximum file size: 52 MB
- Supported file formats: PDF, JPG, JPEG, PNG, WEBP, GIF
- Rate limits: Depends on your Mistral AI account tier
- Concurrent requests: Depends on your Mistral AI account tier
- Image extraction: Some complex images or diagrams may not be perfectly extracted
- Language support: Check the Mistral AI documentation for the latest information on supported languages
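The format list above can be turned into a quick local check before a file is submitted. This is a sketch only: `is_supported_format` is a hypothetical helper, and it looks at the file extension rather than the file contents.

```python
from pathlib import Path

# Extensions taken from the supported formats listed above.
SUPPORTED_EXTENSIONS = {".pdf", ".jpg", ".jpeg", ".png", ".webp", ".gif"}


def is_supported_format(path: str) -> bool:
    """Return True if the file extension is in the supported list (case-insensitive)."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS
```

For example, `is_supported_format("scan.PDF")` returns `True`, while `is_supported_format("notes.docx")` returns `False`.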