# Mistral OCR API Documentation

This document provides detailed information about the Mistral OCR API response format and how to work with it in your applications.

## Table of Contents

- [Mistral OCR API Documentation](#mistral-ocr-api-documentation)
  - [Table of Contents](#table-of-contents)
  - [API Response Format](#api-response-format)
    - [Document Metadata](#document-metadata)
    - [Pages](#pages)
    - [Images](#images)
  - [Working with the API Response](#working-with-the-api-response)
    - [Parsing the JSON Response](#parsing-the-json-response)
    - [Handling Images](#handling-images)
    - [Working with Markdown Content](#working-with-markdown-content)
  - [Error Handling](#error-handling)
    - [API Key Errors](#api-key-errors)
    - [File Size Errors](#file-size-errors)
    - [Rate Limiting](#rate-limiting)
  - [API Limitations](#api-limitations)

## API Response Format

The Mistral OCR API returns a JSON response with the following structure:

```json
{
  "metadata": {
    "title": "Document Title",
    "author": "Document Author",
    "creation_date": "2023-01-01",
    "page_count": 5
  },
  "pages": [
    {
      "index": 0,
      "markdown": "# Page Content\n\nThis is the content of page 1...",
      "images": [
        {
          "id": "image-1",
          "image_base64": "base64-encoded-image-data"
        }
      ]
    },
    {
      "index": 1,
      "markdown": "## Page 2 Content\n\nThis is the content of page 2...",
      "images": []
    }
  ]
}
```

### Document Metadata

The `metadata` object contains document-level information:

| Field | Type | Description |
|-------|------|-------------|
| `title` | String | The document title, if available |
| `author` | String | The document author, if available |
| `creation_date` | String | The document creation date in ISO format (YYYY-MM-DD), if available |
| `page_count` | Integer | The total number of pages in the document |

Note that some metadata fields may be empty or missing if the information cannot be extracted from the document.
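Any metadata field may be empty or missing, so it can help to normalize the `metadata` object once, up front, instead of scattering defaults through your code. The sketch below shows one way to do that with only the standard library. `DocumentMetadata` and `parse_metadata` are hypothetical names for illustration. Falling back to the length of the `pages` array when `page_count` is absent is an assumption on our part, not documented API behavior.

```python
import json
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class DocumentMetadata:
    title: Optional[str]
    author: Optional[str]
    creation_date: Optional[date]
    page_count: int


def parse_metadata(response: dict) -> DocumentMetadata:
    """Read the `metadata` object, tolerating empty or missing fields."""
    meta = response.get("metadata") or {}

    # Treat empty strings the same as missing values
    title = meta.get("title") or None
    author = meta.get("author") or None

    # creation_date is documented as YYYY-MM-DD; ignore values that do not parse
    raw_date = meta.get("creation_date")
    try:
        created = date.fromisoformat(raw_date) if raw_date else None
    except (TypeError, ValueError):
        created = None

    # Assumption: if page_count is absent, count the entries in the pages array
    page_count = meta.get("page_count")
    if not isinstance(page_count, int):
        page_count = len(response.get("pages") or [])

    return DocumentMetadata(title, author, created, page_count)


# Example: a response with partial metadata
sample = json.loads(
    '{"metadata": {"title": "Document Title", "author": "", "creation_date": "2023-01-01"},'
    ' "pages": [{"index": 0}, {"index": 1}]}'
)
print(parse_metadata(sample))
```

Returning a typed object means the rest of your code can check `metadata.author is None` rather than distinguishing between a missing key and an empty string.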
### Pages

The `pages` array contains one object for each page in the document:

| Field | Type | Description |
|-------|------|-------------|
| `index` | Integer | Zero-based page index |
| `markdown` | String | The extracted text content in Markdown format |
| `images` | Array | An array of image objects found on the page |

### Images

Each image object in the `images` array has the following structure:

| Field | Type | Description |
|-------|------|-------------|
| `id` | String | A unique identifier for the image |
| `image_base64` | String | Base64-encoded image data (only included if `include_images` is specified) |

## Working with the API Response

### Parsing the JSON Response

Here's an example of how to parse a saved JSON response in Python:

```python
import json

# Load the JSON response saved to disk
with open('ocr_results.json', 'r', encoding='utf-8') as f:
    ocr_data = json.load(f)

# Access metadata (fields may be missing, so fall back to defaults)
metadata = ocr_data.get('metadata', {})
title = metadata.get('title', 'Untitled Document')
page_count = metadata.get('page_count', 0)

# Access page content
for page in ocr_data.get('pages', []):
    page_index = page.get('index', 0)
    page_content = page.get('markdown', '')
    # `index` is zero-based, so add 1 for human-readable page numbers
    print(f"Page {page_index + 1}:")
    print(page_content)
    print("-" * 40)
```

### Handling Images

If you included images in the response (using the `--include-images` flag), you can extract and save them. This example reuses `ocr_data` from the previous snippet:

```python
import base64
import os

# Create a directory for images
output_dir = 'extracted_images'
os.makedirs(output_dir, exist_ok=True)

# Extract images from each page
for page in ocr_data.get('pages', []):
    page_index = page.get('index', 0)
    for img_index, image in enumerate(page.get('images', [])):
        img_id = image.get('id', f'unknown-{img_index}')
        img_data = image.get('image_base64', '')
        if not img_data:
            continue  # No image data (e.g. images were not requested)

        ext = 'jpg'  # Fallback extension when the format is unknown
        # Strip a data URL prefix such as "data:image/png;base64," if present
        # (base64 never contains a comma) and use its MIME subtype as the extension
        if ',' in img_data:
            header, img_data = img_data.split(',', 1)
            mime = header.split(':', 1)[-1].split(';', 1)[0]
            if mime.startswith('image/'):
                ext = mime.split('/', 1)[1]

        # Decode and save the image
        img_bytes = base64.b64decode(img_data)
        img_path = os.path.join(output_dir, f'page{page_index}_{img_id}.{ext}')
        with open(img_path, 'wb') as img_file:
            img_file.write(img_bytes)
```

### Working with Markdown Content

The OCR results are provided in Markdown, which makes them easy to display in applications or convert to other formats. The example below uses the third-party `markdown` package (`pip install markdown`) to convert each page to HTML:

```python
import markdown

# Convert each page's Markdown to HTML
for page in ocr_data.get('pages', []):
    page_content = page.get('markdown', '')
    # Enable the built-in "tables" extension so Markdown tables are converted too
    html_content = markdown.markdown(page_content, extensions=['tables'])

    # Use the HTML in your application, or save it to a file
    with open(f'page_{page.get("index", 0)}.html', 'w', encoding='utf-8') as f:
        f.write(html_content)
```

## Error Handling

When working with the API, you may encounter various errors. Here are some common error scenarios and how to handle them:

### API Key Errors

```
API key must be provided or set as MISTRAL_API_KEY environment variable
```

Set your API key in the `MISTRAL_API_KEY` environment variable or pass it with the `--api-key` flag.

### File Size Errors

```
File is too large (55.00 MB). Maximum allowed size is 52.00 MB
```

The Mistral API has a file size limit of 52 MB. For larger files, consider splitting them into smaller documents.

### Rate Limiting

```
API returned error status: 429 - Rate limit exceeded
```

The API enforces rate limits. When a request is rejected with HTTP 429, retry it with exponential backoff:

```python
import random
import time


def api_request_with_retry(func, max_retries=5, initial_delay=1):
    """Call func(), retrying with exponential backoff when rate limited (HTTP 429)."""
    for attempt in range(max_retries):
        try:
            return func()
        except Exception as e:
            # Matching on the message text is a heuristic; prefer checking the
            # status code on your HTTP client's exception type if it exposes one
            is_rate_limited = "429" in str(e)
            if not is_rate_limited or attempt == max_retries - 1:
                raise
            # Exponential backoff with jitter: 1s, 2s, 4s, ... plus up to 1s of randomness
            delay = initial_delay * (2 ** attempt) + random.uniform(0, 1)
            print(f"Rate limited. Retrying in {delay:.2f} seconds...")
            time.sleep(delay)
```

## API Limitations

- **Maximum file size**: 52 MB
- **Supported file formats**: PDF, JPG, JPEG, PNG, WEBP, GIF
- **Rate limits**: Depend on your Mistral AI account tier
- **Concurrent requests**: Depend on your Mistral AI account tier
- **Image extraction**: Some complex images or diagrams may not be extracted perfectly
- **Language support**: Check the Mistral AI documentation for the latest information on supported languages
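Checking a file against the size and format limits listed above before uploading avoids spending a request on a file the API will reject. The sketch below is a hypothetical helper, not part of any Mistral SDK. `validate_upload` and `check_file` are illustrative names, and the limits are copied from this document, so confirm them against Mistral's current documentation. The sketch also assumes that 1 MB means 1024 * 1024 bytes.

```python
import os
from typing import List

# Limits copied from the list above; confirm them against Mistral's current docs
MAX_FILE_SIZE_MB = 52
SUPPORTED_EXTENSIONS = {".pdf", ".jpg", ".jpeg", ".png", ".webp", ".gif"}
BYTES_PER_MB = 1024 * 1024  # Assumption: "MB" here means 1024 * 1024 bytes


def validate_upload(filename: str, size_bytes: int) -> List[str]:
    """Return a list of problems; an empty list means the file looks acceptable."""
    problems = []

    # Compare the extension case-insensitively ("scan.PDF" is accepted)
    ext = os.path.splitext(filename)[1].lower()
    if ext not in SUPPORTED_EXTENSIONS:
        problems.append(f"Unsupported file format: {ext or '(no extension)'}")

    size_mb = size_bytes / BYTES_PER_MB
    if size_mb > MAX_FILE_SIZE_MB:
        problems.append(
            f"File is too large ({size_mb:.2f} MB). "
            f"Maximum allowed size is {MAX_FILE_SIZE_MB:.2f} MB"
        )
    return problems


def check_file(path: str) -> List[str]:
    """Validate a file on disk using its actual size."""
    return validate_upload(path, os.path.getsize(path))
```

Keeping the pure `validate_upload` separate from the file-system lookup makes the rules easy to test. For example, `validate_upload('big.pdf', 55 * 1024 * 1024)` returns a single message in the same form as the File Size Errors example.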