Add comprehensive documentation and code comments

This commit adds extensive documentation to the Mistral OCR CLI project:

- Add API.md with detailed API response format documentation
- Add CHANGELOG.md to track version changes
- Add CONTRIBUTING.md with guidelines for contributors
- Enhance README.md with more detailed usage examples and troubleshooting
- Add proper docstrings to all Python modules and functions
- Update requirements.txt with development dependencies
- Improve setup.py with better metadata

These changes make the project more accessible to users and contributors.
2025-04-24 21:11:41 +02:00
parent 240d64023b
commit 5e891ef461
13 changed files with 786 additions and 15 deletions
@@ -0,0 +1,217 @@
+# Mistral OCR API Documentation
+
+This document provides detailed information about the Mistral OCR API response format and how to work with it in your applications.
+
+## Table of Contents
+
+- [Mistral OCR API Documentation](#mistral-ocr-api-documentation)
+  - [Table of Contents](#table-of-contents)
+  - [API Response Format](#api-response-format)
+    - [Document Metadata](#document-metadata)
+    - [Pages](#pages)
+    - [Images](#images)
+  - [Working with the API Response](#working-with-the-api-response)
+    - [Parsing the JSON Response](#parsing-the-json-response)
+    - [Handling Images](#handling-images)
+    - [Working with Markdown Content](#working-with-markdown-content)
+  - [Error Handling](#error-handling)
+    - [API Key Errors](#api-key-errors)
+    - [File Size Errors](#file-size-errors)
+    - [Rate Limiting](#rate-limiting)
+  - [API Limitations](#api-limitations)
+
+## API Response Format
+
+The Mistral OCR API returns a JSON response with the following structure:
+
+```json
+{
+  "metadata": {
+    "title": "Document Title",
+    "author": "Document Author",
+    "creation_date": "2023-01-01",
+    "page_count": 5
+  },
+  "pages": [
+    {
+      "index": 0,
+      "markdown": "# Page Content\n\nThis is the content of page 1...",
+      "images": [
+        {
+          "id": "image-1",
+          "image_base64": "base64-encoded-image-data"
+        }
+      ]
+    },
+    {
+      "index": 1,
+      "markdown": "## Page 2 Content\n\nThis is the content of page 2...",
+      "images": []
+    }
+  ]
+}
+```
+
+### Document Metadata
+
+The `metadata` object contains document-level information:
+
+| Field | Type | Description |
+|-------|------|-------------|
+| `title` | String | The document title, if available |
+| `author` | String | The document author, if available |
+| `creation_date` | String | The document creation date in ISO 8601 format (`YYYY-MM-DD`), if available |
+| `page_count` | Integer | The total number of pages in the document |
+
+Note that some metadata fields may be empty or missing if the information cannot be extracted from the document.
+
+### Pages
+
+The `pages` array contains objects representing each page in the document:
+
+| Field | Type | Description |
+|-------|------|-------------|
+| `index` | Integer | Zero-based page index |
+| `markdown` | String | The extracted text content in Markdown format |
+| `images` | Array | An array of image objects found on the page |
+
+### Images
+
+Each image object in the `images` array has the following structure:
+
+| Field | Type | Description |
+|-------|------|-------------|
+| `id` | String | A unique identifier for the image |
+| `image_base64` | String | Base64-encoded image data, possibly with a data URL prefix (only included if `include_images` is specified, for example via the `--include-images` flag) |
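
The structure described in the tables above could be mirrored in typed Python objects. The following is a minimal sketch, not part of the Mistral API or this CLI: the class names `OcrImage`, `OcrPage`, `OcrDocument` and the helper `parse_ocr_response` are hypothetical, and only the field names are taken from the tables. Missing fields fall back to defaults, since metadata may be empty or absent.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class OcrImage:
    id: str
    image_base64: Optional[str] = None  # absent unless images were requested


@dataclass
class OcrPage:
    index: int
    markdown: str = ""
    images: List[OcrImage] = field(default_factory=list)


@dataclass
class OcrDocument:
    title: Optional[str] = None
    author: Optional[str] = None
    creation_date: Optional[str] = None
    page_count: int = 0
    pages: List[OcrPage] = field(default_factory=list)


def parse_ocr_response(data: Dict[str, Any]) -> OcrDocument:
    """Build an OcrDocument from a decoded JSON response (hypothetical helper)."""
    meta = data.get("metadata") or {}
    pages = [
        OcrPage(
            index=p.get("index", i),
            markdown=p.get("markdown", ""),
            images=[
                OcrImage(id=img.get("id", ""), image_base64=img.get("image_base64"))
                for img in p.get("images") or []
            ],
        )
        for i, p in enumerate(data.get("pages") or [])
    ]
    return OcrDocument(
        title=meta.get("title"),
        author=meta.get("author"),
        creation_date=meta.get("creation_date"),
        # Fall back to the number of parsed pages if page_count is missing
        page_count=meta.get("page_count", len(pages)),
        pages=pages,
    )
```

Working with attributes such as `doc.pages[0].markdown` instead of nested `.get()` calls keeps the defaulting logic in one place.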
+
+## Working with the API Response
+
+### Parsing the JSON Response
+
+Here's an example of how to parse the JSON response in Python:
+
+```python
+import json
+
+# Load the JSON response
+with open('ocr_results.json', 'r', encoding='utf-8') as f:
+    ocr_data = json.load(f)
+
+# Access metadata
+title = ocr_data.get('metadata', {}).get('title', 'Untitled Document')
+page_count = ocr_data.get('metadata', {}).get('page_count', 0)
+
+# Access page content
+for page in ocr_data.get('pages', []):
+    page_index = page.get('index', 0)
+    page_content = page.get('markdown', '')
+    
+    print(f"Page {page_index + 1}:")
+    print(page_content)
+    print("-" * 40)
+```
+
+### Handling Images
+
+If you've included images in the response (using the `--include-images` flag), you can extract and save them:
+
+```python
+import base64
+import os
+
+# Assumes `ocr_data` was loaded as shown in "Parsing the JSON Response"
+# Create a directory for images
+os.makedirs('extracted_images', exist_ok=True)
+
+# Extract images from each page
+for page in ocr_data.get('pages', []):
+    page_index = page.get('index', 0)
+
+    for img_index, image in enumerate(page.get('images', [])):
+        img_id = image.get('id', f'unknown-{img_index}')
+        img_data = image.get('image_base64', '')
+
+        if img_data:
+            # Remove the data URL prefix (e.g. "data:image/jpeg;base64,") if present
+            if ',' in img_data:
+                img_data = img_data.split(',', 1)[1]
+
+            # Decode and save the image; the .jpg extension is assumed here,
+            # so adjust it if the API returns another image format
+            img_bytes = base64.b64decode(img_data)
+            out_path = os.path.join('extracted_images', f'page{page_index}_{img_id}.jpg')
+            with open(out_path, 'wb') as img_file:
+                img_file.write(img_bytes)
+```
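
Since the supported formats include PNG, WEBP, and GIF as well as JPEG, a fixed `.jpg` extension may not always match the decoded bytes. The sketch below shows one possible way to choose an extension by inspecting the leading bytes (file signatures) of the decoded image. The helper name `decode_image` is hypothetical, and whether the API ever returns non-JPEG image data is an assumption.

```python
import base64
from typing import Tuple


def decode_image(image_base64: str) -> Tuple[bytes, str]:
    """Decode an image string and guess a file extension from its leading bytes."""
    # Drop a data URL header such as "data:image/png;base64," if present
    payload = image_base64.split(",", 1)[1] if "," in image_base64 else image_base64
    raw = base64.b64decode(payload)

    if raw.startswith(b"\xff\xd8\xff"):
        ext = ".jpg"
    elif raw.startswith(b"\x89PNG\r\n\x1a\n"):
        ext = ".png"
    elif raw.startswith((b"GIF87a", b"GIF89a")):
        ext = ".gif"
    elif raw[:4] == b"RIFF" and raw[8:12] == b"WEBP":
        ext = ".webp"
    else:
        ext = ".bin"  # unknown format: keep the bytes, let the caller decide
    return raw, ext
```

The returned extension can then replace the hard-coded `.jpg` when building the output file name.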
+
+### Working with Markdown Content
+
+The OCR results are provided in Markdown format, which makes it easy to convert to other formats or display in applications:
+
+```python
+import markdown  # third-party package: pip install markdown
+
+# Assumes `ocr_data` was loaded as shown in "Parsing the JSON Response"
+# Convert each page's Markdown to HTML
+for page in ocr_data.get('pages', []):
+    page_content = page.get('markdown', '')
+    html_content = markdown.markdown(page_content)
+
+    # Now you can use the HTML content in your application,
+    # for example by saving it to an HTML file
+    with open(f'page_{page.get("index", 0)}.html', 'w', encoding='utf-8') as f:
+        f.write(html_content)
+```
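
Because each page carries its own Markdown string, it can be useful to merge them into a single document before converting or displaying it. A minimal sketch, assuming pages should be ordered by their `index` field and separated by a horizontal rule; the helper name `combine_pages` and the separator are illustrative choices, not part of the CLI:

```python
from typing import Any, Dict


def combine_pages(ocr_data: Dict[str, Any], separator: str = "\n\n---\n\n") -> str:
    """Join the Markdown of all pages, ordered by their zero-based index."""
    pages = sorted(ocr_data.get("pages", []), key=lambda p: p.get("index", 0))
    return separator.join(p.get("markdown", "").strip() for p in pages)


# Example with pages supplied out of order
sample = {"pages": [{"index": 1, "markdown": "Second"}, {"index": 0, "markdown": "First"}]}
print(combine_pages(sample))
```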
+
+## Error Handling
+
+When working with the API, you may encounter various errors. Here are some common error scenarios and how to handle them:
+
+### API Key Errors
+
+```
+API key must be provided or set as MISTRAL_API_KEY environment variable
+```
+
+Ensure your API key is correctly set as an environment variable or provided with the `--api-key` flag.
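
In your own scripts, the same lookup could be sketched as follows. The function name `resolve_api_key` is hypothetical, and giving an explicitly passed key precedence over the environment variable is an assumption rather than confirmed CLI behavior.

```python
import os
from typing import Optional


def resolve_api_key(cli_value: Optional[str] = None) -> str:
    """Return the API key from an explicit value or the MISTRAL_API_KEY variable."""
    # Assumption: an explicitly passed key (e.g. from --api-key) takes precedence
    api_key = cli_value or os.environ.get("MISTRAL_API_KEY")
    if not api_key:
        raise ValueError(
            "API key must be provided or set as MISTRAL_API_KEY environment variable"
        )
    return api_key
```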
+
+### File Size Errors
+
+```
+File is too large (55.00 MB). Maximum allowed size is 52.00 MB
+```
+
+The Mistral API has a file size limit of 52 MB. For larger files, consider splitting them into smaller documents before uploading.
+
+### Rate Limiting
+
+```
+API returned error status: 429 - Rate limit exceeded
+```
+
+The API has rate limits. Implement exponential backoff and retry logic in your application:
+
+```python
+import time
+import random
+
+def api_request_with_retry(func, max_retries=5, initial_delay=1):
+    """Call func(), retrying with exponential backoff when it is rate limited."""
+    for attempt in range(max_retries):
+        try:
+            return func()
+        except Exception as e:
+            # Re-raise anything that is not a rate-limit error, and give up
+            # once the final attempt has failed
+            if "429" not in str(e) or attempt == max_retries - 1:
+                raise
+            # Exponential backoff with jitter
+            delay = initial_delay * (2 ** attempt) + random.uniform(0, 1)
+            print(f"Rate limited. Retrying in {delay:.2f} seconds...")
+            time.sleep(delay)
+```
+
+## API Limitations
+
+- **Maximum file size**: 52 MB
+- **Supported file formats**: PDF, JPG, JPEG, PNG, WEBP, GIF
+- **Rate limits**: Depends on your Mistral AI account tier
+- **Concurrent requests**: Depends on your Mistral AI account tier
+- **Image extraction**: Some complex images or diagrams may not be perfectly extracted
+- **Language support**: Check the Mistral AI documentation for the latest information on supported languages
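
The supported formats listed above can be checked before a file is sent. A minimal sketch that only inspects the file extension, not the file contents; `is_supported_file` and `SUPPORTED_EXTENSIONS` are hypothetical names:

```python
from pathlib import Path

# Extensions taken from the "Supported file formats" entry above
SUPPORTED_EXTENSIONS = {".pdf", ".jpg", ".jpeg", ".png", ".webp", ".gif"}


def is_supported_file(path: str) -> bool:
    """Return True if the file extension is one of the listed formats."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS
```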