Add comprehensive documentation and code comments

This commit adds extensive documentation to the Mistral OCR CLI project:

- Add API.md with detailed API response format documentation
- Add CHANGELOG.md to track version changes
- Add CONTRIBUTING.md with guidelines for contributors
- Enhance README.md with more detailed usage examples and troubleshooting
- Add proper docstrings to all Python modules and functions
- Update requirements.txt with development dependencies
- Improve setup.py with better metadata

These changes make the project more accessible to users and contributors.
2025-04-24 21:11:41 +02:00
parent 240d64023b
commit 5e891ef461
13 changed files with 786 additions and 15 deletions
+217
@@ -0,0 +1,217 @@
# Mistral OCR API Documentation
This document provides detailed information about the Mistral OCR API response format and how to work with it in your applications.
## Table of Contents
- [Mistral OCR API Documentation](#mistral-ocr-api-documentation)
- [Table of Contents](#table-of-contents)
- [API Response Format](#api-response-format)
- [Document Metadata](#document-metadata)
- [Pages](#pages)
- [Images](#images)
- [Working with the API Response](#working-with-the-api-response)
- [Parsing the JSON Response](#parsing-the-json-response)
- [Handling Images](#handling-images)
- [Working with Markdown Content](#working-with-markdown-content)
- [Error Handling](#error-handling)
- [API Key Errors](#api-key-errors)
- [File Size Errors](#file-size-errors)
- [Rate Limiting](#rate-limiting)
- [API Limitations](#api-limitations)
## API Response Format
The Mistral OCR API returns a JSON response with the following structure:
```json
{
  "metadata": {
    "title": "Document Title",
    "author": "Document Author",
    "creation_date": "2023-01-01",
    "page_count": 5
  },
  "pages": [
    {
      "index": 0,
      "markdown": "# Page Content\n\nThis is the content of page 1...",
      "images": [
        {
          "id": "image-1",
          "image_base64": "base64-encoded-image-data"
        }
      ]
    },
    {
      "index": 1,
      "markdown": "## Page 2 Content\n\nThis is the content of page 2...",
      "images": []
    }
  ]
}
```
### Document Metadata
The `metadata` object contains document-level information:
| Field | Type | Description |
|-------|------|-------------|
| `title` | String | The document title, if available |
| `author` | String | The document author, if available |
| `creation_date` | String | The document creation date in ISO 8601 format (YYYY-MM-DD), if available |
| `page_count` | Integer | The total number of pages in the document |
Note that some metadata fields may be empty or missing if the information cannot be extracted from the document.
### Pages
The `pages` array contains objects representing each page in the document:
| Field | Type | Description |
|-------|------|-------------|
| `index` | Integer | Zero-based page index |
| `markdown` | String | The extracted text content in Markdown format |
| `images` | Array | An array of image objects found on the page |
### Images
Each image object in the `images` array has the following structure:
| Field | Type | Description |
|-------|------|-------------|
| `id` | String | A unique identifier for the image |
| `image_base64` | String | Base64-encoded image data (only included when images are requested, for example with the CLI's `--include-images` flag) |
## Working with the API Response
### Parsing the JSON Response
Here's an example of how to parse the JSON response in Python:
```python
import json

# Load the JSON response (UTF-8, since OCR text often contains non-ASCII characters)
with open('ocr_results.json', 'r', encoding='utf-8') as f:
    ocr_data = json.load(f)

# Access metadata, falling back to defaults when a field is missing
title = ocr_data.get('metadata', {}).get('title', 'Untitled Document')
page_count = ocr_data.get('metadata', {}).get('page_count', 0)

# Access page content
for page in ocr_data.get('pages', []):
    page_index = page.get('index', 0)
    page_content = page.get('markdown', '')
    print(f"Page {page_index + 1}:")
    print(page_content)
    print("-" * 40)
```
### Handling Images
If you've included images in the response (using the `--include-images` flag), you can extract and save them:
```python
import base64
import os

# Create a directory for images
os.makedirs('extracted_images', exist_ok=True)

# Extract images from each page
for page in ocr_data.get('pages', []):
    page_index = page.get('index', 0)
    for img_index, image in enumerate(page.get('images', [])):
        img_id = image.get('id', f'unknown-{img_index}')
        img_data = image.get('image_base64', '')
        if img_data:
            # Remove data URL prefix if present
            if ',' in img_data:
                img_data = img_data.split(',', 1)[1]
            # Decode and save the image
            img_bytes = base64.b64decode(img_data)
            with open(f'extracted_images/page{page_index}_{img_id}.jpg', 'wb') as img_file:
                img_file.write(img_bytes)
```
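The snippet above always writes a `.jpg` file. If an `image_base64` value carries a data URL prefix (for example `data:image/png;base64,...`), that prefix could be used to pick a matching extension instead. The following is a minimal sketch under that assumption; `split_data_url` and the `EXTENSIONS` table are illustrative names and not part of the CLI.

```python
import base64
from typing import Tuple

# Illustrative MIME-type-to-extension table (an assumption; extend as needed)
EXTENSIONS = {
    "image/jpeg": ".jpg",
    "image/png": ".png",
    "image/webp": ".webp",
    "image/gif": ".gif",
}


def split_data_url(value: str, default_ext: str = ".jpg") -> Tuple[str, bytes]:
    """Return (extension, decoded bytes) for raw base64 or a base64 data URL."""
    ext = default_ext
    payload = value
    if value.startswith("data:") and "," in value:
        # Header looks like "data:image/png;base64"
        header, payload = value.split(",", 1)
        mime = header[len("data:"):].split(";", 1)[0]
        ext = EXTENSIONS.get(mime, default_ext)
    return ext, base64.b64decode(payload)
```

With this helper, the file name in the loop above could be built from the returned extension rather than a fixed `.jpg` suffix. Unknown MIME types fall back to the default extension.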
### Working with Markdown Content
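The OCR results are provided in Markdown format, which makes it easy to convert to other formats or display in applications. The example below uses the third-party `markdown` package (`pip install markdown`) to render each page as HTML: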
```python
import markdown

# Convert markdown to HTML
for page in ocr_data.get('pages', []):
    page_content = page.get('markdown', '')
    html_content = markdown.markdown(page_content)
    # Now you can use the HTML content in your application
    # For example, save it to an HTML file (UTF-8 to keep non-ASCII text intact)
    with open(f'page_{page.get("index", 0)}.html', 'w', encoding='utf-8') as f:
        f.write(html_content)
```
## Error Handling
When working with the API, you may encounter various errors. Here are some common error scenarios and how to handle them:
### API Key Errors
```
API key must be provided or set as MISTRAL_API_KEY environment variable
```
Ensure your API key is correctly set as an environment variable or provided with the `--api-key` flag.
### File Size Errors
```
File is too large (55.00 MB). Maximum allowed size is 52.00 MB
```
The Mistral API has a file size limit of 52 MB, which the CLI checks before uploading. For larger files, consider splitting them into smaller documents.
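If you call the API from your own scripts, the same check can be done up front. This is a rough sketch only: `check_file_size` and `MAX_FILE_SIZE` are illustrative names, and the limit is taken from the 52 MB figure quoted above.

```python
import os

# Mirrors the 52 MB limit quoted above; the constant name is illustrative
MAX_FILE_SIZE = 52 * 1024 * 1024


def check_file_size(path: str, limit: int = MAX_FILE_SIZE) -> float:
    """Return the file size in MB, raising ValueError if it exceeds the limit."""
    size = os.path.getsize(path)
    size_mb = size / (1024 * 1024)
    if size > limit:
        raise ValueError(
            f"File is too large ({size_mb:.2f} MB). "
            f"Maximum allowed size is {limit / (1024 * 1024):.2f} MB"
        )
    return size_mb
```

Running the check before any network call avoids uploading a file that would be rejected anyway.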
### Rate Limiting
```
API returned error status: 429 - Rate limit exceeded
```
The API has rate limits. Implement exponential backoff and retry logic in your application:
```python
import time
import random


def api_request_with_retry(func, max_retries=5, initial_delay=1):
    retries = 0
    while retries < max_retries:
        try:
            return func()
        except Exception as e:
            if "429" in str(e) and retries < max_retries - 1:
                # Exponential backoff with jitter
                delay = initial_delay * (2 ** retries) + random.uniform(0, 1)
                print(f"Rate limited. Retrying in {delay:.2f} seconds...")
                time.sleep(delay)
                retries += 1
            else:
                raise
```
## API Limitations
- **Maximum file size**: 52MB
- **Supported file formats**: PDF, JPG, JPEG, PNG, WEBP, GIF
- **Rate limits**: Depends on your Mistral AI account tier
- **Concurrent requests**: Depends on your Mistral AI account tier
- **Image extraction**: Some complex images or diagrams may not be perfectly extracted
- **Language support**: Check the Mistral AI documentation for the latest information on supported languages
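The supported formats listed above can be checked before a request is made. The sketch below is an assumption-laden example: `SUPPORTED_EXTENSIONS` and `guess_doc_type` are hypothetical names, and treating PDFs as `document_url` and everything else as `image_url` is a guess about how the two document types accepted by the client would map to file extensions.

```python
from pathlib import Path

# Extensions taken from the list above; the names here are illustrative
SUPPORTED_EXTENSIONS = {".pdf", ".jpg", ".jpeg", ".png", ".webp", ".gif"}


def guess_doc_type(path: str) -> str:
    """Return "document_url" for PDFs and "image_url" for supported images."""
    suffix = Path(path).suffix.lower()
    if suffix not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"Unsupported file format: {suffix or '(none)'}")
    # Assumed mapping: PDFs are documents, the remaining formats are images
    return "document_url" if suffix == ".pdf" else "image_url"
```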
+26
@@ -0,0 +1,26 @@
# Changelog
All notable changes to the Mistral OCR CLI project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
## [Unreleased]
### Added
- Enhanced documentation including README.md, CONTRIBUTING.md, and CHANGELOG.md
- More detailed troubleshooting section
- API response format documentation
## [0.1.0] - 2025-04-24
### Added
- Initial release of Mistral OCR CLI
- Process command for OCR processing of PDF documents and images
- Convert command for transforming OCR JSON to Markdown
- Markdown command for one-step processing and conversion
- Support for local files and URLs
- Support for extracting and including images
- Support for metadata extraction
- Support for single-file and multi-file output
- Basic error handling and retries
+171
@@ -0,0 +1,171 @@
# Contributing to Mistral OCR CLI
Thank you for your interest in contributing to Mistral OCR CLI! This document provides guidelines and instructions for contributing to this project.
## Table of Contents
- [Contributing to Mistral OCR CLI](#contributing-to-mistral-ocr-cli)
- [Table of Contents](#table-of-contents)
- [Code of Conduct](#code-of-conduct)
- [Getting Started](#getting-started)
- [Development Environment Setup](#development-environment-setup)
- [Project Structure](#project-structure)
- [Development Workflow](#development-workflow)
- [Creating a Feature](#creating-a-feature)
- [Testing](#testing)
- [Documentation](#documentation)
- [Pull Request Process](#pull-request-process)
- [Coding Standards](#coding-standards)
- [Release Process](#release-process)
## Code of Conduct
Please be respectful and considerate of others when contributing to this project. We aim to foster an inclusive and welcoming community.
## Getting Started
### Development Environment Setup
1. **Fork and clone the repository**:
```bash
git clone https://github.com/yourusername/mistral-ocr-python.git
cd mistral-ocr-python
```
2. **Create a virtual environment**:
```bash
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
```
3. **Install the package in development mode**:
```bash
pip install -e .
```
4. **Install development dependencies**:
```bash
pip install pytest pytest-cov black flake8
```
### Project Structure
```
mistral-ocr-python/
├── mistral_ocr/ # Main package
│ ├── __init__.py
│ ├── __main__.py # CLI entry point
│ ├── client.py # Mistral API client
│ └── commands/ # Command implementations
│ ├── __init__.py
│ ├── convert.py # Convert command
│ ├── markdown.py # Markdown command
│ ├── process.py # Process command
│ └── version.py # Version command
├── tests/ # Test directory
├── .gitignore
├── README.md
├── CONTRIBUTING.md
├── LICENSE
├── requirements.txt
├── setup.py
└── build.sh
```
## Development Workflow
### Creating a Feature
1. **Create a new branch**:
```bash
git checkout -b feature/your-feature-name
```
2. **Make your changes**:
- Implement your feature or fix
- Add or update tests as necessary
- Update documentation to reflect your changes
3. **Commit your changes**:
```bash
git add .
git commit -m "Add feature: your feature description"
```
### Testing
We use pytest for testing. To run the tests:
```bash
python -m pytest
```
For coverage report:
```bash
python -m pytest --cov=mistral_ocr
```
Please ensure that your code is well-tested and that all tests pass before submitting a pull request.
### Documentation
- Update the README.md if your changes affect the usage of the tool
- Add docstrings to your code following the Google style guide
- Update or add examples if necessary
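As a rough illustration of the docstring style mentioned above, here is a small function with type hints and a Google-style docstring. `page_heading` is a made-up example and does not exist in the package.

```python
def page_heading(index: int, title: str = "") -> str:
    """Build a Markdown heading for a page.

    Args:
        index: Zero-based page index.
        title: Optional title appended after the page number.

    Returns:
        A level-two Markdown heading such as "## Page 1".

    Raises:
        ValueError: If index is negative.
    """
    if index < 0:
        raise ValueError("index must be non-negative")
    heading = f"## Page {index + 1}"
    return f"{heading}: {title}" if title else heading
```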
## Pull Request Process
1. **Push your changes to your fork**:
```bash
git push origin feature/your-feature-name
```
2. **Create a pull request** from your fork to the main repository
3. **Describe your changes** in the pull request:
- What does this PR add or fix?
- Any breaking changes?
- Any dependencies added?
4. **Address review comments** if any are provided
5. **Your PR will be merged** once it's approved
## Coding Standards
We follow PEP 8 and use Black for code formatting:
```bash
black mistral_ocr
```
For linting:
```bash
flake8 mistral_ocr
```
General guidelines:
- Use descriptive variable and function names
- Add type hints to function signatures
- Write docstrings for all functions, classes, and modules
- Keep functions small and focused on a single task
- Use comments to explain complex logic
## Release Process
Releases are managed by the project maintainers. If you'd like to propose a release:
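1. Update the version number (`__version__`) in `mistral_ocr/__init__.py`:
- `setup.py` reads it from there when building the package
- `mistral_ocr/commands/version.py` imports it to print the CLI version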
2. Update the CHANGELOG.md with the changes in the new version
3. Create a pull request with these changes
4. Once merged, the maintainers will create a new release tag
Thank you for contributing to Mistral OCR CLI!
+135 -7
@@ -1,6 +1,6 @@
# Mistral OCR CLI (Python) # Mistral OCR CLI (Python)
A command-line tool for processing documents with Mistral AI's OCR capabilities, implemented in Python. A command-line tool for processing documents with Mistral AI's OCR capabilities, implemented in Python. This tool allows you to extract text and structured content from PDF documents and images while preserving the original formatting and layout.
## Features ## Features
@@ -10,6 +10,20 @@ A command-line tool for processing documents with Mistral AI's OCR capabilities,
- Output results to stdout or to a file - Output results to stdout or to a file
- Convert OCR results to Markdown format - Convert OCR results to Markdown format
- Maintain document structure and formatting in the output - Maintain document structure and formatting in the output
- Support for extracting and embedding images
- Metadata extraction (title, author, creation date)
- Page-by-page processing with optional single-file output
## How It Works
Mistral OCR CLI works by:
1. Uploading your document to the Mistral AI API (for local files) or providing the URL
2. Processing the document using Mistral's advanced OCR capabilities
3. Receiving structured JSON data containing the extracted text, formatting, and metadata
4. Optionally converting this data to Markdown format for easy reading and editing
The tool handles authentication, file uploads, API communication, and result formatting, making it easy to integrate OCR capabilities into your workflow.
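The steps above can be sketched roughly as follows. This is not the CLI's actual implementation: `run_ocr` is a hypothetical helper, the URL check via `urlparse` is an assumption, and the document type is fixed to `document_url` for simplicity (images would use `image_url`). The `client` argument is expected to provide the `upload_file`, `get_file_url`, and `process_ocr` methods of `MistralClient`.

```python
import json
from urllib.parse import urlparse


def run_ocr(client, source: str, include_images: bool = False) -> dict:
    """Run the upload, signed-URL, and OCR steps and return the parsed JSON."""
    is_url = urlparse(source).scheme in ("http", "https")
    if is_url:
        # Step 1 (URL input): pass the URL straight to the OCR request
        doc_url = source
    else:
        # Step 1 (local file): upload it, then ask for a signed URL
        file_id = client.upload_file(source)
        doc_url = client.get_file_url(file_id)
    # Steps 2 and 3: request OCR and decode the JSON bytes that come back
    raw = client.process_ocr("document_url", doc_url, include_images)
    return json.loads(raw)
```

Passing the client in as an argument keeps the sketch easy to try out with a stub object before pointing it at the real API.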
## Installation ## Installation
@@ -17,6 +31,7 @@ A command-line tool for processing documents with Mistral AI's OCR capabilities,
- Python 3.7 or later - Python 3.7 or later
- pip (Python package installer) - pip (Python package installer)
- A Mistral AI API key (sign up at [Mistral AI](https://mistral.ai) if you don't have one)
### Installing from source ### Installing from source
@@ -26,7 +41,7 @@ cd mistral-ocr-python
pip install -e . pip install -e .
``` ```
Alternatively, you can use the build script: Alternatively, you can use the build script which creates a virtual environment and installs the package:
```bash ```bash
git clone https://github.com/yourusername/mistral-ocr-python git clone https://github.com/yourusername/mistral-ocr-python
@@ -34,13 +49,19 @@ cd mistral-ocr-python
./build.sh ./build.sh
``` ```
### Installing from PyPI (coming soon)
```bash
pip install mistral-ocr
```
## Usage ## Usage
### Setting up your API key ### Setting up your API key
You can provide your Mistral API key in two ways: You can provide your Mistral API key in two ways:
1. Environment variable: 1. Environment variable (recommended for security):
```bash ```bash
export MISTRAL_API_KEY=your-api-key export MISTRAL_API_KEY=your-api-key
``` ```
@@ -125,19 +146,19 @@ mistral-ocr version
### Examples ### Examples
### Process a local PDF and save the output #### Process a local PDF and save the output
```bash ```bash
mistral-ocr process ~/Documents/sample.pdf --output-file results.json mistral-ocr process ~/Documents/sample.pdf --output-file results.json
``` ```
### Process a document from a URL #### Process a document from a URL
```bash ```bash
mistral-ocr process https://arxiv.org/pdf/2201.04234 > output.json mistral-ocr process https://arxiv.org/pdf/2201.04234 > output.json
``` ```
### Convert OCR JSON to Markdown files #### Convert OCR JSON to Markdown files
```bash ```bash
# Create separate files (one per page) # Create separate files (one per page)
@@ -150,7 +171,7 @@ mistral-ocr convert output.json --single-file --output-dir markdown_docs
mistral-ocr convert output.json --output-file docs/paper.md mistral-ocr convert output.json --output-file docs/paper.md
``` ```
### Process a document and generate markdown files in one step #### Process a document and generate markdown files in one step
```bash ```bash
# Generate separate files (one per page) # Generate separate files (one per page)
@@ -163,6 +184,113 @@ mistral-ocr markdown ~/Documents/research-paper.pdf --single-file --output-dir r
mistral-ocr markdown ~/Documents/research-paper.pdf --output-file research_docs/paper.md mistral-ocr markdown ~/Documents/research-paper.pdf --output-file research_docs/paper.md
``` ```
## OCR Response Format
The OCR API returns a JSON response with the following structure:
```json
{
  "metadata": {
    "title": "Document Title",
    "author": "Document Author",
    "creation_date": "2023-01-01",
    "page_count": 5
  },
  "pages": [
    {
      "index": 0,
      "markdown": "# Page Content\n\nThis is the content of page 1...",
      "images": [
        {
          "id": "image-1",
          "image_base64": "base64-encoded-image-data"
        }
      ]
    },
    {
      "index": 1,
      "markdown": "## Page 2 Content\n\nThis is the content of page 2...",
      "images": []
    }
  ]
}
```
### Key Components:
- **metadata**: Contains document-level information
- **title**: Document title (if available)
- **author**: Document author (if available)
- **creation_date**: Document creation date (if available)
- **page_count**: Total number of pages
- **pages**: Array of page objects
- **index**: Zero-based page index
- **markdown**: Extracted text in Markdown format
- **images**: Array of images found on the page
- **id**: Unique image identifier
- **image_base64**: Base64-encoded image data (only included if `--include-images` is specified)
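Given that structure, joining all pages into one Markdown string is straightforward. The sketch below is only an illustration of working with the response in your own code; `pages_to_markdown` and its separator are assumptions and do not describe how the `convert` command is implemented.

```python
def pages_to_markdown(ocr_data: dict, separator: str = "\n\n---\n\n") -> str:
    """Join the markdown of every page, ordered by the zero-based page index."""
    pages = sorted(ocr_data.get("pages", []), key=lambda p: p.get("index", 0))
    return separator.join(p.get("markdown", "") for p in pages)
```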
## Troubleshooting
### Common Issues
#### API Key Issues
```
Error processing document: API key must be provided or set as MISTRAL_API_KEY environment variable
```
**Solution**: Ensure your API key is correctly set as an environment variable or provided with the `--api-key` flag.
#### File Size Limits
```
Error processing document: File is too large (55.00 MB). Maximum allowed size is 52.00 MB
```
**Solution**: The Mistral API has a file size limit of 52MB. For larger files, consider splitting them into smaller documents.
#### Rate Limiting
```
Error processing document: API returned error status: 429 - Rate limit exceeded
```
**Solution**: The API has rate limits. Wait a few minutes before trying again or contact Mistral AI to increase your rate limits.
#### Invalid JSON
```
Error converting JSON to markdown: Expecting property name enclosed in double quotes
```
**Solution**: Ensure the JSON file is valid. You can validate it using tools like `jq`.
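If `jq` is not installed, Python's standard `json` module can locate the problem as well. A minimal sketch, where `find_json_error` is a hypothetical helper name:

```python
import json
from typing import Optional


def find_json_error(path: str) -> Optional[str]:
    """Return a "file:line:column: message" string, or None if the JSON is valid."""
    try:
        with open(path, "r", encoding="utf-8") as f:
            json.load(f)
    except json.JSONDecodeError as exc:
        return f"{path}:{exc.lineno}:{exc.colno}: {exc.msg}"
    return None
```

The line and column in the returned string point at the first character the parser could not accept.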
### API Limitations
- Maximum file size: 52MB
- Supported file formats: PDF, JPG, JPEG, PNG, WEBP, GIF
- Rate limits may apply depending on your Mistral AI account tier
## Contributing
Contributions to Mistral OCR CLI are welcome! Here's how you can contribute:
1. **Fork the repository**
2. **Create a feature branch**:
```bash
git checkout -b feature/your-feature-name
```
3. **Make your changes**
4. **Run tests** (if available):
```bash
python -m pytest
```
5. **Submit a pull request**
Please ensure your code follows the project's coding standards and includes appropriate tests and documentation.
## License ## License
MIT MIT
+17 -1
@@ -1 +1,17 @@
# Mistral OCR Python Package """
Mistral OCR Python Package
A command-line tool for processing documents with Mistral AI's OCR capabilities.
This package provides functionality for extracting text and structured content
from PDF documents and images while preserving the original formatting and layout.
Main components:
- Client for interacting with the Mistral AI OCR API
- Commands for processing documents, converting results to Markdown, and more
- Utilities for handling file operations and formatting
For usage information, see the README.md file or run:
mistral-ocr --help
"""
__version__ = "0.1.0"
+8
@@ -4,6 +4,14 @@ import os
from mistral_ocr.commands import process, convert, markdown, version from mistral_ocr.commands import process, convert, markdown, version
def main(): def main():
"""
Main entry point for the Mistral OCR CLI.
Parses command line arguments and dispatches to the appropriate command handler.
Returns:
int: Exit code (0 for success, 1 for error)
"""
parser = argparse.ArgumentParser( parser = argparse.ArgumentParser(
description="A CLI tool for performing OCR on documents using Mistral AI.", description="A CLI tool for performing OCR on documents using Mistral AI.",
prog="mistral-ocr" prog="mistral-ocr"
+65 -3
@@ -5,10 +5,33 @@ import requests
from typing import Optional, Dict, Any, Tuple from typing import Optional, Dict, Any, Tuple
class MistralClient: class MistralClient:
"""
Client for interacting with the Mistral AI OCR API.
This client handles authentication, file uploads, and OCR processing
requests to the Mistral AI API.
Attributes:
BASE_URL (str): Base URL for the Mistral AI API
MAX_FILE_SIZE (int): Maximum allowed file size in bytes (52 MB)
api_key (str): Mistral AI API key for authentication
session (requests.Session): Session object for making HTTP requests
"""
BASE_URL = "https://api.mistral.ai/v1" BASE_URL = "https://api.mistral.ai/v1"
MAX_FILE_SIZE = 52 * 1024 * 1024 # 52 MB MAX_FILE_SIZE = 52 * 1024 * 1024 # 52 MB
def __init__(self, api_key: Optional[str] = None): def __init__(self, api_key: Optional[str] = None):
"""
Initialize the Mistral AI client.
Args:
api_key (Optional[str]): Mistral AI API key. If not provided,
will look for MISTRAL_API_KEY environment variable.
Raises:
ValueError: If no API key is provided or found in environment variables.
"""
self.api_key = api_key or os.environ.get("MISTRAL_API_KEY") self.api_key = api_key or os.environ.get("MISTRAL_API_KEY")
if not self.api_key: if not self.api_key:
raise ValueError("API key must be provided or set as MISTRAL_API_KEY environment variable") raise ValueError("API key must be provided or set as MISTRAL_API_KEY environment variable")
@@ -20,7 +43,19 @@ class MistralClient:
}) })
def upload_file(self, file_path: str) -> str: def upload_file(self, file_path: str) -> str:
"""Upload a file to Mistral API for OCR processing.""" """
Upload a file to Mistral API for OCR processing.
Args:
file_path (str): Path to the local file to upload
Returns:
str: File ID returned by the API
Raises:
ValueError: If the file is too large or if the upload fails
requests.RequestException: If there's an error communicating with the API
"""
# Check file size # Check file size
file_size = os.path.getsize(file_path) file_size = os.path.getsize(file_path)
if file_size > self.MAX_FILE_SIZE: if file_size > self.MAX_FILE_SIZE:
@@ -72,7 +107,19 @@ class MistralClient:
raise last_error or ValueError(f"Failed to upload file after {max_retries} attempts") raise last_error or ValueError(f"Failed to upload file after {max_retries} attempts")
def get_file_url(self, file_id: str) -> str: def get_file_url(self, file_id: str) -> str:
"""Get a signed URL for an uploaded file.""" """
Get a signed URL for an uploaded file.
Args:
file_id (str): ID of the file previously uploaded to the API
Returns:
str: Signed URL that can be used for OCR processing
Raises:
ValueError: If the API response does not contain a URL
requests.RequestException: If there's an error communicating with the API
"""
response = self.session.get(f"{self.BASE_URL}/files/{file_id}/url?expiry=24") response = self.session.get(f"{self.BASE_URL}/files/{file_id}/url?expiry=24")
response.raise_for_status() response.raise_for_status()
@@ -85,7 +132,22 @@ class MistralClient:
return url return url
def process_ocr(self, doc_type: str, doc_source: str, include_image_base64: bool = False) -> bytes: def process_ocr(self, doc_type: str, doc_source: str, include_image_base64: bool = False) -> bytes:
"""Process a document with OCR.""" """
Process a document with OCR.
Args:
doc_type (str): Type of document, either "document_url" or "image_url"
doc_source (str): URL of the document to process
include_image_base64 (bool, optional): Whether to include base64-encoded
images in the response. Defaults to False.
Returns:
bytes: JSON response from the API containing OCR results
Raises:
ValueError: If the document type is unsupported or if processing fails
requests.RequestException: If there's an error communicating with the API
"""
if doc_type not in ["document_url", "image_url"]: if doc_type not in ["document_url", "image_url"]:
raise ValueError(f"Unsupported document type: {doc_type}") raise ValueError(f"Unsupported document type: {doc_type}")
+62
@@ -5,6 +5,14 @@ import re
from pathlib import Path from pathlib import Path
def run(args): def run(args):
"""
Main entry point for the convert command.
Converts OCR JSON results to Markdown format.
Args:
args: Command line arguments parsed by argparse
"""
# If output file is specified, enable single file mode # If output file is specified, enable single file mode
if args.output_file: if args.output_file:
args.single_file = True args.single_file = True
@@ -13,11 +21,28 @@ def run(args):
# OCR response structure classes # OCR response structure classes
class OCRResponseImage: class OCRResponseImage:
"""
Represents an image in the OCR response.
Attributes:
id (str): Unique identifier for the image
image_base64 (str): Base64-encoded image data
"""
def __init__(self, id, image_base64): def __init__(self, id, image_base64):
self.id = id self.id = id
self.image_base64 = image_base64 self.image_base64 = image_base64
class OCRResponsePage: class OCRResponsePage:
"""
Represents a page in the OCR response.
Attributes:
index (int): Zero-based page index
markdown (str): Extracted text content in Markdown format
image (str, optional): Main page image (if available)
images (list): List of OCRResponseImage objects
dimensions (dict, optional): Page dimensions
"""
def __init__(self, index, markdown, image=None, images=None, dimensions=None): def __init__(self, index, markdown, image=None, images=None, dimensions=None):
self.index = index self.index = index
self.markdown = markdown self.markdown = markdown
@@ -26,6 +51,15 @@ class OCRResponsePage:
self.dimensions = dimensions self.dimensions = dimensions
class OCRResponseMetadata: class OCRResponseMetadata:
"""
Represents metadata in the OCR response.
Attributes:
title (str, optional): Document title
author (str, optional): Document author
creation_date (str, optional): Document creation date
page_count (int, optional): Total number of pages
"""
def __init__(self, title=None, author=None, creation_date=None, page_count=None): def __init__(self, title=None, author=None, creation_date=None, page_count=None):
self.title = title self.title = title
self.author = author self.author = author
@@ -33,11 +67,29 @@ class OCRResponseMetadata:
self.page_count = page_count self.page_count = page_count
class OCRResponse: class OCRResponse:
"""
Represents the complete OCR response.
Attributes:
pages (list): List of OCRResponsePage objects
metadata (OCRResponseMetadata): Document metadata
"""
def __init__(self, pages=None, metadata=None): def __init__(self, pages=None, metadata=None):
self.pages = pages or [] self.pages = pages or []
self.metadata = metadata or OCRResponseMetadata() self.metadata = metadata or OCRResponseMetadata()
def replace_image_references(content, images, include_images): def replace_image_references(content, images, include_images):
"""
Replace image references in markdown content with base64 data.
Args:
content (str): Markdown content with image references
images (list): List of OCRResponseImage objects
include_images (bool): Whether to include images in the output
Returns:
str: Markdown content with image references replaced with base64 data
"""
if not include_images or not images: if not include_images or not images:
return content return content
@@ -62,6 +114,16 @@ def replace_image_references(content, images, include_images):
return content return content
def convert_json_to_markdown(json_file, args): def convert_json_to_markdown(json_file, args):
"""
Convert OCR JSON results to Markdown format.
Args:
json_file (str): Path to the JSON file containing OCR results
args: Command line arguments containing conversion options
Raises:
SystemExit: If an error occurs during conversion
"""
try: try:
# Read JSON file # Read JSON file
with open(json_file, 'r', encoding='utf-8') as f: with open(json_file, 'r', encoding='utf-8') as f:
+13
@@ -5,6 +5,19 @@ from pathlib import Path
from mistral_ocr.commands import process, convert from mistral_ocr.commands import process, convert
def run(args): def run(args):
"""
Main entry point for the markdown command.
Processes a document with OCR and converts the results to Markdown in one step.
This is a convenience command that combines the functionality of the 'process'
and 'convert' commands.
Args:
args: Command line arguments parsed by argparse
Raises:
SystemExit: If an error occurs during processing or conversion
"""
# Ensure that if --images is true, include_image_base64 is also true # Ensure that if --images is true, include_image_base64 is also true
include_image_base64 = args.images include_image_base64 = args.images
+37
@@ -6,6 +6,14 @@ import urllib.parse
from mistral_ocr.client import MistralClient from mistral_ocr.client import MistralClient
def run(args): def run(args):
"""
Main entry point for the process command.
Processes a document with OCR, either from a URL or a local file.
Args:
args: Command line arguments parsed by argparse
"""
file_path = args.file file_path = args.file
# Determine if input is a URL or a local file # Determine if input is a URL or a local file
@@ -15,6 +23,17 @@ def run(args):
process_local_file(file_path, args.output_file, args.include_images) process_local_file(file_path, args.output_file, args.include_images)
def process_url(url, output_file, include_image_base64): def process_url(url, output_file, include_image_base64):
"""
Process a document from a URL.
Args:
url (str): URL of the document to process
output_file (str): Path to save the OCR results, or None for stdout
include_image_base64 (bool): Whether to include base64-encoded images in the output
Raises:
SystemExit: If an error occurs during processing
"""
try: try:
client = MistralClient() client = MistralClient()
@@ -35,6 +54,17 @@ def process_url(url, output_file, include_image_base64):
sys.exit(1) sys.exit(1)
def process_local_file(file_path, output_file, include_image_base64): def process_local_file(file_path, output_file, include_image_base64):
"""
Process a local document file.
Args:
file_path (str): Path to the local file to process
output_file (str): Path to save the OCR results, or None for stdout
include_image_base64 (bool): Whether to include base64-encoded images in the output
Raises:
SystemExit: If an error occurs during processing
"""
try: try:
print(f"Processing local file: {file_path}") print(f"Processing local file: {file_path}")
@@ -71,6 +101,13 @@ def process_local_file(file_path, output_file, include_image_base64):
sys.exit(1) sys.exit(1)
def handle_output(data, output_file): def handle_output(data, output_file):
"""
Handle the OCR response output.
Args:
data (bytes): JSON response data from the OCR API
output_file (str): Path to save the OCR results, or None for stdout
"""
# Pretty print the JSON response # Pretty print the JSON response
pretty_json = json.dumps(json.loads(data), indent=2) pretty_json = json.dumps(json.loads(data), indent=2)
+10 -3
@@ -1,6 +1,13 @@
import sys import sys
from mistral_ocr import __version__
VERSION = "0.1.0"
def run(args): def run(args):
print(f"Mistral OCR CLI v{VERSION}") """
Main entry point for the version command.
Prints the current version of the Mistral OCR CLI.
Args:
args: Command line arguments parsed by argparse (not used)
"""
print(f"Mistral OCR CLI v{__version__}")
+8
@@ -1 +1,9 @@
# Core dependencies
requests>=2.25.0 requests>=2.25.0
# Optional and development dependencies
markdown>=3.3.0 # For converting markdown to HTML
pytest>=6.0.0 # For running tests
pytest-cov>=2.0.0 # For test coverage reports
black>=21.0.0 # For code formatting
flake8>=3.9.0 # For linting
+17 -1
@@ -1,10 +1,24 @@
from setuptools import setup, find_packages from setuptools import setup, find_packages
import os
import re
# Read version from __init__.py
with open(os.path.join('mistral_ocr', '__init__.py'), 'r') as f:
version_match = re.search(r"__version__\s*=\s*['\"]([^'\"]*)['\"]", f.read())
version = version_match.group(1) if version_match else '0.1.0'
# Read long description from README.md
with open('README.md', 'r', encoding='utf-8') as f:
long_description = f.read()
setup( setup(
name="mistral-ocr", name="mistral-ocr",
version="0.1.0", version=version,
description="A CLI tool for performing OCR on documents using Mistral AI", description="A CLI tool for performing OCR on documents using Mistral AI",
long_description=long_description,
long_description_content_type="text/markdown",
author="Mistral OCR Team", author="Mistral OCR Team",
url="https://github.com/yourusername/mistral-ocr-python",
packages=find_packages(), packages=find_packages(),
install_requires=[ install_requires=[
"requests>=2.25.0", "requests>=2.25.0",
@@ -23,6 +37,8 @@ setup(
"Programming Language :: Python :: 3.8", "Programming Language :: Python :: 3.8",
"Programming Language :: Python :: 3.9", "Programming Language :: Python :: 3.9",
"Programming Language :: Python :: 3.10", "Programming Language :: Python :: 3.10",
"Topic :: Text Processing",
"Topic :: Utilities",
], ],
python_requires=">=3.7", python_requires=">=3.7",
) )