Add comprehensive documentation and code comments

This commit adds extensive documentation to the Mistral OCR CLI project:

- Add API.md with detailed API response format documentation
- Add CHANGELOG.md to track version changes
- Add CONTRIBUTING.md with guidelines for contributors
- Enhance README.md with more detailed usage examples and troubleshooting
- Add proper docstrings to all Python modules and functions
- Update requirements.txt with development dependencies
- Improve setup.py with better metadata

These changes make the project more accessible to users and contributors.
2025-04-24 21:11:41 +02:00
parent 240d64023b
commit 5e891ef461
13 changed files with 786 additions and 15 deletions
+135 -7
# Mistral OCR CLI (Python)
A command-line tool for processing documents with Mistral AI's OCR capabilities, implemented in Python. This tool allows you to extract text and structured content from PDF documents and images while preserving the original formatting and layout.
## Features
- Output results to stdout or to a file
- Convert OCR results to Markdown format
- Maintain document structure and formatting in the output
- Support for extracting and embedding images
- Metadata extraction (title, author, creation date)
- Page-by-page processing with optional single-file output
## How It Works
Mistral OCR CLI works by:
1. Uploading your document to the Mistral AI API (for local files) or passing its URL directly (for remote documents)
2. Processing the document with Mistral's OCR service
3. Receiving structured JSON data containing the extracted text, formatting, and metadata
4. Optionally converting this data to Markdown format for easy reading and editing
The tool handles authentication, file uploads, API communication, and result formatting, making it easy to integrate OCR capabilities into your workflow.
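Step 1 above branches on whether the input is a local path or a URL. Below is a minimal sketch of that decision. It assumes that only `http`/`https` URLs are sent to the API as-is and that everything else is treated as a local file to upload. `classify_source` is a hypothetical name, not a function taken from this project.

```python
from pathlib import Path
from urllib.parse import urlparse


def classify_source(source: str) -> str:
    """Return "url" for remote documents and "file" for local paths.

    Hypothetical helper: assumes http(s) URLs are passed to the API
    directly and anything else is a local file that must be uploaded.
    """
    parsed = urlparse(source)
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return "url"
    # Expand "~" so paths like ~/Documents/sample.pdf resolve correctly.
    path = Path(source).expanduser()
    if not path.is_file():
        raise FileNotFoundError(f"No such file: {path}")
    return "file"
```

For example, `classify_source("https://arxiv.org/pdf/2201.04234")` returns `"url"`, while an existing local PDF returns `"file"`.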
## Installation
- Python 3.7 or later
- pip (Python package installer)
- A Mistral AI API key (sign up at [Mistral AI](https://mistral.ai) if you don't have one)
### Installing from source
```bash
git clone https://github.com/yourusername/mistral-ocr-python
cd mistral-ocr-python
pip install -e .
```
Alternatively, you can use the build script, which creates a virtual environment and installs the package:
```bash
git clone https://github.com/yourusername/mistral-ocr-python
cd mistral-ocr-python
./build.sh
```
### Installing from PyPI (coming soon)
```bash
pip install mistral-ocr
```
## Usage
### Setting up your API key
You can provide your Mistral API key in two ways:
1. Environment variable (recommended for security):
```bash
export MISTRAL_API_KEY=your-api-key
```
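Key lookup usually follows a simple precedence: an explicitly supplied key wins, and otherwise the environment variable is used. The sketch below assumes that order (explicit argument first, then `MISTRAL_API_KEY`). `resolve_api_key` is a hypothetical name, and the error text is copied from the Troubleshooting section.

```python
import os
from typing import Optional


def resolve_api_key(api_key: Optional[str] = None) -> str:
    """Prefer an explicitly passed key, falling back to MISTRAL_API_KEY.

    Hypothetical helper; the precedence order is an assumption, and the
    CLI's actual implementation may differ.
    """
    key = api_key or os.environ.get("MISTRAL_API_KEY")
    if not key:
        raise ValueError(
            "API key must be provided or set as MISTRAL_API_KEY environment variable"
        )
    return key
```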
### Examples
#### Process a local PDF and save the output
```bash
mistral-ocr process ~/Documents/sample.pdf --output-file results.json
```
#### Process a document from a URL
```bash
mistral-ocr process https://arxiv.org/pdf/2201.04234 > output.json
```
#### Convert OCR JSON to Markdown files
```bash
# Combine all pages into a single file
mistral-ocr convert output.json --single-file --output-dir markdown_docs
# Write the output to a specific file
mistral-ocr convert output.json --output-file docs/paper.md
```
#### Process a document and generate markdown files in one step
```bash
# Write the output to a specific file
mistral-ocr markdown ~/Documents/research-paper.pdf --output-file research_docs/paper.md
```
## OCR Response Format
The OCR API returns a JSON response with the following structure:
```json
{
"metadata": {
"title": "Document Title",
"author": "Document Author",
"creation_date": "2023-01-01",
"page_count": 5
},
"pages": [
{
"index": 0,
"markdown": "# Page Content\n\nThis is the content of page 1...",
"images": [
{
"id": "image-1",
"image_base64": "base64-encoded-image-data"
}
]
},
{
"index": 1,
"markdown": "## Page 2 Content\n\nThis is the content of page 2...",
"images": []
}
]
}
```
### Key Components
- **metadata**: Contains document-level information
  - **title**: Document title (if available)
  - **author**: Document author (if available)
  - **creation_date**: Document creation date (if available)
  - **page_count**: Total number of pages
- **pages**: Array of page objects
  - **index**: Zero-based page index
  - **markdown**: Extracted text in Markdown format
  - **images**: Array of images found on the page
    - **id**: Unique image identifier
    - **image_base64**: Base64-encoded image data (only included if `--include-images` is specified)
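This structure maps onto the two output modes of the `convert` command: one file per page, or one combined file with `--single-file`. Below is a minimal sketch of that mapping, assuming the response contains exactly the fields listed above. The function names are hypothetical, and the `---` page separator is an illustrative choice that the CLI may not share.

```python
import base64
from typing import Any, Dict, List


def pages_to_markdown(ocr: Dict[str, Any]) -> List[str]:
    """Return one Markdown string per page, ordered by zero-based page index."""
    pages = sorted(ocr.get("pages", []), key=lambda page: page["index"])
    return [page.get("markdown", "") for page in pages]


def to_single_markdown(ocr: Dict[str, Any], separator: str = "\n\n---\n\n") -> str:
    """Join every page into one Markdown document.

    The horizontal-rule separator keeps page boundaries visible; it is an
    illustrative choice, not necessarily what the CLI emits.
    """
    return separator.join(pages_to_markdown(ocr))


def decode_images(ocr: Dict[str, Any]) -> Dict[str, bytes]:
    """Map image ids to raw bytes, skipping images without embedded data.

    `image_base64` is only present when `--include-images` was used.
    """
    images: Dict[str, bytes] = {}
    for page in ocr.get("pages", []):
        for image in page.get("images", []):
            data = image.get("image_base64")
            if data:
                images[image["id"]] = base64.b64decode(data)
    return images
```

To use these functions on a saved response, load the file with `json.load` and pass the resulting dictionary. The `image_base64` value in the sample response above is a placeholder and would not decode.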
## Troubleshooting
### Common Issues
#### API Key Issues
```
Error processing document: API key must be provided or set as MISTRAL_API_KEY environment variable
```
**Solution**: Ensure your API key is correctly set as an environment variable or provided with the `--api-key` flag.
#### File Size Limits
```
Error processing document: File is too large (55.00 MB). Maximum allowed size is 52.00 MB
```
**Solution**: The Mistral API has a file size limit of 52MB. For larger files, consider splitting them into smaller documents.
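Checking the size locally before uploading avoids a wasted request. Below is a sketch that assumes the 52 MB limit stated above and treats "MB" as 2^20 bytes. `check_file_size` is a hypothetical helper whose message mirrors the error shown.

```python
import os

# Limit taken from the error message above; assumes "MB" means 2**20 bytes.
MAX_FILE_SIZE_MB = 52.0


def check_file_size(path: str, limit_mb: float = MAX_FILE_SIZE_MB) -> float:
    """Raise ValueError if the file exceeds the limit; otherwise return its size in MB."""
    size_mb = os.path.getsize(path) / (1024 * 1024)
    if size_mb > limit_mb:
        raise ValueError(
            f"File is too large ({size_mb:.2f} MB). "
            f"Maximum allowed size is {limit_mb:.2f} MB"
        )
    return size_mb
```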
#### Rate Limiting
```
Error processing document: API returned error status: 429 - Rate limit exceeded
```
**Solution**: The API has rate limits. Wait a few minutes before trying again or contact Mistral AI to increase your rate limits.
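If you script many requests, retrying with exponential backoff is a common way to ride out 429 responses. The sketch below is generic and not necessarily what the CLI does internally. `RateLimitError` and `with_backoff` are hypothetical names, and you would raise `RateLimitError` wherever your own code detects a 429 status.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical exception raised when the API answers with HTTP 429."""


def with_backoff(
    call: Callable[[], T],
    max_retries: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry `call` on RateLimitError, doubling the base wait after each failure."""
    attempt = 0
    while True:
        try:
            return call()
        except RateLimitError:
            if attempt >= max_retries:
                raise  # Give up and surface the last rate-limit error.
            # Exponential backoff (base, 2*base, 4*base, ...) plus random jitter up to base_delay.
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            sleep(delay)
            attempt += 1
```

Injecting `sleep` keeps the helper testable without real waiting. The jitter spreads out retries when several clients hit the limit at the same time.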
#### Invalid JSON
```
Error converting JSON to markdown: Expecting property name enclosed in double quotes
```
**Solution**: Ensure the JSON file is valid. You can validate it using tools like `jq`.
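Python's standard library can run the same check and also pinpoint the failing position. Below is a minimal sketch. `validate_json` is a hypothetical helper, while `json.JSONDecodeError` and its `lineno`, `colno`, and `msg` attributes are part of the standard `json` module.

```python
import json
import sys


def validate_json(path: str) -> bool:
    """Return True if `path` holds valid JSON; otherwise report where parsing failed."""
    try:
        with open(path, encoding="utf-8") as handle:
            json.load(handle)
    except json.JSONDecodeError as err:
        # JSONDecodeError exposes the 1-based line and column of the problem.
        print(f"{path}: line {err.lineno}, column {err.colno}: {err.msg}", file=sys.stderr)
        return False
    return True
```

From the shell, `python -m json.tool output.json` performs an equivalent check without installing extra tools.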
### API Limitations
- Maximum file size: 52MB
- Supported file formats: PDF, JPG, JPEG, PNG, WEBP, GIF
- Rate limits may apply depending on your Mistral AI account tier
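A quick pre-flight check against the supported formats can be done by file extension. Below is a sketch that assumes the extension reflects the real file type; it does not inspect file contents. `SUPPORTED_EXTENSIONS` and `is_supported_format` are hypothetical names.

```python
from pathlib import Path

# Extensions from the "Supported file formats" list above, compared case-insensitively.
SUPPORTED_EXTENSIONS = {".pdf", ".jpg", ".jpeg", ".png", ".webp", ".gif"}


def is_supported_format(path: str) -> bool:
    """Return True if the file's last extension is one of the supported formats."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS
```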
## Contributing
Contributions to Mistral OCR CLI are welcome! Here's how you can contribute:
1. **Fork the repository**
2. **Create a feature branch**:
   ```bash
   git checkout -b feature/your-feature-name
   ```
3. **Make your changes**
4. **Run tests** (if available):
   ```bash
   python -m unittest discover tests
   ```
5. **Submit a pull request**
Please ensure your code follows the project's coding standards and includes appropriate tests and documentation.
## License
MIT