Add feature to extract images as separate files

2025-04-24 21:44:49 +02:00
parent 012755b7f4
commit 220864d52f
6 changed files with 197 additions and 17 deletions
@@ -113,6 +113,12 @@ mistral-ocr convert results.json --output-file document.md
# Include images in markdown (if available in JSON)
mistral-ocr convert results.json --images
# Extract images to files instead of embedding them in markdown
mistral-ocr convert results.json --images --extract-images
# Specify a custom directory for extracted images
mistral-ocr convert results.json --images --extract-images --image-dir images_folder
```
#### Process and Convert in One Step
@@ -134,6 +140,12 @@ mistral-ocr markdown path/to/document.pdf --output-file docs/result.md
# Save intermediate JSON and generate markdown files
mistral-ocr markdown path/to/document.pdf --json-file results.json --output-dir docs
# Extract images to files instead of embedding them in markdown
mistral-ocr markdown path/to/document.pdf --images --extract-images
# Specify a custom directory for extracted images
mistral-ocr markdown path/to/document.pdf --images --extract-images --image-dir custom_images
```
This command combines the `process` and `convert` steps, creating markdown files directly from the document.
@@ -182,8 +194,26 @@ mistral-ocr markdown ~/Documents/research-paper.pdf --single-file --output-dir r
# Generate a single markdown file with specific filename
mistral-ocr markdown ~/Documents/research-paper.pdf --output-file research_docs/paper.md
# Process a document and extract images to separate files
mistral-ocr markdown ~/Documents/research-paper.pdf --images --extract-images --output-dir research_docs
```
## Image Handling
The tool provides several options for handling images in the OCR output:
1. **No images**: By default, images are not included in the output.
2. **Embedded images**: Using the `--images` flag without `--extract-images` embeds base64-encoded images directly in the markdown file. This creates a self-contained document, but the file can become very large.
3. **Extracted images**: Using both the `--images` and `--extract-images` flags will:
   - Extract images from the OCR results
   - Save them as separate files in an images directory
   - Reference these files in the markdown instead of embedding the base64 data
You can specify a custom directory for extracted images with the `--image-dir` option. If it is not specified, images are saved in a subdirectory called `images` within the output directory.
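
The extraction steps listed above can be sketched roughly as follows. This is a minimal illustration, not the tool's actual implementation: the function name `extract_images`, the `link_prefix` parameter, and the `id` / `image_base64` field names are assumptions made for the example, as is the idea that the markdown refers to each image by its id.

```python
import base64
import tempfile
from pathlib import Path


def extract_images(markdown: str, images: list[dict], image_dir: Path,
                   link_prefix: str = "images") -> str:
    """Write base64 images to image_dir and return markdown that links to them.

    Hypothetical helper: each entry in `images` is assumed to carry an "id"
    (used as the file name) and an "image_base64" payload.
    """
    image_dir.mkdir(parents=True, exist_ok=True)
    for image in images:
        image_id = image["id"]        # assumed field name
        data = image["image_base64"]  # assumed field name
        # Drop a data-URI header such as "data:image/jpeg;base64," if present.
        if data.startswith("data:") and "," in data:
            data = data.split(",", 1)[1]
        # Save the decoded bytes as a separate file.
        (image_dir / image_id).write_bytes(base64.b64decode(data))
        # Point the markdown link at the saved file instead of the bare id.
        markdown = markdown.replace(f"]({image_id})", f"]({link_prefix}/{image_id})")
    return markdown


if __name__ == "__main__":
    out_dir = Path(tempfile.mkdtemp())
    sample = [{
        "id": "img-0.jpeg",
        "image_base64": "data:image/jpeg;base64," + base64.b64encode(b"demo").decode(),
    }]
    print(extract_images("![img-0.jpeg](img-0.jpeg)", sample, out_dir / "images"))
```

Writing links relative to the markdown file keeps the output directory portable: the document and its image folder can be moved together without breaking references.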
## OCR Response Format
The OCR API returns a JSON response with the following structure: