# Mistral OCR Architecture Documentation

This document provides a comprehensive overview of the Mistral OCR CLI architecture, including UML diagrams to illustrate the system's structure, behavior, and interactions.

## Table of Contents

- [Mistral OCR Architecture Documentation](#mistral-ocr-architecture-documentation)
  - [Table of Contents](#table-of-contents)
  - [System Overview](#system-overview)
  - [Class Diagram](#class-diagram)
  - [Component Diagram](#component-diagram)
  - [Sequence Diagrams](#sequence-diagrams)
    - [Process Document Workflow](#process-document-workflow)
    - [Convert JSON to Markdown Workflow](#convert-json-to-markdown-workflow)
    - [Combined Markdown Workflow](#combined-markdown-workflow)
    - [Image Extraction Workflow](#image-extraction-workflow)
  - [Activity Diagram](#activity-diagram)
  - [Use Case Diagram](#use-case-diagram)
  - [Data Flow Diagram](#data-flow-diagram)

## System Overview

Mistral OCR is a command-line tool for processing documents with Mistral AI's OCR capabilities. The system allows users to extract text and structured content from PDF documents and images while preserving the original formatting and layout.
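Every command path eventually sends a single request to Mistral's OCR endpoint. Local files are first uploaded and exchanged for a signed URL. URLs are sent directly. The sketch below builds that request body offline so the shape of the call is visible. It is a minimal sketch under stated assumptions, not the CLI's code:

- The field names (`model`, `document`, `include_image_base64`), the `document_url`/`image_url` document types, and the model name `mistral-ocr-latest` are taken from Mistral's public OCR API as understood at the time of writing.
- `is_url`, `doc_type_for`, `build_ocr_payload`, and the extension list are hypothetical helpers introduced for illustration. Only the `process_ocr(doc_type, doc_source, include_image_base64)` parameter names come from the class diagram.

```python
import os
from urllib.parse import urlparse

# Extensions routed to the image input type (an assumption for this sketch).
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".gif", ".webp", ".bmp", ".tif", ".tiff"}


def is_url(source: str) -> bool:
    """True for http(s) URLs, False for local paths such as 'document.pdf'."""
    parsed = urlparse(source)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def doc_type_for(source: str) -> str:
    """Choose 'image_url' for image inputs and 'document_url' for everything else."""
    path = urlparse(source).path if is_url(source) else source
    extension = os.path.splitext(path)[1].lower()
    return "image_url" if extension in IMAGE_EXTENSIONS else "document_url"


def build_ocr_payload(doc_type: str, doc_source: str, include_image_base64: bool,
                      model: str = "mistral-ocr-latest") -> dict:
    """Build a JSON body for POST /ocr (hypothetical helper; field names assumed).

    doc_source is the signed URL for an uploaded local file, or the user's URL otherwise.
    """
    return {
        "model": model,
        "document": {"type": doc_type, doc_type: doc_source},
        "include_image_base64": include_image_base64,
    }
```

For `mistral-ocr process https://example.com/doc.pdf`, the payload can be built straight from the URL. For a local `document.pdf`, the type would be chosen from the local path, while `doc_source` would be the signed URL returned by `GET /files/{file_id}/url`.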
The tool is structured around a set of commands:

- `process`: Processes a document (local file or URL) with OCR and outputs JSON
- `convert`: Converts the JSON output to Markdown
- `markdown`: Combines the process and convert steps in one command
- `version`: Displays version information

## Class Diagram

The following class diagram illustrates the main classes in the Mistral OCR system and their relationships:

```mermaid
classDiagram
    class MistralClient {
        +BASE_URL: str
        +MAX_FILE_SIZE: int
        +api_key: str
        +session: requests.Session
        +__init__(api_key: str)
        +upload_file(file_path: str): str
        +get_file_url(file_id: str): str
        +process_ocr(doc_type: str, doc_source: str, include_image_base64: bool): bytes
    }

    class OCRResponse {
        +pages: list
        +metadata: OCRResponseMetadata
        +__init__(pages: list, metadata: OCRResponseMetadata)
    }

    class OCRResponseMetadata {
        +title: str
        +author: str
        +creation_date: str
        +page_count: int
        +__init__(title: str, author: str, creation_date: str, page_count: int)
    }

    class OCRResponsePage {
        +index: int
        +markdown: str
        +image: str
        +images: list
        +dimensions: dict
        +__init__(index: int, markdown: str, image: str, images: list, dimensions: dict)
    }

    class OCRResponseImage {
        +id: str
        +image_base64: str
        +__init__(id: str, image_base64: str)
    }

    OCRResponse "1" *-- "1" OCRResponseMetadata : contains
    OCRResponse "1" *-- "many" OCRResponsePage : contains
    OCRResponsePage "1" *-- "many" OCRResponseImage : contains
```

## Component Diagram

The following component diagram shows the high-level architecture of the Mistral OCR system:

```mermaid
graph TD
    CLI[CLI Interface] --> ProcessCmd[Process Command]
    CLI --> ConvertCmd[Convert Command]
    CLI --> MarkdownCmd[Markdown Command]
    CLI --> VersionCmd[Version Command]

    ProcessCmd --> APIClient[Mistral API Client]
    ConvertCmd --> JSONParser[JSON Parser]
    MarkdownCmd --> ProcessCmd
    MarkdownCmd --> ConvertCmd

    APIClient --> ExternalAPI[Mistral AI API]
    JSONParser --> MarkdownGenerator[Markdown Generator]
    MarkdownGenerator --> ImageHandler[Image Handler]

    subgraph "External Services"
        ExternalAPI
    end

    subgraph "Core Components"
        APIClient
        JSONParser
        MarkdownGenerator
        ImageHandler
    end

    subgraph "Command Interface"
        ProcessCmd
        ConvertCmd
        MarkdownCmd
        VersionCmd
    end
```

## Sequence Diagrams

### Process Document Workflow

The following sequence diagram illustrates the workflow for processing a document with OCR:

```mermaid
sequenceDiagram
    participant User
    participant CLI as CLI Interface
    participant Process as Process Command
    participant Client as Mistral Client
    participant API as Mistral AI API

    User->>CLI: mistral-ocr process document.pdf
    CLI->>Process: run(args)

    alt Local File
        Process->>Client: process_local_file(file_path, output_file, include_images)
        Client->>Client: upload_file(file_path)
        Client->>API: POST /files
        API-->>Client: file_id
        Client->>Client: get_file_url(file_id)
        Client->>API: GET /files/{file_id}/url
        API-->>Client: signed_url
        Client->>API: POST /ocr
        API-->>Client: OCR results (JSON)
    else URL
        Process->>Client: process_url(url, output_file, include_images)
        Client->>API: POST /ocr
        API-->>Client: OCR results (JSON)
    end

    Client-->>Process: OCR results (JSON)
    Process->>Process: handle_output(data, output_file)
    Process-->>CLI: Success message
    CLI-->>User: Display results or confirmation
```

### Convert JSON to Markdown Workflow

The following sequence diagram illustrates the workflow for converting JSON to Markdown:

```mermaid
sequenceDiagram
    participant User
    participant CLI as CLI Interface
    participant Convert as Convert Command
    participant Parser as JSON Parser
    participant Generator as Markdown Generator

    User->>CLI: mistral-ocr convert results.json
    CLI->>Convert: run(args)
    Convert->>Parser: load and parse JSON
    Parser-->>Convert: OCR data structure
    Convert->>Generator: convert_json_to_markdown(json_file, args)

    alt Single File Mode
        Generator->>Generator: Process all pages into one file
        loop For each page
            Generator->>Generator: replace_image_references(content, images, include_images, extract_images, image_dir)
        end
        Generator->>Generator: Write combined markdown file
    else Multi-File Mode
        loop For each page
            Generator->>Generator: replace_image_references(content, images, include_images, extract_images, image_dir)
            Generator->>Generator: Write individual markdown file
        end
    end

    Generator-->>Convert: Success message
    Convert-->>CLI: Success message
    CLI-->>User: Display confirmation
```

### Combined Markdown Workflow

The following sequence diagram illustrates the combined workflow for processing a document and converting it to Markdown in one step:

```mermaid
sequenceDiagram
    participant User
    participant CLI as CLI Interface
    participant Markdown as Markdown Command
    participant Process as Process Command
    participant Convert as Convert Command
    participant Client as Mistral Client
    participant API as Mistral AI API

    User->>CLI: mistral-ocr markdown document.pdf
    CLI->>Markdown: run(args)
    Markdown->>Markdown: Create temp JSON file if needed

    alt Local File
        Markdown->>Process: process_local_file(file_path, json_output_path, include_image_base64)
        Process->>Client: upload_file(file_path)
        Client->>API: POST /files
        API-->>Client: file_id
        Client->>Client: get_file_url(file_id)
        Client->>API: GET /files/{file_id}/url
        API-->>Client: signed_url
        Client->>API: POST /ocr
        API-->>Client: OCR results (JSON)
        Client-->>Process: OCR results
        Process->>Process: Write JSON to file
    else URL
        Markdown->>Process: process_url(url, json_output_path, include_image_base64)
        Process->>Client: process_url(url, output_file, include_images)
        Client->>API: POST /ocr
        API-->>Client: OCR results (JSON)
        Client-->>Process: OCR results
        Process->>Process: Write JSON to file
    end

    Markdown->>Convert: convert_json_to_markdown(json_output_path, args)
    Convert->>Convert: Parse JSON and generate markdown
    Convert-->>Markdown: Success message
    Markdown->>Markdown: Clean up temp file if created
    Markdown-->>CLI: Success message
    CLI-->>User: Display confirmation
```

### Image Extraction Workflow

The following sequence diagram illustrates the workflow for extracting images to separate files:

```mermaid
sequenceDiagram
    participant User
    participant CLI as CLI Interface
    participant Convert as Convert Command
    participant Generator as Markdown Generator
    participant ImageHandler as Image Handler
    participant FileSystem as File System

    User->>CLI: mistral-ocr convert results.json --images --extract-images
    CLI->>Convert: run(args)
    Convert->>Generator: convert_json_to_markdown(json_file, args)
    Generator->>Generator: Determine image directory
    Generator->>FileSystem: Create image directory if needed

    loop For each page with images
        Generator->>ImageHandler: replace_image_references(content, images, include_images, extract_images, image_dir)
        loop For each image
            ImageHandler->>ImageHandler: extract_image_to_file(image_base64, image_id, image_dir)
            ImageHandler->>FileSystem: Write image file
            FileSystem-->>ImageHandler: File path
            ImageHandler->>ImageHandler: Update markdown to reference file
        end
        ImageHandler-->>Generator: Updated markdown content
    end

    Generator->>FileSystem: Write markdown file(s)
    Generator-->>Convert: Success message
    Convert-->>CLI: Success message
    CLI-->>User: Display confirmation
```

## Activity Diagram

The following activity diagram illustrates the overall process flow of the Mistral OCR system:

```mermaid
graph TD
    Start([Start]) --> ParseArgs[Parse Command Line Arguments]
    ParseArgs --> CommandCheck{Which Command?}

    CommandCheck -->|process| ProcessCommand[Process Command]
    CommandCheck -->|convert| ConvertCommand[Convert Command]
    CommandCheck -->|markdown| MarkdownCommand[Markdown Command]
    CommandCheck -->|version| VersionCommand[Version Command]
    CommandCheck -->|none| ShowHelp[Show Help]

    ProcessCommand --> InputCheck{Input Type?}
    InputCheck -->|Local File| UploadFile[Upload File to API]
    InputCheck -->|URL| ProcessURL[Process URL Directly]
    UploadFile --> GetSignedURL[Get Signed URL]
    GetSignedURL --> ProcessOCR[Process with OCR API]
    ProcessURL --> ProcessOCR
    ProcessOCR --> SaveJSON[Save JSON Results]
    SaveJSON --> ProcessEnd([Process End])

    ConvertCommand --> LoadJSON[Load JSON File]
    LoadJSON --> ParseJSON[Parse JSON Structure]
    ParseJSON --> OutputModeCheck{Output Mode?}
    OutputModeCheck -->|Single File| CombinePages[Combine All Pages]
    OutputModeCheck -->|Multiple Files| ProcessPages[Process Each Page Separately]
    CombinePages --> ImageCheck{Include Images?}
    ProcessPages --> ImageCheck
    ImageCheck -->|No| GenerateMarkdown[Generate Markdown Without Images]
    ImageCheck -->|Yes| ExtractCheck{Extract Images?}
    ExtractCheck -->|No| EmbedImages[Embed Images as Base64]
    ExtractCheck -->|Yes| ExtractImages[Extract Images to Files]
    EmbedImages --> GenerateMarkdown
    ExtractImages --> UpdateReferences[Update Image References]
    UpdateReferences --> GenerateMarkdown
    GenerateMarkdown --> SaveMarkdownFiles[Save Markdown Files]
    SaveMarkdownFiles --> ConvertEnd([Convert End])

    MarkdownCommand --> CreateTempJSON[Create Temporary JSON File]
    CreateTempJSON --> ProcessInMarkdown[Process Document]
    ProcessInMarkdown --> ConvertInMarkdown[Convert to Markdown]
    ConvertInMarkdown --> CleanupTemp[Cleanup Temporary Files]
    CleanupTemp --> MarkdownEnd([Markdown End])

    VersionCommand --> ShowVersion[Show Version Information]
    ShowVersion --> VersionEnd([Version End])

    ShowHelp --> HelpEnd([Help End])
```

## Use Case Diagram

The following use case diagram illustrates the different ways users can interact with the Mistral OCR CLI:

```mermaid
graph TD
    User((User))

    subgraph "Mistral OCR CLI"
        ProcessDoc[Process Document]
        ProcessURL[Process URL]
        ConvertJSON[Convert JSON to Markdown]
        CombinedProcess[Process and Convert in One Step]
        ExtractImages[Extract Images to Files]
        EmbedImages[Embed Images in Markdown]
        CheckVersion[Check Version]
    end

    User -->|mistral-ocr process file.pdf| ProcessDoc
    User -->|mistral-ocr process https://example.com/doc.pdf| ProcessURL
    User -->|mistral-ocr convert results.json| ConvertJSON
    User -->|mistral-ocr markdown file.pdf| CombinedProcess
    User -->|--extract-images flag| ExtractImages
    User -->|--images flag| EmbedImages
    User -->|mistral-ocr version| CheckVersion

    ProcessDoc -.-> ConvertJSON
    ProcessURL -.-> ConvertJSON
    ConvertJSON -.-> ExtractImages
    ConvertJSON -.-> EmbedImages
    CombinedProcess -.-> ProcessDoc
    CombinedProcess -.-> ConvertJSON
```

## Data Flow Diagram

The following data flow diagram illustrates how data moves through the Mistral OCR system:

```mermaid
graph TD
    Input[Document Input]
    API[Mistral AI API]
    JSONStorage[JSON Storage]
    MarkdownOutput[Markdown Output]
    ImageStorage[Image Storage]

    Input -->|Local File| FileUpload[File Upload]
    Input -->|URL| DirectProcess[Direct Processing]
    FileUpload --> API
    DirectProcess --> API
    API -->|OCR Results| JSONStorage
    JSONStorage -->|Parse| MarkdownGen[Markdown Generation]
    MarkdownGen -->|Text Content| MarkdownOutput
    MarkdownGen -->|Image Data| ImageCheck{Extract Images?}
    ImageCheck -->|Yes| ImageExtraction[Image Extraction]
    ImageCheck -->|No| Base64Embedding[Base64 Embedding]
    ImageExtraction --> ImageStorage
    ImageExtraction -->|Image References| MarkdownOutput
    Base64Embedding -->|Embedded Images| MarkdownOutput

    subgraph "Input Processing"
        Input
        FileUpload
        DirectProcess
    end

    subgraph "OCR Processing"
        API
        JSONStorage
    end

    subgraph "Output Generation"
        MarkdownGen
        ImageCheck
        ImageExtraction
        Base64Embedding
        MarkdownOutput
        ImageStorage
    end
```

Together, these diagrams describe the Mistral OCR system's architecture, workflows, and interactions. Use them to understand the system's structure and behavior and to guide future development and maintenance.
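The convert path shown in the diagrams above can be sketched in Python:

1. Parse the saved JSON into page objects.
2. Rewrite each page's image references.
3. Emit either one combined Markdown document or one document per page.

This is a minimal sketch, not the CLI's implementation. Its assumptions:

- The JSON follows the shape in the class diagram.
- Page Markdown references images as `![img-0.jpeg](img-0.jpeg)`, matching the `id` of an entry in `images`.
- `image_base64` may be a `data:` URI.

`replace_image_references` and `extract_image_to_file` reuse names from the sequence diagrams, but their bodies here are assumptions. `parse_pages`, `convert_pages`, and the `page_N.md` naming are hypothetical.

```python
import base64
import mimetypes
import os
from dataclasses import dataclass, field


@dataclass
class OCRResponseImage:
    id: str
    image_base64: str = ""  # may be a data URI such as "data:image/jpeg;base64,..."


@dataclass
class OCRResponsePage:
    index: int
    markdown: str
    images: list = field(default_factory=list)


def parse_pages(data: dict) -> list:
    """Build page objects from the 'pages' array of a saved OCR JSON result."""
    return [
        OCRResponsePage(
            index=page["index"],
            markdown=page.get("markdown", ""),
            images=[OCRResponseImage(img["id"], img.get("image_base64") or "")
                    for img in page.get("images", [])],
        )
        for page in data.get("pages", [])
    ]


def extract_image_to_file(image_base64: str, image_id: str, image_dir: str) -> str:
    """Decode a base64 image (plain or data URI), write it to image_dir, return its path."""
    payload = image_base64.split(",", 1)[1] if image_base64.startswith("data:") else image_base64
    os.makedirs(image_dir, exist_ok=True)  # create the image directory if needed
    path = os.path.join(image_dir, image_id)
    with open(path, "wb") as handle:
        handle.write(base64.b64decode(payload))
    return path


def replace_image_references(content, images, include_images, extract_images, image_dir):
    """Rewrite each ![id](id) link: drop it, point it at an extracted file, or inline it."""
    for img in images:
        reference = f"![{img.id}]({img.id})"
        if not include_images or not img.image_base64:
            replacement = ""  # images disabled, or JSON was saved without base64 data
        elif extract_images:
            path = extract_image_to_file(img.image_base64, img.id, image_dir)
            replacement = f"![{img.id}]({path})"
        else:
            uri = img.image_base64
            if not uri.startswith("data:"):
                # Guess the MIME type from the image id; fall back to PNG.
                mime = mimetypes.guess_type(img.id)[0] or "image/png"
                uri = f"data:{mime};base64,{uri}"
            replacement = f"![{img.id}]({uri})"
        content = content.replace(reference, replacement)
    return content


def convert_pages(data, include_images=False, extract_images=False,
                  image_dir="images", single_file=True):
    """Return one combined Markdown string, or {file_name: markdown} per page."""
    rendered = {
        page.index: replace_image_references(
            page.markdown, page.images, include_images, extract_images, image_dir)
        for page in parse_pages(data)
    }
    if single_file:
        return "\n\n".join(rendered[i] for i in sorted(rendered))
    return {f"page_{i + 1}.md": md for i, md in sorted(rendered.items())}
```

The default flags drop image links entirely. `include_images=True` inlines the images as data URIs. Adding `extract_images=True` writes the image files into `image_dir` and links to them instead. This mirrors the `--images` and `--extract-images` branches of the activity diagram.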