From e9a12d672561eff2a9099246037f4d06e6fcdbfa Mon Sep 17 00:00:00 2001
From: Heiko Joerg Schick
Date: Thu, 24 Apr 2025 21:52:18 +0200
Subject: [PATCH] Add comprehensive architecture documentation with UML diagrams

---
 ARCHITECTURE.md | 435 ++++++++++++++++++++++++++++++++++++++++++++++++
 CHANGELOG.md    |   1 +
 README.md       |  14 +-
 3 files changed, 449 insertions(+), 1 deletion(-)
 create mode 100644 ARCHITECTURE.md

diff --git a/ARCHITECTURE.md b/ARCHITECTURE.md
new file mode 100644
index 0000000..6535255
--- /dev/null
+++ b/ARCHITECTURE.md
@@ -0,0 +1,435 @@
+# Mistral OCR Architecture Documentation
+
+This document provides a comprehensive overview of the Mistral OCR CLI architecture, including UML diagrams to illustrate the system's structure, behavior, and interactions.
+
+## Table of Contents
+
+- [Mistral OCR Architecture Documentation](#mistral-ocr-architecture-documentation)
+  - [Table of Contents](#table-of-contents)
+  - [System Overview](#system-overview)
+  - [Class Diagram](#class-diagram)
+  - [Component Diagram](#component-diagram)
+  - [Sequence Diagrams](#sequence-diagrams)
+    - [Process Document Workflow](#process-document-workflow)
+    - [Convert JSON to Markdown Workflow](#convert-json-to-markdown-workflow)
+    - [Combined Markdown Workflow](#combined-markdown-workflow)
+    - [Image Extraction Workflow](#image-extraction-workflow)
+  - [Activity Diagram](#activity-diagram)
+  - [Use Case Diagram](#use-case-diagram)
+  - [Data Flow Diagram](#data-flow-diagram)
+
+## System Overview
+
+Mistral OCR is a command-line tool for processing documents with Mistral AI's OCR capabilities. The system allows users to extract text and structured content from PDF documents and images while preserving the original formatting and layout.
+
+The tool is structured around a set of commands:
+- `process`: Processes a document (local file or URL) with OCR and outputs JSON
+- `convert`: Converts the JSON output to Markdown
+- `markdown`: Combines the process and convert steps in one command
+- `version`: Displays the version information
+
+## Class Diagram
+
+The following class diagram illustrates the main classes in the Mistral OCR system and their relationships:
+
+```mermaid
+classDiagram
+    class MistralClient {
+        +BASE_URL: str
+        +MAX_FILE_SIZE: int
+        +api_key: str
+        +session: requests.Session
+        +__init__(api_key: str)
+        +upload_file(file_path: str): str
+        +get_file_url(file_id: str): str
+        +process_ocr(doc_type: str, doc_source: str, include_image_base64: bool): bytes
+    }
+
+    class OCRResponse {
+        +pages: list
+        +metadata: OCRResponseMetadata
+        +__init__(pages: list, metadata: OCRResponseMetadata)
+    }
+
+    class OCRResponseMetadata {
+        +title: str
+        +author: str
+        +creation_date: str
+        +page_count: int
+        +__init__(title: str, author: str, creation_date: str, page_count: int)
+    }
+
+    class OCRResponsePage {
+        +index: int
+        +markdown: str
+        +image: str
+        +images: list
+        +dimensions: dict
+        +__init__(index: int, markdown: str, image: str, images: list, dimensions: dict)
+    }
+
+    class OCRResponseImage {
+        +id: str
+        +image_base64: str
+        +__init__(id: str, image_base64: str)
+    }
+
+    OCRResponse "1" *-- "1" OCRResponseMetadata: contains
+    OCRResponse "1" *-- "many" OCRResponsePage: contains
+    OCRResponsePage "1" *-- "many" OCRResponseImage: contains
+```
+
+## Component Diagram
+
+The following component diagram shows the high-level architecture of the Mistral OCR system:
+
+```mermaid
+graph TD
+    CLI[CLI Interface] --> ProcessCmd[Process Command]
+    CLI --> ConvertCmd[Convert Command]
+    CLI --> MarkdownCmd[Markdown Command]
+    CLI --> VersionCmd[Version Command]
+
+    ProcessCmd --> APIClient[Mistral API Client]
+    ConvertCmd --> JSONParser[JSON Parser]
+    MarkdownCmd --> ProcessCmd
+    MarkdownCmd --> ConvertCmd
+
+    APIClient --> ExternalAPI[Mistral AI API]
+
+    JSONParser --> MarkdownGenerator[Markdown Generator]
+    MarkdownGenerator --> ImageHandler[Image Handler]
+
+    subgraph "External Services"
+        ExternalAPI
+    end
+
+    subgraph "Core Components"
+        APIClient
+        JSONParser
+        MarkdownGenerator
+        ImageHandler
+    end
+
+    subgraph "Command Interface"
+        ProcessCmd
+        ConvertCmd
+        MarkdownCmd
+        VersionCmd
+    end
+```
+
+## Sequence Diagrams
+
+### Process Document Workflow
+
+The following sequence diagram illustrates the workflow for processing a document with OCR:
+
+```mermaid
+sequenceDiagram
+    participant User
+    participant CLI as CLI Interface
+    participant Process as Process Command
+    participant Client as Mistral Client
+    participant API as Mistral AI API
+
+    User->>CLI: mistral-ocr process document.pdf
+    CLI->>Process: run(args)
+
+    alt Local File
+        Process->>Client: process_local_file(file_path, output_file, include_images)
+        Client->>Client: upload_file(file_path)
+        Client->>API: POST /files
+        API-->>Client: file_id
+        Client->>Client: get_file_url(file_id)
+        Client->>API: GET /files/{file_id}/url
+        API-->>Client: signed_url
+        Client->>API: POST /ocr
+        API-->>Client: OCR results (JSON)
+    else URL
+        Process->>Client: process_url(url, output_file, include_images)
+        Client->>API: POST /ocr
+        API-->>Client: OCR results (JSON)
+    end
+
+    Client-->>Process: OCR results (JSON)
+    Process->>Process: handle_output(data, output_file)
+    Process-->>CLI: Success message
+    CLI-->>User: Display results or confirmation
+```
+
+### Convert JSON to Markdown Workflow
+
+The following sequence diagram illustrates the workflow for converting JSON to Markdown:
+
+```mermaid
+sequenceDiagram
+    participant User
+    participant CLI as CLI Interface
+    participant Convert as Convert Command
+    participant Parser as JSON Parser
+    participant Generator as Markdown Generator
+
+    User->>CLI: mistral-ocr convert results.json
+    CLI->>Convert: run(args)
+    Convert->>Parser: load and parse JSON
+    Parser-->>Convert: OCR data structure
+
+    Convert->>Generator: convert_json_to_markdown(json_file, args)
+
+    alt Single File Mode
+        Generator->>Generator: Process all pages into one file
+        loop For each page
+            Generator->>Generator: replace_image_references(content, images, include_images, extract_images, image_dir)
+        end
+        Generator->>Generator: Write combined markdown file
+    else Multi-File Mode
+        loop For each page
+            Generator->>Generator: replace_image_references(content, images, include_images, extract_images, image_dir)
+            Generator->>Generator: Write individual markdown file
+        end
+    end
+
+    Generator-->>Convert: Success message
+    Convert-->>CLI: Success message
+    CLI-->>User: Display confirmation
+```
+
+### Combined Markdown Workflow
+
+The following sequence diagram illustrates the combined workflow for processing a document and converting to Markdown in one step:
+
+```mermaid
+sequenceDiagram
+    participant User
+    participant CLI as CLI Interface
+    participant Markdown as Markdown Command
+    participant Process as Process Command
+    participant Convert as Convert Command
+    participant Client as Mistral Client
+    participant API as Mistral AI API
+
+    User->>CLI: mistral-ocr markdown document.pdf
+    CLI->>Markdown: run(args)
+
+    Markdown->>Markdown: Create temp JSON file if needed
+
+    alt Local File
+        Markdown->>Process: process_local_file(file_path, json_output_path, include_image_base64)
+        Process->>Client: upload_file(file_path)
+        Client->>API: POST /files
+        API-->>Client: file_id
+        Client->>Client: get_file_url(file_id)
+        Client->>API: GET /files/{file_id}/url
+        API-->>Client: signed_url
+        Client->>API: POST /ocr
+        API-->>Client: OCR results (JSON)
+        Client-->>Process: OCR results
+        Process->>Process: Write JSON to file
+    else URL
+        Markdown->>Process: process_url(url, json_output_path, include_image_base64)
+        Process->>Client: process_url(url, output_file, include_images)
+        Client->>API: POST /ocr
+        API-->>Client: OCR results (JSON)
+        Client-->>Process: OCR results
+        Process->>Process: Write JSON to file
+    end
+
+    Markdown->>Convert: convert_json_to_markdown(json_output_path, args)
+    Convert->>Convert: Parse JSON and generate markdown
+    Convert-->>Markdown: Success message
+
+    Markdown->>Markdown: Clean up temp file if created
+    Markdown-->>CLI: Success message
+    CLI-->>User: Display confirmation
+```
+
+### Image Extraction Workflow
+
+The following sequence diagram illustrates the workflow for extracting images to separate files:
+
+```mermaid
+sequenceDiagram
+    participant User
+    participant CLI as CLI Interface
+    participant Convert as Convert Command
+    participant Generator as Markdown Generator
+    participant ImageHandler as Image Handler
+    participant FileSystem as File System
+
+    User->>CLI: mistral-ocr convert results.json --images --extract-images
+    CLI->>Convert: run(args)
+    Convert->>Generator: convert_json_to_markdown(json_file, args)
+
+    Generator->>Generator: Determine image directory
+    Generator->>FileSystem: Create image directory if needed
+
+    loop For each page with images
+        Generator->>ImageHandler: replace_image_references(content, images, include_images, extract_images, image_dir)
+
+        loop For each image
+            ImageHandler->>ImageHandler: extract_image_to_file(image_base64, image_id, image_dir)
+            ImageHandler->>FileSystem: Write image file
+            FileSystem-->>ImageHandler: File path
+            ImageHandler->>ImageHandler: Update markdown to reference file
+        end
+
+        ImageHandler-->>Generator: Updated markdown content
+    end
+
+    Generator->>FileSystem: Write markdown file(s)
+    Generator-->>Convert: Success message
+    Convert-->>CLI: Success message
+    CLI-->>User: Display confirmation
+```
+
+## Activity Diagram
+
+The following activity diagram illustrates the overall process flow of the Mistral OCR system:
+
+```mermaid
+graph TD
+    Start([Start]) --> ParseArgs[Parse Command Line Arguments]
+    ParseArgs --> CommandCheck{Which Command?}
+
+    CommandCheck -->|process| ProcessCommand[Process Command]
+    CommandCheck -->|convert| ConvertCommand[Convert Command]
+    CommandCheck -->|markdown| MarkdownCommand[Markdown Command]
+    CommandCheck -->|version| VersionCommand[Version Command]
+    CommandCheck -->|none| ShowHelp[Show Help]
+
+    ProcessCommand --> InputCheck{Input Type?}
+    InputCheck -->|Local File| UploadFile[Upload File to API]
+    InputCheck -->|URL| ProcessURL[Process URL Directly]
+
+    UploadFile --> GetSignedURL[Get Signed URL]
+    GetSignedURL --> ProcessOCR[Process with OCR API]
+    ProcessURL --> ProcessOCR
+
+    ProcessOCR --> SaveJSON[Save JSON Results]
+    SaveJSON --> ProcessEnd([Process End])
+
+    ConvertCommand --> LoadJSON[Load JSON File]
+    LoadJSON --> ParseJSON[Parse JSON Structure]
+    ParseJSON --> OutputModeCheck{Output Mode?}
+
+    OutputModeCheck -->|Single File| CombinePages[Combine All Pages]
+    OutputModeCheck -->|Multiple Files| ProcessPages[Process Each Page Separately]
+
+    CombinePages --> ImageCheck{Include Images?}
+    ProcessPages --> ImageCheck
+
+    ImageCheck -->|No| GenerateMarkdown[Generate Markdown Without Images]
+    ImageCheck -->|Yes| ExtractCheck{Extract Images?}
+
+    ExtractCheck -->|No| EmbedImages[Embed Images as Base64]
+    ExtractCheck -->|Yes| ExtractImages[Extract Images to Files]
+
+    EmbedImages --> GenerateMarkdown
+    ExtractImages --> UpdateReferences[Update Image References]
+    UpdateReferences --> GenerateMarkdown
+
+    GenerateMarkdown --> SaveMarkdown["Save Markdown File(s)"]
+    SaveMarkdown --> ConvertEnd([Convert End])
+
+    MarkdownCommand --> CreateTempJSON[Create Temporary JSON File]
+    CreateTempJSON --> ProcessCommand
+    ProcessCommand --> ConvertCommand
+    ConvertCommand --> CleanupTemp[Cleanup Temporary Files]
+    CleanupTemp --> MarkdownEnd([Markdown End])
+
+    VersionCommand --> ShowVersion[Show Version Information]
+    ShowVersion --> VersionEnd([Version End])
+
+    ShowHelp --> HelpEnd([Help End])
+```
+
+## Use Case Diagram
+
+The following use case diagram illustrates the different ways users can interact with the Mistral OCR CLI:
+
+```mermaid
+graph TD
+    User((User))
+
+    subgraph "Mistral OCR CLI"
+        ProcessDoc[Process Document]
+        ProcessURL[Process URL]
+        ConvertJSON[Convert JSON to Markdown]
+        CombinedProcess[Process and Convert in One Step]
+        ExtractImages[Extract Images to Files]
+        EmbedImages[Embed Images in Markdown]
+        CheckVersion[Check Version]
+    end
+
+    User -->|mistral-ocr process file.pdf| ProcessDoc
+    User -->|mistral-ocr process https://example.com/doc.pdf| ProcessURL
+    User -->|mistral-ocr convert results.json| ConvertJSON
+    User -->|mistral-ocr markdown file.pdf| CombinedProcess
+    User -->|--extract-images flag| ExtractImages
+    User -->|--images flag| EmbedImages
+    User -->|mistral-ocr version| CheckVersion
+
+    ProcessDoc -.-> ConvertJSON
+    ProcessURL -.-> ConvertJSON
+    ConvertJSON -.-> ExtractImages
+    ConvertJSON -.-> EmbedImages
+    CombinedProcess -.-> ProcessDoc
+    CombinedProcess -.-> ConvertJSON
+```
+
+## Data Flow Diagram
+
+The following data flow diagram illustrates how data moves through the Mistral OCR system:
+
+```mermaid
+graph TD
+    Input[Document Input]
+    API[Mistral AI API]
+    JSONStorage[JSON Storage]
+    MarkdownOutput[Markdown Output]
+    ImageStorage[Image Storage]
+
+    Input -->|Local File| FileUpload[File Upload]
+    Input -->|URL| DirectProcess[Direct Processing]
+
+    FileUpload --> API
+    DirectProcess --> API
+
+    API -->|OCR Results| JSONStorage
+
+    JSONStorage -->|Parse| MarkdownGen[Markdown Generation]
+
+    MarkdownGen -->|Text Content| MarkdownOutput
+    MarkdownGen -->|Image Data| ImageCheck{Extract Images?}
+
+    ImageCheck -->|Yes| ImageExtraction[Image Extraction]
+    ImageCheck -->|No| Base64Embedding[Base64 Embedding]
+
+    ImageExtraction --> ImageStorage
+    ImageExtraction -->|Image References| MarkdownOutput
+
+    Base64Embedding -->|Embedded Images| MarkdownOutput
+
+    subgraph "Input Processing"
+        Input
+        FileUpload
+        DirectProcess
+    end
+
+    subgraph "OCR Processing"
+        API
+        JSONStorage
+    end
+
+    subgraph "Output Generation"
+        MarkdownGen
+        ImageCheck
+        ImageExtraction
+        Base64Embedding
+        MarkdownOutput
+        ImageStorage
+    end
+```
+
+These diagrams provide a comprehensive overview of the Mistral OCR system architecture, workflows, and interactions. They can be used to understand the system's structure, behavior, and to guide future development and maintenance efforts.
diff --git a/CHANGELOG.md b/CHANGELOG.md
index 56a9985..851404a 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -12,6 +12,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - More detailed troubleshooting section
 - API response format documentation
 - Option to extract images to separate files instead of embedding them in markdown
+- Comprehensive architecture documentation with UML diagrams (ARCHITECTURE.md)
 
 ## [0.1.0] - 2025-04-24
 
diff --git a/README.md b/README.md
index ddd6d98..f56818d 100644
--- a/README.md
+++ b/README.md
@@ -303,6 +303,18 @@ Error converting JSON to markdown: Expecting property name enclosed in double qu
 - Supported file formats: PDF, JPG, JPEG, PNG, WEBP, GIF
 - Rate limits may apply depending on your Mistral AI account tier
 
+## Architecture Documentation
+
+For a comprehensive overview of the Mistral OCR architecture, including UML diagrams, sequence diagrams, and other visual representations, please refer to the [ARCHITECTURE.md](ARCHITECTURE.md) document. This documentation provides detailed insights into:
+
+- Class structure and relationships
+- Component architecture
+- Process workflows
+- Data flow through the system
+- User interaction patterns
+
+These diagrams are useful for understanding the system design, onboarding new contributors, and planning future enhancements.
+
 ## Contributing
 
 Contributions to Mistral OCR CLI are welcome! Here's how you can contribute:
@@ -319,7 +331,7 @@ Contributions to Mistral OCR CLI are welcome! Here's how you can contribute:
    ```
 
 5. **Submit a pull request**
 
-Please ensure your code follows the project's coding standards and includes appropriate tests and documentation.
+Please ensure your code follows the project's coding standards and includes appropriate tests and documentation. For understanding the codebase structure, refer to the [ARCHITECTURE.md](ARCHITECTURE.md) document.
 
 ## License
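The `replace_image_references` and `extract_image_to_file` steps in the Convert and Image Extraction sequence diagrams can be illustrated with a minimal Python sketch. It is hypothetical and is not the project's implementation. It reuses the function names and parameters from the diagrams but rests on several assumptions. First, each page's `markdown` refers to images as `![alt](image_id)`. Second, each entry in `images` is a dict with `id` and `image_base64` keys. Third, `image_base64` may be either a `data:` URI or raw base64. Finally, image references are dropped when `include_images` is false.

```python
import base64
import mimetypes
import os
import re

# Hypothetical sketch: function names mirror the sequence diagrams; bodies are assumptions.
IMAGE_REF = re.compile(r"!\[([^\]]*)\]\(([^)]+)\)")  # matches ![alt](target)


def _strip_data_uri(image_base64: str) -> str:
    """Return the bare base64 payload (assumes an optional 'data:...;base64,' prefix)."""
    if image_base64.startswith("data:"):
        return image_base64.split(",", 1)[1]
    return image_base64


def extract_image_to_file(image_base64: str, image_id: str, image_dir: str) -> str:
    """Decode one image, write it under image_dir, and return the file path."""
    os.makedirs(image_dir, exist_ok=True)
    # basename() keeps a malformed id from writing outside image_dir.
    path = os.path.join(image_dir, os.path.basename(image_id))
    with open(path, "wb") as fh:
        fh.write(base64.b64decode(_strip_data_uri(image_base64)))
    return path


def replace_image_references(content, images, include_images, extract_images, image_dir):
    """Rewrite the image links in one page's markdown."""
    lookup = {img["id"]: img.get("image_base64") for img in images}

    def substitute(match):
        alt, target = match.group(1), match.group(2)
        if target not in lookup:
            return match.group(0)  # not an OCR image; leave the link untouched
        if not include_images:
            return ""  # assumption: drop the reference entirely
        data = lookup[target]
        if not data:
            return match.group(0)  # no base64 payload was returned; nothing to embed
        if extract_images:
            return f"![{alt}]({extract_image_to_file(data, target, image_dir)})"
        if data.startswith("data:"):
            return f"![{alt}]({data})"
        # Build a data URI from raw base64, guessing the MIME type from the id's extension.
        mime = mimetypes.guess_type(target)[0] or "application/octet-stream"
        return f"![{alt}](data:{mime};base64,{data})"

    return IMAGE_REF.sub(substitute, content)
```

Routing every link through a single `re.sub` callback keeps the per-image decision (drop, embed, or extract) in one place. That decision corresponds to the `Include Images?` and `Extract Images?` branches in the activity diagram.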