# mknotes Improvement Recommendations

This document outlines proposed improvements for the mknotes software, grouped by category.

## Feature Enhancements

- **Support for More Audio Formats** ✅ (Partially implemented - WAV support added)
  - ✅ Added support for WAV files with automatic conversion to MP3 before processing
  - Extend support to include additional formats like FLAC, OGG, etc.
  - ✅ Updated the `find_audio_files` function in `utils.py` to recognize WAV extension
- **Customizable Enhancement Prompts** ✅ (Implemented)
  - ✅ Added CLI argument to select different prompt types (lecture, meeting, interview)
  - ✅ Moved prompts to separate files for easier customization
  - ✅ Created specialized prompts for different use cases (lectures, meeting minutes, interviews)
  - Allow users to provide their own custom prompts via configuration files.
- **Batch Processing Controls**
  - Add the ability to limit the number of files processed in one run.
  - Implement resume functionality for interrupted batch processing.
- **Output Format Options**
  - Support multiple output formats beyond Markdown (e.g., HTML, PDF, DOCX).
  - Add options for customizing Markdown styling.
- **Caching Mechanism**
  - Implement caching for OpenAI API calls to reduce costs and improve performance.
  - Store intermediate results to avoid reprocessing if enhancement fails.

## Technical Improvements

- **Robust Error Handling and Logging**
  - Implement a proper logging system instead of print statements.
  - Add comprehensive error handling with appropriate recovery strategies.
  - Example: Add retry logic for API calls with exponential backoff.
- **Configuration Management**
  - Create a configuration system using YAML/JSON files.
  - Allow users to set default values for all parameters.
  - Support environment-specific configurations.
- **API Key Management**
  - Implement a more secure way to handle API keys.
  - Add support for API key rotation.
- **Performance Optimization**
  - Implement parallel processing for transcription of multiple files.
  - Add an option to use local models for offline processing.
- **Testing Framework**
  - Add unit tests for core functionality.
  - Implement integration tests for the complete workflow.

## Code Structure Improvements

- **Separation of Concerns** ✅ (Partially implemented)
  - ✅ Moved prompts to separate files in a dedicated prompts directory.
  - Create a more abstract API client layer.
- **Progress Tracking and Reporting**
  - Enhance progress reporting beyond simple tqdm bars.
  - Add detailed statistics about processing time, token usage, etc.
- **Plugin Architecture**
  - Implement a plugin system to allow for custom transcription or enhancement modules.
  - Make it easier to switch between different AI models or services.

## User Experience Enhancements

- **Interactive Mode**
  - Add an interactive mode where users can preview and edit enhanced notes before saving.
  - Implement a simple TUI (Text User Interface) for a better CLI experience.
- **Web Interface**
  - Create a simple web interface for users who prefer GUI over CLI.
  - Consider a lightweight Flask/FastAPI app that wraps the core functionality.
- **Notification System**
  - Add notifications for long-running processes (email, desktop notifications).
  - Implement a webhook system for integration with other tools.

## Documentation Improvements

- **Enhanced Documentation**
  - Create comprehensive documentation with examples and use cases.
  - Add a troubleshooting guide for common issues.
- **Sample Configurations**
  - Provide sample configuration files for different use cases.
  - Include examples of custom prompts for different types of content.
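
As a rough illustration of what a sample configuration and its loader might look like, the sketch below reads settings from a JSON file into a dataclass. The class name `MknotesConfig`, the function `load_config`, and every field name are hypothetical and chosen only for this example; they are not part of the current mknotes code. Only the prompt types (lecture, meeting, interview) and the `gpt-4.1` model name come from this document.

```python
import json
from dataclasses import dataclass, fields
from pathlib import Path
from typing import Optional

# Prompt types listed in this document; everything else here is illustrative.
PROMPT_TYPES = ("lecture", "meeting", "interview")


@dataclass
class MknotesConfig:
    """Hypothetical settings object; field names are assumptions."""

    model: str = "gpt-4.1"
    prompt_type: str = "lecture"
    output_format: str = "markdown"
    max_files: Optional[int] = None  # None means "no limit"

    def __post_init__(self) -> None:
        # Basic validation so a typo in the file fails early.
        if self.prompt_type not in PROMPT_TYPES:
            raise ValueError(f"Unknown prompt type: {self.prompt_type!r}")
        if self.max_files is not None and self.max_files < 1:
            raise ValueError("max_files must be a positive integer or null")


def load_config(path: Path) -> MknotesConfig:
    """Load settings from a JSON file, using defaults for missing keys."""
    if not path.exists():
        return MknotesConfig()
    raw = json.loads(path.read_text(encoding="utf-8"))
    known = {f.name for f in fields(MknotesConfig)}
    unknown = set(raw) - known
    if unknown:
        raise ValueError(f"Unknown configuration keys: {sorted(unknown)}")
    return MknotesConfig(**raw)
```

A sample file for meeting minutes could then be as small as `{"prompt_type": "meeting", "max_files": 5}`, with every other setting falling back to its default.
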

## Error Recovery and Resilience

- **Robust Error Recovery**
  - Implement proper error recovery mechanisms to prevent data loss
  - Save transcriptions before enhancement to avoid losing work if API fails
  - Add retry mechanism with exponential backoff for failed API calls
  - Handle partial failures gracefully in batch processing
- **Process Persistence**
  - Add checkpoint/resume functionality for interrupted batch processing
  - Save processing state to allow continuation from last successful file
  - Implement transaction-like processing (all-or-nothing for each file)

## Memory and Performance Optimization

- **Large File Handling**
  - Implement streaming support for large audio files
  - Add chunked processing to avoid loading entire files into memory
  - Optimize memory usage for batch processing
- **Parallel Processing**
  - Implement concurrent transcription of multiple files
  - Add configurable worker threads/processes
  - Optimize API calls with batching where possible

## Security Improvements

- **API Key Security**
  - Validate API key before starting batch processing
  - Implement secure storage for API keys (keyring integration)
  - Remove unnecessary API key parameter passing
  - Add support for multiple API keys with rotation

## Advanced User Experience

- **Preview and Planning**
  - Add dry-run mode to preview what will be processed
  - Show estimated costs before processing
  - Implement interactive file selection mode
- **Processing Statistics**
  - Display comprehensive statistics after processing (tokens used, cost, time)
  - Add detailed progress reporting with ETA
  - Generate processing summary reports
- **Flexible Processing Options**
  - Add option to skip transcription and only enhance existing text files
  - Support for processing specific file patterns or date ranges
  - Implement file filtering based on duration or size

## Code Quality Improvements

- **Type Safety**
  - Add type hints throughout the codebase
  - Implement proper data validation
  - Use dataclasses for configuration objects
- **Dependency Management**
  - Remove standard library modules from requirements.txt (argparse)
  - Pin dependency versions for reproducibility
  - Add optional dependencies for advanced features
- **Configuration**
  - Remove hardcoded values (e.g., "gpt-4.1" model name)
  - Make all parameters configurable
  - Support multiple configuration profiles

## Extended Format Support

- **Video File Support**
  - Extract audio from video files (MP4, AVI, MOV, etc.)
  - Preserve video metadata in output
  - Option to generate subtitles
- **Advanced Audio Features**
  - Language detection and specification for transcription
  - Speaker diarization support
  - Audio preprocessing options (noise reduction, normalization)
  - Support for merging multiple audio files before processing
- **Offline Capabilities**
  - Support for custom Whisper model paths
  - Local LLM integration for enhancement
  - Fully offline mode with local models

## Output Management

- **Organization Features**
  - Organize output by date, source, or custom categories
  - Preserve original file metadata
  - Generate index/summary files for processed batches
  - Support for incremental updates to existing notes
- **Export Options**
  - Export to multiple formats simultaneously
  - Custom templates for different output formats
  - Metadata embedding in output files

## Integration Capabilities

- **Cloud Storage**
  - Direct integration with S3, Google Drive, Dropbox
  - Automatic backup of processed files
  - Cloud-based processing queue
- **Note-Taking Apps**
  - Direct export to Notion, Obsidian, Roam Research
  - Sync with existing note structures
  - Tag and categorization support
- **Automation**
  - Webhook notifications for processing completion
  - Custom post-processing script support
  - Integration with workflow automation tools (Zapier, IFTTT)
  - Watch folder functionality for automatic processing
- **API and SDK**
  - RESTful API for remote processing
  - Python SDK for programmatic access
  - Batch job scheduling support

---

These recommendations are intended to guide future development and prioritization for mknotes. Each suggestion can be implemented independently or as part of a broader roadmap.
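
As one illustration of how a single recommendation might be implemented independently, the retry mechanism with exponential backoff suggested for failed API calls could be sketched as below. The helper name `retry_with_backoff` and all of its parameters are hypothetical, and the default delays are arbitrary; this is a minimal sketch under those assumptions, not mknotes' existing behavior.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    call: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call`, retrying on failure with exponentially growing delays.

    Hypothetical helper: the delay doubles after each failed attempt
    (base_delay, 2 * base_delay, 4 * base_delay, ...), capped at max_delay,
    plus up to 10% random jitter so parallel workers do not retry in lockstep.
    """
    for attempt in range(max_attempts):
        try:
            return call()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # Out of attempts: surface the last error to the caller.
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay + random.uniform(0, delay * 0.1))
    raise ValueError("max_attempts must be at least 1")
```

In practice `retry_on` would be narrowed to the transient errors of whichever API client is in use (timeouts, rate limits), so that permanent failures such as an invalid API key are not retried.
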