
# mknotes Improvement Recommendations

This document outlines proposed improvements for the mknotes software, grouped by category.

## Feature Enhancements

- **Support for More Audio Formats** ✅ (Partially implemented: WAV support added)
  - ✅ Added support for WAV files, which are automatically converted to MP3 before processing.
  - ✅ Updated the `find_audio_files` function in `utils.py` to recognize the `.wav` extension.
  - Extend support to additional formats such as FLAC and OGG.

- **Customizable Enhancement Prompts** ✅ (Implemented)
  - ✅ Added CLI argument to select different prompt types (lecture, meeting, interview)
  - ✅ Moved prompts to separate files for easier customization
  - ✅ Created specialized prompts for different use cases (lectures, meeting minutes, interviews)
  - Allow users to provide their own custom prompts via configuration files.

- **Batch Processing Controls**
  - Add the ability to limit the number of files processed in one run.
  - Implement resume functionality for interrupted batch processing.

- **Output Format Options**
  - Support multiple output formats beyond Markdown (e.g., HTML, PDF, DOCX).
  - Add options for customizing Markdown styling.

- **Caching Mechanism**
  - Implement caching for OpenAI API calls to reduce costs and improve performance.
  - Store intermediate results to avoid reprocessing if enhancement fails.
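The caching idea above can be sketched as follows. This is a minimal illustration, assuming results are keyed by a hash of the input text plus the prompt and stored as one JSON file per entry; `cache_key`, `cached_call`, and the cache directory layout are hypothetical names, not existing mknotes functions.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable


def cache_key(*parts: str) -> str:
    """Derive a stable key from the inputs that determine the result."""
    digest = hashlib.sha256()
    for part in parts:
        digest.update(part.encode("utf-8"))
        digest.update(b"\x00")  # separator so ("ab", "c") differs from ("a", "bc")
    return digest.hexdigest()


def cached_call(cache_dir: Path, func: Callable[[str, str], str],
                text: str, prompt: str) -> str:
    """Return a stored result if one exists; otherwise call func and store it."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    entry = cache_dir / f"{cache_key(text, prompt)}.json"
    if entry.exists():
        return json.loads(entry.read_text(encoding="utf-8"))["result"]
    result = func(text, prompt)  # hypothetical enhancement call
    entry.write_text(json.dumps({"result": result}), encoding="utf-8")
    return result
```

Because the key covers both the text and the prompt, changing the prompt type produces a new cache entry instead of returning a stale result.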

## Technical Improvements

- **Robust Error Handling and Logging**
  - Implement a proper logging system instead of print statements.
  - Add comprehensive error handling with appropriate recovery strategies.
  - Example: Add retry logic for API calls with exponential backoff.

- **Configuration Management**
  - Create a configuration system using YAML/JSON files.
  - Allow users to set default values for all parameters.
  - Support environment-specific configurations.

- **API Key Management**
  - Handle API keys more securely, for example with the keyring-based storage proposed under Security Improvements.
  - Add support for API key rotation.

- **Performance Optimization**
  - Implement parallel processing for transcription of multiple files.
  - Add an option to use local models for offline processing.

- **Testing Framework**
  - Add unit tests for core functionality.
  - Implement integration tests for the complete workflow.
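The retry-with-backoff item in this section could look roughly like the sketch below. It assumes the API call is wrapped in a zero-argument callable; `retry_with_backoff` and its parameters are illustrative names, and the injectable `sleep` argument exists only so that unit tests can run without real delays.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(func: Callable[[], T], max_attempts: int = 5,
                       base_delay: float = 1.0, max_delay: float = 30.0,
                       sleep: Callable[[float], None] = time.sleep) -> T:
    """Call func, retrying on failure with exponentially growing delays."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts - 1):
        try:
            return func()
        except Exception:  # real code should catch only transient API errors
            # Delay doubles each attempt (1s, 2s, 4s, ...) up to max_delay.
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Up to 10% random jitter keeps parallel workers from retrying in lockstep.
            sleep(delay + random.uniform(0, delay * 0.1))
    return func()  # final attempt: let any exception propagate to the caller
```

A test can pass `sleep=delays.append` to record the requested delays instead of waiting for them.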

## Code Structure Improvements

- **Separation of Concerns** ✅ (Partially implemented)
  - ✅ Moved prompts to separate files in a dedicated prompts directory.
  - Create a more abstract API client layer.

- **Progress Tracking and Reporting**
  - Enhance progress reporting beyond simple tqdm bars.
  - Add detailed statistics about processing time, token usage, etc.

- **Plugin Architecture**
  - Implement a plugin system to allow for custom transcription or enhancement modules.
  - Make it easier to switch between different AI models or services.
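One possible shape for the plugin system is a small registry keyed by backend name. The sketch below is an assumption about how this might be structured, not a description of existing code: `Transcriber`, `register_transcriber`, `get_transcriber`, and the `EchoTranscriber` placeholder are all hypothetical.

```python
from typing import Callable, Dict, Protocol


class Transcriber(Protocol):
    """Interface every transcription backend would need to satisfy."""

    def transcribe(self, audio_path: str) -> str: ...


_TRANSCRIBERS: Dict[str, Callable[[], Transcriber]] = {}


def register_transcriber(name: str):
    """Class decorator that makes a backend selectable by name."""
    def decorator(factory):
        _TRANSCRIBERS[name] = factory
        return factory
    return decorator


def get_transcriber(name: str) -> Transcriber:
    """Instantiate the backend registered under name."""
    if name not in _TRANSCRIBERS:
        raise ValueError(
            f"Unknown transcriber {name!r}; available: {sorted(_TRANSCRIBERS)}")
    return _TRANSCRIBERS[name]()


@register_transcriber("echo")
class EchoTranscriber:
    """Placeholder backend used only to illustrate registration."""

    def transcribe(self, audio_path: str) -> str:
        return f"(transcript of {audio_path})"
```

With this layout, switching services would come down to passing a different name (for instance from a CLI option) to `get_transcriber`.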

## User Experience Enhancements

- **Interactive Mode**
  - Add an interactive mode where users can preview and edit enhanced notes before saving.
  - Implement a simple text user interface (TUI) for a better CLI experience.

- **Web Interface**
  - Create a simple web interface for users who prefer a GUI to the CLI.
  - Consider a lightweight Flask/FastAPI app that wraps the core functionality.

- **Notification System**
  - Add notifications for long-running processes (email, desktop notifications).
  - Implement a webhook system for integration with other tools.

## Documentation Improvements

- **Enhanced Documentation**
  - Create comprehensive documentation with examples and use cases.
  - Add a troubleshooting guide for common issues.

- **Sample Configurations**
  - Provide sample configuration files for different use cases.
  - Include examples of custom prompts for different types of content.

## Error Recovery and Resilience

- **Robust Error Recovery**
  - Implement proper error recovery mechanisms to prevent data loss.
  - Save transcriptions before enhancement to avoid losing work if the API call fails.
  - Add a retry mechanism with exponential backoff for failed API calls.
  - Handle partial failures gracefully in batch processing.

- **Process Persistence**
  - Add checkpoint/resume functionality for interrupted batch processing.
  - Save processing state to allow continuation from the last successful file.
  - Implement transaction-like processing (all-or-nothing for each file).
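The checkpoint/resume and partial-failure items above can be sketched together. This minimal version assumes the processing state is a JSON list of completed file paths; `load_done`, `save_done`, `process_batch`, and the state-file format are hypothetical.

```python
import json
import os
from pathlib import Path
from typing import Callable, Iterable, List, Set


def load_done(state_file: Path) -> Set[str]:
    """Read the set of already-processed files, if a checkpoint exists."""
    if state_file.exists():
        return set(json.loads(state_file.read_text(encoding="utf-8")))
    return set()


def save_done(state_file: Path, done: Set[str]) -> None:
    """Write the checkpoint via a temp file so a crash cannot truncate it."""
    tmp = state_file.with_suffix(state_file.suffix + ".tmp")
    tmp.write_text(json.dumps(sorted(done)), encoding="utf-8")
    os.replace(tmp, state_file)  # atomic rename on the same filesystem


def process_batch(files: Iterable[str], process_one: Callable[[str], None],
                  state_file: Path) -> List[str]:
    """Process files not yet checkpointed; return the ones that failed."""
    done = load_done(state_file)
    failed: List[str] = []
    for path in files:
        if path in done:
            continue  # finished in an earlier run
        try:
            process_one(path)
        except Exception:
            failed.append(path)  # keep going; one bad file should not stop the batch
            continue
        done.add(path)
        save_done(state_file, done)  # checkpoint after every successful file
    return failed
```

Rerunning the same batch with the same state file skips completed files and retries only those that failed or were never reached.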

## Memory and Performance Optimization

- **Large File Handling**
  - Implement streaming support for large audio files.
  - Add chunked processing to avoid loading entire files into memory.
  - Optimize memory usage for batch processing.

- **Parallel Processing**
  - Implement concurrent transcription of multiple files.
  - Add configurable worker threads/processes.
  - Optimize API calls with batching where possible.
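Concurrent transcription with a configurable worker count might be sketched with the standard library's thread pool. Threads are assumed here because remote API calls are I/O-bound; a local model would more likely call for processes. `transcribe_all` and the `transcribe_one` callable are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Iterable, Tuple


def transcribe_all(
    paths: Iterable[str],
    transcribe_one: Callable[[str], str],
    max_workers: int = 4,
) -> Tuple[Dict[str, str], Dict[str, Exception]]:
    """Run transcribe_one over paths concurrently; collect results and errors."""
    results: Dict[str, str] = {}
    errors: Dict[str, Exception] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to its input so failures can be attributed.
        futures = {pool.submit(transcribe_one, p): p for p in paths}
        for future in as_completed(futures):
            path = futures[future]
            try:
                results[path] = future.result()
            except Exception as exc:
                errors[path] = exc  # record and continue with remaining files
    return results, errors
```

Returning errors separately lets the caller report partial failures without discarding transcripts that succeeded.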

## Security Improvements

- **API Key Security**
  - Validate the API key before starting batch processing.
  - Implement secure storage for API keys (keyring integration).
  - Remove unnecessary API key parameter passing.
  - Add support for multiple API keys with rotation.
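A fail-fast check before a batch starts could resolve the key once, in one place, instead of passing it through every function. The sketch below assumes the key comes from the `OPENAI_API_KEY` environment variable and only verifies that a key is present; confirming that the key is actually accepted would require a real API request. `resolve_api_key`, `mask_key`, and `MissingAPIKeyError` are hypothetical names.

```python
import os
from typing import Mapping, Optional


class MissingAPIKeyError(RuntimeError):
    """Raised when no API key is configured."""


def resolve_api_key(env: Optional[Mapping[str, str]] = None,
                    var_name: str = "OPENAI_API_KEY") -> str:
    """Look the key up once so callers need not pass it around."""
    env = os.environ if env is None else env
    key = env.get(var_name, "").strip()
    if not key:
        raise MissingAPIKeyError(
            f"Set {var_name} before starting batch processing.")
    return key


def mask_key(key: str, visible: int = 4) -> str:
    """Hide all but the last few characters so the key is safe to log."""
    if len(key) <= visible:
        return "*" * len(key)
    return "*" * (len(key) - visible) + key[-visible:]
```

Calling `resolve_api_key()` at startup surfaces a missing key immediately rather than after the first file has been transcribed.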

## Advanced User Experience

- **Preview and Planning**
  - Add a dry-run mode to preview what will be processed.
  - Show estimated costs before processing.
  - Implement an interactive file selection mode.

- **Processing Statistics**
  - Display comprehensive statistics after processing (tokens used, cost, time).
  - Add detailed progress reporting with ETA.
  - Generate processing summary reports.

- **Flexible Processing Options**
  - Add an option to skip transcription and only enhance existing text files.
  - Support processing files that match specific patterns or date ranges.
  - Implement file filtering based on duration or size.
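A dry-run mode combined with pattern and size filtering could start from a planning step that lists what would be processed without touching any API. The following is a sketch under assumed names (`PlannedFile`, `plan_run`); duration filtering and cost estimates are left out because they depend on audio metadata and pricing that are not specified here.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import List, Optional


@dataclass
class PlannedFile:
    path: Path
    size_bytes: int


def plan_run(root: Path, pattern: str = "*.mp3",
             max_size_bytes: Optional[int] = None,
             limit: Optional[int] = None) -> List[PlannedFile]:
    """List the files a run would process, without processing them."""
    planned: List[PlannedFile] = []
    for path in sorted(root.rglob(pattern)):  # sorted for a reproducible order
        if not path.is_file():
            continue
        size = path.stat().st_size
        if max_size_bytes is not None and size > max_size_bytes:
            continue  # skip files above the size threshold
        planned.append(PlannedFile(path, size))
    if limit is not None:
        planned = planned[:limit]  # cap the number of files per run
    return planned
```

Printing the returned list is enough for a basic dry run, and passing the same list to the real pipeline guarantees that the preview and the run agree.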

## Code Quality Improvements

- **Type Safety**
  - Add type hints throughout the codebase.
  - Implement proper data validation.
  - Use dataclasses for configuration objects.

- **Dependency Management**
  - Remove standard-library modules (`argparse`) from `requirements.txt`.
  - Pin dependency versions for reproducibility.
  - Add optional dependencies for advanced features.

- **Configuration**
  - Remove hardcoded values (e.g., the "gpt-4.1" model name).
  - Make all parameters configurable.
  - Support multiple configuration profiles.
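The dataclass, validation, and profile items above might combine as in this sketch. It assumes a JSON document with a `default` profile that other profiles override; the `Settings` fields, the `load_settings` function, and the file layout are hypothetical, with `gpt-4.1` kept only as a default that a profile can replace.

```python
import json
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Settings:
    model: str = "gpt-4.1"        # formerly hardcoded; now just a default
    prompt_type: str = "lecture"  # lecture, meeting, or interview
    max_workers: int = 4


def load_settings(raw_json: str, profile: str = "default") -> Settings:
    """Merge the chosen profile over the default one and reject unknown keys."""
    data = json.loads(raw_json)
    merged = {**data.get("default", {}), **data.get(profile, {})}
    known = {f.name for f in fields(Settings)}
    unknown = set(merged) - known
    if unknown:
        raise ValueError(f"Unknown setting(s): {sorted(unknown)}")
    return Settings(**merged)  # note: dataclasses do not check value types
```

Rejecting unknown keys catches typos in configuration files early; type checks on the values would need to be added separately.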

## Extended Format Support

- **Video File Support**
  - Extract audio from video files (MP4, AVI, MOV, etc.).
  - Preserve video metadata in output.
  - Add an option to generate subtitles.

- **Advanced Audio Features**
  - Support language detection and explicit language selection for transcription.
  - Support speaker diarization.
  - Add audio preprocessing options (noise reduction, normalization).
  - Support merging multiple audio files before processing.

- **Offline Capabilities**
  - Support custom Whisper model paths.
  - Integrate a local LLM for enhancement.
  - Provide a fully offline mode with local models.

## Output Management

- **Organization Features**
  - Organize output by date, source, or custom categories.
  - Preserve original file metadata.
  - Generate index/summary files for processed batches.
  - Support incremental updates to existing notes.

- **Export Options**
  - Export to multiple formats simultaneously.
  - Provide custom templates for different output formats.
  - Embed metadata in output files.

## Integration Capabilities

- **Cloud Storage**
  - Integrate directly with S3, Google Drive, and Dropbox.
  - Back up processed files automatically.
  - Provide a cloud-based processing queue.

- **Note-Taking Apps**
  - Export directly to Notion, Obsidian, and Roam Research.
  - Sync with existing note structures.
  - Support tags and categorization.

- **Automation**
  - Send webhook notifications when processing completes.
  - Support custom post-processing scripts.
  - Integrate with workflow automation tools (Zapier, IFTTT).
  - Add watch-folder functionality for automatic processing.

- **API and SDK**
  - Provide a RESTful API for remote processing.
  - Provide a Python SDK for programmatic access.
  - Support batch job scheduling.

---

These recommendations are intended to guide future development and prioritization for mknotes. Each suggestion can be implemented independently or as part of a broader roadmap.