# mknotes

A command-line tool to transcribe all MP3, M4A, and WAV audio files in a directory using Faster Whisper, then enhance the transcriptions into comprehensive notes using OpenAI's GPT-4.1 model.

## Features

- Batch transcribes all `.mp3`, `.m4a`, and `.wav` files in a specified directory
- Automatically converts WAV files to MP3 format before processing
- Converted MP3 files are saved in the same directory as the original WAV files
- Reuses existing MP3 files if they've already been converted
- Saves transcriptions as `.txt` files
- Enhances notes using GPT-4.1 with a custom prompt
- Outputs enhanced notes in markdown format
- Configurable input and output directories

## Installation

```bash
# Clone the repository
git clone https://github.com/yourusername/mknotes.git
cd mknotes

# Create a virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Install ffmpeg (required for WAV to MP3 conversion)
# On Ubuntu/Debian:
# sudo apt-get install ffmpeg
# On macOS with Homebrew:
# brew install ffmpeg
# On Windows:
# Download from https://ffmpeg.org/download.html and add to PATH
```

## Usage

```bash
export OPENAI_API_KEY="your-api-key-here"
python main.py --input-dir /path/to/audio/files --output-dir /path/to/output [--turbo]
```

- `--input-dir`: Directory containing audio files (.mp3, .m4a, .wav) (required)
- `--output-dir`: Directory for output files (default: "output")
- `--turbo`: Enable turbo mode for faster inference (uses int8_float16 compute type)
- `--force`: Force re-processing of files even if output files already exist

### Turbo Mode Hardware Requirements

The `--turbo` flag enables faster inference using the `int8_float16` compute type, which can significantly speed up transcription.
However, this requires:

- CUDA-compatible GPU with Tensor Cores (NVIDIA Turing, Ampere, or newer architecture)
- Or CPU with AVX2 support

If your hardware does not support this optimization, the program will automatically fall back to the next most compatible compute type and print a warning.

#### Compute Type Fallback

The program will attempt to use the most efficient compute type supported by your hardware, in the following order:

- `int8_float16` (if `--turbo` is enabled)
- `float16`
- `int8`
- `float32` (most compatible, works on virtually all hardware)

If a compute type is not supported, the program will try the next one in the list until successful.

## Requirements

- Python 3.8+
- [Faster Whisper](https://github.com/SYSTRAN/faster-whisper)
- [OpenAI Python SDK](https://github.com/openai/openai-python)
- [pydub](https://github.com/jiaaro/pydub) (for WAV to MP3 conversion)
- [ffmpeg](https://ffmpeg.org/) (required by pydub for audio conversion)
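
To illustrate the compute type fallback described under Turbo Mode, here is a minimal sketch of how such a loop might look. It is not the actual code in `main.py`. The names `candidate_compute_types`, `load_with_fallback`, and the `loader` callable are hypothetical, and the sketch assumes the loader raises `ValueError` when a compute type is unsupported.

```python
import warnings
from typing import Callable, List, Tuple, TypeVar

T = TypeVar("T")


def candidate_compute_types(turbo: bool) -> List[str]:
    """Return the compute types to try, most efficient first."""
    order = ["float16", "int8", "float32"]
    # int8_float16 is only attempted when turbo mode is requested.
    return ["int8_float16"] + order if turbo else order


def load_with_fallback(loader: Callable[[str], T], turbo: bool = False) -> Tuple[T, str]:
    """Try each compute type in order; return the model and the type that worked.

    `loader` is a hypothetical callable that builds a model for a given
    compute type and is assumed to raise ValueError if the hardware
    does not support that type.
    """
    last_error = None
    for compute_type in candidate_compute_types(turbo):
        try:
            return loader(compute_type), compute_type
        except ValueError as exc:
            # Warn and move on to the next, more compatible compute type.
            warnings.warn(
                f"compute type {compute_type!r} is not supported ({exc}); "
                "trying the next one"
            )
            last_error = exc
    raise RuntimeError("no supported compute type found") from last_error
```

With Faster Whisper installed, the loader could plausibly be something like `lambda ct: WhisperModel("base", compute_type=ct)`, where the `"base"` model size is an assumption for illustration only. Passing the loader in as a parameter keeps the fallback logic testable without a GPU or a downloaded model.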