Add WAV file support with MP3 conversion and reuse

- Add support for WAV files with automatic conversion to MP3
- Save converted MP3 files in the same directory as WAV files
- Reuse existing MP3 files if already converted
- Update documentation and requirements
2025-05-22 20:53:39 +02:00
parent b6e6e80cfb
commit c47089aa0d
6 changed files with 71 additions and 11 deletions
@@ -1,10 +1,13 @@
 # mknotes

-A command-line tool to transcribe all MP3 and M4A audio files in a directory using Faster Whisper, then enhance the transcriptions into comprehensive notes using OpenAI's GPT-4.1 model.
+A command-line tool to transcribe all MP3, M4A, and WAV audio files in a directory using Faster Whisper, then enhance the transcriptions into comprehensive notes using OpenAI's GPT-4.1 model.

 ## Features

-- Batch transcribes all `.mp3` and `.m4a` files in a specified directory
+- Batch transcribes all `.mp3`, `.m4a`, and `.wav` files in a specified directory
+- Automatically converts WAV files to MP3 format before processing
+  - Converted MP3 files are saved in the same directory as the original WAV files
+  - Reuses existing MP3 files if they've already been converted
 - Saves transcriptions as `.txt` files
 - Enhances notes using GPT-4.1 with a custom prompt
 - Outputs enhanced notes in markdown format
@@ -23,6 +26,16 @@ source venv/bin/activate  # On Windows: venv\Scripts\activate

 # Install dependencies
 pip install -r requirements.txt
+
+# Install ffmpeg (required for WAV to MP3 conversion)
+# On Ubuntu/Debian:
+# sudo apt-get install ffmpeg
+
+# On macOS with Homebrew:
+# brew install ffmpeg
+
+# On Windows:
+# Download from https://ffmpeg.org/download.html and add to PATH
 ```

 ## Usage
@@ -32,7 +45,7 @@ export OPENAI_API_KEY="your-api-key-here"
 python main.py --input-dir /path/to/audio/files --output-dir /path/to/output [--turbo]
 ```

-- `--input-dir`: Directory containing audio files (required)
+- `--input-dir`: Directory containing audio files (.mp3, .m4a, .wav) (required)
 - `--output-dir`: Directory for output files (default: "output")
 - `--turbo`: Enable turbo mode for faster inference (uses int8_float16 compute type)
 - `--force`: Force re-processing of files even if output files already exist
@@ -62,3 +75,5 @@ If a compute type is not supported, the program will try the next one in the lis
 - Python 3.8+
 - [Faster Whisper](https://github.com/SYSTRAN/faster-whisper)
 - [OpenAI Python SDK](https://github.com/openai/openai-python)
+- [pydub](https://github.com/jiaaro/pydub) (for WAV to MP3 conversion)
+- [ffmpeg](https://ffmpeg.org/) (required by pydub for audio conversion)
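
The convert-and-reuse behavior described in this commit could be sketched roughly as follows. This is a minimal illustration, not the code in `main.py`: the names `ensure_mp3` and `_convert_with_pydub` and the injectable `convert` parameter are hypothetical, introduced only for the example. The pydub calls (`AudioSegment.from_wav`, `export`) are real and need ffmpeg on the PATH.

```python
from pathlib import Path
from typing import Callable, Optional


def _convert_with_pydub(wav_path: Path, mp3_path: Path) -> None:
    """Convert one WAV file to MP3 via pydub, which delegates encoding to ffmpeg."""
    # Imported lazily so the reuse path works even where pydub is not installed.
    from pydub import AudioSegment

    AudioSegment.from_wav(str(wav_path)).export(str(mp3_path), format="mp3")


def ensure_mp3(
    wav_path: Path,
    convert: Optional[Callable[[Path, Path], None]] = None,
) -> Path:
    """Return the MP3 path for wav_path, converting only when it does not exist yet.

    The MP3 is placed next to the WAV file with the same stem (hypothetical helper).
    """
    mp3_path = wav_path.with_suffix(".mp3")
    if mp3_path.exists():
        # A previous run already converted this file, so reuse the result.
        return mp3_path
    (convert or _convert_with_pydub)(wav_path, mp3_path)
    return mp3_path
```

Calling `ensure_mp3(Path("lecture.wav"))` twice would convert on the first call and return the existing `lecture.mp3` on the second. Note that this sketch checks only whether the MP3 exists, not whether it is newer than the WAV file, so an edited WAV file would not be converted again.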