# mknotes

A command-line tool to transcribe all MP3, M4A, and WAV audio files in a directory using Faster Whisper, then enhance the transcriptions into comprehensive notes using OpenAI's GPT-4.1 model.

## Features

- Batch transcribes all `.mp3`, `.m4a`, and `.wav` files in a specified directory
- Automatically converts WAV files to MP3 format before processing
- Converted MP3 files are saved in the same directory as the original WAV files
- Reuses existing MP3 files if they've already been converted
- Saves transcriptions as `.txt` files
- Enhances notes using GPT-4.1 with a custom prompt
- Outputs enhanced notes in markdown format
- Configurable input and output directories

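The WAV handling described above (convert once, save next to the original, reuse on later runs) can be sketched as follows. This is a minimal illustration, not the project's actual code: `ensure_mp3` and `convert_with_pydub` are hypothetical names, and the converter is passed in as a parameter so the reuse logic can be exercised without pydub or ffmpeg installed.

```python
from pathlib import Path
from typing import Callable


def convert_with_pydub(wav_path: Path, mp3_path: Path) -> None:
    """Convert a WAV file to MP3 using pydub (which shells out to ffmpeg)."""
    # Imported lazily so the reuse logic below works without pydub installed.
    from pydub import AudioSegment

    AudioSegment.from_wav(str(wav_path)).export(str(mp3_path), format="mp3")


def ensure_mp3(
    wav_path: Path,
    convert: Callable[[Path, Path], None] = convert_with_pydub,
) -> Path:
    """Return the MP3 next to wav_path, converting only if it is missing."""
    # Same directory, same stem, different suffix: talk.wav -> talk.mp3
    mp3_path = wav_path.with_suffix(".mp3")
    if not mp3_path.exists():
        convert(wav_path, mp3_path)
    return mp3_path
```

Calling `ensure_mp3(Path("lecture.wav"))` twice would run the conversion only the first time; the second call finds `lecture.mp3` already present and returns it directly.
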
## Installation

```bash
# Clone the repository
git clone https://github.com/yourusername/mknotes.git
cd mknotes

# Create a virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Install ffmpeg (required for WAV to MP3 conversion)
# On Ubuntu/Debian:
# sudo apt-get install ffmpeg

# On macOS with Homebrew:
# brew install ffmpeg

# On Windows:
# Download from https://ffmpeg.org/download.html and add to PATH
```

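Because WAV conversion depends on ffmpeg being reachable on `PATH`, it can help to verify the install before running the tool. The snippet below is an optional check using only the standard library; `find_ffmpeg` is a hypothetical helper name and is not part of mknotes.

```python
import shutil
from typing import Optional


def find_ffmpeg() -> Optional[str]:
    """Return the full path of the ffmpeg executable on PATH, or None if absent."""
    return shutil.which("ffmpeg")


if __name__ == "__main__":
    path = find_ffmpeg()
    if path is None:
        print("ffmpeg not found on PATH; WAV to MP3 conversion will fail")
    else:
        print(f"ffmpeg found at {path}")
```
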
## Usage

```bash
export OPENAI_API_KEY="your-api-key-here"
python main.py --input-dir /path/to/audio/files --output-dir /path/to/output [--turbo] [--force]
```

- `--input-dir`: Directory containing audio files (`.mp3`, `.m4a`, `.wav`) (required)
- `--output-dir`: Directory for output files (default: "output")
- `--turbo`: Enable turbo mode for faster inference (uses the `int8_float16` compute type)
- `--force`: Force re-processing of files even if output files already exist

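For reference, the options listed above map naturally onto `argparse`. The following is a sketch of how such a parser could be declared, assuming the flags behave exactly as documented; the help strings and the `parse_args` function name are illustrative and may differ from what `main.py` actually contains.

```python
import argparse
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse the documented mknotes options (argv=None reads sys.argv)."""
    parser = argparse.ArgumentParser(
        description="Transcribe audio files and enhance them into notes."
    )
    parser.add_argument("--input-dir", required=True,
                        help="Directory containing .mp3, .m4a, or .wav files")
    parser.add_argument("--output-dir", default="output",
                        help="Directory for output files")
    parser.add_argument("--turbo", action="store_true",
                        help="Use the int8_float16 compute type")
    parser.add_argument("--force", action="store_true",
                        help="Re-process files even if outputs already exist")
    return parser.parse_args(argv)
```

With this layout, `parse_args(["--input-dir", "audio"])` leaves `output_dir` at its default of `"output"` and both boolean flags off.
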
### Turbo Mode Hardware Requirements

The `--turbo` flag enables faster inference using the `int8_float16` compute type, which can significantly speed up transcription. However, this requires:

- CUDA-compatible GPU with Tensor Cores (NVIDIA Turing, Ampere, or newer architecture)
- Or CPU with AVX2 support

If your hardware does not support this optimization, the program will automatically fall back to the next most compatible compute type and print a warning.

#### Compute Type Fallback

The program will attempt to use the most efficient compute type supported by your hardware, in the following order:

- `int8_float16` (if `--turbo` is enabled)
- `float16`
- `int8`
- `float32` (most compatible, works on virtually all hardware)

If a compute type is not supported, the program will try the next one in the list until one succeeds.

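One way the fallback order above could be implemented is sketched below. `candidate_compute_types` and `load_with_fallback` are hypothetical names, and the sketch assumes the model loader raises `ValueError` when a compute type is unsupported; the real program may detect this differently.

```python
from typing import Callable, List, Optional, Tuple, TypeVar

Model = TypeVar("Model")


def candidate_compute_types(turbo: bool) -> List[str]:
    """Return compute types in the documented order of preference."""
    types = ["float16", "int8", "float32"]
    return ["int8_float16"] + types if turbo else types


def load_with_fallback(
    load: Callable[[str], Model], turbo: bool = False
) -> Tuple[Model, str]:
    """Try each compute type in order; return the model and the type that worked."""
    last_error: Optional[ValueError] = None
    for compute_type in candidate_compute_types(turbo):
        try:
            return load(compute_type), compute_type
        except ValueError as err:  # assumed signal for "unsupported on this hardware"
            print(f"Warning: compute type {compute_type!r} unavailable ({err})")
            last_error = err
    raise RuntimeError("no supported compute type found") from last_error
```

With Faster Whisper, the loader might be something like `lambda ct: WhisperModel("base", compute_type=ct)`, where the `"base"` model size is only a placeholder.
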
## Requirements

- Python 3.8+
- [Faster Whisper](https://github.com/SYSTRAN/faster-whisper)
- [OpenAI Python SDK](https://github.com/openai/openai-python)
- [pydub](https://github.com/jiaaro/pydub) (for WAV to MP3 conversion)
- [ffmpeg](https://ffmpeg.org/) (required by pydub for audio conversion)