schihei f00f29ab6b Move prompts to separate files and add prompt types

- Created directory structure for prompts (system and user prompts)
- Added specialized prompts for lectures, meetings, and interviews
- Updated enhancer.py to load prompts from files
- Added --prompt-type CLI parameter to select prompt type
- Updated documentation and enhancement proposals

2025-05-22 21:28:36 +02:00

mknotes

A command-line tool to transcribe all MP3, M4A, and WAV audio files in a directory using Faster Whisper, then enhance the transcriptions into comprehensive notes using OpenAI's GPT-4.1 model.

Features

- Batch transcribes all .mp3, .m4a, and .wav files in a specified directory
- Automatically converts WAV files to MP3 format before processing
  - Converted MP3 files are saved in the same directory as the original WAV files
  - Reuses existing MP3 files if they've already been converted
- Saves transcriptions as .txt files
- Enhances notes using GPT-4.1 with a custom prompt
- Outputs enhanced notes in Markdown format
- Configurable input and output directories
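
The WAV handling described above (convert next to the original, reuse an existing MP3) could look roughly like the following. This is a minimal sketch, not the project's actual code: the function names `mp3_path_for`, `ensure_mp3`, and `convert_with_pydub` are hypothetical, and only the `pydub` calls `AudioSegment.from_wav` and `export` are taken from the library itself.

```python
from pathlib import Path


def mp3_path_for(wav_path: Path) -> Path:
    """Return the MP3 path that sits next to the original WAV file."""
    return wav_path.with_suffix(".mp3")


def convert_with_pydub(wav_path: Path, target: Path) -> None:
    """Convert one WAV file to MP3 using pydub (which needs ffmpeg on PATH)."""
    # Imported lazily so the path logic below works without pydub installed.
    from pydub import AudioSegment

    AudioSegment.from_wav(str(wav_path)).export(str(target), format="mp3")


def ensure_mp3(wav_path: Path, convert=None) -> Path:
    """Return the MP3 for a WAV file, converting only if it does not exist yet."""
    target = mp3_path_for(wav_path)
    if target.exists():
        # An earlier run already produced this file, so reuse it.
        return target
    if convert is None:
        convert = convert_with_pydub
    convert(wav_path, target)
    return target
```

The `convert` parameter is only there so the reuse logic can be exercised without real audio files; a caller would normally use `ensure_mp3(Path("lecture.wav"))`.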

Installation

# Clone the repository
git clone https://github.com/yourusername/mknotes.git
cd mknotes

# Create a virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Install ffmpeg (required for WAV to MP3 conversion)
# On Ubuntu/Debian:
# sudo apt-get install ffmpeg

# On macOS with Homebrew:
# brew install ffmpeg

# On Windows:
# Download from https://ffmpeg.org/download.html and add to PATH

Usage

export OPENAI_API_KEY="your-api-key-here"
python main.py --input-dir /path/to/audio/files --output-dir /path/to/output [--turbo] [--force] [--prompt-type {lecture,meeting,interview}]

--input-dir: Directory containing audio files (.mp3, .m4a, .wav) (required)
--output-dir: Directory for output files (default: "output")
--turbo: Enable turbo mode for faster inference (uses int8_float16 compute type)
--force: Force re-processing of files even if output files already exist
--prompt-type: Type of content to enhance (choices: "lecture", "meeting", "interview", default: "lecture")
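
For orientation, the options listed above map naturally onto an `argparse` parser. The sketch below is an illustration built only from the option names, choices, and defaults documented here; the function name `build_parser` and the help strings are assumptions, and the real `main.py` may be organized differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that mirrors the documented command-line options."""
    parser = argparse.ArgumentParser(
        prog="main.py",
        description="Transcribe audio files and enhance the transcripts into notes.",
    )
    parser.add_argument("--input-dir", required=True,
                        help="Directory containing .mp3, .m4a, and .wav files")
    parser.add_argument("--output-dir", default="output",
                        help="Directory for output files")
    parser.add_argument("--turbo", action="store_true",
                        help="Try the int8_float16 compute type first")
    parser.add_argument("--force", action="store_true",
                        help="Re-process files even if output already exists")
    parser.add_argument("--prompt-type", default="lecture",
                        choices=["lecture", "meeting", "interview"],
                        help="Type of content to enhance")
    return parser


# Example: parse an explicit argument list instead of sys.argv.
args = build_parser().parse_args(
    ["--input-dir", "recordings", "--prompt-type", "meeting"]
)
print(args.input_dir, args.output_dir, args.prompt_type, args.turbo, args.force)
```

Note that `argparse` exposes `--input-dir` as `args.input_dir` and `--prompt-type` as `args.prompt_type`, and rejects any prompt type outside the three listed choices.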

Turbo Mode Hardware Requirements

The --turbo flag enables faster inference using the int8_float16 compute type, which can significantly speed up transcription. However, this requires:

CUDA-compatible GPU with Tensor Cores (NVIDIA Turing, Ampere, or newer architecture)
Or CPU with AVX2 support

If your hardware does not support this optimization, the program automatically falls back to the next compute type in the fallback order described below and prints a warning.

Compute Type Fallback

The program will attempt to use the most efficient compute type supported by your hardware, in the following order:

int8_float16 (if --turbo is enabled)
float16
int8
float32 (most compatible, works on virtually all hardware)

If a compute type is not supported, the program will try the next one in the list until successful.
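
The fallback order above can be sketched as a simple loop that tries each compute type in turn. This is an illustrative sketch under stated assumptions, not the program's actual implementation: the helper names `candidate_compute_types`, `load_with_fallback`, and `load_whisper` are hypothetical, the model size `"base"` is a placeholder, and it assumes that Faster Whisper signals an unsupported compute type by raising `ValueError` or `RuntimeError` when the model is constructed.

```python
from typing import Callable, List, Optional, Tuple, TypeVar

T = TypeVar("T")


def candidate_compute_types(turbo: bool) -> List[str]:
    """Return compute types in the documented order of preference."""
    order = ["float16", "int8", "float32"]
    if turbo:
        return ["int8_float16"] + order
    return order


def load_with_fallback(load: Callable[[str], T], turbo: bool = False) -> Tuple[T, str]:
    """Call load(compute_type) for each candidate until one succeeds."""
    last_error: Optional[Exception] = None
    for compute_type in candidate_compute_types(turbo):
        try:
            return load(compute_type), compute_type
        except (ValueError, RuntimeError) as exc:
            # Assumed failure mode for an unsupported compute type.
            print(f"Warning: compute type {compute_type!r} not usable ({exc}); "
                  "trying the next one.")
            last_error = exc
    raise RuntimeError("No supported compute type found") from last_error


def load_whisper(compute_type: str):
    """Load a Faster Whisper model with the given compute type."""
    # Imported lazily so the fallback logic can be used without the library.
    from faster_whisper import WhisperModel

    return WhisperModel("base", compute_type=compute_type)  # "base" is a placeholder
```

A caller would use `model, used_type = load_with_fallback(load_whisper, turbo=True)`. Passing the loader as a parameter keeps the ordering logic separate from the library call, which also makes it easy to check with a stub loader.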

Requirements

Python 3.8+
Faster Whisper
OpenAI Python SDK
pydub (for WAV to MP3 conversion)
ffmpeg (required by pydub for audio conversion)
