whisper

Author	SHA1	Message	Date
Jong Wook Kim	919a713499	attempt to fix the repetition/hallucination issue identified in #1046 (#1052) * attempt to fix the repetition/hallucination issue identified in #1046 * zero-pad the audio instead of spectrogram * formatting fix * delete debug print	2023-03-07 20:08:45 -08:00
Jong Wook Kim	b80bcf610d	apply formatting with `black` (#1038) * applying black (with the default 88-column limit) * add flake8 * add isort * fix isort	2023-03-06 15:50:37 -08:00
Jong Wook Kim	500d0fe966	word-level timestamps in `transcribe()` (#869) * word-level timestamps in `transcribe()` * moving to `timing.py` * numba implementation for dtw, replacing dtw-python * triton implementation for dtw * add test for dtw implementations * triton implementation of median_filter * a simple word-level timestamps test * add scipy as dev dependency * installs an older version of Triton if CUDA < 11.4 * fix broken merge * loosen nvcc version match regex * find_alignment() function * miscellaneous improvements * skip median filtering when the input is too small * Expose punctuation options in cli and transcribe() (#973) * fix merge error * fix merge error 2 * annotating that word_timestamps is experimental --------- Co-authored-by: ryanheise <ryan@ryanheise.com>	2023-03-06 14:00:49 -08:00
Jong Wook Kim	eab8d920ed	Decoding improvements (#1033) * suppress task tokens (transcribe/translate) * not ignoring the last segment ending with one timestamp	2023-03-06 11:32:32 -08:00
Jong Wook Kim	a6b36ede1f	drop python 3.7 support (#889)	2023-01-24 14:05:57 -08:00
Jong Wook Kim	7f1ef223ab	handle printing even if sys.stdout.buffer is not available (#887)	2023-01-24 10:12:04 -08:00
Niels Mayer	f5bfe004ec	Add TSV formatted output in transcript, using integer start/end times in milliseconds. (#228) * Add CSV format output in transcript, containing lines of characters formatted like: <startTime-in-integer-milliseconds>, <endTime-in-integer-milliseconds>, <transcript-including-commas> * for easier reading by spreadsheets importing CSV, the third column of the CSV file is delimited by quotes, and any quote characters that might be in the transcript (which would interfere with parsing the third column as a string) are converted to "''". * fix syntax error * docstring edit Co-authored-by: Jong Wook Kim <jongwook@openai.com> Co-authored-by: Jong Wook Kim <jongwook@nyu.edu>	2023-01-22 00:27:17 -08:00
Aaryan YVS	da600abd2b	Added --output_format option (#333) * Added --output option --output option will help select the output files that will be generated. Corrected the logic, which wrongly shows progress bar when verbose is set to False * Changed output_files variable * Changed back the tqdm verbose * refactor output format handling Co-authored-by: Jong Wook Kim <jongwook@openai.com> Co-authored-by: Jong Wook Kim <jongwook@nyu.edu>	2023-01-21 23:58:38 -08:00
Jong Wook Kim	12e1089462	use stdout for printing transcription progress (#867)	2023-01-20 00:54:05 -08:00
Jong Wook Kim	9d646db9d8	print '?' if a letter can't be encoded using the system default encoding (#859)	2023-01-17 23:28:36 -08:00
adamreis	70861c7ce3	Fix tiny transcribe() docstring typo (#857) s/successfully/successively, which I believe was the intent.	2023-01-16 22:42:01 -08:00
Jong Wook Kim	02aa851a49	fix to return only the text token ids	2022-11-15 16:25:11 -08:00
Jong Wook Kim	d18e9ea5dd	transcribe() on English-only model won't complain when language="en" is not given	2022-10-09 02:40:12 -07:00
eudoxos	35713c66e0	Add --threads option to transcribe (#278) * Add --threads option to transcribe Torch on CPU uses by default number_of_cores/2. This option allows to override this default. * Update transcribe.py Co-authored-by: Jong Wook Kim <ilikekjw@gmail.com>	2022-10-09 02:11:15 -07:00
Jibin Mathew	0b1ba3d46e	Add model_dir to arguments (#202) * Add model_dir to arguments * minor formatting change Co-authored-by: Jong Wook Kim <jongwook@openai.com>	2022-09-30 14:45:51 -07:00
Jong Wook Kim	7cb4cc21bf	allowing nonzero initial temperature	2022-09-29 18:05:12 -07:00
Vicki Anand	2b0c2971af	Don't update duration if last timestamp is same as begin (#191)	2022-09-29 12:27:48 -07:00
Jong Wook Kim	62fe7f1009	patience definition to match the paper	2022-09-27 19:00:41 -07:00
Nick Konovalchuk	b4308c4782	fix: transcribe verbosity (#140)	2022-09-26 11:46:21 -07:00
VulumeCode	2037b65f3f	Context prompt (#128) Co-authored-by: Jong Wook Kim <jongwook@nyu.edu>	2022-09-26 05:22:33 -07:00
EliEron	fc0f40981d	Write each sentence as a separate line for the txt output (#101) * Write each sentence as a separate line for the txt output Write each sentence as a separate line for the txt output * Update utils.py Co-authored-by: EliEron <example@example.com> Co-authored-by: Jong Wook Kim <jongwook@nyu.edu>	2022-09-26 04:52:28 -07:00
fatih	ead77fab97	add srt subtitle export utility (#102) * add srt subtitle export utility * simplifying Co-authored-by: Jong Wook Kim <jongwook@nyu.edu>	2022-09-26 03:50:26 -07:00
fatih	9e7e418ff1	add progress bar for transcribe loop (#100) * add progress bar to transcribe loop * improved warning message for English-only models * add --condition_on_previous_text * progressbar renames Co-authored-by: Jong Wook Kim <jongwook@nyu.edu>	2022-09-26 03:24:13 -07:00
Jong Wook Kim	5d8d3e75a4	add --condition_on_previous_text	2022-09-25 05:16:08 -07:00
Jong Wook Kim	2d3032de01	improved warning message for English-only models	2022-09-25 02:10:36 -07:00
Jong Wook Kim	15ab548263	nocaptions -> nospeech to match the paper figure	2022-09-23 15:45:32 +09:00
mj-kh	61989529b7	Fix possible mistake when loading model to device (#57) Before this change, the model is loaded into GPU regardless of the value of "device" argument in CLI. (e.g. whisper "test.wav" --device cpu loads into GPU anyway)	2022-09-23 15:21:47 +09:00
hanacchi	c85eaaae29	Use UTF-8 encoding to save the txt and vtt files (#37) Explicitly set the text encoding to UTF-8 in order to avoid UnicodeEncodeErrors Co-authored-by: Jong Wook Kim <jongwook@nyu.edu>	2022-09-23 12:10:55 +09:00
EliEron	759e8d47a8	Fix output_dir argument when audio file is a path (#45)	2022-09-23 11:38:37 +09:00
Jong Wook Kim	834f00a0ea	making small model the default	2022-09-22 02:45:12 +09:00
Jong Wook Kim	6e3be77e1a	initial commit	2022-09-22 01:09:43 +09:00

31 Commits