31 Commits

Author SHA1 Message Date
Jong Wook Kim 919a713499 attempt to fix the repetition/hallucination issue identified in #1046 (#1052)
* attempt to fix the repetition/hallucination issue identified in #1046

* zero-pad the audio instead of spectrogram

* formatting fix

* delete debug print
2023-03-07 20:08:45 -08:00
Jong Wook Kim b80bcf610d apply formatting with black (#1038)
* applying black (with the default 88-column limit)

* add flake8

* add isort

* fix isort
2023-03-06 15:50:37 -08:00
Jong Wook Kim 500d0fe966 word-level timestamps in transcribe() (#869)
* word-level timestamps in `transcribe()`

* moving to `timing.py`

* numba implementation for dtw, replacing dtw-python

* triton implementation for dtw

* add test for dtw implementations

* triton implementation of median_filter

* a simple word-level timestamps test

* add scipy as dev dependency

* installs an older version of Triton if CUDA < 11.4

* fix broken merge

* loosen nvcc version match regex

* find_alignment() function

* miscellaneous improvements

* skip median filtering when the input is too small

* Expose punctuation options in cli and transcribe() (#973)

* fix merge error

* fix merge error 2

* annotating that word_timestamps is experimental

---------

Co-authored-by: ryanheise <ryan@ryanheise.com>
2023-03-06 14:00:49 -08:00
Jong Wook Kim eab8d920ed Decoding improvements (#1033)
* suppress task tokens (transcribe/translate)

* not ignoring the last segment ending with one timestamp
2023-03-06 11:32:32 -08:00
Jong Wook Kim a6b36ede1f drop python 3.7 support (#889) 2023-01-24 14:05:57 -08:00
Jong Wook Kim 7f1ef223ab handle printing even if sys.stdout.buffer is not available (#887) 2023-01-24 10:12:04 -08:00
Niels Mayer f5bfe004ec Add TSV formatted output in transcript, using integer start/end times in milliseconds. (#228)
* Add CSV format output in transcript, containing lines of characters formatted like: <startTime-in-integer-milliseconds>, <endTime-in-integer-milliseconds>, <transcript-including-commas>

* for easier reading by spreadsheets importing CSV, the third column of the CSV file is delimited by quotes, and any quote characters that might be in the transcript (which would interfere with parsing the third column as a string) are converted to "''".

* fix syntax error

* docstring edit

Co-authored-by: Jong Wook Kim <jongwook@openai.com>
Co-authored-by: Jong Wook Kim <jongwook@nyu.edu>
2023-01-22 00:27:17 -08:00
Aaryan YVS da600abd2b Added --output_format option (#333)
* Added --output option

--output option will help select the output files that will be generated.

Corrected the logic, which wrongly shows progress bar when verbose is set to False

* Changed output_files variable

* Changed back the tqdm verbose

* refactor output format handling

Co-authored-by: Jong Wook Kim <jongwook@openai.com>
Co-authored-by: Jong Wook Kim <jongwook@nyu.edu>
2023-01-21 23:58:38 -08:00
Jong Wook Kim 12e1089462 use stdout for printing transcription progress (#867) 2023-01-20 00:54:05 -08:00
Jong Wook Kim 9d646db9d8 print '?' if a letter can't be encoded using the system default encoding (#859) 2023-01-17 23:28:36 -08:00
adamreis 70861c7ce3 Fix tiny transcribe() docstring typo (#857)
s/successfully/successively, which I believe was the intent.
2023-01-16 22:42:01 -08:00
Jong Wook Kim 02aa851a49 fix to return only the text token ids 2022-11-15 16:25:11 -08:00
Jong Wook Kim d18e9ea5dd transcribe() on English-only model won't complain when language="en" is not given 2022-10-09 02:40:12 -07:00
eudoxos 35713c66e0 Add --threads option to transcribe (#278)
* Add --threads option to transcribe

By default, Torch on CPU uses number_of_cores/2 threads. This option allows
overriding that default.

* Update transcribe.py

Co-authored-by: Jong Wook Kim <ilikekjw@gmail.com>
2022-10-09 02:11:15 -07:00
Jibin Mathew 0b1ba3d46e Add model_dir to arguments (#202)
* Add model_dir to arguments

* minor formatting change

Co-authored-by: Jong Wook Kim <jongwook@openai.com>
2022-09-30 14:45:51 -07:00
Jong Wook Kim 7cb4cc21bf allowing nonzero initial temperature 2022-09-29 18:05:12 -07:00
Vicki Anand 2b0c2971af Don't update duration if last timestamp is same as begin (#191) 2022-09-29 12:27:48 -07:00
Jong Wook Kim 62fe7f1009 patience definition to match the paper 2022-09-27 19:00:41 -07:00
Nick Konovalchuk b4308c4782 fix: transcribe verbosity (#140) 2022-09-26 11:46:21 -07:00
VulumeCode 2037b65f3f Context prompt (#128)
Co-authored-by: Jong Wook Kim <jongwook@nyu.edu>
2022-09-26 05:22:33 -07:00
EliEron fc0f40981d Write each sentence as a separate line for the txt output (#101)
* Write each sentence as a separate line for the txt output

* Update utils.py

Co-authored-by: EliEron <example@example.com>
Co-authored-by: Jong Wook Kim <jongwook@nyu.edu>
2022-09-26 04:52:28 -07:00
fatih ead77fab97 add srt subtitle export utility (#102)
* add srt subtitle export utility

* simplifying

Co-authored-by: Jong Wook Kim <jongwook@nyu.edu>
2022-09-26 03:50:26 -07:00
fatih 9e7e418ff1 add progress bar for transcribe loop (#100)
* add progress bar to transcribe loop

* improved warning message for English-only models

* add --condition_on_previous_text

* progressbar renames

Co-authored-by: Jong Wook Kim <jongwook@nyu.edu>
2022-09-26 03:24:13 -07:00
Jong Wook Kim 5d8d3e75a4 add --condition_on_previous_text 2022-09-25 05:16:08 -07:00
Jong Wook Kim 2d3032de01 improved warning message for English-only models 2022-09-25 02:10:36 -07:00
Jong Wook Kim 15ab548263 nocaptions -> nospeech to match the paper figure 2022-09-23 15:45:32 +09:00
mj-kh 61989529b7 Fix possible mistake when loading model to device (#57)
Before this change, the model is loaded into GPU regardless of the value of "device" argument in CLI.

(e.g. whisper "test.wav" --device cpu loads into GPU anyway)
2022-09-23 15:21:47 +09:00
hanacchi c85eaaae29 Use UTF-8 encoding to save the txt and vtt files (#37)
Explicitly set the text encoding to UTF-8 in order to avoid UnicodeEncodeErrors

Co-authored-by: Jong Wook Kim <jongwook@nyu.edu>
2022-09-23 12:10:55 +09:00
EliEron 759e8d47a8 Fix output_dir argument when audio file is a path (#45) 2022-09-23 11:38:37 +09:00
Jong Wook Kim 834f00a0ea making small model the default 2022-09-22 02:45:12 +09:00
Jong Wook Kim 6e3be77e1a initial commit 2022-09-22 01:09:43 +09:00