whisper

Author	SHA1	Message	Date
Jong Wook Kim	55f690af79	Release 20230124 v20230124	2023-01-24 11:11:08 -08:00
Jong Wook Kim	7f1ef223ab	handle printing even if sys.stdout.buffer is not available (#887)	2023-01-24 10:12:04 -08:00
Niels Mayer	f5bfe004ec	Add TSV formatted output in transcript, using integer start/end times in milliseconds. (#228) * Add CSV format output in transcript, containing lines of characters formatted like: <startTime-in-integer-milliseconds>, <endTime-in-integer-milliseconds>, <transcript-including-commas> * for easier reading by spreadsheets importing CSV, the third column of the CSV file is delimited by quotes, and any quote characters that might be in the transcript (which would interfere with parsing the third column as a string) are converted to "''". * fix syntax error * docstring edit Co-authored-by: Jong Wook Kim <jongwook@openai.com> Co-authored-by: Jong Wook Kim <jongwook@nyu.edu>	2023-01-22 00:27:17 -08:00
Aaryan YVS	da600abd2b	Added --output_format option (#333) * Added --output option --output option will help select the output files that will be generated. Corrected the logic, which wrongly shows progress bar when verbose is set to False * Changed output_files variable * Changed back the tqdm verbose * refactor output format handling Co-authored-by: Jong Wook Kim <jongwook@openai.com> Co-authored-by: Jong Wook Kim <jongwook@nyu.edu>	2023-01-21 23:58:38 -08:00
zer0-x	9f7aba6099	Handle XDG_CACHE_HOME properly for download_root (#864) Co-authored-by: Jong Wook Kim <jongwook@openai.com>	2023-01-21 01:09:39 -08:00
Jong Wook Kim	12e1089462	use stdout for printing transcription progress (#867)	2023-01-20 00:54:05 -08:00
Markus Hennerbichler	ea1c266709	Fix bug where mm is mistakenly replaced with hmm in e.g. 20mm (#659) Co-authored-by: Jong Wook Kim <jongwook@nyu.edu>	2023-01-18 10:41:11 -08:00
Jong Wook Kim	8135a7c31c	verbose outputs from pytest	2023-01-18 10:30:18 -08:00
Jong Wook Kim	9d646db9d8	print '?' if a letter can't be encoded using the system default encoding (#859)	2023-01-17 23:28:36 -08:00
Jong Wook Kim	37a4f1be6d	Release 20230117 v20230117	2023-01-17 16:08:28 -08:00
Romain Beaumont	b9f9b433ae	Add github action to automatically push to pypi on Release x.y.z commit (#681) * Add github action to automatically push to pypi on Release x.y.z commit * some housekeeping for pypi upload * add version.py Co-authored-by: Jong Wook Kim <jongwook@nyu.edu>	2023-01-17 15:50:26 -08:00
Umar Farooqi	f0083e7eb2	Use ndimage.median_filter instead of signal.medfilter (#812) For a 30s long audio file which didn't have any silence, ndimage.median_filter took 7s where signal.medfilter took 30s. Co-authored-by: Umar Farooqi <umar@paystash.com> Co-authored-by: Jong Wook Kim <jongwook@nyu.edu>	2023-01-17 14:43:05 -08:00
Jong Wook Kim	a84191faae	rename GitHub workflow	2023-01-17 13:54:40 -08:00
Jong Wook Kim	b1d213c0c7	allow test_transcribe to run on CPU when CUDA is not available	2023-01-17 13:43:36 -08:00
Jong Wook Kim	493dfffa37	add github action to run pytest	2023-01-17 13:38:33 -08:00
Mikko Vedru	0f39c89d92	Update README.md (#804)	2023-01-16 23:46:42 -08:00
Markus Hennerbichler	6df3ea1fb5	Support batch-dimension in log_mel_spectogram (#839)	2023-01-16 23:46:15 -08:00
adamreis	70861c7ce3	Fix tiny transcribe() docstring typo (#857) s/successfully/successively, which I believe was the intent.	2023-01-16 22:42:01 -08:00
Jong Wook Kim	f82bc59f5e	torch.concatenate -> torch.cat for compatibility	2023-01-10 10:53:18 -08:00
Jong Wook Kim	28769fcfe5	word-level timestamps in Multilingual_ASR notebook	2022-12-31 10:03:42 -07:00
Jong Wook Kim	53807677fe	MultiHeadAttention to return qk as well	2022-12-30 01:53:57 -07:00
Jong Wook Kim	9323b2526c	Revert "saving the qk matrix in the attention module for convenience" This reverts commit `68e44bd83c`.	2022-12-29 23:53:31 -07:00
Jong Wook Kim	68e44bd83c	saving the qk matrix in the attention module for convenience	2022-12-29 23:02:52 -07:00
Jong Wook Kim	0b5dcfdef7	large-v2 figure and arxiv url update	2022-12-09 00:12:39 -05:00
altryne	b9265e5796	Update Hebrew language code to he per IANA registry (#401) * Update Hebrew language code to he per IANA registry Per [IANA registry](https://www.iana.org/assignments/language-subtag-registry/language-subtag-registry), `iw` was deprecated as the code for Hebrew in 1989 and the preferred code is `he` The correct subtag: ``` %% Type: language Subtag: he Description: Hebrew Added: 2005-10-16 Suppress-Script: Hebr %% ``` And the deprecation ``` %% Type: language Subtag: iw Description: Hebrew Added: 2005-10-16 Deprecated: 1989-01-01 Preferred-Value: he Suppress-Script: Hebr %% ``` * Update hebrew ISO code to he Per discussion, it's ok to make this change without backwards compatibility	2022-12-07 13:45:31 -05:00
Paul Harter	fd8f80c8b8	Explicitly closing model file after reading it (#630)	2022-12-06 12:07:19 -05:00
Jong Wook Kim	4179ed2475	add large-v2 model - The "large-v2" model is trained for more epochs with regularization and shows improved performance compared to the previous large. - It has the same architecture as the original large model. - When `load_model("large")` is called, the "large-v2" model will be loaded. - We will soon update the paper regarding this new model.	2022-12-05 11:07:14 -05:00
jumon	ec1b34bb90	fix compression ratio function (#561)	2022-12-04 17:27:42 -06:00
Jong Wook Kim	eff383b27b	invoking __call__ instead of forward()	2022-11-16 04:18:50 -08:00
Jong Wook Kim	02aa851a49	fix to return only the text token ids	2022-11-15 16:25:11 -08:00
jumon	76148a56c5	suppress generating non-timestamp tokens at the beginning (#532)	2022-11-15 11:44:36 -08:00
Vicki Anand	9f70a352f9	Fix attention caching to make it actually work (#370)	2022-10-19 16:44:03 -07:00
Sumana Harihareswara	7f3e408e09	Add package metadata to setup.py (#315) Add project summary, license, etc. for display with "pip show" and similar Python package distribution tools.	2022-10-17 13:51:16 -07:00
Michael Monashev	f680570016	Fix bug (#305) Fix bug: RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper__index_select)	2022-10-17 11:38:20 -07:00
Jong Wook Kim	d18e9ea5dd	transcribe() on English-only model won't complain when language="en" is not given	2022-10-09 02:40:12 -07:00
David Marx	82725cea9c	infer download_root from XDG_CACHE_HOME if avail (#257)	2022-10-09 02:14:03 -07:00
eudoxos	35713c66e0	Add --threads option to transcribe (#278) * Add --threads option to transcribe Torch on CPU uses by default number_of_cores/2. This option allows to override this default. * Update transcribe.py Co-authored-by: Jong Wook Kim <ilikekjw@gmail.com>	2022-10-09 02:11:15 -07:00
Corentin Jemine	9e653bd0ea	Fixed CoW RuntimeError in DecodingTask.run() (#240)	2022-10-04 08:49:31 -07:00
Tom Stuart	02b74308ff	Fix timestamps and strip extraneous whitespace in WebVTT output (#219) * Use two-digit hours in WebVTT timestamps Per the WebVTT specification [0]: > A WebVTT timestamp consists of the following components, in the given > order: > > 1. Optionally (required if hours is non-zero): > 1. Two or more ASCII digits, representing the hours as a base ten > integer. > 2. A U+003A COLON character (:) YouTube won’t accept timestamps containing single-digit hours. [0] https://www.w3.org/TR/webvtt1/#webvtt-timestamp * Strip segment text in WebVTT output We already do this for plain text and SubRip output, so we should do it for WebVTT too.	2022-10-03 14:51:07 -07:00
Jibin Mathew	0b1ba3d46e	Add model_dir to arguments (#202) * Add model_dir to arguments * minor formatting change Co-authored-by: Jong Wook Kim <jongwook@openai.com>	2022-09-30 14:45:51 -07:00
Caleb McQuillin	60132ade70	Use , character instead of . for SRT output. (#197) The SRT format uses the decimal comma character as the fractional separator rather than the decimal point character. Adjust format_timestamp and write_srt to specify the separator character. See https://en.wikipedia.org/wiki/SubRip#:~:text=the%20fractional%20separator%20used%20is%20the%20comma%2C%20since%20the%20program%20was%20written%20in%20france.	2022-09-29 20:44:12 -07:00
Jong Wook Kim	7cb4cc21bf	allowing nonzero initial temperature	2022-09-29 18:05:12 -07:00
Jong Wook Kim	30dc5c581b	pointer to the show and tell section	2022-09-29 14:57:49 -07:00
Szabolcs Pasztor	5905e503b8	Update README.md (#161) * Update README.md * merging paragraphs Co-authored-by: Jong Wook Kim <jongwook@nyu.edu>	2022-09-29 14:18:54 -07:00
Fabiano	0457aac342	Adds missing command for install (mac) (#90) * Adds missing command for install (mac) Required for users who didn't previously have Rust installed. * minor wording change Co-authored-by: Jong Wook Kim <jongwook@nyu.edu>	2022-09-29 14:08:58 -07:00
sawadata	deafef05f3	Update audio.py (#178) add '-nostdin' argument	2022-09-29 12:34:04 -07:00
Vicki Anand	2b0c2971af	Don't update duration if last timestamp is same as begin (#191)	2022-09-29 12:27:48 -07:00
Jong Wook Kim	62fe7f1009	patience definition to match the paper	2022-09-27 19:00:41 -07:00
Nick Konovalchuk	b4308c4782	fix: transcribe verbosity (#140)	2022-09-26 11:46:21 -07:00
Michael Goin	9c8183a179	Use PyTorch as logits transpose for ONNX support (#141)	2022-09-26 10:54:26 -07:00

76 Commits
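
Note on the subtitle timestamp commits (02b74308ff, 60132ade70, f5bfe004ec): the messages describe three output conventions, namely two-digit hours in WebVTT, a decimal comma in SRT, and integer milliseconds in TSV. The sketch below illustrates those rules only; it is not the repository's code. The names `format_timestamp_sketch` and `tsv_row_sketch` and their parameters are hypothetical, chosen for this example.

```python
def format_timestamp_sketch(
    seconds: float,
    always_include_hours: bool = False,
    decimal_marker: str = ".",
) -> str:
    """Format a time in seconds as [HH:]MM:SS<marker>mmm (hypothetical helper)."""
    if seconds < 0:
        raise ValueError("non-negative timestamp expected")

    # Work in integer milliseconds to avoid floating-point drift.
    milliseconds = round(seconds * 1000.0)
    hours, milliseconds = divmod(milliseconds, 3_600_000)
    minutes, milliseconds = divmod(milliseconds, 60_000)
    secs, milliseconds = divmod(milliseconds, 1_000)

    # WebVTT: hours are optional, but need at least two digits when present.
    hours_marker = f"{hours:02d}:" if always_include_hours or hours > 0 else ""
    return f"{hours_marker}{minutes:02d}:{secs:02d}{decimal_marker}{milliseconds:03d}"


def tsv_row_sketch(start: float, end: float, text: str) -> str:
    """Build one TSV row with integer start/end times in milliseconds (hypothetical helper)."""
    return f"{round(1000 * start)}\t{round(1000 * end)}\t{text.strip()}"


# SRT style: hours always shown, comma as the fractional separator.
srt_stamp = format_timestamp_sketch(3661.5, always_include_hours=True, decimal_marker=",")
# WebVTT style: hours omitted when zero, dot as the fractional separator.
vtt_stamp = format_timestamp_sketch(75.25)
# TSV style: integer milliseconds, tab-delimited.
row = tsv_row_sketch(1.5, 2.75, " hello world ")
```

Passing the separator as a parameter lets one formatter serve both SRT and WebVTT, which is the approach the SRT commit message describes for `format_timestamp` and `write_srt`.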