Commit Graph

76 Commits

Author SHA1 Message Date
Jong Wook Kim 55f690af79 Release 20230124 v20230124 2023-01-24 11:11:08 -08:00
Jong Wook Kim 7f1ef223ab handle printing even if sys.stdout.buffer is not available (#887) 2023-01-24 10:12:04 -08:00
Niels Mayer f5bfe004ec Add TSV formatted output in transcript, using integer start/end times in milliseconds. (#228)
* Add CSV format output in transcript, containing lines of characters formatted like: <startTime-in-integer-milliseconds>, <endTime-in-integer-milliseconds>, <transcript-including-commas>

* For easier reading by spreadsheets importing CSV, the third
column of the CSV file is delimited by quotes, and any quote
characters that might be in the transcript (which would interfere with
parsing the third column as a string) are converted to "''".

* fix syntax error

* docstring edit

Co-authored-by: Jong Wook Kim <jongwook@openai.com>
Co-authored-by: Jong Wook Kim <jongwook@nyu.edu>
2023-01-22 00:27:17 -08:00
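The row layouts described in #228 can be sketched as follows. This is a minimal illustration, not Whisper's writer code. The names `to_ms`, `format_csv_row` and `format_tsv_row` are hypothetical, and replacing tabs inside the TSV text column is an assumption.

```python
def to_ms(seconds: float) -> int:
    """Convert a timestamp in seconds to integer milliseconds."""
    return round(seconds * 1000)


def format_csv_row(start: float, end: float, text: str) -> str:
    """CSV row: integer start/end in ms, then the transcript in double quotes.

    Double quotes inside the transcript become two single quotes, so
    spreadsheets still read the third column as one quoted string.
    """
    safe_text = text.strip().replace('"', "''")
    return f'{to_ms(start)}, {to_ms(end)}, "{safe_text}"'


def format_tsv_row(start: float, end: float, text: str) -> str:
    """TSV row: integer start/end in ms and the transcript, tab-separated."""
    # Assumption: tabs inside the text are turned into spaces to keep 3 columns.
    safe_text = text.strip().replace("\t", " ")
    return f"{to_ms(start)}\t{to_ms(end)}\t{safe_text}"


print(format_csv_row(1.25, 3.5, 'He said "hi", then left'))
print(format_tsv_row(0.0, 2.0, " hello "))
```

Integer milliseconds avoid locale-dependent decimal separators when a spreadsheet imports the file.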
Aaryan YVS da600abd2b Added --output_format option (#333)
* Added --output option

The --output option selects which output files will be generated.

Corrected the logic that wrongly showed the progress bar when verbose was set to False.

* Changed output_files variable

* Changed back the tqdm verbose

* refactor output format handling

Co-authored-by: Jong Wook Kim <jongwook@openai.com>
Co-authored-by: Jong Wook Kim <jongwook@nyu.edu>
2023-01-21 23:58:38 -08:00
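A sketch of how an `--output_format` option can select writers. The list of formats and the `"all"` default are assumptions made for illustration, and `build_parser` / `selected_formats` are hypothetical names, not Whisper's CLI code.

```python
import argparse
from typing import List

# Assumed list of per-format writers; "all" writes every one of them.
FORMATS = ["txt", "vtt", "srt", "tsv", "json"]


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--output_format",
        default="all",
        choices=FORMATS + ["all"],
        help="format of the output file; 'all' writes every format",
    )
    return parser


def selected_formats(output_format: str) -> List[str]:
    """Expand the --output_format value into the list of formats to write."""
    return list(FORMATS) if output_format == "all" else [output_format]


args = build_parser().parse_args(["--output_format", "srt"])
print(selected_formats(args.output_format))  # ['srt']
```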
zer0-x 9f7aba6099 Handle XDG_CACHE_HOME properly for download_root (#864)
Co-authored-by: Jong Wook Kim <jongwook@openai.com>
2023-01-21 01:09:39 -08:00
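A minimal sketch of resolving the download root from `XDG_CACHE_HOME`, assuming the conventional `~/.cache` fallback and a `whisper` subdirectory. `default_download_root` is a hypothetical name.

```python
import os


def default_download_root() -> str:
    """Directory where model checkpoints are cached.

    Uses $XDG_CACHE_HOME when it is set, otherwise falls back to ~/.cache,
    and keeps the files in a "whisper" subdirectory.
    """
    cache_home = os.getenv(
        "XDG_CACHE_HOME", os.path.join(os.path.expanduser("~"), ".cache")
    )
    return os.path.join(cache_home, "whisper")


print(default_download_root())
```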
Jong Wook Kim 12e1089462 use stdout for printing transcription progress (#867) 2023-01-20 00:54:05 -08:00
Markus Hennerbichler ea1c266709 Fix bug where mm is mistakenly replaced with hmm in e.g. 20mm (#659)
Co-authored-by: Jong Wook Kim <jongwook@nyu.edu>
2023-01-18 10:41:11 -08:00
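This is the classic unanchored-substring problem. The hypothetical sketch below is not the text normalizer's actual code. It shows how word boundaries (`\b`) keep `20mm` intact while still rewriting a standalone `mm`.

```python
import re

# Match "mm" only as a whole word, so the "mm" in "20mm" is not touched.
STANDALONE_MM = re.compile(r"\bmm\b")


def expand_mm(text: str) -> str:
    return STANDALONE_MM.sub("hmm", text)


print(re.sub("mm", "hmm", "20mm"))   # 20hmm  (unanchored pattern: the bug)
print(expand_mm("mm, that's 20mm"))  # hmm, that's 20mm
```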
Jong Wook Kim 8135a7c31c verbose outputs from pytest 2023-01-18 10:30:18 -08:00
Jong Wook Kim 9d646db9d8 print '?' if a letter can't be encoded using the system default encoding (#859) 2023-01-17 23:28:36 -08:00
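The `?` fallback can be sketched with Python's built-in `errors="replace"` codec handler, which substitutes `?` for each character the target encoding cannot represent. `make_safe` is a hypothetical name used only for illustration.

```python
def make_safe(text: str, encoding: str) -> str:
    """Round-trip text through `encoding`, turning unencodable characters into '?'."""
    return text.encode(encoding, errors="replace").decode(encoding)


print(make_safe("café", "ascii"))  # caf?
print(make_safe("café", "utf-8"))  # café
```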
Jong Wook Kim 37a4f1be6d Release 20230117 v20230117 2023-01-17 16:08:28 -08:00
Romain Beaumont b9f9b433ae Add github action to automatically push to pypi on Release x.y.z commit (#681)
* Add github action to automatically push to pypi on Release x.y.z commit

* some housekeeping for pypi upload

* add version.py

Co-authored-by: Jong Wook Kim <jongwook@nyu.edu>
2023-01-17 15:50:26 -08:00
Umar Farooqi f0083e7eb2 Use ndimage.median_filter instead of signal.medfilt (#812)
For a 30 s audio file without any silence, ndimage.median_filter took 7 s, whereas signal.medfilt took 30 s.

Co-authored-by: Umar Farooqi <umar@paystash.com>
Co-authored-by: Jong Wook Kim <jongwook@nyu.edu>
2023-01-17 14:43:05 -08:00
Jong Wook Kim a84191faae rename GitHub workflow 2023-01-17 13:54:40 -08:00
Jong Wook Kim b1d213c0c7 allow test_transcribe to run on CPU when CUDA is not available 2023-01-17 13:43:36 -08:00
Jong Wook Kim 493dfffa37 add github action to run pytest 2023-01-17 13:38:33 -08:00
Mikko Vedru 0f39c89d92 Update README.md (#804) 2023-01-16 23:46:42 -08:00
Markus Hennerbichler 6df3ea1fb5 Support batch-dimension in log_mel_spectrogram (#839) 2023-01-16 23:46:15 -08:00
adamreis 70861c7ce3 Fix tiny transcribe() docstring typo (#857)
s/successfully/successively, which I believe was the intent.
2023-01-16 22:42:01 -08:00
Jong Wook Kim f82bc59f5e torch.concatenate -> torch.cat for compatibility 2023-01-10 10:53:18 -08:00
Jong Wook Kim 28769fcfe5 word-level timestamps in Multilingual_ASR notebook 2022-12-31 10:03:42 -07:00
Jong Wook Kim 53807677fe MultiHeadAttention to return qk as well 2022-12-30 01:53:57 -07:00
Jong Wook Kim 9323b2526c Revert "saving the qk matrix in the attention module for convenience"
This reverts commit 68e44bd83c.
2022-12-29 23:53:31 -07:00
Jong Wook Kim 68e44bd83c saving the qk matrix in the attention module for convenience 2022-12-29 23:02:52 -07:00
Jong Wook Kim 0b5dcfdef7 large-v2 figure and arxiv url update 2022-12-09 00:12:39 -05:00
altryne b9265e5796 Update Hebrew language code to he per IANA registry (#401)
* Update Hebrew language code to he per IANA registry

Per the [IANA registry](https://www.iana.org/assignments/language-subtag-registry/language-subtag-registry), `iw` was deprecated as the code for Hebrew in 1989, and the preferred code is `he`.

The correct subtag:
```
%%
Type: language
Subtag: he
Description: Hebrew
Added: 2005-10-16
Suppress-Script: Hebr
%%
```
And the deprecated entry:
```
%%
Type: language
Subtag: iw
Description: Hebrew
Added: 2005-10-16
Deprecated: 1989-01-01
Preferred-Value: he
Suppress-Script: Hebr
%%
```

* Update Hebrew ISO code to he

Per discussion, it's OK to make this change without backward compatibility.
2022-12-07 13:45:31 -05:00
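Because the change was made without backward compatibility, callers that still pass `iw` may want to normalize codes first. The following is a hypothetical caller-side helper, not part of Whisper. It maps the deprecated subtag to its `Preferred-Value` from the registry entry quoted above.

```python
# Deprecated subtag -> Preferred-Value, per the IANA entry quoted above.
DEPRECATED_SUBTAGS = {"iw": "he"}


def preferred_language_code(code: str) -> str:
    """Lower-case a language code and replace a deprecated subtag with its preferred one."""
    code = code.strip().lower()
    return DEPRECATED_SUBTAGS.get(code, code)


print(preferred_language_code("iw"))  # he
```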
Paul Harter fd8f80c8b8 Explicitly closing model file after reading it (#630) 2022-12-06 12:07:19 -05:00
Jong Wook Kim 4179ed2475 add large-v2 model
- The "large-v2" model is trained for more epochs with regularization and shows improved performance compared to the previous large model.
- It has the same architecture as the original large model.
- When `load_model("large")` is called, the "large-v2" model will be loaded.
- We will soon update the paper regarding this new model.
2022-12-05 11:07:14 -05:00
jumon ec1b34bb90 fix compression ratio function (#561) 2022-12-04 17:27:42 -06:00
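For context, a compression ratio is a cheap signal for repetitive (looping) output: highly repetitive text compresses well and scores high. The commit title alone does not say what was changed. The following is a sketch of one such ratio, assuming it is measured on UTF-8 bytes with `zlib`.

```python
import zlib


def compression_ratio(text: str) -> float:
    """Ratio of the UTF-8 byte length to its zlib-compressed length."""
    text_bytes = text.encode("utf-8")
    return len(text_bytes) / len(zlib.compress(text_bytes))


print(compression_ratio("the " * 200))  # large: repetitive text
print(compression_ratio("The quick brown fox jumps over the lazy dog."))  # small
```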
Jong Wook Kim eff383b27b invoking __call__ instead of forward() 2022-11-16 04:18:50 -08:00
Jong Wook Kim 02aa851a49 fix to return only the text token ids 2022-11-15 16:25:11 -08:00
jumon 76148a56c5 suppress generating non-timestamp tokens at the beginning (#532) 2022-11-15 11:44:36 -08:00
Vicki Anand 9f70a352f9 Fix attention caching to make it actually work (#370) 2022-10-19 16:44:03 -07:00
Sumana Harihareswara 7f3e408e09 Add package metadata to setup.py (#315)
Add project summary, license, etc. for display with
"pip show" and similar Python package distribution tools.
2022-10-17 13:51:16 -07:00
Michael Monashev f680570016 Fix bug (#305)
Fix bug: RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper__index_select)
2022-10-17 11:38:20 -07:00
Jong Wook Kim d18e9ea5dd transcribe() on English-only model won't complain when language="en" is not given 2022-10-09 02:40:12 -07:00
David Marx 82725cea9c infer download_root from XDG_CACHE_HOME if avail (#257) 2022-10-09 02:14:03 -07:00
eudoxos 35713c66e0 Add --threads option to transcribe (#278)
* Add --threads option to transcribe

By default, Torch on CPU uses number_of_cores/2 threads. This option allows
overriding this default.

* Update transcribe.py

Co-authored-by: Jong Wook Kim <ilikekjw@gmail.com>
2022-10-09 02:11:15 -07:00
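The option can be wired up as sketched below. `torch.set_num_threads` is the real PyTorch call. The default of `0` (meaning "keep Torch's default"), the help text, and the helper names are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    # Assumption: 0 means "keep Torch's default thread count".
    parser.add_argument("--threads", type=int, default=0,
                        help="number of threads Torch uses for CPU inference (0 = default)")
    return parser


def apply_threads(threads: int) -> None:
    """Override Torch's intra-op CPU thread count when a positive value is given."""
    if threads > 0:
        import torch  # imported lazily so the parser works without Torch installed
        torch.set_num_threads(threads)


args = build_parser().parse_args(["--threads", "4"])
print(args.threads)  # 4
```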
Corentin Jemine 9e653bd0ea Fixed CoW RuntimeError in DecodingTask.run() (#240) 2022-10-04 08:49:31 -07:00
Tom Stuart 02b74308ff Fix timestamps and strip extraneous whitespace in WebVTT output (#219)
* Use two-digit hours in WebVTT timestamps

Per the WebVTT specification [0]:

> A WebVTT timestamp consists of the following components, in the given
> order:
>
> 1. Optionally (required if hours is non-zero):
>   1. Two or more ASCII digits, representing the hours as a base ten
>      integer.
>   2. A U+003A COLON character (:)

YouTube won’t accept timestamps containing single-digit hours.

[0] https://www.w3.org/TR/webvtt1/#webvtt-timestamp

* Strip segment text in WebVTT output

We already do this for plain text and SubRip output, so we should do it
for WebVTT too.
2022-10-03 14:51:07 -07:00
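Both WebVTT fixes can be sketched as follows. Hours are emitted only when non-zero but are always zero-padded to two digits, and cue text is stripped. The function names are hypothetical, and the real writer may differ.

```python
def format_vtt_timestamp(seconds: float) -> str:
    """Format seconds as a WebVTT timestamp: [hh:]mm:ss.ttt."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1_000)
    # Hours are optional when zero, but need at least two digits when present.
    hours_part = f"{hours:02d}:" if hours > 0 else ""
    return f"{hours_part}{minutes:02d}:{secs:02d}.{ms:03d}"


def format_vtt_cue(start: float, end: float, text: str) -> str:
    """One WebVTT cue, with surrounding whitespace stripped from the text."""
    return f"{format_vtt_timestamp(start)} --> {format_vtt_timestamp(end)}\n{text.strip()}\n"


print(format_vtt_timestamp(3725.5))  # 01:02:05.500
```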
Jibin Mathew 0b1ba3d46e Add model_dir to arguments (#202)
* Add model_dir to arguments

* minor formatting change

Co-authored-by: Jong Wook Kim <jongwook@openai.com>
2022-09-30 14:45:51 -07:00
Caleb McQuillin 60132ade70 Use , character instead of . for SRT output. (#197)
The SRT format uses the decimal comma character as the fractional separator rather than the decimal point character. Adjust format_timestamp and write_srt to specify the separator character.

See https://en.wikipedia.org/wiki/SubRip#:~:text=the%20fractional%20separator%20used%20is%20the%20comma%2C%20since%20the%20program%20was%20written%20in%20france.
2022-09-29 20:44:12 -07:00
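A sketch of SubRip output with the comma separator. SRT timestamps always include the hours, and cues are numbered from 1 and separated by a blank line. `format_srt_timestamp` and `format_srt` are hypothetical names, and segments are assumed to be `(start, end, text)` tuples.

```python
from typing import Iterable, Tuple


def format_srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp: hh:mm:ss,ttt (comma before milliseconds)."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1_000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def format_srt(segments: Iterable[Tuple[float, float, str]]) -> str:
    """Number the cues from 1 and separate them with blank lines."""
    blocks = []
    for i, (start, end, text) in enumerate(segments, start=1):
        blocks.append(
            f"{i}\n{format_srt_timestamp(start)} --> {format_srt_timestamp(end)}\n{text.strip()}\n"
        )
    return "\n".join(blocks)


print(format_srt_timestamp(3725.5))  # 01:02:05,500
```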
Jong Wook Kim 7cb4cc21bf allowing nonzero initial temperature 2022-09-29 18:05:12 -07:00
Jong Wook Kim 30dc5c581b pointer to the show and tell section 2022-09-29 14:57:49 -07:00
Szabolcs Pasztor 5905e503b8 Update README.md (#161)
* Update README.md

* merging paragraphs

Co-authored-by: Jong Wook Kim <jongwook@nyu.edu>
2022-09-29 14:18:54 -07:00
Fabiano 0457aac342 Adds missing command for install (mac) (#90)
* Adds missing command for install (mac)

Required for users who didn't previously have Rust installed.

* minor wording change

Co-authored-by: Jong Wook Kim <jongwook@nyu.edu>
2022-09-29 14:08:58 -07:00
sawadata deafef05f3 Update audio.py (#178)
add '-nostdin' argument
2022-09-29 12:34:04 -07:00
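`-nostdin` disables ffmpeg's interaction on standard input, which matters when ffmpeg runs as a child or background process. The following sketch calls ffmpeg through `subprocess` directly. It is not the code in audio.py, and the 16 kHz mono 16-bit PCM output format and the helper names are assumptions.

```python
import subprocess
from typing import List


def ffmpeg_decode_command(path: str, sample_rate: int = 16000) -> List[str]:
    """Build an ffmpeg command that decodes `path` to mono 16-bit PCM on stdout."""
    return [
        "ffmpeg",
        "-nostdin",              # do not interact with standard input
        "-i", path,
        "-f", "s16le",           # raw signed 16-bit little-endian samples
        "-ac", "1",              # mono
        "-acodec", "pcm_s16le",
        "-ar", str(sample_rate),
        "-",                     # write to stdout
    ]


def decode(path: str) -> bytes:
    """Run ffmpeg (must be on PATH) and return the raw PCM bytes."""
    return subprocess.run(ffmpeg_decode_command(path), capture_output=True, check=True).stdout
```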
Vicki Anand 2b0c2971af Don't update duration if last timestamp is same as begin (#191) 2022-09-29 12:27:48 -07:00
Jong Wook Kim 62fe7f1009 patience definition to match the paper 2022-09-27 19:00:41 -07:00
Nick Konovalchuk b4308c4782 fix: transcribe verbosity (#140) 2022-09-26 11:46:21 -07:00
Michael Goin 9c8183a179 Use PyTorch as logits transpose for ONNX support (#141) 2022-09-26 10:54:26 -07:00