Whisper (OpenAI) - Automatic English Subtitles for Any Film in Any Language - An Intro & Guide to Subtitling JAV

Besh

Well-Known Member
Feb 2, 2018
340
779
To my knowledge there are no models specifically trained for JAV -- someone earlier in this thread suggested embarking on that. In general, for Japanese you need a large model. The medium model does a decent job of giving you a sense of the topic/dialogue, but that's it.

@mei2

Is this a new thing? Never received that warning before:

/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_token.py:72: UserWarning: The secret `HF_TOKEN` does not exist in your Colab secrets. To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session. You will be able to reuse this secret in all of your notebooks.

config.json reaches 100% but stops there, nothing after. I never had to use an HF token with your Colab before.

Version used is v0_6i
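
In case it matters, I think what the warning is asking for would look roughly like this (untested on my end; the secret name comes from the warning itself, and the explicit login call is my guess at what your notebook expects):

# Rough sketch only: create a Colab secret named HF_TOKEN (key icon in the left
# sidebar), grant this notebook access, then read it and log in to the Hub.
from google.colab import userdata        # Colab helper for reading notebook secrets
from huggingface_hub import login

hf_token = userdata.get("HF_TOKEN")      # fails if the secret is missing or not shared
login(token=hf_token)                    # authenticates downloads from the Hugging Face Hub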


-Besh
 
Last edited:

mei2

Well-Known Member
Dec 6, 2018
247
407
Is this a new thing? Never received that warning before:

/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_token.py:72: UserWarning: The secret `HF_TOKEN` does not exist in your Colab secrets. To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session. You will be able to reuse this secret in all of your notebooks.

config.json reaches 100% but stops there, nothing after. I never had to use an HF token with your Colab before.

Version used is v0_6i


-Besh

Yes, I just received an issue report on GitHub too. It seems that something got broken in the Colab environment. Interestingly, Google hasn't announced any official upgrades to Colab since 18 December, but it seems they have changed the security policy and made some other updates.

I haven't been able to find a workaround for it yet. Will still dig into it.

cc: @dickie389389 , @bobe123 , @MrKid
 
  • Like
Reactions: MrKid and Besh

mei2

Well-Known Member
Dec 6, 2018
247
407
cc: @dickie389389 , @bobe123 , @MrKid , @Besh

Here is a quick workaround: version 1.0 (beta)



The root of the error seems to be a code break caused by the recent Colab upgrade: mayankmalik-colab.
In this beta release I have replaced faster-whisper with the stable-ts package. For now, this release does not use the quantised model for faster speed. We have to wait until the root cause is fixed to go back to that.

I have tested this release only on a handful of test audio files. If you come across any issues, please let me know.
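
For anyone curious what the stable-ts path looks like outside the notebook, here is a minimal sketch (the file names are placeholders, and the actual notebook sets more options than this):

# Minimal stable-ts sketch: transcribe a Japanese audio file and write an SRT.
# "audio.wav" / "audio.srt" are placeholders; the real notebook applies more settings.
import stable_whisper

model = stable_whisper.load_model("large-v2")          # regular Whisper weights, no quantisation
result = model.transcribe("audio.wav", language="ja")  # result carries segment/word timings
result.to_srt_vtt("audio.srt", word_level=False)       # write segment-level subtitles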
 
  • Like
Reactions: Besh and MrKid

mei2

Well-Known Member
Dec 6, 2018
247
407
Expanding on my previous post, for those who prefer faster-whisper (2x speed):

There is a workaround to run 0.6i (faster-whisper). The workaround is not the smoothest, but it does the job. Follow the guideline here:


Steps:
1. Run the cells (or all cells) once.
2. Select 'tools' (in the menu bar above) --> 'Command palette'.
3. Select 'use fallback runtime version'.
4. Run the cells (or all cells) again.

Note: the fallback environment is supposed to be a temporary solution. The Google rep in the Colab community says that they're doing their best to keep the environment up longer than usual, but eventually they will need to shut it down. I hope by then we will have a robust solution.
 

tguy1982

New Member
Dec 13, 2008
4
2
Last edited:

mei2

Well-Known Member
Dec 6, 2018
247
407
Any chance you will add the V3 model?

My assessment so far:
large-v3: do NOT use. Wait for the next update.
large-v2: use this one for general purposes.
large-v1: use this one for cross-talk and multiple speakers.

I'd wait until the Whisper main branch is updated with the necessary fixes. There are a few fixes suggested to reduce V3's wild hallucinations, but they're still not ready for prime time. Even the most robust fixes suggested so far only make V3's results equal to V2's in quality. So no gain.

It's a pity that V3 has been such a big disappointment. OpenAI estimated that V3 would improve Japanese WER by 18%, but I guess they failed to measure that hallucination got worse by 200% :)
 

tguy1982

New Member
Dec 13, 2008
4
2
My assessment so far:
large-v3: do NOT use. Wait for the next update.
large-v2: use this one for general purposes.
large-v1: use this one for cross-talk and multiple speakers.

I'd wait until the Whisper main branch is updated with the necessary fixes. There are a few fixes suggested to reduce V3's wild hallucinations, but they're still not ready for prime time. Even the most robust fixes suggested so far only make V3's results equal to V2's in quality. So no gain.

It's a pity that V3 has been such a big disappointment. OpenAI estimated that V3 would improve Japanese WER by 18%, but I guess they failed to measure that hallucination got worse by 200% :)
OK, thanks, much appreciated.

Another question: what does "chunk_duration" do? The default is 4; what does increasing or decreasing it do?

Also, "I'm going to sleep." is a phrase that shows up a lot in error.
 
Last edited:

porgate55555

Active Member
Jul 24, 2021
52
163
Does anybody else experience timing issues fairly often when using WhisperWithVAD (sometimes up to ~30 sec)?
 

porgate55555

Active Member
Jul 24, 2021
52
163
Are you using the large-v2 model? And not large-v3 (or, of course, plain "large")?
Yes, large-v2. I was also using large-v1 and large-v3, but they all have the same flaw. There seems to be no apparent reason: sometimes it is perfectly fine, but then completely out of sync. Even retrying does not solve the issue.
 

panop857

Active Member
Sep 11, 2011
171
247
My assessment so far:
large-v3: do NOT use. Wait for the next update.
large-v2: use this one for general purposes.
large-v1: use this one for cross-talk and multiple speakers.

I'd wait until the Whisper main branch is updated with the necessary fixes. There are a few fixes suggested to reduce V3's wild hallucinations, but they're still not ready for prime time. Even the most robust fixes suggested so far only make V3's results equal to V2's in quality. So no gain.

It's a pity that V3 has been such a big disappointment. OpenAI estimated that V3 would improve Japanese WER by 18%, but I guess they failed to measure that hallucination got worse by 200% :)

V3 requires very different settings than V2 does, so I think the VAD should be updated or tweaked to account for the new settings. Something like the calibration being different (a 0.5 threshold on one model corresponding to 0.55 on the other) would lead to large differences in hallucinations.

I think it is useful to do a basic run that mostly uses the whisper-ctranslate2 / faster-whisper defaults, and then do a second run with a lower threshold, a reduced length penalty, and a higher repetition penalty, as in the sketch below.
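
Roughly what I mean, as a faster-whisper sketch (the numbers are placeholders, not a tested recipe):

# Illustrative two-pass idea with faster-whisper: a near-default first run, then a
# second run with a lower VAD threshold, reduced length penalty and higher
# repetition penalty. All values are placeholders.
from faster_whisper import WhisperModel

model = WhisperModel("large-v3", device="cuda", compute_type="float16")

# Pass 1: close to the defaults.
segments_default, _ = model.transcribe("audio.wav", language="ja", vad_filter=True)

# Pass 2: tweaked decoding, for comparison against pass 1.
segments_tweaked, _ = model.transcribe(
    "audio.wav",
    language="ja",
    vad_filter=True,
    vad_parameters={"threshold": 0.35},  # lower VAD threshold
    length_penalty=0.8,                  # reduced length penalty
    repetition_penalty=1.3,              # higher repetition penalty
)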
 

panop857

Active Member
Sep 11, 2011
171
247
--repetition_penalty, at least in the CTranslate2 faster-whisper, deals well with large-v3 hallucinations. Setting it high, to something like 1.5, handles most of the regular hallucinations where phantom lines are repeated over and over, but there will still be some hallucinations. I also turn the length_penalty down to 0.7, but that is less crucial.

I'm having good luck with setting the vad_threshold low (like 0.2 or 0.3) but using a high --patience or --best_of.

High patience (like 15 or 20) seems to help with timings as well. Some of the Slave Color entries I did not spend a lot of time editing before get much better first drafts with large-v3, and the file is smaller with fewer lines. It misses less and has a much better basic interpretation of what is being said in a lot of the series' verbal torments.
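
For reference, those settings expressed as faster-whisper transcribe options (these are the starting points I use for large-v3, not a recipe; the CLI flags map onto the same names):

# The settings discussed above as faster-whisper options; values are starting
# points for large-v3, not a recipe.
from faster_whisper import WhisperModel

model = WhisperModel("large-v3", device="cuda", compute_type="float16")
segments, info = model.transcribe(
    "audio.wav",
    language="ja",
    repetition_penalty=1.5,              # suppresses the repeated phantom lines
    length_penalty=0.7,                  # less crucial
    beam_size=5,
    patience=15,                         # high patience seems to help timings
    vad_filter=True,
    vad_parameters={"threshold": 0.2},   # low VAD threshold
)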
 
Last edited:

pspo

Member
Nov 24, 2021
17
62
Has anyone compared the quality of subs produced by Google Live Captions (beta) with Whisper?
 

triyandafil

Member
Dec 19, 2020
61
77
Guys, I'm having this problem. Does anyone know the reason?


RuntimeError: CUDA failed with error CUDA driver version is insufficient for CUDA runtime version

I'm also getting another error while trying to connect to a GPU. Is that related to membership? I thought it was free:
Unable to connect to GPU backend
You cannot connect to the GPU at this time due to usage limits in Colab. Get information
For more GPU access, you can purchase a Colab processing unit with Pay As You Go.
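
A quick sanity check I can run in a Colab cell (just a debugging sketch) to see whether a GPU runtime is attached and which CUDA versions are reported:

# Quick Colab sanity check: is a GPU runtime attached, and which CUDA versions do
# the driver and the installed PyTorch report? (Debugging sketch only.)
import subprocess
import torch

try:
    print(subprocess.run(["nvidia-smi"], capture_output=True, text=True).stdout)
except FileNotFoundError:
    print("nvidia-smi not found -- no GPU runtime attached")

print("torch sees CUDA:", torch.cuda.is_available())
print("torch built against CUDA:", torch.version.cuda)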
 

javerotikaz

Active Member
Aug 19, 2008
278
163
I find that with Subtitle Edit you can do this much more easily with a GUI. No need for any fancy installation, Python, etc. It has an audio-to-text option where you can use different models and Whisper engines.
 

flamy1

New Member
Apr 18, 2022
27
20
I find that with Subtitle Edit you can do this much more easily with a GUI. No need for any fancy installation, Python, etc. It has an audio-to-text option where you can use different models and Whisper engines.
I like Subtitle Edit, but the quality of the subs is a problem for me. Which model are you using?