Whisper (OpenAI) - Automatic English Subtitles for Any Film in Any Language - An Intro & Guide to Subtitling JAV

Besh

Well-Known Member
Feb 2, 2018
340
779
To my knowledge there are no models specifically trained for JAV -- someone earlier in this thread suggested embarking on that. In general, for Japanese you need a large model. The medium model does a decent job of giving you a sense of the topic/dialogue, but that's it.

@mei2

Is this a new thing? Never received that warning before:

/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_token.py:72: UserWarning: The secret `HF_TOKEN` does not exist in your Colab secrets. To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session. You will be able to reuse this secret in all of your notebooks.

config.json reaches 100% but stops there, nothing after. I never had to use an HF token with your Colab before.

Version used is v0_6i
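
In case it matters, I think what the warning is asking for would look roughly like this (untested on my end; the secret name comes from the warning itself, and the explicit login call is my guess at what your notebook expects):

# Rough sketch only: create a Colab secret named HF_TOKEN (key icon in the left
# sidebar), grant this notebook access, then read it and log in to the Hub.
from google.colab import userdata        # Colab helper for reading notebook secrets
from huggingface_hub import login

hf_token = userdata.get("HF_TOKEN")      # fails if the secret is missing or not shared
login(token=hf_token)                    # authenticates downloads from the Hugging Face Hub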


-Besh
 
Last edited:

mei2

Well-Known Member
Dec 6, 2018
247
407
Is this a new thing? Never received that warning before:

/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_token.py:72: UserWarning: The secret `HF_TOKEN` does not exist in your Colab secrets. To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session. You will be able to reuse this secret in all of your notebooks.

config.json reaches 100% but stops there, nothing after. I never had to use an HF token with your Colab before.

Version used is v0_6i


-Besh

Yes, I just received an issue report on GitHub too. It seems that something got broken in the Colab environment. Interestingly, Google hasn't announced any official upgrades to Colab since 18 December, but it seems they have changed the security policy and made some other updates.

I haven't been able to find a workaround for it yet. Will still dig into it.

cc: @dickie389389 , @bobe123 , @MrKid
 
  • Like
Reactions: MrKid and Besh

mei2

Well-Known Member
Dec 6, 2018
247
407
cc: @dickie389389 , @bobe123 , @MrKid , @Besh

Here is a quick workaround: version 1.0 (beta)



The root of the error seems to be a code break caused by the recent Colab upgrade: mayankmalik-colab.
In this beta release I have replaced faster-whisper with the stable-ts package. For now, this release does not use the quantised model for faster speed. We have to wait until the root cause is fixed to go back to that.

I have tested this release only on a handful of test audio files. If you come across any issues, please let me know.
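
For anyone curious what the stable-ts path looks like outside the notebook, here is a minimal sketch (the file names are placeholders, and the actual notebook sets more options than this):

# Minimal stable-ts sketch: transcribe a Japanese audio file and write an SRT.
# "audio.wav" / "audio.srt" are placeholders; the real notebook applies more settings.
import stable_whisper

model = stable_whisper.load_model("large-v2")          # regular Whisper weights, no quantisation
result = model.transcribe("audio.wav", language="ja")  # result carries segment/word timings
result.to_srt_vtt("audio.srt", word_level=False)       # write segment-level subtitles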
 
  • Like
Reactions: Besh and MrKid

mei2

Well-Known Member
Dec 6, 2018
247
407
Expanding on my previous post, for those who prefer faster-whisper (2x speed):

There is a workaround to run 0.6i (faster-whisper). The workaround is not the smoothest, but it does the job. Follow the guideline here:


Steps:
1. Run the cells (or all cells) once.
2. Select 'tools' (in the menu bar above) --> 'Command palette'.
3. Select 'use fallback runtime version'.
4. Run the cells (or all cells) again.

Note: the fallback environment is supposed to be a temporary solution. The Google rep in the Colab community says that they're doing their best to keep the environment up longer than usual, but eventually they will need to shut it down. I hope by then we will have a robust solution.
 

tguy1982

New Member
Dec 13, 2008
4
2
Last edited:

mei2

Well-Known Member
Dec 6, 2018
247
407
Any chance you will add the V3 model?

My assessment so far:
large-v3: do NOT use. Wait for the next update.
large-v2: use this one for general purposes.
large-v1: use this one for cross-talk and multiple speakers.

I'd wait until the Whisper main branch is updated with the necessary fixes. There are a few fixes suggested to reduce V3's wild hallucinations, but they're still not ready for prime time. Even the most robust fixes suggested so far only make V3's results equal to V2's in quality. So no gain.

It's a pity that V3 has been such a big disappointment. OpenAI estimated that V3 would improve Japanese WER by 18%, but I guess they failed to measure that hallucination got worse by 200% :)
 

tguy1982

New Member
Dec 13, 2008
4
2
My assessment so far:
large-v3: do NOT use. Wait for the next update.
large-v2: use this one for general purposes.
large-v1: use this one for cross-talk and multiple speakers.

I'd wait until the Whisper main branch is updated with the necessary fixes. There are a few fixes suggested to reduce V3's wild hallucinations, but they're still not ready for prime time. Even the most robust fixes suggested so far only make V3's results equal to V2's in quality. So no gain.

It's a pity that V3 has been such a big disappointment. OpenAI estimated that V3 would improve Japanese WER by 18%, but I guess they failed to measure that hallucination got worse by 200% :)
OK, thanks, much appreciated.

Another question: what does "chunk_duration" do? The default is 4; what does increasing or decreasing it do?

Also, "I'm going to sleep." is a phrase that shows up a lot in error.
 
Last edited:

porgate55555

Active Member
Jul 24, 2021
52
163
Does anybody else experience timing issues fairly often when using WhisperWithVAD (sometimes up to ~30 sec)?
 

porgate55555

Active Member
Jul 24, 2021
52
163
Are you using the large-v2 model? And not large-v3 (or, of course, plain "large")?
Yes, large-v2. I was also using large-v1 and large-v3, but they all have the same flaw. There seems to be no apparent reason: sometimes it is perfectly fine, but then completely out of sync. Even retrying does not solve the issue.
 

panop857

Active Member
Sep 11, 2011
171
247
My assessment so far:
large-v3: do NOT use. Wait for the next update.
large-v2: use this one for general purposes.
large-v1: use this one for cross-talk and multiple speakers.

I'd wait until the Whisper main branch is updated with the necessary fixes. There are a few fixes suggested to reduce V3's wild hallucinations, but they're still not ready for prime time. Even the most robust fixes suggested so far only make V3's results equal to V2's in quality. So no gain.

It's a pity that V3 has been such a big disappointment. OpenAI estimated that V3 would improve Japanese WER by 18%, but I guess they failed to measure that hallucination got worse by 200% :)

V3 requires very different settings than V2 does, so I think the VAD should be updated or tweaked to account for the new settings. Something like the calibration being different (a 0.5 threshold on one model corresponding to 0.55 on the other) would lead to large differences in hallucinations.

I think it is useful to do a basic run that mostly uses the whisper-ctranslate2 / faster-whisper defaults, and then do a second run with a lower threshold, a reduced length penalty, and a higher repetition penalty, as in the sketch below.
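
Roughly what I mean, as a faster-whisper sketch (the numbers are placeholders, not a tested recipe):

# Illustrative two-pass idea with faster-whisper: a near-default first run, then a
# second run with a lower VAD threshold, reduced length penalty and higher
# repetition penalty. All values are placeholders.
from faster_whisper import WhisperModel

model = WhisperModel("large-v3", device="cuda", compute_type="float16")

# Pass 1: close to the defaults.
segments_default, _ = model.transcribe("audio.wav", language="ja", vad_filter=True)

# Pass 2: tweaked decoding, for comparison against pass 1.
segments_tweaked, _ = model.transcribe(
    "audio.wav",
    language="ja",
    vad_filter=True,
    vad_parameters={"threshold": 0.35},  # lower VAD threshold
    length_penalty=0.8,                  # reduced length penalty
    repetition_penalty=1.3,              # higher repetition penalty
)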
 

panop857

Active Member
Sep 11, 2011
171
247
--repetition_penalty, at least in the CTranslate2 faster-whisper, deals well with large-v3 hallucinations. Setting it high, to something like 1.5, handles most of the regular hallucinations where phantom lines are repeated over and over, but there will still be some hallucinations. I also turn the length_penalty down to 0.7, but that is less crucial.

I'm having good luck with setting the vad_threshold low (like 0.2 or 0.3) but using a high --patience or --best_of.

High patience (like 15 or 20) seems to help with timings as well. Some of the Slave Color entries I did not spend a lot of time editing before get much better first drafts with large-v3, and the file is smaller with fewer lines. It misses less and has a much better basic interpretation of what is being said in a lot of the series' verbal torments.
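
For reference, those settings expressed as faster-whisper transcribe options (these are the starting points I use for large-v3, not a recipe; the CLI flags map onto the same names):

# The settings discussed above as faster-whisper options; values are starting
# points for large-v3, not a recipe.
from faster_whisper import WhisperModel

model = WhisperModel("large-v3", device="cuda", compute_type="float16")
segments, info = model.transcribe(
    "audio.wav",
    language="ja",
    repetition_penalty=1.5,              # suppresses the repeated phantom lines
    length_penalty=0.7,                  # less crucial
    beam_size=5,
    patience=15,                         # high patience seems to help timings
    vad_filter=True,
    vad_parameters={"threshold": 0.2},   # low VAD threshold
)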
 
Last edited:

pspo

Member
Nov 24, 2021
17
62
Has anyone compared the quality of subs produced by Google Live Captions (beta) with Whisper?
 

triyandafil

Member
Dec 19, 2020
61
77
Guys, I'm having this problem. Does anyone know the reason?


RuntimeError: CUDA failed with error CUDA driver version is insufficient for CUDA runtime version

I'm also getting another error while trying to connect to a GPU. Is that related to membership? I thought it was free:
Unable to connect to GPU backend
You cannot connect to the GPU at this time due to usage limits in Colab. Get information
For more GPU access, you can purchase a Colab processing unit with Pay As You Go.
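
A quick sanity check I can run in a Colab cell (just a debugging sketch) to see whether a GPU runtime is attached and which CUDA versions are reported:

# Quick Colab sanity check: is a GPU runtime attached, and which CUDA versions do
# the driver and the installed PyTorch report? (Debugging sketch only.)
import subprocess
import torch

try:
    print(subprocess.run(["nvidia-smi"], capture_output=True, text=True).stdout)
except FileNotFoundError:
    print("nvidia-smi not found -- no GPU runtime attached")

print("torch sees CUDA:", torch.cuda.is_available())
print("torch built against CUDA:", torch.version.cuda)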
 

javerotikaz

Active Member
Aug 19, 2008
278
163
I find that with Subtitle Edit you can do this much more easily with a GUI. No need for any fancy installation, Python, etc. It has an audio-to-text option where you can use different models and Whisper engines.
 

flamy1

New Member
Apr 18, 2022
27
20
I find that with Subtitle Edit you can do this much more easily with a GUI. No need for any fancy installation, Python, etc. It has an audio-to-text option where you can use different models and Whisper engines.
I like Subtitle Edit, but the quality of the subs is a problem for me. Which model are you using?