Hi all,
Nice to have found this thread. I've been using an app called Buzz for this workflow.
https://github.com/chidiwilliams/buzz
If you scroll down, you can see links to the installers for various platforms.
Once installed, you just drag and drop the video file onto its window, and you get a popup of options to select:
- Model: Whisper, Whisper.cpp, Hugging Face, Faster Whisper or OpenAI
- Whisper model sizes include: Tiny, Base, Small, Medium, Large, LargeV2, LargeV3 and LargeV3-Turbo
- Task: Translate or Transcribe
- Language: lots of languages here, including Japanese and Javanese
- Advanced options: the speech recognition temperature, prompts, and enabling AI translation (model and instructions)
It will then export as TXT, SRT or VTT, with or without word-level timings, into the folder where the source video or audio file is. Processing time obviously depends on the length of the movie, but from my history I can see runs of 31m, 23m, 13m, 9m, 11m, 14m, 34m and 15m, to mention a few examples.
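For anyone who would rather script the same workflow instead of using the GUI, I believe Buzz's "Faster Whisper" option is the faster-whisper Python package, so something like the rough sketch below should behave similarly. This is just my assumption of how it maps to the options above; the file names are placeholders and I haven't checked Buzz's exact defaults for temperature or prompts.

```python
# Rough sketch: transcribe a video and write an SRT next to it using
# faster-whisper (pip install faster-whisper). File names are placeholders.
from pathlib import Path
from faster_whisper import WhisperModel

source = Path("movie.mp4")
model = WhisperModel("large-v2")  # same size names as in Buzz's model list

segments, info = model.transcribe(
    str(source),
    task="transcribe",      # or "translate"
    language="ja",          # leave as None to auto-detect
    temperature=0.0,        # the "speech recognition temperature" option
    initial_prompt=None,    # the "prompt" option
    word_timestamps=False,  # set True for word-level timings
)

def srt_time(t: float) -> str:
    # Format seconds as an SRT timestamp, e.g. 00:01:02,345
    h, rem = divmod(t, 3600)
    m, s = divmod(rem, 60)
    return f"{int(h):02}:{int(m):02}:{int(s):02},{int((s % 1) * 1000):03}"

with source.with_suffix(".srt").open("w", encoding="utf-8") as f:
    for i, seg in enumerate(segments, start=1):
        f.write(f"{i}\n{srt_time(seg.start)} --> {srt_time(seg.end)}\n{seg.text.strip()}\n\n")
```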
It sounds like the results are in line with what has been reported here: hit and miss sometimes, duplication, periods of nothing (which I assume is down to the quality of the source material). I've tried extracting the audio and running it through an AI to strip out everything but the vocals (I used DJ Studio for this), but it didn't seem to affect the results much.
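If anyone wants to do the audio-extraction step without a separate app, something like this Python call to ffmpeg should work (assuming ffmpeg is installed and on the PATH; file names are placeholders):

```python
import subprocess

# Pull just the audio track out of the video as 16 kHz mono WAV, which is the
# format Whisper models resample to anyway. "movie.mp4"/"movie.wav" are placeholders.
subprocess.run(
    ["ffmpeg", "-i", "movie.mp4", "-vn", "-ac", "1", "-ar", "16000", "movie.wav"],
    check=True,
)
```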
Also, I'm thinking there may be some NSFW filtering going on, as the text sometimes says "sorry" or "not sure I can do this"... I don't think it's the scenes every time!! As it's on GitHub, I may take a look at the source and see if that can be turned off.
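Before digging into the source, it may be worth saying that those stray "sorry" type lines could just be a known Whisper quirk: the model tends to hallucinate filler phrases over silence or music, so it might not be a content filter at all. If you run the transcription through faster-whisper directly (as in the sketch above), the options below are the usual ones people try to cut down on duplication and phantom lines; I haven't checked whether Buzz exposes them in its UI.

```python
# Same model.transcribe() call as above, with options that tend to reduce
# repeated lines and hallucinated text over silence. These are my own
# assumptions, not verified against Buzz's settings.
segments, info = model.transcribe(
    "movie.wav",
    condition_on_previous_text=False,  # stops one bad segment poisoning the next
    vad_filter=True,                   # skip long silent stretches entirely
)
```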
So, as with the current approach, the end file still requires some work. However, the results are good enough for me to understand what is being said. From a few bits of testing, I think the LargeV2 model gives the best results.
I'm not affiliated in any way. Just thought I'd mention it in case it makes life easier.
Cheers.