Whisper and its many forms

I just tested one SRT translation by redirecting SubtitleEdit to use DeepSeek V3 endpoints. The result looks better than DeepL. However, it was rather slow. Might that be because SubtitleEdit is not using optimised calls, or should I expect DeepSeek API calls to be slow (compared to DeepL)?
 
I just tested one SRT translation by redirecting SubtitleEdit to use DeepSeek V3 endpoints. The result looks better than DeepL. However, it was rather slow. Might that be because SubtitleEdit is not using optimised calls, or should I expect DeepSeek API calls to be slow (compared to DeepL)?
Hmm. This sounds interesting. I'd like to have a play.

Can I assume you redirected to the DeepSeek endpoint by changing the ChatGPT Auto-translate settings in SubtitleEdit?

But how did you get the DeepSeek model as an option available to select?
 
I played around with DeepSeek and VSM and have no idea how it works for you. For me, half the subtitle is not translated and the other half is empty.
Therefore, I wrote my own little script which does the exact same thing. I also tried one which works with chunks instead of each line separately. This massively improves the translation quality (due to context) and speed. But for some reason, it sometimes bugs out and does not translate lines or whole chunks. Retries don't work either. I haven't figured out what the issue is. If anyone has an idea how to solve it, I would be very interested.

The script works as follows (both versions; the chunk one is bugged as mentioned):

- pip3 install openai
- enter your API Key
- specify the path where all your subtitles are
- adjust the prompt if you are not happy with the one I provided
- run it
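For anyone who wants to see the general shape of such a script before opening the attachments, here is a minimal line-by-line sketch. The endpoint URL, model name ("deepseek-chat"), prompt and folder path are placeholders of mine, not taken from the attached scripts, so adjust them to your own setup.

```python
# Minimal sketch of a line-by-line SRT translation script (not the attached one).
# Assumes DeepSeek's OpenAI-compatible endpoint and the "deepseek-chat" model.
from pathlib import Path
from openai import OpenAI

client = OpenAI(api_key="YOUR_API_KEY", base_url="https://api.deepseek.com")

PROMPT = "Translate this subtitle line into English. Reply with the translation only."
SUBTITLE_DIR = Path("subs")  # folder containing your .srt files

def translate_line(text: str) -> str:
    resp = client.chat.completions.create(
        model="deepseek-chat",
        messages=[
            {"role": "system", "content": PROMPT},
            {"role": "user", "content": text},
        ],
    )
    return resp.choices[0].message.content.strip()

for srt in SUBTITLE_DIR.glob("*.srt"):
    out_lines = []
    for line in srt.read_text(encoding="utf-8").splitlines():
        # Keep index numbers, timestamps and blank lines as-is; translate only dialogue.
        if not line.strip() or line.strip().isdigit() or "-->" in line:
            out_lines.append(line)
        else:
            out_lines.append(translate_line(line))
    srt.with_suffix(".translated.srt").write_text("\n".join(out_lines), encoding="utf-8")
```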
 


I just tested one SRT translation by redirecting SubtitleEdit to use DeepSeek V3 endpoints. The result looks better than DeepL. However, it was rather slow. Might that be because SubtitleEdit is not using optimised calls, or should I expect DeepSeek API calls to be slow (compared to DeepL)?
They are really slow. One subtitle line generally takes about 1.5 s.
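For a rough sense of scale, at about 1.5 s per line a 1,000-line subtitle works out to roughly 1,000 × 1.5 s = 1,500 s, i.e. around 25 minutes per file when translating line by line.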
 
I played around with DeepSeek and VSM and have no idea how it works for you. For me, half the subtitle is not translated and the other half is empty.
Therefore, I wrote my own little script which does the exact same thing. I also tried one which works with chunks instead of each line separately.

I've found this repo to do a reasonable job for GPT:

I haven't tested it yet with DeepSeek; the endpoints and models are in the config file.
 
I've found this repo to do a reasonable job for GPT:

I haven't tested it yet with DeepSeek; the endpoints and models are in the config file.
I need to try it, but DeepSeek usually has an issue with sending the whole file at once; the token limit is reached pretty quickly.
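One way to stay under that limit (just a sketch; the chunk size of 50 lines is an arbitrary guess, not something from the repo) is to split the dialogue into fixed-size batches and send each batch as its own request:

```python
# Sketch: batch the dialogue lines so each request stays under the token limit.
# The chunk size of 50 is arbitrary; tune it to your model's context window.
def chunk_lines(lines, chunk_size=50):
    for i in range(0, len(lines), chunk_size):
        yield lines[i:i + chunk_size]

# Example: number the lines inside each chunk so the reply can be mapped back
# onto the original subtitle entries.
# for chunk in chunk_lines(dialogue_lines):
#     numbered = "\n".join(f"{n}. {t}" for n, t in enumerate(chunk, 1))
#     ... send `numbered` in a single API call ...
```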
 
I played around with DeepSeek and VSM and have no idea how it works for you. For me, half the subtitle is not translated and the other half is empty.
Therefore, I wrote my own little script which does the exact same thing.
I queued four files to translate with VSM yesterday. Two failed and the other two were much slower than previous times.
Maybe there is or was an issue at DeepSeek rather than at our end?

And it's great you've written your scripts. I'm going to try them.
I was planning to do the same myself but hadn't got around to it. A simple script is much more elegant than running an application just to send API calls.
I'll let you know if I have the same issue with the chunks version.
 
But for some reason, it sometimes bugs out and does not translate lines or whole chunks. Retries don't work either. I haven't figured out what the issue is. If anyone has an idea how to solve it, I would be very interested.
I’ve tried the chunks version of your script, just once so far, and also got untranslated lines and sections.

Not sure if it’ll help you solve the issue but I noticed that most of the untranslated parts in my test come immediately after an extremely long line (like a hallucination line with very many repeated characters or words).
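If that is what is happening, one possible mitigation (untested, just an idea; the 200-character cutoff is arbitrary) would be to drop or shorten obviously hallucinated lines before they go into a chunk:

```python
MAX_CHARS = 200  # arbitrary cutoff; Whisper hallucination lines are often far longer

def looks_hallucinated(text: str) -> bool:
    # Flag very long lines, or lines that are mostly one or two repeated words.
    if len(text) > MAX_CHARS:
        return True
    words = text.split()
    return len(words) > 10 and len(set(words)) <= 2

def sanitize(lines):
    # Replace suspect lines with a short placeholder so they can't derail the chunk.
    return ["..." if looks_hallucinated(line) else line for line in lines]
```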

Apart from that bug, I believe you're correct: working with chunks seems to greatly improve the translation.

Well done you!!!
 
I played around with deepseek and VSM and have no idea how it works for you. For me, half the subtitle is not translated and the other half is empty.
Therefore, I wrote my own little script which does the exact same thing. I also tried one which works with chunks instead of each line seperately. This massively improves the translation quality (due to context) and speed. But for some reason, it bugs out sometimes and does not translate lines or whole chunks. Retries don't work either. Haven't figured out what the issue is. If anyone has an idea how to solve it, I would be very interested.
I have modified the chunks version of your script so it translates any untranslated lines, one by one, after the translation by chunks. This is obviously not a real fix, just a workaround, and those line-by-line translations can take quite a while.

The modified script also adds some ease-of-use changes.

Do you mind if I post my modification here?
 
I have modified the chunks version of your script so it translates any untranslated lines, one by one, after the translation by chunks. This is obviously not a real fix, just a workaround, and those line-by-line translations can take quite a while.

The modified script also adds some ease-of-use changes.

Do you mind if I post my modification here?
Nice, I did the same.

Sure, please feel free.
 

Oh man, looks like VLC is gonna implement Whisper for live translations... Freakin' sweet!!!

I wonder if it's the existing Whisper models or if they will have a new version launching alongside this. Gonna be so easy going forward!
 
I played around with DeepSeek and VSM and have no idea how it works for you. For me, half the subtitle is not translated and the other half is empty.
Therefore, I wrote my own little script which does the exact same thing. I also tried one which works with chunks instead of each line separately. This massively improves the translation quality (due to context) and speed. But for some reason, it sometimes bugs out and does not translate lines or whole chunks. Retries don't work either. I haven't figured out what the issue is. If anyone has an idea how to solve it, I would be very interested.

The script works as follows (both versions; the chunk one is bugged as mentioned):

- pip3 install openai
- enter your API Key
- specify the path where all your subtitles are
- adjust the prompt if you are not happy with the one I provided
- run it
Here is a modified version of porgate55555's translation-by-chunks script. It provides a crude workaround for the untranslated-lines issue by re-translating missed lines one by one. The one-by-one translation can take quite a while.

pip install openai python-dotenv tqdm

Add your API key to the .env file and put it in the same folder as the script.

Change the prompt if you want.

Run the script. Translated files will be saved in a "Translated" subfolder of the input folder.
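For anyone who wants to see roughly how the pieces fit together before downloading, here is a sketch of the .env loading and the one-by-one fallback pass. The variable name DEEPSEEK_API_KEY and the helper names are my own placeholders; check the attached script for the actual details.

```python
# Sketch of the setup and fallback pass described above; names such as
# DEEPSEEK_API_KEY and translate_one_line are illustrative placeholders.
import os
from dotenv import load_dotenv  # pip install python-dotenv
from openai import OpenAI       # pip install openai
from tqdm import tqdm           # pip install tqdm

load_dotenv()  # reads the .env file sitting next to the script
client = OpenAI(api_key=os.environ["DEEPSEEK_API_KEY"],
                base_url="https://api.deepseek.com")

def translate_one_line(text: str) -> str:
    resp = client.chat.completions.create(
        model="deepseek-chat",
        messages=[{"role": "user",
                   "content": f"Translate into English, reply with the translation only:\n{text}"}],
    )
    return resp.choices[0].message.content.strip()

def fill_missing(source_lines, translated_lines):
    # After the chunked pass, re-send one by one every line that came back empty.
    for i, (src, dst) in enumerate(zip(source_lines, tqdm(translated_lines))):
        if src.strip() and not dst.strip():
            translated_lines[i] = translate_one_line(src)
    return translated_lines
```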
 


Hi all,
Nice to have found this thread. I've been using an app called Buzz for this workflow.
https://github.com/chidiwilliams/buzz
If you scroll down, you can see links to the installers for various platforms.

Once installed, you just drag and drop the video file onto its window and then you get a popup of options to select:
  • Model: Whisper, Whisper.cpp, Hugging Face, Faster Whisper or OpenAI;
  • Whisper models include: Tiny, Base, Small, Medium, Large, LargeV2, LargeV3 and LargeV3-Turbo;
  • Task: Translate or Transcribe;
  • Language: lots of languages here, including Japanese and Javanese;
  • Advanced options: set the speech recognition temperature and prompts, and enable AI translation with its model and instruction.
It will then export as TXT, SRT or VTT, with or without word-level timings, into the folder where the source video or audio file is. Processing times obviously depend on movie length, but from my history I can see 31m, 23m, 13m, 9m, 11m, 14m, 34m and 15m, to mention a few examples.

It sounds like the results are as per what has been reported here. Hit n' miss sometimes. Duplication. Periods of nothing (which I assume is down to the quality of the source material). I've tried extracting the audio and running it through an AI to strip out everything but the vocals (I used DJ Studio for this), but it didn't seem to affect the results too much.

Also thinking there may be some NSFW filtering going on, as the text says "sorry", "not sure I can do this"... I don't think it's the scenes every time!! As it's on GitHub, I may take a look at the source and see if that can be turned off.

So, as things currently stand, the end file still requires work. However, the results are good enough for me to understand what is being said. From just a few bits of testing, I think the LargeV2 model gives the best results.

I'm not affiliated in any way. Just thought I'd mention it in case it makes life easier.

Cheers.
 