Post your JAV subtitle files here - JAV Subtitle Repository (JSP)★NOT A SUB REQUEST THREAD★

Is Deepseek just an AI translator, or is it an alternative to Whisper?

I use it first to rewrite the Whisper transcription, and then to translate the result.
Any reasoning model can do this. I get similar results from Mistral (France-based) and Deepseek (China-based).
Unfortunately, the US-based providers tend to reject requests that contain adult content, or they tone the text down and ruin the tone.
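
For anyone scripting this, the rewrite-then-translate flow described above could be sketched roughly like this. It is only a sketch: the prompt wording is my own assumption, and `ask` is a hypothetical callable that sends one prompt to whichever model you use and returns its text reply.

```python
from typing import Callable

# Hypothetical prompts; the wording is an assumption, not taken from the thread.
REWRITE_PROMPT = (
    "The following Japanese lines come from automatic speech recognition. "
    "Fix obvious recognition errors without changing the meaning. "
    "Return exactly one line per input line.\n\n{lines}"
)
TRANSLATE_PROMPT = (
    "Translate the following Japanese subtitle lines into English. "
    "Return exactly one line per input line.\n\n{lines}"
)


def rewrite_then_translate(lines: list[str], ask: Callable[[str], str]) -> list[str]:
    """Run the rewrite pass, then the translation pass, keeping one output line per input line."""
    cleaned = ask(REWRITE_PROMPT.format(lines="\n".join(lines))).splitlines()
    if len(cleaned) != len(lines):
        raise ValueError("rewrite pass changed the number of lines")
    translated = ask(TRANSLATE_PROMPT.format(lines="\n".join(cleaned))).splitlines()
    if len(translated) != len(lines):
        raise ValueError("translation pass changed the number of lines")
    return translated
```

The line-count check is there so the translated text can be put back onto the original timestamps without drifting.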
 
I'm having a decent experience translating titles with Gemini (Google's AI). If you make it clear that it's adult content, it usually doesn't tone it down, and if it does, you can just tell it not to and it will listen.
I've been meaning to try Deepseek, though I'm not sure there's a free option other than hosting it yourself (which I'd love to do at some point).
 
Local hosting would be so cool, but unfortunately way way way out of my VRAM $ budget :)

They had a super cheap promo for a couple of months. They have now switched to:
The promotional period for the deepseek-chat model has ended, and the pricing has been updated to $0.27 per million input tokens and $1.10 per million output tokens.
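
To get a feel for what those rates mean per file, here is a small estimator. The two prices come from the notice above; the `discount` parameter and the token counts in the usage note are hypothetical, so plug in your own numbers.

```python
INPUT_USD_PER_MILLION = 0.27   # from the pricing notice quoted above
OUTPUT_USD_PER_MILLION = 1.10  # from the pricing notice quoted above


def estimate_cost_usd(input_tokens: int, output_tokens: int, discount: float = 0.0) -> float:
    """Estimate the cost of one job; discount is a fraction (0.75 means 75% off)."""
    if not 0.0 <= discount <= 1.0:
        raise ValueError("discount must be between 0 and 1")
    full_price = (
        input_tokens / 1_000_000 * INPUT_USD_PER_MILLION
        + output_tokens / 1_000_000 * OUTPUT_USD_PER_MILLION
    )
    return full_price * (1.0 - discount)
```

For example, `estimate_cost_usd(20_000, 20_000)` prices a job with 20,000 tokens in each direction, which is an assumed size rather than a measurement of a real subtitle file.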
 
I've been meaning to ask all of you- how accurate are the Japanese to English translations using Whisper/Deepseek and other translation software?
Just curious. Wish I had the patience to do this. I'm not tech savvy but would someday love to learn. Or I could get Babbel and Learn Japanese; or better yet just move to Japan and immerse myself into the culture.
:)
 
Without understanding much Japanese myself, asking an AI to translate seems pretty accurate as far as I can tell, if you take the time to explain the context and tweak the result by asking questions.
It's kind of complicated because Japanese has a different sentence structure than English, so a translation often can't be 100% accurate and still sound like something a person would say in English. But since the AI can explain every word and why it made each choice, you can direct it to be more literal or more natural. You can also have it explain a line to you, tell it whether the result makes sense, and it will try to fix it if it doesn't.

Basically, what matters is how much work you put into it. Something fully automatic like Whisper's built-in translation will be a lot worse than giving an AI some context and feeding it all the lines (those AIs don't do the transcription part, as far as I know). If you take the time to go through the lines one by one and question the results, I think you end up with something really good, depending on how accurate the transcription was, obviously.

I've been meaning to compare a full Whisper translation against transcribing with Whisper and then translating with an AI line by line, using a movie that has about 100 lines of dialog, but who knows if I'll actually get to it. I have the files, I just have to work on it.
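
If you'd rather script the "give it context and feed it the lines" part than paste by hand, something like this could build the prompts. It's a minimal sketch: the prompt wording, the batch size, and the function name are all my own assumptions.

```python
def batch_prompts(lines: list[str], context: str, batch_size: int = 20) -> list[str]:
    """Build one prompt per batch of numbered lines, each prefixed with the same context."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    prompts = []
    for start in range(0, len(lines), batch_size):
        chunk = lines[start:start + batch_size]
        # Number lines globally so replies can be matched back to the transcript.
        numbered = "\n".join(f"{start + i + 1}: {text}" for i, text in enumerate(chunk))
        prompts.append(
            f"Context: {context}\n\n"
            "Translate these Japanese lines into natural English, keeping the numbering:\n"
            f"{numbered}"
        )
    return prompts
```

Setting `batch_size=1` gives the line-by-line approach, at the cost of many more requests.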
 

I have always wanted to be able to extract hard-coded subs from JAV movies, but until now I had been unsuccessful. I found a simplified tutorial, or as I call it, a "How to for Dummies"! The link to this tutorial is: https://www.videoconverterfactory.com/tips/extract-hardcoded-subtitles.html#1

I have only tried it on one video with hard-coded English subs, so I'm not positive it will always work, but it looks promising. I am posting the raw sub for DVDES-626, but as you can see it could still use some editing help. I have not tried the procedure on Chinese subs, which would also need an extra step to translate the result. Anyway, this is my first "success" with extracting hard-coded subs.
Try this one: https://github.com/voun7/Video_Sub_Extractor
 

DVAJ-517 Nanami Kawakami Tempts Her Neighbor Who Lives Across Her Window


I've continued to experiment with the combination of Whisper and AI.
I wanted to do a quality check, so I selected this one, which had a Chinese sub as well as a sub from JAVENGLISH. I did a scene-by-scene comparison, and the Whisper + AI version showed much better results than both of the other subs. Well, one sample doesn't mean much, but I hope it is a promising path.
 

SONE-604 I Was Left In The Care Of My Uncle As A Teenager, And While I Felt Disgusted, He Licked My Body And Screwed Me.

I was using DeepL because it was mostly free, but I'm now using Deepseek, which feels so much better. It's so much cheaper than other models, especially now that they are offering 75% off-peak discounts. Something like this would probably cost less than €0.01. The only thing I've noticed is that if you ask Deepseek to translate the same file twice, one of the runs will sometimes give a more correct translation. No idea if other models do that as well. I'm not too bothered about testing them, as they are way more expensive, and this quality is good enough anyway.

There were very minor corrections, 4 in total I think, and I like to keep in terms like Ojisan, Hentai, Lolicon, Oppai, ... (which you can ask the model to do)
 

Does anyone know how to convert this kind of subtitle file (WebVTT-based thumbnails)? Screenshot: 1740721771217.png
 
If you look at the durations, every single cue seems to have the same length (1:29.09x). You can also see that the URL pointers increase by constant values. I think this indicates that the VTT is a bogus or dummy filler rather than a viable subtitle file.
Which makes me wonder how the streaming server picks up the correct subs at playback time.
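
The two checks described above (identical durations, payloads that are just image pointers) can be automated. Players commonly use WebVTT tracks of this shape to show preview images on the seek bar, so the file may simply contain no dialogue at all; I can't confirm that for this particular file, so treat the sketch below as a quick diagnostic. The function names and the list of image extensions are my own assumptions.

```python
import re

# Matches a cue timing line such as "00:00:10.000 --> 00:00:20.000" (hours optional).
TIMING = re.compile(
    r"(?:(\d+):)?(\d{2}):(\d{2})\.(\d{3})\s+-->\s+(?:(\d+):)?(\d{2}):(\d{2})\.(\d{3})"
)


def _seconds(hours, minutes, seconds, millis):
    return int(hours or 0) * 3600 + int(minutes) * 60 + int(seconds) + int(millis) / 1000


def inspect_vtt(text):
    """Return (cue durations in seconds, first payload line of each cue)."""
    durations, payloads = [], []
    lines = text.splitlines()
    for index, line in enumerate(lines):
        match = TIMING.match(line.strip())
        if not match:
            continue
        groups = match.groups()
        durations.append(round(_seconds(*groups[4:]) - _seconds(*groups[:4]), 3))
        if index + 1 < len(lines):
            payloads.append(lines[index + 1].strip())
    return durations, payloads


def looks_like_thumbnail_track(text):
    """True when every cue payload points at an image instead of containing dialogue."""
    _, payloads = inspect_vtt(text)
    images = [p for p in payloads if re.search(r"\.(jpe?g|png|webp)\b", p, re.IGNORECASE)]
    return bool(payloads) and len(images) == len(payloads)
```

If `looks_like_thumbnail_track` returns True and `len(set(durations)) == 1`, the real subtitles are most likely served from a different URL.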
 
Thanks, I learned something. I didn't know a second pass would make Deepseek's translation better. Would the second pass come after all the subs are finished, or right after each sub?

About the off-peak period, how's your experience with the response time? I tried to do a long SRT file during off-peak hours, but the response time was more than a minute for each call. I wonder if there is something wrong with my code, or whether off-peak is really that busy.
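
One way to tell a slow server from slow code is to time only the model call itself. A minimal sketch, where `ask` is a hypothetical callable that sends one prompt and returns the reply:

```python
import time
from statistics import median


def timed_calls(ask, prompts):
    """Call ask(prompt) for each prompt and record wall-clock seconds per call."""
    results, seconds = [], []
    for prompt in prompts:
        start = time.perf_counter()
        results.append(ask(prompt))
        seconds.append(time.perf_counter() - start)
    return results, seconds


def summarize(seconds):
    """Summarize per-call latency so different runs can be compared."""
    return {"calls": len(seconds), "median_s": median(seconds), "max_s": max(seconds)}
```

If the median measured here is already over a minute, the delay is on the provider side rather than in the surrounding script.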
 
I wasn't talking about doing a second pass with deepseek, just doing "one pass" twice and getting different results.
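
If you want to see where two such runs disagree, a line-by-line comparison is enough. This is only a sketch and assumes both runs kept one output line per subtitle; the function name is made up.

```python
def differing_lines(run_a: list[str], run_b: list[str]) -> list[tuple[int, str, str]]:
    """Return (line number, text from run A, text from run B) wherever the runs disagree."""
    if len(run_a) != len(run_b):
        raise ValueError("both runs must contain the same number of lines")
    return [
        (number, a, b)
        for number, (a, b) in enumerate(zip(run_a, run_b), start=1)
        if a.strip() != b.strip()
    ]
```

Only the lines it returns need a human look, which is a lot less work than rereading the whole file.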

Dunno if you are writing your own code, but with Subtitle Edit the response time seems incredibly slow, yeah. Not sure what's going on.
I've now been using GPTSubtitler.com (you can choose from tons of models), and the response time seems normal (in line with their chat response time). I'm not a fan of their pricing and plans, and I'm pretty sure they cheat you on the number of tokens used, so I use the site with my own API key.

What I like is that they have a great pre-configured default prompt that tells the model to do context checking. If you use the Deepseek R1 thinking model, you can really see how it operates. They also have a sort of second-pass option, but I tried it once and wasn't a fan at all. Even if you're using your own code, I'd have a look at their prompt.

I'm now testing diarization with Whisper. With a prompt like that, I assume the model would benefit from being able to tell the speakers apart and would understand the context even better.
 
I hadn't considered GPTSubtitler.com. I just checked them out, and it does look quite decent and transparent on the surface.
I have adopted this repo for my script: chatgpt-subtitle-translator.

Getting the names and the pronouns right is a pain. Diarization is a good idea. In case you're interested (and planning to use pyannote), there is a Japanese fine-tuned version that imo is better than the stock version: transcribe_japanese_with_diarization.py
 
I gave it a try on a small sample, and it only sees 1 speaker where there should be 3.
I even set the number of speakers to 3 and upped the sensitivity, and it still only saw 1 speaker. I tried it 3 times, and one time it saw 2.
I'll give it another go on Linux in WSL so I can use ROCm, because at the moment I'm on AMD and transcription is sooo slow. But that's a project that might take a bit of time, since I'm not too well versed in Linux.

With the standard pyannote I had no problems and diarization worked correctly, but I didn't see much of an improvement. I need to test it on a file where the translation gets all the pronouns wrong, but Deepseek does a great job at getting them right or just skirting around the problem of using pronouns.
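
For reference, getting the speaker information in front of the translation model can be as simple as tagging each subtitle cue with the speaker whose turn overlaps it most. A minimal sketch that does not call pyannote itself: it assumes you have already exported the diarization as (start, end, speaker) tuples in seconds, and the labels in the usage note are placeholders.

```python
def label_cues(cues, turns):
    """Prefix each cue text with the speaker whose turn overlaps it the most.

    cues:  list of (start_seconds, end_seconds, text)
    turns: list of (start_seconds, end_seconds, speaker_label)
    """
    labeled = []
    for cue_start, cue_end, text in cues:
        best_speaker, best_overlap = None, 0.0
        for turn_start, turn_end, speaker in turns:
            overlap = min(cue_end, turn_end) - max(cue_start, turn_start)
            if overlap > best_overlap:
                best_speaker, best_overlap = speaker, overlap
        # Cues with no overlapping turn are left untagged.
        labeled.append(f"[{best_speaker}] {text}" if best_speaker else text)
    return labeled
```

With turns like `(0.0, 2.5, "SPEAKER_00")`, a cue spanning 0 to 2 seconds comes out as `[SPEAKER_00] ...`, and the tagged lines can then be fed to the translation prompt.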
 