How to machine translate JAV?

Pendekar2020

Well-Known Member
Sep 2, 2013
215
792
There are many ways, using the various translation services on the internet. This is only what I know:

1. Write a small program, for example see this: https://towardsdatascience.com/language-translation-using-python-bd8020772ccc

2. Use services such as DeepL (https://www.deepl.com/) or subtitlecat (https://www.subtitlecat.com/)
2a. DeepL does not accept subtitle formats such as srt, so you need to convert the file to docx (change the extension to .txt, open it in MS Word and save it as a Word document). The result from DeepL also needs to be copied back into an srt file (if you just convert it to text and add the extension .srt, it won't open). DeepL limits how many files you can translate each day/month unless you go Pro.
2b. With Subtitlecat, upload the original srt file, and then you can select the target language. There's also a limit unless you subscribe.

3. Use a free translation site such as https://translate.google.com/ https://www.bing.com/Translator/ https://translate.yandex.com/ https://papago.naver.com/
There's a character limit, usually 5,000 characters, on what you can translate in one go. If you're translating from an .srt file, limit each batch to about 50 dialog lines (for languages using alphabets) or 100 lines (for languages using kanji). Open the .srt file in a text editor (I'm using BBEdit Free, since I'm on a Mac) and copy and paste the text, including the time stamps (remember the character limit). Then copy the results back.
There are also applications that act as a front end to these services, like this one:
Screen Shot 2022-05-08 at 05.53.57.jpg
I normally use this method now.
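
The copy-and-paste batching described above can also be scripted. The following is only a minimal sketch: the function names are made up for illustration, and the 5,000 character default is the limit quoted in the post, not a value checked against any particular service. It keeps whole subtitle blocks (index, time stamp, dialogue) together so that no block is cut in half.

```python
import re


def split_srt_blocks(srt_text):
    """Split raw .srt text into subtitle blocks (index, time stamp, dialogue)."""
    normalized = srt_text.replace("\r\n", "\n").strip()
    # Blocks in an .srt file are separated by one or more blank lines.
    return [b.strip() for b in re.split(r"\n\s*\n", normalized) if b.strip()]


def batch_blocks(blocks, max_chars=5000):
    """Group whole blocks into batches whose joined length stays within max_chars.

    A single block longer than max_chars still becomes its own (oversized) batch.
    """
    batches, current, current_len = [], [], 0
    for block in blocks:
        # +2 accounts for the blank line that separates blocks once joined.
        added = len(block) + (2 if current else 0)
        if current and current_len + added > max_chars:
            batches.append("\n\n".join(current))
            current, current_len = [], 0
            added = len(block)
        current.append(block)
        current_len += added
    if current:
        batches.append("\n\n".join(current))
    return batches
```

Each string returned by batch_blocks can be pasted into the translator in one go, and the translated batches can be joined with blank lines in the same order to rebuild the file.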

I hope this information helps.
 

Walle12

JAV VR needs less clothes and more lesbian
Dec 6, 2008
1,837
1,062
But that is for text-to-text translation. What if you want to translate audio to subtitles?
 

Pendekar2020

Well-Known Member
Sep 2, 2013
215
792
You need to transcribe it first, from audio to text. You can try py-transcribe. On Mac, there's an app called Libretto, which I think is a front end for py-transcribe. In my experience both give bad results with inaccurate time stamps.
With actors talking in whispers and often talking while someone else is talking or moaning, it's a difficult job for the transcription software. There's also the background music.

There are online transcription services, but they're costly and usually don't produce an .srt file. :ngupil:
 

Walle12

JAV VR needs less clothes and more lesbian
Dec 6, 2008
1,837
1,062
OK, thanks. Yes, I suppose it is difficult to machine-translate audio porn... I would love to translate some of my favorite VRs....
 

seungri

New Member
Dec 26, 2019
5
8
if you have experience with python, py-transcribe, scikit-learn, keras and tensorflow:
- use an audio-to-text transcriber to get input data
- when you’ve built a collection of data, instead of trying to make subtitles, set up something like an excel spreadsheet and tag “hits” and “misses” (if the transcription is dialogue, it’s a hit; if it came from a moan or anything else, it’s a miss)
- get a lot of this kind of data and tag it appropriately
- when you have this data, use ML to build a binary classifier, for example with logistic regression. a standard neural network with a few layers is probably fine too. the key is that you train on the tagged data so the model will recognize whether transcribed audio is worth keeping or not. it’ll probably spit out a probability that you can use.
- after training the model, you can then set up a function that’ll take audio transcription data, perform a prediction using your model, and then pass or fail it based on whatever tolerance you set (e.g. 80% chance it’s dialogue). if it passes, send the transcription to a translator, and then log the translated text
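
The last step in the list above (predict, pass or fail against a tolerance, then translate) could look roughly like this. It is a sketch under assumptions: predict_dialogue_prob and translate are hypothetical callables standing in for a trained model and a translation service, neither of which is specified in the post, and the 0.8 default simply mirrors the 80% example.

```python
def translate_dialogue(segments, predict_dialogue_prob, translate, threshold=0.8):
    """Keep only segments the model scores as dialogue, then translate them.

    segments: iterable of transcribed text snippets.
    predict_dialogue_prob: hypothetical callable, text -> probability in [0, 1].
    translate: hypothetical callable, text -> translated text.
    Returns a list of (original, translated, probability) tuples.
    """
    results = []
    for text in segments:
        prob = predict_dialogue_prob(text)
        if prob < threshold:
            # Treated as a "miss": a moan, noise, or music picked up as text.
            continue
        results.append((text, translate(text), prob))
    return results
```

Keeping the probability in the output makes it easier to review borderline segments and adjust the threshold later.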

of course it’s probably more work than it’s worth and definitely easier said than done. this is an extremely high-level explanation of the machine learning process and there’s a ton i glossed over.
 

djlandd01

Active Member
Feb 8, 2022
267
111
If only my data scientist friend were into JAV, he would have done something similar. Sadly he is not, though he has made a lot of programs for himself to maintain and organize his JAV database and media. :(
 

seungri

New Member
Dec 26, 2019
5
8
yea the solution i posted is for if you wanted to build it from scratch. it's an idea i’ve been tinkering with for a while, since i took my university’s ML class honestly. the method using autosub and DeepL is probably much simpler even if it comes at a premium
 

granca

Member
Mar 4, 2017
62
79
Let me add that there are for sure other solutions on GitHub that are easier than the one I indicated. I remember at least one tool that had a proper graphical interface (but I cannot remember the name); I'm pretty sure it is mentioned in some other threads on this topic here.
Anyway, the one I mentioned is one of the best solutions in my opinion. Results are very good for voice-over and when characters speak close to the camera; it loses sentences when characters are in foreground or speak in a very low voice. Of course the results are far from perfect, because some words are context heavy: R words and other slang expressions are usually wrong, etc.
 

fletcheree

New Member
Dec 27, 2021
18
13
You could probably also upload the audio track to YouTube and download the auto captions. Unlike for English, though, the results for Japanese aren't amazing.
 

djlandd01

Active Member
Feb 8, 2022
267
111
Haha, I didn't know that it was possible to upload just the audio. A long time back, when I got desperate to find subtitles for a particular ID, I thought, why not try YouTube just for the translation? So I tried to upload the JAV to YouTube, set to private, so I could auto-translate it and watch it on the site (I thought I had the perfect plan). Apparently YouTube recognizes adult content and deletes it automatically after upload LOL.

I tried with split parts; they usually did upload, but yeah, the translations are not that fruitful, I mean certainly not on par with the time and effort put in. I truly sometimes get the urge to learn machine learning just to accomplish this LMAO.
 

jppilot

Member
Mar 20, 2011
63
15
1. AutoGenSubGui (portable version of autosub with gui, just download, unpack and use)
Link1 , Link2
For better results use: from Japanese to Japanese.
2. deepl.com (use Translate Files)
AutoGenSubGui generates an srt file. Copy the content of the srt file into a Word docx. Translate the whole docx file via DeepL. Open the translated file and save it again in doc format to bypass the editing restriction (Detailed instruction). Create a new srt file and copy the content from the doc file into it. DONE

If the video has clearly audible speech, it is possible to get pretty good results.
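
The copy-out and copy-back steps above can be sketched in code. This is a minimal illustration with made-up function names; it assumes the translator returns exactly one translated line per input line, which is not guaranteed by any service. Only the dialogue text is sent for translation, and the original index and time stamp lines are reattached afterwards.

```python
import re

TIME_RE = re.compile(r"\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}")


def parse_srt(srt_text):
    """Return a list of (index, timing, text) tuples from raw .srt text."""
    entries = []
    normalized = srt_text.replace("\r\n", "\n").strip()
    for block in re.split(r"\n\s*\n", normalized):
        lines = block.split("\n")
        if len(lines) >= 2 and TIME_RE.match(lines[1].strip()):
            # Join multi-line dialogue so each cue becomes one line to translate.
            text = " ".join(line.strip() for line in lines[2:])
            entries.append((lines[0].strip(), lines[1].strip(), text))
    return entries


def dialogue_only(entries):
    """One dialogue line per cue: the text to hand to the translator."""
    return "\n".join(text for _, _, text in entries)


def rebuild_srt(entries, translated_text):
    """Reattach the original index and timing lines to the translated lines."""
    translated = translated_text.strip("\n").split("\n")
    if len(translated) != len(entries):
        raise ValueError("translated line count does not match cue count")
    blocks = [
        f"{index}\n{timing}\n{line.strip()}"
        for (index, timing, _), line in zip(entries, translated)
    ]
    return "\n\n".join(blocks) + "\n"
```

If the translator merges or drops lines, rebuild_srt raises an error instead of silently shifting dialogue onto the wrong time stamps.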
 

Walle12

JAV VR needs less clothes and more lesbian
Dec 6, 2008
1,837
1,062

I have tried AutoGenSubGui and it works well with 1-minute trailers, but it says "memory error" when I try to translate a movie (5 GB). I have a modern PC with 32 GB of RAM and a lot of HD space.

EDIT: Looking at the log, the error appears when it tries to extract the audio... I have tried with two PCs and different videos, with no luck... :(

EDIT 2: It works well with files of around 2 GB, but with bigger files it says "memory error". 99% of modern movies are bigger than 2 GB... :(
 

Walle12

JAV VR needs less clothes and more lesbian
Dec 6, 2008
1,837
1,062
I have started to test VR translations with artificial intelligence, and I am getting quite acceptable results... :)


Untitled 3.jpg


Untitled 2.jpg


Untitled 1.jpg


It only works if the dialogue is clear and well pronounced, and only one person speaks at a time. If several people speak at once, speak in whispers, or speak between moans, it doesn't understand them.

But in the introduction scenes of one-on-one movies, where only the girl speaks, the AI manages to translate between 50 and 60% of the dialogue acceptably, which is very good.

You still miss phrases, but in the movies I've tried it on I've managed to understand the plot and the main thread of the dialogue, so I'm happy.


The problem is that few VR players accept subtitles. HereSphere does not accept them, I had to use Virtual Home Theater on PC. I don't know if DeoVR accepts them, I have to test it.

The way to translate is simple: with AutoSub you use Google AI to convert the audio to Japanese subtitles in a standard subtitle format (SRT). Then, with the DeepL website (which is better than Google Translate), you translate those texts from Japanese to your native language.

To translate, I first tried this solution proposed by Jppilot with AutoGenSubGui:

1. AutoGenSubGui (portable version of autosub with gui, just download, unpack and use)
Link1 , Link2
For better results use: from Japanese to Japanese.
2. deepl.com (use Translate Files)
AutoGenSubGui generates an srt file. Copy the content of the srt file into a Word docx. Translate the whole docx file via DeepL. Open the translated file and save it again in doc format to bypass the editing restriction (Detailed instruction). Create a new srt file and copy the content from the doc file into it. DONE

If the video has clearly audible speech, it is possible to get pretty good results.

It works very well and offers the best translation, but if you use files larger than 2 GB it gives a memory error.... :( And 99% of VR files are bigger than 2 GB...

Then I tried autosub-0.5.7-alpha-win-x64-pyinstaller (Download), which accepts files of any size, but it works from the command line, through a RUN.BAT file, and the translation it offers is the worst of all; I don't know the reason.

Now I am using PyTranscriber 1.5 (Download). It works well, with an acceptable translation, but with some videos it crashes and does not translate them.

So for the moment I haven't found the perfect solution, but at least I get to know the plots. If someone knows a better solution, please explain it, thanks!
 


Electromog

Akiba Citizen
Dec 7, 2009
4,643
2,850
Do you have to use a video file or can you just use the audio? That should help you avoid the problem. The other option is to download a lower resolution version of the video. After all, you don't need high video resolution to make the subs, it's all about the audio.
You can even grab a version with lots of ugly watermarks, those tend to be smaller. You'll only use it to make the subtitles, you can then watch the movie with subtitles using your 5GB+ high quality file.
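
One way to try the audio-only idea is to extract the audio track with ffmpeg before running the subtitle tool. This is a sketch under assumptions: ffmpeg is not mentioned in the thread and must be installed separately, the mono 16 kHz WAV settings are just a common choice for speech tools, and whether AutoGenSubGui accepts an audio file as input is not confirmed here. The function names are illustrative.

```python
import subprocess
from pathlib import Path


def build_audio_extract_command(video_path, audio_path=None, sample_rate=16000):
    """Build an ffmpeg command that writes only the audio track as a WAV file."""
    video = Path(video_path)
    # Default output: same name as the video, with a .wav extension.
    audio = Path(audio_path) if audio_path else video.with_suffix(".wav")
    return [
        "ffmpeg",
        "-i", str(video),         # input video file
        "-vn",                    # drop the video stream
        "-ac", "1",               # downmix to mono (assumed to be enough for speech)
        "-ar", str(sample_rate),  # resample; 16 kHz is an assumption
        str(audio),
    ]


def extract_audio(video_path, audio_path=None):
    """Run ffmpeg and return the path of the extracted audio file."""
    command = build_audio_extract_command(video_path, audio_path)
    subprocess.run(command, check=True)  # raises CalledProcessError on failure
    return command[-1]
```

The resulting audio file is far smaller than a multi-gigabyte video, and the subtitles made from it can then be used with the original high quality file.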
 

Walle12

JAV VR needs less clothes and more lesbian
Dec 6, 2008
1,837
1,062
I don't know if I can use only the audio. It's a good trick, I'll try it. It is not easy to find a lower resolution version unless you bought it in a store. I have purchased some, so I will try it, thanks.
 

Taako

Akiba Citizen
May 25, 2017
1,335
940
Good advice for all :D