How to machine translate JAV?

Inertia · May 7, 2022

Poorly unless it's something with crystal clear audio and mainly a one-on-one affair (think VR).

Pendekar2020 · May 7, 2022

There are many ways, using the various translation services on the internet. This is only what I know:

1. Write a small program, for example see this: https://towardsdatascience.com/language-translation-using-python-bd8020772ccc

2. Use services such as DeepL (https://www.deepl.com/) or subtitlecat (https://www.subtitlecat.com/)
2a. DeepL does not accept subtitle formats such as .srt, so you need to convert the file to .docx (change the extension to .txt, open it in MS Word and save it as a Word document). The result from DeepL also needs to be copied back into an .srt file (if you just convert it to text and add the extension .srt, it won't open). DeepL limits how many files you can translate each day/month unless you go Pro.
2b. Subtitlecat: upload the original .srt file, and then you can select the target language. There's also a limit unless you subscribe.

3. Use a free translation service such as https://translate.google.com/ https://www.bing.com/Translator/ https://translate.yandex.com/ https://papago.naver.com/
There's a character limit, usually 5,000 characters, on what you can translate in one go. If you're translating from an .srt file, limit each batch to 50 dialog lines (for languages using alphabets) or 100 lines (languages using kanji). Open the .srt file in a text editor (I'm using BBEdit Free, since I'm on a Mac) and copy and paste, including the time stamps (remember the character limit). Then copy the results back.
There are also applications that act as a front end for these services. Like this

I normally use this method now.

I hope this information helps.
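
For the copy-and-paste route in point 3, the batching can be scripted. This is only a rough sketch: the 5,000-character limit is the figure mentioned above, the function names are made up for illustration, and it assumes a normal .srt file where subtitle blocks are separated by blank lines.

```python
import re

def split_srt_blocks(srt_text):
    """Split SRT text into subtitle blocks (number, timing line, dialogue)."""
    normalized = srt_text.replace("\r\n", "\n").strip()
    if not normalized:
        return []
    return re.split(r"\n\s*\n", normalized)

def chunk_blocks(blocks, max_chars=5000):
    """Group whole blocks into chunks of at most max_chars characters.

    A single block longer than max_chars still gets a chunk of its own.
    """
    chunks, current, size = [], [], 0
    for block in blocks:
        added = len(block) + (2 if current else 0)  # 2 = blank-line separator
        if current and size + added > max_chars:
            chunks.append("\n\n".join(current))
            current, size = [], 0
            added = len(block)
        current.append(block)
        size += added
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

Each chunk keeps the numbers and time stamps, so the translated chunks can be pasted back one after another, with a blank line between them, to rebuild the file.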

Walle12 · May 9, 2022

But that is text-to-text translation. What if you want to translate audio to subtitles?

Pendekar2020 · May 9, 2022

Walle12 said:
But that is text-to-text translation. What if you want to translate audio to subtitles?

You need to transcribe it first, from audio to text. You can try py-transcribe. On Mac, there's an app called Libretto, which I think is a front end for py-transcribe. In my experience both give bad results with inaccurate time stamps.
With actors talking in whispers and often talking while another is talking/moaning, it's a difficult job for the transcription software. There's also the background music.

There are online transcription services, but they're costly and usually do not produce an .srt file.

Walle12 · May 9, 2022

OK, thanks. Yes, I suppose it is difficult to machine-translate audio porn... I would love to translate some of my favorite VRs....

seungri · May 10, 2022

Pendekar2020 said:
You need to transcribe it first, from audio to text. You can try py-transcribe. On Mac, there's an app called Libretto, which I think is a front end for py-transcribe. In my experience both give bad results with inaccurate time stamps.
With actors talking in whispers and often talking while another is talking/moaning, it's a difficult job for the transcription software. There's also the background music.

There are online transcription services, but they're costly and usually do not produce an .srt file.

if you have experience with python, py-transcribe, scikit-learn, keras and tensorflow:
- use an audio-to-text transcriber to get input data
- when you’ve built a collection of data, instead of trying to make subtitles, set up something like an excel spreadsheet and tag “hits” and “misses” (if the transcription is dialogue, it’s a hit; if it came from a moan or anything else, it’s a miss)
- get a lot of this kind of data and tag it appropriately
- when you have this data, use ML to build a binary classifier or perform logistic regression on the data. a standard neural network with a few layers is probably fine. the key is that you want to train on the tagged data so the model will recognize whether transcribed audio is worth keeping or not. it’ll probably spit out a probability that you can use.
- after training the model, you can then set up a function that’ll take audio transcription data, perform a prediction using your model, and then pass or fail it based on whatever tolerance you set (e.g. 80% chance it’s dialogue). if it passes, output the transcription to a translator, and then log the translated text

of course it’s probably more work than it’s worth and definitely easier said than done. this is an extremely high-level explanation of the machine learning process and there’s a ton i glossed over.
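
a toy version of the steps above, just to show the shape of it. this is a sketch with a lot of assumptions: it uses plain python instead of scikit-learn or keras, the single feature (number of distinct characters in the transcribed line) is a crude stand-in i picked for illustration, and the eight tagged lines are invented examples, nowhere near enough data for real use.

```python
import math

def feature(text):
    """Hypothetical feature: distinct characters, capped at 10, scaled to 0..1."""
    return min(len(set(text)), 10) / 10

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(samples, lr=0.5, epochs=2000):
    """Fit a one-feature logistic regression with plain gradient descent."""
    w, b = 0.0, 0.0
    n = len(samples)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for text, label in samples:
            x = feature(text)
            err = sigmoid(w * x + b) - label
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b

def dialogue_probability(text, model):
    """Predicted probability that a transcribed line is dialogue."""
    w, b = model
    return sigmoid(w * feature(text) + b)

def keep_dialogue(lines, model, threshold=0.8):
    """Pass only lines whose predicted probability meets the tolerance."""
    return [t for t in lines if dialogue_probability(t, model) >= threshold]

# Invented training data: 1 = hit (dialogue), 0 = miss (anything else).
TAGGED = [
    ("今日はどこに行きますか", 1),
    ("それは本当に面白い話ですね", 1),
    ("明日の予定を教えてください", 1),
    ("少し休んでから出かけましょう", 1),
    ("ああああ", 0),
    ("んんっ", 0),
    ("はぁはぁ", 0),
    ("あっあっ", 0),
]
MODEL = train(TAGGED)
```

whatever comes out of `keep_dialogue(lines, MODEL)` would then go to the translator. a real version would need better features (or the audio itself) and a lot more tagged data.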

djlandd01 · May 10, 2022

seungri said:
if you have experience with python, py-transcribe, scikit-learn, keras and tensorflow:
- use an audio-to-text transcriber to get input data
- when you’ve built a collection of data, instead of trying to make subtitles, set up something like an excel spreadsheet and tag “hits” and “misses” (if the transcription is dialogue, it’s a hit; if it came from a moan or anything else, it’s a miss)
- get a lot of this kind of data and tag it appropriately
- when you have this data, use ML to build a binary classifier or perform logistic regression on the data. a standard neural network with a few layers is probably fine. the key is that you want to train on the tagged data so the model will recognize whether transcribed audio is worth keeping or not. it’ll probably spit out a probability that you can use.
- after training the model, you can then set up a function that’ll take audio transcription data, perform a prediction using your model, and then pass or fail it based on whatever tolerance you set (e.g. 80% chance it’s dialogue). if it passes, output the transcription to a translator, and then log the translated text

of course it’s probably more work than it’s worth and definitely easier said than done. this is an extremely high-level explanation of the machine learning process and there’s a ton i glossed over.

If only my data scientist friend were into JAV, he would have done something similar. Sadly he is not. He has made a lot of programs for himself, though, to maintain and organize his JAV database and media.

granca · May 10, 2022

The easiest way is to use this version of autosub: https://github.com/BingLingGroup/autosub/tree/dev/autosub .
Then I use https://github.com/GeorgeMan6644/DAT , but it requires the DeepL API (paid) to work.

There are for sure other options that require less work to set up.

Of course your mileage may vary in terms of results, since the audio recording is not always great.

seungri · May 11, 2022

yea the solution i posted is for if you wanted to build it from scratch. it's an idea i’ve been tinkering with for a while, honestly since i took my university’s ML class. the method using autosub and DeepL is probably much simpler, even if it comes at a premium

granca · May 11, 2022

Let me add that there are for sure other solutions on GitHub that are easier than the one I indicated. I remember at least one tool that had a proper graphical interface (but I cannot remember the name); I'm pretty sure it is mentioned in some other threads on this topic here.
Anyway, the one I mentioned is one of the best solutions in my opinion. Results are very good for voice-over and when characters speak close to the camera. It loses sentences when characters are in the background or speak in a very low voice. Of course the results are far from perfect, because some words are context-heavy; R words and other slang expressions are usually wrong, etc.

fletcheree · May 12, 2022

You could probably also upload the audio track to YouTube and download the auto captions. Unlike for English, the results aren't amazing for Japanese, though.

djlandd01 · May 12, 2022

fletcheree said:
You could probably also upload the audio track to YouTube and download the auto captions. Unlike for English, the results aren't amazing for Japanese, though.

Haha, I didn't know it was possible to upload just the audio. A long time back, when I got desperate to find subtitles for a particular ID, I thought, why not try YouTube just for translation? So I tried to upload the JAV to YouTube, set to private, so I could auto-translate it and watch it on the site (I thought I had the perfect plan). Apparently YouTube recognizes adult content and deletes it automatically after upload, LOL.

I tried with split parts. They usually do upload, but yeah, the translations are not that fruitful; I mean, certainly not on par with the time and effort put in. I truly sometimes get the urge to learn machine learning just to accomplish this, LMAO.

jppilot · May 15, 2022

1. AutoGenSubGui (a portable version of autosub with a GUI; just download, unpack, and use)
Link1 , Link2
For better results, use: from Japanese to Japanese.
2. deepl.com (use Translate Files)
AutoGenSubGui generates an .srt file. Copy the content of the .srt file into a Word .docx file. Translate the whole .docx file via DeepL. Open the translated file and save it again in .doc format to bypass the editing restriction (Detailed instruction). Create a new .srt file and copy the content of the .doc file into it. DONE

If the video has clearly audible speech, it is possible to get pretty good results.
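
A variation on the copy-and-paste steps above: pull only the dialogue out of the .srt file, translate that, and put the translated lines back under the original time stamps. This is a minimal sketch, not a finished tool. The function names are hypothetical, and it assumes the translator returns exactly one line per subtitle, which is not guaranteed.

```python
import re

TIMING = re.compile(r"\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}")

def parse_srt(srt_text):
    """Return a list of (number, timing, text) tuples, one per subtitle.

    Blocks without a timing line or without dialogue are skipped.
    """
    cues = []
    blocks = re.split(r"\n\s*\n", srt_text.replace("\r\n", "\n").strip())
    for block in blocks:
        lines = block.split("\n")
        if len(lines) >= 3 and TIMING.match(lines[1]):
            cues.append((lines[0].strip(), lines[1].strip(), " ".join(lines[2:])))
    return cues

def text_only(cues):
    """One dialogue line per subtitle, ready to paste into a document."""
    return "\n".join(text for _, _, text in cues)

def merge_translation(cues, translated_text):
    """Rebuild SRT text from the original timings and the translated lines."""
    lines = translated_text.strip().split("\n")
    if len(lines) != len(cues):
        raise ValueError("translated line count does not match subtitle count")
    out = []
    for (number, timing, _), line in zip(cues, lines):
        out.append(f"{number}\n{timing}\n{line.strip()}")
    return "\n\n".join(out) + "\n"
```

The line-count check is there because a translator may merge or split lines; if it raises, the translated text has to be fixed by hand before merging.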

Walle12 · Jun 12, 2022

jppilot said:
1. AutoGenSubGui (a portable version of autosub with a GUI; just download, unpack, and use)
Link1 , Link2
For better results, use: from Japanese to Japanese.
2. deepl.com (use Translate Files)
AutoGenSubGui generates an .srt file. Copy the content of the .srt file into a Word .docx file. Translate the whole .docx file via DeepL. Open the translated file and save it again in .doc format to bypass the editing restriction (Detailed instruction). Create a new .srt file and copy the content of the .doc file into it. DONE

If the video has clearly audible speech, it is possible to get pretty good results.

I have tried AutoGenSubGui and it works well with 1-minute trailers, but it says "memory error" when I try to translate a movie (5 GB). I have a modern PC with 32 GB of RAM and a lot of hard disk space.

EDIT: Looking at the log, the error appears when it tries to extract the audio... I have tried with two PCs and different videos, with no luck...

EDIT 2: It works well with files around 2 GB, but with bigger files it says "memory error". 99% of modern movies are bigger than 2 GB...

Walle12 · Jun 21, 2022

I have started to test VR translations with artificial intelligence, and I am getting quite acceptable results...

It only works if the dialogues are clear and well pronounced, and only one person speaks at a time. If several people speak, speak in whispers, or speak between moans, it doesn't understand it.

But in the introduction scenes of one-on-one movies, where only the girl speaks, the AI manages to translate between 50 and 60% of the dialogue quite acceptably, which is very good.

You still miss phrases, but in the movies I've tried it I've managed to understand the plot and the main thread of the dialogues, so I'm happy.

The problem is that few VR players accept subtitles. HereSphere does not accept them, I had to use Virtual Home Theater on PC. I don't know if DeoVR accepts them, I have to test it.

The way to translate is simple: with AutoSub you use Google AI to convert the audio to Japanese subtitles in a standard subtitle format (SRT). Then, with the DeepL website (which is better than Google Translate), you translate those texts from Japanese to your native language.
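
For anyone wondering what the SRT file in the middle of that process looks like, here is a small sketch that builds one from recognized segments. The `(start, end, text)` tuples are a hypothetical stand-in for whatever the speech recognizer returns; this is not how AutoSub itself is implemented.

```python
def srt_timestamp(seconds):
    """Format seconds as HH:MM:SS,mmm, the time stamp form used in SRT files."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"

def to_srt(segments):
    """Build SRT text from (start_seconds, end_seconds, text) tuples."""
    blocks = []
    for number, (start, end, text) in enumerate(segments, start=1):
        timing = f"{srt_timestamp(start)} --> {srt_timestamp(end)}"
        blocks.append(f"{number}\n{timing}\n{text}")
    return "\n\n".join(blocks) + "\n"
```

Each subtitle is a number, a timing line, and the text, with a blank line between subtitles. Only the text lines need to change when translating.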

First I tried the solution proposed by jppilot with AutoGenSubGui:

jppilot said:
1. AutoGenSubGui (a portable version of autosub with a GUI; just download, unpack, and use)
Link1 , Link2
For better results, use: from Japanese to Japanese.
2. deepl.com (use Translate Files)
AutoGenSubGui generates an .srt file. Copy the content of the .srt file into a Word .docx file. Translate the whole .docx file via DeepL. Open the translated file and save it again in .doc format to bypass the editing restriction (Detailed instruction). Create a new .srt file and copy the content of the .doc file into it. DONE

If the video has clearly audible speech, it is possible to get pretty good results.

It works very well and offers the best translation, but if you use files larger than 2 GB it gives a memory error....

And 99% of VR files are bigger than 2 GB...

Then I tried autosub-0.5.7-alpha-win-x64-pyinstaller (Download), which accepts files of any size, but it works from the command line, through a RUN.BAT file, and the translation it offers is the worst of all; I don't know why.

Now I am using PyTranscriber 1.5 (Download). It works well, with an acceptable translation, but with some videos it crashes and does not translate them.

So for the moment I haven't found the perfect solution, but at least I get to know the plots. If someone knows a better solution, please explain it, thanks!

Electromog · Jun 21, 2022

Walle12 said:
I have tried AutoGenSubGui and it works well with 1-minute trailers, but it says "memory error" when I try to translate a movie (5 GB). I have a modern PC with 32 GB of RAM and a lot of hard disk space.

EDIT: Looking at the log, the error appears when it tries to extract the audio... I have tried with two PCs and different videos, with no luck...

EDIT 2: It works well with files around 2 GB, but with bigger files it says "memory error". 99% of modern movies are bigger than 2 GB...

Do you have to use a video file or can you just use the audio? That should help you avoid the problem. The other option is to download a lower resolution version of the video. After all, you don't need high video resolution to make the subs, it's all about the audio.
You can even grab a version with lots of ugly watermarks, those tend to be smaller. You'll only use it to make the subtitles, you can then watch the movie with subtitles using your 5GB+ high quality file.
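
If the tool does accept audio, the track can be pulled out with ffmpeg first. A rough sketch, assuming ffmpeg is installed and on the PATH; the mono 16 kHz settings are my assumption of what is usually enough for speech recognition, and the file names are placeholders.

```python
import subprocess

def ffmpeg_audio_command(video_path, audio_path, sample_rate=16000):
    """Build an ffmpeg command that drops the video and writes mono audio."""
    return [
        "ffmpeg",
        "-i", video_path,         # input video file
        "-vn",                    # leave the video stream out of the output
        "-ac", "1",               # downmix to one audio channel
        "-ar", str(sample_rate),  # resample the audio
        audio_path,
    ]

def extract_audio(video_path, audio_path):
    """Run ffmpeg; raises CalledProcessError if ffmpeg reports a failure."""
    subprocess.run(ffmpeg_audio_command(video_path, audio_path), check=True)
```

Calling `extract_audio("movie.mp4", "movie.wav")` writes an audio-only file that is much smaller than a 5 GB video. Whether that gets around the memory error is something to test.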

Walle12 · Jun 21, 2022

I don't know if I can use only the audio. It's a good trick, I'll try it. It is not easy to find a lower resolution version unless you bought it in a store. I have some purchased ones, so I will try it, thanks.

Playguuu · Jun 22, 2022

Most people take the Chinese translations and Google Translate them...

Taako · Jun 22, 2022

Pendekar2020 said:
There are many ways, using the various translation services on the internet. This is only what I know:

1. Write a small program, for example see this: https://towardsdatascience.com/language-translation-using-python-bd8020772ccc

2. Use services such as DeepL (https://www.deepl.com/) or subtitlecat (https://www.subtitlecat.com/)
2a. DeepL does not accept subtitle formats such as .srt, so you need to convert the file to .docx (change the extension to .txt, open it in MS Word and save it as a Word document). The result from DeepL also needs to be copied back into an .srt file (if you just convert it to text and add the extension .srt, it won't open). DeepL limits how many files you can translate each day/month unless you go Pro.
2b. Subtitlecat: upload the original .srt file, and then you can select the target language. There's also a limit unless you subscribe.

3. Use a free translation service such as https://translate.google.com/ https://www.bing.com/Translator/ https://translate.yandex.com/ https://papago.naver.com/
There's a character limit, usually 5,000 characters, on what you can translate in one go. If you're translating from an .srt file, limit each batch to 50 dialog lines (for languages using alphabets) or 100 lines (languages using kanji). Open the .srt file in a text editor (I'm using BBEdit Free, since I'm on a Mac) and copy and paste, including the time stamps (remember the character limit). Then copy the results back.
There are also applications that act as a front end for these services. Like this
View attachment 2923159
I normally use this method now.

I hope this information helps.

Good advice for all.
