What are the criteria for choosing which JAV titles get subtitled?

GrannyBongo

Member
Jan 24, 2018
I'm asking because, is it just me, or are most subbed JAV titles conventional vanilla or netorare stories that don't really need subtitles, since the plot is conventional enough to follow without them? On the other hand, content such as games, fetishes, extreme themes and the like, which does need subtitles to be understood properly, is mostly ignored.

I have the impression, then, that the "weird" content, which grabbed my attention around 2008, was never popular to begin with.
 
If you can even call them "subtitles"?
That's because they can't do it (reason number 1).
Most of them come from an automatic machine translation, and that's it.
The machine can't do a lot of things. It can create text, yes, but that's about all.

Some of the subtitles were created by people who love doing it ... ( so
And a lot of websites don't sub for nothing; they sub to increase views and money.
Oh, and they don't even do it themselves; they steal subs from somewhere else.
If the website thinks "people love to watch this", then they post the movie with subs.
 
What are the criteria behind which JAV titles are uploaded to the internet and which aren't?
I'm asking because, is it just me, or are most subbed JAV titles conventional vanilla or netorare stories that don't really need subtitles, since the plot is conventional enough to follow without them? On the other hand, material such as games, fetishes, extreme themes and the like, which needs subtitles for a better experience of the content, is mostly ignored. I have the impression that the "weird shit" content, which grabbed my attention around 2008, was never popular to begin with; meanwhile, in my opinion, the decline in JAV quality after 2015 isn't helping either.
I think it depends on popularity.

Machine subbing using OpenAI has been pretty good (about 80% accurate). You can do it yourself, or I can help you.
 
Hey, I'm interested. Are you saying I can sub by myself using a tool called machine sub / OpenAI?
 
It's easier now than when I was doing it with Google Colab. You can use Subtitle Edit directly, with OpenAI Whisper built in. Just download Subtitle Edit and see this video for more instructions.
 

I translate, and have translated, all kinds of JAV movies (except scat, animals and snuff) for customers.



All the AI subs are done this way: transcription of the Japanese (already inaccurate; even YouTube fails at it unless it's a single person talking very clearly without any background noise) > translation to Chinese > translation of those Chinese subs to English. That leads to a crappy result. Just look at all the subs shared here, for example.
 
I think only the most predictable plots and titles get subbed. In fact, those JAVs don't need to be subbed, because it's obvious how the story is developing, unlike game JAVs.
 
Creating a subtitle is work, so the two determining factors for what gets subbed are:
- Who paid money to have something they wanted subbed (or any other money-earning reason).
- What the tastes of the person doing the subbing are.
Basically, nobody is going to spend their time for free subbing something they aren't interested in.

It's fairly easy to make yourself a decent enough machine-translated sub to grasp most of the general meaning of a movie. Some harder-to-hear lines get skipped and some noise gets mistranslated (how much varies with the movie's audio), but making something proper would mean weeks of work for one person, so you're not going to find many of those for free.

You can look at the Whisper tutorial in my signature and use the Colab in there. It's a pretty popular option for machine translation on the forum that requires basically no skill and no specific hardware, so you can see for yourself how good the results are.
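For anyone curious about what such a tool produces at the end, here is a minimal sketch of turning timed segments into an .srt file. It assumes the segments arrive as dicts with "start", "end" (in seconds) and "text" keys, which is how the openai-whisper package returns them; the function names are made up for illustration and are not taken from the tutorial or the Colab.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments) -> str:
    """Build SRT text from dicts with "start", "end" and "text" keys (assumed shape)."""
    blocks = []
    index = 1
    for seg in segments:
        text = seg["text"].strip()
        if not text:
            continue  # skip segments with no text so numbering stays contiguous
        start = srt_timestamp(seg["start"])
        end = srt_timestamp(seg["end"])
        blocks.append(f"{index}\n{start} --> {end}\n{text}\n")
        index += 1
    # Blocks already end with a newline; joining with "\n" adds the blank separator line.
    return "\n".join(blocks)
```

Writing the returned string to a file named after the video (same name, .srt extension) is usually enough for a player to pick it up.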
 
If you want perfect subtitles done by a pro, the answer is easy: @darksider59. You won't find anything of better quality.

If what matters to you is getting a general idea of what's going on, learn to do it yourself with @SamKook's tutorial or @mei2's Google Colab; both are pretty awesome.

-Besh
 
I tend to agree. Simpler plots, where the stories have been done repeatedly, are, well... easier to sub.
Honestly, that's ok.
 
I have noticed something with Whisper, and I might be imagining it, but...

I noticed that when a sentence is wrong, it 'forces' the preceding sentence to fit its narrative.
Almost like a bad autocorrect feature or something.

At first I thought I was dreaming, but sure enough, it will take a sentence that is wrong and make the sentence before or after it fit that wrong sentence.

For example, it will say the person knows the sister is cheating and make the next sentence lean toward that narrative,
when in fact there is no such conversation. Strange.

Now, I admit I don't use Whisper like the others do, but I have noticed this in some of the subs
made with it.
It's so well placed that you think it's the actual dialogue, and it's not.
I thought I'd share that info :)

Have you noticed this as well?
 
I haven't used it a lot and don't really check the result when I do (I also haven't watched other people's subs for more than a few lines), so I don't think I've noticed multiple sentences changed like that, but it doesn't surprise me at all. The output changes when you run it again with identical settings, so weird things are bound to happen if that's how it acts.

Whisper has no problem with guessing, and it will make things fit the context when it does.

The models have obviously been trained on YouTube videos, since a sub will often end with "Thank you for watching", often enough that it's blacklisted in the Colab people use.

A funny one I noticed was when I did a DTS audio test with the movie 10 Things I Hate About You. It failed spectacularly on the songs at the beginning, so I tried the different large models to see how they'd differ, and that's only transcribing, no translation required. It's a song, so the lines don't make a ton of sense as a conversation, and it's hard to say whether it really tries to adapt or just outputs what it thinks it hears, but some sentences are definitely reworded to make sense. (large and large-v2 are the exact same model in that test, and only large-v1 got the first line right. I think I shared this in some form in the past but can't find it; it was made two days after Christmas over a year ago):
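A blacklist like that can be sketched in a few lines. This is only an illustration of the idea, not the Colab's actual code: the only phrase taken from this thread is "Thank you for watching", and the function names and the matching rule (whole line, ignoring case and punctuation) are assumptions.

```python
import re

# Hypothetical blacklist; extend it with whatever phrases keep showing up.
BLACKLIST = ["Thank you for watching"]


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and collapse whitespace for comparison."""
    text = re.sub(r"[^a-z0-9\s]", "", text.lower())
    return re.sub(r"\s+", " ", text).strip()


def drop_blacklisted(lines, blacklist=BLACKLIST):
    """Return the subtitle lines whose whole text is not a blacklisted phrase."""
    banned = {normalize(phrase) for phrase in blacklist}
    return [line for line in lines if normalize(line) not in banned]
```

Matching the whole line rather than a substring keeps real dialogue that merely contains the phrase, at the cost of missing hallucinated lines with extra words tacked on.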
Whisper_10tilay.jpg

I do want to give it a shot for a JAV where it seemed to have done a decent job if I mix v2 and v3, but it's kind of low on the priority list, so I haven't gotten to it yet.
 
Very interesting analysis. I guess I might have been right after all :D
I was checking other people's subs and noticed how it was actually changing dialogue.
Thanks.
 

In short: Absolutely, you're spot on – and that's no hallucination (pun intended).

In long (I started writing this and it ended up much longer than I wanted :D):

Blame it on temperature. What makes Whisper (or any similar AI model) go wild is the "temperature" parameter, which basically controls how freely the model can make guesses. It can go from conservative (temperature = 0) to wild guesses (temperature = 1). Whisper's default is a ladder of temperatures from 0 to 1: each audio segment is first decoded at 0, and when the result fails its quality checks (text that is too repetitive, or too low an average log-probability), the segment is retried at the next higher temperature.

Unfortunately, many users just use the default settings, which definitely are not a good fit for JAV subtitling. That's why you come across wild narratives. A better setting would be to set the temperature to 0 in the parameter settings.

In my experience, a balanced combination of settings that increases accuracy and reduces hallucination at the same time would be:
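To make the temperature behaviour concrete, here is a rough sketch of the fallback idea as I understand it from the openai-whisper code: decode at the lowest temperature first and only move up when the text looks degenerate. Only the repetition check (a zlib compression ratio, default limit 2.4) is modelled here; the real code also checks the average log-probability. The decode callback is a stand-in for the model, not a real API.

```python
import zlib


def compression_ratio(text: str) -> float:
    """Raw size divided by zlib-compressed size; highly repetitive text scores high."""
    data = text.encode("utf-8")
    return len(data) / len(zlib.compress(data))


def decode_with_fallback(decode, temperatures=(0.0, 0.2, 0.4, 0.6, 0.8, 1.0), max_ratio=2.4):
    """Try each temperature in order and return (temperature, text) for the first
    result that is not too repetitive. `decode` is a hypothetical callable taking a
    temperature and returning text. `temperatures` must not be empty."""
    text = ""
    for temperature in temperatures:
        text = decode(temperature)
        if compression_ratio(text) <= max_ratio:
            return temperature, text
    # Nothing passed the check: keep the last attempt.
    return temperatures[-1], text
```

Passing temperatures=(0.0,) pins the decoder to the conservative setting: a failed segment is then kept as it is instead of being retried with freer guessing.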

"temperatures": [0.0],
"condition_on_previous_text": False,
"patience": 2,
"beam_size": 2,
"hallucination_silence_threshold": 1.5,
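If you run the openai-whisper Python package directly rather than a Colab, the settings above could be passed roughly like this. Treat it as a sketch under assumptions: the key is spelled "temperature" (singular) in that package, hallucination_silence_threshold needs a recent release and only takes effect together with word_timestamps=True, and the function names here are my own. Turning off condition_on_previous_text stops each chunk from being prompted with the text of the previous chunk, which is the most likely way one wrong sentence ends up dragging its neighbours along.

```python
def build_options() -> dict:
    """Keyword arguments for whisper's transcribe(), mirroring the settings above."""
    return {
        "temperature": 0.0,                      # no fallback to higher temperatures
        "condition_on_previous_text": False,     # do not prompt with the previous chunk
        "patience": 2,
        "beam_size": 2,
        "word_timestamps": True,                 # required by the option below
        "hallucination_silence_threshold": 1.5,  # seconds
    }


def translate_audio(path: str, model_name: str = "large-v2") -> dict:
    """Transcribe Japanese audio and translate it to English in one pass."""
    import whisper  # assumes `pip install openai-whisper`; imported lazily on purpose

    model = whisper.load_model(model_name)
    return model.transcribe(path, language="ja", task="translate", **build_options())
```

The returned dict has a "segments" list with start and end times and text for each line, which is what you would turn into an .srt file.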



Also, I strongly recommend opting for large-v2. Here is a recent benchmark comparing some of the models out there. As many have noticed, large-v3 just sucks. I hope OpenAI releases a v4 in a couple of months to address the shortcomings of v3.

PS. Some people have reported that large-v1 is a better choice for noisy backgrounds and cross-talk situations.



rtf1.png
 
Looks like you've been keeping track of what works better for anyone using Whisper.
Maybe if the tech continues to improve, I'll go back to using it :)