What are the criteria behind the choice of JAV to subtitle?

GrannyBongo · Apr 28, 2023

I'm asking because, is it just me, or are most subbed JAVs conventional vanilla, netorare, and the like, which don't need actual subbing to understand what's happening given the conventional plot? On the other hand, content like games, fetishes, extreme topics, and the like, which requires subbing for a better understanding, is mostly ignored.

I have the impression, then, that the "weird" content, which grabbed my attention around 2008, was never popular to begin with.

maload · Apr 29, 2023

If you can even call them "subtitles"?
That's because they can't do it (reason number 1).
Most of them come from an "auto machine", and that's it.
The machine can't do lots of things. It can create text, yeah...

Some subtitles were created by people who love doing it ...( so
And lots of websites don't sub for nothing; they sub to increase views and money.
Ah, and they don't even do it themselves; they steal subs from somewhere else.
If the website thinks "people love to watch this", then they post movies with subs.

SUNBO · Apr 29, 2023

GrannyBongo said:
What's the criteria behind what jav titles are uploaded on the internet and what aren't?
Saying it 'cause is it me or most of the jav subbed are conventional vanilla-netorare and the like that don't require actual subbing to understand what's happening considering the conventional plot. On the other hand, material like games, fetichism, extreme and the like that requires subbing for a better experience of the content are mostly ignored. I have the impression that the "weird shit"content, which grabbed my attention around 2008, was never popular to begin with, meanwhile, in my opinion, the decrease of quality of jav after 2015 isn't helping neither

I think it depends on popularity.

Machine subbing using OpenAI has been pretty good (about 80% accurate); you can do it yourself or I can help you.

ArgentGrace · Apr 29, 2023

SUNBO said:
i think depend on popularity

machine sub using openai has been pretty good (about 80% accurate), you can do it yourself or i can help you.

Hey. Interested. You are saying I can sub by myself using a tool called machine sub / openai?

SUNBO · Apr 29, 2023

ArgentGrace said:
Hey. Interested. You are saying I can sub by myself using a tool called machine sub / openai

It's now easier than before, when I was doing it with Google Colab. You can use SubtitleEdit directly, with OpenAI Whisper built in. Just download SubtitleEdit and see this video for more instructions.

GrannyBongo · Aug 3, 2023

I think only the most predictable plots and titles get subbed. In fact, those JAVs don't need to be subbed because it's obvious how the story is developing, unlike game JAVs.

GrannyBongo · Apr 4, 2024

How good are those software translators?

SamKook · Apr 4, 2024

It's work to create a subtitle, so the 2 determining factors for what gets subbed are:
- Who paid money to have something they wanted subbed (or any other money-earning reason)
- What the tastes of the person doing the subbing are.
Basically, nobody is going to spend their time doing subs of something they aren't interested in for free.

It's fairly easy to make yourself a decent enough machine tl sub to grasp most of the general meaning of the movie. Some harder-to-hear lines get skipped and some noise gets mistranslated; how much will vary depending on the movie audio. Making something proper would mean weeks of work for one person, so you're not going to find many of those for free.

You can look at the Whisper tutorial in my signature and use the colab in there, which is a pretty popular option for machine tl on the forum that requires basically no skill and no specific hardware, and see for yourself how good they are.

Besh · Apr 4, 2024

If you want perfect subtitles done by a pro, the answer is easy: @darksider59. You won't find anything of better quality.

If what is important for you is to have a general idea of what's going on, learn to do it yourself with the tutorial of @SamKook or the Google Colab of @mei2; both are pretty awesome.

-Besh

Taako · Apr 5, 2024

GrannyBongo said:
i think only the most predictable plots and titles get subbed. in fact those javs don't need be subbed cause is obvious how the story is developing. unlike game javs

I tend to agree. Easier plots where the stories have been done repeatedly are, well... easier to sub.
Honestly, that's ok.

Taako · Apr 5, 2024

SamKook said:
It's work to create a subtitle so the 2 determining factors for what gets subbed are:
- Who paid money to have something they wanted subbed(or any other money earning related reasons)
- What the person doing the subbing taste are.
Basically nobody is going to spend their time doing subs of something they aren't interested in for free.

It's fairly easy to get a decent enough machine tl sub you can make yourself to grasp most of the general meaning of the movie. Some harder to hear lines get skipped and some noise gets mistranslated, how much will vary depending on the movie audio, but making something proper would mean weeks of work for one person so you're not going to find many of those for free.

You can look at the whisper tutorial in my signature and use the colab in there, which is a pretty popular option for machine tl on the forum that require basically no skill and no specific hardware to do and see for yourself how good they are.

I do notice something with this Whisper, and I might be imagining it, but...

I noticed that with the sentences that are wrong, it 'forces' the preceding sentence to fit its narrative.
Almost like a bad autocorrect feature or something.

At first I thought I was dreaming, but sure enough, it will take a sentence that is wrong and make the other sentence, either before or after, fit that wrong sentence.

Like, for example, it will say the person knows the Sister is cheating and make the next sentence lean toward that narrative.
When in fact, there is no such conversation. Strange.

Now, I admit I don't use Whisper like the others, but I have noticed it in some of the subs made with it.
It's so well placed that you think it's the actual dialogue, and it's not.
I thought I'd share that info.

Have you noticed this as well?

SamKook · Apr 6, 2024

I haven't used it a lot and don't really check the result when I do (I also haven't watched other people's subs for more than a few lines), so I don't think I've noticed multiple sentences changed like that, but it does not surprise me at all. It's always changing the output when you run it again with identical settings, so weird things are bound to happen if that's how it acts.

Whisper has no problem with guessing and it will make things fit with the context when it does.

The models have obviously been trained on YouTube videos, since a sub will often end with "Thank you for watching", often enough that it's blacklisted in the colab people use.

A funny one I noticed was when I did a DTS audio test with the movie 10 Things I Hate About You. It failed spectacularly on the songs at the beginning, so I tried the different large models to see how they'd differ, and that's only transcribing, no translation required. It's a song, so the lines don't make a ton of sense as a conversation, and it's hard to say whether it really tries to adapt or just outputs what it thinks it hears, but some sentences are definitely reformed to make sense. (large and large-v2 are the exact same model in that test, and only large-v1 got the first line right. I think I shared this in some form in the past but can't find it; that was made 2 days after Christmas over a year ago):

I do want to give it a shot for a JAV where it seemed to have done a decent job if I mix v2 and v3 but it's kind of low in the priority list so I haven't gotten to it yet.
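
For anyone curious what that kind of blacklist does, here is a rough sketch in Python. It is not the colab's actual code: the function name `drop_blacklisted_cues` and the one-entry blacklist are made up for illustration, and it assumes a plain, well-formed SRT file. It drops any cue whose whole text matches a blacklisted phrase and renumbers what is left.

```python
import re
from typing import Iterable, List, Tuple

# Hypothetical blacklist; the only phrase taken from the thread is the first one.
HALLUCINATION_BLACKLIST = {"thank you for watching"}


def _normalize(text: str) -> str:
    """Lowercase the text and strip punctuation so 'Thank you for watching.' matches."""
    return re.sub(r"[^\w\s]", "", text).strip().lower()


def drop_blacklisted_cues(
    srt_text: str, blacklist: Iterable[str] = HALLUCINATION_BLACKLIST
) -> str:
    """Remove SRT cues whose entire text is a blacklisted phrase, then renumber."""
    phrases = {_normalize(p) for p in blacklist}
    blocks = [b for b in re.split(r"\n\s*\n", srt_text.strip()) if b.strip()]

    kept: List[Tuple[str, List[str]]] = []
    for block in blocks:
        lines = block.splitlines()
        if len(lines) < 3:
            continue  # skip cues without index, timing and at least one text line
        timing, text_lines = lines[1], lines[2:]
        if _normalize(" ".join(text_lines)) in phrases:
            continue  # whole cue is a known hallucination
        kept.append((timing, text_lines))

    out = [
        "\n".join([str(index), timing, *text_lines])
        for index, (timing, text_lines) in enumerate(kept, start=1)
    ]
    return "\n\n".join(out) + "\n"
```

Matching on the whole cue text (rather than a substring) is a deliberate choice here, so a real line that merely contains the phrase is left alone.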

Taako · Apr 6, 2024

SamKook said:
I haven't used it a lot and don't really check the result when I do(also haven't watched other peoples sub for more than a few lines) so I don't think I've noticed multiple sentences changed like that, but it does not surprise me at all. It's always changing the output when you run it again with identical settings so weird things are bound to happen if that's how it act.

Whisper has no problem with guessing and it will make things fit with the context when it does.

The models have obviously been trained on youtube videos since a sub will often end with "Thank you for watching", enough that's it's blacklisted in the colab people use.

A funny one I've noticed is when I did a DTS audio test with the movie 10 things I hate about you and tried the different large models when it failed spectacularly on the songs at the beginning to see how they'd differ and that's only transcribing, no translation required. It's a song so the lines don't make a ton of sense as a conversation so it's hard to say if it really tries to adapt or just outputs what it thinks it hears, but some sentence are definitely reformed to make sense. (large and large-v2 are the exact same model in that test and only large-v1 got the first line right. I think I shared this in some form in the past but can't find it, that was made 2 days after christmas over a year ago):
View attachment 3463840

I do want to give it a shot for a JAV where it seemed to have done a decent job if I mix v2 and v3 but it's kind of low in the priority list so I haven't gotten to it yet.

Very interesting analysis. I guess I might have been right after all.

I was checking other people's subs and noticed how it was actually changing dialogue.
Thanks.

mei2 · Apr 7, 2024

Taako said:
Very interesting analysis. I guess I might had been right after all
I was checking other people subs and noticed how it was actually changing dialogue.
Thanks.

In short: Absolutely, you're spot on – and that's no hallucination (pun intended).

In long (I started writing this and it ended up way longer than I wanted):

Blame it on temperature. What makes Whisper (or any similar AI model) go wild is the "temperature" parameter. The temperature parameter basically guides how freely the model can make guesses. It can go from conservative (temperature = 0) to wild, wild guesses (temperature = 1). By default, Whisper starts each audio segment at temperature 0 and raises it step by step toward 1 whenever the result fails its quality checks (too repetitive or too low-confidence).

Unfortunately, many users just use the default settings, which are definitely not a good fit for JAV subtitling. That's why you come across wild narratives. A better setting would be to set the temperature to 0 in the parameter settings.
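
To make the temperature idea concrete, here is a toy sketch in plain Python. It is not Whisper's code; the function name and the example scores are made up. It only shows the general mechanism: the model's raw scores are divided by the temperature before being turned into probabilities, so a low temperature concentrates almost everything on the top guess, and temperature 0 is treated as "always pick the top guess".

```python
import math
from typing import List


def softmax_with_temperature(logits: List[float], temperature: float) -> List[float]:
    """Turn raw scores into probabilities, sharpened or flattened by temperature."""
    if temperature <= 0:
        # Temperature 0 is treated as greedy decoding: all probability on the best score.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]

    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up scores for three candidate words.
scores = [2.0, 1.0, 0.1]
for t in (0.0, 0.5, 1.0):
    print(t, [round(p, 3) for p in softmax_with_temperature(scores, t)])
```

The higher the temperature, the more often a less likely word gets sampled, which is where the wilder guesses come from.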

In my experience, a balanced combination of settings to increase accuracy and reduce hallucination at the same time would be:

"temperatures": [0.0],
"condition_on_previous_text": False,
"patience": 2,
"beam_size": 2,
"hallucination_silence_threshold": 1.5,

Also, I strongly recommend opting for large-v2. Here is a recent benchmark to compare some of the models out there. As many have noticed, large-v3 just sucks. I hope OpenAI releases v4 in a couple of months to address the shortcomings of v3.

PS. Some people have reported that large-v1 is a better choice for noisy backgrounds and cross-talk situations.
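
For anyone running Whisper from Python instead of a colab, the settings above might be passed roughly like this. This is a sketch assuming the reference `openai-whisper` package (`pip install openai-whisper`); key names differ between tools (that package calls it `temperature`, not `temperatures`), and the helper names, the `"ja"` language and the `"translate"` task are my own assumptions, not part of the settings listed above. In that package `hallucination_silence_threshold` only takes effect when `word_timestamps` is on, so it is enabled here.

```python
from typing import Any, Dict


def build_transcribe_options(language: str = "ja", task: str = "translate") -> Dict[str, Any]:
    """Collect the settings from the post as keyword arguments for openai-whisper."""
    return {
        "language": language,                    # assumption: Japanese source audio
        "task": task,                            # assumption: translate to English
        "temperature": 0.0,                      # no fallback to higher temperatures
        "condition_on_previous_text": False,     # don't feed earlier output back in
        "patience": 2,
        "beam_size": 2,
        "word_timestamps": True,                 # needed for the threshold below
        "hallucination_silence_threshold": 1.5,
    }


def transcribe_file(audio_path: str, model_name: str = "large-v2") -> dict:
    """Hypothetical helper: load a model and transcribe one file with those settings."""
    import whisper  # imported here so the options can be inspected without it installed

    model = whisper.load_model(model_name)
    return model.transcribe(audio_path, **build_transcribe_options())
```

Calling `transcribe_file("movie_audio.wav")` (the file name is just a placeholder) would return a dict whose `"segments"` entries carry the start time, end time and text for each subtitle line.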

Taako · Apr 7, 2024

mei2 said:
In short: Absolutely, you're spot on – and that's no hallucination (pun intended).

In long: (I started writing this and it ended up way much longer than I wanted ):

Blame it on temperature. What makes Whisper (or any similar AI model) to go wild is the "temperature" parameter. The temperature parameter basically guides how freely the model can make guesses. It can go from conservative (temperature = 0) to wild wild guesses (temperature = 1). The default settings of Whisper is to incrementally increase temperature from 0 to 1 as it processes each audio segment.

Unfortunately many users just use the default settings, which deffinitely is not a good fit for JAV subtitling. That's why you come across wild narratives. A better settings would be to set the temperature to 0 in the parameter settings.

In my experience A balanced combintaion of settings to increase accuracy and to reduce hallucination at the same time would be:

"temperatures": [0.0],
"condition_on_previous_text": False,
"patience": 2,
"beam_size":2,
"hallucination_silence_threshold":1.5,

Also I strongly recommend opting for large-v2. Here is a recent benchmark to compare some of the models out there. As many have noticed, large-v3 just sucks. I hope OpenAI releases v4 in couple of months to address the shortcomings of v3.

PS. some people have reported that large-v1 is a better choice for noisy background and cross-talk situations.

Looks like you've been keeping track of what might be better if someone uses Whisper.
Maybe if the tech continues to improve I will go back to using it.
