Yeah, mixed-up (often opposite) pronouns, both when characters talk about themselves and when they talk about others, are pretty much standard in every attempt I've made, hence my reluctance to publish them.
I never realised that changing the model_size would yield those kinds of variations. But my preference would be to have more dialogue picked up (albeit possibly inaccurately) than to have it missed altogether.
However, in addition to the mixed-up pronouns, about half the time my attempts produce subtitles containing artifacts that fill up the whole screen and make the movie unwatchable, which means I have to scan all my results and manually edit the ones that have them.
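For what it's worth, that scanning step could probably be partly automated. Here is a rough sketch, assuming the output is a standard .srt file and that the artifacts show up either as very long cues or as the same word looped over and over. Both of those are my assumptions about what the artifacts look like, and the thresholds and function names (`MAX_CHARS`, `MAX_REPEATS`, `flag_cues`, etc.) are made up for illustration, so tune them against your own files.

```python
import re

# Hypothetical thresholds; adjust them for your own subtitles.
MAX_CHARS = 120    # a cue longer than this probably fills the screen
MAX_REPEATS = 4    # the same word this many times in a row looks like a loop


def parse_srt(text):
    """Yield (index, timing, body) for each well-formed cue in an SRT string."""
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.splitlines()
        if len(lines) >= 3 and "-->" in lines[1]:
            yield lines[0].strip(), lines[1].strip(), " ".join(lines[2:]).strip()


def has_repeat_run(body, max_repeats=MAX_REPEATS):
    """Return True if any word occurs max_repeats or more times consecutively."""
    words = body.lower().split()
    run = 1
    for prev, cur in zip(words, words[1:]):
        run = run + 1 if cur == prev else 1
        if run >= max_repeats:
            return True
    return False


def flag_cues(text, max_chars=MAX_CHARS, max_repeats=MAX_REPEATS):
    """Return (index, timing) for every cue that looks like an artifact."""
    flagged = []
    for index, timing, body in parse_srt(text):
        if len(body) > max_chars or has_repeat_run(body, max_repeats):
            flagged.append((index, timing))
    return flagged
```

You would read each .srt file into a string, pass it to `flag_cues`, and then only open the files that return a non-empty list. It will not catch every kind of artifact, and it may flag legitimate long lines, so treat it as a shortlist for manual review rather than a fix.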
Notwithstanding these issues, and as brilliant a tool as Whisper is at the moment, artificial intelligence still cannot translate idioms across languages comfortably.
As a non-Japanese speaker, hearing them say "I want to sleep" in the throes of passion threw me off at first, but I kinda get the gist of it now. However, I'm still grasping at straws trying to replace it with an English alternative that sounds like a normal expression, like maybe "I'm in bliss" or something. Maybe others (who know Japanese) can help here.
That's just a problem with translations. Most people would rather have 10x or more subtitles for JAV out there than only that 10% with proper pronouns.
There is probably potential for a Whisper model with its weights tuned under the hood to handle sex talk or pronouns better, but that would be very difficult.