The medium model would mess up the pronouns, was less accurate overall, and had HE/HIM all over the place. The large model seems to get the pronouns right, but on some titles it would miss large chunks of audio.
I think mixing the two gives the best results I've been able to automate...
Yeah, the mixed-up (opposite) pronouns when characters speak, both about themselves and about others, are pretty much standard in every attempt I've made, hence my reluctance to publish them.
I never realised that changing the model_size would produce that kind of variation. But I'd rather have more dialogue picked up (albeit possibly inaccurately) than have it missed altogether.
However, besides the mixed-up pronouns, about half the time my attempts would produce subtitles containing artifacts that filled the whole screen and made the movie unwatchable, so I would have to scan all my results and manually edit the ones that had them.
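If those screen-filling artifacts are the kind Whisper sometimes produces (a single cue stuffed with an over-long run of repeated text), a short script could at least narrow down which files need a manual look. Below is a rough sketch in Python, assuming the subtitles are saved as .srt files. The function names and the two thresholds are made up for illustration and would need tuning against your own output.

```python
import re
from collections import Counter

# Hypothetical thresholds, chosen for illustration only.
MAX_CHARS = 200          # a normal cue rarely needs more than a couple of lines
MAX_REPEAT_RATIO = 0.5   # share of words taken up by the single most common word


def parse_srt(text):
    """Split SRT text into (index, timing, body) tuples."""
    cues = []
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.splitlines()
        if len(lines) < 3:
            continue  # skip malformed or empty blocks
        body = " ".join(line.strip() for line in lines[2:])
        cues.append((lines[0].strip(), lines[1].strip(), body))
    return cues


def looks_suspicious(body):
    """Return True if a cue is very long or dominated by one repeated word."""
    if len(body) > MAX_CHARS:
        return True
    words = body.lower().split()
    if len(words) >= 6:
        top_count = Counter(words).most_common(1)[0][1]
        if top_count / len(words) > MAX_REPEAT_RATIO:
            return True
    return False


def flag_cues(text):
    """Return (index, timing) for every cue that looks like an artifact."""
    return [(idx, timing) for idx, timing, body in parse_srt(text)
            if looks_suspicious(body)]


if __name__ == "__main__":
    sample = (
        "1\n00:00:01,000 --> 00:00:02,500\nWhere are you going?\n\n"
        "2\n00:00:03,000 --> 00:00:09,000\nah ah ah ah ah ah ah ah ah ah\n"
    )
    print(flag_cues(sample))  # [('2', '00:00:03,000 --> 00:00:09,000')]
```

Running flag_cues over each file's contents would give a list of timestamps to jump to, instead of watching every movie through. It won't catch every kind of garbage, so treat it as a first filter rather than a replacement for checking.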
Notwithstanding these issues, and as brilliant a tool as Whisper is at the moment, artificial intelligence still cannot translate idioms across languages comfortably.
As a non-Japanese speaker, hearing them say "I want to sleep" in the throes of passion threw me off at first, but I kinda get the gist of it now. However, I'm still grasping at straws trying to replace it with an English alternative that sounds like a natural expression, maybe "I'm in bliss" or something. Maybe others who know Japanese can help here.