The AI is just guesswork, so sometimes it thinks noise is people speaking and sometimes it thinks people speaking is noise. That can happen even if the speech is loud enough.
AI as we have it (which is machine learning, not actual intelligence, since it relies 100% on its training being accurate) will never be perfect, so manual editing/fiddling will always be required.
If you don't tell Avidemux to increase the volume, it just copies the audio as-is, which is the point of the demuxing part of my tutorial.
A problem with increasing the volume across the whole file is that it might make some parts too loud to be recognized as speech, so a boost should really only be used for specific portions.
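To illustrate boosting only one portion, here is a rough sketch in plain Python. It is not how Avidemux does it; `boost_range` is a made-up helper, and it assumes the audio is already decoded into a list of float samples between -1.0 and 1.0.

```python
def boost_range(samples, sample_rate, start_s, end_s, gain_db):
    """Return a copy of samples with gain applied only between start_s and end_s."""
    gain = 10 ** (gain_db / 20)  # convert decibels to a linear factor
    first = max(0, int(start_s * sample_rate))
    last = min(len(samples), int(end_s * sample_rate))
    out = list(samples)  # leave the input untouched
    for i in range(first, last):
        # Hard-limit to [-1.0, 1.0]; a gain that is too large still distorts here.
        out[i] = max(-1.0, min(1.0, out[i] * gain))
    return out


# Hypothetical example: 2 samples per second, boost only the second half by 6 dB.
quiet = [0.1, 0.1, 0.1, 0.1]
louder = boost_range(quiet, sample_rate=2, start_s=1.0, end_s=2.0, gain_db=6.0)
```

Everything outside the chosen range stays exactly as it was, so the parts that were already loud enough are not pushed into distortion.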
To get more speech recognized, play with the Whisper settings themselves. There are no perfect settings to always use, or they would be the defaults, so it depends on what you want. More speech recognized also means more noise recognized as speech.
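As a rough idea of what "playing with the settings" can look like, here is a sketch that assumes the openai-whisper Python package. The front end you use may name its settings differently. `more_permissive` and `run_whisper` are made-up helpers, and the step sizes are arbitrary starting points, not recommendations.

```python
# Default thresholds of openai-whisper's transcribe(), as far as I know.
DEFAULTS = {
    "no_speech_threshold": 0.6,
    "logprob_threshold": -1.0,
    "compression_ratio_threshold": 2.4,
}


def more_permissive(settings, no_speech_step=0.2, logprob_step=0.5):
    """Return a copy of settings that drops fewer segments as silence."""
    tuned = dict(settings)
    # A higher no_speech_threshold means fewer segments are treated as silence.
    tuned["no_speech_threshold"] = min(1.0, settings["no_speech_threshold"] + no_speech_step)
    # A lower logprob_threshold lets less confident text through as well.
    tuned["logprob_threshold"] = settings["logprob_threshold"] - logprob_step
    return tuned


def run_whisper(path, model_name="large-v2", **settings):
    """Translate the audio at path to English (assumes openai-whisper is installed)."""
    import whisper  # imported here so the helpers above work without it

    model = whisper.load_model(model_name)
    return model.transcribe(path, task="translate", **settings)


tuned = more_permissive(DEFAULTS)
# result = run_whisper("audio.wav", **tuned)  # "audio.wav" is a placeholder path
```

The trade-off is the one described above: the looser the thresholds, the more noise ends up in the output as if it were speech.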
From what I saw, large-v3 recognizes more speech than large-v2, but the quality of its translation doesn't seem to be as good, so an easy approach would be to use v3 to fill in what v2 misses.
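The "fill in what v2 misses" idea could be sketched like this. It assumes each subtitle is a (start, end, text) tuple with times in seconds; `fill_gaps` is a made-up helper and the sample lines are invented. You would still want to check the result by hand.

```python
def fill_gaps(primary, secondary):
    """Keep every primary segment; add secondary segments that overlap none of them."""

    def overlaps(a, b):
        # Two segments overlap if each one starts before the other ends.
        return a[0] < b[1] and b[0] < a[1]

    extras = [s for s in secondary if not any(overlaps(s, p) for p in primary)]
    return sorted(primary + extras, key=lambda seg: seg[0])


# Invented example: v3 caught a line at 5.0s that v2 missed.
v2 = [(0.0, 2.0, "Hello."), (10.0, 12.0, "See you.")]
v3 = [(0.1, 2.1, "Hi."), (5.0, 6.5, "Wait a moment."), (10.0, 12.0, "Bye.")]
merged = fill_gaps(v2, v3)
```

Wherever both models produced a line, the v2 text wins, and v3 only contributes lines that fall in the gaps.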
There is a thread that discusses tuning the settings that you may want to look at here: https://www.akiba-online.com/thread...age-an-intro-guide-to-subtitling-jav.2115103/