May I know the average size of the audio extracted from a 1-hour video? When I convert a 1-hour video to mp3, it has an average size of 120MB. I wonder how the size of extracted audio compares to the mp3. Thank you for your explanation btw.
Depends entirely on the bitrate used to encode the original audio. What matters most is the quality of the audio. At equal size, between mp3, m4a (aac) and opus, opus will sound closest to the original, followed by aac, followed by mp3.
And if you re-encode the original audio to any of those 3 formats, you're going to lose some quality, which could affect the ability of Whisper to recognize voice (in practice it doesn't seem to have much effect, although it's hard to tell given the randomness of it).
That's why I recommend keeping the audio as-is and just separating it from the video. You get the best possible quality this way and, as a bonus, the size is usually smaller than if you use quality settings for mp3s.
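For what it's worth, here is a minimal sketch of "separating it from the video" without re-encoding. It assumes ffmpeg is installed and on your PATH, which the post above doesn't specify, so treat the tool choice as my assumption. The function names and file names are made up for illustration. The output extension has to suit the codec already in the video (for example .m4a for aac audio); that part is on you to check.

```python
import subprocess


def build_extract_command(video_path, audio_path):
    """Build an ffmpeg command that copies the audio stream as-is.

    Assumes ffmpeg is the tool in use (not stated in the original post).
    """
    return [
        "ffmpeg",
        "-i", video_path,   # input video
        "-vn",              # drop the video stream
        "-c:a", "copy",     # copy the audio stream, no re-encoding
        audio_path,         # container must be able to hold the source codec
    ]


def extract_audio(video_path, audio_path):
    """Run the command; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(build_extract_command(video_path, audio_path), check=True)
```

Something like `extract_audio("movie.mp4", "movie.m4a")` would then write the untouched audio track next to the video, assuming the source audio is aac.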
To answer your question more directly: these days, most HD releases carry either 192Kbps aac audio, which would be about 86MB per hour if I did my math right, or 128Kbps, which would be about 58MB per hour. Those are also the most common bitrates you'll encounter for audio from ripped movies.
Sounds like you're using 256Kbps for your mp3s, which would be around 115MB with the same math. That's bigger than most audio I've seen; ripped content rarely goes above 192Kbps (DVD/BluRay audio is much bigger).
I use 128Kbps opus for the quality version of my own encodes (which is of similar quality to your mp3 but only half the size) and 64Kbps for the small version.
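If you want to check the per-hour sizes yourself, the math is just bitrate times duration. A rough sketch, assuming a constant bitrate, decimal units (1 kbps = 1000 bits per second, 1 MB = 1,000,000 bytes) and ignoring container overhead; the function name is made up for illustration:

```python
def audio_size_mb(bitrate_kbps, duration_seconds=3600):
    """Approximate audio size in MB for a constant bitrate.

    Assumes 1 kbps = 1000 bits/s and 1 MB = 1,000,000 bytes,
    and ignores container overhead.
    """
    total_bits = bitrate_kbps * 1000 * duration_seconds
    return total_bits / 8 / 1_000_000


for kbps in (64, 128, 192, 256):
    print(f"{kbps} kbps -> {audio_size_mb(kbps):.1f} MB per hour")
```

That gives roughly 28.8, 57.6, 86.4 and 115.2 MB per hour for 64, 128, 192 and 256 kbps. Real files will differ a little, especially with variable bitrate encodes.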