So that ffmpeg command you listed is the one they recommend. I wouldn't know how to make that mono file with ffmpeg. For now I use Audacity to merge the channels and encode as variable-bitrate Ogg, which retains more information than MP3 at the same bitrate. I only use the web-based Whisper. If I tried to upload uncompressed files, I'd have to wait forever for the upload to finish, and I'd get messages about being inactive and get logged off. Uploading to Whisper is pretty slow.
Also, I tried Whisper with an ac3 file once but that didn't work. It couldn't convert that to wav.
I could try FLAC, since that's lossless, but the files will still be very big.
So, if I want to use Command Prompt or PowerShell to encode audio to mono with ffmpeg, what do I do?
Do I use your example? Open PowerShell in the ffmpeg directory and run:
ffmpeg -i input.mp3 -ar 16000 -ac 1 -c:a pcm_s16le output.wav
with input.mp3 changed to input.wav.
Will that work?
Done:
ffmpeg -i input.wav -ar 16000 -ac 1 -c:a pcm_s16le output.wav
I got a 250 MB file from a 1.18 GB file. Is that correct? It's been downsampled a lot, right?
Will this work better than a mono file from Audacity?
I'll try it, anyway.
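On the file-size question, a big shrink is expected: uncompressed PCM size is just sample rate × channels × bytes per sample × duration, and -ar 16000 -ac 1 -c:a pcm_s16le brings that down to 32,000 bytes per second. A rough sketch of the arithmetic in bash (the pcm_bytes helper is made up for illustration, and the 44.1 kHz stereo 16-bit source is only an assumed example, since the actual format of the 1.18 GB file isn't stated):

```shell
#!/usr/bin/env bash
# Size in bytes of uncompressed PCM audio (the small WAV header is ignored).
# Arguments: sample rate (Hz), channels, bits per sample, duration (seconds).
# pcm_bytes is a hypothetical helper, not part of ffmpeg.
pcm_bytes() {
  local rate=$1 channels=$2 bits=$3 seconds=$4
  echo $(( rate * channels * bits / 8 * seconds ))
}

# Output of -ar 16000 -ac 1 -c:a pcm_s16le: 16 kHz, mono, 16-bit.
pcm_bytes 16000 1 16 1    # prints 32000 (bytes per second)

# Assumed source format for comparison: 44.1 kHz, stereo, 16-bit.
pcm_bytes 44100 2 16 1    # prints 176400 (bytes per second)
```

At 32,000 bytes per second, 250 MB works out to a bit over two hours of audio, so the output size is plausible if that roughly matches the runtime of the file. The exact ratio against the original depends on the original's sample rate, bit depth, and channel count.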
In a bash shell I just loop through and convert all my MP4s to WAVs:
for f in *.mp4; do ffmpeg -n -i "$f" -ar 16000 -ac 1 -c:a pcm_s16le "../wav/${f%.mp4}.wav"; done
I've also noticed that some models sometimes work better than others, so if a movie is proving difficult I'll run a couple of different models, then merge the results and clean up in Aegisub.
I just pull models from here: https://huggingface.co/models?language=ja&other=whisper
An offset also helps sometimes. If the dialogue doesn't start for a while, the transcription can be poor, so if you know the dialogue starts 3 minutes in, trim those 3 minutes off the beginning of the file and then shift the SRT timestamps by the same amount in post-processing. At least for whisper.cpp there is an offset parameter that makes this easier.
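For tools without an offset option, the trim-then-shift approach could look roughly like this. It is a sketch, not a tested workflow: shift_srt is a made-up helper name, it only handles whole-second offsets and plain "HH:MM:SS,mmm --> HH:MM:SS,mmm" timing lines, and the file names in the usage comments are placeholders.

```shell
#!/usr/bin/env bash
# Shift every timestamp in an SRT stream by a whole number of seconds.
# Usage: shift_srt OFFSET_SECONDS < in.srt > out.srt
# (shift_srt is a hypothetical helper written for this example.)
shift_srt() {
  awk -v off="$1" '
    # Convert HH:MM:SS,mmm to milliseconds, add the offset, format it back.
    function shifted(ts,    p, ms) {
      split(ts, p, "[:,]")
      ms = ((p[1] * 60 + p[2]) * 60 + p[3]) * 1000 + p[4] + off * 1000
      if (ms < 0) ms = 0
      return sprintf("%02d:%02d:%02d,%03d", int(ms / 3600000), int(ms / 60000) % 60, int(ms / 1000) % 60, ms % 1000)
    }
    # Timing lines contain " --> "; everything else passes through unchanged.
    / --> / { print shifted($1) " --> " shifted($3); next }
    { print }
  '
}

# Example usage, assuming the dialogue starts 3 minutes (180 s) in:
#   ffmpeg -ss 180 -i input.wav -ar 16000 -ac 1 -c:a pcm_s16le trimmed.wav
#   ... transcribe trimmed.wav to trimmed.srt ...
#   shift_srt 180 < trimmed.srt > final.srt
```

Putting -ss before -i makes ffmpeg seek in the input, so the trimmed file starts at the 3-minute mark and the transcript's timestamps start from zero, which is why they need shifting back afterwards.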