Below is a sample transcription from Whisper. I cleaned it up a little, mainly the pronouns such as I/you and he/she. I also transcribed the same movie in CapCut, and the result is similar in meaning. If anyone wants to check whether the transcriptions are correct, just message me and I will send you the file.
I believe the Whisper transcriptions are correct because CapCut produced output with a similar meaning for the same movie. The transcription also matches what is happening in each scene.
If Whisper’s transcription did not satisfy you, it may still have satisfied someone else. Just don’t expect the result to be close to a human transcription.
To those who are getting repeated lines: it happened to me too. What fixed it for me was running the same file again. The first run may have hit an error, since I am running Whisper through the web version. After the second run the lines no longer repeated, and you can compare your first result with the second one. I also convert the video file to MP3 at the highest volume the converter allows. I don’t know if this will work for you, but it worked for me.
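If you want to check a result for repeated lines before deciding whether to run the file again, something like the small Python sketch below could help. It is not part of Whisper or CapCut. The function names and the `min_repeats` threshold are my own assumptions, and it only assumes the transcript is a plain list of text lines.

```python
from itertools import groupby


def find_repeated_runs(lines, min_repeats=3):
    """Return (start_index, count, text) for each run of identical consecutive lines.

    min_repeats is an arbitrary threshold chosen for this sketch.
    """
    runs = []
    index = 0
    for text, group in groupby(line.strip() for line in lines):
        count = sum(1 for _ in group)
        # Ignore blank lines and short runs that could be genuine dialogue.
        if text and count >= min_repeats:
            runs.append((index, count, text))
        index += count
    return runs


def collapse_repeats(lines):
    """Keep one copy of each run of identical consecutive lines."""
    return [text for text, _ in groupby(line.strip() for line in lines)]


if __name__ == "__main__":
    # Hypothetical transcript lines, only for illustration.
    first_run = ["Hello.", "Thank you.", "Thank you.", "Thank you.", "Goodbye."]
    for start, count, text in find_repeated_runs(first_run):
        print(f"line {start + 1}: {text!r} repeated {count} times")
```

Note that `collapse_repeats` also removes repeats that were really spoken, so it is safer to use `find_repeated_runs` to spot the problem and then run the file again.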
The sample below comes from a 41-minute video, which produced 600+ lines.