1. Restart your computer again and try again.
Hopefully someone can help me out here, as I'm tearing my hair out over this one. I've used pyTranscriber to get some basic translations/timestamps on a video, but after it worked once, every subsequent .srt file it churns out is blank. It produces .txt files just fine. Any suggestions?
I've written a notebook that combines Whisper with a separate VAD. It works much better than Whisper alone on long-form inputs, and also runs about 2-4x faster.
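For anyone wondering what putting a VAD in front of Whisper means in practice, here is a minimal sketch. It is not the notebook's actual code: a crude energy threshold stands in for a real VAD model, and the function name `speech_regions` and the parameters `frame_len` and `threshold` are made up for illustration. The idea is only to find the spans that contain sound, so that just those spans get passed to the transcriber.

```python
def speech_regions(samples, frame_len=4, threshold=0.1):
    """Return (start, end) sample-index pairs for spans that are loud enough.

    Hypothetical stand-in for a real VAD: a plain energy gate over
    fixed-size frames, using the mean absolute amplitude of each frame.
    """
    regions = []
    start = None
    for i in range(0, len(samples), frame_len):
        frame = samples[i:i + frame_len]
        energy = sum(abs(s) for s in frame) / len(frame)
        if energy >= threshold:
            if start is None:
                start = i  # a loud region opens at this frame
        elif start is not None:
            regions.append((start, i))  # region closes at this quiet frame
            start = None
    if start is not None:
        regions.append((start, len(samples)))  # audio ended while still loud
    return regions


# Toy signal: silence, a burst of "speech", silence again.
audio = [0.0] * 8 + [0.5, -0.4, 0.6, -0.5] * 2 + [0.0] * 8
print(speech_regions(audio))
```

A real setup would use a trained VAD and much longer frames; the point here is just the split-then-transcribe shape.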
Hi - how can you edit the audio file (mp3) with Audacity to get better transcription results? I mean, what do you edit or tweak so that it is better recognized by Whisper / pyTranscriber / or whatever software you use to transcribe?
It's like Maload said: the clearer the movie dialog, the better pyTranscriber works, and along with Audacity it's great. Remember, pyTranscriber will give you timing codes and some subs at the very least. The rest is up to you.
You can use either an .mp4 (audio/video file) or an .mp3 (audio file) with pyTranscriber. I think mp3 works best because with Audacity you can make the audio a little better.
And remember, 90% of JAV dialog is the same words/phrases.
You should be prepared to guess some scenes, as the music and other people talking are just too hard to understand... unless you know Japanese.
If it's girls-- and they're at school-- and it's just background talk-- then just say, Girl A: How was the exam? Or something like that. Girl B: Um, I didn't score well. Or something like that.
But look/listen for the main star(s) talking louder, OR whoever talks the loudest. That's your focus when everyone is talking at once. OR just skip it. Sometimes it's just better than stressing.
My simple advice is to increase the volume to a reasonable level, add a reasonable amount of bass, listen to it, then export to mp3, and then use that mp3 for pyTranscriber. I've used this a lot with some personal settings.
Thank you so much.
I've posted this before, but my posts got wiped out due to reasons (long story). I'm posting it again in case it helps anyone.
Essentially, it's a Japanese to English spreadsheet (csv format) of common words and phrases I learned when translating over a dozen titles. The Japanese phrases are written and sorted in romaji (Japanese written using English characters). There is the romaji word or phrase, a literal translation, an alternate translation (in case the phrase is a euphemism), and notes.
I'm far from literate in Japanese, so these are all based on my own research and best guesses based on context. If anyone spots flaws or inconsistencies, or wants to add their own words or phrases, let me know. Also, I translate lesbian titles, so the phrases are skewed in that direction.
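A spreadsheet like this can also be loaded by a script for quick lookups while translating. The sketch below is only an illustration: the column headers (`romaji`, `literal`, `alternate`, `notes`) are an assumption based on the description above, and the two sample rows are made up, not taken from the actual file.

```python
import csv
import io

# Made-up sample rows; the real spreadsheet's headers and contents may differ.
SAMPLE_CSV = """romaji,literal,alternate,notes
arigatou,thank you,,casual
daijoubu,all right,are you okay?,meaning depends on intonation
"""


def load_glossary(csv_file):
    """Map each romaji phrase to its row (a dict keyed by column name)."""
    return {row["romaji"]: row for row in csv.DictReader(csv_file)}


def lookup(glossary, phrase):
    """Return the alternate translation if there is one, else the literal one.

    Returns None when the phrase is not in the glossary.
    """
    row = glossary.get(phrase.strip().lower())
    if row is None:
        return None
    return row["alternate"] or row["literal"]


glossary = load_glossary(io.StringIO(SAMPLE_CSV))
print(lookup(glossary, "Daijoubu"))
print(lookup(glossary, "arigatou"))
```

To use a real file, pass `open("phrases.csv", newline="", encoding="utf-8")` instead of the `StringIO` object (the filename is a placeholder).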
Those are indeed some very common words in JAV.
I am currently using Whisper, an AI model that transcribes movie dialogue with more precision. It can be run on your own computer or, as in my case, remotely, that is to say without using my computer's resources. Some programmers have improved it further by adding technologies such as VAD, which greatly improves the quality of the dialogue lines. Even better, it does not produce cut-off or invented words that do not exist. Naturally it is not perfect, but today it is better than autosub or other systems. It also gives you the .srt file in the original language or, if you prefer, in English.
Long time lurker here who is immensely grateful for all of the great work that all you talented and hard-working JAV subtitlers have done! It really does add a new level of enjoyment when you actually understand what they are saying, especially in the drama-driven films which I particularly enjoy.
Now, I've been meaning to get off my butt and try my hand at attempting to make some subs for some of my favourite films and hopefully if all goes well, I'll share some of my work here. This is coming from a non-Japanese speaker so the challenges are quite numerous, as you would imagine. Thankfully, there is an abundance of AI-driven methods out there, with varying accuracy.
This takes me to the point of my post - and I apologise if it is something that has been mentioned already - which is whether anyone has tried to use Adobe Premiere Pro's Speech-To-Text captioning tools yet? If so, I'd be interested to know how it compares with other more popular or proven methods of creating captions from the speech of a video from scratch without hard or soft-coded subtitles already available?
I have only run a preliminary test (thankfully Japanese was one of the 13 languages supported), and it seems to work for the most part, but there is a lot of polishing to be done. Now, it only extracts the captions in the language spoken (it won't autodetect which language is spoken, you must specify) so the resulting transcription must be translated afterwards (for which I just use good ol' Google Translate).
So it looks promising, but I'm just currently racking my brain as to how to export the resulting transcription into a conventionally-recognised subtitle file format, short of painstakingly going through it to make the edits myself, so if anyone who is more versed in it can point me in the right direction, I'd greatly appreciate it!
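If the transcription can be pulled out as lines with start and end times, turning it into an .srt file is mostly a formatting job. Here is a minimal sketch of that step. It says nothing about Premiere's own export options; the `(start_seconds, end_seconds, text)` tuple layout and the function names are assumptions for illustration. The SRT layout itself is a cue number, a `HH:MM:SS,mmm --> HH:MM:SS,mmm` line, the text, and a blank line.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as the HH:MM:SS,mmm timestamp SRT uses."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(cues):
    """Build SRT text from (start_seconds, end_seconds, text) tuples."""
    blocks = []
    for number, (start, end, text) in enumerate(cues, start=1):
        timing = f"{srt_timestamp(start)} --> {srt_timestamp(end)}"
        blocks.append(f"{number}\n{timing}\n{text}\n")
    # Joining with a newline leaves one blank line between cues.
    return "\n".join(blocks)


cues = [(1.5, 3.0, "Konnichiwa"), (61.25, 63.0, "Arigatou")]
print(to_srt(cues))
```

Writing the result with `open("out.srt", "w", encoding="utf-8")` should give a file that common players and subtitle editors can open.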
Here is an example of what Whisper can achieve: [attachments 3119103 and 3119104]
Seems promising.
1. How does it handle multiple people talking at once?
2. What about background noise overriding the speaker? Most JAV has the inner dialogue of the actress/actor talking to themselves over music (piano or whatever), or a musical introduction between scenes.
3. Low talking: the actress talks softly, or the mic is too far away.
Thanks, I can't wait to hear the results.
Thanks for the heads up, I'll definitely check it out. I must say, though, that it does indeed look so much more polished than my rudimentary attempts with Adobe Premiere Pro.
Although, after reading @Taako's comments above, I realise I left out the part where Adobe automatically labels the transcription by speaker; the cast list is populated automatically and can be edited after the captions are finished.
However, its accuracy (transcription quality and recognition of different speakers) is something I've yet to assess properly, but it does give me an excuse to peruse my favourite JAV again lol!
I'll report back once I can get a more definitive answer to these questions, although the quality of the transcription would be better assessed by people whose Japanese is better than my more or less non-existent skills.
This brings me to another question about Whisper AI (which, I gather, is currently the gold-standard in automatic transcription) and that is whether it is competent at recognising slang or colloquial terms, not to mention the many expletives prevalent in such 'exotic' movies lol?
Hello. The Whisper artificial intelligence system is really new and it is open source, so people who have the knowledge can add improvements to it. For example, there is currently a version that adds VAD; I understand this system improves the audio, and combined with Whisper it has given me better results than any other similar program. Take the example that I uploaded: that film is recorded outdoors, with between 3 and 6 people talking at a time. You can also find the Colab on this page.
1. So does it work when multiple speakers are talking at the same time?
2. Does it work when the speaker is talking low?
3. Does it work if the sound quality of the movie is low?
4. What movie did you test it on?
5. Did you try a difficult movie with a group talking such as rctd-459. This has a sub...
but i remember there's scenes between "sisters" talking, while the "brother" have sex with mom. The brother and mom would talk as well.
In this movie, the mom and daughters is unaware. So the talking continues as if the brother is not there.
I wonder how Whisper would handle it?
Thank you. It sounds promising. I will wait to see how others might like it.
To answer your other question about the sound and the actress's voice: yesterday I transcribed a low-quality film, both audio and video (SD quality at 480). The actress speaks in a low voice in several of the scenes, and even so the Whisper+VAD program managed to obtain many clean lines of text. The AI is programmed to give you real text; it does not insert invented text or words that do not exist, and in any case it leaves some spaces without translation. Obviously it is not perfect: I have noticed that sometimes, mainly in sex scenes, it repeats some words when it was unable to detect the audio, but this was before using the version with VAD. In conclusion, it is the best system I have tried so far and it has left me satisfied, going from 200-line subtitles to 500+ lines depending on the movie. Most give me about that amount, others many more; the record is 1700 lines. The best thing is that this can only get better; the developers were at some point at the TESLA company.
And best of all, it's free.
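For the repeated words mentioned above, a small clean-up pass over the subtitle lines can help before editing by hand. This is only a sketch under assumptions: cues are taken to be `(start, end, text)` tuples, and the function name `collapse_repeats` and its `max_repeats` parameter are hypothetical, not part of Whisper or any VAD tool.

```python
def collapse_repeats(cues, max_repeats=1):
    """Keep at most max_repeats consecutive cues that share the same text.

    Texts are compared with case and surrounding whitespace ignored.
    """
    kept = []
    last = None  # normalised text of the current run
    run = 0      # how many cues in a row have had that text
    for start, end, text in cues:
        key = text.strip().lower()
        if key == last:
            run += 1
        else:
            last, run = key, 1
        if run <= max_repeats:
            kept.append((start, end, text))
    return kept


cues = [
    (0, 1, "Hai"),
    (1, 2, "Hai"),
    (2, 3, "hai "),
    (3, 4, "Arigatou"),
    (4, 5, "Hai"),
]
print([text for _, _, text in collapse_repeats(cues)])
```

Only back-to-back repeats are dropped, so a phrase that comes up again later in the scene is left alone.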