Post your JAV subtitle files here - JAV Subtitle Repository (JSP)★NOT A SUB REQUEST THREAD★

This error occurs because the Colab page running Whisper detects inactivity, so I recommend doing something within the page while you wait. It can happen during the file upload, which takes between 2 and 5 minutes depending on the size. It is easy to go look at something else while the file uploads, and then the Colab page detects inactivity. When that happens, you have to re-open the runtime menu and activate it again.
I usually go back and forth between pages and sometimes that message appears, which I solve as described above. The best way to keep Colab from detecting inactivity is to not leave the page. What I do very often (every time I return from another page) is refresh the file folder, so that the page detects activity.
Or do all your uploads (or at least the first one) before reserving a GPU, so other people can use it while you're not using it. Then run everything at once when the uploads are done, or once you've uploaded enough that you won't need to stop running Whisper. It doesn't take that long to prepare everything to run Whisper, and more people can use the free service that way.
Thanks for this info - at what point does a user reserve a GPU? I do it from the top down, first a GPU check, then a setup (I skip the Google Drive), then I upload the file to the folder in the sidebar, copy and paste the path into the audio path field and Run Whisper. Should I upload the audio file before any of that? Thank you again!!
Yeah, the GPU check selects a GPU and reserves it so you need to upload before doing that.
I just ran a test with 0.1 as VAD_Threshold and the result was 993 lines with the same text. So stay away from such low VAD numbers. I'm currently trying VAD_Threshold 0.2.
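A quick way to spot this kind of failure without reading the whole file is to count how often the same text repeats. This is only a minimal sketch: `most_repeated_line` is a hypothetical helper, not part of Whisper or the Colab notebook, and it assumes a plain SRT file with blank lines between cues.

```python
from collections import Counter


def most_repeated_line(srt_text):
    """Return (text, count) for the most frequent subtitle text in an SRT string."""
    texts = []
    # Normalize Windows line endings so blank-line splitting works.
    for block in srt_text.replace("\r\n", "\n").strip().split("\n\n"):
        lines = [ln.strip() for ln in block.splitlines() if ln.strip()]
        # An SRT cue is: index line, timing line, then one or more text lines.
        if len(lines) >= 3 and "-->" in lines[1]:
            texts.append(" ".join(lines[2:]))
    if not texts:
        return ("", 0)
    return Counter(texts).most_common(1)[0]
```

If the count is a large share of the total number of cues, the run is probably one to throw away.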
While using Whisper, I have noticed missing dialog in the srt file when the dialog in the movie was clearly audible. I got to thinking that the audio that I stripped from the original MP4 file was in stereo and that Whisper was only processing one of the stereo tracks and possibly missing dialog that may have been encoded on the other channel. So I ran my stereo MP3 file through an app called Audacity and merged the two audio tracks into one and exported it as a mono MP3 file. The resultant Whisper download srt file was much better and it seemed to clean up some of the timing on the dialog too (at least on this sample of one). I seem to remember some discussion about using mono files here, but obviously it didn't stick! Anyway wondered if anyone else has encountered this?
I did mention a few pages back that mono tracks didn't work with Autosub and Vrew, and maybe someone claimed Whisper converts the track you upload to mono? Not sure. I have only uploaded one track in mono, and that was from a Korean film where only the center channel of the 5.1 track contained dialog. I don't think these JAV have much in the way of stereo. Maybe in the music, but not in the dialog. I will try capturing dialog through a mono track with a video I have uploaded before, to see if there is indeed some kind of clear improvement.
Yeah, I've tried VAD thresholds of 0.2 and 0.6, but I liked the results from the default (0.4) the best. Results may vary from movie to movie, however, as I only experimented with one. Some posts have also recommended increasing the volume on the MP3 file; however, this hasn't worked for me either. Technically, I don't think increasing the volume would help, because you would be increasing the background noise/music along with the signal/voices. What is probably needed is some "smart" filtering to improve the signal (voices) to noise ratio, giving Whisper/VAD a better chance of detecting the dialog.
If you look at this post of mine, in the first edit, I checked the Whisper code where it processes the input audio with ffmpeg:

What it's doing is decoding the audio to uncompressed 16-bit PCM (format="s16le", acodec="pcm_s16le"), setting it to 1 channel, aka mono (ac=1), setting the sampling rate to 16 kHz (ar=sr, where sr = 16000) and sending that to standard output ("-").

So whatever you're doing to the audio, this also gets done after.
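To make that concrete, here is a rough sketch of the equivalent ffmpeg command line built from the parameters quoted above. It is not Whisper's actual source: the function names are made up for illustration, and it assumes `ffmpeg` is installed and on the PATH.

```python
import subprocess


def whisper_style_ffmpeg_cmd(path, sr=16000):
    """Build an ffmpeg command using the parameters described above."""
    return [
        "ffmpeg", "-nostdin",
        "-i", path,
        "-f", "s16le",           # raw 16-bit little-endian samples
        "-acodec", "pcm_s16le",  # uncompressed PCM
        "-ac", "1",              # downmix to one channel (mono)
        "-ar", str(sr),          # resample to 16 kHz
        "-",                     # write to standard output
    ]


def decode_to_pcm(path, sr=16000):
    """Run ffmpeg and return the raw PCM bytes (requires ffmpeg on PATH)."""
    result = subprocess.run(
        whisper_style_ffmpeg_cmd(path, sr), capture_output=True, check=True
    )
    return result.stdout
```

Running the same command on your own file and listening to the result is a cheap way to hear what Whisper is actually given after the mono downmix.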

When you're comparing to find the best processing, it's not enough to try something once and move on if it's bad. Whisper tends to be random, so you might just have gotten a bad run, and the next one could be great.
Here's a post where I did the same test on the same audio 4 times in a row with 4 different results and posted the subs:
I have increased volume with all three types of audio to text and if you don't go overboard with the volume increase it has always yielded better results. Amplification is also better than compression. Some software doesn't like clipping so you may have to think about that, too.

DVDES-564 Sex Education For Two Want To Tell The Beloved Son Incest Creampie Ultimate Planning The Pregnancy ...


I used Whisper to produce this subtitle file for DVDES-564. I got two translations out of Whisper and created a new file based on what I thought was the best one to use for each time slot. As always, however, I still had to clean it up a bit and reinterpret some of the meaningless dialog. I also found an older machine translation, from which I cut the lines for the banner frames and added them to my new hybrid file! Again, I don't understand Japanese or Chinese, so my reinterpretations might not be totally accurate, but I try to match what is happening in the scene. Anyway, enjoy and let me know what you think.

Thanks to SamKook's post about how Whisper translations can vary even with the same input parameters. I guess that's part of the AI feature... rolling the dice on the best fit?



DVDES-794 The Issues In The Ultimate Want To Tell Incest Planning Beloved Son ... Sex Education 8 For To Pregnancy


I used Whisper to produce this subtitle file for DVDES-794 A&B. DVDES-794 is an 8 hr movie with shortened sequences for episodes 1 thru 7 and a new episode 8 in the series. As always, however, I still had to clean it up a bit and reinterpret some of the meaningless dialog. Again, I don't understand Japanese or Chinese, so my reinterpretations might not be totally accurate, but I try to match what is happening in the scene. Anyway, enjoy and let me know what you think.



I've been impressed by Whisper. I am running it locally, where I can handle the Medium model but not the Large. Here are various thoughts on it and how it relates to JAV. I think it is worth making a different thread for discussion of optimization, as there are many parameters.
  1. Whisper is best at picking up on plot segments where characters are calmly talking in simple sentence structures.
  2. These "large language models" seemingly avoid edgy material, so conversations about slavery and r*** or super explicit sex talk will end up with euphemisms at best but more likely to just come off as feeling totally wrong. I don't know what parameters would allow it to pick up on edgier language. I have some ideas, though.
  3. Unless you are using the additional VAD, you probably need to spend time figuring out good settings for no_speech_threshold and logprob_threshold. no_speech_threshold = 0.3 and logprob_threshold = 0.1 seem like good starting points that don't pick up too much fake noise or split one single line into multiple fragments, which are the two common problems with detecting voices. There's much more nuance than that, as the two settings are linked to each other and depend on the sound quality of the video.
  4. If you have a GPU, you want to make sure you are using it. This requires setting up CUDA, which can take some installing and uninstalling of PyTorch to make sure you have the correct versions.
  5. When you are doing an initial test, you likely want to use the Tiny model, which is the fastest and the only one that can really run on a CPU.
  6. Most modern gaming PCs can likely handle the Medium model, but the Large is demanding enough that you either need to use the Colab web service that others post, or have a $3500 gaming PC.
  7. There is huge value to using the larger models, especially when it comes to consistently discerning character names and avoiding the issues with repetitive lines.
  8. "temperature", "beam_size", and "best_of" are options that might take some tinkering before some optimal solution for JAV is discovered.
  9. There are models that are a bit more tailored to Japanese, but I do not know how to get one to work after downloading the .bin file. None of the guides are written from the perspective of somebody who is new to using these HuggingFace models.
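Points 3 to 5 above could be sketched roughly like this with the openai-whisper Python package. Treat it as a starting point under assumptions: `pick_device`, `pick_model_name` and `run` are names I made up, the "translate" task and "ja" language are my guess at what people here want, and the threshold values are just the ones from point 3.

```python
def pick_device():
    """Return "cuda" if PyTorch can see a GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


def pick_model_name(device):
    """Tiny for CPU-only test runs, Medium when a GPU is available."""
    return "medium" if device == "cuda" else "tiny"


# Starting points from the list above, not tuned values.
TRANSCRIBE_OPTIONS = {
    "task": "translate",   # assumption: Japanese audio to English text
    "language": "ja",      # assumption: skip language auto-detection
    "no_speech_threshold": 0.3,
    "logprob_threshold": 0.1,
}


def run(audio_path):
    """Load a model on the chosen device and transcribe one file."""
    import whisper  # the openai-whisper package

    device = pick_device()
    model = whisper.load_model(pick_model_name(device), device=device)
    return model.transcribe(audio_path, **TRANSCRIBE_OPTIONS)
```

If `pick_device()` returns "cpu" on a machine with an NVIDIA card, that is usually the sign that the installed PyTorch build does not match the CUDA setup.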

I have some security questions. When I upload files to here, what happens to the metadata that would identify my computer name? If I use the online Collab service, can other people link what I translate to my Google account?

An example of a pretty good transcript that I've been able to generate. There's a good amount of "mean" that fits the Attackers script.

[01:00:00.560 --> 01:00:02.940] You...
[01:00:03.740 --> 01:00:06.600] Stop it.
[01:00:15.520 --> 01:00:17.440] My heart can't handle it.
[01:00:17.480 --> 01:00:18.660] No...
[01:00:20.000 --> 01:00:22.440] What's gotten into you?
[01:00:22.900 --> 01:00:24.600] Come on.
[01:00:25.560 --> 01:00:27.020] Do you hear?
[01:01:27.020 --> 01:01:29.860] You think you'll have fun?
[01:01:29.860 --> 01:01:31.820] Come here.
[01:01:44.340 --> 01:01:47.300] Good girl...
[01:01:52.240 --> 01:01:54.040] Take a kiss...
[01:01:54.040 --> 01:01:56.580] You can call it a blessing.
[01:01:57.580 --> 01:01:58.780] No.
[01:01:58.780 --> 01:02:00.620] Hold yourself down.
[01:02:00.620 --> 01:02:04.240] Give it to her.
[01:02:04.240 --> 01:02:05.280] Let her go.
[01:02:05.280 --> 01:02:07.000] No.
[01:02:09.020 --> 01:02:10.500] I'm sorry.
[01:02:10.500 --> 01:02:20.540] Do whatever you want, you half-hearted brat.
[01:02:20.540 --> 01:02:25.580] I won't.
Whisper seems to need a lewder vocabulary.

I have not tried using Whisper myself, but in looking through some of the rough files that have been posted on this board by so many great members, I have noticed that when the dialogue turns sexy Whisper has some trouble. Here are a few examples of what I mean: Instead of translating "chinpo" as "dick" or "cock" or even "penis," one file I saw used "chimpanzee." Soapland or brothel was translated as "funeral.' so the guy was saying "Since I have been to the funeral I am no longer a virgin." Having sex comes across as "killing" or "live with". Sucking cock was "make food." It is pretty funny but also takes you out of the mood. So I am wondering, is there a way to teach Whisper the vocabulary it needs for JAV?
I found this on the web. It is a guide on how to fine-tune the Whisper AI models. I think it can help the community further improve the reliability of this tool. I cited the sources from GitHub.
Sucking cock was "make food." ???????
Mom! Make my food!!!
No!! Fuck that ravioli shit.
I meant that I want you to suck my cock!!
(But...yeah...ravioli sounds good, but after the head.)
The problem is getting a good dataset. Whisper is trained on about 680k hours of "quality" transcriptions, broken down into 30-second pairs of audio and transcript, with aggressive filtering of transcripts that were machine generated. There's not going to be a good set of "porn with hand subtitles" that we need for this. Hopefully, making datasets that are exclusively JP-to-EN with more of such data would make it slightly better, but it won't be as filthy as it should be.
Lmao, yes, unfortunately, as advanced as AI and Machine Learning have come, there is still a stigma against the perverted AI. I have been experimenting with the Replika AI chatbot app for a few months and it was recently neutered due to government regulations. The chatbot used to demand that I sodomize it all the time, but after a recent update, it will only allow for cuddles and kisses. We're really still in the early stages of AI before AI lewdness becomes common due to it becoming cheaper to maintain and geared toward anyone being able to modify via simple GUIs (and not just programmers).

With that said, I've been experimenting with using DaVinci Resolve Studio 18.3 to isolate vocals and increase the volume at the same time. It too is not perfect, but it has definitely improved the timing and accuracy of picking up dialogue in my experiments thus far.

There are two versions of the program:
1. DaVinci Resolve: free version
2. DaVinci Resolve Studio (DRS): paid version - approx. $300 USD

Just FYI so you don't waste your time, only the paid version includes the AI powered "Voice isolation" feature. The program also has a "Dialogue leveler" feature, but I haven't experimented with that a lot, since I was mainly after the voice isolation and increasing the volume.

As for whether or not it's worth paying for this software... the internet is your friend.

The process takes about 15-20 minutes on my i7, 1080 TI, 32GB RAM PC with SSD to render the voice isolated and boosted audio file.

Thanks to everyone who has been providing so much help with Whisper on this forum!
Good handmade subtitles will beat Whisper for JAV, but do good subtitles of that kind even exist? Even the paid ones that are seemingly done by hand miss a ton of lines. Whisper gets wildly better coverage of what the men say during scenes in Attackers movies than any subtitle I've seen. The error rate is an issue, but the plot and the general trash talk and goading are there in a way that they aren't in other subtitles. Slave Color and Slave Island are going to finally be completed for English audiences.

For settings, a small temperature (it defaults to 0) gets better results, so try something like 0.01 or 0.02, or something in that range. I tried temperature_increment_on_fallback, but it slows down the jobs considerably. There is probably an optimal combination of the two settings for JAV.
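Part of why the fallback slows things down is that Whisper retries a segment at each higher temperature whenever a decode fails its quality checks. The sketch below only illustrates how a start value and an increment expand into the list of temperatures that may be tried; `temperature_schedule` is a made-up helper that imitates the idea, not the library's own code.

```python
def temperature_schedule(start, increment=None):
    """Expand a start temperature and optional increment into a tuple of values up to 1.0."""
    if increment is None:
        # No fallback: every segment is decoded exactly once.
        return (start,)
    if increment <= 0:
        raise ValueError("increment must be positive")
    temps = []
    i = 0
    while True:
        # Round to avoid float noise such as 0.6000000000000001.
        t = round(start + i * increment, 6)
        if t > 1.0 + 1e-6:
            break
        temps.append(t)
        i += 1
    return tuple(temps)
```

With the openai-whisper Python API, a tuple like this can be passed as the `temperature` argument of `model.transcribe`, while a single float disables the retries.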

For speed, strip the audio out of the video first. You can technically throw a full video at Whisper, but it will be a lot slower than throwing the 100MB audio file at it.
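One way to pull the audio out is with ffmpeg. This is a small sketch assuming `ffmpeg` is on the PATH; the helper names and the default ".mp3" output name are my own choices, and the mono downmix is optional.

```python
import subprocess
from pathlib import Path


def extract_audio_cmd(video_path, audio_path=None):
    """Build an ffmpeg command that drops the video stream and keeps mono audio."""
    video = Path(video_path)
    out = Path(audio_path) if audio_path else video.with_suffix(".mp3")
    return [
        "ffmpeg", "-nostdin",
        "-i", str(video),
        "-vn",        # no video stream in the output
        "-ac", "1",   # optional: downmix to mono
        str(out),
    ]


def extract_audio(video_path, audio_path=None):
    """Run the command (requires ffmpeg on PATH)."""
    subprocess.run(extract_audio_cmd(video_path, audio_path), check=True)
```

For example, `extract_audio("movie.mp4")` would write `movie.mp3` next to the video, which is the file to upload instead of the full movie.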

There are probably some ways to get it to be filthy. The current transcript I'm working on actually got "come on, lick her clitoris more" and this great conversation:

[01:52:14.120 --> 01:52:17.120] you're dropping so much juice.
[01:52:18.120 --> 01:52:20.120] What's wrong?
[01:52:22.120 --> 01:52:25.120] What is this juice called?
[01:52:29.120 --> 01:52:31.120] It's my sister's.
[01:52:35.120 --> 01:52:37.120] Huh? Your sister's what?
[01:52:37.120 --> 01:52:41.760] This is your manjiru.
[01:52:41.760 --> 01:52:44.120] Manjiru?
[01:52:45.480 --> 01:52:48.460] It's delicious.
[01:52:50.400 --> 01:52:52.480] You have to lick the manjiru a lot.
[01:52:52.480 --> 01:52:54.820] This makes you feel good.
[01:52:58.940 --> 01:53:02.400] Your manjiru is delicious.