Post your JAV subtitle files here - JAV Subtitle Repository (JSP)★NOT A SUB REQUEST THREAD★

Thanks, SamKook. No, I don't know anything about code or scripts, really. I never used source separation on the old Colab because it never worked.
I'll see if I can use this. In the meantime I'll use the old Colab and Faster Whisper using the model Medium. The large model gives me SO many bad lines. I wonder why.
 
There are 3 versions of large.

From my limited testing, v3 seems to try to detect speech more aggressively, so it finds more but also makes a lot of mistakes, while v2 finds less but also makes fewer mistakes.
Not too sure about v1; I didn't notice anything particular about it. I've found comparing them to be mostly useless, since even running the exact same settings with the exact same model can give wildly different results, so beyond some general tendencies, what you get seems more like luck than anything else.
 
Are there parameters in Faster Whisper for using different versions? I didn't look that hard for command line parameters for Faster Whisper but what I found was just small, medium and large.
This is what I use:
faster-whisper-xxl.exe File.mp4 --language ja --task transcribe --model large --compute_type auto --device cuda --standard --print_progress --vad_alt_method pyannote_v3 --vad_min_silence_duration_ms 2000

The large model Faster Whisper uses is large-v3. Can I change that by downloading a different version? Do I need to change the command to use a different large model, or do I have to remove the folder from Models?
 
Also, the old colab seems to need some help. I get really short scripts from it compared to Faster Whisper. A 3 hr video gives me 228 lines in colab and 1044 in Faster Whisper with Medium.
 
"large" for the --model will default to whatever the version you're using set it to, but you can specify the specific version to choose it yourself, it's simply an alias that redirects to one of the version.

If I remember right, the correct spelling is "large-v2", where the number can be any of the 3, and it'll download the model if it doesn't already have it.

Never used faster whisper so I can't comment on it.
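Going by the command line you posted though, I'd assume just swapping the model flag is enough, something like this (untested on my end):
faster-whisper-xxl.exe File.mp4 --language ja --task transcribe --model large-v2 --compute_type auto --device cuda --standard --print_progress --vad_alt_method pyannote_v3 --vad_min_silence_duration_ms 2000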
 
Thanks. Yeah, I managed to find the command info here: https://github.com/Purfview/whisper-standalone-win/discussions/231
I can't seem to fix the Colab problem of recognizing too few lines. I've been told to upload mono tracks, so I'm using the recommended ffmpeg command line to convert stereo to mono:
ffmpeg -i input.wav -ar 16000 -ac 1 -c:a pcm_s16le output.wav

It always worked great.
Splitting the audio generated a few more recognized lines (228 → 263), but the result is still ridiculous compared to how it was about a year ago when I last used it.
 
Whisper converts the audio to that format internally, so doing it yourself doesn't change anything. It's only ideal to do it yourself if you modify the audio anyway, since WAV is a lossless format and you avoid degrading the audio with an extra lossy conversion.
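For reference, with the plain openai-whisper Python package the resampling happens in whisper.load_audio(), so a rough sketch like this (file and model names are just placeholders) feeds Whisper the same 16 kHz mono audio without a manual ffmpeg pass:
Code:
import whisper

# load_audio() shells out to ffmpeg and always returns 16 kHz mono float32,
# regardless of what the input file contains.
audio = whisper.load_audio("input.wav")

model = whisper.load_model("medium")
result = model.transcribe(audio, language="ja", task="transcribe")
print(result["text"])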

There was a version issue after something updated on the original colab everyone here was using, so one that forces the proper versions was made: https://colab.research.google.com/g...AV/blob/main/notebook/WhisperWithVAD_mr.ipynb

Not sure which you're using but that might change something if it's not that one.

How many lines it detects is a pretty bad indicator, since medium might simply split lines up more, or it might just make a ton of mistakes, but I assume you took a closer look at them when comparing.
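If you want a fairer comparison than raw line counts, a throwaway sketch like this (file names made up, using the same srt package the colab imports) shows how much speech time each file actually covers:
Code:
import datetime
import srt

def stats(path):
    # Number of subtitle events and the total duration they cover.
    with open(path, encoding="utf-8") as f:
        subs = list(srt.parse(f.read()))
    covered = sum((s.end - s.start for s in subs), datetime.timedelta())
    return len(subs), covered

for name in ("colab_output.srt", "faster_whisper_output.srt"):
    lines, covered = stats(name)
    print(f"{name}: {lines} lines, {covered} of covered speech")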
 
I'm afraid I don't understand the text about moving parts of the script in Colab.
The script is divided into
#@markdown **Run Whisper**
# @markdown Required settings:
# Generate VAD timestamps
# Add a bit of padding, and remove small gaps
# If breaks are longer than chunk_threshold seconds, split into a new audio file
# This'll effectively turn long transcriptions into many shorter ones
# Merge speech chunks
# Convert timestamps to seconds
# Run Whisper on each audio chunk
Lots of small edits to the above
# DeepL translation

# Write SRT file

I don't understand which 'block' to move and where.
Is it everything in the first block like this:

if "http://" in audio_path or "https://" in audio_path:
print("Downloading audio...")
urllib.request.urlretrieve(audio_path, "input_file")
audio_path = "input_file"
else:
if not os.path.exists(audio_path):
try:
audio_path = uploaded_file
if not os.path.exists(audio_path):
raise ValueError("Input audio not found. Is your audio_path correct?")
except NameError:
raise ValueError("Input audio not found. Did you upload a file?")
out_path = os.path.splitext(audio_path)[0] + ".srt"
out_path_pre = os.path.splitext(audio_path)[0] + "_Untranslated.srt"
if source_separation:
print("Separating vocals...")
!ffprobe -i "{audio_path}" -show_entries format=duration -v quiet -of csv="p=0" > input_length
with open("input_length") as f:
input_length = int(float(f.read())) + 1
!spleeter separate -d {input_length} -p spleeter:2stems -o output "{audio_path}"
spleeter_dir = os.path.basename(os.path.splitext(audio_path)[0])
audio_path = "output/" + spleeter_dir + "/vocals.wav"
print("Encoding audio...")
if not os.path.exists("vad_chunks"):
os.mkdir("vad_chunks")
ffmpeg.input(audio_path).output(
"vad_chunks/silero_temp.wav",
ar="16000",
ac="1",
acodec="pcm_s16le",
map_metadata="-1",
fflags="+bitexact",
).overwrite_output().run(quiet=True)
print("Running VAD...")
model, utils = torch.hub.load(
repo_or_dir="snakers4/silero-vad:v4.0", model="silero_vad", onnx=False
)
(get_speech_timestamps, save_audio, read_audio, VADIterator, collect_chunks) = utils

and move it to under
# Write SRT file

??
 
Yeah. It's a difference in file size: 16 kB vs 69 or 70 kB.
The settings for line division seem off in Faster Whisper; some lines are very long (12 seconds), but I guess I can edit that later.
 
You copy everything from the first step's code, which is this if you keep the defaults:
Code:
#@title Whisper Transcription Parameters

model_size = "large-v2"  # @param ["large-v3", "large-v2", "medium", "large"]
language = "japanese"  # @param {type:"string"}
translation_mode = "End-to-end Whisper (default)"  # @param ["End-to-end Whisper (default)", "Whisper -> DeepL", "No translation"]
# @markdown VAD settings and DeepL:
deepl_authkey = ""  # @param {type:"string"}
source_separation = False  # @param {type:"boolean"}
vad_threshold = 0.4  # @param {type:"number"}
chunk_threshold = 3.0  # @param {type:"number"}
deepl_target_lang = "EN-US"  # @param {type:"string"}
max_attempts = 1  # @param {type:"integer"}

#@markdown Enter the values for the transcriber parameters. Leave unchanged if not sure.
verbose = False #@param {type:"boolean"}
temperature_input = "0.0" #@param {type:"string"}
compression_ratio_threshold = 2.4 #@param {type:"number"}
logprob_threshold = -1.0 #@param {type:"number"}
no_speech_threshold = 0.6 #@param {type:"number"}
condition_on_previous_text = False #@param {type:"boolean"}
initial_prompt = "" #@param {type:"string"}
word_timestamps = True #@param {type:"boolean"}
clip_timestamps_input = "0" #@param {type:"string"}
hallucination_silence_threshold = 2.0 #@param {type:"number"}

#@markdown Decoding Options (for advanced configurations, leave unchanged if unsure):
best_of = 2 #@param {type:"number"}
beam_size = 2 #@param {type:"number"}
patience = 1 #@param {type:"number"}
length_penalty = "" #@param {type:"string"}
prefix = "" #@param {type:"string"}
suppress_tokens = "-1" #@param {type:"string"}
suppress_blank = True #@param {type:"boolean"}
without_timestamps = False #@param {type:"boolean"}
max_initial_timestamp = 1.0 #@param {type:"number"}
fp16 = True #@param {type:"boolean"}
# Parsing and converting form inputs
try:
    temperature = tuple(float(temp.strip()) for temp in temperature_input.split(',')) if ',' in temperature_input else float(temperature_input)
except ValueError:
    temperature = (0.0, 0.2, 0.4, 0.6, 0.8, 1.0)  # Default
clip_timestamps = clip_timestamps_input.split(',') if ',' in clip_timestamps_input else clip_timestamps_input
if clip_timestamps != "0":
    try:
        clip_timestamps = list(map(float, clip_timestamps)) if isinstance(clip_timestamps, list) else float(clip_timestamps)
    except ValueError:
        clip_timestamps = "0"  # Default if parsing fails
language = None if not language else language
initial_prompt = None if initial_prompt == "" else initial_prompt
length_penalty = None if length_penalty == "" else float(length_penalty)

assert max_attempts >= 1
assert vad_threshold >= 0.01
assert chunk_threshold >= 0.1
assert language != ""
if translation_mode == "End-to-end Whisper (default)":
    task = "translate"
    run_deepl = False
elif translation_mode == "Whisper -> DeepL":
    task = "transcribe"
    run_deepl = True
elif translation_mode == "No translation":
    task = "transcribe"
    run_deepl = False
else:
    raise ValueError("Invalid translation mode")


# Prepare transcription options
transcription_options = {
    "verbose": verbose,
    "compression_ratio_threshold": compression_ratio_threshold,
    "logprob_threshold": logprob_threshold,
    "no_speech_threshold": no_speech_threshold,
    "condition_on_previous_text": condition_on_previous_text,
    "initial_prompt": initial_prompt,
    "word_timestamps": word_timestamps,
    "clip_timestamps": clip_timestamps,
    "hallucination_silence_threshold": hallucination_silence_threshold
}
# Prepare decoding options
decoding_options = {
    "task": task,
    "language": language,
    "temperature": temperature,
    "best_of": best_of,
    "beam_size": beam_size,
    "patience": patience,
    "length_penalty": length_penalty,
    "prefix": prefix,
    "suppress_tokens": suppress_tokens,
    "suppress_blank": suppress_blank,
    "without_timestamps": without_timestamps,
    "max_initial_timestamp": max_initial_timestamp,
    "fp16": fp16,
}

And then put it at the start of the last step code block which is:
Code:
#@markdown **Run Whisper**
# @markdown Required settings:
audio_path = "/content/drive/MyDrive/test.wav"  # @param {type:"string"}
assert audio_path != ""


import tensorflow as tf
import torch
import whisper
import os
import ffmpeg
import srt
from tqdm import tqdm
import datetime
import deepl
import urllib.request
import json
from google.colab import files

if "http://" in audio_path or "https://" in audio_path:
    print("Downloading audio...")
    urllib.request.urlretrieve(audio_path, "input_file")
    audio_path = "input_file"

*rest of the code here*

So you end up with:
Code:
#@title Whisper Transcription Parameters

model_size = "large-v2"  # @param ["large-v3", "large-v2", "medium", "large"]
language = "japanese"  # @param {type:"string"}
translation_mode = "End-to-end Whisper (default)"  # @param ["End-to-end Whisper (default)", "Whisper -> DeepL", "No translation"]
# @markdown VAD settings and DeepL:
deepl_authkey = ""  # @param {type:"string"}
source_separation = False  # @param {type:"boolean"}
vad_threshold = 0.4  # @param {type:"number"}
chunk_threshold = 3.0  # @param {type:"number"}
deepl_target_lang = "EN-US"  # @param {type:"string"}
max_attempts = 1  # @param {type:"integer"}

#@markdown Enter the values for the transcriber parameters. Leave unchanged if not sure.
verbose = False #@param {type:"boolean"}
temperature_input = "0.0" #@param {type:"string"}
compression_ratio_threshold = 2.4 #@param {type:"number"}
logprob_threshold = -1.0 #@param {type:"number"}
no_speech_threshold = 0.6 #@param {type:"number"}
condition_on_previous_text = False #@param {type:"boolean"}
initial_prompt = "" #@param {type:"string"}
word_timestamps = True #@param {type:"boolean"}
clip_timestamps_input = "0" #@param {type:"string"}
hallucination_silence_threshold = 2.0 #@param {type:"number"}

#@markdown Decoding Options (for advanced configurations, leave unchanged if unsure):
best_of = 2 #@param {type:"number"}
beam_size = 2 #@param {type:"number"}
patience = 1 #@param {type:"number"}
length_penalty = "" #@param {type:"string"}
prefix = "" #@param {type:"string"}
suppress_tokens = "-1" #@param {type:"string"}
suppress_blank = True #@param {type:"boolean"}
without_timestamps = False #@param {type:"boolean"}
max_initial_timestamp = 1.0 #@param {type:"number"}
fp16 = True #@param {type:"boolean"}
# Parsing and converting form inputs
try:
    temperature = tuple(float(temp.strip()) for temp in temperature_input.split(',')) if ',' in temperature_input else float(temperature_input)
except ValueError:
    temperature = (0.0, 0.2, 0.4, 0.6, 0.8, 1.0)  # Default
clip_timestamps = clip_timestamps_input.split(',') if ',' in clip_timestamps_input else clip_timestamps_input
if clip_timestamps != "0":
    try:
        clip_timestamps = list(map(float, clip_timestamps)) if isinstance(clip_timestamps, list) else float(clip_timestamps)
    except ValueError:
        clip_timestamps = "0"  # Default if parsing fails
language = None if not language else language
initial_prompt = None if initial_prompt == "" else initial_prompt
length_penalty = None if length_penalty == "" else float(length_penalty)

assert max_attempts >= 1
assert vad_threshold >= 0.01
assert chunk_threshold >= 0.1
assert language != ""
if translation_mode == "End-to-end Whisper (default)":
    task = "translate"
    run_deepl = False
elif translation_mode == "Whisper -> DeepL":
    task = "transcribe"
    run_deepl = True
elif translation_mode == "No translation":
    task = "transcribe"
    run_deepl = False
else:
    raise ValueError("Invalid translation mode")


# Prepare transcription options
transcription_options = {
    "verbose": verbose,
    "compression_ratio_threshold": compression_ratio_threshold,
    "logprob_threshold": logprob_threshold,
    "no_speech_threshold": no_speech_threshold,
    "condition_on_previous_text": condition_on_previous_text,
    "initial_prompt": initial_prompt,
    "word_timestamps": word_timestamps,
    "clip_timestamps": clip_timestamps,
    "hallucination_silence_threshold": hallucination_silence_threshold
}
# Prepare decoding options
decoding_options = {
    "task": task,
    "language": language,
    "temperature": temperature,
    "best_of": best_of,
    "beam_size": beam_size,
    "patience": patience,
    "length_penalty": length_penalty,
    "prefix": prefix,
    "suppress_tokens": suppress_tokens,
    "suppress_blank": suppress_blank,
    "without_timestamps": without_timestamps,
    "max_initial_timestamp": max_initial_timestamp,
    "fp16": fp16,
}


#@markdown **Run Whisper**
# @markdown Required settings:
audio_path = "/content/drive/MyDrive/test.wav"  # @param {type:"string"}
assert audio_path != ""


import tensorflow as tf
import torch
import whisper
import os
import ffmpeg
import srt
from tqdm import tqdm
import datetime
import deepl
import urllib.request
import json
from google.colab import files

if "http://" in audio_path or "https://" in audio_path:
    print("Downloading audio...")
    urllib.request.urlretrieve(audio_path, "input_file")
    audio_path = "input_file"

*rest of the code here*
 
Thanks. But no, that's not how the default script on that address looked from what I remember.
https://colab.research.google.com/g...hisperWithVAD_pro.ipynb#scrollTo=sos9vsxPkIN7
I'll check back when I've finished what I mention below.
I'll try that script later.

The new/old colab link you sent me seems to work. It does take a lot more time than the old one, though. It took forever before it started checking the audio and dividing it (into 375 chunks). It's at 81% after 30 minutes. Well, it's a 3 hr video.

Thank you so much for your time. I won't hold you up any longer. Have a nice day.
 
I downloaded my adapted script and attached it here. Just upload it to your colab. It works slightly differently. Make sure to change this line in the code to the path where you store the audio: audio_folder = "/content/drive/MyDrive/Colab Notebooks/"
It will pick up the audio from there and run it.
 

Attachments


[Reducing Mosaic]JUR-007 Every Night, I Heard The Moans Of The Woman Next Door, And I Was Worried About It… ~A Sweaty Afternoon With A Frustrated Married Woman~ Rena Fukiishi​


jur007pl.jpg

I downloaded this recent, reduced mosaic JAV starring one of my favorite MILFs. I used Purfview's Whisper in Subtitle Edit (using model V2-Large) to create this Sub. I also attempted to clean it up a bit and re-interpreted some of the meaningless/ "lewd-less" dialog. Again, I don't understand Japanese so my re-interpretations might not be totally accurate but I try to match what's happening in the scene. Anyway, enjoy and let me know what you think.​

Not much of a storyline nor is it an Incest themed movie but It is Rena Fukiishi!​

 

Attachments

That is the link I copy-pasted the code from, so it should be identical. The first block's code (Whisper Transcription Parameters) is hidden by default, so you have to make it show up first.
Ahh! I see now. I was only looking at the parameters under Run Whisper.

So I put all that code before this?:

else:
    if not os.path.exists(audio_path):
        try:
            audio_path = uploaded_file
            if not os.path.exists(audio_path):
                raise ValueError("Input audio not found. Is your audio_path correct?")
        except NameError:
            raise ValueError("Input audio not found. Did you upload a file?")

Or do I remove the else: part since it refers back to
if "http://" in audio_path or "https://" in audio_path:
?
You see how little I know about scripts?
Thanks for the information.
 
Remove nothing, just put the whole thing at the beginning.

It's just variable declarations, so it only needs to be there and not overlap with another line.

There might be a better way to use the colab, but that's just my solution from not having looked into the issue for more than a few seconds.
 

[Reducing Mosaic]MEYD-090 Was Being × Up To Father-in-law While You Are Not … Ayumi Shinoda​

meyd090pl.jpg


I downloaded this recent, reduced mosaic JAV starring my favoritest MILF. I used Purfview's Whisper in Subtitle Edit (using model V2-Large) to create this Sub. I also attempted to clean it up a bit and re-interpreted some of the meaningless/ "lewd-less" dialog. Again, I don't understand Japanese so my re-interpretations might not be totally accurate but I try to match what's happening in the scene. Anyway, enjoy and let me know what you think.​

Not my preferred flavor of Incest JAV, but it was Ayumi Shinoda and it was pretty erotic.​

 

Attachments

[Reducing Mosaic]JFB-011 Four Hours I’m The Only Mom Of Six-friendly (Sons)

jfb011pl.jpg

I downloaded this recent, reduced mosaic JAV starring many of my favorite MILFs. I used Purfview's Whisper in Subtitle Edit (using model V2-Large) to create this Sub. I also attempted to clean it up a bit and re-interpreted some of the meaningless/ "lewd-less" dialog. Again, I don't understand Japanese so my re-interpretations might not be totally accurate but I try to match what's happening in the scene. Anyway, enjoy and let me know what you think.​

 

Attachments

Could you help me with the error message I get?
NameError                                 Traceback (most recent call last)
<ipython-input-6-1bc508f8c32b> in <cell line: 37>()
     35 out_path = os.path.splitext(audio_path)[0] + ".srt"
     36 out_path_pre = os.path.splitext(audio_path)[0] + "_Untranslated.srt"
---> 37 if source_separation:
     38     print("Separating vocals...")
     39     get_ipython().system('ffprobe -i "{audio_path}" -show_entries format=duration -v quiet -of csv="p=0" > input_length')

NameError: name 'source_separation' is not defined

The help in Whisper doesn't seem to make it clear what it is I'm supposed to do.
I have tried to just check the box at Source Separation but that doesn't do it.
Thanks.

This is a late reply and I see that you have found various workarounds already.
In case this happens again in the future:
- Keep source separation unchecked. Source separation extracts the vocals from the music in the audio using spleeter. You'd only need it on the rare occasions when the background music overpowers the dialog.
- I believe the error happens if the code execution has somehow been interrupted and cell 1 of the notebook needs to be re-executed. Just press the (thick) arrowhead by the first cell to get a green tick mark.
- I checked the code; it looks correct and should work. If you still see a problem, please send me a screenshot of the notebook.
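- If it happens again, a quick guard like this at the top of the run cell (just a sketch, not something the notebook currently has) would turn that traceback into a clearer message:
Code:
# Fail early with a readable message if the settings cell (cell 1)
# hasn't been executed in this session.
try:
    source_separation  # just checking that the name exists
except NameError:
    raise RuntimeError(
        "Settings not found. Run the first cell (Whisper Transcription Parameters) "
        "before running this one."
    )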

cheers