While I appreciate the feedback, I'm not sure how you're expecting me to do that for the approximately 4K+ subtitles that are alternate versions of the same movie. This is an automated process, so I can't just arbitrarily pick the larger version since, as you pointed out, it might be worse than the smaller one. And of course, I could only really judge the ~500 English files in this category.
There are also some examples of people releasing part A & B subtitles. Unfortunately, it is impossible to detect these automatically or to tell them apart from incomplete versions, and it was too difficult to preserve that relationship semantically. So, for scenarios like this, I think the safest policy is to delete only exact duplicates. I understand that people may need to look through multiple files, and there's no guarantee of quality or completeness, but I just don't have the time or ability to go through and clean up the entire collection by hand.
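
To make "delete only exact duplicates" concrete, here is a minimal sketch of one way it could be done. It is an illustration only, not the pipeline actually used for this collection. It assumes "exact duplicate" means byte-for-byte identical files, and the function names (`file_digest`, `find_exact_duplicates`) are hypothetical.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's raw bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def find_exact_duplicates(paths):
    """Split paths into (kept, duplicates).

    Assumption: two files are duplicates only if their bytes are identical.
    The first file seen (in sorted order) for each digest is kept.
    """
    seen = {}  # digest -> first path that had this content
    kept, duplicates = [], []
    for path in sorted(paths):
        digest = file_digest(path)
        if digest in seen:
            duplicates.append(path)
        else:
            seen[digest] = path
            kept.append(path)
    return kept, duplicates
```

With this approach, alternate versions, part A & B splits, and incomplete files all hash differently, so none of them are touched. Only files whose content is literally the same are flagged, which is why no quality judgment is needed.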