JAV database

princeali692

Jav is love... Jav is life...
Jun 29, 2012
355
394
Does anyone know roughly how much space is needed to store all the cover images on DMM? I've been wanting to get into programming as a career change, and I think a project like this would be great practice, although I doubt I could include it in a portfolio lol.
 

ldjb

ゴローさんの一番ファン
Jan 5, 2016
44
39
Well, we can come to a ballpark figure relatively easily.
Suppose the average size of a cover image is 290 kB. And suppose there are 270,000 such cover images that we want to store. These are both conservative figures but
Then it's a simple matter of 290 * 270,000 = 78,300,000 kB ≈ 76,465 MB ≈ 74.7 GB.
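A quick way to check this arithmetic, or to try other average sizes, is a tiny Python helper. The function name is made up for illustration, and it assumes binary units (1 MB = 1,024 kB, 1 GB = 1,024 MB), which is what the MB and GB figures above imply.

```python
def cover_storage_gb(avg_cover_kb: float, n_covers: int) -> float:
    """Total size in GB, using binary units (1 GB = 1,024 MB = 1,024 * 1,024 kB)."""
    return avg_cover_kb * n_covers / 1024 / 1024


print(f"{cover_storage_gb(290, 270_000):.1f} GB")  # 74.7 GB  (290 kB x 270,000 covers)
print(f"{cover_storage_gb(450, 270_000):.1f} GB")  # 115.9 GB (450 kB x 270,000 covers)
print(f"{cover_storage_gb(290, 90_000):.1f} GB")   # 24.9 GB  (290 kB x 90,000 covers)
```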
 

Casshern2

Senior Member...I think
Mar 22, 2008
7,029
14,510
They are more like under 500 kB. Maybe an average of 450 kB? What would the math be then?
 

ding73ding

Akiba Citizen
Oct 25, 2009
2,337
2,094
As for doubting you could include it in a portfolio: why the hell not? It's extremely unlikely you'll go to a job interview in the IT world and have the manager take issue with AV. But if you tell the manager the project took you more than 2 days, he might not respect your skillz.

Wait... 270,000 vids, as @ldjb estimated? That sounds a bit high. On an average day, how many new vids get released? 5-10? Not 30, I think... If we take 10 a day, then 270,000 vids is 74 years' worth of AV productions!
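The "74 years" figure comes from dividing the title count by a daily release rate. A minimal sketch, assuming a constant rate and 365-day years (the function name is made up for illustration):

```python
def years_of_releases(n_titles: int, per_day: float) -> float:
    """Years needed to accumulate n_titles at a constant daily release rate."""
    return n_titles / per_day / 365


for rate in (5, 10, 30):
    print(f"{rate:>2} per day: {years_of_releases(270_000, rate):.1f} years")
#  5 per day: 147.9 years
# 10 per day: 74.0 years
# 30 per day: 24.7 years
```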

I think we can pick some sensible cut-off points. To keep the project small, we might do only DVDs, which go back only to the late 90s? That is at most 90,000 vids, or about 25 GB. A more realistic guess on my part is 15 GB.

Even if you include VHS (and Betamax??), I'd still put a hard cap at 25 GB.

The thing that might throw a wrench in the works is that DMM lists multiple products for each vid (e.g. BD vs DVD vs a special edition that includes a panty or signed Polaroid, etc.), so if you mirror that, you could really inflate it back to @ldjb's 74 GB.

As for covers being more like 500 kB: nah... DMM's cover images are about 250-290 kB, but Amazon often uses a high-quality image (the resolution is excellent) that sometimes does push 500 kB. I generally use DMM's covers for my local database, not because of file size or low resolution, but because DMM's covers are used by almost everyone, so it's super fast to scrape them. Whenever there's a vid I really like, I go to Amazon to see if there's a high-res cover for it.
 

Casshern2

Senior Member...I think
Mar 22, 2008
7,029
14,510
WOW, yes, sorry, I stand corrected. I was thinking of the size of the thumbnails I make for posting things, not the covers; they are indeed mostly less than 200 kB or so.
 

ding73ding

Akiba Citizen
Oct 25, 2009
2,337
2,094
UPDATE: I've managed to implement the Japanese HTML pages too...
OK... I've come back to Python. The last time I was using Python (and the last time I was working on a JAV database) was 2014. A 4-year gap; I ought to congratulate myself.

Anyway... this time around, I followed my bad habit of ignoring my old code and starting from scratch. It took a surprisingly short time to rebuild the core functionality of my old project. So much easier this time, thanks to the fancy new modules the Python community has made.

OTOH, the competition is stiffer now. There are other, better projects already rolled out, and there are more websites out there (mostly profitable ones, I assume). So I feel it would be silly of me to try to compete.

So once more I'm building a new project for my personal needs, and I'm only mentioning it here because of one single feature that fellow JAV fans might be interested in: advanced search.

E.g. find all creampie vids by any idol from when she was under 21 years old. Yeah... that's kind of a crazy example, but the point is it's doable. (No... that particular search hasn't been implemented yet; I'm just throwing it out there to gauge the interest level.) But something like "all vids starring Julia but not MIDD/MIDE/PPPD" is very easy.
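Both kinds of query map naturally onto pandas boolean masks. The sketch below is illustrative only and is not the poster's code: the column names (code, performers, aired), the rows, the made-up product numbers, the performers "Idol A/B/C", their birthdates and both function names are hypothetical.

```python
import pandas as pd

# Made-up sample data: performers are stored as a list per vid.
vids = pd.DataFrame({
    "code": ["PPPD-000", "ABC-001", "MIDE-000", "XYZ-002", "XYZ-003"],
    "performers": [["Idol A"], ["Idol A", "Idol B"], ["Idol A"], ["Idol A"], ["Idol C"]],
    "aired": pd.to_datetime(["2017-10-01", "2016-05-01", "2015-03-01",
                             "2017-01-01", "2018-06-01"]),
})
# Made-up birthdates; performers without an entry never match the age filter.
birthdates = {"Idol B": "1996-01-01", "Idol C": "1995-01-01"}


def starring_excluding_labels(df, performer, excluded_labels):
    """Vids featuring `performer` whose label prefix (text before '-') is not excluded."""
    label = df["code"].str.split("-").str[0]
    has_performer = df["performers"].apply(lambda names: performer in names)
    return df[has_performer & ~label.isin(excluded_labels)]


def by_age_at_release(df, birthdates, max_age):
    """Vids where at least one performer was younger than `max_age` on the release date."""
    rows = df.explode("performers")  # one row per (vid, performer) pair
    born = pd.to_datetime(rows["performers"].map(birthdates))  # unknown -> NaT
    age_years = (rows["aired"] - born).dt.days / 365.25  # NaT -> NaN, never < max_age
    keep = rows.index[(age_years < max_age).to_numpy()].unique()
    return df.loc[keep]


print(starring_excluding_labels(vids, "Idol A", ["MIDD", "MIDE", "PPPD"])["code"].tolist())
# ['ABC-001', 'XYZ-002']
print(by_age_at_release(vids, birthdates, 21)["code"].tolist())
# ['ABC-001']
```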

As far as I know, no website (including of course Akiba) has yet offered advanced search on JAVs. So... any comments? Indifference? Encouragement?
 

R18.com

Well-Known Member
Jun 29, 2015
349
260
Well, on R18.com there is an ADVANCED SEARCH feature on the right. You can't do everything you want, but you can search for Julia titles from a particular studio and in one particular category. For example, you can do Julia titles in the Big Tits and Creampie categories, etc.

http://www.r18.com/videos/vod/movie...pc_top_vod_actresses_actress&i3_ref=recommend
 

ding73ding

Akiba Citizen
Oct 25, 2009
2,337
2,094
My bad! How did I miss that feature on R18. :oops:

BTW, that advanced search box also proved me wrong on the total number of JAVs. @ldjb was exactly right about there being more than a quarter million JAVs. So like I said... 10 new vids per day for 74 years gives you 270,000 vids. FREAK! Or, if you watched JAV 24 hours per day, it would take more than 50 years to finish watching every JAV once.

Almost forgot to post a screenshot: yes, it's a super basic HTML-based wall of new releases (rendered at 50%).
browse-10-8.jpg
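A "super basic HTML-based wall" like this needs little more than string formatting. A minimal sketch using only the standard library; the record keys (code, title, cover_url), the thumbnail width and the CSS grid layout are assumptions for illustration, not how the screenshot above was produced.

```python
from html import escape


def thumb_wall(vids, columns=5, thumb_width=200):
    """Return a bare-bones HTML page showing cover thumbnails in a grid.

    `vids` is a list of dicts with the (hypothetical) keys
    'code', 'title' and 'cover_url'.
    """
    cells = []
    for v in vids:
        # Escape everything so titles containing & or < can't break the markup.
        code, title, url = escape(v["code"]), escape(v["title"]), escape(v["cover_url"])
        cells.append(
            f'<figure><img src="{url}" width="{thumb_width}" alt="{code}">'
            f"<figcaption>{code} {title}</figcaption></figure>"
        )
    grid = f"display:grid;grid-template-columns:repeat({columns},1fr);gap:4px"
    return (
        '<!DOCTYPE html><html><head><meta charset="utf-8"></head><body>'
        f'<div style="{grid}">' + "".join(cells) + "</div></body></html>"
    )
```

Writing the result to a file, e.g. `pathlib.Path("wall.html").write_text(thumb_wall(rows), encoding="utf-8")`, and opening it in a browser gives a viewable wall.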
 

R18.com

Well-Known Member
Jun 29, 2015
349
260
Yeah! We do have more than 270,000 titles on the site, and we add around 1,500-2,000 new titles every month :p
 

Casshern2

Senior Member...I think
Mar 22, 2008
7,029
14,510
What language are you using for your personal-use program?
 

ding73ding

Akiba Citizen
Oct 25, 2009
2,337
2,094
It wasn't clear enough? I'm using Python. Compared to 4 years ago, the language is so much easier to develop in now (with the powerful community-contributed modules). I do have to spend a bit of time figuring out the new features and digging around for docs, since the modules aren't part of the core language and their docs aren't all in one location. But it's not too hard and well worth it. In case other coders who aren't using Python are interested, the new features are:
- native Unicode str (in Python 3, str is always Unicode; 8-bit data has its own bytes type) (wish I could get back all those hours wasted fighting Unicode Japanese chars in Java/Python etc.)
- pandas gives you Excel-like tables/databases independent of SQL (SQL support has been available for a long time, of course)
- BeautifulSoup4 is much more powerful and reliable than the first-gen HTML parser
- the new dev platforms are also a bit better than 4 years ago
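The Unicode point is easy to see in a Python 3 console: text is str, encoded data is a separate bytes type, and conversions are explicit. The sample strings are arbitrary; Shift_JIS appears only because some older Japanese sites serve pages in it.

```python
import unicodedata

title = "女優"  # "actress"; any Japanese text behaves the same way
print(type(title).__name__, len(title))  # str 2  (two characters, not bytes)

utf8 = title.encode("utf-8")
sjis = title.encode("shift_jis")  # legacy encoding used by some older Japanese sites
print(len(utf8), len(sjis))  # 6 4

# Bytes fetched from a web page must be decoded explicitly with the right codec.
assert sjis.decode("shift_jis") == title

# NFKC normalization folds full-width characters into their ASCII forms,
# handy for matching product codes typed in full-width.
print(unicodedata.normalize("NFKC", "ＰＰＰＤ－６０１"))  # PPPD-601
```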

There are more, but no one's asking...
 

Casshern2

Senior Member...I think
Mar 22, 2008
7,029
14,510
Sorry, I know 0 about Python. I'll look up a nice IDE for it so I can learn it.
 

OscarLewis

Member
Jan 11, 2018
65
14
Something like MusicBrainz for porn in general would rock. It would not even be that hard to make. Copy their source code and swap AcoustID (the audio fingerprint) for some sort of video fingerprint, then let the community submit the fingerprints and metadata. It would take time, money and effort, but it's totally possible if someone spearheads it.
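To give a flavor of what "some sort of video fingerprint" could mean, here is a toy sketch based on average hashing (aHash), a simple perceptual-hash technique for still images, applied per sampled frame. It is an illustration under heavy assumptions, not how AcoustID or MusicBrainz work: it presumes frames were already decoded and shrunk to 8x8 grayscale (that step would need a video library), and it is not robust to cropping, overlays or different start points.

```python
def average_hash(frame):
    """64-bit average hash of an 8x8 grayscale frame (8 rows of 0-255 ints)."""
    pixels = [p for row in frame for p in row]
    mean = sum(pixels) / len(pixels)
    bits = 0
    for p in pixels:
        bits = (bits << 1) | (1 if p > mean else 0)  # 1 = brighter than average
    return bits


def hamming(a, b):
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


def video_fingerprint(frames):
    """Sequence of per-frame hashes for frames sampled at fixed intervals."""
    return [average_hash(f) for f in frames]


def fingerprint_distance(fp_a, fp_b):
    """Mean Hamming distance over aligned frames (0 = identical, 64 = opposite)."""
    pairs = list(zip(fp_a, fp_b))
    return sum(hamming(a, b) for a, b in pairs) / len(pairs)


gradient = [[4 * (8 * r + c) for c in range(8)] for r in range(8)]
brighter = [[min(255, p + 10) for p in row] for row in gradient]  # e.g. a re-encode
inverted = [[255 - p for p in row] for row in gradient]  # a different picture
print(hamming(average_hash(gradient), average_hash(brighter)))  # 0
print(hamming(average_hash(gradient), average_hash(inverted)))  # 64
```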
 

ding73ding

Akiba Citizen
Oct 25, 2009
2,337
2,094
If you know 0 about Python, then you have to ask yourself why learn a new language. I try very hard to control the resources I spend on JAV (the most precious resource being time). I worked on my first JAV project because I had a work reason to use (learn) Python. Then I put it on hold (it was quite functional, but I found no partner to work on it with me or take it over), because at that time I stopped needing Python for work. Then the websites my old project depended on changed, and that broke my program. I banned myself from fixing it, because my time shouldn't be wasted on it. Now I'm back on Python for work, and my JAV project 2.0 is on.

If you do decide to try out Python, I'd be happy to share my experience, (limited) knowledge and even code. You need to learn core (pure) Python (including the standard modules os and re) and these modules: urllib, BeautifulSoup, pandas.
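As an example of the kind of job os and re handle, the sketch below pulls a normalized product code out of file names and indexes a folder by code. The regular expression is a hypothetical heuristic (label letters, optional separator, number); the zero-padded "pppd00601" form it accepts is an assumption about how some sites write codes, and the list of video extensions is also assumed.

```python
import os
import re

# Hypothetical heuristic: 2-6 label letters, an optional separator, then the
# number, allowing zero padding ("pppd00601") as well as "PPPD-601".
CODE_RE = re.compile(r"([a-z]{2,6})[-_ ]?0*(\d{2,5})", re.IGNORECASE)
VIDEO_EXTS = (".mp4", ".mkv", ".avi", ".wmv")  # assumed set of extensions


def normalize_code(name):
    """Return a product code like 'PPPD-601' found in name, or None."""
    m = CODE_RE.search(name)
    if m is None:
        return None
    label, number = m.groups()
    return f"{label.upper()}-{int(number):03d}"


def index_folder(root):
    """Map normalized code -> path for every video file under root."""
    found = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            if filename.lower().endswith(VIDEO_EXTS):
                code = normalize_code(filename)
                if code:
                    found[code] = os.path.join(dirpath, filename)
    return found


print(normalize_code("pppd00601"))  # PPPD-601
print(normalize_code("[FHD] PPPD-601.mp4"))  # PPPD-601
```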

For IDEs, there are several pro choices out there, but my current platform and recommendation for newbies is Enthought Canopy. It's not a pro choice because it lacks a (free) debugger, refactoring and project management. (If you know what refactoring means, go with PyCharm, Eclipse or Spyder; otherwise Canopy is plenty for you.) Canopy's huge advantage is that one single installation gives you everything you need to build a full JAV project (Python platform, IDE with interactive console, and all the modules you would need): not the way a pro might want it, but painless for a newbie. I also love it because its footprint is pretty small; I do a lot of JAV project development on a nano PC.

As for a MusicBrainz for porn: if I could make a video fingerprint, I would be making a high 6-figure income at Facebook or Netflix and banging hot chicks every weekend, instead of fapping to pirated JAV.
 

Casshern2

Senior Member...I think
Mar 22, 2008
7,029
14,510
How's the project coming along?
 

ding73ding

Akiba Citizen
Oct 25, 2009
2,337
2,094
I've finished the first phase of the back-end. I keep one slim database of about 14,000 vids with links to torrents from my favorite FHD release group. I also keep a "fat" database with each vid's info (the most important fields are performers, release date and genres); its scraper could potentially scrape, say, 50,000 vids in one go, but I don't want to unleash it unnecessarily. So far I've only scraped 200 vids released in the last few weeks.

I have yet to create a third database holding my own collection with my own tags (my own rating, watermark, storage HDD, etc.), and all three databases can be easily cross-referenced. I also haven't created the English "translation" part yet, so all the data in the databases is Japanese-only so far. I do have my own English/Japanese idol list and an English/Japanese genre translation list, so technically it's easy to write the program for it. I did it in 2014; now it should be much easier.
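A translation list like that can be as simple as a dictionary lookup that falls back to the Japanese original. The three entries below are a hypothetical excerpt, not the poster's list; the same pattern would work for idol names.

```python
# Hypothetical excerpt of a personal Japanese -> English genre list.
GENRE_EN = {
    "ドラマ": "Drama",
    "単体作品": "Featured Actress",
    "ハイビジョン": "High Definition",
}


def translate(terms, table=GENRE_EN):
    """Translate each term, keeping the Japanese original when there is no entry."""
    return [table.get(term, term) for term in terms]


print(translate(["ハイビジョン", "ドラマ", "未登録ジャンル"]))
# ['High Definition', 'Drama', '未登録ジャンル']
```

With pandas, the same function could be applied per row, e.g. `db["genres"].apply(translate)` (column name hypothetical).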

The only thing is, I've banned myself from spending too much time on it. Building a front-end in particular would be time-killing, and neither challenging nor fun. So for now, when I do a search I have to type in a command:
Code:
javlib.findvid(jbus, 'アナル', key='title').sort_values(by='aired').iloc[-5:]
In the database jbus, find all vids with 'アナル' (anal) in the title, sort them by release date (aired) and return only the last 5 results, i.e. the 5 most recent. One possible next step is to pass those results to another function that generates an HTML thumb wall. Yeah, I know it's totally unusable by anyone but me (I have a bunch of demo code to remind myself how to do things), but like I said, I'm not going to spend time making a pretty GUI if I'm the only user.
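javlib.findvid is the poster's own function and its source isn't shown, so the following is only a guess at equivalent behavior with pandas: a plain substring match on one column, then sort by release date and keep the last 5 rows. The tiny jbus table, its codes and titles are made up. Recent pandas versions sort by a column with sort_values(by=...); the older sort_index(by=...) form was deprecated and later removed.

```python
import pandas as pd


def findvid(db, text, key="title"):
    """Hypothetical stand-in: rows whose `key` column contains `text` as a plain substring."""
    mask = db[key].str.contains(text, regex=False, na=False)
    return db[mask]


# Made-up sample database indexed by product code.
jbus = pd.DataFrame(
    {
        "title": ["ドラマ 第1話", "ドラマ 第2話", "別シリーズ"],
        "aired": pd.to_datetime(["2018-06-01", "2018-07-01", "2018-05-01"]),
    },
    index=["ABC-001", "ABC-002", "XYZ-001"],
)

# Substring search, sort by release date, keep the 5 most recent rows.
latest = findvid(jbus, "ドラマ", key="title").sort_values(by="aired").iloc[-5:]
print(latest.index.tolist())  # ['ABC-001', 'ABC-002']
```

`.tail(5)` is an equivalent spelling of `.iloc[-5:]`.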
 

Casshern2

Senior Member...I think
Mar 22, 2008
7,029
14,510
Hey, I don’t blame you for making something just for yourself; that’s always 100% the easiest thing to do. No one asks you to add this or change that. None of the inevitable “It was working yesterday, now it’s not. HELP!” I’m more than half tempted to go the same route and just build something for my use only. In fact! Thank you for your post! I think I will do just that: build this for me alone. Sure, I’ve already started my HTA thread over there yonder, but so far only a few seem interested. And at least one of them looks like they can take it and change it as they please, so if anything, I might keep that thread up with update posts on how it is working for me. If anyone wants it, they can take any updates as-is. Well…that’s still not doing it just for me, in a way. Dammit!

I’m curious: when you say “databases”, do you literally mean you have three separate standalone databases? Or do you have one database with three separate tables to query from?

I took a look at a YouTube video by a guy who seems to teach programming courses. He’s from Montreal. Nice guy! I enjoy watching him. He says Python has really gotten popular on StackOverflow.com, and that says a lot in itself. But you are wise to say what you said. Why learn a new coding language when all I need is something I can do with what I already know? It won’t be great, won’t be popular, -10% scalable, but I should be able to get something to work for me. And that should really be the goal.
 

ding73ding

Akiba Citizen
Oct 25, 2009
2,337
2,094
I only started re-learning Python a month ago. I've been a lifelong Matlab (and a bit of C) guy. But I'm pretty sure I will transition to Python (I still have tons of Matlab code, so it will take forever to port all that legacy code to Python). Matlab and C have been dominant in my field (engineering), but lots of people in my field are transitioning to Python, and that doesn't happen every decade! It's also a pro choice for social science/commercial data science, so there are really powerful modules for databases. So I was not at all discouraging you from learning Python. I was only saying it would be a huge mistake to invest too much time (say, learning a new language) if the reason is JAV alone.

"Database" is almost an empty word in my project. I don't even know how to answer your "standalone" question. OK, each database has different rows (individual vids) and columns (fields such as title, actor, genre), so I guess yes, each is standalone. It's easy to cross-reference the two databases by product code (e.g. PPPD-601); if you don't know the code, you can search by any data field. Some of my methods require both databases to work.

I guess you asked a good question... I could merge the two databases into one. Python (pandas) has very smart methods for dealing with missing data (there would be a lot of missing data, since the two databases hold very different data), and it's easy and highly efficient to generate multiple "slices" or "views" of the same database, somewhat like your "one database with three separate tables to query from", only possibly even more efficient and powerful than what you have in mind.
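Merging the two databases as described is essentially one pandas join on the product code. In this sketch both tables, their columns, codes and values are hypothetical; an outer join keeps every vid from either side, fields missing on one side come back as NaN/NaT, and indicator=True records where each row came from, so "views" become simple filters.

```python
import pandas as pd

# Hypothetical slim database: vids that have a download link.
slim = pd.DataFrame({"code": ["ABC-001", "ABC-002"], "link": ["link-1", "link-2"]})
# Hypothetical fat database: scraped metadata keyed by the same product code.
fat = pd.DataFrame({
    "code": ["ABC-001", "XYZ-002"],
    "performers": ["Idol A", "Idol B"],
    "aired": pd.to_datetime(["2017-10-01", "2018-01-01"]),
})

# Outer join keeps every vid from either side; the _merge column says which side(s).
merged = slim.merge(fat, on="code", how="outer", indicator=True)

# "Views" are just filtered selections of the one merged table.
with_metadata = merged[merged["_merge"] != "left_only"]
missing_metadata = merged.loc[merged["_merge"] == "left_only", "code"]
print(missing_metadata.tolist())  # ['ABC-002']
```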

As for the third, not-yet-implemented database (for my collection), I'm putting it on hold indefinitely. Again... I should spend more time on life and/or career, not JAV. For at least the past 2 years, my viewing habits haven't really required managing my whole collection. I pretty much keep only the part of my collection from the past 6-12 months in one big folder, so there's no real need to build a database for it. Actually, I rarely watch anything more than 4 months old.
 