JAV database

CodeGeek

Akiba Citizen
Nov 2, 2010
5,180
1,866
I have always wanted to start something like http://themoviedb.org for JAV... The problem is I have zero programming skills.

You should have a look at it as it is perfect for what we want. It even has an API for easy automatic scraping of information with tools like XBMC or metadata managers like Ember Media Manager.

The API documentation is located here : http://docs.themoviedb.apiary.io/

My role model is AniDB. They also have several APIs, but only 2 are public: The HTTP API which is very restricted, and the UDP API which is more complex and difficult to use, but offers a bigger variety of functions.

But the documentation of the API you mentioned is very nice - better than the documentation of AniDB.

At the moment I'm not at the point where I can implement an API. I'm still fighting with the program itself as well as the update mechanism (the program has to distribute updates as well as receive change requests back from the users). But an API (or maybe a few APIs) is definitely part of my plan.

Thanks for the links. :gayprance:

wow ... is this thing still alive? I want to make something like this too.

Yes, it is still alive, although I haven't written a single line of code in the last 3 or 4 weeks because of - positive - personal matters.

Do you program?
 

Casshern2

Senior Member...I think
Mar 22, 2008
7,017
14,455
Good luck with this! I built a desktop app in VB6 back in the day but I scrapped it once I purged everything. It was kinda cool, nothing on this scale. Access backend, browser control, HTML creation on the fly so there was always only one *.html file instead of a page for every title. Now that I talk about it I miss it! Haha!
 

CodeGeek

Thanks, Casshern2. :gayprance: I'm planning to implement an interface so 3rd party programs can also use the data (maybe SOAP, some other kind of web service or simply a TCP based protocol). Maybe you can start developing again then.

Hope this project comes about. It would be great to know the actress names of obscure jav.

Thank you, ragemanger. :gayprance:
But I guess that aspect depends on how many people participate in this. I don't have any plans to maintain the data / content on my own. There are too many movies and too little time (on my side) for one person to do that alone. It's a community project.



By the way: Work is in progress. But I have already faced the first bigger problem (for some of us). To get the metadata from the movies I use the MediaInfo library. That library is really great. It only has one problem: It doesn't support ISO at this point. I read in some Internet forums that the author will include ISO support - when he gets paid for it. Here is the feature request in the request tracker: http://sourceforge.net/p/mediainfo/feature-requests/265/
 

CodeGeek

Still working on that project. But it seems I'm facing the next bigger problem. If my calculations are correct, the database will have a size of 17 or 18 GB and contain 90,000 movies. The problem is the size: I'm not sure how I should publish this amount of data. I thought I could upload it as attachments in this forum because I expected it to be only a few MB. But it is too big for that... (The same problem applies to change requests from the community.)

Now my strategy is to first complete an alpha version and then think about the distribution.
 

elgringo14

Survived to Japan
Super Moderator
Apr 28, 2008
9,092
339
If the data is plain text (ASCII), you can compress it with the usual ZIP/RAR formats at a compression rate of 80% to 90%, which would make it easier to distribute. :study:

But good luck with the compression time... :notagain:
 

CodeGeek

Currently my calculation is based on one ZIP archive (highest compression) per movie, containing the movie data as XML and the cover as JPG (the XML document has about 1 KB while the cover has about 160 KB). That means there is not much left to compress, as the JPG is already compressed.

Based on some comparison charts I found on the Internet, using gzip or bzip2 wouldn't result in a much smaller size.
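
The point about already-compressed covers can be checked with a small experiment. The sketch below is only an illustration under stated assumptions: random bytes stand in for a 160 KB JPEG cover (real JPEG data is not random, but it is similarly hard to compress), and the roughly 1 KB XML record with its element names is made up.

```python
import os
import zlib

# Stand-in for a JPEG cover (assumption): random bytes are close to incompressible.
cover = os.urandom(160 * 1024)

# Made-up XML record of roughly 1 KB; the element names are hypothetical.
xml = ("<movie><title>Example</title><studio>Example Studio</studio>"
       "<tag>example</tag></movie>\n" * 12).encode("utf-8")


def ratio(data: bytes) -> float:
    """Return compressed size divided by original size (zlib, highest level)."""
    return len(zlib.compress(data, 9)) / len(data)


cover_ratio = ratio(cover)        # close to 1.0: nothing left to squeeze out
xml_ratio = ratio(xml)            # far below 1.0: text compresses well
total_ratio = ratio(xml + cover)  # dominated by the cover

print(f"cover: {cover_ratio:.2f}, xml: {xml_ratio:.2f}, both: {total_ratio:.2f}")
```

Because the cover makes up almost all of each archive, the good ratio of the XML part hardly changes the total size.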
 

CodeGeek

Okay, the first step is done: I have the basic data of more than 87,000 movies (more than 16 GB of data). Unfortunately this data is not perfect as there are still some studios and labels missing (e.g. Sky High Entertainment, REDHOT Collection, ...). Also I haven't merged the performers yet - some perform under different aliases and it would be nice to have all these aliases linked to one performer.

So I hope I can release a first version in the course of next month.
 

iori11

Member
Nov 25, 2009
100
2
:hi: Do you know about this website? Is that similar to what you are trying to do? :puzzled:

javlibrary.com
 

CodeGeek

Similar, but not the same. I hope I can release an alpha version as soon as possible so it will be clearer what I'm planning to do. So stay tuned.
 

ding73ding

Akiba Citizen
Oct 25, 2009
2,337
2,092
Hey CG,

I started coding something for myself yesterday (got half a day off) and then today I decided to take another look online in case someone else had done it already (and like 1000x better than mine). And what a surprise, I bumped into your thread.

So... any update on your end?

Let me tell you about my very modest project.

1. Mission Statement: a standalone end-user app (not a website or server) that generates database entries for my media center software (XBMC)

2. Customer: ME FIRST. I'm not that good a developer and I'm super busy in real life, so I can't address everyone's needs. I am willing to share my work, but don't expect me to solve others' problems. That said, I'm happy if my work could be useful for others. I know Chinese, so I'm reasonably ok with kanji or even some kana... this point will come into play when designing my app.

3. Timeline: alpha version in 5 days of effort (which may turn out to be 3-30 days of real time, who knows), beta version in 15 days of effort.

4. Scope + functionality: respecting 2 and 3, the scope and functionality will be very limited: it's mainly to help me browse my vids when using XBMC. It should let XBMC browse/find vids by actor, DVD code, year, DVD title, genre, and studio. The DVD title in my case is my translation into English, which is sometimes based on the unofficial English title found on Akiba and elsewhere, but usually refined by myself (often with help from Google Translate). The genre will function more like tags (each vid will have many tags).

5. Input: again due to point 2, my input is very specific. Whenever I download a vid, I take a few mins to find the info in Japanese from DMM, Akiba or other sites (but DMM is getting harder to use from where I live) and put it into a text file. So my input is these text files; the standard format is the textual info copy/pasted from DMM (see bottom of post). Usually I edit the text a bit (removing some useless items such as Customers' Rating).

6. Output: XML formatted database entries or a whole database. It's apparently what IMDB provides and is compatible with SQLite, so many apps will be able to import it.

7. Translation: the app will translate some of the text by a dumb dictionary lookup method. Basically I don't trust any translator and translation out there, so my philosophy is translate those that I absolutely need and feel confident about, for the rest leave it as Japanese. So that means translate the headings (出演女優=actor,製品番号=DVD code etc), and genre (単体作品=solo work, 中出し=cream pie etc). The DVD title and plot summary will absolutely NOT be translated by this app. The actor names... not now, but it might be a future project, well past beta, if ever. I've compiled a simple dictionary file for this.

8. DVD covers: again my habit is to download a DVD cover and put in the same directory, I also manually crop a "poster" out of the DVD cover (usually from DMM) so XBMC presents a beautiful and visually distinguishable cover for each vid. My app will NOT do anything about the DVD covers up to beta release. Beyond beta there are some ideas... including portraits for the actors.

9. (FAR) Future: perhaps this app will attempt to scrape from DMM or other webshops/websites. If this app is to be useful to users other than myself, I imagine this will be kinda critical. Another idea is to add a database for actors, linking up their Japanese and English names, multiple aliases and perhaps portraits.

Here's where we might meet. As I understand it, you are writing a server-side app and have already compiled a huge database. Perhaps we can discuss whether our apps might interface? Or some of my work might prove useful to you?

Can't wait to hear your thoughts. Now back to coding...

温泉女将の誘惑 JULIA
Hot Spring Owner's Seduction Julia
発売日: 2013/09/13
収録時間: 120分
出演者: JULIA
監督: 朝霧浄
メーカー: 溜池ゴロー
ジャンル: 女将・女主人 寝取り・寝取られ 巨乳 人妻 単体作品 独占配信 ハイビジョン DVDトースター
品番: MDYD-821
夫の実家の旅館を立て直すべく女将として働くJULIA。ライバル旅館の女将に対抗するため、一肌脱ごうとCM撮影に臨むも、脱ぐのは一肌では済まなかった!番頭に犯され、性的接待を余儀なくされ、ついにはお客の前で夫と見せ物SEX!特別出演にHitomiを迎え、スケベ満載・超豪華温泉絵巻!
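
The input format from point 5 and the dictionary lookup from point 7 could be sketched roughly as follows. This is a minimal illustration, not the actual app: the heading dictionary is an assumption (the post only gives 出演女優=actor and 製品番号=DVD code as examples, so the entries for the headings used in the sample are guesses), the function name `parse_entry` is hypothetical, and the sample is shortened. Anything without a known heading (title, plot summary) is kept untranslated, in line with the philosophy above.

```python
# Hypothetical heading dictionary; the English names are assumptions for illustration.
HEADINGS = {
    "発売日": "release date",
    "収録時間": "runtime",
    "出演者": "actor",
    "監督": "director",
    "メーカー": "studio",
    "ジャンル": "genre",
    "品番": "DVD code",
}

# Genre dictionary; 単体作品=solo work is taken from the post, the rest stays Japanese.
GENRES = {"単体作品": "solo work"}


def parse_entry(text: str) -> dict:
    """Split 'heading: value' lines, translate known headings, keep the rest as-is."""
    entry = {"untranslated": []}
    for line in text.strip().splitlines():
        key, sep, value = line.partition(": ")
        if sep and key in HEADINGS:
            entry[HEADINGS[key]] = value.strip()
        else:
            # Title and plot summary are never translated by the app.
            entry["untranslated"].append(line)
    if "genre" in entry:
        # Genres work like tags: translate the known ones, leave the others alone.
        entry["genre"] = [GENRES.get(g, g) for g in entry["genre"].split()]
    return entry


SAMPLE = """温泉女将の誘惑 JULIA
発売日: 2013/09/13
収録時間: 120分
出演者: JULIA
メーカー: 溜池ゴロー
ジャンル: 人妻 単体作品 ハイビジョン
品番: MDYD-821"""

entry = parse_entry(SAMPLE)
print(entry["DVD code"], entry["genre"])
```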
 

CodeGeek

Hello ding73ding, :hihi:

my gosh, that's a long post. Your project sounds interesting - although I don't use XBMC myself. But it will be very useful to other people here. I'm keeping my fingers crossed that you will be able to implement all the things you have in mind.

Okay, maybe I should start at the beginning: I was always looking for something like AniDB.net for JAV movies. That site - and also the programs they offer - was very convenient back when I watched anime. Besides the information about anime they also offer a calendar which shows all the new releases. And they listed a lot of subtitle files and hard-subbed movies. In the JAV community there is nothing comparable to this. And the lack of such a thing may also be one of the reasons why there are not many subtitles available: the authors won't get any credit / fame. Others may even steal their subtitles.

So my first thought was to create something similar to AniDB.net. But there was the first problem: the web server. Of course you can get web servers for free nowadays. But their user agreements exclude porn. And I guess they also wouldn't be very happy about having hashes of JAV movies in their databases. Another problem is that they offer only PHP. To be honest: I really hate PHP. I have used a few programming languages so far, and PHP was one of the worst.

The next idea was to create a main database / main program and a client database / client program. The main database would hold the master data (performers, studios, labels, categories, tags, movies, hashes, etc.), while the local client database would hold the master data and additionally the user data (wishlist, bookmarks, etc.). The user can create a request for adding new data or modifying existing data. This request is stored separately from the master data and can be transmitted to the mods or the admin by uploading or sending a file, or as a base64-encoded message via e.g. a PM. The admin or mod then checks the change request. If he/she approves it, it is written to the main database. After some time an update file is generated which contains all the modified data for updating the clients. That file is made available via file hoster, post attachment, ed2k link and/or torrent file. The user has to download it; the program reads it and updates its local master data.
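
A change request that travels as a base64-encoded message could look roughly like the sketch below. The post only specifies base64; the JSON payload, the zlib compression step, the field names and the function names `encode_creq` / `decode_creq` are all assumptions made for illustration.

```python
import base64
import json
import zlib


def encode_creq(creq: dict) -> str:
    """Serialize a change request into an ASCII-only string that fits into a PM."""
    raw = json.dumps(creq, ensure_ascii=False, sort_keys=True).encode("utf-8")
    return base64.b64encode(zlib.compress(raw)).decode("ascii")


def decode_creq(message: str) -> dict:
    """Reverse encode_creq on the mod/admin side before reviewing the request."""
    raw = zlib.decompress(base64.b64decode(message))
    return json.loads(raw.decode("utf-8"))


# Hypothetical change request: field names are made up for this example.
creq = {
    "action": "modify",
    "entity": "movie",
    "id": "MDYD-821",
    "changes": {"studio": "溜池ゴロー"},
}

message = encode_creq(creq)
print(message)
print(decode_creq(message) == creq)
```

The encoded message contains only ASCII characters, so it survives copy and paste through a forum PM even when the payload contains Japanese text.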

The problem is that even that 2nd version is such a huge project that I have to break it down into very small steps, especially as I also have a private life. So what I'm currently doing is writing the database layer as well as some parts of the logic layer. I will first write a single user version for myself, but will keep in mind that it will be split into a master program and a client program later. That first program won't have the data synchronization feature. So I won't release it because that wouldn't make any sense.

The moment I split the program as well as the database I also plan to offer several interfaces. One is a plug-in interface which makes the program extensible without modifying the code of the core program. For use cases like yours I'm planning to implement a web service which offers access to - at least - the master data via HTTP. Responses will be available in XML and JSON. Maybe that will be one of the advantages of that approach in comparison to AniDB.net: Their interfaces block you if you send too many requests. But in my case the program is running locally on your PC or server. So there won't be any kind of flood protection and you can send as many requests as you want - until the program and/or the hardware collapses. I also want to offer that kind of service for the reason you already mentioned: If everyone is crawling websites, sites like DMM will have a heavy load. That's not good for these companies, and not good for other members of the community who use these sites without any additional program. And it is also not good for the ones like us who may also use crawlers, but share the information with others, because these companies will take countermeasures or even have to close. And that makes it even harder. Maybe I can prevent that from happening with that web service.

I guess that is also the part which would be interesting for you, isn't it? If you want I can sketch an example request-response (only a sketch because there is no actual code yet).
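
To make the XML/JSON idea a bit more concrete, here is a minimal sketch of how one master data record could be rendered in both formats. It is not the planned implementation: the request path in the comment, the field names and the function name `render` are hypothetical.

```python
import json
import xml.etree.ElementTree as ET


def render(record: dict, fmt: str) -> str:
    """Render one master data record as a JSON or XML response body."""
    if fmt == "json":
        return json.dumps(record, ensure_ascii=False)
    if fmt == "xml":
        root = ET.Element("movie")
        for key, value in record.items():
            ET.SubElement(root, key).text = str(value)
        return ET.tostring(root, encoding="unicode")
    raise ValueError(f"unsupported format: {fmt}")


# Hypothetical request: GET /movies/MDYD-821?format=json
record = {"code": "MDYD-821", "runtime": 120}

json_body = render(record, "json")
xml_body = render(record, "xml")
print(json_body)
print(xml_body)
```

A client such as a media center scraper would then only need to pick the format it can parse more easily.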

About your sample entry (MDYD-821): Good choice. One of my favourite performers as well as one of my favourite movies.
 

cyberzen

New Member
Apr 8, 2010
64
21
Hi, I am also a software developer thinking of starting a project like this, mainly for organizing my 4 TB JAV collection. I have a very simple Qt app that already does some of this, like tagging and video previews, but it's mainly for my own use case, with lots of stuff hard coded.

My ideal software would have these features:
1) Tagging
2) Tag normalization
3) Tag input via checkbox & autocomplete
4) Actress by scene
5) Actress normalization (aliases)
6) Actress input via checkbox & autocomplete
7) Actress autocomplete popup with name and profile pic
8) Request actress / video identification
9) Video Previews
10) Video Player
11) Browse files via thumbnails, filename, cover
12) Dockable interface, can arrange sub windows
13) Sync local database of tags to server
14) Tag ranking to filter out bad tags
15) Subtitles uploading (credit person who did subtitle)
16) Duplicate detection
17) Internationalization, program hardcoded text in an XML file
18) Recommendations
19) Comments
20) Filtering by tag, actress, partial name, producer (NHDTA)

Regarding tags and actresses for videos: a user will tag his local video and (optionally) sync that info to the server. The server will then rank the tags accordingly and show the top tags. Anyone who has used the Empornium tag system will be familiar with this. (Relevant tags will rank higher.)

Right now I'm saving the current info into an sqlite database:
1) Video Title
2) Video Description
3) Producer ie. NHDTA (Natural High)
4) Actress by scene
5) Tags
6) File hash
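
The six fields listed above could map onto an SQLite schema along these lines. This is only a guess at a possible layout, not the actual database of the app: table names, column names and the helper functions are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory database, just for the sketch
conn.executescript("""
CREATE TABLE video (
    id          INTEGER PRIMARY KEY,
    title       TEXT NOT NULL,
    description TEXT,
    producer    TEXT,           -- e.g. NHDTA
    file_hash   TEXT UNIQUE     -- the same hash cannot be stored twice
);
CREATE TABLE scene_actress (
    video_id INTEGER NOT NULL REFERENCES video(id),
    scene    INTEGER NOT NULL,
    actress  TEXT NOT NULL
);
CREATE TABLE video_tag (
    video_id INTEGER NOT NULL REFERENCES video(id),
    tag      TEXT NOT NULL,
    PRIMARY KEY (video_id, tag)
);
""")


def add_video(title, producer, file_hash, tags=()):
    """Insert one video with its tags and return the new row id."""
    cur = conn.execute(
        "INSERT INTO video (title, producer, file_hash) VALUES (?, ?, ?)",
        (title, producer, file_hash),
    )
    conn.executemany(
        "INSERT INTO video_tag (video_id, tag) VALUES (?, ?)",
        [(cur.lastrowid, tag) for tag in tags],
    )
    return cur.lastrowid


def videos_with_tag(tag):
    """Return the titles of all videos carrying the given tag, sorted by title."""
    rows = conn.execute(
        "SELECT v.title FROM video v JOIN video_tag t ON t.video_id = v.id "
        "WHERE t.tag = ? ORDER BY v.title",
        (tag,),
    )
    return [row[0] for row in rows]


add_video("Example A", "NHDTA", "hash-a", ["tag1", "tag2"])
add_video("Example B", "NHDTA", "hash-b", ["tag2"])
print(videos_with_tag("tag2"))
```

Keeping tags and actresses in their own tables makes filtering by tag or actress a simple join, and the UNIQUE constraint on the file hash gives a basic form of duplicate detection for free.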

BTW I was motivated to create this project by Mikuni Maisaki: I got all her videos that are listed on javlibrary.com, but I also found some of hers that were not listed there. I'm quite sure that many people on Akiba have had the same experience of wanting to find a similar actress / videos of the same genre, as evidenced by the many video recommendation threads.
 

cyberzen

Also, in lieu of a central server, maybe we can try running it as a P2P network. Not for the actual videos themselves but just for the metadata. Assuming we allow 10 KB per video, 1000 videos come to only about 10 MB of data. The bandwidth-intensive part would be the sharing of video covers.
 

CodeGeek

Hello cyberzen, :hihi:

thanks for taking your time and posting here.

Yes, it is a Swing application. My role model for the application is AniDB.net. I would also have developed it based on e.g. Struts 2. But the problem is that I would need a web server 1) which is able to act as a Servlet container, 2) which is free and 3) on which I'm allowed to host this information. So far I haven't found anything like that. So I decided to write a Swing application (which is not bad because web programming is not my world).

I use Java not only because it is the programming language I have been using for 15 years now. Thanks to it, it will also be possible to run the program on Windows, Linux and Mac OS X later. Even a port to Android devices is not that difficult, although I have some doubts about how useful or even realistic it would be there because of the amount of data.

I also plan to implement most of the features you mentioned. But a few I definitely don't plan to implement:
  • 8) Request actress / video identification
    I think that doesn't belong in a database, but in the forum here. The reason is simple: There are new videos each day. And each day old videos disappear (e.g. on Dirty Minded Wife Advent Vol.35 : Satomi Suzuki).
  • 9) Video Previews
    I think something like that takes too much space. I have a database containing all covers of the DMM movies of the DVD section. These are more than 80,000 - which doesn't include all available JAV movies, only the ones they are selling. And the covers take up more than 20 GB of space. If you also had a preview for each one, the size would explode. That's the reason why I think it is not a good idea.
  • 10) Video Player
    I plan to offer launching an external video player (like VLC). I also plan to create playlists for some of these. But I don't plan to include a player in my application.
  • 13) Sync local database of tags to server
    In my case that belongs to the issue of sending data (change requests / CREQs) to the server and getting updates from the server.
  • 14) Tag ranking to filter out bad tags
    Haven't thought about that yet. Maybe you can explain a little bit more in detail how you will implement it.
  • 15) Subtitles uploading (credit person who did subtitle)
    I plan to handle subtitles like I handle video files or video files with hardsubs. I don't have any special feature planned for this. For each file the user can define which group produced it (exactly like in AniDB.net). I hope that this will motivate people to upload / share movies without watermarks and in better quality, as well as to create subtitles without living in fear that someone will replace their authorship in the file and re-upload it.

Mikuni Maisaki / 舞咲みくに? That's her, right?

In my current plans I'm using a central database. All the master data will be held there (performers, movies, tags, recommendations, comments, ...). The local database is like the central database, but has additional user data (like the movies you have, their location, your bookmarks, what movies you want to see, which movies you already have seen, etc.).

I also thought about a P2P network for the data. As I don't have a server, it will be very tiring and inconvenient to upload your CREQs or send them by PM. And you also have to take care of getting the updates from the central database (via a file from the forum, a file hoster or P2P [eMule, torrent]). But so far I think it is the only solution.

Why? Because you can't transmit and hold structured data in a P2P network (like KAD / Kademlia).

Okay, there are also other forms of databases. Big SQL databases (like Oracle or IBM DB2) or IBM Lotus Notes are able to replicate data. In case of an SQL database it is more or less simple: If you do a read, you do it on your local database. But if you do a write, all databases will lock the record and write it simultaneously. If one database goes offline, you won't be able to write any records anymore.
In case of Lotus Notes it works a little bit differently. Here databases can go offline without any problem. Each record has a history of timestamps of when it was modified. If the same record gets modified in two different databases, Notes compares these timestamps and tries to bring them into an order. Example:
Code:
Database 1 - list of modifications of record A:
10:00
10:15
11:00
Database 2 - list of modifications of record A:
10:00
10:15
11:00
11:30
In that case database 2 contains the up-to-date record, and the record will be transferred from database 2 to database 1. But if the record in database 1 was also modified in the meantime, one of the two records will be saved as a "conflict document" and the other one will become the normal record. The user then has to look into it and fix it manually.
So these 2 approaches are also not very convenient or safe.
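
The timestamp comparison from the example can be sketched as follows. This is a simplified model of the idea described above, not the actual Lotus Notes algorithm: the function name and the return values are made up, and a history is just a list of modification times.

```python
def compare_histories(a: list, b: list) -> str:
    """Decide which replica holds the newer record, or report a conflict."""
    if a == b:
        return "in sync"
    if b[:len(a)] == a:
        return "b is newer"  # a's history is a prefix of b's: copy b to a
    if a[:len(b)] == b:
        return "a is newer"  # b's history is a prefix of a's: copy a to b
    return "conflict"        # both changed independently: manual fix needed


db1 = ["10:00", "10:15", "11:00"]
db2 = ["10:00", "10:15", "11:00", "11:30"]

print(compare_histories(db1, db2))              # database 2 has the newer record
print(compare_histories(db1 + ["11:20"], db2))  # both modified after 11:00
```

The second call models the "conflict document" case: both databases share the history up to 11:00 and then diverge, so neither record can simply replace the other.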



So there are now 3 people, including me, who are working on a JAV database: ding73ding, cyberzen and myself. I think the projects are a little bit different from each other. But maybe we can join forces on the master data.
 

cyberzen

Regarding server,

Operating a server is not that expensive, probably around $150 a month; you can get a pretty powerful server from Hetzner / OVH. I also do server admin so I can manage the server myself.

8) Request actress / video identification
I agree that this should probably belong in the forum, it will be low priority. As for the covers, not everyone will need the full 20 GB. I'm quite sure most people will have around 1-4 TB of vids, that's about 1000 - 2000 videos, depending on how many HD vids they have.

10) Video Player
this option comes pretty cheap if it's just a simple video player, probably like 10 - 20 lines of code.

13) Sync local database of tags to server
the cost of the server can be offset by some advertising, so the server is no issue.

14) Tag ranking to filter out bad tags
If the user does not connect to the server, then he just uses his own tags. But if he logs on to the server, the server tags will be synced to his local database when he views a video. Some other person would most likely have tagged the video before. Each tag will have a rank based on how many times a user has chosen that tag before on that particular video. Just show the top 5-10 tags per video.

15) Subtitles uploading (credit person who did subtitle)
this feature I am not sure belongs in the app itself, but I want this feature for sure somewhere.
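
The ranking described under 14) above could be sketched like this. It is only an illustration of the counting idea: the function name `top_tags`, the (user, tag) input format and the sample tags are assumptions, and counting each user only once per tag is my own addition.

```python
from collections import Counter


def top_tags(votes, limit=5):
    """Rank the tags of one video by how many distinct users chose them."""
    # set() drops repeated (user, tag) pairs so each user counts once per tag.
    counts = Counter(tag for _user, tag in set(votes))
    # Sort by count (descending), then alphabetically to keep the order stable.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [tag for tag, _count in ranked[:limit]]


# Hypothetical votes for one video as (user, tag) pairs.
votes = [
    ("u1", "office"), ("u2", "office"), ("u3", "office"),
    ("u1", "drama"), ("u2", "drama"),
    ("u3", "typo-tagg"),
    ("u1", "office"),  # repeated vote, ignored
]

print(top_tags(votes, limit=2))
```

A mistyped or irrelevant tag that only one user picked ends up at the bottom and falls out of the top list, which is the "filter out bad tags" effect.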

I have some experience with Swing, but I prefer Qt. It's also cross-platform and performs better :) Plus Swing has some pretty awful font rendering and it's not using native widgets :(

Have not done any work on the web server part yet. The first version of the app will be local first. This is what it looks like now,

Interface.jpeg
 

cyberzen

Yup, that's her. I've got a 104 GB folder, and I'm still missing some vids :(

After some thought, I think P2P will be too complex. I have no experience in this area, so I will be going the server route.
 

CodeGeek

About the server I have 2 additional points:
  • I don't want to pay for it, and I also don't want to have advertisements there. And free servers don't allow that kind of data to be hosted.
  • Even if I pay for it: Most providers don't allow that kind of content.

  • 10) Video player
    Ah, okay, so you simply embed one, or Qt offers one as a component. There is also a video player component for Java. But it doesn't come with many codecs, and it also requires the installation of extra software. I don't want that. I will even ship the Java JRE with my application so it doesn't need to be installed separately, and the Java browser plugin (which is a security issue) won't be installed either.

I'm not sure about the performance. I have heard so many times that Java and its components perform badly. But whenever I experienced a really slow Java program, it was the fault of the code (although I have produced some code like this in the past myself). If it is coded well there are no problems at all.
About the font rendering: I can remember that it didn't support anti-aliasing of fonts in the past. Yes, that was really awful. But nowadays it does.
The biggest point is surely native components (you can't include e.g. MS Word in your application as a component), that's true. But to be honest: I have never needed any of these until now. Hm, maybe the Adobe Acrobat Reader. Luckily there was a rendering engine for PDF files available for Java.