JAV database

cyberzen

New Member
Apr 8, 2010
Haha :)

The crash could be due to Japanese characters; maybe try without any Japanese in the filename first. I have had no problems importing thousands of my videos on Linux, but I have not tested many videos on the Windows side because I run it in a virtual machine.
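To act on that advice without renaming blindly, you could first list which files actually have Japanese characters in their names. A minimal Python 3 sketch (not part of Tagu); treating hiragana, katakana and CJK ideographs as "Japanese" is an assumption, and it will also flag Chinese characters:

```python
import os
import unicodedata

# Unicode character-name prefixes treated as "Japanese" for this check (an assumption).
JAPANESE_PREFIXES = ("HIRAGANA", "KATAKANA", "HALFWIDTH KATAKANA", "CJK UNIFIED IDEOGRAPH")


def has_japanese(name):
    """Return True if `name` contains hiragana, katakana or CJK ideographs."""
    # unicodedata.name() raises ValueError for unnamed code points, hence the "" default.
    return any(unicodedata.name(ch, "").startswith(JAPANESE_PREFIXES) for ch in name)


def find_japanese_names(root):
    """Yield the path of every file under `root` whose name contains Japanese characters."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            if has_japanese(filename):
                yield os.path.join(dirpath, filename)
```

Printing `list(find_japanese_names(folder))` before an import shows which files to rename or move aside for a test run.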
 

ding73ding

Akiba Citizen
Oct 25, 2009
Simple, probably not, but by the URL there is some parsing you can do:...
Below is my quick and crude way to get the DVD code from the URL. Maybe using regular expressions it can be done more efficiently but I'm fried from work. ...

Hey, sorry for going submarine. I had a massive project that finished last week, then I took a whole weekend to relax and "be with my family". It feels like weeks since I even clicked on a JAV. Luckily I only get that kind of pressure once every couple of years (and I feel sorry for guys whose work pressure causes them problems). BTW, when I (mis)read that you were fired from work, I did a double take...

Back to codes. Yeah man, you need to get your regexp on if you want to do this project.
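As a rough illustration of the regexp approach, here is a sketch that turns a DMM content ID such as `team00025` or `1rct00468` (both taken from the DMM image URLs later in this thread) into a DVD code. It assumes the ID is an optional numeric prefix, a run of letters, and a zero-padded number, and that the DVD code uses at least three digits; IDs that break this pattern return `None`, which is exactly the reliability problem discussed below.

```python
import re

# Assumed layout: optional digits (label prefix), letters (series), digits (number).
# This is inferred from example IDs, not an official DMM specification.
CID_RE = re.compile(r"^\d*([a-z]+)(\d+)$", re.IGNORECASE)


def cid_to_dvd_code(cid):
    """Convert a DMM content ID like 'team00025' to 'TEAM-025', or return None."""
    match = CID_RE.match(cid)
    if match is None:
        return None
    letters, digits = match.groups()
    # Drop DMM's zero padding but keep at least 3 digits, as in TEAM-025.
    return "%s-%03d" % (letters.upper(), int(digits))
```

For example, `cid_to_dvd_code("1rct00468")` gives `'RCT-468'`, while an ID containing an underscore such as `h_123foo` gives `None`.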

Anyway, I did seriously consider writing a similar DMM decoder, but ended up scrapping the idea. The problem isn't technical difficulty but reliability. Let's say you get it up to 99% reliable; then what? For collectors with thousands of vids, that still leaves tens of unresolved cases (1% of 2,000 vids is 20). That is probably still manageable by adding exception rules... but I don't like the uncertainty.
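The "exception rules" idea could look something like this Python sketch: a hand-maintained override table is consulted first, a generic regexp second, and anything still unresolved is collected for manual review instead of being guessed. The override entry and the ID pattern are made-up assumptions for illustration.

```python
import re

# Hypothetical hand-maintained exceptions: DMM content ID -> DVD code.
OVERRIDES = {
    "h_1234example001": "EXMP-001",
}

# Generic rule (assumed): optional digit prefix, letters, zero-padded number.
CID_RE = re.compile(r"^\d*([a-z]+)(\d+)$", re.IGNORECASE)


def resolve(cid, overrides=OVERRIDES):
    """Return the DVD code for `cid`, preferring explicit overrides; None if unknown."""
    if cid in overrides:
        return overrides[cid]
    match = CID_RE.match(cid)
    if match:
        letters, digits = match.groups()
        return "%s-%03d" % (letters.upper(), int(digits))
    return None


def resolve_all(cids):
    """Split IDs into resolved {cid: code} and a list of unresolved IDs."""
    resolved, unresolved = {}, []
    for cid in cids:
        code = resolve(cid)
        if code is None:
            unresolved.append(cid)
        else:
            resolved[cid] = code
    return resolved, unresolved
```

The unresolved list is what a collector would work through by hand, adding each fix to the override table so it never has to be resolved twice.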

Also, is there a trend toward DVD code formats becoming less rigid? I see not just 4-digit numbers (natural, since some long-running series have run well into the 900s) and 2-letter prefixes, but also single-letter prefixes, and even single-letter-plus-5-digit codes (a minor studio, I assume). The idea should be to try to cover all cases, and this makes decoding the DMM codes trickier.
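A filename pattern that tolerates those variations (one- to five-letter prefixes, three- to five-digit numbers, hyphen optional) might look like the Python sketch below. The exact bounds are assumptions; widening them catches more codes but also more false positives.

```python
import re

# Prefix of 1-5 letters, optional hyphen/underscore/space, then 3-5 digits.
# The lookarounds stop a match from starting or ending inside a longer alphanumeric run.
CODE_RE = re.compile(r"(?<![A-Za-z0-9])([A-Za-z]{1,5})[-_ ]?(\d{3,5})(?![0-9])")


def find_dvd_code(filename):
    """Return a normalized code such as 'SDMT-819' found in `filename`, or None."""
    match = CODE_RE.search(filename)
    if match is None:
        return None
    prefix, number = match.groups()
    return "%s-%s" % (prefix.upper(), number)
```

With this, `team025.mp4` and `TEAM-025.mp4` both normalize to `TEAM-025`, while a name like `1080p.mp4` is not mistaken for a code.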

So my current thinking is to extract the DVD code from either JAVIMDB or AVIMDB (or both). I don't know how complete they are compared to DMM (i.e., are a lot of vids missing from their databases?). And AVIMDB even lacks genre info.

Both JAVIMDB and AVIMDB hotlink to DMM for cover and actress images, so it's actually quite easy to get the DMM code for a given DVD code from the URLs of these images.
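For example, a DMM cover URL that appears later in this thread, `http://pics.dmm.co.jp/digital/video/team00025/team00025pl.jpg`, carries the DMM content ID as a path segment. A minimal Python 3 sketch that pulls it out, assuming the `/digital/video/<cid>/` layout seen in these URLs holds in general:

```python
import re
from urllib.parse import urlparse

# Assumed path layout, based on the DMM image URLs quoted in this thread.
DMM_PATH_RE = re.compile(r"^/digital/video/([^/]+)/")


def cid_from_image_url(url):
    """Return the DMM content ID embedded in a pics.dmm.co.jp image URL, or None."""
    parsed = urlparse(url)
    if parsed.hostname != "pics.dmm.co.jp":
        return None
    match = DMM_PATH_RE.match(parsed.path)
    return match.group(1) if match else None
```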

But for some reason I can't retrieve the correct HTML from my favorite programming platform, so I'm a little stuck. ...

Correction: now I'm re-learning Python. The good news is that it's cross-platform. I hope to bash something together within a week.

The other positive is that both sites are bilingual (actually quadrilingual, but who cares about Chinese??), so now I'm running out of excuses not to support both languages (hey, is that a plus or a minus?). They also list some uncensored vids, but I don't assume those listings are comprehensive.

Uncensored vids are becoming more important. I hope to find a good source of metadata.

A really, really crazy idea is to scrape Akiba Online, or rather just feed off the new posts daily.
 

cyberzen

New Member
Apr 8, 2010
I've got a 22 MB SQLite database of DVD codes, descriptions (Google-translated, I think), release dates, studios, tags and actors. I've also got the cover images for those DVD codes, correctly named (SDMT-819 → SDMT-819.jpg). There are 42,998 video titles in the database, and the image folder is currently 4.4 GB.
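A database like that could be laid out roughly as follows in SQLite: one row per title keyed by DVD code, with tags and actors in link tables because a title can have many of each. This Python sketch is a guess at a reasonable schema, not cyberzen's actual one; the table and column names are illustrative.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS video (
    code        TEXT PRIMARY KEY,   -- e.g. SDMT-819, cover file SDMT-819.jpg
    description TEXT,
    released    TEXT,               -- ISO date string such as 2014-03-13
    studio      TEXT
);
CREATE TABLE IF NOT EXISTS video_tag (
    code TEXT REFERENCES video(code),
    tag  TEXT,
    PRIMARY KEY (code, tag)
);
CREATE TABLE IF NOT EXISTS video_actor (
    code  TEXT REFERENCES video(code),
    actor TEXT,
    PRIMARY KEY (code, actor)
);
"""


def open_db(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def add_video(conn, code, description, released, studio, tags=(), actors=()):
    with conn:  # commits on success, rolls back on error
        conn.execute("INSERT OR REPLACE INTO video VALUES (?, ?, ?, ?)",
                     (code, description, released, studio))
        conn.executemany("INSERT OR IGNORE INTO video_tag VALUES (?, ?)",
                         [(code, t) for t in tags])
        conn.executemany("INSERT OR IGNORE INTO video_actor VALUES (?, ?)",
                         [(code, a) for a in actors])


def codes_with_tag(conn, tag):
    rows = conn.execute("SELECT code FROM video_tag WHERE tag = ? ORDER BY code", (tag,))
    return [row[0] for row in rows]
```

Keeping tags and actors in separate tables is what makes queries like "all titles with this tag" or syncing tags back per DVD code straightforward.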

This database is also being used on the server, so if your files are correctly named, the tags and actors will be synced back to your local database.


I'm guessing not everyone will want ALL the cover images, only those for the video titles they have on their HDD. So I'm working on using BitTorrent to selectively download images only for the video titles that have been imported into my program.
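The selective part mostly comes down to deciding which files in the image torrent are wanted. A Python sketch, assuming each image in the torrent is named `<DVD code>.jpg` as described above: it builds one priority per file in torrent order (1 = download, 0 = skip), the kind of per-file list a client can apply (libtorrent, for instance, treats priority 0 as "don't download"). The function name is hypothetical, and handing the list to a real client is left out.

```python
import posixpath


def file_priorities(torrent_paths, imported_codes):
    """Return one priority per torrent file: 1 if its '<code>.jpg' is wanted, else 0.

    torrent_paths: file paths inside the torrent, in torrent order.
    imported_codes: DVD codes of titles already imported locally.
    """
    wanted = {code.upper() for code in imported_codes}
    priorities = []
    for path in torrent_paths:
        stem, ext = posixpath.splitext(posixpath.basename(path))
        is_wanted = ext.lower() == ".jpg" and stem.upper() in wanted
        priorities.append(1 if is_wanted else 0)
    return priorities
```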


@DingDing, if you want to scrape using Python, I recommend Scrapy (a full-featured spider), or lxml if you just want a simple HTML parser. I think it's better to scrape JavImdb; their HTML structure is a lot easier to parse.
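For a simple page, even Python's built-in `html.parser` can pull out what is needed. The Python 3 sketch below uses only the standard library, so it runs without installing lxml or Scrapy, and collects the `src` of every `<img>` that points at pics.dmm.co.jp (with lxml, roughly the same thing is an XPath query such as `//img/@src`). The sample HTML in the usage line is made up.

```python
from html.parser import HTMLParser


class DmmImageCollector(HTMLParser):
    """Collect src attributes of <img> tags hosted on pics.dmm.co.jp."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        src = dict(attrs).get("src") or ""  # attribute values may be None
        if "pics.dmm.co.jp" in src:
            self.images.append(src)


def dmm_images(html_text):
    parser = DmmImageCollector()
    parser.feed(html_text)
    parser.close()
    return parser.images
```

Usage: `dmm_images('<img src="http://pics.dmm.co.jp/digital/video/team00025/team00025pl.jpg">')` returns a one-element list with that URL.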
 

Casshern2

Senior Member...I think
Mar 22, 2008
@ding73ding, HAHA! Glad I'm not fired as well. There are some unique DVD codes out there, that's for sure. For my needs I would stick close to DMM. I really do like the ideas going around and I applaud the efforts. You'll have some pretty neat stuff when you are finished. But as any programmer knows... you're almost never finished.
 

ding73ding

Akiba Citizen
Oct 25, 2009
UPDATE: I've managed to implement the Japanese HTML pages too...

Current status:
  • Search JAVZOO by DVD code, actor name, title, etc. in either English or Japanese
  • Obtain the most recent releases (60 vids per page)
  • Access an individual vid page (typically reached via either of the above two methods)
  • Each of the above three produces an HTML page that can be fed into the following:
  • Save the HTML into auto-named local file(s)
  • Parse the HTML to obtain the metadata of the vid(s)
  • After parsing the HTML(s), output the data in a couple of formats: XML (XBMC-friendly) or plain text (human-friendly)
  • Parsing Japanese HTML of individual vid
  • Merging Japanese and English metadata

To do (short term):
  • Parsing Japanese HTML of individual vid
  • Merging Japanese and English metadata
  • Saving an entire database in a secure format (I was being lazy and using shelve for now, but I read that it's not secure)
  • GUI???

So cyberzen was right, of course: I can't avoid learning SQLite (or some flavor of SQL)...
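For the shelve problem specifically (shelve stores values with pickle, and loading pickled data from an untrusted source can execute arbitrary code), one low-effort route into SQLite is a small key/value table holding each vid's metadata as JSON. A Python 3 sketch, assuming the metadata is a plain dict of strings and lists; `MetadataStore` is a hypothetical name.

```python
import json
import sqlite3


class MetadataStore:
    """Shelve-like store: DVD code -> metadata dict, kept as JSON text in SQLite."""

    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS metadata (code TEXT PRIMARY KEY, data TEXT)"
        )

    def __setitem__(self, code, meta):
        with self.conn:  # commit per write
            self.conn.execute(
                "INSERT OR REPLACE INTO metadata (code, data) VALUES (?, ?)",
                (code, json.dumps(meta, ensure_ascii=False)),
            )

    def __getitem__(self, code):
        row = self.conn.execute(
            "SELECT data FROM metadata WHERE code = ?", (code,)
        ).fetchone()
        if row is None:
            raise KeyError(code)
        return json.loads(row[0])

    def __contains__(self, code):
        row = self.conn.execute("SELECT 1 FROM metadata WHERE code = ?", (code,)).fetchone()
        return row is not None

    def close(self):
        self.conn.close()
```

An existing shelf you created yourself could be migrated once with `for key in shelf: store[key] = shelf[key]`; after that, JSON decoding never runs code from the file.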

Running example (command line)
Code:
T:\src\Python\jc>c:\Python27\python.exe jget.py search TEAM-025
opening ... http://www.javzoo.com/en/search/TEAM-025
saving to s-TEAM-025.htm ...
Search term: 'TEAM-025', found 1 JAVs
opening ... http://www.javzoo.com/en/movie/466e
parttitle:
postdate:
poster:
title: TEAM-025 10 Situation Yoshikawa Aimi
originialtitle:
aired: 2014-03-13
runtime: 238分钟
cover: http://pics.dmm.co.jp/digital/video/team00025/team00025pl.jpg
director: Zakkuarai
studio: teamZERO
label: teamZERO
series:
id: TEAM-025
plot:
actor: Yoshikawa Aimi as
genre: Hi-Def
genre: Beautiful Girl
genre: Various Worker
genre: DMM Exclusive
genre: Individual Item
genre: Big Tits
genre: Cosplay
screen: http://pics.dmm.co.jp/digital/video/team00025/team00025jp-1.jpg
screen: http://pics.dmm.co.jp/digital/video/team00025/team00025jp-2.jpg
screen: http://pics.dmm.co.jp/digital/video/team00025/team00025jp-3.jpg
screen: http://pics.dmm.co.jp/digital/video/team00025/team00025jp-4.jpg
screen: http://pics.dmm.co.jp/digital/video/team00025/team00025jp-5.jpg
screen: http://pics.dmm.co.jp/digital/video/team00025/team00025jp-6.jpg
screen: http://pics.dmm.co.jp/digital/video/team00025/team00025jp-7.jpg
screen: http://pics.dmm.co.jp/digital/video/team00025/team00025jp-8.jpg

Generating TEAM-025.nfo

DONE

T:\src\Python\jc>
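The search-and-save steps in the log above can be approximated with a couple of small helpers. In this Python 3 sketch, the URL pattern and the `s-<term>.htm` file name are copied from that log; the `{lang}` slot assumes the Japanese pages differ only in that path segment, and the helper names are hypothetical, not taken from jget.py.

```python
import re
from urllib.parse import quote

# URL pattern copied from the log; the {lang} slot is an assumption.
SEARCH_URL = "http://www.javzoo.com/{lang}/search/{term}"


def search_url(term, lang="en"):
    """Build a search URL; the term is percent-encoded (as UTF-8) so Japanese input works."""
    return SEARCH_URL.format(lang=lang, term=quote(term, safe=""))


def search_filename(term):
    """Auto-name the saved page like 's-TEAM-025.htm', replacing characters Windows forbids."""
    return "s-%s.htm" % re.sub(r'[\\/:*?"<>|]', "_", term)
```

The page itself could then be fetched with `urllib.request.urlopen(search_url(term)).read()` and written in binary mode to `search_filename(term)`.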

Running result (sorry, badly indented XML is kinda ugly):
Code:
<movie>
	<title>RCT-468 An Office Lady confrontation! A BDSM Queen confrontation! Hon wife VS lover confrontation! The Older Sister catfight that is new Gachi battle Kirei to show in Situation</title>
	<originialtitle>RCT-468 OL対決!SM女王様対決!本妻VS愛人対決!シチュエーションで見せる新ガチバトル 綺麗なお姉さんキャットファイト</originialtitle>
	<aired>2013-01-24</aired>
	<runtime>134分钟</runtime>
<thumb aspect="poster">http://pics.dmm.co.jp/digital/video/1rct00468/1rct00468pl.jpg</thumb>
	<director>神戸たろう</director>
	<studio>ROCKET</studio>
	<label>ROCKET</label>
	<series></series>
	<id>RCT-468</id>
	<plot></plot>
	<actor>
<name>Sumire</name>
<role>Sumire</role>
<order>0</order>
</actor>
<actor>
<name>天野小雪</name>
<role>Amano Koyuki</role>
<order>1</order>
</actor>
<actor>
<name>美緒みくる</name>
<role>Miomikuru</role>
<order>2</order>
</actor>
<actor>
<name>秋本詩音</name>
<role>Akimoto Shion</role>
<order>3</order>
</actor>

	<genre>Hi-Def</genre>
<genre>Action</genre>
<genre>&</genre>
<genre>Fighting</genre>
<genre>DVD Toaster</genre>
<genre>Older Sister</genre>
<genre>Married Woman</genre>

		<fanart>
		<thumb>http://pics.dmm.co.jp/digital/video/1rct00468/1rct00468jp-1.jpg</thumb>
		<thumb>http://pics.dmm.co.jp/digital/video/1rct00468/1rct00468jp-2.jpg</thumb>
		<thumb>http://pics.dmm.co.jp/digital/video/1rct00468/1rct00468jp-3.jpg</thumb>
		<thumb>http://pics.dmm.co.jp/digital/video/1rct00468/1rct00468jp-4.jpg</thumb>
		<thumb>http://pics.dmm.co.jp/digital/video/1rct00468/1rct00468jp-5.jpg</thumb>
		<thumb>http://pics.dmm.co.jp/digital/video/1rct00468/1rct00468jp-6.jpg</thumb>
		<thumb>http://pics.dmm.co.jp/digital/video/1rct00468/1rct00468jp-7.jpg</thumb>
		<thumb>http://pics.dmm.co.jp/digital/video/1rct00468/1rct00468jp-8.jpg</thumb>
		<thumb>http://pics.dmm.co.jp/digital/video/1rct00468/1rct00468jp-9.jpg</thumb>
		<thumb>http://pics.dmm.co.jp/digital/video/1rct00468/1rct00468jp-10.jpg</thumb>
		<thumb>http://pics.dmm.co.jp/digital/video/1rct00468/1rct00468jp-11.jpg</thumb>
		<thumb>http://pics.dmm.co.jp/digital/video/1rct00468/1rct00468jp-12.jpg</thumb>
		<thumb>http://pics.dmm.co.jp/digital/video/1rct00468/1rct00468jp-13.jpg</thumb>
		<thumb>http://pics.dmm.co.jp/digital/video/1rct00468/1rct00468jp-14.jpg</thumb>
		<thumb>http://pics.dmm.co.jp/digital/video/1rct00468/1rct00468jp-15.jpg</thumb>
		<thumb>http://pics.dmm.co.jp/digital/video/1rct00468/1rct00468jp-16.jpg</thumb>
		<thumb>http://pics.dmm.co.jp/digital/video/1rct00468/1rct00468jp-17.jpg</thumb>
		<thumb>http://pics.dmm.co.jp/digital/video/1rct00468/1rct00468jp-18.jpg</thumb>
		<thumb>http://pics.dmm.co.jp/digital/video/1rct00468/1rct00468jp-19.jpg</thumb>
		<thumb>http://pics.dmm.co.jp/digital/video/1rct00468/1rct00468jp-20.jpg</thumb>
	</fanart>

</movie>
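Building the NFO with an XML library instead of string concatenation handles both indentation and escaping. Note that `<genre>&</genre>` above is not well-formed XML (a bare `&` must be written `&amp;`); possibly a genre like "Action & Fighting" got split apart. A minimal Python 3 sketch using the standard `xml.etree.ElementTree`; the metadata dict keys are assumptions, and the tag names follow the output above except `originaltitle`, the spelling XBMC's movie NFO format uses.

```python
import xml.etree.ElementTree as ET


def build_nfo(meta):
    """Return an XBMC-style movie NFO string for a metadata dict (keys assumed)."""
    movie = ET.Element("movie")
    for key in ("title", "originaltitle", "aired", "runtime", "director",
                "studio", "label", "id", "plot"):
        ET.SubElement(movie, key).text = meta.get(key, "")
    if meta.get("cover"):
        ET.SubElement(movie, "thumb", aspect="poster").text = meta["cover"]
    # Actors as (Japanese name, romanized name) pairs, like the output above.
    for order, (name, role) in enumerate(meta.get("actors", [])):
        actor = ET.SubElement(movie, "actor")
        ET.SubElement(actor, "name").text = name
        ET.SubElement(actor, "role").text = role
        ET.SubElement(actor, "order").text = str(order)
    for genre in meta.get("genres", []):
        ET.SubElement(movie, "genre").text = genre  # '&' is escaped automatically
    if meta.get("screens"):
        fanart = ET.SubElement(movie, "fanart")
        for url in meta["screens"]:
            ET.SubElement(fanart, "thumb").text = url
    if hasattr(ET, "indent"):  # pretty-printing is available from Python 3.9
        ET.indent(movie)
    return ET.tostring(movie, encoding="unicode")
```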
 

iori11

Member
Nov 25, 2009
cyberzen said:
Haha :)

The crash could be due to Japanese characters; maybe try without any Japanese in the filename first. I have had no problems importing thousands of my videos on Linux, but I have not tested many videos on the Windows side because I run it in a virtual machine.


:glasses: I renamed the JAV files and nothing changed, so I divided my JAV folder into multiple folders of 200 videos each, and then they got imported normally ^^.

2 questions:
- I have some JAV movies that come in 2 parts: the first part is usually named AAA-111_A and the second AAA-111_B, but then no cover appears in Tagu. How should they be named?
- Maybe we should create another thread for Tagu bug reports? I don't want people to feel like I'm hijacking their thread.
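Whatever naming Tagu expects is for its author to answer, but any scanner has to recognize part suffixes before it can look up a cover. A Python sketch that splits names like `AAA-111_A` / `AAA-111_B` into the base code and a 1-based part number; the other suffix styles it accepts (`-1`, ` CD2`, `_part2`) are common conventions assumed for illustration, not rules from Tagu.

```python
import re

# Base code, then an optional separator and part marker: A/B/..., 1-9, cdN, partN, ptN.
PART_RE = re.compile(
    r"^(?P<code>[A-Za-z]{1,5}-\d{3,5})"
    r"(?:[-_ ](?:(?:cd|part|pt)\s*(?P<num>\d+)|(?P<letter>[A-Za-z])|(?P<digit>\d)))?$",
    re.IGNORECASE,
)


def split_part(stem):
    """Return (code, part) for a file name without extension; part is 1-based or None."""
    match = PART_RE.match(stem)
    if match is None:
        return None, None
    code = match.group("code").upper()
    if match.group("num"):
        return code, int(match.group("num"))
    if match.group("letter"):
        return code, ord(match.group("letter").upper()) - ord("A") + 1
    if match.group("digit"):
        return code, int(match.group("digit"))
    return code, None
```

Both `AAA-111_A` and `AAA-111_B` then map to the same code `AAA-111`, which is the key a cover lookup would use.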

As always, thanks for the app. I hope your work is going fine; please keep us updated :beg2:
 

ding73ding

Akiba Citizen
Oct 25, 2009
iori11 said:
Maybe we should create another thread for Tagu bug reports? I don't want people to feel like I'm hijacking their thread.

No no no... Tagu is very relevant to the JAV database topic. It's pretty quiet here even with both the Tagu discussion and my little Python project rolling.


I'm still putting off learning SQLite... I'm a science type and databases don't appeal to me... (hint: does any Python coder want to help out?)

I just discovered today that avimdb has been renamed javpee, and javpee now strictly covers uncensored vids (all censored vids seem to have disappeared), while javzoo covers censored vids. It's now clear that javzoo and javpee are sister sites.

For the past few days I've been trying to write a parser for Caribbean; I hope to figure it out soon. I think I may be turning into a fan of Caribbean. (Asian character encoding schemes (and Unicode) are a little nightmare.)
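One way to take some of the pain out of Japanese encodings is to decode the raw bytes yourself before parsing: try the charset the server or page declares, then a few common Japanese encodings. A Python 3 sketch; the fallback order is an assumption, and a wrong-but-valid decode is still possible (EUC-JP bytes, for example, can decode "successfully" as CP932 garbage), so check results on real pages.

```python
import re

# Encodings to try after any declared charset; the order is a judgment call.
FALLBACKS = ("utf-8", "euc_jp", "cp932")


def declared_charset(content_type, raw):
    """Find a charset in the Content-Type header or an early <meta> tag, else None."""
    for text in (content_type or "", raw[:2048].decode("ascii", "ignore")):
        match = re.search(r"charset=[\"']?([\w-]+)", text, re.IGNORECASE)
        if match:
            return match.group(1).lower()
    return None


def decode_html(raw, content_type=None):
    """Decode page bytes to str, trying the declared charset first."""
    candidates = []
    charset = declared_charset(content_type, raw)
    if charset:
        candidates.append(charset)
    candidates.extend(FALLBACKS)
    for encoding in candidates:
        try:
            return raw.decode(encoding)
        except (UnicodeDecodeError, LookupError):
            continue  # wrong guess or unknown codec name: try the next one
    # Last resort: keep going with replacement characters instead of crashing.
    return raw.decode("utf-8", "replace")
```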

Then, when I get more coding mojo, I will write a directory (collection) scanner and scrape metadata for all the vids on one's HDD. Should be easy enough... Just dealing with the exception cases (such as multi-part vids) is going to take time and energy...
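The scanner itself can be short; the exceptions are where the work is. A Python sketch that walks a folder, keeps video files, groups them by DVD code so multi-part files such as `AAA-111_A` / `AAA-111_B` map to one title, and sets aside names it cannot match for manual handling. The extension list and the code pattern are assumptions.

```python
import os
import re
from collections import defaultdict

VIDEO_EXTS = {".avi", ".mkv", ".mp4", ".wmv", ".m4v", ".mpg", ".ts"}
# DVD code at the start of the file name: 1-5 letters, optional hyphen, 3-5 digits.
CODE_RE = re.compile(r"^([A-Za-z]{1,5})-?(\d{3,5})(?!\d)")


def scan_collection(root):
    """Return ({code: [paths sorted]}, [unmatched paths]) for video files under root."""
    titles = defaultdict(list)
    unmatched = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            stem, ext = os.path.splitext(filename)
            if ext.lower() not in VIDEO_EXTS:
                continue
            path = os.path.join(dirpath, filename)
            match = CODE_RE.match(stem)
            if match is None:
                unmatched.append(path)
                continue
            code = "%s-%s" % (match.group(1).upper(), match.group(2))
            titles[code].append(path)
    for paths in titles.values():
        paths.sort()  # puts '_A' before '_B' when the parts share a folder
    return dict(titles), unmatched
```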
 

CodeGeek

Akiba Citizen
Nov 2, 2010
ding73ding said:
iori11 said:
Maybe we should create another thread for Tagu bug reports? I don't want people to feel like I'm hijacking their thread.
No no no... Tagu is very relevant to the JAV database topic. It's pretty quiet here even with both the Tagu discussion and my little Python project rolling.

I would recommend splitting the thread sooner or later, especially for bug reports; otherwise it will get confusing. But progress reports on the individual projects, as well as introductions of new projects, can stay here.

As soon as I have something to post (hopefully I will have some time around Easter), I will open a new thread and post a link to it here. I will also post important updates about it here.

It's nice that so many people enjoy investing their time in both small and bigger tools. When I opened this thread I thought I would be the only one, or that at most one other person would post here. But now I see quite a few people here. That is very motivating.
 

DoctorD1501

New Member
Jul 28, 2014
Hi All,

Not having been aware until just now of much of the effort already underway in this thread, I developed my own Java Swing JAV metadata scraper for XBMC and released an alpha version of it on GitHub today. It uses DMM, JavLibrary, ActionJav, and SquarePlus.co.jp as metadata sources and attempts to amalgamate the data from the various sites while picking the "best" data. However, the user can choose which data to use or supply their own, although I haven't implemented editing for all categories yet.
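One simple way to amalgamate several sources, sketched in Python to match the rest of the thread (an illustration, not how JAVMovieScraper actually does it): rank the sources per field and take the first non-empty value, while keeping every candidate so the user can still pick another. The source keys and rankings below are placeholders.

```python
# Hypothetical per-field source ranking; fields not listed use DEFAULT_ORDER.
DEFAULT_ORDER = ["dmm", "javlibrary", "actionjav", "squareplus"]
FIELD_ORDER = {
    "title": ["javlibrary", "dmm"],
}


def merge_metadata(results, field_order=FIELD_ORDER, default=DEFAULT_ORDER):
    """Merge {source: {field: value}} into (chosen, candidates).

    chosen maps each field to the first non-empty value in priority order;
    candidates maps each field to every (source, value) pair found, for manual override.
    """
    candidates = {}
    for source, fields in results.items():
        for field, value in fields.items():
            if value:
                candidates.setdefault(field, []).append((source, value))
    chosen = {}
    for field, found in candidates.items():
        by_source = dict(found)
        order = field_order.get(field, default)
        ranked = [s for s in order if s in by_source]
        ranked += [s for s, _ in found if s not in ranked]  # unranked sources go last
        chosen[field] = by_source[ranked[0]]
    return chosen, candidates
```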

Here's the link:

https://github.com/DoctorD1501/JAVMovieScraper

Please let me know if you notice bugs or have feature requests, and feel free to contribute or comment on GitHub as well. Thank you.
 

masterbet

Member
Mar 4, 2009
The Tagu software is exactly what I am looking for right now. However, I am a low-tech guy, so can anyone tell me how to use it? I installed the software and added my JAV folder, but when I choose Menu Database\Import video, nothing happens. I read the Tagu download page and saw:


Requirements
Installation
  • qmake
  • make

Does the above mean that I have to install Qt5, SQLite and FFmpeg to get the software running?

Thank you!
 

ding73ding

Akiba Citizen
Oct 25, 2009
DoctorD1501 said:
released an alpha version of it today on GitHub.

Fantastic! I've spent only a few minutes looking at your source, and it's clearly pro quality. When I get home (and find time) I will take a closer look.

I do have a lot of questions and perhaps suggestions, but since I can't take a serious look at JAVMovieScraper right now, I should refrain from babbling...

As for my own development, I had hardly touched my code for months... see, my last post here was in April. I did a bit in September, then another little burst of coding in November, and now it's in a functional state. It scans a folder for movie files, scrapes javzoo.com for both Japanese and English metadata, and generates an NFO file. Since I have a habit of translating titles manually (more precisely, manually fixing Google's translation), I discard the scraped English title and keep my manually translated one. Since my Python code runs on a PC and my XBMC runs on an Android set-top box, I still have to go into XBMC and manually add the NFO(s) to the database.
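Keeping a hand-fixed title while refreshing everything else from the scrape can be done by reading the existing NFO first. A Python 3 sketch with the standard library, assuming the existing NFO's `<title>` holds the manually translated title and that the freshly scraped NFO text is produced elsewhere; `keep_manual_title` is a hypothetical name. An old NFO that is not well-formed (for example one containing a bare `&`) is skipped rather than crashing the run.

```python
import os
import xml.etree.ElementTree as ET


def keep_manual_title(new_nfo_text, existing_path):
    """Return new NFO text, but with <title> taken from the existing NFO if it has one."""
    new_root = ET.fromstring(new_nfo_text)
    if os.path.exists(existing_path):
        try:
            old_title = ET.parse(existing_path).getroot().findtext("title")
        except ET.ParseError:
            old_title = None  # unreadable old file: fall back to the scraped title
        if old_title:
            title = new_root.find("title")
            if title is None:
                title = ET.SubElement(new_root, "title")
            title.text = old_title
    return ET.tostring(new_root, encoding="unicode")
```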

My coding skill is quite good for an amateur, but not pro quality like yours. E.g. it seems Caribbeancom's HTML can break Python's HTMLParser, so I'm stuck (and in recent months I have hardly downloaded any uncensored vids, so there's less motivation to crack that problem).

I should try out your program and see if there's any way I can integrate some of my favorite features into your code.
 

TigerT

Member
Mar 14, 2011
Is there a site like javlibrary that lets you search by keywords rather than label numbers? 'Cause as much as javlibrary is a great site, people don't seem to tag shit correctly.
You can search by keywords on javlibrary using Advanced Search.