Hi, just a quick update on my end first, then a general reply to your new posts since I last posted, over a month ago.
It took just over 1 day to finish coding what I called my alpha, but then I hit a stupid problem, and after 2 days of trying I got pissed off and took a break... then Chinese New Year arrived, and only yesterday did I take a look at the problem again. And this time (as expected) it took me only a couple of hours to solve it. The problem was that I couldn't get XBMC to display Japanese (Unicode) text, which is something I did ages ago (on the old Xbox even) and had forgotten how to do... and some of the XBMC documentation is so horrendous that such a simple task is STILL not properly documented.
Anyway, I got myself a second Android PC, installed XBMC on it, and took another crack at it. As I said, I solved the Unicode display problem (confirming that my own program was correct all along), so my little project is back on. To remind you, my project (and philosophy) is to exploit as much as possible what's already out there and well supported. So for most of the things you guys are talking about I'm relying on XBMC, e.g. browsing (or searching or filtering) my collection by title, genre, actress and DVD code. All of these will be provided by XBMC; I only need to find the metadata and feed it to XBMC.
To recap, I now have a small program that reads one or more plain text files containing the metadata (from DMM etc.) and generates one or more .nfo files (in XML format) that can be readily imported by XBMC into its video library. That's it: nothing more than a format translator (from plain text to XML) plus a simple language translator (using a super-simple dictionary look-up) which translates the field names (発売日, 収録時間, 出演者 etc.) into English. Some variation in the input file is handled correctly, e.g. 発売日, 配信開始日 and 商品発売日 are all treated as "premiered".
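To make the idea concrete, here is a minimal sketch of such a translator in Python. It is not my actual program. The input layout (one "label: value" pair per line, one actress per 出演者 line) and the function names `parse_metadata` and `build_nfo` are assumptions for illustration; only the five Japanese labels and their English targets come from the description above. The name in the usage line below is a placeholder.

```python
import re
import xml.etree.ElementTree as ET

# Japanese field label -> .nfo tag (labels taken from the post above).
FIELD_MAP = {
    "発売日": "premiered",
    "配信開始日": "premiered",
    "商品発売日": "premiered",
    "収録時間": "runtime",
    "出演者": "actor",
}

def parse_metadata(text):
    """Return (tag, value) pairs for every line whose label is known.

    Assumed input layout: "label: value", with either an ASCII or a
    full-width colon as the separator.
    """
    pairs = []
    for line in text.splitlines():
        parts = re.split(r"[:：]", line, maxsplit=1)
        if len(parts) != 2:
            continue  # not a "label: value" line
        label, value = parts[0].strip(), parts[1].strip()
        tag = FIELD_MAP.get(label)
        if tag and value:
            pairs.append((tag, value))
    return pairs

def build_nfo(pairs):
    """Build a <movie> document from (tag, value) pairs, as a string."""
    movie = ET.Element("movie")
    for tag, value in pairs:
        if tag == "actor":
            # Actors are nested: <actor><name>...</name></actor>.
            actor = ET.SubElement(movie, "actor")
            ET.SubElement(actor, "name").text = value
        else:
            ET.SubElement(movie, tag).text = value
    return ET.tostring(movie, encoding="unicode")
```

Usage would be something like `build_nfo(parse_metadata("発売日：2014/01/10\n出演者：山田花子"))`, with the result written out via `open(path, "w", encoding="utf-8")` so the Japanese text is stored as UTF-8.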
I've also created a simple dictionary which covers only the field names and genres. Translating actress names, studios etc. would be super easy technically (literally fewer than 5 extra lines of code), but I don't have a list of actresses in both languages. Handling actresses with more than one alias would be slightly more difficult, but again the main issue is data, not the program.
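A rough sketch of how the alias case could be handled, assuming the data existed: every Japanese alias maps to one canonical English name, and anything not in the list is passed through untranslated. The helper names and the sample entries are hypothetical placeholders, not real data.

```python
def build_alias_index(entries):
    """entries: iterable of (english_name, [japanese_aliases]).

    Returns a dict mapping every alias to its canonical English name.
    """
    index = {}
    for english, aliases in entries:
        for alias in aliases:
            index[alias] = english
    return index

def translate_name(name, index):
    """Return the English name if known, else the original name unchanged."""
    return index.get(name.strip(), name)

# Placeholder data only; a real list would have to come from somewhere.
SAMPLE = [
    ("Hanako Yamada", ["山田花子", "やまだはなこ"]),
    ("Yuki Sato", ["佐藤ゆき"]),
]
```

With this layout, adding a new alias is just one more string in a list, which fits the point that the hard part is the data and not the code.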
Translating titles from Japanese to English is of course technically difficult. There might be a way to interface with Google Translate and exploit that functionality. The result would be quite garbled, but it beats nothing. Personally I'm not interested in machine translation (Google's included), since I do a manual translation (starting from Google Translate) for every movie that I download anyway (which is about 1 per week...)
Next, I would like to ask for more metadata, esp. the database of 80,000 films that CG has already collected. I would be surprised if my modest JAV collection isn't already completely covered by that database.
As for new films that are released daily or weekly, it would be easy to code a "scraper" if only there were a reliable source to scrape. DMM is very tricky for me to use. Sometimes I pull a cover image (DMM has the best quality cover scans) from Yahoo's or Google's cached or translated version, but that's not 100% reliable. No, I haven't figured out how to access DMM through a proxy.
Currently my biggest problem is that the actress names (in Japanese) and genres (in English) don't seem to get imported properly into the XBMC library. I have to figure out whether it's an XBMC config problem or whether my output format needs some tweaking.
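One way to rule out the output format is to read a generated .nfo back and list what a parser actually sees in it. The sketch below is not a diagnosis. It assumes (my understanding of the format, not something I have confirmed as the cause here) that XBMC wants one `<genre>` element per genre and each actress as `<actor><name>...</name></actor>`. The function name `summarize_nfo` is made up for this example.

```python
import xml.etree.ElementTree as ET

def summarize_nfo(xml_text):
    """List the genres and actor names found in a movie .nfo document.

    Also counts <actor> elements that have no <name> child, since a flat
    <actor>Some Name</actor> is an easy mistake to make.
    """
    root = ET.fromstring(xml_text)
    genres = [g.text.strip() for g in root.findall("genre")
              if g.text and g.text.strip()]
    actors = [n.text.strip() for n in root.findall("actor/name")
              if n.text and n.text.strip()]
    missing = sum(1 for a in root.findall("actor") if a.find("name") is None)
    return {"genres": genres, "actors": actors, "actors_missing_name": missing}
```

If the lists come back empty, or `actors_missing_name` is above zero, the output format is the first thing to tweak. If everything looks right, the problem is more likely on the XBMC side.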
As for the structure and content of the data, I strongly believe in keeping it simple, at least initially. I also want very high reliability and data quality, so I want to trust only data obtained directly or indirectly from DMM, similar e-tailers, or the studios' official sites. So here is the nearly complete list of fields to be included:
title (English, if available), originaltitle (i.e. the Japanese title, if an English title is available), studio, series, runtime, genre, premiered (i.e. release date), actors, director, plot (the textual description, hundreds of Japanese words long, that accompanies every product that DMM sells), id (DVD code)
For all of these, the field names that I gave are exactly the same as the field names used by XBMC's library. For all except one, the DVD code (=id in XBMC), my choice is IMHO pretty uncontroversial. It's easy to think up many more, but I want to get these down before (if ever) taking on more.
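To keep the field set from growing by accident, the list can live in one place and every record can be checked against it. This is only a sketch. The field names are the ones listed above, but `check_record` and the choice of `title` and `id` as required fields are assumptions made for the example.

```python
# The field list from the post, in the order given there.
NFO_FIELDS = (
    "title", "originaltitle", "studio", "series", "runtime", "genre",
    "premiered", "actors", "director", "plot", "id",
)

def check_record(record, required=("title", "id")):
    """Return (missing, unknown) field names for one metadata record.

    missing: required fields that are absent or empty.
    unknown: keys in the record that are not in NFO_FIELDS.
    """
    unknown = sorted(k for k in record if k not in NFO_FIELDS)
    missing = sorted(k for k in required if not record.get(k))
    return missing, unknown
```

A record that comes back as `([], [])` uses only the agreed fields and has the required ones filled in.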
One problem is that most uncensored videos were (presumably?) not released on DVD and so lack a DVD code. But many (major) studios have their own consistent coding scheme, e.g. Carribbean.com uses a code of the form MMDDYY-NNN (e.g. 122713-508). In general, the metadata for uncensored videos is a bit lower in completeness and/or quality.
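A code in the MMDDYY-NNN form is easy to recognise and sanity-check mechanically. A minimal sketch follows. The function name `parse_studio_code` is hypothetical, and treating the first six digits as a real calendar date (so that impossible months or days are rejected) is my assumption about the scheme.

```python
import re
from datetime import datetime

# Six digits, a hyphen, three digits: MMDDYY-NNN.
CODE_RE = re.compile(r"(\d{6})-(\d{3})")

def parse_studio_code(code):
    """Split an MMDDYY-NNN code into (date, number).

    Returns None if the text does not match the pattern or the first
    part is not a valid calendar date.
    """
    m = CODE_RE.fullmatch(code.strip())
    if not m:
        return None
    try:
        date = datetime.strptime(m.group(1), "%m%d%y").date()
    except ValueError:
        return None  # e.g. month 13 or day 32
    return date, int(m.group(2))
```

For the example above, `parse_studio_code("122713-508")` yields the date 2013-12-27 and the number 508, while an ordinary DVD code gives `None` and can be handled by a different rule.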
When I get home and have some free time, I will take some screenshots to post here.
That's it for my update, next I will reply some...