Thread: Can't install extreme dmoz extractor

  1. #1
    Join Date
    Sep 2004
    Posts
    6

    Can't install extreme dmoz extractor

    I'm trying to install the evaluation version of Extreme Dmoz Extractor, and I get this:

    c:\windows\system32\msxml4.dll
    unable to register the dll/ocx: loadlibrary failed; code 1114

    If I press 'ignore' and install it anyway, then when I try to parse the rdf, it says:

    run time error 429 activex component can't create object

    I've installed it twice. The first time, it also said that I had a later version of comdlg32.ocx, and I kept the file I already had. The second time, thinking that might be the problem, I installed the version of this file that Extreme Dmoz Extractor included.

    My system is Windows XP, in Spanish.

  2. #2
    Join Date
    Sep 2004
    Posts
    6

    Never mind, I've found the answer in older messages.

  3. #3
    Join Date
    Sep 2004
    Posts
    6

    I see in older messages that the evaluation version can't filter by path. That is quite inconvenient for me, because I wanted to test the http://dmoz.org/World/Espa%c3%b1ol/Pa%c3%adses/ (Top: World: Español: Países) path to make sure that extreme dmoz extractor doesn't have any issues with the UTF-8 codes. It's not an unnecessary check: it seems that a competing product, the one that crawls dmoz.org instead of using the dumps, can't crawl categories with UTF-8 codes in the URL. So, can anybody confirm that extreme dmoz extractor parses the international categories without known problems?

  4. #4
    Join Date
    Aug 2001
    Location
    Indonesia
    Posts
    3,732

    With the evaluation version, you're restricted as follows:
    - only the first 1000 records of the dmoz rdf dump are parsed
    - the path filter is disabled
    - split file output is disabled

    To get all functions working, you need to buy the license.

    About the western and asian characters issue: the current version is not able to parse these characters well. The result will be blank or unreadable characters.

    We're still working on this issue. We are developing our own xml parser to get these characters parsed, so we will no longer use the msxml parser from Microsoft. The output file will be in a unicode character set.

    All extreme dmoz extractor customers will get a free update to version 2; you can expect it to be released in the next two or three months. The current development status: we have successfully parsed structure.u8.rdf with the result in a unicode character set, but we have encountered problems parsing content.u8.rdf. Still working on it.

  5. #5
    Join Date
    Sep 2004
    Posts
    6

    Bummer!

    Do you think the following would work as a solution until the next version is released?

    - Converting all the extended UTF-8 characters (at least the ones I need for my categories, world/español) to the codes dmoz uses in the URLs (for example: ñ into %c3%b1) in the u8 files before parsing them.
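
    A minimal sketch of that conversion in Python, assuming the dump is read as UTF-8 text. The function name is made up for illustration; it is not part of Extreme Dmoz Extractor.

```python
def percent_encode_non_ascii(text: str) -> str:
    """Replace every non-ASCII character with the lowercase
    percent-encoding of its UTF-8 bytes; ASCII is left untouched."""
    out = []
    for ch in text:
        if ord(ch) < 128:
            out.append(ch)
        else:
            out.append("".join(f"%{b:02x}" for b in ch.encode("utf-8")))
    return "".join(out)


print(percent_encode_non_ascii("Top/World/Español/Países"))
# Top/World/Espa%c3%b1ol/Pa%c3%adses
```

    Note that a literal % already present in the data is left alone, so a later decoding step could confuse it with one of the generated codes.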

  6. #6
    Join Date
    Aug 2001
    Location
    Indonesia
    Posts
    3,732

    - To convert all UTF8-extended codes (at least the ones I need for my categories, world/español) to the codes dmoz uses in the URLs (for example: ñ into %c3%b1) in the u8 files before parsing them?
    If you only need to parse categories with western or asian characters (like Español), you should not buy right now, because we do not provide a solution yet. But if you want to parse other categories with English characters, we'd be happy if you bought our software.

    There is no need to URL-encode the characters.

  7. #7
    Join Date
    Sep 2004
    Posts
    6

    Well, since I really need this DMOZ data now, I had no alternatives, and the price wasn't so high, I've bought the license and tried to apply my solution, and so far it works. Yes, changing the UTF-8 codes to URL-encoded ones, parsing the dumps, and then changing the URL-encoded codes back to whatever you want is a pain in the ass. I hope the update makes all this unnecessary, but I'm happy with my twisted, overworked solution.
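
    The "change it back" step can be sketched like this in Python, assuming each extracted field is available as a string. The helper name is hypothetical.

```python
from urllib.parse import unquote


def percent_decode(field: str) -> str:
    """Turn percent-encoded UTF-8 sequences back into characters."""
    return unquote(field, encoding="utf-8", errors="replace")


print(percent_decode("Top/World/Espa%c3%b1ol/Pa%c3%adses"))
# Top/World/Español/Países
```

    This decodes every percent sequence, including ones such as %20 that were already in the original data, so it is probably safer to apply it only to fields like category paths and titles rather than to site URLs.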

    Now, I have another question: are there more fields than the ones that appear when you press 'Default'? I was thinking specifically about the descriptions of the categories (I'm not using Indexu but a customized solution, and these descriptions would come in handy, since my system has the right place to use them).

  8. #8
    Join Date
    Aug 2001
    Location
    Indonesia
    Posts
    3,732

    As far as I know, dmoz does not provide descriptions of categories.

  9. #9
    Join Date
    Sep 2004
    Posts
    6

    Yes, they do. I don't know if there are descriptions for all topics, but browsing the rdf dumps, I see plenty of 'd:Description' fields. E.g.:

    <Topic r:id="Top/World/Catal%c3%a1*/Regional/Pa%c3%ads_Valenci%c3%a1*/Val%c3%a8ncia/Camp_de_T%c3%baria/Sant_Antoni_de_Benaixeve">
    <catid>1127131</catid>
    <d:Title>Sant_Antoni_de_Benaixeve</d:Title>
    <d:Description>Llocs web, associacions, institucions i empreses relacionats amb la poblaci%c3%b3. </d:Description>
    <altlang r:resource="Castell%c3%a1*:Top/World/Espa%c3%b1ol/Pa%c3%adses/Espa%c3%b1a/Comunidades_Aut%c3%b3nomas/Comunidad_Valenciana/Valencia/Camp_de_T%c3%baria/San_Antonio_de_Benag%c3%a9ber"/>
    <lastUpdate>2004-06-02 15:34:58</lastUpdate>
    </Topic>
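
    For anyone who wants to pull these descriptions out themselves, here is a rough sketch with Python's standard xml.etree.ElementTree. The namespace URIs and the sample document are assumptions made for illustration; check the RDF header of your own dump for the real declarations.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URIs; verify against the header of the actual dump.
NS = {
    "r": "http://www.w3.org/TR/RDF/",
    "d": "http://purl.org/dc/elements/1.0/",
    "z": "http://dmoz.org/rdf/",
}

# Made-up sample in the same shape as the Topic shown above.
SAMPLE = """<RDF xmlns:r="http://www.w3.org/TR/RDF/"
     xmlns:d="http://purl.org/dc/elements/1.0/"
     xmlns="http://dmoz.org/rdf/">
<Topic r:id="Top/World/Example">
<catid>1127131</catid>
<d:Title>Example</d:Title>
<d:Description>Example description. </d:Description>
<lastUpdate>2004-06-02 15:34:58</lastUpdate>
</Topic>
</RDF>"""


def topic_descriptions(xml_text):
    """Return one dict per Topic with its id, catid and description."""
    root = ET.fromstring(xml_text)
    rows = []
    for topic in root.findall("z:Topic", NS):
        rows.append({
            "id": topic.get(f"{{{NS['r']}}}id"),
            "catid": topic.findtext("z:catid", default="", namespaces=NS),
            "description": topic.findtext(
                "d:Description", default="", namespaces=NS).strip(),
        })
    return rows


print(topic_descriptions(SAMPLE))
```

    Topics without a d:Description element simply get an empty string, which fits a spreadsheet-like output. The real dumps are large, so a streaming parser would be a better fit there than loading everything at once as this sketch does.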

    What would also be great is to have the 'see alsos' (there are different variations, like 'This category in other languages', etc.), but I don't know how that could be accomplished, since the data returned by Extreme Dmoz Extractor is more spreadsheet-like than xml-like, and a category can have many 'see alsos' or none. Maybe it could be a single field (or two or three, one for every kind of 'see also') holding all the codes of the categories the 'see alsos' point to.
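
    The single-field idea could look roughly like this in Python. It is only a sketch: the namespace URIs, the separator, and the function name are assumptions, and altlang is the only link tag taken from the dump excerpt above.

```python
import xml.etree.ElementTree as ET

R = "http://www.w3.org/TR/RDF/"  # assumed URI for the r: prefix
Z = "http://dmoz.org/rdf/"       # assumed default namespace of the dump

# Made-up Topic with two links of the same kind.
SAMPLE = f"""<Topic xmlns="{Z}" xmlns:r="{R}" r:id="Top/World/A">
<altlang r:resource="Lang1:Top/World/B"/>
<altlang r:resource="Lang2:Top/World/C"/>
</Topic>"""


def link_fields(topic, tags=("altlang",), sep="|"):
    """Build one flat field per link tag by joining all of its
    r:resource targets; a tag with no links gives an empty string."""
    fields = {}
    for tag in tags:
        targets = [el.get(f"{{{R}}}resource", "")
                   for el in topic.findall(f"{{{Z}}}{tag}")]
        fields[tag] = sep.join(targets)
    return fields


topic = ET.fromstring(SAMPLE)
print(link_fields(topic))
# {'altlang': 'Lang1:Top/World/B|Lang2:Top/World/C'}
```

    The separator has to be a character that never appears in a category path; "|" is just a guess here.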
