Thread: dmoz extractor plugin updated

  1. #1
    Join Date
    Aug 2001
    Location
    Indonesia
    Posts
    3,732

    dmoz extractor plugin updated

    I just updated the dmoz extractor plugin, which is available at indexutemplates.com.

    The update fixes the parsing of dmoz links, which broke because dmoz.org changed its HTML code. I also added the html_entity_decode function to fix the problem with non-English characters; the output is in UTF-8 format.
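
    To illustrate what the entity decoding does, here is a minimal sketch in Python. The plugin itself is PHP and calls html_entity_decode; Python's html.unescape is used here only as a rough analogue, and the sample title and the helper name decode_title are made up for this example.

```python
import html


def decode_title(raw: str) -> bytes:
    """Turn HTML entities into real characters and return UTF-8 bytes.

    Hypothetical helper: it mirrors the idea of the plugin fix,
    not the plugin's actual PHP code.
    """
    return html.unescape(raw).encode("utf-8")


# Made-up link title as it might appear in the HTML of a category page.
raw_title = "Caf&eacute; M&uuml;nchen &amp; Co."

decoded = html.unescape(raw_title)
print(decoded)                  # Café München & Co.
print(decode_title("&eacute;"))  # b'\xc3\xa9'
```

    Without this step the entity text (for example &eacute;) would be stored literally in the link title instead of the accented character.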

    In this announcement I would also like to share some tips and useful SQL queries for cleaning categories and links, in case you are not happy with the import result.

    REMEMBER TO ALWAYS BACK UP THE DATABASE FIRST. You can use the indexu admin panel or phpMyAdmin to generate a full database backup.

    --

    HOW TO USE IT

    1) Upload Category File (categories.txt), which you can get from http://rdf.dmoz.org/rdf/categories.txt

    2) Load Category File. This puts the dmoz categories into the database.

    3) Import Category. Enter the dmoz path here, and add / at the end of the path.
    Correct entry: Business/Real_Estate/
    Wrong entry: Business/Real_Estate

    If you want to import the links as well, do it in this step by ticking the Import with link option.

    4) If you want to put links into categories other than the ones dmoz uses, use the Import Link menu. There you are free to put links in any category.

    5) Run Update Category Path
    6) Run Update number of links
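
    The trailing-slash rule from step 3 is easy to get wrong, so here is a small sketch in Python that checks and repairs a path before you paste it into the Import Category form. The function names are hypothetical and the plugin does not expose anything like them; this only encodes the "must end with /" rule stated above.

```python
def has_trailing_slash(path: str) -> bool:
    """Return True if the dmoz path ends with '/', as step 3 requires."""
    return path.endswith("/")


def with_trailing_slash(path: str) -> str:
    """Return the path with a '/' appended if it is missing."""
    path = path.strip()
    if not path:
        raise ValueError("dmoz path is empty")
    return path if path.endswith("/") else path + "/"


print(has_trailing_slash("Business/Real_Estate"))    # False
print(with_trailing_slash("Business/Real_Estate"))   # Business/Real_Estate/
print(with_trailing_slash("Business/Real_Estate/"))  # Business/Real_Estate/
```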


    SQL QUERIES

    SQL query to clean categories:

    Code:
    DELETE FROM idx_category;

    You also need to clear the category path table, otherwise the plugin will refuse to import: it will treat the categories as already imported (duplicates).

    Code:
    DELETE FROM idx_category_path;

    SQL query to clean links:

    Code:
    DELETE FROM idx_link;
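
    If you prefer to run the three cleaning queries together rather than one by one, the idea can be sketched like this in Python. This is only an illustration under loud assumptions: it uses an in-memory SQLite database as a stand-in for the real MySQL database, the single id column is invented, and clear_directory is a hypothetical helper. Only the three table names come from the queries above. On a real install, whether the deletes roll back together depends on the MySQL storage engine, so the backup is still required.

```python
import sqlite3

# Table names taken from the queries above; everything else is assumed.
TABLES = ("idx_category", "idx_category_path", "idx_link")


def clear_directory(conn: sqlite3.Connection) -> None:
    """Delete all rows from the category, category path and link tables."""
    with conn:  # commits on success, rolls back if a statement fails
        for table in TABLES:
            conn.execute(f"DELETE FROM {table}")


# Stand-in database with one dummy row per table.
conn = sqlite3.connect(":memory:")
for table in TABLES:
    conn.execute(f"CREATE TABLE {table} (id INTEGER PRIMARY KEY)")
    conn.execute(f"INSERT INTO {table} (id) VALUES (1)")
conn.commit()

clear_directory(conn)

remaining = {
    table: conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    for table in TABLES
}
print(remaining)  # {'idx_category': 0, 'idx_category_path': 0, 'idx_link': 0}
```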

    COMMON PROBLEMS

    1) Problem: Incorrect DMOZ Path
    Solution: Add / at the end of the dmoz path.

  2. #2

    Do we run the sql cleaning queries each time we use the plugin? Assuming that we can use the plugin multiple times choosing to import one category then later another group. Each time do we need to clean?
    Run the queries only if you want to clear your directory. They will remove all categories and links.

    Does this cleaning need to be done when using the URL method for importing cats and links or does this relate only to manual RDF uploads?
    The dmoz extractor plugin does not use dmoz's RDF. For the category import, it uses the plain categories.txt file provided by dmoz, which indexu then loads into the database for further processing.

    For importing links, indexu crawls the dmoz site for each of the categories. The more categories you have, the longer it takes.

    Only clean the categories and links when you are unhappy with the result.

    Also, I tried multiple ways to enter the DMOZ path when I got the invalid path error. I even copied your example cut and paste and it did not work, so I am hoping that the update to the plugin will fix this problem.
    Which dmoz path / URL do you want to import? I will help you with the correct entry.

  4. #4

    @antony
    I just emailed you with your account information.

  5. #5

    Ah.. thanks.
    I just fixed it.
