I just updated the dmoz extractor plugin, which is available at indexutemplates.com.
The update fixes the parsing of dmoz links, which broke because dmoz.org changed its HTML. I also added the html_entity_decode function to fix problems with non-English characters; the output is in UTF-8 format.
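To illustrate what the entity decoding step does, here is a minimal sketch in Python. It is not the plugin's code (the plugin uses PHP's html_entity_decode); it uses Python's html.unescape as a stand-in, and the function name decode_entities and the sample titles are made up for the example.

```python
import html


def decode_entities(text: str) -> str:
    """Turn HTML entities such as &eacute; or &#252; into real characters.

    Stand-in for the decoding step described above; decode_entities is a
    hypothetical name, not part of the plugin.
    """
    return html.unescape(text)


# Sample link titles, invented for illustration.
print(decode_entities("Caf&eacute; &amp; Bar"))
print(decode_entities("M&#252;nchen Immobilien"))
```

Without this step, a title would be stored with the raw entity text (for example &eacute;) instead of the accented character.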
Also in this announcement, I would like to show you some tips and useful SQL queries for cleaning up categories and links, in case you are not happy with the import result.
REMEMBER TO ALWAYS BACK UP THE DATABASE FIRST. You can use the indexu admin panel or phpMyAdmin to generate a full database backup.
--
HOW TO USE IT
1) Upload Category File: upload categories.txt, which you can get from http://rdf.dmoz.org/rdf/categories.txt
2) Load Category File: this puts the dmoz categories into the database.
3) Import Category: enter the dmoz path here. Add a / at the end of the path.
Correct entry: Business/Real_Estate/
Wrong entry: Business/Real_Estate
If you want to import the links as well, do it in this step by ticking the Import with link option.
4) If you want to put links into categories other than the ones dmoz uses, use the Import Link menu. There you are free to put links in any category.
5) Run Update Category Path
6) Run Update number of links
SQL queries
SQL query to clean categories:
Code:
delete from idx_category
You also need to clear the category path table, otherwise the plugin will refuse to import. It will treat the category as already imported (a duplicate).
Code:
delete from idx_category_path
SQL query to clean links:
Code:
delete from idx_link
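If you prefer to run the cleanup from a script, the queries above can be grouped so that the category table and the category path table are always cleared together. The following is only a rough sketch in Python: it runs against an in-memory SQLite database as a stand-in for the real MySQL database, the single id column is a placeholder (the real tables have more columns), and the names CLEANUP and clean_tables are made up for the example.

```python
import sqlite3

# The delete statements from this post, grouped by what they clean.
# Categories need both tables cleared, or the plugin will see duplicates.
CLEANUP = {
    "categories": ["DELETE FROM idx_category", "DELETE FROM idx_category_path"],
    "links": ["DELETE FROM idx_link"],
}


def clean_tables(conn: sqlite3.Connection, what: str) -> None:
    """Run every delete for one group inside a single transaction."""
    with conn:  # commits if all statements succeed, rolls back on error
        for statement in CLEANUP[what]:
            conn.execute(statement)


# Demo on a throwaway in-memory database with placeholder tables.
conn = sqlite3.connect(":memory:")
for table in ("idx_category", "idx_category_path", "idx_link"):
    conn.execute(f"CREATE TABLE {table} (id INTEGER)")
    conn.execute(f"INSERT INTO {table} VALUES (1)")
conn.commit()

clean_tables(conn, "categories")  # links are left untouched
```

Running both category deletes in one transaction means you never end up with an empty idx_category but a leftover idx_category_path. As always, back up the real database before trying anything like this on it.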
COMMON PROBLEMS
1) Problem: Incorrect DMOZ Path
Solution: add a / at the end of the dmoz path
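If you keep your dmoz paths in a list or a script, a small helper can make sure every path has the trailing slash before you paste it into the import form. This is just an illustrative sketch in Python; normalize_dmoz_path is a hypothetical name and is not part of the plugin.

```python
def normalize_dmoz_path(path: str) -> str:
    """Return the path with exactly one / at the end.

    Hypothetical helper: it only trims surrounding whitespace and fixes
    the trailing slash, it does not check that the category exists.
    """
    cleaned = path.strip().rstrip("/")
    if not cleaned:
        raise ValueError("empty dmoz path")
    return cleaned + "/"


print(normalize_dmoz_path("Business/Real_Estate"))
print(normalize_dmoz_path("Business/Real_Estate/"))
```

Both calls give the form shown as the correct entry in step 3.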


