Extreme DMOZ Extractor & IndexU™
How to use Extreme DMOZ Extractor with IndexU™ v5.4:
1 – You must have the DMOZ database files. You can download them from http://rdf.dmoz.org
2 – Make sure you have IndexU™ v5.4.
Please note that IndexU™ v5.4 is not designed to handle a large
database. IndexU™ will crash with a database of over 2,000 categories
and 50,000 web links, and works best with a database of about 1,000
categories and 30,000 web links. Extreme DMOZ Extractor can extract all
the DMOZ database records, either in full or in part, but because of
these IndexU™ limits you can't import a large extract into IndexU™.
Please use it wisely.
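The limits in the note above can be turned into a quick sanity check before importing. The following is only a minimal sketch: the function name and the "ok" / "risky" / "too large" labels are made up for illustration, and treating either limit on its own as a problem is a cautious assumption, not something IndexU™ documents.

```python
# Figures taken from the note above for IndexU v5.4.
RECOMMENDED_CATEGORIES = 1000
RECOMMENDED_LINKS = 30000
CRASH_CATEGORIES = 2000
CRASH_LINKS = 50000


def check_indexu_limits(categories, links):
    """Classify an extract by size (hypothetical helper, not part of IndexU)."""
    # Assumption: exceeding either crash figure is already too much.
    if categories > CRASH_CATEGORIES or links > CRASH_LINKS:
        return "too large"
    # Above the recommended size but below the crash figures.
    if categories > RECOMMENDED_CATEGORIES or links > RECOMMENDED_LINKS:
        return "risky"
    return "ok"


if __name__ == "__main__":
    # Roughly 400 categories and 11,000 web links, like a Sports/Soccer extract.
    print(check_indexu_limits(400, 11000))
```

The category and link counts have to come from your own extract; the sketch does not read any Extreme DMOZ Extractor files.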
3 – Open Extreme DMOZ Extractor, which will automatically start a new project.
4 – Select which file type you want to parse: categories or web links.
5 – Select the .RDF file location.
6 – Select output file location.
7 – Click default in “Arrange output data fields”.
8 – Next, if you are parsing categories, the 'Set a root' option will appear. Use this option to parse the database partially; otherwise leave it empty.
Enter the DMOZ category you want to parse.
For example, suppose you want to parse the category Soccer, which is at http://www.dmoz.org/Sports/Soccer/. You should enter Top/Sports/Soccer (replace http://www.dmoz.org with Top, and do not end it with "/").
If you are parsing web links, the 'Category Path Filter' option will
appear instead. It works the same way as when you parse categories: use
it to parse the database partially; otherwise leave it empty. Enter the
DMOZ category you parsed before. For example, if you have parsed the
category Soccer and now want to extract its web links, enter
Top/Sports/Soccer, which must be the same value you used when you parsed the category.
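The rule for turning a DMOZ address into the value for 'Set a root' or 'Category Path Filter' can be sketched in a few lines. This is only an illustration of the rule described in step 8 (replace http://www.dmoz.org with Top and drop the trailing "/"); the function name is hypothetical and the code is not part of Extreme DMOZ Extractor.

```python
DMOZ_PREFIX = "http://www.dmoz.org"


def dmoz_url_to_root(url):
    """Convert a DMOZ category URL into a 'Top/...' category path."""
    if not url.startswith(DMOZ_PREFIX):
        raise ValueError("not a http://www.dmoz.org address: " + url)
    # Drop the site prefix, then any leading or trailing slashes.
    path = url[len(DMOZ_PREFIX):].strip("/")
    # The bare site address maps to the root category itself.
    return "Top/" + path if path else "Top"


if __name__ == "__main__":
    print(dmoz_url_to_root("http://www.dmoz.org/Sports/Soccer/"))
```

For the Soccer example this gives Top/Sports/Soccer, the same value to use for both the category pass and the web links pass.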
9 – You may want to split the output file. This keeps each file from
holding too many results: the larger the file, the more time is needed
to load it. Set this value to 5,000-10,000.
If you are parsing categories for IndexU™ v5.4, you must not split the
file; otherwise the IndexU™ categories will be incorrect. You can
safely split the file when you parse web links.
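To picture what splitting does, here is a minimal sketch that cuts a sequence of web link records into pieces of a fixed size. It is not how Extreme DMOZ Extractor is implemented; the function name is hypothetical and the records are treated as opaque items because the tool's output layout is not described here. In practice, use the split option in the tool itself.

```python
def split_records(records, chunk_size):
    """Yield successive lists of at most chunk_size records."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    chunk = []
    for record in records:
        chunk.append(record)
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    # Emit the final, possibly shorter, piece.
    if chunk:
        yield chunk


if __name__ == "__main__":
    # About 11,000 placeholder web link records, split 5,000 at a time.
    links = ["link %d" % i for i in range(11000)]
    for number, part in enumerate(split_records(links, 5000), start=1):
        print(number, len(part))
```

Only the web links should be split this way; the category output has to stay in one file, as noted in step 9.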
10 – Click Parse and wait several minutes until it is done, then check the results.
11 – Now you have captured all the data from the Sports/Soccer category
and are ready to import it into IndexU™. It contains over 400 categories
and 11,000 web links.
12 – Open your IndexU™ v5.4 and make sure your database is empty.
IndexU™ can't import if your database already contains categories and
web links. To empty the database, you can use a tool like phpMyAdmin or
run the following queries in the query box (Administrator area -> Database Tables -> Alter):
delete from idx_link
and
delete from idx_category
If you need to add to an existing database, you have to adapt the
Extreme DMOZ Extractor output data first: the category IDs must be
synchronized with your existing database. You can do this manually
using MS Excel or another spreadsheet.
13 – When you are ready to import the data into IndexU™, go to Administrator area -> Database Tables -> Import. Select the file name and table, set the file format to Extreme DMOZ Extractor, and click Import.
Remember to import the categories first, before the web links.
14 – Now let IndexU™ recalculate the number of web links. Click Administrator area -> Tools -> Update Number of Links.
And it's done. Enjoy your new web directory.