Has anyone used the Extreme DMOZ Extractor that was loaded to the Download page on 16 Oct?
esm
"The older I get, the more I admire competence, just simple competence, in any field from adultery to zoology."
.
esm, any questions about Extreme?
The trial version can extract up to 1000 records from the database, with no filtering and no file splitting.
Just wondering if anyone had any experience with it. I don't need it myself, unless I try to build some link sites.
File split?
esm
When extracting the dmoz database, you can easily end up with thousands of web links. Say you get 43,000 web links: you can tell Extreme to split the result into several smaller files of 5000 web links each.
This is useful if you will work with Excel for further processing, or if you want to import into indexu. It's a good idea to import 5000 records 9 times rather than 43,000 records at once.
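A rough illustration of what a file-split option like this does, assuming the output has one record per line and no header row (the real Extreme output format isn't described in this thread). The function names and the `_partN` file naming are made up for this sketch:

```python
from pathlib import Path
from typing import List, Sequence


def chunk_records(records: Sequence[str], chunk_size: int = 5000) -> List[List[str]]:
    """Split records into consecutive chunks of at most chunk_size."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [list(records[i:i + chunk_size])
            for i in range(0, len(records), chunk_size)]


def split_file(src: Path, out_dir: Path, chunk_size: int = 5000) -> List[Path]:
    """Write src (assumed: one record per line, no header) into numbered part files."""
    lines = src.read_text(encoding="utf-8").splitlines(keepends=True)
    out_dir.mkdir(parents=True, exist_ok=True)
    parts = []
    for n, chunk in enumerate(chunk_records(lines, chunk_size), start=1):
        part = out_dir / f"{src.stem}_part{n}{src.suffix}"  # hypothetical naming
        part.write_text("".join(chunk), encoding="utf-8")
        parts.append(part)
    return parts


if __name__ == "__main__":
    chunks = chunk_records([f"link {i}" for i in range(43000)])
    print(len(chunks), len(chunks[-1]))  # 9 3000
```

With 43,000 records this gives eight full files of 5000 plus a last file of 3000, which is where the "9 times" comes from. If the real output has a header row, it would need to be copied into every part before importing.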
I downloaded Extreme DMOZ Extractor, but during installation I get the following error:
C:\windows\system32\msxml4.dll
Unable to register the dll/ocx: loadlibrary failed; code 1114
A dynamic link library (DLL) initialization routine failed.
And when I try to parse the data, I get:
Run time error '429'
ActiveX component can't create object
Would you please help me?
MS XML 4 failed to install. Let me know your Windows version.
I downloaded the file from Microsoft and now your program appears to work, but when I try to do a partial parse I only get the first 1000 results of Top/Arts. I followed your instructions for Top/Sports/Soccer, but it always returns data for Top/Arts/movies.
When I look at the "summary" it says:
Filters: No
Why is that? Am I doing something wrong, or is it the program?
Extreme DMOZ Extractor?
Where can I find it?
Frank
At the bottom of
http://www.nicecoder.com/download.php
esm
Found it
http://www.nicecoder.com/community/s...&threadid=1900
Same problem as mine, Dody. I'm beginning to get a little angry: it seems that guy has been waiting 12 days for an answer, and now I've spent an hour of my time only to find that I have the same problem and nothing is working.
Does this program work at all? That is, have you tested it successfully?
Frank
Nope, never tried it. That's why I asked if anyone had.
esm
Do not waste your time.
I tried it for 2 hours, and since nearly 30 of my emails to the staff went unanswered, I also tried various cracks. But the problem is that the program doesn't work, and because of the missing parent_cat_id I currently have no idea how it is supposed to work...
Frank
What?? That's strange! I have replied to your emails, Frank; I answered 3 emails from you 2 days ago.
Where did you send your email? It should go to support@nicecoder.com. Do not email me at support@indexu.com or support@sentraweb.com.
Hello Dody,
Sorry, but I have mailed all of your addresses, and you should also have several unread messages in your forum inbox. I'm talking about emails over the whole year...
Frank
BTW: the DMOZ tool still doesn't work.
Hi Frank, I'm posting my reply here too, so others who have the same problem can solve it here.
OK, according to your explanation, I assume you want to extract the part of dmoz's database listed under
http://dmoz.org/World/Deutsch/Online-Shops/ (there are over 10,000 web links here).
I'm not sure where it failed for you: when extracting the database, or when importing the extracted database into indexu.
Extracting dmoz's database:
-------------------------------------
1. Make sure you have downloaded dmoz.org's database from http://rdf.dmoz.org
You should get 2 files from there: structure.rdf.u8.gz and content.rdf.u8.gz. Both files are gzip-compressed, so you must decompress them before you can use them. You can use WinZip or WinRAR to extract .gz files. You should then have structure.rdf.u8 and content.rdf.u8.
2. Open Extreme DMOZ Extractor. First you will need to extract the categories. Your input should be:
- File type: "Category hierarchy information"
- Open RDF file: c:\structure.rdf.u8 (should point to the location of your structure.rdf.u8 file on your hard disk)
- Save output: c:\dmoz\online-shop-cat.txt (where your output file will be generated)
- Arrange default data field: <click default button>
- Set root: Top/World/Deutsch/Online-Shops (note that this input does not end with /)
- Split files: (leave this empty)
3. Click Parse.
You should now have your output files. Remember that if you use the evaluation version, you are limited to extracting 1000 records: parsing stops automatically when it reaches 1000.
4. Follow the same steps above to parse the web links file, content.rdf.u8.
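For anyone who wants to check what should end up in the category output, here is a minimal Python sketch of steps 1 and 2 combined: it reads the compressed structure file directly with `gzip` (no separate unzip step) and keeps only topics under the chosen root. It assumes the structure file uses `<Topic r:id="Top/...">` elements with a `<catid>` child; the `SAMPLE` data, its IDs, and the `Online-Shops-Archiv` category are invented for illustration, and this is not how Extreme itself works internally.

```python
import gzip
import io
import xml.etree.ElementTree as ET
from typing import IO, Iterator, Tuple

# Made-up sample in the assumed structure.rdf.u8 layout (IDs are invented).
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<RDF xmlns:r="http://www.w3.org/TR/RDF/" xmlns="http://dmoz.org/rdf/">
  <Topic r:id="Top/World/Deutsch/Online-Shops"><catid>1</catid></Topic>
  <Topic r:id="Top/World/Deutsch/Online-Shops/Mode"><catid>2</catid></Topic>
  <Topic r:id="Top/World/Deutsch/Online-Shops-Archiv"><catid>3</catid></Topic>
  <Topic r:id="Top/Arts"><catid>4</catid></Topic>
</RDF>"""


def local(name: str) -> str:
    """Drop an XML namespace: '{http://dmoz.org/rdf/}Topic' -> 'Topic'."""
    return name.rsplit("}", 1)[-1]


def under_root(topic: str, root: str) -> bool:
    """True for the root itself or any descendant (root given without a trailing '/')."""
    return topic == root or topic.startswith(root + "/")


def iter_topics(stream: IO[bytes], root: str) -> Iterator[Tuple[str, str]]:
    """Yield (topic path, catid) for every Topic under root, streaming the file."""
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if local(elem.tag) != "Topic":
            continue
        topic = next((v for k, v in elem.attrib.items() if local(k) == "id"), "")
        if under_root(topic, root):
            catid = next((c.text or "" for c in elem if local(c.tag) == "catid"), "")
            yield topic, catid
        elem.clear()  # drop finished elements; the real dump is very large


if __name__ == "__main__":
    # With the real file this would be: gzip.open("structure.rdf.u8.gz", "rb")
    compressed = io.BytesIO(gzip.compress(SAMPLE))
    with gzip.open(compressed, "rb") as f:
        for topic, catid in iter_topics(f, "Top/World/Deutsch/Online-Shops"):
            print(catid, topic)
```

Matching on `root + "/"` rather than a bare string prefix keeps a sibling such as the made-up `Online-Shops-Archiv` out of the results while still including every subcategory.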
Importing into indexu 3.1
-------------------------------
1. Make sure you have installed the Extreme Patch. This patch contains the functionality to import data extracted with Extreme DMOZ Extractor. You can download the patch here: http://www.nicecoder.com/download.php
2. Remember that data extracted with Extreme may not be usable with your existing database, because it has a different category structure (the id, parent_id, etc.). So you should use an empty database; otherwise, use MS Excel or a text editor to synchronize them.
3. Go to indexu administration -> database tables -> import
Then set the file format to Extreme DMOZ Extractor and click Import. Remember to import the categories before the web links.
4. Follow the above steps to import web links.
5. Then you need to recalculate the number of web links:
Click Administrator area -> Tools -> Update Number of Links.
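One way to do the "synchronize" part of step 2 outside Excel, sketched in Python under stated assumptions: categories carry `id` and `parent_id`, links carry a `category_id`, and `parent_id` 0 means top level. These field names and the ID-offset approach are hypothetical, not indexu's actual schema. The sketch also shows why categories go in before links (step 3) and roughly what a recount in step 5 involves; indexu's own Update Number of Links may count differently, for example only direct links.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Category:
    id: int
    parent_id: int  # hypothetical convention: 0 means "top level"
    name: str


@dataclass(frozen=True)
class Link:
    id: int
    category_id: int
    url: str


def shift_ids(cats: List[Category], links: List[Link],
              offset: int) -> Tuple[List[Category], List[Link]]:
    """Move extracted IDs up by `offset` so they can't collide with existing rows.
    Assumes offset >= the largest category or link id already in the database."""
    new_cats = [
        replace(c, id=c.id + offset,
                parent_id=c.parent_id + offset if c.parent_id else 0)
        for c in cats
    ]
    new_links = [replace(link, id=link.id + offset,
                         category_id=link.category_id + offset)
                 for link in links]
    return new_cats, new_links


def orphan_links(cats: List[Category], links: List[Link]) -> List[Link]:
    """Links pointing at a category that isn't there (e.g. links imported first)."""
    known = {c.id for c in cats}
    return [link for link in links if link.category_id not in known]


def link_counts(cats: List[Category], links: List[Link]) -> Dict[int, int]:
    """Links per category, counting each link in its category and all ancestors."""
    parent = {c.id: c.parent_id for c in cats}
    counts = {c.id: 0 for c in cats}
    for link in links:
        cid, seen = link.category_id, set()
        while cid in counts and cid not in seen:  # stop at 0/unknown or a cycle
            seen.add(cid)
            counts[cid] += 1
            cid = parent[cid]
    return counts


if __name__ == "__main__":
    cats = [Category(1, 0, "Online-Shops"), Category(2, 1, "Mode")]
    links = [Link(1, 2, "http://example.com/")]
    cats2, links2 = shift_ids(cats, links, offset=1000)
    print(orphan_links(cats2, links2))  # []
    print(link_counts(cats2, links2))   # {1001: 1, 1002: 1}
```

Shifting every ID by one offset keeps the parent/child relationships intact, which is the part that breaks if extracted IDs are mixed into a non-empty database as-is.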
I think I have explained the steps clearly here:
http://www.nicecoder.com/dmoz_extractor.php
Let me know which step you have failed.