[topicmapmail] DMOZ in XTM

Lars Marius Garshol larsga@ontopia.net
Wed, 13 Oct 2004 09:42:32 +0200


* Marcel Ferrante
|
| OK, Lars. As Jack "the Ripper" would say, let's take it piece by piece...

Right. That's quite a morbid image. :)
 
| - As for DMOZ: at the heart of the project, I don't suggest having my
| system run its queries against one big XTM file. I downloaded the
| DMOZ file and it is more than 1 GB... Being practical, I suggest
| implementing an ER model of the topic maps concepts, which is
| different from XTM, as you said.

Actually, we did a DMOZ-to-TM conversion way back in 2000 or 2001 as a
scalability test. It worked fine, but we didn't do anything more with
it. So you can certainly represent DMOZ as a topic map. The question
is whether there's any point in it if all you're going to have is the
same taxonomy structure as DMOZ already has. You only need topic maps
if you use them to do something more.

|   - So, the next question is: why am I using XTM after all?

Yes. :-)

|     - To interchange the data, if someone wants to do their
| classification offline or to make the service available to other
| applications. From this point of view, web services come in the next
| phase.

For this kind of thing topic maps would work well (despite the
verbosity of XTM), especially because it would be easy to define
identifiers that would let anyone attach their own extensions at any
point in the structure.
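
To make the identifier idea concrete, here is a minimal sketch in
Python, not a description of any existing conversion. It writes one
category as an XTM 1.0 topic carrying a subject indicator, then shows a
second party attaching an extra resource to the same category from a
separate topic map. The example.org URIs, the topic ids, and the helper
names (make_topic, merge_by_indicator) are all assumptions made up for
illustration, and the merge is deliberately simplified: it only groups
names and resources by subject indicator.

```python
import xml.etree.ElementTree as ET

XTM = "http://www.topicmaps.org/xtm/1.0/"
XLINK = "http://www.w3.org/1999/xlink"
ET.register_namespace("", XTM)
ET.register_namespace("xlink", XLINK)
HREF = f"{{{XLINK}}}href"


def q(tag):
    """Qualify a tag name with the XTM 1.0 namespace."""
    return f"{{{XTM}}}{tag}"


def make_topic(topic_map, topic_id, indicator, name=None, resources=()):
    """Append a topic identified by a subject indicator URI."""
    topic = ET.SubElement(topic_map, q("topic"), {"id": topic_id})
    identity = ET.SubElement(topic, q("subjectIdentity"))
    ET.SubElement(identity, q("subjectIndicatorRef"), {HREF: indicator})
    if name is not None:
        base_name = ET.SubElement(topic, q("baseName"))
        ET.SubElement(base_name, q("baseNameString")).text = name
    for url in resources:
        occurrence = ET.SubElement(topic, q("occurrence"))
        ET.SubElement(occurrence, q("resourceRef"), {HREF: url})
    return topic


def merge_by_indicator(*topic_maps):
    """Group names and resources of topics sharing a subject indicator."""
    merged = {}
    ref_path = f"{q('subjectIdentity')}/{q('subjectIndicatorRef')}"
    res_path = f"{q('occurrence')}/{q('resourceRef')}"
    for topic_map in topic_maps:
        for topic in topic_map.findall(q("topic")):
            ref = topic.find(ref_path)
            if ref is None:
                continue  # no identifier, so nothing to merge on
            entry = merged.setdefault(
                ref.get(HREF), {"names": set(), "resources": set()}
            )
            for name in topic.iter(q("baseNameString")):
                entry["names"].add(name.text)
            for res in topic.findall(res_path):
                entry["resources"].add(res.get(HREF))
    return merged


# Hypothetical identifier for one category; not a published URI.
PSI = "http://example.org/dmoz/Top/Computers"

base = ET.Element(q("topicMap"))       # the directory's own topic map
extension = ET.Element(q("topicMap"))  # somebody else's additions

make_topic(base, "computers", PSI, name="Computers",
           resources=["http://example.org/a"])
make_topic(extension, "my-computers", PSI,
           resources=["http://example.org/b"])

merged = merge_by_indicator(base, extension)
print(ET.tostring(base, encoding="unicode"))
print(merged[PSI])
```

The two topics have different ids and live in different files, but
because they share the subject indicator they end up as one entry with
both resources. That is the property that lets third parties extend the
structure without touching the original file.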

| "since creating an ontology for life, the universe, and everything is
| quite a challenge."
| 
| - Let's start with simplicity. The focus at the beginning is on
| organizing the URLs.

Yes, I think if you want to get anywhere you have to simplify the
approach somewhat.
 
|   - The objective at first is to fill the gaps in DMOZ. To me this
| project seems frozen in time. It has been the same thing, the same
| procedure for the user, for the last three years. Points to attack: [...]

This made sense to me, and certainly topic maps would let you improve
all of these things.
 
| - For the future of the project we can think about:
|   - Building client software so the user can do their classification
| more quickly or offline.

Probably something web-based will simplify the project quite a lot.
Doing a desktop application is a lot more work, and gets you into
issues about which platform to support, etc.

|   - Not limiting the topic maps crawler to the DMOZ project; Wikipedia
| is the next victim (and I see Google in the last battle, if Bill
| doesn't arrive first..)

Actually, Wikipedia might be a good source for the start of an ontology
if you can parse the text to extract at least key topics, their names,
and their types.

-- 
Lars Marius Garshol, Ontopian         <URL: http://www.ontopia.net >
GSM: +47 98 21 55 50                  <URL: http://www.garshol.priv.no >