[topicmapmail] DMOZ in XTM

Marcel Ferrante <marcelf@gmail.com>
Thu, 7 Oct 2004 06:29:36 -0300


OK, Lars. As Jack "The Ripper" would say, let's take this piece by piece...

"I think such a project would be very interesting, but it would take
quite a bit of resources, both in terms of hardware and hosting
capacity."

Yes. To implement such a thing, I would consider the following:

- As for DMOZ: at the heart of the project, I don't suggest one big XTM
file for my system to run its queries against. I have downloaded the DMOZ
file and it is more than 1 GB... Being practical, I suggest implementing
an ER (entity-relationship) model of the topic map concepts, which, as you
said, is different from XTM.
  - So the next question is: why use XTM at all?
    - To interchange the data: for example, if someone wants to do their
classification offline, or to make the service available to other
applications. From this point of view, web services would be the next step.
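To make the ER-model idea concrete, here is a minimal sketch, assuming SQLite (Python's built-in `sqlite3`) as a stand-in for MySQL and a hypothetical three-table schema (`topic`, `occurrence`, `association`) invented for illustration. Queries run against the tables; XTM 1.0 is generated only on demand, for interchange. The helper names `xtm_tag` and `export_xtm` are also hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

# XTM 1.0 and XLink namespaces; the XTM one becomes the default namespace.
XTM_NS = "http://www.topicmaps.org/xtm/1.0/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("", XTM_NS)
ET.register_namespace("xlink", XLINK_NS)

# Hypothetical relational (ER) model of a few topic map concepts.
SCHEMA = """
CREATE TABLE topic (
    id   TEXT PRIMARY KEY,      -- also used as the XTM topic id
    name TEXT NOT NULL          -- base name shown to users
);
CREATE TABLE occurrence (
    topic_id TEXT NOT NULL REFERENCES topic(id),
    url      TEXT NOT NULL      -- the classified resource (URL)
);
CREATE TABLE association (
    assoc_type TEXT NOT NULL,   -- e.g. 'specialization', 'aggregation', 'localization'
    parent_id  TEXT NOT NULL REFERENCES topic(id),
    child_id   TEXT NOT NULL REFERENCES topic(id)
);
"""

def xtm_tag(tag):
    """Qualify an element name with the XTM namespace."""
    return f"{{{XTM_NS}}}{tag}"

def export_xtm(conn):
    """Serialize topics and their URL occurrences as an XTM 1.0 string.

    In this sketch associations stay in the database and are not exported.
    """
    root = ET.Element(xtm_tag("topicMap"))
    for topic_id, name in conn.execute("SELECT id, name FROM topic ORDER BY id"):
        topic = ET.SubElement(root, xtm_tag("topic"), {"id": topic_id})
        base = ET.SubElement(topic, xtm_tag("baseName"))
        ET.SubElement(base, xtm_tag("baseNameString")).text = name
        # One <occurrence> per URL classified under this topic.
        for (url,) in conn.execute(
            "SELECT url FROM occurrence WHERE topic_id = ? ORDER BY url", (topic_id,)
        ):
            occ = ET.SubElement(topic, xtm_tag("occurrence"))
            ET.SubElement(occ, xtm_tag("resourceRef"), {f"{{{XLINK_NS}}}href": url})
    return ET.tostring(root, encoding="unicode")

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO topic VALUES (?, ?)", [("arts", "Arts"), ("music", "Music")])
conn.execute("INSERT INTO association VALUES ('specialization', 'arts', 'music')")
conn.execute("INSERT INTO occurrence VALUES ('music', 'http://www.example.org/')")
print(export_xtm(conn))
```

Because `association` is a many-to-many table with an explicit `assoc_type`, a topic can sit under several parents, and the kind of link (specialization, aggregation, localization) is recorded rather than mixed into a single tree.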

"since creating an ontology for life, the universe, and everything is
quite a challenge."

- Let's start simple. In the beginning, the focus is on organizing the URLs.

  - The first objective is to fill the gaps in DMOZ. To me, that project
has stopped in time: for the last three years it has been the same thing,
with the same procedure for the user. Points to attack:
    - The structure of DMOZ mixes up concepts. In the same taxonomy we
find aggregation, specialization, localization, etc.
    - They use a poor faceted classification. A resource (URL) can appear
in many topics, but what about a topic? It should be allowed to do the same.
    > So the structure should be divided into faceted categories, as in
projects like Flamenco or FacetMap. To do the division we can use a good
web thesaurus (e.g. EuroVoc).
    > And most important: users must be able to classify the URLs and
topics themselves, using topic map concepts. There could be a wizard to
train users to do this.
    - The navigation shows only one hierarchical level at a time. So, to
reach the deepest level the user has to wait for the page to refresh five
or six times. Very, very boring!!
    >  See www.knowledgeprocessors.com
    - Searching the directory (via Google) just shows the URLs within the
topics.
    > I want the search to produce a filter, or reflection, over the
structure: navigation combined with search, as in Flamenco
(http://bailando.sims.berkeley.edu/flamenco-interface.html).

  - Build a prototype to gauge reactions.
     - In the beginning I'm thinking of just using MySQL, which is free,
but we can move to Oracle if the project grows.
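The faceted navigation combined with search described above could be prototyped along these lines: a minimal in-memory sketch, assuming each URL carries exactly one value per facet. The facets `subject`, `region` and `type`, the sample URLs, and the functions `search`, `refine` and `facet_counts` are all hypothetical. A search yields a result set, `facet_counts` "reflects" it onto the structure by counting hits per facet value (as Flamenco shows counts next to each category), and `refine` narrows the set one facet at a time.

```python
from collections import Counter

# Hypothetical sample data: each URL has a title and one value per facet.
RESOURCES = {
    "http://samba.example/": {
        "title": "History of samba",
        "facets": {"subject": "Music", "region": "Brazil", "type": "Article"},
    },
    "http://bossa.example/": {
        "title": "Bossa nova records",
        "facets": {"subject": "Music", "region": "Brazil", "type": "Shop"},
    },
    "http://fado.example/": {
        "title": "Fado history and lyrics",
        "facets": {"subject": "Music", "region": "Portugal", "type": "Article"},
    },
    "http://football.example/": {
        "title": "History of football in Brazil",
        "facets": {"subject": "Sports", "region": "Brazil", "type": "Article"},
    },
}

def search(query):
    """Return the URLs whose title contains every word of the query."""
    words = query.lower().split()
    return {u for u, r in RESOURCES.items() if all(w in r["title"].lower() for w in words)}

def refine(urls, facet, value):
    """Keep only the URLs classified under facet=value (one navigation step)."""
    return {u for u in urls if RESOURCES[u]["facets"].get(facet) == value}

def facet_counts(urls):
    """Project a result set onto the facets: hit count per facet value."""
    counts = {}
    for u in urls:
        for facet, value in RESOURCES[u]["facets"].items():
            counts.setdefault(facet, Counter())[value] += 1
    return counts

hits = search("history")
print(facet_counts(hits))      # categories with no hits simply do not appear
hits = refine(hits, "region", "Brazil")
print(sorted(hits))
print(facet_counts(hits))
```

In a MySQL prototype the same projection would more likely be one `GROUP BY` query per facet over the ER model rather than Python loops; the in-memory version only shows the interaction.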

- For the future of the project, we can think about:
  - Building client software so users can do their classification with
more agility, or offline.
  - Harvesting the best URL classifications already done: users'
favorites or bookmarks.
  - Not limiting the topic map crawler to the DMOZ project: Wikipedia is
the next victim (and I see Google in the final battle, if Bill doesn't
get there first...).

To finish: "as well as man-hours and sheer know-how".
I'm writing from Brazil. Thank you for your attention; I was very
surprised when the answer came from a name I knew from the thesis I had
read. Pardon my English. And you can divide your costs by 5 if the
project is done here. I'm serious, this is simply a fact.

-- 
Marcel Ferrante Silva
Systems Engineering Specialist
55 31 88519069     55 31 33789069 ICQ:218148957
MSN: marcelferrante@hotmail.com