[topicmapmail] DMOZ in XTM
Jan Algermissen
algermissen@acm.org
Thu, 07 Oct 2004 15:16:52 +0200
Marcel,
Marcel Ferrante wrote:
> - As for DMOZ: at the heart of the project, I don't suggest one big XTM
> file for my system to run its queries against. I downloaded the DMOZ
> file and it is more than 1 GB... Being practical, I suggest implementing
> an ER model of the topic maps concepts, which is different from XTM, as
> you said.
> - So, the next question is: why am I using XTM after all?
> - To interchange the data, if someone wants to do his
> classification offline or to make the service available to other
> applications. This point of view looks ahead to web services as the
> next step.
Yes, exactly! There is no point in maintaining an ontology (any kind of
data, actually) as an XTM document, probably not even as a topic map in
a topic map engine[1]. The issue is to make data available *as* a
topic map (looking at the data through topic map eyes). I realized this
after converting some thesauri into XTM...it felt so useless, given that
the thesaurus was already stored in a suitable format.
The key (IMHO) is to use XTM as the message MIME type (assuming we'll
have application/xtm+xml at some point in time) for HTTP-based
interactions with data providers (services/stores) such as DMOZ.
Why don't you, for experimental purposes, write a CGI that mimics
XTM-based communication with www.dmoz.org, by scraping DMOZ's HTML and
turning it into XTM? I did that once for Google's link: feature - it's
fun and very educational.
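
A minimal sketch of the conversion step such a CGI would need, in Python
with only the standard library. It assumes the scraped page is a plain list
of links; the names LinkCollector and html_to_xtm, the topic ids, and the
choice to type each site by the category via instanceOf are all made up for
illustration and are not DMOZ's markup or any established mapping.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

XTM_NS = "http://www.topicmaps.org/xtm/1.0/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("xtm", XTM_NS)
ET.register_namespace("xlink", XLINK_NS)


class LinkCollector(HTMLParser):
    """Collect (href, link text) pairs from <a href="..."> elements."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # Anchors without an href leave _href as None and are skipped.
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def _q(tag):
    """Qualify a tag name with the XTM 1.0 namespace."""
    return f"{{{XTM_NS}}}{tag}"


def _add_topic(topic_map, topic_id, name, type_id=None):
    """Append a <topic> with a base name and an optional instanceOf."""
    topic = ET.SubElement(topic_map, _q("topic"), {"id": topic_id})
    if type_id is not None:
        instance_of = ET.SubElement(topic, _q("instanceOf"))
        ET.SubElement(instance_of, _q("topicRef"),
                      {f"{{{XLINK_NS}}}href": f"#{type_id}"})
    base_name = ET.SubElement(topic, _q("baseName"))
    ET.SubElement(base_name, _q("baseNameString")).text = name
    return topic


def html_to_xtm(category_name, html):
    """Return XTM 1.0 text: one topic for the category, one per link."""
    collector = LinkCollector()
    collector.feed(html)
    topic_map = ET.Element(_q("topicMap"))
    _add_topic(topic_map, "category", category_name)
    for n, (href, text) in enumerate(collector.links, start=1):
        topic = _add_topic(topic_map, f"site-{n}", text or href,
                           type_id="category")
        occurrence = ET.SubElement(topic, _q("occurrence"))
        ET.SubElement(occurrence, _q("resourceRef"),
                      {f"{{{XLINK_NS}}}href": href})
    return ET.tostring(topic_map, encoding="unicode")
```

A CGI wrapper would fetch the category page, pass its HTML to
html_to_xtm(), and return the result as the response body.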
Jan
[1] For highly dimensional data it does make sense, but usually the
domain that the data is about is in itself constraining enough to
justify storage in a relational database.
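
To make the relational option concrete, here is a rough sketch of a
topic-map-shaped schema, using Python's sqlite3 module with an in-memory
database. The table and function names (topic, occurrence, association,
topics_for_url, related) are hypothetical and heavily simplified: no
scopes, no roles, binary associations only. It is one possible reading of
"an ER model of the topic maps concepts", not a reference design.

```python
import sqlite3

# Hypothetical minimal schema: topics, their occurrences (URLs), and
# typed binary associations between topics.
SCHEMA = """
CREATE TABLE topic (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE occurrence (
    topic_id INTEGER NOT NULL REFERENCES topic(id),
    url      TEXT NOT NULL
);
CREATE TABLE association (
    type_id   INTEGER NOT NULL REFERENCES topic(id),
    source_id INTEGER NOT NULL REFERENCES topic(id),
    target_id INTEGER NOT NULL REFERENCES topic(id)
);
"""


def open_store():
    """Create an empty in-memory store with the schema above."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def add_topic(conn, name):
    """Insert the topic if it is missing and return its id."""
    conn.execute("INSERT OR IGNORE INTO topic (name) VALUES (?)", (name,))
    row = conn.execute("SELECT id FROM topic WHERE name = ?", (name,))
    return row.fetchone()[0]


def add_occurrence(conn, topic_name, url):
    """Attach a URL to a topic, creating the topic if needed."""
    conn.execute("INSERT INTO occurrence (topic_id, url) VALUES (?, ?)",
                 (add_topic(conn, topic_name), url))


def associate(conn, type_name, source_name, target_name):
    """Record a typed association; the type is itself a topic."""
    conn.execute(
        "INSERT INTO association (type_id, source_id, target_id) "
        "VALUES (?, ?, ?)",
        (add_topic(conn, type_name), add_topic(conn, source_name),
         add_topic(conn, target_name)))


def topics_for_url(conn, url):
    """Names of all topics that list this URL, sorted by name."""
    rows = conn.execute(
        "SELECT t.name FROM topic t JOIN occurrence o ON o.topic_id = t.id "
        "WHERE o.url = ? ORDER BY t.name", (url,))
    return [r[0] for r in rows]


def related(conn, type_name, source_name):
    """Names of topics reached from source_name by the given type."""
    rows = conn.execute(
        "SELECT tgt.name FROM association a "
        "JOIN topic typ ON typ.id = a.type_id "
        "JOIN topic src ON src.id = a.source_id "
        "JOIN topic tgt ON tgt.id = a.target_id "
        "WHERE typ.name = ? AND src.name = ? ORDER BY tgt.name",
        (type_name, source_name))
    return [r[0] for r in rows]
```

Because association types are rows in the topic table, a new kind of
relationship needs no schema change; only the XTM export layer has to know
how to serialize it.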
>
> "since creating an ontology for life, the universe, and everything is
> quite a challenge."
>
> - Let's start simple. The focus at the beginning is organizing the URLs.
>
> - The objective at first is to fill the gaps in DMOZ. To me this
> project has stood still in time. It has been the same thing, the same
> procedure for the user, for three years now. Points to attack:
> - The structure of DMOZ mixes up the concepts. In the same
> taxonomy we can find aggregation, specialization, localization, etc.
> - They use a poor faceted classification. A resource (URL) can
> appear in many topics, but what about the topic itself? That should be
> allowed too.
> > So their structure should be divided into faceted categories, as
> is done in projects like Flamenco or FacetMap. To divide it we can use
> a good web thesaurus (e.g. Eurovoc).
> > And the main point: the user must be able to classify the URLs
> and topics using topic maps concepts. There could be a wizard to train
> the user to do this.
> - The navigation shows only one hierarchical level. So, to reach a
> leaf, the user has to wait for the page to refresh five or six times.
> Very, very boring!!
> > See www.knowledgeprocessors.com
> - The search in the directory (by Google) shows the URLs in the topics.
> > I want to produce a filter or reflection in the structure, that
> is, navigation combined with search, like Flamenco
> (http://bailando.sims.berkeley.edu/flamenco-interface.html)
>
> - Build a prototype to gauge the reactions.
> - In the beginning I'm thinking of just using MySQL, which is free,
> but we can use Oracle if the project grows.
>
> - For the future of the project we can think about:
> - Building client software so the user can do his classification
> more quickly or offline.
> - Retrieving the best URL classifications already done: the
> favorites or bookmarks of the users.
> - Not limiting the topic maps crawler to the DMOZ project;
> Wikipedia is the next victim (and I see Google in the last battle,
> if Bill doesn't get there first..)
>
> To finish: "as well as man-hours and sheer know-how"
> I'm writing from Brazil. Thank you for your attention; I was very
> surprised when the answer arrived from a name that I took from the
> thesis that I read. Pardon my English, and you can divide your costs
> by 5 if the project is done here. I'm serious, this is simply a fact.
>