[topicmapmail] The potential of TM fragments

Murray Altheim m.altheim@open.ac.uk
Thu, 16 Oct 2003 23:04:35 +0100


Carlo Moneti wrote:
[...]
> My question is, is there a way to define 
> TM fragments in documents so that when harvesting them, a rich TM can be 
> automatically generated?
> 
> In trying to answer my own question, this is what I came up with:
> 
> 1. if there exists a rich set of PSIs,
> 2. if the fragments use PSIs everywhere,
> 3. if the documents are of the same knowledge domain,
> 4. if you already have a TM template that defines all of the topic-types,
>   association-types, and the associations among those topic-types for that
>   domain,
>   then, it seems you would have all the bits of information necessary to 
> process the harvested fragments and the TM template into a rich map. Is 
> this roughly correct?

Carlo,

One of the failings in the scheme of the "Semantic Web" is the idea
that authors will (a) know enough about their own documents and the
domains they belong to; (b) know how to find and use appropriate
categorization schemes; (c) know enough about categorization schemes
to correctly categorize their own documents; (d) have the time and
energy to do so; (e) not abuse categorization for the sake of Web
"hits"; (f) know enough about markup, or have sufficiently
"intelligent" tools to assist them in modifying their web pages;
and, on top of all that, (g) that tools will miraculously become
available to harvest the metadata from those miraculously marked-up
pages.

Now, having said all that, the idea of marking up your own content
so that you can harvest it is a very good idea. If you are trying
to develop a web-based document management system and you have any
control at all over the web site's contents (say, you've talked
your management into instituting certain policies over content
metadata markup), these kinds of schemes might work.

One of the most active and successful of these schemes is of course
Dublin Core (dublincore.org). If you aren't familiar with DCMES (the
Dublin Core Metadata Element Set), it's a bunch of "PSIs" for document
metadata, and it is standardized and in wide use. The Dublin Core
Metadata Initiative publishes a number of documents that discuss how
to embed DC metadata in Web page <meta> elements.
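
As a rough illustration of the harvesting side, here is a minimal
Python sketch that pulls DC metadata out of a page's <meta> elements.
It assumes the common "DC.element" naming convention for the name
attribute; the class name, the harvest() helper and the sample page
are all made up for this example, and a real harvester would also
need to deal with schemes, languages and malformed markup.

```python
from html.parser import HTMLParser


class DCMetaHarvester(HTMLParser):
    """Collects (element, value) pairs from <meta name="DC.x" content="...">."""

    def __init__(self):
        super().__init__()
        self.records = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name") or ""
        content = attributes.get("content")
        # Assumption: DC elements are written as "DC.title", "DC.creator", ...
        if name.lower().startswith("dc.") and content is not None:
            self.records.append((name[3:].lower(), content))


# Hypothetical sample page, invented for illustration only.
SAMPLE_PAGE = """<html><head>
<link rel="schema.DC" href="http://purl.org/dc/elements/1.1/">
<meta name="DC.title" content="The potential of TM fragments">
<meta name="DC.creator" content="Murray Altheim">
<meta name="DC.subject" content="Topic Maps">
<meta name="keywords" content="not Dublin Core, so skipped">
</head><body></body></html>"""


def harvest(html_text):
    """Return the DC (element, value) pairs found in one HTML document."""
    parser = DCMetaHarvester()
    parser.feed(html_text)
    parser.close()
    return parser.records


if __name__ == "__main__":
    for element, value in harvest(SAMPLE_PAGE):
        print(element, "=", value)
```

Each harvested pair could then become a topic, occurrence or
association in the map, with the DC element acting as its type.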

Now, to answer your question given the above context (which is
just one of many possible contexts, but happens to be the one I'm
using myself):

 > 1. if there exists a rich set of PSIs

The DC Metadata Element Set version 1.1.

 > 2. if the fragments use PSIs everywhere,

You can enforce the use of DC-in-XHTML or DC-in-HTML through the
kind of content policies mentioned above.

 > 3. if the documents are of the same knowledge domain

This isn't strictly necessary if you pair DC with one of the
established library classification schemes, such as Dewey Decimal
or Library of Congress (LoC).

 > 4. if you already have a TM template that defines all of the topic-types,
 >   association-types, and the associations among those topic-types for that
 >   domain,

Okay, I see your point about #3. But you would probably be better
off generalizing the topic-types (not the topics, but the generic
types), association-types and associations rather than having them
specific to a domain, unless the level of assertions you wish to
make goes beyond categorization.

 >   then, it seems you would have all the bits of information necessary to
 > process the harvested fragments and the TM template into a rich map. Is
 > this roughly correct?

Yes, absolutely. Jack Park and I have been discussing this same
issue for the past few days, trying to figure out how to hook
Lucene into TM4J and Ceryle. We think it's a very viable approach,
and it takes advantage of Topic Maps' merging features to handle the
"infoglut" that will occur when indexing large volumes of records.
This is essentially what Topic Maps were designed for.
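
To make the merging idea concrete, here is a small Python sketch of
merging harvested fragments on shared subject identifiers (PSIs).
It is only an illustration under simplifying assumptions: topics are
plain dictionaries, only subject-identifier equality triggers a
merge (no name-based merging, no associations), and the example.org
identifiers are invented. It does not use the TM4J or Ceryle APIs.

```python
def merge_topics(fragments):
    """Merge topic fragments that share at least one subject identifier.

    Each fragment is a dict with a "psis" list and optional "names"
    and "occurrences" lists. Returns a list of merged topics whose
    fields are sets. Merged topics never share a PSI with each other.
    """
    merged = []
    for fragment in fragments:
        current = {
            "psis": set(fragment["psis"]),
            "names": set(fragment.get("names", ())),
            "occurrences": set(fragment.get("occurrences", ())),
        }
        untouched = []
        for existing in merged:
            if existing["psis"] & current["psis"]:
                # Same subject: fold the existing topic into the new one.
                for key in current:
                    current[key] |= existing[key]
            else:
                untouched.append(existing)
        untouched.append(current)
        merged = untouched
    return merged


# Hypothetical fragments, as might be harvested from three pages.
FRAGMENTS = [
    {"psis": ["http://purl.org/dc/elements/1.1/creator"],
     "names": ["Creator"]},
    {"psis": ["http://example.org/psi/topic-maps"],
     "names": ["Topic Maps"],
     "occurrences": ["http://example.org/doc1.html"]},
    {"psis": ["http://example.org/psi/topic-maps"],
     "names": ["TM"],
     "occurrences": ["http://example.org/doc2.html"]},
]

if __name__ == "__main__":
    for topic in merge_topics(FRAGMENTS):
        print(sorted(topic["psis"]), sorted(topic["names"]))
```

However many pages mention the same PSI, the map ends up with one
topic carrying all of their names and occurrences, which is what
keeps the index from drowning in duplicates.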

Murray

......................................................................
Murray Altheim                    http://kmi.open.ac.uk/people/murray/
Knowledge Media Institute
The Open University, Milton Keynes, Bucks, MK7 6AA, UK               .

   Monkeys use thoughts to control robotic arm
     http://www.sfgate.com/cgi-bin/article.cgi?file=/c/a/2003/10/13/MN2018.DTL
   Bush uses media expertly to push apocalyptic view
     http://truthout.org/docs_03/091403J.shtml