[topicmapmail] automated generation of topic map

Steve Pepper pepper@ontopia.net
Wed, 20 Feb 2002 11:07:29 +0100


At 10:11 08/02/02 -0600, Dan Wu wrote:
>I was thinking that constructing a topic map for
>a domain is a very time consuming and labour intensive
>job because one, who I assume should be the domain expert,
>has to find out all the topics, occurrences and
>associations. Is there a way, already developed or being discussed,
>that we can automate this process?

There are many ways. I presented some of them in my talk at XML 2001 and 
Eric Freese and I will be going into the matter in more depth at KT 2002 in 
Seattle (Tuesday, March 12th 2002, 4pm):

   Methods for the Automatic Construction of Topic Maps

   Presented by Eric Freese, ISOGEN International and Steve Pepper, Ontopia

   A topic map can be regarded as an indexing layer that provides unified
   access to information resources emanating from multiple, disparate
   sources. Because of its emphasis on capturing semantics, topic mapping
   is more akin to "intellectual indexing" (a term which covers
   back-of-book indexes, thesauri, and glossaries) than to the "mechanical
   indexing" typical of full-text indexes. This accounts for topic maps'
   superiority in terms of increased precision and recall, but raises the
   question of whether the effort required to create and maintain topic
   maps may be prohibitive.

   This double session seeks to address that question and to demonstrate
   how the creation and maintenance of topic maps can be partially or even,
   in many cases, wholly automated. The first part of the presentation will
   describe the tasks involved in creating topic maps and then enumerate
   various sources of topic map data, including pre-existing ontologies,
   document metadata, structured and unstructured document content, and
   information systems. Following this, a number of data extraction
   techniques will be described and rules of thumb provided for when best
   to use each one. Finally, practical demonstrations will be given of an
   open source application employing Natural Language Processing and a
   toolkit that exploits the synergies between topic maps and RDF to
   generate topic maps from semi-structured data.

As you can see from the abstract, there are many aspects to the issue of 
autogenerating topic maps. The good news is that, although topic maps *are* 
a form of "intellectual indexing", there are very efficient ways of 
leveraging whatever intellectual effort may already have been expended on 
codifying the semantics of a particular domain or set of information resources.

That kind of intellectual effort is being expended all the time: for 
example, whenever someone creates or populates a database schema or DTD, or 
adds metadata to a document. The problem up until now has been that such 
effort has not been easily reusable. With topic maps we now have a standard 
way of reusing it.
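
To make the database case concrete, here is a minimal sketch in Python of how rows from a table might be turned into topics, occurrences and associations. The table layout (id, title, author, url), the class names and the "written-by" association type are all invented for illustration; this is not the API of any particular topic map toolkit, and a real conversion would serialize the result to XTM rather than keep it in memory.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    topic_id: str
    name: str
    topic_type: str
    occurrences: list = field(default_factory=list)  # resource URLs


@dataclass
class Association:
    assoc_type: str
    members: dict  # role name -> topic id


def rows_to_topic_map(rows):
    """Map rows of a hypothetical 'document' table to topics and associations."""
    topics = {}
    associations = []
    for row in rows:
        doc_id = "doc-" + str(row["id"])
        author_id = "person-" + row["author"].lower().replace(" ", "-")
        # Each row becomes a topic; its URL column becomes an occurrence.
        topics[doc_id] = Topic(doc_id, row["title"], "document", [row["url"]])
        # Each distinct author value becomes a topic only once.
        topics.setdefault(author_id, Topic(author_id, row["author"], "person"))
        # The relationship implied by the row becomes an association.
        associations.append(
            Association("written-by", {"work": doc_id, "author": author_id})
        )
    return topics, associations


# Sample data (assumed, for illustration only).
rows = [
    {"id": 1, "title": "Intro", "author": "Ann Lee", "url": "http://example.org/1"},
    {"id": 2, "title": "Guide", "author": "Ann Lee", "url": "http://example.org/2"},
]
topics, associations = rows_to_topic_map(rows)
```

The point of the sketch is that the semantics were already fixed when the table was designed: the column names tell us what is a topic, what is an occurrence and what is an association, so no further intellectual effort is needed per row.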

Steve

--
Steve Pepper, Chief Executive Officer <pepper@ontopia.net>
Convenor, ISO/IEC JTC1/SC34/WG3  Editor, XTM (XML Topic Maps)
Ontopia AS, Waldemar Thranes gt. 98, N-0175 Oslo, Norway.
http://www.ontopia.net/ phone: +47-23233080 GSM: +47-90827246