[topicmapmail] automated generation of topic map
Steve Pepper
pepper@ontopia.net
Wed, 20 Feb 2002 11:07:29 +0100
At 10:11 08/02/02 -0600, Dan Wu wrote:
>I was thinking that constructing a topic map for
>a domain is a very time-consuming and labour-intensive
>job because someone, who I assume should be the domain expert,
>has to find all the topics, occurrences and
>associations. Is there a way, already developed or being discussed,
>to automate this process?
There are many ways. I presented some of them in my talk at XML 2001 and
Eric Freese and I will be going into the matter in more depth at KT 2002 in
Seattle (Tuesday, March 12th 2002, 4pm):
Methods for the Automatic Construction of Topic Maps
Presented by Eric Freese, ISOGEN International and Steve Pepper, Ontopia
A topic map can be regarded as an indexing layer that provides unified
access to information resources emanating from multiple, disparate
sources. Because of its emphasis on capturing semantics, topic mapping
is more akin to "intellectual indexing" (a term which covers
back-of-book indexes, thesauri, and glossaries) than to the "mechanical
indexing" typical of full-text indexes. This accounts for topic maps'
superiority in terms of increased precision and recall, but raises the
question of whether the effort required to create and maintain topic
maps may be prohibitive.
This double session seeks to address that question and to demonstrate
how the creation and maintenance of topic maps can be partially or even,
in many cases, wholly automated. The first part of the presentation will
describe the tasks involved in creating topic maps and then enumerate
various sources of topic map data, including pre-existing ontologies,
document metadata, structured and unstructured document content, and
information systems. Following this, a number of data extraction
techniques will be described and rules of thumb provided for when best
to use each one. Finally, practical demonstrations will be given of an
open source application employing Natural Language Processing and a
toolkit that exploits the synergies between topic maps and RDF to
generate topic maps from semi-structured data.
As you can see from the abstract, there are many aspects to the issue of
autogenerating topic maps. The good news is that, although topic maps *are*
a form of "intellectual indexing", there are very efficient ways of
leveraging whatever intellectual effort may already have been expended on
codifying the semantics of a particular domain or set of information resources.
That kind of intellectual effort is being expended all the time: for
example, whenever someone creates or populates a database schema or DTD, or
adds metadata to a document. The problem up until now has been that such
effort has not been easily reusable. With topic maps we now have a standard
way of doing just that.
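To make the idea of reuse concrete, here is a minimal sketch in Python of how rows from an existing database table might be turned into topics, occurrences and associations. Everything in it is an assumption made for illustration: the column names (title, author, url), the Topic and Association classes, the rows_to_topic_map function and the example.org addresses are hypothetical and do not correspond to any particular topic map engine or to the tools that will be demonstrated in the session.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    topic_id: str
    name: str
    topic_type: str
    # Each occurrence is an (occurrence type, resource locator) pair.
    occurrences: list = field(default_factory=list)


@dataclass(frozen=True)
class Association:
    assoc_type: str
    # Each role is a (role type, topic id) pair.
    roles: tuple


def slug(prefix, value):
    """Build a simple identifier from a prefix and a display name."""
    return prefix + "-" + "".join(c if c.isalnum() else "-" for c in value.lower())


def rows_to_topic_map(rows):
    """Map table rows to topics and associations (hypothetical schema).

    Assumes each row has 'title', 'author' and 'url' columns. The schema
    already encodes the semantics: a row is a document, the author column
    names a person, and the url column locates the resource.
    """
    topics = {}
    associations = []
    for row in rows:
        # One topic per document, with the URL as an occurrence.
        doc_id = slug("doc", row["title"])
        doc = topics.setdefault(doc_id, Topic(doc_id, row["title"], "document"))
        doc.occurrences.append(("online-version", row["url"]))

        # One topic per distinct author; repeated names reuse the same topic.
        author_id = slug("person", row["author"])
        topics.setdefault(author_id, Topic(author_id, row["author"], "person"))

        # The row itself asserts the relationship between the two.
        associations.append(
            Association("written-by", (("work", doc_id), ("author", author_id)))
        )
    return topics, associations


# Made-up sample data standing in for an existing database table.
ROWS = [
    {"title": "Report A", "author": "Jane Doe", "url": "http://example.org/a"},
    {"title": "Report B", "author": "Jane Doe", "url": "http://example.org/b"},
]

topics, associations = rows_to_topic_map(ROWS)
```

The point of the sketch is that nobody has to identify the topics by hand: the person who designed the table already decided what a document and an author are, and the mapping simply reuses that decision. A real conversion would also need to deal with name collisions and with merging topics that come from different sources.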
Steve
--
Steve Pepper, Chief Executive Officer <pepper@ontopia.net>
Convenor, ISO/IEC JTC1/SC34/WG3 Editor, XTM (XML Topic Maps)
Ontopia AS, Waldemar Thranes gt. 98, N-0175 Oslo, Norway.
http://www.ontopia.net/ phone: +47-23233080 GSM: +47-90827246