[topicmapmail] Classification of occurrences using keywords
Murray Altheim
m.altheim@open.ac.uk
Wed, 13 Nov 2002 17:37:46 +0000
Lars Marius Garshol wrote:
> * Johannes Koppenwallner
> |
> | Your example was quite convincing, so I think I will do it that
> | way. The only drawback I can see is a bigger (but more detailed)
> | topic map and eventually a more complicated navigation in it.
>
> A precise ontology is harder to create, and does require more
> information to be input, but the benefit is much much greater. Whether
> it is the right approach or not depends very much on the application
> and its domain.
>
> However, as long as you realize that topics and keywords are the same
> thing you can choose an ontology that has lower precision and still
> get quite good results.
While you may enter keywords as individual topics in a topic map
in order to better process them (as topics in their own right),
the identity of keywords, as used for searching and document
identification, with "topics" (in the general sense) is hardly
a given, and it is a claim I would strongly disagree with. "Topic"
and "keyword" are not synonyms in the dictionary, in common use, or
in their use in topic maps (even in your examples).
I provided an example of this which I'll reiterate. I wrote:
>
> [...] So, searching for say a paper on "Navigable History" (a subject)
> we might use the keywords:
>
> event history, navigable history, constructive time, edit-based
> indexing, information workspace, analysis, interpretation, authoring,
> spatial hypertext
>
> I mentioned that keywords essentially are a deconstruction or
> decomposition of a topic.
What I mean by this is that the keywords I provided, taken *together*,
describe the paper by Shipman and Hsieh.
> Just to test this theory, I grabbed those keywords from a specific paper
> by Frank M. Shipman and Haowei Hsieh. I can take those keywords and paste
> them into Google and find the paper. [goes off and tries it] Damn, but
> it works!
>
> Now, it'd be hard to argue that "analysis" (or really, any of the
> above keywords) matches the subject "Navigable History: A Reader's
> View of Writing".
Now in response to this you made the point that these keywords could
be brought into a topic map-based ontology and an author could then
make meaningful associations between those keywords (as topics) and
other topics in the topic map.
I don't disagree with that. It's a useful scenario, as the paper that
you and Steve published establishes.
But that is *not* a use of keywords. You've *taken* keywords and
modified the concept. Keywords in common usage (as in the example
I provided, or in any of the hundreds of thousands of Dublin Core
records in use) have, as Jason Cupp pointed out, an "aboutness" kind
of relation to the topic they are identified with. In a sense it is
the combination of *all* the keywords provided that establishes
an identity with the topic they describe, not any one individual
keyword. The keywords provided by Shipman about "Navigable History"
attempt to say what that paper is about, but none of those keywords
on its own has a topic identity with the published paper.
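To make the distinction concrete, here is a rough sketch in Python.
It assumes a tiny, made-up collection: the two extra documents and the
helper name `matches` are hypothetical, and only the Shipman keyword
set comes from the example above. It shows one keyword matching
several documents while the full set picks out just one.

```python
# Hypothetical mini-collection: each document carries its own keyword set.
# Only the first entry reflects the example in this thread; the other two
# are invented purely for illustration.
DOCS = {
    "Navigable History: A Reader's View of Writing": {
        "event history", "navigable history", "constructive time",
        "edit-based indexing", "information workspace", "analysis",
        "interpretation", "authoring", "spatial hypertext",
    },
    "Made-up paper B": {"analysis", "interpretation", "statistics"},
    "Made-up paper C": {"authoring", "spatial hypertext", "analysis"},
}


def matches(query, docs=DOCS):
    """Return the titles whose keyword set contains every query keyword."""
    wanted = set(query)
    return sorted(title for title, kws in docs.items() if wanted <= kws)


# A single keyword is "about" many documents at once.
single = matches(["analysis"])

# The conjunction of the whole keyword set narrows to one document.
conjunction = matches(DOCS["Navigable History: A Reader's View of Writing"])

print(len(single), "documents match 'analysis'")
print("full keyword set matches:", conjunction)
```

This is only a subset test over keyword sets, not a proposal for how
a topic map should represent them.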
The real question for me (and, I believe, for the original questioner)
is not how to have an author manually build a topic map ontology for
a given set of keywords. That is a manual task, enormously complex,
requiring both ontological and domain-specific skills that might
not be available, and for large document sets it is unreasonable.
Besides, I don't think any of us have the authority or skills
to do what librarians do when they classify publications. I certainly
don't feel qualified to take someone else's publication (i.e., a real
one, with an ISBN) and add my own set of keywords-as-topics,
ignoring the real ones published with the document. And for the
several hundred documents I've got (that have their own existing
keyword sets) it would take a huge amount of time. What about 50,000
documents? 300,000 documents? OCLC's WorldCat has 48 million records,
and they all have keywords.
The question here (I believe) is how best to use existing *sets* of
keywords in a topic map, in such a way that the conjunction of all
their meanings serves as an identifier for the subject being entered
as a topic.
I don't think that question has yet been addressed (it's what I've
been thinking about for the past few weeks).
Murray
......................................................................
Murray Altheim <http://kmi.open.ac.uk/people/murray/>
Knowledge Media Institute
The Open University, Milton Keynes, Bucks, MK7 6AA, UK
If you're the first person in a new territory,
you're likely to get shot at.
-- ma