[topicmapmail] Classification of occurrences using keywords

Kal Ahmed kal@techquila.com
Wed, 13 Nov 2002 18:19:12 +0000


On Wednesday 13 November 2002 17:37, Murray Altheim wrote:
> Lars Marius Garshol wrote:
> > * Johannes Koppenwallner
> >
> > | Your example was quite convincing, so I think I will do it that
> > | way. The only drawback I can see is a bigger (but more detailed)
> > | topic map and eventually a more complicated navigation in it.
> >
> > A precise ontology is harder to create, and does require more
> > information to be input, but the benefit is much much greater. Whether
> > it is the right approach or not depends very much on the application
> > and its domain.
> >
> > However, as long as you realize that topics and keywords are the same
> > thing you can choose an ontology that has lower precision and still
> > get quite good results.
>
> While you may enter keywords as individual topics in a topic map
> in order to better process them (as topics in their own right),
> the use of keywords for searching and document identification
> and their identity with "topics" (in the general sense) is hardly
> a given, and one I would strongly disagree with. "Topic" and "keyword"
> are not synonyms in either the dictionary, in common use, or in their
> use in topic maps (even in your examples).
>

Why can a keyword *not* be treated as a subject ? It is true that there is a
world of difference between "Navigable History" as a subject and "Navigable
History" as a keyword, but surely a topic playing the role of keyword in a
has-keyword association with a reified resource can be treated as the latter
rather than the former ?

> I provided an example of this which I'll reiterate. I wrote:
>  > [...] So, searching for say a paper on "Navigable History" (a subject)
>  > we might use the keywords:
>  >
>  >    event history, navigable history, constructive time, edit-based
>  >    indexing, information workspace, analysis, interpretation, authoring,
>  >    spatial hypertext
>  >
>  > I mentioned that keywords essentially are a deconstruction or
>  > decomposition of a topic.
>
> What I mean by this is that the set of keywords I provided *together*
> describe the paper by Shipman.
>

Yes, but each keyword individually serves as data to a search mechanism. There
is a collective set "the keywords of the paper "Navigable History: A Reader's
View of Writing"" and there is each individual member of that set. I feel
that these are two distinct concepts.

>  > Just to test this theory, I grabbed those keywords from a specific paper
>  > by Frank M. Shipman and Haowei Hsieh. I can take those keywords and
>  > paste them into Google and find the paper. [goes off and tries it] Damn,
>  > but it works!
>  >

What happens if you choose one of those keywords or two or three from the
collection ? Surely the same paper is still found (modulo Google magic). Just
as the "Navigable History" keyword topic might play the role of "keyword" in
an association with multiple reified resources.

>  > Now, it'd be hard to argue that "analysis" (or really, any of the
>  > above keywords) matches the subject "Navigable History: A Reader's
>  > View of Writing".
>

Not matches, no, but "analysis" has been selected as a keyword for the
resource. So there is an association between the topic "the keyword
'analysis'" and the topic which reifies this paper.
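
A minimal sketch of that modelling in Python, purely for illustration. The
names Topic, Association and resources_for are made up here and do not come
from any topic map toolkit, and the second resource is hypothetical. Each
keyword is a topic in its own right, and a has-keyword association links it
to the topic that reifies a resource:

```python
from collections import namedtuple

# Illustrative stand-ins only; not the API of any real topic map engine.
Topic = namedtuple("Topic", "name")
Association = namedtuple("Association", "type keyword resource")

# The topic that reifies the paper discussed in this thread.
paper = Topic("Navigable History: A Reader's View of Writing")
# A hypothetical second resource, to show one keyword topic being shared.
other = Topic("Some other paper")

kw_analysis = Topic("the keyword 'analysis'")
kw_navigable = Topic("the keyword 'navigable history'")

associations = [
    Association("has-keyword", kw_analysis, paper),
    Association("has-keyword", kw_navigable, paper),
    Association("has-keyword", kw_analysis, other),
]


def resources_for(keyword, associations):
    """Return the resource topics associated with one keyword topic."""
    return [
        a.resource
        for a in associations
        if a.type == "has-keyword" and a.keyword == keyword
    ]


print([t.name for t in resources_for(kw_analysis, associations)])
```

Nothing here claims that the keyword topic "matches" the subject of the
paper; the association only records that the keyword was selected for it.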

<snip/>

> The real question for me (nor for the original questioner) is not
> how to have an author manually build a topic map ontology for a
> given set of keywords, as that is a manual task, enormously complex
> and requiring both ontological and domain-specific skills that might
> not be available, for large document sets is an unreasonable task,
> and besides, I don't think any of us have the authority or skills
> to do what librarians do when they classify publications. I certainly
> don't feel qualified to take someone else's publication (ie., a real
> one, with an ISBN number) and add my own set of keywords-as-topics,
> ignoring the real ones published with the document. And for the
> several hundred documents I've got (that have their own existing
> keyword sets) it would take a huge amount of time. What about 50,000
> documents? 300,000 documents? OCLC's WorldCat has 48 million records,
> and they all have keywords.
>

In my experience (mainly with tech. doc. so YMMV), keywords are either (a)
taken from a controlled vocabulary or (b) created by an author/indexer in an
ad-hoc manner. I find that (b) is more common than (a), though I should
imagine that librarians would tend toward (a). I think that if working with
keywords taken from a controlled vocabulary, one should seek to determine
whether or not there is an ontology underlying that vocabulary and if so,
model it in a topic map. If there is no underlying ontology or if keywords
have been created in an ad-hoc manner, you can (automatically) do no more
than treat them as "keyword" topics with no relationship between them (only a
relationship to the resources).
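
The automatic case can be sketched roughly as follows, in Python. This is an
assumption-laden illustration: the function name keyword_topics, the "kw-"
topic id scheme, the resource ids and the lower-casing of keywords are all
invented for the example. Each distinct keyword becomes one "keyword" topic,
and the only relationships recorded are between keyword topics and resources:

```python
def keyword_topics(documents):
    """Turn ad-hoc keyword lists into keyword topics plus has-keyword pairs.

    documents maps a resource id to its list of keyword strings.
    Returns (topics, associations), where topics maps a normalised keyword
    to a topic id and associations is a list of (topic id, resource id).
    """
    topics = {}
    associations = []
    for resource, keywords in documents.items():
        for keyword in keywords:
            # Assumed normalisation: lower-case and collapse whitespace.
            key = " ".join(keyword.lower().split())
            topic_id = topics.setdefault(key, "kw-" + key.replace(" ", "-"))
            associations.append((topic_id, resource))
    return topics, associations


# Hypothetical resource ids; "other-doc" is an invented second document.
documents = {
    "shipman-hsieh": ["event history", "Navigable History", "analysis"],
    "other-doc": ["Analysis"],
}
topics, associations = keyword_topics(documents)
print(sorted(topics.values()))
```

No association is ever created between two keyword topics, which is the
limit of what can be done without an underlying ontology.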

> What the question (I believe) here is, is how to best use existing
> *sets* of keywords in a topic map in such a way as to use the
> conjunction of all their meanings as an identifier for the subject
> being entered as a topic in a topic map. To best use them in a topic
> map.
>

Now that's an interesting question. But I would argue that in most search
systems using keywords, a user will search for one/some of the keywords, not
all of them. In this case, what is the value of treating the keywords as a
set ?
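
To make the distinction concrete, here is a small Python sketch contrasting
the two kinds of lookup. The index layout, the resource ids and the function
names search_any and search_all are assumptions for illustration only, and
"other-doc" with its keywords is invented:

```python
# Keyword sets per resource. The first set is the keyword list quoted
# earlier in this thread; the second resource is hypothetical.
index = {
    "shipman-hsieh": {
        "event history", "navigable history", "constructive time",
        "edit-based indexing", "information workspace", "analysis",
        "interpretation", "authoring", "spatial hypertext",
    },
    "other-doc": {"analysis", "information retrieval"},
}


def search_any(query, index):
    """Resources that carry at least one of the query keywords."""
    wanted = set(query)
    return sorted(r for r, kws in index.items() if kws & wanted)


def search_all(query, index):
    """Resources that carry every one of the query keywords."""
    wanted = set(query)
    return sorted(r for r, kws in index.items() if wanted <= kws)


print(search_any(["analysis"], index))
print(search_all(["analysis", "spatial hypertext"], index))
```

A single keyword already retrieves the paper (along with anything else that
shares it), while adding keywords only narrows the result.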

> I don't think that question has yet been addressed (it's what I've
> been thinking about for the past few weeks).
>

I would be interested in hearing how you follow that train of thought to its
conclusion. I hereby offer several beers over which you can explain it to me
;-)

Cheers,

Kal

--
Kal Ahmed, techquila.com
XML and Topic Map Consultancy

e: kal@techquila.com
p: +44 7968 529531
w: www.techquila.com