[topicmapmail] Classification of occurrences using keywords

Kal Ahmed kal@techquila.com
Wed, 13 Nov 2002 09:52:03 +0000


On Wednesday 13 November 2002 00:35, Jason Cupp wrote:
> Well, I guess there are keywords and there are keywords, but maybe for
> topic maps they're all roses...
>
> I guess I wanted to pose the question: is there room in topic maps for
> ambiguous associations between any uncontrolled word found in a document
> and that document (like free text searches on the web)? I work with the
> Z39.50 protocol, and can choose to do a completely free-text search or
> limit my search to an abstract, keywords (made explicit by the author),
> publisher, etc...
>
> Is it practical for a topicmap to support the free-text search (make every
> occurrence of a word a topic scoped accordingly), or is it too "against the
> grain"? I also think the "aboutness" association shouldn't always be a
> second-class citizen, in that it definitely isn't a nonsense proposition.
>

I don't think that attempting to replicate a full-text index in a topic map in
this way will be very efficient. It might work for a small sample set, but I
have a feeling that it would not scale very well, and you would start to have
big problems when you add in the information needed to do proximity and
phrase searches.

But what might be more practical is combining a topic map which provides
well-defined contextual information with a free-text search. Consider:

"I want to find all documents containing the phrase 'free-text search' which
are related to the subject 'topic maps'"

Such a search could make use of a text-search engine (possibly over Z39.50)
and then apply a filter that compares the returned resource addresses
against occurrences in a topic map.
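The two-stage search described above can be sketched in Python as a minimal illustration, assuming the text-search engine (for example a Z39.50 server) has already returned a ranked list of resource addresses for the phrase, and assuming the topic map has been reduced to a hypothetical in-memory dictionary mapping topic names to their occurrence addresses. `Topic`, `occurrence_addresses` and `filter_by_subject` are illustrative names only, not part of any topic map engine's API, and the example.org addresses are made up.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    """Hypothetical, minimal topic: a name plus the resource addresses of its occurrences."""
    name: str
    occurrences: set[str] = field(default_factory=set)


def occurrence_addresses(topic_map: dict[str, Topic], subject: str) -> set[str]:
    """Return every resource address the topic map records as an occurrence of `subject`."""
    topic = topic_map.get(subject)
    return set(topic.occurrences) if topic else set()


def filter_by_subject(search_hits: list[str], topic_map: dict[str, Topic],
                      subject: str) -> list[str]:
    """Keep only text-search hits whose address is an occurrence of the subject topic.

    The ranking produced by the text-search engine is preserved.
    """
    allowed = occurrence_addresses(topic_map, subject)
    return [address for address in search_hits if address in allowed]


if __name__ == "__main__":
    # Stand-in for the addresses a full-text engine returned for 'free-text search'.
    hits = [
        "http://example.org/a.html",
        "http://example.org/b.html",
        "http://example.org/c.html",
    ]
    # Toy topic map: only b and c are recorded as occurrences of 'topic maps'.
    tm = {
        "topic maps": Topic("topic maps", {"http://example.org/b.html",
                                           "http://example.org/c.html"}),
    }
    print(filter_by_subject(hits, tm, "topic maps"))
    # prints ['http://example.org/b.html', 'http://example.org/c.html']
```

The design keeps the two indexes separate: proximity and phrase matching stay in the text engine, and the topic map only contributes the contextual "is this about subject X" test, which is a cheap set-membership check per hit.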

Cheers,

Kal

> | -----Original Message-----
> | From: Lars Marius Garshol [mailto:larsga@garshol.priv.no]
> | Sent: Tuesday, November 12, 2002 3:48 PM
> | To: topicmapmail@infoloom.com
> | Subject: Re: [topicmapmail] Classification of occurrences
> | using keywords
> |
> |
> |
> | * Jason Cupp
> |
> | | For unsupervised classification of a heterogeneous document
> | | collection, the only reliable relationship to divine for keywords
> | | would be the "aboutness" mentioned in "Lessons on Applying Topic
> | | Maps" (and for collections as broad as the WWW, even this fails)
> |
> | Jason, you're making it impossible to have a meaningful dialogue
> | here. When you say keyword above, what do you MEAN? What IS a
> | "keyword" as you use the term? Without knowing that I have no idea
> | what you are saying above.
> |
> | (The title of the paper is "The XML Papers", by the way. "Lessons on
> | applying topic maps" is just the subtitle.)
> |
> | | What about representing both in a topicmap: the "aboutness"
> | | association and (given standardized vocabularies & thesauri & PSIs)
> | | more meaningful associations.
> |
> | That sounds like what we did, but you're not being very precise, so
> | it's hard to tell.
> |
> | --
> | Lars Marius Garshol, Ontopian         <URL: http://www.ontopia.net >
> | ISO SC34/WG3, OASIS GeoLang TC        <URL: http://www.garshol.priv.no >
> |
> | _______________________________________________
> | topicmapmail mailing list
> | topicmapmail@infoloom.com
> | http://www.infoloom.com/mailman/listinfo/topicmapmail
>

--
Kal Ahmed, techquila.com
XML and Topic Map Consultancy

e: kal@techquila.com
p: +44 7968 529531
w: www.techquila.com