[topicmapmail] Testbed for Subject Identity Measure
Dipl.-Wirtsch.-Inf. Lutz Maicher [Universität Leipzig]
maicher@informatik.uni-leipzig.de
Fri, 25 Jun 2004 09:15:39 +0200
> Dipl.-Wirtsch.-Inf. Lutz Maicher [Universität Leipzig] wrote:
>
> > But for the development of the SIM we need a testbed. We need two Topic Maps
> > which describes similar domains with "generic" Subjects. Unfortunately we
> > don't have such Topic Maps. If anybody can aid us with some data we are
> > pleased with your contact.
> >
> > If we have first results we will post it at the mailinglist in hope for a
> > vital discussion. If anybody is interested in advance we look forward to
> > your questions.
>
> Thomas B. Passin wrote:
> How large do you want the topic maps to be?
For evaluating the measure we have three aspects to consider:
1. Precision: how many of the mergings proposed by the SIM were accepted by
the user?
2. Recall: how many of the mergings deemed necessary by the user were
proposed by the SIM?
3. Performance: time complexity for calculating the SIM for a given pair of
Topics AND for a whole Topic Map (in the latter case, some pairs of
Topics might be eliminated straight away).
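To make the first two aspects concrete, here is a minimal sketch of how they could be computed. It assumes the proposed mergings and the mergings the user considers correct are both available as unordered pairs of topic identifiers; the function name `precision_recall` and the identifiers are made up for illustration and are not part of the SIM itself.

```python
def precision_recall(proposed, gold):
    """Return (precision, recall) for proposed mergings against user judgments.

    Both arguments are iterables of topic-id pairs. A pair is treated as
    unordered, so ("a1", "b1") and ("b1", "a1") count as the same merging.
    """
    proposed = {frozenset(pair) for pair in proposed}
    gold = {frozenset(pair) for pair in gold}
    hits = proposed & gold  # mergings that were proposed AND accepted
    precision = len(hits) / len(proposed) if proposed else 0.0
    recall = len(hits) / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical data: four proposals, three mergings the user wants.
proposed = [("a1", "b1"), ("a2", "b2"), ("a3", "b7"), ("a4", "b4")]
gold = [("b1", "a1"), ("a2", "b2"), ("a5", "b5")]
p, r = precision_recall(proposed, gold)
print(f"precision={p:.2f} recall={r:.2f}")
```

In this made-up example two of the four proposals are accepted, and two of the three wanted mergings were found.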
For the first two aspects I guess Topic Maps with several hundred Topics
will be good. For the last aspect, big is beautiful.
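As a rough illustration of why size matters for the performance aspect, the sketch below compares the number of all cross-map Topic pairs with the number left after a cheap pre-filter. The filter used here (keep only pairs that share at least one name token) is purely an assumption for the example, not the elimination rule of the SIM, and `candidate_pairs` and the sample data are hypothetical.

```python
from collections import defaultdict


def candidate_pairs(map_a, map_b):
    """Return the cross-map topic pairs that share at least one name token.

    map_a and map_b map a topic id to a set of lower-cased name tokens.
    An inverted index over map_b avoids touching every possible pair.
    """
    index = defaultdict(set)  # token -> ids of topics in map_b
    for topic_id, tokens in map_b.items():
        for token in tokens:
            index[token].add(topic_id)

    pairs = set()
    for id_a, tokens in map_a.items():
        for token in tokens:
            for id_b in index.get(token, ()):
                pairs.add((id_a, id_b))
    return pairs


# Hypothetical toy maps with three topics each.
map_a = {"a1": {"leipzig", "university"}, "a2": {"topic", "maps"}, "a3": {"opera"}}
map_b = {"b1": {"university", "leipzig"}, "b2": {"xml", "topic"}, "b3": {"puccini"}}

pairs = candidate_pairs(map_a, map_b)
total = len(map_a) * len(map_b)
print(f"{len(pairs)} of {total} pairs remain after the filter")
```

Without any filter the number of pairs grows with the product of the two map sizes, so only large maps show whether such an elimination step pays off.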
But more important than size is the character of the given
Topic Maps. We need two Topic Maps that cover the same domain
(because we calculate Subject Identity). We calculate the SIM mainly from
the Topic characteristics, so we need Topic Maps in which these
characteristics are sufficiently populated. In addition, I think that Topic
Maps made for humans (conforming less to any kind of logical
constraint) are better than Topic Maps made for logical
reasoning.
Cheers, Lutz