[topicmapmail] Testbed for Subject Identity Measure
Kal Ahmed
kal@techquila.com
Fri, 25 Jun 2004 08:14:49 +0100
Hi Lutz,
If you want some pretty regular topic maps then you could use the
TopicMapDoclet that is a (not widely known) part of TM4J. You can simply
run it with the normal javadoc command over any Java source code. It
will pick up the comments as well as creating a topic map reflecting the
structure of the code - depending on your code base, that could end up
being a pretty big topic map (point it at a few Apache projects to make
a monster topic map!)
I have been experimenting with topic maps of Parliamentary debates and
votes. It's still at an early stage, but the process can be automated, so
it is easy to generate a lot of topic map data, and you could probably hack
the code to insert a lot more of the text of the debates (not currently a
feature of my topic map output). Let me know if that might be of interest
to you.
Cheers,
Kal
On Thu, 2004-06-24 at 09:56, Dipl.-Wirtsch.-Inf. Lutz Maicher
[Universität Leipzig] wrote:
> Dear all,
>
> In our current research project at the University of Leipzig, department of
> NLP, we are developing a tool for the automatic generation of Topic Maps
> from texts in distributed environments. As part of this project we are
> researching the merging of distributed Topic Maps.
>
> In such a distributed scenario the equality rules of the TMDM fail because
> distributed Topic Map authors may not agree on a common vocabulary for
> declaring Subjects. Therefore we are developing a SIM (Subject Identity
> Measure) based on language-independent NLP algorithms. This SIM is a kind
> of likelihood that two Topics describe the same Subject. The value of this
> measure may help users decide which Topics should be merged when two
> distributed Topic Maps are brought together. This approach might be
> especially interesting for Topic Maps which make assertions about generic
> Subjects, for example: "Introduction of quality management in our company".
>
> But for the development of the SIM we need a testbed. We need two Topic
> Maps which describe similar domains with "generic" Subjects. Unfortunately
> we don't have such Topic Maps. If anybody can help us with some data, we
> would be pleased to hear from you.
>
> When we have first results we will post them to the mailing list in the hope
> of a lively discussion. If anybody is interested in advance, we look forward
> to your questions.
>
> Best regards
> Lutz Maicher
>
> ______________________________________________________________________________
> Dipl.-Wirtsch.-Inf. Lutz Maicher
> Graduiertenkolleg Wissensrepräsentation | Universität Leipzig
> Abteilung Automatische Sprachverarbeitung | Institut für Informatik |
> Augustusplatz 10-11 | 04109 Leipzig
>
> fon 0341 97 32 303 | mail: maicher + informatik.uni-leipzig.de
> http://www.informatik.uni-leipzig.de/~maicher/
-- 
Kal Ahmed <kal@techquila.com>
techquila