[topicmapmail] Creating Topic-Map-View of relational data

Dipl.-Wirtsch.-Inf. Lutz Maicher [Universität Leipzig] maicher@informatik.uni-leipzig.de
Fri, 26 Nov 2004 11:06:30 +0100


Dear Rani, Jan, Lars Marius, Christian,
Dear all,

thanks for your first comments on my question about creating Topic Map 
Views of relational data. Because a lot of questions arose, I will try to 
answer them all in this mail.

* Jan Algermissen wrote
> *why* do you face that challenge? What is your scenario?

* Lars Marius Garshol
>  c) do you plan to produce generic software, a running solution, a
>       paper, or some combination of these?


I am still working on the further development of the Subject Identity 
Measure [1], especially for use in EII [2] scenarios. I should stress that 
I am solely interested in semantic integration, leaving aside all the 
technical problems of create-read-update-delete in heterogeneous, 
distributed environments.

My research is driven by the observation that the use of PSIs has some 
critical weaknesses (although PSIs are still the best known approach to 
defining the intended relationship between a Topic and its Subject, 
including a defined behaviour in exchange scenarios):

(a) "Subjects" do not have borders as sharp as PSIs pretend [4].
(b) In distributed environments the use of a common vocabulary (PSIs) has to 
be questioned.

Therefore I want to "exploit" all the information that is given about the 
Subject of a Topic to decide whether the Subject of another Topic might be 
*similar*. I foresee TMAs with equality rules that are based on such 
similarity functions to allow information integration in heterogeneous, 
distributed environments. Let's call these methods a Subject Similarity 
Service (only to have a name for them).

My ideas are discussed in more detail in [3]. The main questions are:
(1) Which methods should a Subject Similarity Service use to decide Subject 
Similarity?
(2) I hypothesise that the quality of the Service depends on the structural 
attributes of the underlying Topic Maps. How does the quality of the Service 
depend on these structural attributes?

To answer the second question, I have to create flexible testbeds. I want to 
"transform" existing relational data (always the same data) into different 
Topic Maps (Views) with different structural attributes. From my point of 
view, I only want to "parameterise" the transformation to get a new test 
series for my empirical studies.

* Lars Marius Garshol
>  a) how directly should the resulting topic map reflect the
>     underlying RDBMS? (Ie: do you expect to do normalization or
>     create a simplified view; do you want to do string processing on
>     values in the database, etc?)

The Topic Map doesn't reflect the structure of the underlying RDBMS. I want 
to extract data from the RDBMS to create Topics which reflect my own, 
arbitrarily defined Subjects (e.g. I may have a table "author" in my 
database, but for my empirical tests I create, for each book, an Occurrence 
of type "author" which contains the name of the author).

Additionally, I think that copying all the needed data from the RDBMS into 
my Topic Map is sufficient for my purposes.

* Lars Marius Garshol
>  b) are you looking for a procedural approach (do this, then do that,
>     then do like this) or a declarative approach (this table is a
>     topic type; this table an association type; ...)

That's a good question. I foresee the following (naive) process, which 
consists of:
1. Run an SQL statement --> it returns a result set whose columns are of two 
kinds (by definition): Subject Identity Columns and Value Columns
2. For each entry of a Value Column, define the Subject Identity of the 
Topic (and the Topic Characteristic) to which it belongs.
3. If this Topic (or the corresponding Topic Characteristic) doesn't exist 
-> create it
4. Append the value to the Topic Map

Example:

SELECT  ISBN, title FROM ... WHERE ....

Subject Identity Column: ISBN
Value Column: title

[addValue(TopicIdentity, TopicCharacteristic, Value)]
addValue(ISBN, topicName, title)

But this is only a very vague idea. I'm open to any expertise that 
simplifies the mapping.
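
The four steps and the addValue example above could be sketched roughly as 
follows. This is only an illustration under assumptions: the Topic Map is 
modelled as a plain dictionary, the tables "book" and "author", their 
columns, and the helper names add_value and build_view are all hypothetical 
and not part of any Topic Map engine.

```python
import sqlite3


def add_value(topic_map, topic_identity, characteristic, value):
    """addValue(TopicIdentity, TopicCharacteristic, Value) from the example.

    Creates the Topic and the Topic Characteristic if they do not exist yet
    (step 3) and then appends the value (step 4).
    """
    topic = topic_map.setdefault(topic_identity, {})
    values = topic.setdefault(characteristic, [])
    if value not in values:
        values.append(value)


def build_view(conn, sql, identity_column, value_columns):
    """Run the SQL statement (step 1) and map every Value Column entry to
    the Topic named by the Subject Identity Column of the same row (step 2).

    value_columns maps a result-set column name to the (hypothetical) name
    of the Topic Characteristic that should hold its values.
    """
    topic_map = {}
    cursor = conn.execute(sql)
    names = [description[0] for description in cursor.description]
    for row in cursor:
        record = dict(zip(names, row))
        for column, characteristic in value_columns.items():
            add_value(topic_map, record[identity_column],
                      characteristic, record[column])
    return topic_map


# Hypothetical sample data, only to make the sketch runnable.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE book (isbn TEXT PRIMARY KEY, title TEXT, author_id INTEGER);
    INSERT INTO author VALUES (1, 'A. Author');
    INSERT INTO book VALUES ('isbn-1', 'First Title', 1);
    INSERT INTO book VALUES ('isbn-2', 'Second Title', 1);
""")

view = build_view(
    conn,
    "SELECT b.isbn AS ISBN, b.title AS title, a.name AS author "
    "FROM book b JOIN author a ON a.id = b.author_id",
    identity_column="ISBN",
    value_columns={"title": "topicName", "author": "author"},
)
```

Changing the SQL statement or the value_columns mapping yields a different 
View of the same relational data, which is the kind of "parameterising" 
meant above. Here the "author" table ends up as an Occurrence-like value on 
each book Topic instead of a Topic of its own.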

[One remark: It might be a bit confusing that on the one hand I negate the 
existence of "sharp" Subjects and on the other hand I want to use something 
like "Subject Identity Columns" to define the Subjects in my Topic Map 
Views. This is due to the fact that for the empirical tests I always need an 
objective criterion to decide whether two Topics represent the same Subject. 
That means, I decide with the help of the Subject Similarity Service whether 
two Topics *might* represent the same Subject (of course without using 
something like the ISBN to decide the similarity). After that I use the ISBN 
to calculate precision, recall etc. of the current test series.]
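
The evaluation described in this remark could look roughly like the 
following sketch. It assumes that the Subject Similarity Service returns 
pairs of Topic identifiers it considers similar, and that the ISBN of every 
Topic is known separately as the objective criterion. The function names 
and the sample identifiers are hypothetical.

```python
from itertools import combinations


def same_subject_pairs(isbn_by_topic):
    """All unordered pairs of Topics that share an ISBN (the ground truth)."""
    topics_by_isbn = {}
    for topic, isbn in isbn_by_topic.items():
        topics_by_isbn.setdefault(isbn, []).append(topic)
    pairs = set()
    for topics in topics_by_isbn.values():
        for first, second in combinations(sorted(topics), 2):
            pairs.add(frozenset((first, second)))
    return pairs


def precision_recall(predicted_pairs, isbn_by_topic):
    """Compare the pairs proposed by the service with the ISBN ground truth.

    precision = correct proposed pairs / all proposed pairs
    recall    = correct proposed pairs / all true pairs
    """
    truth = same_subject_pairs(isbn_by_topic)
    predicted = {frozenset(pair) for pair in predicted_pairs}
    hits = len(predicted & truth)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(truth) if truth else 0.0
    return precision, recall


# Hypothetical test series: five Topics from two Views, three Subjects.
isbn_by_topic = {
    "t1": "isbn-1", "t2": "isbn-1",
    "t3": "isbn-2", "t4": "isbn-2",
    "t5": "isbn-3",
}
# Pairs that a (hypothetical) Subject Similarity Service proposed.
proposed = [("t1", "t2"), ("t3", "t5")]

precision, recall = precision_recall(proposed, isbn_by_topic)
```

In this small series one of the two proposed pairs is correct and one of the 
two true pairs is found, so both values are 0.5.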

I'm looking forward to your responses!

Greetings from Leipzig
Lutz

References:
[1] http://www.informatik.uni-leipzig.de/~maicher/forschung2.html#[maic04b]
[2] http://www.informatik.uni-leipzig.de/~maicher/forschung3.html#[maic04d]
[3] http://www.informatik.uni-leipzig.de/~maicher/forschung1.html#[asvWS0405]
[4] http://www.idealliance.org/papers/extreme03/html/2003/Kent01/EML2003Kent01-toc.html