[topicmapmail] Expressive capabilities of Topic Maps
Thomas B. Passin
tpassin@comcast.net
Tue, 22 Jul 2003 10:00:43 -0400
[ <jalgermissen@topicmapping.com>]
>
> Suppose a company intends to migrate the following relational data to
> topic maps:
>
>
> CREATE TABLE PERSON (
> ID CHAR(10),
> SURNAME CHAR(60),
> FIRSTNAME CHAR(60),
> BIRTHDATE DATE,
> CITY CHAR(30),
> ZIPCODE CHAR(6),
> STREET CHAR(40),
> COUNTRY CHAR(3),
> PRIMARY KEY(ID),
> UNIQUE (SURNAME,FIRSTNAME)
> );
>
> INSERT INTO PERSON VALUES
> ('6657A54','Hansen','Hans','02-02-1970','Hamburg','22607','Im Gehoelz 33','FRG');
>
>
> How would the entity Hans Hansen be represented in a topic map (probably
> in XTM syntax), assuming that all the attributes are to remain attributes
> (meaning that they are not to be regarded as subjects in their own right
> and thus not to be represented as topics)?
Create a topic for each row in the table. Use the primary key as the
topic's id value (adjusted to meet the lexical rules for an id).
Note that for this to work, the table has to be normalized - to third
normal form, I think. That is, the cells in a row depend only on the
primary key and not on each other, and the primary key must be the only
discriminant between rows. Otherwise you will have to normalize the data
first.
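To illustrate the id adjustment: an XML id cannot start with a digit, so a
key such as '6657A54' needs at least a prefix. Here is a minimal sketch in
Python. The function name, the "person-" prefix, and the choice to replace
unsupported characters with underscores are assumptions made for this
example, not anything XTM prescribes.

```python
import re


def topic_id_from_key(key, prefix="person-"):
    """Turn a primary-key value into a string usable as an XML id.

    Assumption: `prefix` starts with a letter or underscore, which is
    what makes the result a legal id even when the key starts with a digit.
    """
    # CHAR(n) columns are often space-padded, so trim first.
    trimmed = key.strip()
    # Keep a conservative ASCII subset of the characters XML allows in
    # names; replace everything else with an underscore.
    cleaned = re.sub(r"[^A-Za-z0-9._-]", "_", trimmed)
    return prefix + cleaned


print(topic_id_from_key("6657A54"))
```

Note that the replacement step can map two different keys to the same id
(for example "A/1" and "A 1"), so a real converter would have to check for
collisions.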
> How would we 'solve' the following issues:
>
> - How to represent the attributes (e.g. as occurrences/resourceData in
> XTM ??)
Create a topic for each column type. Each cell of a row becomes an
occurrence that is an instance of the topic for that column type.
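The occurrence approach could look roughly like the following sketch, which
builds an XTM 1.0 topic element with Python's standard ElementTree module.
The topic ids ("person", "col-surname" and so on) and the function names are
made up for the example. Only the element names and namespaces come from
XTM 1.0.

```python
import xml.etree.ElementTree as ET

XTM_NS = "http://www.topicmaps.org/xtm/1.0/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("", XTM_NS)
ET.register_namespace("xlink", XLINK_NS)


def xtm(tag):
    """Qualify an element name with the XTM 1.0 namespace."""
    return "{%s}%s" % (XTM_NS, tag)


def topic_ref(parent, target_id):
    """Append <topicRef xlink:href="#target_id"/> to parent."""
    return ET.SubElement(parent, xtm("topicRef"),
                         {"{%s}href" % XLINK_NS: "#" + target_id})


def row_to_topic(topic_id, table_topic_id, cells):
    """Build one topic for one row; every cell becomes an occurrence."""
    topic = ET.Element(xtm("topic"), {"id": topic_id})
    # The row topic is an instance of the topic standing for the table.
    topic_ref(ET.SubElement(topic, xtm("instanceOf")), table_topic_id)
    for column, value in cells.items():
        occurrence = ET.SubElement(topic, xtm("occurrence"))
        # Hypothetical naming scheme for the column-type topics.
        column_topic_id = "col-" + column.lower()
        topic_ref(ET.SubElement(occurrence, xtm("instanceOf")),
                  column_topic_id)
        ET.SubElement(occurrence, xtm("resourceData")).text = str(value)
    return topic


hans = row_to_topic("person-6657A54", "person", {
    "SURNAME": "Hansen", "FIRSTNAME": "Hans", "BIRTHDATE": "02-02-1970",
    "CITY": "Hamburg", "ZIPCODE": "22607", "STREET": "Im Gehoelz 33",
    "COUNTRY": "FRG",
})
print(ET.tostring(hans, encoding="unicode"))
```

The topics "person" and "col-surname" etc. would have to be defined
elsewhere in the same topic map for the references to resolve.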
An alternative is to make an association for the row and to have each cell
become a role-playing topic of the association. You do need to use
associations for join tables, unless you want to reify each occurrence,
since you need to be able to refer to the entries in a join table as things
in their own right (since they represent rows in other tables, and not just
cells).
Actually, if a cell in a row contains a foreign key, it would best be
modeled using an association, since a FK represents a relationship between
two tables. I think, therefore, that a row should be modeled using
occurrences for each cell that is not a FK, and an association for each FK.
This automatically handles join tables because they use FKs.
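For the FK case, here is a sketch of what the association might look like.
The PERSON table above has no foreign key, so the example assumes a
hypothetical EMPLOYER_ID column pointing at a hypothetical COMPANY table.
The ids "works-for", "employee", "employer" and "company-C042" are likewise
invented. The element names are those of XTM 1.0.

```python
import xml.etree.ElementTree as ET

XTM_NS = "http://www.topicmaps.org/xtm/1.0/"
XLINK_NS = "http://www.w3.org/1999/xlink"


def xtm(tag):
    """Qualify an element name with the XTM 1.0 namespace."""
    return "{%s}%s" % (XTM_NS, tag)


def topic_ref(parent, target_id):
    """Append <topicRef xlink:href="#target_id"/> to parent."""
    return ET.SubElement(parent, xtm("topicRef"),
                         {"{%s}href" % XLINK_NS: "#" + target_id})


def fk_to_association(assoc_type_id, members):
    """Build an association from (role_topic_id, player_topic_id) pairs.

    For a foreign key there are two members: the row that holds the FK
    and the row it points to.
    """
    association = ET.Element(xtm("association"))
    topic_ref(ET.SubElement(association, xtm("instanceOf")), assoc_type_id)
    for role_id, player_id in members:
        member = ET.SubElement(association, xtm("member"))
        topic_ref(ET.SubElement(member, xtm("roleSpec")), role_id)
        topic_ref(member, player_id)  # the topic playing the role
    return association


works_for = fk_to_association("works-for", [
    ("employee", "person-6657A54"),   # row holding the hypothetical FK
    ("employer", "company-C042"),     # row the hypothetical FK points to
])
print(ET.tostring(works_for, encoding="unicode"))
```

A row of a join table would come out the same way, with one member per FK
column in the row.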
Some people - Murray Altheim has argued strongly for this, though not
specifically in the context of relational tables - want to model all
properties as associations, not occurrences. Doing so would give you more
access to the values in the cells: each value would be a topic that could
take part in other associations, so you could annotate it or say whatever
else you want about it.
I, on the other hand, would be happy to use occurrences for ordinary
(non-FK) cells for most of the applications I can think of right now. This
gives you the same level of access to the values as you can get from a
relational database (although not more access), which seems reasonable to
me.
> - How to preserve the expressive capability of the original data, such
> as the data types
> of the various properties and the uniqueness constraints
> - how to document/communicate the decisions so that others can
> understand them (like the
> way we can all understand the SQL statements)
>
It depends on whether you mean machine-usable or just human-usable. If
human-only, no problem. Just add an annotation occurrence to the topic for
the column type and write an explanation. To make the data type more
machine-usable, you could add an occurrence to hold a string specifying the
data type. You could make it one of the XML Schema data types, for example,
or an SQL data type. Or you could associate the column type topic with a
data type topic, or make it an instance of a data type topic. Ultimately,
though, there is as yet no standard way in topic maps to specify data types.
Of course, you would have to write your processor to use this information,
since this approach is not part of any standard. You could publish your
scheme as a kind of topic maps profile, and try to get widespread support
for it.
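As a sketch of the "occurrence holding a string" option: the mapping table
below, the "datatype" occurrence type and the "col-" id scheme are all
assumptions of this example, since nothing in the topic map standards
defines them. A plain dictionary stands in for the column-type topic to
keep the example short.

```python
import re

# Assumed mapping from SQL base types to XML Schema type names;
# extend or replace it to suit the database at hand.
SQL_TO_XSD = {
    "CHAR": "xsd:string",
    "VARCHAR": "xsd:string",
    "DATE": "xsd:date",
    "INTEGER": "xsd:integer",
}


def datatype_label(sql_type):
    """Return an XML Schema type name, or the SQL type itself if unmapped."""
    match = re.match(r"\s*([A-Za-z]+)", sql_type)
    if match is None:
        return sql_type.strip()
    # Ignore any length such as the "(60)" in "CHAR(60)".
    return SQL_TO_XSD.get(match.group(1).upper(), sql_type.strip())


def column_type_topic(column, sql_type):
    """Describe a column-type topic carrying a 'datatype' occurrence."""
    return {
        "id": "col-" + column.lower(),
        "occurrences": {"datatype": datatype_label(sql_type)},
    }


print(column_type_topic("BIRTHDATE", "DATE"))
print(column_type_topic("SURNAME", "CHAR(60)"))
```

Only a processor written to look for the "datatype" occurrence would do
anything with these strings.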
As for constraints, there is no such standard yet, so again you would have
to roll your own or try to use one of the vendor constraint languages, if it
would suit your needs.
To sum up, topic maps are well suited for modeling relational data, though
of course SQL, as a specialized language, is more compact than a
general-purpose one like XTM. However, there is no standard way so far to
indicate data types or specify constraints. It would be fairly easy to
write a processor that took basic SQL table definitions and data and
produced a topic map from them (some of the constraints and subtleties would
be harder). We will have a constraint language at some point, but it is not
here yet.
On the other hand, a general topic map processor would be hard-pressed to
match a good relational database in query efficiency, storage efficiency,
enforcement of integrity constraints, and update handling. Relational
systems have developed these capabilities over decades. So in
many cases, it might be better to write a wrapper that makes a relational
database look like a topic map.
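A very small sketch of such a wrapper, using Python's built-in sqlite3
module and an in-memory copy of the PERSON table. The class name, the
dictionary layout of a "topic" and the id scheme are assumptions of the
example. A real wrapper would expose whatever topic map API or XTM output
the application needs.

```python
import sqlite3


class TableAsTopics:
    """Present the rows of one table as topic-like dictionaries."""

    def __init__(self, connection, table, key_column):
        self.connection = connection
        self.table = table
        self.key_column = key_column

    def topics(self):
        # The table name is supplied by trusted code here, not by users.
        cursor = self.connection.execute('SELECT * FROM "%s"' % self.table)
        columns = [description[0] for description in cursor.description]
        for row in cursor:
            cells = dict(zip(columns, row))
            key = cells.pop(self.key_column)
            yield {
                "id": "%s-%s" % (self.table.lower(), key),
                "type": self.table.lower(),
                "occurrences": cells,  # non-key cells, keyed by column
            }


def demo_connection():
    """Create an in-memory database holding the PERSON example."""
    connection = sqlite3.connect(":memory:")
    connection.execute(
        "CREATE TABLE PERSON (ID CHAR(10), SURNAME CHAR(60), "
        "FIRSTNAME CHAR(60), BIRTHDATE DATE, CITY CHAR(30), "
        "ZIPCODE CHAR(6), STREET CHAR(40), COUNTRY CHAR(3), "
        "PRIMARY KEY(ID), UNIQUE (SURNAME, FIRSTNAME))")
    connection.execute(
        "INSERT INTO PERSON VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
        ("6657A54", "Hansen", "Hans", "02-02-1970", "Hamburg",
         "22607", "Im Gehoelz 33", "FRG"))
    return connection


for topic in TableAsTopics(demo_connection(), "PERSON", "ID").topics():
    print(topic["id"], topic["occurrences"]["SURNAME"])
```

The data stays in the database, with its indexes and constraints, and the
topics are produced only when something asks for them.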
In fact, if the database is properly designed - so that it maps naturally
into the kind of structures described above - you could think of the
database as being a kind of specialized implementation of a topic map. As
long as it can produce proper XTM, who is to say it isn't?
Cheers,
Tom P