[topicmapmail] Expressive capabilities of Topic Maps
Thomas B. Passin
tpassin@comcast.net
Thu, 11 Sep 2003 22:35:01 -0400
[ <jalgermissen@topicmapping.com>]
> > * Lars Marius Garshol
> > I think internal occurrences are as simple as anything is likely to be
> > within topic maps. What is it you think is too complicated about them?
> > Are they difficult to use?
>
> No, of course they are not difficult to use. My point is that they add
> substantial overhead that is not necessary when one is only interested
> in creating a simple property. I can't understand how we want to do
> information management on serious data sizes and not give topic maps the
> ability to represent simple properties with as little overhead as
> possible. Again, if I have millions of, for example, persons with
> their ages I see absolutely no reason to require topic map users to
> create an object for each person<->age relationship, give that object
> a scope, a type etc..
>
How you implement an occurrence depends on the implementation, and you might
be able to make it very low overhead. What I do not want to see is a model
where there is no difference between an occurrence and an association,
because then how would I know which ones to put in my low-overhead
occurrence and which ones have to go in a higher-overhead association?
> Consider an owner of data sets of substantial size, that are just
> perfectly suited for modeling them on the basis of the entity-
> relationship model, except that the data owner would like to
> take advantage of the merging capability of topic maps (1).
> Maybe the owner needs to plan for future integration of his data
> base with those of partners. Maybe the data to be modeled consists
> 90% of simple properties (persons with age,street,zipcode,department;
> goods with weights, prices, amount in stock) (2). Requiring the
> use of occurrences for all these properties does not make Topic
> Maps look like a good choice (although their capabilities are
> explicitly demanded).
>
> Why are you not concerned about this?
>
A property is usually thought of as a (name,value) pair - or (uri,value)
pair. But the name implicitly carries a type (or the name directly
represents the type, depending). An occurrence is basically a tuple (type,
value, list-of-scopes). The extra overhead has two parts:
1) The list of scopes. If you don't need them, that overhead goes away.
2) The option for the value to be either data or a reference to an
addressable resource. This one cannot easily go away.
So an occurrence is not necessarily that high in overhead - it is a (type,
value) pair with some additions. If you do not need the additions, you can
avoid their overhead. It is up to the implementation designer to figure out
how to omit the overhead when it is not needed and to keep it when it is.
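To make the tuple view concrete, here is a minimal Python sketch of an occurrence as a (type, value) pair with optional additions. It is only an illustration under my own assumptions, not how any particular topic map engine does it: the names Occurrence, is_reference and NO_SCOPE are made up, and so are the sample values.

```python
from dataclasses import dataclass
from typing import Tuple

# One shared empty tuple: an unscoped occurrence allocates no scope list of its own.
NO_SCOPE: Tuple[str, ...] = ()


@dataclass(frozen=True)
class Occurrence:
    """Hypothetical occurrence: a (type, value) pair plus optional additions."""
    type: str                          # names the property, e.g. a topic id or URI
    value: str                         # literal data, or a locator if is_reference
    is_reference: bool = False         # data vs. reference to an addressable resource
    scope: Tuple[str, ...] = NO_SCOPE  # only populated when scopes are needed


# A plain property: nothing beyond the (type, value) pair and one flag.
age = Occurrence("age", "42")

# The same structure with the additions switched on.
homepage = Occurrence(
    "homepage", "http://example.org/jan", is_reference=True, scope=("english",)
)
```

The scope part costs nothing per instance when it is unused, because every unscoped occurrence points at the same empty tuple. The data-or-reference flag is the part that stays, which matches point 2) above.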
That may not be so easy. As an illustration, there is a type of data
structure called a "Judy array" (if I am remembering it right) - a C thing
that is like a very fast sorted hash table. It is supposed to be just
slightly slower than a bare array, even for large sizes. One of its claims
to fame is that it is specifically designed to minimize forcing the
processor to reload its cache - that is how the Judy gets its speed. It
uses, I think, some 22 different data structures behind the scenes to get
this performance, but the user programmer does not need to know anything
about that.
Now I do not want to have to __write__ such a beast! But with topic maps,
you generally need to provide for scopes because you might end up needing to
use them. So you cannot just have properties without them. Same for ref
vs. data value.
When we have a schema/constraint language, it may become possible to avoid
some kinds of overhead by letting some types, for example, be specified in a
schema so they do not have to be put into every instance. That is the way
it works with a relational database. It is the options in
topic maps that lead to both their flexibility and their overhead.
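As a rough sketch of what "specified in a schema" could buy, the Python below declares a property type once and then stores only bare values per topic, much like a column in a relational table. Everything here is an assumption for illustration: there is no such schema language yet, and SCHEMA, PropertyColumn and the sample topic ids and ages are invented.

```python
from array import array

# Hypothetical schema: the type information is declared once, here,
# instead of being repeated in every occurrence instance.
SCHEMA = {"age": {"datatype": "integer", "scoped": False}}


class PropertyColumn:
    """Stores one schema-declared property for many topics as bare values."""

    def __init__(self, type_name, schema):
        self.type_name = type_name
        self.decl = schema[type_name]  # shared declaration, not per instance
        self.index = {}                # topic id -> slot in the value array
        self.values = array("q")       # packed signed 64-bit integers

    def set(self, topic_id, value):
        slot = self.index.get(topic_id)
        if slot is None:
            # First value for this topic: append a new slot.
            self.index[topic_id] = len(self.values)
            self.values.append(value)
        else:
            # Topic already has a value: overwrite in place.
            self.values[slot] = value

    def get(self, topic_id):
        return self.values[self.index[topic_id]]


ages = PropertyColumn("age", SCHEMA)
ages.set("jan", 35)
ages.set("tom", 50)
ages.set("jan", 36)  # update, no new slot
```

Each stored age costs one packed integer plus an index entry; the type, datatype and "no scopes here" decision live in the schema. The price is that a property which later needs a scope no longer fits the column.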
Do you propose adding another structure to topic maps - a plain (type,value)
pair? That would be the other approach. That would be the lowest overhead
you could get for a property.
> ...
> Hmmm, assuming maybe 20 bytes for a simple property value, I'd consider
> the size of an occurrence item substantial (3).
>
>
> Jan
>
> ...
> (3) Definition of occurrence item in the latest SAM draft:
> http://www.isotopicmaps.org/sam/sam-model/#sect-occurrence
>
Well, here I agree with you about overhead, but the extra overhead here is
mostly about the locator. Any reifying topic would not be part of the
occurrence, and I assume the "reifier" would be simply an id for the
occurrence so it could be referred to. The locator, fortunately, is
optional, so you can just not bother with it.
Next, your occurrence would not really have to be an object per se, as long
as it behaves correctly when the map is accessed through its API or (in
the future) query language. I don't think the SAM is trying to prescribe
the actual classes to be implemented, just that things should behave "as if"
they were that way.
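Here is one way the "as if" idea might look, sketched in Python. This is my own illustration, not anything the SAM prescribes: CompactStore and OccurrenceView are hypothetical names, and the sample data is made up. The store keeps flat parallel lists, and occurrence objects exist only for as long as a caller is looking at them.

```python
class OccurrenceView:
    """Transient object handed out by the API; nothing like it is stored."""
    __slots__ = ("_store", "_row")

    def __init__(self, store, row):
        self._store = store
        self._row = row

    @property
    def type(self):
        return self._store._types[self._row]

    @property
    def value(self):
        return self._store._values[self._row]

    @property
    def scope(self):
        # Rows without scope have no entry at all; report an empty scope.
        return self._store._scopes.get(self._row, ())


class CompactStore:
    """Keeps occurrences as parallel lists rather than one object each."""

    def __init__(self):
        self._topics = []
        self._types = []
        self._values = []
        self._scopes = {}  # row -> tuple of scope ids, only for scoped rows

    def add(self, topic_id, type_name, value, scope=()):
        row = len(self._values)
        self._topics.append(topic_id)
        self._types.append(type_name)
        self._values.append(value)
        if scope:
            self._scopes[row] = tuple(scope)
        return row

    def occurrences(self, topic_id):
        # Linear scan keeps the sketch short; a real store would index by topic.
        for row, owner in enumerate(self._topics):
            if owner == topic_id:
                yield OccurrenceView(self, row)


store = CompactStore()
store.add("jan", "age", "35")
store.add("jan", "nickname", "J", scope=["informal"])
occs = list(store.occurrences("jan"))
```

To the caller each item looks like an occurrence with a type, a value and a scope, but the unscoped age row never paid for a scope list or for a stored object.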
Again, what do you propose as an alternative? I fully agree with you that a
simple, low-overhead way to capture and retrieve properties - literal
values - is very important - though I think that it must be possible to type
the values, too.
Cheers,
Tom P