[topicmapmail] Expressive capabilities of Topic Maps

Thomas B. Passin tpassin@comcast.net
Sat, 13 Sep 2003 00:38:54 -0400


[ <jalgermissen@topicmapping.com>]

> > A property is usually thought of as a (name,value) pair - or (uri,value)
> > pair.  But the name implicitly carries a type (or the name directly
> > represents the type, depending).
>
> Ok, a property is a class of property/value pairs, do you agree on
> that?
>

Well, I would tend to call it a "property type" - as opposed to a
"property", by which I usually mean an instance, i.e., a name/value pair.
But I think we mean the same thing, except that I think a property can be
defined by its intension as well as by its extension.

> > An occurrence is basically a tuple (type,
> > value, list-of-scopes).  The extra overhead has two parts
> >
> > 1) The list of scopes.  If you don't need them, that overhead goes away.
>
> No, the SAM draft requires a scope.
>
But the scope does not have to contain anything, so it can be thought of as
optional.  I am referring here to a topic map that you design for a
particular purpose, so that you would know that scopes were not required.
My point was that, for particular purposes, you can optimize an application
by not including scopes, or by using two types of property implementations,
one with scopes and one without.
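
A minimal sketch of the two-implementation idea, in Python.  The names
PlainOccurrence, ScopedOccurrence and make_occurrence are hypothetical and
are not taken from the SAM draft or from any topic map engine; the sketch
only assumes that an empty scope can be treated as "no scope stored".

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable


@dataclass(frozen=True)
class PlainOccurrence:
    """A bare (type, value) pair; no scope is stored at all."""
    type: str
    value: str

    @property
    def scope(self) -> FrozenSet[str]:
        # Callers still see a scope, but it is always the empty one.
        return frozenset()


@dataclass(frozen=True)
class ScopedOccurrence:
    """A (type, value, scope) tuple for the cases that need scoping."""
    type: str
    value: str
    scope: FrozenSet[str]


def make_occurrence(type_: str, value: str, scope: Iterable[str] = ()):
    """Pick the cheaper representation whenever the scope is empty."""
    scope_set = frozenset(scope)
    if scope_set:
        return ScopedOccurrence(type_, value, scope_set)
    return PlainOccurrence(type_, value)
```

Both classes answer .type, .value and .scope, so code that reads
occurrences would not need to know which representation it was given.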

> > 2) The option for the value to be either data or a reference to an
> > addressable resource.  This one cannot easily go away.
>
> There is also (at least an empty) reference to a reifier and a set
> of source locators. Assuming 4 bytes for an integer or a pointer,
> this is 8 bytes (plus the ones you mention above). Compared to
> storing e.g. the age as a value this adds up quite a bit if you
> have large datasets.
>

Well, along with the age value you have to store (a reference to) its type,
which would presumably be at least an integer in size.  If you do not, you
don't know what kind of thing the age value represents.  In a regular table
structure, you don't have to have the type reference, but only because the
application knows that the "age" slot will always be in the same place in
the data structure.  This is the rigidity imposed by a relational style
table.
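
To make the trade-off concrete, here is a rough sketch in Python using the
standard struct module.  It assumes 4-byte unsigned integers, as in the
estimate quoted above; the property names, the TYPE_IDS table and the
helper functions are invented for illustration only.

```python
import struct

# Fixed-slot record: the layout itself says slot 0 is "age" and slot 1 is
# "height_cm", so no type reference is stored with the values.
FIXED = struct.Struct("<II")

# Tagged property: every value carries an integer reference to its type.
TAGGED = struct.Struct("<II")  # (type_id, value)

# Hypothetical type table; a real engine would refer to typing topics.
TYPE_IDS = {"age": 1, "height_cm": 2}
TYPE_NAMES = {type_id: name for name, type_id in TYPE_IDS.items()}


def pack_fixed(age: int, height_cm: int) -> bytes:
    """Rigid layout: both slots are always present, in a fixed order."""
    return FIXED.pack(age, height_cm)


def pack_tagged(props: dict) -> bytes:
    """Flexible layout: any subset of properties, each with its type id."""
    return b"".join(
        TAGGED.pack(TYPE_IDS[name], value) for name, value in props.items()
    )


def unpack_tagged(buf: bytes) -> dict:
    """Recover the properties using only the stored type ids."""
    return {TYPE_NAMES[t]: v for t, v in TAGGED.iter_unpack(buf)}
```

Under these assumptions the fixed record costs 4 bytes per value and the
tagged form costs 8, but only the tagged form can leave a property out or
add a new one without changing the layout that the application relies on.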

> >
> > So an occurrence is not necessarily that high in overhead - it is a
> > (type, value) pair with some additions.  If you do not need the
> > additions, you can avoid their overhead.  It is up to the implementation
> > designer to figure out how to omit the overhead when it is not needed
> > and to keep it when it is.
>
> I cannot see how that would work ( I know what you mean, but you cannot
> simply erase everything or you'd need to specialize the TM system).
>
> > That may not be so easy.  As an illustration, there is a type of data
> > structure called a "Judy array" (if I am remembering it right) - a C
> > library that is like a very fast sorted hash table.  It is supposed to
> > be just slightly slower than a bare array, even for large sizes.  One of
> > its claims to fame is that it is specifically designed to minimize
> > forcing the processor to reload its cache - that is how the Judy gets
> > its speed.
>
> So, let's judy-fy our TM engines ;-)
>
>
> > It uses, I think, some 22 different data structures behind the scenes to
> > get this performance, but the user programmer does not need to know
> > anything about that.
>
> Interesting stuff. Have to look at that.
>

http://judy.sourceforge.net/

> > Do you propose adding another structure to topic maps - a plain
> > (type,value) pair?  That would be the other approach.  That would be the
> > lowest overhead you could get for a property.
>
> I am not proposing that, it is in the Reference Model already.
> [ http://www.isotopicmaps.org/TMMM/TMMM-latest-clean.html#parid6412 ]
>

I am still getting up to speed on the RM ... looks like they are allowing
scopes to be TMA-conferred rather than be native (to avoid that discussion
about non-conferred).

Cheers,

Tom P