[topicmapmail] Expressive capabilities of Topic Maps
jalgermissen@topicmapping.com
Fri, 12 Sep 2003 10:10:02 +0200
"Thomas B. Passin" <tpassin@comcast.net> wrote on 12.09.2003,
04:35:01:
> [ ]
>
> > > * Lars Marius Garshol
> > > I think internal occurrences are as simple as anything is likely to be
> > > within topic maps. What is it you think is too complicated about them?
> > > Are they difficult to use?
> >
> > No, of course they are not difficult to use. My point is that they add
> > substantial overhead that is not necessary when one is only interested
> > in creating a simple property. I can't understand how we want to do
> > information management on serious data sizes and not give topic maps the
> > ability to represent simple properties with as little overhead as
> > possible. Again, if I have millions of, for example, persons with
> > their ages I see absolutely no reason to require topic map users to
> > create an object for each person-age relationship, give that object
> > a scope, a type, etc.
> >
>
> How you implement an occurrence depends on the implementation, and you might
> be able to make it very low overhead. What I do not want to see is a model
> where there is no difference between an occurrence and an association,
> because then how would I know which ones to put in my low-overhead
> occurrence and which ones have to go in a higher-overhead association?
Hmm, by looking at the association type? To me an occurrence *is* an
association anyway (sure you knew that ;-)
>
> > Consider an owner of data sets of substantial size, that are just
> > perfectly suited for modeling them on the basis of the entity-
> > relationship model, except that the data owner would like to
> > take advantage of the merging capability of topic maps (1).
> > Maybe the owner needs to plan for future integration of his data
> > base with those of partners. Maybe the data to be modeled consists
> > 90% of simple properties (persons with age,street,zipcode,department;
> > goods with weights, prices, amount in stock) (2). Requiring to
> > use occurrences for all these properties does not make Topic
> > Maps look like a good choice (although their capabilities are
> > explicitly demanded).
> >
> > Why are you not concerned about this?
> >
>
> A property is usually thought of as a (name,value) pair - or (uri,value)
> pair. But the name implicitly carries a type (or the name directly
> represents the type, depending).
Ok, a property is a class of property/value pairs, do you agree on
that?
> An occurrence is basically a tuple (type,
> value, list-of-scopes). The extra overhead has two parts
>
> 1) The list of scopes. If you don't need them, that overhead goes away.
No, the SAM draft requires a scope.
> 2) The option for the value to be either data or a reference to an
> addressable resource. This one cannot easily go away.
There is also (at least an empty reference) to a reifier and a set
of source locators. Assuming 4 bytes for an integer or a pointer,
this is 8 bytes (plus the ones you mention above). Compared to
storing e.g. the age as a value this adds up quite a bit if you
have large datasets.
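
To make the arithmetic concrete, here is a rough sketch using Python's struct module. The field layout is an assumption for illustration only: one 4-byte slot each for type, value, scope, reifier and source locators. The SAM draft does not prescribe any storage layout, and a real engine would differ.

```python
import struct

# Hypothetical layouts, 4 bytes per integer or pointer as assumed above.
# "<" disables alignment padding, so .size is the plain sum of the fields.
BARE_VALUE = struct.Struct("<i")       # e.g. an age stored as a plain integer
OCCURRENCE = struct.Struct("<iiiii")   # type, value, scope, reifier, source locators


def overhead_bytes(count):
    """Extra bytes needed to hold `count` occurrence records instead of bare values."""
    return count * (OCCURRENCE.size - BARE_VALUE.size)


if __name__ == "__main__":
    print("per item:", OCCURRENCE.size - BARE_VALUE.size, "bytes")
    print("one million items:", overhead_bytes(1_000_000), "bytes")
```

Under these assumptions the per-item difference is small, but it scales linearly with the number of properties.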
>
> So an occurrence is not necessarily that high in overhead - it is a (type,
> value) pair with some additions. If you do not need the additions, you can
> avoid their overhead. It is up to the implementation designer to figure out
> how to omit the overhead when it is not needed and to keep it when it is.
I cannot see how that would work (I know what you mean, but you cannot
simply erase everything or you'd need to specialize the TM system).
> That may not be so easy. As an illustration, there is a type of data
> structure called a "Judy array" (if I am remembering it right) - a C++ thing
> that is like a very fast sorted hash table. It is supposed to be just
> slightly slower than a bare array, even for large sizes. One of its claims
> to fame is that it is specifically designed to minimize forcing the
> processor to reload its cache - that is how the Judy gets its speed.
So, let's judy-fy our TM engines ;-)
> It
> uses, I think, some 22 different data structures behind the scenes to get
> this performance, but the user programmer does not need to know anything
> about that.
Interesting stuff. Have to look at that.
>
> Now I do not want to have to __write__ such a beast! But with topic maps,
> you generally need to provide for scopes because you might end up needing to
> use them. So you cannot just have properties without them. Same for ref
> vs. data value.
>
> When we have a schema/constraint language, it may become possible to avoid
> some kinds of overhead by letting some types, for example, be specified in a
> schema so they do not have to be put into every instance. That is the way
> it works with a relational database. It is the flexibility and options in
> topic maps that leads to both their flexibility and their overhead.
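
A minimal sketch of that schema idea, assuming a hypothetical PropertySchema that declares the type and scope once so that each instance stores only its value. None of these names come from the SAM or from any existing topic map engine.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PropertySchema:
    """Facts declared once per property type (hypothetical, for illustration)."""
    type_name: str
    value_is_reference: bool = False
    scope: tuple = ()


class PropertyColumn:
    """Stores one bare value per topic; type and scope live in the schema."""

    def __init__(self, schema):
        self.schema = schema
        self.values = {}  # topic id -> raw value

    def set(self, topic_id, value):
        self.values[topic_id] = value

    def occurrence(self, topic_id):
        """Rebuild the full (type, value, scope) tuple on demand."""
        return (self.schema.type_name, self.values[topic_id], self.schema.scope)


ages = PropertyColumn(PropertySchema("age"))
ages.set("person-1", 42)
print(ages.occurrence("person-1"))
```

The trade-off is the one already named in the thread: anything fixed in the schema can no longer vary per instance.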
>
> Do you propose adding another structure to topic maps - a plain (type,value)
> pair? That would be the other approach. That would be the lowest overhead
> you could get for a property.
I am not proposing that, it is in the Reference Model already.
[ http://www.isotopicmaps.org/TMMM/TMMM-latest-clean.html#parid6412 ]
> > ...
> > Hmmm, assuming maybe 20 bytes for a simple property value, I'd consider
> > the size of an occurrence item substantial (3).
> >
> >
> > Jan
> >
> > ...
> > (3) Definition of occurrence item in the latest SAM draft:
> > http://www.isotopicmaps.org/sam/sam-model/#sect-occurrence
> >
>
> Well, here I agree with you about overhead, but the extra overhead here is
> mostly about the locator. Any reifying topic would not be part of the
> occurrence,
But you need to record the fact that the occurrence is not reified.
> and I assume the "reifier" would be simply an id
(= 4 bytes ;-)
> for the
> occurrence so it could be referred to. The locator, fortunately, is
> optional, so you can just not bother with it.
Optional does not mean that there doesn't have to be at least an
empty value.
>
> Next, your occurrence would not really have to be an object per se, as long
> as it were to behave right when the map is addressed through its API or (in
> the future) query language. I don't think the SAM is trying to prescribe
> the actual classes to be implemented,
No, of course not. But at some point you have to store the beasts.
> just that things should behave "as if"
> they were that way.
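
One way to picture the "as if" behaviour is a store that keeps bare values and only builds a full record once a reifier is actually attached. This is a sketch under assumptions: OccurrenceStore and its methods are hypothetical names, not part of the SAM or of any existing engine API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Occurrence:
    """The view handed out through the API, whatever the storage looks like."""
    type: str
    value: object
    scope: frozenset = frozenset()
    reifier: Optional[str] = None
    source_locators: frozenset = frozenset()


class OccurrenceStore:
    def __init__(self):
        self._compact = {}  # (topic, type) -> bare value
        self._full = {}     # (topic, type) -> Occurrence, only when needed

    def add(self, topic, type_, value):
        self._compact[(topic, type_)] = value

    def reify(self, topic, type_, reifier):
        """Promote a bare value to a full record the first time it is reified."""
        key = (topic, type_)
        occ = self._full.get(key)
        if occ is None:
            occ = Occurrence(type_, self._compact.pop(key))
            self._full[key] = occ
        occ.reifier = reifier

    def get(self, topic, type_):
        """Always answer with an Occurrence, building a temporary one if needed."""
        key = (topic, type_)
        if key in self._full:
            return self._full[key]
        return Occurrence(type_, self._compact[key])


store = OccurrenceStore()
store.add("person-1", "age", 42)
store.reify("person-1", "age", "reifier-1")
print(store.get("person-1", "age"))
```

Callers see the same Occurrence shape either way; the cost of the extra fields is paid only for the records that use them.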
>
> Again, what do you propose as an alternative?
See above, the reference model has properties.
Also, another thing to consider: There are lots of properties
(subject identifiers, source locators, types of occurrences,
subject address) and they are all NOT represented with the overhead
of an occurrence, why is that? To me, they are quite important, more
important than a birthdate or an address of a person, but while the
address has scope, reifier and all that, the other properties don't.
I might want to say something about the fact that a certain subject
indicator indicates the subject of a topic. I cannot do that with the
SAM...but I can assert something about the person-age relationship.
That seems weird design to me.
Jan
> I fully agree with you that a
> simple, low-overhead way to capture and retrieve properties - literal
> values - is very important - though I think that it must be possible to type
> the values, too.
>
> Cheers,
>
> Tom P
>
>
--
Jan Algermissen <algermissen@acm.org>
Consultant & Programmer
http://www.topicmapping.com
http://www.gooseworks.org