Use and abuse of occurrence RE: [topicmapmail] Are Facets Really Simple After All?

Jan Algermissen algermissen@acm.org
Mon, 01 Dec 2003 20:23:30 +0100


Kal Ahmed wrote:

> > Well...I seek the opinions of end-users.
> >
> 
> Hmm, but I am an end user. I just happen to develop software too.

Would you mind changing your software to handle an extra element if
you happened to be persuaded by me?



> > > The subject address and subject indicators do not need typing,
> >
> > Huh? Sure those properties have a type - how else could one make
> > sense of the values?
> >
> 
> I am trying to keep things clear. Obviously failing ;-) There are
> properties that express the subject address and subject indicators of a
> topic. But the values of those properties are not typed. 

So, what data type does your software use to represent them? Pointer to void?
java.lang.Object?

Sure they have a type: set of locators....that's even in the SAM draft.


> The
> [occurrences] property of a topic consists of a sequence of typed
> values.

And? What is your point here?  (the values are occurrence items, yes?)



> 
> > > they do
> > > not need scoping. But given the topic "Jim" and the string "34", how do
> > > I establish the age property without a type.
> >
> > I am not saying that property values do not need a property type! All I am
> > saying is: lower the required overhead.
> >
> 
> OK, so you concede that typing is not an overhead.
> 
> > > How do I establish that my
> > > statement is only valid in the context of "my best guess from looking at
> > > Jim" ? I would need scope to do that.
> >
> > Is that the usual use-case for simple properties????
> >
> 
> Why do you want to prevent this use case ?

I don't want to prevent it, I just don't want it to drive the design decisions of
the model.

> 
> > If you need the overhead: reify the property-value and use an assertion, what is
> > the problem?
> >
> 
> What is the problem with scope being optional ?

You need an object (in the sense of some structure) to attach the information to.

If you have simple properties, all you need is the value type of the property and
the values (like an RDBMS stores the values of a given attribute, property, column
(or whatever you call it)). Can you imagine using extra objects to store all those
values? That's exactly what occurrences require.
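
To make the contrast concrete, here is a minimal sketch of the two storage
styles. It is only an illustration: the names Occurrence, OccurrenceStore and
PropertyColumn are hypothetical and come neither from the SAM draft nor from
any existing topic map engine.

```python
from array import array
from dataclasses import dataclass


@dataclass(frozen=True)
class Occurrence:
    """Hypothetical occurrence item: a value glued to a type and a scope."""
    type: str
    value: str
    scope: frozenset = frozenset()


class OccurrenceStore:
    """Occurrence style: one extra object per stored value."""

    def __init__(self):
        self._by_topic = {}

    def add(self, topic, type_, value, scope=frozenset()):
        self._by_topic.setdefault(topic, []).append(Occurrence(type_, value, scope))

    def get(self, topic, type_):
        return [o.value for o in self._by_topic.get(topic, []) if o.type == type_]


class PropertyColumn:
    """Simple-property style: one typed column per property, as an RDBMS
    stores the values of an attribute. Only the values are kept."""

    def __init__(self, typecode="i"):
        self._values = array(typecode)  # packed machine integers
        self._row_of = {}               # topic -> row index

    def set(self, topic, value):
        row = self._row_of.get(topic)
        if row is None:
            self._row_of[topic] = len(self._values)
            self._values.append(value)
        else:
            self._values[row] = value

    def get(self, topic):
        return self._values[self._row_of[topic]]


occurrences = OccurrenceStore()
occurrences.add("jim", "age", "34")

ages = PropertyColumn("i")
ages.set("jim", 34)
```

In the column, the property type is stated once for the whole column; in the
occurrence store, every value carries its own type and scope slots.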


> 
> > >
> > > Both type and scope are optional.
> >
> > I am not only talking about type and scope! You still need more complexity
> > to store an occurrence than to store a value.
> >
> 
> An occurrence consists of a value (string or locator), a type, and a
> scope. That's it. Where is this additional complexity?

Suppose I store the value (say the integer 45) of the property "age": all it
takes is storing the value, 4 bytes maybe.

The additional complexity is the whole object that glues the occurrence stuff
together.
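
As a rough illustration of the size argument, the sketch below packs the
integer 45 into 4 bytes and compares that with the shallow size of a
hypothetical Occurrence object. The class is an assumption made for this
example, and sys.getsizeof reports interpreter-specific numbers that exclude
the referenced strings, so only the direction of the comparison matters.

```python
import struct
import sys
from dataclasses import dataclass


@dataclass
class Occurrence:
    """Hypothetical occurrence item: the glue around a single value."""
    type: str
    value: str
    scope: tuple = ()


# The bare value: a 32-bit little-endian integer.
raw = struct.pack("<i", 45)
raw_size = len(raw)

# The same information wrapped in an occurrence object.
occ = Occurrence("age", "45")
occ_size = sys.getsizeof(occ)  # shallow size only; implementation-dependent

print(f"bare value: {raw_size} bytes, occurrence object: at least {occ_size} bytes")
```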


> > > What would a property facility provide that is not provided by
> > > occurrences ?
> >
> >
> > * less overhead for topic map authors
> 
> In what sense? Syntactically ? I thought we agreed not to discuss the
> XTM syntax and its shortcomings. If you mean that occurrences carry
> semantic baggage that is overhead for topic map authors, perhaps that is
> true to some extent. But I don't think that learning "occurrence = type,
> scope and value" is a lot of "overhead".

It surely needs more space in a given document (if we consider the syntax)
but it also increases the storage size (see above).

> 
> > * more space and time efficient implementations
> >
> 
> Surely that is an implementation consideration. You could do that now
> with occurrences. Perhaps setting aside a special table for occurrences
> of a specific type 

Ah....so I need to specialize my storage for certain kinds of occurrences?
For example, if I want to store ages, I could tell the implementation to
not create all the occurrences but just store the values?


Really, why does it have to be so complicated? I don't get it.



> or of a specific meta-type. Perhaps restricting such
> occurrences to contain only string values. Perhaps with some other funky
> application-specific optimisations. No one is stopping you. No one is
> stopping you from creating your own syntax either. But I still fail to
> see the need to change the Topic Maps Data Model.

The drafted topic maps data model is a bit like an entity relationship
model that has 4 or 5 predefined attributes...

The current draft says:

"Occurrences are essentially a specialized kind of association,..."  (5.7)
"Essentially, a base name is a specialized kind of occurrence,..." (5.5)

Then you say it is ok and common practice to use occurrences to represent
properties.

In addition there are specialized properties in the model (subjectAddress,
SubjectIndicators, SourceLocators) that are *NOT* represented as occurrences.


Why are all the specialisations in place? Why not use associations for
everything since all the other stuff is *essentially* just an association?

I see absolutely no reason for all the specialized items. Can you tell me
(and hopefully convince me) why they are there?


> 
> >
> > Not enough?
> >
> >
> > > AFAIK RDF statements *are* resources and *do* have identity, so the
> > > assignment of "34" to "Jim" is something that can be identified in the
> > > RDF model, just as an occurrence of the topic "Jim" is something that
> > > can be identified in the XTM model, so I don't see the divide that you
> > > imply here.
> >
> > Seriously: if you have 1 million subjects that all have age, height, weight, etc.
> > you don't mind creating an occurrence for each of these properties?
> >
> 
> No, I don't mind doing that at all. Why should I ? I can still optimise
> my engine (or as an end user choose the engine that is optimised) to
> handle this case. 

So you'd need an optimized engine for the most general case? Sounds
weird, really!


> I am not prevented from doing so by the Topic Maps
> Data Model.
> 
> > I don't get that. Is really everyone comfortable with this?
> >
> 
> I'm speaking only for myself. Not for anyone else.

Sure, but I wonder if there is anyone (besides TM experts) who
thinks that I am right or that I am wrong or whatever.


> 
> >
> > >
> > > >
> > > > * Introducing a new element does not harm existing XTM documents, they
> > > >   can easily be transformed into instances of the new DTD.
> > > >
> > >
> > > Just because it can be done doesn't mean it should be ;-) What is the
> > > real objection here ?
> > >
> > > Is it that you want simpler syntax? I guess not from your comment about
> > > XTM
> >
> > I want a more self-consistent model I think. Why not represent SubjectAddress
> > and SubjectIndicators as occurrences too? Why have additional constructs in
> > the model if occurrence can do that? Why make the model more complicated than
> > it needs to be?
> >
> 
> Because occurrences are not used to establish identity and there is no
> mechanism in topic maps for saying "this property establishes identity"
> and "this property does not".

But why can't that be simply added to the model?  

> That's one of the powers of the Topic Maps
> Reference Model, but it's not part of the Topic Maps Data Model.

Not yet ;-)

[...]

> > So, why is the model more complex than it needs to be?
> >
> 
> Because if you don't you end up with RDF. Sorry, that's a flippant
> answer. 

Please keep RDF out of here....it is really so different and has nothing
to do with what I am talking about.


> The reason is that the topic maps data model provides a certain
> degree of semantics beyond a simple graph of objects with properties. In
> my book that's a Good Thing and I also happen to think that the balance
> in the current Topic Maps Data Model is about right.

Well, you can simply get the core semantics by a core set of association types
and simplify the model by throwing out occurrence and basename. IOW, if this
can be done for class-instance and superclass-subclass (both are part of the core
semantics, yes?) why for those and not for occurrence and basename?

It would only make the model simpler. Isn't that a reasonable goal for a
standard?
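
A minimal sketch of what I mean, assuming a model whose only construct is a
typed association with role players. The type names ("occurrence",
"basename", "class-instance") and the role names are hypothetical
placeholders, not identifiers taken from the draft.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Association:
    """The only construct in this sketch: a type plus (role, player) pairs."""
    type: str
    roles: tuple


def class_instance(cls, instance):
    return Association("class-instance", (("class", cls), ("instance", instance)))


def occurrence(topic, resource):
    # An occurrence expressed through a core association type.
    return Association("occurrence", (("topic", topic), ("resource", resource)))


def base_name(topic, name):
    # A base name expressed the same way.
    return Association("basename", (("topic", topic), ("name", name)))


def players(associations, type_, match_role, match_player, wanted_role):
    """Return the players of wanted_role in every association of type_
    in which match_player plays match_role."""
    found = []
    for assoc in associations:
        roles = dict(assoc.roles)
        if assoc.type == type_ and roles.get(match_role) == match_player:
            found.append(roles[wanted_role])
    return found


topic_map = [
    class_instance("person", "jim"),
    base_name("jim", "Jim"),
    occurrence("jim", "http://example.org/jim.html"),
]

names = players(topic_map, "basename", "topic", "jim", "name")
classes = players(topic_map, "class-instance", "instance", "jim", "class")
```

Everything is queried through the same function, whether it is a name, an
occurrence or a class-instance relationship.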


Jan
> 
> Cheers,
> 
> Kal
> --
> Kal Ahmed, Techquila
> Standards-based Information Management
> e: kal@techquila.com
> w: www.techquila.com
> p: +44 7968 529531

-- 
Jan Algermissen                           http://www.topicmapping.com
Consultant & Programmer	                  http://www.gooseworks.org