[topicmapmail] Re: Use and abuse of occurrence RE: [topicmapmail] AreFacetsReallySimple After All?

Kal Ahmed kal@techquila.com
01 Dec 2003 22:15:09 +0000


On Mon, 2003-12-01 at 21:59, Jan Algermissen wrote:
> Kal Ahmed wrote:
> 
> > > > I am trying to keep things clear. Obviously failing ;-) There are
> > > > properties that express the subject address and subject indicators of a
> > > > topic. But the values of those properties are not typed.
> > >
> > > So, what data type does your software use to represent them? Pointer to void?
> > > java.lang.Object?
> > >
> > > Sure they have a type: set of locators....that's even in the SAM draft.
> > >
> > 
> > Now it's my turn to say Huh? When you say "type", what do you mean, the
> > class of the allowed value, 
> 
> Sure. You wrote: "But the values of those properties are not typed".
> 
> And I answered: They *are* typed.
> 

And I didn't and don't understand what you mean by typed.

> 
> >
> > > > Why do you want to prevent this use case ?
> > >
> > > I don't want to prevent it, I just don't want it to drive the design decisions of
> > > the model.
> > >
> > 
> > So if the model is designed not to support scope, what do I do ?
> 
> Use associations.
> 
> IOW: If you regard the relationship between a subject and a certain property as
> a subject in its own right, then make that relationship an association, otherwise
> don't.
> 
> This is absolutely equivalent to entity-relationship modeling: If a relationship
> needs attributes of its own, make that relationship class a table, otherwise make
> it a column.
> 

So your property is now just a special case of an association ? Then
what is your point ?
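
For reference, the entity-relationship analogy quoted above can be sketched as follows. This is only an illustration using Python's sqlite3 module; the table and column names (person, company, employment, since) are hypothetical and come from neither proposal.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# A relationship with no attributes of its own is just a column ("age").
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
conn.execute("CREATE TABLE company (id INTEGER PRIMARY KEY, name TEXT)")

# A relationship that needs attributes of its own ("since") gets its own table,
# much as a property treated as a subject in its own right becomes an association.
conn.execute(
    "CREATE TABLE employment ("
    " person_id INTEGER REFERENCES person(id),"
    " company_id INTEGER REFERENCES company(id),"
    " since INTEGER)"
)

conn.execute("INSERT INTO person VALUES (1, 'Alice', 45)")
conn.execute("INSERT INTO company VALUES (1, 'Example Ltd')")
conn.execute("INSERT INTO employment VALUES (1, 1, 2001)")

age = conn.execute("SELECT age FROM person WHERE id = 1").fetchone()[0]
since = conn.execute(
    "SELECT since FROM employment WHERE person_id = 1 AND company_id = 1"
).fetchone()[0]
```

In this reading the column and the table are two encodings of the same kind of thing, which is what prompts the question above.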


> > > >
> > > > > If you need the overhead: reify the property-value and use an assertion, what is
> > > > > the problem?
> > > > >
> > > >
> > > > What is the problem with scope being optional ?
> > >
> > > You need an object (in the sense of some structure) to attach the information to.
> > >
> > 
> > Only if the scope property has a value, otherwise you could assume it
> > was the empty set. 
> 
> What about the unconstrained scope? Did that go away? No scope is something else
> than the unconstrained scope, isn't it?
> 

You encode your optimisation as you see fit. I would say that no scope
== unconstrained scope as that tends to be more common than "the scope
of no themes".
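
A minimal sketch of that encoding, assuming an occurrence is held as a small Python object. The class, field and method names are hypothetical, and the rule that a scoped occurrence applies when all of its themes are in the context is an assumption made for the example, not something this thread settles.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Occurrence:
    value: str                               # string or locator
    occ_type: Optional[str] = None           # optional
    scope: Optional[FrozenSet[str]] = None   # None means no scope was stored

    def is_unconstrained(self) -> bool:
        # Assumption: a missing (or empty) scope is read as the unconstrained scope.
        return not self.scope

    def valid_in(self, context: FrozenSet[str]) -> bool:
        # Assumption: a scoped occurrence applies when all of its themes are
        # in the context; an unscoped one always applies.
        return self.is_unconstrained() or self.scope <= context


plain = Occurrence("45")
scoped = Occurrence("45", "age", frozenset({"en"}))
```

Nothing extra is stored for `plain`; the empty case costs one null reference.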

> 
> > It's not a big coding overhead as optimisations go, is
> > it ?
> 
> Suppose you have a property whose values represent latitude/longitude information
> about a subject. How would you provide an R-Tree index in this property if the
> values are to be stored as occurrences (that happen to be of type string according
> to the data model)?
> 
> When using a property/value storage mechanism the index creation can be driven by the
> value type of the property. How is this supposed to work with the topic maps data
> model?  
> 

Since when has a special data type index been a modelling issue ? And
how does your proposal help ?
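
To illustrate why this looks like an implementation matter, here is a rough sketch in which an engine picks a parser and builds an index based on the occurrence type, even though the values are stored as strings. Everything here is hypothetical: the "lat-long" type name, the "lat,lon" string format, the PARSERS registry, and the coarse grid, which is only a stand-in for a real R-tree.

```python
import math
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

Point = Tuple[float, float]


def parse_lat_long(text: str) -> Point:
    # Assumed string format: "latitude,longitude" in decimal degrees.
    lat, lon = (float(part) for part in text.split(","))
    return lat, lon


# Hypothetical registry: occurrence type -> parser for its string value.
PARSERS: Dict[str, Callable[[str], Point]] = {"lat-long": parse_lat_long}


class GridIndex:
    """Coarse grid of 1-degree cells; a stand-in for a real R-tree."""

    def __init__(self) -> None:
        self.cells: Dict[Tuple[int, int], List[str]] = defaultdict(list)

    def add(self, topic_id: str, point: Point) -> None:
        cell = (math.floor(point[0]), math.floor(point[1]))
        self.cells[cell].append(topic_id)

    def in_cell(self, lat: float, lon: float) -> List[str]:
        return self.cells.get((math.floor(lat), math.floor(lon)), [])


def build_index(occurrences: Iterable[Tuple[str, str, str]], occ_type: str) -> GridIndex:
    # The occurrence type, not the storage type of the value, drives indexing.
    parser = PARSERS[occ_type]
    index = GridIndex()
    for topic_id, o_type, value in occurrences:
        if o_type == occ_type:
            index.add(topic_id, parser(value))
    return index


# (topic id, occurrence type, string value) triples; illustrative data only.
occurrences = [
    ("t1", "lat-long", "59.91,10.75"),
    ("t2", "lat-long", "51.51,-0.13"),
    ("t1", "age", "45"),
]
index = build_index(occurrences, "lat-long")
```

A lookup such as `index.in_cell(59.2, 10.1)` then only touches occurrences of the indexed type.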


> > 
> > > If you have simple properties, all you need is the value type of the property and
> > > the values (like an RDBMS stores the values of a given attribute,property,column
> > > (or whatever you call it)). Can you imagine using extra objects to store all those
> > > values? That's exactly what occurrences require.
> > >
> > 
> > Optimise.
> 
> For all occurrences or again depending on the particular use case?
> 

I'm not going to tell you how to write your software. Do what you think
works. You are not prevented from doing so.

> > 
> > >
> > > >
> > > > > >
> > > > > > Both type and scope are optional.
> > > > >
> > > > > I am not only talking about type and scope! You still need more complexity
> > > > > to store an occurrence than to store a value.
> > > > >
> > > >
> > > > An occurrence consists of a value (string or locator), a type, and a
> > > > scope. That's it. Where is this additional complexity?
> > >
> > > suppose I store the value (say the integer 45) of the property "age", all it
> > > takes is storing the value, 4 bytes maybe.
> > >
> > Optimise
> > 
> > > The additional complexity is the whole object that glues the occurrence stuff
> > > together.
> > >
> > 
> > Optimise
> 
> 
> Sorry Kal, that is not very persuading....
> 

Well that gives me little to go on to continue persuading :-)
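
For something slightly more concrete than "Optimise", here is one possible sketch of the kind of thing I mean: unscoped integer-valued occurrences of a single type are kept as bare values in a typed array, and occurrence objects are only materialised when something asks for them. The names CompactIntStore and OccurrenceView are hypothetical, and this is one option among many, not a recommendation.

```python
from array import array
from typing import FrozenSet, Iterator, NamedTuple


class OccurrenceView(NamedTuple):
    topic_id: int
    occ_type: str
    value: str
    scope: FrozenSet[str]


class CompactIntStore:
    """Holds unscoped integer occurrences of one type as bare values."""

    def __init__(self, occ_type: str) -> None:
        self.occ_type = occ_type
        self.topic_ids = array("i")   # one machine int per topic reference
        self.values = array("i")      # one machine int per value, no wrapper object

    def put(self, topic_id: int, value: int) -> None:
        self.topic_ids.append(topic_id)
        self.values.append(value)

    def occurrences(self) -> Iterator[OccurrenceView]:
        # The type, string value and (empty) scope are reconstructed on demand.
        for topic_id, value in zip(self.topic_ids, self.values):
            yield OccurrenceView(topic_id, self.occ_type, str(value), frozenset())


ages = CompactIntStore("age")
ages.put(1, 45)
views = list(ages.occurrences())
```

Seen from outside, the store still hands out occurrences with a value, a type and a scope; only the internal representation is specialised.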


> > 
> > >
> > > > > > What would a property facility provide that is not provided by
> > > > > > occurrences ?
> > > > >
> > > > >
> > > > > * less overhead for topic map authors
> > > >
> > > > In what sense? Syntactically ? I thought we agreed not to discuss the
> > > > XTM syntax and its shortcomings. If you mean that occurrences carry
> > > > semantic baggage that is overhead for topic map authors, perhaps that is
> > > > true to some extent. But I don't think that learning "occurrence = type,
> > > > scope and value" is a lot of "overhead".
> > >
> > > It surely needs more space in a given document (if we consider the syntax)
> > > but it also increases the storage size (see above).
> > >
> > 
> > Storage size in the database created from parsing the syntax ? In which
> > case, see above ;-)
> > 
> > As for the size of the XML document, an XTM occurrence element's
> > instanceOf and scope children are both optional. I don't see how you can
> > get any more compact.
> > 
> > > >
> > > > > * more space and time efficient implementations
> > > > >
> > > >
> > > > Surely that is an implementation consideration. You could do that now
> > > > with occurrences. Perhaps setting aside a special table for occurrences
> > > > of a specific type
> > >
> > > Ah....so I need to specialize my storage for certain kinds of occurrences?
> > > For example, if I want to store ages, I could tell the implementation to
> > > not create all the occurrences but just store the values?
> > >
> > >
> > > Really, why does it have to be so complicated? I don't get it.
> > >
> > >
> > 
> > I don't see the complexity. You want to optimise so JFDI.
> > 
> > >
> > > > or of a specific meta-type. Perhaps restricting such
> > > > occurrences to contain only string values. Perhaps with some other funky
> > > > application-specific optimisations. No one is stopping you. No one is
> > > > stopping you from creating your own syntax either. But I still fail to
> > > > see the need to change the Topic Maps Data Model.
> > >
> > > The drafted topic maps data model is a bit like an entity relationship
> > > model that has 4 or 5 predefined attributes...
> > >
> > > The current draft says:
> > >
> > > "Occurrences are essentially a specialized kind of association,..."  (5.7)
> > > "Essentially, a base name is a specialized kind of occurrence,..." (5.5)
> > >
> > > Then you say it is ok and common practice to use occurrences to represent
> > > properties.
> > >
> > > In addition there are specialized properties in the model (subjectAddress,
> > > SubjectIndicators, SourceLocators) that are *NOT* represented as occurrences.
> > >
> > >
> > > Why are all the specialisations in place? Why not use associations for
> > > everything since all the other stuff is *essentially* just an association?
> > >
> > > I see absolutely no reason for all the specialized items. Can you tell me
> > > (and hopefully convince me) why they are there?
> > >
> > 
> > I don't think I can convince you because I don't think that you would
> > agree with my world-view, and to explain my world view and discuss that
> > with you would take more typing than I care to commit to without a book
> > deal ;-). I'll try and do it in a paragraph or two, but I'll probably
> > miss something out. I look at it like this:
> > 
> > There is a cost to explaining the topic maps model. It is not like RDF
> > where you say "it's just a graph" and hope that your tutee knows what a
> > graph is (I've sat in more than one RDF presentation that takes this
> > approach and for CS graduates it usually works...). In explaining the
> > topic maps data model you need to explain topics, occurrences and
> > associations. I often draw a diagram that I first saw in a presentation
> > by Steve Pepper (I think) with topics in a "topic space", and resources
> > being mapped in a "resource space" where only occurrences cross the line
> > between the two. That makes sense to people. There are things that
> > connect topics to resources, and things that connect topics to topics.
> > One is not merely syntactic sugar for the other. I don't agree with the
> > "X is essentially just a form of Y", because basenames are special,
> > occurrences are special and they are special because they have unique
> > meaning in the topic map data model.
> > 
> > Similarly there are special properties which confer identity on topics.
> > Occurrences do not confer identity on topics. Nor do associations. Only
> > subject addresses and subject indicators do that (and I hope you are not
> > going to ask me to conflate those).
> > 
> > Reductionism only gets you so far. It gets you to a graph and then you
> > have to do the "Oh, but when you have that kind of node, you do this"
> > dance. It's a hell of a lot easier to my mind to say "Occurrences connect
> > topics to resources, associations connect topics to topics" than to say
> > "look at all your connected nodes: if it's one of these then it means
> > this, if it's one of those then it means the other". Sure it might make
> > sense from an implementation point of view (though it's not an approach I
> > chose to take) - but it's hellishly difficult to explain.
> 
> Hmm, if the goal of the data model is to be easy to explain then I understand you,
> that's just not my goal...
> 

Perhaps it isn't but I hope you agree that making a data model that is
easy to explain and easy to understand is not unimportant.

> >
> > > Please keep RDF out of here....it is really so different and has nothing
> > > to do with what I am talking about.
> > >
> > 
> > "You started it" ;-) You said (I paraphrase) "Why does topic maps make
> > it so hard to encode my RDF DC properties". I thought that was at least
> > part of your problem with occurrences.
> 
> I was talking about storing Dublin Core properties in a topic map, that they
> may come in RDF was a side issue - probably not clear, sorry.
> 

No problem - it's all getting a bit confused in here. I think I have made
my points:

1) You can express properties of subjects with topics
2) You can optimise engines to improve efficiency of storage if that is
your concern
3) You can create a custom syntax with a mapping to 13250-2 if file size
is your concern.
4) You can express ISO 13250 facets with topics and occurrences.
5) I think the balance between the cognitive overhead imposed by a
larger number of object classes and the cognitive overhead imposed by
trying to mentally parse an abstract graph-based model is right in XTM /
ISO 13250:2002

I suspect that it is really only (5) that we fundamentally disagree on.

Cheers,

Kal
-- 
Kal Ahmed, Techquila
Standards-based Information Management
e: kal@techquila.com
w: www.techquila.com
p: +44 7968 529531