Use and abuse of occurrence RE: [topicmapmail] AreFacetsReallySimple
After All?
Jan Algermissen
algermissen@acm.org
Mon, 01 Dec 2003 22:59:23 +0100
Kal Ahmed wrote:
> > > I am trying to keep things clear. Obviously failing ;-) There are
> > > properties that express the subject address and subject indicators of a
> > > topic. But the values of those properties are not typed.
> >
> > So, what data type does your software use to represent them? Pointer to void?
> > java.lang.Object?
> >
> > Sure they have a type: set of locators....that's even in the SAM draft.
> >
>
> Now it's my turn to say Huh? When you say "type", what do you mean, the
> class of the allowed value,
Sure. You wrote: "But the values of those properties are not typed".
And I answered: They *are* typed.
>
> > > Why do you want to prevent this use case ?
> >
> > I don't want to prevent it, I just don't want it to drive the design decisions of
> > the model.
> >
>
> So if the model is designed not to support scope, what do I do ?
Use associations.
IOW: If you regard the relationship between a subject and a certain property as
a subject in its own right, then make that relationship an association, otherwise
don't.
This is absolutely equivalent to entity-relationship modeling: If a relationship
needs attributes of its own, make that relationship class a table, otherwise make
it a column.
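
A minimal sketch of that rule of thumb, in Python. The names used here (Topic, Association, "born-in", "biography", "person-x", "place-y") are made up for illustration and are not taken from the SAM draft or from any topic map engine:

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    name: str
    # "Column" case: the relationship to the value is not a subject,
    # so the value is stored directly under the property name.
    properties: dict = field(default_factory=dict)


@dataclass(frozen=True)
class Association:
    # "Table" case: the relationship is a subject in its own right,
    # so it becomes an object that can carry a type and a scope.
    type: str
    roles: tuple                      # ((role_name, topic_name), ...)
    scope: frozenset = frozenset()


person = Topic("person-x")
person.properties["age"] = 45         # nothing to say about this relationship

born_in = Association(
    type="born-in",
    roles=(("person", "person-x"), ("place", "place-y")),
    scope=frozenset({"biography"}),   # the relationship needs attributes
)
```

The age is just a value hanging off the topic; only the born-in relationship pays for an object of its own, because only there is something to be said about the relationship itself.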
> > >
> > > > If you need the overhead: reify the property-value and use an assertion, what is
> > > > the problem?
> > > >
> > >
> > > What is the problem with scope being optional ?
> >
> > You need an object (in the sense of some structure) to attach the information to.
> >
>
> Only if the scope property has a value, otherwise you could assume it
> was the empty set.
What about the unconstrained scope? Did that go away? No scope is something other
than the unconstrained scope, or?
> It's not a big coding overhead as optimisations go, is
> it ?
Suppose you have a property whose values represent latitude/longitude information
about a subject. How would you provide an R-Tree index on this property if the
values are to be stored as occurrences (that happen to be of type string according
to the data model)?
When using a property/value storage mechanism the index creation can be driven by the
value type of the property. How is this supposed to work with the topic maps data
model?
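
To illustrate what I mean by index creation driven by the value type, here is a rough Python sketch. Everything in it is an assumption made for the example: the PropertyStore class, the type names "integer" and "geo-point", and the subjects "site-a" and "site-b". The spatial index is a simple grid of 1-degree cells standing in for an R-Tree, which the Python standard library does not provide:

```python
import bisect
import math
from collections import defaultdict
from itertools import takewhile


class SortedIndex:
    """Ordered index for scalar values; supports range queries."""

    def __init__(self):
        self._entries = []                       # sorted (value, subject) pairs

    def add(self, subject, value):
        bisect.insort(self._entries, (value, subject))

    def between(self, lo, hi):
        start = bisect.bisect_left(self._entries, (lo,))
        tail = takewhile(lambda e: e[0] <= hi, self._entries[start:])
        return [subject for _, subject in tail]


class GridIndex:
    """Coarse spatial index over (lat, lon) points; a stand-in for an R-Tree."""

    def __init__(self):
        self._cells = defaultdict(list)

    def add(self, subject, value):
        lat, lon = value
        self._cells[(math.floor(lat), math.floor(lon))].append((lat, lon, subject))

    def within(self, lat_lo, lat_hi, lon_lo, lon_hi):
        hits = []
        for i in range(math.floor(lat_lo), math.floor(lat_hi) + 1):
            for j in range(math.floor(lon_lo), math.floor(lon_hi) + 1):
                for lat, lon, subject in self._cells.get((i, j), ()):
                    if lat_lo <= lat <= lat_hi and lon_lo <= lon <= lon_hi:
                        hits.append(subject)
        return hits


# Hypothetical mapping from declared value type to index implementation.
INDEX_FOR_TYPE = {"integer": SortedIndex, "geo-point": GridIndex}


class PropertyStore:
    def __init__(self):
        self._indexes = {}

    def declare(self, prop, value_type):
        # The declared value type alone decides which index gets built.
        self._indexes[prop] = INDEX_FOR_TYPE[value_type]()

    def put(self, subject, prop, value):
        self._indexes[prop].add(subject, value)

    def index(self, prop):
        return self._indexes[prop]


store = PropertyStore()
store.declare("position", "geo-point")
store.declare("age", "integer")
store.put("site-a", "position", (59.91, 10.75))
store.put("site-b", "position", (50.94, 6.96))
store.put("person-x", "age", 45)

nearby = store.index("position").within(59.0, 61.0, 10.0, 11.0)
```

The store never has to inspect the values to guess what they are. If the same coordinates arrive as occurrence strings, that information has to come from somewhere outside the data model.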
>
> > If you have simple properties, all you need is the value type of the property and
> > the values (like an RDBMS stores the values of a given attribute,property,column
> > (or whatever you call it)). Can you imagine using extra objects to store all those
> > values? That's exactly what occurrences require.
> >
>
> Optimise.
For all occurrences or again depending on the particular use case?
>
> >
> > >
> > > > >
> > > > > Both type and scope are optional.
> > > >
> > > > I am not only talking about type and scope! You still need more complexity
> > > > to store an occurrence than to store a value.
> > > >
> > >
> > > An occurrence consists of a value (string or locator), a type, and a
> > > scope. That's it. Where is this additional complexity?
> >
> > suppose I store the value (say the integer 45) of the property "age", all it
> > takes is storing the value, 4 bytes maybe.
> >
> Optimise
>
> > The additional complexity is the whole object that glues the occurrence stuff
> > together.
> >
>
> Optimise
Sorry Kal, that is not very persuasive....
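
To put rough numbers on the "age" example, here is a small Python sketch. The byte layout of the occurrence record (type id, scope count, value length, scope ids, string value) is invented purely for illustration and is not how any particular engine stores occurrences; the type id 7 is likewise arbitrary:

```python
import struct


def pack_value(age: int) -> bytes:
    """Store the bare value as a 4-byte little-endian integer."""
    return struct.pack("<i", age)


def pack_occurrence(value: str, type_id: int, scope_ids: list) -> bytes:
    """Store value, type and scope together (hypothetical layout)."""
    data = value.encode("utf-8")
    header = struct.pack("<iii", type_id, len(scope_ids), len(data))
    scope = struct.pack(f"<{len(scope_ids)}i", *scope_ids)
    return header + scope + data


plain = pack_value(45)
occurrence = pack_occurrence("45", type_id=7, scope_ids=[])

print(len(plain))        # 4
print(len(occurrence))   # 14
```

Even with an empty scope the record carries its type reference, its scope count and a length-prefixed string. A smarter layout would shave some of that off, but then you are back to special-casing the storage per kind of occurrence.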
>
> >
> > > > > What would a property facility provide that is not provided by
> > > > > occurrences ?
> > > >
> > > >
> > > > * less overhead for topic map authors
> > >
> > > In what sense? Syntactically ? I thought we agreed not to discuss the
> > > XTM syntax and its shortcomings. If you mean that occurrences carry
> > > semantic baggage that is overhead for topic map authors, perhaps that is
> > > true to some extent. But I don't think that learning "occurrence = type,
> > > scope and value" is a lot of "overhead".
> >
> > It surely needs more space in a given document (if we consider the syntax)
> > but it also increases the storage size (see above).
> >
>
> Storage size in the database created from parsing the syntax ? In which
> case, see above ;-)
>
> As for the size of the XML document, an XTM occurrence element's
> instanceOf and scope children are both optional. I don't see how you can
> get any more compact.
>
> > >
> > > > * more space and time efficient implementations
> > > >
> > >
> > > Surely that is an implementation consideration. You could do that now
> > > with occurrences. Perhaps setting aside a special table for occurrences
> > > of a specific type
> >
> > Ah....so I need to specialize my storage for certain kinds of occurrences?
> > For example, if I want to store ages, I could tell the implementation to
> > not create all the occurrences but just store the values?
> >
> >
> > Really, why does it have to be so complicated? I don't get it.
> >
> >
>
> I don't see the complexity. You want to optimise so JFDI.
>
> >
> > > or of a specific meta-type. Perhaps restricting such
> > > occurrences to contain only string values. Perhaps with some other funky
> > > application-specific optimisations. No one is stopping you. No one is
> > > stopping you from creating your own syntax either. But I still fail to
> > > see the need to change the Topic Maps Data Model.
> >
> > The drafted topic maps data model is a bit like an entity relationship
> > model that has 4 or 5 predefined attributes...
> >
> > The current draft says:
> >
> > "Occurrences are essentially a specialized kind of association,..." (5.7)
> > "Essentially, a base name is a specialized kind of occurrence,..." (5.5)
> >
> > Then you say it is ok and common practice to use occurrences to represent
> > properties.
> >
> > In addition there are specialized properties in the model (subjectAddress,
> > SubjectIndicators, SourceLocators) that are *NOT* represented as occurrences.
> >
> >
> > Why are all the specialisations in place? Why not use associations for
> > everything since all the other stuff is *essentially* just an association?
> >
> > I see absolutely no reason for all the specialized items. Can you tell me
> > (and hopefully convince me) why they are there?
> >
>
> I don't think I can convince you because I don't think that you would
> agree with my world-view, and to explain my world view and discuss that
> with you would take more typing than I care to commit to without a book
> deal ;-). I'll try and do it in a paragraph or two, but I'll probably
> miss something out. I look at it like this:
>
> There is a cost to explaining the topic maps model. It is not like RDF
> where you say "it's just a graph" and hope that your tutee knows what a
> graph is (I've sat in more than one RDF presentation that takes this
> approach and for CS graduates it usually works...). In explaining the
> topic maps data model you need to explain topics, occurrences and
> associations. I often draw a diagram that I first saw in a presentation
> by Steve Pepper (I think) with topics in a "topic space", and resources
> being mapped in a "resource space" where only occurrences cross the line
> between the two. That makes sense to people. There are things that
> connect topics to resources, and things that connect topics to topics.
> One is not merely syntactic sugar for the other. I don't agree with the
> "X is essentially just a form of Y", because basenames are special,
> occurrences are special and they are special because they have unique
> meaning in the topic map data model.
>
> Similarly there are special properties which confer identity on topics.
> Occurrences do not confer identity on topics. Nor do associations. Only
> subject addresses and subject indicators do that (and I hope you are not
> going to ask me to conflate those).
>
> Reductionism only gets you so far. It gets you to a graph and then you
> have to do the "Oh, but when you have that kind of node, you do this"
> dance. It's a hell of a lot easier to my mind to say "Occurrences connect
> topics to resources, associations connect topics to topics" than to say
> "look at all your connected nodes, if it's one of these then it means
> this, if it's one of those then it means the other". Sure it might make
> sense from an implementation point of view (though it's not an approach I
> chose to take) - but it's hellish difficult to explain.
Hmm, if the goal of the data model is to be easy to explain then I understand you,
that's just not my goal...
>
> > Please keep RDF out of here....it is really so different and has nothing
> > to do with what I am talking about.
> >
>
> "You started it" ;-) You said (I paraphrase) "Why does topic maps make
> it so hard to encode my RDF DC properties". I thought that was at least
> part of your problem with occurrences.
I was talking about storing Dublin Core properties in a topic map, that they
may come in RDF was a side issue - probably not clear, sorry.
Jan
--
Jan Algermissen http://www.topicmapping.com
Consultant & Programmer http://www.gooseworks.org