Use and abuse of occurrence RE: [topicmapmail] AreFacetsReally
Simple After All?
Kal Ahmed
kal@techquila.com
01 Dec 2003 21:07:30 +0000
On Mon, 2003-12-01 at 19:23, Jan Algermissen wrote:
> Kal Ahmed wrote:
>
> > > Well...I seek the opinions of end-users.
> > >
> >
> > Hmm, but I am an end user. I just happen to develop software too.
>
> Would you mind changing your software to handle an extra element if
> you happened to be persuaded by me....?
>
>
If you persuaded the ISO committee to change ISO 13250-3 or if you
came up with the new killer syntax that could be mapped to ISO 13250-2,
then yes.
>
> > > > The subject address and subject indicators do not need typing,
> > >
> > > Huh? Sure those properties have a type - how else could one make
> > > sense of the values?
> > >
> >
> > I am trying to keep things clear. Obviously failing ;-) There are
> > properties that express the subject address and subject indicators of a
> > topic. But the values of those properties are not typed.
>
> So, what data type does your software use to represent them? Pointer to void?
> java.lang.Object?
>
> Sure they have a type: set of locators....that's even in the SAM draft.
>
Now it's my turn to say Huh? When you say "type", what do you mean, the
class of the allowed value, the [type] property of allowed values, the
property itself, or something else ?
> > The
> > [occurrences] property of a topic consists of a sequence of typed
> > values.
>
> And? What is your point here? (the values are occurrence items, yes?)
>
>
And occurrences have [type] properties.
>
> >
> > > > they do
> > > > not need scoping. But given the topic "Jim" and the string "34", how do
> > > > I establish the age property without a type.
> > >
> > > I am not saying that property values do not need a property type! All I am
> > > saying is: lower the required overhead.
> > >
> >
> > OK, so you concede that typing is not an overhead.
> >
> > > > How do I establish that my
> > > > statement is only valid in the context of "my best guess from looking at
> > > > Jim" ? I would need scope to do that.
> > >
> > > Is that the usual use-case for simple properties????
> > >
> >
> > Why do you want to prevent this use case ?
>
> I don't want to prevent it, I just don't want it to drive the design decisions of
> the model.
>
So if the model is designed not to support scope, what do I do ?
> >
> > > If you need the overhead: reify the property-value and use an assertion, what is
> > > the problem?
> > >
> >
> > What is the problem with scope being optional ?
>
> You need an object (in the sense of some structure) to attach the information to.
>
Only if the scope property has a value, otherwise you could assume it
was the empty set. It's not a big coding overhead as optimisations go, is
it ?
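
A minimal sketch of that default in Python, assuming an in-memory engine. The names `Occurrence`, `occ_type` and `EMPTY_SCOPE` are hypothetical and not taken from any real topic map engine. Every unscoped occurrence points at one shared empty set, so no scope structure is built unless a scope is actually given.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional

# One shared empty set stands in for "no scope" (hypothetical name).
EMPTY_SCOPE: FrozenSet[str] = frozenset()


@dataclass(frozen=True)
class Occurrence:
    """Illustrative occurrence: a value plus optional type and scope."""
    value: str
    occ_type: Optional[str] = None
    scope: FrozenSet[str] = EMPTY_SCOPE  # default reuses the shared object


# Unscoped: nothing extra is allocated for the scope.
age = Occurrence("34", occ_type="age")

# Scoped: only this occurrence carries its own scope set.
guess = Occurrence("34", occ_type="age", scope=frozenset({"my-best-guess"}))
```
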
> If you have simple properties, all you need is the value type of the property and
> the values (like an RDBMS stores the values of a given attribute,property,column
> (or whatever you call it)). Can you imagine using extra objects to store all those
> values? That's exactly what occurrences require.
>
Optimise.
>
> >
> > > >
> > > > Both type and scope are optional.
> > >
> > > I am not only talking about type and scope! You still need more complexity
> > > to store an occurrence than to store a value.
> > >
> >
> > An occurrence consists of a value (string or locator), a type, and a
> > scope. That's it. Where is this additional complexity?
>
> suppose I store the value (say the integer 45) of the property "age", all it
> takes is storing the value, 4 bytes maybe.
>
Optimise
> The additional complexity is the whole object that glues the occurrence stuff
> together.
>
Optimise
>
> > > > What would a property facility provide that is not provided by
> > > > occurrences ?
> > >
> > >
> > > * less overhead for topic map authors
> >
> > In what sense? Syntactically ? I thought we agreed not to discuss the
> > XTM syntax and its shortcomings. If you mean that occurrences carry
> > semantic baggage that is overhead for topic map authors, perhaps that is
> > true to some extent. But I don't think that learning "occurrence = type,
> > scope and value" is a lot of "overhead".
>
> It surely needs more space in a given document (if we consider the syntax)
> but it also increases the storage size (see above).
>
Storage size in the database created from parsing the syntax ? In which
case, see above ;-)
As for the size of the XML document, an XTM occurrence element's
instanceOf and scope children are both optional. I don't see how you can
get any more compact.
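
To illustrate the point, here is a rough Python sketch that reads both the bare and the fully decorated form. It is deliberately simplified: real XTM 1.0 documents use namespaced elements and `xlink:href` rather than a plain `href`, and `read_occurrence` is a made-up helper, not part of any toolkit.

```python
import xml.etree.ElementTree as ET

# Bare occurrence: no instanceOf, no scope (namespaces omitted for brevity).
MINIMAL = "<occurrence><resourceData>34</resourceData></occurrence>"

# Same occurrence with the two optional children present.
FULL = """
<occurrence>
  <instanceOf><topicRef href="#age"/></instanceOf>
  <scope><topicRef href="#best-guess"/></scope>
  <resourceData>34</resourceData>
</occurrence>
"""


def read_occurrence(xml_text):
    """Return value, type and scope; absent children become None / empty set."""
    elem = ET.fromstring(xml_text)
    type_ref = elem.find("instanceOf/topicRef")
    scope_refs = elem.findall("scope/topicRef")
    return {
        "value": elem.findtext("resourceData"),
        "type": type_ref.get("href") if type_ref is not None else None,
        "scope": frozenset(ref.get("href") for ref in scope_refs),
    }
```

Both forms come back in the same shape, so the optional children cost nothing in the document when they are left out.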
> >
> > > * more space and time efficient implementations
> > >
> >
> > Surely that is an implementation consideration. You could do that now
> > with occurrences. Perhaps setting aside a special table for occurrences
> > of a specific type
>
> Ah....so I need to specialize my storage for certain kinds of occurrences?
> For example, if I want to store ages, I could tell the implementation to
> not create all the occurrences but just store the values?
>
>
> Really, why does it have to be so complicated? I don't get it.
>
>
I don't see the complexity. You want to optimise so JFDI.
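
For what it's worth, one way that optimisation might look, sketched in Python. `TypedColumn` is a hypothetical class and the layout is an assumption, not how any particular engine does it. Values for one occurrence type sit in a packed integer array, and the occurrence view (value, type, scope) is only built when somebody asks for it.

```python
from array import array


class TypedColumn:
    """Hypothetical store for unscoped integer occurrences of a single type."""

    def __init__(self, occurrence_type):
        self.occurrence_type = occurrence_type
        self._rows = {}            # topic id -> index into the packed array
        self._values = array("i")  # packed C ints, one slot per topic

    def set(self, topic_id, value):
        """Store or overwrite the value for a topic."""
        row = self._rows.get(topic_id)
        if row is None:
            self._rows[topic_id] = len(self._values)
            self._values.append(value)
        else:
            self._values[row] = value

    def occurrence(self, topic_id):
        """Materialise the full occurrence view on demand."""
        value = self._values[self._rows[topic_id]]
        return {
            "value": str(value),
            "type": self.occurrence_type,
            "scope": frozenset(),  # unscoped by construction in this sketch
        }


ages = TypedColumn("age")
ages.set("jim", 45)
```

The model still sees an ordinary occurrence; only the storage behind it is specialised.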
>
> > or of a specific meta-type. Perhaps restricting such
> > occurrences to contain only string values. Perhaps with some other funky
> > application-specific optimisations. No one is stopping you. No one is
> > stopping you from creating your own syntax either. But I still fail to
> > see the need to change the Topic Maps Data Model.
>
> The drafted topic maps data model is a bit like an entity relationship
> model that has 4 or 5 predefined attributes...
>
> The current draft says:
>
> "Occurrences are essentially a specialized kind of association,..." (5.7)
> "Essentially, a base name is a specialized kind of occurrence,..." (5.5)
>
> Then you say it is ok and common practice to use occurrences to represent
> properties.
>
> In addition there are specialized properties in the model (subjectAddress,
> SubjectIndicators, SourceLocators) that are *NOT* represented as occurrences.
>
>
> Why are all the specialisations in place? Why not use associations for
> everything since all the other stuff is *essentially* just an association?
>
> I see absolutely no reason for all the specialized items. Can you tell me
> (and hopefully convince me) why they are there?
>
I don't think I can convince you because I don't think that you would
agree with my world-view, and to explain my world view and discuss that
with you would take more typing than I care to commit to without a book
deal ;-). I'll try and do it in a paragraph or two, but I'll probably
miss something out. I look at it like this:
There is a cost to explaining the topic maps model. It is not like RDF
where you say "it's just a graph" and hope that your tutee knows what a
graph is (I've sat in more than one RDF presentation that takes this
approach and for CS graduates it usually works...). In explaining the
topic maps data model you need to explain topics, occurrences and
associations. I often draw a diagram that I first saw in a presentation
by Steve Pepper (I think) with topics in a "topic space", and resources
being mapped in a "resource space" where only occurrences cross the line
between the two. That makes sense to people. There are things that
connect topics to resources, and things that connect topics to topics.
One is not merely syntactic sugar for the other. I don't agree with the
"X is essentially just a form of Y", because basenames are special,
occurrences are special and they are special because they have unique
meaning in the topic map data model.
Similarly there are special properties which confer identity on topics.
Occurrences do not confer identity on topics. Nor do associations. Only
subject addresses and subject indicators do that (and I hope you are not
going to ask me to conflate those).
Reductionism only gets you so far. It gets you to a graph and then you
have to do the "Oh, but when you have that kind of node, you do this"
dance. It's a hell of a lot easier to my mind to say "Occurrences connect
topics to resources, associations connect topics to topics" than to say
"look at all your connected nodes: if it's one of these then it means
this, if it's one of those then it means the other". Sure, it might make
sense from an implementation point of view (though it's not an approach I
chose to take) - but it's hellish difficult to explain.
>
> >
> > >
> > > Not enough?
> > >
> > >
> > > > AFAIK RDF statements *are* resources and *do* have identity, so the
> > > > assignment of "34" to "Jim" is something that can be identified in the
> > > > RDF model, just as an occurrence of the topic "Jim" is something that
> > > > can be identified in the XTM model, so I don't see the divide that you
> > > > imply here.
> > >
> > > Seriously: if you have 1 million subjects that all have age, height, weight, etc.
> > > you don't care to create an occurrence for each of these properties?
> > >
> >
> > No, I don't mind doing that at all. Why should I ? I can still optimise
> > my engine (or as an end user choose the engine that is optimised) to
> > handle this case.
>
> So you'd need an optimized engine for the most general case? Sounds
> weird, really!
>
You say general case, I say your special application. So what. Do the
optimisation. It's really not that hard.
>
> > I am not prevented from doing so by the Topic Maps
> > Data Model.
> >
> > > I don't get that. Is really everyone comfortable with this?
> > >
> >
> > I'm speaking only for myself. Not for anyone else.
>
> Sure, but I wonder if there is anyone (besides TM experts) who
> thinks that I am right or that I am wrong or whatever.
>
It would be interesting to find out. It could be that I'm in a minority
here. ;-)
>
> >
> > >
> > > >
> > > > >
> > > > > * Introducing a new element does not harm existing XTM documents, they
> > > > > can easily be transformed into instances of the new DTD.
> > > > >
> > > >
> > > > Just because it can be done doesn't mean it should be ;-) What is the
> > > > real objection here ?
> > > >
> > > > Is it that you want simpler syntax? I guess not from your comment about
> > > > XTM
> > >
> > > I want a more self-consistent model I think. Why not represent SubjectAddress
> > > and SubjectIndicators as occurrences too? Why have additional constructs in
> > > the model if occurrence can do that? Why make the model more complicated than
> > > it needs to be?
> > >
> >
> > Because occurrences are not used to establish identity and there is no
> > mechanism in topic maps for saying "this property establishes identity"
> > and "this property does not".
>
> But why can't that be simply added to the model?
>
I guess that it could, but I don't see the reason for doing it. I find
that I have all the identity-conferring properties that I need for the
general case, and the standard does not prevent me from implementing
specialised merging rules if I need them and if I am prepared to accept
the portability hit.
> > That's one of the powers of the Topic Maps
> > Reference Model, but it's not part of the Topic Maps Data Model.
>
> Not yet ;-)
>
Good luck ! :-)
> [...]
>
> > > So, why is the model more complex than it needs to be?
> > >
> >
> > Because if you don't you end up with RDF. Sorry, that's a flippant
> > answer.
>
> Please keep RDF out of here....it is really so different and has nothing
> to do with what I am talking about.
>
"You started it" ;-) You said (I paraphrase) "Why does topic maps make
it so hard to encode my RDF DC properties". I thought that was at least
part of your problem with occurrences.
>
> > The reason is that the topic maps data model provides a certain
> > degree of semantics beyond a simple graph of objects with properties. In
> > my book that's a Good Thing and I also happen to think that the balance
> > in the current Topic Maps Data Model is about right.
>
> Well, you can simply get the core semantics by a core set of association types
> and simplify the model by throwing out occurrence and basename. IOW, if this
> can be done for class-instance and superclass-subclass (both are part of the core
> semantics, yes?) why for those and not for occurrence and basename?
>
Class-instance has specific properties in 13250-2. Superclass-subclass
is not something defined by 13250-2, so it was added later as a set of
PSIs. That's the way to do this sort of thing.
> It would only make the model simpler. Isn't that a reasonable goal for a
> standard?
>
No. The goals of a standard are adoption and interoperability. I don't
think that the changes you propose simplify the model in a way which
drives towards either of those goals.
Cheers,
Kal
--
Kal Ahmed, Techquila
Standards-based Information Management
e: kal@techquila.com
w: www.techquila.com
p: +44 7968 529531