[topicmapmail] How do you deal with (lack of) association templates ?

Lars Marius Garshol larsga@garshol.priv.no
12 Jun 2002 11:41:17 +0200


* Bernard Vatant
| 
| I was sure you would be the first one to bite :))

Yeah, I'm sure you put bait on the hook especially suited to my
tastes. :-)

Wish I knew how to make you bite on the "scope is union"/"scope is
intersection" (term-scope-def) issue...
 
| I was not clear, maybe, just to figure out what everyone was
| thinking about when speaking of templates.  Let me be more precise:
| what I mean by that is the pattern of roles.
| 
| For example, an "employment" template would be (whatever syntax):
| 
| template (employment) = [employer (1), employee (1+)]
| 
| That means that an association with type "employment" has exactly
| one topic playing the "employer" role and one or more topics playing
| the "employee" role.

Right. Then I understand what you mean.
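
As an illustration only, a role pattern like the one quoted above
could be encoded and checked roughly as follows in Python. The
dictionary layout and the names EMPLOYMENT_TEMPLATE and conforms are
assumptions made for this sketch; they are not taken from any topic
map standard or tool.

```python
from collections import Counter

# Hypothetical encoding of the quoted template:
# role type -> (minimum, maximum) number of players; None means unbounded.
EMPLOYMENT_TEMPLATE = {"employer": (1, 1), "employee": (1, None)}


def conforms(roles, template):
    """Check one association against a role pattern.

    roles is a list of (role_type, player) pairs for a single association.
    """
    counts = Counter(role_type for role_type, _ in roles)
    if set(counts) - set(template):
        return False  # a role type the template does not mention
    for role_type, (lowest, highest) in template.items():
        n = counts.get(role_type, 0)
        if n < lowest or (highest is not None and n > highest):
            return False
    return True
```

Note that such a check only constrains the roles inside one
association. It says nothing about how many "employment" associations
a given employer may take part in.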
 
| The use of identifying such a template would be to be able to merge
| e.g.
| 
| -- employment : employer (X), employee(Y)
| -- employment : employer (X), employee(W, Z)
| 
| with X having the same subjectID in the two associations
| into the following
| 
| -- employment : employer (X), employee(W, Y, Z)

Well, the template you gave above doesn't give you enough information
for you to be able to know that that is safe. You have to know that
there will only ever be one "employment" association per "employer",
and the template you gave doesn't tell you that. So doing such merging
wouldn't be safe in this case, because for all the application knows
it might well be that separation of the employees into two groups has
some semantic significance.
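
A minimal sketch of the point, in Python: the merge by employer is
only performed when the caller explicitly asserts that there is at
most one "employment" association per employer. The function name
merge_by_employer and the one_per_employer flag are hypothetical; the
flag stands in for schema information that the template alone does
not provide.

```python
def merge_by_employer(associations, one_per_employer=False):
    """Fold employment associations that share an employer.

    associations is a list of (employer, frozenset_of_employees) pairs.
    Without the extra assertion the input is returned unchanged, because
    the grouping of employees might carry meaning.
    """
    if not one_per_employer:
        return list(associations)
    merged = {}
    for employer, employees in associations:
        merged[employer] = merged.get(employer, frozenset()) | employees
    return list(merged.items())


pair = [("X", frozenset({"Y"})), ("X", frozenset({"W", "Z"}))]
unchanged = merge_by_employer(pair)
folded = merge_by_employer(pair, one_per_employer=True)
```

Here unchanged still holds two associations, while folded holds a
single one for X with employees W, Y and Z.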
 
| Which is very natural. You welcome new members in your association,
| you don't create a new one.

I disagree completely. Binary associations are easier to work with,
easier to optimize, and if you use them then this isn't a problem at
all. It seems clear to me that each employee has a separate employment
relationship to the employer, so there's no loss of information or
abuse of semantics in doing this with binary associations. In those
cases I very much prefer using binary associations, because it saves a
whole lot of trouble.
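
To sketch why the problem goes away with binary associations: if
each association is modelled as one (type, employer, employee)
triple, adding employees one by one is just adding to a set, and a
repeated assertion collapses by ordinary equality. The triple layout
and the helper to_binary are assumptions made for this illustration.

```python
def to_binary(employer, employees):
    """Return one binary employment association per employee."""
    return {("employment", employer, employee) for employee in employees}


store = set()
store |= to_binary("X", ["Y"])
store |= to_binary("X", ["W", "Z"])
store |= to_binary("X", ["Y"])  # asserted a second time, adds nothing
```

After these three updates the store holds exactly three associations,
one per employee of X, with no extra merging rule needed.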
 
* Lars Marius Garshol
|
| The two associations are not equal, since the last role player is
| different (assuming the two X-es share something that makes them
| merge), so there will be two associations.
 
* Bernard Vatant
|
| That's what is annoying. So if you have 50 employees for the same
| employer, gathered one by one in a workflow, you'll get 50
| associations ... too bad.

Well, it has to be that way, as long as the application doesn't have
enough information to know that it is safe to do further
transformation on the merged associations. In many cases it won't be
safe. Applications that have enough information can of course do the
extra transformations themselves.
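
The behaviour discussed here can be sketched as follows, assuming
that two associations count as duplicates only when they have the
same type and the same set of (role type, player) pairs. That
assumption is a reading of the rule quoted above, not the text of the
standard, and the names association_key and merge_duplicates are
hypothetical.

```python
def association_key(assoc_type, roles):
    """Identity of an association: its type plus its unordered roles."""
    return (assoc_type, frozenset(roles))


def merge_duplicates(associations):
    """Drop associations that are equal under association_key.

    associations is a list of (assoc_type, roles) pairs, where roles is
    a list of (role_type, player) pairs. Nothing else is folded together.
    """
    seen = {}
    for assoc_type, roles in associations:
        seen.setdefault(association_key(assoc_type, roles), (assoc_type, roles))
    return list(seen.values())


# Fifty employees gathered one by one for the same employer X.
workflow = [
    ("employment", [("employer", "X"), ("employee", f"E{i}")])
    for i in range(50)
]
result = merge_duplicates(workflow)
```

Since every association has a different employee, result still has 50
entries. Only an exact repeat, for instance the same two roles listed
in a different order, would be dropped.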

I guess this raises the issue of the possible impact of schema
information on merging rules. The SAM needs to be specified in a way
that allows this to be done in a standardized way using TMCL, once
that is ready. (I've now added this as an issue in the SAM document.)

Personally, I would *want* to see 50 associations in this case, as I
said above.
 
* Lars Marius Garshol
|
| In none of these cases will any of the associations be removed.
 
* Bernard Vatant
|
| That's the point. This is not scalable in a workflow environment.
| The topic map will get rapidly cluttered with redundant information.

This information is only redundant if you use n-ary associations in
cases where you don't need to. That's one of several reasons not to do
that. I fail to see any impact on scaling.
 
* Lars Marius Garshol
|
| Well, Bernard, you have to distinguish between what the standard
| requires, and what an application can do to help users. The first
| you must do, and everyone must do it the same way. The second you
| can do, provided the user asks you to (and perhaps even provides you
| with extra information).
 
* Bernard Vatant
|
| Since there is nothing in the standard about it, I'm just curious to
| know how applications and developers deal with it or at least think
| about it ... hoping some day it moves up to a standard. That's my
| point

Ah, I see. I agree that this is a use case that it would be good for
the topic map family of standards to support. This is what I put into
the SAM document just now:

  <issue id="merge-use-of-schemas">
  <p>The presence of a TMCL schema may allow applications to improve
  the result of merging topics/topic maps by providing enough
  information to allow implementations to do additional
  transformations and redundancy removal. How should the SAM
  specification deal with this possibility?</p>
  </issue>

I think this belongs in the realm of TMCL, and that the SAM needs to
be carefully specified in a way that does not keep TMCL from doing so
when the time comes. I guess this is what you are hinting at by
bringing association templates into the discussion, except that
association templates are not really powerful enough to be of much
help in these cases, not even the extended notation that you used
above. 
 
* Lars Marius Garshol
|
| You'd have to explicitly assert that "company" and "employer" are
| the same thing, and perhaps remove the name "company" from the
| merged topic, but that's easy.
 
* Bernard Vatant
|
| Yes. Or, given that "company" is NT for "employer", keep "company"
| in the final association.

Not sure exactly what you mean by this. NT is "narrower-term", right?
Better known as subclass? :)
 
* Lars Marius Garshol
|
| - D is different, and clearly requires structural transformations.
|   Our goal is to let the query language deal with that, but for now
|   we would use the API to implement the transformation. It's quite
|   easy, actually.
 
* Bernard Vatant
|
| But there are so many ways to do it ... My point is: how would you
| set some general rules for that?

You mean, "I want a general rule that solves this for all topic maps"?
If we can do that, that would be good. I'm not sure we can, however.
*If* we can, TMCL seems to be the key to doing it. 

| I don't really figure it has to do with the QL. It's a question of
| graph reduction. Do you think graph reduction is in the scope of
| TMQL?

Not at all. I was saying that in the absence of other ways to do it
(you weren't giving much in the way of context, remember) the QL could
be used to implement a solution specific to this application.

| Well. I am really thinking about workflow and automatic processing
| in a closed environment, and yes, with TNC applying. In the use case
| I have in mind, the topic map repository is updated either by an
| integrated authoring tool or by a text/metadata mining tool working
| on semi-structured documents, in any case using a controlled
| vocabulary, able for example to extract "employer-employee"
| relationships and add them on the fly to the topic map. In that case
| I don't want human intervention in the transformation, and I want
| e.g. every new employee to be added as a member to existing
| association if any, and not have as many associations as employees.

Well, in your place I would change my mind on that. It would make your life a lot
easier. There *are* cases where n-ary associations are appropriate,
but I think that in the cases where they are you will need information
about the world being modelled in order to tell what association to
put the role players in. In other words, the standard probably won't
be able to help you with that.

-- 
Lars Marius Garshol, Ontopian         <URL: http://www.ontopia.net >
ISO SC34/WG3, OASIS GeoLang TC        <URL: http://www.garshol.priv.no >