[topicmapmail] PSIs - alternatives

Simon Grant asimong at btinternet.com
Fri Jun 23 05:41:00 EDT 2006


At 08:44 2006-06-23, Steve Pepper wrote:
>My main point is that we should be extremely careful about keeping agreement
>concerning the *identity* of a subject separate from agreement concerning
>*opinions* about a subject. We should therefore avoid overloading the PRD
>with assertions, especially machine-processable assertions.

OK.

>However, assertions about the *PRD* (as opposed to assertions about the
>subject it describes) are another matter. I have no problem with either
>human-readable or machine-processable metadata about the PRD, such as its
>publisher, type, date, version, etc. Such metadata can play an important
>role in your 2. ("whatever else may be necessary towards a self-sustaining
>infrastructure and motivation to use it").
>
>Whether references to other PRIs deemed to be equivalent belong here is
>something that needs to be discussed more fully.

(I will use "PSI" for continuity with previous discussions and 
terminology, in place of "PRI")

Agreed. To set this in the context of the above position, references 
to other PSIs would be limited to assertions about the equivalence or 
non-equivalence of other PSIs and their related PRDs.

>How would they be used? Certainly not when actually merging topic maps,
>because the whole point of the PRI/PRD mechanism is that computers do not
>need to dereference the PRI in order to ascertain if two things are the
>same: they simply compare strings.
>
>I guess there could be a use for harvesters that go round collecting such
>equivalences in order to build mapping tables, but could the resulting
>tables really be trusted? The minter of PRI "X" might claim that it is
>equivalent to somebody else's PRI "Y", but who's to say whether the minter
>of PRI "Y" agrees with that assertion?

Easily arranged. If there is a reciprocal link, then it can be 
trusted (to the extent that anything can be trusted, which is not 
absolutely). You can claim as often as you like that your PSI is 
equivalent to my PSI, but what counts is whether I am prepared to 
recognise that by including the reverse equivalence in my PRD. A 
stronger signal than simply leaving a PSI out would be to mark one's 
own PSI explicitly as different from that other PSI.
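The reciprocal-link test can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the registry dict stands in for dereferencing each PSI and reading the equivalence list from its PRD, and the example.org/example.net PSIs and the function names are hypothetical.

```python
# Hypothetical registry: each PSI mapped to the equivalence list found in its PRD.
# A real tool would dereference each PSI to obtain this; that step is assumed here.
registry: dict[str, set[str]] = {
    "http://example.org/psi/x": {"http://example.net/psi/y"},
    "http://example.net/psi/y": set(),
}

def claims(registry: dict[str, set[str]], x: str, y: str) -> bool:
    """True if the PRD for x lists y as equivalent (a one-sided claim)."""
    return y in registry.get(x, set())

def reciprocal(registry: dict[str, set[str]], x: str, y: str) -> bool:
    """True only if each PRD lists the other: the kind of link worth trusting."""
    return claims(registry, x, y) and claims(registry, y, x)
```

Here reciprocal(registry, X, Y) stays False while only X's PRD makes the claim, and becomes True once Y's PRD adds X to its own list.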

>At the very least there are issues here that need to be thought through more
>carefully before designing a complete solution. My initial goal with the PRI
>initiative is to go for a less complete but more immediately adoptable
>solution and then see how things develop.

Adoptable is not necessarily identical to effective. I suppose we 
want an optimal balance between ease of adoptability and 
effectiveness for things people want to do. I'd say this is where 
traction comes from.

>| My suggestion addresses both the situations where people won't do
>| the merging, and the situation where alternatives exist prior to
>| merging. Having a list of accepted equivalent PRIs included means
>| that the machine comparison is still relatively easy - it would
>| involve fetching the two PRDs and then string comparison of two
>| lists against each other. That's all.
>
>I certainly think there is a place for services along these lines. I've
>already created one myself: the mapping between ISO 3166 and CIA country
>codes. All I'm questioning is whether it makes sense to include the ability
>to specify mappings in the PRD itself. I want PRDs to be as simple as
>possible, and I want to encourage reuse rather than a free-for-all. That's
>why I suggested a standard way to deprecate one's own PRI in favour of
>someone else's: We would get the mapping you want while at the same time
>making it clear which is the preferred PRI.

Well, deprecation is easy in terms of the format I suggest. Imagine 
first a three-part PRD:
1. metadata about the PRD
2. human-oriented description
3. list of equivalent PSIs
The convention could be that a PRD includes its own PSI in the list of 
equivalent PSIs to indicate that the PSI is to be regarded as current. 
Leaving one's own PSI out of the list would indicate self-deprecation: 
"don't use me, use one of these other ones".
Straightforward superseding would be indicated by placing exactly one 
PSI, not one's own, in the list of equivalents.
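The self-inclusion convention can be read mechanically, roughly as in this Python sketch. It is a minimal illustration, not an agreed format: the PRD class, its field names and the status labels are hypothetical, and the PRD is assumed to have been fetched and parsed already.

```python
from dataclasses import dataclass, field

@dataclass
class PRD:
    """Hypothetical three-part PRD following the layout proposed above."""
    psi: str                                                  # the PSI this PRD describes
    metadata: dict[str, str] = field(default_factory=dict)    # 1. metadata about the PRD
    description: str = ""                                     # 2. human-oriented description
    equivalents: list[str] = field(default_factory=list)      # 3. list of equivalent PSIs

def status(prd: PRD) -> tuple[str, list[str]]:
    """Classify a PSI by the self-inclusion convention.

    Returns (status, alternatives), where status is "current",
    "superseded" or "deprecated".
    """
    others = [p for p in prd.equivalents if p != prd.psi]
    if prd.psi in prd.equivalents:
        return "current", others          # own PSI listed: still in use
    if len(others) == 1:
        return "superseded", others       # exactly one other PSI: use that one
    # Own PSI omitted: "don't use me, use one of these other ones".
    # An empty list is not covered by the convention; this sketch treats it
    # as deprecated with no suggested replacement (an assumption).
    return "deprecated", others
```

For example, status(PRD("a", equivalents=["b"])) yields ("superseded", ["b"]), while listing "a" alongside "b" yields ("current", ["b"]).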

I still think one extra part would be optimal:
4. list of PSIs that are explicitly marked as different, if they 
were, for any reason, likely to be mistaken for candidates for 
equivalence. It would be like saying "I've thought about those ones 
and, no, they aren't the same thing".
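With a fourth part, a tool comparing two PRDs can distinguish more cases than a simple yes or no. The Python sketch below is hypothetical throughout (class, field and label names); in particular, letting an explicit "different" from either side override any equivalence claim is a design assumption, not something settled in this discussion.

```python
from dataclasses import dataclass, field

@dataclass
class PRD:
    """Hypothetical four-part PRD; metadata and description omitted for brevity."""
    psi: str
    equivalents: set[str] = field(default_factory=set)  # part 3: equivalent PSIs
    different: set[str] = field(default_factory=set)    # part 4: explicitly different PSIs

def judge(a: PRD, b: PRD) -> str:
    """Classify the relationship between two PSIs using both PRDs."""
    if b.psi in a.different or a.psi in b.different:
        return "different"    # assumed rule: an explicit "no" from either side wins
    a_says = b.psi in a.equivalents
    b_says = a.psi in b.equivalents
    if a_says and b_says:
        return "equivalent"   # reciprocal link
    if a_says or b_says:
        return "claimed"      # one-sided assertion only
    return "unknown"

def self_contradictions(prd: PRD) -> set[str]:
    """PSIs that one PRD lists as both equivalent and different (an authoring error)."""
    return prd.equivalents & prd.different
```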

I also like the idea of "harvesters" crawling the web of connections 
to find reliable sets of equivalent PSIs, as well as noting 
differences. This could be part of a PSI management tool. It would be 
good to collaborate in designing such a tool: checking whether the 
PRDs of PSIs marked as equivalent had changed; finding extra PSIs 
that had been added to another PRD, so that they can be presented for 
a human decision on whether to add them to one's own set; etc.
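The harvester and management-tool ideas could start out roughly like the following Python sketch. Everything here is an assumption for illustration: the registry snapshot stands in for fetching and parsing real PRDs, "reliable" is taken to mean "connected only through reciprocal links", and all function names are hypothetical.

```python
from collections import deque

# Hypothetical harvested snapshot: PSI -> (equivalents, explicitly different).
Registry = dict[str, tuple[set[str], set[str]]]
EMPTY: tuple[set[str], set[str]] = (set(), set())

def crawl(registry: Registry, start: str) -> set[str]:
    """Breadth-first walk along equivalence links, returning every PSI reached."""
    seen = {start}
    queue = deque([start])
    while queue:
        for nxt in registry.get(queue.popleft(), EMPTY)[0]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

def reliable_sets(registry: Registry, psis: set[str]) -> list[set[str]]:
    """Group PSIs using union-find, joining two PSIs only over reciprocal links."""
    parent = {p: p for p in psis}

    def find(p: str) -> str:
        while parent[p] != p:
            parent[p] = parent[parent[p]]  # path halving
            p = parent[p]
        return p

    for p in psis:
        for q in registry.get(p, EMPTY)[0]:
            if q in psis and p in registry.get(q, EMPTY)[0]:
                parent[find(p)] = find(q)
    groups: dict[str, set[str]] = {}
    for p in psis:
        groups.setdefault(find(p), set()).add(p)
    return list(groups.values())

def conflicts(registry: Registry, group: set[str]) -> set[tuple[str, str]]:
    """Pairs (p, q) in a group where p's PRD explicitly marks q as different."""
    return {(p, q) for p in group for q in registry.get(p, EMPTY)[1] if q in group}

def review_changes(old: set[str], current: set[str], mine: set[str]) -> tuple[set[str], set[str]]:
    """Compare a stored copy of another PRD's equivalence list with its current one."""
    added = (current - old) - mine   # new PSIs to present for a human decision
    removed = old - current          # equivalences the other side has withdrawn
    return added, removed
```

A tool built along these lines would only ever propose additions; in keeping with the point above, the decision to copy a PSI into one's own PRD stays with a person.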

Simon

