[topicmapmail] RE: [topicmaps-comment] multilingual thesaurus - language, scope
,and topic naming constraint
Bandholtz, Thomas
thomas.bandholtz@koeln.sema.slb.com
Sun, 3 Mar 2002 13:43:35 +0100
Steven, sorry for the delay.
> -----Original Message-----
> From: Steven R. Newcomb [mailto:srn@coolheads.com]
> Sent: Saturday, February 02, 2002 9:59 PM
> To: Bandholtz, Thomas
> Cc: 'topicmaps-comment@lists.oasis-open.org';
> topicmapmail@infoloom.com
> Subject: Re: [topicmaps-comment] multilingual thesaurus - language,
> scope ,and topic naming constraint
>
>
> "Bandholtz, Thomas" <thomas.bandholtz@koeln.sema.slb.com> writes:
> > 3. A basename is a basename and not an association.
>
> The above remark is both true and false, depending on
> how you read it.
>
> It is true that a <basename> element is not an
> <association> element.
>
> However, at the most fundamental level of the semantics
> of topic maps, a basename (the name indicated by the
> content of a <baseNameString>) is itself a subject.
> Every <baseName> element makes the assertion that a
> specific subject has, as one of its names, a specific
> name, which itself is a subject. At the most
> fundamental level, the only difference between such a
> "topic-basename" assertion, and any other kind of
> assertion (including all the kinds of assertions that
> one might make via <association> elements) is the
> semantics of the assertion type.
I would rather not call the name of a subject a subject in its own right.
In the end we would have a world consisting of only two categories: subjects
and assertions between them.
Of course this is a legitimate high-level abstraction, but what is it good for?
If I publish a subject and I say "my subject has a name", I do not want
to make the name itself my subject. The name is just an attribute.
I find it useful to keep the two categories distinct.
> > 4. Back to the roots: The identifier of a topic is ID
> > and not basename.
>
> The deepest roots are not fully visible in the syntax.
> Again, the above remark is both true and false,
> depending on how you read it.
>
> It is true that the identifier of a <topic> element is
> its ID.
>
> It is also true that, in the Standard Application Model
> (SAM) of the Topic Maps paradigm, a subject can be
> addressed by means of its base names. In fact, the
> primary reason for the Topic Naming Constraint is to
> preserve the possibility of unambiguous addressing of
> subjects by means of their base names. (The Topic
> Naming Constraint is that no two subjects can have the
> same name in the same scope.)
In a single XML document, all attributes of type ID have to be unique. A
validating parser will check this. It also checks that every internal
reference (using IDREF) points to an existing ID. Consequently, two Topic Maps
may use the same IDs while each of them is valid on its own. When you
merge them into one document, you have to reconcile the IDs and IDREFs by
replacing them in at least one of the two source documents. This means IDs
are a moving target, as long as I cannot ensure that nobody else in the
world uses the same IDs as I do.
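The ID clash described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the markup is a simplified, hypothetical stand-in for a topic map (plain `id` and `ref` attributes play the roles of ID and IDREF), and the function name `merge_with_renaming` and the `-b` suffix are invented for the example.

```python
import xml.etree.ElementTree as ET

# Two documents that are each valid alone but both use the ID "t1".
# The element and attribute names are hypothetical, not XTM or ISO 13250 syntax.
MAP_A = """<topicmap>
  <topic id="t1"><name>Cologne</name></topic>
  <topic id="t2"><name>Rhine</name></topic>
  <assoc><member ref="t1"/><member ref="t2"/></assoc>
</topicmap>"""

MAP_B = """<topicmap>
  <topic id="t1"><name>Paris</name></topic>
  <assoc><member ref="t1"/></assoc>
</topicmap>"""


def ids_of(root):
    """Collect every value of an 'id' attribute in the tree."""
    return {el.get("id") for el in root.iter() if el.get("id") is not None}


def merge_with_renaming(xml_a, xml_b, suffix="-b"):
    """Merge B into A, renaming clashing IDs (and their references) in B."""
    root_a = ET.fromstring(xml_a)
    root_b = ET.fromstring(xml_b)
    clashes = ids_of(root_a) & ids_of(root_b)
    taken = ids_of(root_a) | ids_of(root_b)
    renames = {}
    for old in sorted(clashes):
        new = old + suffix
        while new in taken:  # make sure the new ID is itself unused
            new += suffix
        taken.add(new)
        renames[old] = new
    # Rewrite both the IDs and the references inside document B only.
    for el in root_b.iter():
        if el.get("id") in renames:
            el.set("id", renames[el.get("id")])
        if el.get("ref") in renames:
            el.set("ref", renames[el.get("ref")])
    root_a.extend(list(root_b))
    return root_a, renames


merged, renames = merge_with_renaming(MAP_A, MAP_B)
print(renames)  # {'t1': 't1-b'}
```

Note that the renaming is purely mechanical: it keeps the merged document valid, but it says nothing about whether the two "t1" topics were meant to be the same subject.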
I think the TNC is meant to overcome this. But what I said about IDs is
also true of "naming characteristics": how can I know every naming
characteristic used anywhere in the world in order to avoid duplicates?
OK, TC PubSubj and TC geolang are working in this field, but only for
Published Subject Indicators, not for naming characteristics.
ISO13250 has a solution to this problem:
<<<<<<<<<<<
NOTE 36: If any two topic maps that are to be merged conflict with one
another because they happen to provide the same name within the same scope
for two different subjects, the merger of the different subjects can be
prevented by applying different added themes to one or both of their
containing topic map documents, using one or more addthms elements. The
added themes specified by such addthms elements can serve to distinguish the
two identical names, because they will no longer appear within exactly the
same scope.
>>>>>>>>>>>
But this is not machine-detectable! A machine can find naming duplicates, but
it cannot decide whether they just "happened" and should be corrected, or
whether the topics should be merged.
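To make the point concrete, here is a minimal sketch of what a machine can do. The data model is an assumption made for the example (topics as dictionaries, each name paired with a set of scoping themes), and `naming_clashes` is a hypothetical helper, not part of any topic map API.

```python
from collections import defaultdict

# Hypothetical in-memory topics: each name is a (string, scope) pair,
# where the scope is a frozenset of theme identifiers.
topics = [
    {"id": "a1", "names": [("Paris", frozenset({"en"}))]},
    {"id": "b7", "names": [("Paris", frozenset({"en"}))]},
    {"id": "b8", "names": [("Paris", frozenset({"en", "mythology"}))]},
]


def naming_clashes(topics):
    """Return every (name, scope) pair that is carried by more than one topic."""
    by_key = defaultdict(set)
    for topic in topics:
        for name, scope in topic["names"]:
            by_key[(name, scope)].add(topic["id"])
    return {key: sorted(ids) for key, ids in by_key.items() if len(ids) > 1}


clashes = naming_clashes(topics)
for (name, scope), ids in clashes.items():
    print(name, sorted(scope), ids)  # Paris ['en'] ['a1', 'b7']
```

The function reports that "a1" and "b7" share a name in the same scope, while "b8" is kept apart by its extra theme, as in NOTE 36. What it cannot report is whether "a1" and "b7" are the same subject; that decision stays with a human or with some other identity rule.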
ISO 13250 contains many more constraints on when topics may be merged (the
most exact rule is about "Public Subject Descriptor" checking, and this is
now applied by the Published Subject Identifier TCs).
> > If you refer to a topic you use IDREF and not the
> > basename.
>
> Correct. The XTM syntax doesn't provide any constructs
> that would explicitly support the addressing of a
> particular subject by means of one of its basenames.
> (Nevertheless, the underlying SAM is designed to
> support it, and applications are free to do it.)
See my note on the local scope of ID/IDREF above. PSIs use xlink:href, neither
IDREF nor basenames. xlink:href is of type xsd:anyURI, so you may use any
kind of URI. That means it may be a URN, which is only a unique name in a unique
namespace, not a working web address like a URL.
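The difference between a name and a working address can be illustrated with Python's standard `urllib.parse`. This is a rough sketch only: the function `classify_uri`, the list of locator schemes, and the example URIs are assumptions made for illustration, and checking the scheme is merely a heuristic (an http URI may still fail to resolve).

```python
from urllib.parse import urlparse

# Schemes treated as "working addresses" in this sketch (an assumption).
LOCATOR_SCHEMES = {"http", "https", "ftp"}


def classify_uri(uri):
    """Classify a URI by its scheme: 'name' (URN), 'locator' (URL) or 'unknown'."""
    scheme = urlparse(uri).scheme.lower()
    if scheme == "urn":
        return "name"
    if scheme in LOCATOR_SCHEMES:
        return "locator"
    return "unknown"


print(classify_uri("urn:x-example:psi:cologne"))        # name
print(classify_uri("http://psi.example.org/cologne"))   # locator
print(classify_uri("#t1"))                              # unknown
```

Both of the first two values are acceptable content for an attribute of type xsd:anyURI; only the second one could be fetched with a browser.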
I am currently implementing a Web Service that gives access to a topic (or a
ranked topic list) by several search conditions (including basename, of
course).
It will use SOAP messages that are described in a WSDL document.
Did you see my postings:
http://lists.oasis-open.org/archives/tm-pubsubj-comment/200202/msg00010.html
([tm-pubsubj-comment] using UDDI for PSI ), and
http://lists.oasis-open.org/archives/geolang-comment/200202/msg00045.html (a
summary of my TM-related work).
Looking forward to continuing this thread,
Thomas Bandholtz
XML Competence Center
SchlumbergerSema
Sema GmbH
Kaltenbornweg 3
D50679 Köln/Cologne
++49 (0)221 8299 264