[topicmapmail] xml:base and #foo URIs

Murray Altheim m.altheim@open.ac.uk
Thu, 24 Apr 2003 11:25:54 +0100


Lars Marius Garshol wrote:
> * Murray Altheim
> | 
> | I understand the issue. I have repeatedly said that XML Base (as it
> | says fairly clearly in the Recommendation) alters the base URI of
> | the document prior to *any* further resolution. That's it.
> 
> That makes no difference if fragment identifier URIs are not resolved
> relative to the base URI, does it?

You're exploiting a weakness in the text of RFC 2396 (in section 4.2)
for this? A fragment identifier is a same-document reference, a type
of relative reference, and that small section is only talking about
their relation to the current document, not about resolution issues.

Section 4.1 "Fragment Identifier" states:

    When a URI reference is used to perform a retrieval action on the
    identified resource, the optional fragment identifier, separated from
    the URI by a crosshatch ("#") character, consists of additional
    reference information to be interpreted by the user agent after the
    retrieval action has been successfully completed.  As such, it is not
    part of a URI, but is often used in conjunction with a URI.

So, as I said, you don't deal with fragment identifiers *at all*
until after the retrieval action. The base URI does not affect the
retrieval itself, but does affect interpretation of relative references
within the document.
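
To make the split concrete, here is a minimal Python sketch of separating the retrievable URI from the fragment before any retrieval happens. The example.org address and the fragment name are made up for illustration; only `urldefrag` from the standard library is real.

```python
from urllib.parse import urldefrag

# Hypothetical reference, chosen only for illustration.
reference = "http://example.org/maps/opera.xtm#puccini"

# urldefrag separates the part that is retrieved from the part that
# is interpreted only after retrieval has succeeded.
retrieval_uri, fragment = urldefrag(reference)

print(retrieval_uri)  # http://example.org/maps/opera.xtm
print(fragment)       # puccini
```

Only `retrieval_uri` would be handed to whatever fetches the resource; `fragment` is kept aside for the application to interpret afterwards.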

Section 5.1 "Establishing a Base URI"

    The term "relative URI" implies that there exists some absolute "base
    URI" against which the relative reference is applied.  Indeed, the
    base URI is necessary to define the semantics of any relative URI
    reference; without it, a relative reference is meaningless.  In order
    for relative URI to be usable within a document, the base URI of that
    document must be known to the parser.

So, to resolve relative references (of which fragment identifiers are
a type, as intra-document references), you use the base URI. Section
5.1.1 of RFC 2396, "Base URI within Document Content", adds:

    Within certain document media types, the base URI of the document can
    be embedded within the content itself such that it can be readily
    obtained by a parser.  This can be useful for descriptive documents,
    such as tables of content, which may be transmitted to others through
    protocols other than their usual retrieval context (e.g., E-Mail or
    USENET news).

The base URI of an XML document is altered by use of xml:base. The
resolution and meaning of fragment identifiers are not defined in RFC
2396 or XML Base, nor should they be, as this is application-dependent.
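
As a rough sketch of how xml:base changes the base URI in scope, the following Python walks a small document and records the base that applies to each element. The document, the example.org URIs, the file: starting base and the helper name `effective_bases` are all assumptions made up for illustration, and this is not a complete XML Base implementation.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# ElementTree's name for the xml:base attribute.
XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"

def effective_bases(element, inherited_base, found=None):
    """Map each element's 'id' value to the base URI in scope for it."""
    if found is None:
        found = {}
    # An xml:base value may itself be relative, so resolve it against
    # the base inherited from the parent (or from the document).
    base = urljoin(inherited_base, element.get(XML_BASE, ""))
    if element.get("id") is not None:
        found[element.get("id")] = base
    for child in element:
        effective_bases(child, base, found)
    return found

# Hypothetical document: the root resets the base, one child refines it.
doc = ET.fromstring(
    '<topicMap id="tm" xml:base="http://example.org/maps/">'
    '<topic id="a"/>'
    '<topic id="b" xml:base="other/opera.xtm"/>'
    '</topicMap>'
)
bases = effective_bases(doc, "file:///tmp/local.xtm")
for name, base in bases.items():
    print(name, base)
```

Topic "a" inherits the root's base, while topic "b" gets its own xml:base resolved against that inherited value.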

The rock the W3C ran upon on fragment IDs within XML was that they'd
thought them through with HTML: they were references to document IDs.
And in HTML, it was clear what were and what weren't document IDs. But
in XML one doesn't necessarily know which attributes are within the
ID namespace of a particular document, so they were stumped. This is
why Paul had to answer you with XPointer. It's the only solution the
W3C has been able to come up with, and it's sucky. If you don't parse
a schema you don't know about the ID namespace, and fragment IDs
don't work. So, for example, to an XML processor that didn't know that
in XTM we use the 'id' attribute for IDs, fragment identifiers have
absolutely no meaning. [If you're interested, see the note at
http://www.w3.org/TR/xptr-framework/#shorthand ]
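
The dependence on knowing the ID attribute can be sketched in a few lines of Python. The helper `find_by_fragment`, the sample document and the attribute name "name" are hypothetical; the point is only that the lookup succeeds or fails depending on which attribute the processor believes carries IDs.

```python
import xml.etree.ElementTree as ET

def find_by_fragment(root, fragment, id_attribute):
    """Return the first element whose id_attribute equals fragment."""
    for element in root.iter():
        if element.get(id_attribute) == fragment:
            return element
    return None

# Hypothetical XTM-like content, for illustration only.
doc = ET.fromstring(
    '<topicMap><topic id="puccini"/><topic id="tosca"/></topicMap>'
)

# An application that knows XTM's convention can resolve the fragment.
print(find_by_fragment(doc, "puccini", "id") is not None)
# A processor guessing a different attribute name finds nothing.
print(find_by_fragment(doc, "puccini", "name") is None)
```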

But pragmatically, we all know in XTM about the ID namespace. So
what you are caught in is all the miasmatic nonsense of the W3C
trying to solve a problem that should have been addressed back in
1995 or so. But at that time, apparently nobody could envision a
web that didn't use HTML documents (except those working towards
"SGML on the Web" like Yuri Rubinsky and Murray Maloney -- if I
remember right, they addressed this in their book).

A mountain out of a molehill, if you ask me. I'd simply state that
fragment identifiers resolve according to the base URI (as do
relative references) and be done with it. Too much time has been
spent on this, and any other approach would violate the principle of
least surprise (given that you'd then have frag IDs and relative
references resolving according to different base URIs, which
would be weird).
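
For what it's worth, here is a small sketch of that behaviour using Python's standard `urljoin`, which resolves a bare fragment and an ordinary relative reference against the same base. The base URI and the references are invented for the example, and this shows how one library behaves, not what RFC 2396 mandates.

```python
from urllib.parse import urljoin

# Hypothetical base URI, e.g. one established through xml:base.
base = "http://example.org/maps/opera.xtm"

# A fragment-only reference and a relative reference, both resolved
# against the same base.
same_doc = urljoin(base, "#puccini")
other_doc = urljoin(base, "tosca.xtm#act1")

print(same_doc)   # http://example.org/maps/opera.xtm#puccini
print(other_doc)  # http://example.org/maps/tosca.xtm#act1
```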

Murray

......................................................................
Murray Altheim                    http://kmi.open.ac.uk/people/murray/
Knowledge Media Institute
The Open University, Milton Keynes, Bucks, MK7 6AA, UK               .

                                                Moonlight slanting
                                                  through all the
                                                  bamboo forest...
                                                and nightingale song
                                                          -- Basho