[topicmapmail] Re: Document Object Identifiers/CrossRef

W. Eliot Kimber eliot@isogen.com
Sat, 07 Sep 2002 08:11:26 -0500


Daniel Rivers-Moore wrote:

> Does anyone on this list have a sense of whether there is a strong lobby
> to get W3C to open its attitude in this regard? I am RivCom's W3C rep,
> but don't have the bandwidth to follow all the issues, so have not
> caught up on this one for some time. Certainly in the old days there was
> a kind of religious war over whether the URN had any intrinsic value
> over the URL. Not whether URN was 'better' or 'worse' than URL, but on
> whether 'URL alone' was better than 'URL and URN, each used
> appropriately'.

The last time I discussed this with Dan C. (about a year ago, I think)
he reiterated Tim B-L's argument, made in a paper on the W3C site, that
there's no useful difference between URLs and URNs--they're both just
magic strings. One key aspect of the argument was/is that the
indirection provided by URNs can be provided today by any Web server and
that, in any case, it is always the responsibility of the manager of the
resource to maintain the ability to resolve it, which either means
maintaining something like the mappings provided for by the DOI/CrossRef
infrastructure and PURL systems or maintaining a redirect mapping on
your local server. 

I keep going back and forth in my own mind on this issue--I was raised
on public IDs and believed they were the answer for a long time. Then I
helped make XML and decided that public IDs were bogus--that system IDs,
especially in the context of a Web-type resolution infrastructure, were
all you needed (by the "do it on the server" argument above).

But in thinking about it more since then, I have concluded that, while
Tim is technically correct, the real difference between URNs and
URLs (that is, between names and locators) is one of expectation: if you
see a URN (a name) you expect two things: that you will have to go
through some sort of indirection mechanism to resolve it and that it
will probably resolve to the "correct" thing whenever you do choose to
resolve it, whereas with a URL you expect that you can resolve it
directly but you also aren't surprised if it fails to resolve because
"it's just a locator".

[Side note: I think what's bogus about public IDs is not that SGML and
XML provide both a name and a locator but that you can use *both* in a
single reference--that's the bogus part. There should be a single
external reference, where the syntax of the reference lets you
distinguish names from locators--that is, URIs got it right.]

From the standpoint of publishers, I think there is value in having a
name-based addressing mechanism that matches both the requirement and
expectation that "persistent" names are being used--if I'm publishing a
scholarly work that I want to be findable and usable (through any
references it makes) 5, 10, or 100 years out, I want some assurance that
the names I'm using will resolve appropriately in the future. If this
resolution is being managed by a disinterested 3rd party (e.g.,
CrossRef) I'll probably have greater confidence than if it's being
managed by the enterprise that happens to be serving the named things at
the moment (if for no other reason than that my experience with the Web
suggests that Web sites tend to be poorly managed over time, eroding my
confidence in Web sites generally). 

What I think this comes down to is that exposing and centralizing the
indirection mechanism (CrossRef instead of a bunch of redirect tables on
a bunch of Web servers) provides a single point of contact, both for
resource managers creating and maintaining the indirections and for
resource users finding and verifying references.
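
To make the single-point-of-contact idea concrete, here is a minimal
sketch in Python. It is not how CrossRef is implemented; the
NameRegistry class, the urn:example name, and the example.* URLs are
all made up for illustration. The point is only that references cite
the name, and a move touches one table entry rather than a redirect
rule on a server that may no longer exist.

```python
# Hypothetical central registry: one table maps persistent names to
# whatever location currently serves the resource.
class NameRegistry:
    def __init__(self):
        self._locations = {}

    def register(self, name, url):
        """Called once by the resource manager when the name is minted."""
        self._locations[name] = url

    def move(self, name, new_url):
        """Called by the resource manager when the resource is relocated."""
        if name not in self._locations:
            raise KeyError(f"unknown name: {name}")
        self._locations[name] = new_url

    def resolve(self, name):
        """Called by resource users; the name they cite never changes."""
        return self._locations[name]


registry = NameRegistry()
registry.register("urn:example:paper-42", "http://old-brand.example/papers/42")

# Ownership changes and the resource moves to a different domain.
registry.move("urn:example:paper-42", "http://new-brand.example/archive/42")

print(registry.resolve("urn:example:paper-42"))
```

With per-server redirects, the same move would depend on the old
domain staying alive to answer requests.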

I think part of the problem with the URL-only argument is that URLs are
unavoidably bound to particular domain names, which are too closely tied to
brand identity (even though the URL spec says that URLs are to be
opaque). Thus, the temptation to move resources from one domain name to
another as brands evolve or ownership changes is too great. When a
resource moves from one server to another with a different name it can
be very difficult to find that resource's new location. It's unlikely,
especially when ownership changes, that the old server would be
maintained in order to provide redirections to the new server. 

Interestingly, the DOI syntax avoids this problem to some degree by
making the owner identifiers opaque as well--very clever I think, and
possibly key to making DOIs truly persistent. That is, the DOI
recognizes that the "owner" of the name is not necessarily the owner of
the intellectual property named, just the manager of the name itself
(that is, the manager of the name's mapping to addressable resources).
There's no need for brand identification of name managers--that's just
plumbing, after all. (But note that the DOI mechanism can't restrict
what you use for resource identifiers in a DOI, so there's still room
for brand identifiers in DOIs, for example, if you use an existing URL
as the resource ID part of a DOI.)
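
As a rough illustration of that split, a DOI has the form
prefix/suffix, where the prefix begins with "10." and identifies the
name manager by number, and the suffix is whatever the manager chooses.
The sketch below is only a simplified reading of that syntax, not a
full validator; split_doi is a hypothetical helper and the DOIs shown
are invented examples.

```python
def split_doi(doi):
    """Split a DOI at its first slash into (prefix, suffix).

    Simplified check: the prefix must start with "10." and both
    parts must be non-empty. Real DOI validation has more rules.
    """
    prefix, sep, suffix = doi.partition("/")
    if not sep or not suffix or not prefix.startswith("10."):
        raise ValueError(f"not a DOI of the form 10.xxxx/suffix: {doi!r}")
    return prefix, suffix


# Opaque on both sides: nothing here names a brand.
print(split_doi("10.9999/abc123"))

# The prefix is still opaque, but the manager has put a URL in the
# suffix, so the brand's domain name leaks back into the name.
print(split_doi("10.9999/http://www.example.com/paper.html"))
```

Because only the first slash is significant, the second example keeps
the whole URL, slashes included, as the suffix.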

Therefore, I think that the "do it all with URLs" argument is naive--it
ignores human nature. A centralized indirection mechanism at least
lowers the long-term cost of maintaining the resolvability of names.
Once the setup cost has been spent, there's no reason not to use the
indirection unless the ongoing cost is prohibitive. Ideally this type of
resource would be a public utility as DNS is today.

OK, I've convinced myself that URLs are not enough.

Cheers,

E.
-- 
W. Eliot Kimber, eliot@isogen.com
Consultant, ISOGEN International

1016 La Posada Dr., Suite 240
Austin, TX  78752 Phone: 512.656.4139