[topicmapmail] Fragmented XTM for web metadata, and some ontology?

Kal Ahmed kal@techquila.com
29 Jun 2003 19:40:39 +0100


On Sun, 2003-06-29 at 17:55, Murray Altheim wrote:
> Kal Ahmed wrote:
>  > On Sun, 2003-06-29 at 13:39, Murray Altheim wrote:
> [...]
>  >>I understand that. But Kal, describe for me a reasonable approach to
>  >>allowing arbitrary XML in <resourceData> that doesn't completely screw
>  >>us in terms of interchange. Once you open that door, there's no closing
>  >>it. I just don't see how that would make any sense given that the first
>  >>document coming down the pike with unknown markup (or JavaScript code)
>  >>is just completely opaque to an application that can't process it, or
>  >>worse yet, transparent, i.e., the user doesn't even know what is missing.
>  >
>  > As I previously suggested, I don't see anything wrong with an
>  > XTM-compliant parser doing no more than validating the resourceData
>  > content as well-formed and making it available in the SAM as string
>  > data. The validation of the resourceData content can be done at a higher
>  > level. The XTM-compliant parser would be able to flag the string data as
>  > well-formed XML and then at application level you would either handle
>  > the XML or not.
> 
> The "or not" is the big problem here. Until you can solve the "or not"
> problem, you're basically advocating the Microsoft approach:  you can
> read any document so long as it is a document understood by our software.
> I don't want people opening up interchange documents and getting complete
> blanks, missing words, misinterpretations, different user "experiences"
> based on whether or not their application can grok the ugliness sent by
> another application. That's just broken.
> 

Why is that any different from putting escaped XML in resourceData to
get around this limitation? Or putting base64-encoded binary in
resourceData? XTM 1.0 doesn't prevent either of those things, but you
do not consider that to be broken. Neither does XTM 1.0 stop you from
using a resourceRef to reference data that is in a format that the
application layer cannot grok. Nothing prevents Shockwave, PDF, MS Word
or any other proprietary format appearing at the end of a resourceRef,
so what's the difference?
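
To make the comparison concrete, here is a minimal Python sketch of the
two workarounds mentioned above. It is an illustration only: the helper
resource_data() and the sample payloads are made up, and it checks
nothing beyond well-formedness (no DTD validation). Both results are
plain character data as far as an XML parser is concerned, so nothing
in the markup tells the application what it has been handed.

```python
import base64
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

XTM_NS = "http://www.topicmaps.org/xtm/1.0/"

def resource_data(content):
    """Wrap already-safe character data in a resourceData element (hypothetical helper)."""
    return '<resourceData xmlns="%s">%s</resourceData>' % (XTM_NS, content)

# Workaround 1: XML from another namespace, escaped so it is only text.
fragment = '<p xmlns="http://www.w3.org/1999/xhtml">Hello</p>'
escaped = resource_data(escape(fragment))

# Workaround 2: arbitrary binary, base64-encoded (sample bytes are made up).
payload = b"\x89PNG made-up binary bytes"
encoded = resource_data(base64.b64encode(payload).decode("ascii"))

for doc in (escaped, encoded):
    element = ET.fromstring(doc)
    # No child elements: the parser sees nothing but a text node.
    assert len(element) == 0

# The escaped markup comes back out of the parser as the original string.
assert ET.fromstring(escaped).text == fragment
```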

>  >>I argued this same case to the W3C HTML Working Group for years, where we
>  >>had Dave Raggett and others advocating that we do away with having an HTML
>  >>DTD at all and just defining things in terms of "tag sets". Had this come
>  >>to pass, we'd never have had an XHTML DTD at all, just some weird notion of a
>  >>well-formed "XHTML" document where one could intermix anything anyone
>  >>wanted anytime -- complete freedom, and completely useless to anyone
>  >>except monopolists who import and export their own brand of proprietary
>  >>markup muck (export "HTML" from MS Word to see what I mean). If you look
>  >>at the latest XHTML 2.0 draft [1] you'll see they're still trying to
>  >>figure out some way of specifying a language without using a schema.
>  >>[okay, I should have wrapped this in <rant>. I just don't want us to go
>  >>down that same road.]
>  >
>  > But I am not suggesting complete mix-n-match. I am suggesting one
>  > particular place in the XTM DTD where XML from other namespaces would be
>  > allowed. There is no question (in my mind) of changing the "XTM on top"
>  > approach of the DTD that says that the topicMap element should be the
>  > document element, nor is there any question of allowing other markup to
>  > appear anywhere else than inside resourceData. I know that other people
>  > have suggested such changes, but I leave it to them to defend that :)
> 
> But allowing arbitrary markup even within <resourceData> means that XTM
> applications *all* need to unambiguously and consistently process whatever
> that markup happens to be. So let's say, for sake of argument, we allow
> a subset of XHTML markup to appear there. We're not talking things that
> appear in HTML's <head>, we're not attaching CSS stylesheets, we're not
> using <base> or <applet> or <object> or JavaScript or even <table>. Even
> with a *really* constricted declared content, we'd still be requiring all
> XTM applications to correctly process it, as for those processors that
> didn't, things could go completely haywire. Content might be missing,
> disappear from the screen, words be stuck together because of missing
> implied whitespace (because of now-ignored markup), and the user experience
> might rely on that markup to make sense.
> 
That's exactly what we would not be requiring. XML content of
resourceData would be flagged as such by the XTM parser. The XML data
would then be available as a string to the application layer. The
application layer may then choose to parse the XML and/or validate it
against the schema (small s) identified by a facet of the occurrence
type. The schema and/or namespace of the XML will allow the application
to determine the correct handler to be used for the data. If no correct
handler is available, then it is an "attachment" of data. What the
application chooses to do with the data is really up to it, e.g. it
might allow the data to be opened in an XML editor (remember I am only
proposing XML data, nothing else), or it might render the XML in a
plain text view (as IE and Mozilla do), or it might offer the user the
opportunity to download the XML and open it in an application of their
choice.
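
The split between parser and application described above could be
sketched roughly as follows. This is Python for illustration only and
is not taken from any XTM implementation: the function names, the
dictionary standing in for the SAM item, and the handler registry are
all hypothetical, and the handler is chosen by the root element's
namespace alone (the facet-identified schema is left out).

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

def parse_resource_data(text):
    """Parser layer: test well-formedness only; always keep the data as a string."""
    try:
        root = ET.fromstring(text)
    except ET.ParseError:
        # Not well-formed XML: ordinary string data, as in XTM 1.0 today.
        return {"data": text, "is_xml": False, "namespace": None}
    tag = root.tag  # ElementTree spells namespaced tags as "{namespace}local"
    namespace = tag[1:].split("}", 1)[0] if tag.startswith("{") else None
    return {"data": text, "is_xml": True, "namespace": namespace}

def render_xhtml(data):
    """Stand-in for a real XHTML handler (hypothetical)."""
    return "xhtml: " + data

# Application layer: handlers keyed by namespace (hypothetical registry).
HANDLERS = {XHTML_NS: render_xhtml}

def handle_occurrence(item):
    """Pick a handler by namespace; unknown XML becomes an attachment."""
    if not item["is_xml"]:
        return "string"
    handler = HANDLERS.get(item["namespace"])
    if handler is None:
        # No handler: offer the XML for download, plain-text view, etc.
        return "attachment"
    return handler(item["data"])

known = parse_resource_data('<p xmlns="%s">Hello</p>' % XHTML_NS)
unknown = parse_resource_data('<svg xmlns="http://www.w3.org/2000/svg"/>')
plain = parse_resource_data("just a plain string")

print(handle_occurrence(known))
print(handle_occurrence(unknown))
print(handle_occurrence(plain))
```

Note that the parser layer never rejects anything here; it only records
whether the string happened to be well-formed XML and under which
namespace. Everything else is an application decision.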

> It's just an extremely slippery slope. I've thought quite a number of
> times about creating a combo-DTD mixing XTM and XHTML, where XHTML is
> the document element and XTM appears as a single block at the end.
> There'd be a lot of interesting applications for something like this,
> but it would ruin interchange of XTM if it actually became popular, as
> suddenly we'd have accidentally upped the ante on what was required of
> XTM applications, something I've assiduously tried to avoid. I've never
> believed the idea that sending around well-formed XML was going to
> catch on much, simply because the ground assumption of an application
> able to *correctly* process such documents would be enormous, and hell,
> the Web community can't even get Netscape or IE to work correctly and
> reliably with HTML after many years of trying.
> 

Well, firstly, it is not true that Netscape and IE fail to work
correctly and reliably with XHTML or HTML 4. They do, as long as you
bother to put the DOCTYPE at the top and so put the browsers into
"strict" mode, but that's another issue.

It's worth repeating that I am not suggesting that XTM be embedded in
XHTML or in any other form of markup. Nor am I suggesting that XML from
a namespace other than XTM 1.0 appear anywhere but in the content of a
resourceData element.

Cheers,

Kal