[topicmapmail] Fragmented XTM for web metadata, and some ontology?
Kal Ahmed
kal@techquila.com
29 Jun 2003 22:34:01 +0100
On Sun, 2003-06-29 at 22:05, Murray Altheim wrote:
> Kal Ahmed wrote:
> > Murray Altheim wrote:
> [...]
> > Why is it any different to putting escaped XML in resourceData to get
> > around this limitation. Or putting base-64 encoded binary in
> > resourceData ? XTM 1.0 doesn't prevent either of those things, but you
> > do not consider that to be broken.
>
> Oh, but I do. Being merely valid doesn't make a document interpretable.
> If a processor comes upon "escaped" XHTML markup or a GIF image encoded
> as base-64, yes, it's valid, but it's only going to be correctly processed
> by applications that know what to do with that. And unless we've decided to
> raise the bar on all XTM applications to require processing (or even
> understanding) of escaped content, it's broken. By definition. It don't work,
> and the user experience is going to be that something is either wrong or
> missing or both.
>
So you see my point that there is nothing that prevents this situation
now.
> > Neither does XTM 1.0 stop you from
> > using a resourceRef to reference data that is in a format that the
> > application layer cannot grok. Nothing prevents Shockwave, PDF, MS Word
> > or any other proprietary format appearing at the end of a resourceRef,
> > so what's the difference ?
>
> The difference is that a link outside of an XTM document is exactly that:
> a link. How topic map applications handle external links can be handled
> quite easily by treating them as endpoints (i.e., "we don't know or care
> what's at the other end, but here it is"), whereas requiring applications
> to do something with or even correctly parse or validate arbitrary markup
> can be problematic, assuming that it's very unlikely that that markup is
> going to have an attached schema.
>
So what makes a link special ? Why not apply the same logic to XML
content in resourceData ?
> >> >>I argued this same case to the W3C HTML Working Group for years, where we
> >> >>had Dave Raggett and others advocating that we do away with having an HTML
> >> >>DTD at all and just defining things in terms of "tag sets". Had this come
> >> >>to pass, we'd never have had an XHTML DTD at all, just some weird notion of a
> >> >>well-formed "XHTML" document where one could intermix anything anyone
> >> >>wanted anytime -- complete freedom, and completely useless to anyone
> >> >>except monopolists who import and export their own brand of proprietary
> >> >>markup muck (export "HTML" from MS Word to see what I mean). If you look
> >> >>at the latest XHTML 2.0 draft [1] you'll see they're still trying to
> >> >>figure out some way of specifying a language without using a schema.
> >> >>[okay, I should have wrapped this in <rant>. I just don't want us to go
> >> >>down that same road.]
> >> >
> >> > But I am not suggesting complete mix-n-match. I am suggesting one
> >> > particular place in the XTM DTD where XML from other namespaces would be
> >> > allowed. There is no question (in my mind) of changing the "XTM on top"
> >> > approach of the DTD that says that the topicMap element should be the
> >> > document element, nor is there any question of allowing other markup to
> >> > appear anywhere else than inside resourceData. I know that other people
> >> > have suggested such changes, but I leave it to them to defend that :)
> >>
> >>But allowing arbitrary markup even within <resourceData> means that XTM
> >>applications *all* need to unambiguously and consistently process whatever
> >>that markup happens to be. So let's say, for sake of argument, we allow
> >>a subset of XHTML markup to appear there. We're not talking things that
> >>appear in HTML's <head>, we're not attaching CSS stylesheets, we're not
> >>using <base> or <applet> or <object> or JavaScript or even <table>. Even
> >>with a *really* constricted declared content, we'd still be requiring all
> >>XTM applications to correctly process it, as for those processors that
> >>didn't, things could go completely haywire. Content might be missing,
> >>disappear from the screen, words be stuck together because of missing
> >>implied whitespace (because of now-ignored markup), and the user experience
> >>might rely on that markup to make sense.
> >
> > That's exactly what we would not be requiring. XML content of
> > resourceData would be flagged up by the XTM parser as such. The XML data
> > will then be available as a string to the application layer. The
> > application layer may then choose to parse the XML and/or validate it
> > against the schema (small s) identified by a facet of the occurrence
> > type. The schema and/or namespace of the XML will allow the application
> > to determine the correct handler to be used for the data.
>
> This was bandied about years ago within the W3C, but nobody has yet come
> up with a way to attach a schema to an arbitrary chunk of content within
> another chunk of content. XML Schema claims to make this possible, but it
> doesn't make it simple enough that anyone in their right mind would ever
> do it. There are so few people capable of writing an XML Schema that would
> reliably constrain such content, that a Venn diagram cross with our
> pool of topic map authors would reveal few if anyone capable of performing
> such feats. Could anyone on this list do it? I couldn't.
>
<xs:any namespace="##other"/> should do the trick. But that's really not
the point, I think.
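
To make the wildcard concrete, here is a minimal Python sketch of the test
that namespace="##other" expresses, as I read XML Schema 1.0: the element
must be namespace-qualified and must not be in the schema's target
namespace. The function names are made up for illustration, and this only
mimics the namespace check; it is not a schema validator.

```python
import xml.etree.ElementTree as ET

# The XTM 1.0 namespace, taken here as the schema's target namespace.
XTM_NS = "http://www.topicmaps.org/xtm/1.0/"


def namespace_of(elem):
    """Return the namespace URI of an element, or None if it is unqualified."""
    tag = elem.tag
    if tag.startswith("{"):
        return tag[1:].split("}", 1)[0]
    return None


def allowed_by_other_wildcard(elem, target_ns=XTM_NS):
    """Mimic the ##other namespace test (hypothetical helper, not a validator).

    Accepts elements that are qualified and whose namespace differs
    from the target namespace.
    """
    ns = namespace_of(elem)
    return ns is not None and ns != target_ns
```

With this check an XHTML <p> passes, while an XTM <topic> or an element
with no namespace at all does not.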
> > If no correct
> > handler is available, then it is an "attachment" of data. What the
> > application chooses to do with the data is really up to it. e.g it might
> > allow the data to be opened in an XML editor (remember I am only
> > proposing XML data, nothing else) or it might render the XML in a plain
> > text view (as IE and Mozilla do) or it might offer the user the
> > opportunity to download the XML and open it in an application of their
> > choice.
>
> No offense intended, but none of these proposals do anything "intelligent"
> with the content, nor could they. Absent not only a schema but a processing
> model for any given content, it's impossible to do anything intelligent
> with it. If you said, "all XHTML content should be displayed as if in
> a Web browser", that'd be one thing. But it's that kind of statement you'd
> have to make about every bit of content, and you'd have to have a way of
> stating it in a machine-understandable way. Popping up a dialog box for
> each instance of embedded content (there could be hundreds), and opening
> up that markup in an editor is about the last thing I'd want to force upon
> a user.
>
Firstly, I still don't see anything different between that and the way
in which resourceRef is handled now. Or indeed the way in which
resourceData would have to be handled in the face of encoded binary.
Secondly, the whole point is that the XTM parser does not have the
"intelligence"; the application does. And the application has a more
constrained environment than the XTM parser. What's the big deal with
that? That's the way all systems are built. Your XML parser doesn't
understand XTM, but the XTM parser you built on top of it works with a
more constrained environment, building objects as a result of parsing
the XTM. It's the same thing.
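
A rough sketch of that layering, in Python. It assumes the proposed
extension (XML children inside resourceData, which the XTM 1.0 DTD does
not allow today), and the ResourceData class and function names are
invented for illustration. The parser layer only flags the content and
hands it over as a string; it does not interpret it.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

XTM_NS = "http://www.topicmaps.org/xtm/1.0/"


@dataclass
class ResourceData:
    is_xml: bool   # flag set by the parser layer
    content: str   # plain text, or the embedded XML serialized as a string


def read_resource_data(elem):
    """Flag XML content and expose it as a string; do nothing else with it."""
    children = list(elem)
    if not children:
        return ResourceData(False, elem.text or "")
    # Serialize leading text plus each child element (tails are included).
    parts = [elem.text or ""]
    parts += [ET.tostring(child, encoding="unicode") for child in children]
    return ResourceData(True, "".join(parts).strip())


def parse_occurrences(xtm_text):
    """Return the resourceData of every occurrence, in document order."""
    root = ET.fromstring(xtm_text)
    return [read_resource_data(rd) for rd in root.iter(f"{{{XTM_NS}}}resourceData")]


# Hypothetical sample: the second occurrence uses the proposed extension.
SAMPLE = """<topicMap xmlns="http://www.topicmaps.org/xtm/1.0/">
  <topic id="t1">
    <occurrence>
      <resourceData>plain text note</resourceData>
    </occurrence>
    <occurrence>
      <resourceData><p xmlns="http://www.w3.org/1999/xhtml">rich <b>text</b></p></resourceData>
    </occurrence>
  </topic>
</topicMap>"""

for item in parse_occurrences(SAMPLE):
    print(item.is_xml, item.content)
```

What happens to a flagged string afterwards is the application's
business, exactly as with whatever sits at the end of a resourceRef.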
> >>It's just an extremely slippery slope. I've thought quite a number of
> >>times about creating a combo-DTD mixing XTM and XHTML, where XHTML is
> >>the document element and XTM appears as a single block at the end.
> >>There'd be a lot of interesting applications for something like this,
> >>but it would ruin interchange of XTM if it actually became popular, as
> >>suddenly we'd have accidentally upped the ante on what was required of
> >>XTM applications, something I've assiduously tried to avoid. I've never
> >>believed the idea that sending around well-formed XML was going to
> >>catch on much, simply because the ground assumption of an application
> >>able to *correctly* process such documents would be enormous, and hell,
> >>the Web community can't even get Netscape or IE to work correctly and
> >>reliably with HTML after many years of trying.
> >
> > Well, firstly it is not true that Netscape and IE do not work correctly
> > and reliably with XHTML or HTML 4 as long as you bother to put the
> > DOCTYPE at the top and so put the browsers into "strict" mode, but that's
> > another issue.
>
> I don't think it is at all. Your experience of the Web is apparently much
> different than most people's. While *some* of the major web sites seem to
> operate correctly for me, perhaps the majority (>50%), but there are a lot
> of sites that are completely broken, have inaccessible content, can't be
> browsed without cookies, have funky JavaScript menus or forms that don't
> work, whatever. I use Netscape on linux and OSX and the Web is often broken.
> Better than it was five years ago, but not by much. Now, it's usually not
> the XHTML or HTML 4 that's the problem, it's all the extras, which is my
> point. And so long as we can't create a reliable baseline for XTM, I don't
> think we should be proposing raising the bar on application requirements.
>
Now that I see your point, I will concede that JavaScript/CSS support is
a mess. But it is a whole other point. The only additional requirement
on an XTM parser will be that it recognise and flag XML data. Not that
it does anything with it; that is left to the application layer, where
those kinds of decisions belong.
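
As an illustration of the application-layer side, here is a small Python
sketch that picks a handler from the namespace of the flagged XML and
falls back to treating the data as an opaque attachment. The handler
table and function names are hypothetical; a real application would
register whatever handlers it actually supports.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"


def render_xhtml(xml_text):
    """Toy handler: reduce XHTML to its text content."""
    return "".join(ET.fromstring(xml_text).itertext())


# Hypothetical registry: namespace URI -> handler the application provides.
HANDLERS = {XHTML_NS: render_xhtml}


def root_namespace(xml_text):
    """Namespace URI of the root element, or None if it is unqualified."""
    tag = ET.fromstring(xml_text).tag
    return tag[1:].split("}", 1)[0] if tag.startswith("{") else None


def handle(xml_text):
    """Dispatch on namespace; unknown content is passed through untouched."""
    handler = HANDLERS.get(root_namespace(xml_text))
    if handler is None:
        return ("attachment", xml_text)
    return ("handled", handler(xml_text))
```

An application with no SVG handler simply gets ("attachment", ...) back
for SVG content and can offer it for download or show it as plain text.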
> > Its worth repeating that I am not suggesting that XTM be embedded in
> > XHTML or in any other form of markup. Nor am I suggesting that XML from
> > a namespace other than XTM 1.0 appear anywhere but in the content of a
> > resourceData element.
>
> I understand that. I have less a problem with the former than the latter,
> as at least the <topicMap> element and its content remains inviolate.
>
> Perhaps this as a better line of questioning: if we were to allow XML
> markup within <resourceData>:
>
> 1. how would you propose it be validated? How would you attach one of
> the three schema languages in a way that would tell them the context
> of the content?
Your proposal of properties of the occurrence type (your 'facets' as
opposed to my 'facets' or XTM's 'facets' ;-) ) makes sense. A 'content
schema' facet would do the trick. The application could then decide if
this is a schema language it can process and if so validate the content
against the specified schema.
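
A minimal sketch of that decision in Python. Everything here is assumed
for illustration: the OccurrenceType fields, the schema-language labels,
the example.org URI, and above all the validator, which is a stand-in
that only checks well-formedness rather than validating against a schema.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class OccurrenceType:
    name: str
    content_schema: Optional[str] = None    # the 'content schema' facet (a URI)
    schema_language: Optional[str] = None   # label the application understands


# A validator takes (xml_text, schema_uri) and reports success.
Validator = Callable[[str, str], bool]


def well_formed_only(xml_text, schema_uri):
    """Stand-in validator: checks well-formedness only, ignores the schema."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


def check_content(occ_type, xml_text, validators: Dict[str, Validator]):
    """Validate only if the application can process the schema language."""
    if occ_type.content_schema is None:
        return "no-schema"
    validator = validators.get(occ_type.schema_language)
    if validator is None:
        return "unsupported-schema-language"
    ok = validator(xml_text, occ_type.content_schema)
    return "valid" if ok else "invalid"


note_type = OccurrenceType("note", "http://example.org/schemas/note.rng", "relaxng")
print(check_content(note_type, "<note/>", {"relaxng": well_formed_only}))
```

An application that cannot process the named schema language still has
the content as a string; it just skips validation.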
> 2. how would you propose correct application behaviours be attached
> so that the user experience across TM applications be similar?
> (Stylesheets? Behaviour sheets? etc.)
That's not my concern. Application behaviour across TM applications is
completely different. Just compare OKS and K42. TM parsers would be
consistent because the XML content feature would be part of the SAM and
a conformant parser would be required to flag the resourceData as being
XML content and make that content available as a string.
> 3. which specific markup languages would you permit? Anything at all?
Anything at all. There is probably a good argument for disallowing XTM,
so I would be prepared to say anything except XTM.
> 4. what would you say to application vendors who don't want to have
> to invest in supporting whatever variety of markup you're proposing,
> and that an XTM document is no longer necessarily simply interpreted?
>
Which vendors ? TM parser vendors ? TM browser vendors ? To the former,
there is no change. To the latter, their support for a wide range of
different content types would be a differentiator between them and their
competitors.
> As a start, we could try rich text via a very small subset of XHTML. We
> could allow a subset of presentational markup (<b>, <i>, <tt>) and a
> subset of linking (<a> with only 'id' and 'href' -- in HTML 4, <a> has
> 29 attributes). No applets, tables, no lists, no client-side imagemaps, etc.
> Now, when an <a href> points at a <a id> within a different <resourceData>,
> what should the application do? Haven't we created another linking layer
> on top of the XTM linking? And this is just one very small issue. We
> would have to write up a specification detailing what should happen for
> each of the HTML/XHTML behaviours when occurring within a <resourceData>
> context. Some would be easy, some not.
>
This is completely irrelevant to my suggestion. My suggestion has
nothing to do with rendering or presentation. There are no constraints
for XTM presentation. There are no requirements for what resources can
or cannot be pointed at by a resourceRef, there are no constraints on
the inclusion of unparsed entities or base-64 encoded binary in
resourceData. There is a large degree of freedom for XTM authors
already. I am simply asking for the freedom to embed structured data
represented in XML in resourceData.
> How many of the current topic map vendors want to start supporting XHTML
> markup? How many would be willing to support all of it? IOW, within their
> applications all support what Netscape and IE claim to support? Then we
> can move on to SVG, MathML, SMIL, and the rest. Opening the door to
> arbitrary markup should only happen after there's at least support for
> some beginning set.
>
> I just can't imagine this being a productive move for this community
> and the advancement of XTM. It wouldn't promote communication, it would
> just add some fancy features to topic map applications in a very
> proprietary way.
>
Eh? XML (a W3C rec) inside XTM (an ISO standard) == proprietary ? What
is proprietary about it ?
Look, I can already point to the stuff, and if I click on the link what
happens ? Is it defined by XTM ? No, of course it isn't. So what is the
difference between that and having it inline ? The only difference is a
more convenient standard that acknowledges that not every data structure
needs to be represented as topic map structures and that allows authors
the freedom to put structured data into their topic maps.
Cheers,
Kal