[topicmapmail] Fragmented XTM for web metadata, and some ontology?
Murray Altheim
m.altheim@open.ac.uk
Sun, 29 Jun 2003 22:05:29 +0100
Kal Ahmed wrote:
> Murray Altheim wrote:
[...]
> Why is it any different to putting escaped XML in resourceData to get
> around this limitation. Or putting base-64 encoded binary in
> resourceData ? XTM 1.0 doesn't prevent either of those things, but you
> do not consider that to be broken.
Oh, but I do. Being merely valid doesn't make a document interpretable.
If a processor comes upon "escaped" XHTML markup or a GIF image encoded
as base-64, yes, it's valid, but it's only going to be correctly processed
by applications that know what to do with that. And unless we've decided to
raise the bar on all XTM applications to require processing (or even
understanding) of escaped content, it's broken. By definition. It don't work,
and the user experience is going to be that something is either wrong or
missing or both.
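
As a rough illustration of "valid but not interpretable", here is a minimal Python sketch. The sample document, the function name resource_data_strings, and the use of Python's standard xml.etree parser are all assumptions made for the example, not part of any XTM toolkit. It suggests that a generic processor gets nothing but character data back from <resourceData>, with no hint that one string is escaped markup and the other the start of a base-64 encoded GIF.

```python
import base64
import xml.etree.ElementTree as ET

XTM_NS = "http://www.topicmaps.org/xtm/1.0/"

# Hypothetical sample: one occurrence carries escaped XHTML,
# the other the first bytes of a GIF encoded as base-64.
DOC = """<topicMap xmlns="http://www.topicmaps.org/xtm/1.0/">
  <topic id="t1">
    <occurrence>
      <resourceData>&lt;p&gt;Some &lt;b&gt;bold&lt;/b&gt; text&lt;/p&gt;</resourceData>
    </occurrence>
    <occurrence>
      <resourceData>R0lGODlh</resourceData>
    </occurrence>
  </topic>
</topicMap>"""


def resource_data_strings(xtm_text):
    """Return the character data of every resourceData element."""
    root = ET.fromstring(xtm_text)
    return [el.text or "" for el in root.iter("{%s}resourceData" % XTM_NS)]


strings = resource_data_strings(DOC)

# Both values arrive as plain strings; the parser has un-escaped the
# first one, but nothing marks it as markup to be rendered.
print(strings[0])                    # <p>Some <b>bold</b> text</p>
# Only an application that already knows this is base-64 can decode it.
print(base64.b64decode(strings[1]))  # b'GIF89a'
```

Both strings are perfectly legal content, yet displaying either one verbatim gives the user something that looks wrong or missing.
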
> Neither does XTM 1.0 stop you from
> using a resourceRef to reference data that is in a format that the
> application layer cannot grok. Nothing prevents Shockwave, PDF, MS Word
> or any other proprietary format appearing at the end of a resourceRef,
> so what's the difference?
The difference is that a link outside of an XTM document is exactly that:
a link. How topic map applications handle external links can be handled
quite easily by treating them as endpoints (i.e., "we don't know or care
what's at the other end, but here it is"), whereas requiring applications
to do something with or even correctly parse or validate arbitrary markup
can be problematic, assuming that it's very unlikely that that markup is
going to have an attached schema.
>> >>I argued this same case to the W3C HTML Working Group for years, where we
>> >>had Dave Raggett and others advocating that we do away with having an HTML
>> >>DTD at all and just defining things in terms of "tag sets". Had this come
>> >>to pass, we'd never have had an XHTML DTD at all, just some weird notion of a
>> >>well-formed "XHTML" document where one could intermix anything anyone
>> >>wanted anytime -- complete freedom, and completely useless to anyone
>> >>except monopolists who import and export their own brand of proprietary
>> >>markup muck (export "HTML" from MS Word to see what I mean). If you look
>> >>at the latest XHTML 2.0 draft [1] you'll see they're still trying to
>> >>figure out some way of specifying a language without using a schema.
>> >>[okay, I should have wrapped this in <rant>. I just don't want us to go
>> >>down that same road.]
>> >
>> > But I am not suggesting complete mix-n-match. I am suggesting one
>> > particular place in the XTM DTD where XML from other namespaces would be
>> > allowed. There is no question (in my mind) of changing the "XTM on top"
>> > approach of the DTD that says that the topicMap element should be the
>> > document element, nor is there any question of allowing other markup to
>> > appear anywhere else than inside resourceData. I know that other people
>> > have suggested such changes, but I leave it to them to defend that :)
>>
>>But allowing arbitrary markup even within <resourceData> means that XTM
>>applications *all* need to unambiguously and consistently process whatever
>>that markup happens to be. So let's say, for sake of argument, we allow
>>a subset of XHTML markup to appear there. We're not talking things that
>>appear in HTML's <head>, we're not attaching CSS stylesheets, we're not
>>using <base> or <applet> or <object> or JavaScript or even <table>. Even
>>with a *really* constricted declared content, we'd still be requiring all
>>XTM applications to correctly process it, as for those processors that
>>didn't, things could go completely haywire. Content might be missing,
>>disappear from the screen, words be stuck together because of missing
>>implied whitespace (because of now-ignored markup), and the user experience
>>might rely on that markup to make sense.
>
> That's exactly what we would not be requiring. XML content of
> resourceData would be flagged up by the XTM parser as such. The XML data
> will then be available as a string to the application layer. The
> application layer may then choose to parse the XML and/or validate it
> against the schema (small s) identified by a facet of the occurrence
> type. The schema and/or namespace of the XML will allow the application
> to determine the correct handler to be used for the data.
This was bandied about years ago within the W3C, but nobody has yet come
up with a way to attach a schema to an arbitrary chunk of content within
another chunk of content. XML Schema claims to make this possible, but it
doesn't make it simple enough that anyone in their right mind would ever
do it. There are so few people capable of writing an XML Schema that would
reliably constrain such content that a Venn diagram crossing them with our
pool of topic map authors would reveal few, if any, capable of performing
such feats. Could anyone on this list do it? I couldn't.
> If no correct
> handler is available, then it is an "attachment" of data. What the
> application chooses to do with the data is really up to it. e.g it might
> allow the data to be opened in an XML editor (remember I am only
> proposing XML data, nothing else) or it might render the XML in a plain
> text view (as IE and Mozilla do) or it might offer the user the
> opportunity to download the XML and open it in an application of their
> choice.
No offense intended, but none of these proposals do anything "intelligent"
with the content, nor could they. Absent not only a schema but a processing
model for any given content, it's impossible to do anything intelligent
with it. If you said, "all XHTML content should be displayed as if in
a Web browser", that'd be one thing. But it's that kind of statement you'd
have to make about every bit of content, and you'd have to have a way of
stating it in a machine-understandable way. Popping up a dialog box for
each instance of embedded content (there could be hundreds), and opening
up that markup in an editor is about the last thing I'd want to force upon
a user.
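
To make the disagreement concrete, here is a minimal Python sketch of the kind of dispatch being proposed: pick a handler by the namespace of the embedded XML, and fall back to treating it as an "attachment". Everything named here (HANDLERS, process, render_xhtml, the sample fragments) is hypothetical and chosen only for illustration; the deliberately naive handler stands in for an application that ignores markup it does not understand.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"


def render_xhtml(fragment):
    """Naive handler: drop all markup and keep only the text."""
    return "".join(ET.fromstring(fragment).itertext())


# Hypothetical registry mapping a namespace URI to a handler.
HANDLERS = {XHTML_NS: render_xhtml}


def namespace_of(fragment):
    """Return the namespace URI of the fragment's root element, or ''."""
    tag = ET.fromstring(fragment).tag
    return tag[1:].split("}", 1)[0] if tag.startswith("{") else ""


def process(fragment):
    """Dispatch on namespace; unknown content becomes an 'attachment'."""
    handler = HANDLERS.get(namespace_of(fragment))
    if handler is None:
        return ("attachment", fragment)
    return ("handled", handler(fragment))


known = process('<p xmlns="%s">Hello <b>world</b></p>' % XHTML_NS)
unknown = process('<svg xmlns="http://www.w3.org/2000/svg"/>')
stuck = process('<p xmlns="%s">one<br/>two</p>' % XHTML_NS)

print(known)    # ('handled', 'Hello world')
print(unknown)  # falls through untouched, tagged as 'attachment'
print(stuck)    # ('handled', 'onetwo'): words stuck together
```

The dispatch itself is a few lines. What the sketch cannot supply is the processing model: the fallback does nothing with the content, and the naive handler loses the whitespace that <br/> implied.
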
>>It's just an extremely slippery slope. I've thought quite a number of
>>times about creating a combo-DTD mixing XTM and XHTML, where XHTML is
>>the document element and XTM appears as a single block at the end.
>>There'd be a lot of interesting applications for something like this,
>>but it would ruin interchange of XTM if it actually became popular, as
>>suddenly we'd have accidentally upped the ante on what was required of
>>XTM applications, something I've assiduously tried to avoid. I've never
>>believed the idea that sending around well-formed XML was going to
>>catch on much, simply because the ground assumption of an application
>>able to *correctly* process such documents would be enormous, and hell,
>>the Web community can't even get Netscape or IE to work correctly and
>>reliably with HTML after many years of trying.
>
> Well, firstly it is not true that Netscape and IE do not work correctly
> and reliably with XHTML or HTML 4 as long as you bother to put the
> DOCTYPE at the top and so put the browsers into "strict" mode, but that's
> another issue.
I don't think it is at all. Your experience of the Web is apparently much
different than most people's. While *some* of the major web sites seem to
operate correctly for me, perhaps the majority (>50%), there are a lot
of sites that are completely broken, have inaccessible content, can't be
browsed without cookies, have funky JavaScript menus or forms that don't
work, whatever. I use Netscape on Linux and OS X and the Web is often broken.
Better than it was five years ago, but not by much. Now, it's usually not
the XHTML or HTML 4 that's the problem, it's all the extras, which is my
point. And so long as we can't create a reliable baseline for XTM, I don't
think we should be proposing raising the bar on application requirements.
> It's worth repeating that I am not suggesting that XTM be embedded in
> XHTML or in any other form of markup. Nor am I suggesting that XML from
> a namespace other than XTM 1.0 appear anywhere but in the content of a
> resourceData element.
I understand that. I have less a problem with the former than the latter,
as at least the <topicMap> element and its content remains inviolate.
Perhaps this is a better line of questioning: if we were to allow XML
markup within <resourceData>:
  1. how would you propose it be validated? How would you attach one of
     the three schema languages in a way that would tell them the context
     of the content?
  2. how would you propose correct application behaviours be attached
     so that the user experience across TM applications be similar?
     (Stylesheets? Behaviour sheets? etc.)
  3. which specific markup languages would you permit? Anything at all?
  4. what would you say to application vendors who don't want to have
     to invest in supporting whatever variety of markup you're proposing,
     and about the fact that an XTM document would no longer necessarily
     be simple to interpret?
As a start, we could try rich text via a very small subset of XHTML. We
could allow a subset of presentational markup (<b>, <i>, <tt>) and a
subset of linking (<a> with only 'id' and 'href' -- in HTML 4, <a> has
29 attributes). No applets, tables, no lists, no client-side imagemaps, etc.
Now, when an <a href> points at an <a id> within a different <resourceData>,
what should the application do? Haven't we created another linking layer
on top of the XTM linking? And this is just one very small issue. We
would have to write up a specification detailing what should happen for
each of the HTML/XHTML behaviours when occurring within a <resourceData>
context. Some would be easy, some not.
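
A minimal Python sketch of the "very small subset" idea follows, under stated assumptions: the ALLOWED table, the function names check_fragment and cross_links, the occurrence keys, and the un-namespaced sample fragments are all hypothetical. It checks a fragment against the <b>, <i>, <tt>, <a id/href> whitelist and then detects the case raised above, an <a href> in one <resourceData> pointing at an <a id> in another.

```python
import xml.etree.ElementTree as ET

# Hypothetical whitelist: element name -> permitted attribute names.
ALLOWED = {"b": set(), "i": set(), "tt": set(), "a": {"id", "href"}}


def parse(fragment):
    """Wrap mixed content in a dummy root so it parses as one element."""
    return ET.fromstring("<wrapper>%s</wrapper>" % fragment)


def check_fragment(fragment):
    """Return a list of whitelist violations found in one fragment."""
    problems = []
    root = parse(fragment)
    for el in root.iter():
        if el is root:
            continue
        if el.tag not in ALLOWED:
            problems.append("element not allowed: " + el.tag)
            continue
        for name in el.attrib:
            if name not in ALLOWED[el.tag]:
                problems.append("attribute not allowed: %s/@%s" % (el.tag, name))
    return problems


def cross_links(fragments):
    """Find '#id' links whose target lives in a different fragment.

    fragments maps a (hypothetical) occurrence key to its markup.
    Returns a list of (source key, target key, anchor id) tuples.
    """
    parsed = {key: parse(text) for key, text in fragments.items()}
    owner = {}
    for key, root in parsed.items():
        for a in root.iter("a"):
            if "id" in a.attrib:
                owner[a.attrib["id"]] = key
    found = []
    for key, root in parsed.items():
        for a in root.iter("a"):
            href = a.attrib.get("href", "")
            target = owner.get(href[1:]) if href.startswith("#") else None
            if target is not None and target != key:
                found.append((key, target, href[1:]))
    return found


fragments = {
    "occ1": 'See <a href="#note">the note</a>.',
    "occ2": '<a id="note">A <b>note</b></a>',
}
links = cross_links(fragments)
print(links)  # [('occ1', 'occ2', 'note')]
print(check_fragment('<a href="x" target="_blank">y</a>'))
```

Detecting the cross-fragment link is the easy part. The sketch says nothing about what an application should then do with it, which is exactly the behaviour a specification would have to define.
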
How many of the current topic map vendors want to start supporting XHTML
markup? How many would be willing to support all of it? IOW, would their
applications all support what Netscape and IE claim to support? Then we
can move on to SVG, MathML, SMIL, and the rest. Opening the door to
arbitrary markup should only happen after there's at least support for
some beginning set.
I just can't imagine this being a productive move for this community
and the advancement of XTM. It wouldn't promote communication, it would
just add some fancy features to topic map applications in a very
proprietary way.
Murray
...........................................................................
Murray Altheim http://kmi.open.ac.uk/people/murray/
Knowledge Media Institute
The Open University, Milton Keynes, Bucks, MK7 6AA, UK .
"There's a lot of intelligence out there that you don't
know if it's true or not." -- Anonymous US official
http://news.bbc.co.uk/1/hi/world/middle_east/3014850.stm