[topicmapmail] Fragmented XTM for web metadata, and some ontology?
Kal Ahmed
kal@techquila.com
30 Jun 2003 18:17:45 +0100
On Mon, 2003-06-30 at 01:54, Murray Altheim wrote:
> Kal Ahmed wrote:
> > On Sun, 2003-06-29 at 22:05, Murray Altheim wrote:
> >
> >>Kal Ahmed wrote:
> >> > Murray Altheim wrote:
> >>[...]
> >>
> >>>Why is it any different to putting escaped XML in resourceData to get
> >>>around this limitation? Or putting base-64 encoded binary in
> >>>resourceData ? XTM 1.0 doesn't prevent either of those things, but you
> >>>do not consider that to be broken.
> >>
> >>Oh, but I do. Being merely valid doesn't make a document interpretable.
> >>If a processor comes upon "escaped" XHTML markup or a GIF image encoded
> >>as base-64, yes, it's valid, but it's only going to be correctly processed
> >>by applications that know what to do with that. And unless we've decided to
> >>raise the bar on all XTM applications to require processing (or even
> >>understanding) of escaped content, it's broken. By definition. It don't work,
> >>and the user experience is going to be that something is either wrong or
> >>missing or both.
> >
> > So you see my point that there is nothing that prevents this situation
> > now.
>
> Yes, but as I just said to Peter, so what? If you want to embed base-64
> encoded binaries within XTM, you can do it. But this just makes your
> ability to communicate more difficult, and lessens the chance that your
> XTM documents can be interpreted correctly. You can do that now without
> asking for any changes to XTM, and nobody is going to come and arrest
> you. You have that freedom.
>
I think that I am saying "You can already put data that is not
meaningful to the client application into XTM" as a supporting argument
for my position and you are positing the same as a supporting argument
for your position.
What I mean is that if you have that freedom with non-XML data already,
what is the reason for denying the same freedom to XML data ?
> >>>Neither does XTM 1.0 stop you from
> >>>using a resourceRef to reference data that is in a format that the
> >>>application layer cannot grok. Nothing prevents Shockwave, PDF, MS Word
> >>>or any other proprietary format appearing at the end of a resourceRef,
> >>>so what's the difference ?
> >>
> >>The difference is that a link outside of an XTM document is exactly that:
> >>a link. Topic map applications can handle external links quite easily
> >>by treating them as endpoints (i.e., "we don't know or care
> >>what's at the other end, but here it is"), whereas requiring applications
> >>to do something with or even correctly parse or validate arbitrary markup
> >>can be problematic, assuming that it's very unlikely that that markup is
> >>going to have an attached schema.
> >
> > So what makes a link special ? Why not apply the same logic to XML
> > content in resourceData ?
>
> Because there's a big difference between including arbitrary markup within
> XTM and not doing so. A link is an endpoint -- traversing or not traversing
> it is a fairly simple decision. But if I send you an XTM document and it
> contains WTFYLML, what are you going to do with it? Unless every TM application
> understands WTFYLML, we've lost the ability to effectively interchange topic
> maps. That's not a good thing.
>
But my question is about the handling of that data in client
applications. I have already outlined how an XTM parser would deal with
the data and I understood your objections to be on the grounds that the
applications that use that parser ("higher" in the stack if you like)
won't know how to render/present/otherwise handle the data. That is
true, but my question is why is that any different to the same
application having to render/present/otherwise handle data that is
linked to by a resourceRef ?
> [...]
> >>>If no correct
> >>>handler is available, then it is an "attachment" of data. What the
> >>>application chooses to do with the data is really up to it, e.g. it might
> >>>allow the data to be opened in an XML editor (remember I am only
> >>>proposing XML data, nothing else) or it might render the XML in a plain
> >>>text view (as IE and Mozilla do) or it might offer the user the
> >>>opportunity to download the XML and open it in an application of their
> >>>choice.
> >>
> >>No offense intended, but none of these proposals do anything "intelligent"
> >>with the content, nor could they. Absent not only a schema but a processing
> >>model for any given content, it's impossible to do anything intelligent
> >>with it. If you said, "all XHTML content should be displayed as if in
> >>a Web browser", that'd be one thing. But it's that kind of statement you'd
> >>have to make about every bit of content, and you'd have to have a way of
> >>stating it in a machine-understandable way. Popping up a dialog box for
> >>each instance of embedded content (there could be hundreds), and opening
> >>up that markup in an editor is about the last thing I'd want to force upon
> >>a user.
> >
> > Firstly, I still don't see anything different between that and the way
> > in which resourceRef is handled now. Or indeed the way in which
> > resourceData would have to be handled in the face of encoded binary.
>
> If somebody makes a link to a resource that is of an unknown or
> unprocessable MIME type (from a specific application's POV), that
> topic map is to some extent broken. It can't be used in the way that
> it was intended, and the user experience will perhaps cause misinterpretation
> or at best, simply a blank. There isn't some police force keeping people
> from doing this, but if you're trying to interchange with people you don't
> deliberately choose language they don't know.
>
Surely the same applies to XML data inside a resourceData element.
> > Secondly, the whole point is that the XTM parser does not have the
> > "intelligence", the application does. And the application has a more
> > constrained environment than the XTM parser. What's the big deal with
> > that? That's the way all systems are built. Your XML parser doesn't
> > understand XTM, but the XTM parser you built on top of it works with a
> > more constrained environment, building objects as a result of parsing
> > the XTM. It's the same thing.
>
> XTM is an interchange syntax. If you're not interested in interchange,
> it's up to you to pollute it with anything you like. Within a proprietary
> application, nobody will mind. But for purposes of sharing topic maps
> with other applications, there needs to be a common playing field.
>
Surely in this case XTM is a carrying case for the XML data - the XTM
data is not "polluted" and an XTM parser can still say "There is an
occurrence of this topic of type X, it consists of inline XML data. From
the facets of the occurrence type, it has been asserted that it uses the
schema 'foo'". The topic map still works, you just cannot read the data
that your application cannot handle. Same as if I go to a website that
has Flash on it without the plug-in installed. The XTM is not broken.
The application is not broken. The topic map is not broken. There is
just some data that I cannot understand. At least because it is using a
standard syntax (XML) my parser can say "Yup, that's XML alright" or
"This topic map data is not well-formed", so I have a basic level of
syntactic validation "for free" - all the higher level stuff is punted
where it belongs, up the stack.
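To make that parser-level behaviour concrete, a minimal sketch in Python using only the standard library's xml.etree.ElementTree. It assumes a hypothetical extension of XTM 1.0 in which <resourceData> may contain element content (XTM 1.0 itself allows only PCDATA there). The function name and the (topic id, kind, content) tuples are illustrative, not part of any existing XTM API. The parser never interprets the embedded markup: it only reports whether each occurrence's resourceData is plain text or XML and passes the XML on as a string. The well-formedness check comes for free, because the whole document fails to parse if the embedded XML is not well-formed.

```python
import xml.etree.ElementTree as ET

# XTM 1.0 namespace URI; all other names here are illustrative.
XTM_NS = "http://www.topicmaps.org/xtm/1.0/"
TOPIC = f"{{{XTM_NS}}}topic"
OCCURRENCE = f"{{{XTM_NS}}}occurrence"
RESOURCE_DATA = f"{{{XTM_NS}}}resourceData"


def classify_resource_data(xtm_text):
    """Return (topic_id, kind, content) for each occurrence with resourceData.

    kind is "text" for plain PCDATA (as in XTM 1.0) and "xml" when the
    element has child elements (the hypothetical extension). Raises
    ET.ParseError if the document, embedded XML included, is not well-formed.
    """
    root = ET.fromstring(xtm_text)
    results = []
    for topic in root.iter(TOPIC):
        topic_id = topic.get("id")
        for occurrence in topic.findall(OCCURRENCE):
            rd = occurrence.find(RESOURCE_DATA)
            if rd is None:
                continue  # e.g. a resourceRef occurrence, not inline data
            children = list(rd)
            if children:
                # Serialize the inner content (leading text, child elements
                # and their tails) as an opaque string for the application.
                content = (rd.text or "") + "".join(
                    ET.tostring(child, encoding="unicode") for child in children
                )
                results.append((topic_id, "xml", content))
            else:
                results.append((topic_id, "text", rd.text or ""))
    return results
```

An application higher in the stack can then decide, occurrence by occurrence, whether it knows what to do with the "xml" content or should treat it as an attachment.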
> >>>>It's just an extremely slippery slope. I've thought quite a number of
> >>>>times about creating a combo-DTD mixing XTM and XHTML, where XHTML is
> >>>>the document element and XTM appears as a single block at the end.
> >>>>There'd be a lot of interesting applications for something like this,
> >>>>but it would ruin interchange of XTM if it actually became popular, as
> >>>>suddenly we'd have accidentally upped the ante on what was required of
> >>>>XTM applications, something I've assiduously tried to avoid. I've never
> >>>>believed the idea that sending around well-formed XML was going to
> >>>>catch on much, simply because the ground assumption of an application
> >>>>able to *correctly* process such documents would be enormous, and hell,
> >>>>the Web community can't even get Netscape or IE to work correctly and
> >>>>reliably with HTML after many years of trying.
> >>>
> >>>Well, firstly it is not true that Netscape and IE do not work correctly
> >>>and reliably with XHTML or HTML 4 as long as you bother to put the
> >>>DOCTYPE at the top and so put the browsers into "strict" mode, but that's
> >>>another issue.
> >>
> >>I don't think it is at all. Your experience of the Web is apparently much
> >>different from most people's. *Some* of the major web sites seem to
> >>operate correctly for me, perhaps the majority (>50%), but there are a lot
> >>of sites that are completely broken, have inaccessible content, can't be
> >>browsed without cookies, have funky JavaScript menus or forms that don't
> >>work, whatever. I use Netscape on linux and OSX and the Web is often broken.
> >>Better than it was five years ago, but not by much. Now, it's usually not
> >>the XHTML or HTML 4 that's the problem, it's all the extras, which is my
> >>point. And so long as we can't create a reliable baseline for XTM, I don't
> >>think we should be proposing raising the bar on application requirements.
> >
> > Now that I see your point, I will concede that JavaScript/CSS support is
> > a mess. But it is a whole other point. The only additional requirement
> > on an XTM parser will be that it recognise and flag XML data. Not that
> > it does anything with it, that is left to the application layer, where
> > those kind of decisions belong.
>
> Yes, certainly, that is where these decisions belong.
>
> >>>It's worth repeating that I am not suggesting that XTM be embedded in
> >>>XHTML or in any other form of markup. Nor am I suggesting that XML from
> >>>a namespace other than XTM 1.0 appear anywhere but in the content of a
> >>>resourceData element.
> >>
> >>I understand that. I have less a problem with the former than the latter,
> >>as at least the <topicMap> element and its content remains inviolate.
> >>
> >>Perhaps this is a better line of questioning: if we were to allow XML
> >>markup within <resourceData>:
> >>
> >> 1. how would you propose it be validated? How would you attach one of
> >> the three schema languages in a way that would tell them the context
> >> of the content?
> >
> > Your proposal of properties of the occurrence type (your 'facets' as
> > opposed to my 'facets' or XTM's 'facets' ;-) makes sense. A 'content
> > schema' facet would do the trick. The application could then decide if
> > this is a schema language it can process and if so validate the content
> > against the specified schema.
>
> So we'd have to come up with a whole way of attaching a schema to a
> specific block of markup within <resourceData> and standardize it so
> that all vendors use the same methodology.
>
Yes. That's vital. That is why I haven't just gone and created KalTM or
an XTM XML Schema with the right bits tweaked. I don't want to create a
custom extension, I would really like to see if there is a general
requirement for this feature and work on a standard method for
supporting it.
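A sketch of how an application might act on the "content schema" facet discussed above, assuming the facet has already been read off the occurrence type. The facet's shape (a schema-language URI plus a schema location), the validator registry and the three outcomes are all assumptions made for illustration; XTM 1.0 defines no such facet. What it shows is that the decision stays in the application: if no schema is asserted, or the schema language is not one this application supports, the content falls back to being treated as an opaque attachment.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

# Hypothetical validator signature: (xml_content, schema_location) -> is_valid.
Validator = Callable[[str, str], bool]


@dataclass(frozen=True)
class ContentSchemaFacet:
    """Hypothetical facet attached to an occurrence type."""
    language: str  # URI identifying the schema language
    location: str  # URI of the schema document itself


def handle_xml_occurrence(
    content: str,
    facet: Optional[ContentSchemaFacet],
    validators: Dict[str, Validator],
) -> str:
    """Return "valid", "invalid" or "attachment" for inline XML content."""
    if facet is None:
        return "attachment"  # no schema asserted: treat the data as opaque
    validator = validators.get(facet.language)
    if validator is None:
        return "attachment"  # schema language unsupported by this application
    return "valid" if validator(content, facet.location) else "invalid"
```

A real application would register actual processors (for W3C XML Schema, RELAX NG or DTDs) in the validators dictionary; keying the registry by schema language is just one way to let each application declare what it can handle.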
> >> 2. how would you propose correct application behaviours be attached
> >> so that the user experience across TM applications be similar?
> >> (Stylesheets? Behaviour sheets? etc.)
> >
> > That's not my concern. Application behaviour across TM applications is
> > completely different. Just compare OKS and K42. TM parsers would be
> > consistent because the XML content feature would be part of the SAM and
> > a conformant parser would be required to flag the resourceData as being
> > XML content and make that content available as a string.
>
> As a designer of systems meant to interchange across applications, it
> should be your concern. At the level of advocating a change to XTM that
> would require TM applications to be able to correctly process any XML
> content, we'd have to set some guidelines. Otherwise, we'd have all
> manner of confusion.
>
This is important. I keep saying this, but perhaps not forcefully
enough:
I am not asking for XTM applications to be *required* to do anything.
I am asking as a topic map creator for the freedom to exercise my own
judgment regarding the data I insert into a resourceData element. When I
do that I don't do so blind to the realities of the operating
environment. As an author I should know my target audience and I should
create content accordingly. Why should I be told that I am only allowed
PCDATA as inline content in XTM ?
> >> 3. which specific markup languages would you permit? Anything at all?
> >
> > Anything at all. There is probably a good argument for disallowing XTM,
> > so I would be prepared to say anything except XTM.
> >
> >
> >> 4. what would you say to application vendors who don't want to have
> >> to invest in supporting whatever variety of markup you're proposing,
> >> and that an XTM document is no longer necessarily simply interpreted?
> >
> > Which vendors ? TM parser vendors ? TM browser vendors ? To the former,
> > there is no change. To the latter, their support for a wide range of
> > different content types would be a differentiator between them and their
> > competitors.
>
> This is precisely what I'm trying to avoid, the competition over features
> derived from differentiations in XTM markup. So we end up with a big
> competition between the TM application vendors, each trying to outdo the
> next on which XML markup languages they support, and how well they support
> them. Great. If I'm vendor A and I implement say, WTFYLML support, and if
> I know vendors B, C, and D don't support it, I can be absolutely sure that
> my customers have a richer user experience than those using the products
> of the other vendors. And the winners? One company, in the end. The losers:
> the rest of us, as we can never be sure that our topic map documents will be
> interchanged reliably any longer, that a document created on one will
> process correctly on another.
>
THEY WILL PROCESS CORRECTLY AS XTM DOCUMENTS
The only thing that will not process is the content.
> Now, that may sound to some like good ol' American-style competition, but
> it completely sucks from a user perspective. This is what we have with MS
> Word. Do we really want that for an interchange syntax for topic maps?
> (Noting that any vendor may very well create their own proprietary document
> formats without asking anyone.)
>
ONLY XTM SYNTAX DECLARES TOPIC MAP CONSTRUCTS
All the other XML is isolated within <resourceData> elements.
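The condition proposed earlier in the thread (anything except XTM inside <resourceData>) can be checked mechanically. A minimal Python sketch using xml.etree.ElementTree; the function name is hypothetical. One practical consequence it surfaces: XTM documents usually declare the XTM namespace as the default, so unprefixed markup placed inside <resourceData> inherits that namespace. Embedded content therefore has to declare its own namespace, or it is indistinguishable from (and flagged as) XTM markup.

```python
import xml.etree.ElementTree as ET

XTM_NS = "http://www.topicmaps.org/xtm/1.0/"
XTM_PREFIX = f"{{{XTM_NS}}}"
RESOURCE_DATA = XTM_PREFIX + "resourceData"


def xtm_elements_in_resource_data(xtm_text):
    """Return the tags of XTM-namespace elements nested inside resourceData."""
    root = ET.fromstring(xtm_text)
    found = []
    for rd in root.iter(RESOURCE_DATA):
        for el in rd.iter():
            # iter() includes rd itself; skip it, and avoid reporting an
            # element twice when resourceData elements are nested.
            if el is not rd and el.tag.startswith(XTM_PREFIX) and el not in found:
                found.append(el)
    return [el.tag for el in found]
```

An empty result means the embedded content stays outside the XTM vocabulary, so only XTM syntax declares topic map constructs.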
> >>As a start, we could try rich text via a very small subset of XHTML. We
> >>could allow a subset of presentational markup (<b>, <i>, <tt>) and a
> >>subset of linking (<a> with only 'id' and 'href' -- in HTML 4, <a> has
> >>29 attributes). No applets, tables, no lists, no client-side imagemaps, etc.
> >>Now, when an <a href> points at a <a id> within a different <resourceData>,
> >>what should the application do? Haven't we created another linking layer
> >>on top of the XTM linking? And this is just one very small issue. We
> >>would have to write up a specification detailing what should happen for
> >>each of the HTML/XHTML behaviours when occurring within a <resourceData>
> >>context. Some would be easy, some not.
> >
> > This is completely irrelevant to my suggestion. My suggestion has
> > nothing to do with rendering or presentation. There are no constraints
> > for XTM presentation. There are no requirements for what resources can
> > or cannot be pointed at by a resourceRef, there are no constraints on
> > the inclusion of unparsed entities or base-64 encoded binary in
> > resourceData. There is a large degree of freedom for XTM authors
> > already. I am simply asking for the freedom to embed structured data
> > represented in XML in resourceData.
>
> My point was that if we can't expect something so elementally simple
> as a subset of XHTML to be supported amongst all the vendors, how can
> you expect *arbitrary* markup to be reliably supported.
>
> Arbitrary markup is complete death to an interchange syntax.
>
I am not proposing arbitrary markup for topic maps, only for
resourceData.
> >>How many of the current topic map vendors want to start supporting XHTML
> >>markup? How many would be willing to support all of it? IOW, within their
> >>applications, support everything Netscape and IE claim to support? Then we
> >>can move on to SVG, MathML, SMIL, and the rest. Opening the door to
> >>arbitrary markup should only happen after there's at least support for
> >>some beginning set.
> >>
> >>I just can't imagine this being a productive move for this community
> >>and the advancement of XTM. It wouldn't promote communication, it would
> >>just add some fancy features to topic map applications in a very
> >>proprietary way.
> >
> > Eh? XML (a W3C rec) inside XTM (an ISO standard) == proprietary ? What
> > is proprietary about it ?
>
> Come on, Kal. Please. Microsoft creates MS Word documents using Unicode
> characters and probably runs their computers on ANSI standard 110 volts
> too. Their HTML export format might even be well-formed XML, but nobody
> in their right mind would attempt to deal with it (especially since, just
> like RTF, they deliberately keep changing the "spec"). Proprietary is
> anything that isn't standardized, and is created with the express purpose
> of differentiating a vendor within the marketplace. XML isn't a markup
> language, it's a meta-language. With it you can create standards as well
> as proprietary markup. Arbitrary markup is proprietary markup, especially
> when it's done by a vendor.
>
So what ? This is nothing to do with the structure of topic maps. It is
only the data pointed to by topics in a topic map. I can *right now*
create a topic map that points to arbitrary formats of data that you
will have no chance of interpreting. I just do it with resourceRef
instead of resourceData. I am asking for a facility that I can use
responsibly or can abuse. Just as I can with XTM as it stands.
> Say for example that Ontopia begins putting custom XML markup in their
> XTM documents and has their applications support it. They do like Microsoft
> has done with RTF and either don't publish a spec for it, or they don't
> keep the spec up to date, or they just change it pretty continuously. And
> so Mondeca comes along and wants to be able to open up and process Ontopia
> XTM documents so that the user experience is similar. It's in Ontopia's
> interest to make that as difficult as possible, and so just after Mondeca
> updates their software to process Ontopia TM documents, suddenly there's
> an update.
>
But the topic map will be the same. Only the occurrences would be
unreadable to other applications. Just as if they created resourceRef
pointers to their own markup language.
> This is a scenario that promotes disintegration of interchange, not
> improvement. The good solution to this is to allow vendors to do what
> they want in creating applications, binary formats for their documents,
> complete freedom (since there's no police force here to enforce anything
> less), but keep the interchange syntax free of that kind of proprietary
> content.
>
> > Look, I can already point to the stuff, and if I click on the link what
> > happens ? Is it defined by XTM ? No of course it isn't. So what is the
> > difference between that and having it inline ? The only difference is a
> > more convenient standard that acknowledges that not every data structure
> > needs to be represented as topic map structures and that allows authors
> > the freedom to put structured data into their topic maps.
>
> I look at that as the freedom to babble incoherently.
Much like email :)
> It's not convenient
> when the person you're talking to can't understand you. Standards aren't
> about convenience, they're about clear and unambiguous communication. This
> shouldn't be part of an interchange syntax, it should be part of a vendor's
> proprietary storage format, as you say, so that they can differentiate
> themselves from their competition.
>
I agree with you, but I don't see why you do not allow an escape hatch
for the "proprietary" which may have perfectly valid business
justification for its presence in a topic map. As I have repeatedly
said, this is not intended to provide a way for authors to modify the
structure of the topic map using arbitrary markup. It is simply a way
for them to provide richer resourceData to their audience. It is a
powerful tool which can be used responsibly or irresponsibly. I think we
are all grown up enough to make the decisions that are right for us.
Cheers,
Kal