[topicmapmail] Fragmented XTM for web metadata, and some ontology?

Murray Altheim m.altheim@open.ac.uk
Mon, 30 Jun 2003 01:54:11 +0100


Kal Ahmed wrote:
> On Sun, 2003-06-29 at 22:05, Murray Altheim wrote:
> 
>>Kal Ahmed wrote:
>> > Murray Altheim wrote:
>>[...]
>>
>>>Why is it any different to putting escaped XML in resourceData to get
>>>around this limitation? Or putting base-64 encoded binary in
>>>resourceData ? XTM 1.0 doesn't prevent either of those things, but you
>>>do not consider that to be broken.
>>
>>Oh, but I do. Being merely valid doesn't make a document interpretable.
>>If a processor comes upon "escaped" XHTML markup or a GIF image encoded
>>as base-64, yes, it's valid, but it's only going to be correctly processed
>>by applications that know what to do with that. And unless we've decided to
>>raise the bar on all XTM applications to require processing (or even
>>understanding) of escaped content, it's broken. By definition. It don't work,
>>and the user experience is going to be that something is either wrong or
>>missing or both.
>
> So you see my point that there is nothing that prevents this situation
> now.

Yes, but as I just said to Peter, so what? If you want to embed base-64
encoded binaries within XTM, you can do it. But this just makes your
ability to communicate more difficult, and lessens the chance that your
XTM documents can be interpreted correctly. You can do that now without
asking for any changes to XTM, and nobody is going to come and arrest
you. You have that freedom.
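
To make the base-64 case concrete, here is a minimal sketch in Python.
It assumes only the XTM 1.0 namespace and the standard library; the GIF
bytes and the "Note" string are made-up sample data. It shows that a
processor sees nothing but a string in <resourceData>, and that a short
plain-text value can also happen to be decodable base-64, so nothing in
the document itself says which reading was intended.

```python
import base64
import xml.etree.ElementTree as ET

XTM = "http://www.topicmaps.org/xtm/1.0/"

# Made-up sample payload: the first bytes of a GIF file, base-64 encoded.
payload = base64.b64encode(b"GIF89a\x01\x00\x01\x00").decode("ascii")

binary_doc = f'<resourceData xmlns="{XTM}">{payload}</resourceData>'
text_doc = f'<resourceData xmlns="{XTM}">Note</resourceData>'

binary_elem = ET.fromstring(binary_doc)
text_elem = ET.fromstring(text_doc)

# Both parse fine; the parser hands back plain strings in both cases.
binary_text = binary_elem.text
plain_text = text_elem.text

# Only an application that already knows the convention can recover the image.
recovered = base64.b64decode(binary_text)

# The plain-text note is also decodable as base-64, so decodability alone
# cannot tell a receiver which interpretation the author meant.
accidental = base64.b64decode(plain_text, validate=True)
```

Both documents parse, but the interpretation lives entirely outside the
XTM.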

>>>Neither does XTM 1.0 stop you from
>>>using a resourceRef to reference data that is in a format that the
>>>application layer cannot grok. Nothing prevents Shockwave, PDF, MS Word
>>>or any other proprietary format appearing at the end of a resourceRef,
>>>so what's the difference ?
>>
>>The difference is that a link outside of an XTM document is exactly that:
>>a link. How topic map applications handle external links can be handled
>>quite easily by treating them as endpoints (i.e., "we don't know or care
>>what's at the other end, but here it is"), whereas requiring applications
>>to do something with or even correctly parse or validate arbitrary markup
>>can be problematic, assuming that it's very unlikely that that markup is
>>going to have an attached schema.
>
> So what makes a link special ? Why not apply the same logic to XML
> content in resourceData ?

Because there's a big difference between including arbitrary markup within
XTM and not doing so. A link is an endpoint -- traversing or not traversing
it is a fairly simple decision. But if I send you an XTM document and it
contains WTFYLML, what are you going to do with it? Unless every TM application
understands WTFYLML, we've lost the ability to effectively interchange topic
maps. That's not a good thing.
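
As a rough illustration of the distinction, here is a minimal Python
sketch (standard library only). The sample document, the example.org
URLs, the "wtfylml" namespace and the function name classify_occurrences
are all hypothetical. A <resourceRef> is recorded as an opaque endpoint,
text-only <resourceData> is taken as a string, and <resourceData> with
child elements (which XTM 1.0 does not allow) can only be flagged as
markup the processor knows nothing about.

```python
import xml.etree.ElementTree as ET

XTM = "http://www.topicmaps.org/xtm/1.0/"
XLINK = "http://www.w3.org/1999/xlink"

# Hypothetical sample; the third occurrence embeds markup from a made-up
# namespace, which XTM 1.0 itself does not permit inside resourceData.
SAMPLE = f"""<topicMap xmlns="{XTM}" xmlns:xlink="{XLINK}">
  <topic id="t1">
    <occurrence><resourceRef xlink:href="http://example.org/report.pdf"/></occurrence>
    <occurrence><resourceData>Plain text note</resourceData></occurrence>
    <occurrence><resourceData><w:blob xmlns:w="http://example.org/wtfylml">x</w:blob></resourceData></occurrence>
  </topic>
</topicMap>"""


def classify_occurrences(xtm_text):
    """Return a (kind, detail) pair for each occurrence in the document."""
    root = ET.fromstring(xtm_text)
    results = []
    for occ in root.iter(f"{{{XTM}}}occurrence"):
        ref = occ.find(f"{{{XTM}}}resourceRef")
        data = occ.find(f"{{{XTM}}}resourceData")
        if ref is not None:
            # A link is an endpoint: record the address, do not interpret it.
            results.append(("endpoint", ref.get(f"{{{XLINK}}}href")))
        elif data is not None and len(data) == 0:
            # Text-only content: a plain string.
            results.append(("text", data.text or ""))
        elif data is not None:
            # Child elements: all we can report is the first element's name.
            results.append(("foreign-markup", data[0].tag))
    return results


kinds = classify_occurrences(SAMPLE)
```

The third case is the one at issue: the processor can name the markup,
but without a schema and a processing model it cannot do anything
meaningful with it.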

[...]
>>>If no correct
>>>handler is available, then it is an "attachment" of data. What the
>>>application chooses to do with the data is really up to it. e.g it might
>>>allow the data to be opened in an XML editor (remember I am only
>>>proposing XML data, nothing else) or it might render the XML in a plain
>>>text view (as IE and Mozilla do) or it might offer the user the
>>>opportunity to download the XML and open it in an application of their
>>>choice. 
>>
>>No offense intended, but none of these proposals do anything "intelligent"
>>with the content, nor could they. Absent not only a schema but a processing
>>model for any given content, it's impossible to do anything intelligent
>>with it. If you said, "all XHTML content should be displayed as if in
>>a Web browser", that'd be one thing. But it's that kind of statement you'd
>>have to make about every bit of content, and you'd have to have a way of
>>stating it in a machine-understandable way. Popping up a dialog box for
>>each instance of embedded content (there could be hundreds), and opening
>>up that markup in an editor is about the last thing I'd want to force upon
>>a user.
>
> Firstly, I still don't see anything different between that and the way
> in which resourceRef is handled now. Or indeed the way in which
> resourceData would have to be handled in the face of encoded binary.

If somebody makes a link to a resource that is of an unknown or
unprocessable MIME type (from a specific application's POV), that
topic map is to some extent broken. It can't be used in the way that
it was intended, and the user experience will perhaps cause misinterpretation
or at best, simply a blank. There isn't some police force keeping people
from doing this, but if you're trying to interchange with people, you don't
deliberately choose a language they don't know.

> Secondly the whole point is the XTM parser does not have the
> "intelligence", the application does. And the application has a more
> constrained environment than the XTM parser. What's the big deal with
> that, that's the way all systems are built. Your XML parser doesn't
> understand XTM, but the XTM parser you built on top of it works with a
> more constrained environment, building objects as a result of parsing
> the XTM. It's the same thing.

XTM is an interchange syntax. If you're not interested in interchange,
it's up to you to pollute it with anything you like. Within a proprietary
application, nobody will mind. But for purposes of sharing topic maps
with other applications, there needs to be a common playing field.

>>>>It's just an extremely slippery slope. I've thought quite a number of
>>>>times about creating a combo-DTD mixing XTM and XHTML, where XHTML is
>>>>the document element and XTM appears as a single block at the end.
>>>>There'd be a lot of interesting applications for something like this,
>>>>but it would ruin interchange of XTM if it actually became popular, as
>>>>suddenly we'd have accidentally upped the ante on what was required of
>>>>XTM applications, something I've assiduously tried to avoid. I've never
>>>>believed the idea that sending around well-formed XML was going to
>>>>catch on much, simply because the ground assumption of an application
>>>>able to *correctly* process such documents would be enormous, and hell,
>>>>the Web community can't even get Netscape or IE to work correctly and
>>>>reliably with HTML after many years of trying.
>>>
>>>Well, firstly it is not true that Netscape and IE do not work correctly
>>>and reliably with XHTML or HTML 4 as long as you bother to put the
>>>DOCTYPE at the top and so put the browsers into "strict" mode, but that's
>>>another issue.
>>
>>I don't think it is at all. Your experience of the Web is apparently much
>>different than most people's. *Some* of the major web sites seem to
>>operate correctly for me, perhaps the majority (>50%), but there are a lot
>>of sites that are completely broken, have inaccessible content, can't be
>>browsed without cookies, have funky JavaScript menus or forms that don't
>>work, whatever. I use Netscape on linux and OSX and the Web is often broken.
>>Better than it was five years ago, but not by much. Now, it's usually not
>>the XHTML or HTML 4 that's the problem, it's all the extras, which is my
>>point. And so long as we can't create a reliable baseline for XTM, I don't
>>think we should be proposing raising the bar on application requirements.
>
> Now that I see your point, I will concede that JavaScript/CSS support is
> a mess. But it is a whole other point. The only additional requirement
> on an XTM parser will be that it recognise and flag XML data. Not that
> it does anything with it, that is left to the application layer, where
> those kind of decisions belong.

Yes, certainly, that is where these decisions belong.

>>>It's worth repeating that I am not suggesting that XTM be embedded in
>>>XHTML or in any other form of markup. Nor am I suggesting that XML from
>>>a namespace other than XTM 1.0 appear anywhere but in the content of a
>>>resourceData element.
>>
>>I understand that. I have less a problem with the former than the latter,
>>as at least the <topicMap> element and its content remains inviolate.
>>
>>Perhaps this as a better line of questioning: if we were to allow XML
>>markup within <resourceData>:
>>
>>   1. how would you propose it be validated? How would you attach one of
>>      the three schema languages in a way that would tell them the context
>>      of the content?
> 
> Your proposal of properties of the occurrence type (your 'facets' as
> opposed to my 'facets' or XTM's 'facets' ;-) makes sense. A 'content
> schema' facet would do the trick. The application could then decide if
> this is a schema language it can process and if so validate the content
> against the specified schema.

So we'd have to come up with a whole way of attaching a schema to a
specific block of markup within <resourceData> and standardize it so
that all vendors use the same methodology.
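
A minimal Python sketch of what such a mechanism might involve, purely
as an assumption: nothing like this exists in XTM 1.0, and the registry,
the PSI-style occurrence type URIs, the schema locations and the function
name validation_plan are all hypothetical. It shows that even with a
"content schema" property on the occurrence type, what happens depends
on which schema languages the receiving application supports.

```python
# Hypothetical registry: occurrence type -> (schema language, schema location).
# In a real design this would have to be standardized so every vendor reads
# it the same way; here it is just a dictionary.
CONTENT_SCHEMAS = {
    "http://example.org/psi/xhtml-note": (
        "RELAX NG", "http://example.org/schemas/note.rng"),
    "http://example.org/psi/wtfylml-blob": (
        "WTFYL Schema", "http://example.org/schemas/wtfylml.xyz"),
}


def validation_plan(occurrence_type, supported_languages):
    """Describe what an application could do with embedded content."""
    entry = CONTENT_SCHEMAS.get(occurrence_type)
    if entry is None:
        return "no schema declared: content is opaque"
    language, location = entry
    if language not in supported_languages:
        return f"cannot validate: {language} not supported"
    return f"validate against {location} using {language}"


# An application that supports only two schema languages.
app_languages = {"DTD", "RELAX NG"}

plan_known = validation_plan("http://example.org/psi/xhtml-note", app_languages)
plan_unsupported = validation_plan("http://example.org/psi/wtfylml-blob", app_languages)
plan_missing = validation_plan("http://example.org/psi/other", app_languages)
```

Two of the three outcomes leave the receiver with content it cannot
check, which is the interchange problem in miniature.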

>>   2. how would you propose correct application behaviours be attached
>>      so that the user experience across TM applications be similar?
>>      (Stylesheets? Behaviour sheets? etc.)
>
> That's not my concern. Application behaviour across TM applications is
> completely different. Just compare OKS and K42. TM parsers would be
> consistent because the XML content feature would be part of the SAM and
> a conformant parser would be required to flag the resourceData as being
> XML content and make that content available as a string.

As a designer of systems meant to interchange across applications, it
should be your concern. At the level of advocating a change to XTM that
would require TM applications to be able to correctly process any XML
content, we'd have to set some guidelines. Otherwise, we'd have all
manner of confusion.

>>   3. which specific markup languages would you permit? Anything at all?
>
> Anything at all. There is probably a good argument for disallowing XTM,
> so I would be prepared to say anything except XTM.
> 
> 
>>   4. what would you say to application vendors who don't want to have
>>      to invest in supporting whatever variety of markup you're proposing,
>>      and that an XTM document is no longer necessarily simply interpreted?
>
> Which vendors ? TM parser vendors ? TM browser vendors ? To the former,
> there is no change. To the latter, their support for a wide range of
> different content types would be a differentiator between them and their
> competitors.

This is precisely what I'm trying to avoid, the competition over features
derived from differentiations in XTM markup. So we end up with a big
competition between the TM application vendors, each trying to outdo the
next on which XML markup languages they support, and how well they support
them. Great. If I'm vendor A and I implement say, WTFYLML support, and if
I know vendors B, C, and D don't support it, I can be absolutely sure that
my customers have a richer user experience than those using the products
of the other vendors. And the winners? One company, in the end. The losers:
the rest of us, as we can never be sure that our topic map documents will be
interchanged reliably any longer, that a document created on one will
process correctly on another.

Now, that may sound to some like good ol' American-style competition, but
it completely sucks from a user perspective. This is what we have with MS
Word. Do we really want that for an interchange syntax for topic maps?
(Noting that any vendor may very well create their own proprietary document
formats without asking anyone.)

>>As a start, we could try rich text via a very small subset of XHTML. We
>>could allow a subset of presentational markup (<b>, <i>, <tt>) and a
>>subset of linking (<a> with only 'id' and 'href' -- in HTML 4, <a> has
>>29 attributes). No applets, tables, no lists, no client-side imagemaps, etc.
>>Now, when an <a href> points at a <a id> within a different <resourceData>,
>>what should the application do? Haven't we created another linking layer
>>on top of the XTM linking?  And this is just one very small issue. We
>>would have to write up a specification detailing what should happen for
>>each of the HTML/XHTML behaviours when occurring within a <resourceData>
>>context. Some would be easy, some not.
>
> This is completely irrelevant to my suggestion. My suggestion has
> nothing to do with rendering or presentation. There are no constraints
> for XTM presentation. There are no requirements for what resources can
> or cannot be pointed at by a resourceRef, there are no constraints on
> the inclusion of unparsed entities or base-64 encoded binary in
> resourceData. There is a large degree of freedom for XTM authors
> already. I am simply asking for the freedom to embed structured data
> represented in XML in resourceData.

My point was that if we can't expect something so elementally simple
as a subset of XHTML to be supported amongst all the vendors, how can
you expect *arbitrary* markup to be reliably supported?

Arbitrary markup is complete death to an interchange syntax.

>>How many of the current topic map vendors want to start supporting XHTML
>>markup? How many would be willing to support all of it? IOW, to support
>>within their applications all of what Netscape and IE claim to support? Then we
>>can move on to SVG, MathML, SMIL, and the rest. Opening the door to
>>arbitrary markup should only happen after there's at least support for
>>some beginning set.
>>
>>I just can't imagine this being a productive move for this community
>>and the advancement of XTM. It wouldn't promote communication, it would
>>just add some fancy features to topic map applications in a very
>>proprietary way.
>
> Eh? XML (a W3C rec) inside XTM (an ISO standard) == proprietary ? What
> is proprietary about it ? 

Come on, Kal. Please. Microsoft creates MS Word documents using Unicode
characters and probably runs their computers on ANSI standard 110 volts
too. Their HTML export format might even be well-formed XML, but nobody
in their right mind would attempt to deal with it (especially since, just
like RTF, they deliberately keep changing the "spec"). Proprietary is
anything that isn't standardized, and is created with the express purpose
of differentiating a vendor within the marketplace. XML isn't a markup
language, it's a meta-language. With it you can create standards as well
as proprietary markup. Arbitrary markup is proprietary markup, especially
when it's done by a vendor.

Say for example that Ontopia begins putting custom XML markup in their
XTM documents and has their applications support it. They do like Microsoft
has done with RTF and either don't publish a spec for it, or they don't
keep the spec up to date, or they just change it pretty continuously. And
so Mondeca comes along and wants to be able to open up and process Ontopia
XTM documents so that the user experience is similar. It's in Ontopia's
interest to make that as difficult as possible, and so just after Mondeca
updates their software to process Ontopia TM documents, suddenly there's
an update.

This is a scenario that promotes disintegration of interchange, not
improvement. The good solution to this is to allow vendors to do what
they want in creating applications, binary formats for their documents,
complete freedom (since there's no police force here to enforce anything
less), but keep the interchange syntax free of that kind of proprietary
content.

> Look, I can already point to the stuff, and if I click on the link what
> happens ? Is it defined by XTM ? No of course it isn't. So what is the
> difference between that and having it inline ? The only difference is a
> more convenient standard that acknowledges that not every data structure
> needs to be represented as topic map structures and that allows authors
> the freedom to put structured data into their topic maps.

I look at that as the freedom to babble incoherently. It's not convenient
when the person you're talking to can't understand you. Standards aren't
about convenience, they're about clear and unambiguous communication. This
shouldn't be part of an interchange syntax, it should be part of a vendor's
proprietary storage format, as you say, so that they can differentiate
themselves from their competition.

Murray

...........................................................................
Murray Altheim                         http://kmi.open.ac.uk/people/murray/
Knowledge Media Institute
The Open University, Milton Keynes, Bucks, MK7 6AA, UK                    .

        "There's a lot of intelligence out there that you don't
         know if it's true or not."  -- Anonymous US official
         http://news.bbc.co.uk/1/hi/world/middle_east/3014850.stm