[topicmapmail] Should resourceData have a MIME type?

Murray Altheim m.altheim@open.ac.uk
Mon, 13 Jan 2003 17:58:44 +0000


Paul Goldstein wrote:

> Maybe I'm misunderstanding the spec, but is this not allowed?:
> <topic id="test">
>    <baseName>
>        <baseNameString>Hello this is a &lt;test&gt;</baseNameString>
>    </baseName>
> </topic>
> </topicMap>


The entities &lt; and &gt; are considered "predefined" by the XML
specification, so they aren't XHTML-specific. If you'd used &uuml;
from XHTML, all hell would have broken loose. :-)

This is a general problem that is solved by using Unicode characters
and a correct XML declaration for your specific way of encoding those
characters. A Unicode-aware editor will do this in UTF-8 for you, so
you can use 'ü' directly instead of &uuml;.
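
To make the distinction concrete, here is a minimal sketch using Python's
standard xml.etree.ElementTree parser. It is only an illustration: the
sample name "Müller" is an assumption of mine, and the parser here reads
no DTD, which is why &uuml; is unknown to it.

```python
import xml.etree.ElementTree as ET

# Predefined entities (&lt; &gt; &amp; &apos; &quot;) need no DTD at all.
ok = ET.fromstring(
    "<baseNameString>Hello this is a &lt;test&gt;</baseNameString>"
)
predefined_text = ok.text

# &uuml; is declared in the XHTML DTDs, not by XML itself, so a parser
# that has seen no declaration for it rejects the document.
try:
    ET.fromstring("<baseNameString>M&uuml;ller</baseNameString>")
    uuml_error = None
except ET.ParseError as exc:
    uuml_error = str(exc)

# Alternative 1: the literal character, encoded as UTF-8 and announced
# in the XML declaration ("\u00fc" is the character u-umlaut).
doc = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    "<baseNameString>M\u00fcller</baseNameString>"
).encode("utf-8")
literal_text = ET.fromstring(doc).text

# Alternative 2: a numeric character reference, which also needs no DTD.
numeric_text = ET.fromstring(
    "<baseNameString>M&#252;ller</baseNameString>"
).text

print(predefined_text)
print(uuml_error)
print(literal_text == numeric_text)
```

Both alternatives give the application the same string, so nothing
downstream has to know which form the author typed.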

The necessary evil of character entities, a legacy from SGML in those
dark days prior to Unicode, still pervades XML. Actually, I myself
have little difficulty with character and parameter entities, though
I know I'm a bit odd in this regard (I wrote the modular XHTML DTDs,
which is about as arcane a use of syntax as I've seen in a while).


> And you want to let your application know that the baseNameString needs 
> to be processed so that the character entities are transformed to html 
> markup characters.
> 
> If I'm totally off the point, let me know. Perhaps someone could send me 
> some examples what they were talking about with the resourceData element.


No, I think what you're asking for is not at all unusual -- the problem
is that once you open up <baseNameString> to allow XHTML, how much of
XHTML do you allow? Frames? JavaScript? the whole mess of HTML/XHTML?
It's better to point to an external resource via ID or XPath and
transclude that content in an external representation, IMO. But that
is more complicated than simply inlining some markup, admittedly. We
could have included in the design of XTM, say, XHTML inline markup,
within <baseNameString>, or had a special element for that purpose,
but we had to draw the line somewhere. We certainly did talk about
these types of things.


> By the way the following is pretty absurd, but validates with XML spy. 
> It has baseNameString as richtext format:
> 
> <?xml version="1.0" encoding="UTF-8"?>
> <!DOCTYPE topicMap SYSTEM "C:\work\topicmap\xtm1.dtd">
> <topicMap>
> <topic id="test">
>    <baseName>
>        <baseNameString>{\rtf1\ansi\ansicpg1252\uc1 
> \deff0\deflang1033\deflangfe1033{\fonttbl{\f0\froman\fcharset0\fprq2{\*\panose 

[...]

I can't remember if RTF ever includes XML markup characters, but
certainly if there's no danger of that happening, it's absurd but
valid markup. Problem is, if you share that topic map with anyone
else, they won't have a clue what it means (unless either the
software application or the human can natively read RTF).
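
On the markup-character question, a cautious approach is to escape the
payload before inlining it rather than rely on the format never
producing '<' or '&'. The sketch below uses Python's standard
xml.sax.saxutils.escape. The RTF fragment is hypothetical, and it
assumes RTF leaves '&' and '<' unescaped in running text, which I
believe is the case but have not checked against the RTF specification.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical RTF fragment whose text happens to contain '&' and '<'.
rtf = r"{\rtf1\ansi\deff0 {\fonttbl{\f0 Times;}} R&D <draft>}"

# escape() rewrites &, < and > as &amp;, &lt; and &gt;.
escaped = escape(rtf)

# The escaped payload can now sit inside an element without breaking
# well-formedness.
xml_doc = "<baseNameString>" + escaped + "</baseNameString>"

# Parsing reverses the escaping, so the receiver sees the original RTF.
round_trip = ET.fromstring(xml_doc).text

print(escaped)
print(round_trip == rtf)
```

This only keeps the document well-formed. It does nothing about the
interoperability problem: the recipient still has to know that the
string is RTF.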

Murray

......................................................................
Murray Altheim                  <http://kmi.open.ac.uk/people/murray/>
Knowledge Media Institute
The Open University, Milton Keynes, Bucks, MK7 6AA, UK

      In the evening
      The rice leaves in the garden
      Rustle in the autumn wind
      That blows through my reed hut.  -- Minamoto no Tsunenobu