[topic_maps] Re: [topicmapmail] Multiple members in XTM associations

Murray Altheim m.altheim@open.ac.uk
Sat, 14 Feb 2004 01:30:28 +0000


Lars Marius Garshol wrote:
> * Murray Altheim
> | 
> | A great deal of the functionality of what you're trying to
> | accomplish I've implemented in Ceryle. 
> 
> Until Ceryle is released I guess you'll keep seeing people
> reimplementing what you've done there... :-)

The exigencies of Ph.D. programs, academic rites, etc. and the fact
that I'm just one person...  but I wasn't bringing this up to talk
about Ceryle, more to suggest that TouchGraph was helpful.

> | But as you indicate, even TM4J is a bit much to download, and the
> | number of jar files to support TM4J are even larger than it is (this
> | is true of a lot of applications, of course).
> 
> That's true. However, I suspect that if you only need the XTM importer
> and the in-memory implementation you could get by with just a few
> classes plus a small SAX parser (Piccolo or Ælfred2, say).

I think there's perhaps a bit of a misunderstanding about what it takes.
It's not simply parsing LTM or XTM into in-memory objects, it's also
*correctly* handling all the merging rules, URI equivalence issues
(which include base URI issues), duplicate association suppression,
etc.  As somebody who also wrote my own engine (which I effectively
dumped in favour of TM4J), I can say it's non-trivial. To use that phrase.
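
As a rough illustration of two of the issues listed above (base URI
resolution and identity-based merging), here is a minimal sketch in
Python. It is not TM4J code and not a conformant implementation: the
names `resolve` and `merge_by_identifier`, the example.org URIs and the
topic ids are all hypothetical, and real URI equivalence involves more
than resolving relative references. It also ignores name-based merging.

```python
from urllib.parse import urljoin


def resolve(base_uri, ref):
    """Resolve a possibly relative reference against the document base URI."""
    return urljoin(base_uri, ref)


def merge_by_identifier(topics):
    """Merge topics that share at least one (already resolved) identifier.

    `topics` is a list of (ids, identifiers) pairs, each a set of strings.
    Merging is transitive: a topic that overlaps two groups joins them.
    """
    merged = []
    for ids, identifiers in topics:
        ids, identifiers = set(ids), set(identifiers)
        keep = []
        for m_ids, m_identifiers in merged:
            if m_identifiers & identifiers:
                # Shared identifier: absorb the existing group.
                ids |= m_ids
                identifiers |= m_identifiers
            else:
                keep.append((m_ids, m_identifiers))
        keep.append((ids, identifiers))
        merged = keep
    return merged


# Hypothetical data: two documents describe the same subject, one of them
# through a relative reference that only matches after resolution.
BASE = "http://example.org/maps/a.xtm"
topics = [
    ({"a#t1"}, {resolve(BASE, "../psi/people#murray")}),
    ({"b#t9"}, {"http://example.org/psi/people#murray"}),
    ({"a#t2"}, {resolve(BASE, "#ceryle")}),
]
merged = merge_by_identifier(topics)
for ids, identifiers in merged:
    print(sorted(ids), sorted(identifiers))
```

Comparing the unresolved strings would leave `a#t1` and `b#t9` as
separate topics, which is the kind of base URI problem a bare parser
does not catch.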

> | I've heavily extended TouchGraph (TG) for use in Ceryle, [...]
> 
> Murray, are those extensions available anywhere? I see you've gotten
> some patches into TG, but did you push it all back to the project, or
> does it live in Ceryle?

They're not simply things that could be translated back into TouchGraph.
I adapt TG into my own package (such that it still has nothing to do
with Topic Maps), then I've got an entire layer that further adapts my
adaptation into a bipartite Topics-and-Associations style graph (which
actually includes several other node types). There's also a TopicMapVisualizer
API and an implementation of that API, which relies on TM4J. So it wouldn't
make any sense to try to push that back into TG. It would be an entirely
different project, two layers on top of TG plus TM4J. And even my use of
TM4J has been heavily extended with the use of a bunch of utility classes
and methods.

It all lives in Ceryle.
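
For readers unfamiliar with the bipartite style mentioned above, the
following Python sketch shows the general idea: topics and associations
are both nodes, and every edge joins an association node to one of its
member topics, labelled with the role. This is only an illustration of
the data structure, not Ceryle's or TouchGraph's actual code; the
function name, the ids and the role and type labels are hypothetical.

```python
def build_bipartite_graph(associations):
    """Build a topics-and-associations graph.

    `associations` is a list of (assoc_id, assoc_type, members), where
    members is a list of (role, topic_id). Association ids and topic ids
    are assumed not to collide.
    """
    nodes = {}   # node id -> (kind, association type or None)
    edges = []   # (association id, topic id, role)
    for assoc_id, assoc_type, members in associations:
        nodes[assoc_id] = ("association", assoc_type)
        for role, topic_id in members:
            nodes.setdefault(topic_id, ("topic", None))
            edges.append((assoc_id, topic_id, role))
    return nodes, edges


# Hypothetical data, including one association with three members.
associations = [
    ("assoc1", "visualizes",
     [("tool", "touchgraph"), ("subject", "topic-map")]),
    ("assoc2", "builds-on",
     [("application", "ceryle"), ("engine", "tm4j"), ("toolkit", "touchgraph")]),
]
nodes, edges = build_bipartite_graph(associations)
for assoc_id, topic_id, role in edges:
    print(f"{assoc_id} --{role}--> {topic_id}")
```

Because the association is itself a node, an association with three or
more members needs no special treatment: it simply has three or more
edges.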

> | I think Alex is to be commended on creating a very functional
> | toolkit, [...]
> 
> Absolutely!
>  
> | As for some sort of XTM processor, a lightweight one that did most
> | of what TM4J does would be very cool, though in looking at TM4J I
> | don't know what you could trim, apart from not having the storage
> | backends (i.e., doing everything in memory), and perhaps dumping the
> | utility classes. 
> 
> I'd have to agree. You usually can save a lot on .jar size if you try
> (compare Ælfred against almost any other XML parser), but still...

If one took everything in the packages:

   org.tm4j.topicmap.*
   org.tm4j.topicmap.memory.*
   org.tm4j.topicmap.cmd.*
   org.tm4j.topicmap.index.*
   org.tm4j.topicmap.utils.*

there'd still be well over a hundred classes. One may be able to
remove some of the utility classes, but a lot of them are necessary
to provide merging and other required functionality.

So I hesitate to believe it's possible to implement a conformant
engine without *approaching* the complexity of TM4J, simply because
I don't know if there's much to trim, really. For example, one can
dump the LTM lexer/parser/builder, etc. out of the utilities package,
one can dump being able to serialize to XTM (assuming you don't need
to export, and that you need to keep at least one method of import),
one can dump some of the classes for extractors (assuming you don't
need them), but then we've removed fewer than a dozen classes. From my
own experience, I was able to do a *lot* of what a basic engine
does in about two dozen classes, but I was a long way from being
compliant (and I resorted to some clever tricks from my days working
in assembly language to keep code size down).

(I actually contributed to TM4J's bloat a bit by adding XML catalog
support, which I find absolutely necessary for my work. That alone
adds the Sun catalog resolver jar, which is another 30 classes,
though it's at least in its own jar.)

And of course this isn't really about class count, it's about
functionality. If somebody can implement a compliant XTM processor
in a small footprint, great. I'd just like to see it demonstrated
that it did proper merging and other required behaviour, so that
it would properly process XTM documents, not some subset or
restricted version of XTM (like one that prohibited certain elements
or constructs), because that wouldn't be XTM any longer.
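
One small example of the kind of behaviour such a demonstration would
need to cover is duplicate association suppression after merging. The
Python sketch below is an assumption-laden illustration, not a
conformance test: the function name, ids and role labels are
hypothetical, scope is ignored entirely, and `canonical` stands in for
whatever mapping a real engine keeps from merged topic ids to the
surviving topic.

```python
def suppress_duplicates(associations, canonical):
    """Drop associations that are identical once merged topics are unified.

    `associations` is a list of (assoc_type, members), where members is a
    list of (role, player) topic ids. `canonical` maps a merged-away topic
    id to the id of the topic it was merged into.
    """
    def canon(topic_id):
        return canonical.get(topic_id, topic_id)

    seen = set()
    result = []
    for assoc_type, members in associations:
        # Member order is irrelevant, so compare members as a set.
        key = (canon(assoc_type),
               frozenset((canon(role), canon(player)) for role, player in members))
        if key not in seen:
            seen.add(key)
            result.append(key)
    return result


# Hypothetical data: "t9" was merged into "t1", so the first two
# associations describe the same relationship.
canonical = {"t9": "t1"}
associations = [
    ("wrote", [("author", "t1"), ("work", "t2")]),
    ("wrote", [("work", "t2"), ("author", "t9")]),
    ("wrote", [("author", "t3"), ("work", "t4")]),
]
result = suppress_duplicates(associations, canonical)
print(len(result), "associations remain")
```

A processor that parsed the two documents correctly but skipped this
step would report three associations instead of two.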

Not trying to be a wet blanket, just a caution about how "non-trivial"
building a Topic Map engine is. If it were trivial, I think we'd see a
lot more of them. In the earlier history of XML parsers, there were at
least several dozen of them. I wrote one (while at Sun, which was
never made public), which also did DTD analysis (used to assist in
building the modular XHTML DTDs). Nowadays there are two or three that
have survived. But while non-trivial, XML parsers *are* pretty
trivial compared to creating a compliant XTM processor simply because
we're building something that operates at a different (i.e., graph)
level. Now if anyone wants to build an antlr for XTM... (and I don't
mean an antlr grammar, I mean an xtmantlr).

Murray

......................................................................
Murray Altheim                    http://kmi.open.ac.uk/people/murray/
Knowledge Media Institute
The Open University, Milton Keynes, Bucks, MK7 6AA, UK               .