[topicmapmail] Topic Map-based website: www.nzetc.org

Murray Altheim m.altheim@open.ac.uk
Fri, 29 Apr 2005 09:51:24 +0100


Conal,

Congratulations on the [re]launch of your Topic Map-based digital
library! This is very welcome news, and I'll be interested in
hearing your war stories -- I'll be doing some similar work over
the next few months and I'd be keen to share notes and ideas. The
Open University Library is hopefully going to have a Topic Map-
based overview of parts of its digital library services, if I have
anything to do with it.

Great news!

Murray

Conal Tuohy wrote:
> The website of the New Zealand Electronic Text Centre has been
> re-launched, with a new topic map infrastructure based on TM4J.
> http://www.nzetc.org/
> 
> The website is a digital library, providing access to a couple of
> hundred digitised books and manuscripts. The site has been running for
> about 3 years, but this week we've upgraded it significantly, putting it
> on a new foundation - a topic map. 
> 
> The topic map presently contains 46807 topics, 192492 associations, and
> 43942 occurrences; roughly 150MB of XTM. We are using TM4J as our topic
> map server, using TM4J's "in-memory" back-end, running on Java 1.4.1 on
> Windows 2000. The topic map consumes approximately 1.3GB of RAM.
> 
> The source material for the site is a collection of TEI (Text Encoding
> Initiative) XML files, each of which is an encoding of a source
> object (i.e. a book). Most of the topic map is harvested from these
> files using XSLT. Each book, chapter, subsection, figure, author,
> publisher, etc. is represented by a topic; names are harvested from
> headings and captions in the text, and the containment hierarchy is
> represented by associations. These associations are used to generate
> tables of contents, as well as to provide "next" and "previous" links
> between web pages. 
> 
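A minimal sketch of how containment associations can drive a table of contents and "next"/"previous" links. It is written in Python rather than against TM4J's Java API, and the topic ids and function names are hypothetical, not taken from the NZETC site.

```python
from collections import defaultdict

# Hypothetical (container, part) pairs standing in for containment
# associations harvested from a TEI file, listed in document order.
ASSOCIATIONS = [
    ("book", "chapter-1"),
    ("book", "chapter-2"),
    ("chapter-1", "section-1.1"),
    ("chapter-1", "section-1.2"),
]

def children_index(pairs):
    """Group the parts under their container, preserving document order."""
    index = defaultdict(list)
    for container, part in pairs:
        index[container].append(part)
    return index

def reading_order(root, index):
    """Depth-first walk of the hierarchy: the order a reader pages through."""
    order = [root]
    for child in index.get(root, []):
        order.extend(reading_order(child, index))
    return order

def neighbours(topic, order):
    """Return the (previous, next) topics, or None at either end."""
    i = order.index(topic)
    previous = order[i - 1] if i > 0 else None
    following = order[i + 1] if i + 1 < len(order) else None
    return previous, following

INDEX = children_index(ASSOCIATIONS)
ORDER = reading_order("book", INDEX)
```

Here INDEX["book"] is the top level of the table of contents, and neighbours("section-1.2", ORDER) gives the targets for that page's "previous" and "next" links.
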
> For each fragment of TEI text, we harvest two HTML occurrences, which are
> alternative representations of that piece of text. One is a "scholarly"
> (fussy) view, in which page numbers, errors, deletions and corrections
> (in manuscripts), etc. are all rendered, and the other is a "basic"
> (simplified) view, in which spelling errors are silently corrected, page
> numbers are not displayed, etc. These alternatives are distinguished
> with "basic" and "scholarly" scoping topics. At present only the
> scholarly view is visible on the public website, but we plan to make the
> basic view visible next week. Cocoon XSLT pipelines are used to
> transform the TEI into HTML (and some other formats).
> 
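A minimal sketch of picking between the two views by scope. This is plain Python, not TM4J's API; the Occurrence class, the resource paths and the select function are assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Occurrence:
    type: str          # occurrence type, e.g. "html"
    scope: frozenset   # scoping topics, e.g. {"basic"} or {"scholarly"}
    resource: str      # hypothetical path of the rendered fragment

# Two alternative HTML renderings of the same TEI fragment.
OCCURRENCES = [
    Occurrence("html", frozenset({"scholarly"}), "scholarly/chapter-1.html"),
    Occurrence("html", frozenset({"basic"}), "basic/chapter-1.html"),
]

def select(occurrences, occurrence_type, view):
    """Keep occurrences of the given type whose scope includes the view."""
    return [
        o for o in occurrences
        if o.type == occurrence_type and view in o.scope
    ]
```

Calling select(OCCURRENCES, "html", "basic") returns only the simplified rendering, so switching the public site between views comes down to changing one scoping topic.
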
> Names of people, places, etc. are also marked up in the TEI, and these
> are also harvested as topics, with associations linking each person to
> the places in the texts where they are mentioned, the figures in which
> they are depicted, and the texts which they wrote. We use a MADS XML
> file to maintain an authoritative list of names, from which we also
> harvest some biographical notes and links to external websites.
> Consequently, the system can generate a web page to represent each
> person, providing links to all the places in the library where they are
> mentioned, all the texts they wrote, a thumbnail gallery of the
> pictures in which they appear, and links to relevant external sites,
> e.g. http://www.nzetc.org/tm/scholarly/name-207418.html
> 
> The ontology used is a subset of the CIDOC CRM (a museum ontology).
> 
> The front end of the site uses Cocoon to render pages (each of which
> represents a topic, and some "neighbouring" topics). We use Cocoon's
> templating system "jxtemplate" to render each topic. JXTemplate is
> designed to be very like XSLT, with an expression language called
> "JXPath" which is more or less a superset of XPath, but which also
> allows for traversal of Java objects via path expressions, e.g.
> "$topic/occurrences[type=$ontology/html]". This avoids the conceptual
> mismatch that can occur when using XSLT, which is tree-oriented, to
> style XTM, which really represents a cyclic graph. We had to write a few
> Java functions to add JXPath support for topic sorting, traversal of the
> type hierarchy, and a few other features, but nothing too hard. We use
> several different templates to render the different types of topics.
> 
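A minimal sketch of the kind of helper mentioned above for traversing the type hierarchy and sorting topics. It is Python rather than the Java extension functions actually written for JXPath, and the type ids and function names are hypothetical. Because a topic map is a graph that may contain cycles, the walk records what it has already visited.

```python
# Hypothetical type ids; each topic type maps to its direct supertypes.
SUPERTYPES = {
    "chapter": ["text-section"],
    "text-section": ["information-object"],
    "figure": ["information-object"],
    "information-object": [],
}

def all_supertypes(topic_type, supertypes):
    """Collect every direct and indirect supertype, tolerating cycles."""
    seen = set()
    stack = list(supertypes.get(topic_type, []))
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # already visited: guards against cyclic links
        seen.add(current)
        stack.extend(supertypes.get(current, []))
    return seen

def is_a(topic_type, candidate, supertypes):
    """True if topic_type is the candidate type or one of its subtypes."""
    return topic_type == candidate or candidate in all_supertypes(
        topic_type, supertypes
    )

def sort_topics(topics):
    """Sort topic records by display name, ignoring case."""
    return sorted(topics, key=lambda topic: topic["name"].casefold())
```

A template can then choose how to render a topic with a test such as is_a("chapter", "information-object", SUPERTYPES) instead of listing every concrete type.
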
> In future we plan to harvest dates from the texts, and provide
> timeline-based access to the texts. Our main technical concern is to
> replace the in-memory topic map with a database, since we need to scale
> up the topic map as our collection grows, and as we add more semantic
> markup to the TEI. 
> 
> Thanks very much to the members of TopicMapMail, who have been an
> invaluable resource during the (several-month) gestation of the new
> website.
> 
> Regards
> 
> Con
> 
> 
> ----
> 
>   "I believe we were all glad to leave New Zealand. It is not a
>   pleasant place. Amongst the natives there is absent that 
>   charming simplicity which is found in Tahiti; and the greater
>   part of the English are the very refuse of society. Neither 
>   is the country itself attractive. I look back but to one 
>   bright spot, and that is Waimate, with its Christian 
>   inhabitants."
> 
>          - Charles Darwin
> 
> _______________________________________________
> topicmapmail mailing list
> topicmapmail@infoloom.com
> http://www.infoloom.com/mailman/listinfo/topicmapmail
> 


-- 

Murray

......................................................................
Murray Altheim                          http://www.altheim.com/murray/
Strategic and Services Development
The Open University Library
The Open University, Milton Keynes, Bucks, MK7 6AA, UK               .

   MORE swift, more fleet, than the sun-stained feet of the Dawns that trample the night--
   More fleet, more swift, than the gleams that lift in the wake of a wild star's flight--
   Through the unpathed deeps of a sea that sweeps unplumbed, unsailed, unknown,
   Where the forces untamed, unseen, unnamed, have ruled from the First, alone,
   Now the Ghosts of Thought, with a message caught from the tales of the dreaming past,
   Unheard, unseen, with nor sound nor sheen, speed through the ultimate vast.

   excerpt from "Wireless Telegraph." by Don Marquis.  http://donmarquis.org/wireless.htm