[topicmapmail] Topic Map-based website: www.nzetc.org
Murray Altheim
m.altheim@open.ac.uk
Fri, 29 Apr 2005 09:51:24 +0100
Conal,
Congratulations on the [re]launch of your Topic Map-based digital
library! This is very welcome news, and I'll be interested in
hearing your war stories -- I'll be doing some similar work over
the next few months and I'd be keen to share notes and ideas. The
Open University Library is hopefully going to have a Topic Map-
based overview of parts of its digital library services, if I have
anything to do with it.
Great news!
Murray
Conal Tuohy wrote:
> The website of the New Zealand Electronic Text Centre has been
> re-launched, with a new topic map infrastructure based on TM4J.
> http://www.nzetc.org/
>
> The website is a digital library, providing access to a couple of
> hundred digitised books and manuscripts. The site has been running for
> about 3 years, but this week we've upgraded it significantly, putting it
> on a new foundation - a topic map.
>
> The topic map presently contains 46807 topics, 192492 associations, and
> 43942 occurrences; roughly 150 MB of XTM. We are using TM4J as our topic
> map server, with TM4J's "in-memory" back-end, running on Java 1.4.1 on
> Windows 2000. The topic map consumes approximately 1.3 GB of RAM.
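A quick back-of-the-envelope check on those figures, assuming decimal units (1 MB = 10^6 bytes, 1 GB = 10^9 bytes) and counting topics, associations and occurrences together as "constructs". On these assumptions the in-memory map is roughly 8.7 times the size of the XTM, or about 4.6 KB per construct. That is only an average and says nothing about how TM4J actually lays out its objects.

```python
# Figures quoted in the message; treating MB/GB as decimal units is an assumption.
TOPICS = 46_807
ASSOCIATIONS = 192_492
OCCURRENCES = 43_942
XTM_BYTES = 150e6   # "roughly 150 MB of XTM"
RAM_BYTES = 1.3e9   # "approximately 1.3 GB of RAM"

constructs = TOPICS + ASSOCIATIONS + OCCURRENCES
expansion = RAM_BYTES / XTM_BYTES            # in-memory size relative to the XTM file
bytes_per_construct = RAM_BYTES / constructs  # average RAM per topic/association/occurrence

print(f"constructs: {constructs}")                         # constructs: 283241
print(f"RAM / XTM: {expansion:.1f}x")                      # RAM / XTM: 8.7x
print(f"bytes per construct: {bytes_per_construct:.0f}")   # bytes per construct: 4590
```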
>
> The source material for the site is a collection of TEI (Text Encoding
> Initiative) XML files, each of which is an encoding of a source
> object (i.e. a book). Most of the topic map is harvested from these
> files using XSLT. Each book, chapter, subsection, figure, author,
> publisher, etc., is represented by a topic; names are harvested from
> headings and captions in the text, and the containment hierarchy is
> represented by associations. These associations are used to generate
> tables of contents, as well as to provide "next" and "previous" links
> between web pages.
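A minimal sketch of how both the "next"/"previous" links and a table of contents can fall out of the containment associations. It assumes those associations have already been loaded into a hypothetical `children` map (parent topic id to an ordered list of child topic ids); a pre-order, depth-first walk then gives reading order. The topic ids are made up, and this is an illustration rather than the NZETC code.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical: parent topic id -> ordered child topic ids,
# standing in for containment associations harvested from the TEI.
Children = Dict[str, List[str]]


def document_order(root: str, children: Children) -> List[str]:
    """Pre-order depth-first walk: each section comes before its subsections."""
    order: List[str] = []
    stack = [root]
    while stack:
        node = stack.pop()
        order.append(node)
        # Push children in reverse so the first child is visited first.
        stack.extend(reversed(children.get(node, [])))
    return order


def next_prev_links(root: str, children: Children) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Map each topic to its (previous, next) neighbours in reading order."""
    order = document_order(root, children)
    links = {}
    for i, node in enumerate(order):
        prev_node = order[i - 1] if i > 0 else None
        next_node = order[i + 1] if i + 1 < len(order) else None
        links[node] = (prev_node, next_node)
    return links


def table_of_contents(root: str, children: Children, depth: int = 0) -> List[str]:
    """Indented outline of the hierarchy, two spaces per level."""
    lines = ["  " * depth + root]
    for child in children.get(root, []):
        lines.extend(table_of_contents(child, children, depth + 1))
    return lines


book = {"book": ["ch1", "ch2"], "ch1": ["ch1.1", "ch1.2"]}
print(document_order("book", book))            # ['book', 'ch1', 'ch1.1', 'ch1.2', 'ch2']
print(next_prev_links("book", book)["ch1.2"])  # ('ch1.1', 'ch2')
```

Because the same walk drives both outputs, the "next" link from the last subsection of a chapter leads naturally into the following chapter.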
>
> For each fragment of TEI text, we harvest two HTML occurrences, which are
> alternative representations of that piece of text. One is a "scholarly"
> (fussy) view, in which page numbers, errors, deletions and corrections
> (in manuscripts), etc., are all rendered; the other is a "basic"
> (simplified) view, in which spelling errors are silently corrected, page
> numbers are not displayed, etc. These alternatives are distinguished
> by "basic" and "scholarly" scoping topics. At present only the
> scholarly view is visible on the public website, but we plan to make the
> basic view visible next week. Cocoon XSLT pipelines are used to
> transform the TEI into HTML (and some other formats).
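Choosing between the two renditions amounts to filtering a topic's occurrences by type and then by scope. A minimal sketch, assuming a hypothetical `Occurrence` record with the scoping topics reduced to plain strings ("basic", "scholarly"); in a real topic map engine the themes of a scope are topics, not strings. The file names are also made up.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional


@dataclass(frozen=True)
class Occurrence:
    """Hypothetical, simplified occurrence: a type, a resource and a scope."""
    type: str
    resource: str
    scope: FrozenSet[str] = frozenset()


def select_occurrence(occurrences: Iterable[Occurrence], occ_type: str,
                      view: str) -> Optional[Occurrence]:
    """Return the occurrence of `occ_type` whose scope includes `view`.

    An occurrence with an empty (unconstrained) scope is used as a fallback.
    """
    fallback = None
    for occ in occurrences:
        if occ.type != occ_type:
            continue
        if view in occ.scope:
            return occ
        if not occ.scope and fallback is None:
            fallback = occ
    return fallback


chapter = [
    Occurrence("html", "ch1-scholarly.html", frozenset({"scholarly"})),
    Occurrence("html", "ch1-basic.html", frozenset({"basic"})),
]
print(select_occurrence(chapter, "html", "basic").resource)  # ch1-basic.html
```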
>
> Names of people, places, etc., are also marked up in the TEI, and these
> are also harvested as topics, with associations linking each person to
> the places in the texts where they are mentioned, the figures in which
> they are depicted, and the texts which they wrote. We use a MADS
> (Metadata Authority Description Schema) XML file to maintain an
> authoritative list of names, from which we also harvest some
> biographical notes and links to external websites.
> Consequently, the system can generate a web page to represent each
> person, providing links to all the places in the library where they are
> mentioned, all the texts they wrote, a thumbnail gallery of the
> pictures in which they appear, and links to relevant external sites.
> For example: http://www.nzetc.org/tm/scholarly/name-207418.html
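Building a person page is essentially a matter of collecting every association the person takes part in and grouping the other ends by association type. A minimal sketch, assuming binary associations flattened into a hypothetical `Association(type, person, target)` tuple with made-up type and topic ids; real XTM associations have typed roles and may have more than two members.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple


class Association(NamedTuple):
    """Hypothetical binary association: `person` plays one role, `target` the other."""
    type: str
    person: str
    target: str


def person_page(person: str, associations: Iterable[Association]) -> Dict[str, List[str]]:
    """Group everything linked to `person` by association type, sorted for display."""
    sections: Dict[str, List[str]] = defaultdict(list)
    for assoc in associations:
        if assoc.person == person:
            sections[assoc.type].append(assoc.target)
    return {kind: sorted(targets) for kind, targets in sections.items()}


links = [
    Association("mentioned-in", "person-1", "text-2/ch3"),
    Association("mentioned-in", "person-1", "text-1/ch7"),
    Association("depicted-in", "person-1", "figure-12"),
    Association("author-of", "person-1", "text-5"),
    Association("mentioned-in", "person-2", "text-1/ch1"),
]
print(person_page("person-1", links))
```

Each key of the result would map onto one section of the page: mentions, a thumbnail gallery for "depicted-in", and a list of works for "author-of".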
>
> The ontology used is a subset of the CIDOC CRM (a museum ontology).
>
> The front end of the site uses Cocoon to render pages (each of which
> represents a topic, and some "neighbouring" topics). We use Cocoon's
> templating system, JXTemplate, to render each topic. JXTemplate is
> designed to be very like XSLT, with an expression language called
> "JXPath" which is more or less a superset of XPath, but which also
> allows for traversal of Java objects via path expressions, e.g.
> "$topic/occurrences[type=$ontology/html]". This avoids the conceptual
> mismatch that can occur when using XSLT, which is tree-oriented, to
> style XTM, which really represents a cyclic graph. We had to write a few
> Java functions to add JXPath support for topic sorting, traversal of the
> type hierarchy, and a few other features, but nothing too hard. We use
> several different templates to render the different types of topics.
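The type-hierarchy traversal mentioned above shows why a graph walk, rather than a tree walk, is needed: a type can have several supertypes, and nothing in the data itself rules out a cycle. A minimal sketch in Python, not the actual Java extension functions, assuming a hypothetical `supertypes` map from a type id to its direct supertype ids (the type names are made up):

```python
from collections import deque
from typing import Dict, Iterable, List, Set

# Hypothetical: type id -> ids of its direct supertypes.
Supertypes = Dict[str, List[str]]


def all_supertypes(topic_type: str, supertypes: Supertypes) -> Set[str]:
    """Breadth-first transitive closure over supertype links.

    The `seen` set guarantees termination even if the hierarchy contains a cycle.
    """
    seen: Set[str] = set()
    queue = deque([topic_type])
    while queue:
        current = queue.popleft()
        for parent in supertypes.get(current, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def is_instance_of(topic_types: Iterable[str], wanted: str, supertypes: Supertypes) -> bool:
    """True if any of the topic's types is `wanted` or a subtype of it."""
    return any(t == wanted or wanted in all_supertypes(t, supertypes)
               for t in topic_types)


hierarchy = {"chapter": ["text-division"], "text-division": ["text"], "figure": ["image"]}
print(sorted(all_supertypes("chapter", hierarchy)))   # ['text', 'text-division']
print(is_instance_of(["chapter"], "text", hierarchy))  # True
```

A template dispatcher could then try its templates from the most to the least specific type and use the first one for which `is_instance_of` holds.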
>
> In future we plan to harvest dates from the texts, and provide
> timeline-based access to the texts. Our main technical concern is to
> replace the in-memory topic map with a database, since we need to scale
> up the topic map as our collection grows, and as we add more semantic
> markup to the TEI.
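One way to picture the move from an in-memory store to a database: keep the containment associations in a table and fetch a topic's children on demand, instead of holding the whole map in RAM. A minimal sketch using Python's built-in `sqlite3` module. The schema and function names are hypothetical and do not describe any TM4J back-end.

```python
import sqlite3
from typing import List, Tuple


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Create (if needed) a hypothetical two-table store for topics and containment."""
    conn = sqlite3.connect(path)
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS topic (
            id   TEXT PRIMARY KEY,
            name TEXT NOT NULL
        );
        CREATE TABLE IF NOT EXISTS containment (
            parent   TEXT NOT NULL REFERENCES topic(id),
            child    TEXT NOT NULL REFERENCES topic(id),
            position INTEGER NOT NULL,
            PRIMARY KEY (parent, position)  -- also indexes lookups by parent
        );
        """
    )
    return conn


def add_topic(conn: sqlite3.Connection, topic_id: str, name: str) -> None:
    conn.execute("INSERT OR IGNORE INTO topic (id, name) VALUES (?, ?)", (topic_id, name))


def add_child(conn: sqlite3.Connection, parent: str, child: str, position: int) -> None:
    conn.execute(
        "INSERT INTO containment (parent, child, position) VALUES (?, ?, ?)",
        (parent, child, position),
    )


def children(conn: sqlite3.Connection, parent: str) -> List[Tuple[str, str]]:
    """Fetch one topic's children, in document order, without loading the whole map."""
    rows = conn.execute(
        "SELECT t.id, t.name FROM containment AS c "
        "JOIN topic AS t ON t.id = c.child "
        "WHERE c.parent = ? ORDER BY c.position",
        (parent,),
    )
    return rows.fetchall()


conn = open_store()
for topic_id, name in [("book", "A Book"), ("ch1", "Chapter 1"), ("ch2", "Chapter 2")]:
    add_topic(conn, topic_id, name)
add_child(conn, "book", "ch2", 2)
add_child(conn, "book", "ch1", 1)
conn.commit()
print(children(conn, "book"))  # [('ch1', 'Chapter 1'), ('ch2', 'Chapter 2')]
```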
>
> Thanks very much to the members of TopicMapMail who have been an
> invaluable resource during the (several month) gestation of the new
> website.
>
> Regards
>
> Con
>
>
> ----
>
> "I believe we were all glad to leave New Zealand. It is not a
> pleasant place. Amongst the natives there is absent that
> charming simplicity which is found in Tahiti; and the greater
> part of the English are the very refuse of society. Neither
> is the country itself attractive. I look back but to one
> bright spot, and that is Waimate, with its Christian
> inhabitants."
>
> - Charles Darwin
>
> _______________________________________________
> topicmapmail mailing list
> topicmapmail@infoloom.com
> http://www.infoloom.com/mailman/listinfo/topicmapmail
>
--
Murray
......................................................................
Murray Altheim http://www.altheim.com/murray/
Strategic and Services Development
The Open University Library
The Open University, Milton Keynes, Bucks, MK7 6AA, UK
MORE swift, more fleet, than the sun-stained feet of the Dawns that trample the night--
More fleet, more swift, than the gleams that lift in the wake of a wild star's flight--
Through the unpathed deeps of a sea that sweeps unplumbed, unsailed, unknown,
Where the forces untamed, unseen, unnamed, have ruled from the First, alone,
Now the Ghosts of Thought, with a message caught from the tales of the dreaming past,
Unheard, unseen, with nor sound nor sheen, speed through the ultimate vast.
excerpt from "Wireless Telegraph." by Don Marquis. http://donmarquis.org/wireless.htm