Euler, Topic Maps, and Revolution |
| Steve Pepper |
| Senior Information Architect |
| STEP Infotek A.S
Gjerdrums vei 12 N-0486 Oslo Norway Phone: +47 22 02 16 87 Fax: +47 22 02 16 81 Email: pepper@infotek.no Web: http://www.infotek.no |
Biographical notice: |
Steve Pepper is the Senior Information Architect at STEP Infotek, a company in the STEP group, based in Norway, Germany and Hungary, that specialises in information reengineering. |
A frequent speaker at SGML events around the world, he is the author and maintainer of the popular Whirlwind Guide to SGML and XML tools, which is freely available on the internet at http://www.infotek.no/sgmltool/, and co-author (with Charles Goldfarb and Chet Ensign) of the SGML Buyer's Guide, a comprehensive guide to choosing SGML and XML products and services. |
ABSTRACT: |
Introduction to Topic Maps |
Current status |
This paper is based on a partially revised version of the text that was sent out for ballot, and it should therefore correspond in most, if not all, details with the final standard. |
Background |
Purpose |
So what can the standard be used for? |
Formally speaking, the standard interchange representation of topic maps is defined in terms of an SGML architecture. A topic map is basically an SGML (or XML) document (or set of documents) in which different element types, derived from a basic set of architectural forms, are used to represent topics, occurrences of topics, and relationships (or “associations”) between topics. The key concepts, then, are the topic (and topic type), the topic occurrence (and occurrence role), and the topic association (and association type). Other concepts which extend the expressive power of the topic map model are those of scope, public subject and facets. |
Topics and their occurrences |
What, then, is a topic? |
You can't get much more general than that! |
In fact, this is almost word for word how the topic map standard defines subject, the term used for the abstraction that the topic itself stands in for. |
Strictly speaking, the term “topic” refers to the element in the topic map document (the topic link) that represents the subject being referred to. However, in this paper it will often be used more loosely to denote both of these things together. Whenever there is a need to distinguish between the two, we will use the terms “topic link” and “subject”. |
So, in the context of an encyclopaedia, a topic might represent subjects such as “Spain”, “Andalusia”, “Granada”, “La Alhambra”, the poet “Federico García Lorca”, or a piece of music by Manuel de Falla: anything that might have an entry in the encyclopaedia – but also much else besides. |
[Figure: Topics]
A topic has a topic type – or perhaps multiple topic types. |
Thus, Spain would be a topic of type “country”, Andalusia a topic of type “region”, Granada and Sevilla topics of type “city”, García Lorca a topic of types “poet” and “playwright”, etc. In other words, topic types represent a typical class-instance relationship (variously called hyponymy/hypernymy, subordination/superordination, or the IS-A relation). |
Just what one regards as topics and topic types will vary depending on the kind of information in question: in a thesaurus, topics would represent terms and domains; in software documentation they might be functions, variables, objects and methods; in legal publishing, laws, cases, courts, concepts and commentators; in technical documentation, components, suppliers, procedures, error conditions, etc. |
Topic types are themselves defined as topics by the standard. You can explicitly declare “country”, “city”, “poet”, etc. as topics in your topic map if you want (in which case you will be able to say more about them using the topic map model itself); otherwise a topic map processor will tacitly interpret them as topics and instantiate them as such in the internal data structure it uses to represent the topic map (called the “topic map grove”). |
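The tacit interpretation described above can be sketched in a few lines of Python. This is only an illustration, not the grove construction defined by the standard: the dictionary layout and the function name `implied_type_topics` are assumptions made for the example.

```python
from typing import Dict, List, Set

# Explicitly declared topics, each mapped to its topic types (hypothetical layout).
declared: Dict[str, List[str]] = {
    "spain": ["country"],
    "granada": ["city"],
    "lorca": ["poet", "playwright"],
    "country": [],  # "country" has been declared as a topic in its own right
}


def implied_type_topics(topics: Dict[str, List[str]]) -> Set[str]:
    """Return the topic types that are used but never declared as topics."""
    used = {topic_type for types in topics.values() for topic_type in types}
    return used - set(topics)


implicit = implied_type_topics(declared)
print(sorted(implicit))  # ['city', 'playwright', 'poet']
```

A processor following this approach would create topics for the three implicit types, while "country" keeps whatever characteristics its explicit declaration gave it.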
[Figure: Topic types]
What are the characteristics of a topic? |
First of all, a topic can have a name – or more than one. |
The standard therefore provides an element form for name, which it allows to occur zero or more times for any given topic, and to consist of one or more of the following types of name: base name, display name and sort name. |
[Figure: Topic names]
The ability to specify more than one topic name can be used to name topics within different scopes (about which more later), such as language, style, domain, geographical area, historical period, etc. |
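As a rough illustration of scoped names, the sketch below attaches a set of themes to each name and selects the names valid within a given theme. The class `ScopedName`, the function `names_in_scope` and the theme labels "spanish" and "english" are hypothetical and are not taken from the standard.

```python
from typing import FrozenSet, List, NamedTuple


class ScopedName(NamedTuple):
    value: str
    scope: FrozenSet[str]  # themes within which the name is valid


# Two names for the same topic, distinguished by language scope.
seville_names = [
    ScopedName("Sevilla", frozenset({"spanish"})),
    ScopedName("Seville", frozenset({"english"})),
]


def names_in_scope(names: List[ScopedName], theme: str) -> List[str]:
    """Select the names whose scope includes the given theme."""
    return [name.value for name in names if theme in name.scope]


print(names_in_scope(seville_names, "english"))  # ['Seville']
```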
The second characteristic of a topic is that it can have one or more occurrences. |
A topic occurrence is an occurrence (or set of occurrences) of a topic within one or more addressable information resources. It could be a monograph devoted to a particular topic, for example, or an article about the topic in an encyclopaedia; it could be a picture or video depicting the topic, a simple mention of the topic in the context of something else, a commentary on the topic (if the topic were a law, say), or any of a host of other forms in which an information resource might have some relevance to a topic. |
[Figure: Occurrences]
An important point to note here is the separation into two layers of the topics and their occurrences. This separation is one of the clues to the power of topic maps and we shall return to it later. |
Occurrences, as we have already seen, may be of any number of different types (we gave the examples of “monograph”, “article”, “illustration”, “mention” and “commentary” above). Such distinctions are supported in the standard by the concept of the occurrence role. |
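The constructs introduced so far (topics, topic types, names, and occurrences with roles) can be pictured as a small data structure. The following is a minimal sketch under assumptions: the class and field names are invented for illustration, and the locator strings are hypothetical addresses rather than real resources.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Occurrence:
    role: str     # occurrence role, e.g. "article" or "mention"
    locator: str  # address of the information resource (hypothetical)


@dataclass
class Topic:
    topic_id: str
    types: List[str] = field(default_factory=list)
    names: List[str] = field(default_factory=list)
    occurrences: List[Occurrence] = field(default_factory=list)

    def occurrences_with_role(self, role: str) -> List[Occurrence]:
        """Return only the occurrences that play the given role."""
        return [occ for occ in self.occurrences if occ.role == role]


lorca = Topic(
    topic_id="lorca",
    types=["poet", "playwright"],
    names=["Federico García Lorca"],
    occurrences=[
        Occurrence("article", "encyclopaedia.sgm#lorca"),
        Occurrence("mention", "encyclopaedia.sgm#granada"),
        Occurrence("illustration", "images/lorca.jpg"),
    ],
)

print([occ.locator for occ in lorca.occurrences_with_role("article")])
# ['encyclopaedia.sgm#lorca']
```

Note that the topic holds only addresses of its occurrences; the resources themselves stay outside the structure, which mirrors the two-layer separation described above.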
[Figure: Occurrence roles]
Topic associations |
The really interesting thing, however, is to be able to describe relationships between topics, and for this the topic map standard provides a construct called the topic association. |
A topic association is (formally) a link element that asserts a relationship between two or more topics. Examples might be as follows: |
[Figure: Topic associations]
Just as topics can be grouped according to type (country, city, poet, etc.) and occurrences according to role (mention, article, commentary, etc.), so too can associations between topics be grouped according to their type. The association type for the relationships mentioned above might be “is in” (or geographical containment), “born in”, “written by”, “collaborated with”. As with most other constructs in the topic map standard, association types are themselves regarded as topics, whether or not they are explicitly declared to be so. |
It is worth noting that topic types can be regarded as a special kind of association type; the semantics of a topic having a type (for example, of Granada being a city) could quite easily be expressed through an association (of type “instance-of”) between the topic “Granada” and the topic “city”. The reason for having a special construct for this kind of association is the same as the reason for having special constructs for certain kinds of names (indeed, for having a special construct for names at all): The semantics are so general and universal that it is useful to standardise them in order to maximise interoperability between systems that support topic maps. |
[Figure: Association types]
It should also be pointed out that while both topic associations and normal cross references are hyperlinks, they are very different creatures: in a cross reference, the anchors (or end points) of the hyperlink occur within the information resources (although the link itself might be outside them); with topic associations, we are talking about links (between topics) that are completely independent of whatever information resources may or may not exist or be considered as occurrences of those topics. |
Why is this important? |
[Figure: Topic maps as independent resources]
Each topic that participates in an association has a corresponding association role which states the role played by the topic in the association. In the case of the relationship “García Lorca was born in Granada”, expressed by the association between García Lorca and Granada, those roles might be “person” and “birthplace”; for “La vida breve was written by Manuel de Falla” they might be “opera” and “composer”. It will come as no surprise to learn that association roles, too, are regarded as topics in the topic map standard! |
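One way to picture associations with roles is as typed records that map each role to the topic playing it. The sketch below is an assumption-laden illustration: the `Association` class, the `find_players` helper and the role names "instance" and "class" are invented for the example, while the "born-in" and "written-by" roles follow the examples in the text.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Association:
    assoc_type: str
    roles: Dict[str, str]  # association role -> topic playing that role


associations = [
    Association("born-in", {"person": "García Lorca", "birthplace": "Granada"}),
    Association("written-by", {"opera": "La vida breve", "composer": "Manuel de Falla"}),
    # A topic type expressed as an ordinary association (role names are hypothetical).
    Association("instance-of", {"instance": "Granada", "class": "city"}),
]


def find_players(assocs: List[Association], assoc_type: str,
                 wanted_role: str, given_role: str, given_topic: str) -> List[str]:
    """Find topics playing wanted_role where given_topic plays given_role."""
    return [
        a.roles[wanted_role]
        for a in assocs
        if a.assoc_type == assoc_type
        and a.roles.get(given_role) == given_topic
        and wanted_role in a.roles
    ]


print(find_players(associations, "born-in", "person", "birthplace", "Granada"))
# ['García Lorca']
```

Because the roles are named, the same association can be queried from either end: asking for the "birthplace" where the "person" is García Lorca returns Granada.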
Other association types, such as those that express class/instance and part/whole (meronymy/holonymy) relationships, are “transitive”: if we say that Lorca is a poet, and that a poet is a writer, we have implicitly said that Lorca is a writer. Similarly, by asserting that Granada is in Andalusia, and that Andalusia is in Spain, we have automatically asserted that Granada is in Spain, and any topic map-aware search engine should be able to draw the necessary conclusions without the need for making the assertion explicitly. |
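The inference that a topic map-aware search engine would need to draw can be sketched as a transitive closure over the asserted pairs. This is a naive fixed-point computation chosen for clarity, not a description of how any particular engine works; the function name is an assumption.

```python
from typing import Set, Tuple

Pair = Tuple[str, str]


def transitive_closure(pairs: Set[Pair]) -> Set[Pair]:
    """Repeatedly add (a, d) whenever (a, b) and (b, d) are present."""
    closure = set(pairs)
    while True:
        derived = {(a, d) for (a, b) in closure for (c, d) in closure if b == c}
        if derived <= closure:  # nothing new can be inferred
            return closure
        closure |= derived


# Only the two "is in" assertions from the text are stated explicitly.
is_in = {("Granada", "Andalusia"), ("Andalusia", "Spain")}
inferred = transitive_closure(is_in)

print(("Granada", "Spain") in inferred)  # True
```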
Scope |
[Figure: Scoping topic names, occurrences and associations]
Public subject |
Facets |
The final feature of the topic map standard to be considered in this introduction is the concept of the facet. |
[Figure: Applying facets for filtering]
Conclusion |
The applicability of topic maps extends to all spheres of information management, not least commercial reference works and legal publishing. |
Support for topic maps is currently being implemented in a number of information management tools, including STEP's document management and editorial system, SigmaLink. |
Topic Maps and Reference Works Publishing |
Paradoxically, the answer lies in the fact that most users today do not need more information – if anything, they need less, because they are already drowning in enormous quantities of it. At the very least, they need to be able to find their way to relevant information as quickly as possible and to filter out the “noise” created by all the information for which they have no use. They also need to be able to trust the information they receive, to know that it is reliable and up-to-date. |
When writing this paper, I wanted to know who wrote the song Granada in order to make my point about the scope of names. So I did a search using AltaVista and eventually, after several attempts to narrow the number of hits, found the following: |
http://www.sonyclassical.com/releases/62625.htm
Thus, two of the most important “value-adds” that commercial publishers can provide are |
Before looking at how topic maps can help solve these problems, here is some background drawn from the ideas of two leading European publishers of reference works, PWN and KF. |
The “Mother Encyclopaedia” |
[Figure: PWN's “Mother Encyclopaedia”]
The “Theme” Model |
[Figure: KF's “Theme model”]
Once again, the goal is to ease the task of maintenance, and enable more reuse and the faster creation of new works from the existing pool of information. |
Applying Topic Maps |
New navigational paths |
A traditional encyclopaedia typically provides a number of different navigational mechanisms: |
“Hard facts” |
Sometimes such facts are best stored as attributes of the topic itself (or its name); sometimes it makes more sense to store them along with similar information about other topics of the same type – not least because they will then be available for use in comparative tables and diagrams. (This applies particularly to population figures and other kinds of statistics, that also become easier to maintain when stored together in relational tables.) |
The following example shows how information stored in such a way might be referenced from within an SGML or XML document by means of an SQL query defined using architectural forms: |
<tbody SLquery="tptable" tptype="country" tpnames="DK DE NO PL" tpprops="name area population gdp" key="ISO3166_A2" order="name" > |
The result of such a query would be a table that is automatically kept up-to-date as the statistical information in the relational table is updated: |
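How the attributes of the element above might be turned into a relational query can be sketched as follows. The mapping is an assumption made for illustration (the paper does not define how `SLquery` is processed): `tptype` is taken to name the table, `tpprops` the columns, `tpnames` the key values, `key` the key column and `order` the sort column. The statistical columns are deliberately left empty rather than filled with invented figures.

```python
import sqlite3
from typing import List, Tuple

# Allow-lists guard the identifiers, which cannot be passed as SQL parameters.
ALLOWED_TABLES = {"country"}
ALLOWED_COLUMNS = {"ISO3166_A2", "name", "area", "population", "gdp"}


def build_query(tptype: str, tpnames: str, tpprops: str,
                key: str, order: str) -> Tuple[str, List[str]]:
    """Translate the attribute values into a parameterised SELECT statement."""
    props = tpprops.split()
    names = tpnames.split()
    if tptype not in ALLOWED_TABLES:
        raise ValueError(f"unknown table: {tptype}")
    for column in [key, order, *props]:
        if column not in ALLOWED_COLUMNS:
            raise ValueError(f"unknown column: {column}")
    placeholders = ", ".join("?" for _ in names)
    sql = (f"SELECT {', '.join(props)} FROM {tptype} "
           f"WHERE {key} IN ({placeholders}) ORDER BY {order}")
    return sql, names


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE country (ISO3166_A2 TEXT PRIMARY KEY, name TEXT, "
             "area REAL, population INTEGER, gdp REAL)")
# Only codes and names are stored; the statistics stay NULL in this sketch.
conn.executemany(
    "INSERT INTO country (ISO3166_A2, name) VALUES (?, ?)",
    [("DK", "Denmark"), ("DE", "Germany"), ("NO", "Norway"),
     ("PL", "Poland"), ("ES", "Spain")],
)

sql, params = build_query("country", "DK DE NO PL",
                          "name area population gdp", "ISO3166_A2", "name")
rows = conn.execute(sql, params).fetchall()
print([row[0] for row in rows])  # ['Denmark', 'Germany', 'Norway', 'Poland']
```

Because the table is regenerated from the query each time the document is processed, updating a row in the relational table is enough to bring every document that references it up to date.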
Semantic validation |
Finally, the holy grail of total factual accuracy becomes a little less distant when semantic validation mechanisms are implemented, and these become immeasurably easier to maintain when they are linked to the topic paradigm. |
Thus editorial guidelines might stipulate that the set of allowable values for language attributes, used when providing etymological or other language-related information, is that specified in ISO 639; that countries should have a geographical code taken from ISO 3166; etc. These guidelines can be enforced by storing the controlled vocabularies as attributes of the topic types “language” and “country” respectively and once again using architectural mechanisms to invoke the validation: |
<!NOTATION iso639-1 PUBLIC "ISO 639:1988//NONSGML Two letter language codes//EN" >
<!ELEMENT term - - (#PCDATA) >
<!ATTLIST term
    language  CDATA  #IMPLIED
    SLvalid   CDATA  #FIXED "CONVOC language cvlocatt"
    cvlocatt  CDATA  #FIXED "iso639-1 icase" >
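The check that such a declaration asks a processor to perform amounts to looking up an attribute value in a controlled vocabulary. The sketch below illustrates the idea in Python under two assumptions: that `icase` requests case-insensitive matching, and that the vocabulary is available as a simple set. Only a small sample of ISO 639 two-letter codes is listed, and the function name is hypothetical.

```python
from typing import Set

# A small sample of ISO 639 two-letter language codes, not the full list.
ISO_639_1_SAMPLE: Set[str] = {"da", "de", "en", "es", "no", "pl"}


def is_valid_language(value: str, vocabulary: Set[str],
                      ignore_case: bool = True) -> bool:
    """Check an attribute value against a controlled vocabulary."""
    candidate = value.strip()
    if ignore_case:
        return candidate.lower() in {code.lower() for code in vocabulary}
    return candidate in vocabulary


print(is_valid_language("ES", ISO_639_1_SAMPLE))  # True
print(is_valid_language("xx", ISO_639_1_SAMPLE))  # False
```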
Euler, Topic Maps, and Revolution |
So what does all this have to do with Euler – or revolution, for that matter? |
Leonhard Euler, born in Switzerland in 1707, was one of the greatest mathematicians of all time – and also one of the most prolific. He spent most of his life serving at the court of Catherine the Great in St. Petersburg, and at the Academy of Sciences of Frederick the Great in Berlin. He contributed to every mathematical field that existed at the time, and created several new ones. Euler could turn his mind to any practical problem – provided it involved mathematics. One such was the famous Bridges of Königsberg problem. |
The city of Königsberg (now the Russian enclave of Kaliningrad) lies astride the River Pregel (or Pregolya) at a point where it separates into two branches and forms an island. In Euler's day the river was crossed by seven bridges and it was a favourite Sunday afternoon pastime for the city's inhabitants to stroll around the town crossing the bridges. |
[Figure: The seven bridges of Königsberg]
The question inevitably arose as to whether it was possible to find a route which would allow all seven bridges to be crossed in the course of an afternoon stroll without recrossing any of them. Since their attempts had always failed, most people believed that the task was impossible, but it was not until 1736 that the problem was treated from a mathematical point of view. |
As you have guessed, Euler solved the problem – in fact by proving that no such route was possible. He recognised that the exact form of the land masses, river and bridges was irrelevant to the problem, and he reduced it to the following diagram showing four nodes (representing the land masses), connected by seven arcs (representing the bridges): |
[Figure: Nodes and arcs representing Königsberg]
From this simplification of the problem he proceeded to work out a number of principles that are relevant to any system of interconnected nodes, and thus gave rise to the branch of mathematics that is today known as “graph theory”. |
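Euler's conclusion can be checked mechanically with the criterion that bears his name: a connected graph can be walked using every arc exactly once only if it has zero or two nodes of odd degree. In the sketch below the node labels are an assumption (A stands for the island, B and C for the two banks, D for the land between the river's branches), and the graph is taken to be connected rather than tested for it.

```python
from collections import Counter
from typing import List, Tuple

# The seven bridges as arcs between the four land masses (labels are hypothetical).
BRIDGES: List[Tuple[str, str]] = [
    ("A", "B"), ("A", "B"),
    ("A", "C"), ("A", "C"),
    ("A", "D"), ("B", "D"), ("C", "D"),
]


def odd_degree_nodes(edges: List[Tuple[str, str]]) -> List[str]:
    """Return the nodes touched by an odd number of arcs."""
    degree: Counter = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return sorted(node for node, d in degree.items() if d % 2 == 1)


def has_euler_walk(edges: List[Tuple[str, str]]) -> bool:
    """Degree test only; assumes the graph is connected."""
    return len(odd_degree_nodes(edges)) in (0, 2)


print(odd_degree_nodes(BRIDGES))  # ['A', 'B', 'C', 'D']
print(has_euler_walk(BRIDGES))    # False
```

All four land masses have odd degree (five for the island, three for each of the others), so no stroll can cross each bridge exactly once.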
The connection, of course, is that topic maps are also graphs and that the methods that are brought to bear by mathematicians and computer scientists to solve such diverse problems as the “shortest path problem”, the “Chinese postman problem”, the “travelling salesman problem”, and a host of others, can fruitfully be applied to extending our understanding of what topic maps are and how they can most profitably be applied. |
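To make the graph view concrete, the sketch below treats topics as nodes and associations as undirected arcs, then uses breadth-first search to find a shortest chain of associations between two topics. The arcs are drawn from relationships mentioned earlier in this paper; the function name and the decision to ignore association types and roles are simplifying assumptions.

```python
from collections import deque
from typing import Dict, List, Optional, Set, Tuple

# Associations reduced to undirected arcs between topics.
ARCS: List[Tuple[str, str]] = [
    ("García Lorca", "Granada"),            # born in
    ("La Alhambra", "Granada"),             # is in
    ("Granada", "Andalusia"),               # is in
    ("Andalusia", "Spain"),                 # is in
    ("La vida breve", "Manuel de Falla"),   # written by
]


def shortest_path(arcs: List[Tuple[str, str]],
                  start: str, goal: str) -> Optional[List[str]]:
    """Breadth-first search; returns None if the topics are not connected."""
    neighbours: Dict[str, Set[str]] = {}
    for a, b in arcs:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for nxt in sorted(neighbours.get(node, ())):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


print(shortest_path(ARCS, "García Lorca", "Spain"))
# ['García Lorca', 'Granada', 'Andalusia', 'Spain']
```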
But that isn't the only connection: one of Euler's colleagues while at the Berlin Academy was the mathematician Jean Le Rond d'Alembert, and Euler once had a memorable encounter with the philosopher Denis Diderot in St. Petersburg during which he produced his famous algebraic “proof” of the existence of God. D'Alembert and Diderot were well acquainted, of course, since they were the joint editors of the Encyclopédie, generally regarded as the first modern encyclopaedia and an important factor in the development of ideas leading up to the French revolution – all of which somehow brings us back to our point of departure... |
Life is full of connections; knowledge and creativity thrive on them, encyclopaedias, dictionaries and other reference works need to be able to exploit them – and topic maps are the tool that enables them to do so, efficiently and reliably. That's why they are part of the revolution. |