
Martina Hemrich
 Consultant
STEP Electronic Publishing Solutions GmbH
  Technologiepark Würzburg-Rimpar Pavillon 7 Rimpar  Germany (D-97222)
Email: consulting@step.de  Web site: www.step.de
 Biography
 Martina Hemrich works as an SGML/XML consultant at STEP's consulting department. She started at STEP in March 1995, specializing in DTD design and the rendering of structured data (both layout and linking concepts). The focus of her conceptual work lies on the analysis and design of information architectures targeting the intelligent presentation of information. The main aspects of her consulting work are the analysis of publication processes, Information Process Reengineering, and developing new concepts for a presentation of (intelligent) information that meets the requirements of content, audience, and medium. She carries out SGML implementation projects, coaches the introduction of new publication processes, and gives workshops on SGML/XML concepts, information design, and rendering strategies. Degree in English and Romance Language and Literature. Magister Artium 1987. Editorial work using structured capturing mechanisms. Teaching position. Additional degree in computational linguistics in 1994 with a major in SGML.
Ulrike Schäfer
 Consultant
STEP Electronic Publishing Solutions GmbH
  Technologiepark Würzburg-Rimpar Pavillon 7 Rimpar  Germany (D-97222)
Email: consulting@step.de  Web site: www.step.de
 Biography
Ulrike Schäfer works as an SGML/XML consultant at STEP's consulting department. She joined STEP in April 1994 as a software developer in SGML-based information management projects, specialising in data conversion, database programming, DTD development, workflow development, and training. Since April 1998 she has worked as a consultant with the main focus on analysis and re-design of information processes, workflow design, and SGML/XML-based information management. She gives workshops on SGML and XML concepts. Ulrike Schäfer studied German language and literature, philosophy and computer science at the University of Würzburg. Magister Artium 1991. Teaching position at the University of Würzburg until 1994.
 

 
"The process of tying two items together is the important thing." [BUSH45, 19]
 

"What Isn't Hypertext?"

 Since we are social beings, one of our basic needs is to communicate information to other humans. Language is our "tool" for communication, and the instance, the physical representation of language, is text.
 Communication of human knowledge is usually text-based (be it written or spoken text), and it usually happens within a context. That is, a communicated piece of information always has some relevance to another piece of information or to a specific situation. In text-based communication, links express the relation of one information unit to others and thus express the context in which this information unit is relevant. Without context, information cannot be remembered and, as a consequence, will not reach the status of knowledge.
 Text creators use links to build up a textual system that is more than the written text they actually write down. Almost every textual statement has some connection to other texts that serve as support, contradiction, proof, supplement, comment, etc. Various strategies have been established for presenting these connections and their many types in written, paper-based form: tables of contents, indexes, and cross references like "see page 123" support readers in navigating a given text; citations, cross references to external information sources, commentaries, and annexes provide additional information; bibliographies state where additional (supporting, commenting, opposing...) information can be found; glossaries and thesauri have explanatory functions.
 Before investigating the various link types and concepts used for constructing hypertextual infrastructures, we give an introductory description of "contextual" information, that is, of hypertext: its characteristics, its implications for authors and readers, the challenge it poses for the development of new presentational concepts and technologies, and the opportunities it opens up in the field of information creation, distribution, and management.
 

"Weaving the Hypertext"

 If one looks up the meanings of the word components of hyper-text in Webster's New World Dictionary of the American Language (1976), the following entries are found for text:
 "the actual or original words used by an author, as distinguished from notes, commentary, paraphrase, translation, etc."
 and further on
 "the principal matter on a printed or written page, as distinguished from notes, headings, illustrations, etc."
 The prefix hyper is defined as
 "over, above, more than the normal, excessive ..."
 Now, if we combine the key parts of these definitions, the conclusion can be drawn that hypertext must be something more than text but somehow connected to the "normal". If the "normal" is the words used by an author, the "hyper" probably has something to do with the "notes, commentaries, paraphrases, translations, headings, illustrations, etc."
 If we take a closer look at the source of these definitions, we are faced with an information pool, a dictionary, that consists of entries and uses hypertextual devices like paraphrases, synonyms, references to other entries, illustrations, etc. as explanatory information for the main entry word, the lemma.
 Explicit references like "see textual" are direct hints as to where additional information related to the lemma can be found. Other references are not stated as obviously, like the mere mention of synonyms, paraphrases, antonyms, etc. The "curious reader", though, follows these hints to make sure she gets a complete picture of the lemma she wants to know about.
 Admittedly, dictionaries are not the typical information pools that make extensive use of explicit references. Information pools and types that are far more hypertextual than dictionaries, and in a far more obvious way (encyclopedias, for example), will be described later.
But still, when reading such an entry in a dictionary, an "ideal reader" will find enough anchors that catch her attention and let her own associations and ideas emerge, thus complementing, supplementing, and in effect co-authoring the dictionary's entry. She might even consider consulting other reference works, people, or the Web, thus getting involved in "Weaving the Hypertext".
 

The Spiral Web

 
"It is exactly as though the physical items had been gathered together 
from widely separated sources and bound together to form a new book."
[BUSH45, 19]
 

Traditional notion of text vs. hypertext

text | hypertext
complete: one whole, self-consistent entity | modular: multiple information units
linear: following in time and order; only one reading order possible | non-linear: various possibilities for reading order
two-dimensional: printed presentation on paper | two+n-dimensional: link networks existing above the text
 

modular: "... widely separated sources...":

 The knowledge reservoir of a human being consists of countless information units that we keep augmenting throughout our entire life. This information pool per se is not logical and ordered; rather, it is chaotic and hard to predict. We do not speak of knowledge until some mental links have been established between units, thus putting them in a context. Information without a context is not accessible.
 In analogy to the human "knowledge management" going on in our brains, a hypertext constitutes a textual system consisting of multiple composable information units that, depending on each composition design, form different wholes. For instance, an information pool consisting of individual encyclopedia articles could serve as a reservoir for the production of a one-volume edition, a 24-volume edition, or a biographical edition containing only the biographical articles of the pool.
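The idea of one pool yielding several editions can be pictured as a filter over self-describing units. The following is only a minimal sketch, assuming each article carries a hypothetical `category` field; the class, the field, and the sample pool are illustrative and not taken from any real encyclopedia system.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass(frozen=True)
class Article:
    lemma: str
    category: str  # hypothetical self-description of the unit

# A tiny, illustrative information pool of encyclopedia articles.
POOL: List[Article] = [
    Article("Bernstein, Leonard", "biography"),
    Article("New York Philharmonic", "institution"),
    Article("World Wide Web", "technology"),
    Article("Bush, Vannevar", "biography"),
]

def compose(pool: List[Article], include: Callable[[Article], bool]) -> List[str]:
    """Compose an 'edition': select the units that pass the filter, ordered by lemma."""
    return sorted(a.lemma for a in pool if include(a))

full_edition = compose(POOL, lambda a: True)
biographical_edition = compose(POOL, lambda a: a.category == "biography")
print(biographical_edition)  # ['Bernstein, Leonard', 'Bush, Vannevar']
```

The units themselves never change; only the composition rule does, which is what lets the same pool serve as the reservoir for different wholes.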
 Each new composition of the information units has its own individuality, like a musical composition made of known material (notes): by combining the individual components, a new melody emerges, which in itself has a different effect on each listener.
 

non-linear: "... bound together...":

 Human thinking is not linear. Human knowledge is more than two-dimensional. The articulation of what we know is thinking, and the highways from one unit of knowledge to others we call links. Links establish the connections between the bits of knowledge in the information pool called the human mind.
 A linear, paper-based presentation of what is going on in the human mind inevitably implies a loss of information. Since communication of knowledge is usually text-based (whether spoken or written), new, more adequate media for the presentation of text had to be found for knowledge presentation.
 Human knowledge can be regarded as an immense information pool consisting of interconnected information units based on the associations of the human brain, and this is what hypertext strategies try to mirror in computers. Paper has never been a satisfying medium for communicating human creative information. Authors have struggled to transcend the limits of paper, and not only in modern times (e.g. James Joyce, Arno Schmidt, Douglas Adams, Italo Calvino). Against today's technological background, the computer and its underlying technological concepts represent a medium that is being developed to carry out human actions as perfectly as possible.
 The technological playground for creative, non-linear knowledge representation and communication is ready for the "textual" persons involved in weaving the hypertext, its "authors" and "readers", who are not afraid of the possibilities the new technologies offer them.
 One major implication of combining the hypertext paradigm with the technological possibilities is certainly the changing role of "authors" and "readers". "Weaving the Hypertext" will be an interactive process rather than a presentational one once the technological infrastructure is ready for the mass of people. The availability of browsers like Microsoft Internet Explorer version 5, with its browsing and authoring facilities, is one more step towards making this changing paradigm a reality for a broad audience.
 

two+n-dimensional: "... to form a new book...":

When it comes to managing knowledge in the human mind, we classify and categorize the role an information unit plays within a given context. In this way we organize and manage what we know, with the main goal of having it available when we need it: we prepare our mental information pool for retrieval and navigation.
 It is possible to represent this mental knowledge management on paper; text creators have developed strategies like table of contents, index, glossary, cross reference, citation, and quotation. But the character of these organizational devices is hyper ("over, above, more than the normal, excessive..."), that is, over and above the printed letter, which represents information in a linear and flat (two-dimensional) way, making reading uncomfortable at best and implying a loss of information at worst.
 Technological development in the computer sector over the past decades has brought us fast (hardware capacities) and automated (search engines) information management and retrieval facilities. But this is not enough for making information accessible in the way the human brain does: intelligent linking concepts are needed for representing the complex structure of knowledge and for providing intelligent retrieval facilities and navigational support. Linking concepts are developed according to the information type that is to be managed and the retrieval and navigational requirements imposed by the audience of this information type.
 An encyclopedia, for instance, is an information type that uses various types of references which serve as anchors for reader navigation. If you come across an entry in an encyclopedia like the following:
 
World Wide Web ..., information technology: »» WWW. [BROCK15; translated from German]
 the next action to proceed in accessing the information you are looking for is to go to the entry
 
WWW [abbr. for English World Wide Web, "worldwide net"],
(Web, W3), a service in the Internet which, via a user interface and with
the help of a »»browser, provides access to information distributed
worldwide and stored on servers; widely used browsers are
"Netscape Navigator" and "Microsoft Internet Explorer".
Document contents and the terms used in them are searched with
the help of »»search engines. [BROCK15; translated from German]
 You know that this latter entry exists in the encyclopedia because you are familiar with the symbols used in this specific information pool ("»»" marking a reference to another entry in the encyclopedia), and you go to the relevant volume and the page where this entry is. On a CD-ROM, World Wide Web would be displayed blue and underlined, for example, using the familiar signal for navigation in the Web. This signal would be followed by an action, in this case a mouse click, to which the software responds by displaying the target entry.
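Because "»»" is a fixed convention of this information pool, a program can find such references the same way a trained reader does. A minimal sketch, assuming the referenced word immediately follows the marker; it is naive by design and captures only the first word of a target.

```python
import re

# "»»" marks a reference to another entry; the referenced word follows it.
REFERENCE = re.compile(r"»»\s*([\w-]+)")

def find_references(entry_text: str) -> list:
    """Return the words marked as cross-references, in reading order.

    Naive sketch: only the first word after the marker is captured, so a
    multi-word target such as "search engines" yields just "search".
    """
    return REFERENCE.findall(entry_text)

print(find_references("World Wide Web ..., information technology: »» WWW."))
# ['WWW']
```

Mapping a captured word such as "browser" to the lemma of the target entry (case, inflection, multi-word lemmas) would need a lookup against the list of lemmas, which this sketch leaves out.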
 Everybody who knows the Web is familiar with this very simple link concept: the one-way link (<a href="http://www.step.de"> Link to an XML/SGML company in Germany </a>), which constitutes the basis for surfing through Web documents.
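What makes this link "one-way" is that it is recorded only in the source document: the target does not know who points to it. A minimal sketch using Python's standard `html.parser`; the page names are hypothetical. Answering "which pages link here?" forces a scan of every page.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect the href of every <a> element, the only place an HTML link is recorded."""

    def __init__(self):
        super().__init__()
        self.targets = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.targets.append(href)

def outgoing_links(html: str) -> list:
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.targets

def incoming_links(pages: dict, target: str) -> list:
    """The target knows nothing about its sources, so every page must be parsed."""
    return sorted(name for name, html in pages.items() if target in outgoing_links(html))

# Hypothetical pages of a small site.
pages = {
    "a.html": '<p><a href="http://www.step.de">STEP</a></p>',
    "b.html": "<p>no links here</p>",
    "c.html": '<a href="http://www.step.de">again</a>',
}
print(incoming_links(pages, "http://www.step.de"))  # ['a.html', 'c.html']
```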
Hypertext systems with link concepts much more sophisticated than that of the World Wide Web exist, like Augment, Xanadu, FRESS, KMS, SEPIA, and HyperCard, to name just a few. But they exist only in restricted environments, for an audience privileged enough to have an intelligent hypertext system at its disposal. The ordinary WWW user has to put up with the insufficiencies of a hypertext system whose link concept is, up to now, much too simple to express and present human knowledge.
Standards that allow for expressing clever linking concepts adequate for the individual information types and the requirements of the WWW audience, though, are currently being developed: XLink, XPointer, and Topic Maps, to name just some of the basics, which will be dealt with in the second part of this paper.
 

Typical hypertext information types/situations

 The WWW should be regarded as a platform for hypertext applications rather than as a hypertext system itself [MUENZ97]. It provides the infrastructure (data format, exchange protocols), but the representational design of the various information types calls for more powerful concepts with respect to data formats and link types. This becomes clear when we look at some hypertextual information types and the different concepts needed to present them optimally.
 According to Ben Shneiderman (Reflections on authoring, editing, and managing hypertext. In Barrett, E. (Ed.): The Society of Text. MIT Press, Cambridge, MA, 115-131), the following criteria for hypertext applications should be considered: hypertext is an applicable concept when an information pool is large and consists of individual components that are related to each other, and when the user needs access to each single information component at a certain time. [NIELS96] gives a thorough description of hypertext-prone information types (pp. 67-129), which is briefly summarized below.
 
  1.  Commercial applications:
     
    •  Legal texts and reference works (encyclopedias, dictionaries) are highly hypertextual information types that have already gone beyond the limitations of paper-based presentation. In encyclopedias, for example, the communication of knowledge relies on navigational strategies based on links like "World Wide Web ... » WWW." [BROCK15].
    •  Legal texts employ linking systems for example for expressing relations to supplementary information like commentaries and to information that serves as evidence.
    •  Manuals and technical documentation can be presented to the reader much more effectively when links can be followed interactively, for example in maintenance procedures.
  2.  Applications in the computing area: Complex link concepts can combine technical documentation, online help texts, tutorials, and software documentation to be rendered in interactive user interfaces, thus constituting a huge hypertext that allows customized views for different user profiles and profile-dependent actions.
  3.  Hypertextual training environments: The communication of knowledge is best realized when the training platform and the information carrier used mirror the infrastructure of the human brain. Hypertextual training "manuals" can offer various paths through the contents, so that each student can choose the path best suited for her. In cooperative hypertext systems, students will also be able to contribute to the contents and interact with teachers and fellow students.
 

Visions and Standards

The vision is to use the power of today's (and tomorrow's) technological means and standardized concepts for representing human knowledge and understanding adequately and, as a consequence, to make information accessible to the audience without letting them get "lost in hyperspace".
To achieve this, the ideas of hypertext systems imply that information exists in self-describing fragments that are connected to each other in multiple (self-describing and intelligent) ways. This enables the automated creation of filtered views onto an information pool, of navigational devices, of infrastructures for cooperative text creation, etc.
What technological concepts are out there to make computers mirror human knowledge and, in combination with their technical advantages (speed and automated processing), rescue it from the infoglut by making it accessible and readable to humans?
 

Components of Linking Concepts

When we talk about hypertext or linking mechanisms, we address quite different issues: in the SGML/XML world, one talks about HyTime or XLink (which are linking languages), Topic Maps (a concept for modelling knowledge structures), intranets or the Web (which are platforms for hypertext applications), and link engines based on HyTime or other languages (which are hypertext systems). But how are these topics related to my real-life publishing scenario, e.g. as a legal publisher, where I have to provide links in my data that can be used in heterogeneous media like print and the Web, or where I have to ensure that a link created by an editor does not point to a target that has been deleted during the editorial process?
Linking languages, concepts, and systems should be "enablers" for building powerful individual linking concepts: for creating, generating, editing, and maintaining the different kinds of links in my information pool and, sometimes, even links from my information pool to others.
Therefore, according to this map, questions concerning XML-based linking concepts should start out from the individual requirements:
 
  Linking Concepts
 
  1.  Individual Linking Concepts: Which typical requirements and problems occur in real-life publishing situations?
  2.  Linking Languages: Which XML-based or XML-related linking languages are available? Which kinds of links do they provide to express the different relationships given in real-life scenarios?
  3.  General Linking Concepts: Which general concepts can be used to solve the general parts of the individual linking requirements?
  4.  Hypertext Systems: What features should document management or editorial systems provide to let editors build sophisticated link networks with automatic support?
The following chapters will focus on the first three issues: Which real-life requirements are out there (e.g. in reference works and legal publishing, online help systems, technical documentation, and the Web), and how can they be fulfilled using an XML-based linking language (XLink, XPath, XPointer)? Finally, we will give a short introduction to a general SGML/XML-based linking (and even knowledge modelling) concept: the new standard ISO/IEC 13250: Topic Maps.
 

Individual Linking Concepts

 

For Example: Reference Works...


Link Types

 
Bernstein, Leonard: 
(b. Aug. 25, 1918, Lawrence, Mass., U.S. — d. Oct. 14, 1990, New York, N.Y.), American 
conductor and composer noted for his accomplishments in both classical and popular 
music, for his flamboyant conducting style, and for his pedagogic flair, especially 
in concerts for young people. [...] In 1943 Bernstein was appointed assistant conductor 
of the —> New York Philharmonic; his first signal success came on Nov. 14, 1943... 
[BCD: "Leonard Bernstein"]
The link in this article points from a sequence of words (the arrow-marked New York Philharmonic) to an entire lexicon entry in which New York Philharmonic is the lemma (main topic). Since the articles in a reference work are treated as separate documents during the editorial process, most of the links in a reference work are "content-to-object" links, which means that they point from a part of a document to an entire document (another article).
Of course, other kinds of links occur in other contexts, like content-to-content links (pointing to a paraphrase or definition of a term), object-to-content links (pointing from the article Leonard Bernstein to the index entry Bernstein, Leonard), or object-to-object links (connecting two articles in a dictionary of synonyms). Linking languages must provide a vocabulary to describe all these kinds of links.
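The four names simply describe the granularity of each end: a part of a document ("content") or a whole document ("object"). A minimal sketch of that classification, assuming an end is modelled as a document identifier plus an optional fragment; the `Endpoint` class and all identifiers are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Endpoint:
    document: str                   # hypothetical article identifier
    fragment: Optional[str] = None  # None means "the whole document"

def link_type(source: Endpoint, target: Endpoint) -> str:
    """Name a link after the granularity of its ends (content = part, object = whole)."""
    def kind(end: Endpoint) -> str:
        return "object" if end.fragment is None else "content"
    return f"{kind(source)}-to-{kind(target)}"

# The Bernstein article's phrase pointing to the whole Philharmonic article.
print(link_type(Endpoint("bernstein", "nyp-phrase"), Endpoint("new-york-philharmonic")))
# content-to-object
```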
 

Different Media Types

 It should be possible to publish a reference work in different media types like print products, CD-ROMs, etc., using the same document sources. The links created in these sources should be ready to use in all the different media types.
 A reference work may, e.g., contain an index of the persons mentioned in its articles. In the print product, there is usually a link that points from an index entry like Bernstein, Leonard to the occurrences of this topic (the person Leonard Bernstein) in several articles. The reader may also go in the other direction, from an occurrence to the index entry, but this action is not supported by an explicit link in the printed work.
In an electronic version of the reference work, it would be useful to make this link (pointing from the occurrences to the index entry) an explicit link: supported by the electronic system, the reader could go from one occurrence to the index entry in order to choose another occurrence. According to this mechanism, all the occurrences should be connected to the index in both directions. Two links could be used for this requirement, one for each direction. But maintenance would be much easier with only one, bidirectional link:
 
... Leonard Bernstein... (article) 

<—>

Bernstein, Leonard (index entry)
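The maintenance advantage comes from storing the link once and indexing it from both ends. A minimal sketch under that assumption; the `LinkBase` class and the anchor names are hypothetical and do not correspond to any particular linking standard.

```python
from collections import defaultdict

class LinkBase:
    """Stores each bidirectional link once and indexes it from both of its ends."""

    def __init__(self):
        self._links = []                  # one record per link
        self._by_end = defaultdict(set)   # anchor -> ids of links attached to it

    def add(self, a: str, b: str) -> int:
        link_id = len(self._links)
        self._links.append((a, b))
        self._by_end[a].add(link_id)
        self._by_end[b].add(link_id)
        return link_id

    def follow(self, anchor: str) -> list:
        """Return the opposite end of every link attached to anchor."""
        result = []
        for link_id in sorted(self._by_end.get(anchor, ())):
            a, b = self._links[link_id]
            result.append(b if anchor == a else a)
        return result

lb = LinkBase()
lb.add("article:bernstein#occ1", "index:bernstein-leonard")
print(lb.follow("index:bernstein-leonard"))  # ['article:bernstein#occ1']
```

Both traversal directions are derived from the single stored record, so there is only one place to update when the link changes.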
One could go further: it would be most useful to the reader if there were a link from each occurrence of Leonard Bernstein to all the other occurrences in the work. When activating the link, the reader could then choose between the different targets. For this effect, the linking language has to provide structures for multidirectional links.
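A multidirectional link can be thought of as one link with many ends, where traversal from any end offers all the others. A minimal sketch, assuming each end is a plain string identifier; `MultiLink` and the occurrence identifiers are hypothetical.

```python
class MultiLink:
    """One link with many ends: traversal from any end offers all the other ends."""

    def __init__(self, topic: str, ends: list):
        self.topic = topic
        self.ends = list(ends)

    def traverse(self, start: str) -> list:
        if start not in self.ends:
            raise ValueError(f"{start!r} is not an end of this link")
        return [end for end in self.ends if end != start]

# Hypothetical occurrences of Leonard Bernstein in three articles.
occurrences = MultiLink("Leonard Bernstein", ["art:bernstein#1", "art:nyp#4", "art:broadway#2"])
print(occurrences.traverse("art:nyp#4"))  # ['art:bernstein#1', 'art:broadway#2']
```

Adding a new occurrence means adding one end to one link, instead of creating a new link to and from every existing occurrence.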
In this scenario, presenting the different link targets to the reader only makes sense if the targets are described or categorized in some way, so that the reader can choose the target(s) most interesting to her, e.g. further information about Leonard Bernstein:
 
  1.  biography (link target 1)
  2.  compositions (link target 2)
  3.  conductions (link target 3)
Providing multidirectional, typed links is a requirement in many other scenarios, e.g. in online help systems or electronic manuals: All the existing link targets concerning a certain topic have to be presented and categorized so that the user can choose the appropriate target information in every situation.
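Typing the ends is what turns a list of targets into a menu the reader can use. A minimal sketch, assuming each end carries a free-text `role` (using the categories from the example above); the `TypedEnd` class and the target identifiers are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class TypedEnd:
    target: str
    role: str  # describes what the reader will find at this target

def menu(ends: list) -> dict:
    """Group the targets of a typed multidirectional link by role, keeping their order."""
    grouped = defaultdict(list)
    for end in ends:
        grouped[end.role].append(end.target)
    return dict(grouped)

ends = [
    TypedEnd("art:bernstein#bio", "biography"),
    TypedEnd("art:west-side-story", "compositions"),
    TypedEnd("art:candide", "compositions"),
    TypedEnd("art:nyp#1958", "conductions"),
]
for role, targets in menu(ends).items():
    print(role, targets)
```

The same grouping would serve the online help case: the role tells the user which of the listed occurrences fits the current situation.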

Variants, Dependent on Media or Products

 In the CD-ROM product of the reference work, Internet links related to the topic of the article should be added, e.g. Web sites about Leonard Bernstein:
 
Internet Links:

  http://www.classical.net/music/comp.lst/bernstein.html
  ...
In this context, a concept for variants is needed: the text part above belongs only to the CD-ROM variant of the document. Other parts may belong to both the CD-ROM and the print product.
 In XML, the variants could be marked up by using attributes:
 
<further-info book="no" cdrom="yes">
  <A href="http://www.classical.net/music/comp.lst/bernstein.html">
    Leonard Bernstein
  </A>
  ...
</further-info>
When variants are used, the editors have to ensure that links do not point from one variant to another: in the given context, a link stored in a text part belonging only to the print variant must not point to text parts belonging only to the CD-ROM variant of the reference work. In the print product, this would be a link pointing to a non-existing target (a dangling link). The link language cannot prevent this type of error, but a system with hypertext support should be able to prevent editors from creating such errors.
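Such a system check can be sketched with Python's `xml.etree.ElementTree`. The `book`/`cdrom` attributes follow the markup shown above; the `xref` element, its `target` attribute, and the `id` attributes are hypothetical, and the sketch assumes an element belongs to a variant unless it or an ancestor says `"no"` for that variant.

```python
import xml.etree.ElementTree as ET

def in_variant(elem, parents, variant):
    """True unless the element or one of its ancestors has variant="no"."""
    node = elem
    while node is not None:
        if node.get(variant) == "no":
            return False
        node = parents.get(node)
    return True

def dangling_links(xml_text, variant):
    """List link targets that would not exist in the given variant ('book' or 'cdrom')."""
    root = ET.fromstring(xml_text)
    parents = {child: parent for parent in root.iter() for child in parent}
    ids = {e.get("id") for e in root.iter()
           if e.get("id") and in_variant(e, parents, variant)}
    return [x.get("target") for x in root.iter("xref")
            if in_variant(x, parents, variant) and x.get("target") not in ids]

doc = """<article>
  <p id="p1">Bernstein <xref target="web1"/></p>
  <further-info book="no" cdrom="yes">
    <p id="web1">Internet links</p>
  </further-info>
  <p book="yes" cdrom="no"><xref target="p1"/></p>
</article>"""
print(dangling_links(doc, "book"))   # ['web1']
print(dangling_links(doc, "cdrom"))  # []
```

Run before publication, such a check reports the print-variant link into the CD-ROM-only part instead of silently producing a dangling link.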
 

Facts: Linking by Query

 
London: 
... Its population was stabilizing at about 6.6 million, almost comparable in size 
to that of New York City...
 The facts given in an article, e.g. the population figure, should be edited only once, usually in a database. In the reference work's articles, these facts are included by query links to the fact base. This concept makes maintenance much easier: when the facts change, the editors have to change them only once (in the database) instead of changing every occurrence in every work.
 A linking language has to provide several ways to address a link target such as "addressing by (database) query" in the example above. There are other requirements for addressing link targets as well (see the example below: "Read-Only Targets").
 Besides, the result of such a link address sometimes has to be processed in a specific way to generate the link target on the fly: in the given example, the database entry may contain the value 6,600,000 in an attribute called "population", while the presentation of this value in the reference work article is 6.6 million.
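Addressing by query and processing the result on the fly can be sketched with Python's built-in `sqlite3`. The in-memory database stands in for the editorial fact base; the `city` table, its columns, and the "millions" presentation rule are assumptions for illustration.

```python
import sqlite3

def open_fact_base():
    """An in-memory stand-in for the editorial fact database (schema is hypothetical)."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE city (name TEXT PRIMARY KEY, population INTEGER)")
    db.execute("INSERT INTO city VALUES (?, ?)", ("London", 6_600_000))
    return db

def as_millions(value: int) -> str:
    """Presentation rule for the article text: 6600000 -> '6.6 million'."""
    return f"{value / 1_000_000:.1f} million"

def resolve_query_link(db, city: str) -> str:
    """Resolve the query link when the article is rendered, then format the result."""
    row = db.execute("SELECT population FROM city WHERE name = ?", (city,)).fetchone()
    if row is None:
        raise LookupError(f"no fact stored for {city!r}")
    return as_millions(row[0])

db = open_fact_base()
print(resolve_query_link(db, "London"))  # 6.6 million
```

When the fact changes, only the database row is updated; every article that embeds the query link picks up the new value on its next rendering.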
 

Read-Only Targets

 Imagine that the fifth paragraph of a certain Web resource contains background information about the production of Leonard Bernstein's Young People's Concerts:
 
"The total number of production, technical and special personnel needed to get a 
YOUNG PEOPLE'S CONCERT on videotape amounted to 75. Add to this the conductor and 
orchestra, plus the New York Philharmonic staff, and around 220 people in all were 
involved in putting a 'simple' music program on the television screen."
     [ENGL]
 In the online reference work, a link should point from the article about Leonard Bernstein not to the entire Web site but exactly to the portion of text containing the quoted background information. But the Web site is a read-only document, which means there is no way to add a unique identifier to the target paragraph. Link languages have to provide other mechanisms to address a link resource without changing it.
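One such mechanism is addressing the target by its position in the document structure instead of by an ID. A minimal sketch using the positional predicate of `xml.etree.ElementTree`'s limited XPath support; it assumes well-formed, namespace-free XHTML, and the path is roughly what an XPointer-style expression such as `xpointer(/html/body/p[5])` would describe.

```python
import xml.etree.ElementTree as ET

def resolve_structural_pointer(xhtml: str, path: str = "./body/p[5]") -> str:
    """Return the text of the element addressed by position; the resource is only read."""
    root = ET.fromstring(xhtml)
    matches = root.findall(path)
    if not matches:
        raise LookupError(f"nothing found at {path!r}; the resource may have changed")
    return "".join(matches[0].itertext()).strip()

# A stand-in for the read-only resource: six paragraphs, none with an ID.
page = "<html><body>" + "".join(f"<p>para {i}</p>" for i in range(1, 7)) + "</body></html>"
print(resolve_structural_pointer(page))  # para 5
```

The price of not touching the target is fragility: if the site owner inserts a paragraph, the same pointer silently addresses different text, which is why such links need checking.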
 

... And Many Others

 There are many other scenarios where sophisticated linking concepts are required. In some scenarios the requirements are similar to the ones described above; in others, there are additional, quite specific ones to be taken into account.
 

Legal Publishing

In legal publications, variants play an even more important role than in many other contexts. These variants are time-based:
A changed bill becomes valid on a certain date, which means that two variants of the bill exist; which one is valid depends on the current date. There is also an interdependence with versions: when the date changes, another variant becomes the current (published) version of the bill.
 Again, the different variants must not cause dangling links, as in the example above. Besides, there may be links pointing to other bills or commentaries related to that bill, and these link targets may change at some point as well. The editor, or better, the editorial system, has to ensure that the appropriate variants are always interconnected.
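One way to keep time-based variants interconnected is to let a link name the bill and resolve the variant only for a given date. A minimal sketch of that idea; the bill identifier, the variant identifiers, and the dates are hypothetical.

```python
from datetime import date

# Hypothetical data: each bill has variants, each in force from a start date.
VARIANTS = {
    "bill-42": [(date(1999, 1, 1), "bill-42/v1"), (date(2000, 1, 1), "bill-42/v2")],
}

def variant_in_force(bill: str, on: date) -> str:
    """Return the identifier of the variant of `bill` that is valid on date `on`."""
    in_force = [vid for start, vid in sorted(VARIANTS[bill]) if start <= on]
    if not in_force:
        raise LookupError(f"{bill} is not yet in force on {on.isoformat()}")
    return in_force[-1]

# A commentary link stores only "bill-42"; the target variant depends on the date.
print(variant_in_force("bill-42", date(1999, 6, 1)))  # bill-42/v1
print(variant_in_force("bill-42", date(2000, 1, 1)))  # bill-42/v2
```

Because no link stores a fixed variant, the change of the valid variant on its start date requires no link maintenance at all; the editorial system only has to keep the validity data correct.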
 

Technical Documentation, Online Help Systems

Variants often play a role in technical documentation when, e.g., several modifications of a product exist. In this case, the documentation will be largely the same but differ in some parts. There is also a need for variants in online help systems, e.g. for beginners, advanced users, and experts. Again, one has to take care of the interdependence between links and variants.
In online help systems, multidirectional, typed links are an important issue: When searching for a certain topic in online help systems, all the occurrences of that topic should be listed in order to let the reader choose the subject that is useful in the current situation.
 

Finally: The Web!

The most extensive use of links can be observed in the Web, which lives on highly interconnected bits of information. Many of the requirements already mentioned can be found there as well, in particular for the needs of professional, commercial Web publishing:
 
  1.  If a legal publisher, e.g., wants to use the Internet as a commercial platform, the Web must "speak" link languages that provide structures much more sophisticated than those of HTML, like the multidirectional, typed links presented in the examples above.
  2.  Besides, there is an enormous number of different target groups using the Web. They should have access to the same information pool, but this pool should be interconnected under different aspects, according to the different interests of the user groups.
  3.  The problem of read-only sources is commonplace on the Internet. There clearly has to be a way to add links that point into such documents without changing them.
 

Conclusion: Requirements of Individual Linking Concepts

 The real-life examples described above show that there is an extensive need for sophisticated linking concepts as well as for systems supporting them. Some of the requirements are the same in every context; others vary with the editorial and publishing situation. The main requirements can be summarised as follows:
 

Standard

A standard format should be chosen for all text sources of the information pool in order to be able to design a consistent link network effectively. Building a consistent link network is one reason, among many others, to choose a standardized language like SGML (Standard Generalized Markup Language) or XML (eXtensible Markup Language) for encoding textual information pools.
 

Link Languages

 
Standard languages must provide link structures for all the link types needed; most of them were mentioned in the scenarios above:
 
  [Figure: Link Types]
 
 These link types (information about all aspects of a link) should be stored independently of any specific system or application. This is another reason for choosing a standardized language, both for text encoding in general and for designing link structures:
 "Remember, be kind to heirs of your growing information repository. Use links to build inter-document relationships like you use markup to build intra-document relationships. Software may fail, but data will live on forever." [ KIPP, p. 3]
 

General Linking Concepts

 
General linking concepts (link architectures) such as Topic Maps or RDF (Resource Description Framework) should provide facilities for building information and knowledge networks based on links. Developing standardized, general concepts for general requirements offers many advantages, in particular once powerful tools are available that support these standards automatically.
 

Compatible with other concepts

 Linking concepts have to be compatible with other concepts such as variants or versions. There are several kinds of interdependencies between these concepts, so tools have to provide configuration and implementation capabilities that adapt the system exactly to the individual requirements.
 

Tool Support

 
The editorial or document management system should support link generation (e.g. processing link targets on the fly) and link maintenance. It should also provide functions that let editors create and edit links with automatic support and consistency checks during the editorial process.
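In its simplest form, a link consistency check of this kind could verify that every link target exists. The following Python sketch assumes a hypothetical registry (`DOCUMENTS`) that maps each document to the IDs it defines, and it understands only the two address forms used in this paper's examples, `file.xml` and `file.xml#id(name)`. The function name `check_href` and the targets `bernstein-kaddish` and `operas/candide.xml` are illustrative, not part of any editorial system.

```python
import re
from typing import Optional

# Hypothetical registry: document name -> IDs defined in that document.
DOCUMENTS = {
    "musicals/west-side-story.xml": set(),
    "liturgy/20thcentury.xml": {"bernstein-chichester"},
    "leonard-bernstein.xml": set(),
}

# The two address forms used in the examples: "file.xml" and "file.xml#id(name)".
HREF = re.compile(r"^(?P<doc>[^#]+)(?:#id\((?P<id>[^)]+)\))?$")


def check_href(href: str, documents=DOCUMENTS) -> Optional[str]:
    """Return None if the link target exists, otherwise an error message."""
    m = HREF.match(href)
    if m is None:
        return f"unsupported address: {href}"
    doc, ident = m.group("doc"), m.group("id")
    if doc not in documents:
        return f"missing document: {doc}"
    if ident is not None and ident not in documents[doc]:
        return f"missing ID {ident} in {doc}"
    return None


# Illustrative run over a few link targets (the last two are hypothetical).
for href in ["liturgy/20thcentury.xml#id(bernstein-chichester)",
             "liturgy/20thcentury.xml#id(bernstein-kaddish)",
             "operas/candide.xml"]:
    print(href, "->", check_href(href) or "ok")
```

A real system would fill the registry from the document management system and would also have to understand richer pointer syntax; unsupported addresses are reported rather than silently accepted.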
 

XML Linking Language

 
HTML's link structures do not fulfill the requirements of linking concepts in professional, commercial publishing scenarios; most of the examples presented above cannot be expressed with HTML links. HyTime, a powerful SGML-based link language, is supported by only a small number of systems, and some of its linking structures are required only in very specific applications.
 
XLink and its related languages XPath and XPointer have been developed to provide widely needed link structures for XML documents. The following sections describe these concepts briefly. The main focus lies on specific link structures that are not available in HTML but are needed in many different contexts.
 XLink is a language for "construction of hypertext links, which connect located information [...] and provide descriptive information about these connections" [XPointer].
 What kinds of links can be expressed by XLink and its related addressing languages, XPointer and XPath?
 

Link Semantics

 
In the reference works scenario described above, multidirectional links are used to connect the different occurrences of Leonard Bernstein directly to each other. The meaning of each link has to be described so that the reader knows what information to expect at the different targets.
 
In XLink, the semantics of a link as well as the semantics of its resources (source and target) can be described. In the example, this could be the relation between the lemma (the short article about Leonard Bernstein) and the related information in other documents (e.g. his biography, stored in another article; his compositions, described in a separate reference work on music; the works he conducted, described in separate articles of the reference work):
 
Leonard Bernstein -> West Side Story (occurrence)
composer          -> work            (role)
 
<lemma-moreinfo
    xml:link="simple"
    role="work"
    title="West Side Story"
    href="musicals/west-side-story.xml"
    content-role="composer">
  Leonard Bernstein
</lemma-moreinfo>
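A processing application might read these semantics as shown in this minimal Python sketch, which uses the standard `xml.etree.ElementTree` module. It assumes the working-draft attribute names of the example (`xml:link`, `role`, `content-role`, `title`, `href`). Because the `xml:` prefix is predefined, ElementTree reports `xml:link` under the XML namespace URI. The function `describe_simple_link` is hypothetical.

```python
import xml.etree.ElementTree as ET

# The xml: prefix is predefined, so xml:link is expanded to this namespace.
XML_NS = "{http://www.w3.org/XML/1998/namespace}"

SIMPLE_LINK = """<lemma-moreinfo
    xml:link="simple"
    role="work"
    title="West Side Story"
    href="musicals/west-side-story.xml"
    content-role="composer">
  Leonard Bernstein
</lemma-moreinfo>"""


def describe_simple_link(xml_text):
    """Return a readable statement of what a draft-syntax simple link asserts."""
    elem = ET.fromstring(xml_text)
    if elem.get(XML_NS + "link") != "simple":
        raise ValueError("not a simple link")
    source = (elem.text or "").strip()  # local content of the linking element
    return (f"{source} ({elem.get('content-role')}) -> "
            f"{elem.get('title')} ({elem.get('role')}) "
            f"at {elem.get('href')}")


print(describe_simple_link(SIMPLE_LINK))
```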
 

Multidirectional Links

 All the works composed by Leonard Bernstein can be stored as a collection of links. When the reader activates one of the link sources, all the other resources can be presented as possible link targets. The list of possible targets contains the "meaning" of the links, i.e. all the "titles":
 
<works xml:link="extended">
  <loc 
    xml:link="locator" 
    role="composition" 
    title="composition: West Side Story" 
    href="musicals/west-side-story.xml"/> 
  <loc 
    xml:link="locator" 
    role="composition" 
    title="composition: Chichester Psalms" 
    href="liturgy/20thcentury.xml#id(bernstein-chichester)"/> 
  <loc 
    xml:link="locator" 
    role="composer" 
    title="composer: Leonard Bernstein" 
    href="leonard-bernstein.xml"/> 
</works>
 This set of interrelated links can be stored in a document separate from the link resources (targets). Maintenance is much easier this way.
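The traversal behaviour described above can be sketched in Python: given the locator the reader activated, every other locator of the extended link is offered as a target and labelled by its `title`. This is an illustrative reading of the working-draft syntax, not an XLink processor; the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

XML_NS = "{http://www.w3.org/XML/1998/namespace}"

WORKS = """<works xml:link="extended">
  <loc xml:link="locator" role="composition"
       title="composition: West Side Story"
       href="musicals/west-side-story.xml"/>
  <loc xml:link="locator" role="composition"
       title="composition: Chichester Psalms"
       href="liturgy/20thcentury.xml#id(bernstein-chichester)"/>
  <loc xml:link="locator" role="composer"
       title="composer: Leonard Bernstein"
       href="leonard-bernstein.xml"/>
</works>"""


def locators(xml_text):
    """Collect (href, title) pairs for every locator of an extended link."""
    root = ET.fromstring(xml_text)
    if root.get(XML_NS + "link") != "extended":
        raise ValueError("not an extended link")
    return [(loc.get("href"), loc.get("title"))
            for loc in root.iter()
            if loc.get(XML_NS + "link") == "locator"]


def targets_from(xml_text, activated_href):
    """Titles of all other participating resources, offered as link targets."""
    return [title for href, title in locators(xml_text)
            if href != activated_href]


for title in targets_from(WORKS, "leonard-bernstein.xml"):
    print(title)
```

Because the link is stored apart from the resources, the same function works no matter which of the three resources the reader starts from.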
 

Addressing by Navigation

 
By combining XLink with XPointer, a link target (a "remote resource" in the terms of the XLink specification) can be described by navigating step by step to the target element or character string:
 The fifth paragraph of a Web resource gives background information about the production of a Young People's Concert [ENGL]:
 
"The total number of production, technical and special personnel needed to get a YOUNG 
PEOPLE'S CONCERT on videotape amounted to 75. Add to this the conductor and orchestra, 
plus the New York Philharmonic staff, and around 220 people in all were involved in 
putting a 'simple' music program on the television screen."
 This link target can be described by using an XPointer:
 
<background xml:link="extended"> 
  <loc 
    xml:link="locator" 
    role="production" 
    title="production of videotape" 
    href="http://www.leonardbernstein.com/studio/young-peoples-production#/descendant::para[position()=5]"/>
</background>
 
This structure is extremely important for read-only resources such as those on the Web: an author can address not only an entire Web document but also its internal structure. This feature fulfills one of the main requirements for more powerful Web linking concepts.
 XPointer and XPath make not only the elements but also the character strings of a document accessible, by navigating the XML document tree and by pattern-matching expressions.
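The following Python sketch imitates the navigation step `/descendant::para[position()=5]` on a small stand-in document and adds a simple string search inside the target, using only the standard library. The sample page is a hypothetical abbreviation of the [ENGL] resource. ElementTree's own `.//para[5]` would not work here, because its numeric predicate counts positions among same-named siblings of one parent, whereas the XPointer above counts all `para` descendants in document order.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the read-only Web resource [ENGL].
PAGE = """<article>
  <para>One.</para>
  <section>
    <para>Two.</para>
    <para>Three.</para>
  </section>
  <para>Four.</para>
  <para>The total number of production, technical and special personnel
needed to get a YOUNG PEOPLE'S CONCERT on videotape amounted to 75.</para>
</article>"""


def nth_descendant(root, tag, n):
    """Mimic /descendant::tag[position()=n]: the n-th match in document order."""
    matches = list(root.iter(tag))
    return matches[n - 1] if 0 < n <= len(matches) else None


def locate_string(elem, needle):
    """Return (start, end) offsets of needle in the element's text, or None."""
    text = "".join(elem.itertext())
    start = text.find(needle)
    return None if start < 0 else (start, start + len(needle))


root = ET.fromstring(PAGE)
target = nth_descendant(root, "para", 5)
print(target.text[:16])
print(locate_string(target, "YOUNG PEOPLE'S CONCERT"))
```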
 

Multiple Link Networks

 Because lists of links can be stored outside the resources they connect, several link networks can be maintained, each usable as a "layer" over a set of information objects (such as XML documents). Depending on the situation, e.g. the user groups and their interests, a particular link network can be chosen.
 In addition, "if link engines are required to determine all (linked-to, linked-by) relationships at run time, performance will degrade substantially. For this reason, interconnected clusters may be declared a priori using the group links." [KIPP, p. 3]
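Kipp's point about declaring clusters a priori can be illustrated with a precomputed index. Each extended link is treated as a group of mutually linked resources, and the (linked-to, linked-by) relationships are collected once, before navigation starts, so lookups at run time need no scanning. This Python sketch is a simplification; the network contents, including `orchestras/nyphil.xml`, are hypothetical.

```python
from collections import defaultdict

# Hypothetical link network: each extended link is a named group of hrefs.
NETWORK = {
    "works": ["musicals/west-side-story.xml",
              "liturgy/20thcentury.xml#id(bernstein-chichester)",
              "leonard-bernstein.xml"],
    "conductors": ["leonard-bernstein.xml", "orchestras/nyphil.xml"],
}


def build_cluster_index(network):
    """Precompute, for each resource, every other resource it is linked with."""
    index = defaultdict(set)
    for members in network.values():
        for href in members:
            index[href].update(m for m in members if m != href)
    return index


INDEX = build_cluster_index(NETWORK)
print(sorted(INDEX["leonard-bernstein.xml"]))
```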
 
Defining the link group selects the link network to be used: the link group specifies which documents contain the relevant links to be presented when navigating through an information pool. For example, the extended link shown earlier can be kept in a link file of its own:
 
<works xml:link="extended"> 
  <loc 
    xml:link="locator" 
    role="composition" 
    title="composition: West Side Story" 
    href="musicals/west-side-story.xml"/> 
  <loc 
    xml:link="locator" 
    role="composition" 
    title="composition: Chichester Psalms" 
    href="liturgy/20thcentury.xml#id(bernstein-chichester)"/>
  <loc 
    xml:link="locator" 
    role="composer" 
    title="composer: Leonard Bernstein" 
    href="leonard-bernstein.xml"/>
</works>
 At the start of the reference works article, this link file is declared within the link group declaration:
 
<link-network xml:link="group"> 
  <link-list
    xml:link="document"
    href="linknetwork-compositions.xml"/>
</link-network>
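A link engine might resolve such a group declaration roughly as follows. The Python sketch assumes the working-draft `xml:link="group"` and `xml:link="document"` attributes of the example and a hypothetical in-memory `store` that maps hrefs to already parsed link documents (a real system would fetch and parse the files). Using a different group declaration would select a different link network over the same articles.

```python
import xml.etree.ElementTree as ET

XML_NS = "{http://www.w3.org/XML/1998/namespace}"

GROUP = """<link-network xml:link="group">
  <link-list xml:link="document" href="linknetwork-compositions.xml"/>
</link-network>"""

# Abbreviated content of the link file (one locator only, for brevity).
LINK_FILE = """<works xml:link="extended">
  <loc xml:link="locator" role="composer"
       title="composer: Leonard Bernstein" href="leonard-bernstein.xml"/>
</works>"""


def group_documents(xml_text):
    """Return the hrefs of all link documents named in a group declaration."""
    root = ET.fromstring(xml_text)
    if root.get(XML_NS + "link") != "group":
        raise ValueError("not a link group")
    return [d.get("href") for d in root
            if d.get(XML_NS + "link") == "document"]


def load_links(xml_text, store):
    """Gather the extended links from every document listed in the group."""
    links = []
    for href in group_documents(xml_text):
        doc = store[href]  # hypothetical: href -> parsed root element
        links.extend(e for e in doc.iter()
                     if e.get(XML_NS + "link") == "extended")
    return links


store = {"linknetwork-compositions.xml": ET.fromstring(LINK_FILE)}
print([link.tag for link in load_links(GROUP, store)])
```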
 
Separating the link from the linked information makes it possible to use the same data in more than one publication, each with different links. These different views of the same information pool make it possible to reuse information, processing and presenting it from several perspectives.
 

Topic Maps: A Standard Format for Link Networks

 
Topic Maps is a new ISO standard (ISO/IEC 13250: Topic Maps) published in summer 1999. It defines the concepts and architectural forms for the semantic structuring of link networks, thereby specifying an interchange format. Called the “GPS for the information universe”, topic maps will become the solution for organizing and navigating large and continuously growing information pools.
 
A topic map organizes large sets of information resources. It builds a structured semantic link network over the resources. Essentially, it is a set of hyperlinks that link topics to the relevant instances in the information pool. The network described in a topic map allows easy and selective navigation to the requested information. Searching in a topic map can be compared to searching in knowledge structures. In fact, topic maps are a base technology for knowledge representation and knowledge management.
 
A topic map is a separate document, independent of the information objects it describes. It is basically an SGML/XML document containing different element types, derived from a basic set of architectural forms, which are used to represent topics, occurrences of topics, and relationships (associations) between topics. The key concepts are the topic (and topic type), the topic occurrence (and occurrence role type), and the topic association (and association type as well as association role type). Other concepts that extend the expressive power of this model are scope, public subject, and facets.
 

Topics

 
A topic, in its most generic sense, can be any “thing” whatsoever (a person, an entity, a concept, really anything), regardless of whether it exists or has any other specific characteristics, about which anything whatsoever may be asserted by any means whatsoever. In the words of the standard, the term “topic” refers to the element in the topic map instance (the topic link) that represents the subject being referred to.
 Examples of topics are: USA, New York, Leonard Bernstein, New York Philharmonic, West Side Story.
 A topic should have one or more topic types. Topic types express a typical class-instance relationship, and the standard defines them as topics themselves. Because topic types are topics, the full expressive power of topic maps can be used to say more about a type.
 Examples of topic types are: country, city, conductor, orchestra, composer, work.
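Because types are topics as well, the class-instance relation between topics and topic types can be modelled with a single data type. This Python sketch is a hypothetical, much simplified model, not the architectural forms of ISO/IEC 13250; topic IDs such as `leonard-bernstein` are illustrative.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # compare topics by identity, not by field values
class Topic:
    """A minimal topic: an identifier plus the topics that are its types."""
    id: str
    types: list = field(default_factory=list)


# Topic types are themselves topics, so they are created the same way.
composer = Topic("composer")
conductor = Topic("conductor")
work = Topic("work")

bernstein = Topic("leonard-bernstein", [composer, conductor])
west_side_story = Topic("west-side-story", [work])

TOPICS = [composer, conductor, work, bernstein, west_side_story]


def instances_of(topic_type, topics=TOPICS):
    """All topics that list topic_type among their types (class-instance)."""
    return [t.id for t in topics if topic_type in t.types]


print(instances_of(composer))
```

Since `composer` is an ordinary topic, it could carry names and occurrences of its own, which is the extra expressive power the standard gains by treating types as topics.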
 

Topic characteristics

 
Every topic has at least one of two characteristics: a topic name or an occurrence. A topic name consists of three parts: the base name, the display name, and the sort name. Only the base name is required.
 An example of a topic name (base / display / sort) is: NY / Big Apple / New York.
 An occurrence is a link to an information resource that is somehow relevant to the topic. The linked resource is typically an information object outside the topic map.
 Examples of occurrences are: an article about Leonard Bernstein, a video about the Young People's Concert, a photo of an orchestra, an audio tape of a concert. Every occurrence belongs to one occurrence role type. Like topic types, occurrence role types are themselves topics.
 Examples of occurrence role types are: article, video, photo, audio tape.
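Topic names and occurrences might be modelled as follows. In this simplified Python sketch (hypothetical class and field names, not taken from the standard), the display name falls back to the base name, the sort name drives alphabetical indexes, and occurrences are selected by their role type. The occurrence file names are invented for the example.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TopicName:
    base: str                      # the only required part
    display: Optional[str] = None
    sort: Optional[str] = None

    def shown(self):
        """Text to display: the display name if given, else the base name."""
        return self.display or self.base

    def sort_key(self):
        """Key for alphabetical indexes: the sort name if given, else the base name."""
        return (self.sort or self.base).casefold()


@dataclass
class Occurrence:
    role_type: str  # id of the occurrence role type, itself a topic
    href: str       # the linked information resource outside the topic map


@dataclass
class Topic:
    id: str
    names: list = field(default_factory=list)
    occurrences: list = field(default_factory=list)

    def occurrences_of(self, role_type):
        return [o.href for o in self.occurrences if o.role_type == role_type]


ny = Topic("new-york", [TopicName("NY", "Big Apple", "New York")])
lb = Topic("leonard-bernstein",
           [TopicName("Leonard Bernstein", sort="Bernstein, Leonard")],
           [Occurrence("article", "articles/bernstein.xml"),
            Occurrence("video", "video/young-peoples-concert.mpg")])

index = sorted([ny, lb], key=lambda t: t.names[0].sort_key())
print([t.names[0].shown() for t in index])
print(lb.occurrences_of("video"))
```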
 

Associations

 
The real power of topic maps results from associations between topics. Associations describe the kind of relationship between topics.
 Examples of associations are: New York is in USA, Young People's Concert took place in New York, New York Philharmonic is conducted by Leonard Bernstein.
 Each association has one association type.
 Examples of association types are: is in, took place in, is conducted by.
 Each topic that participates in an association plays a role. The role is described by an association role type.
 Examples of association role types are: state / country, event / place, orchestra / conductor.
 Both association types and role types are again topics.
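An association can be represented as a type plus a set of (role type, topic) pairs, which makes role-based queries straightforward. The Python sketch below is a hypothetical model that assumes each role type occurs at most once per association; the topic IDs are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Association:
    type: str       # association type (itself a topic), e.g. "is-conducted-by"
    members: tuple  # (association role type, topic id) pairs


ASSOCIATIONS = [
    Association("is-in", (("state", "new-york"), ("country", "usa"))),
    Association("took-place-in",
                (("event", "young-peoples-concert"), ("place", "new-york"))),
    Association("is-conducted-by",
                (("orchestra", "new-york-philharmonic"),
                 ("conductor", "leonard-bernstein"))),
]


def players(assoc_type, given_role, given_topic, wanted_role,
            associations=ASSOCIATIONS):
    """Topics playing wanted_role in associations of assoc_type
    in which given_topic plays given_role."""
    result = []
    for a in associations:
        roles = dict(a.members)  # assumes one topic per role type
        if a.type == assoc_type and roles.get(given_role) == given_topic:
            if wanted_role in roles:
                result.append(roles[wanted_role])
    return result


print(players("is-conducted-by", "orchestra", "new-york-philharmonic", "conductor"))
```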
 

Scopes

 
The concept of scope is important to avoid ambiguities between topics and their characteristics. Any assignment of a characteristic to a topic is considered to be valid within certain limits, which may or may not be specified explicitly. The limit of validity of such an assignment is called its scope. A scope is defined in terms of themes, and themes are topics.
 Example of scopes: to distinguish between “Paris” in France, “Paris” in Texas, and “Paris” the Greek hero, assign the scopes “France”, “USA”, and “Greek mythology” to the names of the three topics.
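Scope-based disambiguation can be sketched as a filter over name assignments: a name counts as valid when all themes of its scope belong to the current context. This matching rule, the data layout, and the theme IDs are assumptions made for illustration, not rules quoted from ISO/IEC 13250.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BaseName:
    topic: str         # id of the topic the name is assigned to
    name: str
    scope: frozenset   # themes (themselves topics) limiting validity


NAMES = [
    BaseName("paris-france", "Paris", frozenset({"france"})),
    BaseName("paris-texas", "Paris", frozenset({"usa"})),
    BaseName("paris-hero", "Paris", frozenset({"greek-mythology"})),
]


def resolve(name, themes, names=NAMES):
    """Topics whose name assignment is valid in a context containing themes.

    Assumed rule: a scope matches when all of its themes are in the context,
    so an empty (unconstrained) scope would match everywhere.
    """
    return [n.topic for n in names
            if n.name == name and n.scope <= set(themes)]


print(resolve("Paris", {"usa"}))
```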
 

Public Subjects

 
Merging topic maps requires a way of establishing the identity of seemingly disparate topics from different maps. The explicit solution offered by the standard is to specify identity attributes on the topic elements that address the same public subject. The other, implicit, solution is the topic naming constraint, which states that any topics that have the same name in the same scope refer to the same subject.
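Both identity mechanisms can be combined in a small merge routine. This Python sketch first compares public subject addresses and then applies the topic naming constraint (same name in the same scope). The subject address `http://example.org/psi/leonard-bernstein` is hypothetical. The routine is deliberately simple: it does not re-check earlier topics after a merge, so chains of equivalences may need several passes.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Topic:
    id: str
    identity: Optional[str] = None           # public subject address, if any
    names: set = field(default_factory=set)  # {(name, frozenset of themes)}


def same_subject(a, b):
    """Explicit identity first, then the topic naming constraint."""
    if a.identity is not None and a.identity == b.identity:
        return True
    return bool(a.names & b.names)  # same name in the same scope


def merge_maps(*topic_maps):
    """Fold topics from several maps together when they denote one subject."""
    merged = []
    for topic_map in topic_maps:
        for topic in topic_map:
            for existing in merged:
                if same_subject(existing, topic):
                    existing.names |= topic.names
                    existing.identity = existing.identity or topic.identity
                    break
            else:  # no match found: keep a copy as a new topic
                merged.append(Topic(topic.id, topic.identity, set(topic.names)))
    return merged


PSI = "http://example.org/psi/leonard-bernstein"  # hypothetical address
NO_SCOPE = frozenset()

map_a = [Topic("lb", PSI, {("Leonard Bernstein", NO_SCOPE)})]
map_b = [Topic("bernstein", PSI, {("Bernstein, Leonard", NO_SCOPE)}),
         Topic("wss", None, {("West Side Story", NO_SCOPE)})]
map_c = [Topic("west-side-story", None, {("West Side Story", NO_SCOPE)})]

merged = merge_maps(map_a, map_b, map_c)
print([(t.id, len(t.names)) for t in merged])
```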
 

Facets

 Facets basically provide a mechanism for assigning property-value pairs to information resources. A facet is a property (such as "language" or "level of know-how"); its values are called facet values. By using facets, one can apply sets of metadata to an information pool, for example to filter out the information relevant to a given user profile.
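Filtering an information pool by user profile then amounts to matching facet values. The Python sketch below assumes a hypothetical mapping from resources to their facet values; a profile selects the resources whose values match every facet the profile mentions.

```python
# Hypothetical facet assignments: resource -> {facet: facet value}.
RESOURCES = {
    "articles/bernstein-en.xml": {"language": "en", "level of know-how": "beginner"},
    "articles/bernstein-de.xml": {"language": "de", "level of know-how": "beginner"},
    "analysis/chichester-psalms.xml": {"language": "en", "level of know-how": "expert"},
}


def filter_by_profile(profile, resources=RESOURCES):
    """Keep resources whose facet values match every entry of the profile."""
    return sorted(href for href, facets in resources.items()
                  if all(facets.get(f) == v for f, v in profile.items()))


print(filter_by_profile({"language": "en"}))
print(filter_by_profile({"language": "en", "level of know-how": "beginner"}))
```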
 
 
Adding SGML/XML markup to raw data turns it into information. Adding a topic map to an information pool turns it into knowledge. A topic map explicitly describes the knowledge structures implicitly present in a set of information assets. The ISO/IEC 13250 Topic Maps standard defines a format that allows information designers to describe knowledge structures in an interchangeable way, independent of the actual content they are derived from.
 

Conclusion

 
XLink provides much more complex link structures than HTML. XLink and its related languages, XPath and XPointer, offer many capabilities for
 
  1.  addressing a resource,
  2.  describing the linked resources and the relationship between them,
  3.  processing, and
  4.  presenting the linked resources.
 
 In particular, the concepts of multidirectional links and the possibility of addressing read-only units by navigation bring the linking concepts used on the Web to a higher level of complexity.
 The specifications of XLink, XPointer, and XPath are not yet finished. Once they become official recommendations of the World Wide Web Consortium, the next step is up to the system providers: implementing automatic support for these linking languages in a wide range of tools, applications, and systems. Web publishers are not the only ones looking forward to that.
 
The new ISO/IEC standard Topic Maps specifies a language for describing the knowledge implicitly present in a collection of information objects, thus enabling intelligent navigation strategies in information pools. With Topic Maps it will be possible to go beyond mere information management and reach the next level, knowledge management, on a standardized basis. Topic Maps will be applicable wherever information "consumers" must be protected against irrelevant information, which is the main challenge for information providers in the information age.
 

Information sources

 
  •  [BCD] Encyclopaedia Britannica. Multimedia Edition (CD-ROM), 1999
  •  [BROCK15] Der Brockhaus: in fünfzehn Bänden, F.A. Brockhaus GmbH, Leipzig - Mannheim, 1999
  •  [BROCK24] Brockhaus: Die Enzyklopädie in 24 Bänden, Bd. 24, F.A. Brockhaus GmbH, Leipzig - Mannheim, 1999
  •  [BUSH45] Vannevar Bush, "As We May Think", Atlantic Monthly 176, pp. 101-108, July 1945
  •  [COPP] Patrick J. Coppock, "The electronic hypermedia encyclopedia: transcending the constraints of the 'authoritative work'?", http://www.hf.ntnu.no/anv/wwwpages/Hyper/Hypermedia.htm
  •  [ENGL] Roger Englander, "Behind the Scenes: The YOUNG PEOPLE'S CONCERTS in the Making", http://www.leonardbernstein.com/studio/element.asp?id=57
  •  [FOLDOC] Free On-Line Dictionary of Computing, http://foldoc.doc.ic.ac.uk/foldoc/index.html
  •  Hypertext at Brown University, http://www.stg.brown.edu/projects/hypertext/landow/HTatBrown/BrownHT.html
  •  [KIPP] Neill Kipp, "Simple XLinks, Extended XLinks, and XLinks in Groups", <TAG> Volume 11, Number 11, 1998
  •  R. Ksiezyk, "Trying not to get lost with a Topic Map", Proceedings of XML Europe 99 Conference, GCA, Alexandria, VA, 1999
  •  [KUHL91] Rainer Kuhlen, Hypertext. Ein nicht-lineares Medium zwischen Buch und Wissensbank, Berlin, Heidelberg, New York, 1991
  •  [LAND97] George P. Landow, Hypertext 2.0. The Convergence of Contemporary Critical Theory and Technology, The Johns Hopkins University Press, 1997
  •  [MUENZ97] Hypertext, http://www.teamone.de/hypertext/
  •  [NELS81] Theodor H. Nelson, Literary Machines, Swarthmore, Pa.: Self-published, 1981
  •  [NELS87] Theodor H. Nelson, Computer Lib/Dream Machines, Seattle, Wash.: Microsoft Press, 1987
  •  [NIELS96] Jakob Nielsen, Multimedia, Hypertext und Internet. Grundlagen und Praxis des elektronischen Publizierens, Vieweg, ISBN 3-528-05525-1, 1996
  •  [OASIS] The OASIS HyTime Page, http://www.oasis-open.org/cover/hytime.html
  •  S. Pepper, "Euler, Topic Maps, and Revolution", Proceedings of XML Europe 99 Conference, GCA, Alexandria, VA, 1999
  •  [SCHUET94] Dr. Helge Schütt, Datenbankunterstützung für kooperative Hypermedia Management Systeme, Würzburg, 1994
  •  [TOPMAP] International Organization for Standardization, ISO/IEC 13250:1999 Document description and processing languages: Topic Maps, ISO, Geneva, 1999
  •  [WEB76] Webster's New World Dictionary of the American Language, William Collins + World Publishing Co., Inc., 1976
  •  [WEB99] Merriam-Webster WWWebster Dictionary, http://www.m-w.com/netdict.htm
  •  [XLink] XML Linking Language. Working Draft of the World Wide Web Consortium, http://www.w3.org/TR/xlink
  •  [XPath] XML Path Language. Working Draft of the World Wide Web Consortium, http://www.w3.org/TR/xpath
  •  [XPointer] XML Pointer Language. Working Draft of the World Wide Web Consortium, http://www.w3.org/TR/xptr
