Guidelines for using XML for Electronic Data Interchange   Table of contents   Indexes   Regulations Worldwide Online at the Siemens Public Communication Networks Group

 
 

Implementing a Link Editor


 
Eduardo Gutentag
  Staff Engineer
  Sun Microsystems, Inc.
17 Network Circle, MPK17–102
Menlo Park, California, USA 94025
Phone: (650) 786-5498
Fax: (650) 786-5727
Email: eduardo@eng.Sun.COM
 
Biographical notice:
 
Eduardo Gutentag
 
Eduardo Gutentag is a Staff Engineer with Sun's Online Information and Tools Development group within SunSoft, where his primary responsibilities include architecting and implementing the transition to SGML and coordinating with the Information Engineering group. He is a sponsor member of the Davenport Group and an alternate member of the W3C's XML and XSL Working Groups.
 
ABSTRACT:
 

The implementation of a link editor as a customized layer on top of the authoring tool was recognized very early by SunSoft as having the highest priority. The link editor allows writers to point and click in order to create internal and external links without having to know the first thing about ID/IDREFs or about FPIs. Many writers and implementors will recognize the need for this tool in their environment.
 
 

The Background

 

In 1991 SunSoft (an operating company of Sun Microsystems, Inc.) introduced the first AnswerBook technology as part of the Solaris environment. AnswerBooks were one of the first attempts in the computer industry to deliver online documentation with a reasonable navigation method and a powerful full-text search engine.
 
At the time we thought that AnswerBooks would meet our needs for many years to come.
 

Unfortunately, AnswerBooks were based on a proprietary manipulation of PostScript. That was bad enough, but on top of that it soon became clear that the one-to-one correspondence between the online and the printed page presented more problems than it solved. Among them were the inability to cut and paste and the inability to generate other formats from the source material.
 

So in 1994 we decided to revamp the whole system from the ground up, using SGML (Standard Generalized Markup Language).
 

Of course, if we were going to use SGML, we needed either a reliable conversion system to go from the WYSIWYG (What You See Is What You Get) editor in use then to SGML every time we needed to produce SGML, or we needed a good SGML authoring system. And, since there really is no method to convert cheaply from unstructured to structured markup with 100% reliability and no human intervention, we decided to migrate to an SGML authoring environment.
 
This kind of migration is not easy, as some of you may know.
 
And one of the most recalcitrant problems (from the point of view of training and editing environment) is what to do with the SGML itself: How much do you hide from the writers? How much do you show? And is there a tool that does the right amount of showing and hiding?
 
 

The Problem

 

So there we were, sometime in the middle of 1995: we had already chosen Adept*Editor from ArborText, the most robust editing and printing tool that the market offered at that time, and we had already chosen the DocBook DTD as the basis for the DTD that our writers were going to use.
 
And I had already started having nightmares.
 
In my nightmares I was being confronted by a group of confused writers, and a Question and Answer session developed between us:
 
Q: How do I insert a link to a chapter in my book?
 

A: Do you want a gentext link, or a link with authored text?
 
Q: What is gentext?
 

A: That is where the FOSI (Format Output Specification Instance) inserts the text of the title of the element that you are targeting as the linkend of your link.
 
Q: What is a FOSI?
 
A: It's a Format Output Specification Instance. It's used for formatting.
 

Q: But I heard we were using style sheets...
 
A: A FOSI is a kind of a style sheet, but it's not the same as the style sheets that we use for online presentation. But it's OK, because also the online style sheets will insert the autogenerated text.
 
Q: What is “the linkend of my link”?
 
A: That's the target of your link, the thing you are cross-referencing to.
 
Q: So how do I do it? How do I tell my link to go somewhere?
 
A: You insert an xref element and you assign to its linkend attribute an IDREF value that corresponds to the ID of the target. Unless you want to author the “hot text” yourself, in which case you insert a link element, plus whatever you want to be its content, and you assign to its linkend attribute an IDREF value that corresponds to the ID of the target. Of course, if you want to make a Web kind of link then you have to use the ulink element. But if you want to cross reference to another book you must use the olink element, and set the value of its targetdocent attribute to the right entity, which must be ndata SGML.
 
Q: Say what?
 
Indeed. Say what.
 
 

The solution

 
This kind of problem is not unique. Trying to explain SGML issues and methods to an audience composed of very bright individuals with no experience in SGML is not an easy task, particularly when these individuals try (as we all do) to understand the explanations by mapping them to their previous experience with WYSIWYG authoring systems.
 

One early answer to this quandary was to rely on good and solid training. But training only goes so far, is very expensive, and still doesn't solve the problem of the writer who's hired to fill a position that's been open for a couple of months and is told by the manager, “Look, we have a crisis on our hands; you'll get your training after we finish with it. For now, just go and do your job.”
 
The idea of hiring only SGML-trained writers was considered, discussed, examined, and rejected in about one minute. Those writers simply do not exist.
 
It was at about this point that I said that of course this problem wouldn't even arise if the application had a good link editor.
 
The words “link editor” became, at that point, the magic mantra that would save our day. And I was assigned to design and implement it.
 
 

The design

 
The Link Editor's design has not changed in the past two years, as it went from a couple of words tossed carelessly in a brainstorming session to a concrete reality.
 
One of its first elements is that whoever uses it does not have to know the first thing about SGML or about the DTD in use.
 
This basic design principle established the basis for all the rest:
 
  1. All actions should be GUI (Graphical User Interface)-driven. In other words, all actions, including element insertion, attribute value assignment, etc., should be performed through point-and-click.
  2. There should be only one panel — usability tests have proven that a proliferation of panels to accomplish slightly different but essentially similar actions can be extremely confusing.
  3. The GUI panel need not be fancy, but it must be clear.
  4. The application should behave identically no matter what the entry point to the action is. That is, the behavior should be the same whether the writer tries to insert an element by hand or through the link editor. There should be no surprises.
  5. The writer should be totally oblivious as to what the application does in the background: database contacts, error recovery, data checks, etc., should all be transparent to the user.
  6. Derived needs, such as maintaining unique IDs or updating the links, should be automated, too.
 
 
As you can see from the above, some of the design principles, when condensed in one or two sentences, seem pretty basic, not to say lame, obvious and not worth the time it takes to enunciate them.
 
However, the implementation strategies forced by these “brain-dead” principles can be anything but.
 
 

The implementation

 

One of my first programming teachers once said that a good programmer never writes a line of original code but instead “borrows” code from existing applications.
 
In a way, that is precisely what I did when I started implementing the link editor. From a functional point of view, what I wanted to do was show the user the possible link targets for selection. The difference between a list of titled targets and a table of contents is quite minimal, at least in terms of appearance. Therefore, I took the dynamic TOC (Table of Contents) panel in Adept*Editor:
 
 
and changed it into:
 
 
The main visible difference is what appears at the top of the panels, and the fact that the “Find” and “View” menu items are replaced by “Switch Modes”, which, when selected, shows:
 
 

This allows users to select between the four possible types of links they can author. Note that the aesthetics of a well proportioned menu have given way to ease-of-understanding requirements.
 
Note also that the Link Editor is run independently of the method chosen to insert a link. That is, writers may go to Tools→Link Editor to initiate it, or they may try to insert any of the link elements through the markup insertion panels, or they may even use the command line…but in all cases they always get the same result, as predicated by the design principles.
 

The appearance of the “Links” mode of the link editor is exactly the same as that of the “Xrefs” mode; the “Ulinks” mode is actually just a little help window with a “click here” notice that loads the mouse, as it were:
 
 
And, once the mouse is loaded, a dialog appears, prompting for the desired URL.
 
 
After entering the URL, the writer is prompted for “hot text”:
 
 
If the writer enters anything, that is what will become the “hot text” in the output; otherwise the URL itself becomes the hot text.
 

The appearance of the “Olinks” mode is very similar to that of the “Xrefs” and “Links” ones, but the behavior is somewhat different. Where “Xrefs” and “Links” present the hierarchy of a single document, the “Olinks” mode first presents a list of all the AnswerBooks and Collections that there ever were. Each one of them is expandable into the list of books that it includes, and each of the books, if written in SGML, is in turn expandable into the legitimate link targets that it contains.
 
 

The internals

 
So what happens when a user double-clicks a target in the Link Editor and then double-clicks the insertion area? What is the magic that allows for the automatic insertion of the link with all attribute values assigned and, if needed, all entities declared automatically?
 
 

Internal links

 

In order for the above to be accomplished, each line in the Link Editor, when in “Xrefs” or “Links” mode, carries PI information about the location of the target in the document. For example:
 
<?Pub Lcl Command="exec xtoc::linktop_oid('56','(13,1,56)')">
 

A double-click on the above target in the Link Editor loads that information into memory, and maps mouse double-clicks to the execution of that command. When the user next double-clicks in the insertion area, the executed function obtains the id value of the location that was put in memory, writes the link into the document with this new information, and re-maps mouse double-clicks back to the default.
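
The two-step "load the mouse, then insert" behavior described above can be sketched as a small state machine. This is only an illustration in Python, not the Adept Command Language code actually used; the Editor class, its method names, and the list-of-strings document model are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Editor:
    """Toy stand-in for the authoring tool; every name here is hypothetical."""
    document: List[str] = field(default_factory=list)
    pending_target: Optional[str] = None
    on_double_click: Optional[Callable[[int], None]] = None

    def __post_init__(self) -> None:
        # Start with the default double-click mapping.
        self.on_double_click = self.default_double_click

    def default_double_click(self, position: int) -> None:
        # Default behavior: nothing link-related happens.
        pass

    def select_target(self, target_id: str) -> None:
        # Step 1: remember the target and remap double-clicks ("load the mouse").
        self.pending_target = target_id
        self.on_double_click = self.insert_pending_link

    def insert_pending_link(self, position: int) -> None:
        # Step 2: write the link, then restore the default mapping.
        self.document.insert(position, f'<xref linkend="{self.pending_target}">')
        self.pending_target = None
        self.on_double_click = self.default_double_click
```

The point of the sketch is that the second double-click is only special while a target is loaded; once the link is written, the editor behaves as it did before.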
 
 

External links

 

When the Link Editor is in “Olinks” mode, however, the internals are quite different and, of course, a bit more complicated.
 

Initially the Link Editor shows all AnswerBooks and Collections ever registered in the database, whether actually published or not:
 
 

Hidden in each of the lines and completely invisible to the user, there is a PI containing the following information:
 
<?Pub Lcl Command="xtoc::display_books(47,5,'Solaris 2.7 System
Administrator Collection')">
 

When the line is double-clicked, the display_books() function is executed, and it retrieves from the database the list of books contained in Collection 47.5, which is how it is known to the database.
 
 
If the user now double-clicks the guillemet arrow, the expanded list of books is contracted back to the original; if the user double-clicks the name of the Collection then the mouse becomes “loaded” and ready to insert a reference to the collection (in the form of a <citetitle>.) However, if the user double-clicks one of the books, the book expands to a list of its targetable contents:
 
 

And, once again, clicking the guillemet quotes contracts the book; clicking the book name loads the mouse to insert an olink to the top of the book, while clicking any of the book targets loads the mouse and enables the next double-click to insert an olink to that precise location in the book, like this:
 
<olink targetdocent="BINARY" localinfo="INTRO-23217"
type="V-ONLY"><quote>To Be or Not to Be 64–bit</quote> in
<citetitle>64–bit Solaris Application
Development</citetitle></olink>
 

At the same time, a declaration is introduced in the document, if it doesn't exist already, of the form:
 
<!ENTITY BINARY PUBLIC "-//Sun::SunSoft//DOCUMENT BINARY Version
2.0//EN" NDATA sgml>
 
This “BINARY” entity, together with the Version number in the public text portion of the declaration (“Version 2.0”), and the Language specified (“EN”), is unique in the database and specifies one and only one book with a given part number. The olink that is introduced in the document refers to it through the targetdocent attribute, and goes further into the document through the value of the localinfo attribute.
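
As a rough illustration of how the declaration and the olink relate, the following Python sketch builds both from the same three values (short name, version, language). The BookRef type and the function names are hypothetical, the owner string is copied from the example above, and the type attribute shown in the sample olink is omitted for brevity.

```python
from typing import List, NamedTuple


class BookRef(NamedTuple):
    short_name: str   # e.g. "BINARY"
    version: str      # e.g. "2.0"
    language: str     # e.g. "EN"


def entity_declaration(book: BookRef) -> str:
    # Build the formal public identifier in the shape shown in the example.
    fpi = (f"-//Sun::SunSoft//DOCUMENT {book.short_name} "
           f"Version {book.version}//{book.language}")
    return f'<!ENTITY {book.short_name} PUBLIC "{fpi}" NDATA sgml>'


def olink(book: BookRef, local_id: str, hot_text: str) -> str:
    # targetdocent names the entity; localinfo points inside the target book.
    return (f'<olink targetdocent="{book.short_name}" '
            f'localinfo="{local_id}">{hot_text}</olink>')


def ensure_declared(declarations: List[str], book: BookRef) -> None:
    # Add the declaration only if the document does not already have it.
    decl = entity_declaration(book)
    if decl not in declarations:
        declarations.append(decl)
```

Calling ensure_declared twice for the same book leaves a single declaration in place, which mirrors the "if it doesn't exist already" condition described above.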
 

Once the document is published as part of an AnswerBook2 collection, when the user double-clicks the hot text, the AnswerBook2 server translates all this information into the URL needed to locate the targeted book and the internal location of the target.
 
 

The Database

 
Of course, none of the above would be possible if there weren't a database behind the scenes that supported it. All of the Link Editor's functionality when it comes to external links is based on on-demand database queries.
 

Our database, although it could be called “primitive” in that it is a straightforward relational database at this point, contains all the information we need about all the books and collections, and then some more.
 

Each book is assigned, before anyone actually starts writing it, a part number. Each book also has a short name associated with it and with the family of books to which it belongs. For instance, the book on Solaris 2.5 System Administration belongs in the same family as the book on Solaris 2.6 System Administration, and they both therefore share the same short name. However, they each have a different version number, and a different part number.
 

Each book can also be associated with more than one AnswerBook collection.
 
Each book's canonical title is registered in the database, and so is each AnswerBook's title.
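
The relationships just described (a part number per book, a short name shared by a family, and books that can belong to several collections) could be modeled roughly as follows. This is a guess at a minimal schema using Python's sqlite3 module, not the actual database; all table and column names, and the books_in_collection helper, are assumptions made for illustration.

```python
import sqlite3

# Hypothetical schema: one row per book edition, one per collection,
# and a join table because a book may appear in several collections.
SCHEMA = """
CREATE TABLE book (
    part_number TEXT PRIMARY KEY,
    short_name  TEXT NOT NULL,
    version     TEXT NOT NULL,
    language    TEXT NOT NULL,
    title       TEXT NOT NULL,
    UNIQUE (short_name, version, language)
);
CREATE TABLE collection (
    collection_id TEXT PRIMARY KEY,
    title         TEXT NOT NULL
);
CREATE TABLE collection_book (
    collection_id TEXT REFERENCES collection(collection_id),
    part_number   TEXT REFERENCES book(part_number),
    PRIMARY KEY (collection_id, part_number)
);
"""


def books_in_collection(db: sqlite3.Connection, collection_id: str) -> list:
    # On-demand query: titles of the books registered in one collection.
    rows = db.execute(
        "SELECT b.title FROM book b "
        "JOIN collection_book cb ON cb.part_number = b.part_number "
        "WHERE cb.collection_id = ? ORDER BY b.title",
        (collection_id,),
    )
    return [title for (title,) in rows]
```

A query of this kind is the sort of thing a function like display_books() would need to run when a collection line is expanded.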
 

When a book is checked in to the repository, the integrity of the data is verified against the database, and the book is accepted or rejected by virtue of its compliance (among other things). At the same time, the book is scanned, and a list of targetable contents and their IDs is obtained. This is the list presented to the writer when a book is expanded in the Link Editor.
 
But why go to such lengths?
 
If the goal of linking from a document to an external book were to be able to link to one version, and one version only, of a book, then it would make sense to use the target's part number as the identifier in the olink. However, there are many occasions where it is desirable to go to a different version of the targeted book.
 
For instance, if links targeted part numbers it would be impossible to degrade gracefully a link that points to an unavailable book version; it would also be impossible to shift target language if the current language version isn't available.
 

If in the example above there was a link going to part number 805–1123 (the Solaris 2.6 System Administration book), how would one be able to determine that, in its absence, it is OK to link to part number 802–1432 (the Solaris 2.5 System Administration book)? There is no relationship between the two part numbers.
 

However, because the link actually goes to the EN version of Version 2.0 of SYSADMIN, it is possible to degrade or upgrade to the JA version or to Version 1.9 of the same. In this sense, the use of FPIs to indicate link targets can be compared to the use of URNs to locate URLs (the difference being, of course, that URNs are still not there).
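
To make the degradation idea concrete, here is a minimal Python sketch of resolving a (short name, version, language) request against whatever editions happen to be available. The preference order used (exact version first, then same language, then the newest remaining version) is an assumption chosen for the example, not a policy stated here; the Edition type, the resolve function, and the part numbers in any usage are hypothetical.

```python
from typing import List, NamedTuple, Optional, Tuple


class Edition(NamedTuple):
    short_name: str
    version: str
    language: str
    part_number: str


def version_key(version: str) -> Tuple[int, ...]:
    # "2.0" -> (2, 0), so that versions compare numerically.
    return tuple(int(part) for part in version.split("."))


def resolve(catalog: List[Edition], short_name: str,
            version: str, language: str) -> Optional[Edition]:
    # Only editions in the same family (same short name) are candidates.
    family = [e for e in catalog if e.short_name == short_name]
    if not family:
        return None

    def rank(e: Edition):
        # Smaller tuples win: exact version, then same language,
        # then the newest of whatever is left.
        return (
            e.version != version,
            e.language != language,
            tuple(-n for n in version_key(e.version)),
        )

    return min(family, key=rank)
```

Because the lookup is keyed on the family's short name rather than on a part number, an unavailable edition can fall back to a sibling, which is exactly what a part-number link cannot do.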
 
 

The auxiliary functions

 
In order for the link editor to function correctly, auxiliary functionality must also be in place. This functionality is:
  1. Automatic unique ID generation at authoring time and on demand.
  2. Paste callback to prevent ID duplication, and
  3. Link updates.
 
 
None of the above is needed per se to enable the Link Editor to function. However, from a user's point of view, duplicate IDs and incorrect hot titles are intimately related to linking. In other words, authors will tend to believe that the Link Editor is broken, or will lose confidence in it, if the results obtained are wrong due to things that ultimately may not have anything to do with the Link Editor but which appear to be part of the same functionality.
 
 

Automatic ID generation

 
The automatic ID generation functionality that we have in place currently is far from ideal, although it performs properly and does everything that's needed.
 
I mention ideal because, at least theoretically, keeping track of the last assigned ID number should be done through a text entity; however, there are practical reasons why we cannot do that yet.
 
The main obstacle is the fact that Adept allows writers to edit the file entities that compose a book either as part of the book or separately from it. This is actually good, since it enables writers to share book authoring simply by parceling out chapters among themselves. But the problem is that the text entities that keep track of IDs at the book level may get out of sync with the same text entities when one of the files is edited by itself.
 
In other words, if the book's internal declaration subset asserts that the last ID assigned to Preface is 15, and then one of the writers edits Preface.sgm by itself, upping that number to 23 by the end of the editing session, when the Preface is again edited as part of the book, the last ID assigned to it will still be 15, leading to ID duplication next time an ID is generated.
 
The solution to this problem would be to keep these entities in parameterized declarations, if these were read/write. Unfortunately Adept accesses parameterized declarations in a read-only manner, so it can't be done without a lot of manipulation.
 
So instead of an ideal solution we have a file-based one, where a file entity's last ID is kept in memory during an editing session, but saved to a separate file in between sessions. It is not ideal, but it certainly works well (especially if writers remember to carry those files with them when they copy separate chapters from directory to directory).
 
In terms of functionality, a list of ID candidates (like chapter, sect1, sect2, table, etc.) is kept in memory, and whenever one of the candidate elements is inserted, a new ID is generated for it and, through a callback to the insert_tag function, assigned to the element.
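
A minimal Python sketch of this arrangement follows, assuming a counter per file entity that is loaded from and saved to a small side file. The IdGenerator class, the JSON side-file format, the ID naming pattern, and the on_insert_tag hook are all hypothetical stand-ins for the real callback mechanism.

```python
import json
from pathlib import Path

# Elements that should always receive an ID (list taken from the text above).
ID_CANDIDATES = {"chapter", "sect1", "sect2", "table"}


class IdGenerator:
    """Keeps the last assigned ID for one file entity (hypothetical design)."""

    def __init__(self, entity_name: str, counter_file: Path) -> None:
        self.entity_name = entity_name
        self.counter_file = counter_file
        self.last_id = 0
        # Between sessions the counter lives in a separate file.
        if counter_file.exists():
            self.last_id = json.loads(counter_file.read_text())["last_id"]

    def next_id(self) -> str:
        self.last_id += 1
        return f"{self.entity_name}-{self.last_id}"

    def on_insert_tag(self, element: str, attributes: dict) -> None:
        # Called whenever an element is inserted; only candidates get an ID.
        if element in ID_CANDIDATES and "id" not in attributes:
            attributes["id"] = self.next_id()

    def save(self) -> None:
        # Persist the counter at the end of the editing session.
        self.counter_file.write_text(json.dumps({"last_id": self.last_id}))
```

The weakness noted above is visible in the sketch: if the side file does not travel with the chapter, a fresh generator starts again from zero and duplicate IDs become possible.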
 
Similarly, a writer can run a specialized function to insert IDs throughout the book at any time. This is especially useful after doing a copy-and-paste as indicated below.
 
 

Paste callback

 
One of the most bothersome issues when trying to keep one's document clean of duplicated IDs is what to do when pasting text that contains IDs.
 
This is generally the source of most duplicated IDs.
 
The solution in this case is a callback to paste. Whenever text that comes from a paste buffer is about to be inserted, it is inspected for the presence of elements with ID attributes. If one is encountered, the user is advised and asked whether to get rid of these attributes to prevent duplicate IDs.
 
If the user answers in the affirmative, the buffered text is traversed, and any ID attributes in it are deleted before pasting the buffer in.
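
The paste callback can be approximated in Python as shown below. This sketch works on the raw markup with regular expressions rather than on a parsed document, which is a simplification; the function names and the confirm callback are hypothetical, and attribute values that themselves contain angle brackets are not handled.

```python
import re
from typing import Callable

# A start tag: "<" followed by a name and anything up to the closing ">".
START_TAG = re.compile(r"<[A-Za-z][^<>]*>")
# An id attribute inside a start tag (quoted or unquoted value).
ID_ATTR = re.compile(r"""\s+id\s*=\s*(?:"[^"]*"|'[^']*'|[^\s>]+)""",
                     re.IGNORECASE)


def has_ids(buffer: str) -> bool:
    # True if any start tag in the buffer carries an id attribute.
    return any(ID_ATTR.search(tag) for tag in START_TAG.findall(buffer))


def strip_ids(buffer: str) -> str:
    # Remove id attributes from start tags only; text content is untouched.
    return START_TAG.sub(lambda m: ID_ATTR.sub("", m.group(0)), buffer)


def paste(buffer: str, confirm: Callable[[str], bool]) -> str:
    # Ask the user only when the buffer actually contains IDs.
    if has_ids(buffer) and confirm("Pasted text contains IDs. Remove them?"):
        return strip_ids(buffer)
    return buffer
```

Note that linkend attributes are left alone: only the ID on the pasted element is dropped, so references to existing targets survive the paste.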
 
To avoid the problem presented by elements without IDs, there is a function, as mentioned above, that the writer can use at will, which traverses the whole book and inserts IDs in those elements that are missing them.
 
 

Link Updates

 
Whenever a document contains a link to another document the risk exists that the text referring to the target can become out of sync with the text that actually exists in the target.
 
A method for updating a document's existing links is therefore essential, from a writer's point of view. Otherwise they end up with a great tool for inserting external cross references but with the burden of manually consulting the database in order to update them.
 
Without going into the most intimate details of how our link updating tool works, I can say that its operation is relatively simple.
 
The author is first presented with the list of all the books the document links to:
 
 

After selecting one of them, the writer is presented with the list of all the AnswerBook collections in which that book appears:
 
 

After selecting an AnswerBook or Collection, the writer is presented with a confirmation panel.
 
 
After confirming the choice, the writer goes to the next book, and so on until all the books are done.
 
At the end of the process all entity declarations are updated to the right version and language, and all the link text is updated to reflect the latest available version of that particular book.
 
 

The moral

 
The question of whether something as conceptually simple as linking merits this kind of development effort is a legitimate question to pose at this point.
 
Wouldn't training be a better way to go?
 
Why can't writers just insert the IDs by hand?
 
Why can't writers just access the database through other methods when they want to link to external books?
 
Why can't writers accept that writing in SGML is hard, and just get over it, and if they don't like it, tough luck?
 
In short: is it worth it?
 
The answer, from my perspective, is a resounding yes. There are a variety of reasons why.
 
Perhaps the most important is that, with the Link Editor in place and working today, we are very well positioned to deal squarely and straightforwardly with XLL (eXtensible Linking Language) when this becomes necessary and possible. Instead of having to answer questions about in-line and out-of-line links, simple and extended links, show and actuate attributes, locators, xpointers and children, all we will have to do is modify the Link Editor slightly, perhaps add a dialog or two, and voilà! we're done.
 
Another reason is that ever since the link editor and related functions were put in place, I have received not a single telephone call or email message asking for guidance or troubleshooting related to linking. This is probably the strongest testimony as to the value and importance of hiding the internals of unfamiliar operations from the writers.
 
And finally, because who among you would have shown any interest in a talk about link training?
