Using Meta data to Automate XML Document Production and Maintenance
Joe Gelb
Vice President, Electronic Document Projects
LiveLink Systems Ltd., POB 34059, 5 Mercaz Shatner, Jerusalem 91340, Israel
Phone: +972-2-6528274
Fax: +972-2-6528356
Email: joeg@livelink.com
Web: www.livelink.com
Biographical notice:
Joe Gelb
LiveLink Systems Ltd.
Prior to joining LiveLink Systems in 1996, Mr. Gelb worked for General Electric Astro Division and McDonnell Douglas. He earned a Bachelor of Engineering (Mechanical Engineering) and a BA in History from Stevens Institute of Technology in 1992.
Katriel Reichman
President and founder
LiveLink Systems Ltd.
ABSTRACT:
Brief Abstract
Full Abstract
Introduction
The advent of electronic documents created and used over enterprise-wide networks and internet webs is, in some ways, an even more radical event than the invention of the printing press. Gutenberg only automated an existing process - reproduction of hardcopy documents. Electronic documents, however, require a fundamental paradigm shift for effective production and maintenance. An effective paradigm is needed to bridge the gap between how documents are authored (linearly) and how they are used when delivered via browsers (associatively). In this paper we will refer to a major subset of this challenge as the link coding problem.
The Link Coding Problem
While for-print documents are traditionally prepared linearly, beginning with an outline and following a structure that is apparent to both the author and the reader, electronic documents require associative preparation. Readers need clues that will enable them to navigate and locate related topics without reference to a document hierarchy.
Enterprise Publishing™ Provides an Alternative Paradigm for Link Coding
A new paradigm developed by LiveLink, and implemented in our Enterprise Publishing software, bridges the gap between linear document authoring and associative coding of hyperlinks. |
The paradigm uses meta data to separate content objects from information about each content object and from information about how the object relates to other objects. Meta data is descriptive information about document objects.
Using the meta data, Enterprise Publishing automatically codes hyperlinks and controls the properties of the hyperlinks. As documents are edited, deleted or modified, the automatic hyperlinking process determines the best possible links and codes them automatically. |
Goals of the Paradigm
The goal of the paradigm is to enable genuine automation of hyperlink coding and updating, without human intervention, in real-life work environments. We suggest evaluating our own proposal, and others, by benchmarking them against the criteria below.
Criteria for Evaluating the Paradigm
Automation
In order to meet the goals of fast creation and easy maintainability, the solution needs to be fully automated. All coding should be performed by the software using decision rules. |
Intelligence
Adaptability for Meeting Real-Life Needs
The solution should be adaptable to meet the idiosyncratic needs of individual projects. |
Standards-based
The solution should support the file formats used today and currently in development. In the context of the Web, meeting this goal requires support for input and output of different implementations of HTML (including JavaScript) and XML (including the variations of XML linking).
How the Paradigm Works
LiveLink assumes that the starting point (or "input") is structured electronic documents. The software works by identifying and using that structure first to break up and then reassemble "enriched" documents that improve upon the original documents. |
The software first assigns documents to folders and sub-folders, using a file cabinet metaphor. Folders and sub-folders act as containers that associate meta data with the files they contain. For example, all of the documents associated with a particular product might be assigned to one folder. Within it, the documents relating to maintenance of that product may be assigned to one sub-folder and those relating to product operation to a second sub-folder.
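The folder metaphor can be sketched as follows. This is only an illustration under our own assumptions: the `Folder` class, its `effective_meta` method, the rule that a sub-folder inherits and may override its parent's meta data, and the "Pump X" product are all hypothetical and are not taken from the LiveLink software.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Folder:
    """A container that associates meta data with the files it holds."""
    name: str
    meta: dict = field(default_factory=dict)
    parent: Optional["Folder"] = None

    def effective_meta(self) -> dict:
        """Merge meta data from the top folder down (assumed inheritance rule)."""
        inherited = self.parent.effective_meta() if self.parent else {}
        # Values set on the nearer folder override inherited ones.
        return {**inherited, **self.meta}


# Hypothetical product folder with two sub-folders.
product = Folder("pump-x", {"product": "Pump X"})
maintenance = Folder("maintenance", {"subject": "maintenance"}, parent=product)
operation = Folder("operation", {"subject": "operation"}, parent=product)

print(maintenance.effective_meta())
```

Every file placed in the `maintenance` sub-folder would then carry both the product name and the maintenance subject without the author recording either in the file itself.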
After assigning documents to folders, the software breaks documents into topics using tags to identify topic hierarchy. It then further distinguishes between different elements in the topics. Coarse dissection separates main and sub topics, text and graphics. Fine dissection separates parenthetical information such as notes and examples, and detailed information such as tables, table captions and pre-marked links.
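Coarse dissection might look roughly like the sketch below. We assume, purely for illustration, that the input is HTML and that the topic hierarchy is carried by `<h1>` to `<h6>` heading tags; the `dissect` function and the sample document are hypothetical, and a production tool would use a real parser rather than a regular expression.

```python
import re

# Assumption: heading tags without attributes mark the start of each topic.
HEADING = re.compile(r"<h([1-6])>(.*?)</h\1>", re.IGNORECASE | re.DOTALL)


def dissect(html: str) -> list[dict]:
    """Split a document into topics, one per heading tag."""
    topics = []
    matches = list(HEADING.finditer(html))
    for i, m in enumerate(matches):
        # A topic body runs from the end of its heading to the next heading.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(html)
        topics.append({
            "level": int(m.group(1)),      # position in the topic hierarchy
            "title": m.group(2).strip(),
            "body": html[m.end():end].strip(),
        })
    return topics


sample = "<h1>Engine</h1><p>Overview.</p><h2>Oil gauge</h2><p>Check daily.</p>"
topics = dissect(sample)
for topic in topics:
    print(topic["level"], topic["title"])
```

Fine dissection would apply the same idea one level down, separating notes, examples, tables and captions inside each topic body.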
Distinguishing Between Content Level (documents) and Expert Level (meta data)
LiveLink software distinguishes between content level and expert level information. Content level refers to text and media, stored in documents, that the software processes. Expert level refers to meta data - information about how the content should be interpreted. Expert level information is stored in a database.
LiveLink software makes the expert level information persistent. That is, the information recorded by experts remains re-usable as the source files for the project change, and is portable to new projects in the organization.
Database Driven
LiveLink software automatically populates a database by parsing files and isolating potential targets for hyperlinks. The database can be edited using standard database tools. The information stored in the database is read back and used by LiveLink products to control how documents are enriched.
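The round trip of populating, editing and reading back the database can be sketched with an in-memory SQLite database. The table layout (`phrase`, `file`, `enabled`), the `build_target_db` function and the sample topics are our assumptions for illustration only; the paper does not describe the actual schema.

```python
import sqlite3


def build_target_db(topics):
    """Store each (file name, topic title) pair as a potential link target."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE targets (phrase TEXT, file TEXT, enabled INTEGER DEFAULT 1)"
    )
    db.executemany(
        "INSERT INTO targets (phrase, file) VALUES (?, ?)",
        [(title.lower(), file) for file, title in topics],
    )
    db.commit()
    return db


db = build_target_db([("engine.html", "Oil gauge"), ("engine.html", "Spark plug")])

# A content expert edits the table with ordinary SQL, here disabling one target.
db.execute("UPDATE targets SET enabled = 0 WHERE phrase = 'spark plug'")

# The enrichment step reads back only the targets that are still enabled.
active = [row[0] for row in db.execute("SELECT phrase FROM targets WHERE enabled = 1")]
print(active)
```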
The database approach allows users complete control over the database interface and expert level content. The user can create a "stop list" disabling particular targets that are too general and would lead to nonsensical or ambiguous links. In addition, aliases for potential targets may be added to the database. For example, "dipstick" can be an alias for "oil gauge."
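A minimal sketch of how a stop list and aliases could steer link coding is given below. The `targets`, `aliases` and `stop_list` contents, the URLs and the `code_links` function are hypothetical; only the "dipstick" and "oil gauge" pair comes from the example above.

```python
import re

# Hypothetical expert level data.
targets = {"oil gauge": "engine.html#oil-gauge", "overview": "intro.html#overview"}
aliases = {"dipstick": "oil gauge"}   # alias -> canonical target phrase
stop_list = {"overview"}              # too general to make a useful link


def code_links(text: str) -> str:
    """Wrap each known phrase in an HTML anchor, honoring stop list and aliases."""
    phrases = {p: url for p, url in targets.items() if p not in stop_list}
    for alias, canonical in aliases.items():
        if canonical in phrases:
            phrases[alias] = phrases[canonical]
    if not phrases:
        return text
    # Try longer phrases first so they win over shorter overlapping ones.
    ordered = sorted(phrases, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(p) for p in ordered) + r")\b", re.IGNORECASE
    )
    return pattern.sub(
        lambda m: f'<a href="{phrases[m.group(1).lower()]}">{m.group(1)}</a>', text
    )


print(code_links("See the overview, then check the dipstick."))
```

In this sketch "overview" is left as plain text because it is on the stop list, while "dipstick" is linked to the oil gauge topic through its alias.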
Database rules may be coded to improve the quality of document enrichment. A sample database rule might be to ignore all targets with the text string "hint".
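The sample rule could be expressed as a small predicate, as in the sketch below. The function names `ignore_containing` and `apply_rules` and the candidate phrases are hypothetical; the paper does not say how rules are actually encoded.

```python
def ignore_containing(substring: str):
    """Build a rule that rejects any target containing the given text string."""
    needle = substring.lower()
    return lambda phrase: needle not in phrase.lower()


def apply_rules(candidates, rules):
    """Keep only the candidate targets that every rule accepts."""
    return [c for c in candidates if all(rule(c) for rule in rules)]


rules = [ignore_containing("hint")]
kept = apply_rules(["Oil gauge", "Hint: check daily", "Spark plug"], rules)
print(kept)
```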
Implicit and Explicit Database Clues
Clues are derived explicitly (typically from style tags in the source documents) and implicitly (using key phrases, juxtaposition and other hints).
Grouping Files in Projects, Folders and Sub-Folders
LiveLink software supports the grouping of files in projects by folders and sub-folders. The realm of files that are processed together is called a project. Folders are logical groupings of files within the project. Sub-folders enable enhanced functionality in areas such as assigning properties, inheritance, precedence, and source and target designation.
Distinguishing Between Targets and Sources
LiveLink software allows the user to account for different levels of file ownership or readiness for public viewing by differentiating between "source-only", "target-only" and "source and target". Source-only files or topics can link to targets in other files but cannot themselves be the target of links. For example, an access-restricted file or a file that has not yet been approved for public viewing would be a good candidate for source-only status. Target-only files or topics can be linked to but not from; a corporate glossary is one example.
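The three designations reduce to a simple eligibility check, sketched here. The `Role` enumeration and the `may_link` function are hypothetical names chosen for this illustration.

```python
from enum import Enum


class Role(Enum):
    SOURCE_ONLY = "source-only"
    TARGET_ONLY = "target-only"
    SOURCE_AND_TARGET = "source and target"


def may_link(source_role: Role, target_role: Role) -> bool:
    """A link is allowed only if the source may link out and the target may be linked to."""
    can_be_source = source_role is not Role.TARGET_ONLY
    can_be_target = target_role is not Role.SOURCE_ONLY
    return can_be_source and can_be_target


draft = Role.SOURCE_ONLY         # e.g. a file not yet approved for public viewing
glossary = Role.TARGET_ONLY      # e.g. a corporate glossary
manual = Role.SOURCE_AND_TARGET

print(may_link(draft, glossary))   # the draft may point at the glossary
print(may_link(manual, draft))     # nothing may point at the draft
```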
Precedence
The intelligent agents built into LiveLink software use precedence rules to deliver a high level of intelligence in links and to arbitrate under ambiguous conditions. An ambiguous condition arises when the decision rules dictate multiple solutions but only one solution is possible, or when the solutions need to be ordered by rank.
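One way to picture arbitration is a ranking function over competing candidate targets. The rule used in this sketch (prefer a target in the same sub-folder, then the same folder, then anywhere else in the project) is purely our assumption for illustration; the paper does not spell out the actual precedence rules, and the `Location` class and sample paths are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    folder: str
    sub_folder: str
    url: str


def precedence(source: Location, candidate: Location) -> int:
    """Lower value means higher precedence (assumed rule: nearer wins)."""
    if (candidate.folder, candidate.sub_folder) == (source.folder, source.sub_folder):
        return 0
    if candidate.folder == source.folder:
        return 1
    return 2


def rank(source: Location, candidates: list) -> list:
    """Order ambiguous candidates by precedence; the first is the single best link."""
    return sorted(candidates, key=lambda c: precedence(source, c))


source = Location("pump-x", "maintenance", "pump-x/maintenance/seals.html")
candidates = [
    Location("pump-y", "operation", "pump-y/operation/valve.html"),
    Location("pump-x", "operation", "pump-x/operation/valve.html"),
    Location("pump-x", "maintenance", "pump-x/maintenance/valve.html"),
]
ranked = rank(source, candidates)
print([c.url for c in ranked])
```

When only one solution is possible, the first entry is coded as the link; when solutions need to be ordered by rank, the whole list is used.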
Link Properties
LiveLink software can control the type of link, as well as the link itself. In HTML this is accomplished by adding JavaScript or VB Script to the hyperlink. In XML this is accomplished by specifying XML link attributes.
A link property might specify, for example, if the link should open a new instance of the browser or display the information in the current browser window. |
At production, the link property is rendered using either XML attributes or JavaScript. |
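Rendering one link property to both output formats might look like the sketch below. Several details are assumptions: the `<link>` element name is hypothetical, the XML output uses the `xlink:type`, `xlink:href` and `xlink:show` attribute names and the `new` and `replace` values as they appear in the W3C XLink 1.0 Recommendation (drafts of the XML linking specification used different spellings), and the namespace declaration for the `xlink` prefix is omitted for brevity.

```python
from html import escape
from xml.sax.saxutils import quoteattr


def render_html(href: str, text: str, new_window: bool) -> str:
    """Render the link as an HTML anchor, using JavaScript for a new window."""
    if new_window:
        onclick = "window.open(this.href); return false;"
        return f'<a href="{escape(href)}" onclick="{onclick}">{escape(text)}</a>'
    return f'<a href="{escape(href)}">{escape(text)}</a>'


def render_xml(href: str, text: str, new_window: bool) -> str:
    """Render the same property as XLink attributes on a hypothetical <link> element."""
    show = "new" if new_window else "replace"
    return (
        f'<link xlink:type="simple" xlink:href={quoteattr(href)} '
        f'xlink:show="{show}">{escape(text)}</link>'
    )


print(render_html("parts.html", "Parts", new_window=True))
print(render_xml("parts.html", "Parts", new_window=False))
```

The link property is recorded once at the expert level, and the choice between JavaScript and XML attributes is deferred until production.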
Handling Special XML Link Properties
To illustrate the power of the LiveLink paradigm, we will consider how it would be applied for controlling three relatively straightforward aspects of XML linking. Of course, the paradigm is equally valid (and even more useful) for XML linking features that are more difficult to code, such as extended link groups.
Show
Actuate
Link Quality
Idiosyncratic Links
We anticipate that there will be some links that cannot be coded automatically. This category of links is referred to as idiosyncratic links. |
The most common source of idiosyncratic links is the input files themselves. These are links that are coded in the source files by the author or by a content expert.
Avoiding Nonsensical Links
Like the physician, LiveLink software says "do no harm." That is, avoid coding nonsensical links that distract the reader. Precedence, alias lists and stop lists are just some of the tools that the software puts at the disposal of the content expert to avoid nonsensical links. Our experience has been that careful definition of precedence rules, alias lists and stop lists typically enables total automation of linking.
Handling Images
The LiveLink paradigm supports the full range of media, in addition to straight text documents. |
To add links to images or other media files, simply add one or more identifying strings for each file to the expert level database. Whenever the string appears in any of the topics, the software automatically codes a link to the specified file. Whenever a new image or other media file is added to the database (or removed), the software automatically updates the links.
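The matching of identifying strings to media files can be sketched as below. The `media_db` mapping, the file paths, the part number and the `media_links` function are hypothetical, and the sketch uses plain case-insensitive substring matching, which is our simplification.

```python
def media_links(topic_text: str, media_db: dict) -> list:
    """Return the media files whose identifying strings occur in the topic."""
    text = topic_text.lower()
    return sorted(
        path
        for path, strings in media_db.items()
        if any(s.lower() in text for s in strings)
    )


# Hypothetical expert level entries: media file -> identifying strings.
media_db = {
    "img/pn-4471.jpg": ["PN-4471", "part 4471"],
    "video/oil-change.mp4": ["oil change"],
}
topic = "Replace PN-4471 during every oil change."

before = media_links(topic, media_db)
print(before)

# Removing a media file from the database changes the links on the next run.
del media_db["video/oil-change.mp4"]
after = media_links(topic, media_db)
print(after)
```

Because the links are recomputed from the database on each run, adding or removing a media entry is enough to update every topic that mentions it.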
Sample applications include: (a) On-line technical documents that can be matched to illustrations stored as PDF documents; (b) mention of part numbers that can be matched to refer to photographs of the parts; or (c) descriptions of procedures that can be linked to video clips that walk through the procedure. |
The Difference Between Site Verification and Automatic Coding
Using JavaScript and Down-converting to HTML
Acknowledgments