
 Gregory  Arofan 
 

Wishful thinking or thinking ahead? Envisioning the next generation of SGML editors

  One of the main barriers to those wishing to create SGML systems is the high "cost" of authoring: the level of expertise necessary to author in a structured editor can be prohibitive, as can the cost of up-translating documents from traditional word-processing formats. While SGML editing tools are slowly improving, they are still a far cry from the sort of applications that would enable SGML to be adopted as a mainstream technology in settings where WYSIWYG word-processors are the norm. At the same time, emerging standards in the SGML world such as HyTime and DSSSL, as well as the demand for support of such features of the standard as LINK, SUBDOC, and the internal declaration subset (among a more technical audience) require that SGML editors fill a different role, but one that is currently neglected. This paper presents a practical look at how (and how far) these goals can be met, given the existing state of SGML technology and related standards.
  By examining the different authoring paradigms, and by looking at successful word-processors, code editors, HTML authoring tools, and structured editors, a set of requirements is developed for the "ideal" SGML authoring tool(s). The design approaches that can be taken for meeting these requirements are then considered. Emphasis is placed on utilizing architectural forms as an enabling technology within software applications, and leveraging the DSSSL and HyTime standards to enhance both application functionality and the value of the documents produced.
  Features that would be desirable for ease of integration with SGML document management systems are also addressed. The impact of XML on the development of editing tools is examined, and some approaches are recommended for dealing with the impending wave of "para-SGML" documents this new "standard" threatens to generate.
  While an SGML editor cannot do everything for the user, the real world demands that this class of applications be significantly improved, both in terms of usability and functionality. This paper focuses on where these improvements can realistically be made, and what approaches have become possible given advances in the implementation of SGML and related technologies. In a more traditional vein, an analysis of successful authoring paradigms shows how SGML applications could be improved without requiring new technologies. It is hoped that this paper will help users vocalize what they most want in their SGML editors, and help developers understand how these features might successfully be implemented.
 

INTRODUCTION

  One of the main barriers to those wishing to create SGML systems is the high "cost" of authoring: the level of expertise necessary to author in a structured editor can be prohibitive, as can the cost of up-translating documents from traditional word-processing formats. Most SGML implementors probably share a similar experience here: While SGML editing tools are slowly improving, they are still a far cry from the sort of applications that would enable SGML to be adopted as a mainstream technology in settings where WYSIWYG word-processors and similar tools are the norm. Whether evaluating different editors for use in a particular implementation setting, or helping others to select the "right" authoring solution as a consultant, there is a good deal of frustration involved in finding a workable solution.
  The objective of this paper is to outline a set of guidelines for developing the authoring and editing tools that will make SGML an easier technology to implement in the future. As never before, SGML and standards-based technology are providing us with the ability to solve some of the key problems with the creation of structured documents; we must overcome the "authoring" barrier if SGML is to realize its potential over the coming decade. I believe that this is a realistic goal.
  When we set about developing an application, and drawing up requirements, the most important question is this: "Who will use the application, and what will they do with it?" (It is tempting, of course, to ask "How will I make money from it?", but this should not be the first question...) All too often, software developers make assumptions about the tools they will create, based on what existing tools already do, and on their own assumptions about how users perform certain tasks. The result, as we can see throughout the computer industry, is software that is difficult to learn, and difficult to use effectively. Because SGML is already a "difficult" technology, this mistake will be even more damaging for the developers of SGML software.
  Once this basic question is answered, however, we are equipped to define the functional requirements that will allow us to examine existing tools, and take from them what is good, while improving on what is lacking. Specific to the SGML world are a number of initiatives that have implications for how our functional requirements can be implemented; in many cases, because these initiatives are fairly new, there are no existing applications to examine, and we are forced to think for ourselves. As you would expect, HyTime, DSSSL, and XML are the major expressions of these initiatives, and we will be looking at them in determining how needed functionality can be implemented.
  To summarize, this paper will address: (1) Who are we designing applications for, and how do they think about their software? (2) What functionality do they require? (3) How do we implement this functionality? The end of the paper will cover the long-term view: What are the ultimate implications of the internet when combined with modular functionality as provided by languages such as Java?
 

AUTHORING PARADIGMS: WHO ARE WE CREATING SOFTWARE FOR?

 
 

Authoring Paradigms

  We will approach this question by establishing a set of "authoring paradigms." The term "authoring" is used loosely, because there are a number of tasks that are not directly related to writing, but that we will need to support in the creation of an authoring tool.
  There are many places where we can look to determine who our "authors" will be, and what their "authoring paradigms" are:
 
  • The world of print production, especially in terms of page-layout tools, word-processors, and related applications.
  • The world of "database" authoring: authors who use forms-based applications to populate databases, as we often see in corporate IS systems where there is a high level of information coordination with "real-time" information demands.
  • The world of Web-site creation and management, which has given rise to a set of tools that are very similar to the ones that we will want to build, even though the emphasis on structure is so much weaker.
  • The world of applications development, especially code editors. These tools help sophisticated users generate "documents" with structures that can be more exacting than those of SGML, and so we will profit by examining the similarities.
  • The world of structured editors as they exist today.

  Not surprisingly, our "authoring paradigms" can be drawn fairly directly from this breakdown. Four types of "authors" are described, in order of what proportion of the world's users belong to each group (in my estimation). All four types can (and do) "author" SGML, although the break-down is much more general than that:
     Type I: "I don't know, and I don't want to know.": This is the vast majority of people who write documents on computers: people who need to use the machine to do a job, and whose first requirement is ease of use. These authors must do anything from writing a corporate memo or report to producing the next Pulitzer Prize-winner. Typically, they are focused on the content that they produce, and accept computers as a means to an end. They often work under deadlines, and may be somewhat underpaid. They don't know much about data formats or structured text, and if life is good to them, they never will. They don't care, and are too busy doing their jobs to care.
      Very often, these people think of computer files as "virtual" pieces of paper, and this is why WYSIWYG word-processors make them so happy: simple is good.
      At the high end, these people are content experts or editors - people who focus on how well-written, accurate, consistent, organized, and generally communicative the content of their documents is, whether they are textbooks, magazine articles, or what have you. They are intelligent people, and are computer-literate, but they still want computer applications that are simple, and that relate directly to the task being performed. At the low end, these people are not capable of learning the more sophisticated aspects of creating structured documents; they need to produce simple documents in a fashion that can be effectively learned by rote.
      A subset of this class, Type IA, performs a similar job in an office, but uses a computer program to fill forms in with specific items of data: bank clerks, insurance salesmen, people who enter telephone orders into the fulfillment system, etc. People who use these systems learn the patterns that are required for the task - which piece of information goes where - and they don't need or want to know anything more. Very often, these types of "authors" are populating relational databases, as opposed to writing paper "documents". (To us, this shouldn't make any difference: It's really all just information, right?)
     Type II: Graphic Designers and Layout People: This group of "authors" is concerned only with the way in which chunks of content appear on a page; they do not really care about "what" is said, but about how it looks. This group includes people who design Web home pages; people who write design specifications for commercial books and journals; graphic artists who do ad layout; people who write "style sheets" or FOSIs for print or CD-ROM delivery; etc.
      The focus here is on format, and at the level of the "look and feel" of a document. What the document says is a secondary issue in this paradigm. Art - especially "cool" art - is often popular with these folks. They like Java in their Web pages. For users in this paradigm, "content" has been reduced to a set of "point-and-click objects" that they move around a conceptual schema on their screens.
      The best examples of applications that use this paradigm are page-layout tools like PageMaker and Quark, and Website creation tools like NetObjects'. Icons represent content, and these can be manipulated without the editing (or even viewing) of the objects themselves. The organization of data "objects" has been abstracted so that it is easily manageable.
     Type III: People who like "code", but who aren't engineers: This group is made up of those quasi-specialists within most organizations who are not really very technical, but who have an understanding of the technical issues. Good SGML authors and editors belong in this class: they understand the importance of structure, as well as the importance of content, and they know how to use relatively complicated tools to impart this structure to the documents they work with. They aren't "engineers", but they generally understand HTML and/or SGML, and know the DTDs they work with intimately. The current crop of SGML editing applications (Adept Editor, Author/Editor, etc.) use this authoring paradigm. (The more "structured" HTML editing tools also use this paradigm: they are effectively SGML editors that work with a limited set of DTDs.)
      Applications that use this paradigm make it easy to see and manipulate the structure of documents, and are not heavily concerned with the appearance of those documents.
     Type IV: Engineers: Yet another group of authors: these people don't like English (or any other natural language) - they like C (and Fortran, and Cobol, and Java, and everything else...) They "author" in a very unforgiving paradigm, according to "structures" that are generally both more arcane and more exacting than those of SGML. (They tend to find SGML simple to learn, but often resent having to use such a complicated format for their documentation when man-pages seem to work just fine.) The "documents" produced in this paradigm are actually source-code, but these documents have some similarity to SGML documents from the authoring perspective. (Also in this class are the "die-hard" SGML experts who insist on using text-editors to write their SGML; vi is a "popular" SGML editing tool in certain circles, however limited...)
      The focus of this authoring paradigm is absolute access to the syntax of the code being created; a good editing application makes it easy for an author to adhere to a rigid formal language and to see the relationships between different pieces of code.
     
     

    General Comments

      These groups of users each have their own sets of demands for authoring tools; many applications are designed to serve more than one group of users, but most have a primary focus. Many users belong to more than one of the groups outlined above, however, and so we have tools that do both word-processing and fairly sophisticated layout, for example. The exact combination of users for a specific application is so tied to the anticipated market that it is not useful for us to examine these combinations here. Suffice it to say that for a particular application, one or more of the described authoring paradigms will be addressed.
     

    FUNCTIONAL REQUIREMENTS

     
     

    Introduction

      Given the descriptions of our authoring paradigms, the question becomes one of determining what features are required by which authors, and how best to allow them to perform their tasks easily and efficiently. For this, we will be looking at some existing applications, both from within the SGML world and outside of it. We need to be aware, too, that the tasks performed by authors are becoming more and more complex, both because tasks that used to belong to other kinds of users are now controlled from the desktop, and because some internet-related tasks did not exist historically within the domain of "authoring" (such as hyper-linking).
     
     

    Simple Tools for Simple Authors: Word-Processing and Forms-Based Authoring

      Our largest (and most demanding) group of authors are those who use WYSIWYG word-processors today, and for whom such HTML-authoring applications as FrontPage and PageMill have been developed.
     
    1. WYSIWYG
    2. The first requirement that this group of authors imposes on us is that the tools they use be WYSIWYG: they need to see exactly what it is they are creating. The problem with this requirement is determining what you actually get when you are authoring "content" to be presented in several different ways on different media: Do you show the user the Web-page equivalent? One of the six different print renditions of your SGML? The CD-ROM version? What do you really "get" when what you are creating is a structured document?
There are two solutions here: (1) fix your authors; and (2) build more powerful "WYSIWYG" features. "Fixing authors" requires weaning them away from the "virtual page" concept of what a document is, and moving them toward the concept of authoring "data". This is related to the idea of "forms authoring". Because an SGML document contains rich structural information, it is possible to create an interface that is a "flexible form," allowing the structure of the information itself to dictate the appearance and functionality of the interface. If, when we load an instance, the system knows what basic role each element plays (a matter of configuration, see below), then a convention for authoring each type of element could be established, hopefully across all applications. Think about the types of distinct roles played by what we see in a "virtual paper" document: we have paragraph-type objects (including lists), we have tables, figures, and in-line elements that perform a variety of roles (link-ends, emphasis, content-specific tagging, etc.) The variety of these "on-screen" roles is not so great that it couldn't be captured in a usable fashion, generalized according to convention. The elements that share a given behavior could then be displayed as choices for user selection.
The place to look for this kind of "convention" is the Web: because interactive forms through CGI established a particular look and feel for the creation and use of forms, more powerful languages such as Java have adopted the same conventions. While simple text fields and text-entry boxes may not be sufficient for good structured authoring, a similar approach could be taken for an "appearance-neutral" interface convention that was somewhat richer. The key here is having a convention that unsophisticated authors can be reasonably expected to learn and understand, and to use across all similar applications. The ability to add deeper structures, and more specific structures, will need to be realized, of course. Real WYSIWYG editing demands format, however, and is too important to be sacrificed. The second option is, therefore, to let authors choose which deliverable they are authoring for.
The Importance of Format: There has been a long-standing tenet in the SGML world that we must get authors to understand the difference between "content" and "format". While this is true, it can also be a damaging perception. One of the SGML editing tools that is gaining in popularity today is FrameMaker + SGML. One of the reasons for the success of this editor is that, unlike more "structure-based" SGML editing tools, it has a very rich screen presentation. The key here is that the same relationship that we use to derive structure from appearance in a document analysis exists when authors are creating documents as well: structure is reflected in format. If properly implemented, on-screen formatting encourages authors to build correct structures. It is possible to build an application that lets the author choose which deliverable, derived from the SGML, they "see" while they are working.
In order to implement this in a WYSIWYG fashion, however, we need to have style-sheets that can be transferred between applications: I feed my SGML instance and my style-sheets to the application, and I can select any available style sheet as my on-screen presentation. An interesting example of this is seen in the GRIF XML authoring demo, which uses cascading style sheets to drive WYSIWYG authoring. (There are a number of other implications here, too, which will be discussed in the next section, which addresses the formatting requirements more explicitly.)
    3. FORMAT The "standard" SGML formatting functions will all be needed: context-definition, attribute sensitivity, inheritance, sibling distinctions, etc. We will not dwell on these details: suffice it to say that formatting capability will need to exist that is the equal of what is available in today's word-processing capabilities. (A good example of how to develop a GUI for creating this level of stylesheet exists in SoftQuad's Panorama Pro.) If our users expect to have the format of their document available to them, they will also expect other people who work with the information to do the same. There is an absolute requirement for SGML documents to have importable and exportable stylesheets that are expressed in a standard fashion. The content and formatting of a document should be a single entity, even if there are multiple formats available to the user. While cascading style sheets may give us this ability on the most primitive level, they are not enough. Consider the ways in which several deliverable versions of the same document relate to each other: very often, an element displayed in one view is hidden in another. This is a simple matter of having a "show/hide" switch for each element. This is not, however, enough: a print document delivered to the Web will not only look different, it will be organized differently, with some elements re-ordered, others producing different functionality (cross references, for example), etc. The ability to re-order information for the express purpose of authoring single-source/multiple-output documents in a WYSIWYG fashion is critical. Authors will need to see the appearance of their documents in all of their multiple incarnations, so that they can edit and write accordingly. This functionality does not currently exist in any editing tool that I have seen, but the idea that this kind of transformation needs to be part of the same stylesheet that carries the formatting is present in DSSSL. 
DSSSL offers us the key to capturing a full set of the needed information in a standard fashion, and there is an important implication here for developers: the logic used in DSSSL to describe document formatting is something that can be rendered in C++ in the creation of application-specific binary internal representations of those documents. Even a fairly constricting API such as the Microsoft Foundation Classes allows each application to determine what a "document" will look like for that particular application. If the internal logic of DSSSL is followed, then deriving a DSSSL spec on export (and using it to drive document views) becomes a much simpler problem.
Structure On-the-Fly (and other problems) with XML: One potential ramification of this capability has already come up in the discussions surrounding XML: will DTDs need to be created at the time of authoring to allow for the re-ordering that will be required? (An interesting application with great potential in this area is OCLC's "Fred".) To be XML-compatible, it is suggested that an author have the ability to choose an existing DTD (as they currently choose "templates" in word processors) to author in, or that they have the ability to generate their own structure, with a "Fred"-style DTD generated on output to allow for the document to be parsed. (In the latter scenario, they would also be responsible for generating formatting information to accompany their "customized" element structures.) XML presents us with a number of scenarios that are probably necessary in the great scheme of things, but that are upsetting to SGML traditionalists: documents will only be parsed to the extent that they need to be parsed for the purposes of the application using them. Hand-in-hand with this capability comes the looming spectre of a host of new processing instructions that are application-specific.
The requirement exists, however, that the ideal editing tool must handle XML, and do intelligent placement of DTD declarations in the internal declaration subset, etc., to be fully XML-compatible. Similarly, it will need to generate and handle the PIs that will be encountered. Ultimately, the DTD that travels with a document should become an inseparable part of it: such things as notations could be profitably manipulated by applications that automatically filter graphics formats for display, for example. Unless the application has access to the DTD whenever it is needed, it will not be possible to make the minor accommodations that may be required to support a specific feature. (See "Configurability," below.)
Conditionality: One of the most powerful features of SGML is that it allows a single document to contain different-but-related information, such that the required version for any specific output is used when appropriate. A good example of this exists in writing technical documentation for different platforms, or versions of a product: the Unix version of the documentation is 80% the same as the Windows 95 version, but the 20% of difference is important. A good SGML editor is required to support this kind of conditionality, whether indicated in the SGML as marked sections, attribute values, or distinct elements. This is largely a formatting issue: "show/hide" should take care of it, but the requirement stands that even if I can't see it in a particular view, the information remains a part of the document.
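The "show/hide" treatment of conditional content can be sketched briefly. What follows is only an illustration under stated assumptions: the `platform` attribute, the sample document, and the `render_view` function are hypothetical, and Python's `xml.etree.ElementTree` stands in for a real SGML parser (so the sample uses XML-style markup). The point it tries to show is that a view is derived from the document, while the hidden material stays in the source.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical sample: the "platform" attribute marks conditional content.
SOURCE = """<doc>
  <para>Install the product.</para>
  <para platform="unix">Run ./install.sh as root.</para>
  <para platform="win95">Double-click SETUP.EXE.</para>
</doc>"""


def render_view(root, platform):
    """Return a copy of the tree with content for other platforms hidden.

    The source tree is never modified, so hidden content remains
    part of the document.
    """
    view = copy.deepcopy(root)
    # Snapshot the node list first, because children are removed as we go.
    for parent in list(view.iter()):
        for child in list(parent):
            condition = child.get("platform")
            if condition is not None and condition != platform:
                parent.remove(child)
    return view


source = ET.fromstring(SOURCE)
unix_view = render_view(source, "unix")
print([para.text for para in unix_view.iter("para")])
print(len(source.findall("para")))  # the source still holds every version
```

Marked sections or distinct elements would need a different test than the attribute comparison used here, but the separation between the stored document and the rendered view would presumably stay the same.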
    4. CONFIGURABILITY Another major requirement of SGML editors is that they be able to handle arbitrary DTDs with a minimum of effort to configure. Existing tools that handle arbitrary DTDs are all less-than-ideal in this respect: they require too much effort on the part of an SGML "expert" (or an expert in the specific application) to accommodate new DTDs, or revisions of existing DTDs. At the very least, the tool should contain a simple point-and-click GUI interface for configuring new DTDs on import. Configuration would include default formatting (if not specified in a style sheet), and element and attribute behaviors/functionality (to identify graphics, tables, links, conditionality, etc.). The "configuration file" would ideally be in some standard format that could be used by other applications as well. While this necessity has been anticipated for format and transformation in DSSSL, and for linking in HyTime, the idea that functionality (as well as appearance) is tied to structure in a meaningful way has not yet been fully explored. It is suggested that a standard be developed (if only a standard DTD), to express the functionality that is important to those people who are creating editing applications and other SGML software. For performance reasons, such a standard "configuration file" might need to be written in a binary format (Java's byte-stream code springs to mind...)
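As a rough illustration of what such a configuration might capture, the sketch below maps element names to generic on-screen roles. Everything in it is an assumption made for the example: the role names, the `ElementConfig` record, the element names, and the helper functions are hypothetical, and no standard format for this information is implied.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Role(Enum):
    """Generic on-screen roles an element can play (names are assumptions)."""
    PARAGRAPH = "paragraph"
    TABLE = "table"
    FIGURE = "figure"
    INLINE = "inline"
    LINK = "link"


@dataclass(frozen=True)
class ElementConfig:
    role: Role
    # Attribute that drives show/hide conditionality, if the element has one.
    condition_attr: Optional[str] = None


# Hypothetical configuration for an imaginary DTD; element names are illustrative.
CONFIG = {
    "para": ElementConfig(Role.PARAGRAPH, condition_attr="platform"),
    "item": ElementConfig(Role.PARAGRAPH),
    "table": ElementConfig(Role.TABLE),
    "graphic": ElementConfig(Role.FIGURE),
    "emph": ElementConfig(Role.INLINE),
    "xref": ElementConfig(Role.LINK),
}


def role_of(config, element_name, default=Role.INLINE):
    """Look up the role of an element, falling back to a default role."""
    entry = config.get(element_name)
    return entry.role if entry else default


def choices_for(config, role):
    """List the elements sharing a role, e.g. to fill a pick list in the GUI."""
    return sorted(name for name, entry in config.items() if entry.role is role)


print(choices_for(CONFIG, Role.PARAGRAPH))
print(role_of(CONFIG, "unknown-element"))
```

An editor built this way could drive its interface from the role rather than from the element name, which is what would let one authoring convention serve arbitrary DTDs.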
    5. LINKING The proprietary ability of today's word-processors and HTML editors to do cross-references and links is something that can be bettered in an SGML editor; particularly with SGML Open catalogs, and the linking techniques offered us through HyTime and TEI, there is no reason why the links that have traditionally been expressed as hard-coded local pathnames cannot take advantage of indirection to become more portable. Why shouldn't my editor simply ask for the local location of a specific entity (as indicated by its public identifier) the first time it needs to use it? (Interesting applications to look at are HyBrowse and HyMinder; the upcoming crop of XML demos should also provide some interesting functionality.) At the same time, HTML editors such as PageMill offer a good example of how GUI behavior can make fairly complex linking simple to use - it is fairly straightforward to create a client-side image map, for example, using point-and-click techniques. (Some of today's structured editors would require the insertion of a number of elements, and then the entry of pixel coordinates and link targets in a fairly non-intuitive "specify attribute" dialog for those elements.) The problem here is that each DTD may handle linking behavior in different ways - the "standard" link behaviors can be made configurable, however. What is needed is a general industry-wide consensus about what the major linking behaviors are, perhaps through the offices of SGML Open. (With XML, TEI, and HyTime all addressing this issue, surely a cataloguing of major linking strategies is possible...) Ultimately, authors should be able to use a familiar mechanism (a "file open" dialog box) to indicate sophisticated links expressed in an indirect fashion; a simple algorithm could allow every resource on my computer to have a public identifier, generated when needed if not already in existence.
On export, the editor should produce a document-specific SGML Open catalog file, so that other applications could determine in advance what entities would be required for processing that document. Further, the ability to identify specific linking behavior and the purpose for a link are things that can be expressed in a standard way through HyTime - my editing tool should know enough to generate this syntax, and to prompt me for what it does not already know through configuration. Further, the capability should exist for a document management system to supply my editing application with resources that exist elsewhere on a network, or even on the Web. This might require some integration of the tool with the system of which it was a part, but the support needs to be present in the editor's API. If the existence of these resources is something that can be verified, then my editor can also help my authors by performing link validation routines when documents are opened, or links are inserted, and drawing attention, in simple language, to any links that are broken. Again, a standard for interaction between structured documents and databases would need to be implemented - database standards could be easily leveraged here: ODBC, SQL, etc. (For example, my editor sends out an SQL query to the database indicated in a configuration file, according to parameters established in the same configuration file, to verify the existence of a particular link target. The prevalence of Formal Public Identifiers would make the implementation of this kind of automatic validation possible; the functionality for checking specific target IDs once the target entity was identified is something that would need to be implemented on the database side.)
Supporting Files - Indexes, Navigation Bars, Annotations, etc: Another powerful repercussion of having sufficient linking is that I can automatically render virtual documents made up of only links to support my primary document: indexes, TOCs, linked annotation files - all these could easily be generated. The capabilities of existing word-processors should at least be equalled, and in a fashion that is itself portable. (In a standard SGML format, perhaps? Again, Panorama Pro shows us how this kind of support can be implemented.)
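The document-specific catalog suggested under LINKING can be sketched as follows. This is a minimal illustration, assuming only the simplest catalog entry form, `PUBLIC "public id" "system id"`; real catalogs allow other entry types that are ignored here. The public identifiers, file paths, and function names are all invented for the example.

```python
import re

# Matches only entries of the form: PUBLIC "public id" "system id"
PUBLIC_ENTRY = re.compile(r'^\s*PUBLIC\s+"([^"]+)"\s+"([^"]+)"\s*$', re.MULTILINE)


def parse_catalog(text):
    """Map each public identifier in the catalog text to its system identifier."""
    return dict(PUBLIC_ENTRY.findall(text))


def document_catalog(used_public_ids, site_catalog):
    """Split a document's public identifiers into resolved and unresolved.

    Returns (entries for the document's own catalog, identifiers the
    editor would have to ask the author about).
    """
    resolved = {}
    missing = []
    for public_id in used_public_ids:
        if public_id in site_catalog:
            resolved[public_id] = site_catalog[public_id]
        else:
            missing.append(public_id)
    return resolved, missing


def write_catalog(entries):
    """Serialize entries back into catalog syntax, one PUBLIC entry per line."""
    return "".join(f'PUBLIC "{p}" "{s}"\n' for p, s in sorted(entries.items()))


# Hypothetical site-wide catalog; identifiers and paths are made up.
SITE_CATALOG = '''
PUBLIC "-//Example//DTD Manual//EN" "dtds/manual.dtd"
PUBLIC "-//Example//ENTITIES Common Text//EN" "shared/common.ent"
'''

site = parse_catalog(SITE_CATALOG)
resolved, missing = document_catalog(
    ["-//Example//DTD Manual//EN", "-//Example//NOTATION Logo//EN"], site
)
print(write_catalog(resolved), end="")
print("unresolved:", missing)
```

The unresolved list is where an editor could fall back on the "file open" dialog, recording the author's answer as a new catalog entry rather than as a hard-coded path in the document.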
     
     

    OTHER AUTHORING PARADIGMS

      Having discussed the creation of a WYSIWYG SGML editor, the other "authoring" paradigms seem simple. Obviously, much of the functionality of the first application would be useful elsewhere, so the remainder of this section addresses primarily the differences.
     Layout Tools:
      Existing layout applications such as Quark and PageMaker, and to some extent, FrameMaker, have proven that the abstraction necessary to do good page layout can be well implemented. The ideal SGML layout tool would require support for style-sheets and document views in much the same way as our WYSIWYG editor has done. Similarly, it would be nice to have the same linking capability. Once the "data objects" that the application handles are understood to be SGML, however, then the tool itself becomes almost identical to those we see today: Quark, PageMaker, NetFusion, etc.
      There is only one area in which layout tools become extremely problematic for use with SGML: good page layout (for print or the Web) requires a fair amount of case-by-case "tweaking," which cannot be readily captured in an SGML format (not even in a DSSSL stylesheet). We need to be able to refine the layout and edit the content of our documents, and be able to pass a full set of information along in a standard fashion.
      The solution to this problem lies in the linking capabilities of technologies such as HyTime. If I create a "virtual document" that is merely a set of pointers to content objects, then these pointers can live in a separate SGML document that does nothing but capture formatting information, with attributes and elements that describe each "tweak" on a case-by-case basis, and a DSSSL stylesheet that describes the general cases.
      One implication of using this kind of solution is that support for SUBDOC would be almost mandatory; I would naturally want to be able to point to pieces of different documents that used different DTDs, and be able to assemble them in a single set of pages, with formatting described in a single DSSSL specification, etc. A premium is also placed on link management in this scenario - I need to have absolute control over the relationships between my documents, because the slightest revision could be disastrous when I am using "treeloc" addressing to include content objects by reference in the "laid-out" version of my print document. The issues here are similar to those discussed above, however.
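The fragility of positional addressing is easy to demonstrate. The sketch below uses a simplified, element-only reading of a treeloc-style address (a list of 1-based positions, one per level of the tree, beginning with the root); it does not reproduce the node-counting rules of the HyTime standard, and the `resolve_treeloc` function and sample document are hypothetical.

```python
import xml.etree.ElementTree as ET


def resolve_treeloc(root, marklist):
    """Resolve a simplified treeloc-style address against an element tree.

    marklist holds 1-based positions, one per tree level; the first entry
    addresses the root and is therefore always 1.
    """
    if not marklist or marklist[0] != 1:
        raise ValueError("address must start at the root (position 1)")
    node = root
    for position in marklist[1:]:
        children = list(node)
        if not 1 <= position <= len(children):
            raise LookupError(f"no child {position} under <{node.tag}>")
        node = children[position - 1]
    return node


doc = ET.fromstring(
    "<chapter><title>Setup</title><para>First.</para><para>Second.</para></chapter>"
)
address = [1, 3]  # root, then its third child

before = resolve_treeloc(doc, address).text
doc.insert(1, ET.Element("note"))  # a small revision to the target document
after = resolve_treeloc(doc, address).text

print(before, "->", after)
```

The same address silently selects different content after a one-element revision, which is why link management (or addressing by ID where the DTD permits it) matters so much in this scenario.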
     Editors for Engineers and Authors Who Like "Code"
     We have two authoring paradigms remaining, both of which involve more-or-less direct access to the syntax of the "code" we are creating. Oddly enough, the basic feature required by these paradigms is available in many relatively unstructured HTML tools: in Netscape Navigator, for example, you can take a peek at color-coded "raw" HTML using a simple menu pick; PageMill likewise gives you access to the "raw" HTML code at the click of a button.
      A similar feature could be built into my WYSIWYG editor: at the click of a button, I can see the "raw" SGML that I am creating, and I can author from there. Such "supporting" documents as DTDs and DSSSL specs could also be available to view or edit as appropriate, with a similar "raw code" approach. While most users would not use these functions, the ones who need them would have them.
     The specific behavior of the SGML "code-view" would be drawn from another source, however: such tools as Microsoft's Visual Basic and Visual C++ provide engineers with text editors that not only color-code the content according to its function, but pop up message boxes that tell them when they've created malformed statements.
      For editing "raw" SGML, most of the features I want would involve an on-the-fly parsing of the document, at the time I write the "code", expressed in the same basic fashion: through color-coding and message boxes. (The parsing could be triggered by the entry of a hard return, for example, as in the Visual Basic editor.) Such syntax-checking would need to be configurable: if I get a message box every time I write an invalid IDREF attribute, I want to be able to turn the silly message box off! Examples of these features would include automatic checking of all markup for validity against the DTD: syntax checking of elements, data-type checking for attributes, validation of ID/IDREF constructions, verification of entities and references to them, etc. The features of SGML that would lend themselves to such a code-editor are fairly obvious.
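      A minimal Python sketch of such configurable, line-at-a-time checking follows. It makes several simplifying assumptions that a real tool would not: the attributes are literally named "id" and "idref" (a real editor would take the declared types from the DTD), values are always quoted, and the check names "duplicate-id" and "dangling-idref" are invented for the example.

```python
import re

# Assumption: attributes are literally named id / idref, with quoted values.
ID_ATTR = re.compile(r'\bid\s*=\s*"([^"]*)"', re.IGNORECASE)
IDREF_ATTR = re.compile(r'\bidref\s*=\s*"([^"]*)"', re.IGNORECASE)


def check_line(line, known_ids, enabled_checks):
    """Check one freshly typed line; return the messages to show the user.

    known_ids is updated in place with any IDs declared on this line.
    enabled_checks is the set of checks the user has left switched on.
    """
    messages = []
    for new_id in ID_ATTR.findall(line):
        if new_id in known_ids and "duplicate-id" in enabled_checks:
            messages.append(f"duplicate ID: {new_id}")
        known_ids.add(new_id)
    if "dangling-idref" in enabled_checks:
        for ref in IDREF_ATTR.findall(line):
            if ref not in known_ids:
                messages.append(f"unresolved IDREF: {ref}")
    return messages


ids = set()
all_checks = {"duplicate-id", "dangling-idref"}
print(check_line('<section id="intro">', ids, all_checks))
print(check_line('<xref idref="later">', ids, all_checks))
# The same line with the IDREF check switched off by the user:
print(check_line('<xref idref="later">', ids, {"duplicate-id"}))
```

      The second call shows why the switch matters: a reference to an ID that has not been typed yet is perfectly legal, so a check that fires on every hard return will complain about work in progress unless it can be turned off.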
     General Note on Implementation: Architectural Forms
      It is obvious that much of the requested functionality will rely on existing standards for implementation: DSSSL and HyTime are the most prevalent; XML is likely to produce standard or quasi-standard technologies that will also enable the implementation of many of these features. It should be noted that the use of architectural forms in HyTime and DSSSL is not at all accidental: when building tools that are required to handle arbitrary DTDs, an architecture is perhaps the easiest way to associate values that the application understands with element names that are changeable. As you are probably aware, James Clark has produced a parser that can associate a DTD with an architecture defined in a file external to that DTD. This technique should be examined closely, as it provides a powerful way of allowing an application to "recognize" a DTD in a meaningful way, and the creation of the external "architecture" files can be implemented relatively easily through a point-and-click GUI.
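      The idea can be sketched in a few lines of Python. This is not the file format or interface of any actual parser: the form names ("link", "division", "block"), the two mapping tables, and the handler functions are all hypothetical, and the tables stand in for the external "architecture" files described above.

```python
# Behavior the application implements once, keyed by architectural form.
HANDLERS = {
    "link": lambda name: f"<{name}>: offer link traversal",
    "division": lambda name: f"<{name}>: show in outline view",
    "block": lambda name: f"<{name}>: format as a text block",
}

# One external mapping per DTD: element type name -> architectural form.
# In the scenario above, these would be built through a point-and-click GUI.
MANUAL_DTD_MAP = {"xref": "link", "chapter": "division", "p": "block"}
REPORT_DTD_MAP = {"pointer": "link", "sect": "division", "para": "block"}


def handle(element_name, arch_map):
    """Dispatch on the architectural form, not on the element name."""
    form = arch_map.get(element_name)
    if form is None:
        return f"<{element_name}>: no form assigned, treated generically"
    return HANDLERS[form](element_name)


print(handle("xref", MANUAL_DTD_MAP))
print(handle("pointer", REPORT_DTD_MAP))
print(handle("footnote", REPORT_DTD_MAP))
```

      The application code never mentions "xref" or "pointer"; supporting a new DTD means supplying a new mapping, not new software.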
     
     

    THE LONG-TERM: TRUE INTERACTIVE DOCUMENTS

      There is much discussion these days about the creation of computers that are nothing but a way to access the internet: machines that consist of a few megs of RAM, a Java interpreter, and enough hardware to let you use your mouse, keyboard, and monitor. The Web browser would be provided as a "desktop," with capabilities for ftp and e-mail and similar internet-related applications.
      In a world where such computers are truly useful, documents exist as nothing but chunks of data, with associated "application" functionality being provided as part of the document itself. This is not a revolutionary vision!
      Ultimately, there should be no such thing as an "editing application" - the objects available on the internet should come with functionality associated with them according to their structure, and to the user's stated intent when the document is downloaded - if I want editing functionality, it should be available to me; if I want to browse, that functionality is all I get; if I want to print the document, I'll have to wait while the Java classes that let my computer send things to the printer come to me through my ISDN line...
      In XML-related discussions, this vision of the future has come up, and there have been some remarkable attempts to create proofs-of-concept. Ultimately, the problems that surround "interactive documents" that do not require the intermediation provided by specific software applications will be solved. Chances are, XML will point the way: "just-in-time" Java applications will accompany the documents they allow you to view, edit, and generally use. Whether this will become the dominant computing paradigm is another question, but for some types of information, this model seems to make a great deal of sense, and will no doubt be realized.
      One thing is certain, however: without structured information, and the associations that it allows us to make between data of all types and the functionality that needs to accompany it, this vision of the future is empty. XML represents a huge step forward in realizing this vision, but it also shows us something else: the lessons that so many of us have learned in implementing SGML are very likely going to be learned again, the hard way, by newcomers to the world of structured information.
     

    SUMMARY

      While an SGML editor cannot do everything for the user, the real world demands that this class of applications be significantly improved, both in terms of usability and functionality. This paper's message is simple: the technology exists to provide these improvements! You have probably realized that this paper proposes nothing that has not already been said elsewhere - we are starting to see proof that SGML editing technology can be made easy to use; with the advent of XML, we are also likely to see sufficient resources dedicated to the development of these tools. Hopefully, this paper has provided an overview of how such applications might be designed, and established a general set of requirements for them, sufficient, at least, for the purposes of discussion. (Anyone interested in this topic is encouraged to contact me with questions and/or comments at: atg@passage.com)
