
 Arofan T. Gregory
 Lead Scientist
Commerce One
 2440 W. El Camino Real Mountain View (California)
 Biography
 Arofan Gregory worked for seven years in commercial book and journal publishing before becoming an SGML consultant at Passage Systems. After working as a consultant and manager of consulting there, he started his own consultancy, Aeon, also specializing in practical SGML implementation. Before coming to Commerce One, he worked for Documentum, Inc., as their Practice Leader for Technical Publications, with a focus on SGML and XML systems for repository-based Internet publishing. He has presented at major SGML and XML conferences, and has recently served as an editor for the CommerceNet eCo Semantics Specification for XML business documents.
 
Jeff Quiggle
 Consulting Manager
Documentum, Inc.
 6801 Koll Center Parkway Pleasanton (California)
 Biography
 Jeff Quiggle has worked in Web technology for five years as an HTML developer and Web Master, developing and managing several websites for the US Government, military, and corporations. He has worked as a developer and project manager for Documentum's Web Content Management Practice, and recently has focused on XML applications for extranet business to business exchange and content management for e-commerce solutions. This is his first presentation to an XML conference.
 

Introduction and Background

 Traditionally, SGML implementations, while having excellent return on investment over time, have made the lives of content authors more complex: new information was demanded of them, new tools needed to be mastered, and a level of abstraction was introduced into their lives that had not existed before.
 XML is syntactically much like SGML, and from the viewpoint of content authors, the differences are minimal. What is different with XML - as is becoming increasingly evident - is the implementation of the technology. SGML often came with a "complexity-equals-power" philosophy, resulting in long implementation times, complex, heavily customized systems, and heavily revised budgets. One measure of SGML's complexity is that HTML, SGML's more widely used offspring, is much simpler by comparison.
 This traditional view of SGML is no longer accurate in many ways. In the first place, despite improvements in HTML authoring tools, the HTML tag set has grown to the point where HTML is no longer simple but increasingly complex. Because XML allows a more structured approach to implementation, it is possible to write an XML DTD that makes good HTML easier to produce than it is with typical HTML authoring tools.
 This case study examines how an XML application that produces better HTML, built on XML and document management technologies, can be completed in a relatively short period of time and on a relatively small budget. Although the solution described is not simple, it is fairly simple in conception, and it illustrates the component pieces of a successful publishing implementation that abandoned the traditional SGML approach in favor of a more limited and practical "XML" approach to produce an immediate return on investment.
 The problems addressed are not just technical in nature: while there were technology problems, the basic match between human process and enabling technology provides the full solution, just as has traditionally been the case with more complex SGML applications.
 

The Problem Space

 The Customer Support Documentation concerned in this case study is a set of small HTML articles: FAQs, configuration notes, and similar documents written to reduce call volume to support personnel by giving end users the technical information they need to solve their problems on their own. The documents were written by technical support staff, not dedicated writers, and there were many problems with developing appropriate content and publishing it to the website effectively. Given how expensive customer support generally is, this publishing effort represented a simple, effective way to reduce costs. From a publishing perspective, however, the situation was far from ideal: there was little to encourage overworked support staff to spend time writing documents for the website in a culture centered on providing phone support.
 It should be noted that the client was already using document management technology, both for managing content and for maintaining published content on the website. What was missing was a structured process for developing suitable content, reviewing it for accuracy, transforming it for the web, and delivering it to the website. The client was, if anything, ahead of the curve in terms of where many similar organizations are today in their use of content management systems.
 

Content Authoring

 In this section, we will briefly describe the problems found on the content authoring side.
 

Authoring and Review Cycle

 The authoring and review cycle was not managed; the authors would write a document based on the types of calls they experienced most often while supporting a particular product, and would e-mail the file to another writer in the group for comment. The documents would go through an uncontrolled and frequently prolonged series of exchanges, in which each participant would re-write the document before returning it to the other. Documents would go back and forth as many as a dozen times, often wasting effort over issues not related to content but to writing style. Frequently, managers were dragged into disputes over differences of opinion between the writers, representing a further drain of resources and delay in getting information to the end users.
 

Unsophisticated Authors

 Another distinctive aspect of this implementation was the authoring culture: customer support staff are not generally hired for their writing skills, and the quality of the writing in these documents varied widely. In some instances, professional technical writers from the engineering publications departments were brought in to exercise some level of editorial control, but this did not provide a complete answer. Further, these authors were generally unsophisticated in their use of software tools: they typically used Microsoft Word, and sometimes HTML editing tools, depending on the knowledge of the individual writer. No assumptions could be made about knowledge or training in this area, however, because content creation and editing was a secondary occupation for these writers.
 

No Link Between Source and HTML Documents

 There was no link between the reviewed form of a document - generally Word - and the eventual HTML document, which might or might not match the original source exactly. Both versions would be checked into a repository when complete, but subsequent changes could be made to either version of the document in an uncontrolled fashion. For example, the HTML versions of longer documents were generally edited to include a hyperlinked index to the main headings. Changing the Word version of a document therefore required a second step of updating the HTML version as well.
 

No Way to Collect Metadata

 Because the documents were imported into the repository by a few support staff as a step separate from the authoring process, metadata was added to the documents haphazardly. Often the support staff ended up working long hours to capture metadata that they were not always sure of. Insights the authors had about the documents they created were not necessarily passed along to the document management support staff, so effort was wasted rediscovering this information, which was vital because it was used to automatically generate index pages to the documents on the external website. The relationships between separate documents - used to drive organization and linking - were also recaptured in this way, at great effort and not always accurately.
 

No Version Tracking

 The document management repository employed a sophisticated versioning system, recognizing major and minor versions of each document, as well as recording stated correlations between the source and HTML forms of documents (although without maintaining any automated relationship by which these could be verified). The versioning scheme, as it turned out, was far too sophisticated, baffling the relatively unsophisticated pool of authors and resulting in a de facto lack of version control.
 

Web Publishing and Maintenance

 A set of related issues existed with regard to the presentation and maintenance of the published material on the website. The Technical Support Website was already managed through a Documentum-based application, but this management process was separate from the content development process and in fact was maintained by a different group within the client organization.
 

Poor HTML

 Most of the HTML published to the Website was produced by using the "save as HTML" feature of Microsoft Word, and the result was haphazard at best. Those authors who knew HTML would often perform some degree of clean-up by hand, but the overall appearance of the website lacked consistency in both structure and format. This was through no fault of the authors, who lacked appropriate tools, motivation, and effective training to produce high-quality HTML. As is noted above, document publishing was a secondary activity for the technical support staff.
 

Unreliable Cross-Referencing

 Another major problem was that the links between documents, which are very useful when providing this kind of information about a related set of products, could not be managed effectively. There was no easy way for the authors to predict what other HTML documents would be named when they were published, or where they would live on the website. Nor was there any mechanism for ensuring that, as documents were updated along with the products they referred to, other references to those files would not break - especially since there was no guarantee that an updated file would have the same name as the file it replaced. This was a problem from both the authoring and the maintenance side.
 

Website Maintenance Nightmare

 A serious problem was that of maintaining the website content in general. Changes such as the renaming of a product could affect many of the documents on the website, with no easy way to update the content. Changes in product status would dictate that some documents be retired, breaking links that could only be repaired manually. While there was some standard auto-generated inter-linking, driven by document metadata stored in the repository, there was no way to ensure that content-based links did not break, and maintenance was manual. The publishing process, too, did not regenerate the full set of content at fixed intervals; instead, incremental additions were made to the website. There was no systematic quality assurance in the publication process.
 

Historical Tools and Process

 Briefly, we should examine the tools that had been chosen to support this system, and the way in which they were deployed.
 

Authoring Tools

 The authoring tools have been described above: Word was used for generating content, with FrontPage, text editors, or other authoring tools used to do "clean-up" on the HTML when produced, by those who had the time and knowledge. There was some general standardization of document types and of typical outlines for each of these types, but there was no rigorous use of the template features of Word sufficient to drive automated HTML creation.
 

Document Management

 The document management system was Documentum's EDMS, used essentially without customization. Both client applications and the web interface were used, but because these were unmodified for an audience that had not been trained in the use of the tool, the authoring population mostly avoided the repository. The repository support staff ended up doing everything from simple check-in and check-out to the input of metadata, as described above. Consequently, the repository, while having extensive workflow features, was used only to store finished content.
 

Web Content Delivery

 Publishing was also done with a Documentum repository mounted outside the firewall. The standard process here was to use replication to reproduce the repository, which was set up to deliver the HTML views of the documents it contained. Metadata on the documents controlled where the documents appeared within the website and whether they could be seen by visitors to the website. As noted above, the effort of entering metadata to support a publishing cycle was often massive, because a few people were responsible for entering metadata on documents they were unfamiliar with. They literally worked long weekends adding metadata to repository documents before a wholesale update of the website.
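 How metadata alone can drive placement and visibility is easy to sketch. The example below assumes each document carries hypothetical title, section, and published fields (the paper does not name the actual repository attributes) and treats an index page as nothing more than a filtered, grouped view of that metadata.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class DocMeta:
    # Hypothetical metadata fields; the paper does not name the real repository attributes.
    title: str
    section: str      # where on the website the document appears
    published: bool   # whether site visitors may see it


def build_index(docs):
    """Group visible documents by website section, sorted by title within each section."""
    index = defaultdict(list)
    for doc in docs:
        if doc.published:
            index[doc.section].append(doc.title)
    return {section: sorted(titles) for section, titles in sorted(index.items())}
```

 A document with missing or wrong metadata silently lands in the wrong section or disappears from the index, which is why metadata quality mattered so much to the published site.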
 A good deal of effort had gone into thinking through the design of the website and the basic framework in which the content was displayed. The content authoring implementation was much simplified in that it did not involve a front-end redesign: it was simply a matter of automating the publishing process that fed the website, and making the content itself, rather than the basic organization and presentation framework, more consistent and of higher quality.
 One downside of this publishing system contributed significantly to the non-use of the internal repository by the writers, however: because a straight replication was required to publish to the website, the internal repository had to be organized in exactly the same way as the external one. This was fine for those coming to the website, but was very poor organization for those creating the content. The basic problem was that the internal authors saw everything in the repository, rather than just that small subset of documents that they cared about for linking, authoring, and editing purposes.
 

System Design and Improved Process

 Consultants were brought in to help with the system redesign, intended to address the problems noted above. This was a standard consulting engagement, although the deadlines were quite tight. The client program manager had never considered using XML or SGML to aid in this process - the idea came from the consultants when they were brought in to propose a solution. This was not a pre-considered "SGML implementation," which was one of the factors that contributed to its success - XML was to be used as a purely enabling technology, resting on its own merits, and not on the hype surrounding it in technology magazines.
 The solution design was intended to solve several of the more difficult problems noted above by simplifying the process for the authors and reviewers, automating transformations and workflows, and automating the publishing process. At the same time, and because of the difficulties noted above, the client manager was willing to consider any reasonable solution, and was not afraid of "cutting edge" technology. This willingness to judge a technology solution on its apparent strengths was another key factor in the eventual solution. In keeping with this attitude, demos using FrameMaker+SGML and Documentum were prepared by the consultants, to show that the promised functionality could indeed be delivered.
 

Implementation Plan

 The basic plan was created by conducting a series of group interviews with users, management, and support personnel. General dissatisfaction with the existing system made for a high degree of frankness from all parties, so that authors felt free to criticize the system in front of their managers. Requirements gathering was a quick process, taking only two weeks before a draft of the basic design was in place. The DTD itself, as can be seen below, was extremely simple, and was created after two days of document analysis based on supplied published HTML and corresponding source content, and a day of meetings with the authors and designers.
 In part, the brevity of requirements gathering was enforced by the fact that the authors were not available for longer periods of time. Since their primary responsibility was manning the phones, they were not available for lengthy document analysis meetings. Once the basic design was in place, it was reviewed with the project manager, and then was reviewed by a small group of authors and support staff. Changes were made, and implementation proceeded. The basic process was iterative development in which the consultants proposed a system, and changes were requested. Additionally, the consultants held several controlled pilot testing sessions with groups of authors and reviewers, which enabled the end users to provide immediate usability feedback. This iterative rapid application design process made optimal use of the limited time available with the users, support staff, and management.
 

System Design

 The system itself was directly aimed at addressing the problems presented above - the end goal was kept firmly in view, so that there was neither "cool technology for coolness' sake," nor project "scope creep." The general process flow is depicted in the figure below. The application essentially uses a series of repositories, each supporting a specific aspect of the application: processing, workflow, and transformations in the first repository; presentation and staging in the second; and web delivery in the final repository. Documents are moved from the first repository through the second and third via automated replication. This model separates the content authors from the web presentation process: they only have to worry about creating useful and accurate content, while a webmaster is concerned with the templates that control web presentation. The end user experience is a combination of meaningful content and consistent presentation.
 
 

Content Authoring and Review

 Authoring was originally going to be done in FrameMaker+SGML. The DTD was written as part of the initial design, although it was modified during implementation as needed. The original document format was intended to be SGML - changes here are discussed more fully below. To create a document, the users would go into a web-based interface that showed them only their own documents that were currently in-process. To create a new document, the author simply filled out a form by choosing values from picklists indicating the type of document and the basic metadata, and a blank template was launched in the authoring tool. The document would be stored in the repository until it was ready for review.
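 Launching a blank template from the form's picklist values can be sketched with Python's standard xml.etree.ElementTree. The root element name doc, its type, product, and author attributes, the title and para children, and the document-type values are all hypothetical; the project's actual DTD is not reproduced in this paper.

```python
import xml.etree.ElementTree as ET

# Hypothetical picklist values; the real document types are not listed in the paper.
DOC_TYPES = {"faq", "confignote"}


def blank_template(doc_type, product, author):
    """Return a skeleton document pre-filled with the metadata chosen on the form."""
    if doc_type not in DOC_TYPES:
        raise ValueError(f"unknown document type: {doc_type}")
    root = ET.Element("doc", {"type": doc_type, "product": product, "author": author})
    ET.SubElement(root, "title")  # empty slots the author fills in
    ET.SubElement(root, "para")
    return ET.tostring(root, encoding="unicode")
```

 Restricting the form to picklists means the metadata is valid by construction, rather than being reconstructed later by support staff.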
 To launch a review, the author simply chose the document and clicked on a "review" button. The workflow features of the repository were automatically triggered, performing a transformation of the SGML into a "review" HTML page, which was then forwarded to the reviewer automatically. The "review" HTML interlaced a series of text input boxes between the contents of the document, allowing the reviewer to make comments, but not allowing them to directly edit the content (see figure below).
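 The interlacing step can be pictured as a transformation that copies each block of the source into the page as read-only text and follows it with a comment field. This sketch assumes the source has already been parsed into a list of (block_id, text) pairs; the form action URL and field names are invented for illustration, and the project's own transformation was written in Perl.

```python
from html import escape


def review_page(doc_id, blocks):
    """Render read-only content with a comment box after each block.

    Reviewers see the text but can only type into the comment fields,
    so they cannot rewrite the document itself. The action URL and the
    comment_<id> field names are hypothetical.
    """
    parts = [f'<form method="post" action="/review/{escape(doc_id)}">']
    for block_id, text in blocks:
        parts.append(f"<p>{escape(text)}</p>")
        parts.append(f'<textarea name="comment_{escape(block_id)}" rows="3" cols="60"></textarea>')
    parts.append('<input type="submit" value="Submit review">')
    parts.append("</form>")
    return "\n".join(parts)
```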
 
 The review form solved part of the problem with the former review process: the reviewers could not rewrite the original document, but were only allowed to comment. Once the review was complete, the reviewer would press a button at the bottom of the review form, and it would be filtered into an "annotated" HTML view of the same document, which went back to the author (see figure below).
 
 The "annotated" view of the document had the reviewer's comments embedded in the document in a different font and color than the document content. This was automatically attached to the original source document in the repository, so that it remained as an audit trail in case the author and the reviewer had later questions about what corrections were suggested. Again, the author's view into the repository was sufficiently restricted that only the specific author's source documents and their reviewed "annotated" HTML versions were seen by any single writer. Once reviewed, the author could cut-and-paste from the reviewer's comments directly into the authoring tool. The number of reviews was limited to two so as to eliminate extended back-and-forth exchanges.
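 Folding the submitted comments back into the document can be sketched as follows, assuming the comments arrive keyed by hypothetical block identifiers that match the source document's blocks, and using inline CSS to stand in for the "different font and color" described above.

```python
from html import escape


def annotated_view(blocks, comments):
    """Interleave reviewer comments (styled distinctly) with the original content.

    blocks:   list of (block_id, text) pairs from the source document
    comments: dict mapping block_id -> comment text; blank comments are skipped
    """
    parts = []
    for block_id, text in blocks:
        parts.append(f"<p>{escape(text)}</p>")
        note = comments.get(block_id, "").strip()
        if note:
            parts.append(
                '<p style="font-family: monospace; color: #b00000">'
                f"Reviewer: {escape(note)}</p>"
            )
    return "\n".join(parts)
```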
 Inter-document linking was handled by leveraging a neat feature of the Documentum repository: the authors would go to the external website and cut-and-paste the title (not the URL) of the document to be linked to. At publish time, a dynamic query was run against the document repository to supply information about the correct URL and to verify the existence of the linked document for the final version. The authoring interface was made to resemble the published content as closely as possible, and the same effect was achieved in the HTML created for review. The reviewers saw something that looked much like the final published version, which helped in visualizing the final output.
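 Title-based linking amounts to a lookup at publish time. In this sketch a plain dictionary stands in for the dynamic repository query (the real system queried Documentum), and the xref element name is hypothetical. Titles that no longer resolve are reported instead of being published as links.

```python
import xml.etree.ElementTree as ET
from html import escape


def resolve_links(xml_text, title_to_url):
    """Turn title-based cross-references into HTML links.

    title_to_url stands in for the publish-time repository query.
    Returns (links, broken): rendered anchors, and titles that did not resolve.
    """
    root = ET.fromstring(xml_text)
    links, broken = [], []
    for xref in root.iter("xref"):
        title = (xref.text or "").strip()
        url = title_to_url.get(title)
        if url is None:
            broken.append(title)
        else:
            links.append(f'<a href="{escape(url)}">{escape(title)}</a>')
    return links, broken
```

 Because authors cite titles rather than file names, a target document can be renamed or moved without editing the documents that point to it; only the lookup changes.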
 As documents passed through the review process, they had a status automatically set by the workflow system, so that managers could see where each document was in the review process.
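 The automatic status tracking, together with the two-review limit described above, can be modeled as a small state machine. The state names and method names below are hypothetical; the paper does not list the statuses the Documentum workflow actually used.

```python
from enum import Enum


class Status(Enum):
    DRAFT = "draft"
    IN_REVIEW = "in review"
    ANNOTATED = "annotated"   # review returned to the author
    FINAL = "final"


MAX_REVIEWS = 2  # the review limit described in the text


class WorkflowDoc:
    def __init__(self, name):
        self.name = name
        self.status = Status.DRAFT
        self.reviews = 0

    def send_for_review(self):
        if self.status not in (Status.DRAFT, Status.ANNOTATED):
            raise RuntimeError(f"{self.name}: cannot start a review from {self.status.value}")
        if self.reviews >= MAX_REVIEWS:
            raise RuntimeError(f"{self.name}: review limit of {MAX_REVIEWS} reached")
        self.status = Status.IN_REVIEW

    def complete_review(self):
        if self.status is not Status.IN_REVIEW:
            raise RuntimeError(f"{self.name}: no review in progress")
        self.reviews += 1
        self.status = Status.ANNOTATED

    def finalize(self):
        if self.status is Status.IN_REVIEW:
            raise RuntimeError(f"{self.name}: review still in progress")
        self.status = Status.FINAL
```

 A manager's status view is then just the status field of each document, and the limit is enforced by the system rather than by negotiation between writers.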
 

Publishing

 Once completed and reviewed, the SGML document and attached annotation documents would be automatically moved to a separate part of the repository, which was organized exactly like the external website but remained inside the firewall. A dynamic publishing process was run on the documents at this point, so that the "final" HTML versions were made available for quality assurance by the support staff and managers. The entire repository would be "published" every night, with all linking generated automatically from markup inside the SGML document and from the metadata that the authors had supplied when creating the document. Links were maintained by the repository between the "final" source documents and the HTML views that were published outside the firewall. This allowed some neat automation: support staff could query the website for the source document and the name of the writer who created it, so that it was easy to request fixes to the documents. When a visitor to the website clicked on a broken link, an e-mail was sent to those who maintained the site, telling them which HTML page and which source file contained the broken link, and what the missing target was.
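 The broken-link notice needs only three facts: the page, its source file, and the missing target. A sketch using Python's standard email.message.EmailMessage follows; the addresses are placeholders, and the paper does not describe the actual CGI mechanism that sent the message.

```python
from email.message import EmailMessage


def broken_link_report(html_page, source_file, missing_target,
                       sender="webmaster@example.com",
                       recipients=("site-maintainers@example.com",)):
    """Build the notification sent when a visitor follows a broken link.

    Addresses are placeholders; the source_file value would come from the
    repository's link between the published HTML and its source document.
    """
    msg = EmailMessage()
    msg["Subject"] = f"Broken link on {html_page}"
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    msg.set_content(
        f"Page:           {html_page}\n"
        f"Source file:    {source_file}\n"
        f"Missing target: {missing_target}\n"
    )
    return msg
```

 Handing the message to smtplib.SMTP.send_message would complete the loop; naming the source file is what lets maintainers go straight to the document that needs fixing.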
 All creation of HTML was automated from the SGML source document, so that the website achieved a hitherto-impossible degree of consistency and uniformity. Although the DTD itself was quite flexible, it provided the authors with both a blank "template" to start with and only a single tag to achieve any given effect. The result was a high degree of consistency in spite of the fact that the DTD itself was very "forgiving." Parser validation was not used as a way to force authors to conform to a particular document structure - it was felt that the same effect could be achieved without frustrating the authors. At the same time, an invalid document could not be checked into the repository, so the system forced the authors to obey what forgiving rules the DTD did contain.
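 The check-in gate can be sketched as a function that refuses any document that fails to parse or uses an element outside the tag set. Python's standard library does not validate against a DTD, so this sketch swaps in a well-formedness check plus a hypothetical list of allowed elements; the project itself validated with nsgmls against the real DTD.

```python
import xml.etree.ElementTree as ET

# Hypothetical, deliberately "forgiving" tag set: any order, any nesting.
ALLOWED = {"doc", "title", "para", "list", "item", "xref", "graphic", "graphiclink"}


def check_in_errors(xml_text):
    """Return a list of problems; an empty list means the document may be checked in."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as err:
        return [f"not well-formed: {err}"]
    return sorted({f"unknown element: {el.tag}" for el in root.iter() if el.tag not in ALLOWED})
```

 Constraining only the vocabulary, and not the order of elements, mirrors the "forgiving" DTD: authors can arrange content freely but cannot invent formatting tags.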
 

Coding/Implementation

 The coding and implementation of this design was done by a team of 5 consultants, two of them full-time, and the rest half-time. A period of thirty days was allowed for development (see below).
 

Tools

 The tools chosen for implementation were those specific to the applications: Documentum has a strong set of customization tools for its web interface, workflow, and automated renditioning features. The editing tools required their own development, again internal to each package. Transformations were written using UNIX-based scripting languages.
 

Programming Languages

 The basic programming languages involved in creating this application were Perl - used for server-side scripting and transformations - and "DocBasic," the variant of Visual Basic that is used as a customization language inside of Documentum.
 

Parsers

 James Clark's NSGMLS parser was used for all server-side processing and validation. Transformations were based on simple ESIS processing, rather than on DOM or tree-processing techniques. (Standard Perl libraries are available for simple ESIS transformations.) As XML was eventually substituted for SGML, the same parser was used, but in "XML Mode." The built-in parsing capabilities of the authoring tool were used, obviously, for validation of documents while they were being created.
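 ESIS processing is line-oriented, which is what made it convenient for Perl scripting. Each line of nsgmls output begins with a one-character command: "(" opens an element, ")" closes one, "-" carries character data, and "A" lines carry attributes for the next start tag. The sketch below shows that style of transformation in Python rather than the project's Perl; the element-to-HTML mapping is hypothetical, and attribute lines are ignored for brevity.

```python
import re
from html import escape

# Hypothetical mapping from source element names to HTML tags. Names are uppercase
# because SGML's default declaration folds element names to uppercase.
TAG_MAP = {"DOC": "div", "TITLE": "h1", "PARA": "p", "LIST": "ul", "ITEM": "li"}

# ESIS data escapes: \\ (backslash), \n (record end), \| (SDATA bracket), \nnn (octal).
ESC = re.compile(r"\\(\\|n|\||[0-7]{3})")


def unescape(data):
    """Decode the escape sequences nsgmls uses in '-' (character data) lines."""
    def repl(m):
        code = m.group(1)
        if code == "\\":
            return "\\"
        if code == "n":
            return "\n"
        if code == "|":
            return ""  # drop SDATA brackets, keeping the entity text between them
        return chr(int(code, 8))
    return ESC.sub(repl, data)


def esis_to_html(esis_lines):
    """Map an ESIS event stream to HTML, one output tag per mapped element."""
    out = []
    for line in esis_lines:
        line = line.rstrip("\r\n")
        if not line:
            continue
        cmd, rest = line[0], line[1:]
        if cmd == "(" and rest in TAG_MAP:
            out.append(f"<{TAG_MAP[rest]}>")
        elif cmd == ")" and rest in TAG_MAP:
            out.append(f"</{TAG_MAP[rest]}>")
        elif cmd == "-":
            out.append(escape(unescape(rest)))
        # Attribute (A), processing-instruction (?), conformance (C) and other
        # lines are ignored in this sketch.
    return "".join(out)
```

 Assuming nsgmls is on the PATH and can locate the DTD, its standard output (for example from subprocess.run with capture_output=True and text=True, split into lines) is the input this function expects. No document tree is ever built, which keeps memory use flat regardless of document size.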
 

Editors

 FrameMaker+SGML was initially chosen as the authoring tool, and early development included the creation of a FrameMaker+SGML application. The users' machines were found to have too little memory to run FrameMaker+SGML satisfactorily, with much of their memory consumed by existing helpdesk applications that were the standard tools of the users' community. While FrameMaker+SGML made a positive impression on users in trials, it was ultimately rejected in favor of SoftQuad Software's XMetal XML editor. XMetal was at that point only in beta, but was made available to the client as part of an early adopters' program. It used sufficiently less memory to run effectively on the users' machines without requiring memory upgrades across the board.
 This choice of editing tool drove the shift from an SGML format to an XML format, although this was not a major shift, given the similarities in syntax and the fact that the same parser was retained.
 

Training Requirements

 It was recognized up-front that training and testing with the users would need to be conducted, and this was planned for. The writers had no problem using the authoring tool or repository interface, although they did report bugs and request changes that were implemented. Training was held on a repeating basis, to accommodate the schedules of authors who could not be pulled wholesale off the phones for a full-day event. A short one-day course was developed that introduced authors to structured authoring and the XMetal application, and then focused on authoring documents with the specific DTD and using the repository and web interface.
 

Implementation Challenges

 Rather than describe each of the difficulties encountered during implementation in detail, a simple listing will suffice:
 
  •  Handling embedded graphics was one of the most difficult issues to manage; passing them back and forth between client and server was difficult, and a mechanism had to be introduced for linking to large network diagrams that could not be displayed in-line. The initial choice of authoring application, FrameMaker+SGML, made this still more difficult because it renamed graphics "image1, image2," etc. Switching to XMetal made things easier, since it leaves graphic file names unchanged. We ended up writing a routine that would parse the XML for a graphic or graphiclink element; upon finding one, the routine would prompt the user to import the graphic(s) from the file system. Checking out the file would invoke a similar process that would sequentially pass graphics back to the client, keeping them with the XML file.
  •  The shift from FrameMaker+SGML to XMetal, and the difficulty of working with a beta program that was being updated during development, caused delays. Also, scripts had to be updated to accommodate XML as a primary source format. SoftQuad was very responsive and helpful in working with us, and provided significant reassurance to the client's management to help them feel comfortable deploying a tool that was not yet in final release form.
  •  Server access created difficulties in the development environment, and the IS staff were in very high demand, making them less responsive to requests from outside consultants. Because development was done by consultants, access to servers was restricted, making IS participation mandatory. Since the server platform was Sun/Solaris, not having root access to these systems and sometimes having to wait days for IS support made development difficult at times.
  •  The short timeframe to get the application up and running placed high demands on all aspects of the project. The initial requirement was to have a pilot in 30 days for testing; ultimately, time was extended to twice this duration.
  •  Authoring client work stations were overloaded with applications and seriously underpowered (P100-150 with 32 MB RAM.) Upgrades were not an option, leading to the switch in authoring applications mentioned above. Dealing with this issue required time to evaluate options, however, and caused some doubt as to the ultimate success of the application.
  •  The difficult authoring culture had a serious impact on the project: unsophisticated users meant that interfaces and tools had to be very simple. At the same time, author availability for requirements gathering, testing, etc. was limited. Review processes exemplify the way in which automated solutions had to be found for limiting what authors could do, because the process around document creation and review could not be relied on.
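 The graphics routine mentioned in the first item above can be sketched as a scan for graphic and graphiclink elements that collects the files the user should be prompted to import. The element names come from the text; the file attribute that holds each file name is an assumption, since the paper does not give the elements' attributes.

```python
import xml.etree.ElementTree as ET

GRAPHIC_ELEMENTS = {"graphic", "graphiclink"}


def graphics_to_import(xml_text):
    """List the graphic files a document references, in document order, without duplicates.

    Assumes each element stores its file name in a hypothetical `file` attribute.
    """
    root = ET.fromstring(xml_text)
    names = []
    for el in root.iter():
        name = el.get("file")
        if el.tag in GRAPHIC_ELEMENTS and name and name not in names:
            names.append(name)
    return names
```

 The same list, computed at check-out time, gives the files to pass back to the client alongside the XML.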
 

Solutions and Considerations

 Ultimately, the project was successful, and the users reported that the system was both usable and represented a significant improvement over the one it replaced.
 

Overcoming Challenges

 The flexibility of the client manager assigned to the project was a major factor in this success. The support staff and certain of the authors were also very dedicated to helping the project succeed. By working with the implementors in a cooperative spirit, many of the difficulties listed above were minimized. The consulting team, too, was overall a very experienced one, with a good deal of familiarity with the Documentum platform, and considerable expertise with the Web-related aspects of the tool. It is worth noting that only two of the consultants had significant experience with SGML and XML before this project - the others were quick studies, but were new to the technology. This did not present a major barrier, however.
 

Considerations: The Good and the Bad

 A number of factors stand out in this implementation that can be pointed to as valuable experience for others doing similar web-publishing applications.
 

Definite Target, Definite Timeframe

 The very limited nature of this project was critical to its success, because the end goal had to be kept firmly in sight. It was not plagued by shifting requirements or long-term vision. Because timeframes were short, and need was high, the implementation maintained a focus that is not usual in this kind of project. Risk-management was a critical part of running the project, and the consulting team recognized that client management had to be kept informed at all times.
 Because SGML/XML was used strictly as a solution, and not because it was "cool" or "cutting edge," the possibilities of the technology did not hinder the deployment of the solution. Many aspects of repository-based SGML/XML systems, such as component reuse, were simply not considered, simplifying the problem to be solved.
 

Not Bound By "Traditional" SGML Implementation Philosophy

 As a direct result of the above factors, many of the difficulties encountered in traditional SGML implementations were avoided. There were no prior expectations on the part of client staff about what SGML/XML was supposed to do. The DTD, and the process used to create it, best demonstrate this aspect of the project. The attitude was "keep it simple," which allowed the tag set to be considerably simpler than either standard Word templates or typical content-based DTDs. Because content-tagging was only marginally useful, it was not a focus of the document analysis - formatting was the goal, and it was an easy target to hit.
 At the same time, this means that future use of the XML source may not be optimal. The highest priority was to get a system deployed, so no particular thought was given to future uses. Even the possibility of delivering the XML directly over the website was not considered, although the DTD would be sufficient for this purpose.
 

XML Does Nothing

 (By itself, that is!) SGML and XML are just data formats, and without the right combination of tools, they confer no particular power on an application. The most difficult part of building an XML-based system, and the one that requires the most thought, is allocating functionality among the different parts of the system. In many cases, the Documentum repository could do things that could also have been driven by XML-based technology. Very often, SGML and XML consultants will do things the hard way because it aligns with the "purer," almost religious, side of open-standards technology. We did not have that luxury, so all technology decisions were driven by the need to provide solid, reliable functionality as quickly as possible.
 This application combined XML-driven transformations; repository-based workflow, versioning, and publishing; automated querying; CGI programming; and a host of other technologies. The key was to choose each solution on its merits, rather than to focus on demonstrating the power of a specific technology. This is harder to do than it sounds!
 

Tool Selection

 Tool selection was a critical part of this application. FrameMaker+SGML was not an appropriate choice, because it placed requirements on the users' workstations that we knew could not be met. This put the entire project at risk, and a solution was reached only through good fortune, SoftQuad's willingness to provide beta software, and the client's willingness to accept the risk of betting on a product still in beta.
 Documentum was also the right tool for the job: what EDMS 98 lacked in native SGML/XML functionality, it more than made up for by providing workflow, transformation, and customization APIs. In many cases, a "native" XML or SGML tool is chosen because it meets the needs of the structured document format, even though it is poorly suited to the application's other requirements.
 And thank God for James Clark!
 

Summary

 The Web Content Authoring project shows what XML implementations promise to give us: an enabling Web-based technology that does not need to "push the envelope," but simply needs to get the job done. As XML tools become more common, they should deliver more and more functionality for less and less effort. The most conspicuous aspect of this project was that XML was used to simplify HTML authoring, something not generally considered part of what XML has to offer. In the future, we hope that implementors will not be bound by traditional uses of structured markup and related technologies, but will recognize that the flexibility these technologies offer can be used in many different ways.
 

Appendix: The DTD

 The DTD is largely self-explanatory. Note that it is extremely simple, and that it is more format-oriented than content-oriented. As an exercise, compare it to the HTML 3.2 DTD and tell us which is easier to use!
 
<!-- DTD For ASCEND TECH PUBS (SGML Version 1.2) -->


<!-- GENERAL COMMENTS:

This DTD is meant to be as simple as possible, and to provide maximum flexibility 
for the authors. As a consequence, there is very little containership, except as 
necessary to drive known formatting and processing requirements. Although there is
a recognizable division structure in most of these documents, it has not been used here.

For consistency of structure, the Repository application provides a template 
containing skeletal tags and some pre-generated headings, but the parser does 
not enforce the use of these headings.


Added the "SITELINK" element 5/14/99 - Arofan Gregory
Added the "GRAPHICSLINK" element 5/14/99 - Arofan Gregory

-->


<!ENTITY % text "#PCDATA | ITAL | BOLD | BOLDITAL | CODE | DOCLINK | SITELINK">

<!-- Entities for Special Characters -->
<!-- Note: Need to copy these inline, or determine a mechanism for delivering them to the 
client machine and processor where they can be used. Currently, it is anticipated that 
this will be done using the SGML Open catalog mechanism. -->

<!-- ISO SDATA ENTITY SETS -->

<!ENTITY % HTMLlat1 PUBLIC "-//W3C//ENTITIES Latin 1//EN//HTML">
     %HTMLlat1;

<!ENTITY gt ">" >
<!ENTITY lt "<" >
<!ENTITY amp "&" >
<!ENTITY Tab "      ">


<!ELEMENT PUBS.DOC - o (HEADER, BODY)>
<!ATTLIST PUBS.DOC     OBJECT.ID CDATA #IMPLIED
                    OBJECT.VERSION CDATA #IMPLIED>
<!-- These attributes will be populated automatically by the repository. -->

<!ELEMENT HEADER - o (ATTRIBUTES, TITLE, SUBTITLE?)>
<!ELEMENT BODY - o (CODEBLOCK | PARA | HEAD | SUBHEAD | SUBSUBHEAD | NUMLIST | BULIST | CAPTION | GRAPHIC | GRAPHICSLINK |
TABLE | NOTE | STEP)+>

                    

<!-- HEAD ELEMENTS -->
<!ELEMENT ATTRIBUTES - O EMPTY>
<!ATTLIST ATTRIBUTES
        SECURITY CDATA #REQUIRED
        ENTITLEMENT CDATA #REQUIRED
        CONTENT.TYPE CDATA #REQUIRED
        PRODUCT CDATA #REQUIRED
        PRODUCT.FAMILY CDATA #REQUIRED 
        >

<!ELEMENT TITLE - o (#PCDATA)>
<!ELEMENT SUBTITLE - o (#PCDATA)>
<!-- Inline tagging is not allowed in titles, as these values will be promoted to
attributes on the object and displayed through an interface that does not understand 
inline tagging (the tags would be displayed as literal text!). This also simplifies 
processing. --> 


<!--BODY ELEMENTS-->
<!ELEMENT HEAD - o (%text;)+>
<!ELEMENT SUBHEAD - o (%text;)+>
<!ELEMENT SUBSUBHEAD - o (%text;)+>
<!ELEMENT PARA - o (%text;)+>
<!ELEMENT CODEBLOCK - o (#PCDATA | BOLD | ITAL | BOLDITAL)+>


<!-- LIST DECLARATIONS -->



<!ELEMENT BULIST - o (ITEM | SUB.NUMLIST | SUB.BULIST | CODEBLOCK)+>


<!ELEMENT NUMLIST - o (ITEM | SUB.BULIST | SUB.NUMLIST | CODEBLOCK)+>


<!ELEMENT SUB.NUMLIST - o (ITEM | SUB.SUB.NUMLIST | SUB.SUB.BULIST | CODEBLOCK)+>


<!ELEMENT SUB.BULIST - o (ITEM | SUB.SUB.NUMLIST | SUB.SUB.BULIST  | CODEBLOCK)+>


<!ELEMENT SUB.SUB.NUMLIST - o (ITEM  | CODEBLOCK)+>


<!ELEMENT SUB.SUB.BULIST - o (ITEM  | CODEBLOCK)+>



<!ELEMENT ITEM - o (%text;)+>

<!ELEMENT NOTE - - (#PCDATA)>
<!-- There is only ever a single paragraph in a NOTE -->

<!-- The STEP element causes everything inside it to be indented (all of its HTML output is placed inside <UL></UL> tags). Each STEP has an
auto-generated number in front of the STEP.TITLE (driven by the filter and authoring application, not by the HTML tagging).
A STEP.TITLE is formatted exactly like a HEAD. This construction is used to drive the auto-generated TOCs in the published
Web format. -->

<!ELEMENT STEP - - (STEP.TITLE, (CODEBLOCK | PARA |SUBHEAD | SUBSUBHEAD | NUMLIST | BULIST | CAPTION | GRAPHIC | TABLE |
NOTE)+)>

<!ELEMENT STEP.TITLE - o (#PCDATA)>


<!-- GRAPHICS -->

<!ELEMENT CAPTION - o (%text;)+>
<!ELEMENT GRAPHIC - o EMPTY>
<!-- The FILE attribute contains the name of the graphic to be included.
It is assumed that the graphic will be in the same directory as the XML
document.  -->

<!ATTLIST GRAPHIC     FILE CDATA #REQUIRED
                    HEIGHT CDATA #REQUIRED
                    WIDTH CDATA #REQUIRED >
<!ELEMENT GRAPHICSLINK - o EMPTY >
<!ATTLIST GRAPHICSLINK FILE CDATA #REQUIRED
                        PLACEHOLDER  CDATA #FIXED "   ">


<!-- INLINE ELEMENTS -->
<!ELEMENT ITAL - - (#PCDATA)>
<!ELEMENT BOLD - - (#PCDATA)>
<!ELEMENT BOLDITAL - - (#PCDATA)>
<!ELEMENT CODE - - (#PCDATA)>
<!ELEMENT DOCLINK - - (#PCDATA)>
<!-- The DOCLINK element contains the text of the title of a referenced document 
object in the docbase. This is automatically verified on Check-In. Consequently, 
special characters and inline tagging are not allowed in titles. -->

<!ELEMENT SITELINK - - (#PCDATA)>
<!ATTLIST SITELINK URL CDATA #REQUIRED>
<!-- The SITELINK element contains the text of a reference to a site on the internet. 
The attribute URL contains the URL of the target site. -->

<!-- TABLES -->

<!ELEMENT TABLE - - (TR*)>
<!ATTLIST TABLE WIDTH CDATA #IMPLIED
                BORDER CDATA #IMPLIED>
<!ELEMENT TR - - (TD*)>
<!ATTLIST TR 
            ALIGN (LEFT|RIGHT|CENTER) #IMPLIED>
<!ELEMENT TD - - (%text;)+>
<!ATTLIST TD
            ALIGN (LEFT|RIGHT|CENTER) #IMPLIED>
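The DTD comments describe what the publishing filter does with a STEP: its STEP.TITLE receives an auto-generated number and is formatted like a HEAD, its content is indented inside <UL></UL>, and the titles drive an auto-generated table of contents. The project's actual filter is not shown in this paper, so the following Python sketch only illustrates that behavior. It assumes the SGML source has first been normalized to well-formed XML (the DTD lets many end tags be omitted), and the HTML mappings (HEAD to H2, PARA to P, BOLD to B, and so on), the "stepN" anchor names, and the function names are all hypothetical choices.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical mapping of the DTD's format-oriented inline tags to HTML.
INLINE_TAGS = {"ITAL": ("<I>", "</I>"), "BOLD": ("<B>", "</B>"),
               "BOLDITAL": ("<B><I>", "</I></B>"), "CODE": ("<CODE>", "</CODE>")}

# Hypothetical mapping of block-level STEP children to HTML tags.
BLOCK_TAGS = {"PARA": "P", "SUBHEAD": "H3", "SUBSUBHEAD": "H4", "NOTE": "P"}


def render_inline(elem):
    """Render an element's mixed content (text plus inline tags) as HTML."""
    parts = [escape(elem.text or "")]
    for child in elem:
        inner = render_inline(child)
        if child.tag in INLINE_TAGS:
            start, end = INLINE_TAGS[child.tag]
            parts.append(start + inner + end)
        elif child.tag == "SITELINK":
            # The URL attribute holds the address of the target site.
            url = escape(child.get("URL", ""), quote=True)
            parts.append(f'<A HREF="{url}">{inner}</A>')
        else:
            parts.append(inner)  # e.g. DOCLINK: keep the text, drop the tag
        parts.append(escape(child.tail or ""))
    return "".join(parts)


def render_steps(xml_text):
    """Number each STEP, format its title like a HEAD, indent its content
    inside <UL>, and build a table of contents. Returns (toc, body)."""
    root = ET.fromstring(xml_text)
    toc, body = [], []
    for number, step in enumerate(root.iter("STEP"), start=1):
        title_elem = next((c for c in step if c.tag == "STEP.TITLE"), None)
        title = escape(title_elem.text or "") if title_elem is not None else ""
        anchor = f"step{number}"
        toc.append(f'<LI><A HREF="#{anchor}">{number}. {title}</A></LI>')
        body.append(f'<H2><A NAME="{anchor}">{number}. {title}</A></H2>')
        body.append("<UL>")
        for child in step:
            tag = BLOCK_TAGS.get(child.tag)
            # STEP.TITLE was handled above; lists, tables, graphics, and
            # code blocks are skipped in this sketch.
            if tag:
                body.append(f"<{tag}>{render_inline(child)}</{tag}>")
        body.append("</UL>")
    return "<UL>" + "".join(toc) + "</UL>", "\n".join(body)
```

Because the numbers come from document order rather than from the markup, authors never type or maintain them, which matches the DTD comment that numbering is driven by the filter and not by the HTML tagging.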


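Several DTD comments describe work done when a document is checked in: TITLE and SUBTITLE values are promoted to attributes on the repository object, and the text of each DOCLINK is verified against the titles of document objects in the docbase. The paper does not show how the Documentum application implemented this, and the sketch below uses no Documentum API. It is a hypothetical Python illustration that assumes well-formed XML input and uses a plain set of titles as a stand-in for a docbase query; the function name and the shape of the returned metadata are assumptions.

```python
import xml.etree.ElementTree as ET

# The ATTRIBUTES element's attributes, all #REQUIRED in the DTD.
REQUIRED_ATTRS = ("SECURITY", "ENTITLEMENT", "CONTENT.TYPE",
                  "PRODUCT", "PRODUCT.FAMILY")


def prepare_check_in(xml_text, docbase_titles):
    """Hypothetical check-in step (not Documentum's API).

    Promotes HEADER values to a metadata dict and returns the text of every
    DOCLINK that does not match a title in docbase_titles, a plain set used
    here as a stand-in for a repository query.
    """
    root = ET.fromstring(xml_text)
    header = root.find("HEADER")
    if header is None:
        raise ValueError("document has no HEADER")

    attrs = header.find("ATTRIBUTES")
    missing = [name for name in REQUIRED_ATTRS
               if attrs is None or name not in attrs.attrib]
    if missing:
        raise ValueError(f"missing required attributes: {missing}")

    # TITLE and SUBTITLE are #PCDATA only, so .text holds the whole value.
    metadata = {"TITLE": (header.findtext("TITLE") or "").strip()}
    subtitle = header.findtext("SUBTITLE")
    if subtitle is not None:
        metadata["SUBTITLE"] = subtitle.strip()
    metadata.update({name: attrs.get(name) for name in REQUIRED_ATTRS})

    # Each DOCLINK holds the title of a referenced document object.
    unresolved = [(link.text or "").strip() for link in root.iter("DOCLINK")
                  if (link.text or "").strip() not in docbase_titles]
    return metadata, unresolved
```

Because the DTD restricts TITLE and DOCLINK to #PCDATA, a plain string comparison is enough here; inline tagging would split a title across several text nodes and defeat this kind of check.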