Regulations Worldwide Online at the Siemens Public Communication Networks Group   Table of contents   Indexes   Automating Language Translation Requires Document Management, Workflow, and Application Tools

 
 

XML and Electronic Commerce: But What About Documents?


 
Lani   Hajagos
  Adobe Systems, Inc.
345 Park Avenue
San Jose   California  95110
Phone: 408-536-6168
Fax: 408-537-4215
Email: hajagos@adobe.com Web: http://www.adobe.com
 
Biographical notice:
 
Lani Hajagos
 
Lani Hajagos is Senior Product Marketing Manager for SGML Products at Adobe Systems Incorporated. She has been involved with computer-assisted typesetting, authoring, and publishing for more than 25 years. During that time, she has participated in the design, development and marketing of systems for computer typesetting, database publishing, newspaper editorial and classified ad, technical documentation, and SGML authoring and publishing. She has been responsible for a number of innovations including new methods of kerning, WYSIWYG displays, generic markup methods, and parameter-driven database publishing.
 
Ms. Hajagos has been affiliated with a number of organizations including Latham Process Corp., CompuScan, Electronic Imaging Technology, Intergraph, Frame Technology, and currently, Adobe Systems. She is currently serving as a Director of the Graphic Communications Association.
 
ABSTRACT:
 
XML has quickly captured the attention of organizations involved in the WorldWide Web. At the Seybold San Francisco conference, Bill Gates proclaimed XML to be the "most exciting new technology" of the year. Numerous proposals have been submitted for using XML for Web applications.
 
The majority of these proposals involve use of XML in ways that are very different from the traditional use of SGML. For example, Netscape's RDF format deals with encoding metadata about documents that enables more precise searching and management of documents on a web site. Microsoft intends to use XML for its CDF format, which supports automated delivery of information to subscribers based on a personalized profile.
 
XML has also been proposed as a standard format for encoding electronic commerce transactions. Currently, each vendor of electronic commerce tools utilizes a proprietary format for transactions, making it difficult to manage sites using multiple tools. Adopting XML with a common vocabulary would greatly simplify this task, while still allowing each vendor to develop its own special features.
 
The "traditional" users of SGML envisage using XML to deliver documents over the web that can be rendered and searched with greater precision and control than is currently possible with HTML. Before this is possible, however, the browser manufacturers will have to support rendering of XML documents with the still-unspecified XSL style specification mechanism. In addition, the makers of search engines will have to enhance their tools to be able to constrain searches to specific elements or to portions of hierarchies.
 
This paper surveys the status of XML in the various uses for which it has been proposed. It will also look at how the traditional uses of SGML/XML might be combined with the newer uses to enable better information exchange over the Web.
 
 

Who uses XML?

 
XML was designed first and foremost for use on the WorldWide Web. There have been efforts for some time to deliver SGML content over the Web. Products such as Panorama Pro and ViewPort can be run from inside Netscape and Microsoft Internet Explorer. However, delivering the files can be problematic. Not only must the instance be sent over the network, but the DTD, the declaration, and the style sheet as well. All of this makes for lots of network traffic and consequently slow performance.
 
In addition, many of the features of SGML result in compute-intensive operations. For example, inclusions, exclusions, and markup minimization add significant complexity to application software and considerable cycle time to the processing.
 
To overcome these obstacles and enable rich structured information to be delivered, viewed, searched, and otherwise processed in the context of Web applications, XML did the following:
  • Removed optional features such as Link and Concur
  • Removed features such as inclusion, exclusion, and markup minimization
  • Removed the necessity for shipping extra files by introducing the concept of well-formed document. Applications that can operate on well-formed documents do not require a DTD.
  • Eliminated the need for an SGML Declaration.
    Not only are XML applications fast, but they are also compact. This allows a lot of processing to be moved from the server to the client machine, further reducing network traffic.
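
The idea of a well-formed document can be illustrated with a small sketch. It uses the parser in Python's standard library, which checks well-formedness only and never asks for a DTD or an SGML Declaration. The sample documents and the function name `is_well_formed` are assumptions made for illustration; they are not taken from any product mentioned in this paper.

```python
import xml.etree.ElementTree as ET

# A hypothetical document: no DOCTYPE, no DTD, no SGML Declaration.
WELL_FORMED = "<memo><to>Sales</to><body>Prices change on <date>1 May</date>.</body></memo>"

# SGML-style markup minimization (omitted end tags) is not permitted in XML.
MINIMIZED = "<memo><to>Sales<body>Prices change.</memo>"


def is_well_formed(text: str) -> bool:
    """Return True if the text parses as XML without consulting any DTD."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


print(is_well_formed(WELL_FORMED))
print(is_well_formed(MINIMIZED))
```

The first call prints True and the second prints False. Because every element must be explicitly closed, the parser can build the hierarchy from the instance alone, which is what lets a client process the document without receiving any extra files.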
     
    As application developers looked at XML, they realized that it is useful for modeling and encoding data as well as documents. Consequently, uses for XML were proposed in areas where SGML had never been considered.
     
     

    Areas of activity

     
    Applications for XML are being developed in three main areas:
    • Delivery control
    • Content management
    • Electronic commerce
     
     

    Delivery control

     
    Delivery control addresses the problem of how to get the right information to the right person in a timely manner. The WorldWide Web has made available to us enormous amounts of information. However, we all know the problems involved in finding and using that information. At Adobe Systems, we have a corporate intranet that contains all the information you’d ever want to know relating to our products, services, business practices, etc. However, finding the information, or even knowing that it’s there to be found, can be a daunting task. Think how much worse this problem is when it is multiplied by the tens of thousands of sites on the Internet.
     
    It’s difficult to find information because the only tool we have is full-text search. Instead of this crude, imprecise method, imagine the following:
    • The publisher of the information describes the contents of the document in terms of topics, keywords, etc.
    • The consumer of the information describes his or her interests, also in terms of topics and keywords.
    • A web agent matches the publisher’s content with the consumer’s interest, delivering the relevant information to the consumer’s desktop.
     
    The architecture of this delivery control mechanism is quite simple and elegant. The document itself could be in any format: SGML, XML, Word, FrameMaker, WordPerfect, ASCII, etc. It is associated with an “envelope” of metadata in XML format, containing information about the document: the owner or publisher, the format, the date the information becomes effective, the date it becomes obsolete, limitations on usage, topics, keywords, etc. The XML envelope is what the Web Agent looks at in matching content with consumer.
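
As a minimal sketch of this architecture, the following Python fragment shows one possible metadata envelope and a very simple matching rule for the Web Agent. The element names (`envelope`, `topic`, `keyword`, `effective`, `obsolete`) and the functions `describe` and `matches` are hypothetical; none of them come from CDF, RDF, or any other format named in this paper.

```python
import xml.etree.ElementTree as ET

# Hypothetical envelope vocabulary; the paper does not define element names.
ENVELOPE = """
<envelope>
  <publisher>Adobe Systems</publisher>
  <format>FrameMaker</format>
  <effective>1998-01-01</effective>
  <obsolete>1999-01-01</obsolete>
  <topic>SGML</topic>
  <topic>publishing</topic>
  <keyword>style sheets</keyword>
</envelope>
"""


def describe(envelope_xml: str) -> set[str]:
    """Collect the topics and keywords a publisher attached to a document."""
    root = ET.fromstring(envelope_xml)
    terms = root.findall("topic") + root.findall("keyword")
    return {t.text.strip().lower() for t in terms if t.text}


def matches(envelope_xml: str, interests: set[str], today: str) -> bool:
    """Agent rule: the document is in effect today and shares at least one term."""
    root = ET.fromstring(envelope_xml)
    effective = root.findtext("effective", default="")
    obsolete = root.findtext("obsolete", default="9999-12-31")
    in_effect = effective <= today < obsolete  # ISO dates compare correctly as strings
    shared = describe(envelope_xml) & {i.lower() for i in interests}
    return in_effect and bool(shared)


print(matches(ENVELOPE, {"Publishing", "kerning"}, "1998-06-15"))
```

Note that the agent never opens the document itself. It reads only the envelope, which is why the document can stay in whatever format the publisher prefers.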
     
     

    Delivery control examples

     
    A number of companies have developed, or are developing, delivery control tools. These include Microsoft, with CDF, Netscape with RDF, and DataChannel with the XML Toolkit and ChannelManager. Please note that this list is not exhaustive.
     
     

    Content management

     
    Attaching metadata to a document makes the document a lot smarter and easier to find. Identifying the information inside the document makes it even more useful.
     
    Content management applies XML to the traditional use of SGML: document markup.
     
    Most document formats, including HTML, are designed to describe the appearance of the document. Describing information in terms of appearance severely limits its usefulness. A trouble-shooting manual might contain many ordered lists, encompassing information such as diagnostic procedures, repair procedures, tools required, etc. If each type of list is identified by what it is, not how it looks, it becomes possible to manipulate the information in many different ways. Individuals and organizations using SGML have known this for years, and the same holds true for XML.
     
    As with SGML, using XML to identify the components of information in a document requires use of a controlled tag set. XML enables the information to be organized hierarchically, making it easier to extract and manipulate portions of the information.
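
The trouble-shooting manual described above can be sketched as follows. The tag set (`manual`, `fault`, `tools`, `diagnosis`, `repair`, `step`, `item`) is invented for this illustration, as is the `search` function. The point is only that once each list is identified by what it is, a program can pull out one kind of information, or restrict a search to one part of the hierarchy.

```python
import xml.etree.ElementTree as ET

# Hypothetical tag set for a trouble-shooting manual.
MANUAL = """
<manual>
  <fault id="F1">
    <title>Paper jam</title>
    <tools><item>Screwdriver</item><item>Tweezers</item></tools>
    <diagnosis><step>Open rear cover</step><step>Inspect rollers</step></diagnosis>
    <repair><step>Remove jammed sheet</step><step>Close rear cover</step></repair>
  </fault>
  <fault id="F2">
    <title>Faded print</title>
    <tools><item>Cleaning cloth</item></tools>
    <diagnosis><step>Print test page</step></diagnosis>
    <repair><step>Replace toner</step></repair>
  </fault>
</manual>
"""

root = ET.fromstring(MANUAL)

# Every tool needed anywhere in the manual, in document order.
all_tools = [item.text for item in root.iter("item")]

# Only the repair steps for one fault, found by walking the hierarchy.
repair_f1 = [s.text for s in root.findall("fault[@id='F1']/repair/step")]


def search(element: str, term: str) -> list[str]:
    """Return ids of faults where `term` occurs inside the named child element."""
    hits = []
    for fault in root.findall("fault"):
        scope = fault.find(element)
        if scope is None:
            continue
        text = " ".join(scope.itertext())
        if term.lower() in text.lower():
            hits.append(fault.get("id"))
    return hits


print(all_tools)
print(repair_f1)
print(search("diagnosis", "rollers"), search("repair", "rollers"))
```

A full-text search for "rollers" would simply report that the word occurs in the manual. The element-constrained search reports that it occurs in a diagnostic procedure and not in a repair procedure, which is the kind of precision that appearance-based markup cannot offer.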
     
    Because the document is being marked up with an arbitrary tag set, it cannot be viewed with an HTML-based browser. What is required is either a specialized XML browser, or enhanced versions of Netscape and Internet Explorer that support browsing of XML.
     
    Content management is synergistic with delivery control. Using both aspects of XML provides maximum utility and power for your information.
     
     

    Content management architecture

     
    In this architecture, the document would consist of XML-tagged content, plus references to external content, which might be in XML or binary formats. Depending on the DTD or schema used, the information might (and probably would) be arranged hierarchically. This XML document could, of course, be associated with an XML envelope containing metadata about the document to drive delivery control tools.
     
     

    What is needed?

     
    Those of us who have been publishing with SGML for years are ready and waiting to deliver XML-tagged content over the Web. It’s relatively straightforward to transform an SGML document to XML. But tools need to be available on the delivery end as well.
  • XML support must be widely available in the browsers. This support should be transparent, where the browser figures out the format of the document and does the right thing.
  • A stable style sheet mechanism is needed so that the publisher can control the document’s appearance. Currently, a committee under the auspices of the W3C is working on a proposal for XSL (eXtensible Style Language).
  • The linking mechanism also needs to be established. XLL (eXtensible Link Language), which is also at the proposal stage, allows for a robust linking model that will provide much more functionality than is currently possible with the HTML model.
  • The search services, such as Yahoo and Infoseek, need to enhance their capabilities to allow for content-specific searches based on XML tags in documents.
    Once these tools are widely available, there will be an avalanche of XML data available on the Web.
     
     

    Content management examples

     
    Many tools are being developed, or are already available, both from commercial vendors and as freeware.
  • In the area of content creation, virtually all of the traditional SGML authoring vendors are adding XML output capabilities to their products. These include Adobe, ArborText, and SoftQuad. In addition, Microsoft has announced that XML will become the standard output format for Word documents.
  • The browser vendors are beginning to implement support for XML, although the first implementations are in the area of delivery control rather than content management. Even so, several browser manufacturers have announced support for displaying XML documents, including Microsoft, Enigma, DataChannel, and Open Market.
  • While none of the commercial search services such as Yahoo and Infoseek have publicly announced support for XML (at the time of writing this paper), there are some search engines incorporating this capability. These include Enigma, Inso, and Open Market.
  • Database vendors are also incorporating XML support, including Chrystal, Poet Software, Texcel, and Xyvision.
     

    Other activities in content management

     
    Many vertical industries have developed SGML applications to facilitate interchange of documents. A number of these are in the process of modifying their DTDs to make them XML-compatible. These include the Air Transport Association, the Telecommunications Industry Forum, and the HL7 group, which is addressing health care. The semiconductor chip makers, under the auspices of SI2 (Silicon Integration Initiative), are interested in adapting their DTD to be XML-compatible; however, they require a sophisticated linking mechanism. Therefore, they are waiting for the XLL spec to be completed.
     
     

    Electronic Commerce

     
    Electronic commerce is an area that had never adopted SGML. However, there is a lot of XML activity going on.
     
    Ever since the web took off, enterprises have been wondering how to make money over the web. The web has been very useful for advertising and marketing products and services; however there have been few actual purchase transactions happening over the web. Partly this is because secure transmission tools are only now becoming widely available. The other problem has been that the electronic commerce tools that do exist all use proprietary formats to encode the transaction information.
     
    In addition to enabling purchasing transactions over the web, electronic commerce tools could enable a new class of relationships between business partners. For example, at present most companies take the responsibility for restocking inventory. With the appropriate e-commerce tools, however, businesses could set up parameters for restocking with their suppliers, and then pass them information about activities which affect the inventory levels. The suppliers could then take responsibility for restocking, based on these parameters. This arrangement would benefit both supplier and customer: the supplier has a closer and therefore more secure relationship with the customer, and the customer has unloaded some of its administrative functions.
     
     

    Current status

     
    Some electronic commerce software has already been deployed, and it is possible to purchase merchandise at some web sites. However, the existing e-commerce packages utilize proprietary formats to encode their transactions. This poses a number of difficulties for people managing a web site.
     
    If you are using only one e-commerce package, the problem is manageable: you simply write an interface that can parse the package’s data format and transfer the information into your in-house software packages. However, if you are deploying packages from two vendors, and these packages must work together, things get more complicated. Now you need extra interfaces: between the two packages, and between each package and your software. The number of interfaces grows rapidly with each package you add, roughly as the square of the number of packages.
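
To make the growth concrete, the count can be worked out under two simplifying assumptions that are not stated in the text: every pair of packages needs exactly one interface, and the in-house software counts as a single system. The function names are hypothetical, and the second function models the alternative in which every system reads and writes one common format.

```python
def point_to_point(packages: int) -> int:
    """Interfaces needed when every package has its own proprietary format.

    Each pair of packages needs one interface, and each package needs one
    more to reach the in-house software.
    """
    between_packages = packages * (packages - 1) // 2
    to_in_house = packages
    return between_packages + to_in_house


def shared_format(packages: int) -> int:
    """Interfaces needed when every system reads and writes one common format."""
    return packages + 1  # one adapter per package, plus one for in-house software


for n in (1, 2, 3, 5, 10):
    print(n, point_to_point(n), shared_format(n))
```

Under these assumptions, one package needs one interface and two packages need three, as described above. Ten packages need 55 point-to-point interfaces, but only 11 adapters if all of them share a common format.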
     
     

    ICE Initiative

     
    A group of software vendors and large users of e-commerce software have formed the ICE (Information and Content Exchange) Initiative. The goal of this initiative is to create a standard way for businesses to control, share, and exchange information with other businesses. Because the standard is to be based on XML, it should greatly reduce the difficulty of implementing e-commerce packages from one or multiple vendors.
     
    Currently, members of the ICE Initiative include Adobe Systems, Firefly, JavaSoft, National Semiconductor, Vignette, CNET, Hollinger International, News Internet Services, Preview Travel, Tribune Media Service, and ZD Net.
     
     

    Other activities

     
    In addition to the ICE Initiative, there is considerable activity in the area of electronic commerce. CommerceNet is fostering the development and adoption of XML-based e-commerce solutions. Among the vendors, Vignette has announced StoryServer 3.2, and has previewed site-to-site technology. webMethods has announced its Web Automation Toolkit, and submitted WIDL (Web Interface Definition Language) to W3C. Open Market has the Internet Commerce solutions Live Commerce and Transact, which support XML.
     
     

    OSD (Open Software Description)

     
    OSD is an XML-based format proposed jointly to W3C by Microsoft and Marimba. At the time this paper was written, it had been endorsed by CyberMedia, InstallShield Software Corp., LANovation, Lotus Development Corp., and Netscape.
     
    OSD describes software components, versions, structures, and relationships to other components. This format has the potential to greatly facilitate delivery of software and software updates over the Web, by matching the correct version to the user.
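
The matching of version to user might be sketched as follows. The element and attribute names are loosely modeled on the OSD proposal and should be treated as illustrative rather than as the exact vocabulary of the specification. The package name, URLs, and the functions `parse_version` and `pick_download` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Loosely modeled on the OSD proposal; names and values are illustrative only.
PACKAGE = """
<SOFTPKG NAME="com.example.viewer" VERSION="2,1,0,0">
  <TITLE>Example Viewer</TITLE>
  <IMPLEMENTATION>
    <OS VALUE="WinNT"/>
    <PROCESSOR VALUE="x86"/>
    <CODEBASE HREF="http://www.example.com/viewer-nt-x86.cab"/>
  </IMPLEMENTATION>
  <IMPLEMENTATION>
    <OS VALUE="MacOS"/>
    <PROCESSOR VALUE="ppc"/>
    <CODEBASE HREF="http://www.example.com/viewer-mac-ppc.bin"/>
  </IMPLEMENTATION>
</SOFTPKG>
"""


def parse_version(text: str) -> tuple[int, ...]:
    """Turn a comma-separated version string into a comparable tuple."""
    return tuple(int(part) for part in text.split(","))


def pick_download(package_xml, os_name, processor, installed):
    """Return the download URL for this user, or None if nothing applies."""
    root = ET.fromstring(package_xml)
    if parse_version(root.get("VERSION")) <= parse_version(installed):
        return None  # the user already has this version or a newer one
    for impl in root.findall("IMPLEMENTATION"):
        same_os = impl.find("OS").get("VALUE") == os_name
        same_cpu = impl.find("PROCESSOR").get("VALUE") == processor
        if same_os and same_cpu:
            return impl.find("CODEBASE").get("HREF")
    return None  # no implementation for this platform


print(pick_download(PACKAGE, "WinNT", "x86", "2,0,0,0"))
```

A user who already runs version 2,1,0,0, or whose platform is not listed, gets no download at all, so the description file alone is enough to decide whether any software needs to be sent.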
     
     

    Does this all fit together? And how?

     
    Looking at the areas in which XML is being adopted and implemented, it’s natural to ask whether these applications can or should work together. It’s obvious that delivery control and content management are synergistic. But electronic commerce doesn’t seem to fit into the picture.
     
     

    Electronic commerce and documents

     
    As we think about this further, however, there are areas where delivery control, content management, and electronic commerce work together.
     
    Take, for example, the commercial publisher that wants to sell content via the Web. They could offer this content in a variety of formats. If the purchaser wants to incorporate the content on their web site, they might want to get it in HTML format. If the purchaser intends to print the information, the best format is PDF. But if the purchaser wants to modify or manipulate the information, XML is the format of choice.
     
    The publisher could use a delivery control tool to help potential customers locate the content they wish to purchase. Electronic commerce tools would then enable the actual purchase, and the document could then be delivered in one or more formats.
     
    There is another area in which documents and electronic commerce are synergistic. Many e-commerce transactions must be supported by a lot of information. Take, for example, filing a health insurance claim. The insurer requires that the relevant portions of the patient record, along with lab reports, diagnostic reports, etc., be submitted to support the claim. We can think of this as the claim document. A specialized tool would facilitate collecting the information for this document from databases and other files, putting it in the proper format, supplying the necessary wrapper, and submitting it to the e-commerce tool.
     
    Another example is on-line catalogs. More and more retail enterprises are setting up on-line stores where shoppers can examine, select, and purchase merchandise. However, if the store offers hundreds of items, browsing the catalog can be time consuming and frustrating. A better solution is a tool which can create a subset of catalog items on the fly, based on the shopper’s profile and expressed interests. This dynamically created document will greatly enhance the shopping experience and result in more sales for the retailer.
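
A subset catalog of this kind could be produced along the following lines. The catalog vocabulary, the shopper profile (a set of categories and a price ceiling), and the function `subset_catalog` are all assumptions made for the sake of the sketch.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog vocabulary.
CATALOG = """
<catalog>
  <item sku="100" category="garden"><name>Trowel</name><price>12.00</price></item>
  <item sku="101" category="kitchen"><name>Whisk</name><price>8.50</price></item>
  <item sku="102" category="garden"><name>Hose</name><price>35.00</price></item>
  <item sku="103" category="garden"><name>Wheelbarrow</name><price>90.00</price></item>
</catalog>
"""


def subset_catalog(catalog_xml: str, categories: set[str], max_price: float) -> str:
    """Build a new catalog document holding only the items that fit the profile."""
    source = ET.fromstring(catalog_xml)
    result = ET.Element("catalog")
    for item in source.findall("item"):
        fits_interest = item.get("category") in categories
        fits_budget = float(item.findtext("price")) <= max_price
        if fits_interest and fits_budget:
            result.append(item)
    return ET.tostring(result, encoding="unicode")


# A shopper interested in gardening who wants items costing 50.00 or less.
personal = subset_catalog(CATALOG, {"garden"}, 50.0)
print(personal)
```

The result is itself an XML document, so it can be styled, searched, and passed to the e-commerce tool in the same way as the full catalog.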
     
     

    Where do we go from here?

     
    As we have seen, the potential for XML-based web applications is enormous. However, to have the development and adoption of these applications become a reality, a number of things have to happen:
  • The XLL and XSL specs have to be completed and accepted by W3C.
  • The ICE specification needs to be developed.
  • OSD needs to be adopted.
  • Industry-specific vocabularies need to be developed in order to have smart searching. This means that both vertical and horizontal industry groups must get together and invest the time in analyzing their information to create DTDs and XML schemas.
  • Finally, we need to adhere to the KISS (Keep It Simple, Stupid!) principle. XML is appealing because it eliminated many of the complicated features of SGML. We need to maintain this simplicity, and apply it as well to XSL and XLL.
    The Web has been hailed as the mechanism for making information universally available. XML is what is needed to make that information accessible and useful. It’s an exciting time for all of us.
