
 

A Generalized Online Delivery Paradigm for XML Information

Christophe Lecluse
General Manager, AIS Software
17 rue Remy Dumoncel
Paris (France) 75014
Email: clec@ais.berger-levrault.fr

Biographical notice

Christophe Lecluse is general manager of AIS Software. He has been working on SGML applications and systems for over eight years and specializes in SGML systems, object databases and languages. He was the originator of the Balise and Dual Prism products developed by AIS Software.

 Content Management 
Online Delivery
 

Introduction

  Publishing companies and other information providers increasingly think of themselves as content owners with information as their product. They increasingly establish centralized digital repositories to store all of the content owned by the organization, in various media: text, images, audio and video.
  Content Management thus covers all the applications that help organize this information so that its owners can benefit from the flexibility of digital information without getting lost in the "virtuality" of this new information process.
  This is perhaps the broadest possible definition of Content Management. It characterizes the current evolution of IT and Document Management technologies, which are tending to converge into a unified framework. The Web introduced the "document" as a generalized metaphor for information systems interfaces.
Web-publishing
 

A "Web-publishing" Definition

  Closer to home, the very fast development of Web publishing already justifies the introduction of specific concepts and technologies. Many Web publishing projects today struggle with issues such as the granularity of information, the difficulty of merging information from various sources (especially information stored in traditional relational database systems), and the difficulty of personalizing publications.
  Web content management can thus be defined as a way to organize the production and delivery of structured information organized in Web publications.
  One of the key requirements of this organization is the ability to abstract a particular piece of information from its final delivery representation. It has become obvious that producing "flat" HTML pages organized through simple hyperlinks and stored in file systems does not scale to Web publishing projects of significant size.
  For a decade, SGML, and now XML, have proved to be an adequate method for abstracting information from its delivery in various media. The very fast adoption of XML in Web Content Management is thus not really surprising.
 

A "Functional" Definition

  Because Web Content Management establishes a bridge between information production and information delivery (or between document production and document delivery), the proposed solutions have features that relate to information management and other features that relate to information delivery. This of course contributes to the difficulty of characterizing and comparing solutions and products. However, a tentative functional definition for Web Content Management solutions could be the following.
 Production 
 
  • Define "document modules" at a granularity that is adequate for production and management, and not necessarily tied to the granularity used for delivery.
  • Define those modules in formats that are adequate for production and re-use, and are not necessarily the same as the delivery format. Of course we think XML/SGML should be the preferred format in many cases, as it is abstract enough to automate transformation into other formats. XML also gives access to the internal information of a module, thus allowing (when needed) management and manipulation at the component level rather than the module level.
  • Organize those modules in manageable hierarchies, and have easy access methods to search and retrieve information during the production phase.

 Delivery 

  • Automate the generation of one or several individualized delivery views from a given content. This generation may be done off-line (batch) or on-line (dynamic) depending on the constraints of a given project.
  • Largely reorganize the document modules, or their content, in this generation process.
  • Merge document information with other sources of information like those stored in RDBMS.
  • Handle multiple destinations and/or multiple formats. HTML versus XML as a destination format is a typical example.
  • Easily handle large volumes of data and provide searching capabilities on such volumes.
  • Leverage the tremendous ongoing efforts on Web architectures, especially on protocols, security, servers, and browser technologies. This means providing a complementary technology that uses standard Web components.

  In this paper, we will focus on the delivery side of the process and propose a framework for structuring server applications implementing these functions.
     

    Typical Online Delivery Applications

      There is clearly not one typical online delivery application. The spectrum of possible applications is very wide. We can however provide a few representative examples along this spectrum. These applications are typically the ones we had in mind when designing the framework described below.
     
  • Technical Documentation: IETM Applications like those implemented in the automotive industry (J-2008 DTDs) or in the aerospace industry (ATA and AECMA DTDs). Such applications require dynamic configuration handling and dynamic assembly of information from various sources and data modules.
  • Electronic books, like those produced and published by legal publishers, software companies, etc. All work done around the DOCBOOK SGML DTD is typical of this application domain.
  • Online Catalog Applications.
  • Online Press Applications.

    Common Characteristics: Dynamicity and Various Information Sources

     Being able to merge (in batch mode and/or dynamically) several sources of information, including RDBMS information, is a very common and important aspect of Web online delivery systems, especially for applications using XML/SGML. Even if XML/SGML has (in theory) the capability of modeling most of a publication's content, we must be able to use existing databases and leverage RDBMS technologies and applications when they are appropriate.
      Figure 1 below presents the typical architecture for an online delivery server, merging several information sources.
     

    Challenge: Reduce Development and Maintenance Cost for Applications

      Important applications are already deployed within corporations using such an architecture. However, most of them have been built using large, custom, low-level programs. Reducing development and maintenance cost for such applications is thus an important challenge for deploying cost-effective solutions and really benefiting from dynamic personalized web sites without spending millions of dollars.
      A first condition to significantly reduce development (and maintenance) cost is to rely on higher level components and abstractions than just low level XML and database manipulation and programming. A second condition is to offer high level GUI-based development tools that implement this architecture and guide developers throughout the implementation process.
      This is the approach we adopted when developing our Dual Prism product. The Web delivery framework we present below has been defined and implemented in this product.
     

    Proposal for a Dynamic Web Delivery Framework

     

    A Repository for XML Data

      Most applications require a repository for the XML data that is published. A general (and generic) XML repository is thus a key component in a Web delivery system.
      The XML repository is a structured information storage space that contains all the resources required to configure and publish the online delivery applications. We organize it as a set of Collections. Each Collection is made up of one or more XML Modules. A Module corresponds to an actual document or part of a document to be published. Each module may itself be divided into Sub-Modules, thus allowing hierarchies of XML modules to be defined.
      Several versions of a Module may exist at a given time, and the server can choose to publish one particular version of a module at a given time.
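To make the repository model more concrete, the following is a minimal Python sketch of Collections, Modules, Sub-Modules and versions. It is only an illustration under our own assumptions: the class names, the `published` field and the `walk` helper are hypothetical and do not describe the Dual Prism storage layer.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Optional


@dataclass
class Module:
    """One XML module; `versions` maps a version label to XML text."""
    name: str
    versions: Dict[str, str] = field(default_factory=dict)
    published: Optional[str] = None            # label of the version to serve
    sub_modules: List["Module"] = field(default_factory=list)

    def published_xml(self) -> str:
        """Return the XML text of the version chosen for publication."""
        if self.published is None:
            raise LookupError(f"no published version for {self.name}")
        return self.versions[self.published]

    def walk(self) -> Iterator["Module"]:
        """Yield this module and all of its sub-modules, depth first."""
        yield self
        for sub in self.sub_modules:
            yield from sub.walk()


@dataclass
class Collection:
    """A named set of modules."""
    name: str
    modules: List[Module] = field(default_factory=list)


chapter = Module(
    "chapter-1",
    {"v1": "<SEC1>draft</SEC1>", "v2": "<SEC1>final</SEC1>"},
    published="v2",
)
book = Module("book", {"v1": "<BOOK/>"}, published="v1", sub_modules=[chapter])
manuals = Collection("manuals", [book])
```

Here two versions of `chapter-1` coexist and the server would publish only the one labelled `v2`.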
      Genericity of the XML repository is key for providing a flexible system that can be used seamlessly to publish SGML Docbook documents, J2008 technical documentation modules, or hierarchies of catalog pages merging XML content with factual data.
      Of course, relational databases also play a major role in many of these applications for representing and managing factual data (prices in a catalog application, part references, etc.) and meta data (configuration information, keywords, etc.). However, modeling and decomposing every item of XML data used in an application into a set of relational tables would lead to complex and costly processes in many cases, and would lack genericity.
      We thus propose a framework where a native and generic XML data repository co-exists with relational databases that store factual data, meta data, and in some cases, also XML fragments.
      In addition to native and generic XML storage, the repository offers indexing and searching facilities that are used in applications. This includes:
     
  • Full-text indexing with possible language control down to the XML element level (US English, European, Japanese, Korean, etc.)
  • Indexing of XML elements and attributes (can be defined independently for each element)
  • Creation of custom (application specific) indexes
  • Cross collection searches
  • Fielded searches (any XML tag name or set of tag names can be specified as a search field)
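As an illustration of fielded searching, here is a small Python sketch of an inverted index keyed by tag name and word. It is a simplification under our own assumptions (function names are hypothetical, only the text directly inside each element is indexed, and there is no language control); a production engine would be considerably more elaborate.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict


def build_field_index(modules):
    """Map (tag, word) to the set of module names whose <tag> holds the word.

    Only the text directly inside each element is indexed in this sketch.
    """
    index = defaultdict(set)
    for name, xml_text in modules.items():
        for elem in ET.fromstring(xml_text).iter():
            for word in re.findall(r"\w+", (elem.text or "").lower()):
                index[(elem.tag, word)].add(name)
    return index


def search(index, word, fields=None):
    """Return sorted module names containing `word`, optionally by field."""
    word = word.lower()
    hits = set()
    for (tag, indexed_word), names in index.items():
        if indexed_word == word and (fields is None or tag in fields):
            hits |= names
    return sorted(hits)


modules = {
    "m1": "<DOC><TITLE>Brake pads</TITLE><P>Check the brake fluid</P></DOC>",
    "m2": "<DOC><TITLE>Engine</TITLE><P>Brake warning lamp</P></DOC>",
}
index = build_field_index(modules)
everywhere = search(index, "brake")
titles_only = search(index, "brake", fields={"TITLE"})
```

Restricting the search to the `TITLE` field narrows the result from both modules to `m1` only.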

    Structure Style Sheets : Controlling XML Module Fragmentation and Indexing

      A Structure Style Sheet (S-Style Sheet) describes how a particular type of XML module should be split into smaller chunks (called fragments) for delivery over the network. An S-Style Sheet also contains indexing information that is used to generate full-text and structure indexes for each module in the repository, and to generate various types of table of contents.
     
     

    Fragmentation

      Controlling XML module fragmentation in the XML repository is important, as we want the model to apply to a wide range of applications where XML modules can be of very different sizes. Typical examples could be:
     
  • a J2008 module (10 to 100 Kb)
  • a catalog page (less than 10 Kb)
  • a DOCBOOK chapter, or a full book (from 10 Kb to 4 Mb)
  • an ATA manual (from 10 Mb to 150 Mb)

      Only small fragments can be efficiently manipulated and transferred by the delivery server at a given time. The structure style sheet allows the application designer to express rules that are used by the repository to split a given module into a hierarchy of smaller fragments that become the elementary units for the server.
      Typical properties expressed in Structure Style Sheets rules are the following:
     
  • New-Page property : start a new fragment for a given container tag (CHAPTER, SECTION, IPC-PAGE, etc.)
  • No-Break property : make sure that the element is not split between two fragments.

      Properties are associated with XML element names (CHAPTER) or XML element contexts (SECTION in CHAPTER).
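The effect of the two fragmentation properties can be sketched in a few lines of Python. This is only an illustration under our own assumptions: rules are keyed by element name only (not by context), and the `FRAGMENT-REF` placeholder tag and the function names are hypothetical.

```python
import copy
import xml.etree.ElementTree as ET

NEW_PAGE = {"CHAPTER", "SECTION"}   # container tags that start a new fragment
NO_BREAK = {"TABLE"}                # elements that must stay in one piece


def fragment(root):
    """Split `root` into a list of (fragment_id, element) pairs.

    Each child carrying the new-page property is cut out of its parent and
    replaced by a FRAGMENT-REF placeholder. The input tree is not modified.
    """
    fragments = []

    def cut(elem):
        frag_id = len(fragments)
        frag = copy.deepcopy(elem)
        fragments.append((frag_id, frag))
        split(frag)
        return frag_id

    def split(elem):
        if elem.tag in NO_BREAK:
            return                  # never split inside a no-break element
        for position, child in enumerate(list(elem)):
            if child.tag in NEW_PAGE:
                ref = ET.Element("FRAGMENT-REF", {"id": str(cut(child))})
                ref.tail = child.tail
                elem[position] = ref
            else:
                split(child)

    cut(root)
    return fragments


doc = ET.fromstring(
    "<BOOK><CHAPTER><TITLE>One</TITLE><SECTION><P>a</P></SECTION>"
    "<TABLE><SECTION><P>kept</P></SECTION></TABLE></CHAPTER>"
    "<CHAPTER><P>b</P></CHAPTER></BOOK>"
)
frags = fragment(doc)
```

In this sample the book yields four fragments (the book shell, two chapters and one section), while the section nested inside `TABLE` stays with its table because of the no-break rule.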
     
     

    Table of Contents

      Structure Style Sheets also contain properties that express how tables of contents should be generated (if needed) for each module. Those properties are:
     
  • Toc-Names property : adds the given container tag to the listed tables of contents.
  • Title property : indicates the title that should be used to represent the container tag in the table of contents.

      Table of contents properties are set for all XML tags that should appear in one or several tables of contents. This specification allows tables of contents to be generated completely automatically.
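The generation process can be pictured with the following Python sketch. It assumes a hypothetical in-memory form of the two properties (a tag mapped to its table of contents names and to the tag of its title child), using the SEC1/TSECT1 names that appear in the style sheet example of this paper; it is not the actual repository code.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule table:
# tag -> (names of the tables of contents it belongs to, tag of its title child)
TOC_RULES = {
    "SEC1": (["tableOfContent", "listOfFigures"], "TSECT1"),
    "SEC2": (["tableOfContent"], "TSECT2"),
}


def build_tocs(root):
    """Return {toc_name: [(depth, title), ...]} in document order."""
    tocs = {}

    def visit(elem, depth):
        rule = TOC_RULES.get(elem.tag)
        if rule is not None:
            toc_names, title_tag = rule
            title = elem.findtext(title_tag, default="")
            for name in toc_names:
                tocs.setdefault(name, []).append((depth, title))
            depth += 1              # nested entries sit one level deeper
        for child in elem:
            visit(child, depth)

    visit(root, 0)
    return tocs


doc = ET.fromstring(
    "<DOC><SEC1><TSECT1>Intro</TSECT1>"
    "<SEC2><TSECT2>Scope</TSECT2></SEC2></SEC1>"
    "<SEC1><TSECT1>Usage</TSECT1></SEC1></DOC>"
)
tocs = build_tocs(doc)
```

A single pass over the module is enough to fill every table of contents at once, since each tagged container simply appends itself to the tables it names.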
     
     

    Indexing

      Being able to search over a collection of XML modules is of course a major requirement for an XML repository. A generic XML searching engine must be able to search on text, tags, attribute values (and any combination of those). Many applications, however, also require specific indexes to be constructed:
     
  • an index that associates a section title with an ID, for presenting section titles in cross reference elements,
  • an index that associates a figure with a part reference, to be able to insert an icon whenever the part reference is encountered in the document module.

     Of course, such selections could be dynamically computed from the underlying generic indexes (using some kind of XML query language). However, XML query evaluation is complex and difficult to optimize, so this could rapidly become a performance bottleneck for very dynamic applications.
     We thus enable the creation of custom indexes that are constructed and stored in the repository. A custom index maps character strings (keys) to lists of objects of various kinds (values). These objects can be strings, numbers, or XML elements in the repository.
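A custom index of this kind can be sketched in Python as a mapping from string keys to lists of values. The class and function names below are our own illustrative assumptions, and the sample builds the cross-reference index (ID to section title) mentioned above.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


class CustomIndex:
    """Maps string keys to lists of values (strings, numbers or elements)."""

    def __init__(self):
        self._entries = defaultdict(list)

    def add_entry(self, key, value):
        self._entries[key].append(value)

    def lookup(self, key):
        """Return a copy of the values stored under `key` (empty if none)."""
        return list(self._entries.get(key, []))


def build_xref_index(root):
    """Index every SEC1 element: ID attribute -> section title text."""
    index = CustomIndex()
    for section in root.iter("SEC1"):
        ident = section.get("ID")
        if ident is not None:
            index.add_entry(ident, section.findtext("TSECT1", default=""))
    return index


doc = ET.fromstring(
    '<DOC><SEC1 ID="s1"><TSECT1>Intro</TSECT1></SEC1>'
    '<SEC1 ID="s2"><TSECT1>Usage</TSECT1></SEC1></DOC>'
)
xref = build_xref_index(doc)
```

Because the index is built once, when the module is loaded, resolving a cross reference at delivery time becomes a plain key lookup rather than a query over the document.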
     
     

    Example

      <STYLE NAME='SEC1' INHERIT='#DEFAULT'>
        <NEW-PAGE>"true"</NEW-PAGE>
        <TOC-NAME>"tableOfContent , listOfFigures"</TOC-NAME>
        <TITLE> child("TSECT1") </TITLE>
        <CUSTOM> addEntry("XREF",attribute("ID"),child("TSECT1")) </CUSTOM>
      </STYLE>
      <STYLE NAME='SEC2' INHERIT='#DEFAULT'>
        <NEW-PAGE>"true"</NEW-PAGE>
        <TOC-NAME>"tableOfContent"</TOC-NAME>
        <TITLE> child("TSECT2") </TITLE>
        . . .
     
     

    Scripting

     All properties in style sheets or templates can be either simple values (like the value of the NEW-PAGE property above) or can be computed using an expression (like the value of the TITLE property above). In Dual Prism, we use the Balise language for scripting.
      Expressions in Structure Style Sheets are able to access any XML information in the repository. The child("TSECT1") example above specifies that the title to be used in the table of contents for the SECT1 element should be retrieved by searching for the first child of the SECT1 element with the name TSECT1. All navigation primitives defined in the W3C DOM (Document Object Model) are available here, together with general data manipulation functions.
     

    Templates : the Central Component

     Templates control the overall organization and layout of an application. They define placeholders (or flows) within a page into which information from the XML Repository or some external data source is inserted.
      In other words, templates specify which data goes into a page and where that data should be displayed. We designed the template mechanism to be as flexible as possible:
     
  • When processing simple data strings (such as a list of prices or product names extracted from a relational database), a template can specify directly how this information will be displayed in the page, constructing HTML markup as necessary to format the data for display.
  • When processing data that is already structured (such as XML objects extracted from a database or stored in the XML Repository), a template can invoke a Rendering Style Sheet to transform the XML data into a display structure (today an HTML fragment) before placing the data in the page at the appropriate position.

      Perhaps more importantly, you can mix both of the above-mentioned 'behaviors' together in the same application, as required.
      Templates are XML documents that contain both standard (typically HTML) tags, and a combination of special XML tags and custom functions. Standard tags are sent to the browser as is, whereas special tags are interpreted by the server and are either replaced with content or used to trigger specific actions, such as evaluating a function or launching a database query or external script.
      Templates can thus be used for a wide range of complex manipulations, including:
     
  • Fetching data from the XML repository or an external database
  • Applying a Rendering Style Sheet to XML data to render that data using HTML markup
  • Executing custom scripts
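The distinction between standard tags (passed through as is) and special tags (interpreted by the server) can be illustrated with the Python sketch below. It is a deliberate simplification under our own assumptions: the namespace URI is hypothetical, and the special tag names a function through a `call` attribute instead of carrying a script expression.

```python
import xml.etree.ElementTree as ET

DP = "{urn:example:dp}"          # hypothetical namespace for special tags


def expand(elem, functions):
    """Return a copy of `elem` with special tags replaced by their results."""
    out = ET.Element(elem.tag, dict(elem.attrib))
    out.text = elem.text
    last = None                  # last child appended to `out`, if any
    for child in elem:
        if child.tag == DP + "EVAL":
            # Special tag: call the named function and splice in its text.
            text = str(functions[child.get("call")]()) + (child.tail or "")
            if last is None:
                out.text = (out.text or "") + text
            else:
                last.tail = (last.tail or "") + text
        else:
            # Standard tag: copy it through unchanged, recursing into it.
            last = expand(child, functions)
            last.tail = child.tail
            out.append(last)
    return out


template = ET.fromstring(
    '<HTML xmlns:DP="urn:example:dp"><BODY>'
    '<P>Price: <DP:EVAL call="price"/> EUR</P>'
    "</BODY></HTML>"
)
page = ET.tostring(expand(template, {"price": lambda: 42}), encoding="unicode")
```

The resulting page contains only standard tags; the special tag has been replaced by the value computed on the server.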
     

    Example : A Template fetching a Product List from a Database

      <DP:TEMPLATE>
        <DP:SCRIPT>
          function fetchProductList ( ) {
            // send an SQL query to a RDB
            . . .
            // return a list of items, with fields REFERENCE and DESCRIPTION
          }
        </DP:SCRIPT>
        <HTML><BODY>
          <TABLE BORDER="1">
            <DP:FOREACH ITEMS='{ fetchProductList( ) }'>
              <TR><TD><DP:EVAL>item( ) ["REFERENCE"]</DP:EVAL></TD>
                  <TD><DP:EVAL>item ( ) ["DESCRIPTION"]</DP:EVAL></TD>
              </TR>
            </DP:FOREACH>
          </TABLE>
        </BODY></HTML>
      </DP:TEMPLATE>
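For readers unfamiliar with the template notation, the same loop can be approximated in Python. The sketch below uses an in-memory SQLite database as a stand-in for the relational database; the table name, column names and sample rows are assumptions made for illustration only.

```python
import sqlite3
from html import escape


def fetch_product_list(conn):
    """Return rows as dicts with REFERENCE and DESCRIPTION fields."""
    cursor = conn.execute(
        "SELECT reference, description FROM products ORDER BY reference"
    )
    return [{"REFERENCE": ref, "DESCRIPTION": desc} for ref, desc in cursor]


def render_product_table(items):
    """Emit one table row per item, mirroring the DP:FOREACH loop."""
    rows = [
        "<TR><TD>{}</TD><TD>{}</TD></TR>".format(
            escape(item["REFERENCE"]), escape(item["DESCRIPTION"])
        )
        for item in items
    ]
    return '<TABLE BORDER="1">' + "".join(rows) + "</TABLE>"


# Stand-in data source: an in-memory database with two sample products.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (reference TEXT, description TEXT)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?)",
    [("A-100", "Bolt & nut"), ("B-200", "Washer")],
)
page = render_product_table(fetch_product_list(conn))
```

Escaping the database values before inserting them into the markup keeps characters such as `&` from corrupting the generated page.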
     
     

    Example : A Template Displaying an XML Fragment

      <DP:TEMPLATE module='*'>
        <HTML><BODY>
          <HR/>
          <DP:RENDER module = '{ param("module") }' rstyle='default' />
          ..<HR/>
          <P>This page has been dynamically generated by Dual Prism</P>
        </BODY></HTML>
      </DP:TEMPLATE>
      In this example, the template takes a required parameter called "module". This parameter contains the name (identifier) of a module in the repository. It generates an HTML page with a top and bottom rule, and triggers the rendering style sheet called "default" on the given module. All tags produced by applying the rendering style sheet on the module are inserted in place of the DP:RENDER tag.
     
     

    Example : A Template Displaying a dynamically assembled XML Fragment

      <DP:TEMPLATE>
        <DP:SCRIPT>
          function assemblePage ( ) {
            // parse XML file 1
            // parse XML file 2
            // merge the two XML trees
            // return the merged XML tree
          }
        </DP:SCRIPT>
        <HTML><BODY>
          <HR/>
          <DP:RENDER node = '{ assemblePage() }' rstyle='page' />
          ..<HR/>
          <P>This page has been dynamically generated by Dual Prism</P>
        </BODY></HTML>
      </DP:TEMPLATE>
     In this example, the assemblePage function is called. This function dynamically assembles an XML fragment from various sources (XML files that are parsed, SQL queries, or other sources). The resulting XML fragment is then passed to the DP:RENDER tag to be rendered using a rendering style sheet called "page".
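The assembly step itself can be sketched in Python as follows. The `PAGE` wrapper element and the function signature are assumptions; the sources are given as XML strings here, whereas a real application would parse files or wrap SQL results.

```python
import xml.etree.ElementTree as ET


def assemble_page(*sources):
    """Parse each XML string and merge the trees under one PAGE root."""
    page = ET.Element("PAGE")          # hypothetical wrapper element
    for xml_text in sources:
        page.append(ET.fromstring(xml_text))
    return page


merged = assemble_page(
    "<HEAD><T>News</T></HEAD>",
    "<BODY><P>Hello</P></BODY>",
)
merged_xml = ET.tostring(merged, encoding="unicode")
```

The merged tree is an ordinary XML fragment, so it can be handed to the rendering step exactly like a module fetched from the repository.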
     

    Rendering Style Sheets : On the Fly Generation of HTML for a Fragment

      Whereas templates specify which data goes into a page and where that data should be displayed, Rendering Style Sheets (R-Style Sheets) define how XML data should be displayed.
      When you need to display structured XML/SGML content within a page, the template invokes a Rendering Style sheet that transforms the original XML structure into a presentation structure (generally a series of HTML tags), suitable for display in a Web browser. The template then places this formatted content into the page at the appropriate position.
      The separation of content formatting and page layout in this way enables you to design highly modular applications. You can design several alternative page layouts (by creating several templates) and format document content in each case using a single Rendering Style Sheet. Alternatively, you can create several different Rendering Style Sheets for use with a single Template and dynamically select which one to use in order to present information differently, based on the profile or needs of the user.
      An R-Style Sheet contains a list of style rules and their associated properties. These style rules describe how the structure of the source data should be transformed into a presentation structure for delivery over the Web.
      As with Templates, Rendering Style Sheets are extremely flexible:
     
  • A basic Rendering Style Sheet might simply indicate which HTML elements to use to render the content of an XML element in the source data.
  • A Rendering Style Sheet can also generate XML markup for one or more elements in the source data for display or processing by more recent browsers or custom applications.
  • A Rendering Style Sheet can generate Cascading Style Sheet (CSS) instructions together with HTML or XML markup to enable you to make full use of the document formatting capabilities of the latest Web browsers.
  • A style can be used to redirect the content of one or more elements into a particular flow in the associated Template.
  • A style can be used to hide the content of one or more elements.

      Style properties can contain any combination of Balise expressions and function calls, enabling you to use all the flexibility of the scripting language to perform a wide range of data manipulations.
     
     

    Example: Simple Rendering

      <STYLE NAME='ILIST' INHERIT='#DEFAULT'>
        <TAG>"UL"</TAG>
      </STYLE>
      <STYLE NAME='ITEM' INHERIT='#DEFAULT'>
        <TAG>"LI"</TAG>
      </STYLE>
      <STYLE NAME='P' INHERIT='#DEFAULT'>
        <TAG>"P"</TAG>
      </STYLE>
      The simplest rendering style sheet property is the Tag property that maps a given XML element to a rendering element, here the HTML UL, LI and P elements. Other properties allow prefix and suffix generation, CSS elements and attribute generations, automatic generation of HTML links and anchors, etc.
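The effect of the Tag property can be reproduced with a short Python sketch that renames elements while copying the tree. The `SPAN` fallback for unmapped elements is our own assumption, not a documented default.

```python
import xml.etree.ElementTree as ET

# Source element name -> rendering element name, as in the style rules above.
TAG_MAP = {"ILIST": "UL", "ITEM": "LI", "P": "P"}


def render(elem):
    """Copy `elem`, renaming each element according to TAG_MAP."""
    out = ET.Element(TAG_MAP.get(elem.tag, "SPAN"))   # fallback is assumed
    out.text, out.tail = elem.text, elem.tail
    for child in elem:
        out.append(render(child))
    return out


source = ET.fromstring(
    "<ILIST><ITEM><P>one</P></ITEM><ITEM><P>two</P></ITEM></ILIST>"
)
html = ET.tostring(render(source), encoding="unicode")
```

The structure of the source is preserved; only the vocabulary changes from the document's own tags to HTML.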
     
     

    Example: Using Expressions for Dynamicity

      <STYLE NAME='FLETTER' INHERIT='#DEFAULT'>
        <TAG>"P"</TAG>
        <TEXT-BEFORE> fetchInDB(attribute("LETTER")) </TEXT-BEFORE>
      </STYLE>
      In this example, a script function (fetchInDB) is called with an attribute value extracted from the FLETTER element. The possibility to call any scripting function when evaluating a rendering style sheet property is essential for merging various sources of information in a consistent way, and for implementing all sorts of dynamic filtering and assembly down to the XML element level.
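A computed property of this kind can be sketched in Python by storing a function in the style instead of a constant. In the sketch below a dictionary stands in for the database, and the `LETTERS` table, its contents and the style layout are hypothetical.

```python
import xml.etree.ElementTree as ET

LETTERS = {"A": "Dear customer, ", "B": "Dear partner, "}   # stand-in for a DB


def fetch_in_db(key):
    """Look up the text associated with `key` (empty string if unknown)."""
    return LETTERS.get(key, "")


# Each style has a target tag and an optional text-before expression,
# represented here as a function of the source element.
STYLES = {
    "FLETTER": {
        "tag": "P",
        "text_before": lambda elem: fetch_in_db(elem.get("LETTER")),
    },
}


def render(elem):
    """Render `elem`, evaluating the text-before expression if one is set."""
    style = STYLES.get(elem.tag, {"tag": elem.tag})
    out = ET.Element(style["tag"])
    before = style["text_before"](elem) if "text_before" in style else ""
    out.text = before + (elem.text or "")
    out.tail = elem.tail
    for child in elem:
        out.append(render(child))
    return out


source = ET.fromstring('<FLETTER LETTER="A">welcome.</FLETTER>')
html = ET.tostring(render(source), encoding="unicode")
```

Because the expression receives the source element, the generated text can depend on any attribute or content of that element, which is what makes element-level filtering and assembly possible.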
     
     

    Relationship with XML and XSL

      Our rendering style sheets have a role in the framework that is very similar to the role planned for the forthcoming XSL standard, even if this standard is referred to as a way of expressing XML rendering at the browser level, not at the server level.
      As soon as the XSL proposals stabilize into a W3C standard, XSL will be the perfect candidate for modeling the rendering style sheet part of our framework. We plan to use XSL for rendering in the Dual Prism product.
      HTML or XML can be used as the rendering format. Templates and rendering style sheets produce flows of tagged data (they actually produce trees of elements and content). Today's applications use the HTML vocabulary as it is the one understood by the current generation of browsers, but XML fragments could be generated for display if browsers could understand and interpret them.
      When this becomes possible, XSL will also take a place at the browser level for interpreting and displaying XML fragments dynamically generated by the server. The rendering engine in the server will then be limited to filtering and assembly tasks and will no longer "render/convert" the XML tags.
     

    Scripting at Every Corner

      Scripting stands at every corner in our framework. Structure style sheets, templates, and rendering style sheets can include Balise expressions that are dynamically evaluated; they can also refer to external function libraries.
      Database access is one of the most important facilities accessible through scripting. Minimum database support includes ODBC and native Oracle interfaces that enable you to integrate and manipulate external data sources directly from within a publication.
      This scripting capability is what makes the proposed framework more than a simple template system and allows arbitrarily complex Web publication applications to be developed.
