
 

WDDX: Distributed Data for the Web

 
Simeon  Simeonov
Manager, Language Technology,  Allaire Corporation 
 One Alewife Center
Cambridge  (Massachusetts)  (USA) 02140 
Email: simeons@allaire.com

Biographical notice

Simeon (Sim) Simeonov has been developing software for more than ten years. Sim's areas of expertise encompass object-oriented technology, compiler theory, web application development, and tag-based languages. In his current role as Manager of Language Technology at Allaire, Sim provides direction for the evolution of the ColdFusion Markup Language (CFML) and the architecture of the ColdFusion Application Server. Sim's initiatives at Allaire have brought about ColdFusion's tag-based extensibility model, the CFML XML-based meta-information repository, and the Web Distributed Data Exchange (WDDX) technology.

 

Introduction

  During the last two years, the Web has emerged as a new platform for network applications. The combination of browsers, servers, and network protocols with application servers and back office enterprise technologies has made a new generation of business applications possible. As the Web matures, one of the most important challenges will be the exchange of complex data within applications and between application environments. This paper describes a new technology proposed by Allaire, Web Distributed Data Exchange (WDDX), designed to facilitate the exchange of complex data structures between common web programming environments. WDDX is a pragmatic solution to an important problem faced by developers today. It complements other standards and technologies that are available or proposed. Finally, it is applicable now to a wide range of applications and is already available for many of the most popular language environments on the Web.
 

Understanding the Challenge

 There is a fundamental disconnect between web technologies and the majority of efforts directed at extending them towards the goal of building a distributed application framework. The Web works because it is simple. HTTP is simple. HTML is simple. Scripting languages such as ECMAScript are simple.
  Simplicity does not mean low functionality. On the contrary, the dizzying array of applications and services that are emerging on the Web work only to the degree that they maintain the simplicity and the strict standards that make them available to the largest possible audience regardless of client platform. To date there have been a number of efforts to adapt solutions from the client-server world to build distributed applications that work over the Web. But client-server follows a fundamentally different model of distributed computing based on using Remote Procedure Calls (RPC) to invoke objects in distant applications.
  RPC-based approaches to distributed computing are highly efficient and provide control for developers over all aspects of communication and state management. But the RPC distributed object model is more difficult and is expensive in terms of development time and resources. In addition, distributed object frameworks such as COM, CORBA and Java's Remote Method Invocation (RMI) require dedicating future development to the same framework to ensure interoperability. They can work alongside the Web, but they lack the transparency, simplicity and broad scope that made the Web successful. A similar problem has emerged at the server. With the proliferation of application servers and frameworks, the transparency of HTTP is compromised by application-specific data structures that make communication between different vendors' servers difficult, if not impossible.
 While it is important to use the fundamental advantage of client-server that these approaches bring to the Web (namely, the sharing of processing power between client machines, when available, and different tiers of servers to enable higher scalability of distributed web apps), the approach to these problems must also follow the model that makes the Web work, or the Web will collapse under increasing complexity.
 At the same time, one would like to move away from the world in which web application servers and browsers are separated by the walls imposed by under-powered data exchange mechanisms to a world where web applications execute on the entire network. The key to solving each of these problems is providing a mechanism for exchanging structured data in a generic, cross-platform and Web-friendly way.
 

Meeting the Challenge With WDDX

 WDDX overcomes the challenges described above by providing a flexible, open and pragmatic way to solve the problem of structured data exchange in web applications. Very simply, WDDX is a mechanism for exchanging complex data structures between application environments. WDDX consists of a language and platform neutral representation of instantiated data based on XML 1.0 and a set of modules that translate native language environment data structures into XML and vice versa.
  The usefulness of WDDX is best demonstrated with an example. A common use is server-to-server data exchange. For example, a ColdFusion order tracking web application could consume WDDX data produced by FedEx's Perl-based package tracking web application thus integrating package tracking and delivery confirmation into the corporate order tracking process across the Internet. This is one of many examples. (See below for other scenarios.) WDDX does not replace the need for other XML vocabularies. However, it can transform the heterogeneous direction of distributed web applications into a more seamless, interoperable environment.
 While WDDX packets are human readable because they are XML documents, it is expected that applications will use the format as a wire protocol only. This means that many developers who would otherwise not have time or inclination to learn to build XML can be brought into the fold. What's more, for interactions between web-based services that may have slower or different adoption rates and strategies for XML, WDDX provides a baseline that all applications can depend on when transmitting structured information. It is a bootstrap technology. Once the advantages of interoperability are apparent, it will be easier to convert the broad base of developers to supporting the full potential of XML.
 

WDDX vs. Other Approaches

 Because of the explosion of approaches to problems related to those that WDDX seeks to solve, it is extremely important to outline what the proposal is meant to solve. It is perhaps easiest to do this by contrasting it with a number of other technologies, proposals and strategies that have emerged in recent months.
 

WDDX, XML and the DOM

  Conceptually, WDDX overlaps some of the general goals of XML. In particular, the idea of arbitrarily moving structured data between applications would seem to be handled by the larger framework of validating parsers being able to extract information from structured documents regardless of their origin.
 While this is true in the abstract, much of the required infrastructure for typing data and validating structures is a valuable addition to the general portability of XML. WDDX can be thought of as a very high-level API built on top of the DOM. For all XML data not based on the WDDX DTD, DOM processing makes the most sense.
 

WDDX, SOAP and WebBroker

  WDDX isn't about objects. WDDX is a mechanism for structured data exchange. It cannot be used in object-oriented scenarios to exchange object instances that have some complex interrelations. It is not well suited as a backbone technology for distributed object applications running on the Web. DataChannel's WebBroker and Microsoft's Simple Object Access Protocol (SOAP) are more appropriate for this task.
 

WDDX and WIDL

  WDDX is an XML vocabulary for representing application-level data structures in a portable, text-based format. It offers no facilities for structured data extraction/generation. Much of webMethods's Web Interface Definition Language (WIDL) is about the extraction of information from unstructured sources such as HTML pages.
 

WDDX and RDF

  A WDDX XML data packet is a structurally equivalent representation of some application-level data structures. WDDX provides no mechanism via which the contents of its packet can be described or validated against some notion of what a "valid" data packet is. Therefore, WDDX is not related to RDF (Resource Description Framework) and does not use any of its capabilities.
 

WDDX and Datasource Objects

  XML DSOs appeared in IE4 as a means for representing XML data in a tabular format. XML DSOs consume XML data and expose it as something similar to a recordset object for data binding. The WDDX XML format is used only internally (between the WDDX serializer and deserializer objects). Therefore, datasources that produce some generic XML documents cannot be used with WDDX.
 

WDDX and XML DTDs

 WDDX abstracts the process of DTD creation, XML data production, and XML data parsing for application-level data exchange. WDDX's timing is excellent because, at present, (a) there aren't many native XML datasources on the Web, (b) developers lack knowledge of XML, and (c) most interesting data lives in some form of application data structure at some point, e.g., in a recordset before becoming a report web page. In the long run, many datasources will benefit from exposing some form of native non-WDDX XML interface for a variety of reasons, e.g., validation and publishing.
 

WDDX and Traditional RPC-based Systems

  As discussed in the introduction, the primary goal of WDDX is to provide a more Web-like way to transmit structured data objects between network entities without changing the programmatic approach to developing web applications from page-based to object-based. The following table outlines the primary attributes of each approach.
 
 
                        Object Method Requests                                 Structured Data Exchange
 ---------------------  -----------------------------------------------------  -----------------------------------------------------------------
 Interaction Frequency  High                                                   Low (once per HTTP request)
 Packet size            Small (usually <1 KB)                                  Medium (usually >1 KB)
 Data types             Simple, strictly typed and based on an interface       Complex, strictly typed but with no interface specifying the data
 Data binding           Anonymous data bound by position to strict interfaces  Named data bound to variable contents

 Comparing RPC and WDDX Approaches to Distributing Structured Data
 

Technical Overview

 The WDDX technology is based on two basic elements: the WDDX DTD and the serialization/deserialization modules. Because the modules simply perform a translation function, understanding WDDX requires a thorough explanation of the WDDX DTD and the contents of WDDX packets.
 

WDDX packets

 Whenever an application converts data structures into WDDX a WDDX packet is created. This packet contains an XML representation of the data structures. The WDDX DTD can be used to validate WDDX packets. The following is an example of a WDDX packet:
 

<?xml version='1.0'?>
<!DOCTYPE wddxPacket SYSTEM 'wddx_0090.dtd'>
<wddxPacket version='0.9'>
    <header/>
    <data>
        <struct>
            <var name='s'>
                <string>a string</string>
            </var>
            <var name='n'>
                <number>-12.456</number>
            </var>
            <var name='d'>
                <dateTime>1998-06-12T04:32:12</dateTime>
            </var>
            <var name='b'>
                <boolean value='true'/>
            </var>
            <var name='a'>
                <array length='2'>
                    <number>10</number>
                    <string>second element</string>
                </array>
            </var>
            <var name='obj'>
                <struct>
                    <var name='s'>
                        <string>a string</string>
                    </var>
                    <var name='n'>
                        <number>-12.456</number>
                    </var>
                </struct>
            </var>
            <var name='r'>
                <recordset rowCount='2' fieldNames='NAME,AGE'>
                    <field name='NAME'>
                        <string>John Doe</string>
                        <string>Jane Doe</string>
                    </field>
                    <field name='AGE'>
                        <number>34</number>
                        <number>31</number>
                    </field>
                </recordset>
            </var>
        </struct>
    </data>
</wddxPacket>
 It defines a root-level object that is a structure (also known as an associative array) of seven properties:
 
  • s which is the string 'a string',
  • n which is the number -12.456,
  • d which is the date-time value June 12, 1998 4:32:12am,
  • b which is the boolean value true,
  • a which is an array of two elements (10 and 'second element'),
  • obj which is a structure with two properties s and n, and
  • r which is a recordset of two rows with fields NAME and AGE.
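
The mapping from packet elements to native data can be illustrated with a short sketch. The following Python code is not one of the WDDX modules described in this paper; it is a minimal, hypothetical deserializer built on the standard xml.etree.ElementTree parser. The function names decode and deserialize, the SAMPLE packet, and the choice to return a recordset as a dictionary of columns are assumptions made for illustration. Time zone suffixes on date-time values are not handled here.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# A reduced version of the example packet (hypothetical test data).
SAMPLE = """<wddxPacket version='0.9'>
  <header/>
  <data>
    <struct>
      <var name='s'><string>a string</string></var>
      <var name='n'><number>-12.456</number></var>
      <var name='d'><dateTime>1998-06-12T04:32:12</dateTime></var>
      <var name='b'><boolean value='true'/></var>
      <var name='a'><array length='2'><number>10</number><string>second element</string></array></var>
      <var name='r'>
        <recordset rowCount='2' fieldNames='NAME,AGE'>
          <field name='NAME'><string>John Doe</string><string>Jane Doe</string></field>
          <field name='AGE'><number>34</number><number>31</number></field>
        </recordset>
      </var>
    </struct>
  </data>
</wddxPacket>"""


def decode(node):
    """Convert one WDDX value element into a native Python object."""
    tag = node.tag
    if tag == "string":
        # Character data plus any <char code='..'/> children, in document order.
        parts = [node.text or ""]
        for ch in node:
            parts.append(chr(int(ch.get("code"), 16)))
            parts.append(ch.tail or "")
        return "".join(parts)
    if tag == "number":
        return float(node.text)
    if tag == "boolean":
        return node.get("value") == "true"
    if tag == "dateTime":
        # Assumption: no time zone suffix, as in the example packet.
        return datetime.strptime(node.text.strip(), "%Y-%m-%dT%H:%M:%S")
    if tag == "array":
        return [decode(child) for child in node]
    if tag == "struct":
        # Each <var> holds exactly one value element.
        return {var.get("name"): decode(var[0]) for var in node}
    if tag == "recordset":
        # Assumption: expose the recordset as a mapping of field name to column.
        return {field.get("name"): [decode(cell) for cell in field] for field in node}
    raise ValueError(f"unexpected WDDX element: {tag}")


def deserialize(packet_xml):
    """Return the first top-level value found inside the <data> element."""
    root = ET.fromstring(packet_xml)
    return decode(root.find("data")[0])


if __name__ == "__main__":
    print(deserialize(SAMPLE))
```

All numbers come back as floating-point values, which matches the 8-byte floating-point representation discussed under basic data types.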

    Basic data types

      WDDX supports the following basic data types: boolean (true/false), number, date-time, and string.
     
  • Numbers  Numbers are internally represented with floating point numbers. Because of differences between WDDX-enabled languages, the range of numbers has been restricted to +/-1.7E+/-308. The precision has been restricted to 15 digits after the decimal point. These requirements are consistent with an 8-byte floating-point representation.
  • Date-time values  Date-time values are encoded according to the full form of ISO8601, e.g., 1998-9-15T09:05:32+4:0. Note that single-digit values for months, days, hours, minutes, or seconds do not need to be zero-prefixed. While timezone information is optional, it must be successfully parsed and used to convert to local date-time values. Efforts should be made to ensure that the internal representation of date-time values does not suffer from Y2K problems and covers a sufficient range of dates. In particular, years must always be represented with four digits.
  • Strings  Strings can be of arbitrary length and must not contain embedded nulls. To facilitate the inclusion of control characters in strings, the <string> element can contain <char code='??'/> elements. The value of the code attribute is a two-character representation of the UTF-8 hex code for a given control character. For example, <char code='0C'/> represents the form feed character. Control characters are characters in the UTF-8 range 00-1F. Note that tab (09) and newline (0A) characters can be included directly in XML text. The XML 1.0 specification Section 2.11 requires XML processors to not pass carriage return (0D) characters to applications.
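
The relaxed date-time format is easy to get wrong with a strict ISO 8601 parser, so a small sketch may help. The Python function below, parse_wddx_datetime, is a hypothetical helper (the name and the regular expression are assumptions, not part of any WDDX module). It accepts single-digit fields and an optional time zone offset and returns a time zone aware value when an offset is present.

```python
import re
from datetime import datetime, timedelta, timezone

# Four-digit year; one or two digits for every other field; optional offset.
_WDDX_DATETIME = re.compile(
    r"^(\d{4})-(\d{1,2})-(\d{1,2})T(\d{1,2}):(\d{1,2}):(\d{1,2})"
    r"(?:([+-])(\d{1,2}):(\d{1,2}))?$"
)


def parse_wddx_datetime(text):
    """Parse a WDDX date-time string into a datetime object."""
    match = _WDDX_DATETIME.match(text.strip())
    if match is None:
        raise ValueError(f"not a WDDX date-time value: {text!r}")
    groups = match.groups()
    year, month, day, hour, minute, second = (int(g) for g in groups[:6])
    sign, offset_hours, offset_minutes = groups[6:]
    tzinfo = None
    if sign is not None:
        offset = timedelta(hours=int(offset_hours), minutes=int(offset_minutes))
        tzinfo = timezone(-offset if sign == "-" else offset)
    return datetime(year, month, day, hour, minute, second, tzinfo=tzinfo)


if __name__ == "__main__":
    value = parse_wddx_datetime("1998-9-15T09:05:32+4:0")
    print(value.isoformat())
    # Conversion to the local time zone, as required for deserialization.
    print(value.astimezone().isoformat())
```

Values without an offset are returned as naive datetime objects; how those should be interpreted is left to the application.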

    End-of-line handling

     End-of-line characters have platform and programming language specific representations. Different application environments may use a single newline (0A), a single carriage return (0D), or a carriage return and newline combination (0D0A). For the purposes of successful data encoding and translation the elements <char code='0A'/> and <char code='0D'/> must be used to encode newline and carriage return characters when they should be preserved in the deserialized string. Note that Section 2.11 of the XML 1.0 specification requires XML processors to translate all occurrences of carriage returns and the carriage return, newline combination to a single newline character. Therefore, for the purposes of XML, end-of-line is represented by a single newline character.
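
A minimal sketch of the string encoding rules follows. The Python function encode_wddx_string is hypothetical and is not taken from any WDDX implementation. It assumes a conservative policy: every control character in the range 00-1F except tab is written as a <char> element, which covers newline and carriage return so that end-of-line sequences survive XML parsing unchanged.

```python
from xml.sax.saxutils import escape


def encode_wddx_string(value):
    """Return a <string> element for value, using <char> for control characters."""
    if "\x00" in value:
        raise ValueError("WDDX strings must not contain embedded nulls")
    pieces = []
    for ch in value:
        code = ord(ch)
        if code < 0x20 and code != 0x09:
            # Control character (including 0A and 0D): two-digit hex code.
            pieces.append(f"<char code='{code:02X}'/>")
        else:
            # Ordinary character: escape &, < and > for XML.
            pieces.append(escape(ch))
    return "<string>" + "".join(pieces) + "</string>"


if __name__ == "__main__":
    print(encode_wddx_string("line one\r\nline two"))
```

A serializer that does not need to preserve the exact end-of-line sequence could instead emit newlines directly, as the XML specification permits.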
     

    Complex data types

     WDDX supports the following complex data types: arrays, structures, and recordsets.
     
  • Arrays  Arrays are integer-indexed collections of objects of arbitrary type. The starting index value is usually 0 with the notable exception of CFML whose arrays have an initial index value of 1. Because of these differences, working with array indices can lead to non-portable data.
  • Structures  Structures are string-indexed collections of objects of arbitrary type. In many languages they are known as associative arrays. Structures contain one or more variables. Because some of the languages supported by WDDX are not case-sensitive, no two variable names can differ only by their case. Variable names must satisfy the regular expression [_.0-9A-Za-z]+ where the '.' stands for a period, not 'any character'.
  • Recordsets  Recordsets are tabular data encapsulations: a set of named fields with the same number of rows of data. Only simple data types can be stored in recordsets. For tabular data storage of complex data types, an array of structures should be used. Because some of the languages supported by WDDX are not case-sensitive, no two field names can differ only by their case. Field names must satisfy the regular expression [_.0-9A-Za-z]+ where the '.' stands for a period and not 'any character'.
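
The naming rules for structure variables and recordset fields can be checked mechanically. The sketch below is a hypothetical Python helper, find_name_problems (the name and the message wording are assumptions); it applies the regular expression given above and reports names that collide when case is ignored.

```python
import re

# Inside a character class the '.' is a literal period.
_NAME_PATTERN = re.compile(r"[_.0-9A-Za-z]+")


def find_name_problems(names):
    """Return a list of human-readable problems; an empty list means the names are valid."""
    problems = []
    seen = {}  # lower-cased name -> first spelling encountered
    for name in names:
        if not _NAME_PATTERN.fullmatch(name):
            problems.append(f"invalid characters in name: {name!r}")
            continue
        key = name.lower()
        if key in seen:
            problems.append(f"{name!r} differs from {seen[key]!r} only by case")
        else:
            seen[key] = name
    return problems


if __name__ == "__main__":
    print(find_name_problems(["NAME", "AGE", "name", "zip code"]))
```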

    Data type comparison

     The following table compares the basic WDDX data types with those of languages/technologies commonly used on the Web.
     
     
     WDDX Type   COM Type         Java Type             ECMAScript Type
     ---------   --------------   -------------------   ---------------
     boolean     BOOL             java.lang.Boolean     boolean
     number      double           java.lang.Double      number
     dateTime    DATE             java.util.Date        Date
     string      BSTR             java.lang.String      string
     array       VARIANT array    java.util.Vector      Array
     struct      IWDDXStruct      java.util.Hashtable   Object
     recordset   IWDDXRecordset   java.sql.ResultSet?   WddxRecordset
     

    More on data types

     
  • Null values  WDDX provides no notion of a null object. Null objects should be serialized to empty strings. Upon deserialization, it is up to the component performing the operation to determine whether and where empty strings should be deserialized to null values. Null support is one area where future extensions are likely.
  • Serialization model  WDDX serializes data using a model of pure aggregation. It has no mechanism for handling object references. Aliased references will result in multiple object instances being deserialized. WDDX serialization applied to a data structure that has cyclical references will most likely result in infinite iteration/recursion, depending on the serializer implementation. Support for object references is another area of potential future investigation.
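
The aggregation model can be illustrated with a small serializer sketch. The Python code below is hypothetical and is not one of the Allaire modules. The function names, the mapping of Python types to WDDX elements, and the decision to raise an error on cyclical references (rather than recurse without end) are all assumptions. Control characters in strings and recordsets are deliberately left out to keep the sketch short.

```python
from datetime import datetime
from xml.sax.saxutils import escape, quoteattr


def serialize_value(value, _active=None):
    """Serialize a Python value to a WDDX element using pure aggregation."""
    active = set() if _active is None else _active
    if value is None:
        # WDDX has no null: serialize to an empty string.
        return "<string></string>"
    if isinstance(value, bool):  # must precede the number test: bool is an int subclass
        return f"<boolean value='{'true' if value else 'false'}'/>"
    if isinstance(value, (int, float)):
        return f"<number>{value!r}</number>"
    if isinstance(value, datetime):
        return f"<dateTime>{value.isoformat(timespec='seconds')}</dateTime>"
    if isinstance(value, str):
        return f"<string>{escape(value)}</string>"
    if isinstance(value, (list, tuple, dict)):
        # Track containers currently being serialized to detect cycles.
        if id(value) in active:
            raise ValueError("cyclical reference cannot be represented in WDDX")
        active.add(id(value))
        try:
            if isinstance(value, dict):
                body = "".join(
                    f"<var name={quoteattr(str(key))}>{serialize_value(item, active)}</var>"
                    for key, item in value.items()
                )
                return f"<struct>{body}</struct>"
            body = "".join(serialize_value(item, active) for item in value)
            return f"<array length='{len(value)}'>{body}</array>"
        finally:
            active.discard(id(value))
    raise TypeError(f"no WDDX mapping for {type(value).__name__}")


def serialize(value):
    """Wrap a serialized value in a WDDX packet."""
    return ("<wddxPacket version='0.9'><header/><data>"
            + serialize_value(value) + "</data></wddxPacket>")


if __name__ == "__main__":
    shared = [1, 2]
    # The aliased list is written twice; a deserializer will create two instances.
    print(serialize({"first": shared, "second": shared, "missing": None}))
```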

    DTD verbosity

     This DTD is purposefully made verbose to aid the readability of WDDX packets. If packet size becomes an issue, compressing WDDX packets using an HTTP-safe real time compression algorithm is likely to be a much more appropriate solution than, for example, a DTD that uses one-character element and attribute names. Some experiments conducted at Allaire suggest that 5- to 15-fold compression rates are achievable.
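
The effect of compression on a verbose packet can be explored with a few lines of code. The sketch below uses Python's zlib module (DEFLATE) purely as a stand-in; the paper does not name the algorithm used in the Allaire experiments, and the sample_packet generator is hypothetical test data. The compressed output is binary, so a transport encoding may still be needed depending on how the packet is carried.

```python
import zlib


def compress_packet(packet_xml):
    """Compress a WDDX packet (text) into bytes at the highest zlib level."""
    return zlib.compress(packet_xml.encode("utf-8"), 9)


def decompress_packet(blob):
    """Restore the original WDDX packet text."""
    return zlib.decompress(blob).decode("utf-8")


def sample_packet(rows):
    """Build a recordset packet with the given number of rows (made-up data)."""
    names = "".join(f"<string>Customer {i}</string>" for i in range(rows))
    ages = "".join(f"<number>{20 + i % 50}</number>" for i in range(rows))
    return ("<wddxPacket version='0.9'><header/><data>"
            f"<recordset rowCount='{rows}' fieldNames='NAME,AGE'>"
            f"<field name='NAME'>{names}</field>"
            f"<field name='AGE'>{ages}</field>"
            "</recordset></data></wddxPacket>")


if __name__ == "__main__":
    packet = sample_packet(500)
    blob = compress_packet(packet)
    print(f"original: {len(packet)} bytes, compressed: {len(blob)} bytes")
```

The ratio obtained depends heavily on how repetitive the data is, so results from synthetic data like this should not be read as confirmation of the figures quoted above.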
     

    The WDDX DTD

     
    
    <!ELEMENT wddxPacket (header, data)>
    
    <!ATTLIST wddxPacket
              version CDATA #FIXED "0.9">
    
    <!ELEMENT header (comment?)>
    
    <!ELEMENT comment (#PCDATA)>
    
    <!ELEMENT data (boolean | number | dateTime | string | array | struct | recordset)*>
    
    <!ELEMENT boolean EMPTY>
    
    <!ATTLIST boolean 
              value (true | false) #REQUIRED>
    
    <!ELEMENT string (#PCDATA | char)*>
    
    <!ELEMENT char EMPTY>
    
    <!ATTLIST char 
              code CDATA #REQUIRED>
    
    <!ELEMENT number (#PCDATA)>
    
    <!ELEMENT dateTime (#PCDATA)>
    
    <!ELEMENT array (boolean | number | dateTime | string | array | struct | recordset)*>
    
    <!ATTLIST array 
              length CDATA #REQUIRED>
    
    <!ELEMENT struct (var*)>
    
    <!ELEMENT var (boolean | number | dateTime | string | array | struct | recordset)>
    
    <!ATTLIST var
              name CDATA #REQUIRED>
    
    <!ELEMENT recordset (field*)>
    
    <!ATTLIST recordset 
              rowCount CDATA #REQUIRED
              fieldNames CDATA #REQUIRED>
    
    <!ELEMENT field (boolean | number | dateTime | string)*>
    
    <!ATTLIST field
              name CDATA #REQUIRED>
    
     

    WDDX Scenarios

     WDDX has the potential to open up and ease the development of complex, distributed web applications. With its XML foundation, WDDX has broad applications in the web space. There are two major applications of WDDX: server-to-browser data exchange and server-to-server data exchange.
     

    Server-to-browser data exchange

     Niklaus Wirth, creator of the Pascal programming language, said "Algorithms + Data Structures = Programs." His point is clear: to perform useful functions a system must have data processing capabilities and data to process. How does this formula apply to the Web? While the processing capabilities of web browsers have improved since the advent of JavaScript 1.0, their ability to produce and consume complex data has not changed much. Web browsers are starved of structured data because HTTP is a text-based protocol and currently there is no mechanism for structured data representation in text form that is widely available to web application servers and common scripting languages. As a result, a browser's ability to perform complex operations beyond user interface display is vastly diminished.
     Complex data produced and consumed by browsers without much development overhead opens the door to an exciting range of possibilities from roundtrip binding of server datasources to browser UI components, to the establishment of predictable data exchange interfaces for dynamic pages and DHTML components.
      WDDX introduces a transparent means of server-to-browser data exchange. A developer can retrieve data using an application server, transform it into WDDX, transfer it to the browser as part of an HTTP response, and make it available as native JavaScript objects. Conversely, JavaScript data can be serialized to WDDX and transferred to an application server, usually in a hidden form field during an HTTP post operation, where it is instantiated as native objects in Perl, VBScript, the ColdFusion Markup Language, or any other application environment that has a WDDX deserialization module. Overall, this approach is straightforward to implement, supports a broad base of browser and OS platforms, and does not require developers to change their fundamental data access and manipulation approach to web programming.
     

    Server-to-Server data exchange

      One of the more exciting aspects of the emerging web landscape is leveraging server-to-server data exchange to create distributed web applications that span across the Internet and Extranets. Whether for replicating data, centralizing data serving, or doing business-to-business exchange, XML-based server-to-server data exchange will be critical to the next generation of web applications.
     WDDX works as transparently with server-based communications as with server-to-browser communications. Using HTTP as the communications protocol, either end of a data exchange need only have the WDDX modules to map the data into its native environment. What this means is that an application could leverage data and services running on remote servers using any web platform. For instance, an application written in Perl could make an HTTP request against a ColdFusion server and have a recordset returned as WDDX, and then locally work with that data as if the query happened on the local server.
     This model of computing opens up a huge new range of business applications. For example, web content providers can now publish 'intelligent datasource URLs' that expose WDDX interfaces to dynamic applications running anywhere on the network. For instance, 'Weather.com' could publish a set of URLs that returned a WDDX packet with a structure of weather data based on a custom search. The 'client' application could then use that data locally in its native language environment. The model extends logically into Affiliate and Syndication networks emerging on the Web, where the value-added data, products and information of a web site are made available to 'Site Partners' via affiliate programs. These site partners leverage distributed data, and even invoke remote product orders and promotions, via these syndication networks. A framework as cross-platform and cross-language as WDDX and XML is a requirement for such a use of the Web to flower.
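
To make the server-to-server recordset exchange more concrete, the producing side can be sketched as follows. The Python function rows_to_recordset is hypothetical; it assumes the query result is available as a list of dictionaries and turns it into the column-oriented <recordset> element defined by the WDDX DTD. Transport over HTTP is left out; the resulting text would simply be returned as the response body.

```python
from xml.sax.saxutils import escape, quoteattr


def _cell(value):
    """Serialize one simple value; recordsets cannot hold complex types."""
    if isinstance(value, bool):  # checked first because bool is an int subclass
        return f"<boolean value='{'true' if value else 'false'}'/>"
    if isinstance(value, (int, float)):
        return f"<number>{value!r}</number>"
    if isinstance(value, str):
        return f"<string>{escape(value)}</string>"
    raise TypeError("only simple data types can be stored in recordsets")


def rows_to_recordset(field_names, rows):
    """Convert row-oriented data into a column-oriented WDDX recordset element."""
    fields = "".join(
        f"<field name={quoteattr(name)}>"
        + "".join(_cell(row[name]) for row in rows)
        + "</field>"
        for name in field_names
    )
    names = quoteattr(",".join(field_names))
    return f"<recordset rowCount='{len(rows)}' fieldNames={names}>{fields}</recordset>"


if __name__ == "__main__":
    query_result = [
        {"NAME": "John Doe", "AGE": 34},
        {"NAME": "Jane Doe", "AGE": 31},
    ]
    print(rows_to_recordset(["NAME", "AGE"], query_result))
```

The consuming application would run the reverse transformation and then work with the rows as if the query had been executed locally.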
     

    WDDX Efforts

     Allaire is releasing WDDX freely to the web community. As part of this, Allaire and industry partners are involved in a range of efforts to create native implementations on popular web development and deployment platforms.
     

    ColdFusion

      ColdFusion Server 4.0 introduces substantial support for WDDX. At the center of the support is a new ColdFusion Markup Language (CFML) tag, which provides support for WDDX serialization and deserialization. With this tag, developers can transform any aggregated CFML data structures to and from WDDX packets. ColdFusion 4.0 also provides direct server-side translation between WDDX and JavaScript.
     

    JavaScript

      JavaScript support for WDDX is based on a 'wddx.js' package, available freely from Allaire. The package includes a WDDX serialization module for JavaScript and a JavaScript recordset object since one is not part of the language. A third party has provided a JavaScript deserialization module based on a JavaScript XML parser.
     

    COM/ASP

      A lightweight set of COM components, available freely from Allaire, provides serialization and deserialization routines for conversion between COM data structures and WDDX packets. The COM implementation also supports direct server-side translation between WDDX and JavaScript. Because COM has no uniform implementation of associative arrays and recordsets, the components distributed by Allaire include one for representing each of these datatypes. The most important aspect of the COM implementation is that it can be used from within a wide variety of environments, ranging from applications developed using Visual Basic, C/C++, Delphi, or PowerBuilder to scripts running in the Microsoft Active Server Pages environment.
     

    Perl and Java

      Currently, there are independent efforts underway to add WDDX support for serialization, deserialization, and WDDX-to-JavaScript translation to Perl and Java.
     

    For more information

     Allaire is actively working on WDDX. For up-to-date information and related web resources, visit the Allaire development page at http://www.allaire.com/developer.
     

    Acknowledgements

     The author would like to thank those at Allaire who have contributed much to bringing WDDX to life: J. J. and Jeremy Allaire, Adam Berrey, and Nate Zelnick. The Web will be a better place!
