
manager
 

Building an XML application: key management issues
Jaideep Roy
 Vice President
 Bear Stearns & Co., Inc.
 43 Reading Rd., Apt N
 Edison, New Jersey 08817, USA
 Phone: +1 973 793 5823 Fax: +1 973 793 7388 email: jroy@bear.com
 Biography
 Jaideep Roy is Vice President of Information Technology at Bear Stearns & Co., a leading investment banking, securities trading and brokerage firm on Wall Street. He has extensive experience in designing and developing Internet-based financial and e-commerce applications. He has a BS in Engineering and an MS in Computer Science, and is currently pursuing an MBA in Finance. He is an active member of the ebXML initiative. He is also a member of the New York Academy of Sciences, ACM, IEEE and the IEEE Computer Society. He has published several articles on Internet technologies in various publications, including those of the IEEE. At Bear Stearns, he currently manages the design and development of various web-based financial applications using XML, Java and other related Internet technologies. He can be reached at jroy@bear.com.
Anupama Ramanujan
 Computer Consultant
 AT&T
 43 Reading Rd., Apt N
 Edison, New Jersey 08817, USA
 Phone: +1 732 576 5516 email: ramanujan@att.com
 Biography
 Anupama Ramanujan is a Computer Consultant at AT&T Labs, where she is working on building leading-edge Business-to-Business e-commerce applications using XML and Java. She has designed and developed web applications for major Fortune 500 financial and telecommunications firms. She has a BS in Computer Engineering. She has published several scholarly articles on Internet technologies in various publications, including those of the IEEE. Her current interests include using XML, Java, Java Beans, JSP, EJB, CORBA and other related technologies to build e-commerce applications. She can be reached at ramanujan@att.com.
 Abstract
 XML is a hot new technology for describing data. It is also being touted as the technology that will enable the creation of next-generation Internet applications. XML holds the promise of exchanging data seamlessly and efficiently between applications running on multiple platforms. However, it is difficult for an IT manager to adopt a new technology like XML and build applications successfully using it. Developing applications with a relatively new technology like XML involves a considerable investment in time, effort, money and manpower. Before committing capital expenditure, an IT manager has to fully understand what the new technology is, what it offers, and what its advantages, limitations and future direction are. This paper aims to dispel the myths surrounding XML.
 

What is XML?

 XML is the Extensible Markup Language. It is a subset of the Standard Generalized Markup Language (SGML), a complex standard for describing structure and content in documents. It is a markup (tag-based) language that is designed to organize data rather than format it. It is also a meta-language - a language for describing other languages. It lets you define your own customized markup languages for different classes of documents.
 XML is a project of the World Wide Web Consortium (W3C). The development of the XML specification is done under the supervision of W3C's XML Working Group. It is an open specification (non-proprietary) and the current specification (version 1.0) was accepted by the W3C as a Recommendation on Feb 10, 1998. A Recommendation by the W3C indicates that the specification is appropriate for widespread use. But XML is still evolving with the addition of new features and functionalities.
 XML is more flexible than a fixed-format markup language like HTML. It adds context and gives meaning to data. In XML, you can define your own custom tags that represent data logically. The listing below shows a sample XML document.
 
Sample XML document
<?xml version="1.0"?>
<employee>
  <id>12345</id>
  <firstname>John</firstname>
  <lastname>Smith</lastname>
  <jobtitle>CEO</jobtitle>
  <address>
    <street>123 Street</street>
    <city>New York</city>
    <state>NY</state>
    <zip>12345</zip>
    <country>USA</country>
  </address>
</employee>
 The sample XML document describes the personnel record of employee "John Smith". Note that from this document, we can ascertain how the different items of data relate to the "employee" entity as a whole. This is also referred to as self-describing data because the tags describe the information contained within.
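 As an illustration of how a program can use these self-describing tags, the following is a minimal sketch in Python using the standard library's xml.etree.ElementTree module. The function name read_employee and the choice of fields are assumptions made for this example, not part of any XML standard.

```python
import xml.etree.ElementTree as ET

# The sample employee document from the text.
EMPLOYEE_XML = """<?xml version="1.0"?>
<employee>
  <id>12345</id>
  <firstname>John</firstname>
  <lastname>Smith</lastname>
  <jobtitle>CEO</jobtitle>
  <address>
    <street>123 Street</street>
    <city>New York</city>
    <state>NY</state>
    <zip>12345</zip>
    <country>USA</country>
  </address>
</employee>"""


def read_employee(xml_text):
    """Return a few fields of the employee record as a plain dictionary."""
    root = ET.fromstring(xml_text)      # root is the <employee> element
    address = root.find("address")      # nested <address> element
    return {
        "id": root.findtext("id"),
        "name": f'{root.findtext("firstname")} {root.findtext("lastname")}',
        "jobtitle": root.findtext("jobtitle"),
        "city": address.findtext("city"),
    }


record = read_employee(EMPLOYEE_XML)
print(record["name"], "-", record["jobtitle"], "-", record["city"])
```

 The program looks data up by tag name rather than by position, which is what lets it keep working if unrelated elements are later added to the record.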
 

The grammar of XML

 A DTD (Document Type Definition) defines the grammar of an XML document. It describes the markup (elements) available, where they may occur, and how they all fit together. It is essentially a description of the legal structure of an XML document. The listing below shows a sample DTD for the sample XML document shown earlier.
 
Sample DTD
<!ELEMENT employee (id,firstname,lastname,jobtitle,address)>
<!ELEMENT id (#PCDATA)>
<!ELEMENT firstname (#PCDATA)>
<!ELEMENT lastname (#PCDATA)>
<!ELEMENT jobtitle (#PCDATA)>
<!ELEMENT address (street,city,state,zip,country)>
<!ELEMENT street (#PCDATA)>
<!ELEMENT city (#PCDATA)>
<!ELEMENT state (#PCDATA)>
<!ELEMENT zip (#PCDATA)>
<!ELEMENT country (#PCDATA)>
 The XML standard does not insist on using DTDs, but using a DTD means you can be certain that all documents of a particular type are constructed and named in a conformant manner. It also helps to ensure that XML documents adhere to relevant business rules if those rules are embedded in the DTD. DTDs should be designed carefully, however. They should cover all possible document cases as well as allow for future enhancements. This is key to deploying effective XML applications.
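 To make concrete what "legal structure" means, the sketch below checks a document against the element sequences of the sample DTD. It is a hand-written approximation in Python, not a DTD validator: the CONTENT_MODEL table and the function structure_problems are assumptions for this example, and a real project would rely on a validating parser.

```python
import xml.etree.ElementTree as ET

# Allowed child sequence per element, transcribed by hand from the sample DTD.
# An empty tuple stands for (#PCDATA): text content only, no child elements.
CONTENT_MODEL = {
    "employee": ("id", "firstname", "lastname", "jobtitle", "address"),
    "address": ("street", "city", "state", "zip", "country"),
    "id": (), "firstname": (), "lastname": (), "jobtitle": (),
    "street": (), "city": (), "state": (), "zip": (), "country": (),
}


def structure_problems(element):
    """Return a message for every place the tree departs from CONTENT_MODEL."""
    if element.tag not in CONTENT_MODEL:
        return [f"<{element.tag}> is not declared"]
    problems = []
    expected = CONTENT_MODEL[element.tag]
    actual = tuple(child.tag for child in element)
    if actual != expected:
        problems.append(
            f"<{element.tag}> has children {actual}, expected {expected}"
        )
    for child in element:
        problems.extend(structure_problems(child))
    return problems


# A record with <jobtitle> missing: well-formed XML, but not valid per the DTD.
incomplete = ET.fromstring(
    "<employee><id>1</id><firstname>John</firstname><lastname>Smith</lastname>"
    "<address><street>123 Street</street><city>New York</city><state>NY</state>"
    "<zip>12345</zip><country>USA</country></address></employee>"
)
problems = structure_problems(incomplete)
for message in problems:
    print(message)
```

 The incomplete record parses without error, which shows the difference between a well-formed document and one that conforms to its DTD.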
 

Important features

 One of the important features that make XML extremely powerful and useful is that it is a simple language. The rules for creating a markup language in XML for encapsulating data are quite simple. For example, XML documents are composed of simple tags marked by angle brackets, with data stored between them as plain text. The tags almost always come in pairs and they can be nested to multiple levels, as shown in the sample XML document above. Similarly, XML data is just ordinary text that isn't tied to any particular programming language or platform. Standard text editors can be used to create and edit XML documents, and the XML markup usually makes sense to human readers as well.
 XML supports the Unicode standard, a character-encoding system that supports all of the world's major languages. So virtually all the characters that are used in the world are legal characters in XML. With software that processes XML properly, this can be a huge benefit in developing web applications that span across national and cultural boundaries.
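 As a small illustration of the Unicode support described above, the following Python sketch writes a record containing non-ASCII text as UTF-8 bytes and reads it back. The element names and sample values are made up for this example.

```python
import xml.etree.ElementTree as ET

# Build a small record whose text uses characters outside ASCII.
employee = ET.Element("employee")
ET.SubElement(employee, "firstname").text = "José"
ET.SubElement(employee, "city").text = "東京"

# Serialize to UTF-8 encoded bytes, as they might be sent over a network.
data = ET.tostring(employee, encoding="utf-8")

# Parse the bytes back; the characters survive the round trip unchanged.
parsed = ET.fromstring(data)
print(parsed.findtext("firstname"), parsed.findtext("city"))
```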
 XML documents essentially have a rooted tree structure and for many applications, a tree data structure is powerful enough to represent complex data. It is also easy to write software programs that manipulate tree-structured data and this again goes to the heart of XML - its simplicity.
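 The tree structure makes recursive processing straightforward. The sketch below, in Python, visits every element of a small document and records its depth; the function name walk and the shortened sample document are assumptions for this example.

```python
import xml.etree.ElementTree as ET


def walk(element, depth=0):
    """Yield (depth, tag, text) for every element, parents before children."""
    text = (element.text or "").strip()
    yield depth, element.tag, text
    for child in element:
        yield from walk(child, depth + 1)


root = ET.fromstring(
    "<employee><lastname>Smith</lastname>"
    "<address><city>New York</city></address></employee>"
)
outline = list(walk(root))
for depth, tag, text in outline:
    # Indent each tag by its depth to make the nesting visible.
    print("  " * depth + tag, text)
```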
 

Key application areas

 XML is not intended to replace HTML; instead it provides more flexible document definition and processing capabilities. XML allows us to reformat data for display on multiple devices and platforms. Since XML separates display instructions from content definition, a web site designer can alter the look and feel of the site simply by applying a different style sheet to the same XML document. More importantly, the same content can be reused on other systems or devices, such as PDAs (personal digital assistants) and wireless devices, that do not use HTML for their display processing.
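 The separation of content from presentation can be sketched as follows. Two plain Python functions stand in for style sheets here (a real deployment would more likely use a style sheet language); the function names as_html and as_text are assumptions for this example.

```python
import xml.etree.ElementTree as ET
from html import escape

# One content document, shortened from the sample employee record.
EMPLOYEE_XML = """<employee>
  <firstname>John</firstname>
  <lastname>Smith</lastname>
  <jobtitle>CEO</jobtitle>
</employee>"""


def as_html(employee):
    """Render the record for a web browser."""
    name = f'{employee.findtext("firstname")} {employee.findtext("lastname")}'
    title = employee.findtext("jobtitle")
    return f"<h1>{escape(name)}</h1><p>{escape(title)}</p>"


def as_text(employee):
    """Render the same record as one line for a small text-only display."""
    return "{}, {} ({})".format(
        employee.findtext("lastname"),
        employee.findtext("firstname"),
        employee.findtext("jobtitle"),
    )


employee = ET.fromstring(EMPLOYEE_XML)
html_view = as_html(employee)
text_view = as_text(employee)
print(html_view)
print(text_view)
```

 Neither view required a change to the XML content itself, which is the point of keeping display instructions out of the data.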
 XML excels at making on-line information search and retrieval fast and efficient. This is because XML documents also store meta-information (i.e., information about information). By looking at the tags, we can determine what each piece of data is: the author's first name, last name, address and so on. Search engines can use this feature to search and retrieve documents efficiently. For example, it would be possible to process search queries like "find all documents where author's last name is Smith", a decisive advantage over HTML.
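 The query "find all documents where author's last name is Smith" can be sketched in Python as shown below. The document collection, the file names, and the author/lastname element layout are hypothetical and chosen only for this example.

```python
import xml.etree.ElementTree as ET

# A tiny in-memory collection; in practice these would be files or database rows.
DOCUMENTS = {
    "report.xml": (
        "<document><author><firstname>John</firstname>"
        "<lastname>Smith</lastname></author><title>Q1 results</title></document>"
    ),
    "memo.xml": (
        "<document><author><firstname>Ann</firstname>"
        "<lastname>Jones</lastname></author><title>Smith account</title></document>"
    ),
}


def find_by_author_lastname(documents, lastname):
    """Return names of documents having an author with the given last name."""
    matches = []
    for name, xml_text in documents.items():
        root = ET.fromstring(xml_text)
        # Only <lastname> elements inside <author> are considered.
        if any(el.text == lastname for el in root.iterfind("./author/lastname")):
            matches.append(name)
    return matches


smith_documents = find_by_author_lastname(DOCUMENTS, "Smith")
print(smith_documents)
```

 Note that memo.xml is not returned even though the word "Smith" appears in its title. A plain text search could not make that distinction.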
 One of the hottest application areas of XML is messaging. XML enables seamless and efficient transfer of data between applications. Since it is text-based, it is easily understood by all platforms. It can be used as the least common denominator for representing information. This makes it the perfect medium for exchanging information between organizations or within an organization in a platform independent way.
 XML promises to dramatically improve the way companies exchange and present information over the Internet. XML is rapidly gaining acceptance in the development of next-generation Business-to-Business (B2B) e-commerce applications. XML can benefit e-commerce by enabling back-end systems to communicate business transaction information. For example, business partners can standardize on a specific XML syntax that describes a purchase order and can then automate the transfer of that information across otherwise incompatible systems. XML is well suited to building these systems because data can be formatted for exchange between business partners in an easy-to-process, platform-neutral way. With B2B e-commerce expected to reach $1.3 trillion by 2003 and a staggering $7.3 trillion by 2004, XML promises to be a key enabling technology.
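 The purchase order scenario can be sketched as below: one partner builds the message, the other reads it, and the only thing they share is the agreed tag set. The element and attribute names (purchaseorder, accountnumber, item, sku, quantity) are hypothetical, not taken from any published vocabulary.

```python
import xml.etree.ElementTree as ET


def build_purchase_order(order_id, account_number, items):
    """Sender side: turn internal data into the agreed XML message."""
    po = ET.Element("purchaseorder", id=order_id)
    ET.SubElement(po, "accountnumber").text = account_number
    for sku, quantity in items:
        ET.SubElement(po, "item", sku=sku, quantity=str(quantity))
    return ET.tostring(po, encoding="unicode")


def read_purchase_order(message):
    """Receiver side: recover the data using only the agreed tag names."""
    po = ET.fromstring(message)
    return {
        "id": po.get("id"),
        "account": po.findtext("accountnumber"),
        "items": [
            (item.get("sku"), int(item.get("quantity")))
            for item in po.findall("item")
        ],
    }


message = build_purchase_order("PO-1001", "ACME-42", [("widget", 10), ("gadget", 3)])
order = read_purchase_order(message)
print(message)
print(order)
```

 The message is ordinary text, so the sender and receiver could run on different platforms and be written in different languages.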
 

Getting started

 XML is best suited for developing applications that exchange data between systems or for building applications that offer different views of the same data. Adopting a new technology like XML involves a considerable investment in time, effort, money, manpower and other resources. A four-pronged approach can be used to ease the adoption of XML into an organization's technology mix.
 
  • Provide XML education:
    There is definitely a steep learning curve associated with learning XML and related technologies. The first step is to get the development team energized and quickly up to speed. XML articles, books, tutorials, white papers, and case studies can greatly aid in understanding the technology and how it can be used to solve real-life problems.
  • Launch an XML project:
    Launch an XML pilot project with a reasonable scope. The project should be chosen because it adds value to the organization and provides a solid learning experience.
  • Tabulate results:
    Upon project completion, tabulate the results by identifying the lessons learned, the costs incurred, and the benefits achieved. Take note of the problems encountered, interoperability issues with existing systems, and how the issues were resolved.
  • Formulate strategy:
    The IT manager should now be in a better position to accurately determine how the organization can better leverage the technology. The manager can then revise application development plans and incorporate XML as appropriate.

    Using XML development tools

     Use of XML development tools and application servers can significantly aid in building XML based systems quickly and efficiently.
     
  • XML tools:
    There are a host of XML tools that are available from vendors and as free open-source software. The most widely used XML tools are parsers, programs that decode XML tags. Other useful tools include XML generators, XML document editors, DTD editors, Stylesheet editors and formatters. These tools are available in a wide variety of languages like Java, Perl, Tcl, and C++. Examples include, among others, the popular IBM XML for Java parser. These tools are typically quite robust and can significantly reduce application development time, effort and cost. A word of caution about using free open-source tools - you are unlikely to get any service and support.
  • XML application servers:
    These are middleware applications that automate the exchange of XML data. They store and retrieve data from various sources, apply the appropriate markup tags, and distribute it to applications. Care should be taken in selecting an XML application server because the storage methods and capabilities of these servers vary significantly between products from different vendors. The XML server should preferably work well with your existing database and web application servers. Software AG and Bluestone Software have recently introduced XML application servers into the market.

    Security

     XML is expected to facilitate Internet-based Business-to-Business (B2B) messaging. But one of the biggest concerns in Internet B2B messaging is security. The Internet is a public network, and messages can be stolen or modified during transmission. XML by itself doesn't provide any security features. One possible solution is to use cryptographic protocols such as SSL to secure the communication. Also, commercial products that can be used to digitally sign, encrypt, verify and decrypt XML documents have started to arrive in the market. One recent example is the X/Secure product from Baltimore Technologies. The W3C and the IETF (Internet Engineering Task Force) are also working on a digital signature standard for XML documents.
     Another security issue arises when XML documents refer to resources such as DTDs stored on external systems that are not adequately secured. A hacker attack on these systems that makes a small change to these external resources can cause havoc to XML processing. The easiest solution is to copy necessary resources to local secure systems. But this reduces flexibility, especially if the resources are being shared. These are critical issues that an IT manager needs to be aware of when designing XML based systems.
     

    Limitations and drawbacks

     In spite of all the hype surrounding XML, it isn't a one-stop solution for all issues. It is hardly a good choice for building internal standalone systems. It is also not the ideal choice when security and efficient low-level communication are of critical importance.
     XML is limited when it comes to the data types that it supports. It is a text-based format and it doesn't have facilities for supporting binary data or other complex data types such as multimedia data. It also lacks data typing. Even though a DTD is excellent at validating the structure of a document, it doesn't check for errors in the data contained within the document. But this may soon change with the adoption of the XML Schema standard that the W3C is currently working on.
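     Because the sample DTD declares zip as (#PCDATA), any text is structurally acceptable there, and checks on the data itself fall to the application. The Python sketch below shows one such check; the five-digit rule and the function name zip_is_valid are assumptions for this example.

```python
import re
import xml.etree.ElementTree as ET

# Application-level rule: a zip code is exactly five digits (an assumption).
ZIP_PATTERN = re.compile(r"\d{5}")


def zip_is_valid(address_xml):
    """Return True if the <zip> child of the address holds five digits."""
    zip_text = ET.fromstring(address_xml).findtext("zip", default="")
    return ZIP_PATTERN.fullmatch(zip_text.strip()) is not None


good = "<address><city>New York</city><zip>12345</zip></address>"
bad = "<address><city>New York</city><zip>ABCDE</zip></address>"
print(zip_is_valid(good), zip_is_valid(bad))
```

     Both addresses have the same structure, so a structural check alone would not tell them apart.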
     Another limitation is the lack of XML support on the client side. Among the popular web browsers, only Microsoft's Internet Explorer (version 5) offers some support for displaying XML documents. Netscape's Communicator/Navigator offers hardly any built-in XML support at all.
     One of the major hindrances to adopting XML is the lack of standard vocabularies or tag sets. There is no clear consensus yet on how key business terms like "customer" or "invoice" are defined within vertical or horizontal industry segments. For example, one company may define an XML tag for "purchase order" using "customer name" and "account number", whereas another company may use just the "account number". Information can get lost or be interpreted differently when data is transmitted between these two companies. The problem is exacerbated when data is exchanged between companies in different industries. Standard XML vocabularies, at least for specific industries, will ensure that systems can exchange data in a consistent manner. In fact, this is one of the most important issues surrounding XML today.
     

    A work in progress

     XML is still a work in progress. New features are being added to it, and new technologies are being developed around it. A few of the important technologies and standards that are rapidly evolving around XML include:
     
  • XHTML : Extensible Hypertext Markup Language (XHTML) is a markup language written in XML. It is the result of rewriting HTML (version 4.0) as an XML application and it creates a middle ground between HTML and XML. It is a technology that will help broaden the number of devices that can access information from the Web and increase the capabilities of those that already do so, such as cellular phones, personal digital assistants and other miniature devices.
  • XSL : Extensible Stylesheet Language (XSL) allows applying formatting rules to XML documents. It can be used to specify presentation format for XML documents (for example, font size). It can also be used to transform XML documents into different formats like HTML, PDF or even audio. For example, once an XML document is converted into HTML using XSL, it can be viewed in any browser. In addition, XSL can transform an XML document into another XML document.
  • XML Schema : An XML Schema essentially defines the elements that can appear within an XML document, along with their attributes. It also defines the structure of the document: the parent and child elements, the number of child elements, the sequence in which the child elements can appear, and whether an element can be empty or can include text. It can also define default values for attributes. It provides a more powerful mechanism than DTDs for describing the structure of XML documents.
  • XPointer : XML Pointer Language (XPointer) is a language that supports addressing into the internal structure of an XML document. Essentially, it provides a mechanism to refer to elements, character strings, selections, and other parts of an XML document.
  • XLink : XML Linking Language (XLink) specifies constructs that may be inserted into XML documents to describe links between objects. It can be used to describe the simple unidirectional hyperlinks of today's HTML as well as more sophisticated bi-directional, multi-directional, and typed links.
     In addition to the core standards and technologies, the use of XML as a tool for data exchange hinges on developing standard definitions of key business terms. Several industry initiatives are already under way to develop XML business vocabularies. The following list covers some of the ongoing initiatives in this regard.
     
  • ebXML : The United Nations body for Trade Facilitation and Electronic Business (UN/CEFACT) and the Organization for the Advancement of Structured Information Standards (OASIS) have joined forces to initiate a worldwide project to standardize XML business specifications. They have established the Electronic Business XML Working Group (ebXML) to develop a technical framework that will enable XML to be utilized in a consistent manner for the exchange of all electronic business data.
  • FpML : The Financial products Markup Language, based on XML, is a new initiative enabling e-commerce activities in the field of financial derivatives. The development of the standard, controlled by market participant firms, will allow the electronic integration of a range of services, from electronic trading and confirmations to portfolio specification for risk analysis.
  • HL7 initiative : Health Level 7 (HL7) is a standards organization serving the health-care industry. It is currently working on an XML-based architecture for exchanging data between health-care organizations.
     Most of these XML standards are in their early stages of development. An intelligent approach is to move forward with XML projects, but at the same time keep a careful watch on standards as they continue to evolve. It may be necessary to support emerging vocabulary standards, but there are tools available in the market to aid the transition.
     

    Conclusion

     XML is fast becoming the key language for an increasing number of new applications. It is poised to fundamentally alter the way information is delivered and used as well as enable the creation of new and powerful applications. It has a lot going for it and it certainly looks like a key technology that has the potential to shape the future, especially of the World Wide Web. A good understanding of the technology along with its advantages, limitations and future direction is the key to building applications successfully using XML.
     Acknowledgements
     The authors would like to acknowledge the support they got from their parents in writing this paper.
