XML Messaging at Chase Manhattan Bank Global Markets |
John O'Sullivan
Technical Officer, Chase Manhattan Bank, London, United Kingdom
Biographical notice: John O'Sullivan is a software developer who worked in the mechanical and petroleum engineering sectors before moving into wholesale finance with Chase. He is an enthusiast for object technology and open software environments who has authored a book and several conference papers on related topics. |
Keywords: C++, COBOL, Content Model, Java, Natural, XML-Data, finance, message formatting, messaging
ABSTRACT: |
| Chase Manhattan Global Markets has many applications deployed on its primary trading floors in New York and London which perform functions such as trade capture, position tracking, profit and loss calculation, value at risk calculation, and pricing for a wide range of wholesale financial products. These front office applications run on PCs, Suns and AS/400s. In the back office, applications run on mainframes and minis and handle confirmation, settlements and payments for the trades done by the front office. |
| The front and back office applications that support the Chase Global Markets businesses were developed or acquired independently in both precursor banks: Chemical and Chase. File transfer and file sharing are the primary means of inter-application messaging. Each application has its own message formats; these formats specify messages as fixed length blocks of text, with the message fields occurring at fixed offsets and being of fixed widths. |
| Clearly Chase's existing solutions to our applications' messaging requirements are brittle and support intensive. They require a lot of "handle turning". To remedy this the Global Markets Technical Architecture Group are now building the ASAP Messaging System (AMS). AMS will connect applications running on Windows NT, Solaris, AS/400 and S/390 platforms that are written in C, C++, Java, COBOL and Natural. AMS is built on IBM's MQSeries, which provides the guaranteed delivery and queueing across platforms that we need to make our inter-application messaging more resilient. But MQSeries says nothing about message formats; to MQ, messages are just strings. So how should those strings be formatted? |
| Candidate message formats evaluated by the AMS team included XML, ASN.1 and a home brewed option. XML won out for several reasons. It's an industry standard, supported by third party tools, that has backing from all the major vendors. It is human readable. It enables us to put semantics into messages, and so make them self describing. XML supports optional elements, and elements that may occur a varying number of times. This allows sending and receiving applications to dissent over message content. The flexibility of XML will enable many different Chase applications to agree on, for instance, a single foreign exchange trade message format that has many optional parts. |
| This case study includes a detailed discussion of the Content Model developed for AMS messages, the selection of a third party XML parser, and the message formatting tools that Chase has developed. Those tools translate Java and C++ objects, as well as C, COBOL and Natural data structures, to and from XML. If and when XML-Data becomes a W3C recommendation its data types will be adopted as the foundation for the semantic types that capture financial information. |
Messaging: what's wrong with the current approach? |
| Chase Manhattan Bank's Global Markets Division has many applications deployed on its primary trading floors in New York and London which support trade capture, position tracking, profit and loss calculation, value at risk calculation, and pricing for a wide range of wholesale financial products. These front office applications are hosted by Solaris, Windows NT and AS/400 systems. In the back office, applications run on S/390 mainframes and AS/400 minis and handle confirmation, settlements and payments for the trades done by the front office. |
| The front and back office applications that support the Global Markets businesses were developed or acquired independently in both precursor banks: Chemical and Chase. File transfer and file sharing are the principal means of inter-application messaging. Each back end application has its own message formats; these formats specify messages as fixed length records, with the message fields occurring at fixed offsets and being of fixed widths. This ad hoc approach to messaging has three major shortcomings. Firstly, using FTP as a message transport means delivery is not guaranteed. Secondly, it means that messages are routed on a point to point basis. Thirdly, it requires senders and receivers to be in exact agreement about message format. To address these problems we have built the ASAP Messaging System (AMS). |
| When point to point transfer of files containing messages fails it requires intervention by support staff. For example, the team supporting our FX options trading application occasionally have to manually ftp messages to middle office applications, for netting and value at risk, and to back office applications for confirmation and settlement. This sort of "handle turning" is expensive and unnecessary. A message transport that guarantees delivery of a message regardless of the state of the LAN/WAN and the receiving applications at any particular point in time would make this kind of support work redundant and would make our interapplication messaging far more reliable. |
| IBM's MQSeries offers that functionality. It is the industry-leading message queueing system, it runs on all the Chase platforms (desktop, mini and mainframe), and it has APIs for all the programming languages we use. As well as guaranteeing message delivery, MQ stores "in flight" messages on queues, thereby decoupling sender and receiver. So an MQ based messaging system will address the first two of the three shortcomings listed above. However, MQ doesn't offer a solution to the third problem: that of message formatting. From MQ's perspective, every message is simply an FBS - a "very big string". The question remains: how to format that string? |
| One option is continuing with the existing approach: fixed length records of fixed width fields. To see why the alternatives are preferable let's examine an example of such a format currently in use at Chase. The GlobalNet application requires that notification of foreign exchange trades be formatted as a 614 character record containing 71 fields. Each field has a fixed width. The smallest fields are a single character. An example is the second field, the deal status, which takes one of four values, 'N', 'M', 'C' or 'A', which stand for new deal, matured, cancelled or active. The largest field is 50 characters wide and holds the customer name. Other large fields include cash amounts, which are 18 characters wide since this is the SWIFT standard (SWIFT is the Society for Worldwide Interbank Financial Telecommunication, whose message standards govern interbank financial transfers). Date fields are 8 characters wide, which leaves Chase badly exposed to the Y10K bug. |
| The content of some fields depends on their originator: the customer name field will have content unless the message originates from the Commodities Risk Management application, in which case it will be empty. Other fields are designated as historical remnants, for instance the "regulatory principal USD equivalent", the amount of foreign currency purchased expressed in US dollars calculated using the current local exchange rate. GlobalNet doesn't read this field anymore, but it can't be removed as some applications still write to the field and its omission would change the offsets of all subsequent fields. |
| This approach to the formatting of inter application messages is brittle and imposes a maintenance burden on all the applications sending and receiving messages in a particular format. If one particular application wants to send a piece of application specific data to GlobalNet, then all applications sending to GlobalNet have to add empty versions of that field to their message formats. It also makes it impossible to retire a redundant message field if any application is still sending that field. Even if no application is sending any real content in that field all senders still have to change their formatting. |
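The brittleness of fixed offsets can be illustrated with a small sketch. The field names, widths and layouts below are hypothetical and far smaller than the real GlobalNet record; the only point is that a receiver compiled against an old layout silently reads the wrong bytes once a sender inserts a field.

```python
# Hypothetical three-field layout: (name, width). Not the real GlobalNet record.
LAYOUT_V1 = [("deal_id", 6), ("status", 1), ("customer", 10)]
# A sender adds one field in the middle; every offset after it moves.
LAYOUT_V2 = [("deal_id", 6), ("status", 1), ("desk", 3), ("customer", 10)]


def format_record(layout, values):
    """Pad (or truncate) each value to its fixed width and concatenate."""
    return "".join(str(values.get(name, "")).ljust(width)[:width]
                   for name, width in layout)


def parse_record(layout, record):
    """Slice a record at the offsets implied by the layout."""
    fields, offset = {}, 0
    for name, width in layout:
        fields[name] = record[offset:offset + width].rstrip()
        offset += width
    return fields


values = {"deal_id": "FX0001", "status": "N", "desk": "LDN", "customer": "ACME"}
record_v2 = format_record(LAYOUT_V2, values)

# A receiver that agrees on the layout recovers the customer name.
good = parse_record(LAYOUT_V2, record_v2)
# A receiver still using the old layout reads bytes from the wrong offset.
stale = parse_record(LAYOUT_V1, record_v2)
print(repr(good["customer"]), repr(stale["customer"]))
```

Nothing in the record itself tells the stale receiver that anything is wrong, which is why every sender and receiver has to change in step.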
Message Format |
| Having decided that Chase's existing message formats were unsatisfactory, we needed to select a better way of both defining the content of a message (a "message definition language") and formatting it for transmission (a "message transmission format"). Candidates evaluated by the AMS team included a consultant's home brewed, proprietary format, ASN.1, CORBA IDL and XML. The idea of using a proprietary format for our messages was discounted immediately. It would have meant developing all the tools ourselves, and would have offered zero interoperability. IDL appeared on the shortlist of candidates as something of a red herring. Since IDL was developed by the OMG as a language for describing interfaces in terms of classes and data structures, it could conceivably be used to define messages. But in what format would messages so defined be transmitted? The code generated by an IDL compiler uses IIOP or a proprietary protocol to encode the communication between a client and server. IIOP would not be suited to being the content of an MQ message; it is not human readable and has no semantics. Using IDL as a message definition language would have meant developing a proprietary message transmission format. |
| Abstract Syntax Notation One (ASN.1) is a mature message format widely used in telecoms. Toolkits are freely available on the Net. ASN.1 uses pairs of bytes holding binary values to indicate the type and length of fields in messages. Its use of binary formatting information means that ASN.1 messages aren't readily human readable. |
| XML suffers from none of the disadvantages that eliminated the other candidate message formats. It is an industry standard, backed by both major vendors like Microsoft and new players like DataChannel. Adopting XML as our message format means we can get a lot of leverage off third party tools, most obviously browsers. The simplicity of using XML as both the "message definition language" and the "message transmission format" appeals greatly. What we submit to AMC, our message compiler that generates marshalling code, is exactly the same as a message itself. But the overwhelming advantage of XML as a message format is that it enables self-describing messages. The fact that elements are tagged, and that they may be declared as optional in the DTD, allows the sender and receiver of a message to dissent over its contents and still continue with their business normally. Senders can add extra elements that receivers ignore. Receivers can respond gracefully to a sender's omission of elements. Message formats can evolve without imposing an unwelcome maintenance burden on all the applications using that format. |
| XML's only shortcoming compared to other candidate message formats is its verbosity; this is also the source of its strength. Element tags make messages self-describing and human readable, and prevent all kinds of formatting and field mismatch errors that can occur with non self-describing formats. The only objection we regularly heard to the use of XML was "this is going to double or triple WAN traffic". |
| If XML's bandwidth requirements as a messaging format ever become a problem we plan to resort to compression; the repetition of tags means that the larger a piece of XML, the more compressible it tends to be. |
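The claim that larger XML tends to compress better can be checked with a quick experiment. This is only a sketch: the tag names are invented, and zlib (DEFLATE) stands in for whatever compression scheme might actually be adopted.

```python
import zlib


def trade_xml(n):
    """Build a message with n repeated payment elements (hypothetical tags)."""
    body = "".join(
        "<payment><currency>USD</currency><amount>%d.00</amount></payment>" % i
        for i in range(n))
    return ("<trade>" + body + "</trade>").encode("ascii")


def ratio(data):
    """Compressed size as a fraction of the original size."""
    return len(zlib.compress(data)) / len(data)


small, large = trade_xml(2), trade_xml(200)
# The larger message repeats the same tags many more times,
# so its compressed size is a much smaller fraction of the original.
print(round(ratio(small), 2), round(ratio(large), 2))
```

The exact ratios depend on the message content, but the repeated tags are what the compressor exploits.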
Content Model |
Figure 1: XML to name/value pairs to struct (xml98fig1.eps/gif)
| Once XML had been selected as our message format we had to tame the potential complexity in message formats permitted by XML's expressive power. We developed an AMS Content Model. We wanted to support the kind of formatting operation depicted in figure 1. That entailed several desiderata for the content model: |
| 1. That XML messages map naturally into the constructs available in the C, C++, Java, COBOL and Natural programming languages. |
| 2. That we exercise the smallest possible subset of XML's features consistent with 1; that is, as simple as possible, but no simpler. |
| 3. That XML messages should map readily into a simple name/value pair format. |
| 4. That the content model should support the evolution of message content, and should allow applications to dissent on message content while retaining mutual comprehension. |
| 5. To get a validating parser to do as much work as possible before our own code has to start figuring the message out. |
| 6. To keep the code that marshals data in and out of XML as simple as possible. |
| We came up with several rules for our XML and the DTDs against which it is validated: |
| Elements may not be declared ANY |
| Element declarations of the following form are proscribed: |
<!ELEMENT banned ANY> |
| The ANY content model for an element effectively switches off validation inside that element. Forgoing the flexibility of the ANY element content model means that formatting code can benefit from all the checking done by validating parsers. If a message has somehow become hopelessly corrupt our validating parser will reject it and log the problem. If the ANY content model is in use a parser may try to make sense of it and present our code with a screwy node tree which we then have to figure out. We have experience of parsers that insert spurious elements in a node tree that contain nothing but whitespace when they encounter a carriage return in an element declared as ANY. Banning ANY means our code handles fewer error conditions when traversing the node tree created by the parser. To paraphrase Dijkstra - "ANY considered harmful". |
| Elements may not be declared EMPTY |
| Element declarations of the following form are proscribed: |
<!ELEMENT banned EMPTY> |
| This means that all elements in an ASAP message will have content, whether PCDATA or other elements. It could be argued that leaf nodes, which carry the actual field values of a message, should be empty, with the field value expressed as an attribute. One benefit of this approach would be less verbose XML, another would be that the distinct content models of leaf and branch nodes would make traversing the trees that XML parsers produce simpler. However, third party XML tools that extract data, such as Microsoft's XMLDSO, expect the data values to be element content, not attributes. We have followed this convention. |
| Elements may not be declared as mixed content |
| Element content must be declared as being #PCDATA, or other elements, but not both. For instance these two declarations are fine.... |
<!ELEMENT thisIsOK (#PCDATA)>
<!ELEMENT fineToo (thisIsOK,anotherElement)> |
| ...but this isn't.... |
<!ELEMENT banned (#PCDATA|anotherElement)*> |
| The thinking behind this is that, again, it makes the node trees generated by the parser simpler to traverse. |
| NB OFX (the Open Financial Exchange format for retail banking) imposes a similar constraint. From p16 of OFX 1.5: " The key rule of Open Financial Exchange syntax is that each tag is either an element or an aggregate. Data follows its element tag. An aggregate tag begins a compound tag sequence, which must end with a matching tag; for example, <AGGREGATE> ... </AGGREGATE>. " |
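The three rules so far (no ANY, no EMPTY, no mixed content) are mechanical enough to check automatically. The sketch below is a hypothetical lint, not part of AMS: it uses a regular expression rather than a real DTD parser, so it assumes plain element declarations with no parameter entities or comments in the way.

```python
import re

# Matches <!ELEMENT name model> and captures the name and the content model.
ELEMENT_DECL = re.compile(r"<!ELEMENT\s+(\S+)\s+(.*?)>", re.DOTALL)


def check_content_model(dtd_text):
    """Return {element name: reason} for declarations the rules above ban."""
    problems = {}
    for name, model in ELEMENT_DECL.findall(dtd_text):
        model = "".join(model.split())  # drop whitespace inside the model
        if model == "ANY":
            problems[name] = "ANY"
        elif model == "EMPTY":
            problems[name] = "EMPTY"
        elif "#PCDATA" in model and model not in ("(#PCDATA)", "(#PCDATA)*"):
            problems[name] = "mixed content"
    return problems


dtd = """
<!ELEMENT thisIsOK (#PCDATA)>
<!ELEMENT fineToo (thisIsOK,anotherElement)>
<!ELEMENT anything ANY>
<!ELEMENT nothing EMPTY>
<!ELEMENT mixed (#PCDATA|anotherElement)*>
"""
print(check_content_model(dtd))
```

A check like this could run when a DTD is submitted, before any message is ever validated against it.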
| Leaf elements must declare their data type |
| Since we map leaf elements to atomic data types in our various programming languages we must know what data type to map into. We encapsulate the actual attributes inside parameterized entity references. Those entities are defined inside a base DTD, which all DTDs must include by means of an entity reference. base.dtd looks like: |
| In a DTD that references base.dtd this is a correct declaration: |
<!ELEMENT dealDate (#PCDATA)>
<!ATTLIST dealDate %timestamp;> |
| The possible types that a leaf element may have are integer, float, decimal, string, fixedString, time, date or timestamp. Of these types fixedString, string and decimal require an attribute value to be given. In the case of types fixedString and string these are length and maximum length, which must be known for mapping into languages without dynamic memory allocation. For decimal, we need to know how many places after the decimal point we will have; this is important, for example, with foreign exchange rates, which are usually quoted with four decimal places. |
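To make the role of the type attributes concrete, here is a minimal sketch of how marshalling code might convert a leaf element's text once its type is known. The attribute names XML-SQLTYPE and XML-SQLSCALE are borrowed from the bid example elsewhere in this paper; the type names "INTEGER" and "STRING", and the function itself, are assumptions for illustration and not the AMS library's interface.

```python
from decimal import Decimal
import xml.etree.ElementTree as ET


def leaf_value(element):
    """Convert a leaf element's text using its declared type attributes."""
    kind = element.get("XML-SQLTYPE", "STRING")  # untyped leaves stay strings
    text = element.text or ""
    if kind == "INTEGER":
        return int(text)
    if kind == "DECIMAL":
        scale = int(element.get("XML-SQLSCALE", "0"))
        # quantize fixes the number of places after the decimal point
        return Decimal(text).quantize(Decimal(1).scaleb(-scale))
    return text


rate = ET.fromstring('<bid XML-SQLTYPE="DECIMAL" XML-SQLSCALE="4">1.655</bid>')
print(leaf_value(rate))
```

With a declared scale of four, a rate written as 1.655 is carried as a four-place decimal, and text that is not a number at all fails at conversion time instead of reaching the application.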
| A leaf element need not declare its type with one of the eight data types. It may also use a semantic type, which is declared as being one of the data types, but carries additional attributes to indicate the meaning of the data. ISO currency codes, for example, occur frequently in financial messages; these are always three character strings like USD, GBP, DEM, JPY. |
| Encapsulating attribute values inside parameterized entity references means we can transparently switch our attributes from Tim Bray's SQL data typing proposal to XML-Data or any other data typing scheme used by a third party tool or application without changing any of our XML or DTDs. |
| Class elements must declare themselves so |
| A branch element that will map to a class must declare itself as such. This is so that classes can be distinguished from other non leaf elements, such as arrays, or "holder" tags that will map to a variable name. In the example below, tradeDate and valueDate supply names for the two instances of date in deal. They are non leaf elements, but aren't classes; deal, however, is a class. |
<!ELEMENT tradeDate (date)>
<!ELEMENT valueDate (date)>
<!ELEMENT deal (tradeDate,valueDate)>
<!ATTLIST deal %class;> |
| Array elements must declare themselves so |
| Array elements need an attribute that distinguishes them, for the same reason classes do. An array of dates would be declared thus: |
<!ELEMENT paymentDates (date*)>
<!ATTLIST paymentDates %array; "date"> |
| Inside the array entity is an attribute bearing the value "date". Its job is to make it easy for the marshalling code to know what kind of elements to instance when marshalling into an empty array. |
| Do not give elements datatypes as names |
| Since AMS XML messages will be mapped into a programming language, elements shouldn't be given the same name as any built in datatype in any of the programming languages to which they may be mapped. For example, don't do this: |
<!ELEMENT int (#PCDATA)> |
Parser Selection |
| Validation, or just well-formedness? |
| Validating parsers enforce the content model defined in the DTD referenced by an XML document. Non-validating parsers simply check the XML for well-formedness. We preferred a validating parser for several reasons. At the time we evaluated parsers, all the non-validating parsers we looked at were simply ignoring the DTD. To specify attribute values implicitly in the DTD we needed a validating parser. For instance, in the DTD we can specify a bid element so... |
<!ELEMENT bid (#PCDATA)>
<!ATTLIST bid %amount;> |
| ...which allows a bid to be expressed in XML like so... |
<bid>1.6550</bid> |
| Without validation we would have had to express a bid in XML as follows... |
<bid XML-SQLTYPE="DECIMAL" XML-SQLSCALE="4" XML-SQLKIND="AMOUNT">1.6550</bid> |
| Although we have rejected the "XML is too verbose for a messaging format" argument, this really would be too verbose. It is less readable too, as the element content doesn't stand out. But the really serious disadvantage of explicit attributes in the XML is that it doesn't allow us to encapsulate the attributes. Having our DTDs specify attributes implicitly via parameterized entity references means we can transparently switch the attributes without changing our DTD or XML. In the absence of a validating parser another alternative would have been to use "untyped" elements. This has two serious disadvantages. Firstly, it would permit mistakes like: |
<bid>Complete nonsense</bid> |
| And secondly, untyped elements can't be automatically mapped into a typed programming language. Client applications would have to use their knowledge of element semantics to infer the datatype and perform this mapping themselves. Effectively, this means presenting all XML elements to the application as strings. |
| Also, our experience with non-validating parsers has shown that they can interpret linefeeds as whitespace elements; code that traverses the node tree produced by a parser then has to have logic to eliminate these spurious elements. More generally, a validating parser that enforces a content model means that the marshalling code that moves data between XML and data structures has far fewer error conditions to deal with. |
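The spurious whitespace nodes are easy to reproduce. In this sketch Python's xml.dom.minidom stands in for a non-validating parser (it is not the parser used in AMS, and it reports the linefeeds as text nodes rather than elements); the element names are illustrative.

```python
from xml.dom import minidom

doc = minidom.parseString(
    "<deal>\n  <bid>1.6550</bid>\n  <offer>1.6560</offer>\n</deal>")
children = doc.documentElement.childNodes


def element_children(node):
    """The filtering logic tree-walking code needs when whitespace is kept."""
    return [c for c in node.childNodes if c.nodeType == c.ELEMENT_NODE]


# Without a DTD the parser cannot know the linefeeds and indentation are
# insignificant, so they appear as extra nodes beside the real elements.
print(len(children), len(element_children(doc.documentElement)))
```

A parser that knows from the DTD that deal has element-only content can discard that whitespace itself, so the traversal code never sees it.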
| Which language? |
| The other factor in selecting a parser was language. There are a lot of parsers available on the net that are coded in Java. But we needed a parser that will run on mainframes and minis as well as desktop systems. To be sure that we could link the parser with COBOL applications it had to be written in C, and available in source code. Given those constraints, at the time there was only one parser that met our needs; LTXML from Edinburgh University. |
Tools we built |
| We built two core tools for message formatting: the AMS library and the AMC message compiler. The AMS library is coded in portable C, and marshals XML conforming to the AMS Content Model to and from a stream of name/value pairs. The content model guarantees that each piece of data in the XML, each leaf element, will have a unique name formed by composing the tags of that element and its parents all the way up the node tree generated by the parser. AMS has a JNI interface, so it can be driven by C, C++, Java or any language that can invoke C functions, such as COBOL and Natural. We implemented AMS on top of LTXML. |
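The composition of leaf names from the tags of their ancestors can be sketched as follows. This is an illustration of the idea, not the AMS library: the "." separator and the function name are assumptions, and repeated array elements (which would need an index to stay unique) are deliberately left out.

```python
import xml.etree.ElementTree as ET


def to_name_value_pairs(element, prefix=""):
    """Yield (name, value) for every leaf, naming it by its path of tags."""
    name = prefix + "." + element.tag if prefix else element.tag
    children = list(element)
    if not children:
        # Leaf element: its text is the field value.
        yield name, (element.text or "").strip()
    for child in children:
        yield from to_name_value_pairs(child, name)


message = ET.fromstring(
    "<trade><deal>"
    "<tradeDate><date>1998-11-02</date></tradeDate>"
    "<valueDate><date>1998-11-04</date></valueDate>"
    "</deal></trade>")
for pair in to_name_value_pairs(message):
    print(pair)
```

The two date leaves get distinct names only because of the tradeDate and valueDate "holder" tags, which is the job the content model assigns to them.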
| AMC, the message compiler, is coded in portable C++, using the Standard Template Library, and is built on top of the AMS library. AMC compiles messages defined according to our content model, and generates classes or structures in Java, C++ or C. It also generates marshalling code which invokes the AMS library to convert XML to a stream of name/value pairs, and then to populate a class or structure instance from that stream. Marshalling code is also generated for the converse operation, dumping the contents of an object to an AMS name/value stream, and then invoking the AMS library to convert the stream into XML. |
| The third core element of AMS we built was the MQ APIs; these provide a high level wrapping of MQ that gives an application a simplified interface in C, C++ or Java. These APIs ensure there is no type intrusion from MQ's own APIs into our applications, and so leave open the possibility of changing the messaging transport at a later date. |
Figure 2: how AMS and AMC generated code marshal data from XML to name/value pairs and then to an AMC generated class or struct (xml98fig2.eps/gif)
Template XML |
| Our content model defines what may possibly occur in a message, but it doesn't define what actually will occur in a given message. That is accomplished by "template XML", and the DTDs by which it is valid. We hope that one of the benefits of the AMS project will be the definition of standard message formats that all applications can use. So all messages concerning trades should conform to trade.dtd, and should come wrapped in <trade></trade> tags. What comes within those tags will vary according to the line of business that a particular trading floor application supports. It may be a spot foreign exchange trade, an option, an interest rate swap or a securities trade. An element declaration to express that might look like... |
| We've captured a lot of possibilities here by using recursive declarations with the "or" operator. Our option declaration includes the possibility of a compound option, since the underlying can itself be an option. The underlying might also be an interest rate swap, in which case we have a swaption. So an interest rate swap might occur at different levels in a trade. If we were to map the DTD into Java we might generate something like... |
| From the point of view of an application that supports foreign exchange options trading there is a lot of redundant complexity here. The problem is that even DTDs coded to AMS's restrictive content model may express many degrees of contingency in valid XML. So generating classes and marshalling code from a DTD that expresses a lot of contingency leads to redundant complexity in the code. In the example above, an application that wants to send a message about a simple option trade would be linking in and programmatically navigating through a lot of code irrelevant to its own purpose. One way of simplifying the generated code would be to use compiler switches to specify which optional parts of the DTD should be omitted in the generated code. A far simpler approach, however, is to compile "template XML", which expresses what will actually occur in a specific type of message. That is, it specifies what a receiving application expects to get from a sender. We can then regard a DTD as specifying a set of message templates. An FX option template would look like... |
| The Java generated from this by AMC would be... |
| Which is far simpler than the code that covers all the possibilities implicit in the DTD. An application sending messages about FX option trades only has to give the values for class members that are relevant to its own concerns, and then invoke a single method to generate an XML message. A receiving application using the same classes has only to invoke a single method to marshal the contents of an incoming XML message into the relevant members of those classes. Each of the class members has a status associated with it so the receiving application can check whether a data member value is the result of a successful marshalling operation or not. |
| As well as defining a specific sub class of message and functioning as input to the message compiler, a template can also be used to supply default values. On the sender side, where objects are converted into XML, the template element content will stand in where the object fails to supply a value. On the receiver side template element content can be used to fill in values not supplied by the sender. When this happens the status associated with that data member indicates that a default value is in place, rather than a bona fide datum from the sender. Clearly, sender side template values should be used sparingly, since they can't be distinguished from real sender side values by the receiver. Template values supplied on the receiver side are flagged as such, so may be freely employed. |
| Status flags and template values, together with XML's flexibility, enable the classes generated by our message compiler to react gracefully to missing or extra fields in a message. If a sender application adds a new field to a DTD and template, the receiver will remain blissfully unaware of it until it recompiles the new template and starts looking at the new field that has been added. And if a receiver adds a new field that a sender fails to supply it will get a flagged template default value if it chooses, otherwise fields will be flagged as unmarshalled. |
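The receiver-side behaviour described above can be sketched in a few lines. Everything here is hypothetical: the status names, the field names and the dictionary representation are chosen for illustration, whereas the real AMC output is generated classes or structs.

```python
from enum import Enum


class Status(Enum):
    MARSHALLED = "value supplied by the sender"
    DEFAULTED = "value taken from the receiver-side template"
    UNMARSHALLED = "no value available"


def marshal(expected, pairs, template=None):
    """Populate the fields a receiver expects from incoming name/value pairs."""
    template = template or {}
    result = {}
    for name in expected:
        if name in pairs:
            result[name] = (pairs[name], Status.MARSHALLED)
        elif name in template:
            result[name] = (template[name], Status.DEFAULTED)
        else:
            result[name] = (None, Status.UNMARSHALLED)
    # Names present in pairs but not expected are simply ignored.
    return result


expected = ["option.strike", "option.expiry", "option.style"]
incoming = {"option.strike": "1.6550", "option.newField": "ignored"}
template = {"option.style": "EUROPEAN"}
fields = marshal(expected, incoming, template)
for name, (value, status) in fields.items():
    print(name, value, status.name)
```

The extra field from the sender is dropped without error, the missing style falls back to the flagged template default, and the missing expiry is flagged as unmarshalled so the application can decide what to do.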
Message Registry |
| The message registry is Global Markets' XML message format library. Enterprise support applications like GlobalNet, that take feeds from many trading floor applications supporting specific lines of business, are taking the lead in populating the registry with template XML and DTDs that define our standard message formats. In doing that they are also defining the semantic types that appear in base.dtd. A front office application that wants to dispatch messages to GlobalNet will select a template from the registry that instances the combination of optional and mandatory fields that the application can supply, and then run it through AMC and generate classes or structs in the appropriate language. The app then needs code to set values in the class or struct, and to be linked to the AMS library and Chase's MQ API, and it is ready to send messages. |
Futures |
| At present AMS supports the Solaris and NT platforms, and the C, C++ and Java languages. To extend beyond connecting front to middle office applications, into the back office, we will have to broaden our platform support. We will add AMS support for S/390 and AS/400 systems running COBOL, Natural and possibly RPG applications. To do this we will port the AMS library to S/390 and AS/400, and extend AMC to generate COBOL, Natural and RPG data structures. AMC will also have to generate C marshalling code that can link to a COBOL application, for instance, and populate the AMC generated data structure. |
| Another near term development we have scheduled is dynamic routing. Dynamic routing will allow sending applications to dispatch messages without specifying their destination. The router will examine the message, decide that the message is, for example a foreign exchange option trade, and dispatch the message to all the relevant middle and back office applications. This will relieve sender applications of the burden of having to know which receivers they must dispatch their messages to. |
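Because messages are self-describing, a router of this kind needs little more than a lookup on the message's tags. The sketch below is speculative: the routing table, the element names inside the trade wrapper and the destination names are all invented for illustration, and the planned router may work quite differently.

```python
import xml.etree.ElementTree as ET

# Hypothetical routing table: kind of trade -> receiving applications.
ROUTES = {
    "fxOption": ["netting", "valueAtRisk", "confirmation"],
    "spotFx": ["netting", "settlement"],
}


def route(message):
    """Return the destinations for a <trade> message based on its first child."""
    root = ET.fromstring(message)
    if root.tag != "trade" or len(root) == 0:
        return []  # not a trade message, or nothing inside it to inspect
    return ROUTES.get(root[0].tag, [])


print(route("<trade><fxOption><strike>1.6550</strike></fxOption></trade>"))
```

The sender supplies only the message; adding a new receiver for FX options then means changing the routing table, not every sending application.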
| One of the most widely used desktop applications in Global Markets is Microsoft Excel. If Microsoft Office 2000 does adopt XML as its native file format, then our message formatting tools will offer a quick route to XML enabling our in house applications. These applications will then be able to exchange data with Microsoft Office applications. |
| One of the most exciting recent developments in software has been the rise of Open Source software, most notably the Linux operating system. Open Source software, in the shape of GNU, has been with us for some time. Recently ubiquitous Internet access has made it possible for many more developers to cooperate on the same codebase to far greater effect. The prominence now enjoyed by Linux illustrates the power of the Open Source software development model. A compelling account of Open Source software development is given by Eric Raymond in "The Cathedral and the Bazaar". Raymond argues convincingly that giving software away as source code enables one to build a community of co-developers who will fix bugs and add features to a product. The result is better software, at far lower cost. If one accepts the case for the Open Source development model, the only argument against it is the need for secrecy about one's sources of competitive advantage. Message formatting is not a source of competitive advantage for Chase Global Markets. So we are giving serious consideration to making the AMS library and the AMC Open Source. If you're interested in using them email me at john.osullivan@chase.com. |
Conclusion |
| Selecting IBM's MQSeries means our messaging system guarantees message delivery. Using XML as a message format has many advantages over traditional fixed length record approaches to message formatting. XML is a great technology. It isn't rocket science, it's just a simple, flexible and general file format. And it's a standard. As such it frees us from many of the restrictions of rigid proprietary file formats that have made data interchange so difficult in the past. Why did it take so long for the computing community to create it? |