Business applications made easy
Daniel Rivers-Moore
Director of New Technologies
RivCom, Lotmead Business Village, Wanborough, Swindon, Wiltshire SN4 0UY, United Kingdom
Phone: +44 (0) 1793 792004
Fax: +44 (0) 1793 792001
Email: daniel-rivers-mooret@rivcom.com
Web site: www.rivcom.com
What is a software application?
A software application is essentially a tool for transforming one form of information into another, with or without a degree of human intervention. In this sense a software process is highly analogous to a business process or an industrial process: all three transform inputs into outputs of (hopefully) higher value. Such processes can be defined procedurally, by specifying how they are carried out, or declaratively, by specifying what should be transformed into what.
The historical evolution of software has been a steady movement from more procedural to more declarative approaches. Object orientation represented a significant step towards a more declarative paradigm for software application design. XML makes possible the next logical step in this direction.
There was a time when IT was known as DP (data processing). Before graphical user interfaces and hands-on computing, the role of the computer was essentially seen as one of transforming different data structures into one another as the information they represented moved through the business process.
Today, humans are much more intimately involved in the information flows. The desktop PC took data out of the centralised data banks and put it within reach of office workers sitting at their desks. The home PC, the Web and the Internet-enabled mobile phone continue this trend and bring information and data to our fingertips and into every corner of our daily lives. But at root, what is happening is still the same. Information is being encoded through different data structures and moved around the world, being transformed as it goes according to the needs of the business or of the individual user.
The nature of the user interface
The browser is likely to become the ubiquitous user-interface host on the desktop or laptop PC, but other user interfaces will be provided through palmtop devices, mobile phones, TV sets with their remote controls and a host of other appliances ranging from motor-car dashboards to microwave ovens. The requirement on user-interface design is therefore not to create a dedicated application with its own graphical user interface, but rather to create ways of transforming whatever data the user needs to interact with into XHTML, WML, or whatever XML flavour is required for the particular interface device being used.
In order to maximise flexibility, and the possibilities of reuse of code, it is important to separate out logically distinct aspects of information, and hence separate out the transformations involved so that only one logical transform is performed at a time. It is also important to separate the structural aspects of the user-interface design, which will be the same whatever physical device type is being used to host the interface, from the purely presentational aspects, which will be specific to a particular kind of device.
XML itself is based on the principle of separation of content from presentation, so this approach finds a natural fit in the XML paradigm. And the XML transformation-specification language, XSLT, will play a crucial role in this aspect of application design (as indeed in several others, as we shall see shortly). Using XSLT transformations at the point of delivery makes it possible for the application designer to consider the logical structure of the user interface separately from its presentational aspects.
Principles of application design
Minimising the number of distinct data structures
Let’s look now at the next principle: minimising the number of data structures used by an application.
A little thought will make it clear that any Internet-based distributed application requires, in a sense, a minimum number of distinct data structures:
- interchange format
- storage format
- HTML, WML, XHTML
- interface logic
- application logic
To summarise, an Internet-based distributed application will need, as a minimum, 2 to 6 distinct kinds of data format, namely:
Building an application
I’d now like to look at how the application-design principles listed above were applied by the European XML/EDI Pilot Project to build a prototype application: a Transport Firm Booking application based around established EDIFACT messaging protocols for container transport operations.
Based on the first two of our guiding principles, the application is built using XML for all the data formats it requires, and XSLT to define all the necessary transformations. This involved developing some general-purpose XML structures for the last three kinds of data format listed above. Let’s take a look at these now.
Datasets and items
In order to minimise the number of distinct data formats needed in its applications, the European XML/EDI Pilot Project developed a DTD for what it called Extensible Information Sets (XIS), consisting of Datasets and Items. This was used as a common supplementary data format throughout the application.

    <Dataset Suid="language" Type="language">
      <Description xml:lang="EN">Language</Description>
      <Item Suid="EN">
        <Description>English</Description>
      </Item>
      <Item Suid="FI">
        <Description>Suomi</Description>
      </Item>
    </Dataset>

This presents a Dataset consisting of Items of type “language”, each with a “sibling-unique identifier” (Suid) and a description.
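As a rough illustration of how little code such a structure needs on the consuming side, the following Python sketch reads the language Dataset shown above into a lookup table. It is not part of the Pilot Project's code: the function name `items_by_suid` is hypothetical, and the sketch assumes each Item carries a `Suid` attribute and at least one `Description` child, as in the extract.

```python
import xml.etree.ElementTree as ET

# The language Dataset from the extract above, inlined so the sketch is self-contained.
LANGUAGE_DATASET = """
<Dataset Suid="language" Type="language">
  <Description xml:lang="EN">Language</Description>
  <Item Suid="EN"><Description>English</Description></Item>
  <Item Suid="FI"><Description>Suomi</Description></Item>
</Dataset>
"""


def items_by_suid(dataset_xml):
    """Map each Item's Suid to the text of its first Description child.

    Hypothetical helper: the paper does not say how Datasets were read.
    """
    dataset = ET.fromstring(dataset_xml)
    return {
        item.get("Suid"): item.findtext("Description")
        for item in dataset.findall("Item")
    }


languages = items_by_suid(LANGUAGE_DATASET)
print(languages)
```

Because every Dataset shares the same Item and Description shape, the same few lines could in principle read the carrier or label Datasets as well, which is the point of using one supplementary format throughout.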

    <Dataset Suid="carrier" Type="carrier">
      <Description>Carrier</Description>
      <Item Suid="carrier1">
        <Description>R.C.Duke</Description>
        <Company>R.C.Duke and Co. Ltd.</Company>
        <Email>customerservice@rcduke.com</Email>
        <Phone>+44(0)171 123 4567</Phone>
      </Item>
      <Item Suid="carrier2">
        <Description>Universal</Description>
        <Company>The Universal Transport Company</Company>
        <Email>orders@unitrans.co.uk</Email>
        <Phone>+44(0)181 222 3333</Phone>
      </Item>
      <Item Suid="carrier3">
        <Description>ABC Carriers</Description>
        <Company>ABC Carriers of Europe</Company>
        <Email>info@abcc.co.uk</Email>
        <Phone>+44(0)1793 121212</Phone>
      </Item>
    </Dataset>


    <Dataset Duid="Label" Type="Label">
      ...
      <Item Duid="Label24">
        <Description xml:lang="EN" Variant="Full">Company name</Description>
        <Description xml:lang="FI" Variant="Full">Yrityksen nimi</Description>
        <Description xml:lang="EN" Variant="Compact">Company</Description>
        <Description xml:lang="FI" Variant="Compact">Yritys</Description>
      </Item>
      ...
    </Dataset>

Logical forms and their presentation
One of the requirements of the Transport Firm Booking application was to allow the user, once a carrier has been chosen, to enter details of the transportation operation he or she wants that carrier to perform. For a user seated at a PC with a Web browser, part of the data-entry form looks like this:
The first section of this form is encoded in the interface logic specification as follows:

    <Section LabelRef="Label2">
      <Label/>
      <Item LabelRef="Label24" InfoSource="Context2">
        <Label/>
        <ReadOnlyData/>
      </Item>
      <Item LabelRef="Label25" InfoSource="Context3">
        <Label/>
        <ReadOnlyData/>
      </Item>
      <Item LabelRef="Label26" InfoSource="Context4">
        <Label/>
        <ReadOnlyData/>
      </Item>
    </Section>

For a user with a mobile phone, the interface logic specification is unchanged, but the way it would be displayed might be quite different:
The HTML required to produce the first result, and the WML required to produce the second, are both generated out of the same interface logic data. It is important to notice that the words that appear on the form are not included in the interface logic specification. Instead, LabelRef and InfoSource attributes are provided. The LabelRef attribute identifies one of the Item elements in the Label Dataset which was introduced at the end of the last section. If you look back at the extract from that file, you will see that Label24 has a Full English-language variant of “Company name”, and a Compact English-language variant of “Company”. You will see that in the browser, it is the Full variant that has been used, and in the mobile phone it is the Compact variant.
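The label lookup described here can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the project performed this step with XSLT, and the function name `resolve_label` and the idea of passing the language and variant in as plain arguments are hypothetical.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the predefined xml: prefix under this namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# The Label24 Item from the Label Dataset extract, inlined for the sketch.
LABELS = """
<Dataset Duid="Label" Type="Label">
  <Item Duid="Label24">
    <Description xml:lang="EN" Variant="Full">Company name</Description>
    <Description xml:lang="FI" Variant="Full">Yrityksen nimi</Description>
    <Description xml:lang="EN" Variant="Compact">Company</Description>
    <Description xml:lang="FI" Variant="Compact">Yritys</Description>
  </Item>
</Dataset>
"""


def resolve_label(labels, label_ref, lang, variant):
    """Return the Description text for a LabelRef, or None if nothing matches.

    Hypothetical helper standing in for the lookup the stylesheets perform.
    """
    for item in labels.findall("Item"):
        if item.get("Duid") != label_ref:
            continue
        for description in item.findall("Description"):
            if (description.get(XML_LANG) == lang
                    and description.get("Variant") == variant):
                return description.text
    return None


labels = ET.fromstring(LABELS)
browser_label = resolve_label(labels, "Label24", "EN", "Full")    # desktop browser
phone_label = resolve_label(labels, "Label24", "EN", "Compact")   # mobile phone
print(browser_label, "/", phone_label)
```

The interface logic never changes between devices; only the `variant` (and, for another user, the `lang`) passed to the lookup differs.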

    <Section LabelRef="Label3">
      <Label/>
      <Item LabelRef="Label23" InfoSource="Context5">
        <Label/>
        <Interface>
          <EditableField>
            <Value/>
          </EditableField>
        </Interface>
      </Item>
      <Item LabelRef="Label25" InfoSource="Context6">
        <Label/>
        <Interface>
          <EditableField>
            <Value/>
          </EditableField>
        </Interface>
      </Item>
    </Section>

Using XPath statements to identify information sources
Just as the LabelRef attributes above referred to a Dataset of Labels, so the InfoSource attributes refer to a Dataset of Contexts. What is meant by a Context here is a specification of the place in the interchange format structure where the data in a particular field in the user interface comes from, and to which it is returned after it has been edited by the user.

    <Dataset Type="Context">
      ...
      <Item Duid="Context2">
        <Description xml:lang="EN" Variant="Full">Carrier company name</Description>
        <Content>
          //PartyContactsGroup/Party[@PartyQualifier='Carrier']/Name/TextLine
        </Content>
      </Item>
      <Item Duid="Context3" Datatype="email">
        <Description xml:lang="EN" Variant="Full">Carrier e-mail</Description>
        <Content>...</Content>
      </Item>
      ...
    </Dataset>

There are two Items in this extract. Note that the second has a Datatype attribute of “email”. We shall return to this in the next section. For now, we’re interested in the first Item. What this tells us, in English, is that this field contains the carrier company name. What it tells the system, using XPath syntax, is that this piece of data is the content of the TextLine subelement of the Name subelement of the Party subelement of a PartyContactsGroup element, where the PartyQualifier attribute of the Party element has the value “Carrier”. And indeed, when we look at the XML file that was read by the system in order to generate this form and present it to the user, we find the following structure:

    <PartyContactsGroup>
      <Party PartyQualifier="Carrier">
        <Name>
          <TextLine>The Universal Transport Co.</TextLine>
        </Name>
      </Party>
    </PartyContactsGroup>

Because the words “The Universal Transport Co.” appear here in a context that matches the XPath statement above, these words are displayed in the user interface as the content of the relevant read-only field. The field is identified for the user by a label taken from the Description, in the appropriate variant and user language, of the Item that the LabelRef attribute in the interface logic specification points to.
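To make the two-way role of a Context concrete, here is a minimal Python sketch that reads a field value from the interchange document and writes an edited value back to the same place. Several things are assumptions made for illustration: the `<Message>` wrapper element, the helper names `read_context` and `write_context`, and the use of Python's ElementTree, which supports only a subset of XPath and needs the leading `//` rewritten as `.//`. The project itself evaluated these paths in XSLT.

```python
import xml.etree.ElementTree as ET

# The interchange fragment from the extract above; the <Message> wrapper is a
# hypothetical root added so the fragment can sit inside a larger document.
MESSAGE = """
<Message>
  <PartyContactsGroup>
    <Party PartyQualifier="Carrier">
      <Name><TextLine>The Universal Transport Co.</TextLine></Name>
    </Party>
  </PartyContactsGroup>
</Message>
"""

# Content of the Context2 Item in the Context Dataset.
CONTEXT2 = "//PartyContactsGroup/Party[@PartyQualifier='Carrier']/Name/TextLine"


def _locate(root, xpath):
    # ElementTree rejects absolute paths on an element, so search from the root.
    relative = "." + xpath if xpath.startswith("/") else xpath
    return root.find(relative)


def read_context(root, xpath):
    """Return the text at the Context path, or None if nothing matches."""
    node = _locate(root, xpath)
    return None if node is None else node.text


def write_context(root, xpath, value):
    """Store an edited value back at the Context path; report whether it matched."""
    node = _locate(root, xpath)
    if node is None:
        return False
    node.text = value
    return True


message = ET.fromstring(MESSAGE)
carrier_name = read_context(message, CONTEXT2)
print(carrier_name)
```

The same path serves both directions, so the interface logic needs no knowledge of the interchange structure beyond the `InfoSource` reference.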
Data validation
When the user presses the Submit button on the form, it might be necessary to check the validity of the data they have entered, before sending the data on to the next stage of the application (which may be local or remote). This can be done very effectively by associating a datatype with each data context, and defining rules for the validation of each datatype.

    <Dataset Type="Datatype">
      ...
      <Item Duid="email">
        <Description>Email address</Description>
        <Dataset Type="validation">
          ...
          <Item Processor="xslt">
            <Dataset Type="test">
              ...
              <Item Suid="test1">
                <Description>An email address must contain an @ sign</Description>
                <Test>contains(Content, '@')</Test>
              </Item>
              ...
            </Dataset>
          </Item>
          ...
        </Dataset>
      </Item>
      ...
    </Dataset>

The above structure specifies that the email Datatype is associated with a set of “validation” mechanisms. One of these validation mechanisms uses an “xslt” processor to carry out the validation. The specification for this validation process contains a set of tests that the XSLT processor can use to validate the content of any field that is associated with the email Datatype. Among these, the test called “test1” checks the data against the XPath expression “contains(Content, '@')”. The test’s Description also provides an error message, which is displayed if the test fails.
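The following Python sketch shows the shape of such a data-driven check. It deliberately swaps in a different technique: rather than handing the Test expression to an XSLT processor, as the project did, it recognises only the single form `contains(Content, '...')` and evaluates it in Python. The function name `failed_tests` and the regular expression are hypothetical, and `Content` is taken here to be the string the user typed into the field.

```python
import re
import xml.etree.ElementTree as ET

# The innermost "test" Dataset from the email Datatype extract above.
EMAIL_TESTS = """
<Dataset Type="test">
  <Item Suid="test1">
    <Description>An email address must contain an @ sign</Description>
    <Test>contains(Content, '@')</Test>
  </Item>
</Dataset>
"""

# Recognises only contains(Content, 'literal'); real XPath is far richer.
CONTAINS = re.compile(r"^contains\(\s*Content\s*,\s*'([^']*)'\s*\)$")


def failed_tests(tests_xml, content):
    """Return the Description of every test that the field content fails."""
    errors = []
    for item in ET.fromstring(tests_xml).findall("Item"):
        expression = item.findtext("Test", default="").strip()
        match = CONTAINS.match(expression)
        if match is None:
            raise ValueError(f"unsupported test expression: {expression!r}")
        if match.group(1) not in content:
            errors.append(item.findtext("Description"))
    return errors


print(failed_tests(EMAIL_TESTS, "orders@unitrans.co.uk"))   # no failures
print(failed_tests(EMAIL_TESTS, "orders.unitrans.co.uk"))   # one failure
```

Adding a rule means adding an Item to the Dataset, not changing the validating code, which is the declarative property the paper is arguing for.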
When the user presses the Submit button, a series of XSLT transforms is triggered, which draw on the various data structures we have looked at, and which produce a new HTML display in the browser, which looks like this:
Driving the application logic
Now it’s time to take a look at the application logic file that drives this data validation process. Here is the relevant extract:

    <Item>
      <Interface>
        <ActiveTextSpan Type="Button">
          <Value>Submit</Value>
          <Trigger>
            <Event>OnClick</Event>
            <Action>
              <Window name="customer"/>
              <File uri="currentstate/DataValidation.htm"/>
              <Parameter name="Source" type="ref">context.xml</Parameter>
              <Parameter name="Stylesheet">
                <Action>
                  <Parameter name="Source" type="ref">datatype.xml</Parameter>
                  <Parameter name="Stylesheet" type="ref">validate.xsl</Parameter>
                </Action>
              </Parameter>
            </Action>
          </Trigger>
        </ActiveTextSpan>
      </Interface>
    </Item>

The application logic is specified in the Action element in the above snippet. This is embedded in a piece of interface logic which states that there is an Interface consisting of a “Button” labelled with the word “Submit”. The “OnClick” event on this button triggers an Action on the part of the application. This Action displays in the Window named “customer”, and saves as a File in a specified location, the result of performing two nested transforms. The main transform consists of running an XSLT stylesheet against the context.xml file (some of whose content we have seen above). The stylesheet used to drive this transform is itself generated by an Action, namely running the validate.xsl stylesheet against the datatype.xml file.
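The order in which the two nested transforms run can be sketched as a small recursive evaluator. This is an illustration only: `run_action` is a hypothetical name, the `transform` argument is a stand-in callable rather than a real XSLT processor, and the Window and File elements (display and saving) are ignored.

```python
import xml.etree.ElementTree as ET

# The Action element from the extract above.
ACTION = """
<Action>
  <Window name="customer"/>
  <File uri="currentstate/DataValidation.htm"/>
  <Parameter name="Source" type="ref">context.xml</Parameter>
  <Parameter name="Stylesheet">
    <Action>
      <Parameter name="Source" type="ref">datatype.xml</Parameter>
      <Parameter name="Stylesheet" type="ref">validate.xsl</Parameter>
    </Action>
  </Parameter>
</Action>
"""


def run_action(action, transform):
    """Resolve each Parameter, running any nested Action first, then transform.

    `transform(source, stylesheet)` is supplied by the caller; here it is a stub.
    """
    arguments = {}
    for parameter in action.findall("Parameter"):
        nested = parameter.find("Action")
        if nested is not None:
            # The parameter's value is the result of the inner Action.
            arguments[parameter.get("name")] = run_action(nested, transform)
        else:
            arguments[parameter.get("name")] = parameter.text.strip()
    return transform(arguments["Source"], arguments["Stylesheet"])


calls = []


def recording_transform(source, stylesheet):
    # Stub that records the call order instead of running XSLT.
    calls.append((source, stylesheet))
    return f"result of {stylesheet} on {source}"


final_result = run_action(ET.fromstring(ACTION), recording_transform)
for call in calls:
    print(call)
```

The recorded calls show the inner transform (validate.xsl against datatype.xml) completing first, with its output then used as the stylesheet for context.xml.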
Conclusion
We have seen through a few small examples how XML and its related specifications (particularly XSLT and XPath) can be used to drive a fully functional application. The number of different data structures is quite small, and each one is quite simple.
When object-oriented programming came into being, it took some time before programmers and application designers had fully mastered the techniques and come to grips with the implications of the changing paradigm. The same will be true for XML-driven application design and development. But there can be no doubt that we are on the brink of an exciting period when new ideas will be put to the test and out of them will emerge powerful, robust, standards-driven application development paradigms. I hope the work shown here provides a useful contribution to that process.
Acknowledgements
All the members of the European XML/EDI Consortium (see http://www.tieke.fi/isis-xmledi).