
 

extended Structured Query Language (xSQL)

Benjamin Jung
Research Assistant
Trinity College Dublin, Knowledge and Data Engineering Group, Department of Computer Science
Trinity College
Dublin 2, Ireland
Phone: +353-1-608 1335
Fax: +353-1-677 2204
Email: benjamin.jung@cs.tcd.ie Web: http://www.cs.tcd.ie/Benjamin.Jung/
 
Biographical notice:
 
Benjamin Jung is a PhD student and Research Assistant in the Knowledge and Data Engineering Group at Trinity College Dublin. Before coming to Ireland he graduated from the Technische Universität München in 1997 with a master's degree in computer science and theoretical medicine. After attending a lecture by Jon Bosak on "XML (eXtensible Markup Language), Java and the future of the Web" in late 1997, he concentrated on the introduction and continued use of XML in his healthcare IT (Information Technology) research. He is involved in two major European Medical Informatics projects (Synapses, SynEx), where he works in the Electronic Healthcare Record Architecture group. He is also a member of the CEN/TC251 (Comité Européen de Normalisation Technical Committee 251) XML Taskforce.
 
Jane Grimson
Associate Professor
Trinity College Dublin, Knowledge and Data Engineering Group, Department of Computer Science
Trinity College
Dublin 2, Ireland
Phone: +353-1-608 1594
Fax: +353-1-677 2204
Email: jane.grimson@cs.tcd.ie Web: http://www.cs.tcd.ie/Jane.Grimson/
 
Biographical notice:
 
Jane Grimson obtained a bachelor's degree in Computer Engineering from Trinity College Dublin, followed by a Master's and a Doctorate in Computer Science from the University of Toronto and the University of Edinburgh, respectively. Since 1980 she has been a member of the academic staff of the Department of Computer Science at Trinity College Dublin, where she is now an Associate Professor. She is Head of the Knowledge and Data Engineering Group and co-chairs the Centre for Health Informatics. She is currently also Dean of Engineering and Systems Sciences. She has written over 60 papers and co-authored a textbook on Distributed Database Systems published by Addison-Wesley. Her main research interests are distributed database systems and Health Informatics. She is Project Manager of the Synapses Project, a major EU-funded project on Electronic Healthcare Records. She is a Chartered Engineer; a Fellow and Vice President of the Institution of Engineers of Ireland; a Fellow of the Irish Academy of Engineering, of Trinity College, of the Irish and British Computer Societies and of the Royal Academy of Medicine; and a member of the Irish Council for Science, Technology and Innovation, the IEEE (Institute of Electrical and Electronics Engineers) and the ACM (Association for Computing Machinery).
 
ABSTRACT:
 
xSQL (extended Structured Query Language) is a further development of SQL (Structured Query Language) for processing queries on a single XML document or a collection of XML documents. SQL's easy-to-learn and highly readable syntax, together with its wide use in the world of RDBMSs (Relational Database Management Systems), motivated the decision to use it as a basis and to add further, more XML-specific functionality. The main focus of xSQL is the selection of sub-trees and elements (possibly with their children) and the join and union of pre-existing XML documents, both subject to conditions.
 
This paper was submitted to the "XML Europe '99" conference committee in December 1998. Further research and implementation work on this topic will continue until the conference in April 1999. A first prototype application using the xSQL specification will be ready to present at the conference.
 

Introduction

 
XML was first introduced in 1996 and quickly spread into various areas of information technology. As the technology was adopted and further specified, XML documents grew larger and more complex, which progressively reduced their readability, usability and maintainability for humans. In particular, a middle tier between the XML document itself (raw data) and the XSL (eXtensible Stylesheet Language) stylesheet (presentation) was needed for the conditional selection, extraction, reordering and joining of elements (query). Because metadata is an integral part of XML, queries can be formulated very precisely and evaluated efficiently.
 
SQL is widely used, easy to learn and has a very clear syntax; these three features influenced the decision to use it as the basis of a query language and to extend it with additional, more XML-specific functionality. The main focus of this work is on extending the existing SQL syntax to support the conditional selection of sub-trees and elements from pre-existing XML documents, and the merging (join and union) of such documents. Making queries easy to write and easy to understand was another important goal.
 
This paper describes xSQL, which fulfils all the requirements mentioned above. It can be used for local as well as remote querying on a server; querying on the server can greatly reduce the amount of data transmitted, since only the query results need to be sent. Because its syntax is close to SQL, people who have used SQL before and are familiar with the hierarchy of the XML document can use xSQL efficiently right away. xSQL will be used in different projects as a middle tier between XML documents and XSL stylesheets, which will allow its effectiveness to be evaluated in a variety of applications. The resulting feedback will be used to refine the definition of the language in the light of practical experience.
 
Possible scenarios for xSQL can be found in various areas, especially in healthcare, where the patient record will in future be widely distributed and may be stored in many different native formats. The three-tier XML model can provide a single, uniform interface to all of these sources. Patient data is downloaded from various sites into one XML environment, where new XML (source) documents are created by querying the original sources with xSQL and are presented according to predefined views (XSL). The query functions of SQL can also be exploited for statistical analysis of data extracted from the original XML sources.
 
A first prototype implementation of the xSQL engine has been developed; it demonstrates the use of xSQL as well as its advantages and disadvantages. The XML-documentbase application is an easy-to-use, MS (Microsoft) Access-like suite that uses the xSQL engine in the middle tier. It integrates the opening, download and management of (local and remote) XML documents, xSQL queries on the XML documents it holds, and their processing with different XSL stylesheets. Further development will include packaging the engine as a component or library so that it can be used independently in various client, server or standalone applications. The XML-documentbase application will first be used and tested in a distributed Electronic Healthcare Record environment.
 
This paper is divided into eight sections following the Open Distributed Processing model. The Enterprise Viewpoint section describes the major user requirements for a query language for XML documents, whereas the Information Viewpoint section explains the model of the environment and its component interfaces and shows exemplary object interaction in an event trace diagram. The Computation Viewpoint section outlines some additional structures for synchronising the DOM (Document Object Model) with the underlying XML documents, and gives some examples of how xSQL syntax is mapped to SQL syntax. The Engineering Viewpoint section explains the design of the XML-documentbase application and its use of the xSQL engine, and the Technology Viewpoint section lists the hardware and software used. The following section briefly describes how XML is used in the Synapses and SynEx healthcare environment to serve patient data wrapped up in XML, and the Conclusion finishes the paper.
 

Enterprise Viewpoint

 
This section describes the main user requirements for introducing a query layer, an additional layer between the raw XML document and the presentation document produced by applying XSL rules. It also outlines the benefits that the xSQL engine will offer its users.


Three-tier XML / xSQL / XSL structure

 
 
The strong influence of XML in a variety of fields, not only in IT, has led to an enormous variety of XML documents, ranging from huge collections of data in a single document to short messages wrapped up in XML, and from merely well-formed to valid documents. Every week new solutions and ideas using XML are published and new DTDs (Document Type Definitions) are discussed in the newsgroups. In areas with live and continuous data streams (e.g. healthcare and electronic patient records), XML documents are growing very large, with many thousands of elements and attributes. Navigating such documents is very difficult: even with the aid of graphical representations (e.g. a tree view), it is hard to locate a single record in a document that may contain thousands of elements.
 
Conditional selection of document fragments and conditional merging of two or more existing XML documents are the two main requirements behind the development of the xSQL engine. A simple syntax extended with additional XML support, and an easy-to-use user interface, were further goals.
 
Conditional selection of document fragments: This procedure extracts document fragments based on conditions defined by the user. It not only provides easy access to a collection of elements with common attributes or tag names, but also creates new XML documents that are smaller because irrelevant elements have been filtered out. Creating application-specific data sets, filtering relevant data and automatic statistical evaluation are but a few of the possible scenarios.
 
Merging of two or more existing XML documents: This procedure makes it possible to store data in different files (as in an RDBMS) and to declare relations between them. It reduces the risk of data inconsistencies caused by storing the same data twice, and reduces the time needed to maintain the data. A further advantage is that not only local documents but also globally distributed ones can be merged. This is especially interesting in healthcare, where distributed electronic healthcare records are becoming more and more important.
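The two requirements can be illustrated with a small sketch in Python using the standard library's xml.etree.ElementTree. It is not the xSQL engine: the sample documents, their tag and attribute names, and the helper functions select_fragments and join_documents are hypothetical, and the merge is shown as a simple inner join on an attribute value.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical sample documents; tag and attribute names are illustrative only.
PATIENTS = ET.fromstring(
    "<patients>"
    "<patient id='p1'><name>Ann</name><ward>A</ward></patient>"
    "<patient id='p2'><name>Bob</name><ward>B</ward></patient>"
    "<patient id='p3'><name>Cai</name><ward>A</ward></patient>"
    "</patients>"
)
RESULTS = ET.fromstring(
    "<results>"
    "<result patient='p1' test='HbA1c'>6.1</result>"
    "<result patient='p3' test='HbA1c'>7.4</result>"
    "<result patient='p3' test='LDL'>3.2</result>"
    "</results>"
)


def select_fragments(doc, tag, condition, root_tag="selection"):
    """Conditional selection: copy each <tag> element that satisfies
    `condition`, children included, into a new and smaller document."""
    out = ET.Element(root_tag)
    for elem in doc.iter(tag):
        if condition(elem):
            out.append(copy.deepcopy(elem))  # copy, so the source stays intact
    return out


def join_documents(left, right, left_tag, right_tag, left_key, right_key):
    """Conditional merge (inner join): nest copies of the <right_tag>
    elements under the <left_tag> element whose key attributes match."""
    by_key = {}
    for right_elem in right.iter(right_tag):
        by_key.setdefault(right_elem.get(right_key), []).append(right_elem)
    out = ET.Element("joined")
    for left_elem in left.iter(left_tag):
        matches = by_key.get(left_elem.get(left_key), [])
        if matches:  # left elements without a partner are dropped
            merged = copy.deepcopy(left_elem)
            merged.extend([copy.deepcopy(m) for m in matches])
            out.append(merged)
    return out


ward_a = select_fragments(PATIENTS, "patient", lambda p: p.findtext("ward") == "A")
print([p.get("id") for p in ward_a])  # ['p1', 'p3']

joined = join_documents(PATIENTS, RESULTS, "patient", "result", "id", "patient")
print([(p.get("id"), len(p.findall("result"))) for p in joined])  # [('p1', 1), ('p3', 2)]
```

Both functions return new documents, so their output can itself be queried or joined again, which mirrors the idea of reusing query results as sources.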
 

Information Viewpoint

 
The following information viewpoint section is divided into two parts: an overview of the model that will be used to implement the xSQL engine, and a short description of the existing interfaces. An event trace diagram shows the common scenarios of interaction between the different components.


Component architecture

 
 
The first diagram shows the connections between the main components of the architecture model. The XML documents are retrieved from the local hard disk or over the network and stored both in their internal DOM representation within the application and in the underlying database. The xSQL engine resides within the application and creates the Index tables for the XML documents, which are also stored in the database. It acts as a broker between the XML DOM and its index in the Index table of the database. The xSQL engine receives queries from the user and maps them to the database's internal query language. The result set is passed back to the xSQL engine, which requests the specified elements from the DOM and creates a new document; this document is immediately stored and indexed in the database for further use. As a result, the output of earlier queries can serve as the source for further queries.
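A minimal sketch of this broker role follows, assuming Python's sqlite3 module in place of the MS Access/Jet database and a single hypothetical Index table idx with the columns doc, uniqueID, tag and text. The function names index_document and run_query are illustrative, and the uniqueID values follow the dotted identifier scheme described in the Computation Viewpoint.

```python
import copy
import sqlite3
import xml.etree.ElementTree as ET


def index_document(conn, doc_name, root):
    """Store one Index-table row per element and return the DOM-side map
    {uniqueID: element} that the engine uses as a broker."""
    dom = {}

    def walk(elem, uid):
        dom[uid] = elem
        conn.execute(
            "INSERT INTO idx (doc, uniqueID, tag, text) VALUES (?, ?, ?, ?)",
            (doc_name, uid, elem.tag, (elem.text or "").strip()),
        )
        for pos, child in enumerate(elem, start=1):
            walk(child, f"{uid}.{pos}")

    walk(root, "1")
    return dom


def run_query(conn, dom, sql, params=()):
    """Execute SQL that returns uniqueIDs, fetch those elements (with their
    children) from the DOM and wrap copies in a new result document."""
    result = ET.Element("result")
    for (uid,) in conn.execute(sql, params):
        result.append(copy.deepcopy(dom[uid]))
    return result


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE idx (doc TEXT, uniqueID TEXT, tag TEXT, text TEXT)")
record = ET.fromstring("<record><entry>a</entry><note>x</note><entry>b</entry></record>")
dom = index_document(conn, "rec1", record)

answer = run_query(
    conn, dom,
    "SELECT uniqueID FROM idx WHERE doc = ? AND tag = ? ORDER BY rowid",
    ("rec1", "entry"),
)
# The result is indexed as well, so later queries can use it as a source.
answer_dom = index_document(conn, "query1", answer)
print(ET.tostring(answer, encoding="unicode"))
# <result><entry>a</entry><entry>b</entry></result>
```

Ordering by rowid returns the elements in document order here because the rows were inserted during a pre-order walk; a persistent implementation would need an explicit ordering column.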
 
The second diagram outlines the four most important scenarios in the application using an event trace diagram.
 
Start application: The user starts the application, which then loads the existing XML documents from the database and creates their internal DOM representations.

Load new XML doc: Again initiated by the user, who specifies a file to be loaded into the application from the network or a local hard disk. The application retrieves the XML document, saves it in the docs table of the database, creates and stores its index and creates an internal DOM representation.

Query XML doc: Initiated by the user, an xSQL query string is sent to the xSQL engine, which translates it into a query string the database understands and executes it on the database tables. Based on the result sets retrieved from the database, the xSQL engine requests the selected elements (optionally including all their children) from the DOM and generates a new XML document, which is saved and indexed in the underlying database as before. The resulting document is then presented to the user.

Edit XML doc: The user edits the internal DOM, which causes the application to update the corresponding docs and index tables in the database.


Event trace diagram

 
 

Computation Viewpoint

 
This section describes the implementation-specific background and gives some examples of the xSQL-to-SQL mapping realised in the xSQL engine.
 
To keep the XML DOM in the application tightly synchronised with the index table in the database, a unique element/node identifier is needed. One possible solution is a string formed by concatenating the parent node's unique identifier with the node's position among its siblings, separated by a dot. An example would be '1' for the root node and '1.4' for the fourth child of the root node. These unique identifiers are stored together with the element information in the Index table of the database.
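The identifier scheme can be sketched in a few lines of Python with xml.etree.ElementTree. The function name assign_ids is hypothetical, and positions are assumed to count from 1, as in the '1.4' example.

```python
import xml.etree.ElementTree as ET


def assign_ids(root):
    """Map every element to its unique identifier: the root is '1' and a
    child's ID is its parent's ID, a dot, and its 1-based sibling position."""
    ids = {}
    stack = [(root, "1")]
    while stack:
        elem, uid = stack.pop()
        ids[uid] = elem
        for pos, child in enumerate(elem, start=1):
            stack.append((child, f"{uid}.{pos}"))
    return ids


doc = ET.fromstring("<a><b/><c><d/><e/></c><f/><g/></a>")
ids = assign_ids(doc)
print(sorted(ids))     # ['1', '1.1', '1.2', '1.2.1', '1.2.2', '1.3', '1.4']
print(ids["1.4"].tag)  # g  (the fourth child of the root)
```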


Unique element identifier

 
 
Maintaining this additional element identifier is easy and needs few resources. The following scenarios are possible:

Add a new node/element: Inserting a new element requires two operations. First, the position number at the new element's level must be increased by one in the unique identifiers of all following siblings and of all their descendants. Then the new element can be inserted at the intended location.

Delete one node/element: Delete the intended element and all its children. Then decrease by one the position number at that level in the unique identifiers of all siblings that followed the deleted element and of all their descendants.

Move one node/element: This is simply a combination of the two previous operations: first the selected element is deleted, then it is inserted at the specified location.
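The three maintenance operations can be sketched on a plain Python dictionary that stands in for the Index table, mapping each uniqueID to an element tag. This is an illustrative assumption, not the prototype's code: the helper names are hypothetical, and move_element carries the whole sub-tree along, interpreting the destination identifier in the tree after the deletion.

```python
def _parts(uid):
    return [int(p) for p in uid.split(".")]


def _join(parts):
    return ".".join(str(p) for p in parts)


def insert_element(index, uid, value):
    """Shift the element now at `uid`, its following siblings and all their
    descendants one position to the right, then store `value` at `uid`."""
    target = _parts(uid)
    level, parent, pos = len(target) - 1, target[:-1], target[-1]
    out = {}
    for key, val in index.items():
        p = _parts(key)
        if len(p) > level and p[:level] == parent and p[level] >= pos:
            p[level] += 1
        out[_join(p)] = val
    out[uid] = value
    return out


def delete_element(index, uid):
    """Drop `uid` with its sub-tree, then shift following siblings and
    their descendants one position to the left."""
    target = _parts(uid)
    level, parent, pos = len(target) - 1, target[:-1], target[-1]
    out = {}
    for key, val in index.items():
        p = _parts(key)
        if p[: level + 1] == target:
            continue  # the deleted element or one of its descendants
        if len(p) > level and p[:level] == parent and p[level] > pos:
            p[level] -= 1
        out[_join(p)] = val
    return out


def move_element(index, src, dst):
    """Delete the sub-tree at `src`, then re-insert it at `dst`."""
    src_parts = _parts(src)
    n = len(src_parts)
    # remember the sub-tree with IDs relative to src (() is src itself)
    subtree = {tuple(_parts(k)[n:]): v for k, v in index.items()
               if _parts(k)[:n] == src_parts}
    index = delete_element(index, src)
    index = insert_element(index, dst, subtree.pop(()))
    for rel, val in subtree.items():
        index[_join(_parts(dst) + list(rel))] = val
    return index


# Index of <a><b/><c><d/><e/></c><f/></a>, keyed by uniqueID
idx = {"1": "a", "1.1": "b", "1.2": "c", "1.2.1": "d", "1.2.2": "e", "1.3": "f"}
print(sorted(move_element(idx, "1.2", "1.1").items()))
# [('1', 'a'), ('1.1', 'c'), ('1.1.1', 'd'), ('1.1.2', 'e'), ('1.2', 'b'), ('1.3', 'f')]
```

Each operation touches only the following siblings and their descendants, so its cost grows with the size of those sub-trees rather than with the whole document.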
 
This structure also makes it simple to check whether an element A lies within the sub-tree of another element B: if B's unique identifier is a prefix of A's identifier, compared component by component (so that '1.1' does not count as a prefix of '1.10'), then A is a descendant of B; otherwise it is not.
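In code, the component-wise prefix test reduces to appending the separator before comparing. A small Python sketch with hypothetical function names:

```python
def is_descendant(a_id, b_id):
    """True if A lies somewhere in B's sub-tree. Appending the separator
    stops '1.1' from counting as a prefix of '1.10'."""
    return a_id.startswith(b_id + ".")


def is_child(a_id, b_id):
    """True if A is a direct child of B (exactly one component longer)."""
    return is_descendant(a_id, b_id) and "." not in a_id[len(b_id) + 1:]


print(is_descendant("1.4.2.7", "1.4"))  # True
print(is_descendant("1.10", "1.1"))     # False
print(is_child("1.4.2", "1.4"))         # True
print(is_child("1.4.2.7", "1.4"))       # False
```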
 
The mapping from xSQL to SQL is done as follows. The xSQL select-clause parameters and the xSQL where-clause parameters are both mapped into SQL where-clause parameters. The SQL select-clause parameter is always the uniqueID, so that the corresponding location within the DOM can be found. In the current version, xSQL from-clause parameters are copied unchanged into SQL from-clause parameters. This might change in the next version, when selection from sub-trees (instead of the whole tree) becomes important.
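The mapping rules can be sketched as a small Python function. Because the concrete xSQL grammar is not reproduced here, the sketch assumes an already-parsed query held in a hypothetical XsqlQuery record, and assumes that where-conditions refer to columns of an Index table with the columns uniqueID, tag and text; SQLite stands in for the Jet engine. Values are passed as bound parameters rather than pasted into the SQL string.

```python
import sqlite3
from dataclasses import dataclass, field

ALLOWED_OPS = {"=", "<>", "<", ">", "<=", ">=", "LIKE"}


@dataclass
class XsqlQuery:
    """Hypothetical parsed xSQL query (the real xSQL grammar is not shown)."""
    select: list                               # element tag names to return
    from_: list                                # tables, copied unchanged
    where: list = field(default_factory=list)  # (column, operator, value)


def to_sql(q):
    """Map a parsed xSQL query to (sql, params) following the rules above."""
    # xSQL select-clause parameters become a condition on the tag column
    conds = [f"tag IN ({', '.join('?' for _ in q.select)})"]
    params = list(q.select)
    # xSQL where-clause parameters are appended to the SQL WHERE clause
    for column, op, value in q.where:
        if not column.isidentifier() or op not in ALLOWED_OPS:
            raise ValueError(f"unsupported condition: {column} {op}")
        conds.append(f"{column} {op} ?")
        params.append(value)
    # SQL SELECT is always the uniqueID; FROM is copied without change
    sql = f"SELECT uniqueID FROM {', '.join(q.from_)} WHERE {' AND '.join(conds)}"
    return sql, params


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE idx (uniqueID TEXT, tag TEXT, text TEXT)")
conn.executemany(
    "INSERT INTO idx VALUES (?, ?, ?)",
    [("1", "record", ""), ("1.1", "entry", "a"), ("1.2", "entry", "b")],
)
sql, params = to_sql(XsqlQuery(select=["entry"], from_=["idx"], where=[("text", "=", "b")]))
print(sql)   # SELECT uniqueID FROM idx WHERE tag IN (?) AND text = ?
rows = conn.execute(sql, params).fetchall()
print(rows)  # [('1.2',)]
```

The returned uniqueIDs are then looked up in the DOM to build the result document, as described in the Information Viewpoint.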
 
The following table shows some examples of the xSQL-to-SQL mapping currently provided by the xSQL engine (more complex examples will be available on the XML-documentbase website, http://www.cs.tcd.ie/Benjamin.Jung/xml/documentbase/):


 xSQL to SQL mapping examples

 
 

Engineering Viewpoint

 
'There are few real-world queries that need to do real tree-walking as opposed to ancestor-descendent processing'. This is why a relational database with its SQL functionality was the first choice for the prototype. Further implementation might require a more object-oriented design, and migration to an object-oriented database is a possible choice.
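The ancestor-descendant processing mentioned above maps directly onto the dotted identifiers: a sub-tree becomes a LIKE pattern on uniqueID, so no tree-walking is needed inside the database. The following Python/SQLite sketch assumes a hypothetical Index table idx(uniqueID, tag) and shows both directions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE idx (uniqueID TEXT PRIMARY KEY, tag TEXT)")
# Illustrative rows only; siblings 1.2 to 1.9 are omitted for brevity.
conn.executemany(
    "INSERT INTO idx VALUES (?, ?)",
    [("1", "record"), ("1.1", "visit"), ("1.1.1", "diagnosis"),
     ("1.10", "visit"), ("1.10.1", "diagnosis")],
)


def descendants(conn, uid):
    """IDs in the sub-tree below `uid`; the '.' before '%' keeps 1.10 out of 1.1."""
    rows = conn.execute(
        "SELECT uniqueID FROM idx WHERE uniqueID LIKE ? || '.%' ORDER BY uniqueID",
        (uid,),
    )
    return [r[0] for r in rows]


def ancestors(conn, uid):
    """IDs of all elements whose sub-tree contains `uid`, root first."""
    rows = conn.execute(
        "SELECT uniqueID FROM idx WHERE ? LIKE uniqueID || '.%' "
        "ORDER BY length(uniqueID)",
        (uid,),
    )
    return [r[0] for r in rows]


print(descendants(conn, "1.1"))   # ['1.1.1']
print(ancestors(conn, "1.10.1"))  # ['1', '1.10']
```

This relies on identifiers containing only digits and dots, so neither LIKE wildcard (% or _) can appear in them.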
 
The XML-documentbase application loads XML data files as well as xSQL queries and XSL stylesheets from local or global resources and saves them in the relational database. To make the application a complete XML / xSQL / XSL suite, later versions should be able to create all of these components in an integrated editor. To create different presentations, the application will process XML source files, as well as XML files created by internal queries, with different stylesheets. These presentations can then be stored locally or made available globally. Parsing an XML document after loading it into the application, and processing it with different XSL stylesheets, can be done with any available parser.


 xSQL engine architecture

 
 

Technology Viewpoint



Main screen of the XML-documentbase

 
 
This section outlines the software environment used to implement the first version of the xSQL engine. Changes in the design and in the support of different components are likely in future, because this version is only a prototype, developed to show the available functionality to a broader audience. It is also expected to generate initial feedback about its usefulness and about further needs, which will lead to a more sophisticated and user-oriented version in the future.
 
The XML-documentbase application was built using MS Visual Basic 5.0. Functionality to create and maintain the underlying database (in MS Access 97 format) is provided by the MS Jet database engine. XML document parsing and DOM functionality are accessed through a DLL (Dynamic Link Library), msxml.dll version 2.0, which is freely available and shipped with MS Internet Explorer version 5.0b2. Because this DLL is still in its beta stage (the final version has been announced for release in mid-March 1999), changes in future releases may break the prototype and require various parts of it to be re-coded. This already happened when upgrading from version 1.0 to version 2.0, and it led to the consideration of moving the whole application into a platform-independent Java computing environment, for which a large number of parsers are available. Migration to an object-oriented database as the underlying storage for documents, queries, stylesheets and the indices of the XML documents will be considered for the next implementation.
 

The use of XML in the Synapses/SynEx FHCR (Federated Healthcare Record) environment

 
This section briefly describes the Synapses and SynEx European healthcare projects and outlines the migration from a pure CORBA (Common Object Request Broker Architecture) environment to a combined CORBA / XML environment. More information can be obtained from the Synapses and SynEx websites.
 
The Synapses Project is a three-year project funded under the EU 4th Framework Health Telematics Programme. The consortium consists of 26 partners from 14 different countries representing the health software industry sector, research institutes and universities, and end-users through the participation of several hospitals. Synapses sets out to solve the problems of sharing data between autonomous information systems by providing generic and open means to combine healthcare records or dossiers consistently, simply, comprehensibly and securely, whether the data passes within a single healthcare institution or between institutions. The SynEx Project, the direct successor of the Synapses Project, is also funded under the EU 4th Framework Health Telematics Programme and brings together most of the previous partners and countries. The main goal of this two-year project is to integrate previously developed Synapses components with additional commercial modules into one HISA (Healthcare Information Systems Architecture).
 
The Synapses FHCR project exploited ideas from federated database technology, which provides client applications with an integrated view of data stored in heterogeneous, distributed database systems. At the heart of Synapses is the FHCR server, which accepts requests for data (in the form of clinical objects) from clients, decomposes them into queries against the connected "feeder" systems where the data is actually stored, and integrates the responses dynamically 'on the fly'. Synapses was concerned with the specification of an open standard for the server and its interfaces and, for pragmatic reasons, used an ad hoc mechanism for exchanging clinical objects between feeders, server and client. XML is an obvious choice for such an exchange mechanism, and it is being actively pursued in the SynEx project.


SynEx CORBA / XML architecture

 
 
The diagram explains the changes made to migrate from the existing pure CORBA environment to a combined CORBA / XML environment. Because the Synapses Object Model is already object-oriented, the task was relatively easy. Two additional methods were added to the server code; they cast the internal object-oriented Synapses record structure into a single CORBA object of type string, marked up in XML. RetrieveRecordShape creates an XML data file with the structure or skeleton (without any patient data) of the Electronic Healthcare Record. The second method, RetrieveXmlComRIC, produces an XML data file of a specific single Synapses object, filled with patient data from the feeder systems. This preserves the hierarchical structure and all attributes of each object for later use. For each method there is a corresponding CGI (Common Gateway Interface) script, invoked through the web server; the CGI script names are exact copies of the server method names. The CGI server wrapper acts as a broker, casting HTTP (Hypertext Transfer Protocol) calls into IIOP (Internet Inter-ORB Protocol) calls and vice versa.
 

Conclusion

 
As outlined in the previous sections, development of the simple xSQL engine was started at the end of last year to make the XML output of the Synapses healthcare server visible to clients over the Internet. Search capabilities were urgently needed to reduce the size of the requested documents, to find specific elements within the documents, and to create (application-specific) data sets that make statistical evaluation easy.
 
The XML-documentbase, which uses the xSQL engine, is still a prototype. Other companies, especially in the database area, have now integrated XML support into their products, which might make the xSQL engine unnecessary. Its simple SQL-like syntax is still a big advantage, but it cannot match the functionality and precision of the denser and more complex query languages currently under discussion. The XML-documentbase demonstrates the possibilities and advantages of an integrated XML suite, which could easily be added to existing XML / DTD editors.
 
Bibliography
Con97
Connolly, D. (editor), XML Principles, Tools and Techniques, O'Reilly & Associates, 1997.
Hof98
Hoffmann, J., Introduction to Structured Query Language (Version 4.11), available at http://w3.one.net/~jhoffman/sqltut.htm, 1998.
Lig97
Light, R., Presenting XML, Sams.net Publishing, 1997.
RLS98
Robie, J., Lapp, J., Schach, D., XML Query Language (XQL), http://www.w3.org/TandS/QL/QL98/pp/xql.html, 1998.
Joh97
Johnson, J.L., Database Models, Languages, Design, Oxford University Press, 1997.
Bra98
Bray, T., Element sets: A minimal basis for an XML Query Engine, http://www.w3.org/TandS/QL/QL98/pp/sets.html, November 1998.
GGB98
Grimson, J., Grimson, W., Berry, D., Kalra, D., Toussaint, P. and Weier, O., A CORBA-based integration of distributed electronic healthcare records using the Synapses approach, IEEE Transactions on Information Technology in Biomedicine, 2, 124-138, September 1998.
GBG98
Grimson, W., Berry, D., Grimson, J., Stephens, G., Felton, E., Given, P. and O'Moore, R., Federated healthcare record server - the Synapses paradigm, International Journal of Medical Informatics, 1998.
Getal96
Grimson, J. et al., Synapses - Federated healthcare record server, Procs. MIE 96, IOS Press, 695-699.
Env97
CEN/TC251 WG I, Healthcare Information System Architecture Part 1 (HISA) Healthcare Middleware Layer, Final Draft prENV 12967-1, not publicly available at the moment, March 1997.
Xml98
Bray, T., Paoli, J., Sperberg-McQueen, C. M., Extensible Markup Language (XML) 1.0, http://www.w3.org/TR/1998/REC-xml-19980210, W3C Recommendation 10-February-1998.
Ora98
Oracle Corporation, XML Support in Oracle8i and Beyond, http://www.oracle.com/xml/documents/xml_twp/, November 1998.
SL90
Sheth, A.P., Larson, J.A., Federated database systems for managing distributed, heterogeneous and autonomous databases, ACM Computing Surveys, 22, 183-235, 1990.
