
 

XML data processing and Relational Database Systems

 Dr. med. Guido Noelle
  Managing Director
  MED medicine online GmbH  Friedrich-Ebert-Strasse
Bergisch Gladbach   NRW  Germany  51429
Phone: +49 2204 8437 30
Fax: +49 2204 8437 31
Email: noelle@medicineonline.de Web: www.medicineonline.de
 
Biographical notice:
 
Dr. Noelle was born in 1962 in Cologne, Germany. After completing his degree in medicine, he worked for six years as an internist near Osnabrück, Germany. In 1992 he founded GMD - Gesellschaft für medizinische Datenverarbeitung GmbH in Cologne. Since 1995 he has been Managing Director of MED medicine online GmbH in Bergisch Gladbach, Germany.
 
He has more than 15 years of experience in computer programming for healthcare applications. In 1991 he designed MeDoc, a computer-based medical documentation system for MS Windows. Since 1993 he has been working with internet and intranet technologies.
 
Since 1999 he has been a part-time lecturer in web development at the advanced technical college in Hildesheim/Paderborn (Germany). Together with the Medical Informatics Department of the Justus-Liebig-University in Giessen (Prof. Dr. Dudeck), he is a project partner in the XML/EDI Pilot Project sponsored by the EU. He has given lectures and published numerous articles on medical informatics. His company is currently developing an intranet-based computer system for general practitioners.
 
ABSTRACT:
 
XML (Extensible Markup Language) documents can become complex hierarchical tree structures. Accessing parts of such a tree can take longer than users will tolerate. This paper presents a concept that combines the advantages of relational databases and XML representations. First applications in physician office systems have been very successful.
 

Introduction

 
My mission statement about XML could read as follows: "HTML (Hypertext Markup Language) offers universal possibilities for presenting information; XML offers universal possibilities for working with information. Today users no longer want only to view websites, they want to work with them. XML will therefore help us a great deal in the coming years."
Large XML Files
 

Whereas XML is mostly discussed only as a data interchange format, we think that XML will grow into a storage and object format. XML is especially suitable for working with unstructured data, and therefore for medical data.
 
Although many problems with XML remain unsolved today, we should begin designing XML applications in healthcare for the future. In our view, one central problem today is working with large XML files and structures: in our experience, processing and accessing data in large XML files is time-consuming. Accessing a particular node, or a piece of information within a node, in a big XML file can take several minutes.
 
Whether this is an effect of beta-version software or of underpowered hardware, we think that we have to develop database processing methods to work with large XML files.
 

Technical Environment

Microsoft Platform
 

We work entirely on a Microsoft-based platform: MS Windows NT with MS BackOffice as the server solution, and MS Internet Explorer 4 or 5 beta on the clients. We use client- and server-side scripting, the XML DOM parser from Microsoft and, for database access, the Microsoft ADO extensions via OLE DB/ODBC.
 
Although the Microsoft products often go their own way and have their own interpretation of the W3C standards, we think that, today, they support XML processing better than other products.
 

Possible Solutions

 
Over the last month we have been evaluating XML with respect to database processing. There are now some good tools for working with XML in object or post-relational databases, but Poet and Cache are very expensive and are suitable only for large installations such as big hospitals or universities.
 
On the other hand, conventional relational database systems are widespread, cheap and built to handle terabytes of data without problems. The disadvantage of an RDBMS, however, is its limited ability to structure data, which is one of the benefits of XML.
 

Filestorage

 
XML as a file format is therefore not suitable for working with large amounts of data. Poor performance, missing security features and locking problems (for example, write access in a multi-user environment) make the use of databases necessary (Figure NOE-005).

Storage of XML in a file

 
XML in a relational database
 

Database Storage

 
There are several ways to put XML data into a relational database. All of them work on the principle of splitting the documents into useful fragments and storing these in table rows. The problems of indexing, referencing, searching and retrieving are discussed below.
 

Storage of single element-values and attributes in a table

 
One possibility is to store each single XML item in its own table row. The parser has to split the XML document into rows for the database, or combine the single elements from the database into a complete XML document (Figure NOE-008). The table field ID must therefore contain information about the element level so that the document can be rebuilt properly. In large documents this procedure also costs performance and time. The table grows very quickly and contains a lot of rows. The benefit of this method is that each item and element can be identified with ordinary SQL (Structured Query Language) statements.

One element - one table row
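
The element-per-row approach can be sketched as follows. This is only an illustration, not the implementation used in our system: it uses Python with SQLite instead of the Microsoft DOM parser and ADO, and the table name `item`, its columns, the `@` prefix for attribute rows and the sample document are all assumptions made for the example. The level information is kept in explicit `parent_id` and `level` columns rather than encoded in the ID itself.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sample document, not taken from a real system.
DOC = ("<patients><patient id='p1'><name>Meier</name>"
       "<diagnosis>C50</diagnosis></patient></patients>")


def shred(conn, xml_text):
    """Store every element and attribute as one row; return the row count."""
    conn.execute(
        "CREATE TABLE item (id INTEGER PRIMARY KEY, parent_id INTEGER, "
        "level INTEGER, name TEXT, value TEXT)"
    )
    counter = 0

    def add_row(parent_id, level, name, value):
        nonlocal counter
        counter += 1
        conn.execute("INSERT INTO item VALUES (?, ?, ?, ?, ?)",
                     (counter, parent_id, level, name, value))
        return counter

    def walk(elem, parent_id, level):
        my_id = add_row(parent_id, level, elem.tag, (elem.text or "").strip())
        # Attributes become child rows whose name starts with "@".
        for attr_name, attr_value in elem.attrib.items():
            add_row(my_id, level + 1, "@" + attr_name, attr_value)
        for child in elem:
            walk(child, my_id, level + 1)

    walk(ET.fromstring(xml_text), None, 0)
    return counter


def rebuild(conn):
    """Combine the rows into a document again (ids are in document order)."""
    nodes, root = {}, None
    query = "SELECT id, parent_id, name, value FROM item ORDER BY id"
    for row_id, parent_id, name, value in conn.execute(query):
        if name.startswith("@"):
            nodes[parent_id].set(name[1:], value)
            continue
        if parent_id is None:
            elem = root = ET.Element(name)
        else:
            elem = ET.SubElement(nodes[parent_id], name)
        elem.text = value or None
        nodes[row_id] = elem
    return root


conn = sqlite3.connect(":memory:")
row_count = shred(conn, DOC)
# Any single item can be addressed with an ordinary SQL statement.
codes = [r[0] for r in
         conn.execute("SELECT value FROM item WHERE name = 'diagnosis'")]
```

Even this tiny document produces five rows, which shows how quickly the table grows, and every rebuild has to touch all of them.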

 
 

Storage of complete nodes in a table

 
Another possibility is to split the XML document into node fragments and store these in a table. For example, in an XML document that contains all patients in a hospital, you would put each patient in one record. The XML fragment is stored in a BLOB field. The records are identified by an ID that serves as a unique primary index (Figure NOE-010). Performance is much better because the parser has little work to do. On the other hand, data retrieval with SQL on BLOB fields is possible but performs poorly. A better solution to this problem is to build meta index structures in a second table, as shown below.

XML fragments in a table row
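
A minimal sketch of the fragment-per-row approach, again in Python with SQLite purely for illustration. The table name `xml_fragment`, the `id` attribute used as the key and the sample data are assumptions for the example. The last function shows why searching without an index is slow: every BLOB has to be fetched and parsed.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sample document with two patients.
HOSPITAL = """<patients>
  <patient id="p1"><name>Meier</name><diagnosis>C50</diagnosis></patient>
  <patient id="p2"><name>Schmidt</name><diagnosis>C61</diagnosis></patient>
</patients>"""


def store_fragments(conn, xml_text, tag="patient"):
    """Store each <patient> node as one BLOB row; return the row count."""
    conn.execute(
        "CREATE TABLE xml_fragment (id TEXT PRIMARY KEY, fragment BLOB)")
    for node in ET.fromstring(xml_text).iter(tag):
        node.tail = None  # drop the whitespace between fragments
        conn.execute("INSERT INTO xml_fragment VALUES (?, ?)",
                     (node.get("id"), ET.tostring(node)))
    return conn.execute("SELECT COUNT(*) FROM xml_fragment").fetchone()[0]


def load_fragment(conn, fragment_id):
    """Fast path: one primary-key lookup, one small parse."""
    row = conn.execute("SELECT fragment FROM xml_fragment WHERE id = ?",
                       (fragment_id,)).fetchone()
    return ET.fromstring(row[0]) if row else None


def search_without_index(conn, tag, value):
    """Slow path: every fragment must be read and parsed to test its content."""
    hits = []
    query = "SELECT id, fragment FROM xml_fragment ORDER BY id"
    for fragment_id, blob in conn.execute(query):
        if ET.fromstring(blob).findtext(tag) == value:
            hits.append(fragment_id)
    return hits


conn = sqlite3.connect(":memory:")
stored = store_fragments(conn, HOSPITAL)
```

Loading one patient by ID parses only that patient's fragment, whereas a search by diagnosis has to parse all of them.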

 
 

Building meta index tables

meta index table
 

In a second table, called the meta index table, we put information about element or attribute values of interest. The column Index Name holds the element name, and the column Index Value holds the value of the element or attribute we want to search for. The "REF-to-ID" column contains a reference to the row in the XML table that holds the XML fragment we are looking for. With a simple SQL statement we can thus locate the XML fragments of interest (Figure NOE-012).

SQL queries over a meta-index table
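
The meta index table might be sketched like this. The column names `index_name`, `index_value` and `ref_to_id` follow the description above; the table names `xml_table` and `meta_index`, the choice of indexed elements and the sample fragments are assumptions for the example, and SQLite stands in for the actual database.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical fragments keyed by their row ID.
FRAGMENTS = {
    1: "<patient><name>Meier</name><diagnosis>C50</diagnosis></patient>",
    2: "<patient><name>Schmidt</name><diagnosis>C61</diagnosis></patient>",
    3: "<patient><name>Weber</name><diagnosis>C50</diagnosis></patient>",
}
INDEXED_ELEMENTS = ("name", "diagnosis")  # elements we expect to search on


def build(conn):
    """Fill the XML table and extract the indexed values into meta_index."""
    conn.execute(
        "CREATE TABLE xml_table (id INTEGER PRIMARY KEY, fragment BLOB)")
    conn.execute(
        "CREATE TABLE meta_index (index_name TEXT, index_value TEXT, "
        "ref_to_id INTEGER REFERENCES xml_table(id))")
    conn.execute(
        "CREATE INDEX meta_lookup ON meta_index (index_name, index_value)")
    for fragment_id, text in FRAGMENTS.items():
        conn.execute("INSERT INTO xml_table VALUES (?, ?)",
                     (fragment_id, text.encode("utf-8")))
        node = ET.fromstring(text)
        for tag in INDEXED_ELEMENTS:
            for hit in node.iter(tag):
                conn.execute("INSERT INTO meta_index VALUES (?, ?, ?)",
                             (tag, hit.text, fragment_id))


def find(conn, index_name, index_value):
    """Locate fragments through the index; only the matches are parsed."""
    rows = conn.execute(
        "SELECT x.id, x.fragment FROM meta_index m "
        "JOIN xml_table x ON x.id = m.ref_to_id "
        "WHERE m.index_name = ? AND m.index_value = ? ORDER BY x.id",
        (index_name, index_value))
    return [(fragment_id, ET.fromstring(blob)) for fragment_id, blob in rows]


conn = sqlite3.connect(":memory:")
build(conn)
matches = find(conn, "diagnosis", "C50")
```

The trade-off is that the index has to be maintained whenever a fragment is written, and only the elements chosen in advance can be searched this way.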

 
 

Normalization

 
In more complex data structures you have to distribute the information among different tables and cross-reference it by special XML attributes such as ID and REF. For example, we build one table with patient information (name, address, ...) and another table with diagnosis data (Figure NOE-014). In contrast to "normal" SQL statements with joins, here we have to define our SQL statements step by step.

Referencing tables
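
One way to read "step by step" is sketched below: because the REF value lives inside the stored XML fragment, the database cannot join on it, so the application first queries the diagnosis table, reads REF from the matching fragments, and then queries the patient table. This is an interpretation for illustration only; the table layout, the element names and the sample data are assumptions, and SQLite stands in for the actual database.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical fragments; REF on a diagnosis points to the ID of a patient.
PATIENTS = [
    '<patient ID="p1"><name>Meier</name></patient>',
    '<patient ID="p2"><name>Schmidt</name></patient>',
]
DIAGNOSES = [
    '<diagnosis ID="d1" REF="p1"><code>C50</code></diagnosis>',
    '<diagnosis ID="d2" REF="p2"><code>C61</code></diagnosis>',
    '<diagnosis ID="d3" REF="p1"><code>I10</code></diagnosis>',
]


def load(conn):
    """Create one table per fragment type, keyed by the XML ID attribute."""
    for table, docs in (("patient", PATIENTS), ("diagnosis", DIAGNOSES)):
        conn.execute(
            f"CREATE TABLE {table} (id TEXT PRIMARY KEY, fragment TEXT)")
        for text in docs:
            conn.execute(f"INSERT INTO {table} VALUES (?, ?)",
                         (ET.fromstring(text).get("ID"), text))


def patients_with_code(conn, code):
    """Return the names of all patients that have the given diagnosis code."""
    # Step 1: find the matching diagnoses and read REF from the XML itself.
    refs = []
    query = "SELECT fragment FROM diagnosis ORDER BY id"
    for (text,) in conn.execute(query):
        node = ET.fromstring(text)
        if node.findtext("code") == code:
            refs.append(node.get("REF"))
    # Step 2: follow each reference into the patient table.
    names = []
    for ref in refs:
        row = conn.execute("SELECT fragment FROM patient WHERE id = ?",
                           (ref,)).fetchone()
        if row:
            names.append(ET.fromstring(row[0]).findtext("name"))
    return names


conn = sqlite3.connect(":memory:")
load(conn)
names = patients_with_code(conn, "C50")
```

If the REF values were also copied into a meta index table, step 1 could be reduced to a single indexed SQL query.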

 
 

Next Steps

eXot - extensible organ specific tumor documentation
 

There are still some problems that we want to solve in the near future. In our eXot project (Figure NOE-016), an organ-specific tumor documentation in XML, we are currently evaluating XML in our application design on each layer. We have built prototypes for dynamic visualization of a reference information model in XML or in a relational database system, for dynamic queries with XSL (Extensible Stylesheet Language) on XML data in databases, and for distributing information with server-to-server communication servlets. We are also designing an XML-based form modeller that allows XML-based input forms to be defined in an HTML page. eXot consists of an HTML framework with global function definitions in JavaScript; the (organ-)specific input form is written in XML and loaded dynamically into the framework. Our findings so far give us cause for optimism that creative solutions can be developed for future problems in XML.

eXot - eXtensible Organspecific Tumordocumentation in XML

 
 

Conclusions

XML - the better alternative
 

We think that in the near future XML will become more and more important for creating dynamic user interfaces, managing application business logic and distributing data storage. Especially for the growing requirements of computer-assisted managed care applications, quality assurance programs and medical documentation, XML is already not merely suitable but the better alternative. Nevertheless, XML will not replace classical tools like relational database systems, but will make use of them.
 
Acknowledgments
 
The author wishes to thank Prof. Dr. Dudeck and his team for their commitment to establishing XML in healthcare. I hope that in the future they can fill even more people with enthusiasm for working with XML.
 
