Serializing Graphs of Data in XML   Table of contents   Indexes   XML: The way toward the virtual library

 

XSL Theory and Practice

Neil Bradley
Senior Consultant
ThomsonConsulting International
31st Floor
Centre Point
New Oxford Street
London WC1A 1PG
United Kingdom
Phone: +44 171 917 1491
Email: neil.bradley@thomsonconsulting.com
Web: www.thomsonconsulting.co.uk
 
Biographical notice:
 
Neil Bradley is a Senior Consultant specialising in publishing technologies, with over ten years' experience of building SGML editorial and publishing systems. In 1986, starting as a software developer at Pindar Plc, he wrote tools to aid the process of converting paper patent applications into SGML format, including probably the first ever WYSIWYG table editor. In the early 90s he took charge of a development team supporting the data conversion and editorial system needs of a variety of customers in the oil, engineering and publishing markets. A frequent speaker at industry events, he wrote The Concise SGML Companion, and more recently The XML Companion (both for Addison Wesley). Now working for ThomsonConsulting International, he focusses on the use of XML and related standards within the publishing process.
 
ABSTRACT:
 XSL  
 

This paper provides an overview of the progress and scope of a new stylesheet language called XSL (eXtensible Stylesheet Language). It is compared with previous proprietary and standard attempts, and each feature of the language is briefly explained.
 

What Is a Stylesheet Language?

 
The SGML (Standard Generalized Markup Language) and XML (eXtensible Markup Language) languages promote separation of content from format. This means that formatting instructions are not embedded within the text, but stored separately. The great advantage of this approach is that a different format can be applied to the same text simply by associating the text file with another formatting file. This formatting file is called a 'stylesheet'. Each document may contain a reference to the stylesheet that needs to be applied to it for its contents to be viewed, or an application that presents XML documents may give the user a choice of stylesheets to apply. There have been a number of approaches to defining a stylesheet, with differing syntax for the format instructions and differing capabilities. Some are proprietary to a product vendor, others are the result of a standardisation effort. The specification for each attempt can be termed a 'language'.
 
Notable efforts have included CSS (Cascading Style Sheets), DSSSL (Document Style and Semantics Specification Language) and FOSI (Format Output Specification Instance). All of these are vendor independent. They have differing strengths and weaknesses and are targeted at different audiences.
 

Introducing XSL

 
XSL is the latest attempt at defining a standard, and is currently being completed by the W3C (World Wide Web Consortium). There have been several draft versions, at first differing widely, but now showing signs of settling down.
 
XSL incorporates a number of features derived from experience gained in developing previous languages, including the use of XML markup (like the FOSI language) as the syntax of the language itself, the transformation capabilities previously found in DSSSL, and the on-line display features taken from CSS.
 
Using XML as the underlying data format for XSL stylesheets brings a number of benefits. A stylesheet can be created using an XML editor, which may use the XSL DTD to guide the author. A stylesheet can be validated using the XSL DTD and an XML parser. A stylesheet can be stored in a document component management system, promoting efficient versioning and re-use of stylesheet fragments. Finally, an XSL stylesheet can be browsed using an XML-aware browser (possibly using XSL to style the stylesheet!).
 
XSL comprises a set of statements for replacing XML elements with formatting styles, using 'pattern matching' rules, but goes far beyond CSS in its ability to re-order and re-use parts of the source document in order to do such things as generate a table of contents. In addition, it is possible to use the pattern matching capabilities to make it act as a transformation language, outputting a different XML document structure, or even an HTML document (perhaps for viewing in pre-XSL-aware browsers).
 
It is expected that XSL will reach W3C 'Recommendation' status very soon. Even at this stage, a number of products have implemented one or other of the draft specifications, including Microsoft's Internet Explorer (version 5.0 beta) and the Java-based LotusXSL transformation tool (available from www.alphaworks.com).
 

Pattern Matching

 
At the heart of the XSL format is the capability to identify elements in the source documents which require specific formatting rules to be applied. This is termed 'pattern matching'. Its capabilities in this area range from the very simple (apply these formats to every element with the name 'Para') to the very complex (apply these formats to every 'Para' element that appears within a 'Section' element, immediately follows a 'Figure' element, and has an attribute of 'type' with a value of 'continuation').
 
In the simplest case, the pattern is just the name of an element type. For example, the pattern 'Para' matches all 'Para' elements in the document. The '|' symbol is used to group elements that must be formatted in the same way, so avoiding repetition. The pattern 'Para|Note' applies to both 'Para' elements and 'Note' elements.
 
Context can be taken into consideration using the '/' symbol, so the pattern 'Chapter/Para' matches all 'Para' elements that are contained directly within 'Chapter' elements, and 'Chapter/Intro/Para' matches only those paragraphs in the chapter's introduction. The '/' symbol on its own represents the root element. Users of command-line operating systems will immediately see the parallels with directory structures in this scheme, and indeed the '*', '.' and '..' notations are also used, with meanings that this audience will find predictable. The '*' symbol is a 'wild-card' that stands in for any element name. For example, 'Chapter/*/Para' matches paragraphs that are directly contained within any element that is itself directly contained within a 'Chapter' element, and 'Chapter/*' matches any element directly contained in the chapter. Note that '.' and '..' apply to relative locations from a 'current' node, and will be covered later, when discussing transformations.
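
As a rough illustration of how such path-like patterns behave, the following Python sketch uses the standard library's ElementTree module, whose limited path syntax happens to share the '/' and '*' notations. It is not an XSL processor. The sample document and the 'texts' helper are invented for the example, and the leading './/' is ElementTree's own way of searching at any depth.

```python
import xml.etree.ElementTree as ET

# Invented sample document: one chapter with an introduction and a body paragraph.
doc = ET.fromstring(
    "<Book>"
    "<Chapter>"
    "<Title>One</Title>"
    "<Intro><Para>intro text</Para></Intro>"
    "<Para>body text</Para>"
    "</Chapter>"
    "</Book>"
)

def texts(path):
    """Return the text of every element found by an ElementTree path."""
    return [element.text for element in doc.findall(path)]

# Paragraphs directly inside a chapter.
direct = texts(".//Chapter/Para")
# Paragraphs in the chapter's introduction only.
in_intro = texts(".//Chapter/Intro/Para")
# Paragraphs one level down, inside any child of the chapter.
via_wildcard = texts(".//Chapter/*/Para")
# Names of all elements directly contained in the chapter.
chapter_children = [element.tag for element in doc.findall(".//Chapter/*")]
```

With this document, 'direct' holds only the body paragraph, while 'in_intro' and 'via_wildcard' both hold the introductory one.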
 
It is also possible to restrict matches to elements that contain other specified elements, or that carry specific attributes and attribute values. AND and OR constructs can be used to further restrict or to broaden the matching criteria.
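
The same idea can be sketched with ElementTree predicates in Python. The bracket notation used here is ElementTree's own and should not be read as the XSL draft syntax; the sample document is invented. ElementTree has no OR operator, so the sketch simply concatenates two searches to imitate one.

```python
import xml.etree.ElementTree as ET

# Invented sample: two sections, only the second contains a figure.
doc = ET.fromstring(
    "<Chapter>"
    "<Section><Para>plain</Para></Section>"
    "<Section><Figure/><Para type='continuation'>carries on</Para></Section>"
    "</Chapter>"
)

# Sections that contain a Figure child.
with_figure = doc.findall("Section[Figure]")

# Paragraphs whose 'type' attribute has the value 'continuation'.
continued = [p.text for p in doc.findall(".//Para[@type='continuation']")]

# AND: both restrictions combined in a single path.
both = [p.text for p in doc.findall("Section[Figure]/Para[@type='continuation']")]

# OR: imitated by joining two searches (results are grouped, not in document order).
para_or_figure = [e.tag for e in doc.findall(".//Para") + doc.findall(".//Figure")]
```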
 

Templates

 
An XSL document mainly consists of a number of 'templates'. Each template contains a pattern that identifies when it should be applied. The 'template' element contains a 'match' attribute to do this, and the following template is activated and applied to each occurrence of a paragraph in the source document:
 
<template match="Para">...</template>
 
Source document elements are usually formatted in document order. If a document contains only three paragraphs, the first one is identified and matched with a template, which formats its contents, then the second paragraph is processed in the same way, and finally the last paragraph is formatted. However, XML documents are organised into hierarchical structures, so XSL must include a mechanism to reflect this structure in its rules. The XSL processor must 'drill-down' into the document structures, applying appropriate templates at each level. The paragraphs may be embedded within chapters, and may themselves contain emphasized words, names or keywords. The 'apply-templates' element is used for this purpose. To ensure that elements embedded within a paragraph are also formatted, the following template can be used:
 
<template match="Para">
<apply-templates/>
</template>
 
This explicit instruction to process the children may seem superfluous at this point. While it is true that omitting this element allows the content of some elements to be hidden, its true importance will become clearer in the next section.
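
The 'drill-down' processing model can be imitated in a few lines of Python. This is only a sketch of the idea, not of any real XSL processor: templates are modelled as functions keyed by element name, 'apply_templates' stands in for the 'apply-templates' element, and the 'Emph' and 'Secret' element names, the HTML-like output and the fall-back rule are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET

def apply_templates(node, templates):
    """Process the text and child elements of node in document order."""
    parts = [node.text or ""]
    for child in node:
        rule = templates.get(child.tag, default_rule)
        parts.append(rule(child, templates))
        parts.append(child.tail or "")
    return "".join(parts)

def default_rule(node, templates):
    # Assumed fall-back: an element with no template is simply descended into.
    return apply_templates(node, templates)

def para(node, templates):
    return "<p>" + apply_templates(node, templates) + "</p>"

def emph(node, templates):
    return "<b>" + apply_templates(node, templates) + "</b>"

def secret(node, templates):
    # No call to apply_templates, so the content of this element is hidden.
    return ""

templates = {"Para": para, "Emph": emph, "Secret": secret}
doc = ET.fromstring(
    "<Doc><Para>A <Emph>bold</Emph> word<Secret>x</Secret>.</Para></Doc>"
)
result = apply_templates(doc, templates)
```

Here 'result' is '<p>A <b>bold</b> word.</p>': the emphasised word is reached only because the 'Para' rule asks for its children to be processed, and the 'Secret' content disappears because its rule does not.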
 
Note that when complex contextual patterns are used, it is possible that more than one template will match a given element instance. For example, a chapter title will match the patterns 'title' and 'chapter/title'. An XSL processor may choose to select the last template in the stylesheet that matches, but it is possible to be more explicit by giving each template a priority rating, using the 'priority' attribute (the more explicit the rule, the higher priority it should normally be given).
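
One possible way of resolving such conflicts is sketched below in Python: the highest priority wins, and a tie goes to the template that appears later in the stylesheet. The policy, the 'matches' helper (which understands only simple 'a/b' patterns) and the rule names are assumptions for illustration; a real processor may resolve conflicts differently.

```python
def matches(pattern, path):
    """True if the element's ancestry path ends with the steps of the pattern."""
    steps = pattern.split("/")
    return path[-len(steps):] == steps

def choose(rules, path):
    """Pick the matching rule with the highest priority; later rules win ties."""
    candidates = [
        (priority, index, name)
        for index, (pattern, priority, name) in enumerate(rules)
        if matches(pattern, path)
    ]
    return max(candidates)[2] if candidates else None

# Each rule is (pattern, priority, hypothetical rule name).
rules = [
    ("chapter/title", 2, "chapter-title-rule"),
    ("title", 1, "any-title-rule"),
]

chapter_choice = choose(rules, ["book", "chapter", "title"])
book_choice = choose(rules, ["book", "title"])
tie_choice = choose([("title", 1, "first"), ("title", 1, "second")], ["title"])
```

A chapter title matches both patterns and is given the more explicit rule; a book title matches only the general one.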
 

Transformations

 
Before looking at the actual formatting of text, the ability of XSL to modify the text coming from the source document should be recognised. From simply prefixing the content of a 'Note' element with the text 'NOTE:', to creating a table of contents or index, XSL has a number of mechanisms for transforming the document prior to actually styling the raw content.
 
Possibly the simplest form of transformation available in XSL is the ability to add prefix and suffix text to an element, as in the 'NOTE:' example above. To place this text before the content of the 'Note' element, it is entered before the 'apply-templates' element, as below:
 
<template match="Note">
<highlight style="bold">NOTE:</highlight><apply-templates/>
</template>
 
XSL has a powerful range of numbering options, for example to produce 'NOTE (3)', 'NOTE (4)', etc. Elements in the source document can be counted, which means that if any of these elements are re-used elsewhere, using the features described below, the original numbers are retained. In addition, elements in the generated output document can also be numbered, which means that structures built using the transformation features described below can themselves be numbered.
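
The difference between the two kinds of numbering can be sketched in Python. This is an illustration only, with an invented sample document and an invented 'label' helper: source numbers are fixed by position in the source and survive re-ordering, whereas output numbers are assigned to whatever sequence is finally produced.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<Doc><Note>a</Note><Para>p</Para><Note>b</Note><Note>c</Note></Doc>"
)

# Source numbering: each note is counted by its position in the source document.
source_number = {note: n for n, note in enumerate(doc.iter("Note"), start=1)}

def label(note):
    return f"NOTE ({source_number[note]}): {note.text}"

# Re-using the notes in reverse order keeps their original numbers.
reused = [label(note) for note in reversed(list(doc.iter("Note")))]

# Output numbering: the generated list is numbered in its own right.
renumbered = [f"{n}. {text}" for n, text in enumerate(reused, start=1)]
```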
 
The ability to locate and re-use information from other parts of the source document is provided by adding a 'select' attribute to the 'apply-templates' element. For example, to reproduce the title of the current chapter at the end of each contained note (a highly unlikely scenario!), it is possible to select the nearest ancestor of the note called 'chapter', then drill-down into that chapter to find its title:
 
<template match="Note">
<apply-templates/>
TITLE: <apply-templates select="ancestor(chapter)/title"/>
</template>
 
The distinction between pattern 'matching' and pattern 'selection' needs to be made clear at this point. When deciding which rule to apply to an element in the source document (using the pattern matching techniques described above), the 'match' attribute is used. For example, this attribute is used in the 'template' element to identify elements to style in a specific form (match="Para" finds all the paragraphs). But when a rule has been selected, and that rule includes instructions to locate and re-use information from another part of the source document, a 'select' attribute is used to specify the pattern needed to do this search. Selections imply a starting-point, a 'current' node in the source document that is being formatted by a template. From this location, it is possible to specify a pattern that indicates a 'relative' location. Again, symbols familiar to MS-DOS and UNIX aficionados are used, with '.' representing the current node and '..' representing its parent. For example, it is possible to select the title of the current chapter when processing one of its paragraphs using '../Title'.
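
Selection relative to a current node can be imitated in Python as follows. ElementTree elements do not record their parents, so this sketch builds a parent map first; the map, the 'nearest_ancestor' helper and the sample document are assumptions made for the example, and the code only mirrors what '../Title' and a nearest-ancestor selection are described as doing.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<Book><Chapter><Title>Patterns</Title>"
    "<Para>text</Para>"
    "<Section><Note>take care</Note></Section>"
    "</Chapter></Book>"
)

# Map every element to its parent so that '..' can be followed.
parent_of = {child: parent for parent in doc.iter() for child in parent}

def nearest_ancestor(node, tag):
    """Walk upwards from node until an element named tag is found."""
    node = parent_of.get(node)
    while node is not None and node.tag != tag:
        node = parent_of.get(node)
    return node

# Equivalent of '../Title' when the current node is a chapter paragraph.
para = doc.find(".//Para")
title_via_parent = parent_of[para].find("Title").text

# Nearest 'Chapter' ancestor of a nested note, then down to its title.
note = doc.find(".//Note")
title_via_ancestor = nearest_ancestor(note, "Chapter").find("Title").text
```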
 
This concept can be used to selectively process children, to locate and copy-out some grandchildren or other descendants, or even to locate and re-use information elsewhere in the document, either located somewhere in the ancestry of the element, or at a given location within the document as a whole, or identified by a unique identifier. The 'apply-templates' element can also be included in the template several times, each time with a different select pattern, and this approach would be ideal for building a table of contents at the top of the document; first listing the embedded chapters, then all the tables, then all the figures.
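
The table of contents idea amounts to running several selections one after another and emitting the results in that order. A minimal Python sketch, using an invented document and ElementTree paths in place of XSL select patterns:

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<Book>"
    "<Chapter><Title>Intro</Title>"
    "<Figure><Title>Overview</Title></Figure></Chapter>"
    "<Chapter><Title>Usage</Title>"
    "<Table><Title>Options</Title></Table>"
    "<Figure><Title>Workflow</Title></Figure></Chapter>"
    "</Book>"
)

# One selection per group: chapters first, then tables, then figures.
selections = (
    ("Chapter", "Chapter/Title"),
    ("Table", ".//Table/Title"),
    ("Figure", ".//Figure/Title"),
)

toc = []
for label, path in selections:
    for title in doc.findall(path):
        toc.append(f"{label}: {title.text}")
```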
 
XSL can sort elements during formatting to produce alphabetically arranged output. Multiple sort levels can be arranged, for example to sort a list of names on first the last name, then the first name.
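
A two-level sort of this kind can be sketched in Python with a compound sort key. The element names 'Person', 'First' and 'Last' are invented for the example.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<People>"
    "<Person><First>Neil</First><Last>Smith</Last></Person>"
    "<Person><First>Ann</First><Last>Smith</Last></Person>"
    "<Person><First>Zoe</First><Last>Jones</Last></Person>"
    "</People>"
)

# Sort on the last name first, then on the first name.
ordered = sorted(
    doc.findall("Person"),
    key=lambda person: (person.findtext("Last"), person.findtext("First")),
)
names = [f"{p.findtext('Last')}, {p.findtext('First')}" for p in ordered]
```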
 

Formatting

 
Using the transformation features described above, XSL can format a document by simply replacing source elements with formatting tags in the output file. Outputting HTML will certainly be a popular choice, especially when the document is to be presented in current and older Web browsers. It should also be possible to output formats that are not remotely compatible with XML, such as the RTF format, by replacing source elements with something that the XSL processor will consider to be nothing but plain text, but which an RTF reader will 'see' as formatting instructions. However, there are some complications with this approach concerning significant XML characters such as '<', '>' and '&'; they are output as references such as '&lt;' because it is assumed that the output will be read by another XML application. Some additional processing (or even parsing) would be necessary.
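
The escaping problem is easy to reproduce with any XML serialiser. The Python sketch below uses ElementTree as a stand-in for an XSL processor's output stage (an assumption; the 'out' wrapper element and the RTF-like text are invented) and then shows the kind of additional processing that would be needed to recover the plain characters.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import unescape

# Text intended for a non-XML reader, containing significant XML characters.
out = ET.Element("out")
out.text = r"{\b NOTE:} 1 < 2 & 3 > 2"

# Serialising as XML replaces the significant characters with references.
serialized = ET.tostring(out, encoding="unicode")

# Additional processing: strip the wrapper element and undo the references.
body = serialized[len("<out>"):-len("</out>")]
restored = unescape(body)
```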
 
Of course, it is possible to create an XML DTD that describes an XML-based formatting language. The XSL specification does in fact define a category of output from an XSL processor called a 'Formatting Object'. When the processor has no formatting capability itself, this may be made concrete in the form of XML elements. The standard includes elements such as 'block' (to represent a block of text that should be separated from surrounding text), and attributes such as 'font-weight' (to indicate whether the content is to be styled in bold). For example, wherever the source document has a 'Note' element, it may be desired that a bold paragraph be presented, and this may be represented in the output XML document as:
 
<block font-weight="bold">This is a note.</block>
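
How such an element might be generated can be sketched in Python. The 'format_note' function is a hypothetical stand-in for a template that maps a 'Note' onto a bold 'block'; it is not part of any XSL implementation.

```python
import xml.etree.ElementTree as ET

def format_note(note):
    """Replace a source 'Note' element with a bold 'block' formatting element."""
    block = ET.Element("block", {"font-weight": "bold"})
    block.text = note.text
    return block

note = ET.fromstring("<Note>This is a note.</Note>")
output = ET.tostring(format_note(note), encoding="unicode")
```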
 
When the XSL processor is also the formatting application, no such output is required. The application translates the DOM nodes (or whatever API is supported by the XSL processing model) directly into formatted output on the page or on-screen.
 
It is not possible to list, at this stage, all the formatting options that XSL will make available, as this is the least well developed part of the specification. While taking the wide scope of the DSSSL format's capabilities in this area as a starting-point, there is also a need to provide compatibility with CSS to help developers moving up from this standard.
 

Helper Features

 
XSL also contains a number of features that do not in themselves add any new capabilities in terms of what can be done with a source document, but do make the construction of stylesheets much easier. Among the 'helper' features XSL includes, perhaps the following are the most significant.
 
The Macro feature allows frequently used instructions to be placed in a separate location, and named, so that they can be called from anywhere in the stylesheet. This makes the stylesheet smaller, and potentially more legible. Parameters can be passed to them to make them more flexible.
 
Large stylesheets can be made simpler to use and more re-usable by splitting them into smaller pieces, then using the importing capability to re-combine them in different ways.
 
When a regular structure is to be output, such as a table, the stylesheet can be made much simpler by incorporating all the formatting rules into a single template that uses instructions to state that 'for-each' matching instance, the template should be filled with data from specified sub-elements.
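
The 'for-each' approach corresponds to an ordinary loop that fills one row layout with data from each matching element. A Python sketch, with an invented parts list and HTML-like table element names chosen for the example:

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<Parts>"
    "<Part><Name>Bolt</Name><Qty>40</Qty></Part>"
    "<Part><Name>Nut</Name><Qty>55</Qty></Part>"
    "</Parts>"
)

table = ET.Element("table")
# For each matching 'Part', fill one row from the specified sub-elements.
for part in doc.findall("Part"):
    row = ET.SubElement(table, "tr")
    for field in ("Name", "Qty"):
        ET.SubElement(row, "td").text = part.findtext(field)

table_markup = ET.tostring(table, encoding="unicode")
```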
 

What's Out There Now?

 
The Java-based XSL processor called LotusXSL is available from 'www.alphaworks.ibm.com/formula/lotusxsl'.
 
The Java-based XSL processor called XT is available from 'www.jclark.com/xml/xt.html'.
 
The Java-based XSL processor called Koala XSL Engine is available from 'www.inria.fr/koala/XML/xslProcessor/'.
 
The Internet Explorer 5 beta also accepts XML documents that reference XSL stylesheets, but only works if the stylesheet transforms the XML document into an HTML document.
 

Summary

 
There are numerous other features in the draft specification that could not be covered in this short paper, but the general scope and nature of this language should be apparent from this description. With broad industry support, and a number of supporting applications on the way, this attempt to define a standard styling language should be a great success.
