![]() |
Query languages | Table of contents | Indexes | A canonical query language &, its efficient implementation | ![]() |
|||
Quilt: an XML query language |
Robie, Jonathan ![]() |
| Jonathan Robie |
| Research Fellow |
Durham ![]() North Carolina ![]() Software AG ![]() USA ![]() | Software AG, Durham North Carolina USA email: Jonathan.Robie@SoftwareAG-USA.com |
| Biography |
| Mr. Robie has a M.S. Degree from Michigan State University. |
| Chamberlin, Don |
| Don Chamberlin |
IBM Corporation ![]() | IBM Corporation, Almaden Research Center, email: chamberlin@almaden.ibm.com |
| Biography |
| Dr. Chamberlin holds a B.S. degree from Harvey Mudd College and a Ph.D. from Stanford University. |
| Florescu, Daniela |
| Daniela Florescu |
| Researcher |
| INRIA | INRIA, email: Daniela.Florescu@inria.fr |
| Biography |
| Dr. Florescu has a Ph.D. in Computer Science from the University of Paris VI. |
| Abstract |
Introduction |
| "Mediocre composers borrow; great composers steal." -- Igor Stravinsky |
| Quilt is a query language for XML. It originated when the authors attempted to apply XML query languages such as XML-QL, XPath, XQL, YATL, and XSQL to a variety of use cases. We found that each of these languages had strong advantages for some of the queries we examined, but was unable to express other queries we found equally important. Therefore, we chose to take some of the best ideas from these languages, plus some ideas from SQL and OQL, integrating them with a fresh syntactic approach. Our goal was to design a small, implementable language that met the requirements specified in the W3C XML Query Working Group's http://www.w3.org/XML/Group/xmlquery/xmlquery-req XML Query Requirements. During our design work, we have adapted features from various languages, carefully assembling them to form a new design--hence the name "Quilt". The resulting language supports queries that draw information from various sources and patch them together in a new form. This is another reason for the name "Quilt". |
A first look at Quilt |
| In this section, we will introduce the Quilt language using some example queries based on the following XML document: |
<?xml version="1.0"?> <!DOCTYPE bib SYSTEM "books.bibitemd"> <bib> <book> <title>Harold and the Purple Crayon</title> <author> <lastname>Johnson</last> <firstname>Crockett</first> </author> <pubinfo> <publisher>Harper and Row</publisher> <price>$4.76</price> <year>1955</year> </pubinfo> </book> <book> <title>Harold's Fairy Tale</title> <author> <lastname>Johnson</last> <firstname>Crockett</first> </author> <pubinfo> <publisher>Harper and Row</publisher> <price>$4.76</price> <year>1956</year> </pubinfo> </book> <book> <title>Rise Up Singing</title> <author> <lastname>Blood</last> <firstname>Peter</first> </author> <author> <lastname>Patterson</last> <firstname>Annie</first> </author> <pubinfo> <publisher>Sing Out Corporation</publisher> <price>$15.45</price> <year>1988</year> </pubinfo> </book> </bib> |
FOR $b IN document("http://www.biblio.com/books.xml")//book,
$a IN $b/author
WHERE $a/firstname = "Crockett" AND $a/lastname = "Johnson"
RETURN $b
|
<Result>
(
FOR $b IN document("http://www.biblio.com/books.xml")//book,
$a IN $b/author
WHERE $a/firstname = "Crockett"
AND $a/lastname = "Johnson"
RETURN $b
)
</Result>
|
<Result>
(
FOR $a IN DISTINCT document("http://www.biblio.com/books.xml")//author
RETURN
<BooksByAuthor>
<Author>
$a/lastname/text()
</Author>
(
FOR $b IN document("http://www.biblio.com/books.xml")//book[author=$a]
RETURN $b/title SORTBY(title)
)
</BooksByAuthor> SORTBY(Author)
)
</Result>
|
| The result of this query on the sample data shown above is as follows: |
<Result> <BooksByAuthor> <Author>Blood</Author> <title>Rise Up Singing</title> </BooksByAuthor> <BooksByAuthor> <Author>Johnson</Author> <title>Harold and the Purple Crayon</title> <title>Harold's Fairy Tale</title> </BooksByAuthor> <BooksByAuthor> <Author>Patterson</Author> <title>Rise Up Singing</title> </BooksByAuthor> </Result> |
| The queries discussed above explicitly state the URL of the document to be queried. This is not always necessary, and is sometimes undesirable. Some queries operate on a set of nodes for which there is no URL, or for which it is not convenient to provide a URL; for instance, a query may operate on the set of documents found in a web site or document repository, the set of nodes found in a collection used in some programming language, or the results of a previous query. Furthermore, it is useful to be able to write queries that may be applied to a variety of data sources, including data sources that do not exist at the time that the query is written. If the document () function does not occur in a variable binding, the binding is applied to the set of input nodes for the query, which is implicitly determined by the environment in which the query is executed. The following query is equivalent to the first query shown in this paper, except that it binds the variable $b to all book elements contained within the implicit set of input nodes for the query: |
FOR $b IN //book, $a IN $b/author WHERE $a/first = "Crockett" AND $a/last = "Johnson" RETURN $b |
| At this point, we have seen the basic structure of a Quilt query. The next section describes the syntax of Quilt in more detail. |
The Quilt language |
| This section discusses the structure and meaning of a Quilt query, starting with top-level syntax and then examining each clause in detail. |
Syntax |
| The following syntax shows the top-level structure of a Quilt query. The grammar is still being developed as the language evolves, and is incomplete in the current version of this document. Terminal symbols are shown in angular brackets and their lexical structure is not further specified. |
Query |
::= |
Function_defn* Expression |
Function_defn |
::= |
'FUNCTION' |
|
| <Function_name> |
|
| '(' <Variable> * ')' '{' Expression '}' |
Expression |
::= |
<Variable> | <Constant> | |
|
| <Function_name> '(' ExpressionList ')' | Expression Operator Expression | |
|
| <XPathExpression> | ElementConstructor | FLWR_Expression | |
|
| '(' Expression ')' |
ExpressionList |
::= |
Expression | Expression ',' Expression |
ElementConstructor |
::= |
StartTag ExpressionList Enbibitemag |
StartTag |
::= |
'<' <String> Attribute* '>' |
Enbibitemag |
::= |
'</' <String> '>' |
Attribute |
::= |
<String> '=' Expression | Expression |
FLWR_Expression |
::= |
(FOR_clause | LET_clause)+ WHERE_clause? RETURN_clause |
FOR_clause |
::= |
'FOR' FOR_binding (, FOR_binding)* |
FOR_binding |
::= |
<Variable> 'IN' 'DISTINCT'? Expression |
LET_clause |
::= |
'LET' LET_binding (, LET_binding)* |
LET_binding |
::= |
<Variable> ':=' 'DISTINCT'? Expression |
WHERE_clause |
::= |
'WHERE' Expression |
RETURN_clause |
::= |
'RETURN' Expression SORTBY_clause? |
SORTBY_clause |
::= |
'SORTBY' '(' ExpressionList ')' ( 'ASCENDING' | |
|
| 'DESCENDING' )? |
Operator |
::= |
'<' | '<=' | '>' | '>=' | '=' | '!=' | '+' | .... |
|
| A Quilt query can begin with the definition of one or more functions that are used in the body of the query. The body of the query is simply an expression. One important type of query is based on an "FLWR_expression", which is constructed from FOR, LET, WHERE, and RETURN clauses. But a query may also be based on other types of expressions, such as an XPath expression or a element constructor. Element constructors are often used to generate new output elements that contain data computed by the query. |
| In this grammar, XPathExpression is a terminal symbol. This symbol represents an expression based on the abbreviated syntax of XPath, enhanced by the following operators which are borrowed from XQL '99: |
|
| We also use the document() function, which is taken from XSLT. In this paper, the argument of the document() function is always a URL. When used in this way, the document() function returns the root node of the document. Some other functions are also used in the example queries, and are introduced where they are used. There is not yet a well-defined standard function library for Quilt. |
| The overall flow of information through an FLWR_expression is illustrated in . The input to the expression consists of one or more XML documents or fragments that are named in the FOR and/or LET clauses. Each of these fragments can be conceptualized as an "ordered forest"--that is, an ordered list of nodes representing XML elements, each with its tree of descendant nodes. The result of the FOR and LET clauses is an ordered list of tuples, each containing a value for each of the bound variables. The value of a variable bound by a FOR clause is a tree (a single node and its descendants). The value of a variable bound by a LET clause is a (possibly empty) ordered forest of nodes. The WHERE clause serves as a filter that discards some of the tuples and retains others. Finally, the RETURN clause is executed for each surviving tuple, generating output nodes from the values of the bound variables. The nodes generated by the RETURN clause can be linearized into an output XML document or fragment. |
|
FOR |
| The FOR clause generates bindings for one or more variables. Each variable introduced in the FOR clause is associated with an expression (typically, an XPath expression.) In general, each of these expressions returns a set of nodes. The result of the FOR clause is a set of tuples, each of which contains a binding for each of the variables. The variables are bound to individual nodes in the sets returned by their respective expressions, together with their descendants. The number of tuples generated by a FOR clause is the product of the cardinalities of the node-sets returned by the respective expressions. The implied ordering among the tuples generated by the FOR clause is derived from the ordering of their elements in the input document, with the first bound variable taking precedence, followed by the second bound variable, etc. |
| In a future version of the Quilt grammar, we plan to provide a way in which the ordering of the tuples generated by a FOR clause can be relaxed, in order to permit more efficient processing of queries in which order is not important. |
| The DISTINCT keyword can be applied independently to each expression in a FOR clause, serving to eliminate duplicate values from the node-sets returned by the expression. Equality is defined by equality of value rather than by identity. A node set generated using the keyword DISTINCT is unordered, and the tuples of bindings generated from such a node set are unordered also. When DISTINCT is specified and several candidate nodes of equal value are available for binding, Quilt does not specify which of the candidate nodes is bound to the variable. |
| The following queries illustrate use of the DISTINCT keyword: |
|
| The tuples generated by a FOR clause can be used to construct tabular representations of hierarchical data, such as might be used in HTML tables or relational databases. The following example query creates part of an XHTML document in which names of authors and book titles appear in separate columns of a table: |
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"> <head> <title> Books By Author </title> </head> <body> <p> The following table lists authors and books written by each author. </p> <table wibibitemh="50%" border="1"> ( FOR $a IN DISTINCT //author , $b IN //book[author = $a] RETURN <tr> <td> $a/first/text(), $a/last/text() </td>, <td> $b/title/text() </td> </tr> ) </table> </body> </html> |
| The table generated by the above query might have the following appearance: |
|
LET |
| The LET clause binds a variable to the value of an expression (typically an XPath expression.) Since an expression in general returns an ordered list of nodes, the value of a variable bound in a LET clause is an ordered list of nodes, together with their descendants (an "ordered forest"). The value bound by the LET clause preserves the ordering of nodes in the input document, unless DISTINCT is specified. DISTINCT causes duplicate nodes to be eliminated and the ordering of the nodes to become undefined. |
| The FOR and LET clauses work together to generate tuples of variable bindings. Unlike a FOR clause, however, a LET clause does not affect the number of tuples that are generated. Each LET clause binds its variable to exactly one value (an "ordered forest"). If a query contains a LET clause but no FOR clause, exactly one tuple of variable bindings is generated. |
| Each binding in a FOR or LET clause may contain references to variables that have already been bound in earlier clauses or in an outer-level query. |
| A LET clause is often used to bind a variable to a set of values that is used as the argument of some aggregate function such as avg(). For example, the following query returns the average price of all the books in the input document: |
LET $b := //book RETURN <avgprice> avg($b/price) </avgprice> |
| A LET clause can be combined with a FOR clause to generate several sets and to perform some computation on each. The following example uses variable $pub to iterate over all the publishers in the input document, binds variable $pri to the set of prices of books published by each publisher, and then generates an output document that lists each publisher with its average book price. |
FOR $pub IN DISTINCT //publisher
LET $pri := document("http://www.biblio.com/books.xml")
//book/pubinfo[publisher=$pub]/price
RETURN
<pubinfo>
$pub ,
<avgprice> avg($pri) </avgprice>
</pubinfo>
|
WHERE |
| The FOR and LET clauses generate tuples of bindings for the variables that are introduced in these clauses. Each of these tuples is subject to further filtering by the WHERE clause. Only those tuples of bindings for which the condition in the WHERE clause is true are used to invoke the RETURN clause. |
| In the WHERE clause, predicates may be combined using parentheses, AND, OR, and NOT. Predicates are based on XPath expressions that contain the variables bound in the FOR and LET clauses. The WHERE clause may also use several operators taken from XQL: set intersection is expressed with the INTERSECT keyword, sequence is expressed with the BEFORE and AFTER operators, and set difference is expressed using the EXCEPT operator. These operators will be illustrated by example queries throughout the remainder of this paper. |
| In the following example, the WHERE clause uses a built-in function named exists(), which returns False if its argument is the empty set, and otherwise returns True. The query finds books that have titles but no authors. |
<orphan_books> ( FOR $b IN //book WHERE exists($b/title) AND NOT exists($b/author) RETURN $b/title ) </orphan_books> |
| Remember that variables bound in a FOR clause are bound to individual nodes (with their descendants), but variables bound in a LET clause are bound to ordered sets of nodes (with their descendants). In the WHERE clause, appropriate predicates must be used with each type of variable. For example, in the following query, $bk is bound to a set of books, and the WHERE clause appropriately applies a count() function to count the number of books in the set. The query returns publishers who have published more than 100 books. |
FOR $pub IN DISTINCT //publisher LET $bk := //book[pubinfo/publisher=$pub] WHERE count($bk) > 100 RETURN $pub |
| If it were desired to add an additional condition on books, such as "find publishers who published more than 100 books in 1999", this condition could not be added to the WHERE clause, since the WHERE clause has access only to sets of books, not to individual books. The proper place to add such a condition would be in the XPath expression that defines $bk, as follows: |
FOR $pub IN DISTINCT //publisher LET $bk := //book[pubinfo/publisher=$pub AND pubinfo/year="1999"] WHERE count($bk) > 100 RETURN $pub |
RETURN |
| The RETURN clause generates the output of the FLWR-expression, which may be a node, an "ordered forest" of nodes, or a primitive value. The RETURN clause is invoked once for each tuple of variable bindings that is generated by the FOR and LET clauses and satisfies the condition in the WHERE clause. If an ordering exists among the tuples of variable bindings, the RETURN clause is invoked on each tuple, in order, and the order is preserved in the output document. The RETURN clause contains an expression that may include structured XML text, bound variables and XPath expressions based on these variables, and subqueries. We have already seen many examples of RETURN clauses, and we will see more in the queries that follow. |
| It is often important to specify an order for the elements in a query result, supplementing or superceding the ordering derived from the bindings of variables. If the query result contains several levels of elements, an ordering may be required among the elements at each level. Quilt provides a SORTBY clause that may be used after a constructed element, variable, or path expression to specify an ordering among the generated elements. The arguments of the SORTBY clause are evaluated relative to the individual nodes to be sorted, and may be followed by ASCENDING or DESCENDING to specify the direction of the sort (ASCENDING is the default). |
| The following query illustrates some of the features of SORTBY. It generates an alphabetical list of publishers. Within each publisher it generates an alphabetical list of book titles, followed by a list of authors ordered by their last and first names. The notation "SORTBY(.)" indicates that the generated elements are to be ordered by their content. |
<result> ( FOR $p IN DISTINCT //publisher RETURN <publisher> <name> $p/text() </name> , <books> ( FOR $t IN DISTINCT //book[pubinfo/publisher = $p]/title RETURN $t SORTBY(.) ) </books> , <authors> ( FOR $a IN DISTINCT //book[pubinfo/publisher = $p]/author RETURN $a SORTBY(lastname, firstname) ) </authors> </publisher> SORTBY(name) ) </result> |
Filtering documents |
| An important class of queries extracts selected elements from a document while preserving the relationships among the selected elements. In many cases, both hierarchic and sequential relationships must be preserved. This type of query can be expressed in Quilt by using the keyword FILTER in a LET clause. |
| As we have seen, a LET clause binds a variable to an ordered set of nodes that are selected by an XPath expression. We will refer to these nodes as the "selected nodes". In general, each of the selected nodes carries with it all its descendant nodes, so we may think of the selected nodes as a list of trees (node hierarchies). A LET clause may also contain the keyword FILTER followed by a second XPath expression called the filter. Each of the subtrees in the selected list is "filtered" in such a way that only those nodes are retained that satisfy the filter XPath expression. Descendants of nodes that satisfy the filter are not retained unless they independently satisfy the filter. The hierarchical and sequential relationships among the surviving nodes are preserved (for example, if node Y is a descendant of node X and all the intervening nodes between X and Y fail to satisfy the filter, then Y becomes an immediate descendant of X.) |
| If the right-hand side of a LET clause contains only the FILTER keyword followed by a filter expression, the input to the filter is the implicit input document of the query. |
| shows the effect of applying a filter to a tree of nodes. In this case, the filter retains only nodes of type A and B, causing the original tree to split into two trees. |
|
| We will illustrate the use of FILTER by writing some queries that operate on a document named "cookbook.xml". The document contains many different kinds of elements, including Section elements that may be nested inside each other to many levels of nesting. Each Section element immediately contains a Title element. Mixed among the other elements, at various levels of nesting, are Figure elements, which may also contain Title elements. The following fragment illustrates the structure of the cookbook: |
<Section><Title>Desserts</Title> <Para>This section tells you all about yummy desserts.</Para> <Section><Title>Cookies</Title> <Para>Cookies are flat things that sometimes have chocolate chips. <Figure image="cookie1"><Title>A Big Cookie</Title></Figure> </Para> </Section> <Section><Title>Candy</Title> <Para>Candy is dandy.</Para> </Section> <Section><Title>Pies and Tarts</Title> <Para>Pies and tarts are even better.</Para> <Figure image="applepie"><Title>An Apple Pie</Title></Figure> <Figure image="peachpie"><Title>A Peach Pie</Title></Figure> </Section> <Section><Title>Cakes and Cupcakes</Title> <Para>Cakes and Cupcakes are absolutely awesome.</Para> <Section><Title>Cakes</Title> <Figure image="choc_cake"><Title>A Chocolate Cake</Title></Figure> </Section> <Section><Title>Cupcakes</Title> <Para>Cupcakes are not as big but just as good.</Para> </Section> </Section> </Section> |
| Suppose that we wish to generate a table of contents for the cookbook, in which the hierarchy of nested sections is preserved but only the title of each section is included. We might think of this as a process of "projecting" the sections and section titles out of the document, preserving their original hierarchy and sequence, while eliminating other kinds of elements. The following query could be used for this purpose: |
<toc>
(
LET $s := document("cookbook.xml")
FILTER //Section | //Section/Title | //Section/Title/text()
RETURN $s
)
</toc>
|
| In this example, the LET clause uses FILTER to bind variable $s to a copy of the cookbook document that contains only Section elements and Title elements that are immediately contained within Section elements, preserving the original relationships among these elements. All other elements are removed, even if they are descendants of a selected element. Since $s then contains exactly the elements that we wish to include in the output, the query can simply return $s without any further processing. The result of applying this query to the document fragment above is as follows: |
<Section><Title>Desserts</Title> <Section><Title>Cookies</Title> </Section> <Section><Title>Candy</Title> </Section> <Section><Title>Pies and Tarts</Title> </Section> <Section><Title>Cakes and Cupcakes</Title> <Section><Title>Cakes</Title> </Section> <Section><Title>Cupcakes</Title> </Section> </Section> </Section> |
| The hierarchy and sequence of the query results are determined by the document, and could not have been known in advance by the person formulating the query; in fact, the purpose of the query is to determine this unknown hierarchy and sequence. |
| Now suppose that we wish to extend our table of contents to include the number of Figures in each Section. This is a much more complex transformation in the structure of the document, and we will accomplish it by defining a function. As shown in the Quilt grammar, a query can begin by defining a function that takes one or more parameters. In this example, we will define a function named "section_summary" that takes a Section element as its parameter. A function is allowed to call itself recursively, and we will make use of this capability in this example. The "section_summary" function generates a new Section element that contains only the Title of the original Section and the number of Figures that are immediately contained in it; it also invokes itself recursively to process all immediately nested Sections in the same way. |
| The main part of the query invokes the "section_summary" function on all the top-level Sections of the document. But before doing this, it filters the input document so that it contains only Section, Title, and Figure elements. The filtering step is necessary so that all Figure elements appear to be immediately contained within their respective Sections, even if they were originally nested inside some other element such as a paragraph or a list. By eliminating intervening elements, the FILTER enables each Section to compute an accurate count of its Figures. |
FUNCTION section_summary($s)
{
<Section>
$s/Title ,
(
LET $f := $s/Figure
RETURN <Figcount> count($f) </Figcount>
) ,
(
FOR $ss IN $s/Section
RETURN section_summary($ss)
)
}
<toc>
LET $stf := document("cookbook.xml")
FILTER //Section | //Title | //Figure
FOR $s IN $stf/Section
RETURN section_summary($s)
</toc>
|
| The result of applying this query to the document fragment shown above is as follows: |
<Section><Title>Desserts</Title> <Figcount> 0 </Figcount> <Section><Title>Cookies</Title> <Figcount> 1 </Figcount> </Section> <Section><Title>Candy</Title> <Figcount> 0 </Figcount> </Section> <Section><Title>Pies and Tarts</Title> <Figcount> 2 </Figcount> </Section> <Section><Title>Cakes and Cupcakes</Title> <Figcount> 0 </Figcount> <Section><Title>Cakes</Title> <Figcount> 1 </Figcount> </Section> <Section><Title>Cupcakes</Title> <Figcount> 0 </Figcount> </Section> </Section> </Section> |
Querying relational data |
| Since much of the world's business data is now stored in relational databases, access to relational data is a vitally important application for an XML query language. In this section, we will illustrate the use of Quilt to access relational data by a series of examples based on a very simple database that describes an online auction. The three tables in the auction database are shown below. The USERS table has a row for each user who is registered with the auction, either as a seller or a bidder, with the USERID column as its primary key. The ITEMS table lists all the items that are currently for sale, with the ITEMNO column as primary key and the OFFERED_BY column as a foreign key that matches the USERID of the offering user. The BIDS table lists the bids that have been received for various items, using the USERID and ITEMNO columns as foreign keys to identify the userid of the bidder and the item to which the bid applies. |
USERID |
NAME |
RATING |
|
ITEMNO |
DESCRIPTION |
OFFERED_BY |
RESERVE_PRICE |
|
USERID |
ITEMNO |
BID_AMOUNT |
BID_DATE |
|
| In this section, we will assume that a layer of software running on top of a relational database system presents each table in the form of an XML document. The name of the document element is the same as the name of the table, and each row of the table is represented by a nested element that has the same name as the table with the suffix "_tuple". Inside each row element are nested a series of elements that represent the column-values stored in that row. For example, two rows of the USERS table might be represented by the following XML view: |
<users> <user_tuple> <userid> 1005 </userid> <name> Baker </name> <rating> B </rating> </user_tuple> <user_tuple> <userid> 1007 </userid> <name> Carter </name> <rating> A </rating> </user_tuple> </users> |
| The Document Type Definitions for the XML views of the three tables in our online auction database are as follows: |
<!DOCTYPE users [ <!ELEMENT users (user_tuple*)> <!ELEMENT user_tuple (userid, name, rating)> <!ELEMENT userid (#PCDATA)> <!ELEMENT name (#PCDATA)> <!ELEMENT rating (#PCDATA)> ]> <!DOCTYPE items [ <!ELEMENT items (item_tuple*)> <!ELEMENT item_tuple (itemno, description, offered_by, reserve_price)> <!ELEMENT itemno (#PCDATA)> <!ELEMENT description (#PCDATA)> <!ELEMENT offered_by (#PCDATA)> <!ELEMENT reserve_price (#PCDATA)> ]> <!DOCTYPE bids [ <!ELEMENT bids (bid_tuple*)> <!ELEMENT bid_tuple (userid, itemno, bid_amount, bid_date)> <!ELEMENT userid (#PCDATA)> <!ELEMENT itemno (#PCDATA)> <!ELEMENT bid_amount (#PCDATA)> <!ELEMENT bid_date (#PCDATA)> ]> |
| It is clear that many applications will require structured XML views of relational data, in which elements derived from different tables are nested to represent their logical relationships. We believe that an XML query language such as Quilt could be used to construct such structured XML views from simple default XML views of individual tables such as the ones described above. At the end of this section, we will give an example of how such a view can be constructed. |
| The example queries below illustrate several common types of access to relational data. Since SQL is a well-known relational language, we introduce each query by expressing it in English and in SQL, and then show its representation in Quilt. For each query, we also show a partial result containing enough data to clarify the expected structure of the query result. |
| Most of the queries shown below generate their output in the form of a single element named <result>, with nested elements that contain the data of the query result. This is a convention that is not required for all queries, as we will see in some examples near the end of the section. |
| Our first example query is a very simple relational query that retrieves selected columns from the rows of one table that satisfy some search-condition. |
|
| In Q1, the analogy between the clauses in SQL and Quilt is straightforward. The Quilt FOR-clause, like the SQL FROM-clause, binds a variable that ranges over the rows of the table. In both languages, the WHERE-clause is used to filter the set of rows by applying a search condition. In Q1, the search-condition could alternatively have been expressed by using predicates in square brackets in the FOR-clause. The Quilt RETURN-clause provides the function of the SQL SELECT and ORDER BY clauses, generating output, including tags, and controlling the order of the output data. |
| Query Q2 illustrates how data can be formed into groups and an aggregate function can be applied to each group. |
|
| The following aspects of the Quilt expression for Q2 are worth noting: |
|
| Query Q3 represents the type of query called a "join" by relational systems. |
|
| The result of Query Q3 includes some elements whose names are derived from the original document and some other elements whose names are generated by the query itself. For example, the expression $i/itemno evaluates to an element named "itemno", which is included in the query result with its original tag. The expression $u/name would have generated an element named "name". Since it is desired to include the text content of the name element in the query result with a different tag ("seller"), Q3 uses the text() function of XPath to extract the text content and encloses it in explicit begin- and end-tags. |
| Query Q4 is a relatively complex query that combines the relational functions of join, grouping, and ordering. |
|
| In the Quilt expression for Q4, the WHERE clause applies a filtering condition to the sets that are bound to $b in the LET clause. In this case, the WHERE clause of Quilt is analogous to the HAVING clause of SQL. |
| Query Q5 matches information from the USERS table with information from the ITEMS table, preserving those rows of the USERS table that have no counterpart in the ITEMS table. In relational terminology, this type of query is called an "outer join." |
|
| The following features of the Quilt expression for Q5 are worth noting: |
|
| Q6 illustrates the use of the exists() function, which takes a set of elements as its argument and returns True if the set is non-empty. In Q6, the argument of the exists() function is an XPath expression. |
|
| Unlike our previous examples, Q6 does not "wrap" its result in an enclosing element such as <result>. Instead, Q6 returns a list of <name> elements, ordered by their content. It is also possible for a Quilt query to return a simple value, as in Q7. |
|
| We have previously mentioned the importance of providing a nested view of a relational database. A Quilt query can be used to define such a view, based on simple default views such as the ones in the above examples. A view definition facility might take the form of a DEFINE statement that uses a query to define a view that can be subsequently used in other queries and view definitions as though it were a real document. A system that implements such a facility would need to maintain a catalog of view definitions. Although the details of view definition in Quilt remain to be worked out, we provide in Q8 an example of what a DEFINE statement might look like. |
|
Querying structured documents |
| In XML, the two main forms of structure are hierarchy and sequence. Since purely relational databases have neither hierarchy nor sequence, most database systems do not support queries that take advantage of this structure. Several queries shown earlier in this paper create hierarchies from flat relational data, map one hierarchy onto another hierarchy, or flatten hierarchical data to show a tabular view. We have also seen some queries that perform some transformation on a document while preserving its hierarchical structure. In this section, we will investigate queries that use the hierarchy or sequence of a document to formulate conditions. Our queries will be based the following surgical data, provided to us by Dr. Tom Lincoln, taken from a fictitious patient record encoded in a HL7 Patient Record Architecture format: |
<section><section.title>Procedure</section.title> The patient was taken to the operating room where she was placed in a supine position and <Anesthesia>induced under general anesthesia.</Anesthesia> <Prep> <action>Foley catheter was placed to decompress the bladder</action> and the abdomen was then prepped and draped in sterile fashion. </Prep> <Incision> A curvilinear incision was made <Geography>in the midline immediately infraumbilical</Geography> and the subcutaneous tissue was divided <Instrument>using electrocautery.</Instrument> </Incision> The fascia was identified and <action>#2 0 Maxon stay sutures were placed on each side of the midline.</action> <Incision> The fascia was divided using <Instrument>electrocautery</Instrument> and the peritoneum was entered. </Incision> <Observation>The small bowel was identified</Observation> and <action> the <Instrument>Hasson trocar</Instrument> was placed under direct visualization. </action> <action> The <Instrument>trocar</Instrument> was secured to the fascia using the stay sutures. </action> <!-- The section continues, but we excerpt it here... --> |
| Note that this document has a structure like that generally associated with documents, but contains data that partly resembles traditional database data. This is a useful general approach to embedding data in narrative or other kinds of documents. |
| First we will examine a few queries that use absolute position. XPath expressions use subscripts to indicate absolute position. Consider the following query: |
|
| Note that the precedence rules of XPath do not give the desired result without the parentheses in the RETURN clause. The expression "$s//Incision[2]" would return nodes that are the second Incision node within their immediate parent. Since we are looking for the second Incision among all the descendants of $s, regardless of level, we use the expression "($s//Incision)[2]". |
| Subscripts can also be used to indicate ranges. Consider the following query: |
|
| Now we will consider three queries that use the BEFORE and AFTER operators of XQL. The expression "exp1 BEFORE exp2", in which exp1 and exp2 are path expressions, evaluates to the list of nodes selected by exp1 that occur before at least one node selected by exp2. We will start with a query that combines AFTER with subscripts: |
|
| Next we will consider a query where the sequence has a more dramatic significance: |
|
| In many cases, BEFORE and AFTER are used to reconstruct a narrative, as in the following example. |
|
Exploiting references in documents |
| References from one element to another are an important part of XML. Since XPath expressions are a basic building block of the Quilt grammar, Quilt could exploit references by using the ID function of XPath, which was designed for this purpose. Because of the importance of references, however, Quilt introduces a new reference-related abbreviation that can be used in XPath expressions in Quilt queries. |
| In this section, we will describe the Quilt syntax for following references. Our examples will be based on intra-document references using attributes of type ID and IDREF, but we believe that a straightforward extension can accomodate inter-document references based on the XPointer specification. |
| Quilt introduces the notation X-> as an abbreviation for the XPath function ID(X). The expression X-> evaluates to the set of nodes that are referenced by X. The commonest use of this notation is with an attribute of type IDREF or IDREFS. For example, suppose that an element of type Figref has an IDREF-type attribute named "target" that references a Figure element. Then, in a Quilt query, the following expression could be used to find all Figure elements that are targets of Figref elements: |
//Figref/@target-> |
| The Quilt "right-arrow" notation allows an expression containing a reference to be read from left to right, and is intended to be easier to read than the XPath functional notation. For example, the following expression finds the titles of all Figure elements that are targets of Figref elements: |
//Figref/@target->/Title |
| To illustrate the use of references in Quilt queries, suppose that we have a repository containing a collection of papers. The repository might be represented as a large XML document in which each paper is represented by a Paper element with a unique ID attribute. Nested immediately inside each Paper element is a Title element. Many of the papers reference other papers, and these references are represented by Reference elements nested at some level inside the Paper elements, each containing an attribute named "paperid" of type IDREF that matches the ID attribute of the referenced paper. |
| The example query below illustrates the use of references, and also illustrates how a Quilt query can transform the structure of a document. In this example, Title elements in the input document are transformed into attributes in the output document. |
|
Conclusion |
| Traditionally, documents and databases have been considered to be separate and distinct forms of information. Documents have an irregular structure in which hierarchy and sequence are very important. Databases have a much more regular structure and, in relational databases, place little importance on hierarchy and sequence. With the emergence of XML, the sharp distinction between different forms of information such as documents and databases is quickly disappearing. Unfortunately, most existing query languages are either document-oriented or database-oriented, and find it awkward or impossible to express queries outside the domain for which they were designed. |
| The Quilt language attempts to pull together features from several languages that enable it to operate on a broad range of data sources. From XPath and XQL it draws a powerful path expression syntax that can navigate inside a hierarchical document, selecting a set of nodes that satisfy a complex predicate. From XML-QL it draws the notion of bound variables and a versatile syntax that can generate an output document of arbitrary structure. |
| This paper has illustrated the versatility of Quilt by using it to express queries against diverse sources, ranging from documents to relational databases. On relational data, we have written queries involving joins, outer joins, and grouping. On documents, we have written queries that preserve hierarchy and exploit ordering and references. We believe that Quilt represents a promising approach to a query language that can help to realize the potential of XML as a universal medium for data interchange. |
| Bibliography |
|
|
|
|
|
|
|
|
|
|
|
|
![]() |
Query languages | Table of contents | Indexes | A canonical query language &, its efficient implementation | ![]() | |||