![]() |
XML Schema types and equivalence classes | Table of contents | Indexes | XML &, query languages | ![]() |
|||
| XML schema aware processing XML schema validation XML schemas | Derivation, tolerance and validity |
| a formal model of type definition in XML Schemas |
| Brown, Allen |
| Allen Brown |
| Jr. |
| Senior Program Manager |
Microsoft Corporation ![]() Redmond ![]() USA ![]() Washington ![]() | Microsoft Corporation,
One Microsoft Way Redmond Washington 98052-6399 USA Phone: +1 425 705-3290 Fax: +1 425 536-7329 email: allenbr@microsoft.com web site: msdn.microsoft.com/xml/ |
| Biography |
| Abstract |
Introduction |
| The principal idea here is that the designer of a modified or derived type be able to declare the nature of the modification made. In order to exploit fully such a declaration, an application needs to be able to declare what sorts of modifications it is willing to tolerate. Moreover, the validator, upon hearing such advice from the application, needs to validate an instance according to the tolerances declared. In effect we are proposing that we make applications responsible for telling the validator what classes of transformations lead tosubstitutable derivations from theapplication's perspective. The validator, then, is responsible for guaranteeing that the effective derivations fall within the application's tolerances. |
| We open by informally sketching a model of the declaration, the notion of tolerance, and the kinds of derivations with respect to which validations might be successfully conducted. Later, we will give a rigorous account of the concepts in question, from which one can design a concrete specification syntax, and implement a validator. |
| First, let's embellish the element declaration with a tolerance declaration: |
<element type="mumble" tolerates="(attr-extension | elt-extension)*"/> |
| and then define a derived type: |
<complexType name="grumble" base="mumble" derivedBy="attr-extension"> ... </complexType> |
The former declares an element that permits
|
| The example above illustrates how the original type designer might indicate the permissibility of the further elaboration of the type. But an application knowing only about the original type might not be as permissive as the designer. In general, an application should be able to declare to the validator that certain kinds of derivation are tolerated. One possibility is to make a global declaration of such tolerance. Another possibility is to make local declarations of the form "this application tolerates derivations of themumble type by any sequence of transformations of the following kinds..." |
| Now there are two questions: |
In this informal account, we only consider derivationsvis à vis
content models, and "element only" content models at that. (It should be reasonably clear how to extend the same ideas to cover derivations with respect to attributes.) For the time being, types will be conflated with regular expressions, where a derived type is accessed through a new tag.
|
| XML validation | Validation: an informal account |
First, let's consider validation without derivation. Validity is defined recursively roughly as follows: A node of an XML tree tagged with
, which has the type definition
(
being a regular expression in tags) is valid just in case the children of the node tagged with
(when taken in sequence) are in the regular set denoted by R, and each of the children is itself valid. An XML tree, by extension, is valid just in case all of its nodes are valid. |
Without saying what can count as a derivation (other than to note that every type is a derivation of itself), we extend the definition of validity to accommodate derivation. A node of an XML tree tagged with
, is valid just in case
has some derivation
defined by
, such that each of the children of the node tagged with
(when taken in sequence) is in the regular set denoted by
, and each of the children is valid. An XML tree, by extension, is valid just in case all of its nodes are valid. |
| type derivation, subtyping, inheritance | Derivation by transformation |
Suppose
,…,
are type definitions, we will define a set of immediate transformations below yielding a type definition
such that
is an immediate derivation from
,…,
by one such transformation. If
,…,
is a set of transformations, there is the obvious inductive definition of a derivation
from
by a sequence of immediate transformations taken from that set. So what might be interesting immediate transformations? |
Consider the immediate derivation of
from
,…,
by the "prepend" immediate transformation (or inheritance transformation) illustrated as follows: |
|
|
|
|
| yields |
|
where
is a new regular expression in tags. Similarly, one might define the "postpend" immediate transformation by |
|
| One can define a kind of "interleaving" immediate transformation by beginning with the type definitions |
|
|
|
|
|
|
|
| one "derives" them to an inherited type definition |
|
where the
's are all new regular expressions. The transformations described thus far are essentially additive. But there are lots of others. |
Refining
from
by the "insertion"' immediate transformation: |
|
| becomes |
|
Refining
from
by the "deletion" immediate transformation: |
|
| becomes |
|
Deriving
from
by a "restriction on choices" immediate transformation:: |
|
| can be restricted to any of: |
|
|
|
Refining
from
by the "cyclic permutation" immediate transformation:. |
|
| becomes |
|
| No doubt you have noticed that we have not told you how the positions at which the transformations occur are to be identified. Also, the only transformations we have shown are on the immediate subexpressions of regular expressions. There are endless grammar-based processors that allow the identification of subexpressions and modification thereof, so we'll defer providing a solution. |
By composing the transformations described, a wide variety of insertions, deletions and rearrangements can be accommodated. (Indeed, if we allow the transformations above to occur at any level, we get a universal editor.) While each immediate transformation is eminently understandable, their composition can rapidly become opaque, so one may want to declare the number of immediate transformations that are tolerable in refining
from
. An application's robustness against transformations is more likely, however, to be determined by the nature of the transformations tolerated rather than their number. |
Imagine that an application can declare that it will tolerate derivations
from
by any sequence of transformations in a set of immediate transformations
. Thus, in the validation definition above, we amend it to look for not just derivations, but "tolerated" derivations. The validator, of course, should leave a record of what derivations were used at each node during the validation process. Moreover, the application can have access to the transformational relationship between the type associated with a node's actual tag and the type that led to the validation of the node's children. |
The formalities |
| Now we are ready to provide mathematical models of schemas, instances and derivation wherein: |
Schemas |
Aschema
where |
is a finite set ofelement
names |
is a finite set oftype
names |
is a finite set ofattribute
names |
is a finite set ofvalues
|
is the distinguishedroot
element |
is the set of regular expressions over
|
is thecontent modeling
function
|
is theattribution
function
|
is thetype declaration
function |
, is a partial order where
, and whenever
|
A type
isbasic
if there does not exist a distinct type
such that
.
thesignature
of
. |
Instances |
Aninstance
I of a schema S is a structure
where |
is a finite set ofnodes
|
is the distinguishedroot
node |
and
are discrete partial orders
such that if
and
then
and
are children of
(and siblings of one another), and either
(
is the elder sibling of
), or
(
is the elder sibling of
) |
|
there is no
such that
and
|
,
only if there exists a node
such that
and
|
where
|
A labeling, being an ordered 4-tuple, has a first component called itstag
, it is said to beimplicit
, and otherwiseexplicit
. The only nodes of an instance with a value component of
are the non-minimal nodes of the instance. The partial order
induces a tree on the nodes of I rooted at
. |
Basic validity |
An instance I is S=
-valid
(read "S-basic valid") with respect to a schema S iff all of its nodes have implicit types,
's tag is
and each node
is S=
-valid with respect to a schema S. A node
is S=
-valid with respect to a schema S iff either
, or
and |
Validitywith derivation |
The intent of derivation is to control the circumstances under which a tag might occur at a node where another kind of tag is expected.
,
, then
is a
-alternative
of
. Let
beall
of the
-alternatives of
, then
is the
-closure
of
. By extension, for
the
-closure of
(denoted
) is obtained from
by replacing each element in the latter with its
-closure. So any occurrence of the
above would be replaced with
.
,
, then
is a
-alternative of
. Let
beall
of the
-alternatives
, then the
-effective content model
of
is |
when
; |
when
. |
An instance I is
-valid
(read "
-derivation valid") with respect to a schema S iff
's tag is either
or a
-alternative of
, and each node
is
-valid with respect to a schema S. A node
is
-valid with respect to a schema S iff either
, or
and |
Validity with derivationand tolerance |
Recalling that
is a subset of
, consider a segmentation of
into a finite collection of segments
such that
. Since we are free to construct
and partition it any way we like, we could think of each of the segments as corresponding to a particular atomic derivation transformation on types. (In the next section we'll examine some particular kinds of atomic transformations.) Furthermore, we can abuse the notation of juxtaposition and consider composing the atomic transformations as relations,e.g.
. Indeed, we can consider regular expressions over atomic transformations, and note that in the usual notation for regular expressions
. |
Let S and I be as above. Suppose
is segmented into a finite collection of segments
such that
. Let
be a regular expression over the segments of
. Putting
in the place of
, we can define the
-alternatives, the
-closure, and the
-effective signature by analogy with the
-alternatives, the
-closure, and the
-effective signature above. An instance I is
-valid
(read "S -derivation valid tolerating transformations in the regular set
") with respect to a schema S iff
's tag is either
or an
-alternative of
, and each node
is
-valid with respect to a schema S. A node
is
-valid with respect to a schema S iff either
, or
and |
Modeling particular derivation transformations |
We now explore how some of the forms of modification described above fit into the formal model. Generally speaking, we proceed by taking a schema S and defining a new schema S' where the components of the latter structure are defined by making incremental changes to the components of the former, but leaving the root element the same in both structures. We will subscript the components of S and S' with their respective schema names,e.g.
and
. |
Adding a new tag |
From schema S with type
we create a new schema S' with a new element
: |
|
and
may now order types with content models that are regular expressions over
. |
Creating a new type by adding an attribute |
From schema S with type
we create a new schema S' having a new type
with an additional attribute
assigned from the subspace
: |
|
Creating a new type by restricting choices |
From schema S with type
we create a new schema S' having a new type
such that
is derived from
by restriction of choices (see the example in thederivation by transformation section):
|
|
Creating a new type by cyclic permutation |
From schema S with type
we create a new schema S’ having a new type
such that
is derived from
by cyclic permutation (see the example in thederivation by transformation section):
|
|
Conclusions |
| In the foregoing we have abstracted |
| In doing so we assume only that attributes have simple values, and that elements are either simple values or structures characterized by regular expressions. Indeed the model of types (in ) that we employ is essentially the extensional one traditionally studied by logicians. On the surface it would appear that this is not quite the same as the typical computer science model of types where two named types having the same type signature represent distinct types. It happens that we can encode the name of a type both in a regular expression and in an attribution, producing, in effect, distinct copies of the same structure. Using this device together with imposing various restrictions on the derivation partial order, we can reproduce the validation semantics of the XML Schema candidate recommendation. More importantly, however, we have a much richer notion of type derivation and validation. |
| While implementation is outside the scope of this paper, we should observe that validation, as presented, depends only on a set of regular expressions that can be statically constructed from the definition of a schema. Adding derivation and/or tolerance simply increases the size and/or number of the regular expressions. It is easy to imagine a compiler that undertakes such a construction. Validation at a node reduces to checking if a sequence of tokens associated with each of the child nodes is in the regular set (in ) defined by a particular regular expression. |
| Bibliography |
|
|
|
|
![]() |
XML Schema types and equivalence classes | Table of contents | Indexes | XML &, query languages | ![]() | |||