
 
 

W3C's Resource Description Framework Schemas: DTDs for the 21st Century


 
David Singer
Senior Technical Staff Member
IBM Internet Division
650 Harry Road
San Jose, California 95120, USA
Phone: +1 408 927 2509
Fax: +1 408 927 4073
Email: singer@almaden.ibm.com
 
Biographical notice:
 
David Singer
 
David Singer is the chair of the W3C (World-Wide Web Consortium) Resource Description Framework Schemas Working Group. He is also a member of the W3C Advisory Committee and is active in the development of the Document Object Model. He is one of the authors of "Rating Services and Rating Systems (and Their Machine Readable Descriptions)", one of the Recommendations defining PICS (Platform for Internet Content Selection).
 

Naohiko Uramoto
Research Staff Member
IBM Tokyo Research Laboratory
5-19 Sanbancho
Chiyoda-ku
Tokyo 102, Japan
Phone: +81 462 73 4564
Email: uramoto@jp.ibm.com
 
Biographical notice:
 
Naohiko Uramoto
 
Naohiko Uramoto is a Research Staff Member at the IBM Tokyo Research Laboratory. He currently works on the research and development of XML tools, digital signatures for XML documents, and publish/subscribe middleware for Java, and is a member of the RDF Schemas and RDF Model and Syntax Working Groups. He has also worked in the areas of natural language processing, machine translation, information retrieval, and digital libraries.
 
ABSTRACT:
 
The RDF (Resource Description Framework) is currently being developed in two Working Groups at the W3C. RDF provides a language for representing metadata on the Web, allowing greater expressiveness, precision, and machine-assisted computation than today's ad hoc techniques. This language can be expressed in multiple forms; the common interchange form uses XML. RDF emphasizes facilities to enable automated processing of Web resources; in particular, RDF Schemas describe metadata in much the same way as DTDs describe documents, though RDF Schemas can include semantic information, not just structural information.
 
At the time this paper was submitted for the proceedings, the RDF specifications had not yet been completed. This paper is based on the most current working drafts available to the authors at deadline; however, there will be changes between those drafts and the final specifications. You should consult the RDF web page (http://www.w3.org/RDF) for later information. You can obtain the current version of the RDF Model and Syntax specification at http://www.w3.org/TR/WD-rdf-syntax, and the current version of the RDF Schema specification at http://www.w3.org/TR/WD-rdf-schema.
 
 

What is RDF?

 
RDF provides a vocabulary and grammar for expressing metadata on the World-Wide Web. It has been designed to facilitate interoperability of applications which generate and process machine-understandable representations of data about resources on the Web; because RDF statements are not primarily intended to be directly read or written by human beings, its syntax is designed for precision rather than concision. RDF provides the framework to transform the Web from a collection of data to a machine-processable repository of information.
 
RDF metadata can be used in a variety of application areas, for example: in resource discovery, to provide better search engine capabilities; in cataloging, to describe the content and content relationships available at a particular Web site, page, or digital library; by intelligent software agents, to facilitate knowledge sharing and exchange; in content rating; in describing collections of pages that represent a single logical "document"; in describing the intellectual property rights of Web pages; and in many others. RDF with digital signatures will be key to building the "Web of Trust" for electronic commerce, collaboration, and other applications.
 
 

The RDF Data Model

 
The core of RDF is its data model. The data model, which can be represented as 3-tuples, as a graph, or in XML, allows one to represent named properties (such as "Author", "Phone Number", or "Title"), their values (such as "David Singer", "+1 800 555 1212", or "Chair"), and resources (such as "the RDF Schema Working Group" or "http://www.w3.org/TR/WD-rdf-schema"). The model also allows one to make statements about resources (such as "David Singer is the chair of the RDF Schema Working Group"), and then to use those statements as objects of other statements (such as "Ralph Swick believes that 'David Singer is the chair of the RDF Schema Working Group'"). In other words, the value of a property can be an RDF statement, as well as a string or a resource.
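The 3-tuple view of the data model can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not part of any RDF specification: the class names `Resource` and `Statement`, the field names, and the property names "chair" and "believes" are all hypothetical choices made for this example. It shows a statement about a resource being used as the value of a second statement.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Resource:
    """Anything that can be described; here identified only by a label (assumption)."""
    name: str


@dataclass(frozen=True)
class Statement:
    """One 3-tuple: the resource described, a named property, and its value."""
    subject: Resource
    prop: str
    value: "Value"


# The value of a property may be a string, a resource, or another statement.
Value = Union[str, Resource, Statement]

group = Resource("the RDF Schema Working Group")
singer = Resource("David Singer")
swick = Resource("Ralph Swick")

# "David Singer is the chair of the RDF Schema Working Group"
chair = Statement(group, "chair", singer)

# "Ralph Swick believes that 'David Singer is the chair ...'"
# The whole first statement is the value of the second one.
belief = Statement(swick, "believes", chair)
```

Because `belief.value` is itself a `Statement`, a program can follow it to reach the original subject and value, which is what makes statements about statements possible.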
 
The ability to make statements about statements, as well as statements about resources, provides RDF with the expressiveness needed to meet its goals. It allows the user of RDF to represent arbitrarily complex directed graphs. In contrast, earlier Web-based metadata schemes were limited in their expressiveness. The HTML META element (http://www.w3.org/TR/REC-html40/struct/global.html#edef-META) allows arbitrary keyword/value pairs, but does not provide mechanisms for interrelating them to create more complicated structures. The Platform for Internet Content Selection (PICS, http://www.w3.org/TR/REC-PICS-services-961031) allows statements to be made about Web pages, but those statements are limited to quantifiable statements (for example, "this Web page has a security rating of 6.3 on a 1-to-10 scale" is legal, but "David Singer is the author of this page" is not expressible in PICS), and again, there is only a limited mechanism for making statements about statements.
 
The RDF data model provides some additional basic concepts, such as collections of nodes (nodes are resources, objects, or RDF statements), as well as primitive data types (such as string).
 
 

The RDF Schema

 
The RDF Schema (http://www.w3.org/TR/WD-rdf-schema) provides a mechanism to assist in the creation of consistent statements about resources. The RDF Schema specification is a "meta-schema" - it is used to define other schemas, in the same way as XML is used to define application-specific DTDs. All RDF schemas are built on the data model of RDF, and in fact the RDF Schema is expressible in the RDF data model.
 
The underlying concepts of the RDF Schema are described in the following sections.
 
 

Class

 
This corresponds to the generic concept of a "type" or a "category". Classes represent kinds of resources, such as web pages or people.
 
 

instanceOf

 
This is a relationship between a resource and a Class, indicating that the resource is a member of the Class. A resource may be an instanceOf more than one Class.
 
 

subClassOf

 
This is a relationship between two classes, indicating that one Class is a subclass of another. A Class may be a subClassOf more than one Class. This is a transitive relationship: if "C" is a subClassOf "B", and "B" is a subClassOf "A", then "C" is also a subClassOf "A". Membership follows the same chain: if "x" is an instanceOf "B", and "B" is a subClassOf "A", then "x" is also an instanceOf "A".
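The interaction between instanceOf and subClassOf can be sketched as follows. This is a minimal illustration, not an RDF processor: the hierarchy (including the assumption that "Animal" is a subClassOf "Resource"), the resource "Number6", and the function names are hypothetical.

```python
# Hypothetical hierarchy: each class maps to its direct superclasses.
# A class may list more than one superclass.
SUBCLASS_OF = {
    "Person": {"Animal"},
    "Animal": {"Resource"},
    "Resource": set(),
}

# Direct instanceOf relationships (a resource may belong to several classes).
INSTANCE_OF = {"Number6": {"Person"}}


def superclasses(cls):
    """Return cls together with every class reachable through subClassOf."""
    seen = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(SUBCLASS_OF.get(current, ()))
    return seen


def is_instance_of(resource, cls):
    """True if resource is a direct instance of cls or of any subclass of cls."""
    return any(
        cls in superclasses(direct)
        for direct in INSTANCE_OF.get(resource, ())
    )
```

With this data, `is_instance_of("Number6", "Animal")` holds even though only the "Person" membership was stated directly.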
 
 

Resource

 
This is the class of things that RDF can describe.
 
 

PropertyType

 
This is the class of properties that RDF can ascribe to Resources.
 
 

Primitive Data Types

 
There are a number of classes whose instances are primitive objects, such as strings, whole numbers, and dates.
 
 

Constraints

 
The RDF Schema allows one to place constraints on the use of a PropertyType. These constraints help express the meaning of the PropertyType; for example, by constraining the kind of object that can have a property, one can avoid nonsensical statements like "the color blue weighs five kilograms." There are several constraints available in the RDF Schema; they are all subclasses of PropertyType, and so are PropertyTypes in their own right. We have:
 
  • range: This constrains a PropertyType so that its values must be instancesOf a particular Class. As an example, if the range of the PropertyType "age" is "Integer", then any use of the "age" PropertyType must have an Integer value (and not a date, a float, or a color). A PropertyType may have at most one "range" property (but need not have any).
  • isFrom: This constrains a PropertyType so that its values must be members of a Collection. As an example, one might constrain a "maritalStatus" PropertyType to be one of the collection (Married, Single, Divorced, Widowed).
  • allowedPropertyType: This constrains a Class by specifying the allowable PropertyTypes for instancesOf the Class and its subclasses.
  • necessity: This constrains the number of occurrences of a specific PropertyType. It isFrom the collection (OccursExactlyOnce, Necessary, OptionalAtMostOnce, and Optional). "Necessary" means that the property must occur at least once and may occur any number of times; "Optional" means that the property need not occur but may occur any number of times. "OccursExactlyOnce" and "OptionalAtMostOnce" are self-explanatory.
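The constraints listed above can be read as simple checks on the values of a property. The sketch below is one possible reading, written in Python for illustration; the function names are hypothetical, and mapping an RDF class such as "Integer" onto a Python type is an assumption made only for this example.

```python
# Occurrence bounds for each necessity value: (minimum, maximum).
# A maximum of None means "any number of times".
NECESSITY_BOUNDS = {
    "OccursExactlyOnce": (1, 1),
    "Necessary": (1, None),
    "OptionalAtMostOnce": (0, 1),
    "Optional": (0, None),
}


def satisfies_necessity(necessity, occurrences):
    """Check how often a property occurs against its necessity constraint."""
    low, high = NECESSITY_BOUNDS[necessity]
    return occurrences >= low and (high is None or occurrences <= high)


def satisfies_is_from(value, collection):
    """isFrom: the value must be a member of the given collection."""
    return value in collection


def satisfies_range(value, python_type):
    """range: the value must be an instance of the class.

    Assumption: the class is stood in for by a Python type such as int.
    """
    return isinstance(value, python_type)


MARITAL_STATUS = {"Married", "Single", "Divorced", "Widowed"}
```

For example, `satisfies_is_from("Single", MARITAL_STATUS)` holds, while a property constrained to OccursExactlyOnce fails the check when it occurs twice.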
 
 

Comments

 
One can add human-readable comments to schemas using this property. This property should be freely used!
 
 

Collections

 
There are classes defined for "Collection", "Bag" (multi-set), "List", and "Alternatives".
 
 

Deploying RDF and RDF Schemas

 
Like PICS ratings, RDF expressions can be included in the document to which they pertain, be transported in HTTP headers, or be provided "out-of-band" as separate documents. If RDF expressions are to be included in the relevant document, care must be taken if the document author wishes to preserve the capability of performing XML DTD validation on the document.
 
In XML documents, RDF will use the namespace mechanism to be proposed by the XML Working Group (see http://www.w3.org/XML/Activity.html).
 
HTML versions through 4.0 do not allow one to include arbitrary XML content (such as RDF expressions) in an HTML document without potentially exposing that content to the rendering agent. The RDF Model and Syntax specification proposes interim ways to avoid this problem, and we hope that the next revision of HTML provides a permanent solution.
 
In general, RDF Schemas are expected to be standalone documents. There is a strong desire to allow the author of a schema to mix the formal definition (in RDF) with human-readable information about the schema; if the human-readable information is in HTML (as might often be the case), we have the problem of mixing XML and HTML again.
 
It is interesting to note that, except within an RDF Sequence, the order of RDF and RDF Schema elements in the XML data stream has no effect on the meaning of the RDF statements.
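One way to picture this order independence is to treat a set of statements as an unordered set of 3-tuples, while keeping a sequence value as an ordered tuple. The following Python sketch is only an illustration; the data, the property names, and the helper `same_meaning` are hypothetical.

```python
# The same three statements, written in two different orders.
first = [
    ("Number6", "name", "Number Six"),
    ("Number6", "age", "45"),
    ("Number6", "ssn", "6"),
]
second = list(reversed(first))


def same_meaning(a, b):
    """Two lists of statements mean the same if they hold the same 3-tuples."""
    return set(a) == set(b)


# Inside a sequence, order is part of the value, so it is kept as a tuple.
authors_ab = ("paper", "authors", ("Singer", "Uramoto"))
authors_ba = ("paper", "authors", ("Uramoto", "Singer"))
```

Here `same_meaning(first, second)` holds, whereas `authors_ab` and `authors_ba` are different statements because their sequence values differ in order.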
 
 

A sample RDF Schema

 
This is a sample RDF Schema that describes a person. The details of the syntax will almost certainly have changed between the submission of this paper and the publication of the RDF Schemas specification; consult the specification for the correct syntax. For purposes of brevity, we are assuming that namespace defaulting is available, and so the elements in this example are not qualified with a namespace identifier.
 
<Class id="Person">
  <description>Class for representing people. Instances correspond
    to a single person.</description>
  <subClassOf href="#Animal"/>
  <allowedPropertyType>
    <PropertyType id="age">
      <range href="#Integer"/>
    </PropertyType>
  </allowedPropertyType>
  <allowedPropertyType>
    <PropertyType id="ssn">
      <range href="#Integer"/>
      <necessity href="#OccursExactlyOnce"/>
    </PropertyType>
  </allowedPropertyType>
  <allowedPropertyType>
    <PropertyType id="maritalStatus">
      <necessity href="#OptionalAtMostOnce"/>
      <isFrom>
        <Collection>
          <LI id="Married"/>
          <LI id="Divorced"/>
          <LI id="Single"/>
          <LI id="Widowed"/>
        </Collection>
      </isFrom>
    </PropertyType>
  </allowedPropertyType>
</Class>
 
Translating this schema into English gives us this:
 
There is a Class named "Person", which is the class for representing people. Instances of this class represent a single person.
 
"Person" is a subclass of "Animal" (which is not included in this example).
 
The properties that a "Person" may have are:
 
  • "age", which is an integer
  • "ssn", which is an integer, is required, and can only occur once
  • "maritalStatus", which is one of "Married", "Divorced", "Single", or "Widowed"; at most one marital status is allowed.
 
In addition, any properties allowed for an "Animal" are also allowable for a "Person". In the example following, we assume that "name" is an allowable property of an "Animal".
 
 

A sample RDF file

 
Here is a description of a person using the schema above:
 
<?xml:namespace name="http://www.w3.org/RDF" as="RDF"?>
<?xml:namespace name="http://hypothetical.net/#person" as="WHO"?>

<RDF:RDF xml:lang="en">
  <RDF:Description id="Number6">
    <RDFS:instanceOf href="http://hypothetical.net/#person"/>
    <WHO:name>Number Six</WHO:name>
    <WHO:age>45</WHO:age>
    <WHO:ssn>6</WHO:ssn>
  </RDF:Description>
</RDF:RDF>
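To see how the description relates to the schema, the allowedPropertyType rule (including the properties a "Person" inherits from "Animal") can be sketched in Python. This is an illustration under stated assumptions rather than a validator for real RDF: the dictionaries restate the sample schema by hand, "name" is assumed to be allowed for "Animal" as in the text, and the function names are hypothetical.

```python
# Property types allowed directly on each class, restated from the sample schema.
ALLOWED = {
    "Animal": {"name"},  # assumption stated in the text
    "Person": {"age", "ssn", "maritalStatus"},
}

# Direct superclasses of each class.
SUBCLASS_OF = {"Person": ["Animal"], "Animal": []}


def allowed_property_types(cls):
    """Property types allowed for cls, including those of its superclasses."""
    allowed = set(ALLOWED.get(cls, ()))
    for parent in SUBCLASS_OF.get(cls, ()):
        allowed |= allowed_property_types(parent)
    return allowed


def disallowed(description, cls):
    """Return the properties used in a description that cls does not allow."""
    return sorted(set(description) - allowed_property_types(cls))


# The description of Number6: each property maps to its list of values.
number6 = {"name": ["Number Six"], "age": ["45"], "ssn": ["6"]}
```

Calling `disallowed(number6, "Person")` returns an empty list, since "name" is inherited from "Animal" and the other two properties are declared on "Person".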
 
 

RDF implementations

 
Since RDF has not yet stabilized, the only implementations are experimental in nature. One such implementation is "Reggie - the Metadata Editor" (http://metadata.net/dstc/) from the Resource Discovery Unit of the Distributed Systems Technology Centre in Australia (http://www.dstc.edu.au/).
 
 

Conclusion

 
RDF provides the framework for interoperable, machine-processable metadata on the Web. Because it provides a well-defined and consistent data model, it allows the user to make arbitrarily complex statements about resources (which can include any object that can be referred to by a Universal Resource Identifier (URI)). Because the data model is self-contained, RDF can be used to make statements about other RDF statements, allowing inferences to be drawn and other reasoning to be performed. RDF is transported as XML elements, which can be included in XML documents or provided as separate documents.
 
 

Acknowledgements

 
The authors thank the members of the RDF Model and Syntax and RDF Schemas Working Groups for their efforts in creating and documenting RDF. We especially thank R. V. Guha, Ora Lassila, Eric Miller, and Ralph Swick for their comments and suggestions on drafts of this paper.
 
 

References

 
RDF Web page: http://www.w3.org/RDF
 
RDF Model and Syntax: http://www.w3.org/TR/WD-rdf-syntax
 
RDF Schema: http://www.w3.org/TR/WD-rdf-schema
 
HTML 4.0 Recommendation: http://www.w3.org/TR/REC-html40/struct/global.html
 
Rating Services and Rating Systems (and Their Machine Readable Descriptions): http://www.w3.org/TR/REC-PICS-services-961031
 
XML Activity Statement: http://www.w3.org/XML/Activity.html
 
W3C Workshop on the Future of HTML: http://www.w3.org/MarkUp/future/
 
Reggie - The Metadata Editor: http://metadata.net/dstc/
