
 
 

The Web Document API


 
Lauren Wood
Technical Product Manager
SoftQuad, Inc.
108-10070 King George Hwy
Surrey, British Columbia, Canada V3T 2W4
Phone: +1-604-585 8394
Fax: +1-604-585 1926
Email: lauren@softquad.com
Web: http://www.softquad.com
 
Biographical notice:
 
Lauren Wood
 
Lauren Wood is Technical Product Manager at SoftQuad, Inc., one of the leading vendors of Internet, intranet and SGML document solutions. She plays a major role in the design of SoftQuad's authoring tools and takes part in various technical committees, such as the W3C XML Working Group. She chairs the W3C Document Object Model Working Group.
 
Prior to starting at SoftQuad, Lauren was an SGML (Standard Generalized Markup Language) consultant at STEP Stürtz Electronic Publishing GmbH in Germany. Her specialties were document and DTD design for publishing houses and consulting on SGML document management systems, particularly for the aerospace industry.
 
Lauren holds a PhD in theoretical nuclear physics from the University of Melbourne, Australia.
 
ABSTRACT:
 
"The Document Object Model is a platform- and language-neutral interface that will allow programs and scripts to dynamically access and update the content, structure and style of HTML and XML documents. The document can be further processed and the results of that processing can be incorporated back into the presented page."
 
This sentence is taken from the pages on the W3C (World Wide Web Consortium) site that discuss the work of the DOM (Document Object Model) Working Group. The group is working to standardize the many existing ways of accessing HTML (Hypertext Markup Language) and XML (Extensible Markup Language) documents, from JavaScript and applets to the various vendor-specific command-language interfaces. Its members include representatives from many of the companies one would expect, from both the HTML and the SGML/XML communities. This talk presents an overview of the current specifications: what has been done, and what remains to be specified. The latest specification of the DOM can always be found on the W3C site (see the DOM Book in the bibliography).
 
 

What can I do with the DOM?

 
Question: What do Microsoft Word, Netscape Navigator, and SoftQuad Author/Editor have in common?
 
Answer: You can add functionality to each of them, to do some special processing that is particular to the way you work, or the data you have.
 
Wouldn't it be nice if … you could write the same client application that would work the same in each tool you used?
 
This is part of the dream behind the Document Object Model. It is something the SGML community has been dreaming of for years. The HTML community has been dreaming of truly dynamic documents: documents that change themselves when the user's mouse passes over part of them, documents that process data according to the user's input, documents that show the user only the desired information. The accessibility community has been dreaming of tools that can get at the structure of a document and pass only the interesting parts to a screen reader or braille tool, where the user defines what "interesting" really means.
 
The group designing the DOM has all of these dreams.
 
Question: What is "dynamic HTML"?
 
Answer: That depends on who has used the term. "Dynamic HTML" is a marketing term which means different things to different vendors. It's usually some mixture of HTML elements, CSS styles, and scripting.
 
Wouldn't it be nice if … the same scripts worked in all the tools in the same way?
 
The charter of the Document Object Model is to specify an interface for use with HTML, XML and CSS (Cascading Style Sheets) that is useful to application developers and Web page authors, and that is interoperable. The basics of a script or application will be the same whether you use JavaScript, VBScript, Java, or Scheme. The interface should be flexible enough to allow for uses that we can't yet foresee, as well as those we already know. The world has already started using "dynamic HTML" in various flavours: to manipulate the page on the client side, to get information from a database onto the page, and to tell the reader when a form field has been filled out incorrectly. Until now, only the most basic scripts have been interoperable between the two major HTML browsers. This puts a great burden on Web authors to make sure that their scripts work on all platforms and in all browsers. In designing a general interface, we are shifting that burden onto the browser and editor vendors. If they support the DOM, applications written against this interface will work without large amounts of code to find out which browser the reader is using.
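To make the interoperability argument concrete, here is a minimal sketch of a form-checking client written only against standard DOM calls (`getElementsByTagName`, `getAttribute`, `setAttribute`). It uses Python's `xml.dom.minidom` as the hosting DOM implementation, with method names as they appear in the later W3C DOM Level 1 Recommendation rather than in the draft discussed here. The function name `flag_empty_inputs`, the "empty value" rule, and the `error` class are hypothetical. Note that nothing in it checks which tool is hosting the document.

```python
from xml.dom import minidom


def flag_empty_inputs(doc):
    """Hypothetical check: mark every <input> whose value attribute is empty.

    Only standard DOM calls are used, so no code is needed to detect which
    browser or editor is hosting the document.
    """
    flagged = 0
    for field in doc.getElementsByTagName("input"):
        # getAttribute returns "" when the attribute is absent.
        if field.getAttribute("value").strip() == "":
            field.setAttribute("class", "error")
            flagged += 1
    return flagged


page = minidom.parseString(
    '<form><input name="email" value=""/><input name="city" value="Surrey"/></form>'
)
print(flag_empty_inputs(page))  # 1
```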
 
There are two classes of users we have been thinking of when designing this specification. There are script authors, who typically use JavaScript to do cool things to the page that contains the script, and client application developers, who typically use Java or C++ (depending on which language the hosting DOM implementation supports) to write an application that works on all pages of a certain type. Netscape Navigator plugins and Microsoft ActiveX controls are examples of such client applications. Until now a browser client application has typically had to do all the work itself: it hasn't been able to ask the hosting browser to render part of the document, or to find out what the user's settings are. This seems a waste. If the browser has the information, why can't the client application get at it? Part of the DOM work will be to define a way to get at this information that satisfies security and privacy considerations. A client application then won't have to duplicate the work that the hosting browser, editor, or other host has already done, but will be able to hand off part of the document for manipulation or rendering.
 
As expected when designing an interface for two very different groups of people (script authors and application developers), we need to be very careful about terminology. The host that allows a client DOM application to work is called the DOM implementation. What most SGML people would call the parse tree (even though it often isn't a pure tree) is called the structure model in the DOM, so that nobody assumes it must be implemented as a tree.
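In Python's standard library, a client can request the object that plays the role of the DOM implementation through `xml.dom.getDOMImplementation()`, and then build or receive documents through it without knowing how the structure model is stored. A minimal sketch, assuming the default (minidom) implementation; `hasFeature` is part of the Level 1 `DOMImplementation` interface, while `createDocument` comes from the later DOM Level 2 Core. The `catalog` and `item` element names are illustrative only.

```python
from xml.dom import getDOMImplementation

# Ask the host for its DOM implementation; the client never sees how the
# structure model behind it is stored.
impl = getDOMImplementation()
print(impl.hasFeature("core", "1.0"))  # True for minidom

# Create a new document through the implementation's factory method.
doc = impl.createDocument(None, "catalog", None)
doc.documentElement.appendChild(doc.createElement("item"))
print(doc.documentElement.tagName)  # catalog
```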
 
 

General Requirements

 
Listed below are some of the general requirements of the Document Object Model interface, taken from the DOM requirements document listed in the bibliography.
  1. The Object Model is language neutral and platform independent.
  2. There will be a core DOM that is applicable to HTML, CSS and XML documents.
  3. The Object Model will not preclude use by either agents external to the document content, or scripts embedded within the document.
  4. A visual UI component will not be required for a conforming implementation of the Object Model.
  5. The specific HTML, CSS or XML document object models will be driven by the underlying constructs of those languages.
  6. It must be possible to read in a document and write out a structurally isomorphic document.
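Requirement 6 can be checked mechanically. The following minimal sketch, assuming Python's `xml.dom.minidom` as the implementation, writes a parsed document back out and reads it in again, then compares the two structure models. Here "structurally isomorphic" is read as "same element names, attributes and text, in the same order"; that reading, and the helper name `shape`, are assumptions for illustration.

```python
from xml.dom import Node, minidom


def shape(node):
    """Reduce a node to names, attributes, text and child order (hypothetical helper)."""
    if node.nodeType == Node.TEXT_NODE:
        return ("#text", node.data)
    attrs = sorted(node.attributes.items()) if node.nodeType == Node.ELEMENT_NODE else []
    return (node.nodeName, attrs, [shape(child) for child in node.childNodes])


source = '<catalog><item id="a1" price="10">Widget</item></catalog>'
original = minidom.parseString(source)
written = original.toxml()             # write the structure model out as text
reread = minidom.parseString(written)  # read that text back in
print(shape(original) == shape(reread))  # True
```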
 
 
We also have a number of reference applications. The most obvious are graphical browsers and editors, but we also take into account speech-based browsers and other applications that don't need a graphical user interface. These applications have very different needs, ranging from small HTML pages of a few kilobytes to XML documents that may run to hundreds of megabytes. In other words, we want to make it possible for all sorts of script authors and programmers to use the DOM to access a document and do useful work with it.
 
 

Designing the DOM

 
The DOM does not include a parser for either HTML or XML. The DOM only acts after the parser has read the document into whatever internal representation it uses. It may, of course, start acting on the document while the rest of the document is still being parsed; otherwise the application would have to wait until the entire document had been read into the internal representation before it could start work, which is not generally acceptable.
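Python's `xml.dom.pulldom` module shows one way a client can begin acting before the whole document is in memory: it reports parse events and builds a full DOM subtree only for the nodes the client asks for. This is a sketch of the idea, not something the DOM itself specifies; the `stream_items` function and the `catalog`/`item` markup are hypothetical. `parseString` keeps the example self-contained; on a large file, `pulldom.parse` reads the input in chunks.

```python
from xml.dom import pulldom


def stream_items(xml_text):
    """Yield each <item> element as a complete DOM subtree, one at a time."""
    events = pulldom.parseString(xml_text)
    for event, node in events:
        if event == pulldom.START_ELEMENT and node.tagName == "item":
            # Build the DOM subtree for this element only; the rest of the
            # document is still delivered as events.
            events.expandNode(node)
            yield node


catalog = "<catalog><item>first</item><item>second</item></catalog>"
for item in stream_items(catalog):
    print(item.toxml())
```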
 
We call this internal representation the structure model, although it is often called the parse tree. A complete hosting implementation of the DOM includes an HTML or XML parser, the DOM, and whatever other implementation-specific functions exist; it is the hosting implementation that provides the interfaces to the client application. The DOM itself therefore doesn't worry about how a document might be written out to disk; that is up to the rest of the implementation to allow or disallow. Nor does it worry about how an implementation infers which tags are missing in HTML: as far as the DOM is concerned, the tags are there. Most importantly, the DOM doesn't determine how functions should be implemented, or what the structure model looks like. Each implementation is free to choose its implementation language and make its own decisions about basic structures. A DOM-compliant implementation must implement the interfaces with the specified semantics, but need not implement any particular underlying model. Of course, some underlying models may make a given function in the interface easier to implement, or make it perform better.
 
In other words, the DOM is the interface that sits between the structure model and the client application that wants to do something to or with the parsed document. The client application uses the DOM interfaces provided by the hosting implementation to do something useful (or not) with the document.
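Because the DOM fixes the interface and not the structure model, a client can be written against the interface alone. The sketch below, assuming Python 3.8 or later for `typing.Protocol`, states only the node properties the client relies on (`nodeType` and `childNodes`); any object providing them would do, and minidom's nodes happen to. The `DomNode` protocol and `count_elements` function are hypothetical names.

```python
from typing import Protocol, Sequence
from xml.dom import Node, minidom


class DomNode(Protocol):
    """The only node properties this client depends on (hypothetical protocol)."""
    nodeType: int
    childNodes: Sequence["DomNode"]


def count_elements(node: DomNode) -> int:
    """Count element nodes in a subtree using nothing but the node interface."""
    own = 1 if node.nodeType == Node.ELEMENT_NODE else 0
    return own + sum(count_elements(child) for child in node.childNodes)


doc = minidom.parseString("<catalog><item/><item><price/></item></catalog>")
print(count_elements(doc))  # 4
```

The protocol is checked only by a static type checker; at run time Python simply uses whatever attributes the object has, which mirrors the DOM's indifference to the underlying model.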
 
There are always a lot of trade-offs when designing an interface like this. Most of the time it is best to provide only basic functions and methods, on top of which more complicated convenience functions can be built. The more that is defined, the more interoperable the final solution is; but the less that is defined, the easier it is to get the semantics right. Defining less makes life easier for those writing the hosting implementation, while defining more makes it easier for those writing the clients. In general the DOM WG tried to keep the interfaces as small as possible, while still favouring those writing client applications (including scripts) over those writing hosting implementations.
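As an example of a convenience function layered on basic methods: the DOM Level 1 Node interface provides `insertBefore` and `nextSibling` but no "insert after" operation, and one is easy to build from them. A minimal sketch using Python's minidom; `insert_after` is a hypothetical helper, not part of any DOM specification.

```python
from xml.dom import minidom


def insert_after(new_node, ref_node):
    """Hypothetical convenience built only from insertBefore and nextSibling."""
    parent = ref_node.parentNode
    # insertBefore(x, None) appends, so this also works when ref_node is the last child.
    return parent.insertBefore(new_node, ref_node.nextSibling)


doc = minidom.parseString("<list><a/><c/></list>")
a = doc.getElementsByTagName("a")[0]
insert_after(doc.createElement("b"), a)
print(doc.documentElement.toxml())  # <list><a/><b/><c/></list>
```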
 
 

The DOM Level 1

 
The DOM is organised into sections, which we call levels. Level 0 is defined as the functionality exposed by Microsoft Internet Explorer 3.0 and Netscape Navigator 3.0. Level 1 is the first level that the DOM group is defining, and provides the foundation for future work.
 
The DOM Level 1 defines document structure, navigation, and manipulation, as well as basic object types such as Nodes. Level 1 does not deal with style information, only with the structure of HTML and XML documents. The requirements include being able to navigate from any element to any other element, being able to access all elements and attributes, and being able to add, remove and change elements and attributes. Such changes may, of course, be constrained by the DTD so that the application can maintain validity, so the DOM client needs to be able to read the DTD where appropriate. They may also be subject to security considerations. We haven't yet tackled a full security model, including ways to restrict who can read the content of a given element, but that will have to be done.
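The three kinds of change listed above (adding, changing and removing elements and attributes) map onto a handful of Level 1 Core methods. A minimal sketch with Python's minidom, using method names from the W3C DOM Level 1 Recommendation; the catalog markup is invented for illustration. Note that minidom does not validate, so in this sketch keeping the document valid against a DTD is entirely the client's responsibility.

```python
from xml.dom import minidom

doc = minidom.parseString(
    '<catalog><item sku="1">Lamp</item><item sku="2">Chair</item></catalog>'
)
root = doc.documentElement

# Add: create a new element with an attribute and a text child.
new_item = doc.createElement("item")
new_item.setAttribute("sku", "3")
new_item.appendChild(doc.createTextNode("Desk"))
root.appendChild(new_item)

# Change: rewrite an attribute value in place.
root.getElementsByTagName("item")[0].setAttribute("sku", "1A")

# Remove: detach the second item from the structure model.
root.removeChild(root.getElementsByTagName("item")[1])

print(root.toxml())
# <catalog><item sku="1A">Lamp</item><item sku="3">Desk</item></catalog>
```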
 
HTML and XML differ in some important ways, which influence the design of the specification. The most obvious difference is that HTML has a fixed DTD (well, that's the theory). So the HTML DOM can assume HTML 4.0 is the DTD, and build in convenience functions that rely on the fact that both the user and the hosting implementation know that IMG is an image and B is bold. Implied tags can be inferred even without reading a DTD, because we know what they are. XML applications, in contrast, may need to validate against a given DTD, which most HTML applications don't do. The solution was to define a core model, with extra interfaces defined for XML and for HTML.
 
 

Core, HTML and XML

 
The framework of the structure model was based on the SGML property set. We changed some things to make it easier for application developers, and to take into account the ways in which HTML and XML differ from full SGML. For example, a comment declaration in full SGML may contain more than one comment. In XML it may not, and in HTML current practice is that it does not. Hence we decided to allow only one comment per comment declaration in the DOM. Another example is that most HTML applications do not take the DTD (Document Type Definition) into account; in fact very few of them even read a DOCTYPE declaration. So the HTML model assumes HTML 4.0, as specified by the W3C, rather than relying on the DOCTYPE. In XML, the DOM application should use whatever DTD information exists.
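Because each XML comment declaration holds exactly one comment, each one surfaces as a single Comment node whose `data` is the comment text. A small sketch with Python's minidom; the memo markup is made up for the example.

```python
from xml.dom import Node, minidom

doc = minidom.parseString(
    "<memo><!-- draft --><p>Text</p><!-- review by Friday --></memo>"
)

# One comment declaration, one Comment node.
comments = [
    child.data
    for child in doc.documentElement.childNodes
    if child.nodeType == Node.COMMENT_NODE
]
print(comments)  # [' draft ', ' review by Friday ']
```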
 
 

Core

 
The core model contains all the things you would expect: nodes, node iterators, functions to navigate the structure model, and constructs that are common to HTML and XML. So comments are in the core model, but parameter entities are in the XML model, because they aren't used in HTML. Examples of core functionality include navigating from an element to its children, and finding out which attributes are defined for a given element and what their values are. Basic tree-walking functions such as getNextSibling are also part of the core model.
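Both core operations mentioned here, walking the children of an element and listing its attributes, are sketched below with Python's minidom. Python exposes the DOM attributes `firstChild` and `nextSibling` directly, where the Java binding uses `getFirstChild()` and `getNextSibling()`; the markup is illustrative.

```python
from xml.dom import minidom

doc = minidom.parseString(
    '<item sku="3" price="12.50"><name>Desk</name><size>large</size></item>'
)
item = doc.documentElement

# Navigate from an element to its children, one sibling at a time.
names = []
child = item.firstChild
while child is not None:
    names.append(child.nodeName)
    child = child.nextSibling
print(names)  # ['name', 'size']

# List the attributes defined on the element, with their values.
attrs = item.attributes
for i in range(attrs.length):
    attr = attrs.item(i)
    print(attr.name, attr.value)
```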
 
 

HTML

 
HTML Level 1 contains the convenience functions that can be written because we know what the DTD is. The current, varying implementations of "dynamic HTML" have given us some idea of what script authors want to do with HTML pages, and what should be easy for them. Some of these functions can be generalised to XML, where they will also be useful, and we are working on that. So the HTML specification lets you easily get a list of all the IMG elements in a document, for example, whereas the core specification will have a more general query interface that may not be as easy for script authors to use.
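The difference between a general query interface and an HTML convenience function can be sketched in a few lines. Below, `find_all` is a general query built on core interfaces, and `images` is an HTML-style convenience layered on it; both names are hypothetical. (The HTML DOM Level 1 Recommendation later exposed such a list as the `images` collection of an HTML document.) Since minidom is an XML parser, the page must be well-formed, and the tag comparison is case-folded because HTML tag names are case-insensitive.

```python
from xml.dom import Node, minidom


def find_all(node, predicate):
    """General query on core interfaces: every element below node matching predicate."""
    found = []
    for child in node.childNodes:
        if child.nodeType == Node.ELEMENT_NODE:
            if predicate(child):
                found.append(child)
            found.extend(find_all(child, predicate))
    return found


def images(doc):
    """HTML-style convenience: the caller does not spell out how IMG is matched."""
    return find_all(doc, lambda el: el.tagName.lower() == "img")


page = minidom.parseString('<body><IMG src="a.gif"/><p><img src="b.gif"/></p></body>')
print([img.getAttribute("src") for img in images(page)])  # ['a.gif', 'b.gif']
```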
 
 

XML

 
XML Level 1 contains mostly functions for finding information in the DTD. There is no DTD manipulation yet; that will come later. But you can ask what the content model of an element is, or what the allowed values for an attribute are. There are also some XML constructs that do not in practice show up in HTML, such as CDATA sections, which are therefore in the XML specification rather than the core. If they come into use in HTML, they would of course be moved into the core specification.
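CDATA sections are one of the XML-specific constructs mentioned above. Python's minidom, by default, keeps a CDATA section as its own node rather than merging it into ordinary text, so the client can tell the two apart. A brief sketch; the script content is invented, and minidom does not expose DTD content models, so that part of XML Level 1 is not shown.

```python
from xml.dom import Node, minidom

doc = minidom.parseString("<script><![CDATA[if (a < b && c) run();]]></script>")
section = doc.documentElement.firstChild

# The CDATA section is its own node type, distinct from Node.TEXT_NODE.
print(section.nodeType == Node.CDATA_SECTION_NODE)  # True
print(section.data)  # if (a < b && c) run();
```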
 
 

Future Work

 
The DOM work has a long way to go; we've really only just laid down the basic framework. What we have done so far is the foundation for future work, and as such it is important to get it right. Now we can move on to other aspects of the DOM and the other requirements we have to fulfil. These include a style sheet object model, an event model, ways to manipulate the DTD, and a better security model. As we work, we will make more chapters of the DOM Book available for public comment. Please let us know what we're doing right, and what we're doing wrong. Public comments can be sent to www-dom@w3.org; to subscribe, send email with the subject line "subscribe" to www-dom-request@w3.org. We're looking forward to seeing people use what we've done.
 
Acknowledgments
Disclaimer: Although the companies mentioned in this talk are taking part in the DOM process, their products have been mentioned for illustrative purposes only, and this talk does not constitute a commitment that any of these products will in fact support the interfaces specified by the DOM.
 
Bibliography
The DOM Book
http://www.w3.org/TR/WD-DOM
The DOM requirements
http://www.w3.org/TR/WD-DOM-requirements
