
Bos, Lisa
Horsham
Reed Technology and Information Services
 USA 
 
Lisa W Bos
 Director, Content Management
Reed Technology and Information Services
  1 Progress Drive Horsham (Pennsylvania)  USA (19044)
Email: lisa.bos@reedtech.com Web site: http://www.reedtech.com
 Biography
 As Director of Content Management Systems for Reed Technology and Information Services, Lisa Bos oversees the technical staff responsible for designing and developing content management systems within the Technology Solutions department, and ensures that Reed remains in step with the best technologies for the development of editorial systems. Her department consists of experts in database design, development, and administration as well as a team focused on the development of XML- and SGML-based solutions. Lisa has hands-on experience in designing and implementing content management systems, including business analysis, system design, data design, and data conversion. She is presently managing Reed's project to develop an XML- and Oracle-based content management system for Congressional Quarterly.
 

I've been to the W3C site and now I'm more confused than ever!

 XML is just one component of the overall set of technologies required to do certain kinds of software development, especially Web-oriented software development. Other technologies fill in the gaps, and they might be XML-based, or they might not. For example, RDF defines a way to represent metadata. In an application of RDF, XML might be used to encode that representation, or it might not. In addition, the same RDF information might be encoded as XML at one point in time, and then transformed into another form and then back again. Distinctions like these can be confusing to those responsible for the creation, management and publication of content.
 So, to clear up the confusion, let's step back a little.
 Lately, if you are interested in data (or content, or information, or whatever you choose to call it), then you are also inundated with ideas about how to use XML to do your job. However, you probably have a difficult time knowing which ideas are applicable to the problems you are trying to solve. There are some good reasons for that.
 First, you might be coming from any number of backgrounds. You could be:
 
  1.  An author or editor
  2.  A graphic artist, desktop publisher, Web designer, or another kind of design professional
  3.  A webmaster
  4.  A database developer, Web developer, or another kind of software developer
  5.  Other…
 Each of these people works with data, but, not surprisingly, the way in which they work with data and even the way in which they think about it varies dramatically. For the purpose of categorizing XML-related specifications, these people can be divided into two main groups: Content Creators and Software Developers. Look how different the kinds of tasks performed by those two groups can be (and this list is by no means complete):
 Content Creators
 
  1.  Create or edit documents/data
  2.  Index or link documents/data
  3.  Search documents/data
  4.  Design the appearance of a Web site and organize its content
  5.  Create graphics that are useful for print and Web outputs
 Software Developers
 
  1.  Design databases and document structures for capturing a specific kind of content
  2.  Ensure that a Web site is accessible to people using older browsers but still takes advantage of all the cool features available to readers using newer browsers
  3.  Ensure that the look and feel of your Web site can be changed overnight, and that new content can be added to it just as quickly
  4.  Integrate your internal systems with the systems of one of your suppliers by sending and receiving data over the Web
  5.  Convert large volumes of data from old formats into new formats and ensure that those formats have an indefinite shelf life
  6.  Or, in summary, design and build the software that enables Content Creators to do their jobs (often over the Web)
 For some really good reasons that will ultimately benefit all of us, XML and related specifications address each of these tasks and more. No wonder the XML world can be confusing: it would be unreasonable to expect most people to understand the technical details of specifications related to each of these tasks.
 Categorizing the specifications can help to avoid some of the confusion. This in itself is a challenge, since most specifications can be classified in multiple ways. But, the following category list is a useful start.
 
| Category | Examples | Who's Interested |
| --- | --- | --- |
| Markup and Markup Construction | Character sets and special characters; HTML/XML/XHTML; XML Fragment Interchange | Software developers |
| Data Interfaces and Models | DOM; RDF; XML Schema | Software developers |
| Stylesheets | Formatting languages like CSS; Transformation languages like XSLT | Software developers; Content creators (for today) |
| Grammars | MathML; SVG; PICS | Software developers; Content creators |
 You probably noticed that content creators don't show up in this table very often. While content creators will always need a high-level understanding of the pros and cons of various technical options, there is little reason they should be required to understand the technical "guts" of a particular solution in order to be able to do their jobs. For example, a graphic artist should not be concerned with the XML tags used to create an image. However, he will be interested in whether a particular graphic file format is lightweight, can be sized without a reduction in quality, and is useful to both Web and print products. Similarly, an author should be excited about XML because it is portable among all the applications she wants to use or because it facilitates repurposing of her data, not because she thinks those angle brackets are really cool.
 Today, of course, many content creators are forced to worry about these things—to do the job of a software developer. This is because XML, related specifications, and the applications that support them are still evolving, and because conversations about technical nuts and bolts are regularly mixed up with conversations about grammars for specific kinds of content. If the software developers do their jobs right, then in the future more and more of the nuts and bolts will live invisibly under the hood of XML applications, and content creators won't have to think about them unless they really want to.
 

Who manages the development of XML-related standards?

 The development of XML and many other XML-based specifications is managed by the World Wide Web Consortium or W3C. However, individuals, businesses, and other organizations are developing additional specifications you might be interested in. This is especially true of grammars (DTDs) that are appropriate for specific kinds of content.
 This presentation sticks to the list of specifications being developed by the W3C. A complete list of these can be found at http://www.w3c.org/tr . Most W3C specifications have abstracts and introductions that provide an easy-to-understand overview of the specification's objectives and contents. Many also include a history section and links to useful tutorials on other sites.
 If you are interested in additional specifications and XML generally, Robin Cover's site (on the OASIS site) is a great place to start. See http://www.oasis-open.org/cover/xml.html .
 

W3C process

 Before we get started, you should be aware that the W3C doesn't produce "standards", it produces technical reports or specifications. These have a life cycle of the following stages, and "Recommendation" is as high as a specification ever gets. These descriptions are from the W3C web site.
 
Notes: A Note is a dated, public record of an idea, comment, or document. Members wishing to have their ideas published at the W3C site as a Note must follow the Submission process.
Working Drafts: A working draft represents work in progress and a commitment by W3C to pursue work in this area. A working draft does not imply consensus by a group or W3C.
Proposed Recommendations: A Proposed Recommendation is work that (1) represents consensus within the group that produced it and (2) has been proposed by the Director to the Advisory Committee for review.
Recommendations: A Recommendation is work that represents consensus within W3C and has the Director's stamp of approval. W3C considers that the ideas or technology specified by a Recommendation are appropriate for widespread deployment and promote W3C's mission.
 

Common principles in W3C specifications

 
  •  None of the specifications is a solution by itself. They are all components that can be combined to address a specific need. Most of them reference one or more other specifications.
  •  Many of the specifications describe structures that are themselves expressed in XML. For example, an XSL instance is an XML document.
  •  Some specifications are abstract building blocks to other specifications. For example, RDF describes a way to represent metadata, but not any specific kind of metadata. P3P is a specification that defines a specific set of metadata using RDF.
 

The related specifications

 The first thing you'll notice about the W3C specifications is that there are a lot of them, and that many interdependent specifications are at very different places in the W3C life cycle.
 Only a small number of the current W3C Notes are listed here.
 

Markup and Markup Construction

 
| Subcategory | Specification | Status |
| --- | --- | --- |
| Character Sets | Character Model for the World Wide Web | Working Draft in development |
| | Unicode in XML and other Markup Languages | Working Draft in development |
| HTML | HTML 4.0 Specification | Recommendation |
| | HTML 4.01 Specification | Proposed Recommendation |
| XML | Canonical XML | Working Draft in development |
| | Extensible Markup Language (XML) 1.0 Specification | Recommendation |
| | XML Fragment Interchange | Working Draft in development |
| XHTML | XHTML™ 1.0: The Extensible HyperText Markup Language - A Reformulation of HTML 4.0 in XML 1.0 | Proposed Recommendation |
| | XHTML™ 1.1 - Module-based XHTML | Working Draft in development |
| | Building XHTML™ Modules | Working Draft in development |
| | Modularization of XHTML™ | Working Draft in development |
| | XHTML™ Document Profile Requirements | Working Draft in development |
| Namespaces | Namespaces in XML | Recommendation |
| Stylesheet Pointers | Associating Style Sheets with XML documents | Recommendation |
| XLink | XML Linking Language (XLink) | Working Draft in development |
| XPath | XML Path Language (XPath) Version 1.0 | Proposed Recommendation |
| XPointer | XML Pointer Language (XPointer) | Working Draft in development |
 

Data Interfaces and Models

 
| Subcategory | Specification | Status |
| --- | --- | --- |
| DOM | Document Object Model (DOM) Level 1 | Recommendation |
| | Document Object Model (DOM) Level 2 Specification | Working Draft in Last Call |
| RDF | Resource Description Framework (RDF) Model and Syntax Specification | Recommendation |
| | Resource Description Framework (RDF) Schemas | Proposed Recommendation |
| XML Information Set | XML Information Set | Working Draft in development |
| XML Schema | XML Schema Part 1: Structures | Working Draft in development |
| | XML Schema Part 2: Datatypes | Working Draft in development |
 

Stylesheets

 
| Subcategory | Specification | Status |
| --- | --- | --- |
| CSS | Behavioral Extensions to CSS | Working Draft in development |
| | Cascading Style Sheets (CSS1) Level 1 Specification | Recommendation |
| | Cascading Style Sheets, level 2 (CSS2) Specification | Recommendation |
| | Color Profiles for CSS3 | Working Draft in development |
| | CSS Namespace Enhancements (Proposal) | Working Draft in development |
| | CSS3 module: W3C selectors | Working Draft in development |
| | Multi-column layout in CSS | Working Draft in development |
| | Paged Media Properties for CSS3 | Working Draft in development |
| XSL | Extensible Stylesheet Language (XSL) | Working Draft in development |
| | XSL Transformations (XSLT) Version 1.0 | Proposed Recommendation |
 

Grammars

 
| Subcategory | Specification | Status |
| --- | --- | --- |
| Digital Signatures | XML-Signature Requirements | Working Draft in development |
| Ecommerce | Common Markup for micropayment per-fee-links | Working Draft in Last Call |
| Forms | XHTML™ Extended Forms Requirements | Working Draft in development |
| Graphics | Scalable Vector Graphics (SVG) | Working Draft in Last Call |
| | Scalable Vector Graphics (SVG) Requirements | Working Draft in development |
| | WebCGM Profile | Recommendation |
| Math | Mathematical Markup Language (MathML™) 1.01 Specification | Recommendation |
| Multimedia | Synchronized Multimedia Integration Language (SMIL) 1.0 Specification | Recommendation |
| | Synchronized Multimedia Integration Language (SMIL) Boston Specification | Working Draft in development |
| Privacy/Ratings | A P3P Preference Exchange Language (APPEL) | Working Draft in development |
| | PICS 1.1 Label Distribution -- Label Syntax and Communication Protocols | Recommendation |
| | PICS 1.1 Rating Services and Rating Systems -- and Their Machine Readable Descriptions | Recommendation |
| | PICS Signed Labels (DSig) 1.0 Specification | Recommendation |
| | PICSRules 1.1 Specification | Recommendation |
| | Platform for Privacy Preferences (P3P) Specification | Working Draft in development |
| Ruby | Ruby | Working Draft in development |
| Voice Browser/Mobile Access | WAP Binary XML Content Format | Note |
 

Details about some of the specifications

 

Markup and Markup Construction

 

HTML, XML, and XHTML

 HTML and XML are stable but still evolving. XHTML is the next step in that evolution. To quote the W3C: "XHTML 1.0 is the first major change to HTML since HTML 4.0 was released in 1997. It brings the rigor of XML to Web pages and is the keystone in W3C's work to create standards that provide richer Web pages on an ever increasing range of browser platforms including cell phones, televisions, cars, wallet sized wireless communicators, kiosks, and desktops."
 

Namespaces

 The XML Namespace specification provides mechanisms for multiple tag sets to be used together without collisions from duplicate names (if, for example, both sets of tags you want to combine contain a "name" element, but the two elements have very different meanings and content models). Namespaces also provide a means to identify (point to) the definition for each tag set that's used.
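 To make the "name" collision concrete, here is a minimal sketch using Python's standard xml.etree.ElementTree module. The two namespace URIs, the prefixes, and the element names are hypothetical and chosen only for illustration; they do not come from any real tag set.

```python
import xml.etree.ElementTree as ET

# Two hypothetical tag sets, each identified by its own namespace URI.
PERSON_NS = "http://example.org/ns/person"
PRODUCT_NS = "http://example.org/ns/product"

DOC = f"""\
<order xmlns:per="{PERSON_NS}" xmlns:prod="{PRODUCT_NS}">
  <per:name><per:given>Ada</per:given><per:family>Lovelace</per:family></per:name>
  <prod:name>Encyclopedia, CD-ROM edition</prod:name>
</order>"""

root = ET.fromstring(DOC)

# ElementTree expands each prefixed tag to "{namespace-uri}local-name",
# so the two "name" elements stay distinct even though their local names match.
person_name = root.find(f"{{{PERSON_NS}}}name")
product_name = root.find(f"{{{PRODUCT_NS}}}name")

print(person_name.tag)    # the person "name", which contains child elements
print(product_name.tag)   # the product "name", which contains only text
print(product_name.text)
```

 The prefixes (per, prod) are only local shorthand; software tells the two tag sets apart by the namespace URI.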
 

XPath, XPointer, and XLink

 XPath describes a way to address parts of an XML document. It uses a non-XML syntax that can be embedded within URIs and XML attribute values. XPath has a subset that can be used for pattern matching tests in XSLT.
 XPointer specifies a language for pointing into the structures (e.g. elements, strings) of XML documents. It makes use of the XPath specification.
 XLink defines constructs for explicitly linking from XML documents to other XML documents and Internet resources. Think of this as a very evolved version of the HTML <A> tag. In that analogy, XPointer would describe how to build the HREF attribute value.
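 As a rough illustration of XPath-style addressing, the sketch below uses Python's standard xml.etree.ElementTree module, which supports only a limited subset of XPath. The document and its element names are made up for the example.

```python
import xml.etree.ElementTree as ET

# A small hypothetical document to address into.
DOC = """\
<book>
  <chapter id="c1"><title>Markup</title><para>First.</para></chapter>
  <chapter id="c2"><title>Stylesheets</title><para>Second.</para><para>Third.</para></chapter>
</book>"""

book = ET.fromstring(DOC)

# All chapter titles, anywhere below the root element.
titles = [t.text for t in book.findall(".//chapter/title")]

# The title of the chapter whose id attribute is "c2".
c2_title = book.find(".//chapter[@id='c2']/title").text

# The second para child of that chapter (XPath positions are 1-based).
c2_second_para = book.find(".//chapter[@id='c2']/para[2]").text

print(titles, c2_title, c2_second_para)
```

 Each path expression is an ordinary string, which is why this kind of address can be embedded in an attribute value or in a URI.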
 

Data Interfaces and Models

 

DOM

 The Document Object Model is an API. It describes HTML or XML documents as trees and how those trees can be accessed and manipulated by software. This allows the dynamic processing of documents: for example, on-the-fly changes to the document layout as displayed in a browser.
 For the geeks, here's a quote from the W3C: "The Document Object Model, despite its name, is not an object model in the same way as the Component Object Model (COM). The COM, like CORBA, is a language-independent way to specify interfaces and objects; the Document Object Model is a set of interfaces and objects designed for managing HTML and XML documents. The DOM may be implemented using language-independent systems like COM or CORBA; it may also be implemented using language-specific bindings like the Java or ECMAScript bindings that we define."
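 For a feel of what "accessing and manipulating a tree" means in practice, here is a small sketch using xml.dom.minidom, a lightweight DOM implementation in Python's standard library. The document content is invented for the example, and a browser would expose the same kind of calls through its own language binding.

```python
from xml.dom.minidom import parseString

# Parse a small hypothetical document into a DOM tree.
doc = parseString("<article><title>XML specs</title><para>Draft text.</para></article>")

# Access the tree: the document element and its first <para> descendant.
article = doc.documentElement
para = article.getElementsByTagName("para")[0]

# Manipulate the tree: set an attribute and append a newly created element.
para.setAttribute("status", "reviewed")
note = doc.createElement("note")
note.appendChild(doc.createTextNode("Added through the DOM."))
article.appendChild(note)

# Serialize the modified tree back to markup.
result = article.toxml()
print(result)
```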
 

RDF

 RDF is the Resource Description Framework, a means of representing metadata about a resource (e.g. a document or a user) in the form of properties and property values. RDF is not necessarily encoded with XML; however, it is reliant on certain other technologies related to XML, such as those for dealing with internationalization. When using RDF, you create your own set of properties and values—RDF itself does not contain a list of such sets, it just explains how they should be created.
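 The sketch below illustrates the point that the same property/value description can live in more than one encoding. It is a simplified illustration in Python and not a complete RDF implementation. The resource URL and the "ex" property vocabulary are hypothetical, and the XML form only follows the general shape of RDF's XML syntax (an rdf:Description element with one child element per property).

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
EX_NS = "http://example.org/terms#"  # hypothetical property vocabulary

# The model: one resource described by property/value pairs.
resource = "http://example.org/docs/report.xml"
properties = {"creator": "A. Author", "language": "en"}

def as_statements(resource, properties):
    """Encode the model as plain (resource, property, value) tuples."""
    return [(resource, EX_NS + name, value) for name, value in properties.items()]

def as_xml(resource, properties):
    """Encode the same model as an XML document."""
    ET.register_namespace("rdf", RDF_NS)
    ET.register_namespace("ex", EX_NS)
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    desc = ET.SubElement(root, f"{{{RDF_NS}}}Description",
                         {f"{{{RDF_NS}}}about": resource})
    for name, value in properties.items():
        ET.SubElement(desc, f"{{{EX_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")

def from_xml(xml_text):
    """Read the XML encoding back into statement tuples."""
    desc = ET.fromstring(xml_text).find(f"{{{RDF_NS}}}Description")
    about = desc.get(f"{{{RDF_NS}}}about")
    # ElementTree tags look like "{namespace}local"; drop the braces to get a full name.
    return [(about, child.tag.replace("{", "").replace("}", ""), child.text)
            for child in desc]

statements = as_statements(resource, properties)
round_trip = from_xml(as_xml(resource, properties))
print(statements == round_trip)
```

 The list of tuples and the XML text carry the same information, which is why an application can move between the two forms without losing anything.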
 

XML Schema

 The XML Schema activity has been divided into two areas: Structures and Datatypes.
 The easiest way to describe XML Schema Part 1: Structures is to say that it defines a replacement for DTDs, and that it uses an XML syntax rather than the markup defined in XML 1.0 document type definitions (DTDs). To be more accurate, Structures defines a superset of what's possible with DTDs.
 XML Schema Part 2: Datatypes describes a way to use datatypes in XML Schemas. This will enable applications to validate the content of XML elements and attributes in ways that are critical to data management and that are not possible with DTDs.
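 To show the kind of check that datatypes make possible, here is a hand-written stand-in in Python. It is not an XML Schema validator and does not use XML Schema syntax. The element names (quantity, shipDate) and the rules attached to them are assumptions made up for the example. A DTD could only declare both elements as text, so it would accept both documents below.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Hypothetical rules: element name -> function that raises ValueError on bad content.
DATATYPES = {
    "quantity": int,                  # content must parse as an integer
    "shipDate": date.fromisoformat,   # content must be a YYYY-MM-DD date
}

def datatype_errors(xml_text):
    """Return (element name, offending content) pairs for the given document."""
    errors = []
    for elem in ET.fromstring(xml_text).iter():
        check = DATATYPES.get(elem.tag)
        if check is None:
            continue  # no datatype rule for this element
        try:
            check((elem.text or "").strip())
        except ValueError:
            errors.append((elem.tag, elem.text))
    return errors

good = "<order><quantity>12</quantity><shipDate>1999-10-15</shipDate></order>"
bad = "<order><quantity>twelve</quantity><shipDate>15 October</shipDate></order>"

print(datatype_errors(good))
print(datatype_errors(bad))
```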
 

Stylesheets

 Both CSS and XSL allow style definitions to be captured separately from the documents they are associated with. This provides a number of benefits, including easier maintenance, greater consistency, greater flexibility (the ability to switch among stylesheets dynamically), and easier document authoring.
 

CSS

 Cascading Style Sheets Level 1 was first approved as a recommendation in 1996. Since then, it has been evolving to support more complex rendering rules, including print and voice styling. CSS Level 2 was made a recommendation in 1998.
 CSS can be used with both HTML and XML, although neither Netscape nor Internet Explorer fully supports all its capabilities.
 

XSL and XSLT

 The Extensible Stylesheet Language is an XML vocabulary for creating stylesheets that can format and transform XML documents. XSL is an extremely powerful means to manipulate documents, and is therefore complex and has been difficult to finalize. While the complete specification is still a working draft, XSLT (XSL Transformations), a subset of XSL, is now a proposed recommendation with a number of implementations. At first glance, transformations might appear more complicated than formatting, and so it could seem odd to finalize the transformation specification first. However, since XSLT can be used to dynamically produce other document formats (including HTML) from XML, it is actually of more practical short-term benefit than the formatting language would be.
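 The sketch below is not XSLT. It is a hand-written Python stand-in that shows the kind of tree-to-tree transformation an XSLT stylesheet expresses: a source XML document goes in, and an HTML document comes out. The source element names and the mapping table are hypothetical, with the table loosely playing the role that template rules play in a stylesheet.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from source element names to HTML element names.
RULES = {"article": "body", "title": "h1", "para": "p", "emph": "em"}

def transform(source):
    """Recursively build a new HTML element tree from a source XML element."""
    target = ET.Element(RULES.get(source.tag, "span"))  # unknown tags become <span>
    target.text = source.text
    target.tail = source.tail
    for child in source:
        target.append(transform(child))
    return target

SOURCE = "<article><title>Status</title><para>XSLT is <emph>nearly</emph> final.</para></article>"

html = ET.Element("html")
html.append(transform(ET.fromstring(SOURCE)))
output = ET.tostring(html, encoding="unicode")
print(output)
```

 A real stylesheet would state these rules declaratively in XML rather than in program code, but the result is the same kind of thing: a new document derived from the source.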
 

Grammars

 

Graphics

 There are two types of graphics formats being discussed by the W3C right now. The first, Scalable Vector Graphics (SVG), is a vector format with an XML syntax. Vector images are small in file size, making SVG well-suited for the Web. Because it is an XML-based format, DOM-aware software will be able to build and manipulate SVG graphics on the fly. SVG has gotten support from a number of industry groups and companies. Corel is currently offering a beta export filter to SVG from CorelDRAW.
 The second format is a Web "profile" of CGM. CGM is a complex ISO standard commonly used in engineering disciplines. CGM images can be a composite of vector and raster images.
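 As a sketch of what "building and manipulating SVG on the fly" could look like, the following Python example creates a tiny bar chart using only DOM calls from the standard xml.dom.minidom module. The bar_chart function and its parameters are hypothetical, and the namespace URI is the one used by the later SVG Recommendation, which is an assumption relative to the working draft discussed here.

```python
from xml.dom.minidom import getDOMImplementation

SVG_NS = "http://www.w3.org/2000/svg"  # assumption: namespace of the later SVG Recommendation

def bar_chart(values, bar_width=20, height=100):
    """Build an SVG document with one <rect> per value, using only DOM calls."""
    doc = getDOMImplementation().createDocument(SVG_NS, "svg", None)
    svg = doc.documentElement
    svg.setAttribute("xmlns", SVG_NS)
    svg.setAttribute("width", str(bar_width * len(values)))
    svg.setAttribute("height", str(height))
    for i, value in enumerate(values):
        rect = doc.createElementNS(SVG_NS, "rect")
        rect.setAttribute("x", str(i * bar_width))
        rect.setAttribute("y", str(height - value))  # SVG's y axis points downward
        rect.setAttribute("width", str(bar_width))
        rect.setAttribute("height", str(value))
        svg.appendChild(rect)
    return doc

chart = bar_chart([30, 80, 55])

# Manipulate the finished graphic: recolor the tallest bar.
rects = chart.getElementsByTagName("rect")
tallest = max(rects, key=lambda r: int(r.getAttribute("height")))
tallest.setAttribute("fill", "red")

svg_text = chart.documentElement.toxml()
print(svg_text)
```

 Because the graphic is just an XML tree, the same calls that edit a text document can resize, recolor, or add shapes.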
 

Math

 MathML is an XML syntax for describing mathematical content. While current browsers do not support MathML or stylesheets for rendering it, plugins are becoming available.
 

Multimedia

 The Synchronized Multimedia Integration Language (SMIL) describes a syntax for timing and synchronization information. This kind of information is necessary for describing multimedia content such as training programs and animation. SMIL Boston is an XML equivalent to SMIL.
 The timing and synchronization components of SMIL Boston are expected to be useful in other XML applications like XHTML.
 Note: This information is up-to-date as of mid-October, 1999.
