
Eliot Kimber - Property Sets and Groves
 
W. Eliot Kimber, Senior SGML Consultant and HyTime Specialist, Passage Systems Inc. Email: kimber@passage.com
 

An Excerpt from "Practical Hypermedia: An Introduction to HyTime": Property Sets and Groves

 Presented at HyTime '96, Seattle, WA, August 20 and 21, 1996.
 

Property Sets and Groves

 
Any data processing application requires some definition of the data structures and data objects to be processed. This definition can be more or less formal depending on the needs of the application, how widely it will be used, what other systems will interact with it, and so on. The more general and complex the application, the greater the need for formal definitions of data structures.
 These definitions provide a clear and, to the greatest degree possible, unambiguous description of object and data types, the properties of objects, and the possible relationships between objects. Given such formal definitions, implementors and users can more easily understand and work with objects. Implementors can implement the support for objects more consistently. Agreements between groups of users and implementors can be captured as formal object definitions, making it clear what the agreement was.
 
The objects HyTime works with are the structures within SGML documents and other kinds of data: SGML elements, parsed data strings, and identifiable structures in other data notations, as well as some of the transient side effects of hypermedia processing, such as location ladders, hyperlinks, and the results of projecting and rendering events in event schedules. These tend to be complex data objects that can have many different, equally useful and correct representations as objects.
 
In the case of SGML documents, there are many different data elements that result from SGML parsing. The SGML standard defines the source syntax and parsing rules for SGML documents, but it doesn't define how the result of applying those parsing rules should be represented or even, necessarily, what the precise result is. However, HyTime works with the parsed result, not the original source itself (remember that the location source for any HyTime location address is ultimately a node in a grove). Because HyTime is an enabling application architecture intended to support the interoperation of a wide variety of different engines, processors, and applications, there must be agreement on the form of the objects HyTime works with.
 In addition, there must be a formal mechanism for describing the properties semantic location addresses work with. In other words, before you can have a property location address or a query, you must define what the properties are.
 
Of course, the definition of objects and their properties is of utility to a wide variety of applications. In particular, the DSSSL standard works with much the same set of objects and properties that HyTime does, as they both work from SGML as a starting point. Therefore, the SGML Extended Facilities annex includes constructs for declaring and describing objects and their properties. The DSSSL and HyTime standards then use these facilities for the definition of objects resulting from parsed SGML documents and objects unique to each standard. Other standards and applications can use these facilities as well.
 
Because these formal object and property definitions are primarily for the purpose of documenting application and architecture designs, it is not necessary to understand the syntactic details of property sets unless you are implementing HyTime (or DSSSL) processors or want to use this formalism in your own application or architecture designs.
 
The following sections explain property sets and the groves that result from parsing or processing data according to those property sets. The property set and grove formalisms provide the fundamental conceptual underpinnings for HyTime, DSSSL, and, potentially, any other SGML architecture or processing application.
 

Understanding Property Sets

 
In the earlier discussion of groves, you were introduced to the concept of objects and properties that result from parsing data into the internal data structures that processing applications work with. The objects that make up a grove are defined in a property set, which defines a set of objects and their properties. A property set can also define data types, such as "string" or "integer", and lexical types. The data types and lexical types are then used in the definition of properties.
 
Property sets serve primarily as documentation for a system and need not be processed by a HyTime engine (although they can be). While any application that uses general facility property sets should include them in its documentation, it need not provide them as part of the executable program. Presumably the objects will be embodied in the program itself. It may be possible, depending on the nature of the objects and the application, to automatically derive program objects or data schemas from general facility property sets, but they were not particularly designed to enable such processing. [Footnote 8. TechnoTeacher, Inc. has built a general property set processor, PropMinder, that processes Extended Facilities property sets and generates schemas for different object-oriented databases. It can also generate skeletal object-oriented program code that can then be completed by a programmer.]
 
From a property set, you then define one or more grove plans, which describe how the objects derived from the processing of a particular data content notation are built into groves. A grove plan associates a property set with a parsing context. The grove plan also indicates what object types should or should not be included in the grove (for example, in an SGML grove, you may not need all the objects and properties related to the original markup string). For every property set there is an implicit maximal grove plan that includes all objects and properties.
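As a rough illustration of how a grove plan narrows a property set, the following Python sketch models a property set as a mapping from class names to property names and a grove plan as the subset to include. The names `PROPERTY_SET`, `maximal_plan`, and `apply_plan`, and the sample classes and properties, are hypothetical and are not taken from the SGML property set.

```python
# Hypothetical property set: class name -> property names.
PROPERTY_SET = {
    "element": ["gi", "attributes", "content"],
    "attribute": ["name", "value"],
    "markup-string": ["text"],
}


def maximal_plan(property_set):
    """Model the implicit maximal grove plan: every class and property."""
    return {cls: set(props) for cls, props in property_set.items()}


def apply_plan(plan, cls, assignments):
    """Keep only the property assignments the plan includes.

    Returns None when the plan omits the class altogether.
    """
    if cls not in plan:
        return None
    return {name: value for name, value in assignments.items() if name in plan[cls]}


# A plan that leaves out the class describing the original markup string.
plan = maximal_plan(PROPERTY_SET)
del plan["markup-string"]

kept = apply_plan(plan, "element", {"gi": "para", "content": [], "source-text": "<para>"})
omitted = apply_plan(plan, "markup-string", {"text": "<para>"})
```

Here `kept` retains only the properties named in the plan, and `omitted` is `None` because the plan excludes that class.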
 
Property sets are defined in property set definition documents, SGML documents conforming to the property set definition document type defined in the PSDR annex. The HyTime and DSSSL standards both include the SGML property set definition document. The HyTime standard also includes the HyTime property set definition document.
 

Understanding Groves

 
All processing systems must build their own in-memory representations of the data they operate on. In practice these representations may take many different forms: arrays, variable pools, relational tables in a database, objects in an object-oriented database, and so on. For ad-hoc processors or processors acting on single-use or proprietary data formats, it is not usually necessary to formalize these data structures beyond whatever is needed to support the development and maintenance of the program itself. However, SGML and its related standards exist in part to enable the interoperation of a wide variety of tools and systems. HyTime, in particular, exists to enable the interoperation of the tools and systems found in a distributed, networked, open hypermedia environment.
 
Thus these standards need a general, system-independent formalism for defining and referring to "in-memory" data structures. Property sets are the first part of this definition: they provide a language for defining object types and properties and the allowed relationships among them. Groves are the other part of this formalism: they define how the objects defined in property sets are represented during processing and how different groves can be related to each other. In particular, groves define a regular and predictable structure for data so that it can be addressed reliably given knowledge of the property set used to produce the grove. Two processors working on the same data with the same property set and grove plan should produce identical groves. Other programs, communicating with these programs in terms of the grove, should get the same results from both programs.
 
Groves provide an abstract data and processing model. Actual programs need not implement groves literally in their own data structures as long as they can provide the correct results. It can be useful to think of programs providing a "grove view" of their internal data structures. Grove views make it possible for programs to communicate with each other by using the common language of groves and property sets, regardless of what their own internal data representations are. This approach is similar to application standards such as ODBC and CORBA, which define common object models and APIs for specific types of applications. The difference is that groves and property sets are, like SGML, a meta-mechanism for defining common object models, not the definition of a common object model directly.
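To make the "grove view" idea concrete, here is a minimal Python sketch in which a program keeps its data in flat, table-like rows but presents each row through a node-style interface. The row layout, the class `RowNodeView`, and the property names `gi`, `origin`, and `content` are assumptions chosen for illustration, not an interface defined by HyTime or DSSSL.

```python
# Hypothetical internal representation: (row id, parent row id, element name).
ROWS = [(1, None, "doc"), (2, 1, "title"), (3, 1, "para")]


class RowNodeView:
    """Presents one row as a grove-style node without copying the table."""

    def __init__(self, rows, row_id):
        self._rows = rows
        self._row = next(r for r in rows if r[0] == row_id)

    @property
    def gi(self):
        # Non-nodal property: the element name stored in the row.
        return self._row[2]

    @property
    def origin(self):
        # Node-valued property: the containing node, or None for the root.
        parent = self._row[1]
        return None if parent is None else RowNodeView(self._rows, parent)

    @property
    def content(self):
        # Node-list property: child rows in table order.
        return [RowNodeView(self._rows, r[0]) for r in self._rows if r[1] == self._row[0]]


root = RowNodeView(ROWS, 1)
child_names = [child.gi for child in root.content]
```

A client that only uses `gi`, `origin`, and `content` never needs to know that the data lives in rows rather than in linked node objects.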
 
A grove is formally defined as a directed graph of nodes. Nodes are ordered sets of property assignments. A grove has exactly one grove root, which is the node within the grove that has no origin property. Groves may be related together to form hypergroves.
 
Each node in a grove exhibits one or more properties, as defined in the property set used to construct the grove. Properties are either nodal or non-nodal. Nodal properties consist of single nodes or node lists. Non-nodal properties contain atomic data values, such as integers, strings, and Boolean values. There are three types of nodal properties: node-valued, node lists, and named node lists.
 
Node-valued properties are properties whose values are always a single node. Node lists are properties whose values are lists of nodes. Named node lists are node lists where each node has a name that is unique within the node list. Node lists may consist of zero or more nodes. The nodes in node lists are always ordered so that they can be addressed by position within the list.
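The property categories described above can be sketched in Python as follows. This is only an illustrative model: the class `Node`, the function `property_kind`, and the sample property names are hypothetical, and a Python `dict` (which preserves insertion order) stands in for a named node list.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A node is modeled as a class name plus property assignments."""
    class_name: str
    properties: dict = field(default_factory=dict)


def property_kind(value):
    """Classify a property value using the categories described above."""
    if isinstance(value, Node):
        return "node-valued"
    if isinstance(value, dict) and all(isinstance(v, Node) for v in value.values()):
        return "named node list"
    if isinstance(value, list) and all(isinstance(v, Node) for v in value):
        return "node list"
    return "non-nodal"


attr = Node("attribute", {"name": "id", "value": "p1"})
elem = Node("element", {
    "gi": "para",                                      # atomic string value
    "attributes": {"id": attr},                        # nodes addressed by unique name
    "content": [Node("data", {"chars": "Hello"})],     # nodes addressed by position
})
kinds = {name: property_kind(value) for name, value in elem.properties.items()}
```

In this model an empty list or dict still counts as a nodal property, matching the statement that node lists may consist of zero nodes.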
 
Nodes can be associated with a particular node list by one of three possible relationship types: subnode, iref (internal reference), or uref (unrestricted reference). Nodes with a relationship type of "subnode" are directly contained by the node list. The node that exhibits the node list property is said to be the origin of the nodes in the node list. Each node has exactly one origin (except for the grove root, which has no origin). This means that a node occurs in exactly one node list as a subnode. (Technically, the subnode relationships in a grove define an acyclic directed graph.)
 
Nodes in a node list with a relationship type of "iref" are in the same grove as the node that exhibits the node list property, but have a different node as their origin. Iref relationships represent things like SGML ID references. Nodes in a node list with a relationship type of "uref" may be in the same or different groves. Uref relationships between groves create hypergroves. Uref relationships represent things like HyTime cross-document addresses.
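The three relationship types and the single-origin rule can be sketched as below. This is a simplified model under stated assumptions: the class `Node` and its methods `add_subnode`, `add_reference`, and `root` are hypothetical names, and references are stored on the node itself rather than in named properties.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.origin = None    # stays None only for the grove root
        self.subnodes = []    # "subnode" relationship: direct containment
        self.refs = []        # (relationship type, target) pairs for iref/uref

    def add_subnode(self, child):
        """Contain child directly; a node may have only one origin."""
        if child.origin is not None:
            raise ValueError(f"{child.name} already has an origin")
        child.origin = self
        self.subnodes.append(child)
        return child

    def root(self):
        """Follow origin links up to the grove root."""
        node = self
        while node.origin is not None:
            node = node.origin
        return node

    def add_reference(self, kind, target):
        """Record an iref (same grove only) or a uref (any grove)."""
        if kind not in ("iref", "uref"):
            raise ValueError(f"unknown relationship type: {kind}")
        if kind == "iref" and target.root() is not self.root():
            raise ValueError("iref targets must be in the same grove")
        self.refs.append((kind, target))


doc = Node("doc")
sec = doc.add_subnode(Node("section"))
xref = doc.add_subnode(Node("xref"))
xref.add_reference("iref", sec)        # like an SGML ID reference

other = Node("other-doc")              # root of a second grove
xref.add_reference("uref", other)      # links the two groves into a hypergrove
```

Attempting `xref.add_reference("iref", other)` or adding `sec` as a subnode a second time raises `ValueError` in this sketch.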
 
Figure 1 shows a typical node in a grove:
 
 
 Figure 1. Typical Node in a Grove
 Each node is nothing more than a collection of properties and their values.
 
Groves are said to be "constructed" by grove constructors according to a grove plan. Grove constructors are processors that take as input either the result of parsing data or another grove or groves and produce a new grove. For SGML, the input to a grove constructor would be the output of an SGML parser and the output would be an SGML document grove as defined by the SGML property set. For HyTime, the input to a grove constructor would be one or more SGML document groves and the output would be a HyTime semantic grove as defined by the HyTime property set. In a real system, parsers and grove constructors may be bound together. For other types of processors, such as HyTime engines, grove construction is simply part of what they do. Figure 2 shows the construction of an SGML document grove.
 
 
 Figure 2. Construction of an SGML Document Grove
 

Grove-Based Processing

 
The grove abstraction implies a simple processing model revolving around the creation and interconnection of groves. Regardless of what real systems actually do, it is useful to model the processing of SGML documents as the creation of groves in order to define the processing needed for a particular task without regard to implementation details. Once a satisfactory grove-based model has been defined it can be translated into specific implementation designs where optimizations and shortcuts can be applied.
 
In an SGML world, processing always begins by parsing an SGML document. The result of this parsing is called a parse grove or pGrove, the grove that results from parsing. A pGrove will then be processed to produce other groves or some non-grove output. When architectures are being used, a pGrove can be processed by a generic architecture engine to produce a grove representing the architectural instance derived from the base document, the architectural grove or aGrove. For example, a HyTime document would be parsed into a pGrove. The pGrove would then be processed to derive the document's HyTime aGrove, as shown in Figure 3. Groves created from other groves that have (or can have) an independent existence are said to be derived groves. Derived groves that are created for specific processing purposes and are not independent of the groves from which they are created are called auxiliary groves. For example, the grove that results from applying a data tokenizer to the content of an SGML element is an auxiliary grove, whereas architectural groves are derived groves.
 
 
 Figure 3. Creating a HyTime Architectural Grove from a pGrove.
 
Because architectural groves are inherent in client documents, it is useful to assume that there is always an aGrove present, whether or not actual processing systems are implemented that way. Architecture-specific processors are then assumed to take as their initial input the aGrove for their architecture, rather than the client documents themselves.
 
A processor can always get from an aGrove to the pGrove from which it was derived because each node in a derived grove has the intrinsic property "source", which is the node or nodes from which it was derived. For example, in a HyTime architectural grove, each element node would have as its source the node in the client pGrove from which it was derived, as shown in Figure 4.
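The role of the "source" property can be sketched in Python as follows. This is a deliberately simplified model, not the derivation procedure defined by the standard: the class `Node`, the function `derive`, and the mapping `forms` from client element names to architectural form names are assumptions made for illustration, and elements with no architectural form are simply skipped while their derived descendants are kept.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Node:
    gi: str
    source: list = field(default_factory=list)    # nodes this node was derived from
    children: list = field(default_factory=list)


def derive(pnode, forms):
    """Return the architectural nodes derived from pnode's subtree."""
    derived_children = [a for child in pnode.children for a in derive(child, forms)]
    if pnode.gi not in forms:
        # No architectural form: pass the derived descendants up to the caller.
        return derived_children
    return [Node(forms[pnode.gi], source=[pnode], children=derived_children)]


# A tiny pGrove: manual > para > xref
link = Node("xref")
para = Node("para", children=[link])
doc = Node("manual", children=[para])

# Hypothetical mapping from client element types to architectural forms.
forms = {"manual": "HyDoc", "xref": "clink"}
(adoc,) = derive(doc, forms)
```

Following `adoc.source[0]` leads back to the `manual` node in the pGrove, and `adoc.children[0].source[0]` leads back to the `xref` node.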
 
 
 Figure 4. Nodes in an aGrove Derived From Nodes in a pGrove
 
Architecture-specific processors must maintain their own semantic groves, which hold those objects directly related to the processor's semantics. For HyTime engines, the HyTime semantic grove holds the objects defined in the HyTime property set. The objects in the HyTime semantic grove may be derived from many different nodes derived from many documents. An event node, for example, would be derived from event- and extlist-form elements in a finite coordinate space. The extent property would be derived from the various elements making up the event's extent specification, and so on. Figure 5 shows the construction of a HyTime semantic grove from a pGrove and an aGrove.
 
 
 Figure 5. Construction of a HyTime Semantic Grove.
 
A HyTime engine uses the HyTime semantic grove, along with the other groves in the hypergrove of which the semantic grove is a member, to do whatever processing it needs to do. This processing includes the construction of a new grove for the original document reflecting the effective results of HyTime-specific processing. For example, if the content location facility is used, the HyTime engine must resolve any content locations to determine the effective content of those elements before it can resolve any location addresses. This new grove is the effective pGrove, or epGrove, of the client document. Figure 6 shows an epGrove being produced from the other groves in the hypergrove.
 
 
 Figure 6. Creation of an Effective pGrove.
 An actual application would probably not literally create a new in-memory representation of the client document, but would just augment its existing representation. However, it's easier to talk about the abstract processing if it is represented as a separate creation process. To keep the grove abstraction simpler and to make location addressing more tractable, groves are considered to be static once created. There is no notion of changing a grove once it has been created. In particular, the grove position of a node cannot change once it has been set in a grove. In the abstract processing model change is represented as destruction of the old grove followed by creation of a new grove. Actual applications can, of course, have more dynamic real data structures.
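The "static once created" rule can be modeled with immutable values, as in this minimal Python sketch. The class `Node` and the function `with_child_replaced` are hypothetical; the point is only that a change yields a new structure while the old one is left exactly as it was.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Node:
    name: str
    children: tuple = ()


def with_child_replaced(node, index, new_child):
    """Return a new node with one child swapped; the original is untouched."""
    children = node.children[:index] + (new_child,) + node.children[index + 1:]
    return replace(node, children=children)


old_grove = Node("doc", (Node("a"), Node("b")))
new_grove = with_child_replaced(old_grove, 1, Node("c"))
```

Positions in `old_grove` still resolve to the same nodes after `new_grove` is built, which is what keeps addressing by grove position tractable in the abstract model. Assigning to a field of a frozen dataclass raises an error rather than changing the node.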
 

Canonical Grove Representation

 
Groves are an abstraction designed to enable inter-standard and inter-application interaction. Because they represent real data and because the nature of that data is well defined through property sets, it is possible to define a canonical representation of groves. The Extended Facilities annex defines this canonical representation using an SGML document type and a set of severe constraints on how the source documents are organized such that a given grove can produce one and only one string representation of the grove. This string representation can then be used to do string comparisons of groves. This can aid in checking processors for conformance and in debugging. The canonical grove representation can also be used to interchange groves among processors, if necessary. Canonical grove representations, when compressed using normal compression techniques, could also serve as a binary form of SGML documents and application-specific data structures (semantic groves), which, once decompressed, would be usable by any grove-based applications.
 A typical canonical grove looks like this:
 
<!DOCTYPE GROVE PUBLIC "ISO/IEC 10744:1992//DTD Canonical Grove
Representation//EN"><GROVE><NODE CLASS="sgmldoc"><NODEPROP ID="X1"
DATATYPE="nnl" RCSNM="prop1" NODEREL="SUBNODE">
property value</NODEPROP><NODEPROP ID="X2" DATATYPE="nnl" RCSNM="prop2"
NODEREL="SUBNODE">
property value</NODEPROP><NODE ID="X3" CLASS="foo"><NODEPROP ID="X4"
DATATYPE="nnl" RCSNM="prop2" NODEREL="UREFNODE">
property value</NODEPROP><NODEPROP ID="X5" DATATYPE="nl" RCSNM="prop2"
NODEREL="IREFNODE">
property value</NODEPROP></NODE><NODE ID="X6" CLASS="bar"><NODEPROP ID="X7"
DATATYPE="string" RCSNM="prop2" NODEREL="ATOMIC">
property value</NODEPROP><NODEPROP ID="X8" DATATYPE="nnl" RCSNM="prop2"
NODEREL="SUBNODE">
property value</NODEPROP></NODE></NODE></GROVE>
 
The basic rules for canonical grove documents are that each start and end tag is on a line by itself and attribute values are always enclosed in literals. Subnode relationships are represented by direct containment. Iref and uref relationships are represented by ID references. Every element is assigned an ID using a fixed algorithm of numbering nodes sequentially in a depth-first, left-list traversal of the subnode graph of the grove. Uref relationships cannot be represented by working cross-document addresses because there is no way to consistently declare the other groves as documents within the grove document. Thus, urefs are simply numbered sequentially in the order they are encountered in the grove.
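The sequential numbering rule can be sketched in Python as a depth-first walk that visits each node before its subnodes and takes subnodes in list order. This is an assumption-laden toy: the dictionary-based node layout and the function `assign_ids` are hypothetical, and the sketch numbers only nodes, whereas the sample document above also assigns IDs to property elements.

```python
def assign_ids(root):
    """Pair each node name with an ID in depth-first, first-subnode-first order."""
    numbered = []
    stack = [root]
    while stack:
        node = stack.pop()
        numbered.append((f"X{len(numbered) + 1}", node["name"]))
        # Push subnodes in reverse so the first subnode is visited next.
        stack.extend(reversed(node["subnodes"]))
    return numbered


grove = {
    "name": "doc",
    "subnodes": [
        {"name": "a", "subnodes": [
            {"name": "a1", "subnodes": []},
            {"name": "a2", "subnodes": []},
        ]},
        {"name": "b", "subnodes": []},
    ],
}
ids = assign_ids(grove)
```

Because the traversal order is fixed, two processors that build the same grove would assign the same IDs, which is what makes a string comparison of the resulting documents meaningful.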
 
This HTML document created from the original SGML using Panorama style sheets. Subsequent modifications done with HoTMetaL PRO 3.0.
