How Five Industries Will Benefit from the Grove Paradigm

Steven R. Newcomb

TechnoTeacher, Inc.

3615 Tanner Lane, Richardson, Texas 75082-2618 USA
Phone: +1 972 231 4098 (at ISOGEN: +1 214 953 0004 x137) Fax: +1 972 994 0087 (at ISOGEN: +1 214 953 3152)

Biographical notice

Principal in TechnoTeacher Inc., a software developer specializing in ISO-standards-based tools for information management systems integrators and software applications developers, with licensees in telecommunications, computers, defense, education, energy, publishing, government, and aerospace. Co-editor of ISO/IEC 10743:1996 Standard Music Description Language (SMDL). Co-editor of ISO/IEC 10744:1992 (and 10744:1997) Hypermedia Time-based Structuring Language (HyTime). Design Team Member, US Navy Metafile for Interactive Documents. Founding Conference Chair, International HyTime Conference, 1994-1997. Conference Co-chair (with Carla Corkern) of the successor Metastructures conference, 1998-. Founding Chairman, Conventions for the Application of HyTime (CApH) activity of the Graphic Communications Association Research Institute, the original developer of the Topic Map paradigm, and co-editor of ISO/IEC 13250:1999, the Topic Maps information architecture.


Because of aircraft safety concerns, the process of creating and maintaining aircraft component documentation (assembly instructions, test instructions, component maintenance manuals, operating procedures, work instructions, shop specifications and shop standards) is exacting, and every step of every process must be highly auditable. Grove-based processing reduces the cost of creating and maintaining such documentation, while improving the reliability and auditability of each step. While the legal requirements governing the auditability of the maintenance of technical information used in ground transportation may be somewhat less stringent, the safety and indemnity concerns are just as compelling.

The process of creating and maintaining transportation equipment documentation involves materials emanating from engineering databases, which are managed as databases, while the documentation itself is normally managed as flat files. Many copies of data whose original source is a record in an engineering database appear in various technical publications. The process of copying a datum into every publication that cites it is a chore, and then each copy must be verified for accuracy by comparing it directly with the source in the engineering database. When the engineering database changes, rediscovering where all the copies are and updating them can be even more expensive, time-consuming and subject to human error. Quality, reliability, speed, and expense are all issues here.

Grove-based processing systems permit n-directional linking among nodes in groves that can represent all the different information resources from which aircraft component documentation is created, and on the basis of which it is maintained. Such n-ary links (links with a number of anchors greater than or equal to two) make it easy to determine, from the perspective of an engineering database, which parts of which publications will be affected by any change in any part of such a database. Similarly, from the perspective of the publications, the source(s) of every statement in a given publication can be easily determined. In addition, whenever any person involved in the process of creating and maintaining such a technical publication needs to express the need for a change, to propose a change, to assign someone the task of making the change, or to approve a change, a single annotating link can be created that refers to all of the affected components of all of the affected resources. Such n-ary links can be created by anyone, in documents other than the documents that are being annotated, so that annotations can be made without changing what is being annotated in any way; this allows even the least experienced repair technician or production line worker to communicate effectively with the editorial and engineering staff, to request clarifications, or simply to report errors. Finally, grove-based processing applications can easily be created such that, when the time comes to create a new, stabilized version of a publication, all the approved changes are applied and the new version is created automatically; this new version then becomes the stable source on the basis of which the next round of changes can be made.

Because accuracy and timeliness are required in order to avoid costly accidents and downtime, and because the addressing power provided by grove-based processing makes fine-grained annotation-based workflow systems possible, grove-based processing can make a big difference in the profitability of transportation equipment companies.
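
The impact analysis made possible by n-ary links can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not an implementation of any grove standard: the `NodeAddress` and `NaryLink` classes, the resource names, and the path strings are all hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Set


@dataclass(frozen=True)
class NodeAddress:
    """Hypothetical address of one node in one grove."""
    resource: str  # e.g. an engineering database or a manual
    path: str      # location of the node within that resource


@dataclass(frozen=True)
class NaryLink:
    """A link with two or more anchors, stored apart from what it links."""
    role: str
    anchors: FrozenSet[NodeAddress]

    def __post_init__(self) -> None:
        if len(self.anchors) < 2:
            raise ValueError("an n-ary link needs at least two anchors")


def affected_by(changed: NodeAddress, links: List[NaryLink]) -> Set[NodeAddress]:
    """Return every node that shares a link with the changed node."""
    hits: Set[NodeAddress] = set()
    for link in links:
        if changed in link.anchors:
            hits |= link.anchors - {changed}
    return hits


# Hypothetical data: one engineering value cited in two publications,
# plus an unrelated change request.
torque = NodeAddress("engineering-db", "part/4711/torque")
links = [
    NaryLink("derived-from", frozenset({
        torque,
        NodeAddress("maintenance-manual", "chapter/3/step/12"),
        NodeAddress("test-instructions", "section/2/table/1/row/4"),
    })),
    NaryLink("change-request", frozenset({
        NodeAddress("maintenance-manual", "chapter/5/step/2"),
        NodeAddress("shop-standards", "section/9"),
    })),
]

for address in sorted(affected_by(torque, links), key=lambda a: a.resource):
    print(address.resource, address.path)
```

Because a link has no privileged direction, the same `affected_by` call answers the reverse question as well: given a statement in a publication, it returns the database record (and any sibling publications) linked to it.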

The inefficient practice of inserting "change pages" in existing publications can be avoided altogether, since it actually costs very little to assemble and issue a complete new revision.

Grove-based processing imposes no lower or upper limit on the size of an information component that can be addressed. If it becomes necessary to annotate a single character -- even a punctuation mark -- that character can be addressed. Similarly, if a change is requested for any set of documents, there is no limit on the number of documents involved, or on the size of any of them. In practical terms, document databases in the hundreds of terabytes are feasible using current technology. One existing system has demonstrated a capacity for at least one million annotating hyperlinks.
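
As a rough illustration of addressing at the finest granularity, the following Python sketch annotates a single punctuation mark without touching the annotated text. The `CharAddress` class, the nested-list document model, and the document name are assumptions made for this example only.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class CharAddress:
    """Hypothetical address of a character span inside one text node."""
    document: str
    node_path: Tuple[int, ...]  # child indexes from the document root
    start: int                  # offset of the first character
    length: int = 1             # a single character by default


# A toy document tree: nested lists of strings stand in for grove nodes.
documents: Dict[str, list] = {
    "work-instruction-17": [
        ["Torque the bolt to 25 Nm,", "then recheck after 10 cycles."],
        ["Record the result."],
    ],
}


def resolve(address: CharAddress) -> str:
    """Follow the path to a text node and return the addressed characters."""
    node = documents[address.document]
    for index in address.node_path:
        node = node[index]
    return node[address.start:address.start + address.length]


# The annotation lives outside the document it annotates.
comma = CharAddress("work-instruction-17", (0, 0), 24)
annotation = {"target": comma, "note": "Should this comma be a semicolon?"}

print(repr(resolve(annotation["target"])))
```

The same address type covers larger spans by raising `length`, so one mechanism serves both a single character and a whole sentence.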

Several aircraft and aircraft component manufacturers are exploring the use of grove-based technology in their documentation processes. Some, like the Woodward Governor Company (Rockton, Illinois), are hoping that grove-based processing technology will be an invaluable competitive advantage for many of their operations involving workflow, including parts handling, remanufacturing, repair, testing, etc. The possibility of unifying a grove-based documentation process with the process of creating IETMs [interactive electronic technical manuals] is also being explored. Since the transportation industry as a whole is already committed to the SGML/XML (vendor-neutral generic markup) paradigm, and since the grove paradigm was internationally standardized in 1997 as the theoretical basis of SGML-based information interchange, it seems likely that grove-based processing systems will soon be commonplace in all of these industries.

Government Systems

The information handled by governmental authorities is characterized by records pertaining to very large numbers of cases, and sometimes by very large numbers of records per case. The records pertaining to any single case may be structured or unstructured. If structured, they may conform (or fail to conform) to any of a large number of schemas; the number of schemas is open-ended. Many original records arrive in hard-copy form only; these may emanate from anyone, anywhere, and they may include handmade sketches, handwritten notes, report forms, accounting records, photographs, X-rays, audio tapes, video tapes, transcripts, chemical and medical analyses, and innumerable other kinds of information. For example, in the case of a state agency whose purview is labor, employment, and workers' compensation, all of these kinds of records may be found in a single case, a single case may have thousands of pages of such records, and there may be tens of thousands of pending cases at any one time. The same kinds of challenges can be found in other governmental agencies, including not only executive arenas like taxation, healthcare, insurance regulation, law enforcement, public health, public safety, welfare, consumer affairs, and education, but also in support of legislative and judicial functions.

Grove-based information processing provides governmental authorities the opportunity to make significant improvements in caseworker productivity, and in the quality and consistency of casework. Assuming all records are digitized (by scanning, data entry, OCR, and/or other conversion), all records can be queried using structure-sensitive and structure-ignoring queries. More importantly, "bookmarks" can be created that can take any caseworker to any structural portion of any record, without changing the records. These bookmarks can be made to appear as hyperlinks in summary and/or ongoing reports created and maintained by caseworkers, so that caseworkers who are unfamiliar with a particular case can quickly become familiar with all of the currently-relevant material, regardless of its intrinsic structure or format.

Prior to grove-based information processing methodology, similar effects have been achieved by means of imaging technology. However, the process of reducing all records to images made most information less accessible and created a situation in which addressing could only be done in terms of the position of the addressed material on an image of a page. The structural context of the addressed material, and indeed the semantics conveyed by the original structure of the material are often lost in systems that use images as their lingua franca. For example, it's difficult to build an image-based system that can reliably distinguish between a column heading and a title; this makes it impossible to support searches limited to titles (or column headings). Examples of information that is not properly understood by imaging systems include the names of fields in forms, the schemas of databases, the metastructural hints provided by the procedural markup found in word-processing files, the objects found in drawings, etc. Moreover, total reliance on images, or even on preformatted text, makes full text searches unreliable. For example, not only is OCR of an image of a page of text inherently somewhat lossy, but also, in the case of two-column text, very often a search for a phrase will miss instances that should be hits, and recognize hits where there are none, by reading straight across the gutter between the columns. Finally, text imaging systems are typically architected around the assumption that everything is ultimately always found on some sort of page; they can't handle audio or video data. They can't even handle textual data when such data is too complex to be represented effectively on a page. In the end, imaging systems rely on the fact that the user's eye will make distinctions that machines miss. This reliance on human involvement in structure-sensitive searches makes such searches inherently very expensive and/or unreliable.
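
The two-column failure is easy to reproduce. The short Python sketch below uses invented sample text and compares a search over text read straight across the gutter with a search over text read column by column.

```python
# Two columns of text as they appear on a page image, line by line.
# The sample sentences are invented for this illustration.
left_column = ["The claimant reported", "a back injury on", "the loading dock."]
right_column = ["The employer disputes", "the date of the", "incident in question."]

# Reading straight across the gutter: each page line joins both columns.
page_lines = [f"{left} {right}" for left, right in zip(left_column, right_column)]
across_gutter = " ".join(page_lines)

# Reading with the column structure preserved: one column, then the next.
by_column = " ".join(left_column) + " " + " ".join(right_column)


def found(phrase: str, text: str) -> bool:
    """Plain substring search, standing in for a full-text query."""
    return phrase in text


real_phrase = "back injury on the loading dock"  # really occurs in the left column
false_phrase = "injury on the date"              # exists only across the gutter

print("across gutter:", found(real_phrase, across_gutter), found(false_phrase, across_gutter))
print("by column:    ", found(real_phrase, by_column), found(false_phrase, by_column))
```

Read across the gutter, the genuine phrase is missed and the spurious one is reported as a hit; with the column structure retained, both results are correct.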

With grove-based information processing, all of the advantages of imaging systems (ease of access by caseworkers, vastly reduced physical space requirements, uniformity of addressability of information components) are retained and even strengthened. However, unlike imaging systems, grove-based information processing systems don't lose track of the original structure of incoming data; that original structuring information can be of enormous value in searching and in unambiguous addressing. Moreover, if we consider the information maintained by a governmental agency to be an asset (an asset which includes the bookmarks, summaries, and other reports created by government workers), the value of that asset is not compromised by reliance on any one information notation, presentation paradigm, or proprietary bookmarking methodology. Each information resource can remain in its original form, and none of the information it originally represented need ever be lost or suppressed. The value added by government workers -- the bookmarks, annotations, summaries, and reports -- can also be fully retained, even across technology changes, because the addressing information of every component of every resource is expressed according to international standards.


The healthcare industry is enormous and growing rapidly, and yet healthcare providers remain highly differentiated. For example, research hospitals use operational procedures (and information architectures that support those procedures) that are quite different from those used in community hospitals. Even within the US, clinics come in many varieties and keep records that are consistent with their missions, but inconsistent with the record-keeping practices of other clinics. Health maintenance organizations and health insurers in general are imposing increasingly uniform record-keeping requirements on almost all healthcare providers, but the special record-keeping requirements of special-purpose clinics, and of hospitals with teaching and research missions, are at odds with the insurers' attempts to impose such uniformity. The result is often a welter of forms on which healthcare personnel must enter information redundantly, and even in such a way as to appear inconsistent with one another, in order to meet all of the conflicting requirements. This increases the cost of healthcare, and it corrodes the morale of healthcare professionals. Instead of seeing and helping patients, they are becoming accountants and insurance functionaries. Instead of discussing medicine with their colleagues, they spend much of their professional communication time discussing the minutiae of funding models, the economic pitfalls of various diagnoses (for both patients and themselves), etc.

In our almost totally globalized economy, at any moment a significant fraction of the world's population is beyond the reach of its own local healthcare system, and yet the local requirements regarding healthcare record-keeping lack uniformity from country to country. This situation seems quite inexcusable, given that all human beings are members of the same species, that knowledge of human medicine is available everywhere, and that even the latest healthcare technology is highly patent-driven and only rarely secret. It is entirely reasonable to think, given modern internet technology, that if a US citizen becomes ill in South Africa, Peru or Singapore, the patient's health records should be available to local doctors in a matter of seconds, and that they should be immediately interpretable upon arrival. It will be easy to accomplish this feat with grove-based processing and inheritable information architectures.

The Kona architecture being developed in the context of the HL7 initiative is an "SGML inheritable architecture" designed to permit global interchange of medical records, while continuing to support multiple levels of local control. In this paradigm, the most local architecture -- a DTD used for patient records in the context of a given clinic, for example -- inherits its characteristics from a variety of other architectures, including the architecture dictated by the County Board of Health, the Insurance Commissioner, the relevant insurance companies, and any other involved parties: regulatory authorities, business partners, accrediting agencies, etc. Since the clinic is in control of the DTD, and as long as none of the constraints imposed by the information architectures inherited by the DTD are violated, the clinic is free to add features to its records, and to manage the way in which the records are made and maintained, in any way it likes. For every inherited architecture, at processing time there can be a distinct grove. (The "SP" parser now used throughout the SGML industry has supported the creation of such distinct groves for well over a year now.) This means that every authority can automatically read the clinic's patient records just as if they were kept using exactly the DTD required by that particular authority. The Kona architecture is an attempt to create a world standard "basic" architecture for all patient records, so that the dream of total portability of patient records can be realized, without compromising in any way the autonomy of local authorities, healthcare businesses and medical institutions.
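
The idea of one distinct grove per inherited architecture can be illustrated with a deliberately simplified Python sketch. It does not reproduce the behavior of the SP parser or the content of the Kona architecture; the `Element` class, the `forms` mapping, and all element and architecture names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Element:
    """One element of the clinic's own record structure."""
    name: str
    text: str = ""
    # Hypothetical mapping: architecture name -> element name in that architecture.
    forms: Dict[str, str] = field(default_factory=dict)
    children: List["Element"] = field(default_factory=list)


def architectural_view(element: Element, architecture: str) -> List[Element]:
    """Derive the tree that one inherited architecture would see.

    Elements that declare a form for the architecture are renamed to it;
    elements that do not are transparent, and their mapped descendants
    are promoted to the nearest mapped ancestor.
    """
    derived_children: List[Element] = []
    for child in element.children:
        derived_children.extend(architectural_view(child, architecture))
    form = element.forms.get(architecture)
    if form is None:
        return derived_children
    return [Element(name=form, text=element.text, children=derived_children)]


# A clinic record with local additions that no authority requires.
record = Element("visit", forms={"insurer": "claim", "county": "encounter"}, children=[
    Element("pt-name", "J. Doe", forms={"insurer": "insured", "county": "patient"}),
    Element("research-notes", children=[
        Element("dx", "influenza",
                forms={"insurer": "diagnosis-code", "county": "diagnosis"}),
    ]),
    Element("billing", "120.00", forms={"insurer": "amount"}),
])

for architecture in ("insurer", "county"):
    (root,) = architectural_view(record, architecture)
    print(architecture, root.name, [child.name for child in root.children])
```

Each authority receives a tree expressed entirely in its own element names, while the clinic's local `research-notes` wrapper remains in the source record and is simply invisible to architectures that do not know it.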

Grove processing also brings a new level of annotative power to medical records. In a world where digitized medical imaging and other diagnostic output (EEG, ECG, etc.) is commonplace, there is a need for a single standard way to address, from within medical commentary, particular parts of such multinotational data, in terms of whatever are the most convenient identifying characteristics of components of such data, regardless of its origin or notation. Because each such notation can have a "property set" that gives standard names to the characteristics and phenomena represented by the information resources that use the notation, every instance of every such characteristic or phenomenon can be unambiguously addressed. Since it can be addressed, it can be a target of a hyperlink, and the meaning of the hyperlink can be anything at all, including an annotation of any kind, for any purpose, in support of any application. The hyperlink can be anywhere, including inside a doctor's textual remarks about a case. It seems quite likely that, ultimately, medical records will consist primarily of XML documents that will routinely contain XLinks that employ addressing statements based on the properties of the notations in which a patient's diagnostic output data are recorded. The basis of all this functionality has been demonstrated using today's technology; all that remains is to deploy it widely.
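
A minimal Python sketch of property-based addressing follows. The property names, the ECG sample data, and the `make_node` and `select` helpers are assumptions chosen for illustration; a real property set would be defined for the notation in question.

```python
from typing import Any, Dict, List

# Hypothetical property set for an ECG notation: the only names a node may use.
ECG_PROPERTIES = {"lead", "start_ms", "duration_ms", "feature"}

Node = Dict[str, Any]


def make_node(**properties: Any) -> Node:
    """Build a node, rejecting property names the property set does not define."""
    unknown = set(properties) - ECG_PROPERTIES
    if unknown:
        raise ValueError(f"not in the property set: {sorted(unknown)}")
    return dict(properties)


def select(nodes: List[Node], **criteria: Any) -> List[Node]:
    """Address nodes by property values rather than by byte position."""
    return [
        node for node in nodes
        if all(node.get(name) == value for name, value in criteria.items())
    ]


# Invented sample recording, already exposed as nodes with named properties.
recording = [
    make_node(lead="II", start_ms=0, duration_ms=90, feature="QRS"),
    make_node(lead="II", start_ms=410, duration_ms=160, feature="T"),
    make_node(lead="V1", start_ms=0, duration_ms=95, feature="QRS"),
]

# A remark in the physician's notes points at part of the recording
# by naming properties; the recording itself is left untouched.
remark = {
    "text": "T wave on lead II looks flattened.",
    "target": {"lead": "II", "feature": "T"},
}

for node in select(recording, **remark["target"]):
    print(node["lead"], node["feature"], node["start_ms"])
```

Because the address is expressed in the standard names of the property set rather than in the storage format, the remark stays valid even if the recording is later re-encoded in another notation that exposes the same properties.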


IETMs are rapidly becoming commonplace, replacing paper-based manuals for automobiles, consumer electronics, office machinery, etc. IETMs offer significant cost savings and quality improvements over paper-based manuals, due to improved timeliness, accuracy, portability, and lower cost of duplication and distribution. IETMs also offer the potential for diagnostic information arriving at the IETM [interactive electronic technical manual] delivery system (usually a notebook computer), directly from the system being repaired, to influence the presentation of information and instructions to the person who is performing the repair and/or maintenance. The military exploits this kind of enhanced presentation capability to reduce training costs and to lower the competency requirement for the technicians who keep complex weapons systems in battle-ready condition. Commercial enterprises use the enhanced presentation capabilities of IETMs to improve the productivity of technicians, and to reduce the downtime of maintained equipment. Significant cost savings routinely result from using IETMs to improve the availability of replacement parts, to minimize the paperwork involved in delivering parts where they are needed, and to minimize the likelihood that the parts, when delivered, will turn out to be the wrong parts.

Each IETM must be created by assembling only the relevant product documentation, engineering information, logistics information, and information generated by a wide variety of persons. Some of the persons who generate relevant information may be technical writers, some may be customers, some may be end users, and some may be field technicians. Some of this information may become part of the IETM's presented information, and some of it may only influence the editorial process by which the IETM is created. In any case, a wide variety of information must be taken into account, and everything that appears in the deliverable IETM either is inserted verbatim from some part of some supporting document or is somehow derived from some such supporting document component. When any supporting document changes, it is vital that the next versions of any IETMs that depend on that supporting document accurately reflect the change.

Grove-based information processing facilitates the creation and maintenance of IETMs in many ways, including:


The semiconductor industry is characterized by extreme competition, low margins, and rapid change. In the semiconductor industry, yesterday's high-margin leading-edge product is today's low-margin commodity item. Grove-based processing creates a single coherent universe in which semiconductor buyers can reliably compare competing semiconductor datasheets as "apples to apples"; they can make better and faster design and buying decisions. Grove-based processing increases return on investment by allowing semiconductor customers to understand and apply innovations more rapidly, efficiently, and accurately.

Grove-based processing also increases the reusability of technical information that is associated with any given reusable functional module. Since a given semiconductor functional module consists of a heterogeneous set of intellectual properties (microlithography, specifications, timing diagrams, test results, etc.), it is only natural and efficient to seek ways to manage and maintain these materials as a unit, and yet to allow selections from their contents to appear by reference wherever they may be needed. The grove paradigm supports this, because it allows any information component of any information asset, in any notation, to be addressed in any convenient terms and re-used in any application-defined fashion.

The Silicon Integration Initiative (Si2), an international consortium including almost all of the major semiconductor manufacturers, has developed information architectures for publishing technical data about their products. One architecture is for semiconductor "datasheets" -- technical specification summaries for particular semiconductor products. Another architecture is for definitions of the terms used in the datasheets. Many kinds of relationships must be modeled in semiconductor data, and these relationships may or may not be explicit in the text of a given product-specific datasheet. A given information object may participate in one or more relationships by virtue of hyperlinks that may not be in the same document. Grove-based processing of semiconductor data makes it possible for software that performs application-neutral but Si2-specific semantic processing to be developed once, and then used by all members (and their customers) in their various applications. In this way, all members can have complete assurance that the specialized semantics of Si2 documents will be available, convenient to use, and uniformly interpreted by other consortium members, and by all customers of all consortium members. Furthermore, grove-based processing establishes a firm foundation on which very sophisticated applications can be built by anyone, with guaranteed conformance to Si2-defined semantics. The risk of developing software is thus radically reduced, and the overall productivity of semiconductor-related industries is enhanced. The demand for semiconductors is increased, and the cost of shifting to grove-based systems is repaid many times over.