
 

XML for legislation drafting, management and Web delivery—How structured document representation facilitates automatic processing

Arnold-Moore, Timothy
 Australia 
Melbourne
RMIT Multimedia Database Systems
 
Timothy  Arnold-Moore
EnAct Technical Director,  RMIT Multimedia Database Systems 
 3 GPO Box 2476V
Melbourne  (Australia) 3001 
Email: tja@mds.rmit.edu.au

Biographical notice

Dr Arnold-Moore obtained his B.Sc. (Hons) in Computer Science in 1991 and LL.B. in 1993 from the University of Melbourne. He recently completed his Ph.D. in Computer Science at Royal Melbourne Institute of Technology with a dissertation entitled "Information Systems for Legislation". He has been a member of Multimedia Database Systems, developers of the Structured Information Manager (a structured document repository, web server, and development environment), since 1993.

He has a number of publications and presentations at international conferences on subjects including data models and query languages for collections of structured documents, document database issues including indexing and versioning, workflow management, automating the processing of legislation, and defamation and intellectual property law. He is also co-author of "Document Computing: Technologies for Managing Electronic Document Collections", soon to be released.

As the technical director of EnAct, an SGML/XML legislation drafting and management system developed for the Tasmanian government, Timothy has successfully incorporated much of his research into a commercial application. He has also been engaged as a technical consultant on SGML, XML, and legislation management in other government projects.

 

The problem

  Legislation is dynamic. It covers nearly every aspect of our lives and the world around us. Despite the fact that it often takes a long time to achieve any particular change in the law, most legislation is constantly subject to change. In most jurisdictions, these changes, usually called amendments, are expressed in terms of simple changes to the text of the Act (e.g. substitute this phrase for another, omit this section, replace this paragraph with these three paragraphs, etc). Amendments may appear attached to an Act with new law (a substantive Act) or they may appear in separate amending Acts.
  One particularly contentious piece of legislation in Tasmania is the Racing and Gaming Act 1952. Since 1978, this Act has been amended by more than 50 other Acts with over 400 separate amendments made.
 Paper is static. Once it is printed, a page cannot be changed. The authorized government printer together with the Office of Parliamentary Counsel (the office responsible for providing drafting services and legal advice to the Parliament) periodically consolidate these amendments by applying them in turn to the original substantive Act (or the last consolidation) and then printing a copy of the Act as amended up until a particular date. These consolidations are then printed and released for sale by the government printer. But the consolidations typically happened every five to ten years, and in Tasmania, where the OPC had fallen behind, the Racing and Gaming Act 1952 had not been consolidated since 1978. In between consolidations, lawyers and other users of legislation had to manually apply the amendments themselves. When a person went to the government printer to buy a copy of the Act, they would get the most recent consolidation and a copy of all those Acts that amended it, and a notice disclaiming liability in case the government printer had missed one of the amending Acts.
 Private publishers in some jurisdictions produce loose-leaf services. Legislation (and often case notes and commentary) is shipped in a ring-binder and periodic updates are sent to replace entire pages or series of pages in the ring-binder. In this way the static nature of paper can be partially overcome. But versions which are valid only for a short time between two updates never appear, and only the current consolidation (or more correctly the consolidation valid at the time of the last update) is presented.
 These loose-leaf services were reflected in new electronic products: on-line services and later CD-ROM. But because Tasmania was such a small market, publishers didn't see that it was economically viable to cover the costs of converting and maintaining Tasmania's legislation, so Tasmania was stuck with the old government printer consolidation solution. There was no loose-leaf, on-line, or CD-ROM service. In other jurisdictions, on-line services were prohibitively expensive. With 6 states, 2 territories, and a federal government all producing legislation, a total population smaller than California but an area comparable with mainland US, communications and service costs prevented widespread adoption by all but the largest commercial firms. Legislation was maintained by hand-written pen marks and paper pasted onto the latest consolidation. CD-ROM delivery brought the costs down so that they were accessible to most lawyers and the conversion and maintenance costs could be spread over a larger client base. Finally the Internet, which brought the costs of telecommunications to a reasonable level, made on-line services feasible again. However, CD-ROM has the same problem as paper. It is static. On-line services, despite the potential for a more dynamic collection, until now have maintained this paper and CD-ROM paradigm by showing only the latest consolidation, albeit with more frequent updates.
 Citizens, in a society where ignorance of the law is no excuse, at the very least have a right to know what the current text of the law is. In a democracy, where they have a right to direct how the law should change, this is even more important. Governments have an obligation to provide it to them. Legislators need an accurate version of the legislation in order to assess and vote on the amendments presented to them by other legislators. But beyond the current state of the law, lawyers are interested in what the law was when their client's problem arose. They also need to plan the future affairs of their clients, to the extent that the future state of the law is known. What is needed is a dynamic collection of the legislation, a 'point-in-time' database, where users can search and browse the whole body of legislation as it was at an arbitrary snapshot in time.
 

The Solution

 EnAct provides such a collection. This is made possible largely by using structured document technology, both SGML and XML, to automate much of the process of producing legislation and consolidations. EnAct is a total legislative drafting and management solution, providing facilities to manage the whole legislative process from a drafting request through to final publication of legislation in sessional and consolidated form.
 A workflow client controls and tracks the flow of draft legislation through the legislative process (each process definition and process instance is an XML document). It integrates a customized editing environment, built on Microsoft Word, to capture the structure of the new legislation (producing SGML documents). Amendment legislation is drafted by checking out the relevant consolidation from the repository, marking changes onto the consolidations, and generating a system specific representation of the changes made (a Change Description Document or CDD which is an XML document). This CDD is then used to automatically generate the text of draft amendment legislation. If the draft legislation proceeds through all of the stages to become law, the CDD is also applied to the legislation database to update the stored consolidations. From this database, paper, RTF, HTML, and CD-ROM publications can be created and delivered via a variety of means.
 

Creating amendment wording

 Traditionally the process of amending legislation focussed on the wording of the amendments and not the resulting consolidation. However, the amendment wording is merely a means to an end. The end is the wording of the altered legislation. The EnAct amendment process re-focuses the drafter on the end. Amendment wording is not really drafted at all. The resulting consolidation is drafted, and the amendment wording is automatically produced from the draft. A draft of an amending Act can be produced without any further input from the drafter.
 However, there are a number of alternative ways of arranging and grouping amendments that result in different wording. EnAct provides drafters with a tool that lets them rearrange the amendments and assign different commencement types to them (individual provisions of legislation can commence at different times and by different means: by Royal Assent, by proclamation, when another piece of legislation commences, etc). Drafters can even replace the wording of an amendment completely by manually drafting equivalent text, although this is used primarily as an escape hatch for unanticipated situations, and any such cases are generally incorporated into the automatic generation module where possible.
 

EnAct document repository

 EnAct uses the Structured Information Manager (SIM) Database Server to store historical and sessional versions of all legislation, as well as intermediate documents used to construct the historical versions (draft legislation in SGML and Word format, associated cover letters, CDD's), and the workflow process definitions and instances. The SIM database server incorporates an XML and SGML parser into the database kernel, so SIM understands structured documents in ways that other document repositories do not. The SIM database server communicates via the Z39.50 search and retrieval protocol used widely in the digital library community.
 The historical legislation collection is fragmented at section level as all legislation has a number of sections (or equivalents) and this provides a useful granularity for browsing, for retrieval, and for version granularity. Each fragment has a time stamp—a start and end time—showing the period over which the fragment is valid. Each table of contents is also time stamped. New versions of the table of contents are only required when the text of the table of contents changes. Browsing displays a table of contents on one side and a fragment on the other to give the fragment a context. In addition to the full text and fielded searching capabilities, the searching interface provides the ability to enter a date (which defaults to the date of searching) and any subsequent browsing or searching is over the legislation as it was on that date.
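The point-in-time selection described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the Fragment class, its field names, the half-open validity interval, and the sample dates are all invented here and are not taken from the SIM repository.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class Fragment:
    """One section-level fragment with its validity period (hypothetical model)."""
    section: str
    start: date            # first day this version is in force
    end: Optional[date]    # first day it is no longer in force; None = still current
    text: str


def valid_on(fragment: Fragment, when: date) -> bool:
    # Assumption: validity is a half-open interval [start, end).
    return fragment.start <= when and (fragment.end is None or when < fragment.end)


def snapshot(fragments: Iterable[Fragment], when: date) -> Dict[str, Fragment]:
    """Return the fragment of each section that was valid on the given date."""
    return {f.section: f for f in fragments if valid_on(f, when)}


# Invented sample data: section 5 was replaced on 1 July 1996.
FRAGMENTS = [
    Fragment("s5", date(1978, 1, 1), date(1996, 7, 1), "Original section 5."),
    Fragment("s5", date(1996, 7, 1), None, "Section 5 as amended."),
    Fragment("s6", date(1978, 1, 1), None, "Section 6."),
]

view_1990 = snapshot(FRAGMENTS, date(1990, 3, 15))
view_today = snapshot(FRAGMENTS, date.today())  # mirrors the default search date
```

With this sample data, view_1990 holds the original section 5, while view_today holds the amended one. Section 6 appears unchanged in both.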
 

The workflow client

 The generic workflow client allows authorized users to create an instance of any number of processes. Each different process is defined in an XML process definition document. This document describes the tasks that make up that process, the documents that need to be created and associated with that process, and the mechanisms by which those tasks can be achieved, so that all of those tasks can be initiated from within the workflow client, and the results collected and managed by the workflow client. Any given instance of a process is also stored as an XML document. Each instance, together with the documents attached to it, is stored in the document repository as just another document.
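To make the idea of an XML process definition concrete, the following Python sketch reads a small definition and works out which tasks can be started. The element and attribute names (process, task, requires, produces, after) are hypothetical. The paper does not give the actual EnAct process definition format.

```python
import xml.etree.ElementTree as ET

# Hypothetical process definition; the real EnAct schema is not shown in the paper.
PROCESS_DEFINITION = """\
<process name="amending-act">
  <task id="draft" application="word">
    <produces document="draft-bill"/>
  </task>
  <task id="generate-wording" application="ace" after="draft">
    <requires document="draft-bill"/>
    <produces document="amendment-wording"/>
  </task>
</process>
"""


def load_tasks(xml_text):
    """Parse the definition into a list of plain task dictionaries."""
    root = ET.fromstring(xml_text)
    tasks = []
    for task in root.findall("task"):
        tasks.append({
            "id": task.get("id"),
            "application": task.get("application"),
            "after": task.get("after"),  # None when the task has no predecessor
            "requires": [r.get("document") for r in task.findall("requires")],
            "produces": [p.get("document") for p in task.findall("produces")],
        })
    return tasks


def ready_tasks(tasks, completed):
    """Tasks not yet completed whose predecessor (if any) has been completed."""
    return [
        t["id"]
        for t in tasks
        if t["id"] not in completed
        and (t["after"] is None or t["after"] in completed)
    ]


tasks = load_tasks(PROCESS_DEFINITION)
```

Calling ready_tasks(tasks, set()) offers only the drafting task. Once "draft" is in the completed set, the wording generation task becomes available.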
 The workflow client displays the current status of any instance, the tasks to be performed at that stage, and any documents attached to the process instance. Users may select a task, which activates any applications required to perform the task and passes them documents and other information collected throughout the legislative process that are needed to complete the task. The workflow client communicates with the document repository, inserting and modifying process instance documents and attachments via the information retrieval client.
 

Information retrieval client

 The information retrieval client resides on the client machine and provides communication between all client applications and SIM, the document repository. A search and viewing interface is also included which addresses the particular needs of drafters. This allows them to select legislation to be checked-out of the repository for amendment and also assists in the management of cross-references between legislation.
 

The editing environments

  Editing in EnAct is done using custom templates in Microsoft Word. Custom macros and toolbars provide an easy interface to capture the complex structure of draft legislation using Word styles and bookmarks, and to communicate with the workflow and information retrieval clients.
  For amendment, consolidations are imported into Word by collecting the fragments and table of contents that are valid at the specified time point, merging them into a single SGML document, and converting that document into RTF.
 

The translation module

 SIM comes with an Application Construction Environment (Ace), which is a language and tool-set for manipulating documents, particularly XML and SGML documents. It incorporates James Clark's SP parser, allowing the full structure of an XML or SGML document to be accessed. It also includes mechanisms for converting from RTF to SGML or XML, either a format oriented SGML like that in the Rainbow DTD or, via configuration files and style-sheets, any arbitrary DTD. Error messages are provided in terms of the Word styles so that users need never know that XML and SGML are underneath the system. But users are given enough information to make sure that they produce conformant documents. Ace is used to convert draft legislation in an RTF document from Word into an SGML version, or a marked consolidation in an RTF document from Word into an XML CDD. This tool is also used for constructing style-sheets to convert SGML and XML documents into their output formats: RTF, PostScript, and HTML.
  Ace is also incorporated into the SIM database server and allows complete flexibility in specifying how documents are to be indexed for searching, and how they are fragmented or otherwise modified for storage and export.
 

Amendment wording generator

  But Ace is not simply a style-sheet system. Ace is a fully fledged programming language and can be used to construct sophisticated automatic processing of documents. The automatic amendment wording generator is written in Ace. This module takes an XML CDD and an SGML draft document and attaches amendment wording corresponding to the changes in the CDD to that draft legislation to produce a new draft.
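The actual generator is written in Ace and its CDD format is not reproduced in this paper, so the following Python sketch only illustrates the general idea: each structural change maps to one wording template from a small, fixed set. The CDD element names, the Act name, and the template sentences are all invented for the illustration and are not Tasmanian drafting style.

```python
import xml.etree.ElementTree as ET

# Hypothetical CDD: three structural changes to an invented Act.
CDD = """\
<cdd act="Example Act 1990">
  <change type="substitute" target="Section 5">
    <old>ten days</old>
    <new>fourteen days</new>
  </change>
  <change type="omit" target="Section 7"/>
  <change type="insert-after" target="Section 9">
    <new>9A. A hypothetical new section.</new>
  </change>
</cdd>
"""

# One wording template per change type (invented wording).
TEMPLATES = {
    "substitute": '{target} of the {act} is amended by omitting "{old}" '
                  'and substituting "{new}".',
    "omit": "{target} of the {act} is repealed.",
    "insert-after": 'After {target} of the {act}, the following is inserted: "{new}"',
}


def amendment_wording(cdd_xml):
    """Return one amendment clause per change recorded in the CDD."""
    root = ET.fromstring(cdd_xml)
    act = root.get("act")
    clauses = []
    for change in root.findall("change"):
        template = TEMPLATES[change.get("type")]  # unknown types raise KeyError
        clauses.append(template.format(
            target=change.get("target"),
            act=act,
            old=change.findtext("old", default=""),
            new=change.findtext("new", default=""),
        ))
    return clauses


clauses = amendment_wording(CDD)
```

Because every clause comes from a fixed template table, two drafters describing the same change get the same wording. An unsupported change type fails loudly instead of producing ad hoc text.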
 

CDD Manager

  As mentioned above, the users of EnAct can customize the generated amendment wording to a certain extent, by rearranging amendments, grouping them differently, commencing different amendments at different times and in different ways, or by replacing the generated wording with manually entered wording. The CDD Manager provides a tool for manipulating CDD's in XML. This is done by presenting a tree structure view (like File Manager or Explorer) of the CDD, and providing various operations to move amendments and grouping constructs around. These modified CDD's can then be associated with the process instance by the workflow client.
 

Version generator

  Another module written in Ace is the version generator. This takes original consolidations of legislation, and one or more CDD's, and applies them to produce a set of time stamped fragments and table of contents reflecting the history of the amendments from the consolidation to the time of the last CDD. This module is used to populate the document repository with historical and sessional fragments.
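A minimal sketch of this step, in Python instead of Ace, might look as follows. It assumes a hypothetical flat model in which each change names one section, a commencement date, and the replacement text. Table of contents versions and insertion of new sections between existing ones are deliberately left out.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Fragment:
    section: str
    start: date
    end: Optional[date]    # None = currently in force
    text: str


@dataclass(frozen=True)
class Change:
    """One commenced amendment (hypothetical, much simpler than a real CDD)."""
    section: str
    commences: date
    new_text: Optional[str]   # None means the section is omitted


def apply_changes(fragments: Iterable[Fragment],
                  changes: Iterable[Change]) -> List[Fragment]:
    """Close the current version of each amended section and add its successor."""
    result = list(fragments)
    for change in sorted(changes, key=lambda c: c.commences):
        for i, frag in enumerate(result):
            if frag.section == change.section and frag.end is None:
                # The old text stops being valid when the amendment commences.
                result[i] = replace(frag, end=change.commences)
        if change.new_text is not None:
            result.append(
                Fragment(change.section, change.commences, None, change.new_text)
            )
    return result


base = [Fragment("s5", date(1978, 1, 1), None, "Original section 5.")]
history = apply_changes(
    base, [Change("s5", date(1996, 7, 1), "Section 5 as amended.")]
)
```

The result keeps both versions of section 5. Earlier versions are closed off and never overwritten, which is what allows the repository to answer queries for any past date.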
 

Web delivery

  SIM also includes a Web server. This provides a multi-threaded HTTP server which maps HTTP requests onto Z39.50 requests, and turns the SGML and XML fragments returned into HTML for delivery to the requester's browser. SIM was designed for large collections and a large user load. Many web servers are hampered by a chain of processes required to communicate via HTTP to a database or document repository. CGI-bin scripts initiate command line processes, or activate database clients, which then communicate with the database server process. The typical SIM Web architecture involves two processes (which may be on different machines): a Web server process and a database server process. These communicate directly with each other. This makes SIM solutions much more scalable in volume and user load. It means that precious server power can be utilized to provide more functionality, better presentation, and more dynamic collections. Readers can view the EnAct public access web site for Tasmanian legislation at http://www.thelaw.tas.gov.au/.
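The last step, turning a returned fragment into HTML, is done in SIM by Ace style-sheets. As a rough illustration only, the Python sketch below maps structural elements to HTML tags. The element names (section, heading, para) and the tag mapping are assumptions, not the EnAct DTD.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical fragment as it might be returned from the repository.
FRAGMENT = (
    '<section id="s5"><heading>Licences</heading>'
    "<para>Fees &amp; charges apply.</para></section>"
)

# Assumed mapping from structural elements to presentation tags.
TAG_MAP = {"section": "div", "heading": "h2", "para": "p"}


def to_html(elem):
    """Recursively render an element; unmapped elements fall back to <span>."""
    tag = TAG_MAP.get(elem.tag, "span")
    parts = [f"<{tag}>", escape(elem.text or "")]
    for child in elem:
        parts.append(to_html(child))
        parts.append(escape(child.tail or ""))  # text following the child element
    parts.append(f"</{tag}>")
    return "".join(parts)


page_body = to_html(ET.fromstring(FRAGMENT))
```

Because the mapping lives in one table, the same stored fragment could be rendered differently for another output format without touching the repository content.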
 

Benefits to Tasmania from EnAct

  By choosing to adopt a structured markup approach to legislation and adopting the EnAct legislation drafting and management system, Tasmania has achieved a number of key benefits.
 

Authorized Electronic Repository

  Tasmania is possibly the first jurisdiction in the world to make the electronic repository the authorized source of legislation. Rather than vellum stored in a vault in the Supreme Court, the electronic documents have legal status. This means that the people of Tasmania, and any other users of Tasmanian legislation, have access via the Internet to authorized legislation. Previously this was not possible even for lawyers, and only in extreme cases for judges.
  Equality of access to legislation has been achieved. Those with Internet connections can view legislation from home. Those without can view it at a public library or visit the government printer and request a paper consolidation of the legislation at an arbitrary time point. Those with disabilities now have much better access to legislation: those with mobility problems can view legislation from home, and those with visual impairment can use Braille readers or voice synthesis. The people of Tasmania have access to the same version viewed by the Parliament, the judiciary, and the legal community.
 

Improved Legislative Processes

  Both parliament and the legal profession have received quite tangible benefits from EnAct. Parliament has access to more effective tools to enable it to do its job. They have access to the current consolidation or whatever consolidation is being modified by draft amending legislation. They can even be provided with a copy of that consolidation with the changes marked on it. This means that parliamentarians are better informed on the legislation before them.
  Legislators are now more accountable. The people have access to much the same tools as the legislators. They can find exactly what is in the legislation, and can lobby more effectively for changes to be made.
  Legislation, particularly amending legislation, is much more uniform now. The structure of legislation is checked automatically before it is presented to parliament. Amendment wording is almost purely automatically generated, so variations in wording describing exactly the same amendments have been limited to a controlled and known set of primitives.
 

Lower Cost

  The burden of maintaining legislation collections has been lifted from lawyers. No longer do they have to pay administrative staff to paste in changes to their paper consolidations. No longer do they have to maintain a vast library of legislation. Rather than every individual law office and government department maintaining their own set of paper consolidations, this process is centralized. The work is done once, and then mostly automatically.
  The result should be lower costs for legal research, lower costs for delivery of government services, and lower costs to businesses and individuals who have to comply with the law. This means either an increased level of service for the same money, higher profits, or a cheaper service to consumers. In each case the benefits of EnAct for Tasmania are many and varied.
 

XML and SGML Lessons

  There are a number of lessons that can be learned from the EnAct system about XML and SGML. The main lessons are that XML has significant advantages over SGML if you need to write your own parser, and that capturing the logical structure of documents whether it be using XML or SGML or a combination of both, makes possible automatic processing that was previously only a dream.
 

XML has Advantages over SGML

  In order to construct the CDD Manager and Workflow clients, both of which were custom applications in Windows NT, a solution was needed for parsing structured documents, preferably without the need for a DTD. Work on the EnAct system was begun in 1995, before XML, and before James Clark's SP engine. So SIM Ace did not understand XML, and did not support DDE conversations with Microsoft operating systems. Writing a full SGML parser is a daunting exercise, and, since this was a tightly controlled application, a subset of SGML was chosen to minimize the burden of writing a parser for these two applications.
  The restrictions chosen by the EnAct development team were remarkably similar to those chosen by the XML committee. Restrictions included case sensitive generic identifiers (tag names) and attribute names, attribute values limited to literals (in fact always delimited by " or '), and no tag omission. This meant that there could be no empty tags (as empty tags in SGML required that end tags be omitted). However, the exclusion of empty tags proved a little too restrictive, as CDD's included, amongst other things, whole elements to be inserted into the existing consolidations. These elements could contain empty tags and, rather than compromise the integrity of the collection, we decided to solve this problem by providing the parser with a list of all of the empty tags.
  The XML solution, a special empty tag delimiter, is much more elegant, although at the time it was not valid SGML. Now that SGML tools have been extended to include the XML empty tag marker, we can make the CDD's and workflow process instances conformant documents and remove the need to pass any information to our simple parser. Of course, today we would implement the whole environment using SIM Ace rather than write our own parser, but that option was not available at the time.
  The bottom line was that writing a completely conformant SGML parser was just too hard and, for the task, complete overkill. Writing an XML parser is much simpler, and since most of the advantages of SGML are available using XML (and many more besides), using XML is often a preferable solution where custom tools are needed.
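  To show how little machinery such a restricted subset needs, here is a Python sketch of a parser in the spirit described above: case sensitive names, quoted attribute values, no tag omission, and empty tags supplied to the parser as a list. It is not the EnAct parser, which is not reproduced in this paper. The node representation and the sample markup are invented, and comments, declarations, and entity references are ignored.

```python
import re

# A start tag, an end tag, or a run of character data.
TOKEN = re.compile(
    r'''<(/?)([A-Za-z][\w.-]*)((?:\s+[\w.-]+\s*=\s*(?:"[^"]*"|'[^']*'))*)\s*>|([^<]+)'''
)
ATTR = re.compile(r'''([\w.-]+)\s*=\s*(?:"([^"]*)"|'([^']*)')''')


def parse(text, empty_tags=frozenset()):
    """Parse restricted markup into nested dicts; raise ValueError on bad nesting."""
    root = {"tag": None, "attrs": {}, "children": []}
    stack = [root]
    pos = 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise ValueError(f"unparseable markup at offset {pos}")
        pos = m.end()
        closing, name, attr_text, chars = m.groups()
        if chars is not None:                 # character data
            if chars.strip():
                stack[-1]["children"].append(chars)
            continue
        if closing:                           # end tag must match the open element
            if len(stack) == 1 or stack[-1]["tag"] != name:
                raise ValueError(f"unexpected end tag </{name}>")
            stack.pop()
            continue
        attrs = {k: dq or sq for k, dq, sq in ATTR.findall(attr_text)}
        node = {"tag": name, "attrs": attrs, "children": []}
        stack[-1]["children"].append(node)
        if name not in empty_tags:            # empty tags never get an end tag
            stack.append(node)
    if len(stack) != 1:
        raise ValueError(f"missing end tag for <{stack[-1]['tag']}>")
    return root["children"]


SAMPLE = '<insert target="s5"><section id="s5a"><marker>text</section></insert>'
tree = parse(SAMPLE, empty_tags={"marker"})
```

  Without the empty tag list, the same input fails: the parser treats marker as an open element and rejects the section end tag. With an XML-style empty tag delimiter the list would be unnecessary, because the markup itself would say which tags are empty.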
 

Automatic Document Processing

  The automatic processing of legislation in EnAct, in particular the automatic generation of amendment legislation and the automatic application of amendments to the consolidated database, would simply not be possible without representing the logical structure of documents in the collection. There have been a number of previous attempts to automate consolidation in Canada and Australia without success. My suspicion is that these attempts failed, not because of a lack of skill on the part of the people trying to achieve the task, but because the underlying formats simply didn't capture enough information to make the task possible.
  SGML and XML both allow the representation of structural components, not merely presentation or formatting information. This allows amendments to operate on logical components, to omit or replace sections or parts, instead of omitting or replacing in terms of layout information. If you are dealing with layout information you effectively have to solve the same problems you need to solve to convert from that layout information to an XML document which captures document structure, and then solve the problems associated with the particular document manipulation that you have in mind. By dealing with SGML and XML documents you either eliminate the problem of converting from layout oriented markup to structured markup, by creating the documents in SGML or XML to begin with, or you isolate the identification of structure in a separate document conversion process.
  The EnAct system demonstrates the potential of utilizing structured markup to automate significant aspects of the manipulation and processing of documents. It demonstrates just how much can be achieved by freeing implementors from the constraints of static paper, and inflexible layout oriented markup, and investing in the power of structured markup.
