| Document Structure Identification: a New Paradigm | Table of contents | Indexes | Global |
Authoring: intelligent templates for authoring of SGML documents

Frank-Marcus Steinmann
Project Manager
debis Systemhaus, Magirusstr. 43, 89077 Ulm, Germany
Phone: +49 731 9344 3221
Fax: +49 731 9344 100
Email: fsteinmann@gei-ulm.daimler-benz.com
Web: www.dtro.e-technik.tu-darmstadt.de/fms/fms.html

Biographical notice:
Dr. Frank-Marcus Steinmann

ABSTRACT:
This paper shows that any part of an SGML / XML document can be defined as a template and inserted afterwards, also into other documents, which is a significant help for authors of SGML / XML documents. This becomes even more interesting because the positions where the template can be inserted are found automatically. An important aspect is to find not only the obvious gaps in the document but also the difficult positions, for example when parts of the template already exist in the document. The information needed for this is contained completely in the Document Type Definition (DTD). Because of this, the presented algorithm is directly suitable for all possible DTDs without any adaptation.
Conventional SGML editors generally allow the insertion of single elements only, not of whole parts of documents. In addition, they only allow the insertion of elements in the immediate area of the cursor. DTD-independent mechanisms for insertion have neither been presented nor used yet.
The presented template algorithm is able to split a template into sub-templates if required and to insert them one by one into the document. This simplifies the treatment of templates, above all their creation by the author. Moreover, this feature allows the insertion of several subtrees simultaneously, even when they contain links to each other. Because of these links there is normally no other way to insert several subtrees, in particular not by inserting them successively. Adding specific knowledge about the semantics of the HyTime CLINK concept to the template algorithm makes it possible to support HyTime CLINKs as well.
All things considered, an intelligent template algorithm has been realized and integrated into an SGML editor. Finally, this algorithm is presented together with the experience gained with it.
Introduction

Due to the increasing use of SGML and XML applications, for example in the oil, pharmaceutical, telecommunications and automotive industries, as well as on the WWW, there is a growing need to process SGML / XML documents.
Frequently used tools for this are SGML editors. Conventional SGML editors generally allow the insertion of single elements only, not of whole parts of documents (templates). In addition, they only allow the insertion of elements in the immediate area of the cursor. As a significant help for the author, it should be possible to insert complete parts of documents instead of single elements, and to find the positions where they can be inserted automatically. The role of the author (user) is the creation of documents according to a fixed DTD (document type definition). The DTD is created by an SGML expert.
It does not matter whether we are dealing with SGML or XML documents, as long as a DTD is present. Because of this, the following analysis also applies to (valid) XML, even where only SGML is mentioned.
A simple case in which templates are worth using is shown in example 1.
Example 1: Given the following extract from a DTD:

<!ELEMENT team-members - - (team-member+) >
<!ELEMENT team-member - - (roles , name , department? , address? , zip? , city? , phone? , fax?) >
<!ELEMENT roles - - (role+) >
<!ELEMENT role - - (#PCDATA) >
<!ELEMENT name - - (#PCDATA) >
<!ELEMENT department - - (#PCDATA) >
<!ELEMENT address - - (#PCDATA) >
<!ELEMENT zip - - (#PCDATA) >
<!ELEMENT city - - (#PCDATA) >
<!ELEMENT phone - - (#PCDATA) >
<!ELEMENT fax - - (#PCDATA) >

a template for a team member could look like this:

<team-member>
  <roles>
    <role></>
  </roles>
  <name></>
  <department></>
  <address></>
  <zip></>
  <city></>
  <phone></>
  <fax></>
</team-member>
The template algorithm searches the document for all positions where the insertion is possible, and if the user agrees, the template is inserted there.
Example 2:

<team-member>
  <roles>
    <role>Author</>
  </roles>
  <name>N.N.</>
  <department>S1</>
  <address>Magirusstr. 43</>
  <zip>89077</>
  <city>Ulm</>
  <phone>(0731) 9344-0</>
  <fax>(0731) 9344-100</>
</team-member>

After inserting this template, only the correct name has to be entered into the <name> element for each team member of department S1.
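How the DTD alone can decide whether a position is suitable may be illustrated with a small Python sketch. It is not the algorithm of this paper: it assumes that the content model of team-member from example 1 has been translated by hand into a regular expression over child element names, and the names TEAM_MEMBER_MODEL and fits are hypothetical.

```python
import re

# Content model of <team-member> from example 1, hand-translated into a
# regular expression over space-terminated child names. This translation is
# an assumption of the sketch; an editor would derive it from the parsed DTD.
TEAM_MEMBER_MODEL = re.compile(
    r"roles name (department )?(address )?(zip )?(city )?(phone )?(fax )?"
)


def fits(children):
    """Return True if the sequence of child names satisfies the model."""
    sequence = "".join(name + " " for name in children)
    return TEAM_MEMBER_MODEL.fullmatch(sequence) is not None


print(fits(["roles", "name", "phone"]))           # True
print(fits(["roles", "name", "phone", "phone"]))  # False: phone may occur once
print(fits(["name", "roles"]))                    # False: wrong order
```

A candidate insertion is acceptable only if the resulting child sequence still passes such a check, which is why no DTD-specific code is needed.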
What is a template

Definition

Fig. shows the tree representation of example 1.
Creation of templates

It is useful to give the user the possibility to create his own templates. Of course, predefined templates created by an SGML expert can be provided too. There are many ways to create templates. The realization presented in the section "The algorithm and experiences with it" permits the user to create templates in a way similar to the well-known COPY (/ PASTE) function, by copying the selected part of the document into the template. This makes it easy to create templates while editing documents, with the same editor. An alternative way of creating templates could be the use of another editor or of a special tool.
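The copy-like creation of a template can be sketched as follows. This is an illustration only, assuming the document is available as an XML tree in Python's xml.etree.ElementTree (so full end tags are used instead of the SGML short form </>); the function create_template and its keep_content parameter are hypothetical names.

```python
import copy
import xml.etree.ElementTree as ET


def create_template(selected, keep_content=True):
    """Return an independent copy of the selected subtree.

    With keep_content=False only the structure is kept, as in example 1.
    """
    template = copy.deepcopy(selected)
    if not keep_content:
        for element in template.iter():
            element.text = None  # drop character data, keep the elements
    return template


doc = ET.fromstring(
    "<team-members><team-member><roles><role>Author</role></roles>"
    "<name>N.N.</name></team-member></team-members>"
)
template = create_template(doc.find("team-member"), keep_content=False)
print(ET.tostring(template, encoding="unicode"))
```

Because the template is a deep copy, later edits to the document do not change the stored template.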
Inserting position

Searching area

There are many possibilities to define the area in which to search for positions to insert the template. It can be the whole document or an area which has to be selected by the user. The realization presented in the section "The algorithm and experiences with it" searches from the selected element forward to the end of the document.
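A forward search area of this kind can be sketched in a few lines, assuming an xml.etree.ElementTree document; search_area is a hypothetical helper, not a function of the editor described here.

```python
import xml.etree.ElementTree as ET


def search_area(root, selected):
    """Yield `selected` and every element after it in document order."""
    started = False
    for element in root.iter():  # iter() walks the tree in document order
        if element is selected:
            started = True
        if started:
            yield element


doc = ET.fromstring("<a><b/><c><d/></c><e/></a>")
print([element.tag for element in search_area(doc, doc.find("c"))])
# ['c', 'd', 'e']
```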
Suitable positions

Generally there are three different types of positions to insert a template:

Insertion

Remember that according to section , item c, additional elements cannot be created.
The intelligent algorithm complements already existing structures (white elements) with parts of the template. The added elements are presented in gray.
A lot of possibilities

Generally, there will be various possibilities to insert the sub-templates. Because of this, a DTD-independent template algorithm has to try all permutations of the sub-templates to be inserted. It remains for the author to select the desired permutation. The algorithm presented in the section "The algorithm and experiences with it" shows all permutations one after the other until the user stops it (by agreeing or rejecting). If he agrees, the corresponding permutation is inserted into the document. If the author rejects a permutation, by asking the template algorithm for another permutation or by cancelling the template algorithm, the rejected permutation has to be removed from the document.
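The accept / next / cancel behaviour described above can be sketched as a small loop. The sketch makes strong simplifying assumptions: the document is a flat Python list, each candidate is a (position, subtree) pair, and ask is a hypothetical callback that returns "accept", "next" or "cancel".

```python
def show_permutations(document, candidates, ask):
    """Show candidates one after the other until the user accepts or cancels.

    Returns True if a candidate was accepted and stays in the document.
    """
    for position, subtree in candidates:
        document.insert(position, subtree)  # show the permutation in place
        answer = ask(document)
        if answer == "accept":
            return True
        del document[position]              # rejected: remove it again
        if answer == "cancel":
            break
    return False


document = ["a", "b"]
answers = iter(["next", "accept"])
accepted = show_permutations(
    document, [(0, "t"), (1, "t"), (2, "t")], lambda shown: next(answers)
)
print(accepted, document)  # True ['a', 't', 'b']
```

Every rejected permutation is undone before the next one is shown, so the document is unchanged if the user cancels.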
Before complementing the existing subtree <t> (see fig. ), the template can be inserted as a left sibling (1) and afterwards as a right sibling (2) of the existing subtree <t>.
Example 3: Given the DTD of examples 1 and 2, the template of example 2 and the following extract from a document:

<team-member>
  <roles>
    <role>Project Manager</>
  </roles>
  <name>Steinmann</>
  <phone>(0731) 9344-3221</>
</team-member>

In addition to the (simple) positions before and after the <team-member> element existing in the document, the intelligent algorithm will find three permutations, because the element <role> of the template can be inserted before the existing element <role> of the document (as left sibling), not be inserted at all, or be inserted after it (as right sibling). All other elements of the template can only be added to the elements existing in the document, except <name> and <phone>: these elements already exist in the document and the DTD does not allow them to be inserted once more (see <u> in fig. ).

Permutation 1:

<team-member>
  <roles>
    <role>Author</>
    <role>Project Manager</>
  </roles>
  <name>Steinmann</>
  <department>S1</>
  <address>Magirusstr. 43</>
  <zip>89077</>
  <city>Ulm</>
  <phone>(0731) 9344-3221</>
  <fax>(0731) 9344-100</>
</team-member>

Permutation 2:

<team-member>
  <roles>
    <role>Project Manager</>
  </roles>
  <name>Steinmann</>
  <department>S1</>
  <address>Magirusstr. 43</>
  <zip>89077</>
  <city>Ulm</>
  <phone>(0731) 9344-3221</>
  <fax>(0731) 9344-100</>
</team-member>

Permutation 3:

<team-member>
  <roles>
    <role>Project Manager</>
    <role>Author</>
  </roles>
  <name>Steinmann</>
  <department>S1</>
  <address>Magirusstr. 43</>
  <zip>89077</>
  <city>Ulm</>
  <phone>(0731) 9344-3221</>
  <fax>(0731) 9344-100</>
</team-member>

Remember that this has to be done only for elements which can occur multiple times.
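The three permutations of example 3 can be reproduced with a short sketch. It is hard-wired to the DTD of example 1 (the child order in FIELD_ORDER and the repeatable <role> are taken from there) and is therefore not DTD independent; complement and role_variants are hypothetical names.

```python
# Children of <team-member> after <roles>, in DTD order; each occurs at most once.
FIELD_ORDER = ["name", "department", "address", "zip", "city", "phone", "fax"]


def complement(existing, template):
    """Add template children that are missing; existing elements are kept."""
    merged = dict(template)
    merged.update(existing)  # elements already in the document win
    return [(name, merged[name]) for name in FIELD_ORDER if name in merged]


def role_variants(existing_roles, template_role):
    """<role> may repeat (role+): left sibling, not inserted, right sibling."""
    return [
        [template_role] + existing_roles,
        list(existing_roles),
        existing_roles + [template_role],
    ]


document = {"name": "Steinmann", "phone": "(0731) 9344-3221"}
template = {
    "name": "N.N.", "department": "S1", "address": "Magirusstr. 43",
    "zip": "89077", "city": "Ulm", "phone": "(0731) 9344-0",
    "fax": "(0731) 9344-100",
}
fields = complement(document, template)
for number, roles in enumerate(role_variants(["Project Manager"], "Author"), 1):
    print(f"Permutation {number}: roles={roles} fields={fields}")
```

Only the repeatable element produces alternatives; the elements that may occur once are either complemented or left as they are.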
The default configuration of the algorithm presented in the section "The algorithm and experiences with it" disables the insertion of #PCDATA as left or right sibling of other #PCDATA.
Particular attention has to be paid when inserting links and entities (see section ).
Several subtrees

How to handle links

In the following, the insertion of simple templates consisting of only one subtree is discussed first. In particular, the SGML standard linking mechanism ID-IDREF is dealt with. This mechanism assumes that the target is marked by an attribute of type ID with an unambiguous value. The referencing elements use an attribute of type IDREF which contains the same value as the ID attribute of the element to be referenced. Furthermore, HyTime CLINKs are discussed in the section "HyTime CLINKs".
Independent of the applied linking mechanism, we have to consider links
the subtree. Links outside the template need not be considered, because they are not affected.
To sum up, these three cases are shown in fig. .
Links between subtrees

In the section above we have seen that links between subtrees are lost if the subtrees are not inserted simultaneously. If the links are to be kept, it is necessary to insert the subtrees simultaneously, as is done by the intelligent algorithm. Then the links can be treated as in (c) and are copied when the template is inserted. Fig. shows an example of this. If the subtrees had been inserted one after the other, the link would have been destroyed, because according to (a) the value of the ID attribute in the right subtree would have had to be set to another value.
The intelligent algorithm permits the insertion of several subtrees simultaneously. In addition, links between these subtrees are copied correctly.
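Why simultaneous insertion keeps such links can be shown with a sketch that copies all subtrees in one step and renames IDs consistently. It assumes xml.etree.ElementTree trees and attributes literally named id and idref (in a real DTD the attribute names are declared there); copy_with_fresh_ids and the renaming scheme with a numeric suffix are hypothetical.

```python
import copy
import xml.etree.ElementTree as ET


def copy_with_fresh_ids(subtrees, used_ids, id_attr="id", idref_attr="idref"):
    """Copy all subtrees together, renaming clashing IDs and their IDREFs."""
    copies = [copy.deepcopy(tree) for tree in subtrees]
    renamed = {}
    for tree in copies:
        for element in tree.iter():
            old = element.get(id_attr)
            if old is None:
                continue
            new, counter = old, 1
            while new in used_ids:  # ID values must stay unambiguous
                counter += 1
                new = f"{old}-{counter}"
            used_ids.add(new)
            renamed[old] = new
            element.set(id_attr, new)
    for tree in copies:  # second pass: follow the renaming in every subtree
        for element in tree.iter():
            target = element.get(idref_attr)
            if target in renamed:
                element.set(idref_attr, renamed[target])
    return copies


left = ET.fromstring('<para><ref idref="fig1"/></para>')
right = ET.fromstring('<figure id="fig1"/>')
new_left, new_right = copy_with_fresh_ids([left, right], used_ids={"fig1"})
print(new_right.get("id"), new_left.find("ref").get("idref"))  # fig1-2 fig1-2
```

If the two subtrees were copied in separate calls, the renaming of the ID in the right subtree would not be known when the left subtree is copied, and the reference would point to the old target.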
HyTime CLINKs

HyTime CLINKs typically use an indirection with <nameloc> elements, e.g. for inter-document linking. Even if these <nameloc> elements are not part of the template, they have to be copied too, so that the link is not lost. This can be achieved with the intelligent algorithm by adding functionality to search for the relevant <nameloc> elements and to copy them into an additional subtree automatically. In the example of fig. this could be subtree <t2>. The functionality to be added to the template algorithm gives it knowledge about the semantics of the HyTime CLINKs. This means collecting the affected <nameloc> elements and integrating them into the template. With this, the absolute DTD independence is restricted to DTDs which support HyTime.
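The collection of the affected <nameloc> elements might look like the following sketch. It assumes that a <clink> points to its <nameloc> through a linkend attribute matching the id of the <nameloc>, as is common in HyTime applications; the element and attribute names used by a concrete DTD may differ, and collect_namelocs is a hypothetical helper.

```python
import copy
import xml.etree.ElementTree as ET


def collect_namelocs(document, template):
    """Return copies of the <nameloc> elements used by clinks in the template."""
    linkends = {clink.get("linkend") for clink in template.iter("clink")} - {None}
    inside = {nameloc.get("id") for nameloc in template.iter("nameloc")}
    return [
        copy.deepcopy(nameloc)
        for nameloc in document.iter("nameloc")
        if nameloc.get("id") in linkends and nameloc.get("id") not in inside
    ]


doc = ET.fromstring(
    '<doc>'
    '<nameloc id="loc1"><nmlist nametype="element" docorsub="other">'
    'target</nmlist></nameloc>'
    '<nameloc id="loc2"><nmlist nametype="element">unused</nmlist></nameloc>'
    '<para><clink linkend="loc1">see there</clink></para>'
    '</doc>'
)
extra_subtrees = collect_namelocs(doc, doc.find("para"))
print([nameloc.get("id") for nameloc in extra_subtrees])  # ['loc1']
```

The returned copies play the role of the additional subtree: they are inserted together with the selected subtree so that the indirection stays intact.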
It is not recommended to extend the template up to the first common ancestor <a>, because in some cases this would restrict too strictly the possible positions where the original template <t1> could be inserted. Because of this, the inserting algorithm should be designed so that several subtrees can be inserted even if the template does not contain their common ancestor.
How to handle entities

If a template contains references to entities, it must be guaranteed that they are defined in the document into which the template is to be inserted. Name conflicts with entities already defined in the document have to be avoided.
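A check of this kind can be sketched as follows, assuming that the entity declarations of the document and of the template are available as dictionaries from entity name to replacement text; merge_entities is a hypothetical helper and the sample entities are invented for illustration.

```python
def merge_entities(document_entities, template_entities):
    """Return (declarations to add, names that conflict with the document)."""
    to_add, conflicts = {}, []
    for name, value in template_entities.items():
        if name not in document_entities:
            to_add[name] = value    # missing in the document: must be declared
        elif document_entities[name] != value:
            conflicts.append(name)  # same name, different definition
    return to_add, conflicts


document_entities = {"company": "debis Systemhaus", "city": "Ulm"}
template_entities = {"company": "debis Systemhaus", "city": "Darmstadt", "dept": "S1"}
print(merge_entities(document_entities, template_entities))
# ({'dept': 'S1'}, ['city'])
```

How a reported conflict is resolved, for example by renaming the entity in the template, is left open here.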
How to handle external files

If a template contains links to external files (e.g. graphics), it must be guaranteed that they are accessible from the document into which the template is to be inserted. For example, attention has to be paid to access rights, or to the case that the template is copied into a file on another machine. If necessary, the external file has to be copied when the template is inserted. Name conflicts with external files already linked to the document have to be avoided.
The algorithm and experiences with it

All things considered, an intelligent template algorithm has been realized and integrated into an SGML editor. For this it was necessary that the editor gives access to the SGML structure information of the document.
The user can create a template by selecting an area of the document in the editor and activating the function "Create Template". The template is stored in a file which the user has to choose.
By activating the function "Insert Template" the user can choose a template and insert it into the current document between the cursor and the end of the document.
Every time the template or parts of it can be inserted, the user is asked whether he accepts this position. Here he can choose from the following possibilities:
With the configuration (see section ) the number of different permutations can be considerably reduced, if required.
The algorithm to insert templates consists of the following loop:
Go from the selected element in the current document element by element until the end of the document and repeat for each element the following steps:

The template algorithm is easy to use and represents a significant help for authors of SGML documents. The number of presented permutations has not caused any problems so far.
Conclusion

The SGML templates presented in this paper extend the functionality of conventional SGML editors: templates consisting of one or several subtrees, with or without content, can be created and inserted afterwards, even into other documents. The more extensive the templates are and the more frequently they are used, the more time and effort the user can save. Where the template is to be inserted, the structure of the document must be appropriate. If the template does not fit directly into a gap in the document, there must be overlaps, that is, there must be common elements in the document and in the template. To keep the template algorithm general and easy to use (the user is not asked for information about missing elements), the template algorithm cannot create additional elements between the document and the template to be inserted. Consequently the creation of templates becomes very easy: the user only has to create extensive templates, that is, the template root element has to be chosen as high in the hierarchy as possible. The template root element would have to be chosen much more carefully if the template algorithm could not add parts of the template around existing elements in the document.
Furthermore, the template algorithm can insert several subtrees simultaneously, because elements of the template are permitted to exist in the document already. The benefit is that links, even between subtrees, are not lost, neither when creating nor when inserting the template. If the subtrees were inserted successively, these links could not be handled. In addition, this is the basis for supporting indirect HyTime CLINKs. For this, some knowledge about the semantics of the HyTime CLINKs has to be added to the template algorithm (with this, the absolute DTD independence is restricted to DTDs which support HyTime).
The template algorithm shows all possibilities of inserting the template into the document. Out of these possibilities the user can choose the appropriate one. Configurations help to reduce the number of different possibilities, if required. With all this, the template algorithm can be used directly, without any adaptation, for all possible DTDs. Beyond that, templates can be used in the same way for valid XML.