| | -
| | A Keyword Set is a collection of words that belong together with respect to a particular aspect. For example, we can define a set of words that tend to occur close to a certain tag, or a set of words that indicate a link. These sets can be used within rules. |
| | Keyword Sets can be defined by two different users |
| | -
| | SGML/XML Expert during analysis and design phase |
-
| | Domain Expert during verification of converted data |
|
| | or obtained automatically from an authoring system. The authoring system stores information about the content of all tags. The content could come from a real authoring process or from previously imported legacy data. |
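A keyword set as described above could be sketched as follows. This is only an illustration: the class name `KeywordSet`, the `hits` method, and the sample words are assumptions and are not part of the framework.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KeywordSet:
    """A named group of words that share one aspect (hypothetical structure)."""
    name: str
    words: frozenset

    def hits(self, text: str) -> int:
        """Count the whitespace-separated tokens of `text` that belong to the set."""
        tokens = (t.strip(".,;:()").lower() for t in text.split())
        return sum(1 for t in tokens if t in self.words)


# Hypothetical set of words that hint at a link or cross-reference.
link_words = KeywordSet("LinkWords", frozenset({"see", "refer", "chapter", "section"}))
print(link_words.hits("For details see Chapter 4."))
```

A rule could then use the hit count of such a set as one of its inputs.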
-
| | Every rule describes how a certain property can be recognized. Usually the name of the rule describes the property. Trivial properties include position or length information (more generally, layout information). The distribution of predefined keyword sets is a more complex property. |
| | A rule takes zero or more arguments, which define different instances of that rule. For example, the framework offers a rule to recognize empty lines. The user can create two instances of that rule: one that recognizes exactly one empty line (before a paragraph) and another that recognizes 2-3 empty lines (before a chapter). We might name them SingleEmptyLine and MultipleEmptyLines. |
| | A rule can consist of other rules, so rules are recursive. Rules can be combined with boolean operators (AND, OR, NOT, ...), which makes it easy to build complex rules from very simple ones. |
| | An action takes place when a rule matches. The default action is to put a start tag before and an end tag after the string that matches the rule. The tag itself comes from the DTD element to which the rule was applied. (For the moment, actions are hard-coded.) |
| | For each element within the DTD at least one rule must be defined and assigned. |
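The idea of parameterized rule instances and boolean combination can be sketched as below. This is a minimal sketch under stated assumptions: modelling a rule as a predicate over a list of lines, and the helper names `empty_lines`, `all_of`, `any_of` and `negate`, are illustrative choices, not the framework's actual interface.

```python
from typing import Callable, List

# Assumption: a rule is a predicate over a block of lines.
Rule = Callable[[List[str]], bool]


def empty_lines(minimum: int, maximum: int) -> Rule:
    """Rule factory: the arguments select one instance of the empty-line rule."""
    def rule(lines: List[str]) -> bool:
        count_ok = minimum <= len(lines) <= maximum
        return count_ok and all(not line.strip() for line in lines)
    return rule


def all_of(*rules: Rule) -> Rule:
    """AND: matches only if every sub-rule matches."""
    return lambda lines: all(r(lines) for r in rules)


def any_of(*rules: Rule) -> Rule:
    """OR: matches if at least one sub-rule matches."""
    return lambda lines: any(r(lines) for r in rules)


def negate(rule: Rule) -> Rule:
    """NOT: matches exactly when the sub-rule does not."""
    return lambda lines: not rule(lines)


# Two instances of the same rule, named as in the text.
single_empty_line = empty_lines(1, 1)
multiple_empty_lines = empty_lines(2, 3)

# Combined rules are rules themselves, so they can be nested further.
any_gap = any_of(single_empty_line, multiple_empty_lines)
gap_but_not_single = all_of(any_gap, negate(single_empty_line))
```

Because every combinator returns a value of the same `Rule` type, complex rules can be nested to any depth.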
-
| | The rules engine applies the assigned rules to each element. If a rule matches, the engine executes the action and writes an entry for the verification step (with information about the matched rule, the string and the context). When more than one rule matches, the engine creates a conflict entry for the verification step (with information about the conflict, the matched rules, the string and the context). |
| | The order in which the engine applies the rules (from the most complex elements out to the leaves, or vice versa) influences the result. To get the best behavior, one should apply the rules in several orders and compare the results. |
| | If the data already contains tags (from a previous iteration step or from hand tagging), the rules engine takes this information into account and applies only suitable rules. A suitable rule is, for example, a rule for an element that the DTD allows inside a previously added tag. |
| | Information entered during a verification step (conflict resolutions, decisions, etc.) is used first: the rules to which this information belongs are applied before the others. |
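The match and conflict handling described above might look roughly like this. It is a simplified sketch: the `Entry` record, the `apply_rules` function and the two sample rules are hypothetical, and rule ordering, existing tags and earlier verification decisions are left out.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Entry:
    """Record handed to the verification step (hypothetical layout)."""
    kind: str          # "match" or "conflict"
    rules: List[str]   # names of the rules that matched
    text: str          # the string that was examined
    context: str       # where the string was found


# Assumption: each rule name maps to (DTD element name, predicate on the string).
RuleTable = Dict[str, Tuple[str, Callable[[str], bool]]]


def apply_rules(text: str, context: str, rules: RuleTable) -> Tuple[str, Optional[Entry]]:
    matched = [name for name, (_, predicate) in rules.items() if predicate(text)]
    if not matched:
        return text, None
    if len(matched) > 1:
        # Several rules match: leave the text untouched and report a conflict.
        return text, Entry("conflict", matched, text, context)
    element = rules[matched[0]][0]
    # Default action: wrap the matched string in start and end tags.
    return f"<{element}>{text}</{element}>", Entry("match", matched, text, context)


rules: RuleTable = {
    "ShortLine": ("title", lambda s: len(s) < 30),
    "EndsWithPeriod": ("para", lambda s: s.endswith(".")),
}

tagged, entry = apply_rules("Introduction", "line 1", rules)
unchanged, conflict = apply_rules("Short one.", "line 2", rules)
```

Here the first call produces a tagged string with a match entry, while the second satisfies both sample rules and therefore yields a conflict entry for the domain expert to resolve.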
-
| | All information written during a run of the rules engine is displayed by the verification engine to a domain expert, who can: |
| | -
| | resolve a conflict (create a new rule, apply a different rule, ...) |
-
| | declare an import part as invalid |
-
| | declare an import part as valid |
-
-
| | remove a part from the import algorithm (re-authoring) |
-
|
| | After the expert has made all modifications and entries, the rules engine can run again if necessary. Once the expert decides that no further run is needed, the verification engine writes all files and updates all external sources (DB and XML files). |
-
| | The user interface allows different users to interact with all modules inside the framework. |
| | An SGML/XML expert can use it to create keyword sets and rules, based on the sets developed during the analysis phase of the legacy data. New customer requirements can also be entered as rules. All information the expert enters can be used for developing a DTD. |
| | A domain expert uses the interface to verify all actions performed by the algorithm. |
-
| | The reading and writing of data should be implemented as an IO layer that can transform any input format into the internal representation/navigation of the framework and vice versa. |
| | Main parts of an IO layer: |
| | -
| | Read and Write Information from/to Authoring System |
-
-
|
|
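One way to picture the IO layer is as an abstract interface with one implementation per external format. The sketch below is an assumption throughout: the names `IOLayer` and `PlainTextIO` are invented for illustration, and the internal representation is reduced to a plain list of lines.

```python
from abc import ABC, abstractmethod
from typing import List


class IOLayer(ABC):
    """Converts between an external format and the internal representation."""

    @abstractmethod
    def read(self, source: str) -> List[str]:
        """Transform external data into the internal representation."""

    @abstractmethod
    def write(self, document: List[str]) -> str:
        """Transform the internal representation back into the external format."""


class PlainTextIO(IOLayer):
    """Simplest possible format: one internal line per line of text."""

    def read(self, source: str) -> List[str]:
        return source.splitlines()

    def write(self, document: List[str]) -> str:
        return "\n".join(document)


io_layer = PlainTextIO()
lines = io_layer.read("Title\n\nFirst paragraph.")
round_trip = io_layer.write(lines)
```

A layer for the authoring system would implement the same two methods, so the rest of the framework would not need to know which format the data came from.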