XIL Indexsheets: the "Extensible Indexing Language" (XIL)
Defines indexing based on XSLT/XPath
Bennett Cookson, Jr.
Senior Architect
NextPage, 5072 N 300 W, Provo, Utah 84604, USA
Phone: (801) 229-6700 Fax: (801) 229-6786
email: Bennett.Cookson@nextpage.com
web site: www.nextpage.com
Biography
Abstract
Introduction |
HTML |
Why index elements differently |
Index by fields |
Fields can be typed to allow comparison searches, such as a price less than $10. The field types include plain text, integers, real numbers, dates, and times.
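To illustrate why typing matters, here is a minimal Python sketch; it is not NextPage's implementation. The type names mirror the list above, but the `FIELD_TYPES` table and the `less_than` function are hypothetical. Raw field strings are converted to the declared type before comparing, so a real-number price of 9.95 correctly falls below 10, whereas a plain-text comparison would not.

```python
from datetime import date, time
from typing import Any, Callable, Dict

# Hypothetical converters, one per field type named in the text.
FIELD_TYPES: Dict[str, Callable[[str], Any]] = {
    "text": str.lower,             # plain text compares character by character
    "integer": int,
    "real": float,
    "date": date.fromisoformat,    # e.g. "1999-12-06"
    "time": time.fromisoformat,    # e.g. "14:30"
}

def less_than(value: str, limit: str, field_type: str) -> bool:
    """True if value < limit once both strings are converted to field_type."""
    convert = FIELD_TYPES[field_type]
    return convert(value) < convert(limit)

print(less_than("9.95", "10", "real"))  # True: 9.95 < 10.0
print(less_than("9.95", "10", "text"))  # False: "9.95" sorts after "10"
```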
Document table of contents structure |
Content network example |
XIL |
This section discusses a few XIL issues before an example is presented in the next section.
Preserves the original document |
Separate structure from indexing attributes |
Always use the xsl:apply-templates element
Hit transformation |
XML, XSL, and XIL |
XIL looks a lot like XSL, as the following examples show. The xsl:template syntax is taken from XSLT, with the same XPath-based match attribute.
For each XML schema (or DTD) there is an XSL stylesheet and an XIL indexsheet. Here are samples taken from the XML 99 conference proceedings published with NextPage’s LivePublish.
XML sample |
```xml
<PAPER ID="young" PDF="YES">
  <TITLE>Electronic Information Commerce</TITLE>
  <TRACK ID="publishing">Publishing with XML</TRACK>
  <SESSION ID="publishing-6">Information Commerce</SESSION>
  <PRES ID="young" TYPE="PPT">Electronic Information Commerce</PRES>
  <AUTHOR ID="YoungRussel">Russel W. Young</AUTHOR>
  <SECT>
    <TITLE>Introduction</TITLE>
    <SUBSECT1>
      <TITLE>Electronic Commerce</TITLE>
      <PARA>Electronic commerce means...</PARA>
    </SUBSECT1>
  </SECT>
</PAPER>
```
XSL simplified sample |
```xml
<xsl:template match="PAPER/TITLE">
  <div class="title"><table>
    <xsl:apply-templates/>
  </table></div>
</xsl:template>

<xsl:template match="AUTHOR">
  <xsl:element name="a">
    <xsl:attribute name="class">author</xsl:attribute>
    <xsl:apply-templates/>
  </xsl:element>
</xsl:template>

<xsl:template match="SECT">
  <div class="sect"><xsl:apply-templates/></div>
</xsl:template>

<xsl:template match="SECT/TITLE">
  <h2><xsl:apply-templates/></h2>
</xsl:template>
```
Index all elements sample 1 |
```xml
<?xml version='1.0'?>
<xsl:stylesheet
    xmlns:xsl="http://www.w3.org/XSL/Transform/1.0"
    xmlns:lp="http://www.NextPage.com/ns/indexsheet/2.0">
  <xsl:template match='*'>
    <lp:index field-element-name="yes">
      <xsl:apply-templates/>
    </lp:index>
  </xsl:template>
</xsl:stylesheet>
```
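The effect of sample 1 can be approximated in Python. This sketch rests on assumptions: it reads field-element-name="yes" as "use each element's name as its field name" and, following the "applied to element" rule described under Indexing objects, gives each element's field all of the text it contains, including text in child elements. The function `index_all_elements` and the abridged sample are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Abridged version of the XML sample above.
SAMPLE = """<PAPER ID="young">
<TITLE>Electronic Information Commerce</TITLE>
<AUTHOR ID="YoungRussel">Russel W. Young</AUTHOR>
<SECT>
<TITLE>Introduction</TITLE>
<PARA>Electronic commerce means...</PARA>
</SECT>
</PAPER>"""

def index_all_elements(xml_text):
    """Return {element name: [normalized text of each such element]}.

    Assumption: every element becomes a field named after itself, and that
    field covers all text inside the element, including its descendants."""
    fields = defaultdict(list)
    for elem in ET.fromstring(xml_text).iter():            # document order
        text = " ".join("".join(elem.itertext()).split())  # collapse whitespace
        if text:
            fields[elem.tag].append(text)
    return dict(fields)

fields = index_all_elements(SAMPLE)
print(fields["TITLE"])  # ['Electronic Information Commerce', 'Introduction']
```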
XIL sample 2 |
(Explained below)

```xml
<?xml version='1.0'?>
<xsl:stylesheet case-sensitive="no"
    xmlns:xsl="http://www.w3.org/XSL/Transform/1.0"
    xmlns:lp="http://www.NextPage.com/ns/indexsheet/2.0">

  <xsl:template match='PAPER/TITLE'>
    <lp:index field="PaperTitle" relevance="highest">
      <xsl:apply-templates/>
    </lp:index>
  </xsl:template>

  <xsl:template match='PAPER/AUTHOR'>
    <lp:index field="PaperAuthor" hit-anchor="postpone">
      <xsl:apply-templates/>
    </lp:index>
  </xsl:template>

  <xsl:template match='PAPER/SESSION'>
    <lp:index field="SessionTitle" relevance="higher">
      <xsl:apply-templates/>
    </lp:index>
  </xsl:template>

  <xsl:template match='SECT'>
    <lp:index field="Section" toc-section="yes">
      <xsl:apply-templates/>
    </lp:index>
  </xsl:template>

  <xsl:template match='SECT/TITLE'>
    <lp:index field="SectionTitle" toc-heading="yes" relevance="high">
      <xsl:apply-templates/>
    </lp:index>
  </xsl:template>

</xsl:stylesheet>
```
Explanation |
For “PAPER/TITLE”, a field is applied for searching purposes (a search form is provided for searching on paper titles), and the relevance weight is raised to “highest” because this is the title of the paper.
For “PAPER/AUTHOR”, a field is applied for use in a search form. Because the stylesheet renders this element as an A (anchor/link) element in HTML, and HTML does not allow one A element to be nested inside another, any hit anchor must be postponed (hit-anchor="postpone").
“SECT” is indexed with toc-section="yes", meaning that this element represents structure that should be included in the sub-document table of contents.
“SECT/TITLE” is indexed with toc-heading="yes", meaning that this element supplies the heading for that structure in the sub-document table of contents.
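The rules of XIL sample 2 can be imitated with a toy Python indexer. This is a sketch under stated assumptions, not how LivePublish works: the numeric `WEIGHTS` for the relevance levels, the `score` function, and the rule table keyed by (parent, element) as a stand-in for the XPath match patterns are all hypothetical. Terms are lowercased because the indexsheet sets case-sensitive="no", and a matching rule applies to the whole element, children included.

```python
import xml.etree.ElementTree as ET

# Hypothetical weights: the paper names the levels but not their values.
WEIGHTS = {"normal": 1, "high": 2, "higher": 3, "highest": 4}

# XIL sample 2 as (parent, element) -> settings; None as parent matches any parent.
RULES = {
    ("PAPER", "TITLE"): {"field": "PaperTitle", "relevance": "highest"},
    ("PAPER", "AUTHOR"): {"field": "PaperAuthor"},
    ("PAPER", "SESSION"): {"field": "SessionTitle", "relevance": "higher"},
    (None, "SECT"): {"field": "Section", "toc_section": True},
    ("SECT", "TITLE"): {"field": "SectionTitle", "relevance": "high", "toc_heading": True},
}

def apply_indexsheet(xml_text):
    """Return (postings, toc): postings are (term, field, weight) triples;
    toc lists the heading of each toc-section element in document order."""
    postings, toc = [], []

    def add_text(text, field, weight):
        for term in (text or "").lower().split():
            postings.append((term, field, weight))

    def walk(elem, parent_tag, field, weight):
        rule = RULES.get((parent_tag, elem.tag)) or RULES.get((None, elem.tag))
        if rule:
            field = rule["field"]
            weight = WEIGHTS[rule.get("relevance", "normal")]
            if rule.get("toc_section"):
                toc.append(None)  # placeholder until the section's heading is seen
            if rule.get("toc_heading") and toc and toc[-1] is None:
                toc[-1] = " ".join(elem.itertext())
        add_text(elem.text, field, weight)
        for child in elem:
            walk(child, elem.tag, field, weight)  # children inherit field and weight
            add_text(child.tail, field, weight)   # tail text belongs to elem

    walk(ET.fromstring(xml_text), None, None, WEIGHTS["normal"])
    return postings, toc

def score(postings, term):
    """Hypothetical ranking: sum the weights of every posting for term."""
    return sum(w for t, _, w in postings if t == term.lower())

SAMPLE = """<PAPER ID="young" PDF="YES">
  <TITLE>Electronic Information Commerce</TITLE>
  <TRACK ID="publishing">Publishing with XML</TRACK>
  <SESSION ID="publishing-6">Information Commerce</SESSION>
  <PRES ID="young" TYPE="PPT">Electronic Information Commerce</PRES>
  <AUTHOR ID="YoungRussel">Russel W. Young</AUTHOR>
  <SECT>
    <TITLE>Introduction</TITLE>
    <SUBSECT1>
      <TITLE>Electronic Commerce</TITLE>
      <PARA>Electronic commerce means...</PARA>
    </SUBSECT1>
  </SECT>
</PAPER>"""

postings, toc = apply_indexsheet(SAMPLE)
print(toc)                             # ['Introduction']
print(score(postings, "information"))  # 8
```

Here "information" scores 4 + 3 + 1 = 8: it occurs once each in the paper title (highest), the session title (higher), and the PRES element, which no rule matches (normal). The table of contents has a single entry because only SECT carries toc-section.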
Indexing element attribute values |
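```xml
<!-- Index the ID attribute of TRACK in the field "TRACK-ID" -->
<xsl:template match='TRACK'>
  <lp:index-attribute name="ID" field="TRACK-ID"/>
</xsl:template>

<!-- Index the content attribute of META in the field named by its name attribute -->
<xsl:template match='META'>
  <lp:index-attribute name="content" field-name-attribute="name"/>
</xsl:template>

<!-- e.g. <SPAN class="SpanClass"> indexes the span with field "SpanClass" -->
<xsl:template match='SPAN'>
  <lp:index field-name-attribute="class">
    <xsl:apply-templates/>
  </lp:index>
</xsl:template>
```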
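A rough Python equivalent of the TRACK, META, and SPAN rules, assuming the semantics suggested by the attribute names: field-name-attribute names the attribute whose value becomes the field name. The function `index_attributes` and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

def index_attributes(xml_text):
    """Apply the TRACK, META, and SPAN rules to xml_text (assumed semantics).

    Returns {field name: [values]}; a hypothetical helper, not the LivePublish API."""
    fields = defaultdict(list)
    for elem in ET.fromstring(xml_text).iter():
        if elem.tag == "TRACK" and "ID" in elem.attrib:
            # lp:index-attribute name="ID" field="TRACK-ID"
            fields["TRACK-ID"].append(elem.get("ID"))
        elif elem.tag == "META" and "name" in elem.attrib and "content" in elem.attrib:
            # lp:index-attribute name="content" field-name-attribute="name"
            fields[elem.get("name")].append(elem.get("content"))
        elif elem.tag == "SPAN" and "class" in elem.attrib:
            # lp:index field-name-attribute="class": the span's text, field named by class
            fields[elem.get("class")].append("".join(elem.itertext()))
    return dict(fields)

DOC = ('<doc><META name="keywords" content="XML, indexing"/>'
       '<TRACK ID="publishing">Publishing with XML</TRACK>'
       '<p>Price: <SPAN class="SpanClass">9.95</SPAN></p></doc>')

print(index_attributes(DOC))
# {'keywords': ['XML, indexing'], 'TRACK-ID': ['publishing'], 'SpanClass': ['9.95']}
```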
Protecting valuable markup from unauthorized reuse |
Case-insensitive comparisons |
Indexing objects |
Applied to element |
These indexing objects apply to the whole element in the normal XSLT fashion. Because an element includes all of its children, "applied to element" means applied to all of its children, and in particular to its text nodes.
Applies only to tag |
These indexing objects affect only the begin and end tags rather than the whole element, which means they do not apply to the element's children, including its text.
break-word:
(yes|no) The default is "no".
Used to explicitly break words at the begin and end tags, because some element tags break words for indexing purposes and some don't. If the text has whitespace around every word in addition to the tags, this option is not needed. However, if the stylesheet or browser "knows" that a specific tag breaks words even though there is no surrounding whitespace, the text will display correctly, but the indexer has no way of knowing whether the tag breaks words unless the break-word attribute is specified. For example:

```xml
<BigFont>A</BigFont> is for <BigFont>A</BigFont>pple
```

Because the BigFont tags do not break words, "<BigFont>A</BigFont>pple" is indexed as one term: Apple. However, there are times when it is desirable to have tags break words. For example:

```xml
<Letter>A</Letter><Word>Apple</Word>
<Letter>B</Letter><Word>Bat</Word>
<Letter>C</Letter><Word>Cat</Word>
```

This example is indexed as Aapple, Bbat, Ccat. To have the text indexed as A Apple, B Bat, C Cat, set the break-word attribute to "yes" on the lp:index element in the rule that matches the Word element.
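The effect of break-word can be sketched in Python. This is a simplified model, not the indexer itself: the function `index_terms`, the wrapper `<p>` element (needed to make each example well-formed XML), and the lowercasing of terms (standing in for case-insensitive indexing) are assumptions.

```python
import re
import xml.etree.ElementTree as ET

def index_terms(xml_text, break_word_tags):
    """Tokenize the document text into lowercase terms (hypothetical helper).

    Tags listed in break_word_tags act like whitespace at their begin and
    end tags; every other tag is transparent, so text on both sides joins."""
    pieces = []

    def walk(elem):
        breaks = elem.tag in break_word_tags
        if breaks:
            pieces.append(" ")          # begin tag breaks the word
        pieces.append(elem.text or "")
        for child in elem:
            walk(child)
            pieces.append(child.tail or "")
        if breaks:
            pieces.append(" ")          # end tag breaks the word

    walk(ET.fromstring(xml_text))
    return re.findall(r"\w+", "".join(pieces).lower())

BIGFONT = "<p><BigFont>A</BigFont> is for <BigFont>A</BigFont>pple</p>"
LETTERS = ("<p><Letter>A</Letter><Word>Apple</Word>\n"
           "<Letter>B</Letter><Word>Bat</Word>\n"
           "<Letter>C</Letter><Word>Cat</Word></p>")

print(index_terms(BIGFONT, set()))     # ['a', 'is', 'for', 'apple']
print(index_terms(LETTERS, set()))     # ['aapple', 'bbat', 'ccat']
print(index_terms(LETTERS, {"Word"}))  # ['a', 'apple', 'b', 'bat', 'c', 'cat']
```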
remove:
(yes|no) Removes the tag from the document. Used with HTML to remove custom tags, or to protect valuable markup for security reasons.
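Because remove applies only to the tag, the element's text and children should survive in place. A minimal Python sketch of that unwrapping, using the hypothetical helper `remove_tags` and an invented `custom` tag; it illustrates the idea and is not NextPage's code.

```python
import xml.etree.ElementTree as ET

def _append_text(parent, index, text):
    """Append text just before parent[index] (to parent.text or a sibling's tail)."""
    if not text:
        return
    if index == 0:
        parent.text = (parent.text or "") + text
    else:
        prev = parent[index - 1]
        prev.tail = (prev.tail or "") + text

def remove_tags(elem, names):
    """Unwrap descendants of elem whose tag is in names (hypothetical helper).

    Only the begin and end tags are dropped; the removed element's text,
    children, and tail stay where they were, in the same order."""
    i = 0
    while i < len(elem):
        child = elem[i]
        remove_tags(child, names)             # unwrap deeper levels first
        if child.tag not in names:
            i += 1
            continue
        kids = list(child)
        _append_text(elem, i, child.text)     # text that preceded the first kid
        elem.remove(child)
        for offset, kid in enumerate(kids):   # hoist the children up one level
            elem.insert(i + offset, kid)
        _append_text(elem, i + len(kids), child.tail)  # tail follows the last kid
        i += len(kids)

root = ET.fromstring("<p>Buy <custom>cheap <b>widgets</b> now</custom>!</p>")
remove_tags(root, {"custom"})
print(ET.tostring(root, encoding="unicode"))  # <p>Buy cheap <b>widgets</b> now!</p>
```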
Indexing element attribute values |
lp:index-attribute element attributes
Conclusion |
This paper has shown why indexing should be determined per element type rather than applied the same way to all elements. The theory, syntax, and examples of XIL “Indexsheets” have been presented. I have tried to show how XIL Indexsheets can be a very powerful tool for information publishing.
Since I'm sure that NextPage is not the only company that has needed this functionality, I am interested in hearing how others have solved the same problem. Currently there is only one implementation of XIL indexsheets. However, with input and collaboration from other organizations, I think XIL could be developed into a useful standard. I would be interested in hearing from organizations or individuals interested in pursuing research in this area.