An XML-Based Interchange Format for EXPRESS-Driven Data   Table of contents   Indexes   Trying not to get lost with a topic map

 

Mass-customizing electronic journals

Vicente Luque Centeno
Universidad Carlos III de Madrid, Área Ingeniería Telemática, Dept. Tecnologías de las Comunicaciones, Avda de la Universidad, 30
Leganés, Madrid, Spain E-28911
Email: vlc@it.uc3m.es Web: http://www.it.uc3m.es/~per
 

Mª Carmen Fernández Panadero
Universidad Carlos III de Madrid, Área Ingeniería Telemática, Dept. Tecnologías de las Comunicaciones, Avda de la Universidad, 30
Leganés, Madrid, Spain E-28911
Email: mcfp@it.uc3m.es Web: http://www.it.uc3m.es/~per
 

Carlos Delgado Kloos
Universidad Carlos III de Madrid, Área Ingeniería Telemática, Dept. Tecnologías de las Comunicaciones, Avda de la Universidad, 30
Leganés, Madrid, Spain E-28911
Email: cdk@it.uc3m.es Web: http://www.it.uc3m.es/~per
 
Biographical notice:
 
Carlos Delgado Kloos is Full Professor of Telematics Engineering at the Carlos III University of Madrid. His present interests include, among others, electronic publishing, tele-education and e-commerce. He has been and is presently involved in many projects with European (Esprit), national (Spanish Ministry) and bilateral (Spanish-German and Spanish-French) funding. He has published over 50 articles in national and international conferences and journals. He has also written one book and co-edited three.

He holds or has held various posts in national and international bodies, such as vice-president of IFIP TC 10, secretary of IFIP WG 10.5, editor of the Springer journal 'Formal Aspects of Computing', subdirector of Telecommunication Engineering at his university and manager of the National Programme for Information and Communication Technologies at the Spanish Ministry. He has been a programme committee member or chair at more than 30 conferences and workshops, among others as vice-chair of the IFIP'92 World Computer Congress.
Andrés Marín López
Universidad Carlos III de Madrid, Área Ingeniería Telemática, Dept. Tecnologías de las Comunicaciones, Avda de la Universidad, 30
Leganés, Madrid, Spain E-28911
Email: amarin@it.uc3m.es Web: http://www.it.uc3m.es/~per
 

Carlos García Rubio
Universidad Carlos III de Madrid, Área Ingeniería Telemática, Dept. Tecnologías de las Comunicaciones, Avda de la Universidad, 30
Leganés, Madrid, Spain E-28911
Email: cgr@it.uc3m.es Web: http://www.it.uc3m.es/~per
 

Luis Sánchez Fernández
Universidad Carlos III de Madrid, Área Ingeniería Telemática, Dept. Tecnologías de las Comunicaciones, Avda de la Universidad, 30
Leganés, Madrid, Spain E-28911
Email: luis@it.uc3m.es Web: http://www.it.uc3m.es/~per
 

Arturo García Ares
Universidad Carlos III de Madrid, Área Ingeniería Telemática, Dept. Tecnologías de las Comunicaciones, Avda de la Universidad, 30
Leganés, Madrid, Spain E-28911
Email: agarcia@it.uc3m.es Web: http://www.it.uc3m.es/~per
 
 
ABSTRACT:
 
The evolution of the WWW has made it possible to put information at the fingertips of the whole world with very little effort. As the amount of available information grows, there is an ever-increasing demand for personalized information. In this paper, we present some ideas that we are developing in the project "El Periotrónico", where we take a new approach to electronic newspapers: we take advantage of new Web technologies to personalize both a newspaper's content and its interface layout according to users' preferences.
 

Introduction

 
The current Web chaos can be considered a consequence of HTML. HTML allows content to be separated from presentation (CSS) and behaviour (Java and JavaScript files), but it is not rich enough to describe the logical structure of a document and does not take full advantage of processing capabilities on the client side. New Web technologies such as XML, XSL, XLL, DOM, Java and JavaScript solve some of these problems.
 
One of our main design decisions is to use XML to define our own markup language, JML (Journalism Markup Language), to properly tag the journal content, its logical structure, and its metadata. This makes news articles self-describing, which in turn allows more precise search criteria to be applied.
 
Likewise, we are defining JPML (Journalism Preferences Markup Language), also based on XML, to specify the user's interests. The reader of a newspaper indicates his/her preferred topics. These are saved in a JPML document so that the system shows him/her only those JML news items that match his/her preferences.
 
The introduction of metadata into the news media affects the way news is created, selected and retrieved. Journalists are no longer constrained by the physical space of the printed newspaper and have new ways to present information (multimedia content). Journalists need to indicate the importance level of every news element depending on the reader's characteristics. New tags and attributes are needed for highlighting text in a personalized manner, defining target readers and indicating the level of importance the journalist assigns.
 
Since journalists have to include JML tags in their articles, they should be provided with a JML editor. New IBM tools for XML include a program that generates an XML editor adapted to a user-defined DTD. The automatically generated JML editor can be extended with JDBC routines that insert the JML documents into an SQL database. The editor can also manage images for illustration and advertising.
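The paper implements this storage step with JDBC routines in Java. The Python sketch below only illustrates the idea, using the standard `sqlite3` module as a stand-in for JDBC and an SQL server: each JML article's metadata is stored in separate columns, so a later query can combine criteria (here author and place) instead of searching free text. The table name `jml_news`, its columns and the helper names are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical schema (not from the paper): metadata in separate columns,
# plus the verbatim JML text so the article can be re-rendered later.
SCHEMA = """
CREATE TABLE IF NOT EXISTS jml_news (
    id       INTEGER PRIMARY KEY AUTOINCREMENT,
    author   TEXT,
    place    TEXT,
    pub_date TEXT,
    title    TEXT,
    source   TEXT NOT NULL
)
"""


def _value_attr(root, tag):
    """Return the 'value' attribute of an optional empty JML element, or None."""
    element = root.find(tag)
    return element.get("value") if element is not None else None


def store_jml(conn, jml_text):
    """Parse a JML article and insert its metadata and raw text.

    sqlite3 stands in for the JDBC routines described in the paper."""
    root = ET.fromstring(jml_text)
    conn.execute(
        "INSERT INTO jml_news (author, place, pub_date, title, source) "
        "VALUES (?, ?, ?, ?, ?)",
        (
            _value_attr(root, "JML_AUTHOR"),
            _value_attr(root, "JML_PLACE"),
            _value_attr(root, "JML_DATE"),
            root.findtext("JML_TITLE"),
            jml_text,
        ),
    )
    conn.commit()


# Usage: combine two metadata criteria instead of a free-text search.
conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
store_jml(conn, '<JML><JML_AUTHOR value="Maruja Torres"/>'
                '<JML_PLACE value="Madrid"/><JML_TITLE>This is the title</JML_TITLE>'
                '<JML_BODY>Body</JML_BODY></JML>')
rows = conn.execute(
    "SELECT title FROM jml_news WHERE author = ? AND place = ?",
    ("Maruja Torres", "Madrid"),
).fetchall()
print(rows)  # [('This is the title',)]
```

Keeping the raw document next to the extracted columns means the structured fields serve searching while the original JML remains available for rendering.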
 
The next sections describe the benefits of using an XML technology such as XSL in the journal generation process, some journal personalization details used in our project, and JML and JPML, our proposed XML applications for news markup and personalization markup. Finally, some details about the joint evolution of Web technology and digital TV, together with some conclusions and future work, are presented.
 

Journal generation

 
While XML defines the logical structure of the news, XSL specifies how each kind of news item is formatted. One of the main advantages of XSL over CSS is that it allows a transformation step to be specified before formatting, so that not only the format but also the physical structure of the output can differ. With the same XML document but different style sheets, we can generate different versions of the newspaper whose structure and formatting match different information spaces (printed version, online version for broadband networks, online version for networks with lower bandwidth, etc.). In particular, we can convert XML documents to HTML, SMIL, and perhaps in the future to MHEG for display on a TV set via a set-top box. After the transformation step, style sheet rules specify the format of the document.
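The Python standard library has no XSLT processor, so the sketch below only mimics the transformation step in plain Python; it is not XSL. Two hypothetical "style sheet" functions take the same parsed JML article (element names taken from the JML DTD given in this paper) and emit HTML with different physical structures: one keeps the whole body, the other keeps only headline and abstract, as a low-bandwidth edition might. The function names and the HTML produced are assumptions.

```python
import xml.etree.ElementTree as ET
from html import escape


def _value_attr(root, tag):
    """Return the 'value' attribute of an optional JML metadata element ('' if absent)."""
    element = root.find(tag)
    return element.get("value", "") if element is not None else ""


def full_edition(root):
    """Hypothetical 'broadband' style sheet: headline, byline and full body."""
    title = escape(root.findtext("JML_TITLE", ""))
    byline = escape(f"{_value_attr(root, 'JML_AUTHOR')}, {_value_attr(root, 'JML_PLACE')}")
    parts = [f"<h1>{title}</h1>", f"<p class='byline'>{byline}</p>"]
    for paragraph in root.iterfind("JML_BODY/P"):
        # Inline <B>/<I> markup is flattened to plain text to keep the sketch short.
        text = escape("".join(paragraph.itertext()).strip())
        parts.append(f"<p>{text}</p>")
    return "\n".join(parts)


def low_bandwidth_edition(root):
    """Hypothetical 'low-bandwidth' style sheet: the structure is reduced to
    headline plus abstract, and the body is dropped entirely."""
    title = escape(root.findtext("JML_TITLE", ""))
    abstract = escape(root.findtext("JML_ABSTRACT", ""))
    return f"<h2>{title}</h2>\n<p>{abstract}</p>"


article = ET.fromstring(
    "<JML><JML_AUTHOR value='Maruja Torres'/><JML_PLACE value='Madrid'/>"
    "<JML_TITLE>This is the title</JML_TITLE>"
    "<JML_ABSTRACT>This is the abstract</JML_ABSTRACT>"
    "<JML_BODY><P importance_level='general'>This <B>is</B> the body</P></JML_BODY></JML>"
)
print(full_edition(article))
print(low_bandwidth_edition(article))
```

The point of the illustration is that the source document stays the same while each "style sheet" decides both what survives (structure) and how it looks (format).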

Figure: Journal generation from XML format

 
 

Personalization

 
Personalization applies not only to style and layout, but also to content. We have implemented a personalization agent, written in JavaScript, that customizes the news content on the client side. Readers can subscribe to different sections. Every section contains a list of references to the news articles published in that section. The personalization agent highlights headlines according to the reader's preferences.
 
Although the reader can specify his/her preferences statically in a form, Web technology also allows dynamic personalization. The object-oriented model of XML documents, exposed through the DOM standard, is an excellent way to structure information for further processing by languages such as Java or JavaScript. These languages allow the document to interact with the reader: the system can automatically observe the user's behaviour and analyze it to modify the configuration parameters dynamically.
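The paper does not say how the reader's behaviour is analyzed. Purely as a hypothetical illustration (in Python rather than the JavaScript the project uses), the sketch below counts how often the reader opens articles from each section and promotes a section into the highlighted set once it passes a threshold. The class name, the threshold and the promotion policy are all assumptions.

```python
from collections import Counter


class BehaviourTracker:
    """Hypothetical dynamic-personalization helper (not from the paper).

    Counts article openings per section and promotes frequently read
    sections into the reader's highlight preferences."""

    def __init__(self, highlighted=None, threshold=3):
        self.highlighted = set(highlighted or ())  # sections chosen statically in a form
        self.threshold = threshold                 # openings needed before promotion
        self.openings = Counter()

    def record_opening(self, section):
        """Call whenever the reader opens an article from `section`."""
        self.openings[section] += 1
        if self.openings[section] >= self.threshold:
            self.highlighted.add(section)

    def should_highlight(self, section):
        return section in self.highlighted


tracker = BehaviourTracker(highlighted={"sports"}, threshold=2)
tracker.record_opening("finances")
tracker.record_opening("finances")
print(tracker.should_highlight("finances"))  # True
```

A real agent would probably also decay old counts so that preferences can fade; that refinement is left out here.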
 

Journalism Markup Language (JML)

 
The purpose of this markup language is to properly tag the journal's contents and its metadata so that four different aims can be achieved.
 
  1.  News articles may be "self-described" so that they can be properly handled in the personalization process.
  2.  The news archive can be queried by combining matching criteria, producing refined results rather than every news article that merely contains the search term somewhere in its text.
  3.  Different style rules can be applied to the same document, so that it can be viewed with a different layout in a personalized manner.
  4.  Journalists need a way to indicate the importance they assign to each news element, possibly depending on the kind of reader. New tags and attributes are needed for highlighting text in a personalized manner, defining target readers and indicating the level of importance the journalist assigns.
 
Figure KLO-002 shows a reduced version of the JML DTD grammar, and Figure KLO-003 shows a small example of a news article tagged in JML.

<!ELEMENT JML (JML_AUTHOR, JML_PLACE?, JML_DATE?,
	JML_TITLE, JML_ABSTRACT?, JML_BODY)>

<!ELEMENT JML_AUTHOR EMPTY>
<!ATTLIST JML_AUTHOR value CDATA #IMPLIED>

<!ELEMENT JML_PLACE EMPTY>
<!ATTLIST JML_PLACE value CDATA #IMPLIED>

<!ELEMENT JML_DATE EMPTY>
<!ATTLIST JML_DATE value CDATA #IMPLIED>

<!ELEMENT JML_TITLE (#PCDATA)>

<!ELEMENT JML_ABSTRACT (#PCDATA)>

<!ELEMENT JML_BODY (#PCDATA|P)*>

<!ELEMENT P (#PCDATA|B|I)*>
<!ATTLIST P importance_level CDATA #IMPLIED>

<!ELEMENT B (#PCDATA)*>
<!ELEMENT I (#PCDATA)*>
Figure KLO-002: JML DTD grammar
 

<?xml version="1.0"?>
<!DOCTYPE JML SYSTEM "jml.dtd">
<JML>
<JML_AUTHOR value="Maruja Torres"/>
<JML_PLACE value="Madrid"/>
<JML_DATE value="09-06-1998"/>
<JML_TITLE>This is the title</JML_TITLE>
<JML_ABSTRACT>This is the abstract</JML_ABSTRACT>
<JML_BODY>
<P importance_level="general">
This <B>is</B> the body</P>
</JML_BODY>
</JML>
Figure KLO-003: Example of JML document
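Because JML articles are self-describing, their metadata can be read directly rather than guessed from free text. The Python sketch below uses the standard `xml.etree.ElementTree` module to extract the metadata of an article like the one in Figure KLO-003 and to select body paragraphs by `importance_level`. It is a minimal illustration, not the project's code: ElementTree does not validate against `jml.dtd`, the function names are hypothetical, and keeping paragraphs that lack `importance_level` is an assumed policy (the DTD declares the attribute `#IMPLIED`, and the example only shows the value `general`).

```python
import xml.etree.ElementTree as ET

# Empty JML elements that carry their data in a 'value' attribute.
VALUE_FIELDS = {"JML_AUTHOR": "author", "JML_PLACE": "place", "JML_DATE": "date"}


def jml_metadata(root):
    """Collect the metadata of a parsed <JML> element into a dict."""
    meta = {}
    for tag, name in VALUE_FIELDS.items():
        element = root.find(tag)
        if element is not None:  # JML_PLACE and JML_DATE are optional
            meta[name] = element.get("value")
    meta["title"] = root.findtext("JML_TITLE")
    meta["abstract"] = root.findtext("JML_ABSTRACT")  # None when absent
    return meta


def paragraphs_for(root, levels):
    """Return the plain text of body paragraphs whose importance_level is in `levels`.

    Paragraphs without the attribute are kept (assumed policy)."""
    selected = []
    for paragraph in root.iterfind("JML_BODY/P"):
        level = paragraph.get("importance_level")
        if level is None or level in levels:
            selected.append("".join(paragraph.itertext()).strip())
    return selected


article = ET.fromstring(
    '<JML><JML_AUTHOR value="Maruja Torres"/><JML_DATE value="09-06-1998"/>'
    "<JML_TITLE>This is the title</JML_TITLE><JML_BODY>"
    '<P importance_level="general">This <B>is</B> the body</P></JML_BODY></JML>'
)
print(jml_metadata(article))
print(paragraphs_for(article, {"general"}))  # ['This is the body']
```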
 
 

Journalism Personalization Markup Language (JPML)

 
JPML has been defined to specify the user's interests. Preferences determine how headlines are shown (highlighted, collapsed, inline, linked, ...). However, the reader can also make explicit requests that do not match the preferences. Figure KLO-004 shows a simple example of a reader's preferences, and Figure KLO-005 specifies the DTD grammar for this markup language.

<?xml version="1.0"?>
<!DOCTYPE JPML SYSTEM "jpml.dtd">
<JPML>
<RULE>
<ATOM key="keyword" value="euro"/>
<ATOM key="section" value="finances" negated="true"/>
</RULE>
<RULE>
<ATOM key="keyword" value="Real Madrid"/>
<ATOM key="keyword" value="Champions League"/>
</RULE>
<RULE>
<ATOM key="author" value="Clark Kent"/>
<ATOM key="keyword" value="Ecology"/>
</RULE>
</JPML>
Figure KLO-004: JPML example
 

<!ELEMENT JPML (RULE)*>

<!ENTITY % match "(starts_with|ends_with|substring|fullword|is_equal_to)" >
<!ELEMENT RULE (ATOM)*>
<!ATTLIST RULE
enabled (true|false) "true"
description CDATA #IMPLIED
action CDATA #IMPLIED
>

<!ELEMENT ATOM EMPTY>
<!ATTLIST ATOM
key CDATA #REQUIRED
value CDATA #REQUIRED
ignorecase (true|false) "true"
ignoreaccents (true|false) "true"
negated (true|false) "false"
matching %match; "substring"
>
Figure KLO-005: JPML DTD
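A non-validating parser such as Python's `xml.etree.ElementTree` does not read `jpml.dtd`, so the attribute defaults declared in the DTD above (`enabled="true"`, `ignorecase="true"`, `matching="substring"` and so on) are not filled in automatically. The sketch below applies them by hand while loading a JPML document into plain Python objects; the `Atom` and `Rule` classes and `load_jpml` are hypothetical names.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List, Optional

# Attribute defaults copied from the ATTLIST declarations of the JPML DTD.
ATOM_DEFAULTS = {"ignorecase": "true", "ignoreaccents": "true",
                 "negated": "false", "matching": "substring"}
MATCH_MODES = {"starts_with", "ends_with", "substring", "fullword", "is_equal_to"}


@dataclass
class Atom:
    key: str
    value: str
    ignorecase: bool
    ignoreaccents: bool
    negated: bool
    matching: str


@dataclass
class Rule:
    enabled: bool
    description: Optional[str]
    action: Optional[str]
    atoms: List[Atom] = field(default_factory=list)


def _is_true(flag):
    """The DTD restricts boolean attributes to the strings 'true' and 'false'."""
    return flag == "true"


def load_jpml(text):
    """Parse a JPML document, filling in the DTD defaults by hand."""
    rules = []
    for rule_el in ET.fromstring(text).iterfind("RULE"):
        rule = Rule(enabled=_is_true(rule_el.get("enabled", "true")),
                    description=rule_el.get("description"),
                    action=rule_el.get("action"))
        for atom_el in rule_el.iterfind("ATOM"):
            attrs = {**ATOM_DEFAULTS, **atom_el.attrib}
            if attrs["matching"] not in MATCH_MODES:
                raise ValueError(f"unknown matching mode: {attrs['matching']}")
            # key and value are #REQUIRED, so a missing one raises KeyError here.
            rule.atoms.append(Atom(key=attrs["key"], value=attrs["value"],
                                   ignorecase=_is_true(attrs["ignorecase"]),
                                   ignoreaccents=_is_true(attrs["ignoreaccents"]),
                                   negated=_is_true(attrs["negated"]),
                                   matching=attrs["matching"]))
        rules.append(rule)
    return rules
```

Explicit attributes in the document override the defaults because they are merged last into `attrs`.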
 
 
The ATOM (condition) attributes have the following meaning:
 
  •  key: defines the name of the metadata field to which the matching criteria are applied. Possible values for this attribute are title, section, author, keywords, date, source, ...
  •  value: defines a user-specified value that is compared against the value of the key metadata field.
  •  ignorecase: defines whether values are folded to uppercase before being compared, so that differences in case are ignored.
  •  ignoreaccents: defines whether orthographic accents are ignored in the comparison.
  •  negated: reverses the condition.
  •  matching: defines the criterion applied between value and the key field's value. Although the least restrictive criterion, "substring", is applied by default, other criteria can be specified, such as "fullword" for matching whole words, or "starts_with" and "ends_with", which require the specified value to be found at the beginning or the end of the metadata field. This allows people to search for and/or highlight headlines whose author is Clark Kent, whose title starts with Clinton or whose keywords contain Iraq.
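The matching semantics listed above can be sketched in Python as follows. This is an illustration under stated assumptions, not the project's agent: accents are removed through Unicode NFD decomposition (standard `unicodedata` module), `ignorecase` folds both sides to uppercase as described, and `fullword` is taken to mean that the value is not adjacent to other word characters. `is_equal_to`, which the DTD declares but the text does not describe, is assumed to require the whole field to equal the value. The function name `atom_matches` is hypothetical.

```python
import re
import unicodedata


def _strip_accents(text):
    """Drop combining marks after NFD decomposition (e.g. 'í' becomes 'i')."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def atom_matches(field_value, value, matching="substring",
                 ignorecase=True, ignoreaccents=True, negated=False):
    """Evaluate one JPML ATOM condition against a metadata field value.

    Keyword defaults mirror the attribute defaults of the JPML DTD."""
    if ignoreaccents:
        field_value, value = _strip_accents(field_value), _strip_accents(value)
    if ignorecase:
        field_value, value = field_value.upper(), value.upper()

    if matching == "substring":
        result = value in field_value
    elif matching == "starts_with":
        result = field_value.startswith(value)
    elif matching == "ends_with":
        result = field_value.endswith(value)
    elif matching == "is_equal_to":  # assumed: exact match of the whole field
        result = field_value == value
    elif matching == "fullword":
        # The value must not be glued to other word characters on either side.
        pattern = r"(?<!\w)" + re.escape(value) + r"(?!\w)"
        result = re.search(pattern, field_value) is not None
    else:
        raise ValueError(f"unknown matching mode: {matching}")

    return not result if negated else result


print(atom_matches("Clinton visits Madrid", "clinton", matching="starts_with"))  # True
```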
 
In addition, rules define the following attributes:
 
  •  enabled: for enabling/disabling the rule.
  •  description: a short user's description for that rule.
  •  action: the action to be performed when the rule is activated. Possible values for this attribute are highlight, iconify, hide and open in full window.
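The text does not state how the atoms of a rule are combined. The first rule of Figure KLO-004 (keyword euro, section not finances) reads most naturally as a conjunction, so the Python sketch below assumes that every atom of an enabled rule must hold, and it collects the actions of all rules that fire for a headline. To stay short it replaces the full matching options with a plain case-insensitive substring test; the headline dictionary layout, the `SimpleRule` class and the default action `highlight` are assumptions.

```python
from typing import List, NamedTuple, Optional, Tuple


# Hypothetical in-memory form of a JPML <RULE> (not the paper's data model).
class SimpleRule(NamedTuple):
    atoms: List[Tuple[str, str, bool]]  # (key, value, negated) for each ATOM
    enabled: bool = True
    action: Optional[str] = None


def rule_fires(rule, headline):
    """True if the rule is enabled and every atom holds (assumed AND semantics).

    Each atom uses a simplified case-insensitive substring test; a rule with
    no atoms therefore fires for every headline."""
    if not rule.enabled:
        return False
    for key, value, negated in rule.atoms:
        hit = value.casefold() in headline.get(key, "").casefold()
        if hit == negated:  # a plain atom must hit, a negated atom must miss
            return False
    return True


def actions_for(rules, headline, default_action="highlight"):
    """Actions of all firing rules, in document order (default action assumed)."""
    return [rule.action or default_action
            for rule in rules if rule_fires(rule, headline)]


# Rules modelled on Figure KLO-004; the disabled flag and the iconify action
# are added for illustration only.
rules = [
    SimpleRule(atoms=[("keyword", "euro", False), ("section", "finances", True)]),
    SimpleRule(atoms=[("keyword", "Real Madrid", False),
                      ("keyword", "Champions League", False)], enabled=False),
    SimpleRule(atoms=[("author", "Clark Kent", False), ("keyword", "Ecology", False)],
               action="iconify"),
]
headline = {"section": "science", "author": "Clark Kent", "keyword": "ecology, rivers"}
print(actions_for(rules, headline))  # ['iconify']
```

Returning every firing action, rather than only the first, leaves the conflict policy (for example, hide versus highlight) to the presentation layer.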
 

Integration of Web Technology in Digital Television

 
There is currently much activity around the integration of Web-based technology into digital television. This integration offers advantages both to Internet content providers and to digital television companies. Digital television makes it possible to integrate audio, video and data in real time, and it provides processing capabilities at the customer's location by means of set-top boxes. This opens the possibility of offering customers new services, including interactive television, and these new services could be based on Web technology. Internet content providers can thus reach a large number of potential customers, many of whom do not use the Internet and therefore cannot be reached through it.
 
Examples of applications that could be offered include access to the Internet via digital television and the use of Web technology for annotating broadcast television. In the first case, part of the available bandwidth is shared among all customers for Internet access. In the second case, Web technology gives the enhanced television content a certain degree of interactivity.
 
Among the different initiatives currently being carried out with respect to "television and the Web", we can cite the following. MHEG is a family of ISO standards that deal with the coding of hypermedia content. It includes the definition of multimedia objects, a declarative language for presenting multimedia content, and a scripting language for data processing in MHEG applications.
 
The Advanced Television Enhancement Forum (ATVEF) is an industry group that includes many companies interested in interactive television, among them CNN, Disney, Intel and Microsoft. ATVEF is developing a specification that supports the presentation of so-called "HTML-enhanced" television content. It consists of program announcements, triggers that define the actions to take and the location of the content, and the multimedia content itself.
 
Finally, the World Wide Web Consortium has created a "Television and the Web" interest group, within which several activities around the integration of the Web and digital television are being carried out.
 
We are witnessing an explosion of multimedia and interactive services. This is causing a whole plethora of new standards to appear, and existing standards to try to adapt to the new media. HTML belongs to the former case, and JPEG and MPEG are examples of the latter. Given this scenario, the need to standardize a higher-level interface than the current standards naturally arises. MHEG, which stands for Multimedia and Hypermedia information coding Experts Group, is a set of standards under development by ISO that address the specification of platform-independent applications consisting of multimedia objects. Specifically, it focuses on:
 
  •  Synchronization in space and time of these multimedia objects
  •  User interaction via links and user interface elements such as menus, buttons and text entry fields
 
An MHEG application consists mainly of declarative code that describes the objects that make it up. The application code is stored on servers that hand it to requesting clients. Neither the models nor the applications that are likely to make use of MHEG objects are defined by the standards. Possible scenarios include periodic broadcasting of Near Video on Demand or on-demand downloading of an electronic education application. The encoding of multimedia content is not part of the standards either; it is assumed that existing standards such as MPEG or AVI will be used.
 
On the client side, an MHEG engine parses the declarative code, produces the required on-screen presentation and handles all user interaction. This engine should run on machines with minimal resources, such as set-top boxes. This is why CPU-hungry tasks such as 3D imaging have been left out of the initial standard. The low-resource constraint also means that MHEG is not restricted to Web browsers; instead, it is intended to serve as a basic encoding for multimedia/hypermedia presentations transferred between pairs of heterogeneous machines, one acting as the server and the other(s) acting as the client(s).
 
MHEG shares with HTML the declarative approach, but while HTML is inherently a document description language, MHEG takes on the job of describing multimedia/hypermedia applications.
 
Similar standardization efforts have been made by other groups. This is the case of SMIL, an application of XML targeted at the synchronization of video and audio. MHEG should eventually emerge as the leading technology in the field of multimedia presentations.
 
The presence of MHEG in the field of interactive TV is imminent, as it has been adopted by DAVIC, an international industry consortium whose purpose is to establish a common set of standards and protocols for the emerging digital interactive television.
 
The MHEG standard specifies the following notations to represent application components:
 
  •  ASN.1 - this was the first notation to be developed. Although application components are unambiguously expressed in ASN.1, it is not considered human-readable enough, so the following alternative notation was developed.
  •  Textual Notation - this notation was developed to overcome the problems of ASN.1. It adds no new features; there is a one-to-one mapping between the ASN.1 and textual notations.
  •  XML - currently under development, this notation is targeted at the Web world. It is expected to attract a wider user community due to the growing acceptance of XML. An earlier effort to define an SGML-based encoding was cancelled due to lack of resources.
 

Future work

 
The WWW evolves very fast. Though JML, our XML language for journalism, is currently used only on our server side, it seems clear that XML-capable Web browsers will appear within a few months. That will be the moment to use a style sheet so that JML can be visualized directly in the client's browser instead of being transformed into HTML on the server side. Backward compatibility will be achieved by maintaining an HTML version for older browsers, but for these readers the personalization services will lose the benefits of XML.
 
The recent DOM standardization is also a very important milestone: it will allow software agents implemented in JavaScript to run in any browser without platform-specific details, with much more flexibility and portability than current Dynamic HTML.
 
Acknowledgments
 
The work reported in this paper has been partially funded by the project TEL97-0788 of the Spanish CICYT. We wish to acknowledge fruitful discussions with our colleagues Peter T. Breuer, Pilar Diezhandino, Tony Hernández, Natividad Martínez, Tomás Nogales, A. Rodríguez de las Heras and Luis Sánchez of the Universidad Carlos III de Madrid. Useful assistance has been provided by El País Digital and Fundesco.
 
Bibliography
[Bib1] Tim Bray, Jean Paoli, and C. M. Sperberg-McQueen (eds.): Extensible Markup Language (XML) 1.0. W3C Recommendation, 10 February 1998. http://www.w3.org/TR/REC-xml
[Bib2] Krishna Bharat, Tomonari Kamba, and Michael Albers: Personalized, interactive news on the Web. Multimedia Systems 6: 349-358 (1998).
[Bib3] El Digital de Telepolis. http://www.telepolis.es
[Bib4] Titulares.com. http://www.titulares.com
[Bib5] Robert Joseph, Ph.D.: MHEG-5: An Overview.
[Bib6] MHEG Centre. http://www.mhegcentre.com
[Bib7] Advanced Television Enhancement Forum (ATVEF). http://www.atvef.com
[Bib8] World Wide Web Consortium: Television and the Web. http://www.w3.org/TV/
[Bib9] ISO/IEC 13522-5: Information technology - Coding of multimedia and hypermedia information - Part 5: Support for base-level interactive applications.
[Bib10] MHEG 98/1175. http://www.demon.co.uk/tcasey/
[Bib11] Robert Joseph, Ph.D.: MHEG-5: An Overview.
[Bib12] Marica Echiffre et al.: MHEG-5 - Aims, Concepts, and Implementation Issues. CSELT - Centro Studi e Laboratori Telecomunicazioni.
[Bib13] About MHEG - FAQ. http://www.mhegcentre.com
[Bib14] Valerie Thompson: MHEG Complements Interactive TV. http://www.byte.com/art/9702/sec17/art6.htm
