
 Metadata 
 

The role of standard metadata in a portal publishing system

Rothschild, Jay
 
 Jay  Rothschild
 Information Designer, Cahners Internet Group
  Cahners Business Information 
 Cambridge 
 Massachusetts 
 USA 
Cahners Business Information,  One Alewife Center
Cambridge  Massachusetts  02140 USA
Phone: 617.873.9474 Fax: 617.873.9450 email: jrothschild@cahners.com web site: www.cahners.com
 Biography
 Jay Rothschild is the Information Designer for the Cahners Internet Group, a part of Cahners Business Information. Jay has been working with electronic content management systems since the early days of the World Wide Web. Prior to coming to Cahners, he worked as an Information Systems Design consultant in Seattle.
 Abstract
 This paper discusses the impact of the Publishing Requirements for Industry Standard Metadata (PRISM) on editorial content exchange and syndication.
 

Standard metadata in portal publishing

 What is “metadata”? When and how should you begin to capture it? How much metadata do you need for a particular piece of content? How do you attach that metadata to the content it describes? How much of the metadata needs to travel with the content, and for how long? Exactly what metadata should you be capturing? These are some of the questions PRISM is working on, and these are precisely the questions Cahners is trying to answer as it makes the leap from being a print-based publishing company to a “new economy” electronic information company.
 XML 
 
For about a year now, Cahners has been using XML to facilitate the process of publishing its content electronically. Currently, this process begins with a QuarkXPress document (QuarkXPress is the layout and pagination tool Cahners uses for its print publications). Using a Quark extension developed for Cahners, a Web editor extracts the text portion of the document as well-formed XML, based on the “XPress Tags” markup that represents the named styles in the Quark document. The Web editor then uses an ASP script that employs the XMLDOM to convert the well-formed XML to valid XML, conforming to a DTD Cahners wrote specifically for its magazine content.
 ASP, Active Server Pages 
 DOM, Document Object Model 
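The conversion step described above can be illustrated with a minimal sketch. This is not the Cahners ASP/XMLDOM script; it uses Python's standard `xml.etree.ElementTree` instead, and the style names (`Headline`, `Byline`, `BodyText`), the target element names, and the sample story are all hypothetical. It also only restructures the document; checking the result against a DTD would need a validating parser, which the standard library does not provide.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from Quark style names to element names in a
# magazine DTD. The real Cahners style sheet and DTD are not shown in
# the paper, so these names are assumptions for illustration only.
STYLE_TO_ELEMENT = {
    "Headline": "title",
    "Byline": "byline",
    "BodyText": "para",
}


def convert(well_formed_xml):
    """Rebuild a style-tagged document as an <article> element."""
    source = ET.fromstring(well_formed_xml)
    article = ET.Element("article")
    for child in source:
        target_name = STYLE_TO_ELEMENT.get(child.tag)
        if target_name is None:
            continue  # styles with no mapping are dropped in this sketch
        target = ET.SubElement(article, target_name)
        target.text = (child.text or "").strip()
    return article


extracted = """<story>
  <Headline>Chip makers rally</Headline>
  <Byline>A. Writer</Byline>
  <BodyText>Demand for analog circuits rose.</BodyText>
  <PullQuote>Not mapped</PullQuote>
</story>"""

article = convert(extracted)
print(ET.tostring(article, encoding="unicode"))
```

The point of the mapping table is that the print styles carry enough structure to drive the conversion; anything the styles do not express still has to be added later by hand.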
 
Most of the metadata Cahners currently captures is stored as attributes of the root XML element in the valid document. The conversion process itself can capture some of this metadata from the Quark documents, to the extent that it is present there. The Web editor then adds the rest of the required metadata by hand using an XML editor. Cahners then uses that metadata to route articles onto the Web, to sort articles by topic or article type, and to filter articles for re-use rights when we syndicate or otherwise re-purpose an article or a complete issue.
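As a rough illustration of how metadata held on the root element can drive sorting and rights filtering, consider the sketch below. The attribute names (`topic`, `type`, `reuse`) and the sample articles are assumptions made for this example; the paper does not list the attributes the Cahners DTD actually defines.

```python
import xml.etree.ElementTree as ET

# Sample documents with metadata on the root element. The attribute
# names and values are hypothetical.
ARTICLES = [
    '<article topic="semiconductors" type="news" reuse="yes"><title>A</title></article>',
    '<article topic="semiconductors" type="feature" reuse="no"><title>B</title></article>',
    '<article topic="analog circuits" type="news" reuse="yes"><title>C</title></article>',
]


def root_metadata(xml_text):
    """Return the root element's attributes plus the article title."""
    root = ET.fromstring(xml_text)
    meta = dict(root.attrib)
    meta["title"] = root.findtext("title", default="")
    return meta


def syndicatable(records, topic):
    """Titles on a topic for which re-use rights are marked as held."""
    return [
        r["title"]
        for r in records
        if r.get("topic") == topic and r.get("reuse") == "yes"
    ]


records = [root_metadata(a) for a in ARTICLES]
titles = syndicatable(records, "semiconductors")
by_type = [r["title"] for r in sorted(records, key=lambda r: r["type"])]
```

Because every routing decision reads only the root attributes, the article body never has to be parsed in detail to decide where an article may go.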
 While we are probably ahead of the industry in our use of XML and metadata to re-publish and syndicate our content, we are not yet where we’d like to be, and our XML metadata still costs more in production effort than it should. For example, when an article is first written or copy-edited, it would be a simple matter for the author or copy editor to add some basic metadata to the article: the author’s name, the date it was created, the publisher, and perhaps the subject of the article. Later, as an article moves through its editing cycle, it would be useful to capture some of the other information that accrues about the article. For example, who owns the primary and secondary rights to the article text? Who created the illustrations and other graphics? Does Cahners own re-use rights for all images attached to the article? For what publication volume and issue was the article first written? Out of necessity, a Web editor manages to gather all of this information, but the process is much less efficient than it could be, because the Web editor doesn’t know the answers to many of those questions and must track them down.
 Beyond the issue of basic information is the question of context for an article. For a piece of content to be truly valuable to Cahners in an electronic arena, we need to be able to sort and re-collate that content in many different ways. We need to be able to search our content for specific pieces of information, and insert electronic “hooks” to capture connections between pieces of content. For example, Cahners may want to create a Web “portal” for all of its electronics industry titles. Within this portal, we want to let our customers view articles (or maybe even just pieces of articles) by subject matter (e.g., “semiconductors” or “analog circuits”), by content type (e.g., industry news, feature articles, commentary), or both. We want to send email notification to our readers when an article appears on a subject that they have identified as of particular interest, or when an article appears about a particular company or product.
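The portal views and reader notifications described above reduce to simple matching once the metadata exists. The following is a minimal sketch under assumed names: the `Article` and `Reader` records, their fields, and the sample data are hypothetical and do not represent a Cahners or PRISM schema.

```python
from dataclasses import dataclass, field


@dataclass
class Article:
    title: str
    subject: str
    content_type: str
    companies: set = field(default_factory=set)


@dataclass
class Reader:
    email: str
    subjects: set = field(default_factory=set)
    companies: set = field(default_factory=set)


def portal_view(articles, subject=None, content_type=None):
    """Filter by subject, by content type, or by both."""
    return [
        a
        for a in articles
        if (subject is None or a.subject == subject)
        and (content_type is None or a.content_type == content_type)
    ]


def readers_to_notify(article, readers):
    """Addresses of readers whose stated interests match the article."""
    return [
        r.email
        for r in readers
        if article.subject in r.subjects or (article.companies & r.companies)
    ]


articles = [
    Article("Fab expansion announced", "semiconductors", "industry news",
            {"Acme Semiconductor"}),
    Article("Op-amp design tips", "analog circuits", "feature"),
    Article("Where the chip market is headed", "semiconductors", "commentary"),
]
readers = [
    Reader("a@example.com", subjects={"semiconductors"}),
    Reader("b@example.com", companies={"Acme Semiconductor"}),
    Reader("c@example.com", subjects={"analog circuits"}),
]

semiconductor_news = portal_view(
    articles, subject="semiconductors", content_type="industry news"
)
notified = readers_to_notify(articles[0], readers)
```

The matching only works if subjects, content types, and company names are drawn from a shared vocabulary, which is the argument for standard metadata terms.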
 Or, on a more granular level, we may want to link a company name within an article to a profile of that company, or to a list of products that company currently offers. We may want to link a product name to a profile of the company that sells that product. From there, we may want to take the next logical step, and facilitate some sort of transaction between the reader and the company. This is, after all, the ultimate direction of so-called business-to-business (“B2B”) e-commerce on the Web.
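One way to picture the company-name linking is a lookup table of known names applied to the article text. The sketch below is only an illustration: the company names, profile paths, and the choice of a regular expression over plain text are assumptions, and a production system would more likely rely on names marked up in the XML itself.

```python
import html
import re

# Hypothetical company names and profile paths.
PROFILES = {
    "Acme Semiconductor": "/companies/acme-semiconductor",
    "Acme": "/companies/acme",
}


def link_companies(text, profiles):
    """Wrap each known company name in a link to its profile page.

    Assumes `text` is plain text with no markup of its own.
    """
    # Try longer names first so "Acme Semiconductor" wins over "Acme".
    names = sorted(profiles, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(n) for n in names) + r")\b"
    )

    def replace(match):
        name = match.group(1)
        href = html.escape(profiles[name], quote=True)
        return '<a href="{}">{}</a>'.format(href, html.escape(name))

    return pattern.sub(replace, text)


linked = link_companies("Acme Semiconductor announced a new amplifier.", PROFILES)
```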
 In order to do all of that, Cahners needs a format-neutral, centralized method for capturing metadata as content is created or acquired, and for storing that metadata in a way that makes it easy for people or programs to search and assemble our stored content. We need a workflow that allows content creators, editors and others to apply metadata to content efficiently, at the moment it first becomes available, or that even generates the metadata automatically. And finally, to accomplish everything outlined above, we also need a rich, comprehensive set of industry-standard metadata terms.
 To meet the first need, we are building an XML-based content management system that will give our editors workflow tools for creating content, adding metadata, and storing both in a centralized, searchable repository. Another set of tools will facilitate the creation and publication of portal Web sites, targeted email, content syndication, and other electronic information products. The metadata framework that will help power this system will come, we hope, from the PRISM standard. Because we’re on the cutting edge of the industry right now, working with PRISM is giving us the opportunity to leverage the findings of the Working Group within our current efforts, as well as shape the direction of the industry standard as it develops.
