
Bergeron, Donald
LEXIS Publishing & LEXIS-NEXIS Group
Miamisburg
 USA 
 
Donald L. Bergeron
 Consulting Software Engineer - Data Architecture
LEXIS Publishing & LEXIS-NEXIS Group
  9443 Springboro Pike Miamisburg (Ohio)  USA (45342)
Email: donald.bergeron@lexis-nexis.com Web site: www.lexis-nexis.com
 Biography
 Don Bergeron is a Consulting Software Engineer & Data Architect at LEXIS Publishing, part of the LEXIS-NEXIS Group. He is engaged in the integration of XML into the LEXIS-NEXIS systems architecture. He serves on the Data Architecture Board of LEXIS Publishing, which is focused on positioning XML data and legacy data for effective exploitation. He is the Data Architect and Publishing Product system designer behind Statistical Universe, winner of the October 1998 Database Magazine Editor's Choice Award. Don has also contributed in the solution partner arena as QA Manager for CCA's Model 204 database management system and for MITROL's MIMS product.
 

Preview: A Complex World

 Dynamic content providers are in a fast-paced and complex business. We must continually balance three competing demands: New Publishing Product Creation, Consistent Editorial Policy/Data Architecture, and a Cost Effective System Architecture. Solution providers to this community must partner with their clients, bringing both products and implementation approaches that allow clients to tune between these competing demands. Beyond this, many content providers have legacy data on a vast scale which must be successfully integrated.
 

Why Is It So Complex?

 Competition! Reed Elsevier / LEXIS Publishing and Thomson / West Publishing are fierce competitors. We strive every day to have more content, richer content and more valuable products for our customers. To accomplish this we ask our internal staffs and solution partners to compete with each other to bring forward answers. When we do this they expand the features available to us, and to all the members of the content provider community. "A rising tide lifts all boats!"
 To be competitive we must be effective at adding value while remaining cost effective. This is not an easy task. Ask any network planner or capacity planner in the content provider business.
 

What is the nature of the content provider business?

 There are three domains that the public experiences as content providers:
 
  •  Wide & Narrow Focused Search Portals
  •  Authorship Based Content Providers
  •  Aggregators & Concentrators
 Wide & Narrow Focused Search Portals are the most typical experience for members of the web community. The major downsides of this environment are the lack of precision in search and the lack of content persistence.
 Authorship Based Content Providers are becoming more and more common. They have the advantage of near complete ownership of the intellectual property they deliver. This content ownership allows them to fully customize the content for the market or individual they are delivering to.
 Aggregators & Concentrators are the group with the longest history as on-line content providers. We have both the blessing (experience) and the curse (legacy) of history. A subtle difference which makes this domain complex is that we may not own all the usage rights to the content. This creates a great deal of complexity among the data, entitlements, pricing and royalties.
 Today, it is quite common for a company to operate in all three parts of this business, but most often it has its beginnings in one of these branches. LEXIS-NEXIS's home is in the Aggregators & Concentrators domain. The Wall Street Journal web site has its roots in the Authorship domain. The most well known, of course, are the Wide & Narrow Focused Search Portals.
 

What is the difference today?

 Higher Expectations!
 
  •  Our Customers are Demanding:
     
    •  Rapid and continuous improvement in the user interface
    •  The ability to perform a normalized search over a wide assortment of document types
    •  Product customization by market, targeted affinity group or individual user
    •  The ability to seamlessly deliver content from multiple document types, normalized into a single integrated document
    •  The ability to integrate provided content into customer work product
    •  More complex entities (graphics, spreadsheets, sound, video)
    •  Functionality delivered now
    •  Data delivered "Really Quick!"
 This environment requires continuous tuning and careful balancing.
 

How can the community think about this?

 We work with a model to assist us.
 
 Business Trade-off Model
 The goal of this model is to encourage the community to correctly partition the value add points between the persistent data described in the data architecture and what can be generated at the publication level.
 

How do we balance the software engineering?

 By going back to the basic software engineering principle of early binding vs. late binding. We must look at the trade-offs this creates between flexibility and performance/cost.
 

How can the solution provider community help?

 Solution providers should look at their portfolios of tools and observe where they play in the model below. They should endeavor to add value as evenly as possible across the model, from early binding to late binding.
 
 Software Solution Model
 Each solution software box is an opportunity to bind value to the base data. The higher in this diagram we bind value to the data, the more flexible we are. The lower in the diagram we bind value to the data, the higher the performance observed by the customer will be and the more predictable the system resource requirements will be.
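 The trade-off can be illustrated with a minimal sketch. It is not taken from any LEXIS-NEXIS system: the `Repository` class, the `add_summary` value-add step and the `bind_early` flag are all hypothetical names chosen for illustration. A document updated with early binding has its value computed once at update time and is then served by a simple lookup. A late-bound document is enriched at delivery time, so it picks up a changed rule immediately, at the price of work on every request.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


def add_summary(text: str) -> str:
    """Hypothetical value-add step: keep only the first sentence."""
    return text.split(".")[0].strip() + "."


@dataclass
class Repository:
    """Toy delivery repository (illustrative only)."""
    enrich: Callable[[str], str]                      # current value-add rule
    base: Dict[str, str] = field(default_factory=dict)         # base data
    early_bound: Dict[str, str] = field(default_factory=dict)  # value bound at update time

    def update(self, doc_id: str, text: str, bind_early: bool) -> None:
        self.base[doc_id] = text
        if bind_early:
            # Early binding: pay the cost once, during the update process.
            self.early_bound[doc_id] = self.enrich(text)
        else:
            self.early_bound.pop(doc_id, None)

    def deliver(self, doc_id: str) -> str:
        if doc_id in self.early_bound:
            # Predictable cost: a plain lookup of the stored result.
            return self.early_bound[doc_id]
        # Late binding: apply whatever rule is current at delivery time.
        return self.enrich(self.base[doc_id])


repo = Repository(enrich=add_summary)
repo.update("a", "First point. Second point.", bind_early=True)
repo.update("b", "First point. Second point.", bind_early=False)

repo.enrich = str.upper      # the product later changes its value-add rule

print(repo.deliver("a"))     # value bound at update time, old rule
print(repo.deliver("b"))     # value bound at delivery time, new rule
```

 In this sketch document "a" keeps the result of the old rule until it is updated again, while document "b" follows the new rule at once. That is the flexibility versus performance/cost balance in miniature.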
 
  •  We Need From The Solution Provider Community:
     
    •  That We Keep Architectural Forms as a Viable Feature (Key to late binding)
    •  High Performance in the Update Process in the Delivery Repository (Key to early binding)
    •  Work with the Content Provider Community to Create and Support a True Content and List Markup for Tables and Lists (Assists late binding)
    •  Assist the Content Community with more Robust Application Models for Entitlements, Pricing and Royalties (Keep the solution space open)
    •  Support the Specification of an Architectural Forms Enabled Search Standard (Key to late binding)
 

Scale

 The combination of legacy data and the new body of knowledge that is added every hour is daunting. The rate of data arriving at LEXIS-NEXIS alone quickly exhausts most of the tools being created in today's applied research community. The solution provider and content provider communities must support the research which will provide tomorrow's technologies.
 Another result of scale is that you cannot apply all of your value-add to all of the data at update time. Therefore you need hooks to support late binding of the value.
 

The Dry Statistics

 
Content in the LEXIS-NEXIS Collection
 
  •  Sources on-line: 26,933 [Last Updated: 9/9/1999]
  •  Searchable Documents (On-line): 2.5 Billion [Last Updated: 9/10/1999]
     Related Measures:
    •  Searchable documents added per hour: 32,738
    •  Rolling 12 month growth rate of searchable documents: 44.0%
  •  On-line Databases: 10,273 [Last Updated: 9/10/1999]
  •  Searchable Characters: 2,224.8 Billion [Last Updated: 9/10/1999]
     Related Measures:
    •  Searchable characters added per hour: 35.1 Million
    •  Rolling 12 month growth rate in searchable characters: 30.2%
    •  This month vs. last month change in searchable characters: 0.2%
  •  Attachments On-line: 26.7 Million, GB: 730.1 [Last Updated: 9/7/1999]
     Related Measures:
    •  Images On-line: 26.6 Million
    •  Percent Attachments being Images: 99.63%
 
Users of the LEXIS-NEXIS Service [Last Updated: 7/9/1999]
 
  •  Total Subscribers: 1.7 Million
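 A few derived ratios help put these figures in proportion. The short calculation below is only a sketch based on the rounded numbers quoted above. The per-document and per-attachment averages are not published statistics, the underlying measures were last updated on slightly different dates, and the conversion of 1 GB to one million KB is an assumption.

```python
# Rounded figures quoted in the statistics above (September 1999).
searchable_documents = 2.5e9
searchable_characters = 2224.8e9
attachments = 26.7e6
images = 26.6e6
attachment_gigabytes = 730.1

# Share of attachments that are images, from the rounded counts.
image_share_pct = images / attachments * 100

# Derived averages (illustrative, not published figures).
avg_chars_per_document = searchable_characters / searchable_documents
avg_attachment_kb = attachment_gigabytes * 1e6 / attachments  # assumes 1 GB = 10**6 KB

print(f"Images as a share of attachments: {image_share_pct:.2f}%")
print(f"Average characters per searchable document: {avg_chars_per_document:.0f}")
print(f"Average attachment size: {avg_attachment_kb:.1f} KB")
```

 The image share computed from the rounded counts agrees with the 99.63% quoted above.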
 

What does this Mean To Our Users?

 If you sat in front of your computer screen and spent just 5 seconds skimming over every screen covering all the data we have on line, spending 24 hours a day, you would need over 71,500 days to skim everything on-line. That's equal to 195 years of reading 24 hours per day, every day. You can't catch up!
 Not only can't you catch up, but you would be behind by the amount that we added since you started reading. After 5 years of reading 24 hours per day, every day, you would have fallen behind by over 1,646 years. That gives you an indication of how much data we are planning on adding.
 Another way to look at how much data we have on-line: if you printed out the data on standard letter paper, the stack would be over 247,200 feet high. That's equivalent to more than 8 Mt. Everests, one on top of the other.
 If you took all those pages and taped them end to end, how far would they stretch? They would stretch more than 5 times around the earth!
 Finally, consider a typist able to type 60 words a minute. It would take about 2,000 typists to keep up with our average weekly data additions.
 By the way, in the time it took you to read this section, LEXIS-NEXIS would have done, on average, around 378 searches. A lot more if it was a busy time!
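 Some of the comparisons above can be roughly reproduced from the published statistics. The sketch below rests on assumptions that are not stated in the text: a 365-day year, a typing "word" of 5 characters, and typists who work around the clock. The implied number of characters per screen is a derived figure, not a published one.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Figures taken from the text and the statistics section.
days_to_skim = 71_500                 # "over 71,500 days"
seconds_per_screen = 5
searchable_characters = 2224.8e9
chars_added_per_hour = 35_100_000
typing_speed_wpm = 60

# Reading time expressed in years (assumes 365-day years).
years_to_skim = days_to_skim / 365

# Number of screens skimmed in that time, and the screen size this implies.
implied_screens = days_to_skim * SECONDS_PER_DAY // seconds_per_screen
implied_chars_per_screen = searchable_characters / implied_screens

# Typists needed to match the hourly growth.
chars_per_word = 5                    # assumption: conventional typing "word"
typist_chars_per_hour = typing_speed_wpm * chars_per_word * 60
typists_needed = chars_added_per_hour / typist_chars_per_hour

print(f"Years to skim everything: {years_to_skim:.1f}")
print(f"Implied screens: {implied_screens:,}")
print(f"Implied characters per screen: {implied_chars_per_screen:.0f}")
print(f"Typists needed: {typists_needed:.0f}")
```

 Under these assumptions the reading time comes out just above 195 years and the typist count just under 2,000, in line with the figures quoted above.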
 

Conclusion

 

What does this mean to the content provider community?

 Be competitive! I hope all of the content providers here today survive until XML 2000. Not all of us will.
 Be good partners with the public sector, solution vendors and other content providers, because we cannot meet all of our customers' needs alone. A meaningful way to accomplish this is to participate actively in content area standards definition.
 Do not forget your hard-learned lessons.
 Learn from your partners.
 

What does this mean to the solution partner community?

 It's a call to them, individually and as a community, to help us solve the problems of balancing early vs. late value binding.
 It's an eye opener to see and come to grips with the scale of persistent content.
 It's a challenge to them to partner with us to change this persistent content into truly dynamic content on the web.
