XML: The way toward the virtual library |
| Ricardo Eito Brun |
| Editor |
| IWE - El Profesional de la informacion
C/ Pedro Teixeira 10 7 21 Madrid Madrid Spain 28020 Phone: +91 555 79 06 Email: ricardo@rayuela.uc3m.es |
Biographical notice: |
Ricardo Eito Brun has worked since 1996 as an editor for El Profesional de la Informacion (http://www.doc6.es/iwe), formerly Information World - Spain, a publication of Sweets Subscription Services, and as a software technical documentation specialist. |
ABSTRACT: |
In this context, XML initiatives have to face the same problems that SGML had to face in the past decade. |
But which advantages can we get from XML in a library environment? |
This contribution explains the different approaches to XML that are interesting for the library community: |
Introduction |
The concept of a virtual library has been one of the most popular dreams for the library community. |
Virtual libraries - also known as digital libraries - propose a new model to store and manage information, as well as a different way to access these materials. |
Advantages of the digital libraries |
But besides the social advantages offered by the virtual libraries, there are more improvements: |
First of all, we are going to be able to access a larger number of documents. |
The collection of a virtual library is not restricted to the physical limits of any building. As users of a virtual library we are going to be able to access documents available in digital format in different libraries or repositories in the Net. |
Moreover, the possibilities to search and browse information successfully are greater in a digital environment. In fact, one of the points that requires a lot of improvement in the virtual libraries currently available is the design of the OPACs (Online Public Access Catalogs). |
OPACs |
OPACs: the portal to the library services |
The OPAC is the main interface between a library and its users. At present, OPACs offer full-text searching as well as the possibility to browse lists of keywords from a controlled vocabulary used by librarians to index and describe the contents of the documents. |
Even so, the most advanced OPACs still lack the navigational capabilities that controlled vocabularies could offer. |
OPACs also offer another possibility: to download the bibliographic description of the retrieved documents and to start an ILL (interlibrary loan) conversation with the library server. |
This possibility is offered when two or more libraries share their bibliographic records in order to improve the quality and scope of their services. For example, a librarian will search a catalog and, as a result, will be allowed to retrieve the records that he or she wants to add to the local catalog. |
In order to transfer and share this information, the libraries involved need to set up an agreement. There are different collaborative plans that are well known to librarians, for example Libertas Network - developed by the software company SLS and the users of its library management software - and some others. |
The possibility to retrieve a bibliographic description from a catalog and copy this record to our own database may serve different purposes, besides creating a reference from our database to the database owned by another community of users. It is usually used to save time in the technical processing of library materials and in the cataloguing phase. |
In this way, a library can retrieve the right bibliographic description from another library, avoiding the need to create a new record. Considerable expense can be saved by working in this way. |
But these cooperative efforts have not only been developed by library users. Other players in the library market have seen here the possibility to strengthen the links with their customers. Booksellers, for instance, are offering the bibliographic description of the materials they provide. So we find a link between libraries, software providers and booksellers, and as a consequence the need to automate data transfer between their systems. |
| MARC |
Automating data transfer |
The library community has been working for more than twenty years with the MARC (Machine-Readable Cataloging) format in order to do so. This format was first released in 1965, and it is based on the international standard ISO 2709. It was designed by the Library of Congress technical staff. |
MARC is now a well-established way to share bibliographic data among libraries. There are large collections that use this format to describe their materials. For example, the catalog of the Library of Congress has millions of bibliographic records, as does OCLC (Online Computer Library Center), one of the largest bibliographic utilities in the world. |
MARC makes it possible to manage records with variable-length fields. To do so, records are structured in three parts: leader, directory and data sections. The leader and the directory contain the information that lets applications know where a field starts, how long it is, etc. |
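The three-part structure described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: field contents are treated as plain text (real MARC fields also carry indicators and subfield codes), and the function names `build_record` and `parse_record` are hypothetical. The byte positions used (a 24-character leader, 12-character directory entries, field and record terminators) follow the ISO 2709 layout.

```python
FIELD_TERMINATOR = b"\x1e"
RECORD_TERMINATOR = b"\x1d"
LEADER_LENGTH = 24
DIRECTORY_ENTRY_LENGTH = 12  # 3-char tag + 4-digit length + 5-digit offset


def build_record(fields):
    """Serialize (tag, text) pairs into a simplified ISO 2709-style record."""
    directory = b""
    data = b""
    for tag, text in fields:
        body = text.encode("utf-8") + FIELD_TERMINATOR
        # Each directory entry stores the tag, the field length and its offset.
        directory += f"{tag}{len(body):04d}{len(data):05d}".encode("ascii")
        data += body
    directory += FIELD_TERMINATOR
    base_address = LEADER_LENGTH + len(directory)
    record_length = base_address + len(data) + len(RECORD_TERMINATOR)
    # Leader positions 0-4: record length; 12-16: base address of the data.
    leader = f"{record_length:05d}nam  22{base_address:05d}   4500"
    return leader.encode("ascii") + directory + data + RECORD_TERMINATOR


def parse_record(raw):
    """Read the leader and directory, then slice each field out of the data."""
    leader = raw[:LEADER_LENGTH].decode("ascii")
    record_length = int(leader[0:5])
    base_address = int(leader[12:17])
    directory = raw[LEADER_LENGTH:base_address - 1]  # drop its terminator
    fields = []
    for i in range(0, len(directory), DIRECTORY_ENTRY_LENGTH):
        entry = directory[i:i + DIRECTORY_ENTRY_LENGTH].decode("ascii")
        tag, length, start = entry[0:3], int(entry[3:7]), int(entry[7:12])
        begin = base_address + start
        body = raw[begin:begin + length - 1]  # drop the field terminator
        fields.append((tag, body.decode("utf-8")))
    return record_length, fields


sample = [("100", "Eito Brun, Ricardo"),
          ("245", "XML: The way toward the virtual library")]
raw_record = build_record(sample)
declared_length, parsed_fields = parse_record(raw_record)
```

Because every field is located through the directory, an application can jump straight to the tag it needs without scanning the whole record.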
In order to accommodate the specific needs of the different countries, the MARC format has evolved into different formats that share a common structure, which guarantees the usability of the basic information regardless of the application or the origin of the records. |
Regarding the possibility to integrate the MARC format with SGML, the Library of Congress developed a complex DTD (Document Type Definition) that permits conversion from MARC to SGML and vice versa (http://lcweb.loc.gov/marc/marcdtd/marcdtdback.html). Although this DTD has been said to be very complex and hard to use, it is an excellent starting point to migrate from MARC records to XML data deployable on the web. |
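As a rough idea of what such a two-way conversion involves, the following Python sketch turns a list of tagged fields into XML and back. The element and attribute names (`record`, `field`, `tag`) are invented for this illustration and are not those of the Library of Congress DTD, which is far more detailed.

```python
import xml.etree.ElementTree as ET


def fields_to_xml(fields):
    """Wrap (tag, value) pairs in a hypothetical <record>/<field> structure."""
    record = ET.Element("record")
    for tag, value in fields:
        field = ET.SubElement(record, "field", {"tag": tag})
        field.text = value
    return ET.tostring(record, encoding="unicode")


def xml_to_fields(xml_text):
    """Recover the (tag, value) pairs from the XML produced above."""
    record = ET.fromstring(xml_text)
    return [(f.get("tag"), f.text or "") for f in record.findall("field")]


marc_fields = [("100", "Eito Brun, Ricardo"),
               ("245", "XML: The way toward the virtual library")]
record_xml = fields_to_xml(marc_fields)
round_trip = xml_to_fields(record_xml)
```

A conversion is only useful for interchange if it loses nothing, so the round trip should return exactly the fields that went in.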
The OPAC and the ILL |
In the last decade several attempts have been made to automate interlibrary loan processes. As an example, there is an international standard based on the OSI (Open Systems Interconnection) communication protocol to model the interchange of information between two or more libraries wanting to interchange their bibliographic materials. |
This model indicates the format that has to be used to encode and transfer the information in the different steps of the process, what information we have to indicate and the way to do it. If this is done correctly, the interlibrary request and the subsequent interchanges of data can be automated and processed by any software application developed according to this standard. |
It should be common to be able to request a document from the OPAC interface. After executing a query, the user should be able to ask for the item and start a conversation with the server. In order to automate this process, the system should be able to complete the following steps: |
As we can see, interlibrary loan requires a dynamic interchange of data between the client and the server applications. This process can be more complex if more than two libraries are working together. For example, if a library cannot lend a document, there may be another library that could do it, and the system could redirect the request to a different place, and so on. |
Moreover, the user could request a document and let the client application search for those libraries that own the document and can lend it by the required date. |
In conclusion, interlibrary loan processing requires two or more software applications working together and interchanging rich information and documents. The information that these documents contain must be used to feed different databases. So, as there are different applications that must interact dynamically, XML is a good opportunity to improve data interchange. |
But of course, we need a standard way or some kind of agreement to describe the documents and the data we are going to use to model the whole process. |
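The redirection scenario described above can be sketched as follows. The XML vocabulary (`illRequest`, `illAnswer`, `neededBy`, `status`) and the function names are assumptions made for this example; they are not taken from the interlibrary loan standard mentioned earlier, and a real system would exchange these messages over a network instead of through function calls.

```python
import xml.etree.ElementTree as ET


def build_request(requester, title, needed_by):
    """Client side: encode a loan request as a small XML message."""
    request = ET.Element("illRequest", {"requester": requester})
    ET.SubElement(request, "title").text = title
    ET.SubElement(request, "neededBy").text = needed_by  # ISO date string
    return ET.tostring(request, encoding="unicode")


def answer_request(library, request_xml):
    """Server side: accept if the item is held and free by the needed date."""
    request = ET.fromstring(request_xml)
    available_from = library["holdings"].get(request.findtext("title"))
    can_lend = (available_from is not None
                and available_from <= request.findtext("neededBy"))
    answer = ET.Element("illAnswer", {
        "responder": library["name"],
        "status": "accepted" if can_lend else "refused",
    })
    return ET.tostring(answer, encoding="unicode")


def route_request(libraries, request_xml):
    """Redirect the request from library to library until one accepts it."""
    for library in libraries:
        answer = ET.fromstring(answer_request(library, request_xml))
        if answer.get("status") == "accepted":
            return answer.get("responder")
    return None


# Hypothetical holdings: title -> date from which the copy can be lent.
LIBRARIES = [
    {"name": "Library A", "holdings": {}},
    {"name": "Library B", "holdings": {"Sample Title": "2000-06-01"}},
    {"name": "Library C", "holdings": {"Sample Title": "2000-03-01"}},
]
loan_request = build_request("Library X", "Sample Title", "2000-04-15")
lender = route_request(LIBRARIES, loan_request)
```

Because both the request and the answer are structured documents, each application can extract the pieces it needs (title, date, status) and feed them to its own database.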
Software to manage bibliographies |
Several applications on the market whose purpose is to manage bibliographies are also interesting for the library community. |
In general, these applications offer the following functionality: |
|
Regarding the output layouts, the users are able to create bibliographies using different formats to print the data. There are different formats currently used. Each format specifies how to print the data, how to distinguish the different elements within a reference (author, year of publication, title, and so on), their order within the bibliographic reference, etc. |
We can think here of XML as a way to create links between these applications and the other databases and word processing programs that interact with them. |
In this way, an end user would be able to download the bibliographic profile of an item retrieved from a remote database and save it in his or her personal bibliographic database. In fact, some of these applications offer Z39.50 retrieval capabilities that let the user query a remote database and download the result to a local database. |
In the same way, if a new layout is needed to print the data, XML exporting capabilities to word processing applications and XSL for data presentation would be a desirable feature for these applications. |
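The separation between the stored reference and its printed layout can be illustrated with a short Python sketch. The XML element names and the two layouts are invented for this example, and the sample reference is made up; in the scenario described above, XSL stylesheets would play the role that the two formatting functions play here.

```python
import xml.etree.ElementTree as ET

# A made-up reference in a hypothetical XML export format.
REFERENCE_XML = """<reference>
  <author>Doe, Jane</author>
  <year>1999</year>
  <title>An example title</title>
</reference>"""


def load_reference(xml_text):
    """Read each child element of <reference> into a dictionary."""
    root = ET.fromstring(xml_text)
    return {child.tag: (child.text or "").strip() for child in root}


def author_year_layout(ref):
    """Layout 1: author first, year in parentheses, then the title."""
    return f"{ref['author']} ({ref['year']}). {ref['title']}."


def title_first_layout(ref):
    """Layout 2: title first, then author and year."""
    return f"{ref['title']} / {ref['author']}. {ref['year']}."


reference = load_reference(REFERENCE_XML)
layout_one = author_year_layout(reference)
layout_two = title_first_layout(reference)
```

Adding a new output layout then means adding one more formatting rule, without touching the stored data.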
XML for information providers |
In the last year some changes have been observed in the information market. Information suppliers are distributing their contents to their customers in a different way, which we might call web2web information deployment. |
In the first years of the Internet, access through web browsers replaced traditional text-only terminals. Now the distribution model has changed, and the information is pushed from the web server of the information providers to the intranet servers of the customers. |
Although this model is more oriented toward corporate customers, libraries and documentation centers should also embrace this model as they evolve to the new paradigms in information dissemination. |
The most important idea behind this distribution approach is the need to save the readers' time and to send each workplace only the information that is really pertinent and useful to each worker. |
The organization that receives those documents can store them on its own network for a limited period of time (the duration of this period is fixed by contractual terms). The customer will be able to process these documents and data and integrate them with their own databases. End users will access the whole information using their Internet browsers and HTML based interfaces. These two tools have become the de-facto standard to access data from different sources using a single interface, and they are a key component of the new model to access information. |
But, in order to work, this model requires a standard way to move data from the information provider to their customers. We also need to agree on the format we are going to use to create a single view for all the data available in the corporate intranets. |
Although it is difficult to fix a common way to represent data that could be adopted by the different information vendors, at least the possibility to receive these data in an easy-to-process format such as XML should ease the problem of integrating information from different providers. |
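A minimal sketch of this kind of integration, assuming two providers that deliver XML with different, invented vocabularies: each feed is read with a small adapter and mapped to one common structure that an intranet interface could display. The feed contents and all element names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Two made-up feeds using different element names for the same concepts.
PROVIDER_A_XML = """<news>
  <item><headline>New union catalog opens</headline><body>Text from provider A.</body></item>
</news>"""

PROVIDER_B_XML = """<feed>
  <doc title="Library budgets in review"><text>Text from provider B.</text></doc>
</feed>"""


def read_provider_a(xml_text):
    """Adapter for provider A: title and text are child elements."""
    root = ET.fromstring(xml_text)
    return [
        {"source": "A", "title": item.findtext("headline"), "text": item.findtext("body")}
        for item in root.findall("item")
    ]


def read_provider_b(xml_text):
    """Adapter for provider B: the title is an attribute."""
    root = ET.fromstring(xml_text)
    return [
        {"source": "B", "title": doc.get("title"), "text": doc.findtext("text")}
        for doc in root.findall("doc")
    ]


def single_view():
    """Merge both feeds into one list, sorted by title."""
    documents = read_provider_a(PROVIDER_A_XML) + read_provider_b(PROVIDER_B_XML)
    return sorted(documents, key=lambda d: d["title"])


intranet_documents = single_view()
```

Even without a shared vocabulary, the cost of supporting a new provider is limited to writing one more adapter.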
XML for Knowledge Representation |
In the seventies, a lot of organizations and people developed thesauri and controlled vocabularies to improve access to their own databases. |
A thesaurus is a list of controlled terms hierarchically arranged. This hierarchy indicates the semantic relationships between the different terms. The different types of relationships between terms are normally used to lead the user from a non-preferred term to a preferred one. A non-preferred term is a term that should be used neither to search documents in the database nor to index them. Other relationships are used to indicate that a term has a more specific scope than another one, and to indicate that the meanings of two terms are related. |
But thesauri are usually identified with an unsuccessful effort of the library community. There are several reasons for this. One of them is the possibility to use full-text searching to access document databases instead of controlled vocabularies. Full text automatic indexing is more economical. Using full text indexers, there is no need to invest time and resources analyzing the content of the documents and a lot of time and money can be saved. |
The other problem with thesauri is the lack of compatibility between the controlled languages designed by different communities of users. Organizations and groups developed their own thesauri to index their databases. As a result, we had different indexing languages and the problem of incompatibility. Users had to learn a new thesaurus each time they wanted to search a database. This situation made access to the information resources available in different digital vaults more difficult. |
But as the number of users of online information vaults has grown, organizations have realized that full-text searching is not the best solution to search and retrieve documents. If we want to get better results, some kind of conceptual analysis and the use of controlled vocabularies and terminology databanks are needed. |
So there is a renewed interest in thesauri and controlled lists of keywords. But improvements need to be made. |
To solve the problem of compatibility, different approaches have been proposed. One of them is to use an intermediate language that sets the correspondence between two terms from two different thesauri. The developer of a controlled language only needs to indicate which term in the intermediate language corresponds to each term in the source language. A program can then automatically translate the source term to the corresponding term in the target language. |
Here we find a different area where XML may be useful. For example, we could use elements to represent terms and relationships between them, and attributes to represent the type of the relationships and the correspondence between each term in the source language and another one in an intermediate language. An example of this can be seen in the Dublin Core metadata system. This schema lets the indexer introduce an element to add a classification code to a bibliographic profile. |
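The idea of elements for terms and attributes for relationship types and intermediate-language codes can be sketched as follows. The markup (`thesaurus`, `term`, `relation`, the `pivot` attribute) and the sample terms are assumptions made for this illustration; the relationship codes USE, BT, NT and RT are the conventional thesaurus abbreviations for preferred term, broader term, narrower term and related term.

```python
import xml.etree.ElementTree as ET

# A made-up source thesaurus; "pivot" holds the intermediate-language code.
THESAURUS_XML = """<thesaurus>
  <term id="t1" label="Libraries" pivot="P100">
    <relation type="NT" ref="t2"/>
  </term>
  <term id="t2" label="Digital libraries" pivot="P110">
    <relation type="BT" ref="t1"/>
    <relation type="RT" ref="t4"/>
  </term>
  <term id="t3" label="Virtual libraries">
    <relation type="USE" ref="t2"/>
  </term>
  <term id="t4" label="OPACs" pivot="P200"/>
</thesaurus>"""

# A made-up target thesaurus, keyed by the same intermediate codes.
TARGET_LABELS = {"P100": "Bibliotecas", "P110": "Bibliotecas digitales"}


def load_terms(xml_text):
    """Index the <term> elements by their id attribute."""
    root = ET.fromstring(xml_text)
    return {term.get("id"): term for term in root.findall("term")}


def preferred_term(terms, label):
    """Follow a USE relation from a non-preferred term to the preferred one."""
    term = next(t for t in terms.values() if t.get("label") == label)
    for relation in term.findall("relation"):
        if relation.get("type") == "USE":
            return terms[relation.get("ref")]
    return term


def translate(terms, label, target_labels):
    """Map a source term to the target language via the intermediate code."""
    pivot = preferred_term(terms, label).get("pivot")
    return target_labels.get(pivot)  # None when no correspondence exists


terms = load_terms(THESAURUS_XML)
preferred = preferred_term(terms, "Virtual libraries").get("label")
translated = translate(terms, "Virtual libraries", TARGET_LABELS)
```

Each thesaurus only has to be mapped once to the intermediate codes, instead of once to every other thesaurus.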
Conclusions |
XML is related to different applications that are useful for librarians: RDF (Resource Description Framework) may be a good example of this kind of application. |
But XML can also be used in other processes that require the interchange of data between applications. Library software and document automation requirements offer several opportunities to use this new format and its related technologies. Interlibrary loan and cooperative cataloguing are two areas where XML can be applied. But we also have to consider the integration of information from different sources, and the need to manage directories of users, libraries, etc. |
Once these repositories are available, new types of information access mechanisms can be used, among them the discovery of information by software agents. |
In the case of the library community, we have ways to represent the semantics of the data and to model information-rich processes. With XML, we are offered the possibility to reuse these achievements in a new framework. In fact, we have not yet realized all the possibilities that the Internet is offering to us. |