open archives forum

Workshops:
About  |   Pisa, May 02  |   Lisbon, December 02  |   Berlin, March 03  |   Bath, September 03

 
1st - Pisa  |   Programme-Presentations-Notes  |   Abstracts  |   Participants

Abstracts of the Invited Speakers

"New development in OAI (OAI-PMH2)"
by Michael Nelson, Old Dominion University, USA

The Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) is an evolving protocol and philosophy regarding interoperability for digital libraries (DLs). Previously, "distributed searching" models were popular for DL interoperability. However, experience has shown distributed searching systems across large numbers of DLs to be difficult to maintain in an Internet environment. The OAI is a move away from distributed searching, focusing on the arguably simpler model of "metadata harvesting". Perhaps the strongest distinguishing feature of OAI is its simplicity: by being "smaller" than previous interoperability projects, it actually allows for more powerful and adaptable configurations and deployments. Key concepts in OAI include the separation of responsibilities of "service provider" and "data provider" and the use of community-specific metadata sets (with Dublin Core as a lingua franca).

Key to understanding the philosophy of the OAI is understanding the separation of responsibilities of "service provider" (SP) and "data provider" (DP). In practice, an SP and a DP can reside in the same entity, but it is important to understand the distinction. A DP is a repository (or "archive"): simply a collection of metadata records (which may or may not point to corresponding full-text documents). An SP provides value-added services (e.g., searching, browsing) on the metadata extracted from one or more DPs. SPs are free to define their own services, presentation and interfaces tailored to the user base. These services could be complementary or competitive.

The OAI-PMH consists of six verbs: three reveal the characteristics of the repository (ListMetadataFormats, ListSets, and Identify) and three extract metadata from the repository (GetRecord, ListRecords, ListIdentifiers). OAI-PMH 2.0 is going to be released in June 2002, after 18 months of testing version 1.1. Although OAI-PMH 2.0 is not backwards compatible with 1.1, 2.0 represents only an evolutionary progression of the OAI-PMH. Some optional features for richer harvester-repository negotiation have been added, ambiguities removed, and extensibility hooks added for desirable features that lie outside the scope of the protocol (e.g., machine-readable rights management information).
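
As an illustration of how small the protocol surface is, the following minimal sketch builds request URLs for the six verbs. The endpoint URL is hypothetical, and `build_request` is a helper name invented for this example; the verb and argument names (`metadataPrefix`, `from`) are those of the protocol.

```python
from urllib.parse import urlencode

# The six OAI-PMH verbs, grouped as in the abstract.
DESCRIBE_VERBS = {"Identify", "ListMetadataFormats", "ListSets"}
HARVEST_VERBS = {"GetRecord", "ListRecords", "ListIdentifiers"}


def build_request(base_url, verb, **arguments):
    """Return the GET URL for one OAI-PMH request."""
    if verb not in DESCRIBE_VERBS | HARVEST_VERBS:
        raise ValueError(f"unknown OAI-PMH verb: {verb}")
    query = {"verb": verb}
    # "from" is a Python keyword, so callers pass "from_" and we strip the underscore.
    for name, value in arguments.items():
        query[name.rstrip("_")] = value
    return f"{base_url}?{urlencode(query)}"


BASE = "http://repository.example.org/oai"  # hypothetical endpoint
print(build_request(BASE, "Identify"))
print(build_request(BASE, "ListRecords", metadataPrefix="oai_dc", from_="2002-01-01"))
```

A service provider would issue such requests over HTTP and parse the XML that comes back; the sketch stops at URL construction so that it runs without a network.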

OAI-PMH projects and services are being announced frequently; check the Open Archives Initiative home page (www.openarchives.org) for the latest news. The OAI is entering the exciting phase where the focus is no longer just on the protocol, but more rightfully on the various services that use the OAI-PMH in novel and compelling ways as well as the community building that the OAI-PMH facilitates.

New Developments in OAI (OAI-PMH 2.0)  more (ppt-slides, 327 KB)  Notes (pdf-file, 20 KB)



 

"CERN Document Server Software"
by Martin Vesely, CERN, Switzerland

The presentation gives an overview of implementation approaches of the OAI-PMH protocol and its application potential within communities that are involved in projects where metadata exchange and document handling are essential. The experience gained within the e-prints community is presented, particularly the implementation in the CERN Document Server Software (CDSware), featuring as it does both the service and the data provider aspects foreseen in the protocol.

In general, the OAI-PMH offers a framework for metadata exchange between repositories in a distributed, structurally and syntactically heterogeneous environment. It allows repositories to exchange metadata in a uniform way, taking advantage of the widely deployed web infrastructure, which makes it applicable at a significantly lower cost. Services provided within the OAI framework can be characterized as information products created either by a suitable assembly or correlation of metadata gathered from various data providers or by other value-added activities. The core service of the CERN Document Server (CDS) within the OAI framework is the search engine for documents in particle physics and related disciplines using value-added metadata. Today the CERN repository contains more than 550,000 metadata records, mainly original CERN records that may be of interest to potential service providers as a source of scientific and engineering documents in the field of particle physics and related research at CERN. There are also records harvested from other repositories, which are enriched, maintained and updated periodically by the library staff. In the centralized model that we use today, every participating institute submits its metadata records to the master repository and then harvests the entire contents back. We discuss the potential of the protocol for application in decentralized or distributed systems with different topologies based on hierarchical and reciprocal harvesting.

We conclude by discussing protocol features that require customization at the application level rather than at the level of the protocol itself. To exchange metadata in practice, some policy concerning optional issues has to be agreed upon within the community. Several issues that we consider essential for broader interoperability within the e-prints community (and most likely also within other communities) are pointed out, such as (i) the error-prone HTTP-based data flow control, (ii) dealing with the semantic heterogeneity of OAI sets, (iii) the relation of OAI identifiers to the value-added records, and (iv) information loss caused by exclusive support of the default metadata format.
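
To make issue (i) concrete, here is a minimal sketch of the flow control a harvester has to implement for list requests, assuming the OAI-PMH 2.0 response namespace and its `resumptionToken` element. The function names and the injected `fetch` callable are assumptions made for this example, not part of CDSware.

```python
import xml.etree.ElementTree as ET

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"  # OAI-PMH 2.0 namespace


def next_token(response_xml):
    """Return the resumptionToken of a list response, or None when the list is complete."""
    root = ET.fromstring(response_xml)
    element = root.find(f".//{OAI_NS}resumptionToken")
    # A missing or empty token element both mean there is nothing left to fetch.
    if element is None or not (element.text or "").strip():
        return None
    return element.text.strip()


def harvest(fetch, first_query):
    """Collect every page of a list request.

    `fetch` is any callable mapping a query dict to response XML (hypothetical;
    in practice it would wrap an HTTP GET).
    """
    pages, query = [], dict(first_query)
    while True:
        page = fetch(query)
        pages.append(page)
        token = next_token(page)
        if token is None:
            return pages
        # Follow-up requests carry only the verb and the token.
        query = {"verb": first_query["verb"], "resumptionToken": token}
```

A production harvester would additionally need to cope with transport-level failures, for example an HTTP 503 response carrying a Retry-After header, which this sketch deliberately leaves out.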

CERN Document Server Software  more (ppt-slides, 85 KB)



 

"Revealing a new Dynamic: Interaction in an Open Access Archive"
by Steve Hitchcock, University of Southampton, United Kingdom

The talk will show that open access works for authors and users, and reveals some new aspects of the social life of an eprint archive. Illustrating software and services developed as part of the Open Citation Project, and using data from our associated studies of arXiv user behaviour, it will be shown that a new 'dynamic', the speed of interaction between users, becomes evident when access to full resources is free, open and unrestricted. This is important for all those who are building open archives, and for those who are tentatively moving towards building open archives (e.g. the biomedical community).

Revealing a New Dynamic: Interaction in an Open Access Archive  more (ppt-slides, 308 KB)



 

"Online Information in Astronomy -
From networking to a virtual observatory"
by Francoise Genova, CDS, France

Astronomy relies on long-term observations of variable phenomena, and conserving and reusing data is the key to major scientific objectives, such as the definition of objects and of their properties, or the study of variability and evolution, all of which require statistical studies on large numbers of objects. Observations at different wavelengths, with different techniques, allow one to understand the physical phenomena at work in objects. In addition, astronomical observations rely more and more on large ground-based and space observatories, and on large surveys of the sky, and reusing their data for new scientific objectives is necessary to optimise the scientific return of these large projects.

This is an old, but increasingly complex endeavour: information volume and complexity increase, and information is heterogeneous and distributed. Moreover, data must be properly documented to remain usable. A technical revolution has of course occurred in recent years in that domain, with the increased technical capacity to store and manage information, and the new possibilities offered by the WWW in terms of information distribution, of integration of data with documentation, and of navigation between on-line services. These useful and appealing tools are widely used, but one has to keep in mind that careful work on the service contents and functionalities remains mandatory, and that information validation remains critical. Moreover, services and links have to be maintained over the long term.

The development of services for use by the scientific community puts new constraints on the Agencies, with competition between data conservation/diffusion and the implementation of new instruments or operational costs. The scientific community also has to give sufficient priority to this activity, in projects, evaluation, and strategic plans, and to encourage motivated scientists and engineers to work on data conservation and diffusion. Projects have to make their data available in a usable form, i.e. data has to be properly selected, organized and documented, and the "project memory" has to be kept.

Astronomy has very rapidly taken advantage of the new technical possibilities, by developing on-line information services and information networking. It is a small discipline with few commercial constraints, which has helped to build long-term partnerships to define community standards, thus allowing the development of links and of generic tools to access data. The network of astronomical information goes from observations, distributed in observatory archives, to results published in electronic journals, and also includes disciplinary centres distributing data and information in a given domain, and Data Centers building value-added services and generic tools. For instance, the Centre de Données astronomiques de Strasbourg (CDS), created in 1972 with the mission of taking care of electronic data, of building expertise about the data, of implementing tools for science, and of playing an international role, now summarizes its charter as follows: "Collect, homogenize, preserve, distribute astronomical information, for the usage of the whole astronomy community".

Astronomy has developed disciplinary standards and tools for interoperability: for instance, FITS is a widely used data format (images, spectra, tables), and data from any telescope is kept in FITS format, thus allowing the usage of common tools to deal with it. Another example is the "bibcode" (a 19-character description of published references), first defined by database managers who had to exchange bibliographic information, then adopted and extended by the reference astronomical bibliographic database, ADS, then by the journals when they developed electronic versions, then by observatory archives when they wished to implement links with published papers using their observations. The successful networking of bibliographic information in astronomy demonstrates how a high level of interoperability can be built using de facto standards, with a bottom-up approach defined by a small group of practitioners (long before the advent of the Internet!) and a "snowball effect" in the community. In this model, links are easy to build but the quality of contents and validation are fundamental and in the hands of specialists. To take into account the current development of general standards for bibliography, a gateway with bibcode will be developed, to preserve on one hand the human readability of the bibcode and the functionalities and networking of astronomy on-line resources, and on the other hand to permit links with other disciplines (e.g. a correspondence table between the bibcode and DOI).
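
The human readability of the bibcode comes from its fixed-width fields. The sketch below splits a 19-character bibcode into its parts, assuming the commonly documented layout (4-character year, 5-character journal, 4-character volume, 1-character qualifier, 4-character page, 1-character author initial, padded with dots). That layout is not spelled out in the abstract, and the bibcode used here is only an illustrative value.

```python
from typing import NamedTuple


class Bibcode(NamedTuple):
    year: int
    journal: str
    volume: str
    qualifier: str
    page: str
    author_initial: str


def parse_bibcode(code):
    """Split a 19-character bibcode into fields, dropping the dot padding."""
    if len(code) != 19:
        raise ValueError("a bibcode has exactly 19 characters")
    return Bibcode(
        year=int(code[0:4]),
        journal=code[4:9].strip("."),
        volume=code[9:13].strip("."),
        qualifier=code[13].strip("."),
        page=code[14:18].strip("."),
        author_initial=code[18],
    )


print(parse_bibcode("1992ApJ...400L...1W"))
```

Because every field sits at a fixed offset, the same slicing works for any service in the network, which is part of what makes linking between archives, journals and databases cheap.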

Another example of a disciplinary standard is the description of tabular data, the "ReadMe", an ASCII file which contains information about the physical organization of the data and about its scientific meaning. It is common to catalogues, tables published in journals, surveys, and catalogues of observations in archives. It also allows a check of the homogeneity of tabular data in journals before publication, which improves the quality of published information, in addition to the peer review. This also means that published tables are usable data, and not only figures printed on paper. In recent years, an XML standard for astronomical tabular data has been developed (astrores), and the usage of XML is currently widely discussed and implemented in the context of the Virtual Observatory projects.

The Virtual Observatory (VO) can be defined as "an enabling and coordinating entity to foster the development of tools, protocols, and collaborations necessary to realize the full scientific potential of astronomical databases in the coming decade" (NVO White Paper, June 2000). The VO has many components, going from network infrastructure or computer and data GRID, to tools and standards for data mining, or to statistical tools able to access very large, distributed data sets. Several RTD/Phase A projects were accepted in 2001: the Astrophysical Virtual Observatory in Europe, the National Virtual Observatory in the USA, AstroGrid in the UK, ... The European project (PI: European Southern Observatory, partners: ESA-ECF, AstroGrid, CDS, Terapix, Jodrell Bank) has three work areas: Science use case and requirements, Interoperability deployment and demonstration, and Technology needs. CDS is responsible for the Interoperability Work Area, and a prototype using the CDS data federation and data integration tools is being developed, to give access to ground- and space-based, multi-wavelength, multi-technique archives. The prototype is made available to the community for scientific usage, in order to obtain science results and user feedback at an early stage of the project. Another objective is to establish a set of usable recommendations for helping archive managers to implement interoperability. CDS also leads the Interoperability Working Group set up by the OPTICON European Network, which aims at studying cost-effective tools and standards for improving access and data exchange to/from data archives and information services. The VO projects are coordinating their activities at the international level, and the first common milestone has been the definition of an XML standard for tabular data, VOTable (V1.0 was released on April 15, 2002).
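
To give a feel for what an XML standard for tabular data buys, the sketch below reads a small hand-written fragment in the general shape of a VOTable (FIELD elements describing columns, TR/TD elements holding rows). The fragment is an assumption made for illustration: it has not been validated against the VOTable definition, and the star names and coordinates are made up.

```python
import xml.etree.ElementTree as ET

# Hand-written, unvalidated fragment in the general shape of a VOTable.
SAMPLE = """<VOTABLE version="1.0">
  <RESOURCE>
    <TABLE name="stars">
      <FIELD name="name" datatype="char" arraysize="*"/>
      <FIELD name="ra" datatype="double" unit="deg"/>
      <FIELD name="dec" datatype="double" unit="deg"/>
      <DATA><TABLEDATA>
        <TR><TD>Star A</TD><TD>10.5</TD><TD>-20.25</TD></TR>
        <TR><TD>Star B</TD><TD>200.0</TD><TD>45.0</TD></TR>
      </TABLEDATA></DATA>
    </TABLE>
  </RESOURCE>
</VOTABLE>"""


def read_table(xml_text):
    """Return the rows of the first table as dicts keyed by column name."""
    table = ET.fromstring(xml_text).find("./RESOURCE/TABLE")
    names = [field.get("name") for field in table.findall("FIELD")]
    rows = []
    for tr in table.findall("./DATA/TABLEDATA/TR"):
        cells = [td.text for td in tr.findall("TD")]
        rows.append(dict(zip(names, cells)))
    return rows


for row in read_table(SAMPLE):
    print(row)
```

The column descriptions travel with the data, so a generic tool can interpret a table it has never seen before, which is the same role the "ReadMe" plays for ASCII tables.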

CDS - http://cdsweb.u-strasbg.fr/
AVO - http://www.eso.org/projects/avo/
NVO - http://www.us-vo.org/
AstroGrid - http://www.astrogrid.org/
OPTICON - http://www.astro-opticon.org/
VOTable - http://cdsweb.u-strasbg.fr/doc/VOTable/

Online Information in Astronomy  more (ppt-slides, 3199 KB)



 

"TORII: Access to Digital Research Community"
by Fabio Asnicar, SISSA, Italy

The communication of the results of scientific research, and in many ways, research itself have changed in recent years as digital means of information production, distribution and access have become widespread. Paper preprints have been replaced by electronic archives, mail and phone calls by e-mail, typewriters and hand drawing by text and graphics software programs, cabinet files by saved directories on hard disks. These new tools, together with multimedia presentations and conference websites, constitute the growing digital network of information that is taking over many aspects of the working place of research. It is a system in which the information flow is regulated, integrated and made available by the software and the network.

This digital network of research is currently organized in three layers:

  • Repositories of information: open archives and databases. This first level is the analogue of library and publishers' stacks.
  • Services over and for information: e.g. review journals, cross-citation. They are the analogue of, for instance, library desks and paper journals.
  • Digital communities: a synergic union of services and information. Ideally, they replace your desktop environment by giving access to the tools you use in your everyday work.

In this talk we will present Torii, a system that gives direct access to the digital research community. All the tools and documents the user needs are collected under a unified access point, organized according to his needs and ready for him wherever he is and at any time he may need them. An intuitive user interface helps the user to navigate. All the tools the user needs are at his fingertips. The choice of archives and subjects is easily customized to fit his interests. This platform grows as the digital community grows. New features will be added as they become available in the future.

The personal folder is the hub of the system. The user can store his documents here for future reference or to be printed or sent to others. The personal folder is easy to use by means of its drag-and-drop interface. It ideally replaces the filing cabinet where paper documents used to be stored. Stored documents can be ranked according to his profiles, impact factors or evaluation tools. The user will find in his personal folder new documents suggested by the social filtering engine, and he can attach comments to any document, either for himself or to be shared with the community.

Documents to be manipulated can be organised in a multi-layered stack. It could be an entry in a database, and as a new layer is added so is the entry column modified in the database, or it could be a collection of documents managed by a web server that keeps track of their relationships and modifications. Access to a multi-layered document is dynamic. According to who the user is at a given moment (reader, author, referee, editor), he has access to different layers. Dynamic access requires an appropriate interface between the multi-layered documents and the users. It also requires intelligent agents to sift through the increasingly large amount of information, to shape it into some hierarchy and thus make it usable.
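
A minimal sketch of role-dependent access to a layered document follows. The four roles come from the text; the layer names and the mapping between roles and layers are hypothetical and serve only to illustrate the idea.

```python
# Hypothetical mapping from role to the layers that role may see.
LAYER_ACCESS = {
    "reader": {"text"},
    "author": {"text", "drafts"},
    "referee": {"text", "reports"},
    "editor": {"text", "drafts", "reports", "decisions"},
}


def visible_layers(document, role):
    """Return only the layers of `document` that `role` is allowed to see."""
    allowed = LAYER_ACCESS.get(role, set())
    return {name: content for name, content in document.items() if name in allowed}


paper = {
    "text": "Published version",
    "drafts": "Earlier revisions",
    "reports": "Referee reports",
    "decisions": "Editorial decision",
}
print(sorted(visible_layers(paper, "referee")))
```

The same stored document yields a different view for each role, and an unknown role sees nothing.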

Key features for the integration of dynamic access to the information into the portal are the XML language and the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH). The XML language is used to encapsulate in a common structure the exchanged information, originally stored in a variety of formats. This XML metadata structure represents the semantic aspects related to the data.

The OAI-PMH defines an HTTP-based mechanism for harvesting XML files containing metadata from repositories. This is the basic communication protocol between Torii and the underlying services, operating in a location-transparent way. Through the use of the OAI-PMH, Torii will be easily extensible to any archive implementing the protocol.
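
As a minimal sketch of what harvesting XML metadata means for a client, the code below extracts identifiers, titles and creators from a response carrying unqualified Dublin Core. It assumes the OAI-PMH 2.0 and Dublin Core namespaces; the sample record and the function name are made up for this example and do not describe Torii's internals.

```python
import xml.etree.ElementTree as ET

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Made-up response with a single record, for illustration only.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <GetRecord>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>A sample title</dc:title>
          <dc:creator>Doe, J.</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
  </GetRecord>
</OAI-PMH>"""


def extract_records(response_xml):
    """Return one dict per record with its identifier, titles and creators."""
    root = ET.fromstring(response_xml)
    records = []
    for record in root.iterfind(".//oai:record", NS):
        records.append({
            "identifier": record.findtext("oai:header/oai:identifier", namespaces=NS),
            "titles": [e.text for e in record.iterfind(".//dc:title", NS)],
            "creators": [e.text for e in record.iterfind(".//dc:creator", NS)],
        })
    return records


print(extract_records(SAMPLE))
```

Because every compliant archive answers in the same envelope, the same parsing code serves any repository, which is what makes extension to new archives cheap.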

In a user-friendly information society, information overload is limited and information delivery is personalized: the broadcasting of information is replaced by a more effective narrowcasting, and mass media are replaced by personal media tailored to each user's needs. These aims can be reached by more effective systems for information access. Torii provides a filtering component that skims an overly large set of retrieved information, thus providing the user only with the information nearest to his interests.

In Torii the user defines his research interest profiles by filling in a form; from this a user profile is derived, based on a semantic network. The profile is automatically updated every time the user provides explicit relevance feedback on some new documents. Documents to be evaluated by the cognitive filtering module are processed through information extraction techniques aimed at capturing the meaning of the document content. These techniques exploit linguistic processing and statistical analysis. Every day the filtering module filters the submissions to the archives according to the user profiles, and the graphical user interface provides tools to rank displayed documents according to the user's profiles. Out of the 30-50 daily submissions, the user is able to see the 3-4 most relevant at the top of the list.
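
The ranking step can be sketched as follows. This is not Torii's method: the abstract describes a profile based on a semantic network, whereas this example substitutes a plain bag-of-words profile and cosine similarity to show the general shape of profile-based ranking. All names and sample texts are invented.

```python
import math
from collections import Counter


def vectorize(text):
    """Turn text into a bag of lowercase alphabetic words."""
    return Counter(word for word in text.lower().split() if word.isalpha())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(weight * b[term] for term, weight in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rank(profile_text, documents, top=3):
    """Return the ids of the `top` documents most similar to the profile."""
    profile = vectorize(profile_text)
    scored = [(cosine(profile, vectorize(text)), doc_id) for doc_id, text in documents.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # best first, ties by id
    return [doc_id for _, doc_id in scored[:top]]


submissions = {
    "a": "string theory and branes",
    "b": "protein folding dynamics",
    "c": "quantum string dualities",
}
print(rank("string theory", submissions, top=2))
```

Relevance feedback could be layered on top by adding the words of documents the user marks as relevant to the profile vector.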

Social filtering circulates interesting documents among users who share interests. It automatically feeds into the personal folder those documents that are potentially relevant for the user. The relevance of the documents is evaluated by similarity with the selections made by other users with similar interests. The process is the digital analogue of sharing papers among colleagues. It fosters the growth of the digital community.
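
A minimal sketch of this idea, under the assumption that each user's selections are available as a set of document identifiers: users are compared by the overlap of their folders (Jaccard similarity), and documents saved by sufficiently similar users are suggested. The similarity measure, the threshold and all names are choices made for this example, not a description of Torii's engine.

```python
def jaccard(a, b):
    """Overlap of two sets, from 0.0 (disjoint) to 1.0 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(user, folders, min_similarity=0.3):
    """Suggest documents saved by similar users that `user` has not saved yet."""
    own = folders[user]
    scores = {}
    for other, docs in folders.items():
        if other == user:
            continue
        similarity = jaccard(own, docs)
        if similarity < min_similarity:
            continue  # ignore users with little in common
        for doc in docs - own:
            scores[doc] = scores.get(doc, 0.0) + similarity
    return sorted(scores, key=lambda doc: (-scores[doc], doc))


folders = {
    "ann": {"p1", "p2", "p3"},
    "bob": {"p2", "p3", "p4"},
    "eve": {"p9"},
}
print(recommend("ann", folders))
```

Here "ann" and "bob" share two of four distinct papers, so the one paper only "bob" has saved is suggested to "ann", while "eve" is too dissimilar to contribute.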

Quality control tools memorize and exploit human evaluation of documents. They provide users with the possibility to express their evaluation of a document by filling in a predefined form and writing free textual comments. The form results are used to compute statistical scores for the scientific quality of the document. The comments are general in nature, and each user can choose whether his comment will be public or for himself. Users can read all public comments on a document. These tools embody a first instance of open peer review in which the community as a whole participates in the review process.

A search engine, Okapi, is accessed directly from Torii. It offers a sophisticated search environment where you can look and search among the more than 150,000 documents currently stored in the archives. Okapi offers advanced retrieval mechanisms based on the probabilistic model of retrieval and relevance feedback. It runs on both the document metadata and their full text. It is fast and accurate.
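
For readers unfamiliar with probabilistic retrieval, the sketch below implements BM25, the ranking function commonly associated with the Okapi system. Whether the Okapi installation in Torii uses exactly this formula or these parameters is an assumption; the inverse document frequency uses the non-negative `log(1 + ...)` variant, and the sample documents are invented.

```python
import math
from collections import Counter


def bm25_scores(query, documents, k1=1.2, b=0.75):
    """Score each document against the query with BM25 (needs at least one document)."""
    tokenized = {doc_id: text.lower().split() for doc_id, text in documents.items()}
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized.values()) / n_docs
    # Number of documents containing each term.
    doc_freq = Counter(term for tokens in tokenized.values() for term in set(tokens))
    scores = {}
    for doc_id, tokens in tokenized.items():
        counts = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            tf = counts[term]
            if tf == 0:
                continue
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            # Term frequency saturates with k1 and is normalized by document length via b.
            score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(tokens) / avg_len))
        scores[doc_id] = score
    return scores


docs = {
    "d1": "neutrino oscillation data",
    "d2": "lattice gauge theory",
    "d3": "neutrino mass neutrino mixing",
}
print(bm25_scores("neutrino", docs))
```

Relevance feedback, also mentioned in the abstract, would adjust term weights using documents the user has judged relevant; that step is omitted here.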

An assistant monitors the user's search and helps him with useful hints and terminological and contextual suggestions. It alerts the user to dead-end searches leading to hundreds of documents or no document at all. The user is made aware of strategic aspects of searching that allow him to fully exploit all information resources and services. The assistant comes fully integrated into the Okapi search engine of Torii.

Torii also integrates iCite. This tool extracts all citations from all the documents submitted to the archives. These are used to rank documents in Torii so that you can order them according to their impact factors. It is a completely automatic system that creates a net of cross-references inside the archives. It is an instance of a service, the second level of the three-layer structure, that can also be accessed independently at icite.sissa.it to search for citation patterns and rankings.
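
Once citations have been extracted, ranking by them is straightforward. The sketch below uses a raw count of distinct citing documents as a stand-in for the impact measure; the exact measure iCite computes is not given in the abstract, and the function names and sample data are invented.

```python
from collections import Counter


def citation_counts(references):
    """Count, for each document, how many distinct other documents cite it.

    `references` maps each citing document to the list of documents it cites.
    """
    counts = Counter()
    for citing, cited_list in references.items():
        for cited in set(cited_list):  # count each citing document once
            if cited != citing:        # ignore self-citations
                counts[cited] += 1
    return counts


def rank_by_citations(references):
    """Return all known documents, most cited first (ties broken by id)."""
    counts = citation_counts(references)
    everyone = set(references) | set(counts)
    return sorted(everyone, key=lambda doc: (-counts[doc], doc))


refs = {"A": ["B", "C"], "B": ["C"], "C": [], "D": ["C", "C", "B"]}
print(rank_by_citations(refs))
```

The `references` mapping is also the net of cross-references itself: inverting it gives, for every document, the list of documents that cite it.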

Torii is ready to move on into the future of digital networking. As the next generation of wireless systems comes into production, Torii will be accessible from the user's mobile phone. The user can already connect and use it via WAP at torii.sissa.it/wml/ia.wml, but the full potential of the system must wait for the broad bandwidth of 3G to come into being. At that point, the user will be able to browse documents and use his personal folder and any other feature of Torii as he travels.

More information about Torii can be found at: http://tips.sissa.it/docs/booklet.pdf

Torii - Access the Digital Research Community  more (ppt-slides, 507 KB)



 

"Cyclades: an Open Collaborative Virtual Archive Environment"
by Umberto Straccia, IEI-CNR, Italy

The main goal of CYCLADES is to develop an open collaborative virtual archive service environment supporting both single scholars and scholarly communities in carrying out their work. In particular, it will provide functionality to access large, heterogeneous, multidisciplinary archives distributed over the Web and to support remote collaboration among the members of communities of interest.

CYCLADES will run on the data environment composed of the archives that adhere to the Open Archives Initiative harvesting protocol specifications (http://www.openarchives.org).

From the technical point of view, CYCLADES will consist of the following federation of interoperable services:

  • Access Service
    It harvests information from the set of OAI compliant archives, and indexes and stores the gathered information in a local database.
  • Collection Service
    It provides mechanisms for dynamically structuring the overall information space into meaningful (from some community's perspective) collections.
  • Search and Browse Service
    It supports the users in formulating queries and develops plans for their evaluation. In particular, it provides an advanced multilevel browse facility, completely integrated with the search facility, that allows one to browse at schema, attributes, and document levels.
  • Filtering Service
    It supports information filtering on the basis of individual user profiles, and profiles of the working communities the user belongs to. User and community profiles are automatically inferred by monitoring the user behavior.
  • Recommendation Service
    It provides recommendations about newly published articles within a working community. The choice about what recommendations to send and to whom is based on both the user and the working community profiles.
  • Collaborative Work Service
    It supports collaboration between members of communities and project groups by providing functionality for creating shared working spaces referencing users' own documents, collections, recommendations, related links, textual annotations, ratings, etc.
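
As a minimal sketch of how the Access and Collection services might fit together, the code below treats a collection as a stored query over the locally indexed records, so that membership is computed dynamically as new records are harvested. The record fields, function names and sample data are hypothetical and do not describe the CYCLADES implementation.

```python
def make_collection(**criteria):
    """Define a collection as a stored query: field name -> required substring."""
    def contains(record):
        return all(
            needle.lower() in record.get(field, "").lower()
            for field, needle in criteria.items()
        )
    return contains


def members(collection, records):
    """Return the identifiers of the harvested records that belong to the collection."""
    return [record["identifier"] for record in records if collection(record)]


# Hypothetical records, as the Access Service might store them after harvesting.
harvested = [
    {"identifier": "oai:a:1", "title": "Digital library interoperability", "subject": "computer science"},
    {"identifier": "oai:b:7", "title": "Galaxy surveys", "subject": "astronomy"},
    {"identifier": "oai:a:9", "title": "Metadata harvesting in practice", "subject": "Computer Science"},
]

cs_collection = make_collection(subject="computer science")
print(members(cs_collection, harvested))
```

Because the collection is a query rather than a fixed list, a record harvested tomorrow that matches the criteria becomes a member without any further action.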

In this presentation, we will introduce each of the above services and we will discuss how each of them is influenced by the content harvested by the OAI-PMH.

CYCLADES - An Open Collaborative Virtual Archive Environment  more (ppt-slides, 702 KB)


 
 
 

Imprint  

The Open Archives Forum (OAF) is an IST Accompanying Measures project (IST-2001-320015).
The partners of OAF are: University of Bath-UKOLN (United Kingdom), Istituto di Scienza e Tecnologie della Informazione-CNR (Italy) and Computer- and Media Service (Computing Center) of Humboldt University (Germany).
