Abstracts of the Presentations

Tutorial: OAI and OAI-PMH for Beginners

An introduction to the Open Archives Initiative and the Protocol for Metadata Harvesting

Uwe Müller (Humboldt University), Pete Cliff (UKOLN) and David Casal (University of East Anglia)

Originally developed as a means of disseminating metadata from preprint and eprint servers, the OAI Protocol for Metadata Harvesting (OAI-PMH) has since become a widely known solution for connecting distributed electronic archives of all kinds. The OAI-PMH owes much of its acceptance, and not only among experts, to its simplicity and its comparatively low implementation costs.

After a brief outline of the protocol's genesis and its development to date, this tutorial will give an introduction to the main ideas of the OAI-PMH, its general functioning and some protocol details. We will then address specific implementation issues for data providers and service providers, including both the necessary steps for a local implementation and several examples of freely available and adaptable implementation tools. The tutorial will also provide an overview of the implementation of a data provider metadata set.

The OAI tutorial includes presentations as well as short breakout sessions offering the opportunity to discuss specific implementation issues. Handouts including a glossary of terms will be provided. The tutorial is aimed at those interested in the more technical aspects of the OAI Protocol for Metadata Harvesting.
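As a small illustration of the kind of protocol detail the tutorial covers, the following sketch builds OAI-PMH request URLs. The six verbs and the `metadataPrefix` and `from` arguments come from version 2.0 of the protocol; the repository address and the helper name `build_request` are assumptions made only for this example, and no network request is sent.

```python
from urllib.parse import urlencode

# The six request verbs defined by OAI-PMH version 2.0.
OAI_VERBS = {
    "Identify", "ListMetadataFormats", "ListSets",
    "ListIdentifiers", "ListRecords", "GetRecord",
}

# Hypothetical repository address, used only for illustration.
BASE_URL = "http://repository.example.org/oai"


def build_request(base_url, verb, **params):
    """Return the GET URL for one OAI-PMH request (hypothetical helper)."""
    if verb not in OAI_VERBS:
        raise ValueError(f"unknown OAI-PMH verb: {verb}")
    query = {"verb": verb}
    for key, value in params.items():
        # 'from' is a Python keyword, so callers pass 'from_' instead.
        query[key.rstrip("_")] = value
    return f"{base_url}?{urlencode(query)}"


# Ask for all Dublin Core records added or changed since 1 January 2003.
url = build_request(BASE_URL, "ListRecords",
                    metadataPrefix="oai_dc", from_="2003-01-01")
print(url)
```

A service provider would issue such requests over HTTP and parse the XML responses; a data provider implements the other side by answering them.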

Technical Validation Questionnaire - interim results

Birgit Matthaei (Humboldt University Berlin)

The Open Archives Forum launched a first Technical Validation Questionnaire in preparation for the first OA-Forum workshop in Pisa. The objective was to provide an overview of the status, experiences and future plans regarding the workshop participants' OAI implementations. At that time only participants of this workshop were surveyed. In Pisa the results of this small survey attracted considerable interest, and the OA-Forum project received feedback indicating that it would be a good idea to collect experiences from a broader spectrum of OAI implementers as well as to learn more about the starting conditions of those planning to implement or just beginning.

The focus of interest was on fundamental questions such as: Is there a large common ground and therefore good conditions for cooperating and learning from each other, or are requirements so individual that many further isolated solutions will inevitably be developed? Do the existing instruments for implementation fulfil all requirements, or should tools and protocols be better aligned than before with the needs of different communities?

Thus in the second questionnaire we added or changed some questions and extended the duration. In addition, we expanded the target audience for the questionnaire and subdivided the form to account for projects that have not yet integrated OAI-PMH as well as for experienced implementers.

This second, long-term survey will continue through autumn 2003. The presentation offers interim results based on the information participants have given so far about the software used, implementation costs, the range of material offered and interoperability, as well as experiences and expectations in different communities and in different countries.

prometheus - the distributed digital image archive for research and tuition

Georg Hohmann (University of Cologne)

As part of its "New Media in Education" programme, the German Federal Ministry of Education and Research is financing the cooperative university project "prometheus - the distributed digital image archive for research and tuition". The three-year project started work in April 2001. The partners are the University of Cologne, the Humboldt University of Berlin, the Justus-Liebig-University of Giessen and the University of Applied Sciences of Anhalt at Dessau/Köthen.

The aim of prometheus is to provide a unified interface to a conceptually very large number of different image databases that focus on the history of art and archaeology. The basic philosophy of the project is that the individual image databases can have arbitrarily different formats, which are unified by a server acting as a technical - and potentially conceptual - "broker". Based on this joint image archive and its media-specific potential, prometheus will provide a variety of didactic units to support academic teaching and (e-)learning in the disciplines of Art History, Classical Archaeology, and Design History.

The prometheus central server uses a data model developed over the years for historical research, which builds upon the idea of semantic network databases. In recent years it turned out that the data structures which can be administered by the system - kleio - are a superset of the data structures which can be expressed by XML. The stage one solution - the contributing databases send XML dumps to the central server, which maps their structures and semantics into a common system - is currently being replaced by stage two, where the contributing databases are mapped dynamically instead of dumps being transferred.
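The stage one idea of mapping heterogeneous XML dumps into a common structure can be pictured with a minimal sketch. This is not the kleio data model or the prometheus implementation; the database names, element names and field mappings below are all hypothetical and serve only to show the principle of a central "broker" translating local structures.

```python
import xml.etree.ElementTree as ET

# Hypothetical per-database mappings: local element name -> common field name.
FIELD_MAPS = {
    "db_a": {"titel": "title", "kuenstler": "artist", "datierung": "date"},
    "db_b": {"name": "title", "creator": "artist", "year": "date"},
}


def map_dump(source, xml_text):
    """Translate each <record> of one database's XML dump into common fields."""
    field_map = FIELD_MAPS[source]
    records = []
    for record in ET.fromstring(xml_text).iter("record"):
        common = {"source": source}
        for child in record:
            # Elements without a mapping are ignored in this sketch.
            if child.tag in field_map:
                common[field_map[child.tag]] = (child.text or "").strip()
        records.append(common)
    return records


# Two invented dumps with different local structures.
dump_a = ("<dump><record><titel>Mona Lisa</titel>"
          "<kuenstler>Leonardo</kuenstler></record></dump>")
dump_b = ("<dump><record><name>Guernica</name>"
          "<creator>Picasso</creator><year>1937</year></record></dump>")

unified = map_dump("db_a", dump_a) + map_dump("db_b", dump_b)
for row in unified:
    print(row)
```

The contributing databases need no knowledge of the common field names; all mapping knowledge sits with the central server, which is the design point the abstract makes.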

If we see the OAI as an attempt to provide integrated access to heterogeneous data sources through a specific protocol discipline required of the contributors, prometheus might be seen as the opposite extreme: the central server takes care of all the integration effort and makes no specific requirements of the contributing systems. Providing OAI access to all the contributing databases, simply by having the server support the protocol, would be easy. It is not currently planned, however, among other reasons because it would make the handling of existing copyright restrictions all the more difficult.


Building Digital Multimedia Libraries using MILESS and MyCoRe

Frank Lützenkirchen (University of Essen)

MyCoRe is an Open Source project for the development of Digital Library and archive solutions (or, put more generally, "Content Repositories" >> CoRe). In the MyCoRe project a group of universities is working on the development of a shared software core for such applications. This core will be adjustable to local requirements and easy to modify. This is expressed by the "My" in MyCoRe, which represents the local adaptability. On the basis of this core which will be available under the open source GNU General Public License, specific local applications will emerge at the participating institutes. The technical base of the system is formed of Java class libraries, XML technology and, besides Open Source database backends, IBM Content Manager and IBM DB2 for large applications.

The Core Functionalities of MyCoRe include the following: Document and Person Metadata, Internal logical Filesystem, Hierarchical Classification System, User and Rights Management, User and Author Editor Interfaces, Distributed Search Function and Interfaces for OAI and Web Services.

The project has its roots in the MILESS Project of the University of Essen, where a Digital Library application consisting of Java servlets and applets was developed on the basis of the IBM database solution Content Manager. MILESS contains a collection of multimedia teaching and learning materials such as animations, audio, video, images, and full text files. The material managed with the MILESS system is mainly local material produced or used in Essen. Since MILESS was developed to fit the local needs in Essen, it was never a primary goal to create a product that would flexibly adjust to the requirements of other locations. Out of the first group of later MILESS adopters, to which the University of Jena (Urmel) and the University of Leipzig (Quästur, Bach Digital) belong, the "MILESS Community" emerged (>> "M...y CoRe"). Within this community a detailed idea of the general requirements of Digital Library applications, their common structure and possible differences, was formulated. Out of this dialogue grew the decision to develop a shared software core for the different local applications based on the experiences with MILESS. This core is the MyCoRe system.

MyCoRe is an Open Source product under the GNU General Public License. The system will be implemented in Java. It will be a server-side application built of Java applications and Java servlets. The import and export format for the descriptive data will be XML. For now, IBM Content Manager and IBM DB2 will be used as the database backend, but the system is designed so that other backends (especially those developed as Open Source products and applying XML technology) can also be employed in the future. Adjustability, extensibility, and open interfaces are fundamental design premises. The main goal is to enable as many local applications as possible through "configuration in place of programming".


The OAI and OAI-PMH:
How did we get here, and where do we go from here?

Herbert van de Sompel (Los Alamos National Laboratory)

The Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) has its roots in the Santa Fe Convention of the Open Archives Initiative (OAI). The motivation to launch the OAI was to facilitate transformations in the scholarly communication system through specifying technical interoperability between nodes of such a system. That initial quest led the OAI into the realm of defining a generic protocol for Metadata Harvesting that can be used well beyond the initial application domain.

Now that the stable version 2 of the OAI-PMH is in place, the OAI is reflecting on its mission for the years to come, and refocusing on the original scholarly communication domain is high on the list of priorities. The keynote will address the original motivation to launch the OAI, and it will describe the evolution of the OAI work since its launch in 1999. It will also explain the areas of e-print interoperability that the OAI is interested in focusing on in future work, and it will discuss novel uses of the OAI-PMH in areas that go well beyond the typical realm of resource discovery.


Discovering Good Practice: Metadata and the NINCH Guide

Ian Anderson & Seamus Ross (HATII, University of Glasgow) 1

The National Initiative for a Networked Cultural Heritage (NINCH) "Guide to Good Practice in the Digital Representation and Management of Cultural Heritage Materials"2 is unique in being practice based and expert led. The Humanities Advanced Technology and Information Institute (HATII) at the University of Glasgow was contracted to undertake extensive research on current practice in digitisation on both sides of the Atlantic. Thus the Guide was based on empirical research, and offered good practice from some of the world's best-established digitisation projects. The NINCH Working Group, who conceived and brought the Guide to publication, strengthened the Guide with input from some of the leading experts in the field. This ensured that the Guide was not only timely but could highlight emergent trends, technologies and strategies. The Guide looks to the future as well as reflecting present processes.

The Guide highlights a variety of approaches to metadata amongst the projects analysed and interviewed by the HATII team. This diversity was not only a consequence of the variety of collections - text, image, sound and moving images - but a result of the different institutional contexts in which projects developed, the legacy of analogue cataloguing methods and different technological choices. Methods for representing metadata include: MARC, EAD, DC, TEI, TIFF, XML, and SGML. Thesauri and controlled vocabularies include: LCSC, CDWA, AAT, VRA, TGN, TGM, and ULAN. As this range of acronyms indicates, most projects adopted a hybrid approach to metadata creation, adopting and adapting various standards and technologies according to the type of metadata being created and project requirements.

Although projects were creating metadata to recognised standards and protocols that would enable interoperability, few took a pro-active approach to this. Whilst there was awareness of initiatives such as OAI, METS, CIMI and SMIL, projects were adopting a 'wait and see' approach. This cautious approach was not only a result of the immaturity of these initiatives but also reflected problems with existing metadata creation, particularly in the descriptive field. Even with institution or project based searching, many projects struggled to reconcile accurate descriptions of their digital collections with absent or inadequate thesauri, subject classifications and name control files. As initiatives such as OAI come on stream, parallel developments such as the UK Archival Thesaurus may help solve these problems. Nevertheless, the greatest challenge facing multimedia repositories may be populating interoperable metadata frameworks rather than implementing the technology.

1 www.hatii.arts.gla.ac.uk
2 http://www.ninch.org/guide

Paul Child (University of East Anglia)

Projects are temporary. They have a defined beginning and a defined end. As project workers, we would like our work to live on after the project has finished. The most common way of ensuring this longevity is to make it interoperable with as many other systems as we can. This can be a daunting task for a relatively short-lived organisation, and complicating factors, such as dealing with multiple media types and the need to reconcile project aims with the interoperability goal, can only make the situation worse.

ArtWorld began in 2000 as a three-year project led by the University of East Anglia and funded by the Joint Information Systems Committee. ArtWorld provides access to primary visual resource materials for the enhancement of learning and teaching in world art studies. It is a consortium project comprising art museums, university departments and research institutes in England, centred at the University of East Anglia, Norwich, and the University of Durham. Resources are being built by a team including teachers, students and museum curators from the consortium, together with external IT consultants.

In this presentation I will outline the current status of the ArtWorld project and how the project team has approached the difficulties in reconciling multimedia types, interoperability and conflicting project aims.


Resource Selection and Data Fusion in Distributed Multimedia Digital Libraries: The MIND Approach

Fabio Crestani (University of Strathclyde, UK)

MIND is an IST project funded under the EC Fifth Framework. It is led by the University of Strathclyde (UK), with the Universities of Florence (Italy), Duisburg (Germany), Sheffield (UK), and Carnegie Mellon (USA) as partners. The project started in January 2001 and is approaching its conclusion.

MIND addresses some of the issues that arise when people have routine access to thousands of heterogeneous and distributed multimedia Digital Libraries. Today, a person must know where to search, how to query different media, and how to combine information from diverse resources. As Digital Libraries continue to proliferate, in a variety of media, and from a variety of sources, these problems of resource selection and data fusion become major obstacles, as solutions based on a centralised repository of metadata will battle with scalability and substantiality.
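To make the data fusion problem concrete, the sketch below merges ranked result lists from two libraries whose scores are on different scales. It uses plain min-max score normalisation, a generic baseline chosen here for illustration and not the method developed in MIND; the library names, document identifiers and scores are invented.

```python
def normalise(results):
    """Rescale one library's scores to [0, 1] with min-max normalisation."""
    scores = [score for _, score in results]
    low, high = min(scores), max(scores)
    span = high - low
    # If all scores are equal, treat every document as equally relevant.
    return [(doc, (score - low) / span if span else 1.0)
            for doc, score in results]


def fuse(result_lists):
    """Merge normalised result lists into one ranking, best first."""
    merged = []
    for library, results in result_lists.items():
        if not results:
            continue  # a library that returned nothing contributes nothing
        merged.extend((doc, score, library)
                      for doc, score in normalise(results))
    return sorted(merged, key=lambda item: item[1], reverse=True)


# Invented raw scores: library_a and library_b use incomparable scales.
results = {
    "library_a": [("a1", 12.0), ("a2", 6.0), ("a3", 3.0)],
    "library_b": [("b1", 0.9), ("b2", 0.5)],
}

ranking = fuse(results)
for doc, score, library in ranking:
    print(f"{doc} ({library}): {score:.2f}")
```

Without normalisation every document from library_a would outrank every document from library_b simply because of its scoring scale, which is one reason fusion across heterogeneous libraries needs explicit treatment.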

In this talk I will give a brief overview of the results achieved in the MIND project. I will also outline the differences and similarities between the MIND and the OAI approaches to accessing multimedia information in distributed Digital Libraries. Although the two approaches are very different and start from almost opposite assumptions, I hope to show that there is strength in a possible combination of them.


