1. OAI for Beginners: Overview

How to use this tutorial

This tutorial is intended for those who are interested in more technical aspects of the OAI-PMH, although the Overview and the History and Development of OAI-PMH, together with the Glossary, are suitable for those who simply require some general background information. Each part builds on the material in the earlier parts, so a good approach is to work through the parts in order, referring to the glossary as required. In addition to the Glossary, you will find key terms defined within each part of the tutorial. Sets of quick quiz questions for the introductory sections help you to check whether you've picked up key points along the way.

Overview (this part) introduces the basic concepts underlying the OAI and the OAI-PMH. Use this part to gain an understanding of what the OAI-PMH is, and what it does and does not provide. History and Development of OAI-PMH covers the emergence of the Open Archives Initiative, showing how it grew from roots in several earlier initiatives, and discussing the nature of the problems for which it aims to provide solutions. This part also surveys the development of the protocol (including the evolving nature, aims and technical components) from the Santa Fe Convention, through OAI-PMH v.1.0/1.1, to OAI-PMH v.2.0.

The rest of the tutorial contains more technical material. The Main Technical Ideas of OAI-PMH introduces and explains in some detail the key technical elements of the protocol. Implementing OAI-PMH explains how to implement the protocol as a Data Provider and as a Service Provider, covering both the necessary steps for a local implementation and several examples of freely available and adaptable tools. XML Schemas and Record Formats provides an overview of the implementation of a Data Provider metadata set, including coverage of XML schema and how to support multiple record formats.

Basic OAI concepts and features

--- Open Archives Initiative (OAI) ---

The essence of the open archives approach is to enable access to Web-accessible material through interoperable repositories for metadata sharing, publishing and archiving. It arose out of the e-print community, where a growing need for a low-barrier interoperability solution for access across fairly heterogeneous repositories led to the establishment of the Open Archives Initiative (OAI). The OAI develops and promotes a low-barrier interoperability framework and associated standards, originally to enhance access to e-print archives, but now taking into account access to other digital materials. As the OAI mission statement says: "The Open Archives Initiative develops and promotes interoperability standards that aim to facilitate the efficient dissemination of content."

Many communities are beginning to benefit, or could potentially benefit, from the open archives approach. The Internet and the growing mass of material in digital format have broadened the potential clientele of many repositories of information. Material can be accessed more widely and also exploited for purposes different from those that originally motivated the creation of the repositories. Moreover, the possibility of accessing multiple repositories enables the construction of new kinds of services that can better serve the needs of the users. An additional incentive is the potential for cost-saving inherent in new models of the scholarly communication process that could be supported with this approach.

As an organisation, the OAI has included an Executive for management, and Steering and Technical Committees for policy direction and evaluation of protocol developments. The Digital Library Federation (DLF), the Coalition for Networked Information (CNI), and the National Science Foundation (NSF) have funded the OAI. While the Executive and the funders are USA-based, the success of the OAI is firmly grounded in the participation of a community of people from around the world, particularly Europe as well as North America. Now that there is a well-developed and stable second version of the protocol, the need to keep control in the hands of a very small number of people who can take independent and speedy decisions may be less important when weighed against the perception of stability and authority conferred by control through a standards body such as ISO, and this possibility has been discussed within the OAI.

--- OAI Protocol for Metadata Harvesting (OAI-PMH) ---

The OAI Protocol for Metadata Harvesting (OAI-PMH) defines a mechanism for harvesting records containing metadata from repositories. The OAI-PMH gives a simple technical option for data providers to make their metadata available to services, based on the open standards HTTP (Hypertext Transfer Protocol) and XML (Extensible Markup Language). The metadata that is harvested may be in any format that is agreed by a community (or by any discrete set of data and service providers), although unqualified Dublin Core is specified to provide a basic level of interoperability. Thus, metadata from many sources can be gathered together in one database, and services can be provided based on this centrally harvested, or "aggregated", data. The link between this metadata and the related content is not defined by the OAI protocol. It is important to realise that OAI-PMH does not provide a search across this data; it simply makes it possible to bring the data together in one place. In order to provide services, the harvesting approach must be combined with other mechanisms.
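Because the protocol is layered on HTTP, a harvesting request is simply a URL with a few query parameters. The following is a minimal sketch in Python of how such a URL might be assembled. The repository address is hypothetical and the helper name build_request_url is an assumption made for illustration; the verb names and the oai_dc metadata prefix (unqualified Dublin Core) are those defined by OAI-PMH v.2.0.

```python
from urllib.parse import urlencode

# The six request types ("verbs") defined by OAI-PMH v.2.0.
OAI_VERBS = {
    "Identify",
    "ListMetadataFormats",
    "ListSets",
    "ListIdentifiers",
    "ListRecords",
    "GetRecord",
}


def build_request_url(base_url, verb, **arguments):
    """Return the URL for an OAI-PMH request (hypothetical helper)."""
    if verb not in OAI_VERBS:
        raise ValueError(f"Unknown OAI-PMH verb: {verb}")
    # The verb and its arguments travel as ordinary HTTP query parameters.
    query = urlencode({"verb": verb, **arguments})
    return f"{base_url}?{query}"


# Hypothetical repository address, used for illustration only.
BASE_URL = "http://repository.example.org/oai"

# Ask for all records in unqualified Dublin Core.
url = build_request_url(BASE_URL, "ListRecords", metadataPrefix="oai_dc")
print(url)
```

The sketch only constructs the request; it does not contact any server. A harvester would send the URL with any HTTP client and receive an XML document in response.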

Much promise is seen for the use of the protocol within an open archives approach. Support for a new pattern for scholarly communication is the most publicised potential benefit. Perhaps most readily achievable are the goals of surfacing 'hidden resources' and low-cost interoperability. Although the OAI-PMH is technically very simple, building coherent services that meet user requirements remains complex. The OAI-PMH could become part of the infrastructure of the Web, as taken-for-granted as HTTP now is, if a combination of its relative simplicity and proven success by early implementers in a service context leads to widespread uptake by research organisations, publishers, and "memory organisations".

Seven key definitions

Open Archives Initiative (OAI)
OAI is an initiative to develop and promote interoperability standards that aim to facilitate the efficient dissemination of content.

The term "archive" in the name Open Archives Initiative reflects the origins of the OAI in the e-prints community where the term archive is generally accepted as a synonym for repository of scholarly papers. Members of the archiving profession have justifiably noted the strict definition of an "archive" within their domain; with connotations of preservation of long-term value, statutory authorization and institutional policy. The OAI uses the term "archive" in a broader sense: as a repository for stored information. Language and terms are never unambiguous and uncontroversial and the OAI respectfully requests the indulgence of the professional archiving community with this broader use of "archive".
(OAI definition quoted from FAQ on OAI Web site)

OAI Protocol for Metadata Harvesting (OAI-PMH)
OAI-PMH is a lightweight harvesting protocol for sharing metadata between services.

A protocol is a set of rules defining communication between systems. FTP (File Transfer Protocol) and HTTP (Hypertext Transfer Protocol) are examples of other protocols used for communication between systems across the Internet.

In the OAI context, harvesting refers specifically to the gathering together of metadata from a number of distributed repositories into a combined data store.
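Harvesting in this sense can be sketched as parsing the XML returned by several repositories and merging the records into one store. The Python example below is illustrative only: the two responses are made-up, heavily abbreviated ListRecords documents, the repository identifiers are hypothetical, and the helper names make_response and extract_records are assumptions. The XML namespaces are those used by OAI-PMH v.2.0 and unqualified Dublin Core.

```python
import xml.etree.ElementTree as ET

# Namespaces used by OAI-PMH v.2.0 responses carrying unqualified Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def make_response(identifier, title):
    """Build a tiny, made-up ListRecords response for illustration."""
    return f"""<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>{identifier}</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>{title}</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def extract_records(xml_text):
    """Map each record identifier in a response to its Dublin Core title."""
    root = ET.fromstring(xml_text)
    records = {}
    for record in root.iterfind(".//oai:record", NS):
        identifier = record.findtext("oai:header/oai:identifier", namespaces=NS)
        title = record.findtext(".//dc:title", default="", namespaces=NS)
        records[identifier] = title
    return records


# Responses from two hypothetical repositories.
responses = [
    make_response("oai:repoA.example.org:1", "Paper on e-prints"),
    make_response("oai:repoB.example.org:7", "Thesis on metadata"),
]

# The combined data store: metadata from both repositories in one place.
combined = {}
for response in responses:
    combined.update(extract_records(response))

print(combined)
```

A real harvester would fetch each response over HTTP and would typically keep far more of each record than its title; the point here is only the gathering of distributed metadata into a single store.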

Data Provider
A Data Provider maintains one or more repositories (web servers) that support the OAI-PMH as a means of exposing metadata.
(OAI definition quoted from FAQ on OAI Web site)

Service Provider
A Service Provider issues OAI-PMH requests to data providers and uses the metadata as a basis for building value-added services.
(OAI definition quoted from FAQ on OAI Web site)
A Service Provider in this manner is "harvesting" the metadata exposed by Data Providers.
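Since the protocol itself offers no search facility, any search is a value-added service that the Service Provider builds on top of the harvested metadata. The following Python sketch shows one very simple possibility, a case-insensitive title search over an already harvested store. The store contents, identifiers and the function name search are all hypothetical.

```python
def search(store, term):
    """Return identifiers of harvested records whose title contains term."""
    term = term.lower()
    return sorted(
        identifier
        for identifier, title in store.items()
        if term in title.lower()
    )


# A made-up store of harvested titles, keyed by record identifier.
harvested = {
    "oai:repoA.example.org:1": "Interoperability for e-print archives",
    "oai:repoB.example.org:7": "Metadata harvesting in practice",
    "oai:repoB.example.org:9": "Dublin Core metadata for theses",
}

matches = search(harvested, "metadata")
print(matches)
```

The search runs entirely against the Service Provider's local copy of the metadata; the Data Providers are not contacted at query time.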

Sources of further information

The rest of this tutorial.

Open Archives Initiative (OAI official Web site)

Open Archives Forum (OA-Forum Web site)

Quick Quiz Questions

Answer and 'Mark' each question separately. Feedback is provided for each marked answer. Once you have marked a question, you can get a further 'Explanation' of the answers. When you have finished, check your total marks for the questions you tried. The marks are provided only to you; they are not stored when you leave this page.

Q1. What is the OAI?

(Select one answer)

(a) The OAI is an initiative to provide open access to the output of scientific and other scholarly research.
(b) The OAI is an initiative supporting archives for the preservation of and access to digital resources.
(c) The OAI is an initiative to develop and promote interoperability standards that aim to facilitate the efficient dissemination of content.


Q2. What is the OAI-PMH?

(Select one answer)

(a) The OAI-PMH is a protocol for sharing metadata.
(b) The OAI-PMH is a low-barrier protocol for searching across repositories and retrieving resources from them.


Q3. What is the primary source of technical information about the OAI and OAI-PMH?

(Select one answer)

(a) The Web site of the Open Archives Forum
(b) The Web site of the BOAI
(c) The Web site of the Open Archives Initiative



