open archives forum

2nd Workshop - Lisbon, December 02

Abstracts of the Invited Speakers

Open Access to Libraries

José Borbinha (National Library of Portugal), Nuno Freire (National Library of Portugal), Hans-Jörg Lieder (Berlin State Library) and Theo van Veen (Koninklijke Bibliotheek)

MALVINE and LEAF. Perspectives of the Open Archives Initiative Protocol for Metadata Harvesting in European Projects and Beyond.
Hans-Jörg Lieder (Berlin State Library)

MALVINE will provide access to distributed holdings of modern manuscripts kept in European libraries and archives. The service will be available as of 1 January 2003 at: http://www.malvine.org. The launch of MALVINE will not mark the end of a development but - hopefully - rather the beginning of a new phase of activities in the sector of modern manuscripts. The future emphasis of MALVINE will clearly be the maintenance and consolidation of the status quo and the integration of further data providers with a view to increasing the pan-European significance of search results.

At the time of launching MALVINE the OAI protocol for metadata harvesting will play no role in the network scenario. The presentation will describe how the MALVINE Consortium envisages the possible future use of the protocol in a landscape of European institutions - be these great or small - providing data to a joint service.

The LEAF project (http://www.leaf-eu.org) develops a model architecture for establishing links between distributed authority records (personal names only) and providing access to them. The system will allow uploads of the distributed authorities to a central system and automatically link those authorities concerning the same entity. It will be shown how the OAI protocol plays a vital role in keeping the central repository up-to-date at any time.
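
The abstract does not say how LEAF decides that two authority records concern the same entity. As a minimal sketch, assuming a deliberately naive rule (same normalised personal name and same life dates), the automatic linking step might look like the following Python. The record fields `id`, `name`, `birth` and `death`, the provider prefixes and both function names are hypothetical and are not taken from the project.

```python
import unicodedata
from collections import defaultdict


def match_key(name, birth=None, death=None):
    """Build a comparison key: accent-free, lower-case, order-independent name plus life dates."""
    decomposed = unicodedata.normalize("NFKD", name)
    plain = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    tokens = sorted(plain.casefold().replace(",", " ").split())
    return (" ".join(tokens), birth, death)


def link_authorities(records):
    """Group record ids that share a key; only groups with two or more ids are links."""
    clusters = defaultdict(list)
    for rec in records:
        key = match_key(rec["name"], rec.get("birth"), rec.get("death"))
        clusters[key].append(rec["id"])
    return [ids for ids in clusters.values() if len(ids) > 1]


# Hypothetical authority records uploaded by three different data providers.
records = [
    {"id": "lib-a:17", "name": "Pessoa, Fernando", "birth": 1888, "death": 1935},
    {"id": "lib-b:204", "name": "Fernando Pessoa", "birth": 1888, "death": 1935},
    {"id": "lib-c:9", "name": "Rilke, Rainer Maria", "birth": 1875, "death": 1926},
]
links = link_authorities(records)
```

A real service would need a far more careful rule (variant spellings, pseudonyms, missing dates); the sketch only shows the shape of the step in which uploaded records are grouped by entity.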

The European Library: opportunities for new services.
Theo van Veen (Koninklijke Bibliotheek)

The European Library Project (TEL), sponsored by the European Commission, brings together 10 major European national libraries and library organisations to investigate the technical and policy issues involved in sharing digital resources. The objective of TEL is to set up a co-operative framework, which will lead to a system for access to the major national and deposit collections in European national libraries.

In this presentation I will discuss the development of a metadata model and the development of an interoperability testbed. This testbed will offer distributed searching in the national collections via Z39.50 alongside searching a central index of metadata harvested from other collections via the Open Archives Initiative protocol (OAI-PMH). Access to this central index is offered via SRU - a new protocol for search and retrieval based on HTTP and XML, initiated by the Z39.50 Implementers Group as a low-barrier alternative to Z39.50.
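
To make "based on HTTP and XML" concrete, the following Python is a minimal sketch of how a client might build an SRU search request as a plain URL. It assumes the parameter names of the later published SRU 1.1 specification (`operation`, `version`, `query`, `startRecord`, `maximumRecords`), which may differ from the draft in use at the time of the talk; the base URL, the CQL query and the function name are hypothetical.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def sru_search_url(base_url, cql_query, maximum_records=10, start_record=1):
    """Return a searchRetrieve URL; the server would answer with an XML document."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",            # assumed version, see the note above
        "query": cql_query,          # query expressed in CQL
        "startRecord": start_record,
        "maximumRecords": maximum_records,
    }
    return base_url + "?" + urlencode(params)


# Hypothetical central index endpoint and query.
url = sru_search_url("http://example.org/sru", 'dc.creator = "Pessoa"')
```

Because the whole request is an ordinary URL and the response is XML, a client needs nothing beyond an HTTP library and an XML parser, which is the sense in which the protocol is low-barrier compared with Z39.50.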

The major challenges in the technical work of this project are related to the diversity of collections, languages and local services. From a user perspective, TEL hopes to meet these challenges by lowering the barrier to accessing the different collections: it will offer metadata searches integrated in a central index rather than just a menu of web sites, and we hope to offer translations of search terms and to facilitate searching on names by providing access to different name authority databases. From a provider perspective, we hope to lower the barrier to participation in TEL by using simple protocols and by facilitating conversions. And from a library perspective, we hope to share machine-readable metadata by developing a common metadata model and vocabularies with tools that allow for an ongoing evolution.

TEL will offer access to different services like multilingual services, name authority services and links to local services. One of the keys to meeting the above challenges is integration: metadata generated by each service should be usable when accessing other services. This requires a common understanding of metadata, an easy way to carry metadata from one service to other services and an easy way to associate related metadata. It will be discussed how the TEL metadata development, resulting in a TEL Application Profile, dynamically generated links and integrated indexing of different types of metadata will contribute to fulfilling those requirements. With name authorities as an example it will be demonstrated how this contributes to bringing new services within a “one mouse click distance”. Integrated object metadata and name authority search will help the user in finding main entries rather than telling the user that his search resulted in no hits. A pan-European "Central Name Authority File", one of the results of the LEAF project, can contribute to the realisation of these valuable services.


Open Access to Archives

George Mackenzie (The National Archives of Scotland) and Göran Kristiansson (The Regional Archives in Lund)

The OA Forum has commissioned this report from practising archivists in the UK and Sweden. It looks at the potential for using the Open Archives Initiative Protocol for Metadata Harvesting as a simple means of disseminating and exchanging archive catalogues. The world of conventional archives is interested in exchanging metadata, and has widely adopted international data structure standards produced by the International Council on Archives. It has also shown interest in a system for encoding catalogues known as Encoded Archival Description, or EAD. Archive descriptions are complex and collection based, proceeding from the general (fonds or collection level) to the particular (item level).

The report briefly examines two implementations of OAI, the University of Illinois Project and the AIM25 project in the UK. It also considers a related hybrid implementation in Australia, and a planned use of the protocol in A2A, another UK project. It observes that OAI can be used for exchanging archive descriptions, but that there are problems. The first is difficulty in accurately reflecting linkages between levels of description. The second is the inconsistent application of EAD. The report also looks briefly at alternative means archivists are using for exchanging metadata, particularly the Z39.50 protocol.

The report concludes that OAI will be used by conventional archives only if three conditions are fulfilled: first, archivists must be confident that compliant descriptions will respect archival principles; second, descriptions must be produced with little effort from existing systems; and third, archivists must believe that the wider OAI user base contains sufficient numbers of potential users. It suggests possible strategies in which archives would produce OAI compliant records for parts of their descriptions only.

Outline

1. Introduction

1.1 It is necessary first to consider terminology. The use of the word archives by the scholarly e-print world in the OAI is not itself the problem, but it is important to understand what archivists mean by the term. A generally agreed definition would be "documents created or accumulated by a person or organisation in the course of their business and preserved because of their continuing value". Archives have value not only for the information they contain, but also as evidence, and this underlies the descriptive systems and standards that archivists use.

1.2 Archivists are relative latecomers to standardised description. They have been much slower to develop tools and processes than the library world. This is largely because archives are by definition unique and arranging and describing them is therefore much more complex than for books or serials. Until fairly recently, archives described their holdings in individual ways, using locally determined rules. This did not matter, since users of archives had to make physical visits to see the records, and could have the house catalogues explained by the archivists. However, as electronic communications have spread, archivists have seen the desirability of exchanging and disseminating data, and have recognised the need to have standardised tools to do so. In the United States, where archives are often associated with library services, archivists using computers for descriptions began to adopt library-type standards and adapt them. The USMARC-AMC standard was adopted by the Society of American Archivists in the mid 1980s and is now used fairly extensively in the United States, mainly to describe collections of archives, rather than individual items [1].

1.3 The International Council on Archives (ICA) has developed two data structure standards for archive descriptions [2]: ISAD (G), which is a general standard, and ISAAR (CPF), which is a standard for describing the organisations and individuals who create records. Both are essentially sets of data elements, and give little or no guidance on the content. They have been widely accepted and are now in common but not comprehensive use in Europe and elsewhere in the world.

1.4 In a parallel move, a document type definition (DTD) for encoding archival descriptions has been developed by a group of interested scholars led by Daniel Pitti. Encoded Archival Description (EAD) matches the ISAD data elements and provides archivists with a means of encoding their finding aids. EAD is in fairly wide use in the archive world, especially in North America. It is now maintained by the Network Development and MARC Standards Office of the Library of Congress in partnership with the Society of American Archivists. The EAD website currently lists 56 implementers, mainly universities in the United States, and 26 further co-operative sites [3].

2. Context: How archives describe their holdings

2.1 Archival Description Systems
The essential idea underlying the way that archivists arrange and describe their holdings is context. What this means is that you cannot understand the content of a document unless you also understand the context of its creation. Archivists regard the principle of provenance as fundamental. This means that all the documents relating to an individual or organisation should be associated together, and any original order in them should be respected. From this, two principles follow. First, archive descriptions are hierarchical, and proceed from the collection or fonds level down to the item level. To understand each item, you need to know the levels above it. This hierarchy of levels is vital to archive descriptions and must be represented in the descriptive system. The second principle is that archive descriptions involve a separation of content information and context information. Information about what is in a record is separated from information about how the record was created, by whom, when and why. These principles are clearly implemented in the ISAD and ISAAR standards. The EUAN project concluded that these were fundamental to international description, and that the two standards were an essential step in any systems for data exchange across international boundaries.

2.2 XML / EAD
A typical implementation, the Swedish Arkis 2 system, further illustrates the archival approach to description. The new Arkis 2 system, being implemented in the Swedish National Archives, has a relational database using SQL Server and uses EAD as an exchange format. Once XML is more widely used and supported by web browsers, EAD will be used as an output format for displaying Arkis 2 searches on the Riksarkivet website.

2.3 The Riksarkivet set up their National Archive Database (NAD) in 1990, covering the holdings of the national archives and the provincial archives in Sweden. In 1993 it received a boost, with the provision of 1,000 young unemployed people to work on it, under a government scheme. An early principle was not to re-invent things, and accordingly the MARC-AMC standard was adopted, not to create records in a MARC system but to use it as an exchange format.

2.4 The data model for the Arkis 2 system, which is given in Annex 1, shows the relations of the different parts to each other, distinguishing the intellectual or knowledge level from the physical or operational level. Arkis 2, unlike its predecessor Arkis 1, allows true multi-level descriptions. It has been designed to cover all archival activities, including, for example, locations of documents. Arkis 2 has, from the outset, been designed as an Internet available service. During the development phase, EAD emerged and its value was quickly recognised. It is used in the same way as MARC-AMC was used, as a means of tagging data elements in the system to allow the export of archival information. In their application, some parts of the descriptions are tagged using a simple XML editor, both to give formatting instructions (boldface, italic, etc.) and to indicate certain descriptive elements (names within scope and content etc.). Arkis 2 can display multiple levels of information, including the automatic construction of an organisation chart, based on the descriptive levels. The display follows the format used in Windows file trees. This allows users to see, in a graphical way, how the levels of description are derived from organisational levels.

2.5 Dublin Core is not used for archive cataloguing because it is not capable of reproducing the richness of archive descriptions and, in particular, of showing the necessary levels and relationships. While no archives use it as a native description format, the EUAN project identified its potential for indexing and describing web pages.

3. Who is Using OAI in Archives?

3.1 Very few archive organisations know or use OAI. The list of OAI repositories contains only two archive implementations. These are briefly surveyed below, together with two further examples. It is important to stress that these sites (and archives generally) provide access to catalogue information only and not to digital materials.

  • UIUC: This is the biggest implementation, which has investigated conversion from EAD encoded descriptions to Dublin Core for exposure to OAI harvesters. It has demonstrated both the possibilities and the difficulties. In particular, the attempts to convert from EAD format to OAI revealed widespread differences in the way EAD was used by cataloguers, leading to difficulties of consistency.
  • AIM25: This UK based project covering a range of archive repositories in London has provided OAI compliant descriptions to UIUC. These have been produced in a simpler fashion than in most of the UIUC examples. AIM25 holds its descriptions in a database and can export them in EAD or other formats as required. They did not find OAI compliance difficult to achieve, but found it difficult to reproduce the linkages between archive material from the same collection, or between material created by the same person or organisation held at different repositories.
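
The conversion described in these two examples can be illustrated with a minimal sketch in Python. It is not the UIUC or AIM25 crosswalk: the mapping of EAD elements (`unitid`, `unittitle`, `unitdate`, `origination`, `scopecontent`) to Dublin Core element names is an assumption chosen for illustration, and the sample finding aid, its identifiers and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified EAD finding aid: one fonds with one item.
SAMPLE_EAD = """<ead>
  <archdesc level="fonds">
    <did>
      <unitid>GB-0000-ABC</unitid>
      <unittitle>Papers of an example family</unittitle>
      <unitdate>1850-1920</unitdate>
      <origination>Example family</origination>
    </did>
    <scopecontent><p>Correspondence and estate records.</p></scopecontent>
    <dsc>
      <c01 level="item">
        <did>
          <unitid>GB-0000-ABC/1</unitid>
          <unittitle>Letter</unittitle>
          <unitdate>1851</unitdate>
        </did>
      </c01>
    </dsc>
  </archdesc>
</ead>"""


def text_of(parent, tag):
    """Return the whitespace-normalised text of a child element, or None if absent."""
    element = parent.find(tag)
    if element is None:
        return None
    return " ".join("".join(element.itertext()).split()) or None


def unit_to_dc(unit, parent_id=None):
    """Flatten one level of description into a flat Dublin Core style record."""
    did = unit.find("did")
    record = {
        "identifier": text_of(did, "unitid"),
        "title": text_of(did, "unittitle"),
        "date": text_of(did, "unitdate"),
        "creator": text_of(did, "origination"),
        "description": text_of(unit, "scopecontent"),
        "type": unit.get("level"),
        # The only trace of the hierarchy that survives the flattening.
        "relation": parent_id,
    }
    return {key: value for key, value in record.items() if value}


def ead_to_dc(xml_text):
    """Return one flat record for the fonds and one for each first-level component."""
    archdesc = ET.fromstring(xml_text).find("archdesc")
    fonds = unit_to_dc(archdesc)
    items = [unit_to_dc(c, parent_id=fonds["identifier"]) for c in archdesc.iter("c01")]
    return [fonds] + items


dc_records = ead_to_dc(SAMPLE_EAD)
```

The item record comes out with a title of just "Letter" and a single `relation` pointing at the fonds; the creator and scope recorded at the higher level are not carried down. That is the linkage problem noted above. The consistency problem appears as soon as cataloguers place the same information in different EAD elements, since a fixed mapping like this one would then miss it.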

3.2 In Australia there is interest in the library world in using OAI, and one archive related project, Bright Sparcs, which provides biographical and name authority information on Australian scientists, is involved in work with the National Library of Australia (NLA). This is not a typical archive site, though it does illustrate better the potential of using OAI for name authorities. Bright Sparcs provides Dublin Core compliant descriptions of its pages, and these are harvested by the NLA and mounted on their site.

3.3 Another UK archive network is planning to use OAI. The Access to Archives (A2A) project brings together catalogue descriptions at collection and item level from a range of participating archive organisations in England, with central editorial control provided by a team based at the Public Record Office. They have around 4 million records captured or planned for capture. They intend to produce OAI compliant descriptions at collection level only, and use these as a signpost to the fuller, item level description, which will be available through the A2A site. They recognise the importance of judging whether users find this a helpful approach.

4. What Other Ways are Archives approaching Interoperability?

4.1 Archives across the world are increasingly interested in international interoperability. The HE Archives Hub in the UK is using Z39.50 in a sophisticated retrieval system known as Cheshire 2. They do not at present intend to provide OAI compliance, though they could do so, because they offer a fully functioning Z39.50 target, which can provide the full multi-level descriptions archives need.

4.2 Other archive institutions have adopted common portals or network systems. One example is the Scottish Archive Network (SCAN), which is bringing together collection level descriptions from 51 Scottish archives and will provide a fully searchable database linked back to item level descriptions in individual archives’ sites. The contents of the central SCAN database will not be visible to search engines, but by providing links and advertising the existence of the site, the aim is to reach users with an interest in Scottish history or archives and encourage them to search. Another example is CAIN, the Canadian Archival Information Network. Individual archive institutions in Canada supply descriptions to their provincial networks, which in turn produce records for the national network database.

4.3 The EUAN project examined both centralised and distributed means of exchanging data from archive catalogues and concluded that using fonds level descriptions as the basis for advertising archive holdings to users and potential users is a valid approach. The project built a simple working model based on harvesting XML descriptions at fonds level and presenting them on a single server. This showed how a system might work as a means of resource discovery. The project did not, ultimately, manage to test the model with a significant volume of international content, nor has it subsequently been extended, as envisaged in the business plan, to include a critical mass of partners. However, it did find, in a limited survey of Italian and other European archives, sufficient similarity in the way archive institutions approached description to make a common approach possible, particularly at the fonds level.

4.4 There is not much research available about how people search archives, particularly on the Internet. Some work has been done in Canada.

5. Conclusions

5.1 Few conventional archives yet know about OAI and even fewer are using it. The research for this report has necessarily been limited, but it has revealed only two actual implementations in conventional archives, one planned implementation and one hybrid that bridges archives and libraries.

5.2 Archive organisations will make catalogue information available on line and on the Internet, but do not normally provide access in digital form to more than a tiny fraction, if any, of the materials described. The archives that have implemented OAI are in this category.

5.3 OAI can be made to work for archives, but there are problems. The first and general one is how to represent adequately the complex relationships and hierarchies in archive descriptions, so that vital contextual information is not lost, or is at least available to the researcher. A second series of problems has been encountered in trying to convert from EAD encoded descriptions to OAI; these relate to the different ways in which EAD has been implemented. Greater prescription in the use of EAD would solve some of these problems.

5.4 One possible strategy is to use OAI for fonds or collection level descriptions only. These are non-hierarchical, thus avoiding the difficulty of linking levels, but will contain less richness of content. These would be seen as a signpost to a fuller description, with a link straight into the database of the holding institution.

5.5 Another related strategy is to use OAI for name authorities, which are also non hierarchical. This would depend on there being recognised national or international authorities in existence, with links to archive resources. This is the case in Sweden, but not in all European countries. The name authorities will lead a user on to archive or other material.

5.6 Archivists are only likely to use OAI:

  • if they are confident that OAI compliant descriptions will respect their multi-level descriptions;
  • if they can export data in an OAI compliant way with little or no additional work;
  • if they believe that OAI will let them reach new audiences that will be interested in their holdings.

5.7 In the longer term, if OAI is widely used by varied organisations and domains, will the result be increased problems of navigating among huge quantities of information? In these circumstances, will OAI provide sufficient functionality to enable a user to tailor requests?

1. Victoria Irons Walch, Standards for Archival Description (SAA, 1994) on-line version at http://www.archivists.org/catalog/stds99/chapter3.html
2. http://www.ica.org/biblio.php?pbodycode=CDS&ppubtype=pub&plangue=eng
3. http://www.loc.gov/ead/


Open Archives and Intellectual Property - incompatible world views?

Mark Bide (Rightscom)

The expert report commissioned by the Open Archives Forum discusses the relationship between the Open Archives Initiative and Intellectual Property. The Open Archives Initiative and the open access movement are widely conflated, which confuses much of the discussion surrounding OAI. So far as possible, we try to distinguish between these, although both are discussed.

Many of the issues which this raises have as much to do with commercial considerations as with legal ones, and it is inevitable that there should be some cross over between these different perspectives since "the content industry" is dependent on copyright for the security of its business model. Intellectual property is an essentially utilitarian concept, designed to maximise the value of creative effort for society as a whole as well as for individual creators.

Intellectual property is governed by national law, drawn up in accordance with international conventions and treaties.

National law has two distinctively different traditions: the continental European tradition is based on the "droit d'auteur", the inalienable right of the creator over the creation; the Anglo-Saxon tradition is more explicitly commercial, seeing copyright as predominantly a property right, something that can be traded. These differences in outlook sometimes lead to substantially different attitudes to Intellectual Property issues, although the difference in their practical impact is relatively limited.

Copyright provides creators with an exclusive right to control the copying and publication of their work for a limited period of time. This right may be assigned or licensed to others.

Moral rights provide additional rights to creators, including the right to be identified as the creator of a work, and the right to object to derogatory treatment of the work; moral rights carry significantly different weight in different national legislative frameworks.

All copyright legislation includes certain limited exceptions. These must (under international treaty) pass a "three step test" which ensures that the exceptions do not unduly interfere with the normal exercise of the creator’s rights. Exceptions are normally limited by a test of "substantiality" which cannot be objectively measured. Additional rights exist in many countries to protect databases which may not be protected by copyright law because they exhibit insufficient creativity.

The development of the global network has not altered the law of copyright - all existing legislation applies equally to content on the network as elsewhere. However, new legislation has been necessary to reflect changed circumstances, creating new exceptions to copyright and new protections for copyright owners.

Because of the ease with which intellectual property can be copied and distributed over the network, some owners of intellectual property rights believe that the law is not able to provide sufficient protection and are seeking to develop and implement technical measures to protect their content.

Some believe that there can be no effective technological measures for the protection of copyright, and that other ways will have to be found to compensate owners for casual copying. In some countries, this includes the introduction of levies on either copying equipment or media.

The existence of the network is also encouraging the development of new ways of licensing intellectual property, based on the "open software" model. These licences selectively assert creators’ rights under copyright law, but permit users wide licence to copy and distribute without payment.

Individual items of metadata may not be protected by copyright, to the extent that they are simply facts intrinsic to the resources that they describe. However, metadata records which include elements of significant creativity - including abstracts - may be "works" in their own right and protected by copyright. Collections of metadata may be covered by database right, even if the metadata records themselves are not covered by copyright.

The resources described by the metadata are likely themselves also to be subject to copyright protection, unless they have passed into the public domain because of their age. Our focus in this report is on academic journal articles, since these are the main subject of current Open Archives activity.

Although it might be assumed that academic institutions would in general own copyright in journal articles (the normal rule for employers whose employees create intellectual property), it is custom and practice, and often explicit in employment contracts, that academics retain rights in journal articles. We believe this is highly unlikely to change to any great extent in the foreseeable future.

Journal publishers have traditionally insisted on a full assignment of rights in articles that they publish. Many are now content to accept an exclusive licence to publish; however, an exclusive licence may be just as restrictive as an assignment.

Many journal publishers do not seek (at least at the present time) to restrict authors from posting copies of journal articles (either before or after formal publication) to the eprint archives. However, authors should ensure that they have an explicit understanding of the rights and the contractual situation, which may be complex.

Those who manage eprint servers are publishers, and will need to be aware of their responsibilities as such. This implies that they should ensure that they receive proper warranties that an author has the right to post an eprint of a paper.

Non-textual resources are more complicated than text resources from the point of view of rights clearance and ownership; the owners of the rights in these resources are often much more rigorous about their enforcement. Repositories that include non-textual materials will have to be very careful to ensure that they do not infringe any rights.

It is clear that authors’ attitudes to questions of intellectual property and Open Archives are substantially coloured by the value that they seek from publication (which is not directly monetary). Their behaviour indicates that, even in those disciplines where Open Archives have been long established, formal publication in the peer-reviewed literature remains essential. This is always likely to mean giving up some rights over the content.

Publishers' current attitudes to the Open Archives Initiative have been much affected by the confusion between the Open Archives Initiative and the open access movement. It is hardly surprising that publishers show little enthusiasm for what is often openly portrayed to them as an attempt to undermine their business.

It would be equally surprising if academic institutions did not favour a mechanism which might make the acquisition of journals content less expensive (or indeed anything else). This is the other side of the coin. However, they will have to take on considerable responsibilities if they are themselves to become publishers on a large scale.

We recommend that those involved as data providers and service providers in the OAI model should develop mechanisms to make explicit their understanding of the use to which harvested metadata will be put. To this end, we recommend that metadata harvested under the OAI protocol should include information about the permitted uses of the metadata itself and the rights and permissions status of the resource which it describes. We believe that those operating eprint archives - or any other online resource repository - will need to take their responsibility as publishers seriously. This will include developing "notice and takedown" procedures for dealing with situations when notice is given of alleged infringements of copyright.

There is ultimately no conflict between Open Archives and Intellectual Property - but Open Archives exist within the framework of Intellectual Property law, and would be advised to recognise this in the way that they operate.


OAI: Where we are now, how we got here, and where we are going

Simeon Warner (Cornell University)

Almost two years have passed since the first release of the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) in January 2001. During that period, the OAI-PMH has emerged as a practical foundation for digital library interoperability.

The OAI-PMH supports interoperability via a relatively simple two-party model. At one end, data providers employ the OAI-PMH to expose structured data (metadata) in various forms. At the other end, service providers use the OAI-PMH to harvest the metadata from data providers, then automatically process it and add value in the form of services.
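
The service-provider side of this model can be sketched in a few lines of Python. The verb `ListRecords`, the arguments `metadataPrefix`, `from` and `resumptionToken`, and the two XML namespaces below are taken from version 2.0 of the protocol; the base URL, the record identifier and the sample response are hypothetical, and the sketch parses a canned response instead of contacting a real data provider.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = "http://www.openarchives.org/OAI/2.0/"
DC_NS = "http://purl.org/dc/elements/1.1/"


def list_records_url(base_url, metadata_prefix="oai_dc", from_date=None, token=None):
    """Build the URL a harvester would request from a data provider."""
    if token:
        # A resumptionToken is sent on its own with the verb.
        params = {"verb": "ListRecords", "resumptionToken": token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if from_date:
            params["from"] = from_date  # harvest only records changed since this date
    return base_url + "?" + urlencode(params)


def parse_list_records(xml_text):
    """Return ([(identifier, datestamp, title)], resumption token or None)."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iter(f"{{{OAI_NS}}}record"):
        header = record.find(f"{{{OAI_NS}}}header")
        records.append((
            header.findtext(f"{{{OAI_NS}}}identifier"),
            header.findtext(f"{{{OAI_NS}}}datestamp"),
            record.findtext(f".//{{{DC_NS}}}title"),
        ))
    token = root.findtext(f".//{{{OAI_NS}}}resumptionToken")
    return records, (token or None)


# Hypothetical response from a data provider, trimmed to the essentials.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:example.org:1</identifier>
        <datestamp>2002-12-06</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Sample record</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>batch-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

records, token = parse_list_records(SAMPLE_RESPONSE)
next_url = list_records_url("http://example.org/oai", token=token)
```

A harvester would repeat the request with each returned token until none is left, then store the latest datestamp so that the next run can use `from` to fetch only new or changed records.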

While resource discovery is often mentioned as the exemplar service, other service possibilities include longevity and risk management, personalization, and current awareness.

The general acceptance of the OAI-PMH is based on a number of factors. It is intentionally low-barrier, exploiting widely deployed Web technologies such as HTTP and XML. It builds on many years of metadata practice, leveraging the development of a lingua franca metadata vocabulary in the Dublin Core Metadata Initiative. It accommodates a number of community and domain-specific extensions such as the co-existence of multiple domain-specific metadata vocabularies, collection description, and resource organization schemes.

The first sixteen months after the Version 1 release of the OAI-PMH were purposefully experimental. The intention during that time was to provide a reasonably stable platform for early adopters to test the concepts of metadata harvesting and build a number of fundamental services. Indeed, that intended stability was accomplished, with only one change in the protocol occurring over the sixteen months, due to a change in the XML Schema specification.

This talk reports on the results of the experimentation on the OAI-PMH carried out until now with both version 1 and version 2 and it outlines the expectations and prospects for the future.



Design of the PORTA EUROPA Portal (PEP) Pilot Project

Marco Pirri (University of Florence)

This talk concerns the conception of an OAI [1] compliant service that can manage three different digital historical archives maintained by the European University Institute (EUI) in Florence. This situation requires careful consideration of interoperability issues related to uniform naming, metadata formats, document models and access protocols for the different data sources.

In this talk we will present the design approach for the digital archive federation services to be developed in the Porta Europa Portal (PEP) Pilot Project. As a specialised portal, PEP should provide high-quality information, selected according to the criteria of originality, accuracy and credibility, together with the cultural and political pluralism derived from the EUI's profile. The information in Porta Europa will be relevant, reliable, searchable and retrievable.

To test the feasibility and the impact of the PEP project, the EUI committed itself to the development of a PEP prototype [2] concerning historical topics. To this end, three of the available digital historical archives were chosen for the implementation of the pilot. Our approach to solving problems of standardization and interoperability in the PEP pilot project is based on two main elements:

  • Metadata standard (Dublin Core [3])
  • Protocols (OAI-PMH)

The PEP (Porta Europa Portal) project refers to the integration of three digital libraries related to European history topics: Voices on Europe, Virtual Library and Biblio library catalogue.

Each of these data sources is characterized by:

  • a collection of data objects (books, journals, documents, multimedia objects etc.) available locally or through the network
  • a collection of metadata structures
  • a collection of services (access methods, management functions, logging/statistics, etc.)
  • a domain focus (topic)
  • a community of users

The need to integrate the three data sources arises from the topic (European history) and the user community, which are common to all three archives.

  • Voices on Europe (http://wwwarc.iue.it/webpub/Welcome.html): Voices on Europe is an archive containing electronic audio versions and electronic transcriptions of about a hundred interviews given by outstanding politicians and historians.
  • WWW-VL (Virtual Library) on European History Integration (http://vlib.iue.it/history/index.html): The Virtual Library (VL) is the Web's oldest catalogue, conceived by Tim Berners-Lee. Unlike commercial catalogues, it is run by a loose confederation of volunteers, who compile pages of relevant links for specific areas in which they are expert. The EUI Library Web site contains the complete list of VLs belonging to the WWW VL History Project at the University of Kansas, Lawrence (USA), mirrored at the European University Institute's Library.
  • Biblio (the EUI historical archives) (http://www.iue.it/LIB/Catalogue/): This is the library catalogue, containing more than 250,000 bibliographic records. Access to resources is supported by INNOPAC, a well-known library automation system.

The PEP Pilot Project is being developed according to the following steps:

  1. Analysis of the three data resources; in this part we first establish the current situation of the resources and identify the main issues involved in each case. Each resource raises different issues, which are identified and then addressed. This phase ends with a detailed description of the metadata formats, document models and access protocols for each of the data sources. The analysis revealed the strong points and the weaknesses of each digital library, setting the basis for the definition of a common document description model.
  2. Definition of the federation architecture (figure 1); the architecture of our federation service [4] is structured in three layers: the data source layer, where all information is stored with autonomy of representation and access interfaces; the adapter layer, where special adapters have to be implemented to provide uniform access and to transform the data-source-specific model into the global model of the federated system; and the federation layer, which is responsible for global data integration using a purpose-built database and acts as the OAI data provider. On top of these sits the user interface, which will be the OAI service provider.

Figure 1. PEP Architecture

Data Source Layer: these are the archives (digital libraries) whose integration we deal with: Voices on Europe, Virtual Library and Biblio library catalogue.

Adapter Layer: this layer provides uniform access to the information, hiding the differences in the data models and query interfaces. Here the metadata are translated from the source specific model into the global model of the federated system.
A further development of this work would be the adoption of the Web services technical framework, in which a standardized mechanism is used to describe, locate and communicate with each digital library. The main operation of this layer is the "extraction of data". This operation has to be automatic, so each interface has to be implemented specifically for its resource. For instance, SQL queries could be used to extract data from some data sources (Voices on Europe, Virtual Library), and external tools such as the Innopac tools could be used for the catalogue.
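The adapter idea can be sketched in Python as follows. This is an illustration under stated assumptions, not the project's implementation: the `interviews` table, its columns, the exported catalogue fields (`record_no`, `main_title`, `author`, `year`) and the mapping of the interviewee to `creator` are all hypothetical. The point is only that each adapter hides its source's access method and yields records with the same field names.

```python
import sqlite3

def make_demo_db():
    """Create an in-memory stand-in for an SQL-backed source (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE interviews "
        "(id INTEGER PRIMARY KEY, interviewee TEXT, title TEXT, recorded TEXT)"
    )
    conn.execute(
        "INSERT INTO interviews VALUES "
        "(1, 'A. Speaker', 'Interview on integration', '1998-05-04')"
    )
    return conn

def voices_adapter(conn):
    """Extract rows with an SQL query and map them to the global model."""
    rows = conn.execute(
        "SELECT id, interviewee, title, recorded FROM interviews ORDER BY id"
    )
    for row_id, interviewee, title, recorded in rows:
        yield {
            "identifier": f"voices:{row_id}",
            "title": title,
            "creator": interviewee,  # assumed mapping, for illustration only
            "date": recorded,
            "type": "Sound",
        }

def catalogue_adapter(exported_rows):
    """Map rows produced by an external export tool to the global model."""
    for row in exported_rows:
        yield {
            "identifier": f"biblio:{row['record_no']}",
            "title": row["main_title"],
            "creator": row["author"],
            "date": row["year"],
            "type": "Text",
        }

def collect(adapters):
    """Gather records from every adapter into a single uniform list."""
    records = []
    for adapter in adapters:
        records.extend(adapter)
    return records

if __name__ == "__main__":
    export = [{"record_no": "b1", "main_title": "A catalogue entry",
               "author": "B. Writer", "year": "1999"}]
    for record in collect([voices_adapter(make_demo_db()),
                           catalogue_adapter(export)]):
        print(record)
```

Because every adapter returns the same dictionary shape, the layer above can integrate the records without knowing how each source was queried.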

Federation Layer and User Interface: the OAI-PMH is implemented in these two layers, as follows:

  • Data Provider (The Federation layer)
  • Service Provider (The User Interface)
Moreover, the Federation layer has to describe the metadata of the three different resources in a common standard so that, in a second step, they can be stored in a single database.
To this end, a common metadata format (Meta Resource Card - MRC) must be devised for the three resources. To address the interoperability issue effectively, the Meta Resource Card should follow the unqualified Dublin Core standard to define the common fields.
The Federation Layer implements the interoperability functions, namely the OAI-compliant Data Provider, which is the core of the pilot project. The User Interface will be an OAI-compliant Service Provider and will use OAI harvesting to extract data. Initially, an externally implemented interface such as Arc [5] could be used as the Service Provider.
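As a rough Python illustration of what a common record restricted to unqualified Dublin Core could look like when exposed by a data provider, the sketch below serializes a card as an `oai_dc:dc` element. The fifteen element names and the two namespaces are those of unqualified Dublin Core and the `oai_dc` format; the card layout (a dictionary of element name to list of values), the function name `card_to_oai_dc` and the sample values are assumptions, since the actual Meta Resource Card is not specified here.

```python
import xml.etree.ElementTree as ET

# The fifteen elements of unqualified Dublin Core.
DC_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)
OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC_NS = "http://purl.org/dc/elements/1.1/"

def card_to_oai_dc(card):
    """Serialize a card (DC element name -> list of values) as oai_dc XML.

    Every element is optional and repeatable; names outside unqualified
    Dublin Core are rejected.
    """
    unknown = set(card) - set(DC_ELEMENTS)
    if unknown:
        raise ValueError(f"not unqualified Dublin Core: {sorted(unknown)}")
    ET.register_namespace("oai_dc", OAI_DC_NS)
    ET.register_namespace("dc", DC_NS)
    root = ET.Element(f"{{{OAI_DC_NS}}}dc")
    for name in DC_ELEMENTS:
        for value in card.get(name, []):
            ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")

if __name__ == "__main__":
    # Hypothetical card for one resource.
    card = {
        "identifier": ["voices:1"],
        "title": ["Interview on integration"],
        "subject": ["European history", "Oral history"],
    }
    print(card_to_oai_dc(card))
```

Restricting the card to these fifteen optional, repeatable elements keeps the mapping from each source simple, at the cost of losing source-specific detail.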


[1] OAI - Open Archives Initiative, http://www.openarchives.org
[2] PEP pilot project - Porta Europa Portal pilot project, http://www.iue.it/Personal/Staff/pirri/
[3] DC - Dublin Core Metadata Initiative, OCLC, Dublin, Ohio, http://www.dublincore.org/
[4] Endig, M., Hoding, M., Saake, G., Sattler, K.U. and Schallehn, E., 2000. Federation services for heterogeneous digital libraries accessing cooperative and non-cooperative sources. In: International Conference on Digital Libraries: Research and Practice, 2000, Kyoto, pp. 120-127.
[5] Arc - AcRoss Archive Search Service, http://arc.cs.odu.edu/

The Open Archives Forum (OAF) is an IST Accompanying Measures project (IST-2001-320015).
The partners of OAF are: University of Bath-UKOLN (United Kingdom), Istituto di Scienza e Tecnologie della Informazione-CNR (Italy) and Computer- and Media Service (Computing Center) of Humboldt University (Germany).
