OAI-PMH: Basics and Resources

The Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) is a protocol for exposing structured repository metadata so that service providers can harvest it by issuing requests.

Why learn about OAI-PMH?

Taking advantage of repositories (data providers) and services (service providers) that offer metadata using OAI-PMH gives your resources better visibility and access.  For example, many discovery services (the “harvesters”) use OAI-PMH metadata to index open access articles held in institutional repositories.

The Basics

Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) specifies how metadata is structured and presented for ingestion by external services, usually on the Internet.  OAI-PMH metadata is encoded in Extensible Markup Language (XML).  OAI-PMH records are harvested using HTTP requests.

OAI-PMH is a project of the Open Archives Initiative.

OAI-PMH Records

An XML-encoded OAI-PMH record is organized into the following parts:

  • Header – Unique identifier, datestamp, set membership, status (optional).
  • Metadata – Set of metadata, often in simple Dublin Core.
  • About – Optional information about the metadata itself, such as rights statements and provenance.

The metadata format most commonly carried over OAI-PMH for library bibliographic data is OAI-DC (simple Dublin Core), which every OAI-PMH repository is required to support.  This is the format that many library vendors employ for exposure and harvesting.

OAI-DC (Dublin Core) metadata schema

View the OAI-DC schema at www.openarchives.org/OAI/2.0/oai_dc.xsd.

Below is an example of an OAI-PMH record using the simple Dublin Core schema.  It contains a header and metadata, but no optional about data.

<header>
  <identifier>oai:commons.erau.edu:jaaer-1001</identifier>
  <datestamp>2013-08-01T16:00:32Z</datestamp>
  <setSpec>publication:journals</setSpec>
  <setSpec>publication:journals-magazines</setSpec>
  <setSpec>publication:jaaer</setSpec>
</header>
<metadata>
  <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
    <dc:title>Back Matter</dc:title>
    <dc:date>1990-01-01T08:00:00Z</dc:date>
    <dc:type>text</dc:type>
    <dc:format>application/pdf</dc:format>
    <dc:identifier>http://commons.erau.edu/jaaer/vol1/iss1/2</dc:identifier>
    <dc:identifier>http://commons.erau.edu/cgi/viewcontent.cgi?article=1001&amp;context=jaaer</dc:identifier>
    <dc:source>Journal of Aviation/Aerospace Education &amp; Research</dc:source>
    <dc:publisher>ERAU Scholarly Commons</dc:publisher>
  </oai_dc:dc>
</metadata>
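
A record like the one above can be read with any XML parser.  The following is a minimal Python sketch, not a full harvester: it uses the standard library's xml.etree.ElementTree on a trimmed copy of the sample record.  The wrapping <record> element, the RECORD_XML constant, and the summarize_record function name are assumptions added for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace prefixes declared on the <oai_dc:dc> element of the sample record.
NS = {
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Trimmed copy of the sample record. The <record> wrapper is added here
# (an assumption for this sketch) so the text parses as one XML document.
RECORD_XML = """
<record>
  <header>
    <identifier>oai:commons.erau.edu:jaaer-1001</identifier>
    <datestamp>2013-08-01T16:00:32Z</datestamp>
    <setSpec>publication:journals</setSpec>
    <setSpec>publication:jaaer</setSpec>
  </header>
  <metadata>
    <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
               xmlns:dc="http://purl.org/dc/elements/1.1/">
      <dc:title>Back Matter</dc:title>
      <dc:source>Journal of Aviation/Aerospace Education &amp; Research</dc:source>
    </oai_dc:dc>
  </metadata>
</record>
"""


def summarize_record(xml_text):
    """Return the header fields and a few Dublin Core values as a dict."""
    record = ET.fromstring(xml_text)
    header = record.find("header")
    dc = record.find("metadata/oai_dc:dc", NS)
    return {
        "identifier": header.findtext("identifier"),
        "datestamp": header.findtext("datestamp"),
        # A record may belong to several sets, so collect every setSpec.
        "sets": [s.text for s in header.findall("setSpec")],
        "title": dc.findtext("dc:title", namespaces=NS),
        "source": dc.findtext("dc:source", namespaces=NS),
    }


if __name__ == "__main__":
    for field, value in summarize_record(RECORD_XML).items():
        print(field, "=", value)
```

In a live OAI-PMH response the header and metadata elements sit inside the protocol's own XML namespace, so the unprefixed lookups in this sketch would need that namespace added.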

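OAI-PMH records are often grouped into sets for organization and selective retrieval.  A record's header lists each set it belongs to in a setSpec element, whose value is a colon-separated path through the repository's set hierarchy.  The human-readable setName (and optional setDescription) for each set is returned by the ListSets verb.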

<setSpec>publication:journals</setSpec>

OAI-PMH Harvesting

OAI-PMH records are harvested with requests submitted using the HTTP GET or POST method.

Each harvesting request consists of the repository's base URL, a required verb argument, and any additional arguments the verb accepts, written as key=value pairs separated by ampersands (&).

http://www.xyz.edu/do/oai/?verb=ListRecords&metadataPrefix=oai_dc
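
As a rough illustration of how such a request is assembled, the Python sketch below builds the same URL with the standard library's urllib.parse.urlencode.  The build_request_url and fetch names are hypothetical helpers, and www.xyz.edu is the placeholder host from the example above, not a real endpoint.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def build_request_url(base_url, verb, arguments=None):
    """Join a base URL, the verb, and any further arguments into a GET URL."""
    params = {"verb": verb}
    params.update(arguments or {})
    # urlencode joins key=value pairs with "&" and escapes reserved characters.
    return base_url + "?" + urlencode(params)


def fetch(url, timeout=30):
    """Send the GET request and return the raw XML response as bytes."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


if __name__ == "__main__":
    url = build_request_url(
        "http://www.xyz.edu/do/oai/",  # placeholder base URL from the text
        "ListRecords",
        {"metadataPrefix": "oai_dc"},
    )
    print(url)
    # fetch(url) would send the request; it is not called here because the
    # host is only a placeholder.
```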

OAI-PMH includes a set of six commands (called verbs) that are invoked within HTTP.  They are:

  • GetRecord – Used to retrieve an individual metadata record.
  • Identify – Used to retrieve repository information (ex. name, version).
  • ListIdentifiers – Used to retrieve only headers.
  • ListMetadataFormats – Used to retrieve the available metadata formats.
  • ListRecords – Used to retrieve actual item metadata records.
  • ListSets – Used to retrieve the set structure of a repository.

A harvesting HTTP request may also contain arguments that narrow the scope of the records retrieved.  Common arguments include metadataPrefix= (ex. metadataPrefix=oai_dc), identifier= (ex. identifier=oai:lcoa1.loc.gov:loc.gmd), set= (ex. set=journals), and from= and until= (for selecting date ranges).
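
The Python sketch below shows one way to combine a verb with these arguments and to catch a missing required argument before the request is sent.  The REQUIRED_ARGUMENTS table reflects the required arguments in the OAI-PMH 2.0 specification; the selective_request function name and the sample dates are assumptions for illustration, and the sketch deliberately ignores the case of continuing a partial list.

```python
from urllib.parse import urlencode

# Arguments each verb requires on an initial request (OAI-PMH 2.0).
REQUIRED_ARGUMENTS = {
    "GetRecord": {"identifier", "metadataPrefix"},
    "Identify": set(),
    "ListIdentifiers": {"metadataPrefix"},
    "ListMetadataFormats": set(),
    "ListRecords": {"metadataPrefix"},
    "ListSets": set(),
}


def selective_request(base_url, verb, arguments):
    """Build a request URL, rejecting unknown verbs and missing arguments."""
    if verb not in REQUIRED_ARGUMENTS:
        raise ValueError(f"unknown OAI-PMH verb: {verb}")
    missing = REQUIRED_ARGUMENTS[verb].difference(arguments)
    if missing:
        raise ValueError(f"{verb} requires: {', '.join(sorted(missing))}")
    return base_url + "?" + urlencode({"verb": verb, **arguments})


if __name__ == "__main__":
    # "from" is a reserved word in Python, so the arguments are passed as a
    # dict rather than as keyword arguments.
    url = selective_request(
        "http://www.xyz.edu/do/oai/",  # placeholder base URL from the text
        "ListRecords",
        {
            "metadataPrefix": "oai_dc",
            "set": "publication:journals",
            "from": "2013-01-01",   # sample dates, chosen for illustration
            "until": "2013-12-31",
        },
    )
    print(url)
```

Note that urlencode escapes the colon in the set value as %3A, which the repository decodes back to publication:journals.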

We’ll look at a few services that offer access to or use OAI-PMH metadata.

bepress Digital Commons

Digital Commons, bepress’s institutional repository platform, supports OAI-PMH for harvesting repository records.  It maps fields to simple Dublin Core and exposes them using OAI-DC (and also flexible Simple Dublin Core, Qualified Dublin Core, and Electronic Theses and Dissertations Metadata Standard (ETD-MS) in Canada).

bepress has published a Digital Commons and OAI-PMH: Harvesting Repository Records guide for institutional repository administrators.

DSpace

DSpace is a free, open source repository system used by many libraries and other institutions.  It supports OAI-PMH for harvesting repository records.

The DuraSpace (DSpace) wiki provides information on enabling and configuring the OAI-PMH service in Apache Tomcat.

MarcEdit

MarcEdit includes a built-in OAI-PMH metadata harvester.  Using this feature you can harvest records from your institutional repository or other systems and convert them to MARC records for inclusion in your library catalog.  MarcEdit supports MARCXML, oaimarc, oai_dc, and MODS formats.

OAIster

OCLC’s OAIster is a “catalog of millions of records … built through harvesting from open access collections worldwide using the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH).”  OAIster records are available in WorldCat search results.

Administrators of institutional repositories can submit OAI-PMH access data to OCLC via the WorldCat Digital Collection Gateway.  Once configured, OCLC will index repository records periodically (ex. quarterly).

Springshare LibGuides

On July 15, 2015, Springshare announced OAI-PMH support for LibGuides.  LibGuides uses the standard Dublin Core OAI-DC metadata format.

Basically, with the full OAI-PMH support in LibGuides it is now even easier to index and show results from your guides alongside all of your library resources – in your catalog, discovery layer, federated search system, institutional repository, you name it – if it supports OAI-PMH you can index LibGuides in it, too.

Springshare gives you your base URL and example requests for all six OAI-PMH verbs.

Recently, Springshare announced an overhaul to its OAI offering to include real-time updating, sets support, database link assets, e-reserves courses, and custom guide metadata.

Many other online databases and services used by libraries use OAI-PMH data, including arXiv, CONTENTdm, Library of Congress, and Omeka.net.

Resources

OAI for Beginners – The Open Archives Forum online tutorial.

OAI-PMH Document – The official protocol, currently on version 2.0.

OAI-PMH Google Group – An active group for questions and answers.

#OAIPMH – Hashtag for Twitter posts on OAI-PMH.