OAI Protocol for Metadata Harvesting Tim Brody Intelligence, Agents, Multimedia Group University of Southampton OpCit – http://opcit.eprints.org/ www.ecs.soton.ac.uk BCS Metadata Meeting, London 29 th May 2002 (Many slides borrowed from Michael L. Nelson)
Page 1

OAI Protocol for Metadata Harvesting

Tim Brody
Intelligence, Agents, Multimedia Group
University of Southampton
OpCit – http://opcit.eprints.org/
www.ecs.soton.ac.uk

BCS Metadata Meeting, London, 29th May 2002

(Many slides borrowed from Michael L. Nelson)

Page 2

OAI 2.0

• Public, stable version not released yet … (but very close)
  – Beta released mid-May
  – Public release scheduled: 1st June

• 2.0 implementations in the pipeline
  – British Library, Cornell Univ, Ex Libris, my.OAI, Humboldt Univ, InQuirion Pty Ltd, Library of Congress, NASA, OCLC, Old Dominion Univ, U. of Illinois, U. of Southampton, UCLA, Johns Hopkins U., Indiana U., NYU, UKOLN, Virginia Tech

Page 3

Open Archives Initiative

The protocol is openly documented, and metadata is “exposed” to at least some peer group (note: rights management can still apply!)

Archive defined as a “collection of stuff” -- not the archivist’s definition of “archive”. “Repository” used in most OAI documents.

OAI is happening at break-neck speed...

Page 4

Metadata Harvesting

• Move away from distributed searching
• Extract metadata from various sources
• Build services on local copies of metadata
  – Resources remain at remote repositories

[Figure: a user searches (e.g. for “cfd applications”) against a local copy of metadata that was harvested offline from several nodes. Each node is independently maintained. All searching, browsing, etc. is performed on the local metadata; individual nodes can still support direct user interaction.]
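
The harvesting model above can be sketched in a few lines: metadata is copied from each node ahead of time, and every search runs against the local copy only. This is a minimal illustration, not part of OAI-PMH itself; the repository names, identifiers and titles are made up, and `build_local_index` and `search` are hypothetical helper names.

```python
def build_local_index(harvested):
    """Merge metadata harvested offline from several nodes into one local list.

    `harvested` maps a repository name to a list of (identifier, title) pairs.
    All values used with this sketch are invented for illustration.
    """
    index = []
    for repository, records in harvested.items():
        for identifier, title in records:
            index.append(
                {"repository": repository, "identifier": identifier, "title": title}
            )
    return index


def search(index, query):
    """Return local entries whose title contains every word of the query."""
    words = query.lower().split()
    return [
        entry
        for entry in index
        if all(word in entry["title"].lower() for word in words)
    ]


# Hypothetical harvested metadata; the resources themselves stay remote.
HARVESTED = {
    "node-a": [("oai:node-a:1", "CFD applications in aerodynamics")],
    "node-b": [
        ("oai:node-b:7", "Applications of CFD to turbine design"),
        ("oai:node-b:8", "Metadata quality in digital libraries"),
    ],
}

LOCAL_INDEX = build_local_index(HARVESTED)
HITS = search(LOCAL_INDEX, "cfd applications")
```

No remote node is contacted at search time, which is the point of moving away from distributed searching.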

Page 5

Metadata Harvesting

• Repositories (archives etc.) = low implementation cost
• Services = higher implementation cost
• Similar to web search model
  – DP9 gateway makes it exactly the same

Page 6

             Santa Fe convention    OAI-PMH v.1.0/1.1          OAI-PMH v.2.0
nature       experimental           experimental               stable
verbs        Dienst                 OAI-PMH                    OAI-PMH
requests     HTTP GET/POST          HTTP GET/POST              HTTP GET/POST
responses    XML                    XML                        XML
transport    HTTP                   HTTP                       HTTP
metadata     OAMS                   unqualified Dublin Core    unqualified Dublin Core
about        eprints                document like objects      resources
model        metadata harvesting    metadata harvesting        metadata harvesting

Page 7

OAI-PMH v.2.0 [06/2002]

• Goal: recurrent exchange of metadata about resources between systems

• Input:
  – OAI-PMH v.1.0 [01/01 – 09/02]
  – feedback on OAI-implementers
  – deliberations by OAI-tech [09/01 -]
  – alpha test group of OAI-PMH v.2.0 [03/02 -]

Page 8

OAI-PMH v.2.0 [06/2002]

• low-barrier interoperability specification
• metadata harvesting model: data provider / service provider
• metadata about resources
• autonomous protocol
• distinction between protocol and periphery
• community-specific extensions
• HTTP based
• XML responses
• unqualified Dublin Core
• stable (1.0 characterized as experimental)

Page 9

OAI Data Model: Resources / Items / Records

[Figure: a resource (David); an item holding all available metadata about David; and that item's records in Dublin Core, MARC and SPECTRUM metadata formats.]

item = identifier

record = identifier + metadata format + datestamp
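
The two definitions above map naturally onto two small types. The following is only a sketch of that reading: the class and field names (`Item`, `Record`, `metadata_prefix`) and the example identifier are assumptions for illustration, not names taken from the protocol documents.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass(frozen=True)
class Record:
    """record = identifier + metadata format + datestamp."""

    identifier: str
    metadata_prefix: str  # which metadata format this record is expressed in
    datestamp: str
    metadata: str = ""  # the payload itself, left empty in this sketch


@dataclass
class Item:
    """item = identifier; it groups every record that describes one resource."""

    identifier: str
    records: Dict[str, Record] = field(default_factory=dict)

    def add_record(self, metadata_prefix, datestamp, metadata=""):
        # One record per metadata format, all sharing the item's identifier.
        record = Record(self.identifier, metadata_prefix, datestamp, metadata)
        self.records[metadata_prefix] = record
        return record


# Hypothetical item with records in two metadata formats.
david = Item("oai:example:david")
dc_record = david.add_record("oai_dc", "2002-05-29")
marc_record = david.add_record("marc", "2002-05-29")
```

Keying the records by metadata format reflects that the same item can be disseminated in several formats while keeping a single identifier.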

Page 10

Overview of OAI Verbs

Verb                   Function

archival metadata
Identify               description of archive
ListMetadataFormats    metadata formats supported by archive
ListSets               sets defined by archive

harvesting verbs
ListIdentifiers        OAI unique ids contained in archive
ListRecords            listing of N records
GetRecord              listing of a single record

Most verbs take arguments: dates, sets, ids, metadata formats and resumption token (for flow control)
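
Since requests are plain HTTP GET/POST, a request amounts to a base URL plus a verb and its arguments. The sketch below builds such a GET URL with the standard library. `build_request` is a hypothetical helper, the `oai_dc` prefix is an assumed example value, and the base URL is the one that appears in the sample responses later in these slides.

```python
from urllib.parse import urlencode

# The six verbs listed in the overview above.
VERBS = {
    "Identify",
    "ListMetadataFormats",
    "ListSets",
    "ListIdentifiers",
    "ListRecords",
    "GetRecord",
}


def build_request(base_url, verb, **arguments):
    """Return the GET URL for one request; reject verbs that are not defined."""
    if verb not in VERBS:
        raise ValueError(f"{verb} is not a valid OAI-PMH verb")
    params = {"verb": verb}
    for name, value in arguments.items():
        if value is not None:
            # `from` is a Python keyword, so callers pass it as `from_`.
            params[name.rstrip("_")] = value
    return base_url + "?" + urlencode(params)


GET_RECORD_URL = build_request(
    "http://arXiv.org/oai2",
    "GetRecord",
    identifier="oai:arXiv:cs/0112017",
    metadataPrefix="oai_dc",
)
LIST_RECORDS_URL = build_request(
    "http://arXiv.org/oai2",
    "ListRecords",
    from_="2002-01-01",
    metadataPrefix="oai_dc",
)
```

`urlencode` percent-encodes reserved characters such as ":" and "/" in the identifier, so the argument values survive the query string intact.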

Page 11

Identify

1.1
• Arguments
  – none
• Errors
  – none

2.0
• Arguments
  – none
• Errors
  – badArgument

Page 12

ListMetadataFormats

1.1
• Arguments
  – identifier (OPTIONAL)
• Errors
  – id does not exist

2.0
• Arguments
  – identifier (OPTIONAL)
• Errors
  – badArgument
  – noMetadataFormats
  – idDoesNotExist

Page 13

ListSets

1.1
• Arguments
  – resumptionToken (EXCLUSIVE)
• Errors
  – no set hierarchy

2.0
• Arguments
  – resumptionToken (EXCLUSIVE)
• Errors
  – badArgument
  – badResumptionToken
  – noSetHierarchy

Page 14

ListIdentifiers

1.1
• Arguments
  – from (OPTIONAL)
  – until (OPTIONAL)
  – set (OPTIONAL)
  – resumptionToken (EXCLUSIVE)
• Errors
  – no records match

2.0
• Arguments
  – from (OPTIONAL)
  – until (OPTIONAL)
  – set (OPTIONAL)
  – resumptionToken (EXCLUSIVE)
  – metadataPrefix (REQUIRED)
• Errors
  – badArgument
  – cannotDisseminateFormat
  – badResumptionToken
  – noSetHierarchy
  – noRecordsMatch

Page 15

ListRecords

1.1
• Arguments
  – from (OPTIONAL)
  – until (OPTIONAL)
  – set (OPTIONAL)
  – resumptionToken (EXCLUSIVE)
  – metadataPrefix (REQUIRED)
• Errors
  – no records match
  – metadata format cannot be disseminated

2.0
• Arguments
  – from (OPTIONAL)
  – until (OPTIONAL)
  – set (OPTIONAL)
  – resumptionToken (EXCLUSIVE)
  – metadataPrefix (REQUIRED)
• Errors
  – noRecordsMatch
  – cannotDisseminateFormat
  – badResumptionToken
  – noSetHierarchy
  – badArgument
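
The OPTIONAL / REQUIRED / EXCLUSIVE markers for the 2.0 list verbs can be read as a small validation rule. The sketch below is one interpretation of the slide, assuming EXCLUSIVE means the argument must appear on its own. `check_list_arguments` is a hypothetical name, and only the badArgument case is modelled.

```python
ALLOWED = {"from", "until", "set", "resumptionToken", "metadataPrefix"}


def check_list_arguments(arguments):
    """Return "badArgument" for an invalid argument set, else None.

    `arguments` maps argument names to values for a 2.0 ListRecords or
    ListIdentifiers request (the verb argument itself is excluded).
    """
    if set(arguments) - ALLOWED:
        # An argument that the verb does not define.
        return "badArgument"
    if "resumptionToken" in arguments:
        # EXCLUSIVE: no other argument may accompany the token.
        return None if len(arguments) == 1 else "badArgument"
    if "metadataPrefix" not in arguments:
        # REQUIRED whenever no resumptionToken is given.
        return "badArgument"
    return None
```

The other 2.0 errors (cannotDisseminateFormat, badResumptionToken, noSetHierarchy, noRecordsMatch) depend on repository state and are left out here.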

Page 16

GetRecord

1.1
• Arguments
  – identifier (REQUIRED)
  – metadataPrefix (REQUIRED)
• Errors
  – id does not exist
  – metadata format cannot be disseminated

2.0
• Arguments
  – identifier (REQUIRED)
  – metadataPrefix (REQUIRED)
• Errors
  – badArgument
  – cannotDisseminateFormat
  – idDoesNotExist

Page 17

Response with no errors

<?xml version="1.0" encoding="UTF-8"?>
<OAI-PMH>
  <responseDate>2002-02-08T08:55:46Z</responseDate>
  <request verb="GetRecord"… …>http://arXiv.org/oai2</request>
  <GetRecord>
    <record>
      <header>
        <identifier>oai:arXiv:cs/0112017</identifier>
        <datestamp>2001-12-14</datestamp>
        <setSpec>cs</setSpec>
        <setSpec>math</setSpec>
      </header>
      <metadata>
        …..
      </metadata>
    </record>
  </GetRecord>
</OAI-PMH>
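
A harvester mostly needs the header fields of such a response. The sketch below pulls them out with the standard library's ElementTree. The sample is a trimmed copy of the response above: the elided request attributes are left out, the metadata payload is empty, and the hyphenated responseDate is an assumed correction of the slide's date. `parse_header` and `local_name` are hypothetical helpers. Tag names are matched without their namespace so the code works whether or not the response declares one.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the slide's response; elided parts are omitted, not guessed.
SAMPLE_RESPONSE = b"""<?xml version="1.0" encoding="UTF-8"?>
<OAI-PMH>
  <responseDate>2002-02-08T08:55:46Z</responseDate>
  <request verb="GetRecord">http://arXiv.org/oai2</request>
  <GetRecord>
    <record>
      <header>
        <identifier>oai:arXiv:cs/0112017</identifier>
        <datestamp>2001-12-14</datestamp>
        <setSpec>cs</setSpec>
        <setSpec>math</setSpec>
      </header>
      <metadata></metadata>
    </record>
  </GetRecord>
</OAI-PMH>"""


def local_name(tag):
    """Strip an optional '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def parse_header(xml_bytes):
    """Return identifier, datestamp and setSpecs of the first record header."""
    root = ET.fromstring(xml_bytes)
    header = next(el for el in root.iter() if local_name(el.tag) == "header")
    result = {"identifier": None, "datestamp": None, "setSpec": []}
    for child in header:
        name = local_name(child.tag)
        if name == "setSpec":
            result["setSpec"].append(child.text)  # a record may be in many sets
        elif name in result:
            result[name] = child.text
    return result


HEADER = parse_header(SAMPLE_RESPONSE)
```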

Page 18

Response with error

<?xml version="1.0" encoding="UTF-8"?>
<OAI-PMH>
  <responseDate>2002-02-08T08:55:46Z</responseDate>
  <request>http://arXiv.org/oai2</request>
  <error code="badVerb">ShowMe is not a valid OAI-PMH verb</error>
</OAI-PMH>
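
Because errors arrive inside the XML body, a client has to look for an error element before using the response. The sketch below turns it into a Python exception. `OAIError` and `raise_for_error` are hypothetical names, the sample is a copy of the response above with an assumed hyphenated responseDate, and only the first error element is reported.

```python
import xml.etree.ElementTree as ET

ERROR_RESPONSE = b"""<?xml version="1.0" encoding="UTF-8"?>
<OAI-PMH>
  <responseDate>2002-02-08T08:55:46Z</responseDate>
  <request>http://arXiv.org/oai2</request>
  <error code="badVerb">ShowMe is not a valid OAI-PMH verb</error>
</OAI-PMH>"""


class OAIError(Exception):
    """Carries the error code and message found in a response."""

    def __init__(self, code, message):
        super().__init__(f"{code}: {message}")
        self.code = code
        self.message = message


def raise_for_error(xml_bytes):
    """Parse a response; raise OAIError if it contains an error element."""
    root = ET.fromstring(xml_bytes)
    for element in root.iter():
        # Compare the tag without any '{namespace}' prefix.
        if element.tag.rsplit("}", 1)[-1] == "error":
            raise OAIError(element.get("code"), (element.text or "").strip())
    return root


try:
    raise_for_error(ERROR_RESPONSE)
    CAUGHT = None
except OAIError as exc:
    CAUGHT = exc
```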

Page 19

resumptionToken Flow-Control

• Idempotency of resumptionToken: return the same incomplete list when the resumptionToken is re-issued
  – while no changes occur in the repository: strict
  – while changes occur in the repository: all items with unchanged datestamp
• new attributes for the resumptionToken:
  – expirationDate
  – completeListSize
  – cursor
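
Flow control with a resumptionToken comes down to a loop: send the first request with its normal arguments, then keep re-requesting with only the token until the repository stops returning one. The sketch below shows that loop against an in-memory stand-in for a repository. `harvest`, `fake_fetch`, the token strings and the identifiers are all hypothetical, and a real client would fetch and parse XML instead.

```python
# Stand-in repository: maps the incoming token (None for the first request)
# to one incomplete list plus the token for the next one ("" when finished).
PAGES = {
    None: (["oai:example:1", "oai:example:2"], "token-A"),
    "token-A": (["oai:example:3", "oai:example:4"], "token-B"),
    "token-B": (["oai:example:5"], ""),
}


def fake_fetch(arguments):
    """Return (identifiers, next_token) for one request's arguments."""
    return PAGES[arguments.get("resumptionToken")]


def harvest(fetch_page, metadata_prefix):
    """Yield every identifier, following resumptionTokens until none is left."""
    token = None
    while True:
        if token is None:
            arguments = {"metadataPrefix": metadata_prefix}
        else:
            # The token is an exclusive argument, so it is sent alone.
            arguments = {"resumptionToken": token}
        identifiers, token = fetch_page(arguments)
        yield from identifiers
        if not token:
            break


ALL_IDENTIFIERS = list(harvest(fake_fetch, "oai_dc"))
```

Re-issuing a token to `fake_fetch` returns the same incomplete list each time, which mirrors the strict idempotency case for an unchanged repository.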

Page 20

Adoption

• evolution
  – from talking about OAI-PMH
  – to talking about projects that use OAI-PMH
  – to talking about projects and failing to mention they use OAI-PMH
• => OAI-PMH becomes part of the infrastructure

Page 21

Data Providers (a.k.a. repositories)

• 49 registered repositories [11/2001]
• 65 registered repositories [03/2002]
• 77 registered repositories [05/2002]
• 5+ million records
• many unregistered repositories
• private implementations (e.g. RDN)

Page 22

Service Providers

• Arc: cross-searching of registered repositories [ http://arc.cs.odu.edu ]
• CiteBase: research literature search + citation ranking [ http://citebase.eprints.org ]
• OLAC: cross-searching of Language Archive Community repositories [ http://www.language-archives.org/index.html ]

Page 23

Service Providers

• Scirus scientific search engine [Elsevier] [ http://www.scirus.com ]
• my.OAI: user-tailorable cross-searching of registered repositories [FS Consulting, Inc.] [ http://www.myoai.com ]
• Growing interest from web search engines

Page 24

OAI-PMH tools

• Repository Explorer: interactive exploration of repositories [Virginia Tech] [ http://www.purl.org/NET/oai_explorer ]
• eprints.org: generic OAI-PMH compliant repository software [U of Southampton] [ http://www.eprints.org ]
• ALCME repository and harvester software [OCLC] [ http://alcme.oclc.org/index.html ]
• APIs, other tools @ www.openarchives.org

Page 25

http://www.openarchives.org/

[email protected]