UKOLN is supported by:
The Open Archives Initiative Protocol for Metadata Harvesting
CRIS + Open Access = The Route to Research Knowledge on the GRID
Brussels – 21 September 2004
Andy Powell, UKOLN, University of Bath
[email protected]
www.bath.ac.uk
A centre of expertise in digital information management
www.ukoln.ac.uk
2
Contents
• a brief history of OAI
• 10 technical things you should know about the OAI-PMH
• potential impact…
  – institutional context
  – the role of the library?
  – the researcher
• current activities/issues
• OAI and the semantic Web
note: primary focus is on the technology
3
OAI roots
• the roots of OAI lie in the development of eprint archives…
  – arXiv, CogPrints, NACA (NASA), RePEc, NDLTD, NCSTRL
• each offered Web interface for deposit of articles and for end-user searches
• difficult for end-users to work across archives without having to learn multiple different interfaces
• recognised need for single search interface to all archives
  – Universal Pre-print Service (UPS)
4
Searching vs. harvesting
• two possible approaches to building a single search interface to multiple eprint archives…
  – cross-searching multiple archives based on protocol like Z39.50
  – harvesting metadata into one or more ‘central’ services – bulk move data to the user-interface
• US digital library experience in this area indicated that cross-searching not preferred approach
  – distributed searching of N nodes viable, but only for small values of N
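The difference between the two approaches can be sketched in a few lines of Python. This is an illustration only: the archives are hypothetical in-memory lists rather than real services, and the function names are invented for the example. It shows that cross-searching makes one remote call per archive for every user query, while harvesting copies the metadata once and answers queries from a central index.

```python
from typing import Dict, List, Tuple

# Hypothetical in-memory stand-ins for remote eprint archives.
ARCHIVES: Dict[str, List[dict]] = {
    "archive_a": [{"id": "a:1", "title": "Quantum gravity notes"}],
    "archive_b": [
        {"id": "b:1", "title": "Cognitive models of memory"},
        {"id": "b:2", "title": "Gravity waves in fluids"},
    ],
}


def search_archive(name: str, term: str) -> List[dict]:
    """Simulate one remote query against a single archive."""
    return [r for r in ARCHIVES[name] if term.lower() in r["title"].lower()]


def cross_search(term: str) -> Tuple[List[dict], int]:
    """Cross-searching: each user query fans out to all N archives."""
    hits: List[dict] = []
    remote_calls = 0
    for name in ARCHIVES:
        hits.extend(search_archive(name, term))
        remote_calls += 1
    return hits, remote_calls


def harvest() -> List[dict]:
    """Harvesting: copy all metadata into one central index ahead of time."""
    index: List[dict] = []
    for records in ARCHIVES.values():
        index.extend(records)
    return index


def search_index(index: List[dict], term: str) -> List[dict]:
    """User queries touch only the central index, with no remote calls."""
    return [r for r in index if term.lower() in r["title"].lower()]
```

With cross-searching, the number of remote calls per query grows with the number of archives; with harvesting, that cost is paid once per harvest and the user-facing search stays local.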
5
Harvesting requirements
• in order that harvesting approach can work there need to be agreements about…
  – transport protocols – HTTP vs. FTP vs. …
  – metadata formats – DC vs. MARC vs. …
  – quality assurance – mandatory elements, mechanisms for naming of people, subjects, etc., handling duplicated records, best-practice
  – intellectual property and usage rights – who can do what with the records
• work in this area resulted in the “Santa Fe Convention”
6
Development of OAI-PMH
• 2 year metamorphosis thru various names
  – Santa Fe Convention, OAI-PMH versions 1.0, 1.1…
  – OAI Protocol for Metadata Harvesting 2.0
• development steered by international technical committee
5. based on HTTP and XML
  – simple, Web-friendly, fast deployment
6. OAI-PMH is not a search protocol
  – but its use can underpin search-based services based on Z39.50 or SRW or SOAP or…
7. OAI-PMH typically carries metadata
  – content (e.g. full-text or image) made available separately – typically at URL in metadata
8. mandates simple DC as record format
  – but extensible to any XML format – IEEE LOM, ONIX, MARC, METS, MPEG-21, etc.
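Points 5, 7 and 8 can be illustrated with a small Python sketch: a harvest request is an ordinary HTTP URL carrying a `verb` and a `metadataPrefix`, and the response is XML holding simple DC records. The base URL, the sample response and the function names below are assumptions made for illustration, and treating `dc:identifier` as the content URL is a common convention rather than something the protocol guarantees. No network access is needed, since the response is embedded in the example.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# XML namespaces used by OAI-PMH 2.0 responses carrying simple DC.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url: str, metadata_prefix: str = "oai_dc") -> str:
    """Build the URL for a ListRecords request against a repository."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return base_url + "?" + query


# Hypothetical response from an imaginary repository (illustration only).
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2004-09-21T10:00:00Z</responseDate>
  <request verb="ListRecords" metadataPrefix="oai_dc">http://repository.example.org/oai</request>
  <ListRecords>
    <record>
      <header>
        <identifier>oai:repository.example.org:1</identifier>
        <datestamp>2004-09-01</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An example eprint</dc:title>
          <dc:creator>Example, A.</dc:creator>
          <dc:identifier>http://repository.example.org/1/paper.pdf</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
"""


def parse_records(xml_text: str) -> list:
    """Extract identifier, title and content URL from each harvested record."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.findall("oai:ListRecords/oai:record", NS):
        records.append({
            "identifier": rec.findtext("oai:header/oai:identifier", namespaces=NS),
            "title": rec.findtext("oai:metadata/oai_dc:dc/dc:title", namespaces=NS),
            # The content itself is not in the record, only a pointer to it.
            "content_url": rec.findtext(
                "oai:metadata/oai_dc:dc/dc:identifier", namespaces=NS
            ),
        })
    return records
```

A harvester would fetch `list_records_url(...)` over HTTP and pass the body to `parse_records`; searching the resulting records is left to whatever service is built on top.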
9
Bluffer’s guide to OAI
9. metadata and ‘content’ often made freely available – but not a requirement
  – OAI-PMH can be used between closed groups
  – or, can make metadata available but restrict access to content in some way
10. underlying HTTP protocol provides
  – access control – e.g. HTTP BASIC
  – compression mechanisms (for improving performance of harvesters)
  – could, in theory, also provide encryption if required
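Point 10 can be sketched with the Python standard library. The example below prepares a harvest request that carries HTTP BASIC credentials and advertises gzip support, then decodes a response body that may have been compressed. The URL, user name, password and function names are hypothetical; whether a given repository honours either header is up to that repository. Encryption would come from using an `https` URL, which is not shown here.

```python
import base64
import gzip
import urllib.request


def build_harvest_request(url: str, username: str, password: str) -> urllib.request.Request:
    """Prepare (but do not send) a request with BASIC auth and gzip support."""
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    req = urllib.request.Request(url)
    req.add_header("Authorization", "Basic " + token)
    # Tell the server that a gzip-compressed response is acceptable.
    req.add_header("Accept-Encoding", "gzip")
    return req


def decode_body(body: bytes, content_encoding: str) -> str:
    """Decompress the response body if the server reported gzip encoding."""
    if content_encoding.lower() == "gzip":
        body = gzip.decompress(body)
    return body.decode("utf-8")
```

In use, the harvester would send the request with `urllib.request.urlopen`, read the `Content-Encoding` response header, and hand both to `decode_body`.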
10
Dublin Core
• OAI-PMH mandates use of simple DC as lowest common denominator
• agreed XML schema – ‘oai_dc’
  – simple DC – 15 metadata properties
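As a rough illustration of what a simple DC record looks like, the Python sketch below lists the 15 Dublin Core elements and serialises a dictionary of values as an `oai_dc` XML fragment. The function name and the sample values are assumptions for the example; in simple DC every element is optional and repeatable, which is why each field maps to a list.

```python
import xml.etree.ElementTree as ET

OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC_NS = "http://purl.org/dc/elements/1.1/"

# The 15 properties of simple (unqualified) Dublin Core.
DC_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)


def build_oai_dc(fields: dict) -> str:
    """Serialise {element: [values]} as an oai_dc XML fragment."""
    ET.register_namespace("oai_dc", OAI_DC_NS)
    ET.register_namespace("dc", DC_NS)
    root = ET.Element(f"{{{OAI_DC_NS}}}dc")
    for name, values in fields.items():
        if name not in DC_ELEMENTS:
            raise ValueError(f"not a simple DC element: {name}")
        # Elements are repeatable, so emit one child per value.
        for value in values:
            ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")
```

For example, `build_oai_dc({"title": ["An example eprint"], "creator": ["Example, A.", "Sample, B."]})` yields one `dc:title` and two `dc:creator` children.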
Impact on institutions…
• OAI-PMH technology provides an open, relatively stable technical framework
  – allows institution to re-consider management of intellectual output
  – greater confidence in availability of external services (e.g. discovery, access, analysis)
• the technical bit is easy
  – eprints.org software (Southampton), DSpace (MIT/HP), Fedora
• but, technical solutions are always easy!
  – real problem is cultural change required to get academics to deposit
13
Impact on libraries…
• library is natural choice as ‘managing agent’ for the institutional repository
  – quality control
  – metadata enhancement
  – preservation
• but libraries often weak technically (not always!) therefore technical collaboration within institution may be required
• beginning to see some evidence of externally ‘hosted’ repository services being offered
14
Impact on researchers…
• OAI-PMH technology provides a ‘disruptive’ technical framework that supports
  – new ways for individual researcher to disclose his/her research output
  – development of new kinds of ‘research’ discovery services
• can use ‘personal’ OAI repository
• but, need to
  – clarify roles of institutional, discipline and personal repositories
  – overcome FUD – IPR, peer-review, ability to ‘publish’, quality control, inertia
15
Current activities/issues
• protocol now stable and few changes being discussed
• some lightweight noises about re-implementing OAI-PMH using SOAP (Web services) but little enthusiasm for pushing these kinds of changes forward
• some work on OAI-rights issues – formalising mechanisms for attaching IPR statements and/or licences to the records being exchanged using the protocol, e.g. Creative Commons
16
Creative Commons
• CC is “devoted to expanding the range of creative work available for others to build upon and share”
• provides ‘standard’ licences for content
  – attribution
  – noncommercial
  – no derivative works
  – share alike
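The way these options combine into a named licence can be sketched in Python. This is a sketch under stated assumptions: attribution is treated as always present (as in the version 2.0 licences), the default version string is "2.0", and the function names are invented for the example. "No derivative works" and "share alike" are treated as mutually exclusive, since share alike only makes sense when derivatives are allowed.

```python
def cc_licence_code(noncommercial: bool = False,
                    no_derivatives: bool = False,
                    share_alike: bool = False) -> str:
    """Combine licence options into a short code such as 'by-nc-sa'."""
    if no_derivatives and share_alike:
        raise ValueError("'no derivative works' cannot be combined with 'share alike'")
    parts = ["by"]  # assumption: attribution is always required
    if noncommercial:
        parts.append("nc")
    if no_derivatives:
        parts.append("nd")
    if share_alike:
        parts.append("sa")
    return "-".join(parts)


def cc_licence_url(code: str, version: str = "2.0") -> str:
    """Build the licence URL; the default version is an assumption."""
    return f"http://creativecommons.org/licenses/{code}/{version}/"
```

A URL built this way is the kind of statement that could be attached to a harvested record, for instance in a `dc:rights` value, though how rights are expressed in practice depends on the agreements between the parties exchanging records.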