Open Archives Initiative – Protocol for Metadata Harvesting Iztok Kavkler, University of Ljubljana Some slides by Stefaan Ternier, KUL Bram Vandenputte, KUL Joris Klerkx, KUL

Dec 27, 2015


Andrea Walters

Open Archives Initiative – Protocol for Metadata Harvesting

Iztok Kavkler, University of Ljubljana

Some slides by Stefaan Ternier, KUL; Bram Vandenputte, KUL; Joris Klerkx, KUL

What is OAI?

Harvesting standard, documented at http://www.openarchives.org/OAI/openarchivesprotocol.html

Six service verbs
– Identify
– ListMetadataFormats
– GetRecord
– ListRecords
– ListIdentifiers
– ListSets

Allows multiple metadata formats
– DC (Dublin Core) format is mandatory

How OAI works

OAI "verbs"
– Identify
– ListMetadataFormats
– GetRecord
– ListIdentifiers
– ListRecords
– ListSets

[Diagram: the HARVESTER (service provider) sends an HTTP request carrying an OAI verb to the REPOSITORY (metadata provider); the repository answers with an HTTP response containing valid XML.]
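
The request side of this exchange is a plain HTTP GET whose query string carries the verb and its arguments. A minimal sketch in Java of how a harvester might assemble such a URL; the class `OaiRequestUrl` and its `build` method are hypothetical names used only for illustration, and the `URLEncoder.encode(String, Charset)` overload assumes Java 10 or later:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Map;

// Hypothetical helper, for illustration only; not part of OAICat or any library.
class OaiRequestUrl {

    // Builds baseUrl?verb=...&key=value..., URL-encoding every key and value.
    static String build(String baseUrl, String verb, Map<String, String> args) {
        StringBuilder sb = new StringBuilder(baseUrl)
                .append("?verb=").append(encode(verb));
        for (Map.Entry<String, String> e : args.entrySet()) {
            sb.append('&').append(encode(e.getKey()))
              .append('=').append(encode(e.getValue()));
        }
        return sb.toString();
    }

    private static String encode(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }
}
```

Passing a `LinkedHashMap` keeps the arguments in insertion order, which makes the resulting URL predictable when testing by hand.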

Try it

Install Apache Tomcat or any other Java servlet container

Download the WAR file from http://fire.eun.org/Iztok/OAILREApp.war

Deploy the WAR

Open the demo HTML page at http://localhost:8080/OAILREApp/

Or type a service verb, e.g. http://localhost:8080/OAILREApp/oaiHandler?verb=Identify

The raw XML

By default, the resulting XML has a stylesheet attached for pretty rendering

To remove the stylesheet, comment out the line

    OAIHandler.styleSheet=testoai/oaicat.xsl

in the file oaicat.properties (in the WAR file or the web-app directory)

OAI XML example

    <OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/" ...>
      <responseDate>2007-06-11T06:48:58Z</responseDate>
      <request metadataPrefix="oai_lom" verb="ListRecords">http://localhost:8080/OAILREApp/oaiHandler</request>
      <ListRecords>
        <record>
          <header>
            <identifier>oai:oai.xyz-repository.com:exercises/112553</identifier>
            <datestamp>2007-06-09T22:38:28Z</datestamp>
            <setSpec>exercises</setSpec>
          </header>
          <metadata>
            <lom xmlns=...> ... </lom>
          </metadata>
        </record>
        ....
        <resumptionToken expirationDate="2007-06-11T07:48:58Z" completeListSize="42" cursor="10">1181544538265</resumptionToken>
      </ListRecords>
    </OAI-PMH>
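
A harvester mostly needs two things from a response like this one: the record identifiers and the resumption token. A minimal sketch using the JDK's built-in DOM parser; the class `OaiResponseReader` and its method names are assumptions made for illustration, not part of OAICat:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Hypothetical helper, for illustration only.
class OaiResponseReader {
    static final String OAI_NS = "http://www.openarchives.org/OAI/2.0/";

    // Parses the response text into a namespace-aware DOM document.
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true);
            return f.newDocumentBuilder().parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Not well-formed XML", e);
        }
    }

    // Collects <identifier> elements in the OAI namespace only, so that
    // identifier elements inside the metadata payload are not picked up.
    static List<String> identifiers(Document doc) {
        NodeList nodes = doc.getElementsByTagNameNS(OAI_NS, "identifier");
        List<String> ids = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            ids.add(nodes.item(i).getTextContent().trim());
        }
        return ids;
    }

    // Returns the token text, or null when the element is absent or empty
    // (both cases mean there is nothing more to fetch).
    static String resumptionToken(Document doc) {
        NodeList nodes = doc.getElementsByTagNameNS(OAI_NS, "resumptionToken");
        if (nodes.getLength() == 0) return null;
        String token = nodes.item(0).getTextContent().trim();
        return token.isEmpty() ? null : token;
    }
}
```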

OAICat - a Java implementation

OAICat home at http://www.oclc.org/research/software/oai/cat.htm

Takes care of
– web service details
– OAI XML specification

The implementer has to provide three classes
– RepositoryOAICatalog
– RepositoryRecordFactory
– Repository2oai_dc (lom, ...) - usually more than one

A sample implementation

(Source code and libs in http://fire.eun.org/Iztok/OAILREApp.zip)

Create a new web module

Add servlet oaiHandler to web.xml

    <servlet>
      <servlet-name>LreOAIHandler</servlet-name>
      <servlet-class>ORG.oclc.oai.server.OAIHandler</servlet-class>
      <load-on-startup>5</load-on-startup>
    </servlet>

    <servlet-mapping>
      <servlet-name>LreOAIHandler</servlet-name>
      <url-pattern>/oaiHandler</url-pattern>
    </servlet-mapping>

(cont)

Define properties file location

    <context-param>
      <param-name>properties</param-name>
      <param-value>oaicat.properties</param-value>
    </context-param>

Welcome file for testing

    <welcome-file-list>
      <welcome-file>testoai/index.html</welcome-file>
    </welcome-file-list>

Sample record

A record with basic fields: id, url, title, descr and date

SampleOAICatalog contains an array with 3 sample records
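
For orientation, a record with these five fields could look roughly like the class below. This is only a sketch: the actual SampleRecord in the sample source may differ, the name `SampleRecordSketch` is made up here, and the field types (in particular `Instant` for the date) are assumptions.

```java
import java.time.Instant;

// Illustrative stand-in for the sample record; names and types are assumptions.
final class SampleRecordSketch {
    final String id;     // local identifier, without the OAI prefix
    final String url;    // location of the described resource
    final String title;
    final String descr;  // free-text description
    final Instant date;  // last modification time, used as the OAI datestamp

    SampleRecordSketch(String id, String url, String title, String descr, Instant date) {
        this.id = id;
        this.url = url;
        this.title = title;
        this.descr = descr;
        this.date = date;
    }
}
```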

SampleOAICatalog.listIdentifiers

Parameters
– from – date to harvest from (String in ISO 8601 format); a date or a datetime, depending on the repository's granularity
– to – date to harvest to
– set – a set name; list only records from this set (if null, list all records). Set names classify objects in natural groups; every record may belong to multiple sets (or none)
– metadataPrefix – list only records that support this format (sample formats: oai_dc, oai_lom, ...)
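
How the from, to and set parameters narrow the result can be sketched as a simple filter. The class `HarvestFilter` and its nested `Rec` type are hypothetical; the sketch assumes datetime granularity (strings such as 2007-06-09T22:38:28Z), treats both bounds as inclusive, and leaves out the metadataPrefix check:

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical illustration of selective harvesting; not OAICat code.
class HarvestFilter {

    // Minimal record: identifier, datestamp and the sets it belongs to.
    static final class Rec {
        final String id;
        final Instant datestamp;
        final Set<String> sets;

        Rec(String id, Instant datestamp, Set<String> sets) {
            this.id = id;
            this.datestamp = datestamp;
            this.sets = sets;
        }
    }

    // from and to are ISO 8601 datetimes, both inclusive; a null set matches every record.
    static List<Rec> select(List<Rec> all, String from, String to, String set) {
        Instant lower = Instant.parse(from);
        Instant upper = Instant.parse(to);
        List<Rec> out = new ArrayList<>();
        for (Rec r : all) {
            boolean inRange = !r.datestamp.isBefore(lower) && !r.datestamp.isAfter(upper);
            boolean inSet = set == null || r.sets.contains(set);
            if (inRange && inSet) {
                out.add(r);
            }
        }
        return out;
    }
}
```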

SampleOAICatalog.listIdentifiers

Must return a map with two fields
– headers – a String iterator of OAI headers
– identifiers – a String iterator of OAI identifiers

Both are created by the call (rec is a SampleRecord)

    String[] header = getRecordFactory().createHeader(rec);
    headers.add(header[0]);
    identifiers.add(header[1]);

Create the result

    Map<String, Object> listIdMap = new HashMap<String, Object>();
    listIdMap.put("headers", headers.iterator());
    listIdMap.put("identifiers", identifiers.iterator());
    return listIdMap;

getRecordFactory().createHeader(rec)

Creates the header by calling the methods in SampleRecordFactory

String getOAIIdentifier(Object rec)
– returns the full OAI identifier, e.g. "oai:oay.rep.com:id001"

String getDatestamp(Object rec)
– returns the date in ISO 8601 format

Iterator<String> getSetSpecs(Object rec)

    ArrayList<String> list = new ArrayList<String>();
    list.add(...);
    return list.iterator();

Iterator<String> getAbouts(Object rec)

String fromOAIIdentifier(String id)
– helper method – converts the OAI identifier to a local id
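
The two identifier methods are inverses of each other: one adds the repository prefix, the other strips it. A minimal sketch, assuming the prefix from the example identifier on this slide; the class `OaiIds` and its method names are hypothetical and do not reproduce the actual SampleRecordFactory code:

```java
// Hypothetical illustration of the local id <-> OAI identifier mapping.
class OaiIds {
    // Assumed prefix, taken from the example identifier "oai:oay.rep.com:id001".
    static final String PREFIX = "oai:oay.rep.com:";

    // Local id -> full OAI identifier.
    static String toOaiIdentifier(String localId) {
        return PREFIX + localId;
    }

    // Full OAI identifier -> local id; rejects identifiers from other repositories.
    static String fromOaiIdentifier(String oaiId) {
        if (!oaiId.startsWith(PREFIX)) {
            throw new IllegalArgumentException("Not an identifier of this repository: " + oaiId);
        }
        return oaiId.substring(PREFIX.length());
    }
}
```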

SampleOAICatalog.listSets

Takes no parameters, returns the list of all sets in this repository
– each ListIdentifiers or ListRecords query may contain a set name, limiting the results to just one set

SampleOAICatalog.getSchemaLocations

Like GetRecord, but returns the Vector of all metadata schema locations the record supports
– to obtain them, just call

    getRecordFactory().getSchemaLocations(rec);

SampleOAICatalog.getRecord

String getRecord(String id, String metadataPrefix)
– find the record and convert it to an XML string (<record> element)
– id is in global format – to get the local value call getRecordFactory().fromOAIIdentifier(id)
– throw IdDoesNotExistException if the record is not found
– to generate the XML use constructRecord

    constructRecord(rec, metadataPrefix)

SampleOAICatalog.listRecords

Just like ListIdentifiers, only generates a list of XML <record> elements

Return a map with one element

    Map<String, Object> listRecMap = new HashMap<String, Object>();
    listRecMap.put("records", records.iterator());
    return listRecMap;

Crosswalks

Conversions of the native record type to XML, like Sample2oai_lom or Sample2oai_dc

Only two methods per implementation
– boolean isAvailableFor(Object rec)
– String createMetadata(Object rec)

    SampleRecord record = (SampleRecord) rec;
    return LOMFormat.writeStringWithSchema(record.toLOM());

Throw CannotDisseminateFormatException if the metadata is not available in this format
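
The same idea for the mandatory Dublin Core format, written without any library so the shape of the output is visible. This is a sketch under assumptions: the class `DcCrosswalkSketch` is hypothetical, it takes plain strings instead of a record object, it maps the url field to dc:identifier by choice, and it omits the schemaLocation attribute that a complete oai_dc record would carry:

```java
// Hypothetical illustration of a native-record-to-oai_dc conversion.
class DcCrosswalkSketch {

    // Returns an <oai_dc:dc> element with title, description and identifier.
    static String createMetadata(String title, String descr, String url) {
        return "<oai_dc:dc"
             + " xmlns:oai_dc=\"http://www.openarchives.org/OAI/2.0/oai_dc/\""
             + " xmlns:dc=\"http://purl.org/dc/elements/1.1/\">"
             + "<dc:title>" + escape(title) + "</dc:title>"
             + "<dc:description>" + escape(descr) + "</dc:description>"
             + "<dc:identifier>" + escape(url) + "</dc:identifier>"
             + "</oai_dc:dc>";
    }

    // Escapes the characters that are not allowed in XML element text.
    // The ampersand must be replaced first so later entities stay intact.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}
```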

SampleRecord.toLOM

Uses the LOM-j lib to quickly hack together LOM: http://sourceforge.net/projects/lom-j/
– automatic serialization/deserialization of LOM and DC XML formats

Example

    lom.newGeneral().newIdentifier(0).newCatalog().setString("lre");
    lom.newGeneral().newIdentifier(0).newEntry().setString("sample:" + id);
    lom.newTechnical().newLocation(-1).setString(url);
    lom.newGeneral().newTitle().newString(0).newLanguage().setValue("en");
    lom.newGeneral().newTitle().newString(0).setString(title);

Resumption

A repository usually has a fixed limit on the number of records to return in one call
– if more are available, it returns a resumption token that lets the harvester request the next packet
– implemented by the functions

    listIdentifiers(String resumptionToken)
    listRecords(String resumptionToken)

– see XYZOAICatalog for details
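
The mechanism can be sketched independently of OAICat: the repository returns one page, remembers where it stopped under a freshly issued token, and continues from there when the token comes back. Everything below (`PagedLister`, `Page`, the counter-based tokens) is a hypothetical illustration, not the XYZOAICatalog code; token expiry and thread safety are left out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of resumption-token paging on the repository side.
class PagedLister {

    // One packet of results; resumptionToken is null when the list is complete.
    static final class Page {
        final List<String> items;
        final String resumptionToken;

        Page(List<String> items, String resumptionToken) {
            this.items = items;
            this.resumptionToken = resumptionToken;
        }
    }

    private final List<String> all;
    private final int pageSize;
    private final Map<String, Integer> cursors = new HashMap<>(); // token -> next start index
    private long nextToken = 1;

    PagedLister(List<String> all, int pageSize) {
        if (pageSize <= 0) throw new IllegalArgumentException("pageSize must be positive");
        this.all = all;
        this.pageSize = pageSize;
    }

    // First call, without a token.
    Page first() {
        return pageFrom(0);
    }

    // Follow-up call; each token is valid for a single use.
    Page resume(String token) {
        Integer start = cursors.remove(token);
        if (start == null) {
            throw new IllegalArgumentException("Bad or expired resumption token: " + token);
        }
        return pageFrom(start);
    }

    private Page pageFrom(int start) {
        int end = Math.min(start + pageSize, all.size());
        List<String> items = new ArrayList<>(all.subList(start, end));
        String token = null;
        if (end < all.size()) {
            token = Long.toString(nextToken++);
            cursors.put(token, end);
        }
        return new Page(items, token);
    }
}
```

A harvester simply keeps calling `resume` with the last token it received until a page arrives without one.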

References

– http://www.openarchives.org/OAI/openarchivesprotocol.html
– http://www.fmf.uni-lj.si/~kavkler/
– http://www.oclc.org/research/software/oai/cat.htm
– http://www.cs.kuleuven.ac.be/~hmdb/SqiOaiMelt
– http://sourceforge.net/projects/lom-j/
– SIO/Trubar OAI url: http://sio.edus.si/LreTomcat/