Metadata Working Group Forum Cornell 2008-05-16. Metadata Normalization: A Case Study in Primo -- and -- Linked Open Data in Libraries
Transcript
Page 1: Cornell20080516

Metadata Working Group Forum Cornell 2008-05-16

Metadata Normalization

A Case Study in Primo

-- and --

Linked Open Data in Libraries

Page 2: Cornell20080516

Topical Overview

• Non-OPAC Discovery Systems
• ILS - Discovery System Interoperability
• SFX Optimization
• Metadata Normalization
• Ex Libris’ Primo - A Case Study
  – Front end and System Overview
  – Primo - System Demo
  – Primo - NYU’s Implementation

Page 3: Cornell20080516

Topical Overview

• Primo (Cont)
  – Metadata and Data Analysis
  – Challenges and Possibilities
• Common Models and Linked Open Data
  – An Alternative Approach
• Data Harmonization Benefits
  – Authority Control
  – Application Profiles
  – A possible future for Bibliographic Data

Page 4: Cornell20080516

Not an OPAC Replacement

• Primo, Endeca, Encore, AquaBrowser, Library Find, VUFind, WorldCat Local
• Not OPAC Replacements
  – More seamless discovery
  – “Web 2.0”
  – Fewer Clicks - ease of use
  – Cross-depository discovery

Page 5: Cornell20080516

Encore by III

“Encore goes beyond the online-catalog model to provide a better patron experience that leverages library content and patron-contributed information. Key features include:
– Faceted search by multiple parameters
– RightResult™ relevance-ranking
– Real-time holdings and status information
– Suggested links to content related to the user's search”

http://www.iii.com/encore/main_index2.html

Page 6: Cornell20080516

AquaBrowser

“Whatever it is, wherever it is, patrons can quickly and easily find it using a single interface for all types and formats of content. Visually represented and faceted search results allow your patrons to search and discover information faster and more effectively. Relevant search results help them find answers fast. Word clouds encourage exploration and discovery. Facets help to quickly focus the results.”

http://www.aquabrowser.com/products/academic/

Page 7: Cornell20080516

Endeca

“Endeca for Libraries is the most effective way for members of the library community to find the book or resource they need and to discover new information they didn't even know the library owned, which drives increased usage of the library's resources, usage of legacy library collections, and re-circulation.”

http://endeca.com/byIndustry/media/index.html

Page 8: Cornell20080516

Primo

“Interfacing seamlessly with library applications from Ex Libris and other vendors … management of all types of library resources, regardless of format and location.”

“Find it all. Find it Easily. Get it”

http://www.exlibrisgroup.com/category/PrimoOverview

Page 9: Cornell20080516

WorldCat Local

• “A localized version of WorldCat with custom branding and relevancy ranking
• … interoperates with your existing ILS and fulfillment systems …
• Single-search, multilingual interface for all physical and electronic content held locally or in remote locations
• Integrated access to the most appropriate delivery options”

http://www.oclc.org/worldcatlocal/default.htm

Page 10: Cornell20080516

One Problem, Many Solutions

• Users want a more seamless discovery experience

• Libraries get locked on the 2.0 buzz
  – Tagging, Reviews, Recommendations
  – Improved Relevancy Ranking
• Other goals may be more important
  – Fewer clicks to fulfillment
  – Cross-Depository Discovery

Page 11: Cornell20080516

More Intuitive Searching

• Less complicated initial searches
• Less pre-search limiting
• More post-search limits via faceting
• Appropriate Delivery bubbles up
• Trade-offs…

Page 12: Cornell20080516

Not in your OPAC

• DLF ILS Discovery Interface Task Force
• From the “Berkeley Accords”:

– “participants agreed to support a set of essential functions through open protocols and technologies by deploying specific recommended standards”

– Harvesting, Availability, Linking

Page 13: Cornell20080516

Availability and Access

• Links to full resource at the front
• Carefully considered SFX options
• Record FRBRization and Dedup
• Availability Statements real time (or as close as possible)

Page 14: Cornell20080516

Open URL From Primo

Aleph Record
Aleph Record(s) and/or other data deduped
Other Data Source

Query:
• Other NYU Cat
• WorldCat
Link both to ISBN/ISSN Search

Query:
• Aleph (link to Holdings)
• Other NYU Cat
• WorldCat
Link Others to ISBN/ISSN Search

Query:
• Aleph (link to Holdings)
• Other NYU Cat
• WorldCat
Link Others to ISBN/ISSN Search

Page 15: Cornell20080516

Primo: A Case Study

• Normalization Rules
• Delivery templates
• Tight SFX and MetaLib Integration
• “Pipes” for different data sources
• Hourly Availability Checking
  – (Real Time in Version 2.0)

Page 16: Cornell20080516

Harvesting

• Different Data Sources
• Different Normalization Rules
• All standardized on Primo Normalized XML (PNX) Record
  – Very Flat, sections corresponding to Primo Functionality

Page 17: Cornell20080516

The PNX Record

• Display Section
• Links Section
• Search Section
• Sort Section
• Facets Section
• Dedup Section
• FRBR Section
• Delivery Section
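
The idea of a flat record with one section per system function can be sketched in a few lines of Python. This is only an illustration, not the real PNX schema: the section names come from the list above, while the field names, the toy source formats, and the helper names (`build_record`, `normalize_marc`, `normalize_dc`) are assumptions made up for the example.

```python
from typing import Dict, List

# Section names come from the slide; the fields inside each section are hypothetical.
SECTIONS = ["display", "links", "search", "sort", "facets", "dedup", "frbr", "delivery"]

Record = Dict[str, Dict[str, List[str]]]


def build_record(title: str, creator: str, link: str) -> Record:
    """Build a flat record with one (possibly empty) dict per section."""
    record: Record = {section: {} for section in SECTIONS}
    record["display"] = {"title": [title], "creator": [creator]}
    record["links"] = {"source": [link]}
    record["search"] = {"title": [title.lower()], "creator": [creator.lower()]}
    record["sort"] = {"title": [title.lower()]}
    record["facets"] = {"creator": [creator]}
    return record


def normalize_marc(marc: Dict[str, str]) -> Record:
    """Rule set for a toy MARC-like source keyed by tag and subfield."""
    return build_record(marc["245a"], marc["100a"], marc["856u"])


def normalize_dc(dc: Dict[str, str]) -> Record:
    """Rule set for a toy Dublin Core source, e.g. one harvested over OAI-PMH."""
    return build_record(dc["title"], dc["creator"], dc["identifier"])


marc_source = {"245a": "Example papers", "100a": "Doe, Jane", "856u": "http://example.org/1"}
dc_source = {"title": "Example papers", "creator": "Doe, Jane", "identifier": "http://example.org/1"}

# Both sources end up in the same shape, so one index can serve both.
print(normalize_marc(marc_source) == normalize_dc(dc_source))
```

Each source needs its own rule set, but everything downstream (search, facets, sorting) only has to understand the one target shape.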

Page 18: Cornell20080516

Data Sources at NYU

• BobCat (Aleph)
• MarcIt!
• EAD Records (Archivists Toolkit)
• Preservation Repository
• Faculty Digital Archive (IR)
• Art Images (Luna Insight)
• MetaLib Resources
• Data in SOLR
  – Newspaper Index
  – Data Sets

Page 19: Cornell20080516

Issues and Challenges

• Managing Deduplication
  – Dedup Data only out of box for MARC
  – Writing for OAI-PMH sources (EAD)
• Consortial Environment(s)
• Appropriate Delivery Options
• “Interpreting” Metadata

Page 20: Cornell20080516

EAD Records

• Archivists Toolkit
  – Previously in Access, Notepad, Excel
  – Authority Control (sort of)
• OAI-PMH Overlay
• Multiple layers of Crosswalking
• Deduping

Page 21: Cornell20080516

EAD / Aleph Dedup

• Aleph Title:
  – James E. Jackson and Esther Cooper Jackson papers
• EAD Title:
  – Guide to the James E. Jackson and Esther Cooper Jackson papers 1917-2004 (Bulk 1937-1992) Tamiment 347
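
One way to get these two titles to match is to reduce each to a comparison key before deduplication. The sketch below is an assumption about how such a rule could look, not Primo's actual dedup algorithm; the function name `dedup_title_key` and the patterns it strips (the "Guide to the" prefix, date ranges, the bulk-dates note, and a "Tamiment" collection number) are hypothetical and tuned only to this one example.

```python
import re


def dedup_title_key(title: str) -> str:
    """Reduce a title to a lowercase key for dedup matching (illustrative rules only)."""
    key = title.lower()
    key = re.sub(r"^guide to the\s+", "", key)        # finding-aid prefix
    key = re.sub(r"\(bulk[^)]*\)", " ", key)          # bulk-dates note, e.g. "(Bulk 1937-1992)"
    key = re.sub(r"\b\d{4}\s*-\s*\d{4}\b", " ", key)  # inclusive date range
    key = re.sub(r"\btamiment\s+\d+\b", " ", key)     # collection number (assumed pattern)
    key = re.sub(r"[^a-z0-9]+", " ", key)             # punctuation and runs of spaces
    return key.strip()


aleph_title = "James E. Jackson and Esther Cooper Jackson papers"
ead_title = (
    "Guide to the James E. Jackson and Esther Cooper Jackson papers "
    "1917-2004 (Bulk 1937-1992) Tamiment 347"
)

# Both titles reduce to: "james e jackson and esther cooper jackson papers"
print(dedup_title_key(aleph_title) == dedup_title_key(ead_title))
```

Rules like these are brittle: every repository with a different titling convention needs its own patterns, which is part of the maintenance cost discussed later in the talk.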

Page 22: Cornell20080516

MARC + EAD

EAD Record

Aleph Record

Authority Records

MARC Record w/ Auth Data

OAI-DC Record w/ FT of EAD

EAD PNX

Aleph PNX

Dedup PNX

Page 23: Cornell20080516

Value of Dedup

• Indexing the Best of Both Worlds
• EAD Records:
  – Inventory
  – Long Biographical / Historical Notes
• MARC Data:
  – Cross References for Access Points

Page 24: Cornell20080516

Why is it so hard?

• Continual Repetition of Effort

Page 25: Cornell20080516

Page 26: Cornell20080516

A Distinction

• Metadata Harmonization:
  – the “ability to use several different metadata standards in a single software system.”
• Metadata Normalization:
  – mapping several different metadata standards to a single schema or structure for use in a single software system.

Page 27: Cornell20080516

MARC + EAD

EAD Record

Aleph Record

Authority Records

MARC Record w/ Auth Data

OAI-DC Record w/ FT of EAD

EAD PNX

Aleph PNX

Dedup PNX

Page 28: Cornell20080516

Linked Open Data

• Use URIs as names for things
• Use HTTP URIs so that people can look up those names.
• When someone looks up a URI, provide useful information.
• Include links to other URIs, so that they can discover more things.

http://www.w3.org/DesignIssues/LinkedData.html
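
The four principles can be illustrated with a toy, in-memory stand-in for the web. Everything here is hypothetical: the `example.org` URIs, the vocabulary terms, and the helper names `dereference` and `discover` are invented for the sketch, and a real client would issue HTTP requests and parse RDF instead of reading a dictionary.

```python
from collections import deque
from typing import Dict, List

# Toy "web": each HTTP URI names a thing and maps to a description of it.
# Property names are URIs too; values are either literals or links to other URIs.
WEB: Dict[str, Dict[str, List[str]]] = {
    "http://example.org/work/1": {
        "http://example.org/vocab/title": ["An example work"],
        "http://example.org/vocab/creator": ["http://example.org/person/1"],
    },
    "http://example.org/person/1": {
        "http://example.org/vocab/name": ["Doe, Jane"],
        "http://example.org/vocab/memberOf": ["http://example.org/org/1"],
    },
    "http://example.org/org/1": {
        "http://example.org/vocab/name": ["Example Library"],
    },
}


def dereference(uri: str) -> Dict[str, List[str]]:
    """Stand-in for an HTTP GET: look up a URI and return useful information."""
    return WEB.get(uri, {})


def discover(start: str) -> List[str]:
    """Follow links breadth-first from one URI and list every thing reached."""
    seen: List[str] = []
    queue = deque([start])
    while queue:
        uri = queue.popleft()
        if uri in seen:
            continue
        seen.append(uri)
        for values in dereference(uri).values():
            for value in values:
                if value in WEB:  # the value is a link, not a literal
                    queue.append(value)
    return seen


print(discover("http://example.org/work/1"))
```

Starting from the work alone, a client reaches the person and then the organization without any system-specific pipe or crosswalk, which is the contrast the following slides draw with normalization.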

Page 29: Cornell20080516

Primo is NOT Linked Data

• List of nearly a dozen sources, some “normalized” more than once

• “Normalized” into another proprietary format, used by one system

• Additional Resources require additional pipes

Page 30: Cornell20080516

Linked Library Data

• Resources get URIs early in lifecycle
• Properties get URIs
• Vocabularies get URIs
• Everything is dereferenceable as to its meaning

Page 31: Cornell20080516

Conclusions

• DCMI/RDA Work
• NSDL Registry Work
• LC Registry Work
• MODS as RDF (Simile & LC)
• OAI-ORE
• OAI2LOD

Page 32: Cornell20080516

Conclusions

• This stuff is happening
• We need to be playing with it
• We need to be applying lessons from projects like Primo to it
• Library Data is a key component!

Page 33: Cornell20080516

…and Library data is extremely complicated

Page 34: Cornell20080516

MARC Record Graph

• Does not include authority data
• Coins new URIs for any non-literal value
• Contains a few minor modeling errors

<modsrdf:Publisher modsrdf:value="Crowell" rdf:about="http://simile.mit.edu/2006/01/publisher/Crowell">
  <modsrdf:location>
    <modsrdf:Place modsrdf:name="New York" rdf:about="http://simile.mit.edu/2006/01/place/marccountry/nyu"/>
  </modsrdf:location>
</modsrdf:Publisher>
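
To see which URIs a fragment like the one above coins, it can be parsed with Python's standard library. This is a rough sketch under one loud assumption: the slide does not show the namespace declarations, so the `modsrdf` namespace URI below is a placeholder and not the real one (the `rdf` namespace is the standard W3C one). The function name `coined_uris` is also made up for the example.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
MODSRDF_NS = "http://example.org/modsrdf#"  # placeholder: not shown on the slide

# The slide's fragment, with the namespace declarations it needs in order to parse.
SNIPPET = f"""
<modsrdf:Publisher xmlns:rdf="{RDF_NS}" xmlns:modsrdf="{MODSRDF_NS}"
    modsrdf:value="Crowell"
    rdf:about="http://simile.mit.edu/2006/01/publisher/Crowell">
  <modsrdf:location>
    <modsrdf:Place modsrdf:name="New York"
        rdf:about="http://simile.mit.edu/2006/01/place/marccountry/nyu"/>
  </modsrdf:location>
</modsrdf:Publisher>
"""


def coined_uris(xml_text: str) -> list:
    """Return every rdf:about URI in document order."""
    root = ET.fromstring(xml_text.strip())
    about = f"{{{RDF_NS}}}about"  # ElementTree spells qualified names as {namespace}local
    return [el.get(about) for el in root.iter() if el.get(about) is not None]


for uri in coined_uris(SNIPPET):
    print(uri)
```

Both the publisher and the place get their own URI even though the source MARC record only held strings, which is what "coins new URIs for any non-literal value" means in practice.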

Page 35: Cornell20080516

Thanks!

Questions?

[email protected]