From eprint archives to open archives and OAI: the Open Citation project

By The Open Citation Project team

Presented by Steve Hitchcock, Southampton University

These slides prepared for the JISC/NSF Digital Libraries Initiative (DLI) All Projects Meeting, Edinburgh, 24-25th June 2002

OpCit is a joint JISC-NSF International Digital Libraries Project 1999-2002

About this presentation

The aims are to:

• Show progress since the Stratford All-Projects Meeting in 2000

• Demonstrate new services developed by the project

• Highlight the relationship between the project and the Open Archives Initiative

• Outline the key tasks remaining, and which services will continue beyond the Open Citation Project

Recap 1: principal partners

• Southampton University, IAM (Intelligence, Agents, Multimedia) Research Group, PI Stevan Harnad

Citation-ranked search, EPrints.org, user surveys

• Cornell University, Digital Library Research Group, PI Carl Lagoze

Architecture for reference linking, experiments with the ACM Digital Library and D-Lib Magazine, OAI technical support center

• arXiv.org, Paul Ginsparg

Now based at Cornell University. Still the largest archive of freely accessible author-deposited scientific papers

The Open Citation Project: deliverables

The Open Citation Project (OpCit) is developing software and services to support the Open Archives Initiative (OAI). Its deliverables for OAI data providers and service providers are:

• Citebase: citation-ranked search

• EPrints.org software: free software to build and manage OAI-compliant eprint archives

• API for reference linking: an interface on which reference-linking applications can be built

Recap 2: last time at Stratford

Reference links on PDF copies of papers

Citebase, a new interface to the scholarly literature

Citebase, a citation-ranked search engine

http://citebase.eprints.org/

“Google for the refereed literature”

Citebase is based on an open citation database

• Harvests metadata using OAI-PMH

• Extracts reference lists from arXiv papers

• Provides search results ranked by citation impact (and other measures), based on reference data

• Re-exports metadata and references
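The ranking step can be pictured with a minimal Java sketch, assuming reference lists have already been extracted and matched to paper identifiers. Impact here is simply the number of distinct citing papers. The class name CitationRanker and the sample identifiers are hypothetical, and Citebase's own ranking measures may differ.

```java
import java.util.*;
import java.util.stream.Collectors;

// Hypothetical sketch: ranks search hits by how many papers cite them,
// using reference lists already extracted from the full texts.
class CitationRanker {
    private final Map<String, Integer> citationCounts = new HashMap<>();

    // references: citing paper id -> ids of the papers it cites
    CitationRanker(Map<String, List<String>> references) {
        for (List<String> cited : references.values()) {
            // Count each cited paper once per citing paper, even if listed twice
            for (String id : new HashSet<>(cited)) {
                citationCounts.merge(id, 1, Integer::sum);
            }
        }
    }

    int citations(String id) {
        return citationCounts.getOrDefault(id, 0);
    }

    // Orders search hits by descending citation count; ties fall back to id order
    List<String> rank(Collection<String> hits) {
        return hits.stream()
                .sorted(Comparator.comparingInt((String id) -> citations(id)).reversed()
                        .thenComparing(Comparator.naturalOrder()))
                .collect(Collectors.toList());
    }
}
```

The tie-break keeps the output deterministic. Another ranking measure could be substituted in the same comparator.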

Evaluating Citebase

The evaluation is aimed at users of arXiv, and at all others who use bibliographic services to access the refereed journal literature.

How you can contribute: find the evaluation form at

http://www.ecs.soton.ac.uk/~aw01r/citebase/evalForm1.htm

Aims of the evaluation:

• Discover the user’s awareness of related services

• Assess usability with a practical exercise

• Invite the user’s views on the main features

• Assess the level of user satisfaction with the service

Citebase: further developments

• OpenURL-enabled: pointing Citebase links at library and journal services

• Google interface using DP9: getting Citebase results, and open archives, into Google

• Metadata format and XML schema for citations: making citation metadata harvestable via OAI-PMH. Possible formats include:

– Academic Metadata Format: a ‘local profile’ format, with some collaborative experiments performed within OpCit

– OpenURL metadata, moving towards NISO standardisation
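In outline, OpenURL linking works like this: a reference is encoded as key/value pairs appended to the base URL of a library's link resolver, and the resolver decides where to send the reader. This Java sketch assumes OpenURL 0.1 key names (sid, genre, aulast, atitle, volume and so on). The resolver URL, the sid value and the OpenUrlLinker class are hypothetical, and the NISO version of OpenURL uses a different key scheme.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical sketch: builds an OpenURL 0.1-style link for one reference,
// so that a library's link resolver can point the reader at an available copy.
class OpenUrlLinker {
    private final String resolverBase; // an institution's link resolver (assumed)

    OpenUrlLinker(String resolverBase) {
        this.resolverBase = resolverBase;
    }

    // citation: OpenURL 0.1 keys (aulast, atitle, title, volume, spage, date, ...)
    String link(Map<String, String> citation) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("sid", "citebase:demo"); // hypothetical source identifier
        params.put("genre", "article");
        params.putAll(citation);
        // Skip empty fields and URL-encode each value
        String query = params.entrySet().stream()
                .filter(e -> e.getValue() != null && !e.getValue().isEmpty())
                .map(e -> e.getKey() + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
        return resolverBase + "?" + query;
    }
}
```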

Recap 3: API for reference linking

• getLinkedText – contents of the paper, reference-linked, plus extensive metadata for the paper

• getReferenceList – this paper’s references

• getCurrentCitationList – the list of works citing this paper (to the best of current knowledge)

• getMyData – metadata for this paper
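For orientation only, the four calls might be expressed as a Java interface with a toy in-memory implementation. The method names are taken from the API above. Everything else is an assumption and not the OpCit API itself: the parameter and return types, the ReferenceLinking and InMemoryLinker names, and the plain-text stand-in for linked text.

```java
import java.util.*;

// Hypothetical rendering of the four reference-linking calls; only the
// method names come from the OpCit API, the signatures are assumed.
interface ReferenceLinking {
    String getLinkedText(String paperId);                // text with references linked
    List<String> getReferenceList(String paperId);       // works this paper cites
    List<String> getCurrentCitationList(String paperId); // works citing it, as known now
    Map<String, String> getMyData(String paperId);       // metadata for this paper
}

// Toy in-memory implementation over a fixed set of papers
class InMemoryLinker implements ReferenceLinking {
    private final Map<String, Map<String, String>> metadata;
    private final Map<String, List<String>> references;

    InMemoryLinker(Map<String, Map<String, String>> metadata,
                   Map<String, List<String>> references) {
        this.metadata = metadata;
        this.references = references;
    }

    public Map<String, String> getMyData(String paperId) {
        return metadata.getOrDefault(paperId, Map.of());
    }

    public List<String> getReferenceList(String paperId) {
        return references.getOrDefault(paperId, List.of());
    }

    // Citations are the reverse of references: scan every known reference list
    public List<String> getCurrentCitationList(String paperId) {
        List<String> citing = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : references.entrySet()) {
            if (e.getValue().contains(paperId)) citing.add(e.getKey());
        }
        Collections.sort(citing);
        return citing;
    }

    // Stand-in for linked text: the title followed by one line per reference
    public String getLinkedText(String paperId) {
        StringBuilder sb = new StringBuilder(getMyData(paperId).getOrDefault("title", paperId));
        for (String ref : getReferenceList(paperId)) {
            sb.append("\n[ref] ").append(ref);
        }
        return sb.toString();
    }
}
```

The citation list is only as complete as the set of reference lists the service has seen. This is why the slide qualifies it as the current best knowledge.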

Surrogates in the API

A surrogate for a scholarly work is built from an automatic analysis of that work (and of other works, for citations). It consists of the following three XML files:

• Bibliographic data for the scholarly work

• References contained in that work, and their contexts within the full text

• Citations of that work
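Finding reference contexts can be sketched roughly as follows. The sketch assumes numeric citation markers such as [3] or [2, 5], and takes a fixed character window on either side of each marker. The marker pattern, the window and the ContextExtractor class are assumptions; real papers use many other citation styles.

```java
import java.util.*;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: finds numeric citation markers like [3] or [2, 5]
// and records the surrounding text as a context for each cited reference.
class ContextExtractor {
    private static final Pattern MARKER = Pattern.compile("\\[(\\d+(?:\\s*,\\s*\\d+)*)\\]");

    // Returns reference number -> context snippets, in the order they occur
    static Map<Integer, List<String>> contexts(String text, int window) {
        Map<Integer, List<String>> result = new TreeMap<>();
        Matcher m = MARKER.matcher(text);
        while (m.find()) {
            int start = Math.max(0, m.start() - window);
            int end = Math.min(text.length(), m.end() + window);
            String snippet = text.substring(start, end).trim();
            // A marker such as [2, 3] gives the same context to each number
            for (String num : m.group(1).split("\\s*,\\s*")) {
                result.computeIfAbsent(Integer.parseInt(num), k -> new ArrayList<>()).add(snippet);
            }
        }
        return result;
    }
}
```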

API: progress and evaluation

New features

• Citation interface added: surrogates can now collect citations

• Graphic citeref tool (demonstrated on ResearchIndex)

API tested on D-Lib Magazine and the ACM Digital Library. Try the demo at http://cs-tr.cs.cornell.edu/RefLinkingDemo/

Performance (in terms of accuracy of data extracted):

• Reference analysis: 86.7%

• Item analysis (bib data, contexts, and references for a given paper): 82.42%

Implementability

• Simple interface: Surrogate s = new Surrogate(some-url)

• Portable: written in Java; has run on Solaris, Win2K and NT4

• Installation: API source code plus public-domain JAR files

EPrints.org software

http://www.eprints.org/

Generates eprint archives that are compliant with the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH). EPrints is free (GPL) software, aimed at organisations and communities.

EPrints v. 2.0 released February 2002 (now on v. 2.0.1, which fixes bugs and typos). Features:

• Internationalised metadata stored as Unicode

• Support for multiple archives on one server

• Improved user interface

OpCit and OAI

• OAI aggregator: collecting and caching the results from OAI data providers to improve the efficiency of data harvesting

• OAI infrastructure: proxies, caches and gateways to improve the interoperability, scalability and reliability of OAI services. Joint work with Old Dominion University; see the paper “A Scalable Architecture for Harvest-Based Digital Libraries - The ODU/Southampton Experiments”, http://arxiv.org/abs/cs.DL/0205071

• OAI registration and validation work is performed at Cornell
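Harvesting through an aggregator or cache is usually incremental. The minimal Java sketch below covers the OAI-PMH ListRecords requests involved. The first request names a metadataPrefix and, optionally, a from datestamp that limits the harvest to records changed since the last one. Follow-up requests carry only the resumptionToken returned by the provider, which OAI-PMH treats as an exclusive argument. The base URL and the HarvestRequests class are hypothetical, and fetching and parsing the XML responses are left out.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: request URLs for incremental OAI-PMH harvesting,
// as an aggregator or cache might issue them. No network I/O is done here.
class HarvestRequests {
    private final String baseUrl; // the data provider's OAI-PMH base URL (assumed)

    HarvestRequests(String baseUrl) {
        this.baseUrl = baseUrl;
    }

    // First request: records in one metadata format, optionally only those
    // changed since the last harvest (from = YYYY-MM-DD datestamp, or null)
    String first(String metadataPrefix, String from) {
        String url = baseUrl + "?verb=ListRecords&metadataPrefix=" + enc(metadataPrefix);
        return from == null ? url : url + "&from=" + enc(from);
    }

    // Follow-up request: resumptionToken is exclusive, so only verb and token are sent
    String next(String resumptionToken) {
        return baseUrl + "?verb=ListRecords&resumptionToken=" + enc(resumptionToken);
    }

    private static String enc(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }
}
```

A cache that stores the datestamp of each completed harvest can pass it as from on the next run. Only new or changed records are then transferred.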

EPrints and OAI

• EPrints feeds repository URLs straight into the OAI registration process (if so desired by the EPrints administrator)

• A scan of the OAI database of registered sites shows many sites use EPrints software to create repositories

www.openarchives.org/Register/BrowseSites.pl

A repository administrator’s view of OAI

“As we have introduced our repository to our faculty and staff, we have emphasized the point that because they would be depositing their material in an OAI-compliant archive, it would automatically and painlessly be discoverable from various other points around the globe. Luckily, we were right.”

Roy Tennant, eScholarship, California Digital Library, June 2002

OpCit user surveys and data mining

Maximising impact, maximising access

Results from “Mining the Social Life of an Eprint Archive”: http://opcit.eprints.org/tdb198/opcit/

When interoperability is not enough: show authors what users do when open access services are available

Key project tasks remaining

• Evaluation and reporting of the results

• Programmer's guide to using the API

• Journal and conference papers

• Final reports to JISC and NSF

After OpCit

OpCit formally ends in September 2002, but the following services will continue to be developed:

• Citebase

• EPrints.org

• OAI

What we have achieved; what we have learned

• OAI is gathering momentum

• Software for building repositories is available

• Institutional archives are being created, but need to be filled by authors

• Attracting authors requires evidence of real services that will improve the visibility and impact of their works

• Such services are now available. Citation-ranked search and reference linking are examples of OAI services that offer this

• The infrastructure supporting OAI services continues to be enhanced

• Resource discovery and current awareness are exemplar OAI services now. Future services may include preservation risk management and personalisation

Credits

Other contributors to the project include

• Technical development at Southampton is directed by Les Carr

• Research at Cornell by Donna Bergmark

• EPrints.org software is being developed by Chris Gutteridge

• Citebase is produced and managed by Tim Brody

• Project manager is Steve Hitchcock

A copy of these slides can be found on the OpCit Web site

http://opcit.eprints.org/. Look for Papers and Presentations

Contact Steve Hitchcock: [email protected]