UKSG Conference 2015 - CrossRef Text and Data Mining Services: one year in Rachael Lammey, CrossRef

Jul 15, 2015

Transcript
Page 1

Rachael Lammey, Product Manager, CrossRef

UKSG 2015

CrossRef Text and Data Mining Services: one year in

Page 2

Not-for-profit association of scholarly publishers

All subjects, all business models

5,000+ organizations from all over the world

83 non-publisher affiliates, 2,000 library affiliates

72 million+ DOIs assigned to content items

Page 3

10.1098/rstl.1665.0001

Page 4

User clicks on CrossRef DOI reference link in Journal A

Tani, N., N. Tomaru, M. Araki, AND K. Ohba. 1996. Genetic diversity and differentiation in populations of Japanese stone pine (Pinus pumila) in Japan. Canadian Journal of Forest Research 26: 1454–1462.[CrossRef]

DOI directory

returns URL

User accesses cited article in Journal B
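
The reference-linking flow above can be sketched in a few lines of Python. This is only an illustration: it assumes the public resolver at `https://doi.org/` plays the role of the "DOI directory", and the function name `doi_to_link` is invented for the example. The resolver itself performs the redirect to the publisher's current URL; the sketch only builds the link and makes no network request.

```python
from urllib.parse import quote

# Assumption: the public DOI resolver stands in for the "DOI directory".
DOI_RESOLVER = "https://doi.org/"

def doi_to_link(doi: str) -> str:
    """Build the resolver URL for a DOI.

    The resolver, not this code, redirects to the publisher's landing page.
    """
    # Keep "/" unescaped: it separates the DOI prefix from the suffix.
    return DOI_RESOLVER + quote(doi.strip(), safe="/")

link = doi_to_link("10.1098/rstl.1665.0001")
print(link)
```

Because the link targets the resolver rather than the publisher's site, it keeps working when the publisher moves the article, as long as the URL registered for the DOI is updated.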

Page 5

100,000,000

Page 6

A Text and Data Mining Hub for Researchers

Page 7

What is Text and Data Mining (TDM)?

Text Mining is an interdisciplinary field combining techniques from linguistics, computer science and statistics to build tools that can efficiently retrieve and extract information from digital text.

http://blogs.plos.org/everyone/2013/04/17/announcing-the-plos-text-mining-collection/

It uses powerful computers to find links between drugs and side effects, or genes and diseases, that are hidden within the vast scientific literature. These are discoveries that a person scouring through papers one by one may never notice.

http://www.theguardian.com/science/2012/may/23/text-mining-research-tool-forbidden

Page 8

Why?

• Researchers find it impractical to negotiate multiple bilateral agreements with hundreds of subscription-based publishers in order to authorise TDM of subscribed content.

• Subscription-based publishers find it impractical to negotiate multiple bilateral agreements with thousands of researchers and institutions in order to authorise TDM of subscribed content.

• All parties would benefit from support of standard APIs and data representations in order to enable TDM across both open access and subscription-based publishers.

Page 9
Page 10

Build Cross-Publisher API for TDM

Page 11

Access To Full Text

Problem: Researchers want to get full text content from publishers’ sites for OA or subscribed content.

Solution: Common API (protocol) for requesting machine-readable full text from many different publishers

Page 12

Negotiating Permissions

Problem: Researchers want to know whether text and data mining is allowed, and if not, get permission.

Solution: Licensing information embedded in article metadata and a registry for supplemental text and data mining terms and conditions (licenses).

Page 13

Text and Data Mining Steps

• Define problem

• Identify potential corpus to mine

• Discovery (full text links)

• Identification of subset which can be accessed (license information)

• Download identified corpus

• Text and data mine corpus
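
The discovery and identification steps above can be sketched as a simple filter. Everything in this example is hypothetical: the records, the field names (`doi`, `full_text`, `license`) and the set of accepted licenses are invented for illustration and are not the CrossRef data format.

```python
# Hypothetical candidate records; the field names are illustrative only.
corpus = [
    {"doi": "10.5555/example.1",
     "full_text": "https://publisher.example/1.xml",
     "license": "https://creativecommons.org/licenses/by/4.0/"},
    {"doi": "10.5555/example.2",
     "full_text": None,  # no full-text link was found during discovery
     "license": "https://creativecommons.org/licenses/by/4.0/"},
    {"doi": "10.5555/example.3",
     "full_text": "https://publisher.example/3.xml",
     "license": "https://publisher.example/tdm-terms"},
]

# Licenses the researcher has reviewed and accepted (an assumption of this sketch).
accepted_licenses = {"https://creativecommons.org/licenses/by/4.0/"}

def select_minable(records, accepted):
    """Keep records with a full-text link and a license the researcher accepts."""
    return [r for r in records if r["full_text"] and r["license"] in accepted]

to_download = [r["doi"] for r in select_minable(corpus, accepted_licenses)]
print(to_download)
```

Only the first record passes both checks. The second has no full-text link, and the third carries a license that the researcher has not yet reviewed.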

Page 14

The Basic Workflow

Page 15

Publisher Participation

To enable their content for use by the service, publishers have to provide CrossRef with two additional pieces of metadata:

• Full text URIs (to show where the full-text is located)

• License URIs (to show the Terms & Conditions under which researchers can use it)

Publishers can also implement rate limiting.

CrossRef doesn’t charge publishers for participating in this service.
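
As a rough illustration of how these two pieces of metadata might look to a researcher, the sketch below reads them from a record shaped like the JSON returned by the CrossRef REST API. The `link` and `license` arrays and their `URL` and `content-type` keys are an assumption based on that public JSON format; the record's values and the helper names are invented.

```python
# Hypothetical record: the values are invented, and the "link"/"license"
# layout is an assumption based on the public CrossRef REST API JSON format.
record = {
    "DOI": "10.5555/example.1",
    "link": [
        {"URL": "https://publisher.example/1.xml", "content-type": "application/xml"},
        {"URL": "https://publisher.example/1.pdf", "content-type": "application/pdf"},
    ],
    "license": [{"URL": "https://creativecommons.org/licenses/by/4.0/"}],
}

def full_text_urls(rec, content_type=None):
    """Return full-text URLs, optionally restricted to one content type."""
    return [item["URL"] for item in rec.get("link", [])
            if content_type is None or item.get("content-type") == content_type]

def license_urls(rec):
    """Return the license URIs deposited for the record (empty if none)."""
    return [item["URL"] for item in rec.get("license", [])]

xml_links = full_text_urls(record, "application/xml")
licenses = license_urls(record)
print(xml_links, licenses)
```

A record without either array simply yields empty lists, which is how content from publishers who have not deposited this metadata would show up.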

Page 16

Researcher Use

• The CrossRef REST API is the main aspect of this service

• It is designed to allow researchers to easily harvest full text documents from all participating publishers regardless of their business model (e.g. open access, subscription).

• It makes use of CrossRef DOI content negotiation to provide researchers with links to the full text of content located on the publisher’s site.

• The publisher remains responsible for actually delivering the full text of the content requested

• CrossRef does not charge researchers for using the service
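
A minimal sketch of a researcher-side query follows. It assumes the REST API endpoint `https://api.crossref.org/works` and the filter names `has-full-text`, `has-license` and `license.url`, none of which are given on the slide; `tdm_query_url` is an invented helper. The code only builds the URL, so it can be inspected without contacting the API.

```python
from urllib.parse import urlencode

# Assumption: the public REST API endpoint for works metadata.
API_BASE = "https://api.crossref.org/works"

def tdm_query_url(rows=20, license_url=None):
    """Build a query for works that carry both full-text links and license data."""
    # Filter names are assumptions taken from the public REST API documentation.
    filters = ["has-full-text:true", "has-license:true"]
    if license_url:
        filters.append("license.url:" + license_url)
    return API_BASE + "?" + urlencode({"filter": ",".join(filters), "rows": rows})

url = tdm_query_url(rows=5)
print(url)
```

Fetching the resulting URL with any HTTP client should return JSON records whose full-text links point at the publisher's site, since the publisher remains responsible for delivery.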

Page 17

Publisher Metadata for CrossRef TDM: Hindawi

Page 18

Publisher Metadata for CrossRef TDM: Elsevier

Page 19

CrossRef TDM Demo

Page 20

Click-Through Service

Page 21

Extended Workflow

Page 22

Researcher View

Page 23

Publisher View

Page 24

Researcher queries DOI using content negotiation (CN) + API token

Publisher verifies API token

If token verified AND access control allows, publisher returns full text

(frequency at publisher discretion)
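
The exchange above might look roughly like this from each side. The header name `CR-Clickthrough-Client-Token` and the `Accept` value are assumptions recalled from CrossRef's click-through documentation, not something stated on the slide; the token, the DOI and both function names are hypothetical. The request is built but never sent.

```python
from urllib.request import Request

def build_tdm_request(doi, token, accept="application/vnd.crossref.unixsd+xml"):
    """Researcher side: DOI content negotiation request carrying an API token."""
    headers = {
        "Accept": accept,                       # assumed content-negotiation type
        "CR-Clickthrough-Client-Token": token,  # assumed header name
    }
    return Request("https://doi.org/" + doi, headers=headers)

def publisher_allows(token_verified, access_allowed):
    """Publisher side: return full text only if both checks pass."""
    return token_verified and access_allowed

req = build_tdm_request("10.5555/example.1", "HYPOTHETICAL-TOKEN")
print(req.full_url, req.header_items())
```

Passing `req` to `urllib.request.urlopen` would actually send the request. How often a client may do so is left to the publisher's rate limiting.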

Page 25

Benefits

• Streamlines researcher access to distributed full text for TDM

• Enables machine-to-machine, automated access for recognized TDM (i.e. researchers won’t be locked out of publisher sites)

• Enables article-level licensing info and an easy mechanism for supplemental T&Cs for text and data mining (publishers are discussing a model license via STM)

Page 26

Publishers

Over 14 million articles with full-text links and license information deposited

Page 27

Usable as is:

https://blogs.nd.edu/emorgan/

Page 28

http://tdmsupport.crossref.org/

Page 29

www.crossref.org

http://www.crossref.org/tdm/index.html

[email protected]