Kew at pro-iBiosphere data hackathon Nicky Nicolson, Matt Blissett RBG Kew Biodiversity Informatics team

Kew at the pro-iBiosphere data hackathon

Jul 02, 2015

nickyn
Transcript
Page 1: Kew at the pro-iBiosphere data hackathon

Kew at pro-iBiosphere data hackathon

Nicky Nicolson, Matt Blissett

RBG Kew Biodiversity Informatics team

Page 2: Kew at the pro-iBiosphere data hackathon

A map + data + tools = links

Two-minute background: what we’ve done, why we should link up our data

What is needed?

- Persistent identifiers

- Tools – to turn “strings” into “things”

What we’ve brought along:

- Map

- Data

- ... Labelled with persistent identifiers

- A rules-based matching / linking tool

Page 18: Kew at the pro-iBiosphere data hackathon

specimens.kew.org/herbarium/K000525802

doi: 10.1007/s12225-010-9210-7

Page 20: Kew at the pro-iBiosphere data hackathon

Cited in:

Rakotoarinivo M, Dransfield J. 2010. New species of Dypsis and Ravenea (Arecaceae) from Madagascar. Kew Bull. 65, 279–303.

doi:10.1007/s12225-010-9210-7

specimens.kew.org/herbarium/K000525802

Page 21: Kew at the pro-iBiosphere data hackathon

Data linking tool

Rules-based

Armed with a tabular dataset, you:

- Define zero or more transformers for each field

- Define how fields must match

This is a match configuration.
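
A match configuration of this kind can be sketched in a few lines of Python. This is only an illustration of the idea on the slide, not the Kew tool's actual code or configuration format: the names `FieldRule`, `MatchConfiguration` and `find_matches`, the column names, and the two dataset rows are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Transformer = Callable[[str], str]
Matcher = Callable[[str, str], bool]


@dataclass
class FieldRule:
    """One column's rule: zero or more transformers, then a single matcher."""
    column: str
    matcher: Matcher
    transformers: List[Transformer] = field(default_factory=list)

    def normalise(self, value: str) -> str:
        # Apply the transformers in the order they were configured.
        for transform in self.transformers:
            value = transform(value)
        return value

    def matches(self, query_value: str, candidate_value: str) -> bool:
        return self.matcher(self.normalise(query_value),
                            self.normalise(candidate_value))


@dataclass
class MatchConfiguration:
    """A set of field rules (hypothetical structure, for illustration)."""
    rules: List[FieldRule]

    def find_matches(self, query: Dict[str, str],
                     dataset: List[Dict[str, str]]) -> List[Dict[str, str]]:
        # A row is a match only if every configured field rule is satisfied.
        return [row for row in dataset
                if all(rule.matches(query.get(rule.column, ""),
                                    row.get(rule.column, ""))
                       for rule in self.rules)]


def exact(a: str, b: str) -> bool:
    return a == b


config = MatchConfiguration(rules=[
    FieldRule("genus", exact, [str.strip, str.lower]),
    FieldRule("epithet", exact, [str.strip, str.lower]),
    FieldRule("authors", exact),  # zero transformers is allowed
])

# Made-up rows: the genus names come from the slides, the rest is placeholder.
dataset = [
    {"id": "example-1", "genus": "Dypsis", "epithet": "alpha", "authors": "Auth."},
    {"id": "example-2", "genus": "Ravenea", "epithet": "alpha", "authors": "Auth."},
]
query = {"genus": " dypsis ", "epithet": "ALPHA", "authors": "Auth."}

matched_ids = [row["id"] for row in config.find_matches(query, dataset)]
print(matched_ids)
```

Because the configuration holds only column names and functions, the same rules can be pointed at any tabular dataset that has those columns.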

Page 22: Kew at the pro-iBiosphere data hackathon

Examples of transformers

Epithet

mediterraneum → mediterranea

NormaliseDiacrits

Déségl. → Desegl.

RemoveBracketedText, RomanNumeral

cix (1892), 57 → 109 57

CleanedPubAuthors

(L.) A.Gray in Hook.f. → A.Gray

SurnameExtracter

(A.Gray) A.Heller → (Gray) Heller

PageExtractor

37(4): 412 (1977) → 412
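
Several of these transformers are simple string functions. The sketch below reproduces four of the examples above in Python. The function names and regular expressions are assumptions made for illustration; the real transformers may apply different rules. `Epithet` and `CleanedPubAuthors` are left out because the slide does not give enough detail to reconstruct them.

```python
import re
import unicodedata

ROMAN_VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}


def normalise_diacritics(text):
    """Strip accents by decomposing characters and dropping combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def remove_bracketed_text(text):
    """Drop anything in round brackets, e.g. a year such as '(1892)'."""
    return re.sub(r"\([^)]*\)", "", text)


def roman_to_int(numeral):
    """Convert a roman numeral by reading it right to left."""
    total, previous = 0, 0
    for ch in reversed(numeral.upper()):
        value = ROMAN_VALUES[ch]
        # A smaller value to the left of a larger one is subtracted (IX = 9).
        total += -value if value < previous else value
        previous = value
    return total


def roman_numerals_to_integers(text):
    """Replace tokens made only of roman-numeral letters with integers.

    Naive assumption: any such token is a numeral, so a word like 'mix'
    would also be converted.
    """
    tokens = re.findall(r"[A-Za-z0-9]+", text)
    return " ".join(str(roman_to_int(t)) if re.fullmatch(r"[IVXLCDMivxlcdm]+", t)
                    else t for t in tokens)


def surname_extracter(text):
    """Remove leading initials such as 'A.' in front of a capitalised surname."""
    return re.sub(r"\b(?:[A-Z]\.\s*)+(?=[A-Z][a-z])", "", text)


def page_extractor(text):
    """Return the first number after a colon, or the text unchanged if none."""
    found = re.search(r":\s*(\d+)", text)
    return found.group(1) if found else text


print(normalise_diacritics("Déségl."))
print(roman_numerals_to_integers(remove_bracketed_text("cix (1892), 57")))
print(surname_extracter("(A.Gray) A.Heller"))
print(page_extractor("37(4): 412 (1977)"))
```

Chaining `remove_bracketed_text` and `roman_numerals_to_integers` mirrors the slide, where two transformers are listed together for a single field.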

Page 23: Kew at the pro-iBiosphere data hackathon

Examples of matchers

Exact

CommonTokens

CapitalLetters

in Beitr. Aethiop. → B A

Beitr. Fl. Aethiop. → B F A = 0.67 ratio

Number

Integer

Levenshtein
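
Two of these matchers can be sketched as follows. The slide does not say how the `CapitalLetters` ratio is calculated; the version here assumes it is the number of shared capital letters divided by the larger count of capitals, which reproduces the 0.67 in the example. The function names are hypothetical, and the Levenshtein function is the standard edit-distance algorithm rather than the tool's own code.

```python
import re
from collections import Counter


def capital_letters_ratio(a, b):
    """Shared capital letters divided by the larger number of capitals.

    Assumed definition: the slides only show the result (0.67).
    """
    caps_a = Counter(re.findall(r"[A-Z]", a))
    caps_b = Counter(re.findall(r"[A-Z]", b))
    longest = max(sum(caps_a.values()), sum(caps_b.values()))
    if longest == 0:
        return 0.0
    shared = sum((caps_a & caps_b).values())
    return shared / longest


def levenshtein(a, b):
    """Edit distance: minimum number of insertions, deletions, substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, 1):
        current = [i]
        for j, char_b in enumerate(b, 1):
            current.append(min(previous[j] + 1,                       # deletion
                               current[j - 1] + 1,                    # insertion
                               previous[j - 1] + (char_a != char_b))) # substitution
        previous = current
    return previous[-1]


def levenshtein_matcher(max_distance):
    """Build a matcher that accepts values within max_distance edits."""
    return lambda a, b: levenshtein(a, b) <= max_distance


ratio = capital_letters_ratio("in Beitr. Aethiop.", "Beitr. Fl. Aethiop.")
print(round(ratio, 2))
print(levenshtein("mediterraneum", "mediterranea"))
```

A ratio-based matcher still needs a threshold to give a yes or no answer; where that threshold sits would be part of the match configuration.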

Page 24: Kew at the pro-iBiosphere data hackathon

Using the matcher

A configured match can run against any tabular dataset.

Accessible as:

- JSON web service

- Google Refine reconciliation service (work in progress)

Transformers can be dropped into Google Refine

Page 25: Kew at the pro-iBiosphere data hackathon

Proposal: link names in floras to IPNI

We’ll set up the tool with IPNI as its backend dataset.

We run lists of taxa treated in floras against it and distribute IPNI IDs for these names.

Short term gain: navigate via the IPNI ID to the evidence about the name – protologues (Rod has matched 120K to DOIs) and types.

Long term gain: GSPC target #1 – online world flora. Simpler to integrate data if we’re talking about the same name.

Page 26: Kew at the pro-iBiosphere data hackathon

Proposal – link IPNI to types

We set up the tool with a botanical specimen catalogue as its backend data-source.

We link up the IPNI cited type data with the specimens themselves.

Page 27: Kew at the pro-iBiosphere data hackathon

Proposal – link floras to specimens

Floras use herbarium specimens as evidence for their distribution statements.

We set up the tool with a botanical specimen catalogue as its backend data-source.

We extract specimen references from floras and run these against the tool to create links from flora accounts to specimens themselves.

Page 28: Kew at the pro-iBiosphere data hackathon

specimens.kew.org/herbarium/K000049118

Page 29: Kew at the pro-iBiosphere data hackathon

Cited in: FZ volume:5 part:3 (2003) Rubiaceae by D.M.Bridson & B.Verdcourt

specimens.kew.org/herbarium/K000049118

Page 30: Kew at the pro-iBiosphere data hackathon

Proposal – link duplicates between herbaria

We set up the tool with a botanical specimen catalogue (e.g. K) as its backend data-source.

We fire specimen data from another specimen catalogue at it to look for duplicates.

Benefits:

- Geo-referencing

- Imaging

- Data capture efficiency
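
The duplicate search proposed on this slide could look roughly like the sketch below. The slides do not say which fields identify a duplicate; this example assumes the collector's surname plus the collection number, which is a common convention but is an assumption here. The field names (`barcode`, `collector`, `number`) and all records are made up for illustration.

```python
import re
from collections import defaultdict


def duplicate_key(record):
    """Build a comparison key from collector surname and collection number.

    Assumes the collector is written 'Surname' or 'Surname, Initials'.
    """
    surname = re.sub(r"[^a-z]", "", record["collector"].split(",")[0].lower())
    number = re.sub(r"\D", "", record["number"])
    return surname, number


def index_catalogue(catalogue):
    """Index the backend catalogue by key so each lookup is a dict access."""
    index = defaultdict(list)
    for record in catalogue:
        index[duplicate_key(record)].append(record)
    return index


def find_duplicates(backend, incoming):
    """Return (incoming barcode, backend barcode) pairs that share a key."""
    index = index_catalogue(backend)
    pairs = []
    for record in incoming:
        surname, number = duplicate_key(record)
        if not surname or not number:
            continue  # too little information to propose a link safely
        for candidate in index.get((surname, number), []):
            pairs.append((record["barcode"], candidate["barcode"]))
    return pairs


# Placeholder records, not real specimens.
backend = [
    {"barcode": "BACKEND-1", "collector": "Smith, J.", "number": "1234"},
    {"barcode": "BACKEND-2", "collector": "Jones, A.", "number": "77"},
]
incoming = [
    {"barcode": "OTHER-9", "collector": "SMITH", "number": "No. 1234"},
    {"barcode": "OTHER-10", "collector": "Brown", "number": ""},
]

links = find_duplicates(backend, incoming)
print(links)
```

Once two records are linked, a georeference, image or transcription captured for one specimen can be offered to the holder of the other, which is where the benefits listed above come from.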

Page 32: Kew at the pro-iBiosphere data hackathon

[email protected]

@nickynicolson

[email protected]