Kew at the pro-iBiosphere data hackathon

Posted on 02-Jul-2015

Nicky Nicolson, Matt Blissett

RBG Kew Biodiversity Informatics team

A map + data + tools = links

Two-minute background: what we’ve done, why we should link up our data

What is needed?

- Persistent identifiers

- Tools – to turn “strings” into “things”

What we’ve brought along:

- Map

- Data

- ... Labelled with persistent identifiers

- A rules-based matching / linking tool

specimens.kew.org/herbarium/K000525802

Cited in: Rakotoarinivo M, Dransfield J. 2010. New species of Dypsis and Ravenea (Arecaceae) from Madagascar. Kew Bull. 65, 279–303. doi:10.1007/s12225-010-9210-7

Data linking tool

Rules-based

Armed with a tabular dataset, you:

- Define zero or more transformers for each field

- Define how fields must match

Together, these form a match configuration.
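A minimal sketch of what such a match configuration could look like as a data structure. The slides do not show the tool's real configuration format, so every name here (`FieldRule`, `records_match`, the column names) is hypothetical and the record values are placeholders.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Transformer = Callable[[str], str]
Matcher = Callable[[str, str], bool]


def exact(a: str, b: str) -> bool:
    """Simplest matcher: the transformed values must be identical."""
    return a == b


@dataclass
class FieldRule:
    """Rule for one column: transformers run in order, then the matcher compares."""
    column: str
    matcher: Matcher
    transformers: List[Transformer] = field(default_factory=list)

    def prepare(self, value: str) -> str:
        for transform in self.transformers:
            value = transform(value)
        return value


def records_match(config: List[FieldRule],
                  query: Dict[str, str],
                  candidate: Dict[str, str]) -> bool:
    """A candidate matches only if every configured field matches."""
    return all(
        rule.matcher(rule.prepare(query.get(rule.column, "")),
                     rule.prepare(candidate.get(rule.column, "")))
        for rule in config
    )


# Hypothetical configuration: one column without transformers, one with two.
config = [
    FieldRule("epithet", exact),
    FieldRule("authors", exact, [str.strip, str.lower]),
]

# Placeholder records, not real catalogue data.
query = {"epithet": "mediterranea", "authors": " A.Gray "}
candidate = {"epithet": "mediterranea", "authors": "a.gray"}
print(records_match(config, query, candidate))
```

Keeping the configuration separate from the data is what lets the same rules run against any tabular dataset with the same columns.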

Examples of transformers

Epithet

mediterraneum → mediterranea

NormaliseDiacrits

Déségl. → Desegl.

RemoveBracketedText, RomanNumeral

cix (1892), 57 → 109 57

CleanedPubAuthors

(L.) A.Gray in Hook.f. → A.Gray

SurnameExtracter

(A.Gray) A.Heller → (Gray) Heller

PageExtractor

37(4): 412 (1977) → 412
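The transformers above can be approximated with a few small string functions. This is a sketch that reproduces the slide examples only; the function names and the rules they apply (for instance, treating the page as the first number after a colon) are assumptions, not the tool's actual implementation.

```python
import re
import unicodedata

_ROMAN_VALUES = {"i": 1, "v": 5, "x": 10, "l": 50, "c": 100, "d": 500, "m": 1000}


def epithet(value: str) -> str:
    """Naive gender normalisation (assumption): map -us / -um endings to -a."""
    return re.sub(r"(us|um)$", "a", value)


def normalise_diacritics(value: str) -> str:
    """Decompose accented characters and drop the combining marks."""
    decomposed = unicodedata.normalize("NFKD", value)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def remove_bracketed_text(value: str) -> str:
    """Delete anything in round brackets, then collapse whitespace."""
    return " ".join(re.sub(r"\([^)]*\)", " ", value).split())


def roman_to_int(token: str) -> int:
    values = [_ROMAN_VALUES[ch] for ch in token.lower()]
    total = 0
    for i, v in enumerate(values):
        # Subtractive notation: a smaller value before a larger one is subtracted.
        total += -v if i + 1 < len(values) and v < values[i + 1] else v
    return total


def roman_numeral(value: str) -> str:
    """Keep alphanumeric tokens only, converting Roman-numeral tokens to integers."""
    tokens = re.findall(r"[A-Za-z]+|\d+", value)
    return " ".join(
        str(roman_to_int(t)) if re.fullmatch(r"[ivxlcdm]+", t, re.IGNORECASE) else t
        for t in tokens
    )


def page_extractor(value: str) -> str:
    """Assumption: the page is the first number after a colon; "" if absent."""
    found = re.search(r":\s*(\d+)", value)
    return found.group(1) if found else ""


print(epithet("mediterraneum"))
print(normalise_diacritics("Déségl."))
print(roman_numeral(remove_bracketed_text("cix (1892), 57")))
print(page_extractor("37(4): 412 (1977)"))
```

The Roman-numeral conversion is deliberately naive: any token made up only of the letters i, v, x, l, c, d, m is converted, so it suits volume and page fields rather than free text.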

Examples of matchers

Exact

CommonTokens

CapitalLetters

in Beitr. Aethiop. → B A

Beitr. Fl. Aethiop. → B F A (2 of 3 capital letters shared = 0.67 ratio)

Number

Integer

Levenshtein
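Two of these matchers are easy to sketch. The slides do not say how the CapitalLetters ratio is computed; the version below assumes shared capitals divided by the longer list, which is one reading consistent with the 0.67 in the example. The Levenshtein function is the standard edit-distance algorithm, and all function names are hypothetical.

```python
from collections import Counter
from typing import List


def capital_letters(value: str) -> List[str]:
    """Initial letter of every word that starts with a capital."""
    return [word[0] for word in value.split() if word[0].isupper()]


def capital_letters_ratio(a: str, b: str) -> float:
    """Assumed scoring: shared capitals / length of the longer capitals list."""
    caps_a, caps_b = capital_letters(a), capital_letters(b)
    if not caps_a or not caps_b:
        return 0.0
    shared = sum((Counter(caps_a) & Counter(caps_b)).values())
    return shared / max(len(caps_a), len(caps_b))


def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, 1):
        current = [i]
        for j, ch_b in enumerate(b, 1):
            current.append(min(
                previous[j] + 1,                  # deletion
                current[j - 1] + 1,               # insertion
                previous[j - 1] + (ch_a != ch_b)  # substitution (free if equal)
            ))
        previous = current
    return previous[-1]


print(round(capital_letters_ratio("in Beitr. Aethiop.", "Beitr. Fl. Aethiop."), 2))
print(levenshtein("mediterraneum", "mediterranea"))
```

A ratio or distance only becomes a match decision once a threshold is chosen; that threshold would be part of the match configuration.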

Using the matcher

A configured match can run against any tabular dataset.

Accessible as:

- JSON web service

- Google Refine reconciliation service (work in progress)

Transformers can be dropped into Google Refine
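For the reconciliation route, a client would exchange JSON with the service. The sketch below follows the general shape of the Refine reconciliation protocol (a batch of keyed queries, and per-key `result` lists carrying `id`, `name`, `score` and `match`); the matcher's actual endpoint and fields are not given in the slides, so the helper names and the sample response are placeholders.

```python
import json
from typing import Dict, List, Optional


def build_queries(names: List[str], limit: int = 3) -> str:
    """Serialise a batch of names as keyed reconciliation queries (q0, q1, ...)."""
    return json.dumps(
        {f"q{i}": {"query": name, "limit": limit} for i, name in enumerate(names)}
    )


def best_candidates(response_text: str) -> Dict[str, Optional[str]]:
    """Pick the highest-scoring candidate id per query key, or None if no results."""
    response = json.loads(response_text)
    best: Dict[str, Optional[str]] = {}
    for key, body in response.items():
        results = body.get("result", [])
        best[key] = (
            max(results, key=lambda r: r.get("score", 0))["id"] if results else None
        )
    return best


# Placeholder response; the ids and scores are invented for illustration.
sample_response = json.dumps({
    "q0": {"result": [
        {"id": "placeholder-1", "name": "candidate A", "score": 40, "match": False},
        {"id": "placeholder-2", "name": "candidate B", "score": 90, "match": True},
    ]},
    "q1": {"result": []},
})

print(build_queries(["first name string", "second name string"]))
print(best_candidates(sample_response))
```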

Proposal: link names in floras to IPNI

We’ll set up the tool with IPNI as its backend dataset.

We run lists of taxa treated in floras against it and distribute IPNI IDs for these names.

Short-term gain: navigate via the IPNI ID to the evidence about the name – protologues (Rod has matched 120K to DOIs) and types.

Long-term gain: GSPC target #1 – online world flora. Simpler to integrate data if we’re talking about the same name.
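The name-linking workflow can be sketched as a lookup against an indexed backend. This is an illustration only: the backend is a tiny in-memory list, the IDs and the genus "Examplea" are placeholders rather than real IPNI records, and the helper names are hypothetical.

```python
import re
from typing import Dict, List, Optional, Tuple


def normalise_epithet(value: str) -> str:
    """Naive gender normalisation (assumption): -us / -um endings become -a."""
    return re.sub(r"(us|um)$", "a", value.lower())


def name_key(name: str) -> Tuple[str, str]:
    """Split 'Genus epithet' and normalise both parts for comparison."""
    genus, _, epithet = name.strip().partition(" ")
    return (genus.lower(), normalise_epithet(epithet))


def link_names(flora_names: List[str],
               backend: List[Dict[str, str]]) -> Dict[str, Optional[str]]:
    """Return backend id for each flora name, or None when nothing matches."""
    index = {name_key(record["name"]): record["id"] for record in backend}
    return {name: index.get(name_key(name)) for name in flora_names}


# Placeholder backend and flora list, invented for illustration.
backend = [{"id": "placeholder-id-1", "name": "Examplea mediterranea"}]
flora_names = ["Examplea mediterraneum", "Examplea incognita"]

print(link_names(flora_names, backend))
```

Names that come back with no ID are the interesting residue: they are either missing from the backend or need a looser matcher.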

Proposal – link IPNI to types

We set up the tool with a botanical specimen catalogue as its backend data-source.

We link up the IPNI cited type data with the specimens themselves.

Proposal – link floras to specimens

Floras use herbarium specimens as evidence for their distribution statements.

We set up the tool with a botanical specimen catalogue as its backend data-source.

We extract specimen references from floras and run these against the tool to create links from flora accounts to specimens themselves.

specimens.kew.org/herbarium/K000049118

Cited in: FZ volume:5 part:3 (2003) Rubiaceae by D.M.Bridson & B.Verdcourt

Proposal – link duplicates between herbaria

We set up the tool with a botanical specimen catalogue e.g. K as its backend data-source.

We fire specimen data from another specimen catalogue at it to look for duplicates.

Benefits:

- Geo-referencing

- Imaging

- Data capture efficiency
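A rough sketch of the duplicate search. The slides do not say which fields the match would use; this assumes collector surname plus collector number, a common way of recognising duplicate gatherings. Field names, helper names and all record values are placeholders.

```python
import re
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


def duplicate_key(record: Dict[str, str]) -> Optional[Tuple[str, str]]:
    """Assumed key: last word of the collector string plus digits of the number."""
    words = re.findall(r"[^\W\d_]+", record.get("collector", ""))
    digits = re.sub(r"\D", "", record.get("number", ""))
    if not words or not digits:
        return None  # not enough information to compare safely
    return (words[-1].lower(), digits)


def find_duplicates(backend: List[Dict[str, str]],
                    incoming: List[Dict[str, str]]) -> Dict[str, List[str]]:
    """Map each incoming record id to the backend ids sharing its key."""
    index: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for record in backend:
        key = duplicate_key(record)
        if key is not None:
            index[key].append(record["id"])
    matches = {}
    for record in incoming:
        key = duplicate_key(record)
        if key in index:
            matches[record["id"]] = index[key]
    return matches


# Placeholder records, invented for illustration.
backend = [{"id": "K-placeholder-1", "collector": "J. Collector", "number": "1234"}]
incoming = [
    {"id": "X-placeholder-9", "collector": "Collector", "number": "No. 1234"},
    {"id": "X-placeholder-10", "collector": "Other", "number": "7"},
]

print(find_duplicates(backend, incoming))
```

Once a pair is linked, geo-references, images or captured data held against one record can be reused for the other, which is where the benefits listed above come from.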

n.nicolson@kew.org

@nickynicolson

m.blissett@kew.org
