The Natural History Open Data Challenge @ OTA16

Apr 12, 2017

Margaret Gold
Transcript
Page 1: The Natural History Open Data Challenge @ OTA16
Page 2: The Natural History Open Data Challenge @ OTA16

Diverse collections spanning space and time

Challenge of scale: >80 million specimens!

Challenge of speed (digitising within a lifetime)

Ambitious digitisation programme (DCP)

Institutional policy “open by default”

Page 3: The Natural History Open Data Challenge @ OTA16
Page 4: The Natural History Open Data Challenge @ OTA16

Higher Classification
Scientific name: Thymelicus lineola (Ochsenheimer, 1808)
Family: Hesperiidae

Location
Locality: Tilbury Docks
State/province: England
Country: United Kingdom
Continent: Europe
Decimal latitude: 51.4605
Decimal longitude: 0.3449

Collection Event
Recorded by: T G. Howarth; Howarth
Collection date: 31 / 07 / 1938

Most iCollections specimens will have ~30 fields containing data (over 100 different fields across all collections)

There are some issues… (where is H. M. Edelsten!?)
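The example record above could be modelled along these lines. This is only a sketch: the `Specimen` class, its attribute names, and the two helper functions are hypothetical and not part of the NHM data portal; the values are copied from the slide. It shows one way to handle the kind of issue hinted at here, such as a "Recorded by" value that holds several name variants.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Specimen:
    # Hypothetical attribute names; the real dataset has its own field names.
    scientific_name: str
    family: str
    locality: str
    country: str
    latitude: float
    longitude: float
    recorded_by: List[str]
    collected_on: Optional[date]


def parse_collection_date(text: str) -> Optional[date]:
    """Parse 'DD / MM / YYYY' as shown on the slide; return None if it does not fit."""
    parts = [p.strip() for p in text.split("/")]
    if len(parts) != 3:
        return None
    try:
        day, month, year = (int(p) for p in parts)
        return date(year, month, day)
    except ValueError:
        # Non-numeric parts or an impossible date such as month 13
        return None


def split_collectors(text: str) -> List[str]:
    """Split a 'Recorded by' value on ';' and collapse repeated whitespace."""
    return [" ".join(name.split()) for name in text.split(";") if name.strip()]


example = Specimen(
    scientific_name="Thymelicus lineola (Ochsenheimer, 1808)",
    family="Hesperiidae",
    locality="Tilbury Docks",
    country="United Kingdom",
    latitude=51.4605,
    longitude=0.3449,
    recorded_by=split_collectors("T G. Howarth; Howarth"),
    collected_on=parse_collection_date("31 / 07 / 1938"),
)
```

Splitting the collector string keeps both variants ("T G. Howarth" and "Howarth") visible, so deciding whether they refer to the same person remains a separate, explicit step.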

Page 5: The Natural History Open Data Challenge @ OTA16

http://data.nhm.ac.uk

Page 6: The Natural History Open Data Challenge @ OTA16

Complete NHM Specimen Dataset (3.3M records)

http://bit.ly/2goEpBB

GitHub Gist – NHM API:

http://bit.ly/2gtukRv

iCollections Datasets

http://bit.ly/2gGZub5

Even more data…

http://www.gbif.org/occurrence
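As a rough sketch of how the data portal might be queried programmatically: the snippet below assumes the portal exposes the standard CKAN DataStore action `datastore_search`. The resource ID is a placeholder, and the `family` filter field is an assumption; the GitHub Gist linked above is the place to check the actual endpoint and field names. The code only builds the request URL and does not send it.

```python
import json
from urllib.parse import urlencode

# Assumed CKAN-style endpoint on the portal; check the Gist for the real one.
BASE_URL = "http://data.nhm.ac.uk/api/3/action/datastore_search"


def build_search_url(resource_id, filters=None, limit=100, offset=0):
    """Build a datastore_search URL; `filters` is a dict of field -> value."""
    params = {"resource_id": resource_id, "limit": limit, "offset": offset}
    if filters:
        # CKAN expects the filters dictionary as a JSON-encoded string
        params["filters"] = json.dumps(filters, sort_keys=True)
    return BASE_URL + "?" + urlencode(params)


# "RESOURCE-ID-PLACEHOLDER" is not a real ID; "family" is an assumed field name.
url = build_search_url(
    "RESOURCE-ID-PLACEHOLDER",
    filters={"family": "Hesperiidae"},
    limit=5,
)
print(url)
```

Paging through a large dataset would then be a matter of increasing `offset` by `limit` on each request.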

Page 7: The Natural History Open Data Challenge @ OTA16

Potential Challenges

How did collecting effort change over time?

Which collector collected from the most distinct localities? Can we make a ranking table and mash up the data with Wikipedia or other sources?

What can we learn about the collectors – who travelled the furthest or most regularly?

Were most specimens collected in rural areas? Is there collection bias in particular counties?

How can we make the data more attractive to different audiences?

How could we display the data in more engaging or informative ways?
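Several of the questions above reduce to simple aggregations. The sketch below uses a handful of made-up records (the collector names, sites, and years are hypothetical, as are the keys `recordedBy`, `locality`, and `year`) to show a ranking by distinct localities, a count of records per decade as a crude proxy for collecting effort, and a great-circle distance helper that could feed a "who travelled the furthest" comparison.

```python
import math
from collections import Counter, defaultdict

# Hypothetical records for illustration only; not taken from the NHM dataset.
records = [
    {"recordedBy": "Collector A", "locality": "Site 1", "year": 1938},
    {"recordedBy": "Collector A", "locality": "Site 2", "year": 1939},
    {"recordedBy": "Collector A", "locality": "Site 1", "year": 1941},
    {"recordedBy": "Collector B", "locality": "Site 3", "year": 1952},
]


def rank_by_distinct_localities(rows):
    """Return (collector, number of distinct localities), highest first."""
    localities = defaultdict(set)
    for row in rows:
        localities[row["recordedBy"]].add(row["locality"])
    return sorted(
        ((name, len(sites)) for name, sites in localities.items()),
        key=lambda item: (-item[1], item[0]),
    )


def records_per_decade(rows):
    """Count records per decade (1938 -> 1930)."""
    return Counter((row["year"] // 10) * 10 for row in rows)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km, assuming a spherical Earth of radius 6371 km."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


ranking = rank_by_distinct_localities(records)
per_decade = records_per_decade(records)
```

On real data the collector names would need cleaning first, since variants of the same name would otherwise be counted as different people.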
