Phil Cryer, Biodiversity Heritage Library. Scripting Life: the science behind ViBRANT, January 20-21, 2011, Paris, France. Biodiversity Heritage Library: Process & Progress

Biodiversity Heritage Library: progress and process

May 18, 2015

Technology

Phil Cryer

An update on the Biodiversity Heritage Library's efforts and successes in 2010, with a focus on the future as part of the EU project ViBRANT.
Transcript
Page 1: Biodiversity Heritage Library: progress and process

Phil Cryer Biodiversity Heritage Library

Scripting Life: the science behind ViBRANT
January 20-21, 2011 - Paris, France

Biodiversity Heritage Library: Process & Progress

Page 2: Biodiversity Heritage Library: progress and process

Biodiversity Heritage Library (BHL)

• a consortium of global partners

• aims to share historic biodiversity literature texts

• provides open access to all content

• free for all

Page 3: Biodiversity Heritage Library: progress and process

bhl data stats

Page 4: Biodiversity Heritage Library: progress and process

Content

• 45,000 journals & monographs

– 8,821 in 2010

• 87,000 volumes

– 15,552 in 2010

• 32 million pages

– 5.6 million in 2010
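
To put the 2010 figures in proportion, the short sketch below computes what share of each running total was added during 2010. It uses only the numbers quoted on the slide; the variable and function names are illustrative.

```python
# Figures as quoted on the slide: (running total, added in 2010).
content = {
    "journals & monographs": (45_000, 8_821),
    "volumes": (87_000, 15_552),
    "pages": (32_000_000, 5_600_000),
}


def share_added(total, added):
    """Return the percentage of the total that was added in 2010."""
    return round(100 * added / total, 1)


shares = {name: share_added(total, added) for name, (total, added) in content.items()}

for name, pct in shares.items():
    print(f"{name}: {pct}% of the total was added in 2010")
```

On these numbers, a little under a fifth of each total was added in 2010.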

Page 5: Biodiversity Heritage Library: progress and process

Usage (2010)

• 837,000 visits

• 422,000 unique visitors

• 4.2 million page views

• 221 countries/territories

Page 6: Biodiversity Heritage Library: progress and process

new features

Page 7: Biodiversity Heritage Library: progress and process

scanning request form

click on ‘Feedback’ to access

Page 8: Biodiversity Heritage Library: progress and process

new user interface for names index

sortable columns, exportable via CSV, BibTeX and Endnote

Page 9: Biodiversity Heritage Library: progress and process

downloadable article PDFs

create articles from BHL books

Page 10: Biodiversity Heritage Library: progress and process

downloadable article PDFs

1- enter metadata about the article

Page 11: Biodiversity Heritage Library: progress and process

downloadable article PDFs

2- select the pages of the article

Page 12: Biodiversity Heritage Library: progress and process

downloadable article PDFs

3- PDF request received

Page 13: Biodiversity Heritage Library: progress and process

downloadable article PDFs

4- PDF article arrives via email

Page 14: Biodiversity Heritage Library: progress and process

CiteBank (http://citebank.org)

open access repository for biodiversity publications

Page 15: Biodiversity Heritage Library: progress and process

CiteBank (http://citebank.org)

Solr search with faceting
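
As a rough illustration of what a faceted Solr search looks like on the wire, the sketch below builds a query URL using Solr's standard request parameters (`q`, `rows`, `wt`, `facet`, `facet.field`, `facet.mincount`). The base URL, core name and field names are assumptions made for the example; the slides do not describe CiteBank's actual Solr schema.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_facet_query(base_url, text, facet_fields, rows=10):
    """Build a Solr /select URL that returns hits plus facet counts."""
    params = [
        ("q", text),              # the user's search terms
        ("rows", rows),           # number of hits to return
        ("wt", "json"),           # response format
        ("facet", "true"),        # switch faceting on
        ("facet.mincount", 1),    # hide facet values with no hits
    ]
    # One facet.field parameter per field to count on.
    params += [("facet.field", field) for field in facet_fields]
    return f"{base_url}/select?{urlencode(params)}"


# Hypothetical host, core and field names, for illustration only.
url = build_facet_query(
    "http://localhost:8983/solr/citebank", "Formicidae", ["author", "year"]
)
query = parse_qs(urlsplit(url).query)
print(url)
```

Each `facet.field` in the response carries a list of values with hit counts, which is what a search page would render as clickable filters.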

Page 16: Biodiversity Heritage Library: progress and process

CiteBank (http://citebank.org)

individual bibliography page

Page 17: Biodiversity Heritage Library: progress and process

CiteBank features

• access the ‘crowd-sourced’ articles generated from the BHL scans (harvested from BHL)

• platform for journals/publishers/societies in need of tools to store and share content

• harvests metadata from ZooKeys, SciELO, Smithsonian collections nightly via OAI-PMH

• new search index to BHL content using Solr
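
The nightly OAI-PMH harvest mentioned above can be sketched as follows: build a `ListRecords` request for Dublin Core metadata, then pull identifiers and titles out of the XML response. This is a minimal sketch, not CiteBank's harvester; the endpoint URL and the sample response are made up for the example, while the verb, parameters and XML namespaces are those of the OAI-PMH 2.0 protocol.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, from_date=None):
    """Build a ListRecords request; from_date limits it to recent changes."""
    params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    if from_date:
        params["from"] = from_date
    return f"{base_url}?{urlencode(params)}"


def parse_records(xml_text):
    """Return ([(identifier, title), ...], resumption_token_or_None)."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        identifier = rec.findtext("oai:header/oai:identifier", namespaces=NS)
        title = rec.findtext(".//dc:title", namespaces=NS)
        records.append((identifier, title))
    token = root.findtext(".//oai:resumptionToken", namespaces=NS)
    return records, token or None


# Hypothetical response, trimmed to the elements the parser reads.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>A hypothetical article title</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

request_url = list_records_url("http://example.org/oai", from_date="2011-01-19")
records, token = parse_records(SAMPLE)
print(request_url)
print(records, token)
```

A real harvester would fetch `request_url`, then keep requesting with the returned resumption token until none is returned.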

Page 18: Biodiversity Heritage Library: progress and process

CiteBank + BHL expands our core features

• content and tools for scholarly crowd-sourcing

– Users can get content they need, do minor work, share enhancements with community

• look to add more content integration with other existing platforms

– EOL, Atlas of Living Australia, JSTOR Plant Science, BioStor and others

– Mendeley, Zotero, RefWorks, etc

Page 19: Biodiversity Heritage Library: progress and process

More BHL features coming soon...

• enhancements to the portal home page

– more focus on search

• special collections

– Charles Darwin’s scientific library

• scholarly annotations

– annotations in Darwin’s hand and academic interpretation, crosslinking

Page 20: Biodiversity Heritage Library: progress and process

bhl global

Page 21: Biodiversity Heritage Library: progress and process
Page 22: Biodiversity Heritage Library: progress and process
Page 23: Biodiversity Heritage Library: progress and process
Page 24: Biodiversity Heritage Library: progress and process

Benefits of Global BHL partnerships

• redundancy and resilience

– data and app mirroring

• exposing unique content

• new tools, services, people

• opportunities for new collaborations

– IMPACT, ViBRANT, OpenUp! in EU

Page 25: Biodiversity Heritage Library: progress and process

storage clusters

Page 26: Biodiversity Heritage Library: progress and process

Storage issues solved using clusters

• all BHL data stored at the Internet Archive in San Francisco

– no redundancy

– limited in how we could serve our data and images

– difficult to analyze data

• First global BHL cluster gives us

– redundancy and failover

– many new serving options

– new ways to run analytics, data mining

Page 27: Biodiversity Heritage Library: progress and process
Page 28: Biodiversity Heritage Library: progress and process

Open source software / commodity hardware

• open source software

– Linux operating system

– Gluster distributed storage system

• commodity hardware

– Supermicro servers

– ‘off the shelf’ hard drives and other system components

Page 29: Biodiversity Heritage Library: progress and process

BHL Cluster 01

• BHL Cluster 01

– six 4U sized cabinets

– twenty-four 1.5TB hard drives in each cabinet

– 97TB of replicated and distributed storage (over 200TB of raw disk)
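
The capacity figures on this slide can be checked with a little arithmetic. The sketch below assumes two-way replication (every file stored twice), which the slide does not state explicitly but which fits the numbers, and it converts marketed decimal terabytes into the binary units that operating systems usually report.

```python
CABINETS = 6
DRIVES_PER_CABINET = 24
DRIVE_TB = 1.5      # decimal terabytes per drive, as on the slide
REPLICAS = 2        # assumption: each file is kept on two drives

# Raw capacity across all drives, in decimal TB.
raw_tb = CABINETS * DRIVES_PER_CABINET * DRIVE_TB

# Capacity left once every file is stored REPLICAS times.
replicated_tb = raw_tb / REPLICAS

# The same figure in binary tebibytes (1 TiB = 2**40 bytes).
replicated_tib = replicated_tb * 10**12 / 2**40

print(f"raw: {raw_tb:.0f} TB")
print(f"replicated: {replicated_tb:.0f} TB = {replicated_tib:.1f} TiB")
```

Under those assumptions the raw total is 216 TB ("over 200TB of raw disk") and the replicated capacity is about 98 TiB, close to the 97TB quoted; the remainder is plausibly filesystem overhead.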

Page 30: Biodiversity Heritage Library: progress and process
Page 31: Biodiversity Heritage Library: progress and process

Statistical computing

• find relationships

– R (GNU statistical language)

– Hadoop, Disco

• make existing data more useful

– image and OCR reprocessing, taxonfinder
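
Hadoop and Disco both follow the map/reduce model: a map step runs independently on each chunk of data, and a reduce step merges the partial results. The toy sketch below applies that shape to OCR text, counting candidate two-word names per page and then summing the counts. The regular expression is a deliberately crude stand-in and is not how taxonfinder works; the sample pages are invented.

```python
import re
from collections import Counter
from functools import reduce

# Crude stand-in for name finding: a capitalised word followed by a
# lowercase word of three or more letters. It will also match ordinary
# phrases, so it is only an illustration.
BINOMIAL = re.compile(r"\b([A-Z][a-z]+) ([a-z]{3,})\b")


def map_page(ocr_text):
    """Map step: count candidate names on one page of OCR text."""
    return Counter(" ".join(match) for match in BINOMIAL.findall(ocr_text))


def reduce_counts(left, right):
    """Reduce step: merge two partial counts."""
    return left + right


# Invented sample pages.
pages = [
    "Specimens of Apis mellifera were collected near the river.",
    "Apis mellifera and Bombus terrestris visit the same flowers.",
]

totals = reduce(reduce_counts, map(map_page, pages), Counter())
print(totals.most_common())
```

Because each page is mapped independently, the map step can be spread across the nodes of a cluster, which is the point of running this kind of job on the storage cluster itself.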

Page 32: Biodiversity Heritage Library: progress and process

data sharing

Page 33: Biodiversity Heritage Library: progress and process

Data sharing and replication

• replicating BHL data globally

– Marine Biological Laboratory (Woods Hole, US)

– Natural History Museum (London, UK)

– Bibliotheca Alexandrina (Alexandria, EG)

– Atlas of Living Australia (Canberra, AU)

– China... Brazil...

• advantages to replication

– redundancy, failover

– load balancing

– geographical distribution

Page 34: Biodiversity Heritage Library: progress and process

Software for data sync

• grabby

– handles initial download from Internet Archive (IA)

• bhl-sync

– open source Dropbox model

– handles syncing remote nodes automatically

– uses inotify, lsyncd, OpenSSH, rsync, unison

– remote server only requires a secure login

Open source code available at http://bit.ly/bhl-bits
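
To illustrate why a remote node "only requires a secure login", the sketch below assembles the kind of rsync-over-SSH command that a tool such as lsyncd issues when inotify reports a change. It is not taken from bhl-sync; the paths, user and host are hypothetical, and the function only builds the command (with `--dry-run` on by default) rather than running it.

```python
import shlex


def build_rsync_command(local_dir, remote_user, remote_host, remote_dir, dry_run=True):
    """Build an rsync command that mirrors local_dir to a remote node over SSH."""
    cmd = [
        "rsync",
        "--archive",    # preserve permissions, times and directory structure
        "--compress",   # compress data in transit
        "--delete",     # remove remote files that no longer exist locally
        "-e", "ssh",    # tunnel the transfer through SSH
    ]
    if dry_run:
        cmd.append("--dry-run")  # report what would change, transfer nothing
    # A trailing slash on the source copies its contents, not the directory itself.
    cmd += [local_dir.rstrip("/") + "/", f"{remote_user}@{remote_host}:{remote_dir}"]
    return cmd


# Hypothetical paths and host, for illustration only.
cmd = build_rsync_command("/data/bhl", "sync", "mirror.example.org", "/data/bhl")
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run` once the dry run output looks right; nothing beyond an SSH account and rsync is needed on the receiving side.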

Page 35: Biodiversity Heritage Library: progress and process

Fedora Commons integration

• digital repository platform

– enables storage and management of digital content

– maintains a persistent digital archive

– stores data in a neutral manner

– provides backup, redundancy, disaster recovery

• shares data to remote nodes via OAI-PMH

Page 36: Biodiversity Heritage Library: progress and process

future plans

Page 37: Biodiversity Heritage Library: progress and process

Assigning DOIs (Digital Object Identifiers)

• BHL is a member of CrossRef through the Smithsonian

• will start assigning DOIs to BHL monographs

– easy, non-controversial; provides open access of all content

• then move on to articles and other publication types

– CrossRef rules make full assignment challenging for crowd-sourced articles

Page 38: Biodiversity Heritage Library: progress and process

Wish list for 2011 and beyond

• OCR correction

– a big problem, no easy solution

• add more content

– partnerships, CiteBank

• sustainability planning and funding

– committed to no fees for users

• more outreach

– conferences, marketing

– Facebook, Twitter and other social media avenues...

Page 39: Biodiversity Heritage Library: progress and process

BHL is social!

http://biodiversitylibrary.blogspot.com

http://twitter.com/BioDivLibrary #bhlib

http://facebook.com/pages/Biodiversity-Heritage-Library/63547246565

http://flickr.com/groups/bhl

http://youtube.com/user/BioHeritageLibrary

http://biodiversitylibrary.org/RecentRss.aspx

http://slidesha.re/bhl-slides

Page 40: Biodiversity Heritage Library: progress and process

Thanks.

slides: slidesha.re/bhl-slides
contact: [email protected]

Phil Cryer: Biodiversity Heritage Library

Scripting Life: the science behind ViBRANT
January 20-21, 2011 - Paris, France