Biodiversity Heritage Library © 2008 Biodiversity Heritage Library www.biodiversitylibrary.org The Encyclopedia of Life, Biodiversity Heritage Library, Biodiversity Informatics. MBLWHOI Library. Cathy Norton, Deputy Director, BHL. Massachusetts Library Association, May 7, 2008
Transcript
Page 1: Mla May 7

Biodiversity Heritage Library

© 2008 Biodiversity Heritage Library www.biodiversitylibrary.org

The Encyclopedia of Life, Biodiversity Heritage Library,

Biodiversity Informatics

MBLWHOI Library

Cathy Norton

Deputy Director, BHL

Massachusetts Library Association

May 7, 2008

Page 2: Mla May 7

“The launch of the Encyclopedia of Life will have a profound and creative effect in science… this effort will lay out new directions for research in every branch of biology.”

– E.O. Wilson

Page 3: Mla May 7

[Word cloud of EOL themes and partners] Collaborative Tree of Life distributed semantic Biodiversity Heritage Library ever evolving TED all information Synthesis Center Oh wow! SpeciesBase ClassificationBank Education and Outreach ANTS index MacArthur Foundation taxonomic intelligence modular software communal ownership user defined AvenueA | Razorfish OBIS MBL free visualization images WorkBench sounds phylogeny web 2.0 names-based infrastructure Atlas of Living Australia February 2008 Google Marine Biological Laboratory all species Smithsonian FISHBASE Harvard Field Museum Tree of Life E. O. Wilson aggregation / mashup EDIT ScratchPad widgets MOBOT NHM AMNH NY Botanical Sloan Foundation GBIF NameBank videos National Geographic any classification TDWG/BIS

Page 4: Mla May 7

Encyclopedia of Life

• Major project to create a single Web page for every known species (1.8 million!)

• Total funding will reach at least $50M

• EOL needs the literature underpinning in the BHL project

• BHL now key partner in EOL project

• Launched on 9th May, 2007

– First 30,000 pages launched at TED Feb 27th, 2008

Page 5: Mla May 7

Serine Molecule

Biodiversity Heritage Library

Synthesis Center: Field Museum

Informatics: Marine Biological Laboratory & MOBOT

Education & Outreach: Smithsonian/Harvard

Secretariat: Smithsonian

Page 6: Mla May 7

This library serves all of the scientific institutions in Woods Hole and other scientific groups in the area.

The Library is facing a new dynamic phase

Page 7: Mla May 7

Page 8: Mla May 7

Page 9: Mla May 7

Mission: Provide Open Access to Biodiversity Literature

Goals:

• Digitize the core published literature on biodiversity and put it on the Web

• Agree on approaches with the global taxonomic community, rights holders and others

Page 10: Mla May 7

Page 11: Mla May 7

How big is the Biodiversity domain?

• Over 5.4 million books dating back to 1469

• 800,000 monographs

• 40,000 journal titles (12,500 current)

• 50% pre-1923

Page 12: Mla May 7

Why now?

• Cost low – 10-19 cents a page

• Other projects funded recently – BL/Microsoft, Google/Big Ten

• Tractable, well-defined scientific domain

• Taxonomic information has exceptional longevity

• Supports GBIF and other international initiatives – including CBD, ABS, Darwin Declaration

Page 13: Mla May 7

Benefits

• Taxonomists and other scientists will have access to biodiversity literature - globally

• Will provide the developing world with access to the historical literature

• Scientists working in many biological domains – and other areas like meteorology, geology, ecology, genomics, etc – will get access

• Advance objectives of the Convention on Biological Diversity

Page 14: Mla May 7

Benefits to the MBLWHOI Library

• Less space needed for Library collections in Lillie – space freed for other uses

• % material can be stored off-site in ‘dark storage’. FTP

• Our scientists will get access at their desk or in the field

• Library focus will shift to informatics

• Virtual web library will increase public access

• Library staff will change –

Page 15: Mla May 7

Where are we now?

• Key partner of Encyclopedia of Life

• Working Groups have agreed technical plan, metadata standards and image standards

• Internet Archive to be technical partner – scanning and hosting

• ‘Scribe’ scanners now installed in NHM NYC and in Boston

• 4.1 million pages already available

Page 16: Mla May 7

Classes of texts

Public Domain – pre-1923

Non-profit society journals

Post-1923 monographs

– some with copyright renewals

– some without copyright renewals

Commercial journals

Page 17: Mla May 7

BHL Seeks Permissions

BHL will digitize learned society backfiles and mount them through the BHL Portal at no cost.

Will provide a set of files to the learned society for reuse as they see fit.

Will index the issues using Taxonomic Intelligence, increasing their usability.

Page 18: Mla May 7

Benefits

Use of the articles will increase as evidenced by citation upsurge.

Long-term management of the digital assets is provided by the BHL at no cost to its contributors.

Content will be integrated into EOL project through TI nomenclatural linking.

Page 19: Mla May 7

Levinus Vincent

Elenchus tabularum, pinacothecarum, 1719

The cited half-life of publications in taxonomy is longer than in any other scientific discipline. The decay rate is slower than in most scientific disciplines.

– Macro-economic case for open access, Tom Moritz

Current taxonomic literature often relies on texts and specimens more than 100 years old.

Page 20: Mla May 7

Georges Louis Leclerc, comte de Buffon, Histoire naturelle : générale et particulière (Oiseaux), 1799-1808

Convention on Biological Diversity: Article 17

The Long NOW Strategy

Institutions that are creating the BHL exist to persist through time.

– The future is uncertain, the technology landscape changes, people pass on. So create consortial structures that are low-overhead, flexible, and can respond quickly.

– Interoperability is the key. Repository islands will sink.

Page 21: Mla May 7

Biologia Centrali-Americana

[Bar chart: Physical Distribution… by region (US & Canada, Europe, Mexico & C. America, South America); vertical axis 0 to 8]

Now… you can parse dates and harvest out data. The wealth of information locked on the pages is now liberated!
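As a rough illustration of what parsing dates out of scanned pages can look like, the sketch below pulls four-digit years out of page text and tallies them by decade. It assumes the page text is already available as plain strings (for example from OCR); the function names and the 1400 to 2099 year range are choices made for this example and are not taken from the BHL tools.

```python
import re
from collections import Counter

# Four-digit years from 1400 to 2099, matched as whole tokens.
# The range is an assumption for this sketch, not a BHL rule.
YEAR_PATTERN = re.compile(r"\b(1[4-9]\d{2}|20\d{2})\b")


def harvest_years(page_text: str) -> list[int]:
    """Return every plausible year mentioned in one page of text."""
    return [int(year) for year in YEAR_PATTERN.findall(page_text)]


def years_by_decade(pages: list[str]) -> Counter:
    """Count harvested years per decade across a list of pages."""
    counts: Counter = Counter()
    for text in pages:
        for year in harvest_years(text):
            counts[year - year % 10] += 1
    return counts
```

A call such as `years_by_decade(pages)` gives a quick picture of when the material in a volume was collected or described. Real OCR text would also need handling for misread digits, which this sketch ignores.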

Page 22: Mla May 7

Henry Walter Bates, The Naturalist on the River Amazons, 1863

Most literature is in the developed world: the Northern Hemisphere

Most biodiversity is in the developing world: the Southern Hemisphere

Page 23: Mla May 7

Progne subis – Purple Martin, Illustrations of the nest and eggs of birds of Ohio, 1879-1886

Library and Laboratory: the Marriage of Research, Data and Taxonomic Literature, London, February 2005

Eighty participants from 22 countries gathered to discuss the status and future of access to the taxonomic literature and to propose an agenda for actions that would improve the research environment for taxonomy. The participants were taxonomists; librarians; publishers; representatives of learned and professional societies, private foundations and government agencies; and specialists in information and communications technology.

Scalable Mass Scanning

• Contracts

• Firewalls

• Security

• Loading Docks

• Trucks

• 180 mile round trip!

Page 24: Mla May 7

Internet Archive Scribe: Boston

Page 25: Mla May 7

Ernest Ingersoll, Hand-book to the National Museum … Smithsonian Institution, 1886

Mass Scanning Workflow

• Bid Lists

• Pick Lists

• Packing Lists

• Serials Management

• Monographic Management

• Stickers for Media and carts

• Rare Books-vaults

Page 26: Mla May 7

Jacob Christian Schäffer, Elementa entomologica . . . 1766.

BHL Portal: http://www.biodiversitylibrary.org

Serve image and text files; create volume, part, and piece metadata; ingest page-level metadata at scanning level; apply Globally Unique Identifiers (GUIDs) for linking to other taxonomic services.
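To make the portal's data model a little more concrete, here is a minimal sketch of volume-level and page-level records that each carry a GUID. The class names, field names and file names are hypothetical, and the use of random UUIDs (version 4) is an assumption; the slides do not say which GUID scheme the BHL Portal uses.

```python
import uuid
from dataclasses import dataclass, field


def new_guid() -> str:
    """Generate a random GUID (UUID version 4 is an assumption here)."""
    return str(uuid.uuid4())


@dataclass
class Page:
    sequence: int        # position of the page within the scanned volume
    image_file: str      # scanned page image
    text_file: str       # OCR text for the same page
    guid: str = field(default_factory=new_guid)


@dataclass
class Volume:
    title: str
    volume: str
    pages: list[Page] = field(default_factory=list)
    guid: str = field(default_factory=new_guid)

    def add_page(self, image_file: str, text_file: str) -> Page:
        """Ingest one page at scanning time and assign it its own GUID."""
        page = Page(len(self.pages) + 1, image_file, text_file)
        self.pages.append(page)
        return page
```

Because every page has its own identifier, an external taxonomic service could link to a single page rather than to the whole volume.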

Page 27: Mla May 7

Biodiversity Heritage Library

Page 28: Mla May 7

Biodiversity Informatics

Page 29: Mla May 7

“All accumulated information of a species is tied to a scientific name, a name that serves as a link between what has been learned in the past and what we today add to the body of knowledge.”

~ Grimaldi & Engel, 2005, Evolution of the Insects

Page 30: Mla May 7

Who knoweth not the name, knoweth not the subject

Linnaeus, 1737, Critica Botanica n 210.

Page 31: Mla May 7

• Information about named groups (taxa) of organisms (taxon-related information)

• Extends back at least 1000 years

• Books, journals, surveys

• Museum specimens, herbaria

• In many languages and is distributed

From T.E. Glover, The Fishes of Southwestern Japan, c.1870

Page 32: Mla May 7

The challenge for contemporary DIGITAL libraries

Goal:

Use one name to find the content for all names related to “that” species.
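The goal above can be sketched as a lookup over groups of names that refer to the same species. The group contents, document identifiers and function names below are illustrative assumptions for this example; they are not data or an API from uBio or the BHL.

```python
# Each group holds names treated as referring to the same species.
# Illustrative data only, not taken from a real names database.
RECONCILIATION_GROUPS = [
    {"Progne subis", "Hirundo subis", "Purple Martin"},
]


def build_name_index(groups):
    """Map every name (lowercased) to the full group it belongs to."""
    index = {}
    for group in groups:
        frozen = frozenset(group)
        for name in group:
            index[name.lower()] = frozen
    return index


def expand_name(name, index):
    """Return all names related to the query, or just the query if unknown."""
    return set(index.get(name.strip().lower(), {name}))


def find_content(name, index, documents):
    """documents maps a document id to the set of names found in it."""
    wanted = {n.lower() for n in expand_name(name, index)}
    return sorted(
        doc_id
        for doc_id, names in documents.items()
        if wanted & {n.lower() for n in names}
    )
```

A search for "Purple Martin" would then also return documents that only mention "Hirundo subis", which is the behaviour the slide asks for.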

Page 33: Mla May 7

Names – the only universal metadata for Biology

Names offer a logical way to search for and index content

• Names annotate data objects

• All names annotate all data objects

•A compilation of all names ever used is the foundation of a universal index for biology or for a semantic web for biology

Page 34: Mla May 7

Who is affected by these problems?

• Libraries

• Publishers

• Museums

• Federal Agencies

Page 35: Mla May 7

Serious challenges in federated environments

One organism

4 scientific names

4 maps

We want one map

Page 36: Mla May 7

Reuse, don’t rebuild

Page 37: Mla May 7

Page 38: Mla May 7

Taxonomic intelligence is the inclusion of taxonomic practices, skills and knowledge within informatics services to manage information about organisms

• All names & all classifications (ClassificationBank)

• Alternative names reconciled

• Similar names disambiguated

• Exploit hierarchies to browse and search, build a comprehensive classification

• Improve performance with federated systems

• Read documents, web sites and databases, and taxonomically index the content

• Create a unified portal to information about organisms on the internet

Page 39: Mla May 7

Specimen distribution data from remote sources

• data from various sources may be merged

• red dots on the map link back to the website that provided the geographical co-ordinates
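One way to picture merging distribution data from several providers onto a single map is sketched below. Each point keeps the address of the site that supplied it, so a dot on the map can link back to its source. The record fields, the rounding used to spot duplicates and the example.org addresses in the usage note are assumptions made for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Occurrence:
    name: str          # scientific name as given by the provider
    latitude: float
    longitude: float
    source_url: str    # website that provided the co-ordinates


def merge_occurrences(sources, accepted_names):
    """Merge records from several providers into one list of map points.

    sources: iterable of lists of Occurrence, one list per provider.
    accepted_names: lowercased names that all refer to the same organism.
    """
    merged = []
    seen = set()
    for records in sources:
        for rec in records:
            if rec.name.lower() not in accepted_names:
                continue  # a different organism
            key = (round(rec.latitude, 4), round(rec.longitude, 4), rec.source_url)
            if key in seen:
                continue  # same point already supplied by the same provider
            seen.add(key)
            merged.append(rec)
    return merged
```

With `accepted_names` covering every name used for one organism, records filed under different names by, say, `https://example.org/museum-a` and `https://example.org/herbarium-b` end up on one map instead of several.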

Page 40: Mla May 7

uBio Programmers

BHL Taxonomic Intelligence Tool

Georges Louis Leclerc, comte de Buffon, Histoire naturelle : générale et particulière (Oiseaux), 1799-1808

Page 41: Mla May 7

uBio

• 10.7 Million+ Name Strings

• Reconciliation Groups

• http://www.ubio.org

Page 42: Mla May 7

Training and Improving the Algorithm

Page 43: Mla May 7

uBioRSS Taxonomically Intelligent RSS Feed Aggregator

Page 44: Mla May 7

uBioRSS Taxonomically Intelligent RSS Feed Aggregator

Page 45: Mla May 7

MBL WHOI Library – Woods Hole authors’ publications

Page 46: Mla May 7

MBL WHOI Library – Woods Hole species publications

Page 47: Mla May 7

Taxonomic intelligence works miracles!

• It will benefit any initiative that uses distributed and heterogeneous information about biology

• Distributed content on the same species can be drawn together because different names will be standardized through reconciliation

• We can read documents, find names, catalog and taxonomically index documents

• Produce a framework around which we can organize and assemble remote and local content
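Reading documents, finding names and indexing by them can be sketched in a few lines. The version below is deliberately naive: it treats any capitalised word followed by a lowercase word as a candidate binomial and keeps only candidates that appear in a supplied list of known names. The pattern, the function names and the sample data are assumptions for illustration; the slides do not describe how the uBio name-finding algorithm works.

```python
import re
from collections import defaultdict

# Candidate Latin binomials: a capitalised genus followed by a lowercase epithet.
BINOMIAL = re.compile(r"\b([A-Z][a-z]+ [a-z]{2,})\b")


def find_names(text, known_names):
    """Return candidate binomials from the text that are in the known-name list."""
    return [cand for cand in BINOMIAL.findall(text) if cand in known_names]


def index_documents(documents, known_names):
    """Build a name -> set of document ids index (a small taxonomic index)."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for name in find_names(text, known_names):
            index[name].add(doc_id)
    return dict(index)
```

Checking candidates against a names list is what keeps ordinary phrases such as "The nest" out of the index. A production system would also need to cope with abbreviated genera, OCR errors and names split across lines.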

Page 48: Mla May 7

Taxonomically intelligent scientific text parsing

Page 49: Mla May 7

Page 50: Mla May 7

Page 51: Mla May 7

• Search

• Browse

Page 52: Mla May 7

“It is exciting to anticipate the scientific chords we might hear once 1.8 million notes are brought together through this instrument. Potential EOL users are professional and citizen scientists, teachers, students, media, environmental managers, families and artists. The site will link the public and scientific community in a collaborative way that’s without precedent in scale.”

– Jim Edwards, Executive Director, EOL

Page 53: Mla May 7

Page 54: Mla May 7

Acknowledgments

Catherine Norton

Patrick Leary

David Remsen

Diane Rielinger

David Patterson

Neil Sarkar

A.W. Mellon Foundation

Alfred P. Sloan Foundation

John D. & Catherine T. MacArthur Foundation

Internet Archive

Christopher Freeland

Tom Garnett

Martin Kalfatovic

Graham Higley

BHL & EOL Teams

Page 55: Mla May 7

www.biodiversitylibrary.org

www.eol.org

www.ubio.org