Page 1
Biodiversity Heritage Library
© 2008 Biodiversity Heritage Library www.biodiversitylibrary.org
The Encyclopedia of Life, Biodiversity Heritage Library, Biodiversity Informatics
MBLWHOI Library
Cathy Norton
Deputy Director, BHL
Massachusetts Library Association
May 7, 2008
Page 2
“The launch of the Encyclopedia of Life will have a profound and creative effect in science… this effort will lay out new directions for research in every branch of biology.”
– E.O. Wilson
Page 3
Collaborative Tree of Life distributed semantic
Biodiversity Heritage Library ever evolving TED all information Synthesis Center Oh wow! SpeciesBase ClassificationBank Education and Outreach ANTS index MacArthur Foundation taxonomic intelligence modular software communal ownership user defined AvenueA | Razorfish OBIS MBL free
visualization images WorkBench sounds phylogeny web 2.0 names-based infrastructure Atlas of Living Australia February 2008 Google Marine Biological Laboratory all species Smithsonian FISHBASE Harvard Field Museum Tree of Life E. O. Wilson aggregation / mashup EDIT ScratchPad widgets
MOBOT NHM AMNH NYBotanical Sloan Foundation GBIF llison l NameBank videos National Geographic any classification TDWG/BIS
Page 4
Encyclopedia of Life
• Major project to create a single Web page for every known species (1.8 million!)
• Total funding will reach at least $50M
• EOL needs the literature underpinning in the BHL project
• BHL now key partner in EOL project
• Launched on 9th May, 2007
– First 30,000 pages launched at TED Feb 27th, 2008
Page 5
Serine Molecule
Biodiversity Heritage Library
Synthesis Center: Field Museum
Informatics: Marine Biological Laboratory & MOBOT
Education & Outreach: Smithsonian/Harvard
Secretariat: Smithsonian
Page 6
This library serves all of the scientific institutions in Woods Hole and other scientific groups in the area.
The Library is facing a new dynamic phase.
Page 7
Page 8
Page 9
Mission: Provide Open Access to Biodiversity Literature
Goals:
• Digitize the core published literature on biodiversity and put it on the Web
• Agree on approaches with the global taxonomic community, rights holders and others
Page 10
Page 11
How big is the Biodiversity domain?
• Over 5.4 million books dating back to 1469
• 800,000 monographs
• 40,000 journal titles (12,500 current)
• 50% pre-1923
Page 12
Why now?
• Cost low – 10-19 cents a page
• Other projects funded recently – BL/Microsoft/Google big ten
• Tractable, well-defined scientific domain
• Taxonomic information has exceptional longevity
• Supports GBIF and other international initiatives – including CBD, ABS, Darwin Declaration
Page 13
Benefits
• Taxonomists and other scientists will have access to biodiversity literature – globally
• Will provide the developing world with access to the historical literature
• Scientists working in many biological domains – and other areas like meteorology, geology, ecology, genomics, etc. – will get access
• Advance objectives of the Convention on Biological Diversity
Page 14
Benefits to the MBLWHOI Library
• Less space needed for Library collections in Lillie – space freed for other uses
• % material can be stored off-site in ‘dark storage’. FTP
• Our scientists will get access at their desk or in the field
• Library focus will shift to informatics
• Virtual web library will increase public access
• Library staff will change –
Page 15
Where are we now?
• Key partner of Encyclopedia of Life
• Working Groups have agreed technical plan, metadata standards and image standards
• Internet Archive to be technical partner – scanning and hosting
• ‘Scribe’ scanners now installed in NHM NYC and in Boston
• 4.1 million pages already available
Page 16
Classes of texts
Public Domain – pre-1923
Non-profit society journals
Post-1923 monographs
some with copyright renewals
some without copyright renewals
Commercial journals
Page 17
BHL Seeks Permissions
BHL will digitize learned society backfiles and mount them through the BHL Portal at no cost.
Will provide a set of files to the learned society for reuse as they see fit.
Will index the issues using Taxonomic Intelligence increasing their usability.
Page 18
Benefits
Use of the articles will increase as evidenced by citation upsurge.
Long-term management of the digital assets is provided by the BHL at no cost to its contributors.
Content will be integrated into EOL project through TI nomenclatural linking.
Page 19
Levinus Vincent
Elenchus tabularum, pinacothecarum, 1719
The cited half-life of publications in taxonomy is longer than in any other scientific discipline. The decay rate is longer than in most scientific disciplines.
– Macro-economic case for open access, Tom Moritz
Current taxonomic literature often relies on texts and specimens > 100 years old.
Page 20
Georges Louis Leclerc, comte de Buffon, Histoire naturelle : générale et particulière (Oiseaux), 1799-1808
The Long NOW Strategy
Convention on Biological Diversity: Article 17
Institutions that are creating the BHL exist to persist through time.
– The future is uncertain, the technology landscape changes, people pass on. So create consortial structures that are low-overhead, flexible, and can respond quickly.
– Interoperability is the key. Repository islands will sink.
Page 21
Biologia Centrali-Americana
[Bar chart: Physical Distribution… Categories: US & Canada, Europe, Mexico & C. America, South America; vertical axis from 0 to 8]
Now… you can parse date, harvest out data. The wealth of information locked on the pages is now liberated!
Page 22
Henry Walter Bates, The Naturalist on the River Amazons, 1863
Most literature is in the developed world, the Northern Hemisphere
Most biodiversity is in the developing world, the Southern Hemisphere
Page 23
Progne subis – Purple Martin. Illustrations of the nest and eggs of birds of Ohio, 1879-1886
Library and Laboratory: the Marriage of Research, Data and Taxonomic Literature
London, February 2005
Eighty participants from 22 countries gathered to discuss the status and future of access to the taxonomic literature and to propose an agenda for actions that would improve the research environment for taxonomy. The participants were taxonomists; librarians; publishers; representatives of learned and professional societies, private foundations and government agencies; and specialists in information and communications technology.
Scalable Mass Scanning
• Contracts
• Firewalls
• Security
• Loading Docks
• Trucks
• 180 mile round trip!
Page 24
Internet Archive Scribe: Boston
Page 25
Ernest Ingersoll, Hand-book to the National Museum … Smithsonian Institution, 1886
Mass Scanning Workflow
• Bid Lists
• Pick Lists
• Packing Lists
• Serials Management
• Monographic Management
• Stickers for Media and carts
• Rare Books – vaults
Page 26
Jacob Christian Schäffer, Elementa entomologica . . . 1766.
BHL Portal
http://www.biodiversitylibrary.org
Serve image and text files; create volume, part, piece metadata; ingest page-level metadata at scanning level; apply Globally Unique Identifiers (GUIDs) for linking to other taxonomic services.
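As a rough illustration of attaching identifiers to page-level metadata, the sketch below derives a stable GUID for each scanned page with Python's standard `uuid` module. The namespace, the volume key and the record fields are assumptions made for this example; they are not the identifier scheme or schema used by the BHL Portal.

```python
import uuid

# Hypothetical namespace for this sketch; not an identifier scheme used by BHL.
EXAMPLE_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "http://www.biodiversitylibrary.org")


def page_guid(volume_id: str, page_sequence: int) -> str:
    """Derive a stable identifier from a volume key and a page position."""
    return str(uuid.uuid5(EXAMPLE_NAMESPACE, f"{volume_id}/page/{page_sequence}"))


def build_page_records(volume_id: str, page_labels: list) -> list:
    """Attach a GUID and basic page-level metadata to each scanned page."""
    records = []
    for sequence, label in enumerate(page_labels, start=1):
        records.append({
            "guid": page_guid(volume_id, sequence),
            "volume": volume_id,      # hypothetical volume key
            "sequence": sequence,     # position in scanning order
            "label": label,           # printed page label, if any
        })
    return records


records = build_page_records("example-volume-001", ["title", "i", "ii", "1"])
for record in records:
    print(record["guid"], record["sequence"], record["label"])
```

Name-based (version 5) identifiers are used here so that re-running the ingest produces the same GUID for the same page; a random `uuid4` would work equally well if identifiers are stored once and never regenerated.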
Page 27
Biodiversity Heritage Library
Page 28
Biodiversity Informatics
Page 29
“All accumulated information of a species is tied to a scientific name, a name that serves as a link between what has been learned in the past and what we today add to the body of knowledge.”
~ Grimaldi & Engel, 2005, Evolution of the Insects
Page 30
Who knoweth not the name, knoweth not the subject
Linnaeus, 1737, Critica Botanica n 210.
Page 31
• Information about named groups (taxa) of organisms (taxon-related information)
• Extends back at least 1000 years
• Books, journals, surveys
• Museum specimens, herbaria
• In many languages and is distributed
From T.E. Glover, The Fishes of Southwestern Japan, c.1870
Page 32
The challenge for contemporary DIGITAL libraries
Goal:
Use one name to find the content for all names related to “that” species.
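The goal above can be sketched as a lookup over a reconciliation group, where any member name retrieves content filed under every other member. This is a minimal illustration, not the uBio implementation: the function names are made up for the example, the content identifiers are hypothetical, and the single group shown (Progne subis, its original combination Hirundo subis, and the vernacular name Purple Martin) is only there to make the idea concrete.

```python
# Illustrative reconciliation group: names treated as referring to one species.
RECONCILIATION_GROUPS = [
    {"Progne subis", "Hirundo subis", "Purple Martin"},
]

# Hypothetical content items, each filed under the name its source used.
CONTENT_BY_NAME = {
    "Progne subis": ["doc-A"],
    "Hirundo subis": ["doc-B"],
    "Purple Martin": ["doc-C", "doc-A"],
}


def build_name_lookup(groups):
    """Map every name (case-insensitively) to the full group it belongs to."""
    lookup = {}
    for group in groups:
        frozen = frozenset(group)
        for name in group:
            lookup[name.lower()] = frozen
    return lookup


def find_content(query, lookup, content_by_name):
    """Return content filed under the query name or any reconciled name."""
    names = lookup.get(query.lower(), frozenset({query}))
    found = set()
    for name in names:
        found.update(content_by_name.get(name, []))
    return sorted(found)


NAME_LOOKUP = build_name_lookup(RECONCILIATION_GROUPS)
print(find_content("hirundo subis", NAME_LOOKUP, CONTENT_BY_NAME))
```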
Page 33
Names – the only universal metadata for Biology
Names offer a logical way to search for and index content
• Names annotate data objects
• All names annotate all data objects
• A compilation of all names ever used is the foundation of a universal index for biology or for a semantic web for biology
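One way to picture names acting as metadata is an inverted index from each name to the data objects it annotates. The sketch below is only an assumption about how such an index might look; the object identifiers and kinds are invented for the example.

```python
from collections import defaultdict

# Hypothetical data objects, each annotated with the names it mentions.
DATA_OBJECTS = [
    {"id": "image-1", "kind": "image", "names": ["Progne subis"]},
    {"id": "article-7", "kind": "article", "names": ["Progne subis", "Hirundo rustica"]},
    {"id": "specimen-42", "kind": "specimen", "names": ["Hirundo rustica"]},
]


def build_name_index(objects):
    """Invert the annotations: name -> set of object identifiers."""
    index = defaultdict(set)
    for obj in objects:
        for name in obj["names"]:
            index[name].add(obj["id"])
    return index


name_index = build_name_index(DATA_OBJECTS)
print(sorted(name_index["Progne subis"]))
```

Because the index is keyed only by name, images, articles and specimen records from unrelated systems can sit side by side in the same result.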
Page 34
Who is affected by these problems?
Libraries
Publishers
Museums
Federal Agencies
Page 35
Serious challenges in federated environments
One organism
4 scientific names
4 maps
We want one map
Page 36
Reuse, don’t rebuild
Page 37
Page 38
• All names & all classifications (ClassificationBank)
• Alternative names reconciled
• Similar names disambiguated
• Exploit hierarchies to browse and search, build a comprehensive classification
• Improve performance with federated systems
• Read documents, web sites, databases and taxonomically index the content
• Create a unified portal to information about organisms on the internet
Taxonomic intelligence is the inclusion of taxonomic practices, skills and knowledge within informatics services to manage information about organisms
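The "exploit hierarchies" point can be illustrated with a small parent map: a search on a higher taxon expands to everything classified beneath it, and any name can be traced up to its ancestors. This is a sketch under the assumption of a single classification stored as child-to-parent pairs; it does not reflect how ClassificationBank is actually structured.

```python
# A tiny classification fragment stored as child -> parent.
PARENT = {
    "Progne subis": "Progne",
    "Progne": "Hirundinidae",
    "Hirundo rustica": "Hirundo",
    "Hirundo": "Hirundinidae",
    "Hirundinidae": "Passeriformes",
}


def children_of(parent_map):
    """Invert the child -> parent map into parent -> list of children."""
    children = {}
    for child, parent in parent_map.items():
        children.setdefault(parent, []).append(child)
    return children


def descendants(taxon, parent_map):
    """All taxa below the given taxon, for expanding a search downward."""
    children = children_of(parent_map)
    found, stack = [], [taxon]
    while stack:
        current = stack.pop()
        for child in children.get(current, []):
            found.append(child)
            stack.append(child)
    return sorted(found)


def lineage(taxon, parent_map):
    """Path from a taxon up to the root of the fragment, for browsing."""
    path = [taxon]
    while path[-1] in parent_map:
        path.append(parent_map[path[-1]])
    return path


print(descendants("Hirundinidae", PARENT))
print(lineage("Progne subis", PARENT))
```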
Page 39
Specimen distribution data from remote sources
• data from various sources may be merged
• red dots on the map link back to the website that provided the geographical co-ordinates
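A minimal sketch of such a merge follows: records from several providers are grouped under one accepted name, and each point keeps the URL of the site that supplied it. The record type, the coordinates, the source URLs and the name mapping are all hypothetical and exist only to show the shape of the operation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Occurrence:
    name: str          # name as used by the providing source
    latitude: float
    longitude: float
    source_url: str    # link back to the provider of the co-ordinates


# Hypothetical mapping from names in use to one accepted name.
ACCEPTED_NAME = {"Hirundo subis": "Progne subis", "Progne subis": "Progne subis"}

# Hypothetical records from two remote sources (coordinates are made up).
SOURCE_A = [Occurrence("Progne subis", 41.52, -70.67, "https://source-a.example/1")]
SOURCE_B = [
    Occurrence("Hirundo subis", 39.96, -83.00, "https://source-b.example/9"),
    Occurrence("Progne subis", 41.52, -70.67, "https://source-a.example/1"),  # duplicate
]


def merge_for_map(sources, accepted_name):
    """Group points from all sources under one name, dropping exact duplicates."""
    merged = {}
    for records in sources:
        for rec in records:
            key = accepted_name.get(rec.name, rec.name)
            merged.setdefault(key, set()).add(
                (rec.latitude, rec.longitude, rec.source_url)
            )
    return {name: sorted(points) for name, points in merged.items()}


one_map = merge_for_map([SOURCE_A, SOURCE_B], ACCEPTED_NAME)
for name, points in one_map.items():
    print(name, points)
```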
Page 40
uBio Programmers
BHL Taxonomic Intelligence Tool
Georges Louis Leclerc, comte de Buffon, Histoire naturelle : générale et particulière (Oiseaux), 1799-1808
Page 41
uBio
• 10.7 Million+ Name Strings
• Reconciliation Groups
• http://www.ubio.org
Page 42
Training and Improving the Algorithm
Page 43
uBioRSS Taxonomically Intelligent RSS Feed Aggregator
Page 44
uBioRSS Taxonomically Intelligent RSS Feed Aggregator
Page 45
MBL WHOI Library – Woods Hole authors’ publications
Page 46
MBL WHOI Library – Woods Hole species publications
Page 47
• It will benefit any initiative that uses distributed and heterogeneous information about biology
• Distributed content on the same species can be drawn together because different names will be standardized through reconciliation
• We can read documents, find names, catalog and taxonomically index documents
• Produce a framework around which we can organize and assemble remote and local content
Taxonomic intelligence works miracles!
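The "read documents, find names" step can be approximated with a simple heuristic: pick out candidate Latin binomials by pattern, then keep only those found in a list of known names. This is a rough sketch, not the algorithm behind the uBio tools; the pattern, the name list and the sample sentence are assumptions for illustration.

```python
import re

# Hypothetical excerpt of a name list; a real service would hold millions of strings.
KNOWN_NAMES = {"Progne subis", "Hirundo rustica"}

# Candidate binomial: capitalized genus followed by a lowercase epithet.
BINOMIAL = re.compile(r"\b([A-Z][a-z]+) ([a-z]{3,})\b")


def find_names(text, known_names):
    """Return known names in order of first appearance in the text."""
    hits = []
    for match in BINOMIAL.finditer(text):
        candidate = f"{match.group(1)} {match.group(2)}"
        if candidate in known_names and candidate not in hits:
            hits.append(candidate)
    return hits


sample = (
    "The nest of Progne subis is described here. "
    "Hirundo rustica also breeds in Ohio."
)
print(find_names(sample, KNOWN_NAMES))
```

Checking candidates against a name list is what filters out ordinary phrases such as "The nest", which match the pattern but are not names. OCR errors and abbreviated genera (for example "P. subis") would need extra handling that this sketch leaves out.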
Page 48
Taxonomically intelligent scientific text parsing
Page 49
Page 50
Page 51
• Search
• Browse
Page 52
“It is exciting to anticipate the scientific chords we might hear once 1.8 million notes are brought together through this instrument. Potential EOL users are professional and citizen scientists, teachers, students, media, environmental managers, families and artists. The site will link the public and scientific community in a collaborative way that’s without precedent in scale.”
– Jim Edwards, Executive Director, EOL
Page 53
Page 54
Acknowledgments
Catherine Norton
Patrick Leary
David Remsen
Diane Rielinger
David Patterson
Neil Sarkar
A.W. Mellon Foundation
Alfred P. Sloan Foundation
John D. & Catherine T. MacArthur Foundation
Internet Archive
Christopher Freeland
Tom Garnett
Martin Kalfatovic
Graham Higley
BHL & EOL Teams
Page 55
www.biodiversitylibrary.org
www.eol.org
www.ubio.org