Taking TL-2 Online: A Linked Data Resource
Martin R. Kalfatovic and Joel Richard, Smithsonian Libraries
TDWG 2011 Annual Conference, New Orleans, LA, 18 October 2011
TL-2 print volumes … by the numbers

Volume    Year  Pages           Letters   Entry numbers
TL1       1967  xx, 556 pp.     A-Z       [not numbered]
TL-2/1    1976  xl, 1136 pp.    A-G       1-2223
TL-2/2    1979  xviii, 991 pp.  H-Le      2224-4483
TL-2/3    1981  xii, 980 pp.    Lh-O      4484-7174
TL-2/4    1983  ix, 1214 pp.    P-Sak     7175-10,104
TL-2/5    1985  [v], 1066 pp.   Sal-Ste   10,105-13,105
TL-2/6    1986  [v], 926 pp.    Sti-Vuy   13,106-16,459
TL-2/7    1988  lvi, 653 pp.    W-Z       16,460-18,785
Suppl. 1  1992  viii, 453 pp.   A-Ba      18,786-20,458
Suppl. 2  1993  vi, 464 pp.     Be-Bo     20,459-22,485
Suppl. 3  1995  vi, 550 pp.     Br-Ca     22,486-25,190
Suppl. 4  1997  vi, 614 pp.     Ce-Cz     25,191-28,566
Suppl. 5  1998  viii, 431 pp.   Da-Di     28,567-30,948
Suppl. 6  2000  vi, 518 pp.     Do-E      30,949-33,658
Suppl. 7  2009  xviii, 469 pp.  F-Frer    33,659-35,497
Suppl. 8  2009  viii, 560 pp.   Fress-G   35,498-37,609
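The figures in the volume table can be cross-checked with a short Python sketch. The tuples below are transcribed from the table; the 1967 first edition (TL1) is left out because its entries are unnumbered, and roman-numbered front matter is ignored. The check confirms that the fifteen TL-2 volumes carry contiguous entry numbers ending at 37,609 and sums their arabic-numbered pages.

```python
# (volume, year, arabic-numbered pages, first entry no., last entry no.),
# transcribed from the TL-2 volume table; roman-numbered front matter ignored.
VOLUMES = [
    ("TL-2/1", 1976, 1136, 1, 2223),
    ("TL-2/2", 1979, 991, 2224, 4483),
    ("TL-2/3", 1981, 980, 4484, 7174),
    ("TL-2/4", 1983, 1214, 7175, 10104),
    ("TL-2/5", 1985, 1066, 10105, 13105),
    ("TL-2/6", 1986, 926, 13106, 16459),
    ("TL-2/7", 1988, 653, 16460, 18785),
    ("Suppl. 1", 1992, 453, 18786, 20458),
    ("Suppl. 2", 1993, 464, 20459, 22485),
    ("Suppl. 3", 1995, 550, 22486, 25190),
    ("Suppl. 4", 1997, 614, 25191, 28566),
    ("Suppl. 5", 1998, 431, 28567, 30948),
    ("Suppl. 6", 2000, 518, 30949, 33658),
    ("Suppl. 7", 2009, 469, 33659, 35497),
    ("Suppl. 8", 2009, 560, 35498, 37609),
]

def summarize(volumes):
    """Return (volume count, total pages, last entry number).

    Raises ValueError if the entry numbering has a gap or an overlap.
    """
    expected_start = 1
    for name, _year, _pages, first, last in volumes:
        if first != expected_start:
            raise ValueError(f"{name}: starts at {first}, expected {expected_start}")
        expected_start = last + 1
    total_pages = sum(pages for _name, _year, pages, _first, _last in volumes)
    return len(volumes), total_pages, expected_start - 1

print(summarize(VOLUMES))  # (15, 11025, 37609)
```

The 11,025 arabic-numbered pages agree with the ~11,000 pages given in the project scope.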
While TL-2 is an essential reference tool to plant scientists, reference librarians, and catalogers, it is less well known to the broader natural sciences community. The community will immediately benefit from open online access to the detailed treatments of 9,072 authors and 37,609 numbered citations....
The content provides precise dates of publication, details about specific titles, editions, and related publications, associated authors, biographical details including dates, education and career highlights, and disposition of herbaria.
-Judy Warnement, Harvard Univ. Botany Libraries
Scope of the Project
> Print version: 15 volumes
> Pages: ~11,000
> Characters per page: ~3,400 (average)
> Author entries: ~44,000
> Image files: ~9 GB
Project Starts: 2010
IAPT and Smithsonian Libraries sign agreement to create a new online version of TL-2
Funded by the Smithsonian's Atherton Seidell Endowment
Image and File Creation
Print volumes scanned by the Internet Archive
100% quality control review of scanned pages by SIL staff
Re-keyed to 99.97% accuracy
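To put the accuracy target in perspective, the scope figures give a rough error budget. This sketch assumes the 99.97% figure is a per-character accuracy rate applied to ~11,000 pages of ~3,400 characters each; the slides do not say exactly how accuracy is measured.

```python
PAGES = 11_000          # approximate page count from the project scope
CHARS_PER_PAGE = 3_400  # average characters per page from the project scope
ACCURACY_PPM = 999_700  # 99.97% accuracy expressed in parts per million (assumed per character)

total_chars = PAGES * CHARS_PER_PAGE
# Integer arithmetic avoids floating-point rounding in the estimate.
max_errors = total_chars * (1_000_000 - ACCURACY_PPM) // 1_000_000

print(f"{total_chars:,} characters, at most ~{max_errors:,} character errors")
# 37,400,000 characters, at most ~11,220 character errors
```

Under that assumption, the target allows roughly one keying error per page on average.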
Project Milestones I
December 2010
IAPT staff provide machine-readable versions of more recent volumes
January 2011
Scanning of print volumes complete; image files available on the Biodiversity Heritage Library (BHL)
Project Milestones II
September 2011
Test conversion done and conversion methodologies approved
November 2011
Completion of text conversion of TL-2
Project Milestones III
January 2012
TL-2 Online, version 1.0, publicly available via a Smithsonian Institution Libraries website. This version will provide, at a minimum, all the functionality of the current limited-access version.
Example of Conversion
Specs
> Introduction sections can be omitted. Introductory text to the indexes can be omitted.
> Accented letters and diacritics must be preserved.
> The beginning of every non-indented line should be indicated with a <br/> tag.
> Indented lines of text are not marked.
> Bold and italicized text should not be marked.
> Hyphenated words will be preserved throughout the text.
> The presence of a table should be indicated with a <Table/> tag. No other parsing should be done.
> Each line in the indexes will be converted to simple XML, but not parsed into fields.
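The conversion rules above are simple enough to express as a short Python sketch. It is an illustration, not the conversion vendor's actual process. Assumptions: a line counts as "indented" if it starts with whitespace; a hypothetical `[TABLE]` placeholder marks where a table appears in the keyed text; blank lines are dropped; XML special characters are escaped. The `index`/`entry` element names are placeholders, not taken from the project's XSD.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical placeholder standing in for a table in the keyed text.
TABLE_SENTINEL = "[TABLE]"

def convert_page(lines):
    """Apply the body-text rules: <br/> before each non-indented line, no
    marker on indented lines, and <Table/> where a table appears. Hyphens
    and accented characters pass through unchanged."""
    out = []
    for line in lines:
        if not line.strip():
            continue  # assumption: blank lines carry no content
        if line.strip() == TABLE_SENTINEL:
            out.append("<Table/>")
        elif line[0].isspace():
            out.append(escape(line.strip()))
        else:
            out.append("<br/>" + escape(line.rstrip()))
    return "\n".join(out)

def index_to_xml(lines):
    """Wrap each index line in an element without splitting it into fields.
    The element names are placeholders, not taken from the project XSD."""
    root = ET.Element("index")
    for line in lines:
        ET.SubElement(root, "entry").text = line.strip()
    return ET.tostring(root, encoding="unicode")

print(convert_page(["Müller, Hans (1800-1870)", "    an indented, hyphen-", "[TABLE]"]))
print(index_to_xml(["Doe, J. & Roe, R."]))
```

The sample lines are invented; note that the trailing hyphen and the umlaut survive unchanged, as the specs require.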
XSD and Sample of Converted Text
What Do You Want?
February 2012 forward
TL-2 Online, version X.0: additional functionality will be added to TL-2 Online; the details will be developed with input from the botanical and taxonomic communities.
Obligatory Biodiversity Heritage Library Slide
Planned Future Developments I
Initially, TL-2 will be presented on a basic website searchable by keyword, botanist name, TL-2 title number, or TL-2 botanist or title abbreviation. The website will display search results alongside the scanned page (as a zoomable JPEG) and the parallel OCRed and corrected text. The full OCRed text may be made available for download, and the scanned pages can also be browsed in a "page turning" application. The TL-2 site will take this form until its migration to the Libraries' Digital Library website next spring.
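The four search modes can be sketched as a single lookup over in-memory records. This is a hypothetical Python illustration, not the site's implementation. The record layout is assumed, and the sample botanist, title number, and text are invented. An all-digit query is treated as a TL-2 title number; any other query is matched exactly against abbreviations and as a substring against names and entry text.

```python
from dataclasses import dataclass, field

@dataclass
class Title:
    number: int        # TL-2 title number
    abbreviation: str  # TL-2 title abbreviation
    text: str          # corrected text of the title entry

@dataclass
class Botanist:
    name: str
    abbreviation: str  # TL-2 botanist abbreviation
    titles: list = field(default_factory=list)

def search(botanists, query):
    """Return (botanist, title) pairs matching the query."""
    q = query.strip().casefold()
    if not q:
        return []  # an empty query would otherwise match everything
    hits = []
    for b in botanists:
        for t in b.titles:
            if q.isdigit():
                matched = int(q) == t.number
            else:
                matched = (
                    q in (b.abbreviation.casefold(), t.abbreviation.casefold())
                    or q in b.name.casefold()
                    or q in t.text.casefold()
                )
            if matched:
                hits.append((b, t))
    return hits

# Hypothetical sample record; the number and text are invented for illustration.
doe = Botanist("Doe, Jane", "J.Doe", [Title(90001, "Fl. Exempl.", "Flora exemplaria. 1850.")])
print(len(search([doe], "90001")), len(search([doe], "j.doe")), len(search([doe], "smith")))
# 1 1 0
```

A production site would use a full-text index rather than a linear scan, but the matching rules would be the same.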
Planned Future Developments II
The second round of planned improvements to the TL-2 site is to implement Linked Open Data for the entire TL-2 dataset. This machine-readable format will make the TL-2 data easier to reuse in current and future projects. Each botanist and title will have a unique URI on the Libraries' website. Each URI will be a permanent, authoritative location on the Web for a botanist or title, serving information about it in both human-readable form (HTML) and machine-readable form (RDF/XML). Implementing Linked Open Data also makes it possible to offer a SPARQL endpoint, which allows the data on our website to be queried like a database.
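A minimal Python sketch of the "one URI, two representations" idea: a botanist record is serialized as RDF/XML with the standard `xml.etree.ElementTree` module, and a deliberately simplified content-negotiation helper picks HTML or RDF/XML from the HTTP `Accept` header. The base URI is a placeholder, and the use of FOAF's `foaf:name` property is an illustrative choice, not the Libraries' actual vocabulary.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FOAF = "http://xmlns.com/foaf/0.1/"
# Placeholder base URI; the Libraries' actual URI pattern is not given here.
BASE = "http://example.org/tl2/botanist/"

ET.register_namespace("rdf", RDF)
ET.register_namespace("foaf", FOAF)

def botanist_rdf(slug, name):
    """Serialize one botanist as RDF/XML: the URI is the subject, and the
    name is attached with the FOAF vocabulary's foaf:name property."""
    root = ET.Element(f"{{{RDF}}}RDF")
    desc = ET.SubElement(root, f"{{{RDF}}}Description", {f"{{{RDF}}}about": BASE + slug})
    ET.SubElement(desc, f"{{{FOAF}}}name").text = name
    return ET.tostring(root, encoding="unicode")

def choose_representation(accept_header):
    """Pick HTML or RDF/XML for the same URI from the HTTP Accept header.
    Deliberately simplified: q-values and wildcards are ignored."""
    if "application/rdf+xml" in accept_header:
        return "application/rdf+xml"
    return "text/html"

print(botanist_rdf("doe-jane", "Doe, Jane"))  # hypothetical botanist
```

Once every record is published this way, the same triples can be loaded into a triple store to back the SPARQL endpoint.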
Planned Future Developments III
We plan to extend our linked data by parsing the herbarium names in TL-2, with the goal of linking each one to its full name in the TL-2 index and to an external location on the Web. Once the herbaria are identified and linked, they can be searched in both directions: forward, by listing the herbaria that hold a botanist's plant specimens, and backward, by showing which TL-2 botanists contributed to a given herbarium.
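The forward and backward searches amount to a pair of inverse indexes. In this hypothetical Python sketch, raw herbarium strings are first normalized through an alias table and then indexed both ways. The alias table and the botanist names are invented for illustration; real mappings would come from the parsing step.

```python
from collections import defaultdict

# Hypothetical alias table: herbarium names as they might appear in TL-2
# entries, mapped to one canonical form. Real entries would come from parsing.
ALIASES = {
    "herb. kew": "Royal Botanic Gardens, Kew",
    "kew": "Royal Botanic Gardens, Kew",
    "smithsonian": "Smithsonian Institution",
}

def canonical(raw):
    """Map a raw herbarium string to its canonical name, if known."""
    cleaned = raw.strip()
    return ALIASES.get(cleaned.casefold(), cleaned)

def build_indexes(holdings):
    """Build forward (botanist -> herbaria) and backward
    (herbarium -> botanists) indexes from (botanist, raw name) pairs."""
    forward = defaultdict(set)
    backward = defaultdict(set)
    for botanist, raw in holdings:
        herbarium = canonical(raw)
        forward[botanist].add(herbarium)
        backward[herbarium].add(botanist)
    return forward, backward

forward, backward = build_indexes([
    ("Doe, Jane", "Herb. Kew"),
    ("Doe, Jane", "Smithsonian"),
    ("Roe, Richard", "Kew"),
])
print(sorted(backward["Royal Botanic Gardens, Kew"]))  # ['Doe, Jane', 'Roe, Richard']
```

Unrecognized names pass through unchanged, so they can be reviewed and added to the alias table later.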
Planned Future Developments IV
Additionally, we plan to match each botanist in TL-2 to their record in the Virtual International Authority File (VIAF), to improve identification of the botanist and the ability to link to other sources on the Internet. Similarly, we hope to decode and resolve the bibliographic entries for each botanist and link them to the Biodiversity Heritage Library or other appropriate online databases.
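One plausible way to make the VIAF matching conservative is to require agreement on both the normalized name and the life dates, and to accept only a single unambiguous candidate. The following Python sketch is hypothetical: the candidate records and VIAF IDs are invented, no VIAF web service is called, and the result is expressed as an `owl:sameAs` triple in N-Triples syntax using VIAF's `http://viaf.org/viaf/<id>` permalink pattern.

```python
import unicodedata

VIAF_BASE = "http://viaf.org/viaf/"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

def normalize(name):
    """Strip diacritics and case so 'Müller' and 'Muller' compare equal."""
    decomposed = unicodedata.normalize("NFKD", name)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).casefold().strip()

def match_viaf(botanist, candidates):
    """Return the VIAF ID whose name and life dates both match, or None when
    there is no match or more than one (ambiguous cases left for review)."""
    hits = [
        c["viaf_id"]
        for c in candidates
        if normalize(c["name"]) == normalize(botanist["name"])
        and (c["born"], c["died"]) == (botanist["born"], botanist["died"])
    ]
    return hits[0] if len(hits) == 1 else None

def same_as_triple(tl2_uri, viaf_id):
    """Express the match as one N-Triples statement."""
    return f"<{tl2_uri}> <{OWL_SAME_AS}> <{VIAF_BASE}{viaf_id}> ."

# Hypothetical data: the botanist, candidates, and VIAF IDs are invented.
botanist = {"name": "Müller, Hans", "born": 1800, "died": 1870}
candidates = [
    {"viaf_id": "000001", "name": "Muller, Hans", "born": 1800, "died": 1870},
    {"viaf_id": "000002", "name": "Muller, Hans", "born": 1810, "died": 1880},
]
viaf_id = match_viaf(botanist, candidates)
print(same_as_triple("http://example.org/tl2/botanist/muller-hans", viaf_id))
```

Requiring life dates keeps homonyms such as the two invented "Hans Muller" records from being conflated.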
Planned Future Developments V
Finally, each botanist may have one or more taxa named after them. This information includes a genus name, the person who named the genus (the author), and the year the name was published. We aim to identify these names and link them to the Encyclopedia of Life, the Biodiversity Heritage Library, or other appropriate online databases of taxonomic names. Additionally, we would like to connect each author to his or her record in TL-2, if one exists, thereby creating additional internal cross-links in TL-2.
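Each eponym record has three parts (genus, author, year), which makes it a good candidate for pattern-based parsing followed by an internal cross-link. This Python sketch assumes a hypothetical `<Genus> <author> (<year>)` layout; TL-2's real formatting may differ. The author-to-URI lookup table is also hypothetical.

```python
import re

# Assumed layout of an eponym string: "<Genus> <author> (<year>)".
# This format is hypothetical, chosen for illustration only.
EPONYM = re.compile(r"^(?P<genus>[A-Z][a-z]+)\s+(?P<author>.+?)\s+\((?P<year>\d{4})\)$")

def parse_eponym(text):
    """Split an eponym string into genus, author, and year, or return None."""
    m = EPONYM.match(text.strip())
    if m is None:
        return None
    return {"genus": m["genus"], "author": m["author"], "year": int(m["year"])}

def link_author(eponym, tl2_index):
    """Attach the TL-2 URI of the genus author, or None if the author has
    no TL-2 record. tl2_index maps author strings to URIs (hypothetical)."""
    linked = dict(eponym)
    linked["author_uri"] = tl2_index.get(eponym["author"])
    return linked

# Invented example: neither the genus nor the author is real.
eponym = parse_eponym("Exemplaria J.Doe (1850)")
print(link_author(eponym, {"J.Doe": "http://example.org/tl2/botanist/doe-jane"}))
```

Strings that do not fit the pattern return `None`, so they can be set aside for manual parsing instead of being linked incorrectly.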
Can you name the botanists you saw?
Herman Boerhaave, Nathaniel Lord Britton, A.P. de Candolle, Carolus Clusius, Cadwallader Colden, Erasmus Darwin, R.L. Desfontaines, Larry Dorr, Henry Englefield, Joseph Dalton Hooker, C.M. Hovey, N.J. von Jacquin, Bernard de Jussieu, Carl Linnaeus, C.F.B. Mirbel, Dan Nicholson, J.W. Palmstruch, Richard Pulteney, Henry Shaw, Martin Vahl, Judy Warnement
Thanks to the following collaborators on the project
Susan Fraser, Don Wheeler, Judy Warnement, Doug Holland, Chris Freeland, IAPT, Unlimited Priorities, Internet Archive, Data Conversion Lab
Smithsonian Team: Gilbert Borrego, Grace Costantino, Larry Dorr, Robin Everly, Sue Graves, Suzanne Pilsk, Joel Richard, Keri Thompson