Increasing Usability of Biodiversity Databases through Semantic Enrichment Klaus Riede Zoologisches Forschungsinstitut & Museum Alexander Koenig (ZFMK) Adenauerallee 150-164 53113 Bonn, Germany
Dec 29, 2015
Semantic Enrichment: Some examples
Huge Biodiversity Databases already exist.
They cover distinct organisms:
Fishbase, Orthoptera Species File
OR
Distinct themes:
Threat: IUCN Red List Database (www.redlist.org)
Migration: Global Register of Migratory Species (www.groms.de)
Why do we need semantic enrichment?
Try to search for:
Number of „Extinct Tropical Timber Trees“
Database: IUCN Red List Database (www.redlist.org)
Query: Tropical tree
Problem: plants are not classified according to life-form
Plant families such as TAXODIACEAE comprise trees (e.g. Taiwania cryptomeroides - VULNERABLE)
CUPRESSACEAE contain shrubs (Actinostrobus) AND trees (Thuja spp.)
Semantic Enrichment: Searching for Red-Listed Trees
To search the IUCN Red List Database (www.redlist.org)
for „Threatened“ trees, you have to know plant taxonomy:
Searching the Order CONIFERALES (containing Taxodiaceae trees):
- 16 Critically Endangered,
- 43 Endangered,
- 93 Vulnerable,
...but some of those are shrubs (Cupressaceae: Actinostrobus)
Threatened Cupressaceae:
- 2 Critically Endangered (e.g. Thuja sutchuensis),
- 15 Endangered (e.g. Juniperus cedrus),
- 25 Vulnerable (e.g. Cupressus gigantea)
Semantic enrichment is necessary to search for „Trees“
http://www.botanik.uni-bonn.de/conifers/index.htm
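The kind of enrichment this slide calls for can be sketched as a small life-form table joined to the Red List records. This is a minimal illustration in Python with SQLite, not the IUCN schema: the table and column names (red_list, life_form, form) are assumptions, and only the taxa, statuses and life-forms named on the slides are used. Genera without a life-form entry are reported separately, since they still need annotation.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE red_list (species TEXT, genus TEXT, family TEXT, status TEXT);
-- Hypothetical enrichment table: one life-form per genus.
CREATE TABLE life_form (genus TEXT PRIMARY KEY, form TEXT);
""")

# Species and statuses as quoted on the slides.
con.executemany("INSERT INTO red_list VALUES (?, ?, ?, ?)", [
    ("Taiwania cryptomeroides", "Taiwania", "Taxodiaceae", "Vulnerable"),
    ("Thuja sutchuensis", "Thuja", "Cupressaceae", "Critically Endangered"),
    ("Juniperus cedrus", "Juniperus", "Cupressaceae", "Endangered"),
    ("Cupressus gigantea", "Cupressus", "Cupressaceae", "Vulnerable"),
])

# Life-forms stated on the slides; other genera are deliberately left out.
con.executemany("INSERT INTO life_form VALUES (?, ?)", [
    ("Taiwania", "tree"),
    ("Thuja", "tree"),
    ("Actinostrobus", "shrub"),
])


def threatened_trees(con):
    """Red List species whose genus is annotated as 'tree'."""
    sql = """
        SELECT r.species, r.status
        FROM red_list AS r
        JOIN life_form AS l ON l.genus = r.genus
        WHERE l.form = 'tree'
        ORDER BY r.species
    """
    return con.execute(sql).fetchall()


def unannotated(con):
    """Red List species that still lack a life-form annotation."""
    sql = """
        SELECT r.species
        FROM red_list AS r
        LEFT JOIN life_form AS l ON l.genus = r.genus
        WHERE l.genus IS NULL
        ORDER BY r.species
    """
    return [row[0] for row in con.execute(sql)]


print(threatened_trees(con))
print(unannotated(con))
```

Once every genus carries a life-form, a query for „Trees“ no longer requires the user to know which families or genera contain them.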
Two Worlds: Relational databases and complex data sets
Relational Databases
Digital Orthoptera Specimen Access
SYSTAX
GROMS Global Register of Migratory Species
Complex data sets
Sounds, Pictures
Gene sequences (links), geographic coordinates
Maps (GIS-data: shapes)
Example #1: Data-mining for Knowledge Gaps
The „Global Register of Migratory Species“ database contains literature citations on migration.
Knowledge gaps were detected by searching for text strings such as: poor*, little known, unknown
www.groms.de
The relational organisation of the GROMS database allows application of SQL queries for text-mining:
References Table (5,500 entries): ID, Author, Title, etc.
Joint Table (8,500 entries): Lit_ID, Species_ID, Text: [... migration ... unknown ...]
Species Table (4,355 entries): ID, Taxon name, Migration, Red List status, etc.
References Table 1:many Joint Table many:1 Species Table
The many:many relation connects References and Species Names
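The three-table layout above can be sketched as follows. This is a simplified illustration in Python with SQLite, not the actual GROMS schema: the table names (literature, species, lit_species) and the short text excerpts are assumptions chosen to mirror the slide, and the single reference row stands in for the main source named later in the talk.

```python
import sqlite3


def build_demo():
    """Create a toy version of the References / Joint / Species layout."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
    CREATE TABLE literature (id INTEGER PRIMARY KEY, author TEXT, title TEXT);
    CREATE TABLE species (id INTEGER PRIMARY KEY, taxon_name TEXT, migration TEXT);
    -- Joint table: one row per (reference, species) pair, carrying the text.
    CREATE TABLE lit_species (
        lit_id INTEGER REFERENCES literature(id),
        species_id INTEGER REFERENCES species(id),
        text TEXT
    );
    """)
    con.execute(
        "INSERT INTO literature VALUES (1, 'del Hoyo et al.', "
        "'Handbook of the Birds of the World')"
    )
    con.executemany("INSERT INTO species VALUES (?, ?, ?)", [
        (1, "Apus caffer", "intercontinental"),
        (2, "Chaetura vauxi", "intercontinental"),
    ])
    con.executemany("INSERT INTO lit_species VALUES (?, ?, ?)", [
        (1, 1, "Poorly understood wet-season movements into Sahel."),
        (1, 2, "Wintering as far as NW California not unknown."),
    ])
    return con


def species_for_reference(con, lit_id):
    """Follow the joint table from one reference to all linked species."""
    sql = """
        SELECT s.taxon_name
        FROM lit_species AS j
        JOIN species AS s ON s.id = j.species_id
        WHERE j.lit_id = ?
        ORDER BY s.taxon_name
    """
    return [row[0] for row in con.execute(sql, (lit_id,))]


demo = build_demo()
print(species_for_reference(demo, 1))
```

Because the text lives in the joint table, the same species can carry different statements from different references, and each statement stays searchable on its own.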
SQL statement: Searching for non-passerine birds with poorly known migration behaviour:
Apus caffer
White-rumped swift
intercontinental
Migratory in northernmost and southernmost parts of range. Spanish population present early May to Aug-Oct, some recorded into early Dec, with autumn migration through Straits of Gibraltar mid-Aug to mid-Oct; S African population present Aug-May, mainly absent from S Cape and much reduced farther N within S breeding range Jun-Jul. Poorly understood wet-season movements into Sahel may be feature of N sub-Saharan populations. Otherwise resident. Migrates in flocks of up to 100. S African migrants may be transequatorial. Some degree of altitudinal migration in Natal. First record from Arabia 1982, and seen at least once subsequently in Tihamah coastal plains, Saudi Arabia, in Mar 1989. Vagrant to Norway (May, Jun) and Finland (Nov).
Chaetura vauxi
Vaux's swift
intercontinental
Nominate race a migrant, present in far N of range May to mid-Sept, exceptionally late Mar on coast. Migrates through S California mid-Apr to early May, with weaker autumn passage peaking early Sept, though continuing to early Oct, migrants leaving the state by mid-Oct. Recorded SE Farallon Is, 42 km W of San Francisco, in similar numbers over 22 years, in spring 813 in early-late May, and in autumn 803 early Sept to late Oct. Recorded E to Louisiana and Florida. Passage through NW Mexico Apr-May and mid-Sept to Oct; nominate race present C Mexico to W Honduras, mid-Sept to May. Incidence of wintering in California increasing, small flocks occurring mainly in S, though wintering as far as NW California not unknown.
SELECT Tab_Arten.Latein, Tab_Arten.Englisch, Tab_Arten.Migration,
       Jointab_Art_Lit.Lit_Bezug, Tab_Literatur.Autor_Name, Tab_Literatur.Autor_Vorname,
       Tab_Literatur.Coautoren_Namen, Tab_Literatur.Jahr, Tab_Arten.animalgroup,
       Tab_Arten.Familie
FROM Tab_Literatur RIGHT JOIN (Tab_Arten INNER JOIN Jointab_Art_Lit
       ON Tab_Arten.ID = Jointab_Art_Lit.ID_Art)
       ON Tab_Literatur.ID = Jointab_Art_Lit.ID_Lit
WHERE (((Jointab_Art_Lit.Theme)=7) AND ((Tab_Arten.Animal_Class)=2) AND ((Jointab_Art_Lit.Lit_Bezug) Like "*unknown*"))
   OR (((Jointab_Art_Lit.Theme)=7) AND ((Tab_Arten.Animal_Class)=2) AND ((Jointab_Art_Lit.Lit_Bezug) Like "*perhaps*"))
   OR (((Jointab_Art_Lit.Theme)=7) AND ((Tab_Arten.Animal_Class)=2) AND ((Jointab_Art_Lit.Lit_Bezug) Like "*little*"))
   OR (((Jointab_Art_Lit.Theme)=7) AND ((Tab_Arten.Animal_Class)=2) AND ((Jointab_Art_Lit.Lit_Bezug) Like "*poor*"))
   OR (((Jointab_Art_Lit.Theme)=7) AND ((Tab_Arten.Animal_Class)=2) AND ((Jointab_Art_Lit.Lit_Bezug) Like "*possib*"))
ORDER BY Tab_Arten.animalgroup, Tab_Arten.Familie;
Result: 349 birds with insufficiently known migration behaviour
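The keyword search can be reproduced in a compact form outside Access. The sketch below uses Python with SQLite and a single hypothetical table (notes); it is not the GROMS query itself. Two differences are worth noting: SQLite's LIKE uses % as its wildcard where Access uses *, and the repeated Theme condition is factored out of the OR chain. The theme code 7 is taken from the query above; what the code stands for is not stated in the slides. The third row is an invented placeholder that shows a non-match.

```python
import sqlite3

# Search strings taken from the LIKE patterns in the Access query.
KEYWORDS = ["unknown", "perhaps", "little", "poor", "possib"]


def find_knowledge_gaps(con, keywords=KEYWORDS, theme=7):
    """Return taxa whose text for the given theme contains any keyword."""
    clause = " OR ".join("text LIKE ?" for _ in keywords)
    sql = (
        "SELECT DISTINCT taxon_name FROM notes "
        f"WHERE theme = ? AND ({clause}) ORDER BY taxon_name"
    )
    params = [theme] + [f"%{word}%" for word in keywords]
    return [row[0] for row in con.execute(sql, params)]


con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE notes (taxon_name TEXT, theme INTEGER, text TEXT)")
con.executemany("INSERT INTO notes VALUES (?, ?, ?)", [
    ("Apus caffer", 7,
     "Poorly understood wet-season movements into Sahel."),
    ("Chaetura vauxi", 7,
     "Wintering as far as NW California not unknown."),
    ("Placeholder taxon (hypothetical)", 7, "Resident."),
])

print(find_knowledge_gaps(con))
```

The Chaetura vauxi row is returned because "not unknown" contains the string "unknown". Substring matching of this kind flags candidates for review rather than confirmed knowledge gaps.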
Caprimulgus climacurus
Long-tailed nightjar
Poorly known. Nominate race migratory and partially sedentary, some populations moving S after breeding season. Race sclateri possibly sedentary and partially migratory. Race nigricans probably sedentary. Outside breeding season, range also includes S Ivory Coast, SW Nigeria, S Cameroon, Equatorial Guinea, Gabon, SE Congo (lower Congo river valley), NE Angola (one record Luaco), SE Sudan, SW Ethiopia, W Kenya (sporadic in Turkana and Pokot region) and E Uganda.
Mainly based on „Handbook of the Birds of the World“ (del Hoyo et al. 1992-2003)
Example #2: Automatic Annotation of Sound Parameters
The Orthoptera Song Repository of the DORSA project was used to annotate all 5,000 sound files automatically with sound parameters.
Sound parameters were added to the SysTax database, which stores specimen data from various museum databases, including herbaria.
The annotated SysTax Oracle database is now searchable for sound parameters, such as carrier frequency and pulse rate.
Orthoptera type material in German museums.
Deutsche Orthopteren Sammlungen - www.dorsa.de
Checking, identification and verification:
- information on type material
- locating „historical“ types
- designation of lectotypes
Taxonomic database (OSF: Orthoptera Species File, USA)
Specimens (German museums, sound archives) (www.dorsa.de)
Mutual links
Extraction of sound parameters using MATLAB software
[Figure: pulse rate and carrier frequency]
In cooperation with: Dept. of Neuroinformatics, Ulm
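To give a feel for what extracting a carrier frequency involves, here is a deliberately simple estimate based on counting zero crossings, written in plain Python. It is an illustrative stand-in and not the MATLAB routine used in the project, whose method is not described in the slides. The test tone, sample rate and function names are assumptions.

```python
import math


def synth_tone(freq_hz, sample_rate, duration_s):
    """Generate a pure sine tone as a list of samples (synthetic test input)."""
    n = int(sample_rate * duration_s)
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(n)]


def carrier_frequency(samples, sample_rate):
    """Estimate the dominant frequency from the number of sign changes.

    A sine wave crosses zero twice per period, so
    frequency = crossings / (2 * duration).
    """
    signs = [s >= 0 for s in samples]
    crossings = sum(1 for a, b in zip(signs, signs[1:]) if a != b)
    duration_s = (len(samples) - 1) / sample_rate
    return crossings / (2 * duration_s)


# A 7 kHz tone, in the range of the carrier frequencies listed in the table.
tone = synth_tone(7000.0, 44100, 0.05)
estimate = carrier_frequency(tone, 44100)
print(round(estimate), "Hz")
```

Zero-crossing counts are only reliable for clean, narrow-band signals. Field recordings with noise or several singing species would need a spectral method instead.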
Enriched sound file table: pulse distance, length, frequency etc. were added to the SYSTAX table
PULSEDIST  PULSEDISTS  PULSELENGT  PULSELENGT  DUTY_CYCLE  FREQUENCY   FREQUENCYS  FILENAME
12.5 ms    419.8       10.5 ms     2.6         0.84        7590.44 Hz  66.60       n3/tr/trigsp01/s6n023.wav
19.3 ms    94.8        7.0 ms      2.2         0.36        7593.24 Hz  63.79       n1/tr/trigsp01/s6n023.wav
18.1 ms    453.3       10.6 ms     4.7         0.59        7357.79 Hz  521.05      n3/tr/trigsp01/s6n027.wav
18.1 ms    290.4       10.3 ms     5.2         0.57        7302.54 Hz  610.23      n1/tr/trigsp01/s6n027.wav
18.1 ms    983.1       10.4 ms     3.2         0.57        7684.25 Hz  114.84      n1/tr/trigsp01/s6n027f.wav
13.2 ms    203.2       6.6 ms      3.3         0.50        7104.88 Hz  76.85       n3/tr/trigsp01/s6n029.wav
13.6 ms    79.1        12.0 ms     3.5         0.88        7128.05 Hz  78.00       n1/tr/trigsp01/s6n029.wav
11.5 ms    458.2       6.1 ms      3.1         0.53        7702.09 Hz  380.65      n3/tr/trigsp01/s6n031.wav
11.6 ms    113.8       5.7 ms      2.6         0.49        7806.72 Hz  78.67       n1/tr/trigsp01/s6n031f.wav
22.9 ms    130.4       8.4 ms      2.6         0.37        6867.13 Hz  77.54       n3/tr/trigsp01/s6n034.wav
22.9 ms    171.9       8.4 ms      2.6         0.37        6855.36 Hz  90.63       n1/tr/trigsp01/s6n034.wav
22.9 ms    126.7       8.4 ms      2.6         0.37        6855.70 Hz  102.59      n1/tr/trigsp01/s6n034f.wav
14.4 ms    114.3       11.7 ms     2.6         0.81        6672.19 Hz  53.27       n3/tr/trigsp01/s6n041.wav
14.6 ms    209.3       11.7 ms     3.2         0.80        6641.43 Hz  60.03       n1/tr/trigsp01/s6n041.wav
14.6 ms    209.3       11.7 ms     3.2         0.80        6643.62 Hz  70.45       n1/tr/trigsp01/s6n041f.wav
39.4 ms    1988.8      19.5 ms     7.3         0.49        6165.52 Hz  124.48      n3/tr/trigsp01/s6n044.wav
13.0 ms    100.7       11.0 ms     2.0         0.85        7207.58 Hz  41.06       n1/tr/trigsp01/s6n044f.wav
13.2 ms    295.5       11.0 ms     4.0         0.83        6965.09 Hz  1129.63     n3/tr/trigsp01/s6n049.wav
13.0 ms    100.0       8.5 ms      2.4         0.65        7205.56 Hz  41.95       n1/tr/trigsp01/s6n049.wav
11.5 ms    506.9       9.8 ms      2.9         0.85        7528.76 Hz  64.86       n1/tr/trigsp01/s7n008.wav
11.5 ms    82.3        10.0 ms     2.7         0.87        7545.55 Hz  51.77       n3/tr/trigsp01/s7n008.wav
11.5 ms    506.9       9.8 ms      2.9         0.85        7527.69 Hz  63.26       n1/tr/trigsp01/s7n008f.wav
13.2 ms    148.8       11.0 ms     2.7         0.83        7322.22 Hz  66.56       n3/tr/trigsp01/s7n026.wav
13.5 ms    1586.9      11.2 ms     3.7         0.83        7330.96 Hz  58.02       n1/tr/trigsp01/s7n026.wav
13.5 ms    1581.0      11.2 ms     3.7         0.83        7332.97 Hz  44.75       n1/tr/trigsp01/s7n026f.wav
17.8 ms    174.6       11.0 ms     2.9         0.62        7464.91 Hz  61.32       n3/tr/trigsp01/s7n031.wav
17.7 ms    123.3       10.4 ms     3.3         0.59        7467.31 Hz  54.61       n1/tr/trigsp01/s7n031.wav
17.7 ms    123.3       10.4 ms     3.3         0.59        7462.75 Hz  55.57       n1/tr/trigsp01/s7n031f.wav
13.7 ms    172.4       7.3 ms      2.6         0.53        7325.90 Hz  41.79       n3/tr/trigsp01/s7n053.wav
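Once the parameters sit in a table, ordinary SQL is enough to search recordings by their acoustic properties. The following sketch loads four rows from the table above into SQLite via Python and filters them by carrier frequency and pulse distance. The table and column names (sound_params, pulsedist_ms and so on) are assumptions, not the SysTax schema. The duty-cycle helper reflects a relation that the listed values appear to follow (pulse length divided by pulse distance), and the pulse rate is taken here as the reciprocal of the pulse distance, which is likewise an assumption.

```python
import sqlite3

# Four rows copied from the enriched table, with units stripped:
# (pulsedist_ms, pulselength_ms, duty_cycle, frequency_hz, filename)
ROWS = [
    (12.5, 10.5, 0.84, 7590.44, "n3/tr/trigsp01/s6n023.wav"),
    (13.2, 6.6, 0.50, 7104.88, "n3/tr/trigsp01/s6n029.wav"),
    (22.9, 8.4, 0.37, 6867.13, "n3/tr/trigsp01/s6n034.wav"),
    (39.4, 19.5, 0.49, 6165.52, "n3/tr/trigsp01/s6n044.wav"),
]

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE sound_params (pulsedist_ms REAL, pulselength_ms REAL, "
    "duty_cycle REAL, frequency_hz REAL, filename TEXT)"
)
con.executemany("INSERT INTO sound_params VALUES (?, ?, ?, ?, ?)", ROWS)


def find_songs(con, min_hz, max_hz, max_pulsedist_ms):
    """Recordings within a carrier-frequency band and below a pulse distance."""
    sql = """
        SELECT filename FROM sound_params
        WHERE frequency_hz BETWEEN ? AND ? AND pulsedist_ms <= ?
        ORDER BY filename
    """
    return [r[0] for r in con.execute(sql, (min_hz, max_hz, max_pulsedist_ms))]


def duty_cycle(pulselength_ms, pulsedist_ms):
    """Fraction of each pulse period during which sound is produced."""
    return pulselength_ms / pulsedist_ms


def pulse_rate_hz(pulsedist_ms):
    """Pulses per second, assuming pulse distance is the pulse period."""
    return 1000.0 / pulsedist_ms


print(find_songs(con, 7000, 7700, 15))
print(round(duty_cycle(10.5, 12.5), 2), pulse_rate_hz(12.5))
```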
Bioacoustic, automated classification of ethospecies allows Rapid Assessment
Mapping with microphones makes it possible to answer important research questions, such as:
- species ranges / endemism
- species abundance
- species turnover
- community patterns
- activity patterns
- vulnerability to habitat degradation
- extermination rates
Example #3: Enriching databases with geographic information
- Adding lat-lon coordinates by geo-referencing
- GIS analysis of complex geometries (shapes) by intersection with other GIS layers and subsequent update
Georeferencing is necessary to update place names with lat-lon data
Geographic coordinates were added to place names, using Times Atlas or gazetteers (Getty, Alexandria Project)
Label                        Country     long_dec   lat_dec
Ahwaz                        Iran        48.66667   31.16667
Ainazi-Svetupe River         Latvia      24.3       57.78333
Ainovy Islands               Russia      31.58333   69.83334
Akaki Region                 Ethiopia    38.83333   8.833333
Akh-Chala Plavni Novogolov   Azerbaijan  48.66667   39.5
Akhna Dam                    Cyprus      33.8       35.03333
Akhtarski and Sladki Limans  Russia      38         46
Akrotiri Salt Lake           Cyprus      32.93333   34.51667
Aksehir Gölü                 Turkey      31.4       38.55
Akureyri                     Iceland     -18.08333  65.66666
Akyatan Gölü                 Turkey      35.31667   36.58333
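The georeferencing step amounts to a lookup of (place name, country) pairs in a gazetteer, with unmatched names set aside for manual checking. The sketch below shows this in Python with a three-entry dictionary taken from the table above. It does not call the Getty or Alexandria services; the function name, record layout and the unmatched example are assumptions.

```python
# Tiny stand-in gazetteer: (label, country) -> (long_dec, lat_dec),
# with values copied from the table above.
GAZETTEER = {
    ("Ahwaz", "Iran"): (48.66667, 31.16667),
    ("Akhna Dam", "Cyprus"): (33.8, 35.03333),
    ("Akureyri", "Iceland"): (-18.08333, 65.66666),
}


def georeference(records, gazetteer):
    """Split records into those enriched with coordinates and those unmatched."""
    enriched, unmatched = [], []
    for rec in records:
        # Trim stray whitespace, a common problem with transcribed labels.
        key = (rec["label"].strip(), rec["country"].strip())
        coords = gazetteer.get(key)
        if coords is None:
            unmatched.append(rec)
        else:
            enriched.append({**rec, "long_dec": coords[0], "lat_dec": coords[1]})
    return enriched, unmatched


records = [
    {"label": "Ahwaz", "country": "Iran"},
    {"label": " Akureyri ", "country": "Iceland"},
    {"label": "Unlisted locality (hypothetical)", "country": "Nowhere"},
]
enriched, unmatched = georeference(records, GAZETTEER)
print(len(enriched), "matched,", len(unmatched), "left for manual lookup")
```

Keying on the country as well as the label avoids assigning coordinates to a place of the same name in a different country.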
Mapping requires specimen data enriched with geographic coordinates
The DORSA mapserver is available at www.dorsa.de
Countries of origin of the type material in German museums
Example #3: Enriching databases with geographic information based on GIS calculation of range territories
Distribution maps (shapes) are available at www.groms.de
Import of intersection results: 1,000 mapped species, 2,522 administrative units, 340,000 combinations (dbf attribute table: province - species)
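The intersect-then-import workflow can be illustrated with a toy example in Python. Real range maps and administrative units are polygons processed in a GIS; here each shape is reduced to a bounding box so that the sketch stays self-contained. The species names, unit names other than Queensland, and all coordinates are invented for illustration, and the Queensland box is only a rough rectangle, not the real boundary.

```python
import sqlite3


def boxes_intersect(a, b):
    """True if two boxes (min_lon, min_lat, max_lon, max_lat) overlap."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def intersect_layers(ranges, units):
    """Return sorted (unit, species) pairs for every overlapping combination."""
    return sorted(
        (unit, species)
        for species, range_box in ranges.items()
        for unit, unit_box in units.items()
        if boxes_intersect(range_box, unit_box)
    )


def import_pairs(pairs):
    """Load the intersection result into a province-species table."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE province_species (province TEXT, species TEXT)")
    con.executemany("INSERT INTO province_species VALUES (?, ?)", pairs)
    return con


def species_in(con, province):
    """SQL retrieval of all species whose range touches a province."""
    sql = ("SELECT species FROM province_species "
           "WHERE province = ? ORDER BY species")
    return [row[0] for row in con.execute(sql, (province,))]


# Hypothetical layers: species range boxes and administrative unit boxes.
RANGES = {
    "species_A": (140.0, -25.0, 150.0, -15.0),
    "species_B": (0.0, 40.0, 10.0, 50.0),
}
UNITS = {
    "Queensland": (138.0, -29.0, 153.5, -10.0),
    "unit_X": (5.0, 45.0, 15.0, 55.0),
}

pairs = intersect_layers(RANGES, UNITS)
db = import_pairs(pairs)
print(species_in(db, "Queensland"))
```

Only combinations that overlap are stored, which is why the imported table holds far fewer rows than the product of species and administrative units.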
Queensland search results:
Summary: Semantic enrichment of relational databases is possible by automatic annotation:
Relational database
Table with annotation results
Enriched relational database
External data set (sounds, GIS)
Link
Running annotation program (e.g. GIS intersection)
Importing result table
Enrichment allows SQL retrieval of complex data parameters