On-line biological data concepts at CSIRO Marine Research, Australia Tony Rees & Kim Finney Divisional Data Centre CSIRO Marine Research, Hobart, Australia.

Post on 31-Dec-2015


On-line biological data concepts at CSIRO Marine Research, Australia

Tony Rees & Kim Finney

Divisional Data Centre

CSIRO Marine Research, Hobart, Australia

http://www.marine.csiro.au/datacentre/

Our website: http://www.marine.csiro.au/datacentre/

Pre-existing situation at CMR (before 1997)

• Data in a variety of databases and flat files

• No metadata or digital documentation

• No web access to any data or metadata

• CAAB (taxon coding system) in existence but coverage patchy and compliance variable

Our implementation path

Stage 1 (1997-2000) ...

• Construct a searchable, web-accessible metadata system and start populating it with information - MarLIN v1

• Upgrade CAAB to form a comprehensive taxon dictionary for MarLIN (also accessible by SQuID)

• Build a pilot data store and visualisation system with a web-driven GUI (Java applet) - SQuID v1

Stage 2 (2000-) ...

• Build SQuID v2 (onwards) to become a comprehensive data store, with upgraded links to MarLIN and CAAB

• Implement linkage between MarLIN and Australia-wide, distributed metadata search system

Stage 3… ???

Our system overview

[Diagram: system components and the links between them]

• Data directory (metadatabase) - holds info at “dataset” level (e.g. survey, species range)

• Master data storage (includes index layer) - holds info at the atomic data level

• Taxon dictionary

• Link labels: “Subsets of information shared with other metadata directory systems”; “Entry point to data”; “Display relevant metadata”

Digression #1: Taxon matching

• Simplistic view:

– text match on one field (“scientific name”) or two (genus + species)

• More comprehensive approach:

– 10 or more fields required, e.g. in CAAB we define the following: Genus, Subgenus, Species, Qualifier, Subspecies, Variety, Original Author/s, Original Date, Revising Author/s, Revision Date, Authority Addendum

– also need to flag:

– Is botanical or zoological code applicable?

– Species name Latin or informal (“sp. A”, etc.)?

– Has name changed from original? (even if no revising author/date stored)

Examples from our database:

• Chlamys (Belchlamys) aktinos (Petterd, 1886) … a scallop

• Ophiaster hydroideus (Lohmann) Lohmann, 1913 emend. Manton & Oates, 1983 … a coccolithophorid

• Heteroclinus sp. 1 [in Gomon et al, 1994] .. Kuiter's weedfish
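The multi-field approach can be sketched in a few lines of Python. This is a minimal illustration only, not the CAAB schema: the class name, the attribute names and the subset of fields chosen are assumptions made for the example, and the rendering rule (parentheses around the authority when the name has changed from the original) follows the usual zoological convention seen in the scallop example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TaxonName:
    """One taxon name stored as separate parts (field names are illustrative)."""
    genus: str
    species: str
    subgenus: Optional[str] = None
    original_author: Optional[str] = None
    original_date: Optional[str] = None
    name_changed: bool = False   # flag: has the name changed from the original?
    informal: bool = False       # flag: informal species name such as "sp. 1"

    def display_name(self) -> str:
        """Assemble a printable name from the stored parts."""
        parts = [self.genus]
        if self.subgenus:
            parts.append(f"({self.subgenus})")
        parts.append(self.species)
        if self.original_author:
            authority = self.original_author
            if self.original_date:
                authority += f", {self.original_date}"
            # Parentheses mark a name that has changed from the original.
            parts.append(f"({authority})" if self.name_changed else authority)
        return " ".join(parts)

    def match_key(self) -> tuple:
        """Comparison key that ignores authority details and letter case."""
        return (self.genus.lower(), (self.subgenus or "").lower(), self.species.lower())


scallop = TaxonName("Chlamys", "aktinos", subgenus="Belchlamys",
                    original_author="Petterd", original_date="1886",
                    name_changed=True)
print(scallop.display_name())
```

Storing the parts separately means a match can be made on genus, subgenus and species alone, while the authority string is still available for display.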

Taxon matching … continued

• We have standardised on taxon codes, rather than names for data storage and matching … names are stored as an attribute of the code (and can be updated in the future as needed)

• Our “CAAB” coding system has evolved over 20+ years - earlier generations of codes are maintained on the system

• New web-based access facility for retrieving latest name for a code, searching for a taxon, etc.

• Same CAAB codes are also used by other marine science/fisheries agencies around Australia

• Facility newly implemented in CAAB to hold ITIS codes, for cross-reference to international systems in the future
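As a rough sketch of why codes are convenient for storage, the Python below keeps the name as an attribute of the code and lets an earlier-generation code point at its replacement. The record layout, the `superseded_by` attribute and the obsolete code "OLD-0001" are hypothetical and invented for this example; only the code 37 118001 and its names come from the species listing in this talk.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class TaxonCode:
    code: str
    scientific_name: str
    common_name: str = ""
    superseded_by: Optional[str] = None   # set only on earlier-generation codes


def current_record(codes: Dict[str, TaxonCode], code: str) -> TaxonCode:
    """Follow superseded_by links until the current code is reached."""
    seen = set()
    record = codes[code]
    while record.superseded_by is not None:
        if record.code in seen:
            raise ValueError(f"circular supersession at {record.code}")
        seen.add(record.code)
        record = codes[record.superseded_by]
    return record


codes = {
    "37 118001": TaxonCode("37 118001", "Saurida undosquamis", "brushtooth lizardfish"),
    # Hypothetical obsolete code, invented for this example only.
    "OLD-0001": TaxonCode("OLD-0001", "Saurida undosquamis", superseded_by="37 118001"),
}

# Legacy data stored under the old code still resolves to the current record.
print(current_record(codes, "OLD-0001").code)
```

Because data rows hold only the code, a later name revision is a single update to the code's record and does not touch the stored data.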

CAAB services available

CAAB user interface

User searches by scientific name, common name or taxon code (or portion thereof)

• Retrieve current sci. name, common name(s), taxon code, taxon report

• Initiate a MarLIN search, ITIS report, FishBase report

• List taxa by CAAB category or family

Application-level requests

• Generate scientific name, common name, current code (if applicable) for a given taxon code

• Call a CAAB taxon report

• List taxa matching query

• Translate an ITIS number to a CAAB code (or vice versa)
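The kinds of request listed above can be illustrated with a small in-memory sketch. It is not the CAAB web service: the function names are hypothetical, the taxa are copied from the species listing later in this talk, and the ITIS numbers are placeholders invented for the example (they are not real ITIS identifiers).

```python
from typing import Dict, List, Optional, Tuple

# (CAAB code, scientific name, common name), from the species listing in this talk.
TAXA: List[Tuple[str, str, str]] = [
    ("23 636004", "Nototodarus gouldi", "Gould's squid"),
    ("28 786002", "Metanephrops boschmai", "Boschma's scampi"),
    ("28 786005", "Metanephrops velutinus", "velvet scampi"),
    ("37 118001", "Saurida undosquamis", "brushtooth lizardfish"),
    ("37 258002", "Beryx splendens", "alfonsino"),
]

# Placeholder ITIS numbers: NOT real ITIS identifiers.
CAAB_TO_ITIS: Dict[str, int] = {"37 118001": 900001, "37 258002": 900002}
ITIS_TO_CAAB: Dict[int, str] = {itis: caab for caab, itis in CAAB_TO_ITIS.items()}


def find_taxa(query: str) -> List[str]:
    """Return CAAB codes whose code or names contain the query (case-insensitive)."""
    q = query.lower()
    return [code for code, sci, common in TAXA
            if q in code.lower() or q in sci.lower() or q in common.lower()]


def itis_to_caab(itis_number: int) -> Optional[str]:
    """Translate an ITIS number to a CAAB code, or None if no cross-reference is held."""
    return ITIS_TO_CAAB.get(itis_number)


def caab_to_itis(caab_code: str) -> Optional[int]:
    """Translate a CAAB code to an ITIS number, or None if no cross-reference is held."""
    return CAAB_TO_ITIS.get(caab_code)


print(find_taxa("scampi"))
print(itis_to_caab(900001))
```

Returning `None` for a missing cross-reference keeps the "not yet mapped" case distinct from a lookup error, which matters while the ITIS mapping is still being populated.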

CAAB web interface (current version)

Digression #2: taxonomy keywords

• CAAB uses “major categories” (mostly = phyla)

• MarLIN uses Australian “Blue Pages” keywords (c. 100 terms) - independent of CAAB codes (in current implementation)

• NASA GCMD keywords would be an OBIS option (maybe with additions to suit OBIS) - c. 50 currently relevant … could also cross-map to GEMET (EC) list (c.200)

EARTH SCIENCE >> Biosphere >> Zoology >> Amphibians
EARTH SCIENCE >> Biosphere >> Zoology >> Anemones
EARTH SCIENCE >> Biosphere >> Zoology >> Arachnids
EARTH SCIENCE >> Biosphere >> Zoology >> Arthropods
EARTH SCIENCE >> Biosphere >> Zoology >> Birds
EARTH SCIENCE >> Biosphere >> Zoology >> Centipedes
EARTH SCIENCE >> Biosphere >> Zoology >> Corals
EARTH SCIENCE >> Biosphere >> Zoology >> Crustaceans
EARTH SCIENCE >> Biosphere >> Zoology >> Echinoderms
EARTH SCIENCE >> Biosphere >> Zoology >> Fish
EARTH SCIENCE >> Biosphere >> Zoology >> Flatworms
EARTH SCIENCE >> Biosphere >> Zoology >> Insects
EARTH SCIENCE >> Biosphere >> Zoology >> Invertebrates
EARTH SCIENCE >> Biosphere >> Zoology >> Jellyfish
EARTH SCIENCE >> Biosphere >> Zoology >> Mammals
EARTH SCIENCE >> Biosphere >> Zoology >> Millipedes
EARTH SCIENCE >> Biosphere >> Zoology >> Mollusks
EARTH SCIENCE >> Biosphere >> Zoology >> Reptiles
EARTH SCIENCE >> Biosphere >> Zoology >> Roundworms
EARTH SCIENCE >> Biosphere >> Zoology >> Segmented Worms
EARTH SCIENCE >> Biosphere >> Zoology >> Sponges
EARTH SCIENCE >> Biosphere >> Zoology >> Vertebrates
EARTH SCIENCE >> Biosphere >> Zoology >> Zooplankton

EARTH SCIENCE >> Biosphere >> Microbiota >> Amoebae
EARTH SCIENCE >> Biosphere >> Microbiota >> Bacteria
EARTH SCIENCE >> Biosphere >> Microbiota >> Blue-green Algae
EARTH SCIENCE >> Biosphere >> Microbiota >> Ciliates
EARTH SCIENCE >> Biosphere >> Microbiota >> Coccolithophore
EARTH SCIENCE >> Biosphere >> Microbiota >> Diatoms
EARTH SCIENCE >> Biosphere >> Microbiota >> Flagellates
EARTH SCIENCE >> Biosphere >> Microbiota >> Foraminifers
EARTH SCIENCE >> Biosphere >> Microbiota >> Microalgae
EARTH SCIENCE >> Biosphere >> Microbiota >> Microphyte
EARTH SCIENCE >> Biosphere >> Microbiota >> Phytoplankton
EARTH SCIENCE >> Biosphere >> Microbiota >> Plankton
EARTH SCIENCE >> Biosphere >> Microbiota >> Protist
EARTH SCIENCE >> Biosphere >> Microbiota >> Radiolarians
EARTH SCIENCE >> Biosphere >> Microbiota >> Zooplankton

EARTH SCIENCE >> Biosphere >> Vegetation >> Algae
EARTH SCIENCE >> Biosphere >> Vegetation >> Flowering Plants
EARTH SCIENCE >> Biosphere >> Vegetation >> Lichens
EARTH SCIENCE >> Biosphere >> Vegetation >> Macroalgae
EARTH SCIENCE >> Biosphere >> Vegetation >> Macrophyte
EARTH SCIENCE >> Biosphere >> Vegetation >> Phytoplankton

Taxonomy keyword cross-mapping (examples)

GCMD list: Invertebrates, Sponges, Jellyfish, Anemones, Corals, Flatworms, Roundworms, Segmented Worms, Mollusks, Arthropods, Insects, Arachnids, Echinoderms, Crustaceans, Vertebrates, Fish, Amphibians, Reptiles, Birds, Mammals

GEMET list: invertebrate … S709, poriferan … S744, coelenterate … S737, coral … S738, nematode … S743, annelid … S711 ++, mollusc … S740, cephalopod … S741, gastropod … S742, arthropod … S713, insect … S719 ++, chelicerate … S714 ++, echinoderm … S739, crustacean … S717, vertebrate … S649, fish … S754, amphibian … S650 ++, reptile … S691 ++, bird … S654 ++, mammal … S664 ++
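A cross-map of this kind is essentially a lookup table in each direction. The sketch below is illustrative only: the pairings are inferred here from the similarity of the terms in the two lists (they are an assumption, not a published mapping), the one-to-many entry for Mollusks is likewise a guess at how the finer GEMET terms would be grouped, and the function names are hypothetical.

```python
from typing import Dict, List

# GCMD term -> GEMET codes. Pairings inferred from term similarity (assumption).
GCMD_TO_GEMET: Dict[str, List[str]] = {
    "Invertebrates": ["S709"],
    "Corals": ["S738"],
    "Mollusks": ["S740", "S741", "S742"],   # mollusc, cephalopod, gastropod
    "Arthropods": ["S713"],
    "Echinoderms": ["S739"],
    "Crustaceans": ["S717"],
    "Vertebrates": ["S649"],
    "Fish": ["S754"],
}


def gemet_codes(gcmd_term: str) -> List[str]:
    """GEMET codes mapped to a GCMD term (empty list when no mapping is held)."""
    return list(GCMD_TO_GEMET.get(gcmd_term, []))


def gcmd_terms(gemet_code: str) -> List[str]:
    """GCMD terms whose mapping includes the given GEMET code."""
    return [term for term, codes in GCMD_TO_GEMET.items() if gemet_code in codes]


print(gemet_codes("Mollusks"))
print(gcmd_terms("S741"))
```

Keeping the values as lists allows one broad keyword in one scheme to stand for several narrower keywords in the other.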

MarLIN - used for data discovery

• MarLIN - based on an Oracle database containing dataset, project, and survey descriptions, plus on-line links to data and web resources

• Holds metadata according to regional (ANZLIC and “Blue Pages”) standards, with additional agency-constructed fields (“extended ANZLIC”)

• Web interface for searching and metadata contribution/update, using HTML, Oracle Web Server and custom PL/SQL application

• Produces lists of datasets, or dataset reports, as requested

• Includes links to pre-formatted data “packets” (now) and to SQuID (in future), for access to the data

NB: no data visualising capability, apart from “thumbnails” showing data extent

MarLIN - behind the scenes

• Some 25+ tables, holding the following:

– text-based fields (e.g. title, abstract, contributors, references, etc.)

– keywords, handled as numeric IDs (including taxonomic keywords)

– species/species groups, handled as CAAB codes

– spatial extent, handled as bounding coordinates (maximum and minimum latitude and longitude)

– time extent, handled as earliest and latest collection date for items in the dataset

– originator organisation, present custodian, survey, contact person, etc., handled as numeric IDs

• Initial search set up by keyword/ID type, spatial coordinates, time period (if desired)

• Then search/browse by subject categories, keywords, taxon names, contributing project, vessel/voyage identifier, location of data, etc.

• Free text search also supported
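The initial search step, on spatial coordinates and time period with an optional taxon filter, amounts to an overlap test on the stored extents. The Python below is a minimal sketch of that test, not the PL/SQL application: the record layout and the three example datasets are hypothetical, and the query box is the North West Shelf bounding box quoted in the example search results of this talk. The sketch ignores boxes that cross the 180° meridian.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class DatasetRecord:
    title: str
    north: float
    south: float
    west: float
    east: float
    start_year: int
    end_year: int
    caab_codes: Set[str] = field(default_factory=set)


def search(records: List[DatasetRecord], north: float, south: float,
           west: float, east: float, start_year: int, end_year: int,
           caab_code: Optional[str] = None) -> List[str]:
    """Titles of datasets whose extents overlap the query box and period."""
    hits = []
    for r in records:
        spatial = (r.south <= north and r.north >= south
                   and r.west <= east and r.east >= west)
        temporal = r.start_year <= end_year and r.end_year >= start_year
        taxon = caab_code is None or caab_code in r.caab_codes
        if spatial and temporal and taxon:
            hits.append(r.title)
    return hits


# Hypothetical records, invented for this example.
records = [
    DatasetRecord("Example dataset A", -18, -21, 115, 118, 1990, 1990, {"37 118001"}),
    DatasetRecord("Example dataset B", -40, -44, 144, 149, 1992, 1993, {"37 258002"}),
    DatasetRecord("Example dataset C", -19, -20, 116, 117, 1980, 1985, {"37 118001"}),
]

# North West Shelf box from the example results: North=-17, West=114, South=-24, East=122.
print(search(records, -17, -24, 114, 122, 1990, 1995, caab_code="37 118001"))
```

Dataset B fails the spatial test and dataset C fails the time test, so only dataset A is returned.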

MarLIN search interface

Example MarLIN search result - by taxonomic group

subject categories | custodian organisations | vessels | voyages | projects | taxonomic groups | species | habitats | parameters | equipment

The following choices are presently available for MarLIN records in the selected region and/or time period: Start year: 1990 End year: 1995 Selected region: Australian North West Shelf (stored coordinates used: North=-17, West=114, South=-24, East=122)

Click on any hyperlink to see the full listing for that item.

Invertebrates 4
. . . . Cephalopods 1
. . . . . . Squids 1
. . Crustaceans 2
. . . . Prawns & Shrimps 2
Fishes 4
. . Breams 1
. . Dories 1
. . Leatherjackets 1
. . Perches 3
. . Redfishes 1
. . Roughies 1
. . Snappers 4
. . Whales 1

Example MarLIN search result - by species

subject categories | custodian organisations | vessels | voyages | projects | taxonomic groups | species | habitats | parameters | equipment

The following choices are presently available for MarLIN records in the selected region and/or time period: Start year: 1990 End year: 1995 Selected region: Australian North West Shelf (stored coordinates used: North=-17, West=114, South=-24, East=122)

Click on any hyperlink to see the full listing for that item.

23 636004 Nototodarus gouldi .. Gould's squid 1

28 786002 Metanephrops boschmai .. Boschma's scampi 1

28 786005 Metanephrops velutinus .. velvet scampi 1

28 821001 Ibacus alticrenatus .. deepwater bug 1

28 821002 Ibacus pubescens .. [a shovel-nosed/slipper lobster] 1

37 118001 Saurida undosquamis .. brushtooth lizardfish 3

37 118016 Saurida sp. 2 [in Sainsbury et al, 1985] .. grey lizardfish 3

37 255004 Gephyroberyx darwinii .. Darwin's roughy 1

37 258002 Beryx splendens .. alfonsino 1

(etc.)

Example MarLIN search result - dataset titles

You searched on the following criteria:

Start year: 1990 End year: 1995 Selected region: Australian North West Shelf CAAB Species: 37 118001 - Saurida undosquamis

There are 3 datasets matching your criteria in MarLIN at this time. Click on the dataset title to view the metadata record for any dataset.

Southern Surveyor Voyage SS 02/90 - Biological Data Overview

Southern Surveyor Voyage SS 04/91 - Biological Data Overview

Southern Surveyor Voyage SS 08/95 - Biological Data Overview

------------------------------------------------------------------------------

SQuID - data repository and visualisation tool

• Oracle relational database containing c. 45 tables (present version)

• Holds point, poly-line, and polygon based, geo-referenced data (also time and depth referenced)

• Client runs as Java applet, connects to Oracle data store by Remote Method Invocation (RMI) and JDBC

• Search by spatial coordinates, time period, data “stream” … can subset by survey if desired

• Retrieve atomic-level data for inspection or download to user’s system

• Basic plotting routines provided, such as:

– geographic distribution of data (sampling points, vessel tracks)

– vertical plots (e.g. temperature, salinity, oxygen vs depth)

– time-based plots (e.g. water temperature measurement through a voyage)

– pie charts for catch composition by number or weight

– length-frequency data, aggregated or by sex of individual

• Taxon handling using CAAB codes (system includes legacy data with obsolete codes)

• Links to MarLIN to display relevant metadata
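As an illustration of the aggregation behind a catch-composition pie chart, the sketch below computes each taxon's share of a catch by number or by weight, keyed on CAAB code. It is a minimal example under stated assumptions: the row layout, the function name and the catch figures are invented for the illustration and do not come from SQuID.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# One catch row: (CAAB code, number of individuals, weight in kg).
CatchRow = Tuple[str, int, float]


def catch_composition(rows: Iterable[CatchRow], by: str = "number") -> Dict[str, float]:
    """Percentage share of the catch per CAAB code, by 'number' or by 'weight'."""
    if by not in ("number", "weight"):
        raise ValueError("by must be 'number' or 'weight'")
    totals: Dict[str, float] = defaultdict(float)
    for code, count, weight_kg in rows:
        totals[code] += count if by == "number" else weight_kg
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {code: 100.0 * value / grand_total for code, value in totals.items()}


# Invented catch figures, for illustration only.
rows = [
    ("37 118001", 30, 6.0),
    ("37 258002", 10, 14.0),
    ("37 118001", 10, 2.0),
]

print(catch_composition(rows, by="number"))
print(catch_composition(rows, by="weight"))
```

The same catch can look quite different by number and by weight, which is why both views are offered.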

SQuID user interface - version 1.0

Example SQuID search result

SQuID atomic level data - example

Time series data in SQuID

SQuID vs MarLIN / CAAB - two different approaches

SQuID - a data-rich browser environment

• Large files uploaded to the browser to allow interactive functions (zoomable maps, on-demand display of sample details, cursor tracking, browser-generated plots)

• Disadvantages: more complex applet to load, longer waits for queries to be serviced, performance on user’s machine may be limiting

MarLIN & CAAB - a minimal browser environment

• No reliance on Java version control, browser plugins etc., no load time at startup

• All processing takes place on the server (can maximise performance there) - less stringent requirements for users in hardware terms

• Disadvantage: less real-time interactivity provided (although some workarounds possible)

… May look at a hybrid solution for SQuID v2 - prioritise what level of interactivity/data upload is really needed, handle more at server level

Some considerations for OBIS ...

• For agency-specific reasons, we have arrived at separate metadata/data systems. OBIS might want to integrate these two aspects more fully

• Automated generation/maintenance of metadata might be possible (at least in part) and is certainly desirable

• Where would OBIS metadata reside? (centrally or replicated or fully distributed?) - Australian “ASDD” is an example of a fully distributed system, NASA “GCMD” is a centralised one

• Need to decide on taxon handling for OBIS (names or codes), plus standard(s) for higher level searching

• OBIS software should aim to tolerate a diversity of agency-level systems, while encouraging/facilitating “best practice” data management

The End

CAAB web search
