Common Data Models and Protocols
Richard White, Cardiff University
Talk given at “Making Species Databases Interoperable”, Reading, 15 July 2004
SPICE for Species 2000
Funded in the UK by the BBSRC/EPSRC Bioinformatics Initiative
Universities of Cardiff & Reading
http://www.systematics.rdg.ac.uk/spice/
Species 2000
The story so far ...
Species 2000 is an international collaborative project to create and provide access to an authoritative and up-to-date checklist and index to all the world’s species.
How is it going to do this?
Species 2000 services to users
– Dynamic Checklist
– Annual Checklist
– Web site, including database links submitted by users or producers
– Distribution media, including downloaded data
– Index to species information (hyperlinks to SISs)
– Packaged functions providing services to other software
Species 2000 organisation
Taxonomic hierarchy (or hierarchies)
Species
Global species databases (GSDs) and interim checklists: the species index
Species information sources (SISs): regional faunas and floras, specialist or sectoral databases, web pages etc.
[Diagram labels: GSD; interim checklists; SIS]
Merging & Linking
Merging: The original databases are physically copied into a new combined database
Linking: The original databases remain separate, but are accessed through a single system
Merging
1. The original databases are physically copied into a new combined database.
2. The user interacts with the new combined database.
[Diagram: “Plants of Europe” and “Plants of Africa” are copied (1) into “Plants of the World”, which the user then queries (2)]
Linking
1. The user interacts with an access system which does not itself contain data.
2. When the user requests data, it is fetched from the appropriate database.
[Diagram: the user queries the access system “Plants of the World” (1), which fetches data from “Plants of Europe” and “Plants of Africa” (2)]
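The difference between the two approaches can be sketched in a few lines of Python. This is only an illustration: the dictionaries stand in for real databases, and the names `merge` and `linked_lookup` are hypothetical, not part of SPICE or Species 2000.

```python
# Toy data standing in for two source databases (illustrative only):
# scientific name -> family.
EUROPE = {"Bellis perennis": "Asteraceae"}
AFRICA = {"Adansonia digitata": "Malvaceae"}
SOURCES = {"Plants of Europe": EUROPE, "Plants of Africa": AFRICA}


def merge(sources):
    """Merging: copy every record into one new combined database."""
    combined = {}
    for records in sources.values():
        combined.update(records)  # physical copy, taken once
    return combined


def linked_lookup(sources, name):
    """Linking: hold no data; fetch from the source databases on request."""
    for source_name, records in sources.items():
        if name in records:
            return source_name, records[name]
    return None


world = merge(SOURCES)

# A later update to a source database is invisible to the merged copy ...
AFRICA["Protea cynaroides"] = "Proteaceae"
merged_sees_update = "Protea cynaroides" in world

# ... but is visible through the linking access system.
linked_result = linked_lookup(SOURCES, "Protea cynaroides")
```

In this sketch the merged copy goes stale as soon as a source changes, while the linked lookup always reflects the current state of the source databases.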
Architecture of Species 2000
[Diagram labels: User interface; Data collector; CAS (Common Access System) or “harness”; Protocol; three Wrapper + GSD pairs forming a distributed array of databases]
Need for communication
Different people are building the various components of the system:
– GSDs
– wrappers
– CAS
– user interface
We need to ensure they all have a common understanding of the data to avoid embarrassing mistakes
Database wrappers
Only the interface to the CAS needs to speak CORBA
Wrappers must:
– Translate CAS requests into a form suitable for the GSD (e.g. SQL) and translate responses back
– Deal with other kinds of heterogeneity, including schema heterogeneity
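As a rough sketch of what such a wrapper does, the Python below translates a name search into SQL for one GSD and maps the rows back to common field names. Everything here is an assumption made for illustration: the class `SqlWrapper`, the table and column names, and the output field names `name` and `authority` are hypothetical, and a plain method call stands in for the CORBA interface.

```python
import sqlite3


class SqlWrapper:
    """Hypothetical wrapper: turns a CAS name search into SQL for one GSD."""

    def __init__(self, connection, table, name_column, author_column):
        # Each GSD may use different table and column names
        # (schema heterogeneity); the wrapper hides that from the CAS.
        self.connection = connection
        self.table = table
        self.name_column = name_column
        self.author_column = author_column

    def search_name(self, name):
        # Identifiers come from the wrapper's own configuration, not from
        # user input; the search term is passed as a bound parameter.
        sql = (
            f"SELECT {self.name_column}, {self.author_column} "
            f"FROM {self.table} WHERE {self.name_column} = ?"
        )
        rows = self.connection.execute(sql, (name,)).fetchall()
        # Translate the response back into common field names.
        return [{"name": n, "authority": a} for n, a in rows]


# Toy GSD with its own local schema (illustrative only).
gsd = sqlite3.connect(":memory:")
gsd.execute("CREATE TABLE taxa (sci_name TEXT, auth TEXT)")
gsd.execute("INSERT INTO taxa VALUES ('Bellis perennis', 'L.')")

wrapper = SqlWrapper(gsd, "taxa", "sci_name", "auth")
results = wrapper.search_name("Bellis perennis")
```

A second GSD with different table and column names would get its own `SqlWrapper` configuration, while the caller sees the same field names from both.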
Data flow through a wrapper
[Diagram labels: CAS; Wrapper interface; Divided wrapper; External wrapper; GSD; XML; Strings, e.g. CGI]
Common Data Model
We need a Common Data Model (CDM)
– A definition of the information being passed to and fro
– Human-readable, not machine-readable
– This is used as a reference when creating specific implementations for CGI/XML (DTD, XML Schema), Web Services, etc.
What does the CDM look like?
It defines the input (“request”) and output (“response”) for six fundamental operations which the system needs to be able to carry out
Request Types 0-6
– Type 0: Get version of the CDM with which the GSD complies
– Type 3: Get information about the GSD
– Type 1: Search for a name in the GSD
– Type 2: Fetch “standard data” about a chosen species
– Type 4: Move up the taxonomic hierarchy
– Type 5: Move down the taxonomic hierarchy
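One way an implementation might represent these request types is as an enumeration keyed by the type number. The sketch below covers only the six types listed above; the member names and the `describe` helper are hypothetical and do not come from the CDM itself.

```python
from enum import IntEnum


class RequestType(IntEnum):
    """Request type numbers as listed on the slide; member names are invented."""

    GET_CDM_VERSION = 0
    SEARCH_NAME = 1
    FETCH_STANDARD_DATA = 2
    GET_GSD_INFO = 3
    MOVE_UP_HIERARCHY = 4
    MOVE_DOWN_HIERARCHY = 5


DESCRIPTIONS = {
    RequestType.GET_CDM_VERSION: "Get version of the CDM with which the GSD complies",
    RequestType.SEARCH_NAME: "Search for a name in the GSD",
    RequestType.FETCH_STANDARD_DATA: "Fetch standard data about a chosen species",
    RequestType.GET_GSD_INFO: "Get information about the GSD",
    RequestType.MOVE_UP_HIERARCHY: "Move up the taxonomic hierarchy",
    RequestType.MOVE_DOWN_HIERARCHY: "Move down the taxonomic hierarchy",
}


def describe(code):
    """Look up a numeric request type; raises ValueError for unknown codes."""
    return DESCRIPTIONS[RequestType(code)]


search_description = describe(1)
```

Rejecting unknown codes with an error, rather than ignoring them, gives the CAS and the wrappers an early signal when they disagree about the request types.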
Type 0 Request
Request:
– (nothing)
Response:
– CDMVersion
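Since the CDM is a reference for concrete XML implementations, a Type 0 response might be encoded along the following lines. This is a guess at one possible encoding, not the SPICE XML format: the `Response` element, its `type` attribute and the version string "1.0" are placeholders, and only the field name `CDMVersion` comes from the slide.

```python
import xml.etree.ElementTree as ET


def build_type0_response(cdm_version):
    """Serialise a Type 0 response; element layout is an assumption."""
    root = ET.Element("Response", {"type": "0"})
    ET.SubElement(root, "CDMVersion").text = cdm_version
    return ET.tostring(root, encoding="unicode")


def parse_type0_response(xml_text):
    """Extract the CDMVersion field from a Type 0 response."""
    root = ET.fromstring(xml_text)
    return root.findtext("CDMVersion")


# "1.0" is a placeholder value, not a real CDM version number.
xml_text = build_type0_response("1.0")
version = parse_type0_response(xml_text)
```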
Type 3 Request
Request:
– GSDIdentifier
Response:
– GSDInfo (a set of fields including its name,
One way to make links appear more intelligent is to create and maintain “cross-maps” which describe how one or more taxa in one resource (such as the Species 2000 index) relate to one or more taxa in another resource
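A cross-map of this kind is essentially a many-to-many mapping between the taxa of two resources. The sketch below shows one minimal way to hold it in memory. The `CrossMap` class and the taxon identifiers ("A1", "B1", and so on) are hypothetical placeholders, and it records only that taxa are related, not the type of concept relationship between them.

```python
from collections import defaultdict


class CrossMap:
    """Hypothetical cross-map from taxa in resource A to taxa in resource B."""

    def __init__(self):
        self._links = defaultdict(set)

    def relate(self, taxa_a, taxa_b):
        """Record that each taxon in taxa_a relates to every taxon in taxa_b."""
        for a in taxa_a:
            self._links[a].update(taxa_b)

    def targets(self, taxon_a):
        """Return the taxa in resource B linked to one taxon in resource A."""
        return sorted(self._links.get(taxon_a, ()))


cross_map = CrossMap()
# One taxon in A corresponds to two taxa in B.
cross_map.relate(["A1"], ["B1", "B2"])
# Two taxa in A correspond to a single taxon in B.
cross_map.relate(["A2", "A3"], ["B3"])

split_targets = cross_map.targets("A1")
lumped_targets = cross_map.targets("A2")
```

A link follower could then consult `targets` to offer every related taxon in the other resource instead of relying on an exact name match.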
Litchi 2.2 in use
[Diagram labels: Checklist A; Checklist B; Read into system; Rules; Heuristics; Taxonomic intelligence; Conflict detection; Inference of concept relationships; Write; Concept relationships; Cross-map]
More about cross-maps
They may be created and maintained
– manually by experts
– automatically or semi-automatically by LITCHI (as above)
– by monitoring the behaviour of users following species links
– by analysing data sets describing the taxa, when sufficient such data is available, using the usual species taxonomy tools (phenetic and cladistic analyses)
More about cross-maps
They may be held
– by individual GSDs, describing how to link their species to selected related resources, as ILDIS has done for linking to the Northern Eurasia (aka USSR) database
– by Species 2000 as a repository and service to facilitate intelligent species links
– by an “intelligent linking engine”, as planned for Species 2000 Europa to link its two hubs
A dream
A system for managing intelligent species links using taxonomic concept relationships would maximise the potential of the plethora of species-based catalogues, indexes and rich species resources currently being assembled all over the world
Perhaps on the Web, as with the current Spice/Species 2000 prototype
Or ...
The Grid
Or maybe on the Grid
– One of the aims of which is to provide access to such knowledge sources as species checklists, synonymy servers, rich species data sets, and cross-maps, for example in the Biodiversity World project