Science Environment for Ecological Knowledge. Bertram Ludäscher, San Diego Supercomputer Center, University of California, San Diego.
Post on 16-Jan-2016
Science Environment for Ecological Knowledge
Bertram Ludäscher
San Diego Supercomputer Center, University of California, San Diego
http://seek.ecoinformatics.org
UC Santa Barbara
UC San Diego
U New Mexico
U Kansas
Vermont, Napier, ASU, UNC
SEEK Overview, 3/2004 2
Architecture Overview
• Analysis & Modeling System
  – Design and execution of ecological models and analysis
  – End user focus
  – application-/upperware
• Semantic Mediation System
  – Data integration of hard-to-relate sources and processes
  – Semantic Types and Ontologies
  – upper middleware
• EcoGrid
  – Access to ecology data and tools
  – middle-/underware
• Plus Working Groups:
– Knowledge Representation (SEEK-KR)
– Classification and Nomenclature (TAXON)
– Biodiversity and Ecological Analysis and Modeling (BEAM)
(cf. GEON + Cyberinfrastructure)
SEEK EcoGrid
• Goal: standardize interfaces (using web and grid services)
  – We have standardized data via EML
  – Integrate diverse data networks from ecology, biodiversity, and environmental sciences
• Grid-standardized interfaces
  – Uniform interface to:
    • Metacat, SRB, DiGIR, Xanthoria, etc.
    • Anyone can implement these interfaces
    • Hides complexity of underlying systems
• Metadata-mediated data access
  – Supports multiple metadata standards
  – EML, Darwin Core as foci
• Computational services
  – Pre-defined analytical services
  – On-the-fly analytical services
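The "uniform interface" idea can be illustrated with a minimal Python sketch. Everything named here (`EcoGridNode`, `InMemoryNode`, `federated_query`, the sample records) is hypothetical and stands in for the real EcoGrid interfaces, which the slides do not show. The point is only that a client queries every backend through one shared method.

```python
from abc import ABC, abstractmethod


class EcoGridNode(ABC):
    """Hypothetical uniform interface; not the actual EcoGrid API."""

    @abstractmethod
    def query(self, keyword: str) -> list[dict]:
        """Return metadata records whose title contains the keyword."""


class InMemoryNode(EcoGridNode):
    """Stand-in backend; a real adapter would wrap Metacat, SRB, DiGIR, etc."""

    def __init__(self, name: str, records: list[dict]):
        self.name = name
        self._records = records

    def query(self, keyword: str) -> list[dict]:
        kw = keyword.lower()
        # Tag each hit with the node it came from.
        return [dict(r, node=self.name) for r in self._records
                if kw in r["title"].lower()]


def federated_query(nodes: list[EcoGridNode], keyword: str) -> list[dict]:
    """Fan one query out over all nodes through the shared interface."""
    hits = []
    for node in nodes:
        hits.extend(node.query(keyword))
    return hits


nodes = [
    InMemoryNode("metacat-demo", [{"title": "Kelp biomass survey", "standard": "EML"}]),
    InMemoryNode("digir-demo", [{"title": "Kelp specimen records", "standard": "Darwin Core"},
                                {"title": "Bird counts", "standard": "Darwin Core"}]),
]
results = federated_query(nodes, "kelp")
```

Adding another kind of backend then means writing one more subclass; the client code does not change.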
Grid versus Web Services
• Grid Services are Web Services
  – Add authentication, lifecycle management, notification, etc.
  – Globus Toolkit 3: Implements Open Grid Services Architecture (OGSA)
• Implications for use
  – Write a normal web service extending GridService base class
  – When deployed within GT3, you get these extra functions for ‘free’
  – Supports distributed computation via proxy authentication
• Problems
  – Complex system to understand
  – GT3 can be difficult to deploy
  – Proposals to incorporate grid services within the Web services community (Web Services Resource Framework [WSRF])
EcoGrid client interactions
• Modes of interaction
  – Client-server
  – Fully distributed
  – Peer-to-peer
• EcoGrid Registry
  – Node discovery
  – Service discovery
• Aggregation services
  – Centralized access
  – Reliability
  – Data preservation
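A registry that supports node discovery and service discovery could look roughly like the sketch below. The `Registry` class, its method names, and the node and service names are all assumptions made for illustration; the slides do not describe the registry's actual interface.

```python
from dataclasses import dataclass, field


@dataclass
class Registry:
    """Hypothetical in-memory registry: node name -> set of service names."""
    nodes: dict[str, set[str]] = field(default_factory=dict)

    def register(self, node: str, services: list[str]) -> None:
        """Record a node and the services it offers."""
        self.nodes.setdefault(node, set()).update(services)

    def discover_nodes(self) -> list[str]:
        """Node discovery: list every registered node."""
        return sorted(self.nodes)

    def discover_service(self, service: str) -> list[str]:
        """Service discovery: list the nodes that offer a given service."""
        return sorted(n for n, s in self.nodes.items() if service in s)


registry = Registry()
registry.register("node-a", ["query", "get"])
registry.register("node-b", ["query", "analysis"])
```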
Building the EcoGrid
[Figure: map of EcoGrid nodes. Legend: Metacat node, SRB node, DiGIR node, VegBank node, Xanthoria node, Legacy system. Site labels: AND, LUQ, HBR, NTL, VCR.]
• LTER Network (24)
• Natural History Collections (>> 100)
• Organization of Biological Field Stations (180)
• UC Natural Reserve System (36)
• Partnership for Interdisciplinary Studies of Coastal Oceans (4)
• Multi-agency Rocky Intertidal Network (60)
Kepler: Scientific Workflows
EML provides semi-automated data binding
Scientific workflows represent knowledge about the process; Kepler captures this knowledge
Query EcoGrid to find data
Archive output to EcoGrid
GARP Invasive Species Model
[Figure: GARP invasive species workflow (slide from D. Pennington). Elements, with the figure's letter labels:
(a) DiGIR species presence & absence points (native range; invasion area)
(b) SRB environmental layers (native range; invasion area)
(c) Integrated layers (native range; invasion area)
(d) Training sample; test sample
(e) GARP rule set
(f) Prediction map (native range; invasion area)
(g) Model quality parameter
Other labels in the figure: EcoGrid Query, Layer Integration, Sample, Data, Calculation, Map, Validation, User, +A1, +A2, +A3.]
Scientific workflows represent knowledge about the process; AMS captures this knowledge
Kepler Team, Projects, Sponsors
• Ilkay Altintas SDM
• Chad Berkley SEEK
• Shawn Bowers SEEK
• Jeffrey Grethe BIRN
• Christopher H. Brooks Ptolemy II
• Zhengang Cheng SDM
• Efrat Jaeger GEON
• Matt Jones SEEK
• Edward A. Lee Ptolemy II
• Kai Lin GEON
• Bertram Ludäscher BIRN, GEON, SDM, SEEK
• Steve Mock NMI
• Steve Neuendorffer Ptolemy II
• Jing Tao SEEK
• Mladen Vouk SDM
• Yang Zhao Ptolemy II
• …
Ptolemy II
Kepler Understands EML Data (Chad Berkley, SEEK)
Kepler: Ecological Modeling (Chad Berkley, SEEK)
Database Access (Efrat Jaeger, GEON)
Note: EML descriptions of relational sources would allow automated data ingestion
Mineral Classification with Kepler … (Efrat Jaeger, GEON)
Standard BrowserUI: Client-Side SVG
SWF Reengineering (Ilkay, SDM; Ashraf, Efrat, Kai, GEON)
Result launched via BrowserUI actor (coupling with ESRI’s ArcIMS)
Distributed Workflows in KEPLER
• Web and Grid Service plug-ins
  – WSDL (now) and Grid services (stay tuned …)
  – ProxyInit, GlobusGridJob, GridFTP, DataAccessWizard
  – SSH, SCP, SDSC SRB, OGS?-???… coming
• WS Harvester
  – Import query-defined WS operations as Kepler actors
• XSLT and XQuery Data Transformers
  – to link not “designed-to-fit” web services
• WS-deployment interface (planned)
Web Service Actor (Ilkay Altintas, SDM)
Given a WSDL and the name of an operation of a web service, the actor dynamically customizes itself to implement and execute that operation.
Configure - select service operation
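The "customizes itself" behaviour can be sketched in plain Python. This is not Kepler code: `SERVICE_DESCRIPTION` is a hypothetical stand-in for a parsed WSDL, local lambdas stand in for remote operations, and the `configure`/`fire` method names are assumptions. The sketch shows only the idea that the input ports are derived from the selected operation.

```python
from typing import Any, Callable

# Hypothetical stand-in for a parsed WSDL: operation name -> (input names, callable).
SERVICE_DESCRIPTION: dict[str, tuple[list[str], Callable[..., Any]]] = {
    "celsiusToKelvin": (["celsius"], lambda celsius: celsius + 273.15),
    "add": (["a", "b"], lambda a, b: a + b),
}


class WebServiceActor:
    """Generic actor that specializes itself once an operation is selected."""

    def __init__(self, description: dict[str, tuple[list[str], Callable[..., Any]]]):
        self._description = description
        self.operation: str | None = None
        self.input_ports: list[str] = []

    def configure(self, operation: str) -> None:
        """Select an operation; the input ports are derived from its signature."""
        if operation not in self._description:
            raise KeyError(f"unknown operation: {operation}")
        self.operation = operation
        self.input_ports = list(self._description[operation][0])

    def fire(self, **inputs: Any) -> Any:
        """Invoke the selected operation with one value per input port."""
        if self.operation is None:
            raise RuntimeError("actor not configured")
        if sorted(inputs) != sorted(self.input_ports):
            raise ValueError(f"expected inputs {self.input_ports}")
        return self._description[self.operation][1](**inputs)


actor = WebServiceActor(SERVICE_DESCRIPTION)
actor.configure("add")
total = actor.fire(a=2, b=3)
```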
Set Parameters and Commit
Specialized WS Actor (after instantiation)
Web Service Harvester (Ilkay Altintas, SDM)
• Imports the web services in a repository into the actor library.
• Has the capability to search for web services based on a keyword.
Kepler: Grid Services Access (Steve Mock, NMI)
An (oversimplified) Model of the Grid
• Hosts: {h1, h2, h3, …}
• Data@Hosts: d1@{hi}, d2@{hj}, …
• Functions@Hosts: f1@{hi}, f2@{hj}, …
• Given: data/workflow:
  • … as a functional plan: […; Y := f(X); Z := g(Y); …]
  • … as a logic plan: […; f(X,Y) ∧ g(Y,Z); …]
• Find Host Assignment: di → hi, fj → hj for all di, fj
  … s.t. […; d3@h3 := f@h2(d1@h1), …] is a valid plan
[Figure: dataflow X → f → Y → g → Z]
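The slide does not say what makes a plan "valid", so the following Python sketch assumes one plausible rule: a step is valid when its function is available at the host it names and its input is available at the host it names, either initially or as the output of an earlier step. The `Step` type and `is_valid_plan` function are hypothetical names.

```python
from typing import NamedTuple


class Step(NamedTuple):
    """One plan step, read as: out@out_host := func@func_host(arg@arg_host)."""
    out: str
    out_host: str
    func: str
    func_host: str
    arg: str
    arg_host: str


def is_valid_plan(data_at: dict[str, set[str]],
                  funcs_at: dict[str, set[str]],
                  plan: list[Step]) -> bool:
    """Assumed rule: every function and input is available where the step claims."""
    available = {d: set(hosts) for d, hosts in data_at.items()}
    for s in plan:
        if s.func_host not in funcs_at.get(s.func, set()):
            return False
        if s.arg_host not in available.get(s.arg, set()):
            return False
        # The produced item becomes available at its output host.
        available.setdefault(s.out, set()).add(s.out_host)
    return True


data_at = {"d1": {"h1"}}
funcs_at = {"f": {"h2"}, "g": {"h3"}}
plan = [Step("d3", "h3", "f", "h2", "d1", "h1"),   # d3@h3 := f@h2(d1@h1)
        Step("d4", "h1", "g", "h3", "d3", "h3")]   # d4@h1 := g@h3(d3@h3)
```

Reversing the two steps makes the plan invalid under this rule, because `g` would consume `d3` before any step has produced it.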
Shipping & Handling Algebra (SHA)
Logical view
Physical view: SHA Plans
plan Y@C = F@A of X@B =
1. [ X@B to A, Y@A := F@A(X@A), Y@A to C ]
2. [ F@A => B, Y@B := F@B(X@B), Y@B to C ]
3. [ X@B to C, F@A => C, Y@C := F@C(X@C) ]
[Figure: four diagrams of f@a with x@b and y@c, one for the logical view and one for each of plans (1), (2), (3)]
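The three plans can be generated and compared with a short Python sketch. The plan strings follow the slide's notation; the cost model (a plan costs the total size of whatever it ships) is an assumption added for illustration, and `sha_plans` and `cheapest_plan` are hypothetical names.

```python
def sha_plans(f_host: str, x_host: str, y_host: str) -> dict[int, list[str]]:
    """Return the three plans from the slide, in its textual notation."""
    a, b, c = f_host, x_host, y_host
    return {
        1: [f"X@{b} to {a}", f"Y@{a} := F@{a}(X@{a})", f"Y@{a} to {c}"],
        2: [f"F@{a} => {b}", f"Y@{b} := F@{b}(X@{b})", f"Y@{b} to {c}"],
        3: [f"X@{b} to {c}", f"F@{a} => {c}", f"Y@{c} := F@{c}(X@{c})"],
    }


def cheapest_plan(size_x: float, size_f: float, size_y: float) -> int:
    """Assumed cost model: a plan costs the total size of what it ships."""
    costs = {
        1: size_x + size_y,   # ship X to A, then ship Y to C
        2: size_f + size_y,   # ship F to B, then ship Y to C
        3: size_x + size_f,   # ship both X and F to C
    }
    return min(costs, key=costs.get)


plans = sha_plans("A", "B", "C")
best = cheapest_plan(size_x=1000.0, size_f=1.0, size_y=10.0)
```

With a large input and a small function, shipping the function to the data (plan 2) wins under this cost model.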
Grid-Enabling PTII: Handles
1. A → GA: get_handle
2. GA → A: return &X
3. A → B: send &X
4. B → GB: request &X
5. GB → GA: request &X
6. GA → GB: send *X
7. GB → B: send done(&X)
Example: &X = “GA.17”, *X = <some_huge_file>
Candidate Formalisms:
• GridFTP
• SSH, SCP
• SDSC SRB
• OGS?-??? … WSRF?
[Figure: actors A and B in Kepler space, grid nodes GA and GB in Grid space, with arrows numbered 1 to 7]
Logical token transfer (3) requires get_handle(1,2); then exec_handle(4,5,6,7) for completion.
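The seven-step exchange can be simulated in a few lines of Python. The `GridNode` class and its methods are hypothetical; the sketch only mirrors the message sequence on the slide, in which actors pass the small handle &X while the grid nodes move the large payload *X.

```python
class GridNode:
    """Hypothetical grid-side store that hands out handles for large objects."""

    def __init__(self, name: str):
        self.name = name
        self._store: dict[str, bytes] = {}
        self._counter = 0

    def get_handle(self, payload: bytes) -> str:
        """Steps 1-2: store the payload and return a handle such as 'GA.1'."""
        self._counter += 1
        handle = f"{self.name}.{self._counter}"
        self._store[handle] = payload
        return handle

    def send(self, handle: str) -> bytes:
        """Step 6: hand over the real data (*X) for a handle."""
        return self._store[handle]

    def exec_handle(self, handle: str, source: "GridNode") -> str:
        """Steps 4-7: fetch *X from the source node, then report completion."""
        self._store[handle] = source.send(handle)
        return f"done({handle})"


ga, gb = GridNode("GA"), GridNode("GB")
handle = ga.get_handle(b"some huge file")   # steps 1-2: actor A obtains &X from GA
token = handle                              # step 3: A sends only &X to actor B
ack = gb.exec_handle(token, ga)             # steps 4-7: GB pulls *X from GA
```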
Homogeneous Data Integration
• Integration of homogeneous or mostly homogeneous data via EML metadata is relatively straightforward
Heterogeneous Data Integration
• Requires advanced metadata and processing
  – Attributes must be semantically typed
  – Collection protocols must be known
  – Units and measurement scale must be known
  – Measurement relationships must be known
    • e.g., that ArealDensity=Count/Area
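A measurement relationship such as ArealDensity = Count/Area can be made executable once values carry a semantic type and a unit. The sketch below is a minimal illustration; the `Measurement` class, the type names as strings, and the sample values are assumptions, not SEEK's data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Measurement:
    value: float
    semantic_type: str   # e.g. "Count", "Area", "ArealDensity"
    unit: str


def areal_density(count: Measurement, area: Measurement) -> Measurement:
    """Derive ArealDensity = Count / Area, checking the semantic types first."""
    if count.semantic_type != "Count" or area.semantic_type != "Area":
        raise TypeError("areal_density needs a Count and an Area")
    if area.value <= 0:
        raise ValueError("area must be positive")
    return Measurement(count.value / area.value, "ArealDensity",
                       f"{count.unit}/{area.unit}")


density = areal_density(Measurement(120, "Count", "individuals"),
                        Measurement(40.0, "Area", "m^2"))
```

Because the types are checked before dividing, passing two counts or swapping the arguments raises an error instead of silently producing a meaningless number.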
Semantic Mediation
• Label data with semantic types
• Label inputs and outputs of analytical components with semantic types
• Use reasoning engines to generate transformation steps
  – Beware analytical constraints
• Use reasoning engine to discover relevant components
[Figure: Data, Ontology, Workflow Components]
Ecological ontologies
• What was measured (e.g., biomass)
• Type of measurement (e.g., Energy)
• Context of measurement (e.g., Psychotria limonensis)
• How it was measured (e.g., dry weight)
• SEEK intends to enable community-created ecological ontologies using OWL
  – Represents a controlled vocabulary for ecological metadata
Extensions: Semantic Types
• Take concepts and relationships from an ontology to “semantically type” the data-in/out ports
• Application: e.g., design support:
  – smart/semi-automatic wiring, generation of “massaging actors”
[Figure: actor m1 (normalize) with ports p3 and p4]
Takes Abundance Count Measurements for Life Stages
Returns Mortality Rate Derived Measurements for Life Stages
Semantic Types
• The semantic type signature
  – Type expressions over the (OWL) ontology
[Figure: actor m1 (normalize) with ports p3 and p4]
SemType m1 ::
Observation & itemMeasured.AbundanceCount &
hasContext.appliesTo.LifeStageProperty
->
DerivedObservation & itemMeasured.MortalityRate &
hasContext.appliesTo.LifeStageProperty
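How such signatures could support "smart wiring" can be sketched with a toy subsumption check in Python. This is a strong simplification and not an OWL reasoner: each conjunct of a type expression is treated as an opaque name, the `SUBCLASS_OF` table is an assumed toy ontology (including the assumption that DerivedObservation is a kind of Observation), and `can_connect` is a hypothetical helper.

```python
# Toy ontology (assumed, not SEEK's): child concept -> parent concept.
SUBCLASS_OF = {
    "DerivedObservation": "Observation",
    "Observation": "Thing",
}


def is_subconcept(child: str, parent: str) -> bool:
    """True if child equals parent or reaches it by following SUBCLASS_OF links."""
    while child != parent:
        if child not in SUBCLASS_OF:
            return False
        child = SUBCLASS_OF[child]
    return True


def can_connect(provided: set[str], required: set[str]) -> bool:
    """Each required conjunct must be subsumed by some provided conjunct."""
    return all(any(is_subconcept(p, r) for p in provided) for r in required)


# Output and input types of m1, with conjuncts kept as plain names.
m1_out = {"DerivedObservation", "itemMeasured.MortalityRate",
          "hasContext.appliesTo.LifeStageProperty"}
m1_in = {"Observation", "itemMeasured.AbundanceCount",
         "hasContext.appliesTo.LifeStageProperty"}
# A hypothetical downstream port that accepts any life-stage observation.
needs_any_observation = {"Observation", "hasContext.appliesTo.LifeStageProperty"}
```

Under these assumptions the output of m1 can feed a port that accepts any life-stage observation, but it cannot be wired back into m1's own input, which requires an abundance count.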
Extended Type System (here: OWL Semantic Types)
SemType m1 :: Observation & itemMeasured.AbundanceCount & hasContext.appliesTo.LifeStageProperty -> DerivedObservation & itemMeasured.MortalityRate & hasContext.appliesTo.LifeStageProperty
Substructure association:
XML raw-data =(X)Query=> object model =link=> OWL ontology
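The chain from raw XML through a query to an object model and then to ontology concepts can be sketched with Python's standard library. The sample XML, the element names, and the `LINKS` table are invented for illustration, and a simple ElementTree path lookup stands in for the XQuery step.

```python
import xml.etree.ElementTree as ET

# Invented sample data; element names are assumptions.
RAW = """
<dataset>
  <row><stage>egg</stage><count>120</count></row>
  <row><stage>larva</stage><count>45</count></row>
</dataset>
"""

# Hypothetical link table: element name -> ontology concept.
LINKS = {"stage": "LifeStageProperty", "count": "AbundanceCount"}


def to_object_model(xml_text: str) -> list[dict]:
    """Query step: pull each <row> into a plain dictionary."""
    root = ET.fromstring(xml_text.strip())
    return [{child.tag: child.text for child in row} for row in root.findall("row")]


def link_to_ontology(objects: list[dict]) -> list[dict]:
    """Link step: re-key every value by its ontology concept."""
    return [{LINKS[k]: v for k, v in obj.items()} for obj in objects]


typed = link_to_ontology(to_object_model(RAW))
```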
Semantic Types for Scientific Workflows
Deriving Data Transformations from Semantic Service Registration
[Bowers-Ludaescher,DILS’04]
Structural and Semantic Mappings
[Bowers-Ludaescher,DILS’04]
SEEK Impact
• Fundamental improvements for researchers
  – Global access to ecologically relevant data
  – Rapidly locate and utilize distributed computation
  – Capture, reproduce, extend analysis process
Acknowledgements
This material is based upon work supported by:
The National Science Foundation under Grant Numbers 9980154, 9904777, 0131178, 9905838, 0129792, and 0225676.
PBI Collaborators: NCEAS, University of New Mexico (Long Term Ecological Research Network Office), San Diego Supercomputer Center, University of Kansas (Center for Biodiversity Research)
Kepler contributors: SEEK, Ptolemy II, SDM/SciDAC, GEON