Vision for the 21st Century Information Environment in Ecology (Ecoinformatics)
Deana Pennington
University of New Mexico
LTER Network Office
Shawn Bowers
UCSD
San Diego Supercomputer Center
Remotely sensed images capture information that is continuous in space; images can then be compared through time to derive events
Wireless sensors capture information that is continuous in time; sensors can then be compared through space to derive spatial patterns
[Figure: snapshots at t = 1 and t = 2, from which Event A is derived]
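The two comparisons described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the rasters, the sensor readings, the threshold, and the function names `detect_events` and `spatial_pattern` are hypothetical and do not come from the slides.

```python
# Hypothetical 3x3 rasters: one value per pixel (e.g. a vegetation index),
# covering the same area at two acquisition times.
image_t1 = [
    [0.8, 0.8, 0.7],
    [0.8, 0.7, 0.7],
    [0.6, 0.6, 0.6],
]
image_t2 = [
    [0.8, 0.8, 0.7],
    [0.8, 0.2, 0.1],
    [0.6, 0.6, 0.6],
]


def detect_events(before, after, threshold=0.3):
    """Compare two images through time: return (row, col) of changed pixels."""
    events = []
    for r, (row_before, row_after) in enumerate(zip(before, after)):
        for c, (b, a) in enumerate(zip(row_before, row_after)):
            if abs(a - b) > threshold:  # assumed change threshold
                events.append((r, c))
    return events


# Hypothetical sensor records: one continuous time series per location.
sensor_series = {
    "north": [10.0, 10.5, 11.0],
    "south": [14.0, 14.5, 15.0],
}


def spatial_pattern(series):
    """Compare sensors through space: summarise each location by its mean."""
    return {site: sum(values) / len(values) for site, values in series.items()}


events = detect_events(image_t1, image_t2)
pattern = spatial_pattern(sensor_series)
print(events)
print(pattern)
```

The image comparison yields discrete events (pixels that changed between dates), while the sensor comparison yields a pattern across locations.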
History Repeats Itself…
“…use of remotely sensed data…lagged for many years. The reasons for this have little to do with the sophistication of remote sensing technology. Rather it has to do more with the ability to store, manage, access and use the massive data produced by satellites, radar facilities and other remote sensing instruments. Without advanced information processing, it would take decades to compile and analyze the incredible amounts of information that produced by many of these instruments.”
-Dr. Rita Colwell, Director NSF, 1998
• Sensors
• Deployed Sensor Networks
• Metadata
• Security and Error Resiliency
• Cyberinfrastructure for Sensor Networks
• Analysis and Visualization
• Education
• Outreach
• Collaboration and Partnering
Environmental Cyberinfrastructure Needs for Distributed Sensor Networks: a Report from a NSF Sponsored Workshop (2003)
[Figure: four stages]
Information Acquisition, Archival & Retrieval
Data Preprocessing & Product Creation
Integrated Data Analysis & Synthesis
Inference From Pattern
Incorporating IT Analytical Advances into
The Semantic Web [Sci. Am., May ‘01, Berners-Lee]
Extend the current web with “knowledge” and “meaning” for
• Better searching (that is, better answers to current searches)
• Automated software tools that process web information (comparison shopping, making appointments, and so on)
Proposes a new form of web content, which uses ontologies and knowledge representation techniques
Semantic-Web Agent
“Mom needs to see a specialist for a series of physical therapy sessions – can you take her?”
Find physical therapist for mom using my schedule
• get openings
• get physician prescription
• get possible providers and availability
• get locations
Return provider available within 10 miles of location
Semantic Web Architecture (RDF)
The Resource Description Framework (RDF), which is a language to:
• Define standard ontologies
• Annotate web-pages with Semantic-Web content
Ultimately, tools … to exploit semantic mark up
• Web-crawlers, search engines, personal agents
RDF / RDF Schema
An RDF Schema (or OWL) ontology
• Serves as a common set of terms (a vocabulary) with relationships and constraints
• Can be published as Web-content using RDF (for others to use)
[Diagram: classes InsuranceProvider, Physician, PhysicalTherapist, MedicalFacility, Location; relationships covers, worksAt, locatedAt]
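A vocabulary with relationships and constraints, as described above, can be sketched with plain Python data structures. This is not RDF Schema or OWL syntax. The class and relationship names come from the slide (with the spelling normalised to "Physician"), but the direction of each relationship, the subclass link, and the helper names `is_a` and `statement_allowed` are assumptions made for illustration.

```python
# Classes named on the slide (spelling normalised to "Physician").
CLASSES = {
    "InsuranceProvider",
    "Physician",
    "PhysicalTherapist",
    "MedicalFacility",
    "Location",
}

# Assumption: PhysicalTherapist is modelled as a kind of Physician.
SUBCLASS_OF = {"PhysicalTherapist": "Physician"}

# Assumption: each relationship maps a subject class (domain) to an object
# class (range); the slide does not state the directions explicitly.
PROPERTIES = {
    "covers": ("InsuranceProvider", "Physician"),
    "worksAt": ("Physician", "MedicalFacility"),
    "locatedAt": ("MedicalFacility", "Location"),
}


def is_a(cls, ancestor):
    """True if cls equals ancestor or inherits from it via SUBCLASS_OF."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = SUBCLASS_OF.get(cls)
    return False


def statement_allowed(subject_cls, prop, object_cls):
    """Check a statement against the domain/range constraint of its property."""
    domain, range_ = PROPERTIES[prop]
    return is_a(subject_cls, domain) and is_a(object_cls, range_)


print(statement_allowed("PhysicalTherapist", "worksAt", "MedicalFacility"))
print(statement_allowed("Location", "worksAt", "MedicalFacility"))
```

The constraint check is what lets a tool reject a statement such as a Location that worksAt a facility, while accepting the same statement about a PhysicalTherapist through the assumed subclass link.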
RDF / RDF Schema
With RDF, this Web-page can be annotated using the ontology
[Diagram: the ontology classes (InsuranceProvider, Physician, PhysicalTherapist, MedicalFacility, Location) linked to the page content]
BlueCross covers Dr. Hartman
Dr. Hartman worksAt UniversityHospital
UniversityHospital locatedAt 555 Univ. Drive …
RDF / RDF Schema
Annotations provide access to the meaningful, or semantic content of the Web-page
[Diagram: the annotated page (BlueCross covers Dr. Hartman, Dr. Hartman worksAt UniversityHospital, UniversityHospital locatedAt 555 Univ. Drive …) under the ontology classes InsuranceProvider, Physician, PhysicalTherapist, MedicalFacility, Location]
Which Physical Therapists workAt a Facility within Location X?
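Once a page is annotated, the question on the slide becomes a lookup over subject/predicate/object statements. The following is a minimal sketch with several assumptions: triples are held in a plain Python list rather than an RDF store, Dr. Hartman is typed as a PhysicalTherapist, "within Location X" is simplified to an exact match on the address, and the address string is only the visible part of the truncated slide text. The helper names `match` and `therapists_at` are hypothetical.

```python
# (subject, predicate, object) statements. Instance names are from the slide;
# the "type" statements are assumptions added so the query has something to use.
TRIPLES = [
    ("BlueCross", "type", "InsuranceProvider"),
    ("Dr. Hartman", "type", "PhysicalTherapist"),
    ("UniversityHospital", "type", "MedicalFacility"),
    ("BlueCross", "covers", "Dr. Hartman"),
    ("Dr. Hartman", "worksAt", "UniversityHospital"),
    # The address is truncated on the slide; only the visible part is used.
    ("UniversityHospital", "locatedAt", "555 Univ. Drive"),
]


def match(s=None, p=None, o=None):
    """Return all triples matching the given pattern (None is a wildcard)."""
    return [
        t for t in TRIPLES
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def therapists_at(location):
    """Which PhysicalTherapists work at a facility located at `location`?"""
    facilities = {s for s, _, _ in match(p="locatedAt", o=location)}
    therapists = {s for s, _, _ in match(p="type", o="PhysicalTherapist")}
    return sorted(
        s for s, _, o in match(p="worksAt")
        if s in therapists and o in facilities
    )


print(therapists_at("555 Univ. Drive"))
```

A real agent would need a distance or containment test for "within"; the exact-match shortcut here only shows how the annotations make the page content queryable.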
SEEK and the Semantic Web
We want to build technology using Semantic-Web standards to …
… explore the use of semantics to help scientists deal with heterogeneity
• Define standard ecological ontologies
• Automate dataset and analytic-step discovery, exchange, and integration
• Help researchers construct and reuse scientific workflows, for example, for ecological modeling
SEEK EcoGrid
Pipeline
1. Question of interest
2. Query EcoGrid for workflows (ontologies)
3. Query EcoGrid for data (ontologies & semantic mediation)
4. SRB optimizes and runs analysis
5. Get results…archive to EcoGrid
Resources (data & computational)
Managed by Storage Resource Broker (SRB)
EcoGrid
Matt Jones, 2003
Analytical Services
Data Services (includes analytical libraries)
Storage Resource Broker
1. Node Registry
• Web service: XML standards, SOAP/WSDL protocols
• Data: REQUIRES standard metadata (EML and others)
• Workflows: standard workflow metadata?
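The five pipeline steps listed on this slide can be traced with a toy, in-memory stand-in for EcoGrid. Nothing here reflects the actual EcoGrid or SRB interfaces: the registries, dataset labels, concept names, and function names are all hypothetical, and the analysis is run by a direct function call where the slide has SRB scheduling it.

```python
# Hypothetical ontology mapping local dataset labels to a shared concept.
# This lookup plays the role of semantic mediation in step 3.
CONCEPT_OF = {
    "precip": "Precipitation",
    "rainfall": "Precipitation",
    "temp": "Temperature",
}

# Hypothetical workflow registry, annotated with the concept each one needs.
WORKFLOWS = {
    "mean_value": {
        "needs": "Precipitation",
        "run": lambda values: sum(values) / len(values),
    },
}

# Hypothetical data registry; two sites label the same concept differently.
DATASETS = [
    {"id": "site_a", "label": "precip", "values": [10.0, 20.0]},
    {"id": "site_b", "label": "rainfall", "values": [30.0]},
    {"id": "site_c", "label": "temp", "values": [15.0]},
]

ARCHIVE = []  # stand-in for archiving results back to EcoGrid


def find_workflows(concept):
    """Step 2: query for workflows annotated with the concept."""
    return [name for name, wf in WORKFLOWS.items() if wf["needs"] == concept]


def find_data(concept):
    """Step 3: query for data, mediating labels through the ontology."""
    return [d for d in DATASETS if CONCEPT_OF.get(d["label"]) == concept]


def run_pipeline(concept):
    """Steps 1-5 for a question framed as a single ontology concept."""
    results = []
    for name in find_workflows(concept):
        data = find_data(WORKFLOWS[name]["needs"])
        values = [v for d in data for v in d["values"]]
        result = {
            "workflow": name,
            "inputs": [d["id"] for d in data],
            "value": WORKFLOWS[name]["run"](values),  # step 4 (no SRB here)
        }
        ARCHIVE.append(result)  # step 5
        results.append(result)
    return results


results = run_pipeline("Precipitation")
print(results)
```

The point of the sketch is step 3: `site_a` and `site_b` are both found even though they use different labels, because the query is posed against the shared concept rather than the local names.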
Overview of architecture
SEEK Components
Benefits to Users
Scientists
• Access to high end computing technologies
• Better integration of all relevant data
• Workflow standardization and analysis
• Time and resource efficiency
• Reusable analytical steps & workflows
Students
• Improved access to knowledge base
Environmental Managers
• Accessibility to current scientific approach
Policy makers
• Timely input to decision making

• Formal documentation of methods (output in report format)
• Reproducibility of methods
• Visual creation and communication of methods
• Versioning
• Automated data typing and transformation
SEEK: ENM workflows
[Figure: ENM workflow diagram]
Steps shown: EcoGrid Query, Layer Integration, Physical Transformation, Scaling, Sample Data, Data Calculation, Map Generation, Validation, Generate Metadata, Archive To EcoGrid
Data shown: EcoGrid DataBase, Species pres. & abs. points, Env. layers, Integrated layers, Training sample, Test sample, GARP rule set, Native range prediction map, Model quality parameters, Selected prediction maps
Other labels: User, +A1, +A2, +A3
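Three of the boxes in this workflow (Sample Data, Data Calculation producing a rule set, and Validation) can be walked through with a toy example. GARP itself is not implemented here: a simple min/max environmental envelope is swapped in as the "rule set" purely to keep the sketch short. The points, the split rule, and the function names are hypothetical.

```python
# Hypothetical species presence/absence points with two environmental layers:
# (temperature, precipitation, species_present)
POINTS = [
    (20.0, 800.0, True),
    (23.0, 950.0, True),
    (5.0, 200.0, False),
    (21.0, 850.0, True),
    (22.0, 900.0, True),
    (35.0, 100.0, False),
    (10.0, 1500.0, False),
    (30.0, 300.0, False),
]


def sample_data(points, every=4):
    """Sample Data: hold out every `every`-th point as the test sample."""
    training = [p for i, p in enumerate(points) if i % every != every - 1]
    test = [p for i, p in enumerate(points) if i % every == every - 1]
    return training, test


def fit_envelope(training):
    """Stand-in for the GARP rule set: min/max envelope of presence points."""
    present = [(t, p) for t, p, is_present in training if is_present]
    temps = [t for t, _ in present]
    precips = [p for _, p in present]
    return {
        "temp": (min(temps), max(temps)),
        "precip": (min(precips), max(precips)),
    }


def predict(rule, temp, precip):
    """Predict presence when both layers fall inside the envelope."""
    t_lo, t_hi = rule["temp"]
    p_lo, p_hi = rule["precip"]
    return t_lo <= temp <= t_hi and p_lo <= precip <= p_hi


def validate(rule, test):
    """Validation: fraction of test points predicted correctly."""
    correct = sum(predict(rule, t, p) == is_present for t, p, is_present in test)
    return correct / len(test)


training, test = sample_data(POINTS)
rule = fit_envelope(training)
accuracy = validate(rule, test)
print(rule, accuracy)
```

In the diagram, the remaining steps would apply the rule set to the integrated layers to generate a prediction map, then generate metadata and archive the result.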
Analytical Pipelines
Sloan Digital Sky Project: Mapping the Universe
“The raw data…are fed through data analysis software pipelines…to extract about 400 attributes for each celestial object…These pipelines embody much of mankind’s knowledge of astronomy.” Szalay et al., 2001
[Figure: the ENM workflow diagram repeated (EcoGrid Query, Layer Integration, Physical Transformation, Scaling, Sample Data, Data Calculation, Map Generation, Validation, Generate Metadata, Archive To EcoGrid)]
Species Distribution Pipeline
Acoustic Signal Processing Pipeline
Remotely sensed data (land cover class, etc.)
Ground sensor data (climate, etc.)
Society for Industrial and Applied Mathematics (SIAM) Conference on Imaging Science, 2004
CONFERENCE THEMES
• Image acquisition
• Image reconstruction and restoration
• Image storage, compression, and retrieval
• Image coding and transmission
• PDEs in image filtering and processing
• Image registration and warping
• Image modeling and analysis
• Statistical aspects of imaging
• Wavelets and multiscale analysis
• Multidimensional imaging sciences
• Inverse problems in imaging sciences
• Mathematics of visualization
• Biomedical imaging
• Applications
“By their very nature, these challenges cut across the disciplines of physics, engineering, mathematics, biology, medicine, and statistics.”
Grid Technology
• EcoGrid vs semantic web
Analytical pipelines/Workflows
• Sensors: generic vs domain specific
• Reuse of actors/workflows
• Workflow metadata and reporting
Ontologies/Semantic Mediation
• Query EcoGrid for workflows
• Query EcoGrid for data to fit the selected workflow(s)
• Integration of heterogeneous data types
Data Mining
• finding interesting
AVHRR: 1 x 1 km pixels, 14 years * 26 images/year * 1824 pixels = 663,936 data points
TM: 30 x 30 m pixels, 14 years * 2 images/year * 65,260 pixels = 1,827,280 data points
if 20 images/year => 18,272,800 data points
if 30 years => 39,156,000 data points
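The data-volume figures above follow from years × images per year × pixels. A short Python check reproduces them; the function name `data_points` is a hypothetical helper, and the last line assumes the 30-year case keeps the 20 images/year rate, which is the reading that matches the slide's total.

```python
def data_points(years, images_per_year, pixels):
    """Number of pixel observations in an image time series."""
    return years * images_per_year * pixels


avhrr = data_points(14, 26, 1824)          # AVHRR, 1 x 1 km pixels
tm = data_points(14, 2, 65_260)            # TM, 30 x 30 m pixels
tm_20_per_year = data_points(14, 20, 65_260)
tm_30_years = data_points(30, 20, 65_260)  # assumes 20 images/year

for label, n in [
    ("AVHRR", avhrr),
    ("TM", tm),
    ("TM, 20 images/year", tm_20_per_year),
    ("TM, 20 images/year, 30 years", tm_30_years),
]:
    print(f"{label}: {n:,} data points")
```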
Data Mining Challenges
Biomedical Data
• Large sample sets
• Few correlates (dozens)
• Hard classes
Ecologic Data
• Paucity of accurate reference data
• Spatial autocorrelation
• Large number of potential
Spatiotemporal analysis & visualization techniques that explicitly deal with these challenges
EcoGrid archive of ground truth data and the ontologies that will allow us to semantically mediate the classes
• Multidisciplinary staff
• Working groups (4-6 weeks)
• Multidisciplinary postdocs
• Summer school in