Foundations VI: Discovery, Access and Semantic Integration
Deborah McGuinness and Peter Fox
CSCI-6962-01
Week 11, November 10, 2008
Contents
• Review of reading, questions, comments
• Semantic Integration using SESDI – Semantically-Enabled Scientific Data Integration as an example
• Semantically-Enabled Search – ex. Noesis
• Integration using Top Level Science ontologies
• Summary
• Next week
References
• Fox, P.; McGuinness, D.L.; Raskin, R.; Sinha, K. A Volcano Erupts: Semantically Mediated Integration of Heterogeneous Volcanic and Atmospheric Data. Proceedings of the First Workshop on Cyberinfrastructure: Information Management in eScience, co-located with the ACM Conference on Information and Knowledge Management, Lisbon, Portugal, November 9, 2007. ftp://ftp.ksl.stanford.edu/pub/KSL_Reports/KSL-07-09.pdf
• Sunil Movva, Rahul Ramachandran, Xiang Li, Phani Cherukuri, Sara Graves. Noesis: A Semantic Search Engine and Resource Aggregator for Atmospheric Science. NSTC2007. http://esto.nasa.gov/conferences/nstc2007/papers/Ramachandran_Rahul_A3P4_NSTC-07-0084.pdf
• Boyan Brodaric and Florian Probst. Enabling Cross-Disciplinary e-Science by Integrating Geoscience Ontologies with DOLCE. Under Review. 2008.
• Yolanda Gil, Ewa Deelman, Mark Ellisman, Thomas Fahringer, Geoffrey Fox, Dennis Gannon, Carole Goble, Miron Livny, Luc Moreau, Jim Myers. "Examining the Challenges of Scientific Workflows," Computer, vol. 40, no. 12, pp. 24-32, December 2007. http://www.isi.edu/~gil/papers/computer-NSFworkflows07.pdf
Semantic Web Methodology and Technology Development Process
• Establish and improve a well-defined methodology vision for Semantic Technology based application development
• Leverage controlled vocabularies, etc.
[Figure: development process diagram with the stages Use Case; Small Team, mixed skills; Analysis; Adopt Technology Approach; Leverage Technology Infrastructure; Rapid Prototype; Open World: Evolve, Iterate, Redesign, Redeploy; Use Tools; Science/Expert Review & Iteration; Develop model/ontology; Evaluation]
Motivation for Semantic Integration
• In order to solve problems that are inherently multi-disciplinary, researchers often need data from many varied sources.
• Consider problems such as global warming, or some of the problems that you suggested two classes ago, e.g. the impact of earthquakes on transportation.
Semantically Enabled Scientific Data Integration
• SESDI slides from joint work with Fox, McGuinness, Raskin, Sinha and materials from
– McGuinness et al, Geoinformatics 2007
– Fox et al, ESTO 2008
Mt. Spurr, AK. 8/18/1992 eruption, USGS
http://www.avo.alaska.edu/image.php?id=319
Eruption cloud movement from Mt. Spurr, AK, 1992
USGS
Tropopause
http://aerosols.larc.nasa.gov/volcano2.swf
Atmosphere Use Case
• Determine the statistical signatures of both volcanic and solar forcings on the height of the tropopause
From paleoclimate researcher Caspar Ammann, Climate and Global Dynamics Division of NCAR (CGD/NCAR)
Layperson perspective:
- look for indicators of acid rain in the part of the atmosphere we experience…
(look at measurements of sulfur dioxide in relation to sulfuric acid after volcanic eruptions at the boundary of the troposphere and the stratosphere)
SESDI Impact: A Better Way to Access Data
The Problem
Scientists often only use data from a single instrument because it is difficult to access, process, and understand data from multiple instruments. A typical data query might be:
“Give me the temperature, pressure, and water vapor from the AIRS instrument from Jan 2005 to Jan 2008”
“Search for MLS/Aura Level 2, SO2 Slant Column Density from 2/1/2007”
A Solution
Using a simple process, SESDI allows data from various sources to be registered in an ontology so that it can be easily accessed and understood. Scientists can use only the ontology components that relate to their data. An SESDI query might look like:
“Show all areas in California where sulfur dioxide (SO2) levels were above normal between Jan 2000 and Jan 2007”
This query will pull data from all available sources registered in the ontology and allow seamless data fusion.
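To make the idea of a concept-level query concrete, here is a minimal sketch in Python. It is not the SESDI implementation: the `Observation` record, the source names, the values, and the "normal" threshold are all hypothetical, and it assumes that registration has already mapped every source onto a shared concept name and a common unit.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Observation:
    source: str    # registered data source (hypothetical names below)
    concept: str   # ontology concept the value was registered under
    region: str
    when: date
    value: float   # assumed to be in a common unit after registration


def above_normal(observations, concept, region, start, end, normal):
    """Return observations of `concept` in `region` within [start, end] that exceed `normal`."""
    return [
        o for o in observations
        if o.concept == concept
        and o.region == region
        and start <= o.when <= end
        and o.value > normal
    ]


# Made-up records from two hypothetical sources registered under the same concepts.
observations = [
    Observation("satellite_A", "SulfurDioxide", "California", date(2003, 5, 1), 2.4),
    Observation("ground_B", "SulfurDioxide", "California", date(2004, 7, 9), 0.3),
    Observation("ground_B", "SulfurDioxide", "Alaska", date(2004, 7, 9), 5.0),
    Observation("satellite_A", "WaterVapor", "California", date(2003, 5, 1), 9.9),
]

hits = above_normal(observations, "SulfurDioxide", "California",
                    date(2000, 1, 1), date(2007, 1, 31), normal=1.0)
```

The query names a concept rather than an instrument, so every source registered under that concept is searched without the user choosing one.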
Components to implement
• An analysis application
• Cross-domain terms, concepts and relations
• Connections to underlying data (registration)
• Framework to put these together
• Integration connector
Detection and attribution relations…
Data Registration Framework
Level 1: Data Registration at the Discovery Level, e.g. volcano location and activity
Level 2: Data Registration at the Inventory Level, e.g. list of datasets by types, times, products
Level 3: Data Registration at the Item Detail Level, e.g. access to individual quantities
Ontology based Data Integration
Earth Sciences Virtual Database: a data warehouse where the schema heterogeneity problem is solved; schema based integration
Data Discovery, Data Integration
A.K. Sinha, Virginia Tech, 2006
How to find the data?
• Think about it the way the data providers do
SEDRE: Semantically Enabled Data Registration Engine
A. K. Sinha, A. Rezgui, Virginia Tech
• SEDRE: a system that enables scientists to semantically register data sets for optimal querying and semantic integration
• SEDRE enables mapping of heterogeneous data to concepts in domain ontologies
Semantic Registration in SEDRE: An Overview
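[Figure: SEDRE registration architecture. Components: Registry Server (warehouse of registration records), Registry Database, Ontology Server (Registration/Discovery Ontologies), SEDRE, User, Data Server, Volcanic Data; the link between SEDRE and the registry is labelled Semantic Registration.]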
• SEDRE is a desktop application
• Users download and install SEDRE
• SEDRE accesses domain ontologies
• Users map data attributes (e.g., SO2) to concepts in
ontologies without ‘knowing it’
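As a rough illustration of what such a registration could produce, the sketch below builds a record that links data columns to ontology concepts. It is not SEDRE's actual data model: the concept URIs under `example.org`, the `register` function, and the dataset name are hypothetical, and the column names are loosely based on the volcanic example that follows.

```python
# Hypothetical concept URIs; a real registration would use the project's own ontologies.
ONTOLOGY_CONCEPTS = {
    "so2": "http://example.org/onto#SulfurDioxide",
    "ws": "http://example.org/onto#WindSpeed",
    "wd": "http://example.org/onto#WindDirection",
}


def register(dataset_name, columns, user_mapping):
    """Build a registration record linking each mapped column to a concept URI."""
    record = {"dataset": dataset_name, "mappings": {}, "unmapped": []}
    for col in columns:
        key = user_mapping.get(col)  # the concept the user picked for this column, if any
        if key in ONTOLOGY_CONCEPTS:
            record["mappings"][col] = ONTOLOGY_CONCEPTS[key]
        else:
            record["unmapped"].append(col)
    return record


record = register(
    "kilauea_so2_vehicle",              # hypothetical dataset name
    ["SO2 (t/d)", "WS", "N"],           # columns found in the data file
    {"SO2 (t/d)": "so2", "WS": "ws"},   # choices the user made in the tool
)
```

The user only picks a concept per column; the record stores the full concept identifier, which is what later discovery and integration steps would query.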
Example 1: Registration of Volcanic Data
SO2 Emission from Kilauea east rift zone, vehicle-based (Source: HVO)
Abbreviations: t/d = metric tonne (1000 kg)/day, SD = standard deviation, WS = wind speed, WD = wind direction east of true north, N = number of traverses
Location Codes:
• U - Above the 180° turn at Holei Pali (upper Chain of Craters Road)
• L - Below Holei Pali (lower Chain of Craters Road)
• UL - Individual traverses were made both above and below the 180° turn at Holei Pali
• H - Highway 11
Loading Volcanic Data into SEDRE
Registering Volcanic Data (1)
Registering Volcanic Data (2)
• No explicit lat/long data
• Volcano identified by name
• Volcano ontology framework will link name to location
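A minimal sketch of the name-to-location step, assuming the ontology can be queried like a lookup table. The `VOLCANO_LOCATIONS` table, the `locate` function, and the sample record are hypothetical; the coordinates are rounded approximations used only for illustration.

```python
# Approximate (lat, lon) values, for illustration only.
VOLCANO_LOCATIONS = {
    "kilauea": (19.4, -155.3),
    "mount spurr": (61.3, -152.3),
}


def locate(record):
    """Attach lat/lon to a record that names a volcano but carries no coordinates."""
    name = record["volcano"].strip().lower()
    coords = VOLCANO_LOCATIONS.get(name)
    if coords is None:
        return dict(record)  # unknown name: return an unchanged copy
    return {**record, "lat": coords[0], "lon": coords[1]}


# Hypothetical record: the emission value is made up.
located = locate({"volcano": "Kilauea", "so2_t_per_day": 1200.0})
```

Once a record carries coordinates, it can be matched spatially against satellite data that is indexed by latitude and longitude rather than by volcano name.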
Example 2: Registration of Atmospheric Data
Satellite data for SO2 emissions
Abbreviation: SCD = Slant Column Density, in Dobson Units (DU)
Loading Atmospheric Data into SEDRE
Registering Atmospheric Data (1)
Registering Atmospheric Data (2)
SEDRE+DIA: Overview
Query: Show all areas in California where sulfur dioxide (SO2) levels have been above normal between Jan. 1990 and Dec. 1990
[Figure: SEDRE+DIA architecture. Components: User; DIA (Discovery, Integration, Analysis); Map Server; Gazetteer; SEDRE (Semantically Enabled Data Registration); User, Data Server; Volcanic Data; Registry Server (warehouse of registration records); Registry Database; Ontology Server (Registration/Discovery Ontologies). Links are labelled Semantic Registration, Semantic Discovery and Data Access. Discipline areas shown: Geochronology, Geodynamics, Ore Deposits, Geochemistry, Petrology, Mineralogy, Structure, Geophysics, Stratigraphy, Paleontology, Hydrology, Tectonics, Areal Geology.]
DIA: Web-based System for Data Discovery, Integration and Analysis
(Developed at Virginia Tech through NSF funding)
SESDI Data Registration Summary
• Semantic data framework technologies are changing the landscape of providing data to scientists
• Tools for data registration are soon to be available
• Applications to perform data integration mediated by semantics are available
• Initial results - applied to two volcanoes - led to correlation of SO2 concentration from volcano and in the atmosphere and relation to H2SO4.
Volcano Workshop - before
Volcano concept map after the workshop - some linked concepts are circled
Plate tectonics - before workshop
Plate tectonics - after
Atmosphere (portions from SWEET)
Atmosphere II
Packages for an Ontology for Volcanoes
Packages: Data Types, Volcanic System, Climate, Phenomenon, Material, Instruments
Imported from NASA Semantic Web for Earth Science: Numerics Ontology, Units Ontology, Physical Property Ontology, Physical Phenomena Ontology
Planetary Material, Planetary Structure, Physical Properties, Planetary Location, Geologic Time, GeoImage, Planetary Phenomenon
Import existing ontologies: SWEET, GEON
DOLCE ROCKS: Integrating Foundational and Geoscience Ontologies
Preliminary results for the integration of concepts from DOLCE, GeoSciML, and SWEET
Boyan Brodaric, Natural Resources Canada
Florian Probst, University of Muenster
From SSKI Spring Symposium Series on Semantic Scientific Knowledge Integration
Outline
Foundational ontologies
DOLCE
DOLCE + GeoSciML
DOLCE + SWEET
DOLCE + GeoSciML + SWEET
Brodaric / Probst – SSKI 2008
Foundational Ontologies
CONTENTS
General concepts and relations that apply in all domains: physical object, process, event,…, inheres, participates,…
Rigorously defined: formal logic, philosophical principles, highly structured
Examples: DOLCE, BFO, GFO, SUMO, CYC, (Sowa)
Foundational Ontologies
PURPOSE: help integrate domain ontologies
Geophysics ontology
Marine ontology
Water ontology
Planetary ontology
Geology ontology
Struc ontology
Rock ontology
“…and then there was one…”
Foundational ontology
Foundational Ontologies
PURPOSE: help organize domain ontologies
“…a place for everything, and everything in its place…”
Foundational ontology
shale rock formation
lithification
Problem scenario
Little work done on linking foundational ontologies with geoscience ontologies
Such linkage might benefit various scenarios requiring cross-disciplinary knowledge, e.g.:
water budgets: groundwater (geology) and surface water (hydro)
hazards risk: hazard potential (geology, geophysics) and items at threat (infrastructure, people, environment, economic)
health: toxic substances (geochemistry) and people, wildlife
many others…
focus of this talk
Project
SWEET GeoSciML
DOLCE
Objectives
• evaluate fit between DOLCE, GeoSciML, SWEET
• evaluate operational benefits: e.g. data discovery, integration,…
Approach
• extend DOLCE
• do not alter GeoSciML, SWEET
• use Protégé / OWL and SeReS
Expected Results
• unified ontology, internally consistent
• increased ability to discover and integrate data
DOLCE 2.1 Lite-Plus, OWL 397
[Figure: DOLCE top-level categories Endurant, Perdurant, Quality and Abstract, linked by the relations participates, inheres and located-in. Labelled examples: rock body (Physical Endurant, Physical Object); Lithification event (Perdurant: Process, Event, State); Physical Quality color and Temporal Quality age; Physical Region brown (Munsell-space) and Temporal Region Ordovician (GSA Time-scale).]
GeoSciML 2.0 beta GML-UML schema of basic geologic entities
focus: Geologic Unit, Earth Material
some classes, many relations → align relations
CompoundMaterial
part: EarthMaterial
plays: CompositionPart
generic-constituent: Particle
participant-in: GeologicProcess [0..*]
host-of: Fabric [0..*]
has_quality: Lithology [1..*]
has_quality: CompositionCategory [0..*]
has_quality: PhysicalQuality [0..*]
has_quality: MetamorphicQuality [0..*]
has_quality: ConsolidationDegree
Rock UnconsolidatedMaterial
DOLCE + GeoSciML (1)
[Figure: GeoSciML classes GeologicUnit (GeologicUnitType, Formation X), EarthMaterial, CompoundMaterial and Rock (Lithology; Rocktype: Shale) aligned with DOLCE Physical-Body and Amount-Of-Matter (generic-constituent), Physical Quality (has-quality) and Physical Region (q_location). Qualities shown with example regions: ParticleType (Grain), FabricType (Foliated), ProcessType (Sedimentary), MineralClass (QAPF region), ChemicalClass (TAS region).]
Benefits
• full coverage of GeoSciML fragment
Issues
• classification vs quality vs subtype
• complex qualities (non-unary)
• reference spaces (e.g. units of meas.)
DOLCE + GeoSciML (2)
[Figure: GeoSciML classes GeologicUnit (Formation X), EarthMaterial, CompoundMaterial and Rock aligned with DOLCE Physical-Body and Amount-Of-Matter (generic-constituent), Physical Quality (has-quality), Physical Region (q_location) and Social-Object / Concept (classifies). Lithology (Shale) is shown with part: ParticleType (Grain), req: ProcessType (Sedimentary) and req: GrainSizeParam; qualities shown are Density (DensityRegion) and GrainSize (GrainSizeRegion, aphanitic).]
Benefits
• full coverage of GeoSciML fragment
• classifications are not qualities
Issues
• what is / isn’t a concept?
• subtypes vs concepts
• duplicate qualities & params
SWEET 1.1 beta
OWL ontology of Earth related entities
focus: Substance, Earthrealm, Sunrealm, Phenomena, Process, Biosphere, Property
many classes, few relations → align classes
DOLCE + SWEET
DOLCE category: aligned SWEET classes (slide columns: DOLCE, = SWEET, < SWEET)
Physical-body: BodyofGround, BodyofWater,…
Material-Artifact: Infrastructure, Dam, Product,…
Physical-Object: LivingThing, MarineAnimal
Amount-of-Matter: Substance
Activity: HumanActivity
Physical-Phenomenon: Phenomena
Process: Process
State: StateOfMatter
Quality: Quantity, Moisture,…
Physical-Region: Basalt,…
Temporal-Region: Ordovician,…
Benefits
• full coverage
• rich relations
• home for orphans
• single superclasses
Issues
• individuals (e.g. Planet Earth)
• roles (contaminant)
• features (SeaFloor)
DOLCE + SWEET + GeoSciML
DOLCE category: aligned classes, in slide order (slide columns: DOLCE, = SWEET, < SWEET, = GeoSciML, < GeoSciML)
Physical-body: BodyofGround; < RockBody; GeologicUnit
Material-Artifact: Infrastructure, Dam, Product,…
Physical-Object: LivingThing, MarineAnimal
Amount-of-Matter: Substance; MixedSubstance; Rock; EarthMaterial; CompoundMaterial; Rock
Activity: HumanActivity
Physical-Phenomenon: Phenomena; SolidEarthPhen.; GeologicEvent
Process: Process; GeologicalProcess
State: StateOfMatter
Quality: Quantity; Age; eventAge
Physical-Region: Basalt,…
Temporal-Region: Ordovician
Status
DOLCE + GeoSciML + SWEET: mappings complete*
Protégé / OWL: encoding in progress
Operational evaluation: future work
Conclusions
Surprisingly good fit amongst ontologies
• so far: no show-stopper conflicts, a few difficult conflicts
DOLCE richness benefits geoscience ontologies
• good conceptual foundation helps clear some existing problems
Unresolved issues in modeling science entities
• modeling classifications, interpretations, theories, models,…
Semantic Search
• One thing we can think about is building domain literate tools that “understand” one or more science areas and use that knowledge (empowered by ontologies) to find or otherwise intelligently manipulate data.
• IWSearch is a simple example that uses its knowledge of PML to refine SWOOGLE and return appropriate search terms. This uses an underlying PML ontology
• NOESIS uses background science ontologies to find data
Motivating Example
Related Results
Unrelated Results
Ramachandran and Movva
The Problem
• Search query is broad and not scoped.
• Only one kind of resource is searched (e.g., web documents).
• Semantics associated with the search string are not captured.
Noesis is an attempt to solve these issues
– Search only what you want (Search Scoping)
– Search everything that you want (Resource aggregation)
Noesis Approach
• Noesis provides topic-based searches, retrieving resources conceptually related to the user’s need.
• Semantic information is captured in the domain ontologies and is used for providing context for the search process.
Noesis Approach (cont’d)
• Using domain ontologies, Noesis provides a guided refinement of the search query, producing successful searches and reducing the user’s burden to experiment with different search strings.
• Semantics are captured by providing the option to expand the query terms to include suggested related concepts.
Noesis Algorithm
The Noesis search engine uses a two-step algorithm.
– Query Analysis: The search query is broken down to identify concepts that are defined in the domain.
– Query Expansion: Related concepts are added to the search string to scope the search.
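The two steps can be sketched in a few lines of Python. This is only an illustration under simple assumptions, not the Noesis code: the `RELATED` table is a toy stand-in for the domain ontologies, query analysis is reduced to matching single lowercase words, and the function names `analyze` and `expand` are hypothetical.

```python
# Toy stand-in for the domain ontologies: concept -> related concepts.
RELATED = {
    "mesocyclone": ["vorticity"],
    "albedo": ["reflectance"],
}


def analyze(query):
    """Step 1 (Query Analysis): keep the query words that are known concepts."""
    return [w for w in query.lower().split() if w in RELATED]


def expand(concepts):
    """Step 2 (Query Expansion): add related concepts, keeping order and skipping duplicates."""
    terms = []
    for c in concepts:
        for t in [c, *RELATED[c]]:
            if t not in terms:
                terms.append(t)
    return terms


expanded = expand(analyze("Mesocyclone albedo observations"))
```

Words that are not concepts in the toy ontology ("observations" here) are dropped by the analysis step, which is one simple way to scope a broad query.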
Ontology
• From a Machine Learning/Artificial Intelligence/Intelligent Systems perspective, “an ontology is a formal, explicit specification of a shared conceptualization” (Gruber, 1993).
• An ontology captures and encodes domain knowledge of concepts, constraints and the relationships among them, for use in a machine-readable fashion.
Noesis Ontologies
Noesis uses two classes of ontologies.
• Domain ontologies: These are the ontologies used to describe concepts in a domain and their relationships. Noesis uses a set of core ontologies for describing concepts in Atmospheric Science.
• Application ontologies: Application ontologies describe a domain with respect to a function (e.g., a data archive). They enable flexible querying by bridging the gap between semantic concepts and the application-specific vocabulary.
Providing Semantics - Specializations
• Adding related concepts to the search term as a way to provide semantic information.
• Ontologies are organized in tree like taxonomies, where the child nodes represent the specializations of a parent node.
• The first type of related terms are specializations (SP). They are used to expand the search terms, thus providing a more detailed search. For example:
– Search Term: “height”
– Additional Term (SP): “free-board”
Providing Semantics - Synonyms
• Synonyms are different terms that have the same meaning. In ontological terms they are equivalent concepts.
• owl:equivalentClass allows linking two syntactically different terms to one semantic concept (synonyms).
• The second form of related concepts are synonyms (SN). These include acronyms. For example:
– Search Term: ‘Albedo’
– Additional Term (SN): ‘Reflectance’
Providing Context
• In reality, every concept has a set of other concepts, neither in the same inheritance hierarchy nor equivalent, that tend to co-exist with it.
• These are called the associated concepts and they are captured in the ontology through the property relationships.
• Associated concepts are the third type of related terms (RT) and they provide the context for the search query. For example:
– Search Term: ‘Mesocyclone’
– Associated Term (RT): ‘Vorticity’
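The three kinds of related terms (SP, SN, RT) can be pictured as three separate lists attached to each concept. The sketch below is a hypothetical in-memory structure, not how Noesis stores its ontologies; it reuses the slide examples (height / free-board, albedo / reflectance, mesocyclone / vorticity), and the names `Concept`, `ONTOLOGY` and `related_terms` are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    name: str
    specializations: list = field(default_factory=list)  # SP: child nodes in the taxonomy
    synonyms: list = field(default_factory=list)         # SN: equivalent concepts
    associated: list = field(default_factory=list)       # RT: linked through properties


# Toy ontology built from the examples on the slides.
ONTOLOGY = {
    "height": Concept("height", specializations=["free-board"]),
    "albedo": Concept("albedo", synonyms=["reflectance"]),
    "mesocyclone": Concept("mesocyclone", associated=["vorticity"]),
}


def related_terms(term):
    """Return the SP, SN and RT terms for a search term (empty lists if unknown)."""
    c = ONTOLOGY.get(term.lower())
    if c is None:
        return {"SP": [], "SN": [], "RT": []}
    return {"SP": list(c.specializations), "SN": list(c.synonyms), "RT": list(c.associated)}
```

Keeping the three lists separate lets an interface present them as distinct choices, so the user decides whether to narrow the search (SP), rephrase it (SN), or add context (RT).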
Semantics for Search Scoping
A user can start at a general topic and navigate to specific topics by selecting the related concepts of interest.
[Screenshot: specializations and related terms for the search term (“Cyclone”) presented to the user]
[Screenshot: synonyms for the search term (“Reflectance”) presented to the user]
The Search
• The identified concepts from the query string are used to query the Ontology Inference Service (OIS) to get the related concepts (SP, SN and RT).
• The obtained terms are used to search other resources such as Google, Yahoo, etc.
Search Query: (SP1..SPn SN1..SNn RT1..RTn)
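One plausible way to assemble such a search string is shown below. The slide does not say how the terms are combined or quoted, so the space-separated join and the quoting of multi-word terms are assumptions, and `build_query` is a hypothetical helper.

```python
def build_query(sp, sn, rt):
    """Concatenate SP, SN and RT terms in that order, quoting multi-word terms."""
    def fmt(term):
        # Assumption: multi-word terms are quoted so a search engine treats them as phrases.
        return f'"{term}"' if " " in term else term

    return " ".join(fmt(t) for t in [*sp, *sn, *rt])


query = build_query(["free-board"], ["reflectance"], ["vorticity"])
phrase_query = build_query(["tropical cyclone"], [], ["vorticity"])
```

The resulting string is what would be handed to the external search engines.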
The Search (Cont’d)
[Figure: Noesis search architecture. The Noesis GUI sits on the Noesis Engine (query expansion, search and refine results). Semantic support: Ontology Inference Service (OIS) with the ontologies (CF, Core) and an inference engine/reasoner (Pellet). Syntax-based search engines: open web search engines (Yahoo, Google, …) and hidden web search (database search engine, …).]
DataSet Interoperability
• Over the years, data formats have evolved within different communities, which limits them to that community and context.
• Semantically homogeneous data is archived as syntactically heterogeneous data sets. For example:
– DataSet1: Albedo
– DataSet2: Reflectance
• Semantic Interoperability between data formats is required to make them usable across communities.
Application Ontologies
• Data archives are often catalogued with content metadata to enable searching, but users still need to search for syntactically matching keywords.
• Application ontologies describe a domain with respect to a function (e.g., a data archive).
• Application ontologies enable flexible querying by bridging the gap between semantic concepts and the keywords.
Application Ontologies (Cont’d)
• An application ontology is added for every new catalog (e.g., the CF ontology).
• This ontology annotates the terms used in the catalog with their conceptual meanings.
• The concepts in this ontology are linked with the Core ontology (modified SWEET) through owl:equivalentClass.
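A minimal sketch of how such links let a user's word reach a catalog keyword. Plain Python pairs stand in for the owl:equivalentClass statements; the core concept names, the `SYNONYMS` table and `catalog_terms` are hypothetical, and the two catalog terms are used only as illustrative CF-style names.

```python
# (catalog term, core concept) pairs standing in for owl:equivalentClass links.
EQUIVALENT = [
    ("surface_albedo", "Albedo"),
    ("air_temperature", "Temperature"),
]

# User vocabulary -> core concept (the synonym step from the domain ontology).
SYNONYMS = {"albedo": "Albedo", "reflectance": "Albedo", "temperature": "Temperature"}


def catalog_terms(user_term):
    """Map a user's term to the catalog keywords linked to the same core concept."""
    core = SYNONYMS.get(user_term.lower())
    return [cat for cat, concept in EQUIVALENT if concept == core]
```

A user who types "Reflectance" is thereby routed to a catalog that only knows the keyword for albedo, without having to guess the catalog's vocabulary.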
Noesis - Data Search
[Figure: Noesis data search flow between the Noesis GUI, the Noesis Engine (query expansion, search and refine results), the Ontology Inference Service (OIS: ontologies (CF, Core) and inference engine/reasoner (Pellet)) providing semantic support, and the data search engines (resource catalog search engine, …). Numbered steps:]
1. Search Request
2. Related concept Request
3. Related Concepts
4. Related CF Terms Req
5. CF Terms
6. Search with mapped terms
7. Data Search Results
8. Results for display
Resource Aggregation
Noesis is a meta search engine
– Meta search engines simultaneously search multiple Open Web and Hidden Web resources to provide increased search coverage, efficiency and effectiveness.
– Searches for web pages, data, education material and publications.
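The aggregation idea can be sketched as follows. This is a simplified illustration rather than the Noesis design: the engines are stub functions returning canned `example.org` URLs, they are called one after another instead of simultaneously, and merging by first-seen URL is an assumed policy.

```python
def meta_search(query, engines):
    """Query each engine in turn and merge the results, keeping the first hit per URL."""
    seen, merged = set(), []
    for name, search in engines.items():
        for url in search(query):
            if url not in seen:
                seen.add(url)
                merged.append((name, url))
    return merged


# Stub engines with canned results; real ones would call third-party search web services.
engines = {
    "web": lambda q: ["http://example.org/a", "http://example.org/b"],
    "data": lambda q: ["http://example.org/b", "http://example.org/c"],
}

results = meta_search("cyclone vorticity", engines)
```

Each merged result keeps the name of the engine that returned it first, which makes it possible to group results by resource type (web, data, publications, education) in a display.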
Resource Aggregation (Contd.)
Noesis uses the scoped search string to fetch resources, through search web services provided by third parties:
• Web: Yahoo, Google, Ask.com
• Data: NCDC, NCAR, NOAA, NASA GCMD, LEAD, IPCC Model Output, LDEO
• Publications: AMS, Elsevier, Springer, RMS, Blackwell, AGU
• Education: DLESE
Resource Aggregation (Contd.)
Collates resources such as web pages, data sets, publications etc.
Example – Step 1 (User Input)
User types a search term (‘Cyclone’)
Example – Step 2 (Query Refinement)
Related Terms
Specializations
User selects the related terms to add to the search query
Example – Web Results (Step 3)
Refined query
Querying Web Search Engines
Relevant Results
Example – Publications (Step 3)
Refined query used for publications search
Example – Data Search (Step 3)
Mapped query used for data search
Next week
• This week's assignment:
– No new reading
– Prepare your class presentations
– No physical office hours this week. Office hours are virtual (through email) this week. Peter and I are both traveling but will be connected this week.
• Next class (week 12 – November 17):
– Class presentations
• Questions?