Bertram Ludäscher, Data and Knowledge System, San Diego Supercomputer Center, U.C. San Diego
Posted on 25-Jan-2016
San Diego Supercomputer Center, EDBT'02, Prague
EDBT Panel, March 2002, Prague: Scientific Data Integration for Complex Multiple-Worlds Scenarios: Databases Meets Knowledge Representation
Bertram Ludäscher
Data and Knowledge System
San Diego Supercomputer Center
U.C. San Diego
A Home Buyer’s Information Integration Problem
What houses for sale under $500k have at least 2 bathrooms, 2 bedrooms, a nearby school ranking in the upper third, in a neighborhood
with below-average crime rate and diverse population?
[Diagram: the question (?) is answered by Information Integration over four sources: Realtor, Demographics, School Rankings, Crime Stats.]
“Simple Multiple-Worlds” Mediation Problem => XML-Based Mediator
A Neuroscientist’s Information Integration Problem
What is the cerebellar distribution of rat proteins with more than 70% homology with human NCS-1? Any structure specificity?
How about other rodents?
[Diagram: the question (?) is answered by Information Integration over four sources: protein localization (NCMIR), neurotransmission (SENSELAB), sequence info (CaPROT), morphometry (SYNAPSE).]
“Complex Multiple-Worlds” Mediation Problem => Model-Based Mediator
A Geoscientist’s Information Integration Problem
What are the distribution and U/Pb zircon ages of A-type plutons in VA? How about their 3-D geometry?
How does it relate to host rock structures?
[Diagram: the question (?) is answered by Information Integration over five sources: Geologic Map (Virginia), GeoChemical, GeoPhysical (gravity contours), GeoChronologic (Concordia), Foliation Map (structure DB).]
“Complex Multiple-Worlds” Mediation
Scientific Data Integration Challenges: Heterogeneities in the 4S’s ...
• System Aspects
  – platforms, devices, physical distribution, transport protocols, access APIs, impedance mismatch, user interfaces, application integration, ...
• Syntaxes
  – heterogeneous data formats (one for each tool, ...)
• Structures
  – heterogeneous schemas (one for each DB, ...)
  – heterogeneous data models (RDBs, ORDBs, OODBs, XMLDBs)
• Semantics
  – unclear semantics: e.g., incoherent terminology, multiple taxonomies, ...
Data Integration: Approaches / Solutions
[Diagram: approaches arranged by the aspect they address: System aspects, Syntax, Structure, Semantics.]
• (Data-)Grid / Middleware
  – system: distributed data & computing (SDSC SRB, Globus, web services, WSDL)
  – source = file or DB
• XML-Based Mediators
  – structure: XML queries and views
  – source = XML-DB
• Model-Based/Semantic Mediators
  – semantics: conceptual models and declarative views
  – source = Knowledge Base (DB + CMs + ICs)
• Semantic Web Formalisms
  – semantics: ontologies, description logics (RDF(S), DAML+OIL, ...)
• Knowledge/Semantic Grid
  – combination
What’s in a Link?
• Syntactic Joins (equality)
  (X,Y) := X.SSN = Y.SSN
  (X,Y) := X.UMLS-ID = Y.UID
• “Speciality” Joins (similarity)
  (X,Y,Score) := BLAST(X,Y,Score)
• Semantic/Rule-Based Joins
  (X,Y,C) := X isa C, Y isa C, BLAST(X,Y,S), S > 0.8   [homology, lub]
  (X,Y,[produces,B,increased_in]) := X produces B, B increased_in Y.   [rule-based]
  e.g., X=-secretase, B=beta amyloid, Y=Alzheimer’s disease
• Challenge:
  – compile semantic joins into efficient syntactic ones
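The three kinds of links can be sketched in Python as a toy illustration, not the mediator's implementation. The is-a hierarchy, entity names, and sequence strings are made up; BLAST is replaced by a `difflib` string-similarity ratio, which is not a real sequence alignment; and `lub` is computed as the nearest shared ancestor, assuming a tree-shaped hierarchy.

```python
from difflib import SequenceMatcher

# Hypothetical is-a hierarchy: child concept -> parent concept.
ISA = {
    "NCS-1": "calcium-binding protein",
    "calbindin": "calcium-binding protein",
    "calcium-binding protein": "protein",
    "actin": "protein",
}

# Hypothetical entities: id -> (concept, made-up sequence string).
ENTITIES = {
    "rat_ncs1": ("NCS-1", "ABCDEFGHIJ"),
    "human_ncs1": ("NCS-1", "ABCDEFGHIK"),
    "rat_calb": ("calbindin", "ABCDXYZWVU"),
    "mouse_actin": ("actin", "ZYXWVUTSRQ"),
}


def equality_join(xs, ys, key_x, key_y):
    """Syntactic join: (X, Y) := X.key_x = Y.key_y."""
    return [(x, y) for x in xs for y in ys if x[key_x] == y[key_y]]


def similarity(a, b):
    """Stand-in for BLAST: a 0..1 string similarity (NOT a real alignment)."""
    return SequenceMatcher(None, a, b).ratio()


def ancestors(concept, isa=ISA):
    """The concept itself, then its is-a ancestors, nearest first."""
    chain = [concept]
    while chain[-1] in isa:
        chain.append(isa[chain[-1]])
    return chain


def lub(c1, c2, isa=ISA):
    """Least upper bound in a tree: the nearest ancestor shared by c1 and c2."""
    above_c2 = set(ancestors(c2, isa))
    return next((c for c in ancestors(c1, isa) if c in above_c2), None)


def semantic_join(xs, ys, entities=ENTITIES, isa=ISA, threshold=0.8):
    """(X, Y, C) := X isa C, Y isa C, SIM(X, Y, S), S > threshold, C = lub."""
    result = []
    for x in xs:
        cx, sx = entities[x]
        for y in ys:
            cy, sy = entities[y]
            if similarity(sx, sy) > threshold:
                c = lub(cx, cy, isa)
                if c is not None:
                    result.append((x, y, c))
    return result
```

With this toy data, `semantic_join(["rat_ncs1"], ["human_ncs1", "rat_calb", "mouse_actin"])` returns `[("rat_ncs1", "human_ncs1", "NCS-1")]`: only that pair scores above 0.8 (0.9, versus 0.4 and 0.0). The nested loops score every pair, which is the cost the slide's challenge targets: compiling such semantic joins into efficient syntactic ones.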
XML-Based vs. Model-Based Mediation
[Diagram: two mediation stacks built over Raw Data.]
• XML-Based Mediation
  – XML Models: XML elements
  – Structural Constraints (DTDs): parent, child, sibling, ...; e.g., A = (B*|C),D; B = ...
  – No Domain Constraints
  – Integrated-DTD := XQuery(Src1-DTD, ...)
• Model-Based Mediation
  – Conceptual Models: (XML) objects; classes, relations, ontologies (is-a, has-a, ...)
  – Logical Domain Constraints (IF ... THEN ... rules)
  – “Glue” Maps: Domain Maps, Process Maps
  – Integrated-CM := CM-QL(Src1-CM, ...)
CM ~ {Descr. Logic, ER, UML, RDF/XML(-Schema), …}; CM-QL ~ {F-Logic, DAML+OIL, …}
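The contrast between the two stacks can be illustrated with a small Python sketch. The DTD content model `A = (B*|C),D` from the slide, encoded as a regular expression over child tag names, only checks nesting and order, whereas an IF ... THEN domain constraint checks what the data means. The regex encoding and the Purkinje-cell rule are illustrative assumptions, not part of either mediator.

```python
import re

# DTD-style content model for element A from the slide: A = (B*|C),D
# encoded as a regex over the space-terminated sequence of child tag names.
A_CONTENT = re.compile(r"(?:(?:B )*|C )D ")


def valid_children_of_a(tags):
    """Structural check only: does the child sequence match (B*|C),D ?"""
    return A_CONTENT.fullmatch("".join(t + " " for t in tags)) is not None


# Hypothetical domain constraint in IF ... THEN form (not from the slide):
# IF a record describes a Purkinje cell THEN its region must be the cerebellum.
DOMAIN_RULES = [
    ("purkinje-in-cerebellum",
     lambda r: r.get("cell") == "Purkinje cell",
     lambda r: r.get("region") == "cerebellum"),
]


def violations(record, rules=DOMAIN_RULES):
    """Names of rules whose IF part holds but whose THEN part fails."""
    return [name for name, cond, concl in rules if cond(record) and not concl(record)]
```

`valid_children_of_a(["B", "B", "D"])` and `valid_children_of_a(["C", "D"])` are accepted and `["B", "C", "D"]` is rejected, yet a record `{"cell": "Purkinje cell", "region": "hippocampus"}` can sit inside a structurally valid document; only the domain rule flags it.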
NCMIR ANATOM Domain Map:
• concepts
• relations
• logic rules
Semantics-Aware Browsing and Querying
[Diagram: a has-a hierarchy of domain-map concepts (Cerebellum, Cerebellar Cortex, Granule Cell Layer, Purkinje Cell Layer, Molecular Layer, Purkinje Neuron, Purkinje Cell Dendrite, Dendritic spines, Dendritic shaft, Endoplasmic reticulum), with data from Source 1, Source 2, and Source 3 attached to its concepts.]
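One way to read semantics-aware querying, sketched in Python under assumptions: each source's data is anchored at a domain-map concept, and a query about a concept also reaches data anchored at its parts, found by following has-a edges. The edges below are a simplified, hypothetical reading of the diagram's concepts, and the source anchors are invented for illustration.

```python
from collections import deque

# Hypothetical, simplified has-a edges using concept names from the diagram;
# the exact edges of the original figure are not recoverable.
HAS_A = {
    "Cerebellum": ["Cerebellar Cortex"],
    "Cerebellar Cortex": ["Granule Cell Layer", "Purkinje Cell Layer", "Molecular Layer"],
    "Purkinje Cell Layer": ["Purkinje Neuron"],
    "Purkinje Neuron": ["Purkinje Cell Dendrite"],
    "Purkinje Cell Dendrite": ["Dendritic spines", "Dendritic shaft"],
}

# Hypothetical anchoring of each source's data at a domain-map concept.
ANCHORS = {
    "Source 1": "Purkinje Cell Dendrite",
    "Source 2": "Dendritic spines",
    "Source 3": "Granule Cell Layer",
}


def parts_of(concept):
    """The concept plus everything reachable from it via has-a edges (BFS)."""
    seen, queue = {concept}, deque([concept])
    while queue:
        for part in HAS_A.get(queue.popleft(), []):
            if part not in seen:
                seen.add(part)
                queue.append(part)
    return seen


def sources_for(concept):
    """Sources whose data is anchored at the concept or at one of its parts."""
    scope = parts_of(concept)
    return sorted(s for s, anchor in ANCHORS.items() if anchor in scope)
```

`sources_for("Cerebellum")` reaches all three sources and `sources_for("Purkinje Neuron")` returns `["Source 1", "Source 2"]`, although no source registered data at either queried concept itself.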
Formalizing Glue Knowledge: Domain Map for SYNAPSE and NCMIR
• Domain Map = labeled graph with concepts ("classes") and roles ("associations")
• additional semantics: expressed as logic rules (F-logic)
Domain Expert Knowledge:
Purkinje cells and Pyramidal cells have dendrites that have higher-order branches that contain spines. Dendritic spines are ion (calcium) regulating components. Spines have ion binding proteins. Neurotransmission involves ionic activity (release). Ion-binding proteins control ion activity (propagation) in a cell. Ion-regulating components of cells affect ionic activity (release).
[Figure panels: Domain Map (DM); DM in Description Logic]
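The expert statements can be turned into facts plus rules. The Python sketch below uses naive forward chaining to a fixpoint as a stand-in for the F-logic rules mentioned on the slide; the triple encoding, the role names (`part`, `has_part_kind`), and the four rules are assumptions chosen for illustration.

```python
# Facts paraphrasing the domain expert knowledge as (subject, role, object)
# triples; role names and rules are hypothetical stand-ins for F-logic.
FACTS = {
    ("Purkinje cell", "has", "dendrite"),
    ("Pyramidal cell", "has", "dendrite"),
    ("dendrite", "has", "higher-order branch"),
    ("higher-order branch", "contains", "spine"),
    ("spine", "isa", "ion-regulating component"),
    ("spine", "has", "ion-binding protein"),
    ("ion-binding protein", "controls", "ion activity"),
    ("ion-regulating component", "affects", "ionic activity"),
    ("neurotransmission", "involves", "ionic activity"),
}


def derive(facts):
    """Naive forward chaining to a fixpoint with four rules:
    R1  X has Y  or  X contains Y  =>  X part Y
    R2  X part Y, Y part Z         =>  X part Z          (transitivity)
    R3  X part Y, Y isa C          =>  X has_part_kind C
    R4  Y isa C, C affects A       =>  Y affects A       (inheritance)
    """
    facts = set(facts)
    while True:
        part = {(x, y) for x, r, y in facts if r == "part"}
        isa = {(x, c) for x, r, c in facts if r == "isa"}
        new = {(x, "part", y) for x, r, y in facts if r in ("has", "contains")}
        new |= {(x, "part", z) for x, y in part for y2, z in part if y == y2}
        new |= {(x, "has_part_kind", c) for x, y in part for y2, c in isa if y == y2}
        new |= {(y, "affects", a) for y, c in isa
                for c2, r, a in facts if r == "affects" and c2 == c}
        if new <= facts:  # nothing new: fixpoint reached
            return facts
        facts |= new
```

Among the derived facts is `("Purkinje cell", "has_part_kind", "ion-regulating component")`: through its dendrites and their higher-order branches, a Purkinje cell has a part (spines) that is an ion-regulating component, which no single stated fact says.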
Source Registration/Data Contextualization
A source registers its data with an existing ontology; using description logics, it may also refine the mediator’s domain map ... [ICDE’01]
Sources can register new concepts at the mediator ...
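Registration can be sketched as a small Python data structure: a source anchors its data at an existing concept, or places a new concept under an existing one and thereby refines the domain map. This sketch uses only a plain is-a tree; the description-logic reasoning referred to above (e.g., classifying a new concept from its definition) is not modeled, and all class and method names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DomainMap:
    """Hypothetical mediator-side domain map: concepts, is-a links, anchors."""
    concepts: set = field(default_factory=set)
    isa: dict = field(default_factory=dict)      # concept -> parent concept
    anchors: dict = field(default_factory=dict)  # concept -> set of sources

    def add_concept(self, concept: str, parent: Optional[str] = None) -> None:
        if parent is not None and parent not in self.concepts:
            raise KeyError(f"unknown parent concept: {parent}")
        self.concepts.add(concept)
        if parent is not None:
            self.isa[concept] = parent

    def register(self, source: str, concept: str, parent: Optional[str] = None) -> None:
        """Anchor a source's data at a concept. An unknown concept is accepted
        only together with an existing parent, which refines the domain map."""
        if concept not in self.concepts:
            if parent is None:
                raise KeyError(f"unknown concept without parent: {concept}")
            self.add_concept(concept, parent)
        self.anchors.setdefault(concept, set()).add(source)

    def subsumers(self, concept: str) -> list:
        """The concept and its is-a ancestors, nearest first."""
        chain = [concept]
        while chain[-1] in self.isa:
            chain.append(self.isa[chain[-1]])
        return chain
```

For example, after `add_concept("dendrite")`, a call `register("NCMIR", "spiny dendrite", parent="dendrite")` both adds the new concept and anchors NCMIR's data there; registering an unknown concept without a parent is rejected.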
Source Registration: Semantic Annotations
Multiple Ways of Querying Data
[Diagram: Ontologies (Brain has a Cerebellum, Cerebellum has a Purkinje Cell Layer, Purkinje Cell Layer has a Purkinje cell, Purkinje cell is a neuron) linked via Transformations to a Spatial Representation (Atlases).]
Model-Based Mediator Architecture
[Diagram: the USER/Client exchanges CM queries and results (in XML) with the Mediator Engine (FL rule proc., LP rule proc., graph proc., on the XSB Engine), which has access to the integrated view CM defined by the Integrated View Definition (IVD) and to Domain Maps (DMs), Process Maps (PMs), and “Glue” Maps (GMs). Sources S1, S2, S3 are each wrapped by an (XML-)Wrapper and a CM-Wrapper that exports the source's conceptual model (CM S1, CM S2, CM S3, each marked GCM) together with its semantic context CON(S).]
Conceptual Model =
• Object Model
• Knowledge Base
• Contextualization
First Results & Demos: [SSDBM’00] [VLDB’00] [ICDE’01] [HBP’01] [EDBT’02] [BNCOD’02]
Model-Based Mediation Methodology ...
• Lift sources to export CMs:
  CM(S) = OM(S) + KB(S) + CON(S)
• Object Model OM(S):
  – complex objects (frames), class hierarchy, OO constraints
• Knowledge Base KB(S):
  – explicit representation of (“hidden”) source semantics
  – logic rules over OM(S)
• Contextualization CON(S):
  – situate OM(S) data using “glue maps” (GMs):
    domain maps DMs (ontology) = terminological knowledge: concepts + roles
    process maps PMs = “procedural knowledge”: states + transitions
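A minimal Python sketch of CM(S) = OM(S) + KB(S) + CON(S), assuming frames are plain dictionaries, KB rules are functions that derive extra attributes, and CON(S) maps source classes to domain-map concepts. The class names, the spine-density rule, and its attributes are hypothetical; process maps are not modeled.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

Frame = Dict[str, object]  # a complex object: attribute -> value


@dataclass
class ConceptualModel:
    """Hypothetical container for CM(S) = OM(S) + KB(S) + CON(S)."""
    frames: List[Frame]                                                  # OM(S): objects
    subclass_of: Dict[str, str]                                          # OM(S): class hierarchy
    rules: List[Callable[[Frame], Frame]] = field(default_factory=list)  # KB(S)
    context: Dict[str, str] = field(default_factory=dict)                # CON(S)

    def is_a(self, cls: Optional[str], ancestor: str) -> bool:
        """Walk the class hierarchy upward from cls."""
        while cls is not None:
            if cls == ancestor:
                return True
            cls = self.subclass_of.get(cls)
        return False

    def export(self) -> List[Frame]:
        """Frames enriched with KB-derived attributes and a domain-map concept."""
        out = []
        for frame in self.frames:
            enriched = dict(frame)
            for rule in self.rules:
                enriched.update(rule(enriched))
            enriched["concept"] = self.context.get(frame["class"])
            out.append(enriched)
        return out


def spine_density(f: Frame) -> Frame:
    """KB rule: make the hidden relationship density = count / length explicit."""
    if "spine_count" in f and "length_um" in f:
        return {"spines_per_um": f["spine_count"] / f["length_um"]}
    return {}


cm = ConceptualModel(
    frames=[{"class": "SpineMeasurement", "id": 1, "spine_count": 10, "length_um": 5.0}],
    subclass_of={"SpineMeasurement": "Measurement"},
    rules=[spine_density],
    context={"SpineMeasurement": "Dendritic spines"},
)
```

Exporting this CM yields the frame with a derived `spines_per_um` of 2.0 and the concept `"Dendritic spines"`, so the mediator sees both the made-explicit semantics and the frame's place in the domain map.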
... Model-Based Mediation Methodology
• Integrated View Definition (IVD)
  – declarative (logic) rules with object-oriented features
  – defined over CM(S), domain maps, process maps
  – needs “mediation engineers” = domain + KRDB experts
• Knowledge-Based Querying and Browsing (runtime):
  – mediator composes the user query Q with the IVD
    ... rewrites (Q ∘ IVD), sends subqueries to sources
    ... post-processes returned results (e.g., situates them in context)
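The runtime steps can be sketched in Python with toy data: the IVD is a union over the sources, the naive plan materializes the view and then applies Q, and the rewritten plan pushes Q into the subqueries, followed by a post-processing step that attaches domain-map concepts. Source contents, the concept mapping, and all function names are hypothetical; real rewriting of queries over logic rules is considerably more involved.

```python
# Hypothetical source contents (made up for illustration).
SOURCES = {
    "S1": [
        {"protein": "NCS-1", "region": "cerebellum", "species": "rat"},
        {"protein": "calbindin", "region": "hippocampus", "species": "rat"},
    ],
    "S2": [
        {"protein": "NCS-1", "region": "cerebellum", "species": "mouse"},
        {"protein": "actin", "region": "cortex", "species": "mouse"},
    ],
}

# Hypothetical glue map: source vocabulary -> domain-map concept.
REGION_CONCEPT = {"cerebellum": "Cerebellum", "hippocampus": "Hippocampus",
                  "cortex": "Cerebral Cortex"}

shipped = {"rows": 0}  # number of rows the sources send to the mediator


def subquery(source, pred):
    """A wrapper evaluating a selection at the source."""
    rows = [r for r in SOURCES[source] if pred(r)]
    shipped["rows"] += len(rows)
    return rows


def ivd(pred=lambda r: True):
    """Integrated view: union of all sources, each row tagged with its origin."""
    return [dict(r, source=s) for s in SOURCES for r in subquery(s, pred)]


def answer_naive(q):
    """Materialize the whole view, then apply the user query."""
    return [r for r in ivd() if q(r)]


def answer_rewritten(q):
    """Q o IVD rewritten: push q into the subqueries. Valid here only because
    q reads source attributes, not the 'source' tag that the view adds."""
    return ivd(q)


def post_process(rows):
    """Situate results in context by attaching the domain-map concept."""
    return [dict(r, concept=REGION_CONCEPT.get(r["region"])) for r in rows]
```

With `q = lambda r: r["protein"] == "NCS-1"`, both plans return the same two rows, but the naive plan ships four rows from the sources and the rewritten plan only two.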
Mediation Scenarios & Techniques

| Federated Databases | XML-Based Mediation | Model-Based Mediation |
|---|---|---|
| One-World | One-/Multiple-Worlds | Complex Multiple-Worlds |
| Common Schema | Mediated Schema | Common Glue Maps |
| SQL, rules | XML query languages | DOOD query languages |
| Schema Transformations | Syntax-Aware Mappings | Semantics-Aware Mappings |
| Syntactic Joins | Syntactic Joins | “Semantic” Joins via Glue Maps |
| DB expert | DB expert | KRDB + domain experts |
Some Observations
• Scientific data integration is different
  – e.g., complex and hidden semantics, ...
• Co-education (CS => DS, DS => CS) takes time
  – NIH BioInformatics Research Network (BIRN): neuroscientists
  – DOE Scientific Data Management Center (SDM): starting with ecologists, geoscientists, ...
• A good thing about standards: there are so many to choose from:
  – SQL, HTTP, HTML, XML, XQuery, XSLT, XML Schema, RDF(S), DAML+OIL, DAML-S, UMLS, GO, XMI, SOAP, WSDL, ...
• Syntax is overrated (and its impact underestimated?)
  – nobody likes LISP any more, but everybody likes XML ...
• 2nd Marriage of Knowledge Representation & Databases:
  – Semantic Web
  – (child from 1st marriage: Deductive Databases; aren’t they cute siblings? ;)
  => model-based/semantic mediators
The Road Ahead: Scientific Data Integration with the Semantic Web !?
[Diagram: Scientific Data on the Data-Grid linked to Integrated Data Views; labels include Internet2, SOAP, WSDL, XML, XMLDB, XQuery, RDB, ORDB, RDF, DOOD rules, DAML, OIL, DAML-S, ontologies, logic, description logics, subsumption, inference, and an “Ivory Tower”.]
Some Related References: Mediation of Neuroscience Data
• Model-Based Mediation with Domain Maps, B. Ludäscher, A. Gupta, M. E. Martone, 17th Intl. Conference on Data Engineering (ICDE), Heidelberg, Germany, IEEE Computer Society, April 2001.
• Navigating Virtual Information Sources with Know-ME, X. Qian, B. Ludäscher, M. E. Martone, A. Gupta, demonstration track, Intl. Conference on Extending Database Technology (EDBT), Prague, Czech Republic, March 2002.
• Model-Based Information Integration in a Neuroscience Mediator System, B. Ludäscher, A. Gupta, M. E. Martone, demonstration track, 26th Intl. Conference on Very Large Databases (VLDB), Cairo, Egypt, September 2000.
• Knowledge-Based Integration of Neuroscience Data Sources, A. Gupta, B. Ludäscher, M. E. Martone, 12th Intl. Conference on Scientific and Statistical Database Management (SSDBM), Berlin, Germany, IEEE Computer Society, July 2000.
• A Cell-Centered Database for Electron Tomographic Data, M. E. Martone, A. Gupta, M. Wong, X. Qian, G. Sosinsky, S. Lamont, B. Ludäscher, and M. H. Ellisman, Journal of Structural Biology, 2002, to appear.