Scientific Data Integration with Model-Based Mediation: Databases Meets* Knowledge Representation
Bertram Ludäscher
[email protected]
Knowledge-Based Integration Lab
Data and Knowledge Systems
San Diego Supercomputer Center
U.C. San Diego
* or rather rediscovers
National Partnership of Advanced Computational Infrastructure San Diego Supercomputer Center
Information Integration from a DB Perspective
• Information Integration Challenge
– Given: data sources S_1, ..., S_k (DBMS, web sites, ...) and user questions Q_1, ..., Q_n that can be answered using the S_i
– Find: the answers to Q_1, ..., Q_n
• The Database Perspective: source = “database”
– S_i has a schema (relational, XML, OO, ...)
– S_i can be queried
– define virtual (or materialized) integrated views V over S_1, ..., S_k using database query languages
– questions become queries Q_i against V(S_1, ..., S_k)
• Why a Database Perspective?
– scalability, efficiency, reusability (declarative queries), ...
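A minimal sketch of the database perspective, assuming two hypothetical sources (`s1_genes` and `s2_annot`, invented for illustration) loaded into an in-memory SQLite database. The integrated view V is defined declaratively with SQL and is virtual: the join runs only when V is queried, and a user question becomes an ordinary query against V.

```python
import sqlite3

# In-memory database standing in for two hypothetical sources S_1 and S_2.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE s1_genes (gene TEXT, organism TEXT);       -- hypothetical source S_1
    CREATE TABLE s2_annot (gene_id TEXT, annotation TEXT);  -- hypothetical source S_2
    INSERT INTO s1_genes VALUES ('g1', 'mouse'), ('g2', 'rat');
    INSERT INTO s2_annot VALUES ('g1', 'ion binding'), ('g3', 'transport');

    -- Virtual integrated view V(S_1, S_2): nothing is materialized;
    -- the join is evaluated each time V is queried.
    CREATE VIEW v AS
        SELECT g.gene, g.organism, a.annotation
        FROM s1_genes AS g JOIN s2_annot AS a ON g.gene = a.gene_id;
""")

# The user question "which mouse genes have a known annotation?"
# becomes a declarative query Q against V.
rows = conn.execute(
    "SELECT gene, annotation FROM v WHERE organism = ?", ("mouse",)
).fetchall()
print(rows)  # [('g1', 'ion binding')]
```

Materializing V instead (e.g. `CREATE TABLE ... AS SELECT`) would trade freshness for query speed; the view keeps answers in sync with the sources.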
Information Integration Landscape
[Figure: integration approaches plotted along two axes, conceptual distance (one-world to multiple-worlds) and conceptual complexity/depth (low to high). Regions: DB mediation techniques; Ontologies / KR formalisms; Model-Based Mediation. Systems and resources placed in the plot: addall, book-buyer, home-buyer, 24x7 consumer, BLAST, EcoCyc, Cyc, WordNet, GO, NCBI, UMLS, MIA, Entrez, RiboWeb, Tambis. Application areas: Bioinformatics, Geoinformatics.]
What’s the Problem with XML & Complex Multiple-Worlds?
• XML is Syntax
– DTDs talk about element nesting
– XML Schema gives you data types
– need anything else? => write comments!
• Domain Semantics is complex:
– implicit assumptions, hidden semantics => sources seem unrelated to the non-expert
• Need Structure and Semantics beyond XML trees!
– employ richer OO models (UML, EER, ...)
– make domain semantics and “glue knowledge” explicit
– use ontologies to fix terminology and conceptualization
– avoid ambiguities by using formal semantics
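To see why XML alone is not enough, consider a small sketch with two hypothetical XML sources that describe the same kind of object under different tags. The XML trees say nothing about that correspondence; it only becomes usable once the glue knowledge is written down explicitly. Here the glue is a plain mapping from (source, tag) to a shared concept; the concept names are invented for illustration and stand in for a real ontology.

```python
import xml.etree.ElementTree as ET

# Two hypothetical sources. Nothing in the XML says that <spine> and
# <dendritic_spine> denote the same concept.
src_a = ET.fromstring("<cells><spine id='a1'/><spine id='a2'/></cells>")
src_b = ET.fromstring("<images><dendritic_spine id='b7'/><soma id='b8'/></images>")

# Explicit "glue knowledge": (source, element tag) -> shared concept.
# Concept names are hypothetical, chosen only for this example.
GLUE = {
    ("A", "spine"): "DendriticSpine",
    ("B", "dendritic_spine"): "DendriticSpine",
    ("B", "soma"): "Soma",
}

def lift(source_name, root):
    """Turn the root's child elements into (concept, id) facts via GLUE."""
    facts = []
    for elem in root:
        concept = GLUE.get((source_name, elem.tag))
        if concept is not None:
            facts.append((concept, elem.get("id")))
    return facts

facts = lift("A", src_a) + lift("B", src_b)
spines = sorted(i for c, i in facts if c == "DendriticSpine")
print(spines)  # ['a1', 'a2', 'b7']
```

A flat mapping like this is the weakest form of glue; the richer models listed above (OO models, ontologies with formal semantics) also capture relationships between concepts, not just synonyms.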
XML-Based vs. Model-Based Mediation
Formalizing Glue Knowledge: Domain Map for SYNAPSE and NCMIR
• Domain Map = labeled graph with concepts ("classes") and roles ("associations")
• additional semantics: expressed as logic rules (F-logic)
Domain Expert Knowledge:
Purkinje cells and Pyramidal cells have dendrites that have higher-order branches that contain spines. Dendritic spines are ion (calcium) regulating components. Spines have ion binding proteins. Neurotransmission involves ionic activity (release). Ion-binding proteins control ion activity (propagation) in a cell. Ion-regulating components of cells affect ionic activity (release).
[Figure: the resulting Domain Map (DM) as a labeled graph, and the same DM in Description Logic]
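The domain expert's statements above can be sketched as a labeled graph of (concept, role, concept) edges. The extra rule applied here is an assumption for illustration, not the rule set used in the original system: a subconcept inherits the associations of its superconcepts (X isa Y, Y R Z ⇒ X R Z). Concept names are simplified, and "ion activity" and "ionic activity" are merged into one concept.

```python
# Domain map as a labeled graph, transcribed (simplified) from the
# domain expert's statements.
DM = {
    ("Purkinje_cell", "has", "dendrite"),
    ("Pyramidal_cell", "has", "dendrite"),
    ("dendrite", "has", "higher_order_branch"),
    ("higher_order_branch", "contains", "spine"),
    ("spine", "isa", "ion_regulating_component"),
    ("spine", "has", "ion_binding_protein"),
    ("ion_binding_protein", "controls", "ionic_activity"),
    ("ion_regulating_component", "affects", "ionic_activity"),
    ("neurotransmission", "involves", "ionic_activity"),
}

def inherit_roles(edges):
    """Hypothetical rule, applied until no new facts appear (a fixpoint):
    X isa Y and Y R Z with R != isa  ==>  X R Z."""
    facts = set(edges)
    while True:
        new = {
            (x, r, z)
            for (x, r1, y) in facts if r1 == "isa"
            for (y2, r, z) in facts if y2 == y and r != "isa"
        } - facts
        if not new:
            return facts
        facts |= new

closed = inherit_roles(DM)
print(("spine", "affects", "ionic_activity") in closed)  # True
```

The derived edge links spines to ionic activity even though no source states that directly; this is the kind of "glue" a mediator can use to relate otherwise unrelated sources.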
    anatomical_structures ->> {AS : anatomical_structure[name -> Anatom]}],   % from PROLAB
    NAE : neuro_anatomic_entity[name -> Anatom;                               % from ANATOM
                                located_in ->> {Brain_region}],
    AS..segments..features[name -> Feature_name; value -> Value].
• provided by the domain expert and mediation engineer
• deductive OO language (here: F-logic)
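The rule body above joins an anatomical structure AS from PROLAB with a neuro-anatomic entity NAE from ANATOM through the shared variable Anatom, then follows the path AS..segments..features. The following is a rough procedural reading of that body, assuming hypothetical nested-record exports for both sources; all field values are invented. The rule head is not shown on the slide, so the sketch simply yields the joined variable bindings.

```python
# Hypothetical exports of the two sources named in the rule (toy data).
# PROLAB: anatomical structures whose segments carry named feature values.
prolab = [
    {"anatomical_structures": [
        {"name": "dendrite",
         "segments": [{"features": [{"name": "spine_density", "value": 1.8}]}]},
    ]},
]
# ANATOM: neuro-anatomic entities and the brain regions they are located in.
anatom = [
    {"name": "dendrite", "located_in": ["cerebellum", "hippocampus"]},
    {"name": "axon", "located_in": ["cerebellum"]},
]

def rule_body_bindings():
    """Yield (Anatom, Brain_region, Feature_name, Value) bindings."""
    for study in prolab:
        for as_ in study["anatomical_structures"]:      # AS : anatomical_structure
            for nae in anatom:                          # NAE : neuro_anatomic_entity
                if nae["name"] != as_["name"]:          # both sides: name -> Anatom
                    continue
                for region in nae["located_in"]:        # located_in ->> {Brain_region}
                    for seg in as_["segments"]:         # AS..segments
                        for f in seg["features"]:       # ..features[name; value]
                            yield (as_["name"], region, f["name"], f["value"])

rows = list(rule_body_bindings())
print(rows)
```

A deductive engine would evaluate the same join declaratively and could reorder it; the nested loops only make the variable bindings visible.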
Process Maps with Abstractions and Elaborations: => From Terminological to “Procedural Glue”
• What is the Erdoes number of person P?
– 3
• Really? Why?
– authority based: <VIP> said so
– faith based: don’t know but believe firmly
– query statement Q = ... derived it from DB
– query Q = ... derived it from DB and KB using derivation D
=> logic-based systems often “come with explanations”
=> ultimate goal: “computations as proofs”, “explanation-based computing”
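The Erdoes-number example above can be made concrete as a computation that returns its own explanation. In this sketch, breadth-first search over a hypothetical co-authorship graph (placeholder names, not real data) returns both the number and the chain of co-authorships that proves it. Each consecutive pair on the path is a fact the answer depends on.

```python
from collections import deque

# Hypothetical co-authorship graph; names are placeholders.
COAUTHORS = {
    "Erdos": {"A"},
    "A": {"Erdos", "B"},
    "B": {"A", "C", "P"},
    "C": {"B"},
    "P": {"B"},
}

def erdos_number(person, graph, root="Erdos"):
    """BFS from root. Returns (number, path); the path is the explanation."""
    parent = {root: None}
    queue = deque([root])
    while queue:
        current = queue.popleft()
        if current == person:
            path = []
            while current is not None:       # walk back to the root
                path.append(current)
                current = parent[current]
            path.reverse()
            return len(path) - 1, path
        for nxt in sorted(graph.get(current, ())):
            if nxt not in parent:
                parent[nxt] = current
                queue.append(nxt)
    return None, []  # not connected: no finite Erdoes number

n, proof = erdos_number("P", COAUTHORS)
print(n, proof)  # 3 ['Erdos', 'A', 'B', 'P']
```

BFS guarantees the returned path is a shortest one, so the explanation also justifies minimality: no shorter chain of co-authorships exists in this graph.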
Ontologies
• So what is an Ontology?
– definition of things that are relevant to your application
– representation of terminological knowledge (“TBox”)
– explicit specification of a conceptualization
– concept hierarchy (“is-a”)
– further semantic relationships between concepts
– abstractions of relational schemas, (E)ER, UML classes, XML Schemas
• Examples:
– NCMIR ANATOM
– GO (Gene Ontology)
– UMLS (Unified Medical Language System)
– CYC
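The concept hierarchy ("is-a") part of a TBox can be sketched as a graph of direct is-a links. Subsumption (is concept A more general than concept B?) then reduces to reachability along those links. The concept names below are hypothetical and are not taken from ANATOM, GO, UMLS, or CYC.

```python
# Tiny hypothetical TBox fragment: direct is-a links (a concept may have
# several parents).
IS_A = {
    "Purkinje_cell": {"neuron"},
    "pyramidal_cell": {"neuron"},
    "neuron": {"cell"},
    "spine": {"cell_component"},
    "cell_component": {"anatomical_structure"},
    "cell": {"anatomical_structure"},
}

def ancestors(concept, is_a=IS_A):
    """Reflexive-transitive closure of is-a starting at `concept`."""
    seen, stack = {concept}, [concept]
    while stack:
        for parent in is_a.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def subsumes(general, specific):
    """True if `general` is `specific` or one of its superconcepts."""
    return general in ancestors(specific)

print(subsumes("anatomical_structure", "Purkinje_cell"))  # True
print(subsumes("neuron", "spine"))                        # False
```

A description logic reasoner derives subsumption from concept definitions rather than from an explicitly listed hierarchy; this sketch covers only the explicit "is-a" case.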
• Assertional Knowledge (ABox)
– the marked neuron in image 27
=> the concrete instances/individuals of the concepts/classes that your sources export
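The split between TBox and ABox shows up in instance retrieval. An individual asserted to belong to a specific concept, such as the marked neuron in image 27, is also an instance of every superconcept. The sketch below assumes a tiny hypothetical single-parent, acyclic is-a chain and invented individual names.

```python
# Hypothetical TBox (single-parent, acyclic is-a chain) and ABox
# (concept assertions about individuals exported by the sources).
IS_A = {"Purkinje_cell": "neuron", "neuron": "cell"}
ABOX = [
    ("neuron_27", "Purkinje_cell"),  # e.g. "the marked neuron in image 27"
    ("cell_3", "cell"),
]

def superconcepts(c):
    """c followed by all its ancestors along the is-a chain."""
    out = [c]
    while c in IS_A:
        c = IS_A[c]
        out.append(c)
    return out

def instances(concept):
    """Individuals asserted to be in `concept` or in any of its subconcepts."""
    return sorted(ind for ind, c in ABOX if concept in superconcepts(c))

print(instances("cell"))    # ['cell_3', 'neuron_27']
print(instances("neuron"))  # ['neuron_27']
```

In this way a query phrased with a general concept from the domain map can still find individuals that a source exported under a more specific class.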
Description Logic
• DL definition of “Happy Father” (Example from Ian Horrocks, U Manchester, UK)
Some Open Database & Knowledge Representation Issues
• Mix of Query Processing and Reasoning
– FaCT description logic reasoner for DMs?
– or reconciliation of DMs via argumentation frameworks (“games”) using well-founded and stable models of logic programs [ICDT97, PODS97, TCS00]
• Modeling “Process Knowledge” => Process Maps
– formal semantics? (dynamic/temporal/Kripke models?)
– executable semantics? (Statelog?)
• Graph Queries over DMs and PMs
– expressible in F-logic [InfSystem98]
– scalability? (UMLS Domain Map has millions of entries)
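Graph queries over domain maps are typically recursive, for example "which parts does a Purkinje cell contain, at any depth?". One standard way to evaluate such queries efficiently is semi-naive evaluation of the recursive rule. Each round joins only the paths derived in the previous round with the edges, instead of re-joining all paths found so far. The example below is a generic sketch of that technique, not the F-logic engine referenced above.

```python
def transitive_closure(edges):
    """Semi-naive evaluation of
         path(X, Y) :- edge(X, Y).
         path(X, Y) :- path(X, Z), edge(Z, Y).
    Only the delta (paths new in the last round) is joined with edge."""
    succ = {}
    for x, y in edges:
        succ.setdefault(x, set()).add(y)
    path = set(edges)
    delta = set(edges)
    while delta:
        new = {(x, z) for (x, y) in delta for z in succ.get(y, ())} - path
        path |= new
        delta = new
    return path

# Usage: part-of edges loosely following the domain map example.
part_of = {
    ("Purkinje_cell", "dendrite"),
    ("dendrite", "higher_order_branch"),
    ("higher_order_branch", "spine"),
}
reach = transitive_closure(part_of)
print(("Purkinje_cell", "spine") in reach)  # True
```

The delta trick avoids rederiving the same facts in every round, which matters for scalability when a domain map has millions of entries; indexing `succ` by source node keeps each join step proportional to the size of the delta.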