From Data Integration To Semantic Mediation: Addressing Heterogeneities in Data
Bertram Ludäscher [email protected]
Knowledge-Based Information Systems Lab, San Diego Supercomputer Center and Department of Computer Science & Engineering, University of California, San Diego
“gluing” together multiple data sources
bridging information and knowledge gaps computationally
Information Integration from a DB Perspective
• Information Integration Problem
– Given: data sources S1, ..., Sk (DBMS, web sites, ...) and user questions Q1, ..., Qn that can be answered using the Si
– Find: the answers to Q1, ..., Qn
• The Database Perspective: source = “database”
– Si has a schema (relational, XML, OO, ...)
– Si can be queried
– define virtual (or materialized) integrated views V over S1, ..., Sk using database query languages (SQL, XQuery, ...)
– questions become queries Qi against V(S1, ..., Sk)
Outline
1. Information Integration from a Database Perspective
2. XML-Based Data Integration
... in their wonderful book called <title>SemWeb Tractat </title> by B. Schatz and T.B. Lee, the authors show how ...
... in their wonderful book called <title>SemWeb Tractat</title> by <author>B. Schatz</author> and <author> T.B. Lee</author>, the authors show how ...
Query Q ( G (S1, ..., Sk) )
Some Challenges in XML-Based Integration ...
• XML Query/Transformation Languages
– DB community: QLs for semistructured data, e.g., TSIMMIS/MSL, Lorel, Yatl, ..., Florid/F-logic [InfSystems98]
… New Challenges in (XML-Based) Mediation
• Global-As-View (GAV)
– user query Q is posed over global relations G: Q(G)
– global relations G are defined as views over source relations S: G(S)
– challenge: compute answers Q(G(V(S))) without computing all of V and G
– query rewriting (with limited source capabilities): Q’(S) = Q(G)
• Local-As-View (LAV)
– user query Q is posed over global relations G: Q(G)
– source relations S are defined as views over global relations G: S(G)
– challenge: “reverse/rewrite rules” from S(G) to some G’(S)
– answering queries using views: equivalent rewritings may not exist
– find maximally contained ones: Q’(G’(S)) ⊆ Q(G)
• Inter(CS)disciplinary research needed: DB, FP, LP
– GAV/LAV, view (un)folding, Clark’s completion, resolution, factoring
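The GAV idea of answering Q(G) without materializing G can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two source relations, their attribute names, and the functions `global_books`, `query_over_global`, and `query_unfolded` are hypothetical and do not come from the slides.

```python
# Hypothetical source relations S1 and S2 (lists of dicts stand in for tables).
S1_BOOKS = [
    {"isbn": "1", "title": "SemWeb Tractat"},
    {"isbn": "2", "title": "Another Title"},
]
S2_PRICES = [
    {"isbn": "1", "price": 30.0},
    {"isbn": "2", "price": 45.0},
]


def global_books(books, prices):
    """GAV view definition G(S): the global relation is a join over the sources."""
    price_by_isbn = {p["isbn"]: p["price"] for p in prices}
    return [
        {"title": b["title"], "price": price_by_isbn[b["isbn"]]}
        for b in books
        if b["isbn"] in price_by_isbn
    ]


def query_over_global(max_price):
    """Q(G): naive evaluation that first materializes all of G, then filters."""
    g = global_books(S1_BOOKS, S2_PRICES)
    return [row["title"] for row in g if row["price"] <= max_price]


def query_unfolded(max_price):
    """Q'(S): the view is unfolded by hand and the selection is pushed to S2."""
    cheap = {p["isbn"] for p in S2_PRICES if p["price"] <= max_price}
    return [b["title"] for b in S1_BOOKS if b["isbn"] in cheap]


# Both evaluation strategies return the same answers: Q'(S) = Q(G).
print(query_over_global(40.0))
print(query_unfolded(40.0))
```

In the LAV direction no such direct unfolding is available, which is why the rewriting step there is the harder problem.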
Querying XML Streams: A New Frontier
• New applications for stream-based XML processing:
– Continuous, real-time data streams (wireless sensor networks, …)
– Data / message transformation in Web services (SOAP, RMI, processing …)
• … leading to a new XML querying & transformation paradigm:
– how to execute (some) XML queries & transformations on very large (infinite) data streams using only limited memory
– XML stream machine (XSM): extended XML transducers with buffers
– XQuery → XSM network
• XSMs clearly outperform tree-based approaches on streamable queries (100x over Xalan) [A Transducer-Based XML Query Processor, Ludäscher, Mukhopadhyay, Papakonstantinou, VLDB’02]
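To make the "limited memory" point concrete, here is a minimal sketch of a streamable query in Python. It is not an XSM and not the processor from the VLDB’02 paper; it swaps in the standard library's incremental parser (`xml.etree.ElementTree.iterparse`). The tag names `book` and `title` and the function `stream_titles` are assumptions made for illustration.

```python
import io
import xml.etree.ElementTree as ET


def stream_titles(source, record_tag="book", field_tag="title"):
    """Yield one field per record while the document is still being read.

    Finished records are discarded immediately, so memory use depends on the
    size of a single record, not on the length of the stream.
    """
    events = ET.iterparse(source, events=("start", "end"))
    _, root = next(events)  # the first event is the start of the root element
    for event, elem in events:
        if event == "end" and elem.tag == record_tag:
            yield elem.findtext(field_tag)
            root.clear()  # drop records that have already been processed


# A small in-memory document stands in for a (possibly unbounded) stream.
data = b"<books><book><title>A</title></book><book><title>B</title></book></books>"
for title in stream_titles(io.BytesIO(data)):
    print(title)
```

A tree-based processor would build the whole document before answering; the sketch above answers record by record, which is the property that makes a query "streamable".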
What’s the Problem with XML & Complex Multiple-Worlds?
• XML is Syntax
– ... for labeled ordered trees
– ... all semantics lies outside of XML
• XML DTDs => tags + nesting
• XML Schema => DTDs + data modeling
• need anything else? => write comments!
• Domain Semantics is Complex:
– implicit assumptions, hidden semantics
– sources seem unrelated to the non-expert
• Need Structure and Semantics beyond trees!
– employ richer OO models
– make domain semantics and “glue knowledge” explicit
– use ontologies to fix terminology and conceptualization
– avoid ambiguities by using KR and formal semantics
Information Integration Landscape
[Figure: systems plotted by conceptual distance (one-world to multiple-worlds) against conceptual complexity/depth (low to high). Regions: DB mediation techniques; Ontologies, KR formalisms; Model-Based Mediation; Bioinformatics; Geo-, Ecoinformatics. Systems shown: addall, book-buyer, BLAST, EcoCyc, Cyc, WordNet, GO, home-buyer, 24x7 consumer, UMLS, MIA, Entrez, RiboWeb, Tambis.]
XML-Based vs. Model-Based Mediation
[Figure: rule-based glue between sources. Recoverable rule fragments:]
X isa C, Y isa C, BLAST(X,Y,S), S>0.8 (homology, lub)
(X,Y,[produces,B,increased_in]) := X produces B, B increased_in Y.
• Object Model OM(S):
– complex objects (frames), class hierarchy, OO constraints
• Knowledge Base KB(S):
– explicit representation of (“hidden”) source semantics
– logic rules over OM(S)
• Contextualization CON(S):
– situate OM(S) data using “glue maps” (ontologies): domain maps DMs
• Knowledge-Based Querying and Browsing (runtime):
– mediator composes the user query Q with the IVD
– ... rewrites (Q o IVD), sends subqueries to sources
– ... post-processes returned results (e.g., situate in context)
Domain Map = labeled graph with concepts ("classes") and roles ("associations")
• additional semantics: expressed as logic rules (F-logic)
Domain Expert Knowledge:
Purkinje cells and Pyramidal cells have dendrites that have higher-order branches that contain spines. Dendritic spines are ion (calcium) regulating components. Spines have ion binding proteins. Neurotransmission involves ionic activity (release). Ion-binding proteins control ion activity (propagation) in a cell. Ion-regulating components of cells affect ionic activity (release).
[Figure: the corresponding Domain Map (DM) and the DM in Description Logic]
Formalizing Glue Knowledge: Domain Map for SYNAPSE and NCMIR
• push selection @SENSELAB: X1 := select targets of “output from parallel fiber”;
• determine source context @MEDIATOR: X2 := “find and situate” X1 in ANATOM Domain Map;
• compute region of interest (here: downward closure) @MEDIATOR: X3 := subregion-closure(X2);
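The three plan steps above can be sketched in Python as follows. This is a rough illustration, not the mediator's actual implementation: the domain-map fragment (reusing the has-part chain from the expert knowledge as containment edges), the source rows, and the function names `select_targets`, `situate`, and `subregion_closure` are all hypothetical.

```python
from collections import deque

# Hypothetical fragment of a domain map: concept -> directly contained concepts.
DOMAIN_MAP = {
    "Purkinje cell": ["dendrite"],
    "dendrite": ["higher-order branch"],
    "higher-order branch": ["spine"],
}

# Hypothetical source rows at the remote source: (input, target).
SOURCE_ROWS = [
    ("output from parallel fiber", "Purkinje cell"),
    ("output from parallel fiber", "unmapped term"),
    ("other input", "dendrite"),
]


def select_targets(rows, input_name):
    """Step 1 (at the source): push the selection, return matching targets."""
    return {target for source_input, target in rows if source_input == input_name}


def situate(terms, domain_map):
    """Step 2 (at the mediator): keep only terms that are concepts in the map."""
    concepts = set(domain_map)
    for children in domain_map.values():
        concepts.update(children)
    return terms & concepts


def subregion_closure(start, domain_map):
    """Step 3 (at the mediator): downward closure via breadth-first search."""
    seen = set(start)
    queue = deque(start)
    while queue:
        node = queue.popleft()
        for child in domain_map.get(node, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


x1 = select_targets(SOURCE_ROWS, "output from parallel fiber")
x2 = situate(x1, DOMAIN_MAP)
x3 = subregion_closure(x2, DOMAIN_MAP)
print(sorted(x3))
```

Only step 1 touches the source; steps 2 and 3 use the domain map alone, which is why the plan can push the selection first and do the reasoning at the mediator.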
• Mix of Query Processing and Reasoning
– GAV & LAV with semantic query optimization (NIH BIRN, NSF GEON)
– description logic reasoner for DMs (FaCT)?
– reconciliation of conflicting DMs via argumentation-frameworks (“games”) using well-founded and stable models of logic programs [ICDT97, PODS97, TCS00, TODS02]
• Graph Queries over DMs and PMs
– expressible in F-logic [InfSystems98]
– scalability? (UMLS Domain Map has millions of entries)
• How to incorporate “procedural features”?
– Bioinformatics, Ecoinformatics, … => sources = DBs + analytical tools + …
– scientific workflow planning and management (“promoter identification workflow” for DOE SciDAC, NSF/ITR SEEK)
Process Maps with Abstractions and Elaborations: From Terminological to Procedural Glue
Some References
• Model-Based Mediation:
– A Model-Based Mediator System for Scientific Data Management, B. Ludäscher, A. Gupta, M. Martone, Bioinformatics: Managing Scientific Data, Lacroix, Critchlow (eds), Morgan Kaufmann, to appear, 2003.
– Model-Based Mediation with Domain Maps, B. Ludäscher, A. Gupta, M. E. Martone, 17th Intl. Conference on Data Engineering (ICDE’01), Heidelberg, Germany, IEEE Computer Society, 2001.
– Managing Semistructured Data with FLORID: A Deductive Object-Oriented Perspective, B. Ludäscher, R. Himmeröder, G. Lausen, W. May, C. Schlepphorst, Information Systems, 23(8), Special Issue on Semistructured Data, 1998.
• XML-Based Mediation:
– VXD/Lazy Mediators: Navigation-Driven Evaluation of Virtual Mediated Views, B. Ludäscher, Y. Papakonstantinou, P. Velikhov, Intl. Conference on Extending Database Technology (EDBT’00), Konstanz, Germany, LNCS 1777, Springer, 2000.
– XML Streams: A Transducer-Based XML Query Processor, B. Ludäscher, P. Mukhopadhyay, Y. Papakonstantinou, Intl. Conference on Very Large Databases (VLDB’02), Hong Kong, 2002.
Knowledge Representation: Relating Theory to the World via Formal Models
John F. Sowa, Knowledge Representation: Logical, Philosophical, and Computational Foundations