Finding Plans from Proofs
PDQ: Proof-driven Query Answering over Web-based Data
Michael Benedikt, Julien Leblay, Efthymia Tsamoura, Oxford University
Supported by EPSRC grant EP/H017690/1, Query-driven Data Acquisition from Web-based Data Sources
Project homepage: http://www.cs.ox.ac.uk/projects/pdq/

Example: online services for geographic information
r1: Places(id, name, type, coordinates, ...), information about places (e.g. city, country, continent, lake).
r2: BelongsTo(source, target), containment between places, e.g. "China belongs to Asia".
r3: Countries(id, name, iso_code, ...), information about countries.
$\varphi_1:\ \mathrm{Places}(x, y, \text{"Country"}, \ldots) \leftrightarrow \mathrm{Countries}(x, y, \ldots)$

Query for the countries in Asia:
    SELECT p1.name
    FROM BelongsTo AS bt
    JOIN Places AS p1 ON p1.id = bt.source
    JOIN Places AS p2 ON p2.id = bt.target
    WHERE p1.type = 'Country' AND p2.name = 'Asia'
This query is not answerable from the access methods alone. As modelled by the axioms $\alpha_1$ to $\alpha_3$ below, Places can only be looked up by name and BelongsTo only by source, so the ids and names of the countries cannot be obtained directly. It becomes answerable once $\varphi_1$ is taken into account, because the freely accessible Countries relation then supplies exactly the places of type Country.

Pre-processing steps create an auxiliary schema by adding the relations InferredAccPlaces, InferredAccBelongsTo, InferredAccCountries and Accessible, together with the constraints:
$\varphi'_1:\ \mathrm{InferredAccPlaces}(x, y, \text{"Country"}, \ldots) \leftrightarrow \mathrm{InferredAccCountries}(x, y, \ldots)$
$\alpha_1:\ \mathrm{Accessible}(y) \land \mathrm{Places}(x, y, z, \ldots) \rightarrow \mathrm{InferredAccPlaces}(x, y, z, \ldots) \land \mathrm{Accessible}(x) \land \mathrm{Accessible}(z) \land \ldots$
$\alpha_2:\ \mathrm{Accessible}(x) \land \mathrm{BelongsTo}(x, y) \rightarrow \mathrm{InferredAccBelongsTo}(x, y) \land \mathrm{Accessible}(y)$
$\alpha_3:\ \mathrm{Countries}(x, y, z, \ldots) \rightarrow \mathrm{InferredAccCountries}(x, y, z, \ldots) \land \mathrm{Accessible}(x) \land \ldots$
$\alpha_4:\ \ldots$

Context
Web data sources may have:
• overlapping information,
• access restrictions.
As a result:
• there may be no web query plan for a given user query,
• there may be many plans using different sources, with different costs.
We therefore need to reason about integrity constraints and access limitations.

PDQ
A system for determining a query plan in the presence of web-based sources that is:
i. constraint-aware,
ii. access-aware, abiding by access restrictions,
iii. cost-aware, making use of any available cost information.
Approach: generate query plans from proofs that a query is answerable.

Input
$S = \langle R, \Sigma \rangle$: schema, where R is a set of relations with access methods (free, limited, inaccessible) and $\Sigma$ is a set of integrity constraints (tuple-generating dependencies, TGDs).
Q: conjunctive query over S.
f: cost function on evaluation plans.
Output
$P_{\mathrm{best}}$: plan with minimal cost.

Step 1: Pre-processing
S is augmented with new relations and axioms modelling the access restrictions. A goal query $Q_{\mathrm{inferred}}$ is created over the relations of the augmented schema. Q is grounded (each variable replaced by a constant) to form the initial state of the plan search.

Step 2: Basic search step
Each state is closed under firing of the rules (blue arrows) other than the accessibility axioms (denoted $\alpha_i$). Every possible firing of an accessibility axiom (red arrows) gives a new candidate state, which inherits all the facts of its ancestors.

Step 3: Plans and costs
Each new state gives a plan, to which a cost is assigned (orange circles). If a state matches $Q_{\mathrm{inferred}}$ and its plan's cost is lower than the best found so far, it becomes the new best state.

Queries over Web Data
Architecture & User Experience
• User interface for creating and editing schemas and queries.
• Interactive exploration of the planner's search space.
• Online execution of plans.
• User interface for creating and configuring planning sessions.
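A toy version of Steps 1 to 3 (pre-processing, basic search step, plans and costs), applied to the Asia example, fits in a short Python sketch. It is an illustration under stated assumptions, not PDQ's implementation: relations are cut down to Places(id, name, type), BelongsTo(source, target) and Countries(id, name); the access costs 5, 10 and 20, the helper names close, firings and best_plan, and the depth-first order with cost pruning are hypothetical choices. A state is a set of ground facts, closure fires only $\varphi'_1$, and each firing of an accessibility axiom appends one access command to the plan.

```python
# Hypothetical simplification: Places(id, name, type), BelongsTo(source, target),
# Countries(id, name). Each access method: (input positions, assumed cost).
ACCESS = {
    "Places": ((1,), 5),       # limited: look up by name
    "BelongsTo": ((0,), 10),   # limited: look up by source
    "Countries": ((), 20),     # free access
}


def close(facts):
    """Fire the non-accessibility rules (here only phi'_1, both directions) to a fixpoint."""
    facts = set(facts)
    todo = list(facts)
    while todo:
        f = todo.pop()
        if f[0] == "InferredAccCountries":
            new = ("InferredAccPlaces", f[1], f[2], "Country")
        elif f[0] == "InferredAccPlaces" and f[3] == "Country":
            new = ("InferredAccCountries", f[1], f[2])
        else:
            continue
        if new not in facts:
            facts.add(new)
            todo.append(new)
    return frozenset(facts)


def firings(facts):
    """Yield (plan step, added facts) for each applicable accessibility axiom."""
    accessible = {f[1] for f in facts if f[0] == "Accessible"}
    for f in sorted(facts):
        if f[0] not in ACCESS:
            continue
        inputs, _ = ACCESS[f[0]]
        inferred = ("InferredAcc" + f[0],) + f[1:]
        # Skip facts already exposed, or whose input positions are not yet accessible.
        if inferred in facts or not all(f[1 + i] in accessible for i in inputs):
            continue
        step = (f[0], tuple(f[1 + i] for i in inputs))
        yield step, {inferred} | {("Accessible", t) for t in f[1:]}


def best_plan(initial, goal):
    """Depth-first search over firings, keeping the cheapest state that matches goal."""
    best_cost, best = float("inf"), None
    stack = [(close(initial), (), 0)]
    while stack:
        facts, plan, cost = stack.pop()
        if goal <= facts:               # the state matches Q_inferred
            if cost < best_cost:
                best_cost, best = cost, plan
            continue
        for step, added in firings(facts):
            new_cost = cost + ACCESS[step[0]][1]
            if new_cost < best_cost:    # prune: cannot beat the best plan so far
                stack.append((close(facts | added), plan + (step,), new_cost))
    return best_cost, best


# Grounded query (canonical facts); Countries(id1, name1) is added by phi_1.
initial = {
    ("Places", "id1", "name1", "Country"), ("Places", "id2", "Asia", "c2"),
    ("BelongsTo", "id1", "id2"), ("Countries", "id1", "name1"),
    ("Accessible", "Asia"), ("Accessible", "Country"),
}
goal = {
    ("InferredAccPlaces", "id1", "name1", "Country"),
    ("InferredAccPlaces", "id2", "Asia", "c2"),
    ("InferredAccBelongsTo", "id1", "id2"),
}
cost, plan = best_plan(initial, goal)
print(cost, plan)  # 35, with one access each to Places, Countries and BelongsTo
```

Under these assumed costs every matching plan must access all three relations (only the Countries access makes the country names accessible), so the cheapest cost is 35. Because the assumed cost is a plain sum of per-access costs, both access orders in the search space cost the same here; a cost function f that depends on the plan's structure could distinguish them.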
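[Figure: Architecture, with the Dashboard, Planner and Runtime components]

Search space for the example query:
Initial state: $\mathrm{Places}(id_1, name_1, \text{"Country"}, \ldots)$, $\mathrm{Places}(id_2, \text{"Asia"}, \ldots)$, $\mathrm{BelongsTo}(id_1, id_2)$, $\mathrm{Accessible}(\text{"Asia"})$, $\mathrm{Accessible}(\text{"Country"})$ and, by $\varphi_1$, $\mathrm{Countries}(id_1, name_1, c_1, \ldots)$.
Goal: $Q_{\mathrm{inferred}}(name_1) \leftarrow \mathrm{InferredAccPlaces}(id_1, name_1, \text{"Country"}, \ldots) \land \mathrm{InferredAccPlaces}(id_2, \text{"Asia"}, \ldots) \land \mathrm{InferredAccBelongsTo}(id_1, id_2)$

Branch 1 (Countries first):
1. $\alpha_3$ (models free access on Countries) adds $\mathrm{InferredAccCountries}(id_1, name_1, c_1, \ldots)$, $\mathrm{Accessible}(id_1)$, $\mathrm{Accessible}(name_1)$, $\mathrm{Accessible}(c_1)$; then $\varphi'_1$ adds $\mathrm{InferredAccPlaces}(id_1, name_1, \text{"Country"}, \ldots)$. Plan: $T_1 \Leftarrow \mathrm{Countries} \Leftarrow \emptyset$.
2. $\alpha_1$ adds $\mathrm{InferredAccPlaces}(id_2, \text{"Asia"}, c_2, \ldots)$, $\mathrm{Accessible}(id_2)$, $\mathrm{Accessible}(c_2)$, … Plan: $T_2 \Leftarrow \mathrm{Places} \Leftarrow (\text{"Asia"})$; $T_3 := T_1 \bowtie T_2$.
3. $\alpha_2$ adds $\mathrm{InferredAccBelongsTo}(id_1, id_2)$. Plan: $T_4 \Leftarrow \mathrm{BelongsTo} \Leftarrow \pi_{\mathrm{source}}(T_3)$; $T_5 := \pi_{\mathrm{name}}(T_3 \bowtie T_4)$.

Branch 2 (Places first):
1. $\alpha_1$ adds $\mathrm{InferredAccPlaces}(id_2, \text{"Asia"}, c_2, \ldots)$, $\mathrm{Accessible}(id_2)$, $\mathrm{Accessible}(c_2)$, … Plan: $T'_1 \Leftarrow \mathrm{Places} \Leftarrow (\text{"Asia"})$.
2. $\alpha_3$ adds $\mathrm{InferredAccCountries}(id_1, name_1, c_1, \ldots)$, $\mathrm{Accessible}(id_1)$, $\mathrm{Accessible}(name_1)$, …; then $\varphi'_1$ adds $\mathrm{InferredAccPlaces}(id_1, name_1, \text{"Country"}, \ldots)$. Plan: $T'_2 \Leftarrow \mathrm{Countries} \Leftarrow \emptyset$; $T'_3 := T'_1 \bowtie T'_2$.
3. $\alpha_2$ adds $\mathrm{InferredAccBelongsTo}(id_1, id_2)$. Plan: $T'_4 \Leftarrow \mathrm{BelongsTo} \Leftarrow \pi_{\mathrm{source}}(T'_3)$; $T'_5 := \pi_{\mathrm{name}}(T'_3 \bowtie T'_4)$.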
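To make the plan of the first branch concrete, here is a hand translation of T1 to T5 into Python over a few made-up rows. The data and the functions access_places, access_belongs_to and access_countries are hypothetical stand-ins for the web services; each one only accepts the inputs that its access method allows, and in-memory lists replace the real sources.

```python
# Hypothetical toy contents; by phi_1, COUNTRIES mirrors the Places rows of type Country.
PLACES = [
    ("p1", "China", "Country"),
    ("p2", "Asia", "Continent"),
    ("p3", "France", "Country"),
    ("p4", "Europe", "Continent"),
]
BELONGS_TO = [("p1", "p2"), ("p3", "p4")]
COUNTRIES = [("p1", "China"), ("p3", "France")]


def access_places(name):
    """Places: limited access, the name must be supplied as input."""
    return [row for row in PLACES if row[1] == name]


def access_belongs_to(source):
    """BelongsTo: limited access, the source id must be supplied as input."""
    return [row for row in BELONGS_TO if row[0] == source]


def access_countries():
    """Countries: free access, returns every row."""
    return list(COUNTRIES)


def run_plan():
    t1 = access_countries()                       # T1 <= Countries <= (no input)
    t2 = access_places("Asia")                    # T2 <= Places <= ("Asia")
    t3 = [(c, p) for c in t1 for p in t2]         # T3 := T1 join T2 (no shared attributes, so a product)
    country_ids = sorted({c[0] for c, _ in t3})   # projection of T3 fed to BelongsTo as source
    t4 = [bt for cid in country_ids for bt in access_belongs_to(cid)]  # T4 <= BelongsTo
    # T5 := projection on name of T3 join T4 (country id = source, Asia id = target)
    return sorted({c[1] for c, p in t3 for src, tgt in t4 if c[0] == src and p[0] == tgt})


print(run_plan())  # ['China']
```

The plan never tests type = 'Country': by $\varphi_1$, every Countries row is a place of type Country, which is exactly why the constraint makes the query answerable.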