© ETH Zürich Eric Lo ETH Zurich a joint work with Carsten Binnig (U of Heidelberg), Donald Kossmann (ETH Zurich), Tamer Ozsu (U of Waterloo) and Peter Hass (IBM Almaden Research Center) Symbolic Query Processing

Jan 03, 2016

Page 1:

© ETH Zürich

Eric Lo, ETH Zurich

A joint work with Carsten Binnig (U of Heidelberg), Donald Kossmann (ETH Zurich), Tamer Ozsu (U of Waterloo) and Peter Hass (IBM Almaden Research Center)

Symbolic Query Processing

Page 2:

Symbolic Query Processing

Treat all data as symbols (think of variables)

E.g., a1 represents any value in the domain of attribute a

Tables R and S are called symbolic relations

Page 3:

Background – Symbolic Execution 1/3

Borrow the concept from symbolic execution

A well-known program verification technique

Represent values of program variables with symbolic values instead of concrete data

Manipulate expressions based on those symbolic values

Page 4:

Background – Symbolic Execution 2/3

1. minsalary = read_input();
2. bensalary = minsalary + 2000;
3. if (bensalary < 80000)
4.     output "no kidding!";
5. else
6.     output "that's right";

Find a test case for path 1-2-3-6

Symbolic execution – start:

1. minsalary = ben
2. bensalary = ben + 2000
3. bensalary = ben + 2000 and !(bensalary < 80000)   (*)

Symbolic execution – end

Instantiate (*):

ben = 90000 (expected input)

"that's right" (expected output)
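The walk-through above can be sketched in a few lines of Python. This is only an illustration: the helper names `path_condition`, `instantiate` and `program` are made up here, and a brute-force search over candidate inputs stands in for a real constraint solver.

```python
def path_condition(ben):
    """Constraint collected along path 1-2-3-6 (the branch at line 3 is false)."""
    bensalary = ben + 2000          # symbolic state after line 2
    return not (bensalary < 80000)  # negated branch condition from line 3


def instantiate(condition, candidates):
    """Toy instantiation: return the first candidate that satisfies the condition."""
    for value in candidates:
        if condition(value):
            return value
    return None


def program(minsalary):
    """The six-line program from the slide, returning its output string."""
    bensalary = minsalary + 2000
    if bensalary < 80000:
        return "no kidding!"
    return "that's right"


# Search salaries in steps of 1000 (an arbitrary choice for this sketch).
test_input = instantiate(path_condition, range(0, 200001, 1000))
print(test_input, program(test_input))
```

Any input of at least 78000 satisfies the path condition, so the slide's choice of 90000 and the value found by this search are both valid test inputs for the same path.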

Page 5:

Background – Symbolic Execution 3/3

Has been researched for more than 20 years

Still has many limitations, e.g., it cannot handle highly complex software

However, many large software vendors (e.g., Microsoft Research) still place hope in this technique for program verification

No progress on database applications, which involve an external database and SQL

Page 6:

SQP Applications

Extend program verification and symbolic execution techniques to support database applications

For DBMS testing (the focus of today)

Page 7:

Symbolic Query Processing

A query manipulates data according to different needs

R ⋈b=c S

Want the join result to have one tuple? Set c1 = b1

Want the join result to have four tuples with a Zipf distribution (t1 joins more, t2 joins less)?

Page 8:

DBMS Testing

To test a DBMS, we generate many test databases and execute many test queries

DBMS vendors are looking for a way to control the intermediate results of a test query, so that an individual component of a DBMS can be tested under a particular test case

Page 9:

DBMS Testing Example

Test the accuracy of the cardinality estimation component of a query optimizer under:

a multi-way hash join query

a two-way join query with aggregation

This works only if we can make sure that executing the test query on the test database gives the expected answer

Page 10:

DBMS Testing

The test query is given

Physical join ordering can be fixed (by testers)

Evaluation algorithm (e.g., using hash-join) can be fixed too

However, the size of the intermediate results cannot be fixed easily

Page 11:

DBMS Testing Problem

Guarantee that executing a test query on a test database yields the desired intermediate query results (e.g., output cardinality, data distribution)

Page 12:

DBMS Testing Problem

A test case T is:

a parametric query Qp

with a set of constraints C on each intermediate result

A good test database D means Qp(D) satisfies C, provided the set of parameters p is properly instantiated

In that case, D covers test case T

Page 13:

Trial-and-error

Generate Database 1, 2 and 3 using traditional database generators such as IBM Test DB generator, MSR DB generator, etc.

Search for parameters

T2 is never covered

The database generation process does not care about the test queries

Page 14:

Latest approach – Finding query parameters

MSR realized this problem [TKDE06]

Given the test database D and the test query Qp, search for parameter values p such that Qp(D) (almost) fits the cardinality requirements defined in the test case

It is an NP-hard problem

Same as the previous approach, T2 is never covered
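To make the parameter-search idea concrete, here is a toy sketch for the single query SELECT a FROM R WHERE a > p over a fixed database. It is not the algorithm of [TKDE06]; the function name `find_parameter`, the example data and the candidate range are assumptions chosen for illustration.

```python
def find_parameter(values, target_card, candidates):
    """Return a parameter p such that exactly target_card values satisfy a > p, or None."""
    for p in candidates:
        if sum(1 for v in values if v > p) == target_card:
            return p
    return None


# A fixed, already generated test database (column a of R).
D = [5, 5, 5, 9]

p_for_one = find_parameter(D, 1, range(0, 10))  # some p exists: only 9 is greater than 5
p_for_two = find_parameter(D, 2, range(0, 10))  # no p can select exactly two tuples
print(p_for_one, p_for_two)
```

Because three tuples share the same value, no parameter yields an output of exactly two tuples. This is the situation the slide describes: when the database is generated without looking at the query, some test cases cannot be covered by any parameter choice.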

Page 15:

QAGen – Query Aware test database Generator

Based on symbolic query processing

We can control the output size of each intermediate query result (and even more)

Page 16:

QAGen – Generate a query-aware test database for each test case

Page 17:

QAGen overview

Page 18:

QAGen overview – Query Analyzer

Analyze the query and assign knobs to the operators

A knob is a parameter of an operator that controls its output (e.g., output cardinality, distribution)

A knob for an operator is not always available for tuning

Page 19:

QAGen overview – Query Analyzer

A knob for an operator is not always available for tuning

join distribution? Yes

join distribution? No

Page 20:

QAGen overview – Query Analyzer

The available knob(s) for an operator depend on its input characteristics

Definition: pre-grouping data

Definition: non pre-grouping data

Page 21:

QAGen overview – Query Analyzer

Page 22:

Symbolic Query Engine and Symbolic Database

Page 23:

Symbolic Query Engine and Symbolic Database (SDB)

An SQL operator can:

Add predicates to a symbol

Replace a symbol with another symbol (e.g., when joining)

E.g., SELECT a FROM R WHERE a > p;

σa>p with 1 output tuple: one symbol gets the predicate > p, the other symbols get <= p
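A minimal sketch of how a symbolic selection could attach predicates under an output-cardinality knob. The function `symbolic_select`, the dictionary representation of tuples and the rule "the first tuples pass" are assumptions for illustration, not the QAGen implementation.

```python
def symbolic_select(tuples, attr, param, out_card):
    """Symbolic version of: SELECT ... WHERE attr > param, producing out_card tuples.

    Returns (output_tuples, predicates), where predicates is a list of
    (symbol, predicate) pairs to be recorded for later instantiation.
    """
    if not 0 <= out_card <= len(tuples):
        raise ValueError("contradicting knob value: output cannot exceed input")
    output, predicates = [], []
    for i, t in enumerate(tuples):
        symbol = t[attr]
        if i < out_card:
            # This tuple must pass the selection.
            predicates.append((symbol, f"{symbol} > {param}"))
            output.append(t)
        else:
            # This tuple must be filtered out, so it gets the negated predicate.
            predicates.append((symbol, f"{symbol} <= {param}"))
    return output, predicates


R = [{"a": "a1"}, {"a": "a2"}]
out, preds = symbolic_select(R, "a", "p", 1)
print(out)
print(preds)
```

No concrete value is chosen at this point. The operator only records what must hold for each symbol so that the requested number of tuples reaches the output.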

Page 24:

Symbolic Query Engine and Symbolic Database (SDB)

How to physically store the symbolic data?

Options:

Implement a native symbolic database

Use a relational database

- How to represent "a1 > p"?

- Store all predicates that are associated with a symbol s in a separate relation called PTable

PTable

s    Pred.
a1   a1 > p
a2   a2 <= p
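The relational option can be sketched with Python's built-in sqlite3 module. The schema below (symbols stored as text, a two-column PTable with columns `s` and `pred`) is an assumption based on the slide, and `predicates_for` is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Symbolic relation R: each field holds the name of a symbol, not a concrete value.
conn.execute("CREATE TABLE R (a TEXT)")
conn.executemany("INSERT INTO R VALUES (?)", [("a1",), ("a2",)])

# PTable: one row per predicate attached to a symbol.
conn.execute("CREATE TABLE PTable (s TEXT, pred TEXT)")
conn.executemany(
    "INSERT INTO PTable VALUES (?, ?)",
    [("a1", "a1 > p"), ("a2", "a2 <= p")],
)


def predicates_for(symbol):
    """Load all predicates recorded for one symbol, in insertion order."""
    rows = conn.execute(
        "SELECT pred FROM PTable WHERE s = ? ORDER BY rowid", (symbol,)
    )
    return [pred for (pred,) in rows]


print(predicates_for("a1"))
```

Keeping predicates in an ordinary relation means the symbolic engine can reuse the storage and query facilities of an existing DBMS instead of building a native symbolic store.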

Page 25:

Data Instantiator

Page 26:

Data Instantiation

• Data instantiator uses a constraint solver:
• Input: a (propositional) constraint (e.g., A + B > 50)
• Output: any concrete values that satisfy the constraint (e.g., A=99, B=12)
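The solver contract (constraint in, any satisfying assignment out) can be illustrated with a brute-force stand-in. A real system would call an actual constraint solver; the function `solve` and the small integer domain are assumptions of this sketch.

```python
from itertools import product


def solve(constraint, variables, domain):
    """Return the first assignment over `domain` that satisfies `constraint`, or None."""
    for values in product(domain, repeat=len(variables)):
        assignment = dict(zip(variables, values))
        if constraint(**assignment):
            return assignment
    return None


solution = solve(lambda A, B: A + B > 50, ["A", "B"], range(0, 100))
print(solution)
```

The slide's answer (A=99, B=12) and the assignment returned here are both acceptable, since the instantiator only needs some values that satisfy the constraint.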

Page 27:

Symbolic Query Engine

Page 28:

Symbolic Query Engine

Iterator-based: open(), getNext(), close()

Assumes no naughty user, i.e., no contradicting knob values
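A sketch of the iterator interface for a leaf operator that emits symbolic tuples. The class `TableScan`, the helper `drain` and the symbol naming scheme are assumptions; `get_next` is the Python spelling of the slide's getNext().

```python
class TableScan:
    """Leaf operator: emits `cardinality` symbolic tuples, one per get_next() call."""

    def __init__(self, attributes, cardinality):
        if cardinality < 0:
            raise ValueError("knob value must be non-negative")
        self.attributes = attributes
        self.cardinality = cardinality
        self._pos = None  # None means the operator is not open

    def open(self):
        self._pos = 0

    def get_next(self):
        if self._pos is None:
            raise RuntimeError("open() must be called before get_next()")
        if self._pos >= self.cardinality:
            return None  # end of stream
        self._pos += 1
        # Fresh symbols such as a1, b1 for the first tuple, a2, b2 for the second.
        return {attr: f"{attr}{self._pos}" for attr in self.attributes}

    def close(self):
        self._pos = None


def drain(operator):
    """Run an operator to completion through the open/get_next/close protocol."""
    operator.open()
    result = []
    while (t := operator.get_next()) is not None:
        result.append(t)
    operator.close()
    return result


print(drain(TableScan(["a", "b"], 2)))
```

Other symbolic operators would follow the same three-call protocol and pull tuples from their child operators, in the same way a conventional iterator-based query engine does.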

Page 29:

SQP – Table operator

Fill up the table with symbols

Page 30:

SQP – σ operator

Page 31:

SQP – ⋈ operator (with FK constraint)

Action: join key replacement

Page 32:

SQP – ⋈ operator (with FK constraint)

Action: join key replacement
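Join key replacement can be sketched as follows for R ⋈b=c S, where c is the foreign key. The function `symbolic_fk_join`, the `group_sizes` knob format and the choice to leave unmatched S tuples untouched are assumptions for illustration.

```python
def symbolic_fk_join(R, S, pk, fk, group_sizes):
    """Symbolic equi-join R.pk = S.fk by replacing foreign-key symbols.

    group_sizes[i] says how many tuples of S should join with R[i];
    this encodes both the output cardinality and the join distribution.
    """
    if len(group_sizes) > len(R) or sum(group_sizes) > len(S):
        raise ValueError("contradicting knob values")
    output, replacements = [], {}
    s_iter = iter(S)
    for r, size in zip(R, group_sizes):
        for _ in range(size):
            s = next(s_iter)
            replacements[s[fk]] = r[pk]          # e.g. c1 is replaced by b1
            output.append({**r, **s, fk: r[pk]})
    return output, replacements


R = [{"a": "a1", "b": "b1"}, {"a": "a2", "b": "b2"}]
S = [{"c": f"c{i}", "d": f"d{i}"} for i in range(1, 5)]

# Four output tuples with a skewed distribution: b1 joins three times, b2 once.
out, repl = symbolic_fk_join(R, S, "b", "c", [3, 1])
print(repl)
```

Replacing c1 by b1 makes the two tuples join for any concrete value later assigned to b1, so the operator controls the join result without looking at data.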

Page 33:

SQP – ⋈ operator (with FK constraint)

When the input of the join is pre-grouped, the world has changed

It sometimes happens, e.g., in a 2-way join

Base tables A, B and C with foreign key relationships between A and B, and between B and C

Page 34:

SQP – ⋈ operator (with FK constraint)

Join distribution is not supported (the knob is disabled by the analyzer)

Controlling the output cardinality is a subset-sum problem (weakly NP-hard)

Subset-sum has a pseudo-polynomial time exact solution using dynamic programming
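A sketch of the dynamic-programming idea: given the sizes of the existing join-key groups, pick groups whose sizes add up to the requested output cardinality. The function name `subset_sum` and the example sizes are assumptions, and group sizes are assumed to be positive integers.

```python
def subset_sum(sizes, target):
    """Return indices of a subset of `sizes` that sums to `target`, or None.

    Pseudo-polynomial: at most len(sizes) * (target + 1) table updates.
    """
    # reachable[t] holds the indices of one subset whose sizes sum to t.
    reachable = {0: []}
    for i, size in enumerate(sizes):
        # Iterate over a snapshot so that each group is used at most once.
        for total, chosen in list(reachable.items()):
            new_total = total + size
            if new_total <= target and new_total not in reachable:
                reachable[new_total] = chosen + [i]
    return reachable.get(target)


group_sizes = [3, 5, 2, 4]          # number of tuples per join-key group
picked = subset_sum(group_sizes, 9)
print(picked, [group_sizes[i] for i in picked])
```

If no combination of groups reaches the target exactly, the function returns None, which corresponds to an output cardinality that the pre-grouped input cannot produce.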

Page 35:

SQP – ⋈ operator (with FK constraint)

Blocking

During open():

Materialize Table S in a temporary relation

SELECT COUNT(k) FROM S GROUP BY k

Solve the subset-sum

Page 36:

SQP – χ operator

Action 1: Aggregation attribute replacement

• o_date3 is replaced by o_date1
• o_date4 is replaced by o_date2

2nd output group (o_date2)

1st output group (o_date1)

Page 37:

SQP – χ operator

Action 2 (base case version):

- Adding aggregation constraints to PTable, base case:

<l_price1, aggsum1 = l_price1 + l_price2 + l_price3 + l_price4 + l_price7>
<l_price2, aggsum1 = l_price1 + l_price2 + l_price3 + l_price4 + l_price7>
<l_price3, aggsum1 = ‘’>
<l_price4, aggsum1 = ‘’>
<l_price7, aggsum1 = l_price1 + l_price2 + l_price3 + l_price4 + l_price7>
<l_price5, aggsum2 = l_price5 + l_price6 + l_price8>
<l_price6, aggsum2 = l_price5 + l_price6 + l_price8>
<l_price8, aggsum2 = ‘’>

Page 38:

SQP – χ operator

Action 2 (optimized version):

- The cost of a constraint solver call is exponential in the size of the predicates

- Add only 2 aggregation constraints to PTable:

<l_price1, aggsum1 = l_price1 x 5>
<l_price5, aggsum2 = l_price5 x 3>

and do l_price replacement
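The difference between the base case and the optimized version can be sketched by building both constraint strings for one group. The helper names are hypothetical, and the sketch assumes a SUM aggregate in which every symbol of a group may be replaced by the group's first symbol.

```python
def base_constraint(agg, symbols):
    """Base case: the aggregate equals the sum of all symbols in the group."""
    return f"{agg} = " + " + ".join(symbols)


def optimized_constraint(agg, symbols):
    """Optimized: replace every symbol by the first one, so the sum becomes a product."""
    representative = symbols[0]
    replacements = {s: representative for s in symbols[1:]}
    return f"{agg} = {representative} x {len(symbols)}", replacements


group1 = ["l_price1", "l_price2", "l_price3", "l_price4", "l_price7"]
group2 = ["l_price5", "l_price6", "l_price8"]

print(base_constraint("aggsum1", group1))
print(optimized_constraint("aggsum1", group1)[0])
print(optimized_constraint("aggsum2", group2)[0])
```

The optimized constraint mentions a single symbol per group, so the solver works on a much smaller predicate. The price is that all tuples in a group end up with the same l_price value.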

Page 39:

Data Instantiation

Page 40:

Data Instantiation

Use a constraint solver to instantiate the symbolic database

for each symbolic relation r
    for each tuple t
        for each symbol s
            load the related predicates P
            instantiate P
            cache P
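The instantiation loop can be sketched as below. Representing predicates as Python callables, using a brute-force search as the solver and fixing the query parameter p to 10 are all assumptions made to keep the example self-contained.

```python
def brute_force_solve(predicates, domain=range(0, 100)):
    """Toy stand-in for a constraint solver: first value satisfying all predicates."""
    for value in domain:
        if all(pred(value) for pred in predicates):
            return value
    raise ValueError("no value in the domain satisfies the predicates")


def instantiate_database(symbolic_db, ptable, solve=brute_force_solve):
    """Replace every symbol by a concrete value, solving each symbol only once."""
    cache = {}        # symbol -> concrete value
    concrete_db = {}
    for name, relation in symbolic_db.items():       # for each symbolic relation r
        rows = []
        for t in relation:                           # for each tuple t
            row = {}
            for attr, symbol in t.items():           # for each symbol s
                if symbol not in cache:
                    predicates = ptable.get(symbol, [])   # load the related predicates
                    cache[symbol] = solve(predicates)     # instantiate and cache
                row[attr] = cache[symbol]
            rows.append(row)
        concrete_db[name] = rows
    return concrete_db


p = 10  # the query parameter, assumed to be instantiated already
symbolic_db = {"R": [{"a": "a1"}, {"a": "a2"}], "S": [{"c": "a1"}]}
ptable = {"a1": [lambda v: v > p], "a2": [lambda v: v <= p]}

concrete = instantiate_database(symbolic_db, ptable)
print(concrete)
```

The cache matters for correctness as well as speed: a symbol that appears in several tuples, for example after join key replacement, must receive the same concrete value everywhere.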

Page 41:

Experiment 1 – Operator Performance

Study the performance (and scalability) of:

Individual operators during SQP

The data instantiation phase

Use TPC-dbgen to generate 3 TPCH-DBs: 10M, 100M, 1G

Run Q8(TPCH-DB) to collect the intermediate results R for each operator

QAGen(Q8, R) produces the Q8 query-aware database

Page 42:

Experiments – TPC-H Query 8

Page 43:

Experiment 1 – TPC-H Query 8

Page 44:

Experiment 2 – Effects of knob values

Use TPCH Q8

6 sets of knob values:

TPCH-Uniform, TPCH-Zipf

Min-Uniform, Min-Zipf

Max-Uniform, Max-Zipf

Page 45:

Experiment 2 – Effects of knob values

Page 46:

Experiment 3 – System Scalability

Page 47:

Related Work, Future Work, Conclusions

Reverse Query Processing (ICDE07)

Given the result R and the query Q, reversely process Q to generate D

For function testing of database applications, view maintenance, debugging SQL

Multiple SQL statements (to ACM TSE journal)


Page 49:

Current approach 2 – Stochastically generate many test queries

Based on a given test database, RAGS/QGen generates many valid SQL queries to test the system

No guarantee that T1 can be covered

Same as the previous approach, T2 is never covered

Page 50:

QAGen overview – Query Analyzer

Each knob combination (e.g., output cardinality + join distribution) for an operator may be implemented in different ways

The output is a knob-annotated execution plan