Top Banner
Bhanu Pratap Gupta Devang Vira S. Sudarshan Dept. of Computer Science and Engineering, IIT Bomb
21

Bhanu Pratap Gupta Devang Vira S. Sudarshan Dept. of Computer Science and Engineering, IIT Bombay.

Jan 08, 2018

Download

Documents

Tracy Barrett

 Automated Test Data generation  Based on database constraints, and SQL query ▪ Agenda [Chays et al., STVR04], a tool which generates test cases for database applications which additionally uses user fed heuristics  Ensuring query result is not empty ▪ Reverse Query Processing [Binning et al., ICDE07] takes desired query output and generates relation instances ▪ Handle a subset of Select/Project/Join/GroupBy queries  None of the above guarantee anything about detecting errors in SQL queries  Question: How do you model SQL errors?  Answer: Query Mutation 3
Welcome message from author
This document is posted to help you gain knowledge. Please leave a comment to let me know what you think about it! Share it to your friends and learn new things together.
Transcript
Page 1: Bhanu Pratap Gupta Devang Vira S. Sudarshan Dept. of Computer Science and Engineering, IIT Bombay.

Bhanu Pratap GuptaDevang ViraS. Sudarshan

Dept. of Computer Science and Engineering, IIT Bombay

Page 2: Bhanu Pratap Gupta Devang Vira S. Sudarshan Dept. of Computer Science and Engineering, IIT Bombay.

Complex SQL queries hard to get right Question: How to check if an SQL

query is correct? Formal verification is not applicable since

we do not have a separate specification and an implementation

State of the art solution: Generate test databases and check if the query gives the intended result

2

Page 3: Bhanu Pratap Gupta Devang Vira S. Sudarshan Dept. of Computer Science and Engineering, IIT Bombay.

Automated Test Data generation Based on database constraints, and SQL query

▪ Agenda [Chays et al., STVR04], a tool which generates test cases for database applications which additionally uses user fed heuristics

Ensuring query result is not empty▪ Reverse Query Processing [Binning et al., ICDE07]

takes desired query output and generates relation instances

▪ Handle a subset of Select/Project/Join/GroupBy queries None of the above guarantee anything about detecting

errors in SQL queries Question: How do you model SQL errors? Answer: Query Mutation

3

Page 4: Bhanu Pratap Gupta Devang Vira S. Sudarshan Dept. of Computer Science and Engineering, IIT Bombay.

Mutant: Variation of the given query Mutations model common programming errors, like

▪ Join used instead of outerjoin (or vice versa)▪ Join/selection condition errors

▪ < vs. <=, missing or extra condition▪ Wrong aggregate (min vs. max)

Mutant may be the intended query

4

Page 5: Bhanu Pratap Gupta Devang Vira S. Sudarshan Dept. of Computer Science and Engineering, IIT Bombay.

Traditional use of mutation testing has been to check coverage of dataset Generate mutants of the original program by modifying the

program in a controlled manner A dataset kills a mutant if query and the mutant give

different results on the dataset A dataset is considered complete if it can kill all non-

equivalent mutants of the given query Prior work:

Tuya and Suarez-Cabal [IST07], Chan et al. [QSIC05] defined a class of SQL query mutations

Shortcoming: do not address test data generation Our goal: generated dataset for testing query

Test dataset and query result on the dataset are shown to human, who verifies that the query result is what is expected given this dataset

Note that we do not need to actually generate and execute mutants

5

Page 6: Bhanu Pratap Gupta Devang Vira S. Sudarshan Dept. of Computer Science and Engineering, IIT Bombay.

Address the problem of test data generation for killing non-equivalent mutants Equivalent Mutants: r(A,B) s(B,C) and r(A,B)

s(B,C) where r.B is a foreign key to s, and is not null will always produce the same resultset

Define class of: Join/outerjoin mutations Selection predicate mutations

Algorithm for test data generation that kills all non-equivalent mutants in above class Under some simplifying assumptions (given in the

paper) With the guarantee that generated datasets are small

and realistic, to aid in human verification of results6

Page 7: Bhanu Pratap Gupta Devang Vira S. Sudarshan Dept. of Computer Science and Engineering, IIT Bombay.

Join type mutations: An occurrence of a join operator ( , , , ) is replaced by one of the other join operators

Defining join mutations in SQL is complicated by the absence of a particular join order SELECT * FROM a,b,c WHERE (a.x = b.x) and (b.x =

c.x) We consider all relational algebra expressions

(trees) equivalent (under inner join reordering) to the given SQL query

We consider join type mutations to single join nodes in each tree above

7

Page 8: Bhanu Pratap Gupta Devang Vira S. Sudarshan Dept. of Computer Science and Engineering, IIT Bombay.

Case I: Mutation at root node, with no foreign key constraints Schema: r(A), s(B)

To kill this mutant: ensure that for an r tuple there is no matching s tuple

Generated test case: r(A)={(1)}; s(B)={} Basic idea:

(a) run query on given database, (b) from result extract matching tuples for r and s(c) delete s tuple to ensure no matching tuple for r

8

Page 9: Bhanu Pratap Gupta Devang Vira S. Sudarshan Dept. of Computer Science and Engineering, IIT Bombay.

Case II: Extra join above mutated node Schema: r(A,B), s(C,D), t(E)

To kill this mutant we must ensure that for an r tuple there is no matching s tuple, but there is a matching t tuple

Generated test case: r(A,B)={(1,2)}; s(C,D)={}; t(E)={(2)}

9

Page 10: Bhanu Pratap Gupta Devang Vira S. Sudarshan Dept. of Computer Science and Engineering, IIT Bombay.

Given join expression on relations r1, r2, …, rn Create dataset where all relations have a set of

matching tuples For each relation ri, generate a dataset where rest of

relations match, but ri is empty▪ Unless making ri empty makes join graph disconnected

Above procedure kills all join type mutations of given inner join tree Outer joins complicate picture when attributes are

projected out▪ May have to make more than one ri empty at a time

Foreign keys may prevent making some ri empty

10

Page 11: Bhanu Pratap Gupta Devang Vira S. Sudarshan Dept. of Computer Science and Engineering, IIT Bombay.

Case III: Mutation at root node with foreign key constraints and selection on right side Schema: r(A), s(B,C) Foreign key: r.A →s.B

To kill this mutant we must create an s tuple which matches with the r tuple on the foreign key reference, but which has s.C ≠ 4 Generated test case: r(A)={(2)}; s(B,C)={(2,5)}

Notion of valid nullable pattern defined in paper specifies which relations can be made null/non-matching, given foreign key constraints and join graph

11

Page 12: Bhanu Pratap Gupta Devang Vira S. Sudarshan Dept. of Computer Science and Engineering, IIT Bombay.

Implemented using Java and PostgreSQL Creates datasets by extracting and

modifying tuples from given database Currently handles join type mutation and

selection predicate mutation For creating a merged dataset

▪ Tuples having same values for join attributes must be blocked from being inserted again

Handling selection predicate mutation▪ Eg. to distinguish r.A < 3 and r.A <= 3 we

generate tuples with r.A = 2 and 3

12

Page 13: Bhanu Pratap Gupta Devang Vira S. Sudarshan Dept. of Computer Science and Engineering, IIT Bombay.

Ongoing work : Synthetic data generation taking

database and query constraints into account which is non trivial▪ Idea (from RQP [Binning et al ICDE07]): Use a

model checker to generate data▪ Under implementation using CVC3

Extend the technique to handle aggregations and sub-queries

Future work: data generation for application code with multiple queries

13

Page 14: Bhanu Pratap Gupta Devang Vira S. Sudarshan Dept. of Computer Science and Engineering, IIT Bombay.

Questions

Page 15: Bhanu Pratap Gupta Devang Vira S. Sudarshan Dept. of Computer Science and Engineering, IIT Bombay.

Problem: is Q equivalent to a mutant Q‘ can be reduced to query containment and vice versa in polynomial time

The Chase algorithm can be used to generate datasets to show that Q and Q' are not equivalent (for SPJ queries and several extensions) such a dataset would kill the mutant Q‘ limited work on outerjoin containment data

generation However we don't want to enumerate each

mutant and generate separate datasets too expensive

15

Page 16: Bhanu Pratap Gupta Devang Vira S. Sudarshan Dept. of Computer Science and Engineering, IIT Bombay.

Under the following conditions we can generate merged datasets: Tuples having same values for join attributes

must be blocked from being inserted again The query must not contain any equality

selection on an unique key The result of the query must contain one or

more attributes which together form an unique key for any relation

Also attributes from the result forming an unique key must be guaranteed to be non-null in the result

16

Page 17: Bhanu Pratap Gupta Devang Vira S. Sudarshan Dept. of Computer Science and Engineering, IIT Bombay.

Consider the three relations : Student(name, deptcode, progcode), Department(deptcode, deptname) Program(progcode, progname)

And a query: SELECT rollno, name, deptname, progname

FROM student s INNER JOIN department d ON s.deptcode=d.deptcode INNER JOIN program p ON s.progcode=p.progcode

17

Page 18: Bhanu Pratap Gupta Devang Vira S. Sudarshan Dept. of Computer Science and Engineering, IIT Bombay.

Generate mutants by mutating join operator

of a single node for all above trees

Query Tree 1 Query Tree 2 Query Tree 3

18

Page 19: Bhanu Pratap Gupta Devang Vira S. Sudarshan Dept. of Computer Science and Engineering, IIT Bombay.

Generated data shows : A student (Devang) with valid program and department A student (Abhijeet) with invalid department A student (Sandeep) with invalid program A student (Aditya) with invalid program invalid program

and department A program (PhD) with no student A department (Mechanical) with no student

Deptcode

Deptname

CS Computer CH ChemicalME Mechanical

Progcode

Progname

0 B.Tech1 M.Tech2 PhD

Rollno Name progcode

deptcode

501 Devang 1 CS401 Abhijee

t0 CE

701 Sandeep

5 CH

101 Aditya 4 MA

DepartmentProgram Student

19

Page 20: Bhanu Pratap Gupta Devang Vira S. Sudarshan Dept. of Computer Science and Engineering, IIT Bombay.

Generated data shows : A student (Devang) with valid program and

department A program (B.Tech) with no student A department (Electrical) with no student

Deptcode

Deptname

CS Computer EE Electrical

Progcode

Progname

0 B.Tech1 M.Tech

Rollno Name progcode

deptcode

501 Devang 1 CS

DepartmentProgram Student

Foreign Keys are:Student.progcode → Program.progcodeStudent.deptcode → Department.deptcode

20

Page 21: Bhanu Pratap Gupta Devang Vira S. Sudarshan Dept. of Computer Science and Engineering, IIT Bombay.

Case of no foreign keys

21