The Database and Info. Systems Lab. University of Illinois at Urbana-Champaign
Boolean + Ranking: Querying a Database by K-Constrained Optimization
Zhen Zhang. Joint work with: Seung-won Hwang, Kevin C. Chang, Min Wang, Christian A. Lang, Yuan-chi Chang
AIM 2
Many queries naturally combine Boolean and ranking
Information retrieval
Ranking query: Top 5 ranked by gpa
Traditional databases
Boolean query: dept = CS and year = 2
+ Database applications on Web
Qualifying constraint B: dept = CS and year = 2
Quantifying function R: gpa
Find top answers
AIM 3
Motivating scenarios
Data retrieval: Find houses in a certain price range with a good price/sqrft ratio
Select h.address from House h
Where h.price ≤ 200k ∨ h.price ≥ 400k
Order by h.size/|h.price-300k| Limit 1
Select h.address from House h, CrimeRate c
Where (h.price ≤ 200k ∨ h.price ≥ 400k) and h.zipcode = c.zipcode
Order by h.size/|h.price-300k| * c.crimerate^-1 Limit 10
Data analysis: Find products with the highest sale increase in consecutive years
Select s1.itemid from Sales s1, Sales s2
Where s1.itemid = s2.itemid and s2.year - s1.year = 1
Order by s2.sale - s1.sale Limit 10
AIM 4
Boolean + Ranking form a coherent goal function
Boolean B + Ranking R = Goal function G
For a tuple t:
G(t) = B(t) * R(t) = R(t), if B(t) is true
                     0, if B(t) is false (i.e., the lowest score)
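The definition of G can be sketched in a few lines of Python. This is only an illustration: the student records, the field names, and the helper `make_goal` are hypothetical, and the sketch assumes R is non-negative so that 0 really is the lowest score.

```python
from typing import Any, Callable, Dict

Row = Dict[str, Any]

def make_goal(B: Callable[[Row], bool], R: Callable[[Row], float]) -> Callable[[Row], float]:
    """Combine a Boolean condition B and a ranking function R into G = B * R."""
    def G(t: Row) -> float:
        # B is checked first, so R is never evaluated on a disqualified tuple.
        return R(t) if B(t) else 0.0
    return G

# Boolean part and ranking part from the slide: dept = CS and year = 2, ranked by gpa.
B = lambda t: t["dept"] == "CS" and t["year"] == 2
R = lambda t: t["gpa"]
G = make_goal(B, R)

# Hypothetical student tuples, for illustration only.
students = [
    {"name": "s1", "dept": "CS", "year": 2, "gpa": 3.6},
    {"name": "s2", "dept": "EE", "year": 2, "gpa": 3.9},
    {"name": "s3", "dept": "CS", "year": 2, "gpa": 3.8},
]
scores = [G(s) for s in students]
```

The EE student has the highest gpa but fails B, so its goal score is 0 and it ranks below both CS students.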
AIM 5
The nature of Boolean + Ranking is a K-constrained optimization query
Optimize goal function G over database D
Goal function G: h.size/|h.price-300k| [h.price ≤ 200k ∨ h.price ≥ 400k]
Database D:
    Addr                Zip     Price   Size
1.  Oak park, Chicago   60644   600K    4500
2.  Mattis, Champaign   61821   350K    2000
3.  …                           150K    1000
4.  …                           250K    2000
5.  …                           300K    3500
6.  …                           80K     500
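As a baseline for this example, the query can be answered by scoring every tuple of D with G and keeping the K best. The sketch below does exactly that for the six houses on the slide (prices in thousands). It is a brute-force scan for illustration, not the search method the talk goes on to propose, and the function names are chosen here.

```python
import heapq

# (tuple id, price in thousands, size), copied from the slide's table.
houses = [
    (1, 600, 4500),
    (2, 350, 2000),
    (3, 150, 1000),
    (4, 250, 2000),
    (5, 300, 3500),
    (6, 80, 500),
]

def B(price, size):
    """Boolean part: h.price <= 200k or h.price >= 400k."""
    return price <= 200 or price >= 400

def R(price, size):
    """Ranking part: h.size / |h.price - 300k|."""
    return size / abs(price - 300)

def G(price, size):
    # B is tested first, so tuple 5 (price exactly 300k) never divides by zero.
    return R(price, size) if B(price, size) else 0.0

def top_k(rows, k):
    """Score every tuple and return the k highest (score, id) pairs."""
    scored = [(G(price, size), tid) for tid, price, size in rows]
    return heapq.nlargest(k, scored)

best = top_k(houses, 2)
```

Tuples 2, 4, and 5 fail the price condition and score 0. Among the rest, tuple 1 scores 4500/300 = 15 and tuple 3 scores 1000/150, so they are the top two.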
AIM 6
What is the query evaluation mechanism?
Ranking query + Boolean query
How to answer?
AIM 7
Current techniques lack a global search mechanism
If evaluated as separate operators (Boolean query B, then Ranking query R over D):
Current techniques optimize only condition-by-condition
If searched by an overall goal function G as a ranking function:
Current techniques restrict G to be monotonic
AIM 8
Our thesis: Evaluate the query as its nature suggests!
Optimize G over D
Function optimization of G
Discrete state search over D
OPT*
AIM 9
We view compound index as discrete space
    Addr                Zip     Price   Size
1.  Oak park, Chicago   60644   600K    4500
2.  Mattis, Champaign   61821   350K    2000
3.  …                           150K    1000
4.  …                           250K    2000
5.  …                           300K    3500
6.  …                           80K     500
AIM 10
We view compound index as discrete space
[Figure: tuples 1-6 plotted by Price (k) and size. A price index (nodes b1, b2, b3, …, b6, b7) partitions the Price (k) axis at 100, 250, 350, 600; a size index (nodes a1, a2, a3, …, a6, a7) partitions the size axis at 1500, 3000, 4000, 4500.]
AIM 11
We view compound index as discrete space
[Figure: tuples 1-6 plotted by Price (k) and size, with the price index (nodes b1, b2, b3, …, b6, b7) and the size index (nodes a1, a2, a3, …, a6, a7). Search states Mij = (ai, bj) pair a size index node with a price index node: M11 at the top; M22, M32, M23, M33 below it; then M55, M56, M66, M67, M75, M76, M77, …; tuples 1, 5, 4, 2, … at the bottom.]
AIM 12
We view compound index as discrete space
[Figure: tuples 1-6 plotted by Price (k) and size, with the price index (nodes b1, b2, b3, …, b6, b7) and the size index (nodes a1, a2, a3, …, a6, a7). States Mij = (ai, bj) form, conceptually, a combined space: M11 at the top; M22, M32, M23, M33 below it; then M55, M56, M66, M67, M75, M76, M77, …; tuples 1, 5, 4, 2, … at the bottom.]
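One way to picture the combined space is as pairs of index nodes, one per attribute, where expanding a state pairs up the children of both nodes. The sketch below is an assumption-laden illustration: it uses only the top two levels of each index, the node ranges are read off the axis ticks, the half-open interval convention (lo, hi] is chosen here, and `Node`, `expand`, and `tuples_in` are hypothetical names.

```python
from dataclasses import dataclass, field
from itertools import product
from typing import List

@dataclass
class Node:
    """An index node covering the half-open range (lo, hi] of one attribute."""
    name: str
    lo: float
    hi: float
    children: List["Node"] = field(default_factory=list)

    def covers(self, x):
        return self.lo < x <= self.hi

# Top two levels of each index only (assumed ranges; deeper levels omitted).
size_index = Node("a1", 0, 4500, [Node("a2", 0, 3000), Node("a3", 3000, 4500)])
price_index = Node("b1", 0, 600, [Node("b2", 0, 250), Node("b3", 250, 600)])

# Tuple id -> (price in thousands, size), from the house table.
houses = {1: (600, 4500), 2: (350, 2000), 3: (150, 1000),
          4: (250, 2000), 5: (300, 3500), 6: (80, 500)}

def label(state):
    """Name a state Mij = (ai, bj) after its two index nodes."""
    a, b = state
    return f"M{a.name[1:]}{b.name[1:]}"

def expand(state):
    """Child states: every pairing of the two nodes' children."""
    a, b = state
    if not a.children and not b.children:
        return []
    return list(product(a.children or [a], b.children or [b]))

def tuples_in(state):
    """Tuples whose size falls in ai and whose price falls in bj."""
    a, b = state
    return [tid for tid, (price, size) in houses.items()
            if a.covers(size) and b.covers(price)]

root = (size_index, price_index)
children = expand(root)
members = {label(s): tuples_in(s) for s in children}
```

Expanding M11 yields the four states on the second level of the figure, and each tuple falls in exactly one of them.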
AIM 13
How to perform the search in the space?
What is the search mechanism?
How to conceptually view the index space of D for search
How to guide the search?
How to use function G to focus the search
AIM 14
Challenge 1: What is the search mechanism?
AIM 15
We encode as A* because it’s optimal
What A* is: Finding the shortest path
Why we choose it: Completeness and optimality with proper heuristics
Complete: guaranteed to find the shortest path
Optimal: visits the least number of nodes
[Figure: weighted graph with a shortest path from origin to destination]
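For readers who want a reminder of A* itself, here is a minimal sketch on a small weighted graph. The graph, its edge weights, and the heuristic table `h` are made up for illustration (the slide's own graph cannot be recovered from the transcript). The heuristic is chosen to never overestimate the remaining distance, which is the "proper heuristics" condition.

```python
import heapq

def a_star(graph, h, start, goal):
    """Return (distance, path) of a shortest path, or None if goal is unreachable.

    graph: node -> list of (neighbor, edge weight); h: node -> estimated
    remaining distance, assumed never to overestimate the true distance.
    """
    frontier = [(h[start], 0, start, [start])]  # (g + h, g, node, path)
    best_g = {start: 0}
    while frontier:
        _, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return g, path
        if g > best_g.get(node, float("inf")):
            continue  # stale entry: a cheaper route to this node was found
        for nxt, w in graph.get(node, []):
            ng = g + w
            if ng < best_g.get(nxt, float("inf")):
                best_g[nxt] = ng
                heapq.heappush(frontier, (ng + h[nxt], ng, nxt, path + [nxt]))
    return None

# Hypothetical graph and heuristic, not the ones drawn on the slide.
graph = {
    "origin": [("a", 5), ("b", 2)],
    "a": [("destination", 3)],
    "b": [("a", 1), ("c", 7)],
    "c": [("destination", 1)],
}
h = {"origin": 5, "a": 3, "b": 3, "c": 1, "destination": 0}

result = a_star(graph, h, "origin", "destination")
```

Here the direct route through `a` costs 8, while the route origin, b, a, destination costs 6 and is returned without `c` ever being expanded.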
AIM 16
Encoding our problem into shortest path is challenging
K-constrained optimization: Find a tuple with maximal score
Shortest path: Find a path with minimal distance
How to encode: a tuple → a path? score of tuple → distance of path?
AIM 17
Therefore, we encode K-constrained opt. as:
How to encode a tuple to a path?
Adding a virtual target t* only reachable through tuples
How to encode maximal tuple with minimal path?
Quality of path depends solely on the tuple it passes by
For tuple state t: D(t, t*) = -G(t)
For two states r, u: D(r, u) = 0
[Figure: state graph from M11 through M22, M32, M23, M33 and M55, M56, M66, M67, M75, M76, M77, … down to tuples 1, 5, 4, 2, …; edges between states have distance 0; tuples 4 and 1 connect to t* with distances -G(4) and -G(1).]
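The encoding can be checked on a tiny graph: with distance 0 between states and -G(t) on the edge from a tuple to t*, the shortest path to t* runs through the tuple with the highest score. In the sketch below, the scores are those of the house example (size/|price-300k| under the price condition), while the two-level state graph and the assignment of tuples to states are assumptions made for illustration.

```python
# Goal scores of the six house tuples (0.0 where the price condition fails).
G_score = {1: 15.0, 2: 0.0, 3: 1000 / 150, 4: 0.0, 5: 0.0, 6: 500 / 220}

# Assumed state graph: M11 expands into four states, which hold the tuples.
state_children = {
    "M11": ["M22", "M23", "M32", "M33"],
    "M22": ["t3", "t4", "t6"],
    "M23": ["t2"],
    "M32": [],
    "M33": ["t1", "t5"],
}

# Edges: D(r, u) = 0 between states, D(t, t*) = -G(t) from a tuple to t*.
edges = {s: [(c, 0.0) for c in kids] for s, kids in state_children.items()}
for tid, score in G_score.items():
    edges[f"t{tid}"] = [("t*", -score)]

def shortest(node):
    """Shortest (distance, path) from node to t* in this acyclic graph."""
    if node == "t*":
        return 0.0, ["t*"]
    best = None
    for nxt, w in edges.get(node, []):
        sub = shortest(nxt)
        if sub is None:
            continue  # dead end, e.g. a state holding no tuples
        cand = (w + sub[0], [node] + sub[1])
        if best is None or cand[0] < best[0]:
            best = cand
    return best

dist, path = shortest("M11")
```

The minimal distance is -15, the negated score of tuple 1, so minimizing path distance and maximizing G pick the same tuple.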
AIM 18
Challenge 2: How to guide the search?
AIM 19
We use function opt. to sketch the landscape of G
Function optimization measures quality of states
Function optimization enables:
1. How to define heuristics?
2. How to configure space?
3. Where to start the search?
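To make the first question concrete, one plausible heuristic is an upper bound of G over the region a state covers: a region is opened only while its bound can still beat the best tuples found. The sketch below is a hand-rolled illustration, not the OPT* algorithm from the talk. The bound is derived by hand for this particular G, the grid cells come from the axis ticks of the earlier figure, and all function names are chosen here.

```python
import heapq
from bisect import bisect_left

PRICE_CUTS = [0, 100, 250, 350, 600]   # price cell i covers (cuts[i], cuts[i+1]]
SIZE_CUTS = [0, 1500, 3000, 4000, 4500]
houses = {1: (600, 4500), 2: (350, 2000), 3: (150, 1000),
          4: (250, 2000), 5: (300, 3500), 6: (80, 500)}  # id -> (price k, size)

def G(price, size):
    ok = price <= 200 or price >= 400
    return size / abs(price - 300) if ok else 0.0

def upper_bound(p_lo, p_hi, s_hi):
    """Bound on G in a cell: largest size over the smallest feasible |price - 300|."""
    gaps = []
    if p_lo < 200:                      # cell overlaps price <= 200
        gaps.append(300 - min(p_hi, 200))
    if p_hi >= 400:                     # cell overlaps price >= 400
        gaps.append(max(p_lo, 400) - 300)
    return s_hi / min(gaps) if gaps else 0.0

def top_k(k):
    """Best-first search: open cells by bound, report tuples by exact score."""
    cells = {}
    for tid, (p, s) in houses.items():
        key = (bisect_left(PRICE_CUTS, p) - 1, bisect_left(SIZE_CUTS, s) - 1)
        cells.setdefault(key, []).append(tid)
    heap = []
    for (i, j), tids in cells.items():
        ub = upper_bound(PRICE_CUTS[i], PRICE_CUTS[i + 1], SIZE_CUTS[j + 1])
        heapq.heappush(heap, (-ub, 1, (i, j), tids))   # kind 1 = region
    answers, scored = [], []
    while heap and len(answers) < k:
        neg, kind, key, tids = heapq.heappop(heap)
        if kind == 0:                   # a tuple on top beats every remaining bound
            answers.append((key, -neg))
        else:
            for tid in tids:
                scored.append(tid)
                heapq.heappush(heap, (-G(*houses[tid]), 0, tid, []))
    return answers, scored

answers, scored = top_k(1)
```

For the top-1 query only tuples 1 and 4 are ever scored. The cells holding the other four tuples have bounds no higher than 15, so they are never opened.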
Problem: Study K-constrained optimization queries as Boolean + ranking
Abstraction: Encode K-constrained optimization into shortest path problem
Framework: Develop OPT* to process K-constrained optimization
AIM 29
Thank you!
Questions?
AIM 30
How to implement function optimization?
How do we compare with RankSQL?
If bottom-up is always better, why consider top-down?
Computing upper bound for each region is costly
Random vs. sequential I/O
Assuming indices on every attribute?
Materialize state space for every query?
Exponential number of states when the number of attributes grows
Not every attribute has an index on it
Selectively choose the right index (attribute) to use
We do perform experiments to study how the system scales with #attr
Your algorithm is not optimal because you change the