Promotion Analysis in Multi-Dimensional Space
VLDB 2009
Tianyi Wu (1), Dong Xin (2), Qiaozhu Mei (2), Jiawei Han (1)
(1) University of Illinois, Urbana-Champaign, Urbana, IL, USA
(2) Microsoft Research, Redmond, WA, USA
Presenter: Chun Kit Chui (Kit)
Supervisor: Dr. Ben Kao
Introduction
Promotion has been playing a key role in marketing…
Book sales database
Retailer  Category    Readership           Year  #Sales
A         Sci & Tech  College students     2009  20
A         Sci & Tech  University students  2009  5
A         Comedy      University students  2009  9
B         Sci & Tech  College students     2009  20
B         Sci & Tech  University students  2009  7
B         Fiction     University students  2009  5
B         Comedy      Kindergarten         2009  20
B         Comedy      College students     2009  10
C         Sci & Tech  College students     2009  12
…         …           …                    …     …
A         Sci & Tech  College students     2010  22
A         Sci & Tech  University students  2010  4
A         Comedy      College students     2010  1
B         Sci & Tech  College students     2010  13
B         Sci & Tech  University students  2010  30
B         Fiction     University students  2010  5
B         Comedy      Kindergarten         2010  20
B         Comedy      College students     2010  10
C         Sci & Tech  College students     2010  16
C         Comedy      Kindergarten         2010  52
Manager of retailer A: What is the rank of our book sales among other retailers?
Manager of retailer A: We rank 3rd among all book retailers!
Global aggregate result
Retailer  #Sales
A         61
B         180
C         80
E.g., to compute the aggregate value of the cell for retailer A, we take all tuples with Retailer = “A” and sum up their sales.
Discover the most interesting subspaces where our brand (Retailer A) is highly ranked among its competitors.
Retailer  Category    Readership        #Sales
A         Sci & Tech  College students  42
B         Sci & Tech  College students  33
C         Sci & Tech  College students  28
Manager of retailer A: We are the top-1 bookseller in the { Readership = College students, Category = Sci & Tech } segment!
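The subspace aggregate above can be reproduced with a small sketch. It uses only the rows visible on the slide (rows hidden behind the "…" line are unknown, so totals for other cells, such as retailer B's global total, may differ from the slide). The function names `totals_in_subspace` and `rank_of` are illustrative and not part of the original talk.

```python
from collections import defaultdict

# Rows as they appear on the slide: (retailer, category, readership, year, sales).
# Rows hidden behind the "…" line are unknown and therefore omitted.
ROWS = [
    ("A", "Sci & Tech", "College students", 2009, 20),
    ("A", "Sci & Tech", "University students", 2009, 5),
    ("A", "Comedy", "University students", 2009, 9),
    ("B", "Sci & Tech", "College students", 2009, 20),
    ("B", "Sci & Tech", "University students", 2009, 7),
    ("B", "Fiction", "University students", 2009, 5),
    ("B", "Comedy", "Kindergarten", 2009, 20),
    ("B", "Comedy", "College students", 2009, 10),
    ("C", "Sci & Tech", "College students", 2009, 12),
    ("A", "Sci & Tech", "College students", 2010, 22),
    ("A", "Sci & Tech", "University students", 2010, 4),
    ("A", "Comedy", "College students", 2010, 1),
    ("B", "Sci & Tech", "College students", 2010, 13),
    ("B", "Sci & Tech", "University students", 2010, 30),
    ("B", "Fiction", "University students", 2010, 5),
    ("B", "Comedy", "Kindergarten", 2010, 20),
    ("B", "Comedy", "College students", 2010, 10),
    ("C", "Sci & Tech", "College students", 2010, 16),
    ("C", "Comedy", "Kindergarten", 2010, 52),
]


def totals_in_subspace(rows, category=None, readership=None):
    """Sum sales per retailer over the rows that fall into the subspace.

    A value of None means the dimension is unconstrained ("*").
    """
    totals = defaultdict(int)
    for retailer, cat, reader, _year, sales in rows:
        if category is not None and cat != category:
            continue
        if readership is not None and reader != readership:
            continue
        totals[retailer] += sales
    return dict(totals)


def rank_of(target, totals):
    """1-based rank: retailers with a strictly larger total rank ahead."""
    return 1 + sum(1 for value in totals.values() if value > totals[target])


segment = totals_in_subspace(ROWS, "Sci & Tech", "College students")
print(segment, "rank of A:", rank_of("A", segment))
```

Calling `totals_in_subspace(ROWS)` with no constraints gives the full-space aggregate, in which retailer A still ranks 3rd on the visible rows.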
Full space vs. subspaces
Full space: Global rank. May not be interesting. Compare with ALL objects in ALL aspects. Low cost: a single SQL query.
Subspaces (promotion query): Local rank. A globally low-ranked object may become prominent in some subspaces. Compare with objects in a certain area. High cost: a naïve approach is to compute the rank for ALL possible subspaces and return the interesting ones.
Target object: Retailer A
Object dimension: Retailer
Subspace dimensions: Category, Readership, Year
Score dimension: #Sales
Introduction
Person promotion
An NBA manager would like to promote Michael Jordan as a superstar.
3rd all-time leading scorer.
Further analysis (local rank in some subspaces):
Top scorer in the guard position.
Top scorer on the Chicago Bulls team.
11 individual years’ scoring champion.
Player          Position       Team           Year  Game            …  Score
Michael Jordan  Guard          Chicago Bulls  1998  vs N.Y. Knicks  …  33
Michael Jordan  Guard          Chicago Bulls  1998  vs Utah Jazz    …  15
Scottie Pippen  Small Forward  Chicago Bulls  1998  vs Utah Jazz    …  18
…               …              …              …     …               …  …
Target object: Michael Jordan
Object dimension: Player
Subspace dimensions: Position, Team, Year, Game, …
Score dimension: Score
Problem Definition
The promotion query problem
Given: an object (e.g. a product, a person).
Goal: discover the most interesting subspaces where the object is highly ranked.
How interesting a subspace is gets quantified by a promotiveness measure.
Problem Definition
Object  Location  Year  Score
T1      NY        2008  0.5
T1      WA        2008  0.8
T2      WA        2007  1.0
T2      WA        2008  1.0
T3      NY        2007  0.3
T3      WA        2007  0.6
T3      WA        2008  0.7
Object dimension: Object
Subspace dimensions: Location, Year
Score dimension: Score
Query
Target object: T1
Aggregation measure: SUM
Goal: Discover the most interesting subspaces where T1 is highly ranked.
All possible subspaces:
{*}
{NY} {WA} {2007} {2008}
{NY,2007} {WA,2007} {NY,2008} {WA,2008}
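As a minimal sketch of how this lattice could be enumerated, the snippet below generalizes every base tuple by replacing each subspace dimension either with its value or with "*". The tuple layout and the name `all_subspaces` are assumptions made for illustration.

```python
from itertools import product

# The example table: (object, location, year, score).
TUPLES = [
    ("T1", "NY", 2008, 0.5),
    ("T1", "WA", 2008, 0.8),
    ("T2", "WA", 2007, 1.0),
    ("T2", "WA", 2008, 1.0),
    ("T3", "NY", 2007, 0.3),
    ("T3", "WA", 2007, 0.6),
    ("T3", "WA", 2008, 0.7),
]


def all_subspaces(tuples):
    """Return every (location, year) cell that contains at least one tuple.

    "*" stands for an unconstrained dimension, so ("*", "*") is {*}.
    """
    subspaces = set()
    for _obj, loc, year, _score in tuples:
        for keep_loc, keep_year in product((False, True), repeat=2):
            subspaces.add((loc if keep_loc else "*", year if keep_year else "*"))
    return subspaces


lattice = all_subspaces(TUPLES)
for level in range(3):
    # Level = number of constrained dimensions; key=str avoids int/str comparison.
    cells = sorted((s for s in lattice if sum(v != "*" for v in s) == level), key=str)
    print(level, cells)
```

With two subspace dimensions of two values each, this yields the nine cells listed above; in general the lattice grows exponentially with the number of dimensions, which is why the naïve approach is costly.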
Note that the target object T1 appears only in Year = 2008, so the subspace {2007} and its children {NY,2007} and {WA,2007} can be pruned.
Target subspaces of T1:
{*}
{NY} {WA} {2008}
{NY,2008} {WA,2008}
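One way to express this restriction in code: a subspace is a target subspace only if at least one tuple of the target object falls into it. The sketch below builds all candidate cells from the dimension values and keeps those that T1 occupies; `matches` and `target_subspaces` are hypothetical helper names.

```python
# The example table: (object, location, year, score).
TUPLES = [
    ("T1", "NY", 2008, 0.5),
    ("T1", "WA", 2008, 0.8),
    ("T2", "WA", 2007, 1.0),
    ("T2", "WA", 2008, 1.0),
    ("T3", "NY", 2007, 0.3),
    ("T3", "WA", 2007, 0.6),
    ("T3", "WA", 2008, 0.7),
]


def matches(subspace, loc, year):
    """True if a tuple with this location and year falls into the subspace."""
    s_loc, s_year = subspace
    return s_loc in ("*", loc) and s_year in ("*", year)


def target_subspaces(tuples, target):
    """Keep only the cells that contain at least one tuple of the target."""
    locations = {"*"} | {loc for _o, loc, _y, _sc in tuples}
    years = {"*"} | {year for _o, _l, year, _sc in tuples}
    candidates = {(loc, year) for loc in locations for year in years}
    return {
        cell
        for cell in candidates
        if any(obj == target and matches(cell, loc, year)
               for obj, loc, year, _sc in tuples)
    }


print(sorted(target_subspaces(TUPLES, "T1"), key=str))
```

For T1 this leaves six of the nine cells; everything involving Year = 2007 drops out.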
Subspace {*}: SUM(T1) = 1.3, Rank(T1) = 3rd / 3
Object  SUM(Score)
T1      0.5 + 0.8 = 1.3
T2      1.0 + 1.0 = 2.0
T3      0.3 + 0.6 + 0.7 = 1.6
We project all tuples of T1 into this cell and sum up their scores; the other objects are aggregated the same way to obtain the rank.
Subspace {2008}: SUM(T1) = 1.3, Rank(T1) = 1st / 3
Object  Year  SUM(Score)
T1      2008  0.5 + 0.8 = 1.3
T2      2008  1.0
T3      2008  0.7
We project all tuples of T1 with Year = “2008” into this cell and sum up their scores.
Subspace {NY,2008}: SUM(T1) = 0.5, Rank(T1) = 1st / 1
Object  Location  Year  SUM(Score)
T1      NY        2008  0.5
T2      NY        2008  no tuples
T3      NY        2008  no tuples
We project all tuples of T1 with Location = “NY” and Year = “2008” into this cell and sum up their scores.
Summary of the three cells:
{*}: SUM(T1) = 1.3, Rank(T1) = 3rd / 3
{2008}: SUM(T1) = 1.3, Rank(T1) = 1st / 3
{NY,2008}: SUM(T1) = 0.5, Rank(T1) = 1st / 1
T1 ranks 1st in both {2008} and {NY,2008}. Which one is more interesting?
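The per-cell computation walked through above can be sketched as follows. For a given cell, the code sums the scores of every object that has tuples in the cell and reports T1's aggregate, its rank, and the number of objects present. The names `matches` and `rank_in` are illustrative, and ties are broken by counting only strictly larger aggregates, which is an assumption.

```python
from collections import defaultdict

# The example table: (object, location, year, score).
TUPLES = [
    ("T1", "NY", 2008, 0.5),
    ("T1", "WA", 2008, 0.8),
    ("T2", "WA", 2007, 1.0),
    ("T2", "WA", 2008, 1.0),
    ("T3", "NY", 2007, 0.3),
    ("T3", "WA", 2007, 0.6),
    ("T3", "WA", 2008, 0.7),
]


def matches(subspace, loc, year):
    """True if a tuple with this location and year falls into the subspace."""
    s_loc, s_year = subspace
    return s_loc in ("*", loc) and s_year in ("*", year)


def rank_in(subspace, target, tuples):
    """Return (SUM of target, rank of target, number of objects in the cell).

    Intended for target subspaces, i.e. cells where the target has tuples.
    """
    totals = defaultdict(float)
    for obj, loc, year, score in tuples:
        if matches(subspace, loc, year):
            totals[obj] += score
    target_total = totals.get(target, 0.0)
    rank = 1 + sum(1 for value in totals.values() if value > target_total)
    return round(target_total, 2), rank, len(totals)


for cell in [("*", "*"), ("*", 2008), ("NY", 2008)]:
    print(cell, rank_in(cell, "T1", TUPLES))
```

The three cells printed here correspond to {*}, {2008}, and {NY,2008} from the slides.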
Problem Definition
Promotiveness of a subspace S: a class of measures to quantify how well a subspace S can promote the target object T.
Rank of the target object, Rank(S,T): higher rank -> more promotive.
Significance of the subspace, Sig(S): more significant subspace (e.g. more objects) -> more promotive.
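To make the two ingredients concrete, here is one possible instantiation, chosen purely for illustration: Sig(S) is taken as the fraction of all objects that appear in S, and the promotiveness is Sig(S) divided by Rank(S,T). The slides only say that promotiveness is a class of measures, so this particular formula is an assumption, not the definition used in the talk.

```python
def promotiveness(rank, objects_in_subspace, total_objects):
    """Hypothetical measure: significance divided by rank.

    rank                 1-based rank of the target in the subspace
    objects_in_subspace  number of objects that have tuples in the subspace
    total_objects        number of objects in the whole data set
    """
    significance = objects_in_subspace / total_objects
    return significance / rank


# Ranks and object counts for T1 in the three-object example (T1, T2, T3).
candidates = {
    "{*}": promotiveness(rank=3, objects_in_subspace=3, total_objects=3),
    "{2008}": promotiveness(rank=1, objects_in_subspace=3, total_objects=3),
    "{NY,2008}": promotiveness(rank=1, objects_in_subspace=1, total_objects=3),
}
best = max(candidates, key=candidates.get)
print(best, candidates)
```

Under this assumed measure, {2008} beats {NY,2008} even though T1 ranks 1st in both, because T1 outranks two competitors in {2008} and none in {NY,2008}.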
Subspace pruning
Current top-1 result: S2, with promotive score 1/2. The promotive score of S4 is at most 1/3, which is less than the current top-1 promotive score (1/2), so S4 can be pruned.
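The pruning test can be sketched as a threshold comparison. The snippet assumes, as the numbers on the slide suggest, that the promotive score is the inverse of the rank, so a lower bound on the rank gives an upper bound on the score. The function names and the top-R bookkeeping are illustrative only.

```python
def promo_upper_bound(rank_lower_bound):
    """Upper bound on the score, assuming score = 1 / rank."""
    return 1.0 / rank_lower_bound


def can_prune(rank_lower_bound, top_r_scores, r):
    """Prune a subspace if it cannot enter the current top-R results.

    top_r_scores holds the scores of the best subspaces found so far.
    Nothing is pruned until R results have been collected.
    """
    if len(top_r_scores) < r:
        return False
    threshold = min(top_r_scores)
    return promo_upper_bound(rank_lower_bound) < threshold


# Slide example: top-1 is S2 with score 1/2; S4 has score at most 1/3.
print(can_prune(rank_lower_bound=3, top_r_scores=[0.5], r=1))
# With R = 2 and a weaker second result, the threshold is looser.
print(can_prune(rank_lower_bound=3, top_r_scores=[0.5, 0.25], r=2))
```

The second call shows why a larger R tends to prune less: the threshold is set by the weakest of the current top-R results.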
Object pruning
Key motivation: try to prune objects by obtaining an upper bound on the aggregate score of unseen objects.
Unseen objects whose upper bound is smaller than the smallest aggregate score of the target object can be pruned.
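A rough sketch of this idea, under assumptions that are not stated on the slides: the aggregate is SUM over non-negative scores, so an object's total over the full space bounds its aggregate in every subspace from above. The object T4 is hypothetical and added only so that something gets pruned; the way the talk actually derives its bounds may differ.

```python
from collections import defaultdict

# The example table plus one hypothetical low-scoring object, T4.
TUPLES = [
    ("T1", "NY", 2008, 0.5),
    ("T1", "WA", 2008, 0.8),
    ("T2", "WA", 2007, 1.0),
    ("T2", "WA", 2008, 1.0),
    ("T3", "NY", 2007, 0.3),
    ("T3", "WA", 2007, 0.6),
    ("T3", "WA", 2008, 0.7),
    ("T4", "WA", 2007, 0.2),  # hypothetical, not in the original example
]


def prunable_objects(tuples, target, target_min_aggregate):
    """Objects whose upper bound is below the target's smallest aggregate.

    The upper bound used here is the object's SUM over the full space,
    which is valid only for SUM over non-negative scores.
    """
    upper_bound = defaultdict(float)
    for obj, _loc, _year, score in tuples:
        upper_bound[obj] += score
    return sorted(
        obj for obj, bound in upper_bound.items()
        if obj != target and bound < target_min_aggregate
    )


# T1's smallest aggregate over its target subspaces is 0.5 (in {NY} and {NY,2008}).
print(prunable_objects(TUPLES, "T1", target_min_aggregate=0.5))
```

An object pruned this way can never outrank T1 in any target subspace, so it need not be aggregated cell by cell.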
[Table: Subspace | Objects in the cuboid and their aggregate scores | Aggregate of target object]
Can you tell the exact rank of t7 in S4? No! The aggregate score of t7 is 0.4, and at least 3 objects have an aggregate value larger than t7's.
Experimental evaluations
Settings
Implementation: Pentium 3GHz processor, 2GB of memory, 160G hard disk, WinXP, Microsoft Visual C# 2008 (in-memory)
Datasets: DBLP, TPC-H
Algorithms
PromoRank: the basic query execution framework.
PromoRank++: the basic query execution framework with subspace pruning and object pruning.
PromoCube
DBLP Dataset
Subspace dimensions: Conference (2,506), Year (50), Database (boolean), Data mining (boolean), Information retrieval (boolean), Machine learning (boolean)
Object dimension: Author (450K)
Score dimension: Paper count
Base tuples: 1.76M
DBLP Dataset
The running time increases as R increases, because the pruning threshold is determined by the current top-R result's aggregate score, and the threshold becomes looser as R becomes larger.
PromoCube performs extremely well when R is small, because in that case PromoCube can return the result directly with an O(1) lookup.
[Figure: DBLP dataset; y-axis: number of subspace aggregations]
TPC-H
Object dimension: l_suppkey (10,000)
Score dimension: l_extendedprice (ranges from 901.00 to 104949.50)
Base tuples: 6,001,215
TPC-H
The gap between PromoRank and PromoRank++ is not large when the number of dimensions is small. This is because the total number of target subspaces is itself quite small, so there is less chance to apply pruning that exploits the parent-child relationship.
PromoCube is increasingly faster w.r.t. the number of tuples. This is because PromoCube saves much more of the actual aggregation and partitioning cost: PromoCube prunes subspaces before any aggregation happens, whereas PromoRank++ prunes subspaces during the aggregation process.
Runtime increases as dimensionality increases, because there are more target subspaces when there are more dimensions.
All algorithms run faster when there are more objects. With other parameters unchanged, more objects means that each object appears in fewer tuples, and therefore has fewer target subspaces.
Both PromoRank++ and PromoCube favor large cardinalities. With other parameters unchanged, larger cardinality implies more subspaces. With the same number of tuples, the chance of two tuples having the same dimension value becomes lower (the data becomes sparser). Therefore, it is more likely that the aggregate scores are equal across parent-child subspaces, which provides a tighter lower bound for Rank.
TPC-H
Subspace lattice over Location in {NY, WA} and Year in {2007, 2008}:
{*}
{NY} {WA} {2007} {2008}
{NY,2007} {WA,2007} {NY,2008} {WA,2008}
Subspace lattice when only Location = NY occurs:
{*}
{NY} {2007} {2008}
{NY,2007} {NY,2008}
Example 1:
Object  Location  Year  Score
T1      NY        2007  0.6
T2      NY        2008  0.4
Example 2:
Object  Location  Year  Score
T1      NY        2007  0.6
T2      WA        2008  0.4
Conclusion
Introduced the promotion analysis problem.
Presented a basic query execution framework.
Proposed two pruning techniques and the Promotion Cube for efficient query processing.
The End
Appendix
Object pruning
[Table: Subspace | Objects in the cuboid and their aggregate scores | Aggregate of target object]