Intelligent Database Systems Lab N.Y.U.S. T. I. M. Supporting personalized ranking over categorical attributes Presenter : Lin, Shu-Han Authors : Gae-won You, Seung-won Hwang, Hwanjo Yu Information Sciences 178(2008)

Dec 13, 2015

Page 1: Intelligent Database Systems Lab, N.Y.U.S.T. I.M.

Supporting personalized ranking over categorical attributes

Presenter : Lin, Shu-Han

Authors : Gae-won You, Seung-won Hwang, Hwanjo Yu

Information Sciences 178(2008)

Page 2: Outline

• Motivation
• Objective
• Methodology
• Experiments
• Conclusion
• Comments

Page 3: Motivation

The problem of categorical attributes in personalized ranking for information retrieval

Categorical attributes have no inherent ordering, so how can relevant data be ranked by a categorical attribute?

For example, how can we find older females who prefer soda drinks?

Name    Age   Gender   FavoriteDrink   Buy
Jane    30    Female   Coke            Coke, Milk
Mary    25    Female   Pepsi           Coke, Pepsi
Tom     21    Male     Water           Milk, Water
Denny   26    Male     Coke            Milk, Juice
Tina    11    Female   Pepsi           Red Wine, Pepsi

Page 4: Objectives

Enable a uniform ranked retrieval over a combination of categorical attributes and numerical attributes.

Support ranking over a binary representation of categorical attributes:

• Binary encoding
• Sparsity

Name    Female
Jane    1
Mary    1
Tom     0
Denny   0
Tina    1

Single-valued attribute:

Name   Coke   Pepsi   Water
Jane   1      0       0
Mary   0      1       0
Tom    0      0       1

Multi-valued attribute with bounded cardinality (item set, bc=2):

Name   Coke   Pepsi   Water
Jane   1      0       1
Mary   1      1       0
Tom    1      0       1
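The encodings in the tables above can be sketched in a few lines of Python. This is only an illustration: the `encode` helper and the variable names are hypothetical, and the item sets for the multi-valued case are read back from the 0/1 rows of the table rather than taken from the paper.

```python
def encode(values, vocabulary):
    """Return a 0/1 indicator list: 1 where the object takes that category."""
    return [1 if v in values else 0 for v in vocabulary]

DRINKS = ["Coke", "Pepsi", "Water"]

# Single-valued attribute: exactly one indicator is set per object.
favorite = {"Jane": {"Coke"}, "Mary": {"Pepsi"}, "Tom": {"Water"}}
single = {name: encode(vals, DRINKS) for name, vals in favorite.items()}

# Multi-valued attribute with bounded cardinality bc=2:
# at most BC indicators are set per object (item sets assumed from the table).
BC = 2
bought = {
    "Jane": {"Coke", "Water"},
    "Mary": {"Coke", "Pepsi"},
    "Tom": {"Coke", "Water"},
}
multi = {name: encode(vals, DRINKS) for name, vals in bought.items()}

for name in DRINKS and single:
    print(name, single[name], multi[name])
```

In both cases each row contains only a few 1s relative to the number of categories, which is the sparsity the slide refers to.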

Page 5: Overview

[Figure: overview diagram with three numbered parts (1), (2), (3)]

Page 6: Rank formulation

F = 0.5*age + 3*female + …
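As a rough illustration of how such a linear ranking function orders objects, the sketch below scores the five people from the Motivation table. It uses only the two terms visible in the formula; the elided terms are left out, and the names `WEIGHTS`, `ROWS`, and `score` are hypothetical.

```python
# Only the two terms shown on the slide; the elided "…" terms are omitted.
WEIGHTS = {"age": 0.5, "female": 3.0}

# Age and the binary-encoded Gender attribute from the Motivation table.
ROWS = {
    "Jane": {"age": 30, "female": 1},
    "Mary": {"age": 25, "female": 1},
    "Tom": {"age": 21, "female": 0},
    "Denny": {"age": 26, "female": 0},
    "Tina": {"age": 11, "female": 1},
}

def score(row, weights=WEIGHTS):
    """Weighted sum over numerical and binary-encoded categorical attributes."""
    return sum(w * row[attr] for attr, w in weights.items())

# Rank names by descending score.
ranking = sorted(ROWS, key=lambda name: score(ROWS[name]), reverse=True)

for name in ranking:
    print(name, score(ROWS[name]))
```

Because the categorical attribute is encoded as 0/1, it enters the sum exactly like a numerical attribute, which is what allows uniform ranked retrieval over both kinds.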

Page 7: Rank processing (TA)

A simple example query: find older females who prefer soda drinks.

Transform into

F = age + female

1. Candidate identification
   1. Sorted access on age and female
   2. Take the top-k of sa(age) and sa(female), e.g., k=1, sa(age)={O1}; sa(female)={O2}
2. Candidate reduction
   1. O1 = 30 + 0
   2. O2 = 25 + 1
   3. O1 has the highest F score
3. Termination
   1. The score of O1 is not greater than F(30,1)=31 // upper bound score
   2. Another round of sorted access to consider more candidates, e.g., sa(age)={O4}; sa(female)={O3}
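The three steps above can be sketched as a small threshold-algorithm loop. This is a minimal sketch under stated assumptions, not the paper's implementation: the values for O1 and O2 follow the slide, the values for O3 and O4 are hypothetical, and the stop test uses the standard "k-th best score >= upper bound" condition.

```python
import heapq

# O1 and O2 follow the slide; O3 and O4 are hypothetical filler values.
OBJECTS = {
    "O1": {"age": 30, "female": 0},
    "O2": {"age": 25, "female": 1},
    "O3": {"age": 21, "female": 1},
    "O4": {"age": 26, "female": 0},
}

def threshold_algorithm(objects, attrs, k):
    """Return (top-k list of (score, oid), number of sorted-access rounds)."""
    # One list per attribute, sorted by descending value (sorted access).
    lists = {a: sorted(objects, key=lambda o: -objects[o][a]) for a in attrs}
    seen = {}   # oid -> full score F
    top = []
    rounds = 0
    for depth in range(len(objects)):
        rounds += 1
        last = {}
        for a in attrs:
            # 1. Candidate identification: next object in each sorted list.
            oid = lists[a][depth]
            last[a] = objects[oid][a]
            # 2. Candidate reduction: random access to compute the full score.
            seen[oid] = sum(objects[oid][b] for b in attrs)
        # 3. Termination: upper bound on the score of any unseen object.
        upper_bound = sum(last.values())
        top = heapq.nlargest(k, ((s, o) for o, s in seen.items()))
        if len(top) == k and top[-1][0] >= upper_bound:
            break
    return top, rounds

top, rounds = threshold_algorithm(OBJECTS, ["age", "female"], k=1)
print(top, rounds)
```

With this data the first round gives an upper bound of 31, which O1's score of 30 does not reach, so a second round is needed; after it the bound drops to 27 and O1 can be returned.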


Page 8: Bitmap – binary encoding

F = v1+v2+v3+v4, k=2

1) K={}, C={1111} (Initialization)
2) OID=execute(C)
3) OID={O4}, |OID|>0, K={[O4,4]}
4) C={0111/1011/1101/1110} (Expansion)
5) K.count < k, back to 2)
6) …

     v1   v2   v3   v4
O1   1    0    1    1
O2   0    1    0    0
O3   0    1    1    1
O4   1    1    1    1
O5   1    0    1    1
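The initialization, execution, and expansion steps above can be sketched as follows, using Python integers as bit vectors. This is an illustration under assumptions: `execute` is taken to return the objects whose bits match the candidate pattern exactly, all weights are 1, ties are broken by pattern order, and every function name here is hypothetical.

```python
ROWS = {
    "O1": (1, 0, 1, 1),
    "O2": (0, 1, 0, 0),
    "O3": (0, 1, 1, 1),
    "O4": (1, 1, 1, 1),
    "O5": (1, 0, 1, 1),
}

def build_bitmaps(rows):
    """bitmaps[j] has bit i set when the i-th object has v_(j+1) = 1."""
    n_values = len(next(iter(rows.values())))
    bitmaps = [0] * n_values
    for i, bits in enumerate(rows.values()):
        for j, bit in enumerate(bits):
            if bit:
                bitmaps[j] |= 1 << i
    return bitmaps

def execute(pattern, bitmaps, n_objects):
    """Indices of objects matching the pattern exactly (assumed semantics)."""
    full = (1 << n_objects) - 1
    hits = full
    for j, want in enumerate(pattern):
        hits &= bitmaps[j] if want else (~bitmaps[j] & full)
    return [i for i in range(n_objects) if hits >> i & 1]

def expand(patterns):
    """Next candidates: flip one 1 to 0 in each pattern, without duplicates."""
    out = []
    for p in patterns:
        for j, bit in enumerate(p):
            if bit:
                q = p[:j] + (0,) + p[j + 1:]
                if q not in out:
                    out.append(q)
    return out

def top_k(rows, k):
    names = list(rows)
    bitmaps = build_bitmaps(rows)
    K = []                                   # results as (name, score)
    C = [tuple([1] * len(bitmaps))]          # 1) Initialization
    while C and len(K) < k:                  # 5) repeat while K.count < k
        for p in C:
            for i in execute(p, bitmaps, len(names)):   # 2) OID = execute(C)
                K.append((names[i], sum(p)))            # 3) add to K
        C = expand(C)                        # 4) Expansion
    return K[:k]

result = top_k(ROWS, 2)
print(result)
```

On the table above the first round finds O4 with score 4; the expanded patterns then find O3, O1, and O5 with score 3, and the first of these in pattern order fills the second slot.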

Page 9: Bitmap – sparsity

Single-valued attribute

F = w1v1+w2v2+…+w6v6

Ranked weights: w1 ≥ w2 ≥ w3; w4 ≥ w5 ≥ w6. For simplicity, all w=1, k=2.

1) K={}, C={100.100.100} (Initialization)
2) OID=execute(C)
3) OID={O4}, |OID|>0, K=OID={[O4,2]}
4) C={010.100.100/100.010.100/100.100.010} (Expansion)
5) K.count < k, back to 2)
6) …

     Attribute1     Attribute2     Attribute2
     v1   v2   v3   v4   v5   v6   v4   v5   v6
O1   1    0    0    0    0    1    0    1    0
O2   0    1    0    0    1    0    1    0    0
O3   0    1    0    1    0    0    0    1    0
O4   1    0    0    1    0    0    1    0    0
O5   0    0    1    0    0    1    0    1    0
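For single-valued attributes, every candidate pattern has exactly one 1 per attribute group, so a candidate can be stored as one value index per attribute. The sketch below is a best-first variant of the expansion step, which is an assumption of this example rather than something the slide states: it uses a heap so that patterns are executed in non-increasing score order when weights differ. The weights are hypothetical (the slide sets all w=1), and the three column groups of the table are treated as three attributes.

```python
import heapq

# Value index (0 = highest-weight value) per attribute group, read off the table.
OBJECTS = {
    "O1": (0, 2, 1),
    "O2": (1, 1, 0),
    "O3": (1, 0, 1),
    "O4": (0, 0, 0),
    "O5": (2, 2, 1),
}

# Hypothetical weights, sorted in descending order within each attribute.
WEIGHTS = [(6, 3, 1), (5, 2, 1), (3, 2, 1)]

def top_k_single_valued(objects, weights, k):
    names = list(objects)
    # bitmaps[a][v]: bit i set when the i-th object takes value v on attribute a.
    bitmaps = [[0] * len(w) for w in weights]
    for i, name in enumerate(names):
        for a, v in enumerate(objects[name]):
            bitmaps[a][v] |= 1 << i

    def score(combo):
        return sum(weights[a][v] for a, v in enumerate(combo))

    start = tuple(0 for _ in weights)        # pattern 100.100.100
    heap = [(-score(start), start)]
    seen = {start}
    result = []
    while heap and len(result) < k:
        neg_score, combo = heapq.heappop(heap)
        # Execute: AND the bitmaps of the selected value of each attribute.
        hits = (1 << len(names)) - 1
        for a, v in enumerate(combo):
            hits &= bitmaps[a][v]
        result.extend((names[i], -neg_score)
                      for i in range(len(names)) if hits >> i & 1)
        # Expand: move one attribute to its next-ranked value.
        for a in range(len(combo)):
            if combo[a] + 1 < len(weights[a]):
                nxt = combo[:a] + (combo[a] + 1,) + combo[a + 1:]
                if nxt not in seen:
                    seen.add(nxt)
                    heapq.heappush(heap, (-score(nxt), nxt))
    return result[:k]

result = top_k_single_valued(OBJECTS, WEIGHTS, 2)
print(result)
```

Only one bitmap per attribute is touched for each candidate, which is where the sparsity of single-valued attributes pays off compared with enumerating all bit patterns.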

Page 10: Bitmap – sparsity

Multi-valued attribute with bounded cardinality

     Attribute1                  Attribute2
     v1-1   v1-2   v1-3   v1-4   v2-1   v2-2   v2-3
O1   1      0      0      1      1      0      1
O2   0      1      0      1      1      1      0
O3   0      1      1      0      1      1      0
O4   1      0      0      1      1      0      1
O5   0      0      1      1      0      1      1

Page 11: Experiments

• Sparsity of indicator variables in the UCI datasets

• 22% of the datasets consist only of categorical attributes.

• 56% combine numerical and categorical attributes.

Page 12: Experiments – synthetic data

Page 13: Experiments – real-life data

Page 14: Conclusions

This paper studies how to support rank formulation and processing over data with categorical attributes.

Instead of adopting existing numerical algorithms, it develops a bitmap-based approach for:

• Binary encoding
• Sparsity
   • Single-valued
   • Multi-valued with bounded cardinality

Page 15: Comments

Advantage …

Drawback …

Application …