CUBELSI: AN EFFECTIVE AND EFFICIENT METHOD FOR SEARCHING RESOURCES IN SOCIAL TAGGING SYSTEMS Bin Bi, Sau Dan Lee, Ben Kao, Reynold Cheng The University of Hong Kong {bbi, sdlee, kao, ckcheng}@cs.hku.hk

Dec 16, 2015

Page 2

SOCIAL TAGGING SYSTEMS

Tags

Page 3

SEARCH IN SOCIAL TAGGING SYSTEMS

Two Problems:

1. Tag Inconsistency
2. A Multitude of Aspects

Page 4

TAG INCONSISTENCY

[Figure: "car? automobile?" Example images tagged inconsistently: "car, automobile"; "car, Benz"; "car"; "car, automobile"; "automobile"; "Audi"; "car"]

Page 5

A MULTITUDE OF ASPECTS

[Figure: example tag groups covering different aspects: "moon, worm moon, Perigee moon, lunar"; "cherry blossoms, Sakura, cherry blossom"; "Nikon, astrophotography, D40"]

Page 6

SOLUTION

[Figure: LSI (Latent Semantic Indexing) → CubeLSI; SVD (Singular Value Decomposition) → Tucker Decomposition; + Taggers]

Analyzing semantic relations among tags by taking into account the role of taggers

Page 7

PROPOSED RANKING FRAMEWORK

CubeLSI Algorithm:
Input: tag assignments
Output: pairwise tag semantic distances

Page 8

CONCEPT DISTILLATION

[Figure: tags with pairwise distances (mp3, music, photo, photos, video, movie) are grouped into concepts/clusters: {photo, photos}, {music, mp3}, {video, movie}]
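The slide does not say which clustering algorithm distils concepts from the pairwise tag distances. As a hypothetical stand-in, the sketch below uses threshold-based single-linkage clustering with union-find: two tags share a concept whenever a chain of tag pairs closer than `threshold` links them. The function name, the toy distances and the threshold are illustrative assumptions.

```python
def cluster_tags(tags, dist, threshold):
    """Group tags into concepts (single-linkage via union-find; a stand-in method).

    dist maps frozenset({a, b}) to the distance between tags a and b.
    """
    parent = {t: t for t in tags}

    def find(t):
        while parent[t] != t:
            parent[t] = parent[parent[t]]  # path halving keeps trees shallow
            t = parent[t]
        return t

    for i, a in enumerate(tags):
        for b in tags[i + 1:]:
            if dist[frozenset((a, b))] < threshold:
                parent[find(a)] = find(b)  # merge the two clusters

    clusters = {}
    for t in tags:
        clusters.setdefault(find(t), []).append(t)
    return sorted(clusters.values())


# Toy distances: 0.1 inside the intended concepts, 1.0 everywhere else.
tags = ["mp3", "music", "photo", "photos", "video", "movie"]
dist = {frozenset((a, b)): 1.0 for i, a in enumerate(tags) for b in tags[i + 1:]}
for a, b in [("mp3", "music"), ("photo", "photos"), ("video", "movie")]:
    dist[frozenset((a, b))] = 0.1

concepts = cluster_tags(tags, dist, threshold=0.5)
print(concepts)  # [['mp3', 'music'], ['photo', 'photos'], ['video', 'movie']]
```

Single linkage is chosen only because it needs nothing beyond the distance matrix; any clustering method that accepts precomputed distances could take its place.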

Page 9

PROPOSED RANKING FRAMEWORK

Page 10

BAG-OF-CONCEPTS REPRESENTATION

Distilled Concepts
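A minimal sketch of a bag-of-concepts representation. It assumes each tag maps to exactly one distilled concept and that a resource's vector simply counts the concepts of its tag occurrences. Raw counts are an assumption; a weighting such as tf-idf could be substituted. `bag_of_concepts`, `to_vector` and the concept ids are hypothetical.

```python
from collections import Counter


def bag_of_concepts(tag_occurrences, tag_to_concept):
    """Count how often each concept appears among a resource's tag occurrences.

    Tags without a known concept are ignored (an assumption).
    """
    return Counter(tag_to_concept[t] for t in tag_occurrences if t in tag_to_concept)


def to_vector(bag, num_concepts):
    """Dense concept vector with one entry per concept id 0..num_concepts-1."""
    return [bag.get(c, 0) for c in range(num_concepts)]


# Hypothetical concept ids for the clusters {music, mp3}, {photo, photos}, {video, movie}.
tag_to_concept = {"mp3": 0, "music": 0, "photo": 1, "photos": 1, "video": 2, "movie": 2}

# Tag occurrences on one resource, possibly from several users.
occurrences = ["music", "mp3", "music", "video"]
bag = bag_of_concepts(occurrences, tag_to_concept)
print(to_vector(bag, 3))  # [3, 0, 1]
```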

Page 11

PROPOSED RANKING FRAMEWORK

Page 12

PROPOSED RANKING FRAMEWORK

Page 13

RANKING SEARCH RESULTS

[Figure: the query and two resources (Resource 1, Resource 2) shown as vectors in a concept space with axes x, y, z]

Search results are sorted in descending order of their cosine similarity scores with the query.
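The ranking step can be sketched as follows. It assumes the query and each resource are already represented as vectors over the same concepts. The vectors below are made-up toy values for the three axes x, y, z.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def rank(query_vec, resource_vecs):
    """Return resource ids sorted by descending cosine similarity to the query."""
    scores = {rid: cosine(query_vec, vec) for rid, vec in resource_vecs.items()}
    return sorted(scores, key=scores.get, reverse=True)


query = [1.0, 1.0, 0.0]  # toy concept-space vector (x, y, z)
resources = {
    "Resource 1": [2.0, 1.5, 0.5],
    "Resource 2": [0.0, 1.0, 3.0],
}
print(rank(query, resources))  # ['Resource 1', 'Resource 2']
```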

Page 14

PROPOSED RANKING FRAMEWORK

CubeLSI Algorithm:
Input: tag assignments
Output: pairwise tag semantic distances

Page 15

CUBELSI

Tensor

Second-order Tensor

Third-order Tensor

Page 16

REPRESENTING DATA AS A THIRD-ORDER TENSOR
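A minimal sketch of the tensor construction. It assumes the mode order user × tag × resource and binary entries, so that F[u, t, r] = 1 if user u annotated resource r with tag t. Both of these are assumptions, and the triples are hypothetical toy data. A dense numpy array is only suitable for toy sizes; real datasets would need a sparse representation.

```python
import numpy as np

# Hypothetical toy tag assignments: (user, tag, resource) triples.
assignments = [
    ("alice", "car", "img1"),
    ("alice", "automobile", "img1"),
    ("bob", "car", "img2"),
    ("bob", "Benz", "img2"),
    ("carol", "automobile", "img3"),
]


def build_tensor(triples):
    """Return (F, users, tags, resources) with binary F[u, t, r] (an assumption)."""
    users = sorted({u for u, _, _ in triples})
    tags = sorted({t for _, t, _ in triples})
    resources = sorted({r for _, _, r in triples})
    u_idx = {u: i for i, u in enumerate(users)}
    t_idx = {t: i for i, t in enumerate(tags)}
    r_idx = {r: i for i, r in enumerate(resources)}
    F = np.zeros((len(users), len(tags), len(resources)))
    for u, t, r in triples:
        F[u_idx[u], t_idx[t], r_idx[r]] = 1.0
    return F, users, tags, resources


F, users, tags, resources = build_tensor(assignments)
print(F.shape)  # (3, 3, 3)
```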

Page 17

PAIRWISE TAG DISTANCE

Two sources of noise:

1. The absence of a tag assignment (a zero entry in the tensor) may not result from the user considering the tag to be irrelevant to the resource.
2. Tagging is a casual and ad-hoc activity.

Page 18

TUCKER DECOMPOSITION

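[Figure: Tucker decomposition of the original tensor (axes Tag, Resource, User) into a core tensor and factor matrices 1, 2, 3; multiplying the core tensor by the factor matrices yields the purified tensor]

Purified Tag Distance:

$$\hat{D}(t_i, t_j) = \sqrt{\sum_{u}\sum_{r}\left(\hat{F}_{u,t_i,r} - \hat{F}_{u,t_j,r}\right)^2}$$

where $\hat{F}$ is the purified tensor, $u$ ranges over users and $r$ over resources.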
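The slides do not show how the Tucker decomposition is computed. The sketch below uses truncated HOSVD (higher-order SVD), one common way to obtain a Tucker decomposition; whether CubeLSI uses exactly this procedure and these ranks is an assumption. It also assumes the mode order user × tag × resource and measures the purified distance between two tags as the Frobenius norm of the difference of their user × resource slices of the purified tensor. All function names are hypothetical.

```python
import numpy as np


def unfold(T, mode):
    """Mode-n unfolding: rows indexed by `mode`, columns by the remaining modes."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)


def truncated_hosvd(F, ranks):
    """Tucker decomposition via truncated HOSVD.

    Each factor matrix holds the leading left singular vectors of the
    corresponding unfolding; the core is F x1 A1^T x2 A2^T x3 A3^T.
    """
    factors = []
    for mode, k in enumerate(ranks):
        U, _, _ = np.linalg.svd(unfold(F, mode), full_matrices=False)
        factors.append(U[:, :k])
    A1, A2, A3 = factors
    core = np.einsum("utr,ua,tb,rc->abc", F, A1, A2, A3)
    return core, factors


def reconstruct(core, factors):
    """Purified tensor F_hat = core x1 A1 x2 A2 x3 A3 (user x tag x resource)."""
    A1, A2, A3 = factors
    return np.einsum("abc,ua,tb,rc->utr", core, A1, A2, A3)


def purified_distance(F_hat, i, j):
    """Frobenius norm of the difference between the slices of tags i and j."""
    return float(np.linalg.norm(F_hat[:, i, :] - F_hat[:, j, :]))


# Toy binary user x tag x resource tensor (6 users, 5 tags, 4 resources).
rng = np.random.default_rng(0)
F = (rng.random((6, 5, 4)) < 0.3).astype(float)

core, factors = truncated_hosvd(F, ranks=(3, 3, 3))
F_hat = reconstruct(core, factors)
print(core.shape, F_hat.shape)  # (3, 3, 3) (6, 5, 4)
print(purified_distance(F_hat, 0, 1))
```

Truncating the ranks is what discards the low-energy components; with full ranks the reconstruction returns the original tensor exactly.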

Page 19

SPACE & TIME COSTS

Last.fm dataset (3897 users, 3326 tags, 2849 resources)

Purified tensor: 36.9 billion entries (3897 × 3326 × 2849)

Each tag slice (users × resources): 11.1 million entries (3897 × 2849)

Computing the Frobenius norm for EACH tag pair requires 11.1 million subtractions, squarings and additions.

There are a total of 5.5 million tag pairs for 3326 tags!

In total, about 6.1 × 10^13 of each operation would be needed: prohibitively expensive!
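The counts on this slide can be checked with a few lines of arithmetic, using only the Last.fm statistics given above:

```python
users, tags, resources = 3897, 3326, 2849  # Last.fm statistics from the slide

tensor_entries = users * tags * resources  # entries of the dense purified tensor
slice_entries = users * resources          # entries of one tag slice (users x resources)
tag_pairs = tags * (tags - 1) // 2         # unordered tag pairs
total_ops = tag_pairs * slice_entries      # subtractions (likewise squarings, additions)

print(f"{tensor_entries:,}")  # 36,927,091,278
print(f"{slice_entries:,}")   # 11,102,553
print(f"{tag_pairs:,}")       # 5,529,475
print(f"{total_ops:.2e}")     # 6.14e+13
```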

Page 20

SHORT-CUT TO EVALUATING THE PURIFIED TAG DISTANCE

Evaluating the distance directly from the purified tensor: impractical

• The new formula depends only on the core tensor and a factor matrix
• There is no need to compute any entries of the purified tensor
• The relatively low dimensions of the core tensor and the factor matrix imply far fewer computations

The matrix used in the new formula can be readily computed from the core tensor.
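One way such a short-cut can be derived, under stated assumptions. Take the mode order user × tag × resource, with user and resource factor matrices A1 and A3 that have orthonormal columns (as truncated HOSVD produces). The purified slice of tag t is then A1 X_t A3ᵀ, so its Frobenius norm equals that of the small matrix X_t. Write y_i for row i of the tag factor matrix and M = C2 C2ᵀ for the Gram matrix of the core tensor's mode-2 unfolding C2. The squared distance then becomes (y_i - y_j)ᵀ M (y_i - y_j). The sketch checks this against the brute-force distance on a toy tensor; M, y_i and the function names are hypothetical labels, not the paper's notation.

```python
import numpy as np


def unfold(T, mode):
    """Mode-n unfolding: rows indexed by `mode`, columns by the remaining modes."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)


def truncated_hosvd(F, ranks):
    """Truncated HOSVD: every factor matrix has orthonormal columns."""
    factors = [np.linalg.svd(unfold(F, m), full_matrices=False)[0][:, :k]
               for m, k in enumerate(ranks)]
    A1, A2, A3 = factors
    core = np.einsum("utr,ua,tb,rc->abc", F, A1, A2, A3)
    return core, factors


def shortcut_distances(core, tag_factor):
    """All pairwise purified tag distances from the core tensor and the tag
    factor matrix only (valid when user/resource factors are orthonormal)."""
    C2 = unfold(core, 1)                # k_tag x (k_user * k_resource)
    M = C2 @ C2.T                       # small k_tag x k_tag matrix from the core
    G = tag_factor @ M @ tag_factor.T   # G[i, j] = y_i^T M y_j
    sq = np.diag(G)[:, None] + np.diag(G)[None, :] - 2 * G
    D = np.sqrt(np.maximum(sq, 0.0))    # clip tiny negatives caused by rounding
    np.fill_diagonal(D, 0.0)
    return D


rng = np.random.default_rng(1)
F = (rng.random((7, 6, 5)) < 0.3).astype(float)  # toy user x tag x resource tensor
core, (A1, A2, A3) = truncated_hosvd(F, ranks=(3, 3, 3))
D = shortcut_distances(core, A2)

# Brute-force check against the explicitly reconstructed purified tensor.
F_hat = np.einsum("abc,ua,tb,rc->utr", core, A1, A2, A3)
brute = np.linalg.norm(F_hat[:, 0, :] - F_hat[:, 1, :])
print(np.isclose(D[0, 1], brute))  # True
```

The cost per tag pair now depends on the (small) tag rank rather than on the number of users times resources.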

Page 21

EXPERIMENTAL RESULTS

Dataset statistics: #users, #tags, #resources, #records

Page 22

SAMPLE TAG CLUSTERS

Page 23

OTHER RANKING METHODS

Freq: Resources are ranked in descending order of # of users who annotate the resource with query tags.

BOW (Bag-of-Words): Applies standard IR techniques, treating each resource as a document and each tag as a word.

FolkRank [Hotho et al. 2006]: A modified version of PageRank. It follows the assumption that votes cast by important users with important tags would make the annotated resources important.
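Freq can be sketched as below. The slide does not say whether a user must use any or all of the query tags; this sketch assumes any, and breaks ties by resource id (also an assumption). The toy assignments are hypothetical.

```python
def freq_rank(assignments, query_tags):
    """Rank resources by the number of distinct users who annotated them with
    at least one query tag (descending; ties broken by resource id)."""
    users_per_resource = {}
    for user, tag, resource in assignments:
        if tag in query_tags:
            users_per_resource.setdefault(resource, set()).add(user)
    return sorted(users_per_resource,
                  key=lambda r: (-len(users_per_resource[r]), r))


# Hypothetical (user, tag, resource) triples.
assignments = [
    ("alice", "car", "img1"),
    ("bob", "car", "img1"),
    ("bob", "automobile", "img2"),
    ("carol", "car", "img2"),
    ("dave", "car", "img2"),
    ("alice", "Audi", "img3"),
]
print(freq_rank(assignments, {"car", "automobile"}))  # ['img2', 'img1']
```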

Page 24

OTHER RANKING METHODS

LSI: This method projects the third-order tensor onto a 2D tag-resource matrix, and then applies traditional LSI on the tag-resource matrix using SVD.
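This is a sketch of the LSI baseline. It assumes the projection onto the tag-resource matrix sums the tensor over users, and that tag distances are Euclidean distances between rows of U_k S_k from a rank-k SVD. Both are assumptions; cosine distance is another common choice. Function names are hypothetical.

```python
import numpy as np


def lsi_tag_vectors(F, k):
    """Collapse the user x tag x resource tensor into a tag x resource matrix
    by summing over users, then keep a rank-k SVD; rows of U_k * S_k are the
    tags' latent vectors."""
    M = F.sum(axis=0)                               # tag x resource counts
    U, s, _ = np.linalg.svd(M, full_matrices=False)
    return U[:, :k] * s[:k]                         # scale column j by s[j]


def lsi_tag_distance(tag_vecs, i, j):
    """Euclidean distance between two tags in the latent space (an assumption)."""
    return float(np.linalg.norm(tag_vecs[i] - tag_vecs[j]))


rng = np.random.default_rng(2)
F = (rng.random((6, 5, 4)) < 0.3).astype(float)   # toy tensor: 6 users, 5 tags, 4 resources
vecs = lsi_tag_vectors(F, k=2)
print(vecs.shape)  # (5, 2)
print(lsi_tag_distance(vecs, 0, 1))
```

Because the rows of V_k are orthonormal, this distance equals the distance between the corresponding rows of the rank-k reconstruction U_k S_k V_kᵀ. That mirrors comparing tags in a purified matrix, but without the user dimension.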

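CubeSim: This method is similar to CubeLSI except that it computes the distance between two tags $t_i$ and $t_j$ directly from the original tensor $F$, without Tucker decomposition, by $D(t_i, t_j) = \sqrt{\sum_{u}\sum_{r}\left(F_{u,t_i,r} - F_{u,t_j,r}\right)^2}$.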
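Assume, by analogy with CubeLSI's purified tag distance, that CubeSim takes the Frobenius norm of the difference between two tags' user × resource slices of the original tensor, with mode order user × tag × resource also assumed. All pairwise distances can then be computed at once from a Gram matrix, since ||a - b||² = ||a||² + ||b||² - 2⟨a, b⟩. The function name is hypothetical.

```python
import numpy as np


def cubesim_distances(F):
    """D[i, j] = ||F[:, i, :] - F[:, j, :]||_F for every pair of tags, computed
    from inner products of the tags' slices of the original tensor."""
    X = np.moveaxis(F, 1, 0).reshape(F.shape[1], -1)  # one flattened slice per tag
    G = X @ X.T                                       # Gram matrix of tag slices
    sq = np.diag(G)[:, None] + np.diag(G)[None, :] - 2 * G
    D = np.sqrt(np.maximum(sq, 0.0))                  # clip rounding noise
    np.fill_diagonal(D, 0.0)
    return D


# Toy tensor: 2 users x 3 tags x 2 resources.
F = np.zeros((2, 3, 2))
F[0, 0, 0] = 1.0  # user 0 tagged resource 0 with tag 0
F[0, 1, 0] = 1.0  # user 0 tagged resource 0 with tag 1
F[1, 2, 1] = 1.0  # user 1 tagged resource 1 with tag 2
D = cubesim_distances(F)
print(D[0, 1], D[0, 2])  # 0.0 and about 1.414
```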

Page 25

RANKING QUALITY

Evaluation metric: Normalized Discounted Cumulative Gain (NDCG). NDCG rewards relevant resources that are ranked near the top more heavily than those that appear lower down in the list.

$$\mathrm{NDCG}@k = Z_k \sum_{i=1}^{k} \frac{2^{r(i)} - 1}{\log(1 + i)}$$

where $@k$ denotes that the metric is evaluated only on the resources ranked in the top $k$ of the list, $r(i)$ is the relevance level of the resource ranked $i$-th in the list, and $Z_k$ is a normalization factor chosen so that the optimal ranking's NDCG score is 1.
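A minimal sketch of NDCG@k. It assumes the common gain-and-discount form (2^r(i) - 1) / log(1 + i) and computes the normalization factor Z_k from the ideal ordering of the same judged list (the base of the logarithm cancels in the ratio). The relevance levels below are toy values.

```python
import math


def ndcg_at_k(relevances, k):
    """NDCG@k for relevance levels listed in ranked order (rank 1 first)."""
    def dcg(rels):
        return sum((2 ** r - 1) / math.log(1 + i)
                   for i, r in enumerate(rels[:k], start=1))

    ideal = dcg(sorted(relevances, reverse=True))  # best possible ordering
    return dcg(relevances) / ideal if ideal > 0 else 0.0


print(round(ndcg_at_k([2, 0, 1], k=3), 4))  # 0.9639
```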

16 users, each proposing 8 queries

Page 26

RANKING QUALITY (Delicious)

Page 27

RANKING QUALITY (Bibsonomy)

Page 28

RANKING QUALITY (Last.fm)

Page 29

EFFICIENCY

Offline: pre-processing times (hours)

Online: query processing times (seconds)

Storage size:

Page 30

RELATED WORK

Matrix Factorization: Our work differs from MF in two ways:

1. We aim at capturing semantic relations among tags.
2. We deal with a three-dimensional tensor.

Hotho et al. 2006: Our work differs from FolkRank in that our approach performs semantic analysis offline, which allows online queries to be processed efficiently.

Wu et al. 2006: Our approach is technically different from that work.

Bi et al. 2009: Our approach scales to large social tagging databases, which the previous work is unable to handle.

Page 31

CONCLUSIONS

We introduce a novel tag-based framework for searching resources in social tagging systems.

We study the role of taggers in search quality for social tagging systems.

We propose CubeLSI, which is a 3D extension of LSI, for semantic analysis over the third-order tensor of resources, taggers, and tags.

We present a comprehensive empirical evaluation of CubeLSI against a number of ranking methods on real datasets.

Page 32

THANK YOU!
