Faceted Ranking In Collaborative Tagging Systems
J. I. Orlicki (1,2), P. Fierens (2), J. I. Alvarez-Hamelin (2,3)
(1) Core Security Technologies
(2) ITBA
(3) CONICET
WEBIST 2009, Lisbon, Portugal
The Problem (Faceted Reputation)
- Which Flickr photographers are the best regarding a facet, i.e., a tag set such as {sea, portugal}?
- Nodes are users/channels, edges are favorites, and tags are associated with the favorited content.
Single Ranking (1/3)
- Basic approach: compute a single ranking, then filter. Scales well.
- Everything is biased towards the richer nodes; tags don't influence the ranking.
- G is filtered out, but why is D ranked worse than A regarding {sea, portugal}? Is D better than C?
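A minimal Python sketch of this baseline, assuming a plain power-iteration PageRank and a filter that keeps the nodes whose incoming favorites carry every tag of the facet. The slides do not spell out the filter, so that criterion, the function names, and the toy edges are illustrative assumptions.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank over (source, target) pairs."""
    nodes = sorted({n for edge in edges for n in edge})
    out_links = {n: [] for n in nodes}
    for src, dst in edges:
        out_links[src].append(dst)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        # Rank held by nodes without out-links is spread uniformly.
        dangling = sum(rank[n] for n in nodes if not out_links[n])
        base = (1.0 - damping + damping * dangling) / len(nodes)
        new_rank = {n: base for n in nodes}
        for src, targets in out_links.items():
            for dst in targets:
                new_rank[dst] += damping * rank[src] / len(targets)
        rank = new_rank
    return rank


def single_ranking(tagged_edges, facet):
    """Rank once on the whole graph, then filter by facet (assumed criterion)."""
    rank = pagerank([(s, d) for s, d, _ in tagged_edges])
    received = {}
    for _, dst, tags in tagged_edges:
        received.setdefault(dst, set()).update(tags)
    keep = [n for n in rank if set(facet) <= received.get(n, set())]
    return sorted(keep, key=lambda n: rank[n], reverse=True)


# Hypothetical toy data: (favoriting user, favorited user, tags of the content).
tagged_edges = [
    ("u1", "A", {"sea"}),
    ("u2", "A", {"sea", "portugal"}),
    ("u3", "A", {"cats"}),
    ("u4", "A", {"cats"}),
    ("u1", "D", {"sea", "portugal"}),
    ("u2", "G", {"cats"}),
]
result = single_ranking(tagged_edges, {"sea", "portugal"})
print(result)
```

In this toy graph A outranks D mostly because of favorites tagged "cats", which is the bias towards richer nodes described above.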
Edge-intersection, 1st gold standard (1/3)
- Filter the graph, keeping only the edges that include the conjunction of tags (every tag of the facet).
- Adequate tag bias, but slightly restrictive.
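The filtering step can be sketched in Python as follows. The edge representation `(source, target, tags)` and the function name are assumptions made for illustration; the resulting subgraph would then be handed to a ranking routine such as PageRank.

```python
def edge_intersection(tagged_edges, facet):
    """Keep only the edges whose tag set contains every tag of the facet."""
    facet = set(facet)
    return [(s, d) for s, d, tags in tagged_edges if facet <= set(tags)]


# Hypothetical toy data: (favoriting user, favorited user, tags of the content).
tagged_edges = [
    ("u1", "A", {"sea"}),
    ("u2", "A", {"sea", "portugal"}),
    ("u3", "C", {"portugal"}),
    ("u1", "D", {"sea", "portugal", "sunset"}),
]
subgraph = edge_intersection(tagged_edges, {"sea", "portugal"})
print(subgraph)
```

Only the edges that carry both tags survive, which is why the method is restrictive: C disappears from the subgraph even though it has "portugal" favorites.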
Node-intersection, 2nd gold standard (1/2)
- Filter the graph, keeping the edges that include the disjunction of tags (any tag of the facet), then rank.
- After ranking, keep only the nodes involved in edges of every tag (conjunction of nodes).
- Adequate tag bias, but slightly permissive; one tag may prevail over the other.
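A hedged Python sketch of the two filters involved. It assumes that "involved" means receiving at least one favorite for every tag of the facet; the slides leave this open, and the function name and toy edges are illustrative only. The ranking itself (for example PageRank on `subgraph`) is not shown.

```python
def node_intersection_parts(tagged_edges, facet):
    """Return the subgraph to rank and the nodes allowed in the final list."""
    facet = set(facet)
    # Step 1: edges carrying at least one facet tag (disjunction) are ranked.
    subgraph = [(s, d) for s, d, tags in tagged_edges if facet & set(tags)]
    # Step 2: nodes that receive every facet tag, possibly via different edges.
    received = {}
    for _, dst, tags in tagged_edges:
        received.setdefault(dst, set()).update(facet & set(tags))
    eligible = {n for n, seen in received.items() if seen == facet}
    return subgraph, eligible


# Hypothetical toy data: (favoriting user, favorited user, tags of the content).
tagged_edges = [
    ("u1", "A", {"sea"}),
    ("u2", "A", {"portugal"}),
    ("u3", "C", {"portugal"}),
    ("u1", "D", {"sea", "portugal"}),
    ("u2", "G", {"cats"}),
]
subgraph, eligible = node_intersection_parts(tagged_edges, {"sea", "portugal"})
print(subgraph, sorted(eligible))
```

Here A qualifies although no single edge carries both tags, which illustrates how this method is more permissive than filtering on edges.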
The Scalability Problem
- The previous two algorithms don't scale for online queries.
- Another possibility is to compute singleton facets offline and merge the results online later.
- Offline time and space complexity grow linearly with #edges × #tags per edge, which scales nicely.
[Figure: log-log plot of # edges versus # tags for YouTube and Flickr.]
Singleton facets, computed offline (1/2)
- Singleton facet subgraphs are used for ranking; after that, only the best K users are stored, where K is small.
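The storage step could look like the Python sketch below. It assumes the per-tag scores have already been computed offline; the function name is hypothetical, and the example scores are the per-tag PageRank values used in the probability-product example.

```python
import heapq


def top_k_per_tag(scores_by_tag, k):
    """Keep, for each tag, only the k best (user, score) pairs, best first."""
    return {
        tag: heapq.nlargest(k, scores.items(), key=lambda item: item[1])
        for tag, scores in scores_by_tag.items()
    }


scores_by_tag = {
    "sea": {"A": 0.09, "B": 0.14, "C": 0.14, "D": 0.38, "E": 0.14, "F": 0.09},
    "portugal": {"A": 0.02, "B": 0.04, "C": 0.40, "D": 0.39, "E": 0.07, "F": 0.05},
}
stored = top_k_per_tag(scores_by_tag, k=2)
print(stored["portugal"])
```

Storing only K entries per tag keeps the offline index small, at the cost of dropping users who rank low on one tag of a facet.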
Probability-product
- Inspired by the probability independence rule: multiply the PageRank probabilities of the single tags.
            A       B       C       D       E       F
sea         0.09    0.14    0.14    0.38    0.14    0.09
× portugal  0.02    0.04    0.40    0.39    0.07    0.05
= product   0.0018  0.0056  0.0560  0.1482  0.0098  0.0045
rank        #6      #4      #2      #1      #3      #5
- Possible bias towards the heaviest tag, eclipsing the others.
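The online merge for this method can be sketched in Python, using the per-tag values from the example. The function name is hypothetical, and dropping users that are missing from any stored tag list is an assumption, not something the slides state.

```python
import math


def probability_product(scores_by_tag, facet):
    """Multiply per-tag scores and order users by the product, best first."""
    users = set.intersection(*(set(scores_by_tag[tag]) for tag in facet))
    product = {u: math.prod(scores_by_tag[tag][u] for tag in facet) for u in users}
    order = sorted(product, key=product.get, reverse=True)
    return order, product


scores_by_tag = {
    "sea": {"A": 0.09, "B": 0.14, "C": 0.14, "D": 0.38, "E": 0.14, "F": 0.09},
    "portugal": {"A": 0.02, "B": 0.04, "C": 0.40, "D": 0.39, "E": 0.07, "F": 0.05},
}
order, product = probability_product(scores_by_tag, ["sea", "portugal"])
print(order)
```

Because the scores are multiplied, a tag with a very skewed distribution dominates the product, which is the bias noted above.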
Rank-sum
- The lowest accumulated ordinal (position) sum gets the best rank.
            A    B    C    D    E    F
sea         #3   #2   #2   #1   #2   #3
+ portugal  #6   #5   #2   #1   #3   #4
= sum       9    7    4    2    5    7
rank        #5   #4   #2   #1   #3   #4
- Avoids this kind of topic drift towards one of the tags.
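A short Python sketch of the rank-sum merge, fed with the ordinal positions from the example. The function name is hypothetical, and the tie handling (tied sums share a position and no position is skipped) is inferred from the example, where B and F both end up at #4.

```python
def rank_sum(ranks_by_tag, facet):
    """Add per-tag positions; the lowest total gets position 1."""
    users = set.intersection(*(set(ranks_by_tag[tag]) for tag in facet))
    total = {u: sum(ranks_by_tag[tag][u] for tag in facet) for u in users}
    # Dense ranking: equal totals share a position, positions are consecutive.
    position = {s: i for i, s in enumerate(sorted(set(total.values())), start=1)}
    return {u: position[total[u]] for u in users}, total


ranks_by_tag = {
    "sea": {"A": 3, "B": 2, "C": 2, "D": 1, "E": 2, "F": 3},
    "portugal": {"A": 6, "B": 5, "C": 2, "D": 1, "E": 3, "F": 4},
}
final, total = rank_sum(ranks_by_tag, ["sea", "portugal"])
print(sorted(final.items()))
```

Since only positions are added, the magnitude of a tag's scores cannot dominate the result.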
Winners-intersection
- The top W (small) nodes per singleton facet are used to build a new small graph.
- W = 500 in the experiments (W = 3 in the example).
            A    B    C    D    E    F
sea         #3   #2   #2   #1   #2   #3
∩ portugal            #2   #1   #3
= winners             C    D    E
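The selection step can be sketched in Python as below. It assumes a node counts as a winner for a tag when its position is at most W, which reproduces the example; the function name is hypothetical, and building and re-ranking the small graph from the selected nodes is not shown.

```python
def winners_intersection(ranks_by_tag, facet, w):
    """Nodes placed within the top W positions for every tag of the facet."""
    winner_sets = [
        {user for user, position in ranks_by_tag[tag].items() if position <= w}
        for tag in facet
    ]
    return set.intersection(*winner_sets)


ranks_by_tag = {
    "sea": {"A": 3, "B": 2, "C": 2, "D": 1, "E": 2, "F": 3},
    "portugal": {"A": 6, "B": 5, "C": 2, "D": 1, "E": 3, "F": 4},
}
winners = winners_intersection(ranks_by_tag, ["sea", "portugal"], w=3)
print(sorted(winners))
```

With W = 3 every node is a winner for "sea" because of ties, while only C, D and E are winners for "portugal", so the intersection is C, D, E.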
Conclusions
- There exist approximate and scalable methods for faceted ranking in collaborative tagging systems.
- Functional web prototype: Egg-O-Matic, http://egg-o-matic.itba.edu.ar
- Loose ends:
  - Using weighted graphs.
  - Scientific citations dataset (real egos!).
  - Industrial-sized dataset (10^7 instead of 10^5 edges).