Matching Users and Items Across Domains to Improve the Recommendation Quality
Chung-Yi Li, Shou-De Lin
Department of Computer Science and Information Engineering, National Taiwan University
Motivation
Lack of data is a serious concern in building a
recommender system, in particular for newly
established services.
Can we leverage the information from other
domains to improve the quality of a recommender
system?
Problem Definition
Given: Two homogeneous rating matrices
They model the same type of preference.
Decent portion of overlap in users and in items.
$R_1$: Target Rating Matrix
$R_2$: Source Rating Matrix
Challenge:
The mapping of users is unknown,
and so is the mapping of items.
Goals:
1. Identify the user mapping and
item mapping.
2. Use the identified mappings to
boost the recommendation
performance.
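The setting above can be sketched on toy data. This is only an illustration under simplifying assumptions that the slides do not make: both matrices are fully observed, every user and item overlaps, and the sizes `M`, `N`, `K` and the names `user_perm`, `item_perm` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: M users, N items, rank-K preference structure.
M, N, K = 6, 5, 2
R2 = rng.normal(size=(M, K)) @ rng.normal(size=(K, N))  # source rating matrix

# Hidden correspondences (the unknowns the method has to recover).
user_perm = rng.permutation(M)  # target user i is source user user_perm[i]
item_perm = rng.permutation(N)  # target item j is source item item_perm[j]
G_user = np.eye(M)[user_perm]   # row i has a single 1 in column user_perm[i]
G_item = np.eye(N)[item_perm]

# Target rating matrix: the source matrix with rows and columns reordered.
R1 = G_user @ R2 @ G_item.T
```

Here `R1[i, j]` equals `R2[user_perm[i], item_perm[j]]`, so recovering the two mappings amounts to recovering `G_user` and `G_item`.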
Why This Problem Is Challenging
When item correspondence is known, the problem is much easier:
define a user similarity; if the similarity between two users is large, they are likely to be the same user. [Narayanan 2008]
In our case, both sides are unknown, and there is no clear solution yet.
[Figure: rating matrices R1 and R2, whose user and item correspondences are both unknown]
Basic Idea
low rank assumption and factorization models
[Figure: worked example. R1 (users m1 to m5, items n1 to n4) is written as a product of low-rank factors. R2, with users and items in a different order (m5 to m1, n4 to n1), is shown with two different factorizations, and the question is how to match its factors to those of R1.]
A Two-Stage Model to Find the Matching
Solve G in $R_1 \approx G_{user} R_2 G_{item}^T$
Dimensions: $R_1$ is $M_1 \times N_1$, $R_2$ is $M_2 \times N_2$, $G_{user}$ is $M_1 \times M_2$, $G_{item}^T$ is $N_2 \times N_1$
R1 and R2: rating matrices (partially observed)
Guser and Gitem: correspondence matrices
1. Latent Space Matching: $\hat{R}_1 \approx G_{user} \hat{R}_2 G_{item}^T$
$\hat{R}_i$ (full): low-rank approximation of Ri
Less accurate; produces the Rough Matching Result
2. Matching Refinement: $R_1 \approx G_{user} R_2 G_{item}^T$
More accurate, but harder to solve; produces the Final Matching Result
Stage 1: Latent Space Matching
We want to solve G from $\hat{R}_1 \approx G_{user} \hat{R}_2 G_{item}^T$
Obstacle: $\hat{R}_1$ and $\hat{R}_2$ are not sparse, so they are hard to compute and store
Solution: Represent $\hat{R}_i$ using user and item latent factors
Next challenge: the latent factor representation must be unique
Regular matrix factorization is not applicable, because its factors are not unique.
Solution: Singular Value Decomposition
Singular values are invariant under permutation.
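Both claims on this slide can be checked numerically. The sketch below uses made-up random factors and an arbitrary invertible matrix `A` (all hypothetical) to show that a factorization is not unique, while the singular values survive a reordering of rows and columns.

```python
import numpy as np

rng = np.random.default_rng(1)
P = rng.normal(size=(6, 2))  # user factors
Q = rng.normal(size=(5, 2))  # item factors
R = P @ Q.T

# Non-uniqueness: any invertible K x K matrix A gives another factorization
# of the same matrix, since (P A)(Q A^{-T})^T = P A A^{-1} Q^T = P Q^T.
A = np.array([[2.0, 1.0], [0.0, 1.0]])
P_alt = P @ A
Q_alt = Q @ np.linalg.inv(A).T
same_product = np.allclose(P_alt @ Q_alt.T, R)

# Invariance: permuting rows and columns leaves the singular values unchanged.
R_perm = R[rng.permutation(6)][:, rng.permutation(5)]
s = np.linalg.svd(R, compute_uv=False)
s_perm = np.linalg.svd(R_perm, compute_uv=False)
same_singular_values = np.allclose(s, s_perm)
```

Because `P_alt` and `P` differ while their products agree, raw MF factors from two domains cannot be compared directly; the SVD factors can, up to the ambiguities handled on the later slides.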
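How can we perform SVD on a Partially Observed Matrix?
In MF, we solve for user factors P and item factors Q from the observed ratings.
Thus, $\hat{R} = P Q^T$
In SVD, we want $\hat{R} = U D V^T$
From P, Q to U, D, V:
$P Q^T = (Q_P R_P)(Q_Q R_Q)^T = Q_P (R_P R_Q^T) Q_Q^T = Q_P (U_x D_x V_x^T) Q_Q^T = (Q_P U_x) D_x (Q_Q V_x)^T$
where $P = Q_P R_P$ and $Q = Q_Q R_Q$ are QR decompositions and $U_x D_x V_x^T$ is the SVD of the K by K matrix $R_P R_Q^T$,
so $U = Q_P U_x$, $D = D_x$, $V = Q_Q V_x$
This transformation can be done efficiently.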
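One way to carry out the step from P, Q to U, D, V without ever forming the dense product is sketched below. It assumes a QR decomposition of each factor followed by an SVD of a small K by K matrix; the function name `factors_to_svd` and the toy sizes are hypothetical.

```python
import numpy as np

def factors_to_svd(P, Q):
    """Return U, d, V with P @ Q.T == U @ diag(d) @ V.T, without forming P @ Q.T."""
    Q_P, R_P = np.linalg.qr(P)  # P = Q_P R_P, Q_P is M x K with orthonormal columns
    Q_Q, R_Q = np.linalg.qr(Q)  # Q = Q_Q R_Q, Q_Q is N x K with orthonormal columns
    # SVD of the small K x K core matrix.
    U_x, d, V_xT = np.linalg.svd(R_P @ R_Q.T)
    # Rotate the orthonormal bases by the core singular vectors.
    return Q_P @ U_x, d, Q_Q @ V_xT.T

rng = np.random.default_rng(2)
P = rng.normal(size=(50, 3))
Q = rng.normal(size=(40, 3))
U, d, V = factors_to_svd(P, Q)
```

The cost is dominated by the two thin QR decompositions, which is why only M x K and N x K arrays are ever stored.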
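Matching in Latent Space
We want to solve G from $\hat{R}_1 \approx G_{user} \hat{R}_2 G_{item}^T$
Now we know how to get $\hat{R}_1 = U_1 D_1 V_1^T$ and $\hat{R}_2 = U_2 D_2 V_2^T$
Thus $U_1 D_1 V_1^T \approx G_{user} U_2 D_2 V_2^T G_{item}^T$
Since SVD is unique (up to the sign of each singular vector), we can separate user and item sides:
$U_1 \approx G_{user} U_2 S$ and $V_1 \approx G_{item} V_2 S$
Same subproblem on both sides
S: sign matrix (K by K, diagonal, -1 or 1)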
Solving $U_1 \approx G_{user} U_2 S$
Dimensions: $U_1$ is $M_1 \times K$, $G_{user}$ is $M_1 \times M_2$, $U_2$ is $M_2 \times K$
S (sign matrix): K by K, diagonal, +1 or -1
1. When S is given, to solve G: nearest neighbor search
• Only enforce row constraints on G (e.g. a row of G: 0 1 0).
2. To solve S: greedy search
• Iteratively try $S_{kk}$
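The two steps on this slide can be sketched as follows. This is a minimal illustration, not the authors' implementation: the helper names, the sweep limit, and the acceptance rule (keep a sign flip only if the total nearest-neighbor distance drops) are assumptions, and the toy `U2` is a random stand-in for real singular vectors.

```python
import numpy as np

def nearest_rows(U1, U2S):
    """For each row of U1, the index of the closest row of U2S, plus the total squared distance.
    Only the row constraint is enforced: several rows of U1 may pick the same row of U2S."""
    d2 = ((U1[:, None, :] - U2S[None, :, :]) ** 2).sum(axis=2)  # M1 x M2 distances
    return d2.argmin(axis=1), d2.min(axis=1).sum()

def greedy_signs(U1, U2, max_sweeps=10):
    """Greedy search over the diagonal of S: flip one sign at a time, keep it if the cost drops."""
    K = U1.shape[1]
    signs = np.ones(K)
    match, cost = nearest_rows(U1, U2 * signs)
    for _ in range(max_sweeps):
        improved = False
        for k in range(K):
            trial = signs.copy()
            trial[k] = -trial[k]
            m, c = nearest_rows(U1, U2 * trial)  # U2 * trial equals U2 @ diag(trial)
            if c < cost:
                signs, match, cost, improved = trial, m, c, True
        if not improved:
            break
    return signs, match, cost

# Toy data (hypothetical): U1 is U2 with rows reordered and two columns sign-flipped.
rng = np.random.default_rng(3)
U2 = rng.normal(size=(8, 3))
true_signs = np.array([1.0, -1.0, -1.0])
perm = rng.permutation(8)
U1 = U2[perm] * true_signs
signs, match, cost = greedy_signs(U1, U2)
```

The greedy search never increases the cost, but like any coordinate-wise search it is not guaranteed to reach the best of the 2^K sign patterns.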
Matching Refinement: $R_1 \approx G_{user} R_2 G_{item}^T$
More accurate but harder to solve.
Obtain a good initialization and a reduced search space from latent space matching.
Solve Guser and Gitem alternately.
The objective value always decreases and converges.
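A minimal sketch of the alternating scheme, under assumptions the slides do not state: both matrices are fully observed, only row constraints are enforced, the objective is the squared error, and the reduced search space is passed in as per-user and per-item candidate lists. The function name `refine` and all argument names are hypothetical.

```python
import numpy as np

def refine(R1, R2, user_match, item_match, user_cands, item_cands, sweeps=20):
    """Alternately re-assign users and items to reduce
    ||R1 - R2[user_match][:, item_match]||^2, searching only the given candidates."""
    user_match, item_match = user_match.copy(), item_match.copy()

    def cost():
        return ((R1 - R2[np.ix_(user_match, item_match)]) ** 2).sum()

    history = [cost()]
    for _ in range(sweeps):
        A = R2[:, item_match]  # source rows, columns in target item order
        for i, cands in enumerate(user_cands):
            cands = np.append(cands, user_match[i])  # current choice stays a candidate
            errs = ((R1[i] - A[cands]) ** 2).sum(axis=1)
            user_match[i] = cands[errs.argmin()]
        B = R2[user_match]     # source columns, rows in target user order
        for j, cands in enumerate(item_cands):
            cands = np.append(cands, item_match[j])
            errs = ((R1[:, [j]] - B[:, cands]) ** 2).sum(axis=0)
            item_match[j] = cands[errs.argmin()]
        history.append(cost())
        if history[-1] == history[-2]:
            break
    return user_match, item_match, history

# Toy run (hypothetical): start from the true matching with two users swapped.
rng = np.random.default_rng(4)
R2 = rng.normal(size=(8, 6))
up, ip = rng.permutation(8), rng.permutation(6)
R1 = R2[np.ix_(up, ip)]
start = up.copy()
start[[0, 1]] = start[[1, 0]]
all_users = [np.arange(8)] * 8
all_items = [np.arange(6)] * 6
um, im, history = refine(R1, R2, start, ip, all_users, all_items)
```

Because each update keeps the current assignment among its candidates, no step can increase the objective, which mirrors the convergence claim above. With partially observed ratings the errors would have to be restricted to co-observed entries.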
Goals
1. Identify the user mapping and item mapping
2. Then, use the identified mappings to boost
recommendation performance
Transferring Imperfect Matching to Predict Ratings
Matched latent factors are constrained to be similar
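One common way to write such a constraint is a joint factorization with a penalty that pulls matched factors together. The objective below is an illustrative assumption, not the formula from the talk: $\Omega_1, \Omega_2$ (observed entries), $\beta$ (penalty weight), and $g(i)$, $h(j)$ (the matched source user and item) are hypothetical symbols.

```latex
% Illustrative sketch only: the notation and the penalty form are assumptions.
\min_{P_1, Q_1, P_2, Q_2}
  \sum_{(i,j) \in \Omega_1} \bigl( R_{1,ij} - p_{1,i}^T q_{1,j} \bigr)^2
+ \sum_{(i,j) \in \Omega_2} \bigl( R_{2,ij} - p_{2,i}^T q_{2,j} \bigr)^2
+ \beta \sum_{i} \lVert p_{1,i} - p_{2,g(i)} \rVert^2
+ \beta \sum_{j} \lVert q_{1,j} - q_{2,h(j)} \rVert^2
```

Because the penalty is soft, a wrongly matched pair only biases its factors rather than forcing them to be equal, which is one reason an imperfect matching can still help.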
Experiment Setup
• Yahoo! Music Dataset
[Figure: five ways of dividing users and items between the training set of R1 and the training set of R2: Disjoint Split, Overlap Split, Contained Split, Subset Split, Partial Split]
Accuracy and Mean Average Precision: the higher the better
Rating Prediction (Root Mean Square Error)
RMSE: the lower the better
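The three measures named above can be computed as sketched here. The exact definitions used in the experiments are not given on the slides, so this assumes the usual ones: accuracy is the fraction of users whose top match is correct, and since each user has a single true match, average precision reduces to the reciprocal rank of that match in the candidate list. All function names are hypothetical.

```python
import numpy as np

def rmse(pred, truth):
    """Root mean square error between predicted and true ratings."""
    pred, truth = np.asarray(pred, float), np.asarray(truth, float)
    return float(np.sqrt(np.mean((pred - truth) ** 2)))

def matching_accuracy(predicted, true):
    """Fraction of entries whose predicted match equals the true match."""
    return float(np.mean(np.asarray(predicted) == np.asarray(true)))

def mean_average_precision(ranked_candidates, true):
    """With one true match per user, average precision is 1 / rank of that match (0 if absent)."""
    ap = []
    for cands, t in zip(ranked_candidates, true):
        cands = list(cands)
        ap.append(1.0 / (cands.index(t) + 1) if t in cands else 0.0)
    return float(np.mean(ap))
```

For example, `mean_average_precision([[3, 1], [2, 0], [5]], [1, 2, 4])` averages 1/2, 1, and 0.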
Conclusion
It is possible to identify user or item correspondence in an unsupervised manner from homogeneous rating data.
Even with imperfect matching, our model can still improve the recommendation accuracy.
Questions?