Convolutional Neural Network based Recommender System
Deep Learning based Recommender System (Zhang et al. 2017)
Presented by Jiin Seo
November 28, 2017
Outline
1. Attention based CNN
2. Personalized CNN (CNN-PerMLP)
3. Deep Cooperative Neural Network (DeepCoNN)
4. Convolutional Matrix Factorization (ConvMF)
5. CNN for Image Feature Extraction (VPOI)
6. CNN for Audio Feature Extraction (WMF)
7. CNN for Text Feature Extraction
1. Attention based CNN
Attention based CNN (Gong et al. 2016)
• Hashtag recommendation for microblogs
• Formulated as a multi-class classification problem
• (Global channel + Local channel) ⇒ Convolutional layer
• An attention mechanism scans the input microblog and selects trigger words, focusing on only a small subset of the words for each tag.
Architecture
Figure: The architecture of the attention-based Convolutional Neural Network
Notations
• Given an input microblog $m$ of length $n$, each word is represented by a vector $w_i \in \mathbb{R}^d$ ($d$: dimension of the word vectors).
• $w_{i:i+j}$: the concatenation of words $w_i, w_{i+1}, \dots, w_{i+j}$
Local Attention Channel: 1) Local attention layer
• The attention layer generates a sequence of trigger words $(w_i, \dots, w_j)$ from a small window (window size: $h$).
• The score of the central word $w_{(2i+h-1)/2}$ of a window is
$$s_{(2i+h-1)/2} = g(M^l * w_{i:i+h} + b)$$
where $g$ is a non-linear function, $M^l \in \mathbb{R}^{h \times d}$ is a parameter matrix, and $b$ is a bias.
• Extract the trigger words by thresholding the scores:
$$\hat{w}_i = \begin{cases} w_i & \text{if } s_i > \eta \\ 0 & \text{if } s_i \le \eta \end{cases}, \quad 0 \le i \le n$$
• The threshold is $\eta = \delta \cdot \min\{s\} + (1-\delta) \cdot \max\{s\}$, where $s$ is the sequence of scores.
Local Attention Channel: 2) Folding layer
• Abstract the features of the trigger words $\hat{w}$:
$$z = g(M^l * \mathrm{folding}(\hat{w}) + b)$$
where $g$ is a non-linear function, $M^l \in \mathbb{R}^{d \times r}$ and $b \in \mathbb{R}^r$.
• folding: the sum over all trigger words, taken separately for each dimension:
$$f_i = \sum_j \hat{w}_{j,i}$$
• Output: a fixed-length vector that represents the embeddings of the trigger words $\hat{w}$.
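The local attention channel can be sketched in NumPy as follows. This is only an illustration under stated assumptions: $g$ is taken to be tanh, windows are zero-padded so that every word gets a score, $h$ is odd, and the folding-layer matrix is named `M_f` here (a hypothetical name, since the slides reuse $M^l$ for both layers).

```python
import numpy as np

def attention_scores(words, M_l, b, h):
    """Score the central word of each window of h words (h odd, zero-padded)."""
    n, d = words.shape
    pad = h // 2
    padded = np.vstack([np.zeros((pad, d)), words, np.zeros((pad, d))])
    # g is assumed to be tanh; M_l has shape (h, d)
    return np.array([np.tanh(np.sum(M_l * padded[i:i + h]) + b) for i in range(n)])

def trigger_words(words, scores, delta):
    """Keep words whose score exceeds eta; zero out the others."""
    eta = delta * scores.min() + (1.0 - delta) * scores.max()
    keep = scores > eta
    return words * keep[:, None], eta

def local_channel(words, M_l, b_l, M_f, b_f, h, delta):
    scores = attention_scores(words, M_l, b_l, h)
    kept, _ = trigger_words(words, scores, delta)
    folded = kept.sum(axis=0)            # f_i = sum_j w_{j,i}
    return np.tanh(folded @ M_f + b_f)   # fixed-length vector in R^r

# Toy data (random, for illustration only)
rng = np.random.default_rng(0)
n, d, h, r = 6, 4, 3, 5
words = rng.normal(size=(n, d))
M_l, M_f = rng.normal(size=(h, d)), rng.normal(size=(d, r))
scores = attention_scores(words, M_l, 0.0, h)
kept, eta = trigger_words(words, scores, delta=0.5)
z = local_channel(words, M_l, 0.0, M_f, np.zeros(r), h, delta=0.5)
```

The output length depends only on $r$, not on the microblog length $n$, which is what makes the folded vector usable as a fixed-size feature.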
Global Channel: 1) Convolutional layer
• All the words are encoded for each tag.
• A CNN architecture models the whole microblog.
• Abstract the features with a filter:
$$z_i = g(M^g \cdot w_{i:i+l-1} + b)$$
where $g$ is a non-linear function, $M^g \in \mathbb{R}^{l \times d}$ ($l$: window size) and $b \in \mathbb{R}$.
• The filter is applied to every window of words in the microblog: $\{w_{1:l}, w_{2:l+1}, \dots, w_{n-l+1:n}\}$
• The resulting feature map:
$$z = [z_1, z_2, \dots, z_{n-l+1}]$$
Global Channel: 2) Pooling layer
• A max-over-time pooling operation is applied.
• It extracts the most important feature of each feature map.
• To obtain multiple features, the model uses multiple filters with varying window sizes.
• Output: a fixed-length vector that represents the embedding of the input microblog $m$.
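A minimal sketch of the global channel, assuming $g$ is tanh and that each filter is a pair `(M_g, b)`; the helper names are illustrative and not taken from the paper.

```python
import numpy as np

def conv_feature_map(words, M_g, b):
    """z_i = g(M_g . w_{i:i+l-1} + b) for every window of l words."""
    n = words.shape[0]
    l = M_g.shape[0]
    return np.array([np.tanh(np.sum(M_g * words[i:i + l]) + b)
                     for i in range(n - l + 1)])

def global_channel(words, filters):
    """Max-over-time pooling: one value per filter, whatever its window size."""
    return np.array([conv_feature_map(words, M_g, b).max() for M_g, b in filters])

# Toy data (random, for illustration only)
rng = np.random.default_rng(1)
words = rng.normal(size=(7, 4))                      # n = 7 words, d = 4
M2, M3 = rng.normal(size=(2, 4)), rng.normal(size=(3, 4))
fmap = conv_feature_map(words, M2, 0.0)              # length n - l + 1 = 6
h_g = global_channel(words, [(M2, 0.0), (M3, 0.0)])  # one feature per filter
```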
Combining the Outputs of both channels
• The outputs of the local attention channel and the global channel are fed to a simple convolutional layer.
• The information is combined as follows:
$$h = \tanh(M * v[h_g; h_l] + b)$$
where $h_g, h_l$ are the feature vectors extracted from the global and local channels, $M$ is the filter matrix of the convolutional operation, and $b$ is the bias.
Training
• Parameters: $\Theta = \{W, M^l, M^g\}$, where $W$ denotes the word embeddings and $M^l, M^g$ the parameters of the two channels.
• Training objective function:
$$J = \sum_{(m,a) \in D} -\log p(a \mid m)$$
where $D$ is the training corpus and $a$ is the hashtag for microblog $m$.
• The objective function is minimized with AdaDelta.
Hashtag Recommendation
• Train the model on the training data and save the model that performs best on the validation set.
• Given an unlabelled dataset, encode each microblog through the local attention channel and the global channel of the saved model.
• Combine the features generated by both channels.
• A fully connected layer gives the scores of the hashtags for the $d$-th microblog:
$$P(y_d = a \mid h_d; \beta) = \frac{\exp(\beta^{(a)T} h_d)}{\sum_{j \in A} \exp(\beta^{(j)T} h_d)}$$
where $A$ is the set of candidate hashtags, $\beta$ are the parameters, and $h_d$ is the feature vector.
• Rank the hashtags for each microblog and recommend the top-ranked ones.
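The scoring and ranking step amounts to a softmax over candidate hashtags followed by a top-k selection. A small sketch, with made-up parameters `beta` and hashtag names used purely for illustration:

```python
import numpy as np

def hashtag_probabilities(h_d, beta):
    """Softmax over candidate hashtags; row j of beta plays the role of beta^(j)."""
    logits = beta @ h_d
    logits = logits - logits.max()   # shift for numerical stability
    e = np.exp(logits)
    return e / e.sum()

def recommend(h_d, beta, hashtags, k):
    """Return the k hashtags with the highest probability."""
    p = hashtag_probabilities(h_d, beta)
    top = np.argsort(-p)[:k]
    return [hashtags[j] for j in top]

# Illustrative values only
beta = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])
h_d = np.array([2.0, 0.5])
probs = hashtag_probabilities(h_d, beta)
top2 = recommend(h_d, beta, ["#a", "#b", "#c"], k=2)
```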
Result
• The attention-based CNN outperforms state-of-the-art methods.
• Using trigger words improves performance.
• Multiple channels achieve better performance than a single channel.
2. Personalized CNN (CNN-PerMLP)
Personalized CNN for Tag Recommendation (Nguyen et al. 2016)
• Image tag recommender system
• The Personalized Content-Aware Tag Recommender suggests a ranked list of relevant tags ($\hat{T}_{u,i}$).
• CNN-PerMLP employs
  • Convolutional Neural Networks
  • a Personalized Fully-Connected Layer
  • a Multilayer Perceptron as the Predictor
Architecture
Figure: The architecture of CNN-PerMLP
Notations
• $U$: users, $I$: images, $T$: tags
• $A = (a_{u,i,t}) \in \mathbb{R}^{|U| \times |I| \times |T|}$, with
$$a_{u,i,t} = \begin{cases} 1 & \text{if } u \text{ assigns the tag } t \text{ to the image } i \\ 0 & \text{otherwise} \end{cases}$$
• $S := \{(u,i,t) \mid a_{u,i,t} \in A \wedge a_{u,i,t} = 1\}$: the observed tagging set
• $T_{u,i} := \{t \in T \mid (u,i,t) \in S\}$: the set of relevant tags of a user-image pair
• $P_S := \{(u,i) \mid \exists t \in T : (u,i,t) \in S\}$: all observed posts
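Notations (continued)
• The collection of all square RGB images: $R = \{R_{i,q} \mid R_{i,q} \in \mathbb{R}^{d \times d \times 3} \wedge i \in I \wedge q \in Q\}$, where $Q$ is the set of patches and $z_i \in \mathbb{R}^m$ denotes the visual features of the $i$-th image $R_i$.
• The final scores of the tags are averaged over the patches:
$$y(u,i,t) = \underset{R_{i,q},\, q \in Q}{\mathrm{avg}}\; y'(u, R_{i,q}, t)$$
• Top-$K$ tag list:
$$\hat{T}_{u,i} := \underset{t \in T,\ |\hat{T}_{u,i}| = K}{\arg\max}\; y(u,i,t)$$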
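Averaging the per-patch scores and picking the top-$K$ tags is straightforward; the sketch below assumes the per-patch scores $y'$ are already available as a `(|Q|, |T|)` array (the numbers are invented).

```python
import numpy as np

def final_scores(patch_scores):
    """y(u, i, t): average of y'(u, R_{i,q}, t) over the patches q."""
    return patch_scores.mean(axis=0)

def top_k_tags(scores, k):
    """Indices of the K highest-scoring tags, best first."""
    return np.argsort(-scores)[:k].tolist()

# Two patches, three tags (illustrative numbers)
patch_scores = np.array([[0.2, 0.9, 0.4],
                         [0.4, 0.7, 0.0]])
y = final_scores(patch_scores)
top2 = top_k_tags(y, k=2)
```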
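Convolutional Neural Networks
• The visual features are obtained by passing a patch $q$ of the image $i$ through the CNN feature extractor.
• Convolutional layer:
$$\tau^k_{ij} = \varphi\Big(b^k + \sum_{a=1}^{p_1} (W^k_a * \xi^a)_{ij}\Big)$$
where $\tau^k$ is the $k$-th output feature map, $\xi^a$ is the $a$-th input feature map, $*$ is the convolution operator, $\varphi$ is the activation function, and $W^k \in \mathbb{R}^{p_1} \times \mathbb{R}^{p_2} \times \mathbb{R}^{p_2}$, $b^k$ are the weights and bias of the filter for $\tau^k$.
• Max pooling operator:
$$\tau^k_{ij} = \max_{a,b}\, (\xi^k)_{a,b}$$
where $\xi^k$ is the $k$-th feature map.
• Output:
$$z^q_i = f_{cnn}(R^q_i) : \mathbb{R}^{d \times d \times 3} \to \mathbb{R}^m$$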
Personalized Fully-Connected Layer
• To personalize the visual features of an image, the user's information (ID) has to be combined with the features from the CNN.
• This layer captures the interaction between the user and each visual feature.
• Input:
  • $z^q_i$: the visual feature vector
  • $\kappa_u \in \{0,1\}^{|U|}$: the sparse vector of user features
• Output (user-aware features):
$$\psi_j(u, z^q_i) = \varphi\big(b_j + w^{per}_j \cdot (z^q_i)_j + V_j \kappa_u\big)$$
where $w^{per} \in \mathbb{R}^m$ are the weights of the visual features, $V \in \mathbb{R}^{m \times |U|}$ are the weights of the user features, and $\varphi$ is the activation function.
Multilayer Perceptron as the Predictor
• An MLP with one hidden layer computes the scores of the tags.
• The neural network score function:
$$y'(u, R_{i,q}, t_j) = \varphi\big(w^{out}_j \cdot \varphi(W^{hidden} \psi + b^{hidden}) + b^{out}_j\big)$$
where $W^{hidden}, b^{hidden}$ are the weights and biases of the hidden layer, and $w^{out}_j \in W^{out}$, $b^{out}$ are the weights and biases of the output layer.
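The personalized layer and the MLP predictor can be sketched together. The activation $\varphi$ is assumed to be ReLU here (the slides do not say which one is used), and the user vector $\kappa_u$ is assumed to be one-hot, so $V \kappa_u$ reduces to selecting a column of $V$.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def personalized_layer(z, user_index, w_per, V, b):
    """psi_j = phi(b_j + w_per_j * z_j + V_j kappa_u) for a one-hot kappa_u."""
    return relu(b + w_per * z + V[:, user_index])

def mlp_scores(psi, W_hidden, b_hidden, W_out, b_out):
    """One hidden layer, one output unit per tag."""
    hidden = relu(W_hidden @ psi + b_hidden)
    return relu(W_out @ hidden + b_out)

# Toy sizes: m visual features, 3 users, 6 hidden units, 5 tags (illustrative)
rng = np.random.default_rng(2)
m, n_users, n_hidden, n_tags = 4, 3, 6, 5
z = rng.normal(size=m)
w_per, b = rng.normal(size=m), rng.normal(size=m)
V = rng.normal(size=(m, n_users))
psi = personalized_layer(z, 1, w_per, V, b)

kappa = np.zeros(n_users)
kappa[1] = 1.0                                # explicit one-hot user vector
psi_dense = relu(b + w_per * z + V @ kappa)   # same result, dense form

scores = mlp_scores(psi, rng.normal(size=(n_hidden, m)), np.zeros(n_hidden),
                    rng.normal(size=(n_tags, n_hidden)), np.zeros(n_tags))
```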
Optimization
• We adopt the Bayesian Personalized Ranking (BPR) optimization criterion.
• BPR finds the model parameters that maximize the difference between the scores of relevant and irrelevant tags.
Figure: The algorithm of BPR
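As a rough illustration of the criterion, the sketch below uses the standard pairwise BPR form, $-\sum \ln \sigma(y_{pos} - y_{neg})$, without the regularization term. It does not reproduce the algorithm in the figure; the sampling of (relevant, irrelevant) tag pairs is assumed to happen elsewhere.

```python
import numpy as np

def bpr_loss(pos_scores, neg_scores):
    """Negative BPR criterion for paired scores of relevant and irrelevant tags."""
    diff = np.asarray(pos_scores, dtype=float) - np.asarray(neg_scores, dtype=float)
    # -ln(sigmoid(diff)) = ln(1 + exp(-diff)), computed stably
    return np.logaddexp(0.0, -diff).sum()

good = bpr_loss([2.0], [0.0])   # relevant tag scored higher: small loss
bad = bpr_loss([0.0], [2.0])    # relevant tag scored lower: large loss
tie = bpr_loss([0.0], [0.0])    # equal scores: ln 2
```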
3. Deep Cooperative Neural Network (DeepCoNN)
DeepCoNN (Zheng et al. 2017)
• Joint deep modeling of users and items using reviews
• DeepCoNN adopts two parallel CNNs to model user behaviors and item properties from review texts.
• In the shared layer, a Factorization Machine (FM) is applied to capture their interactions for rating prediction.
DeepCoNN
• DeepCoNN alleviates the sparsity problem and enhances model interpretability.
• DeepCoNN represents review text using a pre-trained word-embedding technique.
Architecture
Figure: The architecture of DeepCoNN
Notations
• Each tuple $(u, i, r_{ui}, w_{ui})$ denotes a review written by user $u$ for item $i$ with rating $r_{ui}$ and review text $w_{ui}$.
• A network for users ($Net_u$): user reviews $\to x_u$ (rates)
• A network for items ($Net_i$): item reviews $\to y_i$ (rates)
• We focus on $Net_u$ in detail; the same process is applied to $Net_i$.
Word Representation (Look-up Layer)
• A word embedding $f : M \to \mathbb{R}^n$
• Matrix of word vectors for user $u$:
$$V^u_{1:n} = \phi(d^u_1) \oplus \phi(d^u_2) \oplus \cdots \oplus \phi(d^u_n)$$
where $d^u_k$ is the $k$-th word of the single document $d^u_{1:n}$ consisting of $n$ words, $\phi(d^u_k) \in \mathbb{R}^c$ is the look-up function, and $\oplus$ is the concatenation operator.
• The order of words is preserved in the matrix $V^u_{1:n}$.
CNN Layers: 1) Convolution layer
• The convolution layer consists of $m$ neurons.
• Each neuron $j$ in the convolutional layer uses a filter $K_j \in \mathbb{R}^{c \times t}$.
• Convolution operation:
$$z_j = f(V^u_{1:n} * K_j + b_j)$$
where $*$ is the convolution operator and $f(x) = \max\{0, x\}$ is the activation function (ReLU).
CNN Layers: 2) Max pooling layer
• Max pooling captures the most important feature of each feature map.
• The convolutional results are reduced to a fixed-size vector:
$$o_j = \max\{z_1, z_2, \dots, z_{n-t+1}\}$$
• Output vector of the convolutional layer, using multiple filters:
$$O = \{o_1, o_2, \dots, o_{n_1}\}$$
where $n_1$ is the number of kernels in the convolutional layer.
CNN Layers: 3) Fully connected layer
• Output (rates for user $u$):
$$x_u = f(W \times O + g), \quad x_u \in \mathbb{R}^{n_2 \times 1}$$
where $W$ is the weight matrix and $g$ is the bias term.
• $y_i$ is obtained by the same process.
• Dropout is also applied to prevent overfitting.
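One tower of the network (look-up, convolution, max pooling, fully connected layer) might look like the following sketch. Dropout is omitted, the kernels are stored as `(n1, t, c)` arrays (the transpose of the $c \times t$ layout on the slides), and all weights are random placeholders.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def tower(word_ids, embedding, kernels, biases, W, g):
    """Forward pass of Net_u (or Net_i) for one document of n word ids."""
    V = embedding[word_ids]                  # (n, c), word order preserved
    n1, t, _ = kernels.shape
    n = V.shape[0]
    O = np.empty(n1)
    for j in range(n1):
        # feature map of neuron j: one value per window of t words
        z = [relu(np.sum(V[i:i + t] * kernels[j]) + biases[j])
             for i in range(n - t + 1)]
        O[j] = max(z)                        # max pooling
    return relu(W @ O + g)                   # x_u in R^{n2}

# Toy sizes (illustrative): vocabulary 20, c = 8, t = 3, n1 = 4, n2 = 5
rng = np.random.default_rng(3)
embedding = rng.normal(size=(20, 8))
word_ids = [3, 7, 7, 1, 15, 0, 9]
x_u = tower(word_ids, embedding, rng.normal(size=(4, 3, 8)), np.zeros(4),
            rng.normal(size=(5, 4)), np.zeros(5))
```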
Shared Layer
• This layer maps the features of users and items into the same feature space.
• Concatenate $x_u$ and $y_i$ into a single vector:
$$z = (x_u, y_i)$$
• A Factorization Machine (FM) models all nested variable interactions in $z$.
• The objective function:
$$J = w_0 + \sum_{i=1}^{|z|} w_i z_i + \sum_{i=1}^{|z|} \sum_{j=i+1}^{|z|} \langle v_i, v_j \rangle z_i z_j$$
where $w_0$ is the global bias, $w_i$ is the strength of the $i$-th variable in $z$, and
$$\langle v_i, v_j \rangle = \sum_{f=1}^{|z|} v_{i,f}\, v_{j,f}$$
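The FM expression can be evaluated directly with a double loop, or in linear time using the well-known identity $\sum_{i<j} \langle v_i, v_j \rangle z_i z_j = \tfrac{1}{2} \sum_f \big[(\sum_i v_{i,f} z_i)^2 - \sum_i v_{i,f}^2 z_i^2\big]$. The sketch below checks that both agree; the factor dimension (3) and all values are arbitrary choices for illustration.

```python
import numpy as np

def fm_naive(z, w0, w, V):
    """Direct evaluation of the second-order FM expression."""
    out = w0 + w @ z
    for i in range(len(z)):
        for j in range(i + 1, len(z)):
            out += (V[i] @ V[j]) * z[i] * z[j]
    return out

def fm_fast(z, w0, w, V):
    """Same value via the linear-time reformulation of the pairwise term."""
    zv = V.T @ z                                        # sum_i v_{i,f} z_i
    pair = 0.5 * (zv @ zv - ((V ** 2).T @ (z ** 2)).sum())
    return w0 + w @ z + pair

rng = np.random.default_rng(4)
z = rng.normal(size=6)                                  # concatenated (x_u, y_i)
w0, w, V = 0.1, rng.normal(size=6), rng.normal(size=(6, 3))
slow, fast = fm_naive(z, w0, w, V), fm_fast(z, w0, w, V)
```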
4. Convolutional Matrix Factorization (ConvMF)
ConvMF (Kim et al. 2016)
• Document context-aware recommendation model
• CNN (convolutional neural network) + PMF (probabilistic matrix factorization)
• The CNN is integrated into PMF: its output for an item's document serves as the mean of that item's latent model, which is then used for rating prediction.
Architecture
Figure: The architecture of ConvMF
Convolutional neural network (CNN)
• Convolution layer for generating local features
• Pooling layer for producing a more concise representation of the data
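Matrix Factorization (MF)
• Goal: find latent models of users and items in a shared latent space.
• $R \in \mathbb{R}^{N \times M}$: rating matrix ($N$ users, $M$ items)
• $u_i \in \mathbb{R}^k$, $v_j \in \mathbb{R}^k$: latent models of user $i$ and item $j$
• The rating $r_{ij}$ of user $i$ on item $j$ is approximated by the inner product of the corresponding latent models:
$$r_{ij} \approx \hat{r}_{ij} = u_i^T v_j$$
• Minimize the loss function:
$$L = \sum_i^N \sum_j^M I_{ij} (r_{ij} - u_i^T v_j)^2 + \lambda_u \sum_i^N \| u_i \|^2 + \lambda_v \sum_j^M \| v_j \|^2$$
where $I_{ij}$ is 1 if user $i$ rated item $j$ and 0 otherwise.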
Probabilistic Model of ConvMF
• Goal: find user and item latent models $U \in \mathbb{R}^{k \times N}$, $V \in \mathbb{R}^{k \times M}$.
• $U^T V$ reconstructs the rating matrix $R$.
• The conditional distribution over observed ratings is
$$p(R \mid U, V, \sigma^2) = \prod_i^N \prod_j^M N(r_{ij} \mid u_i^T v_j, \sigma^2)^{I_{ij}}$$
where $N(x \mid \mu, \sigma^2)$ is the p.d.f. of the normal distribution.
• The user latent models have a zero-mean Gaussian prior:
$$p(U \mid \sigma_U^2) = \prod_i^N N(u_i \mid 0, \sigma_U^2 I)$$
Probabilistic Model of ConvMF (continued)
• The item latent model is generated from three variables:
  • the internal weights $W$ of the CNN
  • $X_j$, representing the document of item $j$
  • Gaussian noise
• Item latent model:
$$v_j = cnn(W, X_j) + \epsilon_j, \quad \epsilon_j \sim N(0, \sigma_V^2 I)$$
• For each weight $w_k$ in $W$, we place a zero-mean Gaussian prior:
$$p(W \mid \sigma_W^2) = \prod_k N(w_k \mid 0, \sigma_W^2)$$
• The conditional distribution over the item latent models is
$$p(V \mid W, X, \sigma_V^2) = \prod_j^M N(v_j \mid cnn(W, X_j), \sigma_V^2 I)$$
where $X$ is the set of description documents of the items.
CNN
• Goal: generate document latent vectors from the documents of items
• Four layers: 1) embedding layer, 2) convolution layer, 3) pooling layer, and 4) output layer
Figure: CNN architecture for ConvMF
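CNN: 1) Embedding layer
• A raw document $\to$ a dense numeric matrix
• Document: a sequence of $l$ words
• Document matrix (one $p$-dimensional word vector per column):
$$D = \begin{bmatrix} & | & | & | & \\ \cdots & w_{i-1} & w_i & w_{i+1} & \cdots \\ & | & | & | & \end{bmatrix}, \quad D \in \mathbb{R}^{p \times l}$$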
CNN: 2) Convolution layer
• The convolution layer extracts contextual features.
• A contextual feature is extracted by the $j$-th shared weight $W_c^j \in \mathbb{R}^{p \times ws}$:
$$c_i^j = f(W_c^j * D_{(:,\, i:(i+ws-1))} + b_c^j)$$
where $*$ is the convolution operator, $ws$ is the window size, and $f$ is the activation function (ReLU).
• Contextual feature vector for $W_c^j$:
$$c^j = [c_1^j, c_2^j, \dots, c_i^j, \dots, c_{l-ws+1}^j] \in \mathbb{R}^{l-ws+1}$$
• Multiple shared weights $W_c^j$, $j = 1, 2, \dots, n_c$, capture multiple types of contextual features.
CNN: 3) Pooling layer
• Max-pooling:
$$d_f = [\max(c^1), \max(c^2), \dots, \max(c^j), \dots, \max(c^{n_c})]$$
CNN: 4) Output layer
• $d_f$ is projected onto the $k$-dimensional space of the user and item latent models.
• Document latent vector by nonlinear projection:
$$s = \tanh(W_{f_2} \{\tanh(W_{f_1} d_f + b_{f_1})\} + b_{f_2})$$
where $W_{f_1} \in \mathbb{R}^{f \times n_c}$, $W_{f_2} \in \mathbb{R}^{k \times f}$ are projection matrices, $b_{f_1} \in \mathbb{R}^f$, $b_{f_2} \in \mathbb{R}^k$ are the bias vectors for $W_{f_1}, W_{f_2}$, and $s \in \mathbb{R}^k$.
• Output (document latent vector of item $j$):
$$s_j = cnn(W, X_j)$$
where $X_j$ is the raw document of item $j$ and $W$ denotes all the weight and bias variables.
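Putting the layers after the embedding together, a forward pass from a document matrix $D$ to the document latent vector $s$ could be sketched as below. The shapes follow the slides; the random weights and toy sizes are placeholders, and the embedding look-up itself is assumed to have been done already.

```python
import numpy as np

def convmf_cnn(D, W_c, b_c, W_f1, b_f1, W_f2, b_f2):
    """D: (p, l) document matrix; W_c: (n_c, p, ws) shared weights."""
    n_c, _, ws = W_c.shape
    l = D.shape[1]
    d_f = np.empty(n_c)
    for j in range(n_c):
        # contextual features c^j_i with ReLU, then max-pooling over positions
        c_j = [max(0.0, np.sum(W_c[j] * D[:, i:i + ws]) + b_c[j])
               for i in range(l - ws + 1)]
        d_f[j] = max(c_j)
    # two tanh projections down to the k-dimensional latent space
    return np.tanh(W_f2 @ np.tanh(W_f1 @ d_f + b_f1) + b_f2)

# Toy sizes (illustrative): p = 6, l = 10, ws = 3, n_c = 4, f = 5, k = 2
rng = np.random.default_rng(5)
D = rng.normal(size=(6, 10))
s = convmf_cnn(D, rng.normal(size=(4, 6, 3)), np.zeros(4),
               rng.normal(size=(5, 4)), np.zeros(5),
               rng.normal(size=(2, 5)), np.zeros(2))
```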
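Optimization
• The variables are optimized by maximum a posteriori (MAP) estimation:
$$\max_{U,V,W} p(U, V, W \mid R, X, \sigma^2, \sigma_U^2, \sigma_V^2, \sigma_W^2) = \max_{U,V,W} \big[ p(R \mid U, V, \sigma^2)\, p(U \mid \sigma_U^2)\, p(V \mid W, X, \sigma_V^2)\, p(W \mid \sigma_W^2) \big]$$
• Taking the negative logarithm gives the loss to minimize:
$$L(U, V, W) = \sum_i^N \sum_j^M \frac{I_{ij}}{2} (r_{ij} - u_i^T v_j)^2 + \frac{\lambda_U}{2} \sum_i^N \| u_i \|^2 + \frac{\lambda_V}{2} \sum_j^M \| v_j - cnn(W, X_j) \|^2 + \frac{\lambda_W}{2} \sum_k^{|w_k|} \| w_k \|^2$$
where $\lambda_U = \sigma^2 / \sigma_U^2$, $\lambda_V = \sigma^2 / \sigma_V^2$, and $\lambda_W = \sigma^2 / \sigma_W^2$.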
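Optimization (continued)
• We adopt coordinate descent to optimize the variables iteratively:
$$u_i \leftarrow (V I_i V^T + \lambda_U I_K)^{-1} V R_i$$
$$v_j \leftarrow (U I_j U^T + \lambda_V I_K)^{-1} (U R_j + \lambda_V\, cnn(W, X_j))$$
where $I_i = \mathrm{diag}(I_{ij})$, $j = 1, \dots, M$, and $R_i$ is the vector $(r_{ij})_{j=1}^M$ for user $i$; $I_j$ and $R_j$ are defined analogously for item $j$.
• $W$ is optimized with the back-propagation algorithm on
$$E(W) = \frac{\lambda_V}{2} \sum_j^M \| v_j - cnn(W, X_j) \|^2 + \frac{\lambda_W}{2} \sum_k^{|w_k|} \| w_k \|^2 + \text{constant}$$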
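The closed-form updates for $u_i$ and $v_j$ can be tried on toy data. In this sketch the CNN outputs are replaced by a fixed random matrix `S` (an assumption, so that no network is needed), unobserved ratings are stored as 0, and the loss omits the $W$ term because it is constant with respect to $U$ and $V$.

```python
import numpy as np

def update_user(i, R, I, V, lam_U):
    """u_i <- (V I_i V^T + lam_U I_K)^{-1} V R_i, with V of shape (K, M)."""
    K = V.shape[0]
    A = V @ np.diag(I[i]) @ V.T + lam_U * np.eye(K)
    return np.linalg.solve(A, V @ (I[i] * R[i]))   # mask keeps only observed ratings

def update_item(j, R, I, U, lam_V, cnn_j):
    """v_j <- (U I_j U^T + lam_V I_K)^{-1} (U R_j + lam_V cnn(W, X_j))."""
    K = U.shape[0]
    A = U @ np.diag(I[:, j]) @ U.T + lam_V * np.eye(K)
    return np.linalg.solve(A, U @ (I[:, j] * R[:, j]) + lam_V * cnn_j)

def loss(R, I, U, V, S, lam_U, lam_V):
    err = I * (R - U.T @ V)
    return (0.5 * np.sum(err ** 2) + 0.5 * lam_U * np.sum(U ** 2)
            + 0.5 * lam_V * np.sum((V - S) ** 2))

# Toy data: N = 4 users, M = 5 items, K = 2 latent dimensions
rng = np.random.default_rng(6)
N, M, K = 4, 5, 2
I = (rng.random((N, M)) < 0.6).astype(float)       # observation indicator
R = rng.integers(1, 6, size=(N, M)).astype(float) * I
U, V = rng.normal(size=(K, N)), rng.normal(size=(K, M))
S = rng.normal(size=(K, M))                        # stand-in for cnn(W, X_j)

before = loss(R, I, U, V, S, 0.1, 0.1)
for i in range(N):
    U[:, i] = update_user(i, R, I, V, 0.1)
for j in range(M):
    V[:, j] = update_item(j, R, I, U, 0.1, S[:, j])
after = loss(R, I, U, V, S, 0.1, 0.1)
```

Each update is the exact minimizer of the loss with respect to one latent vector while the others are held fixed, so one sweep cannot increase the loss.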
Optimization (continued)
• With the optimized $U$, $V$, and $W$, we can finally predict the unknown ratings of users on items:
$$r_{ij} \approx E[r_{ij} \mid u_i^T v_j, \sigma^2] = u_i^T v_j = u_i^T (cnn(W, X_j) + \epsilon_j)$$
Result
• ConvMF significantly outperforms the state-of-the-art competitors.
• ConvMF handles the sparsity problem and skewed data well by using contextual information.
• A pre-trained word embedding model increases performance when the number of ratings is insufficient.
• ConvMF can distinguish subtle contextual differences of the same word via different shared weights.
5. CNN for Image Feature Extraction (VPOI)
Visual Content Enhanced POI recommendation (VPOI) (Wang et al. 2016)
• Goal: recommend $k$ unvisited POIs to each user.
• VPOI incorporates visual content into POI recommendation.
• Photos reflect users' interests and give informative descriptions of locations.
Figure: Example of Images Posted by Users
Architecture
Figure: The architecture of VPOI
POI Recommender
• POI recommendation is also called location recommendation.
• POI recommendation focuses on
  • geographical influence
  • social correlations
  • temporal patterns
  • textual content indications
Notations
• $\mathcal{U} = \{u_1, u_2, \dots, u_n\}$, $\mathcal{L} = \{l_1, l_2, \dots, l_m\}$, $\mathcal{P} = \{p_1, p_2, \dots, p_N\}$: the sets of users, locations, and photos
• $X \in \mathbb{R}^{n \times m}$: user-POI check-in matrix, $X_{ij}$ = frequency or rating of $u_i$ on $l_j$
• $R \in \mathbb{R}^{n \times m}$: normalized version of $X$:
$$R_{ij} = g(X_{ij}), \quad g(x) = \frac{1}{1 + \exp(-x)}$$
• $\mathcal{P}_{u_i}$: the set of images uploaded by user $u_i$
• $\mathcal{P}_{l_j}$: the set of images that are tagged with $l_j$
Basic POI Recommender
• Probabilistic Matrix Factorization (PMF)
• POI recommendation is a one-class CF problem, where only positive samples are given.
• The conditional distribution over observed ratings is
$$P(R \mid U, V, \sigma^2) = \prod_{i=1}^n \prod_{j=1}^m \big[ N(R_{ij} \mid u_i^T v_j, \sigma^2) \big]^{Y_{ij}}$$
where $U \in \mathbb{R}^{K \times n}$ and $V \in \mathbb{R}^{K \times m}$ are the latent feature matrices of users and POIs, respectively, and $Y$ is the indicator matrix ($Y_{ij} = 1$ if $R_{ij} > 0$ and 0 otherwise).
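Basic POI Recommender (continued)
• The user check-in data model is
$$P(U, V \mid R) \propto \prod_{i=1}^n N(u_i \mid 0, \sigma_u^2 I) \prod_{j=1}^m N(v_j \mid 0, \sigma_v^2 I) \prod_{i=1}^n \prod_{j=1}^m \big[ N(R_{ij} \mid u_i^T v_j, \sigma^2) \big]^{Y_{ij}}$$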
Extracting and Modeling
• The VGG16 model is chosen.
• For an input image $p_k$, the visual content is the output of VGG16, denoted $CNN(p_k)$.
Figure: The architecture of VGG16 model
Extracting and Modeling (continued)
• Probability that $p_s$ belongs to $u_i$:
$$P(f_{is} = 1 \mid u_i, p_s) = \frac{\exp(u_i^T \cdot P \cdot CNN(p_s))}{\sum_{p_k \in \mathcal{P}} \exp(u_i^T \cdot P \cdot CNN(p_k))}$$
where $P \in \mathbb{R}^{K \times d}$ is the interaction matrix between the visual contents and the latent user features, and $f_{is}$ denotes whether $p_s$ is posted by $u_i$.
• By maximizing $P(f_{is} = 1 \mid u_i, p_s)$ for $p_s \in \mathcal{P}_{u_i}$, we force $u_i$ to be similar to the visual contents.
Extracting and Modeling (continued)
• Probability that $p_t$ is associated with $l_j$:
$$P(g_{jt} = 1 \mid l_j, p_t) = \frac{\exp(v_j^T \cdot Q \cdot CNN(p_t))}{\sum_{p_k \in \mathcal{P}} \exp(v_j^T \cdot Q \cdot CNN(p_k))}$$
where $Q \in \mathbb{R}^{K \times d}$ is the interaction matrix between the visual contents and the latent POI features, and $g_{jt}$ denotes whether $p_t$ is associated with $l_j$.
• By maximizing $P(g_{jt} = 1 \mid l_j, p_t)$ for $p_t \in \mathcal{P}_{l_j}$, we force $v_j$ to be similar to the visual contents.
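Extracting and Modeling (continued)
• The image features:
$$P(F, G \mid \mathcal{P}, U, V, P, Q) = \Big[ \prod_{i=1}^n \prod_{p_s \in \mathcal{P}_{u_i}} P(f_{is} = 1 \mid u_i, p_s) \Big] \cdot \Big[ \prod_{j=1}^m \prod_{p_t \in \mathcal{P}_{l_j}} P(g_{jt} = 1 \mid l_j, p_t) \Big]$$
where $F = \{f_{is} : p_s \in \mathcal{P}_{u_i}, \forall u_i \in \mathcal{U}\}$ and $G = \{g_{jt} : p_t \in \mathcal{P}_{l_j}, \forall l_j \in \mathcal{L}\}$.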
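The user-photo probability is a softmax over all photos of the bilinear score $u_i^T P\, CNN(p_k)$. A small sketch, where the rows of `features` stand in for precomputed VGG16 outputs (random numbers here) and `posted` lists the indices of the photos in $\mathcal{P}_{u_i}$:

```python
import numpy as np

def photo_probabilities(u_i, P, features):
    """P(f_is = 1 | u_i, p_s) for every photo s; features has shape (N, d)."""
    logits = features @ (P.T @ u_i)      # u_i^T P CNN(p_k) for each k
    logits = logits - logits.max()       # shift for numerical stability
    e = np.exp(logits)
    return e / e.sum()

def user_log_likelihood(u_i, P, features, posted):
    """Sum of log P(f_is = 1 | u_i, p_s) over the photos posted by the user."""
    return np.log(photo_probabilities(u_i, P, features)[posted]).sum()

# Toy sizes (illustrative): K = 3, d = 4, N = 5 photos
rng = np.random.default_rng(7)
u_i, P = rng.normal(size=3), rng.normal(size=(3, 4))
features = rng.normal(size=(5, 4))
probs = photo_probabilities(u_i, P, features)
ll = user_log_likelihood(u_i, P, features, posted=[0, 2])
```

The POI-photo term has the same form with $v_j$ and $Q$ in place of $u_i$ and $P$.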
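VPOI Framework
$$\max_{U, V, P, Q, CNN} P(U, V, P, Q \mid R, F, G, \mathcal{P})$$
• The posterior distribution is
$$P(U, V, P, Q \mid R, F, G, \mathcal{P}) \propto P(R, F, G \mid U, V, P, Q, \mathcal{P})\, P(U, V, P, Q \mid \mathcal{P}) = P(R \mid U, V)\, P(F, G \mid \mathcal{P}, U, V, P, Q)\, P(P)\, P(Q)\, P(U)\, P(V)$$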
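VPOI Framework (continued)
• The VPOI framework can be written as
$$\max_{U, V, P, Q, CNN} -\| Y \odot (R - U^T V) \|_F^2 - \lambda_1 (\| U \|_F^2 + \| V \|_F^2) + \alpha \sum_{i=1}^n \sum_{p_k \in \mathcal{P}_{u_i}} \log P(f_{ik} = 1 \mid u_i, p_k) - \lambda_2 \| P \|_F^2 + \alpha \sum_{j=1}^m \sum_{p_k \in \mathcal{P}_{l_j}} \log P(g_{jk} = 1 \mid l_j, p_k) - \lambda_2 \| Q \|_F^2$$
where $\lambda_1 = \sigma^2/\sigma_u^2 = \sigma^2/\sigma_v^2$, $\lambda_2 = \sigma^2/\sigma_p^2 = \sigma^2/\sigma_q^2$, $\alpha = 2\sigma^2$, and $\odot$ is the Hadamard product.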
Algorithm
Figure: The architecture of VGG16 model
Result
• VPOI outperforms representative state-of-the-art POI recommender systems.
• By incorporating images, the proposed framework alleviates the cold-start problem in recommendation.
6. CNN for Audio Feature Extraction
Deep Content-based Music Recommendation (van den Oord et al. 2013)
• We propose to use a latent factor model for recommendation, and to predict the latent factors from music audio when they cannot be obtained from usage data.
Weighted Matrix Factorization (WMF)
• The Taste Profile Subset contains play counts per song and per user.
• WMF is used to learn latent factor representations of all users and items.
• $r_{ui}$: play count for user $u$ and song $i$
• Define preference and confidence variables:
$$p_{ui} = I(r_{ui} > 0), \qquad c_{ui} = 1 + \alpha \log(1 + \epsilon^{-1} r_{ui})$$
• If $p_{ui} = 1$, the user is assumed to enjoy the song.
• $c_{ui}$ measures how certain we are about this particular preference.
Weighted Matrix Factorization (WMF) (Kim et al. 2016)
• WMF objective function:
$$\min_{x_*, y_*} \sum_{u,i} c_{ui} (p_{ui} - x_u^T y_i)^2 + \lambda \Big( \sum_u \| x_u \|^2 + \sum_i \| y_i \|^2 \Big)$$
where $x_u$ is the latent factor vector for user $u$ and $y_i$ is the latent factor vector for song $i$.
• It consists of a confidence-weighted MSE term and an L2 regularization term.
• The ALS optimization method is used.
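The preference and confidence variables and the WMF objective can be computed directly from a play-count matrix. The sketch below only evaluates the objective and does not implement ALS; the play counts and hyperparameters are invented for illustration.

```python
import numpy as np

def wmf_objective(R, X, Y, alpha, eps, lam):
    """R: (users, songs) play counts; X: (users, k); Y: (songs, k)."""
    P = (R > 0).astype(float)                  # preference p_ui
    C = 1.0 + alpha * np.log(1.0 + R / eps)    # confidence c_ui
    err = P - X @ Y.T
    return np.sum(C * err ** 2) + lam * (np.sum(X ** 2) + np.sum(Y ** 2))

# Two users, two songs, all latent factors at zero (illustrative)
R = np.array([[0.0, 3.0],
              [1.0, 0.0]])
X, Y = np.zeros((2, 2)), np.zeros((2, 2))
obj = wmf_objective(R, X, Y, alpha=2.0, eps=1.0, lam=0.1)
```

With all factors at zero, only the two played (user, song) pairs contribute, each weighted by its confidence, which gives $2 + 2\ln 8$ for these numbers.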
Predicting latent factors from music audio
• Regression problem
• Two methods convert the music audio signals into a fixed-size representation:
  • bag-of-words representation
  • deep CNN
Objective functions
• $y_i$: the latent factor vector for song $i$, obtained with WMF
• $y'_i$: the corresponding prediction by the model
• Minimize the MSE:
$$\min_\theta \sum_i \| y_i - y'_i \|^2$$
• Minimize the WPE (weighted prediction error):
$$\min_\theta \sum_{u,i} c_{ui} (p_{ui} - x_u^T y'_i)^2$$
Result
• Predicting latent factors from music audio is a viable method for recommending new and unpopular music.
• The deep CNN significantly outperforms the traditional approaches.
7. CNN for Text Feature Extraction
e-Learning Resources Recommendation (Shen et al. 2016)
• Automatic recommendation technology for e-learning resources with a CNN
• Text information: the course introduction or the classroom content, and the abstract or full content of the learning resources.
• The CNN can be used to predict the latent factors from the text information.
• We predict the rating scores between students and learning resources.
Architecture
Figure: The architecture of the recommendation algorithm
Training process
• A language model is employed for the input of the CNN.
• An LFM (Latent Factor Model) is employed for the output of the CNN.
• The CNN bridges the semantic gap between the text information and the vectors of latent factors.
Recommendation process
• CNN: the input text information → the features of the learning resource
• These features are combined with the student's preferences.
• The rating score between a student and a learning resource can then be predicted.
Model
• The CNN predicts the latent factors from the text information.
• The input is obtained from the language model applied to the text information.
• The output is obtained from the latent factor model fitted on the historical rating scores.
Model - CNN
• The CNN has four layers:
  • a convolutional layer with multiple feature maps
  • a mean-over-time pooling layer
  • an over-time convolutional layer
  • a fully connected layer
Figure: The Construction of CNN
Model - CNN: 1) Convolutional layer
• $x_i \in \mathbb{R}^k$: $k$-dimensional word representation of the $i$-th word
• $x = [x_1, x_2, \dots, x_n]$: the input sequence of $n$ word representations
$$c_i = f(w \cdot x_i + b)$$
where $w \in \mathbb{R}^k$ is a filter, $b \in \mathbb{R}$ is a bias, and $f$ is a non-linear function.
• Feature map: $c = [c_1, c_2, \dots, c_n] \in \mathbb{R}^n$
Model - CNN: 2) Mean-over-time pooling layer
• A mean-over-time region pooling operation is applied over the feature map.
• Pooling operation in $\lambda$ regions:
$$b_i = \max\{c_{(i-1) \times (n/\lambda) + 1}, \dots, c_{i \times (n/\lambda)}\}, \quad i \in [1, \lambda]$$
$$\mathbf{b} = [b_1, b_2, \dots, b_\lambda]$$
Model - CNN: 3) Convolutional layer
• Feature value:
$$a = f(w \cdot \mathbf{b} + b)$$
where $w \in \mathbb{R}^\lambda$ is a filter, $b \in \mathbb{R}$ is a bias, and $f$ is a non-linear function.
• The process extracts one feature from one filter. The model uses multiple filters to obtain multiple features.
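Region pooling followed by the over-time convolution can be sketched as below. The slides name the pooling layer mean-over-time while the formula takes a maximum, so the reducer is left as a parameter here; $\lambda$ is assumed to divide $n$, and $f$ is assumed to be tanh.

```python
import numpy as np

def region_pool(c, lam, reduce=np.max):
    """Split the feature map c into lam equal regions and reduce each one."""
    size = len(c) // lam                     # assumes lam divides len(c)
    return np.array([reduce(c[i * size:(i + 1) * size]) for i in range(lam)])

def over_time_conv(b_vec, w, bias):
    """a = f(w . b + bias), one feature per filter w (f assumed to be tanh)."""
    return np.tanh(w @ b_vec + bias)

c = np.array([1.0, 5.0, 2.0, 4.0, 3.0, 0.0])   # illustrative feature map, n = 6
b_max = region_pool(c, 3)                      # max over each region
b_mean = region_pool(c, 3, reduce=np.mean)     # mean over each region
a = over_time_conv(b_max, np.array([0.1, -0.2, 0.3]), 0.0)
```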
Model - CNN: 4) Fully connected layer
• Input: the features from the previous layer
• Output: the predicted latent factors
Model - CNN
• Minimize the mean squared error (MSE) of the predictions:
$$\arg\min_{w,b} \sum_i \| y'_i - y_i \|^2$$
where $y'_i$ is the latent factor vector for article $i$ and $y_i$ is the output of the CNN.
Model - LFM
• The LFM results represent the features of the students' preferences and of the learning resources.
Figure: The Process of LFM
Model - LFM L1R
• We propose a modified matrix factorization method with L1-norm-based regularization:
$$J(U, V) = \sum_{ij} (U_{i*} \cdot V_{*j} - r_{ij})^2 + \lambda_1 \| U \|_1 + \lambda_2 \| V \|_1$$
• $U$: the relationship between the students and the latent factors
• $V$: the relationship between the learning resources and the latent factors
• $r_{ij}$: the rating score given by the $i$-th student to the $j$-th learning resource
• The split Bregman iteration method is used to minimize it.
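The objective itself is easy to evaluate. The sketch below also shows the soft-thresholding (shrink) operator that split Bregman schemes typically apply to L1 terms; whether the paper uses exactly this step is an assumption, and the full iteration is not reproduced here.

```python
import numpy as np

def lfm_l1_objective(R, U, V, lam1, lam2):
    """J(U, V) with U of shape (students, factors) and V of shape (factors, resources)."""
    return (np.sum((U @ V - R) ** 2)
            + lam1 * np.abs(U).sum() + lam2 * np.abs(V).sum())

def shrink(x, gamma):
    """Soft-thresholding: move each entry towards zero by gamma."""
    return np.sign(x) * np.maximum(np.abs(x) - gamma, 0.0)

# Illustrative values only
U = np.array([[1.0, 0.0], [0.0, 2.0]])
V = np.array([[1.0, 1.0], [0.0, -1.0]])
R = np.array([[1.0, 0.0], [0.0, -2.0]])
J = lfm_l1_objective(R, U, V, lam1=0.5, lam2=1.0)   # 1 + 0.5*3 + 1.0*3
s = shrink(np.array([-2.0, 0.3, 1.5]), 1.0)
```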
Model - Language Model
• A topic model is employed.
• The Latent Dirichlet Allocation (LDA) method is used to train the topic model.
Result
• The method achieves significant improvements over conventional methods.
• It also works well when existing recommendation algorithms suffer from the cold-start problem.