Privacy-Preserving News Recommendation Model Learning
Tao Qi1, Fangzhao Wu2, Chuhan Wu1, Yongfeng Huang1 and Xing Xie2

1Department of Electronic Engineering & BNRist, Tsinghua University, Beijing 100084, China
2Microsoft Research Asia, Beijing 100080, China

{taoqi.qt,wuchuhan15}@gmail.com  [email protected]  {fangzwu,xing.xie}@microsoft.com
Abstract
News recommendation aims to display news articles to users based on their personal interest. Existing news recommendation methods rely on centralized storage of user behavior data for model training, which may lead to privacy concerns and risks due to the privacy-sensitive nature of user behaviors. In this paper, we propose a privacy-preserving method for news recommendation model training based on federated learning, where the user behavior data is locally stored on user devices. Our method can leverage the useful information in the behaviors of a massive number of users to train accurate news recommendation models and meanwhile remove the need for centralized storage of them. More specifically, on each user device we keep a local copy of the news recommendation model, and compute gradients of the local model based on the user behaviors on this device. The local gradients from a group of randomly selected users are uploaded to the server, where they are further aggregated to update the global model. Since the model gradients may contain some implicit private information, we apply local differential privacy (LDP) to them before uploading for better privacy protection. The updated global model is then distributed to each user device for local model update. We repeat this process for multiple rounds. Extensive experiments on real-world datasets show the effectiveness of our method in news recommendation model training with privacy protection.
1 Introduction
With the development of the Internet and mobile Internet, online news websites and apps such as Yahoo! News (https://news.yahoo.com) and Toutiao (https://www.toutiao.com/) have become very popular for people to obtain news information (Okura et al., 2017). Since massive numbers of news articles are posted online every day, users of online news services face heavy information overload (Zheng et al., 2018). Different users usually prefer different news information. Thus, personalized news recommendation, which aims to display news articles to users based on their personal interest, is a useful technique to improve user experience and has been widely used in many online news services (Wu et al., 2019b). The research on news recommendation has attracted much attention from both academia and industry (Okura et al., 2017; Wang et al., 2018; Lian et al., 2018; An et al., 2019; Wu et al., 2019a).
Many news recommendation methods have been proposed in recent years (Wang et al., 2018; Wu et al., 2019b; Zhu et al., 2019b). These methods usually recommend news based on the matching between the news representation learned from news content and the user interest representation learned from historical user behaviors on news. For example, Okura et al. (2017) proposed to learn news representations from the content of news articles via autoencoder, and learn user interest representations from the clicked news articles via a Gated Recurrent Unit (GRU) network. They ranked the candidate news articles using the dot product of the news and user interest representations. These approaches all rely on the centralized storage of user behavior data such as news click histories for model training. However, users' behaviors on news websites and apps are privacy-sensitive, the leakage of which may bring catastrophic consequences. Unfortunately, the centralized storage of user behavior data on the server may lead to serious privacy concerns from users and risks of large-scale private data leakage.
In this paper, we propose a privacy-preserving method for news recommendation model training. Instead of storing user behavior data on a central server, in our method it is locally stored on (and never leaves) users' personal devices, which can effectively reduce the privacy concerns and risks (McMahan et al., 2017). Since the behavior data of a single user is insufficient for model training, we propose a federated learning based framework named FedNewsRec to coordinate massive user devices to collaboratively learn an accurate news recommendation model without the need for centralized storage of user behavior data. In our framework, on each user device we keep a local copy of the news recommendation model. Since the user behaviors on news websites or apps stored on a user device can provide important supervision information about how the current model performs, we compute the model gradients based on these local behaviors. The local gradients from a group of randomly selected users are uploaded to the server, where they are further aggregated to update the global news recommendation model maintained in the server. The updated global model is then distributed to each user device for local model update. We repeat this process for multiple rounds until the training converges. Since the local model gradients may also contain some implicit private information about users' behaviors on their devices, we apply the local differential privacy (LDP) technique to these local model gradients before uploading them to the server, which can better protect user privacy at the cost of a slight performance sacrifice. We conduct extensive experiments on real-world datasets. The results show that our method can achieve satisfactory performance in news recommendation by coordinating massive users for model training, and at the same time can well protect user privacy.
The major contributions of this work include:

(1) We propose a privacy-preserving method to train accurate news recommendation models by leveraging the behavior data of massive users, and meanwhile remove the need for its centralized storage to protect user privacy.

(2) We propose to apply local differential privacy to protect the private information in the local gradients communicated between user devices and the server.

(3) We conduct extensive experiments on real-world datasets to verify the proposed method in terms of recommendation accuracy and privacy protection.
2 Related Work
2.1 News Recommendation
News recommendation can be formulated as a problem of matching between news articles and users. There are three core tasks for news recommendation, i.e., how to model the content of news articles (news representation), how to model the personal interest of users in news (user representation), and how to measure the relevance between news content and user interest. For news representation, many feature-based methods have been applied. For example, Lian et al. (2018) represented news using URLs, categories and entities. Recently, many deep learning based news recommendation methods represent news from the content using neural networks. For example, Okura et al. (2017) used a denoising autoencoder to learn news representations from news content. Wu et al. (2019c) proposed to learn news representations from news titles via a multi-head self-attention network. For user representation, existing news recommendation methods usually model user interest from their historical behaviors on news platforms. For example, Okura et al. (2017) learned user representations from previously clicked news using a GRU network. An et al. (2019) proposed a long- and short-term user representation model (LSTUR) for user interest modeling. It captures the long-term user interest via user ID embeddings and the short-term user interest from the latest news click behaviors via GRU. For measuring the relevance between user interest and news content, the dot product of the user and news representation vectors is widely used (Okura et al., 2017; Wu et al., 2019a; An et al., 2019). Some methods also explore cosine similarity (Zhu et al., 2019b), feed-forward networks (Wang et al., 2018), and feature-interaction networks (Lian et al., 2018).
These existing news recommendation methods all rely on centrally-stored user behavior data for model training. However, users' behaviors on news platforms are privacy-sensitive. The centralized storage of user behavior data may lead to serious privacy concerns of users. In addition, news platforms have a high responsibility to prevent user data leakage, and are under high pressure to meet the requirements of strict user privacy protection regulations like GDPR (https://eugdpr.org/). Different from existing news recommendation methods, in our method the user behavior data is locally stored on personal devices, and only the model gradients are communicated between user devices and the server. Since the model gradients contain much less user information than the raw behavior data and they are further processed by the local differential privacy (LDP) technique, our method can protect user privacy much better than existing news recommendation methods.
2.2 Federated Learning
Federated learning (McMahan et al., 2017) is a privacy-preserving machine learning technique which can leverage the rich data of massive users to train shared intelligent models without the need to centrally store the user data. In federated learning the user data is locally stored on user mobile devices and never uploaded to the server. Instead, each user device computes a model update based on the local data, and the locally-computed model updates from many users are aggregated to update the shared model. Since model updates usually contain much less information than the raw user data, the risks of privacy leakage can be effectively reduced. Federated learning requires that labeled data can be inferred from user interactions for supervised model learning, which is perfectly satisfied in our news recommendation scenario, since the click and skip behaviors on news websites and apps can provide rich supervision information.
Federated learning has been applied to training query suggestion models for smartphone keyboards and topic models (Jiang et al., 2019). There are also some explorations in applying federated learning to recommendation (Ammad et al., 2019; Chai et al., 2019). For example, Ammad et al. (2019) proposed a federated collaborative filtering (FCF) method. In FCF, the personal rating data is locally stored on the user client and is used to compute the local gradients of user embeddings and item embeddings. The user embeddings are locally maintained on the user client and are directly updated using the local gradients on each client. The item embeddings are maintained by a central server, and are updated using the aggregated gradients of many clients. Chai et al. (2019) proposed a federated matrix factorization method, which is very similar to FCF. However, these methods require all users to participate in the process of federated learning to train their embeddings, which is not practical in real-world recommendation scenarios. Besides, these methods represent items using their IDs, and have difficulty handling new items; since many news articles are posted every day, the candidate news articles are all new items. Thus, these federated learning based recommendation methods have inherent drawbacks, and are not suitable for news recommendation.
2.3 Local Differential Privacy

Local differential privacy (LDP) is an important technique to provide guarantees of privacy for sensitive information collection and analysis (Ren et al., 2018). It has attracted increasing attention since user privacy protection has become a more and more important issue (Kairouz et al., 2014; Qin et al., 2016). A classical scenario of LDP is that there are a set of users, and each user u has a private value v, which is sent to an untrusted third-party aggregator so that the aggregator can learn some statistical information of the private value distribution among the users (Cormode et al., 2018). LDP can guarantee that the leakage of private information for each individual user is bounded by applying a randomized algorithm $\mathcal{M}$ to the private value v and sending the perturbed value $\mathcal{M}(v)$ to the aggregator for statistical information inference. The randomized algorithm $\mathcal{M}(\cdot)$ is said to satisfy $\epsilon$-local differential privacy if and only if for two arbitrary input private values v and v′, the following inequality holds:

$$\Pr[\mathcal{M}(v) = y] \le e^{\epsilon} \Pr[\mathcal{M}(v') = y], \qquad (1)$$

where $y \in \mathrm{range}(\mathcal{M})$. $\epsilon \ge 0$ is usually called the privacy budget, and a smaller $\epsilon$ means better private information protection. In many works (Sarathy and Muralidhar, 2010; Duchi et al., 2013), $\mathcal{M}(\cdot)$ is implemented by adding Laplace noise to the private value. In this paper we apply the LDP technique to the model gradients which are generated on user devices based on user behaviors and uploaded to the server, to better protect user privacy and remove the need for a trusted server.
3 FedNewsRec for Privacy-Preserving News Recommendation

In this section we introduce our FedNewsRec method for privacy-preserving news recommendation model training. We first describe the news recommendation model. Then we describe the details of FedNewsRec.
3.1 Basic News Recommendation Model

Following previous works (Wu et al., 2019a; An et al., 2019), the news recommendation model in our method can be decomposed into two core sub-models, i.e., a news model to learn news representations and a user model to learn user representations.

News Model. The news model aims to learn news representations to model news content. Its architecture is shown in Fig. 1.

[Figure 1: The architecture of the news model: word embedding, CNN, self-attention, and attention layers applied to the news title.]

Following (Wu et al., 2019b), we learn news representations from news titles. The news model contains four layers stacked from bottom to top. The first layer is word embedding, which converts the word sequence in a news title into a sequence of semantic word embedding vectors. The second layer is a CNN network, which is used to learn word representations by capturing local contexts. The third layer is a multi-head self-attention network (Vaswani et al., 2017), which can learn contextual word representations by modeling the long-range relatedness between different words. The fourth layer is an attention network, which is used to build a news representation vector t from the output of the multi-head self-attention network by selecting informative words.
User Model. The user model is used to learn user representations to model their personal interest. Its architecture is shown in Fig. 2.

[Figure 2: The architecture of the user model: a long-term user embedding from all historical clicked news via self-attention and attention, and a short-term user embedding from recent clicks via GRU.]

Following (Okura et al., 2017), we learn user representations from their clicked news articles. Motivated by the LSTUR model proposed by An et al. (2019), we learn representations of users by capturing both long-term and short-term interests. The difference is that in LSTUR the embeddings of user IDs are used to model long-term interest, while in our user model it is learned from all the historical behaviors through a combination of a multi-head self-attention network and an attentive pooling network. This is because in the federated learning scenario, it is not practical for all users to participate in the process of model training. Thus, the ID embeddings of many users in LSTUR cannot be learned. For short-term user interest modeling, our user model applies a GRU network to the recent behaviors of users, which is the same as in LSTUR. The embeddings of long-term interest and short-term interest are combined with an attention network into a unified user embedding vector u.
Model Training from User Behavior. Users' behaviors on news websites and apps can provide useful supervision information to train news recommendation models. For example, if a user u clicks a news article t which has a low ranking score predicted by the model, then we can tune the model to give a higher ranking score to this user-news pair. We propose to train the news recommendation model based on both click and non-click behaviors. More specifically, following (Wu et al., 2019b), for each news $t_i^c$ clicked by user u, we randomly sample H news which are displayed in the same impression but not clicked. Assume this user has $B_u$ click behaviors in total; then the loss function of the news recommendation model with parameter set Θ is defined as:

$$\mathcal{L}_u(\Theta) = \sum_{i=1}^{B_u} \mathcal{L}_i, \qquad (2)$$

$$\mathcal{L}_i = -\log\left(\frac{\exp(s(u, t_i^c))}{\exp(s(u, t_i^c)) + \sum_{j=1}^{H} \exp(s(u, t_{i,j}^{nc}))}\right), \qquad (3)$$

where $t_i^c$ and $t_{i,j}^{nc}$ are clicked and non-clicked news articles shown in the same impression. $s(u, t)$ is the ranking score of news t for user u, which is defined as the dot product of their embedding vectors, i.e., $s(u, t) = u^T t$.
3.2 The Framework of FedNewsRec

Next, we introduce our FedNewsRec framework for privacy-preserving news recommendation model training, which is shown in Fig. 3.

[Figure 3: The framework of our privacy-preserving news recommendation approach. Each user client computes a loss on its local user logs, clips and randomizes the local model gradients through an LDP module, and uploads them to the server; the server aggregates the gradients, updates the global model, and distributes the updated model back to the clients.]

In our FedNewsRec framework, the user behaviors on news platforms (websites or apps) are locally stored on the user devices and never uploaded to the server. In addition, the servers which provide news services neither record nor collect the user behaviors, which can reduce the privacy concerns of users and the risks of data leakage. Since an accurate news recommendation model can effectively improve users' news reading experiences and the behavior data from a single user is far from sufficient for training an accurate and unbiased model, in our FedNewsRec framework we propose to coordinate a large number of user devices to collectively train intelligent news recommendation models.
Following (McMahan et al., 2017), each user device which participates in the model training is called a client. Each client has a copy of the current news recommendation model Θ which is maintained by the server. Assume user u's client has accumulated a set of behaviors on news platforms which is denoted as $\mathcal{B}_u$; then we compute a local gradient of model Θ according to the behaviors $\mathcal{B}_u$ and the loss function defined in Eq. (3), which is denoted as $g_u = \frac{\partial \mathcal{L}_u}{\partial \Theta}$. Although the local model gradient $g_u$ is computed from a set of behaviors rather than a single behavior, it may still contain some private information of user behaviors (Zhu et al., 2019a). Thus, for better privacy protection, we apply the local differential privacy (LDP) technique to the local model gradients. Denote the randomized algorithm applied to $g_u$ as $\mathcal{M}$, which is defined as:

$$\mathcal{M}(g_u) = \mathrm{clip}(g_u, \delta) + n, \qquad (4)$$

$$n \sim La(0, \lambda), \qquad (5)$$

where n is Laplace noise with zero mean. The parameter λ controls the strength of the Laplace noise, and a larger λ can bring better privacy protection. The function clip(x, y) is used to limit the values of x to the scale of y. It is motivated by studies which show that applying gradient clipping can help avoid potential gradient explosion and is beneficial for model training (Zhang et al., 2019). Denote the randomized gradient as $\tilde{g}_u = \mathcal{M}(g_u)$. After the clipping and randomization operations, it is more difficult to infer the raw user behaviors from the gradients. The user client uploads the randomized local model gradient $\tilde{g}_u$ to the server.
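A minimal sketch of the client-side LDP module of Eqs. (4) and (5) (our own illustrative code; we assume per-value clipping to [−δ, δ], which is consistent with the 2δ sensitivity bound used in Section 3.3, and the default δ and λ are the values reported in Section 4.1):

```python
import torch

def randomize_gradient(g, delta=0.005, lam=0.015):
    """Eq. (4): clip each gradient value to [-delta, delta], then add
    zero-mean Laplace noise with scale lam (Eq. (5))."""
    clipped = torch.clamp(g, min=-delta, max=delta)                   # clip(g_u, delta)
    noise = torch.distributions.Laplace(torch.zeros_like(g),
                                        lam).sample()                 # n ~ La(0, lambda)
    return clipped + noise                                            # M(g_u)
```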
In our FedNewsRec framework, a server is used to maintain the news recommendation model and update it via the model gradients from a large number of users. In each round, the server randomly selects a fraction r (e.g., 10%) of the user clients, and sends the current news recommendation model Θ to them. Then it collects and aggregates the local model gradients from the selected user clients as follows:

$$g = \frac{1}{\sum_{u \in \mathcal{U}} |\mathcal{B}_u|} \sum_{u \in \mathcal{U}} |\mathcal{B}_u| \cdot \tilde{g}_u, \qquad (6)$$

where $\mathcal{U}$ is the set of users selected for the learning process in this round, and $\mathcal{B}_u$ is the set of behaviors of user u for local model gradient computation.

Then the aggregated gradient g is used to update the global news recommendation model Θ maintained in the server:

$$\Theta = \Theta - \eta \cdot g, \qquad (7)$$

where η is the learning rate. The updated global model is then distributed to user devices to update their local models. This process is repeated until the model training converges.
3.3 Discussions on Privacy Protection

Next, we discuss why our FedNewsRec framework can protect user privacy in news recommendation model training. First, in our method the private user behavior data is stored on users' own devices, and is never uploaded to the server. Only the model gradients inferred from the local user behaviors are communicated with the server. According to the data processing inequality (McMahan et al., 2017), these gradients never contain more private information than the raw user behaviors, and usually contain much less information (McMahan et al., 2017). Thus, user privacy can be better protected compared with the centralized storage of user behavior data as done in existing news recommendation methods. Second, the local model gradients are computed from a group of user behaviors instead of a single behavior. Thus, it is not easy to infer a specific behavior from the local model gradients uploaded to the server. Third, we apply the local differential privacy technique to the local model gradients before uploading by adding Laplace noise to them. It can strengthen the protection of the private information in the local model gradients. According to (Choi et al., 2018), Laplace noise in LDP can achieve $\epsilon$-local differential privacy, where $\epsilon = \frac{\max_{v,v'} |\mathcal{M}(v) - \mathcal{M}(v')|}{\lambda}$, and v and v′ are arbitrary values in the local model gradient. Since the upper bound of $\max_{v,v'} |\mathcal{M}(v) - \mathcal{M}(v')|$ in our FedNewsRec framework is 2δ, the upper bound of the privacy budget $\epsilon$ is $\frac{2\delta}{\lambda}$. We can see that by increasing λ (i.e., the strength of the noise), we can achieve a smaller privacy budget $\epsilon$, which means better privacy protection. However, strong noise will hurt the accuracy of the aggregated gradients. Thus, λ should be selected based on the trade-off between privacy protection and model performance.
4 Experiment
4.1 Dataset and Experimental Settings

Our experiments were conducted on a public news recommendation dataset (named Adressa) collected from a Norwegian news website (Gulla et al., 2017) and another real-world dataset collected from Microsoft News (https://www.msn.com/en-us), named MSN-News. Our dataset and codes will be publicly available at https://github.com/JulySinceAndrew/FedNewsRec-EMNLP-Findings-2020. For the Adressa dataset, following Hu et al. (2020), we used user logs in the first five days to construct users' click history, used logs in the 6th day for model training, and used logs in the 7th day for model evaluation. Since the Adressa dataset does not contain non-clicked data, we randomly sampled 20 news as negative samples for each click to construct the test set. For the MSN-News dataset, we randomly sampled 100,000 users and their behavior logs in 5 weeks (from October 19 to November 22, 2019). We assume that the behavior logs of different users are stored in a decentralized way to simulate the real application of privacy-preserving news recommendation model training. We used the behaviors in the last week for test and the remaining behaviors for training. In addition, since in practical applications not all users can participate in the model training, we randomly selected half of the users for training and tested the model on all users. The detailed statistics of the two datasets are listed in Table 1.

                        MSN-News     Adressa
# users                  100,000     528,514
# news                   118,325      16,004
# impressions          1,341,853           -
# positive behaviors   2,006,289   2,411,187
# negative behaviors  48,051,601           -
avg. # title length        11.52        6.60

Table 1: The statistical information of the datasets.

Following (Wu et al., 2019b), we used the average scores of AUC, MRR, nDCG@5 and nDCG@10 over all impressions in the test set to evaluate the performance. We repeated each experiment five times and reported average results and standard errors.
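For reference, a minimal sketch of these per-impression metrics (our own illustrative code; one common formulation in news recommendation evaluation, which may differ in detail from the authors' evaluation script):

```python
import numpy as np
from sklearn.metrics import roc_auc_score, ndcg_score

def impression_metrics(y_true, y_score):
    """AUC, MRR, nDCG@5 and nDCG@10 for one impression, where y_true
    marks clicked (1) and non-clicked (0) news and y_score holds the
    predicted ranking scores; results are averaged over impressions."""
    y_true, y_score = np.asarray(y_true), np.asarray(y_score)
    auc = roc_auc_score(y_true, y_score)
    order = np.argsort(y_score)[::-1]             # rank news by predicted score
    ranks = np.where(y_true[order] == 1)[0] + 1   # ranks of the clicked news
    mrr = float(np.mean(1.0 / ranks))
    ndcg5 = ndcg_score([y_true], [y_score], k=5)
    ndcg10 = ndcg_score([y_true], [y_score], k=10)
    return auc, mrr, ndcg5, ndcg10
```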
In experiments, we used 300-dimensional pre-trained GloVe embeddings to initialize word embeddings. The number of self-attention heads is 20 and the output dimension of each head is 20. The dimension of the GRU hidden state is 400. H in Eq. (3) is 4. The fraction r of users participating in model training in each round is 2%. The learning rate η in Eq. (7) is 0.5. δ in Eq. (4) is 0.005 and λ in Eq. (5) is 0.015. These hyper-parameters are all selected according to cross-validation on the training set.
4.2 Effectiveness Evaluation
First, we verify the effectiveness of the proposed FedNewsRec method. We compared it with many methods, including: (1) FM (Rendle, 2012), factorization machine, a classic method for recommendation; (2) DFM (Lian et al., 2018), a deep fusion model for news recommendation; (3) EBNR (Okura et al., 2017), using GRU for user modeling (Cho et al., 2014); (4) DKN (Wang et al., 2018), using a knowledge-aware CNN network for news representation in news recommendation; (5) DAN (Zhu et al., 2019b), using CNN to learn news representations from both news titles and entities and using LSTM to learn user representations; (6) NAML (Wu et al., 2019a), learning news representations via attentive multi-view learning; (7) NPA (Wu et al., 2019b), using a personalized attention network to learn news and user representations; (8) NRMS (Wu et al., 2019d), learning representations of news and users via multi-head self-attention networks; (9) FCF (Ammad et al., 2019), a federated collaborative filtering method for recommendation; (10) CenNewsRec, which has the same news recommendation model as FedNewsRec but is trained on centralized user behavior data.

Method        MSN-News                                          Adressa
              AUC         MRR        nDCG@5      nDCG@10        AUC         MRR        nDCG@5      nDCG@10
FM            58.41±0.04  27.19±0.05 28.98±0.04  34.57±0.06     61.94±0.80  26.59±0.33 22.69±0.54  32.17±0.46
DFM           61.25±0.26  28.68±0.10 30.62±0.21  36.38±0.23     65.14±0.69  34.74±0.89 33.17±1.46  39.79±1.08
EBNR          63.64±0.15  29.50±0.14 31.57±0.13  37.38±0.20     65.70±0.72  30.23±0.49 29.37±0.53  36.38±0.44
DKN           62.38±0.19  29.40±0.15 31.59±0.11  37.27±0.21     67.53±1.90  32.33±2.79 31.84±2.78  39.96±2.52
DAN           62.54±0.23  29.44±0.18 31.67±0.14  37.31±0.25     64.03±3.10  33.37±2.63 31.61±3.03  38.60±3.02
NAML          64.52±0.24  30.93±0.17 33.39±0.16  39.07±0.19     69.20±2.07  35.18±1.49 34.78±1.85  42.34±1.97
NPA           64.29±0.20  30.63±0.15 33.11±0.17  38.89±0.23     66.70±2.42  34.68±1.77 33.72±2.09  41.18±1.99
NRMS          65.72±0.16  31.85±0.20 34.59±0.18  40.25±0.17     67.97±2.23  33.16±2.54 32.37±3.59  40.41±2.82
FCF           51.03±0.27  22.24±0.14 22.97±0.21  28.44±0.23     53.33±1.28  23.04±2.68 20.24±2.77  27.09±2.61
FedNewsRec    64.65±0.15  30.60±0.09 33.03±0.11  38.77±0.10     69.91±2.53  35.55±1.85 33.74±2.45  41.47±2.78
CenNewsRec    66.45±0.17  31.91±0.22 34.62±0.18  40.33±0.24     71.02±2.09  36.31±2.52 35.73±3.71  43.98±2.52

Table 2: The news recommendation results of different methods.
The results are listed in Table 2. First, comparing FedNewsRec with SOTA news recommendation methods such as NRMS, NPA and EBNR, our method can achieve comparable and even better performance on news recommendation. It validates the effectiveness of our approach in learning accurate models for personalized news recommendation. Moreover, different from these existing news recommendation methods, which are all trained on centralized storage of user behavior data, in our FedNewsRec the user behavior data is stored on local user devices and is never uploaded. Thus, our method can train accurate news recommendation models and meanwhile better protect user privacy.

Second, our method can perform better than existing federated learning based recommendation methods like FCF (Ammad et al., 2019). The performance of FCF is not good in news recommendation. This is because FCF requires each user and each item to participate in the training process to learn their embeddings. However, in practical applications not all users can participate in the training, due to different reasons. In addition, news articles on online news platforms expire very quickly, and new news articles continuously emerge. Thus, many candidate items are new items unseen in the training data, which cannot be handled by FCF. In our method we learn news representations from news content and learn user representations from their behaviors using neural models. Therefore, our method can handle the problem of new users and new items, and is more suitable for the news recommendation scenario.

Third, FedNewsRec performs worse than CenNewsRec, which has the same news recommendation model as FedNewsRec but is trained on the centralized user behavior data. This is intuitive, since centralized data is more beneficial for model training than decentralized data. In addition, in FedNewsRec we apply the local differential privacy technique with Laplace noise to protect the private information in model gradients, which makes the aggregated gradient for model update less accurate. Luckily, the gap between the performance of FedNewsRec and CenNewsRec is not very big. Thus, our FedNewsRec method can achieve much better privacy protection at the cost of an acceptable performance decline. These results validate the effectiveness of our method.
4.3 Influence of User Number

In this section, we explore whether our FedNewsRec method can exploit the useful behavior information of massive users in a federated way to train accurate news recommendation models. In the following sections, we only show the experimental results on the MSN-News dataset. We randomly select different numbers of users for model training, and use all users for evaluation. The experimental results are shown in Fig. 4.

[Figure 4: Performance (AUC and nDCG@10) with different numbers of users, from 1k to 100k.]

[Figure 5: Influence of the hyper-parameters λ and δ: (a) model performance (AUC); (b) privacy budget ε.]
From Fig. 4 we have several observations. First, when the number of users is small (e.g., 1,000), the performance of the news recommendation model trained on the behavior data of these users is not satisfactory. This is because the behaviors of a single user are usually very limited, and the behavior data of a small number of users is insufficient to train an accurate news recommendation model. This result validates the motivation of our FedNewsRec method to coordinate a large number of users in a federated way for model training. Second, when the number of users participating in training increases, the performance of FedNewsRec improves. It indicates that FedNewsRec can effectively exploit the useful behavior information from different users to collectively train an accurate news recommendation model, which validates the effectiveness of our framework. Third, when the number of users is large enough, further incorporating more users can only bring marginal performance improvements. This result shows that a reasonable number of users is sufficient for news recommendation model training, and it is unnecessary to involve too many or all users, which is costly and impractical.
4.4 Hyper-parameter Analysis

In this section, we explore the influence of hyper-parameters on our method. We show the results for two important hyper-parameters, i.e., δ in Eq. (4) and λ in Eq. (5), which serve in the local differential privacy module of our FedNewsRec framework. The results are shown in Fig. 5. In Fig. 5(a) we show the performance of our method with different λ and δ values. We find that a large λ value can lead to performance decline. This is because a larger λ means stronger Laplace noise added to the gradients in the LDP module, making the aggregated gradient for model update less accurate. In addition, our method tends to have better performance when δ is larger. This is because fewer gradients will be affected by the clip operation when δ is larger. In Fig. 5(b) we show the upper bound of the privacy budget, i.e., ε in Section 3.3, with different λ and δ values. We can find that with a larger λ value and a smaller δ value, the privacy budget ε becomes lower, which means better privacy protection. This is intuitive, since a larger λ value and a smaller δ value indicate that stronger noise is added and more gradient values are clipped, making it more difficult to recover the original model gradients. Combining Fig. 5(a) and Fig. 5(b), we can see that better privacy protection is achieved by some sacrifice of performance, and we need to select λ and δ values based on the trade-off between privacy protection and news recommendation performance.
4.5 Convergence Analysis

Next, we explore the convergence of the model training in FedNewsRec, and the results are shown in Fig. 6.

[Figure 6: Convergence of model training.]

We can see that the training process can converge in about 1,500 rounds under different settings of r (i.e., the ratio of users selected for model training in each round). It indicates that FedNewsRec can train news recommendation models efficiently.
4.6 Effectiveness of User Model

In this section, we conduct ablation studies to evaluate the effectiveness of the short- and long-term user interest modeling in our user model. The experimental results are shown in Fig. 7.

[Figure 7: Effectiveness of the user model: AUC and nDCG@10 of the model variants without the long-term or short-term user embedding, and of LSTUR.]

From Fig. 7 we have several observations. First, after removing the short-term user embedding, the performance of our method declines. This is because users sometimes tend to read news related to the topics they recently cared about (An et al., 2019). Our user model learns the short-term user embedding from the sequence of users' recently clicked news via a GRU network, which can effectively capture users' short-term interest. Thus, removing the short-term user embedding makes the unified user embedding lose some information about users' recent reading preferences and causes performance decline. Second, after removing the long-term user embedding, the performance of our method also declines. This is because users may read some news according to their long-term interests, which may not be reflected by their recent reading history (An et al., 2019). To address this issue, our user model learns the long-term user embedding by capturing the relatedness among users' clicked news, which can effectively capture users' long-term interest. After removing it, the unified user embedding loses the information of the long-term interest, which hurts the recommendation accuracy.
5 Conclusion
In this paper, we propose a privacy-preserving method for news recommendation model training. Different from existing methods which rely on centralized storage of user behavior data, in our method the user behaviors are locally stored on user devices. We propose a FedNewsRec framework to coordinate a large number of users to collectively train accurate news recommendation models from the behavior data of these users without the need to upload it. In our method each user client computes local model gradients based on the user behaviors on the device, and sends them to the server. The server coordinates the training process and maintains a global news recommendation model. It aggregates the local model gradients from massive users and updates the global model using the aggregated gradient. Then the server sends the updated model to user clients, and this process is repeated for multiple rounds. In order to further protect the private information in the local model gradients, we apply local differential privacy to them by adding Laplace noise. The experiments on real-world datasets show that our method can achieve comparable performance with SOTA news recommendation methods, and meanwhile can better protect user privacy.
Acknowledgments
This work was supported by the National Key Research and Development Program of China under Grant number 2018YFC1604002, and the National Natural Science Foundation of China under Grant numbers U1836204, U1705261, and 61862002.
References

Muhammad Ammad, Elena Ivannikova, Suleiman A Khan, Were Oyomno, Qiang Fu, Kuan Eeik Tan, and Adrian Flanagan. 2019. Federated collaborative filtering for privacy-preserving personalized recommendation system. arXiv preprint arXiv:1901.09888.

Mingxiao An, Fangzhao Wu, Chuhan Wu, Kun Zhang, Zheng Liu, and Xing Xie. 2019. Neural news recommendation with long- and short-term user representations. In ACL, pages 336–345.

Di Chai, Leye Wang, Kai Chen, and Qiang Yang. 2019. Secure federated matrix factorization. arXiv preprint arXiv:1906.05108.

Kyunghyun Cho, Bart van Merrienboer, Dzmitry Bahdanau, and Yoshua Bengio. 2014. On the properties of neural machine translation: Encoder–decoder approaches. In Proceedings of the Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, pages 103–111.

Woo-Seok Choi, Matthew Tomei, Jose Rodrigo Sanchez Vicarte, Pavan Kumar Hanumolu, and Rakesh Kumar. 2018. Guaranteeing local differential privacy on ultra-low-power systems. In ISCA, pages 561–574.

Graham Cormode, Somesh Jha, Tejas Kulkarni, Ninghui Li, Divesh Srivastava, and Tianhao Wang. 2018. Privacy at scale: Local differential privacy in practice. In SIGMOD, pages 1655–1658.

John C Duchi, Michael I Jordan, and Martin J Wainwright. 2013. Local privacy and statistical minimax rates. In FOCS, pages 429–438.

Jon Atle Gulla, Lemei Zhang, Peng Liu, Özlem Özgöbek, and Xiaomeng Su. 2017. The Adressa dataset for news recommendation. In Proceedings of the International Conference on Web Intelligence, pages 1042–1048.

Linmei Hu, Siyong Xu, Chen Li, Cheng Yang, Chuan Shi, Nan Duan, Xing Xie, and Ming Zhou. 2020. Graph neural news recommendation with unsupervised preference disentanglement. In ACL, pages 4255–4264.

Di Jiang, Yuanfeng Song, Yongxin Tong, Xueyang Wu, Weiwei Zhao, Qian Xu, and Qiang Yang. 2019. Federated topic modeling. In CIKM, pages 1071–1080.

Peter Kairouz, Sewoong Oh, and Pramod Viswanath. 2014. Extremal mechanisms for local differential privacy. In NIPS, pages 2879–2887.

Jianxun Lian, Fuzheng Zhang, Xing Xie, and Guangzhong Sun. 2018. Towards better representation learning for personalized news recommendation: a multi-channel deep fusion approach. In IJCAI, pages 3805–3811.

Brendan McMahan, Eider Moore, Daniel Ramage, Seth Hampson, and Blaise Aguera y Arcas. 2017. Communication-efficient learning of deep networks from decentralized data. In AISTATS, pages 1273–1282.

Shumpei Okura, Yukihiro Tagami, Shingo Ono, and Akira Tajima. 2017. Embedding-based news recommendation for millions of users. In KDD, pages 1933–1942.

Zhan Qin, Yin Yang, Ting Yu, Issa Khalil, Xiaokui Xiao, and Kui Ren. 2016. Heavy hitter estimation over set-valued data with local differential privacy. In CCS, pages 192–203. ACM.

Xuebin Ren, Chia-Mu Yu, Weiren Yu, Shusen Yang, Xinyu Yang, Julie A McCann, and S Yu Philip. 2018. High-dimensional crowdsourced data publication with local differential privacy. TIFS, pages 2151–2166.

Steffen Rendle. 2012. Factorization machines with libFM. TIST, page 57.

Rathindra Sarathy and Krish Muralidhar. 2010. Some additional insights on applying differential privacy for numeric data. In International Conference on Privacy in Statistical Databases, pages 210–219.

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In NIPS, pages 6000–6010.

Hongwei Wang, Fuzheng Zhang, Xing Xie, and Minyi Guo. 2018. DKN: Deep knowledge-aware network for news recommendation. In WWW, pages 1835–1844.

Chuhan Wu, Fangzhao Wu, Mingxiao An, Jianqiang Huang, Yongfeng Huang, and Xing Xie. 2019a. Neural news recommendation with attentive multi-view learning. IJCAI, pages 3863–3869.

Chuhan Wu, Fangzhao Wu, Mingxiao An, Jianqiang Huang, Yongfeng Huang, and Xing Xie. 2019b. NPA: Neural news recommendation with personalized attention. In KDD, pages 2576–2584.

Chuhan Wu, Fangzhao Wu, Mingxiao An, Tao Qi, Jianqiang Huang, Yongfeng Huang, and Xing Xie. 2019c. Neural news recommendation with heterogeneous user behavior. In EMNLP, pages 4876–4885.

Chuhan Wu, Fangzhao Wu, Suyu Ge, Tao Qi, Yongfeng Huang, and Xing Xie. 2019d. Neural news recommendation with multi-head self-attention. In EMNLP, pages 6390–6395.

Jingzhao Zhang, Tianxing He, Suvrit Sra, and Ali Jadbabaie. 2019. Why gradient clipping accelerates training: A theoretical justification for adaptivity. In ICLR.

Guanjie Zheng, Fuzheng Zhang, Zihan Zheng, Yang Xiang, Nicholas Jing Yuan, Xing Xie, and Zhenhui Li. 2018. DRN: A deep reinforcement learning framework for news recommendation. In WWW, pages 167–176.

Ligeng Zhu, Zhijian Liu, and Song Han. 2019a. Deep leakage from gradients. arXiv preprint arXiv:1906.08935.

Qiannan Zhu, Xiaofei Zhou, Zeliang Song, Jianlong Tan, and Guo Li. 2019b. DAN: Deep attention neural network for news recommendation. In AAAI, pages 5973–5980.