
Opinion Recommendation using Neural Memory Model

Zhongqing Wang and Yue Zhang
Singapore University of Technology and Design,
8 Somapah Road, Singapore 487372
{zhongqing_wang, yue_zhang}@sutd.edu.sg

Abstract

We present opinion recommendation, a novel task of jointly predicting a custom review with a rating score that a certain user would give to a certain product or service, given existing reviews and rating scores to the product or service by other users, and the reviews that the user has given to other products and services. A characteristic of opinion recommendation is the reliance on multiple data sources for multi-task joint learning, which is a strength of neural models. We use a single neural network to model users and products, capturing their correlation and generating customised product representations using a deep memory network, from which customised ratings and reviews are constructed jointly. Results show that our opinion recommendation system gives ratings that are closer to real user ratings on Yelp.com data compared with Yelp's own ratings, and our methods give better results compared to several pipeline baselines using state-of-the-art sentiment rating and summarization systems.

1 Introduction

Offering a channel for customers to share opinions and give scores to products and services, review websites have become a highly influential information source that customers refer to for making purchase decisions. Popular examples include IMDB¹ in the movie domain, Epinions² in the product domain, and Yelp³ in the service domain. Figure 1 shows a screenshot of a restaurant review page on Yelp.com, which offers two main types of information. First, an overall rating score is given under the restaurant name; second, detailed user reviews are listed below the rating.

1 http://www.imdb.com/
2 http://epinions.com/
3 https://www.yelp.com/

Figure 1: A restaurant review on Yelp.com.

Though offering a useful overview and details about a product or service, such information has several limitations. First, the overall rating is general and not necessarily agreeable to the taste of individual customers. Being a simple reflection of all customer scores, it serves an average customer well, but can be rather inaccurate for individuals. For example, the authors themselves often find highly rated movies tedious. Second, there can be hundreds of reviews for a product or service, which makes exhaustive reading infeasible. It would be useful to have a brief summary of all reviews, which ideally should be customized to the reader.

We investigate the feasibility of a model that addresses the limitations above. There are two sources of information that the model should collect to achieve its goal, namely information on the target product and information about the user. The former can be obtained from reviews written by other customers about the target product, and the latter can be obtained from the reviews that the user has written for other products and services. Given these two sources of information, the model should generate a customized score that the user is likely to give to the product after trying it, as well as a customized review that the user would have written for the target product.

arXiv:1702.01517v1 [cs.CL] 6 Feb 2017

We refer to the task above using the term opinion recommendation, which is a new task, yet closely related to several existing lines of work in NLP. The first is sentiment analysis (Hu and Liu, 2004; Pang and Lee, 2008), which is to give a rating score based on a customer review. Our task is different in that we aim to predict the rating scores that a user would give to new products, instead of predicting the opinion scores of existing reviews. The second is opinion summarization (Nishikawa et al., 2010; Wang and Ling, 2016), which is to generate a summary based on reviews of a product. A major difference between our task and this task is that the summary must be customized to a certain user, and a rating score must additionally be given. The third is recommendation (Su and Khoshgoftaar, 2009; Yang et al., 2014), which is to give a ranking score for a certain product or service based on the purchase history of the user and other customers who have purchased the target product. Our task is different in the source of input, which is textual customer reviews and ratings rather than numerical purchase history.

There are three types of inputs for our task, namely the reviews of the target product, the reviews of the user on other products, and other users' reviews on other products, and two types of outputs, namely a customized rating score and a customized review. The ideal solution should consider the interaction between all given types of information, jointly predicting the two types of outputs. This poses significant challenges to conventional statistical models, which require manually defined features to capture relevant patterns from training data. Deep learning is a relatively more feasible choice, offering the ability to fuse information through fully connected hidden layers (Collobert et al., 2011; Henderson et al., 2013). We leverage this advantage in building our model.

In particular, we use a recurrent neural network to model the semantic content of each review. A neural network is used to consolidate existing reviews for the target product, serving the role of a product model. In addition, a user model is built by consolidating the reviews of the given user into a single vector form. Furthermore, to address the potential sparsity of a user's historical reviews, neighbor users are identified by collaborative filtering (Ding et al., 2006), and a vector representation is learned by using a neural neighborhood model, which consolidates their historical reviews. Finally, a deep memory network is utilized to find the association between the user and the target product, jointly yielding the rating score and customised review.

Experiments on a Yelp dataset show that the model outperforms several pipelined baselines using state-of-the-art techniques. In particular, review scores given by the opinion recommendation system are closer to real user review scores than the review scores that Yelp assigns to target products. Our code is released at http://github.com/anonymous.

2 Related Work

Sentiment Analysis. Our task is related to document-level sentiment classification (Pang and Lee, 2008), which is to infer the sentiment polarity of a given document. Recently, various neural network models have been used to capture sentiment information automatically, including convolutional neural networks (Kim, 2014), recursive neural networks (Socher et al., 2013) and recurrent neural networks (Teng et al., 2016; Tai et al., 2015), which have been shown to achieve competitive results across different benchmarks. Different from binary classification, review rating prediction aims to predict the numeric rating of a given review. Pang and Lee (2005) pioneered this task by regarding it as a classification/regression problem. Most subsequent work focuses on designing effective textual features of reviews (Qu et al., 2010; Li et al., 2011; Wan, 2013). Recently, Tang et al. (2015) proposed a neural network model to predict the rating score by using both lexical semantic information and a user model.

Beyond textual features, user information is also investigated in the literature of sentiment analysis. For example, Gao et al. (2013) developed user-specific features to capture user leniency, and Li et al. (2014) incorporated textual topic and user-word factors through topic modeling. For integrating user information into neural network models, Tang et al. (2015) predicted the rating score given a review by using both lexical semantic information and a user embedding model. Chen et al. (2016b) proposed a neural network to incorporate global user and product information for sentiment classification via an attention mechanism.

Different from the above research on sentiment analysis, which focuses on predicting the opinion of existing reviews, our task is to recommend the score that a user would give to a new product without knowing his or her review text. The difference originates from the object: previous research aims to predict opinions on reviewed products, while our task is to recommend opinions on new products, which the user has not reviewed.

Opinion Summarization. Our work also overlaps with the area of opinion summarization, which constructs natural language summaries for multiple product reviews (Hu and Liu, 2004). Most previous work extracts opinion words and aspect terms. Typical approaches include association mining of frequent candidate aspects (Hu and Liu, 2004; Qiu et al., 2011), sequence labeling based methods (Jakob and Gurevych, 2010; Yang and Cardie, 2013), as well as topic modeling techniques (Lin and He, 2009). Recently, word embeddings and recurrent neural networks have also been used to extract aspect terms (Irsoy and Cardie, 2014; Liu et al., 2015).

Aspect term extraction approaches lack critical information for a user to understand how an aspect receives a particular rating. To address this, Nishikawa et al. (2010) generated summaries by selecting and ordering sentences taken from multiple review texts according to affirmativeness and readability of the sentence order. Wang and Liu (2011) adopted both sentence-ranking and graph-based methods to extract summaries on an opinion conversation dataset. While all the methods above are extractive, Ganesan et al. (2010) presented a graph-based summarization framework to generate concise abstractive summaries of highly redundant opinions, and Wang and Ling (2016) used an attention-based neural network model to absorb information from multiple text units and generate summaries of movie reviews.

Different from the above research on opinion summarization, we generate a review customized to a certain user, and a rating score must additionally be given.

Recommendation. Recommendation systems suggest to a user new products and services that might be of interest. There are two main approaches, namely content-based and collaborative-filtering (CF) based (Adomavicius and Tuzhilin, 2005; Yang et al., 2014). Most existing social recommendation systems are CF-based, and can be further grouped into model-based CF and neighborhood-based CF (Kantor et al., 2011; Su and Khoshgoftaar, 2009). Matrix Factorization (MF) is one of the most popular models for CF. In recent MF-based social recommendation work, user-user social trust information is integrated with user-item feedback history (e.g., ratings, clicks, purchases) to improve the accuracy of traditional recommendation systems, which only factorize user-item feedback data (Ding et al., 2006; Koren, 2008; He et al., 2016).

There has been work integrating sentiment analysis and recommendation systems, which uses recommendation strategies such as matrix factorization to improve the performance of sentiment analysis (Leung et al., 2006; Singh et al., 2011). These methods typically use ensemble learning (Singh et al., 2011) or probabilistic graphical models (Wu and Ester, 2015). For example, Zhang et al. (2014) proposed a factor graph model to recommend opinion rating scores by using explicit product features as hidden variables.

Different from the above research on recommendation systems, which utilizes numerical purchase history between users and products, we work with textual information. In addition, recommendation systems only predict a rating score, while our system also generates a customized review, which is more informative.

Neural Network Models. Multi-task learning has been recognised as a strength of neural network models for natural language processing (Collobert et al., 2011; Henderson et al., 2013; Zhang and Weiss, 2016; Chen et al., 2016a), where hidden feature layers are shared between different tasks that have a common basis. Our work can be regarded as an instance of such multi-task learning via shared parameters, which has been widely used in the research community recently.

Dynamic memory network models are inspired by Neural Turing Machines (Graves et al., 2014), and have been applied to NLP tasks such as question answering (Sukhbaatar et al., 2015; Kumar et al., 2016), language modeling (Tran et al., 2016) and machine translation (Wang et al., 2016). They are typically used to find abstract semantic representations of texts towards certain tasks, which is consistent with our main need, namely abstracting the representation of a product that is biased towards the taste of a certain user.

[Figure 2 diagram. Panels: Product Model, User Model and Neighborhood Model (LSTM and attention layers over review sequences r1, r2, …); Deep memory (hops 1 … n, with attention); outputs: Customized Review and Rating Score.]

Figure 2: Overview of proposed model.

3 Model

Formally, the input to our model is a tuple $\langle R_T, R_U, R_N \rangle$, where $R_T = \{r_{T_1}, r_{T_2}, \ldots, r_{T_{n_t}}\}$ is the set of existing reviews of a target product, $R_U = \{r_{U_1}, r_{U_2}, \ldots, r_{U_{n_u}}\}$ is the set of the user's historical reviews, and $R_N = \{r_{N_1}, r_{N_2}, \ldots, r_{N_{n_n}}\}$ is the set of the user's neighborhood reviews. All the reviews are sorted in temporal order. The output is a pair $\langle Y_S, Y_R \rangle$, where $Y_S$ is a real number between 0 and 5 representing the rating score of the target product, and $Y_R$ is a customised review.

To capture both general and personalized information, we first build a product model, a user model, and a neighborhood model, and then use a memory network model to integrate these three types of information, constructing a customized product model. Finally, we predict a customized rating score and a review jointly using neural stacking. The overall architecture of the model is shown in Figure 2.

3.1 Review Model

A customer review is the foundation of our model, based on which we derive representations of both a user and a target product. In particular, a user profile can be obtained by modeling all the reviews of the user $R_U$, and a target product profile can be obtained by using all existing reviews of the product $R_T$. We use the average of word embeddings to model a review. Formally, given a review $r = \{x_1, x_2, \ldots, x_m\}$, where $m$ is the length of the review, each word $x_k$ is represented with a $K$-dimensional embedding $e^w_k$ (Mikolov et al., 2013). We use $e^d_r = \frac{1}{m}\sum_{k=1}^{m} e^w_k$ as the representation of the review.
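The review representation is simply a mean over word vectors. A minimal sketch in plain Python, assuming the embeddings are given as a dictionary from token to a list of floats (the hypothetical `emb` argument) and that out-of-vocabulary tokens are skipped, which the paper does not specify:

```python
from typing import Dict, List, Sequence


def review_embedding(tokens: Sequence[str],
                     emb: Dict[str, List[float]],
                     dim: int) -> List[float]:
    """Mean of the word embeddings of a review (the e^d_r of Section 3.1).

    Assumption: tokens missing from `emb` are skipped, so the divisor is the
    number of known words rather than the raw review length m.
    """
    vecs = [emb[t] for t in tokens if t in emb]
    if not vecs:
        return [0.0] * dim  # empty or fully out-of-vocabulary review
    m = len(vecs)
    return [sum(v[k] for v in vecs) / m for k in range(dim)]
```

Averaging ignores word order entirely; the temporal structure the model does capture comes from the LSTM over whole reviews described next.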

3.2 User Model

A standard LSTM (Hochreiter and Schmidhuber, 1997) without coupled input and forget gates or peephole connections is used to learn the hidden states of the reviews. Denoting the recurrent function at step $t$ as $\mathrm{LSTM}(x_t, h_{t-1})$, we obtain a sequence of hidden state vectors $\{h_{U_1}, h_{U_2}, \ldots, h_{U_{n_u}}\}$ recurrently by feeding $\{e^d_{r_{U_1}}, e^d_{r_{U_2}}, \ldots, e^d_{r_{U_{n_u}}}\}$ as inputs, where $h_{U_i} = \mathrm{LSTM}(e^d_{r_{U_i}}, h_{U_{i-1}})$. The initial state and all standard LSTM parameters are randomly initialized and tuned during training.
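To make the recurrence concrete, here is a dependency-free sketch of a standard LSTM cell (separate input and forget gates, no peepholes) run over a sequence of review vectors. It is illustrative only: the class and function names are hypothetical, the weights are small random values rather than trained parameters, and the initial state is fixed at zero for simplicity, whereas the paper learns it.

```python
import math
import random
from typing import List, Tuple

Vec = List[float]
Mat = List[List[float]]


def matvec(W: Mat, x: Vec) -> Vec:
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


class LSTMCell:
    """Standard LSTM cell: input, forget, output gates and a tanh candidate."""

    def __init__(self, in_dim: int, hid_dim: int, seed: int = 0):
        rng = random.Random(seed)

        def mat(rows: int, cols: int) -> Mat:
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        # One (W, U, b) triple per gate: x-weights, h-weights, bias.
        self.params = {g: (mat(hid_dim, in_dim), mat(hid_dim, hid_dim), [0.0] * hid_dim)
                       for g in ("i", "f", "o", "g")}
        self.hid_dim = hid_dim

    def _pre(self, gate: str, x: Vec, h: Vec) -> Vec:
        W, U, b = self.params[gate]
        return [a + c + d for a, c, d in zip(matvec(W, x), matvec(U, h), b)]

    def step(self, x: Vec, h: Vec, c: Vec) -> Tuple[Vec, Vec]:
        i = [sigmoid(z) for z in self._pre("i", x, h)]
        f = [sigmoid(z) for z in self._pre("f", x, h)]
        o = [sigmoid(z) for z in self._pre("o", x, h)]
        g = [math.tanh(z) for z in self._pre("g", x, h)]
        c_new = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
        h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
        return h_new, c_new


def encode_reviews(cell: LSTMCell, review_vecs: List[Vec]) -> List[Vec]:
    """Feed review vectors in temporal order; return every hidden state h_i."""
    h = [0.0] * cell.hid_dim  # simplification: zero rather than learned initial state
    c = [0.0] * cell.hid_dim
    states = []
    for x in review_vecs:
        h, c = cell.step(x, h, c)
        states.append(h)
    return states
```

All hidden states are kept, not only the last one, because the attention layer that follows pools over the whole sequence.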

Not all reviews contribute equally to the representation of a user. We introduce an attention mechanism (Bahdanau et al., 2014; Yang et al., 2016) to extract the reviews that are relatively more important, and aggregate the representations of reviews to form a vector. Taking the hidden states $\{h_{U_1}, h_{U_2}, \ldots, h_{U_{n_u}}\}$ of the user model as input, the attention model outputs a continuous vector $v_U \in \mathbb{R}^{d \times 1}$, which is computed as a weighted sum of each hidden state $h_{U_i}$, namely

$$v_U = \sum_{i=1}^{n_u} \alpha_i h_{U_i} \tag{1}$$

where $n_u$ is the number of hidden states (one per user review), $\alpha_i \in [0, 1]$ is the weight of $h_{U_i}$, and $\sum_i \alpha_i = 1$.

For each hidden state $h_{U_i}$, the scoring function is calculated by

$$u_i = \tanh(W_U h_{U_i} + b_U) \tag{2}$$

$$\alpha_i = \frac{\exp(u_i)}{\sum_j \exp(u_j)} \tag{3}$$

where $W_U$ and $b_U$ are model parameters. The attention vector $v_U$ is used to represent the user model.

3.3 Finding Neighbor Users

We use neighborhood reviews to improve the user model, since a user may not have sufficient reviews to construct a reliable model. Here a neighbor refers to a user who has similar tastes to the target user (Koren, 2008; Desrosiers and Karypis, 2011). As with the user model, we construct the neighborhood model $v_N$ using the neighborhood reviews $R_N = \{r_{N_1}, r_{N_2}, \ldots, r_{N_{n_n}}\}$ with an attention recurrent network.

A key issue in building the neighborhood model is how to find neighbors of a certain user. In this study, we use matrix factorization (Koren, 2008) to detect neighbors, which is a standard approach for recommendation (Ding et al., 2006; Li et al., 2009; He et al., 2016). In particular, users' rating scores of products are used to build a product-user matrix $M \in \mathbb{R}^{n_t \times n_u}$ with $n_t$ products and $n_u$ users. We approximate it using three factors, which specify soft membership of products and users (Ding et al., 2006), by finding:

$$\min_{F, S, T} \|M - F S T^\top\| \quad \text{s.t. } S \ge 0,\; F \ge 0,\; T \ge 0 \tag{4}$$

where $F \in \mathbb{R}^{n_t \times K}$ represents the posterior probability of $K$ topic clusters for each product; $S \in \mathbb{R}^{K \times K}$ encodes the distribution of each topic $k$; and $T \in \mathbb{R}^{n_u \times K}$ indicates the posterior probability of $K$ topic clusters for each user.

As a result of matrix factorization, we directly obtain the probability of each user on each topic from the user-topic matrix $T$. To infer $T$, the optimization problem in Eq. 4 can be solved using the following updating rule:

$$T_{jk} \leftarrow T_{jk} \frac{(M^\top F S)_{jk}}{(T T^\top M^\top F S)_{jk}} \tag{5}$$

Having obtained the user-topic matrix $T$, we measure the implicit connection between two users using:

$$\mathrm{sim}(i, j) = \sum_{k=1}^{K} T_{ik} T_{jk} \tag{6}$$

where $\mathrm{sim}(i, j)$ measures the implicit connection degree between users $i$ and $j$. If $\mathrm{sim}(i, j)$ is higher than a threshold $\eta$, we consider user $j$ a neighbor of user $i$.

3.4 Product Model

Given the representations $\{e^d_{r_{T_1}}, e^d_{r_{T_2}}, \ldots, e^d_{r_{T_{n_t}}}\}$ of existing reviews of the product, we use an LSTM to model their temporal order, obtaining a sequence of hidden state vectors $h_T = \{h_{T_1}, h_{T_2}, \ldots, h_{T_{n_t}}\}$ by recurrently feeding these review representations as inputs. The hidden state vectors $h_T$ are used to represent the product.

3.5 Customized Product Model

We use the user representation $v_U$ and the neighbour representation $v_N$ to transform the target product representation $h_T = \{h_{T_1}, h_{T_2}, \ldots, h_{T_{n_t}}\}$ into a customised product representation $v_C$, which is tailored to the taste of the user. In particular, a dynamic memory network (Sukhbaatar et al., 2015; Xiong et al., 2016) is utilized to iteratively find increasingly abstract representations of $h_T$ by injecting $v_U$ and $v_N$ information.

The memory model consists of multiple dynamic computational layers (hops), each of which contains an attention layer and a linear layer. In the first computational layer (hop 1), we take the hidden variables $h_{T_i}$ ($1 \le i \le n_t$) of the product model as input, adaptively selecting important evidence through one attention layer using $v_U$ and $v_N$. The output of the attention layer gives a linear interpolation of $h_T$, and the result is considered as input to the next layer (hop 2). In the same way, we stack multiple hops and run the steps multiple times, so that more abstract representations of the target product can be derived.

The attention model outputs a continuous vector $v_C \in \mathbb{R}^{d \times 1}$, which is computed as a weighted sum of $h_{T_i}$ ($1 \le i \le n_t$), namely

$$v_C = \sum_{i=1}^{n_t} \beta_i h_{T_i} \tag{7}$$

where $n_t$ is the number of hidden states (one per existing review), $\beta_i \in [0, 1]$ is the weight of $h_{T_i}$, and $\sum_i \beta_i = 1$. For each hidden state $h_{T_i}$, we use a feed-forward neural network to compute its semantic relatedness with the abstract representation $v_C$. The scoring function is calculated as follows at hop $t$:

$$u^t_i = \tanh(W_T h_{T_i} + W_C v^{t-1}_C + W_U v_U + W_N v_N + b) \tag{8}$$

$$\beta^t_i = \frac{\exp(u^t_i)}{\sum_j \exp(u^t_j)} \tag{9}$$

The vector $v_C$ is used to represent the customized product model. At the first hop, we define $v^0_C = \frac{1}{n_t}\sum_{i=1}^{n_t} h_{T_i}$. Although the product model $h_{T_i}$ ($1 \le i \le n_t$) represents salient information of existing reviews in their temporal order, it does not reflect the taste of a particular user. We use the customised product model to integrate user information and product information (as reflected by the product model), resulting in a single vector that represents a customised product. From this vector we are able to synthesize both a customised review and a customised rating score.
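The hop computation of Eqs. 7 to 9 can be sketched as follows. Several details are assumptions that the paper does not state: each weight $W_\ast$ is taken to be a row vector so that $u^t_i$ is a scalar, the same weights are shared across hops, and the per-hop linear layer mentioned above is omitted. With `hops=0` the function simply returns the initial average $v^0_C$. The names are hypothetical.

```python
import math
from typing import List

Vec = List[float]


def dot(a: Vec, b: Vec) -> float:
    return sum(x * y for x, y in zip(a, b))


def memory_hops(h_T: List[Vec], v_U: Vec, v_N: Vec,
                w_T: Vec, w_C: Vec, w_U: Vec, w_N: Vec, b: float,
                hops: int = 3) -> Vec:
    """Customized product vector v_C after `hops` attention hops (Eqs. 7-9)."""
    n, d = len(h_T), len(h_T[0])
    v_C = [sum(h[k] for h in h_T) / n for k in range(d)]  # v_C^0: mean of h_T
    user_term = dot(w_U, v_U) + dot(w_N, v_N) + b  # fixed across hops
    for _ in range(hops):
        # Eq. 8: score each product state against the previous v_C.
        scores = [math.tanh(dot(w_T, h) + dot(w_C, v_C) + user_term) for h in h_T]
        # Eq. 9: softmax over the scores.
        mx = max(scores)
        exps = [math.exp(s - mx) for s in scores]
        total = sum(exps)
        betas = [e / total for e in exps]
        # Eq. 7: new v_C is the beta-weighted sum of product states.
        v_C = [sum(bt * h[k] for bt, h in zip(betas, h_T)) for k in range(d)]
    return v_C
```

Note that $v_U$ and $v_N$ never change across hops; only $v^{t-1}_C$ is refined, which is what lets later hops re-weight the product reviews in light of the previous summary.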

3.6 Customized Review Generation

The goal of customized review generation is to generate a review $Y_R$ from the customized product representation $v_C$, composed of a sequence of words $y_{R_1}, \ldots, y_{R_{n_r}}$. We decompose the prediction of $Y_R$ into a sequence of word-level predictions:

$$\log P(Y_R \mid v_C) = \sum_j \log P(y_{R_j} \mid y_{R_1}, \ldots, y_{R_{j-1}}, v_C) \tag{10}$$

where each word $y_{R_j}$ is predicted conditional on the previously generated $y_{R_1}, \ldots, y_{R_{j-1}}$ and the input $v_C$. The probability is estimated by using a standard word softmax:

$$P(y_{R_j} \mid y_{R_1}, \ldots, y_{R_{j-1}}, v_C) = \mathrm{softmax}(h_{R_j}) \tag{11}$$

where $h_{R_j}$ is the hidden state variable at time step $j$, which is modeled as $h_{R_j} = \mathrm{LSTM}(u_{j-1}, h_{R_{j-1}})$. Here an LSTM is used to generate a new state $h_{R_j}$ from the representation of the previous state $h_{R_{j-1}}$ and $u_{j-1}$, where $u_{j-1}$ is the concatenation of the previously generated word $y_{R_{j-1}}$ and the input representation of the customized model $v_C$.
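Eq. 10 is a sum of per-word log-probabilities, and training maximizes it. The sketch below scores a given word sequence from per-step vocabulary logits. It assumes those logits have already been computed from $h_{R_j}$ (the output projection onto the vocabulary is left implicit in Eq. 11), and it uses log-sum-exp for numerical stability. Function names are illustrative.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over a vocabulary-sized score vector."""
    mx = max(logits)
    lse = mx + math.log(sum(math.exp(z - mx) for z in logits))
    return [z - lse for z in logits]


def sequence_log_prob(step_logits: List[Sequence[float]], targets: Sequence[int]) -> float:
    """log P(Y_R | v_C) = sum_j log P(y_j | y_<j, v_C), as in Eq. 10."""
    return sum(log_softmax(logits)[y] for logits, y in zip(step_logits, targets))
```

Summing logs rather than multiplying probabilities avoids underflow on long reviews; for example, two steps with a uniform two-word vocabulary give $\log 0.25$.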

3.7 Customized Opinion Rating Prediction

We consider two factors for customised opinion rating, namely existing review scores and the customised product representation $v_C$. A baseline rating system such as Yelp.com uses only the former information, typically by taking the average of existing review scores. Such a baseline gives an empirical mean squared error of 1.28 (with ratings on a 0 to 5 scale) in our experiments when compared with a test set of individual user ratings, which reflects the variance in user tastes. In order to integrate user preferences into the rating, we instead take a weighted average of existing rating scores, so that the scores of reviews that are closer to the user preference are given higher weights.

As a second factor, we calculate a review score independently according to the customised representation $v_C$ of existing reviews, without considering review scores. The motivation is twofold. First, existing reviews can be relatively few, and hence using their scores alone might not be sufficient for a confident score. Second, existing ratings can all differ from a user's personal rating if the existing reviews do not come from the user's neighbours. As a result, using the average or weighted average of existing reviews, the personalised user rating might not be reached.

Formally, given the rating scores $s_1, s_2, \ldots, s_n$ of existing reviews and the customized product representation $v_C$, we calculate:

$$Y_S = \sum_{i=1}^{n} \alpha_i \cdot s_i + \mu \tanh(W_S v_C + b_S) \tag{12}$$

In the left term $\sum_{i=1}^{n} \alpha_i \cdot s_i$, we use attention weights $\alpha_i$ to measure the importance of each rating score $s_i$. The right term $\tanh(W_S v_C + b_S)$ is a review-based shift, weighted by $\mu$.
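A direct transcription of Eq. 12, assuming the attention weights $\alpha_i$ over existing ratings are already computed and that $W_S$ is a row vector, so the shift is a scalar. Because $\tanh$ lies in $(-1, 1)$, the review-based shift moves the weighted average by at most $\mu$ in either direction; this sketch does not clip the result to the 0 to 5 range. The function name is hypothetical.

```python
import math
from typing import Sequence


def customized_rating(scores: Sequence[float], alphas: Sequence[float],
                      v_C: Sequence[float], w_S: Sequence[float], b_S: float,
                      mu: float = 1.0) -> float:
    """Y_S = sum_i alpha_i * s_i + mu * tanh(w_S . v_C + b_S), as in Eq. 12."""
    weighted = sum(a * s for a, s in zip(alphas, scores))          # left term
    shift = math.tanh(sum(w * v for w, v in zip(w_S, v_C)) + b_S)  # review-based shift
    return weighted + mu * shift
```

Setting `mu=0` recovers the pure weighted average of existing scores. For the stacked variant in Eq. 13, one would pass the concatenation of $v_C$ and the last decoder state as `v_C`, with a correspondingly longer `w_S`.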

Since the result of customized review generation can be helpful for rating score prediction, we use neural stacking, additionally feeding the last hidden state $h_{R_{n_r}}$ of the review generation model as input for $Y_S$ prediction, resulting in

$$Y_S = \sum_{i=1}^{n} \alpha_i \cdot s_i + \mu \tanh(W_S (v_C \oplus h_{R_{n_r}}) + b_S) \tag{13}$$

where $\oplus$ denotes vector concatenation.

3.8 Training

For our task, there are two joint training objectives, for review scoring and review summarisation, respectively. The loss function for the former is defined as:

$$L(\Theta) = \sum_{i=1}^{N} (Y^*_{S_i} - Y_{S_i})^2 + \frac{\lambda}{2} \|\Theta\|^2 \tag{14}$$

where $Y^*_{S_i}$ is the predicted rating score, $Y_{S_i}$ is the rating score in the training data, $\Theta$ is the set of model parameters and $\lambda$ is a parameter for L2 regularization.
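Eq. 14 as code, with the parameters $\Theta$ flattened into a single list of floats for illustration; a real implementation would apply the penalty to every weight tensor. The function name is hypothetical.

```python
from typing import Sequence


def rating_loss(pred: Sequence[float], gold: Sequence[float],
                params: Sequence[float], lam: float) -> float:
    """Squared rating error plus an L2 penalty (lambda / 2) * ||Theta||^2, as in Eq. 14."""
    squared_error = sum((p - g) ** 2 for p, g in zip(pred, gold))
    l2 = 0.5 * lam * sum(t * t for t in params)
    return squared_error + l2
```

The factor of one half on the penalty makes its gradient simply $\lambda \Theta$.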

We train the customized review generation model by maximizing the log probability of Eq. 10 (Sutskever et al., 2014; Rush et al., 2015). Standard back-propagation is performed to optimize parameters, where gradients also propagate from the scoring objective to the review generation objective due to neural stacking (Eq. 13). We apply online training, where model parameters are optimized by using Adagrad (Duchi et al., 2011). For all LSTM models, we empirically set the size of the hidden layers to 128. We train word embeddings using the Skip-gram algorithm (Mikolov et al., 2013)⁴, using a window size of 5 and a vector size of 128. In order to avoid over-fitting, dropout (Hinton et al., 2012) is applied to word embeddings with a ratio of 0.2. The neighbor similarity threshold $\eta$ is set to 0.25.

4 https://code.google.com/p/word2vec/


| Type | Amount |
|---|---|
| Business | 15,584 |
| Review | 334,997 |
| User | 303,032 |

Table 1: Statistics of the dataset.

4 Experiments

4.1 Experimental Settings

Our data are collected from the Yelp academic dataset⁵, provided by Yelp.com, a popular restaurant review website. The dataset contains three types of objects: business, user, and review, where business objects contain basic information about local businesses (i.e., restaurants), review objects contain review texts and star ratings, and user objects contain aggregate information about a single user across all of Yelp. Table 1 illustrates the general statistics of the dataset.

For evaluating our model, we choose 4,755 user-product pairs from the dataset. For each pair, the existing reviews of the target service (restaurant) are used for the product model. The rating score given by each user to the target service is considered as the gold customized rating score, and the review of the target service given by each user is used as the gold-standard customized review for the user. The remaining reviews of each user are used for training the user model. We use 3,000 user-product pairs to train the model, 1,000 pairs as testing data, and the remaining 755 pairs for development.

We use the ROUGE-1.5.5 toolkit (Lin, 2004) for evaluating the performance of customized review generation, and report unigram overlap (ROUGE-1) as a means of assessing informativeness. We use Mean Squared Error (MSE) (Wan, 2013; Tang et al., 2015) as the evaluation metric for measuring the performance of customized rating score prediction. MSE penalizes more severe errors more heavily.
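Both metrics can be sketched in a few lines. This is not the ROUGE-1.5.5 toolkit: the sketch computes only ROUGE-1 recall with lowercase whitespace tokenization and no stemming or stopword removal, whereas the official script has further options and also reports precision and F-score; which variant the paper reports is not stated. Function names are illustrative.

```python
from collections import Counter
from typing import Sequence


def mse(pred: Sequence[float], gold: Sequence[float]) -> float:
    """Mean squared error between predicted and gold ratings."""
    return sum((p - g) ** 2 for p, g in zip(pred, gold)) / len(gold)


def rouge1_recall(candidate: str, reference: str) -> float:
    """Clipped unigram overlap divided by the number of reference unigrams."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum(min(count, ref[word]) for word, count in cand.items())
    total = sum(ref.values())
    return overlap / total if total else 0.0
```

The squaring in `mse` is what makes larger errors count disproportionately: one prediction off by 2 stars adds 4 to the sum, while two predictions each off by 1 add only 2.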

4.2 Development Experiments

4.2.1 Ablation Test

The effects of various configurations of our model are shown in Table 2, where Joint is the full model of this paper, -user ablates the user model, -neighbor ablates the neighbor model, -rating is a single-task model that generates a review without the rating score, and -generation generates only the rating score.

5 https://www.yelp.com/academic_dataset

| Model | Rating (MSE) | Generation (ROUGE-1) |
|---|---|---|
| Joint | 0.904 | 0.267 |
| -user | 1.254 | 0.220 |
| -neighbor | 1.162 | 0.245 |
| -user,-neighbor | 1.342 | 0.205 |
| -rating | - | 0.254 |
| -generation | 1.042 | - |

Table 2: Feature ablation tests.

[Figure 3 plot: MSE (y-axis) against the number of hops (x-axis, 0 to 10). Plotted values: hop 0: 1.342; 1: 1.102; 2: 1.046; 3: 0.904; 4: 0.987; 5: 1.102; 6: 1.045; 7: 1.126; 8: 1.172; 9: 1.152; 10: 1.167.]

Figure 3: Influence of hops.

By comparing “Joint” and “-user,-neighbor”, we find that customized information has a significant influence on both the rating and review generation results (p-value < 0.01 using t-test). In addition, comparisons between “Joint” and “-user”, and between “-user” and “-user,-neighbor”, show that both the user information and the neighbour user information are effective for improving the results. A user's neighbours can indeed alleviate the scarcity of user reviews.

Finally, comparisons between “Joint” and “-generation”, and between “Joint” and “-rating”, show that multi-task learning by parameter sharing is highly useful.

4.2.2 Influence of Hops

We show the influence of the number of hops of the memory network on rating prediction in Figure 3. Note that when hop = 0, the model considers only the general product reviews (-user,-neighbor). From the figure we can see that the performance is best when hop = 3. This indicates that multiple hops can capture more abstract evidence from external memory to improve the performance. However, too many hops lead to over-fitting, which harms the performance. As a result, we choose 3 as the number of hops in our final test.

[Figure 4 plot: MSE (y-axis) against the bias weight μ (x-axis, 0 to 5). Plotted values: μ = 0: 1.102; 1: 0.904; 2: 1.067; 3: 1.136; 4: 1.206; 5: 1.227.]

Figure 4: Influence of bias score.

4.2.3 Influence of μ

We show the influence of the bias weight parameter μ on rating prediction in Figure 4. With μ = 0, the model uses the weighted sum of existing review scores to score the product. When μ is very large, the system tends to use only the customized product representation $v_C$ to score the product, hence ignoring existing review scores, which are a useful source of information. Our results show that the performance is optimal when μ is 1, indicating that both existing review scores and review contents are equally useful.

4.3 Final Results

We show the final results for opinion recommendation, comparing our proposed model with the following state-of-the-art baseline systems:

• RS-Average is the widely adopted baseline (e.g., by Yelp.com), using the averaged review scores as the final score.

• RS-Linear estimates the rating score that a user would give by $s_{ui} = s_{all} + s_u + s_i$ (Ricci et al., 2011), where $s_{all}$ is the overall average training score, and $s_u$ and $s_i$ are the training deviations of the user $u$ and the product $i$, respectively.

• RS-Item applies kNN to estimate the rating score (Sarwar et al., 2001). We use the cosine similarity between $v_C$ vectors to measure the distance between products.

• RS-MF is a state-of-the-art recommendation model, which uses matrix factorisation to predict the rating score (Ding et al., 2006; Li et al., 2009; He et al., 2016).

• Sum-Opinosis uses a graph-based framework to generate abstractive summaries from redundant opinions (Ganesan et al., 2010).

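| System | Rating (MSE) | Generation (ROUGE-1) |
|---|---|---|
| RS-Average | 1.280 | - |
| RS-Linear | 1.234 | - |
| RS-Item | 1.364 | - |
| RS-MF | 1.143 | - |
| Sum-Opinosis | - | 0.183 |
| Sum-LSTM-Att | - | 0.196 |
| Joint | 1.023 | 0.250 |

Table 3: Final results.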

• Sum-LSTM-Att is a state-of-the-art neural abstractive summariser, which uses an attentional neural model to consolidate information from multiple text sources, generating summaries using LSTM decoding (Rush et al., 2015; Wang and Ling, 2016).
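The two simplest rating baselines are easy to reproduce. In this sketch, $s_{all}$ is the mean of all training ratings, and $s_u$ and $s_i$ are the mean deviations from $s_{all}$ of the user's and the item's training ratings, with unseen users or items given a deviation of 0. These estimation choices are assumptions rather than details from the paper, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Rating = Tuple[str, str, float]  # (user, item, score)
LinearModel = Tuple[float, Dict[str, float], Dict[str, float]]


def rs_average(item_scores: List[float]) -> float:
    """RS-Average: mean of the existing review scores of the target item."""
    return sum(item_scores) / len(item_scores)


def fit_rs_linear(train: List[Rating]) -> LinearModel:
    """Estimate s_all and the per-user / per-item mean deviations from it."""
    s_all = sum(r for _, _, r in train) / len(train)
    user_devs: Dict[str, List[float]] = defaultdict(list)
    item_devs: Dict[str, List[float]] = defaultdict(list)
    for user, item, r in train:
        user_devs[user].append(r - s_all)
        item_devs[item].append(r - s_all)
    s_u = {u: sum(d) / len(d) for u, d in user_devs.items()}
    s_i = {i: sum(d) / len(d) for i, d in item_devs.items()}
    return s_all, s_u, s_i


def predict_rs_linear(model: LinearModel, user: str, item: str) -> float:
    """RS-Linear: s_ui = s_all + s_u + s_i (unseen users/items contribute 0)."""
    s_all, s_u, s_i = model
    return s_all + s_u.get(user, 0.0) + s_i.get(item, 0.0)
```

RS-Linear adds only a per-user offset to RS-Average-style information, which is one way to read the small MSE gap between the two in Table 3.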

All the baseline models are single-task models, which do not consider rating and summarisation prediction jointly. The results are shown in Table 3. Our model (“Joint”) significantly outperforms both “RS-Average” and “RS-Linear” (p-value < 0.01 using t-test), demonstrating the strength of opinion recommendation, which leverages user characteristics to calculate a rating score for the user.

Our proposed model also significantly outperforms state-of-the-art recommendation systems (RS-Item and RS-MF) (p-value < 0.01 using t-test), indicating that textual information is a useful addition to the rating scores themselves for recommending a product.

Finally, the comparison between our proposed model and state-of-the-art summarisation techniques (Sum-Opinosis and Sum-LSTM-Att) shows the advantage of leveraging user information to enhance customised review generation, and also the strength of joint learning.

5 Conclusion

We presented a dynamic memory model for opinion recommendation, a novel task of jointly predicting the review and rating score that a certain user would give to a certain product or service. In particular, a deep memory network was utilized to find the association between the user and the product, jointly yielding the rating score and customised review. Results show that our method outperforms several pipeline baselines using state-of-the-art sentiment rating and summarisation systems.

References

Gediminas Adomavicius and Alexander Tuzhilin. 2005. Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions. IEEE Transactions on Knowledge and Data Engineering 17(6):734–749.

Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2014. Neural machine translation by jointly learning to align and translate. CoRR abs/1409.0473. http://arxiv.org/abs/1409.0473.

Hongshen Chen, Yue Zhang, and Qun Liu. 2016a. Neural network for heterogeneous annotations. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, EMNLP 2016, Austin, Texas, USA, November 1-4, 2016. pages 731–741. http://aclweb.org/anthology/D/D16/D16-1070.pdf.

Huimin Chen, Maosong Sun, Cunchao Tu, Yankai Lin, and Zhiyuan Liu. 2016b. Neural sentiment classification with user and product attention. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, EMNLP 2016, Austin, Texas, USA, November 1-4, 2016. pages 1650–1659. http://aclweb.org/anthology/D/D16/D16-1171.pdf.

Ronan Collobert, Jason Weston, Leon Bottou, Michael Karlen, Koray Kavukcuoglu, and Pavel P. Kuksa. 2011. Natural language processing (almost) from scratch. Journal of Machine Learning Research 12:2493–2537. http://dl.acm.org/citation.cfm?id=2078186.

Christian Desrosiers and George Karypis. 2011. A comprehensive survey of neighborhood-based recommendation methods. In Recommender Systems Handbook, pages 107–144.

Chris Ding, Tao Li, Wei Peng, and Haesun Park. 2006. Orthogonal nonnegative matrix t-factorizations for clustering. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, pages 126–135.

John C. Duchi, Elad Hazan, and Yoram Singer. 2011. Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research 12:2121–2159. http://dl.acm.org/citation.cfm?id=2021068.

Kavita Ganesan, ChengXiang Zhai, and Jiawei Han. 2010. Opinosis: a graph-based approach to abstractive summarization of highly redundant opinions. In Proceedings of the 23rd International Conference on Computational Linguistics. Association for Computational Linguistics, pages 340–348.

Wenliang Gao, Naoki Yoshinaga, Nobuhiro Kaji, and Masaru Kitsuregawa. 2013. Modeling user leniency and product popularity for sentiment classification. In IJCNLP. pages 1107–1111.

Alex Graves, Greg Wayne, and Ivo Danihelka. 2014. Neural Turing machines. CoRR abs/1410.5401. http://arxiv.org/abs/1410.5401.

Xiangnan He, Hanwang Zhang, Min-Yen Kan, and Tat-Seng Chua. 2016. Fast matrix factorization for online recommendation with implicit feedback. In Proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2016, Pisa, Italy, July 17-21, 2016. pages 549–558. https://doi.org/10.1145/2911451.2911489.

James Henderson, Paola Merlo, Ivan Titov, and Gabriele Musillo. 2013. Multilingual joint parsing of syntactic and semantic dependencies with a latent variable model. Computational Linguistics 39(4):949–998.

Geoffrey E. Hinton, Nitish Srivastava, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. 2012. Improving neural networks by preventing co-adaptation of feature detectors. CoRR abs/1207.0580. http://arxiv.org/abs/1207.0580.

Sepp Hochreiter and Jurgen Schmidhuber. 1997. Long short-term memory. Neural Computation 9(8):1735–1780. https://doi.org/10.1162/neco.1997.9.8.1735.

Minqing Hu and Bing Liu. 2004. Mining and summarizing customer reviews. In Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Seattle, Washington, USA, August 22-25, 2004. pages 168–177. https://doi.org/10.1145/1014052.1014073.

Ozan Irsoy and Claire Cardie. 2014. Opinion mining with deep recurrent neural networks. In EMNLP. pages 720–728.

Niklas Jakob and Iryna Gurevych. 2010. Extracting opinion targets in a single and cross-domain setting with conditional random fields. In Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, EMNLP 2010, 9-11 October 2010, MIT Stata Center, Massachusetts, USA, A meeting of SIGDAT, a Special Interest Group of the ACL. pages 1035–1045. http://www.aclweb.org/anthology/D10-1101.

Paul B. Kantor, Lior Rokach, Francesco Ricci, and Bracha Shapira. 2011. Recommender Systems Handbook.

Yoon Kim. 2014. Convolutional neural networks for sentence classification. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, EMNLP 2014, October 25-29, 2014, Doha, Qatar, A meeting of SIGDAT, a Special Interest Group of the ACL. pages 1746–1751. http://aclweb.org/anthology/D/D14/D14-1181.pdf.

Yehuda Koren. 2008. Factorization meets the neighborhood: a multifaceted collaborative filtering model. In Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, pages 426–434.

Ankit Kumar, Ozan Irsoy, Peter Ondruska, Mohit Iyyer, James Bradbury, Ishaan Gulrajani, Victor Zhong, Romain Paulus, and Richard Socher. 2016. Ask me anything: Dynamic memory networks for natural language processing. In Proceedings of the 33rd International Conference on Machine Learning, ICML 2016, New York City, NY, USA, June 19-24, 2016. pages 1378–1387. http://jmlr.org/proceedings/papers/v48/kumar16.html.

Cane WK Leung, Stephen CF Chan, and Fu-lai Chung. 2006. Integrating collaborative filtering and sentiment analysis: A rating inference approach. In Proceedings of the ECAI 2006 Workshop on Recommender Systems. pages 62–66.

Fangtao Li, Nathan Nan Liu, Hongwei Jin, Kai Zhao, Qiang Yang, and Xiaoyan Zhu. 2011. Incorporating reviewer and product information for review rating prediction. In IJCAI 2011, Proceedings of the 22nd International Joint Conference on Artificial Intelligence, Barcelona, Catalonia, Spain, July 16-22, 2011. pages 1820–1825. https://doi.org/10.5591/978-1-57735-516-8/IJCAI11-305.

Fangtao Li, Sheng Wang, Shenghua Liu, and Ming Zhang. 2014. SUIT: A supervised user-item based topic model for sentiment analysis. In AAAI. pages 1636–1642.

Tao Li, Yi Zhang, and Vikas Sindhwani. 2009. A non-negative matrix tri-factorization approach to sentiment classification with lexical prior knowledge. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 1-Volume 1. Association for Computational Linguistics, pages 244–252.

Chenghua Lin and Yulan He. 2009. Joint sentiment/topic model for sentiment analysis. In Proceedings of the 18th ACM Conference on Information and Knowledge Management, CIKM 2009, Hong Kong, China, November 2-6, 2009. pages 375–384. https://doi.org/10.1145/1645953.1646003.

Chin-Yew Lin. 2004. ROUGE: A package for automatic evaluation of summaries. In Text Summarization Branches Out: Proceedings of the ACL-04 Workshop. Barcelona, Spain, volume 8.

Pengfei Liu, Shafiq R. Joty, and Helen M. Meng. 2015. Fine-grained opinion mining with recurrent neural networks and word embeddings. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, Lisbon, Portugal, September 17-21, 2015. pages 1433–1443. http://aclweb.org/anthology/D/D15/D15-1168.pdf.

Tomas Mikolov, Ilya Sutskever, Kai Chen, Gregory S. Corrado, and Jeffrey Dean. 2013. Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems 26: 27th Annual Conference on Neural Information Processing Systems 2013. Proceedings of a meeting held December 5-8, 2013, Lake Tahoe, Nevada, United States. pages 3111–3119. http://papers.nips.cc/paper/5021-distributed-representations-of-words-and-phrases-and-their-compositionality.

Hitoshi Nishikawa, Takaaki Hasegawa, Yoshihiro Matsuo, and Gen-ichiro Kikui. 2010. Optimizing informativeness and readability for sentiment summarization. In ACL 2010, Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, July 11-16, 2010, Uppsala, Sweden, Short Papers. pages 325–330. http://www.aclweb.org/anthology/P10-2060.

Bo Pang and Lillian Lee. 2005. Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales. In ACL 2005, 43rd Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference, 25-30 June 2005, University of Michigan, USA. pages 115–124. http://aclweb.org/anthology/P/P05/P05-1015.pdf.

Bo Pang and Lillian Lee. 2008. Opinion mining and sentiment analysis. Foundations and Trends in Information Retrieval 2(1-2):1–135.

Guang Qiu, Bing Liu, Jiajun Bu, and Chun Chen. 2011. Opinion word expansion and target extraction through double propagation. Computational Linguistics 37(1):9–27.

Lizhen Qu, Georgiana Ifrim, and Gerhard Weikum. 2010. The bag-of-opinions method for review rating prediction from sparse text patterns. In COLING 2010, 23rd International Conference on Computational Linguistics, Proceedings of the Conference, 23-27 August 2010, Beijing, China. pages 913–921. http://aclweb.org/anthology/C10-1103.

Francesco Ricci, Lior Rokach, Bracha Shapira, and Paul B. Kantor, editors. 2011. Recommender Systems Handbook. Springer. http://www.springerlink.com/content/978-0-387-85819-7.

Alexander M. Rush, Sumit Chopra, and Jason Weston. 2015. A neural attention model for abstractive sentence summarization. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, Lisbon, Portugal, September 17-21, 2015. pages 379–389. http://aclweb.org/anthology/D/D15/D15-1044.pdf.

Badrul M. Sarwar, George Karypis, Joseph A. Konstan, and John Riedl. 2001. Item-based collaborative filtering recommendation algorithms. In Proceedings of the Tenth International World Wide Web Conference, WWW 10, Hong Kong, China, May 1-5, 2001. pages 285–295. https://doi.org/10.1145/371920.372071.

Vivek Kumar Singh, Mousumi Mukherjee, and Ghanshyam Kumar Mehta. 2011. Combining collaborative filtering and sentiment classification for improved movie recommendations. In Multi-disciplinary Trends in Artificial Intelligence - 5th International Workshop, MIWAI 2011, Hyderabad, India, December 7-9, 2011. Proceedings. pages 38–50.

Richard Socher, Alex Perelygin, Jean Y Wu, Jason Chuang, Christopher D Manning, Andrew Y Ng, and Christopher Potts. 2013. Recursive deep models for semantic compositionality over a sentiment treebank. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, 18-21 October 2013, Grand Hyatt Seattle, Seattle, Washington, USA, A meeting of SIGDAT, a Special Interest Group of the ACL.

Xiaoyuan Su and Taghi M Khoshgoftaar. 2009. A survey of collaborative filtering techniques. Advances in Artificial Intelligence 2009:4.

Sainbayar Sukhbaatar, Arthur Szlam, Jason Weston, and Rob Fergus. 2015. End-to-end memory networks. In Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015, December 7-12, 2015, Montreal, Quebec, Canada. pages 2440–2448. http://papers.nips.cc/paper/5846-end-to-end-memory-networks.

Ilya Sutskever, Oriol Vinyals, and Quoc V. Le. 2014. Sequence to sequence learning with neural networks. In Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems 2014, December 8-13 2014, Montreal, Quebec, Canada. pages 3104–3112. http://papers.nips.cc/paper/5346-sequence-to-sequence-learning-with-neural-networks.

Kai Sheng Tai, Richard Socher, and Christopher D. Manning. 2015. Improved semantic representations from tree-structured long short-term memory networks. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing, ACL 2015, July 26-31, 2015, Beijing, China, Volume 1: Long Papers. pages 1556–1566. http://aclweb.org/anthology/P/P15/P15-1150.pdf.

Duyu Tang, Bing Qin, Ting Liu, and Yuekui Yang. 2015. User modeling with neural network for review rating prediction. In Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence, IJCAI 2015, Buenos Aires, Argentina, July 25-31, 2015. pages 1340–1346. http://ijcai.org/Abstract/15/193.

Zhiyang Teng, Duy-Tin Vo, and Yue Zhang. 2016. Context-sensitive lexicon features for neural sentiment analysis. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, EMNLP 2016, Austin, Texas, USA, November 1-4, 2016. pages 1629–1638. http://aclweb.org/anthology/D/D16/D16-1169.pdf.

Ke M. Tran, Arianna Bisazza, and Christof Monz. 2016. Recurrent memory networks for language modeling. In NAACL HLT 2016, The 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, San Diego California, USA, June 12-17, 2016. pages 321–331. http://aclweb.org/anthology/N/N16/N16-1036.pdf.

Xiaojun Wan. 2013. Co-regression for cross-language review rating prediction. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, ACL 2013, 4-9 August 2013, Sofia, Bulgaria, Volume 2: Short Papers. pages 526–531. http://aclweb.org/anthology/P/P13/P13-2094.pdf.

Dong Wang and Yang Liu. 2011. A pilot study of opinion summarization in conversations. In The 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Proceedings of the Conference, 19-24 June, 2011, Portland, Oregon, USA. pages 331–339. http://www.aclweb.org/anthology/P11-1034.

Lu Wang and Wang Ling. 2016. Neural network-based abstract generation for opinions and arguments. In NAACL HLT 2016, The 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, San Diego California, USA, June 12-17, 2016. pages 47–57. http://aclweb.org/anthology/N/N16/N16-1007.pdf.

Mingxuan Wang, Zhengdong Lu, Hang Li, and Qun Liu. 2016. Memory-enhanced decoder for neural machine translation. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, EMNLP 2016, Austin, Texas, USA, November 1-4, 2016. pages 278–286. http://aclweb.org/anthology/D/D16/D16-1027.pdf.

Yao Wu and Martin Ester. 2015. FLAME: A probabilistic model combining aspect based opinion mining and collaborative filtering. In Proceedings of the Eighth ACM International Conference on Web Search and Data Mining, WSDM 2015, Shanghai, China, February 2-6, 2015. pages 199–208. https://doi.org/10.1145/2684822.2685291.

Caiming Xiong, Stephen Merity, and Richard Socher. 2016. Dynamic memory networks for visual and textual question answering. In Proceedings of the 33rd International Conference on Machine Learning, ICML 2016, New York City, NY, USA, June 19-24, 2016. pages 2397–2406. http://jmlr.org/proceedings/papers/v48/xiong16.html.


Bishan Yang and Claire Cardie. 2013. Joint inference for fine-grained opinion extraction. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, ACL 2013, 4-9 August 2013, Sofia, Bulgaria, Volume 1: Long Papers. pages 1640–1649. http://aclweb.org/anthology/P/P13/P13-1161.pdf.

Xiwang Yang, Yang Guo, Yong Liu, and Harald Steck. 2014. A survey of collaborative filtering based social recommender systems. Computer Communications 41:1–10.

Zichao Yang, Diyi Yang, Chris Dyer, Xiaodong He, Alex Smola, and Eduard Hovy. 2016. Hierarchical attention networks for document classification. In NAACL 2016, 15th Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, San Diego, US.

Yongfeng Zhang, Guokun Lai, Min Zhang, Yi Zhang, Yiqun Liu, and Shaoping Ma. 2014. Explicit factor models for explainable recommendation based on phrase-level sentiment analysis. In The 37th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR '14, Gold Coast, QLD, Australia - July 06 - 11, 2014. pages 83–92. https://doi.org/10.1145/2600428.2609579.

Yuan Zhang and David Weiss. 2016. Stack-propagation: Improved representation learning for syntax. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, ACL 2016, August 7-12, 2016, Berlin, Germany, Volume 1: Long Papers. http://aclweb.org/anthology/P/P16/P16-1147.pdf.



