Peer recommendation using negative relevance feedback
DEEPIKA SHUKLA* and C RAVINDRANATH CHOWDARY
Department of Computer Science and Engineering, Indian Institute of Technology (BHU) Varanasi, Varanasi 221005, India
MS received 26 May 2021; revised 30 August 2021; accepted 5 October 2021
Abstract. Recommending a peer to a user based on the user's requirements is a challenging task. Because users may have expertise in multiple sub-domains, peer recommendation is nontrivial. In this paper, we model peers as nodes in a graph and perform a community search. Weighted attributes are associated with every node in the graph, and we propose two novel methods to compute the weights of these attributes. Relevance feedback is a popular technique for improving the performance of retrieval systems; we propose to use negative relevance feedback in an attributed graph for peer recommendation. We use a CL-tree to index the nodes in the graph. We compare the proposed system with a state-of-the-art system on standard datasets, and our system outperforms the rival system.
Keywords. Peer recommendation; negative relevance feedback; relevance feedback; knowledge graph; co-authorship network.
1. Introduction
Recommendation systems are information filtering systems that prioritize and personalize content drawn from large volumes of dynamically generated data. In recent years, many researchers have shown interest in recognizing and characterizing the properties of large-scale graphs [1–6], and such graph-based techniques are used to improve the performance of recommender systems [7–11]. Peer recommendation is a kind of community search problem [12, 13]. In graph science, community search finds the best-fitting community containing a query node; given an attributed query, it returns the community most likely to match the query's needs.
In our context, peer recommendation recommends the most suitable peers for an attributed query. We use a collaboration graph as a knowledge graph that depicts all the authors and their collaborations as nodes and edges, respectively. Here, the collaboration graph is a form of social graph, in which a node represents a user and an edge represents a relationship between two nodes. In figure 1, the collaboration graph is created from a co-author/publication bipartite graph. For example, authors P and Q work together on publication A, so they are connected in the co-author network¹, and their attributes are taken from publication A. Because the nature of the data is not explicit, the quality of the results is unpredictable; generally, we do not know the output of the model a priori [14]. In peer recommendation for an attributed query, the model should recommend both the direct neighbors of the query node that have matching attributes/keywords and peers with similar interests who have not collaborated with the query node before. Most peer recommendation models focus on nodes that are reachable only through structural cohesiveness [15]. Relevance feedback is a popular technique for improving the performance of retrieval systems, and we propose to introduce it into the recommender system. Most feedback techniques depend on positive, or relevant, answers. Negative relevance feedback is a particular case of relevance feedback [16] in which no positive answers are available, and the provided documents or answers are assumed to be irrelevant. We use negative relevance feedback, which helps to find peers that are not direct neighbors of the query node. The proposed peer recommendation system is personalized and recommends peers whose interests and expertise match the attributes of the query. In this paper, we create an attributed graph and generate an index tree for it, which helps in finding nodes with interests similar to those of the query node.
We propose two features to compute the weights of nodes; these features help in finding the most appropriate peers for the query node. The proposed model offers flexibility in choosing the number of attributes in the query, which affects the quality of the recommendation. If the recommended list
*For correspondence
1 A kind of collaboration graph.
Sādhanā (2021) 46:243 © Indian Academy of Sciences
https://doi.org/10.1007/s12046-021-01763-5
Figure 2. Graphs for author ‘Nik Nailah Binti Abdullah’ using negative relevance feedback.
Figure 3. Graphs for author ‘Xuexiang Huang’ using negative relevance feedback.
3. Peer recommendation in dynamic attributed graphs
In this paper, we model peer recommendation as an attributed community query (ACQ) [49] problem. Let $G(V, E, X)$ be an undirected attributed graph with vertex set $V$ and edge set $E$, where every node $v \in V$ is associated with a set of attributes $X_v$. For a given undirected attributed graph $G(V, E, X)$ and a query node $q \in V$ with attribute set $A$, ACQ returns an attributed community (AC) $G_q \subseteq G(V, E, X)$ containing $q$ such that $G_q$ satisfies structural cohesiveness (i.e., maximal connectivity among the nodes in $G_q$) and keyword cohesiveness (i.e., all the nodes have $x$ keywords in common).
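To make the two cohesiveness conditions concrete, the following Python sketch checks whether a candidate vertex set qualifies as an attributed community. It assumes, as in the k-core-based ACQ formulation, that structural cohesiveness means $G_q$ is connected and every member has at least $k$ neighbors inside $G_q$ (with $k$ the core number later supplied in the query), and that keyword cohesiveness means every member contains a common keyword set `shared`. The function name `is_attributed_community` and the adjacency-dictionary representation are hypothetical choices for illustration, and the maximality of the shared keyword set is not checked.

```python
from collections import deque


def is_attributed_community(adj, attrs, members, q, k, shared):
    """Check the structural and keyword cohesiveness conditions for a candidate G_q.

    adj:     dict mapping each vertex to the set of its neighbors in G
    attrs:   dict mapping each vertex to its keyword set X_v
    members: candidate vertex set of G_q
    q:       query vertex
    k:       minimum number of neighbors each member needs inside G_q
    shared:  keyword set that every member must contain
    """
    members = set(members)
    shared = set(shared)
    if q not in members:
        return False
    # Structural cohesiveness (part 1): minimum degree inside G_q is at least k.
    if any(len(adj[v] & members) < k for v in members):
        return False
    # Structural cohesiveness (part 2): G_q is connected (BFS from q within G_q).
    seen, frontier = {q}, deque([q])
    while frontier:
        v = frontier.popleft()
        for u in adj[v] & members:
            if u not in seen:
                seen.add(u)
                frontier.append(u)
    if seen != members:
        return False
    # Keyword cohesiveness: every member carries all keywords in `shared`.
    return all(shared <= attrs[v] for v in members)
```

For instance, in a triangle whose three authors all carry the keywords "graph" and "ml", the whole triangle passes with k = 2 and shared = {"graph", "ml"}, while adding a pendant vertex breaks the minimum-degree condition.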
The graph created in this paper is an attributed co-authorship graph: a type of collaboration graph of published articles in which the attributed nodes are authors/peers that store information related to their areas of interest and expertise, and a link between two nodes indicates collaboration on an article. The attribute values are dynamic, i.e., whenever a new link appears between two nodes, the corresponding attribute values are updated. These attributes and their weights are then used to rank the retrieved communities. The computation of the weights is given in algorithm 1.
3.1 Attributed co-authorship graph creation
Entities in real-life networks have various attributes, and each attribute has its own significance in the respective domain. Among all the attributes of a node, we use some to estimate the node's interests and expertise. Algorithm 1 describes the creation of a dynamic attributed co-authorship graph. The attributes of the nodes are taken from the titles of the author's publications, and every new publication can update the list of authors and their associated attributes. The authors of a publication $p_i$ are represented as a set $A_{p_i} = \{a_1, a_2, a_3, \ldots, a_n\}$. For every author $a_j \in A_{p_i}$, her starting year of publishing ($fy_j$), total number of publications ($pc_j$) and associated keyword attributes are computed (step 2). If $a_j \notin G$, we add $a_j$ to $G$ and compute its associated attribute values: $fy_j$ and $pc_j$ store the year of publication ($yr$) of $p_i$ and 1, respectively (steps 5–9). If $a_j \in G$, then for $p_i$ its $pc_j$ is incremented by one (step 11).
$F_a$ denotes all the keywords of author $a_j$. The keywords taken from the title of $p_i$ are represented as $f_{p_i} = \{k_1, k_2, k_3, \ldots, k_n\}$. If $k_l \in f_{p_i}$ and $k_l \notin F_a$, we add $k_l$ to $F_a$ (steps 13–15). The first-publishing-time ($iky_a$) and latest-publishing-time ($lky_a$) of author $a_j$ for keyword $k_l$ both store $yr$. Here, $iky_a$ is the year in which the author first published on keyword $k_l$, and $lky_a$ is the latest year in which the author published on that keyword. The keyword frequency ($Kkf_a$) of $k_l$ is the total number of the author's publications on $k_l$ and is initially set to one (steps 16–19). If $k_l \in F_a$, $Kkf_a$ is incremented by one and $lky_a$ is updated to $yr$ (steps 18–19). There are edges between all pairs of vertices of $A_{p_i}$ in $G$.
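Algorithm 1 itself is not reproduced in this section, so the Python sketch below is only one possible reading of the update steps described above. It assumes that publications are fed in chronological order (so the latest year can simply be overwritten) and that title keywords have already been extracted and preprocessed; the dictionary layout and the function name `add_publication` are hypothetical.

```python
from itertools import combinations


def add_publication(graph, year, authors, title_keywords):
    """Apply one publication p_i to a dynamic attributed co-authorship graph.

    graph["nodes"] maps author -> {"fy", "pc", "keywords"}, where "keywords" (F_a)
    maps keyword -> {"iky", "lky", "Kkf"}; graph["edges"] is a set of frozensets.
    Assumes publications are fed in chronological order.
    """
    nodes, edges = graph["nodes"], graph["edges"]
    co_authors = set(authors)        # A_{p_i}
    keywords = set(title_keywords)   # f_{p_i}, assumed already preprocessed
    for a in co_authors:
        if a not in nodes:
            # New author: first publishing year fy = yr, publication count pc = 1.
            nodes[a] = {"fy": year, "pc": 1, "keywords": {}}
        else:
            nodes[a]["pc"] += 1
        F_a = nodes[a]["keywords"]
        for kw in keywords:
            if kw not in F_a:
                # First paper on kw: iky = lky = yr, keyword frequency Kkf = 1.
                F_a[kw] = {"iky": year, "lky": year, "Kkf": 1}
            else:
                F_a[kw]["Kkf"] += 1
                F_a[kw]["lky"] = year
    # Add an edge between every pair of co-authors of p_i.
    for a, b in combinations(co_authors, 2):
        edges.add(frozenset((a, b)))
```

Feeding a 2010 paper by P and Q on "graph" and "search", followed by a 2012 paper by P and R on "graph", leaves P with two publications and a "graph" entry spanning 2010 to 2012 with frequency two, and creates the edges P–Q and P–R but not Q–R.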
3.2 CL-tree creation
Because an index significantly improves the efficiency of answering ACQ [48], we create a CL-tree for $G$ [49]. The CL-tree is an indexed tree-like structure based on the nested k-core² property, in which each node has five attributes: vertex_set (the set of vertices with the same connectivity that form the node), parent_node, child_list, core_number (the minimum k-connectivity among all vertices of the node), and inverted list (a dictionary of feature: interested_vertices pairs). For static CL-tree creation, we use the advanced method, which follows a bottom-up approach and is more efficient on real networks than the top-down approach (the basic method). We first compute the core numbers of all vertices of $G$; the nodes of the CL-tree are then created recursively from $k_{max}$ down to $k_{min}$, where $k_{min}$ is zero. For each $k_i$ ($k_{min} \le k_i \le k_{max}$), the vertices of the $k_i$-core are examined to find the components they form, and each node of the CL-tree denotes a single component of a particular core number. To handle changes to the CL-tree, we use the incremental learning approach of [15], which allows nodes and edges to be added to the CL-tree at runtime. A detailed explanation of CL-tree creation is given in [48].
2 The k-core of a graph is the maximal subgraph in which every vertex has degree at least k.
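The core numbers and per-level components that seed the CL-tree can be sketched in Python as follows. This is a simplified illustration under our own assumptions (adjacency sets, a quadratic minimum-degree peeling loop instead of a linear-time bucket algorithm, and no union-find, parent links or inverted lists), not the advanced method of [48]. Each component returned for level $k$ corresponds roughly to one CL-tree node, whose vertex_set would keep only the members with core number exactly $k$.

```python
def core_numbers(adj):
    """Core number of every vertex by repeatedly removing a minimum-degree vertex.

    Quadratic-time for clarity; linear-time bucket-based variants exist.
    """
    degree = {v: len(ns) for v, ns in adj.items()}
    remaining = set(adj)
    core, k = {}, 0
    while remaining:
        v = min(remaining, key=degree.__getitem__)
        k = max(k, degree[v])   # core number = largest removal degree seen so far
        core[v] = k
        remaining.discard(v)
        for u in adj[v]:
            if u in remaining:
                degree[u] -= 1
    return core


def components_by_core(adj, core):
    """For k = k_max down to 0, list the connected components of the k-core."""
    levels = {}
    for k in range(max(core.values(), default=0), -1, -1):
        inside = {v for v in adj if core[v] >= k}
        seen, comps = set(), []
        for s in inside:
            if s in seen:
                continue
            comp, stack = {s}, [s]
            seen.add(s)
            while stack:  # iterative DFS restricted to the k-core
                v = stack.pop()
                for u in adj[v]:
                    if u in inside and u not in seen:
                        seen.add(u)
                        comp.add(u)
                        stack.append(u)
            comps.append(frozenset(comp))
        levels[k] = comps
    return levels
```

For a triangle with one pendant vertex and one isolated vertex, the triangle forms the 2-core, the triangle plus pendant forms the single 1-core component, and the isolated vertex appears as a separate component only at level 0.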
3.3 Peer recommendation
We query the model in the form $(q, k, A)$, where $q$ is the query node, $k$ is the core number, and $A$ is the set of attributes. We search for $q$ in the CL-tree, and the core number of $q$ should be greater than or equal to $k$. The parameter $k$ in the query restricts the community search to nodes³ whose core number is at least $k$. We use the decremental algorithm discussed in [12] for community search; it uses the attributes in $A$ to search for communities. After applying community search, we may obtain multiple communities, and each of these communities may contain one or more peers for a subset of the attributes.
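The decremental idea can be sketched in Python without the CL-tree: try subsets of the query attributes from largest to smallest, restricted to the keywords that $q$ itself carries, and stop at the first size for which some subset yields a $k$-core community containing $q$. This brute-force version is our own simplification for illustration (it enumerates subsets explicitly and recomputes $k$-cores from scratch), whereas the algorithm cited from [12] prunes the search with the index; the function names are hypothetical.

```python
from collections import deque
from itertools import combinations


def k_core_component(adj, candidates, q, k):
    """Component containing q of the k-core of the subgraph induced by `candidates`."""
    alive = set(candidates)
    changed = True
    while changed:  # peel vertices with fewer than k neighbors among `alive`
        changed = False
        for v in list(alive):
            if len(adj[v] & alive) < k:
                alive.discard(v)
                changed = True
    if q not in alive:
        return None
    comp, frontier = {q}, deque([q])
    while frontier:
        v = frontier.popleft()
        for u in adj[v] & alive:
            if u not in comp:
                comp.add(u)
                frontier.append(u)
    return comp


def decremental_search(adj, attrs, q, k, A):
    """Return {keyword_set: community} for the largest subsets of A that yield a community."""
    query_keywords = sorted(set(A) & attrs[q])  # only keywords that q itself carries
    for size in range(len(query_keywords), 0, -1):
        found = {}
        for subset in combinations(query_keywords, size):
            S = frozenset(subset)
            candidates = {v for v in adj if S <= attrs[v]}
            comp = k_core_component(adj, candidates, q, k)
            if comp is not None:
                found[S] = comp
        if found:
            return found
    return {}
```

Because several keyword subsets of the same size can each yield a community, the function returns all of them; this is the situation that the community weighting described next is meant to resolve.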
3.4 Weighted community
The retrieved communities include peers with homogeneous attributes. A node of the graph can have multiple interests, which places her in multiple communities with different keywords. Weighting the communities addresses two issues:
1. For an attributed query, the model may return multiple communities for subsets of the query attributes.
2. If a single publication has many collaborators, the result can achieve maximum keyword cohesiveness and structural cohesiveness even when the peers in the retrieved community have very few publications.
Therefore, to select the best community, we propose two heuristics to weigh communities, and we use these weights to rank the communities and choose the best one.
Figure 4. Graphs for author ‘Yaun-Zhi Song’ for results validation.
Figure 5. Graphs for author ‘Yousuke Hagiwara’ for results validation.
3 CL-tree nodes.
1. Importance (I): The importance $I_a^k$ of a node $a$ in a subdomain $k$ is computed as given in equation 1. A node's importance is measured as the ratio of its total number of publications in a particular area to its total publication duration in that area, where the duration is the period over which the node has continued to publish in that area. In equation 1, $Kkf_a$ denotes the total number of publications by $a$ in a particular field⁴ $k$. The duration is the difference between the year she first published an article in the subdomain ($fkey_a$) and the year of her latest publication in the same subdomain ($lkkey_a$).
If there are $d$ co-authors of a publication $p_i$, each of these $d$ co-authors will have at least $d - 1$ co-authors in $G$. Even if one of them is publishing for the first time, her connectivity will still be $d - 1$; $I$ takes care of this issue.
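Equation 1 is not reproduced in this section, so the Python sketch below is a hedged reading of the prose: importance as the number of publications in subdomain $k$ divided by the length of activity in $k$. Measuring the duration in whole years and adding one, so that an author whose first and latest papers in $k$ fall in the same year gets a duration of one rather than zero, is our assumption; the paper may define the denominator differently. The years and counts in the example are made up for illustration.

```python
def importance(kkf, first_year, latest_year):
    """Hypothetical reading of equation 1: publications in subdomain k per active year.

    kkf:         number of the author's publications in subdomain k (Kkf_a)
    first_year:  year of her first publication in k
    latest_year: year of her latest publication in k
    The +1 (our assumption) gives a one-year duration when both years coincide.
    """
    if latest_year < first_year:
        raise ValueError("latest_year must not precede first_year")
    return kkf / (latest_year - first_year + 1)


# A first-time co-author and an established co-author of the same paper both
# gain d - 1 neighbors, but their importance in k differs.
newcomer = importance(1, 2020, 2020)
established = importance(12, 2015, 2020)
```

Here the newcomer scores 1.0 and the established author 2.0, even though the shared paper gives both of them the same $d - 1$ connectivity.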
7 https://code.google.com/archive/p/word2vec/.
8 It gives the most similar word for any keyword.
9 http://dblp.uni-trier.de/xml/.
10 https://en.wikipedia.org/wiki/Stop_word.
11 https://en.wikipedia.org/wiki/Acronym.
12 https://en.wikipedia.org/wiki/Stemming.
retrieved result has multiple communities, but our selection model selects the best community. The number of common keywords is not the only criterion for selecting a community (figure 7).
Table 2 gives the weights of the retrieved keyword sets for the author ‘Tzu-Kuo Huang’. We observe that the keyword sets contain at most three keywords. Although sets of size three are available, the model chooses a set of size two: the keyword set (generalize, multiclass) has more publications in a short duration, so our model selects this keyword set.
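The selection behavior reported for Table 2, where a smaller keyword set wins because of a higher weight, can be illustrated with a short Python sketch. The aggregation used here (the mean importance over all member/keyword pairs, with importance taken as publications per active year) is a hypothetical choice for illustration, not the paper's weighting formula, and the author labels and keyword names in the usage data are invented.

```python
from statistics import mean


def community_weight(members, keywords, stats):
    """Hypothetical weight: mean importance over all (member, keyword) pairs.

    stats[v][kw] = (Kkf, first_year, latest_year) for author v and keyword kw.
    """
    scores = []
    for v in members:
        for kw in keywords:
            kkf, first, latest = stats[v][kw]
            scores.append(kkf / (latest - first + 1))  # publications per active year
    return mean(scores)


def best_community(candidates, stats):
    """Pick the (keyword_set, members) pair with the highest weight, regardless of set size."""
    return max(candidates.items(), key=lambda item: community_weight(item[1], item[0], stats))
```

With this aggregation, a two-keyword community whose members published several papers in a couple of years outweighs a three-keyword community whose members published sparsely over a decade, mirroring the choice described for Table 2.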
4.1.2 Effect of negative relevance feedback
To show the effectiveness of negative relevance feedback, table 3 gives some results obtained after applying it. We consider all the direct neighbors of the query node as negative points, and relevance is measured using equation 5. Figures 2 and 3 give the communities for the queries ‘Nik Nailah Abdullah’ and ‘Xuexiang Huang’, respectively. As we can see, in figures 2a and 3a the query node is connected to all its neighbors, and the system suggests no new peers. To obtain a new community, we update the query keywords. After applying negative relevance feedback, we obtain the communities given in figures 2b and 3b, respectively. Connectivity among all the nodes of the retrieved community is not required, because individuals are retrieved on the basis of keyword cohesiveness and all the initially retrieved peers are penalized, whereas the query node remains intact. The retrieved results are connected in the real graph through the nested core property.
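A minimal Python sketch of the penalization step follows. It assumes that candidates are scored by the fraction of query keywords they share and that every direct neighbor of $q$ (a negative point) has its score multiplied by a `penalty` factor; the scoring rule, the default penalty of 0.5 and the function name are hypothetical stand-ins for equation 5, which is not reproduced in this section. The query node itself is left untouched and never recommended.

```python
def recommend_with_negative_feedback(adj, attrs, q, query_keywords, penalty=0.5, top_n=5):
    """Rank peers for q by keyword overlap, penalizing q's direct neighbors.

    Direct neighbors are the negative points; `penalty` (hypothetical) scales their
    scores. The query node itself is left out of the ranking.
    """
    query_keywords = set(query_keywords)
    if not query_keywords:
        raise ValueError("query_keywords must not be empty")
    negatives = adj[q]
    scored = []
    for v in adj:
        if v == q:
            continue
        score = len(attrs[v] & query_keywords) / len(query_keywords)
        if v in negatives:
            score *= penalty  # negative relevance feedback on existing collaborators
        if score > 0:
            scored.append((score, v))
    # Highest score first; ties broken by the vertex label for determinism.
    scored.sort(key=lambda pair: (-pair[0], str(pair[1])))
    return [v for _, v in scored[:top_n]]
```

Setting `penalty=0.0` drops existing collaborators entirely, so only peers who have never worked with $q$ are recommended; intermediate values still allow a strongly matching neighbor to appear lower in the list.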
4.1.3 Accuracy validation
We use the retrieved results for recommendation and the relevant results to validate the recommendations. Here, validation checks whether the query node collaborates with the suggested peers in the future. We take three sets of 1000 randomly selected authors and apply peer recommendation to these authors. Table 4 shows a few examples of the validation process, where retrieved_community contains the final peer recommendations and relevant_community contains the actual collaborations for the same query, based on dataset set2. Results are illustrated in figures 4 and 5 for authors ‘Yuan-Zhi Song’ and ‘Yousuke Hagiwara’, respectively.
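The comparison between retrieved_community and relevant_community can be quantified with standard precision and recall, macro-averaged over the sampled authors. The Python sketch below assumes these metrics purely for illustration; this section does not state which measure the evaluation actually reports, and the function names are hypothetical.

```python
def precision_recall(retrieved, relevant):
    """Precision and recall of recommended peers against later actual collaborators."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def macro_average(results):
    """Average precision and recall over (retrieved, relevant) pairs, one per sampled author."""
    pairs = [precision_recall(ret, rel) for ret, rel in results]
    if not pairs:
        return 0.0, 0.0
    return (sum(p for p, _ in pairs) / len(pairs),
            sum(r for _, r in pairs) / len(pairs))
```

For example, recommending four peers of whom two later collaborate with the query author, out of three actual future collaborators, gives a precision of 0.5 and a recall of 2/3 for that author.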
Figures 4a and 5a show original peer recommendations
Shivay Facility’ under the National Supercomputing Mission, Government of India at the Indian Institute of Technology, Varanasi are gratefully acknowledged.
References
[1] Yixiang Fang, Xin Huang, Lu Qin, Ying Zhang, Wenjie Zhang, Reynold Cheng and Xuemin Lin 2020 A survey of community search over big graphs. VLDB J. 29(1): 353–392
[2] Yixiang Fang, Yixing Yang, Wenjie Zhang, Xuemin Lin and Xin Cao 2020 Effective and efficient community search over large heterogeneous information networks. Proc. VLDB Endow. 13(6): 854–867
[3] Xin Huang and Laks V S Lakshmanan 2017 Attribute-driven community search. Proc. VLDB Endow. 10(9): 949–960
[4] Lu Chen, Chengfei Liu, Rui Zhou, Jianxin Li, Xiaochun Yang and Bin Wang 2018 Maximum co-located community search in large scale social networks. Proc. VLDB Endow. 11(10): 1233–1246
[5] Jianxin Li, Xinjue Wang, Ke Deng, Xiaochun Yang, Timos Sellis and Jeffrey Xu Yu 2017 Most influential community search over large social networks. In: 33rd IEEE International Conference on Data Engineering, ICDE 2017, San Diego, CA, USA, April 19–22, 2017, pages 871–882. IEEE Computer Society
[6] Xin Huang, Laks V S Lakshmanan and Jianliang Xu 2017 Community search over big graphs: Models, algorithms, and opportunities. In: 33rd IEEE International Conference on Data Engineering, ICDE 2017, San Diego, CA, USA, April 19–22, 2017, pages 1451–1454. IEEE Computer Society
[7] Jian Wei, Jianhua He, Kai Chen, Yi Zhou and Zuoyin Tang 2017 Collaborative filtering and deep learning based recommendation system for cold start items. Expert Syst. Appl. 69: 29–39