Research Article
Predicting Missing Links Based on a New Triangle Structure
Shenshen Bai,1,2 Longjie Li,1 Jianjun Cheng,1 Shijin Xu,1 and Xiaoyun Chen1
1School of Information Science & Engineering, Lanzhou University, Lanzhou 730000, China
2Department of Electronic and Information Engineering, Lanzhou Vocational Technical College, Lanzhou 730070, China
Correspondence should be addressed to Longjie Li; ljli@lzu.edu.cn and Xiaoyun Chen; chenxy@lzu.edu.cn
Received 1 May 2018; Revised 17 October 2018; Accepted 12 November 2018; Published 2 December 2018
Guest Editor: Katarzyna Musial
Copyright © 2018 Shenshen Bai et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
With the rapid growth of various complex networks, link prediction has become increasingly important because it can discover missing information and predict future interactions between nodes in a network. Recently, the CAR and CCLP indexes have been presented for link prediction by means of different triangle structure information. However, both indexes may lose the contributions of some shared neighbors. In this work, we propose a new index that remedies this weakness and thereby improves the accuracy of link prediction. The proposed index focuses on a new triangle structure, i.e., the triangle formed by one seed node, one common neighbor, and one other node. It emphasizes the importance of these triangles but does not ignore the contribution of any common neighbor. In addition, the proposed index adopts the theory of resource allocation by penalizing large-degree neighbors. The results of comparison with CN, AA, RA, ADP, CAR, CAA, CRA, and CCLP on 12 real-world networks show that the proposed index outperforms the compared methods in terms of AUC and ranking score.
1. Introduction
As a fundamental research hotspot in complex network analysis, link prediction has a wide range of applications in both theory and practice, such as analysis of network evolution [1, 2], recommendation systems [3], and checking potential interactions between proteins in biological networks [4, 5]. The basic task of link prediction is to estimate the missing or latent links between unconnected nodes in a network [6, 7]. To date, a host of algorithms and models have been proposed for link prediction [6, 8, 9]. Reference [8] groups them into two categories: similarity-based approaches and learning-based approaches. A similarity-based approach computes similarity scores between unconnected nodes based on the known information. Then a list of node pairs ranked in descending order of similarity score is obtained, and the node pairs at the top are considered most likely to have links. A learning-based approach formalizes the link prediction problem as a binary classification task [10] and uses machine learning methods to solve it. The key job in a learning-based approach is to construct the feature vectors of node pairs. In general, learning-based approaches are more complicated than similarity-based ones.
The hypothesis behind similarity-based approaches is that the more similar two nodes are, the more likely it is that a link exists between them [8]. This idea is simple and intuitive. Thus, the study of this kind of approach has become the mainstream [6, 9]. The Common Neighbors (CN) index [11], as its name suggests, simply counts the number of common neighbors between two nodes. The Adamic-Adar (AA) [12] and Resource Allocation (RA) [13] indexes are two variants of the CN index; they penalize the contributions of large-degree common neighbors. These indexes are called local methods because they only use local structure information. Besides, some global and quasilocal methods have also been proposed, such as Katz [14], SimRank [15], Random Walks with Restart [16], Local Path [17], FriendLink [18], and Local Random Walk [19].
As complex networks keep growing in size, local methods remain good candidates because they are more efficient in terms of running time than global and quasilocal methods. Therefore, we focus in this study on local methods. Recently, Cannistraci et al. proposed the CAR index [20], which suggests that links between the common neighbors, i.e., local-community-links (LCLs), are more valuable than common neighbors in link prediction. In the CAR index, a local community is a triangle passing through two common neighbors and one seed node. In the example network shown in Figure 1(a), there is one LCL between the common neighbors of seed nodes $a$ and $b$ (see Figure 1(b)). Thus, the CAR index assigns a similarity score of four to nodes $a$ and $b$. However, if we remove the link between $c$ and $d$, CAR will assign a zero similarity score to $a$ and $b$ even though they have four common neighbors. In addition, the idea of LCL is also plugged into the AA, RA, and Jaccard indexes [20]. Later, Wu et al. proposed the CCLP index based on the clustering coefficients of common neighbors. This index considers all triangles passing through a common neighbor. For the example network in Figure 1(a), there are triangles passing through nodes $c$, $d$, and $f$, respectively (see Figure 1(c)). Thus, the CCLP index accumulates the clustering coefficients of nodes $c$, $d$, and $f$ when calculating the similarity between $a$ and $b$ but utterly neglects the contribution of node $e$. In real-world networks, it is possible that there are no triangles passing through some or even all shared neighbors of a node pair. Thus, the CAR and CCLP indexes may assign a very low or even zero similarity score to the node pair even if it has many common neighbors.
Hindawi, Complexity, Volume 2018, Article ID 7312603, 11 pages, https://doi.org/10.1155/2018/7312603
Figure 1: Triangles used in similarity indexes. (a) Example network. (b) Triangles used in CAR index. (c) Triangles used in CCLP index. (d) Triangles used in TRA index. (e) Similarity of $(a, b)$: $CN(a, b) = 4$, $RA(a, b) = 4/3$, $CAR(a, b) = 4$, $CCLP(a, b) = 4/3$, $TRA(a, b) = 49/24$. Legend: seed node; common neighbor; non-common neighbor (neighbor of a common neighbor).
In this paper, we define a new type of triangle structure called the TRA-triangle, which is formed by one seed node, one common neighbor, and one other node (see Figure 1(d)). Based on the TRA-triangle, a new similarity index, namely, the TRA index, is proposed for link prediction. This index suggests that the common neighbors that can form TRA-triangles with a seed node are more important than others. In addition, the proposed index also penalizes large-degree neighbors, as done in the RA index [13]. Although the TRA, CAR-based, and CCLP indexes are all based on triangle structures, the intuitions behind them are different. The CAR-based indexes hold that LCLs are more valuable than common neighbors. The CCLP index is inspired by the CAR index but employs all triangles passing through common neighbors, while the TRA index, which only uses the TRA-triangles, strikes a balance between CAR and CCLP. Furthermore, as aforementioned, the CAR-based and CCLP indexes lose the contribution of those common neighbors with no triangles passing through them, whereas the TRA index counts the contribution of all kinds of common neighbors. Therefore, the TRA index can achieve better prediction accuracy than the CAR-based indexes and the CCLP index. The accuracy of the TRA index is evaluated on 12 real-world networks from various fields. The experimental results show that our index is far superior to the CAR-based indexes and the CCLP index. Take the HEP network, which is very sparse, as an example: the improvements made by TRA over CAR and CCLP under the AUC metric are up to 26.9% and 4.2%, respectively.
The rest of the paper is structured as follows. In Section 2, we describe the link prediction problem and the evaluation metrics, list the compared methods and networks, and describe the Wilcoxon signed-ranks test. Section 3 introduces the proposed method. In Section 4, the experimental results and performance analysis of the proposed method are presented. Finally, Section 5 concludes this work.
2. Preliminaries
2.1. Problem Description and Metric. Consider an undirected and unweighted network $G(V, E)$, in which $V$ and $E$ are the node set and link set, respectively; in this study, multilinks and self-loops are not allowed. Let $N = |V|$ be the number of nodes in the network and let $U$ be the universal set of possible links, which contains $N(N-1)/2$ links. Then the set of nonobserved or nonexisting links is $U - E$. Supposing there are some missing links in $U - E$, the task of link prediction is to find those links. A similarity-based approach assigns a similarity score to each node pair in $U - E$ and assumes that the higher the score of a node pair, the more likely there is a link between them.
To test the performance of a similarity index, we randomly divide the link set $E$ into two parts, training set $E^{tr}$ and testing set $E^{ts}$, such that $E = E^{tr} \cup E^{ts}$ and $E^{tr} \cap E^{ts} = \emptyset$. $E^{tr}$ is treated as the observed information and $E^{ts}$ is used for testing. Two parameter-free metrics are employed to quantify the accuracy of link prediction algorithms: AUC [6] and ranking score [21, 22]. In this setting, the AUC score can be interpreted as the probability that a randomly selected missing link (i.e., a link in $E^{ts}$) is given a higher score than a randomly selected nonexistent link (i.e., a link in $U - E$). In practice, if we perform $n$ independent comparisons and there are $n_1$ times that the missing link has the higher score and $n_2$ times that the two have the same score, the AUC value is computed as
$$AUC = \frac{n_1 + 0.5\, n_2}{n}. \tag{1}$$
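The sampling procedure behind the AUC estimate, $(n_1 + 0.5\,n_2)/n$, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `auc_score`, the callable `score`, the default of 10000 comparisons, and the fixed random seed are all hypothetical choices, not details taken from the paper.

```python
import random


def auc_score(score, test_links, nonexistent_links, n=10000, seed=0):
    """Estimate AUC from n independent comparisons: (n1 + 0.5 * n2) / n."""
    rng = random.Random(seed)  # fixed seed (an assumption) for repeatability
    n1 = n2 = 0
    for _ in range(n):
        missing = rng.choice(test_links)        # a link drawn from E^ts
        absent = rng.choice(nonexistent_links)  # a link drawn from U - E
        s_missing, s_absent = score(*missing), score(*absent)
        if s_missing > s_absent:
            n1 += 1   # the missing link is ranked above the nonexistent one
        elif s_missing == s_absent:
            n2 += 1   # tie, counted with weight 0.5
    return (n1 + 0.5 * n2) / n


# Toy usage with a hypothetical lookup-table scorer.
toy_scores = {("a", "b"): 0.9, ("c", "d"): 0.2, ("a", "d"): 0.2}
print(auc_score(lambda x, y: toy_scores[(x, y)],
                [("a", "b")], [("c", "d"), ("a", "d")]))  # prints 1.0
```

A scorer that cannot separate the two sets at all yields 0.5, which is the usual baseline for reading AUC values.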
Ranking score (RS) takes into account the ranks of the links in the testing set after all candidate links are sorted in descending order of similarity score. Let $H = U - E^{tr}$ be the set of nonobserved links. Let $e_i$ be a missing link in $E^{ts}$ and $r_i$ be its rank. The ranking score of $e_i$ is defined as $RS(e_i) = r_i / |H|$, and the ranking score of the link prediction result is
$$RS = \frac{1}{|E^{ts}|} \sum_{e_i \in E^{ts}} RS(e_i) = \frac{1}{|E^{ts}|} \sum_{e_i \in E^{ts}} \frac{r_i}{|H|}. \tag{2}$$
Note that a higher AUC value is better, whereas a smaller ranking score is better.
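As a rough illustration of the ranking score, the Python sketch below averages $r_i/|H|$ over the test links. The function name `ranking_score`, the dictionary input, and the tie-breaking rule (insertion order) are assumptions made for this example; the paper does not say how ties in rank are resolved.

```python
def ranking_score(scores, test_links):
    """Average of r_i / |H| over the test links.

    scores maps every link in H = U - E^tr to its similarity score.
    Ties are broken by insertion order here (an assumption).
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    rank = {link: position for position, link in enumerate(ranked, start=1)}
    h = len(ranked)
    return sum(rank[e] / h for e in test_links) / len(test_links)


# Toy usage: four candidate links, one of which is a held-out test link.
toy = {("a", "b"): 0.9, ("a", "c"): 0.5, ("b", "d"): 0.1, ("c", "d"): 0.0}
print(ranking_score(toy, [("a", "b")]))  # prints 0.25
```

Because the score is a normalized rank, a perfect predictor places all test links at the top and drives the value toward $|E^{ts}|/(2|H|)$ rather than exactly zero.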
2.2. Local Similarity Indexes. Many similarity indexes have been proposed for link prediction [6, 8, 9]. Here we list the local similarity indexes that are used in our experiments for the purpose of comparison.
(1) The Common Neighbor (CN) index [11] defines the similarity between $x$ and $y$ as the number of their common neighbors, which is
$$CN(x, y) = |\Gamma(x) \cap \Gamma(y)|, \tag{3}$$
where $\Gamma(x)$ denotes the set of neighbors of node $x$.
(2) The Adamic-Adar (AA) index [12] is a variant of the CN index, which holds that small-degree neighbors contribute more than large-degree neighbors when computing similarity. Its definition is as follows:
$$AA(x, y) = \sum_{z \in \Gamma(x) \cap \Gamma(y)} \frac{1}{\log k_z}, \tag{4}$$
where $k_z$ is the degree of node $z$.
(3) The Resource Allocation (RA) index [13] also penalizes large-degree common neighbors, dividing by the degree itself rather than by its logarithm:
$$RA(x, y) = \sum_{z \in \Gamma(x) \cap \Gamma(y)} \frac{1}{k_z}. \tag{5}$$
(4) The Adaptive Degree Penalization (ADP) index [23] penalizes a common neighbor according to its degree and the average clustering coefficient of the network. Therefore, it can automatically adapt to the network. The definition of the ADP index is as follows:
$$ADP(x, y) = \sum_{z \in \Gamma(x) \cap \Gamma(y)} k_z^{-\beta C}, \tag{6}$$
where $\beta$ is a constant and $C$ is the average clustering coefficient of the network. We set $\beta = 2.5$, as suggested by the authors.
(5) The CAR index [20] suggests that two seed nodes are more likely to link together if there are links between their common neighbors, which is defined as
$$CAR(x, y) = CN(x, y) \cdot \sum_{z \in \Gamma(x) \cap \Gamma(y)} \frac{L(z)}{2}, \tag{7}$$
where $L(z)$ is the number of links between $z$ and other common neighbors of $x$ and $y$.
(6) The CAA and CRA indexes [20] are generated by plugging the idea of the CAR index into the AA and RA indexes, respectively, which are defined as
$$CAA(x, y) = \sum_{z \in \Gamma(x) \cap \Gamma(y)} \frac{L(z)}{\log k_z}, \tag{8}$$
$$CRA(x, y) = \sum_{z \in \Gamma(x) \cap \Gamma(y)} \frac{L(z)}{k_z}. \tag{9}$$
(7) The CCLP index, proposed by Wu et al., accumulates the clustering coefficients of the common neighbors:
$$CCLP(x, y) = \sum_{z \in \Gamma(x) \cap \Gamma(y)} CC_z, \tag{10}$$
where $CC_z$ denotes the clustering coefficient of node $z$, which is
$$CC_z = \frac{2 t_z}{k_z (k_z - 1)}, \tag{11}$$
in which $t_z$ is the number of triangles passing through node $z$.
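To make the compared indexes concrete, here is a minimal Python sketch of CN, AA, RA, CAR, CRA, and CCLP. Two things are assumptions rather than facts from the paper: the edge list, which is reconstructed so that it reproduces the values reported for the example of Figure 1 ($CN = 4$, $RA = 4/3$, $CAR = 4$, $CCLP = 4/3$ for the pair $(a, b)$), and the formulas written in the comments, which follow the standard definitions of these indexes (with a natural logarithm for AA). All function names are illustrative.

```python
from fractions import Fraction
from math import log

# Assumed edge list, reconstructed to match the values reported for Figure 1.
EDGES = [("a", "c"), ("a", "d"), ("a", "e"), ("a", "f"), ("a", "g"),
         ("b", "c"), ("b", "d"), ("b", "e"), ("b", "f"),
         ("c", "d"), ("c", "g"), ("f", "h"), ("f", "i"), ("h", "i")]


def build_adjacency(edges):
    """Undirected, unweighted adjacency sets."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def cn(adj, x, y):
    return len(adj[x] & adj[y])                          # |common neighbors|


def aa(adj, x, y):
    return sum(1 / log(len(adj[z])) for z in adj[x] & adj[y])   # sum 1/log k_z


def ra(adj, x, y):
    return sum(Fraction(1, len(adj[z])) for z in adj[x] & adj[y])  # sum 1/k_z


def car(adj, x, y):
    common = adj[x] & adj[y]
    lcl = sum(len(adj[z] & common) for z in common)      # sum of L(z)
    return len(common) * Fraction(lcl, 2)                # CN * sum L(z)/2


def cra(adj, x, y):
    common = adj[x] & adj[y]
    return sum(Fraction(len(adj[z] & common), len(adj[z])) for z in common)


def clustering(adj, z):
    """CC_z = 2 t_z / (k_z (k_z - 1)), with t_z the triangles through z."""
    nbrs = sorted(adj[z])
    k = len(nbrs)
    if k < 2:
        return Fraction(0)
    t = sum(1 for i, u in enumerate(nbrs) for v in nbrs[i + 1:] if v in adj[u])
    return Fraction(2 * t, k * (k - 1))


def cclp(adj, x, y):
    return sum(clustering(adj, z) for z in adj[x] & adj[y])


ADJ = build_adjacency(EDGES)
print(cn(ADJ, "a", "b"), ra(ADJ, "a", "b"), car(ADJ, "a", "b"), cclp(ADJ, "a", "b"))
```

Rebuilding the adjacency without the edge $(c, d)$ leaves $CN(a, b) = 4$ but drops $CAR(a, b)$ to zero, which is the weakness of CAR discussed in the Introduction.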
2.3. Networks. In this study, we use 12 real-world networks drawn from various fields to evaluate the effectiveness of link prediction methods.
(1) Advogato (ADV): a social network whose users are mainly free and open source software developers [25].
(2) Celegans (CE): the neural network of a Caenorhabditis elegans worm [26].
(3) Dolphin: a social network of 62 dolphins in a community living off Doubtful Sound, New Zealand [27].
(4) Email: a network of email interchanges between members of a university [28].
(5) Foodweb (FW): a food web in Florida Bay during the rainy season [29].
(6) Hamster: a friendship network between users of hamsterster.com [30].
(7) HEP: the coauthorship network of scientists who posted preprints on the high-energy theory archive from 1995 to 1999 [31].
(8) Karate: the social network of a karate club at a US university [32].
(9) Political blogs (PB): a network of blogs about US politics [33].
(10) USAir: a network of the US air transportation system [6].
(11) Word: an adjacency network of common adjectives and nouns in the novel "David Copperfield" by Charles Dickens [34].
(12) Yeast: the protein-protein interaction network of budding yeast [35].
In this work, all the aforementioned networks are treated as undirected and unweighted networks, and only the giant component of each network is used. Table 1 lists the basic statistics of the giant components of these networks.
Given network $G(V, E)$, let $x$ and $y$ be two seed nodes. $(x, y)$ is called a seed node pair with common neighbors if $x$ and $y$ have at least one common neighbor. $P_\Lambda$ denotes the set of seed node pairs with common neighbors; formally,
$$P_\Lambda = \{(x, y) \mid (x, y) \notin E \wedge \Gamma(x) \cap \Gamma(y) \neq \emptyset\}. \tag{12}$$
Let $x$, $y$ be two seed nodes and $z$ one of their common neighbors. If $CC_z = 0$, we call $z$ a zero-triangle-neighbor; otherwise, $z$ is a triangle-neighbor. If $L(z) \neq 0$, $z$ is called a CAR-triangle-neighbor, and if $\triangle(x, y, z) \neq 0$ (see (18)), $z$ is called a TRA-triangle-neighbor. Let $S$ be the set of triangle-neighbors, and let $S_{CAR}$ and $S_{TRA}$ denote the sets of CAR- and TRA-triangle-neighbors, respectively. Clearly, $S_{CAR} \subseteq S_{TRA} \subseteq S$. Let $P_\exists(S)$ and $P_\forall(S)$ be two subsets of $P_\Lambda$. For any pair in $P_\exists(S)$, at least one of their shared neighbors is not a triangle-neighbor, and for any pair in $P_\forall(S)$, none of their shared neighbors is a triangle-neighbor. More explicitly,
$$P_\exists(S) = \{(x, y) \in P_\Lambda \mid \exists z \in \Gamma(x) \cap \Gamma(y) : z \notin S\},$$
$$P_\forall(S) = \{(x, y) \in P_\Lambda \mid \forall z \in \Gamma(x) \cap \Gamma(y) : z \notin S\}. \tag{13}$$
Similarly, we define $P_\exists(S_{TRA})$, $P_\forall(S_{TRA})$, $P_\exists(S_{CAR})$, and $P_\forall(S_{CAR})$, which are
$$P_\exists(S_{TRA}) = \{(x, y) \in P_\Lambda \mid \exists z \in \Gamma(x) \cap \Gamma(y) : z \notin S_{TRA}\},$$
$$P_\forall(S_{TRA}) = \{(x, y) \in P_\Lambda \mid \forall z \in \Gamma(x) \cap \Gamma(y) : z \notin S_{TRA}\},$$
$$P_\exists(S_{CAR}) = \{(x, y) \in P_\Lambda \mid \exists z \in \Gamma(x) \cap \Gamma(y) : z \notin S_{CAR}\},$$
$$P_\forall(S_{CAR}) = \{(x, y) \in P_\Lambda \mid \forall z \in \Gamma(x) \cap \Gamma(y) : z \notin S_{CAR}\}. \tag{14}$$
Correspondingly, the ratios of those subsets to $P_\Lambda$ are, respectively, defined as
$$R_\exists(S) = \frac{|P_\exists(S)|}{|P_\Lambda|}, \quad R_\forall(S) = \frac{|P_\forall(S)|}{|P_\Lambda|}, \quad R_\exists(S_{TRA}) = \frac{|P_\exists(S_{TRA})|}{|P_\Lambda|},$$
$$R_\forall(S_{TRA}) = \frac{|P_\forall(S_{TRA})|}{|P_\Lambda|}, \quad R_\exists(S_{CAR}) = \frac{|P_\exists(S_{CAR})|}{|P_\Lambda|}, \quad R_\forall(S_{CAR}) = \frac{|P_\forall(S_{CAR})|}{|P_\Lambda|}. \tag{15}$$
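The six ratios can be computed directly from the definitions of triangle-neighbor, TRA-triangle-neighbor, and CAR-triangle-neighbor. The Python sketch below is an illustration only: the function names and the small test graph are hypothetical, and it assumes the network has at least one unconnected pair with a common neighbor (otherwise the denominator $|P_\Lambda|$ is zero).

```python
from fractions import Fraction
from itertools import combinations


def build_adjacency(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def has_triangle(adj, z):
    """True if CC_z != 0, i.e. at least two neighbors of z are linked."""
    return any(v in adj[u] for u, v in combinations(sorted(adj[z]), 2))


def neighbor_ratios(adj):
    """Return {set name: (R_exists, R_forall)} for S, S_TRA, and S_CAR."""
    # P_Lambda: unconnected pairs with at least one common neighbor.
    pairs = [(x, y) for x, y in combinations(sorted(adj), 2)
             if y not in adj[x] and adj[x] & adj[y]]
    counts = {name: [0, 0] for name in ("S", "S_TRA", "S_CAR")}
    for x, y in pairs:
        common = adj[x] & adj[y]
        member = {
            "S": [has_triangle(adj, z) for z in common],
            # z is linked to both seeds, so a TRA-triangle exists when z
            # shares a neighbor with x or with y.
            "S_TRA": [bool(adj[x] & adj[z]) or bool(adj[y] & adj[z])
                      for z in common],
            # CAR-triangle-neighbor: z is linked to another common neighbor.
            "S_CAR": [bool(adj[z] & common) for z in common],
        }
        for name, flags in member.items():
            counts[name][0] += not all(flags)  # some neighbor lies outside
            counts[name][1] += not any(flags)  # every neighbor lies outside
    total = len(pairs)
    return {name: (Fraction(e, total), Fraction(a, total))
            for name, (e, a) in counts.items()}


# Hypothetical toy network with three unconnected pairs sharing neighbors.
toy = build_adjacency([("a", "c"), ("a", "d"), ("b", "c"), ("b", "d"),
                       ("c", "d"), ("a", "e"), ("b", "e")])
print(neighbor_ratios(toy))
```

On this toy network every pair is affected for CAR ($R_\exists(S_{CAR}) = 1$) while only one pair in three is affected for TRA, which mirrors the inclusion $S_{CAR} \subseteq S_{TRA} \subseteq S$.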
2.4. Wilcoxon Signed-Ranks Test. The Wilcoxon signed-ranks test is a nonparametric statistical hypothesis test used to check whether two methods perform equally well over multiple networks [38, 39]. Let $d_i$ be the difference in the performance scores of two link prediction methods on the $i$th network. The differences are ranked according to their absolute values; in case of ties, average ranks are assigned. Let $R^+$ be the sum of ranks for the networks on which the second method outperformed the first and $R^-$ the sum of ranks for the opposite. For a larger number of networks, the statistic
$$z = \frac{T - \frac{1}{4} N (N + 1)}{\sqrt{\frac{1}{24} N (N + 1)(2N + 1)}} \tag{16}$$
is distributed approximately normally [39]. In (16), $T = \min(R^+, R^-)$ and $N$ is the number of networks.
With $\alpha = 0.05$, if $z$ is smaller than $-1.96$, we reject the null hypothesis, which states that both methods perform equally well.
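The test statistic can be sketched in Python as follows, using the normal approximation $z = (T - N(N+1)/4) / \sqrt{N(N+1)(2N+1)/24}$ with $T = \min(R^+, R^-)$. The function name `wilcoxon_z` is hypothetical, and the handling of zero differences (their ranks are split evenly between $R^+$ and $R^-$) is an assumption, since the text above does not specify it.

```python
from math import sqrt


def wilcoxon_z(first, second):
    """z statistic of the Wilcoxon signed-ranks test (normal approximation)."""
    d = [b - a for a, b in zip(first, second)]
    n = len(d)
    order = sorted(range(n), key=lambda i: abs(d[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend the block of tied absolute differences.
        while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1      # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    r_plus = sum(r for r, di in zip(ranks, d) if di > 0)
    r_minus = sum(r for r, di in zip(ranks, d) if di < 0)
    zero = sum(r for r, di in zip(ranks, d) if di == 0)
    r_plus += zero / 2                      # assumption: split zero ranks
    r_minus += zero / 2
    t = min(r_plus, r_minus)
    return (t - n * (n + 1) / 4) / sqrt(n * (n + 1) * (2 * n + 1) / 24)


# Toy usage: the second method wins on all 12 networks.
z = wilcoxon_z([0] * 12, list(range(1, 13)))
print(z, z < -1.96)
```

With 12 networks, the most extreme value the statistic can take is about $-3.06$, reached when one method wins everywhere.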
Table 1: The basic structural features of the giant components of the 12 networks. $|V|$ and $|E|$ are the total numbers of nodes and edges, respectively. $D$ denotes the density, which is $D = 2|E| / (|V|(|V| - 1))$. $\langle k \rangle$ and $\langle d \rangle$ are the average degree and the average shortest distance, respectively. $C$ and $r$ indicate the clustering coefficient [26] and assortative coefficient [36], respectively. $H$ is the degree heterogeneity [6], defined as $H = \langle k^2 \rangle / \langle k \rangle^2$, and $e$ is the network efficiency [37].
The link prediction problem is closely related to the network evolving mechanism [2, 40]. A recently proposed triangle growth mechanism demonstrates that various key features observed in most real-world networks can be generated in simulated networks [41]. Therefore, triangle structure information has an important effect on link formation.
In this work, we focus on a new triangle structure, namely, the TRA-triangle. A TRA-triangle passes through one seed node, one common neighbor, and one other node. In our opinion, the common neighbors that can form TRA-triangles are more important than others. Given two nodes $u$ and $v$, we denote the number of triangles passing through them as $\triangle(u, v)$, which is
$$\triangle(u, v) = \begin{cases} CN(u, v) & \text{if } (u, v) \in E, \\ 0 & \text{otherwise.} \end{cases} \tag{17}$$
For the example network in Figure 1(a), the triangles used for seed nodes $a$, $b$ are shown in Figure 1(d). Clearly, $\triangle(a, c) = 2$ and $\triangle(a, d) = 1$. Thus, node $c$ is in closer contact with $a$ than $d$ is. Given seed nodes $x$ and $y$ and one of their common neighbors $z$, function $\triangle(x, y, z)$ sums up the number of TRA-triangles formed by $x$, $z$ and by $y$, $z$, which is
$$\triangle(x, y, z) = \triangle(x, z) + \triangle(y, z). \tag{18}$$
In this paper, we propose a new similarity index by combining the aforementioned triangle structure and the idea of the RA index [13]. For convenience, we name our new method the TRA index. Its definition is
$$TRA(x, y) = \sum_{z \in \Gamma(x) \cap \Gamma(y)} \frac{1 + \triangle(x, y, z)/2}{k_z}. \tag{19}$$
In (19), the numerator is $1 + \triangle(x, y, z)/2$. Therefore, the TRA index does not miss the effect of any common neighbor. If all common neighbors are zero-triangle-neighbors, TRA degenerates to RA. For the example network in Figure 1(a), $TRA(a, b) = (1 + 3/2)/4 + (1 + 2/2)/3 + (1 + 0/2)/2 + (1 + 0/2)/4 = 49/24$.
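The worked value $TRA(a, b) = 49/24$ can be checked with a short Python sketch. The edge list below is an assumption: it is reconstructed so that it is consistent with the quantities reported for Figure 1 (four common neighbors with degrees 4, 3, 2, and 4, $\triangle(a, c) = 2$, $\triangle(a, d) = 1$), since the figure itself is not reproduced here. The function names `tri` and `tra` are illustrative.

```python
from fractions import Fraction

# Assumed edge list, reconstructed to match the values reported for Figure 1.
EDGES = [("a", "c"), ("a", "d"), ("a", "e"), ("a", "f"), ("a", "g"),
         ("b", "c"), ("b", "d"), ("b", "e"), ("b", "f"),
         ("c", "d"), ("c", "g"), ("f", "h"), ("f", "i"), ("h", "i")]


def build_adjacency(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def tri(adj, u, v):
    """Triangles through the link (u, v): CN(u, v) if linked, else 0."""
    return len(adj[u] & adj[v]) if v in adj[u] else 0


def tra(adj, x, y):
    """Sum over common neighbors z of (1 + tri(x, y, z) / 2) / k_z."""
    total = Fraction(0)
    for z in adj[x] & adj[y]:
        t = tri(adj, x, z) + tri(adj, y, z)          # TRA-triangles at z
        total += (1 + Fraction(t, 2)) / len(adj[z])  # RA-style degree penalty
    return total


ADJ = build_adjacency(EDGES)
print(tri(ADJ, "a", "c"), tri(ADJ, "a", "d"), tra(ADJ, "a", "b"))  # 2 1 49/24
```

Exact rational arithmetic is used so that the result can be compared with the fraction in the text; on a path $x$-$z$-$y$ with no triangles, the same function returns $1/2$, the RA score.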
Table 3: The AUC of different methods on 12 networks. The results are the average of 50 independent implementations with $|E^{ts}|/|E| = 0.1$. The best performance for each network is emphasized in boldface.
Table 3 lists the prediction results of the different methods in terms of AUC on the 12 networks. The results are obtained by averaging over 50 independent realizations for each network, with the testing set containing 10% of the links. The highest AUC value for each network is highlighted in boldface. Clearly, the TRA index gets the best result on nine of the 12 networks. Meanwhile, the TRA index outperforms the CAR, CAA, CRA, and CCLP indexes on all networks. We can see from Table 2 that, on most of the networks, a varying share of the seed node pairs with common neighbors belongs to $P_\exists(S)$ and/or $P_\forall(S)$. As stated in the Introduction, the CCLP index gives lower or zero similarity scores to those pairs. Furthermore, both $R_\exists(S_{CAR})$ and $R_\forall(S_{CAR})$ are very high on most of the networks. In particular, on Dolphin, Email, Hamster, HEP, and Yeast, the values of $R_\forall(S_{CAR})$ are greater than 0.8. This indicates that only a very small fraction of the seed node pairs with common neighbors on those networks can be assigned similarity scores by the CAR-based indexes. Although some seed node pairs belong to $P_\exists(S_{TRA})$ and/or $P_\forall(S_{TRA})$, the TRA index can still assign reasonable similarity scores to them. Therefore, the results of the TRA index in Table 3 are better than those of the CAR, CAA, CRA, and CCLP indexes. Among the CN, AA, RA, and ADP indexes, ADP performs the best, since it penalizes common neighbors by automatically adapting to the network. On Dolphin, HEP, and USAir, where the ADP index obtains the best accuracy, the performance of our index is close to the best. In addition, the TRA index achieves much better AUC scores than the others on FW and Karate. This result suggests that TRA-triangles play an important role on these two networks. From Table 1, both networks are dense. Roughly speaking, TRA-triangle-neighbors between seed nodes are more likely to exist on dense networks than on sparse ones.
To check whether the proposed index is significantly different from the compared methods, we applied the Wilcoxon signed-ranks test [39] based on the results in Table 3. The pairwise test results are presented in Figure 2. From the statistical point of view, our index is significantly better than the others except the ADP index, because the ADP index has the capability of adapting to the structure of a network automatically. Although there is no statistical difference between our index and the ADP index according to the Wilcoxon signed-ranks test, our index performs better than the ADP index in terms of AUC.
Figure 2: The results of the Wilcoxon signed-ranks test based on Table 3 ($z$ for TRA versus each of CN, AA, RA, ADP, CAR, CAA, CRA, and CCLP; the reference line marks $z = -1.96$). With $\alpha = 0.05$, if $z \le -1.96$, the null hypothesis is rejected.
Figure 3 exhibits the changes of AUC on the 12 networks when the proportion of $E^{ts}$ in $E$ increases from 10% to 20%. It is quite evident from Figure 3 that the AUC values of all indexes show downward trends when the proportion increases from 10% to 20%, except on FW. The reason is that enlarging $E^{ts}$ shrinks the training set $E^{tr}$, which reduces the number of common neighbors between seed nodes. Consequently, link prediction becomes more difficult. The FW network, which possesses a high average degree, a small average shortest distance, and small degree heterogeneity, is a very dense network. Therefore, the decrease of the training set has only a slight influence on accuracy on FW. In addition, we can observe from Figure 3 that the performance of all indexes follows a very similar pattern on ADV, CE, Dolphin, Email, Hamster, HEP, Karate, Word, and Yeast. On these nine networks, the AUC values of the CAR-based indexes are obviously lower than those of the others. On FW, the results of the CAR-based indexes are better than those of the CN, AA, RA, and ADP indexes because FW is a very dense network in which the ratio of CAR-triangle-neighbors is very high (see Table 2). On PB and USAir, the performance of the CAR-based indexes is not as bad as on the other nine networks. The reason is that both networks have high average degrees, small average shortest distances, and high ratios of CAR-triangle-neighbors.
Figure 3: The changes of AUC when $|E^{ts}|/|E|$ increases from 10% to 20% on 12 networks (one panel each for ADV, CE, Dolphin, Email, FW, Hamster, HEP, Karate, PB, USAir, Word, and Yeast; horizontal axis: CN, AA, RA, ADP, CAR, CAA, CRA, CCLP, TRA; vertical axis: AUC). Each point is obtained by averaging over 50 independent realizations.
Furthermore, we list the AUC values of the different methods on the 12 networks when $|E^{ts}|/|E| = 0.2$ in Table 4. Our index outperforms the others on eight of the 12 networks, while the CCLP index achieves the highest value on CE.
Table 5 gives the results in terms of ranking score. These results are similar to those in Table 3. The TRA index achieves the best ranking score on all networks except Dolphin, HEP, and USAir. The pairwise Wilcoxon signed-ranks test results are shown in Figure 4. Similar to the test in Figure 2, the TRA index is significantly better than the compared methods except the ADP index. As noted above, ADP has the adaptive capability and hence performs better than the other compared methods.
Figure 5 describes the changes of ranking score on the 12 networks when $|E^{ts}|/|E|$ increases from 10% to 20%. Clearly, all indexes yield higher ranking scores as $E^{ts}$ grows. Recall that a higher ranking score means lower accuracy. As analyzed above, FW is very dense. Thus, the changes of AUC on FW are very slight (see Figure 3). However, the changes of ranking score on FW are more evident, especially for the CAA and CRA indexes. The reason is that the calculation of ranking score considers all missing links. In addition, as seen in Figure 5, the CAA and CRA indexes perform worse than the CAR index according to ranking score. From the definitions of these three indexes, we find that both CAA and CRA suffer more negative impact than CAR from zero-triangle-neighbors.
Figure 4: The results of the Wilcoxon signed-ranks test based on Table 5 ($z$ for TRA versus each of CN, AA, RA, ADP, CAR, CAA, CRA, and CCLP; the reference line marks $z = -1.96$). With $\alpha = 0.05$, if $z \le -1.96$, the null hypothesis is rejected.
Figure 5: The changes of ranking score when $|E^{ts}|/|E|$ increases from 10% to 20% on 12 networks (one panel each for ADV, CE, Dolphin, Email, FW, Hamster, HEP, Karate, PB, USAir, Word, and Yeast; horizontal axis: CN, AA, RA, ADP, CAR, CAA, CRA, CCLP, TRA; vertical axis: ranking score). Each point is obtained by averaging over 50 independent realizations.
Table 4: The AUC of different methods on 12 networks. The results are the average of 50 independent implementations with $|E^{ts}|/|E| = 0.2$. The best performance for each network is emphasized in boldface.
Table 5: The ranking score of different methods on 12 networks. The results are the average of 50 independent implementations with $|E^{ts}|/|E| = 0.1$. The best performance for each network is emphasized in boldface.
Finally, the ranking scores of all methods on the 12 networks with $|E^{ts}|/|E| = 0.2$ are listed in Table 6. Our index outperforms all other indexes in terms of ranking score except on HEP and USAir. These results are consistent with those for AUC. In contrast with FW, the influence of TRA-triangles on HEP and USAir is small.
From the above results, we can conclude that the TRA index is superior to the CAR-based indexes and the CCLP index and performs better than common-neighbor-based methods on most of the networks.
5. Conclusion and Discussion
Link prediction is an important research topic in complex network analysis and has a wide range of applications in various fields. Inspired by the triangle growth mechanism in network evolution [41], this paper proposed the TRA index for link prediction. When computing the similarity between two seed nodes, the proposed index not only counts the contributions of all common neighbors but also emphasizes the importance of the neighbors that can form TRA-triangles. To some extent, TRA-triangles reflect the close relationships between neighbors and seed nodes. In addition, the proposed index also adopts the theory of resource allocation [13] due to its effectiveness.
The accuracy of the TRA index is experimentally evaluated on 12 real-world networks from various fields in terms of AUC and ranking score. The experimental results show that the proposed index performs far better than the CAR-based indexes. Meanwhile, our index outperforms the CCLP index because, unlike CCLP, it does not discard common neighbors that lie on no triangle. Compared with common-neighbor-based methods, the proposed index yields some improvement in accuracy on most of the networks. These results indicate that combining the information of TRA-triangles with the theory of resource allocation in a similarity index is a helpful idea for link prediction.
Table 6: The ranking score of different methods on 12 networks. The results are the average of 50 independent implementations with $|E^{ts}|/|E| = 0.2$. The best performance for each network is emphasized in boldface.
Several directions remain for improving our index in future work. One is to analyze the degree of influence of TRA-triangles on different networks and, further, to set the weight of TRA-triangles adaptively for each network. The second is to study the application of the TRA index to other topics, such as community detection and anomaly detection. In addition, for learning-based link prediction approaches, the TRA index can be used as a feature of a node pair.
Data Availability
The networks used in this study are available from http://deim.urv.cat/~alexandre.arenas/data/welcome.htm, http://www-personal.umich.edu/~mejn/netdata/, http://vlado.fmf.uni-lj.si/pub/networks/data/, http://noesis.ikor.org/datasets/link-prediction, and http://konect.uni-koblenz.de/networks/.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
This work was supported by the National Natural Science Foundation of China (no. 61602225) and the Fundamental Research Funds for the Central Universities (no. lzujbky-2017-192).
References
[1] Q.-M. Zhang, L. Lü, W.-Q. Wang, Y.-X. Zhu, and T. Zhou, "Potential theory for directed networks," PLoS ONE, vol. 8, no. 2, Article ID e55437, 2013.
[2] Q. Zhang, X. Xu, Y. Zhu, and T. Zhou, "Measuring multiple evolution mechanisms of complex networks," Scientific Reports, vol. 5, no. 1, 2015.
[3] L. Lü, M. Medo, C. H. Yeung, Y. Zhang, Z. Zhang, and T. Zhou, "Recommender systems," Physics Reports, vol. 519, no. 1, pp. 1–49, 2012.
[4] R. Guimerà and M. Sales-Pardo, "Missing and spurious interactions and the reconstruction of complex networks," Proceedings of the National Academy of Sciences of the United States of America, vol. 106, no. 52, pp. 22073–22078, 2009.
[5] S. S. Bhowmick and B. S. Seah, "Clustering and summarizing protein-protein interaction networks: a survey," IEEE Transactions on Knowledge and Data Engineering, vol. 28, no. 3, pp. 638–658, 2016.
[6] L. Lü and T. Zhou, "Link prediction in complex networks: a survey," Physica A: Statistical Mechanics and its Applications, vol. 390, no. 6, pp. 1150–1170, 2011.
[7] L. Li, L. Qian, X. Wang, S. Luo, and X. Chen, "Accurate similarity index based on activity and connectivity of node for link prediction," International Journal of Modern Physics B, vol. 29, no. 17, Article ID 1550108, 15 pages, 2015.
[8] P. Wang, B. Xu, Y. Wu, and X. Zhou, "Link prediction in social networks: the state-of-the-art," Science China Information Sciences, vol. 58, no. 1, pp. 1–38, 2014.
[9] V. Martínez, F. Berzal, and J.-C. Cubero, "A survey of link prediction in complex networks," ACM Computing Surveys, vol. 49, no. 4, pp. 69:1–69:33, 2016.
[10] C. Ahmed, A. ElKorany, and R. Bahgat, "A supervised learning approach to link prediction in Twitter," Social Network Analysis and Mining, vol. 6, no. 1, 2016.
[11] D. Liben-Nowell and J. Kleinberg, "The link-prediction problem for social networks," Journal of the Association for Information Science and Technology, vol. 58, no. 7, pp. 1019–1031, 2007.
[12] L. A. Adamic and E. Adar, "Friends and neighbors on the Web," Social Networks, vol. 25, no. 3, pp. 211–230, 2003.
[13] T. Zhou, L. Lü, and Y.-C. Zhang, "Predicting missing links via local information," The European Physical Journal B, vol. 71, no. 4, pp. 623–630, 2009.
[14] L. Katz, "A new status index derived from sociometric analysis," Psychometrika, vol. 18, no. 1, pp. 39–43, 1953.
[15] G. Jeh and J. Widom, "SimRank," in Proceedings of the Eighth ACM SIGKDD International Conference, p. 538, Edmonton, Alberta, Canada, July 2002.
[16] H. Tong, C. Faloutsos, and J. Pan, "Fast random walk with restart and its applications," in Proceedings of the 6th International Conference on Data Mining (ICDM '06), pp. 613–622, December 2006.
[17] L. Lü, C.-H. Jin, and T. Zhou, "Similarity index based on local paths for link prediction of complex networks," Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, vol. 80, no. 4, Article ID 046122, 2009.
[18] A. Papadimitriou, P. Symeonidis, and Y. Manolopoulos, "Fast and accurate link prediction in social networking systems," The Journal of Systems and Software, vol. 85, no. 9, pp. 2119–2132, 2012.
[19] W. Liu and L. Lü, "Link prediction based on local random walk," EPL (Europhysics Letters), vol. 89, no. 5, Article ID 58007, 2010.
[20] C. V. Cannistraci, G. Alanis-Lobato, and T. Ravasi, "From link-prediction in brain connectomes and protein interactomes to the local-community-paradigm in complex networks," Scientific Reports, vol. 3, article 1613, no. 4, 2013.
[21] B. Chen and L. Chen, "A link prediction algorithm based on ant colony optimization," Applied Intelligence, vol. 41, no. 3, pp. 694–708, 2014.
[22] D. Caiyan, L. Chen, and B. Li, "Link prediction in complex network based on modularity," Soft Computing, vol. 21, no. 15, pp. 4197–4214, 2017.
[23] V. Martínez, F. Berzal, and J.-C. Cubero, "Adaptive degree penalization for link prediction," Journal of Computational Science, vol. 13, pp. 1–9, 2016.
[24] Z. Wu, Y. Lin, J. Wang, and S. Gregory, "Link prediction with node clustering coefficient," Physica A: Statistical Mechanics and its Applications, vol. 452, pp. 1–8, 2016.
[25] P. Massa, M. Salvetti, and D. Tomasoni, "Bowling alone and trust decline in social network sites," in Proceedings of the 8th IEEE International Symposium on Dependable, Autonomic and Secure Computing, DASC 2009, pp. 658–663, China, December 2009.
[26] D. J. Watts and S. H. Strogatz, "Collective dynamics of 'small-world' networks," Nature, vol. 393, no. 6684, pp. 440–442, 1998.
[27] D. Lusseau, K. Schneider, O. J. Boisseau, P. Haase, E. Slooten, and S. M. Dawson, "The bottlenose dolphin community of Doubtful Sound features a large proportion of long-lasting associations: can geographic isolation explain this unique trait?" Behavioral Ecology and Sociobiology, vol. 54, no. 4, pp. 396–405, 2003.
[28] R. Guimerà, L. Danon, A. Díaz-Guilera, F. Giralt, and A. Arenas, "Self-similar community structure in a network of human interactions," Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, vol. 68, no. 6, Article ID 065103, 2003.
[29] R. E. Ulanowicz and D. L. DeAngelis, "Network analysis of trophic dynamics in South Florida ecosystems," in US Geological Survey Program on the South Florida Ecosystem, vol. 114, 45 edition, 2005.
[30] J. Kunegis, "KONECT – the Koblenz network collection," in Proceedings of the 22nd International Conference on World Wide Web (WWW '13), pp. 1343–1350, May 2013.
[31] M. E. Newman, "The structure of scientific collaboration networks," Proceedings of the National Academy of Sciences of the United States of America, vol. 98, no. 2, pp. 404–409, 2001.
[32] W. W. Zachary, "An information flow model for conflict and fission in small groups," Journal of Anthropological Research, vol. 33, no. 4, pp. 452–473, 1977.
[33] L. A. Adamic and N. Glance, "The political blogosphere and the 2004 US election: divided they blog," in Proceedings of the 3rd International Workshop on Link Discovery (LinkKDD '05), pp. 36–43, ACM, 2005.
[34] M. E. J. Newman, "Finding community structure in networks using the eigenvectors of matrices," Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, vol. 74, no. 3, Article ID 036104, 19 pages, 2006.
[35] D. Bu, Y. Zhao, L. Cai et al., "Topological structure analysis of the protein-protein interaction network in budding yeast," Nucleic Acids Research, vol. 31, no. 9, pp. 2443–2450, 2003.
[36] M. E. Newman, "Mixing patterns in networks," Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, vol. 67, no. 2, 2003.
[37] V. Latora and M. Marchiori, "Efficient behavior of small-world networks," Physical Review Letters, vol. 87, no. 19, Article ID 198701, 2001.
[38] F. Wilcoxon, "Individual comparisons by ranking methods," Biometrics Bulletin, vol. 1, no. 6, pp. 80–83, 1945.
[39] J. Demšar, "Statistical comparisons of classifiers over multiple data sets," Journal of Machine Learning Research, vol. 7, pp. 1–30, 2006.
[40] W.-Q. Wang, Q.-M. Zhang, and T. Zhou, "Evaluating network models: a likelihood analysis," EPL (Europhysics Letters), vol. 98, no. 2, Article ID 28004, 2012.
Figure 1: Triangles used in similarity indexes. (a) Example network. (b) Triangles used in CAR index. (c) Triangles used in CCLP index. (d) Triangles used in TRA index. (e) Similarity of $(a, b)$: $CN(a, b) = 4$, $RA(a, b) = 4/3$, $CAR(a, b) = 4$, $CCLP(a, b) = 4/3$, $TRA(a, b) = 49/24$. Legend: seed node; common neighbor; non-common neighbor; neighbor of common neighbor.
community is a triangle passing through two common neighbors and one seed node. In the example network shown in Figure 1(a), there is one LCL between the common neighbors of seed nodes $a$ and $b$ (see Figure 1(b)). Thus, the CAR index assigns a similarity score of four to nodes $a$ and $b$. However, if we remove the link between $c$ and $d$, CAR will assign a zero similarity score to $a$ and $b$, even though they have four common neighbors. In addition, the idea of LCL is also plugged into the AA, RA, and Jaccard indexes [20]. Later, Wu et al. proposed the CCLP index based on the clustering coefficients of common neighbors. This index considers all triangles passing through a common neighbor. For the example network in Figure 1(a), there are triangles passing through nodes $c$, $d$, and $f$, respectively (see Figure 1(c)). Thus, the CCLP index accumulates the clustering coefficients of nodes $c$, $d$, and $f$ when calculating the similarity between $a$ and $b$ but entirely neglects the contribution of node $e$. In real-world networks, it is possible that no triangles pass through some, or even all, shared neighbors of a node pair. Thus, the CAR and CCLP indexes may assign a very low or even zero similarity score to the node pair, even if it has many common neighbors.
In this paper, we define a new type of triangle structure, called the TRA-triangle, which is formed by one seed node, one common neighbor, and one other node (see Figure 1(d)). Based on the TRA-triangle, a new similarity index, namely, the TRA index, is proposed for link prediction. This index holds that the common neighbors that can form TRA-triangles with a seed node are more important than the others. In addition, the proposed index also penalizes large-degree neighbors, as done in the RA index [13]. Although the TRA, CAR-based, and CCLP indexes are all based on triangle structures, the intuitions behind them are different. The CAR-based indexes hold that LCLs are more valuable than common neighbors. The CCLP index is inspired by the CAR index but employs all triangles passing through common neighbors, while the TRA index, which uses only the TRA-triangles, strikes a balance between CAR and CCLP. Furthermore, as mentioned above, the CAR-based and CCLP indexes lose the contribution of those common neighbors with no triangles passing through them, whereas the TRA index counts the contribution of all kinds of common neighbors. Therefore, the TRA index can achieve better prediction accuracy than the CAR-based indexes and the CCLP index. The accuracy of the TRA index is evaluated on 12 real-world networks from various fields. The experimental results show that our index is far superior to the CAR-based indexes and the CCLP index. Take the HEP network, which is very sparse, as an example: the improvements made by TRA over CAR and CCLP in terms of AUC are up to 26.9% and 4.2%, respectively.
The rest of the paper is structured as follows. In Section 2, we describe the link prediction problem and the evaluation metrics, list the compared methods and networks, and outline the Wilcoxon signed-ranks test. Section 3 introduces the proposed method. In Section 4, the experimental results and performance analysis of the proposed method are presented. Finally, Section 5 concludes this work.
2. Preliminaries
2.1. Problem Description and Metric. Consider an undirected and unweighted network $G(V, E)$, in which $V$ and $E$ are the node set and link set, respectively; in this study, multilinks and self-loops are not allowed. Let $N = |V|$ be the number of nodes in the network, and let $U$ be the universal set of possible links, which contains $N(N-1)/2$ links. Then the set of nonobserved, or nonexisting, links is $U - E$. Supposing there are some missing links in $U - E$, the task of link prediction is to find those links. A similarity-based approach assigns a similarity score to each node pair in $U - E$ and assumes that the higher the score of a node pair, the more likely there is a link between its two nodes.
To test the performance of a similarity index, we randomly divide the link set $E$ into two parts, the training set $E^{tr}$ and the testing set $E^{ts}$, such that $E = E^{tr} \cup E^{ts}$ and $E^{tr} \cap E^{ts} = \emptyset$. $E^{tr}$ is treated as the observed information, and $E^{ts}$ is used for testing. Two parameter-free metrics are employed to quantify the accuracy of link prediction algorithms: AUC [6] and ranking score [21, 22]. In this setting, the AUC score can be interpreted as the probability that a randomly selected missing link (i.e., a link in $E^{ts}$) is given a higher score than a randomly selected nonexistent link (i.e., a link in $U - E$). In practice, if we perform $n$ independent comparisons, in which the missing link has the higher score $n_1$ times and the two links have the same score $n_2$ times, the AUC value is computed as

$$AUC = \frac{n_1 + 0.5\, n_2}{n}. \tag{1}$$
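The sampling procedure just described can be sketched as follows. This is a minimal illustration rather than the authors' implementation: it assumes the usual estimator $(n_1 + 0.5\, n_2)/n$, and the function name `auc_by_sampling`, the dictionary of scores, and the fixed random seed are all hypothetical choices made for the example.

```python
import random


def auc_by_sampling(score, missing_links, nonexistent_links, n=10000, seed=0):
    """Estimate AUC as (n1 + 0.5 * n2) / n from n random comparisons.

    score             -- dict mapping a node pair to its similarity score
                         (hypothetical input format; pairs absent from the
                         dict are treated as having score 0)
    missing_links     -- list of pairs in the testing set E^ts
    nonexistent_links -- list of pairs in U - E
    """
    rng = random.Random(seed)
    n1 = n2 = 0
    for _ in range(n):
        s_missing = score.get(rng.choice(missing_links), 0.0)
        s_nonexistent = score.get(rng.choice(nonexistent_links), 0.0)
        if s_missing > s_nonexistent:
            n1 += 1  # the missing link is ranked above the nonexistent one
        elif s_missing == s_nonexistent:
            n2 += 1  # tie
    return (n1 + 0.5 * n2) / n


# Toy scores: both missing links outrank both nonexistent links.
toy_scores = {("a", "b"): 2.0, ("c", "d"): 1.5, ("a", "e"): 0.5, ("b", "f"): 0.0}
auc_perfect = auc_by_sampling(
    toy_scores, [("a", "b"), ("c", "d")], [("a", "e"), ("b", "f")]
)
# A scorer that gives every pair the same score is no better than chance.
auc_chance = auc_by_sampling({}, [("a", "b")], [("a", "e")])
```

With these toy inputs every comparison in the first call is a win and every comparison in the second is a tie, so the two estimates sit at the extremes of a perfect and a chance-level predictor.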
Ranking score (RS) considers the ranks of the links in the testing set after all nonobserved links are sorted in descending order of similarity score. Let $H = U - E^{tr}$ be the set of nonobserved links. Let $e_i$ be a missing link in $E^{ts}$ and $r_i$ be its rank. The ranking score of $e_i$ is defined as $RS(e_i) = r_i / |H|$, and the ranking score of the link prediction result is as follows:

$$RS = \frac{1}{|E^{ts}|} \sum_{e_i \in E^{ts}} RS(e_i) = \frac{1}{|E^{ts}|} \sum_{e_i \in E^{ts}} \frac{r_i}{|H|}. \tag{2}$$
Note that a higher AUC value is better, whereas a smaller ranking score is better.
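The ranking score defined above can be computed directly once every nonobserved pair has a score. The sketch below is only illustrative: the function name `ranking_score` and the input format are hypothetical, and ties are broken by list order, which is an assumption because the text does not say how equal scores are ranked.

```python
def ranking_score(score, testing_links, nonobserved_links):
    """Average of r_i / |H| over the links e_i in the testing set.

    nonobserved_links is H = U - E^tr and must contain the testing links.
    Pairs absent from `score` are treated as having score 0 (assumption).
    """
    ranked = sorted(
        nonobserved_links, key=lambda pair: score.get(pair, 0.0), reverse=True
    )
    # Rank 1 is the pair with the highest similarity score.
    rank = {pair: position for position, pair in enumerate(ranked, start=1)}
    h = len(ranked)
    return sum(rank[pair] / h for pair in testing_links) / len(testing_links)


toy_scores = {("a", "b"): 3.0, ("c", "d"): 2.0, ("a", "e"): 1.0, ("b", "f"): 0.5}
H = [("a", "b"), ("a", "e"), ("c", "d"), ("b", "f")]
# The two testing links take ranks 1 and 2 out of |H| = 4.
rs = ranking_score(toy_scores, [("a", "b"), ("c", "d")], H)
```

Here the result is $(1/4 + 2/4)/2 = 0.375$; a predictor that pushed the testing links further down the list would score closer to 1.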
2.2. Local Similarity Indexes. To date, many similarity indexes have been proposed for link prediction [6, 8, 9]. Here we list the local similarity indexes that are used in our experiments for comparison.

(1) Common Neighbor (CN) index [11] defines the similarity between $x$ and $y$ as the number of their common neighbors, that is,

$$CN(x, y) = |\Gamma(x) \cap \Gamma(y)|, \tag{3}$$

where $\Gamma(x)$ denotes the set of neighbors of node $x$.

(2) Adamic-Adar (AA) index [12] is a variant of the CN index, which holds that small-degree neighbors contribute more than large-degree neighbors when computing similarity. Its definition is as follows:

$$AA(x, y) = \sum_{z \in \Gamma(x) \cap \Gamma(y)} \frac{1}{\log k_z}, \tag{4}$$

where $k_z$ is the degree of node $z$.

(3) Resource Allocation (RA) index [13] also penalizes large-degree common neighbors and is defined as

$$RA(x, y) = \sum_{z \in \Gamma(x) \cap \Gamma(y)} \frac{1}{k_z}. \tag{5}$$

(4) Adaptive Degree Penalization (ADP) index [23] penalizes a common neighbor according to its degree and the average clustering coefficient of the network. Therefore, it can automatically adapt to the network. The definition of the ADP index is as follows:

$$ADP(x, y) = \sum_{z \in \Gamma(x) \cap \Gamma(y)} k_z^{-\beta C}, \tag{6}$$

where $\beta$ is a constant and $C$ is the average clustering coefficient of the network. We set $\beta = 2.5$, as suggested by the authors.

(5) CAR index [20] suggests that two seed nodes are more likely to link together if there are links between their common neighbors, which is defined as

$$CAR(x, y) = CN(x, y) \cdot \sum_{z \in \Gamma(x) \cap \Gamma(y)} \frac{L(z)}{2}, \tag{7}$$

where $L(z)$ is the number of links between $z$ and the other common neighbors of $x$ and $y$.

(6) CAA and CRA indexes [20] are generated by plugging the idea of the CAR index into the AA and RA indexes, respectively, which are defined as

$$CAA(x, y) = \sum_{z \in \Gamma(x) \cap \Gamma(y)} \frac{L(z)}{\log k_z}, \tag{8}$$

$$CRA(x, y) = \sum_{z \in \Gamma(x) \cap \Gamma(y)} \frac{L(z)}{k_z}. \tag{9}$$

(7) CCLP index [24] accumulates the clustering coefficients of the common neighbors, which is defined as

$$CCLP(x, y) = \sum_{z \in \Gamma(x) \cap \Gamma(y)} CC_z, \tag{10}$$

where $CC_z$ denotes the clustering coefficient of node $z$, which is

$$CC_z = \frac{2 t_z}{k_z (k_z - 1)}, \tag{11}$$

in which $t_z$ is the number of triangles passing through node $z$.
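As a concrete check of these indexes, the following sketch evaluates CN, AA, RA, CAR, and CCLP for the seed pair $(a, b)$. It rests on two assumptions: the edge list is a hypothetical wiring chosen to agree with the values reported in Figure 1 (the figure itself is not recoverable here), and the formulas are the standard forms of the indexes from the cited literature. All function names are illustrative.

```python
from fractions import Fraction
from math import log

# Hypothetical edge list chosen to reproduce the values in Figure 1;
# the exact wiring of the figure is an assumption.
EDGES = [("a", "c"), ("a", "d"), ("a", "e"), ("a", "f"), ("a", "g"),
         ("b", "c"), ("b", "d"), ("b", "e"), ("b", "f"),
         ("c", "d"), ("c", "g"), ("f", "h"), ("f", "i"), ("h", "i")]


def build_adjacency(edges):
    """Map each node to the set of its neighbors (undirected, unweighted)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def common(adj, x, y):
    return adj[x] & adj[y]


def cn(adj, x, y):
    return len(common(adj, x, y))


def aa(adj, x, y):
    # A common neighbor always has degree >= 2, so log(k_z) > 0.
    return sum(1 / log(len(adj[z])) for z in common(adj, x, y))


def ra(adj, x, y):
    return sum(Fraction(1, len(adj[z])) for z in common(adj, x, y))


def local_links(adj, x, y, z):
    """L(z): links between z and the other common neighbors of x and y."""
    return len(adj[z] & (common(adj, x, y) - {z}))


def car(adj, x, y):
    # Each link between common neighbors is counted from both endpoints.
    lcl = Fraction(sum(local_links(adj, x, y, z) for z in common(adj, x, y)), 2)
    return cn(adj, x, y) * lcl


def clustering(adj, z):
    """CC_z = 2 t_z / (k_z (k_z - 1)), with t_z the triangles through z."""
    k = len(adj[z])
    if k < 2:
        return Fraction(0)
    t = sum(len(adj[z] & adj[w]) for w in adj[z]) // 2
    return Fraction(2 * t, k * (k - 1))


def cclp(adj, x, y):
    return sum(clustering(adj, z) for z in common(adj, x, y))


adj = build_adjacency(EDGES)
print(cn(adj, "a", "b"), ra(adj, "a", "b"), car(adj, "a", "b"), cclp(adj, "a", "b"))
# prints: 4 4/3 4 4/3
```

Exact fractions are used so that the results can be compared directly with the values $4$, $4/3$, $4$, and $4/3$ listed in Figure 1; AA is left in floating point because of the logarithm.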
2.3. Networks. In this study, we use 12 real-world networks drawn from various fields to evaluate the effectiveness of link prediction methods.

(1) Advogato (ADV): a social network whose users are mainly free and open source software developers [25].
(2) Celegans (CE): the neural network of a Caenorhabditis elegans worm [26].
(3) Dolphin: a social network of 62 dolphins in a community living off Doubtful Sound, New Zealand [27].
(4) Email: a network of email interchanges between members of a university [28].
(5) Foodweb (FW): a food web in Florida Bay during the rainy season [29].
(6) Hamster: a friendship network between users on hamsterster.com [30].
(7) HEP: the coauthorship network of scientists who posted preprints on the high-energy theory archive from 1995 to 1999 [31].
(8) Karate: the social network of a karate club at a US university [32].
(9) Political blogs (PB): a network of blogs about US politics [33].
(10) USAir: a network of the US air transportation system [6].
(11) Word: an adjacency network of common adjectives and nouns in the novel "David Copperfield" by Charles Dickens [34].
(12) Yeast: the protein-protein interaction network of budding yeast [35].

In this work, all the aforementioned networks are treated as undirected and unweighted networks, and only the giant component of each network is used. Table 1 lists the basic statistics of the giant components of these networks.
Given a network $G(V, E)$, let $x$ and $y$ be two seed nodes. $(x, y)$ is called a seed node pair with common neighbors if $x$ and $y$ have at least one common neighbor. $P_\Lambda$ denotes the set of seed node pairs with common neighbors; formally,

$$P_\Lambda = \{(x, y) \mid (x, y) \notin E \text{ and } \Gamma(x) \cap \Gamma(y) \neq \emptyset\}. \tag{12}$$

Let $x$, $y$ be two seed nodes and $z$ be one of their common neighbors. If $CC_z = 0$, we call $z$ a zero-triangle-neighbor; otherwise, $z$ is a triangle-neighbor. If $L(z) \neq 0$, $z$ is called a CAR-triangle-neighbor, and if $\triangle(x, y, z) \neq 0$ (see (18)), $z$ is called a TRA-triangle-neighbor. Let $S$ be the set of triangle-neighbors, and let $S_{CAR}$ and $S_{TRA}$ denote the sets of CAR- and TRA-triangle-neighbors, respectively. Clearly, $S_{CAR} \subseteq S_{TRA} \subseteq S$. Let $P_{\exists}(S)$ and $P_{\forall}(S)$ be two subsets of $P_\Lambda$. For any pair in $P_{\exists}(S)$, at least one of its shared neighbors is not a triangle-neighbor, and for any pair in $P_{\forall}(S)$, none of its shared neighbors is a triangle-neighbor. More explicitly,

$$\begin{aligned}
P_{\exists}(S) &= \{(x, y) \in P_\Lambda \mid \exists z \in \Gamma(x) \cap \Gamma(y) \text{ and } z \notin S\}, \\
P_{\forall}(S) &= \{(x, y) \in P_\Lambda \mid \forall z \in \Gamma(x) \cap \Gamma(y),\ z \notin S\}.
\end{aligned} \tag{13}$$

Similarly, we define $P_{\exists}(S_{TRA})$, $P_{\forall}(S_{TRA})$, $P_{\exists}(S_{CAR})$, and $P_{\forall}(S_{CAR})$, which are

$$\begin{aligned}
P_{\exists}(S_{TRA}) &= \{(x, y) \in P_\Lambda \mid \exists z \in \Gamma(x) \cap \Gamma(y) \text{ and } z \notin S_{TRA}\}, \\
P_{\forall}(S_{TRA}) &= \{(x, y) \in P_\Lambda \mid \forall z \in \Gamma(x) \cap \Gamma(y),\ z \notin S_{TRA}\}, \\
P_{\exists}(S_{CAR}) &= \{(x, y) \in P_\Lambda \mid \exists z \in \Gamma(x) \cap \Gamma(y) \text{ and } z \notin S_{CAR}\}, \\
P_{\forall}(S_{CAR}) &= \{(x, y) \in P_\Lambda \mid \forall z \in \Gamma(x) \cap \Gamma(y),\ z \notin S_{CAR}\}.
\end{aligned} \tag{14}$$

Correspondingly, the ratios of those subsets to $P_\Lambda$ are, respectively, defined as

$$\begin{aligned}
R_{\exists}(S) &= \frac{|P_{\exists}(S)|}{|P_\Lambda|}, & R_{\forall}(S) &= \frac{|P_{\forall}(S)|}{|P_\Lambda|}, \\
R_{\exists}(S_{TRA}) &= \frac{|P_{\exists}(S_{TRA})|}{|P_\Lambda|}, & R_{\forall}(S_{TRA}) &= \frac{|P_{\forall}(S_{TRA})|}{|P_\Lambda|}, \\
R_{\exists}(S_{CAR}) &= \frac{|P_{\exists}(S_{CAR})|}{|P_\Lambda|}, & R_{\forall}(S_{CAR}) &= \frac{|P_{\forall}(S_{CAR})|}{|P_\Lambda|}.
\end{aligned} \tag{15}$$
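The three kinds of neighbors can be told apart with a few set intersections. The sketch below is illustrative only: the edge list is a hypothetical wiring consistent with the values in Figure 1, the function names are made up for the example, and the sets returned are restricted to the common neighbors of a single seed pair.

```python
# Hypothetical edge list consistent with the values reported in Figure 1.
EDGES = [("a", "c"), ("a", "d"), ("a", "e"), ("a", "f"), ("a", "g"),
         ("b", "c"), ("b", "d"), ("b", "e"), ("b", "f"),
         ("c", "d"), ("c", "g"), ("f", "h"), ("f", "i"), ("h", "i")]


def build_adjacency(edges):
    """Map each node to the set of its neighbors (undirected, unweighted)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def classify_common_neighbors(adj, x, y):
    """Return (S, S_TRA, S_CAR) restricted to the common neighbors of x and y."""
    shared = adj[x] & adj[y]
    s, s_tra, s_car = set(), set(), set()
    for z in shared:
        # CC_z != 0 iff z shares a neighbor with one of its own neighbors.
        if any(adj[z] & adj[w] for w in adj[z]):
            s.add(z)
        # z is linked to both seeds, so the TRA-triangle count is nonzero
        # iff z has a common neighbor with x or with y.
        if (adj[x] & adj[z]) or (adj[y] & adj[z]):
            s_tra.add(z)
        # L(z) != 0 iff z is linked to another common neighbor of x and y.
        if adj[z] & (shared - {z}):
            s_car.add(z)
    return s, s_tra, s_car


adj = build_adjacency(EDGES)
shared = adj["a"] & adj["b"]
S, S_TRA, S_CAR = classify_common_neighbors(adj, "a", "b")
in_p_exists_S = any(z not in S for z in shared)   # some shared neighbor outside S
in_p_forall_S = all(z not in S for z in shared)   # every shared neighbor outside S
```

On this example $c$ and $d$ fall in all three sets, $f$ is a triangle-neighbor only, and $e$ is a zero-triangle-neighbor, so the pair $(a, b)$ belongs to $P_{\exists}(S)$ but not to $P_{\forall}(S)$.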
2.4. Wilcoxon Signed-Ranks Test. The Wilcoxon signed-ranks test is a nonparametric statistical hypothesis test used to check whether two methods perform equally well over multiple networks [38, 39]. Let $d_i$ be the difference in performance scores of two link prediction methods on the $i$th network. The differences are ranked in accordance with their absolute values; in case of ties, average ranks are assigned. Let $R^+$ be the sum of ranks for the networks on which the second method outperformed the first and $R^-$ the sum of ranks for the opposite. For a larger number of networks, the statistic

$$z = \frac{T - \frac{1}{4} N (N + 1)}{\sqrt{\frac{1}{24} N (N + 1)(2N + 1)}} \tag{16}$$

is distributed approximately normally [39]. In (16), $T = \min(R^+, R^-)$ and $N$ is the number of networks.

With $\alpha = 0.05$, if $z$ is smaller than $-1.96$, we reject the null-hypothesis, which states that both methods perform equally well.
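The statistic in (16) is easy to compute by hand for a dozen networks. The following sketch is an illustration under stated assumptions, not the authors' code: the function name `wilcoxon_z` is hypothetical, and a zero difference is assumed to add half of its rank to each of $R^+$ and $R^-$, which is one common convention that the text does not spell out.

```python
from math import sqrt


def wilcoxon_z(first, second):
    """z statistic of (16) for paired scores of two methods over N networks."""
    diffs = [b - a for a, b in zip(first, second)]
    n = len(diffs)
    # Rank |d_i| in ascending order, giving tied values their average rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    start = 0
    while start < n:
        end = start
        while end + 1 < n and abs(diffs[order[end + 1]]) == abs(diffs[order[start]]):
            end += 1
        average = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = average
        start = end + 1
    # Assumption: a zero difference adds half of its rank to each sum.
    r_plus = sum(r if d > 0 else r / 2 if d == 0 else 0.0
                 for d, r in zip(diffs, ranks))
    r_minus = sum(r if d < 0 else r / 2 if d == 0 else 0.0
                  for d, r in zip(diffs, ranks))
    t = min(r_plus, r_minus)
    return (t - n * (n + 1) / 4) / sqrt(n * (n + 1) * (2 * n + 1) / 24)


# Made-up scores: the second method wins on all 12 networks, so T = 0.
baseline = [0.5] * 12
better = [0.5 + 0.01 * (i + 1) for i in range(12)]
z_all_wins = wilcoxon_z(baseline, better)
# Identical scores give R+ = R-, hence z = 0.
z_no_difference = wilcoxon_z(baseline, baseline)
```

With $N = 12$ and one method winning everywhere, $z = -39/\sqrt{162.5} \approx -3.06$, which is the most extreme value the test can produce for 12 networks and lies well below the $-1.96$ threshold.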
Table 1: The basic structural features of the giant components of the 12 networks. $|V|$ and $|E|$ are the total numbers of nodes and edges, respectively. $D$ denotes the density, which is $D = 2|E| / (|V|(|V| - 1))$. $\langle k \rangle$ and $\langle d \rangle$ are the average degree and the average shortest distance, respectively. $C$ and $r$ indicate the clustering coefficient [26] and the assortative coefficient [36], respectively. $H$ is the degree heterogeneity [6], defined as $H = \langle k^2 \rangle / \langle k \rangle^2$, and $e$ is the network efficiency [37].
The link prediction problem is closely related to the mechanisms of network evolution [2, 40]. A recently proposed triangle growth mechanism demonstrates that various key features observed in most real-world networks can be generated in simulated networks [41]. Therefore, triangle structure information plays an important role in link formation.
In this work, we focus on a new triangle structure, namely, the TRA-triangle. A TRA-triangle passes through one seed node, one common neighbor, and one other node. In our opinion, the common neighbors that can form TRA-triangles are more important than the others. Given two nodes $u$ and $v$, we denote the number of triangles passing through them as $\triangle(u, v)$, which is

$$\triangle(u, v) = \begin{cases} CN(u, v), & \text{if } (u, v) \in E, \\ 0, & \text{otherwise}. \end{cases} \tag{17}$$

For the example network in Figure 1(a), the triangles used for seed nodes $a$ and $b$ are shown in Figure 1(d). Clearly, $\triangle(a, c) = 2$ and $\triangle(a, d) = 1$. Thus, node $c$ is in closer contact with $a$ than $d$ is. Given seed nodes $x$ and $y$, let $z$ be one of their common neighbors. The function $\triangle(x, y, z)$ sums up the numbers of TRA-triangles formed by $x$, $z$ and by $y$, $z$, which is

$$\triangle(x, y, z) = \triangle(x, z) + \triangle(y, z). \tag{18}$$

In this paper, we propose a new similarity index by combining the aforementioned triangle structure and the idea of the RA index [13]. For convenience, we name our new method the TRA index. Its definition is

$$TRA(x, y) = \sum_{z \in \Gamma(x) \cap \Gamma(y)} \frac{1 + \triangle(x, y, z)/2}{k_z}. \tag{19}$$

In (19), the numerator is $1 + \triangle(x, y, z)/2$. Therefore, the TRA index does not miss the effect of any common neighbor. If all common neighbors are zero-triangle-neighbors, TRA degenerates to RA. For the example network in Figure 1(a), $TRA(a, b) = (1 + 3/2)/4 + (1 + 2/2)/3 + (1 + 0/2)/2 + (1 + 0/2)/4 = 49/24$.
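The worked value $49/24$ can be reproduced with a short script. This is a sketch under assumptions rather than the authors' implementation: the edge list is a hypothetical wiring chosen to match the degrees and triangle counts used in the example, and the function names `triangle` and `tra` are illustrative.

```python
from fractions import Fraction

# Hypothetical edge list consistent with the worked example for Figure 1(a):
# the common neighbors c, d, e, f of a and b have degrees 4, 3, 2, 4.
EDGES = [("a", "c"), ("a", "d"), ("a", "e"), ("a", "f"), ("a", "g"),
         ("b", "c"), ("b", "d"), ("b", "e"), ("b", "f"),
         ("c", "d"), ("c", "g"), ("f", "h"), ("f", "i"), ("h", "i")]


def build_adjacency(edges):
    """Map each node to the set of its neighbors (undirected, unweighted)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def triangle(adj, u, v):
    """Number of common neighbors of u and v if they are linked, else 0, as in (17)."""
    return len(adj[u] & adj[v]) if v in adj[u] else 0


def tra(adj, x, y):
    """Sum over common neighbors z of (1 + (tri(x, z) + tri(y, z)) / 2) / k_z."""
    total = Fraction(0)
    for z in adj[x] & adj[y]:
        tri = triangle(adj, x, z) + triangle(adj, y, z)  # the three-node count of (18)
        total += (1 + Fraction(tri, 2)) / len(adj[z])
    return total


adj = build_adjacency(EDGES)
print(tra(adj, "a", "b"))  # prints 49/24

# Removing the links c-d and c-g leaves no TRA-triangles for the pair (a, b),
# so the score reduces to the plain resource-allocation sum of 1 / k_z.
sparse = build_adjacency([e for e in EDGES if e not in {("c", "d"), ("c", "g")}])
tra_without_triangles = tra(sparse, "a", "b")
ra_without_triangles = sum(Fraction(1, len(sparse[z])) for z in sparse["a"] & sparse["b"])
```

The second half of the script illustrates the degenerate case mentioned above: once no common neighbor forms a TRA-triangle, the TRA and RA scores coincide.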
Table 3: The AUC of different methods on the 12 networks. The results are the average of 50 independent implementations with $|E^{ts}|/|E| = 0.1$. The best performance for each network is emphasized in boldface.
Table 3 lists the prediction results of the different methods in terms of AUC on the 12 networks. The results are obtained by averaging over 50 independent realizations for each network, with the testing set containing 10% of the links. The highest AUC value for each network is highlighted in boldface. Clearly, the TRA index obtains the best result on nine of the 12 networks. Meanwhile, the TRA index outperforms the CAR, CAA, CRA, and CCLP indexes on all networks. We can see from Table 2 that, on most of the networks, varying proportions of the seed node pairs with common neighbors belong to $P_{\exists}(S)$ and/or $P_{\forall}(S)$. As stated in the Introduction, the CCLP index gives lower or zero similarity scores to those pairs. Furthermore, the values of both $R_{\exists}(S_{CAR})$ and $R_{\forall}(S_{CAR})$ are very high on most of the networks. In particular, on Dolphin, Email, Hamster, HEP, and Yeast, the corresponding values of $R_{\forall}(S_{CAR})$ are greater than 0.8. This indicates that only a very small fraction of the seed node pairs with common neighbors on those networks can be assigned similarity scores by the CAR-based indexes. Although some seed node pairs belong to $P_{\exists}(S_{TRA})$ and/or $P_{\forall}(S_{TRA})$, the TRA index can still assign reasonable similarity scores to them. Therefore, the results of the TRA index in Table 3 are better than those of the CAR, CAA, CRA, and CCLP indexes. Among the CN, AA, RA, and ADP indexes, the ADP index performs the best, since it penalizes common neighbors by automatically adapting to the network. On Dolphin, HEP, and USAir, where the ADP index obtains the best accuracy, the performance of our index is close to the best. In addition, the TRA index achieves much better AUC scores than the others on FW and Karate. This result suggests that TRA-triangles play an important role in these two networks. According to Table 1, both networks are dense. Roughly speaking, TRA-triangle-neighbors between seed nodes are more likely to exist in dense networks than in sparse ones.
To check whether the proposed index differs significantly from the compared methods, we applied the Wilcoxon signed-ranks test [39] to the results in Table 3. The pairwise test results are presented in Figure 2. From the statistical point of view, our index is significantly better than the others except the ADP index, because the ADP index is capable of adapting to the structure of a network automatically. Although there is no statistical difference between our index and the ADP index according to the Wilcoxon signed-ranks test, our index performs better than the ADP index in terms of AUC.

Figure 2: The results of the Wilcoxon signed-ranks test based on Table 3, showing the $z$ value for TRA paired with each of CN, AA, RA, ADP, CAR, CAA, CRA, and CCLP, with a reference line at $z = -1.96$. With $\alpha = 0.05$, if $z \le -1.96$, the null-hypothesis is rejected.
Figure 3 exhibits the changes of AUC on the 12 networks when the proportion of $E^{ts}$ in $E$ increases from 10% to 20%. It is evident from Figure 3 that the AUC values of all indexes show downward trends as the proportion increases from 10% to 20%, except on FW. The reason is that enlarging $E^{ts}$ shrinks the training set $E^{tr}$, which in turn reduces the number of common neighbors between seed nodes. Consequently, link prediction becomes more difficult. The FW network, which has a high average degree, a small average shortest distance, and a small degree heterogeneity, is a very dense network. Therefore, shrinking the training set has only a slight influence on accuracy on FW. In addition, we can observe from Figure 3 that the behavior of all indexes on ADV, CE, Dolphin, Email, Hamster, HEP, Karate, Word, and Yeast is very similar. On these nine networks, the AUC values of the CAR-based indexes are obviously lower than those of the others. On FW, the results of the CAR-based indexes are better than those of the CN, AA, RA, and ADP indexes, because FW is a very dense network in which the ratio of CAR-triangle-neighbors is very high (see Table 2). On PB and USAir, the performance of the CAR-based indexes is not as bad as on the other nine networks. The reason is that both networks have high average degrees, small average shortest distances, and high ratios of CAR-triangle-neighbors.

Figure 3: The changes of AUC when $|E^{ts}|/|E|$ increases from 10% to 20% on the 12 networks (panels: ADV, CE, Dolphin, Email, FW, Hamster, HEP, Karate, PB, USAir, Word, and Yeast; each panel plots AUC for CN, AA, RA, ADP, CAR, CAA, CRA, CCLP, and TRA). Each point is obtained by averaging over 50 independent realizations.
Furthermore, we list the AUC values of the different methods on the 12 networks when $|E^{ts}|/|E| = 0.2$ in Table 4. Our index outperforms the others on eight of the 12 networks, while the CCLP index achieves the highest value on CE.
Table 5 gives the results in terms of ranking score. These results are similar to those in Table 3. In terms of ranking score, the TRA index outperforms the others except on Dolphin, HEP, and USAir. The pairwise Wilcoxon signed-ranks test results are shown in Figure 4. Similar to the test in Figure 2, the TRA index is significantly better than the compared methods except the ADP index. As discussed above, ADP has the adaptive capability and hence performs better than the other compared methods.
Figure 5 describes the changes of ranking score on the 12 networks when $|E^{ts}|/|E|$ increases from 10% to 20%. Clearly, all indexes yield higher ranking scores as $E^{ts}$ grows; recall that a higher ranking score means lower accuracy. As analyzed above, FW is very dense, so the changes of AUC on FW are very slight (see Figure 3). However, the changes of ranking score on FW are more evident, especially for the CAA and CRA indexes. The reason is that the calculation of ranking score considers all missing links. In addition, as seen in Figure 5, the CAA and CRA indexes perform worse than the CAR index according to ranking score. From the definitions of these three indexes, we find that both the CAA and CRA indexes suffer a greater negative impact from zero-triangle-neighbors than the CAR index does.

Figure 4: The results of the Wilcoxon signed-ranks test based on Table 5, showing the $z$ value for TRA paired with each of CN, AA, RA, ADP, CAR, CAA, CRA, and CCLP, with a reference line at $z = -1.96$. With $\alpha = 0.05$, if $z \le -1.96$, the null-hypothesis is rejected.

Figure 5: The changes of ranking score when $|E^{ts}|/|E|$ increases from 10% to 20% on the 12 networks (panels: ADV, CE, Dolphin, Email, FW, Hamster, HEP, Karate, PB, USAir, Word, and Yeast; each panel plots ranking score for CN, AA, RA, ADP, CAR, CAA, CRA, CCLP, and TRA). Each point is obtained by averaging over 50 independent realizations.

Table 4: The AUC of different methods on the 12 networks. The results are the average of 50 independent implementations with $|E^{ts}|/|E| = 0.2$. The best performance for each network is emphasized in boldface.

Table 5: The ranking score of different methods on the 12 networks. The results are the average of 50 independent implementations with $|E^{ts}|/|E| = 0.1$. The best performance for each network is emphasized in boldface.
Finally, the ranking scores of all methods on the 12 networks with $|E^{ts}|/|E| = 0.2$ are listed in Table 6. Our index outperforms all other indexes in terms of ranking score except on HEP and USAir. These results are consistent with those for AUC. In contrast to FW, the influence of TRA-triangles on HEP and USAir is small.
From the above results, we can conclude that the TRA index is superior to the CAR-based indexes and the CCLP index and performs better than the common-neighbor-based methods on most of the networks.
5. Conclusion and Discussion
Link prediction is an important research topic in complex network analysis and has a wide range of applications in various fields. Inspired by the triangle growth mechanism in network evolution [41], this paper proposed the TRA index for link prediction. When computing the similarity between two seed nodes, the proposed index not only counts the contributions of all common neighbors but also emphasizes the importance of the neighbors that can form TRA-triangles. To some extent, TRA-triangles reflect the close relationships between neighbors and seed nodes. In addition, the proposed index adopts the theory of resource allocation [13] because of its effectiveness.
The accuracy of the TRA index is experimentally evalu-ated over 12 real-world networks from various fields in termsof AUC and ranking score The experimental results showthat the proposed index performs far better than CAR-basedindexes Meanwhile our index outperforms the CCLP indexbecause of the superior strategy in our index For common-neighbor-based methods the proposed index yields someimprovements of accuracy onmost of networksThese resultsindicate that combining the information of TRA-trianglesand the theory of resource allocation in similarity index is ahelpful idea for link prediction
Table 6: The ranking score of different methods on the 12 networks. The results are the average of 50 independent implementations with $|E^{ts}|/|E| = 0.2$. The best performance for each network is emphasized in boldface.
Several directions remain for future work. The first is to analyze the degree of influence of TRA-triangles on different networks and, further, to set the weight of TRA-triangles adaptively for each network. The second is to study the application of the TRA index to other topics, such as community detection and anomaly detection. In addition, for learning-based link prediction approaches, the TRA index can be used as a feature of a node pair.
Data Availability
The networks used in this study are available from http://deim.urv.cat/~alexandre.arenas/data/welcome.htm, http://www-personal.umich.edu/~mejn/netdata/, http://vlado.fmf.uni-lj.si/pub/networks/data/, http://noesis.ikor.org/datasets/link-prediction, and http://konect.uni-koblenz.de/networks/.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
This work was supported by the National Natural Science Foundation of China (no. 61602225) and the Fundamental Research Funds for the Central Universities (no. lzujbky-2017-192).
References
[1] Q.-M. Zhang, L. Lü, W.-Q. Wang, Y.-X. Zhu, and T. Zhou, "Potential theory for directed networks," PLoS ONE, vol. 8, no. 2, Article ID e55437, 2013.
[2] Q. Zhang, X. Xu, Y. Zhu, and T. Zhou, "Measuring multiple evolution mechanisms of complex networks," Scientific Reports, vol. 5, no. 1, 2015.
[3] L. Lü, M. Medo, C. H. Yeung, Y. Zhang, Z. Zhang, and T. Zhou, "Recommender systems," Physics Reports, vol. 519, no. 1, pp. 1–49, 2012.
[4] R. Guimerà and M. Sales-Pardo, "Missing and spurious interactions and the reconstruction of complex networks," Proceedings of the National Academy of Sciences of the United States of America, vol. 106, no. 52, pp. 22073–22078, 2009.
[5] S. S. Bhowmick and B. S. Seah, "Clustering and summarizing protein-protein interaction networks: a survey," IEEE Transactions on Knowledge and Data Engineering, vol. 28, no. 3, pp. 638–658, 2016.
[6] L. Lü and T. Zhou, "Link prediction in complex networks: a survey," Physica A: Statistical Mechanics and its Applications, vol. 390, no. 6, pp. 1150–1170, 2011.
[7] L. Li, L. Qian, X. Wang, S. Luo, and X. Chen, "Accurate similarity index based on activity and connectivity of node for link prediction," International Journal of Modern Physics B, vol. 29, no. 17, 1550108, 15 pages, 2015.
[8] P. Wang, B. Xu, Y. Wu, and X. Zhou, "Link prediction in social networks: the state-of-the-art," Science China Information Sciences, vol. 58, no. 1, pp. 1–38, 2014.
[9] V. Martínez, F. Berzal, and J.-C. Cubero, "A survey of link prediction in complex networks," ACM Computing Surveys, vol. 49, no. 4, pp. 69:1–69:33, 2016.
[10] C. Ahmed, A. ElKorany, and R. Bahgat, "A supervised learning approach to link prediction in Twitter," Social Network Analysis and Mining, vol. 6, no. 1, 2016.
[11] D. Liben-Nowell and J. Kleinberg, "The link-prediction problem for social networks," Journal of the Association for Information Science and Technology, vol. 58, no. 7, pp. 1019–1031, 2007.
[12] L. A. Adamic and E. Adar, "Friends and neighbors on the Web," Social Networks, vol. 25, no. 3, pp. 211–230, 2003.
[13] T. Zhou, L. Lü, and Y.-C. Zhang, "Predicting missing links via local information," The European Physical Journal B, vol. 71, no. 4, pp. 623–630, 2009.
[14] L. Katz, "A new status index derived from sociometric analysis," Psychometrika, vol. 18, no. 1, pp. 39–43, 1953.
[15] G. Jeh and J. Widom, "SimRank," in Proceedings of the Eighth ACM SIGKDD International Conference, p. 538, Edmonton, Alberta, Canada, July 2002.
[16] H. Tong, C. Faloutsos, and J. Pan, "Fast random walk with restart and its applications," in Proceedings of the 6th International Conference on Data Mining (ICDM '06), pp. 613–622, December 2006.
[17] L. Lü, C.-H. Jin, and T. Zhou, "Similarity index based on local paths for link prediction of complex networks," Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, vol. 80, no. 4, Article ID 046122, 2009.
[18] A. Papadimitriou, P. Symeonidis, and Y. Manolopoulos, "Fast and accurate link prediction in social networking systems," The Journal of Systems and Software, vol. 85, no. 9, pp. 2119–2132, 2012.
[19] W. Liu and L. Lü, "Link prediction based on local random walk," EPL (Europhysics Letters), vol. 89, no. 5, Article ID 58007, 2010.
[20] C. V. Cannistraci, G. Alanis-Lobato, and T. Ravasi, "From link-prediction in brain connectomes and protein interactomes to the local-community-paradigm in complex networks," Scientific Reports, vol. 3, article 1613, no. 4, 2013.
[21] B. Chen and L. Chen, "A link prediction algorithm based on ant colony optimization," Applied Intelligence, vol. 41, no. 3, pp. 694–708, 2014.
[22] D. Caiyan, L. Chen, and B. Li, "Link prediction in complex network based on modularity," Soft Computing, vol. 21, no. 15, pp. 4197–4214, 2017.
[23] V. Martínez, F. Berzal, and J.-C. Cubero, "Adaptive degree penalization for link prediction," Journal of Computational Science, vol. 13, pp. 1–9, 2016.
[24] Z. Wu, Y. Lin, J. Wang, and S. Gregory, "Link prediction with node clustering coefficient," Physica A: Statistical Mechanics and its Applications, vol. 452, pp. 1–8, 2016.
[25] P. Massa, M. Salvetti, and D. Tomasoni, "Bowling alone and trust decline in social network sites," in Proceedings of the 8th IEEE International Symposium on Dependable, Autonomic and Secure Computing (DASC 2009), pp. 658–663, China, December 2009.
[26] D. J. Watts and S. H. Strogatz, "Collective dynamics of 'small-world' networks," Nature, vol. 393, no. 6684, pp. 440–442, 1998.
[27] D. Lusseau, K. Schneider, O. J. Boisseau, P. Haase, E. Slooten, and S. M. Dawson, "The bottlenose dolphin community of Doubtful Sound features a large proportion of long-lasting associations: can geographic isolation explain this unique trait?" Behavioral Ecology and Sociobiology, vol. 54, no. 4, pp. 396–405, 2003.
[28] R. Guimerà, L. Danon, A. Díaz-Guilera, F. Giralt, and A. Arenas, "Self-similar community structure in a network of human interactions," Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, vol. 68, no. 6, Article ID 065103, 2003.
[29] R. E. Ulanowicz and D. L. DeAngelis, "Network analysis of trophic dynamics in South Florida ecosystems," in US Geological Survey Program on the South Florida Ecosystem, vol. 114, 45 edition, 2005.
[30] J. Kunegis, "KONECT – the Koblenz network collection," in Proceedings of the 22nd International Conference on World Wide Web (WWW '13), pp. 1343–1350, May 2013.
[31] M. E. Newman, "The structure of scientific collaboration networks," Proceedings of the National Academy of Sciences of the United States of America, vol. 98, no. 2, pp. 404–409, 2001.
[32] W. W. Zachary, "An information flow model for conflict and fission in small groups," Journal of Anthropological Research, vol. 33, no. 4, pp. 452–473, 1977.
[33] L. A. Adamic and N. Glance, "The political blogosphere and the 2004 US election: divided they blog," in Proceedings of the 3rd International Workshop on Link Discovery (LinkKDD '05), pp. 36–43, ACM, 2005.
[34] M. E. J. Newman, "Finding community structure in networks using the eigenvectors of matrices," Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, vol. 74, no. 3, Article ID 036104, 19 pages, 2006.
[35] D. Bu, Y. Zhao, L. Cai et al., "Topological structure analysis of the protein-protein interaction network in budding yeast," Nucleic Acids Research, vol. 31, no. 9, pp. 2443–2450, 2003.
[36] M. E. Newman, "Mixing patterns in networks," Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, vol. 67, no. 2, 2003.
[37] V. Latora and M. Marchiori, "Efficient behavior of small-world networks," Physical Review Letters, vol. 87, no. 19, Article ID 198701, 2001.
[38] F. Wilcoxon, "Individual comparisons by ranking methods," Biometrics Bulletin, vol. 1, no. 6, pp. 80–83, 1945.
[39] J. Demšar, "Statistical comparisons of classifiers over multiple data sets," Journal of Machine Learning Research, vol. 7, pp. 1–30, 2006.
[40] W.-Q. Wang, Q.-M. Zhang, and T. Zhou, "Evaluating network models: a likelihood analysis," EPL (Europhysics Letters), vol. 98, no. 2, Article ID 28004, 2012.
is to find those links. A similarity-based approach assigns a similarity score to each node pair in $U - E$ and assumes that the higher the score of a node pair, the more likely there is a link between its two nodes.
To test the performance of a similarity index, we randomly divide the link set $E$ into two parts, a training set $E^{tr}$ and a testing set $E^{ts}$, such that $E = E^{tr} \cup E^{ts}$ and $E^{tr} \cap E^{ts} = \emptyset$. $E^{tr}$ is treated as the observed information, and $E^{ts}$ is used for testing. Two parameter-free metrics are employed to quantify the accuracy of link prediction algorithms: AUC [6] and ranking score [21, 22]. In this setting, the AUC score can be interpreted as the probability that a randomly selected missing link (i.e., a link in $E^{ts}$) is given a higher score than a randomly selected nonexistent link (i.e., a link in $U - E$). In practice, if among $n$ independent comparisons the missing link has the higher score $n_1$ times and the two links have the same score $n_2$ times, the AUC value is computed as

$$\mathrm{AUC} = \frac{n_1 + 0.5\, n_2}{n} \tag{1}$$
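The sampling procedure for AUC can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function name `auc_by_sampling`, the callable `score`, the default of 10,000 comparisons, and the toy scores are all assumptions made for the example. The estimate itself is the usual $(n_1 + 0.5\,n_2)/n$.

```python
import random


def auc_by_sampling(score, missing_links, nonexistent_links, n=10000, seed=0):
    """Estimate AUC from n random comparisons (hypothetical helper).

    score             -- callable (x, y) -> similarity score
    missing_links     -- node pairs in the testing set
    nonexistent_links -- node pairs that are links in neither set
    """
    rng = random.Random(seed)
    n1 = 0  # comparisons in which the missing link scores higher
    n2 = 0  # comparisons in which both links score the same
    for _ in range(n):
        s_missing = score(*rng.choice(missing_links))
        s_nonexistent = score(*rng.choice(nonexistent_links))
        if s_missing > s_nonexistent:
            n1 += 1
        elif s_missing == s_nonexistent:
            n2 += 1
    return (n1 + 0.5 * n2) / n


# Toy scores, invented for illustration only.
toy_scores = {("a", "b"): 2.0, ("a", "c"): 1.0, ("b", "d"): 1.0, ("c", "d"): 0.0}
toy_auc = auc_by_sampling(lambda x, y: toy_scores[(x, y)],
                          missing_links=[("a", "b"), ("a", "c")],
                          nonexistent_links=[("b", "d"), ("c", "d")])
```

Because the comparisons are sampled, the value fluctuates slightly with `seed`; a larger `n` reduces that noise at the cost of more score evaluations.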
Ranking score (RS) considers the ranks of the links in the testing set after all nonobserved links are sorted in descending order of similarity score. Let $H = U - E^{tr}$ be the set of nonobserved links, let $e_i$ be a missing link in $E^{ts}$, and let $r_i$ be its rank. The ranking score of $e_i$ is defined as $RS(e_i) = r_i / |H|$, and the ranking score of the link prediction result is

$$RS = \frac{1}{|E^{ts}|} \sum_{e_i \in E^{ts}} RS(e_i) = \frac{1}{|E^{ts}|} \sum_{e_i \in E^{ts}} \frac{r_i}{|H|} \tag{2}$$
Note that a higher AUC value indicates better accuracy, whereas a smaller ranking score indicates better accuracy.
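The ranking score can be sketched in the same spirit. The helper below is hypothetical: it assumes the scores of all nonobserved pairs in $H$ are already available in a dictionary, and it breaks ties by the order in which Python's stable sort leaves them, a detail the text does not specify.

```python
def ranking_score(scores, test_links):
    """Average relative rank of the testing links (hypothetical helper).

    scores     -- dict mapping every nonobserved pair in H to its similarity
    test_links -- the pairs of the testing set, all of which must be keys of scores
    """
    # Rank 1 goes to the highest-scoring pair; ties keep insertion order.
    ordered = sorted(scores, key=scores.get, reverse=True)
    rank = {pair: position for position, pair in enumerate(ordered, start=1)}
    size_h = len(ordered)
    return sum(rank[link] / size_h for link in test_links) / len(test_links)


# Toy scores, invented for illustration only.
toy_h = {("a", "b"): 0.9, ("a", "c"): 0.5, ("b", "d"): 0.1, ("c", "d"): 0.0}
rs_good = ranking_score(toy_h, [("a", "b")])             # link ranked first of four
rs_mixed = ranking_score(toy_h, [("a", "b"), ("b", "d")])
```

A testing link ranked first among four candidates contributes 1/4, so a predictor that pushes missing links to the top drives the score toward its minimum.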
2.2. Local Similarity Indexes. Many similarity indexes have been proposed for link prediction [6, 8, 9]. Here, we list the local similarity indexes that are used in our experiments for comparison.

(1) Common Neighbor (CN) index [11] defines the similarity between $x$ and $y$ as the number of their common neighbors:

$$CN(x, y) = |\Gamma(x) \cap \Gamma(y)| \tag{3}$$

where $\Gamma(x)$ denotes the set of neighbors of node $x$.

(2) Adamic-Adar (AA) index [12] is a variant of the CN index which assumes that small-degree neighbors contribute more than large-degree neighbors when computing similarity. Its definition is

$$AA(x, y) = \sum_{z \in \Gamma(x) \cap \Gamma(y)} \frac{1}{\log k_z} \tag{4}$$

where $k_z$ is the degree of node $z$.

(3) Resource Allocation (RA) index [13] also penalizes large-degree common neighbors, but more heavily than the AA index:

$$RA(x, y) = \sum_{z \in \Gamma(x) \cap \Gamma(y)} \frac{1}{k_z} \tag{5}$$

(4) Adaptive Degree Penalization (ADP) index [23] penalizes a common neighbor according to its degree and the average clustering coefficient of the network. Therefore, it can automatically adapt to the network. The definition of the ADP index is

$$ADP(x, y) = \sum_{z \in \Gamma(x) \cap \Gamma(y)} k_z^{-\beta C} \tag{6}$$

where $\beta$ is a constant and $C$ is the average clustering coefficient of the network. We set $\beta = 2.5$, as suggested by the authors.

(5) CAR index [20] suggests that two seed nodes are more likely to link together if there are links between their common neighbors. It is defined as

$$CAR(x, y) = CN(x, y) \cdot \sum_{z \in \Gamma(x) \cap \Gamma(y)} \frac{L(z)}{2} \tag{7}$$

where $L(z)$ is the number of links between $z$ and the other common neighbors of $x$ and $y$.

(6) CAA and CRA indexes [20] are generated by plugging the idea of the CAR index into the AA and RA indexes, respectively. They are defined as

$$CAA(x, y) = \sum_{z \in \Gamma(x) \cap \Gamma(y)} \frac{L(z)}{\log k_z} \tag{8}$$

$$CRA(x, y) = \sum_{z \in \Gamma(x) \cap \Gamma(y)} \frac{L(z)}{k_z} \tag{9}$$

(7) CCLP index [24] sums the clustering coefficients of the common neighbors:

$$CCLP(x, y) = \sum_{z \in \Gamma(x) \cap \Gamma(y)} CC_z \tag{10}$$

where $CC_z$ denotes the clustering coefficient of node $z$, which is

$$CC_z = \frac{2 t_z}{k_z (k_z - 1)} \tag{11}$$

in which $t_z$ is the number of triangles passing through node $z$.
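The indexes above can be illustrated on a small adjacency-set representation. The sketch below is an assumption-laden example, not the authors' code: the formulas follow the verbal definitions in this section together with the standard forms of these indexes in the cited papers, the helper names are hypothetical, and the toy graph is invented for the example.

```python
import math
from itertools import combinations


def build_graph(edges):
    """Undirected, unweighted graph as a dict of neighbor sets."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def common(adj, x, y):
    return adj[x] & adj[y]


def cn(adj, x, y):
    return len(common(adj, x, y))


def aa(adj, x, y):
    # A common neighbor has degree >= 2, so log(k_z) is positive.
    return sum(1 / math.log(len(adj[z])) for z in common(adj, x, y))


def ra(adj, x, y):
    return sum(1 / len(adj[z]) for z in common(adj, x, y))


def local_links(adj, x, y, z):
    """L(z): links between z and the other common neighbors of x and y."""
    return len(adj[z] & common(adj, x, y))


def car(adj, x, y):
    # Each link among common neighbors is counted from both ends, hence / 2.
    return cn(adj, x, y) * sum(local_links(adj, x, y, z) for z in common(adj, x, y)) / 2


def cra(adj, x, y):
    return sum(local_links(adj, x, y, z) / len(adj[z]) for z in common(adj, x, y))


def clustering(adj, z):
    """CC_z = 2 t_z / (k_z (k_z - 1)), with t_z the triangles through z."""
    k = len(adj[z])
    if k < 2:
        return 0.0
    t = sum(1 for u, v in combinations(adj[z], 2) if v in adj[u])
    return 2 * t / (k * (k - 1))


def cclp(adj, x, y):
    return sum(clustering(adj, z) for z in common(adj, x, y))


# Invented toy graph: a and b share the neighbors c, d, e and f.
toy = build_graph([("a", "c"), ("a", "d"), ("a", "e"), ("a", "f"), ("a", "p"),
                   ("b", "c"), ("b", "d"), ("b", "e"), ("b", "f"),
                   ("c", "d"), ("c", "p"), ("f", "g"), ("f", "h")])
```

On this graph only the link between c and d connects two common neighbors, so the CAR-based indexes see nothing in e and f, whereas CN, AA, and RA still count them.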
2.3. Networks. In this study, we use 12 real-world networks drawn from various fields to evaluate the effectiveness of link prediction methods.
(1) Advogato (ADV): a social network whose users are mainly free and open source software developers [25].
(2) Celegans (CE): the neural network of a Caenorhabditis elegans worm [26].
(3) Dolphin: a social network of 62 dolphins in a community living off Doubtful Sound, New Zealand [27].
(4) Email: a network of email interchanges between members of a university [28].
(5) Foodweb (FW): a food web in Florida Bay during the rainy season [29].
(6) Hamster: a friendship network between users of hamsterster.com [30].
(7) HEP: the coauthorship network of scientists who posted preprints on the high-energy theory archive from 1995 to 1999 [31].
(8) Karate: the social network of a karate club at a US university [32].
(9) Political blogs (PB): a network of blogs about US politics [33].
(10) USAir: a network of the US air transportation system [6].
(11) Word: an adjacency network of common adjectives and nouns in the novel "David Copperfield" by Charles Dickens [34].
(12) Yeast: the protein-protein interaction network of budding yeast [35].
In this work, all the aforementioned networks are treated as undirected and unweighted networks, and only the giant component of each network is used. Table 1 lists the basic statistics of the giant components of these networks.
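The preprocessing step, treating links as undirected and unweighted and keeping only the giant component, can be sketched with a breadth-first search. The function name `giant_component` and the edge-list input format are assumptions for this illustration; the paper does not describe its own tooling.

```python
from collections import deque


def giant_component(edges):
    """Return (nodes, links) of the largest connected component."""
    adj = {}
    for u, v in edges:  # direction and multiplicity of links are ignored
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    seen = set()
    best = set()
    for start in adj:
        if start in seen:
            continue
        # Breadth-first search collects the component containing start.
        component = {start}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbor in adj[node]:
                if neighbor not in component:
                    component.add(neighbor)
                    queue.append(neighbor)
        seen |= component
        if len(component) > len(best):
            best = component
    links = {frozenset((u, v)) for u, v in edges if u in best and v in best}
    return best, links


# Invented toy input: a triangle (with one link listed twice) and a separate pair.
nodes, links = giant_component([(1, 2), (2, 1), (2, 3), (3, 1), (4, 5)])
```

Storing each link as a `frozenset` collapses reciprocal or repeated entries, which is one simple way to obtain an undirected, unweighted link set.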
Given a network $G(V, E)$, let $x$ and $y$ be two seed nodes. $(x, y)$ is called a seed node pair with common neighbors if $x$ and $y$ have at least one common neighbor. $P_\Lambda$ denotes the set of seed node pairs with common neighbors; formally,

$$P_\Lambda = \{(x, y) \mid (x, y) \notin E \wedge \Gamma(x) \cap \Gamma(y) \neq \emptyset\} \tag{12}$$

Let $x$ and $y$ be two seed nodes, and let $z$ be one of their common neighbors. If $CC_z = 0$, we call $z$ a zero-triangle-neighbor; otherwise, $z$ is a triangle-neighbor. If $L(z) \neq 0$, $z$ is called a CAR-triangle-neighbor, and if $\triangle(x, y, z) \neq 0$ (see (18)), $z$ is called a TRA-triangle-neighbor. Let $S$ be the set of triangle-neighbors, and let $S_{CAR}$ and $S_{TRA}$ denote the sets of CAR- and TRA-triangle-neighbors, respectively. Clearly, $S_{CAR} \subseteq S_{TRA} \subseteq S$. Let $P_\exists(S)$ and $P_\forall(S)$ be two subsets of $P_\Lambda$. For any pair in $P_\exists(S)$, at least one of their shared neighbors is not a triangle-neighbor, and for any pair in $P_\forall(S)$, none of their shared neighbors is a triangle-neighbor. More explicitly,

$$\begin{aligned} P_\exists(S) &= \{(x, y) \in P_\Lambda \mid \exists z \in \Gamma(x) \cap \Gamma(y) \wedge z \notin S\} \\ P_\forall(S) &= \{(x, y) \in P_\Lambda \mid \forall z \in \Gamma(x) \cap \Gamma(y) \wedge z \notin S\} \end{aligned} \tag{13}$$

Similarly, we define $P_\exists(S_{TRA})$, $P_\forall(S_{TRA})$, $P_\exists(S_{CAR})$, and $P_\forall(S_{CAR})$, which are

$$\begin{aligned} P_\exists(S_{TRA}) &= \{(x, y) \in P_\Lambda \mid \exists z \in \Gamma(x) \cap \Gamma(y) \wedge z \notin S_{TRA}\} \\ P_\forall(S_{TRA}) &= \{(x, y) \in P_\Lambda \mid \forall z \in \Gamma(x) \cap \Gamma(y) \wedge z \notin S_{TRA}\} \\ P_\exists(S_{CAR}) &= \{(x, y) \in P_\Lambda \mid \exists z \in \Gamma(x) \cap \Gamma(y) \wedge z \notin S_{CAR}\} \\ P_\forall(S_{CAR}) &= \{(x, y) \in P_\Lambda \mid \forall z \in \Gamma(x) \cap \Gamma(y) \wedge z \notin S_{CAR}\} \end{aligned} \tag{14}$$

Correspondingly, the ratios of those subsets to $P_\Lambda$ are defined as

$$R_\exists(X) = \frac{|P_\exists(X)|}{|P_\Lambda|}, \qquad R_\forall(X) = \frac{|P_\forall(X)|}{|P_\Lambda|}, \qquad X \in \{S, S_{TRA}, S_{CAR}\} \tag{15}$$
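The classification of common neighbors and the resulting ratios can be computed by brute force on a small network. The sketch below is illustrative only: the helper names are hypothetical, the triangle count for TRA-triangle-neighbors follows the definition given later for (17) and (18), each ratio is taken as the subset size divided by $|P_\Lambda|$, and the code assumes at least one seed node pair with common neighbors exists.

```python
from itertools import combinations


def build_graph(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def triangles(adj, u, v):
    """Triangles through the link (u, v); zero when u and v are not linked."""
    return len(adj[u] & adj[v]) if v in adj[u] else 0


def kinds(adj, x, y, z):
    """Flags (triangle-, TRA-triangle-, CAR-triangle-neighbor) for z."""
    shared = adj[x] & adj[y]
    in_s = any(v in adj[u] for u, v in combinations(adj[z], 2))   # CC_z != 0
    in_s_tra = triangles(adj, x, z) + triangles(adj, y, z) != 0
    in_s_car = len(adj[z] & shared) != 0                           # L(z) != 0
    return in_s, in_s_tra, in_s_car


def subset_ratios(adj):
    """Return {name: (R_exists, R_forall)} for S, S_TRA and S_CAR."""
    # P_Lambda: non-adjacent pairs with at least one common neighbor.
    pairs = [(x, y) for x, y in combinations(sorted(adj), 2)
             if y not in adj[x] and adj[x] & adj[y]]
    exists = [0, 0, 0]
    forall = [0, 0, 0]
    for x, y in pairs:
        flags = [kinds(adj, x, y, z) for z in adj[x] & adj[y]]
        for i in range(3):
            outside = [not flag[i] for flag in flags]
            exists[i] += any(outside)   # some shared neighbor is outside the set
            forall[i] += all(outside)   # every shared neighbor is outside the set
    names = ("S", "S_TRA", "S_CAR")
    return {name: (exists[i] / len(pairs), forall[i] / len(pairs))
            for i, name in enumerate(names)}


# Invented toy graph: a triangle 1-2-3 with a pendant node 4 attached to 3.
toy_ratios = subset_ratios(build_graph([(1, 2), (2, 3), (1, 3), (3, 4)]))
```

In the toy graph the pairs (1, 4) and (2, 4) share only node 3, which is a triangle-neighbor and a TRA-triangle-neighbor but not a CAR-triangle-neighbor, so only the $S_{CAR}$ ratios are nonzero.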
2.4. Wilcoxon Signed-Ranks Test. The Wilcoxon signed-ranks test is a nonparametric statistical hypothesis test used to check whether two methods perform equally well over multiple networks [38, 39]. Let $d_i$ be the difference between the performance scores of two link prediction methods on the $i$th network. The differences are ranked according to their absolute values; in case of ties, average ranks are assigned. Let $R^+$ be the sum of ranks for the networks on which the second method outperformed the first, and $R^-$ the sum of ranks for the opposite. For a larger number of networks, the statistic

$$z = \frac{T - \frac{1}{4} N (N + 1)}{\sqrt{\frac{1}{24} N (N + 1)(2N + 1)}} \tag{16}$$

is distributed approximately normally [39]. In (16), $T = \min(R^+, R^-)$ and $N$ is the number of networks.

With $\alpha = 0.05$, if $z$ is smaller than $-1.96$, we reject the null hypothesis, which states that both methods perform equally well.
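The test statistic can be sketched directly from this description. The function `wilcoxon_z` is a hypothetical helper; it uses the normal approximation $z = (T - N(N+1)/4) / \sqrt{N(N+1)(2N+1)/24}$ with $T = \min(R^+, R^-)$, and, as an assumption not stated in the text, it splits the ranks of zero differences evenly between $R^+$ and $R^-$.

```python
import math


def wilcoxon_z(first, second):
    """z statistic for paired scores where a larger score is better.

    For ranking score, where smaller is better, swap the two arguments.
    """
    diffs = [b - a for a, b in zip(first, second)]
    n = len(diffs)
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied absolute differences starting at position i.
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    r_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    r_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    zero_ranks = sum(r for r, d in zip(ranks, diffs) if d == 0)
    r_plus += zero_ranks / 2   # assumed convention for zero differences
    r_minus += zero_ranks / 2
    t = min(r_plus, r_minus)
    return (t - n * (n + 1) / 4) / math.sqrt(n * (n + 1) * (2 * n + 1) / 24)


# Invented scores: the second method wins on all 12 networks.
z_all_wins = wilcoxon_z([0] * 12, list(range(1, 13)))
```

With 12 networks, a method that wins everywhere gives $T = 0$ and a $z$ of roughly $-3.06$, which is the most extreme value this statistic can take for $N = 12$.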
Table 1: The basic structural features of the giant components of the 12 networks. $|V|$ and $|E|$ are the total numbers of nodes and edges, respectively. $D$ denotes the density, $D = 2|E| / (|V|(|V| - 1))$. $\langle k \rangle$ and $\langle d \rangle$ are the average degree and the average shortest distance, respectively. $C$ and $r$ indicate the clustering coefficient [26] and the assortativity coefficient [36], respectively. $H$ is the degree heterogeneity [6], defined as $H = \langle k^2 \rangle / \langle k \rangle^2$, and $e$ is the network efficiency [37].
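Three of the statistics in this caption follow directly from the degree sequence. The sketch below computes the density $D$, the average degree $\langle k \rangle$, and the degree heterogeneity $H$ for a simple undirected graph; the function name and the toy edge list are assumptions for illustration, and the remaining columns (shortest distance, clustering, assortativity, efficiency) are left out.

```python
def basic_statistics(edges):
    """Density, average degree and degree heterogeneity of a simple graph."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    degrees = [len(neighbors) for neighbors in adj.values()]
    n = len(adj)                    # |V|
    m = sum(degrees) // 2           # |E|: every link is counted at both ends
    mean_k = sum(degrees) / n
    mean_k2 = sum(k * k for k in degrees) / n
    return {
        "D": 2 * m / (n * (n - 1)),
        "<k>": mean_k,
        "H": mean_k2 / mean_k ** 2,
    }


# Invented toy graph: a triangle 1-2-3 with a pendant node 4 attached to 3.
toy_stats = basic_statistics([(1, 2), (2, 3), (1, 3), (3, 4)])
```

A regular graph has $H = 1$, and $H$ grows as the degrees become more uneven, which is why a low value is read as small degree heterogeneity.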
The link prediction problem is closely related to the network evolving mechanism [2, 40]. A recently proposed triangle growth mechanism demonstrates that various key features observed in most real-world networks can be generated in simulated networks [41]. Therefore, triangle structure information has an important effect on link formation.
In this work, we focus on a new triangle structure, namely, the TRA-triangle. A TRA-triangle passes through one seed node, one common neighbor, and one other node. In our opinion, the common neighbors that can form TRA-triangles are more important than the others. Given two nodes $u$ and $v$, we denote the number of triangles passing through them as $\triangle(u, v)$, which is

$$\triangle(u, v) = \begin{cases} CN(u, v) & \text{if } (u, v) \in E \\ 0 & \text{otherwise} \end{cases} \tag{17}$$

For the example network in Figure 1(a), the triangles used for seed nodes $a$ and $b$ are shown in Figure 1(d). Clearly, $\triangle(a, c) = 2$ and $\triangle(a, d) = 1$. Thus, node $c$ is in closer contact with $a$ than node $d$ is. Given seed nodes $x$ and $y$, let $z$ be one of their common neighbors. The function $\triangle(x, y, z)$ sums the numbers of TRA-triangles formed by $x$, $z$ and by $y$, $z$:

$$\triangle(x, y, z) = \triangle(x, z) + \triangle(y, z) \tag{18}$$

In this paper, we propose a new similarity index by combining the aforementioned triangle structure with the idea of the RA index [13]. For convenience, we name our new method the TRA index. Its definition is

$$TRA(x, y) = \sum_{z \in \Gamma(x) \cap \Gamma(y)} \frac{1 + \triangle(x, y, z)/2}{k_z} \tag{19}$$

In (19), the numerator is $1 + \triangle(x, y, z)/2$. Therefore, the TRA index does not miss the effect of any common neighbor. If all common neighbors are zero-triangle-neighbors, TRA degenerates to RA. For the example network in Figure 1(a), $TRA(a, b) = (1 + 3/2)/4 + (1 + 2/2)/3 + (1 + 0/2)/2 + (1 + 0/2)/4 = 49/24$.
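The TRA computation can be sketched with exact fractions so that the arithmetic of the worked example is easy to check. Figure 1(a) is not reproduced in this text, so the toy graph below is a hypothetical stand-in chosen to give the same four terms (degrees 4, 3, 2, 4 and triangle counts 3, 2, 0, 0); the helper names are likewise assumptions.

```python
from fractions import Fraction


def build_graph(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def triangles(adj, u, v):
    """Number of triangles through the link (u, v); zero if u, v are not linked."""
    return len(adj[u] & adj[v]) if v in adj[u] else 0


def tra(adj, x, y):
    """Sum over common neighbors z of (1 + TRA-triangles / 2) / k_z."""
    total = Fraction(0)
    for z in adj[x] & adj[y]:
        tra_triangles = triangles(adj, x, z) + triangles(adj, y, z)
        total += (1 + Fraction(tra_triangles, 2)) / len(adj[z])
    return total


def ra(adj, x, y):
    return sum((Fraction(1, len(adj[z])) for z in adj[x] & adj[y]), Fraction(0))


# Hypothetical graph: a and b share the neighbors c, d, e and f.
toy = build_graph([("a", "c"), ("a", "d"), ("a", "e"), ("a", "f"), ("a", "p"),
                   ("b", "c"), ("b", "d"), ("b", "e"), ("b", "f"),
                   ("c", "d"), ("c", "p"), ("f", "g"), ("f", "h")])
tra_ab = tra(toy, "a", "b")   # Fraction(49, 24)
ra_ab = ra(toy, "a", "b")     # Fraction(4, 3)

# With no triangles at all, TRA reduces to RA.
path = build_graph([("x", "z"), ("y", "z")])
```

The difference between `tra_ab` and `ra_ab` comes entirely from c and d, the two common neighbors that close triangles with a seed node; e and f contribute exactly their RA terms.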
Table 3: The AUC of different methods on the 12 networks. The results are the average of 50 independent implementations with $|E^{ts}|/|E| = 0.1$. The best performance for each network is emphasized in boldface.
Table 3 lists the prediction results of the different methods in terms of AUC on the 12 networks. The results are obtained by averaging over 50 independent realizations for each network, with the testing set containing 10% of the links. The highest AUC value for each network is highlighted in boldface. Clearly, the TRA index obtains the best result on nine of the 12 networks. Meanwhile, the TRA index outperforms the CAR, CAA, CRA, and CCLP indexes on all networks. We can see from Table 2 that, on most of the networks, a varying share of the seed node pairs with common neighbors belongs to $P_\exists(S)$ and/or $P_\forall(S)$. As stated in the Introduction, the CCLP index gives lower or zero similarity scores to those pairs. Furthermore, the values of both $R_\exists(S_{CAR})$ and $R_\forall(S_{CAR})$ are very high on most of the networks. In particular, on Dolphin, Email, Hamster, HEP, and Yeast, the values of $R_\forall(S_{CAR})$ are greater than 0.8. This indicates that only a very small fraction of the seed node pairs with common neighbors on those networks can be assigned similarity scores by the CAR-based indexes. Although some seed node pairs belong to $P_\exists(S_{TRA})$ and/or $P_\forall(S_{TRA})$, the TRA index can still assign reasonable similarity scores to them. Therefore, the results of the TRA index in Table 3 are better than those of the CAR, CAA, CRA, and CCLP indexes. Among the CN, AA, RA, and ADP indexes, the ADP index performs the best, since it penalizes common neighbors by automatically adapting to the network. On Dolphin, HEP, and USAir, the ADP index obtains the best accuracy, and the performance of our index is close to the best. In addition, the TRA index achieves much better AUC scores than the others on FW and Karate. This result suggests that TRA-triangles play an important role on these two networks. According to Table 1, both networks are dense. Roughly speaking, TRA-triangle-neighbors are more likely to exist between seed nodes on dense networks than on sparse ones.
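The evaluation protocol behind these numbers, a random split of the links followed by averaging over repeated realizations, can be sketched as follows. Everything here is an assumption made for illustration: the function names, the use of the run index as random seed, and the `evaluate` callback, which stands for any metric computed from a training and a testing set. The text does not say whether the authors impose extra conditions on the split, so none are imposed here.

```python
import random


def split_links(edges, test_fraction=0.1, seed=0):
    """Randomly divide the link list into (training set, testing set)."""
    rng = random.Random(seed)
    shuffled = list(edges)
    rng.shuffle(shuffled)
    n_test = round(test_fraction * len(shuffled))
    return shuffled[n_test:], shuffled[:n_test]


def average_over_realizations(edges, evaluate, runs=50, test_fraction=0.1):
    """Mean of evaluate(training, testing) over independent random splits."""
    total = 0.0
    for run in range(runs):
        training, testing = split_links(edges, test_fraction, seed=run)
        total += evaluate(training, testing)
    return total / runs


# Invented toy link list: a ring of 20 nodes.
toy_edges = [(i, (i + 1) % 20) for i in range(20)]
training, testing = split_links(toy_edges, test_fraction=0.1)
share = average_over_realizations(
    toy_edges, lambda tr, ts: len(ts) / (len(tr) + len(ts)))
```

Passing a different `test_fraction`, such as 0.2, reproduces the second setting used in the later tables and figures.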
To check whether the proposed index differs significantly from the compared methods, we applied the Wilcoxon signed-ranks test [39] to the results in Table 3. The pairwise test results are presented in Figure 2. From the statistical point of view, our index is significantly better than the others except the ADP index, because the ADP index can adapt to the structure of a network automatically. Although there is no statistical difference between our index and the ADP index according to the Wilcoxon signed-ranks test, our index performs better than the ADP index in terms of AUC.
Figure 2: The results of the Wilcoxon signed-ranks test based on Table 3. With $\alpha = 0.05$, if $z \le -1.96$, the null hypothesis is rejected. (Plot of $z$, on an axis from $-3.0$ to $-1.8$, for TRA paired with CN, AA, RA, ADP, CAR, CAA, CRA, and CCLP; a reference line marks $z = -1.96$.)
Figure 3 shows the changes of AUC on the 12 networks when the proportion of $E^{ts}$ in $E$ increases from 10% to 20%. It is evident from Figure 3 that the AUC values of all indexes decrease when the proportion increases from 10% to 20%, except on FW. The reason is that enlarging $E^{ts}$ shrinks the training set $E^{tr}$, which reduces the number of common neighbors between seed nodes. Consequently, link prediction becomes more difficult. The FW network, which has a high average degree, a small average shortest distance, and small degree heterogeneity, is a very dense network. Therefore, shrinking the training set has only a slight influence on the accuracy on FW. In addition, we can observe from Figure 3 that all indexes behave very similarly on ADV, CE, Dolphin, Email, Hamster, HEP, Karate, Word, and Yeast. On these nine networks, the AUC values of the CAR-based indexes are obviously lower than those of the others. On FW, the results of the CAR-based indexes are better than those of the CN, AA, RA, and ADP indexes, because FW is a very dense network in which the ratio of CAR-triangle-neighbors is very high (see Table 2). On PB and USAir, the performance of the CAR-based indexes is not as bad as on the other nine networks. The reason is that both networks have high average degrees, small average shortest distances, and high ratios of CAR-triangle-neighbors.
Figure 3: The changes of AUC when $|E^{ts}|/|E|$ increases from 10% to 20% on the 12 networks. Each point is obtained by averaging over 50 independent realizations. (Panels: ADV, CE, Dolphin, Email, FW, Hamster, HEP, Karate, PB, USAir, Word, and Yeast; each panel plots AUC for CN, AA, RA, ADP, CAR, CAA, CRA, CCLP, and TRA at the 10% and 20% settings.)
Furthermore, we list the AUC values of the different methods on the 12 networks when $|E^{ts}|/|E| = 0.2$ in Table 4. Our index outperforms the others on eight of the 12 networks, while the CCLP index achieves the highest value on CE.
Table 5 gives the results in terms of ranking score. These results are similar to those in Table 3. The TRA index achieves the best ranking score except on Dolphin, HEP, and USAir. The pairwise Wilcoxon signed-ranks test results are shown in Figure 4. Similar to the test in Figure 2, the TRA index is significantly better than the compared methods except the ADP index. As noted above, ADP has the adaptive capability and hence performs better than the other compared methods.
Figure 5 shows the changes of ranking score on the 12 networks when $|E^{ts}|/|E|$ increases from 10% to 20%. Clearly, all indexes yield higher ranking scores as $E^{ts}$ grows. Recall that a higher ranking score means lower accuracy. As analyzed above, FW is very dense. Thus, the changes of AUC on FW are very slight (see Figure 3). However, the changes of ranking score on FW are more evident, especially for the CAA and CRA indexes. The reason is that the calculation of ranking score considers all missing links. In addition, as seen in Figure 5, the CAA and CRA indexes perform worse than the CAR index according to ranking score. From the definitions of these three indexes, we find that both the CAA and CRA indexes suffer more from zero-triangle-neighbors than the CAR index does.
Figure 4: The results of the Wilcoxon signed-ranks test based on Table 5. With $\alpha = 0.05$, if $z \le -1.96$, the null hypothesis is rejected. (Plot of $z$, on an axis from $-3.0$ to $-1.8$, for TRA paired with CN, AA, RA, ADP, CAR, CAA, CRA, and CCLP; a reference line marks $z = -1.96$.)
Figure 5: The changes of ranking score when $|E^{ts}|/|E|$ increases from 10% to 20% on the 12 networks. Each point is obtained by averaging over 50 independent realizations. (Panels: ADV, CE, Dolphin, Email, FW, Hamster, HEP, Karate, PB, USAir, Word, and Yeast; each panel plots ranking score for CN, AA, RA, ADP, CAR, CAA, CRA, CCLP, and TRA at the 10% and 20% settings.)
Table 4: The AUC of different methods on the 12 networks. The results are the average of 50 independent implementations with $|E^{ts}|/|E| = 0.2$. The best performance for each network is emphasized in boldface.
Table 5: The ranking score of different methods on the 12 networks. The results are the average of 50 independent implementations with $|E^{ts}|/|E| = 0.1$. The best performance for each network is emphasized in boldface.
Finally, the ranking scores of all methods on the 12 networks with $|E^{ts}|/|E| = 0.2$ are listed in Table 6. Our index outperforms all other indexes except on HEP and USAir in terms of ranking score. These results are consistent with those for AUC. In contrast with FW, the influence of TRA-triangles on HEP and USAir is small.
From the above results, we can conclude that the TRA index is superior to the CAR-based indexes and the CCLP index, and that it performs better than the common-neighbor-based methods on most of the networks.
5 Conclusion and Discussion
Link prediction is an important research topic of complexnetwork analysis and has a wide range of applications in
various fields Inspired by the triangle growth mechanism innetwork evolving [41] this paper proposed the TRA indexfor link prediction When computing the similarity betweentwo seed nodes the proposed index not only counts thecontributions of all common neighbors but also emphasizesthe importance of the neighbors that can formTRA-trianglesTo some extent TRA-triangles reflect the close relationshipsbetween neighbors and seed nodes In addition the proposedindex also adopts the theory of resource allocation [13] due toits effectiveness
The accuracy of the TRA index is experimentally evalu-ated over 12 real-world networks from various fields in termsof AUC and ranking score The experimental results showthat the proposed index performs far better than CAR-basedindexes Meanwhile our index outperforms the CCLP indexbecause of the superior strategy in our index For common-neighbor-based methods the proposed index yields someimprovements of accuracy onmost of networksThese resultsindicate that combining the information of TRA-trianglesand the theory of resource allocation in similarity index is ahelpful idea for link prediction
10 Complexity
Table 6The ranking score of differentmethods in 12 networksThe results are the average of 50 independent implementationswith |119864119905119904||119864| =02 The best performance for each network is emphasized by boldface
There are some improved studies for our index in futureOne of them is to analyze the degree of influence of TRA-triangles on different networks and further to be adaptive toset the weight of TRA-triangles on different networks Thesecond is to study the application of TRA index on othertopics such as community detection and anomaly detectionIn addition for learning-based link prediction approachesTRA index can be used as a feature for a node pair
Data Availability
Thenetworks used in this study are available fromhttpdeimurvcatsimalexandrearenasdatawelcomehtm httpwww-personalumichedusimmejnnetdata httpvladofmfuni-ljsipubnetworksdata httpnoesisikororgdatasetslink-prediction and httpkonectuni-koblenzdenetworks
Conflicts of Interest
The authors declare that they have no conflicts of interest
Acknowledgments
This work was supported by the National Natural ScienceFoundation of China (no 61602225) and the FundamentalResearch Funds for the Central Universities (no lzujbky-2017-192)
References
[1] Q.-M. Zhang, L. Lü, W.-Q. Wang, Y.-X. Zhu, and T. Zhou, "Potential theory for directed networks," PLoS ONE, vol. 8, no. 2, Article ID e55437, 2013.
[2] Q. Zhang, X. Xu, Y. Zhu, and T. Zhou, "Measuring multiple evolution mechanisms of complex networks," Scientific Reports, vol. 5, no. 1, 2015.
[3] L. Lü, M. Medo, C. H. Yeung, Y. Zhang, Z. Zhang, and T. Zhou, "Recommender systems," Physics Reports, vol. 519, no. 1, pp. 1–49, 2012.
[4] R. Guimerà and M. Sales-Pardo, "Missing and spurious interactions and the reconstruction of complex networks," Proceedings of the National Academy of Sciences of the United States of America, vol. 106, no. 52, pp. 22073–22078, 2009.
[5] S. S. Bhowmick and B. S. Seah, "Clustering and summarizing protein-protein interaction networks: a survey," IEEE Transactions on Knowledge and Data Engineering, vol. 28, no. 3, pp. 638–658, 2016.
[6] L. Lü and T. Zhou, "Link prediction in complex networks: a survey," Physica A: Statistical Mechanics and its Applications, vol. 390, no. 6, pp. 1150–1170, 2011.
[7] L. Li, L. Qian, X. Wang, S. Luo, and X. Chen, "Accurate similarity index based on activity and connectivity of node for link prediction," International Journal of Modern Physics B, vol. 29, no. 17, 1550108, 15 pages, 2015.
[8] P. Wang, B. Xu, Y. Wu, and X. Zhou, "Link prediction in social networks: the state-of-the-art," Science China Information Sciences, vol. 58, no. 1, pp. 1–38, 2014.
[9] V. Martínez, F. Berzal, and J.-C. Cubero, "A survey of link prediction in complex networks," ACM Computing Surveys, vol. 49, no. 4, pp. 69:1–69:33, 2016.
[10] C. Ahmed, A. ElKorany, and R. Bahgat, "A supervised learning approach to link prediction in Twitter," Social Network Analysis and Mining, vol. 6, no. 1, 2016.
[11] D. Liben-Nowell and J. Kleinberg, "The link-prediction problem for social networks," Journal of the Association for Information Science and Technology, vol. 58, no. 7, pp. 1019–1031, 2007.
[12] L. A. Adamic and E. Adar, "Friends and neighbors on the Web," Social Networks, vol. 25, no. 3, pp. 211–230, 2003.
[13] T. Zhou, L. Lü, and Y.-C. Zhang, "Predicting missing links via local information," The European Physical Journal B, vol. 71, no. 4, pp. 623–630, 2009.
[14] L. Katz, "A new status index derived from sociometric analysis," Psychometrika, vol. 18, no. 1, pp. 39–43, 1953.
[15] G. Jeh and J. Widom, "SimRank," in Proceedings of the eighth ACM SIGKDD international conference, p. 538, Edmonton, Alberta, Canada, July 2002.
[16] H. Tong, C. Faloutsos, and J. Pan, "Fast random walk with restart and its applications," in Proceedings of the 6th International Conference on Data Mining (ICDM '06), pp. 613–622, December 2006.
[17] L. Lü, C.-H. Jin, and T. Zhou, "Similarity index based on local paths for link prediction of complex networks," Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, vol. 80, no. 4, Article ID 046122, 2009.
[18] A. Papadimitriou, P. Symeonidis, and Y. Manolopoulos, "Fast and accurate link prediction in social networking systems," The Journal of Systems and Software, vol. 85, no. 9, pp. 2119–2132, 2012.
[19] W. Liu and L. Lü, "Link prediction based on local random walk," EPL (Europhysics Letters), vol. 89, no. 5, Article ID 58007, 2010.
[20] C. V. Cannistraci, G. Alanis-Lobato, and T. Ravasi, "From link-prediction in brain connectomes and protein interactomes to the local-community-paradigm in complex networks," Scientific Reports, vol. 3, article 1613, no. 4, 2013.
[21] B. Chen and L. Chen, "A link prediction algorithm based on ant colony optimization," Applied Intelligence, vol. 41, no. 3, pp. 694–708, 2014.
[22] D. Caiyan, L. Chen, and B. Li, "Link prediction in complex network based on modularity," Soft Computing, vol. 21, no. 15, pp. 4197–4214, 2017.
[23] V. Martínez, F. Berzal, and J.-C. Cubero, "Adaptive degree penalization for link prediction," Journal of Computational Science, vol. 13, pp. 1–9, 2016.
[24] Z. Wu, Y. Lin, J. Wang, and S. Gregory, "Link prediction with node clustering coefficient," Physica A: Statistical Mechanics and its Applications, vol. 452, pp. 1–8, 2016.
[25] P. Massa, M. Salvetti, and D. Tomasoni, "Bowling alone and trust decline in social network sites," in Proceedings of the 8th IEEE International Symposium on Dependable, Autonomic and Secure Computing, DASC 2009, pp. 658–663, China, December 2009.
[26] D. J. Watts and S. H. Strogatz, "Collective dynamics of 'small-world' networks," Nature, vol. 393, no. 6684, pp. 440–442, 1998.
[27] D. Lusseau, K. Schneider, O. J. Boisseau, P. Haase, E. Slooten, and S. M. Dawson, "The bottlenose dolphin community of Doubtful Sound features a large proportion of long-lasting associations: can geographic isolation explain this unique trait?" Behavioral Ecology and Sociobiology, vol. 54, no. 4, pp. 396–405, 2003.
[28] R. Guimerà, L. Danon, A. Díaz-Guilera, F. Giralt, and A. Arenas, "Self-similar community structure in a network of human interactions," Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, vol. 68, no. 6, Article ID 065103, 2003.
[29] R. E. Ulanowicz and D. L. DeAngelis, "Network analysis of trophic dynamics in South Florida ecosystems," in US Geological Survey Program on the South Florida Ecosystem, vol. 114, 45 edition, 2005.
[30] J. Kunegis, "KONECT: the Koblenz network collection," in Proceedings of the 22nd International Conference on World Wide Web (WWW '13), pp. 1343–1350, May 2013.
[31] M. E. Newman, "The structure of scientific collaboration networks," Proceedings of the National Academy of Sciences of the United States of America, vol. 98, no. 2, pp. 404–409, 2001.
[32] W. W. Zachary, "An information flow model for conflict and fission in small groups," Journal of Anthropological Research, vol. 33, no. 4, pp. 452–473, 1977.
[33] L. A. Adamic and N. Glance, "The political blogosphere and the 2004 US election: divided they blog," in Proceedings of the 3rd International Workshop on Link Discovery (LinkKDD '05), pp. 36–43, ACM, 2005.
[34] M. E. J. Newman, "Finding community structure in networks using the eigenvectors of matrices," Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, vol. 74, no. 3, Article ID 036104, 19 pages, 2006.
[35] D. Bu, Y. Zhao, L. Cai et al., "Topological structure analysis of the protein-protein interaction network in budding yeast," Nucleic Acids Research, vol. 31, no. 9, pp. 2443–2450, 2003.
[36] M. E. Newman, "Mixing patterns in networks," Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, vol. 67, no. 2, 2003.
[37] V. Latora and M. Marchiori, "Efficient behavior of small-world networks," Physical Review Letters, vol. 87, no. 19, Article ID 198701, 2001.
[38] F. Wilcoxon, "Individual comparisons by ranking methods," Biometrics Bulletin, vol. 1, no. 6, pp. 80–83, 1945.
[39] J. Demšar, "Statistical comparisons of classifiers over multiple data sets," Journal of Machine Learning Research, vol. 7, pp. 1–30, 2006.
[40] W.-Q. Wang, Q.-M. Zhang, and T. Zhou, "Evaluating network models: a likelihood analysis," EPL (Europhysics Letters), vol. 98, no. 2, Article ID 28004, 2012.
(5) Foodweb (FW): a food web in Florida Bay during the rainy season [29].
(6) Hamster: a friendship network between users on hamsterster.com [30].
(7) HEP: the coauthorship network of scientists who posted preprints on the high-energy theory archive from 1995 to 1999 [31].
(8) Karate: the social network of a karate club at a US university [32].
(9) Political blogs (PB): a network of blogs about US politics [33].
(10) USAir: a network of the US air transportation system [6].
(11) Word: an adjacency network of common adjectives and nouns in the novel "David Copperfield" by Charles Dickens [34].
(12) Yeast: the protein-protein interaction network of budding yeast [35].
In this work, all the aforementioned networks are treated as undirected and unweighted networks, and only the giant component of each network is used. Table 1 lists the basic statistics of the giant components of these networks.
Given a network $G(V, E)$, let $x$ and $y$ be two seed nodes. $(x, y)$ is called a seed node pair with common neighbors if the two nodes have at least one common neighbor. $P_\Lambda$ denotes the set of seed node pairs with common neighbors; formally,

$$P_\Lambda = \{(x, y) \mid (x, y) \notin E \wedge \Gamma(x) \cap \Gamma(y) \neq \emptyset\}. \tag{12}$$

Let $x$ and $y$ be two seed nodes and $z$ one of their common neighbors. If $CC_z = 0$, we call $z$ a zero-triangle-neighbor; otherwise $z$ is a triangle-neighbor. If $L(z) \neq 0$, $z$ is called a CAR-triangle-neighbor, and if $\triangle(x, y, z) \neq 0$ (see (18)), $z$ is called a TRA-triangle-neighbor. Let $S$ be the set of triangle-neighbors, and let $S_{CAR}$ and $S_{TRA}$ denote the sets of CAR- and TRA-triangle-neighbors, respectively. Clearly, $S_{CAR} \subseteq S_{TRA} \subseteq S$. Let $P_\exists(S)$ and $P_\forall(S)$ be two subsets of $P_\Lambda$. For any pair in $P_\exists(S)$, at least one of the shared neighbors is not a triangle-neighbor, and for any pair in $P_\forall(S)$, none of the shared neighbors is a triangle-neighbor. More explicitly,

$$P_\exists(S) = \{(x, y) \in P_\Lambda \mid \exists z \in \Gamma(x) \cap \Gamma(y) \wedge z \notin S\},$$
$$P_\forall(S) = \{(x, y) \in P_\Lambda \mid \forall z \in \Gamma(x) \cap \Gamma(y) \wedge z \notin S\}. \tag{13}$$

Similarly, we define $P_\exists(S_{TRA})$, $P_\forall(S_{TRA})$, $P_\exists(S_{CAR})$, and $P_\forall(S_{CAR})$, which are

$$P_\exists(S_{TRA}) = \{(x, y) \in P_\Lambda \mid \exists z \in \Gamma(x) \cap \Gamma(y) \wedge z \notin S_{TRA}\},$$
$$P_\forall(S_{TRA}) = \{(x, y) \in P_\Lambda \mid \forall z \in \Gamma(x) \cap \Gamma(y) \wedge z \notin S_{TRA}\},$$
$$P_\exists(S_{CAR}) = \{(x, y) \in P_\Lambda \mid \exists z \in \Gamma(x) \cap \Gamma(y) \wedge z \notin S_{CAR}\},$$
$$P_\forall(S_{CAR}) = \{(x, y) \in P_\Lambda \mid \forall z \in \Gamma(x) \cap \Gamma(y) \wedge z \notin S_{CAR}\}. \tag{14}$$

Correspondingly, the ratios of those subsets to $P_\Lambda$ are respectively defined as

$$R_\exists(X) = \frac{|P_\exists(X)|}{|P_\Lambda|}, \qquad R_\forall(X) = \frac{|P_\forall(X)|}{|P_\Lambda|}, \qquad X \in \{S, S_{TRA}, S_{CAR}\}. \tag{15}$$
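A minimal sketch of how these sets and ratios could be computed, taking each ratio as the size of the subset divided by $|P_\Lambda|$. The function names, the adjacency-dictionary representation, and the toy graph are illustrative assumptions, not the authors' code. Only the membership test for $S$ (nonzero clustering coefficient, i.e., $z$ lies on at least one triangle) is shown; a test for $S_{CAR}$ or $S_{TRA}$ can be passed in the same way.

```python
from itertools import combinations

def seed_pairs(adj):
    """P_Lambda: unlinked node pairs that share at least one neighbor."""
    return [(x, y) for x, y in combinations(sorted(adj), 2)
            if y not in adj[x] and adj[x] & adj[y]]

def subset_ratios(adj, in_set):
    """Return (R_exists, R_forall) for a membership test in_set(x, y, z)."""
    pairs = seed_pairs(adj)
    if not pairs:
        return 0.0, 0.0
    n_exists = n_forall = 0
    for x, y in pairs:
        outside = [not in_set(x, y, z) for z in adj[x] & adj[y]]
        n_exists += any(outside)   # at least one shared neighbor outside the set
        n_forall += all(outside)   # every shared neighbor outside the set
    return n_exists / len(pairs), n_forall / len(pairs)

def in_triangle_set(adj):
    """Membership test for S: z lies on at least one triangle (CC_z != 0)."""
    def test(x, y, z):
        return any(v in adj[u] for u, v in combinations(adj[z], 2))
    return test

# Hypothetical toy graph (not one of the paper's networks).
toy = {
    "a": {"c", "d", "e"}, "b": {"c", "d"}, "c": {"a", "b", "e"},
    "d": {"a", "b"}, "e": {"a", "c"},
}
print(seed_pairs(toy))
print(subset_ratios(toy, in_triangle_set(toy)))
```

In the toy graph, nodes a, c, and e lie on the triangle a-c-e, while b and d lie on none, so two of the four seed pairs have a shared neighbor outside $S$ and no pair has all of its shared neighbors outside $S$.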
2.4. Wilcoxon Signed-Ranks Test. The Wilcoxon signed-ranks test is a nonparametric statistical hypothesis test used to check whether two methods perform equally well over multiple networks [38, 39]. Let $d_i$ be the difference between the performance scores of two link prediction methods on the $i$th network. The differences are ranked according to their absolute values; in case of ties, average ranks are assigned. Let $R^+$ be the sum of ranks for the networks on which the second method outperformed the first, and $R^-$ the sum of ranks for the opposite. For a larger number of networks, the statistic

$$z = \frac{T - \frac{1}{4}N(N + 1)}{\sqrt{\frac{1}{24}N(N + 1)(2N + 1)}} \tag{16}$$

is distributed approximately normally [39]. In (16), $T = \min(R^+, R^-)$ and $N$ is the number of networks.
With $\alpha = 0.05$, if $z$ is smaller than $-1.96$, we reject the null-hypothesis, which states that both methods perform equally well.
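The test can be sketched in a few lines of Python. The function name `wilcoxon_z` is hypothetical, and the normal-approximation statistic is the standard one from [39]. The text above does not say how zero differences are handled; splitting their ranks evenly between the two sums is an assumption made here. Scores are taken to be "higher is better", so a positive difference means the second method won.

```python
import math

def wilcoxon_z(scores_a, scores_b):
    """z statistic of the Wilcoxon signed-ranks test for paired scores."""
    diffs = [b - a for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    # Rank |d_i| in ascending order; tied values share the average rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    # Assumption: ranks of zero differences are split evenly between the sums.
    zero_half = sum(r for r, d in zip(ranks, diffs) if d == 0) / 2
    r_plus = sum(r for r, d in zip(ranks, diffs) if d > 0) + zero_half
    r_minus = sum(r for r, d in zip(ranks, diffs) if d < 0) + zero_half
    t = min(r_plus, r_minus)
    return (t - n * (n + 1) / 4) / math.sqrt(n * (n + 1) * (2 * n + 1) / 24)

# Hypothetical scores of two methods on five networks.
z = wilcoxon_z([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
print(z, z < -1.96)
```

With only five networks where the second method always wins, $T = 0$ and $z \approx -2.02$, which already falls below the $-1.96$ threshold; the normal approximation is intended for larger $N$, so this is only a demonstration of the arithmetic.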
Table 1: The basic structural features of the giant components of the 12 networks. $|V|$ and $|E|$ are the total numbers of nodes and edges, respectively. $D$ denotes the density, $D = 2|E|/(|V|(|V| - 1))$. $\langle k \rangle$ and $\langle d \rangle$ are the average degree and the average shortest distance, respectively. $C$ and $r$ indicate the clustering coefficient [26] and the assortative coefficient [36], respectively. $H$ is the degree heterogeneity [6], defined as $H = \langle k^2 \rangle / \langle k \rangle^2$, and $e$ is the network efficiency [37].
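As a small illustration of the closed-form quantities in the caption, the sketch below computes $|V|$, $|E|$, $D$, $\langle k \rangle$, and $H$ from an adjacency dictionary. The function name and the star-shaped toy graph are assumptions for illustration; the path-based and clustering statistics ($\langle d \rangle$, $C$, $r$, $e$) are not covered.

```python
def basic_stats(adj):
    """Node count, edge count, density, mean degree, degree heterogeneity."""
    n = len(adj)
    degrees = [len(neighbors) for neighbors in adj.values()]
    m = sum(degrees) // 2                      # each edge is counted twice
    mean_k = sum(degrees) / n
    mean_k2 = sum(k * k for k in degrees) / n
    return {
        "V": n,
        "E": m,
        "D": 2 * m / (n * (n - 1)),            # D = 2|E| / (|V|(|V| - 1))
        "k": mean_k,
        "H": mean_k2 / mean_k ** 2,            # H = <k^2> / <k>^2
    }

# Hypothetical star graph: one hub linked to three leaves.
star = {"h": {"p", "q", "r"}, "p": {"h"}, "q": {"h"}, "r": {"h"}}
print(basic_stats(star))
```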
The link prediction problem is closely related to the network evolving mechanism [2, 40]. A recently proposed triangle growth mechanism demonstrates that various key features observed in most real-world networks can be generated in simulated networks [41]. Therefore, triangle structure information has an important effect on link formation.
In this work, we focus on a new triangle structure, namely, the TRA-triangle. A TRA-triangle passes through one seed node, one common neighbor, and one other node. In our opinion, the common neighbors that can form TRA-triangles are more important than others. Given two nodes $u$ and $v$, we denote the number of triangles passing through them as $\triangle(u, v)$, which is

$$\triangle(u, v) = \begin{cases} CN(u, v) & \text{if } (u, v) \in E, \\ 0 & \text{otherwise.} \end{cases} \tag{17}$$

For the example network in Figure 1(a), the triangles used for seed nodes $a$ and $b$ are shown in Figure 1(d). Clearly, $\triangle(a, c) = 2$ and $\triangle(a, d) = 1$. Thus node $c$ is in closer contact with $a$ than $d$ is. Given seed nodes $x$ and $y$, let $z$ be one of their common neighbors. The function $\triangle(x, y, z)$ sums up the number of TRA-triangles formed by $x, z$ and by $y, z$, which is

$$\triangle(x, y, z) = \triangle(x, z) + \triangle(y, z). \tag{18}$$

In this paper, we propose a new similarity index by combining the aforementioned triangle structure and the idea of the RA index [13]. For convenience, we name our new method the TRA index. Its definition is

$$TRA(x, y) = \sum_{z \in \Gamma(x) \cap \Gamma(y)} \frac{1 + \triangle(x, y, z)/2}{k_z}, \tag{19}$$

where $k_z$ is the degree of node $z$.
In (19), the numerator is $1 + \triangle(x, y, z)/2$. Therefore, the TRA index does not miss the effect of any common neighbor. If all common neighbors are zero-triangle-neighbors, TRA degenerates to RA. For the example network in Figure 1(a), $TRA(a, b) = (1 + 3/2)/4 + (1 + 2/2)/3 + (1 + 0/2)/2 + (1 + 0/2)/4 = 49/24$.
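The definitions in (17) to (19) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the helper names are hypothetical, and the toy edge list is not a copy of Figure 1(a) but a graph constructed so that the four common neighbors of a and b have the same degrees (4, 3, 2, 4) and triangle counts (3, 2, 0, 0) as the terms in the worked example. Exact fractions are used so the result can be compared with 49/24 directly.

```python
from fractions import Fraction

def build_adj(edges):
    """Undirected adjacency sets from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def triangles(adj, u, v):
    """Eq. (17): common-neighbor count of u and v if they are linked, else 0."""
    return len(adj[u] & adj[v]) if v in adj[u] else 0

def tra(adj, x, y):
    """Eq. (19): sum over common neighbors z of (1 + tri(x, y, z) / 2) / k_z."""
    score = Fraction(0)
    for z in adj[x] & adj[y]:
        tri = triangles(adj, x, z) + triangles(adj, y, z)   # Eq. (18)
        score += (1 + Fraction(tri, 2)) / len(adj[z])
    return score

# Hypothetical toy graph reproducing the per-neighbor terms of the example.
toy = build_adj([
    ("a", "c"), ("a", "d"), ("a", "e"), ("a", "f"), ("a", "g"),
    ("b", "c"), ("b", "d"), ("b", "e"), ("b", "f"),
    ("c", "d"), ("c", "g"), ("f", "h"), ("f", "i"),
])
print(triangles(toy, "a", "c"), triangles(toy, "a", "d"))
print(tra(toy, "a", "b"))
```

On a three-node path x-z-y, the only common neighbor has degree 2 and forms no triangle, so the function returns 1/2, which is the RA score, in line with the degenerate case noted above.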
Table 3: The AUC of different methods in 12 networks. The results are the average of 50 independent implementations with $|E^{ts}|/|E| = 0.1$. The best performance for each network is emphasized by boldface.
Table 3 lists the results of the different methods in terms of AUC on the 12 networks. The results are obtained by averaging over 50 independent realizations for each network, with the testing set containing 10% of the links. The highest AUC value for each network is highlighted in boldface. Clearly, the TRA index gets nine best results over the 12 networks. Meanwhile, the TRA index outperforms the CAR, CAA, CRA, and CCLP indexes on all networks. We can see from Table 2 that, on most of the networks, a varying share of the seed node pairs with common neighbors belongs to $P_\exists(S)$ and/or $P_\forall(S)$. As stated in the Introduction, the CCLP index gives lower or zero similarity scores to those pairs. Furthermore, the values of both $R_\exists(S_{CAR})$ and $R_\forall(S_{CAR})$ are very high on most of the networks. In particular, on Dolphin, Email, Hamster, HEP, and Yeast, the values of $R_\forall(S_{CAR})$ are greater than 0.8. This indicates that only a very small fraction of seed node pairs with common neighbors on those networks can be assigned similarity scores by CAR-based indexes. Although some seed node pairs belong to $P_\exists(S_{TRA})$ and/or $P_\forall(S_{TRA})$, the TRA index can still assign reasonable similarity scores to them. Therefore, the results of the TRA index in Table 3 are better than those of the CAR, CAA, CRA, and CCLP indexes. Among the CN, AA, RA, and ADP indexes, the ADP index performs the best, since it can penalize common neighbors by automatically adapting to the network. On Dolphin, HEP, and USAir, the ADP index obtains the best accuracy, and the performance of our index is close to the best. In addition, the TRA index achieves much better AUC scores than the others on FW and Karate. This result suggests that TRA-triangles play an important role on these two networks. From Table 1, both networks are dense. Roughly speaking, TRA-triangle-neighbors between seed nodes are more likely to exist on dense networks than on sparse ones.
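The evaluation protocol described above (hold out a fraction of the links, score node pairs on the remaining training network, compute AUC) might be sketched as below. The AUC definition is not given in this section; the sketch assumes the form common in the link prediction literature, where a test link scores 1 when it beats a nonexistent link and 0.5 on a tie, and it compares all such pairs exactly instead of sampling. The function names and the fixed score table are hypothetical.

```python
import random

def split_edges(edges, test_fraction, rng):
    """Randomly split an edge list into (training edges, testing edges)."""
    edges = list(edges)
    rng.shuffle(edges)
    n_test = max(1, round(test_fraction * len(edges)))
    return edges[n_test:], edges[:n_test]

def auc(score, test_edges, non_edges):
    """Share of (test link, nonexistent link) comparisons won; ties count 0.5."""
    wins = 0.0
    for tx, ty in test_edges:
        s_test = score(tx, ty)
        for ux, uy in non_edges:
            s_non = score(ux, uy)
            if s_test > s_non:
                wins += 1.0
            elif s_test == s_non:
                wins += 0.5
    return wins / (len(test_edges) * len(non_edges))

# Hypothetical similarity scores standing in for an index such as TRA.
table = {("a", "b"): 2.0, ("c", "d"): 1.0, ("e", "f"): 1.0}
score = lambda x, y: table[(x, y)]
print(auc(score, [("a", "b")], [("c", "d"), ("e", "f")]))
```

In a full experiment, `score` would be computed on the training network only, and the AUC would be averaged over independent calls to `split_edges` (50 in the paper).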
To check whether the proposed index is significantly different from the compared methods, we applied the Wilcoxon signed-ranks test [39] based on the results in Table 3. The pairwise test results are presented in Figure 2. From the statistical point of view, our index is significantly better than the others except the ADP index, because the ADP index is capable of adapting to the structure of a network automatically. Although there is no statistical difference between our index and the ADP index according to the Wilcoxon signed-ranks test, our index performs better than the ADP index in terms of AUC.
[Figure 2 plot: $z$ values (vertical axis, from $-1.8$ to $-3.0$) for the pairs TRA-CN, TRA-AA, TRA-RA, TRA-ADP, TRA-CAR, TRA-CAA, TRA-CRA, and TRA-CCLP, with a reference line at $z = -1.96$.]
Figure 2: The results of the Wilcoxon signed-ranks test based on Table 3. With $\alpha = 0.05$, if $z \le -1.96$, the null-hypothesis is rejected.
Figure 3 exhibits the changes of AUC on the 12 networks when the proportion of $E^{ts}$ in $E$ increases from 10% to 20%. It is evident from Figure 3 that the AUC values of all indexes show downward trends when the proportion increases from 10% to 20%, except on FW. The reason is that increasing $E^{ts}$ decreases the size of the training set $E^{tr}$, which in turn reduces the number of common neighbors between seed nodes. Consequently, link prediction becomes more difficult. The FW network, which possesses a high average degree, a small average shortest distance, and a small degree heterogeneity, is a very dense network. Therefore, shrinking the training set has only a slight influence on accuracy on FW. In addition, we can observe from Figure 3 that the behavior of all indexes on ADV, CE, Dolphin, Email, Hamster, HEP, Karate, Word, and Yeast is very similar. On these nine networks, the AUC values of CAR-based indexes are obviously lower than those of the others. On FW, the results of CAR-based indexes are better than those of the CN, AA, RA, and ADP indexes, because FW is a very dense network in which the ratio of CAR-triangle-neighbors is very high (see Table 2). On PB and USAir, the performance of CAR-based indexes is not as bad as on the other nine networks. The reason is that both networks have high average degrees, small average shortest distances, and high ratios of CAR-triangle-neighbors.
[Figure 3 plot: one panel per network (ADV, CE, Dolphin, Email, FW, Hamster, HEP, Karate, PB, USAir, Word, Yeast); each panel shows the AUC (vertical axis) of CN, AA, RA, ADP, CAR, CAA, CRA, CCLP, and TRA at test proportions of 10% and 20%.]
Figure 3: The changes of AUC when $|E^{ts}|/|E|$ increases from 10% to 20% on 12 networks. Each point is obtained by averaging over 50 independent realizations.
Furthermore, we list the AUC values of the different methods on the 12 networks when $|E^{ts}|/|E| = 0.2$ in Table 4. Our index outperforms the others on eight of the 12 networks, while the CCLP index achieves the highest value on CE.
Table 5 gives the results in terms of ranking score. These results are similar to those in Table 3. The TRA index achieves the best ranking score except on Dolphin, HEP, and USAir. The pairwise Wilcoxon signed-ranks test results are shown in Figure 4. Similar to the test in Figure 2, the TRA index is significantly better than the compared methods except the ADP index. As discussed above, ADP has the adaptive capability and hence performs better than the other compared methods.
Figure 5 describes the changes of ranking score on the 12 networks when $|E^{ts}|/|E|$ increases from 10% to 20%. Clearly, all indexes yield higher ranking scores as $E^{ts}$ grows. Recall that a higher ranking score means lower accuracy. As analyzed above, FW is very dense. Thus, the changes of AUC on FW are very slight (see Figure 3). However, the changes of ranking score on FW are more evident, especially for the CAA and CRA indexes. The reason is that the calculation of ranking score considers all missing links. In addition, as seen in Figure 5, the CAA and CRA indexes perform worse than the CAR index according to ranking score. From the definitions of these three indexes, we find that both the CAA and CRA indexes suffer more negative impact from zero-triangle-neighbors than the CAR index does.
[Figure 4 plot: $z$ values (vertical axis, from $-1.8$ to $-3.0$) for the pairs TRA-CN, TRA-AA, TRA-RA, TRA-ADP, TRA-CAR, TRA-CAA, TRA-CRA, and TRA-CCLP, with a reference line at $z = -1.96$.]
Figure 4: The results of the Wilcoxon signed-ranks test based on Table 5. With $\alpha = 0.05$, if $z \le -1.96$, the null-hypothesis is rejected.
[Figure 5 plot: one panel per network (ADV, CE, Dolphin, Email, FW, Hamster, HEP, Karate, PB, USAir, Word, Yeast); each panel shows the ranking score (vertical axis) of CN, AA, RA, ADP, CAR, CAA, CRA, CCLP, and TRA at test proportions of 10% and 20%.]
Figure 5: The changes of ranking score when $|E^{ts}|/|E|$ increases from 10% to 20% on 12 networks. Each point is obtained by averaging over 50 independent realizations.
Table 4: The AUC of different methods in 12 networks. The results are the average of 50 independent implementations with $|E^{ts}|/|E| = 0.2$. The best performance for each network is emphasized by boldface.
Table 5: The ranking score of different methods in 12 networks. The results are the average of 50 independent implementations with $|E^{ts}|/|E| = 0.1$. The best performance for each network is emphasized by boldface.
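Because the ranking score is not defined in this section, the sketch below assumes the usual form from the literature: all unobserved pairs (testing links plus nonexistent links) are sorted by decreasing similarity, and the score is the mean relative position of the testing links, so lower values are better. The function name, the handling of ties (left to the stable sort order), and the score table are illustrative assumptions.

```python
def ranking_score(score, test_edges, non_edges):
    """Mean relative rank of the testing links among all unobserved pairs."""
    candidates = list(test_edges) + list(non_edges)
    ranked = sorted(candidates, key=lambda pair: score(*pair), reverse=True)
    position = {pair: i + 1 for i, pair in enumerate(ranked)}   # 1-based ranks
    return sum(position[e] / len(ranked) for e in test_edges) / len(test_edges)

# Hypothetical scores for four unobserved pairs, two of which are testing links.
table = {("a", "b"): 3.0, ("c", "d"): 2.0, ("e", "f"): 1.0, ("g", "h"): 0.0}
score = lambda x, y: table[(x, y)]
print(ranking_score(score, [("a", "b"), ("e", "f")], [("c", "d"), ("g", "h")]))
```

Here the testing links sit at positions 1 and 3 of 4, giving (1/4 + 3/4)/2 = 0.5. Every testing link contributes its own position, which is consistent with the remark above that the ranking score considers all missing links.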
Finally, the ranking scores of all methods on the 12 networks with $|E^{ts}|/|E| = 0.2$ are listed in Table 6. Our index outperforms all other indexes except on HEP and USAir in terms of ranking score. These results are consistent with those of AUC. In contrast with FW, the influence of TRA-triangles on HEP and USAir is small.
From the above results, we can conclude that the TRA index is superior to the CAR-based indexes and the CCLP index and performs better than common-neighbor-based methods on most networks.
5. Conclusion and Discussion

Link prediction is an important research topic in complex network analysis and has a wide range of applications in various fields. Inspired by the triangle growth mechanism in network evolution [41], this paper proposed the TRA index for link prediction. When computing the similarity between two seed nodes, the proposed index not only counts the contributions of all common neighbors but also emphasizes the importance of the neighbors that can form TRA-triangles. To some extent, TRA-triangles reflect the close relationships between neighbors and seed nodes. In addition, the proposed index adopts the theory of resource allocation [13] because of its effectiveness.
The accuracy of the TRA index is experimentally evaluated over 12 real-world networks from various fields in terms of AUC and ranking score. The experimental results show that the proposed index performs far better than CAR-based indexes. Meanwhile, our index outperforms the CCLP index because, unlike CCLP, it keeps the contribution of every common neighbor. Compared with common-neighbor-based methods, the proposed index yields some improvement in accuracy on most networks. These results indicate that combining the information of TRA-triangles with the theory of resource allocation in a similarity index is a helpful idea for link prediction.
10 Complexity
Table 6The ranking score of differentmethods in 12 networksThe results are the average of 50 independent implementationswith |119864119905119904||119864| =02 The best performance for each network is emphasized by boldface
There are some improved studies for our index in futureOne of them is to analyze the degree of influence of TRA-triangles on different networks and further to be adaptive toset the weight of TRA-triangles on different networks Thesecond is to study the application of TRA index on othertopics such as community detection and anomaly detectionIn addition for learning-based link prediction approachesTRA index can be used as a feature for a node pair
Data Availability
Thenetworks used in this study are available fromhttpdeimurvcatsimalexandrearenasdatawelcomehtm httpwww-personalumichedusimmejnnetdata httpvladofmfuni-ljsipubnetworksdata httpnoesisikororgdatasetslink-prediction and httpkonectuni-koblenzdenetworks
Conflicts of Interest
The authors declare that they have no conflicts of interest
Acknowledgments
This work was supported by the National Natural ScienceFoundation of China (no 61602225) and the FundamentalResearch Funds for the Central Universities (no lzujbky-2017-192)
References
[1] Q-M Zhang L Lu W-Q Wang Y-X Zhu and T Zhou ldquoPo-tential theory for directed networksrdquo PLoS ONE vol 8 no 2Article ID e55437 2013
[2] Q Zhang X Xu Y Zhu and T Zhou ldquoMeasuring multipleevolution mechanisms of complex networksrdquo Scientific Reportsvol 5 no 1 2015
[3] L Lu M Medo C H Yeung Y Zhang Z Zhang and T ZhouldquoRecommender systemsrdquo Physics Reports vol 519 no 1 pp 1ndash49 2012
[4] R Guimera andM Sales-Pardo ldquoMissing and spurious interac-tions and the reconstruction of complex networksrdquo Proceedingsof the National Acadamy of Sciences of the United States ofAmerica vol 106 no 52 pp 22073ndash22078 2009
[5] S S Bhowmick and B S Seah ldquoClustering and SummarizingProtein-Protein Interaction Networks A Surveyrdquo IEEE Trans-actions on Knowledge and Data Engineering vol 28 no 3 pp638ndash658 2016
[6] L Lu and T Zhou ldquoLink prediction in complex networks a sur-veyrdquo Physica A Statistical Mechanics and its Applications vol390 no 6 pp 1150ndash1170 2011
[7] L Li L Qian XWang S Luo andXChen ldquoAccurate similarityindex based on activity and connectivity of node for link pre-dictionrdquo International Journal of Modern Physics B vol 29 no17 1550108 15 pages 2015
[8] P Wang B Xu Y Wu and X Zhou ldquoLink prediction in socialnetworks the state-of-the-artrdquo Science China Information Sci-ences vol 58 no 1 pp 1ndash38 2014
[9] V Martınez F Berzal and J-C Cubero ldquoA survey of link pre-diction in complex networksrdquoACMComputing Surveys vol 49no 4 pp 691ndash6933 2016
[10] C Ahmed A ElKorany and R Bahgat ldquoA supervised learningapproach to link prediction in Twitterrdquo Social Network Analysisand Mining vol 6 no 1 2016
[11] D Liben-Nowell and J Kleinberg ldquoThe link-prediction prob-lem for social networksrdquo Journal of the Association for Informa-tion Science and Technology vol 58 no 7 pp 1019ndash1031 2007
[12] L A Adamic and E Adar ldquoFriends and neighbors on theWebrdquoSocial Networks vol 25 no 3 pp 211ndash230 2003
[13] T Zhou L Lu and Y-C Zhang ldquoPredicting missing links vialocal informationrdquoThe European Physical Journal B vol 71 no4 pp 623ndash630 2009
[14] L Katz ldquoA new status index derived from sociometric analysisrdquoPsychometrika vol 18 no 1 pp 39ndash43 1953
[15] G Jeh and JWidom ldquoSimRankrdquo in Proceedings of the the eighthACM SIGKDD international conference p 538 EdmontonAlberta Canada July 2002
[16] H Tong C Faloutsos and J Pan ldquoFast random walk with re-start and its applicationsrdquo in Proceedings of the 6th InternationalConference on DataMining (ICDM rsquo06) pp 613ndash622 December2006
Complexity 11
[17] L Lu C-H Jin and T Zhou ldquoSimilarity index based on localpaths for link prediction of complex networksrdquo Physical ReviewE Statistical Nonlinear and Soft Matter Physics vol 80 no 4Article ID 046122 2009
[18] A Papadimitriou P Symeonidis and Y Manolopoulos ldquoFastand accurate link prediction in social networking systemsrdquoTheJournal of Systems and Software vol 85 no 9 pp 2119ndash21322012
[19] W Liu and L Lu ldquoLink prediction based on local randomwalkrdquoEPL (Europhysics Letters) vol 89 no 5 Article ID 58007 2010
[20] C V Cannistraci G Alanis-Lobato and T Ravasi ldquoFrom link-prediction in brain connectomes and protein interactomes tothe local-community-paradigm in complex networksrdquo Scien-tific Reports vol 3 article 1613 no 4 2013
[21] B Chen and L Chen ldquoA link prediction algorithm based on antcolony optimizationrdquoApplied Intelligence vol 41 no 3 pp 694ndash708 2014
[22] D Caiyan L Chen and B Li ldquoLink prediction in complex net-work based on modularityrdquo Soft Computing vol 21 no 15 pp4197ndash4214 2017
[23] V Martnez F Berzal and J-C Cubero ldquoAdaptive degree pena-lization for link predictionrdquo Journal of Computational Sciencevol 13 pp 1ndash9 2016
[24] Z Wu Y Lin J Wang and S Gregory ldquoLink prediction withnode clustering coefficientrdquoPhysica A Statistical Mechanics andits Applications vol 452 pp 1ndash8 2016
[25] PMassaM Salvetti andDTomasoni ldquoBowling alone and trustdecline in social network sitesrdquo in Proceedings of the 8th IEEEInternational Symposium on Dependable Autonomic and SecureComputing DASC 2009 pp 658ndash663 China December 2009
[26] D J Watts and S H Strogatz ldquoCollective dynamics of ldquosmall-worldrdquo networksrdquoNature vol 393 no 6684 pp 440ndash442 1998
[27] D Lusseau K Schneider O J Boisseau P Haase E Slootenand S M Dawson ldquoThe bottlenose dolphin community ofdoubtful sound features a large proportion of long-lasting asso-ciations can geographic isolation explain this unique traitrdquoBehavioral Ecology and Sociobiology vol 54 no 4 pp 396ndash4052003
Table 1: The basic structural features of the giant components of the 12 networks. $|V|$ and $|E|$ are the total numbers of nodes and edges, respectively. $D$ denotes the density, which is $D = 2|E| / (|V|(|V| - 1))$. $\langle k \rangle$ and $\langle d \rangle$ represent the average degree and the average shortest distance, respectively. $C$ and $r$ indicate the clustering coefficient [26] and the assortative coefficient [36], respectively. $H$ is the degree heterogeneity [6], defined as $H = \langle k^2 \rangle / \langle k \rangle^2$, and $e$ is the network efficiency [37].
The link prediction problem is closely related to the network evolving mechanism [2, 40]. A recently proposed triangle growth mechanism demonstrates that various key features observed in most real-world networks can be generated in simulated networks [41]. Therefore, triangle structure information has an important effect on link formation.
In this work, we focus on a new triangle structure, namely, the TRA-triangle. A TRA-triangle passes through one seed node, one common neighbor, and one other node. In our opinion, the common neighbors that can form TRA-triangles are more important than others. Given two nodes $u$ and $v$, we denote the number of triangles passing through them as $\triangle(u, v)$, which is

$$\triangle(u, v) = \begin{cases} CN(u, v) & \text{if } (u, v) \in E \\ 0 & \text{otherwise.} \end{cases} \tag{17}$$

For the example network in Figure 1(a), the triangles used for seed nodes $a$, $b$ are shown in Figure 1(d). Clearly, $\triangle(a, c) = 2$ and $\triangle(a, d) = 1$. Thus, node $c$ is in closer contact with $a$ than $d$ is. Given seed nodes $x$ and $y$, let $z$ be one of their common neighbors. Function $\triangle(x, y, z)$ sums up the number of TRA-triangles formed by $x$, $z$ and by $y$, $z$, which is

$$\triangle(x, y, z) = \triangle(x, z) + \triangle(y, z). \tag{18}$$

In this paper, we propose a new similarity index by combining the aforementioned triangle structure and the idea of the RA index [13]. For the convenience of statement, we name our new method the TRA index. Its definition is

$$TRA(x, y) = \sum_{z \in \Gamma(x) \cap \Gamma(y)} \frac{1 + \triangle(x, y, z)/2}{k_z}, \tag{19}$$

where $\Gamma(x)$ is the set of neighbors of $x$ and $k_z$ is the degree of $z$. In (19), the numerator is $1 + \triangle(x, y, z)/2$. Therefore, the TRA index does not miss the effect of any common neighbor. If all common neighbors are zero-triangle-neighbors, TRA degenerates to RA. For the example network in Figure 1(a), $TRA(a, b) = (1 + 3/2)/4 + (1 + 2/2)/3 + (1 + 0/2)/2 + (1 + 0/2)/4 = 49/24$.
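The computation described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: it assumes an undirected training graph stored as a dictionary of neighbor sets, it reads the triangle count of a linked pair as the number of their common neighbors, and it takes the score of a seed pair to be the sum over common neighbors $z$ of $(1 + t_z/2)/k_z$, where $t_z$ is the triangle count of $z$ with the two seed nodes and $k_z$ is the degree of $z$. The function names (`build_adjacency`, `triangles`, `tra_score`, `ra_score`) and the toy edge list are hypothetical.

```python
from fractions import Fraction


def build_adjacency(edges):
    """Store an undirected graph as a dict mapping each node to its neighbor set."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def triangles(adj, u, v):
    """Number of triangles through the link (u, v); zero when u and v are not linked."""
    if v not in adj.get(u, set()):
        return 0
    return len(adj[u] & adj[v])


def tra_score(adj, x, y):
    """Sum of (1 + t/2) / k_z over the common neighbors z of the seed nodes x and y."""
    score = Fraction(0)
    for z in adj[x] & adj[y]:
        t = triangles(adj, x, z) + triangles(adj, y, z)
        # (1 + t/2) / k_z written as an exact fraction (2 + t) / (2 * k_z)
        score += Fraction(2 + t, 2 * len(adj[z]))
    return score


def ra_score(adj, x, y):
    """Resource allocation score: sum of 1 / k_z over the common neighbors z."""
    return sum((Fraction(1, len(adj[z])) for z in adj[x] & adj[y]), Fraction(0))


# Hypothetical toy network: x and y are unlinked seed nodes with common
# neighbors z1 and z2; the extra node w closes one triangle x-z1-w.
edges = [("x", "z1"), ("y", "z1"), ("x", "z2"), ("y", "z2"), ("x", "w"), ("w", "z1")]
adj = build_adjacency(edges)
print(tra_score(adj, "x", "y"))  # 1    (1/2 from z1 plus 1/2 from z2)
print(ra_score(adj, "x", "y"))   # 5/6  (1/3 from z1 plus 1/2 from z2)
```

In the toy network, $z_2$ closes no triangle and contributes exactly its RA term $1/2$, while $z_1$ closes one triangle and contributes $(1 + 1/2)/3$ instead of $1/3$. Exact fractions are used only so that small examples such as the $49/24$ value in the text can be checked without rounding.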
Table 3: The AUC of different methods on 12 networks. The results are the average of 50 independent implementations with $|E^{ts}|/|E| = 0.1$. The best performance for each network is emphasized in boldface.
Table 3 lists the prediction results of different methods in terms of AUC on the 12 networks. The results are obtained by averaging over 50 independent realizations for each network, with the testing set containing 10% of the links. The highest AUC value for each network is highlighted in boldface. Clearly, the TRA index gets nine best results over the 12 networks. Meanwhile, the TRA index outperforms the CAR, CAA, CRA, and CCLP indexes on all networks. We can see from Table 2 that, on most of the networks, there exist varying degrees of such seed node pairs with common neighbors that belong to $P_{\exists}(S)$ and/or $P_{\forall}(S)$. As stated in the Introduction, the CCLP index will give lower or zero similarity scores to those pairs. Furthermore, both values of $R_{\exists}(S_{CAR})$ and $R_{\forall}(S_{CAR})$ are very high on most of the networks. Particularly, on Dolphin, Email, Hamster, HEP, and Yeast, the corresponding values of $R_{\forall}(S_{CAR})$ are greater than 0.8. This phenomenon indicates that only a very small fraction of seed node pairs with common neighbors on those networks can be assigned similarity scores by CAR-based indexes. Although there are some seed node pairs belonging to $P_{\exists}(S_{TRA})$ and/or $P_{\forall}(S_{TRA})$, the TRA index can still assign reasonable similarity scores to them. Therefore, the results of the TRA index in Table 3 are better than those of the CAR, CAA, CRA, and CCLP indexes. Among the CN, AA, RA, and ADP indexes, the ADP index performs the best since it can penalize common neighbors by automatically adapting to the network. On Dolphin, HEP, and USAir, the ADP index obtains the best accuracy, and the performance of our index is close to the best. In addition, the TRA index achieves much better AUC scores than the others on FW and Karate. This result suggests that TRA-triangles play an important role on these two networks. From Table 1, both networks are dense. Roughly speaking, the probability that there exist TRA-triangle-neighbors between seed nodes is higher on dense networks than on sparse ones.
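For readers who want to reproduce this kind of comparison, the following is a minimal sketch of a sampling-based AUC estimate. It assumes the definition commonly used in the link prediction literature (see the survey [6]): draw a random testing link and a random nonexistent link $n$ times, count the $n'$ draws in which the testing link scores higher and the $n''$ draws in which the scores tie, and report $(n' + 0.5 n'')/n$. The function name `auc`, its parameters, and the toy scoring functions are hypothetical, and the number of comparisons is an arbitrary choice.

```python
import random


def auc(score, test_links, nonexistent_links, n_comparisons=10000, seed=0):
    """Estimate AUC by comparing randomly drawn testing and nonexistent links."""
    rng = random.Random(seed)
    higher = ties = 0
    for _ in range(n_comparisons):
        s_test = score(*rng.choice(test_links))
        s_none = score(*rng.choice(nonexistent_links))
        if s_test > s_none:
            higher += 1
        elif s_test == s_none:
            ties += 1
    return (higher + 0.5 * ties) / n_comparisons


# Hypothetical toy data: two testing links and three nonexistent links.
test_links = [(1, 2), (3, 4)]
nonexistent_links = [(1, 3), (1, 4), (2, 4)]

# A score that separates the two groups perfectly, and one that cannot tell them apart.
perfect = lambda u, v: 1.0 if (u, v) in test_links else 0.0
constant = lambda u, v: 0.0

print(auc(perfect, test_links, nonexistent_links))   # 1.0
print(auc(constant, test_links, nonexistent_links))  # 0.5
```

A similarity index that assigns scores at random would give a value close to 0.5, which is why the AUC values in the tables are read relative to that baseline.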
Figure 2: The results of the Wilcoxon signed-ranks test based on Table 3, showing the $z$ value for each pairwise comparison of TRA with CN, AA, RA, ADP, CAR, CAA, CRA, and CCLP. With $\alpha = 0.05$, if $z \le -1.96$, the null hypothesis is rejected.

To check whether the proposed index is significantly different from the compared methods, we applied the Wilcoxon signed-ranks test [39] based on the results in Table 3. The pairwise test results are presented in Figure 2. From the statistical point of view, our index is significantly better than the others except the ADP index, because the ADP index has the capability of adapting to the structure of a network automatically. Although there is no statistical difference between our index and the ADP index according to the Wilcoxon signed-ranks test, our index performs better than the ADP index in terms of AUC.
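The pairwise test can be sketched as follows. This is an illustrative implementation of the normal approximation described in [39], not the authors' code: absolute differences are ranked (ties share their average rank), $T$ is the smaller of the two rank sums, and $z = (T - N(N+1)/4) / \sqrt{N(N+1)(2N+1)/24}$ for $N$ networks. As a simplification, the ranks of zero differences are split evenly between the two sums. The function name `wilcoxon_z` and the AUC values below are made up for illustration and are not taken from Table 3.

```python
import math


def wilcoxon_z(scores_a, scores_b):
    """z statistic of the Wilcoxon signed-ranks test for paired scores."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    # Rank the absolute differences; tied values share their average rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    start = 0
    while start < n:
        end = start
        while end + 1 < n and abs(diffs[order[end + 1]]) == abs(diffs[order[start]]):
            end += 1
        average_rank = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = average_rank
        start = end + 1
    # Simplification: ranks of zero differences are split evenly between both sums.
    zero_share = 0.5 * sum(r for r, d in zip(ranks, diffs) if d == 0)
    r_plus = sum(r for r, d in zip(ranks, diffs) if d > 0) + zero_share
    r_minus = sum(r for r, d in zip(ranks, diffs) if d < 0) + zero_share
    t = min(r_plus, r_minus)
    return (t - n * (n + 1) / 4) / math.sqrt(n * (n + 1) * (2 * n + 1) / 24)


# Made-up AUC values on 12 networks; method A is better on every network.
method_a = [0.93, 0.92, 0.80, 0.86, 0.72, 0.84, 0.93, 0.76, 0.93, 0.95, 0.67, 0.71]
method_b = [0.87, 0.85, 0.78, 0.85, 0.61, 0.81, 0.92, 0.70, 0.92, 0.94, 0.66, 0.70]

z = wilcoxon_z(method_a, method_b)
print(round(z, 2), z <= -1.96)  # -3.06 True
```

With 12 networks, a method that wins on every network gives $T = 0$ and therefore $z \approx -3.06$, which is the most extreme value this statistic can take for $N = 12$.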
Figure 3 exhibits the changes of AUC on the 12 networks when the proportion of $E^{ts}$ in $E$ increases from 10% to 20%. It is evident from Figure 3 that the AUC values of all indexes show downward trends when the proportion increases from 10% to 20%, except on FW. The reason is that the increase of $E^{ts}$ decreases the size of the training set $E^{tr}$, which in turn reduces the number of common neighbors between seed nodes. Consequently, link prediction becomes more difficult. The FW network, which possesses a high average degree, a small average shortest distance, and a small degree heterogeneity, is a very dense network. Therefore, the decrease of the training set has only a slight influence on the accuracy on FW. In addition, we can observe from Figure 3 that the performance presented by all indexes on ADV, CE, Dolphin, Email, Hamster, HEP, Karate, Word, and Yeast is very similar. On these nine networks, the AUC values of CAR-based indexes are obviously lower than those of the others. On FW, the results of CAR-based indexes are better than those of the CN, AA, RA, and ADP indexes because FW is a very dense network in which the ratio of CAR-triangle-neighbors is very high (see Table 2). On PB and USAir, the performance of CAR-based indexes is not as bad as on the other nine networks. The reason is that both networks have high average degrees, small average shortest distances, and high ratios of CAR-triangle-neighbors.

Figure 3: The changes of AUC when $|E^{ts}|/|E|$ increases from 10% to 20% on 12 networks (panels: ADV, CE, Dolphin, Email, FW, Hamster, HEP, Karate, PB, USAir, Word, and Yeast; indexes: CN, AA, RA, ADP, CAR, CAA, CRA, CCLP, and TRA). Each point is obtained by averaging over 50 independent realizations.
Furthermore, we list the AUC values of different methods on the 12 networks when $|E^{ts}|/|E| = 0.2$ in Table 4. Our index outperforms the others on eight of the 12 networks, while the CCLP index achieves the highest value on CE.
Table 5 gives the results in terms of ranking score. These results are similar to those in Table 3. The TRA index achieves the best ranking score on all networks except Dolphin, HEP, and USAir. The pairwise Wilcoxon signed-ranks test results are shown in Figure 4. Similar to the test in Figure 2, the TRA index is significantly better than the compared methods except the ADP index. As described above, ADP has the adaptive capability and hence performs better than the other compared methods.
Figure 4: The results of the Wilcoxon signed-ranks test based on Table 5, showing the $z$ value for each pairwise comparison of TRA with CN, AA, RA, ADP, CAR, CAA, CRA, and CCLP. With $\alpha = 0.05$, if $z \le -1.96$, the null hypothesis is rejected.

Figure 5: The changes of ranking score when $|E^{ts}|/|E|$ increases from 10% to 20% on 12 networks (panels: ADV, CE, Dolphin, Email, FW, Hamster, HEP, Karate, PB, USAir, Word, and Yeast; indexes: CN, AA, RA, ADP, CAR, CAA, CRA, CCLP, and TRA). Each point is obtained by averaging over 50 independent realizations.

Table 4: The AUC of different methods on 12 networks. The results are the average of 50 independent implementations with $|E^{ts}|/|E| = 0.2$. The best performance for each network is emphasized in boldface.

Table 5: The ranking score of different methods on 12 networks. The results are the average of 50 independent implementations with $|E^{ts}|/|E| = 0.1$. The best performance for each network is emphasized in boldface.

Figure 5 describes the changes of ranking score on the 12 networks when $|E^{ts}|/|E|$ increases from 10% to 20%. Clearly, all indexes yield higher ranking scores with the increase of $E^{ts}$. Note that a higher ranking score means lower accuracy. As analyzed above, FW is very dense. Thus, the changes of AUC on FW are very slight (see Figure 3). However, the changes of ranking score on FW are more evident, especially for the CAA and CRA indexes. The reason is that the calculation of ranking score considers all missing links. In addition, as seen in Figure 5, the CAA and CRA indexes perform worse than the CAR index according to ranking score. From the definitions of these three indexes, we find that both the CAA and CRA indexes can suffer more negative impact from zero-triangle-neighbors than the CAR index.
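The remark that ranking score considers all missing links can be made concrete with a short sketch. It assumes the usual definition of ranking score: all non-observed links (testing links plus nonexistent links) are sorted by decreasing similarity, each testing link gets its position in that list divided by the list length, and the ranking score is the average of these relative positions. The function name `ranking_score` and the toy data are hypothetical, and ties are broken by input order here, which is an arbitrary choice.

```python
def ranking_score(score, test_links, nonexistent_links):
    """Average relative rank of the testing links among all non-observed links."""
    candidates = list(test_links) + list(nonexistent_links)
    # Highest similarity first; sorted() is stable, so ties keep their input order.
    ranked = sorted(candidates, key=lambda link: score(*link), reverse=True)
    position = {link: i + 1 for i, link in enumerate(ranked)}
    total = sum(position[link] for link in test_links)
    return total / (len(test_links) * len(candidates))


# Hypothetical toy data: one testing link among four non-observed links.
test_links = [(1, 2)]
nonexistent_links = [(1, 3), (2, 3), (3, 4)]

best = lambda u, v: 1.0 if (u, v) == (1, 2) else 0.0   # testing link ranked first
worst = lambda u, v: 0.0 if (u, v) == (1, 2) else 1.0  # testing link ranked last

print(ranking_score(best, test_links, nonexistent_links))   # 0.25
print(ranking_score(worst, test_links, nonexistent_links))  # 1.0
```

Unlike the sampled AUC estimate, every testing link enters this average, so an index that leaves many testing links with zero or tied scores near the bottom of the list is penalized directly.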
Finally, the ranking scores of all methods on the 12 networks with $|E^{ts}|/|E| = 0.2$ are listed in Table 6. Our index outperforms all other indexes except on HEP and USAir in terms of ranking score. These results are consistent with those of AUC. In contrast with FW, the influence of TRA-triangles on HEP and USAir is small.

From the above results, we can conclude that the TRA index is superior to CAR-based indexes and the CCLP index and performs better than common-neighbor-based methods on most of the networks.
5. Conclusion and Discussion
Link prediction is an important research topic in complex network analysis and has a wide range of applications in various fields. Inspired by the triangle growth mechanism in network evolution [41], this paper proposed the TRA index for link prediction. When computing the similarity between two seed nodes, the proposed index not only counts the contributions of all common neighbors but also emphasizes the importance of the neighbors that can form TRA-triangles. To some extent, TRA-triangles reflect the close relationships between neighbors and seed nodes. In addition, the proposed index adopts the theory of resource allocation [13] due to its effectiveness.

The accuracy of the TRA index is experimentally evaluated over 12 real-world networks from various fields in terms of AUC and ranking score. The experimental results show that the proposed index performs far better than CAR-based indexes. Meanwhile, our index outperforms the CCLP index because it does not ignore the contribution of any common neighbor. Compared with common-neighbor-based methods, the proposed index yields some improvements in accuracy on most of the networks. These results indicate that combining the information of TRA-triangles and the theory of resource allocation in a similarity index is a helpful idea for link prediction.
Table 6: The ranking score of different methods on 12 networks. The results are the average of 50 independent implementations with $|E^{ts}|/|E| = 0.2$. The best performance for each network is emphasized in boldface.
There are several directions for improving our index in the future. One of them is to analyze the degree of influence of TRA-triangles on different networks and, further, to set the weight of TRA-triangles adaptively on different networks. The second is to study the application of the TRA index to other topics, such as community detection and anomaly detection. In addition, for learning-based link prediction approaches, the TRA index can be used as a feature for a node pair.
Data Availability
The networks used in this study are available from http://deim.urv.cat/~alexandre.arenas/data/welcome.htm, http://www-personal.umich.edu/~mejn/netdata, http://vlado.fmf.uni-lj.si/pub/networks/data, http://noesis.ikor.org/datasets/link-prediction, and http://konect.uni-koblenz.de/networks.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
This work was supported by the National Natural Science Foundation of China (no. 61602225) and the Fundamental Research Funds for the Central Universities (no. lzujbky-2017-192).
References
[1] Q.-M. Zhang, L. Lü, W.-Q. Wang, Y.-X. Zhu, and T. Zhou, “Potential theory for directed networks,” PLoS ONE, vol. 8, no. 2, Article ID e55437, 2013.
[2] Q. Zhang, X. Xu, Y. Zhu, and T. Zhou, “Measuring multiple evolution mechanisms of complex networks,” Scientific Reports, vol. 5, no. 1, 2015.
[3] L. Lü, M. Medo, C. H. Yeung, Y. Zhang, Z. Zhang, and T. Zhou, “Recommender systems,” Physics Reports, vol. 519, no. 1, pp. 1–49, 2012.
[4] R. Guimerà and M. Sales-Pardo, “Missing and spurious interactions and the reconstruction of complex networks,” Proceedings of the National Academy of Sciences of the United States of America, vol. 106, no. 52, pp. 22073–22078, 2009.
[5] S. S. Bhowmick and B. S. Seah, “Clustering and Summarizing Protein-Protein Interaction Networks: A Survey,” IEEE Transactions on Knowledge and Data Engineering, vol. 28, no. 3, pp. 638–658, 2016.
[6] L. Lü and T. Zhou, “Link prediction in complex networks: a survey,” Physica A: Statistical Mechanics and its Applications, vol. 390, no. 6, pp. 1150–1170, 2011.
[7] L. Li, L. Qian, X. Wang, S. Luo, and X. Chen, “Accurate similarity index based on activity and connectivity of node for link prediction,” International Journal of Modern Physics B, vol. 29, no. 17, Article ID 1550108, 15 pages, 2015.
[8] P. Wang, B. Xu, Y. Wu, and X. Zhou, “Link prediction in social networks: the state-of-the-art,” Science China Information Sciences, vol. 58, no. 1, pp. 1–38, 2014.
[9] V. Martínez, F. Berzal, and J.-C. Cubero, “A survey of link prediction in complex networks,” ACM Computing Surveys, vol. 49, no. 4, pp. 69:1–69:33, 2016.
[10] C. Ahmed, A. ElKorany, and R. Bahgat, “A supervised learning approach to link prediction in Twitter,” Social Network Analysis and Mining, vol. 6, no. 1, 2016.
[11] D. Liben-Nowell and J. Kleinberg, “The link-prediction problem for social networks,” Journal of the Association for Information Science and Technology, vol. 58, no. 7, pp. 1019–1031, 2007.
[12] L. A. Adamic and E. Adar, “Friends and neighbors on the Web,” Social Networks, vol. 25, no. 3, pp. 211–230, 2003.
[13] T. Zhou, L. Lü, and Y.-C. Zhang, “Predicting missing links via local information,” The European Physical Journal B, vol. 71, no. 4, pp. 623–630, 2009.
[14] L. Katz, “A new status index derived from sociometric analysis,” Psychometrika, vol. 18, no. 1, pp. 39–43, 1953.
[15] G. Jeh and J. Widom, “SimRank,” in Proceedings of the Eighth ACM SIGKDD International Conference, p. 538, Edmonton, Alberta, Canada, July 2002.
[16] H. Tong, C. Faloutsos, and J. Pan, “Fast random walk with restart and its applications,” in Proceedings of the 6th International Conference on Data Mining (ICDM ’06), pp. 613–622, December 2006.
[17] L. Lü, C.-H. Jin, and T. Zhou, “Similarity index based on local paths for link prediction of complex networks,” Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, vol. 80, no. 4, Article ID 046122, 2009.
[18] A. Papadimitriou, P. Symeonidis, and Y. Manolopoulos, “Fast and accurate link prediction in social networking systems,” The Journal of Systems and Software, vol. 85, no. 9, pp. 2119–2132, 2012.
[19] W. Liu and L. Lü, “Link prediction based on local random walk,” EPL (Europhysics Letters), vol. 89, no. 5, Article ID 58007, 2010.
[20] C. V. Cannistraci, G. Alanis-Lobato, and T. Ravasi, “From link-prediction in brain connectomes and protein interactomes to the local-community-paradigm in complex networks,” Scientific Reports, vol. 3, article 1613, no. 4, 2013.
[21] B. Chen and L. Chen, “A link prediction algorithm based on ant colony optimization,” Applied Intelligence, vol. 41, no. 3, pp. 694–708, 2014.
[22] D. Caiyan, L. Chen, and B. Li, “Link prediction in complex network based on modularity,” Soft Computing, vol. 21, no. 15, pp. 4197–4214, 2017.
[23] V. Martínez, F. Berzal, and J.-C. Cubero, “Adaptive degree penalization for link prediction,” Journal of Computational Science, vol. 13, pp. 1–9, 2016.
[24] Z. Wu, Y. Lin, J. Wang, and S. Gregory, “Link prediction with node clustering coefficient,” Physica A: Statistical Mechanics and its Applications, vol. 452, pp. 1–8, 2016.
[25] P. Massa, M. Salvetti, and D. Tomasoni, “Bowling alone and trust decline in social network sites,” in Proceedings of the 8th IEEE International Symposium on Dependable, Autonomic and Secure Computing, DASC 2009, pp. 658–663, China, December 2009.
[26] D. J. Watts and S. H. Strogatz, “Collective dynamics of ‘small-world’ networks,” Nature, vol. 393, no. 6684, pp. 440–442, 1998.
[27] D. Lusseau, K. Schneider, O. J. Boisseau, P. Haase, E. Slooten, and S. M. Dawson, “The bottlenose dolphin community of doubtful sound features a large proportion of long-lasting associations: can geographic isolation explain this unique trait?” Behavioral Ecology and Sociobiology, vol. 54, no. 4, pp. 396–405, 2003.
[28] R. Guimerà, L. Danon, A. Díaz-Guilera, F. Giralt, and A. Arenas, “Self-similar community structure in a network of human interactions,” Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, vol. 68, no. 6, Article ID 065103, 2003.
[29] R. E. Ulanowicz and D. L. DeAngelis, “Network analysis of trophic dynamics in south florida ecosystems,” in US Geological Survey Program on the South Florida Ecosystem, vol. 114, 45 edition, 2005.
[30] J. Kunegis, “KONECT: the koblenz network collection,” in Proceedings of the 22nd International Conference on World Wide Web (WWW ’13), pp. 1343–1350, May 2013.
[31] M. E. Newman, “The structure of scientific collaboration networks,” Proceedings of the National Academy of Sciences of the United States of America, vol. 98, no. 2, pp. 404–409, 2001.
[32] W. W. Zachary, “An information flow model for conflict and fission in small groups,” Journal of Anthropological Research, vol. 33, no. 4, pp. 452–473, 1977.
[33] L. A. Adamic and N. Glance, “The political blogosphere and the 2004 US Election: Divided they blog,” in Proceedings of the 3rd International Workshop on Link Discovery (LinkKDD ’05), pp. 36–43, ACM, 2005.
[34] M. E. J. Newman, “Finding community structure in networks using the eigenvectors of matrices,” Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, vol. 74, no. 3, Article ID 036104, 19 pages, 2006.
[35] D. Bu, Y. Zhao, L. Cai et al., “Topological structure analysis of the protein-protein interaction network in budding yeast,” Nucleic Acids Research, vol. 31, no. 9, pp. 2443–2450, 2003.
[36] M. E. Newman, “Mixing patterns in networks,” Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, vol. 67, no. 2, 2003.
[37] V. Latora and M. Marchiori, “Efficient behavior of small-world networks,” Physical Review Letters, vol. 87, no. 19, Article ID 198701, 2001.
[38] F. Wilcoxon, “Individual comparisons by ranking methods,” Biometrics Bulletin, vol. 1, no. 6, pp. 80–83, 1945.
[39] J. Demšar, “Statistical comparisons of classifiers over multiple data sets,” Journal of Machine Learning Research, vol. 7, pp. 1–30, 2006.
[40] W.-Q. Wang, Q.-M. Zhang, and T. Zhou, “Evaluating network models: a likelihood analysis,” EPL (Europhysics Letters), vol. 98, no. 2, Article ID 28004, 2012.
6 Complexity
Table 3 The AUC of different methods in 12 networks The results are the average of 50 independent implementations with |119864119905119904||119864| = 01The best performance for each network is emphasized by boldface
Table 3 lists the predicted results of different methods interms of AUC on the 12 networks The results are obtained byaveraging over 50 independent realizations for each networkwith testing set containing 10 links The highest AUC valuefor each network is highlighted in boldface Clearly TRAindex gets nine best results over the 12 networks MeanwhileTRA index outperforms the CAR CAA CRA and CCLPindexes on all networksWe can see fromTable 2 that onmostof the networks there exist varying degrees of such seed nodepairs with common neighbors that belong to 119875exist(119878) andor119875forall(119878) As stated in Introduction CCLP index will give loweror zero similarity scores to those pairs Furthermore bothvalues of 119877exist(119878119862119860119877) and 119877forall(119878119862119860119877) are very high on mostof the networks Particularly on Dolphin Email HamsterHEP and Yeast the corresponding values of 119877forall(119878119862119860119877) aregreater than 08 This phenomenon indicates that only a verysmall fraction of seed node pairs with common neighborson those networks can be assigned similarity scores by CAR-based indexes Although there are some seed node pairsbelonging to 119875exist(119878119879119877119860) andor 119875forall(119878119879119877119860) TRA index still canassign reasonable similarity scores to them Therefore theresults of TRA index in Table 3 are better than them ofCAR CAA CRA and CCLP indexes For CN AA RAand ADP indexes ADP index performs the best since itcan penalize common neighbors by automatically adaptingto the network On Dolphin HEP and USAir ADP indexobtains the best accuracy the performance of our indexapproximates to the best In addition TRA index achievesmuch better AUC scores than others on FW and Karate Thisresult suggests that TRA-triangles play an important role onthese two networks From Table 1 both networks are denseones Roughly speaking the probability that there exist TRA-triangle-neighbors between seed nodes on dense 
networks ismore than on sparse ones
To check whether the proposed index is significantly dif-ferent with compared methods we appliedWilcoxon signed-ranks test [39] based on the results in Table 3 The pairwisetest results are presented in Figure 2 From the statistical point
Wilcoxon signed-ranks testminus18
minus20
minus22
minus24
minus26
minus28
minus30
z
TRA
CN
TRA
AA
TRA
RA
TRA
AD
P
TRA
CA
R
TRA
CA
A
TRA
CRA
TRA
CCL
P
z=-196
Figure 2 The results of Wilcoxon signed-ranks test based onTable 3With 120572 = 005 if 119911 lt= minus196 the null-hypothesis is rejected
of view our index is significantly better than others exceptADP index because ADP index has the capability of adaptingto the structure of a network automatically Although there isno statistical difference between our index and ADP indexaccording toWilcoxon signed-ranks test our index performsbetter than ADP index in terms of AUC
Figure 3 exhibits the changes of AUC on 12 networkswhen the proportion of 119864119905119904 in 119864 increases from 10 to20 It is quite evident from Figure 3 that the AUC valuesof all indexes show downward trends when the proportionincreases from 10 to 20 except on FW The reason is thatthe increase of 119864119905119904 will decrease the size of training set 119864119905119903and then will result in the number of common neighborsbetween seed nodes becoming small Consequently thedifficulty of link prediction will enhance The FW networkwhich possesses high average degree small average shortestdistance and small-degree heterogeneity is a very dense
Complexity 7
CN AA RA ADP CAR CAA CAR CAA CAR CAA
CAR CAA CAR CAA
CAR CAA CAR CAA
CAR CAA CAR CAA
CRA CCLP TRA070
075
080
085
090
095
100AU
C
ADV
1020
1020
1020
1020
1020
1020
1020
1020
1020
1020
1020
1020
CN AA RA ADP CRA CCLP TRA065
070
075
080
085
090
095
AUC
CE
CN AA RA ADP CRA CCLP TRA055
060
065
070
075
080
085
AUC
Dolphin
CN AA RA ADP CAR CAA CRA CCLP TRA060
065
070
075
080
085
090
AUC
Email
CN AA RA ADP CRA CCLP TRA050
055
060
065
070
075
080
AUC
FW
CN AA RA ADP CRA CCLP TRA060
065
070
075
080
085
090
AUC
Hamster
CN AA RA ADP CAR CAA CRA CCLP TRA060065070075080085090095100
AUC
HEP
CN AA RA ADP CRA CCLP TRA050
055
060
065
070
075
080
AUC
Karate
CN AA RA ADP CRA CCLP TRA080008250850087509000925095009751000
AUC
PB
CN AA RA ADP CAR CAA CRA CCLP TRA080008250850087509000925095009751000
AUC
USAir
CN AA RA ADP CRA CCLP TRA050005250550057506000625065006750700
AUC
Word
CN AA RA ADP CRA CCLP TRA050
055
060
065
070
075
080AU
CYeast
Figure 3 The changes of AUC when |119864119905119904||119864| increases from 10 to 20 on 12 networks Each point is obtained by averaging over 50independent realizations
network Therefore the decrease of training set gives slightinfluence of accuracy on FW In addition we can observefrom Figure 3 that the performance presented by all indexeson ADV CE Dolphin Email Hamster HEP Karate Wordand Yeast is very similar On these nine networks the AUCvalues of CAR-based indexes are obvious lower than thoseof others On the network of FW the results of CAR-basedindexes are better than those of CN AA RA and ADPindexes because FW is a very dense network in which theratio of CAR-triangle-neighbor is very high (see Table 2) OnPB and USAir the performance of CAR-based indexes is notas bad as on other nine networks The reason is both networkshave high average degrees small average shortest distancesand high ratio of CAR-triangle-neighbors
Furthermore we list the AUC values of different methodson the 12 networks when |119864119905119904||119864| = 02 in Table 4 Theresults of our index outperform others on eight among the
12 networks while CCLP index achieves the highest value onCE
Table 5 gives the results in terms of ranking score Theseresults are similar to those in Table 3 The ranking score ofTRA index outperforms others except on Dolphin HEP andUSAir The pairwise Wilcoxon signed-ranks test results areshown in Figure 4 Similar to the test in Figure 2 TRA indexis significantly better than compared methods except ADPindex As depicted above ADP has the adaptive capabilityand hence performs better than other compared methods
Figure 5 describes the changes of ranking score on 12networks when |119864119905119904||119864| increases from 10 to 20 Clearlyall indexes yield higher ranking scores with the increase of119864119905119904 Do not forget that higher ranking score means loweraccuracy As analyzed above FW is very dense Thus thechanges of AUC on FW are very slight (see Figure 3)However the changes of ranking score on FW are more
8 Complexity
Wilcoxon signed-ranks testminus18
minus20
minus22
minus24
minus26
minus28
minus30
z
TRA
CN
TRA
AA
TRA
RA
TRA
AD
P
TRA
CA
R
TRA
CA
A
TRA
CRA
TRA
CCL
P
z=-196
Figure 4 The results of Wilcoxon signed-ranks test based on Table 5 With 120572 = 005 if 119911 lt= minus196 the null-hypothesis is rejected
CN AA RA ADP CAR CAA CAR CAA CAR CAA
CAR CAA CAR CAA
CAR CAA CAR CAA
CAR CAA CAR CAA
CRA CCLP TRA010015020025030035040045050
Rank
ing
Scor
e
ADV
1020
1020
1020
1020
1020
1020
1020
1020
1020
1020
1020
1020
CN AA RA ADP CRA CCLP TRA01
02
03
04
05
06
Rank
ing
Scor
e
CE
CN AA RA ADP CRA CCLP TRA03
04
05
06
07
08
Rank
ing
Scor
e
Dolphin
CN AA RA ADP CAR CAA CRA CCLP TRA02
03
04
05
06
07
Rank
ing
Scor
e
Email
CN AA RA ADP CRA CCLP TRA030
032
034
036
038
040
042
044
Rank
ing
Scor
e
FW
CN AA RA ADP CRA CCLP TRA03
04
05
06
07
08
Rank
ing
Scor
e
Hamster
CN AA RA ADP CAR CAA CRA CCLP TRA
02
03
04
05
06
07
Rank
ing
Scor
e
HEP
CN AA RA ADP CRA CCLP TRA02
03
04
05
06
07
08
09
Rank
ing
Scor
e
Karate
CN AA RA ADP CRA CCLP TRA005000750100012501500175020002250250
Rank
ing
Scor
e
PB
CN AA RA ADP CAR CAA CRA CCLP TRA
006
008
010
012
014
016
018
020
Rank
ing
Scor
e
USAir
CN AA RA ADP CRA CCLP TRA04
05
06
07
08
09
Rank
ing
Scor
e
Word
CN AA RA ADP CRA CCLP TRA050055060065070075080085090
Rank
ing
Scor
e
Yeast
Figure 5 The changes of ranking score when |119864119905119904||119864| increases from 10 to 20 on 12 networks Each point is obtained by averaging over50 independent realizations
Complexity 9
Table 4 The AUC of different methods in 12 networks The results are the average of 50 independent implementations with |119864119905119904||119864| = 02The best performance for each network is emphasized by boldface
Table 5The ranking score of differentmethods in 12 networksThe results are the average of 50 independent implementationswith |119864119905119904||119864| =01 The best performance for each network is emphasized by boldface
Complexity 7
[Figure 3: twelve bar-chart panels (ADV, CE, Dolphin, Email, FW, Hamster, HEP, Karate, PB, USAir, Word, Yeast); x-axis: index (CN, AA, RA, ADP, CAR, CAA, CRA, CCLP, TRA); y-axis: AUC; legend: 10%, 20%.]
Figure 3: The changes of AUC when $|E^{ts}|/|E|$ increases from 10% to 20% on 12 networks. Each point is obtained by averaging over 50 independent realizations.
network. Therefore, the decrease of the training set has only a slight influence on accuracy on FW. In addition, we can observe from Figure 3 that the performance of all indexes on ADV, CE, Dolphin, Email, Hamster, HEP, Karate, Word, and Yeast is very similar. On these nine networks, the AUC values of the CAR-based indexes are obviously lower than those of the others. On FW, the results of the CAR-based indexes are better than those of the CN, AA, RA, and ADP indexes because FW is a very dense network in which the ratio of CAR-triangle-neighbors is very high (see Table 2). On PB and USAir, the performance of the CAR-based indexes is not as bad as on the other nine networks. The reason is that both networks have high average degrees, small average shortest distances, and high ratios of CAR-triangle-neighbors.
Furthermore, we list the AUC values of different methods on the 12 networks when $|E^{ts}|/|E| = 0.2$ in Table 4. Our index outperforms the others on eight of the 12 networks, while the CCLP index achieves the highest value on CE.
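To make concrete what an AUC value such as those in Table 4 measures, the following is a minimal sketch assuming the sampling-based definition commonly used in link prediction: draw n pairs of one test link and one nonexistent link, count how often the test link scores higher (n') or equal (n''), and report (n' + 0.5 n'')/n. The function name, the dictionary of precomputed scores, and the default sample count are illustrative assumptions and are not taken from the authors' code.

```python
import random

def estimate_auc(scores, test_links, nonexistent_links, n_samples=10000, seed=0):
    """Sampling-based AUC: (n_higher + 0.5 * n_equal) / n_samples.

    scores maps a node pair to its similarity score (assumed precomputed
    on the training graph); both link arguments are lists of node pairs.
    """
    rng = random.Random(seed)
    n_higher = 0
    n_equal = 0
    for _ in range(n_samples):
        s_test = scores[rng.choice(test_links)]
        s_none = scores[rng.choice(nonexistent_links)]
        if s_test > s_none:
            n_higher += 1
        elif s_test == s_none:
            n_equal += 1
    return (n_higher + 0.5 * n_equal) / n_samples
```

A value near 0.5 corresponds to random scoring, and a value near 1 means test links are almost always ranked above nonexistent links.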
Table 5 gives the results in terms of ranking score. These results are similar to those in Table 3. In terms of ranking score, the TRA index outperforms the others except on Dolphin, HEP, and USAir. The pairwise Wilcoxon signed-ranks test results are shown in Figure 4. Similar to the test in Figure 2, the TRA index is significantly better than the compared methods except the ADP index. As described above, ADP has an adaptive capability and hence performs better than the other compared methods.
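The pairwise comparison can be illustrated with a short sketch of the Wilcoxon signed-ranks z statistic under the usual normal approximation, in the form described in [39]. It assumes one paired result per network for two methods (12 pairs here), drops zero differences, and gives tied absolute differences their average rank; the function name and the handling of zeros are assumptions made for this sketch, not details confirmed by the paper.

```python
import math

def wilcoxon_z(results_a, results_b):
    """z statistic of the Wilcoxon signed-ranks test (normal approximation)."""
    # Paired differences; zero differences are dropped (an assumption of this sketch).
    diffs = [a - b for a, b in zip(results_a, results_b) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    # Rank absolute differences, giving tied values their average rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    r_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    r_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    t = min(r_plus, r_minus)
    mean = n * (n + 1) / 4
    std = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    return (t - mean) / std
```

With 12 networks, the most extreme value this statistic can take is about -3.06 (when one method wins on every network), which is consistent with the z range plotted in Figure 4.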
[Figure 4: Wilcoxon signed-ranks test; x-axis: method pairs (TRA vs CN, AA, RA, ADP, CAR, CAA, CRA, CCLP); y-axis: z, from -1.8 to -3.0; reference line at z = -1.96.]
Figure 4: The results of the Wilcoxon signed-ranks test based on Table 5. With $\alpha = 0.05$, if $z \le -1.96$, the null hypothesis is rejected.
[Figure 5: twelve bar-chart panels (ADV, CE, Dolphin, Email, FW, Hamster, HEP, Karate, PB, USAir, Word, Yeast); x-axis: index (CN, AA, RA, ADP, CAR, CAA, CRA, CCLP, TRA); y-axis: ranking score; legend: 10%, 20%.]
Figure 5: The changes of ranking score when $|E^{ts}|/|E|$ increases from 10% to 20% on 12 networks. Each point is obtained by averaging over 50 independent realizations.
Table 4: The AUC of different methods in 12 networks. The results are the average of 50 independent implementations with $|E^{ts}|/|E| = 0.2$. The best performance for each network is emphasized by boldface.
Table 5: The ranking score of different methods in 12 networks. The results are the average of 50 independent implementations with $|E^{ts}|/|E| = 0.1$. The best performance for each network is emphasized by boldface.
Figure 5 describes the changes of ranking score on 12 networks when $|E^{ts}|/|E|$ increases from 10% to 20%. Clearly, all indexes yield higher ranking scores as $E^{ts}$ grows. Note that a higher ranking score means lower accuracy. As analyzed above, FW is very dense. Thus, the changes of AUC on FW are very slight (see Figure 3). However, the changes of ranking score on FW are more evident, especially for the CAA and CRA indexes. The reason is that the calculation of ranking score considers all missing links. In addition, as seen in Figure 5, the CAA and CRA indexes perform worse than the CAR index according to ranking score. From the definitions of these three indexes, we find that both the CAA and CRA indexes can suffer more negative impact than the CAR index from zero-triangle-neighbors.
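Because the ranking score takes every missing link into account, it reacts more strongly than a sampled AUC when some test links receive low scores. The sketch below assumes the usual definition: each test link is ranked among all unobserved links (test links plus nonexistent links) by descending score, its rank is divided by the number of unobserved links, and the results are averaged. The function name and the use of average ranks for ties are assumptions made for illustration.

```python
from bisect import bisect_left, bisect_right

def ranking_score(scores, test_links, nonexistent_links):
    """Mean relative rank of the test links among all unobserved links.

    Lower is better: 1 / total is the best possible value for a single link.
    """
    candidates = list(test_links) + list(nonexistent_links)
    total = len(candidates)
    ascending = sorted(scores[link] for link in candidates)
    relative_ranks = []
    for link in test_links:
        s = scores[link]
        higher = total - bisect_right(ascending, s)  # strictly higher scores
        equal = bisect_right(ascending, s) - bisect_left(ascending, s)  # ties, incl. itself
        rank = higher + (equal + 1) / 2  # average rank within the tie group
        relative_ranks.append(rank / total)
    return sum(relative_ranks) / len(relative_ranks)
```

If every unobserved link receives the same score, each test link gets the middle rank and the result is close to 0.5, which matches the intuition that an uninformative index ranks missing links at random.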
Finally, the ranking scores of all methods on the 12 networks with $|E^{ts}|/|E| = 0.2$ are listed in Table 6. Our index outperforms all other indexes except on HEP and USAir in terms of ranking score. These results are consistent with those of AUC. In contrast with that on FW, the influence of TRA-triangles on HEP and USAir is small.
From the above results, we can conclude that the TRA index is superior to the CAR-based indexes and the CCLP index, and performs better than common-neighbor-based methods on most of the networks.
5. Conclusion and Discussion
Link prediction is an important research topic of complex network analysis and has a wide range of applications in various fields. Inspired by the triangle growth mechanism in network evolution [41], this paper proposed the TRA index for link prediction. When computing the similarity between two seed nodes, the proposed index not only counts the contributions of all common neighbors but also emphasizes the importance of the neighbors that can form TRA-triangles. To some extent, TRA-triangles reflect the close relationships between neighbors and seed nodes. In addition, the proposed index adopts the theory of resource allocation [13] due to its effectiveness.
The accuracy of the TRA index is experimentally evaluated over 12 real-world networks from various fields in terms of AUC and ranking score. The experimental results show that the proposed index performs far better than the CAR-based indexes. Meanwhile, our index outperforms the CCLP index because of its superior strategy. Compared with common-neighbor-based methods, the proposed index yields some improvements in accuracy on most of the networks. These results indicate that combining the information of TRA-triangles and the theory of resource allocation in a similarity index is a helpful idea for link prediction.
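The two ingredients named above can be sketched separately. The code below computes the resource allocation score of [13], in which each common neighbor z contributes 1/k_z, and counts for each common neighbor the triangles it forms with one seed node and one other node. The exact way the TRA index combines these quantities is defined earlier in the paper and is deliberately not reproduced here; the function names and the adjacency representation (a dict mapping each node to a set of neighbors) are assumptions of this sketch.

```python
def common_neighbors(adj, x, y):
    """Nodes adjacent to both seed nodes x and y."""
    return adj[x] & adj[y]

def ra_index(adj, x, y):
    """Resource allocation: each common neighbor z contributes 1 / degree(z)."""
    return sum(1.0 / len(adj[z]) for z in common_neighbors(adj, x, y))

def seed_neighbor_triangles(adj, x, y):
    """For each common neighbor z, count triangles (seed, z, w) with a third node w.

    w must be adjacent to both z and the seed; the other seed is excluded
    as a candidate for w (an assumption of this sketch).
    """
    counts = {}
    for z in common_neighbors(adj, x, y):
        via_x = len((adj[x] & adj[z]) - {y})
        via_y = len((adj[y] & adj[z]) - {x})
        counts[z] = via_x + via_y
    return counts

# Toy graph (hypothetical): edges x-a, y-a, x-b, y-b, a-c, x-c.
toy = {
    "x": {"a", "b", "c"},
    "y": {"a", "b"},
    "a": {"x", "y", "c"},
    "b": {"x", "y"},
    "c": {"a", "x"},
}
toy_ra = ra_index(toy, "x", "y")
toy_triangles = seed_neighbor_triangles(toy, "x", "y")
```

In the toy graph, common neighbor `a` closes one triangle with seed `x` through node `c`, while `b` closes none; `b` still contributes 1/2 to the resource allocation score, which mirrors the point that no common neighbor is ignored.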
Table 6: The ranking score of different methods in 12 networks. The results are the average of 50 independent implementations with $|E^{ts}|/|E| = 0.2$. The best performance for each network is emphasized by boldface.
Several extensions of our index remain for future work. One is to analyze the degree of influence of TRA-triangles on different networks and, further, to set the weight of TRA-triangles adaptively for each network. The second is to study the application of the TRA index to other topics such as community detection and anomaly detection. In addition, for learning-based link prediction approaches, the TRA index can be used as a feature for a node pair.
Data Availability
The networks used in this study are available from http://deim.urv.cat/~alexandre.arenas/data/welcome.htm, http://www-personal.umich.edu/~mejn/netdata, http://vlado.fmf.uni-lj.si/pub/networks/data, http://noesis.ikor.org/datasets/link-prediction, and http://konect.uni-koblenz.de/networks.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
This work was supported by the National Natural Science Foundation of China (no. 61602225) and the Fundamental Research Funds for the Central Universities (no. lzujbky-2017-192).
References
[1] Q.-M. Zhang, L. Lu, W.-Q. Wang, Y.-X. Zhu, and T. Zhou, "Potential theory for directed networks," PLoS ONE, vol. 8, no. 2, Article ID e55437, 2013.
[2] Q. Zhang, X. Xu, Y. Zhu, and T. Zhou, "Measuring multiple evolution mechanisms of complex networks," Scientific Reports, vol. 5, no. 1, 2015.
[3] L. Lu, M. Medo, C. H. Yeung, Y. Zhang, Z. Zhang, and T. Zhou, "Recommender systems," Physics Reports, vol. 519, no. 1, pp. 1–49, 2012.
[4] R. Guimera and M. Sales-Pardo, "Missing and spurious interactions and the reconstruction of complex networks," Proceedings of the National Academy of Sciences of the United States of America, vol. 106, no. 52, pp. 22073–22078, 2009.
[5] S. S. Bhowmick and B. S. Seah, "Clustering and Summarizing Protein-Protein Interaction Networks: A Survey," IEEE Transactions on Knowledge and Data Engineering, vol. 28, no. 3, pp. 638–658, 2016.
[6] L. Lu and T. Zhou, "Link prediction in complex networks: a survey," Physica A: Statistical Mechanics and its Applications, vol. 390, no. 6, pp. 1150–1170, 2011.
[7] L. Li, L. Qian, X. Wang, S. Luo, and X. Chen, "Accurate similarity index based on activity and connectivity of node for link prediction," International Journal of Modern Physics B, vol. 29, no. 17, Article ID 1550108, 15 pages, 2015.
[8] P. Wang, B. Xu, Y. Wu, and X. Zhou, "Link prediction in social networks: the state-of-the-art," Science China Information Sciences, vol. 58, no. 1, pp. 1–38, 2014.
[9] V. Martínez, F. Berzal, and J.-C. Cubero, "A survey of link prediction in complex networks," ACM Computing Surveys, vol. 49, no. 4, pp. 69:1–69:33, 2016.
[10] C. Ahmed, A. ElKorany, and R. Bahgat, "A supervised learning approach to link prediction in Twitter," Social Network Analysis and Mining, vol. 6, no. 1, 2016.
[11] D. Liben-Nowell and J. Kleinberg, "The link-prediction problem for social networks," Journal of the Association for Information Science and Technology, vol. 58, no. 7, pp. 1019–1031, 2007.
[12] L. A. Adamic and E. Adar, "Friends and neighbors on the Web," Social Networks, vol. 25, no. 3, pp. 211–230, 2003.
[13] T. Zhou, L. Lu, and Y.-C. Zhang, "Predicting missing links via local information," The European Physical Journal B, vol. 71, no. 4, pp. 623–630, 2009.
[14] L. Katz, "A new status index derived from sociometric analysis," Psychometrika, vol. 18, no. 1, pp. 39–43, 1953.
[15] G. Jeh and J. Widom, "SimRank," in Proceedings of the Eighth ACM SIGKDD International Conference, p. 538, Edmonton, Alberta, Canada, July 2002.
[16] H. Tong, C. Faloutsos, and J. Pan, "Fast random walk with restart and its applications," in Proceedings of the 6th International Conference on Data Mining (ICDM '06), pp. 613–622, December 2006.
[17] L. Lu, C.-H. Jin, and T. Zhou, "Similarity index based on local paths for link prediction of complex networks," Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, vol. 80, no. 4, Article ID 046122, 2009.
[18] A. Papadimitriou, P. Symeonidis, and Y. Manolopoulos, "Fast and accurate link prediction in social networking systems," The Journal of Systems and Software, vol. 85, no. 9, pp. 2119–2132, 2012.
[19] W. Liu and L. Lu, "Link prediction based on local random walk," EPL (Europhysics Letters), vol. 89, no. 5, Article ID 58007, 2010.
[20] C. V. Cannistraci, G. Alanis-Lobato, and T. Ravasi, "From link-prediction in brain connectomes and protein interactomes to the local-community-paradigm in complex networks," Scientific Reports, vol. 3, article 1613, no. 4, 2013.
[21] B. Chen and L. Chen, "A link prediction algorithm based on ant colony optimization," Applied Intelligence, vol. 41, no. 3, pp. 694–708, 2014.
[22] D. Caiyan, L. Chen, and B. Li, "Link prediction in complex network based on modularity," Soft Computing, vol. 21, no. 15, pp. 4197–4214, 2017.
[23] V. Martínez, F. Berzal, and J.-C. Cubero, "Adaptive degree penalization for link prediction," Journal of Computational Science, vol. 13, pp. 1–9, 2016.
[24] Z. Wu, Y. Lin, J. Wang, and S. Gregory, "Link prediction with node clustering coefficient," Physica A: Statistical Mechanics and its Applications, vol. 452, pp. 1–8, 2016.
[25] P. Massa, M. Salvetti, and D. Tomasoni, "Bowling alone and trust decline in social network sites," in Proceedings of the 8th IEEE International Symposium on Dependable, Autonomic and Secure Computing, DASC 2009, pp. 658–663, China, December 2009.
[26] D. J. Watts and S. H. Strogatz, "Collective dynamics of 'small-world' networks," Nature, vol. 393, no. 6684, pp. 440–442, 1998.
[27] D. Lusseau, K. Schneider, O. J. Boisseau, P. Haase, E. Slooten, and S. M. Dawson, "The bottlenose dolphin community of Doubtful Sound features a large proportion of long-lasting associations: can geographic isolation explain this unique trait?" Behavioral Ecology and Sociobiology, vol. 54, no. 4, pp. 396–405, 2003.
[28] R. Guimera, L. Danon, A. Díaz-Guilera, F. Giralt, and A. Arenas, "Self-similar community structure in a network of human interactions," Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, vol. 68, no. 6, Article ID 065103, 2003.
[29] R. E. Ulanowicz and D. L. DeAngelis, "Network analysis of trophic dynamics in South Florida ecosystems," in US Geological Survey Program on the South Florida Ecosystem, vol. 114, 45 edition, 2005.
[30] J. Kunegis, "KONECT – the Koblenz network collection," in Proceedings of the 22nd International Conference on World Wide Web (WWW '13), pp. 1343–1350, May 2013.
[31] M. E. Newman, "The structure of scientific collaboration networks," Proceedings of the National Academy of Sciences of the United States of America, vol. 98, no. 2, pp. 404–409, 2001.
[32] W. W. Zachary, "An information flow model for conflict and fission in small groups," Journal of Anthropological Research, vol. 33, no. 4, pp. 452–473, 1977.
[33] L. A. Adamic and N. Glance, "The political blogosphere and the 2004 US Election: divided they blog," in Proceedings of the 3rd International Workshop on Link Discovery (LinkKDD '05), pp. 36–43, ACM, 2005.
[34] M. E. J. Newman, "Finding community structure in networks using the eigenvectors of matrices," Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, vol. 74, no. 3, Article ID 036104, 19 pages, 2006.
[35] D. Bu, Y. Zhao, L. Cai et al., "Topological structure analysis of the protein-protein interaction network in budding yeast," Nucleic Acids Research, vol. 31, no. 9, pp. 2443–2450, 2003.
[36] M. E. Newman, "Mixing patterns in networks," Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, vol. 67, no. 2, 2003.
[37] V. Latora and M. Marchiori, "Efficient behavior of small-world networks," Physical Review Letters, vol. 87, no. 19, Article ID 198701, 2001.
[38] F. Wilcoxon, "Individual comparisons by ranking methods," Biometrics Bulletin, vol. 1, no. 6, pp. 80–83, 1945.
[39] J. Demsar, "Statistical comparisons of classifiers over multiple data sets," Journal of Machine Learning Research, vol. 7, pp. 1–30, 2006.
[40] W.-Q. Wang, Q.-M. Zhang, and T. Zhou, "Evaluating network models: a likelihood analysis," EPL (Europhysics Letters), vol. 98, no. 2, Article ID 28004, 2012.
[18] A Papadimitriou P Symeonidis and Y Manolopoulos ldquoFastand accurate link prediction in social networking systemsrdquoTheJournal of Systems and Software vol 85 no 9 pp 2119ndash21322012
[19] W Liu and L Lu ldquoLink prediction based on local randomwalkrdquoEPL (Europhysics Letters) vol 89 no 5 Article ID 58007 2010
[20] C V Cannistraci G Alanis-Lobato and T Ravasi ldquoFrom link-prediction in brain connectomes and protein interactomes tothe local-community-paradigm in complex networksrdquo Scien-tific Reports vol 3 article 1613 no 4 2013
[21] B Chen and L Chen ldquoA link prediction algorithm based on antcolony optimizationrdquoApplied Intelligence vol 41 no 3 pp 694ndash708 2014
[22] D Caiyan L Chen and B Li ldquoLink prediction in complex net-work based on modularityrdquo Soft Computing vol 21 no 15 pp4197ndash4214 2017
[23] V Martnez F Berzal and J-C Cubero ldquoAdaptive degree pena-lization for link predictionrdquo Journal of Computational Sciencevol 13 pp 1ndash9 2016
[24] Z Wu Y Lin J Wang and S Gregory ldquoLink prediction withnode clustering coefficientrdquoPhysica A Statistical Mechanics andits Applications vol 452 pp 1ndash8 2016
[25] PMassaM Salvetti andDTomasoni ldquoBowling alone and trustdecline in social network sitesrdquo in Proceedings of the 8th IEEEInternational Symposium on Dependable Autonomic and SecureComputing DASC 2009 pp 658ndash663 China December 2009
[26] D J Watts and S H Strogatz ldquoCollective dynamics of ldquosmall-worldrdquo networksrdquoNature vol 393 no 6684 pp 440ndash442 1998
[27] D Lusseau K Schneider O J Boisseau P Haase E Slootenand S M Dawson ldquoThe bottlenose dolphin community ofdoubtful sound features a large proportion of long-lasting asso-ciations can geographic isolation explain this unique traitrdquoBehavioral Ecology and Sociobiology vol 54 no 4 pp 396ndash4052003
[28] RGuimera L DanonADıaz-Guilera F Giralt andAArenasldquoSelf-similar community structure in a network of humaninteractionsrdquo Physical Review E Statistical Nonlinear and SoftMatter Physics vol 68 no 6 Article ID 065103 2003
[29] R E Ulanowicz and D L DeAngelis ldquoNetwork analysis of tro-phic dynamics in south florida ecosystemsrdquo in US GeologicalSurvey Program on the South Florida Ecosystem vol 114 45edition 2005
[30] J Kunegis ldquoKONECTmdashthe koblenz network collectionrdquo inPro-ceedings of the 22nd International Conference on World WideWeb (WWW rsquo13) pp 1343ndash1350 May 2013
[31] M E Newman ldquoThe structure of scientific collaboration net-worksrdquo Proceedings of the National Acadamy of Sciences of theUnited States of America vol 98 no 2 pp 404ndash409 2001
[32] WW Zachary ldquoAn information flowmodel for conflict and fis-sion in small groupsrdquo Journal of Anthropological Research vol33 no 4 pp 452ndash473 1977
[33] L A Adamic andN Glance ldquoThe political blogosphere and the2004 US Election Divided they blogrdquo in Proceedings of the 3rdInternational Workshop on Link Discovery (LinkKDD rsquo05) pp36ndash43 ACM 2005
[34] M E J Newman ldquoFinding community structure in networksusing the eigenvectors of matricesrdquo Physical Review E Statisti-cal Nonlinear and Soft Matter Physics vol 74 no 3 Article ID036104 19 pages 2006
[35] D Bu Y Zhao L Cai et al ldquoTopological structure analysis ofthe protein-protein interaction network in budding yeastrdquoNucleic Acids Research vol 31 no 9 pp 2443ndash2450 2003
[36] M E Newman ldquoMixing patterns in networksrdquo Physical ReviewE Statistical Nonlinear and Soft Matter Physics vol 67 no 22003
[37] V Latora and M Marchiori ldquoEfficient behavior of small-worldnetworksrdquo Physical Review Letters vol 87 no 19 Article ID198701 2001
[38] F Wilcoxon ldquoIndividual comparisons by ranking methodsrdquoBiometrics Bulletin vol 1 no 6 pp 80ndash83 1945
[39] J Demsar ldquoStatistical comparisons of classifiers over multipledata setsrdquo Journal of Machine Learning Research vol 7 pp 1ndash302006
[40] W-Q Wang Q-M Zhang and T Zhou ldquoEvaluating networkmodels a likelihood analysisrdquo EPL (Europhysics Letters) vol 98no 2 Article ID 28004 2012
Numerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisAdvances inAdvances in Discrete Dynamics in
Nature and SocietyHindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom
Dierential EquationsInternational Journal of
Volume 2018
Hindawiwwwhindawicom Volume 2018
Decision SciencesAdvances in
Hindawiwwwhindawicom Volume 2018
AnalysisInternational Journal of
Hindawiwwwhindawicom Volume 2018
Stochastic AnalysisInternational Journal of
Submit your manuscripts atwwwhindawicom
Table 4: The AUC of different methods in 12 networks. The results are the average of 50 independent implementations with $|E^{ts}|/|E| = 0.2$. The best performance for each network is emphasized by boldface.
Table 5: The ranking score of different methods in 12 networks. The results are the average of 50 independent implementations with $|E^{ts}|/|E| = 0.1$. The best performance for each network is emphasized by boldface.
evident, especially for the CAA and CRA indexes. The reason is that the calculation of ranking score considers all missing links. In addition, as seen in Figure 5, the CAA and CRA indexes perform worse than the CAR index according to ranking score. From the definitions of these three indexes, we find that both the CAA and CRA indexes can suffer more negative impact from zero-triangle-neighbors than the CAR index does.
Finally, the ranking scores of all methods on the 12 networks with $|E^{ts}|/|E| = 0.2$ are listed in Table 6. Our index outperforms all other indexes in terms of ranking score except on HEP and USAir. These results are consistent with those for AUC. In contrast with FW, the influence of TRA-triangles on HEP and USAir is small.
From the above results, we can conclude that the TRA index is superior to the CAR-based indexes and the CCLP index and performs better than common-neighbor-based methods on most networks.
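As an illustration of the two evaluation metrics discussed above, the following is a minimal sketch based on the definitions commonly used in the link prediction literature; it is not taken from the authors' code. The function names, the exact (non-sampled) AUC, and the tie-handling rule for ranks are assumptions made for this example. Here `missing_scores` holds the similarity scores of the held-out test links and `absent_scores` holds the scores of nonexistent links.

```python
def auc_exact(missing_scores, absent_scores):
    """Probability that a missing link scores higher than a nonexistent
    link, counting ties as 0.5 (computed over all pairs, no sampling)."""
    higher = sum(1 for m in missing_scores for a in absent_scores if m > a)
    equal = sum(1 for m in missing_scores for a in absent_scores if m == a)
    return (higher + 0.5 * equal) / (len(missing_scores) * len(absent_scores))


def ranking_score(missing_scores, absent_scores):
    """Average relative rank of the missing links among all unobserved
    pairs, sorted by descending score. Lower values are better.
    Assumption: tied scores share the best (smallest) rank."""
    ranked = sorted(missing_scores + absent_scores, reverse=True)
    total = len(ranked)
    # list.index returns the first position of the score, i.e. its best rank
    relative_ranks = [(ranked.index(s) + 1) / total for s in missing_scores]
    return sum(relative_ranks) / len(relative_ranks)


# Hypothetical toy scores, for illustration only
missing = [0.9, 0.4]
absent = [0.5, 0.4, 0.1]
auc = auc_exact(missing, absent)
rs = ranking_score(missing, absent)
```

Because the ranking score averages over every missing link, a test link that receives a very low score pulls the average up directly, whereas AUC only compares pairs of scores. This is consistent with the remark above that ranking score considers all missing links.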
5 Conclusion and Discussion
Link prediction is an important research topic in complex network analysis and has a wide range of applications in various fields. Inspired by the triangle growth mechanism in network evolution [41], this paper proposed the TRA index for link prediction. When computing the similarity between two seed nodes, the proposed index not only counts the contributions of all common neighbors but also emphasizes the importance of the neighbors that can form TRA-triangles. To some extent, TRA-triangles reflect the close relationships between neighbors and seed nodes. In addition, the proposed index adopts the theory of resource allocation [13] because of its effectiveness.
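The two ingredients named above can be sketched in a few lines. This is a hedged illustration, not the TRA index itself: `resource_allocation` is the standard RA index of [13], and `seed_triangle_counts` counts, for each common neighbor, the triangles made of one seed node, that common neighbor, and one other node. The function names, the dict-of-sets graph representation, and the choice to exclude the two seed nodes from the role of "other node" are assumptions of this sketch; how the paper combines these quantities into the TRA score is given by its own definition and is not reproduced here.

```python
def build_adjacency(edges):
    """Build an undirected adjacency map {node: set(neighbors)}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def resource_allocation(adj, x, y):
    """Standard RA index: each common neighbor z contributes 1/k_z,
    so large-degree neighbors are penalized."""
    return sum(1.0 / len(adj[z]) for z in adj[x] & adj[y])


def seed_triangle_counts(adj, x, y):
    """For each common neighbor z of seeds x and y, count triangles
    (seed, z, w) where w is adjacent to both the seed and z.
    Assumption: w may be any node other than the two seeds."""
    counts = {}
    for z in adj[x] & adj[y]:
        n = 0
        for seed in (x, y):
            n += len((adj[seed] & adj[z]) - {x, y})
        counts[z] = n
    return counts


# Hypothetical toy graph: a and b are common neighbors of seeds x and y;
# only a closes a triangle with a seed, namely (x, a, c).
edges = [("x", "a"), ("y", "a"), ("x", "b"), ("y", "b"),
         ("a", "c"), ("x", "c"), ("b", "d")]
adj = build_adjacency(edges)
ra = resource_allocation(adj, "x", "y")
triangles = seed_triangle_counts(adj, "x", "y")
```

In this toy graph both common neighbors have degree 3 and therefore contribute equally to RA, while only `a` takes part in a seed triangle. That difference between neighbors is the kind of information the conclusion says the proposed index emphasizes, without dropping the contribution of `b`.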
The accuracy of the TRA index is experimentally evaluated on 12 real-world networks from various fields in terms of AUC and ranking score. The experimental results show that the proposed index performs far better than the CAR-based indexes. Our index also outperforms the CCLP index, which we attribute to its strategy of emphasizing TRA-triangles without ignoring the contribution of any common neighbor. Compared with common-neighbor-based methods, the proposed index yields some improvement in accuracy on most networks. These results indicate that combining the information of TRA-triangles with the theory of resource allocation in a similarity index is a helpful idea for link prediction.
Table 6: The ranking score of different methods in 12 networks. The results are the average of 50 independent implementations with $|E^{ts}|/|E| = 0.2$. The best performance for each network is emphasized by boldface.
Several directions remain for future work. The first is to analyze how strongly TRA-triangles influence different networks and then to set the weight of TRA-triangles adaptively for each network. The second is to study the application of the TRA index to other topics, such as community detection and anomaly detection. In addition, for learning-based link prediction approaches, the TRA index can be used as a feature of a node pair.
Data Availability
The networks used in this study are available from http://deim.urv.cat/~alexandre.arenas/data/welcome.htm, http://www-personal.umich.edu/~mejn/netdata/, http://vlado.fmf.uni-lj.si/pub/networks/data/, http://noesis.ikor.org/datasets/link-prediction, and http://konect.uni-koblenz.de/networks/.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
This work was supported by the National Natural Science Foundation of China (no. 61602225) and the Fundamental Research Funds for the Central Universities (no. lzujbky-2017-192).
References
[1] Q.-M. Zhang, L. Lu, W.-Q. Wang, Y.-X. Zhu, and T. Zhou, "Potential theory for directed networks," PLoS ONE, vol. 8, no. 2, Article ID e55437, 2013.
[2] Q. Zhang, X. Xu, Y. Zhu, and T. Zhou, "Measuring multiple evolution mechanisms of complex networks," Scientific Reports, vol. 5, no. 1, 2015.
[3] L. Lu, M. Medo, C. H. Yeung, Y. Zhang, Z. Zhang, and T. Zhou, "Recommender systems," Physics Reports, vol. 519, no. 1, pp. 1–49, 2012.
[4] R. Guimera and M. Sales-Pardo, "Missing and spurious interactions and the reconstruction of complex networks," Proceedings of the National Academy of Sciences of the United States of America, vol. 106, no. 52, pp. 22073–22078, 2009.
[5] S. S. Bhowmick and B. S. Seah, "Clustering and Summarizing Protein-Protein Interaction Networks: A Survey," IEEE Transactions on Knowledge and Data Engineering, vol. 28, no. 3, pp. 638–658, 2016.
[6] L. Lu and T. Zhou, "Link prediction in complex networks: a survey," Physica A: Statistical Mechanics and its Applications, vol. 390, no. 6, pp. 1150–1170, 2011.
[7] L. Li, L. Qian, X. Wang, S. Luo, and X. Chen, "Accurate similarity index based on activity and connectivity of node for link prediction," International Journal of Modern Physics B, vol. 29, no. 17, 1550108, 15 pages, 2015.
[8] P. Wang, B. Xu, Y. Wu, and X. Zhou, "Link prediction in social networks: the state-of-the-art," Science China Information Sciences, vol. 58, no. 1, pp. 1–38, 2014.
[9] V. Martínez, F. Berzal, and J.-C. Cubero, "A survey of link prediction in complex networks," ACM Computing Surveys, vol. 49, no. 4, pp. 69:1–69:33, 2016.
[10] C. Ahmed, A. ElKorany, and R. Bahgat, "A supervised learning approach to link prediction in Twitter," Social Network Analysis and Mining, vol. 6, no. 1, 2016.
[11] D. Liben-Nowell and J. Kleinberg, "The link-prediction problem for social networks," Journal of the Association for Information Science and Technology, vol. 58, no. 7, pp. 1019–1031, 2007.
[12] L. A. Adamic and E. Adar, "Friends and neighbors on the Web," Social Networks, vol. 25, no. 3, pp. 211–230, 2003.
[13] T. Zhou, L. Lu, and Y.-C. Zhang, "Predicting missing links via local information," The European Physical Journal B, vol. 71, no. 4, pp. 623–630, 2009.
[14] L. Katz, "A new status index derived from sociometric analysis," Psychometrika, vol. 18, no. 1, pp. 39–43, 1953.
[15] G. Jeh and J. Widom, "SimRank," in Proceedings of the eighth ACM SIGKDD international conference, p. 538, Edmonton, Alberta, Canada, July 2002.
[16] H. Tong, C. Faloutsos, and J. Pan, "Fast random walk with restart and its applications," in Proceedings of the 6th International Conference on Data Mining (ICDM '06), pp. 613–622, December 2006.
[17] L. Lu, C.-H. Jin, and T. Zhou, "Similarity index based on local paths for link prediction of complex networks," Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, vol. 80, no. 4, Article ID 046122, 2009.
[18] A. Papadimitriou, P. Symeonidis, and Y. Manolopoulos, "Fast and accurate link prediction in social networking systems," The Journal of Systems and Software, vol. 85, no. 9, pp. 2119–2132, 2012.
[19] W. Liu and L. Lu, "Link prediction based on local random walk," EPL (Europhysics Letters), vol. 89, no. 5, Article ID 58007, 2010.
[20] C. V. Cannistraci, G. Alanis-Lobato, and T. Ravasi, "From link-prediction in brain connectomes and protein interactomes to the local-community-paradigm in complex networks," Scientific Reports, vol. 3, article 1613, no. 4, 2013.
[21] B. Chen and L. Chen, "A link prediction algorithm based on ant colony optimization," Applied Intelligence, vol. 41, no. 3, pp. 694–708, 2014.
[22] D. Caiyan, L. Chen, and B. Li, "Link prediction in complex network based on modularity," Soft Computing, vol. 21, no. 15, pp. 4197–4214, 2017.
[23] V. Martínez, F. Berzal, and J.-C. Cubero, "Adaptive degree penalization for link prediction," Journal of Computational Science, vol. 13, pp. 1–9, 2016.
[24] Z. Wu, Y. Lin, J. Wang, and S. Gregory, "Link prediction with node clustering coefficient," Physica A: Statistical Mechanics and its Applications, vol. 452, pp. 1–8, 2016.
[25] P. Massa, M. Salvetti, and D. Tomasoni, "Bowling alone and trust decline in social network sites," in Proceedings of the 8th IEEE International Symposium on Dependable, Autonomic and Secure Computing, DASC 2009, pp. 658–663, China, December 2009.
[26] D. J. Watts and S. H. Strogatz, "Collective dynamics of 'small-world' networks," Nature, vol. 393, no. 6684, pp. 440–442, 1998.
[27] D. Lusseau, K. Schneider, O. J. Boisseau, P. Haase, E. Slooten, and S. M. Dawson, "The bottlenose dolphin community of doubtful sound features a large proportion of long-lasting associations: can geographic isolation explain this unique trait?" Behavioral Ecology and Sociobiology, vol. 54, no. 4, pp. 396–405, 2003.
[28] R. Guimera, L. Danon, A. Díaz-Guilera, F. Giralt, and A. Arenas, "Self-similar community structure in a network of human interactions," Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, vol. 68, no. 6, Article ID 065103, 2003.
[29] R. E. Ulanowicz and D. L. DeAngelis, "Network analysis of trophic dynamics in south florida ecosystems," in US Geological Survey Program on the South Florida Ecosystem, vol. 114, 45 edition, 2005.
[30] J. Kunegis, "KONECT – the koblenz network collection," in Proceedings of the 22nd International Conference on World Wide Web (WWW '13), pp. 1343–1350, May 2013.
[31] M. E. Newman, "The structure of scientific collaboration networks," Proceedings of the National Academy of Sciences of the United States of America, vol. 98, no. 2, pp. 404–409, 2001.
[32] W. W. Zachary, "An information flow model for conflict and fission in small groups," Journal of Anthropological Research, vol. 33, no. 4, pp. 452–473, 1977.
[33] L. A. Adamic and N. Glance, "The political blogosphere and the 2004 US Election: Divided they blog," in Proceedings of the 3rd International Workshop on Link Discovery (LinkKDD '05), pp. 36–43, ACM, 2005.
[34] M. E. J. Newman, "Finding community structure in networks using the eigenvectors of matrices," Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, vol. 74, no. 3, Article ID 036104, 19 pages, 2006.
[35] D. Bu, Y. Zhao, L. Cai et al., "Topological structure analysis of the protein-protein interaction network in budding yeast," Nucleic Acids Research, vol. 31, no. 9, pp. 2443–2450, 2003.
[36] M. E. Newman, "Mixing patterns in networks," Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, vol. 67, no. 2, 2003.
[37] V. Latora and M. Marchiori, "Efficient behavior of small-world networks," Physical Review Letters, vol. 87, no. 19, Article ID 198701, 2001.
[38] F. Wilcoxon, "Individual comparisons by ranking methods," Biometrics Bulletin, vol. 1, no. 6, pp. 80–83, 1945.
[39] J. Demsar, "Statistical comparisons of classifiers over multiple data sets," Journal of Machine Learning Research, vol. 7, pp. 1–30, 2006.
[40] W.-Q. Wang, Q.-M. Zhang, and T. Zhou, "Evaluating network models: a likelihood analysis," EPL (Europhysics Letters), vol. 98, no. 2, Article ID 28004, 2012.