Node Embedding with Adaptive Similarities for Scalable Learning
over Graphs
Dimitris Berberidis, Student Member, IEEE, and Georgios B.
Giannakis, Fellow, IEEE
Abstract—Node embedding is the task of extracting informative
and descriptive features over the nodes of a graph. The importance
of
node embedding for graph analytics as well as learning tasks,
such as node classification, link prediction, and community
detection,
has led to a growing interest and a number of recent advances.
Nonetheless, node embedding faces several major challenges.
Practical embedding methods have to deal with real-world graphs
that arise from different domains, with inherently diverse
underlying
processes as well as similarity structures and metrics. On the
other hand, similar to principal component analysis in feature
vector
spaces, node embedding is an inherently unsupervised task.
Lacking metadata for validation, practical schemes motivate
standardization and limited use of tunable hyperparameters.
Finally, node embedding methods must be scalable in order to cope
with
large-scale real-world graphs of networks with ever-increasing
size. The present work puts forth an adaptive node embedding
framework that adjusts the embedding process to a given
underlying graph, in a fully unsupervised manner. This is achieved
by
leveraging the notion of a tunable node similarity matrix that
assigns weights on multihop paths. The design of multihop
similarities
ensures that the resultant embeddings also inherit interpretable
spectral properties. The proposed model is thoroughly
investigated,
interpreted, and numerically evaluated using stochastic block
models. Moreover, an unsupervised algorithm is developed for
training
the model parameters efficiently. Extensive node
classification, link prediction, and clustering experiments are
carried out on many
real-world graphs from various domains, along with comparisons
with state-of-the-art scalable and unsupervised node embedding
alternatives. The proposed method enjoys superior performance in
many cases, while also yielding interpretable information on
the
underlying graph structure.
Index Terms—SVD, SVM, unsupervised, multiscale, random walks,
spectral
1 INTRODUCTION
UNSUPERVISED node embedding is an exciting field, in which a
significant amount of progress has been made in recent years [15].
The task consists of mapping each node of a graph to a vector in a
low-dimensional Euclidean space. The main goal is to extract
features that can be utilized downstream in order to perform a
variety of unsupervised or (semi-)supervised learning tasks, such as
node classification, link prediction, or clustering [16]. Ideally,
it is desired for the embedded nodal vectors to convey at least as
much information as the original graph. Nevertheless, an
appropriate embedding can boost the performance of certain learning
tasks because it allows one to work with the more “friendly” and
intuitive Euclidean representation, and deploy mature and widely
implemented feature-based algorithms such as (kernel) support
vector machines (SVMs), logistic regression, and K-means.
Early embedding works mostly focused on a structure-preserving
dimensionality reduction of feature vectors (instead of nodes); see
for instance [22], [23], [24], [25], [26]. In this context, graphs
are constructed from pairwise feature vector relations and are
treated as representations of the manifold that data lie on;
embedded vectors are then generated so that they preserve the
corresponding pairwise proximities on the manifold. More recently,
nodal vector embedding of a graph has attracted considerable
attention in different fields, and is often posed as the
factorization of a properly defined node similarity matrix [27],
[28], [29], [30], [31], [32], [33], [34]. Efforts in this direction
mostly focus on designing meaningful similarity metrics to
factorize. While some methods (e.g., [27], [29]) maintain
scalability by factorizing similarity matrices in an implicit manner
(without explicitly forming them), others such as [30], [31] form
and/or factorize dense similarity matrices that scale poorly to
large graphs. Another line of work opts to gradually fit pairs of
embedded vectors to existing edges using stochastic optimization
tools [35], [37]. Such approaches are naturally scalable and entail
a high degree of locality. Recently, stochastic edge-fitting has
been generalized to implicitly accommodate long-range node
similarities [36]. Meanwhile, other works have approached node
embeddings using random-walk-based tools and concepts originating
from natural language processing [38], [39], [40]; see also related
works on embedding of knowledge graphs [41], [42], [50]. Methods
that rely on graph convolutional neural networks and autoencoders
have also been proposed for node embedding [45], [46], [47].
Moreover, a gamut of related embedding tasks are gaining traction,
such as embedding based on structural roles of nodes [43], [44],
supervised embeddings for classification [11], and inductive
embedding methods that utilize multiple graphs [6].
• The authors are with the Department of Electrical and Computer
Engineering, and Digital Technology Center, University of Minnesota,
Minneapolis, MN 55455 USA. E-mail: {bermp001, georgios}@umn.edu.
Manuscript received 3 Dec. 2018; revised 1 June 2019; accepted
16 July 2019. Date of publication 29 July 2019; date of current
version 11 Jan. 2021. (Corresponding author: Dimitris Berberidis.)
Recommended for acceptance by F. Rusu.
Digital Object Identifier no. 10.1109/TKDE.2019.2931542
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 33,
NO. 2, FEBRUARY 2021 637
1041-4347 © 2019 IEEE. Personal use is permitted, but
republication/redistribution requires IEEE permission. See
https://www.ieee.org/publications/rights/index.html for more
information.
https://orcid.org/0000-0003-3563-6052
https://orcid.org/0000-0002-0196-0260
We identify the following challenges that need to be addressed in
order to design embedding methods that are applicable in practice:
• Diversity. Since graphs that arise from different domains are
generally characterized by a diverse set of properties, there may
not be a “one-size-fits-all” node embedding approach.
• No supervision. At the same time, node embedding may need to be
performed in a fully unsupervised manner, that is, without extra
information (node attributes, labels, or groundtruth communities)
to guide the parameter tuning process with cross-validation.
• Scalability. While some real-world networks are of moderate
size, others may contain massive numbers of nodes and edges.
Specifically, graphs encountered with social networks,
transportation networks, knowledge graphs and others, typically
scale to millions of nodes and tens of millions of edges. Thus,
strict computational constraints must be accounted for by the
design of node embedding methods.
In response to these challenges, we propose a scalable node
embedding framework that is based on factorizing an adaptive
node similarity matrix. The first challenge is addressed by
utilizing a large family of node similarity metrics, parametrized by
placing different weights on node proximities of different orders;
see also our precursor work [20]. Experiments indicate that the
proposed model for similarity metrics is expressive enough to
describe real-world graphs from diverse domains and with different
structures. To address the second challenge (lack of supervision),
we put forth a self-supervised parameter learning scheme based on
predicting randomly removed edges. Finally, we accommodate
scalability by constraining the parametrization of similarity
matrices such that the proximity order parameters carry over to the
embedded vectors in a smooth manner. This allows for learning
proximity order parameters directly on the feature vectors.
Consequently, dense similarity matrices do not need to be explicitly
formed and factorized, thus endowing the proposed method with the
desired level of scalability.
The rest of the paper is organized as follows. Section 2
introduces the problem and the proposed similarity model. Section 3
presents a numerical study on model properties, while Section 4
deals with learning the model parameters in an unsupervised manner.
Finally, Section 5 discusses related methods, and Section 6 contains
experiments on real graphs, comparisons with competing alternatives,
and interpretation of the results. While notation is defined
wherever it is introduced, we also summarize the most important
symbols that appear throughout the paper in Table 1.
2 PROBLEM STATEMENT AND MODELING
Given an undirected graph $\mathcal{G} := \{\mathcal{V}, \mathcal{E}\}$,
where $\mathcal{V}$ is the set of $N$ nodes, and
$\mathcal{E} \subseteq \mathcal{V} \times \mathcal{V}$ is the set of
edges, the task of node embedding boils down to determining
$f(\cdot) : \mathcal{V} \to \mathbb{R}^d$, where $d \ll N$. In other
words, a function is sought to map every node of $\mathcal{G}$ to a
vector in the $d$-dimensional Euclidean space. Typically, the
embedding is low dimensional with $d$ much smaller than the number of
nodes. Given $f(\cdot)$, the low-dimensional vector representation of
each node $v_i$ is
$$\mathbf{e}_i = f(v_i) \quad \forall v_i \in \mathcal{V}.$$
Since the number of nodes is finite, instead of finding a general
$f(\cdot)$ (induction), one may pose the embedding task in its most
general form as the following minimization problem over the embedded
vectors
$$\{\mathbf{e}_i^*\}_{i=1}^N = \arg\min_{\{\mathbf{e}_i\}_{i=1}^N} \sum_{v_i, v_j \in \mathcal{V}} \ell\big(s_{\mathcal{G}}(v_i, v_j),\, s_E(\mathbf{e}_i, \mathbf{e}_j)\big), \tag{1}$$
where $\ell(\cdot,\cdot) : \mathbb{R} \times \mathbb{R} \to \mathbb{R}$
is a loss function; $s_{\mathcal{G}}(\cdot,\cdot) : \mathcal{V} \times \mathcal{V} \to \mathbb{R}$
is a similarity metric over pairs of graph nodes; and
$s_E(\cdot,\cdot) : \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}$ a
similarity metric over pairs of vectors in the $d$-dimensional
Euclidean space.
In line with (1), node embedding can be viewed as the design of
nodal vectors $\{\mathbf{e}_i\}_{i=1}^N$ that successfully “encode” a
certain notion of pairwise similarities among graph nodes.
2.1 Embedding as Matrix Factorization
Starting from the generalized framework in (1), one may arrive at
concrete approaches by specifying choices of $s_{\mathcal{G}}(\cdot,\cdot)$,
$s_E(\cdot,\cdot)$, and $\ell(\cdot,\cdot)$. To start, suppose that the
node similarity metric is symmetric; that is,
$s_{\mathcal{G}}(v_i, v_j) = s_{\mathcal{G}}(v_j, v_i)\ \forall v_i, v_j \in \mathcal{V}$.
Furthermore, let the loss function be quadratic
$$\ell(x, x') = (x - x')^2,$$
and the nodal vector similarity be the inner product
$$s_E(\mathbf{e}_i, \mathbf{e}_j) = \mathbf{e}_i^\top \mathbf{e}_j.$$
Using these specifications, (1) reduces to the following symmetric
matrix factorization problem
$$\mathbf{E}^* = \arg\min_{\mathbf{E} \in \mathbb{R}^{N \times d}} \|\mathbf{S}_{\mathcal{G}} - \mathbf{E}\mathbf{E}^\top\|_F^2, \tag{2}$$
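TABLE 1
Important Notation
$\mathcal{V}$ $\triangleq$ Set of nodes
$\mathcal{E}$ $\triangleq$ Set of edges
$\mathbf{A}$ $\triangleq$ $N \times N$ adjacency matrix
$\mathbf{D}$ $\triangleq$ $\mathrm{diag}(\mathbf{1}^\top \mathbf{A})$ diagonal degree matrix
$\mathbf{E}$ $\triangleq$ $N \times d$ matrix of embeddings
$\mathbf{e}_i$ $\triangleq$ Embedding vector of node $v_i$
$s_{\mathcal{G}}(\cdot,\cdot)$ $\triangleq$ Node-to-node similarity
$s_k(\cdot,\cdot)$ $\triangleq$ $k$-hop node-to-node similarity
$s_E(\cdot,\cdot)$ $\triangleq$ Embedding-to-embedding similarity
$\ell(\cdot,\cdot)$ $\triangleq$ Distance (loss) between similarities
$\mathbf{S}_{\mathcal{G}}$ $\triangleq$ Final node similarity matrix
$\mathbf{S}$ $\triangleq$ Basic sparse (single-hop) and symmetric node similarity matrix
$\theta_k$ $\triangleq$ Coefficient of $k$-hop paths
$\boldsymbol{\theta}$ $\triangleq$ $[\theta_1, \ldots, \theta_K]^\top$ vector of coefficients
$\mathcal{S}^K$ $\triangleq$ $K$-dimensional probability simplex
$\mathcal{S}^+$ $\triangleq$ Set of sampled positive edges
$\mathcal{S}^-$ $\triangleq$ Set of all sampled negative edges
$\mathcal{S}$ $\triangleq$ $\mathcal{S}^+ \cup \mathcal{S}^-$ all sampled edges
$N_s$ $\triangleq$ Number of sampled edges
$\boldsymbol{\theta}^*_{\mathcal{S}}$ $\triangleq$ Optimal coefficients that fit sample $\mathcal{S}$
$T_s$ $\triangleq$ Number of different edge samples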
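where $\mathbf{S}_{\mathcal{G}} \in \mathbb{R}^{N \times N}$ is the
symmetric similarity matrix with
$[\mathbf{S}_{\mathcal{G}}]_{i,j} = [\mathbf{S}_{\mathcal{G}}]_{j,i} = s_{\mathcal{G}}(v_i, v_j)$,
and matrix $\mathbf{E} := [\mathbf{e}_1 \ldots \mathbf{e}_N]^\top$
concatenates all node embeddings as rows. A well-known analytical
solution to (2) relies on the singular value decomposition (SVD) of
the similarity matrix, that is
$\mathbf{S}_{\mathcal{G}} = \mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^\top$,
where $\mathbf{U}$ and $\mathbf{V}$ are the $N \times N$ unitary
matrices formed by the left and right singular vectors, and
$\boldsymbol{\Sigma}$ is diagonal with non-negative singular values
sorted in decreasing order; in our case, $\mathbf{U} = \mathbf{V}$
since $\mathbf{S}_{\mathcal{G}}$ is symmetric. Given the SVD of
$\mathbf{S}_{\mathcal{G}}$, the low-rank ($d \ll N$) solution to (2) is
$\mathbf{E}^* = \mathbf{U}_d \boldsymbol{\Sigma}_d^{1/2}$, where
$\boldsymbol{\Sigma}_d$ contains the $d$ largest singular values, and
$\mathbf{U}_d$ the corresponding singular vectors [19]. Matrices
$\mathbf{U}_d$ and $\boldsymbol{\Sigma}_d$ can be obtained directly
using the reduced-complexity scheme known as truncated SVD.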
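As a minimal sketch of the closed-form solution to (2), assuming a small dense similarity matrix that is symmetric positive semidefinite (so that its SVD and eigendecomposition coincide), the snippet below forms $\mathbf{E}^* = \mathbf{U}_d \boldsymbol{\Sigma}_d^{1/2}$ with NumPy. The function name `factorize_similarity` and the random toy matrix are illustrative assumptions; for large sparse matrices a truncated solver would replace the full `eigh` call.

```python
import numpy as np

def factorize_similarity(S_G, d):
    """Rank-d minimizer of ||S_G - E E^T||_F^2 for symmetric PSD S_G (sketch)."""
    # eigh handles symmetric matrices and returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(S_G)
    top = np.argsort(eigvals)[::-1][:d]          # indices of the d largest
    sigma_d = np.clip(eigvals[top], 0.0, None)   # guard against tiny negatives
    U_d = eigvecs[:, top]
    return U_d * np.sqrt(sigma_d)                # E* = U_d Sigma_d^{1/2}

# Toy example (assumption): a rank-3 PSD similarity matrix over 6 nodes.
rng = np.random.default_rng(0)
B = rng.standard_normal((6, 3))
S_G = B @ B.T
E = factorize_similarity(S_G, d=3)
print(np.allclose(E @ E.T, S_G))
```

Because the toy matrix has rank 3, choosing $d = 3$ should recover it up to numerical precision; for $d$ below the rank, $\mathbf{E}\mathbf{E}^\top$ is only the best rank-$d$ approximation.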
If in addition $\mathbf{S}_{\mathcal{G}}$ is sparse, (2) can be solved
even more efficiently, with complexity that scales with the number of
edges. One such example with sparse similarities is when
$\mathbf{S}_{\mathcal{G}} = \mathbf{A}$, where $\mathbf{A}$ is the graph
adjacency matrix. Embeddings generally gain scalability by avoiding
the explicit construction of a dense $\mathbf{S}_{\mathcal{G}}$. In
fact, simply storing $\mathbf{S}_{\mathcal{G}}$ in the working memory
becomes prohibitive even for graphs of moderate sizes (say $N > 10^5$).
In the ensuing section, we will design a family of dense
similarity matrices that (among other properties) can be
decomposed implicitly, at the cost of input sparsity.
2.2 Multihop Graph Node Similarities
Having reduced the node embedding problem to the one in (2), it
remains to specify the graph similarity metric that gives rise to
$\mathbf{S}_{\mathcal{G}}$. Towards this end, and in order to maintain
expressibility, we will design a parametric model for
$\mathbf{S}_{\mathcal{G}}$, with each pairwise node similarity metric
expressed as
$$s_{\mathcal{G}}(v_i, v_j; \boldsymbol{\theta}) = \sum_{k=1}^{K} \theta_k s_k(v_i, v_j), \quad \text{s.t. } \boldsymbol{\theta} \in \mathcal{S}^K, \tag{3}$$
where $\mathcal{S}^K := \{\boldsymbol{\theta} \in \mathbb{R}^K : \boldsymbol{\theta} \ge \mathbf{0},\ \boldsymbol{\theta}^\top \mathbf{1} = 1\}$
is the $K$-dimensional probability simplex, and $s_k(v_i, v_j)$ is a
similarity metric that depends on all $k$-hop paths of possibly
repeated nodes that start from $v_i$ and end at $v_j$ (or vice-versa).
Thus, $s_{\mathcal{G}}(\cdot, \cdot; \boldsymbol{\theta})$ contains all
$k$-hop interactions between two nodes, each weighted by a
non-negative importance score $\theta_k$ with $k = 1, \ldots, K$.
Let $\mathbf{S}$ be any similarity matrix that is characterized by the
same sparsity pattern as the adjacency matrix, that is
$$S_{i,j} = \begin{cases} s_{i,j}, & (i,j) \in \mathcal{E} \\ 0, & (i,j) \notin \mathcal{E}, \end{cases} \tag{4}$$
where the $s_{i,j}$'s denote the generic non-negative values of entries
that correspond to edges of $\mathcal{G}$. Maintaining the same
sparsity pattern as $\mathbf{A}$ allows for the $(i,j)$ entry of
$\mathbf{S}^k$ to be interpreted as a measure of influence between
$v_i$ and $v_j$ that depends on all $k$-hop paths that connect them;
that is, $[\mathbf{S}^k]_{i,j} = s_k(v_i, v_j)$. For instance,
selecting $\mathbf{S} = \mathbf{A}$ is equivalent to using the $k$-step
similarity
$s_k(v_i, v_j) = |\{k\text{-length paths connecting } v_i \text{ to } v_j\}|$
[12]. Likewise, if $\mathbf{S} = \mathbf{A}\mathbf{D}^{-1}$ where
$\mathbf{D} = \mathrm{diag}(\mathbf{1}^\top \mathbf{A})$, then
$s_k(v_i, v_j)$ can be interpreted as the probability that a random
walk starting from $v_j$ lands on $v_i$ after exactly $k$ steps, e.g.,
[31]. Thus, for a properly selected $\mathbf{S}$ with entries as in
(4), tunable multihop similarity metrics in (3) can be collected as
entries of the power series matrix
$$\mathbf{S}_{\mathcal{G}}(\boldsymbol{\theta}) = \sum_{k=1}^{K} \theta_k \mathbf{S}^k, \quad \text{s.t. } \boldsymbol{\theta} \in \mathcal{S}^K. \tag{5}$$
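Substituting (5) into (2) yields the tunable embeddings
$\mathbf{E}^*(\boldsymbol{\theta})$ that depend on the choice of
parameters $\boldsymbol{\theta}$. From the eigen-decomposition
$\mathbf{S} = \mathbf{U}\boldsymbol{\Sigma}\mathbf{U}^\top$, and given
that $\mathbf{U}^\top \mathbf{U} = \mathbf{I}$, we readily arrive at
$$\mathbf{S}^k = \mathbf{U}\boldsymbol{\Sigma}^k\mathbf{U}^\top, \tag{6}$$
and after plugging (6) into (5), we obtain
$$\mathbf{S}_{\mathcal{G}}(\boldsymbol{\theta}) = \mathbf{U}\left(\sum_{k=1}^{K} \theta_k \boldsymbol{\Sigma}^k\right)\mathbf{U}^\top, \quad \text{s.t. } \boldsymbol{\theta} \in \mathcal{S}^K. \tag{7}$$
Furthermore, the truncated singular pairs of
$\mathbf{S}_{\mathcal{G}}(\boldsymbol{\theta})$ conveniently follow from
those of $\mathbf{S}$, and they need to be computed only once.
Specifically, the truncated singular vectors and singular values are
$\mathbf{U}_d(\boldsymbol{\theta}) = \mathbf{U}_d$ and
$\boldsymbol{\Sigma}_d(\boldsymbol{\theta}) = \sum_{k=1}^{K} \theta_k \boldsymbol{\Sigma}_d^k$,
respectively. Thus, if $\mathbf{S} \in \mathrm{Sym}_N$ (i.e.,
$\mathbf{S}$ is symmetric), the solution to (2) with
$\mathbf{S}_{\mathcal{G}}$ parametrized by $\boldsymbol{\theta}$ is
simply given as
$$\mathbf{E}^*(\boldsymbol{\theta}) = \mathbf{U}_d \sqrt{\boldsymbol{\Sigma}_d(\boldsymbol{\theta})}. \tag{8}$$
Note that this holds only for non-negative parameters
$\theta_k \ge 0\ \forall k$. If $\theta_k < 0$ for at least one
$k \in \{1, \ldots, K\}$, then the diagonal entries of
$\boldsymbol{\Sigma}_d(\boldsymbol{\theta})$ cannot be guaranteed to be
non-negative and sorted in decreasing order, which would cause
$(\mathbf{U}_d(\boldsymbol{\theta}), \boldsymbol{\Sigma}_d(\boldsymbol{\theta}))$
to not be a valid SVD pair.
Having narrowed down $\mathbf{S}_{\mathcal{G}}$ to belong to the
parametrized family in (5), we proceed to select an appropriate
sparsity-preserving $\mathbf{S}$ in order to obtain a solid model.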
2.3 Spectral Multihop Embeddings
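While any symmetric $\mathbf{S}$ that obeys (4) can be used for
constructing multihop similarities (cf. (5)), judicious designs of
$\mathbf{S}$ can effect certain desirable properties. Bearing this in
mind, consider the following identity
$$\mathbf{S} \in \mathcal{P}^+_N \Rightarrow \mathbf{S} = \mathbf{U}\boldsymbol{\Sigma}\mathbf{U}^\top = \mathbf{U}\boldsymbol{\Lambda}\mathbf{U}^\top, \tag{9}$$
where $\mathcal{P}^+_N$ denotes the space of $N \times N$ symmetric
positive definite (SPD) matrices, and $\boldsymbol{\Lambda}$ is the
diagonal matrix that contains the eigenvalues of $\mathbf{S}$ sorted in
decreasing order. For SPD matrices as in (9), the SVD is identical to
the eigenvalue decomposition (EVD). Thus, if
$\mathbf{S} \in \mathcal{P}^+_N$, the solution to (2) is also given as
(cf. (8))
$$\mathbf{E}^*(\boldsymbol{\theta}) = \mathbf{U}_d \sqrt{\boldsymbol{\Lambda}_d(\boldsymbol{\theta})}, \tag{10}$$
where $\mathbf{U}_d$ are also the first $d$ eigenvectors of
$\mathbf{S}$, and
$\boldsymbol{\Lambda}_d(\boldsymbol{\theta}) = \sum_{k=1}^{K} \theta_k \boldsymbol{\Lambda}_d^k$
is the $K$th order polynomial of its eigenvalues defined by
$\boldsymbol{\theta}$.
Consider now specifying $\mathbf{S}$ as
$$\mathbf{S} = \frac{1}{2}\left(\mathbf{I} + \mathbf{D}^{-1/2}\mathbf{A}\mathbf{D}^{-1/2}\right). \tag{11}$$
Recalling that
$\lambda_i\left(\mathbf{D}^{-1/2}\mathbf{A}\mathbf{D}^{-1/2}\right) \in [-1, 1]\ \forall i$,
and after using the identity shifting and scaling, we deduce that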
$\lambda_i(\mathbf{S}) \in [0, 1]\ \forall i$; hence, matrix
$\mathbf{S}$ in (11) is SPD. It can also be readily verified that the
first $d$ eigenvectors of $\mathbf{S}$ coincide with the eigenvectors
corresponding to the $d$ smallest eigenvalues of the symmetric
normalized Laplacian matrix
$$\mathbf{L}_{\mathrm{sym}} := \mathbf{I} - \mathbf{D}^{-1/2}\mathbf{A}\mathbf{D}^{-1/2}. \tag{12}$$
These smallest eigenvalues are known to contain useful information on
cluster structures of different resolution levels, a key property
that has been successfully employed by spectral clustering [17].
Intuitively, assigning weight $\theta_k$ to $k$-hop paths in the node
similarity of (5) is equivalent to shrinking the coordinates of the
$d$-dimensional spectral node embeddings (rows of $\mathbf{U}_d$)
according to $\boldsymbol{\Lambda}_d(\boldsymbol{\theta})$.
Interestingly, assigning large weights to longer paths ($K \gg 1$) is
equivalent to quickly shrinking the coordinates that correspond to
small eigenvalues and capture the fine-grained structures and local
relations, which leads to a coarse, high-level cluster description of
the graph.
2.4 Relation to Random Walks
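Apart from the spectral embedding interpretation discussed in the
last section, using powers of (11) to capture multihop similarities
also admits an interesting random walk interpretation. We begin by
expressing the $k$th power of $\mathbf{S}$ as
$$\mathbf{S}^k = \frac{1}{2^k}\left(\mathbf{I} + \mathbf{D}^{-1/2}\mathbf{A}\mathbf{D}^{-1/2}\right)^k = \sum_{t=0}^{k} a_t(k) \left(\mathbf{D}^{-1/2}\mathbf{A}\mathbf{D}^{-1/2}\right)^t, \tag{13}$$
where the sequence
$$a_t(k) := \begin{cases} \dfrac{1}{2^k}\dbinom{k}{t}, & 0 \le t \le k \\ 0, & \text{else} \end{cases} \tag{14}$$
can be interpreted as nonzero weights that $\mathbf{S}^k$ assigns to all
paths with the number of hops up to $k$ (see Fig. 1).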
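Using (13) and (14), the multihop similarity in (5) becomes
$$\mathbf{S}_{\mathcal{G}}(\boldsymbol{\theta}) = \sum_{t=0}^{K} c_t(\boldsymbol{\theta}) \left(\mathbf{D}^{-1/2}\mathbf{A}\mathbf{D}^{-1/2}\right)^t = \mathbf{D}^{-1/2}\left(\sum_{t=0}^{K} c_t(\boldsymbol{\theta}) \mathbf{P}^t\right)\mathbf{D}^{1/2}, \tag{15}$$
where
$$c_t(\boldsymbol{\theta}) := \sum_{k=1}^{K} \theta_k a_t(k), \tag{16}$$
and $\mathbf{P} = \mathbf{A}\mathbf{D}^{-1}$ is the probability
transition matrix of a simple random walk defined over $\mathcal{G}$;
that is, $P_{i,j}$ is the probability that a random walker positioned
on node (state) $j$ transitions to node $i$ in one step. Thus, the
multihop similarity function defined in (3) is expressed as
$$s_{\mathcal{G}}(v_i, v_j; \boldsymbol{\theta}) = \sqrt{\frac{d_j}{d_i}} \sum_{t=0}^{K} c_t(\boldsymbol{\theta}) \Pr\{X_t = v_i \mid X_0 = v_j\}, \tag{17}$$
where $\Pr\{X_t = v_i \mid X_0 = v_j\} := [\mathbf{P}^t]_{ij}$ is the
probability that a random walk starting from $v_j$ lands on $v_i$
after $t$ steps.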
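The equivalence between the power series (5) and the landing-probability form (15)-(17) can be verified on a small example. The following is a sketch under assumptions: the toy graph, the weights `theta`, and the helper name `walk_coefficients` are illustrative.

```python
import numpy as np
from math import comb

def walk_coefficients(theta):
    """c_t(theta) = sum_k theta_k a_t(k) for t = 0..K, as in (16)."""
    K = len(theta)
    c = np.zeros(K + 1)
    for k, theta_k in enumerate(theta, start=1):
        for t in range(k + 1):
            c[t] += theta_k * comb(k, t) / 2.0 ** k
    return c

# Toy graph (assumption): a 5-node cycle with one chord.
A = np.zeros((5, 5))
for i, j in [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0), (0, 2)]:
    A[i, j] = A[j, i] = 1.0
deg = A.sum(axis=1)
P = A / deg[None, :]                               # P = A D^{-1}, column-stochastic
S = 0.5 * (np.eye(5) + A / np.sqrt(np.outer(deg, deg)))

theta = np.array([0.1, 0.6, 0.3])
c = walk_coefficients(theta)

# Left: power series (5). Right: landing-probability form (15)/(17).
S_G = sum(th * np.linalg.matrix_power(S, k + 1) for k, th in enumerate(theta))
landing = sum(c[t] * np.linalg.matrix_power(P, t) for t in range(len(c)))
S_G_walk = np.sqrt(deg[None, :] / deg[:, None]) * landing   # sqrt(d_j / d_i) factor
print(np.allclose(S_G, S_G_walk), c.sum())
```

Because $\boldsymbol{\theta}$ lies on the simplex and each $a_t(k)$ sequence sums to one, the coefficients $c_t(\boldsymbol{\theta})$ also sum to one.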
Interestingly, $\mathbf{S}_{\mathcal{G}}(\boldsymbol{\theta})$ does not
weigh landing probabilities of different lengths independently.
Instead, it accumulates the latter as weighted combinations (cf.
(16)) in a basis of “wavelet”-type functions of different resolution
(see Fig. 1).
Having established links to spectral clustering and random
walks, our novel $\mathbf{S}_{\mathcal{G}}(\boldsymbol{\theta})$ is well
motivated as a family of node similarity matrices. Nevertheless,
before devising an algorithm for learning $\boldsymbol{\theta}$ and
testing it on real graphs, we will evaluate how well the basis
$\{\mathbf{S}^k\}_{k=1}^{K}$, on which
$\mathbf{S}_{\mathcal{G}}(\boldsymbol{\theta})$ is built, can capture
underlying node similarities.
3 MODEL EXPRESSIVENESS
This section introduces a performance metric that quantifies how
well a node similarity matrix derived from the graph itself matches
the “true” underlying similarity structure between nodes. The
discussion is followed by numerical evaluation of the performance of
different similarity matrices (including the one in (13)) on
graphs that are generated according to the stochastic block model
[2].
To begin, suppose that for a given set of nodes, an adjacency
matrix $\mathbf{A}$ is generated as
$$\mathbf{A} \sim f_A(\mathbf{A}),$$
where $f_A(\mathbf{A})$ is a probability density function defined over
the space of all possible adjacency matrices. Let the “true”
underlying similarity between nodes $v_i$ and $v_j$ be
$$s^*(v_i, v_j) := \Pr\{(i,j) \in \mathcal{E}\} = \mathbb{E}_{f_A}\left[A_{i,j}\right],$$
which is the probability that the two nodes are connected. The
“true” similarity matrix is thus given as the expected adjacency
matrix
$$\mathbf{S}^* := \mathbb{E}_{f_A}\left[\mathbf{A}\right].$$
We define the quality-of-match (QoM) between the underlying
$\mathbf{S}^*$ and any similarity $\hat{\mathbf{S}} = F(\mathbf{A})$
estimated from the adjacency matrix as
$$\mathrm{QoM} := \mathbb{E}_{f_A}\left[\mathrm{PC}\left(\mathbf{S}^*, F(\mathbf{A})\right)\right], \tag{18}$$
Fig. 1. Matrix $\mathbf{S}^k$ is equivalent to applying “wavelet”-type
weights $a_t(k)$ over walks with hops $\le k$.
where
$$\mathrm{PC}(\mathbf{X}_1, \mathbf{X}_2) := \frac{\mathrm{vec}(\mathbf{X}_1)^\top \mathrm{vec}(\mathbf{X}_2)}{\|\mathbf{X}_1\|_F \|\mathbf{X}_2\|_F}, \tag{19}$$
is the Pearson correlation between two matrices $\mathbf{X}_1$ and
$\mathbf{X}_2$, with $\mathrm{vec}(\mathbf{X})$ denoting matrix
vectorization. The latter is used for appropriate rescaling of the
“true” similarity matrix in order for the comparison with
$\mathbf{S}_{\mathcal{G}}$ to be meaningful. Intuitively, (18)
measures how well the estimated node similarities in
$\hat{\mathbf{S}}$ are expected to match the pattern of true
underlying similarities in $\mathbf{S}^*$, when edges are generated
according to the known $f_A(\cdot)$.
3.1 Numerical Experiments and Observations
We numerically evaluate the QoM achieved by different similarity
matrices, on a set of $N$ nodes whose interconnections are generated
according to a stochastic block model (SBM). For this set of
experiments, we divided the nodes into three clusters of equal size
$$\mathcal{C}_l = \{i : (l-1)N/3 < i \le lN/3\}, \quad l \in \{1, 2, 3\},$$
with inter- and intra-connection probabilities
$$\Pr\{(i,j) \in \mathcal{E}\} = \begin{cases} p, & (i,j) \text{ in the same } \mathcal{C}_l \\ cq, & i \in \mathcal{C}_1 \text{ and } j \in \mathcal{C}_3 \\ q, & \text{else}, \end{cases} \tag{20}$$
where $p$ is the probability of connection when two nodes belong to
the same cluster, and $c < 1$ introduces asymmetry and a
hierarchical clustering organization (see Fig. 2-top left), by
making two of the clusters less likely to connect; we have related
Python scripts available.$^1$ The SBM probability matrix [2] is given
as
$$\mathbf{W}_{\mathrm{sbm}} = \begin{bmatrix} p & q & cq \\ q & p & q \\ cq & q & p \end{bmatrix}, \tag{21}$$
and the underlying similarity can be expressed as
$$\mathbf{S}^* = \mathbb{E}[\mathbf{A}] = \mathbf{W}_{\mathrm{sbm}} \otimes \left(\mathbf{1}_{N/3}\mathbf{1}_{N/3}^\top\right) - \mathrm{diag}(p\mathbf{1}_N), \tag{22}$$
where $\otimes$ denotes the Kronecker product.
For each experiment, we set $N = 150$ and generated a graph
according to (20). We then compared the QoM between (22) and the
$k$th power of the proposed (11), the $k$th power of the adjacency
($\mathbf{A}^k$), as well as each of the following well known
similarity metrics:
• $\hat{\mathbf{S}}_{\mathrm{PPR}} := (1-\alpha)(\mathbf{I} - \alpha\mathbf{A}\mathbf{D}^{-1})^{-1}$:
the steady state probability that a random walk restarting at $v_j$
with probability $1-\alpha$ at every step is located at $v_i$.
Essentially a personalized PageRank (PPR) computed for every node of
the graph, inheriting the properties of the celebrated centrality
measure [7], [8], [9].
• $\hat{\mathbf{S}}_{\mathrm{KATZ}} := (1-\beta)(\mathbf{I} - \beta\mathbf{A})^{-1}\mathbf{A}$:
the Katz index [12], an exponentially weighted summation over paths
of all possible hops between two nodes.
• $\hat{\mathbf{S}}_{\mathrm{NEIGH}} := \mathbf{A}^2$: the number of
common neighbors that every pair of nodes shares.
Fig. 2. Depiction of groundtruth and estimated similarity
matrices, as yielded from an instance of the numerical experiments
described in Section 3.1.
1. https://github.com/DimBer/ASE-project/tree/master/sim_tests
• $\hat{\mathbf{S}}_{\mathrm{AA}} := \mathbf{A}\mathbf{D}^{-1}\mathbf{A}$:
Adamic-Adar [4] is a variant of common neighbors where each set of
neighbors is weighted inversely proportional to its cardinality.
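For small graphs, the four baseline metrics listed above can be formed densely as in the following sketch. The function name `baseline_similarities`, the default values of `alpha` and `beta`, and the toy path graph are assumptions made for illustration; dense inverses like these are only practical for small $N$.

```python
import numpy as np

def baseline_similarities(A, alpha=0.85, beta=0.05):
    """Dense versions of the four baseline metrics listed above (sketch).

    The alpha and beta defaults are illustrative assumptions; the Katz series
    only converges when beta is below 1 / (largest eigenvalue of A).
    """
    N = A.shape[0]
    I = np.eye(N)
    D_inv = np.diag(1.0 / A.sum(axis=1))           # assumes no isolated nodes
    return {
        "PPR": (1 - alpha) * np.linalg.inv(I - alpha * A @ D_inv),
        "KATZ": (1 - beta) * np.linalg.inv(I - beta * A) @ A,
        "NEIGH": A @ A,
        "AA": A @ D_inv @ A,
    }

# Toy graph (assumption): the path 0-1-2.
A = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]])
sims = baseline_similarities(A)
print(sims["NEIGH"][0, 2], sims["AA"][0, 2])
```

On the toy path, nodes 0 and 2 share the single neighbor 1, so the common-neighbor entry counts that neighbor once, while the degree-weighted variant discounts it by the degree of node 1.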
The resulting QoM was averaged over 200 experiments. Parameters
$\alpha$ in $\hat{\mathbf{S}}_{\mathrm{PPR}}$ and $\beta$ in
$\hat{\mathbf{S}}_{\mathrm{KATZ}}$ were tuned to maximize the
performance of the metrics. Fig. 3 depicts QoM as a function of
$k$, for three different scenarios.
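The experimental protocol described above (SBM generation per (20)-(22), followed by a Monte Carlo estimate of the QoM in (18)) can be sketched as below. This is not the authors' script: the helper names, the value $c = 0.5$, and the reduced number of trials are assumptions chosen to keep the example short.

```python
import numpy as np

def sbm_expected_adjacency(N, p, q, c):
    """S* from (22) for three equal-size clusters."""
    W = np.array([[p, q, c * q],
                  [q, p, q],
                  [c * q, q, p]])
    return np.kron(W, np.ones((N // 3, N // 3))) - p * np.eye(N)

def sample_adjacency(S_star, rng):
    """Draw a symmetric 0/1 adjacency matrix with independent edges."""
    N = S_star.shape[0]
    upper = np.triu(rng.random((N, N)) < S_star, k=1)
    return (upper | upper.T).astype(float)

def pattern_correlation(X1, X2):
    return float(np.sum(X1 * X2) / (np.linalg.norm(X1) * np.linalg.norm(X2)))

def qom_of_power(S_star, k, n_trials, rng):
    """Average PC between S* and S^k, with S as in (11)."""
    N = S_star.shape[0]
    scores = []
    for _ in range(n_trials):
        A = sample_adjacency(S_star, rng)
        deg = np.maximum(A.sum(axis=1), 1.0)       # avoid division by zero
        S = 0.5 * (np.eye(N) + A / np.sqrt(np.outer(deg, deg)))
        scores.append(pattern_correlation(S_star, np.linalg.matrix_power(S, k)))
    return float(np.mean(scores))

rng = np.random.default_rng(0)
# p and q follow the dense scenario; c = 0.5 and 20 trials are assumptions.
S_star = sbm_expected_adjacency(N=150, p=0.3, q=0.1, c=0.5)
for k in (1, 3, 6, 15):
    print(k, round(qom_of_power(S_star, k, n_trials=20, rng=rng), 3))
```

Sweeping `k` over a finer grid and plotting the printed averages would give a curve of the kind summarized in Fig. 3.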
In the first scenario (Fig. 3a), with graphs being dense and
clustered ($p = 0.3$, $q = 0.1$), the proposed $\mathbf{S}^k$ improves
sharply in the first few steps, reaching maximum QoM after 4 or 5
steps, and gradually decreases as $k$ continues to increase. The
$k$th order proximities that are given as entries of $\mathbf{A}^k$
follow a similar trend, however their QoM peaks shortly after 2 or 3
steps and declines fast for larger $k$. The matrix plots of a
randomly selected experiment depicted in Fig. 2 can aid in
understanding the underlying mechanism that gives rise to this highly
step-dependent behavior. Specifically, $\mathbf{S}^1$ (bottom left)
that has the same sparsity pattern as the adjacency is a poor match
to the dense block-structure of $\mathbf{S}^*$. On the other side of
the spectrum, $\mathbf{S}^{15}$ (bottom right) is too “flat” and also a
poor similarity metric. Meanwhile, taking $k = 6$ promotes enough
mixing without “dissipating.” As a result, $\mathbf{S}^6$ (bottom
center) visibly matches the structure of $\mathbf{S}^*$.
Interestingly, for $k \in [4, 10]$ the proposed $\mathbf{S}^k$
surpasses in QoM all other similarity metrics that were tested.
Nevertheless, the simple 2-hop Adamic-Adar and common-neighbors
similarities perform reasonably well by exploiting the relatively
dense structure of the graphs.
Results were markedly different in the second scenario shown in
Fig. 3b. Here, graphs were generated with the same clustering
structure but significantly sparser, with edge probability
parameters $p = 0.15$ and $q = 0.05$. For sparser graphs,
$\mathbf{A}^k$ and $\mathbf{S}^k$ require more steps to reach peak QoM
(4 and 9 respectively). Similarly, PPR which relies on long paths
performs much better than the short-reaching Adamic-Adar. This
behavior is intuitively reasonable because the sparser a graph is,
the longer become the paths that need to be explored around each
node, in order for the latter to “gauge” its position on the graph.
Finally, a third scenario (Fig. 3c) was examined, where each
graph was generated without a clustering structure ($p = q = 0.1$ and
$c = 1$); essentially an Erdős-Rényi graph. For this degenerate case
that is of no real practical interest, all pairs of nodes are
equally similar; this type of similarity requires infinitely long
paths to be described.
In a nutshell, the presented numerical study hints at the two
following facts. First, $\mathbf{S}^k$ can successfully model
similarities that are based on grouping nodes in arbitrary and
multilevel sets with variable degrees of homophily and heterophily.
Second, the performance of $\mathbf{S}^k$ varies significantly with
$k$. Moreover, the way that $k$ affects performance may also vary from
graph to graph, depending on the underlying properties, which
suggests viewing this dependence as a graph “signature” that is also
validated by the real graphs in Section 6. Thus, a principled means
of specifying $\mathbf{S}_{\mathcal{G}}(\boldsymbol{\theta})$ by
learning the parameters that match this graph “signature” in an
unsupervised mode is highly motivated.
4 UNSUPERVISED SIMILARITY LEARNING
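We have arrived at the point where for a given graph, it is
prudent to select a specific $\boldsymbol{\theta} \in \mathcal{S}^K$
without supervision. Following the discussion in Section 3, it would
be ideal to fit $\mathbf{S}_{\mathcal{G}}(\boldsymbol{\theta})$ to a
true $\mathbf{S}^*$ by minimizing an expected cost
$$\boldsymbol{\theta}^* = \arg\min_{\boldsymbol{\theta} \in \mathcal{S}^K} \mathbb{E}_{f_A}\left[\ell\left(\mathbf{S}^*, \mathbf{S}_{\mathcal{G}}(\mathbf{A}; \boldsymbol{\theta})\right)\right]. \tag{23}$$
Unfortunately, we only have one realization $\mathbf{A}$ of
$f_A(\cdot)$, which means that without prior knowledge, the best
approximation of $\mathbf{S}^*$ that we can obtain is the adjacency
matrix itself, that is $\mathbf{S}^* \approx \mathbf{A}$. Using this
approximation yields
$$\min_{\boldsymbol{\theta} \in \mathcal{S}^K} \ell\left(\mathbf{A}, \mathbf{S}_{\mathcal{G}}(\mathbf{A}; \boldsymbol{\theta})\right). \tag{24}$$
While straightforward, (24) yields embeddings with limited
generalization capability. Simply put, regardless of the choice of
$\ell(\cdot)$, solving (24) amounts to predicting a set of edges by
tuning a similarity metric that is generated by the same set of
edges.
To mitigate overfitting but also promote generalization of the
similarity metric and of the resulting embeddings, we explore the
following idea. Suppose we are given a pair $\mathbf{A}_1, \mathbf{A}_2$
of adjacency matrices both drawn independently from $f_A(\cdot)$. In
this case, we would be able to use one as approximation of
$\mathbf{S}^* \approx \mathbf{A}_1$, and the other to form the multihop
similarity matrix $\mathbf{S}_{\mathcal{G}}(\mathbf{A}_2; \boldsymbol{\theta})$;
parameters $\boldsymbol{\theta}$ can then be learned by solving
$$\min_{\boldsymbol{\theta} \in \mathcal{S}^K} \ell\left(\mathbf{A}_1, \mathbf{S}_{\mathcal{G}}(\mathbf{A}_2; \boldsymbol{\theta})\right). \tag{25}$$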
Fig. 3. Quality of match between true SBM similarity and various
estimates, as yielded from experiments of Section 3.1.
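Since separate samples are not available, we approximate the
aforementioned process by randomly extracting part of $\mathbf{A}$
and approaching (25) as
$$\min_{\boldsymbol{\theta} \in \mathcal{S}^K} \ell_{\mathcal{S}}\left(\mathbf{A}, \mathbf{S}_{\mathcal{G}}(\mathbf{A} \odot \mathbf{S}^c; \boldsymbol{\theta})\right), \tag{26}$$
where $\mathcal{S} \subseteq \{1, \ldots, N\}^2$ is a subset of all
possible pairs of nodes with $|\mathcal{S}| = N_s$, and $\mathbf{S}^c$
is an $N \times N$ binary selection matrix with $S^c_{i,j} = 0$ if
$\{i,j\} \in \mathcal{S}$, and $S^c_{i,j} = 1$ otherwise. Furthermore,
$\ell_{\mathcal{S}}(\cdot,\cdot)$ in (26) denotes cost
$\ell(\cdot,\cdot)$ applied selectively only to entries of the matrix
variables that belong to $\mathcal{S}$. Here,
$\mathcal{S} = \mathcal{S}^+ \cup \mathcal{S}^-$, with
$\mathcal{S}^+ \subseteq \mathcal{E}$ being a subset of the edges and
$\mathcal{S}^- \subseteq \{1, \ldots, N\}^2 \setminus \mathcal{E}$ a
subset of node index tuples that are not connected (non-edges). To
balance the influence of existing and non-existing edges, we use
subsets of equal cardinality, that is
$|\mathcal{S}^+| = |\mathcal{S}^-| = N_s/2$.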
To arrive from the unsupervised similarity learning framework
(26) to a practical method, it remains to specify two modular
sub-systems: one responsible for sampling edges, and one specifying
$\ell(\cdot,\cdot)$ to find $\boldsymbol{\theta}^*$ by solving (26).
4.1 Edge Sampling
The choice of the sampling scheme for $\mathcal{S}$ plays an important
role in the overall performance of the proposed adaptive embedding
framework. Ideally, edge sampling should take into account the
following criteria.
• Sample $\mathcal{S}^+$ should be representative of the graph;
• Edge removal should inflict minimal perturbation;
• Edge removal should avoid isolating nodes; and
• Sampling scheme should be simple and scalable.
Aiming at a ‘sweet spot’ of these objectives, we populate
$\mathcal{S}^+$ by sampling edges according to the following
procedure: first, a node $v_1$ is sampled uniformly at random from
$\mathcal{V}$; then, a second node $v_2$ is sampled uniformly from the
neighborhood set $\mathcal{N}_{\mathcal{G}}(v_1)$ of $v_1$. The
selected edge is removed only if both adjacent nodes have degree
greater than one. Non-edges $\mathcal{S}^-$ are obtained by uniform
sampling without replacement over
$\{1, \ldots, N\}^2 \setminus \mathcal{E}$. The overall procedure is
summarized in Algorithm 2. For $N_s \ll N$, sampling probabilities
remain approximately unchanged despite the removals, since the
probability of selecting the same node is relatively small. Thus, one
may approximate $\Pr\{e_t = (i,j)\} \approx \Pr\{e_0 = (i,j)\}$, and
assuming for simplicity that $d_i > 1\ \forall i$, it follows that
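$$\begin{aligned}
\Pr\{e_0 = (i,j)\} &= \Pr\{v_1 = i, v_2 = j\} + \Pr\{v_1 = j, v_2 = i\} \\
&= \Pr\{v_2 = i \mid v_1 = j\}\Pr\{v_1 = j\} + \Pr\{v_2 = j \mid v_1 = i\}\Pr\{v_1 = i\} \\
&= \frac{1}{d_j}\frac{1}{N} + \frac{1}{d_i}\frac{1}{N} \propto \frac{d_i + d_j}{d_i d_j},
\end{aligned} \tag{27}$$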
meaning that edge $e = (i,j)$ is removed with probability that is
inversely proportional to the harmonic mean of the degrees of the
nodes that it connects. As shown in [14], the perturbation that the
removal of edge $e = (i,j)$ inflicts on the spectrum of an undirected
graph is proportional to $d_i d_j$; that is, removing edges that
connect high-degree nodes leads to higher perturbation. Thus,
Algorithm 2 tends to inflict minimal perturbation by sampling with
probability that is inversely proportional to $d_i d_j$ for
$d_i, d_j \gg 1$; this is because the denominator of (27) dominates
its numerator for large degrees. On the other hand, for smaller $d_i$
and $d_j$, the numerator ensures relatively high probabilities for
moderate-degree nodes. The combination of the two effects yields
edge samples that are fairly representative of the graph,
while inflicting low perturbation when removed.
4.2 Parameter Training
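Subsequently, for a given sample $\mathcal{S}$, we can obtain the
corresponding optimal parameters as (cf. (26))
$$\boldsymbol{\theta}^*_{\mathcal{S}} = \arg\min_{\boldsymbol{\theta} \in \mathcal{S}^K} \sum_{(i,j) \in \mathcal{S}} \ell\left(A_{i,j},\, s_{\mathcal{G}^-}(v_i, v_j; \boldsymbol{\theta})\right), \tag{28}$$
where $\mathcal{G}^- := (\mathcal{V}, \mathcal{E} \setminus \mathcal{S}^+)$
is the original graph with the randomly sampled subset $\mathcal{S}^+$
of edges removed.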
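Algorithm 1. ADAPTIVE SIMILARITY EMBEDDING
Input: $\mathcal{G}$
Output: $\mathbf{E}$
// Training phase
$\boldsymbol{\Theta} = \emptyset$
while $|\boldsymbol{\Theta}| < T_s$ do
  $\mathcal{G}^-, \mathcal{S}^+, \mathcal{S}^-$ = SAMPLE EDGES($\mathcal{G}$)
  $\boldsymbol{\theta}^*_{\mathcal{S}}$ = TRAIN PARAMETERS($\mathcal{G}^-, \mathcal{S}^+, \mathcal{S}^-$)
  $\boldsymbol{\Theta} = \boldsymbol{\Theta} \cup \boldsymbol{\theta}^*_{\mathcal{S}}$
end while
$\boldsymbol{\theta}^* = T_s^{-1} \sum_{\boldsymbol{\theta} \in \boldsymbol{\Theta}} \boldsymbol{\theta}$
// Embedding phase
$\mathbf{S} = \frac{1}{2}\left(\mathbf{I} + \mathbf{D}^{-1/2}\mathbf{A}\mathbf{D}^{-1/2}\right)$
$\mathbf{S} = \mathbf{U}_d \boldsymbol{\Sigma}_d \mathbf{U}_d^\top$
$\boldsymbol{\Sigma}_d(\boldsymbol{\theta}^*) = \sum_{k=1}^{K} \theta^*_k \boldsymbol{\Sigma}_d^k$
return $\mathbf{E} = \mathbf{U}_d \sqrt{\boldsymbol{\Sigma}_d(\boldsymbol{\theta}^*)}$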
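Algorithm 2. SAMPLE EDGES
Input: $\mathcal{G}$   Output: $\mathcal{G}^-, \mathcal{S}^+, \mathcal{S}^-$
// Sample edges
$\mathcal{S}^+ = \emptyset$, $\mathcal{G}^- = \mathcal{G}$
while $|\mathcal{S}^+| < N_s/2$ do
    Sample $v_1 \sim \mathrm{Unif}(\mathcal{V})$
    if $|\mathcal{N}_{\mathcal{G}^-}(v_1)| > 1$ then
        Sample $v_2 \sim \mathrm{Unif}(\mathcal{N}_{\mathcal{G}^-}(v_1))$
        if $|\mathcal{N}_{\mathcal{G}^-}(v_2)| > 1$ then
            $\mathcal{S}^+ = \mathcal{S}^+ \cup (v_1, v_2)$
            $\mathcal{G}^- = \mathcal{G}^- \setminus (v_1, v_2)$
        end if
    end if
end while
// Sample non-edges
$\mathcal{S}^- = \emptyset$
while $|\mathcal{S}^-| < N_s/2$ do
    Sample $(v_1, v_2) \sim \mathrm{Unif}(\mathcal{V} \times \mathcal{V})$
    if $(v_1, v_2) \notin \mathcal{E}$ then
        $\mathcal{S}^- = \mathcal{S}^- \cup (v_1, v_2)$
    end if
end while
return $\mathcal{G}^-, \mathcal{S}^+, \mathcal{S}^-$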
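The edge-sampling procedure of Algorithm 2 can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the adjacency-dictionary representation, the function name `sample_edges`, the `seed` argument, and the exclusion of self-pairs among the non-edges are all assumptions made here. It also assumes that the graph has enough removable edges and enough non-edges; otherwise the loops would not terminate.

```python
import random

def sample_edges(adj, n_samples, seed=0):
    """Illustrative sketch of SAMPLE EDGES.

    adj maps each node to the set of its neighbours (undirected graph).
    Returns the pruned graph, the removed edges, and the sampled non-edges.
    """
    rng = random.Random(seed)
    nodes = sorted(adj)
    g_minus = {v: set(nb) for v, nb in adj.items()}  # working copy of the graph
    s_plus, s_minus = set(), set()

    # Positive samples: remove an edge only if both endpoints
    # currently have more than one neighbour.
    while len(s_plus) < n_samples // 2:
        v1 = rng.choice(nodes)
        if len(g_minus[v1]) > 1:
            v2 = rng.choice(sorted(g_minus[v1]))
            if len(g_minus[v2]) > 1:
                s_plus.add((v1, v2))
                g_minus[v1].discard(v2)
                g_minus[v2].discard(v1)

    # Negative samples: uniformly drawn node pairs that are not edges
    # of the original graph (self-pairs are skipped; an assumption).
    while len(s_minus) < n_samples // 2:
        v1, v2 = rng.choice(nodes), rng.choice(nodes)
        if v1 != v2 and v2 not in adj[v1]:
            s_minus.add((v1, v2))

    return g_minus, s_plus, s_minus
```

Because a node is first drawn uniformly and a neighbour is then drawn uniformly from its current neighbourhood, edges between low-degree nodes are picked more often, which is the behaviour described by (27).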
Interestingly, one way that (28) could be solved is by explicitly
computing the entries of $\mathbf{S}_{\mathcal{G}}(\boldsymbol{\theta})$ that are in $\mathcal{S}$. This would require
performing $K$ sparse matrix-vector products
to obtain every column of $\mathbf{S}^k$ for $k \in \{1, \ldots, K\}$, for all the
columns that contain sampled entries. In the worst case, if all
nodes in the tuples of $\mathcal{S}$ correspond to different columns of $\mathbf{S}_{\mathcal{G}}(\boldsymbol{\theta})$,
two random walks are required for every tuple, for a total of $2N_s$
random walks. This requires $O(N_s K |\mathcal{E}|)$ computations, and $O(N_s N)$
memory if they are to be performed concurrently or in matrix form.
Since $K$ will typically be in the order of tens, these requirements
will be affordable if $N_s$ is relatively small. Nevertheless, they
quickly become cumbersome for $N_s \gg K$, which may be necessary to
estimate the $K$-dimensional $\boldsymbol{\theta}$.
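Instead, we will rely on the fact that the proposed embeddings
are smooth and differentiable wrt $\boldsymbol{\theta}$ (cf. (10)), to develop a
solution that allows for selecting arbitrarily large $N_s$, using the
approximation
$$
\begin{aligned}
s_{\mathcal{G}^-}(v_i, v_j; \boldsymbol{\theta}) &\approx s_E\big(\mathbf{e}^-_i(\boldsymbol{\theta}), \mathbf{e}^-_j(\boldsymbol{\theta})\big)
= \big(\mathbf{e}^-_i(\boldsymbol{\theta})\big)^\top \mathbf{e}^-_j(\boldsymbol{\theta}) \\
&= \Big(\sqrt{\boldsymbol{\Sigma}^-_d(\boldsymbol{\theta})}\,\mathbf{u}^-_i\Big)^\top \sqrt{\boldsymbol{\Sigma}^-_d(\boldsymbol{\theta})}\,\mathbf{u}^-_j \\
&= \big(\mathbf{u}^-_i\big)^\top \boldsymbol{\Sigma}^-_d(\boldsymbol{\theta})\,\mathbf{u}^-_j \\
&= \mathbf{x}_{i,j}^\top \boldsymbol{\theta},
\end{aligned}
\tag{29}
$$
where
$$
\mathbf{x}_{i,j} = \big(\mathbf{u}^-_i \odot \mathbf{u}^-_j\big)^\top \boldsymbol{\Sigma}^K_d,
\tag{30}
$$
and
$$
\boldsymbol{\Sigma}^K_d =
\begin{bmatrix}
\sigma_1 & \sigma_1^2 & \cdots & \sigma_1^K \\
\vdots & \vdots & \ddots & \vdots \\
\sigma_{d-1} & \sigma_{d-1}^2 & \cdots & \sigma_{d-1}^K \\
\sigma_d & \sigma_d^2 & \cdots & \sigma_d^K
\end{bmatrix}.
$$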
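To make (29) and (30) concrete, the following minimal sketch computes the $K$-dimensional feature vector of a node pair from the spectral coordinates of the two nodes and the $d$ leading eigenvalues. The function names `pair_features` and `similarity`, and the plain-list data layout, are assumptions made for illustration only.

```python
def pair_features(u_i, u_j, sigma, K):
    """Feature vector x_{i,j} of (30).

    u_i, u_j : length-d spectral coordinates of nodes i and j
               (rows of the truncated eigenvector matrix).
    sigma    : length-d list of the corresponding eigenvalues.
    Entry k-1 equals sum_m u_i[m] * u_j[m] * sigma[m]**k for k = 1..K.
    """
    return [
        sum(a * b * s ** k for a, b, s in zip(u_i, u_j, sigma))
        for k in range(1, K + 1)
    ]

def similarity(x_ij, theta):
    """Approximate similarity of (29): the inner product x_{i,j}^T theta."""
    return sum(x * t for x, t in zip(x_ij, theta))

# Toy usage with d = 2 and K = 2 (values are arbitrary).
x = pair_features([1.0, 2.0], [3.0, -1.0], sigma=[1.0, 0.5], K=2)
print(x, similarity(x, [0.5, 0.5]))
```

Since the features depend only on the decomposition of the pruned graph, they can be computed once per sample and reused across all iterations of the parameter search.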
Conveniently, the $\{\mathbf{x}_{i,j}\}$s act as features over every possible pair of
nodes, which when linearly combined with weights $\boldsymbol{\theta}$ to produce
similarities, allow us to approach (28) using well-understood
learning and optimization tools. Among the various possible loss
functions, one may fit the removed edges² using the hinge loss
$$
\ell(y, f) := \max(0, \epsilon - y f),
\tag{31}
$$
which is suitable for real-world graphs thanks to its robustness
properties [13]; note that target variables here are defined as
$y_{i,j} = 2A_{i,j} - 1$ so that $y_{i,j} \in \{-1, 1\}$. We can then
equivalently express (28) as
$$
\boldsymbol{\theta}^*_{\mathcal{S}} = \arg\min_{\boldsymbol{\theta} \in \mathcal{S}^K}
\sum_{i,j \in \mathcal{S}} \max\big(0, \epsilon - y_{i,j}\,\mathbf{x}_{i,j}^\top \boldsymbol{\theta}\big) + \lambda \|\boldsymbol{\theta}\|_2^2,
\tag{32}
$$
where $\lambda \ge 0$ is the regularization parameter of the $\ell_2$
regularization typically used to improve the robustness and
generalization capability of SVMs [13]. To solve our variant of
simplex-constrained SVMs (cf. (32)), we employ the projected-gradient
descent approach [3] that we describe in Algorithm 4, where
SIMPLEXPROJ($\cdot$) is a subroutine that implements projections onto
$\mathcal{S}^K$; the latter can be performed with $O(K \log K)$ complexity as
described in [21]. The overall parameter learning procedure for a
given sample is summarized in Algorithm 3.
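Algorithm 3. TRAIN PARAMETERS
Input: $\mathcal{G}$, $\mathcal{S}^+$, $\mathcal{S}^-$   Output: $\boldsymbol{\theta}^*_{\mathcal{S}}$
$\mathbf{S} = \frac{1}{2}\big(\mathbf{I} + \mathbf{D}^{-1/2}\mathbf{A}\mathbf{D}^{-1/2}\big)$
$\mathbf{S} = \mathbf{U}_d \boldsymbol{\Sigma}_d \mathbf{U}_d^\top$
$\mathcal{S} = \mathcal{S}^+ \cup \mathcal{S}^-$
Form $\mathcal{X}_{\mathcal{S}} = \{\mathbf{x}_{(i,j)}\}_{(i,j) \in \mathcal{S}}$ as in (30)
return $\boldsymbol{\theta}^*_{\mathcal{S}}$ = SIMPLEXSVM($\mathcal{X}_{\mathcal{S}}, \mathcal{S}^+, \mathcal{S}^-$)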
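Algorithm 4. SIMPLEXSVM
Input: $\mathcal{X}, \mathcal{S}^+, \mathcal{S}^-$   Output: $\boldsymbol{\theta}^*$
$\boldsymbol{\theta}_0 = \frac{1}{K}\mathbf{1}$; $t = 1$
while $\|\boldsymbol{\theta}_t - \boldsymbol{\theta}_{t-1}\|_\infty \ge \mathrm{tol}$ do
    $t = t + 1$; $\eta_t = a/\sqrt{t}$
    $\mathcal{S}^+_a = \{e \in \mathcal{S}^+ \mid \mathbf{x}_e^\top \boldsymbol{\theta}_{t-1} \le \epsilon\}$
    $\mathcal{S}^-_a = \{e \in \mathcal{S}^- \mid \mathbf{x}_e^\top \boldsymbol{\theta}_{t-1} \ge -\epsilon\}$
    $\mathbf{g}_t = \sum_{e \in \mathcal{S}^-_a} \mathbf{x}_e - \sum_{e \in \mathcal{S}^+_a} \mathbf{x}_e$
    $\mathbf{z}_t = (1 - 2\eta_t \lambda)\,\boldsymbol{\theta}_{t-1} - \frac{\eta_t}{N_s}\,\mathbf{g}_t$
    $\boldsymbol{\theta}_t$ = SIMPLEXPROJ($\mathbf{z}_t$)
end while
return $\boldsymbol{\theta}_t$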
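The projected-(sub)gradient iteration of Algorithm 4 can be sketched as follows. This is a hedged illustration rather than the released implementation: the projection shown is the classical sort-based $O(K \log K)$ routine, which is not necessarily the exact variant of [21], and the default values of `lam`, `eps`, `a`, `tol`, and `max_iter` are assumptions chosen only so that the example runs.

```python
import math

def simplex_proj(z):
    """Euclidean projection of z onto the probability simplex (sort-based)."""
    u = sorted(z, reverse=True)
    cumsum, tau = 0.0, 0.0
    for k, u_k in enumerate(u, start=1):
        cumsum += u_k
        t = (cumsum - 1.0) / k
        if u_k - t > 0:          # holds exactly for the first rho entries
            tau = t
    return [max(v - tau, 0.0) for v in z]

def _dot(x, theta):
    return sum(xi * ti for xi, ti in zip(x, theta))

def simplex_svm(X_pos, X_neg, lam=1.0, eps=1.0, a=1.0, tol=1e-6, max_iter=1000):
    """Simplex-constrained hinge-loss fit by projected subgradient descent.

    X_pos / X_neg: lists of K-dimensional features for removed edges / non-edges.
    """
    K = len((X_pos or X_neg)[0])
    n_s = len(X_pos) + len(X_neg)
    theta = [1.0 / K] * K                      # start from uniform weights
    for t in range(2, max_iter + 2):
        eta = a / math.sqrt(t)                 # diminishing step size
        g = [0.0] * K
        for x in X_neg:                        # active non-edges add x
            if _dot(x, theta) >= -eps:
                g = [gi + xi for gi, xi in zip(g, x)]
        for x in X_pos:                        # active edges subtract x
            if _dot(x, theta) <= eps:
                g = [gi - xi for gi, xi in zip(g, x)]
        z = [(1 - 2 * eta * lam) * ti - eta / n_s * gi
             for ti, gi in zip(theta, g)]
        new_theta = simplex_proj(z)
        if max(abs(p - q) for p, q in zip(new_theta, theta)) < tol:
            return new_theta
        theta = new_theta
    return theta
```

The `max_iter` cap is an addition for safety in this sketch; the algorithm in the text relies on the tolerance test alone.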
In general, if runtime or computational resources allow, the
sampling and training process described in the last two sections can
be repeated $T_s$ times to obtain different $\{\boldsymbol{\theta}^*_{\mathcal{S}}\}$s, which can then be
averaged in order to reduce their variance. In practice, this may
not be necessary if $N_s$ is large enough, which will yield a
near-deterministic $\boldsymbol{\theta}$. The overall proposed adaptive-similarity
embedding (ASE) framework is summarized in Algorithm 1.
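The last steps of Algorithm 1, namely averaging the per-sample parameters and scaling the eigenvectors, can be sketched as below. The sketch assumes that the truncated decomposition (eigenvector rows `U` and eigenvalues `sigma`) has already been computed by some sparse solver; the function names are hypothetical, and clipping tiny negative weights to zero is a numerical safeguard added here.

```python
import math

def average_parameters(thetas):
    """Average the per-sample parameter vectors collected in the training phase."""
    K = len(thetas[0])
    return [sum(th[k] for th in thetas) / len(thetas) for k in range(K)]

def embed(U, sigma, theta):
    """Embedding phase: scale column m of U by sqrt(sum_k theta_k * sigma_m^k).

    U     : list of N rows, each of length d (truncated eigenvectors).
    sigma : list of the d retained eigenvalues of S (they lie in [0, 1]).
    theta : list of K non-negative weights summing to one.
    """
    weights = [sum(t * s ** k for k, t in enumerate(theta, start=1))
               for s in sigma]
    scale = [math.sqrt(max(w, 0.0)) for w in weights]
    return [[u * c for u, c in zip(row, scale)] for row in U]

# Toy usage: two training rounds, two nodes, d = 2, K = 2.
theta = average_parameters([[1.0, 0.0], [0.0, 1.0]])
E = embed([[1.0, 2.0], [3.0, -1.0]], sigma=[1.0, 0.5], theta=theta)
```

With this scaling, the inner product of two embedding rows reproduces the weighted multihop similarity between the corresponding nodes, as in (29).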
4.3 Complexity
The computational complexity of ASE is dominated by the cost of
performing the truncated SVD of $\mathbf{S}$ in the training as well as testing
phases of Algorithm 1. Relying on the sparsity ($|\mathcal{E}| \ll N^2$) and
symmetry of $\mathbf{S}$, the Lanczos algorithm followed by EVD of a
tridiagonal matrix yields the truncated SVD in a very efficient
manner. Provided that $d \ll N$, the decomposition can be achieved in
$O(|\mathcal{E}|d)$ time and using $O(Nd)$ memory. Therefore, for the $T_s \ge 1$
training rounds and a single embedding round of Algorithm 1, the
overall complexity is $O((T_s + 1)|\mathcal{E}|d)$.
5 RELATED WORK
Two recent embedding methods also pursue similarity matrices that
combine walks of different lengths [12], [38]. Most relevant to the
proposed ASE is the “Arbitrary-Order Proximity Preserved Network
Embedding” [12] approach, where a method is proposed for obtaining
the SVD of a polynomial of the adjacency matrix without having to
recompute the singular vectors.
2. In our implementation, we also provide learning mechanisms
based on least-squares, logistic regression, as well as finding the best
single $k$. Due to space constraints, though, we only present and report
results of the SVM-based approach.
Compared to [12], we put forth the following contributions.
First, we introduce a family of multihop similarities whose
decomposition leads to embeddings that inherit the rich information
contained in the spectral embeddings (cf. Section 2.3). An equally
important contribution in terms of modeling is that our embeddings
can be differentiated with respect to (wrt) weights $\boldsymbol{\theta}$ (cf. (29),
(30), (31), and (32)), whereas the embeddings in [12] are
non-differentiable wrt the weights. Hence, [12] can only proceed in
a “forward” fashion given some order proximity weights $\boldsymbol{\theta}$, whereas
our approach allows for “navigating” the space of possible
similarity functions $s(v_i, v_j; \boldsymbol{\theta})$ in a smooth fashion, meaning
that $\boldsymbol{\theta}$ can be learned with simple optimization on well-defined
fitting models such as logistic regression or SVMs (cf. (32)).
This leads to the third main contribution, which is a means of
learning “personalized” $\boldsymbol{\theta}$ (cf. Section 4) in an unsupervised
fashion, meaning without downstream information such as node or edge
labels/attributes that can guide cross-validation in
high-dimensional discretized parameter grids.
The second related embedding method, presented in [38], builds on
the concept of graph attention mechanisms to place weights on lengths
of truncated random walks. These mechanisms are used to build a
similarity matrix containing co-occurrence probabilities. The
matrix is jointly decomposed by maximizing a graph-likelihood
function. The model in [38] is a generalization of the ones
implicitly adopted by [39] and [40], building on similar tools and
concepts that emerge from natural language processing. Different
from [39], [40] and the proposed ASE, [38] explicitly constructs and
factorizes a dense $N \times N$ similarity matrix. The detailed procedure
incurs complexity that is cubic wrt $N$, and becomes at best
quadratic after model approximations, meaning that [38] scales
rather poorly beyond small graphs.
6 EXPERIMENTAL EVALUATION
The present section reports extensive experimental results on a
variety of real-world networks. The aim of the presented tests is
twofold. First, to determine and quantify the quality of the
proposed ASE embeddings for different downstream learning tasks.
Second, to analyze and interpret the resulting embedding parameters
for different networks.
Datasets. In our experiments, we used the following real-world
networks (see also Table 2).
• ca-AstroPh. The Astro Physics collaboration network is from
the e-print arXiv and covers scientific collaborations between
authors of papers submitted to the Astro Physics category [52]. If
an author i co-authored a paper with author j, the graph contains
an undirected edge from i to j. If the paper is co-authored by k
authors, this generates a completely connected (sub)graph on k nodes.
• ca-CondMat. Condensed Matter Physics collaboration network
from arXiv [52].
• CoCit. A co-citation network of papers citing other papers
extracted by [36]; labels represent conferences in which papers were
published.
• com-DBLP. Computer science research bibliography collaboration
network [52].
• com-Amazon. Network collected by crawling the Amazon website
[52]. It is based on the “Customers Who Bought This Item Also Bought”
feature of the Amazon website. If a product i is frequently
co-purchased with product j, the graph contains an undirected edge
from i to j.
• vk2016-17. VK is a Russian all-encompassing social network. In
[36], two snapshots of the network were extracted in November 2016
and May 2017, to obtain information about link appearance.
• email-Enron. Enron email communication network covering all
the email communication within a dataset of around half a million
emails [52].
• PPI (H. Sapiens). Subgraph of the protein-protein interaction
network for Homo Sapiens. The subgraph corresponds to the graph
induced by nodes for which labels (representing biological states)
were obtained from the hallmark gene sets [40].
• Wikipedia. This is a co-occurrence network of words appearing in
the first million bytes of the Wikipedia dump. The labels represent
the Part-of-Speech (POS) tags inferred using the Stanford POS-Tagger
[40].
• BlogCatalog. A network of social relationships of the bloggers
listed on the BlogCatalog website. The labels represent blogger
interests inferred through the meta-data provided by the bloggers.
Methods. Experiments were run using the following unsupervised
and scalable embedding methods.
• ASE. Our proposed adaptive similarity embedding. Based on
observations made in Section 3, and to retain optimization
stability, we set the maximum number of steps to $K = 10$. We also use
the default SVM regularizer ($\lambda = 1$). To have a single learning round
with learned parameters having small enough variance, we sampled
with $N_s/2 = 1{,}000$. We made our implementation of ASE freely
available.³
• VERSE [36]. This is a scalable framework for generating node
embeddings according to a similarity function by minimizing a
KL-divergence objective via stochastic optimization. We used the
default version with similarity (PPR with $\alpha = 0.85$), as suggested and
implemented by the authors.⁴
• Deepwalk [39]. This approach learns an embedding by sampling
random walks from each node, and
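TABLE 2
Network Characteristics

| Graph | $|\mathcal{V}|$ | $|\mathcal{E}|$ | $|\mathcal{Y}|$ | Density |
|---|---|---|---|---|
| PPI (H. Sapiens) | 3,890 | 76,584 | 50 | $10^{-2}$ |
| Wikipedia | 4,733 | 184,182 | 40 | $1.6 \times 10^{-2}$ |
| BlogCatalog | 10,312 | 333,983 | 39 | $6.2 \times 10^{-3}$ |
| ca-CondMat | 23,133 | 93,497 | - | $3.5 \times 10^{-4}$ |
| ca-AstroPh | 18,772 | 198,110 | - | $1.1 \times 10^{-3}$ |
| email-Enron | 36,692 | 183,831 | - | $2.7 \times 10^{-4}$ |
| CoCit | 44,312 | 195,362 | 15 | $2 \times 10^{-4}$ |
| vk2016-17 | 78,593 | 2,680,542 | - | $8.7 \times 10^{-4}$ |
| com-Amazon | 334,863 | 925,872 | - | $1.7 \times 10^{-5}$ |
| com-DBLP | 317,080 | 1,049,866 | - | $2.1 \times 10^{-5}$ |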
3. https://github.com/DimBer/ASE-project
4. https://github.com/xgfs/verse
applying word2vec-based learning on those walks. We use the
default parameters proposed in [39], i.e., walk length $t = 80$,
number of walks per node $\gamma = 80$, window size $w = 10$, and the
scalable C++ implementation⁵ provided in [36].
• HOPE [29]. This SVD-based approach approximates high-order
proximities and leverages directed edges. We report the results
obtained with the default parameters, i.e., Katz similarity as the
similarity measure with $\beta$ inversely proportional to the spectral
radius.
• AROPE [12]. An approach for fast computation of the thin SVD of
different polynomials of $\mathbf{A}$. We used the official Python
implementation⁶ to produce the embeddings. We selected the
polynomial (hyper)parameters of AROPE using a set of validation
edges that was sampled similarly to ASE (Algorithm 2). We consider
proximity orders in the range [1, 10], and perform grid search over
the different proximity weights as suggested in [12].
• LINE [35]. This approach learns a d-dimensional embedding in
two steps, both using adjacency similarity. First, it learns $d/2$
dimensions using first-order proximity; then, it learns another $d/2$
features using second-order proximity. Last, the two halves are
normalized and concatenated. We obtained a copy of the code,⁷ and
ran experiments with $T = 10^{10}$ samples (although $T = 10^{9}$ yielded
the same accuracy for smaller graphs), and $s = 5$ negative samples,
as described in the paper.
• Spectral. This approach relies on the first $d$ eigenvectors of
$\mathbf{D}^{-1/2}\mathbf{A}\mathbf{D}^{-1/2}$. The baseline was developed for clustering [17], and has
also been run as a benchmark for node embeddings [40]. In our case,
spectral embedding is of particular interest since it can be obtained
by column-wise normalization of the embeddings generated by the
proposed method.
We excluded comparisons with Node2vec [40] because it uses
cross-validation on node labels for hyper-parameter selection.
Thus, comparing Node2vec to methods such as LINE, Deepwalk, HOPE,
VERSE, and ASE, which all operate with fixed hyperparameters in a
fully unsupervised manner, would be unfair. We also excluded
comparisons with GraRep [31] and M-NMF [30] due to their limited
scalability ($O(N^2 d)$ computational and $O(N^2)$ memory complexity).
Evaluation Methodology. Our experiment setting follows the one in
[36]. All methods are set to embed nodes to dimension $d = 100$. Using
the resulting embeddings as feature vectors, we evaluated their
performance in terms of node classification and link prediction
accuracy, and clustering quality. All experiments were repeated 10
times, and the averaged results are reported.
Interpretation of Results. One interesting aspect of the proposed
ASE method is that the inferred parameters $\boldsymbol{\theta}^*$ from the
first phase of Algorithm 1 can be used to characterize the
underlying similarity structure of the graph, and the way nodes
“interact” over different path lengths (short, medium, and long
range). The “strength” of interactions is inferred by how uniform
the coefficients of $\boldsymbol{\theta}^*$ are, and depends on the value of $\lambda$. Since the
default value was $\lambda = 1$ for all graphs, the results can be
interpreted as relative interaction strengths between them. The
resulting $\{\boldsymbol{\theta}^*\}$s for all graphs are listed in Table 3.
It can be immediately observed that the type of node interactions
varies significantly across different graphs, with similar behavior
for graphs that belong to the same domain. Specifically, ca-CondMat,
ca-AstroPh, and CoCit, which belong to the citation/co-authorship
domain, all show relatively strong interactions of short range.
BlogCatalog shows very strong short-range similarities of only
one-hop neighborhood interactions among bloggers. On the other
hand, the Wikipedia word co-occurrence network shows a strong
tendency for long-range interactions, while other graphs, such as
the PPI protein interaction network, stay in the medium range.
Node Classification. Graphs with labeled nodes are frequently
used to measure the ability of embedding methods to produce features
suitable for classification. For each experiment, nodes were
randomly split into a training set and a test set. Similar to other
works, and to cope with multi-label targets, we fed the training
features and labels into the one-vs-the-rest configuration of the
logistic regression classifier provided by the sklearn Python
library. In the testing phase, we sorted the predicted class
probabilities for each node in decreasing order, and extracted the
top-$k_i$ ranking labels, where $k_i$ is the true number of labels of node
$v_i$. We then computed the Micro- and Macro-averaged F1 scores [10] of
the predicted labels.
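TABLE 3
Inferred Parameters and Interpretation

| Graph | $\theta_1$ | $\theta_2$ | $\theta_3$ | $\theta_4$ | $\theta_5$ | $\theta_6$ | $\theta_7$ | $\theta_8$ | $\theta_9$ | $\theta_{10}$ | range | strength |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| PPI (H. Sapiens) | 0.00 | 0.14 | 0.31 | 0.29 | 0.21 | 0.04 | 0.00 | 0.00 | 0.00 | 0.00 | medium | medium |
| Wikipedia | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.01 | 0.37 | 0.62 | long | strong |
| BlogCatalog | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | short | very strong |
| ca-CondMat | 0.55 | 0.33 | 0.12 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | short | strong |
| ca-AstroPh | 0.76 | 0.24 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | short | strong |
| email-Enron | 0.24 | 0.25 | 0.18 | 0.14 | 0.1 | 0.06 | 0.02 | 0.00 | 0.00 | 0.00 | medium | weak |
| CoCit | 0.61 | 0.33 | 0.06 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | short | strong |
| vk2016-17 | 0.71 | 0.29 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | short | strong |
| com-Amazon | 0.10 | 0.10 | 0.10 | 0.10 | 0.09 | 0.09 | 0.09 | 0.09 | 0.09 | 0.09 | short | very weak |
| com-DBLP | 0.11 | 0.10 | 0.10 | 0.09 | 0.09 | 0.09 | 0.09 | 0.09 | 0.09 | 0.08 | short | very weak |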
5. https://github.com/xgfs/deepwalk-c
6. https://github.com/ZW-ZHANG/AROPE
7. https://github.com/tangjianpku/LINE
Apart from comparisons with alternative embedding methods, node
classification can reveal whether available node labels (metadata)
are distributed in a manner that matches the node
relations/interactions that are inferred by ASE. To reveal this
information, we obtain embeddings for every $k \in \{1, \ldots, 10\}$ by
ignoring the training phase and “forcing” $\boldsymbol{\theta}^* = \mathbf{e}_k$ (i.e., 1 at the
$k$th entry and 0 elsewhere) in Algorithm 1, and then using each
embedding for classification with 10 percent labeling rate. Fig. 4
plots Micro and Macro F1 for all labeled graphs as a function of $k$,
while red shade is placed on the hops where the unsupervised ASE
parameters $\boldsymbol{\theta}^*$ are non-zero (cf. Table 3). As seen in Fig. 4, the
accuracy on the four labeled graphs evolves with $k$ in a markedly
different manner. Nevertheless, ASE identifies the trends and tends
to assign non-zero weights to hops that yield a desirable trade-off
between Micro and Macro F1. This is rather remarkable, bearing in
mind that ASE does not use labels for training or validation, and
that $\boldsymbol{\theta}^*$ depends only on the graph.
We also compared the classification accuracy of ASE embeddings
with those of the alternative embedding approaches, with results
plotted in Fig. 5. The plots for some method-graph pairs are not
discernible when values are too low. While the relative performance
of any given method varies from graph to graph, ASE adapts to each
graph and yields consistently reliable embeddings, with accuracy that
in most cases reaches or surpasses that of state-of-the-art methods,
especially in terms of Macro F1. The two exceptions are the Macro F1
in CoCit, and Micro F1 in Wikipedia, where VERSE and HOPE are
correspondingly more accurate. Interestingly, HOPE achieving high
Micro F1 and low Macro F1 in Wikipedia is in agreement with the
findings in Fig. 4, combined with the fact that HOPE focuses on
longer paths.
Link Prediction. Link prediction is the task of estimating the
probability that a link between two unconnected nodes will appear in
the future. We repeat the experiment performed in [36] on the
vk2016-17 social network. For every
Fig. 4. Micro and Macro F1 scores for the four labeled graphs,
when the “pure” $k$-order $\mathbf{S}^k$ is used for embedding, given as a
function of $k$. Red shade denotes the corresponding $k$’s where ASE
assigned non-zero $\theta_k$’s; see also Table 3.
TABLE 4
Link Prediction Accuracy on vk2016-17

| VERSE | ASE | LINE | Deepwalk | AROPE | HOPE | Spectral |
|---|---|---|---|---|---|---|
| 0.79 | 0.75 | 0.74 | 0.69 | 0.65 | 0.62 | 0.60 |
Fig. 5. Micro (upper row) and Macro (lower row) F1 scores that
different embeddings + logistic regression yield on labeled graphs,
as a function of the labeling rate (percentage of training data).
possible edge, we build a feature vector as the Hadamard product
between the embedded vectors of its two adjacent nodes. Using the
two time instances of vk2016-17, we predict whether a new
friendship link appears between November 2016 and May 2017, using 50
percent of the new links for training and 50 percent for testing. To
train the binary logistic regression classifier, we also randomly
sample non-existing edges as negative examples. The link prediction
accuracy for different embeddings is reported in Table 4. While for
this experiment ASE does not reach the accuracy of VERSE, it provides
the second most accurate link prediction, far surpassing the also
SVD-based HOPE and spectral embeddings.
Node Clustering. Finally, the embedded vectors were used to
cluster the nodes into different communities, using the sklearn
library K-means with the default K-means++ initialization [18]. We
evaluate the quality of node clustering with conductance, a
well-known metric for measuring the goodness of a community [5];
conductance is minimized for large, well-connected communities that
are also well separated from the rest of the graph. Each plot in
Fig. 6 gives the average conductance across communities, as a
function of the total number of clusters. Results indicate that the
proposed ASE as well as the spectral clustering benchmark yield much
lower conductance compared to other embeddings. Apparently, since ASE
builds on the same basis of eigenvectors used by normalized spectral
clustering, it inherits the property of the latter to approximately
minimize the normalized-cut metric [17], which is very similar to
conductance. A closer look at the resulting clusters reveals that
clustering based on VERSE, Deepwalk, LINE, and HOPE splits graphs
into very large communities of roughly equal size, cutting a large
number of edges in the process. This is an indication that these
methods are subject to a resolution limit, which is the inability to
detect well-separated communities that are below a certain size [1].
On the other hand, Spectral and the proposed ASE separate the graph
into a large-core component and many smaller well-separated
communities, a structure that many large-scale information networks
have been observed to have [5]. Indeed, the conductance gap is
smaller for BlogCatalog, which is relatively small and has less
pronounced communities.
Parameter Sensitivity. We also present results in Fig. 7 obtained
by varying the ASE parameters and measuring the embedding runtime for
PPI, as well as the classification Micro F1 accuracy with 10 percent
labeling rate. The aim is to assess the sensitivity of ASE wrt its
basic parameters. The plot on the left shows how
Fig. 6. Average conductance of different embeddings used by
K-means for clustering, as a function of the number of clusters.
Fig. 7. Sensitivity (Micro F1 on left axes, and runtime on
right axes) of ASE on the PPI graph wrt various parameters.
increasing $\lambda$ (cf. (32)) may decrease accuracy by forcing the
entries of $\boldsymbol{\theta}^*$ to be close to uniform, thus losing the benefits of
graph-specific adaptation. Regarding the number of sampled edges
$N_s$, results (middle plot) indicate relative robustness of ASE
embeddings, given a minimum number of samples. As expected, sampling
a large number of edges may cause noticeable perturbation on the
graph (even using the minimally-perturbing Algorithm 2); this may
be causing a slight decrease in accuracy. Sensitivity is also
measured wrt $K$ (i.e., the maximum walk length considered in the
optimization). As expected, the accuracy increases sharply with $K$
for the first few steps, and then plateaus as higher-order
coefficients of PPI take zero values (cf. Table 3) and do not affect
the results. Finally, the plot on the right depicts accuracy across a
range of embedding dimensions $d$.
Runtime. Finally, we compared different embedding methods in
terms of runtime. Results for all graphs are reported in Fig. 8. All
experiments were run on a personal workstation with a quad-core i5
processor and 16 GB of RAM. For our proposed ASE, we provide a
light-weight yet highly portable implementation⁸ that uses the
SVDLIBC library [51] for sparse SVD. We also developed a more
scalable implementation⁹ that relies on (and requires installation
of) the SLEPc package [49]; this scalable version can perform
large-scale sparse SVD on multiple processes and distributed memory
environments using the message-passing interface (MPI) [48]. We used
the high-performance implementation for the five larger graphs, and
the portable one for the five smaller ones. Evidently, ASE and HOPE,
which are SVD-based, are orders of magnitude faster than VERSE,
Deepwalk, and LINE. The main factor that slows the latter down seems
to be the large number of stochastic optimization iterations that
these methods must perform to reach accurate embeddings.
Nevertheless, it should be noted that sampling-based methods enjoy
nearly-full parallelization and could thus benefit more from highly
multi-threaded environments. On the other hand, methods that rely on
SVD (and EVD) can greatly benefit from decades of research on how to
efficiently perform these decompositions, and a suite of stable and
highly optimized software tools.
7 CONCLUSIONS AND FUTURE WORK
We presented a scalable node embedding framework that is based on
factorizing an adaptive node similarity matrix. The model is
carefully studied, interpreted, and numerically evaluated using
stochastic block models, with an algorithmic scheme proposed for
training the model parameters efficiently and without supervision.
The novel framework opens up several interesting future research
directions. For instance, one can explore larger families of node
similarity metrics that can be learned using the graph. Furthermore,
it would be interesting to assess the performance of different
randomized edge sampling methods, and generalize the notion of
adaptive-similarity to heterogeneous and multi-layered graph
embedding, as well as to edge embedding.
ACKNOWLEDGMENTS
This work was supported by NSF 1901134, 171141, 1514056, and
1500713.
REFERENCES
[1] S. Fortunato and M. Barthelemy, “Resolution limit in community detection,” Proc. Nat. Acad. Sci. United States America, vol. 104, no. 1, pp. 36–41, 2007.
[2] Y. Zhao, E. Levina, and J. Zhu, “Consistency of community detection in networks under degree-corrected stochastic block models,” The Ann. Statist., vol. 40, no. 4, pp. 2266–2292, 2012.
[3] D. P. Bertsekas, Nonlinear Programming. Belmont, MA, USA: Athena Scientific, 1999.
[4] L. A. Adamic and E. Adar, “Friends and neighbors on the web,” Social Netw., vol. 25, no. 3, pp. 211–230, 2003.
[5] J. Leskovec, K. J. Lang, A. Dasgupta, and M. W. Mahoney, “Community structure in large networks: Natural cluster sizes and the absence of large well-defined clusters,” Internet Math., vol. 6, no. 1, pp. 29–123, 2009.
[6] W. Hamilton, Z. Ying, and J. Leskovec, “Inductive representation learning on large graphs,” in Proc. Int. Conf. Neural Inf. Process. Syst., 2017, pp. 1024–1034.
[7] S. Brin and L. Page, “Reprint of: The anatomy of a large-scale hypertextual web search engine,” Comput. Netw., vol. 56, no. 18, pp. 3825–3833, 2012.
[8] D. F. Gleich, “Pagerank beyond the web,” SIAM Rev., vol. 57, no. 3, pp. 321–363, 2015.
[9] I. M. Kloumann, J. Ugander, and J. Kleinberg, “Block models and personalized pagerank,” Proc. Nat. Acad. Sci. United States America, vol. 114, no. 1, pp. 33–38, 2017.
[10] C. D. Manning, P. Raghavan, and H. Schutze, Introduction to Information Retrieval. Cambridge, MA, USA: Cambridge Univ. Press, 2008.
[11] Z. Yang, W. W. Cohen, and R. Salakhutdinov, “Revisiting semi-supervised learning with graph embeddings,” in Proc. 33rd Int. Conf. Mach. Learn., 2016, vol. 48, pp. 40–48.
[12] Z. Zhang, P. Cui, X. Wang, J. Pei, X. Yao, and W. Zhu, “Arbitrary-order proximity preserved network embedding,” in Proc. Int. Conf. Knowl. Discovery Data Mining, 2018, pp. 2778–2786.
[13] B. E. Boser, I. M. Guyon, and V. N. Vapnik, “A training algorithm for optimal margin classifiers,” in Proc. Workshop Comput. Learn. Theory, 1992, pp. 144–152.
[14] A. Milanese, J. Sun, and T. Nishikawa, “Approximating spectral impact of structural perturbations in large networks,” Phys. Rev. E, vol. 81, no. 4, 2010, Art. no. 046112.
Fig. 8. Runtime of various embedding methods across different
graphs.
8. https://github.com/DimBer/ASE-project/tree/master/portable
9. https://github.com/DimBer/ASE-project/tree/master/slepc_based
[15] H. Cai, V. W. Zheng, and K. Chang, “A comprehensive survey
ofgraph embedding: Problems, techniques and applications,”
IEEETrans. Knowl. Data Eng., vol. 30, no. 9, pp. 1616–1637, Sep.
2018.
[16] P. Goyal, and E. Ferrara, “Graph embedding techniques,
applica-tions, and performance: A survey,” Knowl.-Based Syst., vol.
151,pp. 78–94, 2018.
[17] U. Von Luxburg, “A tutorial on spectral clustering,”
Statist. Com-put., vol. 17, no. 4, pp. 395–416, 2007.
[18] D. Arthur, and S. Vassilvitskii, “k-means++: The advantages
ofcareful seeding,” in Proc. 18th Annu. ACM-SIAM Symp.
Discr.Algorithms, 2007, pp. 1027–1035.
[19] G. H. Golub, and C. Reinsch, “Singular value decomposition
andleast squares solutions,” Numerische Math., vol. 14, no. 5, pp.
403–420, 1970.
[20] D. Berberidis, A. N. Nikolakopoulos, and G. B.
Giannakis,“Adaptive diffusions for scalable learning over graphs,”
IEEETrans. Signal Process., vol. 67, no. 5, pp. 1307–1321,
2018.
[21] L. Condat, “Fast projection onto the simplex and the ‘1
ball,”Math. Program., vol. 158, no. 1/2, pp. 575–585, 2016.
[22] Y. Han and Y. Shen, “Partially supervised graph embedding
forpositive unlabeled feature selection,” in Proc. Int. Joint Conf.
Artif.Intell., 2016, pp. 1548–1554.
[23] T. Hofmann and J. M. Buhmann, “Multidimensional scaling
anddata clustering,” in Proc. Int. Conf. Neural Inf. Process.
Syst., 1994,pp. 459–466.
[24] M. Balasubramanian and E. L. Schwartz, “The isomap
algorithmand topological stability,” Sci., vol. 295, no. 5552,
2002, Art. no. 7.
[25] X. He and P. Niyogi, “Locality preserving projections,” in
Proc.Int. Conf. Neural Inf. Process. Syst., 2003, pp. 153–160.
[26] S. T. Roweis, and L. K. Saul, “Nonlinear dimensionality
reductionby locally linear embedding,” Sci., vol. 290, no. 5500,
pp. 2323–2326,2000.
[27] A. Ahmed, N. Shervashidze, S. Narayanamurthy, V.
Josifovski,and A. J. Smola, “Distributed large-scale natural
graphfactorization,” in Proc. World Wide Web Conf., 2013, pp.
37–48.
[28] C. Yang, Z. Liu, D. Zhao, M. Sun, and E. Y. Chang,
“Networkrepresentation learning with rich text information,” in
Proc. Int.Joint Conf. Artif. Intell., 2015, pp. 2111–2117.
[29] M. Ou, P. Cui, J. Pei, Z. Zhang, and W. Zhu, “Asymmetric
transi-tivity preserving graph embedding,” in Proc. Int. Conf.
Knowl. Dis-covery Data Mining, 2016, pp. 1105–1114.
[30] J. Qiu, Y. Dong, H. Ma, J. Li, K. Wang, and J. Tang,
“Networkembedding as matrix factorization: Unifying DeepWalk,
LINE,PTE, and node2vec,” in Proc. Int. Conf. Web Search Data
Mining,2018, pp. 459–467.
[31] S. Cao, W. Lu, and Q. Xu, “GraRep: Learning graph
representa-tions with global structural information,” in Proc. Int.
Conf. Inf.Knowl. Manage., 2015, pp. 891–900.
[32] B. Shaw, and T. Jebara, “Structure preserving embedding,”
inProc. Int. Conf. Mach. Learn., 2009, pp. 937–944.
[33] Y. Zhao, Z. Liu, and M. Sun, “Representation learning for
measur-ing entity relatedness with rich information,” in Proc. Int.
JointConf. Artif. Intell., 2015, pp. 1412–1418.
[34] Y. Koren, R. M. Bell, and C. Volinsky, “Matrix
factorization techni-ques for recommender systems,” IEEE Comput.,
vol. 42, no. 8,pp. 30–37, Aug. 2009.
[35] J. Tang, M. Qu, M. Wang, M. Zhang, J. Yan, and Q. Mei,
“LINE:Large-scale information network embedding,” in Proc. World
WideWeb Conf., 2015, pp. 1067–1077.
[36] A. Tsitsulin, D. Mottin, P. Karras, and E. Muller, “VERSE:
Versa-tile graph embeddings from similarity measures,” in Proc.
WorldWide Web Conf., 2018, pp. 539–548.
[37] J. Tang, M. Qu, and Q. Mei, “PTE: Predictive text
embeddingthrough large-scale heterogeneous text networks,” in Proc.
Int.Conf. Knowl. Discovery Data Mining, 2015, pp. 1165–1174.
[38] S. Abu-El-Haija, B. Perozzi, R. Al-Rfou, and A. Alemi,
“Watchyour step: Learning graph embeddings through attention,”
arXiv:1710.09599, 2017.
[39] B. Perozzi, R. Al-Rfou, and S. Skiena, “DeepWalk: Online
learningof social representations,” in Proc. ACM SIGKDD Int. Conf.
Knowl.Discovery Data Mining, 2014, pp. 701–710.
[40] A. Grover and J. Leskovec, “node2vec: Scalable feature
learningfor networks,” in Proc. ACM SIGKDD Int. Conf. Knowl.
DiscoveryData Mining, 2016, pp. 855–864.
[41] A. Bordes, N. Usunier, A. Garcia-Duran, J.Weston, andO.
Yakhnenko,“Translating embeddings for modeling multirelational
data,” in Proc.Int. Conf. Neural Inf. Process. Syst., 2013, pp.
2787–2795.
[42] R. Xie, Z. Liu, and M. Sun, “Representation learning of
knowledge graphs with hierarchical types,” in Proc. Int. Joint Conf.
Artif. Intell., 2016, pp. 2965–2971.
[43] C. Donnat, M. Zitnik, D. Hallac, and J. Leskovec, “Learning
structural node embeddings via diffusion wavelets,” in Proc. 24th
ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, 2018, pp.
1320–1329.
[44] L. F. R. Ribeiro, P. H. P. Saverese, and D. R. Figueiredo,
“struc2vec: Learning node representations from structural identity,”
in Proc. ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining,
2017, pp. 385–394.
[45] D. Wang, P. Cui, and W. Zhu, “Structural deep
network embedding,” in Proc. ACM SIGKDD Int. Conf. Knowl.
Discovery Data Mining, 2016, pp. 1225–1234.
[46] F. Tian, B. Gao, Q. Cui, E. Chen, and T. Liu, “Learning
deep representations for graph clustering,” in Proc. 28th AAAI
Conf. Artif. Intell., 2014, pp. 1293–1299.
[47] S. Cao, W. Lu, and Q. Xu, “Deep neural networks for
learning graph representations,” in Proc. 30th AAAI Conf. Artif.
Intell., 2016, pp. 1145–1152.
[48] B. Barker, “Message passing interface (MPI),” Workshop:
High Perform. Comput. Stampede, vol. 256, 2015.
[49] V. Hernandez, J. E. Roman, and V. Vidal, “SLEPc: A scalable
and flexible toolkit for the solution of eigenvalue problems,”
ACM Trans. Math. Softw., vol. 31, no. 3, pp. 351–362, 2005.
[50] T. Dettmers, P. Minervini, P. Stenetorp, and S.
Riedel, “Convolutional 2D knowledge graph embeddings,” in Proc.
AAAI Conf., Feb. 2–7, 2018, pp. 1811–1818.
[51] [Online]. Available: https://tedlab.mit.edu/~dr/SVDLIBC/,
Accessed: 2019.
[52] [Online]. Available:
https://snap.stanford.edu/data/index.html, Accessed: 2019.
Dimitris Berberidis (S’15) received the diploma degree in
electrical and computer engineering (ECE) from the University of
Patras, Patras, Greece, in 2012, and the MSc and PhD degrees
in ECE from the University of Minnesota, Minneapolis, MN. His
research interests lie in the areas of statistical signal
processing, focusing on sketching and tracking of large-scale
processes, and in machine learning, focusing on the development of
algorithms for scalable learning over graphs, including
semi-supervised classification and node embedding. He is
a student member of the IEEE.
Georgios B. Giannakis (F’97) received the diploma degree in
electrical engineering from the National Technical University of
Athens, Greece, in 1981, the MSc degree in electrical engineering,
in 1983, the MSc degree in mathematics, in 1986, and the PhD in
electrical engineering, in 1986, from the University of Southern
California (USC). From 1982 to 1986 he was with USC. He was with the
University of Virginia from 1987 to 1998, and since 1999 he has been
a professor with the University of Minnesota, where he holds an
Endowed chair in Wireless Telecommunications, a University of
Minnesota McKnight Presidential chair in ECE, and serves as director
of the Digital Technology Center. His general interests span the
areas of communications, networking, and statistical learning,
subjects on which he has published more than 450 journal papers,
750 conference papers, 25 book chapters, two edited books, and two
research monographs (h-index 142). Current research focuses on
learning from big data, wireless cognitive radios, and network
science with applications to social, brain, and power networks with
renewables. He is the (co-) inventor of 32 patents issued, and the
(co-) recipient of nine best journal paper awards from the IEEE
Signal Processing (SP) and Communications Societies, including the
G. Marconi Prize Paper Award in Wireless Communications. He also
received Technical Achievement Awards from the SP Society (2000),
from EURASIP (2005), a Young Faculty Teaching Award, the G. W.
Taylor Award for Distinguished Research from the University of
Minnesota, and the IEEE Fourier Technical Field Award (inaugural
recipient in 2015). He is a fellow of EURASIP, and has served the
IEEE in a number of posts, including that of a distinguished
lecturer for the IEEE-SP Society. He is a fellow of the IEEE.
650 IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL.
33, NO. 2, FEBRUARY 2021