GRAPH MATCHING AND LEARNING IN PATTERN
RECOGNITION IN THE LAST 10 YEARS
PASQUALE FOGGIA*, GENNARO PERCANNELLA and MARIO VENTO
Department of Information Engineering
Electrical Engineering and Applied Mathematics
University of Salerno
Via Giovanni Paolo II, 132, 84084 Fisciano (SA),
Italy
*[email protected], [email protected], [email protected]
Received 25 June 2013
Accepted 14 October 2013
Published 9 December 2013
In this paper, we examine the main advances registered in the last ten years in Pattern Recognition methodologies based on graph matching and related techniques, analyzing more than 180 papers; the aim is to provide a systematic framework presenting the recent history and the current developments. This is done by introducing a categorization of graph-based techniques and reporting, for each class, the main contributions and the most outstanding research results.
Keywords: Structural pattern recognition; graph matching; graph kernels; graph embeddings; graph learning; graph clustering; graph and tree search strategies.
1. Introduction
Structural Pattern Recognition bases its theoretical foundations on the decomposition of objects in terms of their constituent parts (subpatterns) and of the relations among them. Graphs, usually enriched with node and edge attributes, are the elective data structures for supporting this kind of representation. Some of the methods working on graphs introduce restrictions on the structure of the graphs (e.g. only allowing planar graphs) or on the kind of attributes (e.g. some methods only allow single real-valued attributes for the graph edges).
*Corresponding author.
International Journal of Pattern Recognition and Artificial Intelligence, Vol. 28, No. 1 (2014) 1450001 (40 pages).
© World Scientific Publishing Company. DOI: 10.1142/S0218001414500013.
Int. J. Patt. Recogn. Artif. Intell. 2014.28. Downloaded from www.worldscientific.com by 122.176.242.35 on 07/12/15. For personal use only.

The use of a graph-based pattern representation induces the need to formulate the main activities required for Pattern Recognition in terms of operations on graphs: classification, usually intended as the comparison between an object and a set of prototypes, and learning, which is the process of obtaining a model of a class starting from a set of known samples, are among the key issues that must be addressed using graph-based techniques.
The use of graphs in Pattern Recognition dates back to the early seventies, and the paper "Thirty years of graph matching in Pattern Recognition"26 reports a survey of the literature on graph-based techniques from the first years up to the early 2000s. We have certainly witnessed a maturation of the classical techniques for graph comparison, either exact or inexact; at the same time, we are witnessing a rapid growth of many alternative approaches, such as graph embedding and graph kernels, aimed at making possible the application to graphs of vector-based techniques for classification and learning (such as the ones derived from statistical classification and learning theory).
In this paper, we discuss the main advances registered in graph-based methodologies in the last 10 years, analyzing more than 180 papers on this topic; the aim is to provide a systematic framework presenting the recent history of graphs in Pattern Recognition and the current trends.
Our analysis starts from the above-mentioned survey26 and completes its contents by considering a selection of the most recent main contributions; consequently, the present paper, for the sake of conciseness, reports only references to works published during the last 10 years. The reader is kindly invited to consult Ref. 26 for the earlier related works. However, the taxonomy of the papers presented in Ref. 26 has been extended with other graph-based problems that are related to matching, either because they involve some form of graph comparison, or because they use a graph-based approach to group patterns into classes. Figure 1 shows a graphical representation of the taxonomy adopted in this paper.
In fact, in the last decade we have witnessed the birth and growth of methods facing learning and classification with a rather innovative scientific vision: the computational burden of matching algorithms, together with their intrinsic complexity, in opposition to the well-established world of statistical Pattern Recognition methodologies, suggested new paradigms for graph-based methods. Why not try to reduce graph matching and learning to vector-based operations, so as to make possible the use of statistical approaches?
These are two opposite ways of facing the problem, each with its pros and cons: "graphs from the beginning to the end", with a few heavy algorithms, but the exploitation of all the information contained in the graphs; on the other side, the risk of losing discriminating power during the conversion of graphs into vectors (by selecting suitable properties), counterbalanced by the immediate access to all the theoretically assessed achievements of the statistical framework. In a sense, there are some traditional tools that can be considered to be halfway between these two approaches: an example is Graph Edit Distance (GED), which is based on a matching between the nodes and the edges of the two graphs, but produces distance information that can be used to cast the graphs into a metric space. However, GED can still be considered an approach of the first kind, since in the computation of the metric, the information attached to the subparts can be considered in a context-dependent way, and need not be reduced a priori to a vectorial form.
These two opposite factions are now simultaneously active, each hoping to overcome the other; 10 years ago these innovative methods were in the background, but now they are gaining more and more attention in the scientific literature on graphs. This is the reason why the categorization reported in this paper has been further expanded by including a new section describing a variety of novel approaches, such as graph embedding, graph kernels, graph clustering and graph learning, dedicating a subsection to each of them. It is worth pointing out that these methods were of course already known at the time of Ref. 26, but their diffusion and scientific interest have shown a significant growth in the last decade. For instance, a recent survey by Hancock and Wilson71 compares and contrasts the work on graph-based techniques by the Bern group led by Horst Bunke and the York group led by Edwin Hancock. The first group has historically put more emphasis on the purely structural aspects of graph-based techniques, while the second has focused on the extensions to the graph domain of probabilistic and information-theoretic methodologies; however, both schools in the last decade have found a point of convergence in the adoption of graph kernels and graph embedding techniques. Another recent paper by Livi and Rizzi98 presents a survey of graph matching techniques. However, despite its title, it is mostly dedicated to graph embeddings and graph kernels, and does not aim to cover graph matching techniques comprehensively; furthermore, the paper is less specifically devoted to approaches used within the Pattern Recognition community.
The overall organization of our paper is based on a categorization of the approaches with respect to the problem formulation they adopt, and secondarily to the kind of technique used to face the problem, following the taxonomy reported in Fig. 1. We have distinguished between graph matching problems, which will be presented in Sec. 2, and other problems related to graph comparison, which are discussed in Sec. 3. In particular, the section on graph matching is divided between exact and inexact matching techniques. The section on other problems is articulated according to the techniques that have obtained most attention in recent literature, namely graph embedding, graph kernels, graph clustering and graph learning, with a miscellaneous problems subsection for less common but related problems.
For reasons of space, in this survey we have focused on the algorithms and not on their applications. The interested reader may find some complementary surveys on the applications of graph matching to Computer Vision and Pattern Recognition in Refs. 28 and 53. For the very same reasons, we have not included research papers from outside of the Pattern Recognition community. Graph-based methods are used and investigated in many other research fields; among them, we can mention, with no pretense at completeness, Data Mining, Machine Learning, Complex Networks Analysis and Bioinformatics.
2. Graph Matching
We briefly recall the terminology used in our previous survey.26 Exact graph matching is the search for a mapping between the nodes of two graphs which is edge-preserving, in the sense that if two nodes in the first graph are linked by an edge, the corresponding nodes in the second graph must have an edge, too. Several variants of exact matching exist (e.g. isomorphism, subgraph isomorphism, monomorphism, homomorphism, maximum common subgraph) depending on whether this constraint must hold in both directions of the mapping or not, whether the mapping must be injective, and whether the mapping must be surjective.
More formally, given two graphs G1 = (V1, E1) and G2 = (V2, E2) (where V and E are the sets of nodes and edges, respectively), a mapping is a function μ : V1 → V2. A mapping μ is edge preserving iff:

\forall v, w \in V_1,\ (v, w) \in E_1 \Rightarrow (\mu(v), \mu(w)) \in E_2 \vee \mu(v) = \mu(w). \quad (1)

An edge-preserving mapping is also called a homomorphism. A monomorphism, also called an edge-induced subgraph isomorphism, is a homomorphism that is also injective:

\forall v \neq w \in V_1,\ \mu(v) \neq \mu(w). \quad (2)

A graph isomorphism is a monomorphism that is bijective, and whose inverse mapping μ⁻¹ is also a monomorphism:

\forall v_2 \in V_2,\ \exists v_1 = \mu^{-1}(v_2) \in V_1 : v_2 = \mu(v_1);\quad \mu^{-1} \text{ is a monomorphism.} \quad (3)
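As a concrete illustration, the edge-preservation and injectivity conditions of Eqs. (1) and (2) can be checked directly for a given mapping. The following Python sketch uses invented toy graphs and mappings (not from the survey) to test whether a mapping is a homomorphism or a monomorphism.

```python
# Directed graphs given as a node set and an edge set of (v, w) pairs;
# the mapping mu is a dict from nodes of G1 to nodes of G2.
def is_homomorphism(V1, E1, E2, mu):
    # Edge-preserving as in Eq. (1): every edge of G1 maps to an edge
    # of G2, or collapses onto a single node of G2.
    return all((mu[v], mu[w]) in E2 or mu[v] == mu[w] for (v, w) in E1)

def is_monomorphism(V1, E1, E2, mu):
    # A homomorphism that is also injective, as in Eq. (2).
    injective = len({mu[v] for v in V1}) == len(V1)
    return injective and is_homomorphism(V1, E1, E2, mu)

V1, E1 = {1, 2}, {(1, 2)}
E2 = {("a", "b"), ("b", "c")}
print(is_homomorphism(V1, E1, E2, {1: "a", 2: "b"}))  # True
print(is_monomorphism(V1, E1, E2, {1: "a", 2: "b"}))  # True
print(is_homomorphism(V1, E1, E2, {1: "a", 2: "c"}))  # False: (a, c) is no edge
```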
Fig. 1. A graphical representation of the adopted categorization of the considered graph-based techniques. The techniques in the figure have been chosen because they either involve some kind of graph comparison, or use a graph-based approach to group objects into classes.
A mapping μ is a subgraph isomorphism, which some authors call a node-induced subgraph isomorphism, if there is a (node-induced) subgraph G'2 of G2 such that μ is an isomorphism between G1 and G'2. More formally:

\begin{cases} V'_2 \subseteq V_2,\ V'_2 = \{v_2 \in V_2 : \exists v_1 \in V_1 : v_2 = \mu(v_1)\} \\ E'_2 \subseteq E_2,\ E'_2 = E_2 \cap (V'_2 \times V'_2) \\ \mu \text{ is an isomorphism between } G_1 \text{ and } G'_2 = (V'_2, E'_2) \end{cases} \quad (4)
Finally, the maximum common subgraph problem is the search for the largest subgraph of G1 that is isomorphic to a subgraph of G2 (and usually, for the corresponding mapping between the two subgraphs).
In inexact graph matching, instead, the constraints on edge preservation are relaxed, either because the algorithms attempt to deal with errors in the input graphs (and so we have error-correcting matching) or because, to reduce the computational cost, they search for the mapping with a strategy that does not ensure the optimality of the found solutions (approximate or suboptimal matching).
For inexact matching, there is not a single formal statement of the problem; instead, different papers often use slightly different formalizations, which may lead to different ways of relaxing the edge preservation constraints. With no pretense at completeness, in the following we describe two formalizations that have been used by several works.
In the first definition, the concept of a mapping function is extended so as to include the possibility of mapping a node v to a special, null node denoted as ε; thus the mapping is a function μ : V1 → V2 ∪ {ε}. We will assume that μ is injective for the nodes of V1 not mapped to ε,

\forall v \neq w \in V_1,\ \mu(v) \neq \varepsilon \Rightarrow \mu(v) \neq \mu(w) \quad (5)

while allowing that several nodes may be mapped to ε. With a slightly improper notation, we will say that μ⁻¹(w) = ε to indicate that there is no node v ∈ V1 such that μ(v) = w.
Then, the cost of a mapping μ is defined as:

C(\mu) = \sum_{v \in V_1,\ \mu(v) \neq \varepsilon} C_R(v, \mu(v)) + \sum_{v \in V_1,\ \mu(v) = \varepsilon} C_D(v) + \sum_{w \in V_2,\ \mu^{-1}(w) = \varepsilon} C_D(w)
\quad + \sum_{(v,w) \in E_1,\ (\mu(v),\mu(w)) \in E_2} C'_R((v,w), (\mu(v),\mu(w)))
\quad + \sum_{(v,w) \in E_1,\ (\mu(v),\mu(w)) \notin E_2} C'_D(v,w) + \sum_{(v,w) \in E_2,\ (\mu^{-1}(v),\mu^{-1}(w)) \notin E_1} C'_D(v,w) \quad (6)

where C_R(·,·) is the cost for the replacement of a node, C_D is the cost for the deletion of a node, and C'_R and C'_D are the replacement and deletion costs for edges.
These cost functions are to be defined according to the application requirements, and usually take into account additional, application-dependent attributes that are attached to nodes and edges.
In this formulation, the matching problem is cast as the search for the matching μ that minimizes the cost C(μ). With an appropriate choice of the cost functions, it can be demonstrated that the exact matching problems defined previously can be seen as special cases of this one, with the additional requirement that the matching cost must be 0.
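To make the formulation concrete, the following sketch evaluates the cost C(μ) of Eq. (6) for tiny directed graphs. The cost functions passed in (trivial zero and unit costs here) are illustrative placeholders, since in practice they are application-dependent.

```python
EPS = None  # stands for the null node epsilon of the formulation above

def mapping_cost(V1, E1, V2, E2, mu, CR, CD, CRe, CDe):
    """Evaluate the cost C(mu) of Eq. (6) for small directed graphs."""
    inv = {mu[v]: v for v in V1 if mu[v] is not EPS}  # inverse of mu
    cost = sum(CR(v, mu[v]) for v in V1 if mu[v] is not EPS)  # node replacements
    cost += sum(CD(v) for v in V1 if mu[v] is EPS)            # deleted nodes of G1
    cost += sum(CD(w) for w in V2 if w not in inv)            # unmatched nodes of G2
    for (v, w) in E1:  # edges of G1: replaced if their image is in E2, else deleted
        if mu[v] is not EPS and mu[w] is not EPS and (mu[v], mu[w]) in E2:
            cost += CRe((v, w), (mu[v], mu[w]))
        else:
            cost += CDe((v, w))
    for (v, w) in E2:  # edges of G2 with no preimage in E1 are deleted
        if v not in inv or w not in inv or (inv[v], inv[w]) not in E1:
            cost += CDe((v, w))
    return cost

zero = lambda *args: 0.0  # placeholder replacement cost
unit = lambda *args: 1.0  # placeholder deletion cost
V1, E1 = {1, 2}, {(1, 2)}
V2, E2 = {"a", "b"}, {("a", "b")}
print(mapping_cost(V1, E1, V2, E2, {1: "a", 2: "b"}, zero, unit, zero, unit))  # 0.0
print(mapping_cost(V1, E1, V2, E2, {1: "a", 2: EPS}, zero, unit, zero, unit))  # 4.0
```

With the second mapping, the cost 4.0 counts the deletion of node 2, the insertion of node "b", and the deletion of one edge on each side.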
In the second formulation, called weighted graph matching, the graphs are represented through their adjacency matrices; usually the elements of the matrices are not restricted to 0 and 1, but can express a continuous weight for the relation between two nodes: the generic element A_ij of the matrix A is 0 if there is no edge between nodes i and j, and otherwise has a real value in (0, 1] denoting the weight of the edge (i, j).
Given two graphs represented by their adjacency matrices A and B, a compatibility tensor C_ijkl is introduced to measure the compatibility between two edges:

C_{ijkl} = \begin{cases} 0 & \text{if } A_{ij} = 0 \text{ or } B_{kl} = 0 \\ c(A_{ij}, B_{kl}) & \text{otherwise} \end{cases} \quad (7)

where c(·,·) is a suitably defined compatibility function. The matching is represented by a matching matrix M, whose elements M_ik are 1 if node i of the first graph is matched with node k of the second graph, and 0 otherwise. Thus the matching problem is formulated as the search for the matrix M that maximizes the following function:

W(M) = \sum_i \sum_j \sum_k \sum_l M_{ik}\, M_{jl}\, C_{ijkl} \quad (8)

subject to the constraints:

M_{ik} \in \{0, 1\}\ \forall i, k; \qquad \sum_k M_{ik} \leq 1\ \forall i; \qquad \sum_i M_{ik} \leq 1\ \forall k. \quad (9)
Also with this formulation, it can be demonstrated that, with a suitable choice of the compatibility function c(·,·), the various forms of exact matching can be seen as special cases.
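The objective of Eqs. (7) and (8) can be evaluated directly for small weighted adjacency matrices. In this illustrative sketch the compatibility function c(·,·) is an invented placeholder that rewards similar edge weights; it is not taken from any of the cited works.

```python
def compatibility(a, b):
    # Placeholder c(., .): highest when the two edge weights agree.
    return 1.0 - abs(a - b)

def score(A, B, M):
    """Evaluate W(M) of Eq. (8), with C_ijkl built on the fly via Eq. (7)."""
    n, m = len(A), len(B)
    total = 0.0
    for i in range(n):
        for j in range(n):
            for k in range(m):
                for l in range(m):
                    if A[i][j] == 0 or B[k][l] == 0:
                        continue  # C_ijkl = 0 by Eq. (7)
                    total += M[i][k] * M[j][l] * compatibility(A[i][j], B[k][l])
    return total

A = [[0, 1.0], [1.0, 0]]    # one edge of weight 1.0
B = [[0, 0.8], [0.8, 0]]    # one edge of weight 0.8
M = [[1, 0], [0, 1]]        # identity matching
print(score(A, B, M))       # ~1.6: the edge pair contributes 0.8 in each direction
```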
While in the years covered by Ref. 26 the research explored both exact and inexact matching, the recent work on graphs in the Pattern Recognition community has been mostly focused on inexact graph matching. This may be due to the fact that today Pattern Recognition research is applying graphs to more complex problems than those that were feasible some years ago, so larger and noisier graphs are used more frequently.
2.1. Exact matching
While there has been little work overall on improving existing exact matching algorithms, some effort has been put into providing a better characterization of the existing methods. As an example, the 2003 paper by De Santo et al.138 presents an extensive comparative evaluation of four exact algorithms for graph isomorphism and graph-subgraph isomorphism.
Most existing exact matching algorithms are based on some form of tree search, where the matching is constructed starting with an empty mapping function and adding a pair of nodes at a time, usually with the possibility of backtracking, and the use of heuristics to avoid the complete exploration of the space of all the possible matchings. In 2007, Konc and Janežič87 propose MaxCliqueDyn, an improved algorithm for finding the Maximum Clique (and hence the Maximum Common Subgraph) which uses branch and bound, combined with approximate graph coloring for finding tight bounds in order to prune the search space. In a 2011 paper, Ullmann160 presents a substantial improvement of his own very well-known subgraph isomorphism algorithm from 1976. The new algorithm incorporates several ideas from the literature on the Binary Constraint Satisfaction Problem, of which subgraph isomorphism can be considered a special case. Also Zampelli et al.179 propose a method based on Constraint Satisfaction, which is an extension of the technique introduced by Larrosa and Valiente in 2002.90 A further development of the technique, with the introduction of a better filtering based on the AllDifferent constraint, is proposed by Solnon152 in 2010.
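The tree-search scheme described above can be sketched in a few lines: a bare-bones backtracking search for a monomorphism that grows the mapping one node pair at a time and undoes the last assignment when edge preservation fails. This is only the skeleton shared by the cited algorithms; it has none of their pruning heuristics.

```python
def find_monomorphism(V1, E1, V2, E2):
    """Backtracking tree search for a monomorphism of G1 into G2."""
    V1 = list(V1)

    def consistent(mu, v, w):
        # Monomorphism constraint: every edge of G1 between already-mapped
        # nodes and the candidate v must have a corresponding edge in G2.
        for u in mu:
            if (u, v) in E1 and (mu[u], w) not in E2:
                return False
            if (v, u) in E1 and (w, mu[u]) not in E2:
                return False
        return True

    def search(mu, used):
        if len(mu) == len(V1):
            return dict(mu)          # complete mapping found
        v = V1[len(mu)]              # next node of G1 to map
        for w in V2:
            if w not in used and consistent(mu, v, w):
                mu[v] = w            # extend the partial mapping...
                used.add(w)
                result = search(mu, used)
                if result:
                    return result
                del mu[v]            # ...and backtrack on failure
                used.discard(w)
        return None

    return search({}, set())

# Does a directed path 1 -> 2 embed into the triangle a -> b -> c -> a?
triangle = {("a", "b"), ("b", "c"), ("c", "a")}
m = find_monomorphism({1, 2}, {(1, 2)}, {"a", "b", "c"}, triangle)
print(m)
```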
Among the approaches not based on tree search, we can mention Gori et al.,64 who, in their 2005 paper, propose an isomorphism algorithm based on Random Walks that works only on a class of graphs denoted by the authors as Markovian Spectrally Distinguishable graphs; the authors verify experimentally on a large database of graphs that, as long as the graphs have some kind of irregularity or randomness, the probability of not satisfying this assumption is very low. The 2011 paper by Weber et al.169 extends the matching algorithm based on the construction of a decision tree by Messmer and Bunke,109 significantly reducing the spatial complexity for graphs whose nodes have a small number of different labels. In their 2004 paper,39 Dickinson et al. discuss the matching problem (graph isomorphism, graph-subgraph isomorphism and maximum common subgraph) for the special case of graphs having unique node labels. Finally, the 2012 paper by Dahm et al.34 presents a technique for speeding up existing exact subgraph isomorphism algorithms on large graphs.
2.2. Inexact matching
Inexact matching methods have received comparatively more attention in the research community, both by extending existing techniques and by introducing novel ideas unrelated to previous work. In particular, the extensions of previous methods have mostly concerned algorithms based on the reduction of graph matching to a continuous optimization problem, algorithms based on spectral properties of the graphs (i.e. properties related to the eigenvalues and eigenvectors of the adjacency matrix or of other matrices characterizing the graph structure), and methods approximating the solution of the graph matching problem by means of bipartite graph matching, which is a simpler problem solvable in polynomial time.
Many inexact matching algorithms are formulated as an approximate way to compute the GED. A recent paper by Gao et al.57 in 2010 presents a survey on this topic. GED computes the distance between two graphs on the basis of the minimal set of edit operations (e.g. node additions and deletions) needed to transform one graph into the other. A 2012 paper by Sole-Ribalta et al.151 provides a theoretical discussion on the relation between the properties of the distance function and the costs assigned to each edit operation.
Although in principle the GED problem is not related to matching, in practice most methods compute the distance by finding a matching for the nodes that are preserved by the edit operations (i.e. those that are not added or removed, but possibly have their label changed); given this matching, the edit distance can be obtained as the sum of a term accounting for the matched nodes and their edges, and a term accounting for the remaining nodes/edges (see Eq. (6)). So usually the outcome of the algorithm is not only an indication of the distance between the graphs, but also the matching that is supposed to minimize the value of this distance. This is why we have chosen to include some GED methods in this section.
2.2.1. Techniques based on tree search
Methods based on tree search have also been used for inexact matching. In this case, the adopted heuristics may not ensure that the optimal solution is found, yielding a suboptimal matching. As an example, Sanfeliu et al.135,136 and Serratosa et al.142 extend their previous work on inexact matching of Function-Described Graphs (FDG), which are Attributed Relational Graphs enriched with constraints on the joint probabilities of nodes and edges, used to represent a set of graphs, while in Ref. 141, Serratosa et al. detail how these FDG can be automatically constructed. Cook et al.29 in 2003 propose the use of beam search, a heuristic search method derived from the A* algorithm, for computing the GED. The paper by Hidović and Pelillo73 in 2004 extends the definition of a graph metric based on Maximum Common Subgraph, introduced by Bunke in 1999, so that it can also be applied to graphs with node attributes.
2.2.2. Continuous optimization
While graph matching is inherently a discrete optimization problem, several inexact algorithms have been proposed that reformulate it as a continuous problem (by relaxing some constraints), solve the continuous problem using one of the many available optimization algorithms, and then recast the found solution in the discrete domain. Usually the algorithm used for the continuous problem only ensures that a local optimum is found; moreover, since a discretization step is required afterwards, the matching found is not even guaranteed to be locally optimal.
An example of evolution of an existing matching method of this category is given by the 2003 paper by Massaro and Pelillo,107 which improves a previous work on the search for Maximum Common Subgraphs that uses a theorem by Bomze to reformulate this problem as a quadratic optimization in a continuous domain. Zaslavskiy et al.182 in their 2009 paper present a graph matching algorithm in which the matching is formulated as a convex-concave programming problem, which is solved by interpolating between two approximate simpler formulations. Also the 2011 paper by Rota Bulò et al.134 is based on the same formulation of graph matching; in this case the authors solve the quadratic optimization problem using infection-immunization dynamics, a new iterative algorithm based on evolutionary game theory. The 2002 paper by van Wyk et al.164 addresses the problem of Attributed Graph matching as a parameter identification problem, and proposes the use of a Reproducing Kernel Hilbert Space (RKHS) interpolator to solve it. The 2003 paper by van Wyk and van Wyk161 extends the previous method by providing a more general formulation of the problem. The same authors in a 2004 paper163 further generalize the method, presenting a kernel-based framework for graph matching which includes the previous two algorithms as special cases. In 2004, van Wyk and van Wyk162 present a graph matching algorithm based on the Projections Onto Convex Sets approach. The 2006 paper by Justice and Hero81 proposes a reformulation of the GED as a Binary Linear Programming problem, for which they provide upper and lower bounds in polynomial time. Kostin et al.89 in 2005 present an extension of the probabilistic relaxation algorithm by Christmas et al.25 Chevalier et al.22 propose in a 2007 paper a technique that integrates probabilistic relaxation with bipartite graph matching, applied to Region Adjacency Graphs. In their 2008 paper,157 Torresani et al. introduce an algorithm based on a technique called dual decomposition: the matching problem (in a continuous reformulation) is decomposed into a set of simpler problems, depending on a parameter vector; the simpler problems can be solved to provide a lower bound to the minimization of the functional to be optimized. The algorithm then searches for the tightest bound by varying the parameter vector. Caetano et al.19 propose in 2009 a technique in which the functional to be optimized has a parametric form, and the authors propose a training phase to learn these parameters. In a 2011 paper,21 Chang and Kimia present an extension of the Graduated Assignment Graph Matching by Gold and Rangarajan, modified so as to work on hypergraphs instead of graphs. Zhou and De la Torre184 present a method called factorized graph matching, in which the affinity matrix used to define the functional to be optimized is factored into a Kronecker product of smaller matrices, separately encoding the structure of the graphs and the affinities between nodes and between edges. The authors propose an optimization method based on this factorization that leads to an improvement in space and time requirements.
Sanromà et al.137 in 2012 propose a special-purpose, probabilistic graph matching method for graphs representing sets of 2D points, based on the Expectation Maximization (EM) algorithm.
Sole-Ribalta and Serratosa149 in their 2011 paper propose two suboptimal algorithms for the common labeling problem, a generalization of inexact graph matching in which the number of graphs is larger than two (the problem cannot be reduced to several pairwise matchings). The first proposed algorithm uses an extension of Graduated Assignment, while the second is based on a probabilistic formulation and adopts an iterative approach somewhat similar to Probabilistic Relaxation. A 2011 paper by Rodenas et al.131 presents a parallelized version of the first algorithm. A 2013 paper by Sole-Ribalta and Serratosa150 presents a further development of the first algorithm, based on the matching of the nodes of all graphs to a virtual node set.
2.2.3. Spectral methods
Spectral matching methods are based on the observation that the eigenvalues of a matrix do not change if its rows and columns are permuted. Thus, the matrix representations of two isomorphic graphs (for instance, their adjacency matrices) have the same eigenvalues. The converse is not true; so, spectral methods are inexact in the sense that they do not ensure the optimality of the solution found.
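This permutation-invariance can be illustrated without an eigensolver: the trace of A^k equals the sum of the k-th powers of the eigenvalues, so equal power traces are a necessary (but not sufficient) condition for isomorphism. A stdlib-only sketch, not taken from any of the cited works:

```python
def matmul(A, B):
    # Plain matrix product for small square integer matrices.
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def power_traces(A, kmax):
    """Return [tr(A), tr(A^2), ..., tr(A^kmax)], a spectral invariant."""
    n = len(A)
    traces, P = [], [row[:] for row in A]
    for _ in range(kmax):
        traces.append(sum(P[i][i] for i in range(n)))
        P = matmul(P, A)
    return traces

# A 4-cycle, the same cycle with its nodes relabelled, and a 4-node path.
C4      = [[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0]]
C4_perm = [[0, 0, 1, 1], [0, 0, 1, 1], [1, 1, 0, 0], [1, 1, 0, 0]]
P4      = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
print(power_traces(C4, 4) == power_traces(C4_perm, 4))  # True: same spectrum
print(power_traces(C4, 4) == power_traces(P4, 4))       # False: rules out isomorphism
```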
Caelli and Kosinov16,18 in 2004 propose a matching algorithm that uses the graph eigenvectors to define a vector space onto which the graph nodes are projected; a clustering algorithm in this vector space is used to find possible matches. Also Robles-Kelly and Hancock,130 in a 2007 paper, propose the embedding of graph nodes into a different space (a Riemannian manifold) using spectral properties. The 2004 and 2005 papers by Robles-Kelly and Hancock128,129 present a graph matching approach based on Spectral Seriation of graphs: the adjacency matrix is transformed into a sequence using spectral properties, then the matching is performed by computing the String Edit Distance between these sequences. Cour et al.31 in 2007 propose a spectral matching method called balanced graph matching, using a novel relaxation scheme that naturally incorporates matching constraints. The authors also introduce a normalization technique that can be used to improve several other algorithms such as the classical Graduated Assignment Graph Matching by Gold and Rangarajan. Cho et al.23 propose a reformulation of inexact graph matching as a random walk problem, and show that this formalization provides a theoretical interpretation both of spectral methods and of some other techniques based on continuous optimization; in this framework, the authors present an original algorithm based on techniques commonly used for Web ranking.
In a 2006 paper, Qiu and Hancock118 present an approximate, hierarchical method for graph matching that uses spectral properties to partition each graph into nonoverlapping subgraphs, which are then matched separately, with a significant reduction of the matching time. The same authors present a somewhat similar idea in
a 2007 paper,119 where the partition is based on commute times, which can be computed from the Laplacian spectrum of the graph. Wilson and Zhu171 in their 2008 paper present a survey of different techniques for the spectral representation of graphs and trees. In 2011, Escolano et al.45 propose a matching method based on the representation of a graph as a bag of partial node coverages, described using spectral features. In 2011, Duchenne et al.40 present a generalization of spectral matching techniques to hypergraphs, using some results from tensor algebra.
2.2.4. Other approaches
Among other techniques used for inexact matching, Bagdanov and
Worring2 in a
2003 paper introduce a matching algorithm based on bipartite
matching for the so-
called First Order Gaussian Graphs (FOGG ), which are an
extension of random
graphs having Gaussian random variables as their node
attributes. Also the paper by
Skomorowski147 in 2007 presents a pattern recognition algorithm
based on a variant
of random graphs, using for the matching a syntactic approach
based on graph
grammars. The 2003 paper by Park et al.116 addresses the problem
of partial
matching between a model graph and a larger image graph by
combining a proba-
bilistic formulation similar to the one used in probabilistic
relaxation with a greedy
search technique. In a 2006 paper, Conte et al.27 present an
inexact matching
technique for pyramidal graph structures, which is based on
weighted bipartite graph
matching, but use information from the upper levels of a pyramid
to constrain the
matching of the lower levels. Xiao et al.174 in 2008 propose a
graph distance based on
a vector representation called Substructure Abundance Vector
(SAV), that can be
considered as an extension of the graph distance based on
Maximum Common
Subgraph (MCS). The paper by Auwatanamongkol1 in 2007 proposes a
genetic
algorithm for a special case of inexact matching, where the
nodes are associated to 2D
points. Bourbakis et al.9 in 2007 introduce the so-called
Local-Global graphs (L-G
graphs), as an extension of Region Adjacency graphs in which the
edges are obtained
through a Delaunay triangulation, for which they introduce an
inexact, suboptimal
matching algorithm which is based on a greedy search. In 2002,
Wang et al.168
present a polynomial algorithm for the inexact graph-subgraph
matching for the
special case of undirected acyclic graphs. The 2004 paper by
Sebastian et al.140
presents a GED algorithm for the special case of shock graphs,
based on dynamic
programming. In their 2008 paper,4 Bai and Latecki propose an
inexact suboptimal
matching algorithm for skeleton graphs, based on the use of
bipartite graph
matching. Chowdury et al.24 in a 2009 paper combine weighted
bipartite graph
matching with the use of the automorphism groups for the cycles
contained in the
graph, to improve the accuracy of the matching found. A 2009
paper by Riesen and
Bunke125 proposes an approximation of GED with the use of
Bipartite Graph
Matching, solved using the Munkres' algorithm. The 2010 paper by
Kim et al.83
approximates the matching between Attributed Relational Graphs
using the nested
assignment problem: an inner assignment step is used to find the
best matching of the
adjacent edges; this information is then used to define a matching cost for the nodes,
and an outer assignment step finds the node matchings that minimize the sum of
these costs. This double application of the assignment problem is the original aspect
of this method, differentiating it, for instance, from the
heuristic proposed by Riesen
and Bunke. Also Raveaux et al.121 in 2010 present an approximate
algorithm based
on bipartite graph matching; in this case the aim is to compute
an approximation of
the GED, and the bipartite matching is performed between small
subgraphs of each
of the two graphs. In 2011, Fankhauser et al.46 present an
algorithm for computing
the GED using bipartite graph matching, solved using the
algorithm by Volgenant
and Jonker. The same authors in 201247 propose a suboptimal technique for graph
isomorphism, also based on bipartite graph matching. The algorithm has the
distinctive feature that it either finds an exact solution or rejects the pair of
graphs; thus
a slower algorithm can be used for the cases not covered by the
proposed method.
Tang et al.155 in 2011 propose a graph matching algorithm based
on the Dot
Product Representation of Graphs (DPRG) proposed by Scheinerman
and Tucker,139
which represents each node using a numeric vector chosen so that
each edge value
corresponds approximately to the dot product of the nodes
connected by the edge; the
choice of the node vectors is formulated as a continuous
optimization problem. The
proposed method is extended in a 2012 paper by the same
authors.156 The 2011 paper
by Macrini et al.104 proposes a matching algorithm for bone
graphs, which are a
representation for 3D shapes, using weighted bipartite graph
matching. The paper by
Jiang et al.78 in 2011 presents a technique for inexact subgraph
isomorphism based on
geometric hashing, requiring very little computational cost for
the intended use case
of searching for several small input graphs within a large
reference graph.
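Several of the methods above reduce GED approximation to a single bipartite assignment between the two node sets, solved with a Munkres-style algorithm. A minimal sketch of this idea, considering node labels only (the published methods also fold the cost of the local edge structures into the matrix entries), could look like the following; all names and default costs here are illustrative:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment  # Munkres-style solver

BIG = 1e9  # stands in for "forbidden" entries (the solver rejects np.inf)

def approx_ged(labels1, labels2, sub_cost=1.0, indel_cost=1.0):
    """Approximate graph edit distance via one node-assignment problem.

    Builds the (n+m)x(n+m) cost matrix typical of bipartite GED
    approximations: substitutions in the top-left block, deletions and
    insertions on the diagonals of the off-diagonal blocks."""
    n, m = len(labels1), len(labels2)
    cost = np.zeros((n + m, n + m))
    for i, a in enumerate(labels1):
        for j, b in enumerate(labels2):
            cost[i, j] = 0.0 if a == b else sub_cost
    cost[:n, m:] = BIG
    cost[n:, :m] = BIG
    np.fill_diagonal(cost[:n, m:], indel_cost)  # node deletions
    np.fill_diagonal(cost[n:, :m], indel_cost)  # node insertions
    rows, cols = linear_sum_assignment(cost)
    return float(cost[rows, cols].sum())
```

The cost of the optimal assignment is an upper bound on the true edit distance, which is why these methods are suboptimal but fast.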
A novel optimization technique, Estimation of Distributions
Algorithms (EDA),
has been successfully used for inexact graph matching. EDA are
somewhat similar to
genetic/evolutionary algorithms, but the parameters of each
tentative solution are
considered as random variables; a stochastic sampling process is
used to produce the
next generation.
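As a concrete illustration of the scheme just described, a minimal UMDA-style sketch follows; the fitness function, population sizes, and smoothing constant are all illustrative assumptions, not taken from the cited papers:

```python
import numpy as np

def eda_match(fitness, n1, n2, pop=50, top=10, iters=30, rng=None):
    """Toy EDA for inexact matching: each node of the first graph is a
    random variable whose values are the nodes of the second graph.
    `fitness` scores a candidate assignment (higher is better)."""
    rng = rng or np.random.default_rng(0)
    # One categorical distribution per node of the first graph.
    probs = np.full((n1, n2), 1.0 / n2)
    best, best_score = None, -np.inf
    for _ in range(iters):
        # Stochastic sampling of a population from the current model.
        population = np.stack([
            [rng.choice(n2, p=probs[i]) for i in range(n1)]
            for _ in range(pop)])
        scores = np.array([fitness(ind) for ind in population])
        elite = population[np.argsort(scores)[-top:]]
        if scores.max() > best_score:
            best_score = scores.max()
            best = population[scores.argmax()].copy()
        # Re-estimate each variable's distribution from the elite only.
        for i in range(n1):
            counts = np.bincount(elite[:, i], minlength=n2) + 1e-3
            probs[i] = counts / counts.sum()
    return best, best_score
```

Unlike a genetic algorithm, no crossover or mutation is applied: the next generation is drawn directly from the re-estimated distributions.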
The paper by Bengoetxea et al.5 in 2002 proposes the use of EDA
for inexact,
suboptimal graph matching, by associating each node of the first
graph to a random
variable whose possible values are the nodes of the second
graph. In 2005, Cesar et al.80
formulate the inexact graph homomorphism as a discrete
optimization problem, and
compare beam search, genetic algorithms and EDA for solving this
problem. A different
approach, also based on a probabilistic framework, is proposed
by Caelli and Caetano
in 200517; the matching is formulated as an inference problem on
a Hidden Markov
Random Field (HMRF), for which an approximate solution is
computed.
The 2004 paper by Dickinson et al.38 defines a graph similarity
measure for the
special case of graphs having unique node labels, and proposes a
hierarchical algorithm to efficiently compute this measure. He et al.72 in 2004
propose an ad hoc
matching algorithm for skeleton graphs, that performs a
linearization of the graphs,
and then uses string matching to find an inexact correspondence. A
similar approach
is presented in the paper by Das et al.35 in 2012 for graphs
obtained from fingerprints.
In 2008, Gao et al.56 introduce a Graph Distance algorithm for
the special case of
graphs whose nodes represent points in a 2D space, based on the
Earth Mover's
Distance (EMD). The 2009 paper by Emms et al.44 presents an
original approach to
graph matching based on quantum computing, that uses the
inherent parallelism of
some quantum physics phenomena if run on a (hypothetical)
quantum computer.
3. Other Problems
In this section, we will present some recent developments on
graph problems that are
not, in a strict sense, forms of graph matching, but are related
to matching either
because they provide a way of comparing two graphs (this is the
case for graph
embeddings and graph kernels), or because they use a graph-based
approach to group
input patterns into classes (in an unsupervised way for graph
clustering, and in a
supervised or semi-supervised way for graph learning). We also
mention some works
on other graph-related problems which are of specific interest as
Pattern Recognition
basic tools, such as dimensionality reduction.
Graph embeddings and graph kernels are perhaps the most
significant novelty in
graph-based Pattern Recognition in recent years. Although
seminal works on
these fields were already present in earlier literature, it is in
the last decade that these
techniques have gained popularity in the Pattern Recognition
community. Gaertner
et al.55 present an early survey on kernels applied to
nonvectorial data. Bunke
et al.12 in 2005 present a survey of graph kernels and other
graph-related techniques.
Bunke and Riesen14 in their 2011 paper propose a useful review
on the topic of graph
kernels and graph embeddings; the same authors in 201215 extend
this review and
present these techniques as a way to unify the statistical and
structural approaches
in Pattern Recognition. Please note that, although it may seem
that graph embeddings and kernels could help reduce the computational
complexity of graph comparison, many of the proposed algorithms have a cost that is
equal to or higher than that of
traditional matching methods (for instance, some embedding
methods require
computing the GED, while others involve a cost that is related
to the number of
graphs in the considered set). The main benefit of the novel
techniques is instead in
the availability of the large corpus of theoretically sound
techniques from statistical
Pattern Recognition.
3.1. Graph embeddings
In the literature the term Graph embedding is used with two
slightly different
meanings:
- a technique that maps the nodes of a graph onto points in a
vector space, in such a
way that nodes having similar structural properties (e.g. the
structure of their
neighborhood) will be mapped onto points which are close in this
space;
- a technique that maps whole graphs onto points in a vector
space, in such a way
that similar graphs are mapped onto close points (see Fig.
2).
References 16, 18, 45 and 130, discussed previously in Sec. 2.2,
are examples of
the first kind; also, the Dot Product Representation of Graphs139
mentioned in
Sec. 2.2 belongs to this category. Yan et al.175 show in their
2007 paper that most
commonly used dimensionality reduction techniques can be
formulated as a graph
embedding algorithm of this kind. Their work is the basis for an
embedding tech-
nique proposed by You et al.,177 called General Solution for
Supervised Graph
Embedding (GSSGE).
In the following subsections, we will mainly concentrate on the
second kind of
graph embedding, presenting the relevant methods categorized
according to the
main properties they attempt to preserve in the mapping.
3.1.1. Isometric embeddings
Methods in this category start from a distance or similarity
measure between graphs,
and attempt to find a mapping to vectors that preserves this
measure.
Bonabeau,6 in a 2002 paper, proposes a technique based on a
Self-Organizing Map
(SOM), an unsupervised neural network adopting competitive
learning, in order to
map graphs onto a bidimensional plane. Although the term
embedding is not explicitly used, it can be considered a form of graph embedding.
The mapping found by
the network is used both as an aid for the visualization of the
data represented by the
graphs, and for clustering.
Also the 2003 paper by de Mauro et al.36 uses a Neural Network
for graph
embedding. In particular, the proposed method works on directed
acyclic graphs, and
uses a Recursive Neural Network. The network is trained by
similarity learning: the
training set is made of pairs of graphs which have been manually
labeled with a
Fig. 2. Graph Embedding: the mapping between graphs and points
in a vector space is represented by the graph name.
similarity value, and the network aims to produce an output
vector for each graph so
that the Euclidean distance between vectors is consistent with
the similarity between
the corresponding graphs.
A recent paper by Jouili and Tabbone79 proposes a graph
embedding technique
based on constant shift embedding, a framework proposed for the
embedding of
nonmetric spaces, mainly applied to clustering problems.
3.1.2. Spectral embeddings
The embedding algorithms in this subsection are based on the
exploitation of spec-
tral properties of graphs, i.e. properties related to the
eigenvalues and eigenvectors of
matrices representing the graphs, such as the adjacency matrix.
Since spectral
properties are invariant with respect to node permutations, they
ensure that graphs
with an isomorphic structure will be mapped to the same
vectors.
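One of the simplest instances of this idea, shown here only as a sketch and not as any specific cited method, represents a graph by the sorted spectrum of its Laplacian, which is invariant to node relabelings:

```python
import numpy as np

def spectral_embedding(adj, k=4):
    """Embed a graph as the k smallest eigenvalues of its Laplacian.
    Since the spectrum is invariant to node permutations, isomorphic
    graphs map to identical vectors (the converse does not hold:
    cospectral non-isomorphic graphs exist)."""
    adj = np.asarray(adj, dtype=float)
    degree = np.diag(adj.sum(axis=1))
    laplacian = degree - adj
    eigvals = np.sort(np.linalg.eigvalsh(laplacian))
    # Pad with zeros so graphs of different sizes share a fixed length.
    out = np.zeros(k)
    n = min(k, len(eigvals))
    out[:n] = eigvals[:n]
    return out
```

The methods surveyed below enrich this basic scheme with eigenmode features, polynomial coefficients, or heat-kernel invariants.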
Luo et al.101 in a 2003 paper propose the use of spectral
features for graph
embedding; in particular, they decompose the adjacency matrix of
a graph into its
principal eigenmodes, and then compute from them a vector of
numerical features
(e.g. eigenmode volume, eigenmode perimeter, inter-eigenmode
distances, etc.).
Also the 2005 paper by Wilson et al.170 uses spectral properties
to define a graph
embedding; in this case, the authors derive a set of polynomials
from the spectral
decomposition of the Laplacian of the adjacency matrix, and use
the coefficients of
these polynomials as feature vectors.
Also the 2009 paper by Xiao et al.172 proposes a graph embedding
based on
spectral properties; in particular the method uses the heat
kernel, i.e. the solution of
the heat equation on the graph, to obtain a set of invariant
properties used to derive
a vector representation of the graph.
Xiao et al.173 in a 2011 paper present an embedding for
hierarchical graphs,
obtained by a hierarchical segmentation of images. Spectral
features are computed
on the levels of the hierarchy, obtaining a fixed-size feature
vector.
3.1.3. Subpattern embeddings
These methods are based on the detection, or the enumeration, of
specific types of
subpatterns within the graphs to be embedded.
Torsello and Hancock158 in 2007 propose an embedding algorithm
for trees. The
algorithm requires that all the trees to be embedded are known
in advance. The
embedding is based on the construction of a Union Tree, which is
a directed, acyclic
graph having all the considered trees as subgraphs; then each
tree is represented by a
vector that encodes which nodes of the Union Tree are used by
the tree.
Czech33 proposes in a 2011 paper an embedding method based on
B-matrices,
which are a structure based on the path lengths between the
nodes of a graph and are
invariant with respect to node permutations.
A recent paper by Luqman et al.103 presents a fuzzy multilevel
embedding tech-
nique, that combines structural information of the graph and
information from the
graph attributes using fuzzy histograms. The method uses an
unsupervised learning
phase to find the fuzzy intervals used in the representation.
In a 2011 paper, Gibert et al.60 present a graph embedding based
on graphs of
words, which are an extension of the popular bag of words
approach. The method
assumes that the graphs are obtained from images, with nodes
corresponding to
salient points, and node attributes corresponding to visual
descriptors of the points.
The method performs a quantization of the attribute space,
constructing a codebook.
This codebook is used to produce an intermediate graph, called
graph of words,
whose nodes are the codebook values, and whose edges correspond
to the adjacency
in the original graph of nodes mapped to those codebook values.
The nodes and edges
of the intermediate graph are labeled with the counts of the
corresponding nodes/
edges of the original graph; then an histogram of these counts
is used as the
embedding. Two 2012 papers by the same authors further develop
this method: in
Ref. 62 the authors add a more sophisticated procedure for
constructing the code-
book, while in Ref. 61 they use a large set of features and
apply a feature selection
algorithm to determine the most significant ones. The same
authors, in a 2013
paper,63 propose a somewhat similar embedding technique, that
removes the
assumptions that the graphs are obtained from images, and
exploits also edge
attributes if they are present.
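A bare-bones sketch of the graph-of-words construction described above can help fix the idea; it omits the fuzzy/soft assignments and the codebook-learning procedures of the cited papers, and takes the codebook as given:

```python
import numpy as np

def graph_of_words(node_features, edges, codebook):
    """Quantize each node's feature vector to its nearest codebook
    entry, then count how often each codeword occurs (node histogram)
    and how often each pair of codewords is connected (edge histogram).
    The concatenated counts serve as the embedding vector."""
    codebook = np.asarray(codebook, dtype=float)
    words = [int(np.argmin(np.linalg.norm(
                 codebook - np.asarray(f, dtype=float), axis=1)))
             for f in node_features]
    k = len(codebook)
    node_hist = np.bincount(words, minlength=k)
    edge_hist = np.zeros((k, k), dtype=int)
    for i, j in edges:
        a, b = sorted((words[i], words[j]))   # undirected: canonical order
        edge_hist[a, b] += 1
    return np.concatenate([node_hist, edge_hist[np.triu_indices(k)]])
```

Two graphs can then be compared with any vector distance between their embeddings, regardless of their sizes.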
The 2010 paper by Richiardi et al.122 proposes two graph
embedding techniques
specifically tailored for graphs satisfying the following constraints:
the number of nodes
is fixed across the whole considered set of graphs, and a total
ordering is defined on the
set of nodes. The authors show that a graph embedding exploiting
these constraints
can outperform a more general one.
3.1.4. Prototype-based embeddings
These embedding methods assume that a set of prototype graphs is
available, and
the mapping of a graph onto a vector space is based on the
distances (obtained
according to a suitably defined distance function) of the graph
from the prototypes.
This technique can be seen as a special case of the
dissimilarity representations
introduced by Pekalska and Duin.117
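The basic dissimilarity-space mapping is simple enough to sketch directly, together with one plausible greedy prototype-selection strategy; the selection shown here is illustrative and not necessarily one of the strategies evaluated in the cited papers, and the distance function (GED in the cited work) is left as a parameter:

```python
def select_prototypes(training, distance, m):
    """Greedy 'spanning'-style selection: start from the set median and
    repeatedly add the training graph farthest from the chosen set."""
    # Set median: the graph minimizing the sum of distances to all others.
    sums = [sum(distance(g, h) for h in training) for g in training]
    chosen = [min(range(len(training)), key=sums.__getitem__)]
    while len(chosen) < m:
        def gap(i):
            return min(distance(training[i], training[j]) for j in chosen)
        nxt = max((i for i in range(len(training)) if i not in chosen),
                  key=gap)
        chosen.append(nxt)
    return [training[i] for i in chosen]

def prototype_embedding(graph, prototypes, distance):
    """Map a graph to the vector of its distances from the prototypes."""
    return [distance(graph, p) for p in prototypes]
```

In the test below, integers with absolute difference stand in for graphs and a graph distance, purely for illustration.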
The first of these methods was proposed in 2007 by Riesen et
al.127 The
method has one prototype graph for each dimension of the vector
space; the corresponding component of the vector is simply defined as the GED
between the pro-
totype and the graph to be embedded. The authors discuss several
strategies for
choosing the prototypes from a training set, and evaluate them
by using the
embedding for several classification tasks. In the same year, a
paper by Riesen and
Bunke124 further develops this idea by proposing the use of
several sets of randomly
chosen prototypes, and combining the classifiers obtained for each of the
corresponding embeddings to form a Multiple Classifier System. The
advantage is that the
resulting classifier is more robust with respect to the risk of a
poor choice for the
prototypes. A 2009 paper by Lee and Duin93 explores a similar
idea, but instead of a
random selection of the prototypes, the proposed method creates
different base
classifiers by using node label information for extracting different
sets of subgraphs
from the training set. In 2010, Lee et al.94 propose a similar
method in which, instead
of extracting subgraphs, the node label information is used to
alter the training
graphs without changing their size.
In a 2009 paper, Riesen and Bunke123 present a Lipschitz
embedding for graphs.
Lipschitz embedding is usually employed to regularize vector
spaces, but in this case
it is proposed as a method to construct a graph embedding.
Basically, each com-
ponent of the vector representation of a graph is deduced from a
set of prototype
graphs; the value of the component is the mean distance (using
GED) with the
corresponding set of prototypes (a different aggregation function
than the mean
could be used). The sets of prototypes are constructed using a
clustering of a training
set, based on the K-Medoids clustering algorithm. The same authors in another 2009
authors in another 2009
paper126 propose a method for reducing the dimensionality of
this embedding, by
using Principal Component Analysis and Linear Discriminant
Analysis. Bunke and
Riesen13 in 2011 propose an extension to this technique, which
formulates the
problem of choosing the reference graphs as a feature selection problem:
a first embedding is
built using a large number of reference graphs; then a feature
selection algorithm is
applied to the obtained vectors in order to select the most
significant features, and
only the reference graphs corresponding to these features are
retained.
Also the 2012 paper by Borzeshi et al.8 addresses the problem of
selecting the
reference graphs for graph embedding. The authors present
several algorithms which
are based on a discriminative approach: they define several
objective functions to
measure how much the prototypes are able to discriminate between
classes, and
select the prototypes by a greedy optimization of these
functions.
3.2. Graph kernels
A graph kernel is a function that maps a pair of graphs onto a real number, and
has properties similar to those of the dot product defined on vectors. More
formally, if we denote with G the space of all the graphs, a graph kernel is a
function k such that:

    k : G \times G \to \mathbb{R},                                        (10)
    k(G_1, G_2) = k(G_2, G_1) \quad \forall G_1, G_2 \in G,               (11)
    \sum_{i=1}^{n} \sum_{j=1}^{n} c_i c_j k(G_i, G_j) \ge 0
        \quad \forall G_1, \ldots, G_n \in G,
        \; \forall c_1, \ldots, c_n \in \mathbb{R}.                       (12)

Equation (11) requires the function k to be symmetric, while Eq. (12) requires it to be
positive semi-definite.
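On any finite sample of graphs these two conditions can be checked empirically on the Gram matrix; the sketch below (a generic check, not from the surveyed papers) verifies symmetry directly and positive semi-definiteness via the eigenvalues:

```python
import numpy as np

def is_valid_kernel(kernel, graphs, tol=1e-9):
    """Build the Gram matrix on a finite set and verify symmetry
    (Eq. (11)) and positive semi-definiteness (Eq. (12))."""
    gram = np.array([[kernel(g1, g2) for g2 in graphs] for g1 in graphs])
    symmetric = np.allclose(gram, gram.T)
    # A symmetric matrix is PSD iff its smallest eigenvalue is >= 0
    # (up to numerical tolerance).
    psd = np.min(np.linalg.eigvalsh((gram + gram.T) / 2)) >= -tol
    return symmetric and psd
```

Passing the check on a sample does not prove validity on the whole space G, but failing it on any sample disproves it.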
Informally, a graph kernel can be considered as a measure of the
similarity be-
tween two graphs; however its formal properties allow a kernel
to replace the vector
dot product in several vector-based algorithms that use this
operator (and other
functions related to dot product, such as the Euclidean norm).
Among the many
Pattern Recognition techniques that can be adapted to graphs
using kernels, we
mention Support Vector Machine classifiers and Principal Component
Analysis.
Kernels have long been used to extend linear algorithms working on vector spaces
to the nonlinear case, thanks to Mercer's theorem: given a kernel
function defined on a compact Hausdorff space X, there is a vector
space V and a
mapping between X and V such that the value of the kernel
computed on two points
in X is equal to the dot product of the corresponding points in
V . Thus a kernel can
be seen as an implicit way of performing an embedding into a
vector space. Although
Mercer's theorem does not apply to graph kernels, in practice
these latter can be used
as a theoretically sound way to extend a vector algorithm to
graphs. Of course, the
performance of these algorithms strongly depends on the
appropriateness (with re-
spect to the task at hand) of the notion of similarity embodied
in the graph kernel.
In their 2003 paper, Kashima et al.82 specialize to the graph
domain the idea of
marginalized kernels, a probabilistic technique for defining a
kernel based on the
introduction of hidden variables. In this case, the hidden
variable is a sequence of
node indices, generated according to a random walk on one of the
graphs. Given a
value for the hidden variable, a kernel on sequences is computed
using the sequence
of visited nodes and edges; the marginalized kernel is obtained
by computing the
expected value (with respect to the joint distribution of the
hidden and visible
variables) of this sequence kernel. Mahé and Vert105 in 2009
extend this technique to
trees, and present an application to molecular data.
Borgwardt and Kriegel7 in 2005 present a graph kernel that is
based on paths,
instead of walks (a path is a walk without repeated nodes); in
order to avoid the
exponential cost of enumerating all the paths in a graph, the
authors propose a
scheme to use only the shortest path between any pair of nodes,
since the shortest
paths can be computed in polynomial time.
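A much-reduced sketch in the spirit of this kernel follows: it compares only histograms of shortest-path lengths, whereas the published kernel is more general (it can, for instance, also compare the labels of the path endpoints). All names here are illustrative:

```python
import numpy as np

def floyd_warshall(adj):
    """All-pairs shortest-path lengths for an unweighted graph given
    as an adjacency matrix (np.inf where no path exists)."""
    d = np.where(np.asarray(adj, dtype=float) > 0, 1.0, np.inf)
    np.fill_diagonal(d, 0.0)
    for k in range(len(d)):
        d = np.minimum(d, d[:, [k]] + d[[k], :])  # relax through node k
    return d

def shortest_path_kernel(adj1, adj2, max_len=10):
    """Count ordered node pairs at each shortest-path distance in both
    graphs and take the dot product of the two histograms."""
    def histogram(adj):
        d = floyd_warshall(adj)
        finite = d[np.isfinite(d) & (d > 0)].astype(int)
        return np.bincount(finite, minlength=max_len + 1)[:max_len + 1]
    return float(histogram(adj1) @ histogram(adj2))
```

The polynomial cost comes entirely from the all-pairs shortest-path computation, avoiding the exponential enumeration of general paths.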
Neuhaus and Bunke,112 in their 2006 paper, define three graph
kernels based on
GED. The first kernel requires the choice of a zero pattern, a
graph that, with respect
to the kernel, will behave similarly to a null vector. The
authors show that this kernel
fulfills the theoretical requirements of a kernel function, but its
practical performance
is strongly affected by the choice of the zero pattern. The
authors then introduce two
other kernels, obtained from the sum and the product of the first
kernel over a set of
zero patterns, and show that they have the same theoretical
properties, but are more
robust with respect to the choice of these patterns.
In their 2009 paper, Neuhaus et al.114 present three possible
ways to use GED in
the definition of a kernel. The first way is a diffusion kernel, which
turns an edit
distance matrix into a positive definite matrix satisfying the
kernel properties, but
has the inconvenience that the set of graphs to which it is
applied must be finite and
known a priori. The second way is a convolution kernel, which is
based on a de-
composition of the edit path between the two graphs into a
sequence of substitution
operations; given a kernel for individual substitutions, this
approach provides a
definition for a kernel between two graphs. The main drawback is
the exponential
complexity with respect to the number of nodes, for which the
authors suggest an
approximation. The third way is a random walk kernel, where the
GED is used to
define a fuzzy product graph, from which a kernel is obtained that
evaluates the local
similarity of corresponding parts of the two graphs.
The 2012 paper by Gaüzère et al.59 presents two graph kernels.
The first, called the Laplacian kernel, is based on the GED (approximated
using the algorithm by Riesen
and Bunke125). The product operation derived from the GED is not
guaranteed to be
positive definite, and so does not have the formal properties of a
kernel; thus the
authors propose a technique to obtain from the distance matrix a
positive definite
matrix, which is then used as the kernel. The second kernel,
called the treelet kernel,
is based on treelets, which are all the possible trees having
less than a fixed number of
nodes (in the papers, treelets up to six nodes are considered);
the kernel is computed
by counting the occurrences of each treelet in the graphs. This
kernel can only be
used for unattributed graphs, while the Laplacian kernel can
also be employed for
graphs having node and edge attributes. The same authors in Ref.
58 propose a
kernel that is also based on treelets, but instead of simply
counting their occurrences,
uses a treelet edit distance to compare the treelets in one
graph with those in the
other one, so as to be tolerant with respect to slight
deformations of the graphs.
Grenier et al.66 in their 2013 paper propose a different
treelet-based kernel, specifically devised for chemoinformatics
applications, that
also incorporates information
on the position of each treelet within the graph.
Shervashidze et al.,145 in their 2009 paper, present a kernel
based on the use of
graphlets, that is all the possible graphs having less than a
fixed number of nodes.
Also the graphlet kernel, like the previously mentioned treelet
kernel, has the limitation of being applicable only to unlabeled graphs. The paper
considers graphlets up
to five nodes, and proposes two different techniques to reduce the
computational cost
of finding all the occurrences of the graphlets in a large graph:
the first is a probabilistic
technique based on sampling, that replaces the exact number of
graphlets with an
estimate that is ensured to converge in probability to the true
value; the second
technique is applicable only to bounded-valence graphs, and it
is based on an efficient
algorithm for enumerating, on this kind of graphs, all the paths
up to a fixed length.
An extension of the idea of graphlet kernels is introduced by
Kondor et al. in
2009.88 The authors define a set of graph invariants, called the
graphlet spectrum,
based on the generalization of Fourier transforms over
permutation groups. The
kernel based on these invariants has the advantages of being
applicable to labeled
graphs, and of taking into account the position of the graphlets
within the larger
graph, and not only their frequency of occurrence.
Bai and Hancock, in a 2013 paper,3 define a novel kernel based on
the Jensen-Shannon divergence, which is an information-theoretic measure of
entropy. To
apply this measure to graphs, the authors derive from each graph
a probability
distribution, based on the random walks on the graphs. Rossi et
al. in their paper133
propose an evolution of this method, defining a kernel that is
similarly based on the
Jensen-Shannon divergence, but uses continuous-time quantum walks
instead of
classical random walks.
In a 2011 paper, Strug153 proposes a kernel specifically devised
for hierarchical
graphs, which is based on the combination of a tree kernel with
a classical graph
kernel.
Lozano and Escolano propose the use of graph kernels in a
slightly different
way, as an aid to improve the performance of other operations
on graphs. For
instance, in Ref. 99 they present a kernelized version of the classical
Graduated Assignment
Graph Matching algorithm by Gold and Rangarajan, yielding an
improvement in the
accuracy and the robustness to noise of the matching. In Ref.
100 the same authors
adopt a kernel for defining a graph-matching cost function, which
is then used for a
kernelized version of two existing matching algorithms; in this
paper the authors also
define a kernel-based algorithm for constructing a prototype graph
from a set of
graphs, using this technique for graph clustering.
A recent paper by Lee et al.92 investigates the different impact
of structural
information and graph attributes within a graph kernel, using a
kernel based on the
shortest paths, modified so as to have the possibility of changing
the relative weights
of the two kinds of information. The authors show experimentally
that these two
kinds are essentially different, and can reinforce each other. A
similar investigation, with the same conclusion, is carried out using a GED
for comparing the
graphs.
3.3. Graph clustering
The term Graph clustering is actually used in the literature
with two different and
unrelated meanings, which may both be of interest for
researchers working in the Pattern
Recognition field: in the first sense, graphs are used to represent
each of the objects to
be clustered, so the clustering is performed on a set of graphs
(see Fig. 3). In the second
sense, which is the most frequently encountered, a single graph
is used to represent the
structure of the space to which the objects belong, with a node
for each object, and
edges encoding the relationships between pairs of objects
(usually a similarity or a
distance measure is associated with each edge); in this case the
clustering is performed
by partitioning the set of nodes of the graph according to some
criterion (see Fig. 4). In
order to differentiate between the two meanings of the term, we
will speak of clustering
of graphs when referring to the first sense, and graph-based
clustering when referring
to the second one. This latter problem is related to graph-based
segmentation, which
is a wide field of research that is not included in this
survey.
3.3.1. Clustering of graphs
Regarding the clustering of graphs, Günter and Bunke69 in 2002
present an extension to graphs of Unsupervised Learning Vector
Quantization (LVQ). The algorithm
uses GED to evaluate the distance between an input graph and a
cluster proto-
type, and an original algorithm, also based on GED, that
computes the weighted
combination of two graphs (by determining the minimal set of
edit operations to
transform the first graph into the second, and then choosing a
subset of these
operations depending on the weight), which is used for updating
the winning pro-
totype. In 2003 the same authors70 propose an extension of this
method, by intro-
ducing a set of clustering validation indices to choose the
optimal number of LVQ
nodes.
Serratosa et al.141 propose an algorithm for the clustering of
graphs based on
Function-Described Graphs, which are Attributed Relational
Graphs extended with
Fig. 3. An example of graph clustering in the first meaning
(clustering of graphs): each of the objects to
be clustered is represented by a graph.
Fig. 4. An example of graph clustering in the second meaning
(graph-based clustering): the clustering is
performed by partitioning the set of nodes of a single
graph.
information about constraints on the joint probabilities of
nodes and edges. The
algorithm is based on an incremental, hierarchical clustering
strategy.
Also the 2011 paper by Jain and Obermayer77 presents a method
for the clus-
tering of graphs based on the Vector Quantization with the
k-Means algorithm. The
proposed algorithm uses an embedding of graphs into Riemannian
orbifolds, based
on GED, to perform the quantization. The authors present an
extensive discussion
of the theoretical properties of the proposed approach,
providing some necessary
conditions for optimality of the found clustering and for
statistical consistency;
the authors also discuss the impact of possible approximations
for reducing the
computational cost.
3.3.2. Graph-based clustering
Among the recent algorithms proposed for graph-based clustering,
the paper by
Guigues et al.68 in 2003 defines the so-called cocoons, which are
connected subgraphs
characterized by the fact that the maximum dissimilarity between
nodes within the
subgraph is less than the minimum dissimilarity between a node
within the subgraph
and an outside node. The authors demonstrate that the cocoons of
a graph form a
hierarchy, and define an algorithm for constructing this
hierarchy, that can be used
for a hierarchical clustering of the nodes of the graph. The
same authors in Ref. 67
present a different method for obtaining a hierarchical
representation, applied to
image representation and segmentation.
The 2006 paper by Bras Silva et al.10 proposes an algorithm that
is based on the
graph coloring problem. Graph coloring involves assigning labels
(called colors) to
the nodes of a graph so that adjacent nodes have different colors,
with the goal of
minimizing the total number of colors used. The proposed
clustering algorithm uses a
greedy coloring technique from the literature, and then uses the
resulting color as-
signment as an aid to decide how to aggregate the nodes into
clusters.
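The greedy coloring subroutine can be sketched as follows; this is the textbook greedy scheme, shown only to illustrate the coloring step, not the cluster-aggregation strategy of Bras Silva et al.:

```python
def greedy_coloring(adj):
    """adj: dict mapping each node to an iterable of its neighbors.
    Each node, in a fixed visiting order, receives the smallest color
    not used by its already-colored neighbors. Returns node -> color."""
    colors = {}
    for node in adj:
        used = {colors[nb] for nb in adj[node] if nb in colors}
        c = 0
        while c in used:      # smallest free color
            c += 1
        colors[node] = c
    return colors

# A 4-cycle is 2-colorable; adjacent nodes always get different colors.
adj = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}
coloring = greedy_coloring(adj)
assert all(coloring[u] != coloring[v] for u in adj for v in adj[u])
```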
Grady and Schwartz65 in 2006 present a graph-based clustering technique based on continuous optimization. The functional to be minimized is chosen so as to obtain a linear optimization problem, which can be solved with less computational cost and more numerical stability than other functionals used in graph-based clustering. However, the algorithm requires the choice of a ground node, which can affect the resulting partition; the authors propose a criterion for fixing this node, but warn that it might not yield the optimal performance, and so the method is best suited for applications where an interactive form of clustering is required, allowing the user to change the ground node until a satisfactory clustering is found. A recent paper by Couprie et al.30 presents a generalized energy functional, which is demonstrated to be equivalent, by choosing the appropriate parameter values, to several optimization-based techniques used for clustering and segmentation, such as graph cuts.
The 2006 paper by Fränti et al.54 proposes a graph-based technique, which uses an approximate nearest neighbor graph to speed up an agglomerative clustering algorithm.
Dhillon et al.37 in 2007 propose a multilevel graph-based clustering algorithm that exploits a theoretical relation between some kernels and some graph-based spectral clustering algorithms, performing the clustering with the same properties as a spectral method but without the computational cost of computing the eigenvectors of the graph.
Foggia et al.52 in 2008 propose a graph-based clustering algorithm based on the Minimum Spanning Tree, used in combination with the Fuzzy C-Means (FCM) algorithm to determine automatically the clustering threshold. The method has been further extended in Ref. 186.
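The MST-based step can be sketched as below. Note that here the cut threshold is a fixed parameter, whereas Foggia et al. derive it automatically through FCM; processing the sorted edges while refusing merges across heavy edges is equivalent to building the MST and then cutting its long edges:

```python
def mst_clusters(n, edges, threshold):
    """n: number of nodes; edges: list of (weight, u, v) tuples.
    Returns the clusters (sets of nodes) obtained by running Kruskal's
    algorithm and never merging across edges heavier than `threshold`."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path compression
            x = parent[x]
        return x

    for w, u, v in sorted(edges):
        if w <= threshold:                  # cut the long MST edges
            ru, rv = find(u), find(v)
            if ru != rv:
                parent[ru] = rv
    clusters = {}
    for x in range(n):
        clusters.setdefault(find(x), set()).add(x)
    return list(clusters.values())

# Two tight groups {0,1,2} and {3,4} joined only by a heavy edge.
edges = [(1, 0, 1), (1, 1, 2), (1, 3, 4), (10, 2, 3)]
print(mst_clusters(5, edges, threshold=5))  # two clusters
```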
The 2008 paper by Laskaris and Zafeiriou91 introduces a graph-based clustering algorithm that is based on FCM. Namely, FCM is used as a preprocessing step, with the task of dividing the input data into a large number of clusters (overclustering). Then, the found clusters are used to construct a graph-based representation, the connectivity graph, with nodes corresponding to the cluster centroids, and edges to neighborhood relations among the clusters. The connectivity graph is used with several graph-based algorithms to find a more accurate clustering, choosing automatically the optimal number of clusters, and for dimensionality reduction.
Zanghi et al.180 propose in their 2008 paper a graph-based clustering method based on a probabilistic formulation of the problem, using the Erdős-Rényi mixture model for random graphs. The EM algorithm is used to solve the probabilistic problem. An extension of this method is defined in Ref. 181; the new algorithm adds the ability to use node information (in the form of node feature vectors) in addition to edge information representing the similarity of the corresponding data points.
The 2009 paper by Kim and Choi84 presents an algorithm for graph-based clustering that uses the decomposition of the graph into r-regular subgraphs, i.e. connected subgraphs whose nodes are adjacent to exactly r other nodes. The decomposition is reduced to a continuous optimization problem and solved using Linear Programming techniques. After the decomposition, a refinement step is used to prune inconsistent edges and to remove outliers.
Wang et al.167 propose a clustering technique, called Integrated
KL clustering,
that is a hybrid between a traditional clustering approach (the
K-means algorithm)
and a graph-based clustering based on normalized graph cuts. The
method should be
convenient in situations where the input data are partly
described by a feature
vector, and partly by a set of similarity/dissimilarity
relations encoded using a graph
structure.
Mimaroglu and Erdil110 in a 2011 paper propose a graph-based method for combining the results of several clustering algorithms. The method is given as input the results of a set of clustering algorithms applied to the same data; different algorithms can be used, or the same algorithm with different parameters. The method builds a graph with nodes corresponding to data points, and edges encoding the number of clustering algorithms that have assigned two data points to the same cluster. Then, the nodes of this graph are clustered so as to maximize the consensus among the different clustering algorithms, using a greedy search technique. The
authors report that the final clustering obtained by the method is closer to a manual partition of the data, and is less influenced by the choice of parameters than the initial algorithms.
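The consensus graph just described can be sketched as follows; the simple majority-merge rule used here is an illustrative stand-in for the greedy search of Mimaroglu and Erdil:

```python
from itertools import combinations

def consensus_clusters(labelings, threshold=None):
    """labelings: list of label lists, one per base clustering, all over
    the same points. Edge weights count how many base clusterings put
    two points together; points joined by edges whose weight exceeds
    `threshold` (default: strict majority) end up in one cluster."""
    n = len(labelings[0])
    if threshold is None:
        threshold = len(labelings) / 2.0
    # co-association counts = edge weights of the consensus graph
    weight = {}
    for labels in labelings:
        for i, j in combinations(range(n), 2):
            if labels[i] == labels[j]:
                weight[(i, j)] = weight.get((i, j), 0) + 1
    # merge points along sufficiently heavy edges (union-find)
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (i, j), w in weight.items():
        if w > threshold:
            parent[find(i)] = find(j)
    groups = {}
    for x in range(n):
        groups.setdefault(find(x), set()).add(x)
    return list(groups.values())

# Three base clusterings that mostly agree on {0,1} vs {2,3}.
runs = [[0, 0, 1, 1], [0, 0, 1, 1], [0, 1, 1, 1]]
print(consensus_clusters(runs))  # [{0, 1}, {2, 3}] in some order
```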
Nie et al.115 propose a graph-based clustering technique that uses a new formulation of the clustering problem, called l1-norm graph clustering, where the goal is expressed as the minimization of the l1 norm of a suitably defined vector; this formulation should be more robust with respect to noise and outliers.
The 2012 paper by Tabatabaei et al.154 presents a graph-based clustering algorithm where the clustering goal is formulated in terms of minimizing the normalized cut (Ncut) metric. The clustering is performed using a greedy, agglomerative algorithm, followed by a refinement procedure that evaluates the opportunity of moving the boundary nodes of each cluster to a neighboring cluster.
In 2012, Ducornau et al.41 propose a hierarchical algorithm for hypergraph-based clustering, which is a generalization of graph-based clustering. The algorithm works by performing a first-level partitioning of the nodes using a spectral technique; the obtained partition is then recursively refined.
Shang et al.144 in their 2012 paper propose two graph-based algorithms for the co-clustering problem, which aims at finding at the same time coherent subsets of the data points and coherent subsets of the features used to represent them. The algorithms adopt iterative optimization schemes based on the graph Laplacian.
3.4. Graph learning
Several learning methods use a graph-based structure as part of the learning process. In some cases, the individual patterns are represented by graphs, and often also the class descriptions have a graph-based representation; in such cases, some form of graph matching is usually involved in the algorithm (see Fig. 5). In other cases, a graph structure represents the whole input space, with nodes corresponding to individual patterns, and edges representing some sort of proximity or similarity relation. We will use the terms learning of graphs for the first case, and graph-based learning for the second.
3.4.1. Learning of graphs
In 2005, Neuhaus and Bunke111 present a method to learn the costs of a GED using a Self-Organizing Map. The method is given a set of graphs with class labels, and learns the edit costs so as to ensure that graphs in the same class have a smaller GED than graphs belonging to different classes. The same authors propose an improved algorithm for solving the same problem in a 2007 paper.113 In this latter work, they reformulate the learning of the graph edit costs in a probabilistic framework, and use the Expectation Maximization algorithm to optimize these costs in the Maximum Likelihood sense.
Also the paper by Serratosa et al.143 in 2011 proposes a method
for learning the
edit costs of a GED. In this case, the method is based on an
Adaptive Learning
paradigm, in which the system is sequentially given new graphs and attempts to classify them; only if the class is different from the one a human expert would have chosen, a feedback is given to the system, which then adapts the edit costs. A 2011 paper by Sole-Ribalta and Serratosa148 provides a further elaboration on this method, by considering, once the edit costs are fixed, the formal properties of the space of the possible matchings.
A similar problem is addressed by the 2012 paper by Leordeanu et
al.,96 where
learning (both supervised and unsupervised) is used to obtain
the parameters of a
graph matching algorithm based on spectral properties.
A 2008 paper by Maulik108 presents an algorithm for finding repeated subgraphs within large graphs, which can be considered an unsupervised form of graph-based learning. This can be used for data mining in domains suitable to a structural representation, e.g. web pages or molecular databases. The algorithm uses Evolutionary Programming to perform the search, with a fitness function based on the compression of the original graph attainable by the detection of the repeated substructure.
In their 2009 papers, Ferrer et al.49,50 propose an algorithm for computing the median graph, that is, the graph within a set of graphs that minimizes the sum of graph distances from the other graphs. The computation of the median graph can be
Fig. 5. An illustration of graph learning: (a) a set of objects made of three different kinds of parts (circles, triangles, rectangles); (b) the representation in terms of graphs (node attributes are the kinds of parts, while edges represent the only spatial relation, "above", and therefore have no attributes); (c) the corresponding learned class description, a prototype containing the common substructure: the question mark on the prototype nodes represents a generic value (a don't care) for the corresponding attribute.
considered a form of learning in the graph domain, since the
median graph can be
used as a prototype for a set of graphs. The proposed method is
based on genetic
algorithms, and performs a reduction of the search space by
exploiting a novel
theoretical bound on the sum of distances for the particular
graph distance measure
adopted (which is based on the maximum common subgraph).
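The set median just defined can be sketched generically, given any pairwise distance function; the string "distance" below is only a toy stand-in for a graph distance such as GED:

```python
def toy_dist(a, b):
    # toy stand-in for a graph distance, operating on strings
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))

def set_median(graphs, dist):
    """Return the member of `graphs` minimizing the summed distance to
    all the others (the *set* median; the generalized median may lie
    outside the set and is much harder to compute)."""
    return min(graphs, key=lambda g: sum(dist(g, h) for h in graphs))

print(set_median(["abc", "abd", "xyz"], toy_dist))  # prints abc
```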
Ferrer et al. present a different median graph algorithm in 2010,51 which is based on the graph embedding technique by Riesen and Bunke.123 In particular, the method proposed by Ferrer et al. computes the median graph by converting the graphs into vectors using the cited graph embedding, finding the median vector of the set, and then converting this vector back into a graph. In a 2011 paper,48 the same authors present an improved procedure for performing this last step of the algorithm.
Jain and Obermayer, in their 2010 paper,75 discuss the mean and the median of a set of graphs, using a theoretical formulation based on Riemannian orbifolds, and present some sufficient conditions to ensure that the estimators of the mean and the median are consistent with an underlying probability distribution of the graphs.
Also the paper by Raveaux et al.120 deals with learning a prototype for a set of graphs. Four different kinds of prototypes are considered: median graphs, generalized median graphs, discriminant graphs and generalized discriminant graphs. Discriminant graphs are prototypes chosen so as to maximize the performance of a Nearest Neighbor classifier over a labeled training set. The generalized versions of median graphs and discriminant graphs are obtained by lifting the restriction that the prototype must be a member of the training set. All four kinds of prototypes are computed using a Genetic Algorithm, with different chromosome encodings and fitness functions.
3.4.2. Graph-based learning
Culp and Michailidis,32 in their 2008 paper, propose a semi-supervised learning algorithm based on graphs. Semi-supervised learning is a form of machine learning in which only a subset of the training data has class labels. The proposed method assumes that the structure of the input space is described as a graph, in which nodes are the input samples and edges encode the neighborhood relations; this graph structure is used to assign a label to unlabeled training samples during the learning process. A similar technique is proposed by Elmoataz et al.43 for graph-based regularization on weighted graphs.
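A minimal sketch of this setting follows: samples are nodes, edges connect neighbors, and unlabeled nodes iteratively take the majority label of their labeled neighbors. This generic propagation scheme only illustrates the setting, not the specific algorithm of Culp and Michailidis:

```python
def propagate_labels(adj, seed_labels, iterations=10):
    """adj: dict node -> list of neighbors; seed_labels: dict with
    entries only for the labeled samples. Unlabeled nodes repeatedly
    take the majority label of their labeled neighbors; the given
    seed labels are kept fixed."""
    labels = dict(seed_labels)
    for _ in range(iterations):
        new_labels = dict(labels)
        for node in adj:
            if node in seed_labels:
                continue                         # seeds stay fixed
            votes = {}
            for nb in adj[node]:
                if nb in labels:
                    votes[labels[nb]] = votes.get(labels[nb], 0) + 1
            if votes:
                # majority vote, ties broken by the smaller label
                new_labels[node] = min(votes, key=lambda c: (-votes[c], c))
        labels = new_labels
    return labels

# Two triangles joined by the edge 2-3; one seed on each side.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
       3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
result = propagate_labels(adj, {0: "A", 5: "B"})
print(result[2], result[3])  # A B
```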
Also the 2012 paper by Rohban and Rabiee132 is related to graph-based semi-supervised learning. In particular, the authors investigate the preliminary step of graph construction: given a set of data points in a metric space, how a graph structure can be constructed so that graph-based semi-supervised learning can be applied effectively. The authors propose a supervised graph construction algorithm based on the optimization of a smoothness functional, and show that the use of neighborhood graphs based on this method outperforms the k-NN technique commonly used for this task.
The 2012 paper by Wang et al.165 introduces a novel technique to construct the graph structure for graph-based semi-supervised learning. The authors propose the k-Regular Nearest Neighbor graph (k-RNN) instead of the more common k-NN graph. In the k-RNN graph, k is the average number of neighbors, and the graph is constructed so as to minimize the total weight (representing distances) of the edges. The authors demonstrate the performance improvement of this technique in conjunction with the Manifold-ranking semi-supervised algorithm by Zhou et al.183
Shiga and Mamitsuka146 in their 2012 paper present another graph-based, semi-supervised learning algorithm. The novel aspect of this proposal is that it integrates several graphs representing different sources of evidence regarding the similarity of the input patterns. The algorithm