Submitted to International Journal of Computer Vision manuscript No. (will be inserted by the editor)
Rigid-Motion Scattering for Texture Classification
Laurent Sifre · Stéphane Mallat
Received: date / Accepted: date
Abstract A rigid-motion scattering computes adaptive invariants along translations and rotations, with a deep convolutional network. Convolutions are calculated on the rigid-motion group, with wavelets defined on the translation and rotation variables. It preserves joint rotation and translation information, while providing global invariants at any desired scale. Texture classification is studied, through the characterization of stationary processes from a single realization. State-of-the-art results are obtained on multiple texture data bases, with important rotation and scaling variabilities.
Keywords Deep network · scattering · wavelet · rigid-motion · texture · classification
Work supported by ANR 10-BLAN-0126 and Advanced ERC InvariantClass 320959.

L. Sifre
CMAP, Ecole Polytechnique, Route de Saclay, 91128 Palaiseau, France
E-mail: [email protected]

S. Mallat
Département d'informatique, École normale supérieure, 45 rue d'Ulm, F-75230 Paris Cedex 05, France
1 Introduction
Image classification requires finding representations which reduce non-informative intra-class variability, and hence which are partly invariant, while preserving discriminative information across classes. Deep neural networks build hierarchical invariant representations by applying a succession of linear and non-linear operators which are learned from training data. They provide state of the art results for complex image classification tasks (Hinton & Salakhutdinov, 2006; Lecun et al., 2010; Sermanet et al., 2013; Krizhevsky et al., 2012; Dean et al., 2012). A major issue is to understand the properties of these networks, what needs to be learned and what is generic and common to most image classification problems. Translations, rotations and scaling are common sources of variability for most images, because of changes of view points and perspective projections of three dimensional surfaces. Building adaptive invariants to such transformations is usually considered as a first necessary step for classification (Poggio et al., 2012). We concentrate on this generic part, which is adapted to the physical properties of the imaging environment, as opposed to the specific content of images which needs to be learned.
This paper defines deep convolution scattering networks which provide invariants to translations and rotations, and hence to rigid motions in R². The level of invariance is adapted to the classification task. Scattering transforms have been introduced to build translation invariant representations which are stable to deformations (Mallat, 2012), with applications to image classification (Bruna & Mallat, 2013). They are implemented as a convolutional network, with successive spatial wavelet convolutions at each layer. Translations form a simple commutative group, parameterized by the location of the input pixels. Rigid motions form a non-commutative group whose parameters are not explicitly given by the input image, which raises new issues. The first one is to understand how to represent the joint information between translations and rotations. We shall explain why separating both variables leads to an important loss of information and yields representations which are not sufficiently discriminative. This leads to the construction of a scattering transform on the full rigid-motion group, with rigid-motion convolutions on the joint rotation and translation variables. Rotation variables are explicitly introduced in the second network layer, where convolutions are performed on the rigid-motion group along the joint translation and rotation variables. As opposed to translation scattering, where linear transforms are performed along spatial variables only, rigid-motion scattering recombines the new variables created at the second network layer, as is usually done in deep neural networks. However, a rigid-motion scattering involves no learning, since convolutions are computed with predefined wavelets along spatial and rotation variables. Its stability is guaranteed by contraction properties, which are explained.
We study applications of rigid-motion scattering to texture classification, where translations, rotations and scaling are major sources of variability. Image textures can be modeled as stationary processes, which are typically non-Gaussian and non-Markovian, with long range dependencies. Texture recognition is a fundamental problem of visual perception, with applications to medical and satellite imaging, material recognition (Lazebnik et al., 2005; Xu et al., 2010; Liu et al., 2011), and object or scene recognition (Renninger & Malik, 2004). Recognition is performed from a single image, and hence cannot involve high order moments, because their estimators have a variance which is too large. Finding a low-variance ergodic representation, which can discriminate these non-Gaussian stationary processes, is a fundamental probability and statistics issue.
Translation invariant scattering representations of stationary processes have been studied to discriminate textures which do not involve important rotation or scaling variability (Bruna & Mallat, 2013; Bruna et al., 2013). These results are extended to joint translation and rotation invariance. Invariance to scaling variability is incorporated through linear projectors. It provides effective invariants, which yield state of the art classification results on a large range of texture data bases.
Section 2 reviews the construction of translation invariant scattering transforms. Section 2.4 explains why invariants to rigid motions cannot be computed by separating the translation and rotation variables without losing important information. Joint translation and rotation operators define the rigid-motion group, also called the special Euclidean group. Rigid-motion scattering transforms are studied in Section 3. Convolutions on the rigid-motion group are introduced in Section 3.1 in order to define wavelet transforms over this group. Their properties are described in Section 3.2. A rigid-motion scattering iteratively computes the modulus of such wavelet transforms. The wavelet transforms jointly process translations and rotations, but can be computed with separable convolutions along spatial and rotation variables. A fast filter bank implementation is described in Section 4, with a cascade of spatial convolutions and downsampling. Invariant scattering representations are applied to image texture classification in Section 5. State of the art results are obtained on four texture datasets containing different types and ranges of variability (KTH TIPS, 2004; UIUC Tex, 2005; UMD, 2009; FMD, 2009). All numerical experiments are reproducible with the ScatNet (ScatNet, 2013) MATLAB toolbox.
2 Invariance to Translations, Rotations and Deformations
Section 2.1 reviews the properties of translation invariant representations and their stability relative to deformations. The use of wavelet transforms is justified by their stability to deformations. Their properties are summarized in Section 2.2. Section 2.3 describes translation scattering transforms, implemented with a deep convolutional network. Separable extensions to translation and rotation invariance are discussed in Section 2.4. It is shown that this simple strategy leads to an important loss of information.
2.1 Translation Invariance and Deformation Stability
Building invariants to translations and small deformations is a prototypical representation issue for classification, which carries major ingredients that make this problem difficult. Translation invariance is simple to compute. There are many possible strategies that we briefly review. The main difficulty is to build a representation Φ(x) which is also stable to deformations.

A representation Φ(x) is said to be translation invariant if xv(u) = x(u − v) has the same representation:

∀v ∈ R² , Φ(x) = Φ(xv) .

Besides translation invariance, it is often necessary to build invariants to a specific class of deformations through linear projectors. Invariance to translation can be computed with a registration Φx(u) = x(u − a(x)), where a(x) is an anchor point which is translated when x is translated. It means that if xv(u) = x(u − v) then a(xv) = a(x) + v. For example, a(x) = argmax_u |x ⋆ h(u)|, for some filter h(u). These invariants are simple and preserve as much information as possible. The Fourier transform modulus |x̂(ω)| is also invariant to translation.
Invariance to translations is often not enough. Suppose that x is not just translated but also deformed to give xτ(u) = x(u − τ(u)) with |∇τ(u)| < 1. Deformations belong to the infinite dimensional group of diffeomorphisms. Computing invariants to all deformations would mean losing too much information. In a digit classification problem, a deformation invariant representation would confuse a 1 with a 7. We thus do not want to be invariant to any deformation, but only to the specific deformations within the digit class, while preserving information to discriminate different classes. Such deformation invariants need to be learned as optimized linear combinations.

Constructing such linear invariants requires the representation to be stable to deformations. A representation Φ(x) is stable to deformations if ‖Φ(x) − Φ(xτ)‖ is small when the deformation is small. The deformation size is measured by ‖∇τ‖∞ = sup_u |∇τ(u)|. If this quantity vanishes then τ is a "pure" translation without deformation. Stability is formally defined as Lipschitz continuity relative to this metric. It means that there exists C > 0 such that for all x(u) and τ(u) with ‖∇τ‖∞ < 1,

‖Φ(x) − Φ(xτ)‖ ≤ C ‖∇τ‖∞ ‖x‖ . (1)

This Lipschitz continuity property implies that deformations are locally linearized by the representation Φ. Indeed, Lipschitz continuous operators are almost everywhere differentiable in the sense of Gateaux. It results that Φ(x) − Φ(xτ) can be approximated by a linear operator of ∇τ if ‖∇τ‖∞ is small. A family of small deformations thus generates a linear space span_τ(Φ(xτ)). In the transformed space, an invariant to these deformations can then be computed with a linear projector on the orthogonal complement span_τ(Φ(xτ))⊥.
Registration invariants are not stable to deformations. If x(u) = 1_[0,1]²(u) + 1_[α,α+1]²(u) then for τ(u) = εu one can verify that ‖x − xτ‖ ≥ 1 if |α| > ε⁻¹. It results that (1) is not valid. One can similarly prove that the Fourier transform modulus Φ(x) = |x̂| is not stable to deformations, because high frequencies move too much with deformations, as can be seen in Figure 1.

Fig. 1 Two images of the same texture (left) from the UIUC Tex dataset (UIUC Tex, 2005) and the log of the modulus of their Fourier transforms (right). The periodic patterns of the texture correspond to fine grained dots in the Fourier plane. When the texture is deformed, the dots spread over the Fourier plane, which illustrates the fact that the modulus of the Fourier transform is unstable to elastic deformations.
Translation invariance often needs to be computed locally. Translation invariant descriptors which are stable to deformations can be obtained by averaging. If translation invariance is only needed within a limited range smaller than 2^J, then it is sufficient to average x with a smooth window φJ(u) = 2^{−2J} φ(2^{−J}u) of width 2^J:

x ⋆ φJ(u) = ∫ x(v) φJ(u − v) dv . (2)

It is proved in (Mallat, 2012) that if ‖∇φ‖₁ < +∞ and ‖ |u| ∇φ(u) ‖₁ < +∞ and ‖∇τ‖∞ ≤ 1 − ε with ε > 0, then there exists C such that

‖xτ ⋆ φJ − x ⋆ φJ‖ ≤ C ‖x‖ ( 2^{−J} ‖τ‖∞ + ‖∇τ‖∞ ) . (3)

Averaging operators lose all high frequencies, and hence eliminate most signal information. These high frequencies can be recovered with a wavelet transform.
2.2 Wavelet Transform Invariants
Contrarily to sinusoidal waves, wavelets are localized functions which are stable to deformations. They are thus well adapted to construct translation invariants which are stable to deformations. We briefly review wavelet transforms and their applications in computer vision. Wavelet transforms have been used to analyze stationary processes and image textures. They provide a set of coefficients closely related to the power spectrum.

A directional wavelet transform extracts the signal high-frequencies within different frequency bands and orientations. Two-dimensional directional wavelets are obtained by scaling and rotating a single band-pass filter ψ. Multiscale directional wavelet filters are defined for any j ∈ Z and rotation rθ of angle θ ∈ [0,2π) by

ψθ,j(u) = 2^{−2j} ψ(2^{−j} r−θ u) . (4)

If the Fourier transform ψ̂(ω) is centered at a frequency η, then ψ̂θ,j(ω) = ψ̂(2^j r−θ ω) has a support centered at 2^{−j} rθ η, with a bandwidth proportional to 2^{−j}.
Fig. 2 The Gaussian window φJ (left) and oriented and dilated Morlet wavelets ψθ,j (right). Saturation corresponds to amplitude while color corresponds to complex phase.
We consider a group G of rotations rθ which is either a finite subgroup of SO(2) or is equal to SO(2). A finite rotation group is indexed by Θ = {2kπ/K : 0 ≤ k < K}, and if G = SO(2) then Θ = [0,2π). The wavelet transform at a scale 2^J is defined by

Wx = { x ⋆ φJ(u) , x ⋆ ψθ,j(u) }_{u∈R², θ∈Θ, j<J} . (5)

In numerical experiments, ψ is a complex Morlet wavelet

ψ(u) = α ( e^{iu·ξ} − β ) e^{−|u|²/(2σ²)} , (6)

where β > 0 is adjusted so that ∫ψ(u) du = 0. Morlet wavelets for π ≤ θ < 2π are not computed since they verify ψθ+π,j = ψ*θ,j, where z* denotes the complex conjugate of z. The averaging function is chosen to be a Gaussian window

φ(u) = (2πσ²)⁻¹ exp(−|u|²/(2σ²)) . (7)

Figure 2 shows this window and the Morlet wavelets. To simplify notations, we shall write ∑_{θ∈Θ} h(θ) for a summation over Θ even when Θ = [0,2π), in which case this discrete sum represents the integral ∫₀^{2π} h(θ) dθ. We consider wavelets which satisfy the following Littlewood-Paley condition, for ε > 0 and almost all ω ∈ R²:

1 − ε ≤ |φ̂(ω)|² + ∑_j ∑_{θ∈Θ} |ψ̂θ,j(ω)|² ≤ 1 . (8)
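The following sketch builds dilated and rotated complex Morlet filters in the spirit of (4) and (6). The grid size and the parameter values (σ, ξ) are illustrative assumptions, not the ScatNet settings; the constant subtracted from the oscillation plays the role of β and enforces a zero-mean wavelet.

```python
import numpy as np

def morlet(N, sigma=0.8, xi=3 * np.pi / 4, theta=0.0, j=0):
    """Complex Morlet wavelet psi_{theta,j}(u) = 2**(-2j) psi(2**(-j) r_{-theta} u) on an N x N grid."""
    u1, u2 = np.meshgrid(np.arange(N) - N // 2, np.arange(N) - N // 2, indexing="ij")
    # rotate the grid by -theta and dilate it by 2**(-j)
    v1 = (np.cos(theta) * u1 + np.sin(theta) * u2) * 2.0 ** (-j)
    v2 = (-np.sin(theta) * u1 + np.cos(theta) * u2) * 2.0 ** (-j)
    envelope = np.exp(-(v1 ** 2 + v2 ** 2) / (2 * sigma ** 2))
    wave = np.exp(1j * xi * v1)
    # subtract a scaled Gaussian so that the wavelet has zero mean (int psi = 0)
    beta = np.sum(envelope * wave) / np.sum(envelope)
    return 2.0 ** (-2 * j) * envelope * (wave - beta)

# A small filter bank with 4 orientations on [0, pi) and 3 scales.
filters = {(k, j): morlet(64, theta=k * np.pi / 4, j=j)
           for k in range(4) for j in range(3)}
print(abs(filters[(1, 2)].sum()))   # close to 0: the wavelet has zero mean
```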
Fig. 3 Two images of the same texture (left) and their convolution with the same Morlet wavelet (right). Even though the texture is highly deformed, the wavelet responds to roughly the same oriented pattern in both images, which illustrates its stability to deformation.
Wavelet coefficients are covariant rather than invariant to translations: when x is translated, each x ⋆ ψθ,j(u) is translated. Removing the complex phase, as in a Fourier transform, defines a positive envelope |x ⋆ ψθ,j(u)| which is still covariant to translations, not invariant. Averaging this positive envelope defines locally translation invariant coefficients which depend upon (u,θ,j):

S1x(u,θ,j) = |x ⋆ ψθ,j| ⋆ φJ(u) .
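A possible implementation sketch of these first-order coefficients with FFT-based circular convolutions. It assumes a dictionary of complex wavelet filters with the same shape as the image (for instance the hypothetical `morlet` filters sketched above); it is not the ScatNet code.

```python
import numpy as np

def conv2(x, f):
    """Circular 2D convolution of x with a centered filter f of the same shape."""
    return np.fft.ifft2(np.fft.fft2(x) * np.fft.fft2(np.fft.ifftshift(f)))

def gaussian_window(N, scale):
    """Gaussian averaging window phi_J of width proportional to `scale`, normalized to unit mass."""
    u1, u2 = np.meshgrid(np.arange(N) - N // 2, np.arange(N) - N // 2, indexing="ij")
    phi = np.exp(-(u1 ** 2 + u2 ** 2) / (2.0 * scale ** 2))
    return phi / phi.sum()

def first_order_scattering(x, wavelets, J):
    """Return {(theta, j): S1x(., theta, j)} = wavelet modulus averaged by phi_J."""
    phi_J = gaussian_window(x.shape[0], scale=2.0 ** J)
    return {key: np.real(conv2(np.abs(conv2(x, psi)), phi_J))
            for key, psi in wavelets.items()}
```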
Such averaged wavelet coefficients are used under various forms in computer vision. Global histograms of quantized filter responses have been used for texture recognition in Leung & Malik (2001). SIFT (Lowe, 2004) and DAISY (Tola et al., 2010) descriptors compute local histograms of orientations. This is similar to the definition of S1x, but with different wavelets and non-linearity. Due to their stability properties, SIFT-like descriptors have been used extensively for a wide range of applications where stability to deformation is important, such as keypoint matching in pairs of images from different view points, and generic object recognition.
2.3 Translation Invariant Scattering
The convolution by φJ provides local translation invariance but also loses the spatial variability of the wavelet transform. A scattering successively recovers the information lost by the averaging which computes the invariants. Scattering consists in a cascade of wavelet modulus transforms, which can be interpreted as a deep neural network.

A scattering transform is computed by iterating wavelet transforms and modulus operators. To simplify notations, we shall write λ = (θ,j) and Λ = {(θ,j) : θ ∈ Θ, j < J}. The wavelet transform and modulus operations are combined in a single wavelet modulus operator defined by:

|W|x = { x ⋆ φJ , |x ⋆ ψλ| }_{λ∈Λ} . (10)

This operator averages coefficients with φJ to produce invariants to translations, and computes higher frequency wavelet transform envelopes which carry the lost information. A scattering transform can be interpreted as a neural network, illustrated in Figure 4, which propagates a signal x across multiple layers of the network and which outputs at each layer m the scattering invariant coefficients Smx.
Fig. 4 Translation scattering can be seen as a neural network which iterates over wavelet modulus operators |W|. Each layer m outputs averaged invariants Smx and covariant coefficients Um+1x.
The input of the network is the original signal U0x = x. The scattering transform is then defined by induction. For any m ≥ 0, applying the wavelet modulus operator |W| on Umx outputs the scattering coefficients Smx and computes the next layer of coefficients Um+1x:

|W| Umx = (Smx , Um+1x) , (11)

with

Smx(u,λ1, . . . ,λm) = Umx(·,λ1, . . . ,λm) ⋆ φJ(u) = | | |x ⋆ ψλ1| ⋆ . . . | ⋆ ψλm| ⋆ φJ(u)

and

Um+1x(u,λ1, . . . ,λm,λm+1) = |Umx(·,λ1, . . . ,λm) ⋆ ψλm+1(u)| = | | |x ⋆ ψλ1| ⋆ . . . | ⋆ ψλm| ⋆ ψλm+1(u)| .

This scattering transform is illustrated in Figure 4. The final scattering vector concatenates all scattering coefficients for 0 ≤ m ≤ M:

Sx = (Smx)_{0≤m≤M} . (12)
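The cascade (11)-(12) can be sketched as follows, reusing the hypothetical `conv2` and `gaussian_window` helpers from the sketch above and pruning non-increasing scale paths as discussed at the end of this section. This is an illustrative sketch, not the ScatNet implementation.

```python
import numpy as np

def scattering(x, wavelets, J, M=2):
    """Scattering up to order M: wavelets is a dict {(theta, j): psi}, x an (N, N) image."""
    phi_J = gaussian_window(x.shape[0], scale=2.0 ** J)
    layers = [{(): x}]                                # U_0 x, indexed by the empty path
    S = []
    for m in range(M + 1):
        U_m = layers[m]
        # S_m x: average every path of layer m with phi_J
        S.append({path: np.real(conv2(u, phi_J)) for path, u in U_m.items()})
        # U_{m+1} x: wavelet modulus of every path, keeping increasing scales only
        U_next = {}
        for path, u in U_m.items():
            for (theta, j), psi in wavelets.items():
                if path and j <= path[-1][1]:
                    continue                          # discard low-energy, non-increasing scale paths
                U_next[path + ((theta, j),)] = np.abs(conv2(u, psi))
        layers.append(U_next)
    return S                                          # list of dicts S_0 x, ..., S_M x
```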
A scattering transform is a non-expansive operator, which is stable to deformations. Let ‖Sx‖² = ∑_m ‖Smx‖²; one can prove that

‖Sx − Sy‖ ≤ ‖x − y‖ . (13)

Because wavelets are localized and separate scales, we can also prove that if x has a compact support then there exists C > 0 such that

‖Sxτ − Sx‖ ≤ C ‖x‖ ( 2^{−J} ‖τ‖∞ + ‖∇τ‖∞ + ‖Hτ‖∞ ) . (14)

Most of the energy of scattering coefficients is concentrated in the first two layers m = 1, 2. Applications thus typically concentrate on these two layers. Among second layer scattering coefficients

S2x(u,λ1,λ2) = ||x ⋆ ψλ1| ⋆ ψλ2| ⋆ φJ(u) ,

the coefficients λ2 = 2^{j2} rθ2 with 2^{j2} ≤ 2^{j1} have a small energy. Indeed, |x ⋆ ψλ1| has an energy concentrated in a lower frequency band. As a result, we only compute scattering coefficients for increasing scales 2^{j2} > 2^{j1}.
2.4 Separable Versus Joint Rigid Motion Invariants
An invariant to a group which is a product of two subgroups can be implemented as a separable product of two invariant operators on each subgroup. However, this separable invariant is often too strong, and loses important information. This is shown for translations and rotations.

To understand the loss of information produced by separable invariants, let us first consider the two-dimensional translation group over R². A two-dimensional translation invariant operator applied to x(u1,u2) can be computed by applying first a translation invariant operator Φ1 which transforms x(u1,u2) along u1 for u2 fixed. Then a second translation invariant operator Φ2 is applied along u2. The product Φ2Φ1 is thus invariant to any two-dimensional translation. However, if xv(u1,u2) = x(u1 − v(u2), u2) then Φ1xv = Φ1x for all v(u2), although xv is not a translation of x because v(u2) is not constant. It results that Φ2Φ1x = Φ2Φ1xv. This separable operator is invariant to a much larger set of transformations than two-dimensional translations and can thus confuse two images which are not translations of one another, as in Figure 5 (left). To avoid this information loss, it is necessary to build a translation invariant operator which takes into account the structure of the two-dimensional group. This is why translation invariant scattering operators in R² are not computed as products of scattering operators along horizontal and vertical variables.
Fig. 5: (Left) Two images where each row of the second image is translated by a different amount v(u1). A separable translation invariant that would start by computing a translation invariant for each row would output the same value, which illustrates the fact that such separable invariants are too strong. (Right) Two textures whose first internal layer is translated by different values for different orientations. In this example, vertical orientations are not translated while horizontal orientations are translated by 1/2 (1,1). Translation scattering and other separable invariants cannot distinguish these two textures because they do not connect vertical and horizontal nodes.
The same phenomenon appears for invariants along translations and rotations, although it is more subtle because translations and rotations interfere. Suppose that we apply a translation invariant operator Φ1, such as a scattering transform, which separates image components along different orientations indexed by an orientation parameter θ ∈ [0,2π). Applying a second rotation invariant operator Φ2 which acts along θ produces a translation and rotation invariant operator.

Local Binary Patterns (Zhao et al., 2012) follow this approach. They first build translation invariance with a histogram of oriented patterns. Then, they build rotation invariance on top, by either pooling all patterns that are rotated versions of one another, or by computing the modulus of a Fourier transform on the angular difference that relates rotated patterns.

Such separable invariant operators have the advantage of simplicity and have thus been used in several computer vision applications. However, as in the separable translation case, separable products of translation and rotation invariants can confuse very different images. Consider a first image, which is the sum of arrays of oscillatory patterns along two orthogonal directions, with same locations. If the two arrays of oriented patterns are shifted as in Figure 5 (right), we get a very different texture, which is not globally translated or rotated relative to the first one. However, an operator Φ1 which first separates different orientation components and computes a translation invariant representation independently for each component will output the same values for both images, because it does not take into account the joint location and orientation structure of the image. This is the case of separable scattering transforms (Sifre & Mallat, 2012) or of any of the separable translation and rotation invariants used in (Xu et al., 2010; Zhao et al., 2012).

Taking into account the joint structure of the rigid-motion group of rotations and translations in R² was proposed by several researchers (Citti & Sarti, 2006; Duits & Burgeth, 2007; Boscain et al., 2013), to preserve image structures in applications such as noise removal or image enhancement with directional diffusion operators (Duits & Franken, 2011). Similarly, a joint scattering invariant to translations and rotations is constructed directly on the rigid-motion group in order to take into account the joint information between positions and orientations.
Fig. 6: A rigid-motion convolution (20) with a separable filter ỹ(v,θ) = y(v) ȳ(θ) in SE(2) can be factorized into a two dimensional convolution with rotated filters y(r−θ v) and a one dimensional convolution with ȳ(θ).
3 Rigid-motion Scattering
Translation invariant scattering operators are extended to define invariant representations over any Lie group, by calculating wavelet transforms on this group. Such wavelet transforms are well defined under weak conditions on the Lie group. We concentrate on invariance to the action of rotations and translations, which belong to the special Euclidean group. The next section briefly reviews the properties of the special Euclidean group. A scattering operator (Mallat, 2012) computes an image representation which is invariant relatively to the action of a group, by applying wavelet transforms to functions defined on the group.
3.1 Rigid-Motion Group
The set of rigid motions is called the special Euclidean group SE(2). We briefly review its properties. A rigid motion in R² is parameterized by a translation v ∈ R² and a rotation rθ ∈ SO(2) of angle θ ∈ [0,2π). We write g = (v,θ). Such a rigid motion g maps u ∈ R² to

gu = v + rθ u . (15)

A rigid motion g applied to an image x(u) translates and rotates the image accordingly:

g.x(u) = x(g⁻¹u) = x(r−θ(u − v)) . (16)

The group action (15) must be compatible with the product g′.(gu) = (g′.g)u, so that successive applications of two rigid motions g and g′ are equivalent to the application of a single product rigid motion g′.g. Combined with (15), this implies that

g′.g = (v′ + rθ′ v , θ + θ′) . (17)

This group product is not commutative. The neutral element is (0,0), and the inverse of g is

g⁻¹ = (−r−θ v , −θ) . (18)

The product (17) of SE(2) is the definition of the semidirect product of the translation group R² and the rotation group SO(2):

SE(2) = R² ⋊ SO(2) .

It is a Lie group, and the left invariant Haar measure of SE(2) is dg = dv dθ, obtained as a product of the Haar measures on R² and SO(2).
The space L²(SE(2)) of finite energy measurable functions x̃(v,θ) is a Hilbert space:

L²(SE(2)) = { x̃ : ∫_{R²} ∫₀^{2π} |x̃(v,θ)|² dθ dv < ∞ } .

The left-invariant convolution of two functions x̃(g) and ỹ(g) is defined by

x̃ ⋆̃ ỹ(g) = ∫_{SE(2)} x̃(g′) ỹ(g′⁻¹g) dg′ .
Since (v′,θ′)⁻¹ = (−r−θ′ v′, −θ′),

x̃ ⋆̃ ỹ(v,θ) = ∫_{R²} ∫₀^{2π} x̃(v′,θ′) ỹ(r−θ′(v − v′), θ − θ′) dv′ dθ′ . (19)

For separable filters ỹ(v,θ) = y(v) ȳ(θ), this convolution can be factorized into a spatial convolution with rotated filters y(r−θ v) followed by a convolution with ȳ(θ):

x̃ ⋆̃ ỹ(v,θ) = ∫₀^{2π} ( ∫_{R²} x̃(v′,θ′) y(r−θ′(v − v′)) dv′ ) ȳ(θ − θ′) dθ′ . (20)

This is illustrated in Figure 6.
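A sketch of this factorized convolution for a function x̃ sampled on C orientations. The caller is assumed to supply the stack of pre-rotated spatial filters y(r−θ v) (for instance built with the hypothetical `morlet` sketch above) and the angular filter ȳ, so the only operations are per-orientation spatial convolutions followed by a periodic convolution along θ. Function names and sampling conventions are assumptions, not the ScatNet code.

```python
import numpy as np

def conv2_fft(x, f):
    """Circular 2D convolution with a centered filter f of the same shape as x (complex output)."""
    return np.fft.ifft2(np.fft.fft2(x) * np.fft.fft2(np.fft.ifftshift(f)))

def rigid_motion_conv(x_tilde, y_rotated, y_bar):
    """x_tilde: (C, N, N) samples of x~(v, theta); y_rotated: (C, N, N) filters y(r_{-theta} v);
    y_bar: (C,) angular filter.  Returns x~ *~ y~ sampled on the same (theta, v) grid."""
    # spatial convolution of each orientation slice with the correspondingly rotated filter
    spatial = np.stack([conv2_fft(x_tilde[k], y_rotated[k])
                        for k in range(x_tilde.shape[0])])
    # periodic 1D convolution along the orientation variable theta
    return np.fft.ifft(np.fft.fft(spatial, axis=0) *
                       np.fft.fft(y_bar)[:, None, None], axis=0)
```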
3.2 Wavelet Transform on the Rigid-Motion Group
A wavelet transform W̃ in L²(SE(2)) is defined as convolutions with an averaging window and wavelets in L²(SE(2)). The wavelets are constructed as separable products of wavelets in L²(R²) and in L²(SO(2)).

A spatial wavelet transform in L²(R²) is defined from L mother wavelets ψl(u), which are dilated ψl,j(u) = 2^{−2j} ψl(2^{−j}u), and a rotationally symmetric averaging function φJ(u) = 2^{−2J} φ(2^{−J}u) at the maximum scale 2^J:

Wx = { x ⋆ φJ(u) , x ⋆ ψl,j(u) }_{u∈R², 0≤l<L, j<J} .
These spatial wavelets satisfy a Littlewood-Paley condition:

∀ω ∈ R², 1 − ε/2 ≤ |φ̂(ω)|² + ∑_{0≤l<L} ∑_j |ψ̂l,j(ω)|² ≤ 1 .
Fig. 7 Rigid-motion scattering is similar to the translation scattering of Figure 4, but deep wavelet modulus operators |W| are replaced with rigid-motion wavelet modulus operators |W̃|, where convolutions are applied along the rigid-motion group.
For any m ≥ 1, applying the rigid-motion wavelet modulus operator |W̃| on Ũmx outputs the scattering coefficients S̃mx and computes the next layer of coefficients Ũm+1x:

|W̃| Ũmx = (S̃mx , Ũm+1x) , (33)

with

S̃mx(g, j1,λ2, . . . ,λm) = Ũmx(·, j1,λ2, . . . ,λm) ⋆̃ φ̃J,K(g) = | | |x ⋆ ψ·,j1| ⋆̃ ψ̃λ2 . . . | ⋆̃ ψ̃λm| ⋆̃ φ̃J,K(g)

and

Ũm+1x(g, j1,λ2, . . . ,λm,λm+1) = |Ũmx(·, j1,λ2, . . . ,λm) ⋆̃ ψ̃λm+1(g)| = | | |x ⋆ ψ·,j1| ⋆̃ ψ̃λ2 . . . | ⋆̃ ψ̃λm| ⋆̃ ψ̃λm+1(g)| . (34)

This rigid-motion scattering transform is illustrated in Figure 7.
The final scattering vector concatenates all scattering coefficients for 0 ≤ m ≤ M:

S̃x = (S̃mx)_{0≤m≤M} . (35)

The following theorem proves that a scattering transform is a non-expansive operator.

Theorem 2 For any M ∈ N and any x, y ∈ L²(R²),

‖S̃x − S̃y‖ ≤ ‖x − y‖ . (36)
Proof: A modulus is non-expansive in the sense that for any (a,b) ∈ C², ||a| − |b|| ≤ |a − b|. Since W̃ is a linear non-expansive operator, it results that the wavelet modulus operator |W̃| is also non-expansive:

‖ |W̃|x − |W̃|y ‖ ≤ ‖x − y‖ .

Since W̃ is non-expansive, it results from (33) that

‖ |W̃|Ũmx − |W̃|Ũmy ‖² = ‖S̃mx − S̃my‖² + ‖Ũm+1x − Ũm+1y‖² ≤ ‖Ũmx − Ũmy‖² . (37)

Summing this equation from m = 1 to M gives

∑_{m=1}^{M} ‖S̃mx − S̃my‖² + ‖ŨM+1x − ŨM+1y‖² ≤ ‖Ũ1x − Ũ1y‖² . (38)

Since |W|x = (S̃0x, Ũ1x) is also non-expansive, we get

‖S̃0x − S̃0y‖² + ‖Ũ1x − Ũ1y‖² ≤ ‖x − y‖² . (39)

Inserting (39) in (38) proves (36). □
4 Fast Rigid-Motion Scattering
For texture classification applications, the first and second layers of scattering are sufficient to achieve state-of-the-art results. This section describes a fast implementation of rigid-motion scattering based on a filter bank implementation of the wavelet transform.

4.1 Wavelet Filter Bank Implementation

Rigid-motion scattering coefficients are computed by applying a spatial wavelet transform W and then a rigid-motion wavelet transform W̃. This section describes filter bank implementations of the spatial wavelet transform.
Fig. 8 Filter bank implementation of the wavelet transform W with J = 3 scales and C = 2 orientations. A cascade of low pass filter h and downsampling computes the low frequencies Ajx = x ⋆ φj, and filters gθ compute the high frequencies Bθ,jx = x ⋆ ψθ,j. This cascade results in a tree whose internal nodes are intermediate computations and whose leaves are the output of the downsampled wavelet transform.
A wavelet transform

Wx = { x ⋆ φJ(u) , x ⋆ ψθ,j(u) }_{u∈R², θ∈Θ, j<J}

is computed with the cascade of filtering and downsampling illustrated in Figure 8.
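A sketch of this cascade is given below, with a separable binomial low-pass filter standing in for h and caller-supplied (possibly complex) band-pass filters gθ; both are assumptions and differ from the actual ScatNet filters. At each scale, the band-pass outputs are computed at the current resolution, then the low-pass branch is downsampled by 2.

```python
import numpy as np
from scipy.ndimage import convolve1d

def circular_conv2(x, f):
    """Circular 2D convolution of x with a small filter f centered on its own grid."""
    fp = np.zeros(x.shape, dtype=complex)
    fp[:f.shape[0], :f.shape[1]] = f
    fp = np.roll(fp, (-(f.shape[0] // 2), -(f.shape[1] // 2)), axis=(0, 1))  # recenter at (0, 0)
    return np.fft.ifft2(np.fft.fft2(x) * np.fft.fft2(fp))

def filter_bank_cascade(x, g_filters, J):
    """Return (A_J x, {(theta, j): B_{theta,j} x}) for an (N, N) image x."""
    h = np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 16.0          # separable binomial low-pass (assumption)
    A, B = np.asarray(x, dtype=float), {}
    for j in range(J):
        for theta, g in enumerate(g_filters):
            B[(theta, j)] = circular_conv2(A, g)            # high frequencies B_{theta,j} x at scale 2**j
        A = convolve1d(convolve1d(A, h, axis=0, mode="wrap"),
                       h, axis=1, mode="wrap")[::2, ::2]    # low pass and downsample: A_{j+1} x
    return A, B
```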
Fig. 9: Filter bank implementation of the rigid-motion wavelet transform W̃ with J = 2 spatial scales, C = 2 orientations, L = 2 spatial wavelets, and K = 2 orientation scales. A first cascade computes spatial downsampling and filtering with h and gl,θ. The first cascade is a tree whose leaves are AJx̃ and Bl,jx̃. Each leaf is retransformed with a second cascade of downsampling and filtering with h̄ and ḡ along the orientation variable. The leaves of the second cascade are CJ,Kx̃, DJ,kx̃ (whose ancestor is AJx̃) and El,j,Kx̃, Fl,j,kx̃ (whose ancestors are the Bl,jx̃). These leaves constitute the output of the downsampled rigid-motion wavelet transform. They correspond to the signals x̃ ⋆̃ φ̃J,K, x̃ ⋆̃ ψ̃J,k, x̃ ⋆̃ ψ̃l,j,K, x̃ ⋆̃ ψ̃l,j,k appropriately downsampled along the spatial and orientation variables.
The second cascade computes the downsampled rigid-motion wavelet transform coefficients:

CJ,Kx̃(n,θ) = x̃ ⋆̃ φ̃J,K(2^J n, 2^K θ)
DJ,kx̃(n,θ) = x̃ ⋆̃ ψ̃J,k(2^J n, 2^k θ)
El,j,Kx̃(n,θ) = x̃ ⋆̃ ψ̃l,j,K(2^j n, 2^K θ)
Fl,j,kx̃(n,θ) = x̃ ⋆̃ ψ̃l,j,k(2^j n, 2^k θ) .

These subsampled coefficients are initialized from A and B with CJ,0x̃ = AJx̃ and El,j,0x̃ = Bl,jx̃. We compute them by induction:

CJ,k+1x̃ = (CJ,kx̃ ⋆̄ h̄) ↓2 ,  DJ,kx̃ = CJ,kx̃ ⋆̄ ḡ ,
El,j,k+1x̃ = (El,j,kx̃ ⋆̄ h̄) ↓2 ,  Fl,j,kx̃ = El,j,kx̃ ⋆̄ ḡ ,

where ⋆̄, ↓2, h̄, ḡ denote the discrete convolution, downsampling, low pass and high pass filters along the orientation variable θ.
The first spatial cascade computes CL convolutions at each spatial resolution, which requires O(CLNP) operations and O(CLN) memory. Each leaf is then retransformed by a cascade along the orientation variable θ of cardinality C. Convolutions along the orientations are periodic and, since the size of the filters h̄, ḡ is of the same order as C, we use FFT-based convolutions. One such convolution requires O(C logC) operations. One cascade of filtering and downsampling along orientations requires ∑_k C 2^{−k} log(C 2^{−k}) = O(C logC) operations and O(C) memory. There are O(LN) such cascades, so that the total cost for processing along orientations is O(CLN logC) operations and O(CLN) memory. Thus, the total cost for the full rigid-motion wavelet transform W̃ is O(CLN(P + logC)) operations and O(CLN) memory, where C is the number of orientations of the input signal, L is the number of spatial wavelets, N is the size of the input image, and P is the size of the spatial filters.
5 Image Texture Classification
Image texture classification has many applications, including satellite, medical and material imaging. It is a relatively well posed problem of computer vision, since the different sources of variability contained in texture images can be accurately modeled. This section presents applications of rigid-motion scattering to four texture datasets containing different types and ranges of variability: the (KTH TIPS, 2004; UIUC Tex, 2005; UMD, 2009) texture datasets, and the more challenging FMD (Sharan et al., 2009; FMD, 2009) materials dataset. Results are compared with state-of-the-art algorithms in Tables 1, 2, 3 and 4. All classification experiments are reproducible with the ScatNet (ScatNet, 2013) toolbox for MATLAB.
5.1 Dilation, Shear and Deformation Invariance with a PCA Classifier
Rigid-motion scattering builds invariance to the rigid-motion group. Yet, texture images also undergo other geometric transformations such as dilations, shears or elastic deformations. Dilations and shears, combined with rotations and translations, generate the group of affine transforms. One can define wavelets (Donoho et al., 2011) and a scattering transform on the affine group to build affine invariance. However, this group is much larger and it would involve heavy and unnecessary computations. A limited range of dilations and shears is available for finite resolution images, which allows one to linearize these variations. Invariance to dilations, shears and deformations is obtained with linear projectors implemented at the classifier level, by taking advantage of the scattering's stability to small deformations. In texture applications there is typically a small number of training examples per class, in which case PCA generative classifiers can perform better than linear SVM discriminative classifiers (Bruna & Mallat, 2013).
Let Xc be a stationary process representing a texture class c. Its rigid-motion scattering transform S̃Xc typically has a power law behavior as a function of its scale parameters. It is partially linearized by a logarithm, which thus improves linear classifiers. The random process log S̃Xc has an energy which is essentially concentrated in a low-dimensional affine space

Ac = E(log S̃Xc) + Vc ,

where Vc is the principal component linear space, generated by the eigenvectors of the covariance of log S̃Xc having non-negligible eigenvalues.

The expected value E(log S̃Xc) is estimated by the empirical average µc of the log S̃Xc,i for all training examples Xc,i of the class c. To guarantee that the scattering moments are partially invariant to scaling, we augment the training set by dilating each Xc,i by typically 4 scaling factors {1, √2, 2, 2√2}. In the following, we consider {Xc,i}i as the set of training examples augmented by dilation, which are incorporated in the empirical average estimation µc of E(log S̃Xc).

The principal components space Vc is estimated from the singular value decomposition (SVD) of the matrix of centered training examples log S̃Xc,i − µc. The number of non-zero eigenvectors which can be computed is equal to the total number of training examples. We define Vc as the space generated by all eigenvectors. In texture discrimination applications, it is not necessary to regularize the estimation by reducing the dimension of this space, because there is a small number of training examples.

Given a test image X, we abusively denote by log S̃X the average of the log scattering transforms of X and its dilated versions. It is therefore a scale-averaged scattering transform, which provides a partial scaling invariance. We denote by PVc log S̃X the orthogonal projection of log S̃X on the scattering space Vc of a given class c. The PCA classification computes the class ĉ(X) which minimizes the distance ‖(Id − PVc)(log S̃X − µc)‖ between log S̃X and
the affine space µc + Vc:

ĉ(X) = argmin_c ‖(Id − PVc)(log S̃X − µc)‖² . (43)
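A compact sketch of this classifier on precomputed log-scattering vectors; the class and method names are hypothetical and this is not the ScatNet classification code. Each class is modeled by the affine space µc + Vc spanned by its centered training vectors, and a test vector is assigned to the class minimizing the residual of its projection on that space, as in (43).

```python
import numpy as np

class ScatteringPCAClassifier:
    def fit(self, features_per_class):
        """features_per_class: dict {c: (n_c, d) array of log-scattering training vectors}."""
        self.models = {}
        for c, X in features_per_class.items():
            mu = X.mean(axis=0)
            # Orthonormal basis of V_c from the SVD of the centered training vectors.
            _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
            self.models[c] = (mu, Vt)
        return self

    def predict(self, x):
        """Return argmin_c || (Id - P_{V_c}) (x - mu_c) ||, as in Eq. (43)."""
        def residual(c):
            mu, Vt = self.models[c]
            d = x - mu
            return np.linalg.norm(d - Vt.T @ (Vt @ d))   # distance to the affine space mu_c + V_c
        return min(self.models, key=residual)
```

With few training examples per class, all singular directions are kept, which matches the choice of Vc described above; for larger training sets the dimension would be selected by cross-validation.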
The translation and rotation invariance of a rigid-motion scattering S̃X results from the spatial and angular averaging implemented by the convolution with φ̃J,K. It is nearly translation invariant over spatial domains of size 2^J and over rotations of angles at most 2^K. The parameters J and K can be adjusted by cross-validation. One can also avoid performing any such averaging and let the linear supervised classifier directly optimize the averaging. This last approach is possible only if there are enough supervised training examples to learn the appropriate averaging kernel. This is not the case in the texture experiments of Section 5.2, where few training examples are available, but where the classification task is known to be fully translation and rotation invariant. The values of J and K are thus set to their maximum.
5.2 Texture Classification Experiments
This section details classification results on the image texture datasets KTH-TIPS (KTH TIPS, 2004), UIUC (Lazebnik et al., 2005; UIUC Tex, 2005) and UMD (UMD, 2009). These datasets contain images with different ranges of variability for each geometric transformation type. We give results for progressively more invariant versions of the scattering and compare with state-of-the-art approaches for all datasets.

Most state of the art algorithms use separable invariants to define translation and rotation invariant algorithms, and thus lose joint information on positions and orientations. This is the case of (Lazebnik et al., 2005), where rotation invariance is obtained through histograms along concentric circles, as well as of Log Gaussian Cox processes (COX) (Nguyen et al., 2011) and Basic Image Features (BIF) (Crosier & Griff, 2008), which use rotation invariant patch descriptors calculated from small filter responses. Sorted Random Projection (SRP) (Liu et al., 2011) replaces histograms with a similar sorting algorithm and adds fine scale joint information between orientations and spatial positions by calculating radial and angular differences before sorting.
Fig. 10: Each row shows images from the same texture class in the UIUC database (Lazebnik et al., 2005), with important rotation, scaling and deformation variability.
Wavelet Multifractal Spectrum (WMFS) (Xu et al., 2010) computes wavelet descriptors which are averaged over space and rotations, and are similar to the first order scattering coefficients S1x.
We compare the best published results (Lazebnik et al., 2005; Nguyen et al., 2011; Crosier & Griff, 2008; Xu et al., 2010; Liu et al., 2011) and scattering invariants on the KTH-TIPS (Table 1), UIUC (Table 2) and UMD (Table 3) texture databases. For the KTH-TIPS, UIUC and UMD databases, Tables 1, 2 and 3 give the mean classification accuracy and standard deviation over 200 random splits between training and testing, for different training sizes. Classification accuracy is computed with scattering representations implemented with progressively more invariants, and with the PCA classifier of Section 5.1. As the training sets are small for each class c, the dimension D of the high variability space Vc is set to the training size. The space Vc is thus generated by the D scattering vectors of the training set. For larger training databases, it must be adjusted with a cross validation as in (Bruna & Mallat, 2013).
Train size                     5          20         40
COX (Nguyen et al., 2011)      80.2±2.2   92.4±1.1   95.7±0.5
BIF (Crosier & Griff, 2008)    -          -          98.5
SRP (Liu et al., 2011)         -          -          99.3
Translation scattering         69.1±3.5   94.8±1.3   98.0±0.8
Rigid-motion scattering        69.5±3.6   94.9±1.4   98.3±0.9
+ log & scale invariance       84.3±3.1   98.3±0.9   99.4±0.4

Table 1: Classification accuracy with standard deviations on the (KTH TIPS, 2004) database. Columns correspond to different training sizes per class. The first rows give the best published results. The last rows give results obtained with progressively refined scattering invariants. Best results are bolded.
Training size                      5          10         20
Lazebnik (Lazebnik et al., 2005)   -          92.6       96.0
WMFS (Xu et al., 2010)             93.4       97.0       98.6
BIF (Crosier & Griff, 2008)        -          -          98.8±0.5
Translation scattering             50.0±2.1   65.2±1.9   79.8±1.8
Rigid-motion scattering            77.1±2.7   90.2±1.4   96.7±0.8
+ log & scale invariance           93.3±1.4   97.8±0.6   99.4±0.4

Table 2 Classification accuracy on the (UIUC Tex, 2005) database.
Training size                  5          10         20
WMFS (Xu et al., 2010)         93.4       97.0       98.7
SRP (Liu et al., 2011)         -          -          99.3
Translation scattering         80.2±1.9   91.8±1.4   97.4±0.9
Rigid-motion scattering        87.5±2.2   96.5±1.1   99.2±0.5
+ log & scale invariance       96.6±1.0   98.9±0.6   99.7±0.3

Table 3 Classification accuracy on the (UMD, 2009) database.
Classification accuracies in Tables 1, 2 and 3 are given for different scattering representations. The rows "Translation scattering" correspond to the scattering described in Section 2.3 and initially introduced in (Bruna & Mallat, 2013). The rows "Rigid-motion scattering" replace the translation invariant scattering by the rigid-motion scattering of Section 3.3. Finally, the rows "+ log & scale invariance" correspond to the rigid-motion scattering, with a logarithm non-linearity to linearize scaling, and with the partial scale invariance described in Section 5.1, with augmentation at training and averaging at testing along a limited range of dilations.

(KTH TIPS, 2004) contains 10 classes of 81 samples with controlled scaling, shear and illumination variations but no rotation. The rigid-motion scattering does not degrade results, and the scale invariance provides a significant improvement.

(UIUC Tex, 2005) and (UMD, 2009) both contain 25 classes of 40 samples with uncontrolled deformations including shear, perspective effects and non-rigid deformations.
Training size                                          50
SRP (Liu et al., 2011)                                 48.2
Best single feature (SIFT) in (Sharan et al., 2013)    41.2
Rigid-motion scattering + log on grey images           51.22
Rigid-motion scattering + log on YUV images            53.28

Table 4: Classification accuracy on the (FMD, 2009) database.
For both these databases, rigid-motion scattering and the scale invariance provide considerable improvements over translation scattering. The overall approach achieves and often exceeds state-of-the-art results on all these databases.

(FMD, 2009) contains 10 classes of 100 samples. Each class contains images of the same material manually extracted from Flickr. Unlike the three previous databases, images within a class are not taken from a single physical sample object but come with a variety of material sub-types which can be very different.
Therefore, the PCA classifier of Section 5.1 cannot linearize deformations, and discriminative classifiers tend to give better results. The scattering results reported in Table 4 are obtained with a one versus all linear SVM. Rigid-motion log scattering applied to each channel of the YUV image and concatenated achieves 52.2% accuracy, which is to our knowledge the best for a single feature. Better results can be obtained using multiple features and a feature selection framework (Sharan et al., 2013).
6 Conclusion
Rigid-motion scattering provides stable translation and rotation invariants through a cascade of wavelet transforms along the spatial and orientation variables. We have shown that such joint operators provide tighter invariants than separable operators, which tend to be too strong and thus lose too much information. A wavelet transform on the rigid-motion group has been introduced, with a fast implementation based on two downsampling and filtering cascades. Rigid-motion scattering has been applied to texture classification in the presence of large geometric transformations, and provides state-of-the-art classification results on most texture datasets.

Recent work (Oyallon et al., 2014) has shown that rigid-motion scattering, with an extension to dilations, can also be used for more generic vision tasks such as object recognition, with promising results on the Caltech 101 and 256 datasets. For large scale deep networks, group convolutions might also be useful to learn more structured and meaningful multidimensional filters.
References
G. E. Hinton and R. R. Salakhutdinov, "Reducing the dimensionality of data with neural networks", Science, Vol. 313, no. 5786, pp. 504-507, 28 July 2006.
Y. LeCun, K. Kavukcuoglu and C. Farabet, "Convolutional Networks and Applications in Vision", Proc. of ISCAS, 2010.
T. Poggio, J. Mutch, F. Anselmi, L. Rosasco, J.Z. Leibo, and A. Tacchetti, "The computational magic of the ventral stream: sketch of a theory (and why some deep architectures work)", MIT-CSAIL-TR-2012-035, December 2012.
P. Sermanet, K. Kavukcuoglu, S. Chintala, Y. LeCun, "Pedestrian Detection with Unsupervised Multi-Stage Feature Learning", Proc. of Computer Vision and Pattern Recognition (CVPR), 2013.
A. Krizhevsky, I. Sutskever, and G.E. Hinton, "ImageNet Classification with Deep Convolutional Neural Networks", Proc. of Neural Information Processing Systems (NIPS), 2012.
J. Dean, G.S. Corrado, R. Monga, K. Chen, M. Devin, Q.V. Le, M.Z. Mao, M.A. Ranzato, A. Senior, P. Tucker, K. Yang, A. Y. Ng, "Large Scale Distributed Deep Networks", Proc. of Neural Information Processing Systems (NIPS), 2012.
J. Bruna, S. Mallat, "Invariant Scattering Convolution Networks", Trans. on PAMI, vol. 35, no. 8, pp. 1872-1886, 2013.
S. Mallat, "Group Invariant Scattering", Communications in Pure and Applied Mathematics, vol. 65, no. 10, pp. 1331-1398, 2012.
L. Sifre, S. Mallat, "Combined scattering for rotation invariant texture analysis", Proc. of European Symposium on Artificial Neural Networks (ESANN), 2012.
L. Sifre, S. Mallat, "Rotation, Scaling and Deformation Invariant Scattering for Texture Discrimination", Proc. of Computer Vision and Pattern Recognition (CVPR), 2013.
J. Bruna, S. Mallat, E. Bacry and J-F. Muzy, "Intermittent Process Analysis with Scattering Moments", submitted to Annals of Statistics, Nov 2013.
E. Oyallon, S. Mallat, L. Sifre, "Generic Deep Networks with Wavelet Scattering", submitted to International Conference on Learning Representations (ICLR), 2014.
G. Citti, A. Sarti, "A Cortical Based Model of Perceptual Completion in the Roto-Translation Space", Journal of Mathematical Imaging and Vision, Vol. 24, no. 3, pp. 307-326, 2006.
U. Boscain, J. Duplaix, J.P. Gauthier, F. Rossi, "Anthropomorphic Image Reconstruction via Hypoelliptic Diffusion", SIAM Journal on Control and Optimization, Volume 50, Issue 3, pp. 1071-1733, 2012.
R. Duits, B. Burgeth, "Scale Spaces on Lie Groups", in Scale Space and Variational Methods in Computer Vision, Springer Lecture Notes in Computer Science, Vol. 4485, pp. 300-312, 2007.
R. Duits, E. Franken, "Left-Invariant Diffusions on the Space of Positions and Orientations and their Application to Crossing-Preserving Smoothing of HARDI images", International Journal of Computer Vision, Volume 92, Issue 3, pp. 231-264, 2011.
T. Leung, J. Malik, "Representing and Recognizing the Visual Appearance of Materials using Three-dimensional Textons", International Journal of Computer Vision, Volume 43, Issue 1, pp. 29-44, 2001.
R. Girshick, J. Donahue, T. Darrell, J. Malik, "Rich feature hierarchies for accurate object detection and semantic segmentation", arXiv preprint:1311.2524, 2013.
D. Lowe, "Distinctive image features from scale-invariant keypoints", IJCV, 60(4):91-110, 2004.
S. Lazebnik, C. Schmid and J. Ponce, "A sparse texture representation using local affine regions", Trans. on PAMI, vol. 27, no. 8, pp. 1265-1278, 2005.
S. Lazebnik, C. Schmid and J. Ponce, "Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories", Proc. of Computer Vision and Pattern Recognition (CVPR), 2006.
E. Tola, V. Lepetit, P. Fua, "DAISY: An Efficient Dense Descriptor Applied to Wide Baseline Stereo", Trans. on PAMI, Vol. 32, Nr. 5, pp. 815-830, 2010.
H.-G. Nguyen, R. Fablet, and J.-M. Boucher, "Visual textures as realizations of multivariate log-Gaussian Cox processes", Proc. of Computer Vision and Pattern Recognition (CVPR), 2011.
M. Crosier and L.D. Griffin, "Texture classification with a dictionary of basic image features", Proc. of Computer Vision and Pattern Recognition (CVPR), 2008.
Y. Xu, X. Yang, H. Ling and H. Ji, "A new texture descriptor using multifractal analysis in multi-orientation wavelet pyramid", Proc. of Computer Vision and Pattern Recognition (CVPR), 2010.
L. Liu, P. Fieguth, G. Kuang, H. Zha, "Sorted Random Projections for Robust Texture Classification", Proc. of ICCV, 2011.
G. Zhao, T. Ahonen, J. Matas, M. Pietikäinen, "Rotation-invariant image and video description with local binary pattern features", Trans. on Image Processing, 21(4):1465-1467, 2012.
L. Sharan, R. Rosenholtz, E. H. Adelson, "Material perception: What can you see in a brief glance?", Journal of Vision, 9(8):784, 2009.
L. Sharan, C. Liu, R. Rosenholtz, E. H. Adelson, "Recognizing Materials Using Perceptually Inspired Features", International Journal of Computer Vision, Volume 103, Issue 3, pp. 348-371, 2013.
G. Yu and J.M. Morel, "A Fully Affine Invariant Image Comparison Method", Proc. of International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Taipei, 2009.
D. L. Donoho, G. Kutyniok, M. Shahram and X. Zhuang, "A Rational Design of Discrete Shearlet Transform", Proc. of SampTA'11 (Singapore), 2011.
S. Mallat, "A Wavelet Tour of Signal Processing, 3rd ed.", Academic Press, 2008.
L. W. Renninger and J. Malik, "When is scene recognition just texture recognition?", Vision Research, 44, pp. 2301-2311, 2004.
KTH-TIPS: http://www.nada.kth.se/cvap/databases/kth-tips/
UIUC: http://www-cvr.ai.uiuc.edu/ponce_grp/data/
UMD: http://www.cfar.umd.edu/~fer/website-texture/texture.htm
FMD: http://people.csail.mit.edu/celiu/CVPR2010/FMD/
ScatNet, a MATLAB toolbox for scattering networks: http://www.di.ens.fr/data/software/scatnet/