Model Transport: Towards Scalable Transfer Learning on Manifolds (Paper Review)

2014/8/2 @_kohta (Kohta Ishikawa)

Transcript
Page 1

Model Transport: Towards Scalable Transfer Learning on Manifolds

Paper Review

2014/8/2 @_kohta

Page 2

Outline

• Problem setting
• Transfer Learning
• Manifolds, tangent spaces, Parallel Transport
• Transporting Methods
• Results
• Impressions

Page 3

Problem Setting

• Problem
  – Domain 1 has plenty of data, but domain 2, some distance away from it, has only a little
  – Can we somehow build a good model for domain 2?

• Conditions
  – We know how the domains are displaced relative to each other
  – Moving the data itself is expensive

Model Transport: Towards Scalable Transfer Learning on Manifolds

Oren Freifeld (MIT, Cambridge, MA, [email protected])
Søren Hauberg (DTU Compute, Lyngby, [email protected])
Michael J. Black (MPI for Intelligent Systems, Tübingen, [email protected])

Abstract

We consider the intersection of two research fields: transfer learning and statistics on manifolds. In particular, we consider, for manifold-valued data, transfer learning of tangent-space models such as Gaussian distributions, PCA, regression, or classifiers. Though one would hope to simply use ordinary R^n transfer-learning ideas, the manifold structure prevents it. We overcome this by basing our method on inner-product-preserving parallel transport, a well-known tool widely used in other problems of statistics on manifolds in computer vision. At first, this straightforward idea seems to suffer from an obvious shortcoming: transporting large datasets is prohibitively expensive, hindering scalability. Fortunately, with our approach, we never transport data. Rather, we show how the statistical models themselves can be transported, and prove that for the tangent-space models above, the transport "commutes" with learning. Consequently, our compact framework, applicable to a large class of manifolds, is not restricted by the size of either the training or test sets. We demonstrate the approach by transferring PCA and logistic-regression models of real-world data involving 3D shapes and image descriptors.

1. Introduction

In computer vision, manifold-valued data arise often. The advantages of representing such data explicitly on a manifold include a compact encoding of constraints, distance measures that are usually superior to ones from R^n, and consistency. For such data, statistical modeling on the manifold is generally better than statistical modeling in a Euclidean space [12, 18, 29, 39]. Here we consider the first scalable generalization, from R^n to Riemannian manifolds, of certain types of transfer learning (TL). In particular, we consider TL in the context of several popular tangent-space models such as Gaussian distributions (Fig. 1b), PCA, classifiers, and simple linear regression. In so doing, we recast TL on manifolds as TL between tangent spaces. This generalizes those R^n-TL tasks where models learned in one region of R^n are utilized in another. Note, however, that we do not claim that all R^n-TL tasks have this form.

Figure 1: Model Transport for covariance estimation. Panels: (a) Data on a manifold, (b) Data models, (c) Ordinary translation, (d) Model transport. On nonlinear manifolds, statistics of one class (red) are transported to improve a statistical model of another class (blue). While ordinary translation is undefined (c), and data transport is expensive, a model can be inexpensively transported (green) while preserving data statistics (d).

Let M denote an n-dimensional manifold and let T_pM and T_qM denote two tangent spaces to M, at points p, q ∈ M. One cannot simply apply models learned in T_pM to data in T_qM as these, despite both being isomorphic to R^n, are two different spaces: a model on T_pM is usually not even defined in T_qM; see Fig. 1c. Such obstacles, caused by the curvature of M, do not arise in R^n-TL. To address this we could parallel transport (PT) [7] the data from T_pM to T_qM, learn a model of the transported data in T_qM, and use ... (the excerpt is cut off here)

Page 4

Transfer Learning

• We want to use a model learned in one domain in another domain
  – Learn a "correction" for this purpose
  – Many different problem settings and methods exist…

Page 5

Manifolds, Tangent Spaces, Parallel Transport

• Manifolds
  – A manifold is something like a generalization of a "curved surface"

Page 6

Manifolds, Tangent Spaces, Parallel Transport

• Manifolds
  – A manifold is something like a generalization of a "curved surface"
  – A point on a manifold embedded in Euclidean space (a sphere, say) can be represented as a vector, but such vectors cannot be added or subtracted

(Figure annotation: the sum is not a point on the manifold! Nor can vectors at different points be compared easily!)

Page 7

Manifolds, Tangent Spaces, Parallel Transport

• Meaningful data live on a manifold

M. Alex O. Vasilescu, http://alumni.media.mit.edu/~maov/research_index.html

Page 8

Manifolds, Tangent Spaces, Parallel Transport

• Meaningful data live on a manifold

(The slide shows an excerpt from the paper cited below; panels: "Original Sequence", "Parallel Transport on the Manifold", "Parallel Transport on the Euclidean Space".)

Fig. 7: Comparison between the results of parallel transport on the manifold versus that of Euclidean space. The first sequence (its tangent vector from the leftmost to the rightmost shape) is parallel transported to the face on the second and third row, and the new sequence is synthesized on the manifold (second row) and Euclidean space (third row).

Fig. 6: A sequence of facial expression is a curve on the Grassmann manifold.

(In the excerpted experiments, action-unit recognition on the Grassmann manifold reached 83%, versus 79% with ordinary parallel transport in Euclidean space.)

S. Taheri, et al.: Towards View-Invariant Expression Analysis Using Analytic Shape Manifolds

Page 9

Manifolds, Tangent Spaces, Parallel Transport

• Manifolds
  – More precisely: a topological space on which a local coordinate system (a homeomorphism to Euclidean space) can be defined at each point
  – For applications such as vision, thinking of it as a set equipped with "neighborhoods" is probably good enough

Page 10

Manifolds, Tangent Spaces, Parallel Transport

• Tangent spaces
  – At any single point we can consider a flat space (a hyperplane) that "touches" the manifold there; this is called the tangent space
  – Since a tangent space is a vector space, all the familiar operations are available on it
  – In practice, algorithms are usually formulated on tangent spaces

Exponential Map (slide excerpt, http://www.inf.ethz.ch/personal/lballan/teaching.html)

• Given a Lie group G, with its related Lie algebra g = T_G(I), there always exists a smooth map from the Lie algebra g to the Lie group G called the exponential map. exp(A) is the point in G that can be reached by traveling along the geodesic passing through the identity in direction A, for a unit of time (note: A also defines the traveling speed). The figure illustrates this for SO(n) and so(n).

Annotations on the figure (manifold M, tangent space $T_IM$):
– exp: maps a point of the tangent space (around the identity I) to the manifold (a map under which straight lines in the tangent space become geodesics on the manifold)
– log(a): maps a point a on the manifold to the tangent space (around the identity I) (the inverse of the exponential map)

Page 11

Manifolds, Tangent Spaces, Parallel Transport

• Tangent spaces
  – Maps going back and forth between the manifold and a tangent space can sometimes be defined
  – Exponential map: tangent space -> manifold
  – Logarithm map: manifold -> tangent space

(Same Exponential Map illustration as on Page 10.)

Page 12

Manifolds, Tangent Spaces, Parallel Transport

• Tangent spaces
  – The exp map and friends are defined through their relation to geodesics
  – Defined so that straight lines in the tangent space map to geodesics on the manifold
  – Hence determined by the distance structure (metric) of the manifold

(Same Exponential Map illustration as on Page 10.)

Page 13

Manifolds, Tangent Spaces, Parallel Transport

• Lie groups
  – A Lie group is a set that is "both a group and a manifold"
  – The group operations are maps on the manifold, and the tangent space (Lie algebra), exp map, etc. can often be constructed explicitly, which is convenient in practice
  – The manifolds that appear in vision applications are (surfaces aside) often Lie groups

Definition of a group: a set $G$ with an operation $\circ : G \times G \to G$ such that, for $g, h, k \in G$:

$g \circ (h \circ k) = (g \circ h) \circ k$ (associativity)
$\exists\, e \in G : g \circ e = e \circ g = g$ (existence of an identity)
$\exists\, g^{-1} \in G : g \circ g^{-1} = g^{-1} \circ g = e$ (existence of inverses)

Page 14

Manifolds, Tangent Spaces, Parallel Transport

• Matrices and Lie groups
  – Many Lie groups encountered in practice are matrix groups: they form a group under matrix multiplication (subgroups of the general linear group)
  – For such groups, the matrix exponential is the natural exponential map (see the sketch below):

$\exp(X) = \sum_{n=0}^{\infty} \frac{1}{n!} X^n$
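To make this concrete, here is a minimal sketch (assuming NumPy and SciPy; the function name exp_series is my own) that evaluates the truncated series and checks it against scipy.linalg.expm for a skew-symmetric matrix, whose exponential lands in SO(3):

```python
import numpy as np
from scipy.linalg import expm

def exp_series(X, terms=30):
    """Truncated power series exp(X) ~= sum_{n=0}^{terms-1} X^n / n!."""
    acc = np.eye(X.shape[0])
    term = np.eye(X.shape[0])
    for n in range(1, terms):
        term = term @ X / n      # maintains term = X^n / n!
        acc = acc + term
    return acc

# A skew-symmetric X lies in so(3), so exp(X) should be a rotation.
X = np.array([[ 0.0, -0.3,  0.2],
              [ 0.3,  0.0, -0.1],
              [-0.2,  0.1,  0.0]])
R = exp_series(X)
assert np.allclose(R, expm(X))           # agrees with SciPy's expm
assert np.allclose(R @ R.T, np.eye(3))   # orthogonal: R is in SO(3)
```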

Page 15

Manifolds, Tangent Spaces, Parallel Transport

• Lie groups
  – Example of a Lie group: SO(3) (the 3D rotation group)

$SO(3) = \{R \mid R \in \mathbb{R}^{3\times3},\; RR^T = I,\; |R| = 1\}$

  – so(3): the Lie algebra of SO(3) (the tangent space around the identity I)

$\Omega = \begin{pmatrix} 0 & -\omega_1 & \omega_2 \\ \omega_1 & 0 & -\omega_3 \\ -\omega_2 & \omega_3 & 0 \end{pmatrix}, \quad \omega_1, \omega_2, \omega_3 \in \mathbb{R}$

  – Closed-form exp and log maps, with $|\Omega| = \sqrt{\omega_1^2 + \omega_2^2 + \omega_3^2}$ (see the sketch below):

$\exp(\Omega) = I + \frac{\sin|\Omega|}{|\Omega|}\,\Omega + \frac{1 - \cos|\Omega|}{|\Omega|^2}\,\Omega^2$

$\log R = \frac{\theta}{2\sin\theta}\,(R - R^T), \quad \theta = \cos^{-1}\!\left(\frac{\mathrm{Tr}(R) - 1}{2}\right)$
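A minimal sketch of these closed-form maps, assuming NumPy; so3_exp and so3_log are my own names, and the small-angle branches guard against division by zero:

```python
import numpy as np

def so3_exp(Omega):
    """exp(Omega) = I + sin|O|/|O| O + (1-cos|O|)/|O|^2 O^2 (Rodrigues)."""
    theta = np.linalg.norm([Omega[2, 1], Omega[0, 2], Omega[1, 0]])  # |Omega|
    if theta < 1e-12:
        return np.eye(3) + Omega                 # small-angle limit
    return (np.eye(3)
            + np.sin(theta) / theta * Omega
            + (1.0 - np.cos(theta)) / theta**2 * Omega @ Omega)

def so3_log(R):
    """log(R) = theta/(2 sin theta) (R - R^T), theta = arccos((Tr R - 1)/2)."""
    theta = np.arccos(np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0))
    if theta < 1e-12:
        return np.zeros((3, 3))
    return theta / (2.0 * np.sin(theta)) * (R - R.T)

# Round trip: log is the inverse of exp (for rotation angles below pi).
Omega = np.array([[ 0.0, -0.3,  0.2],
                  [ 0.3,  0.0, -0.1],
                  [-0.2,  0.1,  0.0]])
assert np.allclose(so3_log(so3_exp(Omega)), Omega)
```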

Page 16

Manifolds, Tangent Spaces, Parallel Transport

• Parallel Transport
  – Moves a tangent vector (an element of a tangent space) along the manifold
  – Defined by a quantity called a connection
    • Many different connections are possible; usually one uses the connection that preserves inner products of tangent vectors (the Levi-Civita connection)
  – Moving tangent vectors = moving tangent-space models

Figure 5.3: Parallel transport of $x_0$ along the geodesic between p and q. (Figure labels: $x_0 \in T_pM$ at the point p, $x_1 \in T_qM$ at the point q.)

1. $\Gamma_{c(0)\to c(0)}(x_0) = x_0$.

2. Let $u$, $s$ and $t$ be in $[0, 1]$. If $\Gamma_{c(u)\to c(t)}$ and $\Gamma_{c(s)\to c(u)}$ are defined in a similar way to $\Gamma_{c(0)\to c(t)}$, then $\Gamma_{c(u)\to c(t)} \circ \Gamma_{c(s)\to c(u)} = \Gamma_{c(s)\to c(t)}$.

3. If $x_0$ is fixed and $t$ varies, then $\Gamma_{c(0)\to c(t)}(x_0) : [0, 1] \to \mathbb{R}^n$ is a smooth function of $t$.

4. If $x$ and $y$ are in $T_pM$, then, for every $t$, their inner product is preserved:

$\langle x, y \rangle_{c(0)} = \big\langle \Gamma_{c(0)\to c(t)}(x),\, \Gamma_{c(0)\to c(t)}(y) \big\rangle_{c(t)}. \tag{5.18}$

Consequently, norms of tangent vectors and angles between tangent vectors are preserved; see Fig. 5.4.

5. $\Gamma_{c(0)\to c(t)} : T_{c(0)}M \to T_{c(t)}M$ is a bijective linear map.

5.4.2 The General Case

We now proceed to the more general case. Let M be a geodesically-complete Riemannian manifold. To see how a covariance can be moved across M, consider first tangent vectors. While simple translation will not do, we can transport vectors from one tangent space to another using parallel transport, defined as follows. (Footnote: another way to define it is through the notion of a connection, a term we avoid elaborating on; see [27].)

$\Gamma_c^{s\to t} : T_{c(s)}M \to T_{c(t)}M$ : the map taking a vector at the point c(s) on a curve c to a vector at the point c(t)

Page 17

Model Transport

• Karcher mean (Fréchet mean)
  – The mean of points $\{X_i\}$ on a manifold, defined as the minimizer of the sum of squared geodesic distances:

$\bar{X} \equiv \operatorname*{argmin}_{X} \sum_i d(X, X_i)^2$

(Slide excerpt, "SO(2) and SO(3): Tangent Spaces": SO(2) is a 1-manifold whose tangent spaces are 1-dimensional vector spaces; SO(3) is a 3-manifold whose tangent spaces are 3-dimensional vector spaces; in both cases the tangent vectors are matrices.)

Efficient computation: X. Pennec, Probabilities and statistics on Riemannian manifolds: Basic tools for geometric measurements (see the sketch below)
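A minimal sketch of the standard fixed-point iteration for the Karcher mean, here on SO(3), in the spirit of the Pennec reference above. The so3_exp/so3_log helpers from the Page 15 sketch are repeated so the snippet runs on its own; the tolerance and iteration cap are arbitrary choices:

```python
import numpy as np

def so3_exp(Omega):                      # Rodrigues, as in the Page 15 sketch
    th = np.linalg.norm([Omega[2, 1], Omega[0, 2], Omega[1, 0]])
    if th < 1e-12:
        return np.eye(3) + Omega
    return np.eye(3) + np.sin(th)/th*Omega + (1-np.cos(th))/th**2 * Omega @ Omega

def so3_log(R):
    th = np.arccos(np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0))
    if th < 1e-12:
        return np.zeros((3, 3))
    return th / (2.0 * np.sin(th)) * (R - R.T)

def karcher_mean(Rs, iters=100, tol=1e-12):
    """Average the data in the tangent space at the current estimate,
    then map the average back onto the manifold; repeat to convergence."""
    mu = Rs[0]
    for _ in range(iters):
        delta = sum(so3_log(mu.T @ R) for R in Rs) / len(Rs)
        mu = mu @ so3_exp(delta)
        if np.linalg.norm(delta) < tol:
            break
    return mu

# Mean of rotations about the z-axis recovers the middle angle.
def rot_z(t):
    return np.array([[np.cos(t), -np.sin(t), 0.0],
                     [np.sin(t),  np.cos(t), 0.0],
                     [0.0, 0.0, 1.0]])

mu = karcher_mean([rot_z(0.1), rot_z(0.2), rot_z(0.3)])
assert np.allclose(mu, rot_z(0.2))
```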

Page 18

Model Transport

• Numerical computation of parallel transport
  – Schild's Ladder (see the sketch after the excerpt below)

Figure 5.6: An illustration of Schild's Ladder for approximating the parallel transport (for K = 3). See text for details.

5.4.4.2 Schild's ladder

When there is no closed-form solution, we use Schild's ladder [106], a strikingly simple numerical technique to compute an arbitrarily accurate approximation of the LC parallel transport using only the Exp/Log maps. (This is not true for every parallel transport, but it does hold at least for parallel transport associated with a symmetric connection [82] such as LC, the only connection which is both metric and symmetric. Note also that if Exp/Log maps are unavailable analytically, then Exp maps can be computed by integrating an initial value problem, and geodesics/Log maps can be computed by solving a boundary value problem [111].) We are not the first to use Schild's ladder in computer vision applications (although we are the first to use it in the context of transfer learning). For example, the technique has been recently used in modeling longitudinal medical data [96, 115] and in tracking [66].

Schild's ladder is a numerical scheme that enables computation of the LC parallel transport [106]. We wish to parallel transport a tangent vector $v_0$ from $x_0$ to $x_K$ on M along the geodesic curve $\alpha$ that joins them. Schild's ladder places points along $\alpha$ and approximately parallel transports $v_0$ to these by forming generalized parallelograms (Levi-Civita parallelogramoids) on M (see Fig. 5.6): let $\{x_1, \ldots, x_{K-1}\}$ denote points along $\alpha$. Start by computing $a_0 = \mathrm{Exp}_{x_0}(v_0)$ and the midpoint $b_1$ of the geodesic segment joining $x_1$ and $a_0$. Follow the geodesic from $x_0$ through $b_1$ for twice its length to the point $a_1$. This scheme is repeated for all sampled points along the geodesic from $x_0$ to $x_K$. The final parallel transport of $v_0$ ...

M. Lorenzi, et al.: Schild's ladder for the parallel transport of deformations in time series of images, IPMI 2011
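A minimal sketch of this ladder on the unit sphere S², where the Exp/Log maps have simple closed forms; all names are mine, and K controls the number of rungs and hence the accuracy. The example transports a vector from the north pole to the equator and checks that its norm is (approximately) preserved, as Levi-Civita transport requires:

```python
import numpy as np

def sphere_exp(p, v):
    """Exp map on the unit sphere at p (v tangent, i.e. v . p = 0)."""
    n = np.linalg.norm(v)
    return p if n < 1e-12 else np.cos(n) * p + np.sin(n) * v / n

def sphere_log(p, q):
    """Log map on the unit sphere: the tangent vector at p pointing to q."""
    w = q - np.dot(p, q) * p                    # project q into T_p S^2
    n = np.linalg.norm(w)
    if n < 1e-12:
        return np.zeros_like(p)
    return np.arccos(np.clip(np.dot(p, q), -1.0, 1.0)) * w / n

def schilds_ladder(p, q, v, K=20):
    """Approximate LC parallel transport of v from p to q along the
    geodesic, chaining K geodesic parallelograms as described above."""
    xs = [sphere_exp(p, sphere_log(p, q) * k / K) for k in range(K + 1)]
    a = sphere_exp(xs[0], v)                                 # a_0 = Exp_{x0}(v0)
    for k in range(1, K + 1):
        b = sphere_exp(xs[k], 0.5 * sphere_log(xs[k], a))    # midpoint rung
        a = sphere_exp(xs[k - 1], 2.0 * sphere_log(xs[k - 1], b))
    return sphere_log(xs[K], a)                              # vector at q

p = np.array([0.0, 0.0, 1.0])          # north pole
q = np.array([1.0, 0.0, 0.0])          # on the equator
v = np.array([0.0, 0.1, 0.0])          # tangent at p
w = schilds_ladder(p, q, v)
# True LC transport along this meridian keeps the vector at (0, 0.1, 0);
# the ladder matches it up to a small bias that shrinks with |v|.
assert abs(np.linalg.norm(w) - np.linalg.norm(v)) < 1e-3
assert np.linalg.norm(w - np.array([0.0, 0.1, 0.0])) < 2e-2
```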

Page 19

Transporting Methods

• Task 1
  – Domain L has plenty of data $\{x_i^L\}_{i=1}^{N_L}$, but the transfer-target domain S has only a few samples $\{x_i^S\}_{i=1}^{N_S}$ (enough to compute the mean)
  – We want the covariance matrix corresponding to $\{x_i^S\}$

• Task 2
  – We have labeled data (discrete or continuous labels) $\{x_i^A\}_{i=1}^{N_A}$ with $y_i^A = \mathrm{label}(x_i^A)$, and unlabeled data $\{x_i^B\}_{i=1}^{N_B}$
  – We want the labels of $\{x_i^B\}$ (classification, regression)

Covariance matrix in the tangent space (see the sketch below):

$\mathrm{Cov}(\{p_i\}_{i=1}^{N}) = \frac{1}{N-1} \sum_i \mathrm{Log}_\mu(p_i)\,\mathrm{Log}_\mu(p_i)^T$

P. T. Fletcher, et al.: Principal Geodesic Analysis for the Study of Nonlinear Statistics of Shape, Trans. Med. Imag. (2004)
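As a concrete instance of this formula, a minimal sketch on the unit sphere (sphere_log as in the Page 18 sketch, repeated here; µ would normally be the Karcher mean of the samples):

```python
import numpy as np

def sphere_log(p, q):
    """Log map on the unit sphere (as in the Page 18 sketch)."""
    w = q - np.dot(p, q) * p
    n = np.linalg.norm(w)
    if n < 1e-12:
        return np.zeros_like(p)
    return np.arccos(np.clip(np.dot(p, q), -1.0, 1.0)) * w / n

def tangent_covariance(mu, points):
    """Cov({p_i}) = 1/(N-1) sum_i Log_mu(p_i) Log_mu(p_i)^T, on T_mu M."""
    X = np.stack([sphere_log(mu, p) for p in points])  # rows: tangent vectors
    return X.T @ X / (len(points) - 1)

# Principal geodesic directions are the top eigenvectors of this matrix:
# evals, evecs = np.linalg.eigh(tangent_covariance(mu, points))
```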

Page 20

Transporting Methods

• Data Transport
  – Transport the data of domain A to around domain B
    • Expensive when there are many data points

• Basis Transport
  – Transport the (tangent-space) basis vectors that make up the model to around domain B
    • Expensive when the model has many dimensions

Page 21

Transporting Methods

• Model Transport (proposed)
  – Only the few vectors involved in the model need to be transported
    • (Though in practice it often does not seem to end up being just a few…)
  – The basic idea is to transport only the (tangent) vectors that make up the model

(The paper's front page is shown again; see Page 3 for the abstract and introduction excerpt.)

Page 22

Transporting Methods

• PCA Transport
  – Points before and after transport: $p, q \in M$
  – Data in the tangent space around p: $\{x_i\}_{i=1}^{N} \subset T_pM$
  – Transported data in the tangent space around q: $\{\tilde{x}_i\}_{i=1}^{N} \subset T_qM$
  – PCA / SVD model in $T_pM$:

$X = [x_1, \ldots, x_N], \quad X = VSU^T, \quad XX^T = VS^2V^T, \quad V = [v_1, \ldots, v_n]$

Page 23

Transporting Methods

• PCA Transport
  – PCA / SVD model in $T_pM$:

$X = [x_1, \ldots, x_N], \quad X = VSU^T, \quad XX^T = VS^2V^T, \quad V = [v_1, \ldots, v_n]$

  – The PCA / SVD model of the transported data $\tilde{X} = [\tilde{x}_1, \ldots, \tilde{x}_N]$ in $T_qM$ is then given by transporting the basis vectors alone, $\tilde{V} = [\tilde{v}_1, \ldots, \tilde{v}_n]$:

$\tilde{X} = \tilde{V}SU^T, \quad \tilde{X}\tilde{X}^T = \tilde{V}S^2\tilde{V}^T$

(the singular values S and coefficients U are unchanged; see the sketch below)
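The claim is that transport commutes with learning: because parallel transport Γ is a linear isometry between tangent spaces, transporting the basis vectors of V yields exactly the PCA model one would get by transporting every data point and re-fitting. A minimal sketch with Γ mocked as a random orthogonal matrix (any linear isometry behaves identically here):

```python
import numpy as np

rng = np.random.default_rng(0)
n, N = 8, 100
X = rng.normal(size=(n, N))          # columns x_i: tangent vectors in T_pM
V, S, UT = np.linalg.svd(X, full_matrices=False)   # model learned at p

Gamma = np.linalg.qr(rng.normal(size=(n, n)))[0]   # mock PT: T_pM -> T_qM

# Model transport: move only the n basis vectors.
V_t = Gamma @ V

# Data transport + re-learning gives the same spectrum and (up to sign)
# the same transported basis, so the expensive route is unnecessary.
V2, S2, _ = np.linalg.svd(Gamma @ X, full_matrices=False)
assert np.allclose(S, S2)
assert np.allclose(np.abs(V_t.T @ V2), np.eye(n))
```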

Page 24

Transporting Methods

• Linear Regression Transport
  – Regression model on the tangent space:

$(\beta, \beta_0) = \operatorname*{argmin}_{\alpha \in T_pM,\, \alpha_0 \in \mathbb{R}} \sum_{i=1}^{N} l_i(x_i^T\alpha + \alpha_0)$, where $l_i$ is the loss function

  – The model in $T_qM$ is then given by (see the sketch below):

$\tilde{\beta} = A_q L A_p^{-1}\beta, \qquad \tilde{\beta}_0 = \beta_0$

$A_p, A_q$: the metric tensors at the points p and q
$L$: the parallel transport from p to q (a linear map)
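A minimal sketch of this coefficient rule, with $A_p$, $A_q$, and $L$ all mocked ($L$ is constructed so that it satisfies the isometry condition $L^T A_q L = A_p$): the transported coefficients reproduce the original predictions on transported inputs, which is the point of the rule. The intercept $\beta_0$ is carried over unchanged.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5

# Mock SPD metric tensors at p and q.
Bp, Bq = rng.normal(size=(n, n)), rng.normal(size=(n, n))
Ap, Aq = Bp @ Bp.T + n * np.eye(n), Bq @ Bq.T + n * np.eye(n)

# Mock parallel transport L : T_pM -> T_qM with L^T A_q L = A_p.
Q = np.linalg.qr(rng.normal(size=(n, n)))[0]
Cp, Cq = np.linalg.cholesky(Ap), np.linalg.cholesky(Aq)
L = np.linalg.inv(Cq).T @ Q @ Cp.T
assert np.allclose(L.T @ Aq @ L, Ap)      # L is an isometry of the metrics

beta = rng.normal(size=n)                 # regression coefficients at p
beta_t = Aq @ L @ np.linalg.inv(Ap) @ beta   # transported coefficients

x = rng.normal(size=n)                    # any tangent vector at p
assert np.allclose(x @ beta, (L @ x) @ beta_t)   # identical predictions
```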

Page 25

Transporting Methods

• Models used in practice (see the sketch below):
  – $\Sigma_L, V_L$: the model in the original domain
  – $\Sigma_S, V_S$: the model built from the target-domain data alone
  – $\Sigma_\Gamma, V_\Gamma$: the model from L, transported
  – $V_F = \mathrm{orthogonalize}([V_\Gamma, V_S])$: the model keeping only the top k dimensions (the same dimensionality as the other models), using both $V_\Gamma$ and $V_S$
  – $\Sigma_\gamma, V_\gamma$, with $\Sigma_\gamma = \gamma\,\Sigma_\Gamma + (1 - \gamma)\,\Sigma_S$: the model combining the Γ and S models by shrinkage estimation
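A minimal sketch of these two combination rules, with my own selection heuristic for the fused basis (the paper specifies the construction in its Sec. 4.4; here the top k directions are simply chosen by how much variance of the local data they capture):

```python
import numpy as np

def shrinkage(Sigma_Gamma, Sigma_S, gamma):
    """Sigma_gamma = gamma * Sigma_Gamma + (1 - gamma) * Sigma_S."""
    return gamma * Sigma_Gamma + (1.0 - gamma) * Sigma_S

def fuse_bases(V_Gamma, V_S, X_S, k):
    """V_F: orthogonalize [V_Gamma, V_S], then keep the k directions
    capturing the most variance of the local data X_S (columns)."""
    Q, _ = np.linalg.qr(np.hstack([V_Gamma, V_S]))
    score = np.sum((Q.T @ X_S) ** 2, axis=1)
    return Q[:, np.argsort(score)[::-1][:k]]
```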

Page 26

Results

• Human body model (mesh) 1
  – A Lie group with n = 129300 dimensions
  – Two domains differing by gender (transfer from women to men)
    • 1000 female samples, 50 male samples
  – PCA models
    • Female model: 200 dimensions; male model: 50 dimensions
    • Evaluated on 1000 test samples $\{z_i\}$: error between the model-based meshing and the ground truth (error defined by geodesic distance)
    • The model's reconstruction is $\mathrm{Exp}_\mu(VV^T\mathrm{Log}_\mu(z_i)) \in M$ (see the sketch below)

O. Freifeld and M. J. Black: Lie Bodies: A Manifold Representation of 3D Human Shape
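A minimal sketch of this reconstruction and its squared geodesic error (SGE), on the unit sphere as a stand-in for the 129300-dimensional shape manifold; sphere_exp/sphere_log are as in the Page 18 sketch, repeated here, and V is an orthonormal basis of the model subspace in $T_\mu M$:

```python
import numpy as np

def sphere_exp(p, v):
    n = np.linalg.norm(v)
    return p if n < 1e-12 else np.cos(n) * p + np.sin(n) * v / n

def sphere_log(p, q):
    w = q - np.dot(p, q) * p
    n = np.linalg.norm(w)
    if n < 1e-12:
        return np.zeros_like(p)
    return np.arccos(np.clip(np.dot(p, q), -1.0, 1.0)) * w / n

def reconstruct(mu, V, z):
    """Exp_mu(V V^T Log_mu(z)): project onto the model subspace, map back."""
    return sphere_exp(mu, V @ (V.T @ sphere_log(mu, z)))

def sge(z, z_hat):
    """Squared geodesic error between a test point and its reconstruction."""
    return np.arccos(np.clip(np.dot(z, z_hat), -1.0, 1.0)) ** 2
```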

Page 27

Results

• Human body model (mesh)
  – A Lie group with n = 129300 dimensions
  – Two domains differing by gender (transfer from women to men)

O. Freifeld and M. J. Black: Lie Bodies: A Manifold Representation of 3D Human Shape

(Paper excerpt shown on the slide:)

Figure 3: Summary for shape experiments. Left: Gender. Right: BMI. The bars represent the overall reconstruction error for V_L, V_S, V_Γ, and V_F. For a given model, the height of the bar represents the reconstruction error measured in terms of SGE averaged over the entire test dataset as well as all of the mesh triangles.

Figure 4: Model mean error: Genders. Panels: (a) V_L, (b) V_S, (c) V_Γ, (d) V_F, (e) V_γ. Blue and red indicate small and large errors respectively. The heat maps are overlaid over the points of tangency associated with the models: p for (a), and q for (b-e). See text for details.

... copies of a 6-dimensional Lie group, which is isomorphic to the product of three smaller ones, including SO(3); thus, n = 129300. While here we do not advocate a particular manifold nor does our work focus on shape spaces, this M enables us to easily demonstrate the MT framework. The data consist of aligned 3D scans of real people [31]. (Footnote: MT also applies to some shape spaces that do not require alignment.) On this M, the LC PT is computed as follows: for the SO(3) components of M, a closed-form solution is available [9], while for the rest we use Schild's ladder (see, e.g., [17, 24]).

From Venus to Mars. We first illustrate the surprising power of MT. The training data contains N_L = 1000 shapes of women (Fig. 1a, red; shown here on a 2D manifold for illustration) but only N_S = 50 shapes of men (blue), where all shapes are represented as points on M. As it is reasonable to expect some aspect of shape variation among women may apply to men as well, we model the shape variation of men while leveraging that of women. We first compute the Karcher means for women and men, denoted p and q, respectively (Fig. 2a-2b). We then compute their PCA models, V_L ⊂ T_pM and V_S ⊂ T_qM (k_L = 200 and k_S = 50), as well as V_Γ = Γ(V_L). For an animated illustration see [13]. We also compute V_F and V_γ using the procedures from Sec. 4.4. We evaluate performance on 1000 test male shapes, whose deformations serve as ground truth. Let V ∈ {V_L, V_S, V_Γ, V_F, V_γ}. Let µ denote the point of tangency; i.e., p for V_L and q otherwise. Let z_i ∈ M denote the true deformation of test example i. Its reconstruction is Exp_µ(VV^T Log_µ(z_i)) ∈ M. We then compute ... (the excerpt continues on Page 30)

Figure 5: Selected results: Gender. Each column represents a different test body (columns: Ground Truth, V_L (Women), V_S (Men), V_Γ (PT), V_F (Fuse)). The heat maps are overlaid on the reconstructions using different models.

Figure 6: Model mean error: BMI. Analogous to Fig. 4.

(Annotations: V_L: the female model used as-is; V_S: a model built from the male samples; V_Γ: the female model transported; V_F: the fusion model; V_γ: covariance shrinkage estimate)

Page 28

Results

• Human body model (mesh) 2
  – A Lie group with n = 129300 dimensions
  – Two domains differing by BMI (transfer from BMI ≤ 30 to BMI > 30)

(Annotations: V_L: the BMI ≤ 30 model used as-is; V_S: a model built from the BMI > 30 samples; V_Γ: the BMI ≤ 30 model transported; V_F: the fusion model; V_γ: covariance shrinkage estimate)

(Same paper excerpt as on Page 27.)

Page 29

Results

• Human body model (mesh)
  – Error values for each case (Figure 3 of the paper)

(Same paper excerpt as on Page 27.)

Page 30

Results

• Classifier Transport (see the sketch below)
  – Transfer a (two-class) facial-expression classifier trained on an oblique viewing angle to the frontal angle
    • For each quarter of the image, compute the covariance matrix of a 5-dimensional per-pixel feature (a Symmetric Positive Definite (SPD) matrix) and use it as the descriptor
      – The set of SPD matrices forms a manifold (not a Lie group)
      – 168 samples in both domains
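A minimal end-to-end sketch of the classifier-transport step on mocked data: Γ stands in for the Schild's-ladder parallel transport, and with orthonormal tangent coordinates the coefficient rule from Page 24 reduces to applying Γ. Only the classifier's single weight vector is transported, never the datasets.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
n = 10
Gamma = np.linalg.qr(rng.normal(size=(n, n)))[0]  # mock PT (linear isometry)

Xp = rng.normal(size=(200, n))                    # rows: Log_p(p_i) in T_pM
y = (Xp[:, 0] + 0.1 * rng.normal(size=200) > 0).astype(int)
clf = LogisticRegression(max_iter=1000).fit(Xp, y)

# Model transport: move one weight vector (the intercept is unchanged).
w_t = Gamma @ clf.coef_.ravel()

Xq = Xp @ Gamma.T                                 # stand-in for Log_q(q_j)
pred = (Xq @ w_t + clf.intercept_ > 0).astype(int)
assert (pred == clf.predict(Xp)).all()            # decisions carry over exactly
```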

(Continuation of the paper excerpt from Page 27:) We then compute, for each triangle, the Squared Geodesic Error (SGE) between the reconstruction and the true deformation. Fixing i, SGE is averaged over all body triangles, yielding the Mean SGE (MSGE) of the i-th body. Overall performance of V is defined by averaging MSGE over all test examples. MSGE results are summarized in Fig. 3 (left). To visualize, we average the SGE, per triangle, over all test examples, and display these per-triangle errors over the mesh associated with µ (Fig. 4). Figure 4a shows that V_L performs very poorly; a shape model of women fails to model men. While the errors for V_S are much lower (Fig. 4b), there are still noticeable errors due to poor generalization. The surprise is Fig. 4c, which shows the result for V_Γ: the PT dramatically improves the female model (Fig. 4a) to the point it fares comparably with the male model (Fig. 4b), although the only information used from the male data is the mean. Combining transported and local models lets us do even better. Figure 4d shows that V_F significantly improves over V_S or V_Γ. Figure 4e shows the regularized model, V_γ, which has the same dimensionality as V_L and still performs well. Figure 5 shows selected results for test bodies; see [13] for additional results and reconstructions.

From Normal-Weight to Obesity. A good statistical shape model of obese women is important for fashion and health applications but is difficult to build since the data are scarce, as reflected by their paucity in existing body shape datasets [31]. This experiment is similar to the previous one, but both the data and the results are of a different nature. Here, we have 1000 shapes of women with BMI ≤ 30 but only 50 shapes of women with BMI > 30. We compute means and subspaces as before. Figure 2c shows q, the high-BMI mean; p, the normal-BMI mean, is not shown as it is very similar to p from the gender experiment. Figures 3 (right) and 6 summarize the results. Compared with the gender experiment there are two main differences: 1) Here V_Γ is already much better than V_S, so fusion only makes a small difference. 2) Error bars (Fig. 3, right) are larger than before (Fig. 3, left) due to the limited amount of test data available for high-BMI women; this is truly a small-sample class: we were able to obtain only 50 test examples. Compared with using V_S, reconstruction is noticeably improved using our method (V_F). In both experiments, results for V_γ look nearly identical to V_F, and are not shown. See [13] for individual reconstruction results.

5.2. Classification Transport and Image Descriptors

For Task II, our data consist of facial images (from www.wisdom.weizmann.ac.il/~vision/FaceBase) and the goal is binary facial-expression classification. Images are described by SPD matrices that encode normalized correlations of pixel-wise features [40]. Each quarter of an image is described by a 5 × 5 SPD matrix, yielding an image descriptor in M = SPD(5)^4. PT is computable by Schild's ladder; M is not a Lie group (since SPD is not a matrix Lie group; while not used here, some Lie group structure can still be imposed to get a nonstandard matrix Lie group, i.e., the binary operation will not be the matrix product) and n = 60. The datasets {p_i} and {q_j} reflect two different viewing directions; N_A = N_B = 168. The labels of {p_i} are known, those of {q_j} withheld. See Fig. 7 for examples.

Figure 7: Classifier-transport example. Select images. Top: first data set. Bottom: second data set. In each row, examples from class 1 (left) and class 2 (right) are shown.

We compute p and q, the means of the datasets. Then, using {Log_p(p_i)} ⊂ T_pM, we learn a logistic-regression model. This classifier, defined on T_pM, is correct 59% of the time when applied to {Log_p(q_j)} ⊂ T_pM. Applying the transported model to {Log_q(q_j)} ⊂ T_qM improves performance to 67%. Thus, for the same unannotated {q_j}, MT improves over the baseline. Note we had to PT only one vector; even for such a small dataset the speed gain is already significant.

6. Conclusion

Our work is the first to suggest a framework for generalizing transfer learning (TL) to manifold-valued data. As is well-known, parallel transport (PT) provides a principled way to move data across a manifold. We follow this reasoning in our TL tasks, but rather than transporting data we transport models, so the cost does not depend on the size of the data, and show that for many models the approaches are equivalent. Thus, our framework naturally scales to large datasets. Our experiments show that not only is this mathematically sound and computationally inexpensive but also that in practice it can be useful for modeling real data.

Acknowledgments: This work is supported in part by NIH-NINDS EUREKA (R01-NS066311). S.H. is supported in part by the Villum Foundation and the Danish Council for Independent Research (Natural Sciences).

O. Tuzel, et al.: Region Covariance: A Fast Descriptor for Detection and Classification

Page 31

Results

• Classifier Transport
  – Transfer a (two-class) facial-expression classifier trained on an oblique viewing angle to the frontal angle
    • Using the oblique-angle classifier as-is gave 59% accuracy; with transfer, this improved to 67%

(Same paper excerpt as on Page 30.)

Page 32

Summary

• We obtained a method for transporting tangent-space models between two different domains lying on the same manifold

• For the human body mesh model and for the classifier, using information from the source domain improved model accuracy in the transport-target domain

Page 33

Impressions

• Are "tangent-space models" realistic?
  – With a linear SVM, for example, one often (in practice) learns directly on a large feature vector
    • The model is then a vector in the large linear space in which the manifold is embedded
    → Is it really OK to transport the orthogonal complement of the tangent space as-is?
  – What if the positive and negative examples live on very different manifolds?
    • Taken together, they might fill out the whole Euclidean space
  – Assumes the data manifold is known
    • Are there really that many cases where a meaningful manifold is known…

Page 34

Impressions

  – If anything, the intended cases seem to be those where the manifold is fixed at feature-design time
    • Spheres, SPD matrices, Lie groups…

• Effectiveness and applications
  – The gains do not seem as large as claimed…
  – Since training is offline anyway, one could arguably just transport the data
  – Are there use cases where this fits really well?
    • Online learning
    • Tracking
      – e.g., subspace tracking (Grassmann manifolds)