
Say Tojo .edu#oca-ibaoisJY'

Dec 23, 2021


dariahiddleston
Page 1

Perturbation Theory for Wide Networks

(joint w/ Dan Roberts, Sho Yaida)

$$f(x) = \int_{\mathbb{R}^{n_0+1}} \sigma(w \cdot x + b)\, d\mu(w, b)$$

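Fact: Any "nice" $f$ can be approximated by a wide one-hidden-layer network:

$$z^{(2)}_i(x) = \sum_{j=1}^{n_1} W^{(2)}_{ij}\, \sigma(w_j \cdot x + b_j), \qquad n_1 \gg 1,$$

with $x = z^{(0)} \in \mathbb{R}^{n_0}$, $z^{(1)} \in \mathbb{R}^{n_1}$, $z^{(2)} \in \mathbb{R}^{n_2}$.

Thm [Montanari et al., Chizat et al., Rotskoff et al., ...]: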

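Let's scale $W^{(2)}_{ij} = O(1/n_1)$, $b^{(2)}_i = 0$. When

$$z^{(2)}_i = \sum_{j=1}^{n_1} W^{(2)}_{ij}\, \sigma\big(z^{(1)}_j\big) + b^{(2)}_i$$

and you train the weights and biases by GD, then as $n_1 \to \infty$ (neurons ↔ particles), GD becomes a 2-Wasserstein gradient flow on the distribution of neurons (an optimal transport tool).

Say a bit about what happens when we use $\ell^2$-loss.

There are actually many different width limits, depending on init, learning rate, ...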

Note that (taking $n_2 = 1$, $b^{(2)} = 0$)

$$z^{(2)}(x) = \int_{\mathbb{R}^{n_0+1}} \sigma(w \cdot x + b)\, d\mu(w, b), \qquad \mu = \sum_{j=1}^{n_1} W^{(2)}_{j}\, \delta_{(W^{(1)}_j,\, b^{(1)}_j)},$$

i.e. $\mu$ is a (signed) measure with #atoms = width.
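A quick numerical check of this identity, as a sketch: assuming NumPy, a ReLU activation, a single output ($n_2 = 1$, $b^{(2)} = 0$) and arbitrary random parameters (all our own choices, not from the talk), the matrix form of the network and the integral against the atomic measure $\mu$ with $n_1$ atoms give the same number.

```python
import numpy as np

def relu(t):
    return np.maximum(t, 0.0)

rng = np.random.default_rng(0)
n0, n1 = 3, 50                     # input dimension and hidden width (arbitrary)
W1 = rng.normal(size=(n1, n0))     # first-layer weights: row j is w_j = W^{(1)}_j
b1 = rng.normal(size=n1)           # first-layer biases b_j = b^{(1)}_j
W2 = rng.normal(size=n1) / n1      # output weights W^{(2)}_j (single output, b^{(2)} = 0)
x = rng.normal(size=n0)

# Network form: z^{(2)}(x) = sum_j W^{(2)}_j * sigma(w_j . x + b_j)
z2_network = W2 @ relu(W1 @ x + b1)

# Measure form: mu = sum_j W^{(2)}_j * delta_{(w_j, b_j)}, a signed measure with n1 atoms;
# integrating sigma(w . x + b) against mu is a weighted sum over the atoms.
atoms = [(W2[j], W1[j], b1[j]) for j in range(n1)]
z2_measure = sum(mass * relu(w @ x + b) for mass, w, b in atoms)
```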

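Fact: For any "reasonable" $\sigma$ and any $f: \mathbb{R}^{n_0} \to \mathbb{R}$, we can write

$$f(x) = \int_{\mathbb{R}^{n_0+1}} \sigma(w \cdot x + b)\, d\mu(w, b)$$

for some measure $\mu$. However, we usually take $W^{(2)}_{ij} \sim \mathcal{G}(0, C_W/n_1)$, i.e. entries of size $O(n_1^{-1/2})$ rather than $O(1/n_1)$.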
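To see why the two scalings lead to different width limits, here is a minimal Monte Carlo sketch (our own illustration, not from the talk). It assumes a ReLU net, one fixed input with $|x|^2/n_0 = 1$, $C_W = 2$, $C_b = 0$, and takes the mean-field weights to be $W^{(2)}_j = a_j/n_1$ with $a_j \sim \mathcal{G}(0, 1)$; the helper `output_std` is hypothetical. With $W^{(2)} \sim \mathcal{G}(0, C_W/n_1)$ the spread of $z^{(2)}$ over random inits stays $O(1)$ as $n_1$ grows, while under the $1/n_1$ scaling it shrinks like $n_1^{-1/2}$ (a law of large numbers).

```python
import numpy as np

def output_std(n1, scaling, samples=2000, Cw=2.0, seed=0):
    """Std over random inits of z^{(2)}(x) for a one-hidden-layer ReLU net of width n1."""
    rng = np.random.default_rng(seed)
    n0 = 4
    x = np.ones(n0)                                          # fixed input, |x|^2 / n0 = 1
    W1 = rng.normal(scale=np.sqrt(Cw / n0), size=(samples, n1, n0))
    h = np.maximum(W1 @ x, 0.0)                              # sigma(z^{(1)}), shape (samples, n1)
    a = rng.normal(size=(samples, n1))                       # O(1) random coefficients
    if scaling == "standard":
        W2 = a * np.sqrt(Cw / n1)                            # W^{(2)} ~ G(0, C_W / n1)
    elif scaling == "mean_field":
        W2 = a / n1                                          # W^{(2)} = O(1 / n1)
    else:
        raise ValueError(f"unknown scaling: {scaling}")
    z2 = np.sum(W2 * h, axis=1)                              # single output, b^{(2)} = 0
    return z2.std()
```

For these choices, `output_std(1024, "standard")` should come out near $\sqrt{2}$, while `output_std(1024, "mean_field")` should be near $1/32$.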

Page 2

$$z^{(l+1)}_i = \sum_{j=1}^{n_l} W^{(l+1)}_{ij}\, \sigma\big(z^{(l)}_j\big) + b^{(l+1)}_i, \qquad z^{(1)}_i = \sum_{j=1}^{n_0} W^{(1)}_{ij}\, x_j + b^{(1)}_i$$

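The "standard" init scheme:

$$W^{(l)}_{ij} \sim \mathcal{G}\big(0, C_W/n_{l-1}\big), \qquad b^{(l)}_i \sim \mathcal{G}(0, C_b).$$

Goal: $\lim_{n_1, \dots, n_{L-1} \to \infty} z^{(L)}$ is a GP.

Notation: fixed set of inputs $x_\alpha \in \mathbb{R}^{n_0}$, $\alpha \in \mathcal{A}$;

$$z^{(l)} = \big\{ z^{(l)}_{i;\alpha},\ i = 1, \dots, n_l,\ \alpha \in \mathcal{A} \big\}.$$

Finally, we can describe the GP recursively: $\mathbb{E}\big[z^{(l)}_{i;\alpha_1} z^{(l)}_{i;\alpha_2}\big] \to K^{(l)}_{\alpha_1\alpha_2}$, where

$$K^{(1)}_{\alpha_1\alpha_2} = C_b + \frac{C_W}{n_0}\, x_{\alpha_1} \cdot x_{\alpha_2}, \qquad K^{(l+1)}_{\alpha_1\alpha_2} = C_b + C_W \big\langle \sigma(z_{\alpha_1})\, \sigma(z_{\alpha_2}) \big\rangle_{K^{(l)}},$$

and $\langle \cdot \rangle_{K}$ denotes the average over a centered Gaussian with covariance $K$.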
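The recursion can be evaluated numerically for two inputs. A sketch, assuming NumPy's Gauss-Hermite rule `numpy.polynomial.hermite_e.hermegauss` for the Gaussian average $\langle \cdot \rangle_K$; the function names and the quadrature degree are our own choices. For a linear $\sigma$ the step reduces to $C_b + C_W K^{(l)}$, and for ReLU the diagonal obeys $K^{(l+1)}_{\alpha\alpha} = C_b + \tfrac{C_W}{2} K^{(l)}_{\alpha\alpha}$, which serve as sanity checks. ReLU's kink makes the off-diagonal entries only approximate at this degree.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

def gaussian_expectation(g, K, deg=40):
    """<g(z1, z2)>_K for (z1, z2) ~ G(0, K), K a 2x2 PSD matrix, via a tensor Gauss-Hermite rule."""
    u, w = hermegauss(deg)                           # nodes/weights for the weight exp(-u^2 / 2)
    w = w / np.sqrt(2.0 * np.pi)                     # normalize: averages against N(0, 1)
    U1, U2 = np.meshgrid(u, u, indexing="ij")
    Wt = np.outer(w, w)
    evals, evecs = np.linalg.eigh(np.asarray(K, dtype=float))
    A = evecs * np.sqrt(np.clip(evals, 0.0, None))   # A @ A.T == K, also for singular K
    Z1 = A[0, 0] * U1 + A[0, 1] * U2                 # (Z1, Z2) = A u has covariance K
    Z2 = A[1, 0] * U1 + A[1, 1] * U2
    return float(np.sum(Wt * g(Z1, Z2)))

def kernel_step(K, sigma, Cb, Cw, deg=40):
    """One step K^{(l)} -> K^{(l+1)} = C_b + C_W <sigma(z_a) sigma(z_b)>_{K^{(l)}} (two inputs)."""
    K_next = np.empty((2, 2))
    for a in range(2):
        for b in range(2):
            g = lambda z1, z2, a=a, b=b: sigma((z1, z2)[a]) * sigma((z1, z2)[b])
            K_next[a, b] = Cb + Cw * gaussian_expectation(g, K, deg)
    return K_next

def input_kernel(x1, x2, Cb, Cw):
    """K^{(1)}_{ab} = C_b + C_W x_a . x_b / n_0."""
    X = np.stack([x1, x2])
    return Cb + Cw * (X @ X.T) / X.shape[1]
```

Iterating `kernel_step` starting from `input_kernel(x1, x2, Cb, Cw)` gives $K^{(2)}, K^{(3)}, \dots$ for the pair of inputs.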

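Thm: Given $z^{(l)}$, $z^{(l+1)}$ is Gaussian with iid components (over $i$):

$$\mathrm{Cov}\big(z^{(l+1)}_{i;\alpha_1}, z^{(l+1)}_{i;\alpha_2} \,\big|\, z^{(l)}\big) = C_b + \frac{C_W}{n_l} \sum_{j=1}^{n_l} \sigma\big(z^{(l)}_{j;\alpha_1}\big)\, \sigma\big(z^{(l)}_{j;\alpha_2}\big) =: \Sigma^{(l)}_{\alpha_1\alpha_2},$$

so $z^{(l+1)}_i \,\big|\, z^{(l)} \sim \mathcal{G}\big(0, \Sigma^{(l)}\big)$.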

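The entries of $\Sigma^{(l)}$ are "collective observables":

$$\mathcal{O}^{(l)} = \frac{1}{n_l} \sum_{j=1}^{n_l} f\big(z^{(l)}_{j;\mathcal{A}}\big), \qquad \text{e.g. } \Sigma^{(l)}_{\alpha_1\alpha_2} = C_b + C_W\, \frac{1}{n_l} \sum_{j=1}^{n_l} \sigma\big(z^{(l)}_{j;\alpha_1}\big)\, \sigma\big(z^{(l)}_{j;\alpha_2}\big).$$

Lem: $\mathcal{O}^{(l)} = \mathbb{E}\big[\mathcal{O}^{(l)}\big] + O\big(n^{-1/2}\big)$.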
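A Monte Carlo sketch of the Lemma for $\Sigma^{(l)}$ at $l = 2$ (our illustration; ReLU, one input with $|x|^2/n_0 = 1$, $C_b = 0$, $C_W = 2$ are assumptions, and `sigma_hat_samples` is a hypothetical helper). Across random inits, the mean of $\Sigma^{(2)}$ should stay near $K^{(3)} = 2$ for these choices, while its standard deviation should roughly halve when the width is quadrupled, consistent with $O(n^{-1/2})$ fluctuations.

```python
import numpy as np

def sigma_hat_samples(n, samples=2000, Cb=0.0, Cw=2.0, seed=0):
    """Draws of Sigma^{(2)} = C_b + (C_W / n) * sum_j ReLU(z_j^{(2)})^2 over random inits (one input)."""
    rng = np.random.default_rng(seed)
    n0 = 4
    x = np.ones(n0)                                           # |x|^2 / n0 = 1
    out = np.empty(samples)
    for s in range(samples):
        W1 = rng.normal(scale=np.sqrt(Cw / n0), size=(n, n0))
        b1 = rng.normal(scale=np.sqrt(Cb), size=n)
        z1 = W1 @ x + b1                                      # z^{(1)}
        W2 = rng.normal(scale=np.sqrt(Cw / n), size=(n, n))
        b2 = rng.normal(scale=np.sqrt(Cb), size=n)
        z2 = W2 @ np.maximum(z1, 0.0) + b2                    # z^{(2)}, width n
        out[s] = Cb + Cw * np.mean(np.maximum(z2, 0.0) ** 2)  # the collective observable
    return out
```

For example, compare `sigma_hat_samples(32).std()` with `sigma_hat_samples(128).std()`; the ratio should be close to 2.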

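Thus, for $t \in \mathbb{R}^{|\mathcal{A}|}$ and $z^{(l+1)}_i = \big(z^{(l+1)}_{i;\alpha}\big)_{\alpha \in \mathcal{A}}$,

$$\mathbb{E}\big[e^{-i\, t \cdot z^{(l+1)}_i}\big] = \mathbb{E}\Big[\mathbb{E}\big[e^{-i\, t \cdot z^{(l+1)}_i} \,\big|\, z^{(l)}\big]\Big] = \mathbb{E}\Big[e^{-\frac{1}{2} \sum_{\alpha,\beta} t_\alpha t_\beta\, \Sigma^{(l)}_{\alpha\beta}}\Big] \longrightarrow e^{-\frac{1}{2} \sum_{\alpha,\beta} t_\alpha t_\beta\, K^{(l+1)}_{\alpha\beta}},$$

since by the Lemma $\Sigma^{(l)}$ concentrates around its mean, which tends to $K^{(l+1)}$ by induction on $l$.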
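A numerical version of this argument, as a sketch under our own assumptions (ReLU, $C_b = 0$, $C_W = 2$, one input with $|x|^2/n_0 = 1$, so $K^{(1)} = K^{(2)} = 2$; `char_fn_gap` is a hypothetical helper): for a single output neuron of a wide two-layer net, the empirical characteristic function at one value of $t$ should be close to the Gaussian one, $e^{-t^2 K^{(2)}/2}$.

```python
import numpy as np

def char_fn_gap(n, t=0.5, samples=8000, Cw=2.0, seed=0):
    """|empirical E[exp(-i t z^{(2)})] - exp(-t^2 K^{(2)} / 2)| for a ReLU net, C_b = 0, one input."""
    rng = np.random.default_rng(seed)
    n0 = 4
    x = np.ones(n0)                                              # |x|^2 / n0 = 1
    K1 = Cw * (x @ x) / n0                                       # K^{(1)} with C_b = 0
    K2 = Cw * K1 / 2.0                                           # K^{(2)} = C_W <ReLU(z)^2>_{K^{(1)}}
    W1 = rng.normal(scale=np.sqrt(Cw / n0), size=(samples, n, n0))
    h = np.maximum(W1 @ x, 0.0)                                  # sigma(z^{(1)}), shape (samples, n)
    w2 = rng.normal(scale=np.sqrt(Cw / n), size=(samples, n))    # weights into one output neuron
    z2 = np.sum(w2 * h, axis=1)                                  # z^{(2)} for each random init
    empirical = np.mean(np.exp(-1j * t * z2))
    return abs(empirical - np.exp(-0.5 * t**2 * K2))
```

The gap contains both Monte Carlo noise (shrinking with `samples`) and the finite-width correction (shrinking with `n`).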

Page 3

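$$K^{(l+1)}_{\alpha_1\alpha_2} = C_b + C_W \big\langle \sigma(z_{\alpha_1})\, \sigma(z_{\alpha_2}) \big\rangle_{K^{(l)}}, \qquad (z_{\alpha_1}, z_{\alpha_2}) \sim \mathcal{G}\big(0, K^{(l)}\big)$$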

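For two inputs, this is a 3D system in $K^{(l)}_{11}, K^{(l)}_{12}, K^{(l)}_{22}$.

(*) Seek a fixed point for $K^{(l)}$:

$$K^* = C_b + C_W \big\langle \sigma(z)^2 \big\rangle_{K^*} \quad \Rightarrow \quad K^{(l)}_{11} = K^{(l)}_{12} = K^{(l)}_{22} = K^* \ \text{is a fixed point.}$$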

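(a) $\dfrac{\partial K^{(l+1)}_{\alpha\alpha}}{\partial K^{(l)}_{\alpha\alpha}}\Big|_{K^*} = 1 \iff \dfrac{C_W}{2} \big\langle \partial_z^2 \big[\sigma(z)^2\big] \big\rangle_{K^*} = 1$: perturbations of $\mathrm{Var}\big[z^{(l)}_\alpha\big] = K^{(l)}_{\alpha\alpha}$ around $K^*$ neither grow nor decay with $l$.

(b) $\dfrac{\partial K^{(l+1)}_{12}}{\partial K^{(l)}_{12}}\Big|_{K^*} = 1 \iff C_W \big\langle \sigma'(z)^2 \big\rangle_{K^*} = 1$: for nearby inputs, $\mathrm{Var}\big[z^{(l)}_1 - z^{(l)}_2\big]$ neither grows nor decays with $l$.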

"Tuning to criticality": necessary if we want "nice" large-depth behavior.
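A sketch of why this matters at large depth, under our own assumptions (NumPy Gauss-Hermite quadrature, a single input, hypothetical helper `diagonal_kernels`). For ReLU with $C_b = 0$ the diagonal recursion is $K^{(l+1)} = \tfrac{C_W}{2} K^{(l)}$, so away from $C_W = 2$ the variance vanishes or explodes exponentially in depth, while at $C_W = 2$ it stays put.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

_u, _w = hermegauss(40)
_w = _w / np.sqrt(2.0 * np.pi)     # weights now average against N(0, 1)

def gauss_avg(f, K):
    """<f(z)>_K for z ~ G(0, K) via Gauss-Hermite quadrature."""
    return float(np.sum(_w * f(np.sqrt(K) * _u)))

def diagonal_kernels(sigma, Cb, Cw, K1, depth):
    """Iterate K^{(l+1)} = C_b + C_W <sigma(z)^2>_{K^{(l)}}; return [K^{(1)}, ..., K^{(depth)}]."""
    Ks = [K1]
    for _ in range(depth - 1):
        Ks.append(Cb + Cw * gauss_avg(lambda z: sigma(z) ** 2, Ks[-1]))
    return Ks
```

For instance, with $C_W = 1.9$ the kernel after 50 more layers is $0.95^{50} K^{(1)}$, i.e. about 8% of its initial value.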

Ex: $\sigma(t) = \mathrm{ReLU}(t) = t\, \mathbf{1}_{t > 0}$: $C_b = 0$, $C_W = 2$ (He init).

Ex: $\sigma = \tanh$: $C_b = 0$, $C_W = 1$ (Xavier/Glorot init).
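Both conditions can be checked numerically. A sketch under our assumptions: derivatives of $\sigma$ are supplied by hand, Gaussian averages use NumPy's Gauss-Hermite rule, and `chi_par`, `chi_perp` are our own names for the left-hand sides of (a) and (b), using $\tfrac{1}{2}\partial_z^2[\sigma^2] = \sigma'^2 + \sigma\sigma''$. For ReLU at $C_W = 2$ (any $K^*$, e.g. $K^* = 1$) and for tanh at $C_W = 1$, $K^* = 0$, both should equal 1.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

_u, _w = hermegauss(40)
_w = _w / np.sqrt(2.0 * np.pi)     # weights now average against N(0, 1)

def gauss_avg(f, K):
    """<f(z)>_K for z ~ G(0, K); K = 0 puts all nodes at z = 0."""
    return float(np.sum(_w * f(np.sqrt(K) * _u)))

def susceptibilities(sigma, dsigma, d2sigma, Cw, K):
    """chi_par = (C_W/2) <(sigma^2)''>_K and chi_perp = C_W <sigma'^2>_K, evaluated at K = K*."""
    chi_par = Cw * gauss_avg(lambda z: dsigma(z) ** 2 + sigma(z) * d2sigma(z), K)
    chi_perp = Cw * gauss_avg(lambda z: dsigma(z) ** 2, K)
    return chi_par, chi_perp

# ReLU: sigma' = 1_{z>0}, sigma'' = 0 away from the kink.
relu = lambda z: np.maximum(z, 0.0)
drelu = lambda z: (z > 0).astype(float)
d2relu = lambda z: np.zeros_like(z)

# tanh: sigma' = 1 - tanh^2, sigma'' = -2 tanh (1 - tanh^2).
dtanh = lambda z: 1.0 - np.tanh(z) ** 2
d2tanh = lambda z: -2.0 * np.tanh(z) * (1.0 - np.tanh(z) ** 2)
```

Off criticality, e.g. ReLU with $C_W = 1.5$, both values drop to $0.75$.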

Page 4

$$z^{(l+1)}_{i;\alpha} = \sum_{j=1}^{n_l} W^{(l+1)}_{ij}\, \sigma\big(z^{(l)}_{j;\alpha}\big) + b^{(l+1)}_i$$

NTK: [illegible]. "Depth cures laziness."

Summary of what we did:

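① We obtain recursions in $l$ for the full distribution $\big\{ z^{(l)}_{i;\alpha},\ i = 1, \dots, n_l,\ \alpha \in \mathcal{A} \big\}$, to all orders in $1/n$.

Ex: the 4th cumulant

$$V^{(l)} = \mathbb{E}\big[(z^{(l)}_i)^4\big] - 3\, \mathbb{E}\big[(z^{(l)}_i)^2\big]^2$$

satisfies a recursion in $l$. Once we solve the recursion, we know: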

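$$P\big(z^{(l)}_i = z\big) \propto \exp\Big\{ -\frac{z^2}{2 g^{(l)}} + \frac{V^{(l)}}{4!\, \big(g^{(l)}\big)^4}\, z^4 + O\big(n^{-2}\big) \Big\}, \qquad g^{(l)} = K^{(l)} + O(1/n)$$

(single neuron, single input). This shows that at order $1/n$ only the 2nd & 4th cumulants appear: $P$ is fixed by $K^{(l)}$ and $V^{(l)}$ up to $O(1/n^2)$.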

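② At criticality: we solve the $V^{(l)}, \dots$ recursions, finding $V^{(l)}/\big(K^{(l)}\big)^2 \approx c\, \dfrac{l}{n}$, and the recursions for nearby inputs (i.e. $z^{(l)}_\alpha, z^{(l)}_\beta$ with $x_\alpha \approx x_\beta$) to obtain, schematically,

$$P\big(z^{(L)} = z\big) \approx \exp\Big\{ -\Big( \frac{z^2}{2K^{(L)}} + c_4\, \frac{L}{n}\, z^4 + c_6 \Big(\frac{L}{n}\Big)^2 z^6 + \dots \Big) \Big\},$$

with $O(1)$ constants $c_4, c_6$.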

This shows that depth amplifies finite-width effects:

• Distance to Gaussian $\propto L/n$
• Cov([illegible]) $\propto L/n$
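A rough Monte Carlo illustration of this claim, under our own assumptions (ReLU, He init $C_b = 0$, $C_W = 2$, one input, equal hidden widths $n$; `output_excess_kurtosis` is a hypothetical helper). It estimates the normalized 4th cumulant $V^{(L)}/\mathbb{E}\big[(z^{(L)})^2\big]^2$ of a single output over random inits: it should grow with depth $L$ at fixed $n$ and shrink with $n$ at fixed $L$. The estimates are noisy because the tails get heavier as $L/n$ grows.

```python
import numpy as np

def output_excess_kurtosis(n, depth, samples=10000, Cw=2.0, seed=0):
    """E[z^4]/E[z^2]^2 - 3 for one output z^{(L)} (L = depth >= 2) over random ReLU nets, C_b = 0."""
    rng = np.random.default_rng(seed)
    n0 = 4
    x = np.ones(n0)
    # z^{(1)} for every sampled network, shape (samples, n)
    z = rng.normal(scale=np.sqrt(Cw / n0), size=(samples, n, n0)) @ x
    for _ in range(depth - 2):                                   # hidden-to-hidden layers of width n
        W = rng.normal(scale=np.sqrt(Cw / n), size=(samples, n, n))
        z = np.einsum("sij,sj->si", W, np.maximum(z, 0.0))
    w_out = rng.normal(scale=np.sqrt(Cw / n), size=(samples, n))
    out = np.sum(w_out * np.maximum(z, 0.0), axis=1)             # single output neuron of layer L
    m2, m4 = np.mean(out**2), np.mean(out**4)                    # the output has mean zero by symmetry
    return m4 / m2**2 - 3.0
```

The batched weights use memory proportional to `samples * n**2`, so keep `samples` modest for wider nets.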