Perturbation Theory for Wide Networks
(joint w/ Dan Roberts, Sho Yaida)
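$f(x) = \int_{\mathbb{R}^{n_0+1}} \sigma(w \cdot x - b)\, d\mu_f(w, b)$

Fact: Any "nice" $\mu_f$ can be approximated by an atomic measure with atoms $(w_j, b_j)$, $j = 1, \dots, n_1$, $n_1 \gg 1$.

[Diagram: fully connected network with input $x_\alpha = z^{(0)} \in \mathbb{R}^{n_0}$ and layers $z^{(\ell)} \in \mathbb{R}^{n_\ell}$, $z^{(\ell+1)} \in \mathbb{R}^{n_{\ell+1}}$.]

Thm: [Montanari+, Chizat+, Rotskoff+, ...]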
Network ($\ell$ indexes the layer, $i$ the neuron):
$z_i^{(\ell+1)} = \sum_j W_{ij}^{(\ell+1)} \sigma(z_j^{(\ell)}) + b_i^{(\ell+1)}$

Let's scale $W^{(2)} = O(1/n_1)$, $b^{(2)} = 0$. When you train $W^{(1)}$, $b^{(1)}$ by GD: GD = 2-Wasserstein optimal transport, when using the $\ell_2$ loss.

Goal: Say a bit about what happens when we [illegible].
There are actually many different width limits: init, learning rate, ...
We usually take $W^{(\ell)}_{ij} \sim \mathcal{N}(0, C_W / n_{\ell-1})$.

Fact: For any "reasonable" $\sigma$ and any $f: \mathbb{R}^{n_0} \to \mathbb{R}$, we can write $f$ as an integral of $\sigma(w \cdot x - b)$ against a measure $\mu_f(w, b)$.

Note that
$z^{(2)}(x) = \int_{\mathbb{R}^{n_0+1}} \sigma(w \cdot x + b)\, d\mu(w, b)$,
where
$\mu = \sum_{j=1}^{n_1} W_j^{(2)}\, \delta_{(W_j^{(1)},\, b_j^{(1)})}$
is a measure w/ #atoms = width.
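The identity between a one-hidden-layer network and an integral against an atomic measure can be checked numerically. The sketch below is only an illustration: the tanh activation, the sizes `n0`, `n1`, the Gaussian draws, and the helper names `one_hidden_layer` and `integrate` are assumptions, not part of the talk.

```python
import math
import random


def sigma(t):
    # Activation; tanh is an arbitrary illustrative choice.
    return math.tanh(t)


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def one_hidden_layer(x, W1, b1, W2):
    """z^(2)(x) = sum_j W2[j] * sigma(W1[j] . x + b1[j]), with b^(2) = 0."""
    return sum(W2[j] * sigma(dot(W1[j], x) + b1[j]) for j in range(len(W2)))


def integrate(g, atoms):
    """Integrate g(w, b) against mu = sum_j mass_j * delta_{(w_j, b_j)}."""
    return sum(mass * g(w, b) for (w, b, mass) in atoms)


rng = random.Random(0)
n0, n1 = 3, 50  # input dimension and width (illustrative values)
W1 = [[rng.gauss(0.0, 1.0) for _ in range(n0)] for _ in range(n1)]
b1 = [rng.gauss(0.0, 1.0) for _ in range(n1)]
W2 = [rng.gauss(0.0, 1.0) / n1 for _ in range(n1)]  # O(1/n1) scaling
x = [0.5, -1.0, 2.0]

# One atom per hidden neuron: location (W1[j], b1[j]), signed mass W2[j].
atoms = [(W1[j], b1[j], W2[j]) for j in range(n1)]

net_out = one_hidden_layer(x, W1, b1, W2)
int_out = integrate(lambda w, b: sigma(dot(w, x) + b), atoms)
print(abs(net_out - int_out))
```

The number of atoms equals the width `n1`, so widening the network refines the measure.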
$z_i^{(\ell+1)} = \sum_{j=1}^{n_\ell} W_{ij}^{(\ell+1)} \sigma(z_j^{(\ell)}) + b_i^{(\ell+1)}$
[illegible]
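The "standard" init scheme:
$W_{ij}^{(\ell)} \sim G(0, C_W / n_{\ell-1})$, $b_i^{(\ell)} \sim G(0, C_b)$.

Finally, we can describe the GP recursively: $K^{(\ell+1)}_{\alpha_1\alpha_2} = C_b + C_W \langle \sigma(z_{\alpha_1}) \sigma(z_{\alpha_2}) \rangle_{K^{(\ell)}}$.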
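A minimal sketch of the "standard" init scheme and the forward pass it feeds, in plain Python. The widths, the values of `C_W` and `C_b`, the tanh activation, and the function names `init_layer` and `forward` are hypothetical. The first layer is assumed to act on the input directly, which the notes do not spell out.

```python
import math
import random


def init_layer(n_in, n_out, C_W, C_b, rng):
    """Sample W_ij ~ G(0, C_W / n_in) and b_i ~ G(0, C_b)."""
    w_std = math.sqrt(C_W / n_in)
    b_std = math.sqrt(C_b)
    W = [[rng.gauss(0.0, w_std) for _ in range(n_in)] for _ in range(n_out)]
    b = [rng.gauss(0.0, b_std) for _ in range(n_out)]
    return W, b


def forward(x, layers, sigma=math.tanh):
    """Return the preactivations z^(1), ..., z^(L) for input x.

    Assumption: the first layer is applied to x itself (no nonlinearity on the input).
    """
    zs = []
    h = list(x)
    for W, b in layers:
        z = [sum(w * hj for w, hj in zip(row, h)) + bi for row, bi in zip(W, b)]
        zs.append(z)
        h = [sigma(zj) for zj in z]
    return zs


rng = random.Random(0)
widths = [4, 100, 200, 1]  # n_0, n_1, n_2, n_3 (illustrative)
C_W, C_b = 1.5, 0.1
layers = [init_layer(n_in, n_out, C_W, C_b, rng)
          for n_in, n_out in zip(widths[:-1], widths[1:])]
zs = forward([1.0, -0.5, 0.3, 2.0], layers)

# Empirical check on the second layer: n_in * Var(W_ij) should be close to C_W.
flat = [w for row in layers[1][0] for w in row]
scaled_var = widths[1] * sum(w * w for w in flat) / len(flat)
print(scaled_var)
```

The $1/n_{\ell-1}$ factor in the weight variance is what keeps the preactivations of order one as the width grows.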
Goal: $\lim_{n_1, \dots, n_{\ell-1} \to \infty} z^{(\ell)}_{i;\alpha} \to$ GP

Notation: $\{x_\alpha,\ \alpha \in A\}$ is a fixed set in $\mathbb{R}^{n_0}$, and $z^{(\ell)} = \{ z^{(\ell)}_{i;\alpha},\ \alpha \in A \}$.
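Lem: Given $z^{(\ell)}$, $z^{(\ell+1)}_{i;\alpha}$ is Gaussian w/ iid components (over $i$), with mean zero and
$\mathrm{Cov}\big(z^{(\ell+1)}_{i;\alpha_1}, z^{(\ell+1)}_{i;\alpha_2} \,\big|\, z^{(\ell)}\big) = C_b + \frac{C_W}{n_\ell} \sum_{j=1}^{n_\ell} \sigma(z^{(\ell)}_{j;\alpha_1})\, \sigma(z^{(\ell)}_{j;\alpha_2}) =: \Sigma^{(\ell)}_{\alpha_1\alpha_2}$

In the limit:
$K^{(\ell+1)}_{\alpha_1\alpha_2} = \lim_{n_1, \dots, n_\ell \to \infty} \mathrm{Cov}\big(z^{(\ell+1)}_i(x_{\alpha_1}), z^{(\ell+1)}_i(x_{\alpha_2})\big) = C_b + C_W \langle \sigma(z_{\alpha_1}) \sigma(z_{\alpha_2}) \rangle_{K^{(\ell)}}$,
where $\langle \cdot \rangle_K$ is the expectation over a centered Gaussian $(z_{\alpha_1}, z_{\alpha_2})$ with covariance $K$.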
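The conditional covariance formula can be sanity-checked by Monte Carlo: hold the previous layer fixed, resample one neuron's weights and bias many times, and compare the sample covariance with $C_b + \frac{C_W}{n_\ell}\sum_j \sigma\sigma$. This is a rough sketch. The width `n`, the constants, the tanh activation, and the way the two fixed preactivation vectors are generated are all arbitrary assumptions.

```python
import math
import random

rng = random.Random(1)
n, C_W, C_b = 20, 1.5, 0.1  # width n_l and init constants (illustrative)
sigma = math.tanh

# Fixed (conditioned-on) preactivations z^(l)_{j;alpha} for two inputs.
z1 = [rng.gauss(0.0, 1.0) for _ in range(n)]
z2 = [0.6 * a + 0.8 * rng.gauss(0.0, 1.0) for a in z1]
s1 = [sigma(v) for v in z1]
s2 = [sigma(v) for v in z2]


def predicted_cov(sa, sb):
    """C_b + (C_W / n) * sum_j sigma(z_j;a) * sigma(z_j;b)."""
    return C_b + C_W / n * sum(a * b for a, b in zip(sa, sb))


predicted = (predicted_cov(s1, s1), predicted_cov(s1, s2), predicted_cov(s2, s2))

# Resample the weights and bias of a single neuron i in layer l+1.
num_samples = 20000
acc = [0.0, 0.0, 0.0]
w_std, b_std = math.sqrt(C_W / n), math.sqrt(C_b)
for _ in range(num_samples):
    W_row = [rng.gauss(0.0, w_std) for _ in range(n)]
    b = rng.gauss(0.0, b_std)
    u = sum(w * a for w, a in zip(W_row, s1)) + b  # z^(l+1)_{i;alpha_1}
    v = sum(w * a for w, a in zip(W_row, s2)) + b  # z^(l+1)_{i;alpha_2}
    acc[0] += u * u
    acc[1] += u * v
    acc[2] += v * v

# The conditional mean is zero, so averages of products estimate the covariance.
empirical = tuple(a / num_samples for a in acc)
print(predicted, empirical)
```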
The entries of $\Sigma^{(\ell)}$ are "collective observables":
$O_f^{(\ell)} = \frac{1}{n_\ell} \sum_{j=1}^{n_\ell} f(z_j^{(\ell)})$

Lem: $O_f^{(\ell)} = \mathbb{E}[O_f^{(\ell)}] + O(n^{-1/2})$

Notation: $x_\alpha \in \mathbb{R}^{n_0}$, $z^{(\ell)}_\alpha = z^{(\ell)}(x_\alpha)$.
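A toy illustration of why a collective observable concentrates: for an average of $n$ terms, the fluctuation around the mean shrinks like $n^{-1/2}$. The sketch assumes the $z_j$ are iid standard Gaussians, which is a simplification (at finite width the neurons of a real layer are only conditionally independent). The function names and the choice $f = \tanh^2$ are hypothetical.

```python
import math
import random
import statistics


def collective_observable(f, z):
    """O_f = (1/n) * sum_j f(z_j)."""
    return sum(f(zj) for zj in z) / len(z)


def fluctuation(f, n, trials, rng):
    """Std of O_f over independent draws of z_1..z_n ~ N(0, 1) (iid: an assumption)."""
    values = [
        collective_observable(f, [rng.gauss(0.0, 1.0) for _ in range(n)])
        for _ in range(trials)
    ]
    return statistics.pstdev(values)


rng = random.Random(0)
f = lambda z: math.tanh(z) ** 2
small = fluctuation(f, 100, 400, rng)
large = fluctuation(f, 1600, 400, rng)

# The width grew by 16x, so the fluctuation should shrink by roughly sqrt(16) = 4.
ratio = small / large
print(ratio)
```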
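Thus,
$\mathbb{E}\big[e^{-i \xi \cdot z^{(\ell+1)}}\big] = \mathbb{E}\Big[\mathbb{E}\big[e^{-i \xi \cdot z^{(\ell+1)}} \,\big|\, z^{(\ell)}\big]\Big] = \mathbb{E}\Big[\exp\Big(-\tfrac{1}{2} \sum_i \sum_{\alpha_1, \alpha_2} \xi_{i;\alpha_1} \xi_{i;\alpha_2}\, \Sigma^{(\ell)}_{\alpha_1\alpha_2}\Big)\Big]$,
where $\Sigma^{(\ell)}$ is the conditional covariance of $z^{(\ell+1)}$ given $z^{(\ell)}$.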
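$K^{(\ell+1)}_{\alpha\beta} = C_b + C_W \langle \sigma(z_\alpha) \sigma(z_\beta) \rangle_{K^{(\ell)}}$
This is a 3D system in $K^{(\ell)}_{\alpha\alpha}$, $K^{(\ell)}_{\alpha\beta}$, $K^{(\ell)}_{\beta\beta}$.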
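The 3D system can be iterated directly once the Gaussian expectation is computable. As a sketch, take ReLU, where the expectation has a known closed form (the arc-cosine kernel formula). That formula is swapped in here for convenience and does not appear in the notes. The starting kernel, the number of layers, and the names `relu_pair_expectation` and `kernel_step` are illustrative assumptions.

```python
import math


def relu_pair_expectation(Kaa, Kab, Kbb):
    """<ReLU(z_a) ReLU(z_b)> for a centered Gaussian with covariance [[Kaa, Kab], [Kab, Kbb]]."""
    norm = math.sqrt(Kaa * Kbb)
    cos_theta = max(-1.0, min(1.0, Kab / norm))  # clamp against rounding
    theta = math.acos(cos_theta)
    return norm / (2.0 * math.pi) * (math.sin(theta) + (math.pi - theta) * cos_theta)


def kernel_step(K, C_W, C_b):
    """One layer of K^(l+1) = C_b + C_W <sigma sigma>_{K^(l)} for K = (K_aa, K_ab, K_bb)."""
    Kaa, Kab, Kbb = K
    return (
        C_b + C_W * relu_pair_expectation(Kaa, Kaa, Kaa),
        C_b + C_W * relu_pair_expectation(Kaa, Kab, Kbb),
        C_b + C_W * relu_pair_expectation(Kbb, Kbb, Kbb),
    )


K = (1.0, 0.2, 1.5)  # illustrative K^(1)
correlations = []
for _ in range(10):
    K = kernel_step(K, C_W=2.0, C_b=0.0)
    correlations.append(K[1] / math.sqrt(K[0] * K[2]))

print(K, correlations)
```

With $C_W = 2$, $C_b = 0$ the diagonal entries stay put while the correlation between the two inputs drifts slowly toward 1.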
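(*) Seek a fixed point for $K^{(\ell)}_{\alpha\alpha}$:
$K^* = C_b + C_W \langle \sigma(z)^2 \rangle_{K^*}$
$\Rightarrow$ if $K^{(1)}_{\alpha\alpha} = K^*$, then $K^{(\ell)}_{\alpha\alpha} = K^*$ $\forall \ell$.

(a) $\partial K^{(\ell+1)}_{\alpha\alpha} / \partial K^{(\ell)}_{\alpha\alpha} = 1$ $\forall \ell$ $\Leftrightarrow$ $\frac{C_W}{2} \langle (\sigma^2)''(z) \rangle_{K^*} = 1$
(b) $\partial K^{(\ell+1)}_{\alpha\beta} / \partial K^{(\ell)}_{\alpha\beta} = 1$ $\forall \ell$ $\Leftrightarrow$ $C_W \langle \sigma'(z)^2 \rangle_{K^*} = 1$, evaluated at $K^{(\ell)}_{\alpha\alpha} = K^{(\ell)}_{\alpha\beta} = K^{(\ell)}_{\beta\beta} = K^*$.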
"Tuning to criticality": necessary to do if you want "nice" large-depth behavior.
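Whether a given $(C_b, C_W)$ is tuned to criticality can be tested numerically by evaluating the fixed-point map and the two derivative conditions with a one-dimensional quadrature. The sketch below is one way to do that. The midpoint rule, the grid size, the finite-difference step, and the names `gauss_expect`, `kernel_map`, `chi_par`, `chi_perp` are all assumptions made for illustration.

```python
import math


def gauss_expect(g, K, n_cells=4000, width=10.0):
    """Midpoint-rule approximation of <g(z)>_K for z ~ N(0, K)."""
    L = width * math.sqrt(K)
    h = 2.0 * L / n_cells
    total = 0.0
    for k in range(n_cells):
        z = -L + (k + 0.5) * h
        total += g(z) * math.exp(-z * z / (2.0 * K))
    return total * h / math.sqrt(2.0 * math.pi * K)


def kernel_map(K, sigma, C_W, C_b):
    """Diagonal recursion K -> C_b + C_W <sigma(z)^2>_K."""
    return C_b + C_W * gauss_expect(lambda z: sigma(z) ** 2, K)


def chi_par(K, sigma, C_W, C_b, eps=1e-3):
    """Condition (a): derivative of the diagonal recursion, by central differences."""
    return (kernel_map(K + eps, sigma, C_W, C_b)
            - kernel_map(K - eps, sigma, C_W, C_b)) / (2.0 * eps)


def chi_perp(K, dsigma, C_W):
    """Condition (b): C_W <sigma'(z)^2>_K."""
    return C_W * gauss_expect(lambda z: dsigma(z) ** 2, K)


relu = lambda z: max(z, 0.0)
drelu = lambda z: 1.0 if z > 0 else 0.0
dtanh = lambda z: 1.0 - math.tanh(z) ** 2

K = 1.3  # any K > 0 works for ReLU at C_b = 0, C_W = 2
relu_fixed = kernel_map(K, relu, C_W=2.0, C_b=0.0)   # should reproduce K
relu_par = chi_par(K, relu, C_W=2.0, C_b=0.0)        # should be close to 1
relu_perp = chi_perp(K, drelu, C_W=2.0)              # should be close to 1
relu_perp_off = chi_perp(K, drelu, C_W=1.0)          # off criticality: about 0.5

tanh_perp_small = chi_perp(1e-4, dtanh, C_W=1.0)     # approaches 1 as K -> 0
tanh_perp_large = chi_perp(1.0, dtanh, C_W=1.0)      # below 1 away from K = 0
print(relu_fixed, relu_par, relu_perp, relu_perp_off, tanh_perp_small, tanh_perp_large)
```

An even number of cells puts the ReLU kink at $z = 0$ on a cell boundary, which keeps the midpoint rule accurate for the step function.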
Ex: $\sigma(t) = \mathrm{ReLU}(t) = t\, \mathbf{1}_{\{t > 0\}}$ $\Rightarrow$ He init ($C_b = 0$, $C_W = 2$); cf. Xavier.
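The ReLU example can be verified by hand from the fixed-point and derivative conditions. A short worked version, using only the symmetry of the centered Gaussian (so that half of the mass of $z^2$ and of $1$ sits on $z > 0$):

```latex
\begin{aligned}
\sigma(t) &= t\,\mathbf{1}_{\{t>0\}}, \qquad \sigma'(t) = \mathbf{1}_{\{t>0\}}, \\
\langle \sigma(z)^2 \rangle_{K} &= \tfrac{1}{2}\langle z^2 \rangle_{K} = \tfrac{K}{2}
  \quad\Rightarrow\quad K^{(\ell+1)}_{\alpha\alpha} = C_b + \tfrac{C_W}{2}\, K^{(\ell)}_{\alpha\alpha}, \\
\frac{\partial K^{(\ell+1)}_{\alpha\alpha}}{\partial K^{(\ell)}_{\alpha\alpha}} &= \tfrac{C_W}{2} = 1
  \iff C_W = 2, \\
K^* &= C_b + K^* \iff C_b = 0, \\
C_W \langle \sigma'(z)^2 \rangle_{K^*} &= C_W\, \mathbb{P}(z > 0) = \tfrac{C_W}{2} = 1
  \iff C_W = 2 .
\end{aligned}
```

So for ReLU both derivative conditions select the same $C_W = 2$, and every $K^* > 0$ is then a fixed point.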
$z^{(\ell+1)}_{i;\alpha} = \sum_{j=1}^{n_\ell} W^{(\ell+1)}_{ij} \sigma(z^{(\ell)}_{j;\alpha}) + b^{(\ell+1)}_i$
NTK: [illegible]
Summary of what we did: