Shallow Learning with Kernels
for Dictionary-Free Magnetic Resonance Fingerprinting
Gopal Nataraj∗, Mingjie Gao∗, Jakob Assländer†, Clayton Scott∗, & Jeffrey A. Fessler∗
ISMRM Workshop on Magnetic Resonance Fingerprinting
∗Dept. of Electrical Engineering and Computer Science, University of Michigan
†Center for Biomedical Imaging, NYU School of Medicine
Problem Statement

Given: at every voxel, a measurement vector y = s(x) + ε

[Figure: MRF “component” images (scale −1 to 4 a.u.; more on these later) mapped by x̂(·) to estimated parameter maps, here T1 (600–2000 ms) and T2 (50–200 ms): y ↦ x̂(y)]

Task: design a fast voxel-by-voxel estimator x̂(·)
that scales well with the number of unknowns per voxel, L
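For contrast with the dictionary-free approach developed next, here is a minimal sketch of the conventional MRF estimator, dictionary matching: grid the latent parameters, simulate one normalized atom per grid point, and pick the best-correlated atom per voxel. The two-parameter signal model below is a toy stand-in for s (not an actual MRF sequence); note that the grid, and hence the dictionary, grows exponentially with L, which is exactly the scaling problem at issue.

```python
import numpy as np

def s(T1, T2, t=np.linspace(0.1, 3.0, 30)):
    """Toy 2-parameter signal model; a placeholder for the MRF sequence."""
    return np.exp(-t[None, :] / T2[:, None]) * (1 - np.exp(-t[None, :] / T1[:, None]))

# Dictionary: a dense grid over (T1, T2); size grows as grid^L with L unknowns.
T1g, T2g = np.meshgrid(np.linspace(0.5, 2.0, 100), np.linspace(0.05, 0.2, 100))
atoms = s(T1g.ravel(), T2g.ravel())
atoms /= np.linalg.norm(atoms, axis=1, keepdims=True)  # normalize for matching

def match(y):
    """Return the grid point whose normalized atom best correlates with y."""
    i = np.argmax(atoms @ y)
    return T1g.ravel()[i], T2g.ravel()[i]

# One noisy voxel: y = s(x) + eps, recovered up to grid resolution.
y = s(np.array([1.2]), np.array([0.1]))[0]
y += 0.01 * np.random.default_rng(1).normal(size=y.size)
print(match(y))  # close to (1.2, 0.1)
```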
Machine Learning at Different “Depths” for QMRI

Idea: “learn” separate scalar estimators x̂1(y), . . . , x̂L(y) from simulated training data

Deep Learning
• promising for QMRI [Cohen et al., 2017, Virtue et al., 2017]
• needs many training points to avoid overfitting
• trained via non-convex optimization
• limited theoretical basis

Shallow Learning
• simpler structure needs fewer training points
• fast training via convex optimization
Shallow Learning with Kernels for QMRI

Idea: “learn” separate scalar estimators x̂1(y), . . . , x̂L(y) from simulated training data

• sample (x1, ε1), . . . , (xN, εN) and simulate y1, . . . , yN via the signal model s
• design nonlinear functions x̂l(·) := ĝl(·) + b̂l that seek to map each yn to xl,n; fitting (gl, bl) by least squares alone is ill-posed, so we regularize:

$(\hat g_l, \hat b_l) \in \arg\min_{g_l \in \mathcal{G},\, b_l \in \mathbb{R}} \; \frac{1}{N} \sum_{n=1}^{N} \bigl( g_l(y_n) + b_l - x_{l,n} \bigr)^2 + \rho_l \, \lVert g_l \rVert_{\mathcal{G}}^2 \qquad (1)$

Solution: Parameter Estimation via Regression with Kernels (PERK)
[Nataraj et al., 2017b, arXiv:1710.02441]

• restrict the optimization to a certain rich function space G with kernel k
• the optimal ĝl ∈ G takes the form $\hat g_l(\cdot) = \sum_{n=1}^{N} a_{l,n} \, k(\cdot, y_n)$ [Schölkopf et al., 2001]

Fast, simple implementation: nonlinear lifting + high-dimensional linear regression (see the sketch below)
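A minimal sketch of this “lifting + linear regression” implementation, assuming a Gaussian kernel approximated with random Fourier features; the toy signal model, dimensions, and hyperparameters (Z, sigma, rho) here are illustrative choices, not those of the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def rff_lift(Y, W, phase):
    """Lift D-dim measurements Y (N x D) to Z-dim random Fourier features."""
    return np.sqrt(2.0 / W.shape[1]) * np.cos(Y @ W + phase)

def perk_train(Y, x, sigma=1.0, rho=1e-3, Z=300):
    """Fit one scalar estimator x_hat_l(y) = w^T z(y) + b from
    simulated training pairs (Y: N x D measurements, x: N latents)."""
    N, D = Y.shape
    W = rng.normal(scale=1.0 / sigma, size=(D, Z))   # Gaussian-kernel frequencies
    phase = rng.uniform(0, 2 * np.pi, size=Z)
    Zmat = rff_lift(Y, W, phase)
    # Center features and labels so the bias b decouples from w:
    z_mean, x_mean = Zmat.mean(axis=0), x.mean()
    Zc, xc = Zmat - z_mean, x - x_mean
    # Ridge regression in the lifted space (convex, closed form):
    w = np.linalg.solve(Zc.T @ Zc / N + rho * np.eye(Z), Zc.T @ xc / N)
    b = x_mean - z_mean @ w
    return dict(W=W, phase=phase, w=w, b=b)

def perk_apply(model, Y):
    """Voxel-by-voxel prediction: one lifted inner product per voxel."""
    return rff_lift(Y, model["W"], model["phase"]) @ model["w"] + model["b"]

# Toy usage with a made-up 4-measurement signal model s(x):
x_train = rng.uniform(0.5, 2.0, size=5000)            # e.g., latent T1 (s)
Y_train = np.column_stack([1 - np.exp(-t / x_train) for t in (0.2, 0.5, 1.0, 2.0)])
Y_train += 0.01 * rng.normal(size=Y_train.shape)      # additive noise eps
model = perk_train(Y_train, x_train)
x_hat = perk_apply(model, Y_train)
print("train RMSE:", np.sqrt(np.mean((x_hat - x_train) ** 2)))
```

Training reduces to one Z × Z linear solve per latent parameter, so cost scales linearly with L rather than exponentially as in dictionary matching.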
PERK for Magnetic Resonance Fingerprinting (MRF)

To control the lifting dimension, it is desirable for y to be low-dimensional.

[Figure: spiral k-space readouts (kx, ky) acquired per flip (flip 1, . . . , flip 840), with the flip angle pattern over the 0–3500 ms acquisition [Assländer et al., 2017]; six compressed “component” images (scale −1 to 4 a.u.)]

Pipeline: data-sharing across flips; gridding; FFT; PCA, giving a temporal basis $\mathbf{V} \in \mathbb{C}^{840 \times 6}$ and compressed component images

$\hat{\mathbf{Y}} \in \arg\min_{\mathbf{Y} \in \mathbb{C}^{n_{\text{voxels}} \times 6}} \bigl\lVert \mathbf{k} - A\bigl( \mathbf{Y} \mathbf{V}^{\mathsf{H}} \bigr) \bigr\rVert_2^2$
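A sketch of how such a temporal basis V can be obtained, assuming PCA (via SVD) of an ensemble of simulated fingerprints; the fingerprint simulator, parameter ranges, and timing below are placeholders, not the pSSFP sequence of [Assländer et al., 2017].

```python
import numpy as np

rng = np.random.default_rng(0)
n_flips, n_keep = 840, 6

# Placeholder ensemble of simulated fingerprints (rows: parameter draws).
t = np.arange(n_flips) * 4e-3                      # ~4 ms per flip (assumed)
T1 = rng.uniform(0.5, 2.0, size=2000)
T2 = rng.uniform(0.05, 0.2, size=2000)
D = np.exp(-t[None, :] / T2[:, None]) * (1 - np.exp(-t[None, :] / T1[:, None]))

# PCA via SVD of the ensemble: V holds the top right singular vectors.
_, _, Vt = np.linalg.svd(D, full_matrices=False)
V = Vt[:n_keep].conj().T                           # V in C^{840 x 6}

# Compress any fingerprint y (840,) to 6 coefficients, and expand back:
y = D[0]
coeff = y @ V                                      # low-dimensional input to PERK
y_approx = coeff @ V.conj().T
print("relative error:", np.linalg.norm(y - y_approx) / np.linalg.norm(y))
```

The six coefficients per voxel then serve as the low-dimensional y fed to PERK, keeping the lifting dimension manageable.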