Computational methods in applied inverse problems
Uri Ascher, Department of Computer Science, University of British Columbia
IMPA thematic program, October 2017
The practice of manipulating given observed data for solving inverse problems is known to have its perils: loss of statistical relevance, danger of calibrating a model to handle our own generated errors, etc.
And yet it seems to be everywhere in practice!
1. "Completing scarce data" by some interpolation/extrapolation or other approximation
2. Preferring to see data given at regular mesh nodes, or otherwise having a hidden uncertainty in the location of data values
3. "Completing data" to obtain a more efficient algorithm
4. "Completing data" to obtain a "more solid theory"
5. Manipulating data because we don't know how to solve the problem otherwise.
When is it OK to do this?!
We attempt to get more insight by considering case studies.
Given observed data d ∈ R^l and a forward operator f(m) which provides predicted data for each instance of a distributed parameter function m, find m (discretized and reshaped into m) such that the predicted and observed data agree to within noise:

d = f(m) + η.

Consider a case where a PDE must be solved to evaluate the forward operator, i.e., f(m) = Pu = P G(m) q, where G is a discrete Green's function. An iterative algorithm on m reduces the objective. Assuming η ∼ N(0, σ²I), the maximum likelihood (ML) data misfit function is

ϕ(m) = ‖f(m) − d‖₂².

The discrepancy principle yields the stopping criterion

ϕ(m) ≤ ρ, where ρ = σ²l.
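The discrepancy-principle stopping rule can be sketched in code. This is a minimal illustration with a toy linear forward operator A standing in for f; the matrix, the Landweber iteration, and all sizes are assumptions made for the sketch, not the lecture's setup:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear stand-in for the forward operator: f(m) = A @ m.
l, n = 100, 50
A = rng.standard_normal((l, n))
m_true = rng.standard_normal(n)
sigma = 0.01
d = A @ m_true + sigma * rng.standard_normal(l)   # d = f(m) + eta

def phi(m):
    """ML data misfit phi(m) = ||f(m) - d||_2^2."""
    r = A @ m - d
    return r @ r

# Discrepancy principle: stop the iteration once phi(m) <= rho = sigma^2 * l,
# i.e. as soon as the fit reaches the noise level; iterating further would
# only fit the noise.
rho = sigma**2 * l
m = np.zeros(n)
step = 1.0 / np.linalg.norm(A, 2) ** 2            # Landweber step size
for k in range(10_000):
    if phi(m) <= rho:
        break
    m = m - step * (A.T @ (A @ m - d))            # gradient step on phi
```

The point of the sketch is that the stopping tolerance is dictated by the noise statistics, not by an arbitrary convergence threshold.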
[Dupire, 1994]: replace the Black-Scholes equation for option price by a parabolic PDE of the form

∂C/∂τ = (1/2) σ²(τ, K) K² ∂²C/∂K² − bK ∂C/∂K,  τ > 0, K ≥ 0,

s.t. initial and boundary conditions (for calls)

C(τ = 0, K) = (S0 − K)₊,  lim_{K→∞} C(τ, K) = 0,  lim_{K→0} C(τ, K) = S0.

Here τ is time to maturity, K is strike price, C = C(τ, K) is the value of the European call option with expiration date T = τ, and σ(τ, K) is the volatility. We can write all this in operator form as

L(σ)C = q(S0),

with L a linear differential operator for a given σ. Assume first that the stock price S0 is a given parameter. Calibrating the model: solve the inverse problem for σ(τ, K) given C-data.
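The forward (Dupire) solve can be sketched with an explicit finite-difference scheme. In this sketch σ is taken constant and all numerical values (S0, b, grid sizes) are illustrative assumptions; the time step is chosen to satisfy the explicit-scheme stability bound Δτ ≤ ΔK²/(σ²K²_max):

```python
import numpy as np

# Illustrative parameters (constant volatility for the sketch).
S0, sigma, b = 100.0, 0.2, 0.05
K_max, NK, Ntau, T = 400.0, 200, 2000, 1.0
K = np.linspace(0.0, K_max, NK + 1)
dK, dtau = K[1] - K[0], T / Ntau      # dtau = 5e-4 < dK^2/(sigma^2 K_max^2)

C = np.maximum(S0 - K, 0.0)           # initial condition C(0, K) = (S0 - K)+
for _ in range(Ntau):
    # Central differences for the second and first derivatives in K.
    C_KK = (C[2:] - 2 * C[1:-1] + C[:-2]) / dK**2
    C_K = (C[2:] - C[:-2]) / (2 * dK)
    # Explicit Euler step of dC/dtau = (1/2) sigma^2 K^2 C_KK - b K C_K.
    C[1:-1] += dtau * (0.5 * sigma**2 * K[1:-1]**2 * C_KK - b * K[1:-1] * C_K)
    C[0], C[-1] = S0, 0.0             # boundary conditions at K = 0, K -> infinity
```

After the march, C holds an approximation of C(T, K) on the strike grid; in the calibration problem this solve sits inside the misfit evaluation for each candidate σ.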
Several researchers have applied interpolation/extrapolation to this type of data, followed by assimilation of the resulting data set with the discretized Dupire PDE problem.
We use [Kahale, 2005] for this purpose. This algorithm applies data completion with a "financial prior", insisting that the resulting data surface reproduce the "smile" effect.
An obvious objection, however, is that the resulting data surface does not satisfy the discretized differential problem for any m(τ, y), and vice versa. The assimilation of these two pieces of information may be more difficult.
We compare this to not modifying the given data, using for both cases a Tikhonov-type regularization as well as EnKF.
Maximum likelihood for the simplest case of white noise:

ϕ(m_h, u_h) = ‖P u_h − d‖² = ‖P L_h(m_h)⁻¹ q_h − d‖²,

where the matrix P projects to data locations: P has many more columns than rows for the original data, whereas P = I for completed data.
Regularize the problem: minimize the maximum a posteriori (MAP) merit function

ϕ_R(m_h, u_h) = ϕ(m_h, u_h) + R(m_h).

Our Tikhonov-like regularization operator is (a0 a known constant)

R(m_h) = α1 Σ_i Σ_j (m_{i,j} − a0)² + (α2/Δτ²) Σ_j Σ_{i=1}^{Mτ} (m_{i,j} − m_{i−1,j})² + (α3/Δy²) Σ_i Σ_{j=1}^{My} (m_{i,j} − m_{i,j−1})².
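The regularization operator above translates directly into code. This is a minimal sketch on a 2D grid m[i, j] ≈ m(τ_i, y_j); the function and argument names are my own:

```python
import numpy as np

def R(m, a0, alpha1, alpha2, alpha3, dtau, dy):
    """Tikhonov-type regularizer on a grid m[i, j] ~ m(tau_i, y_j):
    a zeroth-order term pulling the values toward the constant a0,
    plus squared first differences in the tau- and y-directions."""
    term0 = alpha1 * np.sum((m - a0) ** 2)
    term_tau = (alpha2 / dtau**2) * np.sum((m[1:, :] - m[:-1, :]) ** 2)
    term_y = (alpha3 / dy**2) * np.sum((m[:, 1:] - m[:, :-1]) ** 2)
    return term0 + term_tau + term_y
```

Note that a constant grid m ≡ a0 gives R = 0, which is exactly the prior: deviations from a0 and roughness in either grid direction are what get penalized.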
These results clearly show that the data completion approach has not delivered.
Additional tests for Henry Hub and WTI prices, using bilinear interpolation for the data completion and different α-weights in the Tikhonov-type priors, also clearly indicate that it is better to avoid the extensive data completion required here: the market implied smile, which has an important relationship with market risk, is better fitted upon using just the original data.
Both EnKF algorithms we tried [Iglesias, Law & Stuart, 2013; Calvetti, Ernst & Somersalo, 2014] were trivially (and significantly) improved by adding additional regularization using a0 and first derivatives.
After this improvement the EnKF algorithms were comparable to but not better than the Tikhonov-type regularization. Big plus: no ad hoc parameter search was required.
Uncertainty in data locations: different forms of location uncertainty

Examples

The notion that a (potentially noisy) data value d_i is given at a known, deterministic location x_i is often violated in practice. Here are some examples:
Uncertainty in data locations: triangle mesh vs image denoising

Surface triangle mesh denoising

Left: noisy triangle mesh. The data are nodal values (x_i, y_i, z_i): no distinction between data value and location! Uncertainty in higher dimension.
Right: our denoised triangle mesh.
[We had set out to generalize multiscale techniques for image denoising and ended up devising a completely different multiscale method for the surface mesh.]
Forced to cut corners: calibrating a soft 3D object

Capturing data (cont.)

Left: a (high resolution) surface mesh S with 15,368 vertices is used as a template to track captured point clouds.
Right: its (low resolution) corresponding volumetric mesh T with 9,594 nodes is used for spatial co-rotated linear FEM simulations.
Forced to cut corners: calibrating a soft 3D object

Elastic deformation

Denote the reference shape by X and the dynamic, deformed positions at a time instant t by x = x(t).
The element-wise stress-strain relationship using Hooke's law and Cauchy's linear strain tensor is

σ = Eϵ = E B_e (x_e − X_e),

where the 6 × 12 matrix B_e = B_e(X_e) depends on X_e nonlinearly.
For isotropic materials the 6 × 6 matrix E only depends on Young's modulus E and Poisson's ratio ν.
Denoting the per-element rotation matrix obtained from polar decomposition by R_e = R_e(x_e(t), X_e), the element-wise elastic forces using the co-rotated linear approximation are

f_e(E, ν, X_e, x_e(t)) = R_e K_e (R_e^T x_e(t) − X_e),  K_e = V_e B_e^T E B_e,

where K_e is the 12 × 12 element stiffness matrix and V_e is the element volume.
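As an illustration, the co-rotated force formula can be sketched for a 4-node (tetrahedral) element. The block-diagonal application of the 3 × 3 rotation R_e to the stacked 12-vector, and the SVD-based polar decomposition, are my assumptions about the implementation, not the authors' code:

```python
import numpy as np

def rotation_polar(F):
    """Rotation factor R of the polar decomposition F = R S, via SVD."""
    U, _, Vt = np.linalg.svd(F)
    R = U @ Vt
    if np.linalg.det(R) < 0:            # ensure a proper rotation (det = +1)
        U[:, -1] *= -1
        R = U @ Vt
    return R

def element_force(Re, Ke, xe, Xe):
    """Co-rotated linear elastic force f_e = R_e K_e (R_e^T x_e - X_e).
    Re is 3x3; it is applied per node to the 12-vectors xe, Xe
    (4 nodes x 3 coordinates) via a block-diagonal 12x12 rotation."""
    Rblk = np.kron(np.eye(4), Re)       # block-diagonal rotation
    return Rblk @ (Ke @ (Rblk.T @ xe - Xe))
```

With R_e = I this reduces to the ordinary linear force K_e(x_e − X_e); the rotation is what removes the artificial forces that plain linear FEM produces under large rigid rotations.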
The domain Ω is the unit square. Sources are of the form

q_i(x) = δ(x − x_{i1}) − δ(x − x_{i2}),

with x_{i1} a positive unit point source on the west boundary and x_{i2} a negative unit point source on the east boundary. We vary p boundary wall locations to get s = p² data sets.
Receivers are all grid points on the north and south walls. No sources or receivers at corners.
Uniform 64 × 64 mesh.
For bounds set µ_max = 1.2 max µ(x), µ_min = 1.2⁻¹ min µ(x).
PCG inner iteration limit r = 20; cgtol = 1.e−3.
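Assembling the s = p² dipole right-hand sides can be sketched as follows; the value p = 4 and the choice of wall rows are illustrative assumptions:

```python
import numpy as np

n, p = 64, 4                           # 64x64 grid; p wall locations (p = 4 assumed)
# p interior row indices along each wall (no sources at corners).
rows = np.linspace(1, n - 2, p).round().astype(int)

sources = []
for i1 in rows:                        # positive unit point source, west wall
    for i2 in rows:                    # negative unit point source, east wall
        q = np.zeros((n, n))
        q[i1, 0] = 1.0                 #  delta(x - x_{i1})
        q[i2, -1] = -1.0               # -delta(x - x_{i2})
        sources.append(q.ravel())
Q = np.array(sources)                  # s = p^2 right-hand sides, one per row
```

Each source has zero net charge (a +1/−1 dipole), which is the natural compatibility condition for this kind of boundary-source experiment.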
Thus, we want s larger for better reconstruction quality.
But the cost of solving the problem grows very fast (at least linearly with s)! We need to find more efficient approximations for evaluating the misfit function ϕ(m).
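One widely used way to cheapen the misfit evaluation (named here as background; not necessarily the approach developed in these lectures) is simultaneous random sources: the full misfit is the squared Frobenius norm of the residual matrix over all s sources, which equals the expectation of ‖Rw‖² over random Rademacher weights w, so it can be estimated with K ≪ s weighted forward solves. A sketch with a stand-in residual matrix:

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in residual matrix: column i plays the role of f(m) q_i - d_i,
# so the full misfit phi(m) = sum_i ||f(m) q_i - d_i||^2 is its squared
# Frobenius norm and costs s forward solves to evaluate exactly.
l, s = 100, 64
Rres = rng.standard_normal((l, s))
phi_exact = np.sum(Rres**2)

# Hutchinson-type estimate: for Rademacher w, E ||Rres @ w||^2 = ||Rres||_F^2,
# so averaging over K << s random source combinations needs only K solves.
K = 8
W = rng.choice([-1.0, 1.0], size=(s, K))
phi_est = np.sum((Rres @ W) ** 2) / K
```

The estimator is unbiased, and its accuracy improves like 1/√K, so a handful of combined solves can stand in for all s individual ones inside an iteration.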
Data completion and other statistically unholy manipulations, such as ignoring location uncertainty, are not an ideal undertaking from a theoretical point of view.
But in practical situations it is often quietly done by mathematicians, computer scientists and engineers alike.
We have seen instances where (more massive) such practices should be avoided.
We have seen instances where such practices can be tolerated, typically when other uncertainties dominate.
We have seen instances where such practices seem essential for obtaining plausible results, and where better algorithms are further sought.
The larger the proportion of missing data, the harder it is to produce an adequate completed set.