Seismic data inversion
Enrico Pieroni, Ernesto Bonomi, Emma Calabresu
Geophysics Area, CRS4
AGIP, Milano, 2000-07-07
The Art of Inverse Problems: inferring model parameters from output data

Inverse problems are among the most challenging in computational and applied science and have been studied extensively. Although there is no precise definition, inverse problems are concerned with the determination of inputs or sources from observed outputs or responses. This is in contrast to direct problems, in which outputs or responses are determined from knowledge of the inputs or sources.
Presentation outline
• inversion framework
• mathematical framework
• steepest descent optimization
• Lagrangian approach
• optimization loop
• Newton optimization
• conjugate direction optimization
• 1D optimization
• constrained optimization
• test cases

References:
• "Gauss-Newton and full Newton methods in frequency-space seismic waveform inversion", Pratt, Shin, Hicks, Geophys. J. Int. (1998) 133, 341-362
• "High resolution velocity model estimation from refraction and reflection data", Forgues, Scala, Pratt, SEG 1998
• "Seismic waveform inversion in the frequency domain", Pratt, Geophysics (1999)
• "Nonmonotone Spectral Projected Gradient Methods on Convex Sets", Birgin, Martinez, Raydan (1999)
• "Multiscale seismic waveform inversion", Bunks, Saleck, Zaleski, Chavent, Geophysics (1995) 60, 1457-1473
• "Nonlinear inversion of seismic reflection data in a laterally invariant medium", Pica, Diet, Tarantola, Geophysics (1990) 55, 284-292
• "Pre-stack inversion of a 1D medium", Kolb, Collino, Lailly, IEEE (1986) 74, 498-508
Inversion framework
• Parameters: Nx·Ny·Nz unknowns to recover: the velocity field c(x,y,z)
• Observed data/measurements: data recorded at a reference depth, STACK(x,y,t) = P(x,y,z=0,t)
• Simulated data: wave-field propagation governed by the acoustic wave equation, using some trial velocity field
• Inversion: find the velocity field that minimizes some measure of the misfit between observed and simulated data

We solved the inverse problem with a single-shot acquisition. The generalization to multiple shots is straightforward and can result in a better inversion.
Mathematical framework

Cost function:
E[c] = (1/2) ∫ dV ∫ dt [P(x,y,z,t) − STACK(x,y,t)]² δ(z)

Acoustic wave equation:
(1/c²(x,y,z)) ∂²P/∂t² (x,y,z,t) − ∇²P(x,y,z,t) = s(x,y,z,t)

• measure STACK(x,y,t) at the reference level z=0, produced by a single source
• try a guess c⁽⁰⁾(x,y,z) for the velocity field
• solving the acoustic wave equation, simulate the pressure field over the entire spatial domain (with adequate B.C. and I.C.)
• evaluate the error or cost function and, if necessary, its derivatives (cumbersome)
• update the velocity field iteratively, with the intent of minimizing the error function:
  c⁽ⁿ⁺¹⁾ = c⁽ⁿ⁾ + δc⁽ⁿ⁾  such that  E[P(c)] decreases
• iterate this procedure down to a fixed "error threshold"
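The discretized cost function can be sketched directly. In the snippet below the array names, shapes and index ordering are illustrative assumptions, not taken from the original code; the δ(z) in the integral simply selects the recording level z = 0:

```python
import numpy as np

def misfit(P, stack, dx, dy, dt):
    """Discretized E[c] = 1/2 * sum_{x,y,t} [P(x,y,0,t) - STACK(x,y,t)]^2 dx dy dt.
    P has shape (nx, ny, nz, nt); the delta(z) picks out the z index 0."""
    resid = P[:, :, 0, :] - stack       # residual at the recording level z = 0
    return 0.5 * dx * dy * dt * np.sum(resid ** 2)

# smoke test on tiny arrays
P = np.zeros((4, 4, 3, 5))
stack = np.zeros((4, 4, 5))
```

A perfect match gives E = 0; any mismatch at z = 0 contributes quadratically, weighted by the cell volume and time step.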
Steepest descent optimization

The velocity updating technique is usually based on local information, e.g. the gradient:
c' = c − α dE/dc,  with some α > 0.

A fixed point of this update, dE/dc (c*) = 0, is a (possibly local) minimum of E.

Problem: avoid local minima. Starting from c⁽⁰⁾, does following −dE/dc lead to the global minimum c_min, or only to a nearby local minimum c*?

[Figure: sketch of E(c) showing a local minimum c* and the global minimum c_min]
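A minimal sketch of the update rule on a made-up 1-D error function with one local and one global minimum; the function, step size and iteration count are illustrative, not the seismic cost function:

```python
def E(c):
    # toy 1-D "error function": a tilted double well with a local
    # minimum near c = +1 and the global minimum near c = -1
    return (c ** 2 - 1.0) ** 2 + 0.3 * c

def dE(c):
    return 4.0 * c * (c ** 2 - 1.0) + 0.3

def steepest_descent(c, alpha=0.01, nit=2000):
    # c' = c - alpha * dE/dc, with a fixed alpha > 0 for simplicity
    for _ in range(nit):
        c = c - alpha * dE(c)
    return c

c_left = steepest_descent(-1.5)    # starts in the basin of the global minimum
c_right = steepest_descent(+1.5)   # gets trapped in the local minimum
```

Both runs reach a fixed point with dE/dc ≈ 0, but only the first one is the global minimum: this is exactly the "avoid local minima" problem.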
Lagrangian approach

The constrained minimization problem is handled with a Lagrangian:
j[c, P, λ] = E[c(P)] + ∫ dV ∫ dt λ(x,y,z,t) [ (1/c²) ∂²P/∂t² − ∇²P − s ]

Stationarity with respect to P yields the adjoint field λ: a sort of wave equation whose source term is the residual error,
(1/c²) ∂²λ/∂t² − ∇²λ = [P(x,y,z,t) − STACK(x,y,t)] δ(z)
λ(x,y,z,T) = 0,  ∂λ/∂t (x,y,z,T) = 0

The final conditions at t = T mean the adjoint field is integrated back in time!

From P and λ, evaluate the gradient:
dE/dc = −(2/c³) ∫₀ᵀ dt λ ∂²P/∂t²

PROBLEM: time alignment of the two fields!
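The adjoint recipe can be checked on a scalar toy problem. The sketch below uses a trivial "wave equation" c·P = s as a stand-in for the real one, with made-up names (s, d, lam); it follows the same three steps: solve the state equation, solve the adjoint equation with the residual as source, combine the two fields into the gradient, and verify the result against a finite-difference derivative:

```python
# Toy adjoint-state gradient, a scalar stand-in for the wave-equation case:
#   state equation   c * P = s          (plays the role of the wave equation)
#   misfit           E(c) = 1/2 (P - d)^2
#   Lagrangian       j = 1/2 (P - d)^2 + lam * (c * P - s)

def gradient_adjoint(c, s, d):
    P = s / c              # solve the "direct" problem
    lam = -(P - d) / c     # adjoint equation: stationarity of j w.r.t. P
    return lam * P         # dE/dc = dj/dc = lam * P

def gradient_fd(c, s, d, h=1e-7):
    E = lambda c: 0.5 * (s / c - d) ** 2
    return (E(c + h) - E(c - h)) / (2.0 * h)

g_adj = gradient_adjoint(2.0, 3.0, 1.0)
g_fd = gradient_fd(2.0, 3.0, 1.0)
```

One adjoint solve gives the full derivative, which is the whole point of the Lagrangian approach: no extra direct propagation per parameter.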
Optimization loop

do it = 0, nit-1
   call FMod              ! forward modelling: record data at z=0 and on the boundaries
   do step = 0, nt-1
      call BMod           ! use the boundary information to backpropagate the field P
      call LoadMeasField  ! load the observed data
      call AdjMod         ! build the residual source term and solve for the adjoint field
      call PartialGrad    ! accumulate the gradient
      call PartialCostF   ! accumulate the cost function
   end do
   call Optimizer         ! update the velocity field
end do

Inner loop: align in time both the direct field P(r,t) and the adjoint field λ(r,t), from t = 0 to t = T, to perform an in-core gradient evaluation.
Newton optimization

The optimization procedure can also use information from the Hessian (the second-derivative matrix), e.g. in Newton, Quasi-Newton or Gauss-Newton methods:
c' = c − H⁻¹(E) ∇E

but this is very expensive in both computation (# direct propagations = # parameters) and storage ( [Nx·Ny·Nz]² )! Thus, aiming at a 3D reconstruction, we decided to use only the gradient.
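On a quadratic cost, a single Newton step lands exactly on the minimizer, which is what makes the Hessian attractive when it is affordable. A small sketch with an illustrative 2x2 SPD Hessian (nothing to do with the seismic problem's size, where H alone would need [Nx·Ny·Nz]² entries):

```python
import numpy as np

# One Newton step c' = c - H^{-1} grad E on the quadratic
# E(c) = 1/2 c^T H c - b^T c, whose exact minimizer is H^{-1} b.
H = np.array([[4.0, 1.0],
              [1.0, 3.0]])               # illustrative SPD Hessian
b = np.array([1.0, 2.0])

c = np.array([10.0, -7.0])               # arbitrary starting point
grad = H @ c - b                         # gradient of the quadratic
c_new = c - np.linalg.solve(H, grad)     # Newton step (solve, never invert)

c_star = np.linalg.solve(H, b)           # exact minimizer, for comparison
```

For a non-quadratic E the step is only locally this good, and each step still requires forming or applying H, which is the cost that rules it out here.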
Optimization techniques

[Figure: comparison chart of optimization techniques, trading off storage requirements against convergence]
Conjugate direction optimization

To achieve better convergence we studied different conjugate direction algorithms (but we have not observed noticeable differences between them). With g⁽ᵏ⁾ = ∇E(c⁽ᵏ⁾):

c⁽ᵏ⁺¹⁾ = c⁽ᵏ⁾ + α⁽ᵏ⁾ d⁽ᵏ⁾
d⁽ᵏ⁾ = −g⁽ᵏ⁾ + β⁽ᵏ⁾ d⁽ᵏ⁻¹⁾,   d⁽⁰⁾ = −g⁽⁰⁾

[1] Fletcher-Reeves:   β⁽ᵏ⁾ = |g⁽ᵏ⁾|² / |g⁽ᵏ⁻¹⁾|²
[2] Polak-Ribiere:     β⁽ᵏ⁾ = g⁽ᵏ⁾·(g⁽ᵏ⁾ − g⁽ᵏ⁻¹⁾) / |g⁽ᵏ⁻¹⁾|²
[3] Hestenes-Stiefel:  β⁽ᵏ⁾ = g⁽ᵏ⁾·(g⁽ᵏ⁾ − g⁽ᵏ⁻¹⁾) / d⁽ᵏ⁻¹⁾·(g⁽ᵏ⁾ − g⁽ᵏ⁻¹⁾)
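The three β rules can be compared on a small quadratic test problem. The matrix, vector and iteration count below are illustrative, and the exact line search used is valid only for quadratics; on a quadratic all three rules coincide with linear CG and reach the same minimizer:

```python
import numpy as np

def beta_fr(g_new, g_old, d_old):
    # [1] Fletcher-Reeves
    return (g_new @ g_new) / (g_old @ g_old)

def beta_pr(g_new, g_old, d_old):
    # [2] Polak-Ribiere
    return (g_new @ (g_new - g_old)) / (g_old @ g_old)

def beta_hs(g_new, g_old, d_old):
    # [3] Hestenes-Stiefel
    y = g_new - g_old
    return (g_new @ y) / (d_old @ y)

# quadratic test problem E(c) = 1/2 c^T H c - b^T c, minimizer H^{-1} b
H = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 0.5],
              [0.0, 0.5, 2.0]])
b = np.array([1.0, -2.0, 0.5])
grad = lambda c: H @ c - b

def cg_minimize(beta_rule, c, nit=30):
    g = grad(c)
    d = -g
    for _ in range(nit):
        alpha = -(g @ d) / (d @ H @ d)   # exact line search for a quadratic
        c = c + alpha * d
        g_new = grad(c)
        if np.linalg.norm(g_new) < 1e-12:
            break
        d = -g_new + beta_rule(g_new, g, d) * d
        g = g_new
    return c

c_star = np.linalg.solve(H, b)
```

On a genuinely nonlinear cost the rules differ (Polak-Ribiere and Hestenes-Stiefel reset more gracefully after bad steps), but as the slide notes, the differences observed in practice were small.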
1D optimization

At each iteration step, for each fixed direction d and velocity c, find a scalar α such that the resulting error function (depending now on a single real parameter),
F(α) = j(c + α d),
is minimum:
min over α of F(α),
e.g. by line search, bisection, or generalized decreasing conditions.
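One of the listed options, bisection, looks for a zero of the derivative F'(α). A minimal sketch on a made-up quadratic F; the bracket endpoints and tolerance are illustrative:

```python
def line_search_bisection(dF, a_lo, a_hi, tol=1e-8, max_iter=200):
    """Bisect on the derivative of F(alpha) = E(c + alpha * d):
    find alpha with dF(alpha) ~ 0, assuming dF(a_lo) < 0 < dF(a_hi)."""
    assert dF(a_lo) < 0.0 < dF(a_hi)
    while a_hi - a_lo > tol and max_iter > 0:
        mid = 0.5 * (a_lo + a_hi)
        if dF(mid) < 0.0:
            a_lo = mid      # minimum lies to the right of mid
        else:
            a_hi = mid      # minimum lies to the left of mid
        max_iter -= 1
    return 0.5 * (a_lo + a_hi)

# example: F(alpha) = (alpha - 0.7)^2, so dF(alpha) = 2 (alpha - 0.7)
alpha_star = line_search_bisection(lambda a: 2.0 * (a - 0.7), 0.0, 2.0)
```

Each iteration halves the bracket, so the cost is one dF evaluation per bit of accuracy; in the seismic code each such evaluation is a full wave propagation, which is why cheap generalized decreasing conditions matter.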
Constrained optimization

Because of the box constraints on the velocity, c_min ≤ c ≤ c_max, we are forced to adopt the projected conjugate gradient. The projected gradient g' is:

g'(c) = 0      if c = c_min and g(c) > 0
g'(c) = 0      if c = c_max and g(c) < 0
g'(c) = g(c)   otherwise

and each update c' = c + α d is projected back onto [c_min, c_max].
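The box-constrained update can be sketched with a componentwise projection. This is a plain projected-gradient sketch with made-up numbers; the actual method projects a conjugate direction, but the projection mechanics are the same:

```python
import numpy as np

def project(c, c_min, c_max):
    # componentwise projection onto the box [c_min, c_max]
    return np.clip(c, c_min, c_max)

def projected_gradient(grad, c, c_min, c_max, alpha=0.1, nit=500):
    # c' = P_box(c - alpha * grad E(c)): components hitting a bound
    # are clipped, which is equivalent to zeroing their gradient there
    for _ in range(nit):
        c = project(c - alpha * grad(c), c_min, c_max)
    return c

# example: minimize |c - target|^2 with the target partly outside the box
target = np.array([0.5, 3.0])
grad = lambda c: 2.0 * (c - target)
c_opt = projected_gradient(grad, np.zeros(2), 0.0, 1.0)
```

The free component converges to its unconstrained optimum (0.5), while the component whose optimum lies outside the box sticks to the active bound (1.0), exactly the behavior the projected gradient g' encodes.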
Test cases

We will consider the inversion of small 2D synthetic data-sets. For a better tuning of the algorithms we used velocity fields with no lateral variations, but the code is genuinely 2D.

Parameters: nx = 116, nz = 66, nt = 270, dx = 3., dz = 3., dt = 0.00065, thick_x = 0, thick_z = 0, rec_thick_x = 1, rec_thick_z = 1, z_record = 4, Nopt = 20, Niter = 100
Target: piecewise constant function
Initial guess: straight line
Very good result: only small changes after 140 iterations ...
Log scale! The cost function decreases by about 4 orders of magnitude. The steepest slope occurs in the first ~20 iterations. A second sudden drop comes as the velocity reaches the second ridge!
Target: piecewise constant function
Initial guess: straight line
After ~10 iterations we get the first ridge ...
We see the steepest slope in the first ~10 iterations; a 'plateau' seems to follow!
We take one of the last iterated fields (#11) and freeze the gradient over the first 20 layers.
In ~20 iterations we reach both the first and the second ridges!
After ~5 iterations the main ridge is detected!
Target: piecewise constant function
Initial guess: straight line, but it does not match the 'trend'
Iterated velocity field
Things go wrong if the low-frequency trend is not included in the initial guess ...
Here we start from #2 of the previous iterations ...
Freezing the first 20 layers, the 1st discontinuity gets worse but we better recover the 2nd one ...
Target: parabola
Initial guess: straight line
Good! After ~170 iterations things do not change much!
Log scale! The cost function decreases by 3 orders of magnitude. Steepest slope in the very first (~5) steps.
Target: parabola
We start from the previous velocity field (#60) and freeze the gradient at the first layers (#20).
In ~10 iterations we get a really good result!
In the first ~8 iterations we have the steepest slope ...
Target: parabola + sin
Initial guess: straight line
Iterated velocity field
Nice! The greatest part is done in the first ~100 iterations!
Cost function (log scale!)
As observed, most of the work is done in the first ~100 iterations!
Target: parabola + sin
We start from a previous iteration (#20) and freeze the gradient at the first 20 layers.
Not as good as before: we only get the medium trend!
Some preliminary conclusions

The main problem is the presence of a large number of local minima. To get rid of them it is possible to:
• linearize the direct model (e.g. Born approximation) to obtain a convex cost function; but we lose refracted and multiply reflected waves, etc.
• adopt some multi-scale approach, from large to small spatial scales, or from low to high time frequencies; but the ultra-low-frequency components (the velocity field trend) do not produce reflected waves, so they must already be present in the initial guess.
Time versus Frequency Domain

Advantages of the frequency domain:
• high data compression rate (~10)
• uncoupled problems, embarrassingly parallel
• large-to-small spatial scale approach, inverting small and large frequencies separately: the quickest and most scalable approach

The advantage of the time domain is the intuitive comprehension of the fields and results involved.
Extra time!
Spectral Conjugate Gradient Method

With s⁽ᵏ⁾ = c⁽ᵏ⁾ − c⁽ᵏ⁻¹⁾ and y⁽ᵏ⁾ = g⁽ᵏ⁾ − g⁽ᵏ⁻¹⁾, the spectral scaling is

θ⁽ᵏ⁾ = ⟨s⁽ᵏ⁾, s⁽ᵏ⁾⟩ / ⟨s⁽ᵏ⁾, y⁽ᵏ⁾⟩

and the search direction becomes d⁽⁰⁾ = −g⁽⁰⁾, d⁽ᵏ⁾ = −θ⁽ᵏ⁾ g⁽ᵏ⁾ + β⁽ᵏ⁾ d⁽ᵏ⁻¹⁾.

The advantage is that in this way the conjugate direction (−θ g) contains some explicit information on the Hessian matrix: θ is the inverse of a mean (integral) Hessian along the last step.
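The spectral scaling on its own is the Barzilai-Borwein step. The sketch below demonstrates just that scaling (the conjugate β term is omitted, and the matrix, step counts and safeguards are illustrative):

```python
import numpy as np

H = np.array([[3.0, 0.0],
              [0.0, 1.0]])               # illustrative diagonal Hessian
b = np.array([3.0, -1.0])
grad = lambda c: H @ c - b               # gradient of E = 1/2 c^T H c - b^T c

def spectral_gradient(c, nit=100):
    g = grad(c)
    c_new = c - 1e-3 * g                 # small first step (no theta available yet)
    for _ in range(nit):
        g_new = grad(c_new)
        s = c_new - c                    # s^(k) = c^(k) - c^(k-1)
        y = g_new - g                    # y^(k) = g^(k) - g^(k-1)
        sy = s @ y
        if sy <= 1e-30:                  # s and y vanish: converged
            break
        theta = (s @ s) / sy             # inverse of a mean Hessian along s
        c, g = c_new, g_new
        c_new = c - theta * g            # spectral step d = -theta * g
    return c_new

c_opt = spectral_gradient(np.zeros(2))
```

Note that θ adapts automatically: for this diagonal H it oscillates between the inverses of the eigenvalues 3 and 1, which is precisely the second-order information a plain gradient step lacks.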
Modified Nonlinear Conjugate Gradient

• In geophysical applications the number of parameters is very large; this motivates the choice of a conjugate gradient minimization algorithm.
• Without uphill movements (α < 0) in the line search procedure, no optimization method will prevent trapping inside local minima.
Line search (α > 0)

• In our approach α can be either positive, describing a movement along the descent direction pₖ, or negative.
• For α negative, the line search is similar.
Analytical 1D example

f(x) = x²/2 + (1/16) cos(80x) + (1/24) cos(100x)

• a very noisy function, presenting oscillations down to small scales (many local minima)
• after 7 steps both Wolfe conditions are satisfied

Allowing α < 0, the algorithm can visit and leave most local minima.
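The test function is only partly legible in this copy; assuming the reconstruction f(x) = x²/2 + (1/16)cos(80x) + (1/24)cos(100x), the sketch below counts its local minima on a grid to confirm the "oscillations down to small scales" claim:

```python
import math

# Assumed reconstruction of the slide's noisy test function; the exact
# coefficients are not fully legible, so treat them as illustrative.
def f(x):
    return 0.5 * x ** 2 + math.cos(80.0 * x) / 16.0 + math.cos(100.0 * x) / 24.0

# count interior local minima of f on [-1, 1] with a fine grid scan
n = 20000
xs = [-1.0 + 2.0 * i / n for i in range(n + 1)]
ys = [f(x) for x in xs]
n_local_minima = sum(1 for i in range(1, n)
                     if ys[i] < ys[i - 1] and ys[i] < ys[i + 1])
```

Dozens of local minima in a unit interval is exactly the landscape where a monotone (α > 0 only) line search stalls in the first basin it enters.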
Analytical 2D example

f(x) = (1/2) xᵀH x plus a low-amplitude (~1/100), high-frequency (cos(10·)) perturbation, with H built from sin(1), …, sin(N) and the vector (1, …, 1)

• the function is the sum of a simple convex quadratic and a low-amplitude, high-frequency perturbation (N = 2)
• after 8 steps both Wolfe conditions are satisfied

Allowing α < 0, the algorithm can visit and leave most local minima.
Analytical 32D example

• same function as before, with N = 32
• standard gradient-based minimization methods are not satisfactory with such a noisy function
• on nontrivial analytical examples, our approach converges quickly towards the global minimum
Number of parameters

The number of parameters plays a crucial role in the choice of the algorithm to minimize the cost function j(p) in the parameter space.
The landscape of the cost function presents many local minima; without uphill movements in the line search procedure, no optimization method will prevent trapping inside local minima.

[Figure: optimization methods arranged by storage requirements and convergence rate as the number of parameters grows]
The number p of parameters impacts the choice of the optimization strategy:
• for very small p, the gradient can be computed numerically
• for small p, use the gradient and the Hessian to compute the search directions: exact Hessian (Newton), or an approximation of the Hessian built as the iteration progresses (Quasi-Newton)
• for large p, use only the gradient to compute the search directions: nonlinear conjugate gradient
• for very large p, use stochastic methods: simulated annealing