Simple Method for Outlier Detection in Fitting Experimental Data under Interval Error Sergei Zhilin, [email protected] Altai State University, Barnaul, Russia

Jan 01, 2016

Page 1: Simple Method for Outlier Detection in Fitting Experimental Data under Interval Error

Simple Method for Outlier Detection in Fitting Experimental Data under Interval Error

Sergei Zhilin, [email protected]

Altai State University, Barnaul, Russia


Plan

• Fitting under interval error

• Simple method for outlier detection

• Geometric correction of satellite images

• Connections between the proposed approach and other theories

• Conclusions


Fitting under Interval Error

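• Black box approach: the inputs $x_1, x_2, \dots, x_p$ enter the model $f(x, \beta) + \varepsilon$, which produces the output $y$

• Input variables $x = (x_1, \dots, x_p)$ are measured without error

• Output variable $y$ is measured with error

• $f$ is a modeling function with known structure

• $\beta$ are the model parameters to be estimated, $\varepsilon$ is the measurement error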


Fitting under Interval Error

• Classical statistical approach often assumes that the measurement error is normal

• In real-life applications the error is interval rather than normal

• “Interval” means “unknown but bounded”: $\varepsilon \in [-\varepsilon_j, \varepsilon_j]$, where $\varepsilon_j$ is the error bound in the $j$-th measurement, $j = 1,\dots,n$

• There are no other assumptions about the error


Fitting under Interval Error

• The structure of the modeling function $f(x, \beta)$ is assumed fixed

• Each row $(x_j, y_j, \varepsilon_j)$ of the measurements table constrains possible values of the parameter $\beta$ with the set

$$S_j = \{\beta : y_j - \varepsilon_j \le f(x_j, \beta) \le y_j + \varepsilon_j\}, \quad j = 1,\dots,n.$$

• Values of the parameter consistent with all constraints form the uncertainty set

$$A = \bigcap_{j=1}^{n} S_j$$
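A parameter vector belongs to the uncertainty set exactly when the model output stays inside every measured interval. The following is a minimal Python sketch of that membership test. The straight-line model `line_model`, the function name `in_uncertainty_set` and the three measurement rows are assumptions made up for illustration; they do not come from the slides.

```python
from typing import Sequence, Tuple

Row = Tuple[Sequence[float], float, float]  # (x_j, y_j, eps_j)


def line_model(x, beta):
    """Hypothetical model f(x, beta) = beta[0] + beta[1] * x[0]."""
    return beta[0] + beta[1] * x[0]


def in_uncertainty_set(beta, rows, f=line_model):
    """Return True if beta satisfies every interval constraint, i.e. lies in A."""
    return all(y - eps <= f(x, beta) <= y + eps for x, y, eps in rows)


# Made-up measurements (x_j, y_j, eps_j), for illustration only.
rows = [((0.0,), 1.0, 0.2), ((1.0,), 3.1, 0.2), ((2.0,), 4.9, 0.2)]

print(in_uncertainty_set((1.0, 2.0), rows))  # consistent with all three rows
print(in_uncertainty_set((0.0, 2.0), rows))  # violates the first row
```

The test only answers a yes/no question for one candidate; describing the whole set requires the optimization problems stated on the following slides.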


Fitting under Interval Error

• Fitting data with the model $y = \beta_1 + \beta_2 x$

[Figure: in the $(x, y)$ domain, the set of feasible models; in the $(\beta_1, \beta_2)$ domain, the uncertainty set $A$]

• Uncertainty set $A$ is unbounded = not enough data to build the model


Fitting under Interval Error

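• Problems that may be stated with respect to the uncertainty set $A$

– Model parameters estimation

• Interval estimates of $\beta$: $[\underline{\beta}_1, \overline{\beta}_1], \dots, [\underline{\beta}_p, \overline{\beta}_p]$, where

$$\underline{\beta}_i = \min_{\beta \in A} \beta_i, \quad \overline{\beta}_i = \max_{\beta \in A} \beta_i, \quad i = 1,\dots,p.$$

• Point estimates of $\beta$: $\hat{\beta}_1, \dots, \hat{\beta}_p$, where

$$\hat{\beta}_i = \tfrac{1}{2}\left(\underline{\beta}_i + \overline{\beta}_i\right), \quad i = 1,\dots,p.$$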
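When the model is linear in its parameters, each bound of a parameter over the uncertainty set is the optimum of a small linear program. The sketch below assumes a linear model $y = X\beta$ and uses `scipy.optimize.linprog`. The function name `interval_estimates` and the four data points are made up for illustration.

```python
import numpy as np
from scipy.optimize import linprog


def interval_estimates(X, y, eps):
    """Lower bounds, upper bounds and midpoints of each parameter over
    A = {beta : |y_j - X_j @ beta| <= eps_j for all j}."""
    X, y, eps = (np.asarray(a, dtype=float) for a in (X, y, eps))
    p = X.shape[1]
    # Two-sided interval constraints written as A_ub @ beta <= b_ub.
    A_ub = np.vstack([X, -X])
    b_ub = np.concatenate([y + eps, eps - y])
    free = [(None, None)] * p  # linprog would otherwise assume beta >= 0
    lo, hi = np.empty(p), np.empty(p)
    for i in range(p):
        c = np.zeros(p)
        c[i] = 1.0
        low = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=free, method="highs")
        high = linprog(-c, A_ub=A_ub, b_ub=b_ub, bounds=free, method="highs")
        if low.status != 0 or high.status != 0:
            raise ValueError("uncertainty set is empty or unbounded")
        lo[i], hi[i] = low.fun, -high.fun
    return lo, hi, (lo + hi) / 2


# Made-up data for the line y = beta_1 + beta_2 * x, for illustration only.
x = np.array([0.0, 1.0, 2.0, 3.0])
X = np.column_stack([np.ones_like(x), x])
y = np.array([1.0, 3.1, 4.9, 7.0])
eps = np.full(4, 0.2)

lo, hi, mid = interval_estimates(X, y, eps)
print("interval estimates:", list(zip(lo.round(4), hi.round(4))))
print("point estimates:   ", mid.round(4))
```

The `ValueError` branch covers the two degenerate cases the slides mention: an unbounded set (not enough data) and an empty set.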


Fitting under Interval Error

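• Problems that may be stated with respect to the uncertainty set $A$

– Prediction of the output variable value for fixed values of input variables

• Interval estimate of $y$: $[\underline{y}(x), \overline{y}(x)]$, where

$$\underline{y}(x) = \min_{\beta \in A} x^{T}\beta, \quad \overline{y}(x) = \max_{\beta \in A} x^{T}\beta$$

• Point estimate of $y$:

$$\hat{y}(x) = \tfrac{1}{2}\left(\underline{y}(x) + \overline{y}(x)\right)$$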
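For a linear model the prediction bounds are again two linear programs, this time with the input vector as the objective. The sketch below assumes a linear model and uses `scipy.optimize.linprog`. The name `predict_interval` and the data are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog


def predict_interval(x0, X, y, eps):
    """Return (lower, upper, midpoint) of x0 @ beta over the uncertainty set."""
    x0, X, y, eps = (np.asarray(a, dtype=float) for a in (x0, X, y, eps))
    A_ub = np.vstack([X, -X])                  # |y - X beta| <= eps
    b_ub = np.concatenate([y + eps, eps - y])
    free = [(None, None)] * X.shape[1]         # parameters may be negative
    low = linprog(x0, A_ub=A_ub, b_ub=b_ub, bounds=free, method="highs")
    high = linprog(-x0, A_ub=A_ub, b_ub=b_ub, bounds=free, method="highs")
    if low.status != 0 or high.status != 0:
        raise ValueError("uncertainty set is empty or unbounded")
    return low.fun, -high.fun, (low.fun - high.fun) / 2


# Made-up data for the line y = beta_1 + beta_2 * x, for illustration only.
x = np.array([0.0, 1.0, 2.0, 3.0])
X = np.column_stack([np.ones_like(x), x])
y = np.array([1.0, 3.1, 4.9, 7.0])
eps = np.full(4, 0.2)

near = predict_interval([1.0, 1.5], X, y, eps)   # inside the measured range
far = predict_interval([1.0, 10.0], X, y, eps)   # extrapolation
print("x = 1.5 :", np.round(near, 4))
print("x = 10  :", np.round(far, 4))
```

With this data the extrapolated interval is wider than the interpolated one, since the slope uncertainty is multiplied by a larger input value.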


Fitting under Interval Error

• All the above problems make sense only if the uncertainty set is not empty

• Possible reasons for the emptiness of the uncertainty set

– Presence of outliers in the data set

– Wrong structure assumed for the modeling function
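For a linear model, emptiness of the uncertainty set can be checked as a linear feasibility problem. The sketch below assumes a linear model and uses `scipy.optimize.linprog` with a zero objective. The function name `uncertainty_set_is_empty` and both data sets are made up for illustration.

```python
import numpy as np
from scipy.optimize import linprog


def uncertainty_set_is_empty(X, y, eps):
    """True if no beta satisfies |y_j - X_j @ beta| <= eps_j for all j."""
    X, y, eps = (np.asarray(a, dtype=float) for a in (X, y, eps))
    A_ub = np.vstack([X, -X])
    b_ub = np.concatenate([y + eps, eps - y])
    res = linprog(np.zeros(X.shape[1]), A_ub=A_ub, b_ub=b_ub,
                  bounds=[(None, None)] * X.shape[1], method="highs")
    # With a zero objective the LP cannot be unbounded, so a failed solve
    # means there is no feasible beta (barring numerical trouble).
    return not res.success


# Made-up data for the line y = beta_1 + beta_2 * x, for illustration only.
x = np.array([0.0, 1.0, 2.0, 3.0])
X = np.column_stack([np.ones_like(x), x])
eps = np.full(4, 0.2)
y_good = np.array([1.0, 3.1, 4.9, 7.0])
y_bad = np.array([1.0, 3.1, 9.0, 7.0])   # third value is a gross error

print(uncertainty_set_is_empty(X, y_good, eps))
print(uncertainty_set_is_empty(X, y_bad, eps))
```

The check reports that the set is empty but not why; it cannot tell an outlier from a wrongly chosen model structure.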


Simple method for outlier detection

• Core idea

– An outlier may be treated as a measurement with an underestimated error (i.e. the actual measurement error is greater than the error $\varepsilon_j$ declared for it)

– What are the lower bounds $\varepsilon_j'$ for the actual errors which provide a non-empty uncertainty set?


Simple method for outlier detection

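[Figure: an outlier whose declared error interval $\varepsilon_j$ is stretched to $\varepsilon_j'$, shown in the variables domain $(x, y)$ and in the parameters domain $(\beta_1, \beta_2)$]

• How much must we stretch the declared error interval in order to «correct» an outlier?

• Let $\varepsilon_j' = w_j \cdot \varepsilon_j$. What is $w_j$?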


Simple method for outlier detection

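• Weights $w_j$ may be found from the following optimization problem

$$\min_{w,\,\beta} \sum_{j=1}^{n} w_j \tag{1}$$

Uncertainty set constraints with movable bounds:

$$y_j - w_j\varepsilon_j \le f(x_j, \beta) \le y_j + w_j\varepsilon_j, \quad j = 1,\dots,n \tag{2}$$

We can only enlarge error intervals…

$$w_j \ge 1, \quad j = 1,\dots,n \tag{3}$$

…or “freeze” some of the error intervals:

$$w_j \ge 1, \quad j = 1,\dots,k \tag{3}$$

$$w_j = 1, \quad j = k+1,\dots,n \tag{4}$$

Some of the measurements are obtained with equal errors:

$$w_{j_1} = w_{j_2} = \dots = w_{j_m} \quad \text{for each group } j_1,\dots,j_m \text{ of such measurements} \tag{5}$$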


Simple method for outlier detection

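• Example: data with outliers which give an empty uncertainty set, fitted with the model $y = \beta_1 + \beta_2 x$

1st attempt: solution of LPP (1)-(3)

| # | Measurement method | x | y | ε | w |
|---|---|---|---|---|---|
| 1 | A | 1 | 2.13 | 0.20 | 1.000 |
| 2 | A | 2 | 2.95 | 0.20 | 1.000 |
| 3 | A | 3 | 5.01 | 0.20 | 4.686 |
| 4 | A | 4 | 4.99 | 0.20 | 1.000 |
| 5 | A | 5 | 5.97 | 0.20 | 1.000 |
| 6 | B | 6 | 7.04 | 0.40 | 1.000 |
| 7 | B | 7 | 8.02 | 0.40 | 1.000 |
| 8 | C | 8 | 8.15 | 0.40 | 1.343 |
| 9 | C | 9 | 10.01 | 0.40 | 1.000 |
| 10 | D | 10 | 10.98 | 0.50 | 1.000 |

[Figure: the data with their error intervals and the fitted line, axes x and y]

• #3 ($w_3 = 4.686$) looks like an outlier caused by a blunder. Let’s try to exclude it.

• #8 ($w_8 = 1.343$) is not so explicit. We need to examine the precision of method C.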


Simple method for outlier detection

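• Example, fitted with the model $y = \beta_1 + \beta_2 x$

2nd attempt: solution of (1) subject to (2)-(3) and $w_8 = w_9$, with measurement #3 excluded

| # | Measurement method | x | y | ε | w |
|---|---|---|---|---|---|
| 1 | A | 1 | 2.13 | 0.20 | 1.000 |
| 2 | A | 2 | 2.95 | 0.20 | 1.000 |
| 3 | A | 3 | 5.01 | 0.20 | (excluded) |
| 4 | A | 4 | 4.99 | 0.20 | 1.000 |
| 5 | A | 5 | 5.97 | 0.20 | 1.000 |
| 6 | B | 6 | 7.04 | 0.40 | 1.000 |
| 7 | B | 7 | 8.02 | 0.40 | 1.000 |
| 8 | C | 8 | 8.15 | 0.40 | 1.143 |
| 9 | C | 9 | 10.01 | 0.40 | 1.143 |
| 10 | D | 10 | 10.98 | 0.50 | 1.000 |

[Figure: the data with their error intervals and the fitted line, axes x and y]

• Is the precision of method C overestimated by ~14%?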

Summary

In order to correct an inconsistent data set we have to answer the following questions:

1. Is the outlier #3 really caused by a blunder?

2. Is the outlier #8 caused by a blunder OR is the precision of the method C overestimated?
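The two attempts can be retraced numerically from the example table. The sketch below assumes that the fitted model is the straight line $y = \beta_1 + \beta_2 x$, that the last table column is the declared error bound $\varepsilon_j$, and that the second attempt drops measurement #3. The helper name `fit_weights` is hypothetical and `scipy.optimize.linprog` stands in for whatever LP solver the author used.

```python
import numpy as np
from scipy.optimize import linprog

# Example table from the slides: x, y and declared error bounds.
x = np.arange(1, 11, dtype=float)
y = np.array([2.13, 2.95, 5.01, 4.99, 5.97, 7.04, 8.02, 8.15, 10.01, 10.98])
eps = np.array([0.2] * 5 + [0.4] * 4 + [0.5])


def fit_weights(x, y, eps, equal_pairs=()):
    """Minimise sum(w) s.t. |y - b1 - b2*x| <= w*eps, w >= 1, w[a] == w[b]."""
    n = len(x)
    X = np.column_stack([np.ones(n), x])
    E = np.diag(eps)
    c = np.concatenate([np.zeros(2), np.ones(n)])
    A_ub = np.block([[X, -E], [-X, -E]])
    b_ub = np.concatenate([y, -y])
    A_eq = b_eq = None
    if equal_pairs:
        A_eq = np.zeros((len(equal_pairs), 2 + n))
        for r, (a, b) in enumerate(equal_pairs):
            A_eq[r, 2 + a], A_eq[r, 2 + b] = 1.0, -1.0
        b_eq = np.zeros(len(equal_pairs))
    bounds = [(None, None)] * 2 + [(1, None)] * n
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=bounds, method="highs")
    if res.status != 0:
        raise ValueError(res.message)
    return res.x[2:]


# 1st attempt: all ten measurements, constraints (1)-(3).
w_first = fit_weights(x, y, eps)

# 2nd attempt: drop measurement #3 and tie the weights of #8 and #9,
# which sit at positions 6 and 7 once #3 is removed.
keep = np.arange(10) != 2
w_second = fit_weights(x[keep], y[keep], eps[keep], equal_pairs=[(6, 7)])

print("1st attempt:", w_first.round(3))
print("2nd attempt:", w_second.round(3))
```

Under these assumptions the weights agree with the values reported in the two tables to three decimals.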


Geometric correction of satellite images

[Figure: distorted image in source coordinates $(x, y)$ and the target coordinate system $(u, v)$, with ground control points marked «+»]

Ground Control Points

| # | Source x | Source y | Target u | Target v |
|---|---|---|---|---|
| 1 | 2935 | 3072 | 14486.30 | 5991.49 |
| 2 | 2045 | 2745 | 14349.30 | 5927.55 |
| 14 | 1795 | 2714 | 14309.60 | 5919.30 |

• Target coordinates are obtained using high-precision methods (GPS, large-scale maps)

• Source coordinates are pointed by the operator on the screen with an error ≥ 1 pixel

Geometric transformation

$$x = a_{00} + a_{10}u + a_{01}v + a_{11}uv + a_{20}u^2 + a_{02}v^2$$

$$y = b_{00} + b_{10}u + b_{01}v + b_{11}uv + b_{20}u^2 + b_{02}v^2$$

• Outliers are detected «on the fly» and the operator is notified about the error

• After correction of outliers and building the transformation, the target image is built
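The second-order transformation is linear in its coefficients, so each ground control point contributes one linear interval constraint per source coordinate. The fitting and outlier detection problems from the earlier slides therefore apply to it without change. The sketch below builds the corresponding design matrix and evaluates the transformation. It assumes the coefficient ordering $a_{ij}u^iv^j$; the function names and coefficient values are made up for illustration.

```python
import numpy as np


def design_matrix(u, v):
    """Monomials [1, u, v, u*v, u**2, v**2], one row per point."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return np.stack([np.ones_like(u), u, v, u * v, u**2, v**2], axis=-1)


def transform(u, v, a, b):
    """Map target coordinates (u, v) to source coordinates (x, y).

    a and b hold the six coefficients in the order
    (00, 10, 01, 11, 20, 02), an ordering assumed for this sketch.
    """
    Phi = design_matrix(u, v)
    return Phi @ np.asarray(a, dtype=float), Phi @ np.asarray(b, dtype=float)


# Made-up coefficients, for illustration only.
a = [1.0, 0.0, 0.0, 1.0, 1.0, 0.0]   # x = 1 + u*v + u**2
b = [5.0, 0.0, 3.0, 0.0, 0.0, 0.0]   # y = 5 + 3*v

x_src, y_src = transform([2.0, 0.0], [3.0, 1.0], a, b)
print(x_src, y_src)
```

In a fit, `design_matrix(u, v)` would take the place of the input matrix and the operator's pointing error would be the declared bound for each control point.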


Geometric correction of satellite images

[Figure: resulting image with ground control points]

[Figure: resulting image with positional uncertainty map; scale label: positional uncertainty $(\overline{x} - \underline{x}) + (\overline{y} - \underline{y})$, pixels]


Connections with other theories

• Proposed approach and inconsistent linear programming problems

– When outliers are present in the data, most of the problems with respect to the uncertainty set may be stated as inconsistent linear programming problems

– The simple outlier detection method may be regarded as one of the possible ways to correct an inconsistent linear programming problem by building a minimal-cost approximation of it by a proper linear programming problem.


Connections with other theories

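• Proposed approach and robust estimation

$$\min_{w,\,\beta} \sum_{j=1}^{n} w_j \tag{1}$$

Uncertainty set constraints with movable bounds:

$$y_j - w_j\varepsilon_j \le f(x_j, \beta) \le y_j + w_j\varepsilon_j, \quad j = 1,\dots,n \tag{2}$$

We can only enlarge error intervals…

$$w_j \ge 1, \quad j = 1,\dots,n \tag{3}$$

We allow the error intervals to be scaled freely (to expand and to contract):

$$w_j \ge 0, \quad j = 1,\dots,n \tag{3'}$$

Solution $(\beta^*, w^*)$ of (1), (2), (3') gives:

• $\beta^*$ is an M-estimator for the parameters (known as L1)

• Weight function: $W(x) = 1/|x|$

• Residuals: $w_j^* \cdot \varepsilon_j$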
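The link to the L1 estimator can be made explicit with a short derivation, sketched here under the assumption that every declared bound $\varepsilon_j$ is positive. For a fixed $\beta$, the smallest non-negative weight that satisfies the movable-bound constraint is the scaled absolute residual, so the weights can be eliminated from the problem:

```latex
\begin{aligned}
w_j^*(\beta) &= \frac{\lvert y_j - f(x_j,\beta)\rvert}{\varepsilon_j},
  \qquad j = 1,\dots,n, \\
\min_{w \ge 0,\ \beta} \sum_{j=1}^{n} w_j
  &= \min_{\beta} \sum_{j=1}^{n}
     \frac{\lvert y_j - f(x_j,\beta)\rvert}{\varepsilon_j}.
\end{aligned}
```

The right-hand side is a least absolute deviations fit with each residual scaled by its declared error bound. At the optimum, $w_j^* \cdot \varepsilon_j$ equals the absolute residual of the $j$-th measurement. With $\rho(r) = \lvert r \rvert$ the M-estimator weight function is $\psi(r)/r = \operatorname{sign}(r)/r = 1/\lvert r \rvert$.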


Conclusions

• Outlier detection is a necessary tool in fitting experimental data

• The interval error model provides an effective means of solving the outlier detection problem

• The proposed approach is based on a simple idea and is easy to implement

• The proposed approach provides a flexible way to express and take into account a priori information