Document Analysis: Parameter Estimation for Pattern Recognition
Prof. Rolf Ingold, University of Fribourg
Master course, spring semester 2008

Dec 19, 2015
Transcript
Page 1

Document Analysis: Parameter Estimation for Pattern Recognition

Prof. Rolf Ingold, University of Fribourg

Master course, spring semester 2008

Page 2

© Prof. Rolf Ingold

Outline

Introduction
Parameter estimation
Non-parametric classifiers: kNN
Neural networks
Hidden Markov Models
Other approaches

Page 3

Introduction

Bayesian decision theory provides a theoretical framework for statistical pattern recognition.

It supposes the following probabilistic information to be available:

n, the number of classes

P(ωi), the a priori probability (prior) of each class ωi

p(x|ωi), the distribution of the feature vector x, depending on the class ωi

How can these values and functions be estimated, especially the class-dependent distribution (or density) functions?

Page 4

Approaches for statistical pattern recognition

Several approaches try to overcome the difficulty of obtaining the class-dependent feature distributions (or densities):

Parameter estimation: the form of the distributions is supposed to be known; only some parameters have to be estimated from training samples.

Parzen windows: densities are estimated from training samples by "smoothing" them with a window function.

K-nearest neighbors (KNN) rule: the decision is associated with the dominant class of the K nearest neighbors taken from the training samples.

Functional discrimination: the decision consists in minimizing an objective function within an augmented feature space.
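The KNN rule described above can be sketched in a few lines. This is a minimal illustration only, assuming a one-dimensional feature and absolute-difference distance; the training data and class labels are invented:

```python
def knn_classify(x, training, k=3):
    """K-nearest-neighbors rule: assign x to the dominant class among
    the k training samples closest to it (1-D feature, absolute distance)."""
    nearest = sorted(training, key=lambda s: abs(s[0] - x))[:k]
    labels = [label for _, label in nearest]
    # dominant class = most frequent label among the k neighbors
    return max(set(labels), key=labels.count)

# hypothetical training set: (feature value, class label) pairs
training = [(1.0, "roman"), (1.2, "roman"),
            (3.5, "italic"), (3.8, "italic"), (4.0, "italic")]

decision = knn_classify(1.1, training)  # two of the three nearest neighbors are "roman"
```

Note that no density is estimated at all: the decision is taken directly from the training samples.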

Page 5

Parameter Estimation

By hypothesis, the following information is supposed to be known:

n, the number of classes

for each class ωi:

the a priori probability P(ωi)

the functional form of the class-conditional feature densities p(x|ωi), with unknown parameters θi

a labeled set of training data Di = {xi1, xi2, ..., xiNi}, supposed to be drawn randomly from class ωi

In fact, parameter estimation can be performed class by class.

Page 6

Maximum likelihood criteria

Maximum likelihood estimation consists in determining the θi that maximizes the likelihood of Di, i.e.

p(Di|θi) = ∏k p(xk|θi)

For some distributions, the problem can be solved analytically via the equations

∇θi ∑k ln p(xk|θi) = 0

(is it really a maximum?)

If the solution cannot be found analytically, it can be computed iteratively by a gradient-climbing method.
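The maximum-likelihood criterion can be checked numerically. A minimal sketch, assuming a univariate Gaussian with known variance and invented sample values, showing that the log-likelihood peaks at the analytic solution (the sample mean):

```python
import math

def log_likelihood(samples, mu, sigma):
    """sum_k ln p(x_k | theta) for a univariate normal N(mu, sigma)."""
    return sum(
        -0.5 * math.log(2 * math.pi * sigma ** 2)
        - (x - mu) ** 2 / (2 * sigma ** 2)
        for x in samples
    )

samples = [4.1, 5.0, 5.3, 4.6, 5.5]   # invented training data
mu_ml = sum(samples) / len(samples)   # analytic ML estimate of the mean

# the likelihood at the ML estimate dominates nearby parameter values
assert log_likelihood(samples, mu_ml, 1.0) > log_likelihood(samples, mu_ml + 0.5, 1.0)
assert log_likelihood(samples, mu_ml, 1.0) > log_likelihood(samples, mu_ml - 0.5, 1.0)
```

Working with the log-likelihood rather than the likelihood itself is what makes the product over samples become a sum.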

Page 7

Univariate Gaussian distribution

In one dimension, the normal distribution N(μ,σ) is defined by the expression

p(x) = (1/(√(2π)·σ)) · exp(−(1/2)·((x−μ)/σ)²)

μ represents the mean

σ² represents the variance

the maximum of the curve corresponds to p(μ) = 1/(σ√(2π)) ≈ 0.399/σ
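The definition translates directly into code; a small sketch confirming the peak value 0.399/σ stated above:

```python
import math

def normal_pdf(x, mu, sigma):
    """p(x) = 1/(sqrt(2*pi)*sigma) * exp(-0.5*((x-mu)/sigma)**2)."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (math.sqrt(2 * math.pi) * sigma)

# the maximum of the curve: p(mu) = 1/(sigma*sqrt(2*pi)), about 0.399/sigma
assert abs(normal_pdf(0.0, 0.0, 1.0) - 0.3989) < 1e-3
assert abs(normal_pdf(0.0, 0.0, 2.0) - 0.3989 / 2) < 1e-3
```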

Page 8

Multivariate Gaussian distribution

In d dimensions, the generalized normal distribution N(μ,Σ) is defined by

p(x) = (1/((2π)^(d/2)·|Σ|^(1/2))) · exp(−(1/2)·(x−μ)ᵗ Σ⁻¹ (x−μ))

where μ represents the mean vector

μ = E[x] = ∫ x·p(x) dx,  with components μi = E[xi]

and Σ represents the covariance matrix

Σ = E[(x−μ)(x−μ)ᵗ] = ∫ (x−μ)(x−μ)ᵗ p(x) dx,  with entries σij = E[(xi−μi)(xj−μj)]
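The density above can be evaluated with plain arithmetic in the two-dimensional case, where the determinant and inverse of Σ have closed forms. A sketch, not a general implementation:

```python
import math

def gaussian_2d(x, mu, cov):
    """Bivariate normal density; |Sigma| and Sigma^(-1) are computed by hand
    for the 2x2 case (d = 2, so (2*pi)^(d/2) = 2*pi)."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    inv = [[d / det, -b / det], [-c / det, a / det]]
    diff = [x[0] - mu[0], x[1] - mu[1]]
    # quadratic form (x - mu)^t Sigma^(-1) (x - mu)
    q = sum(diff[i] * inv[i][j] * diff[j] for i in range(2) for j in range(2))
    return math.exp(-0.5 * q) / (2 * math.pi * math.sqrt(det))

# at the mean, with the identity covariance, the density is 1/(2*pi)
assert abs(gaussian_2d([0.0, 0.0], [0.0, 0.0],
                       [[1.0, 0.0], [0.0, 1.0]]) - 1 / (2 * math.pi)) < 1e-12
```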

Page 9

Interpretation of the parameters

The mean vector μ represents the center of the distribution. The covariance matrix Σ describes the scatter:

it is symmetrical: σij = σji

it is positive semidefinite (usually positive definite): σii = σi² ≥ 0

the principal axes of the constant-density hyperellipsoids are given by the eigenvectors of Σ

the lengths of the axes are given by the eigenvalues of Σ

if two features xi and xj are statistically independent, then σij = σji = 0

Page 10

Mahalanobis distance

Regions of constant density are hyperellipsoids centered at μ and characterized by the equation

(x−μ)ᵗ Σ⁻¹ (x−μ) = C

where C is a positive constant. The squared Mahalanobis distance from x to μ is defined as

r² = (x−μ)ᵗ Σ⁻¹ (x−μ)
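The quadratic form above is a direct double sum once Σ⁻¹ is available; a minimal sketch that takes the inverse covariance as a precomputed argument:

```python
def mahalanobis_sq(x, mu, cov_inv):
    """Squared Mahalanobis distance (x - mu)^t Sigma^(-1) (x - mu),
    given the inverse covariance matrix."""
    d = len(x)
    diff = [x[i] - mu[i] for i in range(d)]
    return sum(diff[i] * cov_inv[i][j] * diff[j]
               for i in range(d) for j in range(d))

# with the identity covariance it reduces to the squared Euclidean distance
assert mahalanobis_sq([3.0, 4.0], [0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]]) == 25.0
```

With a non-identity Σ⁻¹ the same expression rescales each direction by the scatter of the distribution, which is exactly why constant-density regions are ellipsoids rather than spheres.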

Page 11

Estimation of μ and σ of normal distributions

In the one-dimensional case, the maximum likelihood criterion leads to the following equations

∑k (1/σ*²)·(xk − μ*) = 0

−∑k 1/(2σ*²) + ∑k (xk − μ*)²/(2σ*⁴) = 0

In the one-dimensional case the solution is

μ* = (1/n) ∑k xk

σ*² = (1/n) ∑k (xk − μ*)²

Generalized to the multi-dimensional case, we obtain

μ* = (1/n) ∑k xk

Σ* = (1/n) ∑k (xk − μ*)(xk − μ*)ᵗ
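The one-dimensional solution is a two-line computation; a sketch on invented data:

```python
def ml_estimates(samples):
    """ML estimates for a univariate normal: the sample mean and the
    sample variance with divisor n (the biased ML form)."""
    n = len(samples)
    mu = sum(samples) / n
    var = sum((x - mu) ** 2 for x in samples) / n
    return mu, var

mu, var = ml_estimates([2.0, 4.0, 6.0])   # invented data
assert mu == 4.0
assert abs(var - 8.0 / 3.0) < 1e-12
```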

Page 12

Bias Problem

The estimate σ*² (resp. Σ*) is biased: its expected value over all sample sets of size n differs from the true variance, namely

E[(1/n) ∑k (xk − μ*)²] = ((n−1)/n)·σ² ≠ σ²

An unbiased estimate would be

σ̂² = (1/(n−1)) ∑k (xk − μ*)²

Both estimators converge asymptotically.

Which estimator is correct?

they are neither right nor wrong!

neither has all desirable properties

Bayesian learning theory can give an answer
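The two estimators differ only in the divisor, and by exactly the factor (n−1)/n; a sketch on invented data with n = 3:

```python
def variances(samples):
    """Biased (divisor n, the ML estimate) and unbiased (divisor n-1)
    variance estimates from the same samples."""
    n = len(samples)
    mu = sum(samples) / n
    ss = sum((x - mu) ** 2 for x in samples)
    return ss / n, ss / (n - 1)

biased, unbiased = variances([1.0, 2.0, 3.0])   # invented data, n = 3
assert abs(biased - 2.0 / 3.0) < 1e-12
assert abs(unbiased - 1.0) < 1e-12
# the two differ exactly by the factor (n-1)/n = 2/3 here
assert abs(biased - (2.0 / 3.0) * unbiased) < 1e-12
```

As n grows, the factor (n−1)/n tends to 1, which is the asymptotic convergence mentioned above.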

Page 13

Discriminant functions for normal distributions (1)

For normal distributions, the following discriminant functions may be stated

gi(x) = ln p(x|ωi) + ln P(ωi)

gi(x) = −(1/2)·(x−μi)ᵗ Σi⁻¹ (x−μi) − (d/2)·ln 2π − (1/2)·ln|Σi| + ln P(ωi)

Dropping the class-independent term (d/2)·ln 2π:

gi(x) = −(1/2)·(x−μi)ᵗ Σi⁻¹ (x−μi) − (1/2)·ln|Σi| + ln P(ωi)

In the case where all classes share the same covariance matrix Σ, the term ln|Σ| can be dropped as well:

gi(x) = −(1/2)·(x−μi)ᵗ Σ⁻¹ (x−μi) + ln P(ωi)

which expands to the linear form

gi(x) = μiᵗ Σ⁻¹ x − (1/2)·μiᵗ Σ⁻¹ μi + ln P(ωi)

and the decision boundaries are linear.
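The shared-covariance case reduces classification to evaluating one linear function per class and taking the maximum. A minimal sketch, assuming an identity covariance matrix (a simplifying assumption, so Σ⁻¹ disappears) and invented class means and priors:

```python
import math

def linear_discriminant(x, mu_i, prior_i):
    """g_i(x) = mu_i^t x - 0.5 * mu_i^t mu_i + ln P(omega_i), the linear form
    for a shared identity covariance matrix."""
    dot = sum(m * xi for m, xi in zip(mu_i, x))
    return dot - 0.5 * sum(m * m for m in mu_i) + math.log(prior_i)

# two hypothetical classes with equal priors; decide by the larger discriminant
classes = {"A": ([0.0, 0.0], 0.5), "B": ([4.0, 4.0], 0.5)}
x = [1.0, 1.0]
decision = max(classes, key=lambda c: linear_discriminant(x, *classes[c]))
assert decision == "A"   # x lies closer to the mean of class A
```

With equal priors this rule is simply "assign x to the class with the nearest mean", which is why the boundary between two classes is the hyperplane midway between their means.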

Page 14

Linear decision boundaries for normal distributions

Page 15

Discriminant functions for normal distributions (2)

In the case of arbitrary covariance matrices, the boundaries become quadratic:

gi(x) = −(1/2)·xᵗ Σi⁻¹ x + μiᵗ Σi⁻¹ x − (1/2)·μiᵗ Σi⁻¹ μi − (1/2)·ln|Σi| + ln P(ωi)
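The quadratic discriminant can be sketched for the 2×2 case, where |Σi| and Σi⁻¹ are available in closed form; the class-independent term −(d/2)·ln 2π is dropped, as on the slide:

```python
import math

def quadratic_discriminant(x, mu, cov, prior):
    """g_i(x) = -0.5*(x-mu)^t Sigma_i^(-1) (x-mu) - 0.5*ln|Sigma_i| + ln P(omega_i)
    for a 2x2 covariance matrix."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    inv = [[d / det, -b / det], [-c / det, a / det]]
    diff = [x[0] - mu[0], x[1] - mu[1]]
    q = sum(diff[i] * inv[i][j] * diff[j] for i in range(2) for j in range(2))
    return -0.5 * q - 0.5 * math.log(det) + math.log(prior)

# at the class mean, with identity covariance and prior 1, g reduces to 0
assert abs(quadratic_discriminant([0.0, 0.0], [0.0, 0.0],
                                  [[1.0, 0.0], [0.0, 1.0]], 1.0)) < 1e-12
```

Because each class now contributes its own xᵗΣi⁻¹x term, the difference of two discriminants is quadratic in x, hence the curved boundaries.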

Page 16

Page 17

Font Recognition : 1D-Gaussian estimation (1)

Font style discrimination (■ roman ■ italic) using hpd-stdev

estimated models fit the real distributions

decision boundary is accurate

recognition accuracy (96.3%) is confirmed by the experimental confusion matrix:

        roman  italic
roman    5610     390
italic     49    5951

[Figure: estimated 1D Gaussian densities and decision boundary for roman vs. italic, hpd-stdev from 2 to 16]
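The recognition accuracy reported on this slide can be recomputed from the confusion matrix. In the sketch below, splitting the slide's run of digits into a 2×2 matrix (rows = true class, columns = recognized class) is an assumption, but it reproduces the stated 96.3%:

```python
def accuracy(confusion):
    """Overall recognition rate: the trace of the confusion matrix
    divided by the total number of samples."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total

# roman/italic confusion matrix from this slide
conf = [[5610, 390],
        [49, 5951]]
assert abs(accuracy(conf) - 0.963) < 1e-3
```

The same helper applies unchanged to the 3-class and 12-class matrices on the following slides.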

Page 18

Font Recognition : 1D-Gaussian estimation (2)

Font boldness discrimination (■ normal ■ bold) using hr-mean

estimated models do not fit the real distributions

decision boundary is surprisingly well adapted

recognition accuracy (97.6%) is high, as observed from the experimental confusion matrix:

        normal  bold
normal    5942    58
bold       228  5772

[Figure: estimated 1D Gaussian densities and decision boundary for normal vs. bold, hr-mean from 2 to 8]

Page 19

Font Recognition : 1D-Gaussian estimation (3)

Boldness is generally dependent on the font family

hr-mean can perfectly discriminate ■ normal and ■ bold fonts if the font family is known (recognition rate > 99.9%)

[Figure: hr-mean histograms of ■ normal vs. ■ bold samples, shown per font family (Times, Courier, Arial) and for all families combined]

Page 20

Font Recognition : 1D-Gaussian estimation (4)

Font family discrimination (■ Arial, ■ Courier, ■ Times) using hr-mean

estimated models do not fit the real distributions at all

decision boundaries are inadequate

recognition accuracy is bad (41.9%)

         Arial  Courier  Times
Arial     2000      403   1597
Courier      5     2004   1991
Times     1055     1927   1018

[Figure: estimated 1D Gaussian densities and decision boundaries for the three font families, hr-mean from 2 to 8]

Page 21

Font Recognition : 1D-Multi-Gaussian estimation

Font family discrimination (■ Arial, ■ Courier, ■ Times) using hr-mean, supposing font style to be known for learning

estimated models fit the real distributions

decision boundaries are adequate

recognition accuracy is nearly optimal for the given feature (89.6%)

         Arial  Courier  Times
Arial     3722      125    153
Courier    119     3753    128
Times      517      209   3274

[Figure: estimated mixture of Gaussian densities per font family, hr-mean from 2 to 8]

Page 22

Font Recognition : 2D-Gaussian estimation

Font family discrimination (■ Arial, ■ Courier, ■ Times) using two features: hr-stdev and vr-mean

models fit approximately two classes but not the third one

decision boundary is surprisingly well adapted

recognition accuracy (93.5%) is reasonable

         Arial  Courier  Times
Arial     3918       82      0
Courier    175     3683    142
Times       18      364   3618

[Figure: estimated 2D Gaussian models and decision boundaries in the (hr-stdev, vr-mean) feature plane]

Page 23

Font Recognition : General Gaussian estimation

Performance of font family discrimination (■ Arial, ■ Courier, ■ Times) depends on the feature set used:

hr-stdev: recognition rate 72.7%

hr-stdev, vr-mean: recognition rate 93.5%

hp-mean, hr-mean, vr-mean: recognition rate 98.0%

hp-mean, hpd-stdev, hr-mean, vr-mean, hr-stdev, vr-stdev: recognition rate 99.7%

Page 24

Font recognition : classifier for all 12 classes

Discrimination of all fonts using all features: hp-mean, hpd-stdev, hr-mean, hr-stdev, vr-mean, vr-stdev

overall recognition rate of 99.6%

most errors due to roman/italic confusion

 992    0    7    0    0    1    0    0    0    0    0    0
   0  996    0    4    0    0    0    0    0    0    0    0
   1    0  996    0    0    2    0    1    0    0    0    0
   0    4    0  996    0    0    0    0    0    0    0    0
   0    0    0    0 1000    0    0    0    0    0    0    0
   0    0    0    0    0  983    0   17    0    0    0    0
   0    0    0    0    2    0  998    0    0    0    0    0
   0    0    5    0    0    4    0  991    0    0    0    0
   0    0    0    0    0    0    0    0 1000    0    0    0
   0    0    0    0    0    0    0    0    0 1000    0    0
   0    0    0    0    0    0    0    0    0    0 1000    0
   0    0    0    0    0    0    0    0    0    0    0 1000

Page 25

Error types

In a Bayesian classifier using parameter estimation, several error types occur:

Indistinguishability errors, due to overlapping of the distributions; they are inherent to the problem and cannot be reduced.

Modeling errors, due to a bad choice of the parametric density functions (models); they can be avoided by changing the models.

Estimation errors, due to the imprecision of the training data; they can be reduced by increasing the amount of training data.

Page 26

Influence of the size of training data

Evolution of the error rate as a function of the size of the training sets (experiment with 4 training sets and 2 test sets, ■ average)

[Figure: recognition rate (about 0.994 to 0.996) versus training set size (50 to 200)]