06a Math Essentials 2

Transcript
  • Slide 1/22

    Machine Learning: Math Essentials, Part 2

    Jeff Howbert, Introduction to Machine Learning, Winter 2012

  • Slide 2/22

    Gaussian distribution

    Most commonly used continuous probability distribution

    Also known as the normal distribution

    Two parameters define a Gaussian:

    Mean μ: location of center

    Variance σ²: width of curve

  • Slide 3/22

    Gaussian distribution

    In one dimension:

    \[ \mathcal{N}(x \mid \mu, \sigma^2) = \frac{1}{(2\pi\sigma^2)^{1/2}} \exp\!\left( -\frac{(x - \mu)^2}{2\sigma^2} \right) \]

  • Slide 4/22

    Gaussian distribution

    In one dimension:

    \[ \mathcal{N}(x \mid \mu, \sigma^2) = \frac{1}{(2\pi\sigma^2)^{1/2}} \exp\!\left( -\frac{(x - \mu)^2}{2\sigma^2} \right) \]

    The normalizing constant (2πσ²)^(−1/2) ensures that the distribution integrates to 1.

    σ² controls the width of the curve.

    The exponential term causes the pdf to decrease as distance from the center increases.
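    As a quick numerical check (an addition to this transcript, not part of the original deck), a minimal MATLAB sketch that evaluates this pdf from the formula and confirms it integrates to 1; the values μ = 0, σ² = 2 are arbitrary illustrative choices:

    % 1-D Gaussian pdf, computed directly from the formula above
    mu = 0;  sigma2 = 2;                           % illustrative parameters
    x  = linspace( -10, 10, 2001 );
    p  = exp( -( x - mu ).^2 / ( 2 * sigma2 ) ) / sqrt( 2 * pi * sigma2 );
    trapz( x, p )                                  % numerical integral, approximately 1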

  • Slide 5/22

    Gaussian distribution

    [Figure: 1-D Gaussian pdfs for (μ = 0, σ² = 1), (μ = 2, σ² = 1), (μ = 0, σ² = 5), and (μ = −2, σ² = 0.3)]
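    A small MATLAB sketch (added here, not from the deck) that reproduces curves like those in the figure, using the four (μ, σ²) pairs recovered from the figure labels:

    % plot 1-D Gaussian pdfs for several ( mu, sigma^2 ) settings
    params = [ 0 1; 2 1; 0 5; -2 0.3 ];            % [ mu sigma2 ] per row
    x = linspace( -6, 6, 601 );
    hold on;
    for i = 1 : size( params, 1 )
        mu = params( i, 1 );  s2 = params( i, 2 );
        plot( x, exp( -( x - mu ).^2 / ( 2 * s2 ) ) / sqrt( 2 * pi * s2 ) );
    end
    hold off;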

  • Slide 6/22

    Multivariate Gaussian distribution

    In d dimensions:

    \[ \mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}, \Sigma) = \frac{1}{(2\pi)^{d/2}\,|\Sigma|^{1/2}} \exp\!\left( -\tfrac{1}{2} (\mathbf{x} - \boldsymbol{\mu})^{\mathsf{T}} \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu}) \right) \]

    x and μ are now d-dimensional vectors; μ gives the center of the distribution in d-dimensional space.

    σ² is replaced by Σ, the d × d covariance matrix:

    Σ contains the pairwise covariances of every pair of features.

    The diagonal elements of Σ are the variances σ² of the individual features.

    Σ describes the distribution's shape and spread.
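    As an illustration (not from the slides), a minimal MATLAB sketch that evaluates this density directly from the formula; the 2-D parameter values are arbitrary, and with the Statistics Toolbox the result matches mvnpdf:

    % evaluate the d-dimensional Gaussian density at a point xp
    mu    = [ 0; 0 ];                              % illustrative parameters
    Sigma = [ 1.0 0.3; 0.3 1.0 ];
    xp    = [ 0.5; -0.2 ];
    d     = numel( mu );
    v     = xp - mu;
    p     = exp( -0.5 * v' * ( Sigma \ v ) ) / ( ( 2 * pi )^( d / 2 ) * sqrt( det( Sigma ) ) );
    % p matches mvnpdf( xp', mu', Sigma ) from the Statistics Toolbox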

  • Slide 7/22

    Multivariate Gaussian distribution

    Covariance measures the tendency for two variables to deviate from their means in the same (or opposite) directions at the same time.

    [Figure: scatter plots contrasting no covariance with high (positive) covariance]
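    To make the definition concrete, a small MATLAB sketch (an addition, not from the deck) computing the sample covariance of two variables that tend to deviate together:

    % sample covariance of two positively covarying variables
    n = 1000;
    a = randn( n, 1 );
    b = 0.8 * a + 0.2 * randn( n, 1 );             % b tends to deviate with a
    cab = ( a - mean( a ) )' * ( b - mean( b ) ) / ( n - 1 );
    % cab is positive, and matches the off-diagonal entry of cov( [ a b ] )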

  • Slide 8/22

    Multivariate Gaussian distribution

    In two dimensions

    [Figure: 2-D Gaussian with μ = (0, 0) and Σ = [ 0.25 0.3; 0.3 1 ]]

  • Slide 9/22

    Multivariate Gaussian distribution

    In two dimensions

    [Figure: 2-D Gaussians for Σ = [ 2 0.6; 0.6 2 ], Σ = [ 2 0; 0 1 ], and Σ = [ 1 0; 0 1 ]]

  • Slide 10/22

    Multivariate Gaussian distribution

    In three dimensions

    rng( 1 );
    mu    = [ 2; 1; 1 ];
    sigma = [ 0.25 0.30 0.10;
              0.30 1.00 0.70;
              0.10 0.70 2.00 ];
    x = randn( 1000, 3 );                          % standard normal samples
    x = x * chol( sigma );                         % Cholesky factor R, with R' * R = sigma,
                                                   % gives the samples covariance sigma
                                                   % (multiplying by sigma itself would give sigma^2)
    x = x + repmat( mu', 1000, 1 );                % shift center to mu
    scatter3( x( :, 1 ), x( :, 2 ), x( :, 3 ), '.' );
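    As a quick sanity check on the sample (an addition, not from the original slide), the empirical moments should approximate the parameters:

    mean( x )                                      % approximately mu' = [ 2 1 1 ]
    cov( x )                                       % approximately sigma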

  • Slide 11/22

    Vector projection

    Orthogonal projection of y onto x:

    Can take place in any space of dimensionality ≥ 2.

    The unit vector in the direction of x is x / || x ||.

    The length of the projection of y in the direction of x is || y || cos( θ ).

    The orthogonal projection of y onto x is the vector

    proj_x( y ) = x || y || cos( θ ) / || x || = [ ( x · y ) / || x ||² ] x   (using the alternate form of the dot product)

    [Figure: vectors x and y with angle θ between them, and proj_x( y ) along x]
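    A minimal MATLAB sketch of this computation (an addition to the transcript), using the dot-product form; the vectors are arbitrary illustrative choices:

    % orthogonal projection of y onto x, in any dimension
    x = [ 2; 1; 0 ];
    y = [ 1; 3; 1 ];
    p = ( dot( x, y ) / norm( x )^2 ) * x;         % proj_x( y )
    dot( y - p, x )                                % ~0: the residual is orthogonal to x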

  • Slide 12/22

  • Slide 13/22

    Linear models

    There are many types of linear models in machine learning.

    The projection output z is typically transformed to a final predicted output y by some function f:

    \[ y = f(z) = f(\boldsymbol{\beta} \cdot \mathbf{x}) = f(\beta_1 x_1 + \cdots + \beta_d x_d) \]

    Example: for linear regression, f( z ) = z.

    Example: for logistic regression, f is the logistic function.

    Models are called linear because they are a linear function of the model vector components β₁, …, β_d.

    Key feature of all linear models: no matter what f is, a constant value of z is transformed to a constant value of y, so decision boundaries remain linear even after the transform.
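    A small MATLAB sketch (added here, not in the deck) of the two examples: the same projection z = β · x passed through f( z ) = z for linear regression and through the logistic function for logistic regression; β and x are illustrative values:

    beta = [ 0.8; -0.5 ];                          % illustrative model vector
    x    = [ 1.2; 0.4 ];
    z    = beta' * x;                              % projection output
    y_linear   = z;                                % linear regression: f( z ) = z
    y_logistic = 1 / ( 1 + exp( -z ) );            % logistic regression: f is the logistic function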

  • Slide 14/22

    Geometry of projections

    slide thanks to Greg Shakhnarovich (CS195-5, Brown Univ., 2006)

    [Figure: projection geometry, with labels w and w0]

  • Slide 15/22

    Geometry of projections

    slide thanks to Greg Shakhnarovich (CS195-5, Brown Univ., 2006)

  • Slide 16/22

    Geometry of projections

    slide thanks to Greg Shakhnarovich (CS195-5, Brown Univ., 2006)

  • Slide 17/22

    Geometry of projections

    slide thanks to Greg Shakhnarovich (CS195-5, Brown Univ., 2006)

  • Slide 18/22

    Geometry of projections

    slide thanks to Greg Shakhnarovich (CS195-5, Brown Univ., 2006)

    [Figure: projection geometry, with the margin labeled]

  • Slide 19/22

  • Slide 20/22

    Logistic regression in two dimensions

    Interpreting the model vector of coefficients

    From MATLAB: B = [ 13.0460 -1.9024 -0.4047 ]

    α = B( 1 ),  β = [ β₁ β₂ ] = B( 2 : 3 )

    α and β define the location and orientation of the decision boundary:

    −α is the distance of the decision boundary from the origin

    the decision boundary is perpendicular to β

    The magnitude of β defines the gradient of probabilities between 0 and 1.
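    As an illustration (not in the original slide), a MATLAB sketch that unpacks B and computes a predicted probability along with the boundary's geometry; the input point x is an arbitrary choice, and the boundary is the line where α + β · x = 0:

    B     = [ 13.0460 -1.9024 -0.4047 ];
    alpha = B( 1 );
    beta  = B( 2 : 3 )';                           % column vector [ beta1; beta2 ]
    x     = [ 6; 3 ];                              % illustrative 2-D input
    p     = 1 / ( 1 + exp( -( alpha + beta' * x ) ) );   % P( y = 1 | x )
    dist  = abs( alpha ) / norm( beta );           % Euclidean distance of the line
                                                   % alpha + beta . x = 0 from the origin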

  • Slide 21/22

    Logistic function in d dimensions

    slide thanks to Greg Shakhnarovich (CS195-5, Brown Univ., 2006)

  • Slide 22/22

    Decision boundary for logistic regression

    slide thanks to Greg Shakhnarovich (CS195-5, Brown Univ., 2006)