06a Math Essentials 2

Transcript
  • Slide 1/22

    Machine Learning: Math Essentials, Part 2

    Jeff Howbert, Introduction to Machine Learning, Winter 2012

  • Slide 2/22

    Gaussian distribution

    Most commonly used continuous probability distribution

    Also known as the normal distribution

    Two parameters define a Gaussian:

    Mean μ: location of center

    Variance σ²: width of curve

  • Slide 3/22

    Gaussian distribution

    In one dimension:

    \[ \mathcal{N}(x \mid \mu, \sigma^2) = \frac{1}{(2\pi\sigma^2)^{1/2}} \exp\!\left( -\frac{(x - \mu)^2}{2\sigma^2} \right) \]

  • Slide 4/22

    Gaussian distribution

    In one dimension:

    \[ \mathcal{N}(x \mid \mu, \sigma^2) = \frac{1}{(2\pi\sigma^2)^{1/2}} \exp\!\left( -\frac{(x - \mu)^2}{2\sigma^2} \right) \]

    The normalizing constant (2πσ²)^(−1/2) ensures that the distribution integrates to 1.

    σ² controls the width of the curve.

    The exponential term causes the pdf to decrease as distance from the center increases.
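    As a quick numerical check (an addition to this transcript, not part of the original deck), a minimal MATLAB sketch that evaluates this pdf from the formula and confirms it integrates to 1; the values μ = 0, σ² = 2 are arbitrary illustrative choices:

    % 1-D Gaussian pdf, computed directly from the formula above
    mu = 0;  sigma2 = 2;                           % illustrative parameters
    x  = linspace( -10, 10, 2001 );
    p  = exp( -( x - mu ).^2 / ( 2 * sigma2 ) ) / sqrt( 2 * pi * sigma2 );
    trapz( x, p )                                  % numerical integral, approximately 1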

  • Slide 5/22

    Gaussian distribution

    [Figure: 1-D Gaussian pdfs for (μ = 0, σ² = 1), (μ = 2, σ² = 1), (μ = 0, σ² = 5), and (μ = −2, σ² = 0.3)]
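    A small MATLAB sketch (added here, not from the deck) that reproduces curves like those in the figure, using the four (μ, σ²) pairs recovered from the figure labels:

    % plot 1-D Gaussian pdfs for several ( mu, sigma^2 ) settings
    params = [ 0 1; 2 1; 0 5; -2 0.3 ];            % [ mu sigma2 ] per row
    x = linspace( -6, 6, 601 );
    hold on;
    for i = 1 : size( params, 1 )
        mu = params( i, 1 );  s2 = params( i, 2 );
        plot( x, exp( -( x - mu ).^2 / ( 2 * s2 ) ) / sqrt( 2 * pi * s2 ) );
    end
    hold off;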

  • Slide 6/22

    Multivariate Gaussian distribution

    In d dimensions:

    \[ \mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}, \Sigma) = \frac{1}{(2\pi)^{d/2}\,|\Sigma|^{1/2}} \exp\!\left( -\tfrac{1}{2} (\mathbf{x} - \boldsymbol{\mu})^{\mathsf{T}} \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu}) \right) \]

    x and μ are now d-dimensional vectors; μ gives the center of the distribution in d-dimensional space.

    σ² is replaced by Σ, the d × d covariance matrix:

    Σ contains the pairwise covariances of every pair of features.

    The diagonal elements of Σ are the variances σ² of the individual features.

    Σ describes the distribution's shape and spread.
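    As an illustration (not from the slides), a minimal MATLAB sketch that evaluates this density directly from the formula; the 2-D parameter values are arbitrary, and with the Statistics Toolbox the result matches mvnpdf:

    % evaluate the d-dimensional Gaussian density at a point xp
    mu    = [ 0; 0 ];                              % illustrative parameters
    Sigma = [ 1.0 0.3; 0.3 1.0 ];
    xp    = [ 0.5; -0.2 ];
    d     = numel( mu );
    v     = xp - mu;
    p     = exp( -0.5 * v' * ( Sigma \ v ) ) / ( ( 2 * pi )^( d / 2 ) * sqrt( det( Sigma ) ) );
    % p matches mvnpdf( xp', mu', Sigma ) from the Statistics Toolbox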

  • Slide 7/22

    Multivariate Gaussian distribution

    Covariance measures the tendency for two variables to deviate from their means in the same (or opposite) directions at the same time.

    [Figure: scatter plots contrasting no covariance with high (positive) covariance]
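    To make the definition concrete, a small MATLAB sketch (an addition, not from the deck) computing the sample covariance of two variables that tend to deviate together:

    % sample covariance of two positively covarying variables
    n = 1000;
    a = randn( n, 1 );
    b = 0.8 * a + 0.2 * randn( n, 1 );             % b tends to deviate with a
    cab = ( a - mean( a ) )' * ( b - mean( b ) ) / ( n - 1 );
    % cab is positive, and matches the off-diagonal entry of cov( [ a b ] )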

  • Slide 8/22

    Multivariate Gaussian distribution

    In two dimensions

    [Figure: 2-D Gaussian with μ = (0, 0) and Σ = [ 0.25 0.3; 0.3 1 ]]

  • Slide 9/22

    Multivariate Gaussian distribution

    In two dimensions

    [Figure: 2-D Gaussians for Σ = [ 2 0.6; 0.6 2 ], Σ = [ 2 0; 0 1 ], and Σ = [ 1 0; 0 1 ]]

  • Slide 10/22

    Multivariate Gaussian distribution

    In three dimensions

    rng( 1 );
    mu    = [ 2; 1; 1 ];
    sigma = [ 0.25 0.30 0.10;
              0.30 1.00 0.70;
              0.10 0.70 2.00 ];
    x = randn( 1000, 3 );                          % standard normal samples
    x = x * chol( sigma );                         % Cholesky factor R, with R' * R = sigma,
                                                   % gives the samples covariance sigma
                                                   % (multiplying by sigma itself would give sigma^2)
    x = x + repmat( mu', 1000, 1 );                % shift center to mu
    scatter3( x( :, 1 ), x( :, 2 ), x( :, 3 ), '.' );
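    As a quick sanity check on the sample (an addition, not from the original slide), the empirical moments should approximate the parameters:

    mean( x )                                      % approximately mu' = [ 2 1 1 ]
    cov( x )                                       % approximately sigma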

  • Slide 11/22

    Vector projection

    Orthogonal projection of y onto x:

    Can take place in any space of dimensionality ≥ 2.

    The unit vector in the direction of x is x / || x ||.

    The length of the projection of y in the direction of x is || y || cos( θ ).

    The orthogonal projection of y onto x is the vector

    proj_x( y ) = x || y || cos( θ ) / || x || = [ ( x · y ) / || x ||² ] x   (using the alternate form of the dot product)

    [Figure: vectors x and y with angle θ between them, and proj_x( y ) along x]
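    A minimal MATLAB sketch of this computation (an addition to the transcript), using the dot-product form; the vectors are arbitrary illustrative choices:

    % orthogonal projection of y onto x, in any dimension
    x = [ 2; 1; 0 ];
    y = [ 1; 3; 1 ];
    p = ( dot( x, y ) / norm( x )^2 ) * x;         % proj_x( y )
    dot( y - p, x )                                % ~0: the residual is orthogonal to x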

  • Slide 12/22

  • Slide 13/22

    Linear models

    There are many types of linear models in machine learning.

    The projection output z is typically transformed to a final predicted output y by some function f:

    \[ y = f(z) = f(\boldsymbol{\beta} \cdot \mathbf{x}) = f(\beta_1 x_1 + \cdots + \beta_d x_d) \]

    Example: for linear regression, f( z ) = z.

    Example: for logistic regression, f is the logistic function.

    Models are called linear because they are a linear function of the model vector components β₁, …, β_d.

    Key feature of all linear models: no matter what f is, a constant value of z is transformed to a constant value of y, so decision boundaries remain linear even after the transform.
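    A small MATLAB sketch (added here, not in the deck) of the two examples: the same projection z = β · x passed through f( z ) = z for linear regression and through the logistic function for logistic regression; β and x are illustrative values:

    beta = [ 0.8; -0.5 ];                          % illustrative model vector
    x    = [ 1.2; 0.4 ];
    z    = beta' * x;                              % projection output
    y_linear   = z;                                % linear regression: f( z ) = z
    y_logistic = 1 / ( 1 + exp( -z ) );            % logistic regression: f is the logistic function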

  • Slide 14/22

    Geometry of projections

    slide thanks to Greg Shakhnarovich (CS195-5, Brown Univ., 2006)

    [Figure: projection geometry, with labels w and w0]

  • Slide 15/22

    Geometry of projections

    slide thanks to Greg Shakhnarovich (CS195-5, Brown Univ., 2006)

  • Slide 16/22

    Geometry of projections

    slide thanks to Greg Shakhnarovich (CS195-5, Brown Univ., 2006)

  • Slide 17/22

    Geometry of projections

    slide thanks to Greg Shakhnarovich (CS195-5, Brown Univ., 2006)

  • Slide 18/22

    Geometry of projections

    slide thanks to Greg Shakhnarovich (CS195-5, Brown Univ., 2006)

    [Figure: projection geometry, with the margin labeled]

  • Slide 19/22

  • Slide 20/22

    Logistic regression in two dimensions

    Interpreting the model vector of coefficients

    From MATLAB: B = [ 13.0460 -1.9024 -0.4047 ]

    α = B( 1 ),  β = [ β₁ β₂ ] = B( 2 : 3 )

    α and β define the location and orientation of the decision boundary:

    −α is the distance of the decision boundary from the origin

    the decision boundary is perpendicular to β

    The magnitude of β defines the gradient of probabilities between 0 and 1.
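    As an illustration (not in the original slide), a MATLAB sketch that unpacks B and computes a predicted probability along with the boundary's geometry; the input point x is an arbitrary choice, and the boundary is the line where α + β · x = 0:

    B     = [ 13.0460 -1.9024 -0.4047 ];
    alpha = B( 1 );
    beta  = B( 2 : 3 )';                           % column vector [ beta1; beta2 ]
    x     = [ 6; 3 ];                              % illustrative 2-D input
    p     = 1 / ( 1 + exp( -( alpha + beta' * x ) ) );   % P( y = 1 | x )
    dist  = abs( alpha ) / norm( beta );           % Euclidean distance of the line
                                                   % alpha + beta . x = 0 from the origin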

  • Slide 21/22

    Logistic function in d dimensions

    slide thanks to Greg Shakhnarovich (CS195-5, Brown Univ., 2006)

  • Slide 22/22

    Decision boundary for logistic regression

    slide thanks to Greg Shakhnarovich (CS195-5, Brown Univ., 2006)