Lecture 2.1: Vector Calculus CSC 84020 - Machine Learning Andrew Rosenberg February 5, 2009

Lecture 2.1: Vector Calculus CSC 84020 - Machine …eniac.cs.qc.cuny.edu/andrew/gcml/lecture3.pdfProbability Review Today Vector Calculus. Background Let’s talk. Linear Algebra Vectors

Mar 16, 2018


ngominh

Lecture 2.1: Vector Calculus

CSC 84020 - Machine Learning

Andrew Rosenberg

February 5, 2009


Today

Last Time

Probability Review

Today

Vector Calculus


Background

Let’s talk.

Linear Algebra

Vectors

Matrices

Basis Spaces

Eigenvectors/values?

Inversion and transposition

Calculus

Derivation

Integration

Vector Calculus

Gradients

Derivation w.r.t. a vector


Linear Algebra Basics

What is a vector?

What is a matrix?

Transposition

Adding matrices and vectors

Multiplying matrices.


Definitions

A vector is a one-dimensional array. We denote vectors as either $\mathbf{x}$ or $\vec{x}$. If we don’t specify otherwise, assume $\mathbf{x}$ is a column vector.

$$
\mathbf{x} = \begin{pmatrix} x_0 \\ x_1 \\ \vdots \\ x_{n-1} \end{pmatrix}
$$


Definitions

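A matrix is a two-dimensional array. We typically denote matrices as capital letters, e.g., $A$. If $A$ is an n-by-m matrix, it has the following structure:

$$
A = \begin{pmatrix}
a_{0,0} & a_{0,1} & \cdots & a_{0,m-1} \\
a_{1,0} & a_{1,1} & \cdots & a_{1,m-1} \\
\vdots & \vdots & \ddots & \vdots \\
a_{n-1,0} & a_{n-1,1} & \cdots & a_{n-1,m-1}
\end{pmatrix}
$$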


Matrix transposition

Transposing a matrix or vector swaps rows and columns.

A column-vector becomes a row-vector

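$$
\mathbf{x} = \begin{pmatrix} x_0 \\ x_1 \\ \vdots \\ x_{n-1} \end{pmatrix}
\qquad
\mathbf{x}^T = \begin{pmatrix} x_0 & x_1 & \cdots & x_{n-1} \end{pmatrix}
$$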

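$$
A = \begin{pmatrix}
a_{0,0} & a_{0,1} & \cdots & a_{0,m-1} \\
a_{1,0} & a_{1,1} & \cdots & a_{1,m-1} \\
\vdots & \vdots & \ddots & \vdots \\
a_{n-1,0} & a_{n-1,1} & \cdots & a_{n-1,m-1}
\end{pmatrix}
$$

$$
A^T = \begin{pmatrix}
a_{0,0} & a_{1,0} & \cdots & a_{n-1,0} \\
a_{0,1} & a_{1,1} & \cdots & a_{n-1,1} \\
\vdots & \vdots & \ddots & \vdots \\
a_{0,m-1} & a_{1,m-1} & \cdots & a_{n-1,m-1}
\end{pmatrix}
$$

If $A$ is n-by-m, then $A^T$ is m-by-n.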
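As a rough illustration of transposition, here is a minimal Python sketch. It assumes a matrix is stored as a list of rows; the function name `transpose` is illustrative and not part of the lecture.

```python
def transpose(matrix):
    """Return the transpose of a matrix stored as a list of rows."""
    # zip(*matrix) groups the k-th entry of every row, i.e. the k-th column.
    return [list(column) for column in zip(*matrix)]


A = [[1, 2, 3],
     [4, 5, 6]]        # a 2-by-3 matrix
A_T = transpose(A)     # a 3-by-2 matrix
```

Transposing twice should give back the original matrix, which is a quick sanity check for any implementation.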


Adding Matrices

Matrices can only be added if they have the same dimension.

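$$
A + B = \begin{pmatrix}
a_{0,0} + b_{0,0} & a_{0,1} + b_{0,1} & \cdots & a_{0,m-1} + b_{0,m-1} \\
a_{1,0} + b_{1,0} & a_{1,1} + b_{1,1} & \cdots & a_{1,m-1} + b_{1,m-1} \\
\vdots & \vdots & \ddots & \vdots \\
a_{n-1,0} + b_{n-1,0} & a_{n-1,1} + b_{n-1,1} & \cdots & a_{n-1,m-1} + b_{n-1,m-1}
\end{pmatrix}
$$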
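A small sketch of element-wise addition with the dimension check made explicit. It assumes matrices are lists of rows; `add_matrices` is a hypothetical helper name.

```python
def add_matrices(A, B):
    """Add two matrices element by element; both must have the same shape."""
    same_rows = len(A) == len(B)
    same_cols = same_rows and all(len(ra) == len(rb) for ra, rb in zip(A, B))
    if not same_cols:
        raise ValueError("matrices must have the same dimensions")
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


total = add_matrices([[1, 2], [3, 4]], [[10, 20], [30, 40]])
```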


Multiplying matrices

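To multiply two matrices, the inner dimensions must match.

An n-by-m matrix can be multiplied by an n′-by-m′ matrix iff m = n′. The result is an n-by-m′ matrix.

$$
AB = C, \qquad c_{ij} = \sum_{k=0}^{m-1} a_{ik} b_{kj}
$$

That is, multiply the i-th row of $A$ by the j-th column of $B$.

[Figure: illustration of matrix multiplication. Image from Wikipedia.]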
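The row-by-column rule can be sketched directly in Python. This is a minimal, unoptimized version assuming matrices are lists of rows; the name `matmul` is illustrative.

```python
def matmul(A, B):
    """Multiply an n-by-m matrix A by an m-by-p matrix B."""
    n, m = len(A), len(A[0])
    if len(B) != m:
        raise ValueError("inner dimensions must match")
    p = len(B[0])
    # c[i][j] is the i-th row of A times the j-th column of B.
    return [[sum(A[i][k] * B[k][j] for k in range(m)) for j in range(p)]
            for i in range(n)]


C = matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
D = matmul([[1, 2, 3], [4, 5, 6]], [[1], [0], [1]])   # 2-by-3 times 3-by-1
```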


Useful matrix operations

Inversion

Norm

Eigenvector decomposition


Matrix Inversion

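The inverse of an n-by-m matrix $A$ is denoted $A^{-1}$, and has the following property.

$$
AA^{-1} = I
$$

Where $I$ is the identity matrix, an n-by-n matrix where $I_{ij} = 1$ iff $i = j$ and 0 otherwise. If $A$ is a square matrix (n = m), then also

$$
A^{-1}A = I
$$

Only square, non-singular matrices have such a two-sided inverse. For a non-square matrix, a matrix satisfying $AA^{-1} = I$ is only a right inverse.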

What is the inverse of a vector? $\mathbf{x}^{-1} = ?$


Some useful Matrix Inversion Properties

$$
(A^{-1})^{-1} = A
$$

$$
(kA)^{-1} = k^{-1}A^{-1}
$$

$$
(A^T)^{-1} = (A^{-1})^T
$$

$$
(AB)^{-1} = B^{-1}A^{-1}
$$
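These properties can be spot-checked on small examples. The sketch below is restricted to 2-by-2 matrices and uses exact rational arithmetic; the helper names `inverse_2x2` and `matmul_2x2` and the example matrices are assumptions made for illustration.

```python
from fractions import Fraction


def inverse_2x2(M):
    """Invert a 2-by-2 matrix using the closed-form adjugate formula."""
    (a, b), (c, d) = M
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return [[Fraction(d, det), Fraction(-b, det)],
            [Fraction(-c, det), Fraction(a, det)]]


def matmul_2x2(A, B):
    """Multiply two 2-by-2 matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


A = [[2, 1], [1, 1]]
B = [[1, 2], [3, 4]]
lhs = inverse_2x2(matmul_2x2(A, B))                  # (AB)^-1
rhs = matmul_2x2(inverse_2x2(B), inverse_2x2(A))     # B^-1 A^-1
```

Note the reversed order on the right-hand side: `matmul_2x2(inverse_2x2(A), inverse_2x2(B))` would generally give a different matrix.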


The norm of a vector

The norm of a vector $\mathbf{x}$ is written $||\mathbf{x}||$. The norm represents the Euclidean length of a vector.

$$
||\mathbf{x}|| = \sqrt{\sum_{i=0}^{n-1} x_i^2} = \sqrt{x_0^2 + x_1^2 + \ldots + x_{n-1}^2}
$$
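A one-function sketch of the Euclidean norm, assuming a vector is a plain Python list; the name `norm` is illustrative.

```python
import math


def norm(x):
    """Euclidean length: square root of the sum of squared components."""
    return math.sqrt(sum(xi * xi for xi in x))


length = norm([3.0, 4.0])   # the 3-4-5 right triangle
```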


Positive Definite/Semi-Definite

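A positive definite matrix $M$ has the property that, for every nonzero vector $\mathbf{x}$,

$$
\mathbf{x}^T M \mathbf{x} > 0
$$

A positive semi-definite matrix $M$ has the property that, for every vector $\mathbf{x}$,

$$
\mathbf{x}^T M \mathbf{x} \geq 0
$$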
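To make the quadratic form concrete, the following sketch evaluates it for one example matrix at a few sample vectors. The matrix `M` and the sample vectors are arbitrary choices for illustration, and testing a handful of vectors does not prove positive definiteness; it only illustrates the definition.

```python
def quadratic_form(M, x):
    """Compute x^T M x for a square matrix M and a vector x."""
    n = len(x)
    return sum(x[i] * M[i][j] * x[j] for i in range(n) for j in range(n))


M = [[2, -1],
     [-1, 2]]
samples = [[1, 0], [0, 1], [1, 1], [1, -1], [-3, 2]]
values = [quadratic_form(M, x) for x in samples]
```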

Why might we care about these matrices?


Eigenvectors

For a square matrix $A$, eigenvectors and eigenvalues are defined by

$$
A\mathbf{u}_i = \lambda_i \mathbf{u}_i
$$

Where $\mathbf{u}_i$ is a nonzero eigenvector and $\lambda_i$ is its corresponding eigenvalue.

In general, eigenvalues are complex numbers, but if $A$ is real and symmetric, they are real.

Eigenvectors describe how a matrix transforms a vector: along an eigenvector, the matrix acts as a scaling by the eigenvalue. They can be used to define a basis space, namely the eigenspace.

Who cares? The eigenvectors of a covariance matrix have some very interesting properties.
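For a symmetric 2-by-2 matrix the eigenvalues have a closed form, which makes the defining equation easy to verify. This is a sketch under the assumption that the off-diagonal entry is nonzero; the function name `eig_symmetric_2x2` and the example matrix are illustrative.

```python
import math


def eig_symmetric_2x2(a, b, d):
    """Eigenpairs of the symmetric matrix [[a, b], [b, d]], assuming b != 0."""
    if b == 0:
        raise ValueError("this sketch assumes a nonzero off-diagonal entry")
    mean = (a + d) / 2
    radius = math.hypot((a - d) / 2, b)
    pairs = []
    for lam in (mean + radius, mean - radius):
        # (A - lam*I) u = 0 is solved by u = (b, lam - a).
        pairs.append((lam, (b, lam - a)))
    return pairs


A = [[2, 1], [1, 2]]
pairs = eig_symmetric_2x2(2, 1, 2)
# Residuals of A u - lam u for each eigenpair; they should be close to zero.
residuals = [
    [A[i][0] * u[0] + A[i][1] * u[1] - lam * u[i] for i in range(2)]
    for lam, u in pairs
]
```

Both eigenvalues come out real, as expected for a real symmetric matrix.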


Basis Spaces

Basis spaces allow vectors to be represented in different spaces. Our normal 2-dimensional basis space is generated by the vectors [0, 1], [1, 0].

Any 2-d vector can be expressed as a linear combination of these two basis vectors.

However, any two non-collinear vectors can generate a 2-d basis space. The generating vectors need not be perpendicular; in the standard basis above, they are.
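As a sketch of what representing a vector in a different basis means in 2-d, the function below finds the coefficients c0 and c1 with v = c0·b0 + c1·b1 using Cramer's rule. The function name `coordinates_in_basis` and the example vectors are assumptions for illustration.

```python
def coordinates_in_basis(v, b0, b1):
    """Return (c0, c1) such that v = c0 * b0 + c1 * b1 in two dimensions."""
    det = b0[0] * b1[1] - b1[0] * b0[1]
    if det == 0:
        raise ValueError("basis vectors are collinear")
    c0 = (v[0] * b1[1] - b1[0] * v[1]) / det
    c1 = (b0[0] * v[1] - v[0] * b0[1]) / det
    return c0, c1


# The vector (3, 1) expressed in the basis generated by (1, 1) and (1, -1).
c0, c1 = coordinates_in_basis([3, 1], [1, 1], [1, -1])
```

The collinearity check corresponds to the requirement that the two generating vectors be non-collinear.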


Why do we care?

Dimensionality reduction.


Calculus Basics

What is a derivative?

What is an integral?


Derivatives

A derivative, $\frac{d}{dx} f(x)$, can be thought of as defining the slope of a function $f(x)$. This is sometimes also written as $f'(x)$.
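The slope interpretation can be sketched numerically with a central difference. The step size `h` and the function name `derivative` are illustrative choices, not part of the lecture.

```python
def derivative(f, x, h=1e-5):
    """Approximate f'(x) as the slope of a short secant line around x."""
    return (f(x + h) - f(x - h)) / (2 * h)


# For f(x) = x^2 the exact derivative is 2x, so the slope at x = 3 is 6.
slope = derivative(lambda x: x ** 2, 3.0)
```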


Derivative Example


Integrals

Integrals are an inverse operation of the derivative (plus a constant).

$$
\int f(x)\,dx = F(x) + c
$$

$$
F'(x) = f(x)
$$

An integral can be thought of as a calculation of the area under the curve defined by $f(x)$.

A definite integral evaluates the area over a specified interval $[a, b]$. An indefinite integral has no limits and yields the antiderivative $F(x) + c$. An integral over the range $(-\infty, \infty)$ is a definite (improper) integral.
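The area interpretation can be sketched with the trapezoid rule. The function name `trapezoid` and the number of steps are illustrative assumptions.

```python
def trapezoid(f, a, b, steps=1000):
    """Approximate the definite integral of f over [a, b]."""
    h = (b - a) / steps
    total = 0.5 * (f(a) + f(b))
    for i in range(1, steps):
        total += f(a + i * h)
    return total * h


# For f(x) = x^2, F(x) = x^3 / 3, so the area over [0, 3] is F(3) - F(0) = 9.
area = trapezoid(lambda x: x ** 2, 0.0, 3.0)
```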


Integration Example


Useful calculus operations

Product, quotient, summation rules for derivatives.

Useful integration and derivative identities.

Chain rule

Integration by parts

Variable substitution (don’t forget the Jacobian!)


Calculus Identities

Summation rule

$$
g(x) = f_0(x) + f_1(x)
$$

$$
g'(x) = f_0'(x) + f_1'(x)
$$

Product Rule

$$
g(x) = f_0(x) f_1(x)
$$

$$
g'(x) = f_0(x) f_1'(x) + f_0'(x) f_1(x)
$$

Quotient Rule

$$
g(x) = \frac{f_0(x)}{f_1(x)}
$$

$$
g'(x) = \frac{f_0'(x) f_1(x) - f_0(x) f_1'(x)}{f_1^2(x)}
$$
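Sign errors in the quotient rule are easy to make, so a numerical spot check can help. The sketch below compares the product rule and the quotient rule, written as (f0′ f1 − f0 f1′) / f1², against a finite-difference slope. The example functions and the test point are arbitrary choices for illustration.

```python
import math


def derivative(f, x, h=1e-5):
    """Central-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)


def f0(x):
    return math.sin(x)


def f0_prime(x):
    return math.cos(x)


def f1(x):
    return x * x + 1


def f1_prime(x):
    return 2 * x


def product_rule(x):
    return f0(x) * f1_prime(x) + f0_prime(x) * f1(x)


def quotient_rule(x):
    return (f0_prime(x) * f1(x) - f0(x) * f1_prime(x)) / f1(x) ** 2


x = 0.7
product_error = abs(product_rule(x) - derivative(lambda t: f0(t) * f1(t), x))
quotient_error = abs(quotient_rule(x) - derivative(lambda t: f0(t) / f1(t), x))
```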


Calculus Identities

Constant multipliers

$$
g(x) = c f(x)
$$

$$
g'(x) = c f'(x)
$$

Exponent Rule

$$
g(x) = f(x)^k
$$

$$
g'(x) = k f(x)^{k-1} f'(x)
$$

Chain Rule

$$
g(x) = f_0(f_1(x))
$$

$$
g'(x) = f_0'(f_1(x)) f_1'(x)
$$


Calculus Identities

Exponent Rule

$$
g(x) = e^x
$$

$$
g'(x) = e^x
$$

$$
g(x) = k^x
$$

$$
g'(x) = \ln(k)\, k^x
$$

Logarithm Rule

$$
g(x) = \ln(x)
$$

$$
g'(x) = \frac{1}{x}
$$

$$
g(x) = \log_b(x)
$$

$$
g'(x) = \frac{1}{x \ln b}
$$


Calculus Operations

Integration by Parts

$$
\int f(x) \frac{dg(x)}{dx}\,dx = f(x)g(x) - \int g(x) \frac{df(x)}{dx}\,dx
$$

Variable Substitution

$$
\int_a^b f(g(x))\, g'(x)\,dx = \int_{g(a)}^{g(b)} f(u)\,du
$$
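Two short worked examples may help fix these operations. The particular functions are chosen only for illustration. For integration by parts, take f(x) = x and g(x) = e^x; for variable substitution, take the inner function g(x) = x² on [0, 1] with outer function cos.

```latex
% Integration by parts with f(x) = x, g(x) = e^x
\int x\, e^x \,dx = x e^x - \int e^x \cdot 1 \,dx = x e^x - e^x + c

% Variable substitution with u = g(x) = x^2, g'(x) = 2x, g(0) = 0, g(1) = 1
\int_0^1 \cos(x^2)\, 2x \,dx = \int_0^1 \cos(u)\,du = \sin(1) - \sin(0) = \sin(1)
```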


Vector Calculus

Derivation with respect to a vector or matrix.

Gradient of a vector.

Change of variables with a vector.


Derivation with respect to a vector

Given a vector $\mathbf{x} = (x_0, x_1, \ldots, x_{n-1})^T$, and a function $f(\mathbf{x}) : \mathbb{R}^n \to \mathbb{R}$, how can we find $\frac{\partial f(\mathbf{x})}{\partial \mathbf{x}}$?

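$$
\frac{\partial f(\mathbf{x})}{\partial \mathbf{x}} =
\begin{pmatrix}
\frac{\partial f(\mathbf{x})}{\partial x_0} \\
\frac{\partial f(\mathbf{x})}{\partial x_1} \\
\vdots \\
\frac{\partial f(\mathbf{x})}{\partial x_{n-1}}
\end{pmatrix}
$$

This is also called the gradient of the function, and is often written $\nabla f(\mathbf{x})$ or $\nabla f$.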
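The gradient can be approximated numerically by perturbing one component of the input at a time. This is a minimal sketch; the function name `gradient`, the step size, and the example function are assumptions for illustration.

```python
def gradient(f, x, h=1e-5):
    """Approximate the gradient of a scalar function f at the point x."""
    grad = []
    for i in range(len(x)):
        forward, backward = list(x), list(x)
        forward[i] += h
        backward[i] -= h
        # Partial derivative with respect to x_i by central difference.
        grad.append((f(forward) - f(backward)) / (2 * h))
    return grad


def f(x):
    return x[0] ** 2 + 3 * x[0] * x[1]


# Analytic gradient: (2*x0 + 3*x1, 3*x0), which is (8, 3) at the point (1, 2).
g = gradient(f, [1.0, 2.0])
```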

Why might this be useful?


Useful Vector Calculus identities

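Given a vector $\mathbf{x}$ with $|\mathbf{x}| = n$ and a scalar variable $y$.

$$
\frac{\partial \mathbf{x}}{\partial y} =
\begin{pmatrix}
\frac{\partial x_0}{\partial y} \\
\frac{\partial x_1}{\partial y} \\
\vdots \\
\frac{\partial x_{n-1}}{\partial y}
\end{pmatrix}
$$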


Useful Vector Calculus identities

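Given a vector $\mathbf{x}$ with $|\mathbf{x}| = n$ and a vector $\mathbf{y}$ with $|\mathbf{y}| = m$.

$$
\frac{\partial \mathbf{x}}{\partial \mathbf{y}} =
\begin{pmatrix}
\frac{\partial x_0}{\partial y_0} & \frac{\partial x_0}{\partial y_1} & \cdots & \frac{\partial x_0}{\partial y_{m-1}} \\
\frac{\partial x_1}{\partial y_0} & \frac{\partial x_1}{\partial y_1} & \cdots & \frac{\partial x_1}{\partial y_{m-1}} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial x_{n-1}}{\partial y_0} & \frac{\partial x_{n-1}}{\partial y_1} & \cdots & \frac{\partial x_{n-1}}{\partial y_{m-1}}
\end{pmatrix}
$$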
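This n-by-m matrix of partial derivatives (the Jacobian) can also be approximated numerically, one column per component of y. The sketch below assumes x is given as a function of y returning a list; the names `jacobian` and `x_of_y` are hypothetical.

```python
def jacobian(x_of_y, y, h=1e-5):
    """Approximate the n-by-m matrix with entries d x_i / d y_j."""
    n, m = len(x_of_y(y)), len(y)
    J = [[0.0] * m for _ in range(n)]
    for j in range(m):
        forward, backward = list(y), list(y)
        forward[j] += h
        backward[j] -= h
        xf, xb = x_of_y(forward), x_of_y(backward)
        for i in range(n):
            J[i][j] = (xf[i] - xb[i]) / (2 * h)
    return J


def x_of_y(y):
    # n = 3 outputs that depend on m = 2 inputs.
    return [y[0] * y[1], y[0] + y[1], y[1] ** 2]


# Analytic Jacobian at y = (2, 3): [[y1, y0], [1, 1], [0, 2*y1]]
J = jacobian(x_of_y, [2.0, 3.0])
```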


Vector Calculus Identities

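Similar to the Scalar Multiplication Rule

$$
\frac{\partial}{\partial \mathbf{x}}(\mathbf{x}^T \mathbf{a}) = \frac{\partial}{\partial \mathbf{x}}(\mathbf{a}^T \mathbf{x}) = \mathbf{a}
$$

Similar to the Product Rule

$$
\frac{\partial}{\partial x}(AB) = \frac{\partial A}{\partial x} B + A \frac{\partial B}{\partial x}
$$

Derivative of a matrix inverse.

$$
\frac{\partial}{\partial x}(A^{-1}) = -A^{-1} \frac{\partial A}{\partial x} A^{-1}
$$

Change of Variable in an Integral

$$
\int f(\mathbf{x})\,d\mathbf{x} = \int f(\mathbf{u}) \left| \frac{\partial \mathbf{x}}{\partial \mathbf{u}} \right| d\mathbf{u}
$$

Here the bars denote the absolute value of the determinant of the Jacobian $\frac{\partial \mathbf{x}}{\partial \mathbf{u}}$.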
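The matrix-inverse identity can be spot-checked numerically. The sketch below assumes x is a scalar parameter on which the entries of A depend, and uses an arbitrary 2-by-2 example; `A_of`, `inv2`, and `mul2` are hypothetical helper names.

```python
def inv2(M):
    """Inverse of a 2-by-2 matrix."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def mul2(A, B):
    """Product of two 2-by-2 matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def A_of(x):
    # Example matrix whose entries depend on the scalar x.
    return [[2.0 + x, 1.0], [x, 3.0]]


dA = [[1.0, 0.0], [1.0, 0.0]]   # entry-wise derivative of A_of with respect to x

x, h = 1.0, 1e-5
plus, minus = inv2(A_of(x + h)), inv2(A_of(x - h))
numeric = [[(plus[i][j] - minus[i][j]) / (2 * h) for j in range(2)]
           for i in range(2)]

A_inv = inv2(A_of(x))
analytic = [[-v for v in row] for row in mul2(mul2(A_inv, dA), A_inv)]
```

The finite-difference derivative of the inverse and the closed-form expression should agree up to numerical error.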


Calculating the Expectation of a Gaussian

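Now we have enough tools to calculate the expectation of a variable given a Gaussian Distribution.

Recall:

$$
\begin{aligned}
E[x \mid \mu, \sigma^2] &= \int p(x \mid \mu, \sigma^2)\, x \,dx \\
&= \int N(x \mid \mu, \sigma^2)\, x \,dx \\
&= \int \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{ -\frac{1}{2\sigma^2}(x - \mu)^2 \right\} x \,dx
\end{aligned}
$$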


Calculating the Expectation of a Gaussian

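$$
E[x \mid \mu, \sigma^2] = \int \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{ -\frac{1}{2\sigma^2}(x - \mu)^2 \right\} x \,dx
$$

$$
u = x - \mu, \qquad du = dx
$$

$$
\begin{aligned}
E[x \mid \mu, \sigma^2] &= \int \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{ -\frac{1}{2\sigma^2}u^2 \right\} (u + \mu) \,du \\
&= \int \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{ -\frac{1}{2\sigma^2}u^2 \right\} u \,du + \mu \int \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{ -\frac{1}{2\sigma^2}u^2 \right\} du
\end{aligned}
$$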


Calculating the Expectation of a Gaussian

$$
\int \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{ -\frac{1}{2\sigma^2}u^2 \right\} du = 1
$$

$$
E[x \mid \mu, \sigma^2] = \int \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{ -\frac{1}{2\sigma^2}u^2 \right\} u \,du + \mu
$$

Aside: A function is odd iff $f(x) = -f(-x)$.

Odd functions have the property $\int_{-\infty}^{\infty} f(x)\,dx = 0$ (when the integral converges).

A function is even iff $f(x) = f(-x)$.

The product of an odd function and an even function is an odd function.


Calculating the Expectation of a Gaussian

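$$
E[x \mid \mu, \sigma^2] = \int \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{ -\frac{1}{2\sigma^2}u^2 \right\} u \,du + \mu
$$

$\exp\left\{ -\frac{1}{2\sigma^2}u^2 \right\}$ is even

$u$ is odd

$\exp\left\{ -\frac{1}{2\sigma^2}u^2 \right\} u$ is odd

$$
\int \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{ -\frac{1}{2\sigma^2}u^2 \right\} u \,du = 0
$$

$$
E[x \mid \mu, \sigma^2] = \mu
$$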
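The result can be sanity-checked by numerical integration. This sketch truncates the integral to ten standard deviations either side of the mean and uses the trapezoid rule; the truncation range, step count, and example parameters are assumptions chosen for illustration.

```python
import math


def gaussian_pdf(x, mu, sigma2):
    """Density N(x | mu, sigma^2)."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)


def expectation(mu, sigma2, steps=20000):
    """Trapezoid-rule estimate of the integral of N(x | mu, sigma^2) * x."""
    sigma = math.sqrt(sigma2)
    a, b = mu - 10 * sigma, mu + 10 * sigma   # tails beyond this are negligible
    h = (b - a) / steps
    total = 0.0
    for i in range(steps + 1):
        x = a + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * gaussian_pdf(x, mu, sigma2) * x
    return total * h


mean = expectation(mu=1.5, sigma2=4.0)   # should be close to mu
```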


Why does Machine Learning need these tools?

Calculus

We need to find maximum likelihoods or minimum risks. This optimization is accomplished with derivatives.

Integration allows us to marginalize continuous probability density functions.

Linear Algebra

We will be working in high-dimensional spaces.

Vectors and matrices allow us to refer to high-dimensional points (groups of features) as vectors.

Matrices allow us to describe the feature space.


Why does machine learning need these tools?

Vector Calculus

We need to do all of the calculus operations in high-dimensional feature spaces.

We will want to optimize multiple values simultaneously (Gradient Descent).

We will need to take a marginal over high-dimensional distributions (Gaussians).


Broader Context

What we have so far:

Entities in the world are represented as feature vectors and maybe a label.

We want to construct statistical models of the feature vectors.

Finding the most likely model is an optimization problem.

Since the feature vectors may have more than one dimension, linear algebra can help us work with them.


Bye

Next

Linear Regression