Page 1: Machine Learning Preliminaries and Math Refresher

Machine Learning Preliminaries and Math Refresher

M. Lüthi, T. Vetter

February 18, 2008

Page 2: Machine Learning Preliminaries and Math Refresher

Outline

1 General remarks about learning

2 Probability Theory and Statistics

3 Linear spaces

Page 4: Machine Learning Preliminaries and Math Refresher

The problem of learning is arguably at the very core of the problem of intelligence, both biological and artificial.

T. Poggio and C.R. Shelton

Page 5: Machine Learning Preliminaries and Math Refresher

Model building in natural sciences

Model building

Given a phenomenon, construct a model for it.

Example (Heat Conduction)

Phenomenon: The spontaneous transfer of thermal energy through matter, from a region of higher temperature to a region of lower temperature.
Model:

∂Q/∂t = −k ∮_S ∇T · dS

Page 7: Machine Learning Preliminaries and Math Refresher

Learning as Model Building

Example (Learning)

Phenomenon: Learning (Inferring general rules from examples).
Model:

f* = arg max_{f ∈ H} P(f) P(D | f) / P(D)

Neural networks, Decision Trees, Naive Bayes, Support Vector Machines, etc.

Models for learning

The models for learning are the learning algorithms
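
To make the MAP rule above concrete, here is a minimal, hypothetical Python sketch (not part of the slides): a toy coin-tossing setup in which the hypothesis space H contains three candidate head-probabilities, the prior P(f) is chosen arbitrarily for illustration, and D is a short sequence of observed tosses. Since P(D) does not depend on f, it can be dropped from the arg max.

```python
# Hypothetical toy example of f* = arg max_{f in H} P(f) P(D | f) / P(D).
hypotheses = [0.3, 0.5, 0.7]                  # H: candidate values of P(heads)
prior = {0.3: 0.25, 0.5: 0.5, 0.7: 0.25}      # P(f), chosen arbitrarily here
data = [1, 1, 0, 1, 1]                        # D: observed tosses (1 = heads)

def likelihood(f, data):
    """P(D | f) for i.i.d. Bernoulli tosses with P(heads) = f."""
    p = 1.0
    for x in data:
        p *= f if x == 1 else 1.0 - f
    return p

# P(D) is the same for every f, so it can be ignored in the arg max.
scores = {f: prior[f] * likelihood(f, data) for f in hypotheses}
f_star = max(scores, key=scores.get)
print(f_star)  # 0.7 is the MAP hypothesis for this data
```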

Page 8: Machine Learning Preliminaries and Math Refresher

Goals of the first block

Life is short . . .

We want to cover the essentials of learning.

General Setting
Mathematically precise setting of the learning problem
Valid for any kind of learning algorithm

Statistical Learning Theory
When does learning work
Conditions any algorithm has to satisfy
Performance bounds

Kernel Methods
Theory of Kernels
Make linear algorithms non-linear.
Learning from non-vectorial data.

Page 9: Machine Learning Preliminaries and Math Refresher

Mathematics needed in the first block

The need for mathematics

As we treat the learning problem in a formal setting, the results and methods are necessarily formulated in mathematical terms.

General Setting
Probability theory
Statistics
Basic optimization theory

Statistical Learning Theory
More probability theory
More statistics

Kernel Methods
Linear spaces
Linear algebra
Basic optimization theory

A bit of mathematical maturity and an open mind are required. The rest will be explained.

Page 11: Machine Learning Preliminaries and Math Refresher

Nothing is more practical than a good theory.

Vladimir N. Vapnik

Nothing (in computer science) is more beautiful than learning theory?

M. Lüthi

Page 13: Machine Learning Preliminaries and Math Refresher

Outline

1 General remarks about learning

2 Probability Theory and Statistics

3 Linear spaces

Page 15: Machine Learning Preliminaries and Math Refresher

Probability theory vs Statistics

Definition (Probability Theory)

A branch of mathematics concerned with the analysis of random phenomena.

General ⇒ Specific

Definition (Statistics)

The science of collecting, analyzing, presenting, and interpreting data.

Specific ⇒ General

Statistical machine learning is closely related to (inferential) statistics.

Many state-of-the-art learning algorithms are based on concepts from probability theory.

Page 16: Machine Learning Preliminaries and Math Refresher

Probabilities

Definition (Probability Space)

A probability space is the triple

(Ω, F, P)

where

Ω is a set of events ω

F is a collection of events (e.g. the power-set P(Ω))

P is a measure that satisfies the probability axioms.

Page 17: Machine Learning Preliminaries and Math Refresher

Axioms of Probability

1 For any A ∈ F, there exists a number P(A), the probability of A, satisfying P(A) ≥ 0.

2 P(Ω) = 1.

3 Let A_n, n ≥ 1, be a collection of pairwise disjoint events, and let A be their union. Then

P(A) = ∑_{n=1}^{∞} P(A_n).
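
As an illustration only (not from the slides), the following Python sketch checks the three axioms on a small finite probability space: a fair six-sided die, with F taken to be the power set of Ω and P the uniform measure. Exact fractions are used to avoid floating-point rounding.

```python
from fractions import Fraction
from itertools import combinations

omega = frozenset({1, 2, 3, 4, 5, 6})          # outcomes of a fair die

def P(event):
    """Uniform probability measure: P(A) = |A| / |Omega|."""
    return Fraction(len(event), len(omega))

# F: the power set of omega.
F = [frozenset(c) for r in range(len(omega) + 1)
     for c in combinations(omega, r)]

assert all(P(A) >= 0 for A in F)               # axiom 1: P(A) >= 0
assert P(omega) == 1                           # axiom 2: P(Omega) = 1
A1, A2 = frozenset({1, 2}), frozenset({5})     # pairwise disjoint events
assert P(A1 | A2) == P(A1) + P(A2)             # axiom 3 (finite additivity)
```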

Page 18: Machine Learning Preliminaries and Math Refresher

Independence

Definition (Independence)

Two events, A and B, are independent iff the probability of their intersection equals the product of the individual probabilities, i.e.

P(A ∩ B) = P(A) · P(B).

Definition (Conditional probability)

Given two events A and B, with P(B) > 0, we define the conditional probability for A given B, P(A|B), by the relation

P(A|B) = P(A ∩ B) / P(B).
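
A small hypothetical sketch (not from the slides) may help: on the space of two fair dice, the events "first die shows 6" and "second die is even" are independent, and the conditional probability follows directly from the definition above.

```python
from fractions import Fraction
from itertools import product

omega = set(product(range(1, 7), range(1, 7)))   # all 36 outcomes (d1, d2)

def P(event):
    """Uniform measure on the two-dice space."""
    return Fraction(len(event), len(omega))

A = {w for w in omega if w[0] == 6}        # first die shows 6
B = {w for w in omega if w[1] % 2 == 0}    # second die is even

assert P(A & B) == P(A) * P(B)             # independence: P(A ∩ B) = P(A)·P(B)

P_A_given_B = P(A & B) / P(B)              # P(A | B) = P(A ∩ B) / P(B)
print(P_A_given_B)                         # 1/6: knowing B says nothing about A
```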

Page 19: Machine Learning Preliminaries and Math Refresher

Random Variables

A single event is not that interesting.

Definition (Random Variable)

A random variable X is a function from the probability space to a vector of real numbers

X : Ω → R^n.

Random variables are characterized by their distribution function F :

Definition (Probability Distribution Function)

Let X : Ω → R be a random variable. We define

F_X(x) = P(X ≤ x), −∞ < x < ∞.
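
For a discrete example (a hypothetical sketch, not from the slides), the distribution function of a fair die can be evaluated directly from the definition F_X(x) = P(X ≤ x):

```python
from fractions import Fraction

outcomes = range(1, 7)          # values taken by X, the roll of a fair die

def F(x):
    """F_X(x) = P(X <= x) for the uniform die."""
    favourable = sum(1 for k in outcomes if k <= x)
    return Fraction(favourable, 6)

print(F(0))    # 0   : no outcome is <= 0
print(F(3.5))  # 1/2 : outcomes 1, 2, 3
print(F(6))    # 1   : all outcomes
```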

Page 20: Machine Learning Preliminaries and Math Refresher

Probability density function

Definition (Probability density function)

The density function is the function f_X with the property

F_X(x) = ∫_{−∞}^{x} f_X(y) dy, −∞ < x < ∞.
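
As a hedged illustration (not from the slides), the sketch below recovers F_X from f_X by numerical integration for a standard normal density and compares the result with the closed-form CDF expressed via the error function; the integration bounds and step count are arbitrary choices.

```python
import math

def f(y):
    """Standard normal density f_X(y)."""
    return math.exp(-y * y / 2.0) / math.sqrt(2.0 * math.pi)

def F_numeric(x, lower=-10.0, steps=100_000):
    """F_X(x) ~= integral of f_X from `lower` to x (trapezoidal rule)."""
    h = (x - lower) / steps
    total = 0.5 * (f(lower) + f(x))
    for i in range(1, steps):
        total += f(lower + i * h)
    return total * h

def F_exact(x):
    """Closed-form standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

for x in (-1.0, 0.0, 1.5):
    assert abs(F_numeric(x) - F_exact(x)) < 1e-6
```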

Page 21: Machine Learning Preliminaries and Math Refresher

Convergence

Definition (Convergence in Probability)

Let X_1, X_2, . . . be random variables. We say that X_n converges in probability to the random variable X as n → ∞ iff, for all ε > 0,

P(|X_n − X| > ε) → 0 as n → ∞.

We write X_n →ᵖ X as n → ∞.

Page 22: Machine Learning Preliminaries and Math Refresher

Weak law of large numbers

Theorem (Bernoulli's Theorem (Weak law of large numbers))

Let X_1, . . . , X_n be a sequence of independent and identically distributed (i.i.d.) random variables, each having mean µ and standard deviation σ. Then

P[|(X_1 + . . . + X_n)/n − µ| > ε] → 0 as n → ∞.

Thus, given enough observations x_i ∼ F_X, the sample mean x̄ = (1/n) ∑_{i=1}^{n} x_i will approach the true mean µ.
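
The following hypothetical simulation (not part of the slides) illustrates the theorem for i.i.d. uniform(0, 1) variables, whose true mean is µ = 0.5: the fraction of runs in which the sample mean deviates from µ by more than ε shrinks as n grows.

```python
import random

random.seed(0)
mu, eps, runs = 0.5, 0.05, 1000

for n in (10, 100, 1000):
    bad = 0
    for _ in range(runs):
        sample_mean = sum(random.random() for _ in range(n)) / n
        if abs(sample_mean - mu) > eps:
            bad += 1
    # empirical estimate of P(|x_bar - mu| > eps); it decreases with n
    print(n, bad / runs)
```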

Page 23: Machine Learning Preliminaries and Math Refresher

Expectation

Definition (Expectation)

Let X be a random variable with probability density function f_X, and g : R → R a function. We define the expectation

E[g(X)] := ∫_{−∞}^{∞} g(x) f_X(x) dx.

Definition (Sample mean)

Let a sample x = x_1, x_2, . . . , x_n be given. We define the (sample) mean to be

x̄ = (1/n) ∑_{i=1}^{n} x_i.

Page 24: Machine Learning Preliminaries and Math Refresher

Variance

Definition (Variance)

Let X be a random variable with density function f_X. The variance is given by

Var[X] = E[(X − E[X])²] = E[X²] − (E[X])².

The square root √Var[X] of the variance is referred to as the standard deviation.

Definition (Sample Variance)

Let the sample x = x_1, x_2, . . . , x_n with sample mean x̄ be given. We define the sample variance to be

s² = (1/(n − 1)) ∑_{i=1}^{n} (x_i − x̄)².
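
A tiny worked example (hypothetical, not from the slides) of both definitions, using the 1/(n − 1) normalisation for the sample variance:

```python
xs = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]     # an arbitrary small sample
n = len(xs)

x_bar = sum(xs) / n                               # sample mean
s2 = sum((x - x_bar) ** 2 for x in xs) / (n - 1)  # sample variance

print(x_bar)  # 5.0
print(s2)     # 32/7 ~ 4.571..., the unbiased sample variance
```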

Page 25: Machine Learning Preliminaries and Math Refresher

Notation

Assume F has a probability density function:

f(x) = dF(x)/dx

Formally, we write: f(x) dx = dF(x)

Example: Expectation

E[g(X)] := ∫_{−∞}^{∞} g(x) f(x) dx = ∫_{−∞}^{∞} g(x) dF(x)

Page 26: Machine Learning Preliminaries and Math Refresher

Outline

1 General remarks about learning

2 Probability Theory and Statistics

3 Linear spaces

Page 27: Machine Learning Preliminaries and Math Refresher

Vector Space

A set V together with two binary operations

1 vector addition + : V × V → V and

2 scalar multiplication · : R × V → V

is called a vector space over R, if it satisfies the following axioms:

1 ∀x, y ∈ V : x + y = y + x (commutativity)

2 ∀x, y, z ∈ V : x + (y + z) = (x + y) + z (associativity)

3 ∃0 ∈ V, ∀x ∈ V : 0 + x = x (identity of vector addition)

4 ∀x ∈ V : 1 · x = x, where 1 is the multiplicative identity of R (identity of scalar multiplication)

5 ∀x ∈ V, ∃(−x) ∈ V : x + (−x) = 0 (additive inverse element)

6 ∀α ∈ R, ∀x, y ∈ V : α · (x + y) = α · x + α · y (distributivity)

7 ∀α, β ∈ R, ∀x ∈ V : (α + β) · x = α · x + β · x (distributivity)

8 ∀α, β ∈ R, ∀x ∈ V : α · (β · x) = (αβ) · x

Page 28: Machine Learning Preliminaries and Math Refresher

Vector Space

More importantly for us, the definition implies:

x + y ∈ V, ∀x, y ∈ V

αx ∈ V, ∀α ∈ R, ∀x ∈ V

Subspace criterion

Let V be a vector space over R, and let W be a subset of V. Then W is a subspace if and only if it satisfies the following 3 conditions:

1 0 ∈ W

2 If x, y ∈ W then x + y ∈ W

3 If x ∈ W and α ∈ R then αx ∈ W
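
As a hypothetical illustration (not from the slides), the sketch below spot-checks the three conditions for W = {(x, y, z) ∈ R³ : x + y + z = 0} on randomly chosen vectors and scalars; this illustrates, but of course does not prove, that W is a subspace.

```python
import random

random.seed(1)

def in_W(v):
    """Membership test for W = {v in R^3 : v1 + v2 + v3 = 0}, with float tolerance."""
    return abs(sum(v)) < 1e-9

def random_W_vector():
    x, y = random.uniform(-5, 5), random.uniform(-5, 5)
    return (x, y, -x - y)                     # constructed so that x + y + z = 0

assert in_W((0.0, 0.0, 0.0))                  # condition 1: 0 is in W
for _ in range(100):
    u, v = random_W_vector(), random_W_vector()
    alpha = random.uniform(-5, 5)
    assert in_W(tuple(a + b for a, b in zip(u, v)))   # condition 2: closed under +
    assert in_W(tuple(alpha * a for a in u))          # condition 3: closed under scaling
```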

Page 29: Machine Learning Preliminaries and Math Refresher

Normed spaces

Definition (Normed vector space)

A normed vector space is a pair (V, ‖·‖) where V is a vector space and ‖·‖ is the associated norm, satisfying the following properties for all u, v ∈ V:

1 ‖v‖ ≥ 0 (positivity)

2 ‖u + v‖ ≤ ‖u‖ + ‖v‖ (triangle inequality)

3 ‖αv‖ = |α|‖v‖ (positive scalability)

4 ‖v‖ = 0 ⇔ v = 0 (positive definiteness)

Page 30: Machine Learning Preliminaries and Math Refresher

Definition (Inner product space)

A real inner product space is a pair (V, 〈·, ·〉), where V is a real vector space and 〈·, ·〉 the associated inner product, satisfying the following properties for all u, v, w ∈ V:

1 〈u, v〉 = 〈v , u〉 (symmetry)

2 〈αu, v〉 = α〈u, v〉, 〈u, αv〉 = α〈u, v〉, and 〈u + v, w〉 = 〈u, w〉 + 〈v, w〉, 〈u, v + w〉 = 〈u, v〉 + 〈u, w〉 (bilinearity)

3 〈u, u〉 ≥ 0 (positive definiteness)

Definition (Strict inner product space)

An inner product space is called strict if

〈u, u〉 = 0 ⇔ u = 0

Page 31: Machine Learning Preliminaries and Math Refresher

Inner product space

The strict inner product

induces a norm, ‖f‖² = 〈f, f〉, and is used to define distances and angles between elements.

Theorem (Cauchy Schwarz inequality)

For all vectors u and v of a real inner product space (V, 〈·, ·〉), the following inequality holds:

|〈u, v〉| ≤ ‖u‖‖v‖.
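
As a closing illustration (hypothetical, not from the slides), the usual dot product on R⁵ is a strict inner product; the sketch below computes its induced norm and spot-checks the Cauchy-Schwarz inequality on random vectors, with a tiny slack for floating-point rounding.

```python
import math
import random

random.seed(2)

def inner(u, v):
    """Euclidean dot product <u, v> on R^n."""
    return sum(a * b for a, b in zip(u, v))

def norm(u):
    """Induced norm: ||u|| = sqrt(<u, u>)."""
    return math.sqrt(inner(u, u))

for _ in range(1000):
    u = [random.uniform(-10, 10) for _ in range(5)]
    v = [random.uniform(-10, 10) for _ in range(5)]
    assert abs(inner(u, v)) <= norm(u) * norm(v) + 1e-9   # |<u,v>| <= ||u|| ||v||
```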

Page 32: Machine Learning Preliminaries and Math Refresher

If you're not comfortable with any of the presented material, you should take your favourite textbook and read up on it within the next two weeks.
