Robotics
Probabilities
Random variables, joint, conditional, marginal distribution, Bayes theorem, Probability
distributions, Gauss, Dirac, Conjugate priors
Marc Toussaint
University of Stuttgart
Winter 2017/18
Lecturer: Duy Nguyen-Tuong
Bosch Center for AI - Research Dept.
Probability Theory
• Why do we need probabilities?
– Obvious: to express inherent (objective) stochasticity of the world
• But beyond this (also in a “deterministic world”):
– lack of knowledge!
– hidden (latent) variables
– expressing uncertainty
– expressing information (and lack of information)
– Subjective Probability
• Probability Theory: an information calculus
Probability Axioms
• Axioms (for events A, B ⊆ Ω):
– Nonnegativity: P(A) ≥ 0
– Additivity: P(A ∪ B) = P(A) + P(B) if A ∩ B = ∅
– Normalization: P(Ω) = 1
• Implications:
0 ≤ P(A) ≤ 1
P(∅) = 0
A ⊆ B ⇒ P(A) ≤ P(B)
P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
P(Ω \ A) = 1 − P(A)
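These implications are easy to check mechanically. A minimal Python sketch (not from the slides; the uniform dice probabilities are an assumed example) that verifies them on a finite sample space:

```python
# Sketch (not from the slides): check the implications of the axioms
# on a finite sample space Omega with elementary probabilities.
omega = {1, 2, 3, 4, 5, 6}
p = {w: 1 / 6 for w in omega}        # uniform dice; any normalized p works

def P(event):
    """P(A) = sum of the elementary probabilities of the outcomes in A."""
    return sum(p[w] for w in event)

A, B = {1, 2}, {2, 3, 4}
assert abs(P(A | B) - (P(A) + P(B) - P(A & B))) < 1e-12  # inclusion-exclusion
assert abs(P(omega - A) - (1 - P(A))) < 1e-12            # P(Omega \ A) = 1 - P(A)
assert P(set()) == 0.0                                   # P(empty set) = 0
assert A <= (A | B) and P(A) <= P(A | B)                 # A subset of B => P(A) <= P(B)
```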
Probabilities & Random Variables
• For a random variable X with discrete domain dom(X) = Ω we write:
∀ x ∈ Ω: 0 ≤ P(X=x) ≤ 1
∑_{x∈Ω} P(X=x) = 1
Example: A dice can take values Ω = {1, .., 6}.
X is the random variable of a dice throw.
P(X=1) ∈ [0, 1] is the probability that X takes value 1.
• A bit more formally: a random variable is a map from a measurable space to a domain (sample space) and thereby introduces a probability measure on the domain (“assigns a probability to each possible value”)
Probability Distributions
• P(X=1) ∈ R denotes a specific probability
P(X) denotes the probability distribution (function over Ω)
Example: A dice can take values Ω = {1, 2, 3, 4, 5, 6}.
By P(X) we describe the full distribution over possible values {1, .., 6}. These are 6 numbers that sum to one, usually stored in a table, e.g.: [1/6, 1/6, 1/6, 1/6, 1/6, 1/6]
• In implementations we typically represent distributions over discrete random variables as tables (arrays) of numbers (see the sketch below)
• Notation for summing over a RV:
In equations we often need to sum over RVs. We then write
∑_X P(X) · · ·
as shorthand for the explicit notation
∑_{x ∈ dom(X)} P(X=x) · · ·
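As a sketch of this table view (assuming Python with numpy; not part of the slides), a fair-dice distribution stored as an array, with a normalization check and a sum over the RV:

```python
# Sketch (assuming numpy): a distribution over a discrete RV as an array.
import numpy as np

# P(X) for a fair dice: 6 numbers that sum to one, dom(X) = {1, .., 6}
P_X = np.full(6, 1 / 6)
assert np.isclose(P_X.sum(), 1.0)    # normalization: sum_x P(X=x) = 1

# "Summing over a RV": sum_X P(X) f(X) as an explicit sum over dom(X)
f = np.arange(1, 7)                  # e.g. f(x) = x, the face value
expectation = (P_X * f).sum()        # E[X] = 3.5 for a fair dice
print(expectation)
```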
Joint distributions
Assume we have two random variables X and Y
• Joint: P(X, Y)
• Marginal (sum rule): P(X) = ∑_Y P(X, Y)
• Conditional: P(X|Y) = P(X, Y) / P(Y)
The conditional is normalized: ∀ Y: ∑_X P(X|Y) = 1
• X is independent of Y iff: P(X|Y) = P(X)
(table thinking: all columns of P(X|Y) are equal)
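A minimal numpy sketch (not from the slides; the joint table values are made up) showing marginalization, conditioning, and the column-wise independence check as array operations:

```python
# Sketch (assuming numpy): joint, marginal, conditional as table operations.
import numpy as np

# Joint P(X, Y) as a 2D table; rows index X, columns index Y.
P_XY = np.array([[0.10, 0.20],
                 [0.30, 0.40]])
assert np.isclose(P_XY.sum(), 1.0)

P_X = P_XY.sum(axis=1)               # marginal: P(X) = sum_Y P(X, Y)
P_Y = P_XY.sum(axis=0)               # marginal: P(Y) = sum_X P(X, Y)
P_X_given_Y = P_XY / P_Y             # conditional: P(X|Y) = P(X,Y) / P(Y)

# Each column of P(X|Y) is a normalized distribution over X:
assert np.allclose(P_X_given_Y.sum(axis=0), 1.0)

# Independence check: are all columns of P(X|Y) equal to P(X)? (here: False)
print(np.allclose(P_X_given_Y, P_X[:, None]))
```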
Bayes’ Theorem
• Implications of these definitions:
Product rule: P(X, Y) = P(X) P(Y|X) = P(Y) P(X|Y)
Bayes’ Theorem: P(X|Y) = P(Y|X) P(X) / P(Y)
posterior = likelihood · prior / normalization
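A small numeric sketch of Bayes’ Theorem (hypothetical numbers, assuming numpy): a binary state X with a noisy binary observation Y, computing posterior = likelihood · prior / normalization:

```python
# Sketch (hypothetical numbers, not from the slides): Bayes' Theorem for
# a binary example, e.g. X = door open?, Y = sensor fires.
import numpy as np

P_X = np.array([0.3, 0.7])            # prior P(X): [open, closed]
P_Y1_given_X = np.array([0.9, 0.2])   # likelihood P(Y=1 | X)

unnormalized = P_Y1_given_X * P_X     # likelihood * prior
P_Y1 = unnormalized.sum()             # normalization: P(Y=1) = sum_X P(Y=1|X) P(X)
P_X_given_Y1 = unnormalized / P_Y1    # posterior P(X | Y=1)
print(P_X_given_Y1)                   # ~[0.659, 0.341]
```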
Multiple RVs:
• Analogously for n random variables X_{1:n} (stored as a rank-n tensor)
Joint: P(X_{1:n})
Marginal: P(X_1) = ∑_{X_{2:n}} P(X_{1:n})
Conditional: P(X_1 | X_{2:n}) = P(X_{1:n}) / P(X_{2:n})
• X is conditionally independent of Y given Z iff: P(X|Y,Z) = P(X|Z)
• Product rule and Bayes’ Theorem:
P(X_{1:n}) = ∏_{i=1}^n P(X_i | X_{i+1:n})
P(X_1 | X_{2:n}) = P(X_2 | X_1, X_{3:n}) P(X_1 | X_{3:n}) / P(X_2 | X_{3:n})
P(X, Z, Y) = P(X|Y,Z) P(Y|Z) P(Z)
P(X|Y,Z) = P(Y|X,Z) P(X|Z) / P(Y|Z)
P(X, Y|Z) = P(X, Z|Y) P(Y) / P(Z)
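A numpy sketch (not from the slides) that checks the three-variable product rule P(X,Z,Y) = P(X|Y,Z) P(Y|Z) P(Z) on a random rank-3 joint table:

```python
# Sketch (assuming numpy): n = 3 RVs as a rank-3 tensor; verify the
# product (chain) rule P(X,Z,Y) = P(X|Y,Z) P(Y|Z) P(Z) numerically.
import numpy as np

rng = np.random.default_rng(0)
P = rng.random((2, 3, 4))
P /= P.sum()                     # random joint P(X, Y, Z), axes = (X, Y, Z)

P_Z = P.sum(axis=(0, 1))         # marginal P(Z)
P_YZ = P.sum(axis=0)             # marginal P(Y, Z)
P_Y_given_Z = P_YZ / P_Z         # conditional P(Y|Z), broadcast over Y
P_X_given_YZ = P / P_YZ          # conditional P(X|Y,Z), broadcast over X

reconstructed = P_X_given_YZ * P_Y_given_Z * P_Z
assert np.allclose(reconstructed, P)
```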
Distributions over continuous domain
• Let x be a continuous RV. The probability density function (pdf) p(x) ∈ [0, ∞) defines the probability
P(a ≤ x ≤ b) = ∫_a^b p(x) dx ∈ [0, 1]
(In the discrete domain: probability distribution and probability mass function P(x) ∈ [0, 1] are used synonymously.)
• The cumulative distribution function (cdf) F(y) = P(x ≤ y) = ∫_{−∞}^y p(x) dx ∈ [0, 1] is the cumulative integral with lim_{y→∞} F(y) = 1
• Two basic examples:
Gaussian: N(x | µ, Σ) = 1/|2πΣ|^{1/2} e^{−½ (x−µ)^T Σ^{-1} (x−µ)}
Dirac or δ (“point particle”): δ(x) = 0 except at x = 0, ∫ δ(x) dx = 1
δ(x) = ∂/∂x H(x), where H(x) = [x ≥ 0] is the Heaviside step function
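A sketch using only the Python standard library (not from the slides): the 1-dim Gaussian cdf via the error function, giving P(a ≤ x ≤ b) = F(b) − F(a):

```python
# Sketch (standard library only): interval probabilities of a 1D Gaussian
# via its cdf F(y) = 0.5 * (1 + erf((y - mu) / (sigma * sqrt(2)))).
import math

def gauss_cdf(y, mu=0.0, sigma=1.0):
    return 0.5 * (1.0 + math.erf((y - mu) / (sigma * math.sqrt(2.0))))

def prob_interval(a, b, mu=0.0, sigma=1.0):
    """P(a <= x <= b) = F(b) - F(a) = integral of the pdf over [a, b]."""
    return gauss_cdf(b, mu, sigma) - gauss_cdf(a, mu, sigma)

print(prob_interval(-1.0, 1.0))            # ~0.6827: within one sigma
print(prob_interval(-math.inf, math.inf))  # 1.0: lim_{y->inf} F(y) = 1
```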
Gaussian distribution
• 1-dim: N(x | µ, σ²) = 1/|2πσ²|^{1/2} e^{−½ (x−µ)²/σ²}
[Figure: plot of the 1-dim density N(x | µ, σ²), a bell curve centered at µ with width ≈ 2σ]
• n-dim Gaussian in normal form:
N(x | µ, Σ) = 1/|2πΣ|^{1/2} exp{−½ (x−µ)^T Σ^{-1} (x−µ)}
with mean µ and covariance matrix Σ. In canonical form:
N[x | a, A] = (exp{−½ a^T A^{-1} a} / |2πA^{-1}|^{1/2}) exp{−½ x^T A x + x^T a}   (1)
with precision matrix A = Σ^{-1} and coefficient a = Σ^{-1} µ (and mean µ = A^{-1} a).
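A numpy sketch (not from the slides) evaluating the same Gaussian in both forms; the agreement at a test point confirms A = Σ^{-1} and a = Σ^{-1} µ:

```python
# Sketch (assuming numpy): the same Gaussian in normal form (mu, Sigma)
# and canonical form (a = Sigma^-1 mu, A = Sigma^-1).
import numpy as np

def normal_form(x, mu, Sigma):
    """N(x | mu, Sigma) in normal (moment) form."""
    d = x - mu
    Z = np.sqrt(np.linalg.det(2 * np.pi * Sigma))
    return np.exp(-0.5 * d @ np.linalg.solve(Sigma, d)) / Z

def canonical_form(x, a, A):
    """N[x | a, A] with precision matrix A and coefficient a."""
    Ainv = np.linalg.inv(A)
    Z = np.sqrt(np.linalg.det(2 * np.pi * Ainv))
    return np.exp(-0.5 * a @ Ainv @ a) / Z * np.exp(-0.5 * x @ A @ x + x @ a)

mu = np.array([1.0, -1.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
A = np.linalg.inv(Sigma)             # precision matrix A = Sigma^-1
a = A @ mu                           # coefficient a = Sigma^-1 mu
x = np.array([0.3, 0.7])
assert np.isclose(normal_form(x, mu, Sigma), canonical_form(x, a, A))
```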
Gaussian identities
Symmetry: N(x | a, A) = N(a | x, A) = N(x − a | 0, A)
Product:
N(x | a, A) N(x | b, B) = N[x | A^{-1} a + B^{-1} b, A^{-1} + B^{-1}] N(a | b, A + B)
N[x | a, A] N[x | b, B] = N[x | a + b, A + B] N(A^{-1} a | B^{-1} b, A^{-1} + B^{-1})
“Propagation”: ∫_y N(x | a + Fy, A) N(y | b, B) dy = N(x | a + Fb, A + F B F^T)
Transformation: N(Fx + f | a, A) = (1/|F|) N(x | F^{-1}(a − f), F^{-1} A F^{-T})
Marginal & conditional:
N( (x, y) | (a, b), [[A, C], [C^T, B]] ) = N(x | a, A) · N(y | b + C^T A^{-1} (x − a), B − C^T A^{-1} C)
More Gaussian identities: see http://ipvs.informatik.uni-stuttgart.de/mlr/marc/notes/gaussians.pdf
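As a numeric sanity check (a sketch assuming numpy, with made-up scalar parameters), the marginal & conditional identity for a 2D joint Gaussian:

```python
# Sketch (assuming numpy): check joint = marginal * conditional for a
# 2D Gaussian with block parameters a, b, A, B, C (scalars here).
import numpy as np

def N(x, mu, Sigma):
    """Gaussian density; accepts scalars, lists, or arrays."""
    x, mu, Sigma = np.atleast_1d(x), np.atleast_1d(mu), np.atleast_2d(Sigma)
    d = x - mu
    Z = np.sqrt(np.linalg.det(2 * np.pi * Sigma))
    return np.exp(-0.5 * d @ np.linalg.solve(Sigma, d)) / Z

a, b = 1.0, -2.0
A, B, C = 2.0, 1.5, 0.8          # joint covariance [[A, C], [C, B]]
x, y = 0.5, -1.0                 # test point

joint = N([x, y], [a, b], [[A, C], [C, B]])
marg = N(x, a, A)                                 # N(x | a, A)
cond = N(y, b + C / A * (x - a), B - C / A * C)   # N(y | b + C A^-1 (x-a), B - C A^-1 C)
assert np.isclose(joint, marg * cond)
```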