Probabilistic models Haixu Tang School of Informatics

Dec 14, 2015

Page 1: Probabilistic models Haixu Tang School of Informatics.

Probabilistic models

Haixu Tang

School of Informatics


Probability

• Experiment: a procedure involving chance that leads to different results

• Outcome: the result of a single trial of an experiment;

• Event: one or more outcomes of an experiment;

• Probability: the measure of how likely an event is;


Example: a fair 6-sided dice

• Outcome: The possible outcomes of this experiment are 1, 2, 3, 4, 5 and 6;

• Events: 1; 6; even

• Probability: outcomes are equally likely to occur.
– P(A) = (number of ways event A can occur) / (total number of possible outcomes)
– P(1)=P(6)=1/6; P(even)=3/6=1/2;
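
The counting rule for equally likely outcomes can be sketched in a few lines of Python. The names `OUTCOMES` and `event_probability` are illustrative and not part of the slides; an event is represented as a predicate over outcomes.

```python
from fractions import Fraction

# Sample space of a fair 6-sided dice: every outcome is equally likely.
OUTCOMES = [1, 2, 3, 4, 5, 6]

def event_probability(event):
    """P(A) = (number of outcomes in A) / (total number of outcomes)."""
    favourable = [x for x in OUTCOMES if event(x)]
    return Fraction(len(favourable), len(OUTCOMES))

p_one = event_probability(lambda x: x == 1)
p_even = event_probability(lambda x: x % 2 == 0)
print(p_one, p_even)  # 1/6 1/2
```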


Probability distribution

• Probability distribution: the assignment of a probability P(x) to each outcome x.

• A fair dice: outcomes are equally likely to occur → the probability distribution over all six outcomes is P(x)=1/6, x=1,2,3,4,5 or 6.

• A loaded dice: outcomes are unequally likely to occur → the probability distribution over all six outcomes is P(x)=f(x), x=1,2,3,4,5 or 6, with $\sum_x f(x) = 1$.


Example: DNA sequences

• Event: observing a DNA sequence $S = s_1 s_2 \ldots s_n$, with $s_i \in \{A, C, G, T\}$;

• Random sequence model (or independent and identically distributed, i.i.d., model): $s_i$ occurs at random with probability $P(s_i)$, independent of all other residues in the sequence;

• $P(S) = \prod_{i=1}^{n} P(s_i)$

• This model will be used as a background model (also called the null hypothesis).
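
As a rough sketch of the random sequence model, the probability of a sequence is the product of its residue probabilities. The function name and the uniform residue probabilities below are assumptions for illustration; the product is computed in log space because it underflows quickly for long sequences.

```python
import math

def iid_log_probability(sequence, residue_probs):
    """log P(S) = sum_i log P(s_i) under the i.i.d. (random sequence) model."""
    return sum(math.log(residue_probs[s]) for s in sequence)

# Assumed background: all four nucleotides equally likely.
uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}

log_p = iid_log_probability("ACGT", uniform)
p = math.exp(log_p)  # equals (1/4)^4 up to rounding
```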


Conditional probability

• $P(i \mid \theta)$: the measure of how likely an event $i$ is under the condition $\theta$;
– Example: two dice D1, D2

• P(i|D1): probability of rolling i using dice D1

• P(i|D2): probability of rolling i using dice D2


Joint probability

• Two experiments X and Y
– P(X,Y): joint probability (distribution) of experiments X and Y
– P(X,Y)=P(X|Y)P(Y)=P(Y|X)P(X)
– If P(X|Y)=P(X), X and Y are independent

• Example: experiment 1 (selecting a dice), experiment 2 (rolling the selected dice)
– P(y): y=D1 or D2
– P(i, D1)=P(i|D1)P(D1)
– If P(i|D1)=P(i|D2), the two experiments are independent


Marginal probability

• $P(X) = \sum_Y P(X \mid Y)\,P(Y)$

• Example: experiment 1 (selecting a dice), experiment 2 (rolling the selected dice)
– P(y): y=D1 or D2
– P(i) = P(i|D1)P(D1) + P(i|D2)P(D2)
– If P(i|D1)=P(i|D2), the two experiments are independent

• In that case P(i) = P(i|D1)(P(D1)+P(D2)) = P(i|D1), since P(D1)+P(D2)=1
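
The two-dice marginal can be checked numerically. This is a minimal sketch: the choice of a fair D1, a loaded D2 (with P(6)=0.5) and the selection probabilities are assumptions made only for the example.

```python
def marginal(conditional, prior):
    """P(i) = sum over dice d of P(i|d) * P(d), for i = 1..6."""
    return {i: sum(conditional[d][i] * prior[d] for d in prior)
            for i in range(1, 7)}

fair = {i: 1 / 6 for i in range(1, 7)}
loaded = {i: 0.1 for i in range(1, 6)}
loaded[6] = 0.5

# Assumed setup: D1 is fair, D2 is loaded, each selected with probability 1/2.
p_mixed = marginal({"D1": fair, "D2": loaded}, {"D1": 0.5, "D2": 0.5})

# If both dice share one distribution, the roll is independent of the choice,
# so the marginal equals the conditional whatever the selection probabilities.
p_same = marginal({"D1": fair, "D2": fair}, {"D1": 0.3, "D2": 0.7})
```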


Probability models

• A system that produces different outcomes with different probabilities.

• It can simulate a class of objects (events), assigning each an associated probability.

• Simple objects (processes) → probability distributions


Example: continuous variable

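• The whole set of outcomes X ($x \in X$) can be infinite.

• Continuous variable $x \in [x_0, x_1]$
– The probability of any interval tends to 0 as its width $\delta x \to 0$
– $P(x - \delta x/2 \le x \le x + \delta x/2) = f(x)\,\delta x$; $\int_{x_0}^{x_1} f(x)\,dx = 1$
– $f(x)$: probability density function (density, pdf)
– $P(x \le y) = \int_{x_0}^{y} f(x)\,dx$: cumulative distribution function (cdf)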


Mean and variance

• Mean
– $m = \sum_x x\,P(x)$

• Variance
– $\sigma^2 = \sum_k (k - m)^2 P(k)$; $\sigma$: standard deviation
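
For a concrete instance, the mean and variance of a fair dice follow directly from the two sums. The helper names below are illustrative; exact fractions are used so the results are not blurred by rounding.

```python
from fractions import Fraction

def mean(dist):
    """m = sum_x x * P(x) for a distribution given as {outcome: probability}."""
    return sum(x * p for x, p in dist.items())

def variance(dist):
    """sigma^2 = sum_k (k - m)^2 * P(k)."""
    m = mean(dist)
    return sum((k - m) ** 2 * p for k, p in dist.items())

fair_dice = {x: Fraction(1, 6) for x in range(1, 7)}
m = mean(fair_dice)        # 7/2
var = variance(fair_dice)  # 35/12
std = float(var) ** 0.5    # standard deviation sigma
```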


Typical probability distributions

• Binomial distribution

• Gaussian distribution

• Multinomial distribution

• Dirichlet distribution

• Extreme value distribution (EVD)


Binomial distribution

• An experiment with binary outcomes: 0 or 1;

• Probability distribution of a single experiment: P(‘1’)=p and P(‘0’) = 1-p;

• Probability distribution of N tries of the same experiment

• Bi(k ‘1’s out of N tries) ~ $\binom{N}{k}\, p^k (1-p)^{N-k}$
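
The binomial probability of k ‘1’s in N tries is the binomial coefficient times $p^k (1-p)^{N-k}$. A short sketch using the standard library (the function name is illustrative, and N=10, p=0.5 are arbitrary example values):

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent tries."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

p_three = binomial_pmf(3, 10, 0.5)  # 120 / 1024
total = sum(binomial_pmf(k, 10, 0.5) for k in range(11))  # sums to 1
```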


Gaussian distribution

• $N \to \infty$: Bi → Gaussian distribution

• Define the new variable $u = (k - m)/\sigma$
– $f(u) \sim \frac{1}{\sqrt{2\pi}} \exp(-u^2/2)$
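
The limit can be illustrated numerically for a large N. This sketch assumes the standard binomial moments $m = Np$ and $\sigma^2 = Np(1-p)$, which the slide does not state, and rescales the density of u back to k by dividing by $\sigma$. N=1000 and p=0.5 are arbitrary example values.

```python
from math import comb, exp, pi, sqrt

def binomial_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def gaussian_approx(k, n, p):
    """Gaussian approximation to Bi(k out of n), via u = (k - m) / sigma."""
    m = n * p                       # assumed: binomial mean
    sigma = sqrt(n * p * (1 - p))   # assumed: binomial standard deviation
    u = (k - m) / sigma
    return exp(-u * u / 2) / (sigma * sqrt(2 * pi))

exact = binomial_pmf(500, 1000, 0.5)
approx = gaussian_approx(500, 1000, 0.5)
relative_error = abs(exact - approx) / exact
```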


Multinomial distribution

• An experiment with K independent outcomes with probabilities $\theta_i$, $i = 1, \ldots, K$, $\sum_i \theta_i = 1$.

• Probability distribution of N tries of the same experiment, getting $n_i$ occurrences of outcome i, $\sum_i n_i = N$.

• $M(N \mid \theta) \sim \frac{N!}{\prod_i n_i!} \prod_{i=1}^{K} \theta_i^{n_i}$


Example: a fair dice

• Probability: outcomes (1,2,…,6) are equally likely to occur

• Probability of rolling 1 dozen times (12) and getting each outcome twice:
– $\frac{12!}{(2!)^6} \left(\frac{1}{6}\right)^{12} \approx 3.4 \times 10^{-3}$


Example: a loaded dice

• Probability: outcomes (1,2,…,6) are unequally likely to occur: P(6)=0.5, P(1)=P(2)=…=P(5)=0.1

• Probability of rolling 1 dozen times (12) and getting each outcome twice:
– $\frac{12!}{(2!)^6}\, 0.5^2 \cdot 0.1^{10} \approx 1.87 \times 10^{-4}$
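
Both dice examples (fair and loaded, 12 rolls with each outcome seen twice) are instances of the multinomial formula and can be checked with a small function. The function name is illustrative.

```python
from math import factorial, prod

def multinomial_probability(counts, probs):
    """N! / prod(n_i!) * prod(theta_i ** n_i), with N = sum(counts)."""
    coefficient = factorial(sum(counts))
    for n in counts:
        coefficient //= factorial(n)  # exact integer division at every step
    return coefficient * prod(p**n for p, n in zip(probs, counts))

counts = [2] * 6  # each face seen twice in 12 rolls

fair = multinomial_probability(counts, [1 / 6] * 6)
loaded = multinomial_probability(counts, [0.1] * 5 + [0.5])
print(f"{fair:.2e} {loaded:.2e}")  # 3.44e-03 1.87e-04
```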


Dirichlet distribution

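• Outcomes: $\theta = (\theta_1, \theta_2, \ldots, \theta_K)$

• Density: $D(\theta \mid \alpha) \sim \prod_{i=1}^{K} \theta_i^{\alpha_i - 1}\; \delta\!\left(\sum_{i=1}^{K} \theta_i - 1\right)$

• $(\alpha_1, \alpha_2, \ldots, \alpha_K)$ are constants → different $\alpha$ gives a different probability distribution over $\theta$.

• K=2 → Beta distribution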


Example: dice factories

• Dice factories produce all kinds of dice: $\theta(1), \theta(2), \ldots, \theta(6)$

• A dice factory distinguishes itself from the others by its parameters $\alpha$

• The probability of producing a dice $\theta$ in the factory is determined by $D(\theta \mid \alpha)$
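
One way to picture a dice factory is to draw dice from a Dirichlet distribution. The sketch below uses the standard construction of normalizing independent Gamma draws; the factory parameters (`factory_fair`, `factory_wild`) are made-up values chosen only to contrast a factory producing nearly fair dice with one producing very uneven dice.

```python
import random

def sample_dirichlet(alpha, rng=random):
    """Draw theta ~ Dirichlet(alpha) by normalizing Gamma(alpha_i, 1) samples."""
    gammas = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(gammas)
    return [g / total for g in gammas]

rng = random.Random(0)       # fixed seed so the sketch is reproducible

factory_fair = [50.0] * 6    # hypothetical: large alpha, dice close to fair
factory_wild = [0.5] * 6     # hypothetical: small alpha, strongly loaded dice

dice_a = sample_dirichlet(factory_fair, rng)
dice_b = sample_dirichlet(factory_wild, rng)
```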


Extreme value distribution

• Outcome: the largest number among N samples from a density g(x) is larger than x;

• For a variety of densities g(x),– pdf:

– cdf:


Probabilistic model

• Selecting a model
– Probabilistic distribution
– Machine learning methods
  • Neural nets
  • Support Vector Machines (SVMs)
– Probabilistic graphical models
  • Markov models
  • Hidden Markov models
  • Bayesian models
  • Stochastic grammars

• Model → data (sampling)

• Data → model (inference)


Sampling

• Probabilistic model with parameter $\theta$ → $P(x \mid \theta)$ for event x;

• Sampling: generate a large set of events $x_i$ with probability $P(x_i \mid \theta)$;

• Random number generator: the function rand() picks a number randomly from the interval [0,1) with uniform density;

• Sampling from a probabilistic model → transforming $P(x_i \mid \theta)$ to a uniform distribution
– For a finite set X ($x_i \in X$), find i s.t. $P(x_1) + \ldots + P(x_{i-1}) \le \mathrm{rand}(0,1) < P(x_1) + \ldots + P(x_{i-1}) + P(x_i)$
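
The cumulative-sum rule for a finite set of outcomes can be sketched as follows. Python's `random.random()` plays the role of rand(); the loaded-dice probabilities and the sample size are example values.

```python
import random

def sample(outcomes, probs, rng=random):
    """Return the outcome x_i whose cumulative interval contains a uniform draw."""
    r = rng.random()       # uniform on [0, 1)
    cumulative = 0.0
    for x, p in zip(outcomes, probs):
        cumulative += p    # P(x_1) + ... + P(x_i)
        if r < cumulative:
            return x
    return outcomes[-1]    # guard against floating-point rounding in the sum

rng = random.Random(0)     # fixed seed so the sketch is reproducible
faces = [1, 2, 3, 4, 5, 6]
loaded = [0.1, 0.1, 0.1, 0.1, 0.1, 0.5]

rolls = [sample(faces, loaded, rng) for _ in range(20000)]
freq_six = rolls.count(6) / len(rolls)  # should be close to 0.5
```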


Inference (ML)

• Estimating the model parameters (inference): from large sets of trusted examples

• Given a set of data D (training set), find the model parameters $\theta$ that maximize the likelihood $P(D \mid \theta)$;


Example: a loaded dice

• Loaded dice: estimate the parameters $\theta_1, \theta_2, \ldots, \theta_6$ based on N observations $D = d_1, d_2, \ldots, d_N$

• $\theta_i = n_i / N$, where $n_i$ is the number of occurrences of i, is the maximum likelihood solution (11.5)

• Inference from counts
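
Inference from counts amounts to dividing each count by the number of observations. The rolls below are made-up illustrative data, not measurements, and the function name is an assumption.

```python
from collections import Counter

def ml_estimate(observations, outcomes):
    """Maximum likelihood estimate theta_i = n_i / N from observed outcomes."""
    counts = Counter(observations)
    n_total = len(observations)
    return {i: counts[i] / n_total for i in outcomes}

# Hypothetical observations: ten rolls of a dice suspected to be loaded.
rolls = [6, 6, 1, 3, 6, 2, 6, 5, 6, 4]
theta = ml_estimate(rolls, range(1, 7))
```

With so few rolls the estimate is very sensitive to chance, and any face that never appears gets probability 0.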


Bayesian statistics

• P(X|Y)=P(Y|X)P(X)/P(Y)

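• $P(\theta \mid D) = P(\theta)\,\frac{P(D \mid \theta)}{P(D)} = P(\theta)\,\frac{P(D \mid \theta)}{\sum_{\theta'} P(D \mid \theta')\,P(\theta')}$

• $P(\theta)$: prior probability; $P(\theta \mid D)$: posterior probability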


Example: two dice

• Fair dice: prior 0.99; loaded dice: prior 0.01, with P(6)=0.5, P(1)=…=P(5)=0.1

• 3 consecutive ‘6’s, with C = P(3 ‘6’s):
– P(loaded|3 ‘6’s) = P(loaded) × [P(3 ‘6’s|loaded) / P(3 ‘6’s)] = 0.01 × ($0.5^3$ / C)
– P(fair|3 ‘6’s) = P(fair) × [P(3 ‘6’s|fair) / P(3 ‘6’s)] = 0.99 × ($(1/6)^3$ / C)
– Posterior ratio: P(loaded|3 ‘6’s) / P(fair|3 ‘6’s) = 0.00125 / 0.00458 ≈ 0.27 < 1
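
The same calculation can be written out with the normalizing constant computed explicitly. This sketch uses the priors and dice probabilities from the example; the function and variable names are illustrative.

```python
def posterior(priors, likelihoods):
    """Bayes' rule: P(h|D) = P(h) * P(D|h) / sum over h' of P(h') * P(D|h')."""
    evidence = sum(priors[h] * likelihoods[h] for h in priors)
    return {h: priors[h] * likelihoods[h] / evidence for h in priors}

priors = {"fair": 0.99, "loaded": 0.01}
# Likelihood of three consecutive sixes under each hypothesis.
likelihoods = {"fair": (1 / 6) ** 3, "loaded": 0.5 ** 3}

post = posterior(priors, likelihoods)
ratio = post["loaded"] / post["fair"]
print(f"{post['loaded']:.3f} {ratio:.3f}")  # 0.214 0.273
```

Even after three sixes in a row, the fair dice remains the more probable explanation because the prior on a loaded dice is so small.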


Inference from counts: including prior knowledge

• Prior knowledge is important when the data is scarce

• Use the Dirichlet distribution as prior:
– $P(\theta \mid n) = D(\theta \mid \alpha)\,[P(n \mid \theta) / P(n)]$
– Equivalent to adding $\alpha_i$ as pseudo-counts to the observed counts $n_i$ (11.5)

– We can forget about statistics and use pseudo-counts in the parameter estimation!
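
A minimal sketch of estimation with pseudo-counts, assuming the estimator $\theta_i = (n_i + \alpha_i) / (N + \sum_j \alpha_j)$, which is the posterior mean under a Dirichlet prior. The counts (three rolls, all sixes) and the choice $\alpha_i = 1$ are example values.

```python
def estimate_with_pseudocounts(counts, alpha):
    """theta_i = (n_i + alpha_i) / (N + sum(alpha)), assumed posterior-mean form."""
    total = sum(counts) + sum(alpha)
    return [(n + a) / total for n, a in zip(counts, alpha)]

counts = [0, 0, 0, 0, 0, 3]  # scarce data: three rolls, all sixes
theta_ml = [n / sum(counts) for n in counts]                # no prior
theta_pc = estimate_with_pseudocounts(counts, [1.0] * 6)    # with pseudo-counts
```

Without a prior, faces 1 to 5 get probability 0 after only three rolls; with pseudo-counts they keep probability 1/9 each and the six gets 4/9.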


Entropy

• Probability distribution $P(x_i)$ over K events

• $H(x) = -\sum_i P(x_i) \log P(x_i)$

– Maximized for uniform distribution P(xi)=1/K

– A measure of average uncertainty
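
The entropy formula is easy to evaluate directly. The sketch below assumes base-2 logarithms (entropy in bits), a choice the slide leaves open, and skips zero-probability events, which contribute nothing to the sum.

```python
from math import log2

def entropy(probs):
    """H = -sum_i P(x_i) * log2 P(x_i), in bits."""
    return -sum(p * log2(p) for p in probs if p > 0)

h_uniform = entropy([0.25, 0.25, 0.25, 0.25])  # 2 bits, the maximum for K = 4
h_skewed = entropy([0.7, 0.1, 0.1, 0.1])       # lower: less uncertainty
h_certain = entropy([1.0, 0.0, 0.0, 0.0])      # 0 bits: no uncertainty
```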


Mutual information

• Measure of independence of two random variables X and Y

• If P(X|Y)=P(X), X and Y are independent → P(X,Y)/(P(X)P(Y)) = 1

• $M(X;Y) = \sum_{x,y} P(x,y) \log \frac{P(x,y)}{P(x)\,P(y)}$
– $M(X;Y) = 0$ → independent
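
Mutual information can be computed from a joint distribution by first summing out the marginals. This is a sketch with base-2 logarithms (an assumption) and two toy joint distributions over binary variables.

```python
from collections import defaultdict
from math import log2

def mutual_information(joint):
    """M(X;Y) = sum over (x,y) of P(x,y) * log2[P(x,y) / (P(x) P(y))]."""
    px = defaultdict(float)
    py = defaultdict(float)
    for (x, y), p in joint.items():
        px[x] += p  # marginal P(x)
        py[y] += p  # marginal P(y)
    return sum(p * log2(p / (px[x] * py[y]))
               for (x, y), p in joint.items() if p > 0)

# Independent: P(x,y) = P(x) P(y) everywhere, so M = 0.
independent = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}
# Fully dependent: Y always equals X, so M = 1 bit.
dependent = {(0, 0): 0.5, (1, 1): 0.5}

m_indep = mutual_information(independent)
m_dep = mutual_information(dependent)
```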