Joint, Conditional and Marginal Probabilities
Last Updated: 24 March 2015
Slideshare: http://www.slideshare.net/marinasantini1/mathematicsforlanguagetechnology
Mathematics for Language Technology: http://stp.lingfil.uu.se/~matsd/uv/uv15/mfst/
Marina Santini, [email protected]fil.uu.se
Department of Linguistics and Philology, Uppsala University, Uppsala, Sweden
Spring 2015
Lecture: Joint, Conditional and Marginal Probabilities
• Recommended Reading: – Sections 3-6 in Goldsmith J. (2007) Probability for Linguists. The University of Chicago, Department of Linguistics: • http://hum.uchicago.edu/~jagoldsm/Papers/probability.pdf
Outline
• Joint Probability
• Conditional Probability
• Multiplication Rule
• Marginal Probability
• Bayes Law
• Independence
Linguistic Note:
• Traditionally, the plural is dice, but the singular is die (i.e. 1 die, 2 dice).
• Modern lexicography says: e.g., MacMillan: – http://www.macmillandictionary.com/dictionary/british/dice_1
Joint vs Conditional
In many situations where we want to make use of probabilities, there are dependencies between different variables or events. For this reason we need the notion of conditional probability, i.e. the probability of an event given some other event.
The conditional probability of A given B is defined as the probability of the intersection of A and B divided by the probability of B: P(A | B) = P(A ∩ B) / P(B).
The probability of the intersection, P(A ∩ B), is referred to as the joint probability, because it is the probability that both A and B occur.
CONDITIONAL = NOT SYMMETRICAL: in general, P(A | B) ≠ P(B | A).
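The definition and its asymmetry can be checked with a short sketch; the fair-die events below are illustrative assumptions of mine, not an example from the slides:

```python
# Conditional probability from the definition: P(A | B) = P(A ∩ B) / P(B).
# Illustrative fair-die example (an assumption, not from the slides):
# A = "the die shows an even number", B = "the die shows 5 or 6".
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}   # even
B = {5, 6}      # five or six

def prob(event):
    """Probability of an event under the uniform distribution."""
    return Fraction(len(event), len(sample_space))

p_joint = prob(A & B)             # P(A ∩ B) = P({6}) = 1/6
p_a_given_b = p_joint / prob(B)   # P(A | B) = (1/6) / (1/3) = 1/2
p_b_given_a = p_joint / prob(A)   # P(B | A) = (1/6) / (1/2) = 1/3

print(p_a_given_b)  # 1/2
print(p_b_given_a)  # 1/3 -> conditioning is not symmetrical
```

The two conditionals differ even though both are built from the same intersection, which is the point of the non-symmetry remark above.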
Conditional
When we talk about the joint probability of A and B, we are considering the intersection of A and B, i.e. those outcomes that are in both A and B. And we ask: how large is that set of outcomes compared to the entire sample space?
Example: Bigrams
10^-3 = 1/10^3 = 1/1000 = one in a thousand; 10^-6 = one in one million
joint probability = one in 10 million (10^-7)
We apply the formula of conditional probability.
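A sketch of that computation; the concrete probabilities are assumptions chosen only to match the orders of magnitude mentioned above (one in a thousand for the first word, one in ten million for the bigram):

```python
# Conditional probability of a bigram from its joint probability:
# P(A | B) = P(A, B) / P(B).  The numbers are illustrative assumptions.
from fractions import Fraction

p_b = Fraction(1, 1_000)           # P(B): first word, one in a thousand
p_joint = Fraction(1, 10_000_000)  # P(A, B): the bigram, one in ten million

p_a_given_b = p_joint / p_b        # divide joint by the conditioning event
print(p_a_given_b)  # 1/10000 -> one in ten thousand
```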
From the definition of conditional probability we can derive the
Multiplication Rule
One way to compute the probability of A and B (i.e. the joint probability) is to take the probability of B by itself and multiply it by the probability of A given B.
Another way to compute the joint probability of A and B is to start with the simple probability of A and multiply that by the probability of B given A.
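Both variants can be verified on a small example; the fair-die events are illustrative assumptions, not from the slides:

```python
# Both variants of the multiplication rule, checked on a fair die:
# P(A, B) = P(B) * P(A | B) = P(A) * P(B | A).
# Illustrative events (assumptions, not from the slides).
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}   # even
B = {1, 2}      # one or two

def prob(event):
    return Fraction(len(event), len(sample_space))

p_joint = prob(A & B)                      # P(A, B) = P({2}) = 1/6

p_a_given_b = Fraction(len(A & B), len(B)) # P(A | B) from counts within B
p_b_given_a = Fraction(len(A & B), len(A)) # P(B | A) from counts within A

variant_1 = prob(B) * p_a_given_b          # P(B) * P(A | B)
variant_2 = prob(A) * p_b_given_a          # P(A) * P(B | A)

print(variant_1 == variant_2 == p_joint)  # True
```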
Quiz 1: only one answer is correct
Probability is the measure of the likelihood that an event will occur. The higher the probability of an event, the more certain we are that the event will occur.
Quiz 1: Solution
1. Smaller than 1 in a million — correct [P(A, B) = P(B) P(A|B) = 0.0001 (1 in 10 000) × 0.000001 (1 in 1 000 000) = 10^-10 < 0.000001; P(A, B) is 1 in 10 billion]
2. Greater than 1 in a million — incorrect [same computation: P(A, B) = 10^-10, i.e. 1 in 10 billion, which is smaller than 1 in a million]
3. Impossible to tell — incorrect [Given P(A | B) and P(B), we can derive P(A, B) exactly.]
Quiz 1: only one answer is correct
We apply the following multiplication rule: P(A,B) = P(B) P(A|B), since we know these elements: P(B) (i.e. 1/10 000 = 0.0001); P(A|B) (i.e. 1/1 000 000 = 0.000001).
P(A,B) = P(B) P(A|B) = 0.0001 × 0.000001 = 0.0000000001 (= 1 in 10 000 000 000, i.e. 1 in 10 billion)
Result: the intersection of A and B (i.e. people having BOTH a PhD in physics and winning a Nobel prize) has probability 1 in 10 billion.
1: Is a probability of 1 in 10 billion smaller than 1 in 1 million? Yes! 0.0000000001 is smaller than 0.000001.
2: Is a probability of 1 in 10 billion greater than 1 in 1 million? No! 0.0000000001 is NOT greater than 0.000001.
3: Impossible to tell: INCORRECT! It is possible to compute the probability, because we have all the elements to apply the multiplication rule.
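The arithmetic of this solution can be checked directly with exact fractions:

```python
# Quiz 1 arithmetic: P(B) = 1/10 000, P(A | B) = 1/1 000 000, so by the
# multiplication rule P(A, B) = P(B) * P(A | B) = 1 in 10 billion,
# which is smaller than 1 in a million.
from fractions import Fraction

p_b = Fraction(1, 10_000)             # P(B)
p_a_given_b = Fraction(1, 1_000_000)  # P(A | B)

p_joint = p_b * p_a_given_b           # P(A, B)
print(p_joint)                            # 1/10000000000
print(p_joint < Fraction(1, 1_000_000))   # True: smaller than 1 in a million
```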
Multiplication Rule
P(A,B) = P(B) P(A|B)   (Variant 1)
P(A,B) = P(A) P(B|A)   (Variant 2)
Marginalization
Introduction to the concept of Marginalization
Partition means: the events are disjoint, i.e. they do not have members in common. In other words: their intersection is empty and their union is the entire sample space. This is a way to divide the sample space into non-overlapping events.
Pairwise comparison generally refers to any process of comparing entities in pairs…
Given that we have such a partition, and given that we are interested in another event A in the same sample space, we can compute the probability of A by summing up the joint probabilities of A with each member of the partition (this is the summation formula in the middle): P(A) = Σ_i P(A, B_i).
… continued …
All this seems a very strange method, because we are computing something very simple, i.e. the probability of A, from something more complex involving summation, joint probabilities and conditional probabilities. But this is very useful in situations where we do not know the probability of A but we do know the joint or the conditional probabilities of A with the members of a partition.
Knowing the multiplication rule, we also know that the joint probability of A and Bi can be expressed as the conditional probability of A given Bi times the simple probability of Bi: P(A, Bi) = P(A | Bi) P(Bi).
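A sketch of marginalization over a partition; the partition and conditional probabilities below are illustrative assumptions, not values from the slides:

```python
# Marginalization: B1, B2, B3 partition the sample space, and
# P(A) = sum_i P(A, Bi) = sum_i P(A | Bi) * P(Bi)
# (the second form uses the multiplication rule for each term).
# All numeric values are illustrative assumptions.
from fractions import Fraction

p_b = {"B1": Fraction(1, 2), "B2": Fraction(1, 3), "B3": Fraction(1, 6)}
p_a_given_b = {"B1": Fraction(1, 4), "B2": Fraction(1, 2), "B3": Fraction(1, 8)}

# The partition probabilities must cover the whole sample space.
assert sum(p_b.values()) == 1

# P(A) via the summation formula.
p_a = sum(p_a_given_b[b] * p_b[b] for b in p_b)
print(p_a)  # 5/16
```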
Marginal probability
Multiplication rule
Joint, Marginal & Conditional Probabilities
What is important is to understand the relation between the joint, the marginal and the conditional probabilities, and the way we can derive them from each other. In particular, given that we know the joint probabilities of the events we are interested in, we can always derive the marginal and conditional probabilities from them, whereas the opposite does not hold (except under some special conditions).
The joint probabilities sum up to 1.
What if we want the simple probabilities?
Once we have the joint probabilities and the simple (marginal) probabilities, we can combine them to get conditional probabilities.
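A sketch of deriving marginals and conditionals from a joint table; the table entries are illustrative assumptions, not the slides' own numbers:

```python
# Deriving marginal and conditional probabilities from a joint table.
# The joint entries P(X = x, Y = y) are illustrative assumptions; they
# sum to 1, as any joint distribution must.
from fractions import Fraction

joint = {
    ("x1", "y1"): Fraction(1, 4), ("x1", "y2"): Fraction(1, 4),
    ("x2", "y1"): Fraction(1, 8), ("x2", "y2"): Fraction(3, 8),
}
assert sum(joint.values()) == 1

# Marginal P(X = x): sum the joint probabilities over all values of Y.
def marginal_x(x):
    return sum(p for (xi, _), p in joint.items() if xi == x)

# Conditional P(Y = y | X = x) = P(x, y) / P(x).
def cond_y_given_x(y, x):
    return joint[(x, y)] / marginal_x(x)

print(marginal_x("x1"))            # 1/2
print(cond_y_given_x("y2", "x2"))  # 3/4
```

This direction always works: from the joint table we can recover every marginal and conditional, which is the asymmetry stressed above.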
Joint, Marginal & Conditional Probabilities
Bayes Law
Given events A and B in the sample space Ω, the conditional probability of A given B is equal to the simple probability of A times the inverse conditional probability, i.e. the probability of B given A, divided by the simple probability of B: P(A | B) = P(A) P(B | A) / P(B).
We know, thanks to the multiplication/chain rule, that the joint probability can be replaced by the simple probability multiplied by the conditional probability. Bayes Law is a powerful tool that allows us to invert a conditional probability. When we find ourselves in a situation where we need to know the probability of A given B, but our data gives us only the probability of B given A, we can invert the expression and get the probabilities that we need (a little bit more on this next time).
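A minimal sketch of the inversion, assuming illustrative values for P(A), P(B | A) and P(B):

```python
# Bayes Law: invert a conditional probability.
# P(A | B) = P(A) * P(B | A) / P(B).  The numbers are illustrative
# assumptions, not data from the slides.
from fractions import Fraction

p_a = Fraction(1, 100)          # P(A)
p_b_given_a = Fraction(9, 10)   # P(B | A): the conditional our data gives us
p_b = Fraction(1, 20)           # P(B)

p_a_given_b = p_a * p_b_given_a / p_b   # the conditional we actually need
print(p_a_given_b)  # 9/50
```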
Independence
Two events A and B are independent if and only if the joint probability of A and B is equal to the simple probability of A multiplied by the simple probability of B: P(A, B) = P(A) P(B). This is equivalent to saying that the probability of A by itself is equal to the conditional probability of A given B, or vice versa that the simple probability of B is equal to the probability of B given A. One way to think of this is: if two events are independent, knowing that one of them has occurred does not give us any new information about the other event, because the conditional probability is the same as the simple probability.
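A small check of the definition; the fair-die events are illustrative assumptions, not from the slides:

```python
# Independence check on a fair die: A = "even" and B = "at most 4"
# satisfy P(A, B) = P(A) * P(B), so knowing B tells us nothing new about A.
# Illustrative events (assumptions, not from the slides).
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}      # even
B = {1, 2, 3, 4}   # at most 4

def prob(event):
    return Fraction(len(event), len(sample_space))

independent = prob(A & B) == prob(A) * prob(B)   # definition of independence
no_new_info = prob(A & B) / prob(B) == prob(A)   # equivalently P(A | B) = P(A)

print(independent, no_new_info)  # True True
```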
Independence
Quiz 2 (only one answer is correct)
Quiz 2: Solutions (Joakim’s original)
1. The probability is 0.1 — incorrect [We cannot compute P(A | B) from P(B | A) without additional information.]
2. The probability is 0.9 — incorrect [We cannot compute P(A | B) from P(B | A) without additional information.]
3. Nothing — correct [We cannot compute P(A | B) from P(B | A) without additional information.]
Quiz 2: Solutions
1. The probability is 0.1 — incorrect [We cannot compute P(Dis | Sym) from P(Sym | Dis) without additional information.]
2. The probability is 0.9 — incorrect [We cannot compute P(Dis | Sym) from P(Sym | Dis) without additional information.]
3. Nothing — correct [We cannot compute P(Dis | Sym) from P(Sym | Dis) without additional information.]
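Why no answer can be computed: two scenarios can share the same P(Sym | Dis) yet yield different P(Dis | Sym). The priors and false-positive rates below are illustrative assumptions:

```python
# Two joint distributions with the same P(Sym | Dis) = 0.9 give different
# P(Dis | Sym): the inversion needs P(Dis) and P(Sym) as well.
# All numeric values are illustrative assumptions.
from fractions import Fraction

def p_dis_given_sym(p_dis, p_sym_given_dis, p_sym_given_not_dis):
    """Bayes Law, with the marginal P(Sym) obtained by marginalization."""
    p_sym = p_sym_given_dis * p_dis + p_sym_given_not_dis * (1 - p_dis)
    return p_sym_given_dis * p_dis / p_sym

p_sym_given_dis = Fraction(9, 10)  # fixed in both scenarios

# Same P(Sym | Dis), different priors P(Dis):
a = p_dis_given_sym(Fraction(1, 100), p_sym_given_dis, Fraction(1, 10))
b = p_dis_given_sym(Fraction(1, 2), p_sym_given_dis, Fraction(1, 10))

print(a, b)    # 1/12 9/10
print(a != b)  # True: P(Dis | Sym) is not determined by P(Sym | Dis) alone
```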