Page 1: Lecture: Joint, Conditional and Marginal Probabilities

Joint, Conditional and Marginal Probabilities

Last Updated: 24 March 2015

Slideshare: http://www.slideshare.net/marinasantini1/mathematics-for-language-technology

Mathematics for Language Technology

http://stp.lingfil.uu.se/~matsd/uv/uv15/mfst/

Marina Santini [email protected]

Department of Linguistics and Philology, Uppsala University, Uppsala, Sweden

Spring 2015

Page 2: Lecture: Joint, Conditional and Marginal Probabilities

Acknowledgements
• Several slides borrowed from Prof. Joakim Nivre.
• Practical Activities by Prof. Joakim Nivre.

• Required Reading:
– E&G (2013): Ch. 5 (pp. 110-114)
– Compendium (4): 9.2, 9.3, 9.4
– E&G (2013): Ch. 5.2-5.3 (self-study)

• Recommended Reading:
– Sections 3-6 in Goldsmith J. (2007) Probability for Linguists. The University of Chicago, Department of Linguistics:
http://hum.uchicago.edu/~jagoldsm/Papers/probability.pdf

Page 3: Lecture: Joint, Conditional and Marginal Probabilities

Outline

• Joint Probability
• Conditional Probability
• Multiplication Rule
• Marginal Probability
• Bayes' Law
• Independence

Page 4: Lecture: Joint, Conditional and Marginal Probabilities

Linguistic Note:

• Traditionally, the plural is dice, but the singular is die (i.e. 1 die, 2 dice).

• For modern lexicography, see, e.g., Macmillan:
http://www.macmillandictionary.com/dictionary/british/dice_1

Page 5: Lecture: Joint, Conditional and Marginal Probabilities

Joint vs Conditional

In many situations where we want to make use of probabilities, there are dependencies between different variables or events. For this reason we need the notion of conditional probability, i.e. the probability of an event given some other event.

The conditional probability of A given B is defined as the probability of the intersection of A and B divided by the probability of B.

The probability of the intersection is referred to as the joint probability, because it is the probability that both A and B occur.
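In symbols, matching the two definitions above: P(A|B) = P(A ∩ B) / P(B), where the joint probability is written P(A,B) = P(A ∩ B).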

CONDITIONAL = NOT SYMMETRICAL (in general, P(A|B) ≠ P(B|A))

Page 6: Lecture: Joint, Conditional and Marginal Probabilities

Conditional

When we talk about the joint probability of A and B, we are considering the intersection of A and B, i.e. those outcomes that are in both A and B. And we ask: how large is that set of outcomes compared to the entire sample space?

Page 7: Lecture: Joint, Conditional and Marginal Probabilities

Example: Bigrams

10^-3 = 1/10^3 = 1/1000 = one in a thousand; one in one million = 10^-6

joint probability = one in 10 million (10^-7)

We apply the formula of conditional probability.
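To make the arithmetic concrete (the slide's exact quantities are in the figure; this assumes the joint probability is one in 10 million, i.e. 10^-7, and the probability of the conditioning word is 10^-3): P(second word | first word) = P(first, second) / P(first) = 10^-7 / 10^-3 = 10^-4, i.e. one in ten thousand.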

Page 8: Lecture: Joint, Conditional and Marginal Probabilities

From the definition of conditional probability we can derive the

Multiplication Rule

One way to compute the probability of A and B (i.e. the joint probability) is to take the probability of B by itself and multiply it by the probability of A given B.

Another way to compute the joint probability of A and B is to start with the simple probability of A and multiply that by the probability of B given A.
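In symbols: rearranging the definition P(A|B) = P(A,B)/P(B) gives P(A,B) = P(B)P(A|B); likewise, rearranging P(B|A) = P(A,B)/P(A) gives P(A,B) = P(A)P(B|A).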

Page 9: Lecture: Joint, Conditional and Marginal Probabilities

Quiz 1: only one answer is correct


Probability is the measure of the likelihood that an event will occur. The higher the probability of an event, the more certain we are that the event will occur.

Page 10: Lecture: Joint, Conditional and Marginal Probabilities

Quiz 1: Solution
1. Smaller than 1 in a million — correct [P(A,B) = P(B) × P(A|B) = 0.0001 (1 in 10,000) × 0.000001 (1 in a million) = 0.0000000001 < 0.000001; P is 1 in 10 billion.]
2. Greater than 1 in a million — incorrect [same computation; 1 in 10 billion is smaller than 1 in a million.]
3. Impossible to tell — incorrect [Given P(A|B) and P(B), we can derive P(A,B) exactly.]

Page 11: Lecture: Joint, Conditional and Marginal Probabilities

Quiz 1: only one answer is correct

We apply the following multiplication rule: P(A,B) = P(B)P(A|B), since we know these elements: P(B) (i.e. 1/10,000 = 0.0001) and P(A|B) (i.e. 1/1,000,000 = 0.000001).

P(A,B) = P(B)P(A|B) = 0.0001 × 0.000001 = 0.0000000001 (= 1 in 10,000,000,000 = 1 in 10 billion)

Result: the intersection of A and B (i.e. people having BOTH a PhD in physics AND winning a Nobel prize) has probability 1 in 10 billion.

1: Is a probability of 1 in 10 billion smaller than 1 in 1 million? YES! 0.0000000001 is smaller than 0.000001, so answer 1 is correct.
2: Is a probability of 1 in 10 billion greater than 1 in 1 million? NO! 0.0000000001 is NOT greater than 0.000001, so answer 2 is incorrect.
3: Impossible to tell: INCORRECT! It is possible to compute the probability, because we have all the elements needed to apply the multiplication rule.
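A quick numeric check of the same computation (a minimal Python sketch; the event names follow the solution above):

    p_b = 0.0001            # P(B): PhD in physics, 1 in 10,000
    p_a_given_b = 0.000001  # P(A|B): Nobel prize given a physics PhD, 1 in 1,000,000
    p_ab = p_b * p_a_given_b  # multiplication rule: P(A,B) = P(B) * P(A|B)
    print(p_ab)               # approximately 1e-10, i.e. 1 in 10 billion
    print(p_ab < 0.000001)    # True: smaller than 1 in a million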

Page 12: Lecture: Joint, Conditional and Marginal Probabilities

Multiplication Rule

Variant 1: P(A,B) = P(B)P(A|B)

Variant 2: P(A,B) = P(A)P(B|A)

Page 13: Lecture: Joint, Conditional and Marginal Probabilities

Marginalization

Page 14: Lecture: Joint, Conditional and Marginal Probabilities

Introduction to the concept of Marginalization

Partition means: the events are disjoint, i.e. they do not have members in common. In other words: their intersection is empty and their union is the entire sample space. This is a way to divide the sample space into non-overlapping events.

Pairwise comparison generally refers to any process of comparing entities in pairs…

Given that we have a partition, and given that we are interested in another event A in the same sample space, we can compute the probability of A by summing the joint probabilities of A with each member of the partition (this is the summation formula in the middle of the slide).
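In symbols (the summation formula referred to above): if B1, …, Bn is a partition of the sample space, then P(A) = Σi P(A,Bi) = Σi P(A|Bi)P(Bi).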

Page 15: Lecture: Joint, Conditional and Marginal Probabilities

… continued …

All this may seem a very strange method, because we are computing something very simple, i.e. the probability of A, from something more complex involving summation, joint probabilities and conditional probabilities. But it is very useful in situations where we do not know the probability of A directly, but we do know the joint or the conditional probabilities of A with the members of a partition.

Knowing the multiplication rule, we also know that the joint probability of A and Bi can be expressed as the conditional probability of A given Bi times the simple probability of Bi.

Marginal probability

Multiplication rule
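A minimal numeric sketch of marginalization in Python (the numbers are invented for illustration, not taken from the slide):

    # B1, B2, B3 partition the sample space; we know P(Bi) and P(A|Bi).
    p_b = [0.5, 0.3, 0.2]           # P(B1), P(B2), P(B3): they sum to 1
    p_a_given_b = [0.1, 0.4, 0.05]  # P(A|B1), P(A|B2), P(A|B3)

    # Law of Total Probability: P(A) = sum_i P(A|Bi) * P(Bi)
    p_a = sum(pa * pb for pa, pb in zip(p_a_given_b, p_b))
    print(p_a)  # approximately 0.18 (= 0.05 + 0.12 + 0.01)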

Page 16: Lecture: Joint, Conditional and Marginal Probabilities

Joint, Marginal & Conditional Probabilities

What is important is to understand the relation between the joint, the marginal and the conditional probabilities, and the way we can derive them from each other. In particular, given that we know the joint probabilities of the events we are interested in, we can always derive the marginal and conditional probabilities from them, whereas the opposite does not hold (except under some special conditions).

The joint probabilities in the table sum up to 1.

What if we want the simple probabilities?

Once we have the joint probabilities and the simple probabilities, we can combine these to get the conditional probabilities.
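A small worked example in Python (the joint probabilities here are invented for illustration; the slide's table is a figure):

    # Hypothetical joint probabilities for two binary events A and B.
    joint = {
        ("A", "B"): 0.2,
        ("A", "notB"): 0.3,
        ("notA", "B"): 0.1,
        ("notA", "notB"): 0.4,
    }  # the four joint probabilities sum to 1

    # Marginal probabilities: sum the joint probabilities over the other event.
    p_a = joint[("A", "B")] + joint[("A", "notB")]  # 0.5
    p_b = joint[("A", "B")] + joint[("notA", "B")]  # approximately 0.3

    # Conditional probabilities: joint divided by marginal.
    p_a_given_b = joint[("A", "B")] / p_b  # approximately 0.667
    p_b_given_a = joint[("A", "B")] / p_a  # 0.4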

Page 17: Lecture: Joint, Conditional and Marginal Probabilities

Joint, Marginal & Conditional Probabilities

Page 18: Lecture: Joint, Conditional and Marginal Probabilities

Bayes' Law

Given events A and B in the sample space Ω, the conditional probability of A given B is equal to the simple probability of A times the inverse conditional probability, i.e. the probability of B given A, divided by the simple probability of B.
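In symbols: P(A|B) = P(A)P(B|A) / P(B).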

Thanks to the multiplication/chain rule, we know that the joint probability can be replaced by the simple probability multiplied by the conditional probability. Bayes' Law is a powerful tool that allows us to invert a conditional probability. When we find ourselves in a situation where we need to know the probability of A given B, but our data only gives us the probability of B given A, we can invert the expression and get the probabilities that we need (a little bit more on this next time).

Page 19: Lecture: Joint, Conditional and Marginal Probabilities

Independence  


Two events A and B are independent if and only if the joint probability of A and B is equal to the simple probability of A multiplied by the simple probability of B. This is equivalent to saying that the probability of A by itself is equal to the conditional probability of A given B, or, vice versa, that the simple probability of B is equal to the probability of B given A. One way to think of this: if two events are independent, knowing that one of them has occurred does not give us any new information about the other event, because the conditional probability is the same as the simple probability.
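In symbols: A and B are independent iff P(A,B) = P(A)P(B); equivalently (when P(A) and P(B) are positive), P(A|B) = P(A) and P(B|A) = P(B).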

Page 20: Lecture: Joint, Conditional and Marginal Probabilities

Independence  


Page 21: Lecture: Joint, Conditional and Marginal Probabilities

Quiz 2 (only one answer is correct)


Page 22: Lecture: Joint, Conditional and Marginal Probabilities

Quiz 2: Solutions (Joakim's original)

1. The probability is 0.1 — incorrect [We cannot compute P(A|B) from P(B|A) without additional information.]

2. The probability is 0.9 — incorrect [We cannot compute P(A|B) from P(B|A) without additional information.]

3. Nothing — correct [We cannot compute P(A|B) from P(B|A) without additional information.]

Page 23: Lecture: Joint, Conditional and Marginal Probabilities

Quiz 2: Solutions

1. The probability is 0.1 — incorrect [We cannot compute P(Dis|Sym) from P(Sym|Dis) without additional information.]

2. The probability is 0.9 — incorrect [We cannot compute P(Dis|Sym) from P(Sym|Dis) without additional information.]

3. Nothing — correct [We cannot compute P(Dis|Sym) from P(Sym|Dis) without additional information.]

Page 24: Lecture: Joint, Conditional and Marginal Probabilities

Break down

• P(Sym|Dis) = 0.9 → P(B|A) = 0.9

• P(Dis|Sym) = ? → P(A|B) = ?

• Bayes' Law: P(A|B) = P(A) P(B|A) / P(B)
• P(A) = ?
• P(B) = ?

We need additional info, i.e. P(A) and P(B).

Can we use marginalization / the Law of Total Probability to derive P(A) and P(B)?

Total number of individual outcomes
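One way to see why "Nothing" is the right answer: by the Law of Total Probability, P(Sym) = P(Sym|Dis)P(Dis) + P(Sym|not Dis)P(not Dis), so deriving P(B) = P(Sym) by marginalization would still require P(Dis) and P(Sym|not Dis), neither of which is given.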

Page 25: Lecture: Joint, Conditional and Marginal Probabilities

Practical Activity 2: Part-of-Speech Bigrams - Independence

See calculations overleaf.
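As a rough illustration of the independence check in this activity (a minimal Python sketch; the tag sequence is hypothetical, since the activity's actual counts are on the slide):

    from collections import Counter

    # Hypothetical part-of-speech tag sequence.
    tags = ["DT", "NN", "VB", "DT", "NN", "IN", "DT", "NN"]

    unigrams = Counter(tags)
    bigrams = Counter(zip(tags, tags[1:]))

    p_dt = unigrams["DT"] / len(tags)                  # P(DT) = 3/8
    p_nn = unigrams["NN"] / len(tags)                  # P(NN) = 3/8
    p_dt_nn = bigrams[("DT", "NN")] / (len(tags) - 1)  # P(DT,NN) = 3/7

    # If adjacent tags were independent, P(DT,NN) would be close to P(DT) * P(NN).
    print(p_dt_nn, p_dt * p_nn)  # approximately 0.4286 vs 0.1406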

Page 26: Lecture: Joint, Conditional and Marginal Probabilities

Practical Activity 1: Solution

Page 27: Lecture: Joint, Conditional and Marginal Probabilities

The end