1 Probability theory LING 570 Fei Xia Week 2: 10/01/07

Jan 05, 2016

Blake Robertson
1

Probability theory

LING 570

Fei Xia

Week 2: 10/01/07

2

Misc.

• Patas account and dropbox

• Course website, “Collect it”, and GoPost.

• Mailing list
  – Received message on Thursday?

• Questions about hw1?

3

Outline

• Quiz #1

• Unix commands

• Linguistics

• Elementary Probability theory: M&S 2.1

4

Quiz #1

Five areas: weight (class average)

• Programming: 4.0 (3.74)
  – Try Perl or Python

• Unix commands: 1.2 (0.99)

• Probability: 2.0 (1.09)

• Regular expression: 2.0 (1.62)

• Linguistics knowledge: 0.8 (0.71)

5

Results (score range: number of students)

• 9.0-10: 4

• 8.0-8.9: 8

• < 8.0: 8

6

Unix commands

• ls (list), cp (copy), rm (remove)

• more, less, cat

• cd, mkdir, rmdir, pwd

• chmod: to change file permissions

• tar, gzip: to archive/compress files

• ssh, sftp: to log in remotely or transfer files

• man: to read a command's manual page

7

Unix commands (cont)

• Compilers/interpreters: javac, gcc, g++, perl, …

• ps, top

• which

• Pipe: cat input_file | eng_tokenizer.sh | make_voc.sh > output_file

• sort, uniq, awk, grep
  grep "the" voc | awk '{print $2}' | sort | uniq -c | sort -nr
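For anyone following the quiz advice to try Python, the last pipeline can be approximated in a few lines. This is a rough sketch, not a drop-in replacement: it assumes `voc` is a plain-text file with whitespace-separated columns, treats "the" as a plain substring (grep treats it as a regular expression, which is equivalent for this pattern), and may order ties differently from `sort -nr`. The helper name `count_second_field` is hypothetical.

```python
from collections import Counter


def count_second_field(lines, pattern="the"):
    """Hypothetical helper approximating:
    grep "the" voc | awk '{print $2}' | sort | uniq -c | sort -nr
    Returns (value, count) pairs, highest count first.
    """
    counts = Counter()
    for line in lines:
        if pattern in line:                     # grep: keep lines containing the pattern
            fields = line.split()               # awk: split on runs of whitespace
            second = fields[1] if len(fields) > 1 else ""  # awk prints "" when $2 is missing
            counts[second] += 1                 # sort | uniq -c: count each distinct value
    return counts.most_common()                 # sort -nr: descending by count
```

With a real file: `with open("voc") as f: print(count_second_field(f))`. A file object yields lines, and `split()` discards the trailing newline.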

8

Examples

• Set the permissions of foo.pl so that it is readable and executable by the user and the group:
  rwx rwx rwx => r-x r-x --- => 101 101 000 => chmod 550 foo.pl

• Move a file, foo.pl, from your home directory to /tmp:
  mv ~/foo.pl /tmp
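The rwx-to-octal step in the first example reads each group of three permission bits as a binary number. Here is a small sketch with a hypothetical helper name, `mode_from_rwx`. It assumes the 9-character permission form shown by `ls -l`, without the leading file-type character.

```python
def mode_from_rwx(perm: str) -> str:
    """Convert a permission string such as 'r-xr-x---' to chmod's octal digits."""
    if len(perm) != 9:
        raise ValueError("expected 9 characters, e.g. 'r-xr-x---'")
    digits = []
    for i in range(0, 9, 3):                  # user, group, others
        triple = perm[i:i + 3]
        bits = "".join("0" if c == "-" else "1" for c in triple)  # 'r-x' -> '101'
        digits.append(str(int(bits, 2)))      # binary '101' -> 5
    return "".join(digits)
```

Python's `os.chmod(path, mode)` takes the numeric mode, so `os.chmod("foo.pl", 0o550)` should have the same effect as `chmod 550 foo.pl`.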

9

Linguistics: POS tags

• Open class: Noun, verb, adjective, adverb
  – Auxiliary verb/modal: can, will, might, …
  – Temporal noun: tomorrow
  – Adverb: adj+ly, always, still, not, …

• Closed class: Preposition, conjunction, determiner, pronoun
  – Conjunction: CC (coordinating: and), SC (subordinating: if, although)
  – Complementizer: that

10

Linguistics: syntactic structure

• Two kinds:
  – Phrase structure (a.k.a. parse tree)
  – Dependency structure

• Example:
  – John said that he would call Mary tomorrow

11

Outline

• Quiz #1

• Unix commands

• Linguistics

• Elementary Probability theory

12

Probability Theory

13

Basic concepts

• Sample space, event, event space

• Random variable and random vector

• Conditional probability, joint probability, marginal probability (prior)

14

Sample space, event, event space

• Sample space (Ω): the set of all possible outcomes.
  – Ex: toss a coin three times: {HHH, HHT, HTH, HTT, …}

• Event: an event is a subset of Ω.
  – Ex: an event is {HHT, HTH, THH}

• Event space (2^Ω): the set of all possible events, i.e., all subsets of Ω.
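The three definitions can be made concrete for three coin tosses. In this sketch, `omega`, `two_heads`, and `power_set` are illustrative names. The event space is built by brute force, which is only practical because Ω has just 8 outcomes (so 2^8 = 256 events).

```python
from itertools import combinations, product

# Sample space Ω for three tosses: all 2**3 = 8 sequences of H/T
omega = ["".join(t) for t in product("HT", repeat=3)]

# An event is a subset of Ω, e.g. "exactly two heads"
two_heads = {o for o in omega if o.count("H") == 2}


def power_set(items):
    """All subsets of items: the event space 2^Ω."""
    items = list(items)
    return [set(c) for r in range(len(items) + 1) for c in combinations(items, r)]


event_space = power_set(omega)
```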

15

Probability function

• A probability function (a.k.a. a probability distribution) distributes a probability mass of 1 throughout the sample space Ω.

• It is a function $P: 2^{\Omega} \to [0,1]$ such that:

$P(\Omega) = 1$

For any pairwise disjoint sets $A_j \in 2^{\Omega}$: $P\left(\bigcup_j A_j\right) = \sum_j P(A_j)$

- Ex: P({HHT, HTH, HTT})

= P({HHT}) + P({HTH}) + P({HTT})
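Both conditions can be checked mechanically on the three-toss sample space. This sketch assumes independent tosses and a fair coin (P(H) = 1/2), which the slide does not specify. Because P is defined on an event as a sum over its single outcomes, additivity holds by construction here. The checks confirm that the masses add up to 1 and reproduce the example. Exact fractions avoid floating-point rounding.

```python
from fractions import Fraction
from itertools import product

P_HEAD = Fraction(1, 2)  # assumed fair coin; not specified on the slide


def outcome_prob(outcome: str) -> Fraction:
    """Probability of one toss sequence, assuming independent tosses."""
    prob = Fraction(1)
    for toss in outcome:
        prob *= P_HEAD if toss == "H" else 1 - P_HEAD
    return prob


def P(event) -> Fraction:
    """Probability of an event (a set of outcomes): the sum over its disjoint singletons."""
    return sum((outcome_prob(o) for o in event), Fraction(0))


omega = {"".join(t) for t in product("HT", repeat=3)}
event = {"HHT", "HTH", "HTT"}
```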

16

The coin example

• The prob of getting a head is 0.1 for one toss. What is the prob of getting two heads out of three tosses?

• P(“Getting two heads”)

= P({HHT, HTH, THH})

= P(HHT) + P(HTH) + P(THH)

= 0.1*0.1*0.9 + 0.1*0.9*0.1+0.9*0.1*0.1

= 3*0.1*0.1*0.9 = 0.027
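This sketch repeats the calculation by enumerating the event and summing the outcome probabilities, assuming independent tosses. The closed form at the end writes the factor 3 as (3 choose 2), the number of ways to place two heads among three tosses. That is the standard binomial generalization, not something stated on the slide.

```python
from fractions import Fraction
from itertools import product
from math import comb

p = Fraction(1, 10)  # P(head) for one toss, as in the example


def prob(outcome: str) -> Fraction:
    """Probability of one toss sequence, assuming the tosses are independent."""
    result = Fraction(1)
    for toss in outcome:
        result *= p if toss == "H" else 1 - p
    return result


# The event "two heads": every three-toss sequence with exactly two H's
two_heads = ["".join(t) for t in product("HT", repeat=3) if "".join(t).count("H") == 2]
p_two_heads = sum(prob(o) for o in two_heads)

# Same number in closed form: (3 choose 2) * p^2 * (1 - p)
closed_form = comb(3, 2) * p**2 * (1 - p)
```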

17

Random variable

• The outcome of an experiment need not be a number.

• We often want to represent outcomes as numbers.

• A random variable X is a function X: Ω → ℝ.
  – Ex: the number of heads in three tosses:
    X(HHT)=2, X(HTH)=2, X(HTT)=1, …
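A random variable is an ordinary function on outcomes. This sketch writes X out both as code and as a lookup table, and also collects the outcomes it maps to 2. That set is the event {ω : X(ω) = 2} used on the next slide. The variable names are illustrative.

```python
from itertools import product


def X(outcome: str) -> int:
    """Random variable X: maps an outcome (a toss sequence) to its number of heads."""
    return outcome.count("H")


omega = ["".join(t) for t in product("HT", repeat=3)]
values = {o: X(o) for o in omega}                  # the function X written out as a table
preimage_of_2 = [o for o in omega if X(o) == 2]    # the event {ω : X(ω) = 2}
```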

18

The coin example (cont)

• X = the number of heads in three tosses

• P(X=2)

= P({HHT, HTH, THH})

= P({HHT}) + P({HTH}) + P({THH})
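The same grouping works for every value of X at once. Summing outcome probabilities per value of X gives the full distribution P(X = x). This sketch carries over P(head) = 0.1 from the earlier coin example and assumes independent tosses.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

p = Fraction(1, 10)  # P(head) for one toss, carried over from the coin example


def outcome_prob(outcome: str) -> Fraction:
    """Probability of one toss sequence, assuming independent tosses."""
    result = Fraction(1)
    for toss in outcome:
        result *= p if toss == "H" else 1 - p
    return result


# P(X = x) = sum of P({ω}) over all outcomes ω with X(ω) = x
pmf = defaultdict(Fraction)
for outcome in ("".join(t) for t in product("HT", repeat=3)):
    pmf[outcome.count("H")] += outcome_prob(outcome)
```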

19

Two types of random variables

• Discrete: X takes on only a countable number of possible values.
  – Ex: toss a coin three times; X is the number of heads observed.

• Continuous: X takes on an uncountable number of possible values.
  – Ex: X is the speed of a car.

20

Common trick #1: Maximum likelihood estimation

• An example: toss a coin three times and get two heads. What is the probability of getting a head with one toss?

• Maximum likelihood (ML):

$$\theta^* = \arg\max_{\theta} P(\text{data} \mid \theta)$$

• In the example, θ = p, the probability of a head:
  – P(X=2) = 3 * p * p * (1-p)
  – e.g., the prob is 3/8 when p=1/2, and is 12/27 when p=2/3; 3/8 < 12/27
  – The ML estimate is p* = 2/3, i.e., 2 heads / 3 tosses
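A brute-force sketch of the argmax scores a grid of candidate values of p with the likelihood from the slide. The grid spacing of 1/60 is an arbitrary choice that happens to contain 2/3. Setting the derivative of 3p²(1 - p) to zero gives the same answer analytically: 6p - 9p² = 0, so p = 2/3.

```python
from fractions import Fraction


def likelihood(p: Fraction) -> Fraction:
    """P(X=2 | p): probability of exactly two heads in three independent tosses."""
    return 3 * p * p * (1 - p)


# Assumed search grid p = k/60 for k = 0..60 (2/3 = 40/60 lies on it)
candidates = [Fraction(k, 60) for k in range(61)]
p_star = max(candidates, key=likelihood)
```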

21

Random vector

• A random vector is a finite-dimensional vector of random variables: X = [X1, …, Xn].

• P(x) = P(x1, x2, …, xn) = P(X1=x1, …, Xn=xn)

• Ex: P(w1, …, wn, t1, …, tn), the joint probability of words w1 … wn and their tags t1 … tn

22

Notation

• X, Y, Xi, Yi are random variables.

• x, y, xi are values.

• P(X=x) is written as P(x)

• P(X=x | Y=y) is written as P(x | y).

23

Three types of probability

• Joint prob: P(x,y)= prob of X=x and Y=y happening together

• Conditional prob: P(x | y) = prob of X=x given a specific value of Y=y

• Marginal prob: P(x) = prob of X=x regardless of the value of Y, i.e., summed over all possible values of Y.

24

An example

• There are two coins. Choose a coin and then toss it. Do that 10 times.

• Coin 1 is chosen four times: one head and three tails.

• Coin 2 is chosen six times: four heads and two tails.

• Let’s calculate the probabilities.

25

Probabilities

• P(C=1) = 4/10, P(C=2) = 6/10

• P(X=h) = 5/10, P(X=t) = 5/10

• P(X=h | C=1) = ¼, P(X=h | C=2) = 4/6

• P(X=t | C=1) = ¾, P(X=t | C=2) = 2/6

• P(X=h, C=1) = 1/10, P(X=h, C=2) = 4/10

• P(X=t, C=1) = 3/10, P(X=t, C=2) = 2/10
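All of these probabilities are relative frequencies: counts divided by the relevant total. This sketch recomputes them from the ten trials, encoded as (coin, outcome) pairs. The slide gives only the counts, so the order of the trials in the list is arbitrary.

```python
from collections import Counter
from fractions import Fraction

# The 10 trials as (coin, outcome) pairs; order is arbitrary
trials = [(1, "h")] + [(1, "t")] * 3 + [(2, "h")] * 4 + [(2, "t")] * 2

n = len(trials)
coin_counts = Counter(c for c, _ in trials)
outcome_counts = Counter(x for _, x in trials)
pair_counts = Counter(trials)

p_c = {c: Fraction(k, n) for c, k in coin_counts.items()}                 # P(C=c)
p_x = {x: Fraction(k, n) for x, k in outcome_counts.items()}              # P(X=x)
p_joint = {(x, c): Fraction(k, n) for (c, x), k in pair_counts.items()}   # P(X=x, C=c)
p_cond = {(x, c): Fraction(k, coin_counts[c])                             # P(X=x | C=c)
          for (c, x), k in pair_counts.items()}
```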

26

Relation between different types of probabilities

P(X=h, C=1)

= P(C=1) * P(X=h | C=1)

= 4/10 * ¼ = 1/10

P(X=h)

= P(X=h, C=1) + P(X=h, C=2)

= 1/10 + 4/10 = 5/10

27

Common trick #2: Chain rule

$$P(A,B) = P(A)\,P(B \mid A) = P(B)\,P(A \mid B)$$

$$P(A_1,\ldots,A_n) = \prod_{i=1}^{n} P(A_i \mid A_1,\ldots,A_{i-1})$$
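The two-variable form can be checked against the two-coin example. This sketch takes P(C), P(X), P(X | C), and the joint values from the probabilities listed for that example. It reads P(C | X) off the original counts; for instance, one of the five heads came from coin 1, so P(C=1 | X=h) = 1/5. If both factorizations reproduce every joint value, the asserts pass silently.

```python
from fractions import Fraction as F

# Values from the two-coin example (coin 1: 1 h, 3 t; coin 2: 4 h, 2 t)
p_c = {1: F(4, 10), 2: F(6, 10)}
p_x = {"h": F(5, 10), "t": F(5, 10)}
p_x_given_c = {("h", 1): F(1, 4), ("t", 1): F(3, 4), ("h", 2): F(4, 6), ("t", 2): F(2, 6)}
# P(C=c | X=x): share of the x-outcomes that came from coin c
p_c_given_x = {(1, "h"): F(1, 5), (2, "h"): F(4, 5), (1, "t"): F(3, 5), (2, "t"): F(2, 5)}
# Joint probabilities P(X=x, C=c)
p_joint = {("h", 1): F(1, 10), ("h", 2): F(4, 10), ("t", 1): F(3, 10), ("t", 2): F(2, 10)}

# Chain rule in both directions: P(X, C) = P(C) P(X | C) = P(X) P(C | X)
for (x, c), joint in p_joint.items():
    assert joint == p_c[c] * p_x_given_c[(x, c)]
    assert joint == p_x[x] * p_c_given_x[(c, x)]
```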

28

Common trick #3: joint prob → marginal prob

$$P(A) = \sum_{B} P(A,B)$$

$$P(A_1) = \sum_{A_2,\ldots,A_n} P(A_1,A_2,\ldots,A_n)$$
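Marginalization amounts to summing the joint table along one axis. This sketch applies it to the two-coin example's joint probabilities and recovers both P(X) and P(C). A `defaultdict(Fraction)` starts every entry at 0.

```python
from collections import defaultdict
from fractions import Fraction as F

# Joint distribution P(X=x, C=c) from the two-coin example
p_joint = {("h", 1): F(1, 10), ("h", 2): F(4, 10), ("t", 1): F(3, 10), ("t", 2): F(2, 10)}

# P(X=x) = sum over c of P(X=x, C=c);  P(C=c) = sum over x of P(X=x, C=c)
p_x = defaultdict(F)
p_c = defaultdict(F)
for (x, c), p in p_joint.items():
    p_x[x] += p
    p_c[c] += p
```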

29

Common trick #4: Bayes’ rule

$$P(B \mid A) = \frac{P(A,B)}{P(A)} = \frac{P(A \mid B)\,P(B)}{P(A)}$$

$$y^* = \arg\max_{y} P(y \mid x) = \arg\max_{y} \frac{P(x \mid y)\,P(y)}{P(x)} = \arg\max_{y} P(x \mid y)\,P(y)$$
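Applied to the two-coin example, Bayes' rule answers the question "which coin was most likely tossed, given the outcome?". In this sketch the function names are hypothetical. `posterior` computes P(C | X=x) in full, with P(x) obtained by marginalization. `best_coin` uses the last form of the argmax and drops P(x), since it is the same for every candidate coin.

```python
from fractions import Fraction as F

prior = {1: F(4, 10), 2: F(6, 10)}                      # P(C=c)
likelihood = {(1, "h"): F(1, 4), (1, "t"): F(3, 4),     # P(X=x | C=c)
              (2, "h"): F(4, 6), (2, "t"): F(2, 6)}


def posterior(x):
    """P(C=c | X=x) for each coin c, via Bayes' rule."""
    evidence = sum(likelihood[(c, x)] * prior[c] for c in prior)  # P(X=x)
    return {c: likelihood[(c, x)] * prior[c] / evidence for c in prior}


def best_coin(x):
    """argmax_c P(x | c) P(c); dividing by P(x) would not change the winner."""
    return max(prior, key=lambda c: likelihood[(c, x)] * prior[c])
```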

30

Independent random variables

• Two random variables X and Y are independent iff the value of X has no influence on the value of Y and vice versa.

• P(X,Y) = P(X) P(Y)

• P(Y|X) = P(Y)

• P(X|Y) = P(X)

• Our previous example: X and C are not independent, e.g., P(X=h, C=1) = 1/10, but P(X=h) P(C=1) = 5/10 * 4/10 = 2/10.

31

Conditional independence

Once we know C, the value of A does not affect the value of B and vice versa.

• P(A,B | C) = P(A|C) P(B|C)

• P(A|B,C) = P(A | C)

• P(B|A, C) = P(B |C)

32

Independence and conditional independence

• If A and B are independent, are they conditionally independent?

• Example:
  – Burglar, Earthquake
  – Alarm
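One way to see the answer is to build a tiny, purely hypothetical model; the numbers are illustrative and not from the slides. Burglar (B) and Earthquake (E) each occur independently with probability 1/2, and the Alarm (A) rings exactly when B or E occurs. In this model B and E are independent, but they are not conditionally independent given A.

```python
from fractions import Fraction
from itertools import product

# Hypothetical model for illustration only: B, E independent with prob 1/2 each,
# and the alarm rings (A=1) exactly when B or E happens.
HALF = Fraction(1, 2)
joint = {(b, e, int(b or e)): HALF * HALF for b, e in product((0, 1), repeat=2)}


def prob(pred) -> Fraction:
    """Sum the joint probability over all (b, e, a) assignments satisfying pred."""
    return sum((p for (b, e, a), p in joint.items() if pred(b, e, a)), Fraction(0))


# Marginal independence: compare P(B=1, E=1) with P(B=1) P(E=1)
p_b = prob(lambda b, e, a: b == 1)
p_e = prob(lambda b, e, a: e == 1)
p_be = prob(lambda b, e, a: b == 1 and e == 1)

# Conditioning on A=1: compare P(B=1, E=1 | A=1) with P(B=1 | A=1) P(E=1 | A=1)
p_a = prob(lambda b, e, a: a == 1)
p_be_a = prob(lambda b, e, a: b == 1 and e == 1 and a == 1) / p_a
p_b_a = prob(lambda b, e, a: b == 1 and a == 1) / p_a
p_e_a = prob(lambda b, e, a: e == 1 and a == 1) / p_a
```

In this model, P(B=1, E=1 | A=1) = 1/3 but P(B=1 | A=1) P(E=1 | A=1) = 4/9. Learning that an earthquake happened "explains away" the alarm: P(B=1 | A=1, E=1) = 1/2, which is lower than P(B=1 | A=1) = 2/3.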

33

Common trick #5: Independence assumption

$$P(A_1,\ldots,A_n) = \prod_{i=1}^{n} P(A_i \mid A_1,\ldots,A_{i-1}) \approx \prod_{i=1}^{n} P(A_i \mid A_{i-1})$$

34

An example

• P(w1 w2 … wn)

= P(w1) P(w2 | w1) P(w3 | w1 w2) * … * P(wn | w1, …, wn-1)

≈ P(w1) P(w2 | w1) … P(wn | wn-1)

• Why do we make independence assumptions that we know are not true?
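To make the approximation concrete, here is a bigram model estimated by maximum likelihood from a hypothetical three-sentence toy corpus. Only counts of adjacent word pairs are needed, not counts of whole histories. As an additional assumption, P(w1) is estimated as the fraction of sentences that start with w1; the slide does not say how P(w1) is estimated. The function names are illustrative.

```python
from collections import Counter
from fractions import Fraction

# Hypothetical toy corpus, for illustration only
corpus = [["the", "cat", "sat"], ["the", "dog", "sat"], ["a", "cat", "ran"]]

first = Counter(sent[0] for sent in corpus)                     # sentence-initial words
bigrams = Counter((s[i - 1], s[i]) for s in corpus for i in range(1, len(s)))
history = Counter()                                             # times each word precedes another
for (prev, _), count in bigrams.items():
    history[prev] += count


def p_first(w):
    """Assumed estimate of P(w1): share of sentences starting with w."""
    return Fraction(first[w], len(corpus))


def p_next(w, prev):
    """ML bigram estimate P(w | prev); raises ZeroDivisionError if prev never precedes a word."""
    return Fraction(bigrams[(prev, w)], history[prev])


def p_sentence(words):
    """P(w1 ... wn) ≈ P(w1) P(w2 | w1) ... P(wn | wn-1)."""
    prob = p_first(words[0])
    for prev, w in zip(words, words[1:]):
        prob *= p_next(w, prev)
    return prob
```

For example, P(the cat sat) ≈ 2/3 * 1/2 * 1/2 = 1/6. Any sentence containing an unseen bigram gets probability 0.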

35

Summary of elementary probability theory

• Basic concepts: sample space, event space, random variable, random vector

• Joint / conditional /marginal probability

• Independence and conditional independence

• Five common tricks:
  – Max likelihood estimation
  – Chain rule
  – Calculating marginal probability from joint probability
  – Bayes’ rule
  – Independence assumption

36

Outline

• Quiz #1

• Unix commands

• Linguistics

• Elementary Probability theory

37

Next time

• J&M Chapter 2
  – Formal language and formal grammar
  – Regular expression

• Hw1 is due at 3pm on Wed.