Math and Probability for ML Recap Jeongmin Lee Computer Science Department University of Pittsburgh CS 1675 Intro to Machine Learning – Recitation

CS 1675 Intro to Machine Learning –Recitation Math and ...people.cs.pitt.edu/~jlee/teaching/cs1675/matlab_tutorial...Math and Probability for ML Recap Jeongmin Lee Computer Science

Sep 27, 2020


Math and Probability for ML Recap

Jeongmin Lee, Computer Science Department

University of Pittsburgh

CS 1675 Intro to Machine Learning – Recitation


Acknowledgement

• Slide contents are based on these resources:

• Dr. Hauskrecht’s CS 1675 Lecture Note: http://people.cs.pitt.edu/~milos/courses/cs1675/Lectures/Class5.pdf

• Dr. Ainsworth’s Intro to Matrices Lecture Note: http://www.csun.edu/~ata20315/psy524/docs/Intro%20to%20Matrices.pptx

• White and Marino’s Probability Recitation Lecture Note: http://www.cs.cmu.edu/~ninamf/courses/401sp18/recitations/probability_recitation.pdf


Outline

• Part 1. Matrix Operations
• Part 2. Probabilities


Part 1. Matrix Operations

• Vector and Matrix
• Vector-Vector Multiplication
• Matrix-Matrix Multiplication
• Inner Product
• Outer Product


Vector

• Array of numbers
• Column vector (vertically arranged): X
• Row vector (horizontally arranged): Y

[Example on the slide: a 4 × 1 column vector $X_{4 \times 1}$ and a 1 × 3 row vector $Y_{1 \times 3}$]


Vector-Vector Multiplication

• The two vectors should be the same size
• The product of two vectors of the same length is either a scalar or a matrix, depending on how the vectors are multiplied
• Inner (dot) product: the result is a scalar

$$u = \begin{bmatrix} 2 \\ -3 \\ 7 \end{bmatrix}, \quad v = \begin{bmatrix} 9 \\ 8 \\ 1 \end{bmatrix}$$

$$u^T v = \begin{bmatrix} 2 & -3 & 7 \end{bmatrix} \begin{bmatrix} 9 \\ 8 \\ 1 \end{bmatrix} = 18 + (-24) + 7 = 1$$


Vector-Vector Multiplication

• The two vectors should be the same size
• Inner (dot) product: $u^T v$, a scalar
• Outer product: $u v^T$, a matrix

For $u = (2, -3, 7)^T$ and $v = (9, 8, 1)^T$:

$$u v^T = \begin{bmatrix} 2 \\ -3 \\ 7 \end{bmatrix} \begin{bmatrix} 9 & 8 & 1 \end{bmatrix} = \begin{bmatrix} 18 & 16 & 2 \\ -27 & -24 & -3 \\ 63 & 56 & 7 \end{bmatrix}$$
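The two products can be checked with a few lines of plain Python. This is only a minimal sketch: the helper names `inner` and `outer` are made up for illustration, and the vectors are taken to be u = (2, -3, 7) and v = (9, 8, 1).

```python
# Example vectors (assumed values, used only for illustration).
u = [2, -3, 7]
v = [9, 8, 1]


def inner(a, b):
    """Inner (dot) product: sum of elementwise products, a scalar."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(a, b))


def outer(a, b):
    """Outer product a b^T: entry (i, j) is a[i] * b[j], a matrix."""
    return [[x * y for y in b] for x in a]


dot_uv = inner(u, v)    # 2*9 + (-3)*8 + 7*1
outer_uv = outer(u, v)  # 3 x 3 matrix, one row per entry of u
```

The inner product collapses the two vectors to one number, while the outer product expands them to a matrix with `len(u)` rows and `len(v)` columns.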


Matrix Addition and Subtraction

• The matrices should also be the same size
• Simply add or subtract the corresponding components of each matrix


Matrix Addition

$$A_{2 \times 3} = \begin{bmatrix} 1 & 2 & 3 \\ 7 & 8 & 9 \end{bmatrix}, \quad B_{2 \times 3} = \begin{bmatrix} 5 & 6 & 7 \\ 3 & 4 & 5 \end{bmatrix}$$

$$A + B = \begin{bmatrix} 1+5 & 2+6 & 3+7 \\ 7+3 & 8+4 & 9+5 \end{bmatrix} = \begin{bmatrix} 6 & 8 & 10 \\ 10 & 12 & 14 \end{bmatrix}$$

$$A + B = B + A$$


Matrix Subtraction

$$A_{2 \times 3} = \begin{bmatrix} 1 & 2 & 3 \\ 7 & 8 & 9 \end{bmatrix}, \quad B_{2 \times 3} = \begin{bmatrix} 5 & 6 & 7 \\ 3 & 4 & 5 \end{bmatrix}$$

$$A - B = \begin{bmatrix} 1-5 & 2-6 & 3-7 \\ 7-3 & 8-4 & 9-5 \end{bmatrix} = \begin{bmatrix} -4 & -4 & -4 \\ 4 & 4 & 4 \end{bmatrix}$$
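Elementwise addition and subtraction can be sketched in plain Python as below. The helper name `elementwise` is hypothetical, and matrices are represented as lists of rows; the values are the 2 × 3 matrices A and B from the example.

```python
A = [[1, 2, 3], [7, 8, 9]]
B = [[5, 6, 7], [3, 4, 5]]


def elementwise(X, Y, op):
    """Apply op to corresponding entries of two same-shaped matrices."""
    if len(X) != len(Y) or any(len(r) != len(s) for r, s in zip(X, Y)):
        raise ValueError("matrices must have the same shape")
    return [[op(x, y) for x, y in zip(r, s)] for r, s in zip(X, Y)]


A_plus_B = elementwise(A, B, lambda x, y: x + y)
B_plus_A = elementwise(B, A, lambda x, y: x + y)  # addition commutes
A_minus_B = elementwise(A, B, lambda x, y: x - y)
```

The shape check mirrors the requirement that both matrices be the same size.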


Matrix Multiplication

• The number of columns in A must equal the number of rows in B
• Assuming A has dimensions i × j and B has dimensions j × k, the resulting matrix C will have dimensions i × k
• In other words, in order to multiply them the inner dimensions must match, and the result has the outer dimensions
• Each element of C can be computed by:

$$C_{ik} = \sum_j A_{ij} B_{jk}$$


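Matrix Multiplication

$$A_{2 \times 3} = \begin{bmatrix} 1 & 2 & 3 \\ 7 & 8 & 9 \end{bmatrix}, \quad B'_{3 \times 2} = \begin{bmatrix} 5 & 3 \\ 6 & 4 \\ 7 & 5 \end{bmatrix}$$

($B'$ denotes the transpose of $B$.)

$$A_{2 \times 3} B'_{3 \times 2} = C_{2 \times 2} = \begin{bmatrix} c_{11} & c_{12} \\ c_{21} & c_{22} \end{bmatrix}$$

$$c_{11} = \sum_j A_{1j} B'_{j1} = (1 \cdot 5) + (2 \cdot 6) + (3 \cdot 7) = 38$$

$$c_{12} = \sum_j A_{1j} B'_{j2} = (1 \cdot 3) + (2 \cdot 4) + (3 \cdot 5) = 26$$

$$c_{21} = \sum_j A_{2j} B'_{j1} = (7 \cdot 5) + (8 \cdot 6) + (9 \cdot 7) = 146$$

$$c_{22} = \sum_j A_{2j} B'_{j2} = (7 \cdot 3) + (8 \cdot 4) + (9 \cdot 5) = 98$$

$$A_{2 \times 3} B'_{3 \times 2} = C_{2 \times 2} = \begin{bmatrix} 38 & 26 \\ 146 & 98 \end{bmatrix}$$

Matching inner dimensions! The resulting matrix has the outer dimensions!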
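The rule $C_{ik} = \sum_j A_{ij} B_{jk}$ translates directly into a short triple loop. The sketch below uses plain Python lists of rows; `matmul` and `B_t` are names chosen here for illustration, with `B_t` standing for the 3 × 2 matrix $B'$. Working it through gives 35 + 48 + 63 = 146 for the (2, 1) entry.

```python
def matmul(A, B):
    """Multiply an i x j matrix A by a j x k matrix B, giving an i x k matrix."""
    i, j = len(A), len(A[0])
    if len(B) != j:
        raise ValueError("inner dimensions must match")
    k = len(B[0])
    # C[r][c] is the inner product of row r of A with column c of B.
    return [[sum(A[r][t] * B[t][c] for t in range(j)) for c in range(k)]
            for r in range(i)]


A = [[1, 2, 3], [7, 8, 9]]      # 2 x 3
B_t = [[5, 3], [6, 4], [7, 5]]  # 3 x 2
C = matmul(A, B_t)              # 2 x 2 (the outer dimensions)
```

Swapping the operands, `matmul(B_t, A)`, is also valid here but produces a 3 × 3 matrix, which shows that matrix multiplication is not commutative in general.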


Outline

• Part 1. Matrix Operations
• Part 2. Probabilities


Part 2. Probabilities

• Probability definition
• Conditional probability
• Bayes Rule
• Concept of Independence
• Random variable
• Distributions
• Discrete distribution
• Continuous distribution
• Joint distribution (multiple random variables)
• Mean and variance of a distribution


Set Basics

• A set is a collection of elements
• Intersection: $A \cap B = \{x : x \in A \text{ and } x \in B\}$
• Union: $A \cup B = \{x : x \in A \text{ or } x \in B\}$
• Complement: $A^C = \{x : x \notin A\}$
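Python's built-in `set` type supports these operations directly. In the sketch below the universe `omega` and the sets `A` and `B` are assumed example values; the complement is taken relative to `omega`, since a complement only makes sense with respect to some universal set.

```python
omega = {1, 2, 3, 4, 5, 6}  # assumed universal set
A = {1, 2, 3, 4}
B = {1, 3, 5}

intersection = A & B      # elements in A and in B
union = A | B             # elements in A or in B
complement_A = omega - A  # elements of omega not in A
```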


Probability Definitions

• Sample space Ω: set of possible outcomes
• Event space F: collection of subsets of Ω
• Probability measure P: assigns probabilities to events
• Probability space (Ω, F, P): the triple of sample space, event space, and probability measure


Probability Definitions

• For example, let’s consider the case of rolling a die:
• Sample space Ω = {1, 2, 3, 4, 5, 6}
• Event space F = {{1}, {2}, …, {1,2}, …, {1,2,3,4,5,6}, ∅}
• P({1}) = 1/6, P({2,4,6}) = 1/2, etc.


Probability Axioms

• $P(A^C) = 1 - P(A)$
• $P(A) \le 1$
• $P(\emptyset) = 0$
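For a fair die these properties can be verified by counting outcomes. The following is a minimal sketch, assuming a uniform probability measure on Ω = {1, …, 6}; the function name `prob` is made up for illustration, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction

OMEGA = frozenset({1, 2, 3, 4, 5, 6})  # sample space of a fair die


def prob(event):
    """Uniform probability measure: |event| / |OMEGA|."""
    event = frozenset(event)
    if not event <= OMEGA:
        raise ValueError("event must be a subset of the sample space")
    return Fraction(len(event), len(OMEGA))


p_one = prob({1})
p_even = prob({2, 4, 6})

A = {1, 2, 3, 4}
p_A = prob(A)
p_not_A = prob(OMEGA - A)  # probability of the complement of A
```

Here `p_not_A` equals `1 - p_A`, and the empty event and the full sample space get probabilities 0 and 1 respectively.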


Note

• We will write $P(A \cap B)$ as $P(A, B)$ in this recitation


Conditional Probabilities

• Conditional probability of A given B:

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$$

• That is, treat B as the entire sample space and then find the probability of A


Conditional Probabilities

• Conditional probability of A given B: $P(A \mid B) = P(A, B) / P(B)$
• That is, treat B as the entire sample space and then find the probability of A
• Product (chain) rule: a rewriting of the conditional probability

$$P(A, B) = P(A \mid B) P(B), \quad \text{as } P(A \cap B) = P(A, B)$$

• Given a partition $A_1, A_2, \ldots$ of $\Omega$ (law of total probability):

$$P(B) = \sum_i P(B \cap A_i) = \sum_i P(B \mid A_i) P(A_i)$$


Conditional Probabilities Example

• Given a die, $\Omega = \{1, 2, 3, 4, 5, 6\}$, $F = 2^{\Omega}$, $P(\{i\}) = 1/6$
• $A = \{1, 2, 3, 4\}$, i.e., the roll is < 5
• $B = \{1, 3, 5\}$, i.e., the roll is odd
• $P(A) = 2/3$
• $P(B) = 1/2$
• $P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{P(\{1, 3\})}{P(B)} = \frac{1/3}{1/2} = \frac{2}{3}$
• $P(B \mid A) = \frac{P(A \cap B)}{P(A)} = \frac{P(\{1, 3\})}{P(A)} = \frac{1/3}{2/3} = \frac{1}{2}$
• Note these quantities are not the same!
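The die example can be reproduced by counting outcomes. This is a small sketch assuming a fair die; `prob` and `cond_prob` are illustrative helper names, not part of any library.

```python
from fractions import Fraction

OMEGA = {1, 2, 3, 4, 5, 6}


def prob(event):
    """Uniform probability of an event on a fair die."""
    return Fraction(len(event & OMEGA), len(OMEGA))


def cond_prob(a, b):
    """P(a | b) = P(a and b) / P(b); undefined when P(b) = 0."""
    if prob(b) == 0:
        raise ZeroDivisionError("cannot condition on an event of probability 0")
    return prob(a & b) / prob(b)


A = {1, 2, 3, 4}  # the roll is < 5
B = {1, 3, 5}     # the roll is odd

p_A_given_B = cond_prob(A, B)
p_B_given_A = cond_prob(B, A)
```

Conditioning on B restricts attention to the three odd outcomes, two of which lie in A; conditioning on A restricts attention to four outcomes, two of which are odd.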


Bayes theorem

• From the chain rule:

$$P(A \mid B) P(B) = P(A \cap B) = P(B \mid A) P(A)$$

• We can rearrange it to Bayes’ rule:

$$P(B \mid A) = \frac{P(B \cap A)}{P(A)} = \frac{P(A \mid B) P(B)}{P(A)}, \quad \text{since } P(B \cap A) = P(A \mid B) P(B)$$

• If $B_1, B_2, \ldots$ is a partition of $\Omega$, we have (from Bayes’ rule + Law of Total Probability):

$$P(B_i \mid A) = \frac{P(A \mid B_i) P(B_i)}{\sum_j P(A \mid B_j) P(B_j)}$$
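The partition form of Bayes' rule can be sketched as a function that takes priors $P(B_i)$ and likelihoods $P(A \mid B_i)$. The function name `bayes_posterior` and the numbers are assumptions for illustration: a fair die with the hypothetical partition $B_1 = \{1, 2, 3\}$, $B_2 = \{4, 5, 6\}$ and the event A = "the roll is odd".

```python
from fractions import Fraction


def bayes_posterior(priors, likelihoods):
    """Return [P(B_i | A)] from priors[i] = P(B_i) and likelihoods[i] = P(A | B_i)."""
    if sum(priors) != 1:
        raise ValueError("priors over a partition must sum to 1")
    # Law of total probability: P(A) = sum_j P(A | B_j) P(B_j).
    evidence = sum(l * p for l, p in zip(likelihoods, priors))
    if evidence == 0:
        raise ZeroDivisionError("P(A) is 0, so the posterior is undefined")
    return [l * p / evidence for l, p in zip(likelihoods, priors)]


# Hypothetical example on a fair die:
# B1 = {1,2,3}, B2 = {4,5,6}, A = {1,3,5}.
priors = [Fraction(1, 2), Fraction(1, 2)]
likelihoods = [Fraction(2, 3), Fraction(1, 3)]  # P(A|B1) = 2/3, P(A|B2) = 1/3
posterior = bayes_posterior(priors, likelihoods)
```

The result agrees with direct counting: of the three odd outcomes, two (1 and 3) lie in $B_1$.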


Independence

• A, B are independent if

$$P(A, B) = P(A) P(B)$$

• When P(A) > 0, we can also write P(B|A) = P(B)
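The product condition is easy to test by counting on a fair die. In this sketch the helper names are made up, and the event `D` is a hypothetical extra event added to show a dependent pair.

```python
from fractions import Fraction

OMEGA = frozenset(range(1, 7))  # fair die


def prob(event):
    return Fraction(len(OMEGA & set(event)), len(OMEGA))


def independent(a, b):
    """True when P(a, b) = P(a) P(b)."""
    return prob(set(a) & set(b)) == prob(a) * prob(b)


A = {1, 2, 3, 4}  # the roll is < 5
B = {1, 3, 5}     # the roll is odd
D = {1, 2, 3}     # hypothetical event: the roll is < 4

A_B_independent = independent(A, B)  # 1/3 versus (2/3)(1/2)
D_B_independent = independent(D, B)  # 1/3 versus (1/2)(1/2)
```

A and B turn out to be independent, which matches P(A|B) = 2/3 = P(A), whereas D and B are not.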


Random Variables

• A random variable is a function $X: \Omega \to \mathbb{R}^d$
• $\Omega$: set of possible outcomes
• $\mathbb{R}^d$: d-dimensional real values
• Examples:
  • Rolling a die n times, X = sum of the numbers
  • Throwing a dart at a dartboard, $X \in \mathbb{R}^2$ are the coordinates where the dart lands


Distributions

• By considering random variables, we can think of probability measures as functions on the real numbers

• The probability measure associated with the random variable is characterized by its cumulative distribution function (CDF): $F_X(x) = P(X \le x)$. We write $X \sim F_X$

• If two random variables have the same CDF, we call them identically distributed
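As a concrete case, take the random variable X = sum of two fair die rolls and compute its CDF by enumerating the sample space. This is a sketch under that assumption; the names `outcomes`, `X`, and `cdf` are chosen here for illustration.

```python
from fractions import Fraction
from itertools import product

# Sample space: all ordered pairs of two die rolls, each equally likely.
outcomes = list(product(range(1, 7), repeat=2))


def X(outcome):
    """Random variable: maps an outcome to the sum of the two rolls."""
    return sum(outcome)


def cdf(x):
    """F_X(x) = P(X <= x), computed by counting outcomes."""
    favorable = sum(1 for w in outcomes if X(w) <= x)
    return Fraction(favorable, len(outcomes))
```

For instance, `cdf(7)` counts the 21 outcomes whose sum is at most 7 out of 36, and `cdf(12)` is 1 because no sum exceeds 12.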


Discrete Distributions

• If $X$ only has a countable number of values, then we can characterize it using a probability mass function (PMF), which describes the probability of each value: $f_X(x) = P(X = x)$
• $\sum_x f_X(x) = 1$
• Example: coin flip (Bernoulli distribution)
  • $X \in \{0, 1\}$, $f_X(x) = \theta^x (1 - \theta)^{1 - x}$
  • i.e., $\theta$ = probability of a head
  • The Bernoulli distribution combines the probabilities of a head and a tail
  • For $x = 1$, $f_X(x) = \theta$
  • For $x = 0$, $f_X(x) = 1 - \theta$
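The Bernoulli PMF can be written as a one-line function. The sketch below uses an assumed value θ = 3/10 purely for illustration, with exact fractions so the two cases can be compared without rounding.

```python
from fractions import Fraction


def bernoulli_pmf(x, theta):
    """f_X(x) = theta^x * (1 - theta)^(1 - x) for x in {0, 1}."""
    if x not in (0, 1):
        raise ValueError("x must be 0 or 1")
    if not 0 <= theta <= 1:
        raise ValueError("theta must lie in [0, 1]")
    return theta ** x * (1 - theta) ** (1 - x)


theta = Fraction(3, 10)           # assumed probability of a head
p_head = bernoulli_pmf(1, theta)  # reduces to theta
p_tail = bernoulli_pmf(0, theta)  # reduces to 1 - theta
```

The exponents act as switches: exactly one of the two factors is raised to the power 1 and the other to the power 0.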


Continuous Distributions

• When the CDF is continuous, we can look at the derivative: $f_X(x) = \frac{d}{dx} F_X(x)$
• This is called the probability density function (PDF)
• We can compute the probability of an interval $(a, b)$ with $P(a < X < b) = \int_a^b f_X(x)\,dx$
• Note the probability of any specific point $c$ is $P(X = c) = 0$
• E.g., uniform distribution: $f_X(x) = \frac{1}{b - a} \cdot \mathbf{1}_{(a, b)}(x)$
• E.g., Gaussian distribution: $f_X(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$
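To see the interval formula in action, the Gaussian density can be integrated numerically. This is a rough sketch using only the standard library: `gaussian_pdf` and `interval_prob` are illustrative names, and the midpoint rule with `n` slices is one simple choice of integration scheme among many.

```python
import math


def gaussian_pdf(x, mu=0.0, sigma=1.0):
    """Gaussian density with mean mu and standard deviation sigma."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (math.sqrt(2 * math.pi) * sigma)


def interval_prob(pdf, a, b, n=10_000):
    """Approximate P(a < X < b) as the integral of pdf over (a, b), midpoint rule."""
    h = (b - a) / n
    return h * sum(pdf(a + (k + 0.5) * h) for k in range(n))


# Probability that a standard Gaussian falls within one standard deviation.
p_within_one_sigma = interval_prob(gaussian_pdf, -1.0, 1.0)
```

The result is close to 0.6827, the familiar "68%" figure for one standard deviation. Shrinking the interval toward a single point drives the probability to 0, consistent with P(X = c) = 0.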


Multiple Random Variables

• We can also have multiple random variables
• E.g., flipping two coins at the same time (first coin: X, second coin: Y)
• We can think of four cases:
  • x=0 and y=0
  • x=0 and y=1
  • x=1 and y=0
  • x=1 and y=1
• Assume that we flip the two coins multiple times and record the outcomes:

|     | X=0 | X=1 |
|-----|-----|-----|
| Y=0 | 50  | 30  |
| Y=1 | 70  | 50  |


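Multiple Random Variables

• Recorded counts:

|     | X=0 | X=1 |
|-----|-----|-----|
| Y=0 | 50  | 30  |
| Y=1 | 70  | 50  |

• When we normalize the table (divide each count by the total of 200), we can represent the joint distribution:

|     | X=0  | X=1  |
|-----|------|------|
| Y=0 | 0.25 | 0.15 |
| Y=1 | 0.35 | 0.25 |

• The sum of all elements should be 1
• We write the joint PMF or PDF as $f_{X,Y}(x, y)$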


Multiple Random Variables

• If $f_{X,Y}(x, y) = f_X(x) f_Y(y)$ for all $x, y$, then the two RVs are independent


Marginalization

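• Summing (discrete) or integrating (continuous) over one variable
• $f_X(x) = \sum_y f_{X,Y}(x, y)$: discrete RVs
• $f_X(x) = \int_y f_{X,Y}(x, y)\,dy$: continuous RVs
• Example: summing each column of the joint table over Y gives the marginal distribution of X

Joint distribution:

|     | X=0  | X=1  |
|-----|------|------|
| Y=0 | 0.25 | 0.15 |
| Y=1 | 0.35 | 0.25 |

Marginal distribution of X:

| X=0 | X=1 |
|-----|-----|
| 0.6 | 0.4 |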
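The two-coin table can be normalized and marginalized in a few lines. The sketch below keys the counts by `(x, y)` and uses a hypothetical helper `marginal`; exact fractions keep the comparison for independence free of rounding issues.

```python
from fractions import Fraction

# Outcome counts from the two-coin table, keyed by (x, y).
counts = {(0, 0): 50, (1, 0): 30, (0, 1): 70, (1, 1): 50}

# Normalizing by the total count gives the joint PMF.
total = sum(counts.values())
joint = {xy: Fraction(n, total) for xy, n in counts.items()}


def marginal(joint_pmf, axis):
    """Sum the joint PMF over the other variable; axis=0 keeps X, axis=1 keeps Y."""
    out = {}
    for outcome, p in joint_pmf.items():
        key = outcome[axis]
        out[key] = out.get(key, Fraction(0)) + p
    return out


f_x = marginal(joint, 0)  # marginal PMF of X
f_y = marginal(joint, 1)  # marginal PMF of Y

# X and Y are independent only if the joint factorizes for every (x, y).
rvs_independent = all(p == f_x[x] * f_y[y] for (x, y), p in joint.items())
```

For these counts the marginal of X is (0.6, 0.4), and the joint does not factorize: P(X=0, Y=0) = 0.25 while P(X=0) P(Y=0) = 0.6 × 0.4 = 0.24, so the two coins in this table are not independent.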


Mean of a Distribution

• Expectation or mean of a distribution:
  • $E[X] = \sum_x x \cdot f_X(x)$ if X is discrete
  • $E[X] = \int_{-\infty}^{\infty} x \cdot f_X(x)\,dx$ if X is continuous
• $E[XY] = E[X] E[Y]$ when X, Y are independent
• $E[E[X]] = E[X]$


Variance of a Distribution

• Variance of a distribution: $\mathrm{Var}(X) = E\left[(X - E[X])^2\right]$

• It tells us how “spread out” the distribution is
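For a discrete distribution, both the mean and the variance are weighted sums over the PMF. The sketch below represents a PMF as a dictionary from value to probability; the function names and the two example distributions (a fair die and a Bernoulli with an assumed θ = 3/10) are illustrative choices.

```python
from fractions import Fraction


def mean(pmf):
    """E[X] = sum over x of x * f_X(x)."""
    return sum(x * p for x, p in pmf.items())


def variance(pmf):
    """Var(X) = E[(X - E[X])^2]."""
    m = mean(pmf)
    return sum((x - m) ** 2 * p for x, p in pmf.items())


# Fair die: each face has probability 1/6.
die = {x: Fraction(1, 6) for x in range(1, 7)}
die_mean = mean(die)
die_var = variance(die)

# Bernoulli with an assumed theta = 3/10.
bern = {0: Fraction(7, 10), 1: Fraction(3, 10)}
bern_mean = mean(bern)     # equals theta
bern_var = variance(bern)  # equals theta * (1 - theta)
```

The die has mean 7/2 and variance 35/12; the Bernoulli example has mean θ and variance θ(1 − θ), so its spread is largest when θ = 1/2.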


Thanks!

Questions?