Transcript
Page 1

CSC 412/2506 Spring 2017: Probabilistic Graphical Models

Lecture 3: Directed Graphical Models and Latent Variables

Based on slides by Richard Zemel

Page 2

Learning outcomes

•  What aspects of a model can we express using graphical notation?
•  Which aspects are not captured in this way?
•  How do independencies change as a result of conditioning?
•  Reasons for using latent variables
•  Common motifs such as mixtures and chains
•  How to integrate out unobserved variables

Page 3

Joint Probabilities

•  Chain rule implies that any joint distribution equals

p(x_{1:D}) = p(x_1)\, p(x_2 | x_1)\, p(x_3 | x_1, x_2)\, p(x_4 | x_1, x_2, x_3) \cdots p(x_D | x_{1:D-1})

•  Directed graphical model implies a restricted factorization

Page 4

Conditional Independence

•  Notation: x_A ⊥ x_B | x_C

•  Definition: two (sets of) variables x_A and x_B are conditionally independent given a third, x_C, if

P(x_A, x_B | x_C) = P(x_A | x_C)\, P(x_B | x_C) \quad \forall x_C

which is equivalent to saying

P(x_A | x_B, x_C) = P(x_A | x_C) \quad \forall x_C

•  Only a subset of all distributions respect any given (nontrivial) conditional independence statement. The subset of distributions that respect all the CI assumptions we make is the family of distributions consistent with our assumptions.

•  Probabilistic graphical models are a powerful, elegant and simple way to specify such a family.

Page 5

Directed Graphical Models

•  Consider directed acyclic graphs over N variables.
•  Each node has a (possibly empty) set of parents π_i.
•  We can then write

P(x_1, \ldots, x_N) = \prod_i P(x_i | x_{\pi_i})

•  Hence we factorize the joint in terms of local conditional probabilities.
•  Exponential in the "fan-in" of each node, instead of in N.
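
A minimal sketch of this factorization in Python (not from the slides): a three-node DAG x1 → x2, x1 → x3 over binary variables with made-up CPTs. Each node stores a table with only 2^(fan-in) rows, rather than a table over all N variables.

```python
from itertools import product

parents = {"x1": [], "x2": ["x1"], "x3": ["x1"]}
# CPT per node: tuple of parent values -> P(node = 1 | parents).
cpt = {
    "x1": {(): 0.6},
    "x2": {(0,): 0.3, (1,): 0.8},
    "x3": {(0,): 0.5, (1,): 0.1},
}

def joint(assign):
    """P(x_1, ..., x_N) = prod_i P(x_i | x_pi_i) for one full assignment."""
    p = 1.0
    for node, pa in parents.items():
        p1 = cpt[node][tuple(assign[q] for q in pa)]
        p *= p1 if assign[node] == 1 else 1.0 - p1
    return p

# Sanity check: the factorized joint normalizes to one.
print(sum(joint(dict(zip(parents, v)))
          for v in product([0, 1], repeat=len(parents))))  # ~1.0
```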

Page 6

Conditional Independence in DAGs

•  If we order the nodes in a directed graphical model so that parents always come before their children in the ordering, then the graphical model implies the following about the distribution:

\{ x_i \perp x_{\tilde{\pi}_i} \,|\, x_{\pi_i} \} \quad \forall i

where x_{\tilde{\pi}_i} are the nodes coming before x_i that are not its parents.

•  In other words, the DAG is telling us that each variable is conditionally independent of its non-descendants given its parents.

•  Such an ordering is called a "topological" ordering.

Page 7

Example DAG

Consider this six-node network: [figure: DAG over x_1, ..., x_6]

The joint probability is now:

P(x_{1:6}) = P(x_1)\, P(x_2 | x_1)\, P(x_3 | x_1)\, P(x_4 | x_2)\, P(x_5 | x_3)\, P(x_6 | x_2, x_5)

Page 8

Missing Edges

•  Key point about directed graphical models: missing edges imply conditional independence.

•  Remember that by the chain rule we can always write the full joint as a product of conditionals, given an ordering:

P(x_1, x_2, \ldots) = P(x_1)\, P(x_2 | x_1)\, P(x_3 | x_2, x_1)\, P(x_4 | x_3, x_2, x_1) \cdots

•  If the joint is represented by a DAGM, then some of the conditioned variables on the right-hand sides are missing.

•  This is equivalent to enforcing conditional independence.
•  Start with the "idiot's graph": each node has all previous nodes in the ordering as its parents.
•  Now remove edges to get your DAG.
•  Removing an edge into node i eliminates an argument from the conditional probability factor P(x_i | x_1, x_2, \ldots, x_{i-1}).

Page 9

D-Separation

•  D-separation, or directed separation, is a notion of connectedness in DAGMs in which two (sets of) variables may or may not be connected conditioned on a third (set of) variables.

•  D-connection implies conditional dependence and d-separation implies conditional independence.

•  In particular, we say that x_A ⊥ x_B | x_C if every variable in A is d-separated from every variable in B conditioned on all the variables in C.

•  To check if an independence is true, we can cycle through each node in A, do a depth-first search to reach every node in B, and examine the path between them. If all of the paths are d-separated, then we can assert x_A ⊥ x_B | x_C.

•  Thus, it will be sufficient to consider triples of nodes. (Why?)
•  Pictorially, when we condition on a node, we shade it in.

Page 10

Chain

•  Q: When we condition on y, are x and z independent?

P(x, y, z) = P(x)\, P(y | x)\, P(z | y)

which implies

P(z | x, y) = \frac{P(x, y, z)}{P(x, y)} = \frac{P(x)\, P(y | x)\, P(z | y)}{P(x)\, P(y | x)} = P(z | y)

and therefore x ⊥ z | y.

•  Think of x as the past, y as the present and z as the future.
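
This derivation can be checked numerically: build a random chain-factored joint and verify that P(z | x, y) does not depend on x (an illustrative script, not from the slides).

```python
import numpy as np

rng = np.random.default_rng(0)

# Random chain-factored joint over binary x, y, z: P(x) P(y|x) P(z|y).
px = rng.dirichlet(np.ones(2))            # P(x)
py_x = rng.dirichlet(np.ones(2), size=2)  # P(y|x), row indexed by x
pz_y = rng.dirichlet(np.ones(2), size=2)  # P(z|y), row indexed by y

P = np.einsum("x,xy,yz->xyz", px, py_x, pz_y)  # full joint P(x, y, z)

# P(z | x, y) should be the same for both values of x.
Pz_given_xy = P / P.sum(axis=2, keepdims=True)
print(np.allclose(Pz_given_xy[0], Pz_given_xy[1]))  # True: x ⊥ z | y
```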

Page 11

Common Cause

•  Q: When we condition on y, are x and z independent?

P(x, y, z) = P(y)\, P(x | y)\, P(z | y)

which implies

P(x, z | y) = \frac{P(x, y, z)}{P(y)} = \frac{P(y)\, P(x | y)\, P(z | y)}{P(y)} = P(x | y)\, P(z | y)

and therefore x ⊥ z | y.

Page 12

Explaining Away

•  Q: When we condition on y, are x and z independent?

P(x, y, z) = P(x)\, P(z)\, P(y | x, z)

•  x and z are marginally independent, but given y they are conditionally dependent.

•  This important effect is called explaining away (Berkson's paradox).

•  For example, flip two coins independently; let x = coin 1, z = coin 2.
•  Let y = 1 if the coins come up the same and y = 0 if different.
•  x and z are independent, but if I tell you y, they become coupled!
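
The coin example can be verified by brute-force enumeration (a small illustrative script, not from the slides):

```python
import numpy as np
from itertools import product

# Joint table P[x, y, z] for two fair coins x, z with y = 1 iff they match.
P = np.zeros((2, 2, 2))
for x, z in product([0, 1], repeat=2):
    P[x, int(x == z), z] = 0.25

Pxz = P.sum(axis=1)  # marginal P(x, z)
print(np.allclose(Pxz, np.outer(Pxz.sum(1), Pxz.sum(0))))  # True: x ⊥ z

# Conditioned on y = 1 the coins are perfectly coupled:
print(P[:, 1, :] / P[:, 1, :].sum())  # [[0.5, 0.0], [0.0, 0.5]]
```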

Page 13

Bayes-Ball Algorithm

•  To check if x_A ⊥ x_B | x_C we need to check if every variable in A is d-separated from every variable in B conditioned on all variables in C.

•  In other words, given that all the nodes in x_C are clamped, when we wiggle nodes x_A can we change any of the nodes in x_B?

•  The Bayes-Ball Algorithm is one such d-separation test.
•  We shade all nodes x_C, place balls at each node in x_A (or x_B), let them bounce around according to some rules, and then ask if any of the balls reach any of the nodes in x_B (or x_A).

Page 14

Bayes-Ball Rules

•  The three cases we considered tell us the rules:

Page 15

Bayes-Ball Boundary Rules

•  We also need the boundary conditions:

•  Here's a trick for the explaining-away case: if y or any of its descendants is shaded, the ball passes through.

•  Notice balls can travel opposite to edge directions.
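
These pass-through, bounce, and boundary rules can be folded into a single reachability test. Below is a sketch in Python following the standard Bayes-ball reachability formulation (in the style of Koller & Friedman's Reachable procedure); the graph encoding and function name are our own, not from the slides.

```python
def d_separated(children, A, B, C):
    """True iff every node in A is d-separated from every node in B given C.

    `children` maps every node to a list of its children (all nodes appear
    as keys); A, B, C are sets of node names.
    """
    parents = {v: set() for v in children}
    for v, chs in children.items():
        for c in chs:
            parents[c].add(v)

    # Phase 1: C and all its ancestors (these activate v-structures).
    anc, stack = set(), list(C)
    while stack:
        v = stack.pop()
        if v not in anc:
            anc.add(v)
            stack.extend(parents[v])

    # Phase 2: traverse (node, direction) states; "up" = arrived from a child.
    visited, reachable = set(), set()
    frontier = [(a, "up") for a in A]
    while frontier:
        v, d = frontier.pop()
        if (v, d) in visited:
            continue
        visited.add((v, d))
        if v not in C:
            reachable.add(v)
        if d == "up" and v not in C:
            frontier += [(p, "up") for p in parents[v]]
            frontier += [(c, "down") for c in children[v]]
        elif d == "down":
            if v not in C:                    # chain: keep travelling down
                frontier += [(c, "down") for c in children[v]]
            if v in anc:                      # v-structure: bounce back up
                frontier += [(p, "up") for p in parents[v]]
    return not (reachable & set(B))

# Explaining away, x -> y <- z: d-separated marginally, connected given y.
g = {"x": ["y"], "z": ["y"], "y": []}
print(d_separated(g, {"x"}, {"z"}, set()))  # True
print(d_separated(g, {"x"}, {"z"}, {"y"}))  # False
```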

Page 16

Canonical Micrographs

Page 17

Examples of Bayes-Ball Algorithm

Page 18

Examples of Bayes-Ball Algorithm

•  Notice: balls can travel opposite to edge direction.

Page 19

Plates

Page 20

Plates & Parameters

•  Since Bayesian methods treat parameters as random variables, we would like to include them in the graphical model.

•  One way to do this is to repeat all the iid observations explicitly and show the parameter only once.

•  A better way is to use plates, in which repeated quantities that are iid are put in a box.

Page 21

Plates: Macros for Repeated Structures

•  Plates are like "macros" that allow you to draw a very complicated graphical model with a simpler notation.

•  The rules of plates are simple: repeat every structure in a box a number of times given by the integer in the corner of the box (e.g. N), updating the plate index variable (e.g. n) as you go.

•  Duplicate every arrow going into the plate and every arrow leaving the plate by connecting the arrows to each copy of the structure.

Page 22

Nested/Intersecting Plates

•  Plates can be nested, in which case their arrows get duplicated also, according to the rule: draw an arrow from every copy of the source node to every copy of the destination node.

•  Plates can also cross (intersect), in which case the nodes at the intersection have multiple indices and get duplicated a number of times equal to the product of the duplication numbers on all the plates containing them.

Page 23

Example: Nested Plates

Page 24

Example DAGM: Markov Chain

•  Markov property: conditioned on the present, the past and future are independent.

Page 25

Unobserved Variables

•  Certain variables Q in our models may be unobserved, either some of the time or always, either at training time or at test time.

•  Graphically, we will use shading to indicate observation.

Page 26

Partially Unobserved (Missing) Variables

•  If variables are occasionally unobserved they are missing data, e.g., undefined inputs, missing class labels, erroneous target values.

•  In this case, we can still model the joint distribution, but we define a new cost function in which we sum out or marginalize the missing values at training or test time:

\ell(\theta; D) = \sum_{\text{complete}} \log p(x_c, y_c | \theta) + \sum_{\text{missing}} \log p(x_m | \theta)
                = \sum_{\text{complete}} \log p(x_c, y_c | \theta) + \sum_{\text{missing}} \log \sum_y p(x_m, y | \theta)

Recall that p(x) = \sum_q p(x, q).
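
As a concrete instance of this cost function, here is a sketch with a made-up two-class model p(x, y | θ) = π_y N(x | μ_y, 1), where some labels are observed and the missing ones are summed out inside the log (all parameters and helper names are invented for illustration):

```python
import numpy as np
from scipy.stats import norm

pi, mu = np.array([0.4, 0.6]), np.array([-1.0, 2.0])

def log_joint(x, y):
    return np.log(pi[y]) + norm.logpdf(x, loc=mu[y])

def log_lik(x_lab, y_lab, x_unlab):
    complete = sum(log_joint(x, y) for x, y in zip(x_lab, y_lab))
    # Missing labels are marginalized *inside* the log.
    missing = sum(np.log(sum(np.exp(log_joint(x, y)) for y in (0, 1)))
                  for x in x_unlab)
    return complete + missing

print(log_lik([0.5, 2.1], [0, 1], [1.0, -0.3]))
```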

Page 27

Latent Variables

•  What to do when a variable z is always unobserved? It depends on where it appears in our model. If we never condition on it when computing the probability of the variables we do observe, then we can just forget about it and integrate it out. E.g., given y, x, fit the model p(z, y | x) = p(z | y) p(y | x, w) p(w). (In other words, if it is a leaf node.)

•  But if z is conditioned on, we need to model it:

e.g., given y, x, fit the model p(y | x) = \sum_z p(y | x, z)\, p(z)

Page 28

Where Do Latent Variables Come From?

•  Latent variables may appear naturally, from the structure of the problem, because something wasn't measured, because of faulty sensors, occlusion, privacy, etc.

•  But also, we may want to intentionally introduce latent variables to model complex dependencies between variables without looking at the dependencies between them directly. This can actually simplify the model (e.g., mixtures).

Page 29

Latent Variable Models & Regression

•  You can think of clustering as the problem of classification with missing class labels.

•  You can think of factor models (such as factor analysis, PCA, ICA, etc.) as linear or nonlinear regression with missing inputs.

Page 30

Why is Learning Harder?

•  In fully observed iid settings, the probability model is a product, thus the log-likelihood is a sum where the terms decouple. (At least for directed models.)

\ell(\theta; D) = \log p(x, z | \theta) = \log p(z | \theta_z) + \log p(x | z, \theta_x)

•  With latent variables, the probability already contains a sum, so the log-likelihood has all parameters coupled together via z (just as with the partition function in undirected models):

\ell(\theta; D) = \log \sum_z p(x, z | \theta) = \log \sum_z p(z | \theta_z)\, p(x | z, \theta_x)

Page 31

Why is Learning Harder?

•  Likelihood couples the parameters:

\ell(\theta; D) = \log \sum_z p(z | \theta_z)\, p(x | z, \theta_x)

•  We can treat this as a black-box probability function and just try to optimize the likelihood as a function of θ (e.g. by gradient descent). However, sometimes taking advantage of the latent variable structure can make parameter estimation easier.

•  Good news: soon we will see how to deal with latent variables.
•  Basic trick: put a tractable distribution on the values you don't know. Basic math: use convexity to lower-bound the likelihood.

Page 32

Mixture Models

•  Most basic latent variable model, with a single discrete node z.
•  Allows different submodels (experts) to contribute to the (conditional) density model in different parts of the space.
•  Divide & conquer idea: use simple parts to build complex models (e.g., multimodal densities, or piecewise-linear regressions).

Page 33

Mixture Densities

•  Exactly like a classification model but the class is unobserved and so we sum it out. What we get is a perfectly valid density:

p(x | \theta) = \sum_{k=1}^{K} p(z = k | \theta_z)\, p(x | z = k, \theta_k) = \sum_k \alpha_k p_k(x | \theta_k)

where the "mixing proportions" add to one: \sum_k \alpha_k = 1.

•  We can use Bayes' rule to compute the posterior probability of the mixture component given some data:

p(z = k | x, \theta) = \frac{\alpha_k p_k(x | \theta_k)}{\sum_j \alpha_j p_j(x | \theta_j)}

These quantities are called responsibilities.

Page 34

Example: Gaussian Mixture Models

•  Consider a mixture of K Gaussian components:

p(x | \theta) = \sum_k \alpha_k \mathcal{N}(x | \mu_k, \Sigma_k)

p(z = k | x, \theta) = \frac{\alpha_k \mathcal{N}(x | \mu_k, \Sigma_k)}{\sum_j \alpha_j \mathcal{N}(x | \mu_j, \Sigma_j)}

\ell(\theta; D) = \sum_n \log \sum_k \alpha_k \mathcal{N}(x^{(n)} | \mu_k, \Sigma_k)

•  Density model: p(x | θ) is a familiarity signal. Clustering: p(z | x, θ) is the assignment rule, −ℓ(θ) is the cost.
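
A minimal sketch of these three quantities with scipy (the two-component parameters below are invented for illustration):

```python
import numpy as np
from scipy.stats import multivariate_normal as mvn

alphas = np.array([0.3, 0.7])
mus = [np.zeros(2), np.array([3.0, 3.0])]
Sigmas = [np.eye(2), 2.0 * np.eye(2)]

def components(x):
    """alpha_k * N(x | mu_k, Sigma_k) for each k."""
    return np.array([a * mvn.pdf(x, m, S)
                     for a, m, S in zip(alphas, mus, Sigmas)])

def responsibilities(x):
    w = components(x)
    return w / w.sum()                    # p(z = k | x, theta)

def log_lik(X):
    return sum(np.log(components(x).sum()) for x in X)  # sum_n log sum_k

X = np.array([[0.1, -0.2], [2.8, 3.1]])
print(responsibilities(X[0]), log_lik(X))
```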

Page 35

Example: Mixtures of Experts

•  Also called conditional mixtures. Exactly like a class-conditional model but the class is unobserved and so we sum it out again:

p(y | x, \theta) = \sum_{k=1}^{K} p(z = k | x, \theta_z)\, p(y | z = k, x, \theta_k) = \sum_k \alpha_k(x | \theta_z)\, p_k(y | x, \theta_k)

where \sum_k \alpha_k(x) = 1 \;\; \forall x.

•  Harder: must learn α(x) (unless we chose z independent of x).
•  We can still use Bayes' rule to compute the posterior probability of the mixture component given some data:

p(z = k | x, y, \theta) = \frac{\alpha_k(x) p_k(y | x, \theta_k)}{\sum_j \alpha_j(x) p_j(y | x, \theta_j)}

This function is often called the gating function.

Page 36

Example: Mixtures of Linear Regression Experts

•  Each expert generates data according to a linear function of the input plus additive Gaussian noise:

p(y | x, \theta) = \sum_k \alpha_k \mathcal{N}(y | \beta_k^T x, \sigma_k^2)

•  The "gate" function can be a softmax classification machine:

\alpha_k(x) = p(z = k | x) = \frac{e^{\eta_k^T x}}{\sum_j e^{\eta_j^T x}}

•  Remember: we are not modeling the density of the inputs x.
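
A sketch of this conditional density for K = 2 experts on a 1-D input plus a bias feature (all parameters are made up; a shared noise scale is assumed for simplicity):

```python
import numpy as np
from scipy.stats import norm

betas = np.array([[1.0, 0.0], [-2.0, 3.0]])   # expert weights beta_k
sigma = 0.5                                    # shared noise std (assumption)
etas = np.array([[4.0, -2.0], [-4.0, 2.0]])    # gate weights eta_k

def gate(x):
    """alpha_k(x) = softmax over eta_k^T x."""
    s = etas @ x
    s -= s.max()                 # stabilize the softmax
    e = np.exp(s)
    return e / e.sum()

def p_y_given_x(y, x):
    """p(y|x) = sum_k alpha_k(x) N(y | beta_k^T x, sigma^2)."""
    return float(gate(x) @ norm.pdf(y, loc=betas @ x, scale=sigma))

x = np.array([0.5, 1.0])         # [input, bias]
print(gate(x), p_y_given_x(0.4, x))
```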

Page 37

Gradient Learning with Mixtures

•  We can learn mixture densities using gradient descent on the likelihood as usual. The gradients are quite interesting:

\ell(\theta) = \log p(x | \theta) = \log \sum_k \alpha_k p_k(x | \theta_k)

\frac{\partial \ell}{\partial \theta} = \frac{1}{p(x | \theta)} \sum_k \alpha_k \frac{\partial p_k(x | \theta_k)}{\partial \theta}
= \sum_k \alpha_k \frac{p_k(x | \theta_k)}{p(x | \theta)} \frac{\partial \log p_k(x | \theta_k)}{\partial \theta}
= \sum_k \frac{\alpha_k p_k(x | \theta_k)}{p(x | \theta)} \frac{\partial \ell_k}{\partial \theta_k}
= \sum_k \alpha_k r_k \frac{\partial \ell_k}{\partial \theta_k}

•  In other words, the gradient is the responsibility-weighted sum of the individual log-likelihood gradients.
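
A quick finite-difference check of this identity for the means of a 1-D, two-component GMM (all numbers made up; here the responsibility-weighted form reduces to r_k (x − μ_k)/σ²):

```python
import numpy as np
from scipy.stats import norm

alphas = np.array([0.3, 0.7])
mus = np.array([0.0, 3.0])
sigma, x = 1.0, 1.2

def log_lik(mus_):
    return np.log(np.sum(alphas * norm.pdf(x, loc=mus_, scale=sigma)))

# Analytic gradient via the responsibility-weighted identity.
comp = alphas * norm.pdf(x, loc=mus, scale=sigma)
resp = comp / comp.sum()                     # responsibilities
grad = resp * (x - mus) / sigma**2           # r_k * d log N_k / d mu_k

# Finite-difference comparison.
eps = 1e-6
fd = np.array([(log_lik(mus + eps * np.eye(2)[k]) - log_lik(mus)) / eps
               for k in range(2)])
print(np.allclose(grad, fd, atol=1e-4))      # True
```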

Page 38

Parameter Constraints

•  If we want to use general optimizers (e.g., conjugate gradient) to learn latent variable models, we often have to make sure parameters respect certain constraints (e.g., \sum_k \alpha_k = 1, \Sigma_k positive definite).

•  A good trick is to reparameterize these quantities in terms of unconstrained values. For mixing proportions, use the softmax:

\alpha_k = \frac{\exp(q_k)}{\sum_j \exp(q_j)}

•  For covariance matrices, use the Cholesky decomposition:

\Sigma^{-1} = A^T A, \qquad |\Sigma|^{-1/2} = \prod_i A_{ii}

where A is upper triangular with a positive diagonal:

A_{ii} = \exp(r_i) > 0, \quad A_{ij} = a_{ij} \; (j > i), \quad A_{ij} = 0 \; (j < i)
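
Both reparameterizations in a few lines of numpy (dimensions and variable names are our own):

```python
import numpy as np

rng = np.random.default_rng(0)

# Mixing proportions via softmax of unconstrained q.
q = rng.normal(size=3)
alpha = np.exp(q - q.max())
alpha /= alpha.sum()               # alpha_k > 0 and sums to one

# Precision matrix via a Cholesky-style factor: A upper triangular with
# positive diagonal (exp of free values), so Sigma^{-1} = A^T A is PD.
r = rng.normal(size=2)             # log-diagonal entries
a12 = rng.normal()                 # free off-diagonal entry
A = np.array([[np.exp(r[0]), a12],
              [0.0,          np.exp(r[1])]])
Prec = A.T @ A
print(alpha.sum(), np.linalg.eigvalsh(Prec))  # 1.0; both eigenvalues > 0
```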

Page 39

Logsumexp

•  Often you can easily compute b_k = \log p(x | z = k, \theta_k), but it will be very negative, say -10^6 or smaller.

•  Now, to compute \ell = \log p(x | \theta) you need to compute \log \sum_k e^{b_k} (e.g., for calculating responsibilities at test time or for learning).

•  Careful! Do not compute this by doing log(sum(exp(b))). You will get underflow and an incorrect answer.

•  Instead do this:
   – Add a constant exponent B to all the values b_k such that the largest value equals zero: B = max(b).
   – Compute log(sum(exp(b - B))) + B.

•  Example: if \log p(x | z = 1) = -120 and \log p(x | z = 2) = -120, what is \log p(x) = \log[p(x | z = 1) + p(x | z = 2)]? Answer: \log[2 e^{-120}] = -120 + \log 2.

•  Rule of thumb: never use log or exp by itself.
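
The trick as a function, with the naive failure shown next to it (using −1000 rather than −120 so the naive version actually underflows in float64):

```python
import numpy as np

def logsumexp(b):
    """log(sum(exp(b))) computed stably, exactly as the slide prescribes."""
    B = np.max(b)
    return B + np.log(np.sum(np.exp(b - B)))

b = np.array([-1000.0, -1000.0])   # e.g. b_k = log p(x | z = k)
print(logsumexp(b))                # -1000 + log(2) ≈ -999.3069
print(np.log(np.sum(np.exp(b))))   # naive: exp underflows to 0, gives -inf
```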

Page 40

Hidden Markov Models (HMMs)

•  A very popular form of latent variable model.

•  z_t → hidden states taking one of K discrete values
•  x_t → observations taking values in any space

Example: discrete observations with M symbols, B \in \mathbb{R}^{K \times M}:

p(x_t = j | z_t = k) = B_{kj}
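
A tiny ancestral-sampling sketch of such an HMM (the initial distribution π0 and transition matrix T are made-up parameters; only the emission rule p(x_t = j | z_t = k) = B[k, j] comes from the slide):

```python
import numpy as np

rng = np.random.default_rng(0)

pi0 = np.array([0.5, 0.5])                 # initial state distribution
T = np.array([[0.9, 0.1],                  # T[k, l] = p(z_{t+1}=l | z_t=k)
              [0.2, 0.8]])
B = np.array([[0.7, 0.2, 0.1],             # emissions, K x M
              [0.1, 0.3, 0.6]])

z = rng.choice(2, p=pi0)
zs, xs = [], []
for _ in range(10):
    xs.append(rng.choice(3, p=B[z]))       # emit a symbol given the state
    zs.append(z)
    z = rng.choice(2, p=T[z])              # transition to the next state
print(zs, xs)
```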

Page 41

Inference in Graphical Models

•  x_E → observed evidence variables (a subset of the nodes)
•  x_F → unobserved query nodes we'd like to infer
•  x_R → remaining variables, extraneous to this query but part of the given graphical representation

Page 42

Inference with Two Variables

•  Table look-up: p(y | x = \bar{x})

•  Bayes' rule:

p(x | y = \bar{y}) = \frac{p(y | x)\, p(x)}{p(y)}

Page 43

Naïve Inference

•  Suppose each variable takes one of k discrete values.

p(x_1, x_2, \ldots, x_5) = \sum_{x_6} p(x_1)\, p(x_2|x_1)\, p(x_3|x_1)\, p(x_4|x_2)\, p(x_5|x_3)\, p(x_6|x_2, x_5)

•  Costs O(k) operations to update each of O(k^5) table entries.
•  Use the factorization and the distributive law to reduce the complexity:

p(x_1, x_2, \ldots, x_5) = p(x_1)\, p(x_2|x_1)\, p(x_3|x_1)\, p(x_4|x_2)\, p(x_5|x_3) \sum_{x_6} p(x_6|x_2, x_5)
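
A numerical sanity check of this reduction on the six-node network, using random CPTs and einsum (a sketch; the helper and variable names are ours):

```python
import numpy as np

rng = np.random.default_rng(0)

def cpt(*shape):
    """Random conditional table, normalized over its last axis."""
    t = rng.random(shape)
    return t / t.sum(axis=-1, keepdims=True)

p1 = cpt(2)            # p(x1)
p2 = cpt(2, 2)         # p(x2 | x1)
p3 = cpt(2, 2)         # p(x3 | x1)
p4 = cpt(2, 2)         # p(x4 | x2)
p5 = cpt(2, 2)         # p(x5 | x3)
p6 = cpt(2, 2, 2)      # p(x6 | x2, x5)

# Naive: build the full joint, then sum out x6.
joint = np.einsum("a,ab,ac,bd,ce,bef->abcdef", p1, p2, p3, p4, p5, p6)
naive = joint.sum(axis=5)

# Factorized: sum_{x6} p(x6 | x2, x5) = 1, so the marginal is just the
# product of the remaining factors.
fast = np.einsum("a,ab,ac,bd,ce->abcde", p1, p2, p3, p4, p5)
print(np.allclose(naive, fast))  # True
```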

Page 44

Inference in Directed Graphs

Page 45

Inference in Directed Graphs

Page 46

Inference in Directed Graphs

Page 47

Learning outcomes

•  What aspects of a model can we express using graphical notation?
•  Which aspects are not captured in this way?
•  How do independencies change as a result of conditioning?
•  Reasons for using latent variables
•  Common motifs such as mixtures and chains
•  How to integrate out unobserved variables

Page 48

Questions?

•  Thursday: Tutorial on automatic differentiation
•  This week: Assignment 1 released