Visual Dynamics: Probabilistic Future Frame Synthesis via Cross Convolutional Networks
Tianfan Xue*, Jiajun Wu*, Katie Bouman, Bill Freeman
NIPS 2016
VGG Reading Group, 24 Feb 2017, Ankush Gupta
Task: future frame prediction
Frame 1 → Deterministic neural network → Frame 2 (prediction)
Deterministic predictions fail to model uncertainty
What is the problem?
Outline: Main idea | Network structure | What the network learns | Result
Sample different future frames
Input frame + Input random motion vector $z \sim p_z(z)$ → Synthesis network → Sampled future frame
Input frame → Another sampled future frame
Segments → Transformed segments
Synthesize using different transformations
Training
Training samples (label-free): Input frame, Future frame $I_{\mathrm{gt}}$ (ground truth)
Input frame, Future frame $I_{\mathrm{gt}}$ (ground truth) → Encoding network → Motion vector $z$
Input frame + Motion vector $z$ → Synthesis network → Future frame $I_{\mathrm{pred}}$ (prediction)
Objective function: $\lVert I_{\mathrm{pred}} - I_{\mathrm{gt}} \rVert + D_{\mathrm{KL}}(\mathbf{z} \,\|\, \mathcal{N}(\mathbf{0}, \mathbf{I}))$
Reconstruction loss: $\lVert I_{\mathrm{pred}} - I_{\mathrm{gt}} \rVert$
KL-divergence loss: $D_{\mathrm{KL}}(\mathbf{z} \,\|\, \mathcal{N}(\mathbf{0}, \mathbf{I}))$
Variational Autoencoder [Kingma and Welling, 2014]
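The two terms of the training objective can be sketched in plain Python. This is a minimal sketch under assumptions that are not stated on the slides: the encoding network is taken to output a mean and a log-variance of a diagonal Gaussian over $z$ (as in a standard VAE), the reconstruction norm is taken to be a sum of squared differences, and frames are flattened to lists of pixel values. All function names are illustrative.

```python
import math

def reconstruction_loss(pred, gt):
    """Sum of squared pixel differences between prediction and ground truth.
    The choice of norm is an assumption; the slide does not specify it."""
    return sum((p - g) ** 2 for p, g in zip(pred, gt))

def kl_to_standard_normal(mu, log_var):
    """Closed-form KL( N(mu, diag(exp(log_var))) || N(0, I) )."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv
                     for m, lv in zip(mu, log_var))

def objective(pred, gt, mu, log_var):
    """Reconstruction loss plus KL-divergence loss, as in the slide's objective."""
    return reconstruction_loss(pred, gt) + kl_to_standard_normal(mu, log_var)

# A posterior equal to the prior contributes no KL penalty,
# so only the reconstruction term remains.
loss = objective(pred=[1.0, 2.0], gt=[1.0, 0.0], mu=[0.0], log_var=[0.0])
```

The KL term pulls the encoder's distribution over $z$ toward the prior, which is what makes it reasonable to sample $z$ from that prior at test time.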
Testing
Input random motion vector $z \sim p_z(z)$ (the encoding network and the ground-truth future frame $I_{\mathrm{gt}}$ are not used)
Input frame + Motion vector $z$ → Synthesis network → Future frame $I_{\mathrm{pred}}$ (prediction)
Real output from our network: Input frame → Future frame
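Test-time sampling can be sketched as follows. This is a sketch only: `toy_synthesize` is a hypothetical stand-in for the trained synthesis network, the prior is assumed to be a standard normal (consistent with the KL term of the training objective), and the function names and the `seed` parameter are illustrative.

```python
import random

def sample_motion_vector(dim, rng):
    """Draw z ~ N(0, I): each entry is an independent standard normal."""
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]

def sample_future_frames(input_frame, synthesize, num_samples, dim, seed=0):
    """Run the synthesis function once per sampled motion vector."""
    rng = random.Random(seed)
    futures = []
    for _ in range(num_samples):
        z = sample_motion_vector(dim, rng)
        futures.append(synthesize(input_frame, z))
    return futures

def toy_synthesize(frame, z):
    """Hypothetical stand-in for the synthesis network:
    offsets every pixel by the first entry of z."""
    return [pixel + z[0] for pixel in frame]

frames = sample_future_frames([0.0, 1.0, 2.0], toy_synthesize,
                              num_samples=3, dim=4)
```

The same input frame is reused for every sample; only $z$ changes, so any variation between the returned frames comes from the motion vector.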
Synthesize by transforming segments
Find segments → Transform segments
Input random motion vector $z$
Movement can be synthesized through convolution
Image segments, Convolution

Kernel (no motion)    Kernel (one-pixel diagonal shift)
0 0 0                 0 0 1
0 1 0                 0 0 0
0 0 0                 0 0 0
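The effect of the two kernels can be checked with a small pure-Python convolution. This is an illustrative sketch, not the slides' implementation: it assumes true convolution (kernel flipped) with zero padding, under which content moves toward the kernel's nonzero entry. Frameworks that implement cross-correlation would move it the opposite way. The names `conv2d_same`, `IDENTITY`, and `SHIFT_UP_RIGHT` are made up for the example.

```python
def conv2d_same(image, kernel):
    """True 2D convolution (flipped kernel) with zero padding.
    The output has the same shape as the input."""
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    ch, cw = kh // 2, kw // 2  # kernel centre
    out = [[0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            total = 0
            for a in range(kh):
                for b in range(kw):
                    si, sj = i - (a - ch), j - (b - cw)
                    if 0 <= si < h and 0 <= sj < w:
                        total += kernel[a][b] * image[si][sj]
            out[i][j] = total
    return out

IDENTITY = [[0, 0, 0],
            [0, 1, 0],
            [0, 0, 0]]

SHIFT_UP_RIGHT = [[0, 0, 1],
                  [0, 0, 0],
                  [0, 0, 0]]

# A segment with a single active pixel at row 2, column 1.
segment = [[0, 0, 0, 0],
           [0, 0, 0, 0],
           [0, 1, 0, 0],
           [0, 0, 0, 0]]

unchanged = conv2d_same(segment, IDENTITY)
moved = conv2d_same(segment, SHIFT_UP_RIGHT)  # pixel lands at row 1, column 2
```

A kernel with a single 1 translates the segment by a fixed offset; a kernel with weight spread over several entries would blend several translations.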
Applying motion to each segment
Image segments
Motion kernels
Motion vector $z$ → Decoding net → Motion kernels
The decoding network generates a motion kernel for each corresponding segment
[Brabandere et al. 2016] [Finn et al. 2016]
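Applying one kernel per segment can be sketched as below. This is a simplified illustration under stated assumptions: segments are small 2D lists, the kernels are given by hand rather than produced by a decoding network, convolution is true convolution with zero padding, and the transformed segments are merged by pixel-wise summation, which is a simplification chosen for the example. All names are hypothetical.

```python
def convolve(image, kernel):
    """Zero-padded true 2D convolution; the kernel must have odd height and width."""
    h, w = len(image), len(image[0])
    ch, cw = len(kernel) // 2, len(kernel[0]) // 2
    out = [[0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            for a, row in enumerate(kernel):
                for b, weight in enumerate(row):
                    si, sj = i - (a - ch), j - (b - cw)
                    if weight and 0 <= si < h and 0 <= sj < w:
                        out[i][j] += weight * image[si][sj]
    return out

def cross_convolve(segments, kernels):
    """Convolve each segment with its own kernel (one kernel per segment)."""
    if len(segments) != len(kernels):
        raise ValueError("need exactly one kernel per segment")
    return [convolve(s, k) for s, k in zip(segments, kernels)]

def compose(layers):
    """Sum the transformed segments pixel-wise (simplifying assumption)."""
    return [[sum(vals) for vals in zip(*rows)] for rows in zip(*layers)]

STAY = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
MOVE_LEFT = [[0, 0, 0], [1, 0, 0], [0, 0, 0]]

segment_a = [[0, 0, 0], [1, 0, 0], [0, 0, 0]]  # stays in place
segment_b = [[0, 0, 0], [0, 0, 0], [0, 0, 2]]  # moves one pixel left

frame = compose(cross_convolve([segment_a, segment_b], [STAY, MOVE_LEFT]))
```

Because each segment has its own kernel, different parts of the image can move differently within the same synthesized frame.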
Input frame + Motion vector $z$ → Synthesis network → Future frame
What is encoded in the motion vector?
Each dimension encodes a type of motion
Encoding network → Motion vector $z$
Example: upward motion when changing this dimension
Example: leg motion when changing this dimension
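Probing a single dimension of the motion vector can be sketched as a simple traversal: hold all entries fixed and sweep one of them. The helper below only builds the list of motion vectors; passing each one, together with the same input frame, to the synthesis network is left out. The function name and the example values are illustrative assumptions.

```python
def traverse_dimension(z, dim, values):
    """Return one copy of z per value, with entry `dim` replaced
    and all other entries left unchanged."""
    if not 0 <= dim < len(z):
        raise IndexError("dim is outside the motion vector")
    variants = []
    for value in values:
        variant = list(z)  # copy so the base vector is not modified
        variant[dim] = value
        variants.append(variant)
    return variants

base_z = [0.0, 0.0, 0.0]
variants = traverse_dimension(base_z, dim=1, values=[-2.0, 0.0, 2.0])
```

If the frames synthesized from these variants differ only in one kind of movement, that dimension can be read as encoding that type of motion.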
Results: toy example
• Simulated shapes
• Training samples
Network automatically detects segments
Input → Learned segments: Triangles, Circles
Network learns the correlation between appearance and motion
Input → Sampled next frame
Ground truth distribution, Sample distribution
Results: real-world images
Input → Sampled future frames
Challenge: large motion
Input → Two sampled future frames
Artifacts appear when motion is large
Mechanical Turk study to assess synthesis quality

Method                     Labeled as real
Baseline: Transfer flow    25.5%
Our method                 31.3%

Ideal synthesis algorithm achieves 50%
Contributions
• Sample multiple future frames that are consistent with the input
• Synthesize frames by transforming segments
• Learn a motion representation without supervision
…