What Makes a Video a Video: Analyzing Temporal Information in Video Understanding Models and Datasets

De-An Huang¹, Vignesh Ramanathan², Dhruv Mahajan², Lorenzo Torresani², Manohar Paluri², Li Fei-Fei¹, Juan Carlos Niebles¹
¹Stanford University, ²Facebook

Motivation
➢ Videos contain much more than just the images
➢ Still missing an explicit analysis of temporal information
[Figure: (a) Original Video; (b) Video matching C3D deep features of (a)]

Approach Overview
➢ Analyze the video model trained on a dataset (fixed weights)
➢ Propose three frameworks to ablate temporal info from test video
[Figure: Test Video -> Subsampling -> Frame Selector -> Selected Frame -> Temporal Generator -> Generated Video -> C3D trained on UCF101 (Conv 1 to Conv 5)]
[Figure: bar chart, axis 0 to 90; bars: Original Video, No Temporal, Generator, Selector; annotation: 6%]

Naïve Subsampling
➢ Single frame is just an image and contains no temporal information
[Figure: Test Video -> Middle Frame -> Replicate Frames -> Replicated Frames -> C3D trained on UCF101 (Conv 1 to Conv 5)]

Class-Agnostic Temporal Generator
➢ Temporal Dist Shift: Model has not seen "static videos" in training
➢ Generate a video from the frame to bridge the distribution shift but without using any "real" temporal information
➢ Learning the Temporal Generator: The video generated from the image should be perceptually similar to the original video for the model
[Figure: Test Video -> Middle Frame -> Temporal Generator -> Generated Video -> C3D trained on UCF101 (Conv 1 to Conv 5)]
[Figure: Input Video -> Subsampling -> Selected Frame -> Temporal Generator -> Generated Video; Input Video and Generated Video both pass through the Video Model (C3D), compared with losses $\ell_1, \ell_2, \ell_3, \ell_4, \ell_5$]

Motion-Invariant Frame Selector
➢ Key frame for us to recognize the action without temporal information
➢ $\phi(x)$: Estimate of frame quality
➢ $\phi(x_i) = \max_c s_c(x_i)$, where $s_c(x_i)$ is the score of class $c$
[Figure: Input Video -> Sub-sampled Frame Candidates $x_1, \ldots, x_i, \ldots, x_N$ -> $\phi(\cdot)$ applied to each candidate -> argmax]

Analysis
➢ Oracle Key Frames (Upper Bound): select the frames that can give correct prediction
➢ Analyzing Motion Information
➢ 40% of UCF101 and 35% of Kinetics classes do not need motion
➢ Temporal Generator:
➢ Frame Selection:
➢ Oracle Frame Selection
[Figure: JuggleBalls (Original Vid, Temp. Gen.); PlayFlute (Original Vid, Temp. Gen.)]
[Figure: Sled Dog Racing, Ice Skating, Boxing speedbag, Ski Jumping]
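The frame-level operations described on the poster can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation. All function names are hypothetical. The per-frame class scores are assumed to be already available as an array (the poster does not say how they are computed). The per-layer loss is assumed to be a mean squared difference, which is only one possible way to measure how similar two videos look to the model.

```python
import numpy as np


def frame_quality(scores):
    """Frame quality as the maximum class score of each candidate frame.

    scores: array of shape (num_frames, num_classes), assumed precomputed.
    """
    return scores.max(axis=1)


def select_frame(scores):
    """Index of the candidate frame with the highest quality (argmax)."""
    return int(np.argmax(frame_quality(scores)))


def oracle_frames(scores, label):
    """Indices of candidate frames whose top-scoring class is the true label."""
    return np.flatnonzero(scores.argmax(axis=1) == label)


def replicate_frame(frame, clip_len):
    """Naive static clip: repeat one (H, W, C) frame clip_len times."""
    return np.repeat(frame[np.newaxis], clip_len, axis=0)


def perceptual_loss(feats_a, feats_b):
    """Sum over layers of the mean squared feature difference.

    feats_a, feats_b: lists of per-layer feature arrays (e.g. Conv 1 to Conv 5).
    The squared-error form is an assumption made for this sketch.
    """
    return float(sum(np.mean((a - b) ** 2) for a, b in zip(feats_a, feats_b)))


# Toy example: 3 candidate frames, 3 classes.
scores = np.array([[0.2, 0.5, 0.3],
                   [0.1, 0.1, 0.8],
                   [0.4, 0.4, 0.2]])
best = select_frame(scores)                       # frame with highest max score
correct = oracle_frames(scores, label=2)          # frames predicting class 2
clip = replicate_frame(np.zeros((4, 4, 3)), 16)   # static 16-frame clip
```

The selector never looks at neighboring frames, so the chosen frame carries no temporal information. The oracle variant uses the ground-truth label and therefore only serves as an upper bound.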