Michael J. Black, February 2002
Learning the Appearance and Motion of People in Video
(The Science of Silly Walks)
Hedvig Sidenbladh, Defense Research Institute, Stockholm, Sweden
http://www.nada.kth.se/~hedvig
Michael J. Black, Department of Computer Science, Brown University
http://www.cs.brown.edu/~black
Collaborators
David Fleet, Xerox PARC
Nancy Pollard, Brown University
Dirk Ormoneit and Trevor Hastie Dept. of Statistics, Stanford University
Allan Jepson, University of Toronto
The (Silly) Problem
Unsolved without manual intervention.
Inferring 3D Human Motion
* No special clothing
* Monocular, grayscale sequences (archival data)
* Unknown, cluttered environment
* Incremental estimation
* Infer 3D human motion from 2D image properties.
Why is it Hard?
Low contrast
Self occlusion
Singularities in viewing direction
Unusual viewpoints
Ambiguous matches
Clothing and Lighting
Large Motions
Limbs move rapidly with respect to their width.
Non-linear dynamics.
Motion blur.
Ambiguities
Where is the leg?
Which leg is in front?
Ambiguities
Accidental alignment
Ambiguities
Whose legs are whose?
Occlusion
Requirements
1. Represent uncertainty and multiple hypotheses.
2. Model non-linear dynamics of the body.
3. Exploit image cues in a robust fashion.
4. Integrate information over time.
5. Combine multiple image cues.
Simple Body Model
* Limbs are truncated cones
* Parameter vector of joint angles and angular velocities = $\phi$
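A minimal sketch of how such a body model could be held in code. The class and function names, the limb dimensions, the planar (2D) kinematic chain, and the noise-free constant-velocity update are all assumptions made for illustration; the model on the slide is a full 3D articulated body.

```python
import math
from dataclasses import dataclass


@dataclass
class TruncatedCone:
    """One limb, modeled as a cone cut off at both ends (illustrative fields)."""
    length: float       # distance between the two ends of the limb
    radius_near: float  # radius at the end closer to the torso
    radius_far: float   # radius at the far end

    def radius_at(self, s: float) -> float:
        """Radius at fraction s in [0, 1] along the limb axis."""
        return (1.0 - s) * self.radius_near + s * self.radius_far


def planar_chain_endpoints(limbs, joint_angles):
    """Planar forward kinematics; each angle is relative to the parent limb."""
    x, y, heading = 0.0, 0.0, 0.0
    points = [(x, y)]
    for limb, angle in zip(limbs, joint_angles):
        heading += angle
        x += limb.length * math.cos(heading)
        y += limb.length * math.sin(heading)
        points.append((x, y))
    return points


def predict_constant_velocity(angles, velocities, dt=1.0):
    """Advance joint angles by their angular velocities (no noise term)."""
    return [a + v * dt for a, v in zip(angles, velocities)]


# Hypothetical leg: thigh and calf, dimensions in meters.
thigh = TruncatedCone(length=0.45, radius_near=0.08, radius_far=0.05)
calf = TruncatedCone(length=0.40, radius_near=0.05, radius_far=0.03)
straight_leg = planar_chain_endpoints([thigh, calf], [0.0, 0.0])
```

The state that gets tracked is then just the list of joint angles together with their velocities; the cone dimensions stay fixed per person.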
$p(\text{model} \mid \text{cues}) = \dfrac{p(\text{cues} \mid \text{model})\, p(\text{model})}{p(\text{cues})}$
1. Need a constraining likelihood model that is also invariant to variations in human appearance.
2. Need a prior model of how people move.
3. Need an effective way to explore the model space (very high dimensional) and represent ambiguities.
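The way a likelihood and a prior combine can be sketched on a finite set of hypotheses. The function name, the two hypothesis labels, and all numbers below are made up for illustration; the real model space is continuous and very high dimensional, which is why it cannot be enumerated like this.

```python
def posterior(prior, likelihood):
    """Bayes' rule over a finite set of hypotheses.

    prior[h]      = p(model = h)
    likelihood[h] = p(cues | model = h)
    Returns p(model = h | cues) for every h.
    """
    unnormalized = {h: likelihood[h] * prior[h] for h in prior}
    evidence = sum(unnormalized.values())  # this is p(cues)
    if evidence == 0.0:
        raise ValueError("the cues have zero probability under every hypothesis")
    return {h: w / evidence for h, w in unnormalized.items()}


# Hypothetical numbers: two candidate interpretations of an ambiguous leg.
prior = {"left_leg_in_front": 0.5, "right_leg_in_front": 0.5}
likelihood = {"left_leg_in_front": 0.3, "right_leg_in_front": 0.1}
post = posterior(prior, likelihood)
```

With an uninformative prior the posterior simply follows the likelihood; a prior learned from how people actually move would shift the balance.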
Key Idea #4 (Represent Ambiguity)
Samples from a distribution over 3D poses.
* Represent a multi-modal posterior probability distribution over model parameters
 - sampled representation
 - each sample is a pose and its probability
 - predict over time using a particle filtering approach.
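A small sketch of why a sampled representation helps with ambiguity. The single joint angle, the two modes at -0.6 and +0.6 radians, and the helper names are assumptions for illustration only.

```python
import random


def weighted_mean(samples, weights):
    """Mean of a distribution represented by weighted samples."""
    return sum(w * s for s, w in zip(samples, weights)) / sum(weights)


def resample(samples, weights, n, rng):
    """Draw n samples with probability proportional to their weights."""
    return rng.choices(samples, weights=weights, k=n)


# Hypothetical bimodal posterior over one joint angle (radians):
# two interpretations of the same image, near -0.6 and near +0.6.
rng = random.Random(0)
samples = ([rng.gauss(-0.6, 0.05) for _ in range(500)]
           + [rng.gauss(+0.6, 0.05) for _ in range(500)])
weights = [1.0 / len(samples)] * len(samples)

mean = weighted_mean(samples, weights)
# Count the samples that lie anywhere near the mean.
near_mean = sum(1 for s in samples if abs(s - mean) < 0.2)

# If later evidence rules out the positive mode, resampling keeps the other.
evidence_weights = [1.0 if s < 0 else 0.0 for s in samples]
survivors = resample(samples, evidence_weights, 200, rng)
```

The mean falls between the two modes, where no sample lies, so a single Gaussian summary would describe a pose that neither interpretation supports. The sample set keeps both until the evidence decides.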
Particle Filtering
* large literature (Gordon et al '93, Isard & Blake '96,…)
* non-Gaussian posterior approximated by N discrete samples
* explicitly represent the ambiguities
* exploit stochastic sampling for tracking
$\phi_t^{(n)}, \quad n = 1, \ldots, N \quad (N \approx 10^3)$
Particle Filter
Posterior $p(\phi_{t-1} \mid I_{t-1})$ → sample → Temporal dynamics $p(\phi_t \mid \phi_{t-1})$ → sample → Likelihood $p(I_t \mid \phi_t)$ → normalize → Posterior $p(\phi_t \mid I_t)$
Here $\phi_t$ denotes the pose parameters and $I_t$ the image at time $t$.
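The sample, sample, normalize cycle can be sketched on a scalar state. This is a generic bootstrap-style particle filter, not the tracker from the talk: the Gaussian random-walk dynamics, the Gaussian likelihood, the noise levels, and the simulated drifting target are placeholder assumptions.

```python
import math
import random


def particle_filter_step(particles, weights, observation, rng,
                         process_std=0.1, obs_std=0.2):
    """One sample -> sample -> normalize cycle on a scalar state."""
    n = len(particles)
    # 1. Sample from the previous posterior p(phi_{t-1} | I_{t-1}).
    resampled = rng.choices(particles, weights=weights, k=n)
    # 2. Sample from the temporal dynamics p(phi_t | phi_{t-1})
    #    (assumed here to be a Gaussian random walk).
    predicted = [p + rng.gauss(0.0, process_std) for p in resampled]
    # 3. Weight each sample by the likelihood p(I_t | phi_t)
    #    (assumed here to be Gaussian around the observation) ...
    new_weights = [math.exp(-0.5 * ((observation - p) / obs_std) ** 2)
                   for p in predicted]
    # ... and normalize to obtain the posterior p(phi_t | I_t).
    total = sum(new_weights)
    if total == 0.0:  # every particle is far from the observation
        new_weights = [1.0 / n] * n
    else:
        new_weights = [w / total for w in new_weights]
    return predicted, new_weights


rng = random.Random(0)
particles = [rng.uniform(-2.0, 2.0) for _ in range(1000)]
weights = [1.0 / len(particles)] * len(particles)

true_state = 0.0
for t in range(20):
    true_state += 0.05  # hypothetical slow drift of the target
    observation = true_state + rng.gauss(0.0, 0.2)
    particles, weights = particle_filter_step(particles, weights,
                                              observation, rng)

estimate = sum(p * w for p, w in zip(particles, weights))
```

In a real tracker the state would be the full pose vector, the dynamics would come from a learned motion prior, and the likelihood would be computed from image cues; only the three-step structure carries over.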
Particle Filter
Isard & Blake '96
Tracking with Occlusion
1500 samples, ~2 minutes/frame.
Moving Camera
1500 samples, ~2 minutes/frame.
Stochastic 3D Tracking
* 2500 samples (now down as low as 300 with the new prior).
Conclusions
Inferring human motion, silly or not, from video is challenging.
We have tackled three important parts of the problem:
1. Probabilistically modeling human appearance in a generic, yet useful, way.
2. Representing the range of possible motions using techniques from texture modeling.
3. Dealing with ambiguities and non-linearities using particle filtering for Bayesian inference.
Ongoing and Future Work
Better search algorithms
* Hybrid Monte Carlo tracker (Choo and Fleet '01)
* Covariance scaled sampling (Sminchisescu & Triggs '01)
Richer prior models of motion.
Estimate background motion.
Statistical models of color and texture.
Automatic initialization.
Training data and likelihood models to be made available on the web.