Multiple Targets Tracking in World Coordinate with a Single, Minimally Calibrated Camera
Wongun Choi, Silvio Savarese
Department of Electrical Engineering and Computer Science, University of Michigan

Overview
• Objective: robustly track multiple targets in 3D.
• Modeling interaction between targets.
• MCMC particle filtering for efficient estimation.
• Ground feature points for robust camera estimation.
• Joint estimation of camera and targets' motion.
[Figure: frame 657. Image view with projected targets, 2D trajectories and KLT features; top-down view with 3D trajectories and ground plane features.]

Challenges
• Unknown (uncalibrated) camera motion.
• Background clutter.
• Detector failure (missed detections).
• Occlusion between targets.

Joint Model
[Figure: world illustration and graphical representation of the model, linking the camera Θ, target states Z, ground features G and the observations X and τ over time steps t-1, t and t+1.]
• Sequential Bayesian formulation ($\chi^{t}$: observations up to time t; $\chi_t$: observations at time t)
$$P(\Omega_t \mid \chi^{t}) \propto P(\Omega_t, \chi_t \mid \chi^{t-1}) = P(\chi_t \mid \Omega_t) \int P(\Omega_t \mid \Omega_{t-1})\, P(\Omega_{t-1} \mid \chi^{t-1})\, d\Omega_{t-1}$$
$$P(\chi_t \mid \Omega_t) = P(X_t, Y_t \mid Z_t, \Theta_t)\, P(\tau_t \mid G_t, \Theta_t)$$
$$P(\Omega_t \mid \Omega_{t-1}) = P(Z_t \mid Z_{t-1})\, P(\Theta_t \mid \Theta_{t-1})\, P(G_t \mid G_{t-1})$$
• Motion models
- Each target's individual motion is modeled as a first-order linear dynamic model.
- Ground features are assumed to be static.
- The camera is assumed to move along the viewing direction.
• Observation models
- Simplified camera projection function [4].
• Indicator variable
- Target: indicates whether the target is a valid human or not.
- Ground features: indicates whether the feature is on the static ground plane.

Interaction Model
[Figure: graphical model linking the states of targets i, j and k, Z_i, Z_j and Z_k, across time steps t-1, t and t+1.]
• Pairwise interaction model
- Targets' motions are dependent.
$$P(Z_t \mid Z_{t-1}) = \prod_{i<j} \psi(Z_{it}, Z_{jt}; \beta_{ijt}) \prod_{i<j} P(\beta_{ijt} \mid \beta_{ij(t-1)}) \prod_{i=1}^{N} P(Z_{it} \mid Z_{i(t-1)})$$
• Two exclusive modes of interaction
- Repulsion: people want to keep a distance from others.
- Group interaction: people moving as a group tend to move together.
- Switch variables $\beta_{ijt}$ select one of the above models.
$$\psi(Z_{it}, Z_{jt}; \beta_{ijt}) = \begin{cases} \psi_g(Z_{it}, Z_{jt}), & \text{if } \beta_{ijt} = 1 \\ \psi_r(Z_{it}, Z_{jt}), & \text{otherwise} \end{cases}$$

Estimation: MCMC Particle Filtering
• Motivation
- Non-linear and non-Gaussian density function.
- Direct sampling from a high-dimensional space is not efficient.
• Proposal distribution $Q(\Omega'_t; \Omega_t)$
- Choose one of the variables: camera parameter, target states, or ground features.
- Sample from a Gaussian distribution centered on the current sample.
- Randomly change the class of a target or a ground feature.
[Figure: one MCMC step between time t and time t+1. Randomly select one variable, draw a new random sample X' (human or non-human? how fast? where?), then accept or reject it (Metropolis-Hastings).]

Experimental Results
• Improved detection accuracy.
[Figure: recall vs. FPPI curves for Full Model, No Geometry, No Interaction and Baseline [9].]
Recall/FPPI on ETH dataset (table columns: Method, Seq.#2, Seq.#3; each method reports Recall and FPPI; values listed in the order they appear)
ETH [1]        0.498 0.404 0.338   0.781 0.431 0.262   0.673 0.616 0.484   2.772 1.593 0.638
Our Algorithm  0.556 0.541 0.519   0.792 0.442 0.267   0.339 0.421 0.497   2.792 1.608 0.647
• Example tracking results on ETH sequence.
• Accurate localization of camera.
[Figures: example frames with target IDs and top-down trajectory plots; panels labeled Frame 6, 96, 176, 600, 995, 1005 and 1064.]
• Effect of interaction model.
[Figure: frames 479 and 533 with target IDs and top-down plots; one pair of plots is labeled Group Interaction.]
• Effect of geometric features.
[Figure: frames 89 and 223 with target IDs and top-down plots.]

Conclusion
• Joint estimation of camera and targets' state helps improve tracking performance.
• Other geometric cues further stabilize the system.
• Better target association can be achieved by modeling interaction between targets.

References
[1] Ess, A., Leibe, B., Schindler, K., van Gool, L.: A mobile vision system for robust multi-person tracking. In: CVPR (2008)
[2] Khan, Z., Balch, T., Dellaert, F.: MCMC-based particle filtering for tracking a variable number of interacting targets. PAMI (2005)
[3] Pellegrini, S., Ess, A., Schindler, K., van Gool, L.: You'll never walk alone: Modeling social behavior for multi-target tracking. In: ICCV (2009)
[4] Hoiem, D., Efros, A., Hebert, M.: Putting objects in perspective. In: CVPR (2006)
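
Sketch: one Metropolis-Hastings sampling pass
The Metropolis-Hastings step summarized on the poster (randomly select one variable, draw a new sample, accept or reject) can be illustrated with a small toy sampler. This is only a sketch under stated assumptions, not the authors' implementation: it covers target positions and pairwise switch variables only (no camera or ground features), treats a single time step with a fixed motion prediction instead of integrating over the previous sample set, assumes a uniform prior on the switch variables, and uses made-up potentials and noise scales. All function names are hypothetical.

```python
import math
import random

# All potentials, noise scales and names here are illustrative assumptions.

def log_gauss(p, q, sigma):
    """Unnormalized isotropic Gaussian log-density of point p around q."""
    return -0.5 * ((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) / sigma ** 2

def log_psi(zi, zj, beta):
    """Pairwise log-potential: group mode if beta == 1, repulsion otherwise."""
    d = math.dist(zi, zj)
    if beta == 1:
        return -0.5 * ((d - 1.0) / 0.5) ** 2  # hypothetical: stay about 1 unit apart
    return -1.0 / (d * d + 1e-6)              # hypothetical: penalize closeness

def log_posterior(state, predicted, observed):
    """Unnormalized log-posterior: motion, observation and interaction terms."""
    z, beta = state["z"], state["beta"]
    lp = 0.0
    for i, zi in enumerate(z):
        lp += log_gauss(zi, predicted[i], 0.5)  # motion prior around the prediction
        lp += log_gauss(observed[i], zi, 0.3)   # observation likelihood
    for (i, j), b in beta.items():
        lp += log_psi(z[i], z[j], b)
    return lp

def propose(state, rng, step=0.2):
    """Pick one variable at random; perturb a position or flip a switch."""
    new = {"z": list(state["z"]), "beta": dict(state["beta"])}
    choices = [("z", i) for i in range(len(new["z"]))]
    choices += [("beta", k) for k in new["beta"]]
    kind, key = rng.choice(choices)
    if kind == "z":
        x, y = new["z"][key]
        new["z"][key] = (x + rng.gauss(0.0, step), y + rng.gauss(0.0, step))
    else:
        new["beta"][key] = 1 - new["beta"][key]
    return new

def run_chain(predicted, observed, n_steps=5000, burn_in=1000, seed=0):
    """Run the chain and return the states visited after burn-in."""
    rng = random.Random(seed)
    n = len(predicted)
    state = {"z": list(predicted),
             "beta": {(i, j): 0 for i in range(n) for j in range(i + 1, n)}}
    lp = log_posterior(state, predicted, observed)
    samples = []
    for step in range(n_steps):
        cand = propose(state, rng)
        lp_cand = log_posterior(cand, predicted, observed)
        # Both proposal moves are symmetric, so the Metropolis-Hastings
        # acceptance probability reduces to the posterior ratio.
        if rng.random() < math.exp(min(0.0, lp_cand - lp)):
            state, lp = cand, lp_cand
        if step >= burn_in:
            samples.append(state)
    return samples

if __name__ == "__main__":
    predicted = [(0.0, 0.0), (3.0, 0.0)]
    observed = [(0.1, 0.0), (2.9, 0.1)]
    samples = run_chain(predicted, observed)
    mean_x0 = sum(s["z"][0][0] for s in samples) / len(samples)
    print(len(samples), round(mean_x0, 2))
```

Updating one variable per step keeps each proposal low-dimensional, which is the motivation the poster gives for avoiding direct sampling from the full state space.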