Face Detection in Crowded Images
Todd Wittman
Math 8600: Image Analysis
Prof. Jackie Shen
May 2002
Jan 20, 2016
Face Detection
Ultimate Goal: Detect and locate human face(s) in a crowded color image.
Short-term Goal: Determine if a “mug shot” contains a human face (YES or NO).
Neural Network Face Detector
Input: Color image.
Output: P(w|x) = probability that image contains a face. (Only 1 output node.)
Training targets: set P = 1 for a face, P = 0 for no face.
3 Possible Outputs
•P > 0.5 FACE
•P < 0.5 NOT FACE
•P = 0.5 DON’T KNOW
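The three-way decision rule above can be sketched in a few lines of Python. The function name `classify` and the label strings are illustrative choices, not part of the original detector.

```python
def classify(p):
    """Map the single network output P(w|x) to one of the three outcomes."""
    if p > 0.5:
        return "FACE"
    if p < 0.5:
        return "NOT FACE"
    # Exactly 0.5: the network has no preference either way.
    return "DON'T KNOW"
```

In floating point an output of exactly 0.5 is rare, so in practice one might treat a small band around 0.5 as "don't know"; the slides do not specify such a band.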
Color-Based Approach
For each color image, prepare a color histogram in the YES color space.
Y = 0.253R + 0.684G + 0.063B
E = 0.5R - 0.5G
S = 0.25R + 0.25G - 0.5B
[Figure: histograms of the Y, E, and S channels]
Train the neural network by feeding it many color histograms, telling it which are faces (1) and which are not (0).
Idea: Neural network will learn which bins represent flesh tones. (Network develops an internal “chroma chart”.)
Note: Technically, this is a flesh detector, not a face detector.
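As a minimal sketch of the histogram step, the Python below converts RGB pixels to YES using the three formulas above and builds one normalized histogram per channel. The number of bins (16), the use of three separate 1-D histograms, and the 0-255 input range are assumptions; the slides do not state how the histograms were binned.

```python
def rgb_to_yes(r, g, b):
    """Convert one RGB pixel to the YES color space (formulas from the slides)."""
    y = 0.253 * r + 0.684 * g + 0.063 * b
    e = 0.5 * r - 0.5 * g
    s = 0.25 * r + 0.25 * g - 0.5 * b
    return y, e, s

# Channel ranges for 0..255 RGB input: Y in [0, 255], E and S in [-127.5, 127.5].
RANGES = [(0.0, 255.0), (-127.5, 127.5), (-127.5, 127.5)]

def yes_histograms(pixels, bins=16):
    """Return three normalized histograms (Y, E, S) for a list of RGB tuples.

    `bins` is a hypothetical choice, not a value taken from the slides.
    """
    hists = [[0.0] * bins for _ in range(3)]
    for r, g, b in pixels:
        for channel, value in enumerate(rgb_to_yes(r, g, b)):
            lo, hi = RANGES[channel]
            index = int((value - lo) / (hi - lo) * bins)
            index = min(max(index, 0), bins - 1)  # clamp the top edge into the last bin
            hists[channel][index] += 1.0
    count = float(len(pixels))
    return [[v / count for v in h] for h in hists]
```

Because each histogram is normalized by the pixel count, the feature vector does not depend on the size or shape of the region it was computed from.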
Training Data
100 Faces: Mug shots were chosen to represent different flesh tones and gamma values.
100 Non-Faces: Objects, landscapes, animals, and computer-generated random images not containing flesh tones.
Results
Training for 100 iterations took 13 hours.
7 of 100 training faces were misclassified.
2 of 100 training non-faces were misclassified.
Network performed favorably on test images.
[Figure: example images labeled 1.0, 0.0, and 0.5]
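The training counts above translate into error rates as follows. This is only arithmetic on the reported numbers; the helper name `error_rate` is illustrative.

```python
def error_rate(misclassified, total):
    """Fraction of examples that were misclassified."""
    return misclassified / total

face_error = error_rate(7, 100)               # faces rejected by the network
non_face_error = error_rate(2, 100)           # non-faces accepted as faces
overall_error = error_rate(7 + 2, 100 + 100)  # all 200 training images
```

That is 7% of faces, 2% of non-faces, and 4.5% of the training set overall. These are training-set figures, so they say nothing quantitative about performance on new images.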
Face Detection in Crowded Images
Now that we have a face detector for mug-shots, how do we detect faces in a general image that could contain many objects and multiple faces?
Popular Approach: Windowing
Create a small box. Run the face detector in that box. Move the box over one pixel. Repeat.
Drawback: Presupposes the size of the face in the image.
Our Approach: Segmentation
Segment the image into its connected components. Run the face detector on each component.
Advantage: The color histogram is shape and size invariant!
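To make the windowing drawback concrete, here is a small Python sketch of the "move the box over one pixel" loop. The function name `window_positions` and the square box are assumptions for illustration.

```python
def window_positions(width, height, box):
    """Yield the (left, top) corner of every box-by-box window, stepping one pixel."""
    for top in range(height - box + 1):
        for left in range(width - box + 1):
            yield left, top
```

The `box` argument has to be fixed before the scan starts, which is exactly the size assumption noted above, and the detector runs (width - box + 1) × (height - box + 1) times per box size. A segmentation approach runs the detector once per component instead.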
Shape Recovery by Diffusion Generated Motion
Jawerth-Lin: Alternately sharpen and diffuse a region, propagating the front towards the object boundaries.
Initialize $\chi_\Omega$ to have 1’s on the image and 0’s on the border.
Update
$$\chi_\Omega(x) = \begin{cases} 1 & \text{if } G_1 * \chi_\Omega \ge 1 - \dfrac{|\nabla G_\sigma * I|^2}{T^2} \\ 0 & \text{otherwise} \end{cases}$$
where G is Gaussian
$$G_\sigma(x) = \frac{1}{(4\pi\sigma)^2} \exp\left(-\frac{x^2}{4\sigma^2}\right)$$
Note: Instead of convolution, we can apply a digital Gaussian:
$$K(x) = \frac{1}{16} \begin{bmatrix} 1 & 2 & 1 \\ 2 & 4 & 2 \\ 1 & 2 & 1 \end{bmatrix}$$
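One property of this digital Gaussian worth noting is that it is separable: the 3×3 weights are the outer product of [1, 2, 1] with itself, so the blur can be applied as a row pass followed by a column pass. The Python sketch below checks that and applies the blur; the nearest-neighbor handling at the image edge is an assumption, since the slides do not say how borders were treated.

```python
WEIGHTS_1D = [1, 2, 1]
WEIGHTS_2D = [[1, 2, 1], [2, 4, 2], [1, 2, 1]]

# Outer product of the 1-D weights; equals WEIGHTS_2D, which shows separability.
OUTER = [[a * b for b in WEIGHTS_1D] for a in WEIGHTS_1D]

def blur_1d(values):
    """Apply [1, 2, 1] / 4 along a list, repeating the end values at the edges."""
    n = len(values)
    return [
        (values[max(i - 1, 0)] + 2 * values[i] + values[min(i + 1, n - 1)]) / 4.0
        for i in range(n)
    ]

def blur_2d(grid):
    """Apply the 3x3 digital Gaussian as a row pass followed by a column pass."""
    rows_done = [blur_1d(row) for row in grid]
    cols_done = [blur_1d(list(col)) for col in zip(*rows_done)]
    return [list(row) for row in zip(*cols_done)]
```

Blurring a single pixel of value 16 in an otherwise zero image reproduces the weights 1, 2, 1 / 2, 4, 2 / 1, 2, 1 around that pixel.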
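Why Does This Work?
$$G_1 * \chi_\Omega \ge 1 - \frac{|\nabla G_\sigma * I|^2}{T^2}$$
In smooth regions $|\nabla G_\sigma * I|^2 \approx 0$, so the RHS is very close to 1.
Near the boundary of the front, the G averages a 1 with the nearby zeros. So the LHS is < 1, the inequality fails, and those pixels are set to 0.
The front will propagate inward until we hit a jump in the image, where $|\nabla G_\sigma * I|^2 / T^2 > 1$. There the RHS is negative, so the inequality always holds and the front stops.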
Example: $\chi_\Omega$ with 1’s on the image and 0’s on the border.
0000000000
0111111110
0111111110
0000000000
Parameters: T = 30.0, m = 2, n = 2
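The update rule can be tried on a synthetic image with the Python sketch below. It is not the code used for the talk: it assumes the 3×3 digital Gaussian stands in for both G_1 and G_σ, central differences approximate the gradient, edge pixels repeat their nearest neighbor, and the names `smooth`, `edge_strength`, `segment`, and `square_image` are hypothetical.

```python
KERNEL = [[1, 2, 1], [2, 4, 2], [1, 2, 1]]  # digital Gaussian weights, sum 16

def smooth(grid):
    """Blur a 2-D list with the 3x3 digital Gaussian (nearest-neighbor edges)."""
    rows, cols = len(grid), len(grid[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            total = 0.0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr = min(max(r + dr, 0), rows - 1)
                    cc = min(max(c + dc, 0), cols - 1)
                    total += KERNEL[dr + 1][dc + 1] * grid[rr][cc]
            out[r][c] = total / 16.0
    return out

def edge_strength(image, T):
    """Approximate |grad(G * I)|^2 / T^2 with central differences."""
    s = smooth(image)
    rows, cols = len(s), len(s[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            gx = (s[r][min(c + 1, cols - 1)] - s[r][max(c - 1, 0)]) / 2.0
            gy = (s[min(r + 1, rows - 1)][c] - s[max(r - 1, 0)][c]) / 2.0
            out[r][c] = (gx * gx + gy * gy) / (T * T)
    return out

def segment(image, T=30.0, max_iters=200):
    """Iterate chi = 1 where G * chi >= 1 - edge, else 0, until nothing changes."""
    rows, cols = len(image), len(image[0])
    edge = edge_strength(image, T)
    # 1's on the image, 0's on the border.
    chi = [[0 if r in (0, rows - 1) or c in (0, cols - 1) else 1
            for c in range(cols)] for r in range(rows)]
    for _ in range(max_iters):
        blurred = smooth(chi)
        updated = [[1 if blurred[r][c] >= 1.0 - edge[r][c] else 0
                    for c in range(cols)] for r in range(rows)]
        if updated == chi:
            break
        chi = updated
    return chi

def square_image(size=16, lo=4, hi=12, value=200.0):
    """Synthetic test image: a bright square on a black background."""
    return [[value if lo <= r < hi and lo <= c < hi else 0.0
             for c in range(size)] for r in range(size)]
```

Calling `segment(square_image())` leaves 1's on the bright square and clears the flat background near the image border, which matches the behavior described for simple synthetic images.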
Conclusion: It didn’t work.
Although the segmentation and general face detection worked on simple synthetic images, it failed for general photographs.
Problems:
•Segmentation fails for noisy backgrounds, overlapping objects, and objects that intersect the border of the image.
•Segmentation would not necessarily pick out just the face (mug), but also the body that goes along with it. So the neural network would receive the colors of the clothes as well. (Perhaps this method can detect naked people though.)
Th-th-that’s All, Folks!