Face Detection in Crowded Images

Todd Wittman

Math 8600: Image Analysis

Prof. Jackie Shen

May 2002

Face Detection

Ultimate Goal: Detect and locate human face(s) in a crowded color image.

Short-term Goal: Determine whether a “mug shot” contains a human face (YES or NO).

Neural Network Face Detector

Input: Color image.

Output: P(w|x) = probability that image contains a face. (Only 1 output node.)

Target output: 1 for a face, 0 for no face.

[Figure: example images labeled P=1 and P=0.]

3 Possible Outputs

• P > 0.5: FACE
• P < 0.5: NOT FACE
• P = 0.5: DON’T KNOW
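The three-way decision rule can be written down directly. The following is a minimal sketch; the function name `classify` is hypothetical and not from the slides.

```python
def classify(p: float) -> str:
    """Map the single network output P to one of the three labels."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("P is a probability and must lie in [0, 1]")
    if p > 0.5:
        return "FACE"
    if p < 0.5:
        return "NOT FACE"
    return "DON'T KNOW"  # exactly 0.5: the network is undecided
```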

Color-Based Approach

For each color image, prepare a color histogram in the YES color space:

Y = 0.253R + 0.684G + 0.063B

E = 0.5R - 0.5G

S = 0.25R + 0.25G - 0.5B

[Figure: Y, E, and S channels.]

Train the neural network by feeding it many color histograms, telling it which are faces (1) and which are not (0).

Idea: Neural network will learn which bins represent flesh tones. (Network develops an internal “chroma chart”.)

Note: Technically, this is a flesh detector, not a face detector.
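A minimal sketch of the histogram step, using the Y, E, S formulas from the slide. The slides do not give the bin layout, so the 3-D binning, the `bins` parameter, the 8-bit input range, and the normalization to unit sum are all assumptions made for illustration.

```python
def rgb_to_yes(r, g, b):
    """Convert one RGB pixel to YES using the coefficients from the slide."""
    y = 0.253 * r + 0.684 * g + 0.063 * b
    e = 0.5 * r - 0.5 * g
    s = 0.25 * r + 0.25 * g - 0.5 * b
    return y, e, s


def yes_histogram(pixels, bins=8):
    """Build a normalized YES histogram from (R, G, B) tuples in 0..255.

    Assumed layout: bins**3 cells, flattened as (Y, E, S). For 8-bit input,
    Y lies in [0, 255] and E, S lie in [-127.5, 127.5].
    """
    def bin_index(value, low):
        idx = int((value - low) / 255.0 * bins)
        return min(max(idx, 0), bins - 1)  # clamp the top edge into the last bin

    hist = [0.0] * (bins ** 3)
    count = 0
    for r, g, b in pixels:
        y, e, s = rgb_to_yes(r, g, b)
        iy = bin_index(y, 0.0)
        ie = bin_index(e, -127.5)
        i_s = bin_index(s, -127.5)
        hist[(iy * bins + ie) * bins + i_s] += 1.0
        count += 1
    # Dividing by the pixel count makes the histogram independent of image size.
    return [h / count for h in hist] if count else hist
```

Because the histogram is normalized and ignores pixel positions, two regions with the same color mix but different sizes or shapes produce the same network input.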

Training Data

100 Faces: Mug shots were chosen to represent different flesh tones and gamma values.

100 Non-Faces: Objects, landscapes, animals, and computer-generated random images not containing flesh tones.

Results

Training for 100 iterations took 13 hours.

• 7 of 100 training faces were misclassified.
• 2 of 100 training non-faces were misclassified.

That is 9 of 200 training images, a 4.5% training error.

Network performed favorably on test images.

[Figure: example images with network outputs 1.0, 0.0, and 0.5.]

Face Detection in Crowded Image

Now that we have a face detector for mug shots, how do we detect faces in a general image that could contain many objects and multiple faces?

Popular Approach: Windowing

Create a small box. Run the face detector in that box. Move the box over one pixel. Repeat.

Drawback: Pre-assumes the size of the face in the image.

Our Approach: Segmentation

Segment the image into its connected components. Run the face detector on each component.

Advantage: Color histogram is shape and size invariant!
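The two strategies can be contrasted in a short sketch. Everything here is illustrative: the 4-connectivity, the binary foreground `mask`, and the `detector` callback (any function mapping a list of pixels to P) are assumptions, not details from the slides.

```python
from collections import deque


def sliding_windows(rows, cols, size):
    """Windowing: yield the top-left corner of every size x size box, one pixel apart."""
    for r in range(rows - size + 1):
        for c in range(cols - size + 1):
            yield r, c


def connected_components(mask):
    """Segmentation: group truthy cells of a 2-D mask into 4-connected components."""
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    components = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            seen[r][c] = True
            queue = deque([(r, c)])
            component = []
            while queue:  # breadth-first flood fill
                i, j = queue.popleft()
                component.append((i, j))
                for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ni, nj = i + di, j + dj
                    if (0 <= ni < rows and 0 <= nj < cols
                            and mask[ni][nj] and not seen[ni][nj]):
                        seen[ni][nj] = True
                        queue.append((ni, nj))
            components.append(component)
    return components


def detect_on_components(image, mask, detector):
    """Run the detector once per component, on that component's pixels only."""
    results = []
    for component in connected_components(mask):
        pixels = [image[i][j] for i, j in component]
        results.append((component, detector(pixels)))
    return results
```

Windowing calls the detector once per box position and needs a box size chosen in advance, while the segmentation route calls it once per component regardless of the component's size or shape.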

Shape Recovery by Diffusion Generated Motion

Jawerth-Lin: Alternately sharpen and diffuse a region, propagating the front towards the object boundaries.

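Initialize χ_Ω to have 1’s on the image and 0’s on the border.

Update:

\[
\chi_\Omega(x) =
\begin{cases}
1 & \text{if } G_1 * \chi_\Omega \ge 1 - \dfrac{|\nabla G_\sigma * I|^2}{T^2} \\
0 & \text{otherwise}
\end{cases}
\]

where G is Gaussian:

\[
G_\sigma(x) = \frac{1}{(4\pi\sigma)^2} \exp\left(-\frac{x^2}{4\sigma^2}\right)
\]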

Note: Instead of convolution, we can apply a digital Gaussian:

\[
K(x) = \frac{1}{16}
\begin{bmatrix}
1 & 2 & 1 \\
2 & 4 & 2 \\
1 & 2 & 1
\end{bmatrix}
\]
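A minimal sketch of the thresholding update with the digital Gaussian, for a grayscale image stored as a list of rows. Several choices are assumptions rather than details from the slides: the same 3x3 kernel stands in for both G_1 and G_σ, edge pixels are replicated at the image boundary, the gradient uses central differences, and the loop simply repeats the update (the alternating sharpen/diffuse schedule is not reproduced). All function names are hypothetical.

```python
# 3x3 digital Gaussian from the slide; the weights sum to 16.
KERNEL = ((1, 2, 1), (2, 4, 2), (1, 2, 1))


def smooth(img):
    """Apply the 3x3 kernel, replicating edge pixels (assumed padding)."""
    rows, cols = len(img), len(img[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            total = 0.0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr = min(max(r + dr, 0), rows - 1)
                    cc = min(max(c + dc, 0), cols - 1)
                    total += KERNEL[dr + 1][dc + 1] * img[rr][cc]
            out[r][c] = total / 16.0
    return out


def gradient_squared(img):
    """Squared gradient magnitude by central differences (assumed scheme)."""
    rows, cols = len(img), len(img[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            gx = (img[r][min(c + 1, cols - 1)] - img[r][max(c - 1, 0)]) / 2.0
            gy = (img[min(r + 1, rows - 1)][c] - img[max(r - 1, 0)][c]) / 2.0
            out[r][c] = gx * gx + gy * gy
    return out


def init_front(rows, cols):
    """Characteristic function with 1's on the image and 0's on the border."""
    return [[0 if r in (0, rows - 1) or c in (0, cols - 1) else 1
             for c in range(cols)] for r in range(rows)]


def update(chi, edge_term):
    """One step: keep a pixel where K * chi >= 1 - |grad(K * I)|^2 / T^2."""
    s = smooth(chi)
    return [[1 if s[r][c] >= 1.0 - edge_term[r][c] else 0
             for c in range(len(chi[0]))] for r in range(len(chi))]


def recover_shape(image, T=30.0, iterations=20):
    """Iterate the update from the border-initialized front."""
    g = gradient_squared(smooth(image))
    edge_term = [[v / (T * T) for v in row] for row in g]
    chi = init_front(len(image), len(image[0]))
    for _ in range(iterations):
        chi = update(chi, edge_term)
    return chi
```

On a constant image the front shrinks until nothing is left, while a bright block on a dark background keeps its interior, because the strong gradient around the block holds the surrounding pixels at 1.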

Why Does This Work?

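\[
G_1 * \chi_\Omega \ge 1 - \frac{|\nabla G_\sigma * I|^2}{T^2}
\]

In smooth regions, |∇G_σ ∗ I|² ≈ 0, so the RHS is very close to 1.

Near the boundary of the front, G averages a 1 with the nearby zeros, so the LHS is < 1. The condition fails there, and those pixels are set to 0.

The front will propagate inward until we hit a jump in the image, where |∇G_σ ∗ I|² / T² > 1. There the RHS is negative, so the condition always holds and the front stops.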
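As a rough worked check of this argument, assume G_1 is replaced by the 3x3 digital Gaussian with weights (1 2 1; 2 4 2; 1 2 1)/16, which is an assumption about the implementation. Consider a pixel on a straight vertical front: χ_Ω = 0 in the column to its left and 1 elsewhere in its 3x3 neighborhood. Only the middle and right columns contribute:

\[
(K * \chi_\Omega)(x) = \frac{(2 + 4 + 2) + (1 + 2 + 1)}{16} = \frac{12}{16} = 0.75
\]

\[
0.75 \ge 1 - \frac{|\nabla G_\sigma * I|^2}{T^2}
\quad\Longleftrightarrow\quad
|\nabla G_\sigma * I| \ge \frac{T}{2}
\]

Under these assumptions such a pixel survives only where the smoothed gradient magnitude is at least T/2, which is 15 for T = 30.0. In a smooth region the gradient is near 0, so the pixel is removed and the front advances.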

Example initialization of χ_Ω (1’s on the image, 0’s on the border):

0000000000
0111111110
0111111110
0000000000

Parameters: T = 30.0, m = 2, n = 2

Conclusion: It didn’t work.

Although the segmentation and general face detection worked on simple synthetic images, it failed for general photographs.

Problems:

• Segmentation fails for noisy backgrounds, overlapping objects, and objects that intersect the border of the image.
• Segmentation would not necessarily pick out just the face (mug), but also the body that goes along with it. So the neural network would receive the colors of the clothes as well. (Perhaps this method can detect naked people though.)

Th-th-that’s All, Folks!
