Deep Face Recognition - NVIDIA · HERTA Deep Face Recognition GPU-powered face recognition Offices in Barcelona, Madrid, London, Los Angeles Crowds, unconstrained Deep Face Recognition

Deep Face Recognition Challenges and Tips for Real-life Deployment


1 Deep Face Recognition

2 Public DBs

3 Public models

4 Managing imbalance

5 Embeddings

6 Conclusions

HERTA

www.hertasecurity.com

Deep Face Recognition

GPU-powered face recognition

Offices in Barcelona, Madrid, London, Los Angeles

Crowds, unconstrained

Deep Face Recognition

Large training DBs, >100K images, >1K subjects (Public DBs)

Public models (Inception, VGG, ResNet, SENet…), close to state-of-the-art

Typically, an embedding layer (yielding the facial descriptor) feeds a one-hot-encoded identity classification layer

Unconstrained (in-the-wild) environments
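A minimal NumPy sketch of the typical setup above, assuming a toy linear layer stands in for the CNN. The embedding (facial descriptor) feeds a classification layer trained against one-hot identity targets. At test time only the embedding is kept and compared, here with cosine similarity. The names `W_emb` and `W_cls` and all sizes are hypothetical.

```python
import numpy as np


def embed(features, W_emb):
    """Embedding layer: maps CNN features to the facial descriptor (toy linear stand-in)."""
    return features @ W_emb


def one_hot_cross_entropy(logits, labels):
    """Softmax cross-entropy of identity logits against one-hot identity targets."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    one_hot = np.eye(logits.shape[1])[labels]
    return float(-(one_hot * log_probs).sum(axis=1).mean())


def cosine(a, b):
    """Similarity between two descriptors at verification time."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


rng = np.random.default_rng(0)
n_ids, feat_dim, emb_dim = 10, 64, 16          # hypothetical sizes
W_emb = rng.normal(size=(feat_dim, emb_dim))   # embedding layer weights
W_cls = rng.normal(size=(emb_dim, n_ids))      # one-hot classification layer weights

features = rng.normal(size=(4, feat_dim))      # stand-in for CNN features of 4 images
labels = np.array([0, 3, 3, 7])                # identity labels
descriptors = embed(features, W_emb)
loss = one_hot_cross_entropy(descriptors @ W_cls, labels)  # training objective
score = cosine(descriptors[1], descriptors[2])             # test-time comparison
```

The classification layer is only a training device. Because the descriptor is compared directly, new identities can be enrolled without retraining.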



Public DBs

CWF: 10.6K subjects
LFW: 5.7K subjects
VGG Face: 2.6K subjects
VGG Face 2: 9.1K subjects
IJBB: 1.8K subjects

• Mostly celebrities: subjects overlap



Public DBs

LFW, CWF

• Highly imbalanced

[Figure: distribution per demographic group, and images per subject]
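Both kinds of imbalance summarised above can be measured directly from a DB's metadata. This is a minimal Python sketch. It assumes one `(subject_id, demographic_group)` record per image and exactly one group per subject. `imbalance_report` is a hypothetical helper.

```python
from collections import Counter


def imbalance_report(records):
    """records: one (subject_id, demographic_group) tuple per image (hypothetical format)."""
    records = list(records)
    # Images per subject: a long tail here means some identities dominate training.
    images_per_subject = Counter(sid for sid, _ in records)
    # Each subject is assumed to belong to a single demographic group.
    group_of = dict(records)
    subjects_per_group = Counter(group_of.values())
    images_per_group = Counter(group for _, group in records)
    return images_per_subject, subjects_per_group, images_per_group


records = [("a", "W")] * 5 + [("b", "W")] * 3 + [("c", "B")]
per_subject, per_group_subjects, per_group_images = imbalance_report(records)
```

Plotting these counters as histograms gives views like the ones summarised above.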



Public models

• Public models, trained on public DBs (DIY):
  FaceNet (2015): CWF / MS-1MC
  VGGFace (2015): VGG
  SphereFace (2017): CWF
  VGGFace2 (2017): MS-1MC + VGG2

• Validate with a demographically-balanced DB (1M pairs per group; 50% same ID, 50% different ID):
  Asian female, Asian male, Black female, Black male, White female, White male
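The validation protocol above (six demographic groups, 1M pairs each, half same-ID and half different-ID) can be sketched as a per-group pair sampler. This is a minimal Python sketch that assumes each group's images come as `(image_id, subject_id)` tuples. `make_pairs` is a hypothetical helper, and it does not filter out duplicate pairs.

```python
import random
from collections import defaultdict


def make_pairs(images, n_pairs, seed=0):
    """Sample verification pairs for ONE demographic group.

    images: list of (image_id, subject_id). Returns (img_a, img_b, same_id) tuples,
    alternating same-ID and different-ID pairs (exactly 50/50 when n_pairs is even).
    """
    rng = random.Random(seed)
    by_subject = defaultdict(list)
    for image_id, subject_id in images:
        by_subject[subject_id].append(image_id)
    subjects = list(by_subject)
    # Same-ID pairs need subjects with at least two images.
    multi = [s for s in subjects if len(by_subject[s]) >= 2]
    pairs = []
    for i in range(n_pairs):
        if i % 2 == 0:
            s = rng.choice(multi)
            a, b = rng.sample(by_subject[s], 2)
            pairs.append((a, b, True))
        else:
            s1, s2 = rng.sample(subjects, 2)
            pairs.append((rng.choice(by_subject[s1]), rng.choice(by_subject[s2]), False))
    return pairs


images = [(f"{s}_{k}", s) for s in "abcd" for k in range(3)]
pairs = make_pairs(images, 10)
```

Running it once per group with `n_pairs=1_000_000` would mirror the six-group, 1M-pair setup.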



Public models: examples of failures

[Figure panels: false positives | false negatives]



Public models: evaluation

[Figure: verification results for FaceNet (2015), SphereFace (2017), VGGFace (2015) and VGGFace2 (2017), trained on 1MC, CWF, VGG, 1MC + VG2 and VG2, per demographic group: White male, Black male, Asian female]
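A per-group evaluation like the one above can be expressed as false match rates (false positives) and false non-match rates (false negatives), computed separately for each demographic group at a fixed threshold. This is a minimal NumPy sketch. The score convention (higher means more similar), the threshold and `per_group_rates` are assumptions, not the evaluation code behind the slide.

```python
import numpy as np


def per_group_rates(scores, same, groups, threshold):
    """False match rate (FMR) and false non-match rate (FNMR) per demographic group.

    scores: similarity per pair (higher = more similar); same: True for same-ID pairs;
    groups: demographic group of each pair; threshold: accept if score >= threshold.
    """
    scores = np.asarray(scores, dtype=float)
    same = np.asarray(same, dtype=bool)
    groups = np.asarray(groups)
    rates = {}
    for g in np.unique(groups):
        m = groups == g
        accepted = scores[m] >= threshold
        genuine, impostor = same[m], ~same[m]
        rates[str(g)] = {
            # impostor pairs wrongly accepted (false positives)
            "FMR": float(accepted[impostor].mean()) if impostor.any() else float("nan"),
            # genuine pairs wrongly rejected (false negatives)
            "FNMR": float((~accepted[genuine]).mean()) if genuine.any() else float("nan"),
        }
    return rates


rates = per_group_rates([0.9, 0.2, 0.8, 0.7], [True, False, True, False],
                        ["W", "W", "B", "B"], threshold=0.5)
```

Using a single global threshold for all groups is what exposes bias. A model that looks fine on average can still show a much higher FMR for one group.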



Managing imbalance

• Sampling (data-oriented): undersampling, oversampling
• Training loss (model-oriented): cost-sensitive learning
• Multi-task learning with id, gender and ethnicity tasks: “Features get better at understanding faces, improving performances of individual tasks”

R Ranjan, VM Patel, R Chellappa. “Hyperface: A deep multi-task learning framework for face detection, landmark localization, pose estimation, and gender recognition.” TPAMI 2017
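One way to combine the model-oriented options above is a multi-task loss where each task's cross-entropy is cost-sensitive, that is, weighted by inverse class frequency. This NumPy sketch illustrates that idea under these assumptions; it is not the HyperFace loss. The task names mirror the slide (id, gender, ethnicity). The weighting scheme, the weighted-mean reduction and `task_weights` are hypothetical choices.

```python
import numpy as np


def log_softmax(logits):
    z = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    return z - np.log(np.exp(z).sum(axis=1, keepdims=True))


def inverse_frequency_weights(labels, n_classes):
    """Cost-sensitive weights: rare classes cost more (one common heuristic, assumed here)."""
    counts = np.bincount(labels, minlength=n_classes).astype(float)
    weights = np.zeros(n_classes)
    present = counts > 0
    weights[present] = counts.sum() / (present.sum() * counts[present])
    return weights


def weighted_cross_entropy(logits, labels, class_weights):
    nll = -log_softmax(logits)[np.arange(len(labels)), labels]
    w = class_weights[labels]
    return float((w * nll).sum() / w.sum())  # weighted mean over the batch


def multitask_loss(logits, labels, class_weights, task_weights):
    """Sum of per-task weighted cross-entropies; all args are dicts keyed by task name."""
    return sum(task_weights[t] * weighted_cross_entropy(logits[t], labels[t], class_weights[t])
               for t in logits)


labels = {"id": np.array([0, 1, 2, 2]), "gender": np.array([0, 0, 0, 1]),
          "ethnicity": np.array([0, 0, 1, 2])}
logits = {t: np.random.default_rng(0).normal(size=(4, y.max() + 1)) for t, y in labels.items()}
class_weights = {t: inverse_frequency_weights(y, logits[t].shape[1]) for t, y in labels.items()}
total = multitask_loss(logits, labels, class_weights, {"id": 1.0, "gender": 0.5, "ethnicity": 0.5})
```

In a real network, all three heads would share the backbone, so gradients from the gender and ethnicity tasks also shape the identity features.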



Managing imbalance – Data augmentation

• Data augmentation: makes imbalance mitigation much more effective

Database → Oversampled DB → Stochastic data augmentation → DNN

I Masi et al. "Do we really need to collect millions of faces for effective face recognition?" ECCV 2016.
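The pipeline above replicates samples and then augments every draw, so the network never sees replicas as identical images. This is a minimal NumPy sketch with assumed augmentations (random crop, horizontal flip, brightness jitter) and hypothetical sizes. Masi et al. instead synthesise pose, shape and expression variations, which this sketch does not attempt.

```python
import numpy as np


def augment(img, rng, crop=56):
    """Random crop + horizontal flip + brightness jitter of an HxWxC image in [0, 1]."""
    h, w = img.shape[:2]
    top = rng.integers(0, h - crop + 1)
    left = rng.integers(0, w - crop + 1)
    out = img[top:top + crop, left:left + crop]
    if rng.random() < 0.5:
        out = out[:, ::-1]               # horizontal flip
    out = out * rng.uniform(0.8, 1.2)    # brightness jitter
    return np.clip(out, 0.0, 1.0)


def batches(db, rng, batch_size):
    """Shuffle the (oversampled) DB and augment every draw, replicas included."""
    order = rng.permutation(len(db))
    for start in range(0, len(order), batch_size):
        yield np.stack([augment(db[j], rng) for j in order[start:start + batch_size]])


rng = np.random.default_rng(0)
img = np.random.default_rng(1).random((64, 64, 3))
db = [img, img, img]  # one image replicated by oversampling
batch = next(batches(db, rng, batch_size=2))
```

Augmenting on the fly, rather than storing augmented copies, keeps the oversampled DB small: it only needs to hold references to the replicated samples.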



Managing imbalance – Proposal

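Traditional imbalance, over the per-category sample counts $X$: $\frac{\max(X)}{\min(X)}$

Proposal: IDR, the interdecile ratio (robust to outliers): $\mathrm{IDR}(X) = \frac{D_9(X)}{D_1(X)}$, where $D_1(X)$ and $D_9(X)$ are the first and ninth deciles of $X$

Iterative multi-label oversampling:
1. Find the most imbalanced label L.
2. Find the most imbalanced category C within L.
3. Draw a random sample from C and replicate it.

[Figure: $\frac{\max(X)}{\min(X)}$ and $\frac{D_9(X)}{D_1(X)}$ versus #samples added, with $D_1$ and $D_9$ marked]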
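The following is a minimal Python sketch of the IDR and of the three oversampling steps. It rests on several assumptions:

- each sample carries a dict of labels (e.g. id, gender, ethnicity);
- "most imbalanced label" means the label with the largest IDR;
- "most imbalanced category" means the rarest category within that label;
- a fixed budget of replicas serves as the stopping rule.

`idr`, `max_min_ratio` and `iterative_oversample` are hypothetical names. Counts are recomputed every iteration for clarity, not speed.

```python
import random
from collections import Counter

import numpy as np


def max_min_ratio(counts):
    """Traditional imbalance: max(X) / min(X)."""
    return max(counts) / min(counts)


def idr(counts):
    """Interdecile ratio D9(X) / D1(X); ignores the most extreme 10% at each end."""
    d1, d9 = np.percentile(counts, [10, 90])
    return float(d9 / d1)


def iterative_oversample(samples, labels, n_added, seed=0):
    """samples: list of dicts label -> category, e.g. {"gender": "f", "ethnicity": "asian"}."""
    rng = random.Random(seed)
    out = list(samples)
    for _ in range(n_added):
        counts = {lab: Counter(s[lab] for s in out) for lab in labels}
        # 1. Most imbalanced label L (largest IDR over its per-category counts).
        L = max(labels, key=lambda lab: idr(list(counts[lab].values())))
        # 2. Most imbalanced category C within L (assumed: the rarest one).
        C = min(counts[L], key=counts[L].get)
        # 3. Draw a random sample from C and replicate it.
        out.append(rng.choice([s for s in out if s[L] == C]))
    return out


samples = [{"gender": "f"}] * 2 + [{"gender": "m"}] * 8
balanced = iterative_oversample(samples, ["gender"], n_added=6)
```

A replica carries all of its labels, so each one also shifts the counts of the other labels. This is why the most imbalanced label is re-selected at every iteration. The outlier robustness shows up on counts such as `[10] * 18 + [1, 1000]`: the max/min ratio is 1000, while the IDR stays at 1.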



Managing imbalance – Sample training batch

[Figure: a sample training batch before oversampling… …and after]



Managing imbalance

• Results with ResNet-20 (tiny network, for comparison only)
• Better with almost 6X fewer subjects and about 1.7X fewer images!

10.6K subjects, 494K images
1.8K subjects, 295K images



Sparse embedding

• Typically, in deep face recognition: image → CNN → embedding layer → one-hot encoding
• What about ReLU + embedding + one-hot encoding (e.g. VGGFace)? Why more dimensions, if 90% of them are zero?
• Larger representation subspace, at the expense of computational efficiency
• But we can gain it back! ~200M comp/s

[Figure: sparse 4096-d vs. dense 512-d vs. Dict + dense 256-d]
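The slide does not detail how the efficiency is regained. Its figure compares sparse 4096-d, dense 512-d and Dict + dense 256-d descriptors. This NumPy sketch only illustrates one idea, stated here as an assumption: when about 90% of entries are zero, a dot product only needs the dimensions where both descriptors are non-zero. The shift of 1.28 is chosen so that roughly 90% of standard-normal pre-activations fall below zero. `to_sparse` and `sparse_dot` are hypothetical helpers.

```python
import numpy as np


def to_sparse(v):
    """Keep only non-zero entries: (indices, values)."""
    idx = np.flatnonzero(v)
    return idx, v[idx]


def sparse_dot(a, b):
    """Dot product of two sparse descriptors, touching only shared non-zero dimensions."""
    (ia, va), (ib, vb) = a, b
    _, pa, pb = np.intersect1d(ia, ib, assume_unique=True, return_indices=True)
    return float(va[pa] @ vb[pb])


rng = np.random.default_rng(0)
# Toy ReLU descriptors: shifting the pre-activation makes roughly 90% of entries zero.
x = np.maximum(rng.normal(size=4096) - 1.28, 0.0)
y = np.maximum(rng.normal(size=4096) - 1.28, 0.0)
sparsity = float(np.mean(x == 0))
score = sparse_dot(to_sparse(x), to_sparse(y))
```

Each sparse descriptor stores about 410 of its 4096 entries, and the sparse dot product matches the dense one.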



Conclusions

• Public training / validation DBs: heavily biased at multiple levels
• Without balancing, trained models will be biased, too!
• Prefer “better data” over “more data”

• Machine Learning vs Machine Teaching
  Machine learning: designing algorithms to passively train models
  Machine teaching: choosing which examples to show a learner
  Explainable ML

Zhu, Xiaojin, et al. "An Overview of Machine Teaching." arXiv preprint arXiv:1801.05927 (2018).

Questions?


