Scaling face recognition with big data - Bogdan Bocse

@ITCAMPRO #ITCAMP17 Community Conference for IT Professionals

Scaling Face Recognition with Big Data

Bogdan BOCȘE

Solutions Architect & Co-founder VisageCloud

https://VisageCloud.com

https://www.linkedin.com/in/bogdanbocse/

https://twitter.com/bocse

Many thanks to our sponsors & partners!

GOLD

SILVER

PARTNERS

PLATINUM

POWERED BY


• How to learn?

• What to learn?

• Defining learning objectives

• How to scale learning?

• Gotchas

• VisageCloud

–Architecture

–Use Cases

Agenda


• What questions to ask before writing the code?

• How to look at the data before feeding it to the machine?

• What is the state of the art regarding ML?

• What frameworks to use?

• What are the common traps to avoid?

• How to design for scale?

Objectives


HOW TO LEARN?


Vision

• Convolutional Neural Networks

• Inception Paper

NLP

• Word2Vec

• GloVe: Global Vectors for Word Representation

Generic

• Classification

• Prediction

How to Learn?


Convolutional Neural Networks: Big Picture


• Pooling / Max Pooling

• Convolution

• Fully Connected

• Activation

–Activation Function, e.g. ReLU

Convolutional Neural Networks: Components
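The components listed above can be sketched in a few lines of plain Python. This is a minimal illustration, not how any framework implements them: the names `conv2d`, `relu` and `max_pool` and the sample kernel are assumptions made for this example. The convolution is the "valid" sliding-window form without kernel flipping, which is the form CNN libraries commonly use.

```python
def conv2d(image, kernel):
    """Slide the kernel over the image and sum the element-wise products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def relu(feature_map):
    """Activation function: keep positive values, replace the rest with zero."""
    return [[max(0.0, value) for value in row] for row in feature_map]


def max_pool(feature_map, size=2):
    """Keep the largest value of each non-overlapping size x size window."""
    rows = range(0, len(feature_map) - size + 1, size)
    cols = range(0, len(feature_map[0]) - size + 1, size)
    return [
        [
            max(feature_map[r + i][c + j]
                for i in range(size) for j in range(size))
            for c in cols
        ]
        for r in rows
    ]


# Hypothetical 3x3 "image" and 2x2 diagonal-difference kernel.
image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
kernel = [[-1, 0], [0, 1]]
features = max_pool(relu(conv2d(image, kernel)))
```

A fully connected layer would then take the flattened `features` as input to an ordinary weighted sum per output neuron.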


• Learning is an optimization problem

–Find parameters of a system (neural network) that minimize a fixed error function

–Not unlike planning orbital paths

• Defining the network architecture

• Defining the training algorithm

–Stochastic Gradient Descent

• With momentum

• With noise

Taking a Step Back: The Math
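To make the optimization view concrete, here is a minimal sketch of gradient descent with momentum and optional gradient noise. The error function (w - 3)^2, the learning rate, the momentum value and the function names are all assumptions chosen for illustration. The Gaussian noise stands in for the sampling noise that mini-batches introduce in real stochastic gradient descent.

```python
import random


def minimize(grad, w0, lr=0.1, momentum=0.9, noise=0.0, steps=500, seed=0):
    """Gradient descent with momentum; `noise` is the std-dev of gradient noise."""
    rng = random.Random(seed)
    w, velocity = w0, 0.0
    for _ in range(steps):
        g = grad(w) + rng.gauss(0.0, noise)      # noisy gradient estimate
        velocity = momentum * velocity - lr * g  # keep part of the last step
        w += velocity
    return w


def error_gradient(w):
    """Gradient of the hypothetical error function (w - 3)^2."""
    return 2.0 * (w - 3.0)


best_w = minimize(error_gradient, w0=0.0)
noisy_w = minimize(error_gradient, w0=0.0, noise=0.1)
```

Without noise the parameter settles at the minimum; with noise it keeps jittering around it, which is the price paid for the ability to escape shallow local optima.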


• DeepLearning4j

– Independent company

– Java interface with C-bindings for performance

• TensorFlow

– Python & C++ API

– Developed by Google

– Compatible with TPU

• Torch

– Developed by Facebook

– Written in LuaJIT, with Python bindings

Frameworks


WHAT TO LEARN?


• Public data sets

– Labeled Faces in the Wild (LFW)

–YouTube Faces

–Kaggle

• Private data sets

• Build your own

–Outsourcing: Mechanical Turk

–Crowdsourcing: reCAPTCHA model

Data Sets


Preparing Data

Clean data

Cropping

Structure

Homogeneity

Normalization

Histograms

Filtering


• Machine learning is not magic

• If you can’t understand the data, a machine probably won’t either

• Preprocessing often makes the difference between good and poor results

• Applying filters, normalization, and anomaly detection is computationally inexpensive

Preparing Data
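Two of the inexpensive preprocessing steps mentioned above, normalization and histograms, can be sketched as follows. The function names and the choice of min-max scaling are assumptions for illustration; real pipelines often use other schemes such as mean/variance normalization or histogram equalization.

```python
def normalize(pixels):
    """Min-max scale a flat list of pixel intensities into the range [0, 1]."""
    lo, hi = min(pixels), max(pixels)
    if hi == lo:
        # A flat image carries no contrast; map everything to zero.
        return [0.0 for _ in pixels]
    return [(p - lo) / (hi - lo) for p in pixels]


def histogram(values, bins=4):
    """Count how many normalized values fall into each equal-width bin."""
    counts = [0] * bins
    for v in values:
        index = min(int(v * bins), bins - 1)  # put 1.0 into the last bin
        counts[index] += 1
    return counts


scaled = normalize([50, 100, 150])
counts = histogram(scaled, bins=4)
```

Looking at such a histogram before training is a cheap way to spot under-exposed or washed-out images that would otherwise skew the data set.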


DEFINING LEARNING OBJECTIVES


• Supervised

–Classification

–Scoring and regression

– Identification

• Unsupervised

–Clustering

Defining learning objectives


• Projecting input onto a fixed set of classes

• “Don’t use a cannon to kill a fly”

–Support Vector Machines

• Linear

• Radial Basis Functions

Classification


• Embedding

–Projecting input (image) onto a vector space with a known property

• Triplet Loss Function

Identification
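The triplet loss can be written down directly. The sketch below assumes squared Euclidean distance between embeddings and a margin of 0.2; both are illustrative choices, and the function names are hypothetical. The "known property" the loss enforces is that an anchor image ends up closer to another image of the same person (positive) than to an image of a different person (negative).

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two embedding vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Zero once the positive is closer to the anchor than the negative by `margin`."""
    gap = squared_distance(anchor, positive) - squared_distance(anchor, negative)
    return max(0.0, gap + margin)


# Well-separated triplet: no loss, nothing left to learn from it.
easy = triplet_loss([0.0, 0.0], [0.0, 1.0], [3.0, 4.0])

# Swapped roles: the "same person" is farther away, so the loss is large.
hard = triplet_loss([0.0, 0.0], [3.0, 4.0], [0.0, 1.0])
```

Training minimizes the sum of this loss over many triplets, which is what shapes the embedding space so that distance can be used for identification.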


• Splitting a set of items into non-overlapping subsets, based on item attributes

• Counting people in video streams

• Algorithms:

–Fixed threshold

–K-means

–Rank-order clustering

Clustering
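The simplest of the algorithms listed, fixed-threshold clustering, can be sketched as a greedy pass over face embeddings. This is one possible reading of "fixed threshold", not necessarily the variant used in production: each embedding joins the first cluster whose first member lies within the threshold, otherwise it starts a new cluster. The function name and sample data are assumptions.

```python
import math


def threshold_clusters(embeddings, threshold):
    """Greedy clustering: compare each item to the first member of every cluster."""
    clusters = []
    for embedding in embeddings:
        for cluster in clusters:
            if math.dist(cluster[0], embedding) <= threshold:
                cluster.append(embedding)
                break
        else:
            # No existing cluster is close enough: a new person appears.
            clusters.append([embedding])
    return clusters


# Hypothetical 2-D embeddings of faces seen in a video stream.
faces = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.0, 5.1)]
people = threshold_clusters(faces, threshold=0.5)
people_count = len(people)
```

Counting people in a video stream then reduces to counting clusters, with the threshold controlling how readily two detections are treated as the same person.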


HOW TO SCALE LEARNING?


• Scaling training

– Requires shared memory space

– Vertical scaling

• GPU

• Soon-to-come: TPU (tensor processing unit)

• Scaling evaluation

– Shared nothing architecture

–Neural network/classifier rarely changes

– Load balancing pattern

– Partitioning data if needed

How to scale learning?


• There is no “reduce” for neural networks

• Averaging weights/parameters

–Usually not a good idea

• Genetic algorithms

– Requires a lot of processing power

– Running independent iterations on different machines

– Crossover between weights/parameters of independently trained neural networks after each epoch

Ideas for horizontal scaling
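As a rough sketch of the genetic-algorithm idea, the code below crosses over the weight vectors of independently trained networks after an epoch. Everything here is an assumption made for illustration: weights are flat lists, crossover is uniform (each weight is taken from one parent at random), and `fitness` is a hypothetical scoring function such as validation accuracy.

```python
import random


def crossover(parent_a, parent_b, rng):
    """Uniform crossover: pick each weight from one of the two parents."""
    return [a if rng.random() < 0.5 else b for a, b in zip(parent_a, parent_b)]


def next_generation(population, fitness, rng, survivors=2):
    """Keep the fittest weight vectors and refill the population with children."""
    ranked = sorted(population, key=fitness, reverse=True)[:survivors]
    children = [
        crossover(rng.choice(ranked), rng.choice(ranked), rng)
        for _ in range(len(population) - survivors)
    ]
    return ranked + children


def toy_fitness(weights):
    """Hypothetical fitness: higher is better, best when every weight is 3."""
    return -sum((w - 3.0) ** 2 for w in weights)


rng = random.Random(1)
population = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
population = next_generation(population, toy_fitness, rng)
```

In a distributed setting each member of the population would be trained for an epoch on a different machine, and only the weight vectors and fitness scores would need to be exchanged.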


GOTCHAS


• Our 2D and 3D intuition often fails in high dimensions

• Distances tend to become relatively “the same” as the number of dimensions increases

• Dimensionality reduction

– Embedding functions

– Principal component analysis

The Curse of Dimensionality
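The claim that distances become "the same" can be checked with a small experiment. The sketch below, with an assumed function name and uniformly random points, measures the relative contrast between the farthest and nearest neighbour of a query point: (max - min) / min. The value is large in 2 dimensions and shrinks markedly in 1000 dimensions.

```python
import math
import random


def relative_contrast(dimensions, points=200, seed=0):
    """(farthest - nearest) / nearest distance from a random query point."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dimensions)]
    distances = [
        math.dist(query, [rng.random() for _ in range(dimensions)])
        for _ in range(points)
    ]
    return (max(distances) - min(distances)) / min(distances)


low_dim = relative_contrast(2)
high_dim = relative_contrast(1000)
```

When the contrast is small, a nearest-neighbour search can no longer tell a true match from a random point, which is why embeddings and principal component analysis are used to bring the dimensionality down first.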


• “The bottom of a valley is not necessarily the lowest point on Earth”

• Learning algorithms may get stuck in local optima

• Using momentum or some random noise reduces this risk

• Genetic algorithms can be even more robust, but they are computationally expensive

Local Optima


Visualizing Local Optima

[Figure: monkey saddle surface]


“Based on state-of-the-art machine learning, our weather forecast system can predict tomorrow’s weather with 72% accuracy”

Evaluation of Learning

You get the same results by saying “it’s going to be the same as today”


• Don’t test on the data you train on

–Use a different data set

– Split the data sets you have

• Beware of data biases

– Confirmation bias

– Survivorship bias

– Selection bias

• Compare against a benchmark, even a dummy one

– Coin flip

– Linear algorithms

– “Same-as-before”

Evaluation of Learning
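A dummy benchmark takes only a few lines to build. The sketch below scores the "same-as-before" predictor on a made-up weather history; the data and function names are assumptions for illustration. Any trained model should be compared against this number on the same held-out days before its accuracy is reported.

```python
def accuracy(predictions, actual):
    """Fraction of predictions that match the observed values."""
    matches = sum(p == a for p, a in zip(predictions, actual))
    return matches / len(actual)


def same_as_before(history):
    """Dummy predictor: tomorrow will be the same as today."""
    return history[:-1]


# Hypothetical daily observations.
weather = ["sun", "sun", "rain", "rain", "sun", "sun", "sun", "rain"]

# The prediction for day t is the observation of day t - 1.
baseline = accuracy(same_as_before(weather), weather[1:])
```

On data where conditions persist for several days, this baseline can be surprisingly strong, which is exactly why a headline accuracy figure means little without it.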


Architecture and Use Cases


High Level Architecture

VisageCloud Production

[Diagram components]

• API Consumer (Customer Infrastructure)

• HAProxy (reverse proxy)

• Service (API Controller)

• Neural Networks (OpenCV, Dlib, Torch, pixie magic)

• Cassandra Containers (Docker)

• Image Storage (AWS S3)

[Diagram connection labels: HTTPS, HTTP, CQL Binary]


Detect faces -> Align faces -> Pre-processing -> Feature extraction -> Feature comparison

Processing Pipeline


• The collection

–Slice of data used together

– 10K-100K records

• The Cache-Inside Pattern

– Loading / preloading collection in one application server

–Content based routing/balancing to maximize cache hits

–No logic in the database layer

–Requires periodic polling for updates

• Weaker consistency

Partitioning Data: Application Level Logic
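The cache-inside pattern with content-based routing can be sketched as follows. This is an illustration under stated assumptions, not the VisageCloud implementation: the server names, the `route` function, the `CollectionCache` class and the use of a SHA-256 hash of the collection identifier are all hypothetical.

```python
import hashlib


def route(collection_id, servers):
    """Send every request for a collection to the same application server."""
    digest = hashlib.sha256(collection_id.encode("utf-8")).digest()
    return servers[int.from_bytes(digest[:8], "big") % len(servers)]


class CollectionCache:
    """In-process cache of whole collections, loaded from the database on demand."""

    def __init__(self, load_collection):
        self._load = load_collection  # callable that reads from the database
        self._collections = {}

    def get(self, collection_id):
        if collection_id not in self._collections:
            self._collections[collection_id] = self._load(collection_id)
        return self._collections[collection_id]

    def refresh(self, collection_id):
        """Called by a periodic poller to pick up writes made elsewhere."""
        self._collections[collection_id] = self._load(collection_id)


servers = ["app-1", "app-2", "app-3"]  # hypothetical application servers
target = route("customer-42/employees", servers)
```

Because the route depends only on the collection identifier, repeated requests hit a warm cache; the gap between two `refresh` calls is where the weaker consistency comes from.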



[Diagram]

• Application Layer: three Application servers

• Cassandra (Database Layer): four Cassandra nodes

• Labels: Content-based balancing/routing, Preload collection, Poll for updates, Write updates


• Perform comparison logic in database

–User Defined Aggregate Functions

• Removes the need to move data around between application and database

• Harder to deploy/test

• Stronger consistency



• It’s math, not magic

• If you don’t understand the data, neither will the machine

• Preprocessing makes the difference

• Test against a benchmark, any benchmark

• Evaluate first, scale later

Key Takeaways


[email protected]

+(40) 724 714 234



Let’s keep in touch







Q & A
