A brief introduction to Computer Vision

Michele Ermidoro

Speaker: Michele Ermidoro
Place: UniBG, Dalmine
Date: 12 March 2019
Ingegneria dei Sistemi di Controllo, AA 2018-2019

AA 2018-2019 Computer Vision - CAL UniBg · Computer vision (CV), from the perspective of engineering, it seeks to automate tasks that the human visual system can do. Computer vision




Outline

1. What is Machine Vision?

2. Digital image: start from basics

• Image representation

• Image processing

3. Classic approach

4. Convolutional Neural Network

5. An Object Detection Pipeline

6. Deep Learning Framework



What is Machine Vision?


From an engineering perspective, computer vision (CV) seeks to automate tasks that the human visual system can do. Computer vision tasks include methods for acquiring, processing, analyzing and understanding digital images, and extraction of high-dimensional data from the real world in order to produce numerical or symbolic information, e.g., in the form of decisions. Understanding in this context means the transformation of visual images (the input of the retina) into descriptions of the world that can interface with other thought processes and elicit appropriate action.

[https://en.wikipedia.org/wiki/Computer_vision]

Machine vision (MV) is the technology and methods used to provide imaging-based automatic inspection and analysis for such applications as automatic inspection, process control, and robot guidance, usually in industry. Machine vision as a systems engineering discipline can be considered distinct from computer vision, a form of computer science. It attempts to integrate existing technologies in new ways and apply them to solve real-world problems.

[https://en.wikipedia.org/wiki/Machine_vision]

What is Machine Vision?


[Figure: diagram of the fields related to Computer Vision and Machine Vision: Image Processing, Neurobiology, Imaging, Optics, Signal processing, Robotics, Artificial intelligence (AI), Machine Learning, Math, Computer intelligence, Object Detection, Cognitive Vision, Geometry, Statistics, Optimization, Human Vision, Lighting, Imaging sensors, Lenses, Odometry, Navigation, Data, Estimation, Filtering]

What is Machine Vision?


• Almost 80% of the data traveling on the net is visual data

• Everybody has a smartphone, and every smartphone has at least 2 cameras

• "A picture is worth a thousand words" is an English-language idiom

• A camera is one of the most powerful sensors in many applications

• It has a wide range of applications:
  • Robotics
  • Surveillance
  • Industry
  • Self-driving cars/drones/buses
  • Medical
  • …

http://crcv.ucf.edu/people/faculty/Bagci/research.php

Computer Vision – Why?


Computer Vision – Hype?

[Figure: chart with the human reference level marked at ≈5.1]

What happened here?

1. Computational Power

2. Convolutional Neural Network (CNN)


Computational Power

https://www.karlrupp.net/2013/06/cpu-gpu-and-mic-hardware-characteristics-over-time/

2009 - “We argue that modern graphics processors far surpass the computational capabilities of multicore CPUs, and have the potential to revolutionize the applicability of deep unsupervised learning methods.” Rajat Raina, Anand Madhavan, Andrew Y. Ng, “Large-scale Deep Unsupervised Learning using Graphics Processors”

https://blog.inten.to/cpu-hardware-for-deep-learning-b91f53cb18af

IMAGENET Challenge


Convolutional Neural Network


Computer Vision Tasks

http://vision.stanford.edu

Classification

What’s in the image?

• People
• Car
• Traffic light
• Clock
• …


Computer Vision Tasks

http://vision.stanford.edu

Detection

What’s in the image? And where is it?

• A car in the orange box

• A person in the blue box

• A clock in the green box


Computer Vision Tasks

http://vision.stanford.edu

Segmentation

What’s in the image? Where is it? And which pixels belong to it?

[Figure: image with the pixels of each object labelled, e.g. Car and Clock]


Computer Vision Tasks

http://vision.stanford.edu

Annotation

How would you describe the picture?

“People crossing a street while a car is waiting”


Computer Vision Tasks

https://engineering.matterport.com/splash-of-color-instance-segmentation-with-mask-r-cnn-and-tensorflow-7c761e238b46


• Face detection

• Smile detection

• Eye-open detection

• …

Computer Vision Tasks - Others


https://medium.com/waymo/recreating-the-self-driving-experience-the-making-of-the-waymo-360-video-37a80466af49

Computer Vision Tasks - Others


Figure credit: Dai, He, and Sun, “Instance-aware Semantic Segmentation via Multi-task Network Cascades”, CVPR 2016

Computer Vision Tasks - Others


Cardiologist-Level Arrhythmia Detection with Convolutional Neural Networks

Computer Vision Tasks - Others


https://www.youtube.com/watch?v=bcswZLwhTUI

Computer Vision Tasks - Others


Computer Vision Tasks - Others


https://www.youtube.com/watch?v=NrmMk1Myrxc

Computer Vision Tasks - Others


What’s next

• Digital images: basics
• Classic approach: feature engineering
• CNN


Digital image: start from basics

Image representation


Image representation

An image, inside a PC, is just a matrix of numbers

255 -> white

0 -> black

0 1 1 1
1 0 1 1
1 1 1 0
1 1 0 1

• 4x4x1 matrix
• Color depth: 1 bit
• Color channels: 1
• 16 pixels


Image representation

An image, inside a PC, is just a matrix of numbers

217 255 255 255
255 191 255 255
255 255 255 127
255 255   0 255

• 4x4x1 matrix
• Color depth: 8 bit
• Color channels: 1
• 16 pixels


Image representation

Channel 1:
255 255 255 255
255 170 255 255
255 255 255  85
255 255   0 255

Channel 2:
  0 255 255 255
255  85 255 255
255 255 255 170
255 255 255 255

• 4x4x2 matrix
• Color depth: 8 bit
• Color channels: 2
• 16 pixels


Image representation

• 2453x2453x3 matrix
• Color depth: 24 bit
• Color channels: 3
• Color space: sRGB
• 6 Mpixel (6,017,209 pixels)
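
The idea that an image is just a matrix of numbers can be sketched in a few lines of plain Python. The snippet stores the 8-bit 4x4 example as a list of rows and checks the pixel count of the 2453x2453 image. The helper name `shape` is hypothetical, and the arrangement of the listed values into rows is an assumption.

```python
# 8-bit grayscale image: one channel, values from 0 (black) to 255 (white).
# The values are those listed in the 8-bit example; their arrangement
# into rows is an assumption.
gray = [
    [217, 255, 255, 255],
    [255, 191, 255, 255],
    [255, 255, 255, 127],
    [255, 255,   0, 255],
]


def shape(image):
    """Return (height, width) of an image stored as a list of rows."""
    return len(image), len(image[0])


height, width = shape(gray)
pixel_count = height * width      # 4 * 4 = 16 pixels
levels = 2 ** 8                   # an 8-bit channel has 256 gray levels

# A 2453x2453 RGB image has 3 channels of 8 bits each (24-bit color depth).
big_pixels = 2453 * 2453          # 6,017,209 pixels, about 6 Mpixel
big_bytes = big_pixels * 3        # uncompressed size, one byte per channel

print(pixel_count, levels, big_pixels, big_bytes)
```

Uncompressed, the 6 Mpixel image takes about 18 MB, which is why the matrix view matters when estimating memory and compute.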


Color spaces

A color space, also known as a color model (or color system), is an abstract mathematical model that describes the range of colors as tuples of numbers, typically 3 or 4 values or color components.

There are a variety of color spaces, such as RGB, CMYK, HSV and CIE XYZ.

CMYK (Cyan, Magenta, Yellow, Key black)

RGB (Red, Green, Blue)


Color spaces

CIE XYZ

It is the most accurate from a scientific point of view. It tries to represent all the colors that the human eye can see.

RGB

The RGB color model is an additive color model in which red, green and blue light are added together. The name comes from the three additive primary colors: red, green, and blue.

HSB (Hue/Sat/Bright)

Designed in the 1970s by computer graphics researchers to more closely align with the way human vision perceives color-making attributes.
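
As a small illustration of moving between color spaces, the sketch below converts 8-bit RGB triples to HSB using Python's standard `colorsys` module, where the same model is called HSV. The wrapper name `rgb8_to_hsv` and the rounding are choices made for this example only.

```python
import colorsys


def rgb8_to_hsv(r, g, b):
    """Convert 8-bit RGB to (hue in degrees, saturation, value/brightness)."""
    # colorsys works on floats in [0, 1], so the 8-bit values are rescaled.
    h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
    return round(h * 360), round(s, 3), round(v, 3)


red = rgb8_to_hsv(255, 0, 0)
# Additive mixing: full red light plus full green light gives yellow.
yellow = rgb8_to_hsv(255, 255, 0)
# A gray pixel has no dominant hue, so its saturation is zero.
gray = rgb8_to_hsv(128, 128, 128)

print(red, yellow, gray)
```

Separating hue from brightness is often convenient when a task depends on color but should tolerate changes in illumination.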


Histogram

A color histogram is a representation of the distribution of colors in an image

Gray-scale histogram

Color histogram

Histogram equalization
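
A gray-scale histogram and a basic histogram equalization can be sketched as follows. This is a minimal version under stated assumptions: the image is a flat list of 8-bit values, the function names are hypothetical, and the mapping is the common cumulative-distribution formula rather than any specific library's implementation.

```python
def histogram(pixels, levels=256):
    """Count how many pixels take each gray level."""
    counts = [0] * levels
    for p in pixels:
        counts[p] += 1
    return counts


def equalize(pixels, levels=256):
    """Spread the gray levels using the cumulative distribution (CDF)."""
    cdf, running = [], 0
    for count in histogram(pixels, levels):
        running += count
        cdf.append(running)
    cdf_min = next(c for c in cdf if c > 0)
    n = len(pixels)
    if n == cdf_min:
        # A constant image cannot be stretched.
        return list(pixels)
    # Look-up table: darkest occupied level maps to 0, brightest to levels - 1.
    lut = [max(0, round((c - cdf_min) / (n - cdf_min) * (levels - 1))) for c in cdf]
    return [lut[p] for p in pixels]


# Low-contrast image: all values squeezed between 100 and 105.
low_contrast = [100, 100, 101, 102, 103, 103, 104, 105]
equalized = equalize(low_contrast)
print(min(equalized), max(equalized))
```

After equalization the same eight pixels span the full 0 to 255 range, which is the contrast stretch the technique is meant to provide.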


Digital image: start from basics

Image processing


Binarization

Image binarization

It transforms a gray-scale image into a binary image, depending on a threshold:

$$g[n,m] = \begin{cases} 255, & f[n,m] > 100 \\ 0, & \text{otherwise} \end{cases}$$
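
The thresholding rule above translates almost directly into code. The sketch below uses the threshold of 100 from the formula; the function name and the sample image are illustrative.

```python
def binarize(image, threshold=100):
    """Map every pixel above the threshold to 255 (white), the rest to 0 (black)."""
    return [[255 if pixel > threshold else 0 for pixel in row] for row in image]


gray = [
    [217, 255,  90],
    [100, 191,  30],
    [101,   0, 127],
]
binary = binarize(gray)
for row in binary:
    print(row)
```

Note that a pixel exactly equal to the threshold (100 here) maps to 0, because the comparison is strict.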


Filters

Most operations on an image are performed using filters

• Filters are mathematical functions that are applied to each pixel.

• The filter is represented by a matrix, called the kernel.

• Depending on the size of the kernel (3x3, 5x5, 7x7, ...), the function involves the pixel and its neighbors.

• Filters are applied by convolving the image with the kernel.

• To preserve the overall brightness of the image, the kernel entries must sum to 1 (edge-detection kernels, whose entries sum to 0, are an exception).


Convolution

Convolution is the process of adding each element of the image to its local neighbors, weighted by the kernel.

[Figure: input image, kernel and output. Step 1: the first output value is -13]

http://www.songho.ca/dsp/convolution/convolution2d_example.html


Convolution

Convolution is the process of adding each element of the image to its local neighbors, weighted by the kernel.

[Figure: input image, kernel and output. Step 5: the output values computed so far are -13, -20, -17, -18 and -24]

http://www.songho.ca/dsp/convolution/convolution2d_example.html


Convolution

Convolution is the process of adding each element of the image to its local neighbors, weighted by the kernel.

[Figure: input image, kernel and output. Step 9: the complete output]

-13 -20 -17
-18 -24 -18
 13  20  17

http://www.songho.ca/dsp/convolution/convolution2d_example.html
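
The nine steps can be reproduced with a short plain-Python sketch. The input image (the values 1 to 9) and the kernel are assumptions: they are not in the transcript and are taken from memory of the example page cited above. With them, a zero-padded convolution yields the output values shown on the slides.

```python
def convolve2d(image, kernel):
    """2D convolution with zero padding; the output has the size of the input."""
    rows, cols = len(image), len(image[0])
    k_rows, k_cols = len(kernel), len(kernel[0])
    c_row, c_col = k_rows // 2, k_cols // 2
    output = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            total = 0
            for i in range(k_rows):
                for j in range(k_cols):
                    # Convolution flips the kernel: kernel[i][j] weights the
                    # pixel mirrored around the kernel center.
                    rr = r + c_row - i
                    cc = c + c_col - j
                    # Pixels outside the image count as zero.
                    if 0 <= rr < rows and 0 <= cc < cols:
                        total += kernel[i][j] * image[rr][cc]
            output[r][c] = total
    return output


# Assumed input and kernel (not given in the transcript).
image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernel = [[-1, -2, -1],
          [ 0,  0,  0],
          [ 1,  2,  1]]

result = convolve2d(image, kernel)
for row in result:
    print(row)
```

Each output value is one "step": the kernel is centered on one pixel, multiplied element by element with the neighborhood, and summed.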


Convolution


Kernel shape

Depending on the shape of the kernel, different operations can be made:

Edge detection:
-1 -1 -1
-1  8 -1
-1 -1 -1

Sharpen:
 0 -1  0
-1  5 -1
 0 -1  0

Blur / moving average (scaled by 1/9):
1 1 1
1 1 1
1 1 1

Original (identity):
0 0 0
0 1 0
0 0 0


Kernel shape

Depending on the shape of the kernel, different operations can be made:

Edge:
-1 -1 -1
-1  8 -1
-1 -1 -1

45° lines:
-1 -1  2
-1  2 -1
 2 -1 -1

Horizontal lines:
-1 -1 -1
 2  2  2
-1 -1 -1

Vertical lines:
-1  2 -1
-1  2 -1
-1  2 -1


Kernel example

Denoising

Reduce noise in an image

Gaussian filter (scaled by 1/16):
1 2 1
2 4 2
1 2 1
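
To see why the 1/16 factor matters, the sketch below applies the 3x3 Gaussian kernel to the center pixel of a tiny patch containing one noisy value. The patch values and the function name are made up for illustration. Because the kernel is symmetric, flipping it (as convolution does) changes nothing, so a plain weighted average is enough here.

```python
GAUSSIAN = [[1, 2, 1],
            [2, 4, 2],
            [1, 2, 1]]
WEIGHT_SUM = sum(sum(row) for row in GAUSSIAN)   # 16, hence the 1/16 factor


def smooth_pixel(image, r, c):
    """Gaussian-weighted average of the 3x3 neighborhood centered on (r, c)."""
    total = 0
    for i in range(3):
        for j in range(3):
            total += GAUSSIAN[i][j] * image[r - 1 + i][c - 1 + j]
    return total / WEIGHT_SUM


# Flat gray patch with a single noisy (too bright) pixel in the middle.
noisy = [[100, 100, 100],
         [100, 255, 100],
         [100, 100, 100]]
flat = [[100] * 3 for _ in range(3)]

print(smooth_pixel(noisy, 1, 1))   # the outlier is pulled toward its neighbors
print(smooth_pixel(flat, 1, 1))    # a flat region keeps its brightness
```

Dividing by the sum of the weights keeps flat regions at their original brightness, while the outlier is reduced from 255 to 138.75.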


Classic approach


Visual recognition

Analyze images and extract high-level information, such as what is in the picture and where it is.

Challenges:
• Different viewpoint
• Different illumination
• Deformations
• Different shape of the same object


How to identify an object?

Object → Bag of ‘Features’


How to identify an object?

The idea is to extract from the image some of its most important characteristics. We’ll call them “features”.

These features are then compared with a dictionary, where the specific knowledge is stored.

Choosing what is important to look for in the image is called feature engineering and extraction.

The comparison phase is done by a classifier.

The knowledge is passed to the classifier during training.


Classification pipeline

The classifier is a function that receives as input all the representative features of an image and decides what is in the picture.

f(image) = apple

f(image) = cow

f(image) = tomato


Classification pipeline

Training: train images → feature extraction → features + training labels → training → classifier

Prediction: test image → feature extraction → features → classifier → “It’s an apple”

Highlighted stages: feature engineering (feature extraction) and choice of the classifier


Features?

In Computer Vision, a feature is a piece of information which is particularly relevant in the solution of a certain problem (e.g. in the detection of a face, the presence of two eyes is a good feature)

We can identify 3 hierarchical categories of features:

Low-level features
• Colors
• Edges
• Blob
• Corners

Mid-level features
• Scale-invariant features
• SIFT
• SVD

High-level features
• Histograms of gradients (HOG)
• Region descriptors

The selection process is called feature engineering


Low Level Features

Edges & corners: since image classification relies on edges and corners, there are many methods for finding them.

Canny edge detector: a 5-step algorithm that uses filters (including gradient filters) to compute all the edges in an image.

Harris corner detection: a sliding window moves over the image and, at each position, a matrix is built from the local image gradients (the second-moment matrix, or structure tensor, rather than the Hessian). The corners are detected by evaluating the eigenvalues of each matrix.

https://dsp.stackexchange.com/questions/14338/corner-detection-using-chris-harris-mike-stephens
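
A minimal sketch of the Harris idea follows, under simplifying assumptions: central-difference gradients, an unweighted 3x3 window, and the usual response R = det(M) - k * trace(M)^2 with k = 0.04 in place of an explicit eigenvalue computation. The function name and the test image are hypothetical.

```python
def harris_response(image, k=0.04):
    """Harris corner response for every interior pixel of a gray image."""
    rows, cols = len(image), len(image[0])
    ix = [[0.0] * cols for _ in range(rows)]
    iy = [[0.0] * cols for _ in range(rows)]
    # Image gradients by central differences (left at zero on the border).
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            ix[r][c] = (image[r][c + 1] - image[r][c - 1]) / 2
            iy[r][c] = (image[r + 1][c] - image[r - 1][c]) / 2
    response = [[0.0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            # Second-moment matrix M = [[sxx, sxy], [sxy, syy]] over a 3x3 window.
            sxx = syy = sxy = 0.0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    gx, gy = ix[r + dr][c + dc], iy[r + dr][c + dc]
                    sxx += gx * gx
                    syy += gy * gy
                    sxy += gx * gy
            det = sxx * syy - sxy * sxy
            trace = sxx + syy
            # Two large eigenvalues (a corner) give a positive response,
            # one large eigenvalue (an edge) gives a negative one.
            response[r][c] = det - k * trace * trace
    return response


# 8x8 test image: a bright block whose top-left corner is at (3, 3).
image = [[1 if r >= 3 and c >= 3 else 0 for c in range(8)] for r in range(8)]
R = harris_response(image)
print(R[3][3], R[5][3], R[1][1])   # corner, edge, flat region
```

The sign of the response separates the three cases: positive at the corner, negative along the straight edge, zero in the flat region.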


Mid Level Features

SIFT – Scale Invariant Feature Transform [1999]

It is an algorithm that detects and describes features in an image. In particular, it can do this at different scales and rotations.

It has different steps, which involve scaling the image and computing the Difference of Gaussians (DoG).


High Level Features

HOG – Histogram of Oriented Gradients [2005]

The technique counts occurrences of gradient orientation in localized portions of an image. In the HOG feature descriptor, the distributions (histograms) of the directions of gradients (oriented gradients) are used as features. Gradients (x and y derivatives) of an image are useful because the magnitude of the gradient is large around edges and corners (regions of abrupt intensity changes), and edges and corners carry much more information about object shape than flat regions.

https://www.learnopencv.com/histogram-of-oriented-gradients/
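
The core of the descriptor, a histogram of gradient orientations for one cell, can be sketched as below. This is a simplified version under stated assumptions: 9 bins over 0 to 180 degrees, hard assignment of each gradient to a single bin, and no block normalization. The function name and the test cells are illustrative.

```python
import math


def orientation_histogram(cell, bins=9):
    """Histogram of unsigned gradient orientations, weighted by magnitude."""
    rows, cols = len(cell), len(cell[0])
    hist = [0.0] * bins
    bin_width = 180.0 / bins
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            gx = cell[r][c + 1] - cell[r][c - 1]      # x derivative
            gy = cell[r + 1][c] - cell[r - 1][c]      # y derivative
            magnitude = math.hypot(gx, gy)
            # Unsigned orientation: opposite directions share a bin.
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[int(angle // bin_width) % bins] += magnitude
    return hist


# A vertical edge: dark on the left, bright on the right.
vertical_edge = [[0 if c < 3 else 100 for c in range(6)] for r in range(6)]
# The same edge rotated by 90 degrees.
horizontal_edge = [[0 if r < 3 else 100 for c in range(6)] for r in range(6)]

hist_v = orientation_histogram(vertical_edge)
hist_h = orientation_histogram(horizontal_edge)
print(hist_v)
print(hist_h)
```

The vertical edge puts all its gradient energy in the 0 degree bin and the horizontal edge in the 90 degree bin, so the histogram captures edge direction independently of where the edge sits inside the cell.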


Classifier

• The classifier needs a training (learning) phase

• In this phase the classifier “learns” how to distinguish between the output classes

• This training phase needs a dataset where each image is labelled with the corresponding class name

[Figure: training images labelled DOGS and CATS, together with the training labels, are used to train the classifier]

Choice of classifier

The classifiers that learn from a labelled dataset are called supervised, and they can be divided into 3 major categories:
1. Linear (with training)
2. Trees (with training)
3. Based on distances (without training)


Manual classifier - example

Suppose we want to recognize the type of a bottle by looking at it from above. The diameter of the neck determines the type of bottle.

Pipeline: image → feature extraction (Hough Transform) → features (neck diameter d_in) → manual classifier → class

[Figure: the Hough Transform measures the neck diameter d_1 on a bottle of type 1 and d_2 on a bottle of type 2]

IF d_in > th_1 THEN
    bottle_type = 1
ELSE
    bottle_type = 2
END
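
The manual classifier is a single hand-picked threshold on one feature. A minimal sketch, assuming the neck diameter has already been measured (for instance with a Hough transform) and using a made-up threshold value:

```python
# Hypothetical threshold th_1 in millimetres; a real value would be chosen
# by measuring the two bottle types.
NECK_DIAMETER_THRESHOLD_MM = 30.0


def classify_bottle(neck_diameter_mm, threshold=NECK_DIAMETER_THRESHOLD_MM):
    """Return bottle type 1 for wide necks and type 2 otherwise."""
    if neck_diameter_mm > threshold:
        return 1
    return 2


print(classify_bottle(38.0))   # wide neck
print(classify_bottle(26.0))   # narrow neck
```

No training is involved: all the knowledge sits in the threshold, which works only as long as one feature separates the classes cleanly.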


Supervised classifier – distance example

A more complex problem: is there a cat or a dog in the image?

Kaggle Dogs vs. Cats dataset.

Pipeline: features (3D RGB histogram) → classifier → info

Manual classifier?


Supervised classifier – distance example

A more complex problem: is there a cat or a dog in the image?

Kaggle Dogs vs. Cats dataset.

Pipeline: dataset → features (3D RGB histogram) → classifier without training (k-Nearest Neighbor) → info

[Figure: scatter plot in the feature plane (x1, x2) with labelled cat and dog samples and a query point marked "+"]

Classification of the query point:
• 1-NN: cat
• 3-NN: dog
• 5-NN: dog
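
The k-Nearest Neighbor rule can be sketched in a few lines. The 2D sample points below are invented so that the query reproduces the outcome above (cat for k = 1, dog for k = 3 and k = 5). In the real task each point would be a feature vector such as an RGB histogram.

```python
import math
from collections import Counter


def knn_predict(samples, query, k):
    """Majority vote among the k labelled samples closest to the query."""
    nearest = sorted(samples, key=lambda s: math.dist(s[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Made-up labelled samples: ((x1, x2), label).
samples = [
    ((0.1, 0.0), "cat"),
    ((0.3, 0.0), "dog"),
    ((0.0, 0.4), "dog"),
    ((0.5, 0.5), "dog"),
    ((1.0, 0.0), "cat"),
    ((1.0, 1.0), "cat"),
]
query = (0.0, 0.0)

predictions = {k: knn_predict(samples, query, k) for k in (1, 3, 5)}
print(predictions)
```

There is no training step: the labelled dataset itself is the model, and the choice of k changes the answer when the closest sample is an outlier.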


Supervised classifier - linear

There are various linear supervised classifiers, such as:
• Logistic regression
• SVM
• Neural Network

Kaggle Dogs vs. Cats dataset.

Pipeline: features (3D RGB histogram) + labels → training → parameters → trained classifier → “cat”


Supervised learning – tree

There are different classifiers based on trees:
• Decision trees
• Random forest
• Gradient Boosting

Kaggle Dogs vs. Cats dataset.

Pipeline: features (3D RGB histogram) + labels → training → parameters → trained classifier → “cat”


Viola Jones object detector (Haar Cascades)

• First object detection framework to provide competitive object detection rates in real time

• Employs Haar features to characterize the input image

Viola, Jones: Robust Real-time Object Detection, IJCV 2001

Each feature results in a single value, computed by subtracting the sum of the pixels under the white rectangle from the sum of the pixels under the black rectangle.
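
A two-rectangle Haar feature can be computed quickly with an integral image (summed-area table), the device Viola and Jones use to make rectangle sums cost four look-ups each. The sketch below is a minimal version: the function names, the layout (white on the left, black on the right) and the test image are choices made for illustration.

```python
def integral_image(image):
    """Summed-area table with an extra zero row and column at the top and left."""
    rows, cols = len(image), len(image[0])
    ii = [[0] * (cols + 1) for _ in range(rows + 1)]
    for r in range(rows):
        for c in range(cols):
            ii[r + 1][c + 1] = image[r][c] + ii[r][c + 1] + ii[r + 1][c] - ii[r][c]
    return ii


def rect_sum(ii, top, left, height, width):
    """Sum of the pixels inside a rectangle, using four table look-ups."""
    bottom, right = top + height, left + width
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


def two_rect_feature(ii, top, left, height, width):
    """White rectangle on the left half, black rectangle on the right half."""
    half = width // 2
    white = rect_sum(ii, top, left, height, half)
    black = rect_sum(ii, top, left + half, height, half)
    return black - white


# 4x4 image: dark columns on the left, bright columns on the right.
image = [[10, 10, 50, 50] for _ in range(4)]
ii = integral_image(image)
value = two_rect_feature(ii, 0, 0, 4, 4)
print(value)
```

A large magnitude means the window contains the dark/bright contrast the feature is looking for; on a uniform window the value is zero.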


Viola Jones object detector (Haar Cascades)

• First object detection framework to provide competitive object detection rates in real time

• Employs Haar features to characterize the input image

• Makes use of a bank of AdaBoost classifiers arranged in a cascade

Viola, Jones: Robust Real-time Object Detection, IJCV 2001

Sub-windows of the image at different scales are passed through a series of classifiers and discarded as soon as they fail any stage.


Viola Jones object detector (Haar Cascades)

• First object detection framework to provide competitive object detection rates in real time

• Employs Haar features to characterize the input image

• Makes use of a bank of AdaBoost classifiers arranged in a cascade

https://vimeo.com/12774628


N. Dalal and B. Triggs Histograms of oriented gradients for human detection CVPR, 2005

Step 1: scan image at all scales and locations

Step 2: extract features over each sliding window location

Step 3: use linear SVM to classify features extracted from each window

Step 4: apply non-maxima suppression to obtain final bounding boxes

Pros/Cons:

+ Real-time

+ Open source software implementation (Dlib)

+ Higher accuracy than Haar Cascades

- Pre-trained only for frontal faces

- Low accuracy on distant and overlapping faces

- High false-positive rate

- High sensitivity to parameter changes

An example – HOG for pedestrian detection
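
Step 4, non-maxima suppression, can be sketched as a greedy procedure based on intersection over union (IoU). The box format (x1, y1, x2, y2), the 0.5 threshold and the sample detections are assumptions made for this example, not values from the cited paper.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(detections, iou_threshold=0.5):
    """Keep the best-scoring box, drop boxes overlapping it too much, repeat."""
    remaining = sorted(detections, key=lambda d: d[1], reverse=True)
    kept = []
    while remaining:
        best = remaining.pop(0)
        kept.append(best)
        remaining = [d for d in remaining if iou(best[0], d[0]) <= iou_threshold]
    return kept


# Made-up detections as (box, score): two windows on the same pedestrian
# and one window elsewhere in the image.
detections = [
    ((1, 1, 11, 11), 0.8),
    ((0, 0, 10, 10), 0.9),
    ((20, 20, 30, 30), 0.7),
]
final = non_max_suppression(detections)
print(final)
```

Sliding-window detectors fire on many neighboring windows around the same object, so this step is what turns a cloud of overlapping boxes into one box per object.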


Where are we?

PASCAL VOC Challenge

It is an online competition for object detection (classification and localization of objects)

20 classes.

11,530 images

27,450 objects


Where are we?

• Plateau of results between 2011 and 2012

• The 2012 results were achieved using a combination of HOG+LBP features and an ensemble of classifiers in: Boosted Local Structured HOG-LBP for Object Localization, Junge Zhang, Kaiqi Huang, Yinan Yu and Tieniu Tan, 2012

PASCAL Visual Object Classes:
• Train/validation/test: 9,963 images containing 24,640 annotated objects
• 20 classes of objects


Where are we?


Convolutional Neural Network

CNN


Convolutional Neural Network

Training: train images + training labels → training → learned model (learned features + learned classifier)

Prediction: test image/s → learned model (learned features + learned classifier) → “It’s an apple”


Convolutional Neural Network

In deep learning, a convolutional neural network (CNN, or ConvNet) is a class of deep neural networks, most commonly applied to analysing visual imagery.

Convolutional networks were inspired by biological processes in that the connectivity pattern between neurons resembles the organization of the animal visual cortex. Individual cortical neurons respond to stimuli only in a restricted region of the visual field known as the receptive field.

CNNs use relatively little pre-processing compared to other image classification algorithms. This means that the network learns the filters that in traditional algorithms were hand-engineered.

A CNN consists of an input and an output layer, as well as multiple hidden layers. The hidden layers of a CNN typically consist of convolutional layers, RELU layer i.e. activation function, pooling layers, fully connected layers and normalization layers.

https://en.wikipedia.org/wiki/Convolutional_neural_network


Convolutional Neural Network

[Figure: chart with the human reference level marked at ≈5.1]


CNN - AlexNet

Conv 1

Conv 2

Conv 5

Conv 4

Conv 3

FC

8

FC

7

FC

6

OU

TP

UT

Inp

ut Im

age

Convolutional Layer

Recap

Convolution

Convolutional Layer

Input image: 32x32x3 (height, width, depth)

Filter: 5x5x3

1. Convolve the filter with the image (“slide” it over the image spatially)

2. The filter must have the same depth as the previous layer (in this case 3)

3. Convolution preserves the spatial structure

Convolutional Layer

Input image: 32x32x3

Filter: 5x5x3

1 number: the convolution result, i.e. the dot product between the filter and a small 5x5x3 chunk of the image

Slide (convolve) over all spatial locations

Activation Map: 28x28x1
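The sliding dot product described above can be sketched in a few lines of NumPy. This is only a minimal illustration, not an efficient implementation: the function name `conv_single_filter` and the random image and filters are assumptions made for the example, and (as in most deep learning libraries) the filter is not flipped, so strictly speaking this is a cross-correlation.

```python
import numpy as np

def conv_single_filter(image, kernel, stride=1):
    """Slide one filter over an H x W x D image (no padding) and return its activation map."""
    H, W, D = image.shape
    F, F2, Dk = kernel.shape
    assert F == F2 and Dk == D, "filter must be square and match the input depth"
    out_h = (H - F) // stride + 1
    out_w = (W - F) // stride + 1
    activation_map = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride
            chunk = image[r:r + F, c:c + F, :]             # small F x F x D chunk of the image
            activation_map[i, j] = np.sum(chunk * kernel)  # dot product -> 1 number
    return activation_map

rng = np.random.default_rng(0)
image = rng.random((32, 32, 3))    # stand-in for a 32x32x3 input image
kernel = rng.random((5, 5, 3))     # stand-in for one 5x5x3 filter

activation_map = conv_single_filter(image, kernel)
print(activation_map.shape)        # (28, 28)

# Several filters give several activation maps, stacked along the depth axis.
filters = [rng.random((5, 5, 3)) for _ in range(4)]
volume = np.stack([conv_single_filter(image, f) for f in filters], axis=-1)
print(volume.shape)                # (28, 28, 4)
```

Each output value depends only on a 5x5x3 neighbourhood of the input, and the same filter weights are reused at every location.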

Convolutional Layer

Input image: 32x32x3

Filter #2: 5x5x3

Slide (convolve) over all spatial locations

Activation Map: a second 28x28x1 map

Convolutional Layer

If we have 4 filters, we get 4 activation maps

Slide (convolve) over all spatial locations

We obtain a new 28x28x4 image

Convolutional Layer

In the previous step, convolving reduced the spatial dimension from 32 to 28

7x7 image, 3x3 filter

→ Produces a 5x5 output

To keep the dimensionality (important for a very deep neural network), we can add a frame around the image, called padding

Convolutional Layer

9x9 (7x7 image + 1 frame of zero padding)

3x3 filter

The size of the padding depends on the size of the filter

→ Produces a 7x7 output

[Figure: 9x9 grid whose outer frame is filled with zeros]

If we want to reduce the size of the output of the convolution layer, we can use the stride parameter

Convolutional Layer

9x9 (7x7 image + 1 frame of padding), N = 9

3x3 filter, F = 3

Stride 2

→ Produces a 4x4 output

In general:

$\text{Output} = \frac{N - F}{\text{stride}} + 1$

Convolutional Layer

The Conv Layer:
• Accepts a volume of size W1 x H1 x D1
• Requires 4 hyperparameters:
  • Number of filters K
  • Their spatial extent F
  • The amount of zero padding P
  • The stride S
• Produces a volume of size W2 x H2 x D2 where:
  • $W_2 = \frac{W_1 - F + 2P}{S} + 1$
  • $H_2 = \frac{H_1 - F + 2P}{S} + 1$
  • $D_2 = K$
• It introduces $F \cdot F \cdot D_1$ weights per filter, for a total of $(F \cdot F \cdot D_1) \cdot K$ weights (10 filters of size 5x5 on an RGB image have $5 \cdot 5 \cdot 3 \cdot 10 = 750$ weights, plus one bias per filter)

Common settings:
• K = powers of 2 (e.g. 32, 64, 128, 512)
• F = 3, S = 1, P = 1
• F = 5, S = 1, P = 2
• F = 5, S = 2, P = ? (whatever fits)
• F = 1, S = 1, P = 0
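The size and weight-count formulas above can be checked against the numbers used in the previous slides. This is a small sketch; the helper names `conv_output_size` and `conv_param_count` are made up for this example.

```python
def conv_output_size(n, f, p=0, s=1):
    """Output side length for input size n, filter size f, zero padding p, stride s."""
    span = n - f + 2 * p
    if span % s != 0:
        raise ValueError("the filter does not fit: (n - f + 2p) is not divisible by s")
    return span // s + 1

def conv_param_count(f, d1, k, with_bias=False):
    """Number of weights of k filters of size f x f x d1 (optionally plus one bias each)."""
    weights = f * f * d1 * k
    return weights + k if with_bias else weights

print(conv_output_size(32, 5))            # 28  (32x32 image, 5x5 filter)
print(conv_output_size(7, 3))             # 5   (7x7 image, 3x3 filter)
print(conv_output_size(7, 3, p=1))        # 7   (one frame of padding keeps the size)
print(conv_output_size(7, 3, p=1, s=2))   # 4   (padding 1, stride 2)
print(conv_param_count(5, 3, 10))         # 750 (10 filters, 5x5, RGB input)
```

Raising an error when the stride does not divide evenly is a design choice of this sketch; real libraries usually round down or pad instead.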

Brain / Convolutional Layer

[Figure: a 5x5x3 filter slides over a 32x32x3 input; each convolution result is one value of the 28x28x1 activation map]

This is like a single neuron with local connectivity

An activation map is a 28x28 sheet of neuron outputs:
1. Each is connected to a small region in the input
2. All of them share parameters

“5x5 filter” -> “5x5 receptive field for each neuron”

Brain / Convolutional Layer

[Figure: a 32x32x3 input mapped to a 28x28x5 output volume]

Using 5 filters, we stack the neurons in a 28x28x5 volume.

This means that 5 different neurons look at the same patch of the image, and each one produces its own output.

Activation layer

[Figure: neuron model; the weighted sum of the inputs is done by convolution, the non-linearity is done by the activation function]

Activation layer

Sigmoid: $y(x) = \frac{1}{1 + e^{-x}}$

Hyperbolic tangent: $y(x) = \tanh(x)$

ReLU: $y(x) = \max(0, x)$

Leaky ReLU: $y(x) = \max(0.1x, x)$

ELU: $y(x) = \begin{cases} x & x \ge 0 \\ \alpha(e^{x} - 1) & x < 0 \end{cases}$
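As a quick reference, the five activation functions of this slide can be written with NumPy as follows. This is a sketch: the default `alpha=1.0` for ELU is an assumption, since the slide does not give a value.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    return np.tanh(x)

def relu(x):
    return np.maximum(0.0, x)

def leaky_relu(x, slope=0.1):
    # slope 0.1 as on the slide
    return np.maximum(slope * x, x)

def elu(x, alpha=1.0):
    # exp is only evaluated on the non-positive part to avoid overflow for large x
    return np.where(x >= 0, x, alpha * (np.exp(np.minimum(x, 0.0)) - 1.0))

x = np.array([-2.0, 0.0, 2.0])
for fn in (sigmoid, tanh, relu, leaky_relu, elu):
    print(fn.__name__, fn(x))
```

All of them are applied element-wise, so the activation layer never changes the shape of the volume produced by the convolution.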

Pooling layer

This layer reduces the spatial size of the representation of the image. It aims to reduce the number of parameters and the amount of computation in the network, and hence also to control overfitting

MAX POOLING: 2x2 filter, stride of 2

The idea is to reduce the size while keeping, in each window, the most activated neuron

Pooling layer

• Accepts a volume of size W1 x H1 x D1
• Requires two hyperparameters:
  • the spatial extent F
  • the stride S
• Produces a volume of size W2 x H2 x D2 where:
  • $W_2 = \frac{W_1 - F}{S} + 1$
  • $H_2 = \frac{H_1 - F}{S} + 1$
  • $D_2 = D_1$
• Introduces zero parameters, since it computes a fixed function of the input
• Different versions of pooling exist. The most used is MAX pooling, followed by AVG pooling
• Downsampling can also be obtained with a Conv Layer that uses a larger stride
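A minimal NumPy sketch of max pooling with a 2x2 filter and stride 2; the function name `max_pool` and the 4x4 example values are made up for illustration.

```python
import numpy as np

def max_pool(volume, f=2, s=2):
    """Max pooling over an H x W x D volume; every depth slice is pooled independently."""
    h, w, d = volume.shape
    out_h = (h - f) // s + 1
    out_w = (w - f) // s + 1
    out = np.zeros((out_h, out_w, d), dtype=volume.dtype)
    for i in range(out_h):
        for j in range(out_w):
            window = volume[i * s:i * s + f, j * s:j * s + f, :]
            out[i, j, :] = window.max(axis=(0, 1))   # keep the most activated value
    return out

x = np.array([[1, 1, 2, 4],
              [5, 6, 7, 8],
              [3, 2, 1, 0],
              [1, 2, 3, 4]], dtype=float)[:, :, None]   # 4x4x1 volume

pooled = max_pool(x)
print(pooled[:, :, 0])
# [[6. 8.]
#  [3. 4.]]
```

The function has no learnable parameters, and the depth of the volume is unchanged.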

Fully connected layer

These layers are just like the layers of a classic neural network.

Every input is connected to every output

Fully connected layer

In CNNs, their task is to “classify” the high-level features extracted by the previous layers

Image 32x32x3, stretched to 3072x1

The output size can be the number of classes (e.g. 10), i.e. 10x1

W: 10x3072 weights

This is the layer involved in the concept of “Transfer Learning” (more on this later)
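The shapes on this slide can be reproduced with a few lines of NumPy. This is a sketch with random values; the bias vector `b` is an assumption, as the slide only shows the weight matrix W.

```python
import numpy as np

rng = np.random.default_rng(0)
image = rng.random((32, 32, 3))              # stand-in for a 32x32x3 input

x = image.reshape(-1)                        # stretched to 3072 values
W = rng.standard_normal((10, 3072)) * 0.01   # 10 x 3072 weights (random, untrained)
b = np.zeros(10)                             # one bias per class (assumed)

scores = W @ x + b                           # one score per class
predicted_class = int(np.argmax(scores))

print(x.shape, W.shape, scores.shape)        # (3072,) (10, 3072) (10,)
```

Every one of the 3072 inputs contributes to every one of the 10 outputs, which is why this layer alone already holds 30720 weights.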

CNN – AlexNet – Recap

[Figure: AlexNet architecture, from input to output]
• Input Image
• Conv 1: 11x11x3x96, stride 4
• Conv 2: 5x5x96x256
• Conv 3: 3x3x256x384
• Conv 4: 3x3x384x384
• Conv 5: 3x3x384x256
• FC 6
• FC 7
• FC 8
• OUTPUT

CNN – AlexNet – Recap

Figure credit: Zeiler and Fergus, “Visualizing and Understanding Convolutional Networks”, ECCV 2014

Learned features (and filters) are more complex than the hand-engineered ones

Hierarchical features

Other networks

GoogLeNet. The ILSVRC 2014 winner. Its main contribution was the development of an Inception Module that dramatically reduced the number of parameters in the network (4M, compared to AlexNet with 60M). Additionally, it uses Average Pooling instead of Fully Connected layers at the top of the ConvNet, eliminating a large number of parameters. (Listed below as Inception V1.)

VGGNet. The runner-up in ILSVRC 2014. Its main contribution was in showing that the depth of the network is a critical component for good performance. Their final best network contains 16 CONV/FC layers and, appealingly, features an extremely homogeneous architecture that only performs 3x3 convolutions and 2x2 pooling from the beginning to the end.

ResNet. Residual Network was the winner of ILSVRC 2015. It features special skip connections and a heavy use of batch normalization. The architecture is also missing fully connected layers at the end of the network (as of May 10, 2016).

Architecture   #Params  Size    Accuracy  Year  #OPS   FW time [GPU]  FW time [CPU]
AlexNet        61M      238 MB  80.2      2012  724M   3.1 ms         0.29 s
Inception V1   7M       70 MB   88.3      2014  1.43B  -              -
VGGNet         138M     528 MB  91.2      2014  15.5B  9.4 ms         4.36 s
ResNet-50      25.5M    99 MB   93        2015  3.9B   11 ms          1.13 s

GPU: Titan X | CPU: i7-4790K (4 GHz)

Training Process

1. Create your model

2. Choose the Activation Functions (use ReLU)

3. Data Preprocessing (images: subtract mean)

4. Weight Initialization

5. Use Batch Normalization

6. Babysitting the Learning process

7. Hyperparameter Optimization
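Step 3 (data preprocessing by mean subtraction) can be sketched as follows. This assumes a per-channel mean computed on the training set only; the random images are a stand-in for real data.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in training set: 8 RGB images of size 32x32 with values in [0, 255]
train_images = rng.integers(0, 256, size=(8, 32, 32, 3)).astype(np.float32)

# One mean per colour channel, computed over all training images and pixels
channel_mean = train_images.mean(axis=(0, 1, 2))   # shape (3,)

# The same mean must be subtracted from validation and test images as well
centered = train_images - channel_mean

print(channel_mean.shape)   # (3,)
```

Subtracting the full mean image instead of a per-channel mean is another common variant.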

An Object Detection Pipeline

How to use a CNN on your data

AIM: being able to reuse a CNN for object detection on our own data

1. REUSE A CNN
2. OBJECT DETECTION
3. OWN DATA

Data gathering

Suppose we want to create a CV application that can detect playing cards (for simplicity, only 9, 10, Jack, Queen, King and Ace)

Since the algorithm is supervised, we need a dataset with the associated labels

The dataset must be:
1. As large as possible (at least 200 items per class)
2. With the same object in different “conditions” (background, lighting)
3. With random objects alongside the desired object
4. Representative of the “application conditions”: decide whether we need partial objects, overlapping objects and so on
5. Free of label errors
6. Not too large (less training time)

You can create the dataset by taking pictures (e.g. with a smartphone) or by harvesting images from Google Images.

Labelling

We need to annotate each image, to tell the CNN what is in the image.

We are building an object detector, so the labelling process involves drawing bounding boxes

[Figure: example image with bounding boxes labelled “10”, “King” and “King”]

1. This process is how we transfer our knowledge to the data

2. Many open-source tools are available (e.g. LabelImg → https://github.com/tzutalin/labelImg)

3. The tool creates an XML file associated with the image

<object>
  <name>ten</name>
  <pose>Unspecified</pose>
  <truncated>0</truncated>
  <difficult>0</difficult>
  <bndbox>
    <xmin>145</xmin>
    <ymin>68</ymin>
    <xmax>303</xmax>
    <ymax>225</ymax>
  </bndbox>
</object>
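An annotation like the one above can be read back with Python's standard library. This is a sketch: the enclosing `<annotation>` root element is an assumption (it follows the Pascal VOC layout), and `read_boxes` is a made-up helper name.

```python
import xml.etree.ElementTree as ET

# The <object> element from the slide, wrapped in an assumed <annotation> root
annotation = """
<annotation>
  <object>
    <name>ten</name>
    <pose>Unspecified</pose>
    <truncated>0</truncated>
    <difficult>0</difficult>
    <bndbox>
      <xmin>145</xmin>
      <ymin>68</ymin>
      <xmax>303</xmax>
      <ymax>225</ymax>
    </bndbox>
  </object>
</annotation>
"""

def read_boxes(xml_text):
    """Return one (label, xmin, ymin, xmax, ymax) tuple per annotated object."""
    root = ET.fromstring(xml_text.strip())
    boxes = []
    for obj in root.iter("object"):
        bnd = obj.find("bndbox")
        coords = tuple(int(bnd.findtext(tag)) for tag in ("xmin", "ymin", "xmax", "ymax"))
        boxes.append((obj.findtext("name"),) + coords)
    return boxes

boxes = read_boxes(annotation)
print(boxes)   # [('ten', 145, 68, 303, 225)]
```

For a file on disk, `ET.parse(path).getroot()` can replace `ET.fromstring`.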

Object detection

Our dataset is ready; we now need to find a model for object detection.

So far we have learned how to do image classification. We can use the same models and slide a window over the image.

CONS: the window must be tried with different shapes at every position, which takes a huge amount of time.
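To get a feeling for the cost, the number of crops a sliding-window detector has to classify can simply be counted. This is a sketch: the image size, window shapes and stride below are hypothetical values chosen for illustration.

```python
def sliding_windows(img_h, img_w, win_h, win_w, stride):
    """Yield (top, left, height, width) for every window position inside the image."""
    for top in range(0, img_h - win_h + 1, stride):
        for left in range(0, img_w - win_w + 1, stride):
            yield top, left, win_h, win_w

# Hypothetical setup: a 480x640 image, three window shapes, stride of 16 pixels
window_shapes = [(64, 64), (128, 64), (64, 128)]
total = sum(
    1
    for win_h, win_w in window_shapes
    for _ in sliding_windows(480, 640, win_h, win_w, stride=16)
)
print(total)   # 2741 crops, each requiring one forward pass of the classifier
```

Even with only three window shapes and a coarse stride, one image already needs thousands of forward passes.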

Object detection – Classification based

Region proposal: run the CNN only on the parts of the image that may contain an object

R-CNN

Further improvements led to the creation of:
• Fast R-CNN
• Faster R-CNN

They are also called two-stage algorithms

Girshick et al, “Rich feature hierarchies for accurate object detection and semantic segmentation”, CVPR 2014

Object detection – Regression based

NO region proposal: run the CNN on “pre-defined” areas

Divide the image into a 7 x 7 grid

Imagine you have B base boxes for each grid cell

For each base box, the CNN regresses a new set of values

(dx, dy, dh, dw, confidence)

The output of these networks is one big tensor

7 x 7 x (B * (5 + C)), where C is the number of classes

Object detection – Regression based

NO region proposal

Classes: C = 2 [cat, dog]
Boxes: B = 3

Output: 7x7x21

Each box is described by 7 values: (confidence[0.85], bx, by, bh, bw, cat[0], dog[1])

The most famous architectures are:
• YOLO (You Only Look Once)
• SSD (Single Shot MultiBox Detector)

They are also called one-stage algorithms, and they were designed specifically for object detection

They are faster than the two-stage methods, but also less accurate.
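The size of the output tensor follows directly from the grid size, the number of boxes B and the number of classes C. A small sketch (the function name `detection_output_shape` is made up for this example):

```python
import numpy as np

def detection_output_shape(grid=7, boxes_per_cell=3, num_classes=2):
    """Each box carries 5 values (4 box offsets + confidence) plus one score per class."""
    return (grid, grid, boxes_per_cell * (5 + num_classes))

shape = detection_output_shape()     # C = 2 (cat, dog), B = 3
print(shape)                         # (7, 7, 21)

# The flat 21-value vector of each cell can be viewed as 3 boxes x 7 values
output = np.zeros(shape)
per_box = output.reshape(7, 7, 3, 5 + 2)
print(per_box.shape)                 # (7, 7, 3, 7)
```

With B = 3 and C = 2 this reproduces the 7x7x21 output of the example.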

Object detection

Once we have chosen the model, we can download its structure and weights.

https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/detection_model_zoo.md

These models already have the structure of the CNN implemented.

The weights we download, however, were trained on some other dataset (COCO, Pascal, Kitti…)

How can we use these huge, already-trained networks for our problem?

The solution is transfer learning

Transfer learning

Training a CNN from scratch requires a very big dataset and very expensive hardware

[Figure: a network pre-trained on ImageNet; panels “Already trained”, “Small dataset” and “Big dataset”; layers are marked “Freeze” or “Retrain”; the output size is “#Class”]

It re-uses the ability learnt in another task
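The freeze/retrain idea can be illustrated without any deep learning framework. In the toy sketch below, everything is an assumption made for illustration: a fixed random projection stands in for the pre-trained layers, the dataset is synthetic, and only the final classification layer is trained with plain gradient descent.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the pre-trained (frozen) layers: fixed projection followed by ReLU
W_frozen = rng.standard_normal((16, 8)) * 0.25
W_frozen_before = W_frozen.copy()

def features(x):
    return np.maximum(0.0, x @ W_frozen)

# Hypothetical small dataset: 40 samples, 16 inputs, 2 classes
X = rng.standard_normal((40, 16))
y = (X[:, 0] > 0).astype(int)

# The frozen layers never change, so their features can be computed once
F = features(X)

# New classification head: the only part that is retrained
W_head = np.zeros((8, 2))

def loss_and_grad(W):
    logits = F @ W
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)
    rows = np.arange(len(y))
    loss = -np.log(probs[rows, y]).mean()                 # softmax cross-entropy
    probs[rows, y] -= 1.0
    grad = F.T @ probs / len(y)
    return loss, grad

initial_loss, _ = loss_and_grad(W_head)
for _ in range(200):
    _, grad = loss_and_grad(W_head)
    W_head -= 0.1 * grad        # update the head only; W_frozen is never touched
final_loss, _ = loss_and_grad(W_head)

print(initial_loss, final_loss)
```

In a real pipeline the frozen part would be the downloaded pre-trained network and the head would have as many outputs as the new dataset has classes.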

Object detection pipeline

1. Create your own dataset
2. Choose your network structure (Faster R-CNN / SSD, ...)
3. Modify the fully connected layers
4. Do a «transfer learning» training
5. Detect your objects

Object detection examples

[Figures: YOLO – 1 class; SSD – 5 classes]

Object detection examples

[Figure: processing pipeline with the blocks: Calibrated cropping; Persons Detection (R-FCN); Face Extraction (D-Lib); Age/Gender estimation (VGG-16 + DC layer). Example outputs: W – (25,35), M – (38,43)]

Bibliography

• https://cs.stanford.edu/people/karpathy/convnetjs/demo/cifar10.html
• https://www.youtube.com/playlist?list=PL3FW7Lu3i5JvHM8ljYj-zLfQRF3EO8sYv
• http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf
• https://github.com/EdjeElectronics/TensorFlow-Object-Detection-API-Tutorial-Train-Multiple-Objects-Windows-10
• https://project.inria.fr/deeplearning/files/2016/05/DLFrameworks.pdf [for an in-depth comparison of DL frameworks]