
[243] turning data into value

Jan 07, 2017



NAVER D2
Page 1: [243] turning data into value
Page 2: [243] turning data into value

Ph.D in Computer Science at ENS Paris/INRIA

Postdoctoral Fellow at Carnegie Mellon University

>500 citations, Best Paper Award at 2009 CVPR Conference

NEC Labs (Bell Labs) in Cupertino (Silicon Valley)

Senior Researcher at Intel (3 pending patents)

- Developed ML algorithms for face recognition

Invited speaker to CMU, Samsung, Tokyo Univ, SNU, etc.

Co-Founder of Solidware

Olivier Duchenne

Co-founder | Chief Machine Learning Scientist

8 years of experience in Machine Learning, Computer Vision and Big Data

Page 3: [243] turning data into value

Guidelines for using Machine Learning on real data

Avoid Common Mistakes

Understand the Data Better

1. Big Enough Data?

2. Changing Data

Machine Learning and Data Science

Page 4: [243] turning data into value

From Computer Vision Experience

To Solving Companies' Issues:

Ex: car accident prediction (insurance), default prediction (bank), stock value prediction

Machine Learning and Data Science

Page 5: [243] turning data into value

Machine-Learning based Predictive Modeling

PAST DATA (Training Data Set): Internal Data (Ex: Age, Gender), External Data (Ex: Web Crawl), Target Value

ML algorithms analyze historical data to detect patterns

Newly Incoming Data: Internal Data, External Data, Unknown Target Value

Prediction Function → Predicted Target Value

Page 6: [243] turning data into value

1. Prediction Function. Ex: a linear function, a neural net,…

2. The prediction function is parametrized. Ex: $f_\alpha(X) = \sum_i \alpha_i X_i$

3. The goal is to find the best prediction function, i.e. the best parameters.

4. We build an objective function that represents how good a prediction function is.

5. The objective function always has a data term. Ex: $\mathrm{obj}(\alpha) = \sum_s \left(f_\alpha(X_s) - Y_s\right)^2$

6. The algorithm tries to find the parameters that optimize this objective function. Ex: closed-form solution, stochastic gradient descent, …

Basic Explanation of Machine Learning
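The six steps above can be sketched for the linear example. This is a minimal illustration, assuming NumPy, synthetic data, and a least-squares solver as the "closed-form solution"; the function names and the data are hypothetical and not taken from the slides.

```python
import numpy as np

def predict(alpha, X):
    """Linear prediction function f_alpha(X) = sum_i alpha_i * X_i."""
    return X @ alpha

def objective(alpha, X, Y):
    """Data term: sum over samples s of (f_alpha(X_s) - Y_s)^2."""
    residuals = predict(alpha, X) - Y
    return float(np.sum(residuals ** 2))

def fit_least_squares(X, Y):
    """Parameters minimizing the objective, via NumPy's least-squares solver."""
    alpha, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return alpha

# Synthetic data (illustrative only): Y = 2*X_1 - 1*X_2 plus small noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
true_alpha = np.array([2.0, -1.0])
Y = X @ true_alpha + 0.01 * rng.normal(size=200)

alpha_hat = fit_least_squares(X, Y)
print("estimated parameters:", alpha_hat)
print("objective at estimate:", objective(alpha_hat, X, Y))
```

Stochastic gradient descent would minimize the same objective iteratively instead of solving it in one step.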

Page 7: [243] turning data into value

History of Machine Learning for Computer Vision

Model-Driven → Mixed → Data-Driven

1970s: Hand-designed Model

1980s: Alignment Method

1990s: Grid Model

2000s: Deformable Model

2010s: Conv. Network

Page 8: [243] turning data into value

Why didn’t people use ML from the beginning?

Commonly assumed reasons:

1. “Better computers” available now

2. “Better algorithms”

3. “Amount of data”

“We create so much data that 90% of the data in the world today has been created in the last two years alone”

- Petter Bae Brandtzæg, SINTEF ICT

Page 9: [243] turning data into value

How much data did CV researchers use?

2004: Caltech 101, 10K images

2005-2010: Pascal VOC, 2K to 30K objects

2010-2015: ImageNet, 10M to 15M images

Image sources: http://www.vision.caltech.edu/, http://doi.ieeecomputersociety.org/, http://www.image-net.org/

Page 10: [243] turning data into value

The answer is… “Amount of Data”

Image source - Smartdatacollective.com

• The most advanced machine learning methods cannot be applied if there is not enough data

• A critical mass of data is necessary to use, for example, deep learning

• When the amount of data increases, the machine learning models, and therefore the prediction model, can become more complex and better

Page 11: [243] turning data into value

With enough data, is ANY algorithm okay?

Support vector machines, Bayesian networks, Regression forest, Sparse dictionary learning, Artificial neural networks, K-nearest neighbors, Deep learning, Boosting

No, it depends on the company and the problem you are trying to solve

A / B / C: Deep Learning / Neural Networks / Log. Regression

Page 12: [243] turning data into value

What changed in the machine learning domain from the past to the present:

Page 13: [243] turning data into value

Why do we need lots of data?

Overfitting (synonym: overgeneralizing)

That is like visiting a new place for one day, seeing a mountain fire, and believing that there are fires there every day.

In real life, we rarely get the chance to work with clean & BIG data

Page 14: [243] turning data into value

An example: Overfitting due to lack of data

[Bar chart: probability of default by city (Seoul, Busan, Daejeon, Gwangju, … many more cities); y-axis from 0 to 0.14]

As there are many categories, some categories with little data show outlier results

Page 15: [243] turning data into value

[Bar chart: probability of default by city (Seoul, Busan, Daejeon, Gwangju, … many more cities); y-axis from 0 to 0.14]

So, always use error bars
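One way to compute such error bars for a per-city default rate is a binomial confidence interval. The sketch below uses the Wilson score interval, which is a choice made here and not something the slides specify; the city counts are hypothetical.

```python
import math

def wilson_interval(defaults, n, z=1.96):
    """Approximate 95% Wilson score interval for a rate of defaults / n."""
    if n == 0:
        return (0.0, 1.0)  # no data: the rate could be anything
    p_hat = defaults / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return (max(0.0, center - half), min(1.0, center + half))

# Hypothetical counts: (number of defaults, number of loans) per city.
cities = {"Large city": (500, 10000), "Small city": (0, 50)}
for name, (d, n) in cities.items():
    low, high = wilson_interval(d, n)
    print(f"{name}: rate={d / n:.3f}, interval=[{low:.3f}, {high:.3f}]")
```

The small city has an observed rate of 0, but its interval stays wide, which signals that the 0 should not be trusted as is.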

Page 16: [243] turning data into value

You want to detect an event which occurs on average with probability p = 5%

Let’s say you have many cities with ~50 samples each

On average, 1 city in 13 will have this event 0 times.

Without proper handling, the extreme cases will all be wrong.

This kind of error can happen often
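The 1-in-13 figure can be checked directly: if each of the 50 samples independently has the event with probability 5%, the chance of seeing it 0 times is 0.95 to the power 50. A short check, assuming independent samples:

```python
p = 0.05   # probability of the event for one sample
n = 50     # samples per city

# Probability that a city with n independent samples shows the event 0 times.
prob_zero = (1 - p) ** n

print(f"P(0 events) = {prob_zero:.4f}")
print(f"about 1 city in {1 / prob_zero:.1f}")
```

A naive per-city estimate would assign those cities a probability of 0, although their true rate is still 5%.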

Page 17: [243] turning data into value

How to fight against overfitting

Data: More Samples, Fewer Variables, Artificial Data Extension

Algorithm: Simpler Objective Function, Regularization, Bagging

Modeling: Feature Engineering, Data Normalization

Page 18: [243] turning data into value

Data

In Computer Vision, it is possible to extend the data.

Ex: Hiring annotators, Amazon Mechanical Turk, Google reCAPTCHA

Companies often have a limited number of samples, and cannot extend it.

Ex: A Korean bank that gives ~100K loans per year

Page 19: [243] turning data into value

How to count your data?

1. Count only positives (detecting rare events requires more data)

Ex: Image detection. It’s easy to find an infinite number of negatives.

Companies often want to detect rare events (few positives)

Ex: predicting car accidents / ad clicks / defaults / online purchases

Page 20: [243] turning data into value

How to count your data?

2. Difficulty of the task

• Learning addition ($y = 1 \cdot X_1 + 1 \cdot X_2$): requires ~100 samples

• Learning object recognition: requires ~10M samples

Page 21: [243] turning data into value

How to count your data?

3. Probabilistic event detection is harder.

What is in this image?

Will this user click on a car advertisement?

Client #1: Male, 27 y.o., lives in Seoul, salaryman in the construction sector, has previously clicked on a car advertisement: Yes

Client #2: Male, 27 y.o., lives in Seoul, salaryman in the construction sector, has previously clicked on a car advertisement: No

Page 22: [243] turning data into value

Algorithm

1. Many algorithms exist: GLM, Boosting, Lasso, Regression Forest, SVM, Gaussian Process, Bayesian Networks, Deep Learning, …

2. The complexity of their prediction functions differs.

3. The more complex the prediction function is, the more closely it fits the data.

[Three plots of purchase probability vs. age, ranging from underfitting to overfitting]
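The effect of prediction-function complexity can be sketched with polynomial fits of increasing degree. Everything here is an assumption made for illustration: the synthetic "purchase probability vs. age" curve, the noise level, the sample sizes, and the use of polynomials as the family of prediction functions.

```python
import numpy as np

rng = np.random.default_rng(0)

def true_prob(age):
    """Hypothetical smooth purchase probability as a function of age."""
    return 0.3 + 0.2 * np.sin(age / 10.0)

# Small training set, larger held-out set, both with observation noise.
ages_train = np.sort(rng.uniform(20, 70, size=15))
ages_test = np.sort(rng.uniform(20, 70, size=200))
y_train = true_prob(ages_train) + 0.05 * rng.normal(size=15)
y_test = true_prob(ages_test) + 0.05 * rng.normal(size=200)

def rescale(age):
    """Map ages to roughly [-1, 1] to keep the polynomial fit well conditioned."""
    return (age - 45.0) / 25.0

def fit_and_score(degree):
    """Fit a polynomial of the given degree; return (train MSE, held-out MSE)."""
    coeffs = np.polyfit(rescale(ages_train), y_train, degree)
    train_mse = np.mean((np.polyval(coeffs, rescale(ages_train)) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, rescale(ages_test)) - y_test) ** 2)
    return train_mse, test_mse

results = {degree: fit_and_score(degree) for degree in (0, 3, 10)}
for degree, (train_mse, test_mse) in results.items():
    print(f"degree {degree:2d}: train MSE={train_mse:.5f}, held-out MSE={test_mse:.5f}")
```

The training error can only go down as the degree grows, so it cannot reveal overfitting on its own; the held-out error is the number to watch.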

Page 23: [243] turning data into value

1. Fewer parameters → less overfitting

2. More parameters → less underfitting

3. Ex: Best of both worlds: Deep Conv Nets

Algorithm

Page 24: [243] turning data into value

Avoiding “Too Many Categories” problem

[Diagram: cities as categories: Busan, Seoul, Daejeon, Daegou, Pohang, Incheon, Soowon, Ulsan]

Page 25: [243] turning data into value

Avoiding “Too Many Categories” problem

[Diagram: cities as categories: Busan, Seoul, Daejeon, Daegou, Pohang, Incheon, Soowon, Ulsan]

Grouping

Merging
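A simple form of merging is to fold every category with too few samples into a single shared bucket. The slides do not say which grouping rule was used, so the threshold, the "Other" label, and the sample counts below are all hypothetical.

```python
from collections import Counter

def merge_rare_categories(values, min_count, other_label="Other"):
    """Replace categories seen fewer than min_count times with other_label."""
    counts = Counter(values)
    return [v if counts[v] >= min_count else other_label for v in values]

# Hypothetical city column: two frequent cities, two rare ones.
cities = ["Seoul"] * 5 + ["Busan"] * 4 + ["Pohang"] + ["Ulsan"]
merged = merge_rare_categories(cities, min_count=3)
print(Counter(merged))
```

Grouping by a domain attribute (for example region or population size) is an alternative when the rare categories are not interchangeable.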

Page 26: [243] turning data into value

Avoiding “Too Many Categories” problem

[Plot: probability of default vs. log10(population); x-axis from 1 to 6, y-axis from 0 to 0.14]

Page 27: [243] turning data into value

Regularization

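$\min_\theta \sum_s \mathrm{loss}\left(gt_s, f_\theta(X_s)\right) + \lambda \Omega(\theta)$

$\min_\theta \sum_s \mathrm{loss}\left(gt_s, f_\theta(X_s)\right)$, s.t. $\Omega(\theta) < \lambda$

$\Omega(\theta) = \|\theta\|_2$ or $\|\theta\|_1$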
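As one concrete instance of the penalized form, the sketch below uses a squared loss, a linear prediction function, and a squared L2 penalty (ridge regression). These choices and the synthetic data are assumptions made here; the slide leaves the loss and $\Omega$ general.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Minimize sum_s (X_s . theta - y_s)^2 + lam * ||theta||_2^2 in closed form."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

# Synthetic data (illustrative): few samples, many variables, one useful variable.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 10))
y = X[:, 0] + 0.5 * rng.normal(size=30)

theta_weak = ridge_fit(X, y, lam=0.01)     # almost no regularization
theta_strong = ridge_fit(X, y, lam=100.0)  # strong regularization

print("norm with weak penalty:  ", np.linalg.norm(theta_weak))
print("norm with strong penalty:", np.linalg.norm(theta_strong))
```

A larger $\lambda$ pulls the parameters toward zero, trading some fit on the training data for less sensitivity to noise.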

Page 28: [243] turning data into value

Data Normalization

Removing variance that has no impact on the target value helps the ML system to focus on meaningful variance

Deep Face (Facebook 2014), DB size: 120M images

Page 29: [243] turning data into value

Bagging

1. Slightly modify the training set at random.

2. Do the training

3. Repeat

4. Average all prediction functions
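The four steps can be sketched as follows. The random modification is taken here to be bootstrap resampling (drawing rows with replacement), and the base model is a linear least-squares fit; both are assumptions for illustration, as are the function names and the data.

```python
import numpy as np

def bagged_linear_fit(X, y, n_models=25, seed=0):
    """Train one linear model per bootstrap resample of the training set."""
    rng = np.random.default_rng(seed)
    n = len(y)
    models = []
    for _ in range(n_models):
        idx = rng.integers(0, n, size=n)  # step 1: resample rows with replacement
        theta, *_ = np.linalg.lstsq(X[idx], y[idx], rcond=None)  # step 2: train
        models.append(theta)              # step 3: repeat
    return models

def bagged_predict(models, X):
    """Step 4: average the predictions of all trained models."""
    return np.mean([X @ theta for theta in models], axis=0)

# Synthetic data (illustrative only).
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=100)

models = bagged_linear_fit(X, y)
pred = bagged_predict(models, X)
print("mean squared error:", np.mean((pred - y) ** 2))
```

Bagging usually helps most with high-variance base models such as decision trees; a linear model is used here only to keep the sketch short.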

Page 30: [243] turning data into value

Changing Data

Data change through time

• Market changes

• Law/Regulation changes

• Collected data changes

• Client filtering / Marketing changes

Representation of data change

• Variable names change

• Category names change

Cyclic data changes

• Seasonality

Trending has to be handled separately

• Interpolation / Extrapolation
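One way to keep the trend separate from the cyclic part is to model a series as a linear trend plus one level per position in the cycle, fitted jointly by least squares. This is a sketch under assumptions that are not in the slides: a monthly series, a 12-step cycle, a linear trend, and noise-free synthetic data.

```python
import numpy as np

def fit_trend_and_seasonality(t, y, period=12):
    """Fit y ~ slope * t + level[t % period] by least squares."""
    phase = t % period
    one_hot = (phase[:, None] == np.arange(period)[None, :]).astype(float)
    design = np.column_stack([t, one_hot])  # trend column + one column per phase
    coeffs, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coeffs[0], coeffs[1:]            # slope, per-phase levels

def predict(t, slope, levels, period=12):
    """Extrapolate the trend and reuse the seasonal level of each phase."""
    return slope * t + levels[t % period]

def series(t):
    """Hypothetical monthly series: upward trend plus a yearly cycle."""
    return 100 + 0.5 * t + 10 * np.sin(2 * np.pi * t / 12)

t = np.arange(48)                 # four observed years
slope, levels = fit_trend_and_seasonality(t, series(t))

future = np.arange(48, 60)        # the following year
forecast = predict(future, slope, levels)
print("estimated slope:", slope)
print("max forecast error:", np.max(np.abs(forecast - series(future))))
```

The forecast is only as good as the assumption that the trend continues unchanged, which is exactly the risk of extrapolating in time.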

Page 31: [243] turning data into value

Why is time so different from other variables?

[Two plots of the probability to buy a smartphone: vs. age, where an unknown value ("?") is found by interpolation, and vs. time, where an unknown value ("?") requires extrapolation]

Page 32: [243] turning data into value

Time is correlated with hidden variables

[Plot: cost for car insurance (one type of insurance) vs. time, with a "New Law" marker on the time axis]

Page 33: [243] turning data into value

Change causes can be unknown, but consistent

[Plot: cost for car insurance (one type of insurance) vs. time]

Page 34: [243] turning data into value

Seasonality

[Plot: cost for car insurance (one type of insurance) vs. time]

Page 35: [243] turning data into value

Changing Data Representation

• Collected Data changes

• Category splitting, merging

• Variable names change

• Category names change

Page 36: [243] turning data into value

Job Applications: [email protected]

Visit our booth

Thank you

Visit our website: solidware.io