
How data science works and how can customers help

Jan 22, 2018


Danko Nikolic
Transcript
Page 1: How data science works and how can customers help

DATA SCIENCE –how we do the magic

And how can the customer help.

Prof. Danko Nikolic, PhD

Page 2: How data science works and how can customers help

How do we become successful together?

MAIN GOAL:

Explain how CSC does data science and what we need from the customer.

Page 3: How data science works and how can customers help

PRIMARY AUDIENCE

technical personnel at the customer side

Page 4: How data science works and how can customers help

How technically deep is this document (scale 1 – 10)?:

5

For effective data science, how essential is collaboration with the customer (scale 1-10)?:

10

Page 5: How data science works and how can customers help

INTRO

Page 6: How data science works and how can customers help

Textbooks…

… use simplified data to explain how you apply statistical methods,

… do not say much about how you deal with data in real life,

… and are thus misleading.

Page 7: How data science works and how can customers help

Textbooks make you believe that an appropriate model for your data already exists.

“You just need to select the right model and apply it.”

Page 8: How data science works and how can customers help

Unfortunately, data science is not that simple.

Data scientists do not just pick models.

Page 9: How data science works and how can customers help

Misconception: a data scientist applies a model.

Correction: a data scientist creates a model.

Page 10: How data science works and how can customers help

Each data set has its own oddities, quirks, issues, …

Each phenomenon that we want to model lives in its own world.

The job of a data scientist is to understand this world, and to tailor a model accordingly.

Page 11: How data science works and how can customers help

Rarely will an off-the-shelf model be outright optimal for a real-life problem.

Page 12: How data science works and how can customers help

What a customer buys: a unique model optimized for the customer’s needs.

Page 13: How data science works and how can customers help

Skills and experience of a data scientist translate into the ability to create customized models.

It may take 10 or 20 years to stack up a skill set to effectively build customized models.

Page 14: How data science works and how can customers help

At CSC, we offer that experience.

Page 15: How data science works and how can customers help

Creation of a model requires one to master:

statistics, coding, optimization, storytelling, visualization, experimental design, Big Data technology, clustering, business models, regression, handling databases, probability, …

Page 16: How data science works and how can customers help

… scientific thinking, deep learning, intuition, distributions, overfitting, information theory, cross-correlation, fractal geometry, computation, multivariate analysis, statistical biases, no-free-lunch theorem, support-vector machine, normalization, regularization, matrix algebra, graph theory, …

… Boltzmann machine, dropout, entropy, auto-associative networks, reinforcement learning, Lasso, Kohonen network, backpropagation, …

… natural language processing, scientific publishing, Bayes’ theorem, genetic algorithms, swarm intelligence, boosting, Markov process, softmax, power spectrum, good regulator theorem, presentation skills, …

… + keeping up with 100s of new models and tools announced every year.

Page 17: How data science works and how can customers help

Hence, a team of experienced data scientists can often navigate this world more effectively and creatively than an individual alone.

Experience + Team is what gets the customer the best model in the end.

Page 18: How data science works and how can customers help

Examples of notable team efforts:

Page 19: How data science works and how can customers help
Page 20: How data science works and how can customers help
Page 21: How data science works and how can customers help

US$ 1,000,000

Page 22: How data science works and how can customers help
Page 23: How data science works and how can customers help

These are all unique, newly created models tailored for a particular purpose.

No existing off-the-shelf model could simply be applied.

Page 24: How data science works and how can customers help

But what will a data scientist do?

How does one create a new model?

Page 25: How data science works and how can customers help

It is important to distinguish a model architecture from a complete model.

Architecture: model specified but without training. Equations and interactions between equations are defined, but parameter values are not yet known.

Complete model: trained model. Parameter values are known. Machine learning has been applied. The model has been fully trained and tested, and is ready to be deployed.
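
As a minimal sketch of this distinction (not taken from the slides), the Python below treats a single linear equation as the architecture and lets gradient descent find the parameter values. The data, the learning rate, and the names `architecture`, `train`, and `complete_model` are illustrative assumptions.

```python
import random


def architecture(x, w, b):
    """Architecture: the equation is fixed, but w and b are not yet known."""
    return w * x + b


def train(xs, ys, lr=0.05, epochs=2000):
    """Training: gradient descent on the mean squared error finds w and b."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [architecture(x, w, b) - y for x, y in zip(xs, ys)]
        grad_w = sum(2 * e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(2 * e for e in errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Hypothetical data generated from y = 0.5 * x + 1 plus a little noise.
rng = random.Random(0)
xs = [i / 10 for i in range(50)]
ys = [0.5 * x + 1.0 + rng.gauss(0, 0.05) for x in xs]

w, b = train(xs, ys)  # parameter values are now known


def complete_model(x):
    """Complete model: the architecture together with the trained parameters."""
    return architecture(x, w, b)
```

Before `train` runs, only the form of the equation exists; afterwards, `complete_model` can be applied to new inputs.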

Page 26: How data science works and how can customers help

Example architecture:

A wiring diagram, defined data flow, topology, equations,…but parameter values are not yet specified.

Page 27: How data science works and how can customers help

Example complete model:

W1,1 = 0.12

W1,2 = 0.03

W2,4 = -0.45

…+

Optimal parameter values are found through the machine learning (training) process.

Page 28: How data science works and how can customers help

Architecture: a human does the work.

+

Training: a machine does the work.

Page 29: How data science works and how can customers help

A data scientist works with a tradeoff between the effort invested in designing a model’s architecture and the effort needed to train the model.

IMPORTANT: The more specialized the architecture for a given problem, the less training is needed.

Page 30: How data science works and how can customers help

Advantages from a specialized architecture:

- smaller datasets for training

- more resilient to over-fitting

- closer to the global optimum

- fewer resources

- cost effective

- better overall performance

Page 31: How data science works and how can customers help

The opposite is an eclectic architecture. An eclectic architecture can be applied to many different kinds of data but needs more training. As a result:

- larger amounts of data needed

- intensive computation

- easily over-fitted

- likely ending in local minima

- higher development costs

- weaker performance

Page 32: How data science works and how can customers help

Architecture versus training: a specialized architecture weighs heavily in the performance of a model.

Page 33: How data science works and how can customers help

Why does specialized architecture enhance learning?

- The architecture already possesses part of the needed knowledge, so less is left to be learned.

- The learning space becomes smaller (reduced dimensionality).

- During learning, a specialized architecture raises the signal above the noise.

Page 34: How data science works and how can customers help

Big Data, due to their mass, allow working with more general (eclectic) architectures.

Page 35: How data science works and how can customers help

Relative contributions to a model’s knowledge:

- Highly specialized architecture + “small” data: this is the ratio we prefer.

- Eclectic architecture + Big Data: this tradeoff is often successful.

Page 36: How data science works and how can customers help

Example of a specialized architecture – general linear model (GLM): Regression based on a GLM can work well with as few as 100 data points.

The architecture of a GLM already contains knowledge about:

- Gaussian distributions,

- linear relationships,

- independent sampling,

- pairwise correlations,

- …

Page 37: How data science works and how can customers help

The specialization of GLM is founded in the discoveries by generations of statisticians.

Over the years, they discovered a set of properties that tend to repeat in real-life data sets.

The result is GLM.

Page 38: How data science works and how can customers help

Example of an eclectic architecture – multi-layer perceptron (aka artificial neural network): it can learn a lot of different things.

A neural net can learn the same linear relations as a GLM, plus many other relations that a GLM cannot. This makes neural nets more eclectic.

However, much larger data sets are needed. The price for the generality of the architecture is data size and training time.

Page 39: How data science works and how can customers help

Big Data also profit from specialized architectures!

- Small architecture (general) + Big Data

- Bigger architecture (more specific) + (less) Big Data

More data cannot always replace architecture (curse of dimensionality).
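
To make the curse of dimensionality concrete, here is a small back-of-the-envelope sketch. It assumes, purely for illustration, that each input variable is split into 10 bins and that we want at least one data point per cell. `samples_to_cover` is a hypothetical helper name.

```python
def samples_to_cover(dims, bins_per_axis=10):
    """Number of grid cells (and so the minimum number of samples) needed to
    put one data point into every cell of a `dims`-dimensional grid."""
    return bins_per_axis ** dims


for d in (1, 2, 5, 10, 20):
    print(f"{d:>2} dimensions -> {samples_to_cover(d):,} cells")
```

The count grows exponentially with the number of dimensions, which is why adding data alone eventually stops helping and knowledge built into the architecture has to shrink the space instead.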

Page 40: How data science works and how can customers help

Example of Big Data combined with a specialized architecture – convolutional NN:

Only local connectivity; the same weights are repeated across all neurons of one layer.

Convolutional layers in a neural network contain specific knowledge on how the visual world is organized.

Addition of convolutional layers improves learning.
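
A rough way to see what local connectivity and shared weights buy is to count trainable parameters. The sketch below compares an all-to-all layer with a convolutional layer. The image size, kernel size, and channel count are assumptions chosen only for illustration.

```python
def dense_params(in_h, in_w, out_units):
    """All-to-all layer: each output unit has one weight per input pixel, plus a bias."""
    return (in_h * in_w + 1) * out_units


def conv_params(kernel_h, kernel_w, in_channels, out_channels):
    """Convolutional layer: one small kernel per output channel, reused at every
    image position (local connectivity + weight sharing), plus a bias each."""
    return (kernel_h * kernel_w * in_channels + 1) * out_channels


# Hypothetical 28x28 grayscale image.
dense = dense_params(28, 28, out_units=28 * 28)  # as many outputs as pixels
conv = conv_params(3, 3, in_channels=1, out_channels=8)

print(f"all-to-all layer:    {dense:,} parameters")
print(f"convolutional layer: {conv:,} parameters")
```

Far fewer parameters remain to be learned from data, because the assumption that nearby pixels belong together is already built into the architecture.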

Page 41: How data science works and how can customers help

Consequences:

Better NN architecture; more suitable for processing images; the model ‘knows’ that neighboring pixels are correlated and that they contain information on visual features.

On image data, a deep neural network with convolutional layers will typically perform more effectively than either an all-to-all connected deep network or a “shallow” network.

Page 42: How data science works and how can customers help

A customer can assist data scientists in developing an architecture that is as specialized as possible.

Page 43: How data science works and how can customers help

Can there exist an eclectic model that also learns easily, like a specialized model?

No!

Because of the no-free-lunch theorem:

“Any two optimization algorithms are equivalent when their performance is averaged across all possible problems.”
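
For readers who prefer a formula, the no-free-lunch statement on this slide can be written as follows. This is the form given by Wolpert and Macready (1997) for optimization. The notation is theirs, not the slides': $f$ ranges over all possible objective functions, $a_1$ and $a_2$ are any two algorithms, and $d_m^y$ is the sequence of $m$ cost values an algorithm has observed.

```latex
\sum_{f} P\left(d_m^y \mid f, m, a_1\right) = \sum_{f} P\left(d_m^y \mid f, m, a_2\right)
```

Summed over all problems, every algorithm is equally likely to produce any given sequence of results. An advantage on one class of problems must therefore be paid for on another.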

Page 44: How data science works and how can customers help

This is what machine learning is not - even with Big Data.

Any data science problem will require working on an appropriate model architecture.

Page 45: How data science works and how can customers help

Various off-the-shelf models can be approximately sorted according to how specialized they are:

[Figure: models placed between specialized architecture (fast learners: low training effort, often high performance) and eclectic architecture (slow learners: high training effort, lower performance). Models shown: laws of physics, linear regression, deep learning, genetic algorithms, SVM, decision tree, random forest, naïve Bayes. The models lie along “the slope of optimal model application”. One region is labeled “the black triangle of unreality”, excluded by the no-free-lunch theorem. Another corner is annotated: “If you are in this corner, you may be using the wrong model for the given data.”]

Page 46: How data science works and how can customers help

Off-the-shelf models usually are not end architectures. More often, they are only components of specialized models.

The more eclectic an off-the-shelf model, the more room for adding specializations there is.

Page 47: How data science works and how can customers help

A data scientist will often combine off-the-shelf models with other components to build a model specialized for the customer’s data.

Page 48: How data science works and how can customers help

Commonly used specialization tool: data wrangling. Data wrangling extracts from the data what is important (the signal!), and does so in a way that is suitable for an off-the-shelf model. Example:

Data + equations for data wrangling: extensive thought is given to data wrangling.

+

Neural net: less thought may be needed to apply a neural net, because a neural net alone provides an eclectic architecture.

Neural net + specific wrangling steps -> together form a highly specialized model. Here, data wrangling plays a role similar to that of convolution in deep neural nets.

Page 49: How data science works and how can customers help

Remember: A data scientist CREATES a model.

Page 50: How data science works and how can customers help

Where does a data scientist operate?

[Figure: diagram with the axes specialized versus eclectic architecture and low versus high training effort; two regions are annotated:]

An inexperienced data scientist may spend a lot of time in this corner.

A naïve ‘data scientist’ would hope to end up here.

Page 51: How data science works and how can customers help

How does a data scientist do that?

Three main steps for building a

specialized architecture:

Page 52: How data science works and how can customers help

1. UNDERSTAND!

- Analyze data, dependencies between variables, distributions, etc.

- Study the (physical) system that generated the data.

Page 53: How data science works and how can customers help

A data scientist will perform calculations with the goal of understanding the data.

Various tools help understanding: descriptive statistics, distribution plots, visualizations, scatter plots, time series, cross-correlation, fractal dimension, …

A data scientist will talk to experts, ask questions, read literature, go for a walk to think.

By doing so, a data scientist will seek the insights necessary to implement novel model architectures.
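
As one illustration of such an exploratory calculation, the sketch below uses cross-correlation to find the time lag at which one signal best predicts another. The two signals are hypothetical, and `cross_correlation` is an illustrative helper, not a tool named in the slides.

```python
import statistics


def cross_correlation(xs, ys, lag):
    """Pearson correlation between xs[t] and ys[t + lag], for lag >= 0."""
    a, b = xs[:len(xs) - lag], ys[lag:]
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    norm = (sum((p - ma) ** 2 for p in a) * sum((q - mb) ** 2 for q in b)) ** 0.5
    return cov / norm


# Hypothetical sensor pair: ys repeats xs three time steps later.
xs = [((7 * t) % 11) / 10 for t in range(60)]
ys = [0.0] * 3 + xs[:-3]

best_lag = max(range(10), key=lambda lag: cross_correlation(xs, ys, lag))
print("strongest dependency at lag", best_lag)
```

A finding of this kind (for example, "variable B follows variable A with a fixed delay") is the sort of insight that can later be built into the model architecture.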

Page 54: How data science works and how can customers help

2. Formally describe

Describe the insight by drawing a graph, writing equations, listing the rules, …

Page 55: How data science works and how can customers help

3. Implement into software (code)

Page 56: How data science works and how can customers help

Understand

Formalize

Code

Page 57: How data science works and how can customers help

Various software tools are at the data scientist’s disposal.

Page 58: How data science works and how can customers help

There is no simple recipe for which parts of a model to work on first: it’s a creative process!

Page 59: How data science works and how can customers help

Therefore, iterations:

[Figure: cycle of steps: Understand, Formalize, Code model, Test, Train, Evaluate. One step is marked: “Important help from the customer comes here.”]

Page 60: How data science works and how can customers help

Examples of successful specialized models created by the Data Science team:

Page 61: How data science works and how can customers help

Example I:

Predictive maintenance—fan operations

Vibration analysis

Page 62: How data science works and how can customers help

Goal: Detect healthy and unhealthy operations of a fan + classify the source of disturbances. 3-axis vibration sensor mounted on the fan.

Data wrangling and insights: power spectrum to identify frequency bands carrying signals.

Anomaly detection: An auto-associative neural network on full power spectrum.

Disturbance classification: Logistic regression on selected frequency bands.

Performance: 100% on new data sets.
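
The slides do not show the team's code, so the following is only a sketch of the wrangling idea in this example. It computes a power spectrum and sums the power in a chosen frequency band. The vibration trace, the sampling rate, and the band limits are hypothetical, and a direct DFT is used only to keep the example dependency-free.

```python
import cmath
import math


def power_spectrum(signal):
    """Power at each non-negative frequency bin, via a direct DFT (O(n^2))."""
    n = len(signal)
    spectrum = []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(signal))
        spectrum.append(abs(coeff) ** 2 / n)
    return spectrum


def band_power(spectrum, sample_rate, n_samples, low_hz, high_hz):
    """Total power in the frequency band [low_hz, high_hz]."""
    hz_per_bin = sample_rate / n_samples
    return sum(p for k, p in enumerate(spectrum)
               if low_hz <= k * hz_per_bin <= high_hz)


# Hypothetical single-axis trace: 50 Hz rotation plus a weaker 120 Hz disturbance.
sample_rate, n = 512, 512
signal = [math.sin(2 * math.pi * 50 * t / sample_rate)
          + 0.5 * math.sin(2 * math.pi * 120 * t / sample_rate)
          for t in range(n)]

spectrum = power_spectrum(signal)
peak_hz = max(range(len(spectrum)), key=spectrum.__getitem__) * sample_rate / n
disturbance = band_power(spectrum, sample_rate, n, 110, 130)
```

Band powers such as `disturbance` are the kind of compact feature that could then be passed to a simple classifier like logistic regression, instead of the raw vibration samples.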

Data Science tools:

Page 63: How data science works and how can customers help

Example II:

Mind reading

Brain signals

Page 64: How data science works and how can customers help

Goal: Reconstruct what the animal sees (stimulus) from the activity of neurons in the visual cortex.

Data wrangling and insights: Spike sorting; Convolution of neuronal spiking activity.

Stimulus identification: Support vector machine fed with convolved neural activity.

Stimulus reconstruction: An array of naïve Bayes classifiers.

Performance: Up to 90%, 10-fold cross-validation.

Data Science tools:

Reference: Nikolić, D.*, S. Häusler*, W. Singer and W. Maass (2009). Distributed fading memory for stimulus properties in the primary visual cortex. PLoS Biology, 7: e1000260.

Page 65: How data science works and how can customers help

Example III:

Predictive maintenance—Coffee machines

Visits from a service technician

Page 66: How data science works and how can customers help

Goal: Predict whether a coffee machine will be visited by a technician within the next 3 months. Data: telemetric data on machine usage.

Data wrangling and insights: cumulative variables, cross-correlation, heat map.

Model: 4-layer artificial neural network on wrangled data.

Performance: 14.1% above chance, 10-fold cross validation. Best performance among 10 competitors.
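
For readers unfamiliar with 10-fold cross-validation, here is a minimal sketch of the procedure. The data are split into ten parts, and each part serves once as unseen test data while the model is fitted on the rest. The toy data and the majority-label "model" are hypothetical stand-ins and have nothing to do with the actual coffee-machine model.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs; every sample is tested exactly once."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i, test_idx in enumerate(folds):
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train_idx, test_idx


def cross_validate(xs, ys, fit, predict, k=10):
    """Mean accuracy over k folds for any fit/predict pair."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(xs), k):
        model = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        hits = sum(predict(model, xs[i]) == ys[i] for i in test_idx)
        scores.append(hits / len(test_idx))
    return sum(scores) / len(scores)


def fit_majority(train_xs, train_ys):
    """Toy baseline: remember the most frequent training label."""
    return max(set(train_ys), key=train_ys.count)


def predict_majority(model, x):
    """Toy baseline: always predict the remembered label."""
    return model


# Hypothetical data: 25% of the samples carry label 1.
xs = list(range(100))
ys = [1 if x % 4 == 0 else 0 for x in xs]
accuracy = cross_validate(xs, ys, fit_majority, predict_majority)
```

A majority-label baseline like this is one common way to define the "chance" level against which a model is compared. The slides do not say which definition was used.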

Data Science tools:

Page 67: How data science works and how can customers help

Example IV:

Train departure and arrival time

Page 68: How data science works and how can customers help

Goal: Compute new timetables in real-time depending on the current traffic situation.

Model specialization: Railway network implemented as a graph; nodes and edges executed as neural nets.

Predictions: individual delays; departure, arrival and waiting times.

Performance: We could predict, with 68% accuracy, a 3-minute window in which a train will arrive or depart, up to 48 hours into the future.
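
To illustrate what "a network implemented as a graph" can mean in code, the sketch below propagates a delay through a tiny hypothetical network. In place of the neural nets mentioned on the slide, each edge uses a hand-written stand-in function, `edge_model`. Station names, travel times, and the recovery rule are all assumptions.

```python
from collections import deque

# Hypothetical network: station -> list of (next_station, scheduled_minutes).
network = {
    "A": [("B", 10), ("C", 25)],
    "B": [("D", 15)],
    "C": [("D", 5)],
    "D": [],
}


def edge_model(scheduled_minutes, delay_in):
    """Stand-in for a learned per-edge model: part of the delay is recovered en route."""
    recovered = min(delay_in, 0.1 * scheduled_minutes)
    return delay_in - recovered


def propagate_delay(network, start, initial_delay):
    """Push a delay through the graph, keeping the worst predicted delay per station."""
    delays = {start: initial_delay}
    queue = deque([start])
    while queue:
        station = queue.popleft()
        for nxt, minutes in network[station]:
            predicted = edge_model(minutes, delays[station])
            if predicted > delays.get(nxt, -1.0):
                delays[nxt] = predicted
                queue.append(nxt)
    return delays


delays = propagate_delay(network, "A", initial_delay=6.0)
```

The graph structure itself is knowledge about the railway that never has to be learned from data. Only the behavior of each edge and node does.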

Data Science and Big Data tools:

Page 69: How data science works and how can customers help

How exactly does a customer help?

Page 70: How data science works and how can customers help

The customer does not only deliver data.

Page 71: How data science works and how can customers help

WHAT WE NEED FROM THE CUSTOMER IS:

Make us understand your world!

Page 72: How data science works and how can customers help

You need to do everything in your power to transfer model-relevant knowledge to us.

(We’ll do the rest.)

Page 73: How data science works and how can customers help

Customer’s homework:

- Know your economics.

- Describe the process that created the data.

- Formulate hypotheses.

- Ensure access to relevant experts in your company.

Page 74: How data science works and how can customers help

Your economics: Which model could possibly make you money, or bring other benefits?

Costs increase with Data Science and analytics effort.

As a result, savings and profits rise, but not linearly.

Sweet spot: Data Science costs are low, benefits are large.

Data Science can cost you more than what it saves.
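
The shape of this argument can be sketched with two made-up curves: a cost that grows linearly with effort and a benefit that saturates. Both functions and all numbers are hypothetical and serve only to show how a sweet spot, and a loss beyond it, can arise.

```python
import math


def cost(effort):
    """Hypothetical: cost grows linearly with Data Science effort."""
    return 10.0 * effort


def benefit(effort):
    """Hypothetical: savings and profits rise with effort, but with diminishing returns."""
    return 100.0 * (1.0 - math.exp(-0.5 * effort))


efforts = [e / 10 for e in range(0, 101)]          # effort levels 0.0 .. 10.0
net = [benefit(e) - cost(e) for e in efforts]      # net gain at each effort level
sweet_spot = efforts[net.index(max(net))]

print(f"sweet spot at effort {sweet_spot}, net gain {max(net):.1f}")
print(f"net gain at maximum effort: {net[-1]:.1f}")
```

With these assumed curves, the net gain peaks at a moderate effort and turns negative at the highest effort level.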

Page 75: How data science works and how can customers help

The process that created data

Be it a single machine or an entire factory floor, a hospital ward or a marketing campaign: the more we understand about the process, the more specialization we can build into the model.

Page 76: How data science works and how can customers help

Where do you think the signal in the data is? What is your hypothesis?

Good specialized architecture extracts signal over noise.

Point us in the direction you think is right. We’ll check whether there is a signal.

Page 77: How data science works and how can customers help

The person we may need to talk to

Page 78: How data science works and how can customers help

CSC + Customer form a full team.

Page 79: How data science works and how can customers help

The difference between taking an off-the-shelf model and investing time and expertise to create a specialized model translates into the difference between mediocre results and excellent results.

Page 80: How data science works and how can customers help

At CSC, we are after excellent results.

Page 81: How data science works and how can customers help

CSC provides top Data Science expertise for developing specialized model architectures in industry.

Page 82: How data science works and how can customers help

Contacts:

Dr. Günter Koch, Senior [email protected]

Davor Andric, Principal Solution [email protected]

Christian Kaupa, Director BD&[email protected]

Prof. Dr. Danko Nikolic, Lead Data [email protected]

Page 83: How data science works and how can customers help