How data science works and how customers can help

Post on 22-Jan-2018

Category:

Data & Analytics

DATA SCIENCE – how we do the magic

And how the customer can help.

Prof. Danko Nikolic, PhD

How do we become successful together?

MAIN GOAL:

Explain how CSC does data science and what we need from the customer.

PRIMARY AUDIENCE

technical personnel at the customer side

How technically deep is this document (scale 1 – 10)?:

5

For effective data science, how essential is collaboration with the customer (scale 1-10)?:

10

INTRO

Textbooks…

… use simplified data to explain how you apply statistical methods,

… do not say much about how you deal with data in real life,

… and are thus misleading.

Textbooks make you believe that an appropriate model for your data already exists.

“You just need to select the right model and apply it.”

Unfortunately, data science is not that simple.

Data scientists do not just pick models.

Misconception: a data scientist applies a model.

Correction: a data scientist creates a model.

Each data set has its own oddities, quirks, issues, …

Each phenomenon that we want to model lives in its own world.

The job of a data scientist is to understand this world, and to tailor a model accordingly.

Rarely will an off-the-shelf model be outright optimal for a real-life problem.

What a customer buys: a unique model optimized for the customer’s needs.

Skills and experience of a data scientist translate into the ability to create customized models.

It may take 10 or 20 years to stack up a skill set to effectively build customized models.

At CSC, we offer that experience.

Creation of a model requires one to master:

statistics, coding, optimization, storytelling, visualization, experimental design, Big Data technology, clustering, business models, regression, handling databases, probability …

… scientific thinking, deep learning, intuition, distributions, overfitting, information theory, cross-correlation, fractal geometry, computation, multivariate analysis, statistical biases, no-free-lunch theorem, support-vector machine, normalization, regularization, matrix algebra, graph theory, …

… Boltzmann machine, dropout, entropy, auto-associative networks, reinforcement learning, Lasso, Kohonen network, backpropagation, …

… natural language processing, scientific publishing, Bayes' theorem, genetic algorithms, swarm intelligence, boosting, Markov process, softmax, power spectrum, good regulator theorem, presentation skills, …

… + keeping up with 100s of new models and tools announced every year.

Hence, a team of experienced data scientists can often navigate this world more effectively and creatively than an individual alone.

Experience + Team is what gets the customer the best model in the end.

Examples of notable team efforts:

US$ 1,000,000

These are all unique, newly created models tailored for a particular purpose.

No existing model off-the-shelf could be simply applied.

But what will a data scientist do?

How does one create a new model?

It is important to distinguish a model architecture from a complete model.

Architecture: model specified but without training. Equations and interactions between equations are defined, but parameter values are not yet known.

Complete model: trained model. Parameter values are known. Machine learning has been applied. The model has been fully trained and tested, and is ready to be deployed.

Example architecture:

A wiring diagram, defined data flow, topology, equations,…but parameter values are not yet specified.

Example complete model:

W1,1 = 0.12

W1,2 = 0.03

W2,4 = -0.45

…+

Optimal parameter values are found through the machine learning (training) process.
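
The distinction can be sketched in a few lines of Python. This is only an illustration under assumed conditions: the two-weight equation, the synthetic data, and the "true" weights are hypothetical and are not taken from the slides. The function `architecture` fixes the equation; training then fills in the parameter values.

```python
import numpy as np

def architecture(x, w):
    """Model architecture: the equation is fixed, the weights w are still free."""
    return w[0] * x[:, 0] + w[1] * x[:, 1]

# Synthetic data generated with hypothetical "true" weights (illustration only).
rng = np.random.default_rng(0)
x = rng.normal(size=(200, 2))
true_w = np.array([0.12, -0.45])
y = architecture(x, true_w) + 0.01 * rng.normal(size=200)

# Training: gradient descent on the mean squared error finds the weights.
w = np.zeros(2)
for _ in range(500):
    residual = architecture(x, w) - y
    gradient = 2.0 * x.T @ residual / len(y)
    w -= 0.1 * gradient

# Complete model: the same architecture plus the learned parameter values.
print("learned weights:", np.round(w, 2))
```

The human wrote `architecture`; the loop, i.e. the machine, found `w`.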

Architecture: a human does the work.

+

Training: the machine does the work.

A data scientist works with a tradeoff between the effort invested in designing a model’s architecture and the effort invested in training the model.

IMPORTANT: The more specialized the architecture for a given problem, the less training is needed.

Advantages from a specialized architecture:

- smaller datasets for training

- more resilient to over-fitting

- closer to the global optimum

- fewer resources

- cost effective

- better overall performance

The opposite is an eclectic architecture. An eclectic architecture can be applied to many different kinds of data but needs more training. As a result:

- larger amounts of data needed

- intensive computation

- easily over-fitted

- likely ending in local minima

- higher development costs

- weaker performance

A specialized architecture contributes heavily to the performance of a model.

Why does specialized architecture enhance learning?

- The architecture already possesses part of the needed knowledge, so less is left to be learned.

- The learning space becomes smaller (reduced dimensionality).

- During learning, a specialized architecture raises the signal above the noise.

Big Data, due to their mass, allow working with more general (eclectic) architectures.

Relative contributions to a model’s knowledge:

- Highly specialized architecture + “small” data: this is the ratio we prefer.

- Eclectic architecture + Big Data: this tradeoff is often successful.

Example of a specialized architecture – general linear model (GLM): Regression based on GLM can work well with as few as 100 data points.

The architecture of GLM already contains knowledge about:

- Gaussian distributions,

- linear relationships,

- independent sampling,

- pairwise correlations,

- …

The specialization of GLM is founded in the discoveries by generations of statisticians.

Over years, they discovered a set of properties that tended to repeat in real-life data sets.

The result is GLM.
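
As a rough illustration of how little data such a specialized architecture needs, the sketch below fits an ordinary least-squares regression (one instance of the general linear model) to 100 synthetic points. The coefficients, noise level, and variable names are hypothetical and chosen only for the example.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100                                          # "as few as 100 data points"
predictors = rng.normal(size=(n, 3))
true_coefficients = np.array([1.5, -2.0, 0.5])   # hypothetical values
intercept = 4.0                                  # hypothetical value
noise = rng.normal(scale=0.5, size=n)            # Gaussian, independent
response = intercept + predictors @ true_coefficients + noise

# Design matrix with a column of ones for the intercept.
design = np.column_stack([np.ones(n), predictors])

# Least-squares solution: intercept first, then the three slopes.
estimate, *_ = np.linalg.lstsq(design, response, rcond=None)
print("estimated [intercept, b1, b2, b3]:", np.round(estimate, 2))
```

Because linearity and Gaussian noise are built into the model, only four numbers have to be learned from the data.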

Example of an eclectic architecture – multi-layer perceptron (a.k.a. artificial neural network): it can learn a lot of different things.

A neural net can learn the same linear relations as GLM + many other relations that GLM cannot. This makes neural nets more eclectic.

However, much larger data sets are needed. The price for the generality of the architecture is data size and training time.

Big Data also profit from specialized architectures!

- Small architecture (general) + Big Data

- Bigger architecture (more specific) + (less) Big Data

More data cannot always replace architecture (curse of dimensionality).

Example of Big Data combined with a specialized architecture – convolutional NN:

Only local connectivity; the same weights are repeated across all neurons of one layer.

Convolutional layers in a neural network contain specific knowledge on how the visual world is organized.

Addition of convolutional layers improves learning.

Better NN architecture; more suitable for processing images; the model ‘knows’ that local pixels are correlated and that they contain information on visual features.
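
Local connectivity and weight sharing can be sketched with plain NumPy. The example below is a minimal illustration, assuming a 28×28 image and a hand-written 3×3 edge kernel (both hypothetical). Like deep-learning libraries, it computes a cross-correlation. It also compares the number of weights with an all-to-all layer that produces an output of the same size.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide one shared kernel over the image (no padding, stride 1)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    output = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Each output unit sees only a local patch (local connectivity)
            # and uses the same weights as every other unit (weight sharing).
            output[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return output

image = np.zeros((28, 28))
image[:, 14:] = 1.0                          # a vertical edge in the middle
kernel = np.array([[-1.0, 0.0, 1.0]] * 3)    # 3x3 vertical-edge detector

feature_map = conv2d_valid(image, kernel)

conv_params = kernel.size                      # weights shared by all units
dense_params = image.size * feature_map.size   # all-to-all connections
print("convolutional layer weights:", conv_params)
print("fully connected layer weights:", dense_params)
```

The feature map is non-zero only around the edge, and the convolutional layer needs 9 weights where the all-to-all layer would need hundreds of thousands.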

Consequences:

A deep neural network with convolutional layers will typically process images more effectively than either an all-to-all connected deep network or a “shallow” network.

A customer can assist data scientists in developing an architecture that is as specialized as possible.

Can there exist an eclectic model that also learns easily, like a specialized model?

No!

Because of the no-free-lunch theorem:

“Any two optimization algorithms are equivalent when their performance is averaged across all possible problems.”

This is what machine learning is not - even with Big Data.

Any data science problem will require working on an appropriate model architecture.

Various off-the-shelf models can be approximately sorted according to how specialized they are:

[Figure: models placed on a plane whose axes run from specialized architecture to eclectic architecture and from fast learners to slow learners. Models shown: laws of physics, linear regression, deep learning, genetic algorithms, SVM, decision tree, random forest, naïve Bayes. Annotations: “low training effort, often high performance”; “high training effort, lower performance”; “the slope of optimal model application”; “the black triangle of unreality due to the no-free-lunch theorem”; “If you are in this corner, you may be using a wrong model for the given data.”]

Off-the-shelf models usually are not end architectures. More often, they are only components of specialized models.

The more eclectic an off-the-shelf model, the more room for adding specializations there is.

A data scientist will often combine off-the-shelf models with other components to build a model specialized for the customer’s data.

Commonly used specialization tool: data wrangling. Data wrangling extracts from the data what is important (the signal!), in a way that is suitable for an off-the-shelf model. Example:

[Figure labels: Data; Equations for data wrangling]

Neural net + specific wrangling steps together form a highly specialized model. Here, data wrangling plays a role similar to that of convolution in deep neural nets.

Less thought may be needed to apply a neural net, because a neural net alone provides an eclectic architecture.

+

Extensive thought is given to data wrangling.

Remember: A data scientist CREATES a model.

Where does a data scientist operate?

[Figure: a plane whose axes run from specialized architecture to eclectic architecture and from low training effort to high training effort. Annotations: “An inexperienced data scientist may spend a lot of time in this corner.”; “A naïve ‘data scientist’ would hope to end up here.”]

How does a data scientist do that?

Three main steps for building a specialized architecture:

1. UNDERSTAND!

- Analyze data, dependencies between variables, distributions, etc.

- Study the (physical) system that generated the data.

A data scientist will perform calculations with the goal of understanding the data.

Various tools to help understanding: descriptive statistics, distribution plots, visualizations, scatter plots, time series, cross-correlation, fractal dimension, …

A data scientist will also talk to experts, ask questions, read literature, go for a walk to think.

By doing so, a data scientist will seek insights necessary to implement novel model architectures.
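
Two of the tools named above, descriptive statistics and cross-correlation, can be sketched as follows. The two signals and the 7-sample delay between them are hypothetical; the point is only to show how an exploratory calculation can reveal a dependency between variables.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500
true_lag = 7  # hypothetical delay (in samples) between the two signals
driver = rng.normal(size=n)
follower = np.roll(driver, true_lag) + 0.2 * rng.normal(size=n)

# Descriptive statistics: a first look at each variable.
for name, signal in (("driver", driver), ("follower", follower)):
    print(f"{name}: mean={signal.mean():.3f}, std={signal.std():.3f}")

# Cross-correlation: at which lag do the two signals match best?
a = driver - driver.mean()
b = follower - follower.mean()
xcorr = np.correlate(b, a, mode="full")   # index i corresponds to lag i - (n - 1)
lags = np.arange(-(n - 1), n)
estimated_lag = int(lags[np.argmax(xcorr)])
print("follower lags driver by", estimated_lag, "samples")
```

An insight such as "variable B follows variable A with a fixed delay" is exactly the kind of knowledge that can later be built into the architecture.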

2. Formally describe

Describe the insight by drawing a graph, writing equations, listing the rules, …

3. Implement in software (code)

Understand

Formalize

Code

Various software tools are at the data scientist’s disposal.

There is no simple recipe for which parts of a model to work on first; it’s a creative process!

Therefore, iterations:

[Figure: iterative cycle with the steps Understand, Formalize, Code model, Test, Train, Evaluate. Annotation: “Important help from the customer comes here.”]

Examples of successful specialized models created by the Data Science team:

Example I:

Predictive maintenance—fan operations

Vibration analysis

Goal: Detect healthy and unhealthy operations of a fan + classify the source of disturbances. 3-axis vibration sensor mounted on the fan.

Data wrangling and insights: power spectrum to identify frequency bands carrying signals.

Anomaly detection: An auto-associative neural network on full power spectrum.

Disturbance classification: Logistic regression on selected frequency bands.

Performance: 100% on new data sets.
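
The wrangling step of this example, computing a power spectrum and reading off the power in a frequency band, might look roughly like the sketch below. Everything numeric here is an assumption made for illustration: the sampling rate, the 50 Hz rotation component, the 120 Hz fault component, and the synthetic signals. The anomaly detector and the classifier themselves are not shown.

```python
import numpy as np

sampling_rate = 1000.0                 # Hz, assumed
t = np.arange(2000) / sampling_rate    # 2 seconds of data
rng = np.random.default_rng(4)

def vibration(fault):
    """Synthetic one-axis signal: 50 Hz rotation, plus 120 Hz when faulty."""
    signal = np.sin(2 * np.pi * 50.0 * t) + 0.1 * rng.normal(size=t.size)
    if fault:
        signal = signal + 0.5 * np.sin(2 * np.pi * 120.0 * t)
    return signal

def band_power(signal, low, high):
    """Sum the power spectrum between low and high (both in Hz)."""
    spectrum = np.abs(np.fft.rfft(signal)) ** 2 / signal.size
    freqs = np.fft.rfftfreq(signal.size, d=1.0 / sampling_rate)
    mask = (freqs >= low) & (freqs <= high)
    return spectrum[mask].sum()

healthy = band_power(vibration(fault=False), 110.0, 130.0)
faulty = band_power(vibration(fault=True), 110.0, 130.0)
print(f"110-130 Hz band power, healthy: {healthy:.2f}, faulty: {faulty:.2f}")
```

A handful of such band powers is a far easier input for a classifier such as logistic regression than the raw vibration trace.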

Data Science tools:

Example II:

Mind reading

Brain signals

Goal: Reconstruct what the animal sees (stimulus) from the activity of neurons in the visual cortex.

Data wrangling and insights: Spike sorting; convolution of neuronal spiking activity.

Stimulus identification: Support vector machine fed with convolved neural activity.

Stimulus reconstruction: An array of naïve Bayes classifiers.

Performance: Up to 90%, 10-fold cross-validation.
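
Convolving spiking activity means turning a list of discrete spike times into a smooth trace that a classifier can use. The sketch below assumes 1 ms bins and a causal exponential kernel with a 20 ms decay constant. The spike times, the kernel shape, and the time constant are illustrative assumptions, not details given in the slides.

```python
import numpy as np

spike_times_ms = np.array([12, 15, 40, 41, 43, 90])  # hypothetical spike times
duration_ms = 120
spike_train = np.zeros(duration_ms)
spike_train[spike_times_ms] = 1.0                    # one bin per millisecond

tau_ms = 20.0                                        # assumed decay constant
kernel = np.exp(-np.arange(100) / tau_ms)            # causal exponential kernel

# Full convolution, truncated to the recording length: each spike adds a
# decaying trace that starts at the spike and never leaks into the past.
trace = np.convolve(spike_train, kernel)[:duration_ms]

print("value just before the first spike:", trace[11])
print("value at the first spike:", trace[12])
print("value at the second spike:", round(trace[15], 3))
```

Sampling such traces from many neurons at one moment gives a feature vector that can be fed to a support vector machine.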

Data Science tools:

Reference: Nikolić, D.*, S. Häusler*, W. Singer and W. Maass (2009) Distributed fading memory for stimulus properties in the primary visual cortex. PLoS Biology 2009, 7: e1000260.

Example III:

Predictive maintenance—Coffee machines

Visits from a service technician

Goal: Predict whether a coffee machine will be visited by a technician within the next 3 months. Data: telemetric data on machine usage.

Data wrangling and insights: cumulative variables, cross-correlation, heat map.

Model: 4-layer artificial neural network on wrangled data.

Performance: 14.1% above chance, 10-fold cross-validation. Best performance among 10 competitors.
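
For readers unfamiliar with the evaluation, 10-fold cross-validation and "above chance" can be sketched as below. The data are synthetic, and a nearest-class-mean classifier is used as a simple stand-in for the neural network. All function names are hypothetical.

```python
import numpy as np

def k_fold_indices(n_samples, k, rng):
    """Shuffle sample indices and split them into k nearly equal folds."""
    return np.array_split(rng.permutation(n_samples), k)

def cross_validate(features, labels, k, fit, predict, rng):
    """Return the mean held-out accuracy over k folds."""
    folds = k_fold_indices(len(labels), k, rng)
    accuracies = []
    for i, test_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        model = fit(features[train_idx], labels[train_idx])
        predicted = predict(model, features[test_idx])
        accuracies.append(np.mean(predicted == labels[test_idx]))
    return float(np.mean(accuracies))

# Stand-in classifier (nearest class mean), NOT the network from the slides.
def fit_centroids(x, y):
    return {c: x[y == c].mean(axis=0) for c in np.unique(y)}

def predict_centroids(model, x):
    classes = np.array(list(model.keys()))
    distances = np.stack([np.linalg.norm(x - model[c], axis=1) for c in classes])
    return classes[np.argmin(distances, axis=0)]

rng = np.random.default_rng(5)
labels = rng.integers(0, 2, size=300)            # 1 = technician visit (synthetic)
features = rng.normal(size=(300, 4)) + 0.8 * labels[:, None]

accuracy = cross_validate(features, labels, 10, fit_centroids, predict_centroids, rng)
chance = max(np.mean(labels), 1.0 - np.mean(labels))  # always guess majority class
print(f"accuracy: {accuracy:.3f}, above chance by {accuracy - chance:.3f}")
```

Every sample is used for testing exactly once, and never by a model that saw it during training.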

Data Science tools:

Example IV:

Train departure and arrival time

Goal: Compute new timetables in real-time depending on the current traffic situation.

Model specialization: Railway network implemented as a graph; nodes and edges executed as neural nets.

Predictions: individual delays; departure, arrival and waiting times.

Performance: We could predict, with 68% accuracy, a 3-minute window in which a train will arrive or depart, as far as 48 hours into the future.
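
The idea of representing the railway network as a graph can be illustrated with a toy sketch. Here a fixed rule (each connection absorbs some slack) stands in for the neural nets that the actual model runs on nodes and edges. The stations, slack values, and function name are all hypothetical.

```python
from collections import deque

# Hypothetical toy network: station -> list of (next station, slack in minutes).
network = {
    "A": [("B", 2.0), ("C", 0.0)],
    "B": [("D", 1.0)],
    "C": [("D", 3.0)],
    "D": [],
}

def propagate_delays(network, source, initial_delay):
    """Push a delay through the graph; each edge absorbs its slack."""
    delays = {station: 0.0 for station in network}
    delays[source] = initial_delay
    queue = deque([source])
    while queue:
        station = queue.popleft()
        for successor, slack in network[station]:
            arriving = max(0.0, delays[station] - slack)
            # Keep the worst delay reaching a station and pass it on.
            if arriving > delays[successor]:
                delays[successor] = arriving
                queue.append(successor)
    return delays

delays = propagate_delays(network, "A", 5.0)
print(delays)
```

The graph structure encodes which trains can affect which others, so this knowledge does not have to be learned from data.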

Data Science and Big Data tools:

How exactly does a customer help?

The customer does not only deliver data.

WHAT WE NEED FROM THE CUSTOMER IS:

Make us understand your world!

You need to do everything in your power to transfer model-relevant knowledge to us.

(We’ll do the rest.)

Customer’s homework:

- Know your economics.

- Describe the process that created the data.

- Formulate hypotheses.

- Ensure access to relevant experts in your company.

Your economics: Which model could possibly make you money, or bring other benefits?

Costs increase with Data Science and analytics effort.

As a result, savings and profits rise, but not linearly.

Sweet spot: Data Science costs are low, benefits are large.

Data Science can cost you more than what it saves.
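
The shape of this argument can be sketched numerically. The curves below are pure assumptions for illustration (costs growing linearly with effort, benefits saturating). Real numbers have to come from the customer's own economics.

```python
import numpy as np

effort = np.linspace(0.0, 10.0, 1001)             # arbitrary effort units
cost = 20.0 * effort                              # assumed: linear cost
benefit = 200.0 * (1.0 - np.exp(-effort / 2.0))   # assumed: saturating benefit

net = benefit - cost
sweet_spot = effort[np.argmax(net)]               # effort with the largest net gain

print(f"sweet spot at effort = {sweet_spot:.2f}, net gain = {net.max():.1f}")
print(f"net gain at maximum effort = {net[-1]:.1f}")
```

Under these assumed curves the net gain peaks at a moderate effort and turns negative at the far end, where Data Science costs more than it saves.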

The process that created data

Be it a single machine or an entire factory floor, a hospital ward or a marketing campaign, the more we understand about the process, the more specialization we can insert into the model.

Where do you think the signal in the data is? What is your hypothesis?

Good specialized architecture extracts signal over noise.

Point us in the direction you think is right. We’ll check whether there is a signal.

The person we may need to talk to

CSC + Customer form a full team.

The difference between taking an off-the-shelf model and investing time and expertise to create a specialized model translates into the difference between mediocre results and excellent results.

At CSC, we are after excellent results.

CSC provides top Data Science expertise for developing specialized model architectures in industry.

Contacts:

Dr. Günter Koch, Senior Manager, gkoch@csc.com

Davor Andric, Principal Solution Architect, dandric@csc.com

Christian Kaupa, Director BD&A, ckaupa@csc.com

Prof. Dr. Danko Nikolic, Lead Data Scientist, dnikolic3@csc.com
