Deriving Knowledge from Data at Scale
Barga Data Science lecture 6

Feb 09, 2017

Data & Analytics

Roger Barga
Transcript
Page 1: Barga Data Science lecture 6

Deriving Knowledge from Data at Scale

Page 2: Barga Data Science lecture 6

Feature extraction and selection are the most important but underrated steps of machine learning. Better features are better than better algorithms…

Page 3: Barga Data Science lecture 6

Page 4: Barga Data Science lecture 6

Page 5: Barga Data Science lecture 6

Lecture Objectives

homework

There is an order or workflow that takes place here; don’t lose sight of the forest for the trees…

Page 6: Barga Data Science lecture 6

Review…

Page 7: Barga Data Science lecture 6

• Cluster 0 – Females with an average age of 37 who live in the inner city and have both a savings account and a current account. They are unmarried and have neither a mortgage nor a PEP. The average monthly income is 23,300.

• Cluster 1 – Females with an average age of 44 who live in a rural area and have both a savings account and a current account. They are married and have neither a mortgage nor a PEP. The average monthly income is 27,772.

• Cluster 2 – Females with an average age of 48 who live in the inner city and have a current account but no savings account. They are unmarried and have no mortgage but do have a PEP. The average monthly income is 27,668.

• Cluster 3 – Females with an average age of 39 who live in a town and have both a savings account and a current account. They are married and have neither a mortgage nor a PEP. The average monthly income is 24,047.

• Cluster 4 – Males with an average age of 39 who live in the inner city and have a current account but no savings account. They are married and have both a mortgage and a PEP. The average monthly income is 26,359.

• Cluster 5 – Males with an average age of 47 who live in the inner city and have both a savings account and a current account. They are unmarried and have no mortgage but do have a PEP. The average monthly income is 35,419.

Page 8: Barga Data Science lecture 6

Page 9: Barga Data Science lecture 6

Classifiers → Lazy → IBk

Page 10: Barga Data Science lecture 6

Page 11: Barga Data Science lecture 6

Page 12: Barga Data Science lecture 6

Page 13: Barga Data Science lecture 6

Page 14: Barga Data Science lecture 6

Page 15: Barga Data Science lecture 6

Page 16: Barga Data Science lecture 6

No    Prob   Target   CustID   Age
1     0.97   Y        1746     …
2     0.95   N        1024     …
3     0.94   Y        2478     …
4     0.93   Y        3820     …
5     0.92   N        4897     …
…     …      …        …
99    0.11   N        2734     …
100   0.06   N        2422

Use a model to assign a score (probability) to each instance

Sort instances by decreasing score

Expect more targets (hits) near the top of the list

3 hits in the top 5% of the list. If there are 15 targets overall, then the top 5 has 3/15 = 20% of the targets.
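The ranked-list arithmetic above can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's own code: the function name `gains_at_top` is hypothetical, and the 95 filler rows are made up so that the list has 100 instances and 15 targets overall, as the slide assumes.

```python
def gains_at_top(scored, k):
    """Return (hits in the top k, share of all targets captured by the top k).

    `scored` is a list of (score, is_target) pairs.
    """
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    total_targets = sum(1 for _, is_target in ranked if is_target)
    hits = sum(1 for _, is_target in ranked[:k] if is_target)
    return hits, hits / total_targets


# The five top rows from the slide (Y/N mapped to True/False).
top5 = [(0.97, True), (0.95, False), (0.94, True), (0.93, True), (0.92, False)]
# Made-up filler: 95 lower-scored rows, 12 of them targets (15 targets in total).
filler = [(0.90 - 0.009 * i, i < 12) for i in range(95)]

hits, share = gains_at_top(top5 + filler, 5)
print(hits, share)
```

Repeating the call for k = 10, 20, … gives the points of a cumulative gains curve.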

Page 17: Barga Data Science lecture 6

[Lift chart: Model vs. Random]

40% of responses for 10% of cost: lift factor = 4

80% of responses for 40% of cost: lift factor = 2
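The lift factors quoted on this slide are the ratio of the share of responses captured to the share of cost spent. A minimal sketch of that arithmetic (the function name `lift_factor` is an assumption, and percentages are passed as whole numbers):

```python
def lift_factor(response_pct, cost_pct):
    """Lift = share of responses captured / share of the list (cost) contacted."""
    return response_pct / cost_pct


lift_at_10 = lift_factor(40, 10)  # 40% of responses for 10% of cost
lift_at_40 = lift_factor(80, 40)  # 80% of responses for 40% of cost
print(lift_at_10, lift_at_40)
```

A random ordering captures responses in proportion to cost, so its lift factor is 1 at every depth.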

Page 18: Barga Data Science lecture 6

Page 19: Barga Data Science lecture 6

Page 20: Barga Data Science lecture 6

Page 21: Barga Data Science lecture 6

Page 22: Barga Data Science lecture 6

to impact…

1. Build our predictive model in WEKA Explorer;

2. Use our model to score (predict) which new customers to target in our upcoming advertising campaign;
• ARFF file manipulation (hacking), an all-too-common pain…
• Excel manipulation to join model output with our customer list

3. Compute the lift chart to assess the business impact of our predictive model on the advertising campaign
• How are lift charts built? Of all the charts and performance measures from a model, this one is ‘on you’ to construct;
• Where is the business ‘bang for the buck’?

Page 23: Barga Data Science lecture 6

Page 24: Barga Data Science lecture 6

Page 25: Barga Data Science lecture 6

Page 26: Barga Data Science lecture 6

You can’t turn data lead into modeling gold – we’re data scientists, not data alchemists…

Page 27: Barga Data Science lecture 6

Motivation: Real world examples

Example (1)

Lesson: Correct data transformation is important!

Page 28: Barga Data Science lecture 6

Motivation: Real world examples

Example (2): KDD Cup 2001

Lesson: A model that uses lots of features can turn out to be very sub-optimal, however well it is designed!

Page 29: Barga Data Science lecture 6

Motivation: Real world examples

Example (3)

Lesson: Feature selection can be crucial even when the number of features is small!

Page 30: Barga Data Science lecture 6

Motivation: Real world examples

Example (4)

Lesson: Variations of the same ML method can give vastly different performances!

Page 31: Barga Data Science lecture 6

Page 32: Barga Data Science lecture 6

Predictive modeling competitions

Photo by mikebaird, www.flickr.com/photos/mikebaird

Page 33: Barga Data Science lecture 6

Global competitions

Predicting HIV viral load
• State of the art: 70%
• 1½ weeks: 70.8%
• Competition closes: 77%
• Improved by 10%

Page 34: Barga Data Science lecture 6

Mismatch between those with data and those with the skills to analyse it

Crowdsourcing

Page 35: Barga Data Science lecture 6

Tourism Forecasting Competition

[Chart: forecast error (MASE) relative to the existing model, shown at Aug 9, 2 weeks later, 1 month later, and competition end]

Page 36: Barga Data Science lecture 6

Users apply different techniques

• neural networks
• logistic regression
• support vector machine
• decision trees
• ensemble methods
• AdaBoost
• Bayesian networks
• genetic algorithms
• random forest
• Monte Carlo methods
• principal component analysis
• Kalman filter
• evolutionary fuzzy modeling

Page 37: Barga Data Science lecture 6

VicRoads has an algorithm they use to forecast travel time on Melbourne freeways (taking into account time, weather, accidents, etc). Their current model is inaccurate and somewhat useless. They want to do better (or at least find out whether it’s possible to do better).

Page 38: Barga Data Science lecture 6

1. Upload

2. Submit

3. Evaluate & Exchange

Page 39: Barga Data Science lecture 6

Use the wizard to post a competition

Page 40: Barga Data Science lecture 6

Participants make their entries

Page 41: Barga Data Science lecture 6

Competitions are judged based on predictive accuracy

Page 42: Barga Data Science lecture 6

Competition Mechanics

Competitions are judged on objective criteria

Page 43: Barga Data Science lecture 6

Kaggle: How They Won It…

Page 44: Barga Data Science lecture 6

Page 45: Barga Data Science lecture 6

Page 46: Barga Data Science lecture 6

Three Files

ford_train
• 510 trials, ~1,200 observations each, spaced 0.1 sec apart -> 604,330 rows

ford_test
• 100 trials, ~1,200 observations per trial -> 120,841 rows

example_submission.csv

Page 47: Barga Data Science lecture 6

Junpei Komiyama (#4)

Page 48: Barga Data Science lecture 6

Junpei Komiyama (#4)

Page 49: Barga Data Science lecture 6

Mick Wagner (#2)

Page 50: Barga Data Science lecture 6

Mick Wagner (#2)

Page 51: Barga Data Science lecture 6

Inference (#1)

Page 52: Barga Data Science lecture 6

VicRoads has an algorithm they use to forecast travel time on Melbourne freeways (taking into account time, weather, accidents, etc). Their current model is inaccurate and somewhat useless. They want to do better (or at least find out whether it’s possible to do better).

Page 53: Barga Data Science lecture 6

Page 54: Barga Data Science lecture 6

François GUILLEM (#14)

Page 55: Barga Data Science lecture 6

#1 used Random Forests

Page 56: Barga Data Science lecture 6

Page 57: Barga Data Science lecture 6

Homework Week 6

Monday Sept. 21st

Upload to site…

http://blog.kaggle.com/category/dojo/

Content is 10 pages of interviews on how the team(s) built their models; some have multiple interviews.

You will review at least 10 interviews; bounce around, do not go sequentially.

For each interview, note: 1) what model(s) they used, 2) insights they had that influenced modeling, 3) what feature creation and selection they did, 4) other observations. I will consolidate all of these and upload them as a shared document on our site.

Page 58: Barga Data Science lecture 6

5 Minute Break…

Page 59: Barga Data Science lecture 6

Course Project

Page 60: Barga Data Science lecture 6

Page 61: Barga Data Science lecture 6

https://www.kaggle.com/c/springleaf-marketing-response

Determine whether or not to send a direct mail piece to a customer

Page 62: Barga Data Science lecture 6

The Data

Page 63: Barga Data Science lecture 6

The Rules

Page 64: Barga Data Science lecture 6

Page 65: Barga Data Science lecture 6

Page 66: Barga Data Science lecture 6

Page 67: Barga Data Science lecture 6

What is the data telling you?

Page 68: Barga Data Science lecture 6

Page 69: Barga Data Science lecture 6

Page 70: Barga Data Science lecture 6

Data Wrangling

Page 71: Barga Data Science lecture 6

[Workflow diagram] Data Acquisition → Data Exploration → Pre-processing → Feature and Target construction → Train/Test split → Feature selection → Model training → Model scoring (×2) → Evaluation (×2) → Compare metrics

Page 72: Barga Data Science lecture 6

• Data preparation is by far the most time-consuming step

[Bar chart: relative effort [%] (axis 0 to 70) per KDDM step: Understanding of Domain, Understanding of Data, Preparation of Data, Data Mining, Evaluation of Results, Deployment of Results. Series: Cabena et al. estimates, Shearer estimates, Cios and Kurgan estimates]

Page 73: Barga Data Science lecture 6

Out of Class Reading, highly recommended

Page 74: Barga Data Science lecture 6

Out of Class Reading, highly recommended

Page 75: Barga Data Science lecture 6

1. Do you have domain knowledge?
2. Are your features commensurate?
3. Do you suspect interdependence of features?
4. Do you need to prune the input variables?
5. Do you need to assess features individually?
6. Do you need a predictor?
7. Do you suspect your data is “dirty”?
8. Do you know what to try first?
9. Do you have new ideas, time, computational resources, and enough examples?
10. Do you want a stable solution?

Page 76: Barga Data Science lecture 6

Page 77: Barga Data Science lecture 6

Page 78: Barga Data Science lecture 6

Page 79: Barga Data Science lecture 6

[Figure labels: 15, 15; P = 0.5, P = 0.5]

Page 80: Barga Data Science lecture 6

[Figure labels: 15, 15 with P = 0.5, P = 0.5; 7, 13 with P = 0.35, P = 0.65]

Page 81: Barga Data Science lecture 6

[Figure labels: 15, 15; 15, 15; P = 0.5, P = 0.5; 10, 10]

Page 82: Barga Data Science lecture 6

[Figure labels: 15, 15; 15, 15; P = 0.5, P = 0.5; Time; Train; Test; Horizontal; Vertical]

Page 83: Barga Data Science lecture 6

Data Characterization…

Page 84: Barga Data Science lecture 6

1. Unique values

2. Most frequent values

3. Highest and lowest values

4. Location and dispersion – gini, statistical test for dispersion

5. Quartiles
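A quick profile covering most items in the list above can be computed with the Python standard library. This is only a sketch: the function name `characterize` is an assumption, the sample is the set of temperature values used in this deck's weather example, and the Gini measure and dispersion tests mentioned in item 4 are not covered.

```python
from collections import Counter
import statistics


def characterize(values):
    """Summarize one numeric column: uniques, mode, extremes, location, spread."""
    counts = Counter(values)
    q1, q2, q3 = statistics.quantiles(values, n=4)
    return {
        "unique": len(counts),
        "most_frequent": counts.most_common(1)[0],  # (value, count)
        "lowest": min(values),
        "highest": max(values),
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),
        "quartiles": (q1, q2, q3),
    }


temps = [64, 65, 68, 69, 70, 71, 72, 72, 75, 75, 80, 81, 83, 85]
profile = characterize(temps)
print(profile)
```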

Page 85: Barga Data Science lecture 6

1. Missing values

2. Outliers

3. Coding

4. Constraints

Page 86: Barga Data Science lecture 6

Missing values – UCI machine learning repository, 31 of 68 data sets reported to have missing values. “Missing” can mean many things…

MAR ("Missing at Random"):
– usually best case
– usually not true

Non-randomly missing

Presumed normal, so not measured

Causally missing
– attribute value is missing because of other attribute values (or because of the outcome value!)
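Because a missing value can itself carry information (the non-random and causal cases above), one common option is to fill the gap and also keep a flag recording that the value was absent. The slides do not prescribe this; the sketch below is an illustration with a hypothetical function name and made-up data, using mean imputation as the assumed fill strategy.

```python
import statistics


def impute_with_indicator(values):
    """Fill None with the mean of the observed values; also return a 0/1 was-missing flag."""
    observed = [v for v in values if v is not None]
    fill = statistics.mean(observed)
    imputed = [fill if v is None else v for v in values]
    was_missing = [1 if v is None else 0 for v in values]
    return imputed, was_missing


humidity = [85, None, 78, 96, None, 70]  # made-up column with two gaps
imputed, was_missing = impute_with_indicator(humidity)
print(imputed, was_missing)
```

The flag column lets a downstream model learn from the pattern of missingness rather than having it silently erased by the fill value.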

Page 87: Barga Data Science lecture 6

Page 88: Barga Data Science lecture 6

Page 89: Barga Data Science lecture 6

Page 90: Barga Data Science lecture 6

Outliers – may indicate ‘bad data’ or may represent something scientifically interesting in the data…

Simple working definition: an outlier is an element of a data sequence S that is inconsistent with expectations, based on the majority of other elements of S.

Sources of outliers
• Measurement errors
• Other uninteresting anomalous data
• Surprising observations that may be important

Page 91: Barga Data Science lecture 6

Sources of outliers
• An insurance company sees a niche of sports car enthusiasts: married boomers with kids and a second family car. Low risk, so a lower rate to attract them. A simple case where an outlier carries meaning for modeling…

Page 92: Barga Data Science lecture 6

Outliers can distort the regression results. When an outlier is included in the analysis, it pulls the regression line towards itself. This can result in a solution that is more accurate for the outlier, but less accurate for all the other cases in the data set.
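The pull an outlier exerts on a regression line is easy to see numerically. The sketch below fits an ordinary least-squares line by hand on made-up data that lies exactly on y = 2x, then refits after adding one outlying point; the function name and data are illustrative assumptions.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10]  # exactly y = 2x

clean_slope, clean_intercept = fit_line(xs, ys)
# Add a single outlier at (6, 40); the trend would predict 12.
outlier_slope, outlier_intercept = fit_line(xs + [6], ys + [40])
print(clean_slope, outlier_slope)
```

With the outlier included, the fitted slope rises well above 2, so the line fits the five regular points worse than before.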

Page 93: Barga Data Science lecture 6

Identify outliers
• Question origin, domain knowledge invaluable
• Dispersion – "spread" of a data set, departure from central tendency, use a box plot…

Deal with outliers
• Winsorize – Set all outliers to a specified percentile of the data. Not equivalent to trimming, which simply excludes data. In a Winsorized estimator, extreme values are instead replaced by certain percentiles (the trimmed minimum and maximum). Same as clipping in signal processing.
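A minimal sketch of the difference between Winsorizing and trimming. For simplicity it parameterizes by the number `k` of extreme values in each tail rather than by percentile; that parameterization, the function names, and the data are assumptions made for illustration.

```python
def winsorize(values, k=1):
    """Clip the k smallest and k largest values to their nearest remaining neighbours."""
    ordered = sorted(values)
    low, high = ordered[k], ordered[-k - 1]  # the trimmed minimum and maximum
    return [min(max(v, low), high) for v in values]


def trim(values, k=1):
    """Drop the k smallest and k largest values instead of clipping them."""
    return sorted(values)[k:-k]


data = [1, 10, 11, 12, 13, 14, 15, 16, 17, 100]
winsorized = winsorize(data, k=1)
trimmed = trim(data, k=1)
print(winsorized)
print(trimmed)
```

Winsorizing keeps all ten observations (the 1 becomes 10 and the 100 becomes 17), whereas trimming leaves only eight.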

Page 94: Barga Data Science lecture 6

Identify outliers
• Question origin, domain knowledge invaluable
• Dispersion – "spread" of a data set, departure from central tendency, use a box plot…

Deal with outliers
• Include – Robust statistics, a convenient way to summarize results when they include a small proportion of outliers. A hot topic for research, see NIPS 2010 Workshop, Robust Statistical learning (robustml).
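As a small illustration of robust statistics, the median and the median absolute deviation (MAD) barely move when one extreme value is present, while the mean and standard deviation shift substantially. The choice of median and MAD as the robust summaries is an assumption for this sketch; the data are made up.

```python
import statistics


def median_abs_deviation(values):
    """Median of the absolute deviations from the median."""
    med = statistics.median(values)
    return statistics.median([abs(v - med) for v in values])


incomes = [23, 24, 26, 27, 28, 35, 400]  # one extreme value

mean = statistics.mean(incomes)
stdev = statistics.stdev(incomes)
median = statistics.median(incomes)
mad = median_abs_deviation(incomes)
print(mean, stdev, median, mad)
```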

Page 95: Barga Data Science lecture 6

Constraints

• Entity integrity
• Referential integrity
• Type checking
• Format
• Bounds checking
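Each of these constraint types can be expressed as a simple check on a record. The sketch below is illustrative only: the field names (`cust_id`, `region`, `age`, `joined`), the allowed regions, the age bounds, and the date format are all hypothetical.

```python
import re

DATE_FORMAT = re.compile(r"\d{4}-\d{2}-\d{2}")  # assumed YYYY-MM-DD convention


def check_record(record, seen_ids, known_regions):
    """Return the list of constraint types a record violates (empty if clean)."""
    problems = []
    cust_id = record.get("cust_id")
    if cust_id is None or cust_id in seen_ids:
        problems.append("entity integrity")        # key missing or duplicated
    if record.get("region") not in known_regions:
        problems.append("referential integrity")   # value not in reference table
    age = record.get("age")
    if not isinstance(age, int):
        problems.append("type")
    elif not 0 <= age <= 120:
        problems.append("bounds")
    if not DATE_FORMAT.fullmatch(str(record.get("joined", ""))):
        problems.append("format")
    return problems


known_regions = {"inner city", "rural", "town"}
good = {"cust_id": 1746, "region": "town", "age": 39, "joined": "2015-09-21"}
bad = {"cust_id": 1746, "region": "suburban", "age": 430, "joined": "21/09/2015"}

good_problems = check_record(good, set(), known_regions)
bad_problems = check_record(bad, {1746}, known_regions)
print(good_problems, bad_problems)
```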

Page 96: Barga Data Science lecture 6

• weka.filters.unsupervised.instance.RemoveMisclassified

• weka.filters.unsupervised.instance.RemovePercentage

• weka.filters.unsupervised.instance.RemoveRange

• weka.filters.unsupervised.instance.RemoveWithValues

• weka.filters.unsupervised.instance.Resample

Page 97: Barga Data Science lecture 6

5 Minute Break…

Page 98: Barga Data Science lecture 6

Simple Definition

Feature selection problem:

$$F = \{f_1, \ldots, f_i, \ldots, f_n\} \xrightarrow{\text{f. selection}} F' = \{f_{i_1}, \ldots, f_{i_j}, \ldots, f_{i_m}\}$$

Feature extraction:

$$F = \{f_1, \ldots, f_i, \ldots, f_n\} \xrightarrow{\text{f. extraction}} F' = \{g_1(f_1, \ldots, f_n), \ldots, g_j(f_1, \ldots, f_n), \ldots, g_m(f_1, \ldots, f_n)\}$$
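In code, the distinction is that selection keeps a subset of the original features unchanged, while extraction computes new features as functions g_j of all the originals. A minimal sketch with hypothetical function names and a made-up feature row:

```python
def select_features(row, keep):
    """Feature selection: return the original features at the chosen indices."""
    return [row[i] for i in keep]


def extract_features(row, functions):
    """Feature extraction: each new feature is g_j applied to the whole row."""
    return [g(row) for g in functions]


row = [85, 85, 1]  # made-up example: temperature, humidity, windy flag

selected = select_features(row, keep=[0, 2])
extracted = extract_features(row, [
    lambda f: f[0] - f[1],   # g_1: temperature minus humidity
    lambda f: f[0] * f[2],   # g_2: temperature, but only when windy
])
print(selected, extracted)
```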

Page 99: Barga Data Science lecture 6

Page 100: Barga Data Science lecture 6

3 types of methods

Filter Methods

Wrapper Methods

Embedded Methods: decision trees, random forests
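As one illustration of a filter method, features can be scored independently of any learner and then ranked. The sketch below uses absolute Pearson correlation with the target as the score; that choice of score, the function names, and the data are assumptions, and a constant column would need special handling because its standard deviation is zero.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


def rank_features(columns, target):
    """Rank feature names by |correlation with the target|, strongest first."""
    scores = {name: abs(pearson(col, target)) for name, col in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)


columns = {
    "noise": [5, 1, 4, 4, 1, 5],   # made-up feature unrelated to the target
    "signal": [1, 2, 3, 4, 5, 6],  # made-up feature that tracks the target
}
target = [0, 0, 0, 1, 1, 1]

ranking = rank_features(columns, target)
print(ranking)
```

A wrapper method would instead score candidate feature subsets by training and evaluating a model on each one.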

Page 101: Barga Data Science lecture 6

Most learning methods implicitly do feature selection:
• Decision Trees: use info gain or gain ratio to decide what attributes to use as tests. Many features don’t get used.
• Neural nets: backprop learns strong connections to some inputs, and near-zero connections to other inputs.
• kNN, MBL (any similarity-based learning): weights in Weighted Euclidean Distance determine how important each feature is. Weights near zero mean the feature is not used.
• SVMs: maximum margin hyperplane may focus on important features, ignore irrelevant features.

So why do we need feature selection?

Data Integration

Page 102: Barga Data Science lecture 6

Curse of Dimensionality

exponentially

In many cases the information lost by discarding variables is made up for by a more accurate mapping/sampling in the lower-dimensional space!
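One common way to read the curse of dimensionality is that, for a fixed sample size, the number of cells in a grid over the feature space grows exponentially with the number of features, so each cell holds fewer and fewer examples. The sketch below illustrates that reading; the bin count and sample size are made-up numbers.

```python
def grid_cells(bins_per_feature, n_features):
    """Number of cells when every feature is split into the same number of bins."""
    return bins_per_feature ** n_features


n_samples = 10_000
# Average number of samples per cell with 10 bins per feature.
density = {d: n_samples / grid_cells(10, d) for d in (1, 2, 3, 6)}

for d, per_cell in density.items():
    print(d, per_cell)
```

Dropping from six features to three raises the average from far below one sample per cell to ten, which is the sense in which the lower-dimensional space is sampled more accurately.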

Page 103: Barga Data Science lecture 6

Feature Selection and Engineering

Optimality?

This deserves a deeper treatment, which we will cover next week with hands-on exercises in class…

Page 104: Barga Data Science lecture 6

Coding – shape and enrich…

Numerical data
• Binning – a mapping to discrete categories;
• Recenter – shift by c: the max, min, avg and median shift; the range and standard deviation do not;
• Rescale – multiply everything by d; all measures change;
• Standard ND – recenter to make the mean 0, then divide all values by the SD

Character data
• Lower case
• Spellcheck
• Data extraction (e.g. regular expressions)
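The numerical transforms on this slide can be checked directly. The sketch below uses the population standard deviation (`statistics.pstdev`) for standardizing, which is an assumption since the slide does not say which SD is meant; the function names and data are illustrative.

```python
import statistics


def recenter(values, c):
    """Shift every value by c."""
    return [v + c for v in values]


def rescale(values, d):
    """Multiply every value by d."""
    return [v * d for v in values]


def standardize(values):
    """Subtract the mean, then divide by the (population) standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]


values = [2, 4, 4, 4, 5, 5, 7, 9]  # mean 5, population SD 2

shifted = recenter(values, 10)   # mean moves; range and SD stay the same
scaled = rescale(values, 3)      # mean, range and SD are all multiplied by 3
z_scores = standardize(values)   # mean 0, SD 1

print(statistics.mean(shifted), statistics.pstdev(shifted))
print(statistics.mean(scaled), statistics.pstdev(scaled))
print(z_scores)
```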

Page 105: Barga Data Science lecture 6

Page 106: Barga Data Science lecture 6

feature    red  blue  green
red        1    0     0
blue       0    1     0
green      0    0     1
red        1    0     0
red        1    0     0
green      0    0     1
blue       0    1     0
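The mapping shown on this slide, one 0/1 column per category value, can be sketched as follows. The function name `one_hot` is an assumption; the data are the seven colour values from the slide.

```python
def one_hot(values, categories):
    """Encode each value as a 0/1 row with one column per category."""
    return [[1 if v == c else 0 for c in categories] for v in values]


feature = ["red", "blue", "green", "red", "red", "green", "blue"]
encoded = one_hot(feature, ["red", "blue", "green"])

for value, row in zip(feature, encoded):
    print(value, row)
```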

Page 107: Barga Data Science lecture 6

Outlook    Temperature  Humidity  Windy  Play
sunny      85           85        false  no
sunny      80           90        true   no
overcast   83           78        false  yes
rain       70           96        false  yes
rain       68           80        false  yes
rain       65           70        true   no
overcast   64           65        true   yes
sunny      72           95        false  no
sunny      69           70        false  yes
rain       75           80        false  yes
sunny      75           70        true   yes
overcast   72           90        true   yes
overcast   81           75        false  yes
rain       71           80        true   no

Attributes:
Outlook (overcast, rain, sunny)
Temperature real
Humidity real
Windy (true, false)
Play (yes, no)

Standard Spreadsheet Format

Outlook   Outlook  Outlook  Temp  Humidity  Windy  Windy  Play  Play
overcast  rain     sunny                    TRUE   FALSE  yes   no
0         0        1        85    85        0      1      0     1
0         0        1        80    90        1      0      0     1
1         0        0        83    78        0      1      1     0
0         1        0        70    96        0      1      1     0
0         1        0        68    80        0      1      1     0
0         1        0        65    70        1      0      0     1
1         0        0        64    65        1      0      1     0
.         .        .        .     .         .      .      .     .
.         .        .        .     .         .      .      .     .

Page 108: Barga Data Science lecture 6

Page 109: Barga Data Science lecture 6

Page 110: Barga Data Science lecture 6

Household income

[Scale from $10,000 to $200,000, divided into: very low, low, average, high, very high]

Page 111: Barga Data Science lecture 6

Fewer features, more discrimination ability

concept hierarchies

Page 112: Barga Data Science lecture 6

• Equal-width (distance) partitioning: uniform grid
• Equal-depth (frequency) partitioning
• Class label based partitioning

Page 113: Barga Data Science lecture 6

into the user-specified

Page 114: Barga Data Science lecture 6

Temperature values:
64 65 68 69 70 71 72 72 75 75 80 81 83 85

Bin       Count
[64,67)   2
[67,70)   2
[70,73)   4
[73,76)   2
[76,79)   0
[79,82)   2
[82,85]   2
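The equal-width counts for these temperature values can be reproduced with a short sketch. The function name is an assumption; the range 64 to 85 and the seven bins are taken from the slide, with the last bin closed on the right.

```python
def equal_width_counts(values, low, high, n_bins):
    """Count values per bin when [low, high] is cut into n_bins equal-width bins."""
    width = (high - low) / n_bins
    counts = [0] * n_bins
    for v in values:
        # The final bin is closed on the right, so `high` itself falls into it.
        index = min(int((v - low) / width), n_bins - 1)
        counts[index] += 1
    return counts


temps = [64, 65, 68, 69, 70, 71, 72, 72, 75, 75, 80, 81, 83, 85]
width_counts = equal_width_counts(temps, 64, 85, 7)
print(width_counts)
```

Note the empty [76,79) bin: equal-width bins follow the value range, not the data, so sparse regions produce sparse or empty bins.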

Page 115: Barga Data Science lecture 6

Salary in a corporation

[Histogram of counts over equal-width bins from [0 – 200,000) … to [1,800,000 – 2,000,000]; count label: 1]

Page 116: Barga Data Science lecture 6

user-specified nFi number of intervals

Page 117: Barga Data Science lecture 6

Temperature values:
64 65 68 69 70 71 72 72 75 75 80 81 83 85

Bin          Count
[64 .. 69]   4
[70 .. 72]   4
[73 .. 81]   4
[83 .. 85]   2
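Equal-depth (frequency) partitioning puts the same number of sorted values in each bin. A minimal sketch, assuming a depth of 4 as in the counts on this slide (the function name is hypothetical):

```python
def equal_depth_bins(values, depth):
    """Split the sorted values into consecutive bins of `depth` values each."""
    ordered = sorted(values)
    return [ordered[i:i + depth] for i in range(0, len(ordered), depth)]


temps = [64, 65, 68, 69, 70, 71, 72, 72, 75, 75, 80, 81, 83, 85]
depth_bins = equal_depth_bins(temps, 4)

for b in depth_bins:
    print(b[0], "..", b[-1], "count", len(b))
```

This simple slicing can place tied values in different bins when a tie straddles a boundary, which a production implementation would need to handle.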

Page 118: Barga Data Science lecture 6

Page 119: Barga Data Science lecture 6

4/12/2016 University of Waikato 119

Page 120: Barga Data Science lecture 6

Page 121: Barga Data Science lecture 6

Page 122: Barga Data Science lecture 6

Page 123: Barga Data Science lecture 6

Page 124: Barga Data Science lecture 6

Page 125: Barga Data Science lecture 6

Page 126: Barga Data Science lecture 6

Page 127: Barga Data Science lecture 6

Page 128: Barga Data Science lecture 6

Page 129: Barga Data Science lecture 6

Page 130: Barga Data Science lecture 6

Page 131: Barga Data Science lecture 6

Page 132: Barga Data Science lecture 6

Page 133: Barga Data Science lecture 6

Page 134: Barga Data Science lecture 6

Page 135: Barga Data Science lecture 6

Page 136: Barga Data Science lecture 6

Page 137: Barga Data Science lecture 6

Page 138: Barga Data Science lecture 6

Page 139: Barga Data Science lecture 6

Page 140: Barga Data Science lecture 6

Page 141: Barga Data Science lecture 6

Domain expertise, play a hunch in terms of feature discrimination

Page 142: Barga Data Science lecture 6

That’s all for tonight…