APPLIED DATA SCIENCE Giovanni Lanzani – Chief Science Officer GoDataDriven @gglanzani
Giovanni Lanzani GoDataDriven

Jan 22, 2018

Data & Analytics

BigDataExpo
Transcript
Page 1: Giovanni Lanzani GoDataDriven

APPLIED DATA SCIENCE

Giovanni Lanzani – Chief Science Officer

GoDataDriven

@gglanzani

Page 2: Giovanni Lanzani GoDataDriven

WHO AM I

Italy

01

Leiden

University

02

KPMG

03

GoDataDriven

04

Page 3: Giovanni Lanzani GoDataDriven
Page 4: Giovanni Lanzani GoDataDriven

WHAT IS MACHINE LEARNING

Page 5: Giovanni Lanzani GoDataDriven

LEARNING FROM DATA

• You have some (lots of) data

• You need to generalize

Page 6: Giovanni Lanzani GoDataDriven

BEST MODEL

• Which one would you choose here?

• It’s about making a tradeoff

• Making this tradeoff is the most important job of the product owner (PO)

• A 100% correct answer might not exist!!!
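
The tradeoff above can be illustrated with a minimal sketch. The data and the three toy models here are made up for illustration and are not from the slides: one model underfits, one memorizes the training data, and one sits in between. Comparing the error on the training data with the error on held-out data shows why the model with the best training score is not necessarily the one to choose.

```python
# Toy data, invented for illustration: y = x plus alternating +/-1 "noise".
train = [(2 * i, 2 * i + (1 if i % 2 == 0 else -1)) for i in range(10)]
test = [(2 * j + 1, 2 * j + 1 + (-1 if j % 2 == 0 else 1)) for j in range(10)]


def fit_mean(data):
    """Underfits: ignores x and always predicts the average y."""
    avg = sum(y for _, y in data) / len(data)
    return lambda x: avg


def fit_line(data):
    """Ordinary least squares for a straight line."""
    n = len(data)
    mx = sum(x for x, _ in data) / n
    my = sum(y for _, y in data) / n
    sxy = sum((x - mx) * (y - my) for x, y in data)
    sxx = sum((x - mx) ** 2 for x, _ in data)
    slope = sxy / sxx
    return lambda x: my + slope * (x - mx)


def fit_memorizer(data):
    """Overfits: returns the y of the closest training point."""
    return lambda x: min(data, key=lambda p: abs(p[0] - x))[1]


def mse(model, data):
    """Mean squared error of a model on (x, y) pairs."""
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)


results = {}
for name, fit in [("mean", fit_mean), ("line", fit_line), ("memorizer", fit_memorizer)]:
    model = fit(train)
    results[name] = (mse(model, train), mse(model, test))
    train_err, test_err = results[name]
    print(f"{name:10s} train MSE={train_err:6.2f}  test MSE={test_err:6.2f}")
```

The memorizer is perfect on the data it has seen and worse than the straight line on the data it has not, which is the generalization problem in miniature.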

Page 7: Giovanni Lanzani GoDataDriven

WHAT’S DATA SCIENCE

Page 8: Giovanni Lanzani GoDataDriven

ULTIMATELY

• It’s about creating value from data

• Using Machine Learning, Advanced Analytics, and visualization

Page 9: Giovanni Lanzani GoDataDriven

WHEN YOU SAY DATA SCIENCE, COMPANIES UNDERSTAND

• All the things big data

• Predictive modeling & Advanced Analytics

• More money

• Do all the cool things the others are doing

Page 10: Giovanni Lanzani GoDataDriven

HOW TO GET THERE

Page 11: Giovanni Lanzani GoDataDriven

TRADITIONAL DATA WAREHOUSE

ARCHITECTURE

EDW

Data consumer

Web app

Dashboard / Reporting

Traditional business app

Page 12: Giovanni Lanzani GoDataDriven

AND NOW?

?

Data consumer

Web app

Dashboard / Reporting

Traditional business app

API

Page 13: Giovanni Lanzani GoDataDriven

WHAT COMPANIES GOT

• A lot of POCs

• A lot of screenshots/presentations/dashboards on a laptop

• Nice stories to tell to their network, about those screenshots and especially those dashboards

• Headaches, with data and infrastructure even more scattered than before

Page 14: Giovanni Lanzani GoDataDriven

BUT…

• We got a data scientist working on trees, and forests

• Neural networks!

• Deep learning!!!

Page 15: Giovanni Lanzani GoDataDriven

WHAT DO COMPANIES ACTUALLY NEED

• Put things into production

• They don’t teach that in any data science course or MOOC (that I know of)

Page 16: Giovanni Lanzani GoDataDriven

THE THREE HURDLES

Credit to Jon Shave: gdd.li/lavaredo

Page 17: Giovanni Lanzani GoDataDriven

OVERSIMPLIFYING

Requirements

Data sources

Exploration / Modeling

Products

Feedback

Data scientist / ML engineer

Data engineer

Data engineer

🤦🤦♀️🤦🤦

Customers

Page 18: Giovanni Lanzani GoDataDriven

KAGGLE CURSE

• gdd.li/toldYouSo

• Many data scientists approach the problem at hand with a Kaggle-like mentality: delivering the best model in absolute terms, no matter what the practical implications are.

• In reality it's not the best model that we implement, but the one that combines quality and practicality: a continuous balancing act

• Netflix competition
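
One way to picture the balancing act described above is a selection rule that weighs practicality alongside quality. The sketch below is only an illustration: the candidate names, scores, latencies, and thresholds are all hypothetical. It picks the cheapest model that fits a latency budget and is within a small tolerance of the best offline score.

```python
from typing import NamedTuple, Optional, Sequence


class Candidate(NamedTuple):
    name: str
    score: float       # offline quality metric, higher is better (hypothetical)
    latency_ms: float  # time per prediction in production (hypothetical)


def pick_practical(
    candidates: Sequence[Candidate],
    latency_budget_ms: float,
    score_tolerance: float,
) -> Optional[Candidate]:
    """Return the fastest candidate that is both within the latency budget
    and within `score_tolerance` of the best score, or None if there is none."""
    if not candidates:
        return None
    best_score = max(c.score for c in candidates)
    good_enough = [
        c
        for c in candidates
        if c.latency_ms <= latency_budget_ms
        and c.score >= best_score - score_tolerance
    ]
    return min(good_enough, key=lambda c: c.latency_ms, default=None)


candidates = [
    Candidate("giant ensemble", score=0.912, latency_ms=900.0),
    Candidate("gradient boosting", score=0.905, latency_ms=40.0),
    Candidate("logistic regression", score=0.871, latency_ms=2.0),
]

choice = pick_practical(candidates, latency_budget_ms=100.0, score_tolerance=0.01)
print(choice.name if choice else "no candidate is practical")
```

A leaderboard would rank the giant ensemble first. With these made-up constraints the rule settles on gradient boosting, which gives up a little score for a model that can actually be served.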

Page 19: Giovanni Lanzani GoDataDriven

SOLVING THEM

Page 20: Giovanni Lanzani GoDataDriven

BUSINESS CASE

Business case for

• True Positives

• True Negatives

Cost of

• False Positives

• False Negatives
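
The four quantities above can be combined into a single number per model. The following is a minimal sketch under stated assumptions: the monetary values, the labels, and the two sets of predictions are invented for illustration, and `BusinessCase` and `net_value` are hypothetical helper names, not something from the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BusinessCase:
    value_tp: float  # value gained per true positive
    value_tn: float  # value gained per true negative
    cost_fp: float   # cost incurred per false positive
    cost_fn: float   # cost incurred per false negative


def confusion_counts(y_true, y_pred):
    """Count (tp, tn, fp, fn) for binary labels encoded as 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def net_value(case, y_true, y_pred):
    """Total value of the predictions under the given business case."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return (tp * case.value_tp + tn * case.value_tn
            - fp * case.cost_fp - fn * case.cost_fn)


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical numbers: missing a positive is expensive, a false alarm is cheap.
case = BusinessCase(value_tp=100.0, value_tn=0.0, cost_fp=10.0, cost_fn=100.0)
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
cautious = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]  # few alarms, misses one positive
eager = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]     # more alarms, catches both positives

for name, y_pred in [("cautious", cautious), ("eager", eager)]:
    print(name, accuracy(y_true, y_pred), net_value(case, y_true, y_pred))
```

With these made-up numbers the cautious model has the higher accuracy (0.9 against 0.7) but the eager model has the higher net value (170 against 0), so the business case, not the accuracy, decides which one to ship.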

Page 21: Giovanni Lanzani GoDataDriven

DATA

Data {insert something here}

should be pro grade

Page 22: Giovanni Lanzani GoDataDriven

SKILLS

• Can they participate in actually building production-quality systems, OR are they just proficient enough in R or Python to hack together a prototype on a very small dataset?

• Supply of the second group keeps growing while demand is flat or shrinking

• Especially as executives get burned by “data scientists” who don't know how to help them build things of value

Page 23: Giovanni Lanzani GoDataDriven

HIRING

• Companies that are not engineering-driven often have trouble hiring good technical people

• The “IQ” test is not really representative of applied data science

• At GoDataDriven we do an “at home, at your convenience” assessment

• Real dataset, real business question, real product

• Models are software: treat them as such
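
One reading of "treat models as software" is that a model gets the same automated checks as any other function. The sketch below assumes a hypothetical `churn_score` function standing in for a trained model, with made-up coefficients, and tests its contract rather than its accuracy: valid output range, sensible behavior, and input validation.

```python
import math


def churn_score(days_since_last_order: int, support_tickets: int) -> float:
    """Hypothetical stand-in for a trained model: a logistic score in (0, 1)."""
    if days_since_last_order < 0 or support_tickets < 0:
        raise ValueError("inputs must be non-negative")
    z = -2.0 + 0.03 * days_since_last_order + 0.4 * support_tickets
    return 1.0 / (1.0 + math.exp(-z))


def test_score_is_a_probability():
    for days in (0, 30, 365):
        for tickets in (0, 1, 10):
            assert 0.0 < churn_score(days, tickets) < 1.0


def test_more_inactivity_never_lowers_score():
    scores = [churn_score(days, 0) for days in range(0, 200, 10)]
    assert scores == sorted(scores)


def test_rejects_invalid_input():
    try:
        churn_score(-1, 0)
    except ValueError:
        return
    raise AssertionError("negative input should be rejected")


# Run the checks directly; a test runner such as pytest would also collect them.
for check in (
    test_score_is_a_probability,
    test_more_inactivity_never_lowers_score,
    test_rejects_invalid_input,
):
    check()
print("all model checks passed")
```

Checks like these can run in the same pipeline as the rest of the code base, so a retrained model that breaks its contract is caught before it reaches production.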

Page 24: Giovanni Lanzani GoDataDriven

TAKEAWAYS

• POs should know “their stuff”

• Automate all the data movements

• Hire data scientists who are good at programming (or hire machine learning engineers)

Page 25: Giovanni Lanzani GoDataDriven

QUESTIONS?

• We’re hiring

• Data & Machine Learning Engineers!