Data Science Retreat Berlin, Mar 2014 http://datascienceretreat.com / Introduction to the first Data Science school in Europe Plus advice for upcoming data scientists

Data science-retreat-how it works plus advice for upcoming data scientists

Aug 20, 2015


Jose Quesada
Transcript
Page 1: Data science-retreat-how it works plus advice for upcoming data scientists

Data Science Retreat

Berlin, Mar 2014

http://datascienceretreat.com/

Introduction to the first Data Science school in Europe

Plus advice for upcoming data scientists

Page 2: Data science-retreat-how it works plus advice for upcoming data scientists

Who Am I

Twitter: @quesada

Before: Consulting on predictive

models of ecommerce (CLV),

data scientist at GetYourGuide

Page 3: Data science-retreat-how it works plus advice for upcoming data scientists

What this talk is about: Two problems

•Making the jump from junior to senior data

science is hard (solution: data science retreat)

•Acquiring the skillset, even with killer online

courses, is hard (solution: meerkat method)

Page 4: Data science-retreat-how it works plus advice for upcoming data scientists

contents

• Data Science retreat

• The meerkat method

• Data Science retreat for companies

• Hiring

• Getting tailored courses at your location

• Advice to anyone on their path to be a data scientist

• Advice to companies growing a data team

Page 5: Data science-retreat-how it works plus advice for upcoming data scientists

Problem: Making the jump from junior to

senior data science is hard

Page 6: Data science-retreat-how it works plus advice for upcoming data scientists

It’s too hard for companies to find data scientists

“It takes 150 phone interviews to find someone who is good enough to bring in to continue on-site”

Alex Kagoshima, Pivotal, Berlin

Page 7: Data science-retreat-how it works plus advice for upcoming data scientists

People applying to Data scientist jobs have no experience

• Vincent Granville:

“There is no shortage of data scientists. For every LinkedIn

job, there are several hundred applications on average”

Page 8: Data science-retreat-how it works plus advice for upcoming data scientists

Data scientists need to program (5 years of experience)

Stefan Schmidt (Amazon Berlin):

“It takes us months to fill our positions; we hire world-wide

for Berlin openings. Most profiles cannot program at the level

we need. We have engineers, but the data scientist needs to be

able to understand large projects and commit code”

Page 9: Data science-retreat-how it works plus advice for upcoming data scientists

Truth is, data scientist is a senior role

• Often advising the CEO directly

• This is why so many people with strong profiles and

lots of coursera courses cannot find jobs

Page 10: Data science-retreat-how it works plus advice for upcoming data scientists

The gap from junior to senior

• Junior:

• Has a technical degree

• Has done some courses online

• Has never worked with data that generates value for

companies

• Can apply ‘recipes’, but not think creatively about data

sources and algorithms

Page 11: Data science-retreat-how it works plus advice for upcoming data scientists


This profile has no practical value for most companies

Page 12: Data science-retreat-how it works plus advice for upcoming data scientists

Data Science Retreat

Page 13: Data science-retreat-how it works plus advice for upcoming data scientists

The story

Page 14: Data science-retreat-how it works plus advice for upcoming data scientists

TEAM: Chief-Data-Scientist-level mentors

Page 15: Data science-retreat-how it works plus advice for upcoming data scientists

Formulating the analytical problem

• Finding the question

• Translating something vague into a

dependent measure and an actual set of

predictors

• What generates business value?

• The Business Model Canvas to design a data

product

• Key performance indicators; examples,

measurement, improvement

• Most business problems are not very well

defined. How do we make them actionable?

• Analyzing big success stories in data science

• Getting Buy-in

Page 16: Data science-retreat-how it works plus advice for upcoming data scientists

Getting data (APIs, feature engineering)

• Using APIs

• Using databases

• Parsing HTML; web scraping

• Transforming data (reshape)

• Finding APIs

• Feature engineering

• Avoiding autocorrelation

• Removing features with low variance

• Detecting outliers

• Exploratory analyses

• Measuring predictor importance
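
One of the feature engineering items above, removing features with low variance, can be sketched in a few lines. This is only an illustration using the Python standard library; the function name `drop_low_variance`, the threshold, and the toy data are assumptions and do not come from the course material.

```python
from statistics import pvariance


def drop_low_variance(columns, threshold=0.0):
    """Keep only the columns whose population variance exceeds threshold.

    columns maps a feature name to its list of numeric values.
    """
    return {
        name: values
        for name, values in columns.items()
        if pvariance(values) > threshold
    }


# Hypothetical toy data: one informative feature and one constant feature.
features = {
    "visits": [3, 10, 7, 1],
    "country_is_de": [1, 1, 1, 1],  # constant, so it cannot help a model
}
kept = drop_low_variance(features)
print(sorted(kept))
```

A constant column has zero variance and is dropped; in practice the threshold would be chosen per dataset.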

Page 17: Data science-retreat-how it works plus advice for upcoming data scientists

Finding insights, making predictions

• Regression

• Linear regression, penalized

models

• non-linear regression

• SVM

• K-nearest neighbors

• regression trees + rule-based

models (random forests)

Page 18: Data science-retreat-how it works plus advice for upcoming data scientists

Finding insights, making predictions

• Classification

• Logistic regression, linear

classification

• nonlinear classification models

• Classification trees + rule-based

models (random forests)

• SVM

• K-nearest neighbors

• naive bayes

Page 19: Data science-retreat-how it works plus advice for upcoming data scientists

R

• R language fundamentals

• data structures (including

data.table)

• subsetting

• input/output

• functions/control flow

• vectorization

• split-apply-combine


Page 20: Data science-retreat-how it works plus advice for upcoming data scientists

R

• advanced R

• functional programming in

R

• Profiling

• object systems

• packaging

• Rcpp

Page 21: Data science-retreat-how it works plus advice for upcoming data scientists

data at scale

• MapReduce

• MapReduce, Google 2004.

• Applications, extensions. Beyond

MapReduce.

• Big Data analysis

• Preparation and configuration

• Hadoop cluster overview.

• Practice: Uploading / downloading

/ moving files around, executing

jobs, checking for completion /

failure, etc.
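
As a minimal illustration of the MapReduce model named on this slide, the sketch below runs the map, shuffle, and reduce phases in plain Python on a single machine. It is not Hadoop code; the function names and the word-count task are assumptions chosen for illustration.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Group all emitted values by key, as a framework would between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce in sequence over a list of documents."""
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = word_count(["big data", "Big insights from data"])
print(counts)
```

On a real cluster the map and reduce calls run in parallel on different machines, and the framework handles the shuffle step; the structure of the computation is the same.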

Page 22: Data science-retreat-how it works plus advice for upcoming data scientists

data at scale

• Hive / Pig

• Defining a Hive table, querying a Hive table.

• Integrating R with Hive.

• An introduction to Pig.

• Mahout

• Executing clustering tasks. Visualizing the

results with R.

• Executing an item-to-item recommender.

• Cascading / Pattern

• Data flow modeling using PyCascading.

• Executing Machine Learning "Pattern"

algorithms.

Page 23: Data science-retreat-how it works plus advice for upcoming data scientists

Location: Microsoft Ventures Berlin

Page 24: Data science-retreat-how it works plus advice for upcoming data scientists

Methodology: portfolio project

• Ten students per batch

• Pair programming and code reviews with mentors (guild model)

• Datasets come from companies (non-NDA only)

• Portfolio project, where the fellow demonstrates what he can do

end-to-end to deliver value

• Weekly presentation training to improve communication to

non-technical stakeholders (video feedback)

Page 25: Data science-retreat-how it works plus advice for upcoming data scientists

Who we are looking for

• Passion for generating insights from data

• Familiarity with trends in data growth, open-source platforms, and public

data sets.

• From familiarity to strong knowledge of statistical methods

• Some experience with statistical languages and packages, including Mahout, R

or python with pandas

• Some familiarity with visualization software and techniques (including

Tableau)

• Preferably, experience working hands-on with large-scale data sets

• Excellent written and verbal communication skills

Page 26: Data science-retreat-how it works plus advice for upcoming data scientists

Acquiring the skillset, even with killer online

courses, is hard

Page 29: Data science-retreat-how it works plus advice for upcoming data scientists

[Diagram: Problem, Project, Exercise]

Page 30: Data science-retreat-how it works plus advice for upcoming data scientists

[Diagrams: Problem, Project, Exercise / Problem, Exercise, Project]

Page 34: Data science-retreat-how it works plus advice for upcoming data scientists

Concepts in meerkat method

• Learner

• Mentor

• Scorpion

• Maiming the scorpion

Page 35: Data science-retreat-how it works plus advice for upcoming data scientists

[Diagram: Problem, Project, Exercise]

Page 37: Data science-retreat-how it works plus advice for upcoming data scientists

advantages

• No need to find the right tutorial/book/whatever

• Spend more time at the border of your capability

• You save time by skipping exercises that would be too easy

Page 38: Data science-retreat-how it works plus advice for upcoming data scientists

Advantages (cont)

• Higher project completion rates: all projects must have a

concrete output, so you will see your own progress in

tangible ways

• You will have an easier time demonstrating progress to

yourself and to others (the Mentor vouches for the

Learner).

• You will get more hands-on training than in other methods

Page 39: Data science-retreat-how it works plus advice for upcoming data scientists

Interested in being a learner?

Apply

Page 40: Data science-retreat-how it works plus advice for upcoming data scientists

Data Science Retreat for companies

Page 41: Data science-retreat-how it works plus advice for upcoming data scientists

Sponsors

Page 42: Data science-retreat-how it works plus advice for upcoming data scientists

Meet Stefan. He's the Chief Data Scientist of bigCompany.

They wanted to go 'all in' with big data, but needed people capable of taming it.

Page 43: Data science-retreat-how it works plus advice for upcoming data scientists

He realized it's not easy to find Data Scientists.

Page 44: Data science-retreat-how it works plus advice for upcoming data scientists

What do you mean ‘sub-second

queries by Monday’?

Page 45: Data science-retreat-how it works plus advice for upcoming data scientists

What do you mean ‘improve predictions two

standard deviations’?

Page 46: Data science-retreat-how it works plus advice for upcoming data scientists

€?

Page 49: Data science-retreat-how it works plus advice for upcoming data scientists

Then Laura, Stefan’s friend, pointed him to

Data Science Retreat

… an intensive course helping selected fellows ramp up fast for a career in data science. “Tell

me more…”. Stefan was very interested.

Page 50: Data science-retreat-how it works plus advice for upcoming data scientists

Stefan could interview ten

data scientists that were

as good as Ben.

He hired three, and they jumped into

their roles with little training.

Stefan was ecstatic!

Page 51: Data science-retreat-how it works plus advice for upcoming data scientists

How it works

• As a sponsor you pay 7000€ in advance + 3000€ after the

data scientist has worked on-site for 3 months and you know you

want to keep him.

• Students who take the sponsorship agree to work for a reduced

salary (50%) for the first 3 months. The salary savings during the

internship should cover the cost of the sponsorship. When the

students finish the program, no one has any obligations.

Page 52: Data science-retreat-how it works plus advice for upcoming data scientists

How it works: an example

• You prepay 7000€ and become a sponsor.

• At this point, you don't know the students.

• But as a committed sponsor you participate actively

during the retreat, see the students' presentations, go

out for lunch with them, etc.

• Thanks to these activities, you have now developed

strong relationships and know more about the students

than what would come out in interviews.

Page 53: Data science-retreat-how it works plus advice for upcoming data scientists

How it works, an example (contd)

• You have set your target on a killer candidate: Klaas. You

make an offer, he accepts, and he starts working at your

location

• Klaas gets paid 50% of his negotiated salary. If Klaas's salary is

60000€/year, that is a 5000€/month cost, so paying 50% saves

2500€ × 3 months = 7500€, which covers your

initial investment of 7000€.
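
The arithmetic in this example can be checked with a short sketch. The function name and parameters below are illustrative assumptions, not part of the program's terms; the figures are the ones from the slide.

```python
def sponsorship_savings(annual_salary_eur, reduced_fraction=0.5, months=3):
    """Salary saved while the hire works at a reduced salary.

    reduced_fraction is the share of the negotiated salary actually paid
    (50% in the slide's example); months is the reduced-salary period.
    """
    monthly_salary = annual_salary_eur / 12
    return monthly_salary * (1 - reduced_fraction) * months


# Figures from the Klaas example: 60000 EUR/year, 7000 EUR prepaid.
savings = sponsorship_savings(60000)
prepayment = 7000
print(f"Savings: {savings:.0f} EUR, prepayment covered: {savings >= prepayment}")
```

With these figures the savings exceed the 7000€ prepayment by 500€; the 3000€ due after three months is a separate payment.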

Page 54: Data science-retreat-how it works plus advice for upcoming data scientists

Data Science Retreat

Contact: Jose Quesada, PhD,

Director, Data Science Retreat [email protected]

Do you want to be a sponsor?

Page 55: Data science-retreat-how it works plus advice for upcoming data scientists

Advice to anyone on their path to be a data scientist

• Try to find a mentor

• Spend as much time as possible at the border of your ability

• Practice communication

• Having a culture that can integrate such individuals is as

hard as finding them. Interview your companies

• How do you move from being a junior person to being the

'CEO whisperer'? Spend time with people who are

Page 56: Data science-retreat-how it works plus advice for upcoming data scientists

Getting tailored courses at your location

• We listen to the people you need to train before we

design the course

• We will start with a dataset that is important for your

company. Lacking that, we’ll bring a public dataset that is

relevant

• Enterprise courses have a reputation for being ineffective

Page 57: Data science-retreat-how it works plus advice for upcoming data scientists

Hire somebody who’s better at engineering and teach him data science or

hire somebody who’s better with data and teach him engineering?

• Is your culture ready? Because if you manage to attract

someone senior enough, they will sense if it's not

• The problems you have must be a good match for the data

scientists. People are extremely specialized, more so after

PhDs. If you have say graph theory/recommendation

problems, and hire someone with a time series background,

things will take a while no matter how good he is in his

field

Page 58: Data science-retreat-how it works plus advice for upcoming data scientists

And Stefan’s CEO?

Page 59: Data science-retreat-how it works plus advice for upcoming data scientists

He’s optimistic about this new big-data deployment!

Page 60: Data science-retreat-how it works plus advice for upcoming data scientists

Thank you for your attention

Jose Quesada: [email protected]

@quesada

@datascienceret

Page 61: Data science-retreat-how it works plus advice for upcoming data scientists

Sponsors