Agile Analytics in Mobile Gaming: lessons learned. Volodymyr (Vlad) Kazantsev, Head of Data Science at Product Madness, 2015. © All rights reserved.
Transcript
Page 1: Agile Data Science

Agile Analytics in Mobile Gaming: lessons learned

Volodymyr (Vlad) Kazantsev, Head of Data Science at Product Madness

2015

Page 3: Agile Data Science

Heart of Vegas in charts

Charts: iPad rankings (US); iPad rankings (Australia)

Page 4: Agile Data Science

Data Impact Team

● Ad-hoc analytics and daily fires

● Deep dive analysis; Predictive analytics

● Data Engineering; R&D

Team of 6

Page 5: Agile Data Science

A Few Examples

● A/B Tests (chart: variant A vs. variant B)

● Customer Lifetime Value (chart: $ value over days)

● Segmentation (chart: group 1, group 2, group 3, group 4)

Page 6: Agile Data Science

Technology Stack

C++

ETL orchestration

Transformation & Aggregation

SQL

Data Products

Reports

Dashboards

+

Page 7: Agile Data Science

Lessons

Page 8: Agile Data Science

Lesson 1: Agile Philosophy for Data Science

Page 9: Agile Data Science

Agile Manifesto

Individuals and interactions over processes and tools

Working software over comprehensive documentation

Customer collaboration over contract negotiation

Responding to change over following a plan

* agilemanifesto.org

Page 10: Agile Data Science

Agile Data Science Manifesto

Individuals and interactions over processes and tools

Actionable insights over comprehensive reports

Customer collaboration over project negotiation

Responding to change over following a plan

Page 11: Agile Data Science

“If a building doesn’t encourage [collaboration], you’ll lose a lot of innovation and the magic that’s sparked by serendipity” - Steve Jobs

Individuals and interactions over processes and tools

Page 12: Agile Data Science

Individuals and interactions over processes and tools

Standing Desks + Easily Available Whiteboard

Page 13: Agile Data Science

Agile Principles

Iterative, incremental and evolutionary
- iterations, timeboxed estimates

Efficient and face-to-face communication
- no tasks handed over by email alone (without a face-to-face conversation)

Very short feedback loop and adaptation cycle
- daily standups, pair analysis

Quality focus
- verifiable, reproducible findings

Page 14: Agile Data Science

Data Science Board

Page 15: Agile Data Science

Scrum-Ban in Data Science @ProductMadness

● Weekly cycle

● Daily standup meeting @10am

● ToDo/WIP/Waiting buckets are kept small

● Disruptions to weekly plan are expected

● On-demand planning

Page 16: Agile Data Science

Lesson 1: Agile methods in Data Science

1. Co-location matters; keep a whiteboard next to your desk

2. Work with the decision maker; share preliminary findings

3. Make a research plan; pivot early

4. Book a “Findings” meeting before the project starts

5. Build an MVP for Data Products

6. Do daily stand-ups!

Page 17: Agile Data Science

Lesson 2: Agile Velocity vs. Acceleration

Page 18: Agile Data Science

What is Agile Acceleration

Waterfall vs. Scrum

Velocity = Units of Work / Time Interval

ΔVelocity = Acceleration × ΔTime
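
The two relations on this slide, Velocity = Units of Work / Time Interval and ΔVelocity = Acceleration × ΔTime, can be worked through with a few numbers. The sketch below is only an illustration: the weekly unit counts and the one-week interval are made-up assumptions, not figures from the talk.

```python
# Hypothetical units of work finished in four consecutive one-week iterations.
units_of_work = [10, 12, 15, 19]
time_interval_weeks = 1.0

# Velocity = Units of Work / Time Interval
velocities = [units / time_interval_weeks for units in units_of_work]

# Acceleration = ΔVelocity / ΔTime
# (rearranged from ΔVelocity = Acceleration × ΔTime)
accelerations = [
    (later - earlier) / time_interval_weeks
    for earlier, later in zip(velocities, velocities[1:])
]

print("velocity per week:", velocities)
print("acceleration per week:", accelerations)
```

In this made-up series the velocity grows every week, so the acceleration is positive: the team is not only delivering, it is getting faster at delivering.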

Page 19: Agile Data Science

a = F / m

I run SQL, copy-paste data to Excel and send it by email

I created a deep neural network to predict high spenders

Page 20: Agile Data Science

Case Study: to Git or not to Git

● Scripts (Ruby, Bash, Python)
● Python Apps
● Python Modules
● IPython Notebooks
● Research Documents (Word)
● Presentations (PowerPoint)
● Spreadsheets (Excel)

Page 21: Agile Data Science

Case Study: to Git or not to Git

● Scripts (Ruby, Bash, Python)
● Python Apps
● Python Modules
● IPython Notebooks ?
● Research Documents (Word)
● Slides (PowerPoint)
● Spreadsheets (Excel)

Page 22: Agile Data Science

Case Study: to Git or not to Git

● Scripts (Ruby, Bash, Python)
● Python Apps
● Python Modules
● IPython Notebooks
● Research Documents (Word)
● Slides (PowerPoint)
● Spreadsheets (Excel)

Page 23: Agile Data Science

Remove unnecessary weight

Page 24: Agile Data Science

Lesson 2: find the lightest suitable tool

1. IPython notebooks: Dropbox over Git

2. Google Slides over PowerPoint; Google Slides over email with images (more than 2 images)

3. Google Spreadsheets over Excel (for analytics)

4. Podio over Jira (for analytics)

5. Data transformations in the DWH in SQL over Hadoop

6. Don’t copy-paste code in IPython notebooks, use functions; don’t copy-paste functions between notebooks, use modules
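
The last point (functions over copy-pasted cells, modules over copy-pasted functions) might look like the sketch below. The function name `conversion_rate`, the cohort numbers and the module name mentioned in the comment are hypothetical and only serve the illustration.

```python
def conversion_rate(payers, players):
    """Share of players who made a purchase; 0.0 for an empty cohort."""
    if players == 0:
        return 0.0
    return payers / players


# Made-up cohorts: month -> (payers, players).
cohorts = {
    "2015-01": (30, 1000),
    "2015-02": (45, 1200),
    "2015-03": (0, 0),
}

# One function call per cohort instead of one copy-pasted cell per cohort.
rates = {
    month: conversion_rate(payers, players)
    for month, (payers, players) in cohorts.items()
}
print(rates)

# Once a second notebook needs conversion_rate, the next step would be to
# move it into a shared module (for example a hypothetical analytics_utils.py)
# and import it from both notebooks.
```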

Page 25: Agile Data Science

Lesson 3: Focus on Closing the Loop

Page 26: Agile Data Science

Analytics Loop

Spot Opportunity

Ask the Right Question

Make Decision

Improve the Business

Data Science @work

Page 27: Agile Data Science

Analytics Spiral

Ideas & Questions

Data Analysis

Insights

Impact

Page 28: Agile Data Science

Data Science Value Pyramid

Store & Query: record what happened

Reports: was it good or bad?

Descriptive Analytics: why did it happen?

Predictive Analytics: what will happen?

Data Products: affect the outcome

(Pyramid axes: complexity, value)

* inspired by Agile Data Science, Russell Jurney, O'Reilly Media 2013

Page 29: Agile Data Science

Data Science Value Loop

Record what Happened

Was it good or bad?

Why did it happen?

What will happen?

Affect the outcome

Page 30: Agile Data Science

Limit the number of Open Loops

Figure: six tasks at 90%, 90%, 75%, 80%, 80% and 60% complete vs. six tasks at 100%, 100%, 100%, 100%, 0% and 0% complete

Always prefer to have 90% of tasks 100% complete over 100% of tasks 90% complete
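
A quick calculation makes the contrast on this slide concrete. The sketch assumes the percentages group into two panels of six tasks each (90, 90, 75, 80, 80, 60 against 100, 100, 100, 100, 0, 0); the helper `summarize` is hypothetical.

```python
# Completion percentages per task; the grouping into two panels is an assumption.
many_open_loops = [90, 90, 75, 80, 80, 60]
few_open_loops = [100, 100, 100, 100, 0, 0]


def summarize(tasks):
    """Return (average completion in %, number of fully finished tasks)."""
    average = sum(tasks) / len(tasks)
    finished = sum(1 for pct in tasks if pct == 100)
    return average, finished


print("many open loops:", summarize(many_open_loops))
print("few open loops: ", summarize(few_open_loops))
```

The first panel has the higher average progress, yet nothing in it is finished, so nothing can be acted upon. The second panel has four closed loops.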

Page 31: Agile Data Science

Lesson 3: Focus on Closing the Loop

1. Don’t build predictive models that you can’t act upon. Don’t analyse things that do not help to make a decision.

2. The best way to deal with the Analytics Spiral is to avoid the spiral. Practise the “Crack a Case” and “what if” methods.

3. Climb the Data Value Pyramid fast. Once climbed, optimise the Data Value Loop.

4. Limit the number of “open loops”.

Page 32: Agile Data Science

Lesson 4: Reproducibility Matters

Page 33: Agile Data Science

To the and back!

Page 34: Agile Data Science

Why?

Because:

Boss: “Great! Can you run this for all monthly cohorts?”

Page 35: Agile Data Science

Why?

Because:

Boss: “Sam is on holiday. Can you re-run his analysis?”

Page 36: Agile Data Science

A Few IPython Tips

Page 37: Agile Data Science

Import all commonly used tools in one line.

All access and security is abstracted away. Focus on SQL, not data access.

Format and publish a .png in one line of code.

PyCharm has a great SQL editor.
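
The tip about abstracting data access away could be sketched as below. This is not the team's actual library: `get_connection` and `run_sql` are hypothetical names, and an in-memory SQLite database with a made-up `purchases` table stands in for the real warehouse.

```python
import functools
import sqlite3


@functools.lru_cache(maxsize=None)
def get_connection():
    """Open the stand-in 'warehouse' once and load a tiny made-up table."""
    connection = sqlite3.connect(":memory:")
    connection.execute(
        "CREATE TABLE purchases (player_id INTEGER, amount REAL)"
    )
    connection.executemany(
        "INSERT INTO purchases VALUES (?, ?)",
        [(1, 4.99), (1, 9.99), (2, 1.99)],
    )
    return connection


def run_sql(query, params=()):
    """Run a query and return the rows as dicts keyed by column name."""
    cursor = get_connection().execute(query, params)
    columns = [description[0] for description in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]


# In a notebook the analyst only writes the SQL.
rows = run_sql(
    "SELECT player_id, COUNT(*) AS purchases "
    "FROM purchases GROUP BY player_id ORDER BY player_id"
)
print(rows)
```

Credentials and connection handling would live inside the helper in a real setup, so notebooks contain queries and analysis only.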

Page 38: Agile Data Science

Lesson 4: Reproducibility

● Get rid of Windows and you get rid of Excel

● .ipynb notebooks are always shared and versioned; prefer simple cloud sharing to a VCS

● Streamline data access functions

● Cache long-running code and queries

● Develop a common library
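
Caching long-running code and queries can be as light as a small decorator in the common library. The sketch below is one possible approach, not the one used in the talk: the `cached` decorator, the cache directory and the `slow_query` stand-in are all hypothetical.

```python
import functools
import hashlib
import os
import pickle
import tempfile

# Hypothetical cache location; a shared folder would also work.
CACHE_DIR = tempfile.mkdtemp(prefix="analysis_cache_")


def cached(func):
    """Cache a function's result on disk, keyed by its name and arguments."""
    @functools.wraps(func)
    def wrapper(*args):
        key_source = repr((func.__name__, args)).encode("utf-8")
        key = hashlib.sha256(key_source).hexdigest()
        path = os.path.join(CACHE_DIR, key + ".pkl")
        if os.path.exists(path):
            with open(path, "rb") as handle:
                return pickle.load(handle)
        result = func(*args)
        with open(path, "wb") as handle:
            pickle.dump(result, handle)
        return result
    return wrapper


calls = []  # records every time the slow function really runs


@cached
def slow_query(cohort_month):
    """Stand-in for a long-running query with made-up output."""
    calls.append(cohort_month)
    return {"cohort": cohort_month, "players": 1000}


first = slow_query("2015-01")
second = slow_query("2015-01")  # read back from the disk cache
print(first == second, len(calls))
```

Because the key is built from the function name and its arguments, re-running a notebook for the same cohort reuses the stored result, while a new cohort triggers a fresh run.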

Page 39: Agile Data Science

In Summary...

Page 40: Agile Data Science

Summary

● Agile approach works well in Data Science

● Find the lightest suitable tool for a task

● Reproducibility is not negotiable

● Focus on closing the loop(s)

Page 41: Agile Data Science

Questions?

We are hiring!

volodymyrk

[email protected]