Building Guerrilla Analytics Teams Presented by: Enda Ridge, PhD People, Process and Technology for Doing Data Science Copyright Enda Ridge 2014
Transcript
Page 1: Building Guerrilla Analytics Teams

Building Guerrilla Analytics Teams

Presented by:

Enda Ridge, PhD

People, Process and Technology for Doing Data Science

Copyright Enda Ridge 2014

Page 2: Building Guerrilla Analytics Teams

What this talk is about

• Data Science: expectations and reality

• 3 Drivers for doing Data Science

• Why Data Science projects are so challenging

• Introduction to Guerrilla Analytics

• Building Guerrilla Analytics Capability

Guerrilla Analytics

People

Process

Tech

Page 3: Building Guerrilla Analytics Teams

What we hear about Data Science

“Data is the new science. Big data holds the answers.”

“the sexy job in the next 10 years will be statisticians”

“Data Scientist: The Sexiest Job of the 21st Century”

“Information is the oil of the 21st century, and analytics is the combustion engine.”

http://www.gapminder.org/

http://www.statistics.com/data-science-quotes/

https://github.com/mbostock/d3/wiki/Gallery

Page 4: Building Guerrilla Analytics Teams

What we really want from Data Science

• “I have made data available, now how do I use it?”

Leverage

• “I want to make data available or buy a data product. How do I know it will be worth it?”

Justify

• “I think I have a fraud problem / security breach / etc”

• “Help me better understand my customers”

Ad-hoc

Page 5: Building Guerrilla Analytics Teams

My background

PhD Computer Science

• “Design of Experiments for Tuning Algorithms”

Boutique Consultancy

• Social Network Analysis for Fraud

Forensic Data Analytics

• Professional Services

Senior Manager

• Data Science Consulting & Data Product Development

Page 6: Building Guerrilla Analytics Teams

Misconception about how we do Data Science

Shearer C., The CRISP-DM model: the new blueprint for data mining, J Data Warehousing (2000); 5:13-22

Page 7: Building Guerrilla Analytics Teams

Reality – Guerrilla Analytics

• Disruptions

• Data

• Requirements

• Resources

• Business Rules

• Constraints

• Time

• Toolsets

• People

• Repeatable

• Explainable

• Tested

Page 8: Building Guerrilla Analytics Teams

Guerrilla Analytics Workflow

Data

• Extract

• Receive

• Load

Analytics

• Transform

• Algorithm

• Consolidate

Insight

• Reports

• Work Products

Disruptions

Page 9: Building Guerrilla Analytics Teams

Some Guerrilla Analytics Principles

• Prefer simple project structures over heavily documented and complex ones.

• Prefer automation with program code over manual graphical approaches.

• Link data on the file system, to data in the analytics environment, to data in work products.

• Version control changes to program code AND data.
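
The last two principles above can be illustrated with a minimal Python sketch. It assumes a hypothetical convention that is not given in the slides: each data delivery gets an ID such as `D001`, that ID is carried into the table name used in the analytics environment, and a SHA-256 fingerprint of the file is recorded so that changed data can be detected. The function names `fingerprint` and `table_name` are illustrative only.

```python
import hashlib
import re
import tempfile
from pathlib import Path


def fingerprint(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's bytes."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def table_name(data_id: str, path: Path) -> str:
    """Build a table name that carries the data ID (assumed convention)."""
    stem = re.sub(r"[^0-9A-Za-z]+", "_", path.stem).strip("_").lower()
    return f"{data_id}_{stem}"


# Demonstration on a throwaway file standing in for a received extract.
with tempfile.TemporaryDirectory() as tmp:
    received = Path(tmp) / "D001" / "Customers 2014.csv"
    received.parent.mkdir()
    received.write_text("id,name\n1,Ann\n")
    before = fingerprint(received)
    name = table_name("D001", received)

    received.write_text("id,name\n1,Anne\n")  # the delivered data changes
    after = fingerprint(received)

print(name)             # D001_customers_2014
print(before != after)  # True: the fingerprint exposes the change
```

Because the same ID appears in the folder name and the table name, a number in a work product can be traced back to the file it came from, and the stored fingerprint shows whether that file is still the one that was originally loaded.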

Page 10: Building Guerrilla Analytics Teams

Building Guerrilla Analytics Capability

Leverage

Justify

Ad-hoc

Guerrilla Analytics

People

Process

Tech

Page 11: Building Guerrilla Analytics Teams

People Capability

People

Hard Skills

Programming

Software Engineering

Visualization

Maths / Stats

Soft Skills

Communication

Domain Knowledge

Mindset

Page 12: Building Guerrilla Analytics Teams

Capability: Data Programming

“Using a programming language to describe and execute data manipulations, data analyses, data visualizations”

Guerrilla Environment

• Wide variety of data

• Poor quality data

• Evolving understanding

• Reproduce and repeat

Benefit

• Flexibility

• Consolidation

• Knowledge transfer

• Self describing
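
As a small illustration of data programming, the sketch below describes a cleaning step in code instead of performing it by hand, so the step can be rerun and read by others. The sample records and the helpers `clean_amount` and `clean_rows` are invented for illustration and use only the Python standard library.

```python
import csv
import io

# Invented sample of poor-quality data: stray spaces, mixed case,
# a thousands separator, and a missing amount.
RAW = '''customer,amount
 alice ,"1,200.50"
BOB, 300
carol,
'''


def clean_amount(text):
    """Convert an amount string to float, or None when it is blank."""
    text = text.strip().replace(",", "")
    return float(text) if text else None


def clean_rows(raw_text):
    """Parse CSV text and return tidied records."""
    reader = csv.DictReader(io.StringIO(raw_text))
    return [
        {
            "customer": row["customer"].strip().title(),
            "amount": clean_amount(row["amount"]),
        }
        for row in reader
    ]


rows = clean_rows(RAW)
total = sum(r["amount"] for r in rows if r["amount"] is not None)

for row in rows:
    print(row)
print("total:", total)
```

The script itself documents what was done to the data, which is one way to read the "self describing" and "knowledge transfer" benefits listed above.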

Page 13: Building Guerrilla Analytics Teams

Capability: Software Engineering

“the application of a systematic, disciplined, quantifiable approach to the development, operation, and maintenance of software”

Guerrilla Environment

• Changing data

• Iterations of work products

• Reproduce despite pace

• Correctness despite complexity

Benefit

• Version control

• Testing

• Automation

• Issue/bug tracking
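
The testing benefit above can be sketched as automated checks that run every time the data or the code changes. This is a minimal example under assumed names: `deduplicate`, `check_unique`, and `check_no_missing` are hypothetical helpers, and the sample records are invented.

```python
def deduplicate(records, key):
    """Keep the first record seen for each value of `key`."""
    seen = set()
    unique = []
    for record in records:
        if record[key] not in seen:
            seen.add(record[key])
            unique.append(record)
    return unique


def check_unique(records, key):
    """True when no two records share the same value of `key`."""
    values = [record[key] for record in records]
    return len(values) == len(set(values))


def check_no_missing(records, field):
    """True when every record has a non-empty value for `field`."""
    return all(record.get(field) not in (None, "") for record in records)


sample = [
    {"id": 1, "name": "Ann"},
    {"id": 2, "name": "Bob"},
    {"id": 1, "name": "Ann"},  # duplicate delivery of the same customer
]
cleaned = deduplicate(sample, "id")

# Checks that can be rerun after each new data delivery or code change.
assert not check_unique(sample, "id")
assert check_unique(cleaned, "id")
assert check_no_missing(cleaned, "name")
print(len(sample), "->", len(cleaned), "records")
```

Keeping such checks next to the transformation code, under version control, is one way to retain correctness while work products go through rapid iterations.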

Page 14: Building Guerrilla Analytics Teams

Capability: Domain Knowledge & Communication

Prefer analytics skills with great communication

Analytics

Forensic Accounting

Forensic Accountant

Data Scientist

Page 15: Building Guerrilla Analytics Teams

Capability: Mind-set

Guerrilla Environment

• Changing requirements

• Poorly understood data

• Constraints

• Time pressure

• Iterations

• Dead Ends

Required Capability

• Tenacity

• Curiosity

• Problem solving

• Communication

The attitude and approach to work that best matches Guerrilla Analytics

Page 16: Building Guerrilla Analytics Teams

TECHNOLOGY

Guerrilla Analytics

People

Process

Tech

Page 17: Building Guerrilla Analytics Teams

Common Misconceptions about Technology

“If we use this tech, my team don’t need to code”

“We can productionise all possible data science scenarios”

“We need to invest in a platform to get value from our data”

“We need Big Data technology X”

Page 18: Building Guerrilla Analytics Teams

Technology Capability

People

Agility

Data Manipulation Environment

Scripting & Command Line

Shared Space

Visualization

Consolidate

Code Libraries

Machine Images

Project Wiki

Process Support

Source Code Control

Issue Tracking

Security

Page 19: Building Guerrilla Analytics Teams

PROCESS

Guerrilla Analytics

People

Process

Tech

Page 20: Building Guerrilla Analytics Teams

Guerrilla Analytics Workflow

Data

• Extract

• Receive

• Load

Analytics

• Transform

• Algorithm

• Consolidate

Insight

• Reports

• Work Products

Disruptions

Page 21: Building Guerrilla Analytics Teams

Common Misconceptions about Process

“We must document everything”

“We can completely plan a data science job”

“We should track everything in a traditional top-down way”

“Work products must be right first time”

Page 22: Building Guerrilla Analytics Teams

Process Capability

Data

• Extract

• Receive

• Load

Analytics

• Transform

• Algorithm

• Consolidate

Insight

• Reports

• Work Products

Log Data Receipt

Track Work Product Versions

Track Work Product Release
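
The three tracking activities on this slide (logging data receipt, tracking work product versions, tracking work product release) can be sketched as a lightweight in-memory log. Everything here is an assumption made for illustration: the `ProjectLog` class, the `D001` and `WP001_v01` naming patterns, and the recorded fields are not specified in the slides.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class ProjectLog:
    """Hypothetical project log covering the three tracking activities."""

    data_receipts: list = field(default_factory=list)
    work_products: dict = field(default_factory=dict)  # id -> latest version
    releases: list = field(default_factory=list)

    def log_data_receipt(self, source, description, received=None):
        """Record a data delivery and return its new data ID."""
        data_id = f"D{len(self.data_receipts) + 1:03d}"
        self.data_receipts.append({
            "id": data_id,
            "source": source,
            "description": description,
            "received": received or date.today(),
        })
        return data_id

    def new_version(self, wp_id):
        """Increment and return the version label of a work product."""
        version = self.work_products.get(wp_id, 0) + 1
        self.work_products[wp_id] = version
        return f"{wp_id}_v{version:02d}"

    def release(self, wp_id, recipient):
        """Record which version of a work product went to whom."""
        label = f"{wp_id}_v{self.work_products[wp_id]:02d}"
        self.releases.append({"work_product": label, "recipient": recipient})
        return label


log = ProjectLog()
first_id = log.log_data_receipt("finance team", "customer extract")
second_id = log.log_data_receipt("finance team", "refreshed extract")
draft = log.new_version("WP001")
revised = log.new_version("WP001")
released = log.release("WP001", "project sponsor")
print(first_id, second_id, draft, revised, released)
```

In practice the same records could live in a spreadsheet or an issue tracker. The point of the sketch is only that each delivery, version, and release gets an identifier that can be looked up later.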

Page 23: Building Guerrilla Analytics Teams

Summary

Data Science Aims

• Leverage

• Justify

• Ad-hoc

Guerrilla Analytics

• Disruptions

• Constraints

• Reproducible, Testable, Explainable

People Capability

• Hard Skills

• Soft Skills

Technology Capability

• Analytics Agility

• Consolidation

• Process Support

Process Capability

• Tracking Data (Inputs)

• Tracking Work Products Creation

• Tracking Outputs

Page 24: Building Guerrilla Analytics Teams

Keep in Touch!

@Enda_Ridge

www.guerrilla-analytics.net