Data Science from 3,209 Feet John Chandler University of Montana and Ars Quanta

Jan 02, 2016

Data Science from 3,209 Feet

John Chandler, University of Montana and Ars Quanta

A Data Scientist Toolkit

• A scripting language (Python, C#, Java, Perl)
• A statistical computing language (R, SAS, SPSS)
• Database languages/environments (MSSQL, Oracle, Postgres, sqlite)
• Distributed computing environment (MapReduce, in many flavors)
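
To make the "distributed computing environment" item concrete, here is a minimal single-process sketch of the MapReduce pattern in Python. It is not from the slides: the word-count task and the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are assumptions chosen for illustration, and a real framework would run the map and reduce steps across many machines.

```python
from collections import defaultdict


def map_phase(record):
    """Map step: emit a (word, 1) pair for every word in one line of text."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: collapse the values for one key into a single count."""
    return key, sum(values)


def word_count(records):
    """Run map, shuffle, and reduce in sequence on a list of text lines."""
    pairs = [pair for record in records for pair in map_phase(record)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["a b a", "B c"]))
```

The three steps are kept as separate functions because that separation is what lets a framework distribute them; the logic inside each step stays ordinary scripting code.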

Fundamentally we are flipping bits, but this isn’t software development.

CRISP-DM, Shearer, 2000

CRISP-DM, Shearer, 2000

CRISP-DM, Shearer, 2000

Tools for data preparation

• A scripting language (Python, C#, Java)
• A statistical computing language (R, SAS, SPSS)
• Database languages/environments (MSSQL, Oracle, Postgres, sqlite)
• Distributed computing environment (MapReduce)
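
As one possible illustration of how a scripting language and a database environment combine during data preparation, the sketch below cleans raw rows in Python and aggregates them in sqlite. Everything specific here is an assumption made for the example: the `orders` table, its columns, the cleaning rules, and the function name `prepare_orders` do not come from the slides.

```python
import sqlite3


def prepare_orders(rows):
    """Clean raw (customer, amount) rows in Python, then total them in SQL."""
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")

    cleaned = []
    for customer, amount in rows:
        name = customer.strip().lower()  # normalize whitespace and case
        try:
            value = float(amount)        # coerce numeric strings
        except (TypeError, ValueError):
            continue                     # drop rows with unparseable amounts
        if name:
            cleaned.append((name, value))

    conn.executemany("INSERT INTO orders VALUES (?, ?)", cleaned)
    result = conn.execute(
        "SELECT customer, SUM(amount) FROM orders "
        "GROUP BY customer ORDER BY customer"
    ).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    raw = [(" Ann ", "10.5"), ("ann", 4.5), ("Bob", "n/a"), ("bob", "3")]
    print(prepare_orders(raw))
```

The row-level cleanup sits in the scripting language and the set-level aggregation sits in SQL, which is one common way to divide the work between the two tools.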

CRISP-DM, Shearer, 2000

CRISP-DM, Shearer, 2000

CRISP-DM, Shearer, 2000

Advice

• What is the simplest thing that could possibly work?
• Start small and expand scope.
• Use general tools.
• Bring uncertainty into the spotlight.
• Expect iteration.
• Make a clear-eyed evaluation of whether to compete on data at all.
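
One way to act on "bring uncertainty into the spotlight" is to report an interval alongside every point estimate. The sketch below uses a percentile bootstrap for the mean. The slides do not name a technique, so the bootstrap, the function name `bootstrap_mean_ci`, and the default parameters are all assumptions made for illustration.

```python
import random
import statistics


def bootstrap_mean_ci(data, n_resamples=2000, alpha=0.05, seed=0):
    """Return (mean, lower, upper) using a percentile bootstrap interval."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    n = len(data)
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        statistics.fmean(rng.choices(data, k=n)) for _ in range(n_resamples)
    )
    # Read the interval endpoints from the sorted resample means.
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = int((1 - alpha / 2) * n_resamples) - 1
    return statistics.fmean(data), means[lo_idx], means[hi_idx]


if __name__ == "__main__":
    mean, lower, upper = bootstrap_mean_ci([1.0, 2.0, 3.0, 4.0, 10.0])
    print(f"mean={mean:.2f}, 95% interval=({lower:.2f}, {upper:.2f})")
```

It uses only the standard library, in keeping with the advice to start with the simplest thing that could possibly work.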