Why Python? (For Stats People) @__mharrison__ © 2013

Why Python (for Statisticians)

Aug 21, 2015



Matt Harrison
Transcript
Page 1: Why Python (for Statisticians)

Why Python? (For Stats People)

@__mharrison__ © 2013

Page 2

About Me

● 12+ years Python
● Worked in Data Analysis, HA, Search, Open Source, BI, and Storage
● Author of multiple Python books

Page 3

Book

Page 4

Book

Treading on Python Volume 1 is meant to make people proficient in Python quickly.

Page 5

Why Python?

Page 6

General-Purpose Language

“I’d rather do math in a general-purpose language than do general-purpose programming in a math language.”

John D. Cook

Page 7

Who's Using Python?

● Startups (on Hacker News)
● Data Scientists (Strata)
● Big Companies

Page 8

Who

● Google
● NASA
● ILM
● Red Hat
● Finance
● Instagram
● Pinterest
● YouTube
● ...

Page 9

Open Source

Free in both senses of the word: free of charge, and free to use, modify, and redistribute.

Page 10

Batteries Included

● Text
● Network
● JSON
● Command Line
● Files
● XML
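As a rough illustration of the standard-library modules behind several of these bullets, the sketch below touches text (`re`), command-line parsing (`argparse`), JSON (`json`), XML (`xml.etree.ElementTree`), and files (`pathlib`, `tempfile`). The function name `stdlib_tour`, the `--threshold` option, and all sample data are made up for the example; networking (for instance `urllib.request`) is left out because it needs a live connection.

```python
import argparse
import json
import re
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path


def stdlib_tour(argv=None):
    # Command line: parse a hypothetical --threshold option
    parser = argparse.ArgumentParser(description="Standard-library tour")
    parser.add_argument("--threshold", type=float, default=2.0)
    args = parser.parse_args(argv)

    # JSON: decode a small, made-up document
    record = json.loads('{"name": "sample", "values": [1.5, 2.5, 3.5]}')

    # XML: read an attribute from each <point> element
    root = ET.fromstring('<data><point x="1"/><point x="4"/></data>')
    xs = [int(p.get("x")) for p in root.findall("point")]

    # Text: pull the numbers out of a line of free text
    numbers = [float(n) for n in re.findall(r"\d+\.?\d*", "min 0.5, max 7")]

    # Files: write the JSON record to a temporary file and read it back
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "record.json"
        path.write_text(json.dumps(record))
        restored = json.loads(path.read_text())

    above = [v for v in restored["values"] if v > args.threshold]
    return above, xs, numbers


if __name__ == "__main__":
    print(stdlib_tour())
```

Everything above ships with Python, so the script runs without installing anything.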

Page 11

Large Community

PyPI - Python Package Index

● Web
● Database
● GUI
● Scientific
● Network Programming
● Games

Page 12

Large Community

● User Groups
● PyLadies
● Conferences

Page 13

Local

● utahpython.org - 2nd Thurs. 7pm
● Utah Open Source Conference

Page 14

Tooling

● Editors
● Testing
● Profiling
● Debugging
● Documentation
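Several of these tools ship with Python itself. The sketch below, assuming only the standard library, shows a docstring (documentation) that doubles as a doctest (testing), plus a small `cProfile`/`pstats` helper (profiling). The names `mean` and `profile_report` are made up for this example. For debugging, calling the built-in `breakpoint()` (Python 3.7+) drops into `pdb` at that line.

```python
import cProfile
import doctest
import io
import pstats


def mean(values):
    """Return the arithmetic mean of a non-empty sequence.

    >>> mean([2, 4, 9])
    5.0
    """
    # Documentation: this docstring is what help(mean) displays
    return sum(values) / len(values)


def profile_report(func, *args):
    # Profiling: run func under cProfile and return its result
    # together with the top five entries of the report as text
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    out = io.StringIO()
    pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(5)
    return result, out.getvalue()


if __name__ == "__main__":
    # Testing: run the example embedded in the docstring above
    print(doctest.testmod())
    print(profile_report(mean, list(range(1000)))[1])
```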

Page 15

Optimizes for Programmer Time

“We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil.”

Donald Knuth

Page 16

Executable Pseudocode

function quicksort(array)
    if length(array) ≤ 1
        return array  // an array of zero or one elements is already sorted
    select and remove a pivot element pivot from 'array'  // see 'Choice of pivot' in the source article
    create empty lists less and greater
    for each x in array
        if x ≤ pivot then append x to less
        else append x to greater
    return concatenate(quicksort(less), list(pivot), quicksort(greater))  // two recursive calls

http://en.wikipedia.org/wiki/Quicksort

Page 17

Executable Pseudocode

>>> def quicksort(array):
...     array = list(array)  # work on a copy; pop() would otherwise modify the caller's list
...     if len(array) <= 1:
...         return array
...     pivot = array.pop(len(array) // 2)  # // keeps the index an int in Python 3
...     lt = []
...     gt = []
...     for item in array:
...         if item < pivot:
...             lt.append(item)
...         else:
...             gt.append(item)
...     return quicksort(lt) + [pivot] + quicksort(gt)
...
>>> quicksort([3, 1, 2])
[1, 2, 3]

Page 18

But...

Python has Timsort, which is optimized for real-world data (it takes advantage of order already present in the input) and written in C. (Stolen by Java, Android, and Octave.)
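In practice you rarely write a sort yourself: `sorted()` and `list.sort()` run Timsort for you. The sketch below, using made-up `(name, score)` records, shows the features analysts lean on most: key functions, `reverse=True`, and stability (equal keys keep their input order), which also makes multi-key sorting possible with two passes.

```python
from operator import itemgetter

# Hypothetical (name, score) records, deliberately not in name order
records = [("c", 3), ("a", 1), ("b", 3), ("d", 2)]

# sorted() returns a new list and leaves `records` untouched.
# Stability: the two score-3 ties keep their input order, "c" before "b".
by_score = sorted(records, key=itemgetter(1))

# reverse=True also keeps tied records in their original order
by_score_desc = sorted(records, key=itemgetter(1), reverse=True)

# Two stable passes give a multi-key sort: by score, then by name
multi = list(records)
multi.sort(key=itemgetter(0))  # secondary key first (in place)
multi.sort(key=itemgetter(1))  # then the primary key

print(by_score)       # [('a', 1), ('d', 2), ('c', 3), ('b', 3)]
print(by_score_desc)  # [('c', 3), ('b', 3), ('d', 2), ('a', 1)]
print(multi)          # [('a', 1), ('d', 2), ('b', 3), ('c', 3)]
```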

Page 19

Multi-paradigm Language

● Imperative
● Object Oriented
● Functional

Page 20

Imperative

>>> def sum(items):  # shadows the built-in sum() for this demo
...     total = 0
...     for item in items:
...         total = total + item
...     return total
...
>>> sum([2, 4, 8])
14

Page 21

OO

>>> class Summer:
...     def __init__(self):
...         self.items = []
...     def add_item(self, item):
...         self.items.append(item)
...     def sum(self):
...         return sum(self.items)
...
>>> s = Summer()
>>> s.add_item(2)
>>> s.add_item(3)
>>> s.sum()
5

Page 22

Functional

>>> import operator
>>> from functools import reduce  # needed in Python 3, where reduce is no longer a built-in
>>> sum = lambda x: reduce(operator.add, x)
>>> sum([4, 8, 22])
34
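To push the functional style a little further, the sketch below chains `filter`, `map`, and `functools.reduce` over a made-up list (sum of the squares of the even numbers), then shows the generator-expression equivalent that most Python code would use. It assumes a fresh session, so `sum` is the built-in rather than the lambda above.

```python
import operator
from functools import reduce

data = [1, 2, 3, 4, 5, 6]  # made-up sample

# Functional pipeline: keep the evens, square them, fold with addition.
# The initial value 0 makes reduce safe on an empty input.
functional = reduce(
    operator.add,
    map(lambda x: x * x, filter(lambda x: x % 2 == 0, data)),
    0,
)

# The same computation as a generator expression fed to the built-in sum()
idiomatic = sum(x * x for x in data if x % 2 == 0)

print(functional, idiomatic)  # 56 56
```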

Page 23

Why Not Python?

Page 24

Slow

Sometimes you have to optimize. Python has good C integration.
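One low-ceremony form of that C integration is `ctypes` in the standard library, which calls functions in an existing shared library without writing any C. The sketch below assumes a POSIX system (Linux or macOS) where the C math library can be loaded, and calls the C `sqrt()` directly; the wrapper name `c_sqrt` is made up for the example. For real hot loops, compiled extension modules or Cython are the more common routes.

```python
import ctypes
import ctypes.util

# Locate the C math library. If find_library returns None, CDLL(None)
# falls back to symbols already loaded into the Python process
# (POSIX only as written; Windows would need a different library name).
libm = ctypes.CDLL(ctypes.util.find_library("m"))

# Declare the C signature: double sqrt(double)
libm.sqrt.argtypes = [ctypes.c_double]
libm.sqrt.restype = ctypes.c_double


def c_sqrt(x):
    """Call the C library's sqrt() through ctypes."""
    return libm.sqrt(x)


if __name__ == "__main__":
    print(c_sqrt(2.0))
```

Declaring `argtypes` and `restype` matters: without `restype`, ctypes assumes the C function returns an `int` and the result would be garbage.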

Page 25

If it ain't broke, don't fix it

Don't replace existing solutions for fun.

Page 26

R has more depth

Though Python is catching up in some areas.

Page 27

Going Forward

Page 28

IPython Notebook

● Notebook with integrated graphs

Page 29

Libraries

● NumPy - matrix math
● SciPy - scientific libraries
● scipy.stats - stats
● statsmodels - modeling
● pandas - dataframes
● matplotlib - graphing
● scikit-learn - machine learning
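The libraries above are installed separately. As a small taste that runs on a bare Python 3.4+ install, the sketch below uses the standard library's `statistics` module (not one of the libraries on this slide, and added to Python after this talk) to compute descriptive statistics on a made-up sample; pandas, scipy.stats, and statsmodels go much further.

```python
import statistics

# Made-up sample used only for illustration
sample = [2, 4, 4, 4, 5, 5, 7, 9]

summary = {
    "mean": statistics.mean(sample),
    "median": statistics.median(sample),            # average of the two middle values here
    "sample_stdev": statistics.stdev(sample),       # n - 1 denominator
    "population_stdev": statistics.pstdev(sample),  # n denominator
}

for name, value in summary.items():
    print(f"{name:>16}: {value:.4f}")
```

The two standard deviations differ only in their denominator: `stdev` divides the sum of squared deviations by n - 1 (sample), `pstdev` by n (population).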

Page 30

That's all

Questions? Tweet me.

For beginning Python secrets, see Treading on Python Volume 1.

@__mharrison__ http://hairysun.com