Python Tutorial: Python for Data Science
By Fahad Kamran
Overview
● Introduction
● Variables and Types
● Operators
● Containers
● Functions
● Control Flow
● Packages
  ○ NumPy
  ○ Pandas
  ○ Matplotlib
- Easy to learn.
- Elegant syntax.
- Lots of scientific computing resources.
- Very user friendly.
Jupyter Notebook
print("Hello World")
Hello World
Variables
- Put data in memory, and give it a name.
- Created using the assignment operator =
x = 3
print(x)
3
x = 3 + 3
print(x)
6
Variables
# Objects. (A comment: a note for programmers; it doesn't get executed.)
x = 3 # Int: integers.
pi = 3.14 # Float: real numbers.
day = "Monday" # String: characters, different from a variable name.
is_monday = True # Boolean: True/False.
Operators
- Symbols that carry out computation.
- Follow the usual order of operations (PEMDAS).
x = 15
y = 4
print(x + y) # Addition.
19
print(y - x) # Subtraction.
-11
print(y * x) # Multiplication.
60
Operators
x = 15
y = 4
print(x / y) # Division.
3.75
print(x // y) # Floor Division.
3
print(x % y) # Modulus (remainder).
3
print(y ** y) # Exponent.
256
Operators
x = 15
y = 4
print(x == y) # Equality.
False
print(x > y) # Greater than.
True
print(x <= y) # Less than or Equal.
False
print(x != y) # Not Equal.
True
Operators
print(True and True)
True
print(True and False)
False
print(True or False)
True
print(False or False)
False
Operators
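w = "Hello "
x = "World!"
y = 1
z = 1.1
print(w + x) # String + string: concatenation.
Hello World!
print(y + z) # Int + float: a float.
2.1
print(w + y) # String + int.
Error! (TypeError: a string and an int cannot be added.)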
Operators
Arithmetic Operators
+   -   *   /   //   %   **
Logical Operators
==   >   >=   !=   <   <=   and   or   not
Logical operators are evaluated after arithmetic.
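As a small illustration of that evaluation order (a sketch; the expressions are made up for this example), the arithmetic is computed first, then the comparisons, and finally and/or/not:

```python
x = 15
y = 4

# Arithmetic first: x + y is 19 and y * 2 is 8.
# Comparisons next: 19 > 10 is True and 8 == 9 is False.
# Logical operators last: True and False is False.
result = x + y > 10 and y * 2 == 9
print(result)

# not applies to the whole comparison x < y, and is evaluated before or.
flipped = not x < y or y == 5
print(flipped)
```

When in doubt, parentheses make the intended grouping explicit.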
Demo
Class/Object
- Combine data and behaviors related to the class's data.
- Making our own is out of the scope of this course.
- We will use them!
List
- Series of values.
a = [] # Empty list.
b = [1, 2, 3, 4, 5] # List containing 5 elements.
b[0] # Zero-indexed!
1
b[3]
4
List
#    0  1  2  3  4   Index.
b = [1, 2, 3, 4, 5]
- We can get parts of the list with the slice operator :
- Optionally, define a range of indices; the element at the stop index is excluded.
b[:1]
[1]
b[1:3]
[2, 3]
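A few more slices on the same list, as a sketch of how the start and stop indices behave (the negative index and the step value go slightly beyond what the slide shows):

```python
b = [1, 2, 3, 4, 5]

first_two = b[:2]     # Start defaults to 0; index 2 is excluded.
from_third = b[2:]    # Stop defaults to the end of the list.
middle = b[1:4]       # Indices 1, 2 and 3.
last_two = b[-2:]     # Negative indices count from the end.
every_other = b[::2]  # An optional third value is the step.

print(first_two, from_third, middle, last_two, every_other)
```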
List
- Access class methods using the dot operator .
x = [1]
print(x)
[1]
x.append(2) # Add element to end of list.
print(x)
[1, 2]
x.pop() # Get last element in list and remove it from the list.
2
print(x)
[1]
List
- Reassign parts of a list.
x = [1]
print(x)
[1]
x[0] = 'new element!'
print(x)
['new element!']
docs.python.org/3/tutorial/datastructures.html
Dictionary / Map
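- Collection of key-value pairs.
- Like lists, but elements are accessed by key rather than by position; a key can be a string, a number, or most other immutable values.
a = {} # Empty dict.
a["key"] = "value"
print(a)
{'key': 'value'}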
Dictionary / Map
person = {
    "name": "Jim",
    "family_name": "Harbaugh"
}
person["name"]
'Jim'
person["age"] = 55
print(person)
{'name': 'Jim', 'family_name': 'Harbaugh', 'age': 55}
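A short sketch of a few other common dictionary operations, using the same person example (the "team" key is hypothetical and deliberately missing):

```python
person = {"name": "Jim", "family_name": "Harbaugh", "age": 55}

# Membership tests look at keys, not values.
has_age = "age" in person
has_team = "team" in person

# .get() returns a fallback instead of raising KeyError for a missing key.
team = person.get("team", "unknown")

# .items() yields (key, value) pairs in insertion order.
pairs = []
for key, value in person.items():
    pairs.append((key, value))

print(has_age, has_team, team)
print(pairs)
```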
Other Containers
Containers not covered include:
● Sets
● Tuples
● Generators
Creating Copies
- What does an assignment statement do?
x = [1, 2, 3]
y = x # Looks like a copy, but y is just another name for the same list.
y[2] = 100 # Reassign part of y.
print(x)
[1, 2, 100]
Changing y still changes x.
Creating Copies
- Fixing the assignment issue
x = [1, 2, 3]
y = x[:] # Make a new copy of x.
y[2] = 100 # Reassign part of y.
print(x)
[1, 2, 3]
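One way to check which situation you are in is the is operator, which tests whether two names refer to the same object. The sketch below (variable names are made up) also shows a caveat: the slice copy is shallow, so nested lists are still shared.

```python
x = [1, 2, 3]
alias = x          # Same list object under a second name.
duplicate = x[:]   # New list with the same elements.

same_object = alias is x             # True: one list, two names.
separate_object = duplicate is x     # False: a different list...
equal_contents = duplicate == x      # True: ...with equal contents.
print(same_object, separate_object, equal_contents)

# Caveat: the slice copy is shallow, so inner lists are shared.
nested = [[1, 2], [3, 4]]
shallow = nested[:]
shallow[0][0] = 100
print(nested[0][0])
```

For nested containers, copy.deepcopy from the standard library makes a fully independent copy.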
Exercise #1
- What are the outputs of each line of code?
fib = ["hello", "hi", "hey", "hola", "heyo", "sup", "howdy"]
fib[1:5] # Q1.
fib[3:] # Q2.
fib[:3] # Q3.
Demo
Functions
- Group related code that performs a task.
- Reusable blocks of code.
- Take in arguments.
Functions
n_hours = 181 // 60
n_minutes = 181 % 60
print(n_hours)
print(n_minutes)
n_hours = 72 // 60
n_minutes = 72 % 60
print(n_hours)
print(n_minutes)
n_hours = 451 // 60
n_minutes = 451 % 60
print(n_hours)
print(n_minutes)
The same computation, written once as a function:
def hours_and_minutes(minutes):
    n_hours = minutes // 60
    n_minutes = minutes % 60
    return n_hours, n_minutes
print(hours_and_minutes(181))
print(hours_and_minutes(72))
print(hours_and_minutes(451))
Functions
def hours_and_minutes(minutes):
    n_hours = minutes // 60
    n_minutes = minutes % 60
    return n_hours, n_minutes
- def: About to define a function.
- hours_and_minutes: Name of the function.
- (minutes): Parameters. Stuff you may need to compute with.
- Indented lines: Code the function performs.
- Whitespace/tabs matter! Indentation defines scope.
Functions
def hours_and_minutes(minutes):
    n_hours = minutes // 60
    n_minutes = minutes % 60
    return n_hours, n_minutes
n_hours = 10
ret_hours, ret_minutes = hours_and_minutes(81)
print(n_hours) # The n_hours inside the function is a separate variable.
10
Functions
def hours_and_minutes(minutes):
    n_hours = minutes // 60
    n_minutes = minutes % 60
    print(n_hours, n_minutes) # Prints, but does not return anything.
ret = hours_and_minutes(81)
1 21
print(ret)
None
ret + 1
Error! (TypeError: the function returned None.)
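To make the difference between printing and returning concrete, here is a sketch with two variants of the function side by side (the _printed and _returned suffixes are made up for this example):

```python
def hours_and_minutes_printed(minutes):
    # Only prints; the function implicitly returns None.
    print(minutes // 60, minutes % 60)


def hours_and_minutes_returned(minutes):
    # Returns a tuple that the caller can unpack and reuse.
    return minutes // 60, minutes % 60


printed = hours_and_minutes_printed(81)
ret_hours, ret_minutes = hours_and_minutes_returned(81)

print(printed)        # Nothing useful came back from the printing version.
print(ret_hours + 1)  # The returned value can be used in later computation.
```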
Exercise #2
def fahrenheit_to_celsius(f):
    # Converts from °F to °C.
    # Where: °C = (°F - 32) * 5/9
    # Your code here.
    return c
fahrenheit_to_celsius(98.6)
37.0
Demo
If
- Only execute a block of code if a condition is True.
if age >= 21:
    print("Come in!")
elif age == 20:
    print("Try again next year bud.")
else:
    print("You've got some time!")
- Condition statement: must result in True/False.
- If/elif block: evaluated if its condition is true.
- Else block: evaluated if no conditions are true.
- Colons: each if, elif, and else line ends with a colon.
For-Loop
- Perform operations using item(s) in a container.
- Reassigning the loop variable does not modify the items in the container.
container = [1, 2, 3]
for x in container:
    print(x)
1
2
3
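The remark about not modifying items can be read as follows (a sketch): assigning to the loop variable only rebinds that name, while assigning through an index does change the list.

```python
container = [1, 2, 3]

# Reassigning x only rebinds the loop variable; the list is untouched.
for x in container:
    x = x * 10
unchanged = list(container)
print(unchanged)

# To change the items, assign through an index instead.
for i in range(len(container)):
    container[i] = container[i] * 10
print(container)
```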
For-Loop
- The range(n) function creates a sequence of n integers (0 to n - 1).
- Useful for repeating an action n times.
for x in range(5):
    print('Hello world!')
Hello world!
Hello world!
Hello world!
Hello world!
Hello world!
List Comprehensions
lst = [1, 2, 3, 4, 5]
squared_lst = []
for num in lst:
    squared_lst.append(num**2)
squared_lst
[1, 4, 9, 16, 25]
squared_lst2 = [num**2 for num in lst]
squared_lst2
[1, 4, 9, 16, 25]
List Comprehensions
lst = [1, 2, 3, 4, 5]
squared_even_lst = []
for num in lst:
    if num % 2 == 0:
        squared_even_lst.append(num**2)
squared_even_lst
[4, 16]
squared_even_lst2 = [num**2 for num in lst if num % 2 == 0]
squared_even_lst2
[4, 16]
While-Loop
- Repeat operations while a condition is True (that is, until it becomes False).
i = 0
while i < 3:
    print(i)
    i += 1
0
1
2
Exercise #3
Sort the following data into two new lists: one for words and the other for numbers.
data = ["michigan", 1, "stats", 3.33, "hello"]
# Your code here.
At the end we’ll have two lists: one containing "michigan", "stats", and "hello"; the other containing 1 and 3.33.
Exercise #4
Define a function only_odd, which takes in a dictionary and returns a new dictionary containing only the key:value pairs where the value is an odd number.
dct = {"Janice": 3, "Fred": 2, "Gregg": 8, "Gloria": 1}
# Your code here.
only_odd(dct)
{'Janice': 3, 'Gloria': 1}
Demo
Module
- Collect variables, functions, and classes into a module.
- Sometimes called: library or package.
import math
math.sqrt(16)
4.0
from math import sqrt
sqrt(16)
4.0
NumPy
- Tensor/matrix operation library.
- Lists, but more dimensions, and faster.
- NOTE: Normally you would need to install this library.
import numpy as np
x = np.array([1, 2, 3])
np.dot([1, 2], [1, 2]) # Dot product: 1*1 + 2*2.
5
NumPy Array with Operators
x = np.array([1, 2, 3])
x + 1
array([2, 3, 4])
x + x
array([2, 4, 6])
x ** 2
array([1, 4, 9])
NumPy Array with Operators
x = np.array([1, 2, 3])
x > 1
array([False,  True,  True])
x == 1
array([ True, False, False])
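Boolean arrays like these are often used as masks to select elements. This goes one step beyond the slide, so treat it as a sketch:

```python
import numpy as np

x = np.array([1, 2, 3])

mask = x > 1        # One True/False value per element of x.
selected = x[mask]  # Keeps only the elements where the mask is True.
count = mask.sum()  # True counts as 1, so this counts the matches.

print(selected, count)
```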
NumPy Statistics
np.sum([[0, 1], [2, 3]])
6
np.sum([[0, 1], [2, 3]], axis=1) # Sum across each row.
array([1, 5])
np.max([[0, 1], [2, 3]])
3
np.max([[0, 1], [2, 3]], axis=0) # Max down each column.
array([2, 3])
https://docs.scipy.org/doc/numpy-1.13.0/reference/routines.statistics.html
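The axis argument names the dimension that gets collapsed. A small sketch on the same 2x2 values (the variable names are made up):

```python
import numpy as np

m = np.array([[0, 1],
              [2, 3]])

col_sums = m.sum(axis=0)    # Collapse the rows: one value per column.
row_sums = m.sum(axis=1)    # Collapse the columns: one value per row.
row_means = m.mean(axis=1)  # The same idea works for other statistics.

print(col_sums, row_sums, row_means)
```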
Creating Sequences in NumPy
np.arange(3)
array([0, 1, 2])
np.arange(0, 4, 2) # (start, stop, step).
array([0, 2])
Creating Arrays in NumPy
np.zeros(5)
array([0., 0., 0., 0., 0.])
np.ones(5) * 3
array([3., 3., 3., 3., 3.])
np.random.random((2, 2)) # Random values in [0, 1); your numbers will differ.
array([[0.810, 0.081],
       [0.687, 0.541]])
Array Indexing
a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])
a[0, 0]
1
a[1, :]
array([5, 6, 7, 8])
a[1:, 2:]
array([[ 7,  8],
       [11, 12]])
Array Indexing
a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])
a[0, 0] = 4 # Reassign a single element.
a[1] = [4, 8, 9, 1] # Reassign a whole row.
print(a)
[[ 4  2  3  4]
 [ 4  8  9  1]
 [ 9 10 11 12]]
Array Indexing
a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])
b = a[:2, 1:3] # Slicing returns a view into a, not a copy.
print(b)
[[2 3]
 [6 7]]
b[0, 0] = 1 # Changing b also changes a.
print(a)
[[ 1  1  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]
Array Indexing
a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])
b = np.array(a[:2, 1:3]) # np.array(...) makes a new copy.
print(b)
[[2 3]
 [6 7]]
b[0, 0] = 1 # Changing b leaves a untouched.
print(a)
[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]
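If it is unclear whether two arrays share data, np.shares_memory can check. The sketch below contrasts a slice with an explicit .copy() (an alternative to wrapping the slice in np.array); the variable names are made up.

```python
import numpy as np

a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

view = a[:2, 1:3]                # Basic slicing returns a view.
independent = a[:2, 1:3].copy()  # .copy() allocates new memory.

view_shares = np.shares_memory(a, view)
copy_shares = np.shares_memory(a, independent)
print(view_shares, copy_shares)

independent[0, 0] = 100  # Only the copy changes.
print(a[0, 1])
```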
Demo
Pandas
- Data analysis package for working with tables of data.
- Import data from:
  - Excel, csv, tsv, etc.
  - Stata, SAS, MATLAB
  - SQL
  - Etc.
import pandas as pd
Pandas
- This will be a high-level summary of the package.
- We’ll look at stuff that you can follow along with using our data.
Useful Functions
df.values # Converts a pandas table to a 2D NumPy array.
df.iloc[[0, 2], [1, 3]] # Returns the rows at positions 0 and 2 and the columns at positions 1 and 3 of df.
df.loc[df['A'] > 4] # Returns rows where A is greater than 4.
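To see what these three calls return, here is a sketch on a small hypothetical table (the column names and values are made up, not taken from the course data):

```python
import pandas as pd

# Hypothetical table used only for illustration.
df = pd.DataFrame({
    "A": [1, 5, 9],
    "B": [2, 6, 10],
    "C": [3, 7, 11],
    "D": [4, 8, 12],
})

as_array = df.values              # 2D NumPy array with 3 rows and 4 columns.
subset = df.iloc[[0, 2], [1, 3]]  # Rows at positions 0 and 2, columns B and D.
big_a = df.loc[df["A"] > 4]       # Rows where A is greater than 4.

print(as_array.shape)
print(subset)
print(big_a)
```

Note that iloc selects by integer position, while loc selects by label or, as here, by a boolean condition.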
Pandas
import pandas as pd
dataset = pd.read_csv("nbaallelo.csv")
dataset.describe()
https://pandas.pydata.org/pandas-docs/stable/getting_started/10min.html
Matplotlib
- Package for plotting data.
import matplotlib.pyplot as plt
Matplotlib
- Package for plotting data.
- Collection of functions that make changes to a figure.
- The package keeps track of the current figure, so successive calls all change the same figure.
Matplotlib
- Package for plotting data.
import matplotlib.pyplot as plt
x = np.arange(5)
y = x ** 2
plt.plot(x, y)
Matplotlib
- Package for plotting data.
import matplotlib.pyplot as plt
plt.figure() # Let's start plotting on a new figure.
Matplotlib
- Package for plotting data.
import matplotlib.pyplot as plt
x = np.arange(5)
y = x ** 2
plt.plot(x, y)
plt.xlabel("x")
plt.ylabel("y")
plt.title("Matplotlib Demo!")
Matplotlib
- Package for plotting data.
import matplotlib.pyplot as plt
x = np.arange(5)
y = x ** 2
plt.scatter(x, y)
plt.xlabel("x")
plt.ylabel("y")
plt.title("Matplotlib Demo!")
Matplotlib
- Most functions that plot data accept a format string after the data.
plt.plot(x, y, "b-")
- "b-" is close to the default: a solid line (recent Matplotlib versions use a slightly different default blue).
- The pattern of the string is:
  - Color (b: blue),
  - Pattern (-: straight line).
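A few other format strings, as a sketch. The Agg backend line is an assumption added so the example runs without a display; in a notebook it can be dropped.

```python
import matplotlib
matplotlib.use("Agg")  # Assumption: non-interactive backend, no window needed.
import matplotlib.pyplot as plt
import numpy as np

x = np.arange(5)
y = x ** 2

plt.figure()
(red_dots,) = plt.plot(x, y, "ro")           # r: red, o: circle markers, no line.
(green_dashed,) = plt.plot(x, y + 1, "g--")  # g: green, --: dashed line.
(black_squares,) = plt.plot(x, y + 2, "ks-") # k: black, s: squares, -: solid line.
```

plt.plot returns a list of line objects, which is why each result is unpacked into a single name here.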
Colors
- The world is more diverse than a Crayola 8-pack.
Colors
- HTML Color Codes: method for describing colors used in websites.
- Specify how much of each primary color to use in the mixed color (R, G, B).
- Slang: hex, hex-code.
"#RRGGBB"
- Each pair of digits is one channel: [0, 255] in decimal, written as [00, FF] in hexadecimal.
- 256 × 256 × 256 ≈ 16.7 million colors.
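The arithmetic behind a hex code can be sketched in plain Python. The helper names hex_to_rgb and rgb_to_hex are hypothetical, written only for this example:

```python
def hex_to_rgb(code):
    # Hypothetical helper: "#RRGGBB" -> (red, green, blue), each in [0, 255].
    digits = code.lstrip("#")
    return tuple(int(digits[i:i + 2], 16) for i in (0, 2, 4))


def rgb_to_hex(red, green, blue):
    # Format each channel as two uppercase hexadecimal digits.
    return "#{:02X}{:02X}{:02X}".format(red, green, blue)


light_purple = hex_to_rgb("#D7BDE2")
pure_red = rgb_to_hex(255, 0, 0)
n_colors = 256 ** 3  # 256 choices for each of the three channels.

print(light_purple, pure_red, n_colors)
```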
Matplotlib
- Package for plotting data.
import matplotlib.pyplot as plt
x = np.arange(5)
y = x ** 2
plt.plot(x, y, color="#D7BDE2")
Matplotlib
- Package for plotting data.
import matplotlib.pyplot as plt
x = np.arange(5)
y = x ** 2
plt.plot(x, y, color="#D7BDE2", label="Purple")
plt.legend()
Matplotlib
import matplotlib.pyplot as plt
x = np.arange(5)
y = x ** 2
plt.plot(x, y, color="#D7BDE2", label="Purple")
plt.plot(x, x**3, label="Blue") # Both lines are drawn on the same figure.
plt.legend()
Matplotlib
import matplotlib.pyplot as plt
x = np.arange(5)
y = x ** 2
plt.plot(x, y, color="#D7BDE2", label="Purple")
plt.figure() # Start a new figure; the next plot goes there.
plt.plot(x, x**3, label="Blue")
plt.legend()
Demo
Other Useful Packages
● SciPy
  ○ Scientific computing package often used in conjunction with the others.
● Scikit-Learn
  ○ Machine learning package for classification, regression, model selection, etc.
  ○ Algorithms include linear regression, k-nearest neighbors, support vector machines, etc.
● PyTorch
  ○ User-friendly deep learning Python package.
We’re done! Now what?
● Learning Python for 3 hours through slides and demos is not enough.
● Go home and practice!
● Continue learning what different packages have to offer.
● Try Python out to analyze some datasets you might have.
  ○ Get comfortable with pandas, numpy, and matplotlib.