intro to multitasking (with Python)
what is multitasking?
quick diversion: processors
• this is Apple’s A8 CPU
• it has 2 CPU cores
• each core can do 1 thing at once
• switches between tasks when there’s downtime
• "preemptive multitasking"
quick diversion: operating systems
some OS terminology
• process
• a running program
• isolated memory (it can’t see another process’ memory)
• all processes require their own memory
• starting and stopping processes can take time
• thread
• a series of instructions that can be executed asynchronously inside a process
• a process can have multiple threads
• every thread must belong to a process
• threads all share memory with their parent process
• quick to start up, easy to shut down
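the "threads share memory" point is easy to see in code. a minimal sketch using Python's threading module (the tags and counts here are made up for illustration); every thread appends into the very same list object owned by the parent process:

```python
import threading

shared = []  # lives in the parent process; visible to every thread

def work(tag):
    # each thread writes into the same list object, no copying involved
    for _ in range(1000):
        shared.append(tag)

threads = [threading.Thread(target=work, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(shared))  # 4000: all four threads wrote into one shared list
```

with processes you'd get four isolated copies of `shared` instead, and the parent's list would stay empty.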
what does this mean to me?
• processes and threads are the building blocks for multitasking
• it’s important to know the differences and tradeoffs for each
let’s do some multitasking
the problem
• you’re building a program to fetch a bunch of wikipedia articles (~1000)
the solution (no multitasking)

import requests

links = []
with open('articles.txt') as f:
    links = f.readlines()

articles = []
for link in links:
    response = requests.get(link)
    articles.append(response.text)
time: ~5m 54s
look at all that cpu we aren’t using
the waiting is the hardest part
http://blog.codinghorror.com/the-infinite-space-between-words/
the solution (threads)

import requests
from multiprocessing.pool import ThreadPool

links = []
with open('articles.txt') as f:
    links = f.readlines()

def do_get(link):
    response = requests.get(link)
    return response.text

with ThreadPool(4) as p:
    articles = p.map(do_get, links)
time: ~1m 43s
cpu usage is better this time
the solution (processes)

import requests
from multiprocessing.pool import Pool

links = []
with open('articles.txt') as f:
    links = f.readlines()

def do_get(link):
    response = requests.get(link)
    return response.text

with Pool(4) as p:
    articles = p.map(do_get, links)
time: ~1m 43s
lots of processes this time
it’s faster!
why?
• because the CPU can spend more time actually doing something instead of waiting around
• threads: whenever there’s downtime, it switches to a different thread
• processes: the work is split across processes, so several cores can run at once
quick samples in other languages
ExecutorService executorService = Executors.newSingleThreadExecutor();
executorService.execute(new Runnable() {
    public void run() {
        // Do your heavy lifting here.
    }
});
executorService.shutdown();
threads in Java
threads in Objective-C
dispatch_async(dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0), ^(void){
    // Do your heavy lifting here.
});
threads in good ol’ C

#include <stdio.h>
#include <pthread.h>

void *doSomething(void *arg) {
    // Do your heavy lifting here.
    return NULL;
}

int main(void) {
    pthread_t pth; // this is our thread identifier
    /* Create worker thread */
    pthread_create(&pth, NULL, doSomething, "processing...");
    /* wait for our thread to finish */
    pthread_join(pth, NULL);
    return 0;
}
wait… one question
question
• in the examples, both multithreading and multiprocessing took basically the same amount of time
• they were both ~4-5x faster than the normal case
• why use one over the other?
answer: context
good candidates
• lots of simple or independent tasks
• no shared state
• I/O bound tasks are easier
• CPU bound tasks are harder*
more examples
the problem
• you’re building a program that can take a given number and sum all of its previous values
the solution (no multitasking)

def do_something(n):
    val = 0
    for i in range(n):
        for x in range(i):
            val += x
    return val

do_something(2000)
do_something(15000)
do_something(10000)
do_something(1000)
do_something(12000)
time: ~19s
1 cpu: 99% used
the solution (processes)

from multiprocessing.pool import Pool

def do_something(n):
    val = 0
    for i in range(n):
        for x in range(i):
            val += x
    return val

with Pool(4) as p:
    nums = [2000, 15000, 10000, 1000, 12000]
    p.map(do_something, nums)
time: ~13s
use all the CPUs!
the solution (threads)

remember that catch I talked about? here it is.

from multiprocessing.pool import ThreadPool

def do_something(n):
    val = 0
    for i in range(n):
        for x in range(i):
            val += x
    return val

with ThreadPool(4) as p:
    nums = [2000, 15000, 10000, 1000, 12000]
    p.map(do_something, nums)
time: ~20s
why did this happen?
Python does threads differently
• the quick, quick, quick version:
• Python threads are real OS threads
• but only 1 can execute Python bytecode at a time (the Global Interpreter Lock, or GIL)
• result:
• using threads for CPU bound tasks is slower
WARNING: this is specific to Python!
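you can see the "real OS threads" half of this for yourself. a small sketch (needs Python 3.8+ for threading.get_native_id; the barrier is just there to keep all four threads alive at once so the OS can't reuse an id):

```python
import threading

barrier = threading.Barrier(4)  # hold all 4 threads until everyone has started
ids = []

def work():
    barrier.wait()
    # get_native_id reports the id the OS kernel assigned to this thread
    ids.append(threading.get_native_id())

threads = [threading.Thread(target=work) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(set(ids)))  # 4: four distinct kernel threads
```

so the threads are real; the GIL just prevents more than one of them from running Python bytecode at the same moment, which is why CPU bound work doesn't speed up.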
it pays to be familiar with your tools
some other types of multitasking
coroutines
• 1 thread, but with code that is good to its neighbors
• functions can pause, and when they do, the program switches to another function for a bit
• useful in situations where lots of waiting is involved (e.g. networking)
• example: Python’s gevent or async syntax
asynchronous functions
• single threaded
• every function is treated as asynchronous. there is no such thing as synchronous code.
• example: Javascript… all of it.
stuff I didn’t cover
• semaphores
• locks
• race-conditions
• basically all the scary stuff
multitasking done right can be really helpful; when done wrong it can be disastrous
and use Python 3 - it’s easier there
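one reason it's easier there: Python 3's concurrent.futures gives threads and processes the same interface, so switching between them is a one-word change. a minimal sketch (the `do_get` here is a trivial stand-in, not the requests version from earlier):

```python
from concurrent.futures import ThreadPoolExecutor  # or ProcessPoolExecutor

def do_get(link):
    # stand-in for fetching an article; swap ThreadPoolExecutor for
    # ProcessPoolExecutor and this same code runs in worker processes
    return link.upper()

links = ["one", "two", "three"]

with ThreadPoolExecutor(max_workers=4) as ex:
    fetched = list(ex.map(do_get, links))

print(fetched)  # ['ONE', 'TWO', 'THREE']
```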
go out and do it!