Copyright (C) 2010, David Beazley, http://www.dabeaz.com

In Search of the Perfect Global Interpreter Lock

David Beazley
http://www.dabeaz.com
@dabeaz

Presented at RuPy 2011, Poznan, Poland
October 15, 2011

In Search of the Perfect Global Interpreter Lock

May 10, 2015


Presentation on the Python/Ruby Global Interpreter Lock at RuPy 2011. October 14, 2011. Poznan, Poland.

Introduction

• As many programmers know, Python and Ruby feature a Global Interpreter Lock (GIL)
• More precisely: CPython and MRI
• It limits thread performance on multicore
• Theoretically restricts code to a single CPU

An Experiment

• Consider a trivial CPU-bound function

    def countdown(n):
        while n > 0:
            n -= 1

• Run it once with a lot of work

    COUNT = 100000000      # 100 million
    countdown(COUNT)

• Now, divide the work across two threads

    from threading import Thread

    t1 = Thread(target=countdown, args=(COUNT//2,))
    t2 = Thread(target=countdown, args=(COUNT//2,))
    t1.start(); t2.start()
    t1.join(); t2.join()
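The experiment above can be wrapped in a small timing harness. This is a minimal sketch, not the harness used for the talk: the helper names time_sequential and time_threaded are hypothetical, and the default count is reduced so it finishes quickly.

```python
import time
from threading import Thread


def countdown(n):
    # Trivial CPU-bound loop from the slide.
    while n > 0:
        n -= 1


def time_sequential(count):
    # Run all of the work in the calling thread and return elapsed seconds.
    start = time.perf_counter()
    countdown(count)
    return time.perf_counter() - start


def time_threaded(count, nthreads=2):
    # Split the same amount of work evenly across nthreads threads.
    threads = [Thread(target=countdown, args=(count // nthreads,))
               for _ in range(nthreads)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    COUNT = 5_000_000  # smaller than the 100 million used in the talk
    print("sequential: %.2fs" % time_sequential(COUNT))
    print("threaded  : %.2fs" % time_threaded(COUNT))
```

The absolute numbers depend on the interpreter version, operating system and core count, which is the point of the slides that follow.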

An Experiment

• Some Ruby

    def countdown(n)
      while n > 0
        n -= 1
      end
    end

• Sequential

    COUNT = 100000000      # 100 million
    countdown(COUNT)

• Subdivided across threads

    t1 = Thread.new { countdown(COUNT/2) }
    t2 = Thread.new { countdown(COUNT/2) }
    t1.join
    t2.join

Expectations

• Sequential and threaded versions perform the same amount of work (same # calculations)
• There is the GIL... so no parallelism
• Performance should be about the same

Results

• Ruby 1.9 on OS-X (4 cores)

    Sequential            : 2.46s
    Threaded (2 threads)  : 2.55s (~ same)

• Python 2.7

    Sequential            : 6.12s
    Threaded (2 threads)  : 9.28s (1.5x slower!)

• Question: Why does it get slower in Python?

Results

• Ruby 1.9 on Windows Server 2008 (2 cores)

    Sequential            : 3.32s
    Threaded (2 threads)  : 3.45s (~ same)

• Python 2.7

    Sequential            : 6.9s
    Threaded (2 threads)  : 63.0s (9.1x slower!)

• Why does it get that much slower on Windows?

Experiment: Messaging

• A request/reply server for size-prefixed messages
    (Diagram: Client <-> Server)
• Each message: a size header + payload
• Similar: ZeroMQ

An Experiment: Messaging

• A simple test - message echo (pseudocode)

    def client(nummsg, msg):
        while nummsg > 0:
            send(msg)
            resp = recv()
            sleep(0.001)
            nummsg -= 1

    def server():
        while True:
            msg = recv()
            send(msg)

• To be less evil, it's throttled (<1000 msg/sec)
• Not a messaging stress test
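The echo pseudocode can be made runnable with the standard library. The sketch below makes several assumptions that are not in the slides: client and server share one process and talk over socket.socketpair(), messages have a fixed size of 8192 bytes, and the helper names recv_exactly and run_echo_test are hypothetical.

```python
import socket
import threading
import time

MSG_SIZE = 8192  # assumed fixed message size for this sketch


def recv_exactly(sock, n):
    # Keep reading until exactly n bytes have arrived.
    chunks = []
    while n > 0:
        chunk = sock.recv(n)
        if not chunk:
            raise ConnectionError("peer closed the connection")
        chunks.append(chunk)
        n -= len(chunk)
    return b"".join(chunks)


def server(sock):
    # Echo every message back until the client goes away.
    try:
        while True:
            sock.sendall(recv_exactly(sock, MSG_SIZE))
    except ConnectionError:
        pass


def client(sock, nummsg, msg):
    while nummsg > 0:
        sock.sendall(msg)
        resp = recv_exactly(sock, len(msg))
        assert resp == msg
        time.sleep(0.001)  # throttle to under 1000 msg/sec
        nummsg -= 1


def run_echo_test(nummsg=100):
    # Returns the elapsed time in seconds for nummsg round trips.
    a, b = socket.socketpair()
    t = threading.Thread(target=server, args=(b,), daemon=True)
    t.start()
    start = time.perf_counter()
    client(a, nummsg, b"x" * MSG_SIZE)
    elapsed = time.perf_counter() - start
    a.close()
    t.join(timeout=5)
    b.close()
    return elapsed
```

Because the client here runs in the same interpreter as the server, it also competes for the GIL, so timings from this sketch are not directly comparable to the numbers in the talk.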

An Experiment: Messaging

• A test: send/receive 1000 8K messages
• Scenario 1: Unloaded server
    (Diagram: Client <-> Server)
• Scenario 2: Server competing with one CPU-thread
    (Diagram: Client <-> Server, with a CPU-Thread running in the server)

Results

• Messaging with no threads (OS-X, 4 cores)

    C           : 1.26s
    Python 2.7  : 1.29s
    Ruby 1.9    : 1.29s

• Messaging with one CPU-bound thread*

    C           : 1.16s (~8% faster!?)
    Python 2.7  : 12.3s (10x slower)
    Ruby 1.9    : 42.0s (33x slower)

• Hmmm. Curious.

* On Ruby, the CPU-bound thread was also given lower priority

Results

• Messaging with no threads (Linux, 8 CPUs)

    C           : 1.13s
    Python 2.7  : 1.18s
    Ruby 1.9    : 1.18s

• Messaging with one CPU-bound thread

    C           : 1.11s (same)
    Python 2.7  : 1.60s (1.4x slower) - better
    Ruby 1.9    : 5839.4s (~5000x slower) - worse!

• 5000x slower? Really? Why?

The Mystery Deepens

• Disable all but one CPU core
• CPU-bound threads (OS-X)

    Python 2.7 (4 cores+hyperthreading)  : 9.28s
    Python 2.7 (1 core)                  : 7.9s (faster!)

• Messaging with one CPU-bound thread

    Ruby 1.9 (4 cores+hyperthreading)    : 42.0s
    Ruby 1.9 (1 core)                    : 10.5s (much faster!)

• ?!?!?!?!?!?
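One way to try the single-core experiment from inside Python is to restrict the process's CPU affinity. The talk does not say how cores were disabled on OS-X, so this is only a sketch of a related technique: os.sched_setaffinity exists on Linux but not on macOS or Windows, and the helper name run_on_single_core is hypothetical.

```python
import os


def run_on_single_core(fn):
    """Run fn() with the process pinned to one CPU where the OS allows it."""
    if not hasattr(os, "sched_setaffinity"):
        # No affinity API in the os module on this platform: run unpinned.
        return fn()
    original = os.sched_getaffinity(0)
    try:
        # Pick one CPU from the set we are already allowed to use.
        os.sched_setaffinity(0, {min(original)})
        return fn()
    finally:
        # Restore the original mask so later code is unaffected.
        os.sched_setaffinity(0, original)
```

On Linux, threads started inside fn() are expected to inherit the restricted mask, so a threaded benchmark passed as fn should see only one core.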

Better is Worse

• Change software versions
• Let's upgrade to Python 3 (Linux)

    Python 2.7 (Messaging)   : 12.3s
    Python 3.2 (Messaging)   : 20.1s (1.6x slower)

• Let's downgrade to Ruby 1.8 (Linux)

    Ruby 1.9 (Messaging)     : 42.0s
    Ruby 1.8.7 (Messaging)   : 10.0s (4x faster)

• So much for progress (sigh)

What's Happening?

• The GIL does far more than limit cores
• It can make performance much worse
• Better performance by turning off cores?
• 5000x performance hit on Linux?
• Why?

Why You Might Care

• Must you abandon Python/Ruby for concurrency?
• Having threads restricted to one CPU core might be okay if it were sane
• Analogy: A multitasking operating system (e.g., Linux) runs fine on a single CPU
• Plus, threads get used a lot behind the scenes (even in thread alternatives, e.g., async)

Why I Care

• It's an interesting little systems problem
• How do you make a better GIL?
• It's fun.

Some Background

• I have been discussing some of these issues in the Python community since 2009
    http://www.dabeaz.com/GIL
• I'm less familiar with Ruby, but I've looked at its GIL implementation and experimented
• Very interested in commonalities/differences

A Tale of Two GILs

Thread Implementation

Python:
• System threads (e.g., pthreads)
• Managed by OS
• Concurrent execution of the Python interpreter (written in C)

Ruby:
• System threads (e.g., pthreads)
• Managed by OS
• Concurrent execution of the Ruby VM (written in C)

Alas, the GIL

• Parallel execution is forbidden
• There is a "global interpreter lock"
• The GIL ensures that only one thread runs in the interpreter at once
• Simplifies many low-level details (memory management, callouts to C extensions, etc.)

GIL Implementation

Condition variable:

    int gil_locked = 0;
    mutex_t gil_mutex;
    cond_t gil_cond;

    void gil_acquire() {
        mutex_lock(gil_mutex);
        while (gil_locked)
            cond_wait(gil_cond, gil_mutex);   /* releases the mutex while waiting */
        gil_locked = 1;
        mutex_unlock(gil_mutex);
    }
    void gil_release() {
        mutex_lock(gil_mutex);
        gil_locked = 0;
        cond_notify(gil_cond);
        mutex_unlock(gil_mutex);
    }

Simple mutex lock:

    mutex_t gil;

    void gil_acquire() {
        mutex_lock(gil);
    }
    void gil_release() {
        mutex_unlock(gil);
    }
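The condition-variable design can be mirrored in Python for experimentation. This is a toy sketch built on threading.Condition, not the interpreter's actual C code; the class name ToyGIL and the demo function are hypothetical.

```python
import threading


class ToyGIL:
    """A flag guarded by a mutex plus condition variable, as on the slide."""

    def __init__(self):
        # threading.Condition bundles a mutex with a condition variable.
        self._cond = threading.Condition()
        self.locked = False

    def acquire(self):
        with self._cond:
            while self.locked:
                self._cond.wait()  # releases the mutex while blocked
            self.locked = True

    def release(self):
        with self._cond:
            self.locked = False
            self._cond.notify()  # wake one waiting thread, if any


def demo(nthreads=2, iterations=1000):
    # Each worker bumps a shared counter while holding the toy GIL.
    gil = ToyGIL()
    total = [0]

    def worker():
        for _ in range(iterations):
            gil.acquire()
            total[0] += 1
            gil.release()

    threads = [threading.Thread(target=worker) for _ in range(nthreads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total[0]
```

Note that notify() only wakes a waiter; nothing stops the releasing thread from calling acquire() again first, which is the behavior the later slides examine.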

Thread Execution Model

• The GIL results in cooperative multitasking
    (Diagram: Threads 1-3 take turns; each one runs, then blocks and releases the GIL, and the next thread acquires the GIL and runs)
• When a thread is running, it holds the GIL
• GIL released on blocking (e.g., I/O operations)

Threads for I/O

• For I/O it works great
• GIL is never held very long
• Most threads just sit around sleeping
• Life is good
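A quick way to see that blocking calls release the GIL is to run several threads that only block. In this sketch time.sleep stands in for a blocking read or write, and the helper names are hypothetical.

```python
import time
from threading import Thread


def blocking_task(seconds):
    # time.sleep releases the GIL, much like a blocking I/O call would.
    time.sleep(seconds)


def run_blocking_threads(nthreads=4, seconds=0.2):
    # Returns the wall-clock time for nthreads threads that each block once.
    threads = [Thread(target=blocking_task, args=(seconds,))
               for _ in range(nthreads)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start
```

With four threads sleeping 0.2 seconds each, the total should be close to 0.2 seconds rather than 0.8, because the waits overlap.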

Threads for Computation

• You may actually want to compute something!
    • Fibonacci numbers
    • Image/audio processing
    • Parsing
• The CPU will be busy
• And it won't give up the GIL on its own

CPU-Bound Switching

Python:
• Releases and reacquires the GIL every 100 "ticks"
• 1 Tick ~= 1 interpreter instruction

Ruby:
• Background thread generates a timer interrupt every 10ms
• GIL released and reacquired by current thread on interrupt

Python Thread Switching

    (Diagram: a CPU-bound thread runs 100 ticks, releases and reacquires the GIL, runs 100 ticks, and so on)

• Every 100 VM instructions, GIL is dropped, allowing other threads to run if they want
• Not time based--switching interval depends on kind of instructions executed

Ruby Thread Switching

    (Diagram: a CPU-bound thread runs while a timer thread fires every 10ms; on each timer event the CPU-bound thread releases and reacquires the GIL)

• Loosely mimics the time-slice of the OS
• Every 10ms, GIL is released/acquired

A Common Theme

• Both Python and Ruby have C code like this:

    void execute() {
        while (inst = next_instruction()) {
            // Run the VM instruction
            ...
            if (must_release_gil) {
                GIL_release();
                /* Other threads may run now */
                GIL_acquire();
            }
        }
    }

• Exact details vary, but concept is the same
• Each thread has periodic release/acquire in the VM to allow other threads to run
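The same loop shape can be sketched in Python with a tick counter deciding when to release. Everything here is illustrative: execute, check_every and the use of threading.Lock as a stand-in GIL are assumptions, and the instructions are plain callables rather than real bytecode.

```python
import threading


def execute(instructions, gil, check_every=100):
    """Run callables under a lock, releasing it every check_every ticks.

    Returns how many times the lock was released and reacquired.
    """
    switches = 0
    ticks = 0
    gil.acquire()
    try:
        for inst in instructions:
            inst()  # run the "VM instruction"
            ticks += 1
            if ticks >= check_every:
                ticks = 0
                gil.release()
                # Other threads may run now
                gil.acquire()
                switches += 1
    finally:
        gil.release()
    return switches
```

For example, 250 instructions with check_every=100 release the lock after instruction 100 and again after instruction 200.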

Question

    if (must_release_gil) {
        GIL_release();
        /* Other threads may run now */
        GIL_acquire();
    }

• What can go wrong with this bit of code?
• Short answer: Everything!

Pathology

Thread Switching

• Suppose you have two threads
• Thread 1: Running
• Thread 2: Ready (Waiting for GIL)
    (Diagram: Thread 1 Running; Thread 2 READY)

Thread Switching

• Easy case: Thread 1 performs I/O (read/write)
    (Diagram: Thread 1 is running, releases the GIL for I/O, and becomes BLOCKED; pthreads/OS schedules Thread 2, which moves from READY to acquire the GIL and run)
• Thread 1: Releases GIL and blocks for I/O
• Thread 2: Gets scheduled, starts running

Thread Switching

• Tricky case: Thread 1 runs until preempted
    (Diagram: Thread 1 is running and releases the GIL at a preempt point while Thread 2 is READY; pthreads/OS must choose. Which thread runs? ???)

Thread Switching

• You might expect that Thread 2 will run
    (Diagram: Thread 1 is preempted, releases the GIL, and becomes READY; pthreads/OS schedules Thread 2, which acquires the GIL and runs)
• But you assume the GIL plays nice...

Thread Switching

• What might actually happen on multicore
    (Diagram: Thread 1 releases the GIL at a preempt point and keeps running; pthreads/OS schedules Thread 2, whose acquire GIL fails (GIL locked), so it returns to READY)
• Both threads attempt to run simultaneously
• ... but only one will succeed (depends on timing)

Fallacy

    if (must_release_gil) {
        GIL_release();
        /* Other threads may run now */
        GIL_acquire();
    }

• This code doesn't actually switch threads
• It might switch threads, but it depends
    • What operating system
    • # cores
    • Lock scheduling policy (if any)
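The fallacy can be observed with an ordinary Python lock. This sketch is an experiment on threading.Lock, not on the GIL itself, and the function name handoff_experiment is hypothetical. It counts how often a waiting thread manages to take a lock that the holder releases and immediately reacquires; the result varies with operating system, core count and timing.

```python
import threading
import time


def handoff_experiment(cycles=200):
    """Return how many times a waiting thread obtained the lock."""
    lock = threading.Lock()
    ready = threading.Event()
    stop = threading.Event()
    wins = [0]

    def waiter():
        ready.set()
        while not stop.is_set():
            lock.acquire()
            if not stop.is_set():
                wins[0] += 1  # the waiter actually got the lock
            lock.release()

    lock.acquire()
    t = threading.Thread(target=waiter)
    t.start()
    ready.wait()
    time.sleep(0.01)  # give the waiter time to block on the lock

    for _ in range(cycles):
        lock.release()
        # Other threads may run now ... or may not
        lock.acquire()

    stop.set()
    lock.release()
    t.join()
    return wins[0]


if __name__ == "__main__":
    print("waiter acquired the lock", handoff_experiment(), "times")
```

A count far below the number of cycles suggests that release followed by acquire did not hand the lock over on most iterations.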

Fallacy

    if (must_release_gil) {
        GIL_release();
        sleep(0);
        /* Other threads may run now */
        GIL_acquire();
    }

• This doesn't force switching (sleeping)
• It might switch threads, but it depends
    • What operating system
    • # cores
    • Lock scheduling policy (if any)

Fallacy

    if (must_release_gil) {
        GIL_release();
        sched_yield();
        /* Other threads may run now */
        GIL_acquire();
    }

• Neither does this (calling the scheduler)
• It might switch threads, but it depends
    • What operating system
    • # cores
    • Lock scheduling policy (if any)

A Conflict

• There are conflicting goals
• Python/Ruby - wants to run on a single CPU, but doesn't want to do thread scheduling (i.e., let the OS do it).
• OS - "Oooh. Multiple cores." Schedules as many runnable tasks as possible at any instant
• Result: Threads fight with each other

Multicore GIL Battle

• Python 2.7 on OS-X (4 cores)

    Sequential            : 6.12s
    Threaded (2 threads)  : 9.28s (1.5x slower!)

    (Diagram: Thread 1 runs in 100-tick bursts, releasing and reacquiring the GIL at each preempt point; on each release pthreads/OS schedules Thread 2, whose acquire fails and returns it to READY; eventually a release lets Thread 2 run)

• Millions of failed GIL acquisitions

Multicore GIL Battle

• You can see it! (2 CPU-bound threads)
    (Screenshot of CPU usage, annotated "Why >100%?")
• Comment: In Python, it's very rapid
• GIL is released every few microseconds!

I/O Handling

• If there is a CPU-bound thread, I/O bound threads have a hard time getting the GIL
    (Diagram: Thread 1 (CPU 1) keeps running through repeated preempt points; Thread 2 (CPU 2) sleeps until a network packet arrives, then Acquire GIL fails again and again before it finally succeeds and Thread 2 runs. Might repeat 100s-1000s of times)

Messaging Pathology

• Messaging on Linux (8 Cores)

    Ruby 1.9 (no threads)     : 1.18s
    Ruby 1.9 (1 CPU thread)   : 5839.4s

• Locks in Linux have no fairness
• Consequence: Really hard to steal the GIL
• And Ruby only retries every 10ms

Let's Talk Fairness

• Fair-locking means that locks have some notion of priorities, arrival order, queuing, etc.

    Before release:  running: t0    waiting: t1 t2 t3 t4 t5
    After release:   running: t1    waiting: t2 t3 t4 t5 t0

• Releasing means you go to end of line
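Arrival-order fairness can be sketched as a ticket lock: each thread takes a numbered ticket and runs only when its number is being served. This illustrates the queuing idea only; it is not how any operating system or interpreter discussed here implements its locks, and the class name TicketLock is hypothetical.

```python
import threading
import time


class TicketLock:
    """FIFO lock: threads are served strictly in arrival order."""

    def __init__(self):
        self._cond = threading.Condition()
        self.next_ticket = 0   # next ticket number to hand out
        self.now_serving = 0   # ticket currently allowed to hold the lock

    def acquire(self):
        with self._cond:
            my_ticket = self.next_ticket
            self.next_ticket += 1
            while self.now_serving != my_ticket:
                self._cond.wait()

    def release(self):
        with self._cond:
            self.now_serving += 1
            # Wake everyone; only the holder of the next ticket proceeds.
            self._cond.notify_all()
```

A thread that releases and immediately calls acquire() again gets a new, higher ticket, so it goes to the end of the line, as in the diagram above.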

Effect of Fair-Locking

• Ruby 1.9 (multiple cores)

    Messages + 1 CPU Thread (OS-X)    : 42.0s
    Messages + 1 CPU Thread (Linux)   : 5839.4s

• Question: Which one uses fair locking?

Effect of Fair-Locking

• Ruby 1.9 (multiple cores)

    Messages + 1 CPU Thread (OS-X)    : 42.0s (Fair)
    Messages + 1 CPU Thread (Linux)   : 5839.4s

• Benefit: I/O threads get their turn (yay!)
• Python 2.7 (multiple cores)

    2 CPU-Bound Threads (OS-X)        : 9.28s
    2 CPU-Bound Threads (Windows)     : 63.0s

• Question: Which one uses fair-locking?

Effect of Fair-Locking

• Ruby 1.9 (multiple cores)

    Messages + 1 CPU Thread (OS-X)    : 42.0s (Fair)
    Messages + 1 CPU Thread (Linux)   : 5839.4s

• Benefit: I/O threads get their turn (yay!)
• Python 2.7 (multiple cores)

    2 CPU-Bound Threads (OS-X)        : 9.28s
    2 CPU-Bound Threads (Windows)     : 63.0s (Fair)

• Problem: Too much context switching

Fair-Locking - Bah!

• In reality, you don't want fairness
• Messaging Revisited (OS X, 4 Cores)

    Ruby 1.9 (No Threads)            : 1.29s
    Ruby 1.9 (1 CPU-Bound thread)    : 42.0s (33x slower)

• Why is it still 33x slower?
• Answer: Fair locking! (and convoying)

Messaging Revisited

• Go back to the messaging server

    def server():
        while True:
            msg = recv()
            send(msg)

Messaging Revisited

• The actual implementation (size-prefixed messages)

    def server():
        while True:
            size = recv(4)
            msg = recv(size)
            send(size)
            send(msg)
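Size-prefixed framing can be sketched with the struct module. The slides only say that a 4-byte size is read first; the big-endian unsigned layout and the helper names frame and unframe below are assumptions made for illustration.

```python
import struct

# Assumed header layout: 4-byte unsigned integer in network byte order.
HEADER = struct.Struct("!I")


def frame(payload):
    # Prefix the payload with its length.
    return HEADER.pack(len(payload)) + payload


def unframe(buffer):
    """Split one message off the front of buffer.

    Returns (payload, rest), or (None, buffer) if the message is incomplete.
    """
    if len(buffer) < HEADER.size:
        return None, buffer
    (size,) = HEADER.unpack_from(buffer)
    end = HEADER.size + size
    if len(buffer) < end:
        return None, buffer
    return buffer[HEADER.size:end], buffer[end:]
```

Reading the header and the payload as two separate receives, as the server above does, means two blocking calls per incoming message.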

Performance Explained

• What actually happens under the covers

    def server():
        while True:
            size = recv(4)       # GIL release
            msg = recv(size)     # GIL release
            send(size)           # GIL release
            send(msg)            # GIL release

• Why? Each operation might block
• Catch: Passes control back to CPU-bound thread

Performance Illustrated

    (Diagram: the CPU-bound thread runs in 10ms slices set by the timer thread; after data arrives, the I/O thread's recv, recv, send, send each wait for the next 10ms slice before the message is done)

• Each message has 40ms response cycle
• 1000 messages x 40ms = 40s (42.0s measured)

Despair

A Solution?

Don't use threads!

• Yes, yes, everyone hates threads
• However, that's only because they're useful!
• Threads are used for all sorts of things
• Even if they're hidden behind the scenes

A Better Solution

Make the GIL better

• It's probably not going away (very difficult)
• However, does it have to thrash wildly?
• Question: Can you do anything?

GIL Efforts in Python 3

• Python 3.2 has a new GIL implementation
• It's imperfect--in fact, it has a lot of problems
• However, people are experimenting with it

Python 3 GIL

• GIL acquisition now based on timeouts
    (Diagram: Thread 1 is running; Thread 2 leaves IOWAIT when data arrives, becomes READY, and calls wait(gil, TIMEOUT); after 5ms it sets drop_request, Thread 1 releases the GIL and itself calls wait(gil, TIMEOUT), and Thread 2 is running)
• Involves waiting on a condition variable
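In CPython 3.2 and later the timeout is exposed as the "switch interval", readable with sys.getswitchinterval() and adjustable with sys.setswitchinterval(); the documented default is 5 milliseconds. The context manager below is a small convenience sketch, and its name switch_interval is hypothetical.

```python
import sys
from contextlib import contextmanager


@contextmanager
def switch_interval(seconds):
    """Temporarily change the interpreter's thread switch interval."""
    old = sys.getswitchinterval()
    sys.setswitchinterval(seconds)
    try:
        yield
    finally:
        # Put the previous interval back even if the body raises.
        sys.setswitchinterval(old)


if __name__ == "__main__":
    print("current interval: %.1f ms" % (sys.getswitchinterval() * 1000))
    with switch_interval(0.001):
        print("inside block    : %.1f ms" % (sys.getswitchinterval() * 1000))
```

A shorter interval lets waiting threads request the GIL sooner at the cost of more frequent switching, so it is a tuning knob rather than a fix.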

Problem: Convoying

• CPU-bound threads significantly degrade I/O
    (Diagram: each time data arrives, Thread 2 sits READY for 5ms while Thread 1 is running, then runs briefly after the release; the pattern repeats for every arrival)
• This is the same problem as in Ruby
• Just a shorter time delay (5ms)

Problem: Convoying

• You can directly observe the delays (messaging)

    Python/Ruby (No threads)   : 1.29s (no delays)
    Python 3.2 (1 Thread)      : 20.1s (5ms delays)
    Ruby 1.9 (1 Thread)        : 42.0s (10ms delays)

• Still not great, but problem is understood
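The measured times line up with a simple back-of-the-envelope model: every blocking call in the server costs one switch delay. The sketch below assumes the four blocking calls per message from the size-prefixed server (recv, recv, send, send) apply to both interpreters; the function name is hypothetical.

```python
def estimated_total_seconds(messages, blocking_calls_per_message, switch_delay_s):
    # Each blocking call hands the GIL back and waits one delay to regain it.
    return messages * blocking_calls_per_message * switch_delay_s


if __name__ == "__main__":
    ruby_19 = estimated_total_seconds(1000, 4, 0.010)    # 10ms timer
    python_32 = estimated_total_seconds(1000, 4, 0.005)  # 5ms timeout
    print("Ruby 1.9   estimate: %.1fs (42.0s measured)" % ruby_19)
    print("Python 3.2 estimate: %.1fs (20.1s measured)" % python_32)
```

The estimates come out at about 40 and 20 seconds, close to the measurements, which supports the convoying explanation without proving it.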

Promise

Priorities

• Best promise: Priority scheduling
• Earlier versions of Ruby had it
• It works (OS-X, 4 cores)

    Ruby 1.9 (1 Thread)                     : 42.0s
    Ruby 1.8.7 (1 Thread)                   : 40.2s
    Ruby 1.8.7 (1 Thread, lower priority)   : 10.0s

• Comment: Ruby-1.9 allows thread priorities to be set in pthreads, but it doesn't seem to have much (if any) effect

Priorities

• Experimental Python-3.2 with priority scheduler
• Also features immediate preemption
• Messages (OS X, 4 Cores)

    Python 3.2 (No threads)               : 1.29s
    Python 3.2 (1 Thread)                 : 20.2s
    Python 3.2+priorities (1 Thread)      : 1.21s (faster?)

• That's a lot more promising!
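The core idea of priority scheduling can be sketched as a run queue that always hands the lock to the highest-priority ready thread. This is a toy model, not the experimental Python 3.2 patch mentioned above; the class name PriorityRunQueue, its methods and the numeric priorities are all hypothetical.

```python
import heapq
import itertools


class PriorityRunQueue:
    """Pick the next thread to run by priority, FIFO within a priority."""

    def __init__(self):
        self._heap = []
        self._order = itertools.count()  # tie-breaker preserving arrival order

    def make_ready(self, name, priority):
        # Lower number means higher priority in this sketch.
        heapq.heappush(self._heap, (priority, next(self._order), name))

    def pick_next(self):
        # Return the name of the next thread to run, or None if idle.
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]
```

If an I/O thread is queued at priority 0 and CPU-bound threads at priority 10, the I/O thread is always picked first, which also shows how a steady stream of high-priority work could starve the others.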

New Problems

• Priorities bring new challenges
    • Starvation
    • Priority inversion
    • Implementation complexity
• Do you have to write a full OS scheduler?
• Hopefully not, but it's an open question

Final Words

• Implementing a GIL is a lot trickier than it looks
• Even work with priorities has problems
• Good example of how multicore is diabolical

Thanks for Listening!

• I hope you learned at least one new thing
• I'm always interested in feedback
• Follow me on Twitter (@dabeaz)