Concurrency in Python Vishal Sapre
7/29/2019 ConCurrency Primitives in Python
http://slidepdf.com/reader/full/concurrency-primitives-in-python 1/21
What is concurrency?
• Wikipedia: “In computer science, concurrency is a property of systems in which several computations are executing simultaneously, and potentially interacting with each other.”
• Concurrency is one of the most thoroughly researched and one of the most difficult subjects in Computer Science today.
  – It is “difficult” to express a sequential process in a parallel fashion without rethinking and re-engineering the system architecture itself.
  – Retrofitting concurrency onto sequential processes is NOT EASY.
• Literally hundreds of papers have been published on various aspects of concurrency.
Why is concurrency needed?
• Performance:
  – Most machines today have multiple compute resources available.
  – Our software should be able to make use of these resources.
• Flexibility to interact with the outside world:
  – Any reasonably 'significant' software project interacts with the external world.
  – Events in the external world can and will be highly asynchronous (e.g. HTTP requests, user interactions with the software (UI / command line), interrupts from an external device, events in a simulation, etc.).
  – Ideally, software should be able to do its own “stuff” and at the “same time” cater to interactions with the external world.
• As a means of survival in the near future:
  – The financial worth of a company's products may depend on the use of concurrency methods.
  – Our job security as software engineers may depend on our know-how of concurrency methods.
Concurrency primitives in Python
This talk focuses on options available within Python.
The following are the commonly accepted concurrency options in Python:
• Multi-Threading
• Multi-Processing
• Distributed Computing
• C extensions
• Cooperative Multitasking
• Alternate Python interpreters
Multi-threading
• Distribute work across multiple threads
• Python has 'real' OS threads available for use
  – POSIX threads on Unix / Linux
  – Windows threads on Windows
• Very easy to set up and use (will be shown shortly)
• Cheap memory-wise (because data is shared)
• Very good for I/O bound applications
  – GUI
  – Networking
  – Database operations
• Best used for heavily I/O bound operations !!!
• Best avoided for heavily CPU bound operations !!!
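A minimal sketch of threads used for I/O bound work. The function fake_download and the task names are hypothetical, and time.sleep stands in for a real blocking call (socket read, database query). The three waits overlap, so the total time should be close to the longest single wait rather than the sum.

```python
import threading
import time


def fake_download(name, delay, results):
    """Stand-in for a blocking I/O call (hypothetical; sleep simulates the wait)."""
    time.sleep(delay)          # the GIL is released while the thread sleeps
    results[name] = delay


def run_downloads(delays):
    """Start one thread per task and wait for all of them."""
    results = {}
    threads = [
        threading.Thread(target=fake_download, args=(name, delay, results))
        for name, delay in delays.items()
    ]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()               # wait for every thread to finish
    return results, time.perf_counter() - start


results, elapsed = run_downloads({"a": 0.2, "b": 0.2, "c": 0.2})
print(sorted(results), f"{elapsed:.2f}s")
```

Each thread writes to its own key of the shared dictionary, so no explicit lock is used here; code that updates the same value from several threads would need one.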
Multi-threading
• The interpreter (python.exe) is shared among threads and only one thread uses it at a time (Demo 1)
  – The Python interpreter was never designed to be thread-safe
  – Threads were bolted onto an existing interpreter
  – Python decided to provide a thin wrapper around OS threads (since OSes already have threads)
  – Allows C extensions to be written without worrying about thread safety issues
  – Eases interpreter maintenance (remember Python is mostly maintained by volunteers)
• To share the interpreter, each thread:
  – Acquires a lock on the interpreter (called the Global Interpreter Lock or GIL) and then does its “stuff”
  – Releases the lock: after every 'n' interpreter operations (a time-based switch interval in CPython 3.2 and later) OR while waiting for I/O OR while sleeping OR when it exits
  – Sends a signal to the OS; the OS performs a context switch and tries to schedule other available threads
• As a ramification, “thread contention” results for “CPU bound” operations
  – OS context switch time is nondeterministic and “mostly” greater than the time for which the original thread waits before reacquiring the GIL.
  – So the thread that initially held the lock mostly keeps the interpreter until its work is done,
  – while other threads 'fight' to get the interpreter.
  – This affects performance negatively and acutely
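The contention described above can be observed with a rough timing sketch like the one below (this is not the talk's Demo 1; the function names and the loop size are assumptions). It runs the same pure-Python loop twice sequentially and then in two threads. On a CPython build with the GIL, the threaded run is not expected to be faster; actual numbers vary by machine and Python version.

```python
import sys
import threading
import time


def count_down(n):
    """Pure-Python CPU-bound loop; it needs the GIL for every iteration."""
    while n > 0:
        n -= 1


def time_sequential(n):
    """Run the loop twice, one after the other, in the calling thread."""
    start = time.perf_counter()
    count_down(n)
    count_down(n)
    return time.perf_counter() - start


def time_threaded(n):
    """Run the same two loops in two threads that compete for the interpreter."""
    threads = [threading.Thread(target=count_down, args=(n,)) for _ in range(2)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    n = 2_000_000
    # In CPython 3.2+ a running thread is asked to give up the GIL after this interval.
    print("switch interval (s):", sys.getswitchinterval())
    print(f"sequential: {time_sequential(n):.3f}s")
    print(f"threaded:   {time_threaded(n):.3f}s")
```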
Multi-Processing
• Distribute work across multiple processes (multiple python.exe invocations)
• “multiprocessing” module in Python 2.6+
  – Creates new Python processes
  – Follows the 'threading' API very closely
  – Mostly a drop-in replacement for the 'threading' module if multi-processing is required
  – Uses 'fork()' on Unix / Linux and 'CreateProcess()' on Windows
  – Uses “cPickle” to pickle objects to send to the new process on Windows: only 'picklable' entities can be exchanged between processes
• Parallel Python
  – Full-fledged framework for parallelizing applications in Python using processes
• “subprocess” module in Python 2.4+
  – Communicate using subprocess.PIPE and stdin, stdout, stderr
  – The idea is to manage child processes, not to do multiprocessing.
  – The API is not geared towards multiprocessing, data safety, etc.
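A minimal sketch of the “multiprocessing” approach, assuming a made-up CPU bound task (sum of squares) split into chunks. The helper names chunk_bounds, sum_of_squares and parallel_sum_of_squares are hypothetical. Only small picklable tuples and integers cross the process boundary.

```python
from multiprocessing import Pool


def chunk_bounds(n, parts):
    """Split range(n) into `parts` contiguous (start, stop) pairs."""
    step, extra = divmod(n, parts)
    bounds, start = [], 0
    for i in range(parts):
        stop = start + step + (1 if i < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


def sum_of_squares(bounds):
    """CPU-bound worker; each child process has its own interpreter and GIL."""
    start, stop = bounds
    return sum(i * i for i in range(start, stop))


def parallel_sum_of_squares(n, workers=4):
    # Arguments and results are pickled to travel between processes.
    with Pool(processes=workers) as pool:
        return sum(pool.map(sum_of_squares, chunk_bounds(n, workers)))


if __name__ == "__main__":
    # The guard matters on platforms that start children by re-importing this module.
    print(parallel_sum_of_squares(1_000_000))
```

The worker is a module-level function on purpose: functions are pickled by name, so lambdas or nested functions would not be accepted by pool.map.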
Multi-Processing
• Setting up a new process is an expensive operation
  – Nothing is shared; all memory has to be copied for the new process
  – Large data exchange (e.g. 1 list containing 1 million chart points, or 10 lists of 100,000 points each) will bring the performance down.
• In most cases, it's worthwhile only for CPU bound operations
  – Need to amortize the communication cost against the computing done in individual processes
  – We had better have lots of work for the CPU in each process !!!
• We may need to use a separate RPC mechanism
  – XML-RPC
    Well supported on most platforms and languages
  – JSON-RPC (JSON: JavaScript Object Notation)
    Created to do web services (e.g. “blip.tv” getting a video stream from “youtube.com”)
    Lightweight and higher performing than its XML cousin.
    • Almost every 'type' in a JSON stream maps directly into some 'type' in Python / C++
    • Python 2.6+ has the parser and dumper in the standard library as the 'json' module
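To illustrate the type mapping mentioned above, here is a small sketch that encodes and decodes a JSON-RPC style request with the standard 'json' module. The method name "add_points" and its parameters are invented for illustration; only json.dumps and json.loads are real API.

```python
import json

# A JSON-RPC-style request; "add_points" and its parameters are made-up names.
request = {
    "jsonrpc": "2.0",
    "method": "add_points",
    "params": {"series": "chart-1", "points": [[0, 1.5], [1, 2.25]], "visible": True},
    "id": 1,
}

wire = json.dumps(request)      # Python objects -> JSON text (a str)
decoded = json.loads(wire)      # JSON text -> Python objects

# JSON object -> dict, array -> list, true/false -> bool, number -> int or float
print(type(decoded).__name__)
print(type(decoded["params"]["points"]).__name__)
print(type(decoded["params"]["visible"]).__name__)
```

Note that tuples are written as JSON arrays and come back as lists, so a round trip preserves values but not every Python type.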
Distributed Computing:
• Distribute work across multiple compute resources (local or remote)
  – Essentially gives a handle to a remote process
  – Depends on RPC mechanisms
  – Prior examples: CORBA, COM/DCOM, Java RMI (Remote Method Invocation)
• Basic idea:
  – Create a (remote or local) object that acts as a server, and call methods on it
  – Transparently manage method calling, error handling, return value ordering and security aspects
  – The user sees just a function call. A lot of magic happens under the hood.
• 'PyRO' – Python Remote Objects
  – The first known distributed computing framework in Python.
  – Found cryptic by many people
• 'RPyC' – Remote Python Call
  – Most people say this is the easiest route to distributed computing in Python
  – PyScripter uses this to allow remote Python debugging
• MPI (Message Passing Interface)
  – “mpi4py” module
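The “basic idea” can be sketched with the standard library's XML-RPC modules, used here as a stand-in for PyRO or RPyC (whose APIs are not shown in these slides). The Calculator class and the helper names are hypothetical; server and client run in one process on the loopback interface only to keep the sketch self-contained.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer


class Calculator:
    """Hypothetical service object whose public methods are exposed over RPC."""

    def add(self, a, b):
        return a + b


def start_server():
    # Port 0 asks the OS for any free port; logRequests=False keeps stdout quiet.
    server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
    server.register_instance(Calculator())
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def remote_add(a, b):
    server = start_server()
    host, port = server.server_address
    try:
        with ServerProxy(f"http://{host}:{port}/") as proxy:
            return proxy.add(a, b)   # reads like a local call, travels over HTTP
    finally:
        server.shutdown()            # stop serve_forever() in the server thread
        server.server_close()


print(remote_add(2, 3))
```

In a real deployment the server would live in another process or on another machine; the client code would stay the same apart from the URL.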
C Extensions:
• Write code as C/C++ functions and call them from Python using the Python/C API
• Once inside C/C++ we can make use of whatever concurrency method we want
• The interpreter is released / re-acquired by using a pair of macros in the Python/C API
  – Py_BEGIN_ALLOW_THREADS, Py_END_ALLOW_THREADS
• The Python/C API can be used in several ways:
  – Code it by hand
  – Use Cython to create C extensions or wrap existing C/C++ code
  – Use SWIG to wrap existing C code
    One of the oldest methods to connect Python to existing C++
    Makes it easier than using the Python/C API directly
  – Use Boost.Python to wrap existing C/C++ code
  – Use SIP to wrap existing C/C++ code
    Created for PyQt
• 'ShedSkin' Python to C++ compiler
  – Converts Python code to C++ code using static code analysis (compiler stuff !!!)
  – Actively maintained only on Linux
  – Very new.
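Writing an extension needs C, but the effect of releasing the GIL inside C code can be seen from plain Python. The sketch below uses the standard 'hashlib' module as an existing C extension: its documentation states that the GIL is released while hashing data larger than 2047 bytes, so several threads can hash at the same time. The helper names and buffer sizes are assumptions chosen for illustration.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def digest(data):
    # hashlib is implemented in C and, per its documentation, releases the GIL
    # while hashing buffers larger than 2047 bytes.
    return hashlib.sha256(data).hexdigest()


def digest_all(buffers, workers=4):
    """Hash every buffer in a thread pool; results keep the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(digest, buffers))


buffers = [bytes([i]) * 1_000_000 for i in range(4)]
print(digest_all(buffers)[0][:16])
```

A hand-written extension would get the same behaviour by wrapping its long-running C section in the two macros named above.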
Cooperative Multitasking:
• Distribute work across multiple agents, each of which “cooperatively” yields control to the caller under certain conditions.
• A scheduler is the 'absolutely essential' aspect of this kind of concurrency method
  – Coroutines: Python generators turned inside-out
  – Stackless Python: a different Python interpreter (used by Cisco and EVE Online)
  – Greenlet module: C extension that mimics Stackless with the standard interpreter
• Asynchronous I/O + cooperative multitasking often results in high performance multitasking systems
  – Basically, every I/O request is passed on to the underlying OS, control is passed back to the caller, and an event is generated by the OS when the I/O completes
• Asynchronous I/O is different for Unix / Linux and Windows
  – Unix / Linux: use the Python 'select' module to employ the built-in 'epoll' mechanism
  – Windows: use the Python Windows Extensions and employ Windows OVERLAPPED I/O
• Many projects in Python use greenlets + coroutines + event loops: Eventlet, gevent, Cogen, multitask, etc.
  – Mostly geared towards Unix / Linux, less so for Windows.
  – May require many basic libraries (socket, time.sleep, threading) to be patched to support asynchronous I/O
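A minimal sketch of the scheduler idea, using plain generators as the cooperating tasks. The names worker and run_round_robin are hypothetical; real libraries such as gevent or Eventlet add I/O events on top of the same principle.

```python
from collections import deque


def worker(name, steps, log):
    """A cooperative task: does one unit of work, then yields control."""
    for i in range(steps):
        log.append(f"{name}:{i}")
        yield                      # hand control back to the scheduler


def run_round_robin(tasks):
    """Minimal scheduler: resume each task in turn until all have finished."""
    ready = deque(tasks)
    while ready:
        task = ready.popleft()
        try:
            next(task)             # run the task up to its next yield
        except StopIteration:
            continue               # task finished; do not reschedule it
        ready.append(task)


log = []
run_round_robin([worker("a", 2, log), worker("b", 3, log)])
print(log)   # ['a:0', 'b:0', 'a:1', 'b:1', 'b:2']
```

Nothing preempts a task here: a worker that never yields would starve all the others, which is why every blocking call has to be made cooperative.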
Alternate Python implementations:
• IronPython
  – Python implemented in C#
  – Python source gets compiled to .NET byte code
  – Source level compatibility with Python 2.6
  – Provides access to all .NET internals (or Mono on Unix / Linux)
  – Allows the user to employ whatever concurrency method .NET provides
  – Single threaded performance << CPython
• Jython
  – Python implemented in Java
  – Python source gets compiled to Java byte code
  – Source level compatibility with Python 2.5.2
  – Provides access to the entire Java ecosystem
  – Allows users to employ all concurrency primitives that Java provides
  – Single threaded performance < CPython
• Once stuck with one of them, it would be difficult to move across platforms !!!
Moral of the Story:
There is no single 'best' way.
We cannot follow a 'blanket' approach.
We have to work with the available options on a case-by-case basis.
Let's use (as a proposal)
1. Threads for heavily I/O bound operations
2. Processes for heavy computations
   – “multiprocessing”
   – “subprocess” with JSON-RPC
   – RPyC
3. If the data passed between processes is 'large'
   – Let's use Cython to convert the computationally intensive code to C
   – And remember to release the GIL once we are inside C
4. Mix the 2nd and 3rd approaches
5. Create a library that employs async I/O and Python coroutines to provide simple concurrency within our systems.
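As a rough idea of what item 5 could look like today, the sketch below uses the standard 'asyncio' module, which is not mentioned in these slides and is used here only as an assumption about how such a library might be built. The coroutine fetch and its delays are hypothetical; asyncio.sleep stands in for a real non-blocking request.

```python
import asyncio


async def fetch(name, delay):
    """Hypothetical I/O-bound task; asyncio.sleep stands in for a real request."""
    await asyncio.sleep(delay)      # yields to the event loop instead of blocking
    return f"{name} done"


async def main():
    # gather() runs the coroutines concurrently on a single thread
    # and returns their results in argument order.
    return await asyncio.gather(fetch("a", 0.1), fetch("b", 0.05), fetch("c", 0.01))


results = asyncio.run(main())
print(results)
```

All three tasks wait at the same time, so the run takes roughly as long as the slowest one, with no threads and no GIL contention involved.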
Q & A