Concurrency in Python  Vishal Sapre

ConCurrency Primitives in Python

Apr 14, 2018

Page 1: ConCurrency Primitives in Python

7/29/2019 ConCurrency Primitives in Python

http://slidepdf.com/reader/full/concurrency-primitives-in-python 1/21

Concurrency in Python

 Vishal Sapre


What is concurrency?

Wikipedia: In computer science, concurrency is a property of systems in which several computations are executing simultaneously, and potentially interacting with each other.

Concurrency is one of the most well-researched and one of the most difficult subjects in computer science today.

 – It is difficult to express a sequential process in a parallel fashion without rethinking and re-engineering the system architecture itself.

 – Retrofitting concurrency onto sequential processes is NOT EASY.

• Literally hundreds of papers have been published on various aspects of concurrency.


Why is concurrency needed?

Performance:

 – Most machines today have multiple compute resources available.

 – Our software should be able to make use of these resources.

Flexibility to interact with the outside world:

 – Any decently 'significant' software project interacts with the external world.

 – Things in the external world can and will be highly asynchronous (e.g. HTTP requests, user interactions with the software (UI / command line), interrupts from an external device, events in a simulation, etc.)

 – Ideally, software should be able to do its own work and, at the same time, cater to interactions with the external world.

As a means of survival in the near future:

 – The financial worth of a company's products may depend on the use of concurrency methods.

 – Our job security as software engineers may depend on our know-how of concurrency methods.


Concurrency primitives in Python

This talk focuses on options available within Python.

The following are the commonly accepted concurrency options in Python:

Multi-Threading

Multi-Processing

Distributed Computing

C extensions

Cooperative Multitasking

Alternate Python interpreter


Multi-threading

Distribute work across multiple threads

Python has 'real' OS threads available for use

 – POSIX threads on Unix / Linux

 – Windows threads on Windows

Very easy to set up and use (will be shown shortly)

Cheap in memory terms (because data is shared between threads)

Very good for I/O-bound applications

 – GUI

 – Networking

 – Database operations

Best used for heavily I/O-bound operations!

Best avoided for heavily CPU-bound operations!
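A minimal sketch of the I/O-bound case, written for Python 3 rather than the Python 2 of the original talk. `time.sleep` stands in for a blocking I/O call, and `fake_io_task` and `run_io_tasks` are hypothetical names used only for illustration.

```python
import threading
import time


def fake_io_task(task_id, delay, results):
    """Stand-in for a blocking I/O call (hypothetical helper, not from the talk)."""
    time.sleep(delay)  # sleeping releases the GIL, much like real blocking I/O
    results[task_id] = "done"


def run_io_tasks(count=5, delay=0.2):
    """Run `count` fake I/O tasks, one thread each, and time the whole batch."""
    results = {}
    threads = [
        threading.Thread(target=fake_io_task, args=(i, delay, results))
        for i in range(count)
    ]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait for every thread to finish
    elapsed = time.perf_counter() - start
    return results, elapsed


if __name__ == "__main__":
    results, elapsed = run_io_tasks()
    print(f"{len(results)} tasks finished in {elapsed:.2f}s")
```

Because the waits overlap, the batch should take roughly one `delay` rather than `count` times `delay`. Exact timings depend on the machine.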


Multi-threading

The interpreter (python.exe) is shared among threads, and only one thread uses it at a time (Demo 1)

The Python interpreter was never designed to be thread-safe

 – Threads were bolted onto an existing interpreter

 – Python decided to provide a thin wrapper around OS threads (since OSes already have threads)

 – Allows C extensions to be written without worrying about thread-safety issues

 – Eases interpreter maintenance (remember, Python is mostly maintained by volunteers)

To share the interpreter, each thread:

 – Acquires a lock on the interpreter (called the Global Interpreter Lock, or GIL) and then does its work

 – Releases the lock after every 'n' interpreter operations, OR while waiting for I/O, OR while sleeping, OR when it exits (CPython 3.2 and later replace the operation count with a time-based switch interval)

 – Sends a signal to the OS; the OS performs a context switch and tries to schedule other available threads

As a ramification, "thread contention" results for CPU-bound operations

 – OS context-switch time is nondeterministic and mostly greater than the time for which the original thread waits before reacquiring the GIL.

 – So the thread that initially held the lock mostly keeps the interpreter until its work is done,

 – while the other threads 'fight' to get the interpreter.

 – This affects performance negatively and acutely
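The contention described above can be observed with a small timing experiment. This is only a sketch, assuming a standard (GIL-enabled) CPython 3 build; `count_down`, `time_sequential` and `time_threaded` are hypothetical names, and the numbers vary by machine and Python version.

```python
import sys
import threading
import time


def count_down(n):
    """Pure-Python CPU-bound loop; it holds the GIL while it runs."""
    while n > 0:
        n -= 1


def time_sequential(n, workers=2):
    """Run the loop `workers` times, one after another, in the main thread."""
    start = time.perf_counter()
    for _ in range(workers):
        count_down(n)
    return time.perf_counter() - start


def time_threaded(n, workers=2):
    """Run the same loops in `workers` threads that compete for the GIL."""
    threads = [threading.Thread(target=count_down, args=(n,)) for _ in range(workers)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    n = 5_000_000
    # CPython 3.2+ exposes the time-based GIL switch interval here.
    print("switch interval (s):", sys.getswitchinterval())
    print(f"sequential: {time_sequential(n):.2f}s")
    print(f"threaded:   {time_threaded(n):.2f}s")
```

On a GIL-enabled interpreter the threaded run is usually no faster than the sequential one. Free-threaded builds (an experimental option from Python 3.13) may behave differently.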


Multi-Processing

Distribute work across multiple processes (multiple python.exe invocations)

"multiprocessing" module in Python 2.6+

 – Creates new Python processes

 – Follows the 'threading' API very closely

 – Mostly a drop-in replacement for the 'threading' module if multi-processing is required

 – Uses 'fork()' on Unix / Linux and 'CreateProcess()' on Windows

 – Uses "cPickle" ('pickle' in Python 3) to pickle objects to send to the new process on Windows: only picklable entities can be exchanged between processes

Parallel Python

 – Full-fledged framework for parallelizing applications in Python using processes

"subprocess" module in Python 2.4+

 – Communicate using subprocess.PIPE and stdin, stdout, stderr

 – The idea is to manage child processes, not to do multiprocessing.

 – The API is not geared towards multiprocessing, data safety, etc.
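A minimal sketch of the "multiprocessing" route, assuming Python 3. `sum_of_squares` is a hypothetical CPU-bound stand-in, and `run_in_pool` is an illustrative wrapper, not part of the library. The worker function sits at module level so that it can be pickled and sent to the child processes.

```python
import multiprocessing


def sum_of_squares(n):
    """CPU-bound stand-in workload (hypothetical, for illustration only)."""
    return sum(i * i for i in range(n))


def run_in_pool(inputs, processes=2):
    """Distribute the inputs across a pool of worker processes."""
    with multiprocessing.Pool(processes=processes) as pool:
        # Arguments and results are pickled to cross the process boundary.
        return pool.map(sum_of_squares, inputs)


if __name__ == "__main__":
    # The guard matters on Windows, where child processes re-import this module.
    print(run_in_pool([10, 100, 1000]))
```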


Multi-Processing

Setting up a new process is an expensive operation

 – Nothing is shared; all memory has to be copied for the new process

 – Large data exchange (e.g. 1 list containing 1 million chart points, or 10 lists of 100,000 points each) will bring performance down.

In most cases, it is worthwhile only for CPU-bound operations

 – Need to amortize the communication cost against the computing done in individual processes

 – We had better have lots of work for the CPU in each process!

We may need to use a separate RPC mechanism

 – XML-RPC

   • Well supported on most platforms and languages

 – JSON-RPC (JSON: JavaScript Object Notation)

   • Created to do web services (e.g. "blip.tv" getting a video stream from "youtube.com")

   • Lightweight and higher-performing than its XML cousin.

   • Almost every 'type' in a JSON stream maps directly onto some 'type' in Python / C++

   • Python 2.6+ has the parser and dumper in the standard library as the 'json' module
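The type mapping can be sketched with the standard-library 'json' module. The message below follows the JSON-RPC 2.0 request layout; `encode_request`, `decode_request` and the method name "plot" are hypothetical names chosen for illustration.

```python
import json


def encode_request(method, params, request_id):
    """Serialize a JSON-RPC 2.0 style request to text."""
    return json.dumps(
        {"jsonrpc": "2.0", "method": method, "params": params, "id": request_id}
    )


def decode_request(text):
    """Parse the text back into plain Python objects."""
    message = json.loads(text)
    return message["method"], message["params"], message["id"]


if __name__ == "__main__":
    wire = encode_request("plot", [1, 2.5, "x", True, None], 7)
    print(wire)
    # JSON array -> list, number -> int/float, string -> str,
    # true/false -> bool, null -> None
    print(decode_request(wire))
```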


Distributed Computing:

Distribute work across multiple compute resources (local or remote)

Essentially gives a handle to a remote process

Depends on RPC mechanisms

 – Prior examples: CORBA, COM/DCOM, Java RMI (Remote Method Invocation)

Basic idea:

 – Create a (remote or local) object that acts as a server, and call methods on it

 – Transparently manage method calling, error handling, return-value ordering and security aspects

 – The user sees just a function call. A lot of magic happens under the hood.

'PyRO' – Python Remote Objects

 – The first known distributed computing framework in Python.

 – Found cryptic by many people

'RPyC' – Remote Python Call

 – Most people say this is the easiest route to distributed computing in Python

 – PyScripter uses this to allow remote Python debugging

MPI (Message Passing Interface)

 – "mpi4py" module
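The basic idea (a server object whose methods are called as if they were local) can be sketched with the standard-library XML-RPC modules. This deliberately uses neither PyRO nor RPyC. The `Calculator` class and the helper names are hypothetical, and the server runs in a background thread of the same process purely to keep the example self-contained.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer


class Calculator:
    """Object exposed by the server (hypothetical example service)."""

    def add(self, a, b):
        return a + b


def start_server():
    """Start an XML-RPC server on a free local port in a background thread."""
    server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
    server.register_instance(Calculator())
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def call_remote_add(a, b):
    """Call Calculator.add through the proxy; it looks like a local call."""
    server = start_server()
    try:
        host, port = server.server_address
        proxy = ServerProxy(f"http://{host}:{port}/")
        return proxy.add(a, b)
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(call_remote_add(2, 3))
```

In a real deployment the server would run in a separate process or on another machine. Only the URL passed to `ServerProxy` would change.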


C Extensions:

Write code as C/C++ functions and call them from Python using the Python/C API

Once inside C/C++, we can make use of whatever concurrency method we want

The interpreter lock (GIL) is released/re-acquired by using a couple of macros in the Python/C API

 – Py_BEGIN_ALLOW_THREADS, Py_END_ALLOW_THREADS

The Python/C API can be used in several ways:

 – Code it by hand

 – Use Cython to create C extensions or wrap existing C/C++ code

 – Use SWIG to wrap existing C code

   • One of the oldest methods to connect Python to existing C++

   • Makes it easier than using the Python/C API

 – Use Boost.Python to wrap existing C/C++ code

 – Use SIP to wrap existing C/C++ code

   • Created for PyQt

'ShedSkin' Python-to-C++ compiler

 – Converts Python code to C++ code using static code analysis (compiler stuff!)

 – Actively maintained only on Linux

 – Very new.
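The effect of releasing the GIL inside C can be seen from the Python side without writing an extension. The sketch below uses the standard-library 'hashlib' module as a stand-in for our own C code: it is implemented in C and, according to its documentation, releases the GIL while hashing larger buffers. `digest_into` and `hash_in_threads` are hypothetical helper names.

```python
import hashlib
import threading


def digest_into(data, index, out):
    """Hash one buffer; the heavy work happens in C code outside the GIL."""
    out[index] = hashlib.sha256(data).hexdigest()


def hash_in_threads(buffers):
    """Hash every buffer in its own thread and return the digests in order."""
    out = [None] * len(buffers)
    threads = [
        threading.Thread(target=digest_into, args=(buf, i, out))
        for i, buf in enumerate(buffers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return out


if __name__ == "__main__":
    buffers = [bytes([i]) * (8 * 1024 * 1024) for i in range(4)]
    for digest in hash_in_threads(buffers):
        print(digest)
```

A hand-written extension would get the same behaviour by wrapping its long-running C section in the two macros named above.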


Cooperative Multitasking:

Distribute work across multiple agents, each of which "cooperatively" yields control to the caller under certain conditions.

A scheduler is the absolutely essential aspect of this kind of concurrency method

 – Coroutines: Python generators turned inside-out

 – Stackless Python: a different Python interpreter (used by Cisco and EVE Online)

 – Greenlet module: a C extension that mimics Stackless with the standard interpreter

Asynchronous I/O + cooperative multitasking often results in high-performance multitasking systems

 – Basically, every I/O request is passed on to the underlying OS, control is passed back to the caller, and an event is generated by the OS when the I/O completes

Asynchronous I/O is different on Unix / Linux and Windows

 – Unix / Linux: use the Python 'select' module to employ the built-in 'epoll' mechanism (Linux-specific; BSD systems offer 'kqueue' instead)

 – Windows: use the Python Windows Extensions and employ Windows OVERLAPPED I/O

Many Python projects use greenlets + coroutines + event loops: Eventlet, gevent, Cogen, multitask, etc.

 – Most are geared towards Unix / Linux, less so towards Windows.

 – May require many basic libraries (socket, time.sleep, threading) to be patched to support asynchronous I/O
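The role of the scheduler can be sketched with plain generators and a round-robin queue. This is a toy illustration, not how Stackless, greenlet or any of the projects above are implemented; `worker`, `run_round_robin` and `demo` are hypothetical names.

```python
from collections import deque


def worker(name, steps, log):
    """A cooperative task: do one step of work, then yield control."""
    for step in range(steps):
        log.append(f"{name}:{step}")
        yield  # hand control back to the scheduler


def run_round_robin(tasks):
    """Minimal scheduler: resume each task in turn until all have finished."""
    queue = deque(tasks)
    while queue:
        task = queue.popleft()
        try:
            next(task)  # run the task up to its next yield
        except StopIteration:
            continue  # task finished; do not reschedule it
        queue.append(task)


def demo():
    log = []
    run_round_robin([worker("A", 2, log), worker("B", 3, log)])
    return log


if __name__ == "__main__":
    print(demo())  # ['A:0', 'B:0', 'A:1', 'B:1', 'B:2']
```

No task is ever preempted: a worker that never yields would starve all the others, which is why cooperation is part of the name.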


Alternate Python implementations:

IronPython

 – Python implemented in C#

 – Python source gets compiled to .NET bytecode

 – Source-level compatibility with Python 2.6

 – Provides access to all .NET internals (or Mono on Unix / Linux)

 – Allows the user to employ whatever concurrency method .NET provides

 – Single-threaded performance << CPython

Jython

 – Python implemented in Java

 – Python source gets compiled to Java bytecode

 – Source-level compatibility with Python 2.5.2

 – Provides access to the entire Java ecosystem

 – Allows users to employ all concurrency primitives that Java provides

 – Single-threaded performance < CPython

Once stuck with one of them, it would be difficult to move across platforms!


Moral of the Story:

There is no single 'best' way.

We cannot follow a 'blanket' approach.

We have to work with the available options on a case-by-case basis.


Let's use (as a proposal)

1. Threads for heavily I/O-bound operations

2. Processes for heavy computations

 – "multiprocessing"

 – "subprocess" with JSON-RPC

 – RPyC

3. If the data passed between processes is 'large', let's use Cython to convert the computationally intensive code to C

 – And remember to release the GIL once we are inside C

4. Mix the 2nd and 3rd approaches

5. Create a library that employs async I/O and Python coroutines to provide simple concurrency within our systems.
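For the last proposal, the standard-library 'asyncio' module (added in Python 3.4, so after the tools named in this talk) already combines an event loop with coroutines. The sketch below is one possible starting point rather than the library the proposal has in mind. `fetch` is a hypothetical coroutine, and `asyncio.sleep` stands in for real asynchronous I/O.

```python
import asyncio


async def fetch(name, delay):
    """Pretend to wait on I/O; awaiting yields control to the event loop."""
    await asyncio.sleep(delay)
    return f"{name} done"


async def main():
    # Both coroutines wait concurrently; gather returns results in argument order.
    return await asyncio.gather(fetch("a", 0.02), fetch("b", 0.01))


if __name__ == "__main__":
    print(asyncio.run(main()))  # ['a done', 'b done']
```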


Q & A