Page 1: Concurrency and Distributed systems

Concurrency and Distributed systems

... With Python today.

Jesse Noller

Saturday, March 28, 2009

Page 2: Concurrency and Distributed systems

30,000 Foot View

• Introduction

• Concurrency/Parallelism

• Distributed Systems

• Where Python is today

• Ecosystem

• Where can we go?

• Questions


Page 3: Concurrency and Distributed systems

Hello there!

• Who am I?

• Why am I doing this?

• Email: [email protected]

• Blog - http://www.jessenoller.com

• Pycon - http://jessenoller.com/category/pycon-2009/


Page 4: Concurrency and Distributed systems

Most of all, it’s fun!


Page 5: Concurrency and Distributed systems

No Code, Why?


Page 6: Concurrency and Distributed systems

Bike sheds


Page 7: Concurrency and Distributed systems

Concurrency

• What is it?

• Doing many things “at once”

• Typically local to the machine running the app.

• Implementation Options:

• threads / multiple processes

• cooperative multitasking

• coroutines

• asynchronous programming


Page 8: Concurrency and Distributed systems

... vs Parallelism

• What is it?

• Doing many things simultaneously

• Implementation options:

• threads

• multiple processes

• distributed systems


Page 9: Concurrency and Distributed systems

... vs Distributed Systems

• What is it?

• Doing many things, across multiple machines, simultaneously

• Many cores, on many machines

• There are many designs

• There are eight fallacies...


Page 10: Concurrency and Distributed systems

8 fallacies of distributed systems


Page 11: Concurrency and Distributed systems

The network is reliable


Page 12: Concurrency and Distributed systems

Latency is zero


Page 13: Concurrency and Distributed systems

Bandwidth is infinite


Page 14: Concurrency and Distributed systems

The network is secure


Page 15: Concurrency and Distributed systems

Topology doesn’t change


Page 16: Concurrency and Distributed systems

There is only one administrator


Page 17: Concurrency and Distributed systems

Transport cost is zero


Page 18: Concurrency and Distributed systems

The network is homogeneous


Page 19: Concurrency and Distributed systems

Summary

• All 3 are related to one another, the fundamental goals of which are to:

• Decrease latency

• Increase throughput

• Applications start simple, progress to concurrent systems and evolve into parallel, distributed systems

• As the system evolves, the fallacies become more pertinent; you have to account for them early
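
One hedged way to account for "the network is reliable" and "latency is zero" early is to wrap every remote call in a bounded retry with backoff. The sketch below is illustrative only: `FlakyService`, `call_with_retry`, and the delay values are hypothetical names and numbers that do not come from the talk, and the "network" is simulated in-process.

```python
import time


class FlakyService:
    """Simulated remote endpoint that fails a fixed number of times first."""

    def __init__(self, failures_before_success):
        self.remaining_failures = failures_before_success
        self.calls = 0

    def fetch(self):
        self.calls += 1
        if self.remaining_failures > 0:
            self.remaining_failures -= 1
            raise ConnectionError("simulated network failure")
        return "payload"


def call_with_retry(func, attempts=4, base_delay=0.01):
    """Call func(), retrying on ConnectionError with exponential backoff."""
    for attempt in range(attempts):
        try:
            return func()
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the failure to the caller
            time.sleep(base_delay * (2 ** attempt))


service = FlakyService(failures_before_success=2)
result = call_with_retry(service.fetch)
```

The retry count is bounded on purpose: a caller that retries forever only hides the failure instead of handling it.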


Page 20: Concurrency and Distributed systems


Page 21: Concurrency and Distributed systems

Where is (C)Python?

• We have threads. Shiny, real OS ones

• Except for the Global Interpreter Lock

• The GIL makes the interpreter easier to maintain

• ...And it simplifies extension module code


Page 22: Concurrency and Distributed systems

Is the GIL a problem?

• Yes. Sorta. Maybe. It depends.

• I/O Bound / C extensions release it!

• Most applications are I/O bound

• The GIL still has non-zero overhead

• The GIL is not going away*

• You can build concurrent applications regardless of the GIL

* ... more on this in a moment, dun dun dun.
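
A small sketch of why I/O-bound threads still help under the GIL: the interpreter releases the lock while a thread is blocked, so the waits overlap. Here `time.sleep` stands in for a blocking I/O call (an assumption for illustration), and `fake_io` and `run_threaded` are hypothetical names.

```python
import threading
import time


def fake_io(delay, results, index):
    """Stand-in for a blocking I/O call; time.sleep releases the GIL."""
    time.sleep(delay)
    results[index] = delay


def run_threaded(count=4, delay=0.2):
    """Run `count` simulated I/O waits in threads; return (results, seconds)."""
    results = [None] * count
    threads = [
        threading.Thread(target=fake_io, args=(delay, results, i))
        for i in range(count)
    ]
    start = time.perf_counter()
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results, time.perf_counter() - start


results, elapsed = run_threaded()
# The four waits overlap, so elapsed is close to one delay, not four.
```

A CPU-bound function in place of `fake_io` would not show the same overlap in CPython, which is the "it depends" part.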


Page 23: Concurrency and Distributed systems

Multiprocessing!

• Added in the 2.6/3.0 timeline, PEP 371

• Processes and IPC (via pipes) to allow parallelism

• Same(ish) API as threading and queue

• Includes Pool, remote Managers for data sharing over a network, etc

• Multiprocessing “outperforms” threading

• IPC requires pickle-ability. Incurs overhead
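
The pickle requirement can be seen without starting any processes. Objects sent through multiprocessing queues, pipes, and Pool calls are serialised with `pickle`, so the sketch below uses `pickle` directly; `task` and `is_picklable` are hypothetical names for illustration.

```python
import pickle

# A message as it might be sent to a worker process (plain data pickles fine).
task = {"id": 7, "values": list(range(5))}

wire_bytes = pickle.dumps(task)      # serialised before crossing the pipe
received = pickle.loads(wire_bytes)  # rebuilt on the other side as a copy


def is_picklable(obj):
    """Return True if obj survives pickle.dumps, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


# Lambdas have no importable name, so they cannot be pickled and therefore
# cannot be handed to a worker process as-is.
lambda_ok = is_picklable(lambda x: x * 2)
```

The receiver always gets a copy, never the original object, and the serialise/deserialise round trip is where the overhead comes from.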


Page 24: Concurrency and Distributed systems

Summary

• We have the Global Interpreter Lock

• We also have multiprocessing (no GIL)

• Threads (as an approach) are good for some problems

• They’re not impossible to use correctly

• While hampered, Python threads are still useful

• Python still allows you to leverage other approaches to concurrency


Page 25: Concurrency and Distributed systems

(remember that asterisk?)*


Page 26: Concurrency and Distributed systems

Jython

• Python on the JVM (in Java)

• 2.5-Compatible

• Frank and the others are awesome for resurrecting this project

• May allow Python in the Java door

• Pros:

• Unrestricted threading

• Hooray java.util.concurrent!

• Cons:

• No C extensions


Page 27: Concurrency and Distributed systems

IronPython

• Python on the .NET CLR

• 2.5.2 Compatible

• Matured rapidly, highly usable

• Great for Windows environments

• Pros:

• Unrestricted threading

• Some C extensions via Ironclad

• Cons:

• Mostly Windows only, barring Mono


Page 28: Concurrency and Distributed systems

Stackless

• Modified CPython interpreter

• Offers Coroutines, Channels - “lightweight threads”

• Cooperative multitasking (single thread executes)

• (mostly) Still alive courtesy of CCP Games

• Still has a GIL

• “Stackless is dead, long live PyPy”


Page 29: Concurrency and Distributed systems

PyPy

• Python written in (R)Python

• Getting close to 2.5-Compatibility

• Complete “rethink” of the interpreter

• Focusing on JIT/interpreter speed right now

• Still has the GIL

• Some Stackless features (e.g. coroutines, channels)

• Not mature


Page 30: Concurrency and Distributed systems

The Ecosystem


Page 31: Concurrency and Distributed systems

That’s a lot of nuts!

• When I started, I had around 40 libraries on my list

• Coroutines, messaging, frameworks, etc

• Python has a huge ecosystem of “stuff”

• Unfortunately, much of it is long in the tooth, or of beta quality

• New libraries/frameworks/approaches are coming out every week


Page 32: Concurrency and Distributed systems

Concurrency Frameworks


Page 33: Concurrency and Distributed systems

Twisted

• “OK, who hasn’t tried Twisted?”

• Asynchronous, Event Driven multitasking

• Vast networking library, large ecosystem

• Supports thread usage, but Twisted code may not be thread safe

• Supports using processes (not multiprocessing).

• Can be mind-bending


Page 34: Concurrency and Distributed systems

Kamaelia

• Came out of BBC Research

• Uses an easy to understand “components talking via mailboxes” approach

• Cooperative multitasking via generators by default.

• Honkin’ library of cool things

• Supports thread-based components as well

• Very easy to get up and running

• Abstracts IPC, Process, Threads, etc “away”
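
To make the "components talking via mailboxes" idea concrete, here is a plain-Python sketch using generators and `collections.deque` mailboxes. It deliberately does not use Kamaelia's API: `producer`, `doubler`, `collector`, and `run` are hypothetical names, and a real framework handles wiring, shutdown, and scheduling for you.

```python
from collections import deque


def producer(items, outbox):
    """Component: post one item per step, then a None sentinel."""
    for item in items:
        outbox.append(item)
        yield
    outbox.append(None)  # sentinel: no more messages


def doubler(inbox, outbox):
    """Component: forward each message doubled; pass the sentinel along."""
    while True:
        if inbox:
            message = inbox.popleft()
            if message is None:
                outbox.append(None)
                return
            outbox.append(message * 2)
        yield


def collector(inbox, received):
    """Component: store every message until the sentinel arrives."""
    while True:
        if inbox:
            message = inbox.popleft()
            if message is None:
                return
            received.append(message)
        yield


def run(components):
    """Round-robin scheduler: step each component until all have finished."""
    active = deque(components)
    while active:
        component = active.popleft()
        try:
            next(component)
        except StopIteration:
            continue  # finished: do not reschedule
        active.append(component)


a_to_b, b_to_c = deque(), deque()
received = []
run([
    producer([1, 2, 3], a_to_b),
    doubler(a_to_b, b_to_c),
    collector(b_to_c, received),
])
```

Each component only touches its own inbox and outbox, which is what lets a framework later swap a generator for a thread or a process behind the same mailbox.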


Page 35: Concurrency and Distributed systems

Frameworks

• Both Kamaelia and Twisted have nice networking support

• Both use schedulers which allow scheduled items to schedule other items

• Two different approaches to thinking about the problem

• Both can be used to build distributed apps

• Like all frameworks, you adopt the methodology


Page 36: Concurrency and Distributed systems

New: Concurrence

• New on the scene (’09) version 0.3

• Lightweight tasks-with-message passing

• Has a main scheduler/dispatcher

• Built on top of stackless/greenlets/libevent

• Network-oriented (HTTP, WSGI servers)

• Still raw (more docs please)

• Very promising (minus compilation problems)


Page 37: Concurrency and Distributed systems

Coroutines

• Coroutines are essentially light-weight threads of control; think micro/green threads

• Typically use explicit task switching (cooperative)

• Most implementations have a scheduler, and some communications method (e.g. pipes)

• Not parallel unless used in a distributed fashion

• Both Kamaelia and Twisted “fit” here

• Enhanced generators make these easy to build
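
A minimal sketch of both points, explicit task switching and enhanced generators, using only the standard library. `countdown`, `schedule`, and `running_total` are hypothetical names; the libraries mentioned in this talk add channels, I/O integration, and much more.

```python
from collections import deque


def countdown(name, start, log):
    """Microthread: record one step, then explicitly yield control."""
    for remaining in range(start, 0, -1):
        log.append(f"{name}:{remaining}")
        yield


def schedule(tasks):
    """Minimal scheduler: resume each task in turn until all are exhausted."""
    ready = deque(tasks)
    while ready:
        current = ready.popleft()
        try:
            next(current)
        except StopIteration:
            continue  # task finished, drop it
        ready.append(current)


def running_total():
    """Enhanced generator (PEP 342): values arrive through send()."""
    total = 0
    while True:
        value = yield total
        total += value


log = []
schedule([countdown("a", 2, log), countdown("b", 3, log)])

accumulator = running_total()
next(accumulator)  # prime the coroutine up to its first yield
totals = [accumulator.send(n) for n in (5, 10, 20)]
```

Everything runs in one thread: the tasks interleave only because each one gives up control at `yield`, which is the "not parallel" caveat in practice.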


Page 38: Concurrency and Distributed systems

Coroutine libraries

• Fibra: microthreads, tubes, scheduler

• Greenlet: C based, microthreads, no scheduler

• Eventlet: Network “framework” layer on top of greenlet. Has an Actor implementation \o/

• Circuits: Event-based, components/microthreads

• Cogen: network oriented, scheduler, microthreads

• Multitask: microthreads, no channels (it’s dead, Jim)


Page 39: Concurrency and Distributed systems

Actors

• Isolated, self reliant components

• Can spawn other Actors

• Communicate via message passing only (by value)

• Operate in parallel

• Communication is asynchronous

• A good model to overcome the fallacies

• See also: Erlang, Scala
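
The properties above can be sketched with one thread and one queue per actor. This is an illustrative toy, not the API of any actor library: `Actor`, `Collector`, and `Doubler` are hypothetical classes, `copy.deepcopy` is an assumed way to get by-value messages, and CPython threads give concurrency here rather than true parallelism.

```python
import copy
import queue
import threading


class Actor:
    """Hypothetical minimal actor: private state, a mailbox, one thread."""

    _STOP = object()

    def __init__(self):
        self._mailbox = queue.Queue()
        self._thread = threading.Thread(target=self._run)
        self._thread.start()

    def send(self, message):
        """Asynchronous send: enqueue a copy and return immediately."""
        self._mailbox.put(copy.deepcopy(message))

    def stop(self):
        """Let the actor finish its queued messages, then wait for it."""
        self._mailbox.put(self._STOP)
        self._thread.join()

    def _run(self):
        while True:
            message = self._mailbox.get()
            if message is self._STOP:
                return
            self.receive(message)

    def receive(self, message):
        raise NotImplementedError


class Collector(Actor):
    def __init__(self):
        self.seen = []
        super().__init__()

    def receive(self, message):
        self.seen.append(message)


class Doubler(Actor):
    def __init__(self, target):
        self.target = target
        super().__init__()

    def receive(self, message):
        self.target.send([n * 2 for n in message])


collector = Collector()
doubler = Doubler(collector)
original = [1, 2, 3]
doubler.send(original)
original.append(4)  # mutating after send does not affect the actor's copy
doubler.stop()      # drains the doubler's mailbox first
collector.stop()
```

Because the only way in is `send`, no actor ever shares mutable state with another, which is what makes the model a reasonable fit for unreliable, high-latency links.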


Page 40: Concurrency and Distributed systems

Actor Libraries

• Dramatis (alpha quality)

• Great start, excellent base to start working with them

• Parley (alpha quality)

• Another excellent start, supports actors in threads, greenlets or stackless tasklets

• Candygram (2004)

• Old, implements Erlang primitives, spawns in threads

• Kamaelia components can fit here(ish)


Page 41: Concurrency and Distributed systems

(local) Parallelism

• Multiprocessing

• Processes and IPC via the threading API, in Python-Core as of 2.6

• Parallel Python

• Allows local parallelism, but also distributed parallelism in a “full” package

• pprocess

• Another easy to use fork/process based package

• Has IPC mechanisms


Page 42: Concurrency and Distributed systems

Distributed Systems

• Lots of various technologies to help build something

• communications libraries

• socket/networking libraries

• message queues

• some shared memory implementations

• No “full stack” approach

• Most users end up rolling their own, using some combinations of libraries and tools


Page 43: Concurrency and Distributed systems

Distributed Processing

• Frameworks:

• Parallel Python is the closest for a processing cluster

• The Disco Project is an Erlang-based (with Python bindings) map-reduce framework
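
For readers new to map-reduce, the shape of the computation can be sketched in a single process. This is not Disco's API: `map_phase`, `shuffle`, `reduce_phase`, and `map_reduce` are hypothetical names, and a real framework would run the map and reduce calls on many machines and handle failures between them.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group emitted values by key, as a framework would between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def map_reduce(documents):
    """Run map over every document, group by key, then reduce each group."""
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


counts = map_reduce(["the network is reliable", "the network is secure"])
```

The map and reduce functions share no state, so each call can be shipped to a different worker, which is what makes the pattern distributable.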


Page 44: Concurrency and Distributed systems

RPC/Messaging

• RPC:

• Pyro

• RPyC

• Thrift

• Messaging:

• pySage

• python-spread

• XMPP

• Protocol Buffers


Page 45: Concurrency and Distributed systems

Shared Memory/Message Qs

• Shared Memory

• Posh (dead)

• Memcached

• posix_ipc

• Message Queues

• Apache ActiveMQ

• RabbitMQ

• Stomp

• MemcacheQ

• Beanstalkd


Page 46: Concurrency and Distributed systems

So...

• Where the hell do we point new users?

• While good, Twisted and Kamaelia have a documentation problem

• The rest is a mish-mash of technologies

• Concurrency is hard, let’s go shopping!


Page 47: Concurrency and Distributed systems

Where does this leave us?

• The GIL is here for the foreseeable future

• Not entirely a bad thing (extensions!)

• Python-Core is not the right place for much of this, but can provide some basics

• Actor implementation

• Java.util.concurrent-like abstractions

• Anything going in must make this work safe
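
As one illustration of what "java.util.concurrent-like abstractions" can look like in Python, the standard library's `concurrent.futures` module, which was added after this talk (Python 3.2, PEP 3148), provides executors and futures. The sketch below assumes a modern Python 3; `slow_square` is a hypothetical task.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
import time


def slow_square(n):
    """Hypothetical task: pretend to wait on I/O, then compute a result."""
    time.sleep(0.05)
    return n * n


with ThreadPoolExecutor(max_workers=4) as executor:
    # submit() returns a Future immediately; the work runs in the pool.
    futures = {executor.submit(slow_square, n): n for n in range(5)}
    # as_completed() yields futures in finish order, not submission order.
    squares = {futures[f]: f.result() for f in as_completed(futures)}
```

The caller never creates, starts, or joins a thread directly; the executor owns the pool and the `with` block waits for outstanding work on exit.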


Page 48: Concurrency and Distributed systems

Where does this leave us?

• Lots of great community work

• Continued room for growth, adoption of other languages’ technologies

• If we can build a stack of reusable, swappable components for all three areas: everyone wins

• Anyone for a “distributed Django”?

• “loose coupling and tight cohesion”

• Must take the fallacies into account


Page 49: Concurrency and Distributed systems

Django?

• The point of a framework is to make the easy things easy, and the hard things easier

• The abstractions must be leaky

• Go see abstractions as leverage!

• It must be safe

• It cannot ignore the fallacies

• I shall call it Mustaine (Megadeth)


Page 50: Concurrency and Distributed systems

Questions?


Page 51: Concurrency and Distributed systems

Fin.
