Celery - A Distributed Task Queue

Aug 16, 2015

Duy Đỗ

Outline

1. About

2. What is Celery?

3. Celery Architecture

4. Broker, Task, Worker

5. Monitoring

6. Coding

7. Q & A

About

A father, a husband and a software engineer

Passionate about distributed systems, real-time data processing and search engines

Work @sentifi as a backend engineer

Follow me @duydo

What is Celery?

Distributed Task Queue written in Python

Simple, fast, flexible, highly available, scalable

Mature, feature rich

Open source, BSD License

Large community

What is a Task Queue?

A task queue is a system for executing tasks in parallel

[Diagram: a Client sends tasks to the Broker, which distributes the tasks to Workers]

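To make the flow concrete, here is a toy, standard-library-only sketch of the same idea (not how Celery is implemented): a `queue.Queue` stands in for the broker, two threads stand in for workers, and a plain dict collects results. The names `tasks`, `results` and `worker` are illustrative only.

```python
import queue
import threading

tasks = queue.Queue()        # plays the role of the broker
results = {}                 # collects results by task id
results_lock = threading.Lock()

def worker():
    """Take tasks off the queue and run them until a None sentinel arrives."""
    while True:
        item = tasks.get()
        if item is None:     # sentinel: stop this worker
            tasks.task_done()
            break
        task_id, func, args = item
        value = func(*args)
        with results_lock:
            results[task_id] = value
        tasks.task_done()

def add(x, y):
    return x + y

# Two workers, like the two Workers in the diagram.
workers = [threading.Thread(target=worker) for _ in range(2)]
for w in workers:
    w.start()

# The client only sends tasks; it does not run them itself.
for i in range(5):
    tasks.put((i, add, (i, i)))

# Shut down: one sentinel per worker, then wait for both to exit.
for _ in workers:
    tasks.put(None)
for w in workers:
    w.join()

print(sorted(results.items()))   # [(0, 0), (1, 2), (2, 4), (3, 6), (4, 8)]
```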

Celery Architecture

[Diagram: Client 1 and Client 2 send tasks to the Broker, which holds Task Queue 1, Task Queue 2 … Task Queue N. The Broker distributes the tasks to Worker1 and Worker2, which store task results in the Task Result Storage; the clients get task results from there.]

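The architecture can be modelled in a few lines of plain Python. This is a synchronous toy under stated assumptions, not Celery code: `Broker`, `Worker`, `send_task`, the queue names `queue1`/`queue2` and the task names `add`/`mul` are all hypothetical. It shows named task queues, workers bound to a queue, and a result store keyed by task id that clients read from.

```python
import uuid
from collections import deque

class Broker:
    """Toy broker: one FIFO queue per name (Task Queue 1..N)."""
    def __init__(self):
        self.queues = {}

    def send(self, queue_name, message):
        self.queues.setdefault(queue_name, deque()).append(message)

    def fetch(self, queue_name):
        q = self.queues.get(queue_name)
        return q.popleft() if q else None

class Worker:
    """Toy worker: consumes one queue and stores results by task id."""
    def __init__(self, broker, queue_name, registry, result_store):
        self.broker = broker
        self.queue_name = queue_name
        self.registry = registry
        self.result_store = result_store

    def drain(self):
        while (msg := self.broker.fetch(self.queue_name)) is not None:
            func = self.registry[msg["task"]]
            self.result_store[msg["id"]] = func(*msg["args"])

def send_task(broker, queue_name, name, args):
    """Client side: publish a message and return its task id."""
    task_id = str(uuid.uuid4())
    broker.send(queue_name, {"id": task_id, "task": name, "args": args})
    return task_id

registry = {"add": lambda x, y: x + y, "mul": lambda x, y: x * y}
broker, result_store = Broker(), {}
workers = [Worker(broker, "queue1", registry, result_store),
           Worker(broker, "queue2", registry, result_store)]

id1 = send_task(broker, "queue1", "add", (1, 2))   # Client 1
id2 = send_task(broker, "queue2", "mul", (3, 4))   # Client 2
for w in workers:
    w.drain()
print(result_store[id1], result_store[id2])        # 3 12
```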

Broker

The middleman that holds the tasks (messages)

Celery supports:

• RabbitMQ, Redis

• MongoDB, CouchDB

• ZeroMQ, Amazon SQS, IronMQ

Task

A unit of work

A task message exists until it has been acknowledged

The result of a task can be stored or ignored

States: PENDING, STARTED, SUCCESS, FAILURE, RETRY, REVOKED

Periodic tasks (cron jobs)

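A toy model of "exists until it has been acknowledged" (not a real broker client; `AckQueue` and its methods are hypothetical): a delivered message moves to an "unacked" set and is only deleted on `ack`, so if the worker dies first it can be redelivered. Note that by default Celery acknowledges a task message just before executing it; the `acks_late` option moves the acknowledgement to after execution.

```python
from collections import deque

class AckQueue:
    """Toy queue with explicit acknowledgements."""
    def __init__(self):
        self.ready = deque()     # waiting to be delivered
        self.unacked = {}        # delivered but not yet acknowledged
        self._next_tag = 0

    def publish(self, body):
        self.ready.append(body)

    def deliver(self):
        body = self.ready.popleft()
        self._next_tag += 1
        self.unacked[self._next_tag] = body
        return self._next_tag, body

    def ack(self, tag):
        del self.unacked[tag]    # only now is the message really gone

    def requeue_unacked(self):
        # e.g. the worker's connection was lost before it acknowledged
        for tag in sorted(self.unacked):
            self.ready.append(self.unacked.pop(tag))

q = AckQueue()
q.publish("add(1, 2)")
tag, body = q.deliver()      # a worker receives the task...
q.requeue_unacked()          # ...but dies before acknowledging it
tag, body = q.deliver()      # the same message is delivered again
q.ack(tag)
print(body, len(q.ready), len(q.unacked))   # add(1, 2) 0 0
```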

Define Tasks

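    from celery import Celery
    app = Celery('ws')  # app name assumed; configure a broker to send tasks

    # function style
    @app.task
    def add(x, y):
        return x + y

    # class style (Celery 4+ also needs: app.register_task(AddTask()))
    class AddTask(app.Task):
        def run(self, x, y):
            return x + y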

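Both styles end up as named entries in a task registry. The toy below (pure Python, not Celery's implementation; `registry`, `task` and `Task` are hypothetical stand-ins) mimics that: the decorator registers a function under a `module.function` name, which is how Celery names tasks when no explicit name is given, and the class style delegates calls to `run()`.

```python
registry = {}

def task(func):
    """Toy stand-in for @app.task: register func under 'module.name'."""
    func.name = f"{func.__module__}.{func.__name__}"
    registry[func.name] = func
    return func

class Task:
    """Toy base class: calling an instance runs its run() method."""
    name = None

    def __call__(self, *args, **kwargs):
        return self.run(*args, **kwargs)

@task
def add(x, y):
    return x + y

class AddTask(Task):
    name = "ws.tasks.AddTask"       # hypothetical explicit name

    def run(self, x, y):
        return x + y

registry[AddTask.name] = AddTask()  # like app.register_task(AddTask())

print(registry[add.name](1, 2), registry["ws.tasks.AddTask"](1, 2))   # 3 3
```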

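Calling Tasks

apply_async(args[, kwargs[, …]])

delay(*args, **kwargs): a shortcut for apply_async(args, kwargs) that does not accept execution options

calling (__call__): runs the task locally, in the current process

e.g.:

• result = add.delay(1, 2)

• result = add.apply_async((1, 2), countdown=10)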

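The difference between the three calling styles can be sketched with a toy class (hypothetical `ToyTask`, not Celery's `Task`): `apply_async` builds a message with args, kwargs and execution options, `delay` is a shortcut that cannot pass options, and a plain call just runs the function in the current process without sending anything.

```python
class ToyTask:
    """Toy model of the three ways to call a task."""
    def __init__(self, func):
        self.func = func
        self.sent = []               # stands in for messages sent to the broker

    def apply_async(self, args=(), kwargs=None, **options):
        message = {"task": self.func.__name__, "args": tuple(args),
                   "kwargs": dict(kwargs or {}), "options": options}
        self.sent.append(message)    # a real client would publish this
        return message

    def delay(self, *args, **kwargs):
        # shortcut: no way to pass execution options such as countdown
        return self.apply_async(args, kwargs)

    def __call__(self, *args, **kwargs):
        # plain call: runs right here, nothing is sent
        return self.func(*args, **kwargs)

def _add(x, y):
    return x + y

add = ToyTask(_add)
m1 = add.delay(1, 2)
m2 = add.apply_async((1, 2), countdown=10)
print(m1["options"], m2["options"], add(1, 2), len(add.sent))   # {} {'countdown': 10} 3 2
```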

Calling Task Options

eta: a specific date and time; the earliest time at which the task will be executed

countdown: set eta as a number of seconds into the future

expires: set the task’s expiry time

serializer: pickle (the default before Celery 4.0, which switched to json), json, yaml and msgpack

compression: compress the messages using gzip or bzip2

queue: route the task to a specific queue

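How `countdown`, `eta` and `expires` relate can be shown with plain `datetime` arithmetic. This is a toy sketch; `resolve_eta` and `decide` are hypothetical helpers, and the "revoke" outcome reflects that an expired task is not executed rather than reproducing Celery's internals.

```python
from datetime import datetime, timedelta, timezone

def resolve_eta(now, countdown=None, eta=None):
    """countdown is relative (seconds from now); eta is an absolute time."""
    if countdown is not None:
        return now + timedelta(seconds=countdown)
    return eta if eta is not None else now

def decide(now, eta, expires=None):
    """Toy worker-side check: expired -> revoke, early -> wait, else run."""
    if expires is not None and now >= expires:
        return "revoke"
    return "run" if now >= eta else "wait"

sent_at = datetime(2015, 8, 16, 12, 0, tzinfo=timezone.utc)
eta = resolve_eta(sent_at, countdown=10)           # 12:00:10
expires = sent_at + timedelta(seconds=60)

print(decide(sent_at, eta, expires))                          # wait
print(decide(sent_at + timedelta(seconds=10), eta, expires))  # run
print(decide(sent_at + timedelta(seconds=90), eta, expires))  # revoke
```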

Task Result

result.ready(): True if the task has finished executing

result.successful(): True if the task executed successfully

result.result: the return value of the task, or the exception it raised

result.get(): blocks until the task is complete, then returns the result or re-raises the exception

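A toy result handle makes the semantics of these four members concrete. `ToyAsyncResult` is hypothetical (not Celery's `AsyncResult`) and uses a `threading.Event` where Celery would poll a result backend; `get()` blocks, then returns the value or re-raises the stored exception.

```python
import threading

class ToyAsyncResult:
    """Toy stand-in for a task result handle."""
    def __init__(self):
        self._done = threading.Event()
        self.state = "PENDING"
        self.result = None

    def set(self, value=None, exc=None):
        # called by the 'worker' side when the task finishes
        if exc is not None:
            self.state, self.result = "FAILURE", exc
        else:
            self.state, self.result = "SUCCESS", value
        self._done.set()

    def ready(self):
        return self._done.is_set()

    def successful(self):
        return self.state == "SUCCESS"

    def get(self, timeout=None):
        # block until the task is done, then return or re-raise
        if not self._done.wait(timeout):
            raise TimeoutError("task did not finish in time")
        if self.state == "FAILURE":
            raise self.result
        return self.result

r = ToyAsyncResult()
print(r.ready())                                    # False
threading.Thread(target=r.set, args=(3,)).start()   # the 'worker' finishes
print(r.get(timeout=1), r.ready(), r.successful())  # 3 True True
```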

Task Workflows

Signatures: Partials, Immutability, Callbacks

The Primitives: Chains, Groups, Chords, Map & Starmap, Chunks

Signatures

signature() wraps the args, kwargs and execution options of a single task invocation so that it can be:

• passed to functions

• serialized and sent across the wire

Signatures were called subtasks in earlier Celery versions

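The point of a signature is that it is plain data. Celery's own `Signature` is a `dict` subclass; the toy below (hypothetical `ToySignature`, using a dataclass plus JSON instead) shows the same idea: task name, args, kwargs and options travel together and survive a round trip over the wire.

```python
import json
from dataclasses import dataclass, field, asdict

@dataclass
class ToySignature:
    """Toy signature: which task to call, with what, and how."""
    task: str
    args: tuple = ()
    kwargs: dict = field(default_factory=dict)
    options: dict = field(default_factory=dict)

    def to_json(self):
        return json.dumps(asdict(self))        # can be sent across the wire

    @classmethod
    def from_json(cls, text):
        data = json.loads(text)
        data["args"] = tuple(data["args"])     # JSON turns tuples into lists
        return cls(**data)

s = ToySignature("ws.tasks.add", (1, 2), options={"countdown": 10})
wire = s.to_json()
print(wire)
print(ToySignature.from_json(wire) == s)   # True
```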

Create Signatures

    from celery import signature

    # ws.tasks.add(1, 2)
    s = signature('ws.tasks.add', args=(1, 2), countdown=10)
    s = add.subtask((1, 2), countdown=10)
    s = add.s(1, 2)
    s = add.s(1, 2, debug=True)

    # inspect fields (of the last signature above)
    s.args       # (1, 2)
    s.kwargs     # {'debug': True}
    s.options    # {}; the first two signatures have {'countdown': 10}

    # execute as a task
    s.delay()
    s.apply_async()
    s()          # runs the task locally, in the current process

Partial Signatures

Passing additional args, kwargs or options to apply_async/delay on an incomplete signature (a partial) completes it; new positional args are prepended to the signature’s args

• partial = add.s(1)

• partial.delay(2)  # add(2, 1)

• partial.apply_async((2,))  # add(2, 1)

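The merge rule for partials can be written down as a small pure function. This is a toy model of our understanding of Celery's behaviour (`apply_partial` is hypothetical): new positional args go in front of the stored ones, and call-time kwargs and options override stored values with the same key.

```python
def apply_partial(sig_args, sig_kwargs, sig_options,
                  args=(), kwargs=None, options=None):
    """Toy model of calling a partial signature."""
    merged_args = tuple(args) + tuple(sig_args)          # new args go first
    merged_kwargs = {**sig_kwargs, **(kwargs or {})}     # call-time wins
    merged_options = {**sig_options, **(options or {})}
    return merged_args, merged_kwargs, merged_options

# partial = add.s(1); partial.delay(2)
print(apply_partial((1,), {}, {}, args=(2,)))   # ((2, 1), {}, {})

# partial = add.s(1, debug=True); partial.apply_async((2,), countdown=5)
print(apply_partial((1,), {"debug": True}, {}, args=(2,), options={"countdown": 5}))
# ((2, 1), {'debug': True}, {'countdown': 5})
```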

Immutable Signatures

An immutable signature can only be given new options; extra args and kwargs (such as a parent task’s result) are not applied

Use si() to create an immutable signature

• add.si(1, 2)

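Extending the same toy merge with an `immutable` flag shows what `si()` changes. This is a hypothetical helper, not Celery code: options are still merged, but any extra positional args, such as a parent task's result, are dropped.

```python
def apply_signature(sig_args, sig_options, args=(), options=None, immutable=False):
    """Toy: options are always merged; new args only if not immutable."""
    merged_options = {**sig_options, **(options or {})}
    if immutable:
        return tuple(sig_args), merged_options    # extra args are ignored
    return tuple(args) + tuple(sig_args), merged_options

# add.si(1, 2) receiving a parent's result 99 and a new countdown
print(apply_signature((1, 2), {}, args=(99,), options={"countdown": 5}, immutable=True))
# ((1, 2), {'countdown': 5})
print(apply_signature((1, 2), {}, args=(99,), options={"countdown": 5}))
# ((99, 1, 2), {'countdown': 5})
```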

Callback Signatures

Use the link argument of apply_async to add callbacks; the parent task’s result is prepended to the callback’s args

add.apply_async((1, 2), link=add.s(3))  # add(1, 2), then add(3, 3)

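A synchronous toy version of `link` (hypothetical `run_with_link`, not Celery's implementation) shows where the callback's first argument comes from: the parent's return value is placed in front of the callback's own args.

```python
def add(x, y):
    return x + y

def run_with_link(func, args, link=None):
    """Toy: run func(*args); if a callback (func, extra_args) is linked,
    call it with the parent's result prepended to its args."""
    result = func(*args)
    callback_result = None
    if link is not None:
        callback, cb_args = link
        callback_result = callback(result, *cb_args)
    return result, callback_result

# add.apply_async((1, 2), link=add.s(3))
print(run_with_link(add, (1, 2), link=(add, (3,))))   # (3, 6)
```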

Group

A signature that takes a list of tasks to be applied in parallel

s = group(add.s(i, i) for i in range(5))

s().get() => [0, 2, 4, 6, 8]

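A local analogue of a group can be built with `concurrent.futures` (a toy using threads in one process, not Celery's distributed group; `run_group` is hypothetical). All calls are submitted up front so they can run in parallel, and results come back in the original order, as `s().get()` does.

```python
from concurrent.futures import ThreadPoolExecutor

def add(x, y):
    return x + y

def run_group(calls, max_workers=4):
    """Toy group: submit every (func, args) call at once, then collect
    the results in the original order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(func, *args) for func, args in calls]
        return [f.result() for f in futures]

print(run_group([(add, (i, i)) for i in range(5)]))   # [0, 2, 4, 6, 8]
```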

Chain

A chain of callbacks; think of it as a pipeline

c = chain(add.s(1, 2), add.s(3), add.s(4))

c = add.s(1, 2) | add.s(3) | add.s(4)

c().get() => ((1 + 2) + 3) + 4 = 10

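A chain is a pipeline in which each step receives the previous step's result as its first argument. The sequential toy below (hypothetical `run_chain`, no broker involved) reproduces the `((1 + 2) + 3) + 4` evaluation.

```python
def add(x, y):
    return x + y

def run_chain(first, *rest):
    """Toy chain: run `first`, then feed each result into the next step
    as its first argument."""
    func, args = first
    result = func(*args)
    for func, args in rest:
        result = func(result, *args)
    return result

print(run_chain((add, (1, 2)), (add, (3,)), (add, (4,))))   # 10
```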

Chord

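Like a group, but with a callback that receives the list of the group’s results (here xsum, a task that sums a list of numbers)

c = chord((add.s(i, i) for i in range(5)), xsum.s())

c().get() => 20

result = chord(add.s(i, i) for i in range(5))(xsum.s())

result.get() => 20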

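A chord is "group, then one callback over all the results". The toy below (threads in one process; `run_chord` and this local `xsum` are hypothetical stand-ins for the slide's tasks) runs the header calls in parallel and calls the body once with the collected list.

```python
from concurrent.futures import ThreadPoolExecutor

def add(x, y):
    return x + y

def xsum(numbers):   # hypothetical: like the xsum task on the slide
    return sum(numbers)

def run_chord(header, body):
    """Toy chord: run the header calls in parallel, then call `body`
    once with the list of all header results."""
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(func, *args) for func, args in header]
        results = [f.result() for f in futures]
    return body(results)

print(run_chord([(add, (i, i)) for i in range(5)], xsum))   # 20
```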

Map

Like the built-in map function, but a single task calls the task on each element in turn

c = task.map([1, 2, 3])

c() => [task(1), task(2), task(3)]

Starmap

Same as map, except each element is applied as *args

c = add.starmap([(1, 2), (3, 4)])

c() => [add(1, 2), add(3, 4)]

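The difference between map and starmap is only in how each element is passed. A plain-Python toy (hypothetical `map_task`, `starmap_task` and `square`; the "single task" is just a function call here) using the standard `itertools.starmap`:

```python
from itertools import starmap

def add(x, y):
    return x + y

def square(x):   # hypothetical single-argument task
    return x * x

def map_task(func, items):
    """Toy: one 'task' that calls func on each item, sequentially."""
    return [func(item) for item in items]

def starmap_task(func, items):
    """Toy: same, but each item is unpacked as *args."""
    return list(starmap(func, items))

print(map_task(square, [1, 2, 3]))            # [1, 4, 9]
print(starmap_task(add, [(1, 2), (3, 4)]))    # [3, 7]
```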

Chunks

Chunking splits a long list of args into parts

items = list(zip(range(10), range(10)))

c = add.chunks(items, 5)

c().get() => [[0, 2, 4, 6, 8], [10, 12, 14, 16, 18]]

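The chunking step itself is easy to sketch. In this toy (hypothetical `chunked` and `run_chunks`; nothing is sent to a broker), each chunk of at most `n` argument tuples would correspond to one task message, and the calls inside a chunk run one after another.

```python
from itertools import islice

def add(x, y):
    return x + y

def chunked(items, n):
    """Split an iterable of argument tuples into lists of at most n items."""
    it = iter(items)
    while True:
        chunk = list(islice(it, n))
        if not chunk:
            return
        yield chunk

def run_chunks(func, items, n):
    """Toy chunks: one inner list of results per chunk."""
    return [[func(*args) for args in chunk] for chunk in chunked(items, n)]

items = list(zip(range(10), range(10)))
print(run_chunks(add, items, 5))   # [[0, 2, 4, 6, 8], [10, 12, 14, 16, 18]]
```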

Worker

Auto reloading

Auto scaling

Time & Rate Limits

Resource Leak Protection

Scheduling

User Components

Autoreloading

Automatically reload the worker’s source code as it changes

celery worker --autoreload (removed in Celery 4.0)

Autoscaling

Dynamically resize the worker pool depending on load or on custom metrics defined by the user

celery worker --autoscale=8,2

=> min processes: 2, max processes: 8

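The `--autoscale=8,2` bounds can be illustrated with a hypothetical sizing policy: one process per waiting task, clamped to the range [2, 8]. This is an assumption made for illustration, not Celery's actual autoscaler logic.

```python
def target_pool_size(waiting_tasks, min_procs=2, max_procs=8):
    """Hypothetical policy: one process per waiting task, kept within
    [min_procs, max_procs] (the --autoscale=8,2 bounds)."""
    return max(min_procs, min(waiting_tasks, max_procs))

print([target_pool_size(n) for n in (0, 1, 5, 50)])   # [2, 2, 5, 8]
```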

Time & Rate Limits

Rate limits: the number of tasks allowed per second/minute/hour

Time limits: how long a task is allowed to run

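A token bucket is a common way to enforce a rate limit such as "60 tasks per minute". The toy below is a sketch under assumptions: `parse_rate` accepts strings in the same style as Celery's `rate_limit` option ('10/s', '100/m', '1000/h'), while `TokenBucket` and its injectable clock are hypothetical, added here so the example is deterministic.

```python
import time

def parse_rate(rate):
    """Convert '10/s', '100/m' or '1000/h' into tokens per second."""
    count, unit = rate.split("/")
    return float(count) / {"s": 1, "m": 60, "h": 3600}[unit]

class TokenBucket:
    """Toy token bucket: refills at a fixed rate, holds at most `capacity`."""
    def __init__(self, rate, capacity=1, clock=time.monotonic):
        self.fill_rate = parse_rate(rate)
        self.capacity = capacity
        self.tokens = capacity            # start full
        self.clock = clock
        self.last = clock()

    def can_consume(self, tokens=1):
        now = self.clock()
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.fill_rate)
        self.last = now
        if self.tokens >= tokens:
            self.tokens -= tokens
            return True
        return False

# Fake clock so the example is deterministic.
fake_now = [0.0]
bucket = TokenBucket("60/m", clock=lambda: fake_now[0])   # 1 task per second
print(bucket.can_consume(), bucket.can_consume())   # True False
fake_now[0] = 1.0
print(bucket.can_consume())                         # True
```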

Resource Leak Protection

Limit the number of tasks a pool worker process can execute before it’s replaced by a new one

celery worker --maxtasksperchild=10 (--max-tasks-per-child in Celery 4+)

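The recycling idea can be shown without real processes. In this toy (hypothetical `RecyclingPool`; a "generation" counter stands in for starting a fresh process), a worker is retired after `max_tasks` tasks. Python's standard `multiprocessing.Pool` offers the same idea through its `maxtasksperchild` parameter.

```python
class RecyclingPool:
    """Toy: replace the 'process' after it has run max_tasks tasks."""
    def __init__(self, max_tasks):
        self.max_tasks = max_tasks
        self.generation = 0      # how many processes have been started
        self._start_new()

    def _start_new(self):
        self.generation += 1
        self.tasks_done = 0

    def run(self, func, *args):
        if self.tasks_done >= self.max_tasks:
            self._start_new()    # old process retired, any leaked memory freed
        self.tasks_done += 1
        return func(*args)

pool = RecyclingPool(max_tasks=10)
for i in range(25):
    pool.run(lambda x: x * 2, i)
print(pool.generation, pool.tasks_done)   # 3 5
```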

Scheduling

Specify when a task should run:

after a number of seconds (countdown) or at a given date and time (eta)

periodic tasks (intervals, crontab expressions)

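The two periodic styles can be sketched in plain Python. This is a toy (hypothetical `next_interval_run`, `field_matches`, `cron_matches`) that checks only the minute and hour fields; the field syntax mirrors strings such as `'*/15'` accepted by `celery.schedules.crontab`, but the matching logic here is our own simplification.

```python
from datetime import datetime, timedelta

def next_interval_run(last_run, every_seconds):
    """Interval schedule: the next run is a fixed delta after the last one."""
    return last_run + timedelta(seconds=every_seconds)

def field_matches(spec, value):
    """Match one crontab-style field: '*', '*/n', 'a,b,c' or a single number."""
    if spec == "*":
        return True
    if spec.startswith("*/"):
        return value % int(spec[2:]) == 0
    return value in {int(part) for part in spec.split(",")}

def cron_matches(when, minute="*", hour="*"):
    """Toy crontab check on minute and hour only."""
    return field_matches(minute, when.minute) and field_matches(hour, when.hour)

print(next_interval_run(datetime(2015, 8, 16, 12, 0), 30))                   # 2015-08-16 12:00:30
print(cron_matches(datetime(2015, 8, 16, 7, 30), minute="*/15", hour="7"))  # True
print(cron_matches(datetime(2015, 8, 16, 7, 31), minute="*/15", hour="7"))  # False
```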

User Components

Internally, Celery uses a dependency graph of worker components, called “bootsteps”, which enables fine-grained control of the worker

Customize the worker components, e.g. ConsumerStep

Add new components

Bootsteps http://celery.readthedocs.org/en/latest/userguide/extending.html

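Ordering components by their dependencies is a topological sort, in the same spirit as a bootstep's `requires` attribute. The toy below uses the standard `graphlib` module (Python 3.9+); the step names and their dependencies are hypothetical and simplified, and only show that a user-defined step starts after everything it requires.

```python
from graphlib import TopologicalSorter   # Python 3.9+

# Hypothetical, simplified step names; each maps to the steps it requires.
steps = {
    "Pool": set(),
    "Hub": set(),
    "Consumer": {"Pool", "Hub"},
    "MyConsumerStep": {"Consumer"},   # a user-defined component
}

start_order = list(TopologicalSorter(steps).static_order())
print(start_order)   # Pool and Hub first (either order), then Consumer, then MyConsumerStep
```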

Monitoring

Flower - Real-time Celery web monitor

• Task progress and history

• Show task details (arguments, start time, runtime, and more)

• Graphs and statistics

• Shutdown, restart worker instances

• Control worker pool size, autoscaling settings

• …

Coding…

Get your hands dirty…

–Duy Do (@duydo)

Thank you
