Transcript
Outline
1. About
2. What is Celery?
3. Celery Architecture
4. Broker, Task, Worker
5. Monitoring
6. Coding
7. Q & A
About
A father, a husband and a software engineer
Passionate about distributed systems, real-time data processing and search engines
Works as a backend engineer @sentifi
Follow me @duydo
What is Celery?
Distributed Task Queue written in Python
Simple, fast, flexible, highly available, scalable
Mature, feature rich
Open source, BSD License
Large community
What is Task Queue?
A task queue is a system for parallel execution of tasks
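As a toy illustration of the idea (plain Python, no Celery): clients put tasks on a shared queue, and worker threads pull and execute them in parallel:

```python
import queue
import threading

tasks = queue.Queue()
results = []

def worker():
    # each worker pulls tasks off the queue and executes them
    while True:
        func, args = tasks.get()
        results.append(func(*args))
        tasks.task_done()

# start two workers
for _ in range(2):
    threading.Thread(target=worker, daemon=True).start()

# the client sends five tasks
for i in range(5):
    tasks.put((pow, (i, 2)))

tasks.join()  # wait until every task has been processed
print(sorted(results))  # [0, 1, 4, 9, 16]
```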
Celery Architecture
[Diagram: the Client sends tasks to the Broker; the Broker distributes tasks to Workers]
[Diagram: Clients (Client 1, Client 2) send tasks to the Broker, which holds them in task queues (Task Queue 1, Task Queue 2, …, Task Queue N); the Broker distributes tasks to Workers (Worker 1, Worker 2); Workers store task results in the Task Result Storage, from which clients get task results]
Broker
The middleman that holds the tasks (messages)
Celery supports:
• RabbitMQ, Redis
• MongoDB, CouchDB
• ZeroMQ, Amazon SQS, IronMQ
Task
A unit of work
A task message exists in the queue until it has been acknowledged by a worker
Results of tasks can be stored or ignored
States: PENDING, STARTED, SUCCESS, FAILURE, RETRY, REVOKED
Periodic tasks (like cron jobs)
Define Tasks
# function style
@app.task
def add(x, y):
    return x + y

# class style
class AddTask(app.Task):
    def run(self, x, y):
        return x + y
Calling Tasks
apply_async(args[, kwargs[, …]])
delay(*args, **kwargs)
calling (__call__)
e.g.:
• result = add.delay(1, 2)
• result = add.apply_async((1, 2), countdown=10)
Calling Task Options
• eta: a specific datetime that is the earliest time at which the task will be executed
• countdown: set eta by seconds into the future
• expires: set the task’s expiry time
• serializer: pickle (default), json, yaml or msgpack
• compression: compress the messages using gzip or bzip2
• queue: route the task to a different queue
Task Result
• result.ready(): True if the task has been executed
• result.successful(): True if the task executed successfully
• result.result: the return value of the task, or the exception it raised
• result.get(): blocks until the task is complete; returns the result or re-raises the exception
Task Workflows
Signatures: Partials, Immutability, Callbacks
The Primitives: Chains, Groups, Chords, Map & Starmap, Chunks
Signatures
signature() wraps args, kwargs, options of a single task invocation in a way such that it can be:
• passed to functions
• serialized and sent across the wire
Signatures are also known as subtasks
Create Signatures
# ws.tasks.add(1, 2)
s = signature('ws.tasks.add', args=(1, 2), countdown=10)
s = add.subtask((1, 2), countdown=10)
s = add.s(1, 2)
s = add.s(1, 2, debug=True)

# inspect fields
s.args     # (1, 2)
s.kwargs   # {'debug': True}
s.options  # {'countdown': 10}

# execute as a task
s.delay()
s.apply_async()
s()
Partial Signatures
Specifying additional args, kwargs or options to apply_async/delay creates a partial
• partial = add.s(1)
• partial.delay(2) # 1 + 2
• partial.apply_async((2,)) # 1 + 2
Immutable Signatures
An immutable signature’s args and kwargs cannot be changed; only options can be set
Use si() to create an immutable signature
• add.si(1, 2)
Callbacks Signatures
Use the link arg of apply_async to add callbacks
add.apply_async((1, 2), link=add.s(3))
Group
group() takes a list of signatures that should be applied in parallel
s = group(add.s(i, i) for i in xrange(5))
s().get() => [0, 2, 4, 6, 8]
Chain
Chain of callbacks, think pipeline
c = chain(add.s(1, 2), add.s(3), add.s(4))
c = add.s(1, 2) | add.s(3) | add.s(4)
c().get() => ((1 + 2) + 3) + 4
Chord
Like a group but with a callback
c = chord((add.s(i, i) for i in xrange(5)), xsum.s())
c().get() => 20
# or apply the header with the callback directly
res = chord(add.s(i, i) for i in xrange(5))(xsum.s())
res.get() => 20
Starmap
Same as map, except the args are applied as *args
c = add.starmap([(1, 2), (3, 4)])
c() => [add(1, 2), add(3, 4)]
Chunks
Chunking splits a long list of args into parts
items = zip(xrange(10), xrange(10))
c = add.chunks(items, 5)
c() => [0, 2, 4, 6, 8], [10, 12, 14, 16, 18]
Worker
Auto reloading
Auto scaling
Time & Rate Limits
Resource Leak Protection
Scheduling
User Components
Autoscaling
Dynamically resize the worker pool depending on load or custom metrics defined by the user
celery worker --autoscale=8,2
=> min processes: 2, max processes: 8
Resource Leak Protection
Limit the number of tasks a pool worker process can execute before it is replaced by a new one
celery worker --maxtasksperchild=10
Scheduling
Specify the time to run a task
in seconds or at a specific datetime
periodic tasks (interval or crontab expressions)
User Components
Celery uses a dependency graph enabling fine-grained control of the worker internals, called “bootsteps”
Customize the worker components, e.g. ConsumerStep
Add new components
Bootsteps: http://celery.readthedocs.org/en/latest/userguide/extending.html
Monitoring
Flower - Real-time Celery web monitor
• Task progress and history
• Show task details (arguments, start time, runtime, and more)
• Graphs and statistics
• Shutdown, restart worker instances
• Control worker pool size, autoscaling settings
• …