Distributed Task Execution Vitalii Tymchyshyn [email protected]
Jun 23, 2015

Does your architecture need distributed task processing? This presentation will help you define the criteria for choosing an implementation.
Transcript
Page 1: Distributed Task Execution

Distributed Task Execution

Vitalii Tymchyshyn [email protected]

Page 2: Distributed Task Execution

Participants

● Task source
● Tasks
● Task distribution server(s)
● Task executors
● Monitoring & reporting subsystem

Page 3: Distributed Task Execution

Task source types & main problems

Dynamic task source

● Pull vs push source
● Buffering
● Flow control

Large set of tasks to execute

● Batching
● Restart on failures
● Progress & problems monitoring
● Tail slowdown
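
Buffering and flow control for a push source can be illustrated with a bounded buffer: when the buffer is full, the push is rejected, so the source has to slow down instead of piling up tasks. This is only a minimal sketch; `BoundedBuffer`, `offer` and `poll` are hypothetical names, not something the presentation defines.

```python
from collections import deque


class BoundedBuffer:
    """Buffer between a push source and pull executors (hypothetical helper)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._items = deque()

    def offer(self, task):
        """Push side: return False when full, so the source must back off."""
        if len(self._items) >= self.capacity:
            return False
        self._items.append(task)
        return True

    def poll(self):
        """Pull side: return the oldest task, or None when the buffer is empty."""
        return self._items.popleft() if self._items else None


buffer = BoundedBuffer(capacity=2)
accepted = [buffer.offer(t) for t in ("t1", "t2", "t3")]  # third push is rejected
first = buffer.poll()                                      # an executor frees a slot
retry_ok = buffer.offer("t3")                              # the retry now succeeds
```

A real source would typically wait or retry with a delay after a rejected push; that policy is left out here.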

Page 4: Distributed Task Execution

Tasks variety

● Uniform vs different
● Idempotent vs transactional
● Fat vs thin
● Fire and forget vs tasks with result

Page 5: Distributed Task Execution

Uniform tasks

[Diagram: Source, Distributor, and three Executors]

Page 6: Distributed Task Execution

Differently typed tasks

[Diagram: Source, Distributor, two Executors of type A, and two Executors of type B]
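
One way to picture a distributor for differently typed tasks is a separate queue per task type, with each executor pulling only from the queue of its own type. The sketch below assumes tasks are dictionaries with a `type` field; `distribute` and `take` are hypothetical names used for illustration.

```python
from collections import defaultdict, deque

# One FIFO queue per task type (assumed layout, not from the presentation).
queues = defaultdict(deque)


def distribute(task):
    """Distributor side: route the task to the queue of its type."""
    queues[task["type"]].append(task)


def take(executor_type):
    """Executor side: pull the next task of the executor's own type, if any."""
    queue = queues[executor_type]
    return queue.popleft() if queue else None


for task in ({"id": 1, "type": "A"}, {"id": 2, "type": "B"}, {"id": 3, "type": "A"}):
    distribute(task)

taken_by_b = take("B")                          # a type B executor sees only B tasks
taken_by_a = [take("A"), take("A"), take("A")]  # third pull finds the queue empty
```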

Page 7: Distributed Task Execution

Use idempotent tasks!

● This means that a task executed twice does no harm (usually the second call is a no-op)
● This makes batching a lot easier
● This makes the task source/coordinator a lot easier, especially in the distributed case
● This is usually quite easy
● This gives you quite strong guarantees
● A transaction spanning two transactional resources is not guaranteed

Page 8: Distributed Task Execution

Regular model

We need to perform a task to move $X from A to B:

1. Take the task and remove it from the queue
2. Start a database transaction
3. Perform the move
4. Commit the database transaction

If there are any problems between (1) and (4), your move is lost.

Page 9: Distributed Task Execution

Transactional model

We need to perform a task to move $X from A to B:

1. Take the task
2. Start a database transaction
3. Perform the move
4. Commit the database transaction
5. Ack the task

If there are any problems between (4) and (5), your move is duplicated.
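
The difference between the regular and the transactional model can be reproduced with a toy simulation. In the sketch below the queue is a plain list, the "database" is a dictionary of balances, and a single simulated crash is injected at the critical point of each model; `run`, `Crash` and the flag names are hypothetical and exist only for this illustration.

```python
class Crash(Exception):
    """Simulated executor failure."""


def run(queue, balances, ack_after_commit, crash_once):
    """Process every task; a task still in the queue is delivered again."""
    crashes_left = 1 if crash_once else 0
    while queue:
        amount = queue[0]
        try:
            if not ack_after_commit:
                queue.pop(0)            # regular model: remove the task on take
                if crashes_left:
                    crashes_left -= 1
                    raise Crash()       # failure before the commit
            balances["A"] -= amount     # perform and "commit" the move
            balances["B"] += amount
            if ack_after_commit:
                if crashes_left:
                    crashes_left -= 1
                    raise Crash()       # failure between commit and ack
                queue.pop(0)            # ack: only now the task leaves the queue
        except Crash:
            continue                    # executor restarts and takes the next task


regular = {"A": 100, "B": 0}
run([10], regular, ack_after_commit=False, crash_once=True)

transactional = {"A": 100, "B": 0}
run([10], transactional, ack_after_commit=True, crash_once=True)
```

After these runs `regular` is unchanged (the move was lost), while `transactional` shows the move applied twice.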

Page 10: Distributed Task Execution

Idempotent model

We need to perform task T to move $X from A to B:

1. Take the task
2. Start a database transaction
3. Create a move with id T, or skip if the move already exists
4. Commit the database transaction
5. Ack the task

If there are any problems between (4) and (5), the task will be tried once more and do no harm.
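
The idempotent model could look roughly like this with a relational database. The sketch uses Python's built-in `sqlite3` with an in-memory database; the `accounts` and `moves` tables and the `execute_move` function are assumptions made for the example, not a schema given in the presentation. The task id is the primary key of `moves`, so a redelivered task finds its own record and skips the transfer.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
db.execute("CREATE TABLE moves (task_id TEXT PRIMARY KEY, src TEXT, dst TEXT, amount INTEGER)")
db.executemany("INSERT INTO accounts VALUES (?, ?)", [("A", 100), ("B", 0)])
db.commit()


def execute_move(task_id, src, dst, amount):
    """Apply the move once; return False when this task was already applied."""
    with db:  # commits on success, rolls back if an exception is raised
        exists = db.execute(
            "SELECT 1 FROM moves WHERE task_id = ?", (task_id,)
        ).fetchone()
        if exists:
            return False  # second delivery of the same task is a no-op
        db.execute("INSERT INTO moves VALUES (?, ?, ?, ?)", (task_id, src, dst, amount))
        db.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))
        db.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))
        return True


first = execute_move("T", "A", "B", 10)    # applies the move
second = execute_move("T", "A", "B", 10)   # redelivery after a lost ack: skipped
balances = dict(db.execute("SELECT name, balance FROM accounts"))
```

Because the move record and the balance updates are committed in the same database transaction, only one transactional resource is involved.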

Page 11: Distributed Task Execution

Dealing with fat tasks

● Most queues do not tolerate fat tasks
● Move the fat to storage
● Don't forget to GC your fat
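
A minimal sketch of moving the fat to storage: the payload goes to a separate store, the queue carries only a key, and the payload is deleted once the task has been processed. The `blob_store` dictionary and the `queue` list are in-memory stand-ins for real storage and a real queue, and `submit` and `execute_next` are hypothetical names.

```python
import uuid

blob_store = {}   # stand-in for external storage (assumption)
queue = []        # stand-in for the task queue (assumption)


def submit(payload):
    """Store the fat payload separately and enqueue only a thin reference."""
    key = str(uuid.uuid4())
    blob_store[key] = payload
    queue.append({"payload_key": key})


def execute_next(handler):
    """Take the next thin task, load its payload, run the handler, then GC."""
    task = queue.pop(0)
    key = task["payload_key"]
    result = handler(blob_store[key])
    del blob_store[key]   # GC the fat once the task is done
    return result


submit(b"x" * 1_000_000)
size = execute_next(len)   # the handler here just measures the payload
```

In this sketch the payload is collected right after the handler returns; payloads of tasks that never complete would still need a separate cleanup.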

Page 12: Distributed Task Execution

What result can task generate

● Task finished indicator
● OK / ERROR outcome
● Various task statistics to be stored / aggregated (e.g. time taken / resources used)
● Business-level task result

This means that a business-level "fire and forget" task is usually not "fire and forget" at the operations level.

Page 13: Distributed Task Execution

Task tail problem

[Plot: speed over time]

Page 14: Distributed Task Execution

Task tail problem

80% of tasks complete in 20% of the time :)

● Let an average task take 10 seconds
● Let a slow task take 100 seconds
● Let 1% of tasks be slow
● This means that out of the last 100 tasks, at least one will be slow with 63% probability
● This means the last 100 tasks will most likely take 100 seconds, no matter the number of executors
● Even if we have an executor for each task, the whole set will take 100 seconds
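
The 63% figure follows from the numbers on the slide, under the assumption that each task is slow independently of the others: the chance that none of 100 tasks is slow is 0.99 to the power of 100, and the complement is the chance of at least one slow task. The variable names are illustrative.

```python
p_slow = 0.01   # share of slow tasks, from the slide
tail = 100      # number of tasks in the tail, from the slide

# Probability that at least one of the last `tail` tasks is slow,
# assuming tasks are slow independently of each other.
p_at_least_one_slow = 1 - (1 - p_slow) ** tail

print(f"{p_at_least_one_slow:.0%}")  # prints 63%
```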

Page 15: Distributed Task Execution

Task tail problem solutions

● Start slow tasks early
● If slowness is variable, start the slow task multiple times in parallel
● Cut your tail
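
Starting a task multiple times in parallel can be sketched with the standard `concurrent.futures` module: several copies of the same task are submitted and the first result to arrive wins. `run_speculatively` and `simulated_attempt` are hypothetical names, and the sleeps only simulate variable slowness. Since every copy still runs to completion, this approach assumes the task is idempotent.

```python
import time
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def run_speculatively(attempts):
    """Start every attempt in parallel and return the first finished result."""
    pool = ThreadPoolExecutor(max_workers=len(attempts))
    futures = [pool.submit(attempt) for attempt in attempts]
    done, _ = wait(futures, return_when=FIRST_COMPLETED)
    pool.shutdown(wait=False)   # do not block on the slower copies
    return next(iter(done)).result()


def simulated_attempt(duration):
    """Pretend to do the work; the duration stands in for variable slowness."""
    time.sleep(duration)
    return duration


started = time.monotonic()
winner = run_speculatively([
    lambda: simulated_attempt(0.3),    # this copy happens to be slow
    lambda: simulated_attempt(0.01),   # this copy happens to be fast
])
elapsed = time.monotonic() - started
```

If the first copy to finish failed, `result()` re-raises its exception; a fuller version would wait for the next copy instead.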

Page 16: Distributed Task Execution

ActiveMQ (JMS)

● Easy for "fire and forget" tasks
● Problematic with a lot of "in-flight" tasks
● A pain to configure (does not work well out of the box)
● Complex to monitor/control a single task

Page 17: Distributed Task Execution

ZooKeeper as a distributor for a dynamic task source

● Done with a small custom module
● Push task source / pull task executor design
● Task priorities / timeouts / reprocessing logic are easy with a custom module
● Task monitoring / reporting is done by the task source

Page 18: Distributed Task Execution

Custom task execution solution

● Pluggable task sources, supporting JMS / RDBMS / plain file sources for different deployments
● Complex task reprocessing scheme with timeouts & customizable affinity
● RDBMS source provides per-task execution information
● "Killer" task detection
● Configurable batching abilities for fast processing
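
The presentation does not describe how "killer" task detection works. One plausible reading is a task that repeatedly brings down its executor, in which case a simple approach is to count deliveries per task and quarantine a task once it exceeds a limit. The sketch below follows that assumption; `MAX_ATTEMPTS`, `should_run` and the in-memory counters are hypothetical and would need to live in durable storage in a real system.

```python
from collections import Counter

MAX_ATTEMPTS = 3      # assumed threshold, not from the presentation
attempts = Counter()  # deliveries per task id
quarantined = set()   # tasks that are no longer handed to executors


def should_run(task_id):
    """Count a delivery up front; refuse tasks that used up their attempts."""
    if task_id in quarantined:
        return False
    attempts[task_id] += 1
    if attempts[task_id] > MAX_ATTEMPTS:
        quarantined.add(task_id)
        return False
    return True


# A task that crashes its executor every time is redelivered again and again.
decisions = [should_run("t42") for _ in range(5)]
```

The attempt is counted before execution, so a delivery that kills the executor is still recorded.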

Page 19: Distributed Task Execution

Q & A

Questions are welcome!

● Voice (red ale preferred ☺)
● [email protected]
● @tivv00
● tinyurl.com/LinkedTIVV