Aug 22, 2015
Control your data right way
TPL Dataflow crash course
Mikhail Veselov
Moscow
20 April 2015
History overview
§ TPL Dataflow is another abstraction level
§ Reactive Extensions & Flow sync
§ TBB Flow Graph analog
§ AAL → CCR → TDF
TPL Dataflow – crash course Introduction
[Diagram: Data, Action, Recycle, Cache, Thread Pool, Threads]
Main idea - why Dataflow?
§ Define your application dataflow
§ Async I/O and CPU-oriented code, high-throughput & low-latency
§ Random, unstructured data (compare with Parallel and PLINQ)
§ Rx IObservable<T> support
§ Easy start: ActionBlock<T> & BufferBlock<T>
Example: a simple GZIP compression schema
[Diagram: FIFO queue, Action]
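The "easy start" pair can be sketched as a queue feeding an action. This is a minimal sketch, not the speaker's code: the Compress helper, the variable names and the sample payload are assumptions made for illustration.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;
using System.Threading.Tasks.Dataflow;

// Hypothetical helper: GZIP-compress a byte array in memory.
static byte[] Compress(byte[] input)
{
    using var output = new MemoryStream();
    using (var gzip = new GZipStream(output, CompressionMode.Compress))
    {
        gzip.Write(input, 0, input.Length);
    }
    return output.ToArray();
}

long compressedBytes = 0;

// FIFO queue that producers write to.
var queue = new BufferBlock<byte[]>();

// Action that drains the queue, one message at a time by default.
var compressor = new ActionBlock<byte[]>(data => compressedBytes += Compress(data).Length);

// Forward completion so that finishing the queue also finishes the action.
queue.LinkTo(compressor, new DataflowLinkOptions { PropagateCompletion = true });

for (int i = 0; i < 3; i++)
    queue.Post(Encoding.UTF8.GetBytes(new string('a', 10_000)));

queue.Complete();
await compressor.Completion;
Console.WriteLine($"Compressed 30000 input bytes into {compressedBytes} bytes");
```

The producer only talks to the BufferBlock, so the consumer can later be swapped or parallelized without touching the producing code.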
Interfaces - IDataflowBlock
public interface IDataflowBlock
{
    void Complete();
    void Fault(Exception error);
    Task Completion { get; }
}

// Manually propagate completion (or a fault) to the next block
block.Completion.ContinueWith(t =>
{
    if (t.IsFaulted) ((IDataflowBlock)nextBlock).Fault(t.Exception);
    else nextBlock.Complete();
});
TPL Dataflow – crash course Fundamental Interfaces
§ IDataflowBlock – base interface, no abstract implementation
§ Task Completion composes with the usual Task methods:
– WhenAll
– ContinueWith
§ AggregateException wraps the previous block's fault
§ CancellationToken
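A minimal sketch of how a fault travels down a link, assuming the link is created with PropagateCompletion = true instead of a hand-written ContinueWith. The block names and the failing input are made up for the example.

```csharp
using System;
using System.Threading.Tasks.Dataflow;

var first = new TransformBlock<int, int>(x =>
{
    if (x == 3) throw new InvalidOperationException("bad input: 3");
    return x * 2;
});
var second = new ActionBlock<int>(x => Console.WriteLine(x));

// PropagateCompletion forwards both Complete() and Fault() to the target.
first.LinkTo(second, new DataflowLinkOptions { PropagateCompletion = true });

first.Post(1);
first.Post(3);
first.Complete();

string observed = "none";
try
{
    await second.Completion;
}
catch (Exception ex)
{
    // The downstream block reports the upstream fault wrapped in AggregateException,
    // so flatten it to reach the original exception.
    var root = ex is AggregateException agg ? agg.Flatten().InnerException : ex;
    observed = root?.GetType().Name ?? "unknown";
}
Console.WriteLine(observed);
```

Each block adds its own layer of wrapping, which is why Flatten() is usually the first call in a pipeline-level error handler.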
Interfaces - ITargetBlock
public interface ITargetBlock<in TInput> : IDataflowBlock
{
    DataflowMessageStatus OfferMessage(
        DataflowMessageHeader messageHeader,
        TInput messageValue,
        ISourceBlock<TInput> source,
        bool consumeToAccept);
}
§ DataflowMessageStatus:
– Accepted – OK, I'll handle it
– Declined – No, take it back
– Postponed – Maybe, please call back later :)
– NotAvailable – Tried to consume with no luck
– DecliningPermanently – No, and don't call me anymore :(
§ bool consumeToAccept:
– When true, the target must call ConsumeMessage on the source synchronously, before OfferMessage returns
Interfaces - ISourceBlock
public interface ISourceBlock<out TOutput> : IDataflowBlock
{
    IDisposable LinkTo(ITargetBlock<TOutput> target, DataflowLinkOptions linkOptions);

    bool ReserveMessage(            // prepare
        DataflowMessageHeader messageHeader, ITargetBlock<TOutput> target);

    TOutput ConsumeMessage(         // commit
        DataflowMessageHeader messageHeader, ITargetBlock<TOutput> target,
        out bool messageConsumed);

    void ReleaseReservation(        // rollback
        DataflowMessageHeader messageHeader, ITargetBlock<TOutput> target);
}
§ 2-phase commit protocol (a.k.a. transaction)
Advanced Interfaces
§ IPropagatorBlock<TInput, TOutput>
public interface IPropagatorBlock<in TInput, out TOutput> :
    ITargetBlock<TInput>, ISourceBlock<TOutput> { }
§ 1-by-1 linking vs 1-by-n linking
§ IReceivableSourceBlock<TOutput>
public interface IReceivableSourceBlock<TOutput> : ISourceBlock<TOutput>
{
    bool TryReceive(Predicate<TOutput> filter, out TOutput item);
    bool TryReceiveAll(out IList<TOutput> items);
}
§ Easier data processing
Pure Buffering Blocks
§ BufferBlock<T>
– FIFO queue
– Producer/Consumer
var dataToProcess = new BufferBlock<FacebookDTO>();
await dataToProcess.SendAsync(newVideo);  // asynchronous offer, may wait for capacity
dataToProcess.Post(newRepost);            // synchronous offer, returns false if declined
await dataToProcess.SendAsync(newLike);
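The difference between Post and SendAsync shows up once the buffer is bounded. A minimal sketch, assuming a capacity of one item chosen only to make the buffer fill up immediately:

```csharp
using System;
using System.Threading.Tasks.Dataflow;

var buffer = new BufferBlock<int>(new DataflowBlockOptions { BoundedCapacity = 1 });

bool firstPost = buffer.Post(1);     // accepted: there is room
bool secondPost = buffer.Post(2);    // declined immediately: the buffer is full

var pendingSend = buffer.SendAsync(3);          // postponed until room appears
bool completedEarly = pendingSend.IsCompleted;  // still waiting at this point

int received = buffer.Receive();     // frees the single slot
bool sendAccepted = await pendingSend;

Console.WriteLine($"{firstPost} {secondPost} {completedEarly} {received} {sendAccepted}");
```

Post never waits, so a producer that must not lose data under back-pressure should await SendAsync instead.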
TPL Dataflow – crash course Built-in Dataflow Blocks
Pure Buffering Blocks
§ BroadcastBlock<T>
– Each new item overwrites the current one
– No receivers – the item is dropped
var bb = new BroadcastBlock<ImageDTO>(i => i);
var saveToDisk = new ActionBlock<ImageDTO>(item => item.Image.Save(item.Path));
var showInUi = new ActionBlock<ImageDTO>(
    item => imagePanel.AddImage(item.Image),
    new ExecutionDataflowBlockOptions
    {
        // Run the UI update on the UI thread
        TaskScheduler = TaskScheduler.FromCurrentSynchronizationContext()
    });
bb.LinkTo(saveToDisk);
bb.LinkTo(showInUi);
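A console-only sketch of the same fan-out, with the disk and UI targets replaced by two in-memory queues. The file names and variable names are assumptions for illustration.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

var saved = new ConcurrentQueue<string>();
var shown = new ConcurrentQueue<string>();

// The cloning function returns the same instance here; strings are immutable.
var broadcast = new BroadcastBlock<string>(s => s);
var saveToDisk = new ActionBlock<string>(s => saved.Enqueue(s));
var showInUi = new ActionBlock<string>(s => shown.Enqueue(s));

var link = new DataflowLinkOptions { PropagateCompletion = true };
broadcast.LinkTo(saveToDisk, link);
broadcast.LinkTo(showInUi, link);

broadcast.Post("a.png");
broadcast.Post("b.png");
broadcast.Complete();
await Task.WhenAll(saveToDisk.Completion, showInUi.Completion);

Console.WriteLine($"saved: {saved.Count}, shown: {shown.Count}");
```

Both targets are unbounded, so neither declines a message; a bounded target that is full at the moment of the offer would simply miss that item.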
Pure Buffering Blocks
§ WriteOnceBlock<T>
– Singleton
var writeOnce = new WriteOnceBlock<Lazy<Task<T>>>(i => i);
writeOnce.Post(new Lazy<Task<T>>(() => Task.Run(amadeusConnectionFactory)));
var lazyValue = await writeOnce.ReceiveAsync();
// The factory runs once, on first access; awaiting the task yields the connection
var connection = await lazyValue.Value;
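The "singleton" behaviour can be checked in isolation. A minimal sketch with plain strings instead of a lazy connection; the values are placeholders.

```csharp
using System;
using System.Threading.Tasks.Dataflow;

var writeOnce = new WriteOnceBlock<string>(s => s);

bool first = writeOnce.Post("primary");    // accepted
bool second = writeOnce.Post("fallback");  // declined: a value is already stored

// Every receiver sees the same stored value; receiving does not remove it.
string a = await writeOnce.ReceiveAsync();
string b = writeOnce.Receive();

Console.WriteLine($"{first} {second} {a} {b}");
```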
Executor Blocks
§ ActionBlock<TInput>
var chooser = new ActionBlock<PostDTO>(post => { Process(post); });
// At most three messages held by the block at once
var threeMessageAtOnce = new ExecutionDataflowBlockOptions
{
    BoundedCapacity = 3,
    TaskScheduler = TaskScheduler.Current
};
// At most three messages processed per underlying Task
var threePerTask = new ExecutionDataflowBlockOptions { MaxMessagesPerTask = 3 };
Executor Blocks
§ TransformBlock<TInput, TOutput>
– Output queue preserves the input order
// The asynchronous delegate returns Task<byte[]>; the block unwraps it to byte[]
var gzipper = new TransformBlock<byte[], byte[]>(b => Task.Run(() => Compress(b)));
var RSAEncryptor = new TransformBlock<byte[], byte[]>(z => RSA(z));
gzipper.LinkTo(RSAEncryptor);
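The ordering guarantee is easiest to see when later inputs finish first. A minimal sketch, assuming artificial delays and a parallelism of four picked only for the demonstration:

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

// Earlier inputs take longer, so they finish last when run in parallel.
var slowSquare = new TransformBlock<int, int>(async n =>
{
    await Task.Delay((5 - n) * 50);
    return n * n;
}, new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 4 });

for (int n = 1; n <= 4; n++) slowSquare.Post(n);
slowSquare.Complete();

var results = new List<int>();
while (await slowSquare.OutputAvailableAsync())
    results.Add(slowSquare.Receive());

// Outputs are released in input order, not in completion order.
Console.WriteLine(string.Join(", ", results));
```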
Executor Blocks
§ TransformManyBlock<TInput, TOutput>
– Produces zero or more items per input message
– Output can be a Task
// .SelectMany() analog
var tagCloudAggregator = new TransformManyBlock<TagFromPost[], TagFromPost>(
    arrayOfTags => arrayOfTags);
var filteringTags = new TransformManyBlock<TagFromPost, TagFromPost>(
    async tag => await filter(tag) ? new[] { tag } : Enumerable.Empty<TagFromPost>());
tagCloudAggregator.LinkTo(filteringTags);
// provide info to UI
IList<TagFromPost> tagDataSource;
if (filteringTags.TryReceiveAll(out tagDataSource))
    tagCloudControl.Show(tagDataSource);
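A self-contained sketch of the "zero or more items per input" rule, using a word splitter as a stand-in for the tag pipeline. The inputs are arbitrary sample strings.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks.Dataflow;

// One input line yields as many outputs as it has words.
var splitter = new TransformManyBlock<string, string>(
    line => line.Split(' ', StringSplitOptions.RemoveEmptyEntries));

splitter.Post("tpl dataflow");
splitter.Post("");              // produces zero outputs
splitter.Post("blocks");
splitter.Complete();

var words = new List<string>();
while (await splitter.OutputAvailableAsync())
    words.Add(splitter.Receive());

Console.WriteLine(string.Join(",", words));
```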
Executor Blocks
§ DataflowBlock.NullTarget<TInput>()
– Recycle bin: accepts and discards every message
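Joining Blocks
§ BatchBlock<T>
– accumulate and run
// Time-based batches: flush whatever has accumulated once a second
var timedBatch = new BatchBlock<T>(batchSize: Int32.MaxValue);
new Timer(delegate { timedBatch.TriggerBatch(); }).Change(1000, 1000);
// Size-based batches: emit an array of 100 items at a time
var batch = new BatchBlock<T>(batchSize: 100);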
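A minimal sketch of size-based batching, assuming a batch size of three and seven inputs so that the last batch is incomplete:

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks.Dataflow;

var batcher = new BatchBlock<int>(batchSize: 3);

for (int i = 1; i <= 7; i++) batcher.Post(i);
batcher.Complete();   // flushes the incomplete final batch

var sizes = new List<int>();
while (await batcher.OutputAvailableAsync())
    sizes.Add(batcher.Receive().Length);

Console.WriteLine(string.Join(", ", sizes));
```

Calling TriggerBatch() has the same flushing effect as Complete() for the items collected so far, but leaves the block open for more input.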
Joining Blocks
§ JoinBlock<T1, T2, …>
– make a Tuple
– Starvation problem
var throttle = new JoinBlock<SyntaxTree, Request>();
for (int i = 0; i < 10; ++i)
    throttle.Target1.Post(new SyntaxTree());
var processor = new TransformBlock<Tuple<SyntaxTree, Request>, SyntaxTree>(pair =>
{
    var request = pair.Item2;
    var resource = pair.Item1;
    request.ProcessWith(resource);
    return resource;
});
throttle.LinkTo(processor);
processor.LinkTo(throttle.Target1);
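The pairing rule behind the throttle can be sketched with two primitive inputs. The values are placeholders; the point is that an item without a partner stays inside the block.

```csharp
using System;
using System.Threading.Tasks.Dataflow;

var join = new JoinBlock<int, string>();

join.Target1.Post(1);
join.Target1.Post(2);          // has no partner yet
join.Target2.Post("one");

Tuple<int, string> pair = await join.ReceiveAsync();
Console.WriteLine($"{pair.Item1} / {pair.Item2}");

// The unmatched 2 waits until Target2 receives another item.
bool hasMore = join.TryReceive(out var next);
Console.WriteLine(hasMore);
```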
Joining Blocks
§ BatchedJoinBlock<T1, T2, …>
– accumulate Tuples and run
try
{
    batchedJoin.Target1.Post(DoWork());
    batchedJoin.Target2.Post(default(T2));
}
catch (Exception e)
{
    batchedJoin.Target2.Post(e);
    batchedJoin.Target1.Post(default(T1));
}
// Item1 – results from Target1
// Item2 – results from Target2
await batchedJoin.ReceiveAsync();
Configuration Options – TPL support
§ TaskScheduler & SynchronizationContext
– TaskScheduler.Default is the default
– TaskScheduler.Current is not the default
– ConcurrentExclusiveSchedulerPair
§ MaxDegreeOfParallelism
– Set via ExecutionDataflowBlockOptions
– By default a block processes one message at a time
§ CancellationToken
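A rough way to observe the default of one message at a time is to count how many delegates are in flight. This is a sketch under assumptions: MeasurePeakConcurrency is a hypothetical helper, and the eight jobs with a 50 ms delay are arbitrary.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

// Hypothetical helper: runs eight short async jobs through an ActionBlock and
// reports the highest number of jobs that were in flight at the same moment.
static async Task<int> MeasurePeakConcurrency(ExecutionDataflowBlockOptions options)
{
    int inFlight = 0, peak = 0;
    var gate = new object();

    var worker = new ActionBlock<int>(async _ =>
    {
        lock (gate) { inFlight++; peak = Math.Max(peak, inFlight); }
        await Task.Delay(50);
        lock (gate) { inFlight--; }
    }, options);

    foreach (var i in Enumerable.Range(0, 8)) worker.Post(i);
    worker.Complete();
    await worker.Completion;
    return peak;
}

int defaultPeak = await MeasurePeakConcurrency(new ExecutionDataflowBlockOptions());
int parallelPeak = await MeasurePeakConcurrency(
    new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 4 });

Console.WriteLine($"default: {defaultPeak}, MaxDegreeOfParallelism = 4: {parallelPeak}");
```

The second number is an upper bound rather than a guarantee: the block never exceeds the configured degree, but the scheduler decides how many delegates actually overlap.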
TPL Dataflow – crash course Advanced topics
Configuration Options – load balancing
§ MaxMessagesPerTask
– Blocks try to minimize the number of Tasks
§ MaxNumberOfGroups
– Grouping blocks complete automatically after that many groups
§ Greedy
– How batches and joins are assembled
§ BoundedCapacity
– load balancing
– queue size
var taskSchedulerPair = new ConcurrentExclusiveSchedulerPair();
var readerActions =
    from checkBox in new[] { checkBox1, checkBox2, checkBox3 }
    select new ActionBlock<int>(milliseconds =>
    {
        toggleCheckBox.Post(checkBox);
        Thread.Sleep(milliseconds);
        toggleCheckBox.Post(checkBox);
    },
    new ExecutionDataflowBlockOptions
    {
        TaskScheduler = taskSchedulerPair.ConcurrentScheduler
    });
Static extension methods – data process
§ Choose
– Multiple sources for an action
– Only one message will be processed
§ OutputAvailableAsync
– Analog of the Stack.Peek operation
§ Post/SendAsync
– Data propagation is always asynchronous
– A target can postpone a message sent with SendAsync, but not one sent with Post
§ Receive(Async)
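A minimal sketch of OutputAvailableAsync and Choose working together. The two sources and their contents are invented for the example.

```csharp
using System;
using System.Threading.Tasks.Dataflow;

var likes = new BufferBlock<int>();
var comments = new BufferBlock<string>();
comments.Post("nice talk");

// Peek-like check: reports that data is available without removing it.
bool ready = await comments.OutputAvailableAsync();
int stillBuffered = comments.Count;

// Choose consumes exactly one message from whichever source has data first
// and returns the zero-based index of that source.
string handled = "nothing";
int winner = await DataflowBlock.Choose(
    likes, count => handled = $"likes: {count}",
    comments, text => handled = $"comment: {text}");

Console.WriteLine($"{ready} {stillBuffered} {winner} {handled}");
```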
Static extension methods – outer API
§ Encapsulate (propagator block factory method)
§ LinkTo
– Filter the message propagation
– Link options
– Do not confuse with the ISourceBlock<T> method
§ AsObservable / AsObserver
– Rx extension support, no holy war here
new DataflowLinkOptions
{
    MaxMessages = 1,
    Append = false,
    PropagateCompletion = true
}
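The filtering overload of LinkTo can be sketched as follows. Routing rejected items to NullTarget is one common choice, assumed here so that unmatched messages do not pile up at the head of the source.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks.Dataflow;

var source = new BufferBlock<int>();
var evens = new List<int>();
var evenSink = new ActionBlock<int>(n => evens.Add(n));

// Extension overload: link options plus a predicate that filters the propagation.
source.LinkTo(evenSink,
    new DataflowLinkOptions { PropagateCompletion = true },
    n => n % 2 == 0);

// Anything the first link declines falls through to the recycle bin.
source.LinkTo(DataflowBlock.NullTarget<int>());

for (int i = 1; i <= 6; i++) source.Post(i);
source.Complete();
await evenSink.Completion;

Console.WriteLine(string.Join(", ", evens));
```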
Deep inside
§ Implement your own block
§ Advanced debug info with DebuggerDisplayAttribute
§ Chapter #4 in Concurrency in C# Cookbook by Stephen Cleary
Gathering it up
TPL Dataflow – crash course Ending
Gains
§ Thread safety
§ Structured dataflow
§ Async, TPL-oriented
§ Rx-Extension support
§ Easy migration of CCR-oriented code
When to use
§ You have random data that must be ordered
§ CPU and I/O operations
§ You can parallelize work
Losses
§ Another abstraction layer
§ Hard to debug when the dataflow is complicated
§ Too many generics
I am at your disposal in case of any questions or doubts
Mikhail Veselov
Moscow
+7 911 951 42 98