8/14/2019 Topic 2: Untimed Model of Computation
1/46
Topic 2: Untimed Model of Computation
Seyed Hosein Attarzadeh Niaki
KTH
November 17, 2009
Seyed Hosein Attarzadeh Niaki (KTH) Topic 2: Untimed Model of Computation November 17, 2009 1 / 39
Outline
1 Kahn Process Networks
2 Dataflow Process Networks
3 Synchronous Dataflow Process Networks
   Embedded Software Synthesis
   Cyclo-Static Dataflow
4 Communicating Sequential Processes
1 Kahn Process Networks
Introduction
Kahn process networks (KPN) were introduced as a simple language for parallel programming
programs are composed of computing stations which communicate over one-way FIFO-like channels using blocking read (wait) and non-blocking write (send) semantics
communication lines are the only way by which computing stations may communicate
a communication line transmits information within an unpredictable but finite amount of time
at any given time, a computing station is either computing or waiting for information on one of its input lines
each computing station follows a sequential program
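As a rough illustration of these semantics, the following Python sketch runs two computing stations as threads connected by unbounded FIFO queues. The names `producer`, `doubler`, `run_network` and the `END` marker are hypothetical and only serve this example; `queue.Queue.get` plays the role of the blocking read, and `put` on an unbounded queue never blocks.

```python
import queue
import threading

END = object()  # hypothetical end-of-stream marker, not part of the KPN model


def producer(out, n):
    """Computing station that sends 0 .. n-1 on its output line."""
    for i in range(n):
        out.put(i)          # unbounded queue: the write (send) never blocks
    out.put(END)


def doubler(inp, out):
    """Computing station that doubles every token it reads."""
    while True:
        x = inp.get()       # blocking read (wait)
        if x is END:
            out.put(END)
            return
        out.put(2 * x)


def run_network(n):
    """Connect producer -> doubler and collect the output history."""
    a, b = queue.Queue(), queue.Queue()
    threads = [
        threading.Thread(target=producer, args=(a, n)),
        threading.Thread(target=doubler, args=(a, b)),
    ]
    for t in threads:
        t.start()
    result = []
    while True:
        x = b.get()
        if x is END:
            break
        result.append(x)
    for t in threads:
        t.join()
    return result
```

Since the stations communicate only through blocking reads on FIFO lines, `run_network(5)` returns `[0, 2, 4, 6, 8]` regardless of how the threads are interleaved.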
Model Properties
the formal model is based on associating continuous functions from the history of node inputs to the history of their outputs
monotonicity: a machine need not have all of its inputs to start computing, since future input concerns only future output
continuity prevents a station from deciding to send some output only after it has received an infinite amount of input
the set of assigned functions together with the inputs forms a set of fixpoint equations with a unique minimal solution
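The minimal solution can be approximated by iterating from the empty history. The sketch below is only an illustration under stated assumptions: the node function `node` (emit 0, then every input value plus one, fed back to itself) and the truncation to `limit` tokens are hypothetical, chosen so that the iteration works on finite histories and terminates.

```python
def node(x, limit):
    """Monotone map on finite histories: emit 0, then each input value plus one.

    The truncation to `limit` tokens is an assumption that keeps histories finite.
    """
    return ([0] + [v + 1 for v in x])[:limit]


def least_fixpoint(f, limit):
    """Iterate x = f(x) starting from the empty history until nothing changes."""
    x = []
    while True:
        nxt = f(x, limit)
        if nxt == x:
            return x
        x = nxt
```

Each iterate is a prefix of the next one ([], [0], [0, 1], ...), which is what monotonicity means for histories; `least_fixpoint(node, 5)` stops at `[0, 1, 2, 3, 4]`.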
Recursion
recursive parallel programs allow an unbounded number of machines to compute in parallel
a recursive parallel schema F may include the node F itself, with the same occurrence of input and output lines
start unfolding not when the input is present, but when the output is requested (laziness?)
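A small sketch of such a recursive schema, written with Python generators, which are demand-driven and therefore unfold only when output is requested. The sieve of Eratosthenes used here is a common textbook example, and the names `sieve` and `first_primes` are hypothetical; this is not taken from the slides.

```python
from itertools import count, islice


def sieve(stream):
    """Recursive schema: the body contains another instance of `sieve` itself."""
    p = next(stream)                       # read the first token of the input line
    yield p                                # send it on the output line
    # Unfold a new filter stage plus a nested sieve; this only runs on demand.
    yield from sieve(x for x in stream if x % p != 0)


def first_primes(n):
    """Request n tokens from the output, which forces n levels of unfolding."""
    return list(islice(sieve(count(2)), n))
```

Requesting more output creates more filter stages, so the number of active stages is not bounded in advance.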
2 Dataflow Process Networks
Introduction
a dataflow actor, when it fires, maps input tokens into output tokens
a set of firing rules specifies when an actor can fire
a firing consumes input tokens and produces output tokens
a sequence of such firings is a particular type of Kahn process that we call a dataflow process
a network of such processes is called a dataflow process network
dataflow process networks are a special case of Kahn process networks
Streams
Streams and functions operating on them are a natural way to model reactive systems. Streams can be interpreted in different ways:
defining streams recursively using the cons operator and treating them with functional semantics
as a two-element cell: the head of the stream and a procedure for computing the rest of the stream (recursion without lazy semantics)
streams as channels (as in KPN)
representing streams as recursive cons cells and translating them to channels for implementation
associating a clock with each stream (Lustre and Signal)
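The two-element-cell interpretation can be sketched in a few lines of Python. This is only an illustration; `integers_from` and `take` are hypothetical helper names, and the tuple `(head, rest)` stands for the cell, with `rest` a procedure that computes the remainder of the stream when called.

```python
def integers_from(n):
    """Stream as a two-element cell: the head and a procedure for the rest."""
    return (n, lambda: integers_from(n + 1))


def take(stream, k):
    """Collect the first k elements by repeatedly calling the rest procedure."""
    out = []
    while k > 0:
        head, rest = stream
        out.append(head)
        stream = rest()   # compute the next cell explicitly, no lazy language needed
        k -= 1
    return out
```

For example, `take(integers_from(3), 4)` gives `[3, 4, 5, 6]`.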
Firing Rules
an actor with $p \geq 1$ input streams can have $N$ firing rules
$R = \{R_1, R_2, \ldots, R_N\}$ (1)
the actor can fire when one or more of its firing rules is satisfied
$R_i = \{R_{i,1}, R_{i,2}, \ldots, R_{i,p}\}$ (2)
for firing rule $i$ to be satisfied, each pattern $R_{i,j}$ must form a prefix of the sequence of unconsumed tokens at input $j$.
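firing rules example (a select actor with two data inputs and a Boolean control input):
$R_1 = \{[*], \bot, [T]\}$ (3)
$R_2 = \{\bot, [*], [F]\}$ (4)
here $[*]$ matches any single token and $\bot$ is the empty pattern, which is satisfied by any input sequence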
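The prefix test behind firing rules can be sketched as follows. This is a minimal illustration, not a definitive implementation: `ANY`, `BOTTOM`, `matches`, `enabled_rule` and `SELECT_RULES` are hypothetical names, and the two rules of the example are read here as a wildcard pattern matching any one token and an empty pattern that requires no token.

```python
ANY = object()   # stands for the wildcard that matches any single token
BOTTOM = []      # the empty pattern: a prefix of every sequence, even an empty one


def matches(pattern, tokens):
    """True if `pattern` forms a prefix of the unconsumed `tokens`."""
    if len(tokens) < len(pattern):
        return False
    return all(p is ANY or p == t for p, t in zip(pattern, tokens))


def enabled_rule(rules, inputs):
    """Index of the first satisfied firing rule, or None if the actor cannot fire."""
    for i, rule in enumerate(rules):
        if all(matches(p, q) for p, q in zip(rule, inputs)):
            return i
    return None


# The two rules of the example: data input 1, data input 2, Boolean control input.
SELECT_RULES = [
    [[ANY], BOTTOM, [True]],
    [BOTTOM, [ANY], [False]],
]
```

With a token on the first data input and `True` on the control input, `enabled_rule(SELECT_RULES, [[7], [], [True]])` returns `0`; with `False` on the control input and no token on the second data input, no rule is satisfied and the result is `None`.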
Continuity, Monotonicity, Sequentiality
for a dataflow process to be continuous, each actor firing must be functional, and the set of firing rules must be sequential
functional means that an actor firing lacks side effects and that the output tokens are purely a function of the consumed input tokens
sequential means that the firing rules can be tested in a predefined order using only blocking reads
Execution Models
Concurrent Processes: multitasking in a demand-driven style; configuring the network on the fly is allowed (recursive process definition); context switches are expensive
Dynamic Scheduling of Dataflow Process Networks: instead of context switching, a scheduler tracks the actors' inputs and fires the enabled actors
Static Scheduling of Dataflow Process Networks: possible for synchronous dataflow networks; avoids the overhead of a run-time scheduler (except for select and switch)
Compilation of Dataflow Graphs: static memory allocation for efficient compilation
The Tagged-Token Model: each token has a tag; firing is enabled by the presence of identically tagged tokens on the inputs; no need for FIFO queues
3 Synchronous Dataflow Process Networks
Introduction
Synchronous DataFlow (SDF) is a special case of dataflow
there is always a single firing rule or, equivalently, the number of tokens consumed and produced on each input/output in a single firing of an actor is fixed and known a priori
in a homogeneous SDF graph (process network), all nodes produce or consume a single token on each input or output arc in each invocation
a delay of d samples on an arc is implemented by initializing the arc buffer with d zero samples
Scheduling
a schedule determines when and where nodes fire
a blocked schedule is a periodic schedule in which each cycle terminates before the next cycle begins
the iteration period is the length of one cycle of a blocked schedule, i.e. the reciprocal of the throughput
putting extra delays on feed-forward paths is called pipelining
Formalism for a Consistent SDF Graph
an SDF graph can be characterized by a topology matrix $\Gamma$ (similar to an incidence matrix)
let the vector $v(n)$ denote the node invoked at time $n$ and the vector $b(n)$ the number of tokens available in each buffer; then
$b(n + 1) = b(n) + \Gamma\, v(n)$
identifying inconsistent sample rates: a necessary condition for the existence of a periodic sequential schedule with bounded memory is $\mathrm{rank}(\Gamma) = s - 1$, where $\Gamma$ is the topology matrix of the graph and $s$ is the number of nodes
insufficient delays: every directed loop must have at least one delay
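The consistency check can be illustrated without building the matrix explicitly: each arc contributes one balance equation, produced × q[src] = consumed × q[dst], which corresponds to one row of the topology-matrix equation with right-hand side zero. The following sketch is an assumption-laden illustration; the function name `repetitions`, the edge tuple layout `(src, dst, produced, consumed)` and the restriction to connected graphs are choices made for this example only.

```python
from fractions import Fraction
from math import gcd, lcm


def repetitions(num_nodes, edges):
    """Smallest positive integer firing counts that balance every arc.

    edges: list of (src, dst, produced, consumed).
    Returns None if the sample rates are inconsistent (or the graph is not connected).
    """
    rate = {0: Fraction(1)}            # fix node 0 and propagate relative rates
    changed = True
    while changed:
        changed = False
        for src, dst, produced, consumed in edges:
            if src in rate and dst not in rate:
                rate[dst] = rate[src] * produced / consumed
                changed = True
            elif dst in rate and src not in rate:
                rate[src] = rate[dst] * consumed / produced
                changed = True
    if len(rate) != num_nodes:
        return None                    # assumption: only connected graphs are handled
    for src, dst, produced, consumed in edges:
        if rate[src] * produced != rate[dst] * consumed:
            return None                # some balance equation cannot be satisfied
    scale = lcm(*(r.denominator for r in rate.values()))
    q = [int(rate[i] * scale) for i in range(num_nodes)]
    g = gcd(*q)
    return [x // g for x in q]
```

For a chain A → B → C with rates (2, 3) and (1, 2), `repetitions(3, [(0, 1, 2, 3), (1, 2, 1, 2)])` gives `[3, 2, 1]`; adding a direct arc A → C with rates (1, 1) makes the rates inconsistent and the function returns `None`.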
Class S Algorithms
Definition
Given a positive integer vector $q$ such that $\Gamma q = 0$ and an initial state for the buffers $b(0)$, the $i$th node is said to be runnable at a given time if it has not been run $q_i$ times and running it will not cause a negative buffer size.
Definition
A class S algorithm is any algorithm that schedules a node if it is runnable, updates $b(n)$, and stops only when no more nodes are runnable.
any class S algorithm will run to completion if a periodic admissible sequential schedule (PASS) exists for a given SDF graph.
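One possible class S algorithm is sketched below. It is a minimal illustration, not the only member of the class: the round-robin order in which runnable nodes are tried, the name `class_s_schedule`, and the edge tuple layout `(src, dst, produced, consumed)` are assumptions of this example.

```python
def class_s_schedule(q, edges, initial_tokens):
    """Schedule runnable nodes until none is left.

    q: required number of firings per node.
    edges: list of (src, dst, produced, consumed).
    initial_tokens: tokens (delays) initially on each edge, i.e. b(0).
    Returns the firing sequence, or None if it stops before completing q.
    """
    buf = list(initial_tokens)
    runs = [0] * len(q)
    schedule = []
    progress = True
    while progress:
        progress = False
        for node in range(len(q)):
            if runs[node] >= q[node]:
                continue               # already run q[node] times
            starved = any(dst == node and buf[e] < consumed
                          for e, (src, dst, produced, consumed) in enumerate(edges))
            if starved:
                continue               # firing would make a buffer negative
            for e, (src, dst, produced, consumed) in enumerate(edges):
                if dst == node:
                    buf[e] -= consumed
                if src == node:
                    buf[e] += produced
            runs[node] += 1
            schedule.append(node)
            progress = True
    return schedule if runs == list(q) else None
```

For the chain with `q = [3, 2, 1]` and edges `[(0, 1, 2, 3), (1, 2, 1, 2)]` and empty buffers, the result is `[0, 0, 1, 0, 1, 2]`. A two-node loop without any delay deadlocks immediately and yields `None`, while one initial token on the feedback arc is enough to complete the cycle.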
Scheduling for Parallel Processors
a periodic admissible parallel schedule (PAPS) specifies a periodic schedule for each processor
if $p$ is the smallest positive integer vector in the nullspace of $\Gamma$, then one cycle of the schedule invokes each node the number of times given by $q = Jp$, for some positive integer $J$, which is called the blocking factor
Static Buffering
FIFO queues are costly and involve considerable run-time overhead
we can statically compute the location of the input and output samples for each invocation of the code
static buffering means that each invocation of a node in a cyclic period accesses the same memory locations for its inputs and outputs
assuming a node is invoked $q$ times in a period and produces $i$ tokens each time into a circular buffer of size $N$, static buffering occurs if and only if $iq = KN$ (for some integer $K$)
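The condition can be checked by simulating the write pointer of the circular buffer over several periods. In this sketch `write_offsets` and `is_static` are hypothetical helper names; `i`, `q` and `N` have the meaning given above.

```python
def write_offsets(i, q, N, periods):
    """Start offset of each invocation's writes, grouped by schedule period."""
    pos = 0
    result = []
    for _ in range(periods):
        period = []
        for _ in range(q):
            period.append(pos)
            pos = (pos + i) % N       # advance the circular write pointer
        result.append(period)
    return result


def is_static(i, q, N):
    """Static buffering holds iff i*q is an integer multiple of N."""
    return (i * q) % N == 0
```

With `i = 2`, `q = 3`, `N = 6` every period writes at offsets `[0, 2, 4]`, so the addresses can be compiled in. With `N = 4` the first period uses `[0, 2, 0]` and the second `[2, 0, 2]`, so the locations change from period to period.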
Embedded Software Synthesis
the target architecture is a DSP with limited on-chip memory
the SDF graph is scheduled to generate a sequence of actor invocations
the code generator steps through the schedule and, for each actor, inserts the code (C, assembly) for the related computation
the code generator generates inline code
schedule loops are incorporated to reduce code size
it is better to minimize code size first, then buffer sizes
Single Appearance Schedules I
single appearance schedules give the optimum code size
a valid single appearance schedule that minimizes the buffer memory requirement over all valid single appearance schedules is called a buffer memory optimal schedule
any consistent acyclic SDF graph has at least one valid single appearance schedule
Definition
If we can partition the actors of an SDF graph into two subsets $P_1$ and $P_2$ such that $P_1$ is precedence-independent of $P_2$ throughout a single schedule period, then $P_1$ is said to be subindependent of $P_2$.
If such a partition exists, the strongly connected SDF graph is loosely interdependent; otherwise it is tightly interdependent.
Single Appearance Schedules II
Theorem
An SDF graph has a single appearance schedule if and only if each
strongly connected component has a single appearance schedule.
Theorem
A strongly connected, consistent SDF graph G has a single appearance
schedule if and only if every strongly connected subgraph of G is loosely
interdependent.
Loose Interdependence Algorithms
Consist of three components:
acyclic scheduling algorithm: constructs single appearance schedules for acyclic graphs
subindependence scheduling algorithm: finds a subindependent partition of a strongly connected SDF graph, if it is loosely interdependent
tight scheduling algorithm: generates a valid schedule for a tightly interdependent SDF graph
Example
$q(A, B, \ldots, P) = [16, 16, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1]^T$
single appearance schedule for the clustered graph: $(16A)(16B)(2C)\,\Omega_1\,GH$, where $\Omega_1$ denotes the clustered strongly connected component
for the strongly connected component: $DIJKLM(2N)OPFE$
thus, for the whole graph: $(16A)(16B)(2C)DIJKLM(2N)OPFEGH$
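A looped schedule can be checked against the repetitions vector by expanding it and counting firings. The parser below is a small sketch under the assumption that actors are single letters and that `(nX...)` means n repetitions of the enclosed body; `expand` is a hypothetical helper name.

```python
from collections import Counter


def expand(schedule):
    """Expand looped-schedule notation such as '(2A(3B))C' into a firing list."""
    def parse(i):
        out = []
        while i < len(schedule) and schedule[i] != ")":
            if schedule[i] == "(":
                j = i + 1
                while schedule[j].isdigit():
                    j += 1
                count = int(schedule[i + 1:j])   # loop count after the "("
                body, i = parse(j)               # parse the loop body
                out.extend(body * count)
                i += 1                           # skip the closing ")"
            else:
                out.append(schedule[i])          # a single actor firing
                i += 1
        return out, i
    return parse(0)[0]


firings = Counter(expand("(16A)(16B)(2C)DIJKLM(2N)OPFEGH"))
```

Here `firings` counts 16 firings each for A and B, 2 each for C and N, and 1 for every other actor, which agrees with the vector q of the example. Each actor name also appears exactly once in the schedule string, which is what makes it a single appearance schedule.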
Buffer Memory Lower Bound
Definition
The buffer memory lower bound (BMLB) of an SDF edge e is:
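$\mathrm{BMLB}(e) = \begin{cases} \eta(e) + d(e) & \text{if } d(e) < \eta(e) \\ d(e) & \text{if } d(e) \geq \eta(e) \end{cases}$
where $\eta(e) = \mathrm{lcm}(p(e), c(e))$, $p(e)$ and $c(e)$ are the numbers of tokens produced and consumed on $e$ per firing, and $d(e)$ is the delay on $e$
Definition
If $G$ is an SDF graph with edge set $E$, then
$\sum_{e \in E} \mathrm{BMLB}(e)$
is called the BMLB of $G$, and a valid single appearance schedule $S$ for $G$ that satisfies $\mathrm{BMLB}(e) = \mathrm{max\_tokens}(e, S)$ for all $e \in E$ is called a BMLB schedule for $G$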
Many practical graphs have BMLB schedules
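The definition translates directly into code. In this sketch the function names `bmlb_edge` and `bmlb_graph` and the representation of an edge as a `(p, c, d)` tuple (tokens produced, tokens consumed, delay) are assumptions made for illustration.

```python
from math import lcm


def bmlb_edge(p, c, d):
    """Buffer memory lower bound of one edge, following the case split above."""
    eta = lcm(p, c)
    return eta + d if d < eta else d


def bmlb_graph(edges):
    """BMLB of a graph: the sum of the per-edge bounds."""
    return sum(bmlb_edge(p, c, d) for p, c, d in edges)
```

For an edge with `p = 2`, `c = 3` and no delay the bound is 6; with a delay of 7 tokens the delay itself dominates and the bound is 7.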
Cyclo-Static Dataflow
CSDF generalizes SDF by allowing the number of tokens consumed and produced by an actor to vary from one firing to the next in a cyclic pattern
it is still possible to statically construct periodic schedules with bounded memory
CSDF actors have one firing rule for each firing of a cyclically repeated sequence
Cyclo-Static Dataflow Scheduling
instead of the scalar consumption and production rates $\gamma_{ij}$ of SDF, these parameters are vectors $\boldsymbol{\gamma}_{ij}$ in CSDF
$p_{ij} = \dim(\boldsymbol{\gamma}_{ij})$ is the length of the period of the token production pattern of the $j$th actor on the $i$th arc (if there is no connection, $p_{ij} = 1$)
the $j$th actor fires in a cycle with period $P_j = \mathrm{lcm}_i(p_{ij})$
if we let $\sigma_{ij}$ be the sum of the elements of $\boldsymbol{\gamma}_{ij}$, then the total number of tokens produced on an arc in a cycle of firings is:
$\Gamma_{ij} = \dfrac{P_j\, \sigma_{ij}}{p_{ij}}$
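A short numeric sketch of these quantities for a single actor. The function name `csdf_cycle` and the representation of each rate vector as a Python list are assumptions of this example; the arithmetic follows the period and token-count formulas stated above.

```python
from math import lcm


def csdf_cycle(patterns):
    """Cycle length and tokens per cycle for one CSDF actor.

    patterns: one cyclic rate vector per connected arc of the actor.
    Returns (P, totals): P is the lcm of the pattern lengths and totals[k]
    is P * sum(pattern) / len(pattern) for the k-th arc.
    """
    P = lcm(*(len(g) for g in patterns))
    totals = [P * sum(g) // len(g) for g in patterns]   # exact: len(g) divides P
    return P, totals
```

An actor with patterns `[1, 0]` on one arc and `[0, 1, 2]` on another has a cycle of 6 firings, during which it transfers 3 and 6 tokens respectively, so `csdf_cycle([[1, 0], [0, 1, 2]])` returns `(6, [3, 6])`.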
Transforming CSDF to SDF
the number of actor firings that must be scheduled can be exponential relative to the number of nodes in a graph
the problem is made worse by having a cycle of $P_j$ firings for CSDF actors
we can transform a cycle of CSDF actor firings into a single SDF actor firing with a higher-order function
$\mathrm{loop}(f, N)[x_1, x_2, x_3, \ldots] = [f(x_1), f(x_2), \ldots, f(x_N)]$
the transformation should not introduce deadlocks!
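The higher-order function can be written down almost literally. This is a sketch only: the inner name `fire` and the decision to raise an error when fewer than N tokens are available (instead of blocking, as a real actor would) are assumptions of this example.

```python
def loop(f, n):
    """Wrap n consecutive firings of f into one firing that consumes n tokens."""
    def fire(tokens):
        if len(tokens) < n:
            # The combined actor needs all n tokens at once; waiting for them is
            # exactly where a deadlock could be introduced by the transformation.
            raise ValueError("not enough tokens for one combined firing")
        return [f(x) for x in tokens[:n]]
    return fire
```

For example, `loop(lambda x: x * x, 3)([1, 2, 3, 4])` returns `[1, 4, 9]`.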
CSDF Benefits
Dead code elimination: stateless, side effect-free sinks could be removed
Parallelism: more room for parallelism in CSDF
Concluding CSDF
the loop transformation allows us to use SDF scheduling techniques for CSDF graphs
adding a single multirate actor, the mixer, gives us the full advantage of CSDF without the need to support its full generality
mixer is a generalization of the distributor and commutator
4 Communicating Sequential Processes
Introduction
Communicating Sequential Processes (CSP) was suggested first as a basis for a programming language for concurrent programming
later developed into a process algebra enabling program verification
programs are written as the parallel composition of a set of sequential processes
communication between processes is done using message passing via unidirectional channels (no shared state or variables)
communication uses a rendezvous mechanism, i.e., reads and writes are blocking and messages are not buffered
compared to process networks (PN), CSP is non-deterministic
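Rendezvous can be approximated in Python with a queue whose sender waits until the receiver has taken the value. This is a rough sketch for one sender and one receiver only; the class name `Rendezvous` and the helper `run_demo` are hypothetical, and the use of `Queue.join` and `task_done` as the handshake is a choice made for this example.

```python
import queue
import threading


class Rendezvous:
    """Channel without buffering semantics: send returns only after recv took the value."""

    def __init__(self):
        self._q = queue.Queue()

    def send(self, value):
        self._q.put(value)
        self._q.join()           # block until the receiver has called task_done()

    def recv(self):
        value = self._q.get()    # block until a sender offers a value
        self._q.task_done()      # release the waiting sender
        return value


def run_demo(n):
    """One sender process and one receiver process meeting n times."""
    ch = Rendezvous()
    sender = threading.Thread(target=lambda: [ch.send(i) for i in range(n)])
    sender.start()
    received = [ch.recv() for _ in range(n)]
    sender.join()
    return received
```

Unlike the non-blocking write of process networks, the sender here cannot run ahead of the receiver: it offers value `i + 1` only after value `i` has been received.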
Communication in CSP domain
Guarded communication statements provide nondeterministic rendezvous
guard; communication => statements;
Conditional IF
CIF {G1;C1 => S1;
[]
G2;C2 => S2;
[]
...
}
Conditional DO
CDO {G1;C1 => S1;
[]
G2;C2 => S2;
[]
...
}
Informal Algebraic Description
Primitives
Events: represent communications or interactions
Primitive Processes: represent fundamental behaviors
Algebraic operators
Prefix: combines an event and a process to produce a new process ($a \to P$)
Deterministic Choice: allows the future evolution of a process to be defined as a choice between two component processes, and allows the environment to resolve the choice by communicating an initial event for one of the processes ($(a \to P) \mathbin{\Box} (b \to Q)$)
Nondeterministic Choice: does not allow the environment any control over which of the component processes will be selected ($(a \to P) \sqcap (b \to Q)$)
Interleaving: represents completely independent concurrent activity ($P \mathbin{|||} Q$)
Interface Parallel: represents concurrent activity that requires synchronization between the component processes ($P \mathbin{|[\{a\}]|} Q$)
Hiding: provides a way to abstract processes by making some events unobservable ($(a \to P) \setminus \{a\}$)
Languages, Tools, and Extensions
OCCAM is a language based entirely on CSP
many implementations of CSP exist in Java, C++, C#, Haskell, etc.
the newly introduced language Go is said to base its concurrency model on CSP
refinement checkers and model checking tools for CSP descriptions have been developed (FDR2, ARC, ProB, etc.)
TCSP is an extension of CSP with the notion of time (suitable for describing real-time systems)
CSPP extends CSP with the notion of priority (suitable for describing resource-limited systems)
HCSP extends CSPP with new synchronization primitives and a model of state (suitable also for hardware modeling)
References
Shuvra S. Bhattacharyya, Edward A. Lee, and Praveen K. Murthy. Software Synthesis from Dataflow Graphs. Kluwer Academic Publishers, 1996.
Shuvra S. Bhattacharyya, Praveen K. Murthy, and Edward A. Lee. Synthesis of embedded software from synchronous dataflow specifications. The Journal of VLSI Signal Processing, 21(2):151-166, June 1999.
C. A. R. Hoare. Communicating sequential processes. Commun. ACM, 21(8):666-677, 1978.
E. A. Lee and D. G. Messerschmitt. Synchronous data flow. Proceedings of the IEEE, 75(9):1235-1245, 1987.
E. A. Lee and T. M. Parks. Dataflow process networks. Proceedings of the IEEE, 83(5):773-801, 1995.
T. M. Parks, J. L. Pino, and E. A. Lee. A comparison of synchronous and cyclo-static dataflow. In Conference Record of the Twenty-Ninth Asilomar Conference on Signals, Systems and Computers, volume 1, pages 204-210, 1995.