Topic 2: Untimed Model of Computation


Seyed Hosein Attarzadeh Niaki

KTH

November 17, 2009


Outline

1 Kahn Process Networks

2 Dataflow Process Networks

3 Synchronous Dataflow Process Networks
    Embedded Software Synthesis
    Cyclo-Static Dataflow

4 Communicating Sequential Processes



Kahn Process Networks

Introduction

Kahn process networks (KPN) were introduced as a simple language for parallel programming

programs are composed of computing stations which communicate over one-way, FIFO-like channels using blocking read (wait) and non-blocking write (send) semantics

communication lines are the only way by which computing stations may communicate

a communication line transmits information within an unpredictable but finite amount of time

at any given time, a computing station is either computing or waiting for information on one of its input lines

each computing station follows a sequential program
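The points above can be sketched with threads and unbounded queues. This is only an illustration, not part of the original slides: the station names `producer` and `doubler`, the helper `run_network` and the end-of-stream marker `DONE` are hypothetical, and Python's `queue.Queue` stands in for a communication line.

```python
import queue
import threading

DONE = object()  # hypothetical end-of-stream marker so that the demo terminates


def producer(out, n):
    """Computing station with no inputs: sends the tokens 0 .. n-1."""
    for i in range(n):
        out.put(i)  # non-blocking write (send): an unbounded queue never blocks
    out.put(DONE)


def doubler(inp, out):
    """Sequential program: read one token, send its double."""
    while True:
        token = inp.get()  # blocking read (wait)
        if token is DONE:
            out.put(DONE)
            return
        out.put(2 * token)


def run_network(n):
    a, b = queue.Queue(), queue.Queue()  # one-way FIFO channels
    stations = [
        threading.Thread(target=producer, args=(a, n)),
        threading.Thread(target=doubler, args=(a, b)),
    ]
    for s in stations:
        s.start()
    result = []
    while True:
        token = b.get()
        if token is DONE:
            break
        result.append(token)
    for s in stations:
        s.join()
    return result


print(run_network(5))
```

Because each station only ever blocks on a read, the output history is the same for every thread interleaving.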




Model Properties

the formal model associates with each node a continuous function from the histories of its inputs to the histories of its outputs

Monotonicity: a machine need not have all of its inputs to start computing, since future input concerns only future output

Continuity: prevents a station from deciding to send some output only after it has received an infinite amount of input

the set of assigned functions together with the inputs forms a set of fixpoint equations with a unique minimal solution
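A small sketch of how such a minimal solution can be approached by iteration from the empty history. The one-node network X = 0 followed by (X + 1) and the cut-off `limit` are assumptions chosen for illustration; the cut-off only keeps the demo finite.

```python
def is_prefix(a, b):
    """True if history a is a prefix of history b (the ordering on histories)."""
    return a == b[:len(a)]


def network(x, limit=5):
    """Hypothetical one-node network: X = 0 followed by (X + 1), cut at `limit` tokens."""
    return ([0] + [v + 1 for v in x])[:limit]


def least_fixpoint(f):
    """Iterate f from the empty history until the history stops growing."""
    history = []
    chain = [history]
    while True:
        nxt = f(history)
        # Monotonicity: more input can only extend the output, never retract it.
        assert is_prefix(history, nxt)
        if nxt == history:
            return history, chain
        history = nxt
        chain.append(history)


solution, chain = least_fixpoint(network)
print(solution)
print(chain)
```

Each element of `chain` is a prefix of the next one, so the iteration builds the solution token by token.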




Recursion

recursive parallel programs allow an unbounded number of machines to compute in parallel

a recursive parallel schema F may include the node F itself, with the same input and output lines

unfolding starts not when the input is present, but when the output is requested (demand-driven, i.e. lazy, unfolding)
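As an illustration of a recursive schema that unfolds on demand, the following sketch uses Python generators as stand-ins for computing stations. The prime sieve is an assumed example, not taken from the slides: `sieve` contains another instance of `sieve`, and a new instance is created only when the consumer asks for more output.

```python
from itertools import count, islice


def sieve(stream):
    """Recursive schema: emit the first token, then filter the rest through a new sieve."""
    p = next(stream)
    yield p
    # The nested sieve is unfolded only when the next output is requested.
    yield from sieve(x for x in stream if x % p != 0)


first_primes = list(islice(sieve(count(2)), 8))
print(first_primes)
```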



Dataflow Process Networks

Introduction

a dataflow actor, when it fires, maps input tokens into output tokens

a set of firing rules specify when an actor can fire

a firing consumes input tokens and produces output tokens

a sequence of such firings is a particular type of Kahn process that we call a dataflow process

a network of such processes is called a dataflow process network

dataflow process networks are a special case of Kahn process networks




Streams

Streams and functions operating on them are a natural way to model reactive systems. Streams can be interpreted in several ways:

defining streams recursively with the cons operator and treating them with functional semantics

as a two-element cell holding the head of the stream and a procedure for computing the rest of the stream (recursion without lazy semantics)

streams as channels (as in KPN)

representing streams as recursive cons cells and translating them to channels for implementation

associating a clock with each stream (Lustre and Signal)
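The two-element-cell interpretation can be sketched as follows. The helper names `integers_from`, `smap` and `take` are hypothetical; a stream is a pair of a head and a zero-argument procedure that computes the rest when called.

```python
def integers_from(n):
    """Stream cell: (head, procedure computing the rest of the stream)."""
    return (n, lambda: integers_from(n + 1))


def smap(f, stream):
    """Apply f to every element of a stream, still one cell at a time."""
    head, rest = stream
    return (f(head), lambda: smap(f, rest()))


def take(stream, k):
    """Collect the first k elements into a list."""
    out = []
    while k > 0:
        head, rest = stream
        out.append(head)
        k -= 1
        if k:
            stream = rest()  # compute the next cell only when it is needed
    return out


squares = take(smap(lambda x: x * x, integers_from(1)), 5)
print(squares)
```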




Firing Rules

an actor with $p \ge 1$ input streams can have $N$ firing rules

$R = \{R_1, R_2, \ldots, R_N\}$ (1)

the actor can fire when one or more of its firing rules is satisfied; each firing rule is a set of $p$ patterns, one per input

$R_i = \{R_{i,1}, R_{i,2}, \ldots, R_{i,p}\}$ (2)

for firing rule $i$ to be satisfied, each pattern $R_{i,j}$ must form a prefix of the sequence of unconsumed tokens at input $j$.




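firing rules example (here $[*]$ matches any single token and $\bot$, the empty pattern, is always satisfied):

$R_1 = \{[*], \bot, [T]\}$ (3)

$R_2 = \{\bot, [*], [F]\}$ (4)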
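The prefix test behind firing rules can be sketched as below. The encoding is an assumption made for illustration: a pattern is a Python list, `ANY` plays the role of a wildcard that matches any one token, an empty list is the pattern that every input satisfies, and `SELECT` encodes the two rules of the example with the control stream as third input.

```python
ANY = object()  # hypothetical wildcard: matches any single token
BOTTOM = []     # empty pattern: a prefix of every sequence, so always satisfied


def matches(pattern, unconsumed):
    """True if `pattern` forms a prefix of the unconsumed tokens on one input."""
    if len(pattern) > len(unconsumed):
        return False
    return all(p is ANY or p == tok for p, tok in zip(pattern, unconsumed))


def enabled_rules(rules, inputs):
    """Return the 1-based indices of the firing rules satisfied by `inputs`."""
    return [
        i + 1
        for i, rule in enumerate(rules)
        if all(matches(p, inp) for p, inp in zip(rule, inputs))
    ]


# Two rules over three inputs: data, data, boolean control.
SELECT = [
    [[ANY], BOTTOM, [True]],
    [BOTTOM, [ANY], [False]],
]

print(enabled_rules(SELECT, [[7], [], [True]]))
print(enabled_rules(SELECT, [[7], [9], [False]]))
print(enabled_rules(SELECT, [[7], [9], []]))
```

With no control token available neither rule is satisfied, so the actor cannot fire.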




Continuity, Monotonicity, Sequentiality

for a dataflow process to be continuous, each actor firing must be functional, and the set of firing rules must be sequential

functional means that an actor firing lacks side effects and that the output tokens are purely a function of the consumed input tokens

sequential means that the firing rules can be tested in a predefined order using only blocking reads.




Execution Models

Concurrent Processes: multitasking in a demand-driven style; reconfiguring the network on the fly is allowed (recursive process definition), but context switches are expensive

Dynamic Scheduling of Dataflow Process Networks: instead of context switching, a scheduler tracks the inputs of the actors and fires them when they are enabled

Static Scheduling of Dataflow Process Networks: possible for synchronous dataflow networks; avoids the overhead of a run-time scheduler (except for select and switch)

Compilation of Dataflow Graphs: static memory allocation for efficient compilation

The Tagged-Token Model: each token carries a tag, and a firing is enabled by the presence of tokens with the same tag on the inputs; no FIFO is needed



Synchronous Dataflow Process Networks

Introduction

Synchronous Dataflow (SDF) is a special case of dataflow

there is always a single firing rule or, equivalently, the number of tokens consumed and produced on each input/output for a single firing of an actor is fixed and known a priori

in a homogeneous SDF graph (process network), all nodes produce or consume a single token on each input or output arc in each invocation

a delay of d samples on an arc is implemented by initializing the arc buffer with d zero samples



Scheduling

a schedule determines when and where nodes fire

a blocked schedule is a periodic schedule where each cycle terminates before the next cycle begins

the iteration period is the length of one cycle of a blocked schedule, i.e. the reciprocal of the throughput

putting extra delays on feed-forward paths is called pipelining



Formalism for a Consistent SDF Graph

an SDF graph can be characterized by a topology matrix $\Gamma$ (similar to an incidence matrix)

let the vector $v(n)$ indicate the node invoked at time $n$ and the vector $b(n)$ hold the number of tokens available in each buffer; then

$b(n + 1) = b(n) + \Gamma v(n)$



identifying inconsistent sample rates: a necessary condition for the existence of a periodic sequential schedule with bounded memory is $\operatorname{rank}(\Gamma) = s - 1$, where $\Gamma$ is the topology matrix and $s$ is the number of nodes

insufficient delays: every directed loop must have at least one delay
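A minimal sketch of the topology matrix, the rank test and the buffer update, under stated assumptions: an arc is a hypothetical tuple `(src, dst, produced, consumed)`, the matrix has one row per arc and one column per node, and the three-node graphs are made-up examples. Exact fractions are used so that the rank is not affected by rounding.

```python
from fractions import Fraction


def topology_matrix(num_nodes, edges):
    """One row per arc: +produced in the source column, -consumed in the sink column."""
    gamma = []
    for src, dst, produced, consumed in edges:
        row = [0] * num_nodes
        row[src] += produced
        row[dst] -= consumed
        gamma.append(row)
    return gamma


def rank(matrix):
    """Rank by Gaussian elimination over exact fractions."""
    rows = [[Fraction(x) for x in r] for r in matrix]
    rk = 0
    for col in range(len(rows[0])):
        pivot = next((r for r in range(rk, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[rk], rows[pivot] = rows[pivot], rows[rk]
        for r in range(rk + 1, len(rows)):
            factor = rows[r][col] / rows[rk][col]
            rows[r] = [a - factor * b for a, b in zip(rows[r], rows[rk])]
        rk += 1
    return rk


def fire(gamma, b, node):
    """Buffer update b(n+1) = b(n) + gamma * v(n), with v(n) selecting `node`."""
    return [tokens + row[node] for tokens, row in zip(b, gamma)]


consistent = topology_matrix(3, [(0, 1, 2, 3), (1, 2, 1, 2)])
inconsistent = topology_matrix(3, [(0, 1, 2, 3), (1, 2, 1, 2), (0, 2, 1, 1)])
print(rank(consistent), rank(inconsistent))
print(fire(consistent, [0, 0], 0))
```

The first graph has rank 2 = s - 1 and passes the test; the extra arc in the second graph raises the rank to 3, which signals inconsistent sample rates.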



Class S Algorithms

Definition

Given a positive integer vector $q$ such that $\Gamma q = 0$ and an initial state for the buffers $b(0)$, the $i$th node is said to be runnable at a given time if it has not been run $q_i$ times and running it will not cause a negative buffer size.

Definition

A class S algorithm is any algorithm that schedules a node if it is runnable, updates $b(n)$, and stops only when no more nodes are runnable.

any class S algorithm will run to completion if a periodic admissible sequential schedule (PASS) exists for the given SDF graph.
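One possible class S algorithm, sketched under assumptions: `gamma` is the topology matrix as a list of rows (one per arc, one column per node), the scan order over the nodes is an arbitrary choice, and the example graphs are made up. The function returns a firing sequence for one period, or `None` when it stops before every node has run its required number of times.

```python
def class_s_schedule(gamma, q, b0):
    """Repeatedly fire any runnable node; stop when no node is runnable."""
    b = list(b0)
    remaining = list(q)
    schedule = []
    progress = True
    while progress:
        progress = False
        for node in range(len(q)):
            if remaining[node] == 0:
                continue  # already run q[node] times
            after = [tokens + row[node] for tokens, row in zip(b, gamma)]
            if all(t >= 0 for t in after):  # firing causes no negative buffer
                b = after
                remaining[node] -= 1
                schedule.append(node)
                progress = True
    return schedule if all(r == 0 for r in remaining) else None


chain = [[2, -3, 0], [0, 1, -2]]  # A -(2,3)-> B -(1,2)-> C
print(class_s_schedule(chain, [3, 2, 1], [0, 0]))

loop = [[1, -1], [-1, 1]]  # A -> B -> A, one token per firing
print(class_s_schedule(loop, [1, 1], [0, 0]))  # no delay in the loop
print(class_s_schedule(loop, [1, 1], [0, 1]))  # one delay on the arc B -> A
```

The loop without a delay deadlocks immediately, while a single initial token on the feedback arc is enough for a schedule to exist.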



Scheduling for Parallel Processors

a periodic admissible parallel schedule (PAPS) specifies a periodic schedule for each processor

if $p$ is the smallest positive integer vector in the nullspace of $\Gamma$, then one cycle of the schedule invokes each node the number of times given by $q = Jp$, for some positive integer $J$, which is called the blocking factor
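The smallest positive integer vector in the nullspace can be found by propagating firing ratios along the arcs. The sketch below assumes a connected graph given as hypothetical tuples `(src, dst, produced, consumed)` and returns `None` when the rates are inconsistent; the example graph and the blocking factor 2 are made up.

```python
from fractions import Fraction
from functools import reduce
from math import gcd


def repetitions_vector(num_nodes, edges):
    """Smallest positive integer p with produced * p[src] == consumed * p[dst] on every arc."""
    rate = {0: Fraction(1)}
    pending = [0]
    while pending:
        node = pending.pop()
        for src, dst, produced, consumed in edges:
            if src == node:
                other, value = dst, rate[node] * produced / consumed
            elif dst == node:
                other, value = src, rate[node] * consumed / produced
            else:
                continue
            if other not in rate:
                rate[other] = value
                pending.append(other)
            elif rate[other] != value:
                return None  # inconsistent sample rates
    # Scale the rational rates to the smallest integer vector.
    scale = reduce(lambda a, b: a * b // gcd(a, b),
                   (r.denominator for r in rate.values()), 1)
    ints = [int(rate[n] * scale) for n in range(num_nodes)]
    common = reduce(gcd, ints)
    return [x // common for x in ints]


p = repetitions_vector(3, [(0, 1, 2, 3), (1, 2, 1, 2)])
J = 2  # blocking factor, chosen arbitrarily for the example
q = [J * x for x in p]
print(p, q)
```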



Static Buffering

FIFO queues are costly and involve considerable run-time overhead

we can statically compute the location of the input and output samples for each invocation of the code

static buffering means that each invocation of a node in a cyclic period accesses the same memory locations for its inputs and outputs

assuming a node is invoked $q$ times in a period and produces $i$ tokens each time into a circular buffer of size $N$, static buffering occurs if and only if $iq = KN$ (for some integer $K$)
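The condition can be checked by simulating the write pointer of a circular buffer over two periods. The function names and the numbers below are illustrative assumptions.

```python
def write_offsets(tokens_per_firing, firings_per_period, buffer_size, periods=2):
    """Start offset of every write into the circular buffer, grouped by period."""
    offsets, pos = [], 0
    for _ in range(periods):
        period = []
        for _ in range(firings_per_period):
            period.append(pos)
            pos = (pos + tokens_per_firing) % buffer_size
        offsets.append(period)
    return offsets


def is_static(i, q, n):
    """Static buffering condition: i * q is an integer multiple of n."""
    return (i * q) % n == 0


print(is_static(2, 3, 6), write_offsets(2, 3, 6))
print(is_static(2, 3, 4), write_offsets(2, 3, 4))
```

With N = 6 every period writes to offsets 0, 2, 4, so the addresses can be fixed at compile time; with N = 4 the second period starts at a different offset.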


Embedded Software Synthesis




the target architecture is a DSP with limited on-chip memory

the SDF graph is scheduled to generate a sequence of actor invocations

the code generator steps through the schedule and, for each actor, inserts the code (C, assembly) for the related computation

the code generator generates inline code

schedule loops are incorporated to reduce code size

it is better to minimize first for code size, then for buffer size





Single Appearance Schedules I

single appearance schedules, in which each actor appears exactly once, give the minimum code size for inline code generation

a valid single appearance schedule that minimizes the buffer memory requirement over all valid single appearance schedules is called a buffer memory optimal schedule

any consistent acyclic SDF graph has at least one valid single appearance schedule

Definition

if we can partition the actors of an SDF graph into two subsets $P_1$ and $P_2$ such that $P_1$ is precedence-independent of $P_2$ throughout a single schedule period, then $P_1$ is said to be subindependent of $P_2$

If such a partition exists, the strongly connected SDF graph is loosely interdependent; otherwise it is tightly interdependent.



Single Appearance Schedules II

Theorem

An SDF graph has a single appearance schedule if and only if each strongly connected component has a single appearance schedule.

Theorem

A strongly connected, consistent SDF graph G has a single appearance schedule if and only if every strongly connected subgraph of G is loosely interdependent.



Loose Interdependence Algorithms

Loose interdependence algorithms consist of three components:

acyclic scheduling algorithm: constructs single appearance schedules for acyclic graphs

subindependence scheduling algorithm: finds a subindependent partition of a strongly connected SDF graph, if it is loosely interdependent

tight scheduling algorithm: generates a valid schedule for a tightly interdependent SDF graph



Example

$q(A, B, \ldots, P) = [16, 16, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1]^T$

single appearance schedule for the clustered graph: $(16A)(16B)(2C)\,\Omega_1\,GH$, where $\Omega_1$ denotes the clustered strongly connected component

for the strongly connected component: DIJKLM(2N)OPFE

Thus, for the whole graph: (16A)(16B)(2C)DIJKLM(2N)OPFEGH
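The final schedule can be checked against the repetitions vector with a few lines of code. The parser is a sketch that only understands the flat notation used here, single actors and terms of the form (nX); nested loops are not handled.

```python
import re
from collections import Counter


def firing_counts(schedule):
    """Count the firings of each actor in a flat looped schedule such as '(2A)BC'."""
    counts = Counter()
    for reps, looped, single in re.findall(r"\((\d+)([A-Z])\)|([A-Z])", schedule):
        if looped:
            counts[looped] += int(reps)
        else:
            counts[single] += 1
    return dict(counts)


def is_single_appearance(schedule):
    """True if every actor name occurs exactly once in the schedule text."""
    names = re.findall(r"[A-Z]", schedule)
    return len(names) == len(set(names))


q = dict(zip("ABCDEFGHIJKLMNOP", [16, 16, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1]))
schedule = "(16A)(16B)(2C)DIJKLM(2N)OPFEGH"
print(firing_counts(schedule) == q, is_single_appearance(schedule))
```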




Buffer Memory Lower Bound

Definition

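The buffer memory lower bound (BMLB) of an SDF edge $e$ is:

$$\mathrm{BMLB}(e) = \begin{cases} \eta(e) + d(e) & \text{if } d(e) < \eta(e) \\ d(e) & \text{if } d(e) \ge \eta(e) \end{cases}$$

where $\eta(e) = \operatorname{lcm}(p(e), c(e))$, $p(e)$ and $c(e)$ are the numbers of tokens produced and consumed on $e$ per firing, and $d(e)$ is the delay on $e$

Definition

If $G$ is an SDF graph with edge set $E$, then

$$\sum_{e \in E} \mathrm{BMLB}(e)$$

is called the BMLB of $G$, and a valid single appearance schedule $S$ for $G$ that satisfies $\mathrm{BMLB}(e) = \mathrm{max\_tokens}(e, S)$ for all $e \in E$ is called a BMLB schedule for $G$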

Many practical graphs have BMLB schedules
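The bound defined above is straightforward to compute. In this sketch an edge is a hypothetical tuple `(produced, consumed, delay)` and the example graph is made up.

```python
from math import gcd


def bmlb_edge(produced, consumed, delay):
    """Buffer memory lower bound of one edge, following the case distinction above."""
    eta = produced * consumed // gcd(produced, consumed)  # lcm(produced, consumed)
    return eta + delay if delay < eta else delay


def bmlb_graph(edges):
    """BMLB of a graph: the sum of the bounds of its edges."""
    return sum(bmlb_edge(*edge) for edge in edges)


edges = [(2, 3, 0), (1, 2, 1), (1, 1, 5)]
print([bmlb_edge(*e) for e in edges], bmlb_graph(edges))
```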




Cyclo-Static Dataflow



CSDF generalizes SDF by allowing the number of tokens consumed and produced by an actor to vary from one firing to the next in a cyclic pattern

it is still possible to statically construct periodic schedules with bounded memory

CSDF actors have one firing rule for each firing of a cyclically repeated sequence




Cyclo-Static Dataflow Scheduling



instead of the scalar consumption and production rates $\gamma_{ij}$ of SDF, these parameters are vectors $\vec{\gamma}_{ij}$ for CSDF

$p_{ij} = \dim(\vec{\gamma}_{ij})$ is the length of the period of the token production pattern on arc $i$ for actor $j$ (if there is no connection, $p_{ij} = 1$)

the $j$th actor fires in a cycle with period $P_j = \operatorname{lcm}_i(p_{ij})$

if we let $\sigma_{ij}$ be the sum of the elements in $\vec{\gamma}_{ij}$, then the total number of tokens produced on an arc in a cycle of firings is:

$\Gamma_{ij} = P_j \sigma_{ij} / p_{ij}$
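A short sketch of the cycle computation for one CSDF actor. The representation is an assumption: each arc of the actor carries a list with its cyclic token pattern, and the two patterns are made-up examples. The closed-form count is compared with a direct simulation of one cycle of firings.

```python
from functools import reduce
from math import gcd


def lcm(a, b):
    return a * b // gcd(a, b)


def cycle_period(patterns):
    """Number of firings after which all patterns of the actor repeat together."""
    return reduce(lcm, (len(p) for p in patterns), 1)


def tokens_per_cycle(patterns):
    """Closed form: period * (sum of the pattern) / (length of the pattern), per arc."""
    period = cycle_period(patterns)
    return [period * sum(p) // len(p) for p in patterns]


def simulate(patterns):
    """Fire the actor for one full cycle and count the tokens on each arc."""
    period = cycle_period(patterns)
    return [sum(p[k % len(p)] for k in range(period)) for p in patterns]


patterns = [[1, 0], [2, 1, 0]]  # hypothetical production patterns on two arcs
print(cycle_period(patterns), tokens_per_cycle(patterns), simulate(patterns))
```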




Transforming CSDF to SDF



the number of actor firings that must be scheduled can be exponential in the number of nodes in a graph

the problem is made worse by having a cycle of $P_j$ firings for each CSDF actor

we can transform a cycle of CSDF actor firings into a single SDF actor firing with a higher-order function:

loop(f, N)[x1, x2, x3, ...] = [f(x1), f(x2), ..., f(xN)]

the transformation should not introduce deadlocks!
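The higher-order function can be sketched directly. Returning the unconsumed tokens alongside the outputs and raising an error when fewer than N tokens are present are choices made for this illustration only.

```python
def loop(f, n):
    """Build a single firing that applies f to the next n tokens at once."""
    def fire(tokens):
        if len(tokens) < n:
            # The combined firing needs all n tokens up front.
            raise ValueError("not enough tokens to fire")
        return [f(x) for x in tokens[:n]], tokens[n:]
    return fire


increment_three = loop(lambda x: x + 1, 3)
outputs, unconsumed = increment_three([1, 2, 3, 4, 5])
print(outputs, unconsumed)
```

The error case hints at where a deadlock could come from: the combined firing waits for all N tokens, whereas the original firings could proceed one token at a time.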


CSDF Benefits



Dead code elimination: stateless, side effect-free sinks could be removed



Parallelism: more room for parallelism in CSDF


Concluding CSDF



the loop transformation allows us to use SDF scheduling techniques for CSDF graphs

adding a single multirate actor, the mixer, gives us the full advantage of CSDF without the need to support its full generality

mixer is a generalization of the distributor and commutator


Communicating Sequential Processes

Introduction

CSP was first suggested as a basis for a programming language for concurrent programming

later developed into a process algebra enabling program verification

programs are written as the parallel composition of a set of sequential processes

communication between processes is done by message passing via unidirectional channels (no shared state or variables)

communication uses a rendezvous mechanism, i.e., reads and writes are blocking and no messages are buffered

in contrast to process networks (PN), CSP is non-deterministic


Communication in CSP domain



Guarded communication statements provide nondeterministic rendezvous:

    guard; communication => statements;

Conditional IF

    CIF {
        G1; C1 => S1;
    []
        G2; C2 => S2;
    []
        ...
    }

Conditional DO

    CDO {
        G1; C1 => S1;
    []
        G2; C2 => S2;
    []
        ...
    }


Informal Algebraic Description



Primitives

Events: represent communications or interactions

Primitive Processes: represent fundamental behaviors

Algebraic operators

Prefix: combines an event and a process to produce a new process ($a \to P$)

Deterministic Choice: allows the future evolution of a process to be defined as a choice between two component processes, and allows the environment to resolve the choice by communicating an initial event for one of the processes ($(a \to P) \mathbin{\Box} (b \to Q)$)

Nondeterministic Choice: does not allow the environment any control over which of the component processes will be selected ($(a \to P) \sqcap (b \to Q)$)

Interleaving: represents completely independent concurrent activity ($P \mathbin{|||} Q$)

Interface Parallel: represents concurrent activity that requires synchronization between the component processes ($P \mathbin{|[\{a\}]|} Q$)

Hiding: provides a way to abstract processes by making some events unobservable ($(a \to P) \setminus \{a\}$)


Languages, Tools, and Extensions



OCCAM is a language based entirely on CSP

many implementations of CSP exist, in Java, C++, C#, Haskell, etc.

the newly introduced language Go is said to be CSP-based in its concurrency model

refinement checkers and model checking tools for CSP descriptions have been developed (FDR2, ARC, ProB, etc.)

TCSP is an extension of CSP with the notion of time (suitable for describing real-time systems)

CSPP extends CSP with the notion of priority (suitable for describing resource-limited systems)

HCSP extends CSPP with new synchronization primitives and a model of state (suitable also for hardware modeling)



