AAA which application form ? R. de Simone INRIA EPI Aoste semi-technical considerations

Feb 24, 2016

Transcript
Page 1: AAA which application form ?

AAA: which application form?

R. de Simone, INRIA EPI Aoste

semi-technical considerations

Page 2: AAA which application form ?

Adequation Algorithme-Architecture
Allocation Application / Architecture

mapping comprises:
– spatial allocation: distribution, placement (of tasks on resources)
– temporal allocation: scheduling

original formulation: Yves Sorel (1988); since then rephrased as Platform-based design, Y-Chart approach, …

[Figure: Application and Architecture joined by mapping (adequation)]

Page 3: AAA which application form ?

Adequation Algorithme-Architecture
Allocation Application / Architecture

mapping comprises:
– spatial allocation: distribution, placement (of tasks on resources)
– temporal allocation: scheduling

architecture trends:
– networks of processors (multicore, GPGPU/MIC, many-buzz)
– communication bandwidth is the issue; on-chip networks… spatial routing, temporal arbitration, what else?

[Figure: Application and Architecture joined by mapping]

Page 4: AAA which application form ?

Adequation Algorithme-Architecture
Allocation Application / Architecture

what kind of application description models?
– where concurrency and timing could be extracted, made explicit
– Under favorable conditions, this could be performed at compile time (static, time-predictable)
– Dimensioning should be feasible to optimally fit the architecture (cache sizes, PRET notions, …)
– Existing theories developed in neighboring domains could be combined and adapted

GOAL is to refine the application so it reflects the architecture

[Figure: Application and Architecture joined by mapping]

Page 5: AAA which application form ?

Draft example: Mapping

[Figure labels: Bus, Proc1, Proc2]

Page 6: AAA which application form ?

Mapping = spatial distribution/routing + temporal scheduling

[Figure labels: Bus, Proc1, Proc2]
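The equation "mapping = spatial + temporal" can be made concrete with a small sketch. Everything named here (`Slot`, `is_valid_mapping`, the tasks `f`, `g` and the transfer `f_to_g`, the durations) is hypothetical and not taken from the slides; only the resource names Proc1, Proc2 and Bus come from the figure. The sketch treats a communication as a task placed on the bus, which is one possible modeling choice among others.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Slot:
    resource: str   # spatial allocation: where the task runs
    start: int      # temporal allocation: when it starts
    duration: int   # how long it occupies the resource

def overlaps(a: Slot, b: Slot) -> bool:
    """True when two slots share a resource and their time intervals intersect."""
    return (a.resource == b.resource
            and a.start < b.start + b.duration
            and b.start < a.start + a.duration)

def is_valid_mapping(mapping: dict[str, Slot], deps: list[tuple[str, str]]) -> bool:
    """Check resource exclusivity and producer-before-consumer ordering."""
    slots = list(mapping.values())
    # No resource executes two tasks at the same time.
    for i, a in enumerate(slots):
        for b in slots[i + 1:]:
            if overlaps(a, b):
                return False
    # Every consumer starts only after its producer has finished.
    return all(mapping[src].start + mapping[src].duration <= mapping[dst].start
               for src, dst in deps)

# Hypothetical example: f on Proc1, its result crosses the Bus, g on Proc2.
mapping = {
    "f": Slot("Proc1", 0, 2),
    "f_to_g": Slot("Bus", 2, 1),   # the communication is itself scheduled
    "g": Slot("Proc2", 3, 2),
}
deps = [("f", "f_to_g"), ("f_to_g", "g")]
```

Starting `g` one step earlier would violate the dependency on the bus transfer, so `is_valid_mapping` would reject that variant.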

Page 7: AAA which application form ?

Applications: the scope MoCCs

[Figure: three families of models (Syn/Poly-chronous, Process Networks, Affine bounds nested loops), linked by "explicit distribution, concurrency extraction" and "explicit scheduling, time constraints extraction"]

Page 8: AAA which application form ?

Applications: the scope MoCCs

• Exercise: count the communities (do not forget Model-Driven Engineering for model transformations, and Classical or Real-Time Scheduling)

[Figure: three families of models (Syn/Poly-chronous, Process Networks, Affine bounds nested loops), linked by "explicit distribution, concurrency extraction" and "explicit scheduling, time constraints extraction"]

Page 9: AAA which application form ?

Motivations for the joint use of these models

• Similar restrictions !!
– little (in fact no) concern for data values
– control largely data-independent (uninterpreted functions)
– finite-state control property
– role of conflict-freeness as functional determinism
– Formal models and mathematical analysis
– Issues tackled at compile time

but…
– Largely developed independently (even if in neighboring groups)
– Lack of common vocabulary, or of common framework, or common drive (risk of losing the AAA grade?)

…Or is it just me who needs to go back to school?

Page 10: AAA which application form ?

Process Networks

• Data-flow: computations triggered by data availability
– Pure Data-Flow: Marked Graphs, SDF extension
– w/ data route switches: Boolean DF, CycloStatic DF, Kahn PNs
• Conflict-freeness: all computations amount to same partial order, only different scheduling/time assignments
• ASAP schedule: provides best throughput
• Correctness issues: safety and liveness
• Optimization issues: optimize buffer sizes while preserving throughput

Second-level semantics: computations according to schedules (activation conditions)
Representation as polychronous descriptions
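For the SDF extension of Marked Graphs, the usual first step before any scheduling is to solve the balance equations, which give the number of firings of each actor per period. The slide does not spell this out; the following is a minimal sketch of that standard computation, assuming a connected graph. The function name `repetition_vector` and the actors A, B, C with their rates are hypothetical.

```python
from fractions import Fraction
from math import lcm

def repetition_vector(actors, edges):
    """Smallest positive firing counts balancing every edge of a connected SDF graph.

    edges: (producer, consumer, tokens_produced, tokens_consumed) per firing.
    Raises ValueError when the rates admit no periodic schedule.
    """
    rate = {actors[0]: Fraction(1)}          # fix one actor, propagate to the rest
    changed = True
    while changed:
        changed = False
        for src, dst, prod, cons in edges:
            if src in rate and dst not in rate:
                rate[dst] = rate[src] * prod / cons
                changed = True
            elif dst in rate and src not in rate:
                rate[src] = rate[dst] * cons / prod
                changed = True
    # Balance equation on every edge: firings(src) * prod == firings(dst) * cons.
    for src, dst, prod, cons in edges:
        if rate[src] * prod != rate[dst] * cons:
            raise ValueError(f"inconsistent rates on edge {src} -> {dst}")
    scale = lcm(*(r.denominator for r in rate.values()))
    return {a: int(rate[a] * scale) for a in actors}

# Hypothetical chain: A produces 2 / B consumes 3, then B produces 1 / C consumes 2.
firings = repetition_vector(["A", "B", "C"],
                            [("A", "B", 2, 3), ("B", "C", 1, 2)])
```

A graph with an inconsistent pair of edges has no bounded-buffer periodic schedule, which is the safety side of the correctness issues listed above.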

Page 11: AAA which application form ?

Explicit timing for Process Networks

• Classical scheduling of Marked Graphs, Synchronous Data Flow graphs
• Latency-Insensitive Design
– Ultimately k-periodic schedules

Represented with
– N-synchronous formalisms
– Signal and affine clock calculus

[Figure labels: Syn/Poly-chronous, Process Networks]

Page 12: AAA which application form ?

Ultimately k-periodic scheduling

[Figure: data-flow graph (nodes include f and g; edge annotations garbled in extraction) annotated with activation words (11010), (10101), (10110), (01101), (01011), (11010) and [00100]]

loop
  pause; emit clk; pause; pause; emit clk; pause; emit clk; pause;
endloop
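An ultimately k-periodic schedule can be written as a finite prefix followed by a period repeated forever. The sketch below is one possible encoding; the class name `UPWord` and its methods are hypothetical. Under the usual Esterel reading, where each `pause` ends the current instant, the loop above emits `clk` with the pattern (01011), which is one of the words on the figure.

```python
from dataclasses import dataclass
from fractions import Fraction

@dataclass(frozen=True)
class UPWord:
    prefix: str   # finite initial part, possibly empty
    period: str   # non-empty part repeated forever

    def at(self, n: int) -> int:
        """Activation bit at instant n (0-based)."""
        if n < len(self.prefix):
            return int(self.prefix[n])
        return int(self.period[(n - len(self.prefix)) % len(self.period)])

    def rate(self) -> Fraction:
        """Long-run fraction of active instants (the prefix does not matter)."""
        return Fraction(self.period.count("1"), len(self.period))

    def take(self, n: int) -> str:
        """First n activation bits as a string."""
        return "".join(str(self.at(i)) for i in range(n))

# Activation pattern of clk in the Esterel loop, read as a purely periodic word.
clk = UPWord(prefix="", period="01011")
```

All the parenthesized words listed on the slide contain three 1s out of five instants, so under this encoding they share the same rate of 3/5 and differ only in phase.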

Page 13: AAA which application form ?

Explicit routing

• Regular switching patterns in BDF, CSDF, KPNs
– Extending scheduling with similar expressivity (ultimately periodic infinite binary words)
KRGs: (K-periodically) Routed Graphs
– Axiomatics for an equational theory of communication re-wiring
• Allows considering the sharing of communications (on common interconnect structures)
– Interplay of scheduling and routing (arbitration between communications using the same channels)

[Figure labels: Syn/Poly-chronous, Process Networks]

Page 14: AAA which application form ?

Interconnect modeling and optimization

• On-Chip Networks
• Switching conditions of Select/Merge nodes can configure different communication paths, possibly overlapping in time
• Predictable routing schemes (ultimately k-periodic) will match the temporal schedules obtained in classical Process Network scheduling theory

[Figure: components C1, C2, C3, C4 interconnected through Select nodes and Merge nodes]
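The role of switching conditions can be illustrated with a toy model of Select and Merge nodes driven by a periodic binary word. This is only a sketch: the function names, the convention that bit 1 means "second branch", and the pattern used are assumptions, not the formal KRG definitions.

```python
from itertools import cycle, islice

def select(tokens, pattern):
    """Split one stream in two: bit 1 sends the token to out1, bit 0 to out0."""
    out0, out1 = [], []
    for tok, bit in zip(tokens, cycle(pattern)):
        (out1 if bit == "1" else out0).append(tok)
    return out0, out1

def merge(in0, in1, pattern):
    """Join two streams: bit 1 takes the next token from in1, bit 0 from in0."""
    it0, it1 = iter(in0), iter(in1)
    out = []
    for bit in islice(cycle(pattern), len(in0) + len(in1)):
        out.append(next(it1 if bit == "1" else it0))
    return out

tokens = list(range(10))
pattern = "10110"                      # hypothetical periodic switching condition
low, high = select(tokens, pattern)    # two communication paths
restored = merge(low, high, pattern)   # same pattern at the merge side
```

When the Merge node uses the same word as the Select node, the original token order is restored, which is the kind of identity an equational theory of re-wiring can state. With mismatched patterns, `merge` may run out of tokens on one input and raise `StopIteration`.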

Page 15: AAA which application form ?

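one possible routing configuration

[Figure: network of components C1, C2, C3, C4 with 0/1 switching values on the Select/Merge nodes; individual values not recoverable from the extraction]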

Page 16: AAA which application form ?

another possible configuration

[Figure: network of components C1, C2, C3, C4 with 0/1 switching values on the Select/Merge nodes; individual values not recoverable from the extraction]

Page 17: AAA which application form ?

Nested loops with affine bounds

• Parallel compilation models
– static control (regular indices, not while (data_cond) loops)
– Iteration space (multidimensional arrays, polyhedra)
– Source-to-source transformations (rewrite as program, only with DOSEQ and DOPAR at various levels)
– Improving data locality
• Variants
– Systems of Uniform Recurrence Equations (SUREs, MMAlpha)
– Geometrical description formalisms (Array-OL)

Page 18: AAA which application form ?

Example + Reduced Data Graphs

[Figure labels: Process Network, Affine bounds nested loops]

loop i = 1 to N
  loop j = 1 to N
    a(i,j) := a(i-1,j-1) + a(i,j-1)
  end
end

dependence levels: 2, 1
direction vectors: (0,1), (1,1)

DOSEQ j = 1 to N
  DOPAR i = 1 to N
    a(i,j) := a(i-1,j-1) + a(i,j-1)
  end
end
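The rewrite is legal because both dependences, (0,1) and (1,1) in (i,j) order, have a positive j component: once j is the outer sequential loop, every read targets column j-1, which is already complete, so the i iterations are independent. The sketch below checks this numerically. The boundary values in `make_grid` are arbitrary assumptions, and the DOPAR loop is simulated by running the i iterations in reverse order.

```python
def make_grid(n):
    """(n+1) x (n+1) array; row 0 and column 0 hold arbitrary boundary constants."""
    return [[i + 2 * j if i == 0 or j == 0 else 0 for j in range(n + 1)]
            for i in range(n + 1)]

def original(n):
    """Sequential reference: i outer, j inner, as in the first loop nest."""
    a = make_grid(n)
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            a[i][j] = a[i - 1][j - 1] + a[i][j - 1]
    return a

def interchanged(n):
    """DOSEQ j / DOPAR i version; i is visited backwards to mimic any parallel order."""
    a = make_grid(n)
    for j in range(1, n + 1):                 # DOSEQ: columns in order
        for i in reversed(range(1, n + 1)):   # DOPAR: order must not matter
            a[i][j] = a[i - 1][j - 1] + a[i][j - 1]
    return a
```

Running the same experiment with i reversed inside the original i-outer nest would change the results, since the (1,1) dependence is carried by the i loop there.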

Page 19: AAA which application form ?

From Nested Loops to Process Networks

• Existing efforts: basic idea is to assign one computing node to each assignment
• COMPAAN, Pico Express
• Tiling, chunking
• Stefanov et al., Polyhedral Process Networks
• Multidimensional SDF for Array-OL/Gaspard

[Figure labels: Process Network, Affine bounds nested loops]

Page 20: AAA which application form ?

Dream (or nightmare?)
• How can one produce Process Network descriptions from nested loops (or better said, SUREs or RDGs)?
• Of course limited, but very often already polyhedra (for bounds) separated from dependency graphs (for computations)
• Needs to split clearly between (sequential) iterations and parallel ones (currently expanded)
• Issue of potential (application) vs real (architecture) concurrency/parallelism

[Figure: iteration space over axes i and j, with size labels 8 and 4; a(i,j) depends on a(i,j-1) and a(i-1,j-1)]

Page 21: AAA which application form ?

Dream (or nightmare?)
• Greenish nodes provide constants (in parallel at the bottom, sequentially on the left)
• Blue nodes each compute 4 values in parallel and shift the last one right across the border

Lack of generality, certainly when values to be sent across boundaries are not ordered as expected at the target

[Figure: iteration space over axes i and j, with size labels 8 and 4; a(i,j) depends on a(i,j-1) and a(i-1,j-1); blocks labeled a[1,1..4] and a[1,5..8], input a[1..4,0]; edge annotations 1 1, 3 3, 4 and activation words 0.(1) 0.(1)]

Page 22: AAA which application form ?

Challenges of it

• Not entirely clear (to me at least) how data locality and transfer (if any) are dealt with in different works

• Typical: a data block is transferred in some order (row), and consumed in a different order (column)
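The cost of such an order mismatch can be estimated with a small simulation: one element arrives per step in row-major order, and the consumer takes elements in column-major order as soon as the next one it needs is present. The function name and the assumption of instantaneous consumption are illustrative choices, not taken from the works mentioned.

```python
def peak_buffer(rows, cols):
    """Largest number of elements held at once between producer and consumer."""
    produce = [(r, c) for r in range(rows) for c in range(cols)]   # row-major arrival
    consume = [(r, c) for c in range(cols) for r in range(rows)]   # column-major use
    buffered, peak, nxt = set(), 0, 0
    for item in produce:
        buffered.add(item)
        peak = max(peak, len(buffered))
        # Drain every element that is now available in the consumer's own order.
        while nxt < len(consume) and consume[nxt] in buffered:
            buffered.remove(consume[nxt])
            nxt += 1
    return peak
```

In this model a 4 × 4 block needs room for 10 elements, against a single element when both sides agree on the order, which gives a feel for why the reordering has to be accounted for somewhere (in the application description or in the interconnect).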

Page 23: AAA which application form ?

SoC/NoC intelligent routers ?

• To match (and be programmed from) the application, routers
– should be able to fork the data
– should only care about directions (not single target)
– should let communications cross one another if not using same lines
– should be able to buffer values (at least in the size of the processor array)

[Figure: three rows of four "P Core" tiles]

Page 24: AAA which application form ?

ThAAAnk You

Questions anybody?