Highly Fault-tolerant NoC Routing with Application-aware Congestion Management
Doowon Lee, Ritesh Parikh and Valeria Bertacco, University of Michigan


Dec 13, 2015

Page 1

Highly Fault-tolerant NoC Routing with Application-aware Congestion Management

Doowon Lee, Ritesh Parikh and Valeria Bertacco, University of Michigan

Page 2

Wide Range of Applications

(picture sources) 1. N-body simulation: https://www.astro.rug.nl/~weygaert 2. semiconductor: http://spectrum.ieee.org 3. computational biology: http://csbio.cs.umn.edu/ 4. molecular structure: http://nanotechnologyuniverse.com

Everyday applications: cloud computing
Scientific applications: physical simulation, computational chemistry, computational biology, semiconductor simulation
These applications vary in computation characteristics, user requirements, etc.

Page 3

Application Running on Network-on-Chip

(picture sources) 1. Video encoder: Gary Sullivan et al., Standardized Extensions of High Efficiency Video Coding (HEVC) 2. Tilera TILE-Gx8072: http://www.tilera.com

Application example: a video encoder is mapped onto a chip multiprocessor with a network-on-chip (NoC).
Analysis of communication frequency (number of flits per source-destination pair) in a 64-thread simulation of SPLASH-2 (ocean): some pairs communicate more frequently than others.
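The communication-frequency analysis on this slide amounts to tabulating flits per source-destination pair from a packet trace. The sketch below is a minimal illustration, assuming a hypothetical trace format of (source, destination, flits) records; the function names `communication_matrix` and `hottest_pairs` are ours, not part of any simulator API.

```python
def communication_matrix(packets, num_nodes):
    """Sum flits per (source, destination) pair.

    `packets` is a hypothetical trace format: an iterable of
    (src, dst, flits) tuples with node ids in range(num_nodes).
    """
    matrix = [[0] * num_nodes for _ in range(num_nodes)]
    for src, dst, flits in packets:
        matrix[src][dst] += flits
    return matrix


def hottest_pairs(matrix, k):
    """Return the k (src, dst, flits) pairs that carry the most traffic."""
    n = len(matrix)
    pairs = [(matrix[s][d], s, d)
             for s in range(n) for d in range(n)
             if s != d and matrix[s][d] > 0]
    pairs.sort(reverse=True)  # heaviest pairs first
    return [(s, d, flits) for flits, s, d in pairs[:k]]


trace = [(0, 1, 5), (0, 1, 3), (2, 3, 1), (1, 0, 4)]
m = communication_matrix(trace, 4)
print(hottest_pairs(m, 2))  # [(0, 1, 8), (1, 0, 4)]
```

A dense matrix is enough for an 8×8 mesh (64 × 64 entries); the same per-pair totals are what an application-aware scheme would later use as communication frequencies.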

Page 4

Fragile Networks-on-Chip

Increasing transistor density → lower transistor reliability: 22 nm (Intel), 14 nm (Intel), 7 nm (IBM), the tail of transistor scaling
The network-on-chip is a possible single point of failure
Solution for permanent faults: network-on-chip routing reconfiguration

Page 5

How to reduce NoC degradation from faults?

Figure: saturation throughput (flits/cycle) vs. number of faults affecting the NoC (0 to 60) for state-of-the-art routing reconfiguration [Aisopos 11], with the minimum throughput requirement and our goal marked.

motivating experiment: fault vs. performance degradation

KEY IDEA: application-aware routing optimized to the application’s communication patterns

Network-on-chip reconfiguration entails performance degradation

Page 6

Application-Aware Routing (1/2)

Problem: with no routing restrictions, S and D have various route options (path diversity = 6), but deadlock is possible.
Solution 1: avoid deadlock by restricting turns; the routing becomes deadlock-free, but the S-to-D path diversity drops to 3.

How do we find adaptive routing optimized to communication patterns?
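The path-diversity numbers on this slide count the minimal routes from S to D that survive the turn restrictions. A minimal sketch of that count, assuming a fault-free mesh, minimal routing, row-major node ids with "S" meaning the next row, and a hypothetical (node, incoming direction, outgoing direction) encoding of turns; it illustrates the idea and is not FATE's implementation.

```python
from functools import lru_cache

# Hypothetical conventions: node id = y * width + x, "S" moves to the next row,
# and a turn is a (node, in_dir, out_dir) triple.
DIRS = {"E": (1, 0), "W": (-1, 0), "N": (0, -1), "S": (0, 1)}


def path_diversity(width, src, dst, disabled_turns=frozenset()):
    """Count minimal src -> dst paths in a fault-free mesh that avoid disabled turns."""
    tx, ty = dst % width, dst // width

    def productive(node):
        # Directions that move the packet closer to dst (at most two).
        x, y = node % width, node // width
        return [d for d, ok in (("E", tx > x), ("W", tx < x),
                                ("S", ty > y), ("N", ty < y)) if ok]

    @lru_cache(maxsize=None)
    def count(node, in_dir):
        if node == dst:
            return 1
        total = 0
        for d in productive(node):
            # Injecting at the source or going straight is never a turn.
            if in_dir is not None and in_dir != d and (node, in_dir, d) in disabled_turns:
                continue
            nxt = node + DIRS[d][0] + DIRS[d][1] * width
            total += count(nxt, d)
        return total

    return count(src, None)


print(path_diversity(3, 0, 8))                     # 6
print(path_diversity(3, 0, 8, {(4, "S", "E")}))    # 5
```

Without restrictions, the 3×3 corner-to-corner case gives 6, as on the slide; disabling only the south-to-east turn at the center node (node 4) removes one route and leaves 5. Where a restriction is placed decides how much diversity a given flow loses, which is the lever application-aware routing exploits.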

Page 7

Application-Aware Routing (2/2)

Problem: with no routing restrictions, S and D have various route options (path diversity = 6), but deadlock is possible.
Solution 2: avoid deadlock by restricting turns, placing the restrictions so that the S-to-D path diversity stays at 6.
Where is it best to place turn restrictions? This is an NP-complete problem.
How do we find adaptive routing optimized to communication patterns?
OUR CONTRIBUTION: a turn-restriction placement heuristic

Page 8

Presentation Outline

• FATE (Fault- and Application-aware Turn-model Extension)
  (1) Turn-enabling rules: “How to reduce the search?”
  (2) Load estimation: “Which is the most valuable turn?”
  (3) Overall routing-computation algorithm
• Experimental evaluation
• Conclusions

Page 9

How to reduce the turn-restriction search?

Avoid unfruitful turn-restriction patterns:
pattern 1. network disconnection
pattern 2. non-minimal restriction (more turns restricted than necessary)
pattern 3. possible deadlock
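Pattern 1 (network disconnection) can be detected with a breadth-first search over (node, incoming direction) states, because whether a packet may leave a router in a given direction depends on how it arrived. A sketch under stated assumptions: row-major node ids, faulty links given as hypothetical `frozenset({a, b})` pairs, non-minimal routes allowed, and U-turns never allowed; the helper names are illustrative.

```python
from collections import deque

# Hypothetical conventions: node id = y * width + x, "S" moves to the next row,
# a turn is (node, in_dir, out_dir), a faulty link is frozenset({a, b}).
DIRS = {"E": (1, 0), "W": (-1, 0), "N": (0, -1), "S": (0, 1)}
OPPOSITE = {"E": "W", "W": "E", "N": "S", "S": "N"}


def links_from(width, height, node, faulty_links):
    """Yield (direction, neighbor) for every healthy link leaving `node`."""
    x, y = node % width, node // width
    for d, (dx, dy) in DIRS.items():
        nx, ny = x + dx, y + dy
        if 0 <= nx < width and 0 <= ny < height:
            nxt = ny * width + nx
            if frozenset((node, nxt)) not in faulty_links:
                yield d, nxt


def reachable(width, height, src, disabled_turns, faulty_links):
    """Nodes reachable from src by routes that obey the turn restrictions."""
    seen = set()                   # explored (node, in_dir) states
    reached = {src}
    queue = deque()
    for d, nxt in links_from(width, height, src, faulty_links):
        seen.add((nxt, d))         # injection at the source is unrestricted
        queue.append((nxt, d))
    while queue:
        node, in_dir = queue.popleft()
        reached.add(node)
        for d, nxt in links_from(width, height, node, faulty_links):
            if d == OPPOSITE[in_dir]:
                continue           # assumption: U-turns are never allowed
            if d != in_dir and (node, in_dir, d) in disabled_turns:
                continue           # restricted turn
            if (nxt, d) not in seen:
                seen.add((nxt, d))
                queue.append((nxt, d))
    return reached


def is_connected(width, height, disabled_turns=frozenset(), faulty_links=frozenset()):
    """True if every node can reach every other node."""
    n = width * height
    return all(len(reachable(width, height, s, disabled_turns, faulty_links)) == n
               for s in range(n))


broken = {frozenset((0, 1))}
print(is_connected(2, 2, faulty_links=broken))                             # True
print(is_connected(2, 2, {(3, "E", "N")}, broken))                         # False
```

In a 2×2 mesh with link 0-1 faulty, the network stays connected, since node 0 can still reach node 1 via 2 and 3; additionally disabling the east-to-north turn at node 3 cuts that last route, so the placement would be rejected.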

Page 10

Turn-Enabling Rules

Basic rules: enable adjacent turns (cycle, node, link)
Advanced rules: enable remote turns (horizontal, vertical, diagonal)
Each time a turn is disabled, several others should be enabled, to avoid unfruitful turn-restriction patterns.

Page 11

Traffic-Load Estimation

Which is the most valuable turn? Use traffic-load estimation to decide.

Specific goals: (1) balance link utilization; (2) prioritize critical turns

Load calculation steps: path diversity → link load → turn load → cycle load → weight scaling

The estimation takes hop-by-hop route decisions into account.

Page 12

Traffic-Load Estimation Step by Step

Figure: 4×4 mesh (nodes 0 to 15) with a source and a destination, annotated with per-node path-diversity values, illustrating the five steps: path diversity → link load → turn load → cycle load → weight scaling (multiply by communication frequency; links are shaded as high, medium, or low traffic).
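Because the estimation takes hop-by-hop route decisions into account, link load is not simply each link's share of the end-to-end paths. This sketch assumes one plausible reading of the slides: each router splits its incoming load equally among the allowed productive outputs that still have nonzero path diversity toward the destination (so a link with no path gets load 0), and turn load is the share that changes direction at a router. The conventions and names are hypothetical.

```python
from collections import defaultdict
from functools import lru_cache

# Hypothetical conventions: node id = y * width + x, "S" moves to the next row,
# and a turn is a (node, in_dir, out_dir) triple.
DIRS = {"E": (1, 0), "W": (-1, 0), "N": (0, -1), "S": (0, 1)}


def estimate_flow_load(width, src, dst, disabled_turns=frozenset()):
    """Split one unit of src -> dst traffic hop by hop over minimal routes.

    Assumption: every router divides its incoming load equally among the
    allowed productive outputs that still lead to dst.
    Returns (link_load, turn_load) dictionaries.
    """
    tx, ty = dst % width, dst // width

    def productive(node):
        x, y = node % width, node // width
        return [d for d, ok in (("E", tx > x), ("W", tx < x),
                                ("S", ty > y), ("N", ty < y)) if ok]

    def allowed(node, in_dir, d):
        return in_dir is None or in_dir == d or (node, in_dir, d) not in disabled_turns

    def step(node, d):
        return node + DIRS[d][0] + DIRS[d][1] * width

    @lru_cache(maxsize=None)
    def diversity(node, in_dir):
        # Number of minimal routes still available from this state.
        if node == dst:
            return 1
        return sum(diversity(step(node, d), d)
                   for d in productive(node) if allowed(node, in_dir, d))

    link_load, turn_load = defaultdict(float), defaultdict(float)
    frontier = {(src, None): 1.0}  # (node, in_dir) -> load arriving there
    while frontier:                # each round advances every packet one hop
        next_frontier = defaultdict(float)
        for (node, in_dir), load in frontier.items():
            if node == dst:
                continue
            options = [d for d in productive(node)
                       if allowed(node, in_dir, d) and diversity(step(node, d), d) > 0]
            for d in options:
                share = load / len(options)
                nxt = step(node, d)
                link_load[(node, nxt)] += share
                if in_dir is not None and in_dir != d:
                    turn_load[(node, in_dir, d)] += share
                next_frontier[(nxt, d)] += share
        frontier = next_frontier
    return dict(link_load), dict(turn_load)


links, turns = estimate_flow_load(3, 0, 8)
print(links[(0, 1)], links[(4, 5)], turns[(4, "S", "E")])  # 0.5 0.25 0.125
```

On a 3×3 mesh from node 0 to node 8, the source links carry 0.5 each, the next links 0.25, and each of the two turns at the center node 0.125, the same halving pattern as the slide's 0.5 / 0.25 / 0.125 values.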

Page 13

Example: Link, Turn, Cycle Load (1/2)

Figure: 4×4 mesh example (cycle 9-10-14-13 highlighted) walking through the five traffic-load estimation steps (diversity, link, turn, cycle, scale). Link load is derived from path diversity hop by hop: 1/2 = 0.5 on each source output link, then 0.25; turn loads of 0.125 and 0.25 follow, and the cycle load is the sum of the turn loads in the cycle. The figure also lists the values 0.17, 0.33, and 0 (path diversity divided by 6).

Page 14

Example: Link, Turn, Cycle Load (2/2)

Figure: the same 4×4 example with a different turn configuration. Link loads are again 0.5 and 0.25; a turn that no path uses gets turn load 0 (no path), and the cycle-load sums shown are 0.375 and 0.25.

Page 15

Example: Weight Scaling

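Figure: two flows on the mesh, S1 to D1 (communication frequency 20) and S2 to D2 (communication frequency 8). Before scaling, the most congested cycle carries turn loads of 0.125, 0.25, and 0.38; after each flow's loads are multiplied by its communication frequency (0.125 × 20 = 2.5, 0.25 × 20 = 5, 0.38 × 8 ≈ 3), the scaled loads in the figure range from 2.5 to 13.5.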
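Cycle load and weight scaling can be sketched as follows, assuming each unit square of the mesh contributes one clockwise and one counter-clockwise cycle of four turns, and that weight scaling multiplies each flow's turn loads by its communication frequency before summing across flows. The per-flow turn loads in the example are hypothetical, loosely modeled on the slide's 0.125 × 20 = 2.5 and 0.38 × 8 ≈ 3; the function names are ours.

```python
from collections import defaultdict


def square_cycles(width, height):
    """Yield ((top_left, orientation), turns) for both cycles of every unit square.

    Hypothetical conventions: node id = y * width + x, "S" moves to the next
    row, and a turn is (node, in_dir, out_dir).
    """
    for y in range(height - 1):
        for x in range(width - 1):
            a = y * width + x                            # top-left corner
            b, c, d = a + 1, a + width + 1, a + width    # top-right, bottom-right, bottom-left
            yield (a, "cw"), [(b, "E", "S"), (c, "S", "W"), (d, "W", "N"), (a, "N", "E")]
            yield (a, "ccw"), [(d, "S", "E"), (c, "E", "N"), (b, "N", "W"), (a, "W", "S")]


def scale_turn_loads(flow_turn_loads, frequency):
    """Weight scaling: multiply each flow's turn loads by its frequency, then sum."""
    total = defaultdict(float)
    for flow, loads in flow_turn_loads.items():
        for turn, load in loads.items():
            total[turn] += frequency[flow] * load
    return dict(total)


def cycle_load(turns, turn_load):
    """Cycle load = sum of the loads of the turns that form the cycle."""
    return sum(turn_load.get(t, 0.0) for t in turns)


def most_congested_cycle(width, height, turn_load):
    name, turns = max(square_cycles(width, height),
                      key=lambda cycle: cycle_load(cycle[1], turn_load))
    return name, cycle_load(turns, turn_load)


# Hypothetical per-flow turn loads on a 3x3 mesh.
flows = {
    "S1->D1": {(1, "E", "S"): 0.125},
    "S2->D2": {(4, "S", "W"): 0.375},
}
freq = {"S1->D1": 20, "S2->D2": 8}
weighted = scale_turn_loads(flows, freq)
print(weighted)                              # {(1, 'E', 'S'): 2.5, (4, 'S', 'W'): 3.0}
print(most_congested_cycle(3, 3, weighted))  # ((0, 'cw'), 5.5)
```

Scaling by frequency is what makes the estimate application-aware: a turn touched by a heavy flow outweighs one with a larger unscaled share of a light flow, so the most congested cycle may change after scaling.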

Page 16

Putting it all together

1) Evaluate candidate turns one at a time and choose the one leading to the least congestion.
2) Apply the turn-enabling rules.
Iterate this process until no undecided turn is left.

Page 17

Backtracking

Deadlock is still possible due to greedy turn-restriction selections: the turn-enabling rules do not resolve all deadlock-causing patterns.
Solution: backtrack to the last decision.
Example placement (nodes 0 to 11) and decision tree: node 5, turn NW → node 6, turn NE → deadlock detected → backtrack → node 3, turn SW
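One standard way to answer "deadlock?" for a candidate turn configuration is the channel dependency graph test (Dally and Seitz): treat every directed link as a channel, add a dependency wherever a packet may continue from one channel into the next, and check for cycles; an acyclic graph is a sufficient condition for deadlock freedom. The slides do not state which detection method FATE uses. This sketch assumes one virtual channel per link, no U-turns, and the same hypothetical turn encoding used elsewhere in these examples.

```python
from collections import defaultdict, deque

# Hypothetical conventions: node id = y * width + x, "S" moves to the next row,
# a turn is (node, in_dir, out_dir), faulty links are frozenset({a, b}).
DIRS = {"E": (1, 0), "W": (-1, 0), "N": (0, -1), "S": (0, 1)}


def channels(width, height, faulty_links=frozenset()):
    """Yield every healthy directed link as (from_node, to_node, direction)."""
    for node in range(width * height):
        x, y = node % width, node // width
        for d, (dx, dy) in DIRS.items():
            nx, ny = x + dx, y + dy
            if 0 <= nx < width and 0 <= ny < height:
                nxt = ny * width + nx
                if frozenset((node, nxt)) not in faulty_links:
                    yield (node, nxt, d)


def has_potential_deadlock(width, height, disabled_turns, faulty_links=frozenset()):
    """True if the channel dependency graph has a cycle (assumes 1 VC per link)."""
    chans = list(channels(width, height, faulty_links))
    leaving = defaultdict(list)
    for ch in chans:
        leaving[ch[0]].append(ch)
    succ = defaultdict(list)
    indegree = {ch: 0 for ch in chans}
    for u, v, d_in in chans:
        for nxt in leaving[v]:
            _, w, d_out = nxt
            if w == u:
                continue  # U-turn: assumed never allowed
            if d_out != d_in and (v, d_in, d_out) in disabled_turns:
                continue  # restricted turn: no dependency
            succ[(u, v, d_in)].append(nxt)
            indegree[nxt] += 1
    # Kahn's algorithm: repeatedly remove channels with no incoming dependency.
    queue = deque(ch for ch, k in indegree.items() if k == 0)
    removed = 0
    while queue:
        ch = queue.popleft()
        removed += 1
        for nxt in succ[ch]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                queue.append(nxt)
    return removed < len(chans)  # leftovers lie on, or downstream of, a cycle


print(has_potential_deadlock(2, 2, set()))                           # True
print(has_potential_deadlock(2, 2, {(1, "E", "S"), (2, "S", "E")}))  # False
xy = {(n, a, b) for n in range(9) for a in "NS" for b in "EW"}
print(has_potential_deadlock(3, 3, xy))                              # False
```

In a 2×2 mesh, the unrestricted configuration reports a possible deadlock, as does one that breaks only the clockwise cycle; breaking both cycles, or applying XY (dimension-order) restrictions on a 3×3 mesh, removes every dependency cycle. A greedy search would call a check like this after each decision and backtrack when it returns True.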

Page 18

FATE Route-Computation Procedure

Procedure flowchart: start (trigger) → estimate traffic load → choose turn to be disabled → deadlock? disconnect? (if so, backtrack) → apply turn-enabling rules → no undecided turn? (if undecided turns remain, loop; otherwise, end)
Triggers: (1) new application launch, (2) fault occurrence
Network example legend: disabled, enabled, and undecided turns; high, medium, and low traffic
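The flowchart's greedy loop with backtracking can be written as a recursive search. This is a generic sketch, not FATE's code: `cost`, `enabling_rules`, and `is_valid` are hypothetical callbacks standing in for traffic-load estimation, the turn-enabling rules, and the deadlock and disconnection checks.

```python
def route_computation(turns, cost, enabling_rules, is_valid):
    """Greedy turn-restriction placement with backtracking (generic sketch).

    cost(disabled)                 -> estimated congestion with these turns disabled
    enabling_rules(disabled, turn) -> turns that become enabled by this decision
    is_valid(disabled)             -> False on deadlock or disconnection
    Returns (disabled, enabled) frozensets, or None if no placement works.
    """
    def search(disabled, enabled, undecided):
        if not undecided:
            return disabled, enabled          # every turn decided: done
        # Greedy order: least estimated congestion first.
        for turn in sorted(undecided, key=lambda t: cost(disabled | {t})):
            trial = disabled | {turn}
            if not is_valid(trial):
                continue                      # e.g. deadlock detected: try the next turn
            newly_enabled = frozenset(enabling_rules(trial, turn))
            result = search(trial, enabled | newly_enabled,
                            undecided - {turn} - newly_enabled)
            if result is not None:
                return result
        return None                           # nothing works here: backtrack

    return search(frozenset(), frozenset(), frozenset(turns))


# Toy problem: four hypothetical turns forming two mutually exclusive pairs.
pairs = {"a": "b", "b": "a", "c": "d", "d": "c"}
weight = {"a": 1, "c": 2, "b": 3, "d": 4}


def toy_cost(disabled):
    return sum(weight[t] for t in disabled)


def toy_rules(disabled, turn):
    return {pairs[turn]}   # disabling one turn of a pair enables the other


def toy_valid(disabled):
    return "a" not in disabled or len(disabled) == 1   # pretend {a, x} deadlocks


disabled, enabled = route_computation(pairs, toy_cost, toy_rules, toy_valid)
print(sorted(disabled), sorted(enabled))  # ['b', 'c'] ['a', 'd']
```

In the toy run, the greedy first choice "a" passes its own check but fails at the next level, so the search backtracks to the root and settles on disabling "b" and "c". The recursion depth grows with the number of turns, so a larger network would likely call for an explicit stack instead.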

Page 19

Presentation Outline

• FATE routing

• Experimental evaluation

– Experimental setup

– Evaluation on faulty topologies

– Evaluation on fault-free topologies

– Overheads

• Conclusions

Page 20

Experimental Setup

• BookSim simulation with 8×8 mesh networks
  – 3-stage router pipeline, 2 VCs per protocol class, 5 flits per VC
• Fault injection
  – faults in bidirectional links
  – 5 fault rates: 1 faulty link, and 3%, 5%, 10%, and 15% faulty links
  – 10 random fault patterns for each fault rate
• Traffic benchmarks
  – 5 synthetic patterns: bit complement, bit reversal, shuffle, transpose, uniform random
  – 11 traces from SPLASH-2 multi-threaded workloads
     • generated from gem5 simulation with MESI cache coherence
     • 4 memory controllers at the mesh corners
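The fault-injection setup can be scripted roughly as follows. An 8×8 mesh has 112 bidirectional links, so under round-to-nearest the percentage rates correspond to 3, 6, 11, and 17 faulty links; the rounding rule, the random seeding, and the helper names are assumptions, and this sketch does not check whether a pattern disconnects the network.

```python
import random


def mesh_links(width, height):
    """All bidirectional links of a width x height mesh as (a, b) with a < b."""
    links = []
    for y in range(height):
        for x in range(width):
            n = y * width + x
            if x + 1 < width:
                links.append((n, n + 1))       # horizontal link
            if y + 1 < height:
                links.append((n, n + width))   # vertical link
    return links


def faulty_link_counts(num_links, rates=(0.03, 0.05, 0.10, 0.15)):
    """1 faulty link, plus each rate rounded to the nearest link (assumption)."""
    return [1] + [round(r * num_links) for r in rates]


def fault_patterns(width, height, num_faulty, num_patterns=10, seed=0):
    """Draw `num_patterns` random sets of faulty links (connectivity not checked)."""
    rng = random.Random(seed)                  # seeded for reproducible patterns
    links = mesh_links(width, height)
    return [frozenset(rng.sample(links, num_faulty)) for _ in range(num_patterns)]


links = mesh_links(8, 8)
print(len(links))                      # 112
print(faulty_link_counts(len(links)))  # [1, 3, 6, 11, 17]
patterns = {k: fault_patterns(8, 8, k) for k in faulty_link_counts(len(links))}
```

Seeding a private `random.Random` instance keeps every fault pattern reproducible across routing schemes, so all of them are compared on identical faulty topologies.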

Page 21

Prior Routing Solutions

• Fault-tolerant routing
  – Breadth-First Search (BFS) [Schroeder 91, Aisopos 11]
  – Depth-First Search (DFS) [Sancho 04]
• Application-aware routing
  – Bandwidth-Sensitive Oblivious Routing (BSOR) [Kinsy 09, Kinsy 13]
  – Application-Specific Routing Algorithms (APSRA) [Palesi 08]
• Fully-adaptive routing on 2D mesh (congestion management)
  – Dynamic XY (DyXY) [Li 06]
  – Neighbor on Path (NoP) [Ascia 08]
  – Regional Congestion Awareness (RCA) [Gratz 08]

Page 22

Saturation Throughput for Synthetic Patterns

Figure (left): saturation throughput (packets/cycle/router) vs. number of faulty links for BFS and DFS (fault-tolerant), BSOR and APSRA (application-aware), and FATE (our solution). FATE shows less performance degradation as faults increase; the annotated gains are 9.5%, 10.6%, 17.7%, 23.3%, and 33.3% over fault-tolerant routing, and 5.5%, -0.5%, 0.1%, 2.9%, and 9.3% over application-aware routing.

Figure (right): saturation throughput (packets/cycle/router) per traffic pattern (bitcomp, bitrev, shuffle, transpose, uniform) at a 15% fault rate. Gains are maximized with unbalanced load, and FATE still provides a gain with uniform load.

Summary: 33.3% ↑ over fault-tolerant routing, 9.3% ↑ over application-aware routing.

Page 23

Packet Latency for SPLASH-2 Traces

Figure (left): average packet latency (cycles) vs. number of faulty links (1 fault, 3%, 5%, 10%, and 15% faults) for BFS, DFS, BSOR, APSRA, and FATE; FATE shows a minimal latency increase up to 5% faults.

Figure (right): average packet latency (cycles) per benchmark program at a 15% fault rate. FATE achieves significantly lower latency in 5 programs, with up to 59% (13%) latency reduction over BFS (APSRA); one bar is annotated with 228 cycles.

Page 24

Performance on Fault-Free Meshes

Figure: saturation throughput (packets/cycle/router) on fault-free meshes with 3, 4, and 6 VCs. Routing schemes: DOR (deterministic); DyXY, NoP, RCA1D (fully adaptive); BFS, DFS (fault-tolerant); BSOR, APSRA (application-aware); FATE (our solution).

Compared to DOR, fault-tolerant, and application-aware routing, FATE always provides higher saturation throughput (due to better traffic-load estimation).

Compared to fully-adaptive routing, FATE performs better with a small number of VCs (because more VCs are available for normal transfers).

Page 25

Overheads

• Software computation
  – 2-4 sec for 8×8 meshes on an Intel Xeon® processor (two orders of magnitude faster than APSRA)
  – ~110 turn-placement attempts (little dependence on fault rate)
• Hardware overheads
  – Area: 6% increase (routing table, route-computation logic)
  – Power consumption not measured
     • Better power efficiency than APSRA
     • Can be more power-efficient than application-agnostic solutions when the same routing is reused multiple times

Page 26

Conclusions

• FATE provides highly fault-tolerant routing with graceful performance degradation by leveraging application traffic patterns
• Performance improvement over existing fault-tolerant routing
  – 33% improvement in saturation throughput (synthetic traffic patterns)
  – 59% improvement in packet latency (SPLASH-2 traces)
• Route computation two orders of magnitude faster than APSRA

Page 27

Thank you! Questions?

Page 28

Backup Slides

Page 29

Various Turn-Restriction Choices

The number of turn-restriction choices increases exponentially with network size.
Example 1 (4 nodes): 4 possibilities
Example 2 (6 nodes): 16 possibilities (the other 8 cases are not shown)
A square 2-D mesh with M nodes contains $4^{(\sqrt{M}-1)\times(\sqrt{M}-1)}$ possibilities.
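Evaluating this count for a few sizes shows how quickly the choices grow. The sketch generalizes the exponent to (rows − 1) × (cols − 1), the number of unit squares; this reproduces both examples (2×2 → 4, 2×3 → 16) and equals the slide's (√M − 1)² for square meshes, but the generalization is our reading, not a formula stated on the slide.

```python
def turn_restriction_choices(rows, cols):
    """Evaluate 4^((rows-1)*(cols-1)), which equals 4^((sqrt(M)-1)^2) for square meshes."""
    return 4 ** ((rows - 1) * (cols - 1))


print(turn_restriction_choices(2, 2))            # 4
print(turn_restriction_choices(2, 3))            # 16
print(f"{turn_restriction_choices(8, 8):.2e}")   # 3.17e+29
```

For the 8×8 meshes used in the evaluation, the count is 4^49 = 2^98, about 3.17 × 10^29, far too many to enumerate; this is why a placement heuristic is needed.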

Page 30

Basic Turn-Enabling Rules (Cycle, Node, Link)

Which turns should be enabled upon a turn-restriction decision? Those that (1) minimize the number of restrictions and (2) guarantee deadlock freedom.
Rule 1 (cycle), rule 2 (node), and rule 3 (link) are illustrated on small meshes, with turns marked as undecided, enabled, or disabled.
What happens if we break the rules? A deadlock happens (example with a violated turn).

Page 31

Advanced Turn-Enabling Rules (Common Link, Opposite-corner Turn)

Turn types in the figures: undecided, disabled, enabled (basic), enabled (advanced), candidate
Rule 4 (common link): why rule 4? Applying only the basic rules, the turn should be enabled for both candidates.
Rule 5 (opposite-corner turn): horizontal, vertical, and diagonal enabling (see the paper for details)

Page 32

Applying Basic Turn-Enabling Rules to Faulty Topologies

Rule 1 (cycle): special case, no double counting: a mutual turn is counted for only one cycle.
Rules 2 & 3 (node & link): no special change
Example: deadlock when disabling only the mutual turn

Page 33

Applying Advanced Turn-Enabling Rules to Faulty Topologies

Rule 4 (common link): apply only towards fault-free directions
Rule 5 (opposite-corner turn): apply as if the network were fault-free