A Heterogeneous Multiple Network-On-Chip Design: An Application-Aware Approach Asit K. Mishra Chita R. Das Onur Mutlu

A Heterogeneous Multiple Network-On-Chip Design: An ...users.ece.cmu.edu/~omutlu/pub/mishra_dac13_talk.pdf · A Heterogeneous Multiple Network-On-Chip Design: ... 2 4 6 applu deal

Jun 05, 2018


Executive summary

• Problem: Current-day NoC designs are agnostic to application requirements and are provisioned for the general or worst case, even though applications place widely differing demands on the network.
• Our goal: Design a NoC that can satisfy the diverse, dynamic performance requirements of applications.
• Observation: Applications fall into two general classes based on what they require from the network: bandwidth-sensitive and latency-sensitive.
  - Not all applications are equally sensitive to bandwidth and latency.
• Key idea: Design two NoCs.
  - Each sub-network is customized for either BW-sensitive or LAT-sensitive applications.
  - Propose metrics to classify applications as BW- or LAT-sensitive.
  - Prioritize applications' packets within each sub-network based on their sensitivity.
• Network design: The BW-optimized network has wider links but runs at a lower frequency. The LAT-optimized network has narrower links but runs at a higher frequency.
• Results: Compared to an iso-resource monolithic network, our proposal improves weighted speedup by 5% and instruction throughput by 3%, and reduces energy by 31%.

Resource requirements of various applications - I

Impact of channel bandwidth on application performance

• Channel bandwidth affects network latency, throughput and energy/power
• Increasing channel BW leads to:
  - Reduction in packet serialization
  - Increase in router power
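The following is a rough sketch of the serialization effect. It assumes that each packet is cut into flits exactly one link width wide and that one flit crosses a link per cycle. The 512-bit packet size is a hypothetical example (roughly a 64-byte cache line, ignoring header bits), not a parameter taken from the talk. Router power is not modeled.

```python
import math


def serialization_cycles(packet_bits: int, link_width_bits: int) -> int:
    """Cycles needed to push one packet through a link, one flit per cycle.

    Assumption: every flit is exactly one link width wide.
    """
    if packet_bits <= 0 or link_width_bits <= 0:
        raise ValueError("packet size and link width must be positive")
    return math.ceil(packet_bits / link_width_bits)


# Hypothetical 512-bit packet across the link widths studied in the talk.
PACKET_BITS = 512
for width in (64, 128, 256, 512):
    print(f"{width}b links: {serialization_cycles(PACKET_BITS, width)} cycle(s)")
```

Under these assumptions, each doubling of link width halves serialization for packets larger than one flit. Packets that already fit in a single flit gain nothing.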


Simulation settings:

• 8x8 multi-hop, packet-based mesh network
• Each node has an out-of-order (OoO) processor (2GHz), a private L1 cache and a router (2GHz)
• Shared L2 cache, 1MB per core
• 6 VCs per physical channel (VC/PC), 2-stage router


[Figure: Instruction throughput (IT), normalized to 64b links, for 30 applications (applu, wrf, art, deal, sjeng, barnes, grmcs, namd, h264, gcc, pvray, tonto, libq, gobmk, astar, milc, hmmer, swim, sjbb, sap, xalan, sphnx, bzip, lbm, sjas, soplx, cacts, omnet, gems, mcf) with 64b, 128b, 256b and 512b links]

1. 18 of 30 applications (21 of 36 in total) are largely insensitive to channel BW: an 8x BW increase yields less than a 2x performance increase.

2. 12 of 30 applications (15 of 36 in total) scale with channel BW: an 8x BW increase yields at least a 2x performance increase.
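The 2x-for-8x criterion above amounts to a simple ratio test. The sketch below applies it to per-application instruction throughput (IT) measured with 64b and 512b links. The function name, the data layout and the sample numbers are hypothetical and serve only to illustrate the rule. They are not values read off the figure.

```python
from typing import Dict


def classify_bw(it_by_width: Dict[int, float],
                base_width: int = 64,
                top_width: int = 512,
                scale_threshold: float = 2.0) -> str:
    """Label an application by how its IT scales from base_width to top_width links.

    IT values may be absolute or normalized; only their ratio is used.
    """
    speedup = it_by_width[top_width] / it_by_width[base_width]
    return "BW-sensitive" if speedup >= scale_threshold else "BW-insensitive"


# Hypothetical normalized IT numbers, for illustration only.
apps = {
    "app_a": {64: 1.0, 128: 1.1, 256: 1.2, 512: 1.3},
    "app_b": {64: 1.0, 128: 1.9, 256: 3.4, 512: 5.8},
}
for name, it in apps.items():
    print(name, classify_bw(it))  # app_a BW-insensitive, app_b BW-sensitive
```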

Resource requirements of various applications - II

Impact of network latency on application performance

• Reducing router latency (by increasing frequency) leads to:
  - Reduction in packet latency
  - Increase in router power consumption

Simulation settings:

• … same as the previous experiment
• 128b links
• Dummy stages (2 and 4 extra cycles) added to each router, giving 4-cycle and 6-cycle routers alongside the 2-cycle baseline


[Figure: Instruction throughput (IT), normalized to the 2-cycle router, for 30 applications (applu through mcf) with 2-cycle, 4-cycle and 6-cycle routers]

1. 18 of 30 applications (21 of 36 in total) are sensitive to network latency: a 3x latency reduction yields at least a 25% performance improvement.

2. 12 of 30 applications (15 of 36 in total) are only marginally sensitive to network latency: a 3x latency reduction yields less than a 15% performance improvement.
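Read literally, the two criteria above form a threshold test on the speedup gained by moving from a 6-cycle router to a 2-cycle router. The sketch below assumes IT values like those in the figure (normalized or absolute) and treats the 6-cycle to 2-cycle change as the 3x latency reduction. The slides leave the 15% to 25% band unassigned, so the sketch does too. The function name and thresholds-as-parameters are illustrative.

```python
def classify_latency(it_2cycle: float, it_6cycle: float,
                     sensitive_gain: float = 0.25,
                     marginal_gain: float = 0.15) -> str:
    """Classify latency sensitivity from IT with 2-cycle vs 6-cycle routers.

    gain is the relative IT improvement when router latency drops 3x (6 -> 2 cycles).
    """
    gain = it_2cycle / it_6cycle - 1.0
    if gain >= sensitive_gain:
        return "LAT-sensitive"
    if gain < marginal_gain:
        return "marginally LAT-sensitive"
    return "unclassified"  # falls between the two thresholds quoted on the slide


# Hypothetical IT values normalized to the 2-cycle router.
print(classify_latency(1.0, 0.70))  # LAT-sensitive
print(classify_latency(1.0, 0.95))  # marginally LAT-sensitive
```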



Application-aware approach to designing multiple NoCs

[Figure: the channel-bandwidth and network-latency sensitivity charts, repeated side by side]

Based on these observations:
1. Applications can be classified into distinct classes, typically LAT-sensitive or BW-sensitive
2. LAT-sensitive applications benefit from low network latency
3. BW-sensitive applications benefit from high network bandwidth
4. Not all applications are equally sensitive to LAT or BW
5. A monolithic network cannot optimize for both classes simultaneously



Solution

Two NoCs, where each (sub)network is optimized for either LAT-sensitive or BW-sensitive applications



Design methodology

[Figure: Logical view of a multicore processor: Processors/L1$, Network, L2$ and Mem. Controllers, with a DEMUX in the network]

1. Identify LAT/BW-sensitive applications
   - We propose a novel dynamic application classification scheme
2. Design sub-networks based on applications' demand
   - This network architecture is better than a monolithic iso-resource network


Design: Dynamic classification of applications

[Figure: Application life cycle over time, alternating network episodes and compute episodes; y-axis: outstanding network packets; each network episode is annotated with its episode length and episode height]

• Network episode: the application has at least one outstanding packet, and the processor is likely stalling (low IPC)
• Compute episode: the application has no outstanding packet (high IPC)

Episode length = number of consecutive cycles during which there are outstanding network packets
Episode height = average number of L1 packets injected during an episode
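The following is a minimal sketch of how episode length and height could be extracted from per-cycle counters. It assumes a trace of outstanding-packet and injected-packet counts per cycle for one core. These inputs are hypothetical, since the slides do not describe the hardware counters. The sketch also reads "average height" as the mean number of packets injected per episode.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class EpisodeStats:
    count: int          # number of network episodes found
    avg_length: float   # mean cycles per network episode
    avg_height: float   # mean packets injected per network episode


def episode_stats(outstanding: Sequence[int], injected: Sequence[int]) -> EpisodeStats:
    """Summarize network episodes from per-cycle counters of one core.

    outstanding[c]: packets in flight at cycle c
    injected[c]:    packets injected at cycle c
    A network episode is a maximal run of cycles with outstanding[c] > 0;
    injections in cycles with nothing outstanding are ignored.
    """
    if len(outstanding) != len(injected):
        raise ValueError("traces must have the same length")
    lengths: List[int] = []
    heights: List[int] = []
    run_len = run_inj = 0
    for busy, inj in zip(outstanding, injected):
        if busy > 0:
            run_len += 1
            run_inj += inj
        elif run_len:
            lengths.append(run_len)
            heights.append(run_inj)
            run_len = run_inj = 0
    if run_len:  # the trace ended inside an episode
        lengths.append(run_len)
        heights.append(run_inj)
    n = len(lengths)
    if n == 0:
        return EpisodeStats(0, 0.0, 0.0)
    return EpisodeStats(n, sum(lengths) / n, sum(heights) / n)


print(episode_stats([0, 1, 2, 1, 0, 0, 3, 3, 0], [0, 1, 1, 0, 0, 0, 3, 0, 0]))
# EpisodeStats(count=2, avg_length=2.5, avg_height=2.5)
```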

• Short episode height: low memory-level parallelism (MLP); each request is critical (LAT-sensitive)
• Tall episode height: high MLP (BW-sensitive)
• Short episode length: packets are very critical (LAT-sensitive)
• Long episode length: latency tolerant (packets could be de-prioritized)


Classification and ranking

Classification: LAT/BW (rows: episode height; columns: episode length)

Height \ Length | Long          | Medium                                    | Short
Tall            | gems, mcf     | sphinx, lbm, cactus, xalan                | sjeng, tonto
Medium          | omnetpp, apsi | ocean, sjbb, sap, bzip, sjas, soplex, tpc | applu, perl, barnes, gromacs, namd, calculix, gcc, povray, h264, gobmk, hmmer, astar
Short           | leslie        | art, libq, milc, swim                     | wrf, deal

Ranking: Sensitivity to LAT/BW

Height \ Length | Long   | Medium | Short
High            | Rank-4 | Rank-2 | Rank-1
Medium          | Rank-3 | Rank-2 | Rank-2
Short           | Rank-4 | Rank-3 | Rank-1
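The ranking table above can be used directly as a lookup. The sketch makes three assumptions. First, Rank-1 is the highest priority, which fits short episodes being the most latency-critical. Second, routers break ties between equally ranked packets by age. Third, the length/height cut points are placeholders, not the thresholds used in the talk. All function and constant names are hypothetical.

```python
from typing import Dict, List, Tuple

# Ranking table transcribed from the slide: RANK[height][length].
# Assumption: a smaller rank number means higher priority (Rank-1 first).
RANK: Dict[str, Dict[str, int]] = {
    "high":   {"long": 4, "medium": 2, "short": 1},
    "medium": {"long": 3, "medium": 2, "short": 2},
    "short":  {"long": 4, "medium": 3, "short": 1},
}

# Hypothetical cut points (low, high); not values from the talk.
LENGTH_CUTS = (1000.0, 3000.0)   # cycles
HEIGHT_CUTS = (2.0, 6.0)         # packets per episode


def bin_value(x: float, cuts: Tuple[float, float], labels: Tuple[str, str, str]) -> str:
    """Return labels[0] below cuts[0], labels[1] below cuts[1], else labels[2]."""
    if x < cuts[0]:
        return labels[0]
    if x < cuts[1]:
        return labels[1]
    return labels[2]


def app_rank(avg_length: float, avg_height: float) -> int:
    """Look up an application's rank from its average episode length and height."""
    length = bin_value(avg_length, LENGTH_CUTS, ("short", "medium", "long"))
    height = bin_value(avg_height, HEIGHT_CUTS, ("short", "medium", "high"))
    return RANK[height][length]


def arbitrate(waiting: List[Tuple[str, int, int]]) -> str:
    """Pick the winning packet: best (lowest) rank first, then oldest.

    Each entry is (packet_id, rank, injection_cycle).
    """
    return min(waiting, key=lambda pkt: (pkt[1], pkt[2]))[0]


print(app_rank(500, 1.0), app_rank(5000, 8.0))                    # 1 4
print(arbitrate([("p1", 3, 10), ("p2", 1, 20), ("p3", 1, 15)]))   # p3
```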


Network design

Evaluated configurations (nN-w: n network(s); w: link width in bits, one width per sub-network):
• 1N-128
• 1N-256
• 2N-64x256-ST (Steering)
• 2N-64x256-ST-RK (Steering + Ranking)
• 2N-64x256-ST-RK(FS) (Steering + Ranking and Frequency Scaling)
• 1N-512 (High BW)
• 2N-128x128
• 1N-320 (iso-BW)
• 1N-320(FS) (iso-resource)
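The configuration names encode link widths. The sketch below decodes them and checks the iso-BW pairing: 64 + 256 = 320 bits, the same total width as 1N-320. It also shows a simple steering rule. The naming interpretation is our reading of the labels. The steering rule is a hypothetical simplification of the slides' description (the LAT-optimized network is narrower and faster, the BW-optimized network is wider). It is not the authors' implementation.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class NocConfig:
    name: str
    link_widths: List[int]  # link width in bits, one entry per sub-network

    @property
    def total_width(self) -> int:
        return sum(self.link_widths)


def parse_config(name: str) -> NocConfig:
    """Parse labels such as '1N-128' or '2N-64x256-ST-RK(FS)' (assumed naming scheme)."""
    m = re.match(r"(\d+)N-(\d+(?:[xX]\d+)*)", name)
    if m is None:
        raise ValueError(f"unrecognized configuration name: {name!r}")
    count = int(m.group(1))
    widths = [int(w) for w in re.split(r"[xX]", m.group(2))]
    if len(widths) != count:
        raise ValueError(f"{name!r}: {count} network(s) but {len(widths)} width(s)")
    return NocConfig(name, widths)


def steer(app_class: str, config: NocConfig) -> int:
    """Index of the sub-network that carries this application's packets.

    Hypothetical rule: LAT-sensitive -> narrowest (higher-frequency) network,
    everything else -> widest network. With equal widths both map to index 0,
    so a real design would need another tie-breaking rule.
    """
    widths = config.link_widths
    target = min(widths) if app_class == "LAT" else max(widths)
    return widths.index(target)


cfg = parse_config("2N-64x256-ST-RK(FS)")
print(cfg.link_widths, cfg.total_width)      # [64, 256] 320
print(steer("LAT", cfg), steer("BW", cfg))   # 0 1
```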


Analysis

[Figure: Performance (25 workloads, each 50% BW-sensitive and 50% LAT-sensitive applications); y-axis: weighted speedup (0-60); bar annotations: +18%, +7%, +5%, 5%, w. 2%, w. 2%]

[Figure: Energy (same 25 workloads); y-axis: normalized energy (0-2); bar annotations: -47%, -59%]

Best EDP across all designs
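The slides report weighted speedup, instruction throughput (IT) and EDP but do not restate the formulas. The sketch below assumes the standard definitions:

- Weighted speedup is the sum, over applications, of each application's shared-run IPC divided by its alone-run IPC.
- IT is the sum of shared-run IPCs.
- EDP is energy times delay.

All inputs are hypothetical.

```python
from typing import Sequence


def weighted_speedup(ipc_shared: Sequence[float], ipc_alone: Sequence[float]) -> float:
    """Sum over applications of (IPC in the shared run) / (IPC running alone)."""
    if len(ipc_shared) != len(ipc_alone):
        raise ValueError("need one alone-IPC per application")
    return sum(s / a for s, a in zip(ipc_shared, ipc_alone))


def instruction_throughput(ipc_shared: Sequence[float]) -> float:
    """Sum of per-application IPCs in the shared run."""
    return sum(ipc_shared)


def energy_delay_product(energy: float, delay: float) -> float:
    """EDP = energy x delay; lower is better. Units follow the inputs."""
    return energy * delay


def pct_change(new: float, base: float) -> float:
    """Percentage change of new relative to base (negative means a reduction)."""
    return 100.0 * (new - base) / base


# Hypothetical two-application workload.
ws = weighted_speedup([0.8, 1.5], [1.0, 2.0])
it = instruction_throughput([0.8, 1.5])
print(ws, it)
```

Comparing two designs then reduces to `pct_change` on WS, IT or energy, and to comparing EDP values directly.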

Conclusions

• Problem: Current-day NoC designs are agnostic to application requirements and are provisioned for the general or worst case, even though applications place widely differing demands on the network.
• Our goal: Design a NoC that can satisfy the diverse, dynamic performance requirements of applications.
• Observation: Applications fall into two general classes based on what they require from the network: bandwidth-sensitive and latency-sensitive.
  - Not all applications are equally sensitive to bandwidth and latency.
• Key idea: Design two NoCs.
  - Each sub-network is customized for either BW-sensitive or LAT-sensitive applications.
  - Propose metrics to classify applications as BW- or LAT-sensitive.
  - Prioritize applications' packets within each sub-network based on their sensitivity.
• Network design: The BW-optimized network has wider links but runs at a lower frequency. The LAT-optimized network has narrower links but runs at a higher frequency.
• Results: Compared to an iso-resource monolithic network, our proposal improves weighted speedup by 5% and instruction throughput by 3%, and reduces energy by 31%.

Thank you

Questions? Asit Mishra

Backup slides

Other metrics considered for application classification

[Figure: L1MPKI and L2MPKI (axis 0-120) and slack in cycles (axis 0-2000) for 30 applications (applu through mcf)]
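For reference, MPKI (misses per kilo-instruction) normalizes a miss count by the number of executed instructions. The sketch below uses hypothetical counter values. Slack is not computed here, because the slide does not define how it was measured.

```python
def mpki(misses: int, instructions: int) -> float:
    """Misses per kilo-instruction: misses * 1000 / instructions."""
    if instructions <= 0:
        raise ValueError("instruction count must be positive")
    return misses * 1000.0 / instructions


# Hypothetical per-core counters.
counters = {"l1_misses": 45_000, "l2_misses": 6_000, "instructions": 1_500_000}
l1_mpki = mpki(counters["l1_misses"], counters["instructions"])  # 30.0
l2_mpki = mpki(counters["l2_misses"], counters["instructions"])  # 4.0
print(l1_mpki, l2_mpki)
```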


Analysis of network episode length and height

[Figure: Average episode height (network packets, 0-14) and average episode length (in cycles, 0-6000) for 30 applications (applu through mcf), grouped as short length/height, medium length/height, and long length/high height; off-scale bars are labeled 0.3M, 10K, 0.4M and 18K]

Based on performance scaling sensitivity to bandwidth and frequency

Empirical results to support the classification

SPECjbb (sjbb) is used as the cut-off between BW-sensitive and LAT-sensitive applications


Why 9 clusters?

[Figure: Within-group sum of squares (0-225) vs. number of clusters (0-35); annotation: 13x]
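An elbow plot like this one can be reproduced in spirit with a plain k-means sweep. The sketch assumes each application is a 2-D point, for example average episode length and height, ideally normalized. The slides do not say which features or clustering algorithm were used. Lloyd's k-means with deterministic farthest-point initialization is therefore our stand-in, and the sample points are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def _dist2(a: Point, b: Point) -> float:
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def kmeans_wss(points: Sequence[Point], k: int, iters: int = 100) -> float:
    """Within-group sum of squares after Lloyd's k-means.

    Initialization: the first point, then repeatedly the point farthest
    from all centers chosen so far (deterministic).
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centers: List[Point] = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(_dist2(p, c) for c in centers)))
    for _ in range(iters):
        groups: List[List[Point]] = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: _dist2(p, centers[i]))
            groups[nearest].append(p)
        new_centers = [
            (sum(q[0] for q in g) / len(g), sum(q[1] for q in g) / len(g)) if g else c
            for g, c in zip(groups, centers)
        ]
        if new_centers == centers:  # converged
            break
        centers = new_centers
    return sum(min(_dist2(p, c) for c in centers) for p in points)


# Hypothetical points forming three obvious groups.
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (20, 0), (20, 1), (21, 0)]
for k in range(1, 6):
    print(k, round(kmeans_wss(pts, k), 2))
```

The "elbow" is the k after which adding clusters stops reducing the within-group sum of squares substantially. For these sample points that happens at k = 3.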


Analysis with varying workload combinations

[Figure: WS and IT (normalized to the 1N-128 network, axis 0.7-1.5) for workload mixes of 0%/25%/50%/75%/100% bandwidth-sensitive and, correspondingly, 100%/75%/50%/25%/0% latency-sensitive applications]

Comparison to prior works

[Figure: WS and IT (normalized to the 1N-128 network, axis 0.8-1.4) for 1N-128-STC, 1N-128-ST+RK, 2N-128x128-LD-BAL, 2N-64x256-W-LD-BAL and 2N-64x256-ST+RK(FS)]

Dynamic steering of packets

[Figure: % of packets in each sub-network (latency-optimized network vs. bandwidth-optimized network) for applu, art, barnes, namd, gcc, libq, astar, hmmer, leslie, calculix, sjeng, sap, sphnx, lbm, soplx, omnet, mcf and ocean]


Design: Putting it all together

[Figure: Logical view of a multicore processor (Processors/L1$, Network, L2$ and Mem. Controllers) with a MUX, annotated "Episode LEN/HT"]

• Classify applications based on sensitivity to network BW/LAT
• Use network episode length/height to dynamically identify apps
• Design LAT/BW-optimized networks
• Prioritization within networks


Summary

• A NoC paradigm based on a top-down approach (application demand/requirement analysis)
• An efficient design paradigm for future heterogeneous multicores

[Figure: A heterogeneous multicore with a big core, a small core, GPGPUs and accelerators/ASICs, whose demands range from latency critical (including real-time constraints) to throughput (BW) critical]

Providing all these guarantees in one network is hard → multiple networks, each customized for one metric

[Figure: Requirements (latency, throughput, local communication, long-haul communication, power) and candidate sub-network designs (1-cycle/bufferless/faster routers; 1-cycle/high bandwidth; power-efficient links/DVFS router; hybrid/fewer-connectivity network; butterfly/express channels); the sub-networks share 2D space or 3D layers, with a MUX labeled "Episode LEN/HT/??"]