Top Banner
Exploiting Detachability Hashem H. Najaf-abadi Eric Rotenberg
16

Exploiting Detachability

Jan 01, 2016

Download

Documents

rajah-salinas

Exploiting Detachability. Hashem H. Najaf-abadi Eric Rotenberg. Different jobs, Different tools. Different applications have different characteristics and therefore different resource needs. Therefore a single fixed architecture compromises the performance of the individual, - PowerPoint PPT Presentation
Welcome message from author
This document is posted to help you gain knowledge. Please leave a comment to let me know what you think about it! Share it to your friends and learn new things together.
Transcript
Page 1: Exploiting Detachability

Exploiting Detachability

Hashem H. Najaf-abadi

Eric Rotenberg

Page 2: Exploiting Detachability

Different jobs, Different tools

• Different applications have different characteristics and therefore different resource needs.

• Therefore a single fixed architecture compromises the performance of the individual,

for the performance of all.

Page 3: Exploiting Detachability

Architectural changeability (Transformation)

• In silicon-based technology;

performance of a changeable (polymorphic) design in a fixed configuration

is less thana non-changeable implementation of the same configuration.

Page 4: Exploiting Detachability

Changeability in subcomponent

Changeability at any level of the design hierarchy

Changeability in interconnect

Sub-component

Interconnect

Interacting subcomponents may need to change too

Page 5: Exploiting Detachability

Changeability at logic-circuit level

• In an adder for instance;

F.A.F.A.F.A.F.A.

carry

• Or the bypasses;

F.U. F.U. F.U. F.U.

Page 6: Exploiting Detachability

Changeability at the pipeline level

• In the execution for instance;

fetch

decode

dispatch

issue

execute

write-back

execute

Page 7: Exploiting Detachability

Changeability at the processor level

L2 cache

Core A

Core B

Core C

• At least there’s no higher level for changeability to spread to.

Page 8: Exploiting Detachability

Heterogeneity

• Pros: • No low-level changeability

• Cons:• Poor scalability (die area is consumed,

burdening access to system resources)

• Inflexible (once configurations are placed in the system, they are permanent,

while their need is user dependent)

Page 9: Exploiting Detachability

Spread Heterogeneity to numerous chips

• Pros:• Increases the overall die area, thus

ameliorating the unscalability

• Cons:• Exacerbates the burdening of access to

system resources • Remains inflexible in the forms of

architectural diversity that are made available

Page 10: Exploiting Detachability

Exploiting Detachability

• Detachability: a property that already exists (due to marketing and packaging issues).

• Pros:• No suboptimality due to limited die are or

burdening of access to system resources. • Flexible in the forms of architectural diversity

Page 11: Exploiting Detachability

Exploiting Detachability

• Other advantages:

• A substrate for gradual employment of alternate technologies (which tend to be application dependent)

• A paradigm where architects can focus on innovations for enhancing architectures for specific applications, rather than tweaking the same old design.

Page 12: Exploiting Detachability

Changeability in real world applications

• Rough automatic design-space exploration for the integer SPEC2000 benchmarks

• Randomly varied the L1 and L2 cache sizes, the processor width, issue queue size, and clock period.

Page 13: Exploiting Detachability

Customization results

bzip gap gcc gzip mcf parser perl twolf vortex vpr crafty

No. mem. access cycles 345 345 345 345 278 323 228 298 328 183 302

No. front-end cycles 13 13 13 13 11 12 9 11 13 7 12

Processor width 3 3 3 3 5 5 3 4 2 7 6

Issue queue size 64 64 64 64 64 64 16 16 64 64 64

B-to-Back lat. of dep. inst. 14 14 14 14 12 14 9 12 13 8 13

Clock period 0.1447 0.1447 0.1447 0.1447 0.1797 0.1547 0.2187 0.1676 0.1522 0.2718 0.165

No. L1 access cycles 7 8 8 8 9 10 8 10 10 6 14

No. L1-cache lines 32 128 64 128 1024 512 512 512 128 128 128

L1-cache line-size 16 32 64 32 64 32 8 128 8 32 64

L1-cache associativity 1 1 1 1 1 1 4 1 2 1 8

No. L2 access cycles 13 15 19 18 55 32 12 28 20 16 16

No. L2-cache lines 32 64 32 128 4096 1024 1024 512 256 1024 1024

L2-cache line-size 32 64 512 256 128 16 32 128 64 32 16

L2-cache associativity 16 16 16 8 4 1 4 16 16 16 8

Page 14: Exploiting Detachability

On each other’s bzip gap gcc gzip mcf parser perl twolf vortex vpr crafty

bzip 0 0.0053* 0.0105 0.0056 0.1033 0.1052 0.2113 0.2073 0.0488 0.1145 0.1714

gap 0.2033 0 0.0135 0.008* 0.2684 0.1615 0.1623 0.1863 0.0475 0.1352 0.1105

gcc 0.4977 0.3078 0 0.1834 0.3832 0.1768 0.2503 0.1620 0.1519 0.1293 0.1190

gzip 0.2179 0.0668 0.0071* 0 0.3 0.1762 0.1688 0.1666 0.0690 0.1744 0.1322

mcf 0.5113 0.4794 0.5212 0.4532 0 0.5008 0.5906 0.3511 0.5333 0.3876 0.3983

parser 0.4107 0.2854 0.2104 0.2405 0.1010 0 0.3794 0.1466 0.3074 0.1273 0.1664

perl 0.6460 0.6141 0.3751 0.4822 0.4491 0.1802 0 0.1347 0.0035* 0.1491 0.2413

twolf 0.4686 0.3345 0.2016 0.2446 0.0997 0.0595* 0.3311 0 0.2183 0.0665 0.2472

vortex 0.6223 0.5284 0.2809 0.4958 0.4809 0.1379 0.2989 0.1590 0 0.2594 0.2268

vpr 0.4327 0.2572 0.1664 0.2587 0.1149 0.0385 0.2583 0.0179* 0.0677 0 0.1803

crafty 0.7088 0.5850 0.1886 0.1886 0.6225 0.3351 0.1827 0.3524 0.0875* 0.2837 0

average 0.429 0.3458 0.1795 0.2632 0.2822 0.197 0.2576 0.1866 0.1718 0.166 0.1812

Rows indicate benchmarks, and columns indicate the their customized architectures

Page 15: Exploiting Detachability

Representative architectures

• Assigning surrogates:

gcc

gzip

parser

perl

twolf

vortex

vpr

crafty

gap

7

6

bzip

5

mcf

3

4

2

1

Page 16: Exploiting Detachability

Customization results

gcc mcfparse

rvorte

x crafty

No. mem. access cycles 345 278 323 328 302

No. front-end cycles 13 11 12 13 12

Processor width 3 5 5 2 6

Issue queue size 64 64 64 64 64

B-to-Back lat. of dep. inst. 14 12 14 13 13

Clock period 0.1447 0.17970.154

70.152

2 0.165

No. L1 access cycles 8 9 10 10 14

No. L1-cache lines 64 1024 512 128 128

L1-cache line-size 64 64 32 8 64

L1-cache associativity 1 1 1 2 8

No. L2 access cycles 19 55 32 20 16

No. L2-cache lines 32 4096 1024 256 1024

L2-cache line-size 512 128 16 64 16

L2-cache associativity 16 4 1 16 8