Page 1: Hardware/Software Instruction Set Configurability for System-on-Chip Processors

38th DAC, Las Vegas, June 18-22, 2001

Albert Wang, Chris Rowen, Dror Maydan, Earl Killian

Page 2: Landscape of reconfigurable computing

[Figure: design-space chart. Axes: optimality/integration (e.g. mW, $) and flexibility/modularity (e.g. time-to-market). Points: ASIC, FPGA, instruction-set configurable processor, general processor, FPGA + processor. Two gaps on the chart are each labeled Δ ~10x.]

Page 3: Computing using temporal connection

[Figure: processor solution: memory (program), control, registers, datapath.]

Processor: Correct ✓, Efficient ✗

Page 4: Computing using spatial connection

[Figure: processor solution (memory (program), control, registers, datapath) next to ASIC solution (FSM, storage).]

Scorecard columns: Correct, Efficient. The processor and the ASIC each receive one ✓ and one ✗; the extracted text does not preserve which column each mark belongs to.

Page 5: Configurable Processors: best of both

Processor with application-specific instructions

[Figure: processor solutions (memory (program), control, registers, datapath) next to ASIC solutions (FSM, storage).]

Scorecard columns: Correct, Efficient. Both rows of marks (labeled Processor and ASIC) show ✓ ✓.

Page 6: Outline

• Configurable processor solution
  • Xtensa™ processor architecture
  • Instruction extension automation
  • Software development tools
• An example
• Results
• Summary

Page 7: Conventional Architecture

[Figure: processor block diagram: decoder and control; register files RF0, RF1, RF2 and states S0, S1 on a source bus; four functional units, each labeled FU0; result bus.]

• More registers
• More FU's
• Deeper pipeline
• Bypass/forward

Page 8: Conventional Architecture – cont.

[Figure: same block diagram with source routing, result routing, and functional units FU0, FU1, FU2, FU3.]

• More FU's

Page 9: Conventional Architecture – cont.

[Figure: same block diagram with register files RF0-RF2, states S0, S1, and functional units FU0-FU3.]

• More FU's
• More registers

Page 10: Conventional Architecture – cont.

[Figure: same block diagram with register files RF0-RF2, states S0, S1, and functional units FU0-FU3.]

• More registers
• More FU's
• Deeper pipeline

Page 11: Conventional Architecture – cont.

[Figure: same block diagram with register files RF0-RF2, states S0, S1, and functional units FU0-FU3.]

• More registers
• More FU's
• Deeper pipeline
• Bypass/forward

Page 12: Conventional Architecture – cont.

• Problems with a fixed processor:
  • Wastes silicon: there is no universal set of extensions, or even one for each application class
  • Not fast enough compared with a hardware implementation
  • Wastes power
• The Tensilica solution:
  • Small core processor
  • Easy and efficient application-specific instruction extensions

Page 13: Xtensa Architecture – Base

[Figure: Xtensa base block diagram: decoder and control; register files RF0, RF1, RF2 and states S0, S1; source routing; functional units; result routing.]

• Good performance
  • Comparable to any embedded 32-bit RISC
• Good code density
  • Much better than 32-bit RISC
  • Uses 16-bit/24-bit instructions
• Small
  • 0.7 mm² in 0.18 µm
• Low power
  • 0.37 mW/MHz
• Easy extension
  • With the Tensilica Instruction Extension (TIE) language, at the ISA level
• Efficient extension
  • TIE compiler generates an efficient pipelined implementation
  • TIE compiler extends all software development tools

Page 14: TIE language - opcode

[Figure: Xtensa block diagram: decoder and control; register files RF0-RF2 and states S0, S1; source routing; functional units; result routing.]

• Opcode

    opcode MAC op2=5 CUST0

Page 15: TIE Language – regfile / state

[Figure: Xtensa block diagram with register file RF0 and state S0, extended as needed.]

• Opcode
• Register file / state (as needed)

    state ACC 40

Page 16: TIE Language – semantics

[Figure: Xtensa block diagram with register file RF0 and state S0 (extended as needed) and a MAC unit added beside FU0.]

• Opcode
• Register file / state
• Semantics

    semantic sem1 {MAC} {assign ACC = ACC + ars[15:0] * art[15:0];}

Page 17: TIE Language – iclass

[Figure: Xtensa block diagram with register file RF0 and state S0 (extended as needed) and a MAC unit added beside FU0.]

• Opcode
• Register file / state
• Semantics
• Instruction class

    iclass c1 {MAC} {in ars, in art} {inout ACC}

Page 18: TIE Language - schedule

[Figure: Xtensa block diagram with register file RF0 and state S0 (extended as needed) and a MAC unit added beside FU0.]

• Opcode
• Register file / state
• Semantics
• Instruction class
• Schedule

    schedule s1 {MAC} {use ars 1; use art 1; use ACC 2; def ACC 2;}

Page 19: A Complete Example – parallel MAC

    opcode PMAC op2=0 CUST0

    state ACC1 40
    state ACC2 40

    iclass rr {PMAC} {in ars, in art} {inout ACC1, inout ACC2}

    semantic pmac_sem {PMAC} {
        assign ACC1 = ACC1 + ars[15:0] * art[15:0];
        assign ACC2 = ACC2 + ars[31:16] * art[31:16];
    }

    schedule pmac_schd {PMAC} {
        use ars 1; use art 1;
        use ACC1 2; use ACC2 2;
        def ACC1 2; def ACC2 2;
    }
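Note: the PMAC semantics above can be mirrored by a small C reference model, which is handy for checking results against a simulator. This is only a sketch: the names `pmac_state`, `pmac`, and `ACC_MASK` are hypothetical, and it assumes unsigned 16-bit operands and wraparound at 40 bits, neither of which the slide states.

```c
#include <stdint.h>

/* Hypothetical C model of the two 40-bit TIE states ACC1 and ACC2. */
typedef struct {
    uint64_t acc1;
    uint64_t acc2;
} pmac_state;

/* Mask that keeps the low 40 bits, matching "state ACC1 40". */
#define ACC_MASK ((UINT64_C(1) << 40) - 1)

/* One PMAC: two 16x16 multiply-accumulates in a single operation.
 * Assumption: operands are unsigned and the accumulators wrap at 40 bits. */
static void pmac(pmac_state *s, uint32_t ars, uint32_t art)
{
    uint64_t lo = (uint64_t)(ars & 0xFFFFu) * (art & 0xFFFFu); /* ars[15:0]  * art[15:0]  */
    uint64_t hi = (uint64_t)(ars >> 16) * (art >> 16);         /* ars[31:16] * art[31:16] */

    s->acc1 = (s->acc1 + lo) & ACC_MASK;
    s->acc2 = (s->acc2 + hi) & ACC_MASK;
}
```

Calling `pmac` once per packed pair of 16-bit samples accumulates two dot products at the same time, which is the point of the parallel MAC.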

Page 20: Productivity Gain – language + compiler

[Figure: design flow. Inputs: "Select processor options" and "Describe new instructions". Step: "Using the Xtensa processor generator, create...". Outputs: a tailored, synthesizable HDL uP core (ALU, pipe, I/O, timer, MMU, register file, cache) and a customized compiler, assembler, linker, debugger, and simulator.]

In minutes!

Page 21: Productivity Gain – Software Tools

[Figure: design flow. Inputs: "Select processor options" and "Describe new instructions". Step: "Using the Xtensa processor generator, create...". Outputs: a tailored, synthesizable HDL uP core (ALU, pipe, I/O, timer, MMU, register file, cache) and a customized compiler, assembler, linker, debugger, and simulator.]

Page 22: Software Support – Assembler

[Figure: block diagram with decoder, control, RF0, FU0, and the two added accumulators ACC1 and ACC2, each with an adder.]

• Assembler
• Custom data type
• Register allocation
• Code scheduling
• RTOS
• Simulator/debugger

        loop    a2, .L1
        l16si   a10, a3, 0
        l16si   a11, a3, 2
        addi.n  a3, a3, 2
        PMAC    a10, a11
    .L1:

Page 23: Software Support – custom data type

[Figure: block diagram with decoder, control, RF0, FU0, and the two added accumulators ACC1 and ACC2, each with an adder.]

• Assembler
• Custom data type
• Register allocation
• Code scheduling
• RTOS
• Simulator/debugger

C code:

    sat_int x, y, z;
    z = sat_add(x, y);
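Note: on the configured processor, `sat_int` and `sat_add` map to a custom register file and a TIE instruction. A plain C stand-in shows the arithmetic a saturating add would be expected to perform. This is a sketch under an assumption the slide does not confirm: that `sat_int` is a 32-bit signed value.

```c
#include <stdint.h>

/* Assumption: sat_int is 32-bit signed; the slide does not give its width. */
typedef int32_t sat_int;

/* Saturating add: clamp to the representable range instead of wrapping. */
static sat_int sat_add(sat_int x, sat_int y)
{
    int64_t sum = (int64_t)x + (int64_t)y; /* widen so the sum cannot overflow */

    if (sum > INT32_MAX) return INT32_MAX;
    if (sum < INT32_MIN) return INT32_MIN;
    return (sat_int)sum;
}
```

The compare-and-clamp sequence is what a single custom instruction would replace in the inner loop.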

Page 24: Software Support – register allocation

[Figure: block diagram with decoder, control, RF0, FU0, and the two added accumulators ACC1 and ACC2, each with an adder.]

• Assembler
• Custom data type
• Register allocation
• Code scheduling
• RTOS
• Simulator/debugger

Spilling around a call:

    sat_add    s3, s1, s2
    sat_store  s3, a1, 0
    call8      foo
    sat_load   s3, a1, 0

Page 25: Software Support – code scheduling

[Figure: block diagram with decoder, control, RF0, FU0, and the two added accumulators ACC1 and ACC2, each with an adder.]

• Assembler
• Custom data type
• Register allocation
• Code scheduling
• RTOS
• Simulator/debugger

C code:

    t = sat_mult(x, y);
    z = sat_add(z, t);
    t2 = sat_mult(x2, y2);

Scheduled code (the second multiply is moved ahead of the dependent add):

    sat_mult  s3, s1, s2
    sat_mult  s6, s5, s4
    sat_add   s7, s7, s3

Page 26: Software Support - RTOS

[Figure: block diagram with decoder, control, RF0, FU0, and the two added accumulators ACC1 and ACC2, each with an adder.]

• Assembler
• Custom data type
• Register allocation
• Code scheduling
• RTOS
• Simulator/debugger

Context switch: the custom registers s0, s1, … s15 of Task0 and Task1 are moved to and from memory with sat_store and sat_load.
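Note: the context switch amounts to one store per custom register for the outgoing task and one load per register for the incoming task. The C model below is only an illustration; `sat_regfile`, `task_context`, and the three functions are hypothetical names, and the 32-bit register width is an assumption.

```c
#include <stdint.h>

#define NUM_SAT_REGS 16 /* s0..s15, as listed on the slide */

/* Hypothetical model of the live custom register file. */
typedef struct { int32_t s[NUM_SAT_REGS]; } sat_regfile;

/* Hypothetical per-task save area in memory. */
typedef struct { int32_t saved[NUM_SAT_REGS]; } task_context;

/* Each loop iteration stands for one sat_store. */
static void save_context(task_context *ctx, const sat_regfile *rf)
{
    for (int i = 0; i < NUM_SAT_REGS; i++)
        ctx->saved[i] = rf->s[i];
}

/* Each loop iteration stands for one sat_load. */
static void restore_context(sat_regfile *rf, const task_context *ctx)
{
    for (int i = 0; i < NUM_SAT_REGS; i++)
        rf->s[i] = ctx->saved[i];
}

/* Switch from the running task to the next one. */
static void context_switch(sat_regfile *rf, task_context *from, const task_context *to)
{
    save_context(from, rf);
    restore_context(rf, to);
}
```

Because the save and restore are built from the generated load and store instructions, an RTOS can handle the added state without knowing what the registers are used for.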

Page 27: Software Support – simulator/debugger

[Figure: block diagram with decoder, control, RF0, FU0, and the two added accumulators ACC1 and ACC2, each with an adder.]

• Assembler
• Custom data type
• Register allocation
• Code scheduling
• RTOS
• Simulator/debugger

    gdb> break …
    gdb> cont
    gdb> step
    gdb> display …

Page 28: Outline

• Configurable processors
  • Architecture
  • Instruction extension
  • Software support
• An example
• Results
• Summary

Page 29: Data Encryption Standard (DES)

Initial step
    (R, L) = Initial_permutation(Din64)

Iterate 16 times
    Key generation
        (C, D) = PC1(k)
        n = rotate_amount (function of iteration count)
        C = rotate_right(C, n)
        D = rotate_right(D, n)
        K = PC2(D, C)
    Encryption
        R_{i+1} = L_i ⊕ Permutation(S_Box(K ⊕ Expansion(R_i)))
        L_{i+1} = R_i

Final step
    Dout64 = Final_permutation(L, R)
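Note: the key-generation step rotates the two 28-bit key halves by an amount that depends on the round. The sketch below uses the per-round amounts of the standard DES key schedule and follows the slide in rotating right; the direction depends on the bit-numbering convention in use. The names `des_rotate_amount`, `rotate_right28`, and `rotate_half` are hypothetical, not taken from the slides.

```c
#include <assert.h>
#include <stdint.h>

#define DES_ROUNDS 16
#define HALF_MASK  0x0FFFFFFFu /* C and D are 28-bit halves of the key */

/* Rotation amount for each of the 16 rounds (standard DES key schedule). */
static const unsigned des_rotate_amount[DES_ROUNDS] = {
    1, 1, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 1
};

/* Rotate a 28-bit value right by n bits, 0 < n < 28. */
static uint32_t rotate_right28(uint32_t v, unsigned n)
{
    assert(n > 0 && n < 28);
    v &= HALF_MASK;
    return ((v >> n) | (v << (28 - n))) & HALF_MASK;
}

/* Apply the rotations of the first `rounds` rounds to one key half. */
static uint32_t rotate_half(uint32_t half, int rounds)
{
    for (int i = 0; i < rounds; i++)
        half = rotate_right28(half, des_rotate_amount[i]);
    return half;
}
```

The amounts sum to 28, so after all 16 rounds each half is back at its starting value; that is why the same key state can be reused for the next block without reloading.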

Page 30: DES: Software Implementation

    static unsigned permute(unsigned char *table, int n, unsigned hi, unsigned lo)
    {
        int ib, ob;
        unsigned out = 0;
        for (ob = 0; ob < n; ob++) {
            ib = table[ob] - 1;   /* table holds 1-based input bit positions */
            if (ib >= 32) {
                if (hi & (1u << (ib - 32))) out |= 1u << ob;
            } else {
                if (lo & (1u << ib)) out |= 1u << ob;
            }
        }
        return out;
    }

Page 31: DES: Software Implementation

[Same permute() listing as on the previous slide, with the callout:]

Too much computation!
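Note: a tiny run makes the cost visible. The block below is a lightly restructured copy of the slide's routine plus a hypothetical 4-entry table, `swap_pairs`, that exchanges adjacent bits. It assumes `unsigned` is at least 32 bits wide and that n is at most 32.

```c
#include <assert.h>

/* Bit permutation driven by a table of 1-based input bit positions.
 * Input bits 1..32 come from lo, bits 33..64 from hi. */
static unsigned permute(const unsigned char *table, int n, unsigned hi, unsigned lo)
{
    unsigned out = 0;

    assert(n >= 0 && n <= 32);
    for (int ob = 0; ob < n; ob++) {
        int ib = table[ob] - 1; /* convert to a 0-based bit index */
        unsigned bit = (ib >= 32) ? (hi >> (ib - 32)) & 1u
                                  : (lo >> ib) & 1u;
        out |= bit << ob;       /* place the selected bit at position ob */
    }
    return out;
}

/* Hypothetical table: output bit k takes its value from the neighbouring input bit. */
static const unsigned char swap_pairs[4] = {2, 1, 4, 3};
```

With this table, `permute(swap_pairs, 4, 0u, 0x6u)` turns binary 0110 into 1001. Every output bit costs a table load, a compare, a shift, a mask, and an OR, while in hardware the same permutation is only wiring.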

Page 32: DES: Hardware Implementation

[Figure: hardware block diagram: initial permutation, expansion permutation, S boxes, P permutation, final permutation, key generation, state machine.]

Page 33: DES: Hardware Implementation

[Figure: hardware block diagram: initial permutation, expansion permutation, S boxes, P permutation, final permutation, key generation, state machine.]

Complicated control logic!

Page 34: DES: SETDATA instruction

    SETDATA ars, art

[Figure: DES hardware block diagram (initial permutation, expansion permutation, S boxes, P permutation, final permutation, key generation, state machine) with the SETDATA instruction attached.]

Page 35: DES: SETKEY instruction

    SETKEY ars, art

[Figure: DES hardware block diagram (initial permutation, expansion permutation, S boxes, P permutation, final permutation, key generation, state machine) with the SETKEY instruction attached.]

Page 36: DES: DES instruction

    DES immediate

[Figure: DES hardware block diagram (initial permutation, expansion permutation, S boxes, P permutation, final permutation, key generation, state machine) with the DES instruction attached.]

Page 37: DES: GETDATA instruction

    GETDATA ars, hilo

[Figure: DES hardware block diagram (initial permutation, expansion permutation, S boxes, P permutation, final permutation, key generation, state machine) with the GETDATA instruction attached.]

Page 38: DES: Putting it together

    SETKEY  ars, art
    SETDATA ars, art
    DES     immediate
    GETDATA ars, hilo

[Figure: DES hardware block diagram (initial permutation, expansion permutation, S boxes, P permutation, final permutation, key generation, state machine) with all four instructions attached.]

Page 39: DES: Improved Program

Encryption

    SETKEY(K_hi, K_lo);
    for (;;) {
        … /* read data */
        SETDATA(D_hi, D_lo);
        DES(ENCRYPT1); DES(ENCRYPT1); DES(ENCRYPT2); DES(ENCRYPT2);
        DES(ENCRYPT2); DES(ENCRYPT2); DES(ENCRYPT2); DES(ENCRYPT2);
        DES(ENCRYPT1); DES(ENCRYPT2); DES(ENCRYPT2); DES(ENCRYPT2);
        DES(ENCRYPT2); DES(ENCRYPT2); DES(ENCRYPT2); DES(ENCRYPT1);
        E_hi = GETDATA(hi);
        E_lo = GETDATA(lo);
        … /* write encrypted data */
    }

Decryption

    SETKEY(K_hi, K_lo);
    for (;;) {
        … /* read encrypted data */
        SETDATA(D_hi, D_lo);
        DES(DECRYPT1); DES(DECRYPT2); DES(DECRYPT2); DES(DECRYPT2);
        DES(DECRYPT2); DES(DECRYPT2); DES(DECRYPT2); DES(DECRYPT1);
        DES(DECRYPT2); DES(DECRYPT2); DES(DECRYPT2); DES(DECRYPT2);
        DES(DECRYPT2); DES(DECRYPT2); DES(DECRYPT1); DES(DECRYPT1);
        E_hi = GETDATA(hi);
        E_lo = GETDATA(lo);
        … /* write data */
    }
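Note: the two call sequences mirror each other. The sketch below records the numeric suffix of each DES(...) immediate in issue order and checks that the decryption list is the encryption list reversed. Reading the suffix as the key rotation amount for that round is an assumption; the slides do not define ENCRYPT1/ENCRYPT2 or DECRYPT1/DECRYPT2. The array and function names are hypothetical.

```c
#include <stddef.h>

#define DES_ROUNDS 16

/* Suffixes of the DES(ENCRYPTn) immediates, in the order the program issues them. */
static const int encrypt_steps[DES_ROUNDS] = {
    1, 1, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 1
};

/* Suffixes of the DES(DECRYPTn) immediates, in the order the program issues them. */
static const int decrypt_steps[DES_ROUNDS] = {
    1, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 1, 1
};

/* Returns 1 when b is a read back to front, 0 otherwise. */
static int is_reversed(const int *a, const int *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (a[i] != b[n - 1 - i])
            return 0;
    return 1;
}

/* Sum of all step values in a sequence. */
static int sum_steps(const int *steps, size_t n)
{
    int total = 0;
    for (size_t i = 0; i < n; i++)
        total += steps[i];
    return total;
}
```

Both sequences sum to 28, the width of one key half, which fits the reading that the immediates select per-round key rotations, but that remains an interpretation.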

Page 40: DES: Summary

• Add 4 TIE instructions:
  • 80 lines of TIE description
  • No cycle time impact
  • ~1700 additional gates
  • Code size reduced

[Figure: bar chart "DES Performance". Vertical axis: speedup (X), 0 to 80. Horizontal axis: block size (bytes): 1024, 64, 8, and Mean. Bar labels: 43, 50, 53, 72; the extracted text does not preserve which label belongs to which bar.]

Page 41: Outline

• Configurable processors
  • Architecture
  • Instruction extension
  • Software support
• An example
• Results
• Summary

Page 42: Improvement over general purpose 32b RISC

[Figure: horizontal bar chart of improvement in MIPS or MIPS/Watt, axis marked 1x, 2x, 4x, 6x, 8x, 10x, 55x. The bar lengths are not recoverable from the extracted text.]

Applications, in slide order:
• JPEG (image compression)
• Motion estimation (video conferencing)
• FIR filter (signal processing)
• Viterbi decoding (wireless communication)
• DES (content encryption)

Added gates, listed in the same order on the slide: base + 7500 gates, base + 6500 gates, base + 900 gates, base + 1000 gates, base + 1700 gates.

Page 43: What is “EEMBC”?

• EDN Embedded Microprocessor Benchmark Consortium
• Pronounced “Embassy”
• Non-profit consortium, funded by over 40 members
  • Including: ARM, AMD, IBM, Intel, LSI Logic, MIPS, Motorola, National Semi, NEC, TI, Toshiba… Tensilica, and more…
• Objective: provide independently certified benchmark scores relevant to deeply embedded processor applications
  • An independent laboratory recreates and certifies all benchmark results - no tricks
• Five different benchmark suites
• Each suite comprises a range (five to sixteen) of benchmarks representative of that product category
  • Example: Consumer: image compression, image filtering, color conversion

Page 44: EEMBC Networking Benchmark

[Figure: bar chart of Netmark performance (axis 0 to 14) and Netmark efficiency in Netmark/MHz (axis 0.000 to 0.045). Processors in slide order: IDT 32334/100, IDT79RC32364/100, NEC V832-143, AMD ElanSC520/133, Toshiba TMPR3927F-GH189/133, IDT79RC32V334-150, Toshiba TMPR3927F-GHM2000/133, NEC VR5432-167, Xtensa/200, IDT79RC64575IDtc/250, NEC VR5000, IDT79RC64575Algor/250, AMD K6-2/450, AMD K6-2E/400, Xtensa Optimized/200, AMD K6-2E+/500, AMD K6-IIIE+/550. Colors: blue = Xtensa, green = desktop x86s, maroon = 64b RISCs, orange = 32b RISCs.]

• Comparable in Netmark to high-end desktop CPUs
• 2x in Netmark/MHz
• 59K total gates at 200 MHz

Page 45: EEMBC Telecom Benchmark

[Figure: bar chart of Telemark performance (axis 0 to 90) and Telemark efficiency in Telemark/MHz (axis 0.000 to 0.450). Processors in slide order: AMD ElanSC520/133, IDT 32334-100, Analog Devices 21065L/60, NEC V832-143, IDT79RC32V334-150, Xtensa/200, NEC VR5432-167, IDT79RC64575Algor/250, NEC VR5000, AMD K6-2E/400, TI TMS320C6203/300, AMD K6-2E+/500, AMD K6-III+/550, IBM PowerPC750CX/500, TI TMS320C6203 C opt/300, TI TMS320C6203 Optimized/300, Xtensa Optimized/200. Colors: blue = Xtensa, green = desktop x86s, maroon = 64b RISCs, orange = 32b RISCs, gray = DSPs.]

• Beats all processors, including hand-optimized TI C6x
• 180K total gates at 200 MHz

Page 46: EEMBC Consumer Benchmark

[Figure: bar chart of Consumermark performance (axis 0 to 200) and Consumermark efficiency in Consumermark/MHz (axis 0.00 to 1.00). Processors in slide order: ST20C2/50, AMD ElanSC520/133, NEC V832/143, National Geode GX1/200, NEC VR5432/167, Xtensa/200, NEC VR5000/250, AMD K6-2E/400, AMD K6-2E+/500, AMD K6-III+/550, Xtensa Optimized/200. Colors: blue = Xtensa, green = desktop x86s, maroon = 64b RISCs, orange = 32b RISCs.]

• 6x in Consumermark and 12x in Consumermark/MHz
• 127K total gates at 200 MHz

Page 47: Summary

[Figure: design-space chart. Axes: optimality/integration (e.g. mW, $) and flexibility/modularity (e.g. time-to-market). Points: ASIC, FPGA, instruction-set configurable processor, traditional processor, FPGA + processor. Two gaps on the chart are each labeled Δ ~10x.]

Page 48: Summary

[Figure: design-space chart. Axes: optimality/integration (e.g. mW, $) and flexibility/modularity (e.g. time-to-market). Points: ASIC, FPGA, instruction-set configurable processor, general processor, FPGA + processor. Two gaps on the chart are each labeled Δ ~10x.]

Page 49: Summary

[Figure: design-space chart. Axes: optimality/integration (e.g. mW, $) and flexibility/modularity (e.g. time-to-market). Points: ASIC, FPGA, traditional processor, FPGA + processor, instruction-set configurable processor. Two gaps on the chart are each labeled Δ ~10x.]

• Benefit of SoC integration
  • Higher bandwidth
  • Lower cost
  • Lower power
• Benefit of IS configuration
  • A cost-effective computing platform
• Benefit of TIE compiler and SW tools
  • Faster time-to-market
  • Lower development cost
  • Lower risk

Page 50: Thank You!