Advanced Processor Architectures for Embedded Systems Witawas Srisa-an CSCE 496: Embedded Systems Design and Implementation

Dec 19, 2015


Advanced Processor Architectures for Embedded Systems

Witawas Srisa-an

CSCE 496: Embedded Systems Design and Implementation


(R)evolution of Processors

Rock Hard

Ice Hard

Play-dough Hard


(R)evolution of Processors

Rock Hard

Ice Hard

Play-dough Hard

Hardwired, GPP (general-purpose processor): performs well in most conditions, but not in extreme conditions


(R)evolution of Processors

Rock Hard

Ice Hard

Play-dough Hard

GPP with FPGAs: custom designs perform well in some extreme conditions.

Requires extensive knowledge of hardware design


(R)evolution of Processors

Rock Hard

Ice Hard

Play-dough Hard

GPP with embedded programmable logic: reconfiguration triggered by software


(R)evolution of Processors

• Ice Hard
  – Contains ASIC (Application-Specific IC) designs
    • Increases time-to-market
    • Takes time to reconfigure


Software Hotspots

• In DSP
  – 80% of the processing load is spent on 20% of the code
    • Hand-tuned assembly that can take thousands of cycles to execute
    • Less portable
  – The remaining 80% of the code has complex system functions
    • Runs well on most GPPs
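
The 80/20 split is what makes accelerating only the hotspot worthwhile. A minimal sketch of that arithmetic, assuming Amdahl's law applies; the function name and the acceleration factor are illustrative and not taken from the slides:

```c
#include <assert.h>

/* Overall speedup when a fraction `hot` (0..1) of the processing load is
 * accelerated by a factor `accel` and the rest runs unchanged (Amdahl's law).
 * Name and parameters are hypothetical, chosen for illustration only. */
double overall_speedup(double hot, double accel)
{
    assert(hot >= 0.0 && hot <= 1.0 && accel > 0.0);
    return 1.0 / ((1.0 - hot) + hot / accel);
}
```

With hot = 0.8, a 10x faster hotspot gives roughly a 3.6x overall gain, and no amount of acceleration can exceed 5x because the remaining 20% of the load is untouched.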


Software Hotspots Example
• When a 16-QAM modem (19.2 Kbaud) is implemented entirely in software
  – Takes 177,000 instruction cycles to execute on a TI C6711
• FPGA co-processor (a few cycles)


Solving Hotspots

[Figure labels: Processor + FPGA; Multiple DSPs; DSP-enabled processors; RISC processor, programmable logic]


An Example of Configurable Processor (Stretch S5000)

[Block diagram labels: S5 engine; ALU, FPU, control; 32-bit RF; 128-bit WRF; ISEF (Instruction-Set Extension Fabric); data RAM 32KB; SRAM 256KB; D-cache 32KB; I-cache 32KB; MMU; I/O; I/O + DMA]

• S5 engine common to all S5000 processors
• 300 MHz Xtensa-V 32-bit RISC processor
• I/O subsystem tailored to markets & applications
• Programmable logic data path inside the RISC processor
• 32 x 128b wide registers + flexible wide load/store instructions


Programmable Logic Architecture

[Diagram labels: RISC DP; Instruction Set Extension Fabric (ISEF); WR; AR; Memory; data paths of 128 and 32 bits]


ISEF Resources
• An ISEF includes:
  – Computation resources
  – Routing resources
  – Pipeline resources
  – State register resources
• 2 types of computation resources:
  – 4096 arithmetic units (AUs) for arithmetic and logic operations
  – 8192 multiplier units (MUs) for multiply and shift operations
• Example: a single ISEF may implement
  – 32 16 x 16 multipliers
  – 128 32-bit ALUs


Wide Register
• Wide register file is used for holding WR data
  – 32 WR registers (128 bits each)
  – Divided into 2 banks of 16 registers (WRA and WRB)
• The WRA/WRB types associate a variable with WR bank A/B
  – WRA v1, v2, v3;
  – WRB w1, w2, w3;
• The WR type defaults to WRA
  – Use WRA/WRB to avoid unnecessary register moves between the two WR banks

[Diagram labels: banks WRA and WRB, registers 0 to 15, 128 bits each; WritePort 0 and WritePort 1 (128 bits); ReadPort 0, ReadPort 1 and ReadPort 2 (128 bits)]
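
The register file layout described on this slide can be sketched as a plain C data structure. This is a hypothetical host-side model for reasoning about the two banks, not Stretch's API; every type and function name below is an assumption made for illustration:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model: 2 banks (A, B) of 16 registers, each 128 bits wide. */
enum { WR_BANKS = 2, WR_PER_BANK = 16, WR_BYTES = 16 };

typedef struct { uint8_t b[WR_BYTES]; } wr_reg_t;                /* one 128-bit register */
typedef struct { wr_reg_t bank[WR_BANKS][WR_PER_BANK]; } wr_file_t;

/* Write one register; bank 0 models WRA, bank 1 models WRB. */
void wr_write(wr_file_t *rf, int bank, int idx, const wr_reg_t *v)
{
    assert(bank >= 0 && bank < WR_BANKS && idx >= 0 && idx < WR_PER_BANK);
    rf->bank[bank][idx] = *v;
}

/* Read one register from the given bank. */
wr_reg_t wr_read(const wr_file_t *rf, int bank, int idx)
{
    assert(bank >= 0 && bank < WR_BANKS && idx >= 0 && idx < WR_PER_BANK);
    return rf->bank[bank][idx];
}

/* Compare two 128-bit values byte by byte; returns 1 when equal. */
int wr_equal(const wr_reg_t *x, const wr_reg_t *y)
{
    return memcmp(x->b, y->b, WR_BYTES) == 0;
}
```

In this model a value that lives in bank A but is needed in bank B costs an extra wr_read plus wr_write, which is the kind of move the WRA/WRB type annotations are meant to avoid.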


Extension Instructions (EIs)
• The power of the Software Configurable Processor (SCP) architecture is derived from the ability to define new and complex instructions that operate on very wide data

• Extension Instruction's 3 steps

1. EI Definition: write a Stretch-C function

2. EI Compilation: compile the Stretch-C function

3. EI Use: call an EI through its intrinsic in the application code (C/C++)


Extension Instructions
1. Define an Extension Instruction (writing Stretch-C), file vector.xc

    #include <stretch.h>
    SE_FUNC void V_AND8(WR v1, WR vMask, WR *vOut)
    {
        *vOut = v1 & vMask;    /* bitwise AND across the full 128 bits */
    }

2. Compile and link EI (Stretch-C source file: *.xc)

3. Use EI in C/C++ application code (calling intrinsics)

    #include "vector.h"
    WR v1, vMask, vOut;
    …
    WRL128I(&v1, (WR*) memSrc1Ptr, 0);     /* load 128 bits into v1 */
    V_AND8(v1, vMask, &vOut);              /* call the EI through its intrinsic */
    WRS128I(vOut, (WR*) memDstPtr, 0);     /* store the 128-bit result */
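
To see what the load / EI / store sequence computes without Stretch hardware, the three steps can be mimicked in portable C. This is a sketch under stated assumptions: the `wr_bytes_t` type and both function names are hypothetical, and `memcpy` only stands in for the WRL128I and WRS128I intrinsics.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical host-side stand-in for a 128-bit WR value. */
typedef struct { uint8_t b[16]; } wr_bytes_t;

/* Models the V_AND8 extension instruction: bitwise AND over all 128 bits. */
void v_and8_model(wr_bytes_t v1, wr_bytes_t vMask, wr_bytes_t *vOut)
{
    for (int i = 0; i < 16; i++)
        vOut->b[i] = v1.b[i] & vMask.b[i];
}

/* Models step 3 on plain memory: load 16 bytes, apply the EI, store 16 bytes. */
void and_block(const uint8_t *src, const uint8_t *mask, uint8_t *dst)
{
    wr_bytes_t v1, vMask, vOut;
    assert(src != 0 && mask != 0 && dst != 0);
    memcpy(v1.b, src, 16);            /* stands in for WRL128I */
    memcpy(vMask.b, mask, 16);        /* mask load, elided in the slide */
    v_and8_model(v1, vMask, &vOut);   /* stands in for the V_AND8 intrinsic */
    memcpy(dst, vOut.b, 16);          /* stands in for WRS128I */
}
```

The model processes 16 bytes per call in a loop; on the real device the same work is a single instruction issued to the ISEF, which is where the wide data path pays off.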


Extension Instructions

• Extension Instructions
  – Are issued by the Xtensa
  – Read source operands from the 128-bit WR and/or 32-bit AR register files
  – Execute out of the ISEF
  – Write destination operands to WR

• Once the ISEF is configured with the new instruction, it may be
  – Called as an intrinsic from application C code
  – Used as an assembly instruction in an assembly source file

[Diagram labels: ISEF; WR ReadPort 0, 1 and 2 (128 bits each); AR ReadPort 0 and 1 (32 bits each); WritePort 0 and 1 (128 bits each)]


Writing Stretch-C Functions

    #include <stretch.h>

    SE_FUNC void V_AND128(WR v1, WR v2, WR *vOut)
    {
        *vOut = v1 & v2;
    }

• #include stretch.h header file

• Stretch-C functions are identified by keyword SE_FUNC void

• EI names are identified by the Stretch-C function name (for single instruction functions)

• EI source and destination operands are defined by the Stretch-C function parameters

• EI operation is defined by the Stretch-C function instructions


Extension Instruction Parameters 1
• Extension Instructions are user-defined assembly instructions that use input and output operands

• An Extension Instruction can specify up to 3 parameters
  – 0, 1, 2, or 3 inputs
  – 0, 1 or 2 outputs

• Input and output parameters reside in register files
  – Inputs come from the WR or AR register files
  – Outputs may only be written to the WR register file

[Diagram labels: WR (banks WRA, WRB), 128 bits; AR, 32 bits; Extension Unit (ISEF)]

Assembly

    # result = a + b
    ADD result, a, b

Stretch-C

    // RESULT = A + B
    V_ADD4(A, B, &RESULT);
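
The comparison between a scalar ADD and the wide V_ADD4 call can be made concrete with a host-side model. The slide does not define V_ADD4, so this sketch ASSUMES it means four independent 32-bit additions packed into one 128-bit value; the type and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical view of a 128-bit WR value as 4 lanes of 32 bits. */
typedef struct { uint32_t lane[4]; } wr_lanes_t;

/* ASSUMPTION: V_ADD4 is modelled as four lane-wise 32-bit additions.
 * Inputs are passed by value, the output through a pointer, mirroring
 * the Stretch-C calling convention shown on the slide. */
void v_add4_model(wr_lanes_t a, wr_lanes_t b, wr_lanes_t *result)
{
    assert(result != 0);
    for (int i = 0; i < 4; i++)
        result->lane[i] = a.lane[i] + b.lane[i];   /* wraps modulo 2^32 */
}
```

Under that assumption one EI call does the work of four scalar ADD instructions, with the output operand written through a pointer just as `&RESULT` is in the Stretch-C form.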


Extension Instruction Parameters 2

• EI source operands (inputs) may include
  – Up to 3 WR inputs (use WR, WRA or WRB)
  – Up to 2 AR inputs (use int, short, etc.)

    SE_FUNC void FOO(int c1, WR v1, WRB *vOut) { }

• EI destination operands (outputs) may include
  – Up to 2 WR outputs, each writing a separate WR bank
  – Use the C pointer notation for outputs

    SE_FUNC void FOO(WR v1, WRA *vOut1, WRB *vOut2) { }

• A single WR parameter may be used as both an input and output operand

    SE_FUNC void FOO(WR v1, WRA *vInOut1, WRB *vOut2) { }
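
The in/out rule is easiest to see with a body filled in. The slide leaves FOO empty, so the operation below (per-byte sum and difference) is purely hypothetical, as are the type and function names; the point is only the parameter pattern: one by-value input, one pointer used for both reading and writing, and one pointer used for output only.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical host-side stand-in for a 128-bit WR value. */
typedef struct { uint8_t b[16]; } wr_val_t;

/* Illustrative body for the FOO(WR v1, WRA *vInOut1, WRB *vOut2) shape:
 * per byte, *vInOut1 receives old + v1 and *vOut2 receives old - v1. */
void foo_model(wr_val_t v1, wr_val_t *vInOut1, wr_val_t *vOut2)
{
    assert(vInOut1 != 0 && vOut2 != 0 && vInOut1 != vOut2);
    for (int i = 0; i < 16; i++) {
        uint8_t old = vInOut1->b[i];               /* read the in/out operand first */
        vInOut1->b[i] = (uint8_t)(old + v1.b[i]);  /* then overwrite it (modulo 256) */
        vOut2->b[i]   = (uint8_t)(old - v1.b[i]);  /* second output, separate operand */
    }
}
```

Reading the old value before overwriting it is what makes a single parameter safe to use as both source and destination.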


Example of Stretch-C

• RGB2YCrCb

Y = 0.299 R + 0.587 G + 0.114 B

Cr = 0.701 R - 0.587 G - 0.114 B

Cb = -0.299 R - 0.587 G + 0.886 B

Or

Y = (77R + 150G + 29B) >> 8

Cb = (-43R - 85G + 128B + 32768) >> 8

Cr = (128R - 107G - 21B + 32768) >> 8


RGB2YCC

    SE_FUNC void rgb2ycc(WR A, WR *B)
    {
        se_sint<8> r[5], g[5], b[5];
        se_sint<8> y[5], cb[5], cr[5];
        int i, j;

        /* unpack A to RGB data, does not use any ISEF logic */
        for (i = 0; i < 5; i++) {
            j = i * 3 * 8;
            r[i] = A(j+7, j);
            g[i] = A(j+15, j+8);
            b[i] = A(j+23, j+16);
        }

        /* converting 5 pixels */
        for (i = 0; i < 5; i++) {
            y[i]  = ( 77*r[i] + 150*g[i] +  29*b[i]) >> 8;
            cb[i] = (-43*r[i] -  85*g[i] + 128*b[i] + 32768) >> 8;
            cr[i] = (128*r[i] - 107*g[i] -  21*b[i] + 32768) >> 8;
        }

        /* pack YCbCr to B */
        *B = (cr[4], cb[4], y[4], cr[3], cb[3], y[3], cr[2], cb[2], y[2],
              cr[1], cb[1], y[1], cr[0], cb[0], y[0]);
    }
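
A plain C reference version is handy for checking the extension instruction's results on a host machine. The sketch below applies the same integer formulas to 5 packed pixels; it assumes unsigned 8-bit channels and a byte order of R, G, B in and Y, Cb, Cr out per pixel, and the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Reference conversion of 5 packed RGB pixels (15 bytes) into 15 bytes of
 * Y, Cb, Cr using the fixed-point formulas from the slide.
 * ASSUMPTIONS: channels are unsigned 8-bit; byte order is R,G,B -> Y,Cb,Cr. */
void rgb2ycc_ref(const uint8_t in[15], uint8_t out[15])
{
    assert(in != 0 && out != 0);
    for (int i = 0; i < 5; i++) {
        int r = in[3*i], g = in[3*i + 1], b = in[3*i + 2];

        /* Every sum stays in 0..65535, so the shift yields an 8-bit value. */
        int y  = ( 77*r + 150*g +  29*b) >> 8;
        int cb = (-43*r -  85*g + 128*b + 32768) >> 8;
        int cr = (128*r - 107*g -  21*b + 32768) >> 8;

        out[3*i]     = (uint8_t)y;
        out[3*i + 1] = (uint8_t)cb;
        out[3*i + 2] = (uint8_t)cr;
    }
}
```

The + 32768 term is the usual 128 offset pre-scaled by 256, so a neutral gray input maps to Cb = Cr = 128.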


Stretch Compiler

[Flow diagram: rgb2ycc.xc → Stretch compile (scc, with <stretch.h>) → libei.h, libei.a; rgb2ycc.c → compile (scc) → rgb2ycc.o; link (scc) → rgb2ycc.exe → run on target]


Compiler Option

[Flow diagram labels: .xc; scc shell; Stretch Compiler; libei.a, libei.h; .c, .cc; C/C++ Compiler (xt-xcc, gcc, …); .xo, .xr; Stretch Linker; .exe; Aruba (S5000); ISS; Native]

Compilation option: -ms5610; -ms5-iss (default); -stretch-nobits; -ms5-native

.xo object file includes: configuration bitstream for ISEF; .dll for implementing Extension Instructions (EIs); C++ functions for EIs

Target: Aruba device; Instruction Set Simulator; native (e.g.: x86)


Summary

• Software Configurable Processor
  – Describe hardware using C/C++
    • But not trivial. Basic understanding of the architecture is needed
  – Reconfiguration can take place in 150 microseconds

• 2 ISEFs per chip
  – Can ping-pong

• Configuration files stored in SDRAM
  – Use DMA to preload information

• ISEF is proprietary and NOT an FPGA
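
The benefit of ping-ponging between two ISEFs can be estimated with a simple timing model. This is a sketch under loud assumptions: a configuration load into the idle ISEF is taken to overlap fully with execution on the other one, only one load is in flight at a time, and the kernel run times are made-up inputs. Only the 150 microsecond reconfiguration figure comes from the slide.

```c
#include <assert.h>

/* Total time in microseconds to run n kernels on ONE ISEF: each kernel
 * waits for its own configuration load before it can execute. */
long serial_time_us(const long *run_us, int n, long reconf_us)
{
    long t = 0;
    assert(n >= 0);
    for (int i = 0; i < n; i++)
        t += reconf_us + run_us[i];
    return t;
}

/* Same kernels with TWO ISEFs used in ping-pong fashion.
 * ASSUMPTION: while kernel i runs on one ISEF, the configuration for
 * kernel i+1 is loaded into the other, fully in parallel. */
long pingpong_time_us(const long *run_us, int n, long reconf_us)
{
    long t;
    assert(n >= 0);
    if (n == 0)
        return 0;
    t = reconf_us;                      /* the first load cannot be hidden */
    for (int i = 0; i < n - 1; i++)     /* run i overlaps the load of i+1 */
        t += (run_us[i] > reconf_us) ? run_us[i] : reconf_us;
    return t + run_us[n - 1];           /* the last kernel has nothing to preload */
}
```

For three hypothetical kernels of 400, 100 and 300 microseconds with a 150 microsecond reconfiguration, the model gives 1250 microseconds serially and 1000 with ping-pong; the load is only fully hidden when the running kernel takes at least as long as the reconfiguration.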
