Advanced Processor Architectures for Embedded Systems
Witawas Srisa-an
CSCE 496: Embedded Systems Design and Implementation
Dec 19, 2015
(R)evolution of Processors
• Rock Hard: hardwired general-purpose processors (GPPs)
  – Perform well in most conditions but not in extreme conditions
• Ice Hard: GPPs with FPGAs
  – Custom designs perform well in some extreme conditions
  – Require extensive knowledge of hardware design
• Play-dough Hard: GPPs with embedded programmable logic
  – Reconfiguration is triggered by software
(R)evolution of Processors
• Ice Hard
  – Contains ASIC (Application-Specific IC) designs
    • Increases time to market
    • Takes time to reconfigure
Software Hotspots
• In DSP
  – 80% of the processing load is spent in 20% of the code
    • Hand-tuned assembly that can take thousands of cycles to execute
    • Less portable
  – The remaining 80% of the code implements complex system functions
    • Runs well on most GPPs
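The 80/20 split above suggests a quick estimate of what accelerating the hotspot can buy. The sketch below applies Amdahl's law, which the slides do not name; the function name `overall_speedup` is an illustrative assumption, and the model treats the 80% figure as a fraction of execution time.

```c
#include <assert.h>

/* Amdahl's law: overall speedup when a fraction `hot` of the run time
 * (0.8 in the rule of thumb above) is accelerated by a factor `accel`.
 * The function name and the model are illustrative assumptions. */
double overall_speedup(double hot, double accel)
{
    assert(hot >= 0.0 && hot <= 1.0);
    assert(accel > 0.0);
    return 1.0 / ((1.0 - hot) + hot / accel);
}
```

For example, `overall_speedup(0.8, 10.0)` is about 3.57. However fast the hotspot becomes, the result stays below 1 / (1 - 0.8) = 5, because the remaining 20% of the time is untouched.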
Software Hotspots Example
• A 16-QAM modem (19.2 kbaud) implemented entirely in software
  – Takes 177,000 instruction cycles to execute on a TI C6711
• The same function on an FPGA co-processor takes a few cycles
Solving Hotspots
[Figure: four approaches to solving hotspots: processor + FPGA; multiple DSPs; DSP-enabled processors; RISC processor with programmable logic.]
An Example of a Configurable Processor (Stretch S5000)
[Figure: S5000 block diagram. Blocks shown: S5 engine with ALU, FPU, control, 32-bit register file (RF), 128-bit wide register file (WRF), and ISEF (Instruction-Set Extension Fabric); 32 KB I-cache; 32 KB D-cache; 32 KB data RAM; 256 KB SRAM; MMU; I/O blocks, one with DMA.]
• S5 engine common to all S5000 processors
• 300 MHz Xtensa-V 32-bit RISC processor
• I/O subsystem tailored to markets and applications
• Programmable logic data path inside the RISC processor
• 32 x 128-bit wide registers plus flexible wide load/store instructions
Programmable Logic Architecture
[Figure: RISC data path (DP), Instruction Set Extension Fabric (ISEF), WR and AR register files, and memory, connected by 128-bit and 32-bit buses.]
ISEF Resources
• An ISEF includes:
  – Computation resources
  – Routing resources
  – Pipeline resources
  – State register resources
• 2 types of computation resources:
  – 4096 arithmetic units (AUs) for arithmetic and logic operations
  – 8192 multiplier units (MUs) for multiply and shift operations
• Example: a single ISEF may implement
  – 32 16x16 multipliers
  – 128 32-bit ALUs
Wide Register
• The wide register (WR) file holds WR data
  – 32 WR registers (128 bits each)
  – Divided into 2 banks of 16 registers (WRA and WRB)
• The WRA/WRB types associate a variable with WR bank A/B
  – WRA v1, v2, v3;
  – WRB w1, w2, w3;
• The WR type defaults to WRA
  – Use WRA/WRB to avoid unnecessary register moves between the two WR banks
[Figure: WR register file. Banks WRA and WRB each hold registers 0 to 15, 128 bits wide, with two 128-bit write ports (WritePort 0, WritePort 1) and three 128-bit read ports (ReadPort 0 to ReadPort 2).]
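To make the register-file layout concrete, here is a plain C model of it for reasoning on a host machine. It is only a sketch: the names `wr128_t`, `wr_file_t`, and `wr_file_bits` are hypothetical and are not part of the Stretch tools, which expose the registers through the WR, WRA, and WRB types.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define WR_BANK_REGS 16   /* registers per bank (WRA or WRB) */
#define WR_BYTES     16   /* 128 bits per register */

/* One 128-bit wide register, modeled as 16 bytes (hypothetical type). */
typedef struct { uint8_t byte[WR_BYTES]; } wr128_t;

/* The two banks described above: WRA and WRB, 16 registers each. */
typedef struct {
    wr128_t wra[WR_BANK_REGS];
    wr128_t wrb[WR_BANK_REGS];
} wr_file_t;

/* Total storage of the modeled register file, in bits. */
size_t wr_file_bits(void)
{
    assert(sizeof(wr128_t) == WR_BYTES);  /* the model relies on no padding */
    return sizeof(wr_file_t) * 8;
}
```

`wr_file_bits()` returns 4096, that is, 32 registers x 128 bits.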
Extension Instructions (EIs)
• The power of the Software Configurable Processor (SCP) architecture is derived from the ability to define new and complex instructions that operate on very wide data
• An Extension Instruction takes 3 steps
1. EI Definition: write a Stretch-C function
2. EI Compilation: compile the Stretch-C function
3. EI Use: call an EI through its intrinsic in the application code (C/C++)
Extension Instructions
1. Define an Extension Instruction (writing Stretch-C), here in vector.xc
#include <stretch.h>
SE_FUNC void V_AND8(WR v1, WR vMask, WR *vOut)
{
    *vOut = v1 & vMask;
}
2. Compile and link the EI (Stretch-C source file: *.xc)
3. Use the EI in C/C++ application code (calling intrinsics)
#include "vector.h"
WR v1, vMask, vOut;
…
WRL128I(&v1, (WR*) memSrc1Ptr, 0);
V_AND8(v1, vMask, &vOut);
WRS128I(vOut, (WR*) memDstPtr, 0);
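The three-step flow above can be mimicked on a host machine to check the expected result of an EI before targeting the ISEF. The following is a plain C sketch, not Stretch-C: `wr_t`, `wr_load128`, `wr_store128`, and `v_and8_ref` are hypothetical stand-ins for the WR type, the WRL128I/WRS128I intrinsics, and the V_AND8 EI. They model only the data movement and the result, not timing or the intrinsics' real signatures.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Host-side stand-in for a 128-bit WR register (hypothetical type). */
typedef struct { uint8_t byte[16]; } wr_t;

/* Models a 128-bit load from memory into a wide register. */
void wr_load128(wr_t *dst, const void *src)
{
    assert(dst != NULL && src != NULL);
    memcpy(dst->byte, src, sizeof dst->byte);
}

/* Models a 128-bit store from a wide register to memory. */
void wr_store128(const wr_t *src, void *dst)
{
    assert(src != NULL && dst != NULL);
    memcpy(dst, src->byte, sizeof src->byte);
}

/* Reference behavior of V_AND8: bitwise AND across all 128 bits. */
void v_and8_ref(const wr_t *v1, const wr_t *vMask, wr_t *vOut)
{
    for (int i = 0; i < 16; i++)
        vOut->byte[i] = v1->byte[i] & vMask->byte[i];
}
```

A caller loads the source and mask with `wr_load128`, calls `v_and8_ref`, and writes the result back with `wr_store128`, mirroring the load, EI, store sequence in step 3.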
Extension Instructions
• Extension Instructions
  – Are issued by the Xtensa
  – Read source operands from the 128-bit WR and/or 32-bit AR register files
  – Execute out of the ISEF
  – Write destination operands to WR
• Once the ISEF is configured with the new instruction, it may be
  – Called as an intrinsic from application C code
  – Used as an assembly instruction in an assembly source file
[Figure: the ISEF reads from three 128-bit WR read ports (ReadPort 0 to ReadPort 2) and two 32-bit AR read ports (ReadPort 0 and ReadPort 1), and writes through two 128-bit write ports (WritePort 0 and WritePort 1).]
Writing Stretch-C Functions
#include <stretch.h>
SE_FUNC void V_AND128(WR v1, WR v2, WR *vOut)
{
    *vOut = v1 & v2;
}
• Include the stretch.h header file
• Stretch-C functions are identified by the keywords SE_FUNC void
• EI names are identified by the Stretch-C function name (for single instruction functions)
• EI source and destination operands are defined by the Stretch-C function parameters
• EI operation is defined by the Stretch-C function instructions
Extension Instruction Parameters 1
• Extension Instructions are user-defined assembly instructions that use input and output operands
• An Extension Instruction can specify up to 3 parameters
  – 0, 1, 2, or 3 inputs
  – 0, 1, or 2 outputs
• Input and output parameters reside in register files
  – Inputs come from the WR or AR register files
  – Outputs may only be written to the WR register file
[Figure: the Extension Unit (ISEF) connected to the WR register file (banks WRA and WRB) over 128-bit paths and to the AR register file over 32-bit paths.]
Assembly
# result = a + b
ADD result, a, b
Stretch-C
// RESULT = A + B
V_ADD4(A, B, &RESULT);
Extension Instruction Parameters 2
• EI source operands (inputs) may include
  – Up to 3 WR inputs (use WR, WRA, or WRB)
  – Up to 2 AR inputs (use int, short, etc.)
• EI destination operands (outputs) may include
  – Up to 2 WR outputs, each writing a separate WR bank
  – Use the C pointer notation for outputs
• A single WR parameter may be used as both an input and an output operand
SE_FUNC void FOO(int c1, WR v1, WRB *vOut) { }
SE_FUNC void FOO(WR v1, WRA *vOut1, WRB *vOut2) { }
SE_FUNC void FOO(WR v1, WRA *vInOut1, WRB *vOut2) { }
Example of Stretch-C
• RGB2YCrCb
Y = 0.299 R + 0.587 G + 0.114 B
Cr = 0.701 R - 0.587 G - 0.114 B
Cb = -0.299 R - 0.587 G + 0.886 B
Or, as scaled 8-bit fixed-point approximations (chroma offset by 128):
Y = (77R + 150G + 29B) >> 8
Cb = (-43R - 85G + 128B + 32768) >> 8
Cr = (128R - 107G - 21B + 32768) >> 8
RGB2YCC
#include <stretch.h>
SE_FUNC void rgb2ycc(WR A, WR *B)
{
    se_sint<8> r[5], g[5], b[5];
    se_sint<8> y[5], cb[5], cr[5];
    int i, j;

    /* unpack A to RGB data, does not use any ISEF logic */
    for (i = 0; i < 5; i++) {
        j = i * 3 * 8;
        r[i] = A(j+7, j);
        g[i] = A(j+15, j+8);
        b[i] = A(j+23, j+16);
    }

    /* converting 5 pixels */
    for (i = 0; i < 5; i++) {
        y[i]  = ( 77*r[i] + 150*g[i] +  29*b[i]) >> 8;
        cb[i] = (-43*r[i] -  85*g[i] + 128*b[i] + 32768) >> 8;
        cr[i] = (128*r[i] - 107*g[i] -  21*b[i] + 32768) >> 8;
    }

    /* pack YCbCr to B */
    *B = (cr[4],cb[4],y[4],cr[3],cb[3],y[3],cr[2],cb[2],y[2],cr[1],cb[1],y[1],cr[0],cb[0],y[0]);
}
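A plain C reference for the fixed-point formulas is useful for checking an EI's output on a host. The sketch below converts one pixel at a time. The names `ycc_t` and `rgb2ycc_ref` are hypothetical, and it assumes unsigned 8-bit channel values (unlike the se_sint<8> declarations in the Stretch-C listing); it uses the -21B term for Cr, as the listing does.

```c
#include <assert.h>
#include <stdint.h>

/* One converted pixel (hypothetical helper type). */
typedef struct { uint8_t y, cb, cr; } ycc_t;

/* Fixed-point RGB to YCbCr with the integer coefficients from the slides.
 * Assumes unsigned 8-bit inputs; sums are computed in int. */
ycc_t rgb2ycc_ref(uint8_t r, uint8_t g, uint8_t b)
{
    ycc_t out;
    int y  = ( 77 * r + 150 * g +  29 * b) >> 8;
    int cb = (-43 * r -  85 * g + 128 * b + 32768) >> 8;
    int cr = (128 * r - 107 * g -  21 * b + 32768) >> 8;

    /* With these coefficients every result stays within 0..255. */
    assert(y >= 0 && y <= 255);
    assert(cb >= 0 && cb <= 255);
    assert(cr >= 0 && cr <= 255);

    out.y = (uint8_t)y;
    out.cb = (uint8_t)cb;
    out.cr = (uint8_t)cr;
    return out;
}
```

Gray inputs make a quick sanity check: white (255, 255, 255) maps to Y = 255 and Cb = Cr = 128, because each set of chroma coefficients sums to zero.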
Stretch Compiler
[Figure: build flow with scc. Stretch compile: rgb2ycc.xc (which includes <stretch.h>) produces libei.h and libei.a. Compile: rgb2ycc.c produces rgb2ycc.o. Link: produces rgb2ycc.exe, which is run on the target.]
Compiler Option
[Figure: tool flow under the scc shell for the S5000: .xc sources go to the Stretch Compiler, .c and .cc sources go to a C/C++ compiler (xt-xcc, gcc, …), and the Stretch Linker produces the .exe. Other files shown: .xo, .xr, libei.a, libei.h.]
• Compilation options: -ms5610, -ms5-iss (default), -stretch-nobits, -ms5-native
• Targets, and what the .xo object file includes for each:
  – Aruba device: configuration bitstream for the ISEF
  – Instruction Set Simulator (ISS): .dll for implementing Extension Instructions (EIs)
  – Native (e.g., x86): C++ functions for EIs
Summary
• Software Configurable Processor
  – Describe hardware using C/C++
    • But not trivial; a basic understanding of the architecture is needed
  – Reconfiguration can take place in 150 microseconds
• 2 ISEFs per chip
  – Can ping-pong
• Configuration files stored in SDRAM
  – Use DMA to preload information
• The ISEF is proprietary and is NOT an FPGA
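The ping-pong idea can be illustrated with a toy timing model: while one ISEF executes, the other is reconfigured. The sketch below is an estimate built on assumptions, not a description of how the S5000 schedules work. `single_isef_us` and `pingpong_us` are hypothetical names, every phase is assumed to need a new configuration and to depend on the previous phase, and only the 150-microsecond figure comes from the slides.

```c
#include <assert.h>

/* One ISEF: every phase waits for its own reconfiguration. */
unsigned long single_isef_us(unsigned phases, unsigned long compute_us,
                             unsigned long reconfig_us)
{
    return (unsigned long)phases * (reconfig_us + compute_us);
}

/* Two ISEFs used alternately: ISEF (i % 2) is reconfigured for phase i
 * as soon as it has finished phase i - 2, overlapping with phase i - 1. */
unsigned long pingpong_us(unsigned phases, unsigned long compute_us,
                          unsigned long reconfig_us)
{
    unsigned long isef_free[2] = {0, 0}; /* time each ISEF becomes idle */
    unsigned long prev_done = 0;         /* end of the previous phase */

    for (unsigned i = 0; i < phases; i++) {
        unsigned k = i % 2;
        unsigned long cfg_done = isef_free[k] + reconfig_us;
        unsigned long start = cfg_done > prev_done ? cfg_done : prev_done;
        prev_done = start + compute_us;
        isef_free[k] = prev_done;
    }

    /* Overlapping reconfiguration can never be slower than serializing it. */
    assert(prev_done <= single_isef_us(phases, compute_us, reconfig_us));
    return prev_done;
}
```

With four phases of 300 microseconds each and a 150-microsecond reconfiguration, the model gives 1800 microseconds for one ISEF and 1350 for two, since every reconfiguration after the first is hidden behind computation.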