
Processor Design: Embedded Processors - TUNI · edu.cs.tut.fi/pdf/lecture11.pdf

May 15, 2020


Processor Design: Embedded Processors

Professor Jari Nurmi
Institute of Digital and Computer Systems
Tampere University of Technology, Finland
email [email protected]

Embedded Processors

• Embedded processor = 'not a computer processor'
  • implements control and/or communication functionality of a device
  • not user-programmable (programmed by the application developer)
  • may be a microcontroller
    • management of peripherals to access sensors and actuators
  • or a full-fledged RISC/CISC/DSP processor
• Different goals compared to high-end workstations
  • low power consumption (in many applications)
  • small silicon area of the processor
  • small memory footprint
  • low to moderate performance may be sufficient
  • small interrupt latency (and interrupt overhead)
  • real-time requirements
  • price, price, price


Embedded Processors (cont'd)

• Embedded application characteristics vary a lot
• Examples of different kinds of applications
  • Game console (high-end stream processing power with special graphics enhancements)
  • Mobile phone (lots of DSP and a moderate amount of control processing)
  • Home appliances (low-speed control)
  • Printer (stream control and computation, no real-time requirement)

Embedded Processors (cont'd)

• Emphasis here on embedded RISC
  • (embedded) DSP will be discussed next
• How to achieve the design goals, especially
  • low power consumption
  • small silicon area of the processor
  • small memory footprint
  • small interrupt latency and overhead
• Examples of embedded RISC solutions for this
  • ARM
  • MIPS
  • CompactRISC


Power Consumption

• Basic power consumption formula in digital CMOS: $P = V^2 \times f \times c$, where
  • $V$ = voltage swing (usually equals supply voltage)
  • $f$ = clocking frequency
  • $c$ = capacitance switched at each clock
• Three things to minimize
  • $V$ affects the most, but can be affected least by the design(er)
    • lowering $V$ also reduces $f$ (desired or not)
  • $f$ can be reduced if less performance is sufficient
  • $c$ can actually be factored into $c_{\mathrm{eff}} = \sum_i c_i \times a_i$, where $a_i$ is the activity factor of node $i$
    • the $c_i$ are minimized by less circuitry (or slower circuitry)
    • the $a_i$ are minimized by activity control
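The formula above can be tried out numerically. The following is a minimal sketch, not part of the original slides: the function names, node capacitances, activity factors, voltages and clock frequency are all made-up illustrative values. It shows the effective capacitance as a weighted sum and the quadratic effect of the voltage swing.

```python
def effective_capacitance(nodes):
    """Return c_eff = sum(c_i * a_i) for (capacitance_farads, activity_factor) pairs."""
    return sum(c * a for c, a in nodes)


def dynamic_power(v, f, c_eff):
    """Return P = V^2 * f * c for voltage swing v (V), clock f (Hz), capacitance c_eff (F)."""
    return v ** 2 * f * c_eff


# Hypothetical nodes: (capacitance in farads, activity factor between 0 and 1).
nodes = [(2e-12, 0.5), (10e-12, 0.1), (1e-12, 1.0)]
c_eff = effective_capacitance(nodes)

# Same clock and circuit, two hypothetical supply voltages.
p_full = dynamic_power(3.3, 50e6, c_eff)
p_half = dynamic_power(1.65, 50e6, c_eff)

# Halving V cuts the power to one quarter, which is why V "affects the most".
ratio = p_half / p_full
print(f"c_eff = {c_eff:.2e} F, P(3.3 V) = {p_full:.2e} W, ratio = {ratio:.2f}")
```

Note that the large 10 pF node contributes no more than the small nodes here, because its assumed activity factor is low; this is the point of factoring $c$ into capacitance and activity.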

Power Consumption (cont'd)

• $c_i$ worst in output nodes
  • minimizing the bandwidth of off-chip accesses by
    • caches (or bringing memory on-chip)
    • small instruction length
• $c_i$ in other nodes can be minimized by
  • accepting slower operation
  • having less complexity (fewer nodes)
    • no superscalar issue, out-of-order execution, etc.
    • simple cache control (low associativity)
    • short buses (and otherwise small dimensions)
• $a_i$ minimized by activity control
  • different power-down modes
  • partial power-down of currently unused parts (clock gating)
  • latched inputs on blocks connected to buses
  • attention to timing to avoid multiple transitions during a cycle


Processor (Core) Area

• Giving up extreme performance requirements also saves area
  • slower logic is smaller
  • fewer pipeline registers, less stall & forward control
  • fewer advanced features (branch prediction, speculative execution)
• Giving up accuracy or range saves area
  • 8, 16, 32, 64-bit processors for different embedded segments
  • single-cycle multipliers etc. as application-specific extensions
  • typically no floating-point hardware

Memory Optimization

• For low cost, the amount of memory is crucial
  • on-chip memory and/or cache (chip cost)
  • off-chip memory (system/board cost)
• Mainly two things affect the program memory size
  • program length
  • instruction word length
• Program length shortened by powerful instructions
• Instruction word length shortened by simple instructions!
• A compromise between these goals needs to be found
• One solution is to have two instruction modes
  • full-length powerful instructions
  • compressed instructions
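The trade-off between program length and instruction word length can be illustrated with simple arithmetic. This is a sketch under stated assumptions: the instruction counts below are invented for illustration (the compressed version is assumed to need 30% more instructions), and `program_bytes` is a hypothetical helper, not something from the lecture.

```python
def program_bytes(instruction_count, word_bits):
    """Program memory size = program length x instruction word length."""
    return instruction_count * word_bits // 8


# Hypothetical routine: 1000 full-length 32-bit instructions, or
# 1300 simpler 16-bit compressed instructions doing the same work.
full_size = program_bytes(1000, 32)
compressed_size = program_bytes(1300, 16)

saving = 1 - compressed_size / full_size
print(f"full: {full_size} B, compressed: {compressed_size} B, saving: {saving:.0%}")
```

With these assumed numbers the compressed mode still wins despite the longer program; if the instruction count doubled, the saving would vanish, which is why a compromise is needed.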


Interrupt Latency and Overhead

• The latency and overhead of interrupts are crucial in (reactive) embedded systems
• The latency of an inter-instruction interrupt is made up of
  • time of synchronization of the request
  • time to complete or abort the (longest) instruction in execution
  • time to enter the interrupt service mode (with possible state saving)
  • (and possibly time to wait for getting enabled)
• Overhead consists of
  • mode switching (with state saving and restoring)
  • interrupt processing
• As short instructions (in cycles) as possible for less latency
• Possibly long instructions reversible to enable aborting and re-issue
• Different register sets for interrupt processing mode(s)
  • improves both latency and overhead
• Efficient interrupt processing instructions (e.g. not using compressed instructions in interrupts)
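The latency components listed above add up to a worst-case figure. The sketch below models that sum; the class name, field names and all cycle counts are hypothetical and chosen only to show how separate register sets and abortable long instructions shorten the total.

```python
from dataclasses import dataclass


@dataclass
class InterruptTiming:
    """Worst-case latency components, all in clock cycles (illustrative values)."""
    sync_cycles: int                  # synchronizing the request
    longest_instruction_cycles: int   # completing or aborting the running instruction
    entry_cycles: int                 # entering service mode, including state saving
    max_disabled_cycles: int = 0      # waiting until interrupts are enabled

    def worst_case_latency(self):
        return (self.sync_cycles
                + self.longest_instruction_cycles
                + self.entry_cycles
                + self.max_disabled_cycles)


# Assumed baseline: a 20-cycle instruction must complete, state is saved to memory.
baseline = InterruptTiming(sync_cycles=2, longest_instruction_cycles=20, entry_cycles=6)
# Separate register set for the interrupt mode: less state saving on entry.
banked = InterruptTiming(sync_cycles=2, longest_instruction_cycles=20, entry_cycles=3)
# Additionally, long instructions can be aborted and re-issued after 4 cycles.
abortable = InterruptTiming(sync_cycles=2, longest_instruction_cycles=4, entry_cycles=3)

for name, t in [("baseline", baseline), ("banked", banked), ("abortable", abortable)]:
    print(name, t.worst_case_latency(), "cycles")
```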

ARM Instruction Set


ARM Architecture

• small (32x8) multiplier
• barrel shifter
• 16 (31) GP registers
• 6 status registers
• many dedicated buses
• compressed instructions (Thumb)

Thumb

• Compressed mode of ARM processors


Thumb Instruction Set

ARM Caches

• Separate instruction and data caches
• 4 kbytes each
• Organization
  • four cache segments
  • 64-way set-associative (each segment fully associative)
  • four words per block (4 segments x 64 lines x 4 words x 4 bytes = 4 kbytes)
  • word-aligned cache access
• Regions of data cache can be marked uncacheable
• Flexible cleaning and flushing utilities
• 8-word write buffer, configurable region-wise as write-through, write-back, or disabled

Address fields (from the figure): bits 31-6 = tag, bits 5-4 = segment, bits 3-2 = word, bits 1-0 = unlabeled
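The cache geometry and the address fields can be checked with a few lines of arithmetic. This is a sketch only: the bit positions are reconstructed from the figure labels on the slide, treating bits 1-0 as the byte offset within a word is an assumption, and `split_address` is a hypothetical helper name.

```python
# Geometry from the slide: 4 segments x 64 lines x 4 words x 4 bytes.
SEGMENTS, LINES, WORDS_PER_BLOCK, BYTES_PER_WORD = 4, 64, 4, 4
CACHE_BYTES = SEGMENTS * LINES * WORDS_PER_BLOCK * BYTES_PER_WORD


def split_address(addr):
    """Split a 32-bit address into the fields of the slide's figure."""
    addr &= 0xFFFFFFFF
    return {
        "tag": addr >> 6,               # bits 31-6, compared in all 64 lines of a segment
        "segment": (addr >> 4) & 0x3,   # bits 5-4, selects one of four segments
        "word": (addr >> 2) & 0x3,      # bits 3-2, selects one of four words in a block
        "byte": addr & 0x3,             # bits 1-0, assumed byte offset (unlabeled on the slide)
    }


fields = split_address(0x00001234)
print(CACHE_BYTES, fields)
```

Because each segment is fully associative, the segment bits are the only index; the whole remaining upper part of the address is the tag.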


ARM Solutions for . . .

• Memory footprint
  • compressed instruction mode (Thumb)
  • some additional arithmetic efficiency (barrel shifter, small multiplier)
  • conditional instructions (fewer delay slots to be filled)
• Processor size
  • not full-size multiplier
  • (only) three-stage pipeline (in ARM7, five-stage in ARM9)
  • small on-chip caches (in ARM7 no cache by default)
  • only physical addresses, no address mapping
  • no branch prediction or other fancy features

ARM Solutions for . . . (cont'd)

• Power consumption
  • Thumb instruction compression
  • caches
  • short dedicated buses in the core, small core
  • low-depth pipeline
  • everything simple but working
• Interrupt latency and overhead
  • FIQ, Fast Interrupt Request for data transfers
  • total of six (partially overlapping) sets of registers for different modes
  • always handles the interrupts in the (non-compressed) ARM mode


ARM Register Sets

The MIPS Approach

• Discrete R3000 and R4000 (R2000) processors from multiple manufacturers
• Three core families
  • MIPS 32 (32-bit RISC)
    • one R3000 & R4000 compatible low-power core
    • one with fast (= single-cycle) multiply-accumulate added
    • one additionally optimized for WindowsCE and other OS's
  • MIPS 64 (64-bit RISC)
    • one synthesizable high-performance core
    • one with 3D graphics extensions added
  • MIPS 16
    • "code compression providing 40% reduction in memory footprint"
• MIPS compared to ARM
  • seems to target a broader range (also high performance market)
  • is lagging in some embedded-specific solutions (like code compression)


MIPS Instructions

• Three instruction formats
• Nothing very special

MIPS Code Compression

• Limited opcodes
• Limited register set
• Short immediates
• Decompression as in ARM Thumb


MIPS Register Mapping

• Register set in compressed mode
  • not very straightforward mapping
  • access to other registers by special register-to-register moves
• Moving between modes
  • JALX instruction calls a subroutine and toggles the mode
  • on return, the mode of the caller is restored (merged with the address)
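The idea of merging the mode with the return address can be sketched as follows. This is an illustrative model, not a description of the actual hardware: it assumes the mode flag occupies the least significant bit of the link address (which is otherwise unused because instructions are at least 2-byte aligned), and the function names are hypothetical.

```python
MODE_BIT = 0x1  # assumed position of the mode flag inside the link address


def make_link(next_pc, caller_compressed):
    """Merge the caller's mode into the (aligned) return address."""
    return (next_pc & ~MODE_BIT) | (MODE_BIT if caller_compressed else 0)


def jalx(next_pc, target, compressed):
    """Model a call that toggles the mode: returns (link, new_pc, new_mode)."""
    return make_link(next_pc, compressed), target, not compressed


def jump_to_link(link):
    """Model the return: split the link back into (pc, compressed_mode)."""
    return link & ~MODE_BIT, bool(link & MODE_BIT)


# Call from full-length mode into a compressed-mode subroutine and return.
link, pc, mode = jalx(next_pc=0x1008, target=0x2000, compressed=False)
ret_pc, ret_mode = jump_to_link(link)
print(hex(link), hex(pc), mode, hex(ret_pc), ret_mode)
```

The subroutine does not need to know which mode its caller used; jumping to the saved link value restores both the address and the mode.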

MIPS Solutions for . . .

• Memory footprint
  • code compression
• Processor size
  • multiply-accumulate as a peripheral option only
  • smallish on-chip caches (0-16 kbytes, 4-way set-associative, separately for I and D)
  • the basic design is simple and enables compact implementation
  • however, pipeline varies (8-stage pipeline in R4000!)
• Power consumption
  • caches
  • code compression
• Interrupt latency and overhead
  • nothing specific


National CompactRISC

• Scalable RISC architecture with 8/16/32/64-bit data
• Available as cores
• Variable instruction word length 16/32-bit (/48-bit in CR32)
• Three-stage pipeline
• Interrupt stack in hardware
• Barrel shifter
• Multi-cycle multiply operation
• 12-13 truly GP registers (total 16), dedicated registers

CompactRISC Registers


CompactRISC Solutions for . . .

• Memory footprint
  • dynamic instruction size
• Processor size
  • no caches by default
  • simple basic design enables compact implementation
  • multi-cycle multiplier
  • different word length core implementations
  • three-stage pipeline
• Power consumption
  • dynamic instruction size
  • however, no caches included in the core design
• Interrupt latency and overhead
  • separate interrupt stack
  • barrel shifter

Summary

• Key things in embedded processors are
  • keeping the memory footprint small
  • keeping the processor area small
  • keeping the power consumption low (in most cases)
  • keeping the interrupt latency and overhead low (in most cases)
  • keeping the price/performance ratio as low as possible
• The means to achieve these goals vary


End of Embedded Processors

next we will look at DSP processors