Embedded Processors and CPU Cores (EE8205, Ryerson University)
• Memory holds data and instructions.
• The central processing unit (CPU) fetches instructions from memory.
• Separating the CPU from memory distinguishes a programmable computer.
• CPU registers help out: program counter (PC), instruction register (IR), general-purpose registers, etc.
• Harvard architecture cannot use self-modifying code.
• Harvard allows two simultaneous memory fetches (one instruction, one data).
• Most DSPs use a Harvard architecture for streaming data: it provides greater memory bandwidth and more predictable bandwidth.
Instruction Execution Process
• Instruction Fetch: Reads the next instruction into the instruction register (IR). The instruction address is in the program counter (PC).
• Instruction Interpretation: Decodes the op-code, gets the required operands, and routes them to the ALU.
• Sequencing: Determines the address of the next instruction and loads it into the PC.
• Execution: Generates the ALU control signals for execution.
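The four steps above can be traced in a toy accumulator machine. This sketch is purely illustrative: the 16-bit encoding (4-bit opcode, 12-bit address), the opcodes, and the names `ToyCpu` and `toy_run` are hypothetical and unrelated to any real ISA. It uses one memory for code and data (von Neumann style) and marks where fetch, interpretation, sequencing, and execution happen.

```c
#include <stdint.h>

/* Hypothetical toy ISA: 16-bit words, opcode in bits 15..12, address in bits 11..0. */
enum { OP_HALT = 0x0, OP_LOAD = 0x1, OP_ADD = 0x2, OP_STORE = 0x3, OP_JMP = 0x4 };

#define MEM_WORDS 4096

typedef struct {
    uint16_t pc;              /* program counter: address of the next instruction */
    uint16_t ir;              /* instruction register */
    uint16_t acc;             /* accumulator */
    uint16_t mem[MEM_WORDS];  /* one memory for code and data */
} ToyCpu;

/* Runs until HALT or until max_steps instructions have executed.
   Returns the number of instructions executed (HALT included). */
int toy_run(ToyCpu *cpu, int max_steps) {
    int steps = 0;
    while (steps < max_steps) {
        /* Instruction fetch: the word addressed by PC goes into IR. */
        cpu->ir = cpu->mem[cpu->pc % MEM_WORDS];
        /* Instruction interpretation: split IR into opcode and operand address. */
        uint16_t op = cpu->ir >> 12;
        uint16_t addr = cpu->ir & 0x0FFF;
        /* Sequencing: by default the next instruction follows this one. */
        uint16_t next_pc = (uint16_t)((cpu->pc + 1) % MEM_WORDS);
        steps++;
        /* Execution */
        switch (op) {
        case OP_LOAD:  cpu->acc = cpu->mem[addr]; break;
        case OP_ADD:   cpu->acc = (uint16_t)(cpu->acc + cpu->mem[addr]); break;
        case OP_STORE: cpu->mem[addr] = cpu->acc; break;
        case OP_JMP:   next_pc = addr; break;
        case OP_HALT:
        default:       return steps;   /* PC is left pointing at the HALT */
        }
        cpu->pc = next_pc;
    }
    return steps;
}
```

For example, placing LOAD 0x100, ADD 0x101, STORE 0x102, HALT at addresses 0 to 3 (encoded 0x1100, 0x2101, 0x3102, 0x0000) adds the two data words and stores the sum at 0x102.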
A non-destructive, or data-preserving, register model is fundamental to minimizing load/store traffic.
For example, in an addition instruction:
RISC Model: AR <= BR + CR (the source registers BR and CR keep their values)
A combined load/store and non-destructive register model provides a dramatic boost in RISC performance:
• It minimizes load/store traffic to and from memory.
• It decouples load/store operations from processing operations.
• It allows optimizing compilers to fill the stall slots.
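To make the data-preserving idea concrete, this sketch (hypothetical names `Regs`, `add3`, `add2`, `mov`) contrasts a three-operand add, which computes AR <= BR + CR and leaves BR and CR intact, with a destructive two-operand add, which overwrites one of its sources. If the overwritten value is needed again, the two-operand model must spend an extra copy, or reload the value from memory, which is exactly the traffic the non-destructive model avoids.

```c
#include <stdint.h>

/* Hypothetical 8-register file plus an instruction counter. */
typedef struct {
    int32_t r[8];
    int     instr_count;
} Regs;

/* Non-destructive, three-operand form: rd <= rs + rt (sources preserved). */
void add3(Regs *x, int rd, int rs, int rt) {
    x->r[rd] = x->r[rs] + x->r[rt];
    x->instr_count++;
}

/* Destructive, two-operand form: rd <= rd + rs (the old rd value is lost). */
void add2(Regs *x, int rd, int rs) {
    x->r[rd] = x->r[rd] + x->r[rs];
    x->instr_count++;
}

/* Register copy: the extra step the two-operand form needs to keep a source. */
void mov(Regs *x, int rd, int rs) {
    x->r[rd] = x->r[rs];
    x->instr_count++;
}
```

With r1 = 10 and r2 = 32, `add3(&x, 3, 1, 2)` finishes in one instruction with r1 and r2 unchanged, whereas the two-operand model needs `mov` followed by `add2` to reach the same state.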
Interrupts
A computer program has only two ways to determine the conditions that exist in internal and external circuits.
• One method (polling) uses software instructions that jump to a subroutine depending on some flag status.
• The second method responds to hardware signals called interrupts, which force the program to call interrupt-handling subroutines.
• Interrupts take processor time only when action is required.
• The processor can respond to an external event much faster by using interrupts.
Programming microcomputers and microcontrollers around interrupts is often referred to as real-time programming. Interrupts are often the only way to do real-time programming successfully.
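A minimal host-side sketch of the two approaches follows. The simulated device (`SimDevice`), its ready time, and the handler `count_event` are hypothetical; no real interrupt hardware is involved. Polling spends one status read per tick until the device is ready, while the interrupt-style loop keeps doing other work and runs the handler only when the event occurs.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical simulated device that becomes ready at tick ready_at. */
typedef struct {
    int   ticks;
    int   ready_at;
    void (*isr)(void *ctx);   /* handler registered by software */
    void *isr_ctx;
} SimDevice;

static bool dev_ready(const SimDevice *d) { return d->ticks >= d->ready_at; }

/* Polling: the CPU repeatedly tests the status flag.
   Returns how many status reads were spent waiting. */
int poll_until_ready(SimDevice *d) {
    int reads = 0;
    for (;;) {
        reads++;
        if (dev_ready(d)) return reads;
        d->ticks++;               /* time passes while the CPU spins */
    }
}

/* Interrupt style: the main program does useful work every tick; the
   handler runs only on the tick the device raises its request.
   Returns the number of ticks spent on other work. */
int run_with_interrupt(SimDevice *d, int total_ticks) {
    int other_work = 0;
    for (int t = 0; t < total_ticks; t++) {
        d->ticks++;
        if (d->ticks == d->ready_at && d->isr != NULL) {
            d->isr(d->isr_ctx);   /* processor time used only when needed */
        } else {
            other_work++;         /* main program keeps running */
        }
    }
    return other_work;
}

/* Example handler: counts how many events were serviced. */
void count_event(void *ctx) { (*(int *)ctx)++; }
```

With the device ready at tick 5, polling performs 6 status reads and does nothing else, whereas over 10 ticks the interrupt version services the event once and spends the other 9 ticks on the main program.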
Interrupt-Based I/O: Computers permit I/O modules to INTERRUPT the CPU during its normal operation.
• The I/O module asserts an interrupt request on the control bus.
• The CPU transfers control to an interrupt handler routine.
• The interrupt handler is generally part of the operating system.
Interrupts
• Allow the processor to execute other instructions while an I/O operation is in progress.
• Suspend processing in response to an event external to the processor, in such a way that the computation can be resumed.
Instruction Cycle with Interrupts: The CPU checks for interrupts at the end of each instruction and executes the interrupt handler if required.
Interrupt Handler: A program that identifies the nature/source of an interrupt and performs whatever actions are needed.
• It takes over control after the interrupt.
• Control is then transferred back to the interrupted program, which resumes execution from the point of interruption.
• The point of interruption can occur anywhere in a program.
• The state of the program is therefore saved (PC + PSW + relevant registers + …).
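The interrupt-aware instruction cycle can be simulated as follows. This is a sketch under assumed names (`Cpu`, `run_program`, `PSW_IE`, `HANDLER_ADDR`) with a made-up PSW layout. The check happens only after an instruction completes; the PC, PSW, and accumulator are saved before the handler runs and restored afterwards, so the program resumes exactly where it was interrupted even though the handler clobbers the accumulator.

```c
#include <stdbool.h>
#include <stdint.h>

#define PSW_IE       0x1u   /* hypothetical PSW bit: interrupts enabled */
#define HANDLER_ADDR 0x20u  /* hypothetical handler entry address */

typedef struct {
    uint32_t pc;        /* program counter */
    uint32_t psw;       /* program status word */
    int32_t  acc;       /* a general register */
    bool     irq_line;  /* interrupt request driven by an I/O module */
    int      handled;   /* interrupts serviced so far */
} Cpu;

/* State saved at the point of interruption: PC + PSW + relevant registers. */
typedef struct { uint32_t pc, psw; int32_t acc; } SavedContext;

/* Handler: services the (single) source and freely uses the registers. */
static void interrupt_handler(Cpu *c) {
    c->acc = -1;          /* clobbers a register the user program relies on */
    c->irq_line = false;  /* acknowledge the device so it drops the request */
    c->handled++;
}

/* One user-program instruction: increment the accumulator. */
static void execute_one(Cpu *c) { c->acc += 1; c->pc += 4; }

/* Runs n_instr instructions; the device raises its IRQ just before instruction irq_at. */
void run_program(Cpu *c, int n_instr, int irq_at) {
    for (int i = 0; i < n_instr; i++) {
        if (i == irq_at) c->irq_line = true;   /* external event */
        execute_one(c);                        /* fetch + execute */
        /* Interrupt check: only at the end of each instruction. */
        if (c->irq_line && (c->psw & PSW_IE)) {
            SavedContext s = { c->pc, c->psw, c->acc };    /* save state */
            c->psw &= ~PSW_IE;                            /* block nesting */
            c->pc = HANDLER_ADDR;                         /* handler takes over */
            interrupt_handler(c);
            c->pc = s.pc; c->psw = s.psw; c->acc = s.acc;  /* resume */
        }
    }
}
```

If interrupts are disabled in the PSW, the request simply stays asserted (pending) and the program runs undisturbed.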
Multiple Interrupts (Sequential Order)
• Interrupts are disabled while the current interrupt is being handled.
• Additional interrupts remain pending until interrupts are re-enabled; they are then handled in order.
• After completing the interrupt handler routine, the processor re-enables interrupts and checks for further pending interrupts before resuming the interrupted program.
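Sequential handling of multiple interrupts can be sketched with a pending queue. The FIFO, the names `IrqQueue`, `irq_request`, `irq_service`, and `demo_handler`, and the choice to service requests in arrival order are assumptions for illustration; real controllers may instead order pending requests by priority.

```c
#include <stdbool.h>

#define QCAP 32

/* Hypothetical model: requests that arrive while interrupts are disabled are
   latched in a FIFO and serviced later, one at a time, in arrival order. */
typedef struct IrqQueue {
    int  q[QCAP];
    int  head, count;
    bool enabled;      /* global interrupt enable */
    int  log[QCAP];    /* sources in the order they were serviced */
    int  n_log;
} IrqQueue;

typedef void (*IrqHandler)(IrqQueue *iq, int src);

/* An I/O module asserts a request; it stays pending until serviced. */
void irq_request(IrqQueue *iq, int src) {
    if (iq->count < QCAP) {
        iq->q[(iq->head + iq->count) % QCAP] = src;
        iq->count++;
    }
}

/* Called at the interrupt check point (end of an instruction). */
void irq_service(IrqQueue *iq, IrqHandler handler) {
    while (iq->enabled && iq->count > 0) {
        int src = iq->q[iq->head];
        iq->head = (iq->head + 1) % QCAP;
        iq->count--;
        iq->enabled = false;   /* disable: finish the task at hand first */
        handler(iq, src);      /* new requests are only queued, never nested */
        iq->enabled = true;    /* re-enable, then look for pending requests */
    }
}

/* Example handler: logs the source; while source 3 is being serviced,
   two more requests (1, then 6) arrive. */
void demo_handler(IrqQueue *iq, int src) {
    if (iq->n_log < QCAP) iq->log[iq->n_log++] = src;
    if (src == 3) { irq_request(iq, 1); irq_request(iq, 6); }
}
```

Requests 1 and 6 arrive in the middle of handler 3 but do not preempt it; they are serviced afterwards, giving the order 3, 1, 6.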
Nios II, with its Harvard architecture, has separate instruction and data buses. The data bus, implemented as a 32-bit Avalon master port, connects to both memory and peripheral components.
Data bus performs two functions:
• Read data from memory or a peripheral when the processor executes a load instruction.
• Write data to memory or a peripheral when the processor executes a store instruction.
Memory-mapped I/O access: Both data memory and peripherals are mapped into the address space of the data master port.
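Because peripherals sit in the data master's address space, ordinary loads and stores reach them. On a Nios II system this is typically done through a `volatile` pointer to a base address from the generated `system.h`, or with the HAL's `IORD_32DIRECT`/`IOWR_32DIRECT` macros from `io.h`. The sketch below cannot touch real hardware, so it simulates the address space with an array; the PIO placement and register layout (`SIM_PIO_OFFSET`, `PIO_DATA`, `PIO_DIRECTION`) are hypothetical.

```c
#include <stdint.h>

/* Simulated data-master address space (word-indexed): words 0x00..0x3F act as
   data memory, and a hypothetical PIO peripheral occupies words 0x40 and up. */
static uint32_t sim_address_space[256];

#define SIM_PIO_OFFSET 0x40u  /* hypothetical PIO base (word index) */
#define PIO_DATA       0u     /* hypothetical register 0: data */
#define PIO_DIRECTION  1u     /* hypothetical register 1: direction */

/* volatile tells the compiler every access must really happen, which matters
   for device registers whose value can change outside program control. */
static volatile uint32_t *reg_ptr(uint32_t base, uint32_t reg) {
    return (volatile uint32_t *)&sim_address_space[base + reg];
}

/* Store on the data bus: the same operation writes memory or a peripheral. */
void mmio_write(uint32_t base, uint32_t reg, uint32_t value) {
    *reg_ptr(base, reg) = value;
}

/* Load on the data bus: the same operation reads memory or a peripheral. */
uint32_t mmio_read(uint32_t base, uint32_t reg) {
    return *reg_ptr(base, reg);
}
```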
Nios-II Instruction Bus
The instruction bus connects only to memory components. It is implemented as a 32-bit Avalon master port.
• Fetches instructions to be executed by the processor
• Performs no memory writes
• Always retrieves 32 bits of data
Shared Memory for Instructions and Data
• The instruction and data master ports can share a single memory that contains both instructions and data.
• The overall Nios II processor system might present a single, shared instruction and data bus to the outside world.
Memory Address Map
Exception Address
● We can select the memory module where the exception vector resides, and its location.
● In a typical system, you select a low-latency memory module for the exception code.
Break Location
● Applies to Nios II cores that contain a JTAG debug module.
● The memory module is always the JTAG debug module. The offset is fixed at 0x20, and the address is determined by the base address of the JTAG debug module.
● You cannot modify any of the Break Location fields.
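The address arithmetic above is small enough to state as code. In this sketch the base addresses used in any example are hypothetical: the break location is always the JTAG debug module's base plus the fixed offset 0x20, while the exception address is the chosen memory module's base plus an offset you select.

```c
#include <stdint.h>

#define BREAK_OFFSET 0x20u  /* fixed; the Break Location fields cannot be modified */

/* Break location: always inside the JTAG debug module. */
uint32_t break_address(uint32_t jtag_debug_base) {
    return jtag_debug_base + BREAK_OFFSET;
}

/* Exception address: base of the memory module you chose plus the offset you chose. */
uint32_t exception_address(uint32_t module_base, uint32_t offset) {
    return module_base + offset;
}
```

For instance, with a JTAG debug module at the hypothetical base 0x00800800, the break location is 0x00800820.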
Nios-II Exception & Interrupt Control
Exception Controller: A simple, non-vectored exception controller; all exceptions are handled at a single exception address.
Integral Interrupt Controller
• 32 level-sensitive interrupt request (IRQ) inputs, irq0 through irq31.
• Software can enable and disable any interrupt source individually through the ienable control register.
• Software can also enable and disable interrupts globally through the PIE bit of the status control register.
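The controller's enable logic can be modeled on a host as below. Register names follow the Nios II control registers (`status.PIE`, `ienable`, `ipending`), but the struct and functions are hypothetical, and servicing the lowest-numbered pending IRQ first is a software convention assumed here, since the hardware controller is non-vectored. On the target, software would access the real control registers instead, for example with GCC's Nios II `__builtin_rdctl`/`__builtin_wrctl` built-ins.

```c
#include <stdbool.h>
#include <stdint.h>

/* Host-side model of the Nios II internal interrupt controller state. */
typedef struct {
    uint32_t irq_lines;  /* level-sensitive inputs irq0..irq31 */
    uint32_t ienable;    /* per-source enable bits */
    bool     pie;        /* global enable (status.PIE) */
} NiosIntModel;

/* A source is pending when its input is asserted AND it is individually enabled. */
uint32_t ipending(const NiosIntModel *m) { return m->irq_lines & m->ienable; }

/* An interrupt is taken only if globally enabled and something is pending. */
bool interrupt_taken(const NiosIntModel *m) { return m->pie && ipending(m) != 0; }

/* Software dispatch (assumed policy): lowest-numbered pending IRQ, or -1 if none. */
int next_irq(const NiosIntModel *m) {
    uint32_t p = ipending(m);
    for (int i = 0; i < 32; i++)
        if (p & (1u << i)) return i;
    return -1;
}

void irq_enable(NiosIntModel *m, int irq)  { m->ienable |=  (1u << irq); }
void irq_disable(NiosIntModel *m, int irq) { m->ienable &= ~(1u << irq); }
```

Because the inputs are level-sensitive, a source stays pending for as long as its device holds the line asserted, so a handler must clear the condition at the device before returning.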
The Nios II architecture supports a JTAG debug module. Host-based tools communicate with the JTAG debug module for:
• Downloading programs to memory
• Starting and stopping execution
• Setting breakpoints and watchpoints
• Analyzing registers and memory
• Collecting real-time execution trace data
The debug module connects to the JTAG circuitry in an FPGA, and to signals inside the processor core (on the processor side).
It has non-maskable control over the processor under test. Its functionality can be reduced or removed altogether.
All system resources visible to the processor in supervisor mode are available to the debug module.
Instruction Set Format
Nios II has three instruction-word formats: I-type, R-type, and J-type.
The I-type instruction-word format contains an immediate value embedded within the instruction word. I-type instruction words contain:
● A 6-bit opcode field OP
● Two 5-bit register fields A and B
● A 16-bit immediate data field IMM16
In most cases, fields A and IMM16 specify the source operands, and field B specifies the destination register. IMM16 is treated as signed except for logical operations and unsigned comparisons. I-type instructions include arithmetic and logical operations such as addi and andi; branch operations; load and store operations; and cache-management operations.
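In the R-type format, all arguments and results are specified as registers. R-type instruction words contain:
● A 6-bit opcode field OP
● Three 5-bit register fields A, B, and C
● An 11-bit opcode-extension field OPX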
In most cases, fields A and B specify the source operands, and field C specifies the destination register. R-type instructions include arithmetic and logical operations.
J-type instructions contain:
● A 6-bit opcode field
● A 26-bit immediate data field
The J-type instructions are call and jmpi.
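The three formats can be told apart by the OP field. The decoder below is a sketch based on our reading of the Nios II Processor Reference Handbook; check the field positions and opcode values (0x3A for all R-type instructions, 0x00 for call, 0x01 for jmpi) against the handbook before relying on them. Names such as `nios2_decode` and `jtype_target` are hypothetical. It also shows both views of IMM16: sign-extended for arithmetic, zero-extended for logical operations and unsigned comparisons.

```c
#include <stdint.h>

typedef enum { FMT_I, FMT_R, FMT_J } Format;

typedef struct {
    Format   fmt;
    uint32_t op, a, b, c, opx, imm16u, imm26;
    int32_t  imm16;   /* IMM16 sign-extended */
} Decoded;

/* Extracts bits hi..lo (inclusive) of w; widths used here are at most 26 bits. */
static uint32_t field(uint32_t w, int hi, int lo) {
    return (w >> lo) & ((1u << (hi - lo + 1)) - 1u);
}

Decoded nios2_decode(uint32_t w) {
    Decoded d = {0};
    d.op = field(w, 5, 0);
    if (d.op == 0x3A) {                        /* R-type: A[31:27] B[26:22] C[21:17] */
        d.fmt = FMT_R;
        d.a = field(w, 31, 27);
        d.b = field(w, 26, 22);
        d.c = field(w, 21, 17);
        d.opx = field(w, 16, 6);               /* 11-bit opcode extension */
    } else if (d.op == 0x00 || d.op == 0x01) { /* J-type: call, jmpi */
        d.fmt = FMT_J;
        d.imm26 = field(w, 31, 6);
    } else {                                   /* I-type: A[31:27] B[26:22] IMM16[21:6] */
        d.fmt = FMT_I;
        d.a = field(w, 31, 27);
        d.b = field(w, 26, 22);
        d.imm16u = field(w, 21, 6);            /* zero-extended view */
        d.imm16 = (d.imm16u & 0x8000u) ? (int32_t)d.imm16u - 0x10000
                                       : (int32_t)d.imm16u;  /* signed view */
    }
    return d;
}

/* J-type target: keep PC[31:28]; IMM26 counts 32-bit words. */
uint32_t jtype_target(uint32_t pc, uint32_t imm26) {
    return (pc & 0xF0000000u) | (imm26 << 2);
}
```

Since the J-type target keeps the upper four PC bits, call and jmpi can reach any address within the current 256-MByte region.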
Nios-II Processor Cores
Three Nios II cores are available:
Nios II/f “fast” core is designed for high performance. It offers the most configuration options, allowing us to fine-tune processor performance.
Nios II/s “standard” core is designed for small size while maintaining performance.
Nios II/e “economy” core is designed to achieve the smallest possible core size. This core has a limited feature set, and many settings are not available when the Nios II/e core is selected.
Nios-II Multiply & Divide Settings
Nios II cores offer different multiply and divide options. Choose the option that best balances embedded multiplier or logic element (LE) usage against performance.
Hardware Multiply setting provides the following options:
● Include embedded multipliers (e.g., the DSP blocks in Stratix devices) in the arithmetic logic unit (ALU).
● Include LE-based multipliers in the ALU.
● Omit multiply hardware and perform multiply operations in software.
Hardware Divide setting includes LE-based divide hardware in the ALU that achieves much greater performance than emulated software divide operations.
(1) DMIPS performance for the Nios II/s and Nios II/f cores depends on the hardware multiply option. (2) Using the fastest hardware multiply option, and targeting a Stratix II FPGA in the fastest speed grade.
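When multiply hardware is omitted, multiplication falls back to a routine built from shifts and adds, and software division produces one quotient bit per step. The functions below (`soft_mul`, `soft_divu`) are generic textbook sketches, not the actual Nios II toolchain routines, and the divide-by-zero return value is an arbitrary convention chosen here. Each routine loops up to 32 times, which is why hardware multiply and divide options pay off.

```c
#include <stdint.h>

/* Shift-and-add multiply: adds the shifted multiplicand for every 1 bit of b.
   Returns the low 32 bits of the product, like a 32-bit mul instruction. */
uint32_t soft_mul(uint32_t a, uint32_t b) {
    uint32_t result = 0;
    while (b != 0) {
        if (b & 1u) result += a;
        a <<= 1;
        b >>= 1;
    }
    return result;
}

/* Restoring (shift-and-subtract) unsigned division, one quotient bit per step. */
uint32_t soft_divu(uint32_t n, uint32_t d, uint32_t *rem) {
    if (d == 0) {                 /* arbitrary convention for this sketch */
        if (rem) *rem = n;
        return 0xFFFFFFFFu;
    }
    uint64_t r = 0;               /* 64-bit so the shift below cannot overflow */
    uint32_t q = 0;
    for (int i = 31; i >= 0; i--) {
        r = (r << 1) | ((n >> i) & 1u);   /* bring down the next dividend bit */
        if (r >= d) {                      /* divisor fits: subtract, set bit */
            r -= d;
            q |= 1u << i;
        }
    }
    if (rem) *rem = (uint32_t)r;
    return q;
}
```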
The Nios II/f Core
● Separate instruction and data caches.
● Can access up to 2 GBytes of external address space.
● Supports optional tightly coupled memory for instructions and data.
● Employs a 6-stage pipeline to achieve maximum DMIPS/MHz.
● Performs dynamic branch prediction.
● Provides hardware multiply, divide, and shift options to improve arithmetic performance.
● Supports the addition of custom instructions.
● Supports the JTAG debug module.
● Supports optional JTAG debug module enhancements, including hardware breakpoints and real-time trace.
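Dynamic prediction learns from each branch's recent behavior. A common textbook form is a table of 2-bit saturating counters indexed by the branch address; the sketch below uses that form with an assumed 256-entry table and hypothetical names (`Bht`, `bht_predict`, `bht_update`, `bht_mispredicts`). It is not a description of the Nios II/f's exact predictor.

```c
#include <stdbool.h>
#include <stdint.h>

#define BHT_ENTRIES 256   /* assumed table size */

/* Counter values 0,1 predict not taken; 2,3 predict taken. */
typedef struct { uint8_t ctr[BHT_ENTRIES]; } Bht;

static unsigned bht_index(uint32_t pc) { return (pc >> 2) % BHT_ENTRIES; }

bool bht_predict(const Bht *t, uint32_t pc) { return t->ctr[bht_index(pc)] >= 2; }

/* Saturating update toward the actual outcome. */
void bht_update(Bht *t, uint32_t pc, bool taken) {
    uint8_t *c = &t->ctr[bht_index(pc)];
    if (taken && *c < 3) (*c)++;
    else if (!taken && *c > 0) (*c)--;
}

/* Counts mispredictions for a sequence of outcomes of one branch at pc. */
int bht_mispredicts(Bht *t, uint32_t pc, const bool *outcomes, int n) {
    int miss = 0;
    for (int i = 0; i < n; i++) {
        if (bht_predict(t, pc) != outcomes[i]) miss++;
        bht_update(t, pc, outcomes[i]);
    }
    return miss;
}
```

For a loop branch taken nine times and then not taken, repeated twice, starting from all-zero counters, the sketch mispredicts four times: twice while warming up and once at each loop exit. A single not-taken outcome does not flip the prediction, which is the point of using two bits.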
Nios II/s “standard” core is designed for small core size. On-chip logic and memory resources are conserved at the expense of execution performance.
● Uses approximately 20% less logic than the Nios II/f
● Its execution performance drops by roughly 40%
Main design goals include:
● Do not cripple performance for the sake of size.
● Remove the hardware features that have the highest ratio of resource usage to performance impact.
● Provide an optimal core for cost-sensitive, medium-performance applications.
Typical applications involve large amounts of code and/or data, such as systems running an operating system where performance is not the highest priority.
Overview: Nios II/s
● Has an instruction cache (512 bytes to 64 Kbytes) but no data cache
● Can access up to 2 GBytes of external address space
● Supports optional tightly coupled memory for instructions
● Performs static branch prediction
● Does not support bit-31 data cache bypass
● Provides hardware multiply, divide, and shift options to improve arithmetic performance
● Supports the addition of custom instructions
● Supports the JTAG debug module
● Supports optional JTAG debug module enhancements, including hardware breakpoints and real-time trace
● Employs a 5-stage pipeline
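Static prediction decides from the instruction alone, with no history. A common static rule, assumed here rather than confirmed as the exact Nios II/s policy, is backward taken, forward not taken (BTFN), because backward branches usually close loops. The sketch uses the Nios II relative-branch convention (target = PC + 4 + sign-extended IMM16); the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Relative branch target: PC + 4 + sign-extended 16-bit offset (wraps mod 2^32). */
uint32_t branch_target(uint32_t pc, int16_t imm16) {
    return pc + 4u + (uint32_t)(int32_t)imm16;
}

/* BTFN: a negative offset puts the target before the next sequential
   instruction (PC + 4), typically a loop jumping back, so predict taken. */
bool btfn_predict_taken(int16_t imm16) {
    return imm16 < 0;
}
```

For a loop that iterates 10 times, its backward branch is predicted taken every time, so only the final exit mispredicts.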
Nios II/e “economy” core is designed to achieve the smallest possible core size. Singular design goal: reduce resource utilization while maintaining compatibility with the Nios II instruction set architecture. The core is roughly half the size of the standard core, but its performance is substantially lower.
Overview: Nios II/e Core
● Executes at most one instruction per six clock cycles
● Can access up to 2 GBytes of external address space
● Supports the addition of custom instructions
● Supports the JTAG debug module
● Has no instruction cache or data cache
● Does not perform branch prediction
● Provides no hardware support for any of the potential unimplemented instructions
● Employs dedicated shift circuitry to perform shift and rotate operations
Nios-II CPU Operating Modes
The Nios II processor has two operating modes:
Normal Mode
● System and application code execute in normal mode.
● Registers bt (r25), ba (r30), and bstatus (ctl2) are not available.
Debug Mode
● Software debugging tools use it to implement breakpoints and watchpoints.
● System and application code never execute in debug mode.
Changing Modes
● The processor starts in normal mode after reset.
● It enters debug mode only as directed by debugging tools.
● System and application code have no control over when the processor enters debug mode.
● The processor returns to its prior state on exiting debug mode.
MIPS32 4K CPU Soft Core
Main blocks of the core are: Execution Unit, Multiply-Divide Unit (MDU), System Control Coprocessor (CP0), Memory Management Unit (MMU), Cache Controller, Bus Interface Unit (BIU), I-Cache, D-Cache, Enhanced JTAG Controller, and Power Management.
● 32-bit address and data paths
● Programmable cache sizes (0 to 16 Kbytes)
● Supports ScratchPad RAM with up to a 20-bit index (1M addresses)
● Multiply-Divide Unit: a 32x16 multiply every clock, a 32x32 multiply every other clock
● Power control
● Supports the EJTAG debug module: its Test Access Port (TAP) facilitates high-speed download of application code
● Employs a 5-stage pipeline: Instruction, Execute, Memory, Align/Accumulate, and Writeback
● CP0 is responsible for virtual-to-physical address translation, cache protocols, the exception control system, processor diagnostics, operating mode selection (kernel vs. user mode), and enabling/disabling of interrupts.
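The MDU throughput figures fit a simple decomposition: a 32x32 product is the sum of two 32x16 partial products, so a 32x16 multiplier needs two passes for a full 32x32 multiply. The sketch illustrates that arithmetic only; it is not the 4K's actual datapath, and the names are hypothetical. The 64-bit result corresponds to what a MIPS32 `mult` places in the HI/LO register pair.

```c
#include <stdint.h>

/* One pass through a 32x16 multiplier (one "clock" in this illustration). */
uint64_t mul32x16(uint32_t a, uint16_t b) {
    return (uint64_t)a * b;
}

/* 32x32 multiply from two 32x16 passes: b = b_hi * 2^16 + b_lo. */
uint64_t mul32x32_two_pass(uint32_t a, uint32_t b, int *passes) {
    uint64_t lo = mul32x16(a, (uint16_t)(b & 0xFFFFu));  /* pass 1: a * b_lo */
    uint64_t hi = mul32x16(a, (uint16_t)(b >> 16));      /* pass 2: a * b_hi */
    if (passes) *passes = 2;
    return lo + (hi << 16);   /* combine partial products; cannot exceed 64 bits */
}
```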