Source: rishadshafik.net/onewebmedia/EEE2007-2017-Processors-coloured.pdf

Post on 19-Jul-2018


Processor: the brain of the computing system, meant to carry out the intended functionality as and when needed.

[Figure: Data → Processor → Processed data. Equivalently, Instructions and Data enter the CPU, which produces Processed data.]

Simplified example of an instruction:

| Opcode | Mode | Register | Address |
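To make the instruction fields concrete, here is a minimal C sketch that packs and unpacks an Opcode / Mode / Register / Address word. The 16-bit width, the individual field widths, and the names `InstrFields`, `encode_instruction` and `decode_instruction` are assumptions chosen for illustration; the slides do not give a layout.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 16-bit layout (widths are assumptions, not from the slides):
 *   [15:12] opcode   [11:10] mode   [9:8] register   [7:0] address */
typedef struct {
    uint8_t opcode;   /* 4 bits */
    uint8_t mode;     /* 2 bits */
    uint8_t reg;      /* 2 bits */
    uint8_t address;  /* 8 bits */
} InstrFields;

/* Pack the four fields into one instruction word. */
uint16_t encode_instruction(InstrFields f)
{
    return (uint16_t)(((f.opcode & 0xFu) << 12) |
                      ((f.mode   & 0x3u) << 10) |
                      ((f.reg    & 0x3u) << 8)  |
                       (f.address & 0xFFu));
}

/* Split an instruction word back into its fields. */
InstrFields decode_instruction(uint16_t word)
{
    InstrFields f;
    f.opcode  = (uint8_t)((word >> 12) & 0xFu);
    f.mode    = (uint8_t)((word >> 10) & 0x3u);
    f.reg     = (uint8_t)((word >> 8)  & 0x3u);
    f.address = (uint8_t)(word & 0xFFu);
    return f;
}
```

Under this assumed layout, opcode 0xA, mode 1, register 2 and address 0x3C pack into the word 0xA63C.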

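[Figure: the CPU exchanges Instruction / Data and Control signals with Memory and carries out the Process.]

[Figure: Computer structure. The computer contains the CPU (Central Processing Unit), Memory, Input/Output and System Interconnects; Peripherals and Communication lines attach from outside.]

[Figure: CPU structure. Within the computer, the CPU, Memory and I/O are linked by the System bus; the CPU contains the Arithmetic and Logic Units, the Control Unit, Registers and Internal Interconnects.]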


Status Registers: a set of bits
‐ Includes condition codes
‐ Sign of last result
‐ Zero, Carry, and Equal
‐ Overflow
‐ Interrupt enable/disable
‐ Supervisor
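A status register is just a word whose individual bits carry these meanings. The sketch below shows one way to set, clear and test such bits in C. The bit positions and the names (`SR_ZERO`, `sr_set`, `sr_update_sign_zero`, and so on) are hypothetical and do not match any particular processor; the Equal flag is left out for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit positions are illustrative assumptions, not a real processor's layout. */
enum {
    SR_CARRY      = 1u << 0,
    SR_OVERFLOW   = 1u << 1,
    SR_ZERO       = 1u << 2,
    SR_SIGN       = 1u << 3,
    SR_INT_ENABLE = 1u << 4,
    SR_SUPERVISOR = 1u << 5
};

uint8_t sr_set(uint8_t sr, uint8_t flag)   { return (uint8_t)(sr | flag); }
uint8_t sr_clear(uint8_t sr, uint8_t flag) { return (uint8_t)(sr & ~flag); }
bool    sr_test(uint8_t sr, uint8_t flag)  { return (sr & flag) != 0; }

/* Refresh the Sign and Zero condition codes from the last 32-bit result,
 * leaving all other status bits untouched. */
uint8_t sr_update_sign_zero(uint8_t sr, uint32_t result)
{
    sr = sr_clear(sr, SR_SIGN | SR_ZERO);
    if (result == 0)
        sr = sr_set(sr, SR_ZERO);
    if (result & 0x80000000u)
        sr = sr_set(sr, SR_SIGN);
    return sr;
}
```

Keeping each flag as a mask means a conditional branch only has to test one bit of the register.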

V1 ColdFire user visible registers
‐ 16 general‐purpose 32‐bit registers (D0–D7, A0–A7)
‐ 32‐bit program counter (PC)
‐ 8‐bit condition code register (CCR)

Internal (system) registers
‐ 16‐bit status register (SR)
‐ 32‐bit supervisor stack pointer (SSP)
‐ 32‐bit vector base register (VBR)
‐ 32‐bit CPU configuration register (CPUCR)

Do NOT try to memorise this!

ARM has a total of 37 registers
‐ 31 general purpose / user visible registers
‐ 6 internal (system) registers
‐ Mostly 32 bits each, depends on mode
‐ 16 user visible registers at once

7 modes of operation:
‐ User: normal execution mode
‐ System: OS system‐privilege mode
‐ FIQ: fast interrupt, used for data transfer
‐ Supervisor (SVC): protected mode for the operating system
‐ Abort: entered after an aborted instruction fetch or data access
‐ IRQ: general purpose interrupt services
‐ Undefined: entered when an undefined instruction is executed

Do NOT try to memorise this!

    int main(void)
    {
        int a, b, c[50];

        b = 2;
        for (a = 0; a < 50; a++)
            c[a] = a * b;    /* each element is b times its index */

        return 0;
    }

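        mov   r3, #2
        str   r3, [fp, #-16]      @ b = 2
        mov   r3, #0
        str   r3, [fp, #-20]      @ a = 0
        b     .L2
    .L3:
        ldr   r1, [fp, #-20]      @ r1 = a
        ldr   r2, [fp, #-20]      @ r2 = a
        ldr   r3, [fp, #-16]      @ r3 = b
        mul   r0, r3, r2          @ r0 = a * b
        mvn   r2, #207            @ r2 = -208
        mov   r3, r1, asl #2      @ r3 = a * 4
        sub   r1, fp, #12
        add   r3, r3, r1
        add   r3, r3, r2          @ r3 = address of c[a]
        str   r0, [r3, #0]        @ c[a] = a * b
        ldr   r3, [fp, #-20]
        add   r3, r3, #1
        str   r3, [fp, #-20]      @ a++
    .L2:
        ldr   r3, [fp, #-20]
        cmp   r3, #49
        ble   .L3                 @ loop while a <= 49
        sub   sp, fp, #12
        ldmfd sp, {fp, sp, pc}    @ restore the frame and return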

You're not expected to learn ARM assembler from this lecture, by the way! The listing simply shows how the registers (R0, R1, …) are used.


Instruction cycle (simplified view): Fetch next instruction → Decode instruction → Execute instruction

Load PC address into Stack

[Figure: overlapped instruction cycles. Instr. N: Fetch, Decode, Execute; Instr. N+1: (Pre‐)Fetch, Decode, Execute]
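The fetch, decode and execute steps can be sketched as a small interpreter loop in C. The machine below is invented purely for illustration: 8-bit instructions whose upper four bits are the opcode and lower four bits the operand, a single accumulator, and the opcodes `OP_LOAD`, `OP_ADD`, `OP_JUMP` and `OP_HALT`. None of this comes from the slides or from a real instruction set.

```c
#include <assert.h>
#include <stdint.h>

/* Toy opcodes (hypothetical). */
enum { OP_HALT = 0, OP_LOAD = 1, OP_ADD = 2, OP_JUMP = 3 };

typedef struct {
    uint8_t pc;   /* program counter */
    uint8_t ir;   /* instruction register */
    int     acc;  /* accumulator */
} ToyCpu;

/* Run until HALT (or an unknown opcode), the end of the program, or
 * max_steps. Returns the number of instructions executed, HALT included. */
int toy_run(ToyCpu *cpu, const uint8_t *program, int program_len, int max_steps)
{
    int steps = 0;
    while (steps < max_steps && cpu->pc < program_len) {
        cpu->ir = program[cpu->pc++];               /* fetch, then PC + 1 */
        uint8_t opcode  = (uint8_t)(cpu->ir >> 4);  /* decode */
        uint8_t operand = (uint8_t)(cpu->ir & 0x0F);
        steps++;
        switch (opcode) {                           /* execute */
        case OP_LOAD: cpu->acc = operand;  break;
        case OP_ADD:  cpu->acc += operand; break;
        case OP_JUMP: cpu->pc = operand;   break;
        default:      return steps;                 /* HALT or unknown opcode */
        }
    }
    return steps;
}
```

This loop handles one instruction at a time; a pipelined processor overlaps the fetch of the next instruction with the decode and execute of earlier ones.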

Detailed data flow

• Instruction 3 caused a branch to 15
• Instructions 4‐7 have stalls
• Prefetch the branch instructions and store somewhere non‐conflicting
• Do not prefetch after branch

[Figure: pipeline timing diagram. Horizontal axis: time (clock cycles), Cycle 1 to Cycle 7. Vertical axis: instruction order (Load, Instr 1, Instr 2, Instr 3, Instr 4). Each instruction passes through the stages Ifetch, Reg, ALU, DMem, Reg.]

Needs careful processor pipeline design with appropriate arbitration between streams (e.g. skip cycle 4).

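CPI: cycles per instruction

Without pipelining, CPI is equal to the number of stages in the data flow, assuming each stage requires 1 cycle (= Ideal CPI × Pipeline depth).

$$\text{CPI}_{\text{pipelined}} = \text{Ideal CPI} + \text{Average stall cycles per instruction}$$

$$\text{Speedup} = \frac{\text{Ideal CPI} \times \text{Pipeline depth}}{\text{Ideal CPI} + \text{Pipeline stall CPI}} \times \frac{\text{Cycle Time}_{\text{unpipelined}}}{\text{Cycle Time}_{\text{pipelined}}}$$

For a simple pipeline, ideal CPI = 1:

$$\text{Speedup} = \frac{\text{Pipeline depth}}{1 + \text{Pipeline stall CPI}} \times \frac{\text{Cycle Time}_{\text{unpipelined}}}{\text{Cycle Time}_{\text{pipelined}}}$$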
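As a quick numerical check, the speedup relation (ideal CPI times pipeline depth, divided by ideal CPI plus pipeline stall CPI, times the ratio of unpipelined to pipelined cycle time) can be written as a small C function. The function and parameter names are illustrative choices, not taken from the slides.

```c
#include <assert.h>

/* CPI of the pipelined machine: ideal CPI plus average stall cycles per instruction. */
double pipelined_cpi(double ideal_cpi, double stall_cycles_per_instr)
{
    return ideal_cpi + stall_cycles_per_instr;
}

/* Speedup of the pipelined machine over the unpipelined one. */
double pipeline_speedup(double ideal_cpi, double pipeline_depth,
                        double stall_cpi,
                        double cycle_time_unpipelined,
                        double cycle_time_pipelined)
{
    double cpi_unpipelined = ideal_cpi * pipeline_depth;
    double cpi_pipelined   = pipelined_cpi(ideal_cpi, stall_cpi);
    return (cpi_unpipelined / cpi_pipelined) *
           (cycle_time_unpipelined / cycle_time_pipelined);
}
```

For example, a 5-stage pipeline with ideal CPI 1, no stalls and equal cycle times gives a speedup of 5; adding 0.25 stall cycles per instruction reduces it to 5 / 1.25 = 4.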

Clock rates

Remember: LOADs are problematic.

• Would you pay for a double port DRAM system or not?

ARM9TDMI – 5 stage pipeline
‐ FETCH: Instruction Fetch
‐ DECODE: ARM or Thumb Inst Decode, Reg Decode, Reg Read
‐ EXECUTE: Shift + ALU
‐ MEMORY: Memory Access
‐ WRITE: Reg Write

ARM7TDMI – 3 stage pipeline
‐ FETCH: Instruction Fetch
‐ DECODE: Thumb→ARM decompress, ARM decode, Reg Select
‐ EXECUTE: Reg Read, Shift, ALU, Reg Write


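Von Neumann vs Harvard; CISC vs RISC

[Figure: Von Neumann architecture. The Central Processing Unit (CPU) is connected to a single Program and Data Memory (Data and Memory Address lines), an Input Unit and an Output Unit.]

[Figure: Harvard architecture. The Central Processing Unit (CPU) is connected to a separate Program Memory (Program Instruction, Program Memory Address) and Data Memory (Data, Data Memory Address), plus Input / Output.]

[Figure: bus organisation. The Central Processing Unit (CPU), the Memory (Program and Data) and the Input/Output Devices share an Address Bus, a Data Bus and a Control Bus.]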

Split L1 data and instruction caches (a Harvard‐style arrangement in front of a shared memory) are typical in modern systems.

[Figure: Control Path and Data Path. Labels: Program Counter (+1), Program Memory, Instruction Register, Decoder, ALU, Data Registers, Workspace (Data Memory); signals: Program address, Program Instruction, Data Address, Data, Data Source 1, Data Source 2.]

CISC: Complex Instruction Set Computer (e.g. Intel Pentium)

RISC: Reduced Instruction Set Computer (e.g. ARM)

‐ The Berkeley group coined the term RISC and built a CPU called RISC I; soon after, Stanford built a similar CPU, MIPS
‐ SPARC also emerged from Sun
‐ ARM has a range of RISC architectures
‐ Early RISC CPUs had about 50 instructions, compared to 200‐300 in CISC CPUs; the aim was to simplify the CPU so that it could process (and start) instructions faster


To make an N‐bit ALU: duplicate the 1‐bit ALU N times and connect each carry‐out to its neighbour's carry‐in.

[Figure: an example, a one‐bit full adder]

[Figure: example, a 32‐bit full adder]

By placing the one‐bit ALUs in parallel and modifying some of the logic circuits, we can create bigger ALUs.
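The construction can be mimicked in software: a one-bit full adder written with logic operators, then chained N times with each carry-out feeding the next carry-in. This is a behavioural sketch in C, not a hardware description, and the names `full_adder` and `ripple_add` are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* One-bit full adder: sum = a XOR b XOR cin; carry-out is set when at
 * least two of the three inputs are 1. */
unsigned full_adder(unsigned a, unsigned b, unsigned cin, unsigned *cout)
{
    *cout = (a & b) | (a & cin) | (b & cin);
    return a ^ b ^ cin;
}

/* N-bit ripple-carry adder (1 <= nbits <= 32): bit i's carry-out becomes
 * bit i+1's carry-in. The final carry is returned through carry_out. */
uint32_t ripple_add(uint32_t a, uint32_t b, unsigned cin, int nbits,
                    unsigned *carry_out)
{
    uint32_t sum = 0;
    unsigned carry = cin & 1u;
    for (int i = 0; i < nbits; i++) {
        unsigned next_carry;
        unsigned bit = full_adder((a >> i) & 1u, (b >> i) & 1u,
                                  carry, &next_carry);
        sum |= (uint32_t)bit << i;
        carry = next_carry;
    }
    *carry_out = carry;
    return sum;
}
```

Because each stage waits for the carry from the stage before it, the delay of this design grows with N; real ALUs often use faster carry schemes.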

This 32‐bit adder can be used to
1. Add two 32‐bit numbers (2's complement)
2. Subtract 32‐bit numbers (2's complement)
3. Generate ZERO/NON_ZERO conditions
4. Set overflow flags
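All four uses can be seen in one small C sketch: subtraction reuses the adder by adding the bitwise complement of the second operand plus one (two's complement), and the zero and overflow conditions are derived from the result. The struct and function names are hypothetical; the carry convention (carry set means no borrow on subtraction) is one common choice and is an assumption here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint32_t result;
    bool zero;      /* result is all zeros */
    bool negative;  /* sign bit (bit 31) of the result */
    bool carry;     /* carry out of bit 31 of the adder */
    bool overflow;  /* signed (2's complement) overflow */
} AluResult;

/* subtract == true computes a - b as a + (~b) + 1. */
AluResult alu_addsub(uint32_t a, uint32_t b, bool subtract)
{
    uint32_t operand = subtract ? ~b : b;
    uint64_t wide = (uint64_t)a + operand + (subtract ? 1u : 0u);
    AluResult r;
    r.result   = (uint32_t)wide;
    r.carry    = ((wide >> 32) & 1u) != 0;
    r.zero     = (r.result == 0);
    r.negative = ((r.result >> 31) & 1u) != 0;
    /* Overflow: both adder inputs have the same sign, and the result's
     * sign differs from it. */
    r.overflow = (((~(a ^ operand) & (a ^ r.result)) >> 31) & 1u) != 0;
    return r;
}
```

For instance, adding 1 to 0x7FFFFFFF yields 0x80000000 with the overflow flag set, and subtracting 5 from 5 yields 0 with the zero flag set.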

The decoder determines the nature of the instruction in IR, and passes this on to the PLA.

The control Programmable Logic Array (PLA) takes in information from all the other modules and generates the appropriate data path operations.
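One way to picture the decoder and the control PLA is as a classification step followed by a lookup table of control signals. The C sketch below assumes a made-up 8-bit instruction register whose top two bits select the instruction class, a single branch-if-zero rule, and four control lines; all names and encodings are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef enum { CLASS_ALU, CLASS_LOAD, CLASS_STORE, CLASS_BRANCH, CLASS_COUNT } InstrClass;

typedef struct {
    bool reg_write;  /* write a result into a register */
    bool mem_read;   /* read data memory */
    bool mem_write;  /* write data memory */
    bool pc_load;    /* load a new value into the program counter */
} ControlSignals;

/* Decoder: the top two bits of the (hypothetical) IR give the class. */
InstrClass decode_class(uint8_t ir)
{
    return (InstrClass)(ir >> 6);
}

/* PLA-like table: one row of control outputs per instruction class. */
static const ControlSignals CONTROL_TABLE[CLASS_COUNT] = {
    [CLASS_ALU]    = { .reg_write = true },
    [CLASS_LOAD]   = { .reg_write = true, .mem_read = true },
    [CLASS_STORE]  = { .mem_write = true },
    [CLASS_BRANCH] = { .pc_load = true },
};

/* Combine the decoded class with status information (here only the zero
 * flag) to produce the data path control signals. */
ControlSignals control_for(uint8_t ir, bool zero_flag)
{
    InstrClass cls = decode_class(ir);
    ControlSignals s = CONTROL_TABLE[cls];
    if (cls == CLASS_BRANCH)
        s.pc_load = zero_flag;  /* assumed rule: branch only if zero */
    return s;
}
```

The table plays the role of the PLA's fixed logic, while the `zero_flag` argument stands in for the information it receives from the other modules.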
