UC Regents Spring 2014 © UCB

CS 152: L2 Single-Cycle Wrap-up

2014-1-23

John Lazzaro (not a prof - “John” is always OK)

CS 152 Computer Architecture and Engineering

www-inst.eecs.berkeley.edu/~cs152/

TA: Eric Love

Lecture 2 – Single Cycle Wrap-up

Play:

Dec 24, 2015


Pauline Fowler


Nvidia Tegra K1 Tech Talk

5:30 PM this Thursday in the Woz.

Tegra K1 remixes the Kepler GPU architecture for low-power SOCs.


Topics for today’s lecture

Single-Cycle CPU Design

Very Long Instruction Words (VLIW): Doing more work in a single cycle.

Short Break.

Walk up to John and Eric during the break to discuss individual administrative issues.


Single Cycle CPU Design


Single Cycle CPU design

All instructions execute in a single cycle of the clock (positive edge to positive edge).

All state elements act like positive edge-triggered flip flops.

[Figure: positive edge-triggered D flip-flop with input D, output Q, and clock clk]


Contract changes

No delayed branches.

No delayed loads.

The PC of the next instruction executed after a taken branch is the branch target of the taken branch.

The next instruction executed after a load sees, in the load's destination register, the value retrieved by the load.

We will re-introduce delayed branch and delayed load semantics in the pipelining lecture.


Architected state

Main Memory: 2^32 bytes, organized as 32-bit words (word addresses 00000000, 00000004, 00000008, ..., FFFFFFF8, FFFFFFFC; the last byte address is FFFFFFFF).

Program Counter (PC): 32 bits, holds the address (addr) of the next instr.

32 32-bit Registers: R0 [hardwired to constant 0], R1, ..., R30, R31.

The state visible to the programmer.

The state that appears in machine language instructions.


All state elements in our single-cycle CPU design hold architected state.


Recall: MIPS R-format instructions

Instruction Fetch: Fetch next inst from memory: 012A4020

Instruction Decode: Decode fields (opcode | rs | rt | rd | shamt | funct) to get: ADD $8 $9 $10

Operand Fetch: “Retrieve” register values: $9 $10

Execute: Add $9 to $10

Result Store: Place this sum in $8

Next Instruction: Prepare to fetch instruction that follows the ADD in the program.

Syntax: ADD $8 $9 $10   Semantics: $8 = $9 + $10
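As a minimal sketch (Python as a behavioral stand-in, not the course's hardware description language), the decode step is just shifts and masks over the standard MIPS R-format layout: opcode in bits 31:26, then rs, rt, and rd (5 bits each), shamt (5 bits), and funct (6 bits). The function name `decode_r_format` is hypothetical. Applied to 012A4020, it recovers rs = 9, rt = 10, rd = 8, and funct = 0x20 (ADD), i.e. ADD $8 $9 $10.

```python
def decode_r_format(word: int) -> dict:
    """Split a 32-bit MIPS R-format instruction word into its six fields."""
    return {
        "opcode": (word >> 26) & 0x3F,  # bits 31:26 (0 for R-format)
        "rs":     (word >> 21) & 0x1F,  # bits 25:21, first source register
        "rt":     (word >> 16) & 0x1F,  # bits 20:16, second source register
        "rd":     (word >> 11) & 0x1F,  # bits 15:11, destination register
        "shamt":  (word >> 6) & 0x1F,   # bits 10:6, shift amount
        "funct":  word & 0x3F,          # bits 5:0, selects the ALU operation
    }


fields = decode_r_format(0x012A4020)
print(fields)
# {'opcode': 0, 'rs': 9, 'rt': 10, 'rd': 8, 'shamt': 0, 'funct': 32}
```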


Goal #1: An R-format single-cycle CPU

R-format fields: opcode | rs | rt | rd | shamt | funct

Syntax: ADD $8 $9 $10   Semantics: $8 = $9 + $10

Sample program:
ADD $8 $9 $10
SUB $4 $8 $3
AND $9 $8 $4
...

How registers get their initial values is not of concern to us right now.

No loads or stores: machine has no use for data memory, only instruction memory.

No branches or jumps: machine only runs straight line code.


Separate Read-Only Instruction Memory

[Figure: InstrMem with a 32-bit Addr input and a 32-bit Data output]

Reads are combinational: Put a stable address on the input; a short time later, data appears on the output.

Not concerned about how programs are loaded into this memory.

Related to separate instruction and data caches in “real” designs.


Task #1: Straight-line Instruction Fetch

[Figure: InstrMem with a 32-bit Addr input and a 32-bit Data output]

Fetching straight-line MIPS instructions requires a machine that generates this timing diagram:

[Timing diagram: on successive CLK cycles, Addr = PC, PC + 4, PC + 8 and Data = IMem[PC], IMem[PC + 4], IMem[PC + 8]]

Why +4 and not +1? Why increment every cycle?

“Requirements”:

PC == Program Counter, points to next instruction.

32-bit instructions (4 bytes each, in byte-addressed memory).

Straight-line code.


New Component: Register (for PC)

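In later examples, we will add an “enable” input: a clock edge updates state only if enable is high.

[Figure: 32-bit PC register with input Din (32 bits), clock Clk, and output Dout (32 bits)]

Built out of an array of flip-flops: Din0, Din1, Din2, ... each feed the D input of one flip-flop, whose Q output is Dout0, Dout1, Dout2, ...; all flip-flops share clk.

How to design the enable? Mux Q back to D.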
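A behavioral sketch of the “mux Q back to D” answer, written as a simple Python model in which `posedge()` stands for one rising clock edge (the class and method names are hypothetical). When enable is low, the mux feeds the register's own output back to its input, so the edge recaptures the old value.

```python
class EnableRegister:
    """Behavioral model of a register whose update is gated by an enable."""

    def __init__(self, width: int = 32, init: int = 0) -> None:
        self.mask = (1 << width) - 1
        self.q = init & self.mask       # current flip-flop outputs (Dout)

    def posedge(self, din: int, enable: bool) -> None:
        # 2:1 mux in front of each flip-flop: Din when enabled, else Q fed back.
        d = din if enable else self.q
        self.q = d & self.mask          # every flip-flop captures its D input


pc = EnableRegister()
pc.posedge(0x00400000, enable=True)    # captured
pc.posedge(0x12345678, enable=False)   # ignored: Q is fed back to D
print(hex(pc.q))                       # 0x400000
```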


New Component: A 32-bit adder (ALU)

Combinational: Put A and B values on inputs; a short time later, A + B appears on the output.

[Figure: 32-bit adder with 32-bit inputs A and B and a 32-bit output A + B]

[Figure: 32-bit ALU with 32-bit inputs A and B, an op control input of ⌈log2(#ops)⌉ bits, a 32-bit output A op B, and an extra Equal? output]

ALU: Combinational part that is able to execute many functions of A and B (add, sub, and, or, ...). The “op” value selects the function.

Sometimes, extra outputs for use by control logic ...
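A minimal combinational ALU model in Python, assuming four operations (so `op` needs ⌈log2 4⌉ = 2 bits) and 32-bit wraparound arithmetic. The `AluOp` encoding and the `alu` name are illustrative, not the lab's. The second return value plays the role of the extra Equal? output for control logic.

```python
from enum import Enum

MASK32 = 0xFFFFFFFF


class AluOp(Enum):
    """Hypothetical 2-bit op encoding for a four-function ALU."""
    ADD = 0
    SUB = 1
    AND = 2
    OR = 3


def alu(a: int, b: int, op: AluOp) -> tuple:
    """Return (A op B, Equal) for 32-bit unsigned inputs a and b."""
    if op is AluOp.ADD:
        result = (a + b) & MASK32    # carry out of bit 31 is dropped
    elif op is AluOp.SUB:
        result = (a - b) & MASK32    # two's-complement wraparound
    elif op is AluOp.AND:
        result = a & b
    elif op is AluOp.OR:
        result = a | b
    else:
        raise ValueError(f"unsupported op {op}")
    equal = a == b                   # extra output for the control logic
    return result, equal


print(alu(0xFFFFFFFF, 1, AluOp.ADD))  # (0, False)
print(alu(3, 5, AluOp.SUB))           # (4294967294, False), i.e. 0xFFFFFFFE
```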


Design: Straight-line Instruction Fetch

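[Figure: the 32-bit PC register, clocked by Clk, drives the InstrMem Addr input and one input of a 32-bit adder; the adder's other input is the constant 0x4 (+4 in hexadecimal), and its output feeds back to the PC's D input]

State machine design in the service of an ISA

[Timing diagram: on successive CLK cycles, Addr = PC, PC + 4, PC + 8 and Data = IMem[PC], IMem[PC + 4], IMem[PC + 8]]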
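To make the timing diagram concrete, here is a cycle-by-cycle Python sketch of the PC register, the 0x4 adder, and a combinational InstrMem, which is modeled (as an assumption for illustration) as a dict from byte address to instruction word. The three words are the MIPS encodings of the sample program ADD $8 $9 $10, SUB $4 $8 $3, AND $9 $8 $4, and the function name is hypothetical.

```python
MASK32 = 0xFFFFFFFF


def straight_line_fetch(imem: dict, pc: int, cycles: int) -> list:
    """Return the (Addr, Data) pairs seen on InstrMem over `cycles` clock cycles."""
    trace = []
    for _ in range(cycles):
        data = imem[pc]             # combinational read: Data = IMem[PC]
        trace.append((pc, data))
        pc = (pc + 0x4) & MASK32    # adder output, captured by PC at the next posedge
    return trace


imem = {0x0: 0x012A4020,   # ADD $8 $9 $10
        0x4: 0x01032022,   # SUB $4 $8 $3
        0x8: 0x01044824}   # AND $9 $8 $4
for addr, data in straight_line_fetch(imem, 0x0, 3):
    print(f"Addr = {addr:#010x}  Data = {data:#010x}")
```

Each loop iteration corresponds to one column of the timing diagram: the address advances by 4 because each 32-bit instruction occupies four bytes of byte-addressed memory.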


[Repeat of the R-format instruction steps slide for ADD $8 $9 $10: Instruction Fetch, Instruction Decode, Operand Fetch, Execute, Result Store, Next Instruction]

Goal #1: An R-format single-cycle CPU

Instruction Fetch and Next Instruction: Done!

To continue, we need registers ...


MIPS Register file: From the top down

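[Figure: registers R1, R2, ..., R31, each an array of D flip-flops with an enable (En), all clocked by clk; R0 is the constant 0. Two 32-bit multiplexers, selected by the 5-bit fields rs1 and rs2, drive the 32-bit read outputs rd1 and rd2 (“two read ports”). A DEMUX selected by the 5-bit ws field and gated by WE drives the En inputs; the 32-bit write data wd goes to every register's D input.]

Why is R0 special?

How do we add a second write port? Duplicate write buses, add muxes.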


Register File Schematic Symbol

[Figure: RegFile symbol with 5-bit inputs rs1, rs2, and ws; 32-bit input wd; write enable WE; and 32-bit outputs rd1 and rd2]

Why do we need WE? Advance planning, for instructions that don’t write the register file.

If we had a MIPS register file w/o WE, how could we work around it? Do writes to the hardwired-to-zero register R0.
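A behavioral Python sketch of the register file symbol above, assuming 32 registers, two combinational read ports, and one write port that is clocked and gated by WE (the class and method names are hypothetical). Discarding writes to R0 is what makes the “write to R0” workaround behave like a deasserted WE.

```python
MASK32 = 0xFFFFFFFF


class RegFile:
    """32 x 32-bit register file: two read ports, one write port, R0 == 0."""

    def __init__(self) -> None:
        self.regs = [0] * 32

    def read(self, rs1: int, rs2: int) -> tuple:
        # Read ports are combinational: two 32-to-1 muxes selected by rs1, rs2.
        return self.regs[rs1], self.regs[rs2]

    def posedge(self, ws: int, wd: int, we: bool) -> None:
        # The write demux enables register ws only when WE is high.
        # R0 is hardwired to 0, so a write to it is discarded.
        if we and ws != 0:
            self.regs[ws] = wd & MASK32


rf = RegFile()
rf.posedge(ws=8, wd=19, we=True)     # $8 = 19
rf.posedge(ws=9, wd=5, we=False)     # WE low: no register changes
rf.posedge(ws=0, wd=123, we=True)    # write to R0: also no visible change
print(rf.read(8, 9), rf.read(0, 8))  # (19, 0) (0, 19)
```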


[Repeat of the R-format instruction steps slide for ADD $8 $9 $10: Instruction Fetch, Instruction Decode, Operand Fetch, Execute, Result Store, Next Instruction]

Goal #1: An R-format single-cycle CPU

What do we do with these?


Computing engine of the R-format CPU

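[Figure: decode Logic takes the instruction fields (opcode | rs | rt | rd | shamt | funct) and drives the RegFile's 5-bit rs1, rs2, and ws inputs and the ALU's op input; RegFile outputs rd1 and rd2 (32 bits each) feed the 32-bit ALU]

Decode fields to get: ADD $8 $9 $10

What do we do with WE? Hardwire to always write.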


Putting it all together ...

[Figure: the straight-line fetch unit (PC, 0x4 adder, InstrMem) feeds the fetched instruction to the decode logic (rs1, rs2, ws, op), which drives the RegFile and ALU computing engine]

Is it safe to use the same clock for PC and RegFile? Yes!


Recall: Our ideal-world D Flip-Flop

[Figure: D flip-flop (D, Q, CLK) with timing waveforms for CLK, D, and Q]

Value of D is sampled on positive clock edge. Q outputs sampled value for rest of cycle.

Also assume: clocks arrive at all flip flops simultaneously.


Reminder: How data flows after posedge

[Figure: the complete R-format datapath (PC, 0x4 adder, InstrMem, decode Logic, RegFile, ALU), tracing the flow of data after the positive clock edge]


Next posedge: Update state and repeat

[Figure: the RegFile and the PC, the state elements updated at the next positive edge]

In this ideal world, as long as the clock is slow enough, the machine gets the right answer.

In the Metrics lecture, we look at the assumptions behind ideality.
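The ideal-world argument can be checked with a small Python simulation of the R-format single-cycle CPU, assuming the standard MIPS funct codes for ADD, SUB, AND, and OR (0x20, 0x22, 0x24, 0x25) and a dict-based instruction memory. Each call to the hypothetical `step` function first evaluates all combinational logic from the state captured at the last edge, then updates the PC and the register file together, which is why sharing one clock between them is safe. The initial register values are chosen only for illustration.

```python
MASK32 = 0xFFFFFFFF

# ALU function selected by the R-format funct field (standard MIPS codes).
FUNCT_OPS = {
    0x20: lambda a, b: (a + b) & MASK32,  # ADD
    0x22: lambda a, b: (a - b) & MASK32,  # SUB
    0x24: lambda a, b: a & b,             # AND
    0x25: lambda a, b: a | b,             # OR
}


def step(pc: int, regs: list, imem: dict) -> tuple:
    """Advance the single-cycle R-format CPU by one clock cycle."""
    # Between edges: purely combinational, using only the current state.
    word = imem[pc]
    rs, rt, rd = (word >> 21) & 0x1F, (word >> 16) & 0x1F, (word >> 11) & 0x1F
    result = FUNCT_OPS[word & 0x3F](regs[rs], regs[rt])
    next_pc = (pc + 4) & MASK32

    # Positive edge: PC and the destination register capture their inputs together.
    new_regs = list(regs)
    if rd != 0:                 # R0 stays hardwired to 0
        new_regs[rd] = result
    return next_pc, new_regs


imem = {0x0: 0x012A4020,   # ADD $8 $9 $10
        0x4: 0x01032022,   # SUB $4 $8 $3
        0x8: 0x01044824}   # AND $9 $8 $4
regs = [0] * 32
regs[3], regs[9], regs[10] = 4, 9, 10
pc = 0x0
for _ in range(3):
    pc, regs = step(pc, regs, imem)
print(pc, regs[8], regs[4], regs[9])    # 12 19 15 3
```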


Next Step ...

Design stand-alone machines for other major classes of instructions: immediates, branches, load/store.

Learn how to efficiently “merge” single-function machines to make one general-purpose machine.


Goal #2: add I-format ALU instructions

Syntax: ORI $8 $9 64 Semantics: $8 = $9 | 64

16-bit immediate extended to 32 bits.

In this example, $9 is rs and $8 is rt.

Zero-extend: 0x8000 ⇨ 0x00008000

Sign-extend: 0x8000 ⇨ 0xFFFF8000

Some MIPS instructions zero-extend the immediate field; other instructions sign-extend it.
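Each extension mode is a single line of bit manipulation. This Python sketch (function names hypothetical) returns the 32-bit pattern as an unsigned integer, reproduces the 0x8000 examples above, and evaluates ORI $8 $9 64 with an assumed $9 = 0x0F.

```python
def zero_extend16(imm: int) -> int:
    """Zero-extend a 16-bit immediate to 32 bits: upper 16 bits become 0."""
    return imm & 0xFFFF


def sign_extend16(imm: int) -> int:
    """Sign-extend a 16-bit immediate: copy bit 15 into bits 31:16."""
    imm &= 0xFFFF
    return (imm | 0xFFFF0000) if imm & 0x8000 else imm


print(hex(zero_extend16(0x8000)))  # 0x8000 (i.e. 0x00008000)
print(hex(sign_extend16(0x8000)))  # 0xffff8000

# ORI $8 $9 64 uses zero-extension: $8 = $9 | 64
reg9 = 0x0F                        # assumed value of $9
reg8 = reg9 | zero_extend16(64)
print(hex(reg8))                   # 0x4f
```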


Computing engine of the I-format CPU

[Figure: decode Logic drives the RegFile's rs1 and ws inputs and the ALU op; the 16-bit immediate passes through an Ext block to the ALU's second 32-bit input]

Decode fields to get: ORI $8 $9 64

In a Verilog implementation, what should we do with rs2? Tie to the value that minimizes energy consumption.


Merging data paths ...

[Figure: the I-format and R-format datapaths side by side]

Where? How many? (ignore ALU control)

Add muxes (2 of them).


The merged data path ...

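[Figure: the merged data path: instruction fields (opcode | rs | rt | rd | shamt | funct) feed a RegFile and ALU; RegDest selects the register written, ALUsrc selects the ALU's second input (rd2 or the Ext output), ExtOp controls the Ext block, and ALUctr sets the ALU op]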

If you watched it being designed, it’s understandable ...
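A Python sketch of what the RegDest, ALUsrc, and ExtOp choices do in the merged data path, assuming the standard MIPS encodings (ORI has opcode 0x0D, with rs the source and rt the destination). The boolean encodings of the three control signals, the register values, and the function name are assumptions made for illustration.

```python
def merged_datapath(word: int, regs: list, reg_dest_rd: bool,
                    alu_src_imm: bool, ext_sign: bool) -> tuple:
    """Return (ws, ALU input A, ALU input B) for the merged R/I data path."""
    rs = (word >> 21) & 0x1F
    rt = (word >> 16) & 0x1F
    rd = (word >> 11) & 0x1F
    imm = word & 0xFFFF

    # Ext block: ExtOp picks sign- or zero-extension of the 16-bit immediate.
    ext = (imm | 0xFFFF0000) if (ext_sign and imm & 0x8000) else imm

    ws = rd if reg_dest_rd else rt            # RegDest mux: R-format writes rd, I-format writes rt
    alu_b = ext if alu_src_imm else regs[rt]  # ALUsrc mux: immediate or second register
    return ws, regs[rs], alu_b


regs = [0] * 32
regs[9], regs[10] = 0x0F, 10

ADD_8_9_10 = 0x012A4020   # R-format: ADD $8 $9 $10
ORI_8_9_64 = 0x35280040   # I-format: ORI $8 $9 64 (opcode 0x0D)

print(merged_datapath(ADD_8_9_10, regs, reg_dest_rd=True, alu_src_imm=False, ext_sign=False))
# (8, 15, 10)
print(merged_datapath(ORI_8_9_64, regs, reg_dest_rd=False, alu_src_imm=True, ext_sign=False))
# (8, 15, 64)
```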


Memory Instructions


Loads, Stores, and Data Memory ...

[Figure: Data Memory with a 32-bit Addr input, a 32-bit Din input, a WE input, and a 32-bit Dout output]

Syntax: LW $1, 32($2)   Action: $1 = M[$2 + 32]

Syntax: SW $3, 12($4)   Action: M[$4 + 12] = $3

Writes are clocked: If WE is high, the memory word at Addr captures Din on the positive edge of the clock.

Reads are combinational: Put a stable address on Addr; a short time later, Dout is ready.

Note: Not a realistic main memory (DRAM) model ...

Zero-extend or sign-extend immediate field? Sign-extend.
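The Data Memory contract (combinational reads, writes captured at the clock edge when WE is high) and the sign-extended address arithmetic can be sketched in Python as follows. The sparse-dict storage, the class and function names, the register values, and the choice to reject unaligned addresses are all assumptions of this sketch.

```python
MASK32 = 0xFFFFFFFF


class DataMemory:
    """Sketch of the idealized data memory: combinational read, clocked write."""

    def __init__(self) -> None:
        self.words = {}                  # sparse storage: byte address -> 32-bit word

    def read(self, addr: int) -> int:
        # Combinational: Dout depends only on the current Addr.
        if addr % 4:
            raise ValueError("unaligned word access")
        return self.words.get(addr, 0)

    def posedge(self, addr: int, din: int, we: bool) -> None:
        # Clocked: the word at Addr captures Din only when WE is high.
        if we:
            if addr % 4:
                raise ValueError("unaligned word access")
            self.words[addr] = din & MASK32


def effective_address(base: int, imm16: int) -> int:
    """base + sign-extended 16-bit offset, as LW and SW compute it."""
    offset = imm16 & 0xFFFF
    if offset & 0x8000:
        offset -= 0x10000                # negative offsets reach lower addresses
    return (base + offset) & MASK32


regs = {2: 0x0FEC, 3: 0xCAFE, 4: 0x1000}  # assumed register contents
dmem = DataMemory()
dmem.posedge(effective_address(regs[4], 12), regs[3], we=True)   # SW $3, 12($4)
loaded = dmem.read(effective_address(regs[2], 32))               # LW $1, 32($2)
print(hex(loaded))                                               # 0xcafe
```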


Adding data memory to the data path

[Figure: the merged data path (RegFile, Ext, ALU) with Data Memory added; control signals RegDest, ALUsrc, ExtOp, ALUctr, MemWr, MemToReg, and RegWr]

Syntax: LW $1, 32($2)   Action: $1 = M[$2 + 32]

Syntax: SW $3, 12($4)   Action: M[$4 + 12] = $3

Recall spec: no load delay slot.


Branch Instructions


Conditional Branches in MIPS ...

Syntax: BEQ $1, $2, 12

Action: If ($1 != $2), PC = PC + 4

Action: If ($1 == $2), PC = PC + 4 + 48

Immediate field codes # words, not # bytes (12 words = 48 bytes). Why is this encoding a good idea? Increases branch range to ±128 KB (a byte offset would reach only ±32 KB).

Zero-extend or sign-extend immediate field? Sign-extend. Why is this extension method a good idea? Supports forward and backward branches.
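A Python sketch of the taken-branch target arithmetic, assuming a 32-bit PC and a 16-bit two's-complement word offset (the function name and the example PC value are hypothetical). Scaling by 4 is what turns BEQ's immediate of 12 into the +48 above, and sign extension is what allows backward branches.

```python
MASK32 = 0xFFFFFFFF


def branch_target(pc: int, imm16: int) -> int:
    """Taken-branch target: PC + 4 + 4 * sign_extend(imm16)."""
    offset_words = imm16 & 0xFFFF
    if offset_words & 0x8000:
        offset_words -= 0x10000              # sign-extend: backward branch
    return (pc + 4 + 4 * offset_words) & MASK32


pc = 0x00400000                              # example PC, chosen for illustration
print(hex(branch_target(pc, 12)))            # 0x400034 (PC + 4 + 48)

# Reach of a 16-bit word offset, measured from PC + 4:
print(branch_target(pc, 0x7FFF) - (pc + 4))  # 131068 bytes forward (128 KB - 4)
print((pc + 4) - branch_target(pc, 0x8000))  # 131072 bytes backward (128 KB)
```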


Adding branch testing to the data path

[Figure: the data path with Data Memory, plus an Equal output wired into control; control signals RegDest, ALUsrc, ExtOp, ALUctr, MemWr, MemToReg, and RegWr]

Syntax: BEQ $1, $2, 12

Action: If ($1 != $2), PC = PC + 4

Action: If ($1 == $2), PC = PC + 4 + 48


Recall: Straight-line Instruction Fetch

[Figure: the straight-line fetch design (PC register, 0x4 adder, InstrMem) and its timing diagram: on successive CLK cycles, Addr = PC, PC + 4, PC + 8 and Data = IMem[PC], IMem[PC + 4], IMem[PC + 8]]

Syntax: BEQ $1, $2, 12

Action: If ($1 != $2), PC = PC + 4

Action: If ($1 == $2), PC = PC + 4 + 48

How do we add this behavior?


Design: Instruction Fetch with Branch

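[Figure: the PC register drives InstrMem and a 0x4 adder; a second 32-bit adder adds the extended branch offset (Extend) to PC + 4; a mux controlled by PCSrc selects the value loaded into the PC]

Syntax: BEQ $1, $2, 12

Action: If ($1 != $2), PC = PC + 4

Action: If ($1 == $2), PC = PC + 4 + 48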
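The fetch unit with branch can be sketched in Python as two adders and a 2:1 mux. Here PCSrc is assumed to be the AND of a decoded “this is a branch” signal and the Equal output. That split, like the function and argument names, is an assumption of the sketch rather than something the slide specifies.

```python
MASK32 = 0xFFFFFFFF


def next_pc(pc: int, imm16: int, is_branch: bool, equal: bool) -> int:
    """Value the PC register captures at the next positive clock edge."""
    pc_plus_4 = (pc + 0x4) & MASK32                 # the 0x4 adder
    offset = imm16 & 0xFFFF                         # Extend block: sign-extend ...
    if offset & 0x8000:
        offset -= 0x10000
    target = (pc_plus_4 + (offset << 2)) & MASK32   # ... scale to bytes, second adder
    pc_src = is_branch and equal                    # assumed PCSrc logic
    return target if pc_src else pc_plus_4          # the PCSrc mux


# BEQ $1, $2, 12 at PC = 0x100
print(hex(next_pc(0x100, 12, is_branch=True, equal=True)))   # 0x134: taken, PC + 4 + 48
print(hex(next_pc(0x100, 12, is_branch=True, equal=False)))  # 0x104: not taken, PC + 4
```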


Single-Cycle Control


What is single cycle control?

[Figure: the complete single-cycle data path (InstrMem, RegFile, Ext, ALU, Data Memory). The instruction's rs, rt, rd, and imm fields go to the data path; a block of Combinational Logic (only gates, no flip flops) uses the instruction and the Equal signal to generate RegDest, RegWr, ExtOp, ALUsrc, ALUctr, MemWr, MemToReg, and PCSrc]

Just specify logic functions!


Two goals when specifying control logic

Bug-free: One “0” that should be a “1” in the control logic function breaks the contract with the programmer.

Efficient: Logic function specification should map to hardware with good performance properties: fast, small, low power, etc.

Should be easy for humans to read and understand: sensible signal names, symbolic constants ...
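One way to pursue all three goals in a software model is a table indexed by opcode, with named fields and symbolic constants. This Python sketch covers the instruction classes in this lecture (R-format, ORI, LW, SW, BEQ) using their standard MIPS primary opcodes. The signal polarities, the choice to set don't-care outputs to 0 (False), and leaving ALUctr out are assumptions of the sketch, not the lab's specification.

```python
from typing import NamedTuple


class Ctrl(NamedTuple):
    """Single-cycle control outputs; every field defaults to 0 (False)."""
    reg_dest_rd: bool = False   # RegDest: write rd (True) or rt (False)
    alu_src_imm: bool = False   # ALUsrc: ALU input B is the extended immediate
    ext_sign: bool = False      # ExtOp: sign-extend (True) or zero-extend (False)
    mem_to_reg: bool = False    # MemToReg: RegFile wd comes from Data Memory Dout
    reg_wr: bool = False        # RegWr: register file write enable
    mem_wr: bool = False        # MemWr: data memory write enable
    branch: bool = False        # combined with Equal to form PCSrc


# Standard MIPS primary opcodes (instruction bits 31:26).
OP_RTYPE, OP_BEQ, OP_ORI, OP_LW, OP_SW = 0x00, 0x04, 0x0D, 0x23, 0x2B

CONTROL = {
    OP_RTYPE: Ctrl(reg_dest_rd=True, reg_wr=True),
    OP_ORI:   Ctrl(alu_src_imm=True, reg_wr=True),   # ORI zero-extends
    OP_LW:    Ctrl(alu_src_imm=True, ext_sign=True, mem_to_reg=True, reg_wr=True),
    OP_SW:    Ctrl(alu_src_imm=True, ext_sign=True, mem_wr=True),
    OP_BEQ:   Ctrl(ext_sign=True, branch=True),       # ALU compares rs and rt
}


def control(word: int, equal: bool) -> tuple:
    """Purely combinational: (control outputs, PCSrc) for one instruction."""
    ctrl = CONTROL[(word >> 26) & 0x3F]   # KeyError: instruction outside the subset
    return ctrl, ctrl.branch and equal


ctrl, pc_src = control(0x8C410020, equal=False)   # LW $1, 32($2)
print(ctrl.mem_to_reg, ctrl.reg_wr, pc_src)        # True True False
```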


time machine back to FPGA-oriented 2006 CS 152 ...


In practice: Use behavioral Verilog

Advice: Carefully written Verilog will yield identical semantics in ModelSim and Synplicity. If you write your code in this way, many “works in ModelSim but not on Xilinx” issues disappear.

Always check log files, and inspect the output the tools produce!

Look for tell-tale Synplicity warning and error messages: “latch generated”, “combinational loop detected”, etc.

Automate with scripts if possible.


F06 152 Labs: A small subset of MIPS ...

What if some other instruction appears in the instruction stream?

For labs: undefined.

Real world: exceptions.


Why not in labs? Doubles complexity!

Components in blue handle exceptions ... Will cover this (pipelined CPU) example later in the term ...


A slide from Eric’s section on 1/22 ...

Actually, I agree ... However ... if you aren’t able to design at the level we just worked through, you will unwittingly propose unbuildable ideas ... and lose the confidence of your fellow team members.


Upcoming 2014 Lab 1 ...

RISC-V Single-Cycle CPU, written in Chisel

If you understood this lecture, you now have the conceptual foundation to modify this design ... if you’re willing to spend a few days to teach yourself Chisel.


Break

Play:


VLIW: Very Long Instruction Words

Josh Fisher: idea grew out of his Ph.D. (1979) in compilers.

Led to a startup (MultiFlow) whose computers worked, but which went out of business ... the ideas remain influential.


Basic Idea: Super-sized Instructions

Example: All instructions are 64 bits. Each instruction consists of two 32-bit MIPS instructions that execute in parallel.

[Figure: a 64-bit VLIW instruction made of two R-format fields (opcode | rs | rt | rd | shamt | funct)]

Syntax: ADD $8 $9 $10   Semantics: $8 = $9 + $10

Syntax: ADD $7 $8 $9   Semantics: $7 = $8 + $9


VLIW Assembly Syntax ...

Instr: ADD $8 $9 $10   ADD $7 $8 $9
“Instr:” denotes the start of an instruction word. Listed operators all execute in parallel.

Instr: SUB $2 $3 $0   OR $1 $5 $4
Execute in parallel.

Label: AND $5 $2 $3   OR $1 $5 $4
Branch label name instead of the default “Instr”.

[...]


Assume: $7 = 7, $8 = 8, $9 = 9, $10 = 10 (decimal)

32-bit MIPS:
ADD $8 $9 $10 ; Result: $8 = 19
ADD $7 $8 $9 ; Result: $7 = 28

VLIW:
Instr: ADD $8 $9 $10 ; result $8 = 19
       ADD $7 $8 $9 ; result $7 = 17 (not 28)

32-bit & 64-bit semantics different? Yes!
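The difference between the two semantics comes down to when operators read their sources. In this Python sketch (function names hypothetical), every operator in a VLIW word reads registers from before the word, and then all results are written, which reproduces the 17 versus 28 result above. The sketch does not model two operators in one word writing the same register; here the later one would simply win.

```python
def run_sequential(regs: dict, ops: list) -> dict:
    """32-bit MIPS: each ADD sees the results of the ADDs before it."""
    regs = dict(regs)
    for rd, rs, rt in ops:
        regs[rd] = regs[rs] + regs[rt]
    return regs


def run_vliw_word(regs: dict, ops: list) -> dict:
    """VLIW: all operators in one word read first, then all write together."""
    results = [(rd, regs[rs] + regs[rt]) for rd, rs, rt in ops]  # all reads
    regs = dict(regs)
    for rd, value in results:                                     # all writes
        regs[rd] = value
    return regs


start = {7: 7, 8: 8, 9: 9, 10: 10}
ops = [(8, 9, 10),   # ADD $8 $9 $10
       (7, 8, 9)]    # ADD $7 $8 $9
print(run_sequential(start, ops)[7])   # 28
print(run_vliw_word(start, ops)[7])    # 17
```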


Design: A 64-bit VLIW R-format CPU

No loads or stores: machine has no use for data memory, only instruction memory.

No branches or jumps: machine only runs straight line code.

[Figure: a 64-bit VLIW instruction made of two R-format fields (opcode | rs | rt | rd | shamt | funct)]

Syntax: ADD $8 $9 $10   Semantics: $8 = $9 + $10

Syntax: ADD $7 $8 $9   Semantics: $7 = $8 + $9


VLIW: Straight-line Instruction Fetch

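[Figure: the straight-line fetch design with a 64-bit InstrMem data output; the PC adder now adds 0x8 (+8 in hexadecimal, for 64-bit instructions)]

[Timing diagram: on successive CLK cycles, Addr = PC, PC + 8, PC + 16 and Data = IMem[PC], IMem[PC + 8], IMem[PC + 16]]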

Simple changes to support 64-bit instructions ...


Computing engine of VLIW R-format CPU

[Figure: two R-format fields drive two 32-bit ALUs (each with its own op) fed by one RegFile with four read ports (rs1 to rs4 selecting rd1 to rd4) and two write ports (ws1, wd1, WE1 and ws2, wd2, WE2)]


What have we gained with 64-bit VLIW?

[Figure: a 64-bit VLIW instruction made of two R-format fields (opcode | rs | rt | rd | shamt | funct)]

Syntax: ADD $8 $9 $10   Semantics: $8 = $9 + $10

Syntax: ADD $7 $8 $9   Semantics: $7 = $8 + $9

If: Clock speed remains the same.

All 32-bit operators do useful work.

Performance doubles!

N x 32-bit VLIW yields factor of N speedup! Multiflow: N = 7, 14, or 28 (3 CPUs in product family)


What does N = 14 assembly look like?

[Figure: Two instructions from a scientific benchmark (Linpack) for a MultiFlow CPU with 14 operations per instruction.]


A very big “if”!


As N scales, HW and SW needs conflict

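Instruction Set Architecture: Where the conflict plays out.

[Figure: the layers of a computer system: Application (iTunes), Operating System (Mac OS X), Compiler, and Assembler on the software side; the Instruction Set Architecture between them; Processor, Memory, I/O system, Datapath & Control, Digital Design, Circuit Design, and Transistors on the hardware side]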

Hardware need: Clock does not slow down.

Software need: All operators do useful work.


Example problem: Register file ports ...

[Figure: two ALUs sharing one RegFile with four read ports (rd1 to rd4) and two write ports (wd1, ws1, WE1 and wd2, ws2, WE2)]

N ALUs require 2*N read ports and N write ports. Why is this a problem?


Recall: Register File Design

[Figure: the register file design: R1 to R31 built from enabled D flip-flops, R0 the constant 0; read-port MUXes selected by rs1 and rs2 drive rd1 and rd2; a DEMUX selected by ws and gated by WE enables the register that captures wd]

More read ports increase fanout, which slows down reads.

More write ports add data muxes and a demux OR tree.


Split register files: A solution?

[Figure: two ALUs, each with its own RegFile (two read ports, one write port)]

Too often, the data an ALU needs to do “useful work” will not be in its own regfile.

Software need: All operators do useful work.


Architect’s job: Find a good compromise

Instruction Set Architecture: Where the conflict plays out.

[Figure: the layers of a computer system, from Application, Operating System, Compiler, and Assembler (software) down through Processor, Memory, I/O system, Datapath & Control, Digital Design, Circuit Design, and Transistors (hardware)]

Example solution: Split register files, with a dedicated bus and special instructions for moves between regfiles. May hurt software more than it helps hardware :-(


Branch policy: All instr operators execute

[Figure: a 64-bit VLIW instruction made of two R-format fields (opcode | rs | rt | rd | shamt | funct)]

BNE $8 $9 Label   ADD $7 $8 $9

ADD executes if branch is taken or not taken.

Problem: Large N machines find it hard to fill all operators with useful work.

Solution: New “predication” operator. Syntax: SELECT $7 $8 $9 $10

Semantics: If $8 == 0, $7 = $10, else $7 = $9

Permits simple branches to be converted to inline code.
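A Python sketch of SELECT and of converting a branch to inline code, using a hypothetical example branch `if ($1 != $2) $7 = $3 else $7 = $4` (the example and function names are illustrative). The condition is first computed as ordinary data, here with an XOR that is nonzero exactly when the operands differ, and SELECT then picks the result, so no branch is needed.

```python
def select(cond: int, if_nonzero: int, if_zero: int) -> int:
    """SELECT $7 $8 $9 $10: $7 = $10 if $8 == 0, else $7 = $9."""
    return if_zero if cond == 0 else if_nonzero


def with_branch(r1: int, r2: int, r3: int, r4: int) -> int:
    # Original control flow: a branch around two alternative assignments.
    if r1 != r2:
        return r3
    return r4


def if_converted(r1: int, r2: int, r3: int, r4: int) -> int:
    # Straight-line version: compute the condition as data, then SELECT.
    cond = r1 ^ r2                  # nonzero exactly when r1 != r2
    return select(cond, r3, r4)


for r1, r2 in [(5, 5), (5, 6)]:
    print(with_branch(r1, r2, 30, 40), if_converted(r1, r2, 30, 40))
# 40 40
# 30 30
```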


Branch nesting in a single instruction ...

[Figure: a 64-bit VLIW instruction made of two R-format fields (opcode | rs | rt | rd | shamt | funct)]

BEQ $8 $9 LabelOne   BEQ $11 $12 LabelTwo

Conundrum: How to define the semantics of multiple branches in one instruction?

Solution: Nested branch semantics:
If $8 == $9, branch to LabelOne
Else if $11 == $12, branch to LabelTwo

MultiFlow: N-way branch priority set in an opcode field.
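The nested semantics amount to a priority encoder over the branch conditions. This Python sketch (hypothetical names; register values chosen for illustration) returns the target of the highest-priority taken branch, or the fall-through when no branch is taken.

```python
def resolve_branches(branches: list, fallthrough: str) -> str:
    """branches: (condition, target) pairs listed from highest to lowest priority."""
    for taken, target in branches:
        if taken:
            return target            # first true condition wins
    return fallthrough               # no branch taken: next instruction word


def one_instruction(r8: int, r9: int, r11: int, r12: int) -> str:
    # BEQ $8 $9 LabelOne   BEQ $11 $12 LabelTwo, in one VLIW instruction
    return resolve_branches([(r8 == r9, "LabelOne"),
                             (r11 == r12, "LabelTwo")], "next")


print(one_instruction(1, 1, 2, 2))   # LabelOne: both true, first has priority
print(one_instruction(1, 0, 2, 2))   # LabelTwo
print(one_instruction(1, 0, 2, 3))   # next
```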


Will return to VLIW later in semester ...


Next Tuesday

How to measure the “goodness” of an architecture (and an implementation) ...

... and if we have time, we’ll discuss microcode (on class website, click on link for reading PDF).

Have a good weekend!