Array vs. Linked list ‐ Quickly changing size, order ‐ More often dynamically(rarely statically) ‐ Slowly changing size, order ‐ Could be contiguous (when static) but most often not ‐ Slower traversal / additional memory for storing pointers but flexible structure ‐ Could be allocated dynamically or statically ‐ Contiguous location in memory ‐ Fast traversal / no memory overhead but fixed structure Example of array MIPS pseudocode [ ] .data a: .word 100 text Example of array C code Example of array MIPS pseudocode int a[100]; void main () { int b[10]; int size; * . text addi $sp, $sp, ‐10*4; add $t0, $sp, $0 ($t0 is base of array b) **** add $a0 $0 $t1 ($t1 has value of size) main: int *p; **** p = (int *)malloc(sizeof(int)*size); **** f () add $a0, $0, $t1 ($t1 has value of size) jal malloc (malloc returning memory address to $v0) **** add $a0 $t1 $0 ($t1 has value of free(p); **** } add $a0, $t1, $0 ($t1 has value of memory address of p) add $a0, $v0, $0 jal free **** **** addi $sp, $sp, +10*4; jr $ra malloc and free are a OS procedures Is returning pointer to array b from main() a good idea?
51
Embed
Array vs. Linked list - UCSBstrukov/ece154Winter2012/Lecture6.pdf · Array vs. Linked list ‐Quickly changing size, order ‐Slowly changing size, order ‐More often dyyynamically
This document is posted to help you gain knowledge. Please leave a comment to let me know what you think about it! Share it to your friends and learn new things together.
Transcript
Array vs. Linked list‐ Quickly changing size, order‐ More often dynamically (rarely statically)‐ Slowly changing size, order y y ( y y)‐ Could be contiguous (when static) but most often not
‐ Slower traversal / additional memory for storing pointers but flexible structure
‐ Could be allocated dynamically or statically‐ Contiguous location in memory ‐ Fast traversal / no memory overhead but fixed structure
Example of array MIPS pseudocode
[ ]
.dataa: .word 100
text
Example of array C code
Example of array MIPS pseudocode
int a[100];void main () {int b[10];int size;
*
.textaddi $sp, $sp, ‐10*4;add $t0, $sp, $0 ($t0 is base of array b)****add $a0 $0 $t1 ($t1 has value of size)
main:
int *p;****p = (int *)malloc(sizeof(int)*size);****f ( )
add $a0, $0, $t1 ($t1 has value of size)jal malloc (malloc returning
memory address to $v0)****add $a0 $t1 $0 ($t1 has value offree(p);
****}
add $a0, $t1, $0 ($t1 has value of memory address of p)
Deleting from doubly linked list example – I(from quiz ECE15B 2011)
Deleting from doubly linked list example ‐ II
Before proceeding to datapath and control let’s first review some fundamental concepts from digital logic
l h l• Critical Path Delay
Sequential ElementsSequential Elements• Register: stores data in a circuit
Uses a clock signal to determine when to update– Uses a clock signal to determine when to update the stored value
– Edge‐triggered: update when Clk changes from 0Edge‐triggered: update when Clk changes from 0 to 1
D QClk
D
Clk
D
Q
Chapter 4 — The Processor —6
Sequential ElementsSequential Elements• Register with write control
Only updates on clock edge when write control– Only updates on clock edge when write control input is 1
– Used when stored value is required laterUsed when stored value is required later
Clk
D Q Write
Clk
ClkWrite D
Q
Chapter 4 — The Processor —7
Clocking MethodologyClocking Methodology• Combinational logic transforms data during clock cyclesy– Between clock edges– Input from state elements, output to state p , pelement
– Longest delay determines clock period
Chapter 4 — The Processor —8
Register File and SRAM
/ k
Write data
2 k -bit registersh / k
/ k
/ h
Write address
Muxes
/ k /
Write enable
/ k
Write data Write
Read data 0
Q C
Q
D FF
oder
/ k
Write enable
Read data 0
/ k
/ k
k
/ h / h
/ h
Read addr 0
/ k
Read addr 1
addr
Read enable
Read data 1
Q C
Q
D FF
Dec
o / k
/ k
/ k
Read data 1
/
Read enable
(b) Graphic symbol for register file
Q D
/ k
Q C
Q
D FF
/
h Read address 0
Read address 1
Read enable
k
k
/h
C
Q
FF
/ k
Push
/ k
Input
Output Pop
Full
Empty
Fig re 2 9 Register file ith random access andState element might have write‐enable signal!
/ (a) Register file with random access (c) FIFO symbol
Jan. 2011Computer Architecture, Background and
MotivationSlide 9
Figure 2.9 Register file with random access and FIFO SRAM memory is simply a large, single-port register file
Single Cycle DatapathSingle Cycle Datapath
• General AssumptionsGeneral Assumptions– One instruction executed per one cycle
State elements are updated only at the end of– State elements are updated only at the end of cycle
– Control and data signals (after they become– Control and data signals (after they become stable) does not change during cycle
• Two addition operations during cycle = two physical p g y p yadders (however still want to share resources among instructions as much as possible – e.g. by using muxes)
Introduction§4.1 IntroIntroduction
• CPU performance factors
oduction
– Instruction count• Determined by ISA and compiler
– CPI and Cycle time• Determined by CPU hardware
• We will examine two MIPS implementations– A simplified version– A more realistic pipelined version
• Simple s bset sho s most aspects• Simple subset, shows most aspects– Memory reference: lw, sw– Arithmetic/logical: add, sub, and, or, slt– Control transfer: beq, jo o a s e beq, j
Note that datapath from P&H (which is simpler) is discussed in details in class while datapathfrom BP is given as a reference
Chapter 4 — The Processor —11
g
Instruction ExecutionInstruction Execution
• PC instruction memory, fetch instructionPC instruction memory, fetch instruction
• Register numbers register file, read registers• Depending on instruction classDepending on instruction class
– Use ALU to calculate• Arithmetic result
• Memory address for load/store
• Branch target address
Access data memory for load/store– Access data memory for load/store
– PC target address or PC + 4
Chapter 4 — The Processor —12
CPU OverviewCPU Overview
Chapter 4 — The Processor —13
‐Where are the state elements?‐Why three adders and two memories (caches), i.e. no sharing for single data path?
MultiplexersMultiplexers Can’t just join wires togethertogether Use multiplexers
Chapter 4 — The Processor —14
ControlControl
Chapter 4 — The Processor —15
Instruction FetchInstruction Fetch
32‐bit register
Increment by 4 for next instruction
register
Chapter 4 — The Processor —16
R‐Format InstructionsR Format Instructions• Read two register operands
– Add to PC + 4• Already calculated by instruction fetch
Chapter 4 — The Processor —19
Branch InstructionsBranch InstructionsJust
tre‐routes wires
Sign‐bit wire
Chapter 4 — The Processor —20
Sign bit wire replicated
Composing the ElementsComposing the Elements
• First‐cut data path does an instruction in oneFirst cut data path does an instruction in one clock cycle– Each datapath element can only do one function– Each datapath element can only do one function at a time
– Hence we need separate instruction and dataHence, we need separate instruction and data memories
• Use multiplexers where alternate data sourcesUse multiplexers where alternate data sources are used for different instructions
We will refer to this diagram laterSeven R format ALU instructions (add, sub, slt, and, or, xor, nor)Six I-format ALU instructions (lui, addi, slti, andi, ori, xori)Two I-format memory access instructions (lw, sw)Three I-format conditional branch instructions (bltz, beq, bne)F diti l j i t ti (j j j l ll)
Feb. 2011Computer Architecture, Data Path and
ControlSlide 35
Four unconditional jump instructions (j, jr, jal, syscall)
The MicroMIPS Instruction Set
Instruction UsageLoad upper immediate lui rt,imm
Add add rd,rs,rtCopy
op15
0
fn
32Instruction SetSubtract sub rd,rs,rt
Set less than slt rd,rs,rt
Add immediate addi rt,rs,imm
Set less than immediate slti rd,rs,imm
Arithmetic008
10
3442
Set less than immediate slti rd,rs,imm
AND and rd,rs,rt
OR or rd,rs,rt
XOR xor rd,rs,rt
NOR d tL i
100000
36373839NOR nor rd,rs,rt
AND immediate andi rt,rs,imm
OR immediate ori rt,rs,imm
XOR immediate xori rt,rs,imm
Logic 0121314
39
Load word lw rt,imm(rs)
Store word sw rt,imm(rs)
Jump j L
Jump register jr rs
Memory access3543
20 8
Branch less than 0 bltz rs,L
Branch equal beq rs,rt,L
Branch not equal bne rs,rt,L
Jump and link jal L
Control transfer1453T bl 13 1
Feb. 2011Computer Architecture, Data Path and
ControlSlide 36
Jump and link jal L
System call syscall
30 12Table 13.1
13.2 The Instruction Execution Unit
Next addr
5 bits 5 bits 31 25 20 15 0
Opcode Source 1 or base
Source 2 or dest’n
op rs rt
R 6 bits 5 bits
rd
5 bits
sh
6 bits 10 5
fn
jta
imm Operand / Offset, 16 bits
Destination Unused Opcode ext I
Jblt j
beq,bnesyscall
Next addr jta
rs,rt,rd (rs) PC
jtaJump target address, 26 bits Jinst
Instruction, 32 bits
bltz,jr
12 A/L
j,jal22 instructions
ALU
Data cache
Instr cache
Reg file
inst
(rt)
Address Data
12 A/L, lui, lw,sw
op fn
imm
Harvard
Fig. 13.2 Abstract view of the instruction execution unit for MicroMIPS. F i f i t ti fi ld Fi 13 1
Control Harvard
architecture
Feb. 2011Computer Architecture, Data Path and
ControlSlide 37
For naming of instruction fields, see Fig. 13.1.
13.3 A Single‐Cycle Data Path
Next addr jta
ALUOvfl Next PC
Incr PC
Register
ALU Data Instr Reg f
inst
rs (rs) Data addr
rt Data out
Ovfl
0
(PC)
ALU out
PC
0
Register writeback
/
ALU cachecache file
imm
(rt) Data in 0
1 SE
rd
32 / 16
Func 31
012
01 2
op fn
ALUSrc DataRead RegInSrcRegDst
Register input
Fig 13 3 Ke elements of the single c cle MicroMIPS data path
ALUSrcALUFunc DataWrite
DataRead RegInSrcRegDstRegWrite Br&Jump
Instruction fetch Reg access / decode ALU operation Data access
Feb. 2011Computer Architecture, Data Path and
ControlSlide 38
Fig. 13.3 Key elements of the single-cycle MicroMIPS data path.
An ALU for MicroMIPSAmount
Constant amount 5
ConstVar
0
Shift function 2
00 Shift
00 No shift 01 Logical left 10 Logical right 11 Arith right
lui
Shifter 5 Variable amount
5 1 Function class
2 5 LSBs Shifted y
32
00 Shift01 Set less 10 Arithmetic 11 Logic We use only 5
control signals ( hift )
x y
x
Adder
c 0 s
0
1
2
y
32
32MSB
Shorthand symbol for ALU
0 or 1
imm (no shifts)
AddSub
y
Adder
c 32 k /
2
3 32 c 31
32
x
sFunc
Control 5
Logic unit
32-input NOR
ALU
y
s
Ovfl Zero
Fig 10 19 A multifunction ALU with 8 control signals (2 for function class
Logic function 2
NOR
Ovfl Zero
AND 00 OR 01
XOR 10 NOR 11
Feb. 2011Computer Architecture, Data Path and
ControlSlide 39
Fig. 10.19 A multifunction ALU with 8 control signals (2 for function class, 1 arithmetic, 3 shift, 2 logic) specifying the operation.
13.4 Branching and Jumping(PC)31:2 + 1 Default option(PC)31:2 + 1 + imm When instruction is branch and condition is met(PC)31:28 | jta When instruction is j or jal
Update options
(rt) Branch /32B T
(rs)31:2 When the instruction is jrSysCallAddr Start address of an operating system routine
for PC
Lowest 2 bits of
Adder
(rs)
SE
condition checker / 30
32BrTrue / 32
/ 30
30 MSBs
IncrPC / 30
Lowest 2 bits of PC always 00
jta imm
(PC) inc
1 0 1 2
/ 30 / / 30
/ 30
/ 30 / 26
/ 30 4 MSBs
MSBs
NextPC
31:2
16 4 MSBs
Fig 13 4 Ne t address logic for MicroMIPS (see top part of Fig 13 3)
SysCallAddr
PCSrc
2 3
30 / 30
/ 30
BrType
Feb. 2011Computer Architecture, Data Path and
ControlSlide 40
Fig. 13.4 Next-address logic for MicroMIPS (see top part of Fig. 13.3).
13.5 Deriving the Control SignalsTable 13.2 Control signals for the single-cycle MicroMIPS implementation.
Control signal 0 1 2 3gRegWrite Don’t write WriteRegDst1, RegDst0 rt rd $31RegInSrc RegInSrc Data o t ALU o t IncrPC
Reg file
RegInSrc1, RegInSrc0 Data out ALU out IncrPCALUSrc (rt ) immAddSub Add Subtract
ALU LogicFn1, LogicFn0 AND OR XOR NORFnClass1, FnClass0 lui Set less Arithmetic LogicDataRead Don’t read ReadData
Putting It All Together Fig. 10.19ConstVar Shift f ti
00 No shif t
Shifter
Amount
5
Constant amount
Variable amount
5
5
0
1
0
Function class
2
Shift function
5 LSBs Shifted y
32
2
00 Shif t 01 Set less 10 Arithmetic 11 Logic
01 Logical lef t10 Logical right 11 Arith right
imm
lui
Adder
(rs)
(rt)
SE
(PC)
Branch condition checker
in c
/ 30
/ 32 BrTrue / 32
/ 30
/ 30
/ 30 4
30 MSBs
IncrPC / 30
31:2 16
Fig. 13.4
AddSub
x y
y
x
Adder
c 32
c 0
k /
s 1
2
3
32
32 c 31
32 32
MSB
x
Shorthsymbfor AL
Fun
Cont
0 or 1 jta
imm
SysCallAddr
PCSrc
1 0 1 2 3
/ 30 / 30
/ 30
/ 30
/ 30 30
/ 26
30 4 MSBs
BrType
NextPC 16
4 MSBs
Next addr jta
( )
ALUOvfl Next PC
Incr PC
(PC)
Fig. 13.3 Logic unit
Logic function 2
32-input NOR
Ovfl Zero
A
y OZero
AND 00 OR 01
XOR 10 NOR 11
/
ALU
Data cache
Instr cache
Reg file
inst
rs (rs)
(rt)
Data addr
Data in 0
1SE
rt
rd
32 /
Data out
Func
Ovfl
31
0 1 2
( )
ALU out
PC
0 1 2
Control
addInstsubInst
jInst
../
op fn
imm 1
ALUSrc ALUF D t W it
DataRead
SE
RegInSrcRegDst R W it
/16
Register input
B &J
Control
sltInst
..
..
Feb. 2011 Computer Architecture, Data Path and Control Slide 48
ALUFunc DataWriteRegWriteBr&Jump
13.6 Performance of the Single‐Cycle Design
An example combinational-logic data path to compute z := (u + v)(w – x) / y
Add/Sub Multiply Divide TotalAdd/Sub latency
2 ns
Multiply latency
6 ns
Divide latency15 ns
Total latency23 ns
u
Note that the divider gets its correct inputs after 9 ns, but this won’t cause a problem if we allow enough total time
+
v
if we allow enough total time
/w
xz
Beginning with inputs u, v, w, x, and y stored in registers, the entire computation can be completed in 25 ns, allowing 1 ns each for register readout and write