EI209 Chapter 2.3 Haojin Zhu, SJTU, 2015
Design a program to check whether your computer is big endian or little endian.
Please submit the source code and the execution results.
Submission due: Oct. 8
The Language a Computer Understands
- Word a computer understands: instruction
- Vocabulary of all words a computer understands: instruction set (aka instruction set architecture or ISA)
- Different computers may have different vocabularies (i.e., different ISAs)
  - iPhone (ARM) is not the same as Macbook (x86)
- Or the same vocabulary (i.e., same ISA)
  - iPhone and iPad computers have the same instruction set (ARM)
The Language a Computer Understands
Why not all the same? Why not all different? What might be the pros and cons?
Single ISA (to rule them all):
- Leverage common compilers, operating systems, etc.
- BUT it is fairly easy to retarget these for different ISAs (e.g., Linux, gcc)
Multiple ISAs:
- Specialized instructions for specialized applications
- Different tradeoffs in resources used (e.g., functionality, memory demands, complexity, power consumption, etc.)
- Competition and innovation are good, especially in emerging environments (e.g., mobile devices)
MIPS: Instruction Set for EI209
- MIPS is a real-world ISA (see www.mips.com)
  - Standard instruction set for networking equipment
  - Was also used in the original Nintendo 64!
- Elegant example of a Reduced Instruction Set Computer (RISC) instruction set
- Invented by John Hennessy @ Stanford
  - Why not Berkeley/Sun RISC, invented by Dave Patterson? Ask him!
RISC Design Principles
- Basic RISC principle: "A simpler CPU (the hardware that interprets machine language) is a faster CPU"
- Focus of the RISC design is reduction of the number and complexity of instructions in the ISA
- A number of the more common strategies include:
  - Fixed instruction length, generally a single word: simplifies the process of fetching instructions from memory
  - Simplified addressing modes: simplifies the process of fetching operands from memory
  - Fewer and simpler instructions in the instruction set: simplifies the process of executing instructions
  - Only load and store instructions access memory: e.g., no add memory to register, add memory to memory, etc.
  - Let the compiler do it: use a good compiler to break complex high-level language statements into a number of simple assembly language statements
Mainstream ISAs
- ARM (Advanced RISC Machine) is the most popular RISC
  - In every smartphone-like device (e.g., iPhone, iPad, iPod, ...)
- Intel 80x86 is another popular ISA and is used in Macbooks and PCs (Core i3, Core i5, Core i7, ...)
  - x86 is a Complex Instruction Set Computer (CISC)
- ARM outsells 80x86 by roughly 20x (i.e., 5 billion vs. 0.3 billion)
MIPS Green Card
[Figure: MIPS reference card (the "green card"), shown on two slides]
Two Key Principles of Machine Design
1. Instructions are represented as numbers and, as such, are indistinguishable from data
2. Programs are stored in alterable memory (that can be read or written to) just like data
Stored-program concept
- Programs can be shipped as files of binary numbers: binary compatibility
- Computers can inherit ready-made software provided they are compatible with an existing ISA, which leads industry to align around a small number of ISAs
[Figure: a single memory holding an accounting program (machine code), a C compiler (machine code), payroll data, and the C source code for the accounting program]
MIPS-32 ISA
Instruction categories:
- Computational
- Load/Store
- Jump and Branch
- Floating Point
Design principles:
- Simplicity favors regularity
  - fixed size instructions
  - small number of instruction formats
  - opcode always the first 6 bits
- Smaller is faster
  - limited instruction set
  - limited number of registers in register file
  - limited number of addressing modes
- Make the common case fast
  - arithmetic operands from the register file (load-store machine)
  - allow instructions to contain immediate operands
- Good design demands good compromises
  - three instruction formats
MIPS Arithmetic Instructions
MIPS assembly language arithmetic statements:
    add $t0, $s1, $s2
    sub $t0, $s1, $s2
- Each arithmetic instruction performs one operation
- Each specifies exactly three operands that are all contained in the datapath's register file ($t0, $s1, $s2):
    destination = source1 op source2
Instruction Format (R format), shown for sub $t0, $s1, $s2:
    op = 0 | rs = 17 | rt = 18 | rd = 8 | shamt = 0 | funct = 0x22
MIPS Instruction Fields
MIPS fields are given names to make them easier to refer to:
    op | rs | rt | rd | shamt | funct
- op: 6 bits, opcode that specifies the operation
- rs: 5 bits, register file address of the first source operand
- rt: 5 bits, register file address of the second source operand
- rd: 5 bits, register file address of the result's destination
- shamt: 5 bits, shift amount (for shift instructions)
- funct: 6 bits, function code augmenting the opcode
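As a minimal sketch of how the six named fields pack into one 32-bit word, the following Python helper concatenates them from left (op) to right (funct). The function name `encode_r_format` is a hypothetical helper for illustration, not part of any MIPS toolchain; the field values are those given for `sub $t0, $s1, $s2`.

```python
def encode_r_format(op, rs, rt, rd, shamt, funct):
    """Pack the six R-format fields into one 32-bit instruction word."""
    # (value, width in bits), listed from the most significant field down
    fields = [(op, 6), (rs, 5), (rt, 5), (rd, 5), (shamt, 5), (funct, 6)]
    word = 0
    for value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError(f"field value {value} does not fit in {width} bits")
        word = (word << width) | value  # append this field to the right
    return word


# sub $t0, $s1, $s2: op=0, rs=17 ($s1), rt=18 ($s2), rd=8 ($t0), funct=0x22
sub_word = encode_r_format(op=0, rs=17, rt=18, rd=8, shamt=0, funct=0x22)
print(f"0x{sub_word:08X}")  # prints 0x02324022
```

Because the instruction is just a number, the same 32 bits could equally be read as data, which is the first of the two key principles of machine design.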
MIPS Memory Access Instructions
MIPS has two basic data transfer instructions for accessing memory:
    lw $t0, 4($s3)   # load word from memory
    sw $t0, 8($s3)   # store word to memory
- The data is loaded into (lw) or stored from (sw) a register in the register file, named by a 5-bit address
- The memory address, a 32-bit address, is formed by adding the contents of the base address register to the offset value
  - The offset is a 16-bit field, meaning access is limited to memory locations within a region of ±2^13 or 8,192 words (±2^15 or 32,768 bytes) of the address in the base register
Machine Language - Load Instruction
Load/Store Instruction Format (I format):
    lw $t0, 24($s3)
    op = 35 | rs = 19 | rt = 8 | offset = 24 (decimal)
[Figure: memory drawn as data words at word addresses (hex) 0x00000000, 0x00000004, 0x00000008, 0x0000000c]
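The I format and the base-plus-offset address calculation can be sketched in Python as follows. The helper names (`encode_i_format`, `sign_extend_16`, `effective_address`) and the base address 0x10010000 are assumptions made for illustration; the field values are those shown for `lw $t0, 24($s3)`.

```python
def encode_i_format(op, rs, rt, imm):
    """Pack op (6 bits), rs (5), rt (5) and a 16-bit immediate into a word."""
    return (op << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)


def sign_extend_16(imm):
    """Interpret a 16-bit field as a two's complement signed value."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm


def effective_address(base, offset):
    """Address used by lw/sw: base register contents plus the signed offset."""
    return (base + sign_extend_16(offset)) & 0xFFFFFFFF


# lw $t0, 24($s3): op=35, rs=19 ($s3, the base), rt=8 ($t0), offset=24
lw_word = encode_i_format(op=35, rs=19, rt=8, imm=24)
print(f"0x{lw_word:08X}")  # prints 0x8E680018

# With a hypothetical base of 0x10010000 in $s3:
print(hex(effective_address(0x10010000, 24)))      # prints 0x10010018
print(hex(effective_address(0x10010000, 0xFFFC)))  # offset -4: prints 0x1000fffc
```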
Byte Addresses
- Since 8-bit bytes are so useful, most architectures address individual bytes in memory
  - Alignment restriction: the memory address of a word must be on natural word boundaries (a multiple of 4 in MIPS-32)
- Big Endian: leftmost byte is word address
  - IBM 360/370, Motorola 68k, MIPS, Sparc, HP PA
- Little Endian: rightmost byte is word address
  - Intel 80x86, DEC Vax, DEC Alpha (Windows NT)
Byte numbering within a word (msb on the left, lsb on the right):
    little endian: 3 2 1 0   (byte 0 is the rightmost byte)
    big endian:    0 1 2 3   (byte 0 is the leftmost byte)
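One way to observe the two byte orders is to lay out a known 32-bit word in memory and look at which byte sits at the lowest address. The sketch below uses Python's standard `struct` module; the function name `native_byte_order` and the test word 0x01020304 are choices made here for illustration.

```python
import struct
import sys


def native_byte_order():
    """Report the byte order of the machine running this script."""
    # "=I" packs an unsigned 32-bit integer in the machine's native byte order.
    first_byte = struct.pack("=I", 0x01020304)[0]
    # Little endian stores the least significant byte (0x04) at the word address.
    return "little" if first_byte == 0x04 else "big"


print("big endian layout:   ", struct.pack(">I", 0x01020304).hex())  # prints 01020304
print("little endian layout:", struct.pack("<I", 0x01020304).hex())  # prints 04030201
print("this machine is", native_byte_order(), "endian")
```

The result should agree with `sys.byteorder`, which reports the same property directly.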
Aside: Loading and Storing Bytes
MIPS provides special instructions to move bytes:
    lb $t0, 1($s3)   # load byte from memory
    sb $t0, 6($s3)   # store byte to memory
I format, shown for sb $t0, 6($s3):
    op = 0x28 | rs = 19 | rt = 8 | 16-bit offset
What 8 bits get loaded and stored?
- load byte places the byte from memory in the rightmost 8 bits of the destination register
  - what happens to the other bits in the register?
- store byte takes the byte from the rightmost 8 bits of a register and writes it to a byte in memory
  - what happens to the other bits in the memory word?
Example of Loading and Storing Bytes
Given the following code sequence and memory state, what is the state of the memory after executing the code?
    add $s3, $zero, $zero
    lb  $t0, 1($s3)
    sb  $t0, 6($s3)
Memory:
    Word Address (Decimal)   Data
    24                       0x00000000
    20                       0x00000000
    16                       0x00000000
    12                       0x10000010
     8                       0x01000402
     4                       0xFFFFFFFF
     0                       0x009012A0
- What value is left in $t0?
- What word is changed in Memory and to what?
- What if the machine was little Endian?
Example of Loading and Storing Bytes: Answers
For the code sequence and memory state above:
- What value is left in $t0?
    $t0 = 0xFFFFFF90 (lb sign-extends the loaded byte 0x90; lbu would leave 0x00000090)
- What word is changed in Memory and to what?
    mem(4) = 0xFFFF90FF
- What if the machine was little Endian?
    $t0 = 0x00000012
    mem(4) = 0xFF12FFFF
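The example can be checked with a small Python model of byte-addressed memory. This is a sketch under stated assumptions: `run_example` is a hypothetical helper, memory is a flat `bytearray` built from the seven data words, and `lb` is modeled with sign extension as MIPS defines it, so a loaded byte of 0x90 becomes 0xFFFFFF90 in the register (zero extension, as `lbu` performs, would give 0x00000090).

```python
def run_example(byteorder):
    """Run add/lb/sb on the example memory; byteorder is 'big' or 'little'."""
    # Data words at word addresses 0, 4, 8, ..., 24
    words = [0x009012A0, 0xFFFFFFFF, 0x01000402, 0x10000010, 0, 0, 0]
    mem = bytearray(b"".join(w.to_bytes(4, byteorder) for w in words))

    s3 = 0                                     # add $s3, $zero, $zero
    byte = mem[s3 + 1]                         # lb $t0, 1($s3)
    t0 = (byte - 0x100 if byte & 0x80 else byte) & 0xFFFFFFFF  # sign-extend
    mem[s3 + 6] = t0 & 0xFF                    # sb $t0, 6($s3)

    word_at_4 = int.from_bytes(mem[4:8], byteorder)
    return t0, word_at_4


for order in ("big", "little"):
    t0, word = run_example(order)
    print(f"{order:6s} $t0 = 0x{t0:08X}  mem(4) = 0x{word:08X}")
# prints:
# big    $t0 = 0xFFFFFF90  mem(4) = 0xFFFF90FF
# little $t0 = 0x00000012  mem(4) = 0xFF12FFFF
```

Only the word at address 4 changes in either case, because `sb` writes a single byte and leaves the other three bytes of the word untouched.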
Speed of Registers vs. Memory
- Given that
  - Registers: 32 words (128 Bytes)
  - Memory: Billions of bytes (2 GB to 8 GB on laptop)
- and the RISC principle is... Smaller is faster
- How much faster are registers than memory? About 100-500 times faster, in terms of latency of one access
MIPS Immediate Instructions
Small constants are used often in typical code. Possible approaches?
- put "typical constants" in memory and load them
- create hard-wired registers (like $zero) for constants like 1
- have special instructions that contain constants!
    addi $sp, $sp, 4    # $sp = $sp + 4
    slti $t0, $s2, 15   # $t0 = 1 if $s2 < 15
Machine format (I format), shown for slti $t0, $s2, 15:
    op = 0x0A | rs = 18 | rt = 8 | immediate = 0x0F
- The constant is kept inside the instruction itself!
  - Immediate format limits values to the range +2^15 - 1 to -2^15
  - what about upper 16 bits?
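Aside: How About Larger Constants?
We'd also like to be able to load a 32-bit constant into a register; for this we must use two instructions:
- a new "load upper immediate" instruction
    lui $t0, 1010101010101010
  I format: op = 15 (0x0F) | rs = 0 | rt = 8 | immediate = 1010101010101010 (binary)
- then we must get the lower-order bits right, using
    ori $t0, $t0, 1010101010101010
Register contents:
    $t0 after lui:   1010101010101010 0000000000000000
    ori immediate:   0000000000000000 1010101010101010
    $t0 after ori:   1010101010101010 1010101010101010
Why can't addi be used as the second instruction for this 32-bit constant?
Logical instruction formats:
    ori $t0, $t1, 0xFF00   # $t0 = $t1 | 0xFF00
- R format (funct 0x24 is and, so these fields correspond to and $t0, $t1, $t2):
    op = 0 | rs = 9 | rt = 10 | rd = 8 | shamt = 0 | funct = 0x24
- I format (ori $t0, $t1, 0xFF00):
    op = 0x0D | rs = 9 | rt = 8 | immediate = 0xFF00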
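A short Python model helps with the question of why addi cannot stand in for ori here. It is a sketch: the function names mirror the instruction mnemonics but are ordinary Python helpers written for this illustration. The key assumption, which matches the MIPS definition, is that ori zero-extends its 16-bit immediate while addi sign-extends it.

```python
MASK32 = 0xFFFFFFFF


def sign_extend_16(imm):
    """Interpret a 16-bit field as a two's complement signed value."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm


def lui(imm):
    """Load upper immediate: imm goes in the upper 16 bits, lower 16 are zero."""
    return ((imm & 0xFFFF) << 16) & MASK32


def ori(reg, imm):
    """OR immediate: the 16-bit immediate is zero-extended."""
    return (reg | (imm & 0xFFFF)) & MASK32


def addi(reg, imm):
    """Add immediate: the 16-bit immediate is sign-extended before the add."""
    return (reg + sign_extend_16(imm)) & MASK32


half = 0b1010101010101010              # 0xAAAA, the pattern used on the slide
upper = lui(half)
print(f"lui        -> 0x{upper:08X}")              # prints 0xAAAA0000
print(f"lui + ori  -> 0x{ori(upper, half):08X}")   # prints 0xAAAAAAAA
print(f"lui + addi -> 0x{addi(upper, half):08X}")  # prints 0xAAA9AAAA
```

Because bit 15 of the lower half is 1, addi treats it as a negative number and the borrow corrupts the upper half; ori leaves the upper 16 bits exactly as lui set them.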
MIPS Control Flow Instructions
MIPS conditional branch instructions:
    bne $s0, $s1, Lbl   # go to Lbl if $s0 != $s1
    beq $s0, $s1, Lbl   # go to Lbl if $s0 = $s1
Ex: if (i==j) h = i + j;
          bne $s0, $s1, Lbl1
          add $s3, $s0, $s1
    Lbl1: ...
Instruction Format (I format), shown for bne $s0, $s1, Lbl:
    op = 0x05 | rs = 16 | rt = 17 | 16-bit offset
How is the branch destination address specified?
Specifying Branch Destinations
- Use a register (like in lw and sw) added to the 16-bit offset
  - which register? Instruction Address Register (the PC)
    - its use is automatically implied by the instruction
    - PC gets updated (PC+4) during the fetch cycle so that it holds the address of the next instruction
  - limits the branch distance to -2^15 to +2^15 - 1 (word) instructions from the (instruction after the) branch instruction, but most branches are local anyway
[Figure: branch destination address datapath. The low-order 16 bits of the branch instruction (the offset) are sign-extended to 32 bits with 00 appended as the two low-order bits, then added to PC + 4 to form the 32-bit branch destination address]
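The address calculation described above can be written out as a small Python function. This is a sketch: `branch_target` is a hypothetical helper, and the PC value 0x00400000 is just a sample address.

```python
def branch_target(branch_pc, offset16):
    """Destination of a taken beq/bne located at address branch_pc."""
    offset = offset16 & 0xFFFF
    if offset & 0x8000:          # sign-extend the 16-bit field
        offset -= 0x10000
    # The offset counts instructions (words), so shift left by 2 to get bytes,
    # and it is relative to PC + 4, the instruction after the branch.
    return (branch_pc + 4 + (offset << 2)) & 0xFFFFFFFF


pc = 0x00400000
print(hex(branch_target(pc, 3)))       # prints 0x400010 (3 instructions past PC+4)
print(hex(branch_target(pc, 0xFFFF)))  # offset -1: prints 0x400000 (the branch itself)
print(hex(branch_target(pc, 0x7FFF)))  # farthest forward: prints 0x420000
```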
In Support of Branch Instructions
We have beq, bne, but what about other kinds of branches (e.g., branch-if-less-than)? For this, we need yet another instruction, slt.
Set on less than instruction:
    slt $t0, $s0, $s1     # if $s0 < $s1 then $t0 = 1
                          # else $t0 = 0
Instruction format (R format):
    op = 0 | rs = 16 | rt = 17 | rd = 8 | shamt = 0 | funct = 0x2A
Alternate versions of slt:
    slti  $t0, $s0, 25    # if $s0 < 25 then $t0 = 1 ...
    sltu  $t0, $s0, $s1   # if $s0 < $s1 (unsigned) then $t0 = 1 ...
    sltiu $t0, $s0, 25    # if $s0 < 25 (unsigned) then $t0 = 1 ...
Aside: More Branch Instructions
Can use slt, beq, bne, and the fixed value of 0 in register $zero to create other conditions:
- less than: blt $s1, $s2, Label
      slt $at, $s1, $s2       # $at set to 1 if $s1 < $s2
      bne $at, $zero, Label
- less than or equal to: ble $s1, $s2, Label
- greater than: bgt $s1, $s2, Label
- greater than or equal to: bge $s1, $s2, Label
Such branches are included in the instruction set as pseudo instructions, recognized (and expanded) by the assembler.
- It's why the assembler needs a reserved register ($at)
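The expansion shown for blt extends to the other three pseudo branches by swapping the slt operands and choosing between bne and beq. The Python sketch below shows one conventional expansion; real assemblers may emit a different but equivalent sequence. `EXPANSIONS`, `expand`, and `taken` are hypothetical names introduced for this illustration.

```python
# pseudo-op: (swap the slt operands?, real branch that tests $at against $zero)
EXPANSIONS = {
    "blt": (False, "bne"),  # rs < rt   <=>  slt(rs, rt) != 0
    "bgt": (True, "bne"),   # rs > rt   <=>  slt(rt, rs) != 0
    "ble": (True, "beq"),   # rs <= rt  <=>  slt(rt, rs) == 0
    "bge": (False, "beq"),  # rs >= rt  <=>  slt(rs, rt) == 0
}


def expand(op, rs, rt, label):
    """Return the two real instructions for one pseudo branch."""
    swap, branch = EXPANSIONS[op]
    a, b = (rt, rs) if swap else (rs, rt)
    return [f"slt $at, {a}, {b}", f"{branch} $at, $zero, {label}"]


def taken(op, x, y):
    """Evaluate the expansion on integer values to see if the branch is taken."""
    swap, branch = EXPANSIONS[op]
    a, b = (y, x) if swap else (x, y)
    at = 1 if a < b else 0                     # slt
    return at != 0 if branch == "bne" else at == 0


for op in EXPANSIONS:
    print(op, "->", "; ".join(expand(op, "$s1", "$s2", "Label")))
```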
MIPS Instructions
Consider a comparison instruction:
    slt $t0, $t1, $zero
and $t1 contains the 32-bit number 1111 01…01
What gets stored in $t0?
MIPS Instructions: Answer
For slt $t0, $t1, $zero with $t1 containing the 32-bit number 1111 01…01, what gets stored in $t0?
The result depends on whether $t1 is a signed or unsigned number; the compiler/programmer must track this and accordingly use either slt or sltu:
    slt  $t0, $t1, $zero   # stores 1 in $t0
    sltu $t0, $t1, $zero   # stores 0 in $t0
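The difference between the two comparisons can be modeled in a few lines of Python. This is a sketch: the helper names are made up for illustration, and because the slide elides the middle bits of $t1, the value 0xF5555555 below is a stand-in chosen only because its leading bits are 1111 01.

```python
MASK32 = 0xFFFFFFFF


def as_signed(word):
    """Read a 32-bit pattern as a two's complement signed integer."""
    word &= MASK32
    return word - (1 << 32) if word & 0x80000000 else word


def slt(rs, rt):
    """Set on less than, signed comparison."""
    return 1 if as_signed(rs) < as_signed(rt) else 0


def sltu(rs, rt):
    """Set on less than, unsigned comparison of the same bit patterns."""
    return 1 if (rs & MASK32) < (rt & MASK32) else 0


t1 = 0xF5555555  # hypothetical stand-in for the slide's 1111 01...01
zero = 0
print("slt :", slt(t1, zero))   # prints slt : 1  (as signed, $t1 is negative)
print("sltu:", sltu(t1, zero))  # prints sltu: 0  (as unsigned, nothing is below 0)
```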
Bounds Check Shortcut
Treating signed numbers as if they were unsigned gives a low-cost way of checking if 0 ≤ x < y (index out of bounds for arrays):
    sltu $t0, $s1, $t2      # $t0 = 0 if $s1 >= $t2 (max)
                            # or $s1 < 0 (min)
    beq  $t0, $zero, IOOB   # go to IOOB if $t0 = 0
The key is that negative integers in two's complement look like large numbers in unsigned notation. Thus, an unsigned comparison of x < y also checks if x is negative as well as if x is less than y.
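The shortcut can be tried out in Python by masking both values to 32 bits before comparing, which is what an unsigned comparison of the register contents amounts to. The function name `index_in_bounds` is a hypothetical helper, and the array length of 10 is just a sample value.

```python
MASK32 = 0xFFFFFFFF


def index_in_bounds(x, y):
    """Single unsigned comparison standing in for the test 0 <= x < y.

    x is a signed 32-bit index and y a non-negative array length. Masking to
    32 bits reinterprets a negative x as a very large unsigned number.
    """
    return (x & MASK32) < (y & MASK32)


length = 10
for index in (-1, 0, 9, 10):
    print(index, index_in_bounds(index, length))
# prints:
# -1 False
# 0 True
# 9 True
# 10 False
```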
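Other Control Flow Instructions
MIPS also has an unconditional branch instruction or jump instruction:
    j label   # go to label
Instruction Format (J format):
    op = 0x02 | 26-bit address
[Figure: jump destination address. The low-order 26 bits of the jump instruction, with 00 appended as the two low-order bits, are concatenated with the upper 4 bits of the PC to form the 32-bit address]
Why shift left by two bits?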
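The way the 32-bit jump address is assembled can be sketched in Python. `jump_target` is a hypothetical helper; it takes the upper 4 bits from PC + 4 (the already-incremented PC), which is how MIPS defines the j instruction. The shift by two bits is there because instructions are word aligned, so the two low-order address bits are always 00 and need not be stored in the instruction.

```python
def jump_target(jump_pc, address26):
    """Destination of a j instruction located at address jump_pc."""
    upper4 = (jump_pc + 4) & 0xF0000000      # top 4 bits of the incremented PC
    low28 = (address26 & 0x03FFFFFF) << 2    # 26-bit field with 00 appended
    return upper4 | low28


print(hex(jump_target(0x00400000, 0x0100008)))  # prints 0x400020
print(hex(jump_target(0x10000000, 1)))          # prints 0x10000004
```

With 26 bits plus the two implied zeros, a jump can reach any instruction within the same 256 MB (2^28 byte) region as the instruction that follows it.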
Aside: Branching Far Away
What if the branch destination is further away than can be captured in 16 bits?
The assembler comes to the rescue: it inserts an unconditional jump to the branch target and inverts the condition
        beq $s0, $s1, L1
becomes
        bne $s0, $s1, L2
        j   L1
    L2:
Compiling Another While Loop
Compile the assembly code for the C while loop where i is in $s3, k is in $s5, and the base address of the array save is in $s6
    while (save[i] == k) i += 1;
Instructions for Accessing Procedures
MIPS procedure call instruction, jal (jump and link):
- Saves PC+4 in register $ra to have a link to the next instruction for the procedure return
- Machine format (J format): op = 0x03 | 26-bit address
Then can do procedure return with a
    jr $ra   # return
- Instruction format (R format): op = 0 | rs = 31 | funct = 0x08
Six Steps in Execution of a Procedure
1. Main routine (caller) places parameters in a place where the procedure (callee) can access them
   - $a0 - $a3: four argument registers
2. Caller transfers control to the callee
3. Callee acquires the storage resources needed
4. Callee performs the desired task
5. Callee places the result value in a place where the caller can access it
   - $v0 - $v1: two value registers for result values
6. Callee returns control to the caller
   - $ra: one return address register to return to the point of origin
MIPS Function Call Conventions
Registers are faster than memory, so use them
- $a0-$a3: four argument registers to pass parameters
- $v0-$v1: two value registers to return values
- $ra: one return address register to return to the point of origin
  (7 + $zero + $at of 32, 23 left!)
- $t0-$t9: 10 x temporaries (intermediates)
- $s0-$s7: 8 x "saved" temporaries (program variables)
  (18 registers, 32 - (18 + 9) = 5 left)
Notes on Functions
- Calling program (caller) puts parameters into registers $a0-$a3 and uses jal X to invoke X (callee)
- Must have register in computer with address of currently executing instruction
  - Instead of Instruction Address Register (better name), historically called Program Counter (PC)
  - It's a program's counter; it doesn't count programs!
- jr $ra puts address inside $ra into PC
- What value does jal X place into $ra? ????
Call Sequence
1. place excess arguments
2. save caller-saved registers ($a0-$a3, $t0-$t9)
3. jal
4. allocate stack frame
5. save callee-saved registers ($s0-$s7, $fp, $ra)
6. set frame pointer
Return
1. place function result in $v0
2. restore callee-saved registers
3. restore $fp
4. pop frame
5. jr $31
[Figure: stack frame between $fp and $sp, holding arguments (arg 1, arg 2, ...), callee-saved registers ($fp, $ra, $s0-$s7), caller-saved registers ($a0-$a3, $t0-$t9), and local variables]
Nested Procedures
What happens to return addresses with nested procedures?
On the call to rt_1, the return address (the next instruction in the caller routine) gets stored in $ra. What happens to the value in $ra (when i != 0) when rt_1 makes a call to rt_2?
Saving the Return Address, Part 1
Nested procedures (i passed in $a0, return value in $v0)
A Look at the Stack for $a0 = 2, Part 1
[Figure: stack below old TOS holding caller rt addr and $a0 = 2, with $sp moved to the new top; registers: $ra = bk_f (previously caller rt addr), $a0 = 1 (previously 2), $v0]
Stack state after execution of the first encounter with jal (second call to fact routine with $a0 now holding 1):
- saved return address to caller routine (i.e., location in the main routine where first call to fact is made) on the stack
- saved original value of $a0 on the stack
A Look at the Stack for $a0 = 2, Part 2
[Figure: stack below old TOS holding caller rt addr, $a0 = 2, bk_f, $a0 = 1, with $sp moved to the new top; registers: $ra = bk_f, $a0 = 0 (previously 1), $v0]
Stack state after execution of the second encounter with jal (third call to fact routine with $a0 now holding 0):
- saved return address of instruction in caller routine (instruction after jal) on the stack
- saved previous value of $a0 on the stack
A Look at the Stack for $a0 = 2, Part 3
[Figure: stack below old TOS holding caller rt addr, $a0 = 2, bk_f, $a0 = 1, bk_f, $a0 = 0; registers: $ra = bk_f, $a0 = 0, $v0 = 1]
Stack state after execution of the first encounter with the first jr ($v0 initialized to 1):
- stack pointer updated to point to third call to fact
A Look at the Stack for $a0 = 2, Part 4
[Figure: stack below old TOS holding caller rt addr, $a0 = 2, bk_f, $a0 = 1, bk_f, $a0 = 0, with $sp moved back up; registers: $ra = bk_f, $a0 = 1 (previously 0), $v0 = 1 * 1]
Stack state after execution of the first encounter with the second jr (return from fact routine after updating $v0 to 1 * 1):
- return address to caller routine (bk_f in fact routine) restored to $ra from the stack
- previous value of $a0 restored from the stack
- stack pointer updated to point to second call to fact
A Look at the Stack for $a0 = 2, Part 5
[Figure: stack below old TOS holding caller rt addr, $a0 = 2, bk_f, $a0 = 1, bk_f, $a0 = 0, with $sp moved back up; registers: $ra = caller rt addr (previously bk_f), $a0 = 2 (previously 1), $v0 = 2 * 1 * 1]
Stack state after execution of the second encounter with the second jr (return from fact routine after updating $v0 to 2 * 1 * 1):
- return address to caller routine (main routine) restored to $ra from the stack
- original value of $a0 restored from the stack
- stack pointer updated to point to first call to fact
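The sequence of stack states in Parts 1 to 5 can be replayed with a short Python model. This is a sketch under stated assumptions, not the fact routine from the slides: it assumes every call pushes its ($ra, $a0) pair before testing for the base case (which is what the stack pictures suggest), and it reuses the slide labels "caller rt addr" and "bk_f" as plain strings. The function name `fact_with_explicit_stack` is hypothetical.

```python
def fact_with_explicit_stack(n):
    """Model recursive fact with a visible stack of saved ($ra, $a0) pairs."""
    stack = []   # the top of the stack is the end of the list
    trace = []   # snapshot of the stack after each push
    ra, a0 = "caller rt addr", n

    # Calling phase: each call to fact saves $ra and $a0, then recurses.
    while True:
        stack.append((ra, a0))
        trace.append(list(stack))
        if a0 < 1:           # base case: $v0 initialized to 1
            v0 = 1
            stack.pop()      # pop the frame, then jr $ra (the first jr)
            break
        a0 -= 1              # argument for the nested call
        ra = "bk_f"          # jal fact leaves the address of bk_f in $ra

    # Returning phase: at bk_f, restore $ra and $a0, multiply, then jr $ra.
    while stack:
        ra, a0 = stack.pop()
        v0 = a0 * v0

    return v0, trace


result, trace = fact_with_explicit_stack(2)
print(result)          # prints 2
for snapshot in trace:
    print(snapshot)
```

For $a0 = 2 the deepest snapshot holds three saved pairs, matching the three calls to fact walked through above, and the result accumulates as 1, then 1 * 1, then 2 * 1 * 1.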
Optimized Function Convention
To reduce expensive loads and stores from spilling and restoring registers, MIPS divides registers into two categories:
1. Preserved across function call
   - Caller can rely on values being unchanged
   - $ra, $sp, $gp, $fp, "saved registers" $s0-$s7
2. Not preserved across function call
   - Caller cannot rely on values being unchanged
   - Return value registers $v0, $v1, argument registers $a0-$a3, "temporary registers" $t0-$t9
Where is the Stack in Memory?
- MIPS convention
- Stack starts in high memory and grows down
  - Hexadecimal (base 16): 7fff fffc hex
- MIPS programs (text segment) in low end
  - 0040 0000 hex
- Static data segment (constants and other static variables) above text
  - MIPS convention: global pointer ($gp) points to static
  - (30 of 32, 2 left! Will see when we talk about OS)
- Heap above static for data structures that grow and shrink; grows up to high addresses
MIPS Memory Allocation
- jal saves PC+4 in $ra
- The callee can use temporary registers ($ti) without saving and restoring them
- The caller can rely on saved registers ($si) without fear of the callee changing them
- MIPS uses jal to invoke a function and jr to return from a function