1
Machine-Level Representation of Programs I
Jan 01, 2016
2
Outline
• Compiler drivers
• History of the Intel IA-32 architecture
• Assembly code and object code
• Memory and Registers
• Addressing Mode
• Data Formats
• Suggested reading
– Chap 1.2, 1.4.1, 1.7.3, 3.1, 3.2, 3.3, 3.4.1
3
The Hello Program
• It begins life as a high-level C program
– Can be read and understood by human beings
• The individual C statements must be translated by compiler drivers
– So that the hello program can run on a computer system
4
The Hello Program
• C programs are translated into
– A sequence of low-level machine-language instructions
• These instructions are then packaged in a form
– called an object program
• Object programs are stored as binary disk files
– Also referred to as executable object files
5
The Context of a Compiler (gcc)
hello.c: Source program (text)
    Preprocessor (cpp)
hello.i: Modified source program (text)
    Compiler (cc1)
hello.s: Assembly program (text)
    Assembler (as)
hello.o: Relocatable object program (binary)
    Linker (ld)
hello: Executable object program (binary)
Figure 1.3 P5
6
Characteristics of the high level programming languages
• Abstraction
– Productive
– Reliable
• Type checking
• As efficient as hand written code
• Can be compiled and executed on a number of different machines, whereas assembly code is highly machine specific
Productive: prolific
Reliable: dependable
7
Characteristics of the assembly programming languages
• Managing memory
• Low level instructions to carry out the computation
• Highly machine specific
8
Why should we understand the assembly code
• Understand the optimization capabilities of the compiler
• Analyze the underlying inefficiencies in the code
• Sometimes the run-time behavior of a program must be examined directly
9
From writing assembly code to understanding assembly code
• Different set of skills
– Transformations
– Relation between source code and assembly code
• Reverse engineering
– Trying to understand the process by which a system was created
• By studying the system and
• By working backward
Backward: tracing back from the result
10
A Historical Perspective
• Long evolutionary development
– Started from rather primitive 16-bit processors
– Added more features
• Take advantage of technology improvements
• Satisfy the demands for higher performance and for supporting more advanced operating systems
– Laden with obsolete features kept for backward compatibility
* laden with: loaded with
* compatibility: ability to work with earlier designs
* obsolete: out of date
11
X86 family
• 8086(1978, 29K)
– The heart of the IBM PC & DOS
– 1M bytes addressable, 640K for users
• 80286(1982, 134K)
– More (now obsolete) addressing modes
– Basis of the IBM PC-AT & Windows
12
X86 family
• i386(1985, 275K)
– 32-bit architecture, flat addressing model
– Capable of running a Unix operating system
• i486(1989, 1.9M)
– Integrated the floating-point unit onto the processor chip
13
X86 family
• Pentium(1993, 3.1M)
• PentiumPro(1995, 6.5M)
– P6 microarchitecture
– Conditional move instructions
• Pentium/MMX(1997, 4.5M)
– New class of instructions for manipulating vectors of integers
14
X86 family
• Pentium II(1997, 7M)
– Implementing MMX instructions within P6
• Pentium III(1999, 8.2M)
– New class of instructions for manipulating vectors of floating-point numbers (SSE, Streaming SIMD Extensions)
16
X86 family
• Advanced Micro Devices (AMD)
– Now a close competitor to Intel
– Developing its own extension to 64 bits
17
X86 family
• Transmeta
– In January of 2000, introduced the Crusoe™ processor
– Radically different approach to implementation
• Translates x86 code into “Very Long Instruction Word” (VLIW) code
• High degree of parallelism
– Shooting for low-power market such as laptop computers
18
Hardware Organization Figure 1.4 P7
• CPU: Central Processing Unit
• ALU: Arithmetic/Logic Unit
• PC: Program Counter
• USB: Universal Serial Bus
19
Virtual spaces
• A linear array of bytes
– each with its own unique address (array index) starting at zero

addresses    contents
0xffffffff   …
0xfffffffe   …
…            …
0x2          …
0x1          …
0x0          …
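The "array of bytes" picture can be imitated in C by looking at any object one byte at a time. This is only a sketch: the helper name `byte_at` is made up for illustration, and which byte of a multi-byte value appears at index 0 depends on the machine's byte order.

```c
#include <stddef.h>

/* Hypothetical helper: view the object at `base` as an array of bytes
   and return the byte stored `index` bytes past its start. */
unsigned char byte_at(const void *base, size_t index)
{
    const unsigned char *bytes = (const unsigned char *)base;
    return bytes[index];
}
```

For example, given `unsigned char buf[4] = {0x10, 0x20, 0x30, 0x40};`, the call `byte_at(buf, 3)` returns 0x40.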
21
Data layout
• Object model in assembly
– A large, byte-addressable array
– No distinctions even between signed or unsigned integers
– Code, user data, OS data
– Run-time stack for managing procedure call and return
– Blocks of memory allocated by user
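To illustrate the point that memory itself does not distinguish signed from unsigned integers, the sketch below reinterprets one 4-byte pattern under a different C type. The function name `bits_as_signed` is an assumption made for this example, not something from the textbook.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: reinterpret the same 4 bytes as a signed integer.
   memcpy copies the bit pattern unchanged; only the C type differs. */
int32_t bits_as_signed(uint32_t bits)
{
    int32_t value;
    memcpy(&value, &bits, sizeof value);
    return value;
}
```

The pattern 0xFFFFFFFF reads as 4294967295 when treated as unsigned and as -1 when treated as signed; the stored bytes are identical.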
23
Operations in C constructs
• Arithmetic expression evaluation
• Loops
• Procedure calls and returns
• Translated into sequences of instructions
24
Operations in Assembly Instructions
• Each instruction performs only a very elementary operation
• Normally executed one by one, in sequence
• Operate on data stored in registers
• Transfer data between memory and a register
• Conditionally branch to a new instruction address
25
Assembly Programmer’s View Figure 3.2 P136
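Registers: %eax (%ah, %al), %edx (%dh, %dl), %ecx (%ch, %cl), %ebx (%bh, %bl), %esi, %edi, %esp, %ebp, %eip, %eflags
Memory regions: Stack, DLLs, Heap, Data, Text
Memory address markers (high-order byte, in hex): FF, BF, 7F, 3F, C0, 80, 40, 08, 00
Labels: Addresses, Data, Instructions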
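The figure lists %al and %ah next to %eax: on IA-32, %al names bits 0 to 7 of %eax and %ah names bits 8 to 15. A small C model of that relationship, with function names invented for illustration:

```c
#include <stdint.h>

/* Hypothetical model: extract the parts of a 32-bit register value
   that %al and %ah would name if the value were held in %eax. */
uint8_t low_byte(uint32_t eax)   /* plays the role of %al: bits 0-7 */
{
    return (uint8_t)(eax & 0xFF);
}

uint8_t high_byte(uint32_t eax)  /* plays the role of %ah: bits 8-15 */
{
    return (uint8_t)((eax >> 8) & 0xFF);
}
```

With 0x12345678 in the modeled %eax, `low_byte` yields 0x78 and `high_byte` yields 0x56.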
26
Programmer-Visible States P129
• Program Counter(%eip)
– Address of the next instruction
• Register File
– Heavily used program data
– Integer and floating-point
27
Programmer-Visible States
• Condition code register
– Holds status information about the most recently executed instruction
– Used to implement conditional changes in the control flow
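As a rough illustration of the kind of status information involved, the sketch below models four IA-32 condition codes (zero, sign, carry, overflow) after a 32-bit addition. The slide does not name these flags; the struct and function names are assumptions made for this example, and real hardware sets the flags itself as a side effect of the instruction.

```c
#include <stdint.h>

/* Illustrative model of four condition codes after a 32-bit add. */
struct flags {
    int zf;  /* zero flag: result is 0 */
    int sf;  /* sign flag: top bit of the result is 1 */
    int cf;  /* carry flag: unsigned overflow occurred */
    int of;  /* overflow flag: signed overflow occurred */
};

struct flags flags_after_add(uint32_t a, uint32_t b)
{
    uint32_t r = a + b;                  /* wraps modulo 2^32 */
    struct flags f;
    f.zf = (r == 0);
    f.sf = (int)((r >> 31) & 1);
    f.cf = (r < a);                      /* wrapped past 2^32 */
    /* signed overflow: both operands differ in sign from the result */
    f.of = (int)(((a ^ r) & (b ^ r)) >> 31);
    return f;
}
```

Adding 1 to 0xFFFFFFFF sets the zero and carry flags in this model, while adding 1 to 0x7FFFFFFF sets the sign and overflow flags. A conditional branch then tests such flags to choose the next instruction.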
28
Code Examples P130
C code
int sum(int x, int y)
{
    int t = x+y;
    return t;
}

Assembly code
_sum:
    pushl %ebp
    movl %esp,%ebp
    movl 12(%ebp),%eax
    addl 8(%ebp),%eax
    movl %ebp,%esp
    popl %ebp
    ret
Obtain with command
gcc -O2 -S code.c
Assembly file code.s
29
Code Examples P131
55 89 e5 8b 45 0c 03 45 08 01 05 00 00 00 00 89 ec 5d c3
Obtain with command
gcc -O2 -c code.c
Relocatable object file code.o
30
Code Examples
Obtain with command
objdump -d code.o
Disassembly output (P132)
0x80483b4 <sum>:
0x80483b4  55                 push %ebp
0x80483b5  89 e5              mov  %esp,%ebp
0x80483b7  8b 45 0c           mov  0xc(%ebp),%eax
0x80483ba  03 45 08           add  0x8(%ebp),%eax
0x80483bd  01 05 00 00 00 00  add  %eax,0x0
0x80483c3  89 ec              mov  %ebp,%esp
0x80483c5  5d                 pop  %ebp
0x80483c6  c3                 ret
                              nop
32
Assembly Code
• Operands:
– x: Memory M[%ebp+8]
– y: Register %eax (loaded from M[%ebp+12])
– t: Register %eax
• Instruction
– addl 8(%ebp),%eax
– Adds two 4-byte integers
– Similar to expression y += x
• Return function value in %eax
34
Operands P137
• In high level languages
– Either constants
– Or variables
• Example
– A = A + 4 (A is a variable, 4 is a constant)
35
Operands
• Counterparts in assembly languages
– Immediate (constant)
– Register (variable)
– Memory (variable)
• Example
    movl 8(%ebp), %eax    (8(%ebp): memory, %eax: register)
    addl $4, %eax         ($4: immediate, %eax: register)
36
Simple Addressing Mode
• Immediate
– Represents a constant
– The format is $imm ($4, $0xffffffff)
• Registers
– The fastest storage units in computer systems
– Typically 32 bits long
– Register mode Ea
• The value stored in the register
• Denoted R[Ea]
38
Memory References
• The memory array is denoted M
• If addr is a memory address
• M[addr] is the content of the memory starting at addr
• addr is used as an array index
• How many bytes are there in M[addr]?
– It depends on the context
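The "it depends on the context" answer can be made concrete with a small model in which M is a byte array and the access width is passed explicitly. This is a sketch under assumptions: the name `mem_read` is invented, the width argument stands in for the operand size an instruction implies, and bytes are combined in little-endian order as on IA-32.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of M[addr]: read `size` bytes (1, 2, or 4)
   starting at index addr and combine them in little-endian order. */
uint32_t mem_read(const uint8_t *M, size_t addr, int size)
{
    uint32_t value = 0;
    for (int i = 0; i < size; i++) {
        value |= (uint32_t)M[addr + i] << (8 * i);
    }
    return value;
}
```

With the bytes 0x78, 0x56, 0x34, 0x12 stored from index 0, the same address yields 0x78 as a 1-byte read, 0x5678 as a 2-byte read, and 0x12345678 as a 4-byte read.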
39
Memory Addressing Mode
• An expression for
– a memory address (or an array index)
• Most general form
– imm(Eb, Ei, s)
– s: 1, 2, 4, 8
• The address represented by the above form
– imm + R[Eb] + R[Ei] * s
• It gives the value
– M[imm + R[Eb] + R[Ei] * s]
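The address computation for the general form can be written as a one-line C function. This is a minimal sketch: the function name and parameter names are made up, `base` and `index` stand for the register contents R[Eb] and R[Ei], and 32-bit unsigned arithmetic is used so that results wrap as 32-bit addresses do.

```c
#include <stdint.h>

/* Address denoted by imm(Eb, Ei, s): imm + R[Eb] + R[Ei] * s.
   base and index are the values held in the two registers;
   scale should be 1, 2, 4, or 8. */
uint32_t effective_address(uint32_t imm, uint32_t base,
                           uint32_t index, uint32_t scale)
{
    return imm + base + index * scale;
}
```

For instance, with R[Eb] = 0x100 and R[Ei] = 0x3, the form (Eb, Ei, 4) corresponds to `effective_address(0, 0x100, 0x3, 4)`, which is 0x10C. The operand value is then whatever memory holds at that address.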
40
Addressing Mode Figure 3.3 P137
Type       Form             Operand value               Name
Immediate  $Imm             Imm                         Immediate
Register   Ea               R[Ea]                       Register
Memory     Imm              M[Imm]                      Absolute
Memory     (Ea)             M[R[Ea]]                    Indirect
Memory     Imm(Eb)          M[Imm + R[Eb]]              Base+displacement
Memory     (Eb, Ei)         M[R[Eb] + R[Ei]]            Indexed
Memory     Imm(Eb, Ei)      M[Imm + R[Eb] + R[Ei]]      Indexed
Memory     (, Ei, s)        M[R[Ei]*s]                  Scaled indexed
Memory     (Eb, Ei, s)      M[R[Eb] + R[Ei]*s]          Scaled indexed
Memory     Imm(Eb, Ei, s)   M[Imm + R[Eb] + R[Ei]*s]    Scaled indexed
41
•Practice problem 3.1 P138

Address  Value
0x100    0xFF
0x104    0xAB
0x108    0x13
0x10C    0x11

Register  Value
%eax      0x100
%ecx      0x1
%edx      0x3

Operand          Value   Comment
%eax             0x100   Register
(%eax)           0xFF    Address 0x100
$0x108           0x108   Immediate
0x108            0x13    Absolute address
260(%ecx,%edx)   0x13    Address 0x108
(%eax,%edx,4)    0x11    Address 0x10C
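The answers to the practice problem can be checked with a small C model of its memory and registers. This is only a sketch: the function names are invented, the memory is reduced to the four words the problem defines, and any other address returns 0 by assumption.

```c
#include <stdint.h>

/* The four memory words from the problem, looked up by address. */
uint32_t mem_word(uint32_t addr)
{
    switch (addr) {
    case 0x100: return 0xFF;
    case 0x104: return 0xAB;
    case 0x108: return 0x13;
    case 0x10C: return 0x11;
    default:    return 0;   /* assumption: undefined addresses read as 0 */
    }
}

/* Value of the memory operand imm(Eb, Ei, s), given the register
   contents base = R[Eb] and index = R[Ei]. */
uint32_t mem_operand(uint32_t imm, uint32_t base,
                     uint32_t index, uint32_t scale)
{
    return mem_word(imm + base + index * scale);
}
```

With %eax = 0x100, %ecx = 0x1, and %edx = 0x3, the operand 260(%ecx,%edx) corresponds to `mem_operand(260, 0x1, 0x3, 1)`: the address is 260 + 1 + 3 = 264 = 0x108, so the value is 0x13.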
42
Data Formats Figure 3.1 P135
C declaration   Intel data type      GAS suffix   Size (bytes)
char            Byte                 b            1
short           Word                 w            2
int             Double word          l            4
unsigned        Double word          l            4
long int        Double word          l            4
unsigned long   Double word          l            4
char *          Double word          l            4
float           Single precision     s            4
double          Double precision     l            8
long double     Extended precision   t            10/12
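The integer rows of the table can be summarized as a mapping from operand size to suffix. The helper below is a hypothetical illustration, not part of any assembler; note that the sizes in the table are those of IA-32, so `sizeof` on another platform (for example `sizeof(long)` on a 64-bit system) may give different numbers.

```c
#include <stddef.h>

/* Hypothetical helper: GAS suffix for an integer operand of the given
   size in bytes, following the integer rows of the table.
   Returns '?' for sizes the table does not list for integers. */
char gas_int_suffix(size_t size)
{
    switch (size) {
    case 1:  return 'b';  /* byte */
    case 2:  return 'w';  /* word */
    case 4:  return 'l';  /* double word */
    default: return '?';
    }
}
```

For example, `gas_int_suffix(sizeof(char))` is 'b', matching instructions such as movb, while a 4-byte operand maps to 'l' as in movl.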