X86-64 Assembly

Computer Organization I

1

CS@VT ©2005-2019 McQuain

Credits and Disclaimers

The examples and discussion in the following slides have been adapted from a variety of sources, including:

• Chapter 3 of Computer Systems, 3rd Edition, by Bryant and O'Hallaron
• x86 Assembly/GAS Syntax on WikiBooks (http://en.wikibooks.org/wiki/X86_Assembly/GAS_Syntax)
• Using Assembly Language in Linux by Phillip ?? (http://asm.sourceforge.net/articles/linasm.html)

The C code was compiled to assembly with gcc version 4.8.3 on CentOS 7.

Unless noted otherwise, the assembly code was generated using the following command line:

    gcc -S -m64 -fno-asynchronous-unwind-tables -mno-red-zone -O0 file.c

AT&T assembly syntax is used, rather than Intel syntax, since that is what the gcc tools use.

Shift Instructions

Shifting the representation of an integer:

    sall rightop, leftop    # leftop = leftop << rightop   (C syntax!)
    sarl rightop, leftop    # leftop = leftop >> rightop   (preserves sign)
    shll rightop, leftop    # leftop = leftop << rightop   (same as sall)
    shrl rightop, leftop    # leftop = leftop >> rightop   (hi bits set to 0)

Left Shifts and Multiplication

Shifting an integer operand to the left by k bits is equivalent to multiplying the operand's value by 2^k (as long as no significant bits are shifted out). For example:

    sall $1, %eax    # eax = 2*eax
    sall $3, %edx    # edx = 8*edx

    edx before:  00000000 00000000 00000000 00000101    (5)
    edx after:   00000000 00000000 00000000 00101000    (40)

Since general multiplication is much more expensive (in time) than shifting bits, we should prefer using a shift-left instruction when multiplying by a power of 2.
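The equivalence between a left shift by 3 and multiplication by 8 can be checked directly in C. The following is a minimal sketch; the function names `times8_shift` and `times8_mul` are hypothetical, and unsigned 32-bit values are assumed so that bits shifted out of the register simply wrap rather than causing undefined behavior.

```c
#include <stdint.h>

/* Hypothetical helper: multiply by 8 with a left shift,
   mirroring `sall $3, %edx`. Unsigned arithmetic wraps modulo 2^32. */
uint32_t times8_shift(uint32_t x) {
    return x << 3;
}

/* Hypothetical helper: the same product via ordinary multiplication. */
uint32_t times8_mul(uint32_t x) {
    return x * 8u;
}
```

For instance, `times8_shift(5)` evaluates to 40, matching the edx example above.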

Right Shifts, Unsigned Operands, and Division

Shifting an integer operand to the right by k bits might be expected to divide the operand's value by 2^k:

    shrl $1, %eax    # eax = eax / 2 ?

Recall that shrl shifts in 0's on the left, so this will indeed perform integer division by 2, provided the value in eax is interpreted as an unsigned integer.

For example, if we have an 8-bit unsigned representation of 255 (decimal), the instruction above would perform the following transformation:

    1111 1111  ->  0111 1111

So it would yield 127 (decimal), which is correct for integer division.
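The 8-bit example can be reproduced in C, where `>>` on an unsigned operand always shifts in zeros. This is a small sketch with a hypothetical function name, `half_unsigned`, using `uint8_t` to model the 8-bit representation from the example.

```c
#include <stdint.h>

/* Hypothetical helper: logical right shift by one on an 8-bit
   unsigned value, the C counterpart of a one-bit shr. */
uint8_t half_unsigned(uint8_t x) {
    return (uint8_t)(x >> 1);
}
```

With this definition, `half_unsigned(255)` evaluates to 127, the same as the integer quotient 255 / 2.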

Right Shifts, Unsigned Operands, and Division

But the following will not yield the correct result for an unsigned integer:

    sarl $1, %eax    # eax != eax / 2

For example, if we consider an 8-bit representation of 200 (decimal), the instruction above would produce this transformation:

    1100 1000  ->  1110 0100

So it would yield 228 (decimal), which is incorrect. The correct result would be 100 (decimal), which would be represented as 0110 0100.

Note that the correct value would have been found by using shrl instead.
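To see the difference between the two right shifts on the same bit pattern, the arithmetic shift can be emulated on an 8-bit value. This is a sketch only: `sar8_by1` and `shr8_by1` are hypothetical names, and the arithmetic shift is built by hand (copying the old top bit back in) so the result does not depend on how a particular compiler treats `>>` on negative values.

```c
#include <stdint.h>

/* Hypothetical helper: one-bit arithmetic right shift of an 8-bit
   pattern. The bits move right and the old sign bit (0x80) is kept. */
uint8_t sar8_by1(uint8_t bits) {
    return (uint8_t)((bits >> 1) | (bits & 0x80u));
}

/* Hypothetical helper: one-bit logical right shift; a 0 enters on the left. */
uint8_t shr8_by1(uint8_t bits) {
    return (uint8_t)(bits >> 1);
}
```

Applied to 200 (1100 1000), `sar8_by1` gives 228 and `shr8_by1` gives 100, matching the discussion above.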

Right Shifts, Signed Operands, and Division

Shifting a non-negative (signed) integer operand to the right by k bits will divide the operand's value by 2^k:

    shrl $1, %eax    # eax = eax / 2
    sarl $1, %eax    # eax = eax / 2

If eax holds a non-negative signed integer, the left-most bit will be 0, and so both of these instructions will yield the same result.

But if the signed operand is negative, then the high bit will be 1. Clearly, shrl cannot yield the correct quotient in this case. Why?

Right Shifts, Signed Operands, and Division

What about the following instruction, if eax holds a negative signed value?

    sarl $1, %eax    # eax = eax / 2

sarl replicates the sign bit, so this will yield a negative result…

But suppose we have an 8-bit representation of -7:

    1111 1001

Then applying an arithmetic right shift of 1 position yields:

    1111 1100

That represents the value -4… is that correct?

Mathematics says yes, by the Division Algorithm:

    -7 = -4 * 2 + 1       (remainders must be >= 0!)

C says no:

    -7 = -3 * 2 + -1      (-7 % 2 must equal -(7 % 2))
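The disagreement between the two conventions can be made concrete in C. The sketch below is illustrative, not the compiler's actual code: `floor_div2` models what the arithmetic shift computes (rounding toward negative infinity), and `div2_with_bias` shows one common way to reconcile it with C's truncating division, by adding 1 to negative operands before the shift. Both function names are hypothetical.

```c
/* Hypothetical helper: quotient rounded toward negative infinity, which
   is what a one-bit arithmetic right shift produces. Written with / and %
   because >> on a negative int is implementation-defined in C. */
int floor_div2(int x) {
    int q = x / 2;          /* C division truncates toward zero */
    if (x % 2 < 0) {
        q -= 1;             /* negative remainder: step down to the floor */
    }
    return q;
}

/* Hypothetical helper: bias negative operands by 1 before the flooring
   shift so that the result agrees with C's x / 2. */
int div2_with_bias(int x) {
    return floor_div2(x + (x < 0 ? 1 : 0));
}
```

Here `floor_div2(-7)` is -4 (the shift's answer), while `-7 / 2` is -3 in C, and `div2_with_bias(-7)` is also -3.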

Bitwise Instructions

There are the usual logical operations, applied bitwise:

    andl rightop, leftop    # leftop = leftop & rightop   (C syntax!)
    orl  rightop, leftop    # leftop = leftop | rightop
    xorl rightop, leftop    # leftop = leftop ^ rightop
    notl op                 # op = ~op
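Each of these instructions corresponds to a single C operator. As a quick sketch (the wrapper names are hypothetical and 32-bit unsigned operands are assumed, matching the l suffix):

```c
#include <stdint.h>

/* Hypothetical wrappers, one per instruction. */
uint32_t and_bits(uint32_t leftop, uint32_t rightop) { return leftop & rightop; } /* andl */
uint32_t or_bits(uint32_t leftop, uint32_t rightop)  { return leftop | rightop; } /* orl  */
uint32_t xor_bits(uint32_t leftop, uint32_t rightop) { return leftop ^ rightop; } /* xorl */
uint32_t not_bits(uint32_t op)                       { return ~op; }              /* notl */
```

For example, with 0xF0F0 and 0x0FF0 as operands, the AND is 0x00F0, the OR is 0xFFF0, and the XOR is 0xFF00.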

Arithmetic/Logic Example

    int arith(int x, int y, int z) {
        . . .
    }

the Stack:

              . . .                 <- frame for caller
    rbp ->    old value of rbp      <- frame for arith()
    rsp ->    . . .

Calling a function causes the creation of a stack frame dedicated to that function. The frame pointer register, rbp, points to the beginning of the stack frame for the currently-running function. The stack pointer register, rsp, points to the last thing that was pushed onto the stack. (As an optimization, %rsp may or may not actually be updated. More on this later).

    int arith(int x, int y, int z) {
        int t1 = x + y;
        int t2 = z*48;
        int t3 = t1 & 0xFFFF;
        int t4 = t2 * t3;
        return t4;
    }

the Stack (autos within fn):

                . . .
    rbp +  8    return address
    rbp         old value of rbp
    rbp -  4    t1
    rbp -  8    t2
    rbp - 12    t3
    rbp - 16    t4
    rbp - 20    x
    rbp - 24    y
    rbp - 28    z

The first 6 function arguments are passed in registers; additional arguments are passed on the stack.

The arguments passed in registers are often copied onto the stack before any computations. In this example:
• x is passed in register %edi and is moved to -20(%rbp).
• y is passed in register %esi and is moved to -24(%rbp).
• z is passed in register %edx and is moved to -28(%rbp).

Aside: Stack Frame Layout

the Stack:

                . . .
    rbp +  8    return address       8-byte value
    rbp         old value of rbp     8-byte value
    rbp -  4    t1                   4-byte value
    rbp -  8    t2                   4-byte value
    rbp - 12    t3                   4-byte value
    rbp - 16    t4                   4-byte value
    rbp - 20    x                    4-byte value
    rbp - 24    y                    4-byte value
    rbp - 28    z                    4-byte value

Mapping:

    variable    address
    x           rbp - 20
    y           rbp - 24
    t1          rbp - 4

    movl  -24(%rbp), %eax    # eax = y
    movl  -20(%rbp), %edx    # edx = x
    addl  %edx, %eax         # eax = x + y
    movl  %eax, -4(%rbp)     # t1 = x + y

Mapping:

    variable    address
    z           rbp - 28
    t2          rbp - 8

    movl  -28(%rbp), %edx    # edx = z
    movl  %edx, %eax         # eax = z
    addl  %eax, %eax         # eax = z + z = 2z
    addl  %edx, %eax         # eax = 2z + z = 3z
    sall  $4, %eax           # eax = (3z) << 4 = 3z*16 = 48z
    movl  %eax, -8(%rbp)     # t2 = 48z
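The add-add-shift sequence can be mirrored step by step in C to confirm that it computes 48z. This is a sketch under one stated assumption: it uses `uint32_t` rather than `int` so that overflow wraps modulo 2^32 (as the machine instructions do) instead of being undefined behavior. The name `times48` is hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: 48*z built from two additions and one shift,
   following the instruction sequence one step at a time. */
uint32_t times48(uint32_t z) {
    uint32_t r = z;    /* movl %edx, %eax : r = z        */
    r = r + r;         /* addl %eax, %eax : r = 2z       */
    r = r + z;         /* addl %edx, %eax : r = 3z       */
    r = r << 4;        /* sall $4, %eax   : r = 3z * 16  */
    return r;
}
```

For example, `times48(5)` evaluates to 240, the same as `5 * 48`.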

Mapping:

    variable    address
    t1          rbp - 4
    t3          rbp - 12

    movl    -4(%rbp), %eax     # eax = t1
    movzwl  %ax, %eax          # eax = t1 & 0xFFFF
    movl    %eax, -12(%rbp)    # t3 = t1 & 0xFFFF

Aside: movzwl

You may have noticed the movzwl instruction:

    . . .
    movzwl  %ax, %eax    # eax = t1 & 0xFFFF
    . . .

This moves the word (16 bits) stored in %ax to %eax, zero-extended (z) to 32 bits. It is equivalent to t1 & 0xFFFF, since both zero out the high 16 bits of %eax while preserving the low 16 bits.

We'll see other versions of this instruction later. There are different sizes (e.g., movzbl) and there are signed variants (e.g., movsbl).

In this case, movzwl apparently offered a performance (or some other) advantage.
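The equivalence between masking and zero extension can be expressed in C with a cast through a 16-bit unsigned type. A minimal sketch, with hypothetical function names `low16_mask` and `low16_zext`:

```c
#include <stdint.h>

/* Hypothetical helper: keep the low 16 bits with an explicit mask,
   as the C source does with t1 & 0xFFFF. */
uint32_t low16_mask(int32_t t1) {
    return (uint32_t)t1 & 0xFFFFu;
}

/* Hypothetical helper: truncate to 16 bits, then widen back to 32 bits.
   Widening an unsigned 16-bit value fills the high bits with zeros,
   which is the effect of movzwl. */
uint32_t low16_zext(int32_t t1) {
    return (uint32_t)(uint16_t)t1;
}
```

Both functions return 0x5678 for the input 0x12345678, and 0xFFFF for the input -1.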

Mapping:

    variable    address
    t2          rbp - 8
    t3          rbp - 12
    t4          rbp - 16

    movl   -8(%rbp), %eax     # eax = t2
    imull  -12(%rbp), %eax    # eax = t2 * t3
    movl   %eax, -16(%rbp)    # t4 = t2 * t3

Assembled Code

    int arith(int x, int y, int z) {
        . . .
        int t4 = t2 * t3;
        return t4;
    }

    gcc -O0 -S -Wall -m64 arith.c

        .file    "arith.c"
        .text
        .globl   arith
        .type    arith, @function
    arith:
        pushq    %rbp                # save old frame pointer
        movq     %rsp, %rbp          # move frame pointer to top
        movl     %edi, -20(%rbp)     # move arguments x, y, and z
        movl     %esi, -24(%rbp)
        movl     %edx, -28(%rbp)
        . . .
        movl     -16(%rbp), %eax     # set return value in eax
        popq     %rbp                # restore old frame pointer
        ret                          # return to caller
        .size    arith, .-arith
        .ident   "GCC: (GNU) 4.8.3 20140911 ..."
        .section .note.GNU-stack,"",@progbits

Assembled Code


    int arith(int x, int y, int z) {
        int t1 = x + y;
        int t2 = z*48;
        . . .
    }

    . . .
    movl  -24(%rbp), %eax    # eax = y
    movl  -20(%rbp), %edx    # edx = x
    addl  %edx, %eax         # eax = x + y
    movl  %eax, -4(%rbp)     # t1 = x + y
    movl  -28(%rbp), %edx    # edx = z
    movl  %edx, %eax         # eax = z
    addl  %eax, %eax         # eax = z + z = 2z
    addl  %edx, %eax         # eax = 2z + z = 3z
    sall  $4, %eax           # eax = (3z) << 4 = 3z*16 = 48z
    movl  %eax, -8(%rbp)     # t2 = 48z
    . . .

Assembled Code


    int arith(int x, int y, int z) {
        . . .
        int t3 = t1 & 0xFFFF;
        int t4 = t2 * t3;
        . . .
    }

    . . .
    movl    -4(%rbp), %eax     # eax = t1
    movzwl  %ax, %eax          # eax = t1 & 0xFFFF
    movl    %eax, -12(%rbp)    # t3 = t1 & 0xFFFF
    movl    -8(%rbp), %eax     # eax = t2
    imull   -12(%rbp), %eax    # eax = t2 * t3
    movl    %eax, -16(%rbp)    # t4 = t2 * t3
    . . .

Comparing Operands

The compare instruction facilitates the comparison of operands:

    cmpl rightop, leftop

The instruction performs a subtraction of its operands (leftop - rightop), discarding the result.

The instruction sets flags in the machine status word register (EFLAGS) that record the results of the comparison:

    CF    carry flag; indicates overflow for unsigned operations
    OF    overflow flag; indicates operation caused 2's complement overflow
    SF    sign flag; indicates operation resulted in a negative value
    ZF    zero flag; indicates operation resulted in zero

For our purposes, we will most commonly check these codes by using the various jump instructions.
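To make the flag definitions concrete, the sketch below models in C how the four flags could be derived for a 32-bit comparison. It is an illustration under stated assumptions, not a description of the hardware: the `Flags` struct and the `cmp_flags` function are hypothetical names, and the operands are treated as raw 32-bit patterns.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical record of the four condition flags. */
typedef struct {
    bool cf;  /* carry: unsigned borrow occurred       */
    bool zf;  /* zero: the difference is 0             */
    bool sf;  /* sign: top bit of the difference is 1  */
    bool of;  /* overflow: 2's complement overflow     */
} Flags;

/* Hypothetical model of `cmpl rightop, leftop`: compute
   leftop - rightop, keep only the flags, discard the difference. */
Flags cmp_flags(uint32_t leftop, uint32_t rightop) {
    uint32_t diff = leftop - rightop;   /* wraps modulo 2^32 */
    Flags f;
    f.cf = leftop < rightop;
    f.zf = (diff == 0);
    f.sf = (diff >> 31) != 0;
    /* Signed overflow: the operands differ in sign and the result's
       sign differs from leftop's sign. */
    f.of = (((leftop ^ rightop) & (leftop ^ diff)) >> 31) != 0;
    return f;
}
```

For example, comparing 3 with 5 (leftop = 3, rightop = 5) sets CF and SF but not ZF or OF, while comparing 0x80000000 with 1 sets OF because the signed subtraction overflows.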

Conditional Jump Instructions

The conditional jump instructions check the relevant EFLAGS flags and jump to the instruction that corresponds to the label if the tested condition holds:

                   # make jump if last result was:
    je   label     # zero
    jne  label     # nonzero
    js   label     # negative
    jns  label     # nonnegative
    jg   label     # positive (signed >)
    jge  label     # nonnegative (signed >=)
    jl   label     # negative (signed <)
    jle  label     # nonpositive (signed <=)
    ja   label     # above (unsigned >)
    jae  label     # above or equal (unsigned >=)
    jb   label     # below (unsigned <)
    jbe  label     # below or equal (unsigned <=)
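The signed and unsigned jumps can give different answers for the same pair of bit patterns, because they consult different flags: jl is taken when SF and OF differ, and jb is taken when CF is set. The following sketch models both tests in C after a comparison of a with b; the function names `jl_taken` and `jb_taken` are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of jl after `cmpl b, a`: taken when SF != OF,
   i.e. when a < b with both read as signed 32-bit values. */
bool jl_taken(uint32_t a, uint32_t b) {
    uint32_t diff = a - b;                               /* a - b, wrapping */
    bool sf = (diff >> 31) != 0;                         /* sign of result  */
    bool of = (((a ^ b) & (a ^ diff)) >> 31) != 0;       /* signed overflow */
    return sf != of;
}

/* Hypothetical model of jb after `cmpl b, a`: taken when CF is set,
   i.e. when a < b with both read as unsigned values. */
bool jb_taken(uint32_t a, uint32_t b) {
    return a < b;
}
```

With a = 0xFFFFFFFF and b = 1, `jl_taken` is true (as signed values, -1 < 1) but `jb_taken` is false (as unsigned values, 4294967295 is not below 1).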

C to Assembly: if

    . . .
    movl  $5, -8(%rbp)
    cmpl  $0, -4(%rbp)
    js    .L1
    addl  $1, -8(%rbp)
    .L1:
    . . .

    . . .
    int y = 5;
    if ( x >= 0 ) {
        y++;
    }
    . . .

    gcc -S -m64 -O0 if.c

C to Assembly: if

    . . .
    movl  $5, -8(%rbp)    # y = 5
    cmpl  $0, -4(%rbp)    # compare x to 0
    js    .L1             # goto .L1 if negative
    addl  $1, -8(%rbp)    # y++
    .L1:
    . . .

    . . .
    int y = 5;
    if ( x < 0 ) goto L1;
    y++;
    L1:
    . . .

C to Assembly: if…else

    . . .
    movl  $5, -8(%rbp)
    cmpl  $0, -4(%rbp)
    js    .L4
    addl  $1, -8(%rbp)
    jmp   .L3
    .L4:
    subl  $1, -8(%rbp)
    .L3:
    . . .

    . . .
    int y = 5;
    if ( x >= 0 )
        y++;
    else
        y--;
    . . .

    gcc -S -m64 -O0 ifelse.c

C to Assembly: if…else

    . . .
    movl  $5, -8(%rbp)    # y = 5
    cmpl  $0, -4(%rbp)    # compare x to 0
    js    .L4             # goto .L4 if negative
    addl  $1, -8(%rbp)    # y++
    jmp   .L3             # goto .L3 after y++
    .L4:
    subl  $1, -8(%rbp)    # y--
    .L3:
    . . .

    . . .
    int y = 5;
    if ( x < 0 ) goto L4;
    y++;
    goto L3;
    L4:
    y--;
    L3:
    . . .

    gcc -S -m64 -O0 ifelse.c

C to Assembly: do…while

    . . .
    movl  $0, -8(%rbp)    # y = 0
    .L2:
    addl  $1, -8(%rbp)    # y++
    subl  $1, -4(%rbp)    # x--
    cmpl  $0, -4(%rbp)    # compare x to 0
    jg    .L2             # goto .L2 if positive
    . . .

    . . .
    int y = 0;
    do {
        y++;
        x--;
    } while ( x > 0 );
    . . .

    gcc -S -m64 -O0 dowhile.c

C to Assembly: do…while

    . . .
    movl  $0, -8(%rbp)    # y = 0
    .L2:
    addl  $1, -8(%rbp)    # y++
    subl  $1, -4(%rbp)    # x--
    cmpl  $0, -4(%rbp)    # compare x to 0
    jg    .L2             # goto .L2 if positive
    . . .

    . . .
    int y = 0;
    L2:
    y++;
    x--;
    if ( x > 0 ) goto L2;
    . . .

    gcc -S -m64 -O0 dowhile.c

C to Assembly: while

    . . .
    movl  $0, -8(%rbp)    # y = 0
    jmp   .L2             # goto compare x to 0 (entry test)
    .L3:
    addl  $1, -8(%rbp)    # y++
    subl  $1, -4(%rbp)    # x--
    .L2:
    cmpl  $0, -4(%rbp)    # compare x to 0
    jg    .L3             # goto loop entry if positive
    . . .

    . . .
    int y = 0;
    while ( x > 0 ) {
        y++;
        x--;
    }
    . . .

    gcc -S -m64 -O0 while.c

C to Assembly: while

    . . .
    movl  $0, -8(%rbp)    # y = 0
    jmp   .L2             # goto compare x to 0 (entry test)
    .L3:
    addl  $1, -8(%rbp)    # y++
    subl  $1, -4(%rbp)    # x--
    .L2:
    cmpl  $0, -4(%rbp)    # compare x to 0
    jg    .L3             # goto loop entry if positive
    . . .

    . . .
    int y = 0;
    goto L2;
    L3:
    y++;
    x--;
    L2:
    if (x > 0) goto L3;
    . . .

    gcc -S -m64 -O0 while.c

Note that the compiler translated the C while loop to a logically-equivalent do-while loop that is entered by jumping directly to the loop test.
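The claim that the goto form behaves like the original while loop can be tested by wrapping both fragments in functions. This is a sketch only; the function names `count_while` and `count_goto`, and the choice to return y, are assumptions made for illustration.

```c
/* Hypothetical wrapper around the original while loop. */
int count_while(int x) {
    int y = 0;
    while (x > 0) {
        y++;
        x--;
    }
    return y;
}

/* Hypothetical wrapper around the goto form: jump to the test first,
   then fall into the body only while the test succeeds. */
int count_goto(int x) {
    int y = 0;
    goto L2;
L3:
    y++;
    x--;
L2:
    if (x > 0) goto L3;
    return y;
}
```

Both functions return 5 for an argument of 5 and 0 for any argument that is not positive, since the initial jump to the test means the body is skipped entirely in that case.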

Reverse Engineering: Assembly to C

Let's consider a short assembly function:

    f:
        pushq  %rbp
        movq   %rsp, %rbp
        subq   $20, %rsp
        movl   %edi, -20(%rbp)
        movl   $1, -4(%rbp)
        movl   $2, -8(%rbp)
        jmp    .L2
    .L3:
        movl   -4(%rbp), %eax
        imull  -8(%rbp), %eax
        movl   %eax, -4(%rbp)
        addl   $1, -8(%rbp)
    .L2:
        movl   -8(%rbp), %eax
        cmpl   -20(%rbp), %eax
        jle    .L3
        movl   -4(%rbp), %eax
        leave
        ret
        . . .

We're going to reconstruct an equivalent function in C. The first step will be to identify the things that do not translate to C…

The first three instructions (pushq, movq, subq) are stack setup code; the compiler creates this; it is not represented in C.

The last two instructions (leave, ret) are cleanup and return code; this corresponds to a return statement in C.

    . . .
    f:
        ...
        movl   %edi, -20(%rbp)
        movl   $1, -4(%rbp)
        movl   $2, -8(%rbp)
        jmp    .L2
    .L3:
        movl   -4(%rbp), %eax
        imull  -8(%rbp), %eax
        movl   %eax, -4(%rbp)
        addl   $1, -8(%rbp)
    .L2:
        movl   -8(%rbp), %eax
        cmpl   -20(%rbp), %eax
        jle    .L3
        movl   -4(%rbp), %eax
        . . .

We're going to reconstruct an equivalent function in C. The next step will be to identify variables…

Variables will be indicated by memory accesses.

Filtering out repeat accesses yields these assembly statements:

    . . .
    f:
        . . .
        movl   $1, -4(%rbp)
        movl   $2, -8(%rbp)
        . . .
        cmpl   -20(%rbp), %eax
        . . .

There's an access to a variable on the stack at rbp - 4; this must be a local (auto) variable. Let's call it Local1.

There's another access to a variable on the stack at rbp - 8; this must also be a local (auto) variable. Let's call it Local2.

A parameter is passed in %edi and stored at rbp - 20; let's call it Param1.

Reverse Engineering: Assembly to C

Now we'll assume the variables are all C ints. Considering that the first two accesses are initialization statements, so far we can say the function in question looks like:

    ______ f(int Param1) {
        int Local1 = 1;
        int Local2 = 2;
        . . .
    }

Another clue is the statement that stores the value of the variable we're calling Local1 into the register eax (or rax) right before the function returns. That indicates what's returned and the return type:

    int f(int Param1) {
        int Local1 = 1;
        int Local2 = 2;
        . . .
        return Local1;
    }

    . . .
    f:
        . . .
        jmp    .L2
    .L3:
        . . .
    .L2:
        movl   -8(%rbp), %eax
        cmpl   -20(%rbp), %eax
        jle    .L3
        . . .

Now, there are two jump statements, a comparison statement, and two labels, all of which indicate the presence of a loop…

The first jump is unconditional… that looks like a C goto. So, this skips the loop body the first time through…

The comparison uses the parameter we're calling Param1 (the first operand of cmpl), and we see that the register eax is holding the value of the variable we're calling Local2 (the second operand).

Moreover, the conditional jump statement that follows the comparison causes a jump back to the label at the top of the loop if Local2 <= Param1.

    . . .
    f:
        . . .
        jmp    .L2
    .L3:
        . . .
    .L2:
        movl   -8(%rbp), %eax
        cmpl   -20(%rbp), %eax
        jle    .L3
        . . .

What we've just discovered is that there is a while loop:

    int f(int Param1) {
        int Local1 = 1;
        int Local2 = 2;
        . . .
        while (Local2 <= Param1) {
            . . .
        }
        . . .
        return Local1;
    }

The final step is to construct the body of the loop, and make sure we haven't missed anything else…

Here's what's left, including the loop boundaries for clarity:

    . . .
    f:
        . . .
        jmp    .L2
    .L3:
        movl   -4(%rbp), %eax     # eax = Local1
        imull  -8(%rbp), %eax     # eax = Local1 * Local2
        movl   %eax, -4(%rbp)     # Local1 = eax = Local1 * Local2
        addl   $1, -8(%rbp)       # Local2 = Local2 + 1
    .L2:
        movl   -8(%rbp), %eax
        cmpl   -20(%rbp), %eax
        jle    .L3
        . . .

And that will do it…

Here's our function:

    int f(int Param1) {
        int Local1 = 1;
        int Local2 = 2;
        while (Local2 <= Param1) {
            Local1 = Local1 * Local2;
            Local2++;
        }
        return Local1;
    }

So, what is it computing… really?
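One way to explore that question is to run the reconstructed function next to a candidate answer and compare results. The sketch below tries the guess that f computes a factorial; the recursive `factorial` helper is a hypothetical reference implementation added only for the comparison, and the check is meaningful only for small arguments where the product fits in an int.

```c
/* The reconstructed function, copied from the slide. */
int f(int Param1) {
    int Local1 = 1;
    int Local2 = 2;
    while (Local2 <= Param1) {
        Local1 = Local1 * Local2;
        Local2++;
    }
    return Local1;
}

/* Hypothetical reference: n! by recursion, with 0! = 1! = 1.
   Like f, it returns 1 for any argument below 2. */
int factorial(int n) {
    return (n <= 1) ? 1 : n * factorial(n - 1);
}
```

For every argument from 0 through 12 the two functions agree; for example, both return 120 for an argument of 5.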

Optimized Assembly

Let's consider the same function, just lightly optimized using -O1:

    f:
        cmpl   $1, %edi
        jle    .L4
        movl   $2, %edx
        movl   $1, %eax
    .L3:
        imull  %edx, %eax
        addl   $1, %edx
        cmpl   %edx, %edi
        jge    .L3
        rep ret
    .L4:
        movl   $1, %eax
        ret
        . . .

The stack setup code has been omitted. There are only a few locals, and one parameter, so we don't need the stack.

The stack cleanup code is also mostly gone. Only the ret instruction remains. More on this later.

Registers are used instead of the stack. %edi holds Param1. %eax is used as Local1. %edx is used as Local2.
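The control flow of the optimized code differs slightly from the unoptimized version: there is a guard test up front, followed by a loop whose test sits at the bottom. A C rendering of that shape might look like the following sketch. The function name `f_o1` and the variable names are hypothetical, chosen to echo the register roles described above.

```c
/* Hypothetical C rendering of the -O1 control flow. */
int f_o1(int param1) {
    if (param1 <= 1) {          /* cmpl $1, %edi ; jle .L4        */
        return 1;               /* .L4: movl $1, %eax ; ret       */
    }
    int local2 = 2;             /* movl $2, %edx                  */
    int local1 = 1;             /* movl $1, %eax                  */
    do {
        local1 *= local2;       /* imull %edx, %eax               */
        local2 += 1;            /* addl $1, %edx                  */
    } while (param1 >= local2); /* cmpl %edx, %edi ; jge .L3      */
    return local1;              /* result is already in %eax      */
}
```

For small arguments this returns the same values as the unoptimized reconstruction, for example 120 for an argument of 5 and 1 for an argument of 1.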

Optimized Assembly

Reproducing the earlier slide, we have the exact same pieces in fewer steps:

    f:
        cmpl   $1, %edi
        jle    .L4
        movl   $2, %edx        # edx = Local2
        movl   $1, %eax        # eax = Local1
    .L3:
        imull  %edx, %eax      # Local1 = eax = Local1 * Local2
        addl   $1, %edx        # Local2 = Local2 + 1
        cmpl   %edx, %edi
        jge    .L3
        rep ret
    .L4:
        movl   $1, %eax
        ret
        . . .

And that will do it…
