Advanced Procedures
Assembly Language Programming
Chapter 8
Programming is:
1. Exploring the problem space – Knowing what the problem is (10%) - Requirements
2. Understanding the problem – Knowing what to program (10%) - Specification
3. Solving the problem – Knowing how to program it (20%)
   Design = Data + Algorithms + Users (I/O)
4. Implementing the solution – Coding (20%)
5. Evaluating the solution – Testing (40%)
   Unit, Integration, Performance/Stress, Usability, Security
Things to Consider
Limitations, Constraints, Trade-offs
Performance (Efficiency), Space/Size
Data – Encoding, representation, size, portability
Architecture, Algorithm, Programming language
Libraries, Tools, Networks, Platform
Users, Security, Privacy
Faults, fault tolerance, error recovery, error handling, failures, downtime, overflow, underflow
This list is endless and grows with experience!
Program Design Techniques
Top-Down Design (functional decomposition) involves the following:
1. design your program before starting to code
2. break large tasks into smaller ones
3. use a hierarchical structure based on procedure calls
4. test individual procedures separately
The Problems …
1. Assumes programmer has a strong understanding of the necessary and correct architecture
2. Important decisions made early – mistakes thus costly
3. Assumes hierarchical structure is actually possible
4. All initial work on design, none on coding, nothing to show for large periods of time
5. Highly likely to fail or be ineffective on large projects
Program Design, Alternatively
Bottom-Up Design (functional synthesis) involves the following:
1. Pick one small part of the program, write a procedure to perform it
2. Repeat until a collection of small parts is formed
3. Write a procedure to join several small parts
4. Continue until all the parts are implemented and joined
Build some Lego blocks, connect them, build some more …
Some comments …
1. Design tends to emerge, not necessarily planned
2. Gets parts working sooner, easier to spot programming road-blocks
3. Hard to know how big the parts should be
4. Can ignore some parts if time runs out (and they are not important)
5. Not good for teams as no plan exists
6. Generally safer, but less overall structure
Reality of Programming
A mixture of bottom-up and top-down approaches
Top-down design – Some planning needed before you start
Bottom-up implementation – Don't over plan
Iterative methodologies – Get it to work then refine and improve
Opportunistic approaches – do what works, when it works
Much personal preference exists (strong sex-based differences as well)
Creating Procedures
Large problems can be divided into smaller tasks to make them more manageable
A procedure is the ASM equivalent of a Java Method, C/C++ Function, Basic Subroutine, or Pascal Procedure
Same thing as what is in the Irvine32 library
The following is an assembly language procedure named sample:
sample PROC
    … Code for procedure goes here …
    ret
sample ENDP
CALL and RET
The CALL instruction calls a procedure:
1. pushes the offset of the next instruction on the stack (saves the value of the instruction pointer)
2. copies the address of the called procedure into EIP (puts the address of the procedure into the instruction pointer)
3. begins to execute the code of the procedure
The RET instruction returns from a procedure:
1. pops the top of the stack into EIP (overwrites the instruction pointer with the offset of the instruction after the call)
CALL-RET Example
main PROC
00000020    call MySub
00000025    mov eax,ebx
    .
    .
main ENDP

MySub PROC
00000040    mov eax,edx
    .
    .
    ret
MySub ENDP

00000025 is the offset of the instruction immediately following the CALL instruction
00000040 is the offset of the first instruction inside MySub
CALL-RET in Action
The CALL instruction pushes 00000025 onto the stack, and loads 00000040 into EIP
Conceptually: CALL = PUSH eip, then MOV EIP, OFFSET proc
    Stack: 00000025  <- ESP
    EIP:   00000040
The RET instruction pops 00000025 from the stack into EIP
Conceptually: RET = POP eip
    Stack: 00000025  <- ESP  (stack shown before RET executes)
    EIP:   00000025
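The CALL and RET behaviour described above can be sketched as a toy model in C. This is only an illustration, not real hardware: the Cpu struct and the cpu_init, cpu_call and cpu_ret names are hypothetical, and the stack is modelled as an array of doublewords indexed by slot rather than by byte address.

```c
#include <stdint.h>

/* Toy model of the stack and instruction pointer (hypothetical names). */
typedef struct {
    uint32_t stack[16]; /* stack memory, one slot per doubleword */
    int      esp;       /* index of the top-of-stack slot; grows downward */
    uint32_t eip;       /* offset of the next instruction to execute */
} Cpu;

void cpu_init(Cpu *c, uint32_t start) {
    c->esp = 16;        /* empty stack: ESP sits just past the last slot */
    c->eip = start;
}

/* CALL: push the offset of the instruction after the call, then jump. */
void cpu_call(Cpu *c, uint32_t return_offset, uint32_t target) {
    c->stack[--c->esp] = return_offset;
    c->eip = target;
}

/* RET: pop the saved offset back into EIP. */
void cpu_ret(Cpu *c) {
    c->eip = c->stack[c->esp++];
}
```

With the offsets from the example, cpu_call(&c, 0x25, 0x40) leaves 0x25 on top of the stack and 0x40 in eip; cpu_ret(&c) then restores eip to 0x25 and the stack to its original depth.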
Nested Procedure Calls
main PROC
    .
    .
    call Sub1
    exit
main ENDP

Sub1 PROC
    .
    .
    call Sub2
    ret
Sub1 ENDP

Sub2 PROC
    .
    .
    call Sub3
    ret
Sub2 ENDP

Sub3 PROC
    .
    .
    ret
Sub3 ENDP

By the time Sub3 is called, the stack contains all three return addresses:
    (ret to main)
    (ret to Sub1)
    (ret to Sub2)  <- ESP
USES Operator
Lists the registers that are used by a procedure
MASM inserts code that will try to preserve them

ArraySum PROC USES esi ecx
    mov eax,0    ; set the sum to zero
    etc.

MASM generates the push and pop instructions shown here:

ArraySum PROC
    push esi
    push ecx
    .
    .
    pop ecx
    pop esi
    ret
ArraySum ENDP
Terminology
int sum (int x, int y) { return (x+y); }
…
printf("%d\n", sum(2,3));
x, y are the parameters of the procedure called sum
2, 3 are the arguments for a specific call (invocation, instance) of sum
Stack Frame
Also known as an activation record
Area of the stack set aside for a procedure's return address, passed parameters, saved registers, and local variables
Created by the following steps:
1. Calling program pushes arguments on the stack and calls the procedure
2. The called procedure pushes EBP on the stack, and sets EBP to ESP
3. If local variables are needed, a constant is subtracted from ESP to make room on the stack
4. Registers are saved if they will be altered
Calling a Proc
Push arguments (parameter values) on stack
Call proc (and push the return address on the stack)
Push EBP on the stack
Set EBP equal to ESP (EBP now points to the saved EBP on the stack)
Reserve stack space for local variables by subtracting from ESP ("push space on stack")
Push registers to save on the stack
MEMORISE THIS SET OF STEPS!
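The steps above can be traced with a small C model of the stack. This is a sketch under stated assumptions: the Machine struct, push32, load32 and build_frame are hypothetical names, the stack is 64 bytes, and the values 0xBEEF (caller's EBP) and 0x1111 (a saved register) are placeholders.

```c
#include <stdint.h>

#define STACK_TOP 64u               /* toy stack occupies byte addresses 0..63 */

typedef struct {
    uint32_t mem[STACK_TOP / 4];    /* one element per doubleword */
    uint32_t esp;                   /* byte address of the top of the stack */
    uint32_t ebp;                   /* byte address of the frame base */
} Machine;

void push32(Machine *m, uint32_t v) {
    m->esp -= 4;                    /* the stack grows toward lower addresses */
    m->mem[m->esp / 4] = v;
}

uint32_t load32(const Machine *m, uint32_t addr) {
    return m->mem[addr / 4];
}

/* Builds the frame for a hypothetical two-argument procedure with two
   DWORD locals and one saved register. */
void build_frame(Machine *m, uint32_t arg1, uint32_t arg2, uint32_t ret_addr) {
    m->esp = STACK_TOP;
    m->ebp = 0xBEEF;        /* placeholder for the caller's EBP */
    push32(m, arg2);        /* 1. push arguments (second argument first) */
    push32(m, arg1);
    push32(m, ret_addr);    /* 2. CALL pushes the return address */
    push32(m, m->ebp);      /* 3. push ebp */
    m->ebp = m->esp;        /* 4. mov ebp,esp */
    m->esp -= 8;            /* 5. sub esp,8 : room for two locals */
    push32(m, 0x1111);      /* 6. save a register (placeholder value) */
}
```

After build_frame(&m, 5, 6, 0x25), load32(&m, m.ebp + 8) yields the first argument and load32(&m, m.ebp + 12) the second, which is why procedures address their parameters as [ebp+8], [ebp+12] and so on.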
Stack after the CALL
This is what we must build:
    Val/Address Arg 2: EBP + 12
    Val/Address Arg 1: EBP + 8
    Return Address
    Old/Saved EBP                 <- EBP
    Local Variable 1: EBP - 4
    Local Variable 2: EBP - 8
    EAX (saved)
    …
    EDX (saved)                   <- ESP
    Empty Space
Stack Parameters
Sometimes more convenient than register parameters
Any number of values can be pushed/passed
Values are in memory and don’t need to be moved there later
Two possible ways of calling DumpMem:

Register parameters:
    pushad
    mov esi,OFFSET array
    mov ecx,LENGTHOF array
    mov ebx,TYPE array
    call DumpMem
    popad

Stack parameters:
    push TYPE array
    push LENGTHOF array
    push OFFSET array
    call DumpMem
Passing Args
Push arguments on stack
Suggestion: Use only 32-bit values in protected mode to keep the stack aligned – Don’t push random length strings!
1. By Value: Push the values on the stack
2. By Reference: Push the address (offsets) on the stack
Call the called-procedure
Put return value in eax
Return
Remove arguments from the stack if the called-procedure did not remove them
Example: Pass by Value
.data
val1 DWORD 5
val2 DWORD 6
.code
push val2
push val1

Stack prior to CALL:
    (val2) 6
    (val1) 5    <- ESP
Example: Pass by Reference
.data
val1 DWORD 5
val2 DWORD 6
.code
push OFFSET val2
push OFFSET val1

Stack prior to CALL:
    (offset val2) 00000004
    (offset val1) 00000000    <- ESP
Simple Example
The ArrayFill procedure fills an array with 16-bit random integers
The calling program passes the address of the array, along with a count of the number of array elements:
.data
count = 100
array WORD count DUP(?)
.code
push OFFSET array
push count
call ArrayFill
Passing an Array
ArrayFill PROC
    push ebp
    mov ebp,esp
    pushad
    mov esi,[ebp+12]    ; offset of array
    mov ecx,[ebp+8]     ; count
    .

Stack:
    offset(array)     [EBP + 12]
    count             [EBP + 8]
    return address
    EBP               <- EBP

ESI points to the beginning of the array, so it's easy to use a loop to access each array element
ArrayFill can reference an array without knowing the array's name
Stack Parameters (C/C++)
C and C++ functions access stack parameters using constant offsets from EBP
Example: [ebp + 8]
EBP is called the base pointer or frame pointer because it holds the base address of the stack frame.
EBP does not change value during the function.
EBP must be restored to its original value when a function returns.
RET Instruction
Return from subroutine
Pops stack into the instruction pointer (EIP or IP); control transfers to the target address
Syntax:
    RET
    RET n
Optional operand n causes n bytes to be added to the stack pointer after EIP (or IP) is assigned a value
Adding n bytes to ESP “deletes” the pushed arguments from the stack
Stack after the CALL
All this has to go …
    Val/Address Arg 2: EBP + 12
    Val/Address Arg 1: EBP + 8
    Return Address
    Old/Saved EBP                 <- EBP
    Local Variable 1: EBP - 4
    Local Variable 2: EBP - 8
    EAX (saved)
    …
    EDX (saved)                   <- ESP
    Empty Space
Deleting Parameters
In the Windows StdCall Model the called procedure uses ret n to remove the parameters
In C/C++ Model, the calling procedure removes them after the call has returned either by
1. Popping them off the stack
2. Adding n to ESP
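The arithmetic behind the two cleanup models can be written out in C. This is a sketch only: the function names are hypothetical, and ESP is treated as a plain 32-bit number.

```c
#include <stdint.h>

/* ESP after a callee executes "ret n": the 4-byte return address is popped,
   then n more bytes are discarded. A plain "ret" is the n = 0 case. */
uint32_t esp_after_ret(uint32_t esp_at_ret, uint32_t n) {
    return esp_at_ret + 4 + n;
}

/* StdCall model: the callee removes the arguments with "ret n". */
uint32_t stdcall_final_esp(uint32_t esp_at_ret, uint32_t arg_bytes) {
    return esp_after_ret(esp_at_ret, arg_bytes);
}

/* C/C++ model: the callee uses a plain "ret", then the caller
   executes "add esp,n" after the call has returned. */
uint32_t cdecl_final_esp(uint32_t esp_at_ret, uint32_t arg_bytes) {
    uint32_t esp = esp_after_ret(esp_at_ret, 0);
    return esp + arg_bytes;
}
```

For a caller that starts with ESP = 1000h and pushes two doublewords, ESP is 0FF4h when the callee reaches its return instruction. Both models bring ESP back to 1000h; they differ only in which side does the work.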
Example
Procedure Difference subtracts the second argument from the first one
Sample call:
    push 14            ; first argument
    push 30            ; second argument
    call Difference    ; EAX = -16

Difference PROC
    push ebp
    mov ebp,esp
    mov eax,[ebp + 12]    ; first argument
    sub eax,[ebp + 8]     ; second argument
    pop ebp
    ret 8                 ; remove the 2 args
Difference ENDP
Passing 8/16-bit Arguments
Cannot push 8-bit values on stack
Pushing a 16-bit operand may cause a page fault or an ESP alignment problem; it is also incompatible with Windows API functions
Expand smaller arguments into 32-bit values, using MOVZX or MOVSX
.data
charVal BYTE 'x'
.code
movzx eax,charVal
push eax
call Uppercase
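The difference between the two widening instructions can be illustrated with C casts. This is an analogy rather than the instructions themselves, and the function names are hypothetical.

```c
#include <stdint.h>

/* MOVZX analogue: widen an unsigned byte; the upper 24 bits become zero. */
uint32_t zero_extend8(uint8_t v) {
    return (uint32_t)v;
}

/* MOVSX analogue: widen a signed byte; the sign bit is copied into
   the upper 24 bits. */
uint32_t sign_extend8(int8_t v) {
    return (uint32_t)(int32_t)v;
}
```

A character such as 'x' (78h) widens to 00000078h either way. A byte with its top bit set differs: F0h zero-extends to 000000F0h, while the signed byte -16 (also F0h) sign-extends to FFFFFFF0h, so use MOVZX for unsigned data and MOVSX for signed data.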
Passing Multiword Arguments
Push high-order values on the stack first and work backward in memory
Results in little-endian ordering of data
Example:
.data
longVal DQ 1234567800ABCDEFh
.code
push DWORD PTR longVal + 4 ; high doubleword
push DWORD PTR longVal ; low doubleword
call WriteHex64
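The ordering produced by the two pushes can be checked with a short C sketch. The push_qword name and the two-element output array are assumptions made for illustration; out[0] stands for the lower stack address (where ESP ends up) and out[1] for the higher one.

```c
#include <stdint.h>

/* Splits a 64-bit value the way the two pushes do. The high doubleword is
   pushed first, so it lands at the higher address; the low doubleword is
   pushed second and lands at the lower address. */
void push_qword(uint64_t value, uint32_t out[2]) {
    uint32_t high = (uint32_t)(value >> 32);
    uint32_t low  = (uint32_t)(value & 0xFFFFFFFFu);
    out[1] = high;   /* first push: higher address */
    out[0] = low;    /* second push: lower address, pointed to by ESP */
}
```

For 1234567800ABCDEFh the low doubleword 00ABCDEFh sits at the lower address and 12345678h above it, which is the same little-endian layout the value has in the data segment.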
Saving & Restoring Registers
Push registers on stack just after assigning ESP to EBP to save registers that are modified inside the procedure
Remember: Don't overwrite the register containing the return value when you restore the registers!
MySub PROC
push ebp
mov ebp,esp
push ecx ; save local registers
push edx
Local Variables
Only statements within a subroutine can view or modify its local variables
Storage used by local variables is released when the subroutine ends
A local variable can have the same name as a local variable in another function without creating a name clash
Essential when writing recursive procedures, as well as procedures executed by multiple execution threads
Creating LOCAL Variables
Example - create two DWORD local variables:
Say: int x=10, y=20;

MySub PROC
    push ebp
    mov ebp,esp
    sub esp,8                    ; create 2 DWORD variables
    mov DWORD PTR [ebp-4],10     ; [ebp-4] = x = 10
    mov DWORD PTR [ebp-8],20     ; [ebp-8] = y = 20

Stack:
    ret address
    saved ebp     <- EBP
    10 (x)        [ebp-4]
    20 (y)        [ebp-8]
A Complete Procedure
void swap (char *s, int i1, int i2) { char c = s[i1]; s[i1] = s[i2]; s[i2] = c; return; }
(Address of) s: EBP+16
I1: EBP+12
I2: EBP+8
Return Address
Saved EBP (of main)
C: EBP-4
ESI
EDI
EAX
Empty Space…
A Complete Procedure
swap PROC
    push ebp
    mov ebp,esp
    sub esp,4
    push esi
    push edi
    push eax
    ;c = s[i1];
    ;s[i1] = s[i2];
    ;s[i2] = c;
    pop eax
    pop edi
    pop esi
    add esp,4
    pop ebp
    ret 12
swap ENDP

Calling code:
    push OFFSET sbuf
    push 0
    push (SIZE sbuf) - 1
    call swap
A Complete Procedure
swap PROC
    push ebp
    mov ebp,esp
    sub esp,4
    push esi
    push edi
    push eax
    ;do the swap
    pop eax
    pop edi
    pop esi
    add esp,4
    pop ebp
    ret 12
swap ENDP

The swap itself:
    ;esi = address of str[index1]
    mov esi,[ebp+16]
    add esi,[ebp+12]
    ;edi = address of str[index2]
    mov edi,[ebp+16]
    add edi,[ebp+8]
    ;c = str[index1]
    mov al,BYTE PTR [esi]
    mov [ebp-4],eax
    ;str[index1] = str[index2]
    mov al,BYTE PTR [edi]
    mov BYTE PTR [esi],al
    ;str[index2] = c
    mov eax,[ebp-4]
    mov BYTE PTR [edi],al
ENTER Instruction
ENTER partially creates a stack frame for a called proc
1. pushes EBP on the stack
2. sets EBP to the base of the stack frame
3. reserves space for local variables
Example:
MySub PROC
    enter 8,0    ;0 = nesting level (not used here)
Equivalent to:
MySub PROC
    push ebp
    mov ebp,esp
    sub esp,8
LEAVE Instruction
Partially removes the stack frame for a procedure
MySub PROC
    enter 8,0
    ...
    ...
    ...
    leave
    ret
MySub ENDP

Equivalent operations:
    enter 8,0  =  push ebp
                  mov ebp,esp
                  sub esp,8      ; 2 local DWORDs
    leave      =  mov esp,ebp    ; free local space
                  pop ebp
A Simpler Proc
swap PROC
    enter 4,0
    pushad
    ;c = s[i1];
    ;s[i1] = s[i2];
    ;s[i2] = c;
    popad
    leave
    ret 12
swap ENDP

Calling code:
    push OFFSET sbuf
    push 0
    push (SIZE sbuf) - 1
    call swap
(Address of) s: EBP+16
I1: EBP+12
I2: EBP+8
Return Address
Saved EBP (of main)
C: EBP-4
EAX
...
EDI
Empty Space
LOCAL Directive
The LOCAL directive declares a list of local variables
immediately follows the PROC directive
each variable is assigned a type
replaces ENTER and LEAVE
Syntax: LOCAL varlist
Example:
MySub PROC
    LOCAL var1:BYTE, var2:WORD, var3:SDWORD
Using LOCAL
Examples:
    LOCAL t1:BYTE              ; single character
    LOCAL flagVals[20]:BYTE    ; array of bytes
    LOCAL pArray:PTR WORD      ; pointer to an array
Note: inc BYTE PTR [esi]
Here BYTE PTR gives the size of the memory operand that ESI points to; it does not change ESI itself. Different use of PTR!
Example
BubbleSort PROC
    LOCAL temp:DWORD, SwapFlag:BYTE
    . . .
    ret
BubbleSort ENDP

MASM generates the following code:

BubbleSort PROC
    push ebp
    mov ebp,esp
    add esp,0FFFFFFF8h    ; add -8 to ESP
    . . .
    mov esp,ebp
    pop ebp
    ret
BubbleSort ENDP

Stack:
    return address
    EBP                     <- EBP
    temp        [EBP - 4]
    SwapFlag    [EBP - 8]   <- ESP
Non-Doubleword Local Variables
Local variables can be different sizes:
1. 8-bit: assigned to next available byte
2. 16-bit: assigned to next even (word) boundary
3. 32-bit: assigned to next doubleword boundary
Be very careful – you can save space but it can add complexity and make code difficult to maintain and understand
Generally not worth the hassle
LEA Instruction
LEA returns offsets of direct and indirect operands
OFFSET operator only returns constant offsets
LEA required when obtaining offsets of stack parameters and local variables (anything that isn't in the .data segment)
Example
CopyString PROC
    LOCAL count:DWORD, temp[20]:BYTE
    ;mov edi,OFFSET count    <<ERROR: invalid operand>>
    ;mov esi,OFFSET temp     <<ERROR: invalid operand>>
    lea edi,count            ; ok
    lea esi,temp             ; ok
LEA Example
Suppose you have a Local variable at [ebp-8]
And you need the address of that local variable in ESI
You cannot use this:
    mov esi, OFFSET [ebp-8]    ; error
Use this instead:
    lea esi, [ebp-8]           ; YAY!
Why? Because OFFSET uses assembly-time information, while the address of a stack variable is only known at run time, which is what LEA computes
What is Recursion?
The process created when either:
1. A procedure calls itself
2. A cycle exists: i.e., Procedure A calls procedure B, which in turn calls procedure A
Using a graph in which each node is a procedure and each edge is a procedure call, recursion forms a cycle
(Figure: call graph with nodes A, B, C, D and E)
Calculating a Sum
CalcSum PROC
    cmp ecx,0       ;check counter value
    jz L2           ;quit if zero
    add eax,ecx     ;otherwise, add to sum
    dec ecx         ;decrement counter
    call CalcSum    ;recursive call
L2: ret
CalcSum ENDP

The CalcSum procedure recursively calculates the sum of the integers from ECX down to 1. Receives: ECX = count (EAX must be 0 on entry). Returns: EAX = sum

Call#   pre   1   2   3   4    5    6    post
ecx     5     5   4   3   2    1    0    0
eax     0     0   5   9   12   14   15   15
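The same recursion can be written in C to make the register roles explicit. This is a sketch, not a translation produced by any tool: calc_sum is a hypothetical name, the parameter n plays the role of ECX and the parameter sum plays the role of EAX.

```c
/* Recursive sum n + (n-1) + ... + 1, mirroring CalcSum. */
unsigned calc_sum(unsigned n, unsigned sum) {
    if (n == 0)                        /* cmp ecx,0 / jz L2 */
        return sum;
    return calc_sum(n - 1, sum + n);   /* add eax,ecx / dec ecx / call CalcSum */
}
```

Calling calc_sum(5, 0) passes through the same (ecx, eax) pairs as the table: (5,0), (4,5), (3,9), (2,12), (1,14), (0,15).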
Calculating a Factorial
int factorial(int n)
{
    if (n == 0) {
        return(1);
    } else {
        return(n*factorial(n-1));
    }
} /*end factorial*/

This function calculates the factorial of integer n. A new value of n is saved in each stack frame:

recursive calls          backing up
5! = 5 * 4!              5 * 24 = 120
4! = 4 * 3!              4 * 6 = 24
3! = 3 * 2!              3 * 2 = 6
2! = 2 * 1!              2 * 1 = 2
1! = 1 * 0!              1 * 1 = 1
0! = 1 (base case)       1 = 1

When each call instance returns, the product it returns is multiplied by the previous value of n.
Calculating a Factorial
Factorial PROC
    push ebp
    mov ebp,esp
    mov eax,[ebp+8]    ; get n
    cmp eax,0          ; n > 0?
    ja L1              ; yes: continue
    mov eax,1          ; no: return 1
    jmp L2
L1: dec eax
    push eax           ; Factorial(n-1)
    call Factorial
    ; Instructions from this point on execute when each
    ; recursive call returns.
    mov ebx,[ebp+8]    ; get n
    mul ebx            ; eax = eax * ebx
L2: pop ebp            ; return EAX
    ret 4              ; clean up stack
Factorial ENDP
Calculating a Factorial
Suppose we want to calculate 12!
This diagram shows the first few stack frames created by recursive calls to Factorial:
    12            n
    ReturnMain
    ebp0
    11            n-1
    ReturnFact
    ebp1
    10            n-2
    ReturnFact
    ebp2
    9             n-3
    ReturnFact
    ebp3
    (etc...)
Each recursive call uses 12 bytes of stack space
This is NOT how you should use recursion!!!
A loop would use less memory, have fewer procedure calls, and run much, much, faster
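For comparison, a loop-based factorial might look like the following C sketch. The function name is hypothetical, and the code assumes n is at most 12, since 13! no longer fits in 32 bits.

```c
#include <stdint.h>

/* Iterative factorial: a single stack frame regardless of n.
   Assumes n <= 12 so the result fits in an unsigned 32-bit value. */
uint32_t factorial_loop(uint32_t n) {
    uint32_t result = 1;
    for (uint32_t i = 2; i <= n; i++)
        result *= i;
    return result;
}
```

The recursive version needs 12 bytes of stack per call, so its stack use grows with n, while the loop needs no extra stack at all.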
INVOKE Directive
The INVOKE directive is a replacement for Intel’s CALL instruction that lets you pass multiple arguments
Syntax: INVOKE procedureName [, argumentList]
ArgumentList is an optional comma-delimited list of procedure arguments
Arguments can be:
a) immediate values
b) integer expressions
c) variable names
d) address and ADDR expressions
e) register names
INVOKE Examples
.data
byteVal BYTE 10
wordVal WORD 1000h
.code
    ; direct operands:
    INVOKE Sub1,byteVal,wordVal
    ; address of variable:
    INVOKE Sub2,ADDR byteVal
    ; register name, integer expression:
    INVOKE Sub3,eax,(10 * 20)
    ; address expression (indirect operand):
    INVOKE Sub4,[ebx]
ADDR Operator
• Returns a near or far pointer to a variable, depending on which memory model your program uses:
• Small model: returns 16-bit offset
• Large model: returns 32-bit segment/offset
• Flat model: returns 32-bit offset
• Example:
.data
myWord WORD ?
.code
INVOKE mySub,ADDR myWord
YUCK! It is better practice to be EXPLICIT and to not rely on the assembler to "get it right" for you! Many of these directives (INVOKE, ENTER, LEAVE) hide details and inspire mistakes.
PROC Revealed …
The PROC directive declares a procedure with an optional list of named parameters.
Syntax: label PROC paramList
paramList is a list of parameters separated by commas. Each parameter has the following syntax: paramName : type
type must either be:
1. one of the standard ASM types (BYTE, SBYTE, WORD, etc.)
2. a pointer (offset/address) to one of these types
PROC Directive
Alternate format permits parameter list to be on one or more separate lines:
label PROC,        ; comma required
    paramList
The parameters can be on the same line . . .
    param-1:type-1, param-2:type-2, . . ., param-n:type-n
Or they can be on separate lines:
    param-1:type-1,
    param-2:type-2,
    . . .,
    param-n:type-n
PROC Example 1
AddTwo PROC,
    val1:DWORD, val2:DWORD
    mov eax,val1
    add eax,val2
    ret
AddTwo ENDP
• The AddTwo procedure receives two integers and returns their sum in EAX
PROC Example 2
FillArray PROC,
    pArray:PTR BYTE, fillVal:BYTE, arraySize:DWORD
    mov ecx,arraySize
    mov esi,pArray
    mov al,fillVal
L1: mov [esi],al
    inc esi
    loop L1
    ret
FillArray ENDP
FillArray receives a pointer to an array of bytes, a single byte fill value that will be copied to each element of the array, and the size of the array
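A rough C counterpart of FillArray is sketched below; the snake_case names are hypothetical stand-ins for the MASM parameters. One difference is worth noting: the C loop does nothing when the size is zero, whereas the LOOP instruction decrements ECX before testing it, so the assembly version should not be called with arraySize = 0.

```c
#include <stddef.h>
#include <stdint.h>

/* Copies fill_val into each of the first array_size bytes of p_array.
   p_array ~ pArray (ESI), fill_val ~ fillVal (AL), array_size ~ arraySize (ECX). */
void fill_array(uint8_t *p_array, uint8_t fill_val, size_t array_size) {
    for (size_t i = 0; i < array_size; i++)
        p_array[i] = fill_val;   /* mov [esi],al / inc esi */
}
```

As in the assembly version, the procedure never sees the array's name, only its address and length.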
PROC Example 3
ReadFile PROC,
    pBuffer:PTR BYTE
    LOCAL fileHandle:DWORD
    . . .
ReadFile ENDP

Swap PROC,
    pValX:PTR DWORD,
    pValY:PTR DWORD
    . . .
Swap ENDP
PROTO Directive
Creates a procedure prototype
Syntax: label PROTO paramList
Every procedure called by the INVOKE directive must have a prototype
A complete procedure definition can also serve as its own prototype
PROTO Directive
Standard configuration:
1. PROTO appears at top of the program listing
2. INVOKE appears in the code segment
3. the procedure implementation occurs later in the program

MySub PROTO          ;procedure prototype
.code
INVOKE MySub         ;procedure call
MySub PROC           ;procedure implementation
    ; Does Stuff …
MySub ENDP
PROTO Example
Prototype for the ArraySum procedure, showing its parameter list:
ArraySum PROTO,
    ptrArray:PTR DWORD,    ; points to the array
    szArray:DWORD          ; array size
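C function prototypes follow the same prototype, call, implementation layout, which may help if PROTO feels unfamiliar. The sketch below is an analogy only: the source gives just the ArraySum parameter list, so the summing body, the sum_of_three caller and its sample data are all assumptions.

```c
#include <stddef.h>
#include <stdint.h>

/* Prototype first, like PROTO at the top of the listing. */
uint32_t array_sum(const uint32_t *ptr_array, size_t sz_array);

/* A caller can now use array_sum before its definition appears,
   much as INVOKE relies on the prototype. */
uint32_t sum_of_three(void) {
    static const uint32_t data[3] = {10, 20, 30};   /* sample data */
    return array_sum(data, 3);
}

/* The definition comes later, like the PROC implementation.
   The body is assumed: it adds up sz_array doublewords. */
uint32_t array_sum(const uint32_t *ptr_array, size_t sz_array) {
    uint32_t total = 0;
    for (size_t i = 0; i < sz_array; i++)
        total += ptr_array[i];
    return total;
}
```

In both languages the prototype lets the translator check the number and types of the arguments at each call site before it has seen the procedure body.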
The End
And they all wrote assembly language happily ever after ...