Copyright 2013 – Noah Mendelsohn
Compiling C Programs
Noah Mendelsohn
Tufts University
Email: [email protected]
Web: http://www.cs.tufts.edu/~noah
COMP 40: Machine Structure and
Assembly Language Programming (Fall 2014)
© 2010 Noah Mendelsohn
How do we get from source to executable program?
Executable files
Executable file:
– A single file with all code ready to run at a fixed address in memory
– Typically the same address for all programs
Requirements
– Code divided into multiple source files (.c files and .h files)
– Functions in shared .c files need to show up in lots of executables
– Often we want to share only the compiled versions (.o files) [you don’t have the source for printf() but you use it all the time]
The challenge
– In different executables using the same shared code…
– … the same functions and global variables may wind up at different addresses …
– … but we still need to make references work across source files
Resolving external references
two_plus_one.c:
    #include <stdio.h>
    int main(int argc, char *argv[]) {
        printf("The sum is %d\n", sum(1,2));
    }

arith.c:
    int sum(int a, int b) {
        return a+b;
    }

two_plus_one (executable): contains both the call to sum(1,2) and the code for sum()
How do we know where sum() wound up?
From source code to executable (simplified)
two_plus_one.c:
    #include <stdio.h>
    int main(int argc, char *argv[]) {
        printf("The sum is %d\n", sum(1,2));
    }
  gcc -c two_plus_one.c  ->  two_plus_one.o (relocatable object code for main())

arith.c:
    int sum(int a, int b) {
        return a+b;
    }
  gcc -c arith.c  ->  arith.o (relocatable object code for sum())
Relocatable .o files
• Contain machine code
• References within the file are resolved
• References to external files not resolved
• Some address fields may need adjusting later depending on final location in executable program
• Includes lists of:
  1) Names and addresses of defined externals
  2) Names and referents of things needing relocation
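A minimal sketch of how these categories show up in C source. The names call_count, clamp_nonneg and sum_nonneg are hypothetical and are not part of the slides' example. In the slides' setup, sum() would be defined in arith.c, which would make the call to it an unresolved external reference in this file's .o. It is defined here only so that the sketch links on its own.

```c
#include <assert.h>

/* Defined external: a global variable other files could reference by name. */
int call_count = 0;

/* Internal linkage (static): references to it are resolved within this file,
   and it is not offered to other files for linking. */
static int clamp_nonneg(int x)
{
    return x < 0 ? 0 : x;
}

/* Declaration only: tells the compiler the type of sum(). If the definition
   lived in another file, the call below would be an external reference
   that the linker must resolve later. */
int sum(int a, int b);

/* Defined external: a function other files could call. */
int sum_nonneg(int a, int b)
{
    int result = clamp_nonneg(sum(a, b));
    call_count++;
    assert(result >= 0);
    return result;
}

/* Assumption for self-containment: in the slides this lives in arith.c. */
int sum(int a, int b)
{
    return a + b;
}
```

On systems with the binutils tools, running nm on a .o file lists its symbols; defined functions and undefined (still unresolved) references are marked differently there.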
Linking .o files to create executable
gcc -o two_plus_one two_plus_one.o arith.o

two_plus_one.o (relocatable object code for main()) + arith.o (relocatable object code for sum())
  ->  two_plus_one (executable program)
gcc actually runs a program named “ld” to create the executable.
To create the executable:
– Code from all .o files is collected in one executable
– A fixed load address is assumed
– All references are resolved: code & vars updated
The executable contains all the code, with references resolved, loadable at a fixed address. It is ready to be invoked using the exec family of calls (execv(), execve(), and so on) or from the command line [the shell itself uses exec].
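A rough sketch of invoking an executable through the exec family, assuming a POSIX system. The helper name run_program is hypothetical; fork(), execvp() and waitpid() are the standard POSIX calls. This is roughly what a shell does when you type a command name, though real shells do considerably more.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run the program named by argv[0] (searched for on PATH) and wait for it.
   Returns the program's exit status, 127 if it could not be executed,
   or -1 if fork/waitpid failed or the child did not exit normally. */
int run_program(char *const argv[])
{
    assert(argv != NULL && argv[0] != NULL);

    pid_t pid = fork();              /* create a child process */
    if (pid < 0) {
        perror("fork");
        return -1;
    }
    if (pid == 0) {
        /* Child: replace this process image with the executable. */
        execvp(argv[0], argv);
        perror("execvp");            /* reached only if exec failed */
        _exit(127);
    }

    /* Parent: wait for the child to finish. */
    int status;
    if (waitpid(pid, &status, 0) < 0) {
        perror("waitpid");
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, passing an argv array of { "./two_plus_one", NULL } would run the executable built above, assuming it is in the current directory.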
The default name for an executable is a.out so programmers sometimes informally refer to any executable as an “a.out”.
We left out two important steps!
Preprocessor
    #include <stdio.h>
    #define TWO 2
    int main(int argc, char *argv[]) {
        printf("The sum is %d\n", sum(1,TWO));
    }

Before the compiler even sees the code…
…the preprocessor rewrites it, handling all #define, #include and #ifdef directives and performing macro substitution…
The #include and #define lines are gone before the compiler sees the code
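A small sketch of what the preprocessor does, using hypothetical names (SQUARE, VERBOSE, BUILD_KIND, two_squared, build_kind) that are not in the original slides. Each macro use is replaced textually before compilation, and the #ifdef picks one of the two definitions depending on whether VERBOSE was defined, for example with gcc -DVERBOSE.

```c
#include <assert.h>   /* assert() is itself a preprocessor macro */

#define TWO 2
#define SQUARE(x) ((x) * (x))

/* Conditional compilation: only one of these definitions survives. */
#ifdef VERBOSE
#define BUILD_KIND "verbose"
#else
#define BUILD_KIND "quiet"
#endif

int two_squared(void)
{
    int result = SQUARE(TWO);   /* the compiler sees ((2) * (2)) */
    assert(result == 4);        /* compiled out if NDEBUG is defined */
    return result;
}

const char *build_kind(void)
{
    return BUILD_KIND;          /* the compiler sees a plain string literal */
}
```

Running gcc -E on a file like this shows the rewritten text that the compiler proper receives.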
Preprocessor used for sharing declarations
arith.h:
    int sum(int a, int b);

two_plus_one.c:
    #include <stdio.h>
    #include "arith.h"
    int main(int argc, char *argv[]) {
        printf("The sum is %d\n", sum(1,2));
    }

arith.c:
    #include "arith.h"
    int sum(int a, int b) {
        return a+b;
    }

Caller and callee agree on the function prototype for sum()
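The same arrangement can be sketched in one file, with comments marking what would normally be three separate files. The include guard (ARITH_H) and the helper two_plus_one() are assumptions added for illustration: the guard is a common convention, not something the slides show.

```c
#include <assert.h>

/* ---- what would be arith.h ---- */
#ifndef ARITH_H              /* include guard (hypothetical name) */
#define ARITH_H
int sum(int a, int b);       /* prototype shared by caller and callee */
#endif

/* ---- what would be arith.c (it includes arith.h too) ---- */
int sum(int a, int b)
{
    return a + b;
}

/* ---- what would be in two_plus_one.c ---- */
int two_plus_one(void)
{
    int result = sum(1, 2);  /* checked against the prototype above */
    assert(result == 3);
    return result;
}
```

Because the file defining sum() sees the same prototype, a definition with a different return type or parameter list would be reported by the compiler as conflicting with the declaration, rather than silently disagreeing with its callers.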
We also left out the assembler step
The object code in a .o is binary (not human-readable)
Assembly language is a human-readable form of machine code
– Symbolic names for machine instructions
– Symbolic labels for addresses (like variables and branch targets in code)
– Etc.
When you run gcc -c it actually does three steps:
– Run the preprocessor
– Run the compiler itself to create an assembler file
– Run the assembler to create a .o
Normally, we do these steps together, but you can use switches to run them separately
Common invocations of gcc
gcc -c two_plus_two.c                         Runs preprocessor, compiler & assembler to make two_plus_two.o
gcc -c arith.c                                Same: makes arith.o
gcc -o two_plus_two two_plus_two.o arith.o    Uses ld to link .o files + system libraries to make the two_plus_two executable
gcc -E two_plus_two.c                         Runs just the preprocessor (output goes to standard output)
gcc -S two_plus_two.c                         Runs just the preprocessor & compiler; produces assembler source in a .s file
gcc -c two_plus_two.s                         Notices the .s extension and runs just the assembler
Putting it All Together
Compiling a program
Source for main():
    #include <stdio.h>
    int main(int argc, char *argv[]) {
        printf("The sum is %d\n", sum(1,2));
    }
  -> Preprocessor (cpp) -> preprocessed source -> Compiler (cc1) -> assembler source -> Assembler (as) -> .o file

Source for sum():
    int sum(int a, int b) {
        return a+b;
    }
  -> Preprocessor (cpp) -> preprocessed source -> Compiler (cc1) -> assembler source -> Assembler (as) -> .o file

Both .o files -> Linker (ld) -> two_plus_two (executable)
Shared Libraries (not required for COMP 40)
(these slides on shared libraries were used in COMP 111…you may find them interesting to read)
Ooops! Where does printf come from?
gcc -o two_plus_one two_plus_one.o arith.o libc.a

two_plus_one.o (relocatable object code for main()) + arith.o (relocatable object code for sum()) + libc.a
  ->  two_plus_one (executable program)
Routines like printf live in libraries.
These are created with the “ar” command, which packages up several .o files together into a “.a” archive or library. You can list the .a along with your separate .o files and ld will pull from it any .o files it needs.
printf used to live in the system library named libc.a, which the compiler links automatically into the executable (so you don’t have to list it).
Why shared libraries?
Problem: if printf is linked from libc.a, then we get a separate copy in each program that uses printf
Idea: what if we could have one copy and use memory mapping to put it into every executable that needs it?
Challenges:
– We can’t link it when ld builds the rest of the executable: we can just note we need it
– The same copy is likely to be mapped at different addresses in different programs
Solution: compiler, linker and OS work together to support shared libraries
– gcc -fPIC -c printf.c generates “position-independent code” that can load at any address
– gcc -shared -o libc.so printf.o xxx.o obj3.o creates a shared library
– gcc -o two_plus_one two_plus_one.o arith.o libc.so links the program against the shared library
We’ll use printf as an example even though it’s built in to the system…
Compile the source with -fPIC to make a position-independent .o file.
Link that printf.o and any other files with the -shared option to create a shared library (.so) file.
The linker recognizes .so files…instead of including the code, it leaves a little stub that tells the OS to find and map the shared copy of the .so file when exec loads the program.
(Actually, libc.so is so widely used that it’s automatically linked, so you don’t need to list it as you would your own .so libraries).
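One way to see the effect of this, sketched under the assumption of a typical Linux system where libc is linked as a shared library: print the address of a function from your own code next to the address of printf. The helper name print_code_addresses is hypothetical. On such a system the two addresses usually fall in quite different regions of the address space, and they may change from run to run, but the exact values depend on the platform and its settings.

```c
#include <assert.h>
#include <stdio.h>

int sum(int a, int b)
{
    return a + b;
}

/* Prints where one of our own functions and one libc function ended up
   in this process. Returns the number of characters printed. */
int print_code_addresses(void)
{
    /* Casting a function pointer to void * is not strictly ISO C,
       but it is supported on POSIX systems. */
    void *ours = (void *)sum;
    void *from_libc = (void *)printf;

    assert(ours != NULL && from_libc != NULL);
    return printf("sum() is at %p, printf() is at %p\n", ours, from_libc);
}
```

Calling print_code_addresses() from two different programs is a simple way to observe that the same library routine need not appear at the same address in each.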
Memory mapping allows sharing of .so libraries
[Figure: a CPU and MAIN MEMORY holding the OPERATING SYSTEM and three running programs (Angry Birds, Play Video, Browser). The address spaces of Angry Birds and the Browser each contain: Stack (call stack), Text (code), Static initialized data, Static uninitialized data, Heap (malloc’d), argv and environ, and a mapping of libc.so.]
libc.so (with printf code) shows up at different locations in the two programs
Only one copy lives in memory… everyone shares it!
Memory mapping hardware can do this…
Code must be position-independent!