Printf Case Study
Prerequisites: Basic Pointers, Arrays, String Literals, Stack Frames
Goal: Reinforce our understanding of pointers by solving a real problem that requires them (can't be
solved without pointers).
Overview: In this case study we'll exploit our knowledge of stack frames on Visual Studio for x86 and
write some C code that reads parameter values directly from the stack. This C code will be specific to
Visual Studio on x86 platforms (i.e., it is NOT portable code and is NOT a good example of how to solve
this type of problem generally), but it will very nicely demonstrate how we can draw diagrams of the
internal state of our program variables and then use those diagrams to write correct, non-trivial code
using pointers.
C/C++ language constructs and concepts demonstrated: character strings, int* pointers, char* pointers,
char** pointers
Background
Printf is a function we use all the time and we rarely (if ever) give any thought to how it works. We know
it performs output (to the screen), and that it will format and output just about anything we want it to.
For example, we can output something simple like:
printf("Hello World!\n");
Or, we can output something more complex like:
printf("The circuit's impedance is %f + %fj\n", real_imp, imag_imp);
Somehow, printf has to determine both what values we want to display and how we want those values
formatted. As programmers we know to use the % sequences (such as %f in the example above) to
describe what and how we want things formatted, but how does printf do its thing? In this case study
we’ll write our own printf function. We’ll only implement a small fraction of the functionality provided
by the Standard C library’s printf, but we’ll cover most of the important aspects of printf. As we go
through this case study we’ll discover that pointers are the key to being able to extract the values from
the stack. We’ll do some pointer arithmetic to calculate the address of the values we want on the stack,
we’ll declare pointers of the correct type to ensure that we read these values correctly, and we’ll even
spend a little bit of time (very little) discussing how we format those values for output.
The putchar function
We’re not going to build the entire functionality of printf from scratch. Our starting point will be the
putchar function. Putchar is a very simple function (from the Standard C library) that formats and
outputs a single ASCII character. For example, if you invoke putchar(65), then you will get the letter “A”
displayed on the screen. That’s because in the ASCII table, row 65 is assigned the character “A”. Of
course, displaying a character actually requires a fairly complex bit of hardware and software to light up
all the right pixels on your display, not to mention the esoteric art of designing character fonts – cool
stuff, but way outside the bounds of what we need to learn in EE312.
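To make the character-code idea concrete, here is a minimal sketch that uses nothing but putchar to display a run of ASCII codes. The helper name show_ascii_range is made up for this illustration, and the sketch assumes an ASCII platform.

```c
#include <stdio.h>

/* Hypothetical helper: output the characters whose ASCII codes lie in
 * [first, last], followed by a newline. Returns how many characters
 * (not counting the newline) were sent to putchar. */
int show_ascii_range(int first, int last) {
    int count = 0;
    for (int code = first; code <= last; code += 1) {
        putchar(code); /* e.g., putchar(65) displays 'A' */
        count += 1;
    }
    putchar('\n');
    return count;
}
```

Calling show_ascii_range(65, 70) displays ABCDEF, since codes 65 through 70 are assigned to those six letters.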
Printf version 1, just the basics
We’re ready to jump in and get started. Printf, in its most basic usage is actually quite simple. Given an
invocation like:
printf("Hello World!\n");
All we need to do is output each of the ASCII characters one-at-a-time using putchar. A loop will take
care of that easily.
void printf_v1(char fmt[]) {
    uint32_t k = 0;
    while (fmt[k] != 0) {
        putchar(fmt[k]);
        k += 1;
    }
}
Code review
In this incarnation of printf, we have just a single parameter. I’ve named the parameter “fmt”,
which is shorthand for “format string”. I like that name because the first argument provided to
printf is the formatting instructions for this output operation – it tells us what characters are to
be displayed, and also contains the %f, %d and similar formatting instructions for all the other
outputs. I’ve followed my own personal style of declaring fmt using the array declaration syntax
(i.e., the []), even though I know full well that fmt is actually a pointer. I use the [] syntax
whenever I have a parameter that points to an array. I use the * syntax to declare parameters
that point at single variables. Of course, the compiler doesn’t care, and both “char fmt[]” and
“char* fmt” mean the same thing.
You’ll also note that I’m using a while loop rather than a for loop. That’s another element of
my style where I try to use for when the iteration takes place over a well defined range (e.g.,
from 1 to 10), and where I try to use while when the iteration continues until something
special happens. In this case, I don’t know in advance how many characters are in the format
string, so I use a while loop that continues until it detects the terminating zero at the end of the
format string.
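Since “char fmt[]” and “char* fmt” mean the same thing for a parameter, the same loop can also be written by moving the pointer itself instead of an index. The sketch below is only an illustration of that equivalence; the name put_string and the returned character count are additions for this example, not part of the case study.

```c
#include <stdio.h>

/* Hypothetical variant of printf_v1: walks the string with a pointer
 * instead of an index. Returns the number of characters output. */
unsigned put_string(const char *s) {
    unsigned count = 0;
    while (*s != 0) { /* stop at the terminating zero */
        putchar(*s);
        s += 1;       /* advance to the next character */
        count += 1;
    }
    return count;
}
```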
Finally, you may have noticed that I describe the terminating zero as 0. That, of course, is what it
is – the number zero. Some programmers prefer to write that zero using the syntax ‘\0’. Please
parameter fmt and increment p by 1. I actually like the second strategy better, but let’s start with the
first. In the diagram below, I’ve gone ahead and removed the stack from main (it’s really not interesting
to us), and I’ve added the actual array of characters for the format string. Note that (just as it always is),
the array argument is not actually on the stack with the parameters. The real argument is an address to
the first character in the array (in our case fmt is a pointer to the letter ‘t’). You’ll notice that the fmt
parameter points to the first character in our array, and also that the array ends with two numbers. The
first number is 10, which is the actual value of ‘\n’ (newline). In ASCII, the new line command code is the
10th entry in the ASCII table. Sometimes students will get confused and think that ‘\n’ is actually two
characters – it’s not. The new line character is just that, a single character, which happens to have ASCII
value 10. The second number is the zero which marks the end of the string. I’ve also assumed that I’ve
executed the statement “p = &k;” setting p to be equal to the address of k, in other words, making p
point to k.
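The claim that ‘\n’ is one character with value 10, followed by a terminating zero, is easy to check. The sketch below prints the numeric code of every element of a string; dump_codes is a made-up helper name, it uses the Standard C library’s printf for brevity, and it assumes an ASCII platform.

```c
#include <stdio.h>

/* Hypothetical helper: print the numeric code of every element of a
 * string, including the terminating zero. Returns the element count. */
unsigned dump_codes(const char *s) {
    unsigned k = 0;
    for (;;) {
        printf("%d ", s[k]);
        if (s[k] == 0) {
            break; /* the zero marks the end of the string */
        }
        k += 1;
    }
    printf("\n");
    return k + 1; /* characters plus the terminating zero */
}
```

For the two-character string "d\n" this reports three elements: the code for ‘d’, then 10, then 0.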
The diagram shows the addresses that result from the pointer arithmetic p + 1, p + 2, etc. Recall that
Visual Studio uses eight bytes of storage on the stack to implement function return (four bytes for the
return address plus another four bytes to store the copy of the old frame pointer). Based on the diagram
we can clearly see that p + 3 points at the first parameter (fmt), and p + 4 points at the memory location
which contains the second argument, 42. Since this argument is one of the “extra” arguments that are
permitted by our printf(char fmt[], …) declaration, the argument has no name. The ONLY way we can
access this argument is by calculating its address. Using a diagram, we can easily calculate the pointer
arithmetic expression to find this address, resulting in the code shown below.
        } else { // fmt[k] is the beginning of an escape sequence, e.g., %d
            /* I'm just going to assume %d for now */
            int32_t x = *p;
            displayDecimal(x);
            k = k + 2; // we add 2 to skip the % and the d, and then resume our loop.
        }
    }
}
There are a couple of things worth noting. First of all, this version of printf is far from done. One big
mistake is that it always assumes that ‘%’ is followed by ‘d’. As a result, the function doesn’t work for
%c, %f, %s or any other escape sequence. Also, the function is limited to working with only a single extra
argument. If there’s more than one %d in the format string, then the function just prints the same
argument over and over (the pointer p never moves, so each time we go to print an argument we
always print the same one). Still the function works just fine for our simple example
printf_v2("the number is %d\n", 42);
One other thing worth noting is the use of the function displayDecimal. This function takes the integer
argument and converts that argument into a sequence of ASCII characters. As humans we often forget
that this step is even necessary; we instinctively think of something like “42” being a number, even
when it quite clearly is a sequence ‘4’ followed by ‘2’. In our computer program, we have to actually
manually identify and then output each character that makes up the number. A simple function to do
that is shown below.
void displayDecimal(int32_t x) {
    if (x == 0) { // special case for 0
        putchar('0');
        return;
    }
    if (x < 0) { // special case for negative values
        putchar('-');
        x = -x; // caveat: this overflows if x is the most negative int32_t
        // fall through and display the absolute value
    }
    /* we can now assume x > 0 */
    /* extract the digits in x from least to most significant */
    char digits[10]; // int32_t is at most 2 billion, so at most 10 characters
    uint32_t num_digits = 0; // the actual number of digits
    while (x != 0) {
        uint32_t d = x % 10; // least significant digit
        char c = d + '0'; // ASCII representation of d
        /* store the characters in an array so we can reverse them */
        digits[num_digits] = c;
        num_digits += 1;
        /* continue to the next digit of x */
        x = x / 10;
    }
    /* now print the digits in reverse order */
    while (num_digits > 0) {
        num_digits -= 1;
        putchar(digits[num_digits]);
    }
}
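One case the simple function does not cover is the most negative int32_t, whose absolute value does not fit in an int32_t. The sketch below shows one possible way around that by extracting digits from an unsigned magnitude. It is not part of the case study: the name format_decimal is hypothetical, and it writes into a caller-supplied array rather than calling putchar so that the result is easy to inspect.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical variant of displayDecimal: writes the characters into
 * out (which must hold at least 12 chars: sign + 10 digits + zero)
 * and returns the number of characters, not counting the final zero. */
size_t format_decimal(int32_t x, char out[]) {
    /* Work on the magnitude as an unsigned value so that the most
     * negative int32_t is handled as well. */
    uint32_t mag = (x < 0) ? 0u - (uint32_t) x : (uint32_t) x;
    char digits[10];
    size_t num_digits = 0;
    do { /* do/while also covers x == 0 without a special case */
        digits[num_digits] = (char) ('0' + mag % 10);
        num_digits += 1;
        mag /= 10;
    } while (mag != 0);

    size_t len = 0;
    if (x < 0) {
        out[len] = '-';
        len += 1;
    }
    while (num_digits > 0) { /* copy the digits in reverse order */
        num_digits -= 1;
        out[len] = digits[num_digits];
        len += 1;
    }
    out[len] = 0; /* terminating zero */
    return len;
}
```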
Summary (printf_v2)
Printf has only one formal parameter (in our case, we call this parameter “fmt”). However, printf
can have extra arguments. These arguments do not have names and can only be accessed using
their address.
Calculating the address of a variable in memory requires that you have a diagram showing you
the location of that variable relative to other variables. In our case, we used our detailed
knowledge of Visual Studio’s stack frame to draw a diagram illustrating the position of the
unnamed extra argument “42” relative to the named variables “k” and “fmt”.
We chose to read the argument from the stack using a pointer (named p). By referring to our
diagram we concluded that “p = &k + 4” was the correct arithmetic. Note that since p is declared
to be an int32_t* pointer, the +4 in our arithmetic is actually going to increase the address
stored inside p by 16 – the addition is scaled by the size of int32_t, i.e., multiplied by four.
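The scaling can be observed portably by measuring, in bytes, how far apart p and p + n are. The sketch below does this inside an ordinary array instead of the stack; byte_distance is a made-up helper name for the illustration.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical helper: how many bytes apart are p and p + n?
 * The caller must keep p + n inside (or one past the end of) the
 * array that p points into. */
ptrdiff_t byte_distance(int32_t *p, int n) {
    /* char is one byte, so subtracting char* values counts bytes */
    return (char *) (p + n) - (char *) p;
}
```

With an array such as int32_t a[8], byte_distance(a, 4) is 16, matching the +4 arithmetic described above.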
Printf version 3 – a string argument and %s
Of course decimal is not the only format we want to use when producing output, and %d is far from the
only escape option provided by printf. Let’s consider the escape sequence %s which will format and
display a string argument. Consider:
printf_v3("Hello %s\n", "Craig");
In this case we have two string arguments. The first string argument is bound to the formal parameter
“fmt”. The second string argument, “Craig”, will be an unnamed extra argument. To access this
argument we will need to calculate its address (just like we did with the 42 in printf_v2). Before jumping
into the pointer arithmetic, it is worthwhile to remind ourselves exactly what the string argument
“Craig” is. In the C programming language, strings are arrays (arrays of characters with a zero at the
end). Furthermore arrays, when used as arguments to functions, are passed using the address of the
first character of that array. So, in this case, the unnamed argument is actually going to be the address
of the ASCII ‘C’ in an array of six characters, ‘C’, ‘r’, ‘a’, ‘i’, ‘g’, 0. Like all addresses in 32-bit Windows, this
address in Visual Studio will be four bytes long. The following diagram shows the stack frame.
Since we’ve not changed the number of arguments from the previous example, and all the arguments
are coincidentally the same size, we can continue to use the same code to extract the extra argument
from the stack. Naturally, we don’t want to format this argument in decimal anymore, so we’ll use the
function displayString instead.
void displayString(char str[]) {
    uint32_t k = 0;
    while (str[k] != 0) {
        putchar(str[k]);
        k += 1;
    }
}
Other than changing the function we use to format the output, printf itself is not changed.
In our platform, addresses are the same size (and same binary encoding) as numbers. We can
extract the string extra argument using the same code that we used to extract the integer extra
argument in version 2.
The C programming language considers the types of our variables to be very important, and
consults the type of each variable before determining if an expression is legal. Using an integer
variable where an address (pointer) is expected is illegal in C, even if the number stored in the
variable is the correct address. To get around this problem, we can use type casts. A type cast
will often not do anything other than tell the compiler that the operation should be legal and to
compile it as written. On our platform, this is always the case for the pointer casts we use here
(e.g., type casting to char*): the cast itself generates no machine code. The type
cast essentially just becomes the manual override button that the programmer presses to tell
the compiler to shut up and just generate the machine code.
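As a small illustration of the cast acting as an override, the sketch below stores an address in an integer variable and then converts it back. The helper names are hypothetical, and the integer type uintptr_t (from stdint.h, an integer wide enough to hold an address) is used here instead of int32_t so that the example also runs on 64-bit machines.

```c
#include <stdint.h>

/* Hypothetical helper: store an address in an integer variable. */
uintptr_t as_number(const char *str) {
    return (uintptr_t) str;
}

/* Hypothetical helper: treat a number as the address of a string.
 * The cast is what makes this assignment acceptable to the compiler. */
const char *as_string(uintptr_t raw) {
    return (const char *) raw;
}
```

Passing the address of "Craig" through as_number and then as_string yields the original address, so the string can be displayed as before.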
Printf version 4, cleaning up the code
In our last version of printf, I want to accomplish two things. First, the code is incredibly ugly. Most
importantly by declaring the variable p as an int32_t* the code is incredibly misleading. We don’t know
that p actually points to an integer. It might point to an address (%s) or it might even point to a floating
point number (%f). I want to correct this and declare p using a type that documents only what I know
about that address (and at the same time, I’m going to give this variable a new name). The second thing
I want to do is to improve the functionality of printf so that it will print multiple arguments. To make
that happen, I’ll need to add some pointer arithmetic to increment p each time we extract an argument.
As long as we’re working on yet another version of printf, I might as well give the code a thorough
cleaning and add in the additional cases for %c and %f. A heads up though, I’m not going to bother
actually writing displayFloat as a function. Extracting the binary encoding for IEEE floating point and
creating a sequence of ASCII characters to represent that number is way outside of the goals for this
example.
First up on the docket is to replace the variable p with a new variable, “next_arg”. In our program
next_arg will always be the address of the next extra argument (if there is one). So, we’ll initialize
next_arg to be the address of the first extra argument, and each time we see a valid % sequence, we’ll
increment next_arg so that it becomes the address of the next argument. I’d like to give next_arg the
correct type, which for this case is quite clearly “void*”. In C/C++ the type void* is a generic pointer. We
use that type when we have an address, but we don’t know what type of information is stored at that
address. That’s perfect for this case where I know that next_arg is the address of the next argument, but
I don’t yet know whether that argument is an integer, a float, a character or a string.
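A short sketch of what “generic” means in practice: the same void* parameter can carry the address of an integer or of a double, and the value is read by first copying the address into a pointer of the right type. The helper names read_int and read_double are made up for this illustration.

```c
#include <stdint.h>

/* Hypothetical helper: read an int32_t from a generic address. */
int32_t read_int(const void *addr) {
    const int32_t *p = (const int32_t *) addr; /* now the type is known */
    return *p;
}

/* Hypothetical helper: read a double from a generic address. */
double read_double(const void *addr) {
    const double *p = (const double *) addr;
    return *p;
}
```

The caller is responsible for choosing the helper that matches what is really stored at the address; nothing in the void* itself records the type.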
As part of my code cleaning, I’m going to initialize next_arg to be &fmt + 1 rather than &k + 4. As we can
tell from our diagram, either bit of arithmetic calculates the correct address. I prefer &fmt + 1 since this
will still be the correct address even if I create additional local variables (&k + 4 is correct only as long as
k is the first local variable – declare a local variable before k in the program and the whole thing breaks).
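Both expressions still depend on the Visual Studio x86 stack layout. For comparison only, the sketch below shows how portable C reaches the extra arguments through the Standard C library’s stdarg.h macros (va_start, va_arg, va_end). It is not the approach of this case study; the names printf_portable, put_str and put_dec are hypothetical, and returning the number of characters output is an addition made for this example.

```c
#include <stdarg.h>
#include <stdio.h>

/* Output a string with putchar; returns the number of characters. */
static int put_str(const char *s) {
    int n = 0;
    while (s[n] != 0) {
        putchar(s[n]);
        n += 1;
    }
    return n;
}

/* Output an int in decimal with putchar; returns the number of characters. */
static int put_dec(int x) {
    char digits[12];
    int num_digits = 0;
    int n = 0;
    unsigned mag = (x < 0) ? 0u - (unsigned) x : (unsigned) x;
    do {
        digits[num_digits] = (char) ('0' + mag % 10);
        num_digits += 1;
        mag /= 10;
    } while (mag != 0);
    if (x < 0) {
        putchar('-');
        n += 1;
    }
    while (num_digits > 0) {
        num_digits -= 1;
        putchar(digits[num_digits]);
        n += 1;
    }
    return n;
}

/* Hypothetical portable sketch: supports %d, %s and %c; any other
 * character after % (e.g., %%) is output literally. */
int printf_portable(const char *fmt, ...) {
    va_list args; /* plays the role of a pointer to the next extra argument */
    int n = 0;
    va_start(args, fmt); /* start just after the last named parameter */
    for (int k = 0; fmt[k] != 0; k += 1) {
        if (fmt[k] != '%') {
            putchar(fmt[k]);
            n += 1;
            continue;
        }
        k += 1; /* look at the character after the % */
        if (fmt[k] == 0) {
            break; /* stray % at the very end of the string */
        }
        if (fmt[k] == 'd') {
            n += put_dec(va_arg(args, int));
        } else if (fmt[k] == 's') {
            n += put_str(va_arg(args, char *));
        } else if (fmt[k] == 'c') {
            putchar(va_arg(args, int)); /* char arguments arrive as int */
            n += 1;
        } else {
            putchar(fmt[k]);
            n += 1;
        }
    }
    va_end(args);
    return n;
}
```

Each va_arg call names the type to read and then advances to the following argument, which is the same read-then-increment step the case study performs by hand with pointer arithmetic.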
The main loop for printf is slightly more complicated because I’m adding cases for %f and %c (more on
that later). The biggest change to the main loop is caused by the fact that in C/C++ I cannot legally
dereference a void* pointer. Specifically in this case, even though next_arg is the correct address, I can’t
read from that address using *next_arg. The reason I can’t read from that location is that since next_arg
is a generic pointer, the compiler has no idea how many bytes I want to read (or how to interpret the
bits contained inside those bytes). For example, next_arg could be the address of a character, or
next_arg could be the address of a float. We don’t know (yet), which is why we declared the pointer to be
void* in the first place. Well, the compiler doesn’t know either, so it cannot possibly create machine
code for an expression like *next_arg. To get around this problem, I’m going to resurrect my variable p.
Actually, I’m going to create a whole bunch of variables, each named p, and each with exactly the
correct type to match the argument being read. Here’s the final code.
void printf(char* fmt, ...) {
    void* next_arg = &fmt + 1; // address of next "extra" argument
    uint32_t k = 0;
    while (fmt[k] != 0) {
        if (fmt[k] != '%') {
            putchar(fmt[k]);
            k += 1;
        } else {
            // fmt[k] is the beginning of an escape sequence, e.g., %d
            if (fmt[k + 1] == 'd') { // %d case
                int32_t* p = (int32_t*) next_arg;
                next_arg = p + 1;
                displayDecimal(*p);
            } else if (fmt[k + 1] == 's') { // %s case
                char** p = (char**) next_arg;
                next_arg = p + 1;
                displayString(*p);
            } else if (fmt[k + 1] == 'f') { // %f case
                double* p = (double*) next_arg;
                next_arg = p + 1;
                displayFloat(*p);
            } else if (fmt[k + 1] == 'c') { // %c case
                int* p = (int*) next_arg;
                next_arg = p + 1;
                putchar(*p);
            } else { // either %% or error
                putchar('%');
            }
            k += 2; // we add 2 to skip the % and the format character, and then resume our loop.
        } // end of %? escape sequence
    }
}
The first escape sequence case in the code is for %d. In this case, next_arg will be the address of an
integer. Accordingly, I declare a variable named p of type int32_t* and I copy the address from next_arg
to p. The C/C++ programming language mandates that I use a type cast when I copy this address.
However, the type cast doesn’t do anything, it just tells the compiler to go ahead and copy the address
into the new variable. Once I have p pointing at the right location (and declared with the correct type), I
can do my pointer arithmetic to calculate the correct address for the next extra argument. The
expression p + 1 is precisely the correct address because the 1 will be scaled by the size of the current
argument (i.e., multiplied by 4 since the current argument is an int32_t). I can also read the extra
argument using the expression *p and send that value directly to displayDecimal to handle the output.
The case for %s is almost verbatim a copy of the %d case. That’s not surprising since our diagram
illustrated how similar the two cases actually are. Again, I declare a pointer p and copy the address from
next_arg into p (with a type cast). What’s different this time is that p is declared to be char**. That type
means “a pointer to a pointer to a character”. That is, of course, precisely what next_arg is in this case.
Consider this diagram from printf_v3.
The address stored in next_arg is the address of the extra argument. That extra argument is itself an
address, specifically the address of the ‘C’ in the string “Craig”. In our diagram, next_arg is a pointer that
points to a pointer that points to a character.
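The two hops in that description can be written out one at a time. The sketch below uses an ordinary local variable to stand in for the argument slot on the stack; first_char is a made-up helper name for this illustration.

```c
/* Hypothetical helper: given the address of a location that holds a
 * string's address, follow both pointers to reach a character. */
char first_char(char **p) {
    char *str = *p; /* first hop: the address stored in the argument slot */
    return str[0];  /* second hop: the character stored at that address */
}
```

If a variable arg of type char* holds the address of the string “Craig”, then first_char(&arg) follows &arg to arg and arg to the letter ‘C’.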
In our code, as soon as we know we’re processing the case for %s we know we have a diagram like this
one. Consequently we know that next_arg is really a char**. So, we create a new variable (p) of type
char**, copy the address from next_arg into p and proceed as always. We assign next_arg the
incremented address p + 1 and we send *p to our output function displayString. It is incredibly
important to be able to recognize why char** is the correct type, and why all the code around p is