Printf Case Study

8/10/2019 Printf Case Study


Prerequisites: Basic Pointers, Arrays, String Literals, Stack Frames

Goal: Reinforce our understanding of pointers by solving a real problem that requires them (can't be

solved without pointers).

Overview: In this case study we'll exploit our knowledge of stack frames on Visual Studio for x86 and

write some C code that reads parameter values directly from the stack. This C code will be specific to

Visual Studio on x86 platforms (i.e., it is NOT portable code and is NOT a good example of how to solve

this type of problem generally), but it will very nicely demonstrate how we can draw diagrams of the

internal state of our program variables and then use those diagrams to write correct, non-trivial code

using pointers.

C/C++ language constructs and concepts demonstrated: character strings, int* pointers, char* pointers,

char** pointers

Background

Printf is a function we use all the time and we rarely (if ever) give any thought to how it works. We know

it performs output (to the screen), and that it will format and output just about anything we want it to.

For example, we can output something simple like:

printf("Hello World!\n");

Or, we can output something more complex like:

printf("The circuit's impedance is %f + %fj\n", real_imp, imag_imp);

Somehow, printf has to determine both what values we want to display and how we want those values

formatted. As programmers we know to use the % sequences (such as %f in the example above) to

describe what and how we want things formatted, but how does printf do its thing? In this case study

we’ll write our own printf function. We’ll only implement a small fraction of the functionality provided

by the Standard C library’s printf, but we’ll cover most of the important aspects of printf. As we go through this case study we’ll discover that pointers are the key to being able to extract the values from

the stack. We’ll do some pointer arithmetic to calculate the address of the values we want on the stack,

we’ll declare pointers of the correct type to ensure that we read these values correctly, and we’ll even

spend a little bit of time (very little) discussing how we format those values for output.

The putchar function



We’re not going to build the entire functionality of printf from scratch. Our starting point will be the

putchar function. Putchar is a very simple function (from the Standard C library) that formats and

outputs a single ASCII character. For example, if you invoke putchar(65), then you will get the letter “A”

displayed on the screen. That’s because in the ASCII table, row 65 is assigned the character “A”. Of

course, displaying a character actually requires a fairly complex bit of hardware and software to light up

all the right pixels on your display, not to mention the esoteric art of designing character fonts – cool stuff, but way outside the bounds of what we need to learn in EE312.

Printf version 1, just the basics

We’re ready to jump in and get started. Printf, in its most basic usage, is actually quite simple. Given an

invocation like:

printf("Hello World!\n");

All we need to do is output each of the ASCII characters one-at-a-time using putchar. A loop will take

care of that easily.

#include <stdio.h>  // putchar
#include <stdint.h> // uint32_t, int32_t

void printf_v1(char fmt[]) {
    uint32_t k = 0;
    while (fmt[k] != 0) {
        putchar(fmt[k]);
        k += 1;
    }
}

Code review

In this incarnation of printf, we have just a single parameter. I’ve named the parameter “fmt”,

which is shorthand for “format string”. I like that name because the first argument provided to

printf is the formatting instructions for this output operation – it tells us what characters are to

be displayed, and also contains the %f, %d and similar formatting instructions for all the other

outputs. I’ve followed my own personal style of declaring fmt using the array declaration syntax

(i.e., the []), even though I know full well that fmt is actually a pointer. I use the [] syntax

whenever I have a parameter that points to an array. I use the * syntax to declare parameters

that point at single variables. Of course, the compiler doesn’t care, and both “char fmt[]” and

“char* fmt” mean the same thing.

You’ll also note that I’m using a while loop rather than a for loop. That’s another element of

my style where I try to use for when the iteration takes place over a well defined range (e.g.,

from 1 to 10), and where I try to use while when the iteration continues until something

special happens. In this case, I don’t know in advance how many characters are in the format

string, so I use a while loop that continues until it detects the terminating zero at the end of the

format string.

Finally, you may have noticed that I describe the terminating zero as 0. That, of course, is what it

is – the number zero. Some programmers prefer to write that zero using the syntax ‘\0’. Please



don’t get confused, 0 and ‘\0’ are the same thing. For that matter, 0 and 0x0 are the same thing

too. Just be careful not to confuse ‘\0’ and ‘0’ which are quite different (‘0’ is actually the

number 48).
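The points about the terminating zero and about ‘0’ being the number 48 can be checked with a small sketch. The helper names count_chars and digit_to_char are hypothetical and are not part of the case study; the loop has the same shape as the one in printf_v1.

```c
#include <stddef.h>

/* Count the characters before the terminating zero, using the same
   while-loop shape as printf_v1. */
size_t count_chars(const char str[]) {
    size_t k = 0;
    while (str[k] != 0) {   /* 0, '\0' and 0x0 all spell the same value */
        k += 1;
    }
    return k;
}

/* Convert a digit value 0..9 to its ASCII character.
   '0' is the number 48, so 7 becomes 55, which is the character '7'. */
char digit_to_char(int d) {
    return (char) (d + '0');
}
```

For example, count_chars("Hello") evaluates to 5, and digit_to_char(7) evaluates to ‘7’.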

This version of printf is very limited. Printf_v1 simply prints the format string verbatim. If you try

something like:

printf_v1("the number is %d\n", 42);

then you’ll see “the number is %d” as your output. Note that the \n is handled just fine. That’s because

\n is really just the number 10 (row 10 in the ASCII table is “line feed” – i.e., new line). That’s worth

repeating, just to be sure…. \n is NOT a \ character followed by the letter n. It is a single character

representing the “new line” operation. That character happens to be 10 in the ASCII table (line feed).

However, the %d formatting code is completely ignored by printf_v1. That’s because we didn’t even

attempt to take care of this case. Time to move on to version 2.
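To confirm that \n occupies a single character, we can count the characters in the example format string. This is only an illustrative sketch: example_length and newline_code are hypothetical helpers, and the value 10 assumes an ASCII platform.

```c
#include <stddef.h>
#include <string.h>

/* Length of the example format string as counted by strlen.
   "the number is " is 14 characters, "%d" is 2, and "\n" is 1. */
size_t example_length(void) {
    return strlen("the number is %d\n");
}

/* The numeric value of the newline character. */
int newline_code(void) {
    return '\n';
}
```

If \n were a backslash followed by the letter n, the length would be 18 rather than 17.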

Printf version 2 – one decimal argument

Our first big challenge comes when we try to make printf extract a value from the stack. Consider the

invocation:

printf_v2("the number is %d\n", 42);

In this invocation we have two arguments passed to printf. The first (as always) is a format string. The

second is the value 42 (on our platform this is a 32-bit signed integer). We understand how stack frames

work, so we can certainly imagine how these two arguments would be arranged on the stack.



In the diagram, I’ve illustrated the stack frame for a myPrintf function with one formal parameter (i.e.,

one parameter that is declared in the parameter list), but has been supplied with two actual arguments.

I’ve also illustrated the stack frame so that it contains two local variables, the variable k which is used as

before to index into the format string, and a new variable p which is a pointer. This diagram corresponds

with the following function:

void printf_v2(char* fmt, ...) {
    uint32_t k;
    int32_t* p;
}

Note that the parameter list for printf_v2 has one formal argument (declared as a pointer this time, but

as I’m sure you remember array parameters and pointer parameters are the same thing – I’m using

pointer syntax in this case to remind you that the actual argument will be an address). After the

declaration of fmt I have the C/C++ ellipsis: “, …” The ellipsis means that printf_v2 is a

function that can accept extra arguments. If you declare a function with an ellipsis, then you can call that

function with as many extra arguments as you wish. The extra arguments can be any type (characters,

integers, strings, floats, etc.). You can also have zero extra arguments. For this case, the extra argument

is 42.
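This case study reads the extra arguments off the stack by hand on purpose. For comparison, standard C provides a portable way to walk the extra arguments through the <stdarg.h> header. The sketch below is not the approach used in the rest of this document, and the function sum_ints is a hypothetical example.

```c
#include <stdarg.h>

/* Add up `count` int values passed as extra arguments. */
int sum_ints(int count, ...) {
    va_list args;
    va_start(args, count);            /* begin after the last named parameter */
    int total = 0;
    for (int i = 0; i < count; i += 1) {
        total += va_arg(args, int);   /* fetch the next extra argument as an int */
    }
    va_end(args);
    return total;
}
```

A call such as sum_ints(3, 1, 2, 3) has one named argument and three extra arguments. Note that the function still needs some way to learn how many extra arguments there are and what types they have; here that is the count parameter, and in printf it is the format string.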

Now that we’ve become familiar with the terrain, we have three problems we have to solve. (1) We

need to locate the memory location where the extra argument is stored, (2) We need to determine

when the argument is supposed to be printed (i.e., where the %d is inside the format string), and (3) we

need to actually format the output in decimal. The first problem is by far the most interesting, finding

the memory location with the 42 in it. Our diagram actually makes this pretty easy. There are actually a

couple of ways I can go about finding this address. In the first method, I can make the pointer p point

at the variable k, and then I’ll increment p by 4. In the second method, I’ll make the pointer p point at the

parameter fmt and increment p by 1. I actually like the second strategy better, but let’s start with the

first. In the diagram below, I’ve gone ahead and removed the stack from main (it’s really not interesting

to us), and I’ve added the actual array of characters for the format string. Note that (just as it always is),

the array argument is not actually on the stack with the parameters. The real argument is an address to

the first character in the array (in our case fmt is a pointer to the letter ‘t’). You’ll notice that the fmt

parameter points to the first character in our array, and also that the array ends with two numbers. The first number is 10, which is the actual value of ‘\n’ (newline). In ASCII, the new line command code is the

10th entry in the ASCII table. Sometimes students will get confused and think that ‘\n’ is actually two

characters – it’s not. The new line character is just that, a single character, which happens to have ASCII

value 10. The second number is the zero which marks the end of the string. I’ve also assumed that I’ve

executed the statement “p = &k;” setting p to be equal to the address of k, in other words, making p

point to k.

The diagram shows the addresses that result from the pointer arithmetic p + 1, p + 2, etc. Recall that

Visual Studio uses eight bytes of storage on the stack to implement function return (four bytes for the

return address plus another four bytes to store the copy of the old frame pointer). Based on the diagram

we can clearly see that p + 3 points at the first parameter (fmt), and p + 4 points at the memory location

which contains the second argument, 42. Since this argument is one of the “extra” arguments that are

permitted by our printf(char fmt[], …) declaration, the argument has no name. The ONLY way we can

access this argument is by calculating its address. Using a diagram, we can easily calculate the pointer

arithmetic expression to find this address, resulting in the code shown below.

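[Diagram: stack frame for printf_v2. p points at k; p+1 and p+2 cover the eight bytes used for function return; p+3 points at fmt; p+4 points at the extra argument 42. fmt points to the character array “the number is %d”, 10, 0.]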




void printf_v2(char* fmt, ...) {
    uint32_t k = 0;
    int32_t* p = &k;
    p = p + 4; // p now points at the extra argument

    while (fmt[k] != 0) {
        if (fmt[k] != '%') {
            putchar(fmt[k]);
            k += 1;
        } else { // fmt[k] is the beginning of an escape sequence, e.g., %d
            /* I'm just going to assume %d for now */
            int32_t x = *p;
            displayDecimal(x);
            k = k + 2; // we add 2 to skip the % and the d, and then resume our loop.
        }
    }
}

There are a couple of things worth noting. First of all, this version of printf is far from done. One big

mistake is that it always assumes that ‘%’ is followed by ‘d’. As a result, the function doesn’t work for

%c, %f, %s or any other escape sequence. Also, the function is limited to working with only a single extra

argument. If there’s more than one %d in the format string, then the function just prints the same

argument over and over (the pointer p never moves, so each time we go to print an argument we

always print the same one). Still the function works just fine for our simple example

printf_v2("the number is %d\n", 42);

One other thing worth noting is the use of the function displayDecimal . This function takes the integer

argument and converts that argument into a sequence of ASCII characters. As humans we often forget that this step is even necessary; we instinctively think of something like “42” being a number, even

when it quite clearly is a sequence ‘4’ followed by ‘2’. In our computer program, we have to actually

manually identify and then output each character that makes up the number. A simple function to do

that is shown below.

void displayDecimal(int32_t x) {
    if (x == 0) { // special case for 0
        putchar('0');
        return;
    }
    if (x < 0) { // special case for negative values
        putchar('-');
        x = -x; // note: this overflows for the most negative int32_t, which this simple version ignores
        // fall through and display the absolute value
    }

    /* we can now assume x > 0 */
    /* extract the digits in x from least to most significant */
    char digits[10]; // int32_t is at most about 2 billion, so at most 10 characters
    uint32_t num_digits = 0; // the actual number of digits
    while (x != 0) {
        uint32_t d = x % 10; // least significant digit
        char c = d + '0'; // ASCII representation of d
        /* store the characters in an array so we can reverse them */
        digits[num_digits] = c;
        num_digits += 1;
        /* continue to the next digit of x */
        x = x / 10;
    }

    /* now print the digits in reverse order */
    while (num_digits > 0) {
        num_digits -= 1;
        putchar(digits[num_digits]);
    }
}
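As a variation on displayDecimal, the sketch below writes the digits into a caller-supplied buffer instead of calling putchar, which makes the result easy to inspect. It also takes the absolute value in unsigned arithmetic so that the most negative int32_t is handled. The name format_decimal and the buffer-based interface are assumptions made for this illustration only.

```c
#include <stdint.h>

/* Write the decimal text of x into out. The buffer must hold at least
   12 bytes: a sign, up to 10 digits, and the terminating zero. */
void format_decimal(int32_t x, char out[]) {
    uint32_t magnitude;
    uint32_t pos = 0;
    if (x < 0) {
        out[pos] = '-';
        pos += 1;
        /* negate in unsigned arithmetic so the most negative value is safe */
        magnitude = 0u - (uint32_t) x;
    } else {
        magnitude = (uint32_t) x;
    }

    /* extract the digits from least to most significant */
    char digits[10];
    uint32_t num_digits = 0;
    do {
        digits[num_digits] = (char) ('0' + magnitude % 10);
        num_digits += 1;
        magnitude /= 10;
    } while (magnitude != 0);   /* do-while also covers x == 0 */

    /* copy the digits out in reverse order */
    while (num_digits > 0) {
        num_digits -= 1;
        out[pos] = digits[num_digits];
        pos += 1;
    }
    out[pos] = 0;               /* terminating zero */
}
```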

Summary (printf_v2)

Printf has only one formal parameter (in our case, we call this parameter “fmt”). However, printf

can have extra arguments. These arguments do not have names and can only be accessed using

their address.

Calculating the address of a variable in memory requires that you have a diagram showing you

the location of that variable relative to other variables. In our case, we used our detailed

knowledge of Visual Studio’s stack frame to draw a diagram illustrating the position of the unnamed extra argument “42” relative to the named variables “k” and “fmt”.

We chose to read the argument from the stack using a pointer (named p). By referring to our

diagram we concluded that “p = &k + 4” was the correct arithmetic. Note that since p is declared

to be an int32_t* pointer, the +4 in our arithmetic is actually going to increase the address

stored inside p by 16 – the addition is scaled by the size of int32_t, i.e., multiplied by four.
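The scaling of pointer arithmetic can be observed without touching the stack. The sketch below measures the distance in bytes between base and base + n inside an ordinary array; byte_distance is a hypothetical helper written for this illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Number of bytes between base and base + n for an int32_t pointer.
   base + n must stay within (or one past the end of) the same array. */
ptrdiff_t byte_distance(int32_t* base, int n) {
    int32_t* moved = base + n;            /* scaled by sizeof(int32_t) */
    return (char*) moved - (char*) base;  /* char* arithmetic counts bytes */
}
```

With an array of five int32_t values, byte_distance(array, 4) is 16, matching the “+4 increases the address by 16” observation above.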

Printf version 3 – a string argument and %s

Of course decimal is not the only format we want to use when producing output, and %d is far from the

only escape option provided by printf. Let’s consider the escape sequence %s which will format and

display a string argument. Consider:

printf_v3("Hello %s\n", "Craig");

In this case we have two string arguments. The first string argument is bound to the formal parameter

“fmt”. The second string argument, “Craig”, will be an unnamed extra argument. To access this



argument we will need to calculate its address (just like we did with the 42 in printf_v2). Before jumping

into the pointer arithmetic, it is worthwhile to remind ourselves exactly what the string argument

“Craig” is. In the C programming language, strings are arrays (arrays of characters with a zero at the

end). Furthermore arrays, when used as arguments to functions, are passed using the address of the

first character of that array. So, in this case, the unnamed argument is actually going to be the address

of the ASCII ‘C’ in an array of six characters, ‘C’, ‘r’, ‘a’, ‘i’, ‘g’, 0. Like all addresses in 32-bit Windows, this address in Visual Studio will be four bytes long. The following diagram shows the stack frame.

Since we’ve not changed the number of arguments from the previous example, and all the arguments

are coincidentally the same size, we can continue to use the same code to extract the extra argument

from the stack. Naturally, we don’t want to format this argument in decimal anymore, so we’ll use the

function displayString instead.

void displayString(char str[]) {
    uint32_t k = 0;
    while (str[k] != 0) {
        putchar(str[k]);
        k += 1;
    }
}

Other than changing the function we use to format the output, printf itself is not changed.

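[Diagram: stack frame for printf_v3. p points at k; p+3 points at fmt, which points to the character array “Hello %s”, 10, 0; p+4 points at the extra argument, which points to the character array “Craig”, 0.]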




void printf_v3(char* fmt, ...) {
    uint32_t k = 0;
    int32_t* p = &k;
    p = p + 4;

    while (fmt[k] != 0) {
        if (fmt[k] != '%') {
            putchar(fmt[k]);
            k += 1;
        } else { // fmt[k] is the beginning of an escape sequence, e.g., %s
            /* I'm just going to assume %s for now */
            int32_t x = *p;
            displayString(x);
            k = k + 2; // we add 2 to skip the % and the s, and then resume our loop.
        }
    }
}

Conceptually, this version of printf does the right thing for printf(“Hello %s”, “Craig”); However, the

compiler balks at our invocation of displayString(x); The compiler is concerned that we declared x to be

an int32_t (i.e., a number) and yet the function displayString needs an argument that is an address (i.e.,

a pointer). In other words, the compiler thinks we made a mistake. Actually, it’s the compiler that’s

mistaken here. We know our code is correct because the code matches precisely our diagram (and our

diagram is correct). After p = p + 4, our pointer p points at the location on the stack where the second

(extra) argument is stored. We know that this memory location contains the address of the letter ‘C’ in

our string “Craig”. So, by reading from *p and storing the result in the variable x, we are storing the

address of the letter ‘C’ in the variable x. This address is precisely the address that displayString needs in

order for displayString to print out “Craig”. So, we’re right, the compiler is wrong. What do we do?

The situation calls for a type cast expression. In this case, I’m going to declare an additional variable (q)

and specify that q is type “char*”. Then I’ll use a type cast to convert the value of x into an address and

store that address in q.

int32_t x = *p;
char* q;
q = (char*) x; // type cast expression
displayString(q);

Type casts in C/C++ allow you to explicitly convert from one type to another. In our case, we want to

convert from an integer (x) to an address. We know that addresses really are numbers, after all, so this

conversion isn’t actually a conversion at all – the value in q is going to be precisely the same value that

was in x. However, since x and q are different types, the language considers them to be different. The

type cast is required in order to satisfy the language’s type system, but that type cast doesn’t do

anything. “q = (char*) x;” means exactly what “q = x;” means, copy the number in x and store that

number in the variable q.



IMPORTANT: Any type cast expression involving pointers in the C programming language will not do

any actual conversion. In fact, if you want to understand what is happening, it’s best to completely

ignore the type cast when reviewing the code.
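The claim that a pointer cast only relabels a number can be checked with a round trip from address to integer and back. The sketch below uses uintptr_t rather than int32_t as the integer type, which is a substitution made so that the example also works on platforms where addresses are wider than 32 bits; round_trip is a hypothetical helper.

```c
#include <stdint.h>

/* Convert an address to an integer and back again. */
char* round_trip(char* s) {
    uintptr_t as_number = (uintptr_t) (void*) s;     /* the address viewed as a number */
    char* as_pointer = (char*) (void*) as_number;    /* the same number viewed as an address */
    return as_pointer;
}
```

The pointer that comes back compares equal to the one that went in, so reading through it reaches the same characters.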

Now that we can display both %s and %d we should add the case-selection code to our program so that

it correctly selects between strings and decimals. While the switch keyword can be used, I actually

prefer to stick with the more general if-then-else for most of my case selection. So, printf_v3 looks like

this:


void printf_v3(char* fmt, ...) {
    uint32_t k = 0;
    int32_t* p = &k;
    p = p + 4;

    while (fmt[k] != 0) {
        if (fmt[k] != '%') {
            putchar(fmt[k]);
            k += 1;
        } else { // fmt[k] is the beginning of an escape sequence, e.g., %s
            if (fmt[k+1] == 'd') { // %d case
                int32_t x = *p;
                displayDecimal(x);
            } else if (fmt[k+1] == 's') { // %s case
                int32_t x = *p;
                char* q = (char*) x;
                displayString(q);
            } else { // default case (an error!)
                /* do nothing */
            }
            k = k + 2; // we add 2 to skip the % and the format letter, and then resume our loop.
        }
    }
}

As you can see, we have three cases currently in our code. The first case is for %d sequences, the second

is for %s sequences. We can distinguish between these two cases by examining the value of fmt[k+1].

Since fmt[k] is the ‘%’ character, then fmt[k+1] will be either a ‘d’ or an ‘s’. Well, I suppose it’s possible

that fmt[k+1] is neither ‘d’ nor ‘s’. For now, that’s an error and since we don’t know what to do, I’m

going to structure the code so that it ignores that error.

Printf version 3 summary

A string argument is a pointer – the address of the first character in an array of characters.



In our platform, addresses are the same size (and same binary encoding) as numbers. We can

extract the string extra argument using the same code that we used to extract the integer extra

argument in version 2.

The C programming language considers the types of our variables to be very important, and

consults the type of each variable before determining if an expression is legal. Using an integer

variable where an address (pointer) is expected is illegal in C, even if the number stored in the variable is the correct address. To get around this problem, we can use type casts. A type cast

will often not do anything other than tell the compiler that the operation should be legal and to

compile it as written. In the case of type casts using pointer types (e.g., type casting to char*)

this is always the case and a type cast using a pointer will never actually do anything. The type

cast essentially just becomes the manual override button that the programmer presses to tell

the compiler to shut up and just generate the machine code.

Printf version 4, cleaning up the code

In our last version of printf, I want to accomplish two things. First, I want to clean up the code, which is incredibly ugly. Most

importantly, by declaring the variable p as an int32_t* the code is incredibly misleading. We don’t know

that p actually points to an integer. It might point to an address (%s) or it might even point to a floating

point number (%f). I want to correct this and declare p using a type that documents only what I know

about that address (and at the same time, I’m going to give this variable a new name). The second thing

I want to do is to improve the functionality of printf so that it will print multiple arguments. To make

that happen, I’ll need to add some pointer arithmetic to increment p each time we extract an argument.

As long as we’re working on yet another version of printf, I might as well give the code a thorough

cleaning and add in the additional cases for %c and %f. A heads up though, I’m not going to bother

actually writing displayFloat as a function. Extracting the binary encoding for IEEE floating point and

creating a sequence of ASCII characters to represent that number is way outside of the goals for this

example.

First up on the docket is to replace the variable p with a new variable, “next_arg”. In our program

next_arg will always be the address of the next extra argument (if there is one). So, we’ll initialize

next_arg to be the address of the first extra argument, and each time we see a valid % sequence, we’ll

increment next_arg so that it becomes the address of the next argument. I’d like to give next_arg the

correct type, which for this case is quite clearly “void*”. In C/C++ the type void* is a generic pointer. We

use that type when we have an address, but we don’t know what type of information is stored at that

address. That’s perfect for this case where I know that next_arg is the address of the next argument, but

I don’t yet know whether that argument is an integer, a float, a character or a string.

As part of my code cleaning, I’m going to initialize next_arg to be &fmt + 1 rather than &k + 4. As we can tell from our diagram either bit of arithmetic calculates the correct address. I prefer &fmt + 1 since this

will still be the correct address even if I create additional local variables (&k + 4 is correct only as long as

k is the first local variable – declare a local variable before k in the program and the whole thing breaks).


void* next_arg = &fmt + 1;
uint32_t k = 0;



The main loop for printf is slightly more complicated because I’m adding cases for %f and %c (more on

that later). The biggest change to the main loop is caused by the fact that in C/C++ I cannot legally

dereference a void* pointer. Specifically in this case, even though next_arg is the correct address, I can’t

read from that address using *next_arg. The reason I can’t read from that location is that since next_arg

is a generic pointer, the compiler has no idea how many bytes I want to read (or how to interpret the

bits contained inside those bytes). For example, next_arg could be the address of a character, or next_arg could be the address of a float. We don’t know (yet), which is why we declared the pointer to be

void* in the first place. Well, the compiler doesn’t know either, so it cannot possibly create machine

code for an expression like *next_arg. To get around this problem, I’m going to resurrect my variable p.

Actually, I’m going to create a whole bunch of variables, each named p, and each with exactly the

correct type to match the argument being read. Here’s the final code.

void printf(char* fmt, ...) {
    void* next_arg = &fmt + 1; // address of next "extra" argument
    uint32_t k = 0;

    while (fmt[k] != 0) {
        if (fmt[k] != '%') {
            putchar(fmt[k]);
            k += 1;
        } else {
            // fmt[k] is the beginning of an escape sequence, e.g., %d
            if (fmt[k + 1] == 'd') { // %d case
                int32_t* p = (int32_t*) next_arg;
                next_arg = p + 1;
                displayDecimal(*p);
            } else if (fmt[k + 1] == 's') { // %s case
                char** p = (char**) next_arg;
                next_arg = p + 1;
                displayString(*p);
            } else if (fmt[k + 1] == 'f') { // %f case
                double* p = (double*) next_arg;
                next_arg = p + 1;
                displayFloat(*p);
            } else if (fmt[k + 1] == 'c') { // %c case
                int* p = (int*) next_arg;
                next_arg = p + 1;
                putchar(*p);
            } else { // either %% or error
                putchar('%');
            }
            k += 2; // we add 2 to skip the % and the format letter, and then resume our loop.
        } // end of %? escape sequence
    }
}
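The pattern used in each case of the code above (give the generic address a type, step past the value, then read it) can be tried on its own. The sketch below applies it to an ordinary array rather than to the stack, so it does not depend on Visual Studio’s frame layout. The helper name take_int is hypothetical.

```c
#include <stdint.h>

/* Read an int32_t from a generic address and advance that address past
   the value. next_arg is passed by address so that it can be updated. */
int32_t take_int(void** next_arg) {
    int32_t* p = (int32_t*) *next_arg;  /* give the address a type */
    *next_arg = p + 1;                  /* step over one int32_t (4 bytes) */
    return *p;                          /* now the compiler knows what to read */
}
```

Starting with a void* that holds the address of an array containing 42 and 7, two successive calls return 42 and then 7, and the void* ends up one element past the end of the array.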

The first escape sequence case in the code is for %d. In this case, next_arg will be the address of an

integer. Accordingly, I declare a variable named p of type int32_t* and I copy the address from next_arg



to p. C++ mandates that I use a type cast when I copy this address (C will accept the copy without one).

However, the type cast doesn’t do anything, it just tells the compiler to go ahead and copy the address

into the new variable. Once I have p pointing at the right location (and declared with the correct type), I

can do my pointer arithmetic to calculate the correct address for the next extra argument. The

expression p + 1 is precisely the correct address because the 1 will be scaled by the size of the current

argument (i.e., multiplied by 4 since the current argument is an int32_t). I can also read the extra argument using the expression *p and send that value directly to displayDecimal to handle the output.

The case for %s is almost verbatim a copy of the %d case. That’s not surprising since our diagram

illustrated how similar the two cases actually are. Again, I declare a pointer p and copy the address from

next_arg into p (with a type cast). What’s different this time is that p is declared to be char**. That type

means “a pointer to a pointer to a character”. That is, of course, precisely what next_arg is in this case.

Consider this diagram from printf_v3.

The address stored in next_arg is the address of the extra argument. That extra argument is itself an

address, specifically the address of the ‘C’ in the string “Craig”. In our diagram, next_arg is a pointer that

points to a pointer that points to a character.
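The “pointer to a pointer to a character” idea can be modelled with an array of string addresses standing in for the argument slots on the stack. This is only a sketch under that assumption, and the helper names string_in_next_slot and first_char are hypothetical.

```c
/* Given the address of a slot that holds a string address, return the
   string whose address is stored in the slot that follows it. */
const char* string_in_next_slot(const char** p) {
    const char** next = p + 1;  /* moves by one pointer-sized slot */
    return *next;               /* the char* stored in that slot */
}

/* First character of the string whose address is stored at p. */
char first_char(const char** p) {
    return **p;                 /* *p is a char*, **p is a char */
}
```

If the slots hold the addresses of “Hello %s\n” and “Craig”, then string_in_next_slot applied to the first slot yields “Craig”, and first_char applied to the second slot yields ‘C’.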

In our code, as soon as we know we’re processing the case for %s we know we have a diagram like this

one. Consequently we know that next_arg is really a char**. So, we create a new variable (p) of type

char**, copy the address from next_arg into p and proceed as always. We assign next_arg the

incremented address p + 1 and we send *p to our output function displayString. It is incredibly

important to be able to recognize why char** is the correct type, and why all the code around p is

correct (“p + 1” and not “*p + 1” or “&p + 1” for example). It takes a little while to sink in, but the code is

correct because the code precisely matches the diagram (and the diagram is correct).

Finally we have two cases, one for %c and one for %f. Both these cases match the case for %d with the

obvious substitution of displayFloat instead of displayDecimal for %f and putchar instead of

displayDecimal for %c. There is one odd thing going on, and that’s that for %f I used a double* pointer

(instead of float*) and for %c I used an int* pointer instead of char*. The reason I used these pointer

types is a rule in the C standard known as the default argument promotions. The rule applies to the extra

arguments passed through the ellipsis, since the compiler has no declared parameter type to check them

against. Any float passed as an extra argument is promoted to double. A similar thing

happens with characters: char (and short) extra arguments are always promoted to

int. Since the argument for %f is going to be a double, I have to use double* to read this argument

(otherwise I’d only read half the bytes). Since the argument for %c is going to be int, I have to use int* to

read this argument. Note that I used int here instead of the more specific int32_t. The C standard

doesn’t say that char is promoted to 32-bit ints, only that it’s promoted to int.
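For comparison with the hand-built version, here is a sketch of the same main loop written with the standard <stdarg.h> facility, which hides the stack layout and therefore is portable. It is not the method developed in this case study. The function mini_format and its helpers are hypothetical names, the output goes into a caller-supplied buffer instead of to putchar so the result can be inspected, and %f is left out just as displayFloat was.

```c
#include <stdarg.h>
#include <stddef.h>

/* Append c to out if there is room (cap counts the terminating zero). */
static size_t append_char(char out[], size_t len, size_t cap, char c) {
    if (len + 1 < cap) {
        out[len] = c;
        len += 1;
    }
    return len;
}

/* Append the decimal text of x, extracting the least significant digit first. */
static size_t append_decimal(char out[], size_t len, size_t cap, int x) {
    /* negate in unsigned arithmetic so the most negative int is safe */
    unsigned int magnitude = (x < 0) ? 0u - (unsigned int) x : (unsigned int) x;
    char digits[24];
    size_t n = 0;
    do {
        digits[n] = (char) ('0' + magnitude % 10);
        n += 1;
        magnitude /= 10;
    } while (magnitude != 0);
    if (x < 0) {
        len = append_char(out, len, cap, '-');
    }
    while (n > 0) {             /* emit the digits in reverse order */
        n -= 1;
        len = append_char(out, len, cap, digits[n]);
    }
    return len;
}

/* Format fmt into out (capacity cap), handling %d, %s, %c and %%. */
void mini_format(char out[], size_t cap, const char fmt[], ...) {
    if (cap == 0) {
        return;
    }
    va_list next_arg;               /* plays the role of next_arg in the case study */
    va_start(next_arg, fmt);        /* start just after the last named parameter */
    size_t len = 0;
    size_t k = 0;
    while (fmt[k] != 0) {
        if (fmt[k] != '%') {
            len = append_char(out, len, cap, fmt[k]);
            k += 1;
        } else if (fmt[k + 1] == 0) {   /* lone % at the very end of the string */
            len = append_char(out, len, cap, '%');
            k += 1;
        } else {
            char spec = fmt[k + 1];
            if (spec == 'd') {
                len = append_decimal(out, len, cap, va_arg(next_arg, int));
            } else if (spec == 's') {
                const char* s = va_arg(next_arg, char*);
                for (size_t i = 0; s[i] != 0; i += 1) {
                    len = append_char(out, len, cap, s[i]);
                }
            } else if (spec == 'c') {
                /* char arguments arrive promoted to int */
                len = append_char(out, len, cap, (char) va_arg(next_arg, int));
            } else {                    /* either %% or an unknown sequence */
                len = append_char(out, len, cap, '%');
            }
            k += 2;                     /* skip the % and the format letter */
        }
    }
    out[len] = 0;                       /* terminating zero */
    va_end(next_arg);
}
```

Each va_arg call does the two jobs that the typed pointer p does in the final code above: it reads the current extra argument with the stated type, and it advances to the next one. The type named in va_arg has to be the promoted type (int for %c), for the same reason the case study uses int* and double*.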
