pdfcrowd.com ope n in br owser PRO v ers ion Are you a developer? Try out the HTML to PDF API C Language Gotchas This document describes some easy-to-make mistakes in C. 1.1 <ctype.h> And Character Ty pes 1.2 <stdio.h> And Character Types 2. printf Format Strings 3. Unboun ded String Operatio ns 4. Type Ass umpt ions 5. Null Pointer Constants 6. I/O Error Checking 1. Character Data 1.1 <ctype.h> And Char acter Types Th e char data type mig ht , in any implementation, be si gned or un sig ned. Howev er, the fun ctions de fined i n <ctype.h> don't automatically work well with this: th e st andar d say s: In all cases the argument is an int, the value of which shall b e represent able as an unsigned charor shall be equal to the value ofEOF. This means that if you write code like this, then you run the risk of passing illegal values to the ctype functions: size_t count_whitespace(const char *s) { const char *t = s;
This document is posted to help you gain knowledge. Please leave a comment to let me know what you think about it! Share it to your friends and learn new things together.
There are a couple of other points worth making here:
If you assign a character picked out of a string to an integer variable, the cast should be applied at that point,not just before passing the character to a ctype function.
You can see the idea that you're supposed to use the unsigned representation of characters surface in thedefinition of functions such as strcmp and memcmp, which are defined as interpreting the characters they
compare as unsigned char values.
1.2 <stdio.h> And Character Types
Characters obtained directly from stdio functions such as getc don't require casting before being used in a
pdfcrowd comopen in browser PRO version Are you a developer? Try out the HTML to PDF API
<ctype.h> function, as the return value is either an unsigned char converted to an int, or EOF.
Characters returned from stdio functions are usually assigned to int variables, not char or unsigned char variables,
since if you do the latter you can no longer correctly check for EOF. Consider for example this code:
char c;
c = getchar();while(c != EOF) {
...c = getchar();
}
The loop may never terminate: if char is an unsigned type then EOF will be converted to some positive value. On
systems with where char is signed, there is a more subtle bug. Suppose for example that EOF is -1 - then if
character 255 is read it will be converted to the value -1 and terminate the input prematurely.
The conventional way to write this code is:
int c;
c = getchar();while(c != EOF) {
...c = getchar();
}
There is still a problem. Although EOF is guaranteed to be negative, and thus distinct from the value of anyunsigned char, it is not guaranteed to be different from any such value when converted to an int. This is never a
problem on many common systems; but it is at least a theoretical possibility permitted by the C standard. Imagine for
pdfcrowd comopen in browser PRO version Are you a developer? Try out the HTML to PDF API
instance a 16-bit word-addressed machine with a compiler that didn't try to pack multiple characters to the word butinstead made char and int identical types. Then we would have (int)(unsigned char)65535 == -1.
(There are certainly word-addressed machines. The second half of the condition might be a reasonable designdecision for a compiler vendor with customers who weren't very interested in text processing).
You might feel that you would be in good company if you ignored systems where the natural character and integer
types were the same size, but this article would certainly be incomplete if it did so. To work correctly on suchsystems, the proper way to write the above loop would be:
int c;
c = getchar();while(!feof(stdin) && !ferror(stdin)) {
...
c = getchar();}
You should be careful to consider the effect of end of file or error on any tests you make on these values. Considerthis loop, intended to scan all characters up to the next whitespace character received:
int c;
c = getchar();while(!isspace(c)) {
...c = getchar();
}
If EOF is returned before any whitespace is detected then this loop may never terminate (since it is not a whitespace
pdfcrowd comopen in browser PRO version Are you a developer? Try out the HTML to PDF API
character). A better way to write this would be:
int c;
c = getchar();while(!feof(stdin) && !ferror(stdin) && !isspace(c)) {
...
c = getchar();}
Finally, it is worth noting that although EOF is usually -1, all the standard promises is that it is a negative integral
constant with type int.
2. printf Format Strings
Conversions contained in printf format strings require specific types to appear in the argument list: for example %d
requires an int argument, and %.*s requires an int followed by a character pointer.
Unfortunately, the nature of the printf interface is such that the compiler can only do very limited type-checking, and
is forbidden from doing the implicit conversion to the correct type that ordinary function calls have in C.
If you have something like, say, a pid_t object, and want to print out its numeric value, then the temptation is to just write %d and rely on the fact that pid_t objects do indeed often promote to a type compatible with int. For example,on my system, gcc compiles this without complaint, and it runs fine:
pdfcrowd comopen in browser PRO version Are you a developer? Try out the HTML to PDF API
printf("$$=%d\n", getpid());return 0;
}
But such facts are rarely guaranteed, and may not be true on some other system that your code will one day be
ported to. My usual approach is to cast explicitly to the widest available type of the appropriate signedness, andused the correct conversion specification for the resulting type. For example:
printf("$$ = %ld\n", (long)getpid());
This points up a possible problem: what if PIDs are bigger than longs? Of course this problem was here in theprevious version - only worse, as the problem was with the (potentially) smaller type. Answers might be:
Use a pair of macros; but this make the code hard to read, and therefore hard to maintain.
Use intmax_t; but this relies on a C99 implementation, or an implementation that supports this type as anextension, which may limit portability.
Use long long; but this relies on a C99 implementation, or an implementation that supports this type as anextension, which may limit portability, and there's no sure guarantee that the type you're trying to print won't be
even wider than long long.
Ignore the problem and hope that nobody ever violates your assumptions (but this might not even be safe with
PIDs, never mind any other data type). Increasingly risky as 64-bit machines become more widespread.
2.1 %.*s And Pointer Differences
A common idiom used to print a substring delimited by two character pointers is to use %.*s with the ".*" part being
satisfied by the difference of the two pointers. This, for example, compiles without complaint on my system:
But the pointer difference actually has type ptrdiff_t, not the int that printf will be expecting. On a machine with 64-
bit pointers but 32-bit ints, this becomes a serious problem; a compiler than can check the format strings would,hopefully, at least produce a warning. The correct way to write this is to explicitly cast the pointer difference to an int:
printf("%.*s\n", (int)(p2 - p1), p1);
This points up the other problem with this approach - what if the pointer difference is so large that it won't fit an int
(which might be as small as 16 bits)? This problem was always there; it's just that it was less noticeable before.Answers might be:
Check that the pointer difference is in range before casting it.
Limit your strings to a suitable size earlier in the code.
Ignore the problem, and assume that available memory will limit your strings to a suitable size (which is not a
bad bet when int and the machine's address space are both 32 bits, but falls over if you port to machines with
larger address spaces or smaller ints.) Increasingly risky as 64-bit machines become more widespread.
2.2 Format String Vulnerabilities
A "format string vulnerability" is where the programmer passes a user-supplied string to some function (typically e.g.
printf and syslog) that will then interpret it in a way that can potentially be exploited to trick the program into doingsomething the author didn't intend. In some programs this is just a bug, but if the program runs with more privilege
pdfcrowd comopen in browser PRO version Are you a developer? Try out the HTML to PDF API
For example, suppose that the program has read some data from the user into the buffer somevar and now wantsto print it out. The wrong way to do it would be:
printf(somevar);
A hostile user can choose the string they send in such a way as to extract information from the program or even to
modify its data and internal workings. (Ask Google about "exploiting format string vulnerabilities" for moreinformation about this. The point of this document is to teach you how not to be vulnerable to these exploits, not toteach you how to construct them.)
The correct way to do what the programmer usually intends in this situation is:
printf("%s", somevar);
or:
fputs(somevar, stdout);
3. Unbounded String OperationsFunctions such as strcpy, strcat and sprintf have a well-known misfeature: they don't bound the size of the resultingstring, which means that whatever buffer has been allocated for the result can be overrun if some check is not made.
Formally, overrunning the buffer just results in undefined behaviour. This might just make a program unreliable, but itmight also be a security hole.
(This assumes that buffer is an array, not a char *. If it is just a pointer, sizeof will return the wrong value, and
you must find the size some other way.)
Use snprintf instead of sprintf. This ensures that only a buffer of known size will be updated, and returns the
number of characters (excluding the terminating 0) that would be required. The disadvantage is that you have
to rely on having C99 or a C library which supports snprintf as an extension.
Another worry with snprintf is that not every implementation chooses the same conventions for the return
value: some old versions return -1 on overflow instead of the number of characters that would have been writtenif space were available. See also http://www.unixpapa.com/incnote/stdio.html.
Use strncpy instead of strcpy. However, you should be aware of two things; firstly strncpy does not
guarantee to null-terminate the result if it is too long, merely to truncate it; and secondly that if there is space for
a terminator, it fills up the rest of the buffer with 0s, not just the one byte that is necessary. To use this functionsafely, it is therefore necessary to explicitly set the last byte of the buffer to 0 after calling it; for example:
Use strlcpy and strlcat instead of strcpy and strcat. Unfortunately these functions are not yet defined in anystandard.
Dynamically allocate sufficient memory for all result strings, rather than relying on fixed-size buffers and boundchecking. The downside of this is that you then have much more memory allocation to worry about, and
(particularly for sprintf) it can be tiresome to calculate all the lengths in advance. Example:
that it be an arithmetic type (which includes float and double). It doesn't guarantee that it has a resolution ofseconds, nor even that it increases with time.
Even some UNIX standards don't guarantee a starting point of 1970.
The answer to this is to avoid making any unguaranteed assumptions about time_t. This might be quite hard work,
as the functions defined by the C standard are not powerful enough for many applications.
Also, there's quite a bit of code available for free that already assumes the traditional UNIX time_t, and if you want to
use it, that means you need a matching time representation.
One answer is to define your own time type, which you can guarantee to have the required representation, and use
conversion functions to convert between your type and the system's time_t. On most (all?) UNIX systems, andanything that has a compatible time_t, this type can be a synonym for time_t and the conversion functions need do
nothing. On other systems the type should be defined as a suitable integer type, and the conversion functions
depend on what the system's time_t is like.
If you define your type as long, then, since that type may be only 32 bits wide, your program may stop working in
2038.
Another alternative is not to care and declare that your program will never work on systems with unusual time
representations. But you may come to regret doing this.
5. Null Pointer Constants
The comp.lang.c FAQ has an entire section on null pointers. There are some points worth making here, though.
http://slidepdf.com/reader/full/c-language-gotchas 12/18df di b PRO i A d l ? T t th HTML t PDF API
An integral constant expression with the value 0, or such an expression cast to void * , is called a null pointer
constant. If a null pointer constant is assigned to or compared for equality to a pointer, the constant is converted to
a pointer of that type. Such a pointer, called a null pointer, is guaranteed to compared unequal to a pointer to any
object or function.
Firstly, despite the name a "null pointer constant" need not actually be a null pointer according to the definition
above; it will only become one in a context where a pointer is expected. For instance outside a pointer context the
constant 0 has type int despite being a "null pointer constant".
This matters when the programmer knows that a pointer is expected but the compiler does not. Variadic functionsare one situation in which this case arises. Consider this code fragment:
printf("%p\n", 0);
%p expects a pointer argument (specifically void *) but the compiler does not know this - it is worked out at runtimeinstead. Consequently the argument is not converted to a pointer at all (despite technically being a null pointer
constant) and so the wrong type of argument is passed. The resulting behaviour is undefined (it might crash, forinstance). The correct way to write this is to use an explicit cast:
printf("%p\n", (void *)0);
Secondly, NULL is guaranteed only to be a null pointer constant - not necessarily to be a null pointer. Again, thisdoes matter. Consider this code fragment:
printf("%p\n", NULL);
If NULL is defined as 0, as is legitimate, then this is no more correct than the original version. If you want to use
http://slidepdf.com/reader/full/c-language-gotchas 13/18df di b PRO i
Are you a developer? Try out the HTML to PDF API
NULL then you must still cast it:
printf("%p\n", (void *)NULL);
You might reasonably wonder what the point of NULL is in that case. I generally do not use it in my own code, butsome people like it - it makes it clearer that a null pointer is intended.
Thirdly, only constants are null pointer constants. (The name makes more sense here.) A variable that happens tohave the value 0 might not become a null pointer when converted to an appropriate pointer type. So this fragment:
int x = 0;printf("%p\n", (void *)x);
...might perfectly legitimately produce different output to the previous examples.
Finally, beware of using the word "null" to describe things other than null pointers or null pointer constants in the
context of C; it can lead to confusion. I use "character 0" or '\0' instead of "the null character" or "NUL", and "theempty string" instead "the null string". This is a point about human language rather than what you write in a C
program, though.
6. I/O Error Checking
There are a few points worth making regarding checking for I/O errors.
Firstly, a common mistake is to fail to close output streams before calling exit. Consider for instance this program:
http://slidepdf.com/reader/full/c-language-gotchas 14/18df di b PRO iAre you a developer? Try out the HTML to PDF API
#include <stdio.h>#include <stdlib.h>
int main(void) {if(printf("Hello, world\n") < 0) {
perror("writing to stdout");exit(EXIT_FAILURE);
}exit(EXIT_SUCCESS);
}
This program correctly checks for an error when calling printf; however if standard output is buffered then the
greeting will not actually be written until the stream is closed by exit. If the write succeeds all is well, but if it fails, thenthere is no way for the C library to report the failure. This applies when main returns, as well as when exit is called.
The library could change the successful termination indication to a failed termination indication in this situation.
However, the standard does not require this and most existing C libraries do not do it, so although it would be agood idea if they did, we can't rely on it as application authors.
One possible corrected version of the program is therefore this:
#include <stdio.h>#include <stdlib.h>
int main(void) {if(printf("Hello, world\n") < 0
|| fclose(stdout) < 0) {perror("writing to stdout");exit(EXIT_FAILURE);
}exit(EXIT_SUCCESS);
}
S dl i i h i i h h i b EOF d i i 1 2 b i i ibl h lid
http://slidepdf.com/reader/full/c-language-gotchas 15/18df di b PRO iAre you a developer? Try out the HTML to PDF API
Secondly, it is worth re-iterating here the point about EOF made in section 1.2 above; it is possible that some valid
character may compare equal to EOF, so calls to getc, getchar, fgetc, putc, putchar and fputc must use explicitcalls to ferror and (for the read functions) feof to check for an error or the end of the file, rather than checking the
return value.
ungetc doesn't set the error indicator even when it fails, so it's not possible to reliably check for errors unless you
know that the argument isn't equal to EOF. On the other hand, the standard says "one character of pushback is
guaranteed", so this shouldn't be an issue if you only ever push at most one character back before reading again.
These issues are only a practical problem when the char and int types have the same representation, but portable
code cannot assume that they do not.
Finally, the error returns from the various <stdio.h> functions are completely inconsistent. The table below
summarizes:
Function Successful Return Error Returnfclose zero EOF (negative)
fflush zero EOF (negative)
fgetc character read use ferror/feof
fgetpos zero nonzero
fgets pointer to array null pointer
fprintf number of characters (non-negative) negative
fputc character written use ferror
fputs non-negative EOF (negative)
fopen pointer to stream null pointer
fread elements read elements read
freopen pointer to stream null pointer
fscanf number of conversions (non negative) EOF or not all conversions
pdfcrowd.comopen in browser PRO version Are you a developer? Try out the HTML to PDF API
vscanf number of conversions (non-negative) EOF or not all conversions
vsnprintf number of characters that would be written (non-negative) [2] negative
vsprintf number of characters (non-negative) negative
[1] the standard text is "The fseek function returns nonzero only for a request that cannot be satisfied.", which seems
unambiguous but is oddly different in style from most of the other return value specifications.
[2] early versions of Glibc had a broken implementation of snprintf and vsnprintf: they would return -1 on overflow
rather than the number of characters that would be written. Broken versions of these functions are quite common,
especially in older C libraries.
Afterword
Quite a few of the examples here focus on variadic functions. In particular they look at printf - but this is just becausethat is one of the more complex variadic functions that C programmers usually encounter. The underlying problem is
that C's compile-time type checking mechanisms do not apply in the case of variadic functions.
The same problems occur with unprototyped functions. However, it is usually reasonable to regard these as obsolete
and always avoid them. The one exception is for code that must be portable to very old machines; personally I takethe view that this consideration only ever applies to code necessary to bring up a modern compiler targetted on such
machines.
It would be useful to do away with these problems even in the context of variadic functions, though. There is some
hope. In the case of functions with the same interface as printf, scanf or strftime, GNU C can be told to do type
checking based on the format string.
Obviously this doesn't help for variadic functions with different requirements, though. execlp and similar functions in
UNIX could have compile-time type checking in the same way (a common bug is to fail to make the terminating null
pointer actually a null pointer) but gcc does not currently support this