Top Banner
pdfcrowd.com ope n in br owser PRO v ers ion Are you a developer? Try out the HTML to PDF API C Language Gotchas This document describes some easy-to-make mistakes in C. 1.1 <ctype.h> And Character Ty pes 1.2 <stdio.h> And Character Types 2. printf Format Strings 3. Unboun ded String Operatio ns 4. Type Ass umpt ions 5. Null Pointer Constants 6. I/O Error Checking 1. Character Data 1.1 <ctype.h> And Char acter Types Th e cha r data type mig ht , in any implementation, be si gned or un sig ned. Howev er, the fun ctions de fined i n <ctype.h> don't automatically work well  with this: th e st andar d say s: In all cases the argument is an int , the value of which shall b e represent able as an unsigned char or shall be equal to the value of EOF . This means that if you write code like this, then you run the risk of passing illegal values to the ctype functions: size_t count_whitespace(const char *s) { const char *t = s;
18

C Language Gotchas

Apr 05, 2018

Download

Documents

56mani78
Welcome message from author
This document is posted to help you gain knowledge. Please leave a comment to let me know what you think about it! Share it to your friends and learn new things together.
Transcript
Page 1: C Language Gotchas

8/2/2019 C Language Gotchas

http://slidepdf.com/reader/full/c-language-gotchas 1/18

pdfcrowd comopen in browser PRO version Are you a developer? Try out the HTML to PDF API

C Language Gotchas

This document describes some easy-to-make mistakes in C.

1.1 <ctype.h> And Character Types

1.2 <stdio.h> And Character Types

2. printf Format Strings3. Unbounded String Operations

4. Type Assumptions

5. Null Pointer Constants6. I/O Error Checking

1. Character Data

1.1 <ctype.h> And Character Types

The cha r data type might, in any implementation, be signed or unsigned. However, the functions defined in

<ctype.h> don't automatically work well with this: the standard says:

In all cases the argument is an int , the value of which shall be representable as an unsigned char or shall be

equal to the value of EOF .

This means that if you write code like this, then you run the risk of passing illegal values to the ctype functions:

size_t count_whitespace(const char *s) {const char *t = s;

Page 2: C Language Gotchas

8/2/2019 C Language Gotchas

http://slidepdf.com/reader/full/c-language-gotchas 2/18

pdfcrowd comopen in browser PRO version Are you a developer? Try out the HTML to PDF API

while(isspace(*t)) /* possibly *t < 0 */++t;

return t - s;}

One approach would be to pass all your character strings around explicitly using unsigned characters. But this is

inconvenient when you have to interwork with other functions that haven't been designed with this approach in mind -

for example the string handling functions found in the standard library.

The approach I adopt is to use an explicit cast. The above example would thus become:

size_t count_whitespace(const char *s) {const char *t = s;while(isspace((unsigned char)*t))

++t;

return t - s;}

There are a couple of other points worth making here:

If you assign a character picked out of a string to an integer variable, the cast should be applied at that point,not just before passing the character to a ctype function.

You can see the idea that you're supposed to use the unsigned representation of characters surface in thedefinition of functions such as strcmp and memcmp, which are defined as interpreting the characters they

compare as unsigned char values.

1.2 <stdio.h> And Character Types

Characters obtained directly from stdio functions such as getc don't require casting before being used in a

Page 3: C Language Gotchas

8/2/2019 C Language Gotchas

http://slidepdf.com/reader/full/c-language-gotchas 3/18

pdfcrowd comopen in browser PRO version Are you a developer? Try out the HTML to PDF API

<ctype.h> function, as the return value is either an unsigned char converted to an int, or EOF.

Characters returned from stdio functions are usually assigned to int variables, not char or unsigned char variables,

since if you do the latter you can no longer correctly check for EOF. Consider for example this code:

char c;

c = getchar();while(c != EOF) {

...c = getchar();

}

The loop may never terminate: if char is an unsigned type then EOF will be converted to some positive value. On

systems with where char is signed, there is a more subtle bug. Suppose for example that EOF is -1 - then if

character 255 is read it will be converted to the value -1 and terminate the input prematurely.

The conventional way to write this code is:

int c;

c = getchar();while(c != EOF) {

...c = getchar();

}

There is still a problem. Although EOF is guaranteed to be negative, and thus distinct from the value of anyunsigned char, it is not guaranteed to be different from any such value when converted to an int. This is never a

problem on many common systems; but it is at least a theoretical possibility permitted by the C standard. Imagine for

Page 4: C Language Gotchas

8/2/2019 C Language Gotchas

http://slidepdf.com/reader/full/c-language-gotchas 4/18

pdfcrowd comopen in browser PRO version Are you a developer? Try out the HTML to PDF API

instance a 16-bit word-addressed machine with a compiler that didn't try to pack multiple characters to the word butinstead made char and int identical types. Then we would have (int)(unsigned char)65535 == -1.

(There are certainly word-addressed machines. The second half of the condition might be a reasonable designdecision for a compiler vendor with customers who weren't very interested in text processing).

You might feel that you would be in good company if you ignored systems where the natural character and integer

types were the same size, but this article would certainly be incomplete if it did so. To work correctly on suchsystems, the proper way to write the above loop would be:

int c;

c = getchar();while(!feof(stdin) && !ferror(stdin)) {

...

c = getchar();}

You should be careful to consider the effect of end of file or error on any tests you make on these values. Considerthis loop, intended to scan all characters up to the next whitespace character received:

int c;

c = getchar();while(!isspace(c)) {

...c = getchar();

}

If EOF is returned before any whitespace is detected then this loop may never terminate (since it is not a whitespace

Page 5: C Language Gotchas

8/2/2019 C Language Gotchas

http://slidepdf.com/reader/full/c-language-gotchas 5/18

pdfcrowd comopen in browser PRO version Are you a developer? Try out the HTML to PDF API

character). A better way to write this would be:

int c;

c = getchar();while(!feof(stdin) && !ferror(stdin) && !isspace(c)) {

...

c = getchar();}

Finally, it is worth noting that although EOF is usually -1, all the standard promises is that it is a negative integral

constant with type int.

2. printf Format Strings

Conversions contained in printf format strings require specific types to appear in the argument list: for example %d

requires an int argument, and %.*s requires an int followed by a character pointer.

Unfortunately, the nature of the printf interface is such that the compiler can only do very limited type-checking, and

is forbidden from doing the implicit conversion to the correct type that ordinary function calls have in C.

If you have something like, say, a pid_t object, and want to print out its numeric value, then the temptation is to just write %d and rely on the fact that pid_t objects do indeed often promote to a type compatible with int. For example,on my system, gcc compiles this without complaint, and it runs fine:

#include <stdio.h>#include <unistd.h>

int main(void) {

Page 6: C Language Gotchas

8/2/2019 C Language Gotchas

http://slidepdf.com/reader/full/c-language-gotchas 6/18

pdfcrowd comopen in browser PRO version Are you a developer? Try out the HTML to PDF API

printf("$$=%d\n", getpid());return 0;

}

But such facts are rarely guaranteed, and may not be true on some other system that your code will one day be

ported to. My usual approach is to cast explicitly to the widest available type of the appropriate signedness, andused the correct conversion specification for the resulting type. For example:

printf("$$ = %ld\n", (long)getpid());

This points up a possible problem: what if PIDs are bigger than longs? Of course this problem was here in theprevious version - only worse, as the problem was with the (potentially) smaller type. Answers might be:

Use a pair of macros; but this make the code hard to read, and therefore hard to maintain.

Use intmax_t; but this relies on a C99 implementation, or an implementation that supports this type as anextension, which may limit portability.

Use long long; but this relies on a C99 implementation, or an implementation that supports this type as anextension, which may limit portability, and there's no sure guarantee that the type you're trying to print won't be

even wider than long long.

Ignore the problem and hope that nobody ever violates your assumptions (but this might not even be safe with

PIDs, never mind any other data type). Increasingly risky as 64-bit machines become more widespread.

2.1 %.*s And Pointer Differences

A common idiom used to print a substring delimited by two character pointers is to use %.*s with the ".*" part being

satisfied by the difference of the two pointers. This, for example, compiles without complaint on my system:

Page 7: C Language Gotchas

8/2/2019 C Language Gotchas

http://slidepdf.com/reader/full/c-language-gotchas 7/18

pdfcrowd comopen in browser PRO version Are you a developer? Try out the HTML to PDF API

#include <stdio.h>

void t(const char *p1, const char *p2) {printf("%.*s\n", p2 - p1, p1);

}

But the pointer difference actually has type ptrdiff_t, not the int that printf will be expecting. On a machine with 64-

bit pointers but 32-bit ints, this becomes a serious problem; a compiler than can check the format strings would,hopefully, at least produce a warning. The correct way to write this is to explicitly cast the pointer difference to an int:

printf("%.*s\n", (int)(p2 - p1), p1);

This points up the other problem with this approach - what if the pointer difference is so large that it won't fit an int

(which might be as small as 16 bits)? This problem was always there; it's just that it was less noticeable before.Answers might be:

Check that the pointer difference is in range before casting it.

Limit your strings to a suitable size earlier in the code.

Ignore the problem, and assume that available memory will limit your strings to a suitable size (which is not a

bad bet when int and the machine's address space are both 32 bits, but falls over if you port to machines with

larger address spaces or smaller ints.) Increasingly risky as 64-bit machines become more widespread.

2.2 Format String Vulnerabilities

A "format string vulnerability" is where the programmer passes a user-supplied string to some function (typically e.g.

printf and syslog) that will then interpret it in a way that can potentially be exploited to trick the program into doingsomething the author didn't intend. In some programs this is just a bug, but if the program runs with more privilege

than the user than it becomes a security hole.

Page 8: C Language Gotchas

8/2/2019 C Language Gotchas

http://slidepdf.com/reader/full/c-language-gotchas 8/18

pdfcrowd comopen in browser PRO version Are you a developer? Try out the HTML to PDF API

For example, suppose that the program has read some data from the user into the buffer somevar and now wantsto print it out. The wrong way to do it would be:

printf(somevar);

A hostile user can choose the string they send in such a way as to extract information from the program or even to

modify its data and internal workings. (Ask Google about "exploiting format string vulnerabilities" for moreinformation about this. The point of this document is to teach you how not to be vulnerable to these exploits, not toteach you how to construct them.)

The correct way to do what the programmer usually intends in this situation is:

printf("%s", somevar);

or:

fputs(somevar, stdout);

3. Unbounded String OperationsFunctions such as strcpy, strcat and sprintf have a well-known misfeature: they don't bound the size of the resultingstring, which means that whatever buffer has been allocated for the result can be overrun if some check is not made.

Formally, overrunning the buffer just results in undefined behaviour. This might just make a program unreliable, but itmight also be a security hole.

Page 9: C Language Gotchas

8/2/2019 C Language Gotchas

http://slidepdf.com/reader/full/c-language-gotchas 9/18

df d mi b PRO i Are you a developer? Try out the HTML to PDF API

Various solutions exist.

Explicitly check that the result string will fit prior to the operation. If it doesn't, don't perform the operation but

instead signal an error somehow. Example:

if(strlen(buffer) + strlen(str) >= sizeof buffer)return -1;

strcat(buffer, str);

(This assumes that buffer is an array, not a char *. If it is just a pointer, sizeof will return the wrong value, and

you must find the size some other way.)

Use snprintf instead of sprintf. This ensures that only a buffer of known size will be updated, and returns the

number of characters (excluding the terminating 0) that would be required. The disadvantage is that you have

to rely on having C99 or a C library which supports snprintf as an extension.

Another worry with snprintf is that not every implementation chooses the same conventions for the return

value: some old versions return -1 on overflow instead of the number of characters that would have been writtenif space were available. See also http://www.unixpapa.com/incnote/stdio.html.

Use strncpy instead of strcpy. However, you should be aware of two things; firstly strncpy does not

guarantee to null-terminate the result if it is too long, merely to truncate it; and secondly that if there is space for

a terminator, it fills up the rest of the buffer with 0s, not just the one byte that is necessary. To use this functionsafely, it is therefore necessary to explicitly set the last byte of the buffer to 0 after calling it; for example:

strncpy(buffer, str, sizeof buffer);buffer[sizeof buffer - 1] = 0;

The same warning regarding sizeof as above applies.

Page 10: C Language Gotchas

8/2/2019 C Language Gotchas

http://slidepdf.com/reader/full/c-language-gotchas 10/18

df di b PRO i Are you a developer? Try out the HTML to PDF API

Use strncat instead of strcat. Be careful: the size parameter is the number of characters to copy, not the

buffer size; and a 0 character is always appended. So there must be enough space in the buffer for the original

string, the number of characters specified, and the terminator. For example:

strncat(buffer, str, sizeof buffer - strlen(buffer) - 1);

Use strlcpy and strlcat instead of strcpy and strcat. Unfortunately these functions are not yet defined in anystandard.

Dynamically allocate sufficient memory for all result strings, rather than relying on fixed-size buffers and boundchecking. The downside of this is that you then have much more memory allocation to worry about, and

(particularly for sprintf) it can be tiresome to calculate all the lengths in advance. Example:

if(!(buffer = malloc(strlen(s1) + strlen(s2) + 1)))return -1; /* malloc failed */

strcpy(buffer, s1);strcat(buffer, s2);

Use a string handling library which does most of the memory allocation work for you behind the scenes.

4. Type Assumptions

4.1 time_t

It's fairly common practice, especially in programs written under some variant of UNIX, to assume that time_t is a

count of non-leap seconds from the start of 1970. In fact the C standard doesn't guarantee this - it only guarantees

Page 11: C Language Gotchas

8/2/2019 C Language Gotchas

http://slidepdf.com/reader/full/c-language-gotchas 11/18

df di b PRO i A d l ? T t th HTML t PDF API

that it be an arithmetic type (which includes float and double). It doesn't guarantee that it has a resolution ofseconds, nor even that it increases with time.

Even some UNIX standards don't guarantee a starting point of 1970.

The answer to this is to avoid making any unguaranteed assumptions about time_t. This might be quite hard work,

as the functions defined by the C standard are not powerful enough for many applications.

Also, there's quite a bit of code available for free that already assumes the traditional UNIX time_t, and if you want to

use it, that means you need a matching time representation.

One answer is to define your own time type, which you can guarantee to have the required representation, and use

conversion functions to convert between your type and the system's time_t. On most (all?) UNIX systems, andanything that has a compatible time_t, this type can be a synonym for time_t and the conversion functions need do

nothing. On other systems the type should be defined as a suitable integer type, and the conversion functions

depend on what the system's time_t is like.

If you define your type as long, then, since that type may be only 32 bits wide, your program may stop working in

2038.

Another alternative is not to care and declare that your program will never work on systems with unusual time

representations. But you may come to regret doing this.

5. Null Pointer Constants

The comp.lang.c FAQ has an entire section on null pointers. There are some points worth making here, though.

The standard says:

Page 12: C Language Gotchas

8/2/2019 C Language Gotchas

http://slidepdf.com/reader/full/c-language-gotchas 12/18df di b PRO i A d l ? T t th HTML t PDF API

 An integral constant expression with the value 0, or such an expression cast to void * , is called a null pointer 

constant. If a null pointer constant is assigned to or compared for equality to a pointer, the constant is converted to

a pointer of that type. Such a pointer, called a null pointer, is guaranteed to compared unequal to a pointer to any 

object or function.

Firstly, despite the name a "null pointer constant" need not actually be a null pointer according to the definition

above; it will only become one in a context where a pointer is expected. For instance outside a pointer context the

constant 0 has type int despite being a "null pointer constant".

This matters when the programmer knows that a pointer is expected but the compiler does not. Variadic functionsare one situation in which this case arises. Consider this code fragment:

printf("%p\n", 0);

%p expects a pointer argument (specifically void *) but the compiler does not know this - it is worked out at runtimeinstead. Consequently the argument is not converted to a pointer at all (despite technically being a null pointer

constant) and so the wrong type of argument is passed. The resulting behaviour is undefined (it might crash, forinstance). The correct way to write this is to use an explicit cast:

printf("%p\n", (void *)0);

Secondly, NULL is guaranteed only to be a null pointer constant - not necessarily to be a null pointer. Again, thisdoes matter. Consider this code fragment:

printf("%p\n", NULL);

If NULL is defined as 0, as is legitimate, then this is no more correct than the original version. If you want to use

Page 13: C Language Gotchas

8/2/2019 C Language Gotchas

http://slidepdf.com/reader/full/c-language-gotchas 13/18df di b PRO i

Are you a developer? Try out the HTML to PDF API

NULL then you must still cast it:

printf("%p\n", (void *)NULL);

You might reasonably wonder what the point of NULL is in that case. I generally do not use it in my own code, butsome people like it - it makes it clearer that a null pointer is intended.

Thirdly, only constants are null pointer constants. (The name makes more sense here.) A variable that happens tohave the value 0 might not become a null pointer when converted to an appropriate pointer type. So this fragment:

int x = 0;printf("%p\n", (void *)x);

...might perfectly legitimately produce different output to the previous examples.

Finally, beware of using the word "null" to describe things other than null pointers or null pointer constants in the

context of C; it can lead to confusion. I use "character 0" or '\0' instead of "the null character" or "NUL", and "theempty string" instead "the null string". This is a point about human language rather than what you write in a C

program, though.

6. I/O Error Checking

There are a few points worth making regarding checking for I/O errors.

Firstly, a common mistake is to fail to close output streams before calling exit. Consider for instance this program:

#i l d < tdi h>

Page 14: C Language Gotchas

8/2/2019 C Language Gotchas

http://slidepdf.com/reader/full/c-language-gotchas 14/18df di b PRO iAre you a developer? Try out the HTML to PDF API

#include <stdio.h>#include <stdlib.h>

int main(void) {if(printf("Hello, world\n") < 0) {

perror("writing to stdout");exit(EXIT_FAILURE);

}exit(EXIT_SUCCESS);

}

This program correctly checks for an error when calling printf; however if standard output is buffered then the

greeting will not actually be written until the stream is closed by exit. If the write succeeds all is well, but if it fails, thenthere is no way for the C library to report the failure. This applies when main returns, as well as when exit is called.

The library could change the successful termination indication to a failed termination indication in this situation.

However, the standard does not require this and most existing C libraries do not do it, so although it would be agood idea if they did, we can't rely on it as application authors.

One possible corrected version of the program is therefore this:

#include <stdio.h>#include <stdlib.h>

int main(void) {if(printf("Hello, world\n") < 0

|| fclose(stdout) < 0) {perror("writing to stdout");exit(EXIT_FAILURE);

}exit(EXIT_SUCCESS);

}

S dl i i h i i h h i b EOF d i i 1 2 b i i ibl h lid

Page 15: C Language Gotchas

8/2/2019 C Language Gotchas

http://slidepdf.com/reader/full/c-language-gotchas 15/18df di b PRO iAre you a developer? Try out the HTML to PDF API

Secondly, it is worth re-iterating here the point about EOF made in section 1.2 above; it is possible that some valid

character may compare equal to EOF, so calls to getc, getchar, fgetc, putc, putchar and fputc must use explicitcalls to ferror and (for the read functions) feof to check for an error or the end of the file, rather than checking the

return value.

ungetc doesn't set the error indicator even when it fails, so it's not possible to reliably check for errors unless you

know that the argument isn't equal to EOF. On the other hand, the standard says "one character of pushback is

guaranteed", so this shouldn't be an issue if you only ever push at most one character back before reading again.

These issues are only a practical problem when the char and int types have the same representation, but portable

code cannot assume that they do not.

Finally, the error returns from the various <stdio.h> functions are completely inconsistent. The table below

summarizes:

Function Successful Return Error Returnfclose zero EOF (negative)

fflush zero EOF (negative)

fgetc character read use ferror/feof

fgetpos zero nonzero

fgets pointer to array null pointer

fprintf number of characters (non-negative) negative

fputc character written use ferror

fputs non-negative EOF (negative)

fopen pointer to stream null pointer

fread elements read elements read

freopen pointer to stream null pointer

fscanf number of conversions (non negative) EOF or not all conversions

Page 16: C Language Gotchas

8/2/2019 C Language Gotchas

http://slidepdf.com/reader/full/c-language-gotchas 16/18

Are you a developer? Try out the HTML to PDF API

fscanf number of conversions (non-negative) EOF or not all conversions

fseek zero [1] nonzero

fsetpos zero nonzero

ftell file position -1L

fwrite elements written elements written

getc character read use ferror/feof

getchar character read use ferror/feof

gets never use this function

printf number of characters (non-negative) negative

putc character written use ferror

puts non-negative EOF (negative)

remove zero nonzero

rename zero nonzerosetvbuf zero nonzero

scanf number of conversions (non-negative) EOF or not all conversions

snprintf number of characters that would be written (non-negative) [2] negative

sprintf number of characters (non-negative) negative

sscanf number of conversions (non-negative) EOF or not all conversions

tmpfile pointer to stream null pointer

tmpnam non-null pointer null pointer

ungetc character pushed back EOF (see above)

vfprintf number of characters (non-negative) negative

vfscanf number of conversions (non-negative) EOF or not all conversions

vprintf number of characters (non-negative) negative

vscanf number of conversions (non negative) EOF or not all conversions

Page 17: C Language Gotchas

8/2/2019 C Language Gotchas

http://slidepdf.com/reader/full/c-language-gotchas 17/18

pdfcrowd.comopen in browser PRO version Are you a developer? Try out the HTML to PDF API

vscanf number of conversions (non-negative) EOF or not all conversions

vsnprintf number of characters that would be written (non-negative) [2] negative

vsprintf number of characters (non-negative) negative

[1] the standard text is "The fseek function returns nonzero only for a request that cannot be satisfied.", which seems

unambiguous but is oddly different in style from most of the other return value specifications.

[2] early versions of Glibc had a broken implementation of snprintf and vsnprintf: they would return -1 on overflow

rather than the number of characters that would be written. Broken versions of these functions are quite common,

especially in older C libraries.

Afterword

Quite a few of the examples here focus on variadic functions. In particular they look at printf - but this is just becausethat is one of the more complex variadic functions that C programmers usually encounter. The underlying problem is

that C's compile-time type checking mechanisms do not apply in the case of variadic functions.

The same problems occur with unprototyped functions. However, it is usually reasonable to regard these as obsolete

and always avoid them. The one exception is for code that must be portable to very old machines; personally I takethe view that this consideration only ever applies to code necessary to bring up a modern compiler targetted on such

machines.

It would be useful to do away with these problems even in the context of variadic functions, though. There is some

hope. In the case of functions with the same interface as printf, scanf or strftime, GNU C can be told to do type

checking based on the format string.

Obviously this doesn't help for variadic functions with different requirements, though. execlp and similar functions in

UNIX could have compile-time type checking in the same way (a common bug is to fail to make the terminating null

pointer actually a null pointer) but gcc does not currently support this

Page 18: C Language Gotchas

8/2/2019 C Language Gotchas

http://slidepdf.com/reader/full/c-language-gotchas 18/18

pdfcrowd.comopen in browser PRO version Are you a developer? Try out the HTML to PDF API

pointer actually a null pointer) but gcc does not currently support this.

Even if it did, though, building knowledge of each special format into the compiler is not a very general solution. It's

not clear how this would be sensibly fixed without very radical changes to C.

AdministriviaCopyright © 2001, 2002, 2007 Richard Kettlewell. Please send feedback to richard+cfu@@sfere.greenend.org.uk .

RJK | Contents