Copyright (C) 2000, 2003 Free Software Foundation, Inc.
|
This file is intended to contain a few notes about writing C code
within GCC so that it compiles without error on the full range of
compilers GCC needs to be able to compile on.
|
The problem is that many ISO-standard constructs are not accepted by
either old or buggy compilers, and we keep getting bitten by them.
Until now this knowledge has been sparsely spread around, so I
thought I'd collect it in one useful place. Please add and correct
any problems as you come across them.
|
I'm going to start from a base of the ISO C90 standard, since that is
probably what most people code to naturally. Obviously using
constructs introduced after that is not a good idea.
|
For the complete coding style conventions used in GCC, please read
http://gcc.gnu.org/codingconventions.html
|
|
String literals
---------------

Irix6 "cc -n32" and OSF4 "cc" have problems with constant string
initializers that have parentheses around them, e.g.

  const char string[] = ("A string");

This is unfortunate since this is what the GNU gettext macro N_
produces. You need to find a different way to code it.
|
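One way round this, sketched below, is to initialize a pointer instead of an array, since a parenthesized literal is then just an ordinary expression and the array-initializer form those compilers reject is never used. The N_ definition here is a hypothetical stand-in that assumes N_ wraps its argument in parentheses, and string_length is an illustrative helper; neither is taken from gettext or GCC.

```c
#include <string.h>

/* Hypothetical stand-in for gettext's N_ marker; assumed to expand to
   its argument wrapped in parentheses.  */
#define N_(msgid) (msgid)

/* Rejected by some old compilers: an array initialized from a
   parenthesized string literal.
     const char string[] = N_("A string");  */

/* Workaround: point at the literal instead.  Here the parenthesized
   literal is an ordinary expression, so no special array
   initialization rule is involved.  */
static const char *const string = N_("A string");

/* Hypothetical helper: sizeof no longer gives the text length, so
   strlen is used instead.  */
static size_t
string_length (void)
{
  return strlen (string);
}
```

Note that sizeof string now gives the size of a pointer rather than of the text, so code that relied on the array form has to use strlen instead.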
|
Some compilers, such as MSVC++, have fairly low limits on the maximum
length of a string literal; 509 characters, the minimum that ISO C90
requires an implementation to support, is the lowest we've come
across. You may need to break up a long printf statement into many
smaller ones.
|
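A minimal sketch of splitting one long printf into several calls, each with a short literal. The usage text and the name print_usage are hypothetical. Gluing adjacent literals together ("a" "b") may not help, since ISO C states its minimum limit for the literal after concatenation.

```c
#include <stdio.h>

/* Hypothetical usage message, printed piece by piece so that no single
   string literal comes anywhere near the length limit.  */
static void
print_usage (FILE *stream, const char *progname)
{
  fprintf (stream, "Usage: %s [options] file...\n", progname);
  fputs ("Options:\n", stream);
  fputs ("  -h    Display this information\n", stream);
  fputs ("  -v    Display the version number\n", stream);
}
```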
|
|
Empty macro arguments
---------------------

ISO C (6.8.3 in the 1990 standard) specifies the following:

  If (before argument substitution) any argument consists of no
  preprocessing tokens, the behavior is undefined.

This was relaxed by ISO C99, but some older compilers emit an error,
so code like

  #define foo(x, y) x y
  foo (bar, )

needs to be coded in some other way.
|
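One such way, sketched here, is to pass a macro that expands to nothing instead of leaving the argument empty: the argument then contains a preprocessing token before substitution, and only turns into nothing when it is macro-expanded. EMPTY, DECLARE_COUNTER and record are hypothetical names used for illustration.

```c
/* Hypothetical macro taking an optional storage-class keyword.  */
#define DECLARE_COUNTER(storage, name) storage int name = 0

/* Expands to no tokens.  Passing EMPTY rather than nothing means the
   argument is one preprocessing token before substitution; it becomes
   empty only when the argument is macro-expanded.  */
#define EMPTY

DECLARE_COUNTER (static, hits);   /* static int hits = 0; */
DECLARE_COUNTER (EMPTY, misses);  /* int misses = 0; */

/* Hypothetical helper that uses both counters; returns the total.  */
static int
record (int found)
{
  if (found)
    hits++;
  else
    misses++;
  return hits + misses;
}
```

The same trick applies to the example above: foo (bar, EMPTY) expands to just bar.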
|
|
free and realloc
----------------

Some implementations crash upon attempts to free or realloc the null
pointer. Thus if mem might be null, you need to write

  if (mem)
    free (mem);
|
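If the same check is needed in many places it can be wrapped up once. A minimal sketch with the hypothetical names portable_free and portable_realloc; the realloc wrapper calls malloc for a null pointer, which is what ISO C specifies realloc (NULL, size) should do anyway.

```c
#include <stdlib.h>

/* Hypothetical wrapper: never hands a null pointer to free.  */
static void
portable_free (void *mem)
{
  if (mem)
    free (mem);
}

/* Hypothetical wrapper: never hands a null pointer to realloc.  As
   with realloc itself, a null return means failure and the old block
   (if any) is still allocated.  */
static void *
portable_realloc (void *mem, size_t size)
{
  if (mem == NULL)
    return malloc (size);
  return realloc (mem, size);
}
```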
|
|
Trigraphs
---------

You weren't going to use them anyway, but some otherwise ISO C
compliant compilers do not accept trigraphs.
|
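The more likely way to get bitten is by accident: a sequence such as "??(" inside a string literal is the trigraph for '[' on compilers that implement trigraphs, and plain text on those that do not. A small sketch, assuming you want the literal question marks: escaping the second one with \? gives the same characters everywhere. The name question_marks is hypothetical.

```c
#include <string.h>

/* "?\?(" contains no "??" sequence in the source, so no trigraph is
   recognized; the \? escape then yields an ordinary '?'.  The array
   holds the seven characters What??( on every compiler.  */
static const char question_marks[] = "What?\?(";
```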
|
|
Suffixes on Integer Constants
-----------------------------

You should never use an 'l' suffix on integer constants ('L' is fine),
since it can easily be confused with the digit '1'.
|
|
Common Coding Pitfalls
======================

errno
-----

errno might be declared as a macro, so never declare it yourself
(e.g. with "extern int errno;"); always get it from <errno.h>.
|
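A minimal sketch of using errno without declaring it, here around strtol; parse_long is a hypothetical helper.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: returns nonzero if TEXT is a decimal number
   that fits in a long, storing it in *RESULT.  errno comes only from
   <errno.h>; it may be a macro, so it is never declared here.  */
static int
parse_long (const char *text, long *result)
{
  char *end;

  errno = 0;                    /* strtol sets errno only on error */
  *result = strtol (text, &end, 10);
  if (end == text || *end != '\0')
    return 0;                   /* no digits, or trailing junk */
  return errno != ERANGE;       /* value out of range for long */
}
```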
|
|
Implicit int
------------

In C, the 'int' keyword can often be omitted from type declarations.
For instance, you can write

  unsigned variable;

as shorthand for

  unsigned int variable;

There are several places where this can cause trouble. First, suppose
'variable' is a long; then you might think

  (unsigned) variable

would convert it to unsigned long. It does not. It converts to
unsigned int. This mostly causes problems on 64-bit platforms where
long is wider than int, such as most 64-bit Unix systems.
|
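A small sketch of the difference, assuming variable holds a long as in the text; the two helper names are hypothetical. Writing the target type out in full avoids the accidental narrowing.

```c
#include <limits.h>

/* Converts to unsigned long, as intended.  */
static unsigned long
to_unsigned_long (long variable)
{
  return (unsigned long) variable;
}

/* "(unsigned)" means "(unsigned int)", whatever the operand's type,
   so this narrows when long is wider than int.  */
static unsigned int
to_unsigned_int (long variable)
{
  return (unsigned) variable;
}
```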
|
Second, if you write a function definition with no return type at
all:

  operate (int a, int b)
  {
    ...
  }

that function is expected to return int, *not* void. GCC will warn
about this.

Implicit function declarations always have return type int. So if you
correct the above definition to

  void
  operate (int a, int b)
  ...

but operate() is called above its definition, you will get an error
about a "type mismatch with previous implicit declaration". The cure
is to prototype all functions at the top of the file, or in an
appropriate header.
|
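A minimal sketch of the cure, with a prototype ahead of the first call; run_operations and last_result are hypothetical names added so the example is complete.

```c
/* Prototype at the top of the file (or in a header), so the call in
   run_operations sees "void operate (int, int)" rather than an
   implicit "int operate ()".  */
static void operate (int a, int b);

static int last_result;

static void
run_operations (void)
{
  operate (2, 3);               /* called above its definition */
}

static void
operate (int a, int b)
{
  last_result = a + b;
}
```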
|
Char vs unsigned char vs int
----------------------------

In C, unqualified 'char' may be either signed or unsigned; it is the
implementation's choice. When you are processing 7-bit ASCII, it does
not matter. But when your program must handle arbitrary binary data,
or fully 8-bit character sets, you have a problem. The most obvious
issue arises when you have a look-up table indexed by characters.

For instance, the character '\341' in ISO Latin 1 is SMALL LETTER A
WITH ACUTE ACCENT. In the proper locale, isalpha((unsigned char) '\341')
will be true. But if you read '\341' from a file and store it in a
plain char, isalpha(c) may look up character 225, or it may look up
character -31. And the ctype table has no entry at offset -31, so
your program will crash. (If you're lucky.)

It is wise to use unsigned char everywhere you possibly can. This
avoids all these problems. Unfortunately, the routines in <string.h>
take plain char arguments, so you have to remember to cast them back
and forth - or avoid the use of strxxx() functions, which is probably
a good idea anyway.
|
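A sketch of the cast-back-and-forth discipline for both a look-up table and the ctype functions; char_count and count_alpha are hypothetical names. Converting each plain char to unsigned char before use keeps the index in range whether or not char is signed.

```c
#include <ctype.h>
#include <limits.h>

/* Hypothetical table indexed by character, sized so that every
   unsigned char value is a valid index.  */
static unsigned char char_count[UCHAR_MAX + 1];

/* Count the alphabetic characters in S, tallying every byte in
   char_count.  Each plain char is converted to unsigned char first,
   so a byte such as '\341' becomes 225 rather than -31.  */
static int
count_alpha (const char *s)
{
  int n = 0;

  for (; *s != '\0'; s++)
    {
      unsigned char c = (unsigned char) *s;

      char_count[c]++;          /* always in bounds */
      if (isalpha (c))
        n++;
    }
  return n;
}
```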
|
Another common mistake is to use either char or unsigned char to
receive the result of getc() or related stdio functions. They may
return EOF, a negative value distinct from every character they can
return, so neither type can hold every possible result. If you use
char, some legal character value may be confused with EOF, such as
'\377' (SMALL LETTER Y WITH UMLAUT, in Latin-1); if you use unsigned
char, EOF itself becomes a legal character value and a test against
EOF never succeeds. The correct choice is int.
|
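A minimal sketch of the correct pattern; count_bytes is a hypothetical helper.

```c
#include <stdio.h>

/* Count the bytes in STREAM.  C must be an int: getc returns either an
   unsigned char value converted to int, or EOF, and neither char nor
   unsigned char can hold all of those results.  */
static long
count_bytes (FILE *stream)
{
  int c;
  long n = 0;

  while ((c = getc (stream)) != EOF)
    n++;
  return n;
}
```

With char instead of int, the loop would typically stop early at a '\377' byte (if char is signed) or never see EOF at all (with unsigned char).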
|
A more subtle version of the same mistake might look like this:

  unsigned char pushback[NPUSHBACK];
  int pbidx;
  #define unget(c) (assert(pbidx < NPUSHBACK), pushback[pbidx++] = (c))
  #define get(c) (pbidx ? pushback[--pbidx] : getchar())
  ...
  unget(EOF);

which will mysteriously turn a pushed-back EOF into a SMALL LETTER Y
WITH UMLAUT.
|
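A sketch of the same pushback buffer with the bug fixed, assuming NPUSHBACK is 4 (the text leaves it unspecified). The essential change is that pushback holds int; it also drops the unused parameter of get, which is otherwise the macro from the text.

```c
#include <assert.h>
#include <stdio.h>

#define NPUSHBACK 4             /* hypothetical depth */

/* int rather than unsigned char, so a pushed-back EOF is stored as EOF
   instead of being converted to a character value.  */
static int pushback[NPUSHBACK];
static int pbidx;

#define unget(c) (assert (pbidx < NPUSHBACK), pushback[pbidx++] = (c))
#define get() (pbidx ? pushback[--pbidx] : getchar ())
```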
|
|
Other common pitfalls
---------------------

o Expecting 'plain' char to be sign-extended, or expecting it to be
  zero-extended, when it is converted to int; either may happen.

o Shifting an item by a negative amount or by greater than or equal to
  the number of bits in a type (expecting shifts by 32 to be sensible
  has caused quite a number of bugs, at least in the early days).

o Expecting negative ints shifted right to be sign extended (the
  result is implementation-defined).

o Modifying the same object twice between two sequence points, as in
  "i = i++;".

o Host vs. target floating point representation, including emitting NaNs
  and Infinities in a form that the assembler handles.

o qsort being an unstable sort function (unstable in the sense that
  multiple items that sort the same may be sorted in different orders
  by different qsort functions).

o Passing incorrect types to fprintf and friends.

o Adding a declaration of a function defined in another file to a .c
  file instead of to a .h file.
|
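Two of the items above can be guarded against with small helpers; a sketch, with hypothetical names throughout. compare_items breaks ties on each element's original position, so every qsort produces the same order (in effect a stable sort), and shift_left defines the result of an over-wide shift instead of relying on whatever the hardware does.

```c
#include <limits.h>
#include <stdlib.h>

/* Hypothetical record: KEY is what we sort on, INDEX the original
   position, used only to break ties.  */
struct item
{
  int key;
  size_t index;
};

static int
compare_items (const void *pa, const void *pb)
{
  const struct item *a = (const struct item *) pa;
  const struct item *b = (const struct item *) pb;

  if (a->key != b->key)
    return a->key < b->key ? -1 : 1;
  /* Equal keys: fall back to the original position, so that every
     qsort implementation produces the same order.  */
  if (a->index != b->index)
    return a->index < b->index ? -1 : 1;
  return 0;
}

/* Record each element's position, then sort.  */
static void
sort_items (struct item *v, size_t n)
{
  size_t i;

  for (i = 0; i < n; i++)
    v[i].index = i;
  qsort (v, n, sizeof *v, compare_items);
}

/* Shift X left by N bits, defining the result as 0 when N is at least
   the width of the type (where "<<" itself is undefined).  Assumes
   unsigned long has no padding bits.  */
static unsigned long
shift_left (unsigned long x, unsigned int n)
{
  return n < sizeof x * CHAR_BIT ? x << n : 0;
}
```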
|