@c
@c  COPYRIGHT (c) 1988-2002.
@c  On-Line Applications Research Corporation (OAR).
@c  All rights reserved.
@c
@c  callconv.t,v 1.6 2002/01/17 21:47:47 joel Exp
@c

@chapter Calling Conventions

@section Introduction

Each high-level language compiler generates
subroutine entry and exit code based upon a set of rules known
as the compiler's calling convention.   These rules address the
following issues:

@itemize @bullet
@item register preservation and usage

@item parameter passing

@item call and return mechanism
@end itemize

A compiler's calling convention is important when
interfacing to subroutines written in another language, either
assembly or high-level.  Even when the high-level language and
target processor are the same, different compilers may use
different calling conventions.  As a result, calling conventions
are both processor and compiler dependent.

@section Programming Model

This section discusses the programming model for the
SPARC architecture.

@subsection Non-Floating Point Registers

The SPARC architecture defines thirty-two
non-floating point registers directly visible to the programmer.
These are divided into four sets:

@itemize @bullet
@item input registers

@item local registers

@item output registers

@item global registers
@end itemize

Each register is referred to by either two or three
names in the SPARC reference manuals.  First, the registers are
referred to as r0 through r31 or with the alternate notation
r[0] through r[31].  Second, each register is a member of one of
the four sets listed above.  Finally, some registers have an
architecturally defined role in the programming model which
provides an alternate name.  The following table describes the
mapping between the 32 registers and the register sets:

@ifset use-ascii
@example
@group
+-----------------+----------------+------------------+
| Register Number | Register Names |   Description    |
+-----------------+----------------+------------------+
|      0 - 7      |    g0 - g7     | Global Registers |
+-----------------+----------------+------------------+
|     8 - 15      |    o0 - o7     | Output Registers |
+-----------------+----------------+------------------+
|     16 - 23     |    l0 - l7     | Local Registers  |
+-----------------+----------------+------------------+
|     24 - 31     |    i0 - i7     | Input Registers  |
+-----------------+----------------+------------------+
@end group
@end example
@end ifset

@ifset use-tex
@sp 1
@tex
\centerline{\vbox{\offinterlineskip\halign{
\vrule\strut#&
\hbox to 1.75in{\enskip\hfil#\hfil}&
\vrule#&
\hbox to 1.75in{\enskip\hfil#\hfil}&
\vrule#&
\hbox to 1.75in{\enskip\hfil#\hfil}&
\vrule#\cr
\noalign{\hrule}
&\bf Register Number &&\bf Register Names&&\bf Description&\cr\noalign{\hrule}
&0 - 7&&g0 - g7&&Global Registers&\cr\noalign{\hrule}
&8 - 15&&o0 - o7&&Output Registers&\cr\noalign{\hrule}
&16 - 23&&l0 - l7&&Local Registers&\cr\noalign{\hrule}
&24 - 31&&i0 - i7&&Input Registers&\cr\noalign{\hrule}
}}\hfil}
@end tex
@end ifset

@ifset use-html
@html
<table>
<tr><th>Register Number</th><th>Register Names</th><th>Description</th></tr>
<tr><td>0 - 7</td><td>g0 - g7</td><td>Global Registers</td></tr>
<tr><td>8 - 15</td><td>o0 - o7</td><td>Output Registers</td></tr>
<tr><td>16 - 23</td><td>l0 - l7</td><td>Local Registers</td></tr>
<tr><td>24 - 31</td><td>i0 - i7</td><td>Input Registers</td></tr>
</table>
@end html
@end ifset

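The mapping in the table above amounts to simple arithmetic on the register number: the quotient by eight selects the register set and the remainder gives the position within that set.  The following C sketch illustrates this.  The function name sparc_reg_set is a hypothetical helper introduced only for this illustration; it is not part of RTEMS or of any SPARC toolchain.

```c
#include <assert.h>

/* Hypothetical helper (not an RTEMS API): returns the set letter
 * ('g', 'o', 'l', or 'i') for register r0..r31 and stores the
 * position within that set (0..7) in *index. */
char sparc_reg_set(unsigned reg, unsigned *index)
{
    static const char set_letter[4] = { 'g', 'o', 'l', 'i' };

    assert(reg < 32u);          /* only r0 through r31 exist */
    *index = reg % 8u;          /* position within the set */
    return set_letter[reg / 8u];
}
```

For example, r14 maps to o6 and r31 maps to i7.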
As mentioned above, some of the registers serve
defined roles in the programming model.  The following table
describes the role of each of these registers:

@ifset use-ascii
@example
@group
+---------------+----------------+----------------------+
| Register Name | Alternate Name |     Description      |
+---------------+----------------+----------------------+
|      g0       |       na       |    reads return 0    |
|               |                |  writes are ignored  |
+---------------+----------------+----------------------+
|      o6       |       sp       |    stack pointer     |
+---------------+----------------+----------------------+
|      i6       |       fp       |    frame pointer     |
+---------------+----------------+----------------------+
|      i7       |       na       |    return address    |
+---------------+----------------+----------------------+
@end group
@end example
@end ifset

@ifset use-tex
@sp 1
@tex
\centerline{\vbox{\offinterlineskip\halign{
\vrule\strut#&
\hbox to 1.75in{\enskip\hfil#\hfil}&
\vrule#&
\hbox to 1.75in{\enskip\hfil#\hfil}&
\vrule#&
\hbox to 1.75in{\enskip\hfil#\hfil}&
\vrule#\cr
\noalign{\hrule}
&\bf Register Name &&\bf Alternate Names&&\bf Description&\cr\noalign{\hrule}
&g0&&NA&&reads return 0; &\cr
&&&&&writes are ignored&\cr\noalign{\hrule}
&o6&&sp&&stack pointer&\cr\noalign{\hrule}
&i6&&fp&&frame pointer&\cr\noalign{\hrule}
&i7&&NA&&return address&\cr\noalign{\hrule}
}}\hfil}
@end tex
@end ifset

@ifset use-html
@html
<table>
<tr><th>Register Name</th><th>Alternate Name</th><th>Description</th></tr>
<tr><td>g0</td><td>NA</td><td>reads return 0; writes are ignored</td></tr>
<tr><td>o6</td><td>sp</td><td>stack pointer</td></tr>
<tr><td>i6</td><td>fp</td><td>frame pointer</td></tr>
<tr><td>i7</td><td>NA</td><td>return address</td></tr>
</table>
@end html
@end ifset

@subsection Floating Point Registers

The SPARC V7 architecture includes thirty-two
32-bit floating point registers.  These registers may be viewed as
follows:

@itemize @bullet
@item 32 single precision floating point or integer registers
(f0, f1, ..., f31)

@item 16 double precision floating point registers (f0, f2,
f4, ..., f30)

@item 8 extended precision floating point registers (f0, f4,
f8, ..., f28)
@end itemize

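The three views listed above share one rule: a value that occupies one, two, or four consecutive 32-bit registers must start at a register number that is a multiple of that count.  The C sketch below checks this rule.  The enumeration and function names are hypothetical and are used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical precision tags; each value is the number of 32-bit
 * registers that one operand of that precision occupies. */
enum fp_precision { FP_SINGLE = 1, FP_DOUBLE = 2, FP_EXTENDED = 4 };

/* Returns true when f<freg> may name an operand of the given precision. */
bool fp_reg_is_valid(unsigned freg, enum fp_precision precision)
{
    unsigned words = (unsigned)precision;

    assert(words == 1u || words == 2u || words == 4u);
    return freg < 32u && (freg % words) == 0u;
}
```

Under this rule f30 is a valid double precision register but not a valid extended precision register, which matches the lists above.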
The floating point state register (fsr) specifies
the rounding behavior of the floating point unit and contains
its condition codes, version specification, and trap information.

A queue of the floating point instructions which have
started execution but not yet completed is maintained.  This
queue is needed to support the multiple cycle nature of floating
point operations and to aid floating point exception trap
handlers.  Once a floating point exception has been encountered,
the queue is frozen until it is emptied by the trap handler.
The floating point queue is loaded by launching instructions.
It is emptied normally when the floating point unit completes all
outstanding instructions and by floating point exception
handlers with the store double floating point queue (stdfq)
instruction.

@subsection Special Registers

The SPARC architecture includes two special registers
which are critical to the programming model: the Processor State
Register (psr) and the Window Invalid Mask (wim).  The psr
contains the condition codes, processor interrupt level, trap
enable bit, supervisor mode and previous supervisor mode bits,
version information, floating point unit and coprocessor enable
bits, and the current window pointer (cwp).  The cwp field of
the psr and the wim register are used to manage the register windows
in the SPARC architecture.  The register windows are discussed
in more detail below.

@section Register Windows

The SPARC architecture includes the concept of
register windows.  An overly simplistic way to think of these
windows is to imagine them as being an infinite supply of
"fresh" register sets available for each subroutine to use.  In
reality, they are much more complicated.

The save instruction is used to obtain a new register
window.  This instruction decrements the current window pointer,
thus providing a new set of registers for use.  This register
set includes eight fresh local registers for use exclusively by
this subroutine.  When done with a register set, the restore
instruction increments the current window pointer and the
previous register set is once again available.

The two primary issues complicating the use of
register windows are that (1) the set of register windows is
finite, and (2) some registers are shared between adjacent
register windows.

Because the set of register windows is finite, it is
possible to execute enough save instructions without
corresponding restore instructions to consume all of the register
windows.  This is easily accomplished in a high level language because
each subroutine typically performs a save instruction upon
entry.  Thus having a subroutine call depth greater than the
number of register windows will result in a window overflow
condition.  The window overflow condition generates a trap which
must be handled in software.  The window overflow trap handler
is responsible for saving the contents of the oldest register
window on the program stack.

Similarly, the subroutines will eventually complete
and begin to perform restore instructions.  If a restore results in the
need for a register window which has previously been written to
memory as part of an overflow, then a window underflow condition
results.  Just like the window overflow, the window underflow
condition must be handled in software by a trap handler.  The
window underflow trap handler is responsible for reloading the
contents of the register window requested by the restore
instruction from the program stack.

The Window Invalid Mask (wim) and the Current Window
Pointer (cwp) field in the psr are used in conjunction to manage
the finite set of register windows and detect the window
overflow and underflow conditions.  The cwp contains the index
of the register window currently in use.  The save instruction
decrements the cwp modulo the number of register windows.
Similarly, the restore instruction increments the cwp modulo the
number of register windows.  Each bit in the wim indicates
whether the corresponding register window contains valid information.
A value of 0 indicates the register window is valid and 1
indicates it is invalid.  When a save instruction causes the cwp
to point to a register window which is marked as invalid, a
window overflow condition results.  Conversely, a restore
instruction which causes the cwp to point to a register window
marked as invalid results in a window underflow condition.

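The interaction between the cwp and the wim described above can be modelled in a few lines of C.  This is only a sketch of the rules stated in the text, not RTEMS code: the structure, the function names, and the choice of eight windows are assumptions made for illustration, and a real processor takes a trap where the model returns false.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NWINDOWS 8u   /* assumed window count; implementation dependent */

/* Hypothetical model of the two fields involved in window management. */
struct window_state {
    unsigned cwp;     /* current window pointer, 0 .. NWINDOWS - 1 */
    uint32_t wim;     /* bit n set means window n is invalid */
};

/* Models save: returns false (cwp unchanged) where the hardware
 * would raise a window overflow trap. */
bool model_save(struct window_state *s)
{
    unsigned next = (s->cwp + NWINDOWS - 1u) % NWINDOWS;

    assert(s->cwp < NWINDOWS);
    if (s->wim & (UINT32_C(1) << next))
        return false;
    s->cwp = next;
    return true;
}

/* Models restore: returns false (cwp unchanged) where the hardware
 * would raise a window underflow trap. */
bool model_restore(struct window_state *s)
{
    unsigned next = (s->cwp + 1u) % NWINDOWS;

    assert(s->cwp < NWINDOWS);
    if (s->wim & (UINT32_C(1) << next))
        return false;
    s->cwp = next;
    return true;
}
```

In this model, starting with the cwp at 0 and only window 1 marked invalid, six consecutive saves succeed and the seventh reports an overflow.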
Other than the assumption that a register window is
always available for trap (i.e., interrupt) handlers, the SPARC
architecture places no limits on the number of register windows
simultaneously marked as invalid (i.e., the number of bits set in the
wim).  However, RTEMS assumes that only one register window is
marked invalid at a time (i.e., only one bit is set in the wim).
This makes the maximum possible number of register windows
available to the user while still meeting the requirement that
window overflow and underflow conditions can be detected.

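The single invalid window assumption stated above can be expressed as a check that exactly one wim bit is set among the implemented windows.  The C sketch below shows one way to write such a check.  The function name is hypothetical and RTEMS is not claimed to contain this routine.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical check: true when exactly one of the low nwindows
 * bits of wim is set, i.e. exactly one window is marked invalid. */
bool wim_has_single_invalid_window(uint32_t wim, unsigned nwindows)
{
    uint32_t mask;
    uint32_t bits;

    assert(nwindows >= 1u && nwindows <= 32u);
    mask = (nwindows == 32u) ? UINT32_MAX
                             : ((UINT32_C(1) << nwindows) - 1u);
    bits = wim & mask;

    /* A nonzero value with a single bit set loses that bit
     * when ANDed with itself minus one. */
    return bits != 0u && (bits & (bits - 1u)) == 0u;
}
```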
The window overflow and window underflow trap
handlers are a critical part of the run-time environment for a
SPARC application.  The SPARC architectural specification allows
for the number of register windows to be any power of two less
than or equal to 32.  The most common choice for SPARC
implementations appears to be 8 register windows.  This results
in the cwp ranging in value from 0 to 7 on most implementations.

The second complicating factor is the sharing of
registers between adjacent register windows.  While each
register window has its own set of local registers, the input
and output registers are shared between adjacent windows.  The
output registers for register window N are the same as the input
registers for register window ((N - 1) modulo RW) where RW is
the number of register windows.  An alternative way to think of
this is to remember how parameters are passed to a subroutine on
the SPARC.  The caller loads values into what are its output
registers.  Then after the callee executes a save instruction,
those parameters are available in its input registers.  This is
a very efficient way to pass parameters as no data is actually
moved by the save or restore instructions.

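The overlap between adjacent windows can be made concrete by mapping each windowed register to a slot in a single backing array.  The C sketch below does this under assumptions chosen for illustration: eight windows, and a layout in which each window owns eight locals followed by eight inputs.  The layout and the function name are hypothetical; only the sharing rule itself comes from the text above.

```c
#include <assert.h>

#define NWINDOWS 8u   /* assumed window count; implementation dependent */

/* Hypothetical layout: window w owns slots w*16 .. w*16+15, with the
 * locals in the first eight slots and the inputs in the last eight.
 * Global registers (r0..r7) are not windowed and are not modelled. */
unsigned physical_index(unsigned window, unsigned reg)
{
    assert(window < NWINDOWS && reg >= 8u && reg < 32u);

    if (reg < 16u) {
        /* Outputs of window N are the inputs of window (N - 1) modulo RW. */
        unsigned callee = (window + NWINDOWS - 1u) % NWINDOWS;
        return callee * 16u + 8u + (reg - 8u);
    }
    if (reg < 24u)
        return window * 16u + (reg - 16u);      /* locals are private */
    return window * 16u + 8u + (reg - 24u);     /* inputs */
}
```

With this mapping, o0 of window 5 and i0 of window 4 resolve to the same slot, while l0 of the two windows resolves to different slots, so a save changes which names refer to the data without moving it.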
@section Call and Return Mechanism

The SPARC architecture supports a simple yet
effective call and return mechanism.  A subroutine is invoked
via the call instruction.  This instruction places the
return address in the caller's output register 7 (o7).  After
the callee executes a save instruction, this value is available
in input register 7 (i7) until the corresponding restore
instruction is executed.

The callee returns to the caller via a jmp to the
return address.  There is a delay slot following this
instruction which is commonly used to execute a restore
instruction, provided a register window was allocated by this
subroutine.

It is important to note that the SPARC subroutine
call and return mechanism does not automatically save and
restore any registers.  This is accomplished via the save and
restore instructions which manage the set of register windows.

@section Calling Mechanism

All RTEMS directives are invoked using the regular
SPARC calling convention via the call instruction.

@section Register Usage

As discussed above, the call instruction does not
automatically save any registers.  The save and restore
instructions are used to allocate and deallocate register
windows.  When a register window is allocated, the new set of
local registers is available for the exclusive use of the
subroutine which allocated this register set.

@section Parameter Passing

RTEMS assumes that arguments are placed in the
caller's output registers with the first argument in output
register 0 (o0), the second argument in output register 1 (o1),
and so forth.  Until the callee executes a save instruction, the
parameters are still visible in the output registers.  After the
callee executes a save instruction, the parameters are visible
in the corresponding input registers.  The following pseudo-code
illustrates the typical sequence used to call an RTEMS directive
with three (3) arguments:

@example
load third argument into o2
load second argument into o1
load first argument into o0
invoke directive
@end example

@section User-Provided Routines

All user-provided routines invoked by RTEMS, such as
user extensions, device drivers, and MPCI routines, must also
adhere to these calling conventions.