-*-Mode: outline-*-

                Light-weight System Calls for IA-64
                -----------------------------------

                        Started: 13-Jan-2003
                    Last update: 27-Sep-2003

                      David Mosberger-Tang

Using the "epc" instruction effectively introduces a new mode of
execution to the ia64 linux kernel.  We call this mode the
"fsys-mode".  To recap, the normal states of execution are:
 
  - kernel mode:
        Both the register stack and the memory stack have been
        switched over to kernel memory.  The user-level state is saved
        in a pt_regs structure at the top of the kernel memory stack.

  - user mode:
        Both the register stack and the memory stack are in
        user memory.  The user-level state is contained in the
        CPU registers.

  - bank 0 interruption-handling mode:
        This is the non-interruptible state which all
        interruption-handlers start execution in.  The user-level
        state remains in the CPU registers and some kernel state may
        be stored in bank 0 of registers r16-r31.
 
In contrast, fsys-mode has the following special properties:

  - execution is at privilege level 0 (most-privileged)

  - CPU registers may contain a mixture of user-level and kernel-level
    state (it is the responsibility of the kernel to ensure that no
    security-sensitive kernel-level state is leaked back to
    user-level)

  - execution is interruptible and preemptible (an fsys-mode handler
    can disable interrupts and avoid all other interruption-sources
    to avoid preemption)

  - neither the memory-stack nor the register-stack can be trusted while
    in fsys-mode (they point to the user-level stacks, which may
    be invalid, or completely bogus addresses)

In summary, fsys-mode is much more similar to running in user-mode
than it is to running in kernel-mode.  Of course, given that the
privilege level is at level 0, this means that fsys-mode requires some
care (see below).
 
* How to tell fsys-mode

Linux operates in fsys-mode when (a) the privilege level is 0 (most
privileged) and (b) the stacks have NOT been switched to kernel memory
yet.  For convenience, the header file  provides
three macros:

        user_mode(regs)
        user_stack(task,regs)
        fsys_mode(task,regs)

The "regs" argument is a pointer to a pt_regs structure.  The "task"
argument is a pointer to the task structure to which the "regs"
pointer belongs.  user_mode() returns TRUE if the CPU state pointed
to by "regs" was executing in user mode (privilege level 3).
user_stack() returns TRUE if the state pointed to by "regs" was
executing on the user-level stack(s).  Finally, fsys_mode() returns
TRUE if the CPU state pointed to by "regs" was executing in fsys-mode.
The fsys_mode() macro is equivalent to the expression:

        !user_mode(regs) && user_stack(task,regs)
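
As a rough illustration of how the three predicates relate, the
following C sketch models them on a simplified register snapshot.  The
structure layout, the field names (cpl, on_user_stack) and the model_*
function names are assumptions made for this example only; the real
macros operate on struct pt_regs and the task structure.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the saved CPU state.  The real
 * macros inspect struct pt_regs and the task structure instead. */
struct regs_model {
        unsigned int cpl;       /* privilege level: 0 = most privileged, 3 = user */
        bool on_user_stack;     /* true if the stacks still point to user memory */
};

/* Model of user_mode(): the state was executing at privilege level 3. */
static bool model_user_mode(const struct regs_model *regs)
{
        return regs->cpl == 3;
}

/* Model of user_stack(): the stacks have not been switched to kernel memory. */
static bool model_user_stack(const struct regs_model *regs)
{
        return regs->on_user_stack;
}

/* Model of fsys_mode(): privileged, but still on the user-level stacks. */
static bool model_fsys_mode(const struct regs_model *regs)
{
        return !model_user_mode(regs) && model_user_stack(regs);
}
```

In this model, only a state at privilege level 0 that is still on the
user-level stacks is classified as fsys-mode; plain user mode and full
kernel mode both yield false.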
 
* How to write an fsyscall handler

The file arch/ia64/kernel/fsys.S contains a table of fsyscall-handlers
(fsyscall_table).  This table contains one entry for each system call.
By default, a system call is handled by fsys_fallback_syscall().  This
routine takes care of entering (full) kernel mode and calling the
normal Linux system call handler.  For performance-critical system
calls, it is possible to write a hand-tuned fsyscall handler.  For
example, fsys.S contains fsys_getpid(), which is a hand-tuned version
of the getpid() system call.
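
The "fallback by default, hand-tuned where it pays off" arrangement
can be pictured with a small C model.  This is only a sketch: the real
table lives in assembly in fsys.S, and the table size, the slot number
and all model_* names below are hypothetical.

```c
#include <stddef.h>

#define MODEL_NR_SYSCALLS 8     /* hypothetical table size for this sketch */
#define MODEL_NR_GETPID   3     /* hypothetical slot, not the real syscall number */

typedef long (*fsys_handler_t)(void);

/* Stand-in for fsys_fallback_syscall(): would enter full kernel mode. */
static long model_fallback_syscall(void)
{
        return -1;
}

/* Stand-in for a hand-tuned handler such as fsys_getpid(). */
static long model_fsys_getpid(void)
{
        return 42;              /* placeholder pid */
}

static fsys_handler_t model_table[MODEL_NR_SYSCALLS];

/* Every slot defaults to the fallback; only selected ones are overridden. */
static void model_table_init(void)
{
        for (size_t i = 0; i < MODEL_NR_SYSCALLS; i++)
                model_table[i] = model_fallback_syscall;
        model_table[MODEL_NR_GETPID] = model_fsys_getpid;
}

/* Look up the handler for system call slot nr and run it. */
static long model_dispatch(size_t nr)
{
        if (nr >= MODEL_NR_SYSCALLS)
                return -1;
        return model_table[nr]();
}
```

After model_table_init(), dispatching the hypothetical getpid slot
runs the hand-tuned stand-in, while every other slot ends up in the
fallback.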
 
The entry and exit-state of an fsyscall handler are as follows:

** Machine state on entry to fsyscall handler:

 - r10    = 0
 - r11    = saved ar.pfs (a user-level value)
 - r15    = system call number
 - r16    = "current" task pointer (in normal kernel-mode, this is in r13)
 - r32-r39 = system call arguments
 - b6     = return address (a user-level value)
 - ar.pfs = previous frame-state (a user-level value)
 - PSR.be = cleared to zero (i.e., little-endian byte order is in effect)
 - all other registers may contain values passed in from user-mode

** Required machine state on exit from fsyscall handler:

 - r11    = saved ar.pfs (as passed into the fsyscall handler)
 - r15    = system call number (as passed into the fsyscall handler)
 - r32-r39 = system call arguments (as passed into the fsyscall handler)
 - b6     = return address (as passed into the fsyscall handler)
 - ar.pfs = previous frame-state (as passed into the fsyscall handler)
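
The exit requirement amounts to "the listed registers compare equal to
their values on entry".  A minimal C sketch of that check follows,
assuming a hypothetical snapshot structure (fsys_state_model) that
holds just the registers named above; nothing like it exists in the
kernel sources.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical snapshot of the registers an fsyscall handler must preserve. */
struct fsys_state_model {
        uint64_t r11;           /* saved ar.pfs */
        uint64_t r15;           /* system call number */
        uint64_t args[8];       /* r32-r39: system call arguments */
        uint64_t b6;            /* return address */
        uint64_t ar_pfs;        /* previous frame-state */
};

/* Returns true if every register that must survive the handler is unchanged. */
static bool model_exit_state_ok(const struct fsys_state_model *before,
                                const struct fsys_state_model *after)
{
        return before->r11 == after->r11 &&
               before->r15 == after->r15 &&
               memcmp(before->args, after->args, sizeof(before->args)) == 0 &&
               before->b6 == after->b6 &&
               before->ar_pfs == after->ar_pfs;
}
```

A handler that clobbers, say, r15 would fail this check, which is
exactly the case that would break a later system call restart.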
 
Fsyscall handlers can execute with very little overhead, but with that
speed comes a set of restrictions:

 o Fsyscall-handlers MUST check for any pending work in the flags
   member of the thread-info structure and if any of the
   TIF_ALLWORK_MASK flags are set, the handler needs to fall back on
   doing a full system call (by calling fsys_fallback_syscall).

 o Fsyscall-handlers MUST preserve incoming arguments (r32-r39, r11,
   r15, b6, and ar.pfs) because they will be needed in case of a
   system call restart.  Of course, all "preserved" registers also
   must be preserved, in accordance with the normal calling conventions.

 o Fsyscall-handlers MUST check whether argument registers contain a
   NaT value before using them in any way that could trigger a
   NaT-consumption fault.  If a system call argument is found to
   contain a NaT value, an fsyscall-handler may return immediately
   with r8=EINVAL, r10=-1.

 o Fsyscall-handlers MUST NOT use the "alloc" instruction or perform
   any other operation that would trigger mandatory RSE
   (register-stack engine) traffic.

 o Fsyscall-handlers MUST NOT write to any stacked registers because
   it is not safe to assume that user-level called a handler with the
   proper number of arguments.

 o Fsyscall-handlers need to be careful when accessing per-CPU variables:
   unless proper safe-guards are taken (e.g., interruptions are avoided),
   execution may be pre-empted and resumed on another CPU at any given
   time.

 o Fsyscall-handlers must be careful not to leak sensitive kernel
   information back to user-level.  In particular, before returning to
   user-level, care needs to be taken to clear any scratch registers
   that could contain sensitive information (note that the current
   task pointer is not considered sensitive: it's already exposed
   through ar.k6).

 o Fsyscall-handlers MUST NOT access user-memory without first
   validating access-permission (this can be done typically via
   probe.r.fault and/or probe.w.fault) and without guarding against
   memory access exceptions (this can be done with the EX() macros
   defined by asmmacro.h).
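
The first and third restrictions describe checks a handler makes
before doing any real work.  The C model below sketches that decision
only.  The mask value, the result structure, the model_* names and the
order of the two checks are assumptions for illustration; real
handlers are written in assembly and test the NaT bit with
instructions such as tnat.

```c
#include <errno.h>
#include <stdbool.h>

#define MODEL_TIF_ALLWORK_MASK 0x0fu    /* hypothetical mask value for this sketch */

enum model_action { MODEL_FAST_PATH, MODEL_FALLBACK, MODEL_RETURN_ERROR };

struct model_result {
        enum model_action action;
        long r8;                /* error code, valid for MODEL_RETURN_ERROR */
        long r10;               /* -1 signals failure to user-level */
};

/* Decide what a handler may do, given the thread flags and whether an
 * argument register carries a NaT value. */
static struct model_result model_precheck(unsigned int thread_flags, bool arg_is_nat)
{
        struct model_result res = { MODEL_FAST_PATH, 0, 0 };

        if (thread_flags & MODEL_TIF_ALLWORK_MASK) {
                /* Pending work: hand over to the full system call path. */
                res.action = MODEL_FALLBACK;
        } else if (arg_is_nat) {
                /* NaT argument: return immediately with r8=EINVAL, r10=-1. */
                res.action = MODEL_RETURN_ERROR;
                res.r8 = EINVAL;
                res.r10 = -1;
        }
        return res;
}
```

Only when neither condition holds does the model stay on the fast
path.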
 
The above restrictions may seem draconian, but remember that it's
possible to trade off some of the restrictions by paying a slightly
higher overhead.  For example, if an fsyscall-handler could benefit
from the shadow register bank, it could temporarily disable PSR.i and
PSR.ic, switch to bank 0 (bsw.0) and then use the shadow registers as
needed.  In other words, following the above rules yields extremely
fast system call execution (while fully preserving system call
semantics), but there is also a lot of flexibility in handling more
complicated cases.
 
* Signal handling

The delivery of (asynchronous) signals must be delayed until fsys-mode
is exited.  This is accomplished with the help of the lower-privilege
transfer trap: arch/ia64/kernel/process.c:do_notify_resume_user()
checks whether the interrupted task was in fsys-mode and, if so, sets
PSR.lp and returns immediately.  When fsys-mode is exited via the
"br.ret" instruction that lowers the privilege level, a trap will
occur.  The trap handler clears PSR.lp again and returns immediately.
The kernel exit path then checks for and delivers any pending signals.
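
The deferral sequence can be followed step by step in a small state
model.  This C sketch is illustrative only: the structure, its fields
and the model_* functions are hypothetical and merely mirror the
prose; they are not the code in process.c.

```c
#include <stdbool.h>

/* Hypothetical model of the state involved in deferring a signal. */
struct model_cpu {
        bool in_fsys_mode;      /* interrupted task is in fsys-mode */
        bool psr_lp;            /* models the PSR.lp bit */
        bool signal_pending;
        int signals_delivered;
};

/* Models the check in do_notify_resume_user(): defer while in fsys-mode. */
static void model_notify_resume(struct model_cpu *cpu)
{
        if (cpu->in_fsys_mode) {
                cpu->psr_lp = true;     /* arm the lower-privilege transfer trap */
                return;
        }
        if (cpu->signal_pending) {
                cpu->signal_pending = false;
                cpu->signals_delivered++;
        }
}

/* Models the "br.ret" that lowers the privilege level and leaves fsys-mode. */
static void model_leave_fsys_mode(struct model_cpu *cpu)
{
        cpu->in_fsys_mode = false;
        if (cpu->psr_lp) {
                cpu->psr_lp = false;            /* trap handler clears PSR.lp */
                model_notify_resume(cpu);       /* exit path delivers pending signals */
        }
}
```

In this model a signal that arrives during fsys-mode is not delivered
by model_notify_resume(); it is delivered once
model_leave_fsys_mode() has run.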
 
* PSR Handling

The "epc" instruction doesn't change the contents of PSR at all.  This
is in contrast to a regular interruption, which clears almost all
bits.  Because of that, some care needs to be taken to ensure things
work as expected.  The following discussion describes how each PSR bit
is handled.
 
PSR.be  Cleared when entering fsys-mode.  A srlz.d instruction is used
        to ensure the CPU is in little-endian mode before the first
        load/store instruction is executed.  PSR.be is normally NOT
        restored upon return from an fsys-mode handler.  In other
        words, user-level code must not rely on PSR.be being preserved
        across a system call.
PSR.up  Unchanged.
PSR.ac  Unchanged.
PSR.mfl Unchanged.  Note: fsys-mode handlers must not write-registers!
PSR.mfh Unchanged.  Note: fsys-mode handlers must not write-registers!
PSR.ic  Unchanged.  Note: fsys-mode handlers can clear the bit, if needed.
PSR.i   Unchanged.  Note: fsys-mode handlers can clear the bit, if needed.
PSR.pk  Unchanged.
PSR.dt  Unchanged.
PSR.dfl Unchanged.  Note: fsys-mode handlers must not write-registers!
PSR.dfh Unchanged.  Note: fsys-mode handlers must not write-registers!
PSR.sp  Unchanged.
PSR.pp  Unchanged.
PSR.di  Unchanged.
PSR.si  Unchanged.
PSR.db  Unchanged.  The kernel prevents user-level from setting a hardware
        breakpoint that triggers at any privilege level other than 3 (user-mode).
PSR.lp  Unchanged.
PSR.tb  Lazy redirect.  If a taken-branch trap occurs while in
        fsys-mode, the trap-handler modifies the saved machine state
        such that execution resumes in the gate page at
        syscall_via_break(), with privilege level 3.  Note: the
        taken branch would occur on the branch invoking the
        fsyscall-handler, at which point, by definition, a syscall
        restart is still safe.  If the system call number is invalid,
        the fsys-mode handler will return directly to user-level.  This
        return will trigger a taken-branch trap, but since the trap is
        taken _after_ restoring the privilege level, the CPU has already
        left fsys-mode, so no special treatment is needed.
PSR.rt  Unchanged.
PSR.cpl Cleared to 0.
PSR.is  Unchanged (guaranteed to be 0 on entry to the gate page).
PSR.mc  Unchanged.
PSR.it  Unchanged (guaranteed to be 1).
PSR.id  Unchanged.  Note: the ia64 linux kernel never sets this bit.
PSR.da  Unchanged.  Note: the ia64 linux kernel never sets this bit.
PSR.dd  Unchanged.  Note: the ia64 linux kernel never sets this bit.
PSR.ss  Lazy redirect.  If set, "epc" will cause a Single Step Trap to
        be taken.  The trap handler then modifies the saved machine
        state such that execution resumes in the gate page at
        syscall_via_break(), with privilege level 3.
PSR.ri  Unchanged.
PSR.ed  Unchanged.  Note: This bit could only have an effect if an fsys-mode
        handler performed a speculative load that gets NaTted.  If so, this
        would be the normal & expected behavior, so no special treatment is
        needed.
PSR.bn  Unchanged.  Note: fsys-mode handlers may clear the bit, if needed.
        Doing so requires clearing PSR.i and PSR.ic as well.
PSR.ia  Unchanged.  Note: the ia64 linux kernel never sets this bit.
 
* Using fast system calls

To use fast system calls, userspace applications simply need to call
__kernel_syscall_via_epc().  For example:
 
-- example fgettimeofday() call --
-- fgettimeofday.S --

#include <asm/asmmacro.h>

GLOBAL_ENTRY(fgettimeofday)
        .prologue
        .save ar.pfs, r11
        mov r11 = ar.pfs
        .body

        movl r2 = 0xa000000000020660;;  // gate address
                                        // found by inspection of System.map for the
                                        // __kernel_syscall_via_epc() function.  See
                                        // below for how to do this for real.

        mov b7 = r2
        mov r15 = 1087                  // gettimeofday syscall
        ;;
        br.call.sptk.many b6 = b7
        ;;

        .restore sp

        mov ar.pfs = r11
        br.ret.sptk.many rp;;           // return to caller
END(fgettimeofday)

-- end fgettimeofday.S --
 
In reality, getting the gate address is accomplished by two extra
values passed via the ELF auxiliary vector (include/asm-ia64/elf.h):

 o AT_SYSINFO : is the address of __kernel_syscall_via_epc()
 o AT_SYSINFO_EHDR : is the address of the kernel gate ELF DSO

The ELF DSO is a pre-linked library that is mapped in by the kernel at
the gate page.  It is a proper ELF shared object so, with a dynamic
loader that recognises the library, you should be able to make calls to
the exported functions within it as with any other shared library.
AT_SYSINFO points into the kernel DSO at the
__kernel_syscall_via_epc() function for historical reasons (it was
used before the kernel DSO) and as a convenience.
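
One way a user-level program might read the two auxiliary-vector
entries is sketched below.  It assumes a C library that provides
getauxval() in <sys/auxv.h> (glibc 2.16 or later), which postdates
this document; the two helper function names are made up for the
example.  getauxval() returns 0 when the kernel did not supply the
requested entry, so callers need a fallback for that case.

```c
#include <elf.h>        /* AT_SYSINFO, AT_SYSINFO_EHDR */
#include <sys/auxv.h>   /* getauxval(), assumed to be available */

/* Address of __kernel_syscall_via_epc(), or 0 if AT_SYSINFO is absent. */
static unsigned long gate_entry_address(void)
{
        return getauxval(AT_SYSINFO);
}

/* Address of the kernel gate ELF DSO, or 0 if AT_SYSINFO_EHDR is absent. */
static unsigned long gate_dso_address(void)
{
        return getauxval(AT_SYSINFO_EHDR);
}
```

A stub like fgettimeofday() above could then branch to the value
returned by gate_entry_address() instead of a hard-coded gate address.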
