An ad-hoc collection of notes on IA64 MCA and INIT processing.  Feel
free to update it with notes about any area that is not clear.

---
 
MCA/INIT are completely asynchronous.  They can occur at any time, when
the OS is in any state, including when one of the cpus is already
holding a spinlock.  Trying to get any lock from MCA/INIT state is
asking for deadlock.  Also the state of structures that are protected
by locks is indeterminate, including linked lists.
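A minimal userland sketch of why handlers must not spin on locks, assuming a hypothetical spinlock built on C11 `atomic_flag` (all `model_*` names are illustrative, not kernel APIs). The lock may be held by the very code the event interrupted, which cannot run again until the handler returns, so spinning can wait forever. The sketch tries once and reports failure instead.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical spinlock model; not the kernel's spinlock_t. */
typedef struct {
    atomic_flag held;
} model_spinlock_t;

#define MODEL_SPINLOCK_INIT { ATOMIC_FLAG_INIT }

/* Normal context: spin until the lock is free. */
static void model_spin_lock(model_spinlock_t *l)
{
    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
        ; /* busy-wait */
}

/* Returns true if the lock was free and is now held by the caller. */
static bool model_spin_trylock(model_spinlock_t *l)
{
    return !atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire);
}

static void model_spin_unlock(model_spinlock_t *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}

/*
 * MCA/INIT-style context: never spin.  The holder may be the code this
 * event interrupted, so model_spin_lock() here could never return.
 * Try once; if the lock is busy, report failure and let the caller
 * take a lock-free fallback path.
 */
static bool model_async_read(model_spinlock_t *l, const int *shared, int *out)
{
    if (!model_spin_trylock(l))
        return false;
    *out = *shared;
    model_spin_unlock(l);
    return true;
}
```

When the lock is "held by the interrupted code", `model_async_read()` returns `false` instead of hanging; what the caller does next (skip, print "unknown", defer) is a policy choice outside this sketch.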
 
---

The complicated ia64 MCA process.  All of this is mandated by Intel's
specification for ia64 SAL, error recovery and unwind; it is not as
if we have a choice here.
 
* MCA occurs on one cpu, usually due to a double-bit memory error.
  This is the monarch cpu.

* SAL sends an MCA rendezvous interrupt (which is a normal interrupt)
  to all the other cpus, the slaves.

* Slave cpus that receive the MCA interrupt call down into SAL, where
  they end up spinning disabled while the MCA is being serviced.

* If any slave cpu was already spinning disabled when the MCA occurred,
  then it cannot service the MCA interrupt.  SAL waits ~20 seconds then
  sends an unmaskable INIT event to the slave cpus that have not
  already rendezvoused.
 
* Because MCA/INIT can be delivered at any time, including when the cpu
  is down in PAL in physical mode, the registers at the time of the
  event are _completely_ undefined.  In particular the MCA/INIT
  handlers cannot rely on the thread pointer (TP): PAL physical mode
  can (and does) modify TP.  It is allowed to do that as long as it
  resets TP on return.  However, MCA/INIT events expose us to these PAL
  internal TP changes.  Hence the need for curr_task().
 
* If an MCA/INIT event occurs while the kernel was running (not user
  space) and the kernel has called PAL, then the MCA/INIT handler cannot
  assume that the kernel stack is in a fit state to be used, mainly
  because PAL may or may not maintain the stack pointer internally.
  Because the MCA/INIT handlers cannot trust the kernel stack, they
  have to use their own per-cpu stacks.  The MCA/INIT stacks are
  preformatted with just enough task state to let the relevant handlers
  do their job.
 
* Unlike most other architectures, the ia64 struct task is embedded in
  the kernel stack[1].  So switching to a new kernel stack means that
  we switch to a new task as well.  Because various bits of the kernel
  assume that current points into the struct task, switching to a new
  stack also means a new value for current.

* Once all slaves have rendezvoused and are spinning disabled, the
  monarch is entered.  The monarch now tries to diagnose the problem
  and decide if it can recover or not.

* Part of the monarch's job is to look at the state of all the other
  tasks.  The only way to do that on ia64 is to call the unwinder,
  as mandated by Intel.
 
* The starting point for the unwind depends on whether a task is
  running or not.  That is, whether it is on a cpu or is blocked.  The
  monarch has to determine whether or not a task is on a cpu before it
  knows how to start unwinding it.  The tasks that received an MCA or
  INIT event are no longer running; they have been converted to blocked
  tasks.  But (and it's a big but) the cpus that received the MCA
  rendezvous interrupt are still running on their normal kernel stacks!
 
* To distinguish between these two cases, the monarch must know which
  tasks are on a cpu and which are not.  Hence each slave cpu that
  switches to an MCA/INIT stack registers its new stack using
  set_curr_task(), so the monarch can tell that the _original_ task is
  no longer running on that cpu.  That gives us a decent chance of
  getting a valid backtrace of the _original_ task.

* MCA/INIT can be nested, to a depth of 2 on any cpu.  In the case of a
  nested error, we want diagnostics on the MCA/INIT handler that
  failed, not on the task that was originally running.  Again this
  requires set_curr_task() so the MCA/INIT handlers can register their
  own stack as running on that cpu.  Then a recursive error gets a
  trace of the failing handler's "task".
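The "which task is on which cpu" bookkeeping described in the list above can be modelled in a few lines. This is a sketch under simplifying assumptions: a fixed `MODEL_NR_CPUS`, a plain array standing in for the scheduler's per-cpu current pointer, and hypothetical `model_*` functions that only mirror the roles of curr_task() and set_curr_task(). It is not the kernel's implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_NR_CPUS 4   /* hypothetical cpu count */

/* Hypothetical stand-in for a task. */
struct model_task {
    int pid;
};

/* What the scheduler believes each cpu is running. */
static struct model_task *model_cpu_curr[MODEL_NR_CPUS];

static struct model_task *model_curr_task(int cpu)
{
    return model_cpu_curr[cpu];
}

static void model_set_curr_task(int cpu, struct model_task *t)
{
    model_cpu_curr[cpu] = t;
}

/* The monarch's question: unwind from live state (on a cpu) or from
 * saved state (blocked)? */
static bool model_task_on_cpu(const struct model_task *t)
{
    for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
        if (model_cpu_curr[cpu] == t)
            return true;
    return false;
}

/* A cpu switching to an MCA/INIT stack registers the handler "task" and
 * gets back the previous one, to restore on the way out.  Nesting simply
 * repeats this with the first handler as the "previous" task. */
static struct model_task *model_enter_handler(int cpu, struct model_task *h)
{
    struct model_task *prev = model_curr_task(cpu);

    model_set_curr_task(cpu, h);
    return prev;
}

static void model_leave_handler(int cpu, struct model_task *prev)
{
    model_set_curr_task(cpu, prev);
}
```

While a nested handler runs, neither the original task nor the first handler appears to be on the cpu, so a recursive error is attributed to the handler that actually failed.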
 
[1] My (Keith Owens) original design called for ia64 to separate its
    struct task and the kernel stacks.  Then the MCA/INIT data would be
    chained stacks like i386 interrupt stacks.  But that required
    radical surgery on the rest of ia64, plus extra hard wired TLB
    entries with their associated performance degradation.  David
    Mosberger vetoed that approach, which meant that separate kernel
    stacks implied separate "tasks" for the MCA/INIT handlers.
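To see why "one stack, one task" follows from embedding the task in the stack, here is a sketch that places a hypothetical task header at the base of a stack region aligned to its own size, so any address on that stack leads back to its task. The size is made up and the address-masking trick is a swapped-in illustration: ia64 itself keeps current in the thread pointer register, and this sketch does not show how the kernel computes it.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stack size; the real ia64 value depends on configuration. */
#define MODEL_STACK_SIZE 32768u

/* Hypothetical task header living at the base of its kernel stack. */
struct model_task {
    int pid;
};

/* Allocate a stack region aligned to its own size, with the task at the
 * base.  Returns NULL on allocation failure. */
static struct model_task *model_alloc_task_stack(int pid)
{
    struct model_task *t = aligned_alloc(MODEL_STACK_SIZE, MODEL_STACK_SIZE);

    if (t)
        t->pid = pid;
    return t;
}

/* Any address inside a stack (e.g. a stack pointer) masks back to the task
 * at its base, so moving to a different stack means a different task. */
static struct model_task *model_task_from_addr(const void *addr)
{
    uintptr_t base = (uintptr_t)addr & ~(uintptr_t)(MODEL_STACK_SIZE - 1);

    return (struct model_task *)base;
}
```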
 
---

INIT is less complicated than MCA.  Pressing the NMI button or using
the equivalent command on the management console sends INIT to all
cpus.  SAL picks one of the cpus as the monarch and the rest are
slaves.  All the OS INIT handlers are entered at approximately the same
time.  The OS monarch prints the state of all tasks and returns, after
which the slaves return and the system resumes.
 
At least that is what is supposed to happen.  Alas, there are broken
versions of SAL out there.  Some drive all the cpus as monarchs.  Some
drive them all as slaves.  Some drive one cpu as monarch, wait for that
cpu to return from the OS, then drive the rest as slaves.  Some versions
of SAL cannot even cope with returning from the OS; they spin inside
SAL on resume.  The OS INIT code has workarounds for some of these
broken SAL symptoms, but some simply cannot be fixed from the OS side.
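One way to picture an OS-side workaround for the first two symptoms is a pair of counters shared by every cpu entering the INIT handler: demote any monarch after the first, and promote the last cpu if all of them arrived as slaves. This is a hypothetical, sequential sketch (plain ints rather than atomics, exactly one entry per cpu, invented `model_*` names), not the kernel's INIT code. It does not model a SAL that drives the slaves only after the monarch returns, nor one that spins on resume.

```c
#include <assert.h>
#include <stdbool.h>

/* Counters shared by all cpus entering the INIT handler.  A real handler
 * would need atomic increments; this model is sequential. */
struct model_init_state {
    int nr_cpus;
    int monarchs;   /* cpus that SAL claimed were monarchs */
    int slaves;     /* cpus acting as slaves after any demotion */
};

/* Called once per cpu with the role SAL gave it; returns the role the OS
 * will actually use (true = monarch). */
static bool model_init_fix_role(struct model_init_state *s, bool sal_monarch)
{
    bool monarch = sal_monarch;

    /* Broken SAL case 1: more than one monarch.  Keep the first. */
    if (monarch && ++s->monarchs > 1)
        monarch = false;

    /* Broken SAL case 2: every cpu arrived as a slave.  Promote the last. */
    if (!monarch && ++s->slaves == s->nr_cpus)
        monarch = true;

    return monarch;
}
```

Either way exactly one cpu ends up printing task state, which is what the normal monarch/slave protocol expects.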
 
---

The scheduler hooks used by ia64 (curr_task, set_curr_task) are layer
violations.  Unfortunately MCA/INIT start off as massive layer
violations (they can occur at _any_ time) and they build from there.
 
At least ia64 makes an attempt at recovering from hardware errors, but
it is a difficult problem because of the asynchronous nature of these
errors.  When processing an unmaskable interrupt we sometimes need
special code to cope with our inability to take any locks.
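One kind of special code is a message buffer that handlers can append to without a lock, deferring output to the normal (locked) console path until the system is back in a sane state. A minimal sketch, assuming a hypothetical fixed-size buffer and C11 atomics: space is reserved with a compare-and-swap loop so concurrent writers never overlap, but the sketch assumes the flush runs only after all handlers have returned, since it does not track whether a reserved region has been filled yet.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

#define MODEL_LOG_SIZE 256   /* hypothetical buffer size */

struct model_log {
    char buf[MODEL_LOG_SIZE];
    atomic_size_t used;      /* bytes reserved so far */
};

static void model_log_init(struct model_log *mlog)
{
    atomic_init(&mlog->used, 0);
}

/* Lock-free append: reserve [start, start + len) with a CAS loop, then copy.
 * Returns the number of bytes stored, or 0 if the message does not fit. */
static size_t model_log_append(struct model_log *mlog, const char *msg)
{
    size_t len = strlen(msg);
    size_t start = atomic_load(&mlog->used);

    do {
        if (len > MODEL_LOG_SIZE - start)
            return 0;        /* full: drop the message rather than wait */
    } while (!atomic_compare_exchange_weak(&mlog->used, &start, start + len));

    memcpy(mlog->buf + start, msg, len);
    return len;
}

/* Normal context, after every handler has returned: copy out and reset. */
static size_t model_log_flush(struct model_log *mlog, char *out, size_t out_size)
{
    size_t n = atomic_exchange(&mlog->used, 0);

    assert(out_size > 0);
    if (n > out_size - 1)
        n = out_size - 1;
    memcpy(out, mlog->buf, n);
    out[n] = '\0';
    return n;
}
```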
 
---

How is ia64 MCA/INIT different from x86 NMI?

* x86 NMI typically gets delivered to one cpu.  MCA/INIT gets sent to
  all cpus.

* x86 NMI cannot be nested.  MCA/INIT can be nested, to a depth of 2
  per cpu.

* x86 has a separate struct task which points to one of multiple kernel
  stacks.  ia64 has the struct task embedded in the single kernel
  stack, so switching stack means switching task.

* x86 does not call the BIOS, so the NMI handler does not have to worry
  about any registers having changed.  MCA/INIT can occur while the cpu
  is in PAL in physical mode, with undefined registers and an undefined
  kernel stack.

* i386 backtrace is not very sensitive to whether a process is running
  or not.  ia64 unwind is very, very sensitive to whether a process is
  running or not.
 
---

What happens when MCA/INIT is delivered while a cpu is running user
space code?

The user mode registers are stored in the RSE area of the MCA/INIT stack
on entry to the OS and are restored from there on return to SAL, so user
mode registers are preserved across a recoverable MCA/INIT.  Since the
OS has no idea what unwind data is available for the user space stack,
MCA/INIT never tries to backtrace user space.  This means that the OS
does not bother making the user space process look like a blocked task,
i.e. the OS does not copy pt_regs and switch_stack to the user space
stack.  Also the OS has no idea how big the user space RSE and memory
stacks are, which makes it too risky to copy the saved state to a user
mode stack.
 
---

How do we get a backtrace on the tasks that were running when MCA/INIT
was delivered?

mca.c::ia64_mca_modify_original_stack().  That function identifies and
verifies the original kernel stack, copies the dirty registers from
the MCA/INIT stack's RSE to the original stack's RSE, copies the
skeleton struct pt_regs and switch_stack to the original stack, fills
in the skeleton structures from the PAL minstate area and updates the
original stack's thread.ksp.  That makes the original stack look
exactly like any other blocked task, i.e. it now appears to be
sleeping.  To get a backtrace, just start with thread.ksp for the
original task and unwind like any other sleeping task.
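The core idea, making an interrupted task look as if it went to sleep, can be sketched with toy structures. Everything here is hypothetical and heavily simplified: a two-field frame stands in for struct pt_regs plus struct switch_stack, the values the real code takes from the PAL minstate area are passed in as arguments, and the stack verification and RSE register copy are omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MODEL_STACK_SIZE 4096   /* hypothetical */

/* Toy stand-in for the saved pt_regs/switch_stack state. */
struct model_frame {
    uint64_t ip;   /* where the task would "resume" */
    uint64_t sp;
};

struct model_task {
    uintptr_t ksp;                          /* like thread.ksp */
    unsigned char stack[MODEL_STACK_SIZE];  /* the task's own stack */
};

/* Copy a filled-in frame onto the task's own stack (16-byte aligned
 * offset) and point ksp at it, as a blocked task would have. */
static void model_make_blocked(struct model_task *t, uint64_t ip, uint64_t sp)
{
    struct model_frame f = { .ip = ip, .sp = sp };
    size_t off = (MODEL_STACK_SIZE - sizeof f) & ~(size_t)15;

    memcpy(&t->stack[off], &f, sizeof f);
    t->ksp = (uintptr_t)&t->stack[off];
}

/* A sleeping-task unwinder needs nothing but ksp to find its start. */
static uint64_t model_unwind_start_ip(const struct model_task *t)
{
    struct model_frame f;

    memcpy(&f, (const void *)t->ksp, sizeof f);
    return f.ip;
}
```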
 
---

How do we identify the tasks that were running when MCA/INIT was
delivered?

If the previous task has been verified and converted to a blocked
state, then sos->prev_task on the MCA/INIT stack is updated to point to
the previous task.  You can look at that field in dumps or debuggers.
To help distinguish between the handler and the original tasks,
handlers have _TIF_MCA_INIT set in thread_info.flags.
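A dump-analysis helper might combine those two pieces of information as follows. The sketch is hypothetical: the flag bit number is a placeholder (the real _TIF_MCA_INIT value is defined in the ia64 thread_info header and is not reproduced here), and the structures model only the fields the text mentions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Placeholder bit number, NOT the real TIF_MCA_INIT value. */
#define MODEL_TIF_MCA_INIT 18
#define MODEL_TIF_MCA_INIT_MASK (1UL << MODEL_TIF_MCA_INIT)

struct model_task {
    int pid;
    unsigned long flags;   /* stands in for thread_info.flags */
};

/* Stand-in for the sos data on the handler stack. */
struct model_sal_os_state {
    struct model_task *prev_task;   /* NULL until the original is verified */
};

static bool model_is_mca_init_handler(const struct model_task *t)
{
    return (t->flags & MODEL_TIF_MCA_INIT_MASK) != 0;
}

/* For a handler task, return the task it interrupted if that task was
 * verified and converted to a blocked state; otherwise NULL. */
static struct model_task *
model_original_task(const struct model_task *t,
                    const struct model_sal_os_state *sos)
{
    if (!model_is_mca_init_handler(t))
        return NULL;
    return sos->prev_task;
}
```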
 
The sos data is always in the MCA/INIT handler stack, at offset
MCA_SOS_OFFSET.  You can get that value from mca_asm.h or calculate it
as KERNEL_STACK_SIZE - sizeof(struct pt_regs) -
sizeof(struct ia64_sal_os_state), with 16-byte alignment for all
structures.
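The calculation above, as a sketch: structures are placed downwards from the top of the handler stack, struct pt_regs first and then struct ia64_sal_os_state, each starting on a 16-byte boundary. The sizes passed in are placeholders rather than real ia64 values, and this reading of "16-byte alignment" is an assumption; mca_asm.h remains the authoritative source for MCA_SOS_OFFSET.

```c
#include <assert.h>
#include <stddef.h>

/* Round an offset down to a 16-byte boundary. */
static size_t model_align16_down(size_t x)
{
    return x & ~(size_t)15;
}

/* Offset of the sos data from the base of the handler stack, with
 * pt_regs at the top and the sos data immediately below it. */
static size_t model_mca_sos_offset(size_t stack_size, size_t pt_regs_size,
                                   size_t sos_size)
{
    size_t pt_regs_off;

    assert(pt_regs_size + sos_size + 15 <= stack_size);
    pt_regs_off = model_align16_down(stack_size - pt_regs_size);
    return model_align16_down(pt_regs_off - sos_size);
}
```

For example, with a made-up 32 KiB stack, a 400-byte pt_regs and a 200-byte sos, pt_regs starts at 32768 - 400 = 32368 (already aligned), and 32368 - 200 = 32168 rounds down to 32160.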
 
Also the comm field of the MCA/INIT task is modified to include the pid
of the original task, for humans to use.  For example, a comm field of
'MCA 12159' means that pid 12159 was running when the MCA was
delivered.
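For tooling that reads dumps, building and parsing such a comm field is straightforward. A sketch, assuming the "<type> <pid>" format shown in the example and the 16-byte comm size (TASK_COMM_LEN) used by Linux; the "INIT" prefix and the `model_*` helpers are assumptions, not the kernel's code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MODEL_TASK_COMM_LEN 16   /* matches Linux TASK_COMM_LEN */

/* Build e.g. "MCA 12159"; snprintf truncates safely if it would not fit. */
static void model_set_handler_comm(char comm[MODEL_TASK_COMM_LEN],
                                   const char *type, int orig_pid)
{
    snprintf(comm, MODEL_TASK_COMM_LEN, "%s %d", type, orig_pid);
}

/* Recover the original pid, or -1 if comm is not "<MCA|INIT> <pid>". */
static int model_pid_from_comm(const char *comm)
{
    char type[8];
    int pid;

    if (sscanf(comm, "%7s %d", type, &pid) != 2)
        return -1;
    if (strcmp(type, "MCA") != 0 && strcmp(type, "INIT") != 0)
        return -1;
    return pid;
}
```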
