SCSI EH
======================================

This document describes SCSI midlayer error handling infrastructure.
Please refer to Documentation/scsi/scsi_mid_low_api.txt for more
information regarding SCSI midlayer.

TABLE OF CONTENTS

[1] How SCSI commands travel through the midlayer and to EH
    [1-1] struct scsi_cmnd
    [1-2] How do scmd's get completed?
        [1-2-1] Completing a scmd w/ scsi_done
        [1-2-2] Completing a scmd w/ timeout
    [1-3] How EH takes over
[2] How SCSI EH works
    [2-1] EH through fine-grained callbacks
        [2-1-1] Overview
        [2-1-2] Flow of scmds through EH
        [2-1-3] Flow of control
    [2-2] EH through transportt->eh_strategy_handler()
        [2-2-1] Pre transportt->eh_strategy_handler() SCSI midlayer conditions
        [2-2-2] Post transportt->eh_strategy_handler() SCSI midlayer conditions
        [2-2-3] Things to consider


[1] How SCSI commands travel through the midlayer and to EH

[1-1] struct scsi_cmnd

Each SCSI command is represented with struct scsi_cmnd (== scmd). A
scmd has two list_head's to link itself into lists. The two are
scmd->list and scmd->eh_entry. The former is used for free list or
per-device allocated scmd list and not of much interest to this EH
discussion. The latter is used for completion and EH lists and unless
otherwise stated scmds are always linked using scmd->eh_entry in this
discussion.


[1-2] How do scmd's get completed?

Once an LLDD gets hold of a scmd, either the LLDD completes the
command by calling the scsi_done callback, which the midlayer passes
in when invoking hostt->queuecommand(), or the SCSI midlayer times it
out.


[1-2-1] Completing a scmd w/ scsi_done

For all non-EH commands, scsi_done() is the completion callback. It
does the following.

 1. Delete the timeout timer. If this fails, the timeout timer has
    already expired and is going to finish the command. Just return.

 2. Link scmd to per-cpu scsi_done_q using scmd->eh_entry

 3. Raise SCSI_SOFTIRQ

The SCSI_SOFTIRQ handler, scsi_softirq, calls
scsi_decide_disposition(), which looks at the scmd->result value and
sense data to determine what to do with the command.

 - SUCCESS
        scsi_finish_command() is invoked for the command. The
        function does some maintenance chores and notifies completion
        by calling the scmd->done() callback, which, for fs requests,
        would be the HLD completion callback - sd:sd_rw_intr,
        sr:rw_intr, st:st_intr.

 - NEEDS_RETRY
 - ADD_TO_MLQUEUE
        scmd is requeued to blk queue.

 - otherwise
        scsi_eh_scmd_add(scmd, 0) is invoked for the command. See
        [1-3] for details of this function.
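
The dispatch on the disposition value can be sketched as a small toy
model. This is not midlayer code: the toy_* enums and
toy_softirq_action() are hypothetical names introduced only for
illustration, and the comments map each branch back to the function
named in the list above.

```c
/* Hypothetical stand-ins for the dispositions named in the text. */
enum toy_disposition {
    TOY_SUCCESS,
    TOY_NEEDS_RETRY,
    TOY_ADD_TO_MLQUEUE,
    TOY_FAILED              /* any other verdict */
};

/* What happens to the command for a given disposition. */
enum toy_action {
    TOY_ACT_FINISH,         /* scsi_finish_command() */
    TOY_ACT_REQUEUE,        /* scmd goes back to the blk queue */
    TOY_ACT_EH              /* scsi_eh_scmd_add(scmd, 0) */
};

/* Map a disposition to the action described in the list above. */
static enum toy_action toy_softirq_action(enum toy_disposition d)
{
    switch (d) {
    case TOY_SUCCESS:
        return TOY_ACT_FINISH;
    case TOY_NEEDS_RETRY:
    case TOY_ADD_TO_MLQUEUE:
        return TOY_ACT_REQUEUE;
    default:
        return TOY_ACT_EH;
    }
}
```

Only the "otherwise" branch hands the command to EH; the first three
verdicts never reach scsi_eh_scmd_add().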


[1-2-2] Completing a scmd w/ timeout

The timeout handler is scsi_times_out(). When a timeout occurs, this
function

 1. invokes the optional hostt->eh_timed_out() callback. The return
    value can be one of

    - EH_HANDLED
        This indicates that eh_timed_out() dealt with the timeout. The
        scmd is passed to __scsi_done() and thus linked into per-cpu
        scsi_done_q. Normal command completion described in [1-2-1]
        follows.

    - EH_RESET_TIMER
        This indicates that more time is required to finish the
        command. The timer is restarted. This action is counted as a
        retry and is only allowed scmd->allowed + 1(!) times. Once the
        limit is reached, the action for EH_NOT_HANDLED is taken
        instead.

        *NOTE* This action is racy as the LLDD could finish the scmd
        after the timeout has expired but before the timer is added
        back. In such cases, scsi_done() would think that a timeout
        has occurred and return without doing anything. We lose the
        completion and the command will time out again.

    - EH_NOT_HANDLED
        This is the same as when the eh_timed_out() callback doesn't
        exist. Step #2 is taken.

 2. scsi_eh_scmd_add(scmd, SCSI_EH_CANCEL_CMD) is invoked for the
    command. See [1-3] for more information.
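
The timeout decision, including the "allowed + 1" limit on timer
restarts, can be sketched as follows. This is a minimal model under
stated assumptions: struct toy_scmd, the toy_* enums and
toy_times_out() are hypothetical, and the post-increment comparison is
just one way to obtain the allowed + 1 count described above.

```c
/* Hypothetical return codes mirroring the ones named in the text. */
enum toy_eh_timer_return {
    TOY_EH_NOT_HANDLED,
    TOY_EH_HANDLED,
    TOY_EH_RESET_TIMER
};

enum toy_timeout_action {
    TOY_TO_COMPLETE,        /* normal completion path of [1-2-1] */
    TOY_TO_RESTART_TIMER,   /* give the command more time */
    TOY_TO_ENTER_EH         /* scsi_eh_scmd_add(scmd, SCSI_EH_CANCEL_CMD) */
};

struct toy_scmd {
    int retries;            /* retries consumed so far */
    int allowed;            /* retry budget */
};

/* Decide the fate of a timed-out command given the callback's verdict. */
static enum toy_timeout_action toy_times_out(struct toy_scmd *scmd,
                                             enum toy_eh_timer_return rtn)
{
    if (rtn == TOY_EH_HANDLED)
        return TOY_TO_COMPLETE;

    /* Comparing before incrementing permits allowed + 1 restarts. */
    if (rtn == TOY_EH_RESET_TIMER && scmd->retries++ <= scmd->allowed)
        return TOY_TO_RESTART_TIMER;

    /* Not handled, no callback, or restart limit reached. */
    return TOY_TO_ENTER_EH;
}
```

With allowed == 1, two restarts are granted and the third
EH_RESET_TIMER verdict is treated like EH_NOT_HANDLED.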


[1-3] How EH takes over

scmds enter EH via scsi_eh_scmd_add(), which does the following.

 1. Turns on scmd->eh_eflags as requested. It's 0 for error
    completions and SCSI_EH_CANCEL_CMD for timeouts.

 2. Links scmd->eh_entry to shost->eh_cmd_q

 3. Sets SHOST_RECOVERY bit in shost->shost_state

 4. Increments shost->host_failed

 5. Wakes up SCSI EH thread if shost->host_busy == shost->host_failed

As can be seen above, once any scmd is added to shost->eh_cmd_q, the
SHOST_RECOVERY shost_state bit is turned on. This prevents any new
scmd from being issued from the blk queue to the host; eventually, all
scmds on the host either complete normally, fail and get added to
eh_cmd_q, or time out and get added to shost->eh_cmd_q.

If all scmds either complete or fail, the number of in-flight scmds
becomes equal to the number of failed scmds - i.e. shost->host_busy ==
shost->host_failed. This wakes up the SCSI EH thread. So, once woken
up, the SCSI EH thread can expect that all in-flight commands have
failed and are linked on shost->eh_cmd_q.
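
The wake-up condition can be illustrated with a toy counter model.
Everything here is hypothetical (struct toy_host and the toy_*
helpers are not kernel names), list linkage is omitted, and the
host_failed > 0 guard is an assumption of the sketch so that an idle
host never counts as "all commands failed".

```c
#include <stdbool.h>

struct toy_host {
    int host_busy;      /* commands currently in flight */
    int host_failed;    /* commands handed to EH */
    bool in_recovery;   /* models the SHOST_RECOVERY state bit */
    bool eh_woken;      /* set once the EH thread would be woken */
};

/* Wake EH only when every in-flight command has failed. */
static void toy_eh_wakeup(struct toy_host *h)
{
    if (h->host_failed > 0 && h->host_busy == h->host_failed)
        h->eh_woken = true;
}

/* Models steps 3-5 of scsi_eh_scmd_add(). */
static void toy_eh_scmd_add(struct toy_host *h)
{
    h->in_recovery = true;
    h->host_failed++;
    toy_eh_wakeup(h);
}

/* Models a normal completion while recovery is pending. */
static void toy_complete(struct toy_host *h)
{
    h->host_busy--;
    toy_eh_wakeup(h);
}
```

In this model a host with three commands in flight wakes EH only after
the remaining commands have either completed or failed, which matches
the expectation that eh_cmd_q then holds every in-flight command.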

Note that this does not mean lower layers are quiescent. If an LLDD
completed a scmd with error status, the LLDD and lower layers are
assumed to forget about the scmd at that point. However, if a scmd
has timed out, unless hostt->eh_timed_out() made lower layers forget
about the scmd, which currently no LLDD does, the command is still
active as far as lower layers are concerned and completion could
occur at any time. Of course, all such completions are ignored as the
timer has already expired.

We'll talk about how SCSI EH takes actions to abort - make LLDD
forget about - timed out scmds later.


[2] How SCSI EH works

LLDD's can implement SCSI EH actions in one of the following two
ways.

 - Fine-grained EH callbacks
        LLDD can implement fine-grained EH callbacks and let SCSI
        midlayer drive error handling and call appropriate callbacks.
        This will be discussed further in [2-1].

 - eh_strategy_handler() callback
        This is one big callback which should perform the whole error
        handling. As such, it should do all chores the SCSI midlayer
        performs during recovery. This will be discussed in [2-2].

Once recovery is complete, SCSI EH resumes normal operation by
calling scsi_restart_operations(), which

 1. Checks if door locking is needed and locks door.

 2. Clears SHOST_RECOVERY shost_state bit

 3. Wakes up waiters on shost->host_wait. This occurs if someone
    calls scsi_block_when_processing_errors() on the host.
    (*QUESTION* why is it needed? All operations will be blocked
    anyway after it reaches blk queue.)

 4. Kicks the queues of all devices on the host to restart them


[2-1] EH through fine-grained callbacks

[2-1-1] Overview

If eh_strategy_handler() is not present, SCSI midlayer takes charge
of driving error handling. EH has two goals: make LLDD, host and
device forget about timed out scmds, and make them ready for new
commands. A scmd is said to be recovered if the scmd is forgotten by
lower layers and lower layers are ready to process or fail the scmd
again.

To achieve these goals, EH performs recovery actions with increasing
severity. Some actions are performed by issuing SCSI commands and
others are performed by invoking one of the following fine-grained
hostt EH callbacks. Callbacks may be omitted and omitted ones are
considered to fail always.

int (* eh_abort_handler)(struct scsi_cmnd *);
int (* eh_device_reset_handler)(struct scsi_cmnd *);
int (* eh_bus_reset_handler)(struct scsi_cmnd *);
int (* eh_host_reset_handler)(struct scsi_cmnd *);
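
The rule that omitted callbacks always fail can be sketched with a
small wrapper. This is an illustration only: struct toy_scmd, struct
toy_host_template, the TOY_EH_* values and toy_try_eh_callback() are
hypothetical stand-ins, not the midlayer's own types or constants.

```c
#include <stddef.h>

struct toy_scmd { int id; };

enum toy_eh_result { TOY_EH_FAILED = 0, TOY_EH_SUCCESS = 1 };

/* Hypothetical subset of a host template: only the EH callbacks. */
struct toy_host_template {
    int (*eh_abort_handler)(struct toy_scmd *);
    int (*eh_device_reset_handler)(struct toy_scmd *);
    int (*eh_bus_reset_handler)(struct toy_scmd *);
    int (*eh_host_reset_handler)(struct toy_scmd *);
};

/* Invoke a callback, treating an omitted (NULL) one as a failure. */
static int toy_try_eh_callback(int (*cb)(struct toy_scmd *),
                               struct toy_scmd *scmd)
{
    if (cb == NULL)
        return TOY_EH_FAILED;
    return cb(scmd);
}

/* Example driver callback that always reports success. */
static int toy_abort_ok(struct toy_scmd *scmd)
{
    (void)scmd;
    return TOY_EH_SUCCESS;
}
```

A driver that fills in only eh_abort_handler therefore behaves as if
every reset action had been attempted and failed.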

Higher-severity actions are taken only when lower-severity actions
cannot recover some of the failed scmds. Also, note that failure of
the highest-severity action means EH failure and results in offlining
of all unrecovered devices.

During recovery, the following rules are followed

 - Recovery actions are performed on failed scmds on the to do list,
   eh_work_q. If a recovery action succeeds for a scmd, recovered
   scmds are removed from eh_work_q.

   Note that a single recovery action on a scmd can recover multiple
   scmds. e.g. resetting a device recovers all failed scmds on the
   device.

 - Higher severity actions are taken iff eh_work_q is not empty after
   lower severity actions are complete.

 - EH reuses failed scmds to issue commands for recovery. For
   timed-out scmds, SCSI EH ensures that LLDD forgets about a scmd
   before reusing it for EH commands.

When a scmd is recovered, the scmd is moved from eh_work_q to EH
local eh_done_q using scsi_eh_finish_cmd(). After all scmds are
recovered (eh_work_q is empty), scsi_eh_flush_done_q() is invoked to
either retry or error-finish (notify upper layer of failure) recovered
scmds.

A scmd is retried iff its sdev is still online (not offlined during
EH), REQ_FAILFAST is not set and ++scmd->retries is less than
scmd->allowed.
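
The retry rule can be written as a single predicate. The sketch below
assumes a hypothetical struct toy_scmd whose boolean fields stand in
for the device-online check and the REQ_FAILFAST flag; only the
ordering of the three conditions is taken from the text.

```c
#include <stdbool.h>

struct toy_scmd {
    int retries;        /* retries consumed so far */
    int allowed;        /* retry budget */
    bool failfast;      /* models REQ_FAILFAST on the request */
    bool sdev_online;   /* models the sdev still being online */
};

/*
 * Returns true if the recovered scmd should be retried.  Because &&
 * short-circuits, retries is incremented only when the device is
 * online and failfast is not set.
 */
static bool toy_should_retry(struct toy_scmd *scmd)
{
    return scmd->sdev_online && !scmd->failfast &&
           ++scmd->retries < scmd->allowed;
}
```

Because the increment happens before the comparison, a scmd with
allowed == 2 and no prior retries is retried once by this rule.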


[2-1-2] Flow of scmds through EH

 1. Error completion / time out
    ACTION: scsi_eh_scmd_add() is invoked for scmd
        - set scmd->eh_eflags
        - add scmd to shost->eh_cmd_q
        - set SHOST_RECOVERY
        - shost->host_failed++
    LOCKING: shost->host_lock

 2. EH starts
    ACTION: move all scmds to EH's local eh_work_q. shost->eh_cmd_q
            is cleared.
    LOCKING: shost->host_lock (not strictly necessary, just for
             consistency)

 3. scmd recovered
    ACTION: scsi_eh_finish_cmd() is invoked to EH-finish scmd
        - shost->host_failed--
        - clear scmd->eh_eflags
        - scsi_setup_cmd_retry()
        - move from local eh_work_q to local eh_done_q
    LOCKING: none

 4. EH completes
    ACTION: scsi_eh_flush_done_q() retries scmds or notifies upper
            layer of failure.
        - scmd is removed from eh_done_q and scmd->eh_entry is cleared
        - if retry is necessary, scmd is requeued using
          scsi_queue_insert()
        - otherwise, scsi_finish_command() is invoked for scmd
    LOCKING: queue or finish function performs appropriate locking


[2-1-3] Flow of control

EH through fine-grained callbacks starts from scsi_unjam_host().

<<scsi_unjam_host>>

    1. Lock shost->host_lock, splice_init shost->eh_cmd_q into local
       eh_work_q and unlock host_lock. Note that shost->eh_cmd_q is
       cleared by this action.

    2. Invoke scsi_eh_get_sense.

    <<scsi_eh_get_sense>>

        This action is taken for each error-completed
        (!SCSI_EH_CANCEL_CMD) command without valid sense data. Most
        SCSI transports/LLDDs automatically acquire sense data on
        command failures (autosense). Autosense is recommended for
        performance reasons and because sense information could get
        out of sync between the occurrence of CHECK CONDITION and this
        action.

        Note that if autosense is not supported, scmd->sense_buffer
        contains invalid sense data when error-completing the scmd
        with scsi_done(). scsi_decide_disposition() always returns
        FAILED in such cases thus invoking SCSI EH. When the scmd
        reaches here, sense data is acquired and
        scsi_decide_disposition() is called again.

        1. Invoke scsi_request_sense() which issues REQUEST_SENSE
           command. If it fails, no action is taken. Note that taking
           no action causes higher-severity recovery to be taken for
           the scmd.

        2. Invoke scsi_decide_disposition() on the scmd

           - SUCCESS
                scmd->retries is set to scmd->allowed preventing
                scsi_eh_flush_done_q() from retrying the scmd and
                scsi_eh_finish_cmd() is invoked.

           - NEEDS_RETRY
                scsi_eh_finish_cmd() invoked

           - otherwise
                No action.

    3. If !list_empty(&eh_work_q), invoke scsi_eh_abort_cmds().

    <<scsi_eh_abort_cmds>>

        This action is taken for each timed out command.
        hostt->eh_abort_handler() is invoked for each scmd. The
        handler returns SUCCESS if it has succeeded in making LLDD and
        all related hardware forget about the scmd.

        If a timed-out scmd is successfully aborted and the sdev is
        either offline or ready, scsi_eh_finish_cmd() is invoked for
        the scmd. Otherwise, the scmd is left in eh_work_q for
        higher-severity actions.

        Note that both offline and ready status mean that the sdev is
        ready to process new scmds, where processing also implies
        immediate failing; thus, if a sdev is in one of the two
        states, no further recovery action is needed.

        Device readiness is tested using scsi_eh_tur() which issues
        TEST_UNIT_READY command. Note that the scmd must have been
        aborted successfully before reusing it for TEST_UNIT_READY.

    4. If !list_empty(&eh_work_q), invoke scsi_eh_ready_devs()

    <<scsi_eh_ready_devs>>

        This function takes four increasingly more severe measures to
        make failed sdevs ready for new commands.

        1. Invoke scsi_eh_stu()

        <<scsi_eh_stu>>

            For each sdev which has failed scmds with valid sense data
            of which scsi_check_sense()'s verdict is FAILED,
            START_STOP_UNIT command is issued w/ start=1. Note that
            as we explicitly choose error-completed scmds, it is known
            that lower layers have forgotten about the scmd and we can
            reuse it for STU.

            If STU succeeds and the sdev is either offline or ready,
            all failed scmds on the sdev are EH-finished with
            scsi_eh_finish_cmd().

            *NOTE* If hostt->eh_abort_handler() isn't implemented or
            failed, we may still have timed out scmds at this point
            and STU doesn't make lower layers forget about those
            scmds. Yet, this function EH-finishes all scmds on the
            sdev if STU succeeds, leaving lower layers in an
            inconsistent state. It seems that STU action should be
            taken only when a sdev has no timed out scmd.

        2. If !list_empty(&eh_work_q), invoke scsi_eh_bus_device_reset().

        <<scsi_eh_bus_device_reset>>

            This action is very similar to scsi_eh_stu() except that,
            instead of issuing STU, hostt->eh_device_reset_handler()
            is used. Also, as we're not issuing SCSI commands and
            resetting clears all scmds on the sdev, there is no need
            to choose error-completed scmds.

        3. If !list_empty(&eh_work_q), invoke scsi_eh_bus_reset()

        <<scsi_eh_bus_reset>>

            hostt->eh_bus_reset_handler() is invoked for each channel
            with failed scmds. If bus reset succeeds, all failed
            scmds on all ready or offline sdevs on the channel are
            EH-finished.

        4. If !list_empty(&eh_work_q), invoke scsi_eh_host_reset()

        <<scsi_eh_host_reset>>

            This is the last resort. hostt->eh_host_reset_handler()
            is invoked. If host reset succeeds, all failed scmds on
            all ready or offline sdevs on the host are EH-finished.

        5. If !list_empty(&eh_work_q), invoke scsi_eh_offline_sdevs()

        <<scsi_eh_offline_sdevs>>

            Take all sdevs which still have unrecovered scmds offline
            and EH-finish the scmds.

    5. Invoke scsi_eh_flush_done_q().

    <<scsi_eh_flush_done_q>>

        At this point all scmds are recovered (or given up) and
        put on eh_done_q by scsi_eh_finish_cmd(). This function
        flushes eh_done_q by either retrying or notifying upper
        layer of failure of the scmds.
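
The overall control flow (try each action in order of severity, stop
as soon as eh_work_q is empty, offline whatever is left) can be
sketched with counters in place of real lists. All names below
(struct toy_eh_state, toy_eh_step, toy_eh_escalate() and the toy_step_*
helpers) are hypothetical, and each step simply reports how many scmds
it recovered.

```c
#include <stddef.h>

struct toy_eh_state {
    int work_q;     /* scmds still awaiting recovery (eh_work_q) */
    int done_q;     /* scmds EH-finished so far (eh_done_q) */
    int offlined;   /* scmds given up on by offlining their sdev */
};

/* A recovery step returns how many scmds it managed to recover. */
typedef int (*toy_eh_step)(const struct toy_eh_state *);

/*
 * Run steps in order of increasing severity, stopping once work_q is
 * empty.  Anything left afterwards is EH-finished by offlining.
 * Returns the number of steps that were actually run.
 */
static int toy_eh_escalate(struct toy_eh_state *st,
                           const toy_eh_step *steps, size_t nsteps)
{
    int ran = 0;

    for (size_t i = 0; i < nsteps && st->work_q > 0; i++) {
        int recovered = steps[i](st);

        if (recovered < 0)
            recovered = 0;
        if (recovered > st->work_q)
            recovered = st->work_q;
        st->work_q -= recovered;
        st->done_q += recovered;
        ran++;
    }

    /* Last resort: offline the devices and EH-finish the rest. */
    st->offlined = st->work_q;
    st->done_q += st->work_q;
    st->work_q = 0;
    return ran;
}

/* Example steps: recover nothing, one scmd, or everything pending. */
static int toy_step_none(const struct toy_eh_state *st) { (void)st; return 0; }
static int toy_step_one(const struct toy_eh_state *st) { (void)st; return 1; }
static int toy_step_all(const struct toy_eh_state *st) { return st->work_q; }
```

In this model the later, more severe steps are skipped entirely once
an earlier one empties the work queue, and done_q always ends up
holding every scmd that entered EH.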


[2-2] EH through transportt->eh_strategy_handler()

transportt->eh_strategy_handler() is invoked in the place of
scsi_unjam_host() and it is responsible for the whole recovery
process. On completion, the handler should have made lower layers
forget about all failed scmds and either ready for new commands or
offline. Also, it should perform SCSI EH maintenance chores to
maintain integrity of SCSI midlayer. IOW, of the steps described in
[2-1-2], all steps except for #1 must be implemented by
eh_strategy_handler().


[2-2-1] Pre transportt->eh_strategy_handler() SCSI midlayer conditions

The following conditions are true on entry to the handler.

 - Each failed scmd's eh_eflags field is set appropriately.

 - Each failed scmd is linked on shost->eh_cmd_q by scmd->eh_entry.

 - SHOST_RECOVERY is set.

 - shost->host_failed == shost->host_busy


[2-2-2] Post transportt->eh_strategy_handler() SCSI midlayer conditions

The following conditions must be true on exit from the handler.

 - shost->host_failed is zero.

 - Each scmd's eh_eflags field is cleared.

 - Each scmd is in such a state that scsi_setup_cmd_retry() on the
   scmd doesn't make any difference.

 - shost->eh_cmd_q is cleared.

 - Each scmd->eh_entry is cleared.

 - Either scsi_queue_insert() or scsi_finish_command() is called on
   each scmd. Note that the handler is free to use scmd->retries and
   ->allowed to limit the number of retries.
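
A handler author might check most of these exit conditions with a
debug helper along the following lines. This is a sketch over a toy
model: struct toy_scmd, struct toy_host and
toy_eh_postconditions_hold() are hypothetical, and the
scsi_setup_cmd_retry() condition is not modelled.

```c
#include <stdbool.h>
#include <stddef.h>

struct toy_scmd {
    unsigned eh_eflags;     /* must be 0 on exit */
    bool on_eh_list;        /* models scmd->eh_entry still being linked */
    bool handed_back;       /* requeued or finished by the handler */
};

struct toy_host {
    int host_failed;        /* must be 0 on exit */
    size_t eh_cmd_q_len;    /* models the length of shost->eh_cmd_q */
};

/* Returns true if the modelled exit conditions all hold. */
static bool toy_eh_postconditions_hold(const struct toy_host *h,
                                       const struct toy_scmd *scmds,
                                       size_t n)
{
    if (h->host_failed != 0 || h->eh_cmd_q_len != 0)
        return false;

    for (size_t i = 0; i < n; i++) {
        if (scmds[i].eh_eflags != 0 || scmds[i].on_eh_list ||
            !scmds[i].handed_back)
            return false;
    }
    return true;
}
```

A single scmd with a leftover eh_eflags bit, or a non-zero
host_failed, is enough for the check to fail.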


[2-2-3] Things to consider

 - Know that timed out scmds are still active on lower layers. Make
   lower layers forget about them before doing anything else with
   those scmds.

 - For consistency, when accessing/modifying shost data structure,
   grab shost->host_lock.

 - On completion, each failed sdev must have forgotten about all
   active scmds.

 - On completion, each failed sdev must be ready for new commands or
   offline.


--
Tejun Heo
htejun@gmail.com
11th September 2005