The Linux Watchdog driver API.

Copyright 2002 Christer Weingel

Some parts of this document are copied verbatim from the sbc60xxwdt
driver which is (c) Copyright 2000 Jakob Oestergaard

This document describes the state of the Linux 2.4.18 kernel.

Introduction:

A Watchdog Timer (WDT) is a hardware circuit that can reset the
computer system in case of a software fault. You probably knew that
already.

Usually a userspace daemon will notify the kernel watchdog driver via the
/dev/watchdog special device file that userspace is still alive, at
regular intervals. When such a notification occurs, the driver will
usually tell the hardware watchdog that everything is in order, and
that the watchdog should wait for yet another little while to reset
the system. If userspace fails (RAM error, kernel bug, whatever), the
notifications cease to occur, and the hardware watchdog will reset the
system (causing a reboot) after the timeout occurs.

The Linux watchdog API is a rather ad hoc construction and different
drivers implement different, and sometimes incompatible, parts of it.
This file is an attempt to document the existing usage and allow
future driver writers to use it as a reference.

The simplest API:

All drivers support the basic mode of operation, where the watchdog
activates as soon as /dev/watchdog is opened and will reboot unless
the watchdog is pinged within a certain time; this time is called the
timeout or margin. The simplest way to ping the watchdog is to write
some data to the device. So a very simple watchdog daemon would look
like this:

    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    int main(int argc, const char *argv[]) {
        /* Opening the device activates the watchdog. */
        int fd = open("/dev/watchdog", O_WRONLY);
        if (fd == -1) {
            perror("watchdog");
            exit(1);
        }
        while (1) {
            /* Writing any data pings the watchdog. */
            write(fd, "\0", 1);
            sleep(10);
        }
    }

A more advanced daemon could, for example, check that an HTTP server is
still responding before doing the write call to ping the watchdog.
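
The idea of pinging only after a health check can be sketched as
follows. This is a minimal sketch under stated assumptions: the helper
names service_is_healthy and ping_if_healthy are hypothetical and not
part of any watchdog API, and the check only tests that a path is
readable, standing in for a real probe of the HTTP server.

```c
#include <unistd.h>

/* Hypothetical health check (an assumption for illustration): a real
 * daemon would probe the HTTP server here. This stand-in only tests
 * that the given path is readable. */
static int service_is_healthy(const char *path)
{
    return access(path, R_OK) == 0;
}

/* Ping the watchdog only if the health check passes.
 * Returns 1 if a ping was written, 0 if it was withheld,
 * -1 if the write to the watchdog device failed. */
static int ping_if_healthy(int wd_fd, const char *path)
{
    if (!service_is_healthy(path))
        return 0;   /* no ping: the watchdog timeout keeps running */
    return write(wd_fd, "\0", 1) == 1 ? 1 : -1;
}
```

A daemon would call ping_if_healthy() in place of the bare write in its
main loop. When the check fails, no ping is sent and the hardware
timeout is left to run out.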

When the device is closed, the watchdog is disabled. This is not
always such a good idea, since if there is a bug in the watchdog
daemon and it crashes, the system will not reboot. Because of this,
some of the drivers support the configuration option "Disable watchdog
shutdown on close", CONFIG_WATCHDOG_NOWAYOUT. If it is set to Y when
compiling the kernel, there is no way of disabling the watchdog once
it has been started. So, if the watchdog daemon crashes, the system
will reboot after the timeout has passed.

Some other drivers will not disable the watchdog unless a specific
magic character 'V' has been sent to /dev/watchdog just before closing
the file. If the userspace daemon closes the file without sending
this special character, the driver will assume that the daemon (and
userspace in general) died, and will stop pinging the watchdog without
disabling it first. This will then cause a reboot.
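
A deliberate shutdown with the magic character might look like the
sketch below. The function name watchdog_close_cleanly is hypothetical.
On drivers without magic close handling, the 'V' is presumably treated
as just another write to the device.

```c
#include <unistd.h>

/* Write the magic character 'V' and then close the device, so that a
 * driver with magic close handling disables the watchdog instead of
 * letting it expire. Returns 0 on success, -1 if either step failed. */
static int watchdog_close_cleanly(int fd)
{
    int ok = (write(fd, "V", 1) == 1);

    if (close(fd) == -1)
        ok = 0;
    return ok ? 0 : -1;
}
```

The write is issued immediately before the close, matching the
requirement that the character be sent just before closing the file.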

The ioctl API:

All conforming drivers also support an ioctl API.

Pinging the watchdog using an ioctl:

All drivers that have an ioctl interface support at least one ioctl,
KEEPALIVE. This ioctl does exactly the same thing as a write to the
watchdog device, so the main loop in the above program could be
replaced with:

    while (1) {
        ioctl(fd, WDIOC_KEEPALIVE, 0);
        sleep(10);
    }

The argument to the ioctl is ignored.

Setting and getting the timeout:

For some drivers it is possible to modify the watchdog timeout on the
fly with the SETTIMEOUT ioctl; those drivers have the WDIOF_SETTIMEOUT
flag set in their options field. The argument is an integer
representing the timeout in seconds. The driver returns the real
timeout used in the same variable, and this timeout might differ from
the requested one due to limitations of the hardware.

    int timeout = 45;
    ioctl(fd, WDIOC_SETTIMEOUT, &timeout);
    printf("The timeout was set to %d seconds\n", timeout);

This example might actually print "The timeout was set to 60 seconds"
if the device has a granularity of minutes for its timeout.
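
Since only some drivers accept SETTIMEOUT, a cautious caller could
check the WDIOF_SETTIMEOUT flag first. The sketch below does this with
the GETSUPPORT ioctl described later in this document. The function
names can_set_timeout and set_timeout_checked are hypothetical, and the
code assumes the <linux/watchdog.h> header is available.

```c
#include <sys/ioctl.h>
#include <linux/watchdog.h>

/* Returns 1 if the options bitmask advertises SETTIMEOUT support. */
static int can_set_timeout(unsigned int options)
{
    return (options & WDIOF_SETTIMEOUT) != 0;
}

/* Request a new timeout. On success, *seconds holds the timeout the
 * driver really uses, which may differ from the requested value.
 * Returns 0 on success, -1 if unsupported or if an ioctl failed. */
static int set_timeout_checked(int fd, int *seconds)
{
    struct watchdog_info ident;

    if (ioctl(fd, WDIOC_GETSUPPORT, &ident) == -1)
        return -1;
    if (!can_set_timeout(ident.options))
        return -1;
    return ioctl(fd, WDIOC_SETTIMEOUT, seconds) == -1 ? -1 : 0;
}
```

After a successful call the caller should read back *seconds rather
than assume the requested value was applied.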

Starting with the Linux 2.4.18 kernel, it is possible to query the
current timeout using the GETTIMEOUT ioctl.

    ioctl(fd, WDIOC_GETTIMEOUT, &timeout);
    printf("The timeout is %d seconds\n", timeout);

Environmental monitoring:

All watchdog drivers are required to return more information about the
system; some do temperature, fan and power level monitoring, and some
can tell you the reason for the last reboot of the system. The
GETSUPPORT ioctl is available to ask what the device can do:

    struct watchdog_info ident;
    ioctl(fd, WDIOC_GETSUPPORT, &ident);

The fields returned in the ident struct are:

    identity            a string identifying the watchdog driver
    firmware_version    the firmware version of the card if available
    options             flags describing what the device supports

The options field can have the following bits set, and describes what
kind of information the GETSTATUS and GETBOOTSTATUS ioctls can
return. [FIXME -- Is this correct?]

    WDIOF_OVERHEAT          Reset due to CPU overheat

The machine was last rebooted by the watchdog because the thermal limit was
exceeded.

    WDIOF_FANFAULT          Fan failed

A system fan monitored by the watchdog card has failed.

    WDIOF_EXTERN1           External relay 1

External monitoring relay/source 1 was triggered. Controllers intended for
real world applications include external monitoring pins that will trigger
a reset.

    WDIOF_EXTERN2           External relay 2

External monitoring relay/source 2 was triggered.

    WDIOF_POWERUNDER        Power bad/power fault

The machine is showing an undervoltage status.

    WDIOF_CARDRESET         Card previously reset the CPU

The last reboot was caused by the watchdog card.

    WDIOF_POWEROVER         Power over voltage

The machine is showing an overvoltage status. Note that if one level is
under and one over both bits will be set - this may seem odd but makes
sense.

    WDIOF_KEEPALIVEPING     Keep alive ping reply

The watchdog saw a keepalive ping since it was last queried.

    WDIOF_SETTIMEOUT        Can set/get the timeout
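
To make the options bitmask easier to read, a program could turn it
into a list of names. The following is a minimal sketch, assuming the
<linux/watchdog.h> header provides the WDIOF_* constants listed above.
The flag_name table and the describe_flags function are hypothetical
helpers, not part of the API.

```c
#include <stdio.h>
#include <string.h>
#include <linux/watchdog.h>

struct flag_name {
    unsigned int bit;
    const char *name;
};

/* The bits documented above, in the same order. */
static const struct flag_name flag_names[] = {
    { WDIOF_OVERHEAT,      "OVERHEAT" },
    { WDIOF_FANFAULT,      "FANFAULT" },
    { WDIOF_EXTERN1,       "EXTERN1" },
    { WDIOF_EXTERN2,       "EXTERN2" },
    { WDIOF_POWERUNDER,    "POWERUNDER" },
    { WDIOF_CARDRESET,     "CARDRESET" },
    { WDIOF_POWEROVER,     "POWEROVER" },
    { WDIOF_KEEPALIVEPING, "KEEPALIVEPING" },
    { WDIOF_SETTIMEOUT,    "SETTIMEOUT" },
};

/* Write a space-separated list of the set flags into buf.
 * Returns the number of flag names written. */
static int describe_flags(unsigned int flags, char *buf, size_t len)
{
    size_t i, used = 0;
    int count = 0;

    if (len == 0)
        return 0;
    buf[0] = '\0';
    for (i = 0; i < sizeof(flag_names) / sizeof(flag_names[0]); i++) {
        int n;

        if (!(flags & flag_names[i].bit))
            continue;
        n = snprintf(buf + used, len - used, "%s%s",
                     count ? " " : "", flag_names[i].name);
        if (n < 0 || (size_t)n >= len - used) {
            buf[used] = '\0';   /* drop the partial name and stop */
            break;
        }
        used += (size_t)n;
        count++;
    }
    return count;
}
```

The same helper can be applied to ident.options from GETSUPPORT or to
the flags returned by the status ioctls, since they share these bit
definitions.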

For those drivers that return any bits set in the options field, the
GETSTATUS and GETBOOTSTATUS ioctls can be used to ask for the current
status, and the status at the last reboot, respectively.

    int flags;
    ioctl(fd, WDIOC_GETSTATUS, &flags);

or

    ioctl(fd, WDIOC_GETBOOTSTATUS, &flags);

Note that not all devices support these two calls, and some only
support the GETBOOTSTATUS call.
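
One common use of GETBOOTSTATUS is to find out whether the previous
reboot was caused by the watchdog. A minimal sketch, assuming the
driver reports WDIOF_CARDRESET in the boot status; the function names
are hypothetical.

```c
#include <sys/ioctl.h>
#include <linux/watchdog.h>

/* Returns 1 if the boot status flags say the watchdog reset the CPU. */
static int rebooted_by_watchdog(int boot_flags)
{
    return (boot_flags & WDIOF_CARDRESET) != 0;
}

/* Returns 1 if the last reboot was caused by the watchdog, 0 if not,
 * and -1 if the ioctl failed, for example because the driver does not
 * support GETBOOTSTATUS. */
static int last_reboot_was_watchdog(int fd)
{
    int flags = 0;

    if (ioctl(fd, WDIOC_GETBOOTSTATUS, &flags) == -1)
        return -1;
    return rebooted_by_watchdog(flags);
}
```

Because not all devices support the call, the -1 case should be treated
as "unknown" rather than as "no watchdog reset".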

Some drivers can measure the temperature using the GETTEMP ioctl. The
returned value is the temperature in degrees Fahrenheit.

    int temperature;
    ioctl(fd, WDIOC_GETTEMP, &temperature);
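
Since the value is in degrees Fahrenheit, a program that wants Celsius
has to convert it. A short sketch; the function names are hypothetical
and the conversion is the standard C = (F - 32) * 5 / 9.

```c
#include <sys/ioctl.h>
#include <linux/watchdog.h>

/* Convert degrees Fahrenheit to degrees Celsius. */
static double fahrenheit_to_celsius(int fahrenheit)
{
    return (fahrenheit - 32) * 5.0 / 9.0;
}

/* Read the temperature and store it in *celsius.
 * Returns 0 on success, -1 if the ioctl failed. */
static int read_temperature_celsius(int fd, double *celsius)
{
    int fahrenheit;

    if (ioctl(fd, WDIOC_GETTEMP, &fahrenheit) == -1)
        return -1;
    *celsius = fahrenheit_to_celsius(fahrenheit);
    return 0;
}
```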

Finally the SETOPTIONS ioctl can be used to control some aspects of
the card's operation; right now the pcwd driver is the only one
supporting this ioctl.

    int options = 0;
    ioctl(fd, WDIOC_SETOPTIONS, &options);

The following options are available:

    WDIOS_DISABLECARD       Turn off the watchdog timer
    WDIOS_ENABLECARD        Turn on the watchdog timer
    WDIOS_TEMPPANIC         Kernel panic on temperature trip

[FIXME -- better explanations]
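
As an illustration of how these option values might be used, the sketch
below builds an options word and hands it to SETOPTIONS. It assumes
that the WDIOS_* values are bit flags that may be combined and that the
ioctl takes a pointer to the integer; neither is spelled out above, and
the function names are hypothetical.

```c
#include <sys/ioctl.h>
#include <linux/watchdog.h>

/* Build an options word: enable or disable the card, and optionally
 * ask for a kernel panic on temperature trip. Combining the bits is
 * an assumption based on their flag-style names. */
static int build_options(int enable, int temp_panic)
{
    int options = enable ? WDIOS_ENABLECARD : WDIOS_DISABLECARD;

    if (temp_panic)
        options |= WDIOS_TEMPPANIC;
    return options;
}

/* Apply the options. Returns 0 on success, -1 if the ioctl failed,
 * for example because the driver does not support SETOPTIONS. */
static int set_card_options(int fd, int enable, int temp_panic)
{
    int options = build_options(enable, temp_panic);

    return ioctl(fd, WDIOC_SETOPTIONS, &options) == -1 ? -1 : 0;
}
```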

Implementations in the current drivers in the kernel tree:

Here I have tried to summarize what the different drivers support and
where they do strange things compared to the other drivers.

acquirewdt.c -- Acquire Single Board Computer

        This driver has a hardcoded timeout of 1 minute

        Supports CONFIG_WATCHDOG_NOWAYOUT

        GETSUPPORT returns KEEPALIVEPING. GETSTATUS will return 1 if
        the device is open, 0 if not. [FIXME -- isn't this rather
        silly? To be able to use the ioctl, the device must be open
        and so GETSTATUS will always return 1].

advantechwdt.c -- Advantech Single Board Computer

        Timeout that defaults to 60 seconds, supports SETTIMEOUT.

        Supports CONFIG_WATCHDOG_NOWAYOUT

        GETSUPPORT returns WDIOF_KEEPALIVEPING and WDIOF_SETTIMEOUT.
        The GETSTATUS call returns whether the device is open or not.
        [FIXME -- silliness again?]

eurotechwdt.c -- Eurotech CPU-1220/1410

        The timeout can be set using the SETTIMEOUT ioctl and defaults
        to 60 seconds.

        Also has a module parameter "ev" (event type), which controls
        what should happen on a timeout: either the string "int", or
        anything else, which causes a reboot. [FIXME -- better
        description]

        Supports CONFIG_WATCHDOG_NOWAYOUT

        GETSUPPORT returns CARDRESET and WDIOF_SETTIMEOUT but
        GETSTATUS is not supported and GETBOOTSTATUS just returns 0.

i810-tco.c -- Intel 810 chipset

        Also has support for a lot of other i8x0 stuff, but the
        watchdog is one of the things.

        The timeout is set using the module parameter "i810_margin",
        which is in steps of 0.6 seconds where 2 < i810_margin < 64.
        The driver supports the SETTIMEOUT ioctl.

        Supports CONFIG_WATCHDOG_NOWAYOUT.

        GETSUPPORT returns WDIOF_SETTIMEOUT. The GETSTATUS call
        returns some kind of timer value which is not compatible with
        the other drivers. GETBOOTSTATUS returns some kind of
        hardware specific boot status. [FIXME -- describe this]

ib700wdt.c -- IB700 Single Board Computer

        Default timeout of 30 seconds and the timeout is settable
        using the SETTIMEOUT ioctl. Note that only a few timeout
        values are supported.

        Supports CONFIG_WATCHDOG_NOWAYOUT

        GETSUPPORT returns WDIOF_KEEPALIVEPING and WDIOF_SETTIMEOUT.
        The GETSTATUS call returns whether the device is open or not.
        [FIXME -- silliness again?]

machzwd.c -- MachZ ZF-Logic

        Hardcoded timeout of 10 seconds

        Has a module parameter "action" that controls what happens
        when the timeout runs out, which can be 0 = RESET (default),
        1 = SMI, 2 = NMI, 3 = SCI.

        Supports CONFIG_WATCHDOG_NOWAYOUT and the magic character
        'V' close handling.

        GETSUPPORT returns WDIOF_KEEPALIVEPING, and the GETSTATUS call
        returns whether the device is open or not. [FIXME -- silliness
        again?]

mixcomwd.c -- MixCom Watchdog

        [FIXME -- I'm unable to tell what the timeout is]

        Supports CONFIG_WATCHDOG_NOWAYOUT

        GETSUPPORT returns WDIOF_KEEPALIVEPING, GETSTATUS returns
        whether the device is opened or not [FIXME -- I'm not really
        sure how this works, there seems to be some magic connected to
        CONFIG_WATCHDOG_NOWAYOUT]

pcwd.c -- Berkshire PC Watchdog

        Hardcoded timeout of 1.5 seconds

        Supports CONFIG_WATCHDOG_NOWAYOUT

        GETSUPPORT returns WDIOF_OVERHEAT|WDIOF_CARDRESET and both
        GETSTATUS and GETBOOTSTATUS return something useful.

        The SETOPTIONS call can be used to enable and disable the card
        and to ask the driver to call panic if the system overheats.

sbc60xxwdt.c -- 60xx Single Board Computer

        Hardcoded timeout of 10 seconds

        Does not support CONFIG_WATCHDOG_NOWAYOUT, but has the magic
        character 'V' close handling.

        No bits set in GETSUPPORT

scx200.c -- National SCx200 CPUs

        Not in the kernel yet.

        The timeout is set using a module parameter "margin" which
        defaults to 60 seconds. The timeout can also be set using
        SETTIMEOUT and read using GETTIMEOUT.

        Supports a module parameter "nowayout" that is initialized
        with the value of CONFIG_WATCHDOG_NOWAYOUT. Also supports the
        magic character 'V' handling.

shwdt.c -- SuperH 3/4 processors

        [FIXME -- I'm unable to tell what the timeout is]

        Supports CONFIG_WATCHDOG_NOWAYOUT

        GETSUPPORT returns WDIOF_KEEPALIVEPING, and the GETSTATUS call
        returns whether the device is open or not. [FIXME -- silliness
        again?]

softdog.c -- Software watchdog

        The timeout is set with the module parameter "soft_margin"
        which defaults to 60 seconds. The timeout is also settable
        using the SETTIMEOUT ioctl.

        Supports CONFIG_WATCHDOG_NOWAYOUT

        WDIOF_SETTIMEOUT bit set in GETSUPPORT

w83877f_wdt.c -- W83877F Computer

        Hardcoded timeout of 30 seconds

        Does not support CONFIG_WATCHDOG_NOWAYOUT, but has the magic
        character 'V' close handling.

        No bits set in GETSUPPORT

wdt.c -- ICS WDT500/501 ISA and
wdt_pci.c -- ICS WDT500/501 PCI

        Default timeout of 60 seconds. The timeout is also settable
        using the SETTIMEOUT ioctl.

        Supports CONFIG_WATCHDOG_NOWAYOUT

        GETSUPPORT returns with bits set depending on the actual
        card. The WDT501 supports a lot of external monitoring; the
        WDT500 much less.

wdt285.c -- Footbridge watchdog

        The timeout is set with the module parameter "soft_margin"
        which defaults to 60 seconds. The timeout is also settable
        using the SETTIMEOUT ioctl.

        Does not support CONFIG_WATCHDOG_NOWAYOUT

        WDIOF_SETTIMEOUT bit set in GETSUPPORT

wdt977.c -- Netwinder W83977AF chip

        Hardcoded timeout of 3 minutes

        Supports CONFIG_WATCHDOG_NOWAYOUT

        Does not support any ioctls at all.