The Linux IPMI Driver
---------------------
Corey Minyard

The Intelligent Platform Management Interface, or IPMI, is a
standard for controlling intelligent devices that monitor a system.
It provides for dynamic discovery of sensors in the system and the
ability to monitor the sensors and be informed when the sensors'
values change or go outside certain boundaries. It also has a
standardized database for field-replaceable units (FRUs) and a watchdog
timer.

To use this, you need an interface to an IPMI controller in your
system (called a Baseboard Management Controller, or BMC) and
management software that can use the IPMI system.

This document describes how to use the IPMI driver for Linux. If you
are not familiar with IPMI itself, see the web site at
http://www.intel.com/design/servers/ipmi/index.htm. IPMI is a big
subject and I can't cover it all here!

Basic Design
------------

The Linux IPMI driver is designed to be very modular and flexible; you
only need to take the pieces you need, and you can use it in many
different ways. Because of that, it is broken into many chunks of
code. These chunks are:

ipmi_msghandler - This is the central piece of software for the IPMI
system. It handles all messages, message timing, and responses. The
IPMI users tie into this, and the IPMI physical interfaces (called
System Management Interfaces, or SMIs) also tie in here. This
provides the kernelland interface for IPMI, but does not provide an
interface for use by application processes.

ipmi_devintf - This provides a userland IOCTL interface for the IPMI
driver. Each open file for this device ties in to the message handler
as an IPMI user.

ipmi_kcs_drv - A driver for the KCS SMI. Most systems have a KCS
interface for IPMI.


Much documentation for the interface is in the include files. The
IPMI include files are:

ipmi.h - Contains the user interface and IOCTL interface for IPMI.

ipmi_smi.h - Contains the interface for SMI drivers to use.

ipmi_msgdefs.h - General definitions for base IPMI messaging.


Addressing
----------

The IPMI addressing works much like IP addresses: you have an overlay
to handle the different address types. The overlay is:

  struct ipmi_addr
  {
        int addr_type;
        short channel;
        char data[IPMI_MAX_ADDR_SIZE];
  };

The addr_type determines what the address really is. The driver
currently understands two different types of addresses.

"System Interface" addresses are defined as:

  struct ipmi_system_interface_addr
  {
        int addr_type;
        short channel;
  };

and the type is IPMI_SYSTEM_INTERFACE_ADDR_TYPE. This is used for talking
straight to the BMC on the current card. The channel must be
IPMI_BMC_CHANNEL.

Messages that are destined to go out on the IPMB bus use the
IPMI_IPMB_ADDR_TYPE address type. The format is:

  struct ipmi_ipmb_addr
  {
        int addr_type;
        short channel;
        unsigned char slave_addr;
        unsigned char lun;
  };

The "channel" here is generally zero, but some devices support more
than one channel. It corresponds to the channel as defined in the IPMI
spec.
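
As a rough illustration of how the overlay is meant to be used, the
sketch below fills a generic struct ipmi_addr from each of the two
specific address types. The struct layouts are copied from the text
above; the numeric values of the constants and the helper names
(set_si_addr, set_ipmb_addr, get_ipmb_slave_addr) are assumptions made
for this example, so take the real definitions from ipmi.h.

```c
#include <string.h>

/* Assumed values for illustration only; use the ones from ipmi.h. */
#define IPMI_MAX_ADDR_SIZE              32
#define IPMI_SYSTEM_INTERFACE_ADDR_TYPE 0x0c
#define IPMI_IPMB_ADDR_TYPE             0x01
#define IPMI_BMC_CHANNEL                0xf

struct ipmi_addr {
        int   addr_type;
        short channel;
        char  data[IPMI_MAX_ADDR_SIZE];
};

struct ipmi_system_interface_addr {
        int   addr_type;
        short channel;
};

struct ipmi_ipmb_addr {
        int           addr_type;
        short         channel;
        unsigned char slave_addr;
        unsigned char lun;
};

/* Fill the generic overlay with a System Interface address. */
static void set_si_addr(struct ipmi_addr *addr)
{
        struct ipmi_system_interface_addr si;

        memset(&si, 0, sizeof(si));
        si.addr_type = IPMI_SYSTEM_INTERFACE_ADDR_TYPE;
        si.channel = IPMI_BMC_CHANNEL;

        memset(addr, 0, sizeof(*addr));
        memcpy(addr, &si, sizeof(si));
}

/* Fill the generic overlay with an IPMB address. */
static void set_ipmb_addr(struct ipmi_addr *addr, short channel,
                          unsigned char slave_addr, unsigned char lun)
{
        struct ipmi_ipmb_addr ipmb;

        memset(&ipmb, 0, sizeof(ipmb));
        ipmb.addr_type = IPMI_IPMB_ADDR_TYPE;
        ipmb.channel = channel;
        ipmb.slave_addr = slave_addr;
        ipmb.lun = lun;

        memset(addr, 0, sizeof(*addr));
        memcpy(addr, &ipmb, sizeof(ipmb));
}

/* Return the IPMB slave address, or -1 if addr is not an IPMB address. */
static int get_ipmb_slave_addr(const struct ipmi_addr *addr)
{
        struct ipmi_ipmb_addr ipmb;

        if (addr->addr_type != IPMI_IPMB_ADDR_TYPE)
                return -1;
        memcpy(&ipmb, addr, sizeof(ipmb));
        return ipmb.slave_addr;
}
```

The helpers copy with memcpy() rather than casting pointers, which
keeps the example independent of compiler aliasing rules. Code that
receives a struct ipmi_addr should check addr_type first, as
get_ipmb_slave_addr() does, before interpreting the remaining bytes.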


Messages
--------

Messages are defined as:

  struct ipmi_msg
  {
        unsigned char netfn;
        unsigned char lun;
        unsigned char cmd;
        unsigned char *data;
        int data_len;
  };

The driver takes care of adding/stripping the header information. The
data portion is just the data to be sent (do NOT put addressing info
here) or the response. Note that the completion code of a response is
the first item in "data". It is not stripped out because that is how
all the messages are defined in the spec (and thus makes counting the
offsets a little easier :-).
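
Since the completion code stays in data[0], code that handles a
response usually splits it off before looking at the rest. The sketch
below shows one way to do that. The struct is copied from the text
above; the helper names are made up for this example and are not part
of the driver.

```c
#include <stddef.h>

struct ipmi_msg {
        unsigned char  netfn;
        unsigned char  lun;
        unsigned char  cmd;
        unsigned char *data;
        int            data_len;
};

/* Return the completion code (0..255), or -1 if the response is empty. */
static int response_completion_code(const struct ipmi_msg *rsp)
{
        if (rsp->data == NULL || rsp->data_len < 1)
                return -1;
        return rsp->data[0];
}

/*
 * Point *payload at the bytes that follow the completion code and
 * return how many there are (0 if the response carries none).
 */
static int response_payload(const struct ipmi_msg *rsp,
                            const unsigned char **payload)
{
        if (rsp->data == NULL || rsp->data_len < 1) {
                *payload = NULL;
                return 0;
        }
        *payload = rsp->data + 1;
        return rsp->data_len - 1;
}
```

With this split, payload[0] corresponds to byte 2 of the response data
as numbered in the IPMI spec, because the spec counts the completion
code as byte 1.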

When using the IOCTL interface from userland, you must provide a block
of data for "data", fill it, and set data_len to the length of the
block of data, even when receiving messages. Otherwise the driver
will have no place to put the message.

Messages coming up from the message handler in kernelland will come in
as:

  struct ipmi_recv_msg
  {
        struct list_head link;

        /* The type of message as defined in the "Receive Types"
           defines above. */
        int recv_type;

        ipmi_user_t *user;
        struct ipmi_addr addr;
        long msgid;
        struct ipmi_msg msg;

        /* Call this when done with the message. It will presumably free
           the message and do any other necessary cleanup. */
        void (*done)(struct ipmi_recv_msg *msg);

        /* Place-holder for the data, don't make any assumptions about
           the size or existence of this, since it may change. */
        unsigned char msg_data[IPMI_MAX_MSG_LENGTH];
  };

You should look at the receive type and handle the message
appropriately.


The Upper Layer Interface (Message Handler)
-------------------------------------------

The upper layer of the interface provides the users with a consistent
view of the IPMI interfaces. It allows multiple SMI interfaces to be
addressed (because some boards actually have multiple BMCs on them)
and the user should not have to care what type of SMI is below them.


Creating the User

To use the message handler, you must first create a user using
ipmi_create_user(). The interface number specifies which SMI you want
to connect to, and you must supply callback functions to be called
when data comes in. The callback functions can run at interrupt level,
so be careful using the callbacks. This also allows you to pass in a
piece of data, the handler_data, that will be passed back to you on
all calls.

Once you are done, call ipmi_destroy_user() to get rid of the user.

From userland, opening the device automatically creates a user, and
closing the device automatically destroys the user.


Messaging

To send a message from kernel-land, the ipmi_request() call does
pretty much all message handling. Most of the parameters are
self-explanatory. However, it takes a "msgid" parameter. This is NOT
the sequence number of messages. It is simply a long value that is
passed back when the response for the message is returned. You may
use it for anything you like.
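
One common use for an opaque value like msgid is to index the caller's
own table of outstanding requests, so the response handler can find
the matching request without searching. The sketch below shows only
that bookkeeping. Everything in it (the pending_req structure,
pending_alloc, pending_complete, the table size) is hypothetical and
belongs to the caller, not to the driver.

```c
#include <stddef.h>

#define MAX_PENDING 8   /* arbitrary table size for this sketch */

/* Hypothetical per-request bookkeeping kept by the caller. */
struct pending_req {
        int         in_use;
        const char *label;      /* whatever the caller wants to remember */
};

static struct pending_req pending[MAX_PENDING];

/*
 * Claim a free slot. The returned index is the value the caller would
 * pass as msgid when sending; -1 means the table is full.
 */
static long pending_alloc(const char *label)
{
        long i;

        for (i = 0; i < MAX_PENDING; i++) {
                if (!pending[i].in_use) {
                        pending[i].in_use = 1;
                        pending[i].label = label;
                        return i;
                }
        }
        return -1;
}

/*
 * Called with the msgid echoed back in a response. Releases the slot
 * and returns what was stored there, or NULL for an unknown msgid.
 */
static const char *pending_complete(long msgid)
{
        if (msgid < 0 || msgid >= MAX_PENDING || !pending[msgid].in_use)
                return NULL;
        pending[msgid].in_use = 0;
        return pending[msgid].label;
}
```

Because responses may arrive at interrupt level, a real table of this
kind would also need locking appropriate to that context; the sketch
leaves that out.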

Responses come back in the function pointed to by the ipmi_recv_hndl
field of the "handler" that you passed in to ipmi_create_user().
Remember again, these may be running at interrupt level. Remember to
look at the receive type, too.

From userland, you fill out an ipmi_req_t structure and use the
IPMICTL_SEND_COMMAND ioctl. For incoming stuff, you can use select()
or poll() to wait for messages to come in. However, you cannot use
read() to get them; you must call the IPMICTL_RECEIVE_MSG ioctl with the
ipmi_recv_t structure to actually get the message. Remember that you
must supply a pointer to a block of data in the msg.data field, and
you must fill in the msg.data_len field with the size of the data.
This gives the receiver a place to actually put the message.

If the message cannot fit into the data you provide, you will get an
EMSGSIZE error and the driver will leave the data in the receive
queue. If you want to get it and have it truncate the message, use
the IPMICTL_RECEIVE_MSG_TRUNC ioctl.
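
The difference between the two receive ioctls can be pictured with a
small stand-alone model. This is not driver code and makes no ioctl
calls: queued_msg and model_receive are invented for the example, and
the return convention (bytes copied, or a negative errno) is this
sketch's own choice rather than what the real ioctls return. It only
mirrors the behavior described above: a plain receive into a buffer
that is too small fails with EMSGSIZE and leaves the message queued,
while a truncating receive cuts the message to fit and dequeues it.

```c
#include <errno.h>
#include <string.h>

struct ipmi_msg {
        unsigned char  netfn;
        unsigned char  lun;
        unsigned char  cmd;
        unsigned char *data;
        int            data_len;
};

/* Hypothetical stand-in for one message sitting in the receive queue. */
struct queued_msg {
        const unsigned char *bytes;
        int                  len;
        int                  pending;   /* non-zero while still queued */
};

/*
 * msg->data / msg->data_len describe the caller's buffer on entry.
 * Returns the number of bytes copied, -EMSGSIZE if the message does
 * not fit and truncate is 0, or -EAGAIN (an assumption of this model)
 * if nothing is queued.
 */
static int model_receive(struct queued_msg *q, struct ipmi_msg *msg,
                         int truncate)
{
        int n = q->len;

        if (!q->pending)
                return -EAGAIN;
        if (n > msg->data_len) {
                if (!truncate)
                        return -EMSGSIZE;       /* message stays queued */
                n = msg->data_len;              /* cut to fit */
        }
        memcpy(msg->data, q->bytes, (size_t)n);
        msg->data_len = n;
        q->pending = 0;
        return n;
}
```

In this model, a caller that sees -EMSGSIZE can either retry with a
larger buffer or retry with truncate set, which is the choice the text
above describes.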

When you send a command (which is defined by the lowest-order bit of
the netfn per the IPMI spec) on the IPMB bus, the driver will
automatically assign the sequence number to the command and save the
command. If the response is not received in the IPMI-specified 5
seconds, it will generate a response automatically saying the command
timed out. If an unsolicited response comes in (if it was after 5
seconds, for instance), that response will be ignored.
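
The "lowest-order bit" rule is simple enough to show directly. In the
IPMI spec, request (command) network functions are even and the
matching response is the next odd value. The two helpers below are
illustrative names only, not functions provided by the driver.

```c
/* A netfn with the low bit clear is a command (request). */
static int netfn_is_command(unsigned char netfn)
{
        return (netfn & 1) == 0;
}

/* The response netfn is the command netfn with the low bit set. */
static unsigned char netfn_response_for(unsigned char netfn)
{
        return (unsigned char)(netfn | 1);
}
```

For example, the App network function is 0x06 for requests and 0x07
for responses, so netfn_is_command(0x06) is true and
netfn_response_for(0x06) gives 0x07.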

In kernelland, after you receive a message and are done with it, you
MUST call ipmi_free_recv_msg() on it, or you will leak messages. Note
that you should NEVER mess with the "done" field of a message; it is
required to properly clean up the message.

Note that when sending, there is an ipmi_request_supply_msgs() call
that lets you supply the smi and receive message. This is useful for
pieces of code that need to work even if the system is out of buffers
(the watchdog timer uses this, for instance). You supply your own
buffer and own free routines. This is not recommended for normal use,
though, since it is tricky to manage your own buffers.


Events and Incoming Commands

The driver takes care of polling for IPMI events and receiving
commands. (Commands are messages that are not responses; they are
commands that other things on the IPMB bus have sent you.) To receive
these, you must register for them; they will not automatically be sent
to you.

To receive events, you must call ipmi_set_gets_events() and set the
"val" to non-zero. Any events that have been received by the driver
since startup will immediately be delivered to the first user that
registers for events. After that, if multiple users are registered
for events, they will all receive all events that come in.

For receiving commands, you have to individually register commands you
want to receive. Call ipmi_register_for_cmd() and supply the netfn
and command name for each command you want to receive. Only one user
may be registered for each netfn/cmd, but different users may register
for different commands.
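
The "one user per netfn/cmd" rule behaves like a small lookup table
keyed on the pair. The model below is a stand-alone sketch of that
rule only. The table, the integer user identifiers, the function names
and the error codes are all assumptions of this example and are not
taken from the driver.

```c
#include <errno.h>

#define MAX_CMD_REGS 16         /* arbitrary limit for this sketch */

struct cmd_reg {
        unsigned char netfn;
        unsigned char cmd;
        int           user;     /* hypothetical user identifier */
        int           in_use;
};

static struct cmd_reg regs[MAX_CMD_REGS];

/*
 * Register user for netfn/cmd. Returns 0 on success, -EBUSY if some
 * user already holds that pair, -ENOMEM if the table is full.
 */
static int model_register_for_cmd(int user, unsigned char netfn,
                                  unsigned char cmd)
{
        int i, free_slot = -1;

        for (i = 0; i < MAX_CMD_REGS; i++) {
                if (regs[i].in_use) {
                        if (regs[i].netfn == netfn && regs[i].cmd == cmd)
                                return -EBUSY;
                } else if (free_slot < 0) {
                        free_slot = i;
                }
        }
        if (free_slot < 0)
                return -ENOMEM;
        regs[free_slot] = (struct cmd_reg){ netfn, cmd, user, 1 };
        return 0;
}

/* Return the user registered for netfn/cmd, or -1 if there is none. */
static int model_lookup_cmd_user(unsigned char netfn, unsigned char cmd)
{
        int i;

        for (i = 0; i < MAX_CMD_REGS; i++)
                if (regs[i].in_use && regs[i].netfn == netfn &&
                    regs[i].cmd == cmd)
                        return regs[i].user;
        return -1;
}
```

An incoming command for which the lookup finds no user has nobody to
be delivered to, which is why registration has to happen before the
command arrives.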

From userland, equivalent IOCTLs are provided to do these functions.


The Lower Layer (SMI) Interface
-------------------------------

As mentioned before, multiple SMI interfaces may be registered to the
message handler. Each of these is assigned an interface number when
it registers with the message handler. They are generally assigned
in the order they register, although if an SMI unregisters and then
another one registers, all bets are off.

The ipmi_smi.h include file defines the interface for SMIs; see that
for more details.


The KCS Driver
--------------

The KCS driver allows up to 4 KCS interfaces to be configured in the
system. By default, the driver will register one KCS interface at the
spec-specified I/O port 0xca2 without interrupts. You can change this
at module load time (for a module) with:

  insmod ipmi_kcs_drv.o kcs_ports=<port1>,<port2>... kcs_addrs=<addr1>,<addr2>
       kcs_irqs=<irq1>,<irq2>... kcs_trydefaults=[0|1]

The KCS driver supports two types of interfaces: ports (for I/O port
based KCS interfaces) and memory addresses (for KCS interfaces in
memory). The driver will support both of them simultaneously. Setting
the port to zero (or just not specifying it) will allow the memory
address to be used. The port will override the memory address if it
is specified and non-zero. kcs_trydefaults sets whether the standard
IPMI interface at 0xca2 and any interfaces specified by ACPI are
tried. By default, the driver tries them; set this value to zero to
turn this off.

When compiled into the kernel, the addresses can be specified on the
kernel command line as:

  ipmi_kcs=<bmc1>:<irq1>,<bmc2>:<irq2>....,[nodefault]

Each <bmcx> value is either "p<port>" or "m<addr>" for port or memory
addresses. So for instance, a KCS interface at port 0xca2 using
interrupt 9 and a memory interface at address 0xf9827341 with no
interrupt would be specified "ipmi_kcs=p0xca2:9,m0xf9827341".
If you specify zero for an irq or don't specify it, the driver will
run polled unless the software can detect the interrupt to use in the
ACPI tables.
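
To make the command-line format concrete, here is a small stand-alone
parser for the value that follows "ipmi_kcs=". It is a sketch written
for this document, not the parser the driver uses, and the kcs_spec
structure and parse_ipmi_kcs name are hypothetical. It accepts entries
of the form p<port> or m<addr>, each optionally followed by :<irq>,
plus the "nodefault" keyword.

```c
#include <stdlib.h>
#include <string.h>

#define KCS_MAX 4       /* the text above allows up to 4 KCS interfaces */

struct kcs_spec {
        char          type;     /* 'p' for I/O port, 'm' for memory */
        unsigned long addr;
        int           irq;      /* 0 means none given (polled or ACPI) */
};

/*
 * Parse e.g. "p0xca2:9,m0xf9827341,nodefault".
 * Returns the number of interfaces parsed, or -1 if malformed.
 */
static int parse_ipmi_kcs(const char *s, struct kcs_spec out[KCS_MAX],
                          int *nodefault)
{
        int n = 0;

        *nodefault = 0;
        while (*s != '\0') {
                char *end;

                if (strncmp(s, "nodefault", 9) == 0 &&
                    (s[9] == ',' || s[9] == '\0')) {
                        *nodefault = 1;
                        s += 9;
                } else {
                        if ((*s != 'p' && *s != 'm') || n == KCS_MAX)
                                return -1;
                        out[n].type = *s++;
                        out[n].addr = strtoul(s, &end, 0);
                        if (end == s)
                                return -1;      /* no number after p/m */
                        s = end;
                        out[n].irq = 0;
                        if (*s == ':') {
                                s++;
                                out[n].irq = (int)strtol(s, &end, 0);
                                if (end == s)
                                        return -1;
                                s = end;
                        }
                        n++;
                }
                if (*s == ',')
                        s++;
                else if (*s != '\0')
                        return -1;
        }
        return n;
}
```

Run on the example from the text, "p0xca2:9,m0xf9827341", this yields
two entries: a port interface at 0xca2 with irq 9 and a memory
interface at 0xf9827341 with irq 0.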

By default, the driver will attempt to detect a KCS device at the
spec-specified 0xca2 address and any address specified by ACPI. If
you want to turn this off, use the "nodefault" option.

If you have high-res timers compiled into the kernel, the driver will
use them to provide much better performance. Note that if you do not
have high-res timers enabled in the kernel and you don't have
interrupts enabled, the driver will run VERY slowly. Don't blame me,
the KCS interface sucks.


Other Pieces
------------

Watchdog

A watchdog timer is provided that implements the Linux-standard
watchdog timer interface. It has five module parameters that can be
used to control it:

  insmod ipmi_watchdog timeout=<t> pretimeout=<t> action=<action type>
      preaction=<preaction type> preop=<preop type>

The timeout is the number of seconds to the action, and the pretimeout
is the number of seconds before the reset that the pre-timeout panic will
occur (if pretimeout is zero, then pretimeout will not be enabled).

The action may be "reset", "power_cycle", or "power_off". It
specifies what to do when the timer times out, and defaults to
"reset".

The preaction may be "pre_smi" for an indication through the SMI
interface, "pre_int" for an indication through the SMI with an
interrupt, and "pre_nmi" for an NMI on a preaction. This is how
the driver is informed of the pretimeout.

The preop may be set to "preop_none" for no operation on a pretimeout,
"preop_panic" to set the preoperation to panic, or "preop_give_data"
to provide data to read from the watchdog device when the pretimeout
occurs. A "pre_nmi" setting CANNOT be used with "preop_give_data"
because you can't do data operations from an NMI.
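
A script or tool that builds the module parameters could check the
combination before loading the module. The sketch below does that
using only the values named in the text above; the driver itself may
accept others, and the function names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Return 1 if s equals one of the NULL-terminated names, else 0. */
static int is_one_of(const char *s, const char *const *names)
{
        for (; *names != NULL; names++)
                if (strcmp(s, *names) == 0)
                        return 1;
        return 0;
}

/*
 * Return 1 if the three settings use values named in this document
 * and do not pair "pre_nmi" with "preop_give_data"; otherwise 0.
 */
static int wdog_settings_ok(const char *action, const char *preaction,
                            const char *preop)
{
        static const char *const actions[] = {
                "reset", "power_cycle", "power_off", NULL
        };
        static const char *const preactions[] = {
                "pre_smi", "pre_int", "pre_nmi", NULL
        };
        static const char *const preops[] = {
                "preop_none", "preop_panic", "preop_give_data", NULL
        };

        if (!is_one_of(action, actions) ||
            !is_one_of(preaction, preactions) ||
            !is_one_of(preop, preops))
                return 0;

        /* Data cannot be handed to a reader from NMI context. */
        if (strcmp(preaction, "pre_nmi") == 0 &&
            strcmp(preop, "preop_give_data") == 0)
                return 0;

        return 1;
}
```

So "pre_smi" with "preop_give_data" passes the check, while "pre_nmi"
with "preop_give_data" is rejected.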

When preop is set to "preop_give_data", one byte becomes ready to read
on the device when the pretimeout occurs. Select and fasync work on
the device, as well.

When compiled into the kernel, the kernel command line is available
for configuring the watchdog:

  ipmi_wdog=<timeout>[,<pretimeout>[,<option>[,<options>....]]]

The options are the actions and preaction above (if an option
controlling the same thing is specified twice, the last is taken). An
option "start_now" is also there; if included, the watchdog will
start running immediately when all the drivers are ready. It doesn't
have to have a user hooked up to start it.

The watchdog will panic and start a 120 second reset timeout if it
gets a pre-action. During a panic or a reboot, the watchdog will
start a 120 second timer if it is running to make sure the reboot occurs.

Note that if you use the NMI preaction for the watchdog, you MUST
NOT use nmi watchdog mode 1. If you use the NMI watchdog, you
must use mode 2.