PCI Power Management
~~~~~~~~~~~~~~~~~~~~

An overview of the concepts and the related functions in the Linux kernel

Patrick Mochel
(and others)

---------------------------------------------------------------------------

1. Overview
2. How The PCI Subsystem Handles Power Management
3. PCI Utility Functions
4. PCI Device Drivers
5. Resources

1. Overview
~~~~~~~~~~~

The PCI Power Management Specification was introduced between the PCI 2.1 and
PCI 2.2 Specifications. It defines a standard interface for controlling
various power management operations.

Implementation of the PCI PM Spec is optional, as are several sub-components of
it. If a device supports the PCI PM Spec, the device will have an 8-byte
capability field in its PCI configuration space. This field is used to describe
and control the standard PCI power management features.

The PCI PM spec defines 4 operating states for devices (D0 - D3) and for buses
(B0 - B3). The higher the number, the less power the device consumes. However,
the higher the number, the longer the latency is for the device to return to
an operational state (D0).

There are actually two D3 states. When someone talks about D3, they usually
mean D3hot, which corresponds to an ACPI D2 state (power is reduced, the
device may lose some context). But they may also mean D3cold, which is an
ACPI D3 state (power is fully off, all state was discarded); or both.

Bus power management is not covered in this version of this document.

Note that all PCI devices support D0 and D3cold by default, regardless of
whether or not they implement any of the PCI PM spec.

The possible state transitions that a device can undergo are:

+---------------------------+
| Current State | New State |
+---------------------------+
| D0            | D1, D2, D3|
+---------------------------+
| D1            | D2, D3    |
+---------------------------+
| D2            | D3        |
+---------------------------+
| D1, D2, D3    | D0        |
+---------------------------+

Note that when the system is entering a global suspend state, all devices will
be placed into D3 and when resuming, all devices will be placed into D0.
However, when the system is running, other state transitions are possible.
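
The transition table can be encoded as one bitmask per current state. The
following is a minimal userspace sketch, not kernel code: the enum pm_state
type, the allowed[] array and transition_allowed() are hypothetical names
chosen for illustration, and D3 is treated as a single state.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for this sketch; not the kernel's pci_power_t. */
enum pm_state { PM_D0 = 0, PM_D1, PM_D2, PM_D3, PM_STATE_COUNT };

/*
 * allowed[cur] is a bitmask of the states reachable from cur.
 * The D0 bit in the D1, D2 and D3 rows encodes the last table row.
 */
static const unsigned allowed[PM_STATE_COUNT] = {
    [PM_D0] = (1u << PM_D1) | (1u << PM_D2) | (1u << PM_D3),
    [PM_D1] = (1u << PM_D0) | (1u << PM_D2) | (1u << PM_D3),
    [PM_D2] = (1u << PM_D0) | (1u << PM_D3),
    [PM_D3] = (1u << PM_D0),
};

/* Returns true if the table permits going from cur to next. */
static bool transition_allowed(enum pm_state cur, enum pm_state next)
{
    assert(cur < PM_STATE_COUNT && next < PM_STATE_COUNT);
    return (allowed[cur] >> next) & 1u;
}
```

With this encoding, transition_allowed(PM_D2, PM_D1) is false, because a
device in D2 has to return to D0 before it can enter D1.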

2. How The PCI Subsystem Handles Power Management
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The PCI suspend/resume functionality is accessed indirectly via the Power
Management subsystem. At boot, the PCI driver registers a power management
callback with that layer. Upon entering a suspend state, the PM layer iterates
through all of its registered callbacks. This currently takes place only during
APM state transitions.

Upon going to sleep, the PCI subsystem walks its device tree twice. Both times,
it does a depth-first walk of the device tree. The first walk saves each
device's state and checks for devices that will prevent the system from entering
a global power state. The next walk then places the devices in a low power
state.

The first walk allows a graceful recovery in the event of a failure, since none
of the devices have actually been powered down.

In both walks, in particular the second, all children of a bridge are touched
before the actual bridge itself. This allows the bridge to retain power while
its children are being accessed.

Upon resuming from sleep, just the opposite must be true: all bridges must be
powered on and restored before their children are powered on. This is easily
accomplished with a breadth-first walk of the PCI device tree.
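
The ordering described above can be modelled on a toy device tree in
userspace. This is only a sketch under stated assumptions: struct node and
every function below are hypothetical and are not the PCI core's data
structures or code. It shows the two children-first walks on suspend, with no
device powered down if the first walk fails, and a breadth-first walk on
resume.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_CHILDREN 4
#define MAX_NODES    16

/* Toy device node; all names here are hypothetical. */
struct node {
    const char *name;
    bool can_suspend;   /* false models a device that vetoes suspend */
    bool state_saved;
    bool powered;
    struct node *child[MAX_CHILDREN];
    size_t nchild;
};

/* Walk 1: save state, children before their bridge; stop on a veto. */
static bool save_walk(struct node *n)
{
    for (size_t i = 0; i < n->nchild; i++)
        if (!save_walk(n->child[i]))
            return false;
    if (!n->can_suspend)
        return false;
    n->state_saved = true;
    return true;
}

/* Walk 2: power down, again children before their bridge. */
static void power_down_walk(struct node *n)
{
    for (size_t i = 0; i < n->nchild; i++)
        power_down_walk(n->child[i]);
    n->powered = false;
}

/* Returns false, with every device still powered, if walk 1 fails. */
static bool suspend_tree(struct node *root)
{
    assert(root != NULL);
    if (!save_walk(root))
        return false;
    power_down_walk(root);
    return true;
}

/* Resume: breadth-first, so each bridge is powered before its children. */
static void resume_tree(struct node *root)
{
    struct node *queue[MAX_NODES];
    size_t head = 0, tail = 0;

    assert(root != NULL);
    queue[tail++] = root;
    while (head < tail) {
        struct node *n = queue[head++];

        n->powered = true;
        for (size_t i = 0; i < n->nchild; i++) {
            assert(tail < MAX_NODES);
            queue[tail++] = n->child[i];
        }
    }
}
```

Keeping the veto check in a separate first walk is what makes recovery cheap
in this model: a failed suspend_tree() has nothing to undo.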


3. PCI Utility Functions
~~~~~~~~~~~~~~~~~~~~~~~~

These are helper functions designed to be called by individual device drivers.
Assuming that a device behaves as advertised, these should be applicable in most
cases. However, results may vary.

Note that these functions are never implicitly called for the driver. The driver
is always responsible for deciding when and if to call these.


pci_save_state
--------------

Usage:
	pci_save_state(struct pci_dev *dev);

Description:
	Save first 64 bytes of PCI config space, along with any additional
	PCI-Express or PCI-X information.


pci_restore_state
-----------------

Usage:
	pci_restore_state(struct pci_dev *dev);

Description:
	Restore previously saved config space.


pci_set_power_state
-------------------

Usage:
	pci_set_power_state(struct pci_dev *dev, pci_power_t state);

Description:
	Transition device to the requested power state using the PCI PM
	Capabilities registers.

	Will fail under one of the following conditions:
	- If state is less than current state, but not D0 (illegal transition)
	- Device doesn't support PM Capabilities
	- Device does not support requested state
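
The three failure conditions can be written out as a small userspace model.
This is a sketch only, not the kernel function: struct model_dev,
model_set_power_state() and the -1 error value are hypothetical, and the
model assumes, as the PCI PM spec allows, that D1 and D2 are the only
optional states.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { D0 = 0, D1 = 1, D2 = 2, D3 = 3 };

/* Hypothetical device model; not struct pci_dev. */
struct model_dev {
    bool has_pm_cap;    /* device exposes PM Capabilities */
    bool supports_d1;   /* D1 is optional in this model */
    bool supports_d2;   /* D2 is optional in this model */
    int current_state;
};

/* Returns 0 on success, -1 on any of the three failure conditions. */
static int model_set_power_state(struct model_dev *dev, int state)
{
    assert(dev != NULL);
    assert(state >= D0 && state <= D3);

    /* Illegal transition: a lower-numbered state other than D0. */
    if (state < dev->current_state && state != D0)
        return -1;

    /* Device doesn't support PM Capabilities. */
    if (!dev->has_pm_cap)
        return -1;

    /* Device does not support the requested state. */
    if ((state == D1 && !dev->supports_d1) ||
        (state == D2 && !dev->supports_d2))
        return -1;

    dev->current_state = state;
    return 0;
}
```

In this model a device without PM Capabilities fails every request, including
one for D0; a driver for such a device has to manage power by device-specific
means.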


pci_enable_wake
---------------

Usage:
	pci_enable_wake(struct pci_dev *dev, pci_power_t state, int enable);

Description:
	Enable device to generate PME# during low power state using PCI PM
	Capabilities.

	Checks whether the device supports generating PME# from the requested
	state and fails if it does not, unless enable == 0 (the request is to
	disable wake events, which is implicit if the device doesn't support
	them in the first place).

	Note that the PMC Register in the device's PM Capabilities has a bitmask
	of the states it supports generating PME# from. D3hot is bit 3 and
	D3cold is bit 4. So, while a value of 4 as the state may not seem
	semantically correct, it is.
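
The bitmask test can be sketched in userspace as follows. The macro and
function names are hypothetical. The placement of the PME support bits in
bits 15:11 of the 16-bit PMC register is taken from the PCI PM spec rather
than from this document, and the -1 error value is an assumption of the
model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* PME support field: bits 15:11 of the PMC register (five bits). */
#define PMC_PME_SHIFT 11
#define PMC_PME_MASK  0x1fu

/* State numbers as used in the text: D3hot is 3, D3cold is 4. */
enum { ST_D0 = 0, ST_D1, ST_D2, ST_D3HOT, ST_D3COLD };

/* True if the device can generate PME# from the given state. */
static bool pme_supported(uint16_t pmc, int state)
{
    assert(state >= ST_D0 && state <= ST_D3COLD);
    unsigned support = (pmc >> PMC_PME_SHIFT) & PMC_PME_MASK;
    return (support >> state) & 1u;
}

/* Model of the check: 0 on success, -1 if wake is requested but unsupported. */
static int model_enable_wake(uint16_t pmc, int state, bool enable)
{
    if (!enable)
        return 0;   /* disabling wake events always succeeds */
    return pme_supported(pmc, state) ? 0 : -1;
}
```

Passing 4 as the state simply selects bit 4 of the field, which is why the
value works for D3cold even though no "D4" state exists.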


4. PCI Device Drivers
~~~~~~~~~~~~~~~~~~~~~

These functions are intended for use by individual drivers, and are defined in
struct pci_driver:

	int  (*suspend) (struct pci_dev *dev, pm_message_t state);
	int  (*resume) (struct pci_dev *dev);


suspend
-------

Usage:

if (dev->driver && dev->driver->suspend)
	dev->driver->suspend(dev, state);

A driver uses this function to actually transition the device into a low power
state. This should include disabling I/O, IRQs, and bus-mastering, as well as
physically transitioning the device to a lower power state; it may also include
calls to pci_enable_wake().

Bus mastering may be disabled by doing:

pci_disable_device(dev);

For devices that support the PCI PM Spec, this may be used to set the device's
power state to the one that corresponds to the suspend() parameter:

pci_set_power_state(dev, pci_choose_state(dev, state));

The driver is also responsible for disabling any other device-specific features
(e.g. blanking the screen, turning off on-card memory, etc.).

The driver should be sure to track the current state of the device, as it may
obviate the need for some operations.

The driver should update the current_state field in its pci_dev structure in
this function, except for PM-capable devices when pci_set_power_state is used.

resume
------

Usage:

if (dev->driver && dev->driver->resume)
	dev->driver->resume(dev);

The resume callback may be called from any power state, and is always meant to
transition the device to the D0 state.

The driver is responsible for reenabling any features of the device that had
been disabled during previous suspend calls, such as IRQs and bus mastering,
as well as calling pci_restore_state().

If the device is currently in D3, it may need to be reinitialized in resume().

  * Some types of devices, like bus controllers, will preserve context in D3hot
    (using Vcc power). Their drivers will often want to avoid re-initializing
    them after re-entering D0 (perhaps to avoid resetting downstream devices).

  * Other kinds of devices in D3hot will discard device context as part of a
    soft reset when re-entering the D0 state.

  * Devices resuming from D3cold always go through a power-on reset. Some
    device context can also be preserved using Vaux power.

  * Some systems hide D3cold resume paths from drivers. For example, on PCs
    the resume path for suspend-to-disk often runs BIOS powerup code, which
    will sometimes re-initialize the device.

To handle resets during D3 to D0 transitions, it may be convenient to share
device initialization code between probe() and resume(). Device parameters
can also be saved before the driver suspends into D3, avoiding re-probe.
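
One way to picture the shared initialization path is the following userspace
model. It is a sketch under stated assumptions, not driver code: struct
toy_dev and the toy_* functions are hypothetical, and context loss is
modelled at suspend time for simplicity even though a real device in D3hot
loses it during the soft reset on the way back to D0.

```c
#include <assert.h>
#include <stdbool.h>

enum { D0 = 0, D3HOT = 3, D3COLD = 4 };

/* Hypothetical driver-private data for a toy device. */
struct toy_dev {
    int state;              /* power state tracked by the driver */
    bool keeps_ctx_d3hot;   /* context survives D3hot (e.g. a bus controller) */
    int ring_size;          /* parameter discovered once, at probe time */
    int hw_ring_size;       /* models a hardware register, lost on reset */
    int init_count;         /* number of hardware initializations so far */
};

/* Initialization shared by probe and resume: program saved parameters. */
static void toy_hw_init(struct toy_dev *d)
{
    d->hw_ring_size = d->ring_size;
    d->init_count++;
}

static void toy_probe(struct toy_dev *d, int ring_size)
{
    d->ring_size = ring_size;   /* saved so that resume need not re-probe */
    d->state = D0;
    toy_hw_init(d);
}

static void toy_suspend(struct toy_dev *d, int target)
{
    assert(target == D3HOT || target == D3COLD);
    if (target == D3COLD || !d->keeps_ctx_d3hot)
        d->hw_ring_size = 0;    /* model the loss of device context */
    d->state = target;
}

/* Re-initialize only when the state being left implies a reset. */
static void toy_resume(struct toy_dev *d)
{
    bool need_init = d->state == D3COLD ||
                     (d->state == D3HOT && !d->keeps_ctx_d3hot);

    d->state = D0;
    if (need_init)
        toy_hw_init(d);
}
```

Because the driver tracks the state it is resuming from, a device that keeps
its context in D3hot skips the second initialization entirely.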

If the device supports the PCI PM Spec, the driver can use this to physically
transition the device to D0:

pci_set_power_state(dev, PCI_D0);

Note that if the entire system is transitioning out of a global sleep state, all
devices will be placed in the D0 state, so this is not necessary. However, in
the event that the device is placed in the D3 state during normal operation,
this call is necessary. The driver cannot determine which of the two events is
taking place, so it is always a good idea to make that call.

The driver should take note of the state that it is resuming from in order to
ensure correct (and speedy) operation.

The driver should update the current_state field in its pci_dev structure in
this function, except for PM-capable devices when pci_set_power_state is used.



A reference implementation
--------------------------
.suspend()
{
	/* driver specific operations */

	/* Disable IRQ */
	free_irq();
	/* If using MSI */
	pci_disable_msi();

	pci_save_state();
	pci_enable_wake();
	/* Disable IO/bus master/irq router */
	pci_disable_device();
	pci_set_power_state(pci_choose_state());
}

.resume()
{
	pci_set_power_state(PCI_D0);
	pci_restore_state();
	/* the device's IRQ may have changed; the driver should handle that */
	pci_enable_device();
	pci_set_master();

	/* if using MSI, the device's vector may have changed */
	pci_enable_msi();

	request_irq();
	/* driver specific operations */
}

This is a typical implementation, shown with function arguments omitted.
Drivers may reorder the operations slightly, skip some of them, or add more
driver-specific operations, but on the whole they should follow this pattern.

5. Resources
~~~~~~~~~~~~

PCI Local Bus Specification
PCI Bus Power Management Interface Specification

  http://www.pcisig.com