PCI Power Management
~~~~~~~~~~~~~~~~~~~~

An overview of the concepts and the related functions in the Linux kernel

Patrick Mochel

---------------------------------------------------------------------------

1. Overview
2. How the PCI Subsystem Handles Power Management
3. PCI Utility Functions
4. PCI Device Drivers
5. Resources

1. Overview
~~~~~~~~~~~

The PCI Power Management Specification was introduced between the PCI 2.1 and
PCI 2.2 Specifications. It defines a standard interface for controlling various
power management operations.

Implementation of the PCI PM Spec is optional, as are several sub-components of
it. If a device supports the PCI PM Spec, the device will have an 8 byte
capability field in its PCI configuration space. This field is used to describe
and control the standard PCI power management features.

The PCI PM spec defines 4 operating states for devices (D0 - D3) and for buses
(B0 - B3). The higher the number, the less power the device consumes. However,
the higher the number, the longer the latency is for the device to return to
an operational state (D0).

Bus power management is not covered in this version of this document.

Note that all PCI devices support D0 and D3 by default, regardless of whether or
not they implement any of the PCI PM spec.

The possible state transitions that a device can undergo are:

+---------------------------+
| Current State | New State |
+---------------------------+
| D0            | D1, D2, D3|
+---------------------------+
| D1            | D2, D3    |
+---------------------------+
| D2            | D3        |
+---------------------------+
| D1, D2, D3    | D0        |
+---------------------------+

Note that when the system is entering a global suspend state, all devices will
be placed into D3 and when resuming, all devices will be placed into D0.
However, when the system is running, other state transitions are possible.

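The transition table above can be expressed as a small predicate. The following
is only a user-space sketch: the enum and the function name
pm_transition_allowed are made up for illustration and are not part of the
kernel. It also assumes that a request for the state the device is already in
is rejected, which the table does not address.

```c
#include <stdbool.h>

/* Hypothetical names for this sketch; D0..D3 map to 0..3. */
enum pm_state { PM_D0 = 0, PM_D1 = 1, PM_D2 = 2, PM_D3 = 3 };

/*
 * Return true if the transition table allows moving from cur to next.
 * Staying in the same state has no row in the table and is rejected
 * here (an assumption of this sketch).
 */
static bool pm_transition_allowed(int cur, int next)
{
	if (cur < PM_D0 || cur > PM_D3 || next < PM_D0 || next > PM_D3)
		return false;	/* not a D-state at all */
	if (cur == next)
		return false;	/* no row in the table for this */
	if (next == PM_D0)
		return true;	/* any low power state may return to D0 */
	return next > cur;	/* otherwise only deeper states are reachable */
}
```

For instance, D0 to D3 and D3 to D0 are accepted, while D2 to D1 is not.
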
2. How the PCI Subsystem Handles Power Management
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The PCI suspend/resume functionality is accessed indirectly via the Power
Management subsystem. At boot, the PCI driver registers a power management
callback with that layer. Upon entering a suspend state, the PM layer iterates
through all of its registered callbacks. This currently takes place only during
APM state transitions.

Upon going to sleep, the PCI subsystem walks its device tree twice. Both times,
it does a depth first walk of the device tree. The first walk saves each
device's state and checks for devices that will prevent the system from entering
a global sleep state. The next walk then places the devices in a low power
state.

The first walk allows a graceful recovery in the event of a failure, since none
of the devices have actually been powered down.

In both walks, in particular the second, all children of a bridge are touched
before the actual bridge itself. This allows the bridge to retain power while
its children are being accessed.

Upon resuming from sleep, just the opposite must be true: all bridges must be
powered on and restored before their children are powered on. This is easily
accomplished with a breadth-first walk of the PCI device tree.


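The two visiting orders described in this section can be illustrated on a toy
tree. This is a user-space sketch, not the PCI subsystem's code: struct node,
struct trace and both walk functions are hypothetical, and the fixed array
sizes are arbitrary. It only shows that a depth first walk reaches children
before their bridge, and a breadth-first walk reaches bridges before their
children.

```c
#include <stddef.h>

#define MAX_CHILDREN 4
#define MAX_NODES    16

/* Hypothetical stand-in for a PCI device or bridge in the device tree. */
struct node {
	int id;
	size_t nchildren;
	struct node *children[MAX_CHILDREN];
};

/* Records the order in which a walk visits nodes. */
struct trace {
	int ids[MAX_NODES];
	size_t len;
};

static void visit(struct trace *t, const struct node *n)
{
	if (t->len < MAX_NODES)
		t->ids[t->len++] = n->id;
}

/* Suspend order: depth first, every child before its parent bridge. */
static void walk_suspend(const struct node *n, struct trace *t)
{
	for (size_t i = 0; i < n->nchildren; i++)
		walk_suspend(n->children[i], t);
	visit(t, n);
}

/* Resume order: breadth first, every bridge before its children. */
static void walk_resume(const struct node *root, struct trace *t)
{
	const struct node *queue[MAX_NODES];
	size_t head = 0, tail = 0;

	queue[tail++] = root;
	while (head < tail) {
		const struct node *n = queue[head++];

		visit(t, n);
		for (size_t i = 0; i < n->nchildren && tail < MAX_NODES; i++)
			queue[tail++] = n->children[i];
	}
}
```

With a root bridge 0 whose children are bridge 1 (with one child, device 3) and
device 2, the suspend walk visits 3, 1, 2, 0 and the resume walk visits
0, 1, 2, 3.
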
3. PCI Utility Functions
~~~~~~~~~~~~~~~~~~~~~~~~

These are helper functions designed to be called by individual device drivers.
Assuming that a device behaves as advertised, these should be applicable in most
cases. However, results may vary.

Note that these functions are never implicitly called for the driver. The driver
is always responsible for deciding when and if to call these.


pci_save_state
--------------

Usage:
	pci_save_state(dev, buffer);

Description:
	Save first 64 bytes of PCI config space. Buffer must be allocated by
	caller.


pci_restore_state
-----------------

Usage:
	pci_restore_state(dev, buffer);

Description:
	Restore previously saved config space (first 64 bytes only).

	If buffer is NULL, then restore what information we know about the
	device from bootup: BARs and interrupt line.


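To make the "first 64 bytes" concrete: 64 bytes of config space are sixteen
32-bit registers, so the caller's buffer needs room for 16 dwords. The sketch
below models this in user space. struct fake_dev and the fake_* functions are
hypothetical stand-ins, not the kernel's implementation, and the NULL-buffer
case of pci_restore_state is deliberately not modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CFG_DWORDS 16	/* 64 bytes = 16 32-bit registers */

/* Hypothetical device model: only the first 64 bytes of config space. */
struct fake_dev {
	uint32_t config[CFG_DWORDS];
};

/* Copy the first 64 bytes of config space into a caller-allocated buffer. */
static void fake_save_state(const struct fake_dev *dev, uint32_t *buffer)
{
	assert(buffer != NULL);	/* the caller owns the buffer */
	for (size_t i = 0; i < CFG_DWORDS; i++)
		buffer[i] = dev->config[i];
}

/* Write a previously saved buffer back into config space. */
static void fake_restore_state(struct fake_dev *dev, const uint32_t *buffer)
{
	assert(buffer != NULL);	/* the NULL "boot values" case is not modelled */
	for (size_t i = 0; i < CFG_DWORDS; i++)
		dev->config[i] = buffer[i];
}
```

A driver following this pattern would keep a u32 array of 16 entries in its
private data and pass it as the buffer in both calls.
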
pci_set_power_state
-------------------

Usage:
	pci_set_power_state(dev, state);

Description:
	Transition device to the requested power state using PCI PM
	Capabilities registers.

	Will fail under one of the following conditions:
	- If state is less than current state, but not D0 (illegal transition)
	- Device doesn't support PM Capabilities
	- Device does not support requested state


pci_enable_wake
---------------

Usage:
	pci_enable_wake(dev, state, enable);

Description:
	Enable device to generate PME# during low power state using PCI PM
	Capabilities.

	Checks whether the device supports generating PME# from the requested
	state and fails if it does not, unless enable == 0 (the request is to
	disable wake events, which is implicit if the device doesn't support
	them in the first place).

	Note that the PMC Register in the device's PM Capabilities has a bitmask
	of the states it supports generating PME# from. Within that bitmask,
	D3hot is bit 3 and D3cold is bit 4. So, while a value of 4 as the state
	may not seem semantically correct, it is.


4. PCI Device Drivers
~~~~~~~~~~~~~~~~~~~~~

These functions are intended for use by individual drivers, and are defined in
struct pci_driver:

	int (*save_state) (struct pci_dev *dev, u32 state);
	int (*suspend) (struct pci_dev *dev, u32 state);
	int (*resume) (struct pci_dev *dev);
	int (*enable_wake) (struct pci_dev *dev, u32 state, int enable);


save_state
----------

Usage:

if (dev->driver && dev->driver->save_state)
	dev->driver->save_state(dev,state);

The driver should use this callback to save device state. It should take into
account the current state of the device and the requested state in order to
avoid any unnecessary operations.

For example, for a video card that supports all 4 states (D0-D3), all controller
context is preserved when entering D1, but the screen is placed into a low power
state (blanked).

The driver can also interpret this function as a notification that it may be
entering a sleep state in the near future. If it knows that the device cannot
enter the requested state, either because of lack of support for it, or because
the device is in the middle of some critical operation, then it should fail.

This function should not be used to set any state in the device or the driver
because the device may not actually enter the sleep state (e.g. another driver
later causes a global state transition to fail).

Note that in intermediate low power states, a device's I/O and memory spaces may
be disabled and may not be available in subsequent transitions to lower power
states.


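A save_state callback that follows the advice above might look roughly like the
user-space model below. Everything in it is hypothetical: struct fake_card, its
fields and example_save_state are invented for illustration, and a real callback
takes a struct pci_dev pointer. The point is only the shape: veto unsupported
or ill-timed requests, copy state away, and leave the device untouched.

```c
#include <stdbool.h>
#include <stdint.h>

#define NREGS 4

/* Hypothetical driver-private view of a device. */
struct fake_card {
	uint32_t supported_mask;	/* bit n set: state Dn is supported */
	bool busy;			/* a critical operation is running */
	uint32_t regs[NREGS];		/* live device registers (simulated) */
	uint32_t saved_regs[NREGS];	/* driver's copy taken in save_state */
};

/* Returns 0 on success, -1 to veto the upcoming transition. */
static int example_save_state(struct fake_card *card, uint32_t state)
{
	if (state > 3 || !(card->supported_mask & (1u << state)))
		return -1;	/* requested state is not supported */
	if (card->busy)
		return -1;	/* middle of a critical operation */
	for (int i = 0; i < NREGS; i++)
		card->saved_regs[i] = card->regs[i];	/* save only */
	return 0;
}
```

Note that the function never writes to regs, in line with the rule that
save_state must not change device or driver state.
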
suspend
-------

Usage:

if (dev->driver && dev->driver->suspend)
	dev->driver->suspend(dev,state);

A driver uses this function to actually transition the device into a low power
state. This may include disabling I/O, memory and bus-mastering, as well as
physically transitioning the device to a lower power state.

Bus mastering may be disabled by doing:

pci_disable_device(dev);

For devices that support the PCI PM Spec, this may be used to set the device's
power state:

pci_set_power_state(dev,state);

The driver is also responsible for disabling any other device-specific features
(e.g. blanking screen, turning off on-card memory, etc).

The driver should be sure to track the current state of the device, as it may
obviate the need for some operations.

The driver should update the current_state field in its pci_dev structure in
this function.

resume
------

Usage:

if (dev->driver && dev->driver->resume)
	dev->driver->resume(dev);

The resume callback may be called from any power state, and is always meant to
transition the device to the D0 state.

The driver is responsible for reenabling any features of the device that had
been disabled during previous suspend calls and restoring all state that was
saved in previous save_state calls.

If the device is currently in D3, it must be completely reinitialized, as it
must be assumed that the device has lost all of its context (even that of its
PCI config space). For almost all current drivers, this means that the
initialization code that the driver does at boot must be separated out and
called again from the resume callback. Note that some values for the device may
not have to be probed for this time around if they are saved before entering the
low power state.

If the device supports the PCI PM Spec, it can use this to physically transition
the device to D0:

pci_set_power_state(dev,0);

Note that if the entire system is transitioning out of a global sleep state, all
devices will be placed in the D0 state, so this is not necessary. However, in
the event that the device is placed in the D3 state during normal operation,
this call is necessary. From within the driver, it is impossible to determine
which of the two events is taking place, so it is always a good idea to make
that call.

The driver should take note of the state that it is resuming from in order to
ensure correct (and speedy) operation.

The driver should update the current_state field in its pci_dev structure in
this function.


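The state tracking recommended for suspend and resume can be sketched as
follows. This is a user-space model, not driver code: struct fake_pci_dev and
fake_set_power_state are hypothetical stand-ins for struct pci_dev and
pci_set_power_state, and the hw_writes counter exists only so the effect of
skipping redundant transitions is visible.

```c
#include <stdint.h>

/* Hypothetical stand-in for struct pci_dev. */
struct fake_pci_dev {
	uint32_t current_state;	/* 0 = D0 ... 3 = D3 */
	int hw_writes;		/* counts simulated power state writes */
};

/* Mock of pci_set_power_state(): rejects illegal transitions. */
static int fake_set_power_state(struct fake_pci_dev *dev, uint32_t state)
{
	if (state > 3)
		return -1;
	if (state != 0 && state < dev->current_state)
		return -1;	/* only D0 may be reached from a deeper state */
	dev->hw_writes++;
	return 0;
}

/* suspend callback: skip the work if the device is already there. */
static int example_suspend(struct fake_pci_dev *dev, uint32_t state)
{
	if (state == dev->current_state)
		return 0;
	if (fake_set_power_state(dev, state))
		return -1;
	dev->current_state = state;	/* keep current_state up to date */
	return 0;
}

/* resume callback: always ends in D0, whatever the starting state. */
static int example_resume(struct fake_pci_dev *dev)
{
	if (fake_set_power_state(dev, 0))
		return -1;
	/* Coming from D3, a real driver would also reinitialize the device. */
	dev->current_state = 0;
	return 0;
}
```

Calling example_suspend twice with the same state touches the simulated
hardware only once, and example_resume always makes the D0 call, since the
driver cannot tell a global resume from a runtime one.
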
enable_wake
-----------

Usage:

if (dev->driver && dev->driver->enable_wake)
	dev->driver->enable_wake(dev,state,enable);

This callback is generally only relevant for devices that support the PCI PM
spec and have the ability to generate a PME# (Power Management Event Signal)
to wake the system up. (However, it is possible that a device may support
some non-standard way of generating a wake event on sleep.)

Bits 15:11 of the PMC (Power Mgmt Capabilities) Register in a device's
PM Capabilities describe what power states the device supports generating a
wake event from:

+------------------+
|  Bit  |  State   |
+------------------+
|  11   |   D0     |
|  12   |   D1     |
|  13   |   D2     |
|  14   |   D3hot  |
|  15   |   D3cold |
+------------------+

A driver can use this to enable wake events:

pci_enable_wake(dev,state,enable);

Note that to enable PME# from D3cold, a value of 4 should be passed to
pci_enable_wake (since it uses an index into a bitmask). If a driver gets
a request to enable wake events from D3, two calls should be made to
pci_enable_wake (one each for D3hot and D3cold).


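The "index into a bitmask" can be made explicit with a small helper. This is a
sketch only: pme_supported, PME_SHIFT and PME_MASK are names invented here, and
it assumes the state index counts upward from PMC bit 11, which is what
"D3hot is bit 3 and D3cold is bit 4" implies for a field occupying bits 15:11.

```c
#include <stdbool.h>
#include <stdint.h>

#define PME_SHIFT 11		/* PME support field starts at PMC bit 11 */
#define PME_MASK  0xF800u	/* PMC bits 15:11 */

/*
 * state is an index into the PME support bitmask:
 * 0 = D0, 1 = D1, 2 = D2, 3 = D3hot, 4 = D3cold.
 */
static bool pme_supported(uint16_t pmc, unsigned int state)
{
	if (state > 4)
		return false;
	return (((pmc & PME_MASK) >> PME_SHIFT) >> state) & 1u;
}
```

A driver asked to enable wake from D3 would check and enable index 3 and
index 4 separately, matching the two pci_enable_wake calls described above.
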
5. Resources
~~~~~~~~~~~~

PCI Local Bus Specification
PCI Bus Power Management Interface Specification

	http://pcisig.org