eCos SMP Support
================

eCos contains support for limited Symmetric Multi-Processing
(SMP). This is only available on selected architectures and platforms.

The first part of this document describes the platform-independent
parts of the SMP support. Annexes at the end of this document describe
any details that are specific to a particular platform.

Target Hardware Limitations
---------------------------

To allow a reasonable implementation of SMP, and to reduce the
disruption to the existing source base, a number of assumptions have
been made about the features of the target hardware.

- Modest multiprocessing. The typical number of CPUs supported is two
  to four, with an upper limit around eight. While there are no
  inherent limits in the code, hardware and algorithmic limitations
  will probably become significant beyond this point.

- SMP synchronization support. The hardware must supply a mechanism to
  allow software on two CPUs to synchronize. This is normally provided
  as part of the instruction set in the form of test-and-set,
  compare-and-swap or load-link/store-conditional instructions. An
  alternative approach is the provision of hardware semaphore
  registers which can be used to serialize implementations of these
  operations. Whatever hardware facilities are available, they are
  used in eCos to implement spinlocks.

- Coherent caches. It is assumed that no extra effort will be required
  to access shared memory from any processor. This means that either
  there are no caches, they are shared by all processors, or they are
  maintained in a coherent state by the hardware. It would be too
  disruptive to the eCos sources if every memory access had to be
  bracketed by cache load/flush operations. Any hardware that requires
  this is not supported.

- Uniform addressing. It is assumed that all memory that is shared
  between CPUs is addressed at the same location from all CPUs. Like
  non-coherent caches, dealing with CPU-specific address translation
  is considered too disruptive to the eCos source base. This does not,
  however, preclude systems with non-uniform access costs for
  different CPUs.

- Uniform device addressing. As with access to memory, it is assumed
  that all devices are equally accessible to all CPUs. Since device
  access is often made from thread contexts, it is not possible to
  restrict access to device control registers to certain CPUs, since
  there is currently no support for binding or migrating threads to
  CPUs.

- Interrupt routing. The target hardware must have an interrupt
  controller that can route interrupts to specific CPUs. It is
  acceptable for all interrupts to be delivered to just one CPU, or
  for some interrupts to be bound to specific CPUs, or for some
  interrupts to be local to each CPU. At present dynamic routing,
  where a different CPU may be chosen each time an interrupt is
  delivered, is not supported. eCos cannot support hardware where all
  interrupts are delivered to all CPUs simultaneously with the
  expectation that software will resolve any conflicts.

- Inter-CPU interrupts. A mechanism to allow one CPU to interrupt
  another is needed. This is necessary so that events on one CPU can
  cause rescheduling on other CPUs.

- CPU Identifiers. Code running on a CPU must be able to determine
  which CPU it is running on. The CPU Id is usually provided either in
  a CPU status register, or in a register associated with the
  inter-CPU interrupt delivery subsystem. eCos expects CPU Ids to be
  small positive integers, although alternative representations, such
  as bitmaps, can be converted relatively easily. Complex mechanisms
  for getting the CPU Id cannot be supported. Getting the CPU Id must
  be a cheap operation, since it is done often, and in performance
  critical places such as interrupt handlers and the scheduler.

Kernel Support
--------------

This section describes how SMP is handled in the kernel, and where
system behaviour differs from a single CPU system.

System Startup
~~~~~~~~~~~~~~

System startup takes place on only one CPU, called the primary
CPU. All other CPUs, the secondary CPUs, are either placed in a
suspended state at reset, or are captured by the HAL and put into a
spin as they start up.

The primary CPU is responsible for copying the DATA segment and
zeroing the BSS (if required), calling HAL variant and platform
initialization routines and invoking constructors. It then calls
cyg_start() to enter the application. The application may then create
extra threads and other objects.

It is only when the application calls Cyg_Scheduler::start() that the
secondary CPUs are initialized. This routine scans the list of
available secondary CPUs and calls HAL_SMP_CPU_START() to start each
one. Finally it calls Cyg_Scheduler::start_cpu().
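
In outline, the startup sequence just described might look like the
following sketch. This is illustrative C-style pseudocode rather than
the actual kernel source (the real routine is the C++ method
Cyg_Scheduler::start()); the loop over HAL_SMP_CPU_COUNT() and the
function names are assumptions, and the HAL_SMP_CPU_*() macros are
described in the HAL Support section below.

    /* Sketch of the SMP startup sequence: release each secondary
     * CPU, then fall into the common per-CPU startup path. */
    void scheduler_start( void )
    {
        HAL_SMP_CPU_TYPE cpu;

        for( cpu = 0; cpu < HAL_SMP_CPU_COUNT(); cpu++ )
        {
            if( cpu != HAL_SMP_CPU_THIS() )
                HAL_SMP_CPU_START( cpu );  /* secondary runs HAL init,
                                              then calls
                                              cyg_kernel_cpu_startup() */
        }

        scheduler_start_cpu();             /* common path, see below */
    }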

Each secondary CPU starts in the HAL, where it completes any per-CPU
initialization before calling into the kernel at
cyg_kernel_cpu_startup(). Here it claims the scheduler lock and calls
Cyg_Scheduler::start_cpu().

Cyg_Scheduler::start_cpu() is common to both the primary and secondary
CPUs. The first thing this code does is to install an interrupt object
for this CPU's inter-CPU interrupt. From this point on the code is the
same as for the single CPU case: an initial thread is chosen and
entered.

From this point on the CPUs are all equal; eCos makes no further
distinction between the primary and secondary CPUs. However, the
hardware may still distinguish them as far as interrupt delivery is
concerned.

Scheduling
~~~~~~~~~~

To function correctly an operating system kernel must protect its
vital data structures, such as the run queues, from concurrent
access. In a single CPU system the only concurrent activities to worry
about are asynchronous interrupts. The kernel can easily guard its
data structures against these by disabling interrupts. However, in a
multi-CPU system, this is inadequate since it does not block access by
other CPUs.

The eCos kernel protects its vital data structures using the scheduler
lock. In single CPU systems this is a simple counter that is
atomically incremented to acquire the lock and decremented to release
it. If the lock is decremented to zero then the scheduler may be
invoked to choose a different thread to run. Because interrupts may
continue to be serviced while the scheduler lock is claimed, ISRs are
not allowed to access kernel data structures, or call kernel routines
that can. Instead all such operations are deferred to an associated
DSR routine that is run during the lock release operation, when the
data structures are in a consistent state.

By choosing a kernel locking mechanism that does not rely on interrupt
manipulation to protect data structures, it is easier to convert eCos
to SMP than would otherwise be the case. The principal change needed
to make eCos SMP-safe is to convert the scheduler lock into a nestable
spinlock. This is done by adding a spinlock and a CPU id to the
original counter.

The algorithm for acquiring the scheduler lock is very simple. If the
scheduler lock's CPU id matches the current CPU then it can increment
the counter and continue. If it does not match, the CPU must spin on
the spinlock, after which it may increment the counter and store its
own identity in the CPU id.

To release the lock, the counter is decremented. If it goes to zero
the CPU id value must be set to NONE and the spinlock cleared.
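
A minimal sketch of these two sequences, in C-like pseudocode rather
than the actual kernel source, is shown below; the variable names
sched_count, sched_cpu and sched_spinlock are assumptions, and the
HAL_* macros are described in the HAL Support section.

    /* Acquire: nestable on the owning CPU, spin otherwise. */
    void sched_lock( void )
    {
        if( sched_cpu != HAL_SMP_CPU_THIS() )
        {
            HAL_SPINLOCK_SPIN( sched_spinlock );  /* wait for the lock */
            sched_cpu = HAL_SMP_CPU_THIS();       /* record new owner  */
        }
        sched_count++;                            /* nested claim      */
    }

    /* Release: on the final decrement, give the lock up entirely
     * (a real implementation may also invoke the scheduler here). */
    void sched_unlock( void )
    {
        if( --sched_count == 0 )
        {
            sched_cpu = HAL_SMP_CPU_NONE;
            HAL_SPINLOCK_CLEAR( sched_spinlock );
        }
    }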

To protect these sequences against interrupts, they must be performed
with interrupts disabled. However, since these are very short code
sequences, they will not have an adverse effect on the interrupt
latency.

Beyond converting the scheduler lock, further preparing the kernel for
SMP is a relatively minor matter. The main changes are to convert
various scalar housekeeping variables into arrays indexed by CPU
id. These include the current thread pointer, the need_reschedule
flag and the timeslice counter.

At present only the Multi-Level Queue (MLQ) scheduler is capable of
supporting SMP configurations. The main change made to this scheduler
is to cope with having several threads in execution at the same
time. Running threads are marked with the CPU they are executing on.
When scheduling a thread, the scheduler skips past any running threads
until it finds a thread that is pending. Unlike the single CPU case,
this is not a constant-time algorithm, but it is still deterministic,
since the worst case time is bounded by the number of CPUs in the
system.

A second change to the scheduler is in the code used to decide when
the scheduler should be called to choose a new thread. The scheduler
attempts to keep the *n* CPUs running the *n* highest priority
threads. Since an event or interrupt on one CPU may require a
reschedule on another CPU, there must be a mechanism for deciding
this. The algorithm currently implemented is very simple. Given a
thread that has just been awakened (or had its priority changed), the
scheduler scans the CPUs, starting with the one it is currently
running on, for a current thread that is of lower priority than the
new one. If one is found then a reschedule interrupt is sent to that
CPU and the scan continues, but now using the current thread of the
rescheduled CPU as the candidate thread. In this way the new thread
gets to run as quickly as possible, hopefully on the current CPU, and
the remaining CPUs will pick up the remaining highest priority
threads as a consequence of processing the reschedule interrupt.
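
The scan might be sketched as follows; this is illustrative pseudocode
rather than the MLQ scheduler source, with current_thread[] standing
for the per-CPU current thread array mentioned above, and thread_t and
priority_higher() as assumed helpers.

    /* new_thread has just been awakened or had its priority changed. */
    void reschedule_scan( thread_t *new_thread )
    {
        HAL_SMP_CPU_TYPE start = HAL_SMP_CPU_THIS();
        HAL_SMP_CPU_TYPE i, cpu;

        for( i = 0; i < HAL_SMP_CPU_COUNT(); i++ )
        {
            cpu = (start + i) % HAL_SMP_CPU_COUNT();

            if( priority_higher( new_thread, current_thread[cpu] ) )
            {
                /* Preempt this CPU (on the scanning CPU itself a real
                 * implementation would simply set its need_reschedule
                 * flag), then continue the scan using the displaced
                 * thread as the new candidate. */
                thread_t *displaced = current_thread[cpu];
                HAL_SMP_CPU_RESCHEDULE_INTERRUPT( cpu, 0 );
                new_thread = displaced;
            }
        }
    }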

The final change to the scheduler is in the handling of
timeslicing. Only one CPU receives timer interrupts, although all CPUs
must handle timeslicing. To make this work, the CPU that receives the
timer interrupt decrements the timeslice counter for all CPUs, not
just its own. If the counter for a CPU reaches zero, then it sends a
timeslice interrupt to that CPU. On receiving the interrupt the
destination CPU enters the scheduler and looks for another thread at
the same priority to run. This is somewhat more efficient than
distributing clock ticks to all CPUs, since the interrupt is only
needed when a timeslice occurs.
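
A sketch of the timer-side logic, assuming a per-CPU timeslice_count[]
array corresponding to the timeslice counter mentioned earlier:

    /* Run only by the CPU that receives the timer interrupt. */
    void clock_decrement_timeslices( void )
    {
        HAL_SMP_CPU_TYPE cpu;

        for( cpu = 0; cpu < HAL_SMP_CPU_COUNT(); cpu++ )
        {
            if( --timeslice_count[cpu] == 0 )
                HAL_SMP_CPU_TIMESLICE_INTERRUPT( cpu, 0 );
            /* the counter is reset when the target CPU timeslices */
        }
    }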

Device Drivers
~~~~~~~~~~~~~~

The main area where the SMP nature of a system will be most apparent
is in device drivers. It is quite possible for the ISR, DSR and thread
components of a device driver to execute on different CPUs. For this
reason it is much more important that SMP-capable device drivers use
the driver API routines correctly.

Synchronization between threads and DSRs continues to require that the
thread-side code use cyg_drv_dsr_lock() and cyg_drv_dsr_unlock() to
protect access to shared data. Synchronization between ISRs and DSRs
or threads requires that access to sensitive data be protected, in all
places, by calls to cyg_drv_isr_lock() and cyg_drv_isr_unlock().
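
For example, a driver might protect a count shared between its ISR and
thread-side code as follows; the device and variable names are
hypothetical, while the cyg_drv_*() calls are the standard driver API.

    #include <cyg/hal/drv_api.h>

    static volatile int rx_count;       /* shared with the ISR */

    /* Thread-side read-and-reset of data also touched by the ISR.
     * On SMP, cyg_drv_isr_lock() both disables local interrupts and
     * claims the ISR spinlock, so this is safe even if the ISR is
     * running on another CPU (see below). */
    int my_dev_read_count( void )
    {
        int count;

        cyg_drv_isr_lock();
        count    = rx_count;
        rx_count = 0;
        cyg_drv_isr_unlock();

        return count;
    }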

The ISR lock, for SMP systems, not only disables local interrupts, but
also acquires a spinlock to protect against concurrent access from
other CPUs. This is necessary because ISRs are not run with the
scheduler lock claimed. Hence they can run in parallel with other
components of the device driver.

The ISR lock provided by the driver API is just a shared spinlock that
is available for use by all drivers. If a driver needs to implement a
finer grain of locking, it can use private spinlocks, accessed via the
cyg_drv_spinlock_*() functions (see API later).

API Extensions
--------------

In general, the SMP support is invisible to application code. All
synchronization and communication operations function exactly as
before. The main area where code needs to be SMP aware is in the
handling of interrupt routing, and in the synchronization of ISRs,
DSRs and threads.

The following sections contain brief descriptions of the API
extensions added for SMP support. More details will be found in the
Kernel C API and Device Driver API documentation.

Interrupt Routing
~~~~~~~~~~~~~~~~~

Two new functions have been added to the Kernel API and the device
driver API to do interrupt routing. These are:

    void cyg_interrupt_set_cpu( cyg_vector_t vector, cyg_cpu_t cpu );
    void cyg_drv_interrupt_set_cpu( cyg_vector_t vector, cyg_cpu_t cpu );

    cyg_cpu_t cyg_interrupt_get_cpu( cyg_vector_t vector );
    cyg_cpu_t cyg_drv_interrupt_get_cpu( cyg_vector_t vector );

The *_set_cpu() functions cause the given interrupt to be handled by
the nominated CPU.

The *_get_cpu() functions return the CPU to which the vector is
routed.

Although not currently supported, special values for the cpu argument
may be used to indicate that the interrupt is being routed dynamically
or is CPU-local.

Once a vector has been routed to a new CPU, all other interrupt
masking and configuration operations are relative to that CPU, where
relevant.
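
As an illustration, a system that dedicates CPU 1 to network interrupt
handling might do the following; the vector name
CYGNUM_HAL_INTERRUPT_ETH0 is a platform-specific assumption.

    #include <cyg/kernel/kapi.h>

    cyg_cpu_t route_network_interrupt( void )
    {
        /* Handle the Ethernet interrupt on CPU 1 from now on. */
        cyg_interrupt_set_cpu( CYGNUM_HAL_INTERRUPT_ETH0, 1 );

        /* Subsequent mask/unmask operations on this vector are now
         * relative to CPU 1, where relevant. */
        return cyg_interrupt_get_cpu( CYGNUM_HAL_INTERRUPT_ETH0 );
    }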

Synchronization
~~~~~~~~~~~~~~~

All existing synchronization mechanisms work as before in an SMP
system. Additional synchronization mechanisms have been added to
provide explicit synchronization for SMP.

A set of functions has been added to the Kernel and device driver
APIs to provide spinlocks:

    void cyg_spinlock_init( cyg_spinlock_t *lock, cyg_bool_t locked );
    void cyg_drv_spinlock_init( cyg_spinlock_t *lock, cyg_bool_t locked );

    void cyg_spinlock_destroy( cyg_spinlock_t *lock );
    void cyg_drv_spinlock_destroy( cyg_spinlock_t *lock );

    void cyg_spinlock_spin( cyg_spinlock_t *lock );
    void cyg_drv_spinlock_spin( cyg_spinlock_t *lock );

    void cyg_spinlock_clear( cyg_spinlock_t *lock );
    void cyg_drv_spinlock_clear( cyg_spinlock_t *lock );

    cyg_bool_t cyg_spinlock_try( cyg_spinlock_t *lock );
    cyg_bool_t cyg_drv_spinlock_try( cyg_spinlock_t *lock );

    cyg_bool_t cyg_spinlock_test( cyg_spinlock_t *lock );
    cyg_bool_t cyg_drv_spinlock_test( cyg_spinlock_t *lock );

    void cyg_spinlock_spin_intsave( cyg_spinlock_t *lock,
                                    cyg_addrword_t *istate );
    void cyg_drv_spinlock_spin_intsave( cyg_spinlock_t *lock,
                                        cyg_addrword_t *istate );

    void cyg_spinlock_clear_intsave( cyg_spinlock_t *lock,
                                     cyg_addrword_t istate );
    void cyg_drv_spinlock_clear_intsave( cyg_spinlock_t *lock,
                                         cyg_addrword_t istate );

The *_init() functions initialize the lock, to either locked or clear,
and the *_destroy() functions destroy the lock. Init() should be
called before the lock is used and destroy() should be called when it
is finished with.

The *_spin() functions will cause the calling CPU to spin until it can
claim the lock and the *_clear() functions clear the lock so that the
next CPU can claim it. The *_try() functions attempt to claim the lock
but return false if they cannot. The *_test() functions simply return
the state of the lock.

None of these functions will necessarily block interrupts while they
spin. If the spinlock is only to be used between threads on different
CPUs, or in circumstances where it is known that the relevant
interrupts are disabled, then these functions will suffice. However,
if the spinlock is also to be used from an ISR, which may be called at
any point, a straightforward spinlock may result in deadlock. Hence
the *_intsave() variants are supplied to disable interrupts while the
lock is held.

The *_spin_intsave() function disables interrupts, saving the current
state in *istate, and then claims the lock. The *_clear_intsave()
function clears the spinlock and restores the interrupt enable state
from *istate.
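
For example, a driver statistic shared between thread code and an ISR
could be protected with a private spinlock as follows; the names
stats_lock, packet_count and stats_add_packet() are hypothetical.

    #include <cyg/hal/drv_api.h>

    static cyg_spinlock_t stats_lock;
    static cyg_uint32     packet_count;

    void stats_init( void )
    {
        cyg_drv_spinlock_init( &stats_lock, 0 );    /* start clear */
    }

    /* Safe from thread context even though an ISR on another CPU may
     * also take stats_lock: local interrupts stay disabled for the
     * duration of the critical section, avoiding self-deadlock. */
    void stats_add_packet( void )
    {
        cyg_addrword_t istate;

        cyg_drv_spinlock_spin_intsave( &stats_lock, &istate );
        packet_count++;
        cyg_drv_spinlock_clear_intsave( &stats_lock, istate );
    }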

HAL Support
-----------

SMP support in any platform depends on the HAL supplying the
appropriate operations. All HAL SMP support is defined in the
hal_smp.h header (and if necessary var_smp.h and plf_smp.h).

SMP support falls into a number of functional groups.

CPU Control
~~~~~~~~~~~

This group consists of descriptive and control macros for managing the
CPUs in an SMP system. A short usage sketch follows the descriptions.

HAL_SMP_CPU_TYPE        A type that can contain a CPU id. A CPU id is
                        usually a small integer that is used to index
                        arrays of variables that are managed on a
                        per-CPU basis.

HAL_SMP_CPU_MAX         The maximum number of CPUs that can be
                        supported. This is used to provide the size of
                        any arrays that have an element per CPU.

HAL_SMP_CPU_COUNT()     Returns the number of CPUs currently
                        operational. This may differ from
                        HAL_SMP_CPU_MAX depending on the runtime
                        environment.

HAL_SMP_CPU_THIS()      Returns the CPU id of the current CPU.

HAL_SMP_CPU_NONE        A value that does not match any real CPU
                        id. This is used where a CPU type variable
                        must be set to a null value.

HAL_SMP_CPU_START( cpu )
                        Starts the given CPU executing at a defined
                        HAL entry point. After performing any HAL
                        level initialization, the CPU calls up into
                        the kernel at cyg_kernel_cpu_startup().

HAL_SMP_CPU_RESCHEDULE_INTERRUPT( cpu, wait )
                        Sends the CPU a reschedule interrupt, and if
                        _wait_ is non-zero, waits for an
                        acknowledgment. The interrupted CPU should
                        call cyg_scheduler_set_need_reschedule() in
                        its DSR to cause the reschedule to occur.

HAL_SMP_CPU_TIMESLICE_INTERRUPT( cpu, wait )
                        Sends the CPU a timeslice interrupt, and if
                        _wait_ is non-zero, waits for an
                        acknowledgment. The interrupted CPU should
                        call cyg_scheduler_timeslice_cpu() to cause
                        the timeslice event to be processed.
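
The descriptive macros are typically used together to manage per-CPU
state, as in this minimal sketch (the run_count array and function
names are illustrative):

    /* One slot per possible CPU, sized at compile time. */
    static int run_count[HAL_SMP_CPU_MAX];

    void init_run_counts( void )
    {
        HAL_SMP_CPU_TYPE cpu;

        /* Only the operational CPUs need initializing. */
        for( cpu = 0; cpu < HAL_SMP_CPU_COUNT(); cpu++ )
            run_count[cpu] = 0;
    }

    void note_run( void )
    {
        /* Each CPU indexes its own slot. */
        run_count[HAL_SMP_CPU_THIS()]++;
    }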

Test-and-set Support
~~~~~~~~~~~~~~~~~~~~

Test-and-set is the foundation of the SMP synchronization
mechanisms.

HAL_TAS_TYPE            The type for all test-and-set variables. The
                        test-and-set macros only support operations on
                        a single bit (usually the least significant
                        bit) of this location. This allows for maximum
                        flexibility in the implementation.

HAL_TAS_SET( tas, oldb )
                        Performs a test and set operation on the
                        location _tas_. _oldb_ will contain *true* if
                        the location was already set, and *false* if
                        it was clear.

HAL_TAS_CLEAR( tas, oldb )
                        Performs a test and clear operation on the
                        location _tas_. _oldb_ will contain *true* if
                        the location was already set, and *false* if
                        it was clear.

Spinlocks
~~~~~~~~~

Spinlocks provide inter-CPU locking. Normally they will be implemented
on top of the test-and-set mechanism above, but may also be
implemented by other means if, for example, the hardware has more
direct support for spinlocks. A sketch of a test-and-set based
implementation follows the descriptions below.

HAL_SPINLOCK_TYPE       The type for all spinlock variables.

HAL_SPINLOCK_INIT_CLEAR A value that may be assigned to a spinlock
                        variable to initialize it to clear.

HAL_SPINLOCK_INIT_SET   A value that may be assigned to a spinlock
                        variable to initialize it to set.

HAL_SPINLOCK_SPIN( lock )
                        The caller spins in a busy loop waiting for
                        the lock to become clear. It then sets it and
                        continues. This is all handled atomically, so
                        that there are no race conditions between CPUs.

HAL_SPINLOCK_CLEAR( lock )
                        The caller clears the lock. One of any waiting
                        spinners will then be able to proceed.

HAL_SPINLOCK_TRY( lock, val )
                        Attempts to set the lock. The value put in
                        _val_ will be *true* if the lock was
                        claimed successfully, and *false* if it was
                        not.

HAL_SPINLOCK_TEST( lock, val )
                        Tests the current value of the lock. The value
                        put in _val_ will be *true* if the lock is
                        claimed and *false* if it is clear.
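
As promised above, here is a sketch of how a HAL might build the basic
spinlock operations on the test-and-set macros. This is illustrative
only; real HALs may use more direct hardware support. CYG_MACRO_START
and CYG_MACRO_END are the standard eCos statement-macro wrappers.

    #define HAL_SPINLOCK_TYPE        HAL_TAS_TYPE
    #define HAL_SPINLOCK_INIT_CLEAR  0
    #define HAL_SPINLOCK_INIT_SET    1

    /* Spin: atomically try to set the lock bit until this CPU is
     * the one that saw it clear. */
    #define HAL_SPINLOCK_SPIN( lock )               \
    CYG_MACRO_START                                 \
        cyg_bool_t was_set_;                        \
        do {                                        \
            HAL_TAS_SET( lock, was_set_ );          \
        } while( was_set_ );                        \
    CYG_MACRO_END

    /* Clear: release the lock so a waiting spinner may claim it. */
    #define HAL_SPINLOCK_CLEAR( lock )              \
    CYG_MACRO_START                                 \
        cyg_bool_t was_set_;                        \
        HAL_TAS_CLEAR( lock, was_set_ );            \
    CYG_MACRO_END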

Scheduler Lock
~~~~~~~~~~~~~~

The scheduler lock is the main protection for all kernel data
structures. By default the kernel implements the scheduler lock itself
using a spinlock. However, if spinlocks cannot be supported by the
hardware, or there is a more efficient implementation available, the
HAL may provide macros to implement the scheduler lock.

HAL_SMP_SCHEDLOCK_DATA_TYPE
                        A data type, possibly a structure, that
                        contains any data items needed by the
                        scheduler lock implementation. A variable of
                        this type will be instantiated as a static
                        member of the Cyg_Scheduler_SchedLock class
                        and passed to all the following macros.

HAL_SMP_SCHEDLOCK_INIT( lock, data )
                        Initialize the scheduler lock. The _lock_
                        argument is the scheduler lock counter and the
                        _data_ argument is a variable of
                        HAL_SMP_SCHEDLOCK_DATA_TYPE type.

HAL_SMP_SCHEDLOCK_INC( lock, data )
                        Increment the scheduler lock. The first
                        increment of the lock from zero to one for any
                        CPU may cause it to wait until the lock is
                        zeroed by another CPU. Subsequent increments
                        should be less expensive since this CPU
                        already holds the lock.

HAL_SMP_SCHEDLOCK_ZERO( lock, data )
                        Zero the scheduler lock. This operation will
                        also clear the lock so that other CPUs may
                        claim it.

HAL_SMP_SCHEDLOCK_SET( lock, data, new )
                        Set the lock to a different value, in
                        _new_. This is only called when the lock is
                        already known to be owned by the current
                        CPU. It is never called to zero the lock, or
                        to increment it from zero.

Interrupt Routing
~~~~~~~~~~~~~~~~~

The routing of interrupts to different CPUs is supported by two new
interfaces in hal_intr.h.

Once an interrupt has been routed to a new CPU, the existing vector
masking and configuration operations should take account of the CPU
routing. For example, if the operation is not invoked on the
destination CPU itself, then the HAL may need to arrange to transfer
the operation to the destination CPU for correct application.

HAL_INTERRUPT_SET_CPU( vector, cpu )
                        Route the interrupt for the given _vector_ to
                        the given _cpu_.

HAL_INTERRUPT_GET_CPU( vector, cpu )
                        Set _cpu_ to the id of the CPU to which this
                        vector is routed.

Annex 1 - Pentium SMP Support
=============================

eCos supports SMP working on Pentium class IA32 CPUs with integrated
SMP support. It uses the per-CPU APICs and the IOAPIC to provide CPU
control and identification, and to distribute interrupts. Only PCI
interrupts that map into the ISA interrupt space are currently
supported. The code relies on the MP Configuration Table supplied by
the BIOS to discover the number of CPUs, IOAPIC location and interrupt
assignments - hardware-based MP configuration discovery is not
currently supported.

Inter-CPU interrupts are mapped into interrupt vectors from 64
up. Each CPU has its own vector at 64+CPUID.

Interrupt delivery is initially configured to deliver all interrupts
to the initial CPU. HAL_INTERRUPT_SET_CPU() currently only supports
the ability to deliver interrupts to specific CPUs; dynamic CPU
selection is not currently supported.

eCos has only been tested in a dual processor configuration. While the
code has been written to handle an arbitrary number of CPUs, this has
not been tested.