OpenCores
URL https://opencores.org/ocsvn/or1k_old/or1k_old/trunk

Subversion Repositories or1k_old

[/] [or1k_old/] [trunk/] [uclinux/] [uClinux-2.0.x/] [Documentation/] [smp.tex] - Blame information for rev 1782

Details | Compare with Previous | View Log

Line No. Rev Author Line
1 199 simons
From: michael@Physik.Uni-Dortmund.DE (Michael Dirkmann)
2
 
3
thanks for your information. Attached is the tex-code of your
4
SMP-documentation :
5
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
6
\documentclass[]{article}
7
\parindent0.0cm
8
\parskip0.2cm
9
 
10
\begin{document}
11
 
12
\begin{center}
13
\LARGE \bf
14
An Implementation Of Multiprocessor Linux
15
\normalsize
16
\end{center}
17
 
18
{ \it
19
This document describes the implementation of a simple SMP
20
Linux kernel extension and how to use this to develop SMP Linux kernels for
21
architectures other than the Intel MP v1.1 architecture for Pentium and 486
22
processors.}
23
 
24
\hfill Alan Cox, 1995
25
 
26
 
27
The author wishes to thank Caldera Inc. ( http://www.caldera.com )
28
whose donation of an ASUS dual Pentium board made this project possible,
29
and Thomas Radke, whose initial work on multiprocessor Linux formed
30
the backbone of this project.
31
 
32
\section{Background: The Intel MP specification.}
33
Most IBM PC style multiprocessor motherboards combine Intel 486 or Pentium
34
processors and glue chipsets with a hardware/software specification. The
35
specification places much of the onus for hard work on the chipset and
36
hardware rather than the operating system.
37
 
38
The Intel Pentium processors have a wide variety of built-in facilities for
39
supporting multiprocessing, including hardware cache coherency, built in
40
interprocessor interrupt handling and a set of atomic test and set,
41
exchange and similar operations. The cache coherency in particular makes the
42
operating systems job far easier.
43
 
44
The specification defines a detailed configuration structure in ROM that
45
the boot up processor can read to find the full configuration of the
46
processors and busses. It also defines a procedure for starting up the
47
other processors.
48
 
49
 
50
\section{Mutual Exclusion Within A Single Processor Linux Kernel}
51
For any kernel to function in a sane manner it has to provide internal
52
locking and protection of its own tables to prevent two processes updating
53
them at once and for example allocating the same memory block. There are
54
two strategies for this within current Unix and Unixlike kernels.
55
Traditional unix systems from the earliest of days use a scheme of 'Coarse
56
Grained Locking' where the entire kernel is protected as a small number of
57
locks only. Some modern systems use fine grained locking. Because fine
58
grained locking has more overhead it is normally used only on
59
multiprocessor kernels and real time kernels. In a real time kernel the
60
fine grained locking reduces the amount of time locks are held and reduces
61
the critical (to real time programming at least) latency times.
62
 
63
Within the Linux kernel certain guarantees are made. No process running in
64
kernel mode will be pre-empted by another kernel mode process unless it
65
voluntarily sleeps.  This ensures that blocks of kernel code are
66
effectively atomic with respect to other processes and greatly simplifies
67
many operation. Secondly interrupts may pre-empt a kernel running process,
68
but will always return to that process. A process in kernel mode may
69
disable interrupts on the processor and guarantee such an interruption will
70
not occur. The final guarantee is that an interrupt will not be pre-empted
71
by a kernel task. That is interrupts will run to completion or be
72
pre-empted by other interrupts only.
73
 
74
The SMP kernel chooses to continue these basic guarantees in order to make
75
initial implementation and deployment easier.  A single lock is maintained
76
across all processors. This lock is required to access the kernel space.
77
Any processor may hold it and once it is held may also re-enter the kernel
78
for interrupts and other services whenever it likes until the lock is
79
relinquished. This lock ensures that a kernel mode process will not be
80
pre-empted and ensures that blocking interrupts in kernel mode behaves
81
correctly. This is guaranteed because only the processor holding the lock
82
can be in kernel mode, only kernel mode processes can disable interrupts
83
and only the processor holding the lock may handle an interrupt.
84
 
85
Such a choice is however poor for performance. In the longer term it is
86
necessary to move to finer grained parallelism in order to get the best
87
system performance. This can be done hierarchically by gradually refining
88
the locks to cover smaller areas. With the current kernel highly CPU bound
89
process sets perform well but I/O bound task sets can easily degenerate to
90
near single processor performance levels. This refinement will be needed to
91
get the best from Linux/SMP.
92
 
93
\subsection{Changes To The Portable Kernel Components}
94
The kernel changes are split into generic SMP support changes and
95
architecture specific changes necessary to accommodate each different
96
processor type Linux is ported to.
97
 
98
 
99
\subsubsection{Initialisation}
100
The first problem with a multiprocessor kernel is starting the other
101
processors up. Linux/SMP defines that a single processor enters the normal
102
kernel entry point start\_kernel(). Other processors are assumed not to be
103
started or to have been captured elsewhere. The first processor begins the
104
normal Linux initialisation sequences and sets up paging, interrupts and
105
trap handlers. After it has obtained the processor information about the
106
boot CPU, the architecture specific function
107
 
108
 
109
{\tt \bf{void smp\_store\_cpu\_info(int processor\_id) }}
110
 
111
is called to store any information about the processor into a per processor
112
array. This includes things like the bogomips speed ratings.
113
 
114
Having completed the kernel initialisation the architecture specific
115
function
116
 
117
{\tt \bf void smp\_boot\_cpus(void) }
118
 
119
is called and is expected to start up each other processor and cause it to
120
enter start\_kernel() with its paging registers and other control
121
information correctly loaded. Each other processor skips the setup except
122
for calling the trap and irq initialisation functions that are needed on
123
some processors to set each CPU up correctly.  These functions will
124
probably need to be modified in existing kernels to cope with this.
125
 
126
 
127
Each additional CPU the calls the architecture specific function
128
 
129
{\tt \bf void smp\_callin(void)}
130
 
131
which does any final setup and then spins the processor while the boot
132
up processor forks off enough idle threads for each processor. This is
133
necessary because the scheduler assumes there is always something to run.
134
Having generated these threads and forked init the architecture specific
135
 
136
{\tt \bf void smp\_commence(void)}
137
 
138
function is invoked. This does any final setup and indicates to the system
139
that multiprocessor mode is now active. All the processors spinning in the
140
smp\_callin() function are now released to run the idle processes, which
141
they will run when they have no real work to process.
142
 
143
 
144
\subsubsection{Scheduling}
145
The kernel scheduler implements a simple but very and effective task
146
scheduler. The basic structure of this scheduler is unchanged in the
147
multiprocessor kernel. A processor field is added to each task, and this
148
maintains the number of the processor executing a given task, or a magic
149
constant (NO\_PROC\_ID)  indicating the job is not allocated to a processor.
150
 
151
Each processor executes the scheduler itself and will select the next task
152
to run from all runnable processes not allocated to a different processor.
153
The algorithm used by the selection is otherwise unchanged. This is
154
actually inadequate for the final system because there are advantages to
155
keeping a process on the same CPU, especially on processor boards with per
156
processor second level caches.
157
 
158
Throughout the kernel the variable 'current' is used as a global for the
159
current process. In Linux/SMP this becomes a macro which expands to
160
current\_set[smp\_processor\_id()]. This enables almost the entire kernel to
161
be unaware of the array of running processors, but still allows the SMP
162
aware kernel modules to see all of the running processes.
163
 
164
The fork system call is modified to generate multiple processes with a
165
process id of zero until the SMP kernel starts up properly. This is
166
necessary because process number 1 must be init, and it is desirable that
167
all the system threads are process 0.
168
 
169
The final area within the scheduling of processes that does cause problems
170
is the fact the uniprocessor kernel hard codes tests for the idle threads
171
as task[0] and the init process as task[1]. Because there are multiple idle
172
threads it is necessary to replace these with tests that the process id is
173
 
174
 
175
\subsubsection{Memory Management}
176
The memory management core of the existing Linux system functions
177
adequately within the multiprocessor framework providing the locking is
178
used. Certain processor specific areas do need changing, in particular
179
invalidate() must invalidate the TLBs of all processors before it returns.
180
 
181
 
182
\subsubsection{Miscellaneous Functions}
183
The portable SMP code rests on a small set of functions and variables
184
that are provided by the processor specification functionality. These are
185
 
186
{\tt \bf int smp\_processor\_id(void) }
187
 
188
which returns the identity of the process the call is executed upon. This
189
call is assumed to be valid at all times. This may mean additional tests
190
are needed during initialisation.
191
 
192
 
193
{\tt \bf int smp\_num\_cpus;}
194
 
195
This is the number of processors in the system. \
196
 
197
{\tt \bf void smp\_message\_pass(int target, int msg, unsigned long data,
198
                int wait)}
199
 
200
This function passes messages between processors. At the moment it is not
201
sufficiently defined to sensibly document and needs cleaning up and further
202
work. Refer to the processor specific code documentation for more details.
203
 
204
 
205
\subsection{Architecture Specific Code For the Intel MP Port}
206
The architecture specific code for the intel port splits fairly cleanly
207
into four sections. Firstly the initialisation code used to boot the
208
system, secondly the message handling and support code, thirdly the
209
interrupt and kernel syscall entry function handling and finally the
210
extensions to standard kernel facilities to cope with multiple processors.
211
 
212
\subsubsection{Initialisation}
213
The Intel MP architecture captures all the processors except for a single
214
processor known as the 'boot processor' in the BIOS at boot time. Thus a
215
single processor enters the kernel bootup code. The first processor
216
executes the bootstrap code, loads and uncompresses the kernel. Having
217
unpacked the kernel it sets up the paging and control registers then enters
218
the C kernel startup.
219
 
220
The assembler startup code for the kernel is modified so that it can be
221
used by the other processors to do the processor identification and various
222
other low level configurations but does not execute those parts of the
223
startup code that would damage the running system (such as clearing the BSS
224
segment).
225
 
226
In the initialisation done by the first processor the arch/i386/mm/init
227
code is modified to scan the low page, top page and BIOS for intel MP
228
signature blocks. This is necessary because the MP signature blocks must
229
be read and processed before the kernel is allowed to allocate and destroy
230
the page at the top of low memory. Having established the number of
231
processors it reserves a set of pages to provide a stack come boot up area
232
for each processor in the system. These must be allocated at startup to
233
ensure they fall below the 1Mb boundary.
234
 
235
Further processors are started up in smp\_boot\_cpus() by programming the
236
APIC controller registers and sending an inter-processor interrupt (IPI) to
237
the processor. This message causes the target processor to begin executing
238
code at the start of any page of memory within the lowest 1Mb, in 16bit
239
real mode. The kernel uses the single page it allocated for each processor
240
to use as stack. Before booting a given CPU the relocatable code from
241
trampoline.S and trampoline32.S is copied to the bottom of its stack page
242
and used as the target for the startup.
243
 
244
The trampoline code calculates the desired stack base from the code
245
segment (since the code segment on startup is the bottom of the stack),
246
 enters 32bit mode and jumps to the kernel entry assembler. This as
247
described above is modified to only execute the parts necessary for each
248
processor, and then to enter start\_kernel(). On entering the kernel the
249
processor initialises its trap and interrupt handlers before entering
250
smp\_callin(), where it reports its status and sets a flag that causes the
251
boot processor to continue and look for further processors. The processor
252
then spins until smp\_commence() is invoked.
253
 
254
Having started each processor up the smp\_commence( ) function flips a
255
flag. Each processor spinning in smp\_callin() then loads the task register
256
with the task state segment (TSS) of its idle thread as is needed for task
257
switching.
258
 
259
\subsubsection{Message Handling and Support Code}
260
The architecture specific code implements the smp\_processor\_id() function
261
by querying the APIC logical identity register. Because the APIC isn't
262
mapped into the kernel address space at boot, the initial value returned is
263
rigged by setting the APIC base pointer to point at a suitable constant.
264
Once the system starts doing the SMP setup (in smp\_boot\_cpus()), the APIC
265
is mapped with a vremap() call and the apic pointer is adjusted
266
appropriately. From then on the real APIC logical identity register is
267
read.
268
 
269
Message passing is accomplished using a pair of IPIs on interrupt 13
270
(unused by the 80486 FPUs in SMP mode) and interrupt 16. Two are used in
271
order to separate messages that cannot be processed until the receiver
272
obtains the kernel spinlock from messages that can be processed
273
immediately. In effect IRQ 13 is a fast IRQ handler that does not obtain
274
the locks, and cannot cause a reschedule, while IRQ 16 is a slow IRQ that
275
must acquire the kernel spinlocks and can cause a reschedule. This
276
interrupt is used for passing on slave timer messages from the processor
277
that receives the timer interrupt to the rest of the processors, so that
278
they can reschedule running tasks.
279
 
280
 
281
\subsubsection{Entry And Exit Code}
282
A single spinlock protects the entire kernel. The interrupt handlers, the
283
syscall entry code and the exception handlers all acquire the lock before
284
entering the kernel proper. When the processor is trying to acquire the
285
spinlock it spins continually on the lock with interrupts disabled. This
286
causes a specific deadlock problem. The lock owner may need to send an
287
invalidate request to the rest of the processors and wait for these to
288
complete before continuing. A processor spinning on the lock would not be
289
able to do thus. Thus the loop of the spinlock tests and handles invalidate
290
requests. If the invalidate bit for the spinning CPU is set the processor
291
invalidates its TLB and atomically clears the bit. When the spinlock is
292
obtained that processor will take an IPI and in the IPI test the bit and
293
skip the invalidate as the bit is clear.
294
 
295
One complexity of the spinlock is that a process running in kernel mode
296
can sleep voluntarily and be pre-empted. A switch from such a process to a
297
process executing in user space may reduce the lock count. To track this
298
the kernel uses a syscall\_count and a per process lock\_depth parameter to
299
track the kernel lock state. The switch\_to() function is modified in SMP
300
mode to adjust the lock appropriately.
301
 
302
The final problem is the idle thread. In the single processor kernel the
303
idle thread executes 'hlt' instructions. This saves power and reduces the
304
running temperature of the processors when they are idle. However it means
305
the process spends all its time in kernel mode and would thus hold the
306
kernel spinlock. The SMP idle thread continually reschedules a new task and
307
returns to user mode. This is far from ideal and will be modified to use
308
'hlt' instructions and release the spinlock soon. Using 'hlt' is even more
309
beneficial on a multiprocessor system as it almost completely takes an idle
310
processor off the bus.
311
 
312
Interrupts are distributed by an i82489 APIC. This chip is set up to work
313
as an emulation of the traditional PC interrupt controllers when the
314
machine boots (so that an Intel MP machine boots one CPU and PC
315
compatible). The kernel has all the relevant locks but does not yet
316
reprogram the 82489 to deliver interrupts to arbitrary processors as it
317
should. This requires further modification of the standard Linux interrupt
318
handling code, and is particularly messy as the interrupt handler behaviour
319
has to change as soon as the 82489 is switched into SMP mode.
320
 
321
 
322
\subsubsection{Extensions To Standard Facilities}
323
The kernel maintains a set of per processor control information such as
324
the speed of the processor for delay loops. These functions on the SMP
325
kernel look the values up in a per processor array that is set up from the
326
data generated at boot up by the smp\_store\_cpu\_info() function. This
327
includes other facts such as whether there is an FPU on the processor. The
328
current kernel does not handle floating point correctly, this requires some
329
changes to the techniques the single CPU kernel uses to minimise floating
330
point processor reloads.
331
 
332
The highly useful atomic bit operations are prefixed with the 'lock'
333
prefix in the SMP kernel to maintain their atomic properties when used
334
outside of (and by) the spinlock and message code. Amongst other things
335
this is needed for the invalidate handler, as all  CPU's will invalidate at
336
the same time without any locks.
337
 
338
Interrupt 13 floating point error reporting is removed. This facility is
339
not usable on a multiprocessor board, nor relevant to the Intel MP
340
architecture which does not cover the 80386/80387 processor pair. \
341
 
342
The /proc filesystem support is changed so that the /proc/cpuinfo file
343
contains a column for each processor present. This information is extracted
344
from the data save by smp\_store\_cpu\_info().
345
 
346
\end{document}

powered by: WebSVN 2.1.0

© copyright 1999-2024 OpenCores.org, equivalent to Oliscience, all rights reserved. OpenCores®, registered trademark.