Started Nov 1999 by Kanoj Sarcar

The intent of this file is to have an up-to-date, running commentary
from different people about NUMA-specific code in the Linux VM.

What is NUMA? It is an architecture where the memory access times
for different regions of memory from a given processor vary
according to the "distance" of the memory region from the processor.
Each region of memory to which access times are the same from any
CPU is called a node. On such architectures, it is beneficial if
the kernel tries to minimize inter-node communication. Schemes
for this range from replicating kernel text and read-only data
across nodes to housing all the data structures that key components
of the kernel need in memory on the node where they run.

Currently, the NUMA support only provides efficient handling
of widely discontiguous physical memory, so architectures that
are not NUMA but can have huge holes in the physical address space
can use the same code. All this code is bracketed by CONFIG_DISCONTIGMEM.

The initial port includes NUMAizing the bootmem allocator code by
encapsulating all the pieces of information into a bootmem_data_t
structure. Node-specific calls have been added to the allocator.
In theory, any platform which uses the bootmem allocator should
be able to put the bootmem and mem_map data structures anywhere
it deems best.
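
The encapsulation idea can be illustrated with a small userspace model.
This is only a sketch: struct toy_bootmem, its fields, and the toy_*
functions are hypothetical names invented for this example, not the
kernel's bootmem_data_t or its allocator calls. It shows per-node state
kept in one record and an allocation call that takes a node id.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_NODES 4

/* Hypothetical per-node boot allocator record (not the real bootmem_data_t). */
struct toy_bootmem {
    size_t start;   /* first byte offset owned by this node */
    size_t end;     /* one past the last byte owned by this node */
    size_t next;    /* bump pointer: next free offset */
};

static struct toy_bootmem toy_nodes[TOY_MAX_NODES];

/* Hand node `nid` the range [start, end). */
void toy_init_bootmem_node(int nid, size_t start, size_t end)
{
    assert(nid >= 0 && nid < TOY_MAX_NODES && start <= end);
    toy_nodes[nid].start = start;
    toy_nodes[nid].end = end;
    toy_nodes[nid].next = start;
}

/*
 * Node-specific allocation: returns an offset inside node `nid`,
 * or (size_t)-1 if that node has no room. `align` must be a power of two.
 */
size_t toy_alloc_bootmem_node(int nid, size_t size, size_t align)
{
    struct toy_bootmem *b;
    size_t off;

    assert(nid >= 0 && nid < TOY_MAX_NODES);
    assert(align != 0 && (align & (align - 1)) == 0);

    b = &toy_nodes[nid];
    off = (b->next + align - 1) & ~(align - 1);   /* round up to alignment */
    if (off > b->end || size > b->end - off)
        return (size_t)-1;
    b->next = off + size;
    return off;
}
```

Because every piece of allocator state lives in the per-node record, a
caller in this model decides placement simply by choosing which node id
to pass.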

Each node's page allocation data structures have also been encapsulated
into a pg_data_t. The bootmem_data_t is just one part of this. To
make the code look uniform between NUMA and regular UMA platforms,
UMA platforms have a statically allocated pg_data_t too (contig_page_data).
For the sake of uniformity, the function num_online_nodes() is also defined
for all platforms. As we run benchmarks, we might decide to NUMAize
more variables like low_on_memory, nr_free_pages etc. into the pg_data_t.
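
A rough userspace sketch of that uniformity follows. The names
struct toy_pgdat, toy_pgdat(), toy_num_online_nodes() and the TOY_NUMA
switch are assumptions made up for illustration; the real pg_data_t holds
far more state. The point is that generic code loops over nodes the same
way whether there is one static record or a table of them.

```c
#include <assert.h>

#define TOY_MAX_NODES 8

/* Hypothetical per-node record; much smaller than the real pg_data_t. */
struct toy_pgdat {
    int node_id;
    unsigned long nr_free_pages;  /* a counter that could be kept per node */
};

#ifdef TOY_NUMA
/* NUMA model: one record per node. */
static struct toy_pgdat toy_node_table[TOY_MAX_NODES];
static int toy_online = 2;        /* pretend two nodes came up */

struct toy_pgdat *toy_pgdat(int nid)
{
    assert(nid >= 0 && nid < toy_online);
    return &toy_node_table[nid];
}

int toy_num_online_nodes(void)
{
    return toy_online;
}
#else
/* UMA model: a single static record, in the spirit of contig_page_data. */
static struct toy_pgdat toy_contig_page_data;

struct toy_pgdat *toy_pgdat(int nid)
{
    assert(nid == 0);
    return &toy_contig_page_data;
}

int toy_num_online_nodes(void)
{
    return 1;
}
#endif

/* Generic code: identical on both models. */
unsigned long toy_total_free_pages(void)
{
    unsigned long total = 0;
    int nid;

    for (nid = 0; nid < toy_num_online_nodes(); nid++)
        total += toy_pgdat(nid)->nr_free_pages;
    return total;
}
```

Built without TOY_NUMA, the loop in toy_total_free_pages() runs exactly
once over the single static record; built with it, the same source walks
every online node.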

The NUMA aware page allocation code currently tries to allocate pages
from different nodes in a round-robin manner. This will be changed to
do a concentric circle search, starting from the current node, once the
NUMA port achieves more maturity. The call alloc_pages_node has been
added, so that drivers can make the call and not worry about whether
they are running on a NUMA or UMA platform.
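
The two node-selection policies mentioned above can be compared in a toy
userspace model. Everything here is an assumption for illustration: the
toy_* names, the four-node layout, the free-page counts, and the distance
table are invented, and the nearest-first function is only one plausible
reading of a "concentric circle" search, not the kernel's implementation.

```c
#include <assert.h>

#define TOY_NODES 4

/* Made-up free page counts and a made-up symmetric distance table. */
static unsigned long toy_free[TOY_NODES] = { 2, 0, 1, 3 };
static const int toy_distance[TOY_NODES][TOY_NODES] = {
    { 0, 1, 2, 3 },
    { 1, 0, 1, 2 },
    { 2, 1, 0, 1 },
    { 3, 2, 1, 0 },
};

/* Take one page from node `nid`; returns nid, or -1 if the node is empty. */
int toy_alloc_page_node(int nid)
{
    assert(nid >= 0 && nid < TOY_NODES);
    if (toy_free[nid] == 0)
        return -1;
    toy_free[nid]--;
    return nid;
}

/* Round robin: begin with the node after the one used last time. */
int toy_alloc_round_robin(void)
{
    static int last = TOY_NODES - 1;
    int i;

    for (i = 1; i <= TOY_NODES; i++) {
        int nid = (last + i) % TOY_NODES;

        if (toy_alloc_page_node(nid) == nid) {
            last = nid;
            return nid;
        }
    }
    return -1;  /* every node is empty */
}

/* Nearest first: pick the closest node to `home` that still has pages. */
int toy_alloc_nearest(int home)
{
    int nid, best = -1;

    assert(home >= 0 && home < TOY_NODES);
    for (nid = 0; nid < TOY_NODES; nid++) {
        if (toy_free[nid] == 0)
            continue;
        if (best < 0 || toy_distance[home][nid] < toy_distance[home][best])
            best = nid;
    }
    return best < 0 ? -1 : toy_alloc_page_node(best);
}
```

In this model round robin spreads allocations across nodes regardless of
who is asking, while the nearest-first search keeps pages close to the
requesting node and only moves outward when nearer nodes are exhausted.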