@c
@c COPYRIGHT (c) 1988-2002.
@c On-Line Applications Research Corporation (OAR).
@c All rights reserved.
@c
@c codetuning.t,v 1.5 2002/01/17 21:47:45 joel Exp
@c

@chapter Code Tuning Parameters

@section Inline Thread_Enable_dispatch

Should the calls to _Thread_Enable_dispatch be inlined?

If TRUE, then they are inlined.

If FALSE, then a subroutine call is made.

This is an example of the classic trade-off of size versus
speed. Inlining the call (TRUE) typically increases the size of RTEMS
while speeding up the enabling of dispatching.

[NOTE: In general, the _Thread_Dispatch_disable_level will only be 0 or 1
unless you are in an interrupt handler and that interrupt handler invokes
the executive.] When not inlined, the caller invokes _Thread_Enable_dispatch,
which in turn calls _Thread_Dispatch. If the enable dispatch is inlined,
then one subroutine call is avoided entirely.

@example
#define CPU_INLINE_ENABLE_DISPATCH FALSE
@end example

@section Inline Thread_queue_Enqueue_priority

Should the body of the search loops in _Thread_queue_Enqueue_priority be
unrolled one time? If unrolled, each iteration of the loop examines two
"nodes" on the chain being searched. Otherwise, only one node is examined
per iteration.

If TRUE, then the loops are unrolled.

If FALSE, then the loops are not unrolled.

The primary factor in making this decision is the cost of disabling and
enabling interrupts (_ISR_Flash) versus the cost of the rest of the body of
the loop. On some CPUs, the flash is more expensive than one iteration of
the loop body. In this case, it might be desirable to unroll the loop.
It is important to note that on some CPUs, this code is the longest
interrupt disable period in RTEMS. So it is necessary to strike a balance
when setting this parameter.

@example
#define CPU_UNROLL_ENQUEUE_PRIORITY TRUE
@end example

@section Structure Alignment Optimization

The following macro may be defined to the attribute setting used to force
alignment of critical RTEMS structures. On some processors it may make
sense to have these aligned on stricter boundaries than the minimum
requirements of the compiler in order to have as much of the critical data
area as possible in a cache line. This ensures that the first access of
an element in that structure fetches most, if not all, of the data
structure and places it in the data cache. Modern CPUs often have cache
lines of at least 16 bytes and thus a single access implicitly fetches
some surrounding data and places that unreferenced data in the cache.
Taking advantage of this allows RTEMS to essentially prefetch critical
data elements.

The placement of this macro in the declaration of the variables is based
on the syntactic requirements of the GNU C "__attribute__" extension.
For another toolset, the placement of this macro could be incorrect. For
example, with GNU C, use the following definition of
CPU_STRUCTURE_ALIGNMENT to force a structure to a 32 byte boundary.

@example
#define CPU_STRUCTURE_ALIGNMENT __attribute__ ((aligned (32)))
@end example

To benefit from using this, the data must be heavily used so that it stays
in the cache, and it must be used frequently enough in the executive to
justify turning this on. NOTE: Because of this, only the Priority Bit Map
table currently uses this feature.
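
As a rough illustration of the placement described above, the following
sketch applies the macro to a variable declaration under GNU C. The table
name and its size are hypothetical and are not taken from the RTEMS
sources. The attribute follows the declarator, which is a position GNU C
accepts for variable attributes.

@example
#include <stdint.h>

/* GNU C only: request a 32 byte boundary, as in the definition above. */
#define CPU_STRUCTURE_ALIGNMENT __attribute__ ((aligned (32)))

/* Hypothetical table: the name and size are for illustration only. */
uint16_t example_bit_map[16] CPU_STRUCTURE_ALIGNMENT;
@end example

With this declaration the 32 bytes of the table start on a 32 byte
boundary, so on a CPU with 32 byte cache lines the whole table would
occupy a single line.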

The following illustrates how CPU_STRUCTURE_ALIGNMENT is defined on
ports which require no special alignment for optimized access to data
structures:

@example
#define CPU_STRUCTURE_ALIGNMENT
@end example

@section Data Alignment Requirements

@subsection Data Element Alignment

The CPU_ALIGNMENT macro should be set to the CPU's strictest alignment
requirement, in bytes, for any data type. This is typically the
alignment requirement for a C double. This alignment does not take into
account the requirements for the stack.

The following sets the CPU_ALIGNMENT macro to 8, which indicates that there
is a basic C data type for this port which must be aligned to an 8 byte
boundary.

@example
#define CPU_ALIGNMENT 8
@end example
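
One way to sanity check the chosen value is a compile-time assertion. The
sketch below assumes a C11 compiler, since it relies on _Static_assert and
_Alignof. These checks are an illustration only and are not part of the
RTEMS port interface.

@example
#define CPU_ALIGNMENT 8

/* C11: fail the build if a basic type needs stricter alignment
   than the port claims. */
_Static_assert(CPU_ALIGNMENT >= _Alignof(double),
               "CPU_ALIGNMENT is weaker than the alignment of double");
_Static_assert(CPU_ALIGNMENT >= _Alignof(long long),
               "CPU_ALIGNMENT is weaker than the alignment of long long");
@end example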

@subsection Heap Element Alignment

The CPU_HEAP_ALIGNMENT macro is set to indicate the byte alignment
requirement for data allocated by the RTEMS Core Heap Handler. This
alignment requirement may be stricter than the data type
alignment specified by CPU_ALIGNMENT. It is common for the heap to follow
the same alignment requirement as CPU_ALIGNMENT. If CPU_ALIGNMENT is
strict enough for the heap, then this should be set to CPU_ALIGNMENT. This
macro is necessary to ensure that allocated memory is properly aligned for
use by high level language routines.

The following example illustrates how the CPU_HEAP_ALIGNMENT macro is set
when the required alignment for elements from the heap is the same as the
basic CPU alignment requirements.

@example
#define CPU_HEAP_ALIGNMENT CPU_ALIGNMENT
@end example

NOTE: This does not have to be a power of 2. It does have to be greater
than or equal to CPU_ALIGNMENT.
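
Because the alignment need not be a power of 2, code that rounds sizes up
to it cannot rely on bit masking. The following sketch shows one portable
way to round up. EXAMPLE_ALIGN_UP is a hypothetical helper and not an RTEMS
macro, and the preprocessor check simply restates the constraint from the
note above.

@example
#define CPU_ALIGNMENT 8
#define CPU_HEAP_ALIGNMENT CPU_ALIGNMENT

/* Round value up to the next multiple of alignment.
   Works for any positive alignment, not only powers of 2. */
#define EXAMPLE_ALIGN_UP(value, alignment) \
  ((((value) + (alignment) - 1) / (alignment)) * (alignment))

/* The heap alignment may not be weaker than CPU_ALIGNMENT. */
#if (CPU_HEAP_ALIGNMENT < CPU_ALIGNMENT)
#error "CPU_HEAP_ALIGNMENT must be greater than or equal to CPU_ALIGNMENT"
#endif
@end example

For instance, EXAMPLE_ALIGN_UP(13, CPU_HEAP_ALIGNMENT) evaluates to 16, and
the same macro with an alignment of 12 yields 24.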

@subsection Partition Element Alignment

The CPU_PARTITION_ALIGNMENT macro is set to indicate the byte alignment
requirement for memory buffers allocated by the RTEMS Partition Manager
that is part of the Classic API. This alignment requirement may be
stricter than the data type alignment specified by
CPU_ALIGNMENT. It is common for the partition to follow the same
alignment requirement as CPU_ALIGNMENT. If CPU_ALIGNMENT is strict
enough for the partition, then this should be set to CPU_ALIGNMENT. This
macro is necessary to ensure that allocated memory is properly aligned for
use by high level language routines.

The following example illustrates how the CPU_PARTITION_ALIGNMENT macro is
set when the required alignment for elements from the RTEMS Partition
Manager is the same as the basic CPU alignment requirements.

@example
#define CPU_PARTITION_ALIGNMENT CPU_ALIGNMENT
@end example

NOTE: This does not have to be a power of 2. It does have to be greater
than or equal to CPU_ALIGNMENT.
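
One plausible use of this macro is to reject buffer sizes that are not a
multiple of the alignment, since consecutive buffers in a partition would
otherwise drift off the required boundary. The check below is only a
sketch. EXAMPLE_IS_BUFFER_SIZE_ALIGNED is a hypothetical name, and the
validation actually performed by RTEMS may differ.

@example
#define CPU_ALIGNMENT 8
#define CPU_PARTITION_ALIGNMENT CPU_ALIGNMENT

/* Nonzero when buffer_size is a multiple of CPU_PARTITION_ALIGNMENT. */
#define EXAMPLE_IS_BUFFER_SIZE_ALIGNED(buffer_size) \
  (((buffer_size) % CPU_PARTITION_ALIGNMENT) == 0)
@end example

With the values above, a buffer size of 64 passes the check while a buffer
size of 60 does not.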