Format of Object and Executable Files
-------------------------------------

0) General

The "a.out" file format is used both for assembler output
("object files") and for linker output ("executable files").
The two differ in that certain sections have size zero in
executable files.

The file consists of a header and up to 6 sections:
  - code
  - initialized data
  - code relocation records
  - data relocation records
  - symbol table records
  - symbol string storage


1) Header

The header specifies the sizes of all following sections,
but has itself a fixed length (and is always present):

typedef struct {
  unsigned int magic;    /* must be EXEC_MAGIC */
  unsigned int csize;    /* size of code in bytes */
  unsigned int dsize;    /* size of initialized data in bytes */
  unsigned int bsize;    /* size of uninitialized data in bytes */
  unsigned int crsize;   /* size of code relocation info in bytes */
  unsigned int drsize;   /* size of data relocation info in bytes */
  unsigned int symsize;  /* size of symbol table in bytes */
  unsigned int strsize;  /* size of string space in bytes */
} ExecHeader;

The magic number is used to distinguish executable files from
other file types. This field must have the value EXEC_MAGIC.

The code size is given in bytes, but is always a multiple of 4.
This is the exact size of the code section in the executable file.
For the code size in memory see "executing an executable" below.

The initialized data size is given in bytes, but is always a
multiple of 4. This is the exact size of the initialized data
section in the executable file. For the data size in memory see
"executing an executable" below.

The uninitialized data size is given in bytes, but is always a
multiple of 4. There is no corresponding section to this value
contained within the executable. See "executing an executable"
below for the semantics of this value.

The code relocation info size is given in bytes, but is always
a multiple of sizeof(RelocRecord). These records describe the
changes to be applied to the code section during the link step.
This size is zero if the file is an executable.

The data relocation info size is given in bytes, but is always
a multiple of sizeof(RelocRecord). These records describe the
changes to be applied to the data section during the link step.
This size is zero if the file is an executable.

The size of the symbol table is given in bytes, but is always
a multiple of sizeof(SymbolRecord). The symbol table contains
information about symbols which are exported from or imported
into this object file. It is mainly used during the link step
and may be absent (size = 0) if the file is an executable.

The string space is used to store the names of the symbols in
the symbol table.
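
The constraints on the header fields can be checked mechanically,
and the sizes are enough to locate each section in the file. The
following is a minimal sketch, not the reference implementation. It
assumes that "unsigned int" is 32 bits wide, that the header has
already been read in host byte order, and that the sections are
stored back to back in the order listed under "General". The value
given for EXEC_MAGIC is a placeholder, and the names SectionOffsets,
headerIsValid and sectionOffsets are hypothetical.

```c
#include <stddef.h>

/* Placeholder only: the real value of EXEC_MAGIC is not given in this text. */
#define EXEC_MAGIC 0x12345678u

typedef struct {
  unsigned int magic, csize, dsize, bsize;
  unsigned int crsize, drsize, symsize, strsize;
} ExecHeader;

typedef struct { unsigned int offset; int method; int value; int base; } RelocRecord;
typedef struct { unsigned int name; int type; int value; } SymbolRecord;

/* Hypothetical helper type: byte offsets of the sections within the file. */
typedef struct {
  unsigned long code, data, crel, drel, sym, str, end;
} SectionOffsets;

/* Returns 1 if the header satisfies the size constraints, 0 otherwise. */
int headerIsValid(const ExecHeader *h) {
  if (h->magic != EXEC_MAGIC) return 0;
  if (h->csize % 4 != 0 || h->dsize % 4 != 0 || h->bsize % 4 != 0) return 0;
  if (h->crsize % sizeof(RelocRecord) != 0) return 0;
  if (h->drsize % sizeof(RelocRecord) != 0) return 0;
  if (h->symsize % sizeof(SymbolRecord) != 0) return 0;
  return 1;
}

/* Assumption: sections follow the header directly, in the listed order.
   Note that bsize does not occupy any space in the file. */
void sectionOffsets(const ExecHeader *h, SectionOffsets *o) {
  o->code = sizeof(ExecHeader);
  o->data = o->code + h->csize;
  o->crel = o->data + h->dsize;
  o->drel = o->crel + h->crsize;
  o->sym  = o->drel + h->drsize;
  o->str  = o->sym  + h->symsize;
  o->end  = o->str  + h->strsize;  /* expected file size */
}
```

Comparing "end" against the actual file size is a cheap way to
detect truncated files before any section is read.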


2) Code/Initialized Data

These sections contain the instructions and the initialized data
of the program, respectively.


3) Code/Data Relocation Records

The relocation records have the following structure:

typedef struct {
  unsigned int offset;   /* where to relocate */
  int method;            /* how to relocate */
  int value;             /* additive part of value */
  int base;              /* if MSB = 0: segment number */
                         /* if MSB = 1: symbol table index */
} RelocRecord;

The offset gives the position where the relocation has to be done,
in the form of a byte offset from the beginning of the section.

The method determines how the relocation is performed, and must
be one of the following constants:
  METHOD_H16   /* write 16 bits with high part of value */
  METHOD_L16   /* write 16 bits with low part of value */
  METHOD_R16   /* write 16 bits with value relative to PC */
  METHOD_R26   /* write 26 bits with value relative to PC */
  METHOD_W32   /* write full 32 bit word with value */

"Value" and "base" together are used to compute the final value of
the relocated code or data item. The value is added to the value of
the base. The base is either the start address of a segment in memory,
or the value of an imported symbol. In the former case, which is marked
by an MSB of 0, the base is specified as one of the following constants:
  SEGMENT_ABS    /* absolute values */
  SEGMENT_CODE   /* code segment */
  SEGMENT_DATA   /* initialized data segment */
  SEGMENT_BSS    /* uninitialized data segment */
In the latter case, which is marked by an MSB of 1, the remaining bits
specify the index of the symbol in the symbol table.
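
The computation "base plus value" and the simpler write methods can
be sketched as follows. This is an illustration under several
assumptions, not the linker's actual code: the numeric values of the
SEGMENT_* and METHOD_* constants are invented, SEGMENT_ABS is taken to
have a start address of 0, and the 16-bit methods are assumed to
overwrite the low half of the 32-bit word at the relocation offset.
METHOD_R16 and METHOD_R26 are left out because the PC-relative
encoding is not specified in this text. The function names
relocTarget and patchWord are hypothetical.

```c
#include <stdint.h>

/* Hypothetical numeric values; only the names appear in the format. */
enum { SEGMENT_ABS, SEGMENT_CODE, SEGMENT_DATA, SEGMENT_BSS };
enum { METHOD_H16, METHOD_L16, METHOD_R16, METHOD_R26, METHOD_W32 };

#define BASE_MSB 0x80000000u

typedef struct { unsigned int offset; int method; int value; int base; } RelocRecord;

/* Final value of the relocated item: value of the base plus "value".
   segStart[] holds the segment start addresses indexed by SEGMENT_*
   (entry SEGMENT_ABS is expected to be 0); symValue[] holds the
   resolved symbol values indexed by symbol table index. */
uint32_t relocTarget(const RelocRecord *r,
                     const uint32_t segStart[4],
                     const uint32_t *symValue) {
  uint32_t base = (uint32_t) r->base;
  uint32_t baseAddr;
  if (base & BASE_MSB) {
    baseAddr = symValue[base & ~BASE_MSB];  /* MSB = 1: symbol index */
  } else {
    baseAddr = segStart[base];              /* MSB = 0: segment number */
  }
  return baseAddr + (uint32_t) r->value;
}

/* Patches one 32-bit word. Returns 0 on success and -1 for methods
   that this sketch does not handle. */
int patchWord(uint32_t *word, int method, uint32_t target) {
  switch (method) {
    case METHOD_H16:  /* assumed: high part goes into the low half-word */
      *word = (*word & 0xFFFF0000u) | (target >> 16);
      return 0;
    case METHOD_L16:  /* assumed: low part goes into the low half-word */
      *word = (*word & 0xFFFF0000u) | (target & 0x0000FFFFu);
      return 0;
    case METHOD_W32:
      *word = target;
      return 0;
    default:          /* METHOD_R16, METHOD_R26: encoding not given here */
      return -1;
  }
}
```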


4) Symbol Table Records

For every symbol which is imported into or exported from the current
object file, there is a corresponding symbol table record:

typedef struct {
  unsigned int name;   /* offset in string space */
  int type;            /* if MSB = 0: the symbol's segment */
                       /* if MSB = 1: the symbol is undefined */
  int value;           /* if symbol defined: the symbol's value */
                       /* if symbol not defined: meaningless */
} SymbolRecord;

The name of the symbol is given as an offset into the string space.

If the "type" has an MSB of 0, the symbol is defined here (i.e.,
it is exported), and the "type" specifies the segment (for the
segment constants see above) in which the symbol is defined, while
the "value" holds its value. Otherwise, the symbol is not defined
here (i.e., it is imported), and the "value" has no meaning.
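
The distinction between exported and imported symbols comes down to
a test of the top bit of "type". A small sketch, assuming a 32-bit
"int"; the helper names symbolIsDefined, symbolSegment and
countUndefined are hypothetical and chosen for illustration only:

```c
#include <stddef.h>

#define TYPE_MSB 0x80000000u

typedef struct { unsigned int name; int type; int value; } SymbolRecord;

/* 1 if the symbol is defined in this file (exported), 0 if imported. */
int symbolIsDefined(const SymbolRecord *s) {
  return ((unsigned int) s->type & TYPE_MSB) == 0;
}

/* Segment number of a defined symbol, or -1 if the symbol is undefined. */
int symbolSegment(const SymbolRecord *s) {
  return symbolIsDefined(s) ? s->type : -1;
}

/* Counts the imported symbols in a table of symsize bytes. */
unsigned int countUndefined(const SymbolRecord *table, unsigned int symsize) {
  unsigned int n = symsize / sizeof(SymbolRecord);
  unsigned int count = 0;
  unsigned int i;
  for (i = 0; i < n; i++) {
    if (!symbolIsDefined(&table[i])) {
      count++;
    }
  }
  return count;
}
```

A non-zero result from countUndefined indicates symbols that still
have to be resolved against other object files.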


5) Symbol String Storage

The strings are null-terminated and stored without any padding.
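
Because the strings are packed without padding, a symbol's name is
found simply by adding its "name" offset to the start of the string
space. The sketch below adds two defensive checks that the format
itself does not mandate: the offset must lie inside the string space,
and a terminator must exist before the end of the space. The function
name symbolName is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Returns a pointer to the null-terminated name at nameOffset, or
   NULL if the offset is out of range or the string is not terminated
   within the strsize bytes of the string space. */
const char *symbolName(const char *strings,
                       unsigned int strsize,
                       unsigned int nameOffset) {
  if (nameOffset >= strsize) {
    return NULL;
  }
  if (memchr(strings + nameOffset, '\0', strsize - nameOffset) == NULL) {
    return NULL;
  }
  return strings + nameOffset;
}
```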


6) Executing an Executable

When an executable file is loaded into memory for execution, three
logical segments are set up: the code segment, the data segment (with
initialized data, followed by uninitialized data, which starts off
as all 0), and a stack.

The code segment begins at address 0 in virtual memory and is loaded
with the contents of the code section from the executable file.

The data segment begins at the next page boundary (multiple of 4 KB)
after the code segment. It is loaded with the contents of the data
section from the executable file and is followed immediately by the
"uninitialized data", which must be zeroed by the loader. The data
area is expanded upwards as requested by explicit "brk" system calls.

The stack is located in the highest locations of the virtual address
space that are accessible in user mode, and thus expands downwards
from (but excluding) the address 0x80000000. It is extended
automatically by the operating system.
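
The resulting address layout can be derived from the three size
fields of the header alone. The following sketch shows one way to
compute it. It assumes that a code size which is already a multiple
of 4 KB is not rounded up any further, which is one possible reading
of "next page boundary". The type MemoryLayout and the functions
pageRoundUp and computeLayout are hypothetical names.

```c
#include <stdint.h>

#define PAGE_SIZE        0x1000u      /* 4 KB */
#define USER_STACK_LIMIT 0x80000000u  /* first address above the stack */

typedef struct {
  uint32_t codeStart;     /* always 0 */
  uint32_t dataStart;     /* page boundary after the code */
  uint32_t bssStart;      /* directly after the initialized data */
  uint32_t initialBreak;  /* end of data area before any "brk" call */
  uint32_t stackLimit;    /* stack grows down from below this address */
} MemoryLayout;

/* Rounds addr up to a multiple of PAGE_SIZE (assumed reading, see above). */
static uint32_t pageRoundUp(uint32_t addr) {
  return (addr + PAGE_SIZE - 1u) & ~(PAGE_SIZE - 1u);
}

void computeLayout(uint32_t csize, uint32_t dsize, uint32_t bsize,
                   MemoryLayout *m) {
  m->codeStart    = 0;
  m->dataStart    = pageRoundUp(csize);
  m->bssStart     = m->dataStart + dsize;  /* loader must zero bsize bytes here */
  m->initialBreak = m->bssStart + bsize;
  m->stackLimit   = USER_STACK_LIMIT;      /* highest stack byte is one below */
}
```

For example, a file with csize = 0x1234, dsize = 0x20 and bsize = 0x10
would place its data at 0x2000, its uninitialized data at 0x2020, and
its initial end of data at 0x2030.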