/****************************  loader.as  ********************************
* Author:        Agner Fog
* date created:  2020-12-04
* Last modified: 2021-07-30
* Version:       1.11
* Project:       Loader for ForwardCom soft core
* Language:      ForwardCom assembly
* Description:
* This loader is designed to run in a ForwardCom processor to load an
* executable file into code and data RAM before running the loaded program.
*
* Copyright 2020-2021 GNU General Public License v.3 http://www.gnu.org/licenses
******************************************************************************

Prerequisites:
The executable file to be loaded is structured according to the ForwardCom
ELF specification in the file elf_forwardcom.h.
The sections are sorted into blocks in the following order
(see CLinker::sortSections() in file linker.cpp):
* const (ip)
* code (ip)
* data (datap)
* bss (datap)
* data (threadp)
* bss (threadp)
The binary data sections are stored in the executable file in the same order
as the program headers.
The executable file is position-independent. No relocation of addresses in
the code is needed.
The program has only one thread.
The available RAM is sufficient.
The input is loaded as bytes through a serial input port (baud rate set in defines.vh).

The data will be stored in the processor memory in the following order:

1. data (at beginning of data memory. Addressed by datap)
2. bss (uninitialized data, immediately after data. Addressed by datap)
3. free space to use for heap and stack. (The stack pointer will point to the end of this space)
4. threadp data (immediately before const. Addressed by threadp)
5. const data (at end of data memory. Addressed by IP)
6. code (at beginning of code memory. Addressed by IP)
7. loader code (at end of code memory)


Instructions for how to modify and rebuild the loader:
-----------------------------------------------------------

1. The first instruction must be a direct jump to the loader code that
loads an executable program (*.ex file). The load button will go to this
address.

The second instruction at address 1 (word-based) must be an entry for the
restart code that will restart a previously loaded program. The reset button
will go to this address. The restart code must set datap, threadp, sp, and
the entry point to the values previously calculated by the loader.
The present version stores these values in instructions in the code section
in order to free the entire data memory for the running program.
Note that we have execute and write access (int32 only) to the code memory,
but not read access.

2. Assemble:
forw -ass -debug -binlist loader.as -list=loader.txt

3. Link:
forw -link -hex2 loader.mem loader.ob

4. Replace the file loader.mem in the softcore project with the new version.

5. Check size:
The size of the code section of the loader can be found from the address of
the last instruction in the file loader.txt produced by step 2.
If this size (in 32-bit words) exceeds the value MAX_LOADER_SIZE
defined in the file defines.vh, then the value of MAX_LOADER_SIZE must
be increased to at least the actual size. The value must be even.

The loader code will be placed at an address calculated as the end of the
code memory minus MAX_LOADER_SIZE.

6. Rebuild the soft core project.

*****************************************************************************/

// Definition of serial input ports
%serial_input_port   = 8                              // serial input port, read one byte at a time
%serial_input_status = 9                              // serial input status. bit 0-15 = number of bytes in input buffer


// Definition of offsets in the file header (struct ElfFwcEhdr in elf_forwardcom.h):
%e_ident        = 0x00  //  uint8_t   e_ident[16];    // Magic number and other info
%e_type         = 0x10  //  uint16_t  e_type;         // Object file type
%e_machine      = 0x12  //  uint16_t  e_machine;      // Architecture
%e_version      = 0x14  //  uint32_t  e_version;      // Object file version
%e_entry        = 0x18  //  uint64_t  e_entry;        // Entry point virtual address
%e_phoff        = 0x20  //  uint64_t  e_phoff;        // Program header table file offset
%e_shoff        = 0x28  //  uint64_t  e_shoff;        // Section header table file offset
%e_flags        = 0x30  //  uint32_t  e_flags;        // Processor-specific flags. We may define any values for these flags
%e_ehsize       = 0x34  //  uint16_t  e_ehsize;       // ELF header size in bytes
%e_phentsize    = 0x36  //  uint16_t  e_phentsize;    // Program header table entry size
%e_phnum        = 0x38  //  uint16_t  e_phnum;        // Program header table entry count
%e_shentsize    = 0x3A  //  uint16_t  e_shentsize;    // Section header table entry size
%e_shnum        = 0x3C  //  uint32_t  e_shnum;        // Section header table entry count (was uint16_t)
%e_shstrndx     = 0x40  //  uint32_t  e_shstrndx;     // Section header string table index (was uint16_t)
%e_stackvect    = 0x44  //  uint32_t  e_stackvect;    // number of vectors to store on stack. multiply by max vector length and add to stacksize
%e_stacksize    = 0x48  //  uint64_t  e_stacksize;    // size of stack for main thread
%e_ip_base      = 0x50  //  uint64_t  e_ip_base;      // __ip_base relative to first ip based segment
%e_datap_base   = 0x58  //  uint64_t  e_datap_base;   // __datap_base relative to first datap based segment
%e_threadp_base = 0x60  //  uint64_t  e_threadp_base; // __threadp_base relative to first threadp based segment
%file_header_size = 0x68                              // size of file header

%ELFMAG         = 0x464C457F // 0x7F 'E' 'L' 'F': identifying number at e_ident


// Definition of offsets in program headers (struct ElfFwcPhdr in elf_forwardcom.h):
%p_type         = 0x00  //  uint32_t  p_type;         // Segment type
%p_flags        = 0x04  //  uint32_t  p_flags;        // Segment flags
%p_offset       = 0x08  //  uint64_t  p_offset;       // Segment file offset
%p_vaddr        = 0x10  //  uint64_t  p_vaddr;        // Segment virtual address
%p_paddr        = 0x18  //  uint64_t  p_paddr;        // Segment physical address (not used. indicates first section instead)
%p_filesz       = 0x20  //  uint64_t  p_filesz;       // Segment size in file
%p_memsz        = 0x28  //  uint64_t  p_memsz;        // Segment size in memory
%p_align        = 0x30  //  uint8_t   p_align;        // Segment alignment
%p_unused       = 0x31  //  uint8_t   unused[7];

// Definition of section flags
%SHF_EXEC       = 0x0001     // Executable
%SHF_WRITE      = 0x0002     // Writable
%SHF_READ       = 0x0004     // Readable
%SHF_IP         = 0x1000     // Addressed relative to IP (executable and read-only sections)
%SHF_DATAP      = 0x2000     // Addressed relative to DATAP (writeable data sections)
%SHF_THREADP    = 0x4000     // Addressed relative to THREADP (thread-local data sections)

// Start of RAM address
%ram_start_address = 0

// stack alignment
%stack_align = 1 << 4        // alignment of stack


/* Register use in this loader
r0:  number of bytes to read from input
r1:  current address in ram
r6:  ram address of current program header
r10: ram_start_address
r11: number of bytes read from input = current position in input file
r12: size of each program header
r13: size of all threadp sections
r14: current program header index
r20: ram address of first program header
r21: number of program headers
r22: temporary start address for program data (later moved to 0)
r23: start address of const data
r24: start address of code section
r25: start address of threadp sections
r26: end of initialized data section, start of BSS
r27: size of code memory
r28: end of data and bss sections
r29: start address of loader
r30: error code
*/


/*********************************************
        Program code for loader
*********************************************/

code section execute align = 8

__entry_point function public
_loader  function public

// Loader entry:
jump LOADER

// Restart entry. This will restart a previously loaded program:
RESTART:

// Dummy constants make sure the following instructions are 2-word size.
// These constants will be changed by the loader
set_sp:
int32 sp = 0xDEADBEEF                            // will be replaced by calculated stack address
set_datap:
int32 r1 = 0xC001F001                            // will be replaced by calculated 32-bit datap value
int64 datap = write_spec(r1)                     // save datap register
set_threadp:
int32 r2 = 0xFEE1600D                            // will be replaced by calculated 32-bit threadp value
int64 threadp = write_spec(r2)                   // save threadp register

// clear input buffer
do { // repeat until no more serial input coming
    int r2 = 1
    int output(r2, r2, serial_input_status)      // clear input buffer
    for (int r1 = 0; r1 < 1000000; r1++) {}      // delay loop
    int16 r2 = input(r2, serial_input_status)    // check if there is more input
}
while (int16 r2 != 0)

// clear registers
int r0 = 0
int r1 = 0
int r2 = 0
int r3 = 0
int r4 = 0
int r5 = 0
int r6 = 0
int r7 = 0
int r8 = 0
int r9 = 0
int r10 = 0
int r11 = 0
int r12 = 0
int r13 = 0
int r14 = 0
int r15 = 0
int r16 = 0
int r17 = 0
int r18 = 0
int r19 = 0
int r20 = 0
int r21 = 0
int r22 = 0
int r23 = 0
int r24 = 0
int r25 = 0
int r26 = 0
int r27 = 0
int r28 = 0
int r29 = 0
int r30 = read_perf(perf0, -1)                   // clear all performance counters
int r30 = 0

// breakpoint

// To do: clear r0 - r30 using POP instruction if supported

set_entry_point:
jump LOADER                                      // this will be replaced by 24-bit relative call to program entry

breakpoint                                       // debug breakpoint in case main program returns
for (int;;){}                                    // stop in infinite loop


/*********************************************
           Loader starts here
*********************************************/

LOADER:

read_restart:

do {                                             // wait until there are at least 4 bytes in input buffer
    int16 r3 = input(r0, serial_input_status)    // bit 15:0 of status = number of bytes in input buffer (r0 is dummy)
} while (int16+ r3 < 4)                          // repeat if not enough data

// Read serial input and search for file header beginning with 0x7F, 'E', 'L', 'F'
int8 r3 = input(r0, serial_input_port)           // read first byte (r0 is dummy)
if (int8+ r3 != 0x7F) {jump read_restart}
int8 r3 = input(r0, serial_input_port)           // read second byte
if (int8+ r3 != 'E')  {jump read_restart}
int8 r3 = input(r0, serial_input_port)           // read third byte
if (int8+ r3 != 'L')  {jump read_restart}
int8 r3 = input(r0, serial_input_port)           // read fourth byte
if (int8+ r3 != 'F')  {jump read_restart}

// Store file header in memory at address 0
//int64 r1 = ram_start_address                   // Store file header in memory at address 0
//int32 [r1] = ELFMAG                            // store first word (superfluous. will not be used)
int r1 = 4                                       // we have read 4 bytes

// read_block function input:
// r0: number of bytes to read
// r1: pointer to memory block to write to
// return:
// r0: last byte read
// r1: end of memory block

int r0 = file_header_size - 4                    // read rest of file header (we have already read 4 bytes)
int r11 = r0 + r1                                // count number of bytes read
call read_block
int64 r10 = ram_start_address                    // file header is stored in memory at address 0

// read program headers
int32 r0 = [r10 + e_phoff]                       // file offset to first program header
int32 r0 -= r11                                  // number of bytes read so far
int r11 += r0                                    // count number of bytes read
call read_dummy                                  // read any space between file header and first program header

// round up to align by 8
int r1 += 7
int r1 &= -8

int r20 = r1                                     // save address of first program header
int16 r21 = [r10 + e_phnum]                      // number of program headers
int16 r12 = [r10 + e_phentsize]                  // size of each program header
// int r0 = r21 * r12                            // size of all program headers
int r0 = 0
for (int+ r14 = 0; r14 < r21; r14++) {           // multiplication loop in case CPU does not support multiplication
    int16 r0 += r12
}
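// Example of the repeated addition above (illustrative values only):
// if e_phnum = 6 and e_phentsize = 0x38, which is the header size implied by
// the ElfFwcPhdr offsets defined above, the loop adds 0x38 six times
// and leaves r0 = 6 * 0x38 = 0x150 bytes of program headers to read.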
int r11 += r0                                    // count number of bytes read
call read_block                                  // read all program headers

int r22 = r1 + 7                                 // temporary program data start address
int r22 &= -8                                    // align by 8

// find first code section
int32 r6 = r20                                   // ram address of first program header
for (int+ r14 = 0; r14 < r21; r14++) {           // loop through program headers
    int r3 = [r6 + p_flags]                      // section flags
    if (int8+ r3 & SHF_EXEC) {break}             // search for SHF_EXEC flag
    int r6 += r12                                // next program header
}

int r24 = read_capabilities(capab5, 0)           // get data cache size = start of code section
int r27 = read_capabilities(capab4, 0)           // get code cache size = max size of code section
int64 r4 = [r6 + p_vaddr]                        // virtual address of first code section relative to first IP section
int64 r23 = r24 - r4                             // start address of const data (ip-addressed)

// load binary data

// 1. const sections
int r1 = r23                                     // start address of const data
int32 r6 = r20                                   // ram address of first program header
for (int+ r14 = 0; r14 < r21; r14++) {           // loop through program headers
    int r3 = [r6 + p_flags]                      // section flags
    int16+ test_bits_and(r3, SHF_IP | SHF_READ), jump_false LOOP3BREAK // skip if not readable IP
    if (int16+ r3 & SHF_EXEC) {break}            // stop if SHF_EXEC flag
    int32 r0 = [r6 + p_offset]                   // file offset of this section
    int32 r0 -= r11                              // space between last program header and first binary data block
    int r11 += r0                                // count number of bytes read
    call read_dummy                              // read any space
    int32 r0 = [r6 + p_filesz]                   // file size of this section
    int32 r0 += 3                                // round up to nearest multiple of 4
    int32 r0 &= -4
    int r11 += r0                                // count number of bytes read
    call read_block                              // read const data section
    int r6 += r12                                // next program header
}
LOOP3BREAK:

// 2. code sections
for (int ; r14 < r21; r14++) {                   // continue loop through program headers
    int r3 = [r6 + p_flags]                      // section flags
    if (int16+ !(r3 & SHF_EXEC)) {break}         // stop if not SHF_EXEC flag
    int32 r0 = [r6 + p_offset]                   // file offset of this section
    int32 r0 -= r11                              // any space between last binary data and this
    int r11 += r0                                // count number of bytes read
    call read_dummy                              // read any space
    uint64 r1 = r23 + [r6 + p_vaddr]             // address to place code
    int32 r0 = [r6 + p_filesz]                   // file size of this section
    int32 r0 += 3                                // round up to nearest multiple of 4
    int32 r0 &= -4
    int r11 += r0                                // count number of bytes read
    call read_block                              // read code section
    int r6 += r12                                // next program header
}

int r30 = 1                                      // error code
int r29 = address([_loader])
if (uint32 r1 > r29) {jump ERROR}                // out of code memory

// 3. datap sections
// align first data section
int r3 = [r6 + p_flags]                          // section flags
if (int+ r3 & SHF_DATAP) {                       // check if there is a data or bss section
    int8  r4 = [r6 + p_align]
    int   r5 = 1
    int64 r5 <<= r4                              // alignment
    int64 r5 -= 1
    int64 r22 += r5
    int64 r5 = ~r5
    int64 r22 &= r5                              // aligned start address of program data
}
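// Worked example of the round-up above (illustrative values, not taken from a real file):
// p_align = 3 gives r5 = (1 << 3) - 1 = 7 and ~r5 = -8.
// If r22 = 0x1A2 then r22 + 7 = 0x1A9, and 0x1A9 & -8 = 0x1A8,
// i.e. the next address that is a multiple of 8. An already aligned r22 is unchanged.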

// data section headers
for (int ; r14 < r21; r14++) {                   // continue loop through program headers
    int r3 = [r6 + p_flags]                      // section flags
    if (int16+ !(r3 & SHF_DATAP)) {break}        // stop if not SHF_DATAP flag
    int32 r0 = [r6 + p_offset]                   // file offset of this section
    int32 r0 -= r11                              // any space between last binary data and this
    int r11 += r0                                // count number of bytes read
    call read_dummy                              // read any space
    int r1 = r22 + [r6 + p_vaddr]                // address to place data
    int r27 = r1 + [r6 + p_memsz]                // end of initialized and uninitialized data section
    int32 r0 = [r6 + p_filesz]                   // file size of this section
    int32 r0 += 3                                // round up to nearest multiple of 4
    int32 r0 &= -4
    int r11 += r0                                // count number of bytes read. will be zero for BSS section
    call read_block                              // read data section
    int r6 += r12                                // next program header
    int r26 = r1                                 // end of initialized data section
}

// 4. threadp sections
int r13 = 0                                      // size of all threadp sections
int64 r25 = r23                                  // default if no threadp section. used for stack pointer
// find last threadp section
int r7 = r6
for (int r2 = r14; r2 < r21; r2++) {             // continue loop through program headers
    int r3 = [r7 + p_flags]                      // section flags
    if (int16+ !(r3 & SHF_THREADP)) {break}      // stop if not SHF_THREADP flag
    int r7 += r12                                // next program header
}
int r7 -= r12                                    // last threadp header, if any
if (int r7 >= r6) {                              // check if there is any threadp header
    int r13 = [r7 + p_vaddr]                     // virtual address of last threadp section relative to first threadp section
    int r13 += [r7 + p_memsz]                    // add size of last threadp section to get total size of threadp sections
    // start of threadp section
    int64 r25 = r23 - r13
    // align start of threadp sections
    int8  r4 = [r7 + p_align]                    // alignment of first threadp section
    int   r5 = 1
    int64 r5 <<= r4                              // alignment
    int64 r5 = -r5
    int64 r25 = r25 & r5                         // aligned start address of first threadp section
}

int r30 = 2                                      // error code
if (uint32 r25 <= r27) {jump ERROR}              // out of RAM memory
// r22 contains the amount of RAM used for headers during loading.
// This is included in the memory count above, but will be freed before the loaded program is run.
// This freed memory will be available for data stack or heap

// threadp section headers
for (int ; r14 < r21; r14++) {                   // continue loop through program headers
    int r3 = [r6 + p_flags]                      // section flags
    if (int16+ !(r3 & SHF_THREADP)) {break}      // stop if not SHF_THREADP flag
    uint64 r1 = r25 + [r6 + p_vaddr]             // address to place threadp data
    int32 r0 = [r6 + p_offset]                   // file offset of this section
    int32 r0 -= r11                              // any space between last binary data and this
    int r11 += r0                                // count number of bytes read
    call read_dummy                              // read any space
    int32 r0 = [r6 + p_filesz]                   // file size of this section (0 if BSS)
    int32 r0 += 3                                // round up to nearest multiple of 4
    int32 r0 &= -4
    int r11 += r0                                // count number of bytes read. will be zero for BSS section
    call read_block                              // read threadp section
    int r6 += r12                                // next program header
}

int64 r10 = ram_start_address                    // Store file header temporarily in memory at address 0

// calculate entry point for loaded program
// r23 = const start = start of IP-addressed block
int64 r1 = r23 + [r10 + e_entry]                 // entry point
int64 r2 = address([set_entry_point+4])          // reference point
int32 r3 = r1 - r2                               // relative address
int32 r4 = r3 << 6                               // shift out the upper bits so that only the 24-bit word offset remains
uint32 r5 = r4 >> 8                              // net effect: byte offset divided by 4, upper 8 bits cleared
int32 r6 = r5 | 0x79000000                       // code for direct call instruction
int32 [set_entry_point] = r6                     // modify set_entry_point instruction to call calculated entry point
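// Worked example of the patch above (illustrative values only):
// If the entry point lies 0x100 bytes after set_entry_point+4, then
// r3 = 0x00000100, r4 = r3 << 6 = 0x00004000, r5 = r4 >> 8 = 0x00000040
// (a word offset of 0x40 = 0x100 / 4), and r6 = 0x79000040.
// A backward offset of -8 bytes gives r3 = 0xFFFFFFF8, r4 = 0xFFFFFE00,
// r5 = 0x00FFFFFE (-2 words in 24 bits), and r6 = 0x79FFFFFE.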

// get datap
int64 r7 = [r10 + e_datap_base] /* + r22 */      // temporary datap address is r7+r22, but moved down to r7
int32 [set_datap+4] = r7                         // modify instruction that sets datap

// get threadp
int64 r8 = r25 + [r10 + e_threadp_base]          // threadp register
int32 [set_threadp+4] = r8                       // modify instruction that sets threadp

// get sp
int64 sp = r25 & -stack_align                    // align stack at end of datap ram = beginning of threadp
int32 [set_sp+4] = sp                            // modify instruction that sets stack pointer

// Move data down from r22 to 0
int r2 = ram_start_address
for (int+ r3 = r22; r3 < r26; r3 += 4) {
    int32 r4 = [r3]
    int32 [r2] = r4
    int32 r2 += 4
}

// Fill the rest with zeroes, including BSS and empty space or stack
int r0 = 0
for (int ; r2 < r25; r2 += 4) {
    int32 [r2] = r0
}

// Initialize datap, threadp, sp. Jump to the entry point of the loaded program
jump RESTART

_loader end


// Error if out of memory or if input file sections are not in desired order
ERROR:
breakpoint
int r0 = r30                                     // show error code in debugger
jump ERROR


// Function to read a block of data into memory.
// input:
// r0: number of bytes to read. must be divisible by 4
// r1: pointer to memory block to write to. must be aligned by 4
// return:
// r0: last word read
// r1: end of memory block
read_block function
    int r30 = 0x10                               // error code
    if (int32 r0 < 0) {jump ERROR}               // check if negative
    int64 r2 = r1 + r0                           // end of memory block
    for (uint64 ; r1 < r2; r1 += 4) {            // loop n/4 times
        do {                                     // wait until there are at least 4 bytes in input buffer
            int32 r3 = input(r0, serial_input_status) // bit 15:0 of status = number of bytes in input buffer
        } while (int16 r3 < 4)                   // repeat if not enough data
        int8 r3 = input(r0, serial_input_port)   // read first byte
        int8 r4 = input(r0, serial_input_port)   // read second byte
        int32 r4 <<= 8;
        int32 r3 |= r4
        int8 r4 = input(r0, serial_input_port)   // read third byte
        int32 r4 <<= 16;
        int32 r3 |= r4
        int8 r4 = input(r0, serial_input_port)   // read fourth byte
        int32 r4 <<= 24;
        int32 r3 |= r4
        int32 [r1] = r3                          // store word to memory
    }
    return
read_block end
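// Example of the byte order used by read_block (little endian):
// the input bytes 0x7F, 0x45 ('E'), 0x4C ('L'), 0x46 ('F') arriving in that order
// would be combined to the word 0x464C457F, which is the same value as ELFMAG above.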

// Function to read a block of data and discard it
// input:
// r0: number of bytes to read
// return:
// r0: last byte read
read_dummy function
    int r30 = 0x11                               // error code
    if (int32 r0 < 0) {jump ERROR}               // check if negative
    for (uint64 ; r0 > 0; r0--) {                // loop n times
        do {
            int16 r3 = input(r0, serial_input_port) // read one byte. r0 is dummy
        } while (int16+ !(r3 & 0x100))           // repeat if data not ready
    }
    //int8 r0 = r3                               // return last byte read
    return
read_dummy end

nop

code end