Subversion Repositories forwardcom
URL: https://opencores.org/ocsvn/forwardcom/forwardcom/trunk
[/] [forwardcom/] [trunk/] [loader.as] - Rev 149
/****************************  loader.as  **********************************
* Author:        Agner Fog
* date created:  2020-12-04
* Last modified: 2021-07-30
* Version:       1.11
* Project:       Loader for ForwardCom soft core
* Language:      ForwardCom assembly
* Description:
* This loader is designed to run in a ForwardCom processor to load an
* executable file into code and data RAM before running the loaded program.
*
* Copyright 2020-2021 GNU General Public License v.3 http://www.gnu.org/licenses
******************************************************************************

Prerequisites:
The executable file to be loaded is structured as defined in the ForwardCom
ELF specification defined in the file elf_forwardcom.h.
The sections are sorted into blocks in the following order
(see CLinker::sortSections() in file linker.cpp):
* const (ip)
* code (ip)
* data (datap)
* bss (datap)
* data (threadp)
* bss (threadp)
The binary data sections are stored in the executable file in the same order
as the program headers.
The executable file is position-independent. No relocation of addresses in
the code is needed.
The program has only one thread.
The available RAM is sufficient.
The input is loaded as bytes through a serial input port
(BAUD rate set in defines.vh)

The data will be stored in the processor memory in the following order:
1. data (at beginning of data memory. Addressed by datap)
2. bss (uninitialized data, immediately after data. Addressed by datap)
3. free space to use for heap and stack. (The stack pointer will point to the end of this space)
4. threadp data (immediately before const. Addressed by threadp)
5. const data (at end of data memory. Addressed by IP)
6. code (at beginning of code memory. Addressed by IP)
7. loader code (at end of code memory)

Instructions for how to modify and rebuild the loader:
-----------------------------------------------------------
1. The first instruction must be a direct jump to the loader code that
   loads an executable program (*.ex file). The load button will go to this
   address.
   The second instruction at address 1 (word-based) must be an entry for the
   restart code that will restart a previously loaded program. The reset button
   will go to this address. The restart code must set datap, threadp, sp, and
   the entry point to the values previously calculated by the loader.
   The present version stores these values in instructions in the code section
   in order to free the entire data memory for the running program.
   Note that we have execute and write access (int32 only) to the code memory,
   but not read access.

2. Assemble:
   forw -ass -debug -binlist loader.as -list=loader.txt

3. Link:
   forw -link -hex2 loader.mem loader.ob

4. Replace the file loader.mem in the softcore project with the new version.

5. Check size:
   The size of the code section of the loader can be found from the address of
   the last instruction in the file loader.txt produced by step 2.
   If this size (in 32-bit words) exceeds the value MAX_LOADER_SIZE
   defined in the file defines.vh, then the value of MAX_LOADER_SIZE must
   be increased to at least the actual size. The value must be even.
   The loader code will be placed at an address calculated as the end of the
   code memory minus MAX_LOADER_SIZE.

6. Rebuild the soft core project.
*****************************************************************************/

// Definition of serial input ports
%serial_input_port   = 8       // serial input port, read one byte at a time
%serial_input_status = 9       // serial input status. bit 0-15 = number of bytes in input buffer

// Definition of offsets in the file header (struct ElfFwcEhdr in elf_forwardcom.h):
%e_ident        = 0x00         // uint8_t  e_ident[16];   // Magic number and other info
%e_type         = 0x10         // uint16_t e_type;        // Object file type
%e_machine      = 0x12         // uint16_t e_machine;     // Architecture
%e_version      = 0x14         // uint32_t e_version;     // Object file version
%e_entry        = 0x18         // uint64_t e_entry;       // Entry point virtual address
%e_phoff        = 0x20         // uint64_t e_phoff;       // Program header table file offset
%e_shoff        = 0x28         // uint64_t e_shoff;       // Section header table file offset
%e_flags        = 0x30         // uint32_t e_flags;       // Processor-specific flags. We may define any values for these flags
%e_ehsize       = 0x34         // uint16_t e_ehsize;      // ELF header size in bytes
%e_phentsize    = 0x36         // uint16_t e_phentsize;   // Program header table entry size
%e_phnum        = 0x38         // uint16_t e_phnum;       // Program header table entry count
%e_shentsize    = 0x3A         // uint16_t e_shentsize;   // Section header table entry size
%e_shnum        = 0x3C         // uint32_t e_shnum;       // Section header table entry count (was uint16_t)
%e_shstrndx     = 0x40         // uint32_t e_shstrndx;    // Section header string table index (was uint16_t)
%e_stackvect    = 0x44         // uint32_t e_stackvect;   // number of vectors to store on stack. multiply by max vector length and add to stacksize
%e_stacksize    = 0x48         // uint64_t e_stacksize;   // size of stack for main thread
%e_ip_base      = 0x50         // uint64_t e_ip_base;     // __ip_base relative to first ip based segment
%e_datap_base   = 0x58         // uint64_t e_datap_base;  // __datap_base relative to first datap based segment
%e_threadp_base = 0x60         // uint64_t e_threadp_base; // __threadp_base relative to first threadp based segment
%file_header_size = 0x68       // size of file header

%ELFMAG = 0x464C457F           // 0x7F 'E' 'L' 'F': identifying number at e_ident

// Definition of offsets in program headers (struct ElfFwcPhdr in elf_forwardcom.h):
%p_type   = 0x00               // uint32_t p_type;        // Segment type
%p_flags  = 0x04               // uint32_t p_flags;       // Segment flags
%p_offset = 0x08               // uint64_t p_offset;      // Segment file offset
%p_vaddr  = 0x10               // uint64_t p_vaddr;       // Segment virtual address
%p_paddr  = 0x18               // uint64_t p_paddr;       // Segment physical address (not used. indicates first section instead)
%p_filesz = 0x20               // uint64_t p_filesz;      // Segment size in file
%p_memsz  = 0x28               // uint64_t p_memsz;       // Segment size in memory
%p_align  = 0x30               // uint8_t  p_align;       // Segment alignment
%p_unused = 0x31               // uint8_t  unused[7];

// Definition of section flags
%SHF_EXEC    = 0x0001          // Executable
%SHF_WRITE   = 0x0002          // Writable
%SHF_READ    = 0x0004          // Readable
%SHF_IP      = 0x1000          // Addressed relative to IP (executable and read-only sections)
%SHF_DATAP   = 0x2000          // Addressed relative to DATAP (writeable data sections)
%SHF_THREADP = 0x4000          // Addressed relative to THREADP (thread-local data sections)

// Start of RAM address
%ram_start_address = 0

// stack alignment
%stack_align = 1 << 4          // alignment of stack

/* Register use in this loader
r0:  number of bytes to read from input
r1:  current address in ram
r6:  ram address of current program header
r10: ram_start_address
r11: number of bytes read from input = current position in input file
r12: size of each program header
r13: size of all threadp sections
r14: current program header index
r20: ram address of first program header
r21: number of program headers
r22: temporary start address for program data (later moved to 0)
r23: start address of const data
r24: start address of code section
r25: start address of threadp sections
r26: end of initialized data section, start of BSS
r27: size of code memory
r28: end of data and bss sections
r29: start address of loader
r30: error code
*/

/*********************************************
Program code for loader
*********************************************/

code section execute align = 8

__entry_point function public
_loader function public

// Loader entry:
jump LOADER

// Restart entry. This will restart a previously loaded program:
RESTART:

// Dummy constants make sure the following instructions are 2-word size.
// These constants will be changed by the loader
set_sp:
int32 sp = 0xDEADBEEF                    // will be replaced by calculated stack address
set_datap:
int32 r1 = 0xC001F001                    // will be replaced by calculated 32-bit datap value
int64 datap = write_spec(r1)             // save datap register
set_threadp:
int32 r2 = 0xFEE1600D                    // will be replaced by calculated 32-bit threadp value
int64 threadp = write_spec(r2)           // save threadp register

// clear input buffer
do {                                     // repeat until no more serial input coming
  int r2 = 1
  int output(r2, r2, serial_input_status)    // clear input buffer
  for (int r1 = 0; r1 < 1000000; r1++) {}    // delay loop
  int16 r2 = input(r2, serial_input_status)  // check if there is more input
}
while (int16 r2 != 0)

// clear registers
int r0 = 0
int r1 = 0
int r2 = 0
int r3 = 0
int r4 = 0
int r5 = 0
int r6 = 0
int r7 = 0
int r8 = 0
int r9 = 0
int r10 = 0
int r11 = 0
int r12 = 0
int r13 = 0
int r14 = 0
int r15 = 0
int r16 = 0
int r17 = 0
int r18 = 0
int r19 = 0
int r20 = 0
int r21 = 0
int r22 = 0
int r23 = 0
int r24 = 0
int r25 = 0
int r26 = 0
int r27 = 0
int r28 = 0
int r29 = 0
int r30 = read_perf(perf0, -1)           // clear all performance counters
int r30 = 0
// breakpoint

// To do: clear r0 - r30 using POP instruction if supported

set_entry_point:
jump LOADER                              // this will be replaced by 24-bit relative call to program entry

breakpoint                               // debug breakpoint in case main program returns

for (int;;){}                            // stop in infinite loop

/*********************************************
Loader starts here
*********************************************/

LOADER:

read_restart:
do {                                     // wait until there are at least 4 bytes in input buffer
  int16 r3 = input(r0, serial_input_status)  // bit 15:0 of status = number of bytes in input buffer (r0 is dummy)
} while (int16+ r3 < 4)                  // repeat if not enough data

// Read serial input and search for file header beginning with 0x7F, 'E', 'L', 'F'
int8 r3 = input(r0, serial_input_port)   // read first byte (r0 is dummy)
if (int8+ r3 != 0x7F) {jump read_restart}
int8 r3 = input(r0, serial_input_port)   // read second byte
if (int8+ r3 != 'E') {jump read_restart}
int8 r3 = input(r0, serial_input_port)   // read third byte
if (int8+ r3 != 'L') {jump read_restart}
int8 r3 = input(r0, serial_input_port)   // read fourth byte
if (int8+ r3 != 'F') {jump read_restart}

// Store file header in memory at address 0
//int64 r1 = ram_start_address           // Store file header in memory at address 0
//int32 [r1] = ELFMAG                    // store first word (superfluous. will not be used)
int r1 = 4                               // we have read 4 bytes

// read_block function input:
// r0: number of bytes to read
// r1: pointer to memory block to write to
// return:
// r0: last byte read
// r1: end of memory block

int r0 = file_header_size - 4            // read rest of file header (we have already read 4 bytes)
int r11 = r0 + r1                        // count number of bytes read
call read_block

int64 r10 = ram_start_address            // File header is stored in memory at address 0

// read program headers
int32 r0 = [r10 + e_phoff]               // file offset to first program header
int32 r0 -= r11                          // number of bytes read so far
int r11 += r0                            // count number of bytes read
call read_dummy                          // read any space between file header and first program header

// round up to align by 8
int r1 += 7
int r1 &= -8
int r20 = r1                             // save address of first program header
int16 r21 = [r10 + e_phnum]              // number of program headers
int16 r12 = [r10 + e_phentsize]          // size of each program header
// int r0 = r21 * r12                    // size of all program headers
int r0 = 0
for (int+ r14 = 0; r14 < r21; r14++) {   // multiplication loop in case CPU does not support multiplication
  int16 r0 += r12
}
int r11 += r0                            // count number of bytes read
call read_block                          // read all program headers
int r22 = r1 + 7                         // temporary program data start address
int r22 &= -8                            // align by 8

// find first code section
int32 r6 = r20                           // ram address of first program header
for (int+ r14 = 0; r14 < r21; r14++) {   // loop through code sections
  int r3 = [r6 + p_flags]                // section flags
  if (int8+ r3 & SHF_EXEC) {break}       // search for SHF_EXEC flag
  int r6 += r12                          // next program header
}
int r24 = read_capabilities(capab5, 0)   // get data cache size = start of code section
int r27 = read_capabilities(capab4, 0)   // get code cache size = max size of code section

int64 r4 = [r6 + p_vaddr]                // virtual address of first code section relative to first IP section
int64 r23 = r24 - r4                     // start address of const data (ip-addressed)

// load binary data

// 1. const sections
int r1 = r23                             // start address of const data
int32 r6 = r20                           // ram address of first program header
for (int+ r14 = 0; r14 < r21; r14++) {   // loop through program headers
  int r3 = [r6 + p_flags]                // section flags
  int16+ test_bits_and(r3, SHF_IP | SHF_READ), jump_false LOOP3BREAK  // skip if not readable IP
  if (int16+ r3 & SHF_EXEC) {break}      // stop if SHF_EXEC flag
  int32 r0 = [r6 + p_offset]             // file offset of this section
  int32 r0 -= r11                        // space between last program header and first binary data block
  int r11 += r0                          // count number of bytes read
  call read_dummy                        // read any space
  int32 r0 = [r6 + p_filesz]             // file size of this section
  int32 r0 += 3                          // round up to nearest multiple of 4
  int32 r0 &= -4
  int r11 += r0                          // count number of bytes read
  call read_block                        // read const data section
  int r6 += r12                          // next program header
}
LOOP3BREAK:

// 2. code sections
for (int ; r14 < r21; r14++) {           // continue loop through program headers
  int r3 = [r6 + p_flags]                // section flags
  if (int16+ !(r3 & SHF_EXEC)) {break}   // stop if not SHF_EXEC flag
  int32 r0 = [r6 + p_offset]             // file offset of this section
  int32 r0 -= r11                        // any space between last binary data and this
  int r11 += r0                          // count number of bytes read
  call read_dummy                        // read any space
  uint64 r1 = r23 + [r6 + p_vaddr]       // address to place code
  int32 r0 = [r6 + p_filesz]             // file size of this section
  int32 r0 += 3                          // round up to nearest multiple of 4
  int32 r0 &= -4
  int r11 += r0                          // count number of bytes read
  call read_block                        // read code section
  int r6 += r12                          // next program header
}
int r30 = 1                              // error code
int r29 = address([_loader])
if (uint32 r1 > r29) {jump ERROR}        // out of code memory

// 3. datap sections
// align first data section
int r3 = [r6 + p_flags]                  // section flags
if (int+ r3 & SHF_DATAP) {               // check if there is a data or bss section
  int8 r4 = [r6 + p_align]
  int r5 = 1
  int64 r5 <<= r4                        // alignment
  int64 r5 -= 1
  int64 r22 += r5
  int64 r5 = ~r5
  int64 r22 &= r5                        // aligned start address of program data
}

// data section headers
for (int ; r14 < r21; r14++) {           // continue loop through program headers
  int r3 = [r6 + p_flags]                // section flags
  if (int16+ !(r3 & SHF_DATAP)) {break}  // stop if not SHF_DATAP flag
  int32 r0 = [r6 + p_offset]             // file offset of this section
  int32 r0 -= r11                        // any space between last binary data and this
  int r11 += r0                          // count number of bytes read
  call read_dummy                        // read any space
  int r1 = r22 + [r6 + p_vaddr]          // address to place data
  int r27 = r1 + [r6 + p_memsz]          // end of initialized and uninitialized data section
  int32 r0 = [r6 + p_filesz]             // file size of this section
  int32 r0 += 3                          // round up to nearest multiple of 4
  int32 r0 &= -4
  int r11 += r0                          // count number of bytes read. will be zero for BSS section
  call read_block                        // read data section
  int r6 += r12                          // next program header
  int r26 = r1                           // end of initialized data section
}

// 4. threadp sections
int r13 = 0                              // size of all threadp sections
int64 r25 = r23                          // default if no threadp section. used for stack pointer

// find last threadp section
int r7 = r6
for (int r2 = r14; r2 < r21; r2++) {     // continue loop through program headers
  int r3 = [r7 + p_flags]                // section flags
  if (int16+ !(r3 & SHF_THREADP)) {break} // stop if not SHF_THREADP flag
  int r7 += r12                          // next program header
}
int r7 -= r12                            // last threadp header, if any
if (int r7 >= r6) {                      // check if there is any threadp header
  int r13 = [r7 + p_vaddr]               // virtual address of last threadp section relative to first threadp section
  int r13 += [r7 + p_memsz]              // add size of last threadp section to get total size of threadp sections
  // start of threadp section
  int64 r25 = r23 - r13
  // align start of threadp sections
  int8 r4 = [r7 + p_align]               // alignment of first threadp section
  int r5 = 1
  int64 r5 <<= r4                        // alignment
  int64 r5 = -r5
  int64 r25 = r25 & r5                   // aligned start address of first threadp section
}

int r30 = 2                              // error code
if (uint32 r25 <= r27) {jump ERROR}      // out of RAM memory
// r22 contains the amount of RAM used for headers during loading.
// This is included in the memory count above, but will be freed before the loaded program is run.
// This freed memory will be available for data stack or heap

// threadp section headers
for (int ; r14 < r21; r14++) {           // continue loop through program headers
  int r3 = [r6 + p_flags]                // section flags
  if (int16+ !(r3 & SHF_THREADP)) {break} // stop if not SHF_THREADP flag
  uint64 r1 = r25 + [r6 + p_vaddr]       // address to place data
  int32 r0 = [r6 + p_offset]             // file offset of this section
  int32 r0 -= r11                        // any space between last binary data and this
  int r11 += r0                          // count number of bytes read
  call read_dummy                        // read any space
  int32 r0 = [r6 + p_filesz]             // file size of this section (0 if BSS)
  int32 r0 += 3                          // round up to nearest multiple of 4
  int32 r0 &= -4
  int r11 += r0                          // count number of bytes read. will be zero for BSS section
  call read_block                        // read threadp data section
  int r6 += r12                          // next program header
}

int64 r10 = ram_start_address            // File header is stored temporarily in memory at address 0

// calculate entry point for loaded program
// r23 = const start = start of IP-addressed block
int64 r1 = r23 + [r10 + e_entry]         // entry point
int64 r2 = address([set_entry_point+4])  // reference point
int32 r3 = r1 - r2                       // relative address
int32 r4 = r3 << 6                       // remove upper 8 bits and scale by 4
uint32 r5 = r4 >> 8
int32 r6 = r5 | 0x79000000               // code for direct call instruction
int32 [set_entry_point] = r6             // modify set_entry_point instruction to call calculated entry point

// get datap
int64 r7 = [r10 + e_datap_base] /* + r22 */  // temporary datap address is r7+r22, but moved down to r7
int32 [set_datap+4] = r7                 // modify instruction that sets datap

// get threadp
int64 r8 = r25 + [r10 + e_threadp_base]  // threadp register
int32 [set_threadp+4] = r8               // modify instruction that sets threadp

// get sp
int64 sp = r25 & -stack_align            // align stack at end of datap ram = begin of threadp
int32 [set_sp+4] = sp                    // modify instruction that sets stack pointer

// Move data down from r22 to 0
int r2 = ram_start_address
for (int+ r3 = r22; r3 < r26; r3 += 4) {
  int32 r4 = [r3]
  int32 [r2] = r4
  int32 r2 += 4
}

// Fill the rest with zeroes, including BSS and empty space or stack
int r0 = 0
for (int ; r2 < r25; r2 += 4) {
  int32 [r2] = r0
}

// Initialize datap, threadp, sp. Jump to the entry point of the loaded program
jump RESTART

_loader end

// Error if out of memory or if input file sections are not in desired order
ERROR:
breakpoint
int r0 = r30                             // show error code in debugger
jump ERROR

// Function to read a block of data into memory.
// input:
// r0: number of bytes to read. must be divisible by 4
// r1: pointer to memory block to write to. must be aligned by 4
// return:
// r0: last word read
// r1: end of memory block
read_block function
int r30 = 0x10                           // error code
if (int32 r0 < 0) {jump ERROR}           // check if negative
int64 r2 = r1 + r0                       // end of memory block
for (uint64 ; r1 < r2; r1 += 4) {        // loop n/4 times
  do {                                   // wait until there are at least 4 bytes in input buffer
    int32 r3 = input(r0, serial_input_status)  // bit 15:0 of status = number of bytes in input buffer
  } while (int16 r3 < 4)                 // repeat if not enough data
  int8 r3 = input(r0, serial_input_port) // read first byte
  int8 r4 = input(r0, serial_input_port) // read second byte
  int32 r4 <<= 8; int32 r3 |= r4
  int8 r4 = input(r0, serial_input_port) // read third byte
  int32 r4 <<= 16; int32 r3 |= r4
  int8 r4 = input(r0, serial_input_port) // read fourth byte
  int32 r4 <<= 24; int32 r3 |= r4
  int32 [r1] = r3                        // store word to memory
}
return
read_block end

// Function to read a block of data and discard it
// input:
// r0: number of bytes to read
// return:
// r0: last byte read
read_dummy function
int r30 = 0x11                           // error code
if (int32 r0 < 0) {jump ERROR}           // check if negative
for (uint64 ; r0 > 0; r0--) {            // loop n times
  do {
    int16 r3 = input(r0, serial_input_port)  // read one byte. r0 is dummy
  } while (int16+ !(r3 & 0x100))         // repeat if data not ready
}
//int8 r0 = r3                           // return last byte read
return
read_dummy end

nop

code end

