URL
https://opencores.org/ocsvn/eco32/eco32/trunk
Subversion Repositories eco32
[/] [eco32/] [trunk/] [doc/] [fpga-impl] - Rev 244
FPGA Implementations of ECO32
=============================

eco32-00
--------

This is essentially the same as the solution of assignment 9 of the
course "Hardware for Embedded Systems", i.e., an implementation of
ECO32e. The differences are:

a) The reset circuit is moved to a subdirectory of its own. The
   duration of the reset pulse is reduced to 2^24/50MHz = 0.3 sec,
   a quarter of the original duration.
b) The reset circuit is connected to the pushbutton on the carrier
   board, which has been designated for reset by the manufacturer.
c) The bus controller is moved to a subdirectory of its own.
d) The top-level description is transformed from a schematic into
   plain text. This in turn eliminates the need for top-level
   symbols of the Reset/ROM/RAM/Busctrl/CPU/DSP/KBD circuits.

eco32-01
--------

We have a new module, "ser", which represents the circuit for a
serial interface (8 bit data, no parity, 1 stop bit, 38400 baud).
The data is buffered twice in both directions. The module is
instantiated once; the data in/out lines are connected to the
RS232 interface on the carrier board. The bus controller got
the necessary additional connections to drive the module.

eco32-02
--------

The fake RAM module is replaced by a preliminary implementation of
real RAM. It uses the block RAM of the FPGA (instead of the SDRAM
mounted as an extra chip on the board that the final implementation
will use). It is therefore very small in size: 4 blocks of 16K bits
each yield a total size of 2 KWords (8K bytes).

eco32-03
--------

This revision corrects an error which should have been corrected
a long time ago: the instructions ldb and ldh never sign-extended
their loaded data. On top of that, the instructions ldbu and ldhu
never placed zeroes into the bit positions 31 to 8 and 31 to 16,
respectively. This went undetected so far, because the implementation
of the bus did this already, although it is not explicitly requested.

eco32-04
--------

This version got a shift unit.
It is connected in parallel to the
ALU, feeding its output into an expanded multiplexer. Because
arithmetic right shifts are slow, shifting needs an extra cycle
to complete. Even then it was necessary to request the place and
route effort level "high" to get by with a clock period of 20 nsec.

eco32-05
--------

Again there was an error to correct: I tried to scroll the display
by copying the display memory contents and discovered that reading
the memory needs an additional bus cycle (because the memory is
clocked). A simple state machine had to be written, which in turn
needed the reset signal. I changed the top-level description of
the display from a schematic to plain text.

eco32-06
--------

An easy job: I implemented the "jalr" instruction.

eco32-07
--------

This is the first step in getting the real memory to work:
I integrated the clock/reset module from my SDRAM controller
experiments. I also corrected the naming of the flash ROM
signals; all active-low signals are now consistently named
with a trailing "_n".

eco32-08
--------

We now have a working SDRAM controller!

eco32-09
--------

Second serial interface added.

eco32-10
--------

Branches based on signed comparisons added.

eco32-11
--------

Timer added.

eco32-12
--------

Multiply, divide, and remainder instructions done.

eco32-13
--------

A first attempt to introduce virtual addressing: a totally
minimalistic MMU consisting of two AND gates which suppress
the two MSBs of the virtual address if they are set. If
they are not, too bad - the virtual address is then mapped
to physical address 0.

eco32-14
--------

A couple of steps to make interrupts available:

a) The CPU gets an input vector of 16 interrupt request lines which
   are all tied to 0 in the top-level design external to the CPU.
b) The timer circuit's control register gets an interrupt enable bit,
   which gates the 'timer expired' status bit onto an additional
   output line, the timer's interrupt request line.
   This line is connected to the CPU's irq line 14.
c) Inside the CPU there must be a set of 4 special registers. They
   are implemented in a separate module. Two instructions (mvfs and
   mvts) transfer data between the standard and the special register
   sets. The data input of the special register set is connected to
   the standard register data output 2; the write enable signal for
   the special register set is controlled by the CPU's state machine.
   The data output of the special register set is connected to the
   data input 2 multiplexer of the standard register set, which has
   to be widened by one input (and by one control line also). The
   register number which selects the special register from/to which
   reading/writing should take place comes from the instruction
   register's immediate constant. The two new instructions get one
   extra state each in the CPU's state machine.
d) For interrupts and exceptions to take place there must be four
   additional values available which can be loaded into the PC:
       0xE0000004  general interrupts (V-bit of the PSW off)
       0xC0000004  general interrupts (V-bit of the PSW on)
       0xE0000008  user TLB miss (V-bit of the PSW off)
       0xC0000008  user TLB miss (V-bit of the PSW on)
   The contents of the special register 0 (the PSW) are needed at
   several places in the description of the CPU's state machine.
   They have to be set also, independently of the mvts instruction.
   Therefore an extra data path from/to the special register set
   is established, together with a separate write signal for the
   PSW. The state machine gets two new states, one to acknowledge
   interrupts and another one to implement the rfx instruction.
   Each instruction tests a specific 'interrupt trigger line'
   before returning to state 1 (instruction fetch). If it is set,
   the state machine branches to the 'interrupt' state.
   In this way we don't need a separate state before the 'instruction
   fetch' state to check for interrupts (and also avoid the
   unpleasant alternative: to merge interrupt detection into
   the fetch state - think of the already-incremented PC, for
   example). The trigger signal is set if there is any interrupt
   request present, its mask is open, and the global interrupt
   enable (in the PSW) is set. The ECO32 architecture defines
   5 bits in the PSW to be the priority of the last acknowledged
   interrupt. Therefore a priority encoder takes the vector of
   interrupt requests (possibly modified by closed mask bits)
   and determines the highest unmasked interrupt from that. The
   two additional states in the state machine also handle the
   two stacks (each three positions deep) for the 'interrupt
   enable' and 'user mode' flags within the PSW.
e) Since its construction, the ALU had two unused function encodings;
   they had been assigned to add and subtract, but were never used.
   They now deliver either the first or the second operand of the ALU
   to the output, unchanged. This simplifies three instructions (ldhi,
   jr, rfx) as well as the interrupt state in the CPU's state machine.

eco32-15
--------

We now have the 'trap' instruction. This is an important first
example of an exception.

eco32-16
--------

This version accepts the four TLB instructions as valid instructions
(but treats them as no-ops).

eco32-17
--------

A couple of steps to make exceptions work:

a) There are only 16 interrupts, so irq_priority is only [3:0] wide.
   The leading bit of the interrupt/exception priority in the PSW is
   explicitly set to 0 in state 15 (interrupt).
b) Generally, states returning to state 1 (instruction fetch) check
   the signal irq_trigger for pending interrupts and branch to state
   15 (interrupt) if it is set. This should NOT be done if the current
   state could possibly set the PSW to disable interrupts. So states
   15 (interrupt), 22 (mvts), 23 (rfx), and 24 (trap) don't do this
   check any longer.
   On the other hand, delaying the acceptance of
   a pending interrupt for a whole instruction would come as a nasty
   surprise for an unsuspecting system programmer. It would in fact
   be possible to write an instruction sequence which never accepts
   any interrupts, although interrupts are expected to be enabled for
   one instruction:
           mvts $5,PSW   ; disable interrupts
       label:
           mvts $4,PSW   ; enable interrupts
           mvts $5,PSW   ; disable interrupts
           j label
   This cannot be tolerated. Therefore an additional state is inserted,
   just to check irq_trigger, computed from the new value of the PSW.
   This certainly makes no sense for interrupt and trap, because the new
   value of the interrupt enable flag in the PSW is known to be 0. So
   the new state is only reached from states 22 (mvts) and 23 (rfx).
   First, I did some renumbering of states:
       Renamed state 25 to 26 (TLB instruction).
       Renamed state 24 to 25 (trap).
   Then the additional state is called state 24.
c) Because the trap instruction is merely one of several possible causes
   for an exception, its execution state (25, see step b) above) can be
   used to implement exceptions. The exception number must be communicated
   to this state. We therefore have a 4-bit register named 'exc_priority'
   which must be set by any state transition to state 25. Its contents
   are appended to a leading 1 and then represent the exception priority
   which is found in the PSW.
d) The following exceptions are implemented:
       trap instruction exception
       illegal instruction exception
       divide instruction exception
e) The 'bus timeout exception' is implemented with the help of a counter
   which is activated if the bus is enabled and its wait line active.
   When the counter expires, the exception execution state is entered.
   There is a catch: if the bus timeout occurs during instruction fetch,
   the PC still has its old value, i.e., it must not get decremented while
   handling the exception.
   This could be handled best by just another
   state (renaming state 26 to 27, and using the new state 26 for
   exception handling without decrementing the PC).
f) The 'privileged instruction exception' isn't difficult to implement
   but can only be tested if a TLB is present (because the test program
   must enter user mode in order to trigger the exception - and in user
   mode, instructions cannot be executed at addresses which have their
   MSB set without triggering a 'privileged address exception').

eco32-18
--------

This intermediate version got a new bus controller which no longer
mirrors RAM and ROM in their respective upper address spaces but signals
a bus timeout instead.

eco32-19
--------

This version implements the MMU with a TLB (first of two parts).

a) Add the TLB module. It consists of an "input section" (32 comparators
   working in parallel, and a priority encoder which computes the binary
   representation of the number of one of the matching comparators), and
   an "output section" which merely delivers the previously stored frame
   number and permission bits of the frame. The output section's memory
   is addressed by the output of the priority encoder. The two sections
   together implement a fully associative address translation cache.
b) Change the MMU from a purely combinational circuit to one which needs
   a single clock cycle to compute its output. This is necessary because
   the RAM which stores frame numbers in the TLB output section also needs
   one cycle to read its contents.
c) In the controller of the CPU add one state before each bus cycle state
   (i.e., three states: fetch, load, and store).
   These additional states
   perform the address translation from a virtual to a physical address.
   I added three new states (28..30) which now implement the bus cycles
   and reassigned the old state numbers (1, 12, 14) to the states which
   do address translations.
d) The MMU must implement several functions:
       no operation, hold output
       map virtual to physical address
       execute tbs
       execute tbwr
       execute tbri
       execute tbwi
   The controller instructs the MMU which function is to be executed.
e) The tbwr instruction needs a "random" index. This can be generated
   by a counter which counts down at every clock pulse, instruction
   fetch, or address mapping request. There is a catch: if the counter
   counted on every clock pulse and each instruction needed a
   multiple of 2 clock pulses to complete, then only half the entries
   of the TLB would be used. Thus counting instructions is safer, and
   furthermore counting address mappings is cheaper than that (because
   address mapping is already one of the functions of the MMU and
   therefore easily detectable).
f) The values of the special registers 1 (TLB Index), 2 (TLB EntryHi),
   and 3 (TLB EntryLo) are needed within the MMU. The MMU also must
   write new values to these registers under certain circumstances.
   Three dedicated signals for each of these special registers (old
   value, write enable, new value) enable the MMU to do so.
g) In principle, the tbri instruction needs two clock cycles to do
   its work: one cycle to read the TLB and another one to write the
   data to special register 3. This can be reduced to a single clock
   cycle (write to special register 3) if the RAM's contents are read
   out by default within every clock cycle.

eco32-20
--------

This version implements the MMU with a TLB (second of two parts).

a) Detect privileged and illegal address exceptions within the state
   machine. In order to do so, virtual address bits 31, 1, and 0 must
   be available there. The exceptions are detected in the address
   translation states (1, 12, 14).
   Control is transferred to state
   25 (or 26 in case of violation during instruction fetch) with
   exc_priority set accordingly. Although not yet needed for the bus,
   the bus size lines must be set to the intended transfer width
   already in the translation states in order to detect illegal
   addresses there (before the bus is actually accessed). Last but
   not least the MMU must not try to map an address if that address
   triggered one of the two exceptions.
b) The TLB supplies three control signals (tlb_missed, tlb_invalid,
   and tlb_wrtprot) which are needed to detect the three exceptions
   "TLB miss", "TLB entry invalid", and "page frame write protected".
   The first of these, tlb_missed, is generated in the "input section"
   of the TLB and has to be delayed for one clock cycle so that it
   appears at the TLB output at the same time the other two signals do.
   The three signals are routed to the CPU's state machine. Because
   they are valid only after the address translation took place (the
   valid and write bits are stored together with the frame number),
   the error conditions can only be detected in the bus cycle states.
   The actual bus cycle however must suppress its bus enable signal,
   if any exception has been detected.
   Attention: the three control signals must be de-asserted if the
   address in question is directly mapped (i.e., has its two MSBs set).
c) The tlb_missed signal in fact has to be split into two signals:
   tlb_kmissed (MSB of address is 1) and tlb_umissed (MSB is 0). This
   must be done in order to route "user TLB misses" to another start
   address. Furthermore, the V bit in the PSW has to be considered and
   the ISR start address modified accordingly.
d) The three write enable signals for the three special TLB registers
   are best produced within the main CPU state machine, because they
   are dependent on the opcode if one of the TLB instructions is
   executed.
   They must also be asserted according to any exception
   which has been detected.

eco32-21
--------

I changed the display description from a schematic to plain Verilog.

eco32-22
--------

The display has got character attributes: one attribute byte per
character stored in the display memory. The bits in the attribute
byte are loosely imitating those from the good old CGA adapter in
text mode.

    Bit 7: blinking foreground
    Bit 6: background red
    Bit 5: background green
    Bit 4: background blue
    Bit 3: intensified foreground
    Bit 2: foreground red
    Bit 1: foreground green
    Bit 0: foreground blue

eco32-23
--------

Now the keyboard can interrupt the CPU.

eco32-24
--------

Project re-organized. All source files are now located under a single
directory "src". Now it is easier to clean up a project after editing
or testing: simply remove all files and directories except "src" and
the project manager's control file "eco32.npl".

eco32-25
--------

The reset circuit had the following problem: although an externally
applied reset signal (produced by pressing the "reset" pushbutton)
was internally recognized for initializing the CPU, it did not work
the other way around, which is important when re-loading the FPGA.
In this case, the CPU was reset, but the external devices, especially
the disk drive, did not get a reset signal. So the drive could get
out of sync with its controller. The reset circuit now actively drives
the external bidirectional reset line when performing a reset, as well
as observing this line when not actively driving it.

eco32-26
--------

This is the first version with a real IDE disk attached! Thanks to
Martin Geisse, who did a very nice job.

eco32-27
--------

The two serial interfaces are now able to generate interrupt requests.
As far as I can see, the implementation is now functionally complete.

eco32-28
--------

The IDE disk interface had a small problem with reading/writing a block
of 8 sectors in a single operation. Fixed.

eco32-29
--------

Same as eco32-28, but with an ISE Version 11 project file.
Because it is now possible to develop exclusively under Linux (including
download to the FPGA board), all source files were converted to
newline-only line endings.
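
A note on eco32-19, step e): the catch with the "random" index for tbwr
can be reproduced with a small software model. The Python sketch below is
not part of the design and does not mirror the Verilog sources. It assumes
a down-counter that wraps around over all 32 TLB indices, and the numbers
of cycles and mappings per instruction are made up for illustration (the
function name sampled_indices is hypothetical as well).

```python
def sampled_indices(steps, entries=32, samples=1000):
    """Return the set of counter values observed at the sample points.

    `steps` lists how far the down-counter moves between two consecutive
    sample points (made-up numbers); the list is cycled through repeatedly.
    A sample point stands for "a tbwr executed here would get this index".
    """
    counter = entries - 1
    seen = set()
    for n in range(samples):
        seen.add(counter)
        counter = (counter - steps[n % len(steps)]) % entries
    return seen


# Counting clock pulses, every instruction taking an even number of cycles.
per_clock = sampled_indices([2, 4, 6])
# Counting instructions: the counter moves by one between instructions.
per_instruction = sampled_indices([1])
# Counting address mappings: 1 for a plain fetch, 2 for a load or store.
per_mapping = sampled_indices([1, 2])

print(len(per_clock), len(per_instruction), len(per_mapping))
```

With these made-up timings the model sees only the 16 odd indices when
counting clock pulses, and all 32 indices in the other two cases. Other
instruction mixes may behave differently; the sketch only illustrates
why counting clock pulses is the risky choice.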
