
FPGA Implementations of ECO32
=============================

eco32-00
--------

This is essentially the same as the solution of assignment 9 of the
course "Hardware for Embedded Systems", i.e., an implementation of
ECO32e. The differences are:
a) The reset circuit is moved to a subdirectory of its own. The
   duration of the reset pulse is reduced to 2^24/50MHz = 0.3 sec,
   a quarter of the original duration.
b) The reset circuit is connected to the pushbutton on the carrier
   board, which has been designated for reset by the manufacturer.
c) The bus controller is moved to a subdirectory of its own.
d) The top-level description is transformed from a schematic into
   plain text. This in turn eliminates the need for top-level
   symbols of the Reset/ROM/RAM/Busctrl/CPU/DSP/KBD circuits.
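
The reset pulse duration in a) can be checked with a few lines of
Python. This is only a sketch of the arithmetic; the assumption that
the original counter was two bits wider is an inference from "a
quarter of the original duration" and is not stated above.

```python
# Reset pulse length for a counter clocked by the system clock.
CLOCK_HZ = 50_000_000   # 50 MHz, as stated in the text
COUNTER_BITS = 24       # the pulse lasts 2^24 clock cycles


def reset_pulse_seconds(counter_bits=COUNTER_BITS, clock_hz=CLOCK_HZ):
    """Return the time a counter of the given width needs to run through."""
    return (1 << counter_bits) / clock_hz


new_duration = reset_pulse_seconds()     # roughly a third of a second
# Assumption: four times the duration means a 26-bit counter before.
old_duration = reset_pulse_seconds(26)
```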


eco32-01
--------

We have a new module, "ser", which represents the circuit for a
serial interface (8 bit data, no parity, 1 stop bit, 38400 baud).
The data is buffered twice in both directions. The module is
instantiated once; the data in/out lines are connected to the
RS232 interface on the carrier board. The bus controller got
the necessary additional connections to drive the module.
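
To illustrate the line format (8 data bits, no parity, 1 stop bit),
here is a small Python sketch. It assumes that the baud rate is derived
from the same 50 MHz clock mentioned for the reset circuit, which the
text above does not say explicitly; the function names are made up for
this example. The double buffering is not modelled.

```python
CLOCK_HZ = 50_000_000   # assumption: the serial module runs at 50 MHz
BAUD = 38400


def clocks_per_bit(clock_hz=CLOCK_HZ, baud=BAUD):
    """Number of clock cycles one bit occupies on the line (rounded)."""
    return round(clock_hz / baud)


def frame_8n1(byte):
    """Return the line levels for one character: start bit (0),
    8 data bits with the least significant bit first, stop bit (1)."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("byte out of range")
    data_bits = [(byte >> i) & 1 for i in range(8)]
    return [0] + data_bits + [1]
```

Each character therefore occupies 10 bit times on the line.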


eco32-02
--------

The fake RAM module is replaced by a preliminary implementation of
real RAM. It uses the block RAM of the FPGA (instead of the SDRAM
mounted as an extra chip on the board that the final implementation
will use). It is therefore very small in size: 4 blocks of 16K bits
each yield a total size of 2 KWords (8K bytes).
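
The size calculation, spelled out as a short Python sketch (the
variable names are only illustrative):

```python
BLOCKS = 4
BITS_PER_BLOCK = 16 * 1024   # 16K bits per block RAM
WORD_BITS = 32               # ECO32 words are 32 bits wide

total_bits = BLOCKS * BITS_PER_BLOCK
total_bytes = total_bits // 8
total_words = total_bits // WORD_BITS
```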


eco32-03
--------

This revision corrects an error which should have been corrected
a long time ago: the instructions ldb and ldh never sign-extended
their loaded data. On top of that, the instructions ldbu and ldhu
never placed zeroes into the bit positions 31 to 8 and 31 to 16,
respectively. This went undetected so far, because the implementation
of the bus did this already, although it is not explicitly requested.
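
The intended behaviour of the four load instructions can be sketched
in Python as follows. This models only the extension step on a 32-bit
value; the function names are invented for this example and do not
appear in the Verilog sources.

```python
MASK32 = 0xFFFFFFFF


def sign_extend(value, bits):
    """Copy bit (bits-1) of value into all higher bits of a 32-bit word."""
    value &= (1 << bits) - 1
    sign = 1 << (bits - 1)
    return ((value ^ sign) - sign) & MASK32


def zero_extend(value, bits):
    """Clear all bits of value above bit (bits-1)."""
    return value & ((1 << bits) - 1)


def load_result(mnemonic, bus_data):
    """Value written to the destination register for each load variant."""
    if mnemonic == "ldb":
        return sign_extend(bus_data, 8)
    if mnemonic == "ldbu":
        return zero_extend(bus_data, 8)    # bits 31 to 8 become 0
    if mnemonic == "ldh":
        return sign_extend(bus_data, 16)
    if mnemonic == "ldhu":
        return zero_extend(bus_data, 16)   # bits 31 to 16 become 0
    raise ValueError("not a byte/halfword load: " + mnemonic)
```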


eco32-04
--------

This version got a shift unit. It is connected in parallel to the
ALU, feeding its output into an expanded multiplexer. Because
arithmetic right shifts are slow, shifting needs an extra cycle
to complete. Even then it was necessary to request the place and
route effort level "high" to get by with a clock period of 20 nsec.
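
A behavioural sketch of what the shift unit computes, written in
Python. The operation names and the use of only the low 5 bits of the
shift amount are assumptions made for this example; the text above
does not specify them.

```python
MASK32 = 0xFFFFFFFF


def shift(op, value, amount):
    """Model of a 32-bit shifter with three operations."""
    value &= MASK32
    amount &= 31                      # assumption: only 5 bits are used
    if op == "left":
        return (value << amount) & MASK32
    if op == "logical_right":
        return value >> amount        # zeroes enter from the left
    if op == "arithmetic_right":
        if value & 0x80000000:
            value -= 1 << 32          # reinterpret as a negative number
        return (value >> amount) & MASK32   # sign bit enters from the left
    raise ValueError("unknown shift operation: " + op)
```

The arithmetic right shift differs from the logical one only in what
is shifted in at the top: copies of the sign bit instead of zeroes.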


eco32-05
--------

Again there was an error to correct: I tried to scroll the display
by copying the display memory contents and discovered that reading
the memory needs an additional bus cycle (because the memory is
clocked). A simple state machine had to be written, which in turn
needed the reset signal. I changed the top-level description of
the display from a schematic to plain text.


eco32-06
--------

An easy job: I implemented the "jalr" instruction.


eco32-07
--------

This is the first step in getting the real memory to work:
I integrated the clock/reset module from my SDRAM controller
experiments. I also corrected the naming of the flash ROM
signals; all active-low signals are now consistently named
with a trailing "_n".


eco32-08
--------

We now have a working SDRAM controller!


eco32-09
--------

Second serial interface added.


eco32-10
--------

Branches based on signed comparisons added.


eco32-11
--------

Timer added.


eco32-12
--------

Multiply, divide, and remainder instructions done.


eco32-13
--------

A first attempt to introduce virtual addressing: a totally
minimalistic MMU consisting of two AND gates which suppress
the two MSBs of the virtual address if they are set. If
they are not, too bad - the virtual address is then mapped
to physical address 0.


eco32-14
--------

A couple of steps to make interrupts available:

a) The CPU gets an input vector of 16 interrupt request lines which
   are all tied to 0 in the top-level design external to the CPU.

b) The timer circuit's control register gets an interrupt enable bit,
   which gates the 'timer expired' status bit onto an additional
   output line, the timer's interrupt request line. This line is
   connected to the CPU's irq line 14.

c) Inside the CPU there must be a set of 4 special registers. They
   are implemented in a separate module. Two instructions (mvfs and
   mvts) transfer data between the standard and the special register
   sets. The data input of the special register set is connected to
   the standard register data output 2; the write enable signal for
   the special register set is controlled by the CPU's state machine.
   The data output of the special register set is connected to the
   data input 2 multiplexer of the standard register set, which has
   to be widened by one input (and by one control line also). The
   register number which selects the special register from/to which
   reading/writing should take place comes from the instruction
   register's immediate constant. The two new instructions get one
   extra state each in the CPU's state machine.

d) For interrupts and exceptions to take place there must be four
   additional values available which can be loaded into the PC:
   0xE0000004   general interrupts (V-bit of the PSW off)
   0xC0000004   general interrupts (V-bit of the PSW on)
   0xE0000008   user TLB miss (V-bit of the PSW off)
   0xC0000008   user TLB miss (V-bit of the PSW on)
   The contents of the special register 0 (the PSW) are needed at
   several places in the description of the CPU's state machine.
   They have to be set also, independently of the mvts instruction.
   Therefore an extra data path from/to the special register set
   is established, together with a separate write signal for the
   PSW. The state machine gets two new states, one to acknowledge
   interrupts and another one to implement the rfx instruction.
   Each instruction tests a specific 'interrupt trigger line'
   before returning to state 1 (instruction fetch). If it is set,
   the state machine branches to the 'interrupt' state. In this
   way we don't need a separate state before the 'instruction
   fetch' state to check for interrupts (and also avoid the
   unpleasant alternative: to merge interrupt detection into
   the fetch state - think of the already-incremented pc, for
   example). The trigger signal is set if there is any interrupt
   request present, its mask is open, and the global interrupt
   enable (in the PSW) is set. The ECO32 architecture defines
   5 bits in the PSW to be the priority of the last acknowledged
   interrupt. Therefore a priority encoder takes the vector of
   interrupt requests (possibly modified by closed mask bits)
   and determines the highest unmasked interrupt from that. The
   two additional states in the state machine also handle the
   two stacks (each three positions deep) for the 'interrupt
   enable' and 'user mode' flags within the PSW.

e) Since its construction, the ALU had two unused function encodings;
   they had been assigned to add and subtract, but were never used.
   They now deliver either the first or the second operand of the ALU
   to the output, unchanged. This simplifies three instructions (ldhi,
   jr, rfx) as well as the interrupt state in the CPU's state machine.
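
The interrupt logic of step d) can be summarized in a small Python
model. It is a sketch under two assumptions that the text does not
spell out: a mask bit of 1 means "open", and the priority encoder
prefers the highest-numbered request line. All function names are
invented for this example.

```python
def pending(irq_lines, mask):
    """16-bit vector of requests whose mask bit is open."""
    return irq_lines & mask & 0xFFFF


def irq_trigger(irq_lines, mask, global_enable):
    """Set if any request is present, its mask is open, and the
    global interrupt enable in the PSW is set."""
    return bool(global_enable) and pending(irq_lines, mask) != 0


def irq_priority(irq_lines, mask):
    """Priority encoder: number of the winning unmasked request,
    or None if there is none (assumption: highest number wins)."""
    p = pending(irq_lines, mask)
    return p.bit_length() - 1 if p else None


def handler_address(v_bit, user_tlb_miss):
    """One of the four additional values that can be loaded into the PC."""
    base = 0xC0000000 if v_bit else 0xE0000000
    return base + (8 if user_tlb_miss else 4)
```

With the timer on line 14, irq_trigger(1 << 14, 1 << 14, True) is
true, and irq_priority() returns 14 for the same arguments.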


eco32-15
--------

We now have the 'trap' instruction. This is an important first
example of an exception.


eco32-16
--------

This version accepts the four TLB instructions as valid instructions
(but treats them as no-ops).


eco32-17
--------

A couple of steps to make exceptions work:

a) There are only 16 interrupts, so irq_priority is only [3:0] wide.
   The leading bit of the interrupt/exception priority in the PSW is
   explicitly set to 0 in state 15 (interrupt).

b) Generally, states returning to state 1 (instruction fetch) check
   the signal irq_trigger for pending interrupts and branch to state
   15 (interrupt) if it is set. This should NOT be done if the current
   state could possibly set the PSW to disable interrupts. So states
   15 (interrupt), 22 (mvts), 23 (rfx), and 24 (trap) don't do this
   check any longer. On the other hand, delaying the acceptance of
   a pending interrupt for a whole instruction would come as a hard
   surprise for an unsuspecting system programmer. It would in fact
   be possible to write an instruction sequence which never accepts
   any interrupts, although interrupts are expected to be enabled for
   one instruction:
       mvts $5,PSW    ; disable interrupts
   label:
       mvts $4,PSW    ; enable interrupts
       mvts $5,PSW    ; disable interrupts
       j label
   This cannot be tolerated. Therefore an additional state is inserted,
   just to check irq_trigger, computed from the new value of the PSW.
   This certainly makes no sense for interrupt and trap, because the new
   value of the interrupt enable flag in the PSW is known to be 0. So
   the new state is only reached from states 22 (mvts) and 23 (rfx).
   First, I did some renumbering of states:
   Renamed state 25 to 26 (TLB instruction).
   Renamed state 24 to 25 (trap).
   Then the additional state is called state 24.

c) Because the trap instruction is merely one of several possible causes
   for an exception, its execution state (25, see step b) above) can be
   used to implement exceptions. The exception number must be communicated
   to this state. We therefore have a 4-bit register named 'exc_priority'
   which must be set by any state transition to state 25. Its contents
   are appended to a leading 1 and then represent the exception priority
   which is found in the PSW.

d) The following exceptions are implemented:
     trap instruction exception
     illegal instruction exception
     divide instruction exception

e) The 'bus timeout exception' is implemented with the help of a counter
   which is activated if the bus is enabled and its wait line active.
   When the counter expires, the exception execution state is entered.
   There is a catch: if the bus timeout occurs during instruction fetch,
   the PC still has its old value, i.e., it must not be decremented while
   handling the exception. This is best handled by yet another
   state (renaming state 26 to 27, and using the new state 26 for
   exception handling without decrementing the PC).

f) The 'privileged instruction exception' isn't difficult to implement
   but can only be tested if a TLB is present (because the test program
   must enter user mode in order to trigger the exception - and in user
   mode, instructions cannot be executed at addresses which have their
   MSB set without triggering a 'privileged address exception').
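
Steps a), c), and e) can be illustrated with a short Python sketch.
The 5-bit priority field is formed exactly as described (leading 0 for
interrupts, leading 1 for exceptions). The timeout limit and the rule
that the counter restarts whenever the bus is idle are assumptions of
this example; the text does not give them.

```python
def psw_priority(is_exception, number):
    """5-bit priority stored in the PSW: a leading 0 plus irq_priority
    for interrupts, a leading 1 plus exc_priority for exceptions."""
    if not 0 <= number <= 15:
        raise ValueError("priority number must fit in 4 bits")
    return (0x10 if is_exception else 0x00) | number


class BusTimeout:
    """Counter that runs while the bus is enabled and its wait line is
    active. 'limit' is a hypothetical parameter."""

    def __init__(self, limit):
        self.limit = limit
        self.count = 0

    def clock(self, bus_enable, bus_wait):
        """Advance by one clock cycle; return True when expired."""
        if bus_enable and bus_wait:
            self.count += 1
        else:
            self.count = 0        # assumption: restart when idle
        return self.count >= self.limit
```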


eco32-18
--------

This intermediate version got a new bus controller which no longer
mirrors RAM and ROM in their respective upper address spaces but signals
a bus timeout instead.


eco32-19
--------

This version implements the MMU with a TLB (first of two parts).

a) Add the TLB module. It consists of an "input section" (32 comparators
   working in parallel, and a priority encoder which computes the binary
   representation of the number of one of the matching comparators), and
   an "output section" which merely delivers the previously stored frame
   number and permission bits of the frame. The output section's memory
   is addressed by the output of the priority encoder. The two sections
   together implement a fully associative address translation cache.

b) Change the MMU from a purely combinational circuit to one which needs
   a single clock cycle to compute its output. This is necessary because
   the RAM which stores frame numbers in the TLB output section also needs
   one cycle to read its contents.

c) In the controller of the CPU add one state before each bus cycle state
   (i.e., three states: fetch, load, and store). These additional states
   perform the address translation from a virtual to a physical address.
   I added three new states (28..30) which now implement the bus cycles
   and reassigned the old state numbers (1, 12, 14) to the states which
   do address translations.

d) The MMU must implement several functions:
     no operation, hold output
     map virtual to physical address
     execute tbs
     execute tbwr
     execute tbri
     execute tbwi
   The controller instructs the MMU which function is to be executed.

e) The tbwr instruction needs a "random" index. This can be generated
   by a counter which counts down at every clock pulse, instruction
   fetch, or address mapping request. There is a catch: if the counter
   counted on every clock pulse and each instruction needed a
   multiple of 2 clock pulses to complete, then only half the entries
   of the TLB would be used. Thus counting instructions is safer, and
   furthermore counting address mappings is cheaper than that (because
   address mapping is already one of the functions of the MMU and
   therefore easily detectable).

f) The values of the special registers 1 (TLB Index), 2 (TLB EntryHi),
   and 3 (TLB EntryLo) are needed within the MMU. The MMU also must
   write new values to these registers under certain circumstances.
   Three dedicated signals for each of these special registers (old
   value, write enable, new value) enable the MMU to do so.

g) In principle, the tbri instruction needs two clock cycles to do
   its work: one cycle to read the TLB and another one to write the
   data to special register 3. This can be reduced to a single clock
   cycle (write to special register 3) if the RAM's contents are read
   out by default within every clock cycle.
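
Two of the points above lend themselves to a small Python model: the
fully associative lookup of step a) and the counter catch of step e).
This is a behavioural sketch only. Which of several matching
comparators wins is not specified above, so the lowest index is
assumed; counting "per instruction" stands in for counting address
mappings, simplified to one mapping per instruction. The function
names are invented for this example.

```python
TLB_ENTRIES = 32


def tlb_lookup(stored_pages, stored_frames, virtual_page):
    """Input section: compare against all entries at once.
    Output section: deliver the frame stored at the winning index.
    Returns None on a TLB miss."""
    matches = [i for i, page in enumerate(stored_pages)
               if page == virtual_page]
    if not matches:
        return None
    index = matches[0]            # assumption: lowest index wins
    return stored_frames[index]


def indices_seen_by_tbwr(cycles_per_instruction, instructions,
                         count_every_clock):
    """Collect the 'random' index values visible at instruction
    boundaries for the two counting schemes."""
    counter = TLB_ENTRIES - 1
    seen = set()
    for _ in range(instructions):
        steps = cycles_per_instruction if count_every_clock else 1
        counter = (counter - steps) % TLB_ENTRIES
        seen.add(counter)
    return seen
```

If every instruction takes 2 clock pulses and the counter counts
clock pulses, only 16 of the 32 indices are ever seen; counting once
per instruction reaches all 32.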


eco32-20
--------

This version implements the MMU with a TLB (second of two parts).

a) Detect privileged and illegal address exceptions within the state
   machine. In order to do so, virtual address bits 31, 1, and 0 must
   be available there. The exceptions are detected in the address
   translation states (1, 12, 14). Control is transferred to state
   25 (or 26 in case of violation during instruction fetch) with
   exc_priority set accordingly. Although not yet needed for the bus,
   the bus size lines must be set to the intended transfer width
   already in the translation states in order to detect illegal
   addresses there (before the bus is actually accessed). Last but
   not least, the MMU must not try to map an address that triggered
   one of the two exceptions.

b) The TLB supplies three control signals (tlb_missed, tlb_invalid,
   and tlb_wrtprot) which are needed to detect the three exceptions
   "TLB miss", "TLB entry invalid", and "page frame write protected".
   The first of these, tlb_missed, is generated in the "input section"
   of the TLB and has to be delayed for one clock cycle so that it
   appears at the TLB output at the same time the other two signals do.
   The three signals are routed to the CPU's state machine. Because
   they are valid only after the address translation took place (the
   valid and write bits are stored together with the frame number),
   the error conditions can only be detected in the bus cycle states.
   The actual bus cycle however must suppress its bus enable signal,
   if any exception has been detected.
   Attention: the three control signals must be de-asserted if the
   address in question is directly mapped (i.e., has its two MSBs set).

c) The tlb_missed signal in fact has to be split into two signals:
   tlb_kmissed (MSB of address is 1) and tlb_umissed (MSB is 0). This
   must be done in order to route "user TLB misses" to another start
   address. Furthermore, the V bit in the PSW has to be considered and
   the ISR start address modified accordingly.

d) The three write enable signals for the three special TLB registers
   are best produced within the main CPU state machine, because they
   are dependent on the opcode if one of the TLB instructions is
   executed. They must also be asserted according to any exception
   which has been detected.
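
The checks of steps a) to c) can be sketched in Python. The order in
which the two address exceptions are tested, the returned strings, and
the function names are assumptions of this example; the signal names
tlb_kmissed, tlb_umissed, tlb_invalid, and tlb_wrtprot are taken from
the text.

```python
def address_exception(vaddr, width_bytes, user_mode):
    """Check done in the translation states, using virtual address
    bits 31, 1, and 0 and the intended transfer width (1, 2, or 4)."""
    vaddr &= 0xFFFFFFFF
    if user_mode and (vaddr >> 31) & 1:
        return "privileged address"
    if vaddr & (width_bytes - 1):     # misaligned for this width
        return "illegal address"
    return None


def tlb_signals(vaddr, missed, invalid, wrtprot):
    """Signals seen by the state machine in the bus cycle states."""
    vaddr &= 0xFFFFFFFF
    names = ("tlb_kmissed", "tlb_umissed", "tlb_invalid", "tlb_wrtprot")
    if (vaddr >> 30) == 0b11:         # directly mapped: de-assert all
        return dict.fromkeys(names, False)
    msb = (vaddr >> 31) & 1
    return {
        "tlb_kmissed": bool(missed) and msb == 1,
        "tlb_umissed": bool(missed) and msb == 0,
        "tlb_invalid": bool(invalid),
        "tlb_wrtprot": bool(wrtprot),
    }
```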


eco32-21
--------

I changed the display description from a schematic to plain Verilog.


eco32-22
--------

The display has got character attributes: one attribute byte per
character stored in the display memory. The bits in the attribute
byte loosely imitate those of the good old CGA adapter in
text mode.
  Bit 7:  blinking foreground
  Bit 6:  background red
  Bit 5:  background green
  Bit 4:  background blue
  Bit 3:  intensified foreground
  Bit 2:  foreground red
  Bit 1:  foreground green
  Bit 0:  foreground blue
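
Decoding an attribute byte according to the table above could look
like this in Python (a sketch; the function and key names are invented
for this example):

```python
def decode_attribute(attr):
    """Split an attribute byte into its fields.
    Colors are returned as (red, green, blue) tuples of 0/1."""
    attr &= 0xFF
    return {
        "blink": bool(attr & 0x80),                      # bit 7
        "background": ((attr >> 6) & 1,                  # bit 6: red
                       (attr >> 5) & 1,                  # bit 5: green
                       (attr >> 4) & 1),                 # bit 4: blue
        "intensify": bool(attr & 0x08),                  # bit 3
        "foreground": ((attr >> 2) & 1,                  # bit 2: red
                       (attr >> 1) & 1,                  # bit 1: green
                       attr & 1),                        # bit 0: blue
    }
```

For example, the attribute 0x07 selects a non-blinking, non-intensified
white foreground on a black background.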


eco32-23
--------

Now the keyboard can interrupt the CPU.


eco32-24
--------

Project re-organized. All source files are now located under a single
directory "src". Now it is easier to clean up a project after editing
or testing: simply remove all files and directories except "src" and
the project manager's control file "eco32.npl".


eco32-25
--------

The reset circuit had the following problem: although an externally
applied reset signal (produced by pressing the "reset" pushbutton)
was internally recognized for initializing the CPU, it did not work
the other way around, which is important when re-loading the FPGA.
In this case, the CPU was reset, but the external devices, especially
the disk drive, did not get a reset signal. So the drive could get
out of sync with its controller. The reset circuit now actively drives
the external bidirectional reset line when performing a reset, as well
as observing this line when not actively driving it.


eco32-26
--------

This is the first version with a real IDE disk attached! Thanks to
Martin Geisse, who did a very nice job.


eco32-27
--------

The two serial interfaces are now able to generate interrupt requests.
As far as I can see, the implementation is now functionally complete.


eco32-28
--------

The IDE disk interface had a small problem with reading/writing a block
of 8 sectors in a single operation. Fixed.


eco32-29
--------

Same as eco32-28, but with an ISE Version 11 project file. Because
it is now possible to develop exclusively under Linux (including
download to the FPGA board), all source files were converted to
newline-only line endings.
