\chapter{Design Notes}
\label{notes}

\section{Project Goals}
\label{goals}

The first iteration of the project will be deemed finished when it can do
the following:

\begin{enumerate}
\item Run a minimal set of MIPS-I opcodes.\\
Excluding unaligned load/store (formerly patented).\\
Excluding all CPA instructions.\\
Excluding all CP0 instructions related to TLB.\\
Cache instructions will not be implemented as defined.
\item Catch all undefined opcodes (and trigger an exception).
\item Operate in kernel/user mode as per the architecture definition.
\item Handle exceptions in a manner compatible with the MIPS-I standard.
\item Include a code cache and a data cache, even if not standard.\\
No MMU and no TLB, and no cache-related instructions.
\item Implement as much of CP0 as necessary for the above goals.
\item Interface to external SRAM (or FLASH) on 8- and 16-bit data buses.
\item Be no bigger than Plasma in a Spartan-3 or Cyclone-2 device, and
no slower -- Plasma is used as a reference in many ways.\\
Speed is measured in raw clock frequency for the time being
(i.e. stalls, interlocks, etc. are not considered yet).
\item Interlock behavior of MUL/DIV and L* compatible with the toolchain.\\
That is, interlock loads instead of relying on a delay slot.
\end{enumerate}

Unaligned loads/stores are excluded not because of patent concerns (the
patents have already expired) but because they're not essential for a first
version of the core. The same goes for all other exclusions.

As of rev. 154, all the goals of the first iteration have been accomplished,
though not heavily tested; many bugs probably remain.\\

For a second iteration I plan on the following:
\begin{enumerate}
\item Proper interlocking of load cycles (with no wasted cycles).
\item External interrupt support.
\item Trap handlers (instruction emulation) for unaligned load and store
instructions.
\item Trap handlers (instruction emulation) for the most common MIPS32
instructions.
\item Some much-needed optimization of the caches.
\end{enumerate}
None of these has been done yet.\\

Note that 32-bit memory interfaces will not be implemented any time soon,
mainly because I don't have any actual hardware with which to test them.

\section{Development status}
\label{status}

The CPU is already able to execute almost any MIPS-I code (excluding some
unimplemented instructions such as cache control).\\
It can pass a basic opcode test and can execute some basic applications
compiled with standard gcc tools (specifically, it can run an 'Adventure'
demo and a tiny 'hello world' program, see section 6).\\

Hardware interrupt support is entirely missing.

The most important limitations are the very basic memory interface, with
no support for SDRAM, and the absence of MIPS32 trap handlers -- which
means that the ubiquitous MIPS32 toolchains can't be easily used with
this core.

The memory controller can already access external static memory (SRAM or
FLASH) on 8-bit and/or 16-bit buses. It does not yet support SDRAM, nor
static RAM in other bus widths.
My main development target is a DE-1 board from Terasic (Cyclone-2) and I
have focused on the kind of memory it has.

Wait states can be configured at synthesis time; see
section~\ref{memory_map_definition}.
Code sample 'memtest' takes advantage of this to do a basic test of the
external SRAM, and code sample 'Adventure' uses both Flash and SRAM.
All the code samples have been tested with the cache enabled and disabled,
and they ship with the cache enabled (i.e. with C startup code that
initializes and enables the cache).

The code samples can be found in the /src directory (see
section~\ref{samples}).

This is a summary of the state of the CPU at this time:
\begin{itemize}
\item MIPS-I things not implemented
\begin{enumerate}
\item External hardware interrupts.
\end{enumerate}

\item Things implemented but not fully tested
\begin{enumerate}
\item Rte instruction.
\item Kernel/user modes.
\end{enumerate}

\item Things with provisional implementation
\begin{enumerate}
\item Load interlocks: the pipeline is stalled for every load instruction,
even if the target register is not used in the following
instruction, so every load takes two cycles.\\
The interlock logic should check register indices and stall only if
there is a data hazard.\\
Note that all that's needed is a better identification of stall
conditions; the logic to enable a load instruction that does not
stall to overlap the next instruction is already in place.\\
The interlock logic needs a stronger test bench anyway.
\item Documentation is too sparse and source code is barely commented.\\
\item The D-Cache handles RAW hazards in a very inefficient way.\\
Data refills in a SW+LW sequence should only be triggered when the
SW invalidates the same line the LW is loading. Instead, the current
cache always triggers the data refill for a SW+LW sequence.\\
This performance drag has to be fixed without ruining the clock rate
(that's the catch).
\end{enumerate}
\end{itemize}
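The load interlock and D-Cache items above both reduce to comparing a few bit
fields. The following Python sketch is only a software model of those two
checks, not the core's actual logic. The function names, the default cache
geometry (direct mapped, 16-byte lines, 256 lines) and the policy of treating
both the \texttt{rs} and \texttt{rt} fields of the following instruction as
sources are assumptions made for illustration. Treating both fields as sources
is conservative: it may stall when not strictly needed, but it should not miss
a hazard.

```python
# Software model of two hazard checks (illustrative only, not the core's RTL).

# MIPS-I primary opcodes of the aligned loads: LB, LH, LW, LBU, LHU.
LOAD_OPCODES = {0x20, 0x21, 0x23, 0x24, 0x25}


def opcode(instr):
    """Primary opcode, bits 31..26 of the instruction word."""
    return (instr >> 26) & 0x3F


def rs(instr):
    """rs register index, bits 25..21."""
    return (instr >> 21) & 0x1F


def rt(instr):
    """rt register index, bits 20..16."""
    return (instr >> 16) & 0x1F


def load_use_hazard(load_instr, next_instr):
    """True if next_instr may read the register that load_instr writes.

    Assumption: both rs and rt of next_instr are treated as sources,
    which is conservative for I-type and J-type instructions.
    """
    if opcode(load_instr) not in LOAD_OPCODES:
        return False
    target = rt(load_instr)      # loads write their rt register
    if target == 0:              # register $0 is never really written
        return False
    return target in (rs(next_instr), rt(next_instr))


def line_index(addr, line_bytes=16, num_lines=256):
    """Cache line selected by addr (hypothetical direct-mapped geometry)."""
    return (addr // line_bytes) % num_lines


def sw_lw_needs_refill(sw_addr, lw_addr, line_bytes=16, num_lines=256):
    """True if the SW invalidates the very line the following LW reads."""
    return (line_index(sw_addr, line_bytes, num_lines)
            == line_index(lw_addr, line_bytes, num_lines))
```

For example, \texttt{lw \$t0, 0(\$sp)} (\texttt{0x8FA80000}) followed by
\texttt{addu \$t1, \$t0, \$t2} (\texttt{0x010A4821}) is flagged as a hazard,
whereas the same load followed by \texttt{addu \$t1, \$t2, \$t3}
(\texttt{0x014B4821}) is not.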
\section{Performance}
\label{performance}
On my main test system, a Cyclone-2 (speed grade -7), the core
with caches, mul/div and all other necessary functionality, plus
a barebones UART, will take fewer than 2500 LEs + 18 BRAMs and run at
50 MHz or more (with 'balanced optimization' on Quartus-II).\\

As soon as the core is in a stable state I will include a few synthesis
performance numbers for common configurations.\\

As soon as I can build a Dhrystone benchmark I will post results (and commit
the code). The core needs a timer before I can do that.\\

My first performance target will be a real R3000 at the same clock rate.
I anticipate that performance will be much lower than that (by a factor
of 4 or more) due to the bus widths, the wait states and the inefficient
cache implementation. I'll work on that as soon as the basic stuff is done.
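
As a rough illustration of why narrow buses and wait states weigh so heavily,
the sketch below counts the bus cycles needed to fetch one 32-bit word. It
assumes, purely for illustration, that every bus access costs one cycle plus
the configured wait states and that neither the caches nor any burst access
help. The function name and the example figures are hypothetical, not
measurements of this core.

```python
def cycles_per_word(bus_bits, wait_states):
    """Bus cycles needed to transfer one 32-bit word.

    Assumption: each bus access takes 1 + wait_states clock cycles.
    """
    if bus_bits not in (8, 16, 32):
        raise ValueError("bus width must be 8, 16 or 32 bits")
    if wait_states < 0:
        raise ValueError("wait states cannot be negative")
    accesses = 32 // bus_bits          # accesses needed per 32-bit word
    return accesses * (1 + wait_states)
```

Under these assumptions an 8-bit bus with one wait state needs 8 cycles per
word and a 16-bit bus with no wait states needs 2, compared with a single cycle
on a hypothetical 32-bit bus with no wait states.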
\section{Next steps}
\label{next_steps}
\begin{itemize}
\item Implement efficient load interlock detection with no wasted cycles.
\item Do whatever it takes to use standard C library functions.
\item Alternatively, build a small C library replacement.
\item Add a couple of benchmarks, including one with FP arithmetic.
\item Modify the software simulator so it can boot uClinux.
\item Make a uClinux port suitable for an R3000 derivative, from BuildRoot.
\item Make a FreeRTOS port suitable for an R3000 derivative.
\end{itemize}

Some of the above items are done, others are in progress, and others
are pipe dreams at this point.
