\chapter{Design Notes}
\label{notes}

\section{Project Goals}
\label{goals}

    The first iteration of the project will be deemed finished when it can do
    the following:

\begin{enumerate}
    \item Run a minimal set of MIPS-I opcodes.\\
        Excluding unaligned load/store (formerly patented).\\
        Excluding all CPA instructions.\\
        Excluding all CP0 instructions related to the TLB.\\
        Cache instructions will not be implemented as defined.
    \item Catch all undefined opcodes (and trigger an exception).
    \item Operate in kernel/user mode as per the architecture definition.
    \item Handle exceptions in a manner compatible with the MIPS-I standard.
    \item Code cache and data cache, even if not standard.\\
        No MMU and no TLB, and no cache-related instructions.
    \item Implement as much of CP0 as necessary for the above goals.
    \item Interface to external SRAM (or FLASH) on 8- and 16-bit data buses.
    \item Be no bigger than Plasma in a Spartan-3 or Cyclone-2 device, and
        no slower (Plasma is used as a reference in many ways).\\
        Speed is measured in raw clock frequency for the time being
        (i.e. stalls, interlocks, etc. are not considered yet).
    \item Interlock behavior of MUL/DIV and L* compatible with the toolchain.\\
        That is, interlock loads instead of relying on a delay slot.
\end{enumerate}

    Unaligned load/stores are excluded not because of patent concerns (the
    patents have already expired) but because they're not essential for a first
    version of the core. The same goes for all other exclusions.

    As of rev. 154 all the first-iteration goals have been accomplished (but
    not very heavily tested; many bugs probably remain).\\

    For a second iteration I plan on the following:
\begin{enumerate}
    \item Proper interlocking of load cycles (with no wasted cycles).
    \item External interrupt support.
    \item Trap handlers (instruction emulation) for unaligned load and store
        instructions.
    \item Trap handlers (instruction emulation) for the most common MIPS32
        instructions.
    \item Some much-needed optimization of the caches.
\end{enumerate}
    None of these things have been done.\\

    Note that 32-bit memory interfaces are not to be implemented any time soon,
    mainly because I don't have any actual hardware with which to test them.

\section{Development status}
\label{status}

    The CPU is already able to execute almost any MIPS-I code (excluding some
    unimplemented instructions such as cache control).\\
    It can pass a basic opcode test and can execute some basic applications
    compiled with standard gcc tools (specifically, it can run an 'Adventure'
    demo and a tiny 'hello world' program, see section 6).\\

    Hardware interrupt support is entirely missing.

    The most important limitations are the very basic memory interface, with
    no support for SDRAM, and the absence of MIPS32 trap handlers, which
    means that the ubiquitous MIPS32 toolchains can't be easily used with
    this core.
 
    The memory controller can already access external static memory (SRAM or
    FLASH) on 8-bit and/or 16-bit buses. It still does not support SDRAM, nor
    static RAM in other bus widths.
    My main development target is a DE-1 board from Terasic (Cyclone-2) and I
    have focused on the kind of memory it has.

    Wait states can be configured at synthesis time, see
    section~\ref{memory_map_definition}.
    Code sample 'memtest' takes advantage of this to do a basic test of the
    external SRAM, and code sample 'Adventure' uses both Flash and SRAM.
    All the code samples have been tested with the cache enabled and disabled,
    and they ship with the cache enabled (i.e. with C startup code that
    initializes and enables the cache).

    The code samples can be found in the /src directory (see
    section~\ref{samples}).

    This is a summary of the state of the CPU at this time:
\begin{itemize}
    \item MIPS-I things not implemented
    \begin{enumerate}
        \item External hardware interrupts.
    \end{enumerate}

    \item Things implemented but not fully tested
    \begin{enumerate}
        \item Rte instruction.
        \item Kernel/user modes.
    \end{enumerate}

    \item Things with provisional implementation
    \begin{enumerate}
        \item Load interlocks: the pipeline is stalled for every load instruction,
            even if the target register is not used in the following
            instruction, so every load takes two cycles.\\
            The interlock logic should check register indices and stall only if
            there is a data hazard.\\
            Note that all that's needed is a better identification of stall
            conditions; the logic that lets a non-stalling load overlap the
            next instruction is already in place.\\
            The interlock logic needs a stronger test bench anyway.
        \item Documentation is too sparse and source code is barely commented.
        \item The D-Cache handles RAW hazards in a very inefficient way.\\
            Data refills in a SW+LW sequence should only be triggered when the
            SW invalidates the same line the LW is loading. Instead, the current
            cache always triggers the data refill (for a SW+LW sequence, that
            is).\\
            This performance drag has to be fixed without ruining the clock rate
            (that's the catch).
    \end{enumerate}
\end{itemize}
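
    The register check asked for in the load interlock item above can be
    written as a single stall condition. This is only a sketch in assumed
    notation, not the logic used in the core: $L$ is a load instruction with
    target register index $t_L$, $N$ is the instruction that follows it,
    $rs_N$ and $rt_N$ are the source register fields of $N$, and
    $\mathit{reads\_rs}_N$, $\mathit{reads\_rt}_N$ say whether $N$ actually
    uses each field as a source.
    \[
        \mathit{stall} = (t_L \neq 0) \wedge
            \bigl[ (\mathit{reads\_rs}_N \wedge rs_N = t_L) \vee
                   (\mathit{reads\_rt}_N \wedge rt_N = t_L) \bigr]
    \]
    The $t_L \neq 0$ term is there because register 0 always reads as zero, so
    a load into it can never create a hazard.

    As a rough estimate of what is at stake, let $f$ be the fraction of
    executed instructions that are loads and $h$ the fraction of those loads
    whose result is used by the very next instruction (both are assumed,
    illustrative quantities). Stalling on every load costs $f$ extra cycles
    per instruction, whereas stalling only on a hazard costs $f\,h$. With
    $f = 0.2$ and $h = 0.5$, for example, that is 0.2 versus 0.1 stall cycles
    per instruction, ignoring cache misses and wait states.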

\section{Performance}
\label{performance}
    On my main test system, a Cyclone-2 of speed grade -7, the core
    with caches and with mul/div and all other necessary functionality, plus
    a barebones UART, will take fewer than 2500 LEs + 18 BRAMs, running at
    50 MHz or more (with 'balanced optimization' on Quartus-II).\\

    As soon as the core is in a stable state I will include a few synthesis
    performance numbers for common configurations.\\

    As soon as I can build a Dhrystone benchmark I will post results (and commit
    the code). The core needs a timer before I can do that.\\

    My first performance target will be a real R3000 at the same clock rate.
    I can anticipate that performance will be MUCH lower than that (by a factor
    of 4 or more) due to the bus widths and the wait states, AND the inefficient
    cache implementation. I'll work on that as soon as the basic stuff is done.
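
    A rough estimate shows how bus width and wait states alone can add up to
    a factor of that size. The notation is assumed for illustration only and
    the model ignores controller overhead: reading a 32-bit word through a
    $B$-bit external bus takes $32/B$ accesses, and each access is assumed to
    take $1 + w$ clock cycles when configured with $w$ wait states, so an
    uncached word read costs
    \[
        T_{\mathrm{word}} = \frac{32}{B}\,(1 + w)
    \]
    cycles. For an 8-bit bus with $w = 3$ (an arbitrary example value) that is
    $4 \times 4 = 16$ cycles per word, and for a 16-bit bus with $w = 1$ it is
    $2 \times 2 = 4$ cycles per word.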
 
\section{Next steps}
\label{next_steps}
    \begin{itemize}
    \item Implement efficient load interlock detection with no wasted cycles.
    \item Do whatever it takes to use standard C library functions.
    \item Alternatively, build a small C library replacement.
    \item Add a couple of benchmarks, including one with FP arithmetic.
    \item Modify the software simulator so it can boot uClinux.
    \item Make a uClinux port suitable for an R3000 derivative, from Buildroot.
    \item Make a FreeRTOS port suitable for an R3000 derivative.
    \end{itemize}

    Some of the above items are done, others are in progress and others
    are pipe dreams at this point.