\documentclass[10pt, twoside, a4paper]{article}
\usepackage{graphicx}
\usepackage{listings}

\title{marca - McAdam's RISC Computer Architecture\\Implementation Details}
\author{Wolfgang Puffitsch}

\begin{document}

  \maketitle

  \section{General}

  \begin{itemize}
  \item 16 16-bit registers
  \item 16KB instruction ROM (8192 instructions)
  \item 8KB data RAM
  \item 256 byte data ROM
  \item 75 instructions
  \item 16 interrupt vectors
  \end{itemize}
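
  As a quick consistency check on the figures above (assuming that KB
  denotes 1024 bytes), the size of the instruction ROM and the number
  of instructions it holds imply a fixed instruction width of 16 bits:
  \[
    \frac{16 \cdot 1024\ \mathrm{bytes}}{8192\ \mathrm{instructions}}
      = 2\ \mathrm{bytes\ per\ instruction}
      = 16\ \mathrm{bits\ per\ instruction}.
  \]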

  \section{Internals}

  The processor features a 4-stage pipeline:
  \begin{itemize}
  \item instruction fetch
  \item instruction decode
  \item execution/memory access
  \item write back
  \end{itemize}
  This scheme is similar to the one used in the MIPS architecture,
  except that the execution and memory access stages are merged into
  one. Because our architecture does not support indexed addressing,
  a memory access does not need the ALU's result and can proceed in
  parallel with the ALU, which has the advantage of reducing the
  number of possible hazards.

  Figure~\ref{fig:marca} shows a rough scheme of the internals of the
  processor.
  \begin{figure}[ht!]
    \centering
    \includegraphics[width=.95\textwidth]{marca}
    \caption{Internal scheme}
    \label{fig:marca}
  \end{figure}

  \subsection{Branches}
  Branches are not predicted; if executed, they stall the pipeline,
  leading to a total execution time of 4 cycles. The fetch stage is
  not stalled; the decode stage, however, is stalled for two cycles
  to compensate for that.

  \subsection{Instruction fetch}
  This stage is not spectacular: it simply reads an instruction from
  the instruction ROM and extracts the bits for the source and
  destination registers.

  \subsection{Instruction decode}
  This stage translates the bit patterns of the opcodes into the
  signals used internally for the operations. It also holds the
  register file and handles access to it. Immediate values are
  constructed here as well.

  \subsection{Execution / Memory access}
  The execution stage is the heart and soul of the processor: it holds
  the ALU, the memory/IO unit and a unit for interrupt handling.

  \subsubsection{ALU}
  The ALU performs all arithmetic and logic computations and takes
  care of the processor's flags (which are organized as shown in
  table~\ref{tab:flags}).

  \begin{table}[ht!]
    \centering
    \begin{tabular}{|p{.75em}|p{.75em}|p{.75em}|p{.75em}
                    |p{.75em}|p{.75em}|p{.75em}|p{.75em}
                    |p{.75em}|p{.75em}|p{.75em}|p{.75em}
                    |p{.75em}|p{.75em}|p{.75em}|p{.75em}|}
      \multicolumn{16}{c}{Bit 15 \hfill Bit 0} \\
      \hline
      & & & & & & & & & & P & I & N & V & C & Z \\
      \hline
    \end{tabular}
    \caption{The flag register}
    \label{tab:flags}
  \end{table}
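
  For software that tests individual flags, the bit positions can be
  read off the table above. The following overview is only a sketch
  derived from that layout, assuming that the rightmost column is
  bit 0, so that the flag in bit $k$ has the mask $2^k$:
  \begin{center}
    \begin{tabular}{|l|c|l|}
      \hline
      Flag & Bit & Mask \\
      \hline
      Z & 0 & \texttt{0x0001} \\
      C & 1 & \texttt{0x0002} \\
      V & 2 & \texttt{0x0004} \\
      N & 3 & \texttt{0x0008} \\
      I & 4 & \texttt{0x0010} \\
      P & 5 & \texttt{0x0020} \\
      \hline
    \end{tabular}
  \end{center}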

  Operations which need more than one cycle to execute (multiplication,
  division and modulo) block the rest of the processor until they are
  finished.

  \subsubsection{Memory/IO unit}
  The memory/IO unit takes care of the ordinary data memory, the data
  ROM (which is mapped to the addresses right above the RAM) and the
  communication with peripheral modules. Peripheral modules are located
  within the memory/IO unit and mapped to the highest addresses.
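
  The resulting data address space can be sketched as follows. The
  ranges are not taken from the implementation; they are derived from
  the memory sizes given in the general section under the assumptions
  that the RAM starts at address 0, that addresses count bytes and
  that KB denotes 1024 bytes:
  \begin{center}
    \begin{tabular}{|l|l|l|}
      \hline
      Region & Size & Address range \\
      \hline
      Data RAM & 8192 bytes & \texttt{0x0000} to \texttt{0x1FFF} \\
      Data ROM & 256 bytes & \texttt{0x2000} to \texttt{0x20FF} \\
      Peripheral modules & & highest addresses \\
      \hline
    \end{tabular}
  \end{center}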

  The memories (including the instruction ROM) are Altera-specific; we
  decided not to use generic memories, because \textsl{Quartus} can
  update the contents of its proprietary ROMs without resynthesizing
  the whole design. Because all memories are single-ported (and thus
  fairly simple), it should be easy to replace them with memories
  specific to other vendors.

  We also decided against the use of external memories; larger FPGAs
  can accommodate all addressable memory on-chip, so the implementation
  overhead would not have paid off.

  Accesses which take more than one cycle (stores to peripheral
  modules and all load operations) block the rest of the processor
  until they are finished.

  \paragraph{Peripheral modules}
  The peripheral modules use a slightly modified version of the SimpCon
  interface. The SimpCon-specific signals are grouped into records,
  and the words which can be read/written are limited to 16 bits. To
  access such a module, one may only use \texttt{load} and
  \texttt{store} instructions which point to aligned addresses.

  \paragraph{UART}
  The built-in UART is derived from the sc\_uart by Martin
  Sch\"oberl. Apart from the adapted SimpCon interface, an interrupt
  line and two bits for enabling/masking receive (bit 3 in the status
  register) and transmit (bit 2) interrupts were added. In the current
  version, address 0xFFF8 (-8) corresponds to the UART's status
  register and address 0xFFFA (-6) to the wr\_data/rd\_data register.
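
  The negative values in parentheses are the same addresses read as
  16-bit two's complement numbers:
  \[
    -8 \equiv 2^{16} - 8 = 65528 = \texttt{0xFFF8}, \qquad
    -6 \equiv 2^{16} - 6 = 65530 = \texttt{0xFFFA} \pmod{2^{16}}.
  \]
  Assuming that the bits of the status register are numbered from 0
  at the least significant end, as in the flag register, the receive
  interrupt bit has the mask $2^3 = \texttt{0x0008}$ and the transmit
  interrupt bit has the mask $2^2 = \texttt{0x0004}$.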

  \subsubsection{Interrupt unit}
  The interrupt unit takes care of the interrupt vectors and, of
  course, the triggering of interrupts. Interrupts are executed only
  if the global interrupt flag is set, none of the other units is busy
  and the instruction in the execution stage is valid (it takes 3
  cycles after jumps, branches etc.\ until a new valid instruction is
  in that stage).

  Instructions which cannot be decoded, as well as the ``error''
  instruction, trigger interrupt 0; the ALU can trigger interrupt 1
  (division by zero), and the memory unit can trigger interrupt 2
  (invalid memory access). In contrast to all other interrupts, these
  three interrupts do not repeat the instruction which is executed
  when they occur.

  \subsection{Write back}
  The write back stage passes on the result of the execution stage to
  all other stages.

  \section{Assembler}
  The assembler \textsl{spar} (SPear Assembler Recycled) uses a syntax
  similar to that of the usual Unix-style assemblers. It accepts the
  pseudo-ops
  \texttt{.file}, \texttt{.text}, \texttt{.data}, \texttt{.bss},
  \texttt{.align}, \texttt{.comm}, \texttt{.lcomm}, \texttt{.org} and
  \texttt{.skip} with the usual meanings. The mnemonic \texttt{data}
  initializes a byte to some constant value. In contrast to the
  instruction set architecture specification, \texttt{mod} and
  \texttt{umod} accept three operands (if a move is needed, it is
  silently inserted).

  The assembler produces three files: one file for the instruction
  ROM, one file for the even bytes of the data ROM and one file for
  the odd bytes of the data ROM. The data is split because the data
  memories are internally split into two 8-bit memories in order to
  support unaligned memory accesses without delays.
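
  Why the split helps can be sketched as follows. Assume that the
  byte at address $a$ is stored in memory $a \bmod 2$ (0 for the even
  memory, 1 for the odd memory) at row $\lfloor a/2 \rfloor$; this
  row indexing is an assumption made for illustration and is not taken
  from the implementation. A 16-bit access at address $a$ touches the
  bytes $a$ and $a+1$, which always differ in parity and therefore
  always lie in different memories, so both can be accessed in the
  same cycle. For the unaligned address $a = 5$, for example:
  \[
    5 \bmod 2 = 1,\ \lfloor 5/2 \rfloor = 2
    \qquad\mbox{and}\qquad
    6 \bmod 2 = 0,\ \lfloor 6/2 \rfloor = 3,
  \]
  that is, row 2 of the odd memory and row 3 of the even memory.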

  Three output formats are supported: .mif (Memory Initialization
  File), .hex (Intel Hex Format) and a binary format designed for
  download via UART.

  \section{Resource usage and speed}

  The processor was synthesized with \textsl{Quartus II} for the
  \textsl{Cyclone EP1C12Q240C8} FPGA, which provides 12060 logic cells
  and 29952 bytes of on-chip memory.

  The processor needs $\sim$3550 logic cells or 29\% when compiled
  for maximum clock frequency, which is $\sim$60 MHz. When optimized
  for area, it needs $\sim$2600 logic cells or 22\% at $\sim$25 MHz.

  The processor uses 24832 bytes or 83\% of on-chip memory.
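
  The percentages follow directly from the device figures given above:
  \[
    \frac{3550}{12060} \approx 0.294, \qquad
    \frac{2600}{12060} \approx 0.216, \qquad
    \frac{24832}{29952} \approx 0.829.
  \]
  The memory figure equals the sum of the three memories listed in
  the general section, assuming that KB denotes 1024 bytes:
  \[
    16384 + 8192 + 256 = 24832\ \mathrm{bytes}.
  \]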

  \section{Example}

  \subsection{Reversing a line}

  Listing~\ref{lst:uart} shows how to interface the UART via
  interrupts. The program reads a line from the UART and then writes
  it back reversed. Lines 1 to 4 show how to instantiate memory
  (the two bytes defined form the DOS-style end-of-line). Lines 7 to
  25 initialize the registers and register the interrupt vectors;
  line 28 builds a barrier against the rest of the code.

  Lines 32 to 76 form the interrupt service routine. It first checks
  whether it is operating in read or in write mode. When reading, it
  reads from the UART and stores the result. A mode switch occurs when
  a newline character is encountered. In write mode, the contents of
  the buffer are written to the UART, and the routine switches back
  to read mode when finished.

  Figure~\ref{fig:sim} presents the results of the simulation.

  \lstset{basicstyle=\footnotesize,numbers=left,numberstyle=\tiny}
  \lstset{caption=Example for the UART and interrupts}
  \lstset{label=lst:uart}
  \lstinputlisting{uart_reverse.s}

  \begin{figure}[ht!]
    \centering
    \includegraphics[width=.95\textwidth]{uart_sim}
    \caption{Simulation results}
    \label{fig:sim}
  \end{figure}

  \subsection{Computing factorials}

  The example in listing~\ref{lst:fact} computes the factorials of
  1 \ldots 9 and writes the results to the PC via UART. Note that the
  last result transmitted will be wrong, because it is truncated to
  16 bits.
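
  The truncation can be checked by hand, assuming that the results
  are treated as unsigned 16-bit values: $8! = 40320$ still fits below
  $2^{16} = 65536$, whereas
  \[
    9! = 362880 = 5 \cdot 65536 + 35200,
    \qquad\mbox{so}\qquad
    9! \bmod 2^{16} = 35200 = \texttt{0x8980}.
  \]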

  \lstset{basicstyle=\footnotesize,numbers=left,numberstyle=\tiny}
  \lstset{caption=Computing factorials}
  \lstset{label=lst:fact}
  \lstinputlisting{factorial.s}


  \section{Versions Of This Document}

  2006-12-14: Draft version \textbf{0.1}

  \noindent
  2006-12-29: Draft version \textbf{0.2}
  \begin{itemize}
    \item A few refinements.
  \end{itemize}

  \noindent
  2007-01-22: Draft version \textbf{0.3}
  \begin{itemize}
    \item Added another example.
  \end{itemize}

  \noindent
  2007-02-02: Draft version \textbf{0.4}
  \begin{itemize}
    \item Updated resource usage and speed section.
  \end{itemize}

\end{document}
