OpenCores
URL https://opencores.org/ocsvn/lxp32/lxp32/trunk

% !TEX TS-program = lualatex
2
\documentclass[a4paper,12pt,twoside,extrafontsizes]{memoir}
3
 
4
\input{preamble.tex}
5
 
6
\begin{document}
7
 
8
\input{frontmatter.tex}
9
 
10
\mainmatter
11
 
12
\chapter{Introduction}
13
 
14
\section{Main features}
15
 
16
\lxp{} (\emph{Lightweight eXecution Pipeline}) is a small 32-bit CPU IP core optimized for FPGA implementation. Its key features include:
17
 
18
\begin{itemize}
19
        \item described in portable VHDL-93, not tied to any particular vendor;
20
        \item 3-stage pipeline;
21
        \item 256 registers implemented as a RAM block;
22
        \item simple instruction set with fewer than 30 distinct opcodes;
23
        \item separate instruction and data buses, optional instruction cache;
24
        \item WISHBONE compatible;
25
        \item 8 interrupts with hardwired priorities;
26
        \item optional divider.
27
\end{itemize}
28
 
29
Being a lightweight IP core, \lxp{} also has certain limitations:
30
 
31
\begin{itemize}
32
        \item no branch prediction;
33
        \item no floating-point unit;
34
        \item no memory management unit;
35
        \item no nested interrupt handling;
36
        \item no debugging facilities.
37
\end{itemize}
38
 
39
Two major hardware versions of the CPU are provided: \lxp{}U, which does not include an instruction cache and uses the Low Latency Interface (Section \ref{sec:lli}) to fetch instructions, and \lxp{}C, which includes an instruction cache and fetches instructions over the WISHBONE bus. These versions are otherwise identical and have the same instruction set architecture.
40
 
41
\section{Implementation estimates}
42
 
43
Typical results of \lxp{} core FPGA implementation are presented in Table \ref{tab:implementation}. Note that these data are only useful as rough estimates, since actual results depend greatly on tool versions and configuration, design constraints, device utilization ratio and other factors.
44
 
45
Data on two configurations are provided:
46
 
47
\begin{itemize}
48
        \item \emph{Compact}: \lxp{}U (without instruction cache), no divider, 2-cycle multiplier.
49
        \item \emph{Full}: \lxp{}C (with instruction cache), divider, 2-cycle multiplier.
50
\end{itemize}
51
 
52
The slowest speed grade was used for clock frequency estimation.
53
 
54
\begin{table}[htbp]
55
        \caption{Typical results of \lxp{} core FPGA implementation}
56
        \label{tab:implementation}
57
        \begin{tabularx}{\textwidth}{Q{0.5\textwidth}LL}
58
                \toprule
59
                Resource & Compact & Full \\
60
                \midrule
61
                \multicolumn{3}{c}{Altera\textregistered{} Cyclone\textregistered{} V 5CEBA2F23C8} \\
62
                \midrule
63
                Logic Array Blocks (LABs) & 79 & 119 \\
64
                \hspace*{1em}ALMs & 630 & 972 \\
65
                \hspace*{2em}ALUTs & 982 & 1531 \\
66
                \hspace*{2em}Flip-flops & 537 & 942 \\
67
                DSP blocks & 3 & 3 \\
68
                RAM blocks (M10K) & 2 & 3 \\
69
                Clock frequency & 103.9 MHz & 98.8 MHz \\
70
                \midrule
71
                \multicolumn{3}{c}{Microsemi\textregistered{} IGLOO\textregistered{}2 M2GL005-FG484} \\
72
                \midrule
73
                Logic elements (LUT+DFF) & 1529 & 2226 \\
74
                \hspace*{1em}LUTs & 1471 & 2157 \\
75
                \hspace*{1em}Flip-flops & 718 & 1181 \\
76
                Mathblocks (MACC) & 3 & 3 \\
77
                RAM blocks (RAM1K18) & 2 & 3 \\
78
                Clock frequency & 111.7 MHz & 107.8 MHz \\
79
                \midrule
80
                \multicolumn{3}{c}{Xilinx\textregistered{} Artix\textregistered{}-7 xc7a15tfgg484-1} \\
81
                \midrule
82
                Slices & 264 & 381 \\
83
                \hspace*{1em}LUTs & 809 & 1151 \\
84
                \hspace*{1em}Flip-flops & 527 & 923 \\
85
                DSP blocks (DSP48E1) & 4 & 4 \\
86
                RAM blocks (RAMB18E1) & 2 & 3 \\
87
                Clock frequency & 113.6 MHz & 109.3 MHz \\
88
                \bottomrule
89
        \end{tabularx}
90
\end{table}
91
 
92
\section{Structure of this manual}
93
 
94
A general description of \lxp{} operation from a software developer's point of view can be found in Chapter \ref{ch:isa}, \styledtitleref{ch:isa}. Future versions of the \lxp{} CPU are intended to be at least backwards compatible with this architecture.
95
 
96
Topics related to hardware, such as synthesis, implementation and interfacing other IP cores, are covered in Chapter \ref{ch:integration}, \styledtitleref{ch:integration}. The \lxp{} IP core package also includes testbenches which can be used to simulate the design as described in Chapter \ref{ch:simulation}, \styledtitleref{ch:simulation}.
97
 
98
Tools shipped as parts of the \lxp{} IP core package (assembler/linker, disassembler and interconnect generator) are documented in Chapter \ref{ch:developmenttools}, \styledtitleref{ch:developmenttools}.
99
 
100
The appendices include a detailed description of the \lxp{} instruction set, instruction cycle counts and the \lxp{} assembly language definition. The WISHBONE datasheet required by the WISHBONE specification is also provided.
101
 
102
\chapter{Instruction set architecture}
103
\label{ch:isa}
104
 
105
\section{Data format}
106
 
107
Most \lxp{} instructions work with 32-bit data words. A few instructions that address individual bytes use little-endian order, that is, the least significant byte is stored at the lowest address. Signed values are encoded in a 2's complement format.
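The byte order and sign encoding described above can be illustrated with a short Python sketch. Python is used here only as a neutral illustration language; the helper names \code{word\_to\_bytes} and \code{as\_signed32} are hypothetical and are not part of the \lxp{} package.

```python
def word_to_bytes(word: int) -> bytes:
    """Return the four bytes of a 32-bit word in little-endian memory order.

    Index 0 of the result is the byte stored at the lowest address,
    which is the least significant byte of the word.
    """
    return (word & 0xFFFFFFFF).to_bytes(4, "little")


def as_signed32(word: int) -> int:
    """Interpret a 32-bit pattern as a 2's complement signed value."""
    word &= 0xFFFFFFFF
    return word - (1 << 32) if word & 0x80000000 else word
```

For instance, the word \code{0x12345678} is laid out in memory as the bytes \code{78 56 34 12}, starting from the lowest address.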
108
 
109
\section{Instruction format}
110
\label{sec:instructionformat}
111
 
112
All \lxp{} instructions are encoded as 32-bit words, with the exception of \instr{lc} (\instrname{Load Constant}), which occupies two adjacent 32-bit words. Instructions in memory must be aligned to word boundaries.
113
 
114
Most arithmetic and logical instructions take two source operands and write the result to an independent destination register. The general instruction format is presented in Figure \ref{fig:instructionformat}.
115
 
116
\begin{figure}[htbp]
117
        \centering
118
        \includegraphics[scale=1.2]{images/instructionformat.pdf}
119
        \caption{\lxp{} instruction format}
120
        \label{fig:instructionformat}
121
\end{figure}
122
 
123
This format includes the following fields:
124
 
125
\begin{enumerate}
126
        \item OPCODE -- a 6-bit instruction code (see Appendix \ref{app:instructionset}).
127
        \item T1 -- type of the RD1 field.
128
        \item T2 -- type of the RD2 field.
129
        \item DST -- register number (usually the destination register).
130
        \item RD1 -- register/direct operand 1.
131
        \item RD2 -- register/direct operand 2.
132
\end{enumerate}
133
 
134
Some of these fields may not have meaning for a particular instruction; such unused fields are replaced with zeros.
135
 
136
The DST field specifies one of the 256 \lxp{} registers. The RD1 and RD2 fields can denote either source register operands or direct (immediate) operands: if the corresponding T field is 1, the RD value is a register number; otherwise it is interpreted as a direct signed byte in 2's complement format (valid values range from $-128$ to $127$).
137
 
138
For example, consider the following instruction that adds \code{10} to \code{r0} and writes the result to \code{r1}:
139
 
140
\begin{codepar}
    \instr{add} r1, r0, 10
\end{codepar}
143
 
144
In this example, OPCODE is \code{010000}, T1 is \code{1}, T2 is \code{0}, DST is \code{00000001}, RD1 is \code{00000000} and RD2 is \code{00001010}. Hence, the instruction is encoded as \code{0x4201000A}.
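The encoding of this example can be reproduced with the following Python sketch. The bit positions (OPCODE in bits 31--26, T1 in bit 25, T2 in bit 24, DST in bits 23--16, RD1 in bits 15--8, RD2 in bits 7--0) are inferred from the worked example above rather than taken from a normative definition, and the function name \code{encode} is hypothetical.

```python
def encode(opcode: int, dst: int,
           rd1: int, rd1_is_reg: bool,
           rd2: int, rd2_is_reg: bool) -> int:
    """Pack instruction fields into a 32-bit word (assumed bit layout)."""
    if not 0 <= opcode < 64:
        raise ValueError("OPCODE must fit in 6 bits")
    if not 0 <= dst < 256:
        raise ValueError("DST must be a register number 0..255")

    def operand(value: int, is_reg: bool) -> int:
        # T = 1: register number; T = 0: signed byte in 2's complement.
        if is_reg:
            if not 0 <= value < 256:
                raise ValueError("register number out of range")
            return value
        if not -128 <= value <= 127:
            raise ValueError("direct operand out of range")
        return value & 0xFF

    return ((opcode << 26)
            | (int(rd1_is_reg) << 25)
            | (int(rd2_is_reg) << 24)
            | (dst << 16)
            | (operand(rd1, rd1_is_reg) << 8)
            | operand(rd2, rd2_is_reg))


# add r1, r0, 10 with the add opcode 010000 quoted in the text
word = encode(0b010000, dst=1, rd1=0, rd1_is_reg=True, rd2=10, rd2_is_reg=False)
```

Under the same assumptions, a negative direct operand such as $-1$ is stored in the RD field as \code{11111111}.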
145
 
146
For convenience, some instructions have alias mnemonics. For example, \lxp{} does not have a distinct \instr{mov} opcode: instead, \code{\instr{mov} dst, src} is an alias for \code{\instr{add} dst, src, 0}.
147
 
148
A complete list of \lxp{} instructions is provided in Appendix \ref{app:instructionset}.
149
 
150
\section{Registers}
151
 
152
\lxp{} has 256 registers denoted as \code{r0} -- \code{r255}. The first 240 of them (from \code{r0} to \code{r239}) are general-purpose registers (GPR), the last 16 (from \code{r240} to \code{r255}) are special-purpose registers (SPR). For convenience, some special-purpose registers have alias names: for example, \code{r255} can also be referred to as \code{sp} (stack pointer). Special-purpose registers are listed in Table \ref{tab:spr}. Some of these registers are reserved: software should not access them.
153
 
154
\begin{table}[htbp]
155
        \caption{\lxp{} special-purpose registers}
156
        \label{tab:spr}
157
        \begin{tabularx}{\textwidth}{llL}
158
                \toprule
159
                Alias name & Generic name & Description \\
160
                \midrule
161
                \code{iv0} & \code{r240} & Interrupt vector 0 (Section \ref{sec:interrupthandling}) \\
162
                \code{iv1} & \code{r241} & Interrupt vector 1 (Section \ref{sec:interrupthandling}) \\
163
                \code{iv2} & \code{r242} & Interrupt vector 2 (Section \ref{sec:interrupthandling}) \\
164
                \code{iv3} & \code{r243} & Interrupt vector 3 (Section \ref{sec:interrupthandling}) \\
165
                \code{iv4} & \code{r244} & Interrupt vector 4 (Section \ref{sec:interrupthandling}) \\
166
                \code{iv5} & \code{r245} & Interrupt vector 5 (Section \ref{sec:interrupthandling}) \\
167
                \code{iv6} & \code{r246} & Interrupt vector 6 (Section \ref{sec:interrupthandling}) \\
168
                \code{iv7} & \code{r247} & Interrupt vector 7 (Section \ref{sec:interrupthandling}) \\
169
                \multicolumn{1}{l}{---} & \code{r248}\,--\,\code{r251} & \emph{Reserved} \\
170
                \code{cr}  & \code{r252} & Control register (Section \ref{sec:interrupthandling}) \\
171
                \code{irp} & \code{r253} & Interrupt return pointer (Section \ref{sec:interrupthandling}) \\
172
                \code{rp}  & \code{r254} & Return pointer (Section \ref{sec:callingprocedures})\\
173
                \code{sp}  & \code{r255} & Stack pointer (Section \ref{sec:stack}) \\
174
                \bottomrule
175
        \end{tabularx}
176
\end{table}
177
 
178
All registers are zero-initialized during the CPU reset.
179
 
180
\section{Addressing}
181
\label{sec:addressing}
182
 
183
All addressing in \lxp{} is indirect. In order to access a memory location, its address must be stored in a register; any available register can be used for this purpose.
184
 
185
Some instructions, namely \instr{lsb} (\instrname{Load Signed Byte}), \instr{lub} (\instrname{Load Unsigned Byte}) and \instr{sb} (\instrname{Store Byte}), provide byte-granular access, in which case all 32 bits of the address are significant. Otherwise the two least significant address bits are ignored, as \lxp{} doesn't support unaligned access to 32-bit data words (during simulation, a warning is emitted if such a transaction is attempted).
186
 
187
A special rule applies to pointers that refer to instructions: since instructions are always word-aligned, the least significant bit is interpreted as the \code{IRF} (\emph{Interrupt Return Flag}). See Section \ref{sec:interrupthandling} for details.
188
 
189
\section{Stack}
190
\label{sec:stack}
191
 
192
The current pointer to the top of the stack is stored in the \code{sp} register. To the hardware this register is not different from general purpose registers, that is, in no situation does the CPU access the stack implicitly (procedure calls and interrupts use register-based conventions).
193
 
194
Software can access the stack as follows:
195
 
196
\begin{codepar}
    \emph{// push r0 on the stack}
    \instr{sub} sp, sp, 4
    \instr{sw} sp, r0
    \emph{// pop r0 from the stack}
    \instr{lw} r0, sp
    \instr{add} sp, sp, 4
\end{codepar}
204
 
205
Before using the stack, the \code{sp} register must be set up to point to a valid memory location. The simplest software can operate stackless, or even without data memory altogether if registers are enough to store the program state.
206
 
207
\section{Calling procedures}
208
\label{sec:callingprocedures}
209
 
210
\lxp{} provides a \instr{call} instruction which stores the address of the next instruction in the \code{rp} register and transfers execution to the procedure pointed to by the \instr{call} operand. Return from a procedure is performed by the \code{\instr{jmp} rp} instruction, which also has the \instr{ret} alias.
211
 
212
If a procedure must in turn call some procedure itself, the return pointer in the \code{rp} register will be overwritten by the \instr{call} instruction. Hence the procedure must save its value somewhere; the most general solution is to use the stack:
213
 
214
\begin{codepar}
    \instr{sub} sp, sp, 4
    \instr{sw} sp, rp
    ...
    \instr{call} r1
    ...
    \instr{lw} rp, sp
    \instr{add} sp, sp, 4
    \instr{ret}
\end{codepar}
224
 
225
Procedures that don't use the \instr{call} instruction (sometimes called \emph{leaf} procedures) don't need to save the \code{rp} value.
226
 
227
Since \instr{ret} is just an alias for \code{\instr{jmp} rp}, one can also use \instrname{Compare and Jump} instructions (\instr{cjmp\emph{xxx}}) to perform a conditional procedure return.
228
 
229
Although the \lxp{} architecture doesn't mandate any particular calling convention, some general recommendations are presented below:
230
 
231
\begin{enumerate}
232
        \item Pass arguments through the \code{r1}--\code{r31} registers.
233
        \item Return value through the \code{r0} register.
234
        \item Designate \code{r0}--\code{r31} registers as \emph{caller-saved}, that is, they are not guaranteed to be preserved during procedure calls and must be saved by the caller if needed. The procedure can use them for any purpose, regardless of whether they are used to pass arguments and/or return values. For obvious reasons, this rule does not apply to interrupt handlers.
235
\end{enumerate}
236
 
237
\section{Interrupt handling}
238
\label{sec:interrupthandling}
239
 
240
\subsection{Control register}
241
 
242
\lxp{} supports 8 interrupts with hardwired priority levels (interrupts with lower vector numbers have higher priority). Interrupt vectors (pointers to interrupt handlers) are stored in the \code{iv0}--\code{iv7} registers. Interrupt handling is controlled by the \code{cr} register (Table \ref{tab:cr}).
243
 
244
\begin{table}[htbp]
        \caption{Control register}
        \label{tab:cr}
        \begin{tabularx}{\textwidth}{lL}
                \toprule
                Bit & Description \\
                \midrule
                0      & Enable interrupt 0 \\
                1      & Enable interrupt 1 \\
                & \ldots \\
                7      & Enable interrupt 7 \\
                8      & Temporarily block interrupt 0 \\
                9      & Temporarily block interrupt 1 \\
                & \ldots \\
                15     & Temporarily block interrupt 7 \\
                31--16 & \emph{Reserved} \\
                \bottomrule
        \end{tabularx}
\end{table}
263
 
264
Disabled interrupts are ignored altogether: if the CPU receives an interrupt request signal while the corresponding interrupt is disabled, the interrupt handler will not be called even if the interrupt is enabled later. Conversely, temporarily blocked interrupts are still registered, but their handlers are not called until they are unblocked.
265
 
266
Like other registers, \code{cr} is zero-initialized during the CPU reset, meaning that no interrupts are initially enabled.
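As a minimal sketch, a value for the \code{cr} register can be composed as follows. It assumes that the enable bit for interrupt $n$ is bit $n$ and the block bit is bit $8+n$, following the pattern of Table \ref{tab:cr}; the function name \code{make\_cr} is hypothetical and the sketch is written in Python purely for illustration.

```python
def make_cr(enabled=(), blocked=()) -> int:
    """Build a control register value from interrupt numbers (0..7).

    Assumed layout: bit n enables interrupt n, bit 8 + n temporarily
    blocks interrupt n, bits 31..16 are reserved and left at zero.
    """
    value = 0
    for n in enabled:
        if not 0 <= n <= 7:
            raise ValueError("interrupt number must be in 0..7")
        value |= 1 << n
    for n in blocked:
        if not 0 <= n <= 7:
            raise ValueError("interrupt number must be in 0..7")
        value |= 1 << (8 + n)
    return value
```

For example, enabling interrupts 0 and 7 while temporarily blocking interrupt 7 gives \code{0x8081}. A request on interrupt 7 would then still be registered, but its handler would be deferred until bit 15 is cleared.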
267
 
268
\subsection{Invoking interrupt handlers}
269
 
270
Interrupt handlers are invoked by the CPU similarly to procedures (Section \ref{sec:callingprocedures}), the difference being that in this case the return address is stored in the \code{irp} register (as opposed to \code{rp}), and the least significant bit of the register (\code{IRF} -- \emph{Interrupt Return Flag}) is set.
271
 
272
An interrupt handler returns using the \code{\instr{jmp} irp} instruction which also has \instr{iret} alias. Until the interrupt handler returns, the CPU will defer further interrupt processing (although incoming interrupt requests will still be registered). This also means that \code{irp} register value will not be unexpectedly overwritten. When executing the \code{\instr{jmp} irp} instruction, the CPU will recognize the \code{IRF} flag and resume interrupt processing as usual. This behavior can be exploited to perform a conditional return from the interrupt handler, similarly to the technique described in Section \ref{sec:callingprocedures} for conditional procedure returns.
273
 
274
Another technique can be useful when waiting for a single event, such as a coprocessor finishing its job: the interrupt handler can be set up to return to a designated address instead of the address stored in the \code{irp} register. This designated address must have the \code{IRF} flag set, otherwise all further interrupt processing will be disabled:
275
 
276
\begin{codepar}
    \instr{lc} r0, continue@1 \emph{// IRF flag}
    \instr{lc} iv0, handler
    ... \emph{// issue coprocessor command}
    \instr{hlt} \emph{// wait for an interrupt}
continue:
    ... \emph{// the execution will continue here}
handler:
    \instr{jmp} r0
\end{codepar}
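The pointer value loaded into \code{r0} in the listing above carries the \code{IRF} flag in its least significant bit. The following Python sketch shows how such a tagged pointer can be formed and decoded. The function names are hypothetical, and treating bit 1 as ignored is an assumption based on instructions being word-aligned.

```python
IRF = 0x1  # Interrupt Return Flag: least significant bit of the pointer


def with_irf(address: int) -> int:
    """Tag a word-aligned instruction address with the IRF flag."""
    if address % 4 != 0:
        raise ValueError("instruction addresses must be word-aligned")
    return address | IRF


def split_pointer(pointer: int):
    """Return (instruction address, IRF set?) for an instruction pointer.

    Bit 0 is the IRF flag; bit 1 is assumed to be ignored.
    """
    return pointer & 0xFFFFFFFC, bool(pointer & IRF)
```

A jump to \code{with\_irf(0x100)} therefore targets the instruction at \code{0x100} and, because the flag is set, is treated as a return from the interrupt handler.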
286
 
287
\chapter{Integration}
288
\label{ch:integration}
289
 
290
\section{Overview}
291
 
292
The \lxp{} IP core is delivered in the form of a synthesizable RTL description expressed in \mbox{VHDL-93}. It does not use any technology-specific primitives and should work out of the box with major FPGA synthesis software. \lxp{} can be integrated in both VHDL and Verilog\textregistered{} based SoC designs.
293
 
294
Major \lxp{} hardware versions have separate top-level design units:
295
 
296
\begin{itemize}
297
        \item \shellcmd{lxp32u\_top} -- \lxp{}U (without instruction cache),
298
        \item \shellcmd{lxp32c\_top} -- \lxp{}C (with instruction cache).
299
\end{itemize}
300
 
301
A high-level block diagram of the CPU is presented in Figure \ref{fig:blockdiagram}. Schematic symbols for \lxp{}U and \lxp{}C are shown in Figure \ref{fig:symbols}.
302
 
303
\begin{figure}[htbp]
304
        \centering
305
        \includegraphics[scale=0.85]{images/blockdiagram.pdf}
306
        \caption{\lxp{} CPU block diagram}
307
        \label{fig:blockdiagram}
308
\end{figure}
309
 
310
\begin{figure}[htbp]
311
        \centering
312
        \includegraphics[scale=0.85]{images/symbols.pdf}
313
        \caption{Schematic symbols for \lxp{}U and \lxp{}C}
314
        \label{fig:symbols}
315
\end{figure}
316
 
317
\lxp{}U uses the Low Latency Interface (Section \ref{sec:lli}) to fetch instructions. This interface is designed to interact with low latency on-chip peripherals such as RAM blocks or similar devices that are generally expected to return a data word one cycle after the instruction address has been set. It can also be connected to a custom (external) instruction cache.
318
 
319
To achieve the least possible latency, some LLI signals are not registered. For this reason the LLI is not suitable for interaction with off-chip peripherals.
320
 
321
\lxp{}C fetches instructions over the WISHBONE instruction bus. To maximize throughput, it supports the WISHBONE registered feedback signals [CTI\_O()] and [BTE\_O()]. All outputs on this bus are registered. This version is recommended for use with high latency memory devices such as SDRAM chips, as well as for situations where LLI combinatorial delays are unacceptable.
322
 
323
Both \lxp{}U and \lxp{}C use WISHBONE protocol for the data bus.
324
 
325
\section{Ports}
326
 
327
\begin{ctabular}{lccl}
328
        \toprule
329
        Port & Direction & Bus width & Description \\
330
        \midrule
331
        \tabcutin{4}{Global signals} \\
332
        \midrule
333
        \signal{clk\_i} & in & 1 & System clock \\
334
        \signal{rst\_i} & in & 1 & Synchronous reset, active high \\
335
        \midrule
336
        \tabcutin{4}{Instruction bus -- Low Latency Interface (\lxp{}U only)} \\
337
        \midrule
338
        \signal{lli\_re\_o} & out & 1 & Read enable output, active high \\
339
        \signal{lli\_adr\_o} & out & 30 & Address output \\
340
        \signal{lli\_dat\_i} & in & 32 & Data input \\
341
        \signal{lli\_busy\_i} & in & 1 & Busy flag input, active high \\
342
        \midrule
343
        \tabcutin{4}{Instruction bus -- WISHBONE (\lxp{}C only)} \\
344
        \midrule
345
        \signal{ibus\_cyc\_o} & out & 1 & Cycle output \\
346
        \signal{ibus\_stb\_o} & out & 1 & Strobe output \\
347
        \signal{ibus\_cti\_o} & out & 3 & Cycle type identifier \\
348
        \signal{ibus\_bte\_o} & out & 2 & Burst type extension \\
349
        \signal{ibus\_ack\_i} & in & 1 & Acknowledge input \\
350
        \signal{ibus\_adr\_o} & out & 30 & Address output \\
351
        \signal{ibus\_dat\_i} & in & 32 & Data input \\
352
        \midrule
353
        \tabcutin{4}{Data bus} \\
354
        \midrule
355
        \signal{dbus\_cyc\_o} & out & 1 & Cycle output \\
356
        \signal{dbus\_stb\_o} & out & 1 & Strobe output \\
357
        \signal{dbus\_we\_o} & out & 1 & Write enable output \\
358
        \signal{dbus\_sel\_o} & out & 4 & Select output \\
359
        \signal{dbus\_ack\_i} & in & 1 & Acknowledge input \\
360
        \signal{dbus\_adr\_o} & out & 30 & Address output \\
361
        \signal{dbus\_dat\_o} & out & 32 & Data output \\
362
        \signal{dbus\_dat\_i} & in & 32 & Data input \\
363
        \midrule
364
        \tabcutin{4}{Other ports} \\
365
        \midrule
366
        \signal{irq\_i} & in & 8 & Interrupt requests \\
367
        \bottomrule
368
\end{ctabular}
369
 
370
\section{Generics}
371
\label{sec:generics}
372
 
373
The following generics can be used to configure the \lxp{} IP core parameters.
374
 
375
\subsection{DBUS\_RMW}
376
 
377
By default, \lxp{} uses the \signal{dbus\_sel\_o} (byte enable) port to perform byte-granular write transactions initiated by the \instr{sb} (\instrname{Store Byte}) instruction. If this option is set to \code{true}, \signal{dbus\_sel\_o} is always tied to \code{"1111"}, and byte-granular write access is performed using the RMW (read-modify-write) cycle. The latter method is slower, but can work with slaves that do not have the [SEL\_I()] port.
378
 
379
This feature requires data bus transactions to be idempotent, that is, repeating a transaction must not alter the slave state. Care should be taken with non-memory slaves to ensure that this condition is satisfied.
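The read-modify-write sequence can be sketched in Python as follows. This is an illustrative model, not a description of the actual RTL: \code{memory} stands for a word-addressed slave, the function name is hypothetical, and the byte lane is chosen according to the little-endian order used by \lxp{}.

```python
def store_byte_rmw(memory: list, byte_address: int, value: int) -> None:
    """Model a byte store performed as a read-modify-write of a full word."""
    word_index = byte_address >> 2        # word address (upper 30 bits)
    shift = 8 * (byte_address & 0x3)      # little-endian byte lane
    old = memory[word_index]              # read cycle
    cleared = old & ~(0xFF << shift) & 0xFFFFFFFF
    new = cleared | ((value & 0xFF) << shift)
    memory[word_index] = new              # write cycle, all four bytes written


mem = [0x11223344]
store_byte_rmw(mem, 1, 0xAB)
```

The read cycle in the middle of the sequence is the reason for the idempotence requirement: a slave whose state changes on reads (for example, a FIFO data register) would be disturbed by it.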
380
 
381
\subsection{DIVIDER\_EN}
382
 
383
\lxp{} includes a divider unit which occupies a considerable amount of resources. It can be excluded by setting this option to \code{false}.
384
 
385
\subsection{IBUS\_BURST\_SIZE}
386
 
387
Instruction bus burst size. Default value is 16. Only for \lxp{}C.
388
 
389
\subsection{IBUS\_PREFETCH\_SIZE}
390
 
391
Number of words that the instruction cache will read ahead from the current instruction pointer. Default value is 32. Only for \lxp{}C.
392
 
393
\subsection{MUL\_ARCH}
394
 
395
\lxp{} provides three multiplier options:
396
 
397
\begin{itemize}
398
        \item \code{"dsp"} is the fastest architecture designed for technologies that provide fast parallel $16 \times 16$ multipliers, which includes most modern FPGA families. One multiplication takes 2 clock cycles.
399
        \item \code{"opt"} architecture uses a semi-parallel multiplication algorithm based on carry-save accumulation of partial products. It is designed for technologies that do not provide fast $16 \times 16$ multipliers. One multiplication takes 6 clock cycles.
400
        \item \code{"seq"} is a fully sequential design. One multiplication takes 34 clock cycles.
401
\end{itemize}
402
 
403
The default multiplier architecture is \code{"dsp"}. This option is recommended for most modern FPGA devices regardless of optimization goal since it is not only the fastest, but also occupies the least amount of general-purpose logic resources. However, it will create a timing bottleneck on technologies that lack fast multipliers.
404
 
405
For older FPGA families that don't provide dedicated multipliers the \code{"opt"} architecture can be used if decent throughput is still needed. It is designed to avoid creating a timing bottleneck on such technologies. Alternatively, \code{"seq"} architecture can be used when throughput is not a concern.
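To illustrate why $16 \times 16$ multipliers are a natural building block, the Python sketch below composes a 32-bit product from unsigned $16 \times 16$ partial products. It assumes that only the low 32 bits of the result are kept, which is not stated in this section, and it is not a description of the actual \lxp{} multiplier RTL; the function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def mul16x16(a: int, b: int) -> int:
    """Unsigned 16 x 16 -> 32 bit multiplication (one hardware multiplier)."""
    if not (0 <= a < 1 << 16 and 0 <= b < 1 << 16):
        raise ValueError("operands must be unsigned 16-bit values")
    return a * b


def mul32_low(x: int, y: int) -> int:
    """Low 32 bits of x * y built from three 16 x 16 partial products."""
    x &= MASK32
    y &= MASK32
    xl, xh = x & 0xFFFF, x >> 16
    yl, yh = y & 0xFFFF, y >> 16
    low = mul16x16(xl, yl)
    cross = mul16x16(xl, yh) + mul16x16(xh, yl)
    # The xh * yh term only affects bits 32 and above, so it is not needed.
    return (low + (cross << 16)) & MASK32
```

Because the low 32 bits of a product are the same for signed and unsigned operands, unsigned partial products are sufficient under this assumption.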
406
 
407
\subsection{START\_ADDR}
408
 
409
Address of the first instruction to be executed after CPU reset. Default value is \code{0}. Note that it is a 30-bit value as it is used to address 32-bit words, not bytes.
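Since \code{START\_ADDR} counts words rather than bytes, a byte address must be divided by four before it is used as the generic value. A minimal Python sketch of the conversion, with a hypothetical function name, is:

```python
def start_addr_from_byte_address(byte_address: int) -> int:
    """Convert a byte address of the first instruction to a 30-bit word address."""
    if not 0 <= byte_address < 1 << 32:
        raise ValueError("byte address must fit in 32 bits")
    if byte_address % 4 != 0:
        raise ValueError("instructions must be word-aligned")
    return byte_address >> 2
```

For example, a program placed at byte address \code{0x1000} corresponds to a \code{START\_ADDR} value of \code{0x400}.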
410
 
411
\section{Clock and reset}
412
\label{sec:clockreset}
413
 
414
All flip-flops in the CPU are triggered by a rising edge of the \signal{clk\_i} signal. No specific requirements are imposed on the \signal{clk\_i} signal apart from usual constraints on setup and hold times.
415
 
416
\lxp{} is reset synchronously when the \signal{rst\_i} signal is asserted. If the system reset signal comes from an asynchronous source, a synchronization circuit must be used; an example of such a circuit is shown on Figure \ref{fig:resetsync}.
417
 
418
\begin{figure}[htbp]
419
        \centering
420
        \includegraphics[scale=1]{images/resetsync.pdf}
421
        \caption{Reset synchronization circuit}
422
        \label{fig:resetsync}
423
\end{figure}
424
 
425
In SRAM-based FPGAs flip-flops and RAM blocks have deterministic state after a bitstream is loaded. On such technologies \lxp{} can operate without reset. In this case the \signal{rst\_i} port can be tied to a logical \code{0} in the RTL design to allow the synthesizer to remove redundant logic.
426
 
427
\signal{clk\_i} and \signal{rst\_i} signals also serve the role of [CLK\_I] and [RST\_I] WISHBONE signals, respectively, for both instruction and data buses.
428
 
429
\section{Low Latency Interface}
430
\label{sec:lli}
431
 
432
The Low Latency Interface is a simple pipelined synchronous protocol with a typical latency of 1 cycle, used by \lxp{}U to fetch instructions. Its timing diagram is shown in Figure \ref{fig:llitiming}. A request is considered valid when \signal{lli\_re\_o} is high and \signal{lli\_busy\_i} is low on the same clock cycle. On the next cycle after the request is valid, the slave must either produce data on \signal{lli\_dat\_i} or assert \signal{lli\_busy\_i} to indicate that data are not ready. Note that the values of \signal{lli\_re\_o} and \signal{lli\_adr\_o} are not guaranteed to be preserved by the CPU while the slave is busy.
433
 
434
The simplest, ``always ready'' slaves such as on-chip RAM blocks can be trivially connected to the LLI by connecting address, data and read enable ports and tying the \signal{lli\_busy\_i} signal to a logical \code{0}. Slaves are also allowed to introduce wait states, which makes it possible to implement external caching.
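The behavior of an ``always ready'' slave can be modeled with a few lines of Python. This is a cycle-level sketch for illustration only: the class and method names are hypothetical, and holding the previous output when no request is made is an assumption typical of synchronous RAM blocks with a read enable.

```python
class AlwaysReadyLliSlave:
    """Cycle-level model of a synchronous ROM/RAM attached to the LLI."""

    def __init__(self, words):
        self.words = list(words)  # word-addressed instruction memory
        self.busy = False         # lli_busy_i tied to logical 0
        self.dat = 0              # value driven on lli_dat_i

    def rising_edge(self, re: bool, adr: int) -> None:
        """Sample lli_re_o and lli_adr_o on a rising clock edge.

        If the request is valid, the addressed word appears on
        lli_dat_i during the following cycle.
        """
        if re and not self.busy:
            self.dat = self.words[adr]


slave = AlwaysReadyLliSlave([0x10, 0x20, 0x30])
slave.rising_edge(True, 1)   # request word 1
first = slave.dat            # data available one cycle later
slave.rising_edge(False, 2)  # no request: output is held
held = slave.dat
```

A slave with wait states would additionally drive \code{busy} high in the cycle after a valid request and must latch the address itself, because the CPU does not guarantee to keep it stable.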
435
 
436
\begin{figure}[htbp]
437
        \centering
438
        \includegraphics[scale=1]{images/llitiming.pdf}
439
        \caption{Low Latency Interface timing diagram (\lxp{}U)}
440
        \label{fig:llitiming}
441
\end{figure}
442
 
443
Note that the \signal{lli\_adr\_o} signal has a width of 30 bits since it addresses words, not bytes (instructions are always word-aligned).
444
 
445
Since this interface is not registered, it is not suitable for interaction with off-chip peripherals. Also, care should be taken to avoid introducing too much additional combinatorial delay on its outputs.
446
 
447
\section{WISHBONE instruction bus}
448
 
449
The \lxp{}C CPU fetches instructions over the WISHBONE bus. Its parameters are defined in the WISHBONE datasheet (Appendix \ref{app:wishbonedatasheet}). For a detailed description of the bus protocol refer to the WISHBONE specification, revision B3.
450
 
451
With the classic WISHBONE handshake, decent throughput can only be achieved when the slave is able to terminate cycles asynchronously. This is usually possible only for the simplest slaves, which should probably be using the Low Latency Interface in the first place. To maximize throughput for complex, high latency slaves, the \lxp{}C instruction bus uses the optional WISHBONE address tags [CTI\_O()] (Cycle Type Identifier) and [BTE\_O()] (Burst Type Extension). These signals are hints allowing the slave to predict the address that will be set by the master in the next cycle and prepare data in advance. The slave can ignore these hints, processing requests as classic WISHBONE cycles, although performance would almost certainly suffer in this case.
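The following Python sketch lists the address tags a master would present during an incrementing burst. The tag encodings are those defined by the WISHBONE B3 specification; the assumption that \lxp{}C issues linear incrementing bursts of this exact shape, as well as the function name, is illustrative only.

```python
CTI_INCREMENTING = 0b010  # WISHBONE B3: incrementing burst cycle
CTI_END_OF_BURST = 0b111  # WISHBONE B3: end of burst
BTE_LINEAR = 0b00         # WISHBONE B3: linear burst


def burst_beats(start_adr: int, length: int):
    """Return a list of (word address, CTI, BTE) tuples for one burst."""
    if length < 1:
        raise ValueError("burst length must be at least 1")
    beats = []
    for i in range(length):
        last = i == length - 1
        cti = CTI_END_OF_BURST if last else CTI_INCREMENTING
        beats.append(((start_adr + i) & 0x3FFFFFFF, cti, BTE_LINEAR))
    return beats
```

A slave that sees \code{CTI = 010} knows that the next address will be the current one plus one and can start fetching that word before the master requests it.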
452
 
453
A typical \lxp{}C instruction bus burst timing diagram is shown on Figure \ref{fig:ibustiming}.
454
 
455
\begin{figure}[htbp]
456
        \centering
457
        \includegraphics[scale=0.786]{images/ibustiming.pdf}
458
        \caption{Typical WISHBONE instruction bus burst (\lxp{}C)}
459
        \label{fig:ibustiming}
460
\end{figure}
461
 
462
\section{WISHBONE data bus}
463
 
464
\lxp{} uses the WISHBONE bus to interact with data memory and other peripherals. This bus is distinct from the instruction bus; its parameters are defined in the WISHBONE datasheet (Appendix \ref{app:wishbonedatasheet}).
465
 
466
This bus uses a 30-bit \signal{dbus\_adr\_o} port to address 32-bit words; the \signal{dbus\_sel\_o} port is used to select individual bytes to be written or read. Alternatively, with the \code{DBUS\_RMW} option (Section \ref{sec:generics}) the \signal{dbus\_sel\_o} port is not used; byte-granular write access is performed using the read-modify-write cycle instead.
467
 
468
For a detailed description of the bus protocol refer to the WISHBONE specification, revision B3.
469
 
470
Typical timing diagrams for write and read cycles are shown on Figure \ref{fig:dbustiming}. In these examples the peripheral terminates the cycle asynchronously; however, it can also introduce wait states by delaying the \signal{dbus\_ack\_i} signal.
471
 
472
\begin{figure}[htbp]
473
        \centering
474
        \includegraphics[scale=0.928]{images/dbustiming.pdf}
475
        \caption{Typical WISHBONE data bus WRITE and READ cycles}
476
        \label{fig:dbustiming}
477
\end{figure}
478
 
479
\section{Interrupts}
480
 
481
\lxp{} registers an interrupt condition when the corresponding request signal goes from \code{0} to \code{1}. Transitions from \code{1} to \code{0} are ignored. All interrupt request signals must be synchronous with the system clock (\signal{clk\_i}); if coming from an asynchronous source, they must be synchronized using a sequence of at least two flip-flops clocked by \signal{clk\_i}. These flip-flops are not included in the \lxp{} core in order not to increase interrupt processing delay for interrupt sources that are inherently synchronous. Failure to properly synchronize interrupt request signals will cause timing violations that manifest themselves as intermittent, hard-to-debug faults.
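The edge-sensitive behavior described above can be sketched in Python as follows. This is a behavioral illustration with hypothetical names, not the core's implementation; it models only the detection of rising edges on an already synchronized 8-bit request vector.

```python
class EdgeDetector:
    """Detect 0 -> 1 transitions on each bit of a sampled request vector."""

    def __init__(self, width: int = 8):
        self.mask = (1 << width) - 1
        self.prev = 0  # request lines are assumed low after reset

    def sample(self, irq: int) -> int:
        """Sample the vector on a clock edge; return bits that just rose."""
        irq &= self.mask
        rising = irq & ~self.prev & self.mask
        self.prev = irq
        return rising


det = EdgeDetector()
events = [det.sample(v) for v in (0b001, 0b001, 0b000, 0b101)]
```

A request line that is held high produces a single event, so a peripheral must return its request signal to \code{0} before it can raise the same interrupt again.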
482
 
\section{Synthesis and optimization}
\label{sec:synthesis}

\subsection{Technology specific primitives}

The \lxp{} RTL design is described in behavioral VHDL. However, it can also benefit from certain special resources provided by most FPGA devices, namely, RAM blocks and dedicated multipliers. For improved portability, hardware description that can potentially be mapped to such resources is localized in separate design units:

\begin{itemize}
        \item \shellcmd{lxp32\_ram256x32} -- a dual-port synchronous $256 \times 32$ bit RAM with one write port and one read port;
        \item \shellcmd{lxp32\_mul16x16} -- an unsigned $16 \times 16$ multiplier with an output register.
\end{itemize}

These design units contain behavioral descriptions of the respective hardware that are recognizable by FPGA synthesis tools. Usually no adjustments are needed as the synthesizer will automatically infer an appropriate primitive from its behavioral description. If automatic inference produces unsatisfactory results, these design units can be replaced with library element wrappers. The same is true for ASIC logic synthesis software, which is unlikely to infer complex primitives.

\subsection{General optimization guidelines}

This subsection contains general advice on achieving satisfactory synthesis results regardless of the optimization goal. Some of these suggestions are also mentioned in other parts of this manual.

\begin{enumerate}
        \item If the technology doesn't provide dedicated multiplier resources, consider using the \code{"opt"} or \code{"seq"} multiplier architecture (Section \ref{sec:generics}).

        \item Ensure that the instruction bus has adequate throughput. For \lxp{}C, check that the slave supports the WISHBONE registered feedback signals [CTI\_I()] and [BTE\_I()].

        \item Multiplexing instruction and data buses, or connecting them to the same interconnect that allows only one master at a time to be active (i.e. \emph{shared bus} interconnect topology) is not recommended. If you absolutely must do so, assign a higher priority level to the data bus; otherwise instruction prefetches will massively slow down data transactions.
\end{enumerate}
 
\subsection{Optimizing for timing}

\begin{enumerate}
        \item Set up reasonable timing constraints. Do not overconstrain the design by more than 10--15~\%.

        \item Analyze the worst path. The natural \lxp{} timing bottleneck usually goes from the scratchpad (register file) output through the ALU (in the Execute stage) to the scratchpad input. If timing analysis lists other critical paths, the problem can lie elsewhere. If the \signal{rst\_i} signal becomes a bottleneck, promote it to a global network or, with SRAM-based FPGAs, consider operating without reset (see Section \ref{sec:clockreset}). Critical paths affecting the WISHBONE state machines could indicate problems with interconnect performance.

        \item Configure the synthesis tool to reduce the fanout limit. Note that setting this limit too low can have the opposite effect.

        \item Synthesis tools can support additional options to improve timing, such as the \emph{Retiming} algorithm which rearranges registers and combinatorial logic across the pipeline in an attempt to balance delays. The efficiency of such algorithms is not very predictable. In general, sloppy designs are the most likely to benefit from them, while for a carefully designed circuit timing can sometimes get worse.
\end{enumerate}

\subsection{Optimizing for area}

\begin{enumerate}
        \item Consider excluding the divider if not using it (see Section \ref{sec:generics}).

        \item Relaxing timing constraints can sometimes allow the synthesizer to produce a more area-efficient circuit.

        \item Increase the fanout limit in the synthesizer settings to reduce buffer replication.
\end{enumerate}
 
\chapter{Simulation}
\label{ch:simulation}

The \lxp{} package includes an automated verification environment (self-checking testbench) which verifies the functional correctness of the \lxp{} CPU. The environment consists of two major parts: a test platform, which is a SoC-like design providing peripherals for the CPU to interact with, and the testbench itself, which loads test firmware and monitors the platform's output signals. Like the CPU itself, the test environment is written in VHDL-93.

A separate testbench for the instruction cache (\shellcmd{lxp32\_icache}) is also provided. It can be invoked similarly to the main CPU testbench.

\section{Requirements}

The following software is required to simulate the \lxp{} design:

\begin{itemize}
        \item An HDL simulator supporting VHDL-93. The \lxp{} package includes scripts (makefiles) for the following simulators:

        \begin{itemize}
                \item GHDL -- a free and open-source VHDL simulator which supports multiple operating systems\footnote{\url{http://ghdl.free.fr/}};
                \item Mentor Graphics\textregistered{} ModelSim\textregistered{} simulator (\shellcmd{vsim});
                \item Xilinx\textregistered{} Vivado\textregistered{} Simulator (\shellcmd{xsim}).
        \end{itemize}

        With GHDL, a waveform viewer such as GTKWave is also recommended (Figure \ref{fig:gtkwave})\footnote{\url{http://gtkwave.sourceforge.net/}}.

        Some FPGA vendors provide limited versions of the ModelSim\textregistered{} simulator for free as part of their design suites. These versions should suffice for \lxp{} simulation.

        Other simulators can be used with some preparation (Section \ref{sec:simmanual}).

        \item GNU \shellcmd{make} and \shellcmd{coreutils} are needed to simulate the design using the provided makefiles. Under Microsoft\textregistered{} Windows\textregistered{}, MSYS or Cygwin can be used.
        \item The \lxp{} assembler/linker program (\shellcmd{lxp32asm}) must be present (Section \ref{sec:lxp32asm}). A prebuilt executable for Microsoft\textregistered{} Windows\textregistered{} is already included in the \lxp{} package; for other operating systems \shellcmd{lxp32asm} must be built from source (Section \ref{sec:buildfromsource}).
\end{itemize}

\begin{figure}[htbp]
        \centering
        \includegraphics[scale=0.65]{images/gtkwave.png}
        \caption{GTKWave displaying the \lxp{} waveform dump produced by GHDL}
        \label{fig:gtkwave}
\end{figure}
 
\section{Running simulation using makefiles}

To simulate the design, go to the \shellcmd{verify/lxp32/run/<\emph{simulator}>} directory and run \shellcmd{make}. The following make targets are supported:

\begin{itemize}
        \item \shellcmd{batch} -- simulate the design in batch mode. Results will be written to the standard output. This is the default target.
        \item \shellcmd{gui} -- simulate the design in GUI mode. Note: since GHDL doesn't have a GUI, the simulation itself will be run in batch mode; upon successful completion, GTKWave will be run automatically to display dumped waveforms.
        \item \shellcmd{compile} -- compile only, don't run simulation.
        \item \shellcmd{clean} -- delete all the produced artifacts.
\end{itemize}

\section{Running simulation manually}
\label{sec:simmanual}

The \lxp{} testbench can also be run manually. The following steps must be performed:

\begin{enumerate}
        \item Compile the test firmware in the \shellcmd{verify/lxp32/src/firmware} directory:

        \begin{codepar}
    lxp32asm -f textio \emph{filename}.asm -o \emph{filename}.ram
        \end{codepar}

        The produced \shellcmd{*.ram} files must be placed in the simulator's working directory.
        \item Compile the \lxp{} RTL description (\shellcmd{rtl} directory).
        \item Compile the test platform (\shellcmd{verify/lxp32/src/platform} directory).
        \item Compile the testbench itself (\shellcmd{verify/lxp32/src/tb} directory).
        \item Simulate the \shellcmd{tb} design unit defined in the \shellcmd{tb.vhd} file.
\end{enumerate}
 
\section{Testbench parameters}

Simulation parameters can be configured by overriding generics defined by the \shellcmd{tb} design unit:

\begin{itemize}
        \item \code{MODEL\_LXP32C} -- simulate the \lxp{}C version. By default, this option is set to \code{true}. If set to \code{false}, \lxp{}U is simulated instead.
        \item \code{TEST\_CASE} -- if set to a non-empty string, specifies the file name of a test case to run. If set to an empty string (default), all tests are executed.
        \item \code{THROTTLE\_DBUS} -- perform pseudo-random data bus throttling. By default, this option is set to \code{true}.
        \item \code{THROTTLE\_IBUS} -- perform pseudo-random instruction bus throttling. By default, this option is set to \code{true}.
        \item \code{VERBOSE} -- print more messages.
\end{itemize}
 
\chapter{Development tools}
\label{ch:developmenttools}

\section{\shellcmd{lxp32asm} -- Assembler and linker}
\label{sec:lxp32asm}

\shellcmd{lxp32asm} is a combined assembler and linker for the \lxp{} platform. It takes one or more input files and produces executable code for the CPU. Input files can be either source files in the \lxp{} assembly language (Appendix \ref{app:assemblylanguage}) or \emph{linkable objects}. A linkable object is a relocatable format for storing compiled \lxp{} code together with symbol information.

\shellcmd{lxp32asm} operates in two stages:

\begin{enumerate}
        \item Compile.

        Source files are compiled to linkable objects.

        \item Link.

        Linkable objects are combined into a single executable module. References to symbols defined in external modules are resolved at this stage.
\end{enumerate}

In the simplest case, there is only one input source file which doesn't contain external symbol references. If there are multiple input files, one of them must define the \code{entry} symbol at the beginning of the code.
 
\subsection{Command line syntax}

\begin{codepar}
    lxp32asm [ \emph{options} | \emph{input files} ]
\end{codepar}

Options supported by \shellcmd{lxp32asm} are listed below:

\begin{itemize}
        \item \shellcmd{-a \emph{align}} -- section alignment. Must be a multiple of 4; the default value is 4. Ignored in compile-only mode.

        \item \shellcmd{-b \emph{addr}} -- base address, that is, the address in memory where the executable image will be located. Must be a multiple of the section alignment. The default value is 0. Ignored in compile-only mode.

        \item \shellcmd{-c} -- compile only (skip the Link stage).

        \item \shellcmd{-f \emph{fmt}} -- select executable image format (see below for the list of supported formats). Ignored in compile-only mode.

        \item \shellcmd{-h}, \shellcmd{--help} -- display a short help message and exit.

        \item \shellcmd{-i \emph{dir}} -- add \emph{dir} to the list of directories used to search for included files. Multiple directories can be specified with multiple \shellcmd{-i} arguments.

        \item \shellcmd{-o \emph{file}} -- output file name.

        \item \shellcmd{-s \emph{size}} -- size of the executable image. Must be a multiple of 4. If the total code size is less than the specified value, the executable image is padded with zeros. By default, the image is not padded. This option is ignored in compile-only mode.

        \item \shellcmd{--} -- do not interpret subsequent command line arguments as options. Can be used if there are input file names starting with a dash.
\end{itemize}
 
\subsection{Output formats}

The following output formats are supported by \shellcmd{lxp32asm}:

\begin{itemize}
        \item \shellcmd{bin} -- raw binary image. This is the default format.
        \item \shellcmd{textio} -- text format representing binary data as a sequence of zeros and ones. This format can be directly read from VHDL (using the \code{std.textio} package) or Verilog\textregistered{} (using the \code{\$readmemb} function).
        \item \shellcmd{dec} -- text format representing each word as a decimal number.
        \item \shellcmd{hex} -- text format representing each word as a hexadecimal number.
\end{itemize}
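
To give an idea of what a \shellcmd{textio}-style image looks like, the following Python sketch converts a list of 32-bit words to text and back. The exact file layout is an assumption made for this example (one 32-bit word per line, most significant bit first); the function names are hypothetical and the output of \shellcmd{lxp32asm} should be treated as the authoritative reference.

```python
def words_to_textio(words):
    """Render 32-bit words as lines of 32 binary digits (assumed layout)."""
    return "".join(format(w & 0xFFFFFFFF, "032b") + "\n" for w in words)


def textio_to_words(text):
    """Parse the text produced by words_to_textio back into integers."""
    return [int(line, 2) for line in text.split()]
```

A file in this layout can be loaded into a memory model one line at a time, which is what makes the format convenient for testbenches.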
 
\section{\shellcmd{lxp32dump} -- Disassembler}

\shellcmd{lxp32dump} takes an executable image and produces a source file in \lxp{} assembly language. The produced file is a valid program that can be compiled by \shellcmd{lxp32asm}.

\subsection{Command line syntax}

\begin{codepar}
    lxp32dump [ \emph{options} | \emph{input file} ]
\end{codepar}

Supported options are:

\begin{itemize}
        \item \shellcmd{-b \emph{addr}} -- executable image base address, only used for comments.

        \item \shellcmd{-f \emph{fmt}} -- input file format. All \shellcmd{lxp32asm} output formats are supported. If this option is not supplied, autodetection is performed.

        \item \shellcmd{-h}, \shellcmd{--help} -- display a short help message and exit.

        \item \shellcmd{-o \emph{file}} -- output file name. By default, the standard output stream is used.

        \item \shellcmd{--} -- do not interpret subsequent command line arguments as options.
\end{itemize}
 
\section{\shellcmd{wigen} -- Interconnect generator}

\shellcmd{wigen} is a small tool that generates a VHDL description of a simple WISHBONE interconnect based on the shared bus topology. It supports any number of masters and slaves.

For interconnects with multiple masters a priority-based arbitration circuit is inserted with lower-numbered masters taking precedence. However, when a bus cycle is in progress ([CYC\_O] is asserted by the active master), the arbiter will not interrupt it even if a master with a higher priority level requests bus ownership.
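
The arbitration policy described above can be summarized by the following Python sketch. It is a behavioral model written for illustration, not the logic that \shellcmd{wigen} actually generates; the function name and the representation of the bus owner are assumptions.

```python
def arbitrate(requests, owner):
    """Perform one arbitration step and return the new bus owner.

    requests is a list of CYC_O levels, index 0 being the master with
    the highest priority. owner is the index of the master that
    currently owns the bus, or None if the bus is idle.
    """
    # A cycle in progress is never interrupted, regardless of priority.
    if owner is not None and requests[owner]:
        return owner
    # Otherwise the lowest-numbered requesting master wins.
    for index, requesting in enumerate(requests):
        if requesting:
            return index
    return None
```

In this model a higher-priority master gains ownership only after the active master has deasserted its request.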
 
\subsection{Command line syntax}

\begin{codepar}
        wigen [ \emph{option(s)} ] \emph{nm} \emph{ns} \emph{ma} \emph{sa} \emph{ps} [ \emph{pg} ]
\end{codepar}

\begin{itemize}
        \item\shellcmd{\emph{nm}} -- number of masters,
        \item\shellcmd{\emph{ns}} -- number of slaves,
        \item\shellcmd{\emph{ma}} -- master address width,
        \item\shellcmd{\emph{sa}} -- slave address width,
        \item\shellcmd{\emph{ps}} -- port size (8, 16, 32 or 64),
        \item\shellcmd{\emph{pg}} -- port granularity (8, 16, 32 or 64, default: the same as port size).
\end{itemize}

Supported options are:

\begin{itemize}
        \item \shellcmd{-e \emph{entity}} -- name of the design entity (default is \code{"intercon"}).

        \item \shellcmd{-h}, \shellcmd{--help} -- display a short help message and exit.

        \item \shellcmd{-o \emph{file}} -- output file name (default is \shellcmd{\emph{entity}.vhd}).

        \item \shellcmd{-p} -- generate pipelined arbiter (reduced combinatorial delays, increased latency).

        \item \shellcmd{-r} -- generate WISHBONE registered feedback signals ([CTI\_IO()] and [BTE\_IO()]).

        \item \shellcmd{-u} -- generate unsafe slave decoder (reduced combinatorial delays and resource usage, may not work properly if the address is invalid).
\end{itemize}
 
\section{Building from source}
\label{sec:buildfromsource}

Prebuilt tool executables for 32-bit Microsoft\textregistered{} Windows\textregistered{} are included in the \lxp{} IP core package. For other platforms the tools must be built from source. Since they are developed in \cplusplus{} using only the standard library, it should be possible to build them for any platform that provides a modern \cplusplus{} compiler.

\subsection{Requirements}

The following software is required to build \lxp{} tools from source:

\begin{enumerate}
        \item A modern \cplusplus{} compiler, such as Microsoft\textregistered{} Visual Studio\textregistered{} 2013 or newer, GCC 4.8 or newer, Clang 3.4 or newer.
        \item CMake 3.3 or newer.
\end{enumerate}
 
\subsection{Build procedure}

This software uses CMake as a build system generator. Building it involves two steps: first, the \shellcmd{cmake} program is invoked to generate a native build environment (a set of Makefiles or an IDE project); second, the generated environment is used to build the software.

\subsubsection{Examples}

In the following examples, it is assumed that the commands are run from the \shellcmd{tools} subdirectory of the \lxp{} IP core package tree.

For Microsoft\textregistered{} Visual Studio\textregistered{}:

\begin{codepar}
    mkdir build
    cd build
    cmake -G "NMake Makefiles" ../src
    nmake
    nmake install
\end{codepar}

For MSYS:

\begin{codepar}
    mkdir build
    cd build
    cmake -G "MSYS Makefiles" ../src
    make
    make install
\end{codepar}

For MinGW without MSYS:

\begin{codepar}
    mkdir build
    cd build
    cmake -G "MinGW Makefiles" ../src
    mingw32-make
    mingw32-make install
\end{codepar}

For other platforms:

\begin{codepar}
    mkdir build
    cd build
    cmake ../src
    make
    make install
\end{codepar}

More details can be found in the CMake documentation.
 
\appendix

\chapter{Instruction set reference}
\label{app:instructionset}

See Section \ref{sec:instructionformat} for a general description of \lxp{} instruction encoding.

\section{List of instructions by group}

\begin{ctabular}{lll}
        \toprule
        Instruction & Description & Opcode \\
        \midrule
        \tabcutin{3}{Data transfer} \\
        \midrule
        \hyperref[subsec:instr:mov]{\instr{mov}} & Move & alias for \code{\instr{add} dst, src, 0} \\
        \hyperref[subsec:instr:lc]{\instr{lc}} & Load Constant & \code{000001} \\
        \hyperref[subsec:instr:lw]{\instr{lw}} & Load Word & \code{001000} \\
        \hyperref[subsec:instr:lub]{\instr{lub}} & Load Unsigned Byte & \code{001010} \\
        \hyperref[subsec:instr:lsb]{\instr{lsb}} & Load Signed Byte & \code{001011} \\
        \hyperref[subsec:instr:sw]{\instr{sw}} & Store Word & \code{001100} \\
        \hyperref[subsec:instr:sb]{\instr{sb}} & Store Byte & \code{001110} \\
        \midrule
        \tabcutin{3}{Arithmetic operations} \\
        \midrule
        \hyperref[subsec:instr:add]{\instr{add}} & Add & \code{010000} \\
        \hyperref[subsec:instr:sub]{\instr{sub}} & Subtract & \code{010001} \\
        \hyperref[subsec:instr:mul]{\instr{mul}} & Multiply & \code{010010} \\
        \hyperref[subsec:instr:divu]{\instr{divu}} & Divide Unsigned & \code{010100} \\
        \hyperref[subsec:instr:divs]{\instr{divs}} & Divide Signed & \code{010101} \\
        \hyperref[subsec:instr:modu]{\instr{modu}} & Modulo Unsigned & \code{010110} \\
        \hyperref[subsec:instr:mods]{\instr{mods}} & Modulo Signed & \code{010111} \\
        \midrule
        \tabcutin{3}{Bitwise operations} \\
        \midrule
        \hyperref[subsec:instr:not]{\instr{not}} & Bitwise Not & alias for \code{\instr{xor} dst, src, -1} \\
        \hyperref[subsec:instr:and]{\instr{and}} & Bitwise And & \code{011000} \\
        \hyperref[subsec:instr:or]{\instr{or}} & Bitwise Or & \code{011001} \\
        \hyperref[subsec:instr:xor]{\instr{xor}} & Bitwise Exclusive Or & \code{011010} \\
        \hyperref[subsec:instr:sl]{\instr{sl}} & Shift Left & \code{011100} \\
        \hyperref[subsec:instr:sru]{\instr{sru}} & Shift Right Unsigned & \code{011110} \\
        \hyperref[subsec:instr:srs]{\instr{srs}} & Shift Right Signed & \code{011111} \\
        \midrule
        \tabcutin{3}{Execution transfer} \\
        \midrule
        \hyperref[subsec:instr:jmp]{\instr{jmp}} & Jump & \code{100000} \\
        \hyperref[subsec:instr:cjmpxxx]{\instr{cjmp\emph{xxx}}} & Compare and Jump & \code{11\emph{xxxx}} (\code{\emph{xxxx}} = condition) \\
        \hyperref[subsec:instr:call]{\instr{call}} & Call Procedure & \code{100001} \\
        \hyperref[subsec:instr:ret]{\instr{ret}} & Return from Procedure & alias for \code{\instr{jmp} rp} \\
        \hyperref[subsec:instr:iret]{\instr{iret}} & Interrupt Return & alias for \code{\instr{jmp} irp} \\
        \midrule
        \tabcutin{3}{Miscellaneous instructions} \\
        \midrule
        \hyperref[subsec:instr:nop]{\instr{nop}} & No Operation & \code{000000} \\
        \hyperref[subsec:instr:hlt]{\instr{hlt}} & Halt & \code{000010} \\
\end{ctabular}
 
\section{Alphabetical list of instructions}

\settocdepth{subsection}

{
\setlength{\parindent}{0pt}
\nonzeroparskip

\subsection{\instr{add} -- Add}
\label{subsec:instr:add}

\subsubsection{Syntax}

\code{\instr{add} DST, RD1, RD2}

\subsubsection{Encoding}

\code{010000 T1 T2 DST RD1 RD2}

Example: \code{\instr{add} r2, r1, 10} $\rightarrow$ \code{0x4202010A}

\subsubsection{Operation}

\code{DST := RD1 + RD2}
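
The encoding examples in this appendix can be reproduced with a small Python sketch that packs the fields in the order shown above: a 6-bit opcode, the T1 and T2 bits, and three 8-bit fields. The helper names are hypothetical. The sketch assumes, consistently with the examples given here, that a type bit of \code{1} marks a register operand and \code{0} marks an 8-bit signed immediate; Section \ref{sec:instructionformat} remains the authoritative description.

```python
def reg(number):
    """Register operand: returns (type_bit, 8-bit field)."""
    if not 0 <= number <= 255:
        raise ValueError("register number out of range")
    return 1, number


def imm(value):
    """Immediate operand in [-128; 127]: returns (type_bit, 8-bit field)."""
    if not -128 <= value <= 127:
        raise ValueError("immediate out of range")
    return 0, value & 0xFF


def encode(opcode, dst, rd1, rd2):
    """Pack OPCODE T1 T2 DST RD1 RD2 into a 32-bit instruction word."""
    t1, field1 = rd1
    t2, field2 = rd2
    return ((opcode << 26) | (t1 << 25) | (t2 << 24)
            | (dst << 16) | (field1 << 8) | field2)
```

With these helpers, \code{encode(0b010000, 2, reg(1), imm(10))} yields \code{0x4202010A}, matching the \instr{add} example.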
 
\subsection{\instr{and} -- Bitwise And}
\label{subsec:instr:and}

\subsubsection{Syntax}

\code{\instr{and} DST, RD1, RD2}

\subsubsection{Encoding}

\code{011000 T1 T2 DST RD1 RD2}

Example: \code{\instr{and} r2, r1, 0x3F} $\rightarrow$ \code{0x6202013F}

\subsubsection{Operation}

\code{DST := RD1 $\land$ RD2}

\subsection{\instr{call} -- Call Procedure}
\label{subsec:instr:call}

Save a pointer to the next instruction in the \code{rp} register and transfer execution to the address pointed to by the operand.

\subsubsection{Syntax}

\code{\instr{call} RD1}

\subsubsection{Encoding}

\code{100001 1 0 11111110 RD1 00000000}

RD1 must be a register.

Example: \code{\instr{call} r1} $\rightarrow$ \code{0x86FE0100}

\subsubsection{Operation}

\code{rp := \emph{return\_address}}

\code{goto RD1}

Pointer in RD1 is interpreted as described in Section \ref{sec:addressing}.
 
\subsection{\instr{cjmp\emph{xxx}} -- Compare and Jump}
\label{subsec:instr:cjmpxxx}

Compare two operands and transfer execution to the specified address if a condition is satisfied.

\subsubsection{Syntax}

\code{\instr{cjmpe} DST, RD1, RD2} (Equal)

\code{\instr{cjmpne} DST, RD1, RD2} (Not Equal)

\code{\instr{cjmpsg} DST, RD1, RD2} (Signed Greater)

\code{\instr{cjmpsge} DST, RD1, RD2} (Signed Greater or Equal)

\code{\instr{cjmpsl} DST, RD1, RD2} (Signed Less)

\code{\instr{cjmpsle} DST, RD1, RD2} (Signed Less or Equal)

\code{\instr{cjmpug} DST, RD1, RD2} (Unsigned Greater)

\code{\instr{cjmpuge} DST, RD1, RD2} (Unsigned Greater or Equal)

\code{\instr{cjmpul} DST, RD1, RD2} (Unsigned Less)

\code{\instr{cjmpule} DST, RD1, RD2} (Unsigned Less or Equal)

\subsubsection{Encoding}

\code{OPCODE T1 T2 DST RD1 RD2}

Opcodes:

\begin{tabularx}{\textwidth}{lL}
\instr{cjmpe}   & \code{111000} \\
\instr{cjmpne}  & \code{110100} \\
\instr{cjmpsg}  & \code{110001} \\
\instr{cjmpsge} & \code{111001} \\
\instr{cjmpug}  & \code{110010} \\
\instr{cjmpuge} & \code{111010} \\
\end{tabularx}

\instr{cjmpsl}, \instr{cjmpsle}, \instr{cjmpul}, \instr{cjmpule} are aliases for \instr{cjmpsg}, \instr{cjmpsge}, \instr{cjmpug}, \instr{cjmpuge}, respectively, with RD1 and RD2 operands swapped.

Example: \code{\instr{cjmpuge} r2, r1, 5} $\rightarrow$ \code{0xEA020105}

\subsubsection{Operation}

\code{if \emph{condition} then goto DST}

Pointer in DST is interpreted as described in Section \ref{sec:addressing}. Unlike most instructions, \instr{cjmp\emph{xxx}} does not write to DST.
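
The way the ``less'' mnemonics map onto the six real opcodes can be sketched in Python as follows. The tables are copied from the opcode list above; the function name and the representation of operands as plain strings are assumptions made for this example, and the sketch does not claim to mirror how \shellcmd{lxp32asm} is implemented.

```python
CJMP_OPCODES = {
    "cjmpe": 0b111000, "cjmpne": 0b110100,
    "cjmpsg": 0b110001, "cjmpsge": 0b111001,
    "cjmpug": 0b110010, "cjmpuge": 0b111010,
}

# Each alias maps to the instruction that tests the mirrored condition.
CJMP_ALIASES = {
    "cjmpsl": "cjmpsg", "cjmpsle": "cjmpsge",
    "cjmpul": "cjmpug", "cjmpule": "cjmpuge",
}


def resolve_cjmp(mnemonic, dst, rd1, rd2):
    """Return (opcode, dst, rd1, rd2) with aliases resolved."""
    if mnemonic in CJMP_ALIASES:
        mnemonic = CJMP_ALIASES[mnemonic]
        rd1, rd2 = rd2, rd1   # a < b is the same test as b > a
    return CJMP_OPCODES[mnemonic], dst, rd1, rd2
```

For instance, \code{cjmpul r2, r1, r3} resolves to the \instr{cjmpug} opcode with the operands \code{r3} and \code{r1} in swapped order.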
 
\subsection{\instr{divs} -- Divide Signed}
\label{subsec:instr:divs}

\subsubsection{Syntax}

\code{\instr{divs} DST, RD1, RD2}

\subsubsection{Encoding}

\code{010101 T1 T2 DST RD1 RD2}

Example: \code{\instr{divs} r2, r1, -3} $\rightarrow$ \code{0x560201FD}

\subsubsection{Operation}

\code{DST := (\emph{signed}) RD1 / (\emph{signed}) RD2}

The result is rounded towards zero and is undefined if RD2 is zero. If the CPU was configured without a divider, this instruction returns \code{0}.
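
Rounding towards zero differs from the floor division found in some high-level languages, so a reference model can be useful when writing tests. The following Python sketch is one such model; the function names are hypothetical, operands and results are 32-bit register values, and division by zero is reported as an error because the hardware result is undefined.

```python
MASK32 = 0xFFFFFFFF


def to_signed(value):
    """Interpret a 32-bit register value as a two's complement integer."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value


def divs(rd1, rd2):
    """Signed division rounded towards zero, returned as a 32-bit value."""
    a, b = to_signed(rd1), to_signed(rd2)
    if b == 0:
        raise ZeroDivisionError("result is undefined if RD2 is zero")
    quotient = abs(a) // abs(b)       # divide magnitudes, then fix the sign
    if (a < 0) != (b < 0):
        quotient = -quotient
    return quotient & MASK32
```

Note that Python's own \code{//} operator would give $-3$ for $7 / (-3)$, whereas the model above gives $-2$.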
 
\subsection{\instr{divu} -- Divide Unsigned}
\label{subsec:instr:divu}

\subsubsection{Syntax}

\code{\instr{divu} DST, RD1, RD2}

\subsubsection{Encoding}

\code{010100 T1 T2 DST RD1 RD2}

Example: \code{\instr{divu} r2, r1, 7} $\rightarrow$ \code{0x52020107}

\subsubsection{Operation}

\code{DST := RD1 / RD2}

The result is rounded towards zero and is undefined if RD2 is zero. If the CPU was configured without a divider, this instruction returns \code{0}.

\subsection{\instr{hlt} -- Halt}
\label{subsec:instr:hlt}

Wait for an interrupt.

\subsubsection{Syntax}

\code{\instr{hlt}}

\subsubsection{Encoding}

\code{000010 0 0 00000000 00000000 00000000}

\subsubsection{Operation}

Pause execution until an interrupt is received.

\subsection{\instr{jmp} -- Jump}
\label{subsec:instr:jmp}

Transfer execution to the address pointed to by the operand.

\subsubsection{Syntax}

\code{\instr{jmp} RD1}

\subsubsection{Encoding}

\code{100000 1 0 00000000 RD1 00000000}

RD1 must be a register.

Example: \code{\instr{jmp} r1} $\rightarrow$ \code{0x82000100}

\subsubsection{Operation}

\code{goto RD1}

Pointer in RD1 is interpreted as described in Section \ref{sec:addressing}.

\subsection{\instr{iret} -- Interrupt Return}
\label{subsec:instr:iret}

Return from an interrupt handler.

\subsubsection{Syntax}

\instr{iret}

Alias for \code{\instr{jmp} irp}.
 
\subsection{\instr{lc} -- Load Constant}
\label{subsec:instr:lc}

Load a 32-bit word into the specified register. Note that values in the [-128; 127] range can be loaded more efficiently using the \instr{mov} instruction alias.

\subsubsection{Syntax}

\code{\instr{lc} DST, WORD32}

\subsubsection{Encoding}

\code{000001 0 0 DST 00000000 00000000 WORD32}

Unlike other instructions, \instr{lc} occupies two 32-bit words.

Example: \code{\instr{lc} r1, 0x12345678} $\rightarrow$ \code{0x04010000 0x12345678}

\subsubsection{Operation}

\code{DST := WORD32}
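
Because \instr{lc} is the only instruction that occupies two words, it is worth showing its encoding separately. The Python sketch below follows the field layout given above; the function name is hypothetical and the code is an illustration rather than an excerpt from the \lxp{} tools.

```python
def encode_lc(dst, word32):
    """Return the two 32-bit words that encode `lc DST, WORD32`."""
    if not 0 <= dst <= 255:
        raise ValueError("register number out of range")
    # First word: opcode 000001, T1 = T2 = 0, DST, two zero fields.
    first = (0b000001 << 26) | (dst << 16)
    # Second word: the constant itself.
    return [first, word32 & 0xFFFFFFFF]
```

Calling \code{encode\_lc(1, 0x12345678)} gives the pair \code{0x04010000}, \code{0x12345678} from the example above.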
 
\subsection{\instr{lsb} -- Load Signed Byte}
\label{subsec:instr:lsb}

Load a byte from the specified address to the register, performing sign extension.

\subsubsection{Syntax}

\code{\instr{lsb} DST, RD1}

\subsubsection{Encoding}

\code{001011 1 0 DST RD1 00000000}

RD1 must be a register.

Example: \code{\instr{lsb} r2, r1} $\rightarrow$ \code{0x2E020100}

\subsubsection{Operation}

\code{DST := (\emph{signed}) (*(BYTE*)RD1)}

Pointer in RD1 is interpreted as described in Section \ref{sec:addressing}.
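
The difference between a sign-extending and a zero-extending byte load can be shown with two one-line helpers. This Python sketch models only the value written to DST once the byte has been fetched; the function names are hypothetical.

```python
def sign_extend_byte(byte):
    """Value loaded by a sign-extending byte load, as a 32-bit word."""
    byte &= 0xFF
    # Replicate bit 7 into the upper 24 bits.
    return (byte | 0xFFFFFF00) if byte & 0x80 else byte


def zero_extend_byte(byte):
    """Value loaded by a zero-extending byte load, as a 32-bit word."""
    return byte & 0xFF
```

A byte of \code{0xFD} therefore reads as \code{0xFFFFFFFD} ($-3$) with sign extension and as \code{0x000000FD} (253) without it.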
 
\subsection{\instr{lub} -- Load Unsigned Byte}
\label{subsec:instr:lub}

Load a byte from the specified address to the register. The upper 24 bits are zeroed.

\subsubsection{Syntax}

\code{\instr{lub} DST, RD1}

\subsubsection{Encoding}

\code{001010 1 0 DST RD1 00000000}

RD1 must be a register.

Example: \code{\instr{lub} r2, r1} $\rightarrow$ \code{0x2A020100}

\subsubsection{Operation}

\code{DST := *(BYTE*)RD1}

Pointer in RD1 is interpreted as described in Section \ref{sec:addressing}.

\subsection{\instr{lw} -- Load Word}
\label{subsec:instr:lw}

Load a word from the specified address to the register.

\subsubsection{Syntax}

\code{\instr{lw} DST, RD1}

\subsubsection{Encoding}

\code{001000 1 0 DST RD1 00000000}

RD1 must be a register.

Example: \code{\instr{lw} r2, r1} $\rightarrow$ \code{0x22020100}

\subsubsection{Operation}

\code{DST := *RD1}

Pointer in RD1 is interpreted as described in Section \ref{sec:addressing}.
 
\subsection{\instr{mods} -- Modulo Signed}
\label{subsec:instr:mods}

\subsubsection{Syntax}

\code{\instr{mods} DST, RD1, RD2}

\subsubsection{Encoding}

\code{010111 T1 T2 DST RD1 RD2}

Example: \code{\instr{mods} r2, r1, 10} $\rightarrow$ \code{0x5E02010A}

\subsubsection{Operation}

\code{DST := (\emph{signed}) RD1 mod (\emph{signed}) RD2}

The modulo operation satisfies the following condition: if $Q=A/B$ and $R=A \bmod B$, then $A=B \cdot Q+R$.

The result is undefined if RD2 is zero. If the CPU was configured without a divider, this instruction returns \code{0}.
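
Since the quotient is rounded towards zero, the identity $A=B \cdot Q+R$ implies that a nonzero remainder takes the sign of the dividend. The Python sketch below computes the remainder directly from that identity; the function names are hypothetical, and the model is an illustration rather than a description of the divider hardware.

```python
MASK32 = 0xFFFFFFFF


def to_signed(value):
    """Interpret a 32-bit register value as a two's complement integer."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value


def mods(rd1, rd2):
    """Signed remainder R such that A = B * Q + R, Q rounded towards zero."""
    a, b = to_signed(rd1), to_signed(rd2)
    if b == 0:
        raise ZeroDivisionError("result is undefined if RD2 is zero")
    quotient = abs(a) // abs(b)
    if (a < 0) != (b < 0):
        quotient = -quotient
    return (a - b * quotient) & MASK32
```

For example, $-7 \bmod 3$ evaluates to $-1$ and $7 \bmod (-3)$ evaluates to $1$ in this model.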
 
\subsection{\instr{modu} -- Modulo Unsigned}
\label{subsec:instr:modu}

\subsubsection{Syntax}

\code{\instr{modu} DST, RD1, RD2}

\subsubsection{Encoding}

\code{010110 T1 T2 DST RD1 RD2}

Example: \code{\instr{modu} r2, r1, 10} $\rightarrow$ \code{0x5A02010A}

\subsubsection{Operation}

\code{DST := RD1 mod RD2}

The modulo operation satisfies the following condition: if $Q=A/B$ and $R=A \bmod B$, then $A=B \cdot Q+R$.

The result is undefined if RD2 is zero. If the CPU was configured without a divider, this instruction returns \code{0}.

\subsection{\instr{mov} -- Move}
\label{subsec:instr:mov}

\subsubsection{Syntax}

\code{\instr{mov} DST, RD1}

Alias for \code{\instr{add} DST, RD1, 0}.
 
\subsection{\instr{mul} -- Multiply}
\label{subsec:instr:mul}

Multiply two 32-bit values. The result is also 32-bit.

\subsubsection{Syntax}

\code{\instr{mul} DST, RD1, RD2}

\subsubsection{Encoding}

\code{010010 T1 T2 DST RD1 RD2}

Example: \code{\instr{mul} r2, r1, 3} $\rightarrow$ \code{0x4A020103}

\subsubsection{Operation}

\code{DST := RD1 * RD2}

Since the product width is the same as the operand width, the result of a multiplication does not depend on operand signedness.
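
The independence from operand signedness can be checked numerically. In the Python sketch below (function names are hypothetical), the same pair of register values is multiplied once as unsigned and once as signed integers, and the lower 32 bits of both products are compared.

```python
MASK32 = 0xFFFFFFFF


def to_signed(value):
    """Interpret a 32-bit register value as a two's complement integer."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value


def mul_unsigned(rd1, rd2):
    """Lower 32 bits of the product of two unsigned operands."""
    return ((rd1 & MASK32) * (rd2 & MASK32)) & MASK32


def mul_signed(rd1, rd2):
    """Lower 32 bits of the product of two signed operands."""
    return (to_signed(rd1) * to_signed(rd2)) & MASK32
```

Both functions return \code{0xFFFFFFF1} for the operands \code{0xFFFFFFFD} and \code{5}, which reads as $-15$ when the result is interpreted as signed.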
 
1223
\subsection{\instr{nop} -- No Operation}
1224
\label{subsec:instr:nop}
1225
 
1226
\subsubsection{Syntax}
1227
 
1228
\instr{nop}
1229
 
1230
\subsubsection{Encoding}
1231
 
1232
\code{000000 0 0 00000000 00000000 00000000}
1233
 
1234
\subsubsection{Operation}
1235
 
1236
This instruction does not alter the machine state.
1237
 
1238
\subsection{\instr{not} -- Bitwise Not}
1239
\label{subsec:instr:not}
1240
 
1241
\subsubsection{Syntax}
1242
 
1243
\code{\instr{not} DST, RD1}
1244
 
1245
Alias for \code{\instr{xor} DST, RD1, -1}.
1246
 
1247
\subsection{\instr{or} -- Bitwise Or}
1248
\label{subsec:instr:or}
1249
 
1250
\subsubsection{Syntax}
1251
 
1252
\code{\instr{or} DST, RD1, RD2}
1253
 
1254
\subsubsection{Encoding}
1255
 
1256
\code{011001 T1 T2 DST RD1 RD2}
1257
 
1258
Example: \code{\instr{or} r2, r1, 0x3F} $\rightarrow$ \code{0x6602013F}
1259
 
1260
\subsubsection{Operation}
1261
 
1262
\code{DST := RD1 $\lor$ RD2}
1263
 
1264
\subsection{\instr{ret} -- Return from Procedure}
1265
\label{subsec:instr:ret}
1266
 
1267
Return from a procedure.
1268
 
1269
\subsubsection{Syntax}
1270
 
1271
\instr{ret}
1272
 
1273
Alias for \code{\instr{jmp} rp}.

\subsection{\instr{sb} -- Store Byte}
\label{subsec:instr:sb}

Store the lowest byte from the register to the specified address.

\subsubsection{Syntax}

\code{\instr{sb} RD1, RD2}

\subsubsection{Encoding}

\code{001110 1 T2 00000000 RD1 RD2}

RD1 must be a register.

Example: \code{\instr{sb} r2, r1} $\rightarrow$ \code{0x3B000201}

\subsubsection{Operation}

\code{*(BYTE*)RD1 := RD2 $\land$ 0x000000FF}

The pointer in RD1 is interpreted as described in Section \ref{sec:addressing}.
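
A minimal usage sketch follows. The \code{buf} symbol, the register choices and the stored value are hypothetical and serve only as an illustration:

\begin{codepar}
    \instr{lc} r1, buf          \emph{// r1 := address of buf}
    \instr{lc} r2, 0x12345678
    \instr{sb} r1, r2           \emph{// stores 0x78, the lowest byte of r2}
\emph{// ...}
buf:
    \instr{.reserve} 4
\end{codepar}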

\subsection{\instr{sl} -- Shift Left}
\label{subsec:instr:sl}

\subsubsection{Syntax}

\code{\instr{sl} DST, RD1, RD2}

\subsubsection{Encoding}

\code{011100 T1 T2 DST RD1 RD2}

Example: \code{\instr{sl} r2, r1, 5} $\rightarrow$ \code{0x72020105}

\subsubsection{Operation}

\code{DST := RD1 << RD2}

The result is undefined if RD2 is outside the [0; 31] range.

\subsection{\instr{srs} -- Shift Right Signed}
\label{subsec:instr:srs}

\subsubsection{Syntax}

\code{\instr{srs} DST, RD1, RD2}

\subsubsection{Encoding}

\code{011111 T1 T2 DST RD1 RD2}

Example: \code{\instr{srs} r2, r1, 5} $\rightarrow$ \code{0x7E020105}

\subsubsection{Operation}

\code{DST := ((\emph{signed}) RD1) >> RD2}

The result is undefined if RD2 is outside the [0; 31] range.

\subsection{\instr{sru} -- Shift Right Unsigned}
\label{subsec:instr:sru}

\subsubsection{Syntax}

\code{\instr{sru} DST, RD1, RD2}

\subsubsection{Encoding}

\code{011110 T1 T2 DST RD1 RD2}

Example: \code{\instr{sru} r2, r1, 5} $\rightarrow$ \code{0x7A020105}

\subsubsection{Operation}

\code{DST := RD1 >> RD2}

The result is undefined if RD2 is outside the [0; 31] range.
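
The difference between \instr{sru} and \instr{srs} (Section \ref{subsec:instr:srs}) shows up only when the most significant bit of RD1 is set. As an illustrative sketch, assume that \code{r1} holds \code{0x80000000}:

\begin{codepar}
    \instr{srs} r2, r1, 4    \emph{// r2 := 0xF8000000 (sign bit is replicated)}
    \instr{sru} r3, r1, 4    \emph{// r3 := 0x08000000 (vacated bits are zero)}
\end{codepar}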

\subsection{\instr{sub} -- Subtract}
\label{subsec:instr:sub}

\subsubsection{Syntax}

\code{\instr{sub} DST, RD1, RD2}

\subsubsection{Encoding}

\code{010001 T1 T2 DST RD1 RD2}

Example: \code{\instr{sub} r2, r1, 5} $\rightarrow$ \code{0x46020105}

\subsubsection{Operation}

\code{DST := RD1 - RD2}

\subsection{\instr{sw} -- Store Word}
\label{subsec:instr:sw}

Store the value of the register to the specified address.

\subsubsection{Syntax}

\code{\instr{sw} RD1, RD2}

\subsubsection{Encoding}

\code{001100 1 T2 00000000 RD1 RD2}

RD1 must be a register.

Example: \code{\instr{sw} r2, r1} $\rightarrow$ \code{0x33000201}

\subsubsection{Operation}

\code{*RD1 := RD2}

The pointer in RD1 is interpreted as described in Section \ref{sec:addressing}.

\subsection{\instr{xor} -- Bitwise Exclusive Or}
\label{subsec:instr:xor}

\subsubsection{Syntax}

\code{\instr{xor} DST, RD1, RD2}

\subsubsection{Encoding}

\code{011010 T1 T2 DST RD1 RD2}

Example: \code{\instr{xor} r2, r1, 0x3F} $\rightarrow$ \code{0x6A02013F}

\subsubsection{Operation}

\code{DST := RD1 $\oplus$ RD2}

}

\settocdepth{section}

\chapter{Instruction cycle counts}

Cycle counts for \lxp{} instructions are listed in Table \ref{tab:cycles}. These values can change in future hardware revisions.

\begin{table}[htbp]
        \centering
        \caption{Instruction cycle counts}
        \label{tab:cycles}
        \begin{tabularx}{0.8\textwidth}{LLLL}
                \toprule
                Instruction & Cycle count & Instruction & Cycle count \\
                \midrule
                \instr{add} & 1 & \instr{modu} & 37 \\
                \instr{and} & 1 & \instr{mov} & 1 \\
                \instr{call} & $\ge$ 4\footnotemark[1] & \instr{mul} & 2, 6 or 34\footnotemark[3] \\
                \instr{cjmp\emph{xxx}} & $\ge$ 5\footnotemark[1] & \instr{nop} & 1 \\
                \instr{divs} & 37 & \instr{not} & 1 \\
                \instr{divu} & 37 & \instr{or} & 1 \\
                \instr{hlt} & N/A & \instr{ret} & $\ge$ 4\footnotemark[1] \\
                \instr{jmp} & $\ge$ 4\footnotemark[1] & \instr{sb} & $\ge$ 2\footnotemark[2] \\
                \instr{iret} & $\ge$ 4\footnotemark[1] & \instr{sl} & 2 \\
                \instr{lc} & 2 & \instr{srs} & 2 \\
                \instr{lsb} & $\ge$ 3\footnotemark[2] & \instr{sru} & 2 \\
                \instr{lub} & $\ge$ 3\footnotemark[2] & \instr{sub} & 1 \\
                \instr{lw} & $\ge$ 3\footnotemark[2] & \instr{sw} & $\ge$ 2\footnotemark[2] \\
                \instr{mods} & 37 & \instr{xor} & 1 \\
                \bottomrule
        \end{tabularx}
\end{table}

\footnotetext[1]{Depends on instruction bus latency. Includes pipeline flushing overhead.}
\footnotetext[2]{Depends on data bus latency.}
\footnotetext[3]{Depends on multiplier architecture set with the \code{MUL\_ARCH} generic. See Section \ref{sec:generics}.}

\chapter{LXP32 assembly language}
\label{app:assemblylanguage}

This appendix defines the assembly language used by \lxp{} development tools.

\section{Comments}

\lxp{} assembly language supports C-style comments, which can span multiple lines, and single-line \cplusplus{}-style comments:

\begin{codepar}\itshape
    /*
     * This is a comment.
     */

    // This is also a comment
\end{codepar}

From a parser's point of view, comments are equivalent to whitespace.

\section{Literals}

\lxp{} assembly language uses numeric and string literals similar to those provided by the C programming language.

Numeric literals can take the form of decimal, hexadecimal or octal numbers. Literals prefixed with \code{0x} are interpreted as hexadecimal, literals prefixed with \code{0} are interpreted as octal, and all other literals are interpreted as decimal. A numeric literal can also start with a unary plus or minus sign, which is considered a part of the literal.

String literals must be enclosed in double quotes. The most common escape sequences used in C are supported (Table \ref{tab:stringescape}).

\begin{table}[htbp]
        \caption{Escape sequences used in string literals}
        \label{tab:stringescape}
        \begin{tabularx}{\textwidth}{lL}
                \toprule
                Sequence & Interpretation \\
                \midrule
                \code{\textbackslash\textbackslash} & Backslash character \\
                \code{\textbackslash "} & Double quotation mark \\
                \code{\textbackslash '} & Single quotation mark (can also be used directly) \\
                \code{\textbackslash t} & Tabulation character \\
                \code{\textbackslash n} & Line feed \\
                \code{\textbackslash r} & Carriage return \\
                \code{\textbackslash x\emph{XX}} & Character with a hexadecimal code of \emph{XX} (1--2 digits) \\
                \code{\textbackslash \emph{XXX}} & Character with an octal code of \emph{XXX} (1--3 digits) \\
                \bottomrule
        \end{tabularx}
\end{table}
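
The literal rules above can be illustrated with the following sketch, which uses the \instr{.word} and \instr{.byte} data definition statements of this assembly language. The values are arbitrary:

\begin{codepar}
    \instr{.word} 255, 0xFF, 0377   \emph{// the same value in decimal, hexadecimal and octal}
    \instr{.word} -1                \emph{// the minus sign is a part of the literal}
    \instr{.byte} "Line 1\textbackslash n", 0      \emph{// string ending with a line feed, then a zero byte}
\end{codepar}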

\section{Symbols}
\label{sec:symbols}

Symbols are used to refer to data or code locations. \lxp{} assembly language does not have distinct code labels and variable declarations: symbols are used in both these contexts.

Symbol names must be valid identifiers. A valid identifier must start with an alphabetic character or an underscore, and may contain alphanumeric characters and underscores.

A symbol definition must be the first token in a source code line followed by a colon. A symbol definition can occupy a separate line (in which case it refers to the following statement). Alternatively, a statement can follow the symbol definition on the same line.

A special \code{entry} symbol is used to inform the linker about the program entry point if there are multiple input files. If defined, this symbol must precede the first instruction or data definition statement in the module.

Symbols can be used as operands to the \instr{lc} instruction statement. A symbol reference can end with a \code{@\emph{n}} sequence, where \code{\emph{n}} is a numeric literal; in this case it is interpreted as an offset (in bytes) relative to the symbol definition. To refer to symbols defined in other modules, they must first be declared external using the \instr{\#extern} directive.

\begin{codeparbreakable}
    \instr{lc} r10, jump\_label
    \instr{lc} r11, data\_word
\emph{// ...}
    \instr{sw} r11, r0 \emph{// store the value of r0 to the}
               \emph{// location pointed to by data\_word}
    \instr{jmp} r10    \emph{// transfer execution to jump\_label}
\emph{// ...}
jump\_label:
    \instr{mov} r1, r0
\emph{// ...}
data\_word:
    \instr{.word} 0x12345678
\end{codeparbreakable}
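
The offset syntax and external symbols can be sketched in the same way. Here \code{table} and \code{ext\_func} are hypothetical names, and \code{ext\_func} is assumed to be defined in another module:

\begin{codepar}
\instr{\#extern} ext\_func
    \instr{lc} r1, table@4    \emph{// address of the second word of table}
    \instr{lw} r2, r1         \emph{// r2 := 0x22222222}
    \instr{lc} r3, ext\_func
    \instr{call} r3
\emph{// ...}
table:
    \instr{.word} 0x11111111, 0x22222222
\end{codepar}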

\section{Statements}

Each statement occupies a single source code line. There are three kinds of statements:

\begin{itemize}
        \item \emph{Directives} provide directions for the assembler that do not directly cause code generation.
        \item \emph{Data definition statements} insert arbitrary data into the generated code.
        \item \emph{Instruction statements} insert \lxp{} CPU instructions into the generated code.
\end{itemize}

\subsection{Directives}

The first token of a directive statement always starts with the \code{\#} character.

\begin{codepar}
\instr{\#define} \emph{identifier} \emph{token} [ \emph{token} ... ]
\end{codepar}

Defines a macro that will be substituted with one or more tokens. The \code{\emph{identifier}} must satisfy the requirements listed in Section \ref{sec:symbols}. Tokens can be anything, including keywords, identifiers, literals and separators (i.e. comma and colon characters).
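
For instance, a macro can give a register a descriptive name. In this sketch \code{counter} is an arbitrary identifier chosen for illustration:

\begin{codepar}
\instr{\#define} counter r16
    \instr{mov} counter, 0           \emph{// assembled as mov r16, 0}
    \instr{add} counter, counter, 1  \emph{// assembled as add r16, r16, 1}
\end{codepar}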

\begin{codepar}
\instr{\#extern} \emph{identifier}
\end{codepar}

Declares \code{\emph{identifier}} as an external symbol. Used to refer to symbols defined in other modules.

\begin{codepar}
\instr{\#include} \emph{filename}
\end{codepar}

Processes the contents of \code{\emph{filename}} as if they were literally inserted at the point of the \instr{\#include} directive. \code{\emph{filename}} must be a string literal.

\begin{codepar}
\instr{\#message} \emph{msg}
\end{codepar}

Prints \code{\emph{msg}} to the standard output stream. \code{\emph{msg}} must be a string literal.

\subsection{Data definition statements}

The first token of a data definition statement always starts with the \code{.} (period) character.

\begin{codepar}
\instr{.align} [ \emph{alignment} ]
\end{codepar}

Ensures that code generated by the next data definition or instruction statement is aligned to a multiple of \code{\emph{alignment}} bytes, inserting padding zeros if needed. Default \code{\emph{alignment}} is 4. Instructions and words are always at least word-aligned; the \instr{.align} statement can be used to align them to a larger boundary, or to align byte data (see below).

\begin{codepar}
\instr{.byte} \emph{token} [, \emph{token} ... ]
\end{codepar}

Inserts one or more bytes into the output code. Each \code{\emph{token}} can be either a numeric literal with a valid range of [-128; 255] or a string literal. By default, bytes are not aligned.

\begin{codepar}
\instr{.reserve} \emph{n}
\end{codepar}

Inserts \code{\emph{n}} zero bytes into the output code.

\begin{codepar}
\instr{.word} \emph{token} [, \emph{token} ... ]
\end{codepar}

Inserts one or more 32-bit words into the output code. Tokens must be numeric literals.
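
A short sketch combining the data definition statements above is given below. Symbol names and values are illustrative only:

\begin{codepar}
msg:
    \instr{.byte} "OK", 0     \emph{// three bytes, not aligned by default}
    \instr{.align}            \emph{// align the next statement to 4 bytes}
buf:
    \instr{.reserve} 16       \emph{// sixteen zero bytes}
    \instr{.align} 8          \emph{// align the next statement to 8 bytes}
table:
    \instr{.word} 1, 2, 3     \emph{// three 32-bit words}
\end{codepar}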

\subsection{Instruction statements}

Instruction statements have the following general syntax:

\begin{codepar}
    \instr{\emph{instruction}} [ \emph{operand} [, \emph{operand} ... ] ]
\end{codepar}

Depending on the instruction, operands can be registers, numeric literals or symbols. Supported instructions are listed in Appendix \ref{app:instructionset}.

\chapter{WISHBONE datasheet}
\label{app:wishbonedatasheet}

\section[Instruction bus (LXP32C only)]{Instruction bus (\lxp{}C only)}

\begin{ctabular}{ll}
        \toprule
        \tabcutin{2}{\makebox[0.9\textwidth][c]{General information}} \\
        \midrule
        WISHBONE revision & B3 \\
        Type of interface & MASTER \\
        Supported cycles  & BLOCK READ \\
        \midrule
        \tabcutin{2}{Signal names} \\
        \midrule
        \signal{clk\_i}       & CLK\_I \\
        \signal{rst\_i}       & RST\_I \\
        \signal{ibus\_cyc\_o} & CYC\_O \\
        \signal{ibus\_stb\_o} & STB\_O \\
        \signal{ibus\_cti\_o} & CTI\_O() \\
        \signal{ibus\_bte\_o} & BTE\_O() \\
        \signal{ibus\_ack\_i} & ACK\_I \\
        \signal{ibus\_adr\_o} & ADR\_O() \\
        \signal{ibus\_dat\_i} & DAT\_I() \\
        \midrule
        \tabcutin{2}{Supported tag signals} \\
        \midrule
        \signal{ibus\_cti\_o} & Cycle Type Identifier (address tag) \\
        & \hspace{\parindent} ``010'' (Incrementing burst cycle) \\
        & \hspace{\parindent} ``111'' (End-of-Burst) \\
        \signal{ibus\_bte\_o} & Burst Type Extension (address tag) \\
        & \hspace{\parindent} ``00'' (Linear burst) \\
        \midrule
        \tabcutin{2}{Dimensions} \\
        \midrule
        Port size & 32 \\
        Port granularity & 32 \\
        Maximum operand size & 32 \\
        Data transfer ordering & BIG/LITTLE ENDIAN \\
        Data transfer sequence & UNDEFINED \\
        \bottomrule
\end{ctabular}

\section{Data bus}

\begin{ctabular}{ll}
        \toprule
        \tabcutin{2}{\makebox[0.9\textwidth][c]{General information}} \\
        \midrule
        WISHBONE revision & B3 \\
        Type of interface & MASTER \\
        Supported cycles  & SINGLE READ/WRITE \\
                          & RMW \\
        \midrule
        \tabcutin{2}{Signal names} \\
        \midrule
        \signal{clk\_i}       & CLK\_I \\
        \signal{rst\_i}       & RST\_I \\
        \signal{dbus\_cyc\_o} & CYC\_O \\
        \signal{dbus\_stb\_o} & STB\_O \\
        \signal{dbus\_we\_o}  & WE\_O \\
        \signal{dbus\_sel\_o} & SEL\_O() \\
        \signal{dbus\_ack\_i} & ACK\_I \\
        \signal{dbus\_adr\_o} & ADR\_O() \\
        \signal{dbus\_dat\_o} & DAT\_O() \\
        \signal{dbus\_dat\_i} & DAT\_I() \\
        \midrule
        \tabcutin{2}{Dimensions} \\
        \midrule
        Port size & 32 \\
        Port granularity & 8 \\
        Maximum operand size & 32 \\
        Data transfer ordering & LITTLE ENDIAN \\
        Data transfer sequence & UNDEFINED \\
        \bottomrule
\end{ctabular}

\end{document}
