% !TEX TS-program = lualatex
\documentclass[a4paper,12pt,twoside,extrafontsizes]{memoir}

\input{preamble.tex}

\begin{document}

\input{frontmatter.tex}

\mainmatter

\chapter{Introduction}

\section{Main features}

\lxp{} (\emph{Lightweight eXecution Pipeline}) is a small 32-bit CPU IP core optimized for FPGA implementation. Its key features include:

\begin{itemize}
        \item portability (described in behavioral VHDL-93, not tied to any particular vendor);
        \item 3-stage hazard-free pipeline;
        \item 256 registers implemented as a RAM block;
        \item a simple instruction set with only 30 distinct opcodes;
        \item separate instruction and data buses, optional instruction cache;
        \item WISHBONE compatibility;
        \item 8 interrupts with hardwired priorities;
        \item optional divider.
\end{itemize}

As a lightweight CPU core, \lxp{} lacks some features of more advanced processors, such as nested interrupt handling, debugging support, floating-point and memory management units. \lxp{} is based on an original ISA (Instruction Set Architecture) for which no C compiler is currently available. It can be programmed in the assembly language covered by Appendix \ref{app:assemblylanguage}.

Two major hardware versions of the CPU are provided: \lxp{}U, which does not include an instruction cache and uses the Low Latency Interface (Section \ref{sec:lli}) to fetch instructions, and \lxp{}C, which includes an instruction cache and fetches instructions over the WISHBONE bus. These versions are otherwise identical and have the same instruction set architecture.
 
\section{Implementation estimates}

Typical results of \lxp{} core FPGA implementation are presented in Table \ref{tab:implementation}. Note that these data are only useful as rough estimates, since actual results depend greatly on tool versions and configuration, design constraints, device utilization ratio and other factors.

Data on two configurations are provided:

\begin{itemize}
        \item \emph{Compact}: \lxp{}U (without instruction cache), no divider, 2-cycle multiplier.
        \item \emph{Full}: \lxp{}C (with instruction cache), divider, 2-cycle multiplier.
\end{itemize}

The slowest speed grade was used for clock frequency estimation.

\begin{table}[htbp]
        \caption{Typical results of \lxp{} core FPGA implementation}
        \label{tab:implementation}
        \begin{tabularx}{\textwidth}{Q{0.5\textwidth}LL}
                \toprule
                Resource & Compact & Full \\
                \midrule
                \multicolumn{3}{c}{Microsemi\textregistered{} IGLOO\textregistered{}2 M2GL005-FG484} \\
                \midrule
                Logic elements (LUT+DFF) & 1457 & 2086 \\
                \hspace*{1em}LUTs & 1421 & 1999 \\
                \hspace*{1em}Flip-flops & 706 & 1110 \\
                Mathblocks (MACC) & 3 & 3 \\
                RAM blocks (RAM1K18) & 2 & 3 \\
                Clock frequency & 107.7 MHz & 109.2 MHz \\
                \midrule
                \multicolumn{3}{c}{Xilinx\textregistered{} Artix\textregistered{}-7 xc7a15tfgg484-1} \\
                \midrule
                Slices & 235 & 365 \\
                \hspace*{1em}LUTs & 666 & 1011 \\
                \hspace*{1em}Flip-flops & 528 & 883 \\
                DSP blocks (DSP48E1) & 4 & 4 \\
                RAM blocks (RAMB18E1) & 2 & 3 \\
                Clock frequency & 111.9 MHz & 120.2 MHz \\
                \bottomrule
        \end{tabularx}
\end{table}
 
\section{Structure of this manual}

A general description of \lxp{} operation from a software developer's point of view can be found in Chapter \ref{ch:isa}, \styledtitleref{ch:isa}. Future versions of the \lxp{} CPU are intended to be at least backwards compatible with this architecture.

Topics related to hardware, such as synthesis, implementation and interfacing with other IP cores, are covered in Chapter \ref{ch:integration}, \styledtitleref{ch:integration}. A brief description of the \lxp{} pipelined architecture is provided in Chapter \ref{ch:pipeline}, \styledtitleref{ch:pipeline}. The \lxp{} IP core package includes a verification environment (self-checking testbench) which can be used to simulate the design as described in Chapter \ref{ch:simulation}, \styledtitleref{ch:simulation}.

Documentation for tools shipped with the \lxp{} IP core package (assembler/linker, disassembler and interconnect generator) is provided in Chapter \ref{ch:developmenttools}, \styledtitleref{ch:developmenttools}.

Appendices include a detailed description of the \lxp{} instruction set, instruction cycle counts and the \lxp{} assembly language definition. The WISHBONE datasheet required by the WISHBONE specification is also provided.
 
\chapter{Instruction set architecture}
\label{ch:isa}

\section{Data format}

Most \lxp{} instructions work with 32-bit data words. A few instructions that address individual bytes use little-endian order, that is, the least significant byte is stored at the lowest address. Signed values are encoded in a 2's complement format.

\section{Instruction format}
\label{sec:instructionformat}

All \lxp{} instructions are encoded as 32-bit words, with the exception of \instr{lc} (\instrname{Load Constant}), which occupies two adjacent 32-bit words. Instructions in memory must be aligned to word boundaries.

Most arithmetic and logical instructions take two source operands and write the result to an independent destination register. The general instruction format is shown in Figure \ref{fig:instructionformat}.

\begin{figure}[htbp]
        \centering
        \includegraphics[scale=1.2]{images/instructionformat.pdf}
        \caption{\lxp{} instruction format}
        \label{fig:instructionformat}
\end{figure}

This format includes the following fields:

\begin{enumerate}
        \item OPCODE -- a 6-bit instruction code (see Appendix \ref{app:instructionset}).
        \item T1 -- type of the RD1 field.
        \item T2 -- type of the RD2 field.
        \item DST -- register number (usually the destination register).
        \item RD1 -- register/direct operand 1.
        \item RD2 -- register/direct operand 2.
\end{enumerate}

Some of these fields may not have meaning for a particular instruction; such unused fields are filled with zeros.

The DST field specifies one of the 256 \lxp{} registers. The RD1 and RD2 fields can denote either source register operands or direct (immediate) operands: if the corresponding T field is 1, the RD value is a register number, otherwise it is interpreted as a direct signed byte in a 2's complement format (valid values range from $-128$ to 127).

For example, consider the following instruction that adds \code{10} to \code{r0} and writes the result to \code{r1}:

\begin{codepar}
    \instr{add} r1, r0, 10
\end{codepar}

In this example, OPCODE is \code{010000}, T1 is \code{1}, T2 is \code{0}, DST is \code{00000001}, RD1 is \code{00000000} and RD2 is \code{00001010}. Hence, the instruction is encoded as \code{0x4201000A}.
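
As a cross-check, the same word can be assembled arithmetically. This is only a sketch: the bit positions used here (OPCODE in bits 31--26, T1 in bit 25, T2 in bit 24, DST in bits 23--16, RD1 in bits 15--8, RD2 in bits 7--0) are inferred from the field widths and the encoded value of this example; Figure \ref{fig:instructionformat} remains the authoritative layout.
\[
\begin{array}{rcl}
\mathrm{word} & = & \mathrm{OPCODE} \cdot 2^{26} + \mathrm{T1} \cdot 2^{25} + \mathrm{T2} \cdot 2^{24} + \mathrm{DST} \cdot 2^{16} + \mathrm{RD1} \cdot 2^{8} + \mathrm{RD2} \\
 & = & 16 \cdot 2^{26} + 1 \cdot 2^{25} + 0 \cdot 2^{24} + 1 \cdot 2^{16} + 0 \cdot 2^{8} + 10 \\
 & = & \mathtt{0x40000000} + \mathtt{0x02000000} + \mathtt{0x00010000} + \mathtt{0x0000000A} \\
 & = & \mathtt{0x4201000A}
\end{array}
\]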
 
For convenience, some instructions have alias mnemonics. For example, \lxp{} does not have a distinct \instr{mov} opcode: instead, \code{\instr{mov} dst, src} is an alias for \code{\instr{add} dst, src, 0}.
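
The same alias also accepts a direct operand, as long as the value fits in a signed byte. For larger constants \instr{lc} is needed. The following lines are a minimal sketch with arbitrary values; the decimal literal syntax is assumed to follow Appendix \ref{app:assemblylanguage}.

\begin{codepar}
    \instr{mov} r0, 100 \emph{// alias for add r0, 100, 0: the value fits in a signed byte}
    \instr{lc} r0, 1000 \emph{// outside the -128..127 range: needs lc, which occupies two words}
\end{codepar}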
 
A complete list of \lxp{} instructions is provided in Appendix \ref{app:instructionset}.

\section{Registers}

\lxp{} has 256 registers denoted as \code{r0} -- \code{r255}. The first 240 of them (from \code{r0} to \code{r239}) are general-purpose registers (GPR), the last 16 (from \code{r240} to \code{r255}) are special-purpose registers (SPR). For convenience, some special-purpose registers have alias names: for example, \code{r255} can also be referred to as \code{sp} (stack pointer). Special-purpose registers are listed in Table \ref{tab:spr}. Some of these registers are reserved: the software should not access them.

\begin{table}[htbp]
        \caption{\lxp{} special-purpose registers}
        \label{tab:spr}
        \begin{tabularx}{\textwidth}{llL}
                \toprule
                Alias name & Generic name & Description \\
                \midrule
                \code{iv0} & \code{r240} & Interrupt vector 0 (Section \ref{sec:interrupthandling}) \\
                \code{iv1} & \code{r241} & Interrupt vector 1 (Section \ref{sec:interrupthandling}) \\
                \code{iv2} & \code{r242} & Interrupt vector 2 (Section \ref{sec:interrupthandling}) \\
                \code{iv3} & \code{r243} & Interrupt vector 3 (Section \ref{sec:interrupthandling}) \\
                \code{iv4} & \code{r244} & Interrupt vector 4 (Section \ref{sec:interrupthandling}) \\
                \code{iv5} & \code{r245} & Interrupt vector 5 (Section \ref{sec:interrupthandling}) \\
                \code{iv6} & \code{r246} & Interrupt vector 6 (Section \ref{sec:interrupthandling}) \\
                \code{iv7} & \code{r247} & Interrupt vector 7 (Section \ref{sec:interrupthandling}) \\
                \multicolumn{1}{l}{---} & \code{r248}\,--\,\code{r251} & \emph{Reserved} \\
                \code{cr}  & \code{r252} & Control register (Section \ref{sec:interrupthandling}) \\
                \code{irp} & \code{r253} & Interrupt return pointer (Section \ref{sec:interrupthandling}) \\
                \code{rp}  & \code{r254} & Return pointer (Section \ref{sec:callingprocedures}) \\
                \code{sp}  & \code{r255} & Stack pointer (Section \ref{sec:stack}) \\
                \bottomrule
        \end{tabularx}
\end{table}

All registers are zero-initialized during the CPU reset.

\section{Addressing}
\label{sec:addressing}

All addressing in \lxp{} is indirect. In order to access a memory location, its address must be stored in a register; any available register can be used for this purpose.

Some instructions, namely \instr{lsb} (\instrname{Load Signed Byte}), \instr{lub} (\instrname{Load Unsigned Byte}) and \instr{sb} (\instrname{Store Byte}), provide byte-granular access, in which case all 32 bits in the address are significant. Otherwise the two least significant address bits are ignored, as \lxp{} does not support unaligned access to 32-bit data words (during simulation, a warning is emitted if such a transaction is attempted).
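
The difference between byte and word access can be sketched as follows. The address \code{0x1002} is hypothetical; the hexadecimal literal syntax and the operand order of \instr{lub} (destination first, then the address register, by analogy with \instr{lw}) are assumptions that should be checked against Appendices \ref{app:instructionset} and \ref{app:assemblylanguage}.

\begin{codepar}
    \instr{lc} r1, 0x1002 \emph{// hypothetical byte address, not word-aligned}
    \instr{lub} r0, r1 \emph{// reads the single byte at 0x1002, zero-extended}
    \instr{lw} r2, r1 \emph{// two least significant bits ignored: reads the word at 0x1000}
\end{codepar}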
 
A special rule applies to pointers that refer to instructions: since instructions are always word-aligned, the least significant bit is interpreted as the \code{IRF} (\emph{Interrupt Return Flag}). See Section \ref{sec:interrupthandling} for details.

\section{Stack}
\label{sec:stack}

The current pointer to the top of the stack is stored in the \code{sp} register. From the hardware's point of view, this register is no different from the general-purpose registers, that is, in no situation does the CPU access the stack implicitly (procedure calls and interrupts use register-based conventions).

Software can access the stack as follows:

\begin{codepar}
    \emph{// push r0 on the stack}
    \instr{sub} sp, sp, 4
    \instr{sw} sp, r0
    \emph{// pop r0 from the stack}
    \instr{lw} r0, sp
    \instr{add} sp, sp, 4
\end{codepar}

Before using the stack, the \code{sp} register must be set up to point to a valid memory location. The simplest software can operate stackless, or even without data memory altogether if registers are enough to store the program state.

\section{Calling procedures}
\label{sec:callingprocedures}

\lxp{} provides a \instr{call} instruction which saves the address of the next instruction in the \code{rp} register and transfers execution to the address stored in the register operand. Return from a procedure is performed by the \code{\instr{jmp} rp} instruction which also has a \instr{ret} alias.

If a procedure in turn calls a nested procedure, the return address in the \code{rp} register will be overwritten by the \instr{call} instruction. Hence, unless it is a tail call (see below), the procedure must save the \code{rp} value somewhere; the most general solution is to use the stack:

\begin{codepar}
    \instr{sub} sp, sp, 4
    \instr{sw} sp, rp
    ...
    \instr{lc} r0, Nested\_proc
    \instr{call} r0
    ...
    \instr{lw} rp, sp
    \instr{add} sp, sp, 4
    \instr{ret}
\end{codepar}

Procedures that don't use the \instr{call} instruction (sometimes called \emph{leaf procedures}) don't need to save the \code{rp} value.

Since \instr{ret} is just an alias for \code{\instr{jmp} rp}, one can also use \instrname{Compare and Jump} instructions (\instr{cjmp\emph{xxx}}) to perform a conditional procedure return. For example, consider the following procedure which calculates the absolute value of \code{r1}:

\begin{codepar}
Abs\_proc:
    \instr{cjmpsge} rp, r1, 0 \emph{// return immediately if r1>=0}
    \instr{neg} r1, r1 \emph{// otherwise, negate r1}
    \instr{ret} \emph{// jmp rp}
\end{codepar}

A \emph{tail call} is a special type of procedure call where the calling procedure calls a nested procedure as the last action before return. In such cases the \instr{call} instruction can be replaced with \instr{jmp}, so that when the nested procedure executes \instr{ret}, it returns directly to the caller's parent procedure.
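
A tail call could look like the following sketch, where \code{Outer\_proc} and \code{Inner\_proc} are hypothetical labels. Since \instr{jmp} leaves \code{rp} untouched, \code{Outer\_proc} does not need to save it.

\begin{codepar}
Outer\_proc:
    ...
    \instr{lc} r0, Inner\_proc
    \instr{jmp} r0 \emph{// tail call: rp still holds the return address of Outer\_proc's caller}
\end{codepar}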
 
Although the \lxp{} architecture does not mandate any particular calling convention, some general recommendations are presented below:

\begin{enumerate}
        \item Pass arguments and return values through the \code{r1}--\code{r31} registers (a procedure can have multiple return values).
        \item If necessary, the \code{r0} register can be used to load the procedure address.
        \item Designate the \code{r0}--\code{r31} registers as \emph{caller-saved}, that is, they are not guaranteed to be preserved during procedure calls and must be saved by the caller if needed. The procedure can use them for any purpose, regardless of whether they are used to pass arguments and/or return values.
\end{enumerate}
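
A call that follows these recommendations might look like the sketch below. \code{Max\_proc} is a hypothetical leaf procedure that returns the larger of its two signed arguments; the argument values are arbitrary.

\begin{codepar}
    \instr{mov} r1, 5 \emph{// first argument}
    \instr{mov} r2, 7 \emph{// second argument}
    \instr{lc} r0, Max\_proc
    \instr{call} r0 \emph{// on return, r1 holds the result}
    ...
Max\_proc:
    \instr{cjmpsge} rp, r1, r2 \emph{// return immediately if r1>=r2}
    \instr{mov} r1, r2 \emph{// otherwise, return r2}
    \instr{ret}
\end{codepar}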
 
\section{Interrupt handling}
\label{sec:interrupthandling}

\subsection{Control register}

\lxp{} supports 8 interrupts with hardwired priority levels (interrupts with lower vector numbers have higher priority). Interrupt vectors (pointers to interrupt handlers) are stored in the \code{iv0}--\code{iv7} registers. Interrupt handling is controlled by the \code{cr} register (Table \ref{tab:cr}).

\begin{table}[htbp]
        \caption{Control register}
        \label{tab:cr}
        \begin{tabularx}{\textwidth}{lL}
                \toprule
                Bit & Description \\
                \midrule
                0      & Enable interrupt 0 \\
                1      & Enable interrupt 1 \\
                & \ldots \\
                7      & Enable interrupt 7 \\
                8      & Temporarily block interrupt 0 \\
                9      & Temporarily block interrupt 1 \\
                & \ldots \\
                15     & Temporarily block interrupt 7 \\
                31--16 & \emph{Reserved} \\
                \bottomrule
        \end{tabularx}
\end{table}

Disabled interrupts are ignored altogether: if the CPU receives an interrupt request signal while the corresponding interrupt is disabled, the interrupt handler will not be called even if the interrupt is enabled later. Conversely, temporarily blocked interrupts are still registered, but their handlers are not called until they are unblocked.

Like other registers, \code{cr} is zero-initialized during the CPU reset, meaning that no interrupts are initially enabled.
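
The following sketch shows how software might enable two interrupts and temporarily block one of them around a critical section. It assumes that bit 0 enables interrupt 0, that the \instr{or} and \instr{xor} instructions and the hexadecimal literal syntax are available as described in the appendices, and that \code{r0} is free to hold the mask.

\begin{codepar}
    \instr{mov} cr, 3 \emph{// set bits 0 and 1: enable interrupts 0 and 1}
    \instr{lc} r0, 0x100 \emph{// mask for bit 8 (too large for a direct operand)}
    \instr{or} cr, cr, r0 \emph{// temporarily block interrupt 0}
    ... \emph{// requests for interrupt 0 are registered, but its handler is not called}
    \instr{xor} cr, cr, r0 \emph{// clear bit 8 again: unblock interrupt 0}
\end{codepar}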
 
\subsection{Invoking interrupt handlers}

Interrupt handlers are invoked by the CPU similarly to procedures (Section \ref{sec:callingprocedures}), the difference being that in this case the return address is stored in the \code{irp} register (as opposed to \code{rp}), and the least significant bit of the register (\code{IRF} -- \emph{Interrupt Return Flag}) is set.

An interrupt handler returns using the \code{\instr{jmp} irp} instruction which also has an \instr{iret} alias. Until the interrupt handler returns, the CPU will defer further interrupt processing (although incoming interrupt requests will still be registered). This also means that the \code{irp} register value will not be unexpectedly overwritten. When executing the \code{\instr{jmp} irp} instruction, the CPU will recognize the \code{IRF} flag and resume interrupt processing as usual. It is also possible to perform a conditional return from the interrupt handler, similarly to the technique described in Section \ref{sec:callingprocedures} for conditional procedure returns.
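
A minimal returnable handler could be set up as in the sketch below. \code{Irq0\_handler} is a hypothetical label, and reserving \code{r32} for the handler is merely a convention assumed for this example: since the CPU never saves registers implicitly, a handler must not modify registers that the interrupted code relies on.

\begin{codepar}
    \instr{lc} iv0, Irq0\_handler \emph{// install the handler for interrupt 0}
    \instr{mov} cr, 1 \emph{// enable interrupt 0}
    ...
Irq0\_handler:
    \instr{add} r32, r32, 1 \emph{// count events in a register reserved for this handler}
    \instr{iret} \emph{// jmp irp: return and resume interrupt processing}
\end{codepar}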
 
\subsection{Non-returnable interrupts}

If an interrupt vector has the least significant bit (\code{IRF}) set, the CPU will resume interrupt processing immediately, without waiting for the handler to return. One should not try to invoke \instr{iret} from such a handler since the \code{irp} register could have been overwritten by another interrupt. This technique can be useful when the CPU's only task is to process external events:

\begin{codeparbreakable}
\emph{// Set the IRF to mark the interrupt as non-returnable}
    \instr{lc} iv0, main\_loop@1
    \instr{mov} cr, 1 \emph{// enable the interrupt}
    \instr{hlt} \emph{// wait for an interrupt request}
main\_loop:
\emph{// Process the event...}
    \instr{hlt} \emph{// wait for the next interrupt request}
\end{codeparbreakable}

Note that \instr{iret} is never called in this example.
 
\chapter{Integration}
\label{ch:integration}

\section{Overview}

The \lxp{} IP core is delivered in the form of a synthesizable RTL description expressed in \mbox{VHDL-93}. It does not use any technology-specific primitives and should work out of the box with major FPGA synthesis software. \lxp{} can be integrated in both VHDL and Verilog\textregistered{} based SoC designs.

Major \lxp{} hardware versions have separate top-level design units:

\begin{itemize}
        \item \shellcmd{lxp32u\_top} -- \lxp{}U (without instruction cache),
        \item \shellcmd{lxp32c\_top} -- \lxp{}C (with instruction cache).
\end{itemize}

A high-level block diagram of the CPU is presented in Figure \ref{fig:blockdiagram}. Schematic symbols for \lxp{}U and \lxp{}C are shown in Figure \ref{fig:symbols}.

\begin{figure}[htbp]
        \centering
        \includegraphics[scale=0.85]{images/blockdiagram.pdf}
        \caption{\lxp{} CPU block diagram}
        \label{fig:blockdiagram}
\end{figure}

\begin{figure}[htbp]
        \centering
        \includegraphics[scale=0.85]{images/symbols.pdf}
        \caption{Schematic symbols for \lxp{}U and \lxp{}C}
        \label{fig:symbols}
\end{figure}

\lxp{}U uses the Low Latency Interface (LLI) described in Section \ref{sec:lli} to fetch instructions. This interface is designed to interact with low latency on-chip peripherals such as RAM blocks. It works best with slaves that can return the instruction on the next cycle after its address has been set, although the slave can still introduce wait states if needed. The Low Latency Interface can also be connected to a custom (external) instruction cache.

To achieve the least possible latency, some LLI outputs are not registered. For this reason the LLI is not suitable for interaction with off-chip peripherals.

\lxp{}C is designed to work with high latency memory controllers and uses a simple instruction cache based on a ring buffer. The instructions are fetched over the WISHBONE instruction bus. To maximize throughput, the CPU makes use of the WISHBONE registered feedback signals [CTI\_O()] and [BTE\_O()]. All outputs on this bus are registered. This version is also recommended for use in situations where LLI combinatorial delays are unacceptable.

Both \lxp{}U and \lxp{}C use the WISHBONE protocol for the data bus.
 
\section{Ports}

\begin{ctabular}{lccl}
        \toprule
        Port & Direction & Bus width & Description \\
        \midrule
        \tabcutin{4}{Global signals} \\
        \midrule
        \signal{clk\_i} & in & 1 & System clock \\
        \signal{rst\_i} & in & 1 & Synchronous reset, active high \\
        \midrule
        \tabcutin{4}{Instruction bus -- Low Latency Interface (\lxp{}U only)} \\
        \midrule
        \signal{lli\_re\_o} & out & 1 & Read enable output, active high \\
        \signal{lli\_adr\_o} & out & 30 & Address output \\
        \signal{lli\_dat\_i} & in & 32 & Data input \\
        \signal{lli\_busy\_i} & in & 1 & Busy flag input, active high \\
        \midrule
        \tabcutin{4}{Instruction bus -- WISHBONE (\lxp{}C only)} \\
        \midrule
        \signal{ibus\_cyc\_o} & out & 1 & Cycle output \\
        \signal{ibus\_stb\_o} & out & 1 & Strobe output \\
        \signal{ibus\_cti\_o} & out & 3 & Cycle type identifier \\
        \signal{ibus\_bte\_o} & out & 2 & Burst type extension \\
        \signal{ibus\_ack\_i} & in & 1 & Acknowledge input \\
        \signal{ibus\_adr\_o} & out & 30 & Address output \\
        \signal{ibus\_dat\_i} & in & 32 & Data input \\
        \midrule
        \tabcutin{4}{Data bus} \\
        \midrule
        \signal{dbus\_cyc\_o} & out & 1 & Cycle output \\
        \signal{dbus\_stb\_o} & out & 1 & Strobe output \\
        \signal{dbus\_we\_o} & out & 1 & Write enable output \\
        \signal{dbus\_sel\_o} & out & 4 & Select output \\
        \signal{dbus\_ack\_i} & in & 1 & Acknowledge input \\
        \signal{dbus\_adr\_o} & out & 30 & Address output \\
        \signal{dbus\_dat\_o} & out & 32 & Data output \\
        \signal{dbus\_dat\_i} & in & 32 & Data input \\
        \midrule
        \tabcutin{4}{Other ports} \\
        \midrule
        \signal{irq\_i} & in & 8 & Interrupt requests \\
        \bottomrule
\end{ctabular}
 
\section{Generics}
\label{sec:generics}

The following generics can be used to configure the \lxp{} IP core parameters.

\subsection{DBUS\_RMW}

By default, \lxp{} uses the \signal{dbus\_sel\_o} (byte enable) port to perform byte-granular write transactions initiated by the \instr{sb} (\instrname{Store Byte}) instruction. If this option is set to \code{true}, \signal{dbus\_sel\_o} is always tied to \code{"1111"}, and byte-granular write access is performed using the RMW (read-modify-write) cycle. The latter method is slower, but can work with slaves that do not have the [SEL\_I()] port.

This feature is designed with the assumption that read and write transactions do not cause side effects; it can therefore be unsuitable for some slaves.

\subsection{DIVIDER\_EN}

\lxp{} includes a divider unit which has relatively low performance but occupies a considerable amount of resources. It can be disabled by setting this option to \code{false}.

\subsection{IBUS\_BURST\_SIZE}

Instruction bus burst size. Default value is 16. Only for \lxp{}C.

\subsection{IBUS\_PREFETCH\_SIZE}

Number of words that the instruction cache will read ahead from the current instruction pointer. Default value is 32. Only for \lxp{}C.

\subsection{MUL\_ARCH}

\lxp{} provides three multiplier options:

\begin{itemize}
        \item \code{"dsp"} is the fastest architecture designed for technologies that provide fast parallel $16 \times 16$ multipliers, which includes most modern FPGA families. One multiplication takes 2 clock cycles.
        \item \code{"opt"} architecture uses a semi-parallel multiplication algorithm based on carry-save accumulation of partial products. It is designed for technologies that do not provide fast $16 \times 16$ multipliers. One multiplication takes 6 clock cycles.
        \item \code{"seq"} is a fully sequential design. One multiplication takes 34 clock cycles.
\end{itemize}

The default multiplier architecture is \code{"dsp"}. This option is recommended for most modern FPGA devices regardless of optimization goal since it is not only the fastest, but also occupies the least amount of general-purpose logic resources. However, it will create a timing bottleneck on technologies that lack fast multipliers.

For older FPGA families that don't provide dedicated multipliers, the \code{"opt"} architecture can be used if decent throughput is still needed. It is designed to avoid creating a timing bottleneck on such technologies. Alternatively, the \code{"seq"} architecture can be used when throughput is not a concern.

\subsection{START\_ADDR}

Address of the first instruction to be executed after CPU reset. Default value is \code{0}. The two least significant bits are ignored as instructions are always word-aligned.
 
\section{Clock and reset}
\label{sec:clockreset}

All flip-flops in the CPU are triggered by a rising edge of the \signal{clk\_i} signal. No specific requirements are imposed on the \signal{clk\_i} signal apart from usual constraints on setup and hold times.

\lxp{} is reset synchronously when the \signal{rst\_i} signal is asserted. If the system reset signal comes from an asynchronous source, a synchronization circuit must be used; an example of such a circuit is shown in Figure \ref{fig:resetsync}.

\begin{figure}[htbp]
        \centering
        \includegraphics[scale=1]{images/resetsync.pdf}
        \caption{Reset synchronization circuit}
        \label{fig:resetsync}
\end{figure}

In SRAM-based FPGAs, flip-flops and RAM blocks have a deterministic state after a bitstream is loaded. On such technologies \lxp{} can operate without reset. In this case the \signal{rst\_i} port can be tied to a logical \code{0} in the RTL design to allow the synthesizer to remove redundant logic.

The \signal{clk\_i} and \signal{rst\_i} signals also serve the role of the [CLK\_I] and [RST\_I] WISHBONE signals, respectively, for both instruction and data buses.

\section{Low Latency Interface}
\label{sec:lli}

The Low Latency Interface (LLI) is a simple pipelined synchronous protocol with a typical latency of 1 cycle used by \lxp{}U to fetch instructions. It was designed to allow simple connection of the CPU to on-chip program RAM or cache. The timing diagram of the LLI is shown in Figure \ref{fig:llitiming}.

\begin{figure}[htbp]
        \centering
        \includegraphics[scale=1]{images/llitiming.pdf}
        \caption{Low Latency Interface timing diagram (\lxp{}U)}
        \label{fig:llitiming}
\end{figure}

To request a word, the master produces its address on \signal{lli\_adr\_o} and asserts \signal{lli\_re\_o}. The request is considered valid when \signal{lli\_re\_o} is high and \signal{lli\_busy\_i} is low on the same clock cycle. On the next cycle after a valid request, the slave must either produce data on \signal{lli\_dat\_i} or assert \signal{lli\_busy\_i} to indicate that data are not ready. \signal{lli\_busy\_i} must be held high until the valid data are present on the \signal{lli\_dat\_i} port.

The data provided by the slave are only required to be valid on the next cycle after a valid request (if \signal{lli\_busy\_i} is not asserted) or on the cycle when \signal{lli\_busy\_i} is deasserted after being held high. Otherwise \signal{lli\_dat\_i} is undefined.

The values of \signal{lli\_re\_o} and \signal{lli\_adr\_o} are not guaranteed to be preserved by the master while the slave is busy.

The simplest slaves, such as on-chip RAM blocks which are never busy, can be trivially connected to the LLI by connecting address, data and read enable ports and tying the \signal{lli\_busy\_i} signal to a logical \code{0} (you can even ignore \signal{lli\_re\_o} in this case, although doing so can theoretically increase power consumption).

Note that the \signal{lli\_adr\_o} signal has a width of 30 bits since it addresses words, not bytes (instructions are always word-aligned).
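
The relation between byte and word addresses can be written out as follows. This is a sketch of the arithmetic only: it assumes that the word address is simply the upper 30 bits of the 32-bit byte address, and the address \code{0x1004} is hypothetical.
\[
A_{\mathrm{word}} = \left\lfloor \frac{A_{\mathrm{byte}}}{4} \right\rfloor, \qquad
A_{\mathrm{byte}} = \mathtt{0x1004} \;\Rightarrow\; A_{\mathrm{word}} = \mathtt{0x401}
\]
With 30 address bits the interface covers $2^{30}$ words of 4 bytes each, that is, the full $2^{32}$-byte address space.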
 
446 6 ring0_mipt
Since the \signal{lli\_re\_o} output signal is not registered, this interface is not suitable for interaction with off-chip peripherals. Also, care should be taken to avoid introducing too much additional combinatorial delay on its outputs.
447 2 ring0_mipt
 
448
\section{WISHBONE instruction bus}
449
 
450
The \lxp{}C CPU fetches instructions over the WISHBONE bus. Its parameters are defined in the WISHBONE datasheet (Appendix \ref{app:wishbonedatasheet}). For a detailed description of the bus protocol refer to the WISHBONE specification, revision B3.
451
 
452 6 ring0_mipt
With classic WISHBONE handshake decent throughput can be only achieved when the slave is able to terminate cycles asynchronously. It is usually possible only for the simplest slaves which should probably be using the Low Latency Interface instead. To maximize throughput for complex, high latency slaves, \lxp{}C instruction bus uses optional WISHBONE address tags [CTI\_O()] (Cycle Type Identifier) and [BTE\_O()] (Burst Type Extension). These signals are hints allowing the slave to predict the address that will be set by the master in the next cycle and prepare data in advance. The slave can ignore these hints, processing requests as classic WISHBONE cycles, although performance would almost certainly suffer in this case.
453 2 ring0_mipt
 
454
A typical \lxp{}C instruction bus burst timing diagram is shown on Figure \ref{fig:ibustiming}.
455
 
456
\begin{figure}[htbp]
457
        \centering
458
        \includegraphics[scale=0.786]{images/ibustiming.pdf}
459
        \caption{Typical WISHBONE instruction bus burst (\lxp{}C)}
460
        \label{fig:ibustiming}
461
\end{figure}

\section{WISHBONE data bus}

\lxp{} uses the WISHBONE bus to interact with data memory and other peripherals. This bus is distinct from the instruction bus; its parameters are defined in the WISHBONE datasheet (Appendix \ref{app:wishbonedatasheet}).

This bus uses a 30-bit \signal{dbus\_adr\_o} port to address 32-bit words; the \signal{dbus\_sel\_o} port is used to select individual bytes to be written or read. Alternatively, with the \code{DBUS\_RMW} option (Section \ref{sec:generics}) the \signal{dbus\_sel\_o} port is not used; byte-granular write access is performed using the read-modify-write cycle instead.
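
As a worked illustration of this addressing scheme (not taken from the WISHBONE datasheet): a 32-bit byte address space holds $2^{32}$ bytes and each word is 4 bytes wide, so $2^{32}/4 = 2^{30}$ words must be addressable, hence the 30-bit port. For a byte address $A$, the word address is

\[
A_{\mathrm{word}} = \left\lfloor \frac{A}{4} \right\rfloor ,
\]

that is, the upper 30 bits of $A$. For instance, byte addresses \code{0x00001004} through \code{0x00001007} all belong to the word with address \code{0x00000401}. The two remaining bits of $A$ identify the byte within the word; how they map to the individual \signal{dbus\_sel\_o} lines is defined by the datasheet and is not reproduced here.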

For a detailed description of the bus protocol refer to the WISHBONE specification, revision B3.

Typical timing diagrams for write and read cycles are shown in Figure \ref{fig:dbustiming}. In these examples the peripheral terminates the cycle asynchronously; however, it can also introduce wait states by delaying the \signal{dbus\_ack\_i} signal.

\begin{figure}[htbp]
        \centering
        \includegraphics[scale=0.928]{images/dbustiming.pdf}
        \caption{Typical WISHBONE data bus WRITE and READ cycles}
        \label{fig:dbustiming}
\end{figure}

\section{Interrupts}

\lxp{} registers an interrupt condition when the corresponding request signal goes from \code{0} to \code{1}. Transitions from \code{1} to \code{0} are ignored. All interrupt request signals must be synchronous with the system clock (\signal{clk\_i}); if coming from an asynchronous source, they must be synchronized using a sequence of at least two flip-flops clocked by \signal{clk\_i}. These flip-flops are not included in the \lxp{} core in order not to increase interrupt processing delay for interrupt sources that are inherently synchronous. Failure to properly synchronize interrupt request signals will cause timing violations that manifest themselves as intermittent, hard-to-debug faults.

\section{Synthesis and optimization}
\label{sec:synthesis}

\subsection{Technology specific primitives}

The \lxp{} RTL design is described in behavioral VHDL. However, it can also benefit from certain special resources provided by most FPGA devices, namely, RAM blocks and dedicated multipliers. For improved portability, hardware description that can potentially be mapped to such resources is localized in separate design units:

\begin{itemize}
        \item \shellcmd{lxp32\_ram256x32} -- a dual-port synchronous $256 \times 32$ bit RAM with one write port and one read port;
        \item \shellcmd{lxp32\_mul16x16} -- an unsigned $16 \times 16$ multiplier with an output register.
\end{itemize}

These design units contain behavioral descriptions of the respective hardware that are recognizable by FPGA synthesis tools. Usually no adjustments are needed as the synthesizer will automatically infer an appropriate primitive from its behavioral description. If automatic inference produces unsatisfactory results, these design units can be replaced with library element wrappers. The same is true for ASIC logic synthesis software, which is unlikely to infer complex primitives.

\lxp{} implements its own bypass logic dealing with situations when RAM read and write addresses collide. It does not depend on the read/write conflict resolution behavior of the underlying primitive.

\subsection{General optimization guidelines}

This subsection contains general advice on achieving satisfactory synthesis results regardless of the optimization goal. Some of these suggestions are also mentioned in other parts of this manual.

\begin{enumerate}
        \item If the technology doesn't provide dedicated multiplier resources, consider using the \code{"opt"} or \code{"seq"} multiplier architecture (Section \ref{sec:generics}).

        \item Ensure that the instruction bus has adequate throughput. For \lxp{}C, check that the slave supports the WISHBONE registered feedback signals [CTI\_I()] and [BTE\_I()].

        \item Multiplexing instruction and data buses, or connecting them to the same interconnect that allows only one master at a time to be active (i.e. \emph{shared bus} interconnect topology), is not recommended. If you absolutely must do so, assign a higher priority level to the data bus; otherwise, instruction prefetches will massively slow down data transactions.

        \item For small programs, consider mapping code and data memory to the beginning or end of the address space (i.e. \code{0x00000000}--\code{0x000FFFFF} or \code{0xFFF00000}--\code{0xFFFFFFFF}) to be able to load pointers with the \instr{lcs} instruction, which saves both memory and CPU cycles as compared to \instr{lc}.
\end{enumerate}
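
The two address ranges in the last item are consistent with a sign-extended 21-bit constant. This is an inference from the range limits quoted there, not a statement made in this section. Under that reading, a constant $c$ can be loaded with \instr{lcs} if

\[
-2^{20} \le c \le 2^{20}-1 ,
\]

and the limits, written as 32-bit two's complement values, are \code{0xFFF00000} and \code{0x000FFFFF}. A short sketch follows; the operand order is assumed to follow the destination-first convention of the other instructions, and the addresses are arbitrary:

\begin{codepar}
    \instr{lcs} r1, 0x00001000 \emph{// fits: within the first 1 MiB}
    \instr{lc} r2, 0x20000000 \emph{// does not fit: lc is needed}
\end{codepar}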

\subsection{Optimizing for timing}

\begin{enumerate}
        \item Set up reasonable timing constraints. Do not overconstrain the design by more than 10--15~\%.

        \item Analyze the worst path. The natural \lxp{} timing bottleneck usually goes from the scratchpad (register file) output through the ALU (in the Execute stage) to the scratchpad input. If timing analysis lists other critical paths, the problem can lie elsewhere. If the \signal{rst\_i} signal becomes a bottleneck, promote it to a global network or, with SRAM-based FPGAs, consider operating without reset (see Section \ref{sec:clockreset}). Critical paths affecting the WISHBONE state machines could indicate problems with interconnect performance.

        \item Configure the synthesis tool to reduce the fanout limit. Note that setting this limit too low can have the opposite effect.

        \item Synthesis tools can support additional options to improve timing, such as the \emph{Retiming} algorithm, which rearranges registers and combinatorial logic across the pipeline in an attempt to balance delays. The efficiency of such algorithms is not very predictable. In general, sloppy designs are the most likely to benefit from them, while for a carefully designed circuit timing can sometimes get worse.
\end{enumerate}

\subsection{Optimizing for area}

\begin{enumerate}
        \item Consider disabling the divider if not using it (see Section \ref{sec:generics}).

        \item Relaxing timing constraints can sometimes allow the synthesizer to produce a more area-efficient circuit.

        \item Increase the fanout limit in the synthesizer settings to reduce buffer replication.
\end{enumerate}

\chapter{Hardware architecture}
\label{ch:pipeline}

The \lxp{} CPU is based on a 3-stage hazard-free pipelined architecture and uses a large RAM-based register file (scratchpad) with two read ports and one write port. The pipeline includes the following stages:

\begin{itemize}
        \item\emph{Fetch} -- fetches instructions from the program memory.
        \item\emph{Decode} -- decodes instructions and reads register operand values from the scratchpad.
        \item\emph{Execute} -- executes instructions and writes the results (if any) to the scratchpad.
\end{itemize}

\lxp{} instructions are encoded in such a way that operand register numbers can be known without decoding the instruction (Section \ref{sec:instructionformat}). When the \emph{Fetch} stage produces an instruction, scratchpad input addresses are set immediately, before the instruction itself is decoded. If the instruction does not use one or both of the register operands, the corresponding data read from the scratchpad are discarded. Collision bypass logic in the scratchpad detects situations where the \emph{Decode} stage tries to read a register which is currently being written by the \emph{Execute} stage and forwards its value, bypassing the RAM block and avoiding Read After Write (RAW) pipeline hazards. Other types of data hazards are also impossible with this architecture.

As an example, consider the following simple code chunk:

\begin{codepar}
    \instr{mov} r0, 10 \emph{// alias for add r0, 10, 0}
    \instr{mov} r1, 20 \emph{// alias for add r1, 20, 0}
    \instr{add} r2, r0, r1
\end{codepar}

Table \ref{tab:examplepipeline} illustrates how this chunk is processed by the \lxp{} pipeline. Note that on the fourth cycle the \emph{Decode} stage requests the \code{r1} register value while the \emph{Execute} stage writes to the same register. Collision bypass logic in the scratchpad ensures that the \emph{Decode} stage reads the correct (new) value of \code{r1} without stalling the pipeline.

\begin{table}[htbp]
        \caption{Example of the \lxp{} pipeline operation}
        \small
        \label{tab:examplepipeline}
        \begin{tabularx}{\textwidth}{lllL}
                \toprule
                Cycle & Fetch & Decode & Execute \\
                \midrule
                1 & \code{\instr{add} r0, 10, 0} & & \\
                \midrule
                2 & \code{\instr{add} r1, 20, 0} & \code{\instr{add} r0, 10, 0} & \\
                  & & Request \code{r10} (discarded) & \\
                  & & Request \code{r0} (discarded) & \\
                  & & Pass 10 and 0 as operands & \\
                \midrule
                3 & \code{\instr{add} r2, r0, r1} & \code{\instr{add} r1, 20, 0} & Perform the addition \\
                  & & Request \code{r20} (discarded) & Write 10 to \code{r0} \\
                  & & Request \code{r0} (discarded) & \\
                  & & Pass 20 and 0 as operands & \\
                \midrule
                4 & & \code{\instr{add} r2, r0, r1} & Perform the addition \\
                  & & Request \code{r0} & Write 20 to \code{r1} \\
                  & & Request \code{r1} (bypass) & \\
                  & & Pass 10 and 20 as operands & \\
                \midrule
                5 & & & Perform the addition \\
                  & & & Write 30 to \code{r2} \\
                \bottomrule
        \end{tabularx}
\end{table}
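
The table suggests a simple rule of thumb for straight-line code. Assuming single-cycle instructions, no stalls and no execution transfers (assumptions made only for this illustration), the $n$-th instruction is fetched on cycle $n$ and completes in the \emph{Execute} stage on cycle $n+2$, so a sequence of $N$ instructions takes

\[
C(N) = N + 2
\]

cycles. For the three instructions of the example, $C(3) = 5$, which matches the last row of the table.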

When an instruction takes more than one cycle to execute, the \emph{Execute} stage simply stalls the pipeline.

Branch hazards are impossible in \lxp{} as well since the pipeline is flushed whenever an execution transfer occurs.

\chapter{Simulation}
\label{ch:simulation}

The \lxp{} package includes an automated verification environment (self-checking testbench) which verifies the functional correctness of the \lxp{} CPU. The environment consists of two major parts: a test platform, which is a SoC-like design providing peripherals for the CPU to interact with, and the testbench itself, which loads test firmware and monitors the platform's output signals. Like the CPU itself, the test environment is written in VHDL-93.

A separate testbench for the instruction cache (\shellcmd{lxp32\_icache}) is also provided. It can be invoked similarly to the main CPU testbench.

\section{Requirements}

The following software is required to simulate the \lxp{} design:

\begin{itemize}
        \item An HDL simulator supporting VHDL-93. The \lxp{} package includes scripts (makefiles) for the following simulators:

        \begin{itemize}
                \item GHDL -- a free and open-source VHDL simulator which supports multiple operating systems\footnote{\url{http://ghdl.free.fr/}};
                \item Mentor Graphics\textregistered{} ModelSim\textregistered{} simulator (\shellcmd{vsim});
                \item Xilinx\textregistered{} Vivado\textregistered{} Simulator (\shellcmd{xsim}).
        \end{itemize}

        With GHDL, a waveform viewer such as GTKWave is also recommended (Figure \ref{fig:gtkwave})\footnote{\url{http://gtkwave.sourceforge.net/}}.

        Some FPGA vendors provide limited versions of the ModelSim\textregistered{} simulator for free as parts of their design suites. These versions should suffice for \lxp{} simulation.

        Other simulators can be used with some preparations (Section \ref{sec:simmanual}).

        \item GNU \shellcmd{make} and \shellcmd{coreutils} are needed to simulate the design using the provided makefiles. Under Microsoft\textregistered{} Windows\textregistered{}, MSYS or Cygwin can be used.
        \item The \lxp{} assembler/linker program (\shellcmd{lxp32asm}) must be present (Section \ref{sec:lxp32asm}). A prebuilt executable for Microsoft\textregistered{} Windows\textregistered{} is already included in the \lxp{} package; for other operating systems \shellcmd{lxp32asm} must be built from source (Section \ref{sec:buildfromsource}).
\end{itemize}

\begin{figure}[htbp]
        \centering
        \includegraphics[scale=0.65]{images/gtkwave.png}
        \caption{GTKWave displaying the \lxp{} waveform dump produced by GHDL}
        \label{fig:gtkwave}
\end{figure}

\section{Running simulation using makefiles}

To simulate the design, go to the \shellcmd{verify/lxp32/run/<\emph{simulator}>} directory and run \shellcmd{make}. The following make targets are supported:

\begin{itemize}
        \item \shellcmd{batch} -- simulate the design in batch mode. Results will be written to the standard output. This is the default target.
        \item \shellcmd{gui} -- simulate the design in GUI mode. Note: since GHDL doesn't have a GUI, the simulation itself will be run in batch mode; upon successful completion, GTKWave will be run automatically to display dumped waveforms.
        \item \shellcmd{compile} -- compile only, don't run simulation.
        \item \shellcmd{clean} -- delete all the produced artifacts.
\end{itemize}

\section{Running simulation manually}
\label{sec:simmanual}

The \lxp{} testbench can also be run manually. The following steps must be performed:

\begin{enumerate}
        \item Compile the test firmware in the \shellcmd{verify/lxp32/src/firmware} directory:

        \begin{codepar}
    lxp32asm -f textio \emph{filename}.asm -o \emph{filename}.ram
        \end{codepar}

        The produced \shellcmd{*.ram} files must be placed in the simulator's working directory.
        \item Compile the \lxp{} RTL description (\shellcmd{rtl} directory).
        \item Compile the common package (\shellcmd{verify/common\_pkg}).
        \item Compile the test platform (\shellcmd{verify/lxp32/src/platform} directory).
        \item Compile the testbench itself (\shellcmd{verify/lxp32/src/tb} directory).
        \item Simulate the \shellcmd{tb} design unit defined in the \shellcmd{tb.vhd} file.
\end{enumerate}
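
As an illustration, the compilation and simulation steps above could look as follows with GHDL. This is only a sketch: it assumes that the commands are run from the root of the \lxp{} package, that all sources have the \shellcmd{.vhd} extension and reside directly in the listed directories, and that the \shellcmd{*.ram} files have already been copied to the current directory. The provided makefiles remain the reference for the exact options.

\begin{codepar}
    ghdl -i --std=93 rtl/*.vhd verify/common\_pkg/*.vhd
    ghdl -i --std=93 verify/lxp32/src/platform/*.vhd
    ghdl -i --std=93 verify/lxp32/src/tb/*.vhd
    ghdl -m --std=93 tb
    ghdl -r --std=93 tb
\end{codepar}

Here \shellcmd{ghdl -i} imports the source files, \shellcmd{ghdl -m} analyzes and elaborates the \shellcmd{tb} design unit in dependency order, and \shellcmd{ghdl -r} runs the simulation.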

\section{Testbench parameters}

Simulation parameters can be configured by overriding generics defined by the \shellcmd{tb} design unit:

\begin{itemize}
        \item \code{CPU\_DBUS\_RMW} -- \code{DBUS\_RMW} CPU generic value (see Section \ref{sec:generics}).
        \item \code{CPU\_MUL\_ARCH} -- \code{MUL\_ARCH} CPU generic value (see Section \ref{sec:generics}).
        \item \code{MODEL\_LXP32C} -- simulate the \lxp{}C version. By default, this option is set to \code{true}. If set to \code{false}, \lxp{}U is simulated instead.
        \item \code{TEST\_CASE} -- if set to a non-empty string, specifies the file name of a test case to run. If set to an empty string (default), all tests are executed.
        \item \code{THROTTLE\_DBUS} -- perform pseudo-random data bus throttling. By default, this option is set to \code{true}.
        \item \code{THROTTLE\_IBUS} -- perform pseudo-random instruction bus throttling. By default, this option is set to \code{true}.
        \item \code{VERBOSE} -- print more messages.
\end{itemize}
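
Most simulators allow top-level generics to be overridden from the command line. A sketch for two of the supported simulators is given below. The option spelling (\shellcmd{-g\emph{name}=\emph{value}}) comes from the simulators themselves rather than from the \lxp{} package, and its availability and placement on the command line may depend on the simulator version.

\begin{codepar}
    ghdl -r --std=93 tb -gVERBOSE=true -gMODEL\_LXP32C=false
    vsim -c -gTHROTTLE\_DBUS=false -gTHROTTLE\_IBUS=false tb
\end{codepar}

The first command is meant to run the testbench for \lxp{}U with verbose output; the second loads the testbench in ModelSim\textregistered{} with bus throttling disabled.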

\chapter{Development tools}
\label{ch:developmenttools}

\section{\shellcmd{lxp32asm} -- Assembler and linker}
\label{sec:lxp32asm}

\shellcmd{lxp32asm} is a combined assembler and linker for the \lxp{} platform. It takes one or more input files and produces executable code for the CPU. Input files can be either source files in the \lxp{} assembly language (Appendix \ref{app:assemblylanguage}) or \emph{linkable objects}. A linkable object is a file in a relocatable format that stores compiled \lxp{} code together with symbol information.

\shellcmd{lxp32asm} operates in two stages:

\begin{enumerate}
        \item Compile.

        Source files are compiled to linkable objects.

        \item Link.

        Linkable objects are combined into a single executable module. References to symbols defined in external modules are resolved at this stage.
\end{enumerate}

In the simplest case there is only one input source file which doesn't contain external symbol references. If there are multiple input files, one of them must define the \code{entry} symbol at the beginning of the code.
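
The two stages can also be invoked separately. In the sketch below the file names and the \shellcmd{.obj} extension are hypothetical (no particular extension for linkable objects is prescribed here); the \shellcmd{-c} and \shellcmd{-o} options are described in the following subsection.

\begin{codepar}
    lxp32asm -c \emph{main}.asm -o \emph{main}.obj
    lxp32asm -c \emph{lib}.asm -o \emph{lib}.obj
    lxp32asm \emph{main}.obj \emph{lib}.obj -o \emph{program}.bin
\end{codepar}

In this example \shellcmd{\emph{main}.asm} would be the file that defines the \code{entry} symbol.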

\subsection{Command line syntax}
\label{subsec:assemblercmdline}

\begin{codepar}
    lxp32asm [ \emph{options} | \emph{input files} ]
\end{codepar}

\subsubsection{General options}

\begin{itemize}
        \item \shellcmd{-c} -- compile only (skip the Link stage).

        \item \shellcmd{-h}, \shellcmd{--help} -- display a short help message and exit.

        \item \shellcmd{-o \emph{file}} -- output file name.

        \item \shellcmd{--} -- do not interpret the subsequent command line arguments as options. Can be used if there are input file names starting with a dash.
\end{itemize}

\subsubsection{Compiler options}

\begin{itemize}
        \item \shellcmd{-i \emph{dir}} -- add \emph{dir} to the list of directories used to search for included files. Multiple directories can be specified with multiple \shellcmd{-i} arguments.
\end{itemize}

\subsubsection{Linker options (ignored in compile-only mode)}

\begin{itemize}
        \item \shellcmd{-a \emph{align}} -- object alignment. Must be a power of 2 and can't be less than 4. Default value is 4.

        \item \shellcmd{-b \emph{addr}} -- base address, that is, the address in memory where the executable image will be located. Must be a multiple of object alignment. Default value is 0.

        \item \shellcmd{-f \emph{fmt}} -- executable image format. See below for the list of supported formats.

        \item \shellcmd{-m \emph{file}} -- generate a map file. A map file is a human-readable list of all object and symbol addresses in the executable image.

        \item \shellcmd{-s \emph{size}} -- size of the executable image. Must be a multiple of 4. If total code size is less than the specified value, the executable image is padded with zeros. By default, the image is not padded.
\end{itemize}

\subsection{Output formats}

Output formats that can be specified with the \shellcmd{-f} command line option are listed below.

\begin{itemize}
        \item \shellcmd{bin} -- raw binary image (little-endian). This is the default format.
        \item \shellcmd{textio} -- text format representing binary data as a sequence of zeros and ones. This format can be directly read from VHDL (using the \code{std.textio} package) or Verilog\textregistered{} (using the \code{\$readmemb} system task).
        \item \shellcmd{dec} -- text format representing each word as a decimal number.
        \item \shellcmd{hex} -- text format representing each word as a hexadecimal number.
\end{itemize}
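
For example, a firmware image for simulation could be produced as follows (the file names and the image size are arbitrary and only illustrate the options described above):

\begin{codepar}
    lxp32asm -f textio -s 4096 -m \emph{program}.map -o \emph{program}.ram \emph{program}.asm
\end{codepar}

This requests the \shellcmd{textio} format, pads the image with zeros up to the requested size of 4096 (a multiple of 4, as required) and writes a map file alongside the image.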

\section{\shellcmd{lxp32dump} -- Disassembler}

\shellcmd{lxp32dump} takes an executable image and produces a source file in \lxp{} assembly language. The produced file is a valid program that can be compiled by \shellcmd{lxp32asm}.

\subsection{Command line syntax}

\begin{codepar}
    lxp32dump [ \emph{options} | \emph{input file} ]
\end{codepar}

Supported options are:

\begin{itemize}
        \item \shellcmd{-b \emph{addr}} -- executable image base address, only used for comments.

        \item \shellcmd{-f \emph{fmt}} -- input file format. All \shellcmd{lxp32asm} output formats are supported. If this option is not supplied, autodetection is performed.

        \item \shellcmd{-h}, \shellcmd{--help} -- display a short help message and exit.

        \item \shellcmd{-na} -- do not use instruction aliases (such as \instr{mov}, \instr{ret}, \instr{not}) and register aliases (such as \code{sp}, \code{rp}).

        \item \shellcmd{-o \emph{file}} -- output file name. By default, the standard output stream is used.

        \item \shellcmd{--} -- do not interpret subsequent command line arguments as options.
\end{itemize}
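
A typical invocation might look like this (the file names are hypothetical):

\begin{codepar}
    lxp32dump -na -o \emph{listing}.asm \emph{program}.bin
    lxp32asm -o \emph{rebuilt}.bin \emph{listing}.asm
\end{codepar}

The first command disassembles the image without using aliases, relying on format autodetection; the second one compiles the produced listing again.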

\section{\shellcmd{wigen} -- Interconnect generator}

\shellcmd{wigen} is a small tool that generates a VHDL description of a simple WISHBONE interconnect based on the shared bus topology. It supports any number of masters and slaves. The interconnect can then be used to create a SoC based on \lxp{}.

For interconnects with multiple masters, a priority-based arbitration circuit is inserted, with lower-numbered masters taking precedence. However, when a bus cycle is in progress ([CYC\_O] is asserted by the active master), the arbiter will not interrupt it even if a master with a higher priority level requests bus ownership.

\subsection{Command line syntax}

\begin{codepar}
    wigen [ \emph{option(s)} ] \emph{nm} \emph{ns} \emph{ma} \emph{sa} \emph{ps} [ \emph{pg} ]
\end{codepar}

\begin{itemize}
        \item\shellcmd{\emph{nm}} -- number of masters,
        \item\shellcmd{\emph{ns}} -- number of slaves,
        \item\shellcmd{\emph{ma}} -- master address width,
        \item\shellcmd{\emph{sa}} -- slave address width,
        \item\shellcmd{\emph{ps}} -- port size (8, 16, 32 or 64),
        \item\shellcmd{\emph{pg}} -- port granularity (8, 16, 32 or 64, default: the same as port size).
\end{itemize}

Supported options are:

\begin{itemize}
        \item \shellcmd{-e \emph{entity}} -- name of the design entity (default is \code{"intercon"}).

        \item \shellcmd{-h}, \shellcmd{--help} -- display a short help message and exit.

        \item \shellcmd{-o \emph{file}} -- output file name (default is \shellcmd{\emph{entity}.vhd}).

        \item \shellcmd{-p} -- generate pipelined arbiter (reduced combinatorial delays, increased latency).

        \item \shellcmd{-r} -- generate WISHBONE registered feedback signals ([CTI\_IO()] and [BTE\_IO()]).

        \item \shellcmd{-u} -- generate unsafe slave decoder (reduced combinatorial delays and resource usage, may not work properly if the address is invalid).
\end{itemize}
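
As an illustration (all numeric values are arbitrary), the following command generates an interconnect for two masters and four slaves with a 30-bit master address, a 28-bit slave address, a 32-bit port and 8-bit granularity, using a pipelined arbiter:

\begin{codepar}
    wigen -p -e intercon 2 4 30 28 32 8
\end{codepar}

Since no \shellcmd{-o} option is given, the description is written to \shellcmd{intercon.vhd}.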

\section{Building from source}
\label{sec:buildfromsource}

Prebuilt tool executables for 32-bit Microsoft\textregistered{} Windows\textregistered{} are included in the \lxp{} IP core package. For other platforms the tools must be built from source. Since they are developed in \cplusplus{} using only the standard library, it should be possible to build them for any platform that provides a modern \cplusplus{} compiler.

\subsection{Requirements}

The following software is required to build \lxp{} tools from source:

\begin{enumerate}
        \item A modern \cplusplus{} compiler, such as Microsoft\textregistered{} Visual Studio\textregistered{} 2013 or newer, GCC 4.8 or newer, Clang 3.4 or newer.
        \item CMake 3.3 or newer.
\end{enumerate}

\subsection{Build procedure}

This software uses CMake as a build system generator. Building it involves two steps: first, the \shellcmd{cmake} program is invoked to generate a native build environment (a set of Makefiles or an IDE project); second, the generated environment is used to build the software. More details can be found in the CMake documentation.

\subsubsection{Examples}

In the following examples, it is assumed that the commands are run from the \shellcmd{tools} subdirectory of the \lxp{} IP core package tree.

For Microsoft\textregistered{} Visual Studio\textregistered{}:

\begin{codepar}
    mkdir build
    cd build
    cmake -G "NMake Makefiles" ../src
    nmake
    nmake install
\end{codepar}

For MSYS:

\begin{codepar}
    mkdir build
    cd build
    cmake -G "MSYS Makefiles" ../src
    make
    make install
\end{codepar}

For MinGW without MSYS:

\begin{codepar}
    mkdir build
    cd build
    cmake -G "MinGW Makefiles" ../src
    mingw32-make
    mingw32-make install
\end{codepar}

For other platforms:

\begin{codepar}
    mkdir build
    cd build
    cmake ../src
    make
    make install
\end{codepar}

\appendix

\chapter{Instruction set reference}
\label{app:instructionset}

See Section \ref{sec:instructionformat} for a general description of \lxp{} instruction encoding.

\section{List of instructions by group}

\begin{ctabular}{lll}
        \toprule
        Instruction & Description & Opcode \\
        \midrule
        \tabcutin{3}{Data transfer} \\
        \midrule
        \hyperref[subsec:instr:mov]{\instr{mov}} & Move & alias for \code{\instr{add} dst, src, 0} \\
        \hyperref[subsec:instr:lc]{\instr{lc}} & Load Constant & \code{000001} \\
        \hyperref[subsec:instr:lcs]{\instr{lcs}} & Load Constant Short & \code{101xxx} \\
        \hyperref[subsec:instr:lw]{\instr{lw}} & Load Word & \code{001000} \\
        \hyperref[subsec:instr:lub]{\instr{lub}} & Load Unsigned Byte & \code{001010} \\
        \hyperref[subsec:instr:lsb]{\instr{lsb}} & Load Signed Byte & \code{001011} \\
        \hyperref[subsec:instr:sw]{\instr{sw}} & Store Word & \code{001100} \\
        \hyperref[subsec:instr:sb]{\instr{sb}} & Store Byte & \code{001110} \\
        \midrule
        \tabcutin{3}{Arithmetic operations} \\
        \midrule
        \hyperref[subsec:instr:add]{\instr{add}} & Add & \code{010000} \\
        \hyperref[subsec:instr:sub]{\instr{sub}} & Subtract & \code{010001} \\
        \hyperref[subsec:instr:neg]{\instr{neg}} & Negate & alias for \code{\instr{sub} dst, 0, src} \\
        \hyperref[subsec:instr:mul]{\instr{mul}} & Multiply & \code{010010} \\
        \hyperref[subsec:instr:divu]{\instr{divu}} & Divide Unsigned & \code{010100} \\
        \hyperref[subsec:instr:divs]{\instr{divs}} & Divide Signed & \code{010101} \\
        \hyperref[subsec:instr:modu]{\instr{modu}} & Modulo Unsigned & \code{010110} \\
        \hyperref[subsec:instr:mods]{\instr{mods}} & Modulo Signed & \code{010111} \\
        \midrule
        \tabcutin{3}{Bitwise operations} \\
        \midrule
        \hyperref[subsec:instr:not]{\instr{not}} & Bitwise Not & alias for \code{\instr{xor} dst, src, -1} \\
        \hyperref[subsec:instr:and]{\instr{and}} & Bitwise And & \code{011000} \\
        \hyperref[subsec:instr:or]{\instr{or}} & Bitwise Or & \code{011001} \\
        \hyperref[subsec:instr:xor]{\instr{xor}} & Bitwise Exclusive Or & \code{011010} \\
        \hyperref[subsec:instr:sl]{\instr{sl}} & Shift Left & \code{011100} \\
        \hyperref[subsec:instr:sru]{\instr{sru}} & Shift Right Unsigned & \code{011110} \\
        \hyperref[subsec:instr:srs]{\instr{srs}} & Shift Right Signed & \code{011111} \\
        \midrule
        \tabcutin{3}{Execution transfer} \\
        \midrule
        \hyperref[subsec:instr:jmp]{\instr{jmp}} & Jump & \code{100000} \\
        \hyperref[subsec:instr:cjmpxxx]{\instr{cjmp\emph{xxx}}} & Compare and Jump & \code{11\emph{xxxx}} (\code{\emph{xxxx}} = condition) \\
        \hyperref[subsec:instr:call]{\instr{call}} & Call Procedure & \code{100001} \\
        \hyperref[subsec:instr:ret]{\instr{ret}} & Return from Procedure & alias for \code{\instr{jmp} rp} \\
        \hyperref[subsec:instr:iret]{\instr{iret}} & Interrupt Return & alias for \code{\instr{jmp} irp} \\
        \midrule
        \tabcutin{3}{Miscellaneous instructions} \\
        \midrule
        \hyperref[subsec:instr:nop]{\instr{nop}} & No Operation & \code{000000} \\
        \hyperref[subsec:instr:hlt]{\instr{hlt}} & Halt & \code{000010} \\
\end{ctabular}

\section{Alphabetical list of instructions}

\settocdepth{subsection}

{
\setlength{\parindent}{0pt}
\nonzeroparskip

\subsection{\instr{add} -- Add}
\label{subsec:instr:add}

\subsubsection{Syntax}

\code{\instr{add} DST, RD1, RD2}

\subsubsection{Encoding}

\code{010000 T1 T2 DST RD1 RD2}

Example: \code{\instr{add} r2, r1, 10} $\rightarrow$ \code{0x4202010A}
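
The example can be decomposed field by field. The field widths (6, 1, 1, 8, 8 and 8 bits) and the meaning of the T1 and T2 bits (\code{1} for a register operand, \code{0} for an immediate) are inferred here from the encoding examples in this appendix; Section \ref{sec:instructionformat} is the authoritative description.

\begin{ctabular}{lll}
        \toprule
        Field & Bits & Meaning \\
        \midrule
        Opcode & \code{010000} & \instr{add} \\
        T1 & \code{1} & RD1 is a register \\
        T2 & \code{0} & RD2 is an immediate \\
        DST & \code{00000010} & \code{r2} \\
        RD1 & \code{00000001} & \code{r1} \\
        RD2 & \code{00001010} & 10 \\
        \bottomrule
\end{ctabular}

Concatenating the fields gives \code{0100 0010 0000 0010 0000 0001 0000 1010}, that is, \code{0x4202010A}.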

\subsubsection{Operation}

\code{DST := RD1 + RD2}

\subsection{\instr{and} -- Bitwise And}
\label{subsec:instr:and}

\subsubsection{Syntax}

\code{\instr{and} DST, RD1, RD2}

\subsubsection{Encoding}

\code{011000 T1 T2 DST RD1 RD2}

Example: \code{\instr{and} r2, r1, 0x3F} $\rightarrow$ \code{0x6202013F}

\subsubsection{Operation}

\code{DST := RD1 $\land$ RD2}

\subsection{\instr{call} -- Call Procedure}
\label{subsec:instr:call}

Save a pointer to the next instruction in the \code{rp} register and transfer execution to the address pointed to by the operand.

\subsubsection{Syntax}

\code{\instr{call} RD1}

\subsubsection{Encoding}

\code{100001 1 0 11111110 RD1 00000000}

RD1 must be a register.

Example: \code{\instr{call} r1} $\rightarrow$ \code{0x86FE0100}

\subsubsection{Operation}

\code{rp := \emph{return\_address}}

\code{goto RD1}

The pointer in RD1 is interpreted as described in Section \ref{sec:addressing}.

\subsection{\instr{cjmp\emph{xxx}} -- Compare and Jump}
\label{subsec:instr:cjmpxxx}

Compare two operands and transfer execution to the specified address if a condition is satisfied.

\subsubsection{Syntax}

\code{\instr{cjmpe} DST, RD1, RD2} (Equal)

\code{\instr{cjmpne} DST, RD1, RD2} (Not Equal)

\code{\instr{cjmpsg} DST, RD1, RD2} (Signed Greater)

\code{\instr{cjmpsge} DST, RD1, RD2} (Signed Greater or Equal)

\code{\instr{cjmpsl} DST, RD1, RD2} (Signed Less)

\code{\instr{cjmpsle} DST, RD1, RD2} (Signed Less or Equal)

\code{\instr{cjmpug} DST, RD1, RD2} (Unsigned Greater)

\code{\instr{cjmpuge} DST, RD1, RD2} (Unsigned Greater or Equal)

\code{\instr{cjmpul} DST, RD1, RD2} (Unsigned Less)

\code{\instr{cjmpule} DST, RD1, RD2} (Unsigned Less or Equal)

\subsubsection{Encoding}

\code{OPCODE T1 T2 DST RD1 RD2}

Opcodes:

\begin{tabularx}{\textwidth}{lL}
\instr{cjmpe}   & \code{111000} \\
\instr{cjmpne}  & \code{110100} \\
\instr{cjmpsg}  & \code{110001} \\
\instr{cjmpsge} & \code{111001} \\
\instr{cjmpug}  & \code{110010} \\
\instr{cjmpuge} & \code{111010} \\
\end{tabularx}

\instr{cjmpsl}, \instr{cjmpsle}, \instr{cjmpul}, \instr{cjmpule} instructions are aliases for \instr{cjmpsg}, \instr{cjmpsge}, \instr{cjmpug}, \instr{cjmpuge}, respectively, with RD1 and RD2 operands swapped.

Example: \code{\instr{cjmpuge} r2, r1, 5} $\rightarrow$ \code{0xEA020105}
1044
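
The alias rule stated above amounts to a table lookup followed by an operand swap. The Python sketch below illustrates it; the function name and the use of plain strings for operands are assumptions made for this example and do not describe how the \lxp{} assembler is implemented.

```python
# Alias mnemonic -> base mnemonic, as listed in the text above.
CJMP_ALIASES = {
    "cjmpsl": "cjmpsg",
    "cjmpsle": "cjmpsge",
    "cjmpul": "cjmpug",
    "cjmpule": "cjmpuge",
}


def resolve_cjmp_alias(mnemonic, dst, rd1, rd2):
    """Rewrite a 'less' comparison as its 'greater' form with RD1/RD2 swapped."""
    base = CJMP_ALIASES.get(mnemonic)
    if base is None:
        return mnemonic, dst, rd1, rd2  # not an alias, leave unchanged
    return base, dst, rd2, rd1


print(resolve_cjmp_alias("cjmpsl", "r2", "r1", "r3"))
# prints ('cjmpsg', 'r2', 'r3', 'r1')
```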
 
\subsubsection{Operation}

\code{if \emph{condition} then goto DST}

Pointer in DST is interpreted as described in Section \ref{sec:addressing}. Unlike most instructions, \instr{cjmp\emph{xxx}} does not write to DST.
 
\subsection{\instr{divs} -- Divide Signed}
\label{subsec:instr:divs}

\subsubsection{Syntax}

\code{\instr{divs} DST, RD1, RD2}

\subsubsection{Encoding}

\code{010101 T1 T2 DST RD1 RD2}

Example: \code{\instr{divs} r2, r1, -3} $\rightarrow$ \code{0x560201FD}

\subsubsection{Operation}

\code{DST := (\emph{signed}) RD1 / (\emph{signed}) RD2}

The result is rounded towards zero and is undefined if RD2 is zero. If the CPU was configured without a divider, this instruction returns \code{0}.
 
\subsection{\instr{divu} -- Divide Unsigned}
\label{subsec:instr:divu}

\subsubsection{Syntax}

\code{\instr{divu} DST, RD1, RD2}

\subsubsection{Encoding}

\code{010100 T1 T2 DST RD1 RD2}

Example: \code{\instr{divu} r2, r1, 7} $\rightarrow$ \code{0x52020107}

\subsubsection{Operation}

\code{DST := RD1 / RD2}

The result is rounded towards zero and is undefined if RD2 is zero. If the CPU was configured without a divider, this instruction returns \code{0}.
 
\subsection{\instr{hlt} -- Halt}
\label{subsec:instr:hlt}

Wait for an interrupt.

\subsubsection{Syntax}

\code{\instr{hlt}}

\subsubsection{Encoding}

\code{000010 0 0 00000000 00000000 00000000}

\subsubsection{Operation}

Pause execution until an interrupt is received.
 
\subsection{\instr{jmp} -- Jump}
\label{subsec:instr:jmp}

Transfer execution to the address pointed to by the operand.

\subsubsection{Syntax}

\code{\instr{jmp} RD1}

\subsubsection{Encoding}

\code{100000 1 0 00000000 RD1 00000000}

RD1 must be a register.

Example: \code{\instr{jmp} r1} $\rightarrow$ \code{0x82000100}

\subsubsection{Operation}

\code{goto RD1}

Pointer in RD1 is interpreted as described in Section \ref{sec:addressing}.
 
\subsection{\instr{iret} -- Interrupt Return}
\label{subsec:instr:iret}

Return from an interrupt handler.

\subsubsection{Syntax}

\code{\instr{iret}}

Alias for \code{\instr{jmp} irp}.
 
\subsection{\instr{lc} -- Load Constant}
\label{subsec:instr:lc}

Load a 32-bit word to the specified register. Note that values from the [-1048576; 1048575] range can be loaded more efficiently using the \instr{lcs} instruction.

\subsubsection{Syntax}

\code{\instr{lc} DST, WORD32}

\subsubsection{Encoding}

\code{000001 0 0 DST 00000000 00000000 WORD32}

Unlike other instructions, \instr{lc} occupies two 32-bit words.

Example: \code{\instr{lc} r1, 0x12345678} $\rightarrow$ \code{0x04010000 0x12345678}

\subsubsection{Operation}

\code{DST := WORD32}
 
\subsection{\instr{lcs} -- Load Constant Short}
\label{subsec:instr:lcs}

Load a signed value from the [-1048576; 1048575] range (a sign-extended 21-bit value) to the specified register. Unlike the \instr{lc} instruction, this instruction is encoded as a single word.

\subsubsection{Syntax}

\code{\instr{lcs} DST, VAL}

\subsubsection{Encoding}

\code{101 VAL[20:16] DST VAL[15:0]}

Example: \code{\instr{lcs} r1, -1000000} $\rightarrow$ \code{0xB001BDC0}

\subsubsection{Operation}

\code{DST := (\emph{signed}) VAL}
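
Because the 21-bit value is split around the DST field, the \instr{lcs} encoding is easy to get wrong by hand. The following Python sketch packs and unpacks the fields exactly as laid out above; the function names are illustrative and are not taken from the \lxp{} tools.

```python
def encode_lcs(dst, val):
    """Pack 101 VAL[20:16] DST VAL[15:0] into one 32-bit word."""
    if not -1048576 <= val <= 1048575:
        raise ValueError("value does not fit into 21 signed bits")
    v = val & 0x1FFFFF  # two's complement, truncated to 21 bits
    return (0b101 << 29) | ((v >> 16) << 24) | (dst << 16) | (v & 0xFFFF)


def decode_lcs_value(word):
    """Recover the sign-extended constant from an encoded lcs word."""
    v = (((word >> 24) & 0x1F) << 16) | (word & 0xFFFF)
    return v - (1 << 21) if v & (1 << 20) else v


word = encode_lcs(1, -1000000)
print(f"0x{word:08X}")         # prints 0xB001BDC0
print(decode_lcs_value(word))  # prints -1000000
```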
 
\subsection{\instr{lsb} -- Load Signed Byte}
\label{subsec:instr:lsb}

Load a byte from the specified address to the register, performing sign extension.

\subsubsection{Syntax}

\code{\instr{lsb} DST, RD1}

\subsubsection{Encoding}

\code{001011 1 0 DST RD1 00000000}

RD1 must be a register.

Example: \code{\instr{lsb} r2, r1} $\rightarrow$ \code{0x2E020100}

\subsubsection{Operation}

\code{DST := (\emph{signed}) (*(BYTE*)RD1)}

Pointer in RD1 is interpreted as described in Section \ref{sec:addressing}.
 
\subsection{\instr{lub} -- Load Unsigned Byte}
\label{subsec:instr:lub}

Load a byte from the specified address to the register. The upper 24 bits are zeroed.

\subsubsection{Syntax}

\code{\instr{lub} DST, RD1}

\subsubsection{Encoding}

\code{001010 1 0 DST RD1 00000000}

RD1 must be a register.

Example: \code{\instr{lub} r2, r1} $\rightarrow$ \code{0x2A020100}

\subsubsection{Operation}

\code{DST := *(BYTE*)RD1}

Pointer in RD1 is interpreted as described in Section \ref{sec:addressing}.
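
The difference between \instr{lsb} and \instr{lub} lies only in how the upper 24 bits of the destination are filled. The Python sketch below models the resulting 32-bit register value for a given byte; it does not model memory access or addressing, and the function names are illustrative.

```python
def load_unsigned_byte(byte):
    """Model lub: the upper 24 bits of the result are zero."""
    return byte & 0xFF


def load_signed_byte(byte):
    """Model lsb: bit 7 of the byte is copied into the upper 24 bits."""
    byte &= 0xFF
    return byte | 0xFFFFFF00 if byte & 0x80 else byte


print(f"0x{load_unsigned_byte(0x80):08X}")  # prints 0x00000080
print(f"0x{load_signed_byte(0x80):08X}")    # prints 0xFFFFFF80
```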
 
\subsection{\instr{lw} -- Load Word}
\label{subsec:instr:lw}

Load a word from the specified address to the register.

\subsubsection{Syntax}

\code{\instr{lw} DST, RD1}

\subsubsection{Encoding}

\code{001000 1 0 DST RD1 00000000}

RD1 must be a register.

Example: \code{\instr{lw} r2, r1} $\rightarrow$ \code{0x22020100}

\subsubsection{Operation}

\code{DST := *RD1}

Pointer in RD1 is interpreted as described in Section \ref{sec:addressing}.
 
\subsection{\instr{mods} -- Modulo Signed}
\label{subsec:instr:mods}

\subsubsection{Syntax}

\code{\instr{mods} DST, RD1, RD2}

\subsubsection{Encoding}

\code{010111 T1 T2 DST RD1 RD2}

Example: \code{\instr{mods} r2, r1, 10} $\rightarrow$ \code{0x5E02010A}

\subsubsection{Operation}

\code{DST := (\emph{signed}) RD1 mod (\emph{signed}) RD2}

The modulo operation satisfies the following condition: if $Q=A/B$ and $R=A \bmod B$, then $A=B \cdot Q+R$.

The result is undefined if RD2 is zero. If the CPU was configured without a divider, this instruction returns \code{0}.
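
Since \instr{divs} rounds towards zero, the condition $A=B \cdot Q+R$ fixes the sign of the signed remainder: it is zero or has the sign of the dividend. The Python sketch below models this behaviour; Python's built-in integer division rounds towards negative infinity instead, so the helper functions (whose names are illustrative) are written out explicitly.

```python
def div_trunc(a, b):
    """Signed division rounded towards zero, as described for divs."""
    if b == 0:
        raise ZeroDivisionError("the result is undefined when RD2 is zero")
    q = abs(a) // abs(b)
    return -q if (a < 0) != (b < 0) else q


def mod_trunc(a, b):
    """Remainder R chosen so that A = B * Q + R holds for Q = div_trunc(A, B)."""
    return a - b * div_trunc(a, b)


print(div_trunc(-7, 2), mod_trunc(-7, 2))  # prints -3 -1
print(-7 // 2, -7 % 2)                     # prints -4 1 (Python's floor semantics)
```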
 
\subsection{\instr{modu} -- Modulo Unsigned}
\label{subsec:instr:modu}

\subsubsection{Syntax}

\code{\instr{modu} DST, RD1, RD2}

\subsubsection{Encoding}

\code{010110 T1 T2 DST RD1 RD2}

Example: \code{\instr{modu} r2, r1, 10} $\rightarrow$ \code{0x5A02010A}

\subsubsection{Operation}

\code{DST := RD1 mod RD2}

The modulo operation satisfies the following condition: if $Q=A/B$ and $R=A \bmod B$, then $A=B \cdot Q+R$.

The result is undefined if RD2 is zero. If the CPU was configured without a divider, this instruction returns \code{0}.
 
\subsection{\instr{mov} -- Move}
\label{subsec:instr:mov}

\subsubsection{Syntax}

\code{\instr{mov} DST, RD1}

Alias for \code{\instr{add} DST, RD1, 0}.
 
\subsection{\instr{mul} -- Multiply}
\label{subsec:instr:mul}

Multiply two 32-bit values. The result is also 32-bit.

\subsubsection{Syntax}

\code{\instr{mul} DST, RD1, RD2}

\subsubsection{Encoding}

\code{010010 T1 T2 DST RD1 RD2}

Example: \code{\instr{mul} r2, r1, 3} $\rightarrow$ \code{0x4A020103}

\subsubsection{Operation}

\code{DST := RD1 * RD2}

Since the product width is the same as the operand width, the result of a multiplication does not depend on operand signedness.
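
The signedness remark above can be verified numerically: the low 32 bits of a product are the same whether the operands are read as signed or unsigned. The Python sketch below demonstrates this with hypothetical helper functions.

```python
MASK32 = 0xFFFFFFFF


def to_signed(value):
    """Reinterpret a 32-bit pattern as a two's complement signed integer."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value


def mul32(a, b):
    """Multiply and keep only the low 32 bits, as mul does."""
    return (a * b) & MASK32


a, b = 0xFFFFFFFD, 5  # 0xFFFFFFFD is -3 when read as signed
print(f"0x{mul32(a, b):08X}")                        # prints 0xFFFFFFF1
print(f"0x{mul32(to_signed(a), to_signed(b)):08X}")  # prints 0xFFFFFFF1
```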
 
\subsection{\instr{neg} -- Negate}
\label{subsec:instr:neg}

\subsubsection{Syntax}

\code{\instr{neg} DST, RD2}

Alias for \code{\instr{sub} DST, 0, RD2}.
 
\subsection{\instr{nop} -- No Operation}
\label{subsec:instr:nop}

\subsubsection{Syntax}

\code{\instr{nop}}

\subsubsection{Encoding}

\code{000000 0 0 00000000 00000000 00000000}

\subsubsection{Operation}

This instruction does not alter the machine state.
 
\subsection{\instr{not} -- Bitwise Not}
\label{subsec:instr:not}

\subsubsection{Syntax}

\code{\instr{not} DST, RD1}

Alias for \code{\instr{xor} DST, RD1, -1}.
 
\subsection{\instr{or} -- Bitwise Or}
\label{subsec:instr:or}

\subsubsection{Syntax}

\code{\instr{or} DST, RD1, RD2}

\subsubsection{Encoding}

\code{011001 T1 T2 DST RD1 RD2}

Example: \code{\instr{or} r2, r1, 0x3F} $\rightarrow$ \code{0x6602013F}

\subsubsection{Operation}

\code{DST := RD1 $\lor$ RD2}
 
\subsection{\instr{ret} -- Return from Procedure}
\label{subsec:instr:ret}

Return from a procedure.

\subsubsection{Syntax}

\code{\instr{ret}}

Alias for \code{\instr{jmp} rp}.
 
\subsection{\instr{sb} -- Store Byte}
\label{subsec:instr:sb}

Store the lowest byte from the register to the specified address.

\subsubsection{Syntax}

\code{\instr{sb} RD1, RD2}

\subsubsection{Encoding}

\code{001110 1 T2 00000000 RD1 RD2}

RD1 must be a register.

Example: \code{\instr{sb} r2, r1} $\rightarrow$ \code{0x3B000201}

\subsubsection{Operation}

\code{*(BYTE*)RD1 := RD2 $\land$ 0x000000FF}

Pointer in RD1 is interpreted as described in Section \ref{sec:addressing}.
 
\subsection{\instr{sl} -- Shift Left}
\label{subsec:instr:sl}

\subsubsection{Syntax}

\code{\instr{sl} DST, RD1, RD2}

\subsubsection{Encoding}

\code{011100 T1 T2 DST RD1 RD2}

Example: \code{\instr{sl} r2, r1, 5} $\rightarrow$ \code{0x72020105}

\subsubsection{Operation}

\code{DST := RD1 << RD2}

The result is undefined if RD2 is outside the [0; 31] range.
 
\subsection{\instr{srs} -- Shift Right Signed}
\label{subsec:instr:srs}

\subsubsection{Syntax}

\code{\instr{srs} DST, RD1, RD2}

\subsubsection{Encoding}

\code{011111 T1 T2 DST RD1 RD2}

Example: \code{\instr{srs} r2, r1, 5} $\rightarrow$ \code{0x7E020105}

\subsubsection{Operation}

\code{DST := ((\emph{signed}) RD1) >> RD2}

The result is undefined if RD2 is outside the [0; 31] range.
 
\subsection{\instr{sru} -- Shift Right Unsigned}
\label{subsec:instr:sru}

\subsubsection{Syntax}

\code{\instr{sru} DST, RD1, RD2}

\subsubsection{Encoding}

\code{011110 T1 T2 DST RD1 RD2}

Example: \code{\instr{sru} r2, r1, 5} $\rightarrow$ \code{0x7A020105}

\subsubsection{Operation}

\code{DST := RD1 >> RD2}

The result is undefined if RD2 is outside the [0; 31] range.
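
The three shift instructions differ only in which bits are shifted in. The Python sketch below models them on 32-bit values; the function names mirror the mnemonics for readability, and raising an error for an out-of-range shift amount is a choice made for this example, since the hardware result is simply undefined in that case.

```python
MASK32 = 0xFFFFFFFF


def check_amount(n):
    if not 0 <= n <= 31:
        raise ValueError("the result is undefined outside the [0; 31] range")


def sl(value, n):
    """Shift left: zeros enter from the right."""
    check_amount(n)
    return (value << n) & MASK32


def sru(value, n):
    """Shift right unsigned: zeros enter from the left."""
    check_amount(n)
    return (value & MASK32) >> n


def srs(value, n):
    """Shift right signed: copies of the sign bit enter from the left."""
    check_amount(n)
    value &= MASK32
    signed = value - (1 << 32) if value & 0x80000000 else value
    return (signed >> n) & MASK32


print(f"0x{sru(0x80000000, 4):08X}")  # prints 0x08000000
print(f"0x{srs(0x80000000, 4):08X}")  # prints 0xF8000000
```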
 
\subsection{\instr{sub} -- Subtract}
\label{subsec:instr:sub}

\subsubsection{Syntax}

\code{\instr{sub} DST, RD1, RD2}

\subsubsection{Encoding}

\code{010001 T1 T2 DST RD1 RD2}

Example: \code{\instr{sub} r2, r1, 5} $\rightarrow$ \code{0x46020105}

\subsubsection{Operation}

\code{DST := RD1 - RD2}
 
\subsection{\instr{sw} -- Store Word}
\label{subsec:instr:sw}

Store the value of the register to the specified address.

\subsubsection{Syntax}

\code{\instr{sw} RD1, RD2}

\subsubsection{Encoding}

\code{001100 1 T2 00000000 RD1 RD2}

RD1 must be a register.

Example: \code{\instr{sw} r2, r1} $\rightarrow$ \code{0x33000201}

\subsubsection{Operation}

\code{*RD1 := RD2}

Pointer in RD1 is interpreted as described in Section \ref{sec:addressing}.
 
\subsection{\instr{xor} -- Bitwise Exclusive Or}
\label{subsec:instr:xor}

\subsubsection{Syntax}

\code{\instr{xor} DST, RD1, RD2}

\subsubsection{Encoding}

\code{011010 T1 T2 DST RD1 RD2}

Example: \code{\instr{xor} r2, r1, 0x3F} $\rightarrow$ \code{0x6A02013F}

\subsubsection{Operation}

\code{DST := RD1 $\oplus$ RD2}
 
}

\settocdepth{section}

\chapter{Instruction cycle counts}

Cycle counts for \lxp{} instructions are listed in Table \ref{tab:cycles}, assuming that no pipeline stalls are caused by instruction bus latency or cache misses. These data are provided for reference only; software should not depend on them, as they may change in future hardware revisions.
 
\begin{table}[htbp]
        \centering
        \caption{Instruction cycle counts}
        \label{tab:cycles}
        \begin{tabularx}{0.8\textwidth}{LLLL}
                \toprule
                Instruction & Cycles & Instruction & Cycles \\
                \midrule
                \instr{add} & 1 & \instr{modu} & 37 \\
                \instr{and} & 1 & \instr{mov} & 1 \\
                \instr{call} & 4 & \instr{mul} & 2, 6 or 34\footnotemark[3] \\
                \instr{cjmp\emph{xxx}} & 5 or 2\footnotemark[1] & \instr{neg} & 1 \\
                \instr{divs} & 36 & \instr{nop} & 1 \\
                \instr{divu} & 36 & \instr{not} & 1 \\
                \instr{hlt} & N/A & \instr{or} & 1 \\
                \instr{jmp} & 4 & \instr{ret} & 4 \\
                \instr{iret} & 4 & \instr{sb} & $\ge$ 2\footnotemark[2] \\
                \instr{lc} & 2 & \instr{sl} & 2 \\
                \instr{lcs} & 1 & \instr{srs} & 2 \\
                \instr{lsb} & $\ge$ 3\footnotemark[2] & \instr{sru} & 2 \\
                \instr{lub} & $\ge$ 3\footnotemark[2] & \instr{sub} & 1 \\
                \instr{lw} & $\ge$ 3\footnotemark[2] & \instr{sw} & $\ge$ 2\footnotemark[2] \\
                \instr{mods} & 37 & \instr{xor} & 1 \\
                \bottomrule
        \end{tabularx}
\end{table}

\footnotetext[1]{Depends on whether the jump is taken or not.}
\footnotetext[2]{Depends on the data bus latency.}
\footnotetext[3]{Depends on the multiplier architecture. See Section \ref{sec:generics}.}
 
\chapter{LXP32 assembly language}
\label{app:assemblylanguage}

This appendix defines the assembly language used by \lxp{} development tools.

\section{Comments}

\lxp{} assembly language supports C-style comments, which can span multiple lines, and single-line \cplusplus{}-style comments:

\begin{codepar}\itshape
    /*
     * This is a comment.
     */

    // This is also a comment
\end{codepar}

From a parser's point of view, comments are equivalent to whitespace.

\section{Literals}

\lxp{} assembly language uses numeric and string literals similar to those provided by the C programming language.

Numeric literals can take the form of decimal, hexadecimal or octal numbers. Literals prefixed with \code{0x} are interpreted as hexadecimal, literals prefixed with \code{0} are interpreted as octal, and all other literals are interpreted as decimal. A numeric literal can also start with a unary plus or minus sign, which is considered a part of the literal.
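
The prefix rules above can be summarized by a short Python sketch. The function name is illustrative, and the handling of malformed input (left to Python's \code{int} constructor, which raises an error) is an assumption made for this example rather than a description of the \lxp{} assembler.

```python
def parse_numeric_literal(text):
    """Interpret a literal using the 0x (hex), 0 (octal) and decimal rules."""
    sign = 1
    body = text
    if body and body[0] in "+-":
        # The unary sign is part of the literal.
        sign = -1 if body[0] == "-" else 1
        body = body[1:]
    if body.startswith("0x"):
        value = int(body[2:], 16)
    elif len(body) > 1 and body.startswith("0"):
        value = int(body, 8)
    else:
        value = int(body, 10)
    return sign * value


print(parse_numeric_literal("0x3F"))  # prints 63
print(parse_numeric_literal("017"))   # prints 15
print(parse_numeric_literal("-3"))    # prints -3
```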
 
String literals must be enclosed in double quotes. The most common escape sequences used in C are supported (Table \ref{tab:stringescape}). Note that strings are not null-terminated in the LXP32 assembly language; when required, the terminating null character must be inserted explicitly.
 
\begin{table}[htbp]
        \caption{Escape sequences used in string literals}
        \label{tab:stringescape}
        \begin{tabularx}{\textwidth}{lL}
                \toprule
                Sequence & Interpretation \\
                \midrule
                \code{\textbackslash\textbackslash} & Backslash character \\
                \code{\textbackslash "} & Double quotation mark \\
                \code{\textbackslash '} & Single quotation mark (can be also used directly) \\
                \code{\textbackslash t} & Tabulation character \\
                \code{\textbackslash n} & Line feed \\
                \code{\textbackslash r} & Carriage return \\
                \code{\textbackslash x\emph{XX}} & Character with a hexadecimal code of \emph{XX} (1--2 digits) \\
                \code{\textbackslash \emph{XXX}} & Character with an octal code of \emph{XXX} (1--3 digits) \\
                \bottomrule
        \end{tabularx}
\end{table}
 
\section{Symbols}
\label{sec:symbols}

Symbols (labels) are used to refer to data or code locations. \lxp{} assembly language does not have distinct code and data labels: the same symbols are used in both contexts.

Symbol names must be valid identifiers. A valid identifier must start with an alphabetic character or an underscore, and may contain alphanumeric characters and underscores.

A symbol definition must be the first token on a source code line and must be followed by a colon. A symbol definition can occupy a separate line (in which case it refers to the following statement). Alternatively, a statement can follow the symbol definition on the same line.

Symbols can be used as operands to the \instr{lc} and \instr{lcs} instruction statements. A symbol reference can end with a \code{@\emph{n}} sequence, where \code{\emph{n}} is a numeric literal; in this case it is interpreted as an offset (in bytes) relative to the symbol definition. For the \instr{lcs} instruction, the resulting address must still fit into the sign-extended 21-bit value range (\code{0x00000000}--\code{0x000FFFFF} or \code{0xFFF00000}--\code{0xFFFFFFFF}), otherwise the linker will report an error.
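
The range restriction for \instr{lcs} can be expressed as a simple check on the resolved address. The Python sketch below is only an illustration of the two address ranges quoted above; the function names are hypothetical, and wrapping the sum to 32 bits is an assumption rather than documented linker behaviour.

```python
def resolve_reference(symbol_address, offset=0):
    """Address of a symbol@offset reference, wrapped to 32 bits (assumption)."""
    return (symbol_address + offset) & 0xFFFFFFFF


def fits_lcs(address):
    """True if the address is a sign-extended 21-bit value."""
    address &= 0xFFFFFFFF
    return address <= 0x000FFFFF or address >= 0xFFF00000


print(fits_lcs(resolve_reference(0x000FFFF0, 8)))     # prints True
print(fits_lcs(resolve_reference(0x000FFFF0, 0x20)))  # prints False
```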
 
By default all symbols are local, that is, they can only be referenced from the module where they were defined. To make a symbol accessible from other modules, use the \instr{\#export} directive. To reference a symbol defined in another module, use the \instr{\#import} directive.

A symbol named \code{entry} or \code{Entry} has a special meaning: it is used to inform the linker about the program entry point if there are multiple input files. It does not have to be exported. If defined, this symbol must precede the first instruction or data definition statement in the module. Only one module in the program can define the entry symbol.
 
\begin{codeparbreakable}
    \instr{lc} r10, jump\_label
    \instr{lc} r11, data\_word
\emph{// ...}
    \instr{sw} r11, r0 \emph{// store the value of r0 to the}
               \emph{// location pointed to by data\_word}
    \instr{jmp} r10    \emph{// transfer execution to jump\_label}
\emph{// ...}
jump\_label:
    \instr{mov} r1, r0
\emph{// ...}
data\_word:
    \instr{.word} 0x12345678
\end{codeparbreakable}
 
\section{Statements}

Each statement occupies a single source code line. There are three kinds of statements:

\begin{itemize}
        \item \emph{Directives} provide directions for the assembler that do not directly cause code generation.
        \item \emph{Data definition statements} insert arbitrary data into the generated code.
        \item \emph{Instruction statements} insert \lxp{} CPU instructions into the generated code.
\end{itemize}
 
\subsection{Directives}

The first token of a directive statement always starts with the \code{\#} character.

\begin{codepar}
\instr{\#define} \emph{identifier} \emph{token} [ \emph{token} ... ]
\end{codepar}

Defines a macro that will be substituted with one or more tokens. The \code{\emph{identifier}} must satisfy the requirements listed in Section \ref{sec:symbols}. Tokens can be anything, including keywords, identifiers, literals and separators (i.e. comma and colon characters).

\begin{codepar}
\instr{\#export} \emph{identifier}
\end{codepar}

Declares \code{\emph{identifier}} as an exported symbol. Exported symbols can be referenced by other modules.

\begin{codepar}
\instr{\#import} \emph{identifier}
\end{codepar}

Declares \code{\emph{identifier}} as an imported symbol. Used to refer to symbols exported by other modules.

\begin{codepar}
\instr{\#include} \emph{filename}
\end{codepar}

Processes \code{\emph{filename}} contents as if they were literally inserted at the point of the \instr{\#include} directive. \code{\emph{filename}} must be a string literal.

\begin{codepar}
\instr{\#message} \emph{msg}
\end{codepar}

Prints \code{\emph{msg}} to the standard output stream. \code{\emph{msg}} must be a string literal.
 
\subsection{Data definition statements}

The first token of a data definition statement always starts with the \code{.} (period) character.

\begin{codepar}
\instr{.align} [ \emph{alignment} ]
\end{codepar}

Ensures that code generated by the next data definition or instruction statement is aligned to a multiple of \code{\emph{alignment}} bytes, inserting padding zeros if needed. \code{\emph{alignment}} must be a power of 2 and cannot be less than 4. Default \code{\emph{alignment}} is 4. Instructions and words are always at least word-aligned; the \instr{.align} statement can be used to align them to a larger boundary, or to align byte data (see below).

The \instr{.align} statement is not guaranteed to work if the requested alignment is greater than the section alignment specified for the linker (see Subsection \ref{subsec:assemblercmdline}).
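
The amount of padding implied by an \instr{.align} statement follows directly from the current output offset. The Python sketch below shows the arithmetic under the constraints stated above (a power of 2, not less than 4); the function name is illustrative and does not correspond to any \lxp{} tool.

```python
def align_padding(offset, alignment=4):
    """Number of zero bytes needed to reach the next multiple of alignment."""
    if alignment < 4 or alignment & (alignment - 1):
        raise ValueError("alignment must be a power of 2 and at least 4")
    return -offset % alignment


print(align_padding(5))       # prints 3
print(align_padding(8))       # prints 0
print(align_padding(12, 16))  # prints 4
```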
 
\begin{codepar}
\instr{.byte} \emph{token} [, \emph{token} ... ]
\end{codepar}

Inserts one or more bytes into the output code. Each \code{\emph{token}} can be either a numeric literal with a valid range of [-128; 255] or a string literal. By default, bytes are not aligned.

To define a null-terminated string, the terminating null character must be inserted explicitly.

\begin{codepar}
\instr{.reserve} \emph{n}
\end{codepar}

Inserts \code{\emph{n}} zero bytes into the output code.

\begin{codepar}
\instr{.word} \emph{token} [, \emph{token} ... ]
\end{codepar}

Inserts one or more 32-bit words into the output code. Tokens must be numeric literals.
 
\subsection{Instruction statements}

Instruction statements have the following general syntax:

\begin{codepar}
    \instr{\emph{instruction}} [ \emph{operand} [, \emph{operand} ... ] ]
\end{codepar}

Depending on the instruction, operands can be registers, numeric literals or symbols. Supported instructions are listed in Appendix \ref{app:instructionset}.
 
\chapter{WISHBONE datasheet}
\label{app:wishbonedatasheet}

\section[Instruction bus (LXP32C only)]{Instruction bus (\lxp{}C only)}

\begin{ctabular}{ll}
        \toprule
        \tabcutin{2}{\makebox[0.9\textwidth][c]{General information}} \\
        \midrule
        WISHBONE revision & B3 \\
        Type of interface & MASTER \\
        Supported cycles  & BLOCK READ \\
        \midrule
        \tabcutin{2}{Signal names} \\
        \midrule
        \signal{clk\_i}       & CLK\_I \\
        \signal{rst\_i}       & RST\_I \\
        \signal{ibus\_cyc\_o} & CYC\_O \\
        \signal{ibus\_stb\_o} & STB\_O \\
        \signal{ibus\_cti\_o} & CTI\_O() \\
        \signal{ibus\_bte\_o} & BTE\_O() \\
        \signal{ibus\_ack\_i} & ACK\_I \\
        \signal{ibus\_adr\_o} & ADR\_O() \\
        \signal{ibus\_dat\_i} & DAT\_I() \\
        \midrule
        \tabcutin{2}{Supported tag signals} \\
        \midrule
        \signal{ibus\_cti\_o} & Cycle Type Identifier (address tag) \\
        & \hspace{\parindent} ``010'' (Incrementing burst cycle) \\
        & \hspace{\parindent} ``111'' (End-of-Burst) \\
        \signal{ibus\_bte\_o} & Burst Type Extension (address tag) \\
        & \hspace{\parindent} ``00'' (Linear burst) \\
        \midrule
        \tabcutin{2}{Dimensions} \\
        \midrule
        Port size & 32 \\
        Port granularity & 32 \\
        Maximum operand size & 32 \\
        Data transfer ordering & BIG/LITTLE ENDIAN \\
        Data transfer sequence & UNDEFINED \\
        \bottomrule
\end{ctabular}
 
\section{Data bus}

\begin{ctabular}{ll}
        \toprule
        \tabcutin{2}{\makebox[0.9\textwidth][c]{General information}} \\
        \midrule
        WISHBONE revision & B3 \\
        Type of interface & MASTER \\
        Supported cycles  & SINGLE READ/WRITE \\
                          & RMW \\
        \midrule
        \tabcutin{2}{Signal names} \\
        \midrule
        \signal{clk\_i}       & CLK\_I \\
        \signal{rst\_i}       & RST\_I \\
        \signal{dbus\_cyc\_o} & CYC\_O \\
        \signal{dbus\_stb\_o} & STB\_O \\
        \signal{dbus\_we\_o}  & WE\_O \\
        \signal{dbus\_sel\_o} & SEL\_O() \\
        \signal{dbus\_ack\_i} & ACK\_I \\
        \signal{dbus\_adr\_o} & ADR\_O() \\
        \signal{dbus\_dat\_o} & DAT\_O() \\
        \signal{dbus\_dat\_i} & DAT\_I() \\
        \midrule
        \tabcutin{2}{Dimensions} \\
        \midrule
        Port size & 32 \\
        Port granularity & 8 \\
        Maximum operand size & 32 \\
        Data transfer ordering & LITTLE ENDIAN \\
        Data transfer sequence & UNDEFINED \\
        \bottomrule
\end{ctabular}
 
\chapter{List of changes}

\section*{Version 1.1 (2019-01-11)}

This release introduces a minor but technically breaking hardware change: the START\_ADDR generic, which used to be 30-bit, has been extended to a full 32-bit word for convenience; the two least significant bits are ignored.

The other breaking change affects the assembly language syntax. Previously all symbols used to be public, and multiple modules could not define symbols with the same name. As of this release, only symbols explicitly exported using the \instr{\#export} directive are public. The \instr{\#extern} directive has been replaced by \instr{\#import}.

Other notable changes include:

\begin{itemize}
        \item A new instruction, \instr{lcs} (\instrname{Load Constant Short}), has been added, which loads a sign-extended 21-bit constant to a register. Unlike \instr{lc}, it is encoded as a single word and takes one cycle to execute.
        \item Optimizations in the divider unit. Division instructions (\instr{divs} and \instr{divu}) now take one fewer cycle to execute (modulo instructions are unaffected).
        \item LXP32 assembly language now supports a new instruction alias, \instr{neg} (\instrname{Negate}), which is equivalent to \code{\instr{sub} dst, 0, src}.
\end{itemize}

\section*{Version 1.0 (2016-02-20)}

Initial public release.
 
\end{document}
