% Source: OpenCores, Subversion repository lxp32 (https://opencores.org/ocsvn/lxp32/lxp32/trunk)
% File: lxp32/trunk/doc/src/trm/lxp32-trm.tex, revision 12
% !TEX TS-program = lualatex
\documentclass[a4paper,12pt,twoside,extrafontsizes]{memoir}

\input{preamble.tex}

\begin{document}

\input{frontmatter.tex}

\mainmatter

\chapter{Introduction}

\section{Main features}

\lxp{} (\emph{Lightweight eXecution Pipeline}) is a small 32-bit CPU IP core optimized for FPGA implementation. Its key features include:

\begin{itemize}
        \item portability (described in behavioral VHDL-93, not tied to any particular vendor);
        \item 3-stage hazard-free pipeline;
        \item 256 registers implemented as a RAM block;
        \item a simple instruction set with only 30 distinct opcodes;
        \item separate instruction and data buses, optional instruction cache;
        \item WISHBONE compatibility;
        \item 8 interrupts with hardwired priorities;
        \item optional divider.
\end{itemize}

As a lightweight CPU core, \lxp{} lacks some features of more advanced processors, such as nested interrupt handling, debugging support, floating-point and memory management units. \lxp{} is based on an original ISA (Instruction Set Architecture) which does not currently have a C compiler. It can be programmed in the assembly language covered by Appendix \ref{app:assemblylanguage}.
 
Two major hardware versions of the CPU are provided: \lxp{}U, which does not include an instruction cache and uses the Low Latency Interface (Section \ref{sec:lli}) to fetch instructions, and \lxp{}C, which fetches instructions over the WISHBONE bus through an instruction cache. These versions are otherwise identical and have the same instruction set architecture.
 
\section{Implementation estimates}

Typical results of \lxp{} core FPGA implementation are presented in Table \ref{tab:implementation}. Note that these data are only useful as rough estimates, since actual results depend greatly on tool versions and configuration, design constraints, device utilization ratio and other factors.

Data on two configurations are provided:

\begin{itemize}
        \item \emph{Compact}: \lxp{}U (without instruction cache), no divider, 2-cycle multiplier.
        \item \emph{Full}: \lxp{}C (with instruction cache), divider, 2-cycle multiplier.
\end{itemize}

The slowest speed grade was used for clock frequency estimation.

\begin{table}[htbp]
        \caption{Typical results of \lxp{} core FPGA implementation}
        \label{tab:implementation}
        \begin{tabularx}{\textwidth}{Q{0.5\textwidth}LL}
                \toprule
                Resource & Compact & Full \\
                \midrule
                \multicolumn{3}{c}{Microsemi\textregistered{} IGLOO\textregistered{}2 M2GL005-FG484} \\
                \midrule
                Logic elements (LUT+DFF) & 1457 & 2086 \\
                \hspace*{1em}LUTs & 1421 & 1999 \\
                \hspace*{1em}Flip-flops & 706 & 1110 \\
                Mathblocks (MACC) & 3 & 3 \\
                RAM blocks (RAM1K18) & 2 & 3 \\
                Clock frequency & 107.7 MHz & 109.2 MHz \\
                \midrule
                \multicolumn{3}{c}{Xilinx\textregistered{} Artix\textregistered{}-7 xc7a15tfgg484-1} \\
                \midrule
                Slices & 235 & 365 \\
                \hspace*{1em}LUTs & 666 & 1011 \\
                \hspace*{1em}Flip-flops & 528 & 883 \\
                DSP blocks (DSP48E1) & 4 & 4 \\
                RAM blocks (RAMB18E1) & 2 & 3 \\
                Clock frequency & 111.9 MHz & 120.2 MHz \\
                \bottomrule
        \end{tabularx}
\end{table}

\section{Structure of this manual}
 
A general description of \lxp{} operation from a software developer's point of view is given in Chapter \ref{ch:isa}, \styledtitleref{ch:isa}. Future versions of the \lxp{} CPU are intended to remain backward compatible with this architecture.

Topics related to hardware, such as synthesis, implementation and interfacing with other IP cores, are covered in Chapter \ref{ch:integration}, \styledtitleref{ch:integration}. A brief description of the \lxp{} pipelined architecture is provided in Chapter \ref{ch:pipeline}, \styledtitleref{ch:pipeline}. The \lxp{} IP core package includes a verification environment (self-checking testbench) which can be used to simulate the design as described in Chapter \ref{ch:simulation}, \styledtitleref{ch:simulation}.
 
Documentation for tools shipped with the \lxp{} IP core package (assembler/linker, disassembler and interconnect generator) is provided in Chapter \ref{ch:developmenttools}, \styledtitleref{ch:developmenttools}.
 
The appendices include a detailed description of the \lxp{} instruction set, instruction cycle counts and the \lxp{} assembly language definition. The WISHBONE datasheet required by the WISHBONE specification is also provided.
 
\chapter{Instruction set architecture}
\label{ch:isa}

\section{Data format}
 
Most \lxp{} instructions work with 32-bit data words. A few instructions that address individual bytes use little-endian order, that is, the least significant byte is stored at the lowest address. Signed values are encoded in two's complement format.
 
\section{Instruction format}
\label{sec:instructionformat}

All \lxp{} instructions are encoded as 32-bit words, with the exception of \instr{lc} (\instrname{Load Constant}), which occupies two adjacent 32-bit words. Instructions in memory must be aligned to word boundaries.
 
Most arithmetic and logical instructions take two source operands and write the result to an independent destination register. The general instruction format is shown in Figure \ref{fig:instructionformat}.
 
\begin{figure}[htbp]
        \centering
        \includegraphics[scale=1.2]{images/instructionformat.pdf}
        \caption{\lxp{} instruction format}
        \label{fig:instructionformat}
\end{figure}

This format includes the following fields:

\begin{enumerate}
        \item OPCODE -- a 6-bit instruction code (see Appendix \ref{app:instructionset}).
        \item T1 -- type of the RD1 field.
        \item T2 -- type of the RD2 field.
        \item DST -- register number (usually the destination register).
        \item RD1 -- register/direct operand 1.
        \item RD2 -- register/direct operand 2.
\end{enumerate}
 
Some of these fields may be meaningless for a particular instruction; such unused fields are filled with zeros.

The DST field specifies one of the 256 \lxp{} registers. The RD1 and RD2 fields can denote either source register operands or direct (immediate) operands: if the corresponding T field is 1, the RD value is a register number; otherwise it is interpreted as a direct signed byte in two's complement format (valid values range from $-128$ to $127$).
 
For example, consider the following instruction that adds \code{10} to \code{r0} and writes the result to \code{r1}:

\begin{codepar}
    \instr{add} r1, r0, 10
\end{codepar}

In this example, OPCODE is \code{010000}, T1 is \code{1}, T2 is \code{0}, DST is \code{00000001}, RD1 is \code{00000000} and RD2 is \code{00001010}. Hence, the instruction is encoded as \code{0x4201000A}.
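
The field layout can be cross-checked with a short Python sketch. It assumes the bit positions implied by the example above: OPCODE in bits 31--26, T1 in bit 25, T2 in bit 24, and DST, RD1 and RD2 in bits 23--16, 15--8 and 7--0, respectively. The helpers \code{encode} and \code{decode\_rd} are hypothetical and are not part of the \lxp{} development tools.

\begin{verbatim}
```python
def encode(opcode, t1, t2, dst, rd1, rd2):
    """Pack instruction fields into a 32-bit word (hypothetical helper).

    rd1/rd2 are register numbers when the matching T flag is 1,
    otherwise signed immediates in the range -128..127.
    """
    assert 0 <= opcode < 64 and 0 <= dst < 256
    for t, rd in ((t1, rd1), (t2, rd2)):
        if t:
            assert 0 <= rd < 256, "register number out of range"
        else:
            assert -128 <= rd <= 127, "immediate out of range"
    return ((opcode << 26) | (t1 << 25) | (t2 << 24) | (dst << 16)
            | ((rd1 & 0xFF) << 8) | (rd2 & 0xFF))


def decode_rd(t, rd_byte):
    """Interpret an 8-bit RD field: register number or sign-extended immediate."""
    if t:
        return ("reg", rd_byte)
    return ("imm", rd_byte - 256 if rd_byte & 0x80 else rd_byte)


# add r1, r0, 10: OPCODE 010000, T1 = 1 (r0 is a register), T2 = 0 (10 is direct)
word = encode(0b010000, 1, 0, 1, 0, 10)
print(hex(word))  # 0x4201000a
```
\end{verbatim}

Decoding works in reverse: \code{decode\_rd(0, 0xF6)} yields the immediate $-10$, while the same byte with T set to 1 names register \code{r246}.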
 
For convenience, some instructions have alias mnemonics. For example, \lxp{} does not have a distinct \instr{mov} opcode: instead, \code{\instr{mov} dst, src} is an alias for \code{\instr{add} dst, src, 0}.

A complete list of \lxp{} instructions is provided in Appendix \ref{app:instructionset}.

\section{Registers}
 
\lxp{} has 256 registers denoted \code{r0}--\code{r255}. The first 240 of them (\code{r0} to \code{r239}) are general-purpose registers (GPR); the last 16 (\code{r240} to \code{r255}) are special-purpose registers (SPR). For convenience, some special-purpose registers have alias names: for example, \code{r255} can also be referred to as \code{sp} (stack pointer). Special-purpose registers are listed in Table \ref{tab:spr}. Some of these registers are reserved and should not be accessed by software.
 
\begin{table}[htbp]
        \caption{\lxp{} special-purpose registers}
        \label{tab:spr}
        \begin{tabularx}{\textwidth}{llL}
                \toprule
                Alias name & Generic name & Description \\
                \midrule
                \code{iv0} & \code{r240} & Interrupt vector 0 (Section \ref{sec:interrupthandling}) \\
                \code{iv1} & \code{r241} & Interrupt vector 1 (Section \ref{sec:interrupthandling}) \\
                \code{iv2} & \code{r242} & Interrupt vector 2 (Section \ref{sec:interrupthandling}) \\
                \code{iv3} & \code{r243} & Interrupt vector 3 (Section \ref{sec:interrupthandling}) \\
                \code{iv4} & \code{r244} & Interrupt vector 4 (Section \ref{sec:interrupthandling}) \\
                \code{iv5} & \code{r245} & Interrupt vector 5 (Section \ref{sec:interrupthandling}) \\
                \code{iv6} & \code{r246} & Interrupt vector 6 (Section \ref{sec:interrupthandling}) \\
                \code{iv7} & \code{r247} & Interrupt vector 7 (Section \ref{sec:interrupthandling}) \\
                \multicolumn{1}{l}{---} & \code{r248}\,--\,\code{r251} & \emph{Reserved} \\
                \code{cr}  & \code{r252} & Control register (Section \ref{sec:interrupthandling}) \\
                \code{irp} & \code{r253} & Interrupt return pointer (Section \ref{sec:interrupthandling}) \\
                \code{rp}  & \code{r254} & Return pointer (Section \ref{sec:callingprocedures})\\
                \code{sp}  & \code{r255} & Stack pointer (Section \ref{sec:stack}) \\
                \bottomrule
        \end{tabularx}
\end{table}

All registers are zero-initialized during the CPU reset.

\section{Addressing}
\label{sec:addressing}

All addressing in \lxp{} is indirect. In order to access a memory location, its address must be stored in a register; any available register can be used for this purpose.
 
\lxp{} uses a 32-bit address space in which each address refers to an individual byte. Some instructions, namely \instr{lsb} (\instrname{Load Signed Byte}), \instr{lub} (\instrname{Load Unsigned Byte}) and \instr{sb} (\instrname{Store Byte}), provide byte-granular access, in which case all 32 bits of the address are significant. For other memory accesses the two least significant address bits are ignored, since \lxp{} does not support unaligned access to 32-bit data words (during simulation, a warning is emitted if such a transaction is attempted).
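
These rules can be illustrated with a small Python model of a byte-addressed, little-endian memory. It is a hypothetical sketch rather than part of the \lxp{} RTL or testbench. The class and method names only mirror the instruction mnemonics, and the model assumes that \instr{lsb} sign-extends and \instr{lub} zero-extends the loaded byte to 32 bits.

\begin{verbatim}
```python
class Memory:
    """Hypothetical byte-addressed, little-endian data memory."""

    def __init__(self, size):
        self.data = bytearray(size)

    def sw(self, addr, value):
        base = addr & ~3          # word access: two low address bits ignored
        self.data[base:base + 4] = (value & 0xFFFFFFFF).to_bytes(4, "little")

    def lw(self, addr):
        base = addr & ~3
        return int.from_bytes(self.data[base:base + 4], "little")

    def sb(self, addr, value):
        self.data[addr] = value & 0xFF    # byte access: all address bits used

    def lub(self, addr):
        return self.data[addr]            # zero-extended to 32 bits

    def lsb(self, addr):
        b = self.data[addr]
        return (b | 0xFFFFFF00) if b & 0x80 else b  # sign-extended to 32 bits


mem = Memory(16)
mem.sw(4, 0x11223344)
print(hex(mem.lub(4)), hex(mem.lw(6)))    # 0x44 0x11223344
```
\end{verbatim}

Note that \code{mem.lw(6)} returns the same word as \code{mem.lw(4)}, because the two least significant address bits are dropped for word accesses.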
 
A special rule applies to pointers that refer to instructions: since instructions are always word-aligned, the least significant bit is interpreted as the \code{IRF} (\emph{Interrupt Return Flag}). See Section \ref{sec:interrupthandling} for details.

\section{Stack}
\label{sec:stack}
 
The current pointer to the top of the stack is stored in the \code{sp} register. The hardware treats this register like any general-purpose register: the CPU never accesses the stack implicitly (procedure calls and interrupts use register-based conventions).
 
Software can access the stack as follows:

\begin{codepar}
    \emph{// push r0 on the stack}
    \instr{sub} sp, sp, 4
    \instr{sw} sp, r0
    \emph{// pop r0 from the stack}
    \instr{lw} r0, sp
    \instr{add} sp, sp, 4
\end{codepar}
 
Before the stack is used, the \code{sp} register must be set up to point to a valid memory location. The simplest programs can operate without a stack, or even without any data memory if the registers are sufficient to hold the program state.
 
\section{Calling procedures}
\label{sec:callingprocedures}
 
\lxp{} provides a \instr{call} instruction which saves the address of the next instruction in the \code{rp} register and transfers execution to the address stored in the register operand. A procedure returns by executing the \code{\instr{jmp} rp} instruction, which also has the alias \instr{ret}.

If a procedure itself calls a nested procedure, the \instr{call} instruction overwrites the return address in the \code{rp} register. Hence, unless the nested call is a tail call (see below), the procedure must save the \code{rp} value somewhere; the most general solution is to use the stack:
 
\begin{codepar}
    \instr{sub} sp, sp, 4
    \instr{sw} sp, rp
    ...
    \instr{lc} r0, Nested_proc
    \instr{call} r0
    ...
    \instr{lw} rp, sp
    \instr{add} sp, sp, 4
    \instr{ret}
\end{codepar}

Procedures that don't use the \instr{call} instruction (sometimes called \emph{leaf procedures}) don't need to save the \code{rp} value.

Since \instr{ret} is just an alias for \code{\instr{jmp} rp}, one can also use \instrname{Compare and Jump} instructions (\instr{cjmp\emph{xxx}}) to perform a conditional procedure return. For example, consider the following procedure which calculates the absolute value of \code{r1}:

\begin{codepar}
Abs_proc:
    \instr{cjmpsge} rp, r1, 0 \emph{// return immediately if r1>=0}
    \instr{neg} r1, r1 \emph{// otherwise, negate r1}
    \instr{ret} \emph{// jmp rp}
\end{codepar}
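
When \code{Abs\_proc} is checked in simulation, a reference model helps to define the expected results. The Python sketch below is hypothetical. It assumes that \instr{cjmpsge} performs a signed comparison and that \instr{neg} computes $0 - x$ modulo $2^{32}$. Under these assumptions, the most negative value, \code{0x80000000}, is returned unchanged, because its absolute value does not fit into a 32-bit signed register.

\begin{verbatim}
```python
MASK32 = 0xFFFFFFFF


def to_signed(x):
    """Interpret a 32-bit register value as a two's complement integer."""
    return x - (1 << 32) if x & 0x80000000 else x


def abs_proc_model(r1):
    """Expected r1 after Abs_proc; r1 is given as an unsigned 32-bit value."""
    if to_signed(r1) >= 0:       # cjmpsge rp, r1, 0: return immediately
        return r1
    return (-r1) & MASK32        # neg r1, r1: wraps modulo 2**32


print(hex(abs_proc_model(0xFFFFFFFB)))  # 0x5, the absolute value of -5
print(hex(abs_proc_model(0x80000000)))  # 0x80000000, unchanged
```
\end{verbatim}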
 
A \emph{tail call} is a special type of procedure call in which the calling procedure calls a nested procedure as its last action before returning. In such cases the \instr{call} instruction can be replaced with \instr{jmp}, so that when the nested procedure executes \instr{ret}, it returns directly to the caller's parent procedure.
 
Although the \lxp{} architecture doesn't mandate any particular calling convention, some general recommendations are presented below:

\begin{enumerate}
        \item Pass arguments and return values through the \code{r1}--\code{r31} registers (a procedure can have multiple return values).
        \item If necessary, the \code{r0} register can be used to load the procedure address.
        \item Designate the \code{r0}--\code{r31} registers as \emph{caller-saved}, that is, they are not guaranteed to be preserved across procedure calls and must be saved by the caller if needed. The procedure can use them for any purpose, regardless of whether they are used to pass arguments and/or return values.
\end{enumerate}
 
\section{Interrupt handling}
\label{sec:interrupthandling}

\subsection{Control register}

\lxp{} supports 8 interrupts with hardwired priority levels (interrupts with lower vector numbers have higher priority). Interrupt vectors (pointers to interrupt handlers) are stored in the \code{iv0}--\code{iv7} registers. Interrupt handling is controlled by the \code{cr} register (Table \ref{tab:cr}).

\begin{table}[htbp]
        \caption{Control register}
        \label{tab:cr}
        \begin{tabularx}{\textwidth}{lL}
                \toprule
                Bit & Description \\
                \midrule
                0      & Enable interrupt 0 \\
                & \ldots \\
                7      & Enable interrupt 7 \\
                8      & Interrupt 0 wake-up flag \\
                & \ldots \\
                15     & Interrupt 7 wake-up flag \\
                31--16 & \emph{Reserved} \\
                \bottomrule
        \end{tabularx}
\end{table}
 
Disabled interrupts are ignored altogether: if the CPU receives an interrupt request signal while the corresponding interrupt is disabled, the interrupt will not be processed even if it is enabled later.
 
The wake-up flag marks the interrupt as a \emph{wake-up interrupt} (see below).
 
Like other registers, \code{cr} is zero-initialized during the CPU reset, meaning that no interrupts are initially enabled.
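
The value to be written to \code{cr} follows directly from the bit layout in Table \ref{tab:cr}. The Python helper below is a hypothetical convenience and is not part of the \lxp{} tools. It builds the register value from the lists of enabled and wake-up interrupts and leaves the reserved bits at zero.

\begin{verbatim}
```python
def make_cr(enabled=(), wakeup=()):
    """Build a cr value: bits 0-7 enable interrupts, bits 8-15 mark wake-up."""
    value = 0
    for n in enabled:
        assert 0 <= n <= 7, "interrupt number out of range"
        value |= 1 << n
    for n in wakeup:
        assert 0 <= n <= 7, "interrupt number out of range"
        value |= 1 << (n + 8)
    return value  # bits 31-16 are reserved and stay zero


print(hex(make_cr(enabled=[0, 2], wakeup=[7])))  # 0x8005
```
\end{verbatim}

A value such as \code{0x8005} does not fit into a direct operand, so it would be loaded with \code{\instr{lc} cr, 0x8005}. Values from 0 to 127 also fit into the direct operand of \instr{mov}.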
 
\subsection{Invoking interrupt handlers}
 
Interrupt handlers are invoked by the CPU similarly to procedures (Section \ref{sec:callingprocedures}). The difference is that the return address is stored in the \code{irp} register (as opposed to \code{rp}) and the least significant bit of the register (\code{IRF} -- \emph{Interrupt Return Flag}) is set.
 
An interrupt handler returns using the \code{\instr{jmp} irp} instruction which also has an \instr{iret} alias. Until the interrupt handler returns, the CPU will defer further interrupt processing (although incoming interrupt requests will still be registered). This also means that the \code{irp} register value will not be unexpectedly overwritten. When executing the \code{\instr{jmp} irp} instruction, the CPU will recognize the \code{IRF} flag and resume interrupt processing as usual. It is also possible to perform a conditional return from the interrupt handler, similarly to the technique described in Section \ref{sec:callingprocedures} for conditional procedure returns.
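
The following Python sketch is one possible behavioral reading of the rules above. It is not derived from the \lxp{} RTL, and all names in it are hypothetical. In this model, requests that arrive while an interrupt is disabled are dropped, and the lowest-numbered pending interrupt is serviced first. No new handler is entered until \instr{iret} has been executed. Wake-up and non-returnable interrupts are not modeled.

\begin{verbatim}
```python
class InterruptModel:
    """Hypothetical behavioral model of LXP32 interrupt arbitration."""

    def __init__(self):
        self.cr = 0              # enable bits 0-7; zero after reset
        self.pending = 0         # registered, not yet serviced requests
        self.in_handler = False  # True between handler entry and iret

    def request(self, n):
        """Rising edge on request line n: registered only if enabled."""
        if self.cr & (1 << n):
            self.pending |= 1 << n

    def take(self):
        """Vector number of the handler to invoke next, or None if deferred."""
        if self.in_handler or not self.pending:
            return None
        n = (self.pending & -self.pending).bit_length() - 1  # lowest set bit
        self.pending &= ~(1 << n)
        self.in_handler = True
        return n

    def iret(self):
        """jmp irp with IRF set: interrupt processing resumes."""
        self.in_handler = False


cpu = InterruptModel()
cpu.request(3)               # dropped: interrupt 3 is still disabled
cpu.cr = 0b00101000          # enable interrupts 3 and 5
cpu.request(5)
cpu.request(3)
print(cpu.take())            # 3 (higher priority)
print(cpu.take())            # None (handler still running)
cpu.iret()
print(cpu.take())            # 5
```
\end{verbatim}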
 
\subsection{Wake-up interrupts}

When a wake-up interrupt is received, the interrupt handler is not called, but the CPU still resumes execution if halted by the \instr{hlt} instruction. The effect is similar to invoking an interrupt with an empty handler (containing only \instr{iret}), but without the overhead of interrupt processing. Wake-up interrupts do not affect the CPU when it is not halted.

Unlike normal interrupts, wake-up interrupts are processed even when the CPU executes an interrupt handler for another interrupt.

\subsection{Non-returnable interrupts}
 
If an interrupt vector has the least significant bit (\code{IRF}) set, the CPU resumes interrupt processing immediately instead of waiting for the handler to return. Such a handler should not execute \instr{iret}, since the \code{irp} register could have been overwritten by another interrupt. This technique can be useful when the CPU's only task is to process external events:
 
\begin{codeparbreakable}
\emph{// Set the IRF to mark the interrupt as non-returnable}
    \instr{lc} iv0, main\_loop@1
    \instr{mov} cr, 1 \emph{// enable the interrupt}
    \instr{hlt} \emph{// wait for an interrupt request}
main\_loop:
\emph{// Process the event...}
    \instr{hlt} \emph{// wait for the next interrupt request}
\end{codeparbreakable}

Note that \instr{iret} is never called in this example.
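
How an instruction pointer such as \code{main\_loop@1} is interpreted can be sketched in Python. Following Section \ref{sec:addressing}, the sketch assumes that bit 0 of an instruction pointer is the \code{IRF} and that the two least significant bits are ignored when fetching. It also assumes that \code{@1} adds 1 to the label address. The function name and the address of \code{main\_loop} are hypothetical.

\begin{verbatim}
```python
def split_instruction_pointer(ptr):
    """Return (fetch address, IRF) for a 32-bit instruction pointer."""
    fetch_addr = ptr & 0xFFFFFFFC   # instructions are word-aligned
    irf = bool(ptr & 1)             # Interrupt Return Flag
    return fetch_addr, irf


MAIN_LOOP = 0x00000100              # hypothetical address of main_loop
iv0 = MAIN_LOOP + 1                 # assumed effect of "lc iv0, main_loop@1"
print(split_instruction_pointer(iv0))  # (256, True)
```
\end{verbatim}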
 
\chapter{Integration}
\label{ch:integration}

\section{Overview}
 
The \lxp{} IP core is delivered in the form of a synthesizable RTL description written in \mbox{VHDL-93}. It does not use any technology-specific primitives and should work out of the box with major FPGA synthesis tools. \lxp{} can be integrated into both VHDL- and Verilog\textregistered{}-based SoC designs.

The two major \lxp{} hardware versions have separate top-level design units:
 
\begin{itemize}
        \item \shellcmd{lxp32u\_top} -- \lxp{}U (without instruction cache),
        \item \shellcmd{lxp32c\_top} -- \lxp{}C (with instruction cache).
\end{itemize}
 
A high-level block diagram of the CPU is shown in Figure \ref{fig:blockdiagram}. Schematic symbols for \lxp{}U and \lxp{}C are shown in Figure \ref{fig:symbols}.
 
\begin{figure}[htbp]
        \centering
        \includegraphics[scale=0.85]{images/blockdiagram.pdf}
        \caption{\lxp{} CPU block diagram}
        \label{fig:blockdiagram}
\end{figure}

\begin{figure}[htbp]
        \centering
        \includegraphics[scale=0.85]{images/symbols.pdf}
        \caption{Schematic symbols for \lxp{}U and \lxp{}C}
        \label{fig:symbols}
\end{figure}
 
\lxp{}U uses the Low Latency Interface (LLI) described in Section \ref{sec:lli} to fetch instructions. This interface is designed to interact with low-latency on-chip peripherals such as RAM blocks. It works best with slaves that can return the instruction on the cycle after its address has been set, although the slave can still introduce wait states if needed. The Low Latency Interface can also be connected to a custom (external) instruction cache.

To achieve the lowest possible latency, some LLI outputs are not registered. For this reason the LLI is not suitable for interaction with off-chip peripherals.

\lxp{}C is designed to work with high-latency memory controllers and uses a simple instruction cache based on a ring buffer. The instructions are fetched over the WISHBONE instruction bus. To maximize throughput, the CPU makes use of the WISHBONE registered feedback signals [CTI\_O()] and [BTE\_O()]. All outputs on this bus are registered. This version is also recommended where LLI combinatorial delays are unacceptable.
 
Both \lxp{}U and \lxp{}C use the WISHBONE protocol for the data bus.

\section{Ports}

\begin{ctabular}{lccl}
        \toprule
        Port & Direction & Bus width & Description \\
        \midrule
        \tabcutin{4}{Global signals} \\
        \midrule
        \signal{clk\_i} & in & 1 & System clock \\
        \signal{rst\_i} & in & 1 & Synchronous reset, active high \\
        \midrule
        \tabcutin{4}{Instruction bus -- Low Latency Interface (\lxp{}U only)} \\
        \midrule
        \signal{lli\_re\_o} & out & 1 & Read enable output, active high \\
        \signal{lli\_adr\_o} & out & 30 & Address output \\
        \signal{lli\_dat\_i} & in & 32 & Data input \\
        \signal{lli\_busy\_i} & in & 1 & Busy flag input, active high \\
        \midrule
        \tabcutin{4}{Instruction bus -- WISHBONE (\lxp{}C only)} \\
        \midrule
        \signal{ibus\_cyc\_o} & out & 1 & Cycle output \\
        \signal{ibus\_stb\_o} & out & 1 & Strobe output \\
        \signal{ibus\_cti\_o} & out & 3 & Cycle type identifier \\
        \signal{ibus\_bte\_o} & out & 2 & Burst type extension \\
        \signal{ibus\_ack\_i} & in & 1 & Acknowledge input \\
        \signal{ibus\_adr\_o} & out & 30 & Address output \\
        \signal{ibus\_dat\_i} & in & 32 & Data input \\
        \midrule
        \tabcutin{4}{Data bus} \\
        \midrule
        \signal{dbus\_cyc\_o} & out & 1 & Cycle output \\
        \signal{dbus\_stb\_o} & out & 1 & Strobe output \\
        \signal{dbus\_we\_o} & out & 1 & Write enable output \\
        \signal{dbus\_sel\_o} & out & 4 & Select output \\
        \signal{dbus\_ack\_i} & in & 1 & Acknowledge input \\
        \signal{dbus\_adr\_o} & out & 30 & Address output \\
        \signal{dbus\_dat\_o} & out & 32 & Data output \\
        \signal{dbus\_dat\_i} & in & 32 & Data input \\
        \midrule
        \tabcutin{4}{Other ports} \\
        \midrule
        \signal{irq\_i} & in & 8 & Interrupt requests \\
        \bottomrule
\end{ctabular}

\section{Generics}
\label{sec:generics}

The following generics can be used to configure the \lxp{} IP core parameters.

\subsection{DBUS\_RMW}

By default, \lxp{} uses the \signal{dbus\_sel\_o} (byte enable) port to perform byte-granular write transactions initiated by the \instr{sb} (\instrname{Store Byte}) instruction. If this option is set to \code{true}, \signal{dbus\_sel\_o} is always tied to \code{"1111"}, and byte-granular write access is performed using the RMW (read-modify-write) cycle. The latter method is slower, but can work with slaves that do not have the [SEL\_I()] port.
 
This feature assumes that read and write transactions have no side effects, so it can be unsuitable for some slaves (for example, slaves with read-sensitive registers such as FIFOs or clear-on-read status registers).
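
A read-modify-write byte store amounts to replacing one byte lane of the word read back from the slave. The Python sketch below shows only the data manipulation, not the bus timing. The function name is hypothetical, and the byte lanes follow the little-endian order used by \lxp{}.

\begin{verbatim}
```python
def rmw_store_byte(old_word, byte_addr, value):
    """Merge an 8-bit value into a 32-bit word, as an RMW sb would."""
    shift = 8 * (byte_addr & 3)           # little-endian byte lane
    mask = 0xFF << shift
    return (old_word & ~mask & 0xFFFFFFFF) | ((value & 0xFF) << shift)


# sb to an address whose two low bits are 2, with 0x11223344 read back
print(hex(rmw_store_byte(0x11223344, 0x20000002, 0x55)))  # 0x11553344
```
\end{verbatim}

The sketch also shows why side effects matter: the store is carried out as a full-word read followed by a full-word write, so the slave observes both accesses.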
 
\subsection{DIVIDER\_EN}
 
\lxp{} includes a divider unit which has fairly low performance but occupies a considerable amount of resources. It can be disabled by setting this option to \code{false}.
 
\subsection{IBUS\_BURST\_SIZE}

Instruction bus burst size. Default value is 16. Only for \lxp{}C.

\subsection{IBUS\_PREFETCH\_SIZE}

Number of words that the instruction cache will read ahead from the current instruction pointer. Default value is 32. Only for \lxp{}C.

\subsection{MUL\_ARCH}

\lxp{} provides three multiplier options:

\begin{itemize}
        \item \code{"dsp"} is the fastest architecture designed for technologies that provide fast parallel $16 \times 16$ multipliers, which includes most modern FPGA families. One multiplication takes 2 clock cycles.
        \item \code{"opt"} architecture uses a semi-parallel multiplication algorithm based on carry-save accumulation of partial products. It is designed for technologies that do not provide fast $16 \times 16$ multipliers. One multiplication takes 6 clock cycles.
        \item \code{"seq"} is a fully sequential design. One multiplication takes 34 clock cycles.
\end{itemize}
 
The default multiplier architecture is \code{"dsp"}. This option is recommended for most modern FPGA devices regardless of the optimization goal, since it is not only the fastest but also occupies the least amount of general-purpose logic resources. However, it will create a timing bottleneck on technologies that lack fast multipliers.

For older FPGA families that don't provide dedicated multipliers, the \code{"opt"} architecture can be used if decent throughput is still needed; it is designed to avoid creating a timing bottleneck on such technologies. Alternatively, the \code{"seq"} architecture can be used when throughput is not a concern.
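
The principle of a fully sequential multiplier can be illustrated with a textbook shift-and-add loop that processes one multiplier bit per iteration. This Python sketch is a generic illustration of that technique, not a description of the actual \code{"seq"} RTL. It keeps only the low 32 bits of the product, assuming that this is what the 32-bit destination register receives.

\begin{verbatim}
```python
def seq_multiply(a, b):
    """Shift-and-add multiplication: one bit of b per iteration, 32 in total."""
    a &= 0xFFFFFFFF
    b &= 0xFFFFFFFF
    acc = 0
    for _ in range(32):
        if b & 1:
            acc = (acc + a) & 0xFFFFFFFF  # add the shifted multiplicand
        a = (a << 1) & 0xFFFFFFFF         # weight of the next multiplier bit
        b >>= 1
    return acc                             # low 32 bits of the product


print(seq_multiply(6, 7))                  # 42
print(hex(seq_multiply(0xFFFFFFFF, 3)))    # 0xfffffffd, i.e. -1 * 3 = -3
```
\end{verbatim}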
 
\subsection{START\_ADDR}

Address of the first instruction to be executed after CPU reset. Default value is \code{0}. The two least significant bits are ignored as instructions are always word-aligned.

\section{Clock and reset}
\label{sec:clockreset}
 
All flip-flops in the CPU are triggered by the rising edge of the \signal{clk\_i} signal. No specific requirements are imposed on the \signal{clk\_i} signal apart from the usual constraints on setup and hold times.

\lxp{} is reset synchronously when the \signal{rst\_i} signal is asserted. If the system reset signal comes from an asynchronous source, a synchronization circuit must be used; an example of such a circuit is shown in Figure \ref{fig:resetsync}.
 
\begin{figure}[htbp]
        \centering
        \includegraphics[scale=1]{images/resetsync.pdf}
        \caption{Reset synchronization circuit}
        \label{fig:resetsync}
\end{figure}
 
In SRAM-based FPGAs, flip-flops and RAM blocks have a deterministic state after a bitstream is loaded. On such technologies \lxp{} can operate without reset. In this case the \signal{rst\_i} port can be tied to a logical \code{0} in the RTL design to allow the synthesizer to remove redundant logic.
 
\signal{clk\_i} and \signal{rst\_i} signals also serve the role of [CLK\_I] and [RST\_I] WISHBONE signals, respectively, for both instruction and data buses.

\section{Low Latency Interface}
\label{sec:lli}
 
The Low Latency Interface (LLI), used by \lxp{}U to fetch instructions, is a simple pipelined synchronous protocol with a typical latency of one cycle. It was designed to make it easy to connect the CPU to on-chip program RAM or a cache. The LLI timing diagram is shown in Figure \ref{fig:llitiming}.
 
\begin{figure}[htbp]
        \centering
        \includegraphics[scale=1]{images/llitiming.pdf}
        \caption{Low Latency Interface timing diagram (\lxp{}U)}
        \label{fig:llitiming}
\end{figure}

To request a word, the master produces its address on \signal{lli\_adr\_o} and asserts \signal{lli\_re\_o}. The request is considered valid when \signal{lli\_re\_o} is high and \signal{lli\_busy\_i} is low on the same clock cycle. On the next cycle after a valid request, the slave must either produce data on \signal{lli\_dat\_i} or assert \signal{lli\_busy\_i} to indicate that data are not ready. \signal{lli\_busy\_i} must be held high until the valid data are present on the \signal{lli\_dat\_i} port.

The data provided by the slave are only required to be valid on the next cycle after a valid request (if \signal{lli\_busy\_i} is not asserted) or on the cycle when \signal{lli\_busy\_i} is deasserted after being held high. Otherwise \signal{lli\_dat\_i} is undefined.

The values of \signal{lli\_re\_o} and \signal{lli\_adr\_o} are not guaranteed to be preserved by the master while the slave is busy.
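
The handshake rules above can be checked with a cycle-level Python model. The slave below is hypothetical: it answers every valid request after a configurable number of wait states and holds \signal{lli\_busy\_i} high until the data are ready. The \code{fetch} helper plays the role of a master that requests consecutive words back to back. Neither the slave nor the helper is part of the \lxp{} testbench.

\begin{verbatim}
```python
class LliSlave:
    """Hypothetical LLI slave that inserts a fixed number of wait states."""

    def __init__(self, mem, wait_states=0):
        self.mem = mem                  # word-addressed program memory
        self.wait_states = wait_states
        self.pending = False            # a valid request is being served
        self.countdown = 0              # cycles left before data are ready
        self.addr = 0                   # latched word address

    def outputs(self):
        """Signals driven in the current cycle: (lli_busy_i, lli_dat_i or None)."""
        busy = self.pending and self.countdown > 0
        data = self.mem[self.addr] if self.pending and not busy else None
        return busy, data

    def clock(self, re, adr):
        """Rising edge of clk_i, given lli_re_o and lli_adr_o from the master."""
        busy, data = self.outputs()
        if re and not busy:             # valid request: latch the address
            self.pending, self.addr = True, adr
            self.countdown = self.wait_states
        elif busy:                      # count down the remaining wait states
            self.countdown -= 1
        elif data is not None:          # data delivered, no new request
            self.pending = False


def fetch(slave, count):
    """Master fetching words 0..count-1 back to back; returns (words, cycles)."""
    words, nxt, cycles = [], 0, 0
    while len(words) < count:
        busy, data = slave.outputs()
        if data is not None:
            words.append(data)
        re = nxt < count
        slave.clock(re, nxt)
        if re and not busy:             # request accepted in this cycle
            nxt += 1
        cycles += 1
    return words, cycles


print(fetch(LliSlave([0xA0, 0xA1, 0xA2]), 3))  # ([160, 161, 162], 4)
```
\end{verbatim}

In this model, a never-busy slave delivers one word per cycle after an initial one-cycle latency, and each wait state adds one cycle per fetched word.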
 
The simplest slaves, such as on-chip RAM blocks that are never busy, can be connected to the LLI directly. The address, data and read enable ports are connected, and the \signal{lli\_busy\_i} signal is tied to a logical \code{0}. In this case \signal{lli\_re\_o} can even be ignored, although doing so can theoretically increase power consumption.
 
Since the \signal{lli\_re\_o} output signal is not registered, this interface is not suitable for interaction with off-chip peripherals. Also, care should be taken to avoid introducing too much additional combinatorial delay on its outputs.

The instruction bus, whether LLI or WISHBONE, doesn't support access to individual bytes and uses a 30-bit address port to address 32-bit words (instructions are always word-aligned). The lower two bits of the 32-bit address are ignored for the purpose of addressing. Consider the following example:

\begin{codeparbreakable}
    \instr{lc} r0, 0x10000000
    \instr{jmp} r0
\emph{// 0x04000000 will appear on lli_adr_o or ibus_adr_o}
\end{codeparbreakable}

\section{WISHBONE instruction bus}

The \lxp{}C CPU fetches instructions over the WISHBONE bus. Its parameters are defined in the WISHBONE datasheet (Appendix \ref{app:wishbonedatasheet}). For a detailed description of the bus protocol refer to the WISHBONE specification, revision B3.
 
With the classic WISHBONE handshake, decent throughput can only be achieved when the slave is able to terminate cycles asynchronously. This is usually possible only for the simplest slaves, which should probably use the Low Latency Interface instead. To maximize throughput for complex, high-latency slaves, the \lxp{}C instruction bus uses the optional WISHBONE address tags [CTI\_O()] (Cycle Type Identifier) and [BTE\_O()] (Burst Type Extension). These signals are hints that allow the slave to predict the address the master will set in the next cycle and prepare the data in advance. The slave can ignore these hints and process requests as classic WISHBONE cycles, although performance would almost certainly suffer in this case.

A typical \lxp{}C instruction bus burst timing diagram is shown in Figure \ref{fig:ibustiming}.
 
\begin{figure}[htbp]
        \centering
        \includegraphics[scale=0.786]{images/ibustiming.pdf}
        \caption{Typical WISHBONE instruction bus burst (\lxp{}C)}
        \label{fig:ibustiming}
\end{figure}

\section{WISHBONE data bus}

\lxp{} uses the WISHBONE bus to interact with data memory and other peripherals. This bus is distinct from the instruction bus; its parameters are defined in the WISHBONE datasheet (Appendix \ref{app:wishbonedatasheet}).

The data bus uses a 30-bit \signal{dbus\_adr\_o} port to address 32-bit words; the \signal{dbus\_sel\_o} port is used to select individual bytes to be written or read. The upper 30 bits of the address appear on the \signal{dbus\_adr\_o} port, while the lower two bits are decoded to create a 4-bit \signal{dbus\_sel\_o} signal. Consider:

\begin{codeparbreakable}
    \instr{lc} r0, 0x20000002
    \instr{sb} r0, 0x55
\emph{// write 0x55 to the address in r0}
\emph{// 0x08000000 will appear on dbus_adr_o}
\emph{// 0x4 will appear on dbus_sel_o}
\end{codeparbreakable}
488
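
The decoding in the example above can be sketched in Python as follows. The helper \code{dbus\_signals} is hypothetical; it assumes that bit $i$ of \code{dbus\_sel\_o} corresponds to byte offset $i$ (as the example shows for offset 2) and that word accesses are aligned.

```python
# Hypothetical sketch: split a byte address into the word address driven on
# dbus_adr_o and the byte-lane mask driven on dbus_sel_o.
# Assumption: sel bit i selects byte offset i; word accesses are aligned.

def dbus_signals(byte_addr: int, size: int):
    """size is 1 for byte accesses (sb, lub, lsb) or 4 for words (sw, lw)."""
    byte_addr &= 0xFFFFFFFF
    adr = byte_addr >> 2              # upper 30 bits -> dbus_adr_o
    if size == 1:
        sel = 1 << (byte_addr & 3)    # lower 2 bits decoded to one lane
    elif size == 4:
        sel = 0xF                     # all four lanes
    else:
        raise ValueError("only byte and word accesses are modelled")
    return adr, sel


adr, sel = dbus_signals(0x20000002, 1)   # the sb example above
print(hex(adr), hex(sel))                # 0x8000000 0x4
```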
 
The byte-granular access feature is optional. If it is not needed, the \signal{dbus\_sel\_o} port can be left unconnected. It is also possible to set the \code{DBUS\_RMW} generic to \code{true} to enable byte-granular access emulation using the read-modify-write (RMW) cycle, which works even if the interconnect or slave doesn't provide the [SEL\_I()] port (Section \ref{sec:generics}).
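
The effect of read-modify-write byte emulation can be modelled in a few lines of Python. This is a behavioral sketch, not the \lxp{} implementation: memory is a dictionary of 32-bit words indexed by word address, the helper name is hypothetical, and byte offset 0 is assumed to occupy bits 7..0 of the word.

```python
# Behavioral sketch of a byte store emulated with read-modify-write
# (cf. DBUS_RMW = true). Hypothetical helper; byte offset 0 is assumed
# to occupy bits 7..0 of the 32-bit word.

def rmw_store_byte(mem: dict, byte_addr: int, value: int) -> None:
    word_addr = (byte_addr & 0xFFFFFFFF) >> 2
    shift = 8 * (byte_addr & 3)
    old = mem.get(word_addr, 0)                       # 1) read the whole word
    new = (old & ~(0xFF << shift)) | ((value & 0xFF) << shift)  # 2) merge byte
    mem[word_addr] = new & 0xFFFFFFFF                 # 3) write the whole word


mem = {0x08000000: 0x11223344}
rmw_store_byte(mem, 0x20000002, 0x55)                 # same store as above
print(hex(mem[0x08000000]))                           # 0x11553344
```

Because every emulated byte store costs a read cycle and a write cycle, enabling \code{DBUS\_RMW} trades bus throughput for compatibility with slaves that lack [SEL\_I()].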
 
For a detailed description of the bus protocol, refer to the WISHBONE specification, revision B3.

Typical timing diagrams for write and read cycles are shown in Figure \ref{fig:dbustiming}. In these examples the peripheral terminates the cycle asynchronously; however, it can also introduce wait states by delaying the \signal{dbus\_ack\_i} signal.

\begin{figure}[htbp]
        \centering
        \includegraphics[scale=0.928]{images/dbustiming.pdf}
        \caption{Typical WISHBONE data bus WRITE and READ cycles}
        \label{fig:dbustiming}
\end{figure}

\section{Interrupts}
\label{sec:interrupts}

\lxp{} registers an interrupt condition when the corresponding request signal goes from \code{0} to \code{1}. Transitions from \code{1} to \code{0} are ignored. All interrupt request signals must be synchronous with the system clock (\signal{clk\_i}); if coming from an asynchronous source, they must be synchronized using a sequence of at least two flip-flops clocked by \signal{clk\_i}. These flip-flops are not included in the \lxp{} core in order not to increase interrupt processing delay for interrupt sources that are inherently synchronous. Failure to properly synchronize interrupt request signals will cause timing violations that manifest themselves as intermittent, hard-to-debug faults.
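
The following cycle-level Python sketch combines the two pieces described above: an external two-flip-flop synchronizer and the rising-edge detection performed by the CPU. The class name is hypothetical, and the latency it shows is a property of this model, not a statement about the actual \lxp{} interrupt timing.

```python
# Cycle-level sketch (hypothetical names): a two-stage synchronizer, which
# must be provided outside the core, followed by 0 -> 1 edge detection.
# Latency in this model is illustrative only.

class IrqInput:
    def __init__(self):
        self.ff1 = 0    # first synchronizer stage (may go metastable in HW)
        self.ff2 = 0    # second stage: safe to use in the clk_i domain
        self.prev = 0   # previous synchronized value, for edge detection

    def clock(self, async_irq: int) -> bool:
        """Advance one clk_i cycle; return True if an interrupt is registered."""
        self.prev = self.ff2
        self.ff2 = self.ff1
        self.ff1 = async_irq
        return self.ff2 == 1 and self.prev == 0   # only 0 -> 1 counts


irq = IrqInput()
print([irq.clock(x) for x in [0, 1, 1, 1, 0, 0, 1, 1]])
```

A request held high for several cycles is registered only once, and the falling edge is ignored, matching the behavior described above.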
 
\section{Synthesis and optimization}
\label{sec:synthesis}

\subsection{Technology specific primitives}

The \lxp{} RTL design is described in behavioral VHDL. However, it can also benefit from certain special resources provided by most FPGA devices, namely, RAM blocks and dedicated multipliers. For improved portability, hardware description that can potentially be mapped to such resources is localized in separate design units:

\begin{itemize}
        \item \shellcmd{lxp32\_ram256x32} -- a dual-port synchronous $256 \times 32$ bit RAM with one write port and one read port;
        \item \shellcmd{lxp32\_mul16x16} -- an unsigned $16 \times 16$ multiplier with an output register.
\end{itemize}

These design units contain behavioral descriptions of the respective hardware that are recognizable by FPGA synthesis tools. Usually no adjustments are needed, as the synthesizer will automatically infer an appropriate primitive from the behavioral description. If automatic inference produces unsatisfactory results, these design units can be replaced with library element wrappers. The same applies to ASIC logic synthesis software, which is unlikely to infer complex primitives.

\lxp{} implements its own bypass logic to handle situations where RAM read and write addresses collide. It does not depend on the read/write conflict resolution behavior of the underlying primitive.
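
To show why the bypass makes the primitive's read-during-write behavior irrelevant, the Python sketch below models a $256 \times 32$ RAM that deliberately returns the old data on a collision, and then forwards the data being written instead. It is an illustration under these assumptions, not a translation of \shellcmd{lxp32\_ram256x32}; the class name is hypothetical.

```python
# Illustrative model (hypothetical class, not the LXP32 RTL): a 256x32 RAM
# with one write port and one read port, wrapped with collision bypass logic.

class ScratchpadWithBypass:
    def __init__(self):
        self.mem = [0] * 256

    def cycle(self, rd_addr: int, we: bool, wr_addr: int, wr_data: int) -> int:
        """Perform one clock cycle; return the data delivered for rd_addr."""
        rd_addr &= 0xFF
        wr_addr &= 0xFF
        ram_out = self.mem[rd_addr]              # RAM modelled as "old data"
        collision = we and rd_addr == wr_addr
        if we:
            self.mem[wr_addr] = wr_data & 0xFFFFFFFF
        # Bypass: on a collision, forward the new value instead of ram_out
        return wr_data & 0xFFFFFFFF if collision else ram_out


sp = ScratchpadWithBypass()
print(sp.cycle(1, True, 1, 20))   # 20: forwarded despite the "old data" RAM
```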
 
\subsection{General optimization guidelines}

This subsection contains general advice on achieving satisfactory synthesis results regardless of the optimization goal. Some of these suggestions are also mentioned in other parts of this manual.

\begin{enumerate}
        \item If the technology doesn't provide dedicated multiplier resources, consider using the \code{"opt"} or \code{"seq"} multiplier architecture (Section \ref{sec:generics}).

        \item Ensure that the instruction bus has adequate throughput. For \lxp{}C, check that the slave supports the WISHBONE registered feedback signals [CTI\_I()] and [BTE\_I()].

        \item Multiplexing instruction and data buses, or connecting them to the same interconnect that allows only one master at a time to be active (i.e. \emph{shared bus} interconnect topology), is not recommended. If you absolutely must do so, assign a higher priority level to the data bus; otherwise, instruction prefetches will massively slow down data transactions.

        \item For small programs, consider mapping code and data memory to the beginning or end of the address space (i.e. \code{0x00000000}--\code{0x000FFFFF} or \code{0xFFF00000}--\code{0xFFFFFFFF}) to be able to load pointers with the \instr{lcs} instruction, which saves both memory and CPU cycles compared to \instr{lc}.
\end{enumerate}
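
Whether a given pointer can be loaded with \instr{lcs} can be checked mechanically. The two ranges quoted above are exactly the 32-bit values equal to the sign extension of their own low 21 bits; the 21-bit width is inferred from those ranges, and the helper names below are hypothetical.

```python
# Sketch (hypothetical helpers): a 32-bit value lies in 0x00000000-0x000FFFFF
# or 0xFFF00000-0xFFFFFFFF exactly when it equals the sign extension of its
# low 21 bits. The 21-bit width is inferred from these ranges.

def sign_extend(value: int, bits: int) -> int:
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def fits_lcs(value: int) -> bool:
    v = value & 0xFFFFFFFF
    return (sign_extend(v, 21) & 0xFFFFFFFF) == v


for addr in (0x000FFFFF, 0x00100000, 0xFFF00000, 0xFFEFFFFF):
    print(hex(addr), fits_lcs(addr))
```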
 
\subsection{Optimizing for timing}

\begin{enumerate}
        \item Set up reasonable timing constraints. Do not overconstrain the design by more than 10--15~\%.

        \item Analyze the worst path. The natural \lxp{} timing bottleneck usually goes from the scratchpad (register file) output through the ALU (in the Execute stage) to the scratchpad input. If timing analysis reports other critical paths, the problem is likely to lie elsewhere. If the \signal{rst\_i} signal becomes a bottleneck, promote it to a global network or, with SRAM-based FPGAs, consider operating without reset (see Section \ref{sec:clockreset}). Critical paths affecting the WISHBONE state machines could indicate problems with interconnect performance.

        \item Configure the synthesis tool to reduce the fanout limit. Note that setting this limit too low can have the opposite effect.

        \item Synthesis tools can support additional options to improve timing, such as the \emph{Retiming} algorithm, which rearranges registers and combinatorial logic across the pipeline in an attempt to balance delays. The efficiency of such algorithms is not very predictable. In general, sloppy designs are the most likely to benefit from them, while for a carefully designed circuit timing can sometimes get worse.
\end{enumerate}

\subsection{Optimizing for area}

\begin{enumerate}
        \item Consider disabling the divider if it is not used (see Section \ref{sec:generics}).

        \item Relaxing timing constraints can sometimes allow the synthesizer to produce a more area-efficient circuit.

        \item Increase the fanout limit in the synthesizer settings to reduce buffer replication.
\end{enumerate}
 
\chapter{Hardware architecture}
\label{ch:pipeline}

The \lxp{} CPU is based on a 3-stage hazard-free pipelined architecture and uses a large RAM-based register file (scratchpad) with two read ports and one write port. The pipeline includes the following stages:

\begin{itemize}
        \item\emph{Fetch} -- fetches instructions from the program memory.
        \item\emph{Decode} -- decodes instructions and reads register operand values from the scratchpad.
        \item\emph{Execute} -- executes instructions and writes the results (if any) to the scratchpad.
\end{itemize}

\lxp{} instructions are encoded in such a way that operand register numbers can be known without decoding the instruction (Section \ref{sec:instructionformat}). When the \emph{Fetch} stage produces an instruction, scratchpad input addresses are set immediately, before the instruction itself is decoded. If the instruction does not use one or both of the register operands, the corresponding data read from the scratchpad are discarded. Collision bypass logic in the scratchpad detects situations where the \emph{Decode} stage tries to read a register which is currently being written by the \emph{Execute} stage and forwards its value, bypassing the RAM block and avoiding Read After Write (RAW) pipeline hazards. Other types of data hazards are also impossible with this architecture.

As an example, consider the following simple code chunk:

\begin{codepar}
    \instr{mov} r0, 10 \emph{// alias for add r0, 10, 0}
    \instr{mov} r1, 20 \emph{// alias for add r1, 20, 0}
    \instr{add} r2, r0, r1
\end{codepar}

Table \ref{tab:examplepipeline} illustrates how this chunk is processed by the \lxp{} pipeline. Note that on the fourth cycle the \emph{Decode} stage requests the \code{r1} register value while the \emph{Execute} stage writes to the same register. Collision bypass logic in the scratchpad ensures that the \emph{Decode} stage reads the correct (new) value of \code{r1} without stalling the pipeline.

\begin{table}[htbp]
        \caption{Example of the \lxp{} pipeline operation}
        \small
        \label{tab:examplepipeline}
        \begin{tabularx}{\textwidth}{lllL}
                \toprule
                Cycle & Fetch & Decode & Execute \\
                \midrule
                1 & \code{\instr{add} r0, 10, 0} & & \\
                \midrule
                2 & \code{\instr{add} r1, 20, 0} & \code{\instr{add} r0, 10, 0} & \\
                  & & Request \code{r10} (discarded) & \\
                  & & Request \code{r0} (discarded) & \\
                  & & Pass 10 and 0 as operands & \\
                \midrule
                3 & \code{\instr{add} r2, r0, r1} & \code{\instr{add} r1, 20, 0} & Perform the addition \\
                  & & Request \code{r20} (discarded) & Write 10 to \code{r0} \\
                  & & Request \code{r0} (discarded) & \\
                  & & Pass 20 and 0 as operands & \\
                \midrule
                4 & & \code{\instr{add} r2, r0, r1} & Perform the addition \\
                  & & Request \code{r0} & Write 20 to \code{r1} \\
                  & & Request \code{r1} (bypass) & \\
                  & & Pass 10 and 20 as operands & \\
                \midrule
                5 & & & Perform the addition \\
                  & & & Write 30 to \code{r2} \\
                \bottomrule
        \end{tabularx}
\end{table}
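
The schedule in Table \ref{tab:examplepipeline} can be reproduced with a simplified cycle-by-cycle model. The Python sketch below handles only the three \instr{add} instructions of the example; it is an illustration of the bypass idea under that restriction, not a model of the actual RTL, and all names in it are hypothetical.

```python
# Simplified cycle-by-cycle model of the example (illustration only).
# Each entry is "add dst, rd1, rd2"; an operand is a register name such as
# "r0" or an integer immediate. Decode reads the register file; a value that
# Execute writes in the same cycle is forwarded (collision bypass).

program = [("r0", 10, 0), ("r1", 20, 0), ("r2", "r0", "r1")]
regs = [0] * 256
trace = []  # (cycle, register number, value written by Execute)


def read_operand(field, wb):
    """Resolve one operand in Decode; wb is (reg, value) or None."""
    if isinstance(field, int):
        return field                 # immediate: the register read is discarded
    n = int(field[1:])
    if wb is not None and wb[0] == n:
        return wb[1]                 # bypass: forward the value being written
    return regs[n]                   # normal scratchpad read


decode, execute = None, None         # stage contents during the current cycle
pc, cycle = 0, 0                     # pc is an index into program here
while True:
    fetch = program[pc] if pc < len(program) else None
    pc += 1
    if fetch is None and decode is None and execute is None:
        break
    cycle += 1
    wb = None                        # Execute: compute the result
    if execute is not None:
        dst, a, b = execute
        wb = (dst, (a + b) & 0xFFFFFFFF)
    next_execute = None              # Decode: read operands through the bypass
    if decode is not None:
        dst, rd1, rd2 = decode
        next_execute = (int(dst[1:]), read_operand(rd1, wb), read_operand(rd2, wb))
    if wb is not None:               # commit the write at the end of the cycle
        regs[wb[0]] = wb[1]
        trace.append((cycle, wb[0], wb[1]))
    execute, decode = next_execute, fetch

for entry in trace:
    print("cycle %d: r%d := %d" % entry)
```

Removing the bypass branch in \code{read\_operand} makes the fourth cycle read the stale value 0 for \code{r1}, which is precisely the RAW hazard the scratchpad logic prevents.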
 
When an instruction takes more than one cycle to execute, the \emph{Execute} stage simply stalls the pipeline.

Branch hazards are also impossible in \lxp{}, since the pipeline is flushed whenever an execution transfer occurs.

\chapter{Simulation}
\label{ch:simulation}

The \lxp{} package includes an automated verification environment (self-checking testbench) which verifies the functional correctness of the \lxp{} CPU. The environment consists of two major parts: a test platform, which is a SoC-like design providing peripherals for the CPU to interact with, and the testbench itself, which loads test firmware and monitors the platform's output signals. Like the CPU itself, the test environment is written in VHDL-93.

A separate testbench for the instruction cache (\shellcmd{lxp32\_icache}) is also provided. It can be invoked similarly to the main CPU testbench.
 
\section{Requirements}

The following software is required to simulate the \lxp{} design:

\begin{itemize}
        \item An HDL simulator supporting VHDL-93. The \lxp{} package includes scripts (makefiles) for the following simulators:

        \begin{itemize}
                \item GHDL -- a free and open-source VHDL simulator which supports multiple operating systems\footnote{\url{http://ghdl.free.fr/}};
                \item Mentor Graphics\textregistered{} ModelSim\textregistered{} simulator (\shellcmd{vsim});
                \item Xilinx\textregistered{} Vivado\textregistered{} Simulator (\shellcmd{xsim}).
        \end{itemize}

        With GHDL, a waveform viewer such as GTKWave is also recommended (Figure \ref{fig:gtkwave})\footnote{\url{http://gtkwave.sourceforge.net/}}.

        Some FPGA vendors provide limited versions of the ModelSim\textregistered{} simulator for free as parts of their design suites. These versions should suffice for \lxp{} simulation.

        Other simulators can be used with some preparation (Section \ref{sec:simmanual}).

        \item GNU \shellcmd{make} and \shellcmd{coreutils} are needed to simulate the design using the provided makefiles. Under Microsoft\textregistered{} Windows\textregistered{}, MSYS or Cygwin can be used.
        \item The \lxp{} assembler/linker program (\shellcmd{lxp32asm}) must be present (Section \ref{sec:lxp32asm}). A prebuilt executable for Microsoft\textregistered{} Windows\textregistered{} is already included in the \lxp{} package; for other operating systems, \shellcmd{lxp32asm} must be built from source (Section \ref{sec:buildfromsource}).
\end{itemize}

\begin{figure}[htbp]
        \centering
        \includegraphics[scale=0.65]{images/gtkwave.png}
        \caption{GTKWave displaying the \lxp{} waveform dump produced by GHDL}
        \label{fig:gtkwave}
\end{figure}
 
\section{Running simulation using makefiles}

To simulate the design, go to the \shellcmd{verify/lxp32/run/<\emph{simulator}>} directory and run \shellcmd{make}. The following make targets are supported:

\begin{itemize}
        \item \shellcmd{batch} -- simulate the design in batch mode. Results will be written to the standard output. This is the default target.
        \item \shellcmd{gui} -- simulate the design in GUI mode. Note: since GHDL doesn't have a GUI, the simulation itself will be run in batch mode; upon completion, GTKWave will be run automatically to display the dumped waveforms.
        \item \shellcmd{compile} -- compile only, don't run simulation.
        \item \shellcmd{clean} -- delete all the produced artifacts.
\end{itemize}

\section{Running simulation manually}
\label{sec:simmanual}

The \lxp{} testbench can also be run manually. The following steps must be performed:

\begin{enumerate}
        \item Compile the test firmware in the \shellcmd{verify/lxp32/src/firmware} directory:

        \begin{codepar}
    lxp32asm -f textio \emph{filename}.asm -o \emph{filename}.ram
        \end{codepar}

        The produced \shellcmd{*.ram} files must be placed in the simulator's working directory.
        \item Compile the \lxp{} RTL description (\shellcmd{rtl} directory).
        \item Compile the common package (\shellcmd{verify/common\_pkg}).
        \item Compile the test platform (\shellcmd{verify/lxp32/src/platform} directory).
        \item Compile the testbench itself (\shellcmd{verify/lxp32/src/tb} directory).
        \item Simulate the \shellcmd{tb} design unit defined in the \shellcmd{tb.vhd} file.
\end{enumerate}

\section{Testbench parameters}

Simulation parameters can be configured by overriding generics defined by the \shellcmd{tb} design unit:

\begin{itemize}
        \item \code{CPU\_DBUS\_RMW} -- \code{DBUS\_RMW} CPU generic value (see Section \ref{sec:generics}).
        \item \code{CPU\_MUL\_ARCH} -- \code{MUL\_ARCH} CPU generic value (see Section \ref{sec:generics}).
        \item \code{MODEL\_LXP32C} -- simulate the \lxp{}C version. By default, this option is set to \code{true}. If set to \code{false}, \lxp{}U is simulated instead.
        \item \code{TEST\_CASE} -- if set to a non-empty string, specifies the file name of a test case to run. If set to an empty string (default), all tests are executed.
        \item \code{THROTTLE\_DBUS} -- perform pseudo-random data bus throttling. By default, this option is set to \code{true}.
        \item \code{THROTTLE\_IBUS} -- perform pseudo-random instruction bus throttling. By default, this option is set to \code{true}.
        \item \code{VERBOSE} -- print more messages.
\end{itemize}
 
\chapter{Development tools}
\label{ch:developmenttools}

\section{\shellcmd{lxp32asm} -- Assembler and linker}
\label{sec:lxp32asm}

\shellcmd{lxp32asm} is a combined assembler and linker for the \lxp{} platform. It takes one or more input files and produces executable code for the CPU. Input files can be either source files in the \lxp{} assembly language (Appendix \ref{app:assemblylanguage}) or \emph{linkable objects}. A linkable object is a relocatable format for storing compiled \lxp{} code together with symbol information.

\shellcmd{lxp32asm} operates in two stages:

\begin{enumerate}
        \item Compile.

        Source files are compiled to linkable objects.

        \item Link.

        Linkable objects are combined into a single executable module. References to symbols defined in external modules are resolved at this stage.
\end{enumerate}

In the simplest case, there is only one input source file which doesn't contain external symbol references. If there are multiple input files, one of them must define the \code{entry} (or \code{Entry}) symbol at the beginning of the code.

\subsection{Command line syntax}
\label{subsec:assemblercmdline}

\begin{codepar}
    lxp32asm [ \emph{options} | \emph{input files} ]
\end{codepar}

\subsubsection{General options}

\begin{itemize}
        \item \shellcmd{-c} -- compile only (skip the Link stage).

        \item \shellcmd{-h}, \shellcmd{--help} -- display a short help message and exit.

        \item \shellcmd{-o \emph{file}} -- output file name.

        \item \shellcmd{--} -- do not interpret the subsequent command line arguments as options. Can be used if there are input file names starting with a dash.
\end{itemize}

\subsubsection{Compiler options}

\begin{itemize}
        \item \shellcmd{-i \emph{dir}} -- add \emph{dir} to the list of directories used to search for included files. Multiple directories can be specified with multiple \shellcmd{-i} arguments.
\end{itemize}

\subsubsection{Linker options (ignored in compile-only mode)}

\begin{itemize}
        \item \shellcmd{-a \emph{align}} -- object alignment. Must be a power of 2 and cannot be less than 4. The default value is 4.

        \item \shellcmd{-b \emph{addr}} -- base address, that is, the address in memory where the executable image will be located. Must be a multiple of the object alignment. The default value is 0.

        \item \shellcmd{-f \emph{fmt}} -- executable image format. See below for the list of supported formats.

        \item \shellcmd{-m \emph{file}} -- generate a map file. A map file is a human-readable list of all object and symbol addresses in the executable image.

        \item \shellcmd{-s \emph{size}} -- size of the executable image. Must be a multiple of 4. If the total code size is less than the specified value, the executable image is padded with zeros. By default, the image is not padded.
\end{itemize}
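
The constraints on \shellcmd{-a}, \shellcmd{-b} and \shellcmd{-s} can be expressed as simple checks. The Python sketch below encodes them and adds a hypothetical placement rule in which each object starts at the next multiple of the alignment; the actual layout algorithm of \shellcmd{lxp32asm} is not described here, so \code{place\_objects} is an assumption for illustration only.

```python
# Sketch of the documented linker option constraints, plus a hypothetical
# object placement rule (each object starts at the next multiple of the
# alignment). Not the actual lxp32asm implementation.

def check_link_options(align=4, base=0, size=None):
    if align < 4 or align & (align - 1):
        raise ValueError("alignment must be a power of 2 and at least 4")
    if base % align:
        raise ValueError("base address must be a multiple of the alignment")
    if size is not None and size % 4:
        raise ValueError("image size must be a multiple of 4")


def place_objects(sizes, align=4, base=0):
    """Return hypothetical start addresses for objects of the given sizes."""
    check_link_options(align, base)
    addrs, addr = [], base
    for sz in sizes:
        addr = (addr + align - 1) & ~(align - 1)   # round up to the alignment
        addrs.append(addr)
        addr += sz
    return addrs


print([hex(a) for a in place_objects([10, 8, 4], align=16, base=0x100)])
```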
 
\subsection{Output formats}

Output formats that can be specified with the \shellcmd{-f} command line option are listed below.

\begin{itemize}
        \item \shellcmd{bin} -- raw binary image (little-endian). This is the default format.
        \item \shellcmd{textio} -- text format representing binary data as a sequence of zeros and ones. This format can be directly read from VHDL (using the \code{std.textio} package) or Verilog\textregistered{} (using the \code{\$readmemb} function).
        \item \shellcmd{dec} -- text format representing each word as a decimal number.
        \item \shellcmd{hex} -- text format representing each word as a hexadecimal number.
\end{itemize}
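
For post-processing images outside the simulator, a \shellcmd{bin} image can be converted into word-per-line text in the spirit of the other formats. In the following Python sketch, the little-endian word order comes from the description above, while the exact text layout (one word per line, 32 binary digits, 8 uppercase hexadecimal digits) is an assumption and may differ from what \shellcmd{lxp32asm} emits. The function names are hypothetical.

```python
# Sketch: convert a raw little-endian "bin" image to word-per-line text.
# The per-line layout is assumed, not taken from lxp32asm.
import struct


def bin_to_words(data: bytes):
    if len(data) % 4:
        raise ValueError("image length must be a multiple of 4")
    return [w for (w,) in struct.iter_unpack("<I", data)]


def format_words(words, fmt):
    if fmt == "textio":
        return [format(w, "032b") for w in words]   # zeros and ones, MSB first
    if fmt == "dec":
        return [str(w) for w in words]
    if fmt == "hex":
        return [format(w, "08X") for w in words]
    raise ValueError("unsupported format: " + fmt)


# 0x4202010A encodes "add r2, r1, 10"; stored little-endian in a bin image
words = bin_to_words(bytes([0x0A, 0x01, 0x02, 0x42]))
print(format_words(words, "hex"))
```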
 
\section{\shellcmd{lxp32dump} -- Disassembler}

\shellcmd{lxp32dump} takes an executable image and produces a source file in \lxp{} assembly language. The produced file is a valid program that can be compiled by \shellcmd{lxp32asm}.

\subsection{Command line syntax}

\begin{codepar}
    lxp32dump [ \emph{options} | \emph{input file} ]
\end{codepar}

Supported options are:

\begin{itemize}
        \item \shellcmd{-b \emph{addr}} -- executable image base address, only used for comments.

        \item \shellcmd{-f \emph{fmt}} -- input file format. All \shellcmd{lxp32asm} output formats are supported. If this option is not supplied, autodetection is performed.

        \item \shellcmd{-h}, \shellcmd{--help} -- display a short help message and exit.

        \item \shellcmd{-na} -- do not use instruction aliases (such as \instr{mov}, \instr{ret}, \instr{not}) and register aliases (such as \code{sp}, \code{rp}).

        \item \shellcmd{-o \emph{file}} -- output file name. By default, the standard output stream is used.

        \item \shellcmd{--} -- do not interpret subsequent command line arguments as options.
\end{itemize}
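
The first step any disassembler has to perform is splitting a 32-bit word into the \code{OPCODE T1 T2 DST RD1 RD2} fields (6, 1, 1, 8, 8 and 8 bits, as in the \instr{call} encoding in Appendix \ref{app:instructionset}). The Python sketch below decodes a handful of three-operand instructions; the opcode subset, the function names and the output formatting are illustrative assumptions and do not reproduce the actual output of \shellcmd{lxp32dump}.

```python
# Sketch of instruction field decoding (subset of opcodes, illustrative
# formatting; not lxp32dump's real output).

OPCODES = {0b010000: "add", 0b010001: "sub", 0b010100: "divu",
           0b010101: "divs", 0b011000: "and", 0b011010: "xor"}


def decode(word: int) -> dict:
    return {"opcode": (word >> 26) & 0x3F, "t1": (word >> 25) & 1,
            "t2": (word >> 24) & 1, "dst": (word >> 16) & 0xFF,
            "rd1": (word >> 8) & 0xFF, "rd2": word & 0xFF}


def operand(value: int, is_reg: int) -> str:
    # T = 1: register number; T = 0: 8-bit signed immediate
    if is_reg:
        return "r%d" % value
    return str(value - 256 if value >= 128 else value)


def disassemble(word: int) -> str:
    f = decode(word)
    name = OPCODES.get(f["opcode"])
    if name is None:
        return "// unknown word 0x%08X" % word
    return "%s r%d, %s, %s" % (name, f["dst"], operand(f["rd1"], f["t1"]),
                               operand(f["rd2"], f["t2"]))


print(disassemble(0x4202010A))   # add r2, r1, 10
print(disassemble(0x560201FD))   # divs r2, r1, -3
```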
 
\section{\shellcmd{wigen} -- Interconnect generator}

\shellcmd{wigen} is a small tool that generates a VHDL description of a simple WISHBONE interconnect based on the shared bus topology. It supports any number of masters and slaves. The interconnect can then be used to create a SoC based on \lxp{}.

For interconnects with multiple masters, a priority-based arbitration circuit is inserted, with lower-numbered masters taking precedence. However, when a bus cycle is in progress ([CYC\_O] is asserted by the active master), the arbiter will not interrupt it even if a master with a higher priority level requests bus ownership.
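
The arbitration policy described above can be captured in a small behavioral model. The Python sketch below models only the policy (fixed priority, no preemption while the owner holds [CYC\_O]); it ignores grant latency, including the extra latency of the pipelined arbiter, and is not the VHDL generated by \shellcmd{wigen}. The class name is hypothetical.

```python
# Behavioral sketch of the arbitration policy (hypothetical class): the
# lowest-numbered requesting master wins, but an active cycle is never
# interrupted. Grant latency is not modelled.

class PriorityArbiter:
    def __init__(self):
        self.owner = None

    def clock(self, cyc):
        """cyc[i] is master i's CYC_O this cycle; return the current owner."""
        if self.owner is not None and cyc[self.owner]:
            return self.owner                     # cycle in progress: keep it
        self.owner = next((i for i, c in enumerate(cyc) if c), None)
        return self.owner


arb = PriorityArbiter()
print([arb.clock(c) for c in ([0, 1], [1, 1], [1, 0], [0, 0])])  # [1, 1, 0, None]
```

In the second cycle master 0 has higher priority, yet master 1 keeps the bus until it deasserts [CYC\_O], which is why the data bus priority advice in Section \ref{sec:synthesis} matters for shared bus designs.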
 
\subsection{Command line syntax}

\begin{codepar}
        wigen [ \emph{option(s)} ] \emph{nm} \emph{ns} \emph{ma} \emph{sa} \emph{ps} [ \emph{pg} ]
\end{codepar}

\begin{itemize}
        \item\shellcmd{\emph{nm}} -- number of masters,
        \item\shellcmd{\emph{ns}} -- number of slaves,
        \item\shellcmd{\emph{ma}} -- master address width,
        \item\shellcmd{\emph{sa}} -- slave address width,
        \item\shellcmd{\emph{ps}} -- port size (8, 16, 32 or 64),
        \item\shellcmd{\emph{pg}} -- port granularity (8, 16, 32 or 64, default: the same as port size).
\end{itemize}

Supported options are:

\begin{itemize}
        \item \shellcmd{-e \emph{entity}} -- name of the design entity (default is \code{"intercon"}).

        \item \shellcmd{-h}, \shellcmd{--help} -- display a short help message and exit.

        \item \shellcmd{-o \emph{file}} -- output file name (default is \shellcmd{\emph{entity}.vhd}).

        \item \shellcmd{-p} -- generate pipelined arbiter (reduced combinatorial delays, increased latency).

        \item \shellcmd{-r} -- generate WISHBONE registered feedback signals ([CTI\_IO()] and [BTE\_IO()]).

        \item \shellcmd{-u} -- generate unsafe slave decoder (reduced combinatorial delays and resource usage, may not work properly if the address is invalid).
\end{itemize}
 
\section{Building from source}
\label{sec:buildfromsource}

Prebuilt tool executables for 32-bit Microsoft\textregistered{} Windows\textregistered{} are included in the \lxp{} IP core package. For other platforms the tools must be built from source. Since they are developed in \cplusplus{} using only the standard library, it should be possible to build them for any platform that provides a modern \cplusplus{} compiler.

\subsection{Requirements}

The following software is required to build \lxp{} tools from source:

\begin{enumerate}
        \item A modern \cplusplus{} compiler, such as Microsoft\textregistered{} Visual Studio\textregistered{} 2013 or newer, GCC 4.8 or newer, Clang 3.4 or newer.
        \item CMake 3.3 or newer.
\end{enumerate}

\subsection{Build procedure}

This software uses CMake as a build system generator. Building it involves two steps: first, the \shellcmd{cmake} program is invoked to generate a native build environment (a set of Makefiles or an IDE project); second, the generated environment is used to build the software. More details can be found in the CMake documentation.

\subsubsection{Examples}

In the following examples, it is assumed that the commands are run from the \shellcmd{tools} subdirectory of the \lxp{} IP core package tree.

For Microsoft\textregistered{} Visual Studio\textregistered{}:

\begin{codepar}
    mkdir build
    cd build
    cmake -G "NMake Makefiles" ../src
    nmake
    nmake install
\end{codepar}

For MSYS:

\begin{codepar}
    mkdir build
    cd build
    cmake -G "MSYS Makefiles" ../src
    make
    make install
\end{codepar}

For MinGW without MSYS:

\begin{codepar}
    mkdir build
    cd build
    cmake -G "MinGW Makefiles" ../src
    mingw32-make
    mingw32-make install
\end{codepar}

For other platforms:

\begin{codepar}
    mkdir build
    cd build
    cmake ../src
    make
    make install
\end{codepar}
 
\appendix

\chapter{Instruction set reference}
\label{app:instructionset}

See Section \ref{sec:instructionformat} for a general description of \lxp{} instruction encoding.

\section{List of instructions by group}

\begin{ctabular}{lll}
        \toprule
        Instruction & Description & Opcode \\
        \midrule
        \tabcutin{3}{Data transfer} \\
        \midrule
        \hyperref[subsec:instr:mov]{\instr{mov}} & Move & alias for \code{\instr{add} dst, src, 0} \\
        \hyperref[subsec:instr:lc]{\instr{lc}} & Load Constant & \code{000001} \\
        \hyperref[subsec:instr:lcs]{\instr{lcs}} & Load Constant Short & \code{101xxx} \\
        \hyperref[subsec:instr:lw]{\instr{lw}} & Load Word & \code{001000} \\
        \hyperref[subsec:instr:lub]{\instr{lub}} & Load Unsigned Byte & \code{001010} \\
        \hyperref[subsec:instr:lsb]{\instr{lsb}} & Load Signed Byte & \code{001011} \\
        \hyperref[subsec:instr:sw]{\instr{sw}} & Store Word & \code{001100} \\
        \hyperref[subsec:instr:sb]{\instr{sb}} & Store Byte & \code{001110} \\
        \midrule
        \tabcutin{3}{Arithmetic operations} \\
        \midrule
        \hyperref[subsec:instr:add]{\instr{add}} & Add & \code{010000} \\
        \hyperref[subsec:instr:sub]{\instr{sub}} & Subtract & \code{010001} \\
        \hyperref[subsec:instr:neg]{\instr{neg}} & Negate & alias for \code{\instr{sub} dst, 0, src} \\
        \hyperref[subsec:instr:mul]{\instr{mul}} & Multiply & \code{010010} \\
        \hyperref[subsec:instr:divu]{\instr{divu}} & Divide Unsigned & \code{010100} \\
        \hyperref[subsec:instr:divs]{\instr{divs}} & Divide Signed & \code{010101} \\
        \hyperref[subsec:instr:modu]{\instr{modu}} & Modulo Unsigned & \code{010110} \\
        \hyperref[subsec:instr:mods]{\instr{mods}} & Modulo Signed & \code{010111} \\
        \midrule
        \tabcutin{3}{Bitwise operations} \\
        \midrule
        \hyperref[subsec:instr:not]{\instr{not}} & Bitwise Not & alias for \code{\instr{xor} dst, src, -1} \\
        \hyperref[subsec:instr:and]{\instr{and}} & Bitwise And & \code{011000} \\
        \hyperref[subsec:instr:or]{\instr{or}} & Bitwise Or & \code{011001} \\
        \hyperref[subsec:instr:xor]{\instr{xor}} & Bitwise Exclusive Or & \code{011010}\\
        \hyperref[subsec:instr:sl]{\instr{sl}} & Shift Left & \code{011100} \\
        \hyperref[subsec:instr:sru]{\instr{sru}} & Shift Right Unsigned & \code{011110} \\
        \hyperref[subsec:instr:srs]{\instr{srs}} & Shift Right Signed & \code{011111} \\
        \midrule
        \tabcutin{3}{Execution transfer} \\
        \midrule
        \hyperref[subsec:instr:jmp]{\instr{jmp}} & Jump & \code{100000} \\
        \hyperref[subsec:instr:cjmpxxx]{\instr{cjmp\emph{xxx}}} & Compare and Jump & \code{11\emph{xxxx}} (\code{\emph{xxxx}} = condition) \\
        \hyperref[subsec:instr:call]{\instr{call}} & Call Procedure & \code{100001} \\
        \hyperref[subsec:instr:ret]{\instr{ret}} & Return from Procedure & alias for \code{\instr{jmp} rp} \\
        \hyperref[subsec:instr:iret]{\instr{iret}} & Interrupt Return & alias for \code{\instr{jmp} irp}\\
        \midrule
        \tabcutin{3}{Miscellaneous instructions} \\
        \midrule
        \hyperref[subsec:instr:nop]{\instr{nop}} & No Operation & \code{000000} \\
        \hyperref[subsec:instr:hlt]{\instr{hlt}} & Halt & \code{000010} \\
\end{ctabular}
 
\section{Alphabetical list of instructions}

\settocdepth{subsection}

{
\setlength{\parindent}{0pt}
\nonzeroparskip

\subsection{\instr{add} -- Add}
\label{subsec:instr:add}

\subsubsection{Syntax}

\code{\instr{add} DST, RD1, RD2}

\subsubsection{Encoding}

\code{010000 T1 T2 DST RD1 RD2}

Example: \code{\instr{add} r2, r1, 10} $\rightarrow$ \code{0x4202010A}

\subsubsection{Operation}

\code{DST := RD1 + RD2}
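
The encoding examples in this appendix can be checked with a small field-packing helper. In the Python sketch below, the hypothetical function \code{encode} takes register operands as \code{"rN"} strings (setting the corresponding T bit) and integers as 8-bit signed immediates (T bit cleared); it is not part of \shellcmd{lxp32asm} and handles only the single-word \code{OPCODE T1 T2 DST RD1 RD2} layout.

```python
# Sketch (hypothetical helper, not part of lxp32asm): pack the
# OPCODE(6) T1 T2 DST(8) RD1(8) RD2(8) fields into one 32-bit word.
# "rN" strings are register operands (T = 1); integers are 8-bit signed
# immediates (T = 0). DST must be a register.

def encode(opcode, dst, rd1, rd2):
    def field(op):
        if isinstance(op, str):              # register operand
            n = int(op[1:])
            if not 0 <= n <= 255:
                raise ValueError("register number out of range")
            return 1, n
        if not -128 <= op <= 127:            # immediate operand
            raise ValueError("immediate must fit in 8 signed bits")
        return 0, op & 0xFF
    t1, v1 = field(rd1)
    t2, v2 = field(rd2)
    _, d = field(dst)
    return (opcode << 26) | (t1 << 25) | (t2 << 24) | (d << 16) | (v1 << 8) | v2


print(hex(encode(0b010000, "r2", "r1", 10)))   # add r2, r1, 10 -> 0x4202010a
```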
 
\subsection{\instr{and} -- Bitwise And}
\label{subsec:instr:and}

\subsubsection{Syntax}

\code{\instr{and} DST, RD1, RD2}

\subsubsection{Encoding}

\code{011000 T1 T2 DST RD1 RD2}

Example: \code{\instr{and} r2, r1, 0x3F} $\rightarrow$ \code{0x6202013F}

\subsubsection{Operation}

\code{DST := RD1 $\land$ RD2}

\subsection{\instr{call} -- Call Procedure}
\label{subsec:instr:call}

Save a pointer to the next instruction in the \code{rp} register and transfer execution to the address pointed to by the operand.

\subsubsection{Syntax}

\code{\instr{call} RD1}

\subsubsection{Encoding}

\code{100001 1 0 11111110 RD1 00000000}

RD1 must be a register.

Example: \code{\instr{call} r1} $\rightarrow$ \code{0x86FE0100}

\subsubsection{Operation}

\code{rp := \emph{return\_address}}

\code{goto RD1}

The pointer in RD1 is interpreted as described in Section \ref{sec:addressing}.
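
A minimal Python model of \instr{call} is shown below. It assumes that the return address is the address of the \instr{call} word plus 4 (the instruction occupies a single word) and that \code{rp} is register 254, as implied by the \code{11111110} DST field in the encoding. The pointer interpretation from Section \ref{sec:addressing} is not modelled, and \code{execute\_call} is a hypothetical name.

```python
# Minimal model of "call RD1" (hypothetical helper). Assumptions: the return
# address is pc + 4, and rp is r254 (DST field 11111110 in the encoding).

RP = 254


def execute_call(regs, pc, rd1):
    """Return the new program counter after executing call r<rd1> at pc."""
    target = regs[rd1]                 # operand read before rp is written
    regs[RP] = (pc + 4) & 0xFFFFFFFF   # rp := return_address
    return target                      # goto RD1


regs = [0] * 256
regs[1] = 0x2000
print(hex(execute_call(regs, 0x100, 1)), hex(regs[RP]))   # 0x2000 0x104
```

Reading the operand before updating \code{rp} keeps the model correct even for \code{\instr{call} rp}.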
 
\subsection{\instr{cjmp\emph{xxx}} -- Compare and Jump}
\label{subsec:instr:cjmpxxx}

Compare two operands and transfer execution to the specified address if a condition is satisfied.

\subsubsection{Syntax}

\code{\instr{cjmpe} DST, RD1, RD2} (Equal)

\code{\instr{cjmpne} DST, RD1, RD2} (Not Equal)

\code{\instr{cjmpsg} DST, RD1, RD2} (Signed Greater)

\code{\instr{cjmpsge} DST, RD1, RD2} (Signed Greater or Equal)

\code{\instr{cjmpsl} DST, RD1, RD2} (Signed Less)

\code{\instr{cjmpsle} DST, RD1, RD2} (Signed Less or Equal)

\code{\instr{cjmpug} DST, RD1, RD2} (Unsigned Greater)

\code{\instr{cjmpuge} DST, RD1, RD2} (Unsigned Greater or Equal)

\code{\instr{cjmpul} DST, RD1, RD2} (Unsigned Less)

\code{\instr{cjmpule} DST, RD1, RD2} (Unsigned Less or Equal)

\subsubsection{Encoding}

\code{OPCODE T1 T2 DST RD1 RD2}

Opcodes:

\begin{tabularx}{\textwidth}{lL}
\instr{cjmpe}   & \code{111000} \\
\instr{cjmpne}  & \code{110100} \\
\instr{cjmpsg}  & \code{110001} \\
\instr{cjmpsge} & \code{111001} \\
\instr{cjmpug}  & \code{110010} \\
\instr{cjmpuge} & \code{111010} \\
\end{tabularx}

\instr{cjmpsl}, \instr{cjmpsle}, \instr{cjmpul}, \instr{cjmpule} instructions are aliases for \instr{cjmpsg}, \instr{cjmpsge}, \instr{cjmpug}, \instr{cjmpuge}, respectively, with RD1 and RD2 operands swapped.
 
1066
Example: \code{\instr{cjmpuge} r2, r1, 5} $\rightarrow$ \code{0xEA020105}
1067
 
1068
\subsubsection{Operation}
1069
 
1070
\code{if \emph{condition} then goto DST}
1071
 
1072
Pointer in DST is interpreted as described in Section \ref{sec:addressing}. Unlike most instructions, \instr{cjmp\emph{xxx}} does not write to DST.
1073
 
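The alias rule can be illustrated with a short Python sketch. Since $a < b$ is the same test as $b > a$, an assembler can translate \instr{cjmpsl}, \instr{cjmpsle}, \instr{cjmpul} and \instr{cjmpule} by swapping RD1 and RD2 and emitting the opcode of the corresponding ``greater'' form from the table above. The dictionaries and the \code{encode\_cjmp} function are hypothetical. The sketch assumes that the field layout and the 8-bit immediate format are the same as for the other three-operand instructions.

```python
# Opcodes of the directly encoded forms (from the table above).
CJMP_OPCODES = {
    "cjmpe": 0b111000, "cjmpne": 0b110100,
    "cjmpsg": 0b110001, "cjmpsge": 0b111001,
    "cjmpug": 0b110010, "cjmpuge": 0b111010,
}
# Alias -> base instruction, encoded with RD1 and RD2 swapped.
CJMP_ALIASES = {
    "cjmpsl": "cjmpsg", "cjmpsle": "cjmpsge",
    "cjmpul": "cjmpug", "cjmpule": "cjmpuge",
}


def encode_cjmp(mnemonic, dst, rd1, rd2):
    """Encode a cjmpxxx statement (sketch).

    dst is the register holding the jump target; rd1/rd2 are ("r", n) for a
    register or ("i", v) for an 8-bit signed immediate (assumed format).
    """
    if mnemonic in CJMP_ALIASES:
        mnemonic = CJMP_ALIASES[mnemonic]
        rd1, rd2 = rd2, rd1          # "a < b" is encoded as "b > a"

    def field(op):
        kind, value = op
        return (1, value & 0xFF) if kind == "r" else (0, value & 0xFF)

    t1, f1 = field(rd1)
    t2, f2 = field(rd2)
    return ((CJMP_OPCODES[mnemonic] << 26) | (t1 << 25) | (t2 << 24)
            | (dst << 16) | (f1 << 8) | f2)


print(f"0x{encode_cjmp('cjmpuge', 2, ('r', 1), ('i', 5)):08X}")  # 0xEA020105
print(f"0x{encode_cjmp('cjmpul', 2, ('r', 1), ('r', 3)):08X}")   # 0xCB020301
```

The second call shows that \code{cjmpul r2, r1, r3} produces exactly the same word as \code{cjmpug r2, r3, r1}.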
\subsection{\instr{divs} -- Divide Signed}
\label{subsec:instr:divs}

\subsubsection{Syntax}

\code{\instr{divs} DST, RD1, RD2}

\subsubsection{Encoding}

\code{010101 T1 T2 DST RD1 RD2}

Example: \code{\instr{divs} r2, r1, -3} $\rightarrow$ \code{0x560201FD}

\subsubsection{Operation}

\code{DST := (\emph{signed}) RD1 / (\emph{signed}) RD2}

The result is rounded towards zero and is undefined if RD2 is zero. If the CPU was configured without a divider, this instruction returns \code{0}.
 
\subsection{\instr{divu} -- Divide Unsigned}
\label{subsec:instr:divu}

\subsubsection{Syntax}

\code{\instr{divu} DST, RD1, RD2}

\subsubsection{Encoding}

\code{010100 T1 T2 DST RD1 RD2}

Example: \code{\instr{divu} r2, r1, 7} $\rightarrow$ \code{0x52020107}

\subsubsection{Operation}

\code{DST := RD1 / RD2}

The result is rounded towards zero and is undefined if RD2 is zero. If the CPU was configured without a divider, this instruction returns \code{0}.
 
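\instr{divs} and \instr{divu} differ only in how they interpret the same 32-bit register contents. The Python sketch below models both instructions on 32-bit bit patterns. Python's \code{//} operator rounds towards negative infinity, so the signed version divides absolute values and then restores the sign to obtain rounding towards zero. This is a reference model under stated assumptions, not the RTL. A zero divisor (undefined on the CPU) simply raises \code{ZeroDivisionError}. The wrap-around of $-2^{31}/(-1)$ to \code{0x80000000} is an assumption of this sketch.

```python
MASK32 = 0xFFFFFFFF


def to_signed(x):
    """Interpret a 32-bit pattern as a two's complement integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def divu(a, b):
    """Unsigned division of two 32-bit patterns (rounding towards zero)."""
    return (a & MASK32) // (b & MASK32)


def divs(a, b):
    """Signed division rounded towards zero, returned as a 32-bit pattern."""
    sa, sb = to_signed(a), to_signed(b)
    q = abs(sa) // abs(sb)              # magnitude of the truncated quotient
    if (sa < 0) != (sb < 0):
        q = -q                          # operands of opposite sign
    return q & MASK32                   # assumption: -2**31 / -1 wraps


x = (-7) & MASK32                       # 0xFFFFFFF9
print(hex(divs(x, 2)))                  # 0xfffffffd, i.e. -3 (not -4)
print(hex(divu(x, 2)))                  # 0x7ffffffc
```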
\subsection{\instr{hlt} -- Halt}
\label{subsec:instr:hlt}

Halt the CPU until an enabled interrupt is received.

\subsubsection{Syntax}

\code{\instr{hlt}}

\subsubsection{Encoding}

\code{000010 0 0 00000000 00000000 00000000}

\subsubsection{Operation}

Pause execution until an enabled interrupt is received.
 
\subsection{\instr{jmp} -- Jump}
\label{subsec:instr:jmp}

Transfer execution to the address pointed to by the operand.

\subsubsection{Syntax}

\code{\instr{jmp} RD1}

\subsubsection{Encoding}

\code{100000 1 0 00000000 RD1 00000000}

RD1 must be a register.

Example: \code{\instr{jmp} r1} $\rightarrow$ \code{0x82000100}

\subsubsection{Operation}

\code{goto RD1}

The pointer in RD1 is interpreted as described in Section \ref{sec:addressing}.
 
\subsection{\instr{iret} -- Interrupt Return}
\label{subsec:instr:iret}

Return from an interrupt handler.

\subsubsection{Syntax}

\instr{iret}

Alias for \code{\instr{jmp} irp}.
 
\subsection{\instr{lc} -- Load Constant}
\label{subsec:instr:lc}

Load a 32-bit word into the specified register. Note that values from the [-1048576; 1048575] range can be loaded more efficiently using the \instr{lcs} instruction.

\subsubsection{Syntax}

\code{\instr{lc} DST, WORD32}

\subsubsection{Encoding}

\code{000001 0 0 DST 00000000 00000000 WORD32}

Unlike other instructions, \instr{lc} occupies two 32-bit words.

Example: \code{\instr{lc} r1, 0x12345678} $\rightarrow$ \code{0x04010000 0x12345678}

\subsubsection{Operation}

\code{DST := WORD32}
 
\subsection{\instr{lcs} -- Load Constant Short}
\label{subsec:instr:lcs}

Load a signed value from the [-1048576; 1048575] range (a sign-extended 21-bit value) into the specified register. Unlike the \instr{lc} instruction, this instruction is encoded as a single word.

\subsubsection{Syntax}

\code{\instr{lcs} DST, VAL}

\subsubsection{Encoding}

\code{101 VAL[20:16] DST VAL[15:0]}

Example: \code{\instr{lcs} r1, -1000000} $\rightarrow$ \code{0xB001BDC0}

\subsubsection{Operation}

\code{DST := (\emph{signed}) VAL}
 
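The following Python sketch cross-checks the \instr{lcs} field layout. It splits a value into \code{VAL[20:16]} and \code{VAL[15:0]}, reassembles it with sign extension from bit 20, and reproduces the example above. It also shows a range test that an assembler could use to decide whether a constant fits \instr{lcs} or needs the two-word \instr{lc}. The function names are hypothetical.

```python
LCS_MIN, LCS_MAX = -(1 << 20), (1 << 20) - 1     # [-1048576; 1048575]


def fits_lcs(value):
    """True if value can be loaded with lcs instead of the two-word lc."""
    return LCS_MIN <= value <= LCS_MAX


def encode_lcs(dst, value):
    """Pack '101 VAL[20:16] DST VAL[15:0]' into a 32-bit word."""
    if not fits_lcs(value):
        raise ValueError("value does not fit in a sign-extended 21-bit field")
    val21 = value & 0x1FFFFF                     # 21-bit two's complement
    return (0b101 << 29) | ((val21 >> 16) << 24) | (dst << 16) | (val21 & 0xFFFF)


def decode_lcs(word):
    """Return (dst, value), sign-extending VAL from bit 20."""
    val21 = (((word >> 24) & 0x1F) << 16) | (word & 0xFFFF)
    if val21 & (1 << 20):
        val21 -= 1 << 21
    return (word >> 16) & 0xFF, val21


print(f"0x{encode_lcs(1, -1000000):08X}")   # 0xB001BDC0
print(decode_lcs(0xB001BDC0))               # (1, -1000000)
```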
\subsection{\instr{lsb} -- Load Signed Byte}
\label{subsec:instr:lsb}

Load a byte from the specified address into the register, performing sign extension.

\subsubsection{Syntax}

\code{\instr{lsb} DST, RD1}

\subsubsection{Encoding}

\code{001011 1 0 DST RD1 00000000}

RD1 must be a register.

Example: \code{\instr{lsb} r2, r1} $\rightarrow$ \code{0x2E020100}

\subsubsection{Operation}

\code{DST := (\emph{signed}) (*(BYTE*)RD1)}

The pointer in RD1 is interpreted as described in Section \ref{sec:addressing}.
 
\subsection{\instr{lub} -- Load Unsigned Byte}
\label{subsec:instr:lub}

Load a byte from the specified address into the register. The upper 24 bits are zeroed.

\subsubsection{Syntax}

\code{\instr{lub} DST, RD1}

\subsubsection{Encoding}

\code{001010 1 0 DST RD1 00000000}

RD1 must be a register.

Example: \code{\instr{lub} r2, r1} $\rightarrow$ \code{0x2A020100}

\subsubsection{Operation}

\code{DST := *(BYTE*)RD1}

The pointer in RD1 is interpreted as described in Section \ref{sec:addressing}.
 
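\instr{lsb} and \instr{lub} differ only in how the upper 24 bits are filled. The Python sketch below models both loads on a small byte-addressed memory image. For simplicity, it indexes a \code{bytes} object directly and does not model the address interpretation rules of Section \ref{sec:addressing}. The function names mirror the mnemonics but are otherwise hypothetical.

```python
def lub(memory, addr):
    """Load unsigned byte: zero-extend, so the upper 24 bits are 0."""
    return memory[addr] & 0xFF


def lsb(memory, addr):
    """Load signed byte: copy bit 7 into the upper 24 bits."""
    b = memory[addr] & 0xFF
    return b | 0xFFFFFF00 if b & 0x80 else b


mem = bytes([0x7F, 0x80, 0xFF])
for a in range(len(mem)):
    print(f"{a}: lub=0x{lub(mem, a):08X} lsb=0x{lsb(mem, a):08X}")
# 0: lub=0x0000007F lsb=0x0000007F
# 1: lub=0x00000080 lsb=0xFFFFFF80
# 2: lub=0x000000FF lsb=0xFFFFFFFF
```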
\subsection{\instr{lw} -- Load Word}
\label{subsec:instr:lw}

Load a word from the specified address into the register.

\subsubsection{Syntax}

\code{\instr{lw} DST, RD1}

\subsubsection{Encoding}

\code{001000 1 0 DST RD1 00000000}

RD1 must be a register.

Example: \code{\instr{lw} r2, r1} $\rightarrow$ \code{0x22020100}

\subsubsection{Operation}

\code{DST := *RD1}

The pointer in RD1 is interpreted as described in Section \ref{sec:addressing}.
 
\subsection{\instr{mods} -- Modulo Signed}
\label{subsec:instr:mods}

\subsubsection{Syntax}

\code{\instr{mods} DST, RD1, RD2}

\subsubsection{Encoding}

\code{010111 T1 T2 DST RD1 RD2}

Example: \code{\instr{mods} r2, r1, 10} $\rightarrow$ \code{0x5E02010A}

\subsubsection{Operation}

\code{DST := (\emph{signed}) RD1 mod (\emph{signed}) RD2}

The modulo operation satisfies the following condition: if $Q=A/B$ and $R=A \bmod B$, then $A=B \cdot Q+R$. Since the quotient is rounded towards zero (Subsection \ref{subsec:instr:divs}), a nonzero $R$ has the same sign as $A$.

The result is undefined if RD2 is zero. If the CPU was configured without a divider, this instruction returns \code{0}.
 
\subsection{\instr{modu} -- Modulo Unsigned}
\label{subsec:instr:modu}

\subsubsection{Syntax}

\code{\instr{modu} DST, RD1, RD2}

\subsubsection{Encoding}

\code{010110 T1 T2 DST RD1 RD2}

Example: \code{\instr{modu} r2, r1, 10} $\rightarrow$ \code{0x5A02010A}

\subsubsection{Operation}

\code{DST := RD1 mod RD2}

The modulo operation satisfies the following condition: if $Q=A/B$ and $R=A \bmod B$, then $A=B \cdot Q+R$.

The result is undefined if RD2 is zero. If the CPU was configured without a divider, this instruction returns \code{0}.
 
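The following Python sketch models \instr{mods} and \instr{modu} on 32-bit patterns. It computes the remainder from the identity $A=B \cdot Q+R$, with $Q$ rounded towards zero. Python's built-in \code{\%} operator would not be a correct model for \instr{mods}: it takes the sign of the divisor, so \code{-7 \% 3} evaluates to 2, whereas the identity above gives $-1$. The helper names are hypothetical. The zero-divisor case, which is undefined on the CPU, raises \code{ZeroDivisionError} here.

```python
MASK32 = 0xFFFFFFFF


def to_signed(x):
    """Interpret a 32-bit pattern as a two's complement integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def trunc_div(a, b):
    """Integer quotient rounded towards zero (Python's // rounds down)."""
    q = abs(a) // abs(b)
    return -q if (a < 0) != (b < 0) else q


def mods(a, b):
    """Signed remainder chosen so that A = B*Q + R with Q rounded towards zero."""
    sa, sb = to_signed(a), to_signed(b)
    r = sa - sb * trunc_div(sa, sb)
    return r & MASK32


def modu(a, b):
    """Unsigned remainder; for non-negative operands Python's % is exact."""
    return (a & MASK32) % (b & MASK32)


print(to_signed(mods((-7) & MASK32, 3)))   # -1 (sign follows the dividend)
print(to_signed(mods(7, (-3) & MASK32)))   # 1
print(modu(0xFFFFFFF9, 10))                # 9
```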
\subsection{\instr{mov} -- Move}
\label{subsec:instr:mov}

\subsubsection{Syntax}

\code{\instr{mov} DST, RD1}

Alias for \code{\instr{add} DST, RD1, 0}.
 
\subsection{\instr{mul} -- Multiply}
\label{subsec:instr:mul}

Multiply two 32-bit values. The result is also 32 bits wide: only the lower 32 bits of the product are kept.

\subsubsection{Syntax}

\code{\instr{mul} DST, RD1, RD2}

\subsubsection{Encoding}

\code{010010 T1 T2 DST RD1 RD2}

Example: \code{\instr{mul} r2, r1, 3} $\rightarrow$ \code{0x4A020103}

\subsubsection{Operation}

\code{DST := RD1 * RD2}

Since the product width is the same as the operand width, the result of a multiplication does not depend on operand signedness.
 
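The claim that the 32-bit product does not depend on signedness follows from modular arithmetic. The signed and unsigned readings of a register differ by a multiple of $2^{32}$, so their products also differ by a multiple of $2^{32}$, and the lower 32 bits agree. The Python sketch below checks this on a few operand pairs. \code{mul} and \code{mul\_signed} are hypothetical helper names.

```python
MASK32 = 0xFFFFFFFF


def to_signed(x):
    """Interpret a 32-bit pattern as a two's complement integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def mul(a, b):
    """Unsigned view: keep the lower 32 bits of the full product."""
    return ((a & MASK32) * (b & MASK32)) & MASK32


def mul_signed(a, b):
    """Signed view: multiply two's complement values, keep the lower 32 bits."""
    return (to_signed(a) * to_signed(b)) & MASK32


pairs = [(3, 5), ((-3) & MASK32, 5), ((-3) & MASK32, (-5) & MASK32),
         (0x80000000, 0xFFFFFFFF), (0x12345678, 0x9ABCDEF0)]
for a, b in pairs:
    assert mul(a, b) == mul_signed(a, b)
print(hex(mul((-3) & MASK32, 5)))           # 0xfffffff1, i.e. -15
```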
\subsection{\instr{neg} -- Negate}
\label{subsec:instr:neg}

\subsubsection{Syntax}

\code{\instr{neg} DST, RD2}

Alias for \code{\instr{sub} DST, 0, RD2}.
 
\subsection{\instr{nop} -- No Operation}
\label{subsec:instr:nop}

\subsubsection{Syntax}

\instr{nop}

\subsubsection{Encoding}

\code{000000 0 0 00000000 00000000 00000000}

\subsubsection{Operation}

This instruction does not alter the machine state.
 
\subsection{\instr{not} -- Bitwise Not}
\label{subsec:instr:not}

\subsubsection{Syntax}

\code{\instr{not} DST, RD1}

Alias for \code{\instr{xor} DST, RD1, -1}.
 
\subsection{\instr{or} -- Bitwise Or}
\label{subsec:instr:or}

\subsubsection{Syntax}

\code{\instr{or} DST, RD1, RD2}

\subsubsection{Encoding}

\code{011001 T1 T2 DST RD1 RD2}

Example: \code{\instr{or} r2, r1, 0x3F} $\rightarrow$ \code{0x6602013F}

\subsubsection{Operation}

\code{DST := RD1 $\lor$ RD2}
 
\subsection{\instr{ret} -- Return from Procedure}
\label{subsec:instr:ret}

Return from a procedure.

\subsubsection{Syntax}

\instr{ret}

Alias for \code{\instr{jmp} rp}.
 
\subsection{\instr{sb} -- Store Byte}
\label{subsec:instr:sb}

Store the least significant byte of the register to the specified address.

\subsubsection{Syntax}

\code{\instr{sb} RD1, RD2}

\subsubsection{Encoding}

\code{001110 1 T2 00000000 RD1 RD2}

RD1 must be a register.

Example: \code{\instr{sb} r2, r1} $\rightarrow$ \code{0x3B000201}

\subsubsection{Operation}

\code{*(BYTE*)RD1 := RD2 $\land$ 0x000000FF}

The pointer in RD1 is interpreted as described in Section \ref{sec:addressing}.
 
\subsection{\instr{sl} -- Shift Left}
\label{subsec:instr:sl}

\subsubsection{Syntax}

\code{\instr{sl} DST, RD1, RD2}

\subsubsection{Encoding}

\code{011100 T1 T2 DST RD1 RD2}

Example: \code{\instr{sl} r2, r1, 5} $\rightarrow$ \code{0x72020105}

\subsubsection{Operation}

\code{DST := RD1 << RD2}

The result is undefined if RD2 is outside the [0; 31] range.
 
\subsection{\instr{srs} -- Shift Right Signed}
\label{subsec:instr:srs}

\subsubsection{Syntax}

\code{\instr{srs} DST, RD1, RD2}

\subsubsection{Encoding}

\code{011111 T1 T2 DST RD1 RD2}

Example: \code{\instr{srs} r2, r1, 5} $\rightarrow$ \code{0x7E020105}

\subsubsection{Operation}

\code{DST := ((\emph{signed}) RD1) >> RD2}

The result is undefined if RD2 is outside the [0; 31] range.
 
\subsection{\instr{sru} -- Shift Right Unsigned}
\label{subsec:instr:sru}

\subsubsection{Syntax}

\code{\instr{sru} DST, RD1, RD2}

\subsubsection{Encoding}

\code{011110 T1 T2 DST RD1 RD2}

Example: \code{\instr{sru} r2, r1, 5} $\rightarrow$ \code{0x7A020105}

\subsubsection{Operation}

\code{DST := RD1 >> RD2}

The result is undefined if RD2 is outside the [0; 31] range.
 
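The three shift instructions differ only in what enters the vacated bit positions. The Python sketch below models them on 32-bit patterns. For \instr{srs}, it converts the operand to a negative Python integer when bit 31 is set, because Python's \code{>>} on negative integers is an arithmetic shift. The sketch rejects shift amounts outside [0; 31], since the result is undefined on the CPU. The function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def _check(n):
    if not 0 <= n <= 31:
        raise ValueError("result is undefined for shift amounts outside [0; 31]")


def sl(a, n):
    """Shift left; bits shifted out of bit 31 are lost."""
    _check(n)
    return ((a & MASK32) << n) & MASK32


def sru(a, n):
    """Logical shift right: vacated upper bits are filled with zeros."""
    _check(n)
    return (a & MASK32) >> n


def srs(a, n):
    """Arithmetic shift right: vacated upper bits are copies of bit 31."""
    _check(n)
    a &= MASK32
    if a & 0x80000000:
        a -= 1 << 32                     # make the sign visible to Python
    return (a >> n) & MASK32


x = 0x80000010
print(f"sl:  0x{sl(x, 4):08X}")          # 0x00000100
print(f"sru: 0x{sru(x, 4):08X}")         # 0x08000001
print(f"srs: 0x{srs(x, 4):08X}")         # 0xF8000001
```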
\subsection{\instr{sub} -- Subtract}
\label{subsec:instr:sub}

\subsubsection{Syntax}

\code{\instr{sub} DST, RD1, RD2}

\subsubsection{Encoding}

\code{010001 T1 T2 DST RD1 RD2}

Example: \code{\instr{sub} r2, r1, 5} $\rightarrow$ \code{0x46020105}

\subsubsection{Operation}

\code{DST := RD1 - RD2}
 
\subsection{\instr{sw} -- Store Word}
\label{subsec:instr:sw}

Store the value of the register to the specified address.

\subsubsection{Syntax}

\code{\instr{sw} RD1, RD2}

\subsubsection{Encoding}

\code{001100 1 T2 00000000 RD1 RD2}

RD1 must be a register.

Example: \code{\instr{sw} r2, r1} $\rightarrow$ \code{0x33000201}

\subsubsection{Operation}

\code{*RD1 := RD2}

The pointer in RD1 is interpreted as described in Section \ref{sec:addressing}.
 
\subsection{\instr{xor} -- Bitwise Exclusive Or}
\label{subsec:instr:xor}

\subsubsection{Syntax}

\code{\instr{xor} DST, RD1, RD2}

\subsubsection{Encoding}

\code{011010 T1 T2 DST RD1 RD2}

Example: \code{\instr{xor} r2, r1, 0x3F} $\rightarrow$ \code{0x6A02013F}

\subsubsection{Operation}

\code{DST := RD1 $\oplus$ RD2}
 
}

\settocdepth{section}

\chapter{Instruction cycle counts}
 
Cycle counts for \lxp{} instructions are listed in Table \ref{tab:cycles}, assuming that no pipeline stalls are caused by instruction bus latency or cache misses. These figures are provided for reference only; software should not depend on them, as they can change in future hardware revisions.
 
\begin{table}[htbp]
        \centering
        \caption{Instruction cycle counts}
        \label{tab:cycles}
        \begin{tabularx}{0.8\textwidth}{LLLL}
                \toprule
                Instruction & Cycles & Instruction & Cycles \\
                \midrule
                \instr{add} & 1 & \instr{modu} & 37 \\
                \instr{and} & 1 & \instr{mov} & 1 \\
                \instr{call} & 4 & \instr{mul} & 2, 6 or 34\footnotemark[3] \\
                \instr{cjmp\emph{xxx}} & 5 or 2\footnotemark[1] & \instr{neg} & 1 \\
                \instr{divs} & 36 & \instr{nop} & 1 \\
                \instr{divu} & 36 & \instr{not} & 1 \\
                \instr{hlt} & N/A & \instr{or} & 1 \\
                \instr{jmp} & 4 & \instr{ret} & 4 \\
                \instr{iret} & 4 & \instr{sb} & $\ge$ 2\footnotemark[2] \\
                \instr{lc} & 2 & \instr{sl} & 2 \\
                \instr{lcs} & 1 & \instr{srs} & 2 \\
                \instr{lsb} & $\ge$ 3\footnotemark[2] & \instr{sru} & 2 \\
                \instr{lub} & $\ge$ 3\footnotemark[2] & \instr{sub} & 1 \\
                \instr{lw} & $\ge$ 3\footnotemark[2] & \instr{sw} & $\ge$ 2\footnotemark[2] \\
                \instr{mods} & 37 & \instr{xor} & 1 \\
                \bottomrule
        \end{tabularx}
\end{table}

\footnotetext[1]{Depends on whether the jump is taken or not.}
\footnotetext[2]{Depends on the data bus latency.}
\footnotetext[3]{Depends on the multiplier architecture. See Section \ref{sec:generics}.}
 
\chapter{LXP32 assembly language}
\label{app:assemblylanguage}

This appendix defines the assembly language used by \lxp{} development tools.
 
\section{Comments}

\lxp{} assembly language supports C-style comments, which can span multiple lines, and single-line \cplusplus{}-style comments:

\begin{codepar}\itshape
    /*
     * This is a comment.
     */

    // This is also a comment
\end{codepar}

From a parser's point of view, comments are equivalent to whitespace.
 
\section{Literals}

\lxp{} assembly language uses numeric and string literals similar to those provided by the C programming language.

Numeric literals can take the form of decimal, hexadecimal or octal numbers. Literals prefixed with \code{0x} are interpreted as hexadecimal, literals prefixed with \code{0} are interpreted as octal, and other literals are interpreted as decimal. A numeric literal can also start with a unary plus or minus sign, which is considered part of the literal.
 
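The following Python sketch of a literal parser summarizes the prefix rules above. It is an illustration, not the \shellcmd{lxp32asm} implementation. It accepts only a lowercase \code{0x} prefix, as written in the text. It rejects digits that are invalid for the selected base, so \code{08} is an error, as in C. Range checks depend on where the literal is used, so they are left to the caller.

```python
_DIGITS = {8: "01234567", 10: "0123456789", 16: "0123456789abcdefABCDEF"}


def parse_numeric_literal(literal):
    """Parse an optionally signed decimal, 0x-hexadecimal or 0-octal literal."""
    text, sign = literal, 1
    if text.startswith(("+", "-")):      # the sign is part of the literal
        sign = -1 if text[0] == "-" else 1
        text = text[1:]
    if text.startswith("0x"):
        digits, base = text[2:], 16
    elif text.startswith("0") and len(text) > 1:
        digits, base = text[1:], 8
    else:
        digits, base = text, 10
    if not digits or any(c not in _DIGITS[base] for c in digits):
        raise ValueError(f"invalid numeric literal: {literal!r}")
    return sign * int(digits, base)


for lit in ["42", "-42", "0x3F", "010", "+0x10", "0"]:
    print(lit, "->", parse_numeric_literal(lit))
```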
String literals must be enclosed in double quotes. The most common escape sequences used in C are supported (Table \ref{tab:stringescape}). Note that strings are not null-terminated in the LXP32 assembly language; when required, the terminating null character must be inserted explicitly.

\begin{table}[htbp]
        \caption{Escape sequences used in string literals}
        \label{tab:stringescape}
        \begin{tabularx}{\textwidth}{lL}
                \toprule
                Sequence & Interpretation \\
                \midrule
                \code{\textbackslash\textbackslash} & Backslash character \\
                \code{\textbackslash "} & Double quotation mark \\
                \code{\textbackslash '} & Single quotation mark (can also be used directly) \\
                \code{\textbackslash t} & Horizontal tab character \\
                \code{\textbackslash n} & Line feed \\
                \code{\textbackslash r} & Carriage return \\
                \code{\textbackslash x\emph{XX}} & Character with a hexadecimal code of \emph{XX} (1--2 digits) \\
                \code{\textbackslash \emph{XXX}} & Character with an octal code of \emph{XXX} (1--3 digits) \\
                \bottomrule
        \end{tabularx}
\end{table}
 
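A single left-to-right scan can decode the escape sequences in Table \ref{tab:stringescape}. The Python sketch below converts the text between the double quotes into a list of byte values. Consistent with the text above, it appends no terminating null, so a null byte has to be written explicitly as \code{\textbackslash 0}. The sketch assumes ASCII source text and rejects octal escapes above 255. The function name is hypothetical.

```python
_SIMPLE = {"\\": "\\", '"': '"', "'": "'", "t": "\t", "n": "\n", "r": "\r"}
_HEX = "0123456789abcdefABCDEF"
_OCT = "01234567"


def decode_string_literal(body):
    """Decode the characters between the quotes into a list of byte values."""
    out, i = [], 0
    while i < len(body):
        if body[i] != "\\":
            out.append(ord(body[i]))          # assumes ASCII source text
            i += 1
            continue
        if i + 1 == len(body):
            raise ValueError("dangling backslash")
        e = body[i + 1]
        if e in _SIMPLE:
            out.append(ord(_SIMPLE[e]))
            i += 2
        elif e == "x":                        # \xXX: 1-2 hexadecimal digits
            j = i + 2
            while j < len(body) and j < i + 4 and body[j] in _HEX:
                j += 1
            if j == i + 2:
                raise ValueError("\\x requires at least one hex digit")
            out.append(int(body[i + 2:j], 16))
            i = j
        elif e in _OCT:                       # \XXX: 1-3 octal digits
            j = i + 1
            while j < len(body) and j < i + 4 and body[j] in _OCT:
                j += 1
            value = int(body[i + 1:j], 8)
            if value > 0xFF:
                raise ValueError("octal escape does not fit in a byte")
            out.append(value)
            i = j
        else:
            raise ValueError(f"unknown escape sequence \\{e}")
    return out


print(decode_string_literal(r"Hi\n"))         # [72, 105, 10]
print(decode_string_literal(r"Hi\0"))         # [72, 105, 0]
```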
\section{Symbols}
\label{sec:symbols}

Symbols (labels) are used to refer to data or code locations. \lxp{} assembly language does not have distinct code and data labels: symbols are used in both these contexts.

Symbol names must be valid identifiers. A valid identifier must start with an alphabetic character or an underscore, and may contain alphanumeric characters and underscores.
 
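The identifier rule can be expressed as a short Python check. This sketch assumes that ``alphabetic'' means the ASCII letters. It does not test whether a name clashes with register names, keywords or macros, which the assembler may treat specially.

```python
def is_valid_identifier(name):
    """Letter or underscore first, then letters, digits or underscores."""
    def is_letter(c):
        return c.isascii() and c.isalpha()

    def is_digit(c):
        return "0" <= c <= "9"

    if not name or not (is_letter(name[0]) or name[0] == "_"):
        return False
    return all(is_letter(c) or is_digit(c) or c == "_" for c in name[1:])


for n in ["jump_label", "_tmp1", "Entry", "1st", "data-word", ""]:
    print(repr(n), is_valid_identifier(n))
```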
A symbol definition consists of the symbol name followed by a colon, and must be the first token in a source code line. A symbol definition can occupy a separate line (in which case it refers to the following statement). Alternatively, a statement can follow the symbol definition on the same line.

Symbols can be used as operands to the \instr{lc} and \instr{lcs} instruction statements. A symbol reference can end with a \code{@\emph{n}} sequence, where \code{\emph{n}} is a numeric literal; in this case, \code{\emph{n}} is interpreted as an offset (in bytes) relative to the symbol definition. For the \instr{lcs} instruction, the resulting address must still fit into the sign-extended 21-bit value range (\code{0x00000000}--\code{0x000FFFFF} or \code{0xFFF00000}--\code{0xFFFFFFFF}); otherwise, the linker reports an error.

By default, all symbols are local, that is, they can only be referenced from the module where they were defined. To make a symbol accessible from other modules, use the \instr{\#export} directive. To reference a symbol defined in another module, use the \instr{\#import} directive.

A symbol named \code{entry} or \code{Entry} has a special meaning: it is used to inform the linker about the program entry point if there are multiple input files. It does not have to be exported. If defined, this symbol must precede the first instruction or data definition statement in the module. Only one module in the program can define the entry symbol.

\begin{codeparbreakable}
    \instr{lc} r10, jump\_label
    \instr{lc} r11, data\_word
\emph{// ...}
    \instr{sw} r11, r0 \emph{// store the value of r0 to the}
               \emph{// location pointed to by data\_word}
    \instr{jmp} r10    \emph{// transfer execution to jump\_label}
\emph{// ...}
jump\_label:
    \instr{mov} r1, r0
\emph{// ...}
data\_word:
    \instr{.word} 0x12345678
\end{codeparbreakable}
 
\section{Statements}

Each statement occupies a single source code line. There are three kinds of statements:

\begin{itemize}
        \item \emph{Directives} provide directions for the assembler that do not directly cause code generation.
        \item \emph{Data definition statements} insert arbitrary data into the generated code.
        \item \emph{Instruction statements} insert \lxp{} CPU instructions into the generated code.
\end{itemize}
 
\subsection{Directives}

The first token of a directive statement always starts with the \code{\#} character.

\begin{codepar}
\instr{\#define} \emph{identifier} [ \emph{token} ... ]
\end{codepar}

Defines a macro that will be substituted with zero or more tokens. The \code{\emph{identifier}} must satisfy the requirements listed in Section \ref{sec:symbols}. Tokens can be anything, including keywords, identifiers, literals and separators (i.e., comma and colon characters).

\begin{codepar}
\instr{\#error} [ \emph{msg} ]
\end{codepar}

Raises a compiler error. If \emph{msg} is supplied, it is used as the error message.

\begin{codepar}
\instr{\#export} \emph{identifier}
\end{codepar}

Declares \code{\emph{identifier}} as an exported symbol. Exported symbols can be referenced by other modules.

\begin{codepar}
\instr{\#ifdef} | \instr{\#ifndef} \emph{identifier}
\code{...}
\instr{\#else}
\code{...}
\instr{\#endif}
\end{codepar}

Define C preprocessor-style conditional sections, which are processed or skipped depending on whether a certain macro has been defined. The \instr{\#else} branch is optional. Conditional sections can be nested.

\begin{codepar}
\instr{\#import} \emph{identifier}
\end{codepar}

Declares \code{\emph{identifier}} as an imported symbol. Imported symbols are used to refer to symbols exported by other modules.

\begin{codepar}
\instr{\#include} \emph{filename}
\end{codepar}

Processes the contents of \code{\emph{filename}} as if they were literally inserted at the point of the \instr{\#include} directive. \code{\emph{filename}} must be a string literal.

\begin{codepar}
\instr{\#message} \emph{msg}
\end{codepar}

Prints \code{\emph{msg}} to the standard output stream. \code{\emph{msg}} must be a string literal.
 
\subsection{Data definition statements}

The first token of a data definition statement always starts with the \code{.} (period) character.

\begin{codepar}
\instr{.align} [ \emph{alignment} ]
\end{codepar}

Ensures that code generated by the next data definition or instruction statement is aligned to a multiple of \code{\emph{alignment}} bytes, inserting padding zeros if needed. \code{\emph{alignment}} must be a power of 2 and cannot be less than 4. The default \code{\emph{alignment}} is 4. Instructions and words are always at least word-aligned; the \instr{.align} statement can be used to align them to a larger boundary, or to align byte data (see below).

The \instr{.align} statement is not guaranteed to work if the requested alignment is greater than the section alignment specified for the linker (see Subsection \ref{subsec:assemblercmdline}).
 
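The amount of padding that \instr{.align} inserts follows directly from the rules above. The Python sketch below computes it for a given byte offset. It assumes that the offset is counted from the start of the section. As noted above, the result is therefore only meaningful if the section itself is aligned at least as strictly. The function name is hypothetical.

```python
def align_padding(offset, alignment=4):
    """Zero bytes that .align would insert before data at byte offset `offset`."""
    if alignment < 4 or alignment & (alignment - 1):
        raise ValueError("alignment must be a power of 2 not less than 4")
    return -offset % alignment          # distance to the next multiple


# A 3-byte .byte string at offset 0 followed by ".align 16":
print(align_padding(3, 16))            # 13
print(align_padding(8))                # 0, already word-aligned
```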
\begin{codepar}
\instr{.byte} \emph{token} [, \emph{token} ... ]
\end{codepar}

Inserts one or more bytes into the output code. Each \code{\emph{token}} can be either a numeric literal with a valid range of [-128; 255] or a string literal. By default, bytes are not aligned.

To define a null-terminated string, the terminating null character must be inserted explicitly.

\begin{codepar}
\instr{.reserve} \emph{n}
\end{codepar}

Inserts \code{\emph{n}} zero bytes into the output code.

\begin{codepar}
\instr{.word} \emph{token} [, \emph{token} ... ]
\end{codepar}

Inserts one or more 32-bit words into the output code. Tokens must be numeric literals.
 
\subsection{Instruction statements}

Instruction statements have the following general syntax:

\begin{codepar}
    \instr{\emph{instruction}} [ \emph{operand} [, \emph{operand} ... ] ]
\end{codepar}

Depending on the instruction, operands can be registers, numeric literals or symbols. Supported instructions are listed in Appendix \ref{app:instructionset}.
 
\chapter{WISHBONE datasheet}
\label{app:wishbonedatasheet}

\section[Instruction bus (LXP32C only)]{Instruction bus (\lxp{}C only)}

\begin{ctabular}{ll}
        \toprule
        \tabcutin{2}{\makebox[0.9\textwidth][c]{General information}} \\
        \midrule
        WISHBONE revision & B3 \\
        Type of interface & MASTER \\
        Supported cycles  & BLOCK READ \\
        \midrule
        \tabcutin{2}{Signal names} \\
        \midrule
        \signal{clk\_i}       & CLK\_I \\
        \signal{rst\_i}       & RST\_I \\
        \signal{ibus\_cyc\_o} & CYC\_O \\
        \signal{ibus\_stb\_o} & STB\_O \\
        \signal{ibus\_cti\_o} & CTI\_O() \\
        \signal{ibus\_bte\_o} & BTE\_O() \\
        \signal{ibus\_ack\_i} & ACK\_I \\
        \signal{ibus\_adr\_o} & ADR\_O() \\
        \signal{ibus\_dat\_i} & DAT\_I() \\
        \midrule
        \tabcutin{2}{Supported tag signals} \\
        \midrule
        \signal{ibus\_cti\_o} & Cycle Type Identifier (address tag) \\
        & \hspace{\parindent} ``010'' (Incrementing burst cycle) \\
        & \hspace{\parindent} ``111'' (End-of-Burst) \\
        \signal{ibus\_bte\_o} & Burst Type Extension (address tag) \\
        & \hspace{\parindent} ``00'' (Linear burst) \\
        \midrule
        \tabcutin{2}{Dimensions} \\
        \midrule
        Port size & 32 \\
        Port granularity & 32 \\
        Maximum operand size & 32 \\
        Data transfer ordering & BIG/LITTLE ENDIAN \\
        Data transfer sequence & UNDEFINED \\
        \bottomrule
\end{ctabular}
 
\section{Data bus}

\begin{ctabular}{ll}
        \toprule
        \tabcutin{2}{\makebox[0.9\textwidth][c]{General information}} \\
        \midrule
        WISHBONE revision & B3 \\
        Type of interface & MASTER \\
        Supported cycles  & SINGLE READ/WRITE \\
                          & RMW \\
        \midrule
        \tabcutin{2}{Signal names} \\
        \midrule
        \signal{clk\_i}       & CLK\_I \\
        \signal{rst\_i}       & RST\_I \\
        \signal{dbus\_cyc\_o} & CYC\_O \\
        \signal{dbus\_stb\_o} & STB\_O \\
        \signal{dbus\_we\_o}  & WE\_O \\
        \signal{dbus\_sel\_o} & SEL\_O() \\
        \signal{dbus\_ack\_i} & ACK\_I \\
        \signal{dbus\_adr\_o} & ADR\_O() \\
        \signal{dbus\_dat\_o} & DAT\_O() \\
        \signal{dbus\_dat\_i} & DAT\_I() \\
        \midrule
        \tabcutin{2}{Dimensions} \\
        \midrule
        Port size & 32 \\
        Port granularity & 8 \\
        Maximum operand size & 32 \\
        Data transfer ordering & LITTLE ENDIAN \\
        Data transfer sequence & UNDEFINED \\
        \bottomrule
\end{ctabular}
 
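The data bus has a 32-bit port, 8-bit granularity and little-endian ordering. With these parameters, the byte at offset $k$ within a word travels on bits $8k$ to $8k+7$ of the data lines and is qualified by bit $k$ of \signal{dbus\_sel\_o}. The Python sketch below illustrates this standard WISHBONE lane mapping for byte stores and loads. It is an assumption-based illustration rather than a description of the \lxp{} RTL, which may, for example, drive the unselected lanes differently. The function names are hypothetical.

```python
def byte_store_lanes(addr, value):
    """Return (sel, dat) for writing the low byte of `value` to byte address addr."""
    lane = addr & 3                       # byte offset within the 32-bit word
    sel = 1 << lane                       # one select bit per byte lane
    dat = (value & 0xFF) << (8 * lane)    # byte placed on its own lane
    return sel, dat


def byte_load_lane(addr, dat_i):
    """Extract the byte at byte address addr from a 32-bit word read from the bus."""
    return (dat_i >> (8 * (addr & 3))) & 0xFF


sel, dat = byte_store_lanes(0x1002, 0xABCD)
print(f"sel={sel:04b} dat=0x{dat:08X}")         # sel=0100 dat=0x00CD0000
print(hex(byte_load_lane(0x1001, 0x12345678)))  # 0x56
```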
\chapter{List of changes}

\section*{Version 1.3 (2022-08-28)}

This release removes support for temporarily blocked interrupts (interrupts can still be disabled) and introduces wake-up interrupts.
 
\section*{Version 1.2 (2021-10-21)}

This release introduces a few non-breaking changes to the software and testbench. The CPU RTL description has not changed since the previous release.

\begin{itemize}
        \item \shellcmd{lxp32asm} now supports C-style conditional processing directives: \instr{\#ifdef}, \instr{\#ifndef}, \instr{\#else} and \instr{\#endif}.
        \item The \instr{\#define} directive can now declare a macro with zero substitute tokens.
        \item A new \instr{\#error} directive.
        \item Minor changes to the testbench.
\end{itemize}
 
\section*{Version 1.1 (2019-01-11)}

This release introduces a minor but technically breaking hardware change: the START\_ADDR generic, which used to be 30-bit, has been extended to a full 32-bit word for convenience; the two least significant bits are ignored.

The other breaking change affects the assembly language syntax. Previously, all symbols were public, and multiple modules could not define symbols with the same name. Now only symbols explicitly exported using the \instr{\#export} directive are public. The \instr{\#extern} directive has been replaced by \instr{\#import}.

Other notable changes include:

\begin{itemize}
        \item A new instruction, \instr{lcs} (\instrname{Load Constant Short}), has been added, which loads a sign-extended 21-bit constant into a register. Unlike \instr{lc}, it is encoded as a single word and takes one cycle to execute.
        \item Optimizations in the divider unit. Division instructions (\instr{divs} and \instr{divu}) now take one fewer cycle to execute (modulo instructions are unaffected).
        \item LXP32 assembly language now supports a new instruction alias, \instr{neg} (\instrname{Negate}), which is equivalent to \code{\instr{sub} dst, 0, src}.
\end{itemize}
 
\section*{Version 1.0 (2016-02-20)}

Initial public release.
 
\end{document}