SSBCC.9x8 is a free Small Stack-Based Computer Compiler with a 9-bit opcode,
8-bit data core designed to facilitate FPGA HDL development.

The primary design criteria are:
- high speed (to avoid timing issues)
- low fabric utilization
- vendor independent
- development tools available for all operating systems

It has been used in Spartan-3A, Spartan-6, Virtex-6, and Artix-7 FPGAs and has
been built for Altera, Lattice, and other Xilinx devices. It is faster and
usually smaller than vendor-provided processors.

The compiler takes an architecture file that describes the micro controller
memory spaces, inputs and outputs, and peripherals, and that specifies the HDL
language and the assembly source. It generates a single HDL module implementing
the entire micro controller. No user-written HDL is required to instantiate
I/Os, program memory, etc.

The features are:
- high speed, low fabric utilization
- vendor-independent Verilog output with a VHDL package file
- simple Forth-like assembly language (43 instructions)
- single-cycle instruction execution
- automatic generation of I/O ports
- configurable instruction, data stack, return stack, and memory utilization
- extensible set of peripherals (I2C buses, UARTs, AXI4-Lite buses, etc.)
- extensible set of macros
- memory initialization file to facilitate code development without rebuilds
- simulation diagnostics to facilitate identifying code errors
- conditionally included I/Os and peripherals, functions, and assembly code

SSBCC has been used for the following projects:
- operate a media translator from a parallel camera interface to an OMAP GPMC
  interface, detect and report bus errors and hardware errors, and act as an
  SPI slave to the OMAP
- operate two UART interfaces and multiple PWM-controlled 2-lead bi-color LEDs
- operate and monitor the Artix-7 fabric in a Zynq system using AXI4-Lite
  master and slave buses and I2C buses for timing-critical voltage measurements

The only external tool required is Python 2.7.


DESCRIPTION
================================================================================

The computer compiler uses an architectural description of the processor stating
the sizes of the instruction memory, data stack, and return stack; the input and
output ports; RAM and ROM types and sizes; and peripherals.

The instructions are all single-cycle. The instructions include
- 4 arithmetic instructions: addition, subtraction, increment, and decrement
- 2 carry bit instructions: +c and -c for addition and subtraction respectively
- 3 bit-wise logical instructions: and, or, and exclusive or
- 7 shift and rotation instructions: <<0, <<1, 0>>, 1>>, <<msb, lsb>>, and msb>>
- 4 logical instructions: 0=, 0<>, -1=, -1<>
- 6 Forth-like data stack instructions: drop, dup, nip, over, push, swap
- 3 Forth-like return stack instructions: >r, r>, r@
- 2 input and output
- 6 memory read and write with optional address post increment and post decrement
- 2 jump and conditional jump
- 2 call and conditional call
- 1 function return
- 1 nop

The 9x8 address space is up to 8K instructions (a 13-bit address). This is
achieved by pushing the 8 lsb of the target address onto the data stack
immediately before the jump or call instruction and by encoding the 5 msb of
the address within the jump or call instruction. The instruction immediately
following a jump, call, or return is executed before the instruction sequence
at the destination address is executed (this is illustrated later).

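The address split can be sketched in Python (the language the compiler is
written in). This is an illustrative model only, not code from SSBCC: the
function name "encode_transfer" is hypothetical, the opcode base values are
read off the memory initialization listing in the EXAMPLE section, and placing
the 5 msb in the low 5 bits of the opcode is an assumption that is merely
consistent with those base values.

```python
# Illustrative model only; not part of SSBCC.
# Base opcodes are taken from the LED flasher memory listing in this README.
PUSH_BASE = 0x100  # a push is a 9-bit opcode with the msb set and 8 bits of data
TRANSFER_BASE = {"jump": 0x080, "jumpc": 0x0A0, "call": 0x0C0}


def encode_transfer(kind, address):
    """Return the (push, transfer) opcode pair for a 13-bit target address."""
    if not 0 <= address < 8192:
        raise ValueError("address must fit in 13 bits (8K instruction space)")
    push_opcode = PUSH_BASE | (address & 0xFF)  # 8 lsb go onto the data stack
    # ASSUMPTION: the 5 msb occupy the low 5 bits of the jump/call opcode.
    transfer_opcode = TRANSFER_BASE[kind] | (address >> 8)
    return push_opcode, transfer_opcode


# "call repause" with "repause" located at address 0x00D
print([hex(op) for op in encode_transfer("call", 0x00D)])
```

For the call to "repause" at address 0x00D this yields the pair 0x10D and
0x0C0, which are the two opcodes stored at addresses 0x007 and 0x008 of the
LED flasher listing.
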
Up to four banks of memory, either RAM or ROM, are available. Each of these can
be up to 256 bytes long, providing a total of up to 1 kB of memory.

The assembly language is Forth-like. Built-in macros are used to encode the
jump and call instructions and to encode the 2-bit memory bank index in memory
store and fetch instructions.

The computer compiler and assembler are written in Python 2.7. Peripherals are
implemented by Python modules which generate the I/O ports and the peripheral
HDL.

The computer compiler is documented in the doc directory. The 9x8 core is
documented in the core/9x8/doc directory. Several examples are provided.

The computer compiler and assembler are fully functional and there are no known
bugs.


SPEED AND RESOURCE UTILIZATION
================================================================================
These device speed and resource utilization results are copied from the build
tests. The full results are listed in the core/9x8/build directories. The
tests use a minimal processor implementation (clock, reset, and one output).
Device-specific scripts state how these performance numbers were obtained.

VENDOR  DEVICE          BEST SPEED  SMALLEST RESOURCE UTILIZATION
------  ------          ----------  -------------------------------
Altera  Cyclone-III     190.6 MHz   282 LEs (preliminary)
Altera  Cyclone-IV      192.1 MHz   281 LEs (preliminary)
Altera  Stratix-V       372.9 MHz   198 ALUTs (preliminary)
Lattice LCMXO2-640ZE-3   98.4 MHz   206 LUTs (preliminary)
Lattice LFE2-6E-7       157.9 MHz   203 LUTs (preliminary)
Xilinx  Artix-7         TBD         163 slice LUTs (48 slices)
Xilinx  Kintex-7        TBD         158 slice LUTs (44 slices)
Xilinx  Spartan-3A      149.4 MHz   232 4-input LUTs (129 slices)
Xilinx  Spartan-6       193.7 MHz   124 slice LUTs (34 slices)
Xilinx  Virtex-6        275.7 MHz   122 slice LUTs (38 slices) (preliminary)

Disclaimer: As with other embedded processors, these are maximum performance
claims. Realistic implementations will achieve slower maximum clock rates,
particularly with many I/O ports and peripherals and with the constraint of
coexisting with other subsystems in the FPGA fabric. What these performance
numbers do provide is an estimate of the amount of slack available. For
example, you can't realistically expect to get 110 MHz from a processor that,
under ideal conditions, places and routes at 125 MHz, but you can with a
processor that is demonstrated to place and route at 150 MHz.


EXAMPLE:
================================================================================

The LED flasher example demonstrates the simplicity of the architectural
specification and the Forth-like assembly language.

The architecture file, named "led.9x8", with the comments and user header
removed, is as follows:

  ARCHITECTURE core/9x8 Verilog

  INSTRUCTION     2048
  RETURN_STACK    32
  DATA_STACK      32

  PORTCOMMENT LED on/off signal
  OUTPORT 1-bit o_led O_LED

  ASSEMBLY led.s

The ARCHITECTURE configuration command specifies the 9x8 core and the Verilog
language. The INSTRUCTION, RETURN_STACK, and DATA_STACK configuration commands
specify the sizes of the instruction space, return stack, and data stack. The
content of the PORTCOMMENT configuration command is inserted in the module
declaration -- this facilitates identifying signals in micro controllers with a
lot of inputs and outputs. The single OUTPORT statement specifies a 1-bit
signal named "o_led". This signal is accessed in the assembly code through the
symbol "O_LED". The ASSEMBLY command specifies the single input file "led.s",
which is listed below. The output module will be "led.v".

The "led.s" assembly file is as follows:

  ; Consume 256*5+4 clock cycles.
  ; ( - )
  .function pause
    0 :inner 1- dup .jumpc(inner) drop
  .return

  ; Repeat "pause" 256 times.
  ; ( - )
  .function repause
    0 :inner .call(pause) 1- dup .jumpc(inner) drop
  .return

  ; main program (as an infinite loop)
  .main
    0 :inner 1 ^ dup .outport(O_LED) .call(repause) .jump(inner)

This example is coded in a traditional Forth structure with the conditional
jumps consuming the top of the data stack. In the "pause" function, the
".function" directive specifies the start of a function and the function name.
The "0" instruction pushes the value "0" onto the top of the data stack.
":inner" is a label for a jump instruction. The "1-" instruction decrements the
top of the data stack. "dup" is the Forth instruction to push a duplicate of
the top of the data stack onto the data stack. The ".jumpc(inner)" macro
expands to three instructions as follows: (1) push the 8 lsb of the address at
"inner" onto the data stack, (2) the conditional jump instruction with the 5 msb
of the address of "inner" (the jumpc instruction also drops the top of the data
stack with its partial address), and (3) a "drop" instruction to drop the
duplicated loop count from the top of the data stack. Finally, the "drop"
instruction drops the loop count from the top of the data stack and the
".return" macro generates the "return" instruction and a "nop" instruction.

The function "repause" calls the "pause" function 256 times. The main program
body is identified by the directive ".main". This function runs an infinite
loop that toggles the lsb of the LED output, outputs the LED setting, and calls
the "repause" function.

A tighter version of the loop in the "pause" function can be written as

  ; Consume 256*3+3 clock cycles.
  ; ( - )
  .function pause
    0xFF :inner .jumpc(inner,1-) .return(drop)

which is 3 cycles long for each iteration. The "drop" that is normally part
of the ".jumpc" macro has been replaced by the decrement instruction, and the
final "drop" instruction has replaced the default "nop" instruction that is
normally part of the ".return" macro. Note that the decrement is performed
after the non-zero comparison in the "jumpc" instruction.

A version of the "pause" function that consumes exactly 1000 clock cycles is:

  .function pause
    ${(1000-4)/4-1} :inner nop .jumpc(inner,1-) drop .return

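The arithmetic behind the 1000-cycle version can be checked with a small
Python model. It is only a sketch and not part of SSBCC: the function names
are hypothetical, and the cycle accounting assumes what the text above states,
namely one cycle per instruction, a ".jumpc(inner,1-)" macro that expands to
three instructions, a ".return" macro that expands to two, and a decrement
that runs after the non-zero test.

```python
# Illustrative cycle model for:
#   <initial> :inner nop .jumpc(inner,1-) drop .return
# Not SSBCC code; the function names are hypothetical.

def pause_cycles(initial):
    """Clock cycles consumed inside "pause" for an 8-bit initial count."""
    if not 0 <= initial <= 0xFF:
        raise ValueError("initial count must fit in 8 bits")
    per_iteration = 4         # nop, push of the address lsb, jumpc, 1-
    iterations = initial + 1  # jumpc tests the count before 1- decrements it
    overhead = 4              # push of <initial>, drop, return, nop
    return overhead + per_iteration * iterations


def initial_for(cycles):
    """Initial count for a requested cycle total, as in ${(1000-4)/4-1}."""
    # "//" mirrors the integer "/" that Python 2.7 applies to integers.
    return (cycles - 4) // 4 - 1


print(initial_for(1000), pause_cycles(initial_for(1000)))
```

With an initial count of 248 the loop runs 249 times, so the model gives
4 + 4*249 = 1000 cycles. The three instructions of the ".call(pause)" in the
caller are not counted.
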
The instruction memory initialization for the processor module includes the
instruction mnemonics being performed at each address and replaces the "list"
file output from traditional assemblers. The following is the memory
initialization for this LED flasher example. The main program always starts at
address zero and functions are included in the order encountered. Unused
library functions are not included in the generated instruction list.

  reg [8:0] s_opcodeMemory[2047:0];
  initial begin
    // .main
    s_opcodeMemory['h000] = 9'h100; // 0x00
    s_opcodeMemory['h001] = 9'h101; // :inner 0x01
    s_opcodeMemory['h002] = 9'h052; // ^
    s_opcodeMemory['h003] = 9'h008; // dup
    s_opcodeMemory['h004] = 9'h100; // O_LED
    s_opcodeMemory['h005] = 9'h038; // outport
    s_opcodeMemory['h006] = 9'h054; // drop
    s_opcodeMemory['h007] = 9'h10D; //
    s_opcodeMemory['h008] = 9'h0C0; // call repause
    s_opcodeMemory['h009] = 9'h000; // nop
    s_opcodeMemory['h00A] = 9'h101; //
    s_opcodeMemory['h00B] = 9'h080; // jump inner
    s_opcodeMemory['h00C] = 9'h000; // nop
    // repause
    s_opcodeMemory['h00D] = 9'h100; // 0x00
    s_opcodeMemory['h00E] = 9'h119; // :inner
    s_opcodeMemory['h00F] = 9'h0C0; // call pause
    s_opcodeMemory['h010] = 9'h000; // nop
    s_opcodeMemory['h011] = 9'h05C; // 1-
    s_opcodeMemory['h012] = 9'h008; // dup
    s_opcodeMemory['h013] = 9'h10E; //
    s_opcodeMemory['h014] = 9'h0A0; // jumpc inner
    s_opcodeMemory['h015] = 9'h054; // drop
    s_opcodeMemory['h016] = 9'h054; // drop
    s_opcodeMemory['h017] = 9'h028; // return
    s_opcodeMemory['h018] = 9'h000; // nop
    // pause
    s_opcodeMemory['h019] = 9'h100; // 0x00
    s_opcodeMemory['h01A] = 9'h05C; // :inner 1-
    s_opcodeMemory['h01B] = 9'h008; // dup
    s_opcodeMemory['h01C] = 9'h11A; //
    s_opcodeMemory['h01D] = 9'h0A0; // jumpc inner
    s_opcodeMemory['h01E] = 9'h054; // drop
    s_opcodeMemory['h01F] = 9'h054; // drop
    s_opcodeMemory['h020] = 9'h028; // return
    s_opcodeMemory['h021] = 9'h000; // nop
    s_opcodeMemory['h022] = 9'h000;
    s_opcodeMemory['h023] = 9'h000;
    s_opcodeMemory['h024] = 9'h000;
    ...
    s_opcodeMemory['h7FF] = 9'h000;
  end


DATA and STRINGS
================================================================================

Values are pushed onto the data stack by stating the value. For example,

  0x10 0x20 'x'

will successively push the values 0x10, 0x20, and the character 'x' onto the
data stack. The character 'x' will be at the top of the data stack after these
3 instructions.

Numeric values can be represented in binary, octal, decimal, and hex. Binary
values start with the two characters "0b" followed by a sequence of binary
digits; octal numbers start with a "0" followed by a sequence of octal digits;
decimal values can start with a "+" or "-", have a non-zero first digit, and
are followed by zero or more decimal digits; and hex values start with the two
characters "0x" followed by a sequence of hex digits.

Examples of equivalent numeric values are:
  binary:       0b01    0b10010
  octal:        01      022
  decimal:      1       18
  hex:          0x1     0x12

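The literal rules can be sketched as a small Python classifier. This is an
illustration of the rules as worded in this section, not the assembler's own
parser: the name "parse_literal" is hypothetical, and treating a lone "0" as
octal zero and accepting an upper-case "0X" prefix are assumptions.

```python
import re

# (pattern, base, number of prefix characters to skip); illustrative only.
_LITERAL_RULES = [
    (re.compile(r"0b[01]+$"), 2, 2),             # binary: "0b" + binary digits
    (re.compile(r"0[xX][0-9a-fA-F]+$"), 16, 2),  # hex: "0x" + hex digits
    (re.compile(r"0[0-7]*$"), 8, 0),             # octal: "0" + octal digits
    (re.compile(r"[+-]?[1-9][0-9]*$"), 10, 0),   # decimal: non-zero first digit
]


def parse_literal(text):
    """Return the integer value of one numeric literal."""
    for pattern, base, skip in _LITERAL_RULES:
        if pattern.match(text):
            return int(text[skip:], base)
    raise ValueError("unrecognized numeric literal: %r" % text)


print([parse_literal(t) for t in ("0b10010", "022", "18", "0x12")])
```

All four literals in the last line evaluate to the same value, 18, matching
the second column of the table of equivalent values.
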
See the COMPUTED VALUES section for using computed values in the assembler.

There are four ways to specify strings in the assembler. Simply stating the
string

  "Hello World!"

puts the characters in the string onto the data stack with the letter 'H' at the
top of the data stack. I.e., the individual push operations are

  '!' 'd' 'l' ... 'e' 'H'

Prepending an 'N' to the double quote, like

  N"Hello World!"

puts a null-terminated string onto the data stack. I.e., the value under the
'!' will be a 0x00 and the instruction sequence would be

  0x0 '!' 'd' 'l' ... 'e' 'H'

Forth uses counted strings, which are specified here as

  C"Hello World!"

In this case the number of characters, 12, in the string is pushed onto the data
stack after the 'H', i.e., the instruction sequence would be

  '!' 'd' 'l' ... 'e' 'H' 12

Finally, a lesser-counted string specified like

  c"Hello World!"

is similar to the Forth-like counted string except that the value pushed onto
the data stack is one less than the number of characters in the string. Here
the value pushed onto the data stack after the 'H' would be 11 instead of 12.

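The push order of the four string formats can be summarized by the following
Python sketch. It is a model of the behavior described above, not the
assembler's implementation, and the function name "string_pushes" is
hypothetical.

```python
def string_pushes(text, prefix=""):
    """Return the pushed values in push order; the last entry ends up on top."""
    pushes = list(reversed(text))     # the last character is pushed first
    if prefix == "N":
        pushes.insert(0, 0)           # null terminator ends up deepest
    elif prefix == "C":
        pushes.append(len(text))      # character count ends up on top
    elif prefix == "c":
        pushes.append(len(text) - 1)  # count minus one ends up on top
    elif prefix != "":
        raise ValueError("unknown string prefix: %r" % prefix)
    return pushes


for prefix in ("", "N", "C", "c"):
    print(repr(prefix), string_pushes("Hello World!", prefix))
```

For the plain and null-terminated forms the 'H' is the last value pushed, so a
transmit loop sees the characters in reading order; for the two counted forms
the count sits on top of the 'H'.
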
Simple strings are useful for constructing more complex strings in conjunction
with other string functions. For example, to transmit the hex values of the
top 2 values in the data stack, do something like:

  ; move the top 2 values to the return stack
  >r >r
  ; push the tail of the message onto the data stack
  N"\n\r"
  ; convert the 2 values to 2-digit hex values, LSB deepest in the stack
  r> .call(string_byte_to_hex)
  r> .call(string_byte_to_hex)
  ; prepend the identification message
  "Message: "
  ; transmit the string, using the null terminator to terminate the loop
  :loop_transmit .outport(O_UART_TX) .jumpc(loop_transmit,nop) drop

A lesser-counted string would be used like:

  c"Status Message\r\n"
  :loop_msg swap .outport(O_UART_TX) .jumpc(loop_msg,1-) drop

These four string formats can also be used for variable definitions. For
example, 3 variables could be allocated and initialized as follows:

  .memory ROM myrom
  .variable fred N"fred"
  .variable joe  c"joe"
  .variable moe  "moe"

These are equivalent to

  .variable fred 'f' 'r' 'e' 'd' 0
  .variable joe  2 'j' 'o' 'e'
  .variable moe  'm' 'o' 'e'

with 5 bytes allocated for the variable fred, 4 bytes for joe, and 3 bytes for
moe.

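The memory layout of such variables can be modeled in a few lines of Python.
This is a sketch rather than assembler code: the name "variable_bytes" is
hypothetical, and the layout for the "C" prefix (full count first) is an
assumption made by analogy with the "c" case shown above.

```python
def variable_bytes(text, prefix=""):
    """Return the byte values stored for a string-initialized variable."""
    chars = [ord(ch) for ch in text]
    if prefix == "N":
        return chars + [0]               # characters, then a null terminator
    if prefix == "c":
        return [len(chars) - 1] + chars  # count minus one, then the characters
    if prefix == "C":
        return [len(chars)] + chars      # ASSUMPTION: full count comes first
    if prefix == "":
        return chars                     # just the characters
    raise ValueError("unknown string prefix: %r" % prefix)


for name, prefix in (("fred", "N"), ("joe", "c"), ("moe", "")):
    print(name, len(variable_bytes(name, prefix)), "bytes")
```

The byte counts for fred, joe, and moe come out as 5, 4, and 3, as stated in
the text.
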
The following escaped characters are recognized:

  '\0'          null character
  '\a'          bell
  '\b'          backspace
  '\f'          form feed
  '\n'          line feed
  '\r'          carriage return
  '\t'          horizontal tab
  "\0ooo"       3-digit octal value
  "\xXX"        2-digit hex value where X is one of 0-9, a-f, or A-F
  "\Xxx"        alternate form for 2-digit hex value
  "\\"          backslash character

Unrecognized escaped characters are simply treated as that character. For
example, '\m' is treated as the single character 'm' and '\'' is treated as the
single quote character.


INSTRUCTIONS
================================================================================

The 43 instructions are as follows (see core/9x8/doc/opcodes.html for detailed
descriptions). Here, T is the top of the data stack, N is the next-to-top of
the data stack, and R is the top of the return stack. All of these are the
values at the start of the instruction.

The nop instruction does nothing:

  nop           no operation

Mathematical operations drop one value from the data stack and replace the new
top with the stated value:

  &             bitwise and of N and T
  +             N + T
  -             N - T
  ^             bitwise exclusive or of N and T
  or            bitwise or of N and T

Push the carry bit for addition or subtraction onto the data stack (see
lib/9x8/math.s for examples on using +c and -c for multi-byte arithmetic):

  +c            carry bit for N+T
  -c            carry bit for N-T

Increment and decrement replace the top of the data stack with the stated
result.

  1+            replace T with T+1
  1-            replace T with T-1

Comparison operations replace the top of the data stack with the results of the
comparison:

  -1<>          replace T with -1 if T != -1, otherwise set T to 0
  -1=           replace T with 0 if T != -1, otherwise leave T as -1
  0<>           replace T with -1 if T != 0, otherwise leave T as 0
  0=            replace T with -1 if T == 0, otherwise set T to 0

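The comparison results can be modeled on 8-bit values, where -1 is 0xFF. The
following Python table is an illustrative model of the descriptions above, not
code from the core; the name "COMPARE" is hypothetical.

```python
TRUE, FALSE = 0xFF, 0x00  # -1 and 0 as 8-bit values

# Each entry maps the old T (0..255) to the new T.
COMPARE = {
    "0=":   lambda t: TRUE if t == 0x00 else FALSE,
    "0<>":  lambda t: TRUE if t != 0x00 else FALSE,
    "-1=":  lambda t: TRUE if t == 0xFF else FALSE,
    "-1<>": lambda t: TRUE if t != 0xFF else FALSE,
}

for name in ("0=", "0<>", "-1=", "-1<>"):
    print(name, [hex(COMPARE[name](t)) for t in (0x00, 0x5A, 0xFF)])
```

Every operation yields either 0x00 or 0xFF, so the result can feed a
conditional jump or call directly or be used as a bit mask.
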
Shift/rotate operations replace the top of the data stack with the result of the
specified shift/rotate.

  0>>           shift T right 1 bit and set the msb to 0
  1>>           shift T right 1 bit and set the msb to 1
  <<0           shift T left 1 bit and set the lsb to 0
  <<1           shift T left 1 bit and set the lsb to 1
  <<msb         rotate T left 1 bit
  lsb>>         rotate T right 1 bit
  msb>>         shift T right 1 bit and set the msb to the old msb

Note: There is no "<<lsb" instruction.

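Six of the shift/rotate instructions ("0>>", "1>>", "<<0", "<<1", "lsb>>", and
"msb>>") can be modeled on 8-bit values as follows. This Python table is only
a sketch of the descriptions in the table above, not code from the core, and
the name "SHIFT" is hypothetical.

```python
# Each entry maps the old T (0..255) to the new T; illustrative model only.
SHIFT = {
    "0>>":   lambda t: t >> 1,                        # msb becomes 0
    "1>>":   lambda t: (t >> 1) | 0x80,               # msb becomes 1
    "<<0":   lambda t: (t << 1) & 0xFF,               # lsb becomes 0
    "<<1":   lambda t: ((t << 1) & 0xFF) | 0x01,      # lsb becomes 1
    "lsb>>": lambda t: (t >> 1) | ((t & 0x01) << 7),  # rotate right
    "msb>>": lambda t: (t >> 1) | (t & 0x80),         # msb keeps its old value
}

for name in ("0>>", "1>>", "<<0", "<<1", "lsb>>", "msb>>"):
    print("%-6s 0x81 -> 0x%02X" % (name, SHIFT[name](0x81)))
```

In this model "msb>>" behaves as a sign-preserving (arithmetic) right shift
and "0>>" as a logical right shift.
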
Stack manipulation instructions are as follows:

  >r            push T onto the return stack and drop T from the data stack
  drop          drop T from the data stack
  dup           push T onto the data stack
  nip           drop N from the data stack
  over          push N onto the data stack
  push          push a single byte onto the data stack, see the preceding DATA
                and STRINGS section
  r>            push R onto the data stack and drop R from the return stack
  r@            push R onto the data stack
  swap          swap N and T

Jump and call and their conditional variants are as follows and must use the
associated macro:

  call          call instruction -- use the .call macro
  callc         conditional call instruction -- use the .callc macro
  jump          jump instruction -- use the .jump macro
  jumpc         conditional jump instruction -- use the .jumpc macro
  return        return instruction -- use the .return macro

See the MEMORY section for details for these memory operations. T is the
address for the instructions, N is the value stored. Chained fetches insert the
value below T. Chained stores drop N.

  fetch         memory fetch, replace T with the value fetched
  fetch+        chained memory fetch, retain and increment the address
  fetch-        chained memory fetch, retain and decrement the address
  store         memory store, drop T (N is the next value of T)
  store+        chained memory store, retain and increment the address
  store-        chained memory store, retain and decrement the address

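The stack effects of the fetch and store instructions can be sketched in
Python. This is an illustrative model of the descriptions above, not code from
the core: it uses a single 256-byte bank (bank selection is omitted), the data
stack is a list whose last element is T, and wrapping the address at 8 bits is
an assumption. "fetch-" and "store-" would mirror the "+" variants with a
decrement.

```python
# Illustrative model only; the function names are hypothetical.

def fetch(stack, mem):
    stack[-1] = mem[stack[-1]]       # replace the address in T with the value


def fetch_plus(stack, mem):
    addr = stack.pop()
    stack.append(mem[addr])          # the value is inserted below the address
    stack.append((addr + 1) & 0xFF)  # the address is retained and incremented


def store(stack, mem):
    addr = stack.pop()               # drop T (the address)
    mem[addr] = stack[-1]            # the stored value remains as the new T


def store_plus(stack, mem):
    addr = stack.pop()
    mem[addr] = stack.pop()          # chained stores drop N (the value)
    stack.append((addr + 1) & 0xFF)  # the address is retained and incremented


mem = bytearray(256)
stack = [0x22, 0x11, 0x40]           # two values, then the address 0x40 as T
store_plus(stack, mem)               # writes 0x11 to address 0x40
store(stack, mem)                    # writes 0x22 to address 0x41
print(hex(mem[0x40]), hex(mem[0x41]), stack)
```

Chaining "store+" this way writes consecutive bytes without re-pushing the
address, which is the reason the address is retained on the stack.
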
See the INPORT and OUTPORT section for details for the input and output port
operations:

  inport        input port operation
  outport       output port operation

The .call, .callc, .jump, and .jumpc macros encode the 3 instructions required
to perform a call or jump: the address push, the call or jump itself, and the
instruction that follows it. The default third instruction is "nop" for .call
and .jump and "drop" for .callc and .jumpc. The default can be changed by
specifying the optional second argument. The .call and .callc macros must
specify a function identified by the .function directive and the .jump and
.jumpc macros must specify a label.

The .function directive takes the name of the function and the function body.
Function bodies must end with a .return or a .jump macro. The .main directive
defines the body of the main function, i.e., the function at which the processor
starts.

The .include directive is used to read additional assembly code. You can, for
example, put the main function in uc.s, define constants and such in consts.s,
define the memories and variables in ram.s, and include UART utilities in
uart.s. These files could be included in uc.s through the following lines:

  .include consts.s
  .include ram.s
  .include uart.s

The assembler only includes functions that can be reached from the main
function. Unused functions will not consume instruction space.


INPORT and OUTPORT
================================================================================

The INPORT and OUTPORT configuration commands are used to specify 2-state inputs
and outputs. For example

  INPORT 8-bit i_value I_VALUE

specifies a single 8-bit input signal named "i_value" for the module. The port
is accessed in assembly by ".inport(I_VALUE)" which is equivalent to the
two-instruction sequence "I_VALUE inport". To input an 8-bit value from a FIFO
and send a single-clock-cycle wide acknowledgment strobe, use

  INPORT 8-bit,strobe i_fifo,o_fifo_ack I_FIFO

The assembly ".inport(I_FIFO)" will automatically send an acknowledgment strobe
to the FIFO through "o_fifo_ack".

A write port to an 8-bit FIFO is similarly specified by

  OUTPORT 8-bit,strobe o_fifo,o_fifo_wr O_FIFO

The assembly ".outport(O_FIFO)", which is equivalent to "O_FIFO outport drop",
will automatically send a write strobe to the FIFO through "o_fifo_wr".

Multiple signals can be packed into a single input or output port by defining
them in comma separated lists. The associated bit masks can be defined
coincident with the port definition as follows:

  INPORT 1-bit,1-bit i_fifo_full,i_fifo_empty I_FIFO_STATUS
  CONSTANT C_FIFO_STATUS__FULL  0x02
  CONSTANT C_FIFO_STATUS__EMPTY 0x01

Checking the "full" status of the FIFO can be done by the following assembly
sequence:

  .inport(I_FIFO_STATUS) C_FIFO_STATUS__FULL &

Multiple bits can be masked using a computed value as follows (see below for
more details):

  .inport(I_FIFO_STATUS) ${C_FIFO_STATUS__FULL|C_FIFO_STATUS__EMPTY} &

The "${...}" creates an instruction to push the 8-bit value in the braces onto
the data stack. The computation is performed using the Python "eval" function
in the context of the program constants, memory addresses, and memory sizes.

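How such a computed value could be evaluated is sketched below. This is not
the assembler's actual code: the function name "computed_value", the empty
builtins table, and the accepted range of -128 through 255 are assumptions
made for the illustration. Only the use of Python's "eval" with the program
constants as context comes from the text above.

```python
# Illustrative sketch; SSBCC's own evaluation code may differ.
constants = {
    "C_FIFO_STATUS__FULL": 0x02,
    "C_FIFO_STATUS__EMPTY": 0x01,
}


def computed_value(expression, symbols):
    """Evaluate the text inside ${...} and return it as an 8-bit value."""
    # The symbols are supplied as local names; builtins are withheld.
    value = eval(expression, {"__builtins__": {}}, dict(symbols))
    # ASSUMPTION: anything outside -128..255 is rejected as not fitting a byte.
    if not -128 <= value <= 255:
        raise ValueError("computed value does not fit in 8 bits: %r" % value)
    return value & 0xFF


mask = computed_value("C_FIFO_STATUS__FULL|C_FIFO_STATUS__EMPTY", constants)
print(hex(mask))
```

The resulting mask is 0x03, so the "&" that follows keeps both status bits.
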
Preceding all of these by

  PORTCOMMENT external FIFO

produces the following in the Verilog module statement. The I/O ports are
listed in the order in which they are declared.

  // external FIFO
  input  wire     [7:0] i_fifo,
  output reg            o_fifo_ack,
  output reg      [7:0] o_fifo,
  output reg            o_fifo_wr,
  input  wire           i_fifo_full,
  input  wire           i_fifo_empty

The HDL to implement the inputs and outputs is computer generated. Identifying
the port name in the architecture file eliminates the possibility of
inconsistent port numbers between the HDL and the assembly. Specifying the bit
mapping for the assembly code immediately after the port definition helps
prevent inconsistencies between the port definition and the bit mapping in the
assembly code.

The normal initial value for an outport is zero. This can be changed by
including an optional initial value as follows. This initial value will be
applied on system startup and when the micro controller is reset.

  OUTPORT 4-bit=4'hA o_signal O_SIGNAL

An isolated output strobe can also be created using:

  OUTPORT strobe o_strobe O_STROBE

The assembly ".outstrobe(O_STROBE)", which is equivalent to "O_STROBE outport",
is used to generate the strobe. Since "O_STROBE" is a strobe-only outport, the
".outport" macro cannot be used with it. Similarly, the ".outstrobe" macro will
generate an error if it is invoked with an outport that has data.

A single-bit "set-reset" input port type is also included. This sets a register
when an external strobe is received and clears the register when the port is
read. For example, to capture an external timer for a polled-loop, include the
following in the architecture file:

  PORTCOMMENT external timer
  INPORT set-reset i_timer I_TIMER

The following is the assembly code to conditionally call two functions when the
timer event is encountered:

  .inport(I_TIMER)
  .callc(timer_event_1,nop)
  .callc(timer_event_2)

The "nop" in the first conditional call prevents the conditional from being
dropped from the data stack so that it can be used by the subsequent conditional
function call.

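The behavior of a "set-reset" input port can be pictured with a small Python
model. It is a behavioral sketch only, not the generated HDL: the class and
method names are hypothetical, and the case of a strobe arriving in the same
clock cycle as a read is not modeled.

```python
class SetResetInport(object):
    """Behavioral model of a 1-bit set-reset input port."""

    def __init__(self):
        self.flag = 0

    def strobe(self):
        """An external strobe sets the register."""
        self.flag = 1

    def read(self):
        """Reading the port returns the register and then clears it."""
        value, self.flag = self.flag, 0
        return value


timer = SetResetInport()
timer.strobe()
timer.strobe()                      # further strobes before a read are absorbed
print(timer.read(), timer.read())   # the event is reported exactly once
```

A polled loop that reads the port therefore sees each captured event once,
even if the loop runs less often than the strobe arrives.
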
602 |
|
|
|
603 |
|
|
PERIPHERAL
|
604 |
|
|
================================================================================
|
605 |
|
|
|
606 |
|
|
Peripherals are implemented via Python modules. For example, an open drain I/O
|
607 |
|
|
signal, such as is required for an I2C bus, does not fit the INPORT and OUTPORT
|
608 |
|
|
functionality. Instead, an "open_drain" peripheral is provided by the Python
|
609 |
|
|
script in "core/9x8/peripherals/open_drain.py". This puts a tri-state I/O in
|
610 |
|
|
the module statement, allows it to be read through an "inport" instruction, and
|
611 |
|
|
allows it to be set low or released through an "outport" instruction. An I2C
|
612 |
|
|
bus with separate SCL and SDA ports can then be incorporated into the processor
|
613 |
|
|
as follows:
|
614 |
|
|
|
615 |
|
|
PORTCOMMENT I2C bus
|
616 |
|
|
PERIPHERAL open_drain inport=I_SCL \
|
617 |
|
|
outport=O_SCL \
|
618 |
|
|
iosignal=io_scl
|
619 |
|
|
PERIPHERAL open_drain inport=I_SDA \
|
620 |
|
|
outport=O_SDA \
|
621 |
|
|
iosignal=io_sda
|
622 |
|
|
|
623 |
|
|
The default width for this peripheral is 1 bit. The module statement will then
|
624 |
|
|
include the lines
|
625 |
|
|
|
626 |
|
|
// I2C bus
|
627 |
|
|
inout wire io_scl,
|
628 |
|
|
inout wire io_sda
|
629 |
|
|
|
630 |
|
|
The assembly code to set the io_scl signal low is "0 .outport(O_SCL)" and to
|
631 |
|
|
release it is "1 .outport(O_SCL)". These instruction sequences are actually
|
632 |
|
|
"0 O_SCL outport drop" and "1 O_SCL outport drop" respectively. The "outport"
|
633 |
|
|
instruction drops the top of the data stack (which contained the port number)
|
634 |
|
|
and sends the next-to-the-top of the data stack to the designated output port.
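
As a rough behavioral illustration only (this is not the code in
open_drain.py, and the class and driver names are hypothetical), the following
Python sketch models why writing 0 pulls an open-drain signal low while writing
1 merely releases it. It assumes a pull-up on the line, as is usual for I2C.

```python
class OpenDrainLine:
    """Behavioral model of one open-drain signal with a pull-up resistor."""

    def __init__(self):
        # Maps a driver name to True when that driver has released the line.
        self.released = {}

    def write(self, driver, bit):
        # Writing 0 pulls the line low; writing 1 releases it.
        self.released[driver] = bool(bit & 1)

    def read(self):
        # The pull-up wins only if no driver is pulling the line low.
        return 1 if all(self.released.values()) else 0


scl = OpenDrainLine()
scl.write("processor", 1)    # analogous to "1 .outport(O_SCL)"
scl.write("i2c_device", 0)   # a device holding the clock low
print(scl.read())            # what ".inport(I_SCL)" would observe
```

Because the processor can only pull low or release, reading the signal back
through the inport is the only way to learn whether another device is holding
it low.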

Two examples of I2C device operation are included in the examples directory.

The following peripherals are provided:
  adder_16bit   16-bit adder/subtractor
  AXI4_Lite_Master
                32-bit read/write AXI4-Lite Master
                Note: The synchronous version has been tested on hardware.
  AXI4_Lite_Slave_DualPortRAM
                dual-port-RAM interface for the micro controller to act as an
                AXI4-Lite slave
  big_inport    shift reads from a single INPORT to construct a wide input
  big_outport   shift writes to a single OUTPORT to construct a wide output
  counter       counter for number of received high cycles from signal
  inFIFO_async  input FIFO with an asynchronous write clock
  latch         latch wide inputs for sampling
  monitor_stack simulation diagnostic (see below)
  open_drain    for software-implemented I2C buses or similar
  outFIFO_async output FIFO with an asynchronous read clock
  PWM_8bit      PWM generator with an 8-bit control
  timer         timing for polled loops or similar
  trace         simulation diagnostic (see below)
  UART          bidirectional UART
  UART_Rx       receive UART
  UART_Tx       transmit UART
  wide_strobe   1 to 8 bit strobe generator

The following command illustrates how to display the help message for
peripherals:

  echo "ARCHITECTURE core/9x8 Verilog" | ssbcc -P "big_inport help" - | less

User-defined peripherals can be in the same directory as the architecture file
or a subdirectory named "peripherals".


PARAMETER and LOCALPARAM
================================================================================

Parameters are incorporated through the PARAMETER and LOCALPARAM configuration
commands. For example, the clock frequency in hertz is needed for UARTs for
their baud rate generator. The configuration command

  PARAMETER G_CLK_FREQ_HZ 97_000_000

specifies the clock frequency as 97 MHz. The HDL instantiating the processor
can change this specification. The frequency can also be changed through the
command-line invocation of the computer compiler. For example,

  ssbcc -G "G_CLK_FREQ_HZ=100_000_000" myprogram.9x8

specifies that a frequency of 100 MHz be used instead of the default frequency
of 97 MHz.

The LOCALPARAM configuration command can be used to specify parameters that
should not be changed by the surrounding HDL. For example,

  LOCALPARAM L_VERSION 24'h00_00_00

specifies a 24-bit parameter named "L_VERSION". The 8-bit major, minor, and
build sections of the parameter can be accessed in an assembly program using
"L_VERSION[16+:8]", "L_VERSION[8+:8]", and "L_VERSION[0+:8]".

For both parameters and localparams, the default range is "[0+:8]". The
instruction memory is initialized using the parameter value during synthesis,
not the value used to initialize the parameter. That is, the instruction memory
initialization will be:

  s_opcodeMemory[...] = { 1'b1, L_VERSION[16+:8] };

The value of the localparam can be set when the computer compiler is run using
the "-G" option. For example,

  ssbcc -G "L_VERSION=24'h01_04_03" myprogram.9x8

can be used in a makefile to set the version number for a release without
modifying the micro controller architecture file.
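
To make the "[lsb+:width]" ranges concrete, here is a small Python sketch of
the Verilog indexed part-select applied to the version value from the command
above. The helper name part_select is hypothetical and only models the
arithmetic; it is not part of ssbcc.

```python
def part_select(value, lsb, width=8):
    """Model of the Verilog indexed part-select value[lsb+:width]."""
    return (value >> lsb) & ((1 << width) - 1)


L_VERSION = 0x010403                  # 24'h01_04_03
major = part_select(L_VERSION, 16)    # L_VERSION[16+:8]
minor = part_select(L_VERSION, 8)     # L_VERSION[8+:8]
build = part_select(L_VERSION, 0)     # L_VERSION[0+:8], the default range
print(major, minor, build)            # prints: 1 4 3
```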


DIAGNOSTICS AND DEBUGGING
================================================================================

A 3-character, human readable version of the opcode can be included in
simulation waveform outputs by adding "--display-opcode" to the ssbcc command.

The stack health can be monitored during simulation by including the
"monitor_stack" peripheral through the command line. For example, the LED
flasher example can be generated using

  ssbcc -P monitor_stack led.9x8

This allows the architecture file to be unchanged between simulation and an FPGA
build.

Stack errors include underflow and overflow, malformed data validity, and
incorrect use of the values on the return stack (returns to data values and data
operations on return addresses). Other errors include out-of-range for memory,
inport, and outport operations.

When stack errors are detected the last 50 instructions are dumped to the
console and the simulation terminates. The dump includes the PC, numeric
opcode, textual representation of the opcode, data stack pointer, next-to-top of
the data stack, top of the data stack, top of the return stack, and the return
stack pointer. Invalid stack values are displayed as "XX". The length of the
history dumped is configurable.
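
The bounded history can be pictured as a ring buffer. The Python sketch below
is only an analogy for the behavior described above; the real diagnostic is
part of the generated simulation HDL, and the class name, fields, and output
format here are hypothetical.

```python
from collections import deque


class HistoryBuffer:
    """Keeps only the most recent instructions, like the stack monitor dump."""

    def __init__(self, length=50):
        # A deque with maxlen silently discards the oldest entry when full.
        self.entries = deque(maxlen=length)

    def record(self, pc, opcode):
        self.entries.append((pc, opcode))

    def dump(self):
        # One line per retained instruction: PC and numeric opcode in hex.
        return ["%04X %03X" % (pc, opcode) for pc, opcode in self.entries]


history = HistoryBuffer()
for pc in range(60):
    history.record(pc, 0x000)
print(len(history.dump()))   # only the last 50 of the 60 entries remain
```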

Out-of-range PC checks are also performed if the instruction space is not a
power of 2.

A "trace" peripheral is also provided that dumps the entire execution history.
This was used to validate the processor core.


MEMORY ARCHITECTURE
================================================================================

The DATA_STACK, RETURN_STACK, INSTRUCTION, and MEMORY configuration commands
allocate memory for the data stack, return stack, instruction ROM, and memory
RAM and ROM respectively. The data stack, return stack, and memories are
normally instantiated as dual-port LUT-based memories with asynchronous reads
while the instruction memory is always instantiated with a synchronous read
architecture.

The COMBINE configuration command is used to coalesce memories and to convert
LUT-based memories to synchronous SRAM-based memories. For example, the large
SRAMs in modern FPGAs are ideal for storing the instruction opcodes and their
dual-ported access allows either the data stack or the return stack to be
stored in a relatively small region at the end of the large instruction memory.
Memories, which require dual-ported operation, can also be instantiated in
large RAMs either individually or in combination with each other. Conversion
to SRAM-based memories is also useful for FPGA architectures that do not have
efficient LUT-based memories.

The INSTRUCTION configuration allocates memory for the processor instruction
space. It has the form "INSTRUCTION N" or "INSTRUCTION N*M" where N must be a
power of 2. The first form is used if the desired instruction memory size is a
power of 2. The second form is used to allocate M memory blocks of size N
where M is not a power of 2. For example, on an Altera Cyclone III, the
configuration command "INSTRUCTION 1024*3" allocates three M9Ks for the
instruction space, saving one M9K as compared to the configuration command
"INSTRUCTION 4096".

The DATA_STACK configuration command allocates memory for the data stack. It
has the form "DATA_STACK N" where N is the commanded size of the data stack.
N must be a power of 2.

The RETURN_STACK configuration command allocates memory for the return stack and
has the same format as the DATA_STACK configuration command.

The MEMORY configuration command is used to define one to four memories, either
RAM or ROM, with up to 256 bytes each. If no MEMORY configuration command is
issued, then no memories are allocated for the processor. The MEMORY
configuration command has the format "MEMORY {RAM|ROM} name N" where
"{RAM|ROM}" specifies either a RAM or a ROM, name is the name of the memory and
must start with an alphabetic character, and the size of the memory, N, must be
a power of 2. For example, "MEMORY RAM myram 64" allocates 64 bytes of memory
to form a RAM named myram. Similarly, "MEMORY ROM lut 256" defines a 256 byte
ROM named lut. More details on using memories are provided in the next section.

The COMBINE configuration command can be used to combine the various memories
for more efficient processor implementation as follows:

  COMBINE INSTRUCTION,
  COMBINE
  COMBINE ,
  COMBINE

where is one of DATA_STACK, RETURN_STACK, or a list of one
or more ROMs and is a list of one or more RAMs and/or ROMs. The first
configuration command reserves space at the end of the instruction memory for
the DATA_STACK, RETURN_STACK, or listed ROMs.

The SRAM_WIDTH configuration command is used to make the memory allocations more
efficient when the SRAM block width is more than 9 bits. For example,
Altera's Cyclone V family has 10-bit wide memory blocks and the configuration
command "SRAM_WIDTH 10" is appropriate. The configuration command
sequence

  INSTRUCTION 1024
  RETURN_STACK 32
  SRAM_WIDTH 10
  COMBINE INSTRUCTION,RETURN_STACK

will use a single 10-bit memory entry for each element of the return stack
instead of packing the 10-bit values into two memory entries of a 9-bit wide
memory.

The following illustrates a possible configuration for a Spartan-6 with a
2048-long SRAM and relatively large 64-deep data stack. The data stack will be
in the last 64 elements of the instruction memory and the instruction space will
be reduced to 1984 words.

  INSTRUCTION 2048
  DATA_STACK 64
  COMBINE INSTRUCTION,DATA_STACK

The following illustrates a possible configuration for a Cyclone-III with three
M9Ks for the instruction ROM and the data stack.

  INSTRUCTION 1024*3
  DATA_STACK 64
  COMBINE INSTRUCTION,DATA_STACK
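
The arithmetic behind these two configurations can be sketched in Python. The
function name is hypothetical, and the sketch assumes that each 8-bit data
stack element occupies exactly one word of the combined memory, which is what
the 2048 - 64 = 1984 figure above implies. The value for the "1024*3" case
follows from the same assumption rather than from a statement in this
document.

```python
def instruction_words(block_size, blocks=1, data_stack_depth=0):
    """Instruction words left after "COMBINE INSTRUCTION,DATA_STACK".

    Models "INSTRUCTION block_size*blocks" and "DATA_STACK data_stack_depth".
    """
    for n in (block_size, data_stack_depth):
        # Both N values must be powers of 2 (a depth of 0 means no stack).
        if n and (n & (n - 1)):
            raise ValueError("%d is not a power of 2" % n)
    total = block_size * blocks
    if data_stack_depth > total:
        raise ValueError("data stack does not fit in the instruction memory")
    return total - data_stack_depth


print(instruction_words(2048, 1, 64))   # Spartan-6 example: 1984
print(instruction_words(1024, 3, 64))   # Cyclone-III example: 3008
```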

WARNING: Some devices, such as Xilinx' Spartan-3A devices, do not support
asynchronous reads, so the COMBINE configuration command does not work for them.

WARNING: Xilinx XST does not correctly infer a Block RAM when the
"COMBINE INSTRUCTION,RETURN_STACK" configuration command is used and the
instruction space is 1024 instructions or larger. Xilinx is supposed to fix
this in a future release of Vivado so the fix will only apply to 7-series or
later FPGAs.


MEMORY
================================================================================

The MEMORY configuration command is used as follows to allocate a 128-byte RAM
named "myram" and to allocate a 32-byte ROM named "myrom". Zero to four
memories can be allocated, each with up to 256 bytes.

  MEMORY RAM myram 128
  MEMORY ROM myrom 32

The assembly code to lay out the memory uses the ".memory" directive to identify
the memory and the ".variable" directive to identify the symbol and its content.
Single or multiple values can be listed and "*N" can be used to identify a
repeat count.

  .memory RAM myram
  .variable a 0
  .variable b 0
  .variable c 0 0 0 0
  .variable d 0*4

  .memory ROM myrom
  .variable coeff_table 0x04
                        0x08
                        0x10
                        0x20
  .variable hello_world N"Hello World!\r\n"
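
One way to picture the resulting layout of "myram" is the Python sketch below.
It assumes, without confirmation from this document, that variables are placed
at consecutive byte addresses starting at 0 in declaration order; the function
name is hypothetical.

```python
def layout(variables):
    """Assign consecutive addresses to (name, values) pairs.

    Assumption: allocation starts at address 0 and follows declaration order.
    Returns the address of each variable and the number of bytes used.
    """
    addresses = {}
    next_address = 0
    for name, values in variables:
        addresses[name] = next_address
        next_address += len(values)
    return addresses, next_address


myram = [
    ("a", [0]),
    ("b", [0]),
    ("c", [0, 0, 0, 0]),
    ("d", [0] * 4),          # "0*4" is a repeat count of four zeros
]
addresses, used = layout(myram)
print(addresses, used)
```

Under that assumption "c" and "d" each occupy four bytes, so the four variables
use 10 of the 128 bytes allocated to "myram".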

Single values are fetched from or stored to memory using the following assembly:

  .fetchvalue(a)
  0x12 .storevalue(b)

Multi-byte values are fetched or stored as follows. This copies the four values
from coeff_table, which is stored in a ROM, to d.

  .fetchvector(coeff_table,4) .storevector(d,4)

The memory size is available using computed values (see below) and can be used
to clear the entire memory, etc.

The available single-cycle memory operation macros are:
  .fetch(mem_name)      replaces T with the value at the address T in the memory
                        mem_name
                        Note: .fetchram(var_name) is safer.
  .fetch+(mem_name)     pushes the value at address T in the memory mem_name
                        into the data stack below T and increments T
                        Note: This is useful for fetching successive values
                              from memory into the data stack.
                        Note: .fetchram+(var_name) is safer.
  .fetch-(mem_name)     similar to .fetch+ but decrements T
                        Note: .fetchram-(var_name) is safer.
  .store(ram_name)      stores N at address T in the RAM ram_name, also drops
                        the top of the data stack
                        Note: .storeram(var_name) is safer.
  .store+(ram_name)     stores N at address T in the RAM ram_name, also drops N
                        from the data stack and increments T
                        Note: .storeram+(var_name) is safer.
  .store-(ram_name)     similar to .store+ but decrements T
                        Note: .storeram-(var_name) is safer.

The following multi-cycle macros provide more generalized access to the
memories:
  .fetchindexed(var_name)
                        uses the top of the data stack as an index into var_name
                        Note: This is equivalent to the 3 instruction sequence
                              "var_name + .fetch(mem_name)"
  .fetchoffset(var_name,offset)
                        fetches the single-byte value of var_name offset by
                        "offset" bytes
                        Note: This is equivalent to
                              "${var_name+offset} .fetch(mem_name)"
  .fetchram(var_name)   is similar to the .fetch(mem_name) macro except that the
                        variable name is used to identify the memory instead of
                        the name of the memory
  .fetchram+(var_name)  is similar to the .fetch+(mem_name) macro except that
                        the variable name is used to identify the memory instead
                        of the name of the memory
  .fetchram-(var_name)  is similar to the .fetch-(mem_name) macro except that
                        the variable name is used to identify the memory instead
                        of the name of the memory
  .fetchvalue(var_name) fetches the single-byte value of var_name
                        Note: This is equivalent to "var_name .fetch(mem_name)"
                              where mem_name is the memory in which var_name is
                              stored.
  .fetchvalueoffset(var_name,offset)
                        fetches the single-byte value stored at var_name+offset
                        Note: This is equivalent to
                              "${var_name+offset} .fetch(mem_name)"
                              where mem_name is the memory in which var_name is
                              stored.
  .fetchvector(var_name,N)
                        fetches N values starting at var_name into the data
                        stack with the value at var_name at the top and the
                        value at var_name+N-1 deep in the stack.
                        Note: This is equivalent to the N+1 operation sequence
                              "${var_name+N-1} .fetch-(mem_name) ...
                              .fetch-(mem_name) .fetch(mem_name)"
                              where ".fetch-(mem_name)" is repeated N-1 times.
  .storeindexed(var_name)
                        uses the top of the data stack as an index into var_name
                        into which to store the next-to-top of the data stack.
                        Note: This is equivalent to the 4 instruction sequence
                              "var_name + .store(mem_name) drop".
                        Note: The default "drop" instruction can be overridden
                              by providing the optional second argument
                              similarly to the .storevalue macro.
  .storeoffset(var_name,offset)
                        stores the single-byte value at the top of the data
                        stack at var_name offset by "offset" bytes
                        Note: This is equivalent to
                              "${var_name+offset} .store(mem_name) drop"
                        Note: The optional third argument is as per the
                              optional second argument of .storevalue
  .storeram(var_name)   is similar to the .store(mem_name) macro except that the
                        variable name is used to identify the RAM instead of the
                        name of the RAM
  .storeram+(var_name)  is similar to the .store+(mem_name) macro except that
                        the variable name is used to identify the RAM instead of
                        the name of the RAM
  .storeram-(var_name)  is similar to the .store-(mem_name) macro except that
                        the variable name is used to identify the RAM instead of
                        the name of the RAM
  .storevalue(var_name) stores the single-byte value at the top of the data
                        stack at var_name
                        Note: This is equivalent to
                              "var_name .store(mem_name) drop"
                        Note: The default "drop" instruction can be replaced by
                              providing the optional second argument. For
                              example, the following instruction will store and
                              then decrement the value at the top of the data
                              stack:
                                .storevalue(var_name,1-)
  .storevector(var_name,N)
                        Does the reverse of the .fetchvector macro.
                        Note: This is equivalent to the N+2 operation sequence
                              "var_name .store+(mem_name) ... .store+(mem_name)
                              .store(mem_name) drop"
                              where ".store+(mem_name)" is repeated N-1 times.
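
The stack ordering produced by the vector macros is easiest to see by stepping
through the equivalent instruction sequences listed above. The Python sketch
below does that with plain lists; it is a model of the descriptions in this
section, not the macro source code, and the function names are hypothetical.
Stacks are written bottom first, so the last list element is T.

```python
def fetchvector(memory, address, count):
    """Model "${var+N-1} .fetch- ... .fetch- .fetch"; returns the data stack."""
    stack = [address + count - 1]        # ${var_name+N-1}
    for _ in range(count - 1):           # .fetch- repeated N-1 times
        t = stack.pop()
        stack.append(memory[t])          # fetched value goes below T
        stack.append(t - 1)              # T is decremented
    t = stack.pop()                      # final .fetch replaces T
    stack.append(memory[t])
    return stack


def storevector(memory, address, stack, count):
    """Model "var .store+ ... .store+ .store drop"; returns the data stack."""
    stack.append(address)                # push var_name
    for _ in range(count - 1):           # .store+ repeated N-1 times
        t = stack.pop()
        memory[t] = stack.pop()          # store N at address T and drop N
        stack.append(t + 1)              # T is incremented
    t = stack.pop()                      # .store stores N and drops the address
    memory[t] = stack[-1]
    stack.pop()                          # trailing "drop" removes the value
    return stack


rom = [0x04, 0x08, 0x10, 0x20]           # coeff_table from the example above
ram = [0, 0, 0, 0]                       # destination, like "d"
stack = fetchvector(rom, 0, 4)
print(stack)                             # value at var_name is last (on top)
storevector(ram, 0, stack, 4)
print(ram)
```

Fetching leaves the value stored at var_name on top of the stack, and storing
consumes the values in that same order, so a fetch followed by a store copies
the vector unchanged.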

The .fetchvector and .storevector macros are intended to work with values stored
MSB first in memory and with the MSB toward the top of the data stack,
similarly to the Forth language with multi-word values. To demonstrate how
this data structure works, consider the examples of decrementing and
incrementing a two-byte value on the data stack:

  ; Decrement a 2-byte value
  ;   swap 1- swap - decrement the LSB
  ;   over -1=     - puts -1 on the top of the data stack if the LSB rolled
  ;                  over from 0 to -1, puts 0 on the top otherwise
  ;   +            - decrements the MSB if the LSB rolled over
  ; ( u_LSB u_MSB - u_LSB' u_MSB' )
  .function decrement_2byte
    swap 1- swap over -1= .return(+)

  ; Increment a 2-byte value
  ;   swap 1+ swap - increment the LSB
  ;   over 0=      - puts -1 on the top of the data stack if the LSB rolled
  ;                  over from 0xFF to 0, puts 0 on the top otherwise
  ;   -            - increments the MSB if the LSB rolled over (by
  ;                  subtracting -1)
  ; ( u_LSB u_MSB - u_LSB' u_MSB' )
  .function increment_2byte
    swap 1+ swap over 0= .return(-)
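
The two functions above can be checked with a short Python model of the 8-bit
arithmetic. The sketch assumes that "-1=" and "0=" leave 0xFF (that is, -1) for
true and 0 for false, as the comments state, and that "-" computes N minus T.
The Python function names simply mirror the assembly ones.

```python
MASK = 0xFF  # the data stack is 8 bits wide


def decrement_2byte(lsb, msb):
    lsb = (lsb - 1) & MASK               # swap 1- swap
    flag = MASK if lsb == MASK else 0    # over -1=
    msb = (msb + flag) & MASK            # + (adding 0xFF is subtracting 1)
    return lsb, msb


def increment_2byte(lsb, msb):
    lsb = (lsb + 1) & MASK               # swap 1+ swap
    flag = MASK if lsb == 0 else 0       # over 0=
    msb = (msb - flag) & MASK            # - (subtracting 0xFF is adding 1)
    return lsb, msb


print(decrement_2byte(0x00, 0x01))       # 0x0100 - 1 -> (255, 0), i.e. 0x00FF
print(increment_2byte(0xFF, 0x00))       # 0x00FF + 1 -> (0, 1), i.e. 0x0100
```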


COMPUTED VALUES
================================================================================

Computed values can be pushed on the stack using a "${...}" where the "..." is
evaluated in Python and cannot have any spaces.

For example, a loop that should be run 5 times can be coded as:

  ${5-1} :loop ... .jumpc(loop,1-) drop

which is a clearer indication that the loop is to be run 5 times than is the
instruction sequence

  4 :loop ...
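
The reason the initial value is one less than the iteration count can be seen
in the following Python model of the loop. It assumes that ".jumpc" jumps when
the top of the data stack is non-zero and that the "1-" argument runs in place
of the default "drop", so the counter is tested before it is decremented. The
function name is hypothetical.

```python
def run_counted_loop(start):
    """Count body executions of "start :loop ... .jumpc(loop,1-) drop"."""
    t = start & 0xFF
    executions = 0
    while True:
        executions += 1          # the loop body "..."
        jump = t != 0            # jumpc tests T before "1-" modifies it
        t = (t - 1) & 0xFF       # "1-" replaces the default drop
        if not jump:
            break                # fall through to the final "drop"
    return executions


print(run_counted_loop(5 - 1))   # ${5-1} runs the body 5 times
```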

Constants can be accessed in the computation. For example, a block of memory
can be allocated as follows:

  .constant C_RESERVE
  .memory RAM myram
  ...
  .variable reserved 0*${C_RESERVE}

and the block of reserved memory can be cleared using the following loop:

  ${C_RESERVE-1} :loop 0 over .storeindexed(reserved) .jumpc(loop,1-) drop

The offsets of variables in their memory can also be accessed through a computed
value. The value of reserved could also be cleared as follows:

  ${reserved-1} ${C_RESERVE-1} :loop >r

  r> .jumpc(loop,-1) drop drop

The body of this version of the loop is the same length as the first version.
In general, it is better to use the memory macros to access variables as they
ensure the correct memory is accessed.

The sizes of memories can also be accessed using computed values. If "myram" is
a RAM, then "${size['myram']}" will push the size of "myram" on the stack. As
an example, the following code will clear the entire RAM:

  ${size['myram']-1} :loop 0 swap .jumpc(loop,.store-(myram)) drop

The lengths of I/O signals can also be accessed using computed values. If
"o_mask" is a mask, then "${size['o_mask']}" will push the size of the mask on
the stack and "${2**size['o_mask']-1}" will push a value that sets all the bits
of the mask. The I/O signals include I/O signals instantiated by peripherals.
For example, for the configuration command

  PERIPHERAL big_outport outport=O_BIG outsignal=o_big width=47

the width of the output signal is accessible using "${size['o_big']}". You can
set the wide signal to all zeroes using:

  ${(size['o_big']+7)/8-1} :loop 0 .outport(O_BIG) .jumpc(loop,1-) drop
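
The expression in this loop rounds the signal width up to a whole number of
8-bit writes. The Python sketch below reproduces that arithmetic for the 47-bit
example; the "size" dictionary stands in for the one the assembler provides and
the function name is hypothetical. The sketch uses "//" so that it behaves the
same under Python 2 and Python 3. Under Python 3 the "/" operator performs true
division and "(47+7)/8-1" evaluates to 5.75, so the expression as written
relies on integer division; which Python version ssbcc runs under is not stated
here.

```python
def outport_writes(width_bits):
    """Number of 8-bit writes needed to shift in a signal of this width."""
    return (width_bits + 7) // 8


size = {"o_big": 47}                       # stand-in for the assembler's table
loop_start = outport_writes(size["o_big"]) - 1
print(outport_writes(size["o_big"]), loop_start)   # prints: 6 5
```

A start value of 5 gives six passes through the loop, one per byte.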


MACROS
================================================================================
There are 3 types of macros used by the assembler.

The first kind of macros are built in to the assembler and are required to
encode instructions that have embedded values or have mandatory subsequent
instructions. These include function calls, jump instructions, function return,
and memory accesses as follows:
  .call(function,[op])
  .callc(function,[op])
  .fetch(ramName)
  .fetch+(ramName)
  .fetch-(ramName)
  .jump(label,[op])
  .jumpc(label,[op])
  .return([op])
  .store(ramName)
  .store+(ramName)
  .store-(ramName)

The second kind of macros are designed to ease access to input and output
operations and for memory accesses and to help ensure these operations are
correctly constructed. These are defined as Python scripts in the
core/9x8/macros directory and are automatically loaded into the assembler.
These macros are:
  .fetchindexed(variable)
  .fetchoffset(variable,ix)
  .fetchvalue(variableName)
  .fetchvector(variableName,N)
  .inport(I_name)
  .outport(O_name[,op])
  .outstrobe(O_name)
  .storeindexed(variableName[,op])
  .storeoffset(variableName,ix[,op])
  .storevalue(variableName[,op])
  .storevector(variableName,N)

The third kind of macro is user-defined macros. These macros must be registered
with the assembler using the ".macro" directive.

For example, the ".push32" macro is defined by macros/9x8/push32.py and can be
used to push 32-bit (4-byte) values onto the data stack as follows:

  .macro push32
  .constant C_X 0x87654321
  .main
    ...
    .push32(0x12345678)
    .push32(C_X)
    .push32(${0x12345678^C_X})
    ...

The following macros are provided in macros/9x8:
  .push16(v)    push the 16-bit (2-byte) value "v" onto the data stack with the
                MSB at the top of the data stack
  .push24(v)    push the 24-bit (3-byte) value "v" onto the data stack with the
                MSB at the top of the data stack
  .push32(v)    push the 32-bit (4-byte) value "v" onto the data stack with the
                MSB at the top of the data stack
  .pushByte(v,ix)
                push the ix'th byte of v onto the data stack
                Note: ix=0 designates the LSB
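
The byte ordering these macros describe can be sketched in Python as follows.
This only models which bytes end up where on the data stack; it does not use
the assembler's macro interface, and the function names are hypothetical.

```python
def push_byte(value, ix):
    """Byte ix of value, where ix=0 designates the LSB (like .pushByte)."""
    return (value >> (8 * ix)) & 0xFF


def push_order(value, n_bytes):
    """Bytes in the order they must be pushed so the MSB ends up on top."""
    return [push_byte(value, ix) for ix in range(n_bytes)]


stack = push_order(0x12345678, 4)          # like .push32(0x12345678)
print(["0x%02X" % b for b in stack])       # last element is the top of stack
```

Pushing the LSB first leaves the MSB at the top of the data stack, which is the
same ordering used by the .fetchvector and .storevector macros.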

Directories are searched in the following order for macros:
  .
  ./macros
  include paths specified by the '-M' command line option.
  macros/9x8

The Python scripts in core/9x8/macros and macros/9x8 can be used as design
examples for user-defined macros. The assembler does some type checking based
on the list provided when the macro is registered by the "AddMacro" method, but
additional type checking is often warranted by the macro "emitFunction" which
emits the actual assembly code. The ".fetchvector" and ".storevector" macros
demonstrate how to design variable-length macros. Several macros in
core/9x8/macros illustrate designing macros with optional arguments.

It is not an error to repeat the ".macro MACRO_NAME" directive for user-defined
macros. The assembler will issue a fatal error if a user-defined macro
conflicts with a built-in macro.


CONDITIONAL COMPILATION
================================================================================
The computer compiler and assembler recognize conditional compilation as
follows: .IFDEF, .IFNDEF, .ELSE, and .ENDIF can be used in the architecture
file and they can be used to conditionally include functions, files, etc. within
the assembly code; .ifdef, .ifndef, .else, and .endif can be used in function
bodies, variable bodies, etc. to conditionally include assembly code, symbols,
or data. Conditionals cannot cross file boundaries.

The computer compiler examines the list of defined symbols such as I/O ports,
I/O signals, etc. to evaluate the true/false condition associated with the
.IFDEF and .IFNDEF commands. The "-D" option to the computer compiler is
provided to define symbols for enabling conditionally compiled configuration
commands. Similarly, the assembler examines the list of I/O ports, I/O signals,
parameters, constants, etc. to evaluate the .IFDEF, .IFNDEF, .ifdef, and .ifndef
conditionals.

For example, a diagnostic UART can be conditionally included using the
configuration commands:

  .IFDEF ENABLE_UART
  PORTCOMMENT Diagnostic UART
  PERIPHERAL UART_Tx outport=O_UART_TX ...
  .ENDIF

And the assembly code can include conditional code fragments such as the
following, where the existence of the output port is used to determine whether
or not to send a character to that output port:

  .ifdef(O_UART_TX) 'A' .outport(O_UART_TX) .endif

Invoking the computer compiler with "-D ENABLE_UART" will generate a module with
the UART peripheral and will enable the conditional code sending the 'A'
character to the UART port.
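
The effect of "-D ENABLE_UART" on the architecture file can be pictured with
the following Python sketch of line-oriented conditional filtering. It is a
simplified model and not the ssbcc implementation: it handles only the
upper-case architecture-file form, the function name is hypothetical, and the
"INPORT" line is just filler borrowed from an earlier example.

```python
def filter_conditionals(lines, defined):
    """Keep only lines in active .IFDEF/.IFNDEF/.ELSE/.ENDIF regions."""
    output = []
    active = [True]    # one "is this level emitting?" flag per nesting level
    for line in lines:
        words = line.split()
        keyword = words[0] if words else ""
        if keyword in (".IFDEF", ".IFNDEF"):
            condition = words[1] in defined
            if keyword == ".IFNDEF":
                condition = not condition
            # A nested block is active only if its parent is active too.
            active.append(active[-1] and condition)
        elif keyword == ".ELSE":
            current = active.pop()
            active.append(active[-1] and not current)
        elif keyword == ".ENDIF":
            active.pop()
        elif active[-1]:
            output.append(line)
    return output


architecture = [
    ".IFDEF ENABLE_UART",
    "PORTCOMMENT Diagnostic UART",
    ".ENDIF",
    "INPORT set-reset i_timer I_TIMER",
]
print(filter_conditionals(architecture, set()))
print(filter_conditionals(architecture, {"ENABLE_UART"}))
```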
|
1188 |
|
|
|
1189 |
|
|
The following code can be used to preclude multiple attempted inclusions of an
|
1190 |
|
|
assembly library file.
|
1191 |
|
|
|
1192 |
|
|
; put these two lines near the top of the file
|
1193 |
|
|
.IFNDEF C_FILENAME_INCLUDED
|
1194 |
|
|
.constant C_FILENAME_INCLUDED 1
|
1195 |
|
|
; put the library body here
|
1196 |
|
|
...
|
1197 |
|
|
; put this line at the bottom of the file
|
1198 |
|
|
.ENDIF ; .IFNDEF C_FILENAME_INCLUDED
|
1199 |
|
|
|
1200 |
|
|
The ".INCLUDE" configuration command can be used to read configuration commands
|
1201 |
|
|
from additional sources.
|
1202 |
|
|
|
1203 |
|
|
|
1204 |
|
|
SIMULATIONS
|
1205 |
|
|
================================================================================
|
1206 |
|
|
|
1207 |
|
|
Simulations have been performed with Icarus Verilog, Verilator, and Xilinx'
|
1208 |
|
|
ISIM. Icarus Verilog is good for short, simple simulations and is used for the
|
1209 |
|
|
core and peripheral test benches; Verilator for long simulations of large,
|
1210 |
|
|
complex systems; and ISIM when Xilinx-specific cores are used. Verilator is
|
1211 |
|
|
the fastest simulators I've encountered. Verilator is also used for lint
|
1212 |
|
|
checking in the core test benches.
|
1213 |
|
|
|
1214 |
|
|
|
1215 |
|
|
MEM INITIALIZATION FILE
|
1216 |
|
|
================================================================================
|
1217 |
|
|
|
1218 |
|
|
A memory initialization file is produced during compilation. This file can be
|
1219 |
|
|
used with tools such as Xilinx' data2mem to modify the SRAM contents without
|
1220 |
|
|
having to rebuild the entire system. It is restricted to the opcode memory
|
1221 |
|
|
initialization. The file must be processed before it can be used by specific
|
1222 |
|
|
tools, see doc/MemoryInitialization.html.
|
1223 |
|
|
|
1224 |
|
|
WARNING: The values of parameters used in the assembly code must match the
|
1225 |
|
|
instantiated design.
|
1226 |
|
|
|
1227 |
|
|
|
1228 |
|
|
THEORY OF OPERATION
|
1229 |
|
|
================================================================================

Registers are used for the top of the data stack, "T", and the next-to-top of
the data stack, "N". The rest of the data stack is held in a separate memory.
This means that the "DATA_STACK N" configuration command actually allows N+2
values in the data stack, since the T and N registers are not stored in the
N-element deep stack memory.
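
The N+2 capacity can be illustrated with a small Python model. This is only a
sketch of the arrangement described above, not the generated HDL: the class
name, the method names, and the overflow exception are assumptions made for
illustration.

```python
class DataStack:
    """Toy model: T and N are registers, deeper values live in a memory."""

    def __init__(self, depth):
        self.depth = depth  # the count given to "DATA_STACK" (assumed meaning)
        self.t = None       # top-of-stack register "T"
        self.n = None       # next-to-top register "N"
        self.mem = []       # separate stack memory, at most `depth` entries
        self.count = 0      # number of valid values currently held

    def push(self, value):
        if self.count >= 2:
            # N is about to be overwritten, so spill it to the memory first.
            if len(self.mem) == self.depth:
                raise OverflowError("data stack memory is full")
            self.mem.append(self.n)
        if self.count >= 1:
            self.n = self.t
        self.t = value
        self.count += 1

    def pop(self):
        if self.count == 0:
            raise IndexError("data stack is empty")
        value = self.t
        self.t = self.n
        self.n = self.mem.pop() if self.mem else None
        self.count -= 1
        return value


stack = DataStack(depth=4)
for i in range(6):      # 4 memory entries plus the T and N registers
    stack.push(i)
print(stack.count)
```

In this model a stack configured with a depth of 4 accepts six pushes before
the memory is exhausted. The exception is purely a modelling convenience.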

The return stack is similar in that "R" is the top of the return stack and the
"RETURN_STACK N" configuration command allocates an additional N words of
memory. The width of the return stack is the wider of the 8-bit data width and
the program counter width.
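
The width rule can be written as a one-line calculation. The Python sketch
below is illustrative only; in particular, deriving the program counter width
from the number of instruction addresses is an assumption and is not stated in
this document.

```python
def pc_width_for(n_instructions):
    """Bits needed to address n_instructions locations (assumed derivation)."""
    return max(1, (n_instructions - 1).bit_length())


def return_stack_width(pc_width, data_width=8):
    """The return stack must hold either a data value or a return address."""
    return max(data_width, pc_width)


print(return_stack_width(pc_width_for(1024)))  # 10-bit PC, wider than 8 bits
print(return_stack_width(pc_width_for(128)))   # 7-bit PC, data width wins
```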

The program counter is always either incremented by 1 or is set to an address
as controlled by jump, jumpc, call, callc, and return instructions. The
registered program counter is used to read the next opcode from the instruction
memory and this opcode is also registered in the memory. This means that there
is a 1 clock cycle delay between the address changing and the associated
instruction being performed. This is also part of the architecture required to
have the processor operate at one instruction per clock cycle.
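
The one-cycle delay can be seen in a toy cycle-by-cycle model with two
registers, one for the program counter and one for the opcode read from
memory. This is a simplified Python sketch, not the actual core: the tuple
encoding of instructions, the function name, and the nop reset value are all
assumptions.

```python
NOP = ("op", "nop")


def run(program, cycles):
    """Return a list of (pc, executing_opcode) pairs, one per clock cycle."""
    pc = 0        # registered program counter
    opcode = NOP  # registered opcode (assumed to reset to a nop)
    trace = []
    for _ in range(cycles):
        trace.append((pc, opcode))
        # The executing opcode selects the next program counter value.
        next_pc = opcode[1] if opcode[0] == "jump" else pc + 1
        # Clock edge: both registers update from the pre-edge values, so the
        # opcode register receives the instruction at the old address.
        opcode = program[pc] if pc < len(program) else NOP
        pc = next_pc
    return trace


program = [("op", "a"), ("jump", 4), ("op", "b"),
           ("op", "c"), ("op", "d"), ("op", "e")]
for pc, opcode in run(program, 6):
    print(pc, opcode)
```

In this model the program counter becomes 4 on the cycle after the jump
executes, and the instruction stored at address 4 executes one cycle after
that. The instruction already read from the address following the jump still
executes in between.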

Separate ALUs are used for the program counter, adders, logical operations,
etc., and MUXes are used to select the values desired for the destination
registers. Instruction execution consists of translating the 6 msb of the
opcode into MUX settings and performing opcode-dependent ALU operations as
controlled by the 3 lsb of the opcode (during the first half of the clock
cycle), and then setting T, N, R, the memories, etc. as controlled by the
computed MUX settings.
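
The split of the 9-bit opcode into its two fields can be expressed as a pair of
shift-and-mask operations. The Python sketch below only shows the bit
arithmetic; the function and field names are hypothetical and the example bit
pattern is arbitrary, not a real instruction encoding.

```python
def split_opcode(opcode):
    """Split a 9-bit opcode into (6 msb, 3 lsb)."""
    if not 0 <= opcode < (1 << 9):
        raise ValueError("opcode must fit in 9 bits")
    mux_field = (opcode >> 3) & 0x3F  # 6 msb: translated into MUX settings
    alu_field = opcode & 0x07         # 3 lsb: selects the ALU operation
    return mux_field, alu_field


mux_field, alu_field = split_opcode(0b101010_011)
print(f"{mux_field:06b} {alu_field:03b}")
```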

The "core.v" file is the code for these operations. Within this file there are
several "@xxx@" strings that specify where the computer compiler is to insert
code such as I/O declarations, memories, inport interpretation, outport
generation, peripherals, etc.
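
The general idea of replacing "@xxx@" markers with generated code can be
sketched in a few lines of Python. This is not the computer compiler's actual
implementation: the function name, the placeholder names IO_PORTS and MEMORIES,
and the generated Verilog fragments are all made up for illustration.

```python
import re

# Placeholders are an identifier between two "@" characters, so ordinary
# Verilog such as "always @(posedge i_clk)" is left untouched.
PLACEHOLDER = re.compile(r"@([A-Za-z_][A-Za-z0-9_]*)@")


def expand_template(template, generators):
    """Replace each @NAME@ marker with the text returned by generators[NAME]."""
    def replace(match):
        name = match.group(1)
        if name not in generators:
            raise KeyError(f"no generator for placeholder @{name}@")
        return generators[name]()
    return PLACEHOLDER.sub(replace, template)


template = "module demo(\n@IO_PORTS@\n);\n@MEMORIES@\nendmodule\n"
generators = {
    "IO_PORTS": lambda: "  input wire i_clk",
    "MEMORIES": lambda: "  reg [8:0] opcode_mem[0:255];",
}
print(expand_template(template, generators))
```

Raising an error for an unknown marker, rather than leaving it in place, is a
design choice of this sketch so that a misspelled placeholder cannot silently
reach the output file.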

The file structure, i.e., putting the core and the assembler in "core/9x8",
should facilitate application-specific modification of the processor. For
example, the store+, store-, fetch+, and fetch- instructions could be replaced
with additional stack manipulation operations, arithmetic operations with
2-byte results, etc. Simply copy the "9x8" directory to something like
"9x8_XXX" and make your modifications in that directory. The 8-bit peripherals
should still work, but the 9x8 library functions may need rework to accommodate
the modifications.


MISCELLANEOUS
================================================================================

Features and peripherals are still being added and the documentation is
incomplete. The output HDL is currently restricted to Verilog although a VHDL
package file is automatically generated by the computer compiler.

The "INVERT_RESET" configuration command indicates that an active-low reset,
rather than an active-high reset, is input to the micro controller.

A VHDL package file is automatically generated by the computer compiler.