SSBCC.9x8 is a free Small Stack-Based Computer Compiler with a 9-bit opcode,
8-bit data core designed to facilitate FPGA HDL development.

The primary design criteria are:
- high speed (to avoid timing issues)
- low fabric utilization
- vendor independent
- development tools available for all operating systems

It has been used in Spartan-3A, Spartan-6, Virtex-6, and Artix-7 FPGAs and has
been built for Altera, Lattice, and other Xilinx devices. It is faster and
usually smaller than vendor-provided processors.

The compiler takes an architecture file that describes the micro controller
memory spaces, inputs and outputs, and peripherals, and that specifies the HDL
language and source assembly. It generates a single HDL module implementing
the entire micro controller. No user-written HDL is required to instantiate
I/Os, program memory, etc.

The features are:
- high speed, low fabric utilization
- vendor-independent Verilog output with a VHDL package file
- simple Forth-like assembly language (41 instructions)
- single cycle instruction execution
- automatic generation of I/O ports
- configurable instruction, data stack, return stack, and memory utilization
- extensible set of peripherals (I2C buses, UARTs, AXI4-Lite buses, etc.)
- extensible set of macros
- memory initialization file to facilitate code development without rebuilds
- simulation diagnostics to facilitate identifying code errors
- conditionally included I/Os and peripherals, functions, and assembly code

SSBCC has been used for the following projects:
- operate a media translator from a parallel camera interface to an OMAP GPMC
  interface, detect and report bus errors and hardware errors, and act as an
  SPI slave to the OMAP
- operate two UART interfaces and multiple PWM-controlled 2-lead bi-color LEDs
- operate and monitor the Artix-7 fabric in a Zynq system using AXI4-Lite
  master and slave buses and I2C buses for timing-critical voltage measurements

The only external tool required is Python 2.7.


DESCRIPTION
================================================================================

The computer compiler uses an architectural description of the processor stating
the sizes of the instruction memory, data stack, and return stack; the input and
output ports; RAM and ROM types and sizes; and peripherals.

The instructions are all single-cycle. The instructions include:
- 4 arithmetic instructions: addition, subtraction, increment, and decrement
- 3 bit-wise logical instructions: and, or, and exclusive or
- 7 shift and rotation instructions: <<0, <<1, <<msb, 0>>, 1>>, lsb>>, and msb>>
- 4 logical instructions: 0=, 0<>, -1=, -1<>
- 6 Forth-like data stack instructions: drop, dup, nip, over, push, swap
- 3 Forth-like return stack instructions: >r, r>, r@
- 2 input and output
- 6 memory read and write with optional address post increment and post decrement
- 2 jump and conditional jump
- 2 call and conditional call
- 1 function return
- 1 nop

The 9x8 address space is up to 8K. This is achieved by pushing the 8 lsb of the
target address onto the data stack immediately before the jump or call
instruction and by encoding the 5 msb of the address within the jump or call
instruction. The instruction immediately following a jump, call, or return is
executed before the instruction sequence at the destination address is executed
(this is illustrated later).

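The address split described above can be sketched in Python (the language the
compiler itself is written in). This is only an illustration: the opcode base
values are read off the memory initialization listing in the EXAMPLE section,
and placing the 5 msb in the low bits of the jump or call opcode is an
assumption, since every address in that listing has zero msb. The function and
table names are hypothetical.

```python
# Opcode bases as they appear in the EXAMPLE memory initialization listing.
# Putting the 5 address msb in the low 5 bits of the opcode is an ASSUMPTION.
PUSH = 0x100
BASES = {"jump": 0x080, "jumpc": 0x0A0, "call": 0x0C0}


def encode_transfer(kind, address):
    """Return the (push, transfer) 9-bit opcode pair for a 13-bit address."""
    if not 0 <= address < 8192:
        raise ValueError("address must fit in 13 bits (8K instruction space)")
    lsb8 = address & 0xFF         # pushed onto the data stack first
    msb5 = (address >> 8) & 0x1F  # carried inside the jump/call opcode
    return PUSH | lsb8, BASES[kind] | msb5


if __name__ == "__main__":
    for kind, addr in (("call", 0x00D), ("jumpc", 0x01A), ("jump", 0x001)):
        push, op = encode_transfer(kind, addr)
        print("%-5s 0x%03X -> 9'h%03X 9'h%03X" % (kind, addr, push, op))
```

The three sample addresses are the targets used in the LED flasher listing, so
the printed pairs can be compared against it by eye.
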
Up to four banks of memory, either RAM or ROM, are available. Each of these can
be up to 256 bytes long, providing a total of up to 1 kB of memory.

The assembly language is Forth-like. Built-in macros are used to encode the
jump and call instructions and to encode the 2-bit memory bank index in memory
store and fetch instructions.

The computer compiler and assembler are written in Python 2.7. Peripherals are
implemented by Python modules which generate the I/O ports and the peripheral
HDL.

The computer compiler is documented in the doc directory. The 9x8 core is
documented in the core/9x8/doc directory. Several examples are provided.

The computer compiler and assembler are fully functional and there are no known
bugs.


SPEED AND RESOURCE UTILIZATION
================================================================================
These device speed and resource utilization results are copied from the build
tests. The full results are listed in core/9x8/build/uc/uc_led.9x8, which
represents a minimal processor implementation (clock, reset, and one output).
See the uc_peripherals.9x8 file for results for a more complicated
implementation. Device-specific scripts state how these performance numbers
were obtained.

VENDOR  DEVICE          BEST SPEED  SMALLEST RESOURCE UTILIZATION
------  ------          ----------  -------------------------------
Altera  Cyclone-III     190.6 MHz   282 LEs (preliminary)
Altera  Cyclone-IV      192.1 MHz   281 LEs (preliminary)
Altera  Stratix-V       372.9 MHz   198 ALUTs (preliminary)
Lattice LCMXO2-640ZE-3   98.4 MHz   206 LUTs (preliminary)
Lattice LFE2-6E-7       157.9 MHz   203 LUTs (preliminary)
Xilinx  Spartan-3A      148.3 MHz   130 slices, 231 4-input LUTs
Xilinx  Spartan-6       200.0 MHz    36 slices, 120 Slice LUTs
Xilinx  Virtex-6        275.7 MHz    38 slices, 122 Slice LUTs (p.)

Disclaimer: As with other embedded processors, these are maximum performance
claims. Realistic implementations will produce slower maximum clock rates,
particularly with lots of I/O ports and peripherals and with the constraint of
coexisting with other subsystems in the FPGA fabric. What these performance
numbers do provide is an estimate of the amount of slack available. For
example, you can't realistically expect to get 110 MHz from a processor that,
under ideal conditions, places and routes at 125 MHz, but you can with a
processor that synthesizes at 150 MHz.


EXAMPLE:
================================================================================

The LED flasher example demonstrates the simplicity of the architectural
specification and the Forth-like assembly language.

The architecture file, named "led.9x8", with the comments and user header
removed, is as follows:

  ARCHITECTURE core/9x8 Verilog

  INSTRUCTION 2048
  RETURN_STACK 32
  DATA_STACK 32

  PORTCOMMENT LED on/off signal
  OUTPORT 1-bit o_led O_LED

  ASSEMBLY led.s

The ARCHITECTURE configuration command specifies the 9x8 core and the Verilog
language. The INSTRUCTION, RETURN_STACK, and DATA_STACK configuration commands
specify the sizes of the instruction space, return stack, and data stack. The
content of the PORTCOMMENT configuration command is inserted in the module
declaration -- this facilitates identifying signals in micro controllers with a
lot of inputs and outputs. The single OUTPORT statement specifies a 1-bit
signal named "o_led". This signal is accessed in the assembly code through the
symbol "O_LED". The ASSEMBLY command specifies the single input file "led.s",
which is listed below. The output module will be "led.v".

The "led.s" assembly file is as follows:

  ; Consume 256*5+4 clock cycles.
  ; ( - )
  .function pause
    0 :inner 1- dup .jumpc(inner) drop
  .return

  ; Repeat "pause" 256 times.
  ; ( - )
  .function repause
    0 :inner .call(pause) 1- dup .jumpc(inner) drop
  .return

  ; main program (as an infinite loop)
  .main
    0 :inner 1 ^ dup .outport(O_LED) .call(repause) .jump(inner)

This example is coded in a traditional Forth structure with the conditional
jumps consuming the top of the data stack. Examining the "pause" function, the
".function" directive specifies the start of a function and the function name.
The "0" instruction pushes the value "0" onto the top of the data stack.
":inner" is a label for a jump instruction. The "1-" instruction decrements the
top of the data stack. "dup" is the Forth instruction to push a duplicate of
the top of the data stack onto the data stack. The ".jumpc(inner)" macro
expands to three instructions as follows: (1) push the 8 lsb of the address at
"inner" onto the data stack, (2) the conditional jump instruction with the 5 msb
of the address of "inner" (the jumpc instruction also drops the top of the data
stack with its partial address), and (3) a "drop" instruction to drop the
duplicated loop count from the top of the data stack. Finally, the "drop"
instruction drops the loop count from the top of the data stack and the
".return" macro generates the "return" instruction and a "nop" instruction.

The function "repause" calls the "pause" function 256 times. The main program
body is identified by the directive ".main". This function runs an infinite loop
that toggles the lsb of the LED output, outputs the LED setting, and calls the
"repause" function.

A tighter version of the loop in the "pause" function can be written as

  ; Consume 256*3+3 clock cycles.
  ; ( - )
  .function pause
    0xFF :inner .jumpc(inner,1-) .return(drop)

which is 3 cycles long for each iteration. The "drop" that is normally part
of the ".jumpc" macro has been replaced by the decrement instruction, and the
final "drop" instruction has replaced the default "nop" instruction that is
normally part of the ".return" macro. Note that the decrement is performed
after the non-zero comparison in the "jumpc" instruction.

A version of the "pause" function that consumes exactly 1000 clock cycles is:

  .function pause
    ${(1000-4)/4-1} :inner nop .jumpc(inner,1-) drop .return

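The cycle counts quoted for the three "pause" variants can be checked with a
small Python model of the 8-bit countdown loop. This is a sketch, not part of
SSBCC: the function name and its parameters are hypothetical, and it only
counts one clock per instruction as stated in the DESCRIPTION section.

```python
def loop_cycles(initial, body_len, test_before_decrement, overhead):
    """Count clock cycles for an 8-bit countdown loop.

    initial               -- value pushed before the loop label
    body_len              -- instructions executed per iteration
    test_before_decrement -- True when jumpc examines the counter before the
                             "1-" in its delay slot, False when "1- dup"
                             precedes the jumpc
    overhead              -- one-time instructions outside the loop
    """
    count = initial & 0xFF
    cycles = overhead
    while True:
        cycles += body_len
        if test_before_decrement:
            taken = count != 0
            count = (count - 1) & 0xFF
        else:
            count = (count - 1) & 0xFF
            taken = count != 0
        if not taken:
            return cycles


if __name__ == "__main__":
    # 0 :inner 1- dup .jumpc(inner) drop .return
    print(loop_cycles(0, 5, False, 4))
    # 0xFF :inner .jumpc(inner,1-) .return(drop)
    print(loop_cycles(0xFF, 3, True, 3))
    # ${(1000-4)/4-1} :inner nop .jumpc(inner,1-) drop .return
    print(loop_cycles((1000 - 4) // 4 - 1, 4, True, 4))
```

Integer division is written as `//` here so the sketch also runs under
Python 3; the assembler expression itself is left as the document states it.
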
The instruction memory initialization for the processor module includes the
instruction mnemonics being performed at each address and replaces the "list"
file output from traditional assemblers. The following is the memory
initialization for this LED flasher example. The main program always starts at
address zero and functions are included in the order encountered. Unused
library functions are not included in the generated instruction list.

  reg [8:0] s_opcodeMemory[2047:0];
  initial begin
    // .main
    s_opcodeMemory['h000] = 9'h100; // 0x00
    s_opcodeMemory['h001] = 9'h101; // :inner 0x01
    s_opcodeMemory['h002] = 9'h052; // ^
    s_opcodeMemory['h003] = 9'h008; // dup
    s_opcodeMemory['h004] = 9'h100; // O_LED
    s_opcodeMemory['h005] = 9'h038; // outport
    s_opcodeMemory['h006] = 9'h054; // drop
    s_opcodeMemory['h007] = 9'h10D; //
    s_opcodeMemory['h008] = 9'h0C0; // call repause
    s_opcodeMemory['h009] = 9'h000; // nop
    s_opcodeMemory['h00A] = 9'h101; //
    s_opcodeMemory['h00B] = 9'h080; // jump inner
    s_opcodeMemory['h00C] = 9'h000; // nop
    // repause
    s_opcodeMemory['h00D] = 9'h100; // 0x00
    s_opcodeMemory['h00E] = 9'h119; // :inner
    s_opcodeMemory['h00F] = 9'h0C0; // call pause
    s_opcodeMemory['h010] = 9'h000; // nop
    s_opcodeMemory['h011] = 9'h05C; // 1-
    s_opcodeMemory['h012] = 9'h008; // dup
    s_opcodeMemory['h013] = 9'h10E; //
    s_opcodeMemory['h014] = 9'h0A0; // jumpc inner
    s_opcodeMemory['h015] = 9'h054; // drop
    s_opcodeMemory['h016] = 9'h054; // drop
    s_opcodeMemory['h017] = 9'h028; // return
    s_opcodeMemory['h018] = 9'h000; // nop
    // pause
    s_opcodeMemory['h019] = 9'h100; // 0x00
    s_opcodeMemory['h01A] = 9'h05C; // :inner 1-
    s_opcodeMemory['h01B] = 9'h008; // dup
    s_opcodeMemory['h01C] = 9'h11A; //
    s_opcodeMemory['h01D] = 9'h0A0; // jumpc inner
    s_opcodeMemory['h01E] = 9'h054; // drop
    s_opcodeMemory['h01F] = 9'h054; // drop
    s_opcodeMemory['h020] = 9'h028; // return
    s_opcodeMemory['h021] = 9'h000; // nop
    s_opcodeMemory['h022] = 9'h000;
    s_opcodeMemory['h023] = 9'h000;
    s_opcodeMemory['h024] = 9'h000;
    ...
    s_opcodeMemory['h7FF] = 9'h000;
  end


DATA and STRINGS
================================================================================

Values are pushed onto the data stack by stating the value. For example,

  0x10 0x20 'x'

will successively push the values 0x10, 0x20, and the character 'x' onto the
data stack. The character 'x' will be at the top of the data stack after these
3 instructions.

Numeric values can be represented in binary, octal, decimal, and hex. Binary
values start with the two characters "0b" followed by a sequence of binary
digits; octal numbers start with a "0" followed by a sequence of octal digits;
decimal values can start with a "+" or "-", have a non-zero first digit, and
have zero or more further decimal digits; and hex values start with the two
characters "0x" followed by a sequence of hex digits.

Examples of equivalent numeric values are:
  binary:   0b01  0b10010
  octal:    01    022
  decimal:  1     18
  hex:      0x1   0x12

See the COMPUTED VALUES section for using computed values in the assembler.

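The literal rules above can be sketched as a small Python parser. This is an
illustration of the stated rules, not the assembler's actual code: the function
name is hypothetical, accepting an upper-case "0X" prefix and a lone "0" as
zero are assumptions.

```python
import re


def parse_literal(text):
    """Return the integer value of a numeric literal written as described."""
    if re.fullmatch(r"0b[01]+", text):
        return int(text[2:], 2)        # binary: "0b" + binary digits
    if re.fullmatch(r"0[xX][0-9a-fA-F]+", text):
        return int(text[2:], 16)       # hex: "0x" + hex digits
    if re.fullmatch(r"0[0-7]+", text):
        return int(text[1:], 8)        # octal: "0" + octal digits
    if re.fullmatch(r"[+-]?[1-9][0-9]*", text):
        return int(text, 10)           # decimal: optional sign, non-zero lead
    if text == "0":
        return 0                       # ASSUMPTION: a lone "0" means zero
    raise ValueError("unrecognized numeric literal: %r" % text)


if __name__ == "__main__":
    for group in (("0b01", "01", "1", "0x1"), ("0b10010", "022", "18", "0x12")):
        print([parse_literal(item) for item in group])
```

Each printed list holds four equal values, matching the two columns of
equivalent literals in the table above.
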
There are four ways to specify strings in the assembler. Simply stating the
string

  "Hello World!"

puts the characters in the string onto the data stack with the letter 'H' at the
top of the data stack. I.e., the individual push operations are

  '!' 'd' 'l' ... 'e' 'H'

Prepending an 'N' to the double quote, like

  N"Hello World!"

puts a null-terminated string onto the data stack. I.e., the value under the
'!' will be a 0x00 and the instruction sequence would be

  0x0 '!' 'd' 'l' ... 'e' 'H'

Forth uses counted strings, which are specified here as

  C"Hello World!"

In this case the number of characters, 12, in the string is pushed onto the data
stack after the 'H', i.e., the instruction sequence would be

  '!' 'd' 'l' ... 'e' 'H' 12

Finally, a lesser-counted string specified like

  c"Hello World!"

is similar to the Forth-like counted string except that the value pushed onto
the data stack is one less than the number of characters in the string. Here
the value pushed onto the data stack after the 'H' would be 11 instead of 12.

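The push order for the four string forms can be summarized with a short Python
sketch. The function name and the use of an empty prefix for the plain form
are hypothetical; the ordering itself follows the sequences listed above, and
escape sequences are not handled.

```python
def string_pushes(prefix, text):
    """Return the values pushed for a string literal, first push first.

    prefix is "" (plain), "N" (null-terminated), "C" (counted) or
    "c" (lesser-counted).  The last element ends up on top of the stack.
    """
    if prefix not in ("", "N", "C", "c"):
        raise ValueError("unknown string prefix: %r" % prefix)
    pushes = []
    if prefix == "N":
        pushes.append(0)              # terminator sits under the last char
    pushes.extend(reversed(text))     # last character first, first char last
    if prefix == "C":
        pushes.append(len(text))      # character count on top
    elif prefix == "c":
        pushes.append(len(text) - 1)  # one less than the character count
    return pushes


if __name__ == "__main__":
    for prefix in ("", "N", "C", "c"):
        print(repr(prefix), string_pushes(prefix, "Hello World!"))
```
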
Simple strings are useful for constructing more complex strings in conjunction
with other string functions. For example, to transmit the hex values of the
top 2 values in the data stack, do something like:

  ; move the top 2 values to the return stack
  >r >r
  ; push the tail of the message onto the data stack
  N"\n\r"
  ; convert the 2 values to 2-digit hex values, LSB deepest in the stack
  r> .call(string_byte_to_hex)
  r> .call(string_byte_to_hex)
  ; prepend the identification message
  "Message: "
  ; transmit the string, using the null terminator to terminate the loop
  :loop_transmit .outport(O_UART_TX) .jumpc(loop_transmit,nop) drop

A lesser-counted string would be used like:

  c"Status Message\r\n"
  :loop_msg swap .outport(O_UART_TX) .jumpc(loop_msg,1-) drop

These four string formats can also be used for variable definitions. For
example, 3 variables could be allocated and initialized as follows:

  .memory ROM myrom
  .variable fred N"fred"
  .variable joe c"joe"
  .variable moe "moe"

These are equivalent to

  .variable fred 'f' 'r' 'e' 'd' 0
  .variable joe 2 'j' 'o' 'e'
  .variable moe 'm' 'o' 'e'

with 5 bytes allocated for the variable fred, 4 bytes for joe, and 3 bytes for
moe.

The following escaped characters are recognized:

  '\0'     null character
  '\a'     bell
  '\b'     backspace
  '\f'     form feed
  '\n'     line feed
  '\r'     carriage return
  '\t'     horizontal tab
  "\0ooo"  3-digit octal value
  "\xXX"   2-digit hex value where X is one of 0-9, a-f, or A-F
  "\Xxx"   alternate form for 2-digit hex value
  "\\"     backslash character

Unrecognized escaped characters are simply treated as that character. For
example, '\m' is treated as the single character 'm' and '\'' is treated as the
single quote character.


INSTRUCTIONS
================================================================================

The 41 instructions are as follows (see core/9x8/doc/opcodes.html for detailed
descriptions). Here, T is the top of the data stack, N is the next-to-top of
the data stack, and R is the top of the return stack. All of these are the
values at the start of the instruction.

The nop instruction does nothing:

  nop             no operation

Mathematical operations drop one value from the data stack and replace the new
top with the stated value:

  &               bitwise and of N and T
  +               N + T
  -               N - T
  ^               bitwise exclusive or of N and T
  or              bitwise or of N and T

Increment and decrement replace the top of the data stack with the stated
result.

  1+              replace T with T+1
  1-              replace T with T-1

Comparison operations replace the top of the data stack with the results of the
comparison:

  -1<>            replace T with -1 if T != -1, otherwise set T to 0
  -1=             replace T with 0 if T != -1, otherwise leave T as -1
  0<>             replace T with -1 if T != 0, otherwise leave T as 0
  0=              replace T with -1 if T == 0, otherwise set T to 0

Shift/rotate operations replace the top of the data stack with the result of the
specified shift/rotate.

  0>>             shift T right 1 bit and set the msb to 0
  1>>             shift T right 1 bit and set the msb to 1
  <<0             shift T left 1 bit and set the lsb to 0
  <<1             shift T left 1 bit and set the lsb to 1
  <<msb           rotate T left 1 bit
  lsb>>           rotate T right 1 bit
  msb>>           shift T right 1 bit and set the msb to the old msb

Note: There is no "<<lsb" instruction.

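The comparison and shift/rotate descriptions above can be modeled on 8-bit
values with a few lines of Python. This is a behavioral sketch only: the
function names are hypothetical, -1 is represented as 0xFF, and treating
"<<msb" as a left rotate is an assumption made by symmetry with "lsb>>".

```python
MASK = 0xFF  # the data core is 8 bits wide


def cmp_op(name, t):
    """Comparison instructions: 0xFF (-1) when true, 0x00 when false."""
    tests = {
        "0=": t == 0x00,
        "0<>": t != 0x00,
        "-1=": t == 0xFF,
        "-1<>": t != 0xFF,
    }
    return 0xFF if tests[name] else 0x00


def shift_op(name, t):
    """Shift/rotate instructions applied to the 8-bit value t."""
    msb, lsb = (t >> 7) & 1, t & 1
    if name == "<<0":
        return (t << 1) & MASK
    if name == "<<1":
        return ((t << 1) | 1) & MASK
    if name == "<<msb":                  # ASSUMPTION: rotate left
        return ((t << 1) | msb) & MASK
    if name == "0>>":
        return t >> 1
    if name == "1>>":
        return (t >> 1) | 0x80
    if name == "lsb>>":                  # rotate right
        return (t >> 1) | (lsb << 7)
    if name == "msb>>":                  # keep the old msb
        return (t >> 1) | (msb << 7)
    raise KeyError(name)


if __name__ == "__main__":
    for name in ("<<0", "<<1", "<<msb", "0>>", "1>>", "lsb>>", "msb>>"):
        print("%-6s 0x81 -> 0x%02X" % (name, shift_op(name, 0x81)))
```
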
Stack manipulation instructions are as follows:

  >r              push T onto the return stack and drop T from the data stack
  drop            drop T from the data stack
  dup             push T onto the data stack
  nip             drop N from the data stack
  over            push N onto the data stack
  push            push a single byte onto the data stack, see the preceding DATA
                  and STRINGS section
  r>              push R onto the data stack and drop R from the return stack
  r@              push R onto the data stack
  swap            swap N and T

Jump and call and their conditional variants are as follows and must use the
associated macro:

  call            call instruction -- use the .call macro
  callc           conditional call instruction -- use the .callc macro
  jump            jump instruction -- use the .jump macro
  jumpc           conditional jump instruction -- use the .jumpc macro
  return          return instruction -- use the .return macro

See the MEMORY section for details for these memory operations. T is the
address for the instructions, N is the value stored. Chained fetches insert the
value below T. Chained stores drop N.

  fetch           memory fetch, replace T with the value fetched
  fetch+          chained memory fetch, retain and increment the address
  fetch-          chained memory fetch, retain and decrement the address
  store           memory store, drop T (N is the next value of T)
  store+          chained memory store, retain and increment the address
  store-          chained memory store, retain and decrement the address

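The stack effects of the plain and chained memory operations can be sketched in
Python as follows. This is one reading of the descriptions above, not the
core's implementation: the function names are hypothetical, a single 256-byte
bank stands in for memory, the bank index is ignored, and the address is
assumed to wrap at 8 bits. The top of the data stack is the end of the list.

```python
def fetch(stack, mem):
    """fetch: replace T (the address) with the value fetched."""
    addr = stack.pop()
    stack.append(mem[addr])


def fetch_inc(stack, mem):
    """fetch+: insert the value below T, keep and increment the address."""
    addr = stack.pop()
    stack.append(mem[addr])
    stack.append((addr + 1) & 0xFF)


def store(stack, mem):
    """store: write N to address T and drop T, leaving N as the new T."""
    addr = stack.pop()
    mem[addr] = stack[-1]


def store_inc(stack, mem):
    """store+: write N to address T, drop N, keep and increment the address."""
    addr = stack.pop()
    mem[addr] = stack.pop()
    stack.append((addr + 1) & 0xFF)


if __name__ == "__main__":
    mem = bytearray(256)
    stack = [0x41, 0x42, 0x10]       # two values, then the start address
    store_inc(stack, mem)
    store_inc(stack, mem)
    print(stack, list(mem[0x10:0x12]))
```

The decrementing forms would differ only in subtracting one from the address.
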
See the INPORT and OUTPORT section for details for the input and output port
operations:

  inport          input port operation
  outport         output port operation

The .call, .callc, .jump, and .jumpc macros encode the 3 instructions required
to perform a call or jump along with the subsequent instructions. The default
third instruction is "nop" for .call and .jump and it is "drop" for .callc and
.jumpc. The default can be changed by specifying the optional second argument.
The .call and .callc macros must specify a function identified by the .function
directive and the .jump and .jumpc macros must specify a label.

The .function directive takes the name of the function and the function body.
Function bodies must end with a .return or a .jump macro. The .main directive
defines the body of the main function, i.e., the function at which the processor
starts.

The .include directive is used to read additional assembly code. You can, for
example, put the main function in uc.s, define constants and such in consts.s,
define the memories and variables in myram.s, and include UART utilities in
uart.s. These files could be included in uc.s through the following lines:

  .include consts.s
  .include myram.s
  .include uart.s

The assembler only includes functions that can be reached from the main
function. Unused functions will not consume instruction space.


INPORT and OUTPORT
================================================================================

The INPORT and OUTPORT configuration commands are used to specify 2-state inputs
and outputs. For example

  INPORT 8-bit i_value I_VALUE

specifies a single 8-bit input signal named "i_value" for the module. The port
is accessed in assembly by ".inport(I_VALUE)" which is equivalent to the
two-instruction sequence "I_VALUE inport". To input an 8-bit value from a FIFO
and send a single-clock-cycle wide acknowledgment strobe, use

  INPORT 8-bit,strobe i_fifo,o_fifo_ack I_FIFO

The assembly ".inport(I_FIFO)" will automatically send an acknowledgment strobe
to the FIFO through "o_fifo_ack".

A write port to an 8-bit FIFO is similarly specified by

  OUTPORT 8-bit,strobe o_fifo,o_fifo_wr O_FIFO

The assembly ".outport(O_FIFO)", which is equivalent to "O_FIFO outport drop",
will automatically send a write strobe to the FIFO through "o_fifo_wr".

Multiple signals can be packed into a single input or output port by defining
them in comma-separated lists. The associated bit masks can be defined
coincident with the port definition as follows:

  INPORT 1-bit,1-bit i_fifo_full,i_fifo_empty I_FIFO_STATUS
  CONSTANT C_FIFO_STATUS__FULL 0x02
  CONSTANT C_FIFO_STATUS__EMPTY 0x01

Checking the "full" status of the FIFO can be done by the following assembly
sequence:

  .inport(I_FIFO_STATUS) C_FIFO_STATUS__FULL &

Multiple bits can be masked using a computed value as follows (see below for
more details):

  .inport(I_FIFO_STATUS) ${C_FIFO_STATUS__FULL|C_FIFO_STATUS__EMPTY} &

The "${...}" creates an instruction to push the 8-bit value in the braces onto
the data stack. The computation is performed using the Python "eval" function
in the context of the program constants, memory addresses, and memory sizes.

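Since the computation is described as a Python "eval" over the program's
symbols, the mask above can be reproduced with a minimal sketch. The function
name, the symbol dictionary, the removal of builtins, and the range check are
all assumptions for illustration; the assembler's real context also holds
memory addresses and sizes.

```python
# Symbol table standing in for the CONSTANT definitions shown earlier.
CONSTANTS = {"C_FIFO_STATUS__FULL": 0x02, "C_FIFO_STATUS__EMPTY": 0x01}


def computed_value(expression, symbols):
    """Evaluate the text inside ${...} and return the 8-bit value to push."""
    # Builtins are removed so only the supplied symbols are visible.
    value = eval(expression, {"__builtins__": {}}, dict(symbols))
    if not -128 <= value <= 255:
        # ASSUMPTION: values that do not fit in one byte are rejected.
        raise ValueError("computed value does not fit in 8 bits: %r" % value)
    return value & 0xFF


if __name__ == "__main__":
    mask = computed_value("C_FIFO_STATUS__FULL|C_FIFO_STATUS__EMPTY", CONSTANTS)
    print("0x%02X" % mask)
```
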
Preceding all of these by

  PORTCOMMENT external FIFO

produces the following in the Verilog module statement. The I/O ports are
listed in the order in which they are declared.

  // external FIFO
  input  wire [7:0] i_fifo,
  output reg        o_fifo_ack,
  output reg  [7:0] o_fifo,
  output reg        o_fifo_wr,
  input  wire       i_fifo_full,
  input  wire       i_fifo_empty

The HDL to implement the inputs and outputs is computer generated. Identifying
the port name in the architecture file eliminates the possibility of
inconsistent port numbers between the HDL and the assembly. Specifying the bit
mapping for the assembly code immediately after the port definition helps
prevent inconsistencies between the port definition and the bit mapping in the
assembly code.

The normal initial value for an outport is zero. This can be changed by
including an optional initial value as follows. This initial value will be
applied on system startup and when the micro controller is reset.

  OUTPORT 4-bit=4'hA o_signal O_SIGNAL

An isolated output strobe can also be created using:

  OUTPORT strobe o_strobe O_STROBE

The assembly ".outstrobe(O_STROBE)", which is equivalent to "O_STROBE outport",
is used to generate the strobe. Since "O_STROBE" is a strobe-only outport, the
".outport" macro cannot be used with it. Similarly, the ".outstrobe" macro
generates an error if it is invoked with an outport that does have data.

A single-bit "set-reset" input port type is also included. This sets a register
when an external strobe is received and clears the register when the port is
read. For example, to capture an external timer for a polled loop, include the
following in the architecture file:

  PORTCOMMENT external timer
  INPORT set-reset i_timer I_TIMER

The following is the assembly code to conditionally call two functions when the
timer event is encountered:

  .inport(I_TIMER)
  .callc(timer_event_1,nop)
  .callc(timer_event_2)

The "nop" in the first conditional call prevents the conditional from being
dropped from the data stack so that it can be used by the subsequent conditional
function call.


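The "set-reset" behavior described above (set on an external strobe, cleared
when read) can be pictured with a small Python model. This is a behavioral
sketch with hypothetical names, not the generated HDL; it assumes the port
reads back as 1 when set and does not model a strobe arriving in the same
clock cycle as a read.

```python
class SetResetInport(object):
    """Single-bit register set by an external strobe and cleared on read."""

    def __init__(self):
        self._flag = 0

    def strobe(self):
        """External event, e.g. a timer tick: latch the flag."""
        self._flag = 1

    def read(self):
        """What the inport would return: the flag, which is then cleared."""
        value, self._flag = self._flag, 0
        return value


if __name__ == "__main__":
    timer = SetResetInport()
    timer.strobe()
    timer.strobe()          # a second strobe before the read is not counted
    print(timer.read())     # event pending
    print(timer.read())     # already consumed by the first read
```

The model shows why a polled loop sees each event at most once: the first read
after a strobe reports it and later reads report nothing until the next strobe.
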
PERIPHERAL
================================================================================

Peripherals are implemented via Python modules. For example, an open drain I/O
signal, such as is required for an I2C bus, does not fit the INPORT and OUTPORT
functionality. Instead, an "open_drain" peripheral is provided by the Python
script in "core/9x8/peripherals/open_drain.py". This puts a tri-state I/O in
the module statement, allows it to be read through an "inport" instruction, and
allows it to be set low or released through an "outport" instruction. An I2C
bus with separate SCL and SDA ports can then be incorporated into the processor
as follows:

  PORTCOMMENT I2C bus
  PERIPHERAL open_drain inport=I_SCL \
                        outport=O_SCL \
                        iosignal=io_scl
  PERIPHERAL open_drain inport=I_SDA \
                        outport=O_SDA \
                        iosignal=io_sda

| 616 |
|
|
The default width for this peripheral is 1 bit. The module statement will then
|
| 617 |
|
|
include the lines
|
| 618 |
|
|
|
| 619 |
|
|
// I2C bus
|
| 620 |
|
|
inout wire io_scl,
|
| 621 |
|
|
inout wire io_sda
|
| 622 |
|
|
|
| 623 |
|
|
The assembly code to set the io_scl signal low is "0 .outport(O_SCL)" and to
|
| 624 |
|
|
release it is "1 .outport(O_SCL)". These instruction sequences are actually
|
| 625 |
|
|
"0 O_SCL outport drop" and "1 O_SCL outport drop" respectively. The "outport"
|
| 626 |
|
|
instruction drops the top of the data stack (which contained the port number)
|
| 627 |
|
|
and sends the next-to-the-top of the data stack to the designated output port.
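
The stack effect of "0 O_SCL outport drop" can be modeled in a few lines of
Python. This is a sketch that follows the description above, not the
generated HDL; the port number 3 and the helper names are hypothetical.

```python
# Hypothetical port number; real port numbers are assigned by the compiler.
O_SCL = 3


def outport(stack, ports):
    """Drop the port number (T) and send the new top (old N) to that port."""
    port = stack.pop()
    ports[port] = stack[-1]   # the value stays on the data stack


def drop(stack):
    """Discard the top of the data stack."""
    stack.pop()


def run_outport_macro(value, port):
    """Model '<value> <port> outport drop', i.e. '<value> .outport(<port>)'."""
    stack, ports = [], {}
    stack.append(value)
    stack.append(port)
    outport(stack, ports)
    drop(stack)
    return stack, ports


stack, ports = run_outport_macro(0, O_SCL)
print(stack, ports)
```

The trailing "drop" is what removes the written value, so the data stack ends
up where it started.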
|
| 628 |
|
|
|
| 629 |
|
|
Two examples of I2C device operation are included in the examples directory.
|
| 630 |
|
|
|
| 631 |
|
|
The following peripherals are provided:
|
| 632 |
|
|
adder_16bit 16-bit adder/subtractor
|
| 633 |
|
|
AXI4_Lite_Master
|
| 634 |
|
|
32-bit read/write AXI4-Lite Master
|
| 635 |
|
|
Note: The synchronous version has been tested on hardware.
|
| 636 |
|
|
AXI4_Lite_Slave_DualPortRAM
|
| 637 |
|
|
dual-port-RAM interface for the micro controller to act as an
|
| 638 |
|
|
AXI4-Lite slave
|
| 639 |
|
|
big_inport shift reads from a single INPORT to construct a wide input
|
| 640 |
|
|
big_outport shift writes to a single OUTPORT to construct a wide output
|
| 641 |
|
|
counter counter for the number of high cycles received from a signal
|
| 642 |
|
|
inFIFO_async input FIFO with an asynchronous write clock
|
| 643 |
|
|
latch latch wide inputs for sampling
|
| 644 |
|
|
monitor_stack simulation diagnostic (see below)
|
| 645 |
|
|
open_drain for software-implemented I2C buses or similar
|
| 646 |
|
|
outFIFO_async output FIFO with an asynchronous read clock
|
| 647 |
|
|
PWM_8bit PWM generator with an 8-bit control
|
| 648 |
|
|
timer timing for polled loops or similar
|
| 649 |
|
|
trace simulation diagnostic (see below)
|
| 650 |
|
|
UART bidirectional UART
|
| 651 |
|
|
UART_Rx receive UART
|
| 652 |
|
|
UART_Tx transmit UART
|
| 653 |
3 |
sinclairrf |
wide_strobe 1 to 8 bit strobe generator
|
| 654 |
2 |
sinclairrf |
|
| 655 |
|
|
The following command illustrates how to display the help message for
|
| 656 |
|
|
peripherals:
|
| 657 |
|
|
|
| 658 |
|
|
echo "ARCHITECTURE core/9x8 Verilog" | ssbcc -P "big_inport help" - | less
|
| 659 |
|
|
|
| 660 |
|
|
User-defined peripherals can be in the same directory as the architecture file
|
| 661 |
|
|
or a subdirectory named "peripherals".
|
| 662 |
|
|
|
| 663 |
|
|
|
| 664 |
|
|
PARAMETER and LOCALPARAM
|
| 665 |
|
|
================================================================================
|
| 666 |
|
|
|
| 667 |
|
|
Parameters are incorporated through the PARAMETER and LOCALPARAM configuration
|
| 668 |
|
|
commands. For example, the clock frequency in hertz is needed for UARTs for
|
| 669 |
|
|
their baud rate generator. The configuration command
|
| 670 |
|
|
|
| 671 |
|
|
PARAMETER G_CLK_FREQ_HZ 97_000_000
|
| 672 |
|
|
|
| 673 |
|
|
specifies the clock frequency as 97 MHz. The HDL instantiating the processor
|
| 674 |
|
|
can change this specification. The frequency can also be changed through the
|
| 675 |
|
|
command-line invocation of the computer compiler. For example,
|
| 676 |
|
|
|
| 677 |
|
|
ssbcc -G "G_CLK_FREQ_HZ=100_000_000" myprogram.9x8
|
| 678 |
|
|
|
| 679 |
|
|
specifies that a frequency of 100 MHz be used instead of the default frequency
|
| 680 |
|
|
of 97 MHz.
|
| 681 |
|
|
|
| 682 |
|
|
The LOCALPARAM configuration command can be used to specify parameters that
|
| 683 |
|
|
should not be changed by the surrounding HDL. For example,
|
| 684 |
|
|
|
| 685 |
|
|
LOCALPARAM L_VERSION 24'h00_00_00
|
| 686 |
|
|
|
| 687 |
|
|
specifies a 24-bit parameter named "L_VERSION". The 8-bit major, minor, and
|
| 688 |
|
|
build sections of the parameter can be accessed in an assembly program using
|
| 689 |
|
|
"L_VERSION[16+:8]", "L_VERSION[8+:8]", and "L_VERSION[0+:8]".
|
| 690 |
|
|
|
| 691 |
|
|
For both parameters and localparams, the default range is "[0+:8]". The
|
| 692 |
|
|
instruction memory is initialized using the parameter value during synthesis,
|
| 693 |
|
|
not the value used to initialize the parameter. That is, the instruction memory
|
| 694 |
|
|
initialization will be:
|
| 695 |
|
|
|
| 696 |
|
|
s_opcodeMemory[...] = { 1'b1, L_VERSION[16+:8] };
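
The "[16+:8]" style range is a Verilog indexed part-select: 8 bits starting at
bit 16. The arithmetic can be sketched in Python as follows; the helper name
is hypothetical and the version value is chosen only for illustration.

```python
def part_select(value, lsb, width):
    """Model the Verilog indexed part-select value[lsb+:width]."""
    return (value >> lsb) & ((1 << width) - 1)


L_VERSION = 0x010403  # illustrative value, equivalent to 24'h01_04_03

major = part_select(L_VERSION, 16, 8)
minor = part_select(L_VERSION, 8, 8)
build = part_select(L_VERSION, 0, 8)
print(major, minor, build)
```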
|
| 697 |
|
|
|
| 698 |
|
|
The value of the localparam can be set when the computer compiler is run using
|
| 699 |
|
|
the "-G" option. For example,
|
| 700 |
|
|
|
| 701 |
|
|
ssbcc -G "L_VERSION=24'h01_04_03" myprogram.9x8
|
| 702 |
|
|
|
| 703 |
|
|
can be used in a makefile to set the version number for a release without
|
| 704 |
|
|
modifying the micro controller architecture file.
|
| 705 |
|
|
|
| 706 |
|
|
|
| 707 |
|
|
DIAGNOSTICS AND DEBUGGING
|
| 708 |
|
|
================================================================================
|
| 709 |
|
|
|
| 710 |
|
|
A 3-character, human readable version of the opcode can be included in
|
| 711 |
|
|
simulation waveform outputs by adding "--display-opcode" to the ssbcc command.
|
| 712 |
|
|
|
| 713 |
|
|
The stack health can be monitored during simulation by including the
|
| 714 |
|
|
"monitor_stack" peripheral through the command line. For example, the LED
|
| 715 |
|
|
flasher example can be generated using
|
| 716 |
|
|
|
| 717 |
|
|
ssbcc -P monitor_stack led.9x8
|
| 718 |
|
|
|
| 719 |
|
|
This allows the architecture file to be unchanged between simulation and an FPGA
|
| 720 |
|
|
build.
|
| 721 |
|
|
|
| 722 |
|
|
Stack errors include underflow and overflow, malformed data validity, and
|
| 723 |
|
|
incorrect use of the values on the return stack (returns to data values and data
|
| 724 |
|
|
operations on return addresses). Other errors include out-of-range for memory,
|
| 725 |
|
|
inport, and outport operations.
|
| 726 |
|
|
|
| 727 |
|
|
When stack errors are detected the last 50 instructions are dumped to the
|
| 728 |
|
|
console and the simulation terminates. The dump includes the PC, numeric
|
| 729 |
|
|
opcode, textual representation of the opcode, data stack pointer, next-to-top of
|
| 730 |
|
|
the data stack, top of the data stack, top of the return stack, and the return
|
| 731 |
|
|
stack pointer. Invalid stack values are displayed as "XX". The length of the
|
| 732 |
|
|
history dumped is configurable.
|
| 733 |
|
|
|
| 734 |
|
|
Out-of-range PC checks are also performed if the instruction space is not a
|
| 735 |
|
|
power of 2.
|
| 736 |
|
|
|
| 737 |
|
|
A "trace" peripheral is also provided that dumps the entire execution history.
|
| 738 |
|
|
This was used to validate the processor core.
|
| 739 |
|
|
|
| 740 |
|
|
|
| 741 |
|
|
MEMORY ARCHITECTURE
|
| 742 |
|
|
================================================================================
|
| 743 |
|
|
|
| 744 |
|
|
The DATA_STACK, RETURN_STACK, INSTRUCTION, and MEMORY configuration commands
|
| 745 |
|
|
allocate memory for the data stack, return stack, instruction ROM, and memory
|
| 746 |
|
|
RAM and ROM respectively. The data stack, return stack, and memories are
|
| 747 |
|
|
normally instantiated as dual-port LUT-based memories with asynchronous reads
|
| 748 |
|
|
while the instruction memory is always instantiated with a synchronous read
|
| 749 |
|
|
architecture.
|
| 750 |
|
|
|
| 751 |
|
|
The COMBINE configuration command is used to coalesce memories and to convert
|
| 752 |
|
|
LUT-based memories to synchronous SRAM-based memories. For example, the large
|
| 753 |
|
|
SRAMs in modern FPGAs are ideal for storing the instruction opcodes and their
|
| 754 |
|
|
dual-ported access allows either the data stack or the return stack to be
|
| 755 |
|
|
stored in a relatively small region at the end of the large instruction memory.
|
| 756 |
|
|
Memories, which require dual-ported operation, can also be instantiated in
|
| 757 |
|
|
large RAMs either individually or in combination with each other. Conversion
|
| 758 |
|
|
to SRAM-based memories is also useful for FPGA architectures that do not have
|
| 759 |
|
|
efficient LUT-based memories.
|
| 760 |
|
|
|
| 761 |
|
|
The INSTRUCTION configuration command allocates memory for the processor instruction
|
| 762 |
|
|
space. It has the form "INSTRUCTION N" or "INSTRUCTION N*M" where N must be a
|
| 763 |
|
|
power of 2. The first form is used if the desired instruction memory size is a
|
| 764 |
|
|
power of 2. The second form is used to allocate M memory blocks of size N
|
| 765 |
|
|
where M is not a power of 2. For example, on an Altera Cyclone III, the
|
| 766 |
|
|
configuration command "INSTRUCTION 1024*3" allocates three M9Ks for the
|
| 767 |
|
|
instruction space, saving one M9K as compared to the configuration command
|
| 768 |
|
|
"INSTRUCTION 4096".
|
| 769 |
|
|
|
| 770 |
|
|
The DATA_STACK configuration command allocates memory for the data stack. It
|
| 771 |
|
|
has the form "DATA_STACK N" where N is the commanded size of the data stack.
|
| 772 |
|
|
N must be a power of 2.
|
| 773 |
|
|
|
| 774 |
|
|
The RETURN_STACK configuration command allocates memory for the return stack and
|
| 775 |
|
|
has the same format as the DATA_STACK configuration command.
|
| 776 |
|
|
|
| 777 |
|
|
The MEMORY configuration command is used to define one to four memories, either
|
| 778 |
|
|
RAM or ROM, with up to 256 bytes each. If no MEMORY configuration command is
|
| 779 |
|
|
issued, then no memories are allocated for the processor. The MEMORY
|
| 780 |
|
|
configuration command has the format "MEMORY {RAM|ROM} name N" where
|
| 781 |
|
|
"{RAM|ROM}" specifies either a RAM or a ROM, name is the name of the memory and
|
| 782 |
|
|
must start with an alphabetic character, and the size of the memory, N, must be
|
| 783 |
|
|
a power of 2. For example, "MEMORY RAM myram 64" allocates 64 bytes of memory
|
| 784 |
|
|
to form a RAM named myram. Similarly, "MEMORY ROM lut 256" defines a 256 byte
|
| 785 |
|
|
ROM named lut. More details on using memories are provided in the next section.
|
| 786 |
|
|
|
| 787 |
|
|
The COMBINE configuration command can be used to combine the various memories
|
| 788 |
|
|
for more efficient processor implementation as follows:
|
| 789 |
|
|
|
| 790 |
|
|
COMBINE INSTRUCTION,
|
| 791 |
|
|
COMBINE
|
| 792 |
|
|
COMBINE ,
|
| 793 |
|
|
COMBINE
|
| 794 |
|
|
|
| 795 |
|
|
where is one of DATA_STACK, RETURN_STACK, or a list of one
|
| 796 |
|
|
or more ROMs and is a list of one or more RAMs and/or ROMs. The first
|
| 797 |
|
|
configuration command reserves space at the end of the instruction memory for
|
| 798 |
|
|
the DATA_STACK, RETURN_STACK, or listed ROMs.
|
| 799 |
|
|
|
| 800 |
|
|
The SRAM_WIDTH configuration command is used to make the memory allocations more
|
| 801 |
|
|
efficient when the SRAM block width is more than 9 bits. For example,
|
| 802 |
|
|
Altera's Cyclone V family has 10-bit wide memory blocks and the configuration
|
| 803 |
|
|
command "SRAM_WIDTH 10" is appropriate. The configuration command
|
| 804 |
|
|
sequence
|
| 805 |
|
|
|
| 806 |
|
|
INSTRUCTION 1024
|
| 807 |
|
|
RETURN_STACK 32
|
| 808 |
|
|
SRAM_WIDTH 10
|
| 809 |
|
|
COMBINE INSTRUCTION,RETURN_STACK
|
| 810 |
|
|
|
| 811 |
|
|
will use a single 10-bit memory entry for each element of the return stack
|
| 812 |
|
|
instead of packing the 10-bit values into two memory entries of a 9-bit wide
|
| 813 |
|
|
memory.
|
| 814 |
|
|
|
| 815 |
|
|
The following illustrates a possible configuration for a Spartan-6 with a
|
| 816 |
|
|
2048-long SRAM and relatively large 64-deep data stack. The data stack will be
|
| 817 |
|
|
in the last 64 elements of the instruction memory and the instruction space will
|
| 818 |
|
|
be reduced to 1984 words.
|
| 819 |
|
|
|
| 820 |
|
|
INSTRUCTION 2048
|
| 821 |
|
|
DATA_STACK 64
|
| 822 |
|
|
COMBINE INSTRUCTION,DATA_STACK
|
| 823 |
|
|
|
| 824 |
|
|
The following illustrates a possible configuration for a Cyclone-III with three
|
| 825 |
|
|
M9Ks for the instruction ROM and the data stack.
|
| 826 |
|
|
|
| 827 |
|
|
INSTRUCTION 1024*3
|
| 828 |
|
|
DATA_STACK 64
|
| 829 |
|
|
COMBINE INSTRUCTION,DATA_STACK
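
The sizes in the two configurations above follow from simple arithmetic,
sketched here in Python. The function names are hypothetical, and the sketch
assumes that each data stack entry takes one word of the instruction memory,
as in the Spartan-6 example where 2048 words shrink to 1984.

```python
def is_power_of_2(n):
    return n > 0 and (n & (n - 1)) == 0


def instruction_words(n, m=1):
    """Total words allocated by "INSTRUCTION N" (m=1) or "INSTRUCTION N*M"."""
    if not is_power_of_2(n):
        raise ValueError("N must be a power of 2")
    return n * m


def words_left_for_code(n, m, data_stack):
    """Instruction words remaining after "COMBINE INSTRUCTION,DATA_STACK"."""
    if not is_power_of_2(data_stack):
        raise ValueError("DATA_STACK size must be a power of 2")
    return instruction_words(n, m) - data_stack


spartan6 = words_left_for_code(2048, 1, 64)   # INSTRUCTION 2048
cyclone3 = words_left_for_code(1024, 3, 64)   # INSTRUCTION 1024*3
print(spartan6, cyclone3)
```

Under that assumption the Cyclone-III configuration leaves 3008 words for
instructions.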
|
| 830 |
|
|
|
| 831 |
|
|
WARNING: Some devices, such as Xilinx' Spartan-3A devices, do not support
|
| 832 |
|
|
asynchronous reads, so the COMBINE configuration command does not work for them.
|
| 833 |
|
|
|
| 834 |
|
|
WARNING: Xilinx XST does not correctly infer a Block RAM when the
|
| 835 |
|
|
"COMBINE INSTRUCTION,RETURN_STACK" configuration command is used and the
|
| 836 |
|
|
instruction space is 1024 instructions or larger. Xilinx is supposed to fix
|
| 837 |
|
|
this in a future release of Vivado so the fix will only apply to 7-series or
|
| 838 |
|
|
later FPGAs.
|
| 839 |
|
|
|
| 840 |
|
|
|
| 841 |
|
|
MEMORY
|
| 842 |
|
|
================================================================================
|
| 843 |
|
|
|
| 844 |
|
|
The MEMORY configuration command is used as follows to allocate a 128-byte RAM
|
| 845 |
|
|
named "myram" and to allocate a 32-byte ROM named "myrom". Zero to four
|
| 846 |
|
|
memories can be allocated, each with up to 256 bytes.
|
| 847 |
|
|
|
| 848 |
|
|
MEMORY RAM myram 128
|
| 849 |
|
|
MEMORY ROM myrom 32
|
| 850 |
|
|
|
| 851 |
|
|
The assembly code to lay out the memory uses the ".memory" directive to identify
|
| 852 |
|
|
the memory and the ".variable" directive to identify the symbol and its content.
|
| 853 |
|
|
Single or multiple values can be listed and "*N" can be used to identify a
|
| 854 |
|
|
repeat count.
|
| 855 |
|
|
|
| 856 |
|
|
.memory RAM myram
|
| 857 |
|
|
.variable a 0
|
| 858 |
|
|
.variable b 0
|
| 859 |
|
|
.variable c 0 0 0 0
|
| 860 |
|
|
.variable d 0*4
|
| 861 |
|
|
|
| 862 |
|
|
.memory ROM myrom
|
| 863 |
|
|
.variable coeff_table 0x04
|
| 864 |
|
|
0x08
|
| 865 |
|
|
0x10
|
| 866 |
|
|
0x20
|
| 867 |
|
|
.variable hello_world N"Hello World!\r\n"
|
| 868 |
|
|
|
| 869 |
|
|
Single values are fetched from or stored to memory using the following assembly:
|
| 870 |
|
|
|
| 871 |
|
|
.fetchvalue(a)
|
| 872 |
|
|
0x12 .storevalue(b)
|
| 873 |
|
|
|
| 874 |
|
|
Multi-byte values are fetched or stored as follows. This copies the four values
|
| 875 |
|
|
from coeff_table, which is stored in a ROM, to d.
|
| 876 |
|
|
|
| 877 |
|
|
.fetchvector(coeff_table,4) .storevector(d,4)
|
| 878 |
|
|
|
| 879 |
|
|
The memory size is available using computed values (see below) and can be used
|
| 880 |
|
|
to clear the entire memory, etc.
|
| 881 |
|
|
|
| 882 |
|
|
The available single-cycle memory operation macros are:
|
| 883 |
|
|
.fetch(mem_name) replaces T with the value at the address T in the memory
|
| 884 |
|
|
mem_name
|
| 885 |
5 |
sinclairrf |
Note: .fetchram(var_name) is safer.
|
| 886 |
2 |
sinclairrf |
.fetch+(mem_name) pushes the value at address T in the memory mem_name
|
| 887 |
|
|
into the data stack below T and increments T
|
| 888 |
|
|
Note: This is useful for fetching successive values
|
| 889 |
|
|
from memory into the data stack.
|
| 890 |
5 |
sinclairrf |
Note: .fetchram+(var_name) is safer.
|
| 891 |
2 |
sinclairrf |
.fetch-(mem_name) similar to .fetch+ but decrements T
|
| 892 |
5 |
sinclairrf |
Note: .fetchram-(var_name) is safer.
|
| 893 |
2 |
sinclairrf |
.store(ram_name) stores N at address T in the RAM ram_name, also drops
|
| 894 |
|
|
the top of the data stack
|
| 895 |
5 |
sinclairrf |
Note: .storeram(var_name) is safer.
|
| 896 |
2 |
sinclairrf |
.store+(ram_name) stores N at address T in the RAM ram_name, also drops N
|
| 897 |
|
|
from the data stack and increments T
|
| 898 |
5 |
sinclairrf |
Note: .storeram+(var_name) is safer.
|
| 899 |
2 |
sinclairrf |
.store-(ram_name) similar to .store+ but decrements T
|
| 900 |
5 |
sinclairrf |
Note: .storeram-(var_name) is safer.
|
| 901 |
2 |
sinclairrf |
|
| 902 |
|
|
The following multi-cycle macros provide more generalized access to the
|
| 903 |
|
|
memories:
|
| 904 |
|
|
.fetchindexed(var_name)
|
| 905 |
|
|
uses the top of the data stack as an index into var_name
|
| 906 |
|
|
Note: This is equivalent to the 3 instruction sequence
|
| 907 |
|
|
"var_name + .fetch(mem_name)"
|
| 908 |
|
|
.fetchoffset(var_name,offset)
|
| 909 |
|
|
fetches the single-byte value of var_name offset by
|
| 910 |
|
|
"offset" bytes
|
| 911 |
|
|
Note: This is equivalent to
|
| 912 |
|
|
"${var_name+offset} .fetch(mem_name)"
|
| 913 |
5 |
sinclairrf |
.fetchram(var_name) is similar to the .fetch(mem_name) macro except that the
|
| 914 |
|
|
variable name is used to identify the memory instead of
|
| 915 |
|
|
the name of the memory
|
| 916 |
|
|
.fetchram+(var_name) is similar to the .fetch+(mem_name) macro except that
|
| 917 |
|
|
the variable name is used to identify the memory instead
|
| 918 |
|
|
of the name of the memory
|
| 919 |
|
|
.fetchram-(var_name) is similar to the .fetch-(mem_name) macro except that
|
| 920 |
|
|
the variable name is used to identify the memory instead
|
| 921 |
|
|
of the name of the memory
|
| 922 |
|
|
.fetchvalue(var_name) fetches the single-byte value of var_name
|
| 923 |
|
|
Note: This is equivalent to "var_name .fetch(mem_name)"
|
| 924 |
|
|
where mem_name is the memory in which var_name is
|
| 925 |
|
|
stored.
|
| 926 |
|
|
.fetchvalueoffset(var_name,offset)
|
| 927 |
|
|
fetches the single-byte value stored at var_name+offset
|
| 928 |
|
|
Note: This is equivalent to
|
| 929 |


"${var_name+offset} .fetch(mem_name)"
|
| 930 |
|
|
where mem_name is the memory in which var_name is
|
| 931 |
|
|
stored.
|
| 932 |
2 |
sinclairrf |
.fetchvector(var_name,N)
|
| 933 |
|
|
fetches N values starting at var_name into the data
|
| 934 |
|
|
stack with the value at var_name at the top and the
|
| 935 |
|
|
value at var_name+N-1 deep in the stack.
|
| 936 |
|
|
Note: This is equivalent to the N+1 operation sequence
|
| 937 |
|
|
"${var_name+N-1} .fetch-(mem_name) ...
|
| 938 |
|
|
.fetch-(mem_name) .fetch(mem_name)"
|
| 939 |
|
|
where ".fetch-(mem_name)" is repeated N-1 times.
|
| 940 |
|
|
.storeindexed(var_name)
|
| 941 |
|
|
uses the top of the data stack as an index into var_name
|
| 942 |
|
|
into which to store the next-to-top of the data stack.
|
| 943 |
|
|
Note: This is equivalent to the 4 instruction sequence
|
| 944 |
|
|
"var_name + .store(mem_name) drop".
|
| 945 |
|
|
Note: The default "drop" instruction can be overridden
|
| 946 |
|
|
by providing the optional second argument
|
| 947 |
|
|
similarly to the .storevalue macro.
|
| 948 |
|
|
.storeoffset(var_name,offset)
|
| 949 |
|
|
stores the single-byte value at the top of the data
|
| 950 |
|
|
stack at var_name offset by "offset" bytes
|
| 951 |
|
|
Note: This is equivalent to
|
| 952 |
|
|
"${var_name+offset} .store(mem_name) drop"
|
| 953 |
|
|
Note: The optional third argument is as per the
|
| 954 |
|
|
optional second argument of .storevalue
|
| 955 |
5 |
sinclairrf |
.storeram(var_name) is similar to the .store(mem_name) macro except that the
|
| 956 |
|
|
variable name is used to identify the RAM instead of the
|
| 957 |
|
|
name of the RAM
|
| 958 |
|
|
.storeram+(var_name) is similar to the .store+(mem_name) macro except that
|
| 959 |
|
|
the variable name is used to identify the RAM instead of
|
| 960 |
|
|
the name of the RAM
|
| 961 |
|
|
.storeram-(var_name) is similar to the .store-(mem_name) macro except that
|
| 962 |
|
|
the variable name is used to identify the RAM instead of
|
| 963 |
|
|
the name of the RAM
|
| 964 |
|
|
.storevalue(var_name) stores the single-byte value at the top of the data
|
| 965 |
|
|
stack at var_name
|
| 966 |
|
|
Note: This is equivalent to
|
| 967 |
|
|
"var_name .store(mem_name) drop"
|
| 968 |
|
|
Note: The default "drop" instruction can be replaced by
|
| 969 |
|
|
providing the optional second argument. For
|
| 970 |
|
|
example, the following instruction will store and
|
| 971 |
|
|
then decrement the value at the top of the data
|
| 972 |
|
|
stack:
|
| 973 |
|
|
.storevalue(var_name,1-)
|
| 974 |
2 |
sinclairrf |
.storevector(var_name,N)
|
| 975 |
|
|
Does the reverse of the .fetchvector macro.
|
| 976 |
|
|
Note: This is equivalent to the N+2 operation sequence
|
| 977 |
|
|
"var_name .store+(mem_name) ... .store+(mem_name)
|
| 978 |
|
|
.store(mem_name) drop"
|
| 979 |
|
|
where ".store+(mem_name)" is repeated N-1 times.
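
The ordering produced by .fetchvector and consumed by .storevector can be
checked with a small Python model built from the equivalent sequences listed
above. It is a sketch only: the function names are hypothetical and the
memories are plain Python lists indexed by address.

```python
def fetchvector(mem, stack, var, n):
    """Model .fetchvector(var_name,N) as N-1 ".fetch-" steps and a ".fetch"."""
    stack.append(var + n - 1)          # ${var_name+N-1}
    for _ in range(n - 1):             # .fetch-: push mem[T] below T, decrement T
        addr = stack.pop()
        stack.append(mem[addr])
        stack.append(addr - 1)
    addr = stack.pop()                 # .fetch: replace T with mem[T]
    stack.append(mem[addr])


def storevector(mem, stack, var, n):
    """Model .storevector(var_name,N) as N-1 ".store+" steps, ".store", "drop"."""
    stack.append(var)                  # var_name
    for _ in range(n - 1):             # .store+: store N at T, drop N, increment T
        addr = stack.pop()
        mem[addr] = stack.pop()
        stack.append(addr + 1)
    addr = stack.pop()                 # .store: store N at T, drop T
    mem[addr] = stack[-1]
    stack.pop()                        # drop


rom = [0x04, 0x08, 0x10, 0x20]         # coeff_table at address 0
ram = [0, 0, 0, 0]                     # d at address 0
stack = []
fetchvector(rom, stack, 0, 4)
fetched = list(stack)                  # last element is the top of the stack
storevector(ram, stack, 0, 4)
print(fetched, ram, stack)
```

The value at var_name ends up at the top of the data stack after the fetch, so
the store writes it back first and the copy preserves the memory order.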
|
| 980 |
|
|
|
| 981 |
|
|
The .fetchvector and .storevector macros are intended to work with values stored
|
| 982 |
|
|
MSB first in memory and with the MSB toward the top of the data stack,
|
| 983 |
|
|
similarly to the Forth language with multi-word values. To demonstrate how
|
| 984 |
|
|
this data structure works, consider the examples of decrementing and
|
| 985 |
|
|
incrementing a two-byte value on the data stack:
|
| 986 |
|
|
|
| 987 |
|
|
; Decrement a 2-byte value
|
| 988 |
|
|
; swap 1- swap - decrement the LSB
|
| 989 |
|
|
; over -1= - puts -1 on the top of the data stack if the LSB rolled
|
| 990 |
|
|
; over from 0 to -1, puts 0 on the top otherwise
|
| 991 |
|
|
; + - decrements the MSB if the LSB rolled over
|
| 992 |
|
|
; ( u_LSB u_MSB - u_LSB' u_MSB' )
|
| 993 |
|
|
.function decrement_2byte
|
| 994 |
|
|
swap 1- swap over -1= .return(+)
|
| 995 |
|
|
|
| 996 |
|
|
; Increment a 2-byte value
|
| 997 |
|
|
; swap 1+ swap - increment the LSB
|
| 998 |
|
|
; over 0= - puts -1 on the top of the data stack if the LSB rolled
|
| 999 |
|
|
; over from 0xFF to 0, puts 0 on the top otherwise
|
| 1000 |
|
|
; - - increments the MSB if the LSB rolled over (by
|
| 1001 |
|
|
; subtracting -1)
|
| 1002 |
|
|
; ( u_LSB u_MSB - u_LSB' u_MSB' )
|
| 1003 |
|
|
.function increment_2byte
|
| 1004 |
|
|
swap 1+ swap over 0= .return(-)
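
A Python model of the same arithmetic may make the borrow and carry handling
easier to follow. It is a sketch, assuming 8-bit wraparound and that the
comparison instructions leave 0xFF (-1) for true and 0 for false, as the
comments above describe; the function names simply mirror the assembly.

```python
def decrement_2byte(lsb, msb):
    """Model "swap 1- swap over -1= +" on ( u_LSB u_MSB )."""
    lsb = (lsb - 1) & 0xFF                    # swap 1- swap
    flag = 0xFF if lsb == 0xFF else 0x00      # over -1=
    msb = (msb + flag) & 0xFF                 # + : adding -1 borrows from the MSB
    return lsb, msb


def increment_2byte(lsb, msb):
    """Model "swap 1+ swap over 0= -" on ( u_LSB u_MSB )."""
    lsb = (lsb + 1) & 0xFF                    # swap 1+ swap
    flag = 0xFF if lsb == 0x00 else 0x00      # over 0=
    msb = (msb - flag) & 0xFF                 # - : subtracting -1 carries into the MSB
    return lsb, msb


print(decrement_2byte(0x00, 0x01))
print(increment_2byte(0xFF, 0x00))
```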
|
| 1005 |
|
|
|
| 1006 |
|
|
|
| 1007 |
|
|
COMPUTED VALUES
|
| 1008 |
|
|
================================================================================
|
| 1009 |
|
|
|
| 1010 |
|
|
Computed values can be pushed on the stack using a "${...}" where the "..." is
|
| 1011 |
|
|
evaluated in Python and cannot have any spaces.
|
| 1012 |
|
|
|
| 1013 |
|
|
For example, a loop that should be run 5 times can be coded as:
|
| 1014 |
|
|
|
| 1015 |
|
|
${5-1} :loop ... .jumpc(loop,1-) drop
|
| 1016 |
|
|
|
| 1017 |
|
|
which is a clearer indication that the loop is to be run 5 times than is the
|
| 1018 |
|
|
instruction sequence
|
| 1019 |
|
|
|
| 1020 |
|
|
4 :loop ...
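
Why an initial value of 4 gives 5 passes can be seen from a short Python model
of the loop. This is a sketch that assumes the conditional jump tests the
counter before the "1-" is applied; the function name is hypothetical.

```python
def count_iterations(initial):
    """Model '<initial> :loop ... .jumpc(loop,1-) drop' with an 8-bit counter."""
    t = initial & 0xFF
    iterations = 0
    while True:
        iterations += 1          # the loop body "..."
        taken = t != 0           # jumpc tests the top of the data stack
        t = (t - 1) & 0xFF       # "1-" replaces the default "drop"
        if not taken:
            break
    return iterations            # the final "drop" discards the counter


print(count_iterations(5 - 1))
```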
|
| 1021 |
|
|
|
| 1022 |
|
|
Constants can be accessed in the computation. For example, a block of memory
|
| 1023 |
|
|
can be allocated as follows:
|
| 1024 |
|
|
|
| 1025 |
|
|
.constant C_RESERVE
|
| 1026 |
|
|
.memory RAM myram
|
| 1027 |
|
|
...
|
| 1028 |
|
|
.variable reserved 0*${C_RESERVE}
|
| 1029 |
|
|
|
| 1030 |
|
|
and the block of reserved memory can be cleared using the following loop:
|
| 1031 |
|
|
|
| 1032 |
|
|
${C_RESERVE-1} :loop 0 over .storeindexed(reserved) .jumpc(loop,1-) drop
|
| 1033 |
|
|
|
| 1034 |
|
|
The offsets of variables in their memory can also be accessed through a computed
|
| 1035 |
|
|
value. The value of reserved could also be cleared as follows:
|
| 1036 |
|
|
|
| 1037 |
|
|
${reserved-1} ${C_RESERVE-1} :loop >r
|
| 1038 |
|
|
|
| 1039 |
|
|
r> .jumpc(loop,-1) drop drop
|
| 1040 |
|
|
|
| 1041 |
|
|
The body of this version of the loop is the same length as the first version.
|
| 1042 |
|
|
In general, it is better to use the memory macros to access variables as they
|
| 1043 |
|
|
ensure the correct memory is accessed.
|
| 1044 |
|
|
|
| 1045 |
|
|
The sizes of memories can also be accessed using computed values. If "myram" is
|
| 1046 |
|
|
a RAM, then "${size['myram']}" will push the size of "myram" on the stack. As
|
| 1047 |
|
|
an example, the following code will clear the entire RAM:
|
| 1048 |
|
|
|
| 1049 |
|
|
${size['myram']-1} :loop 0 swap .jumpc(loop,.store-(myram)) drop
|
| 1050 |
|
|
|
| 1051 |
|
|
The lengths of I/O signals can also be accessed using computed values. If
|
| 1052 |
|
|
"o_mask" is a mask, then "${size['o_mask']}" will push the size of the mask on
|
| 1053 |
|
|
the stack and "${2**size['o_mask']-1}" will push a value that sets all the bits
|
| 1054 |
|
|
of the mask. The I/O signals include I/O signals instantiated by peripherals.
|
| 1055 |
|
|
For example, for the configuration command
|
| 1056 |
|
|
|
| 1057 |
|
|
PERIPHERAL big_outport outport=O_BIG outsignal=o_big width=47
|
| 1058 |
|
|
|
| 1059 |
|
|
the width of the output signal is accessible using "${size['o_big']}". You can
|
| 1060 |
|
|
set the wide signal to all zeroes using:
|
| 1061 |
|
|
|
| 1062 |
|
|
${(size['o_big']+7)/8-1} :loop 0 .outport(O_BIG) .jumpc(loop,1-) drop
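
The loop count in this example is the number of 8-bit writes needed to cover
the signal, minus one. A Python sketch of the arithmetic follows; the function
name is hypothetical, and "//" is used so that the division stays an integer
division regardless of the Python version.

```python
def byte_writes(width_bits):
    """Number of 8-bit writes needed to fill a signal of the given width."""
    return (width_bits + 7) // 8


width = 47                          # width=47 from the PERIPHERAL command
initial_count = byte_writes(width) - 1
print(byte_writes(width), initial_count)
```

For the 47-bit signal this gives 6 writes, so the counter starts at 5.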
|
| 1063 |
|
|
|
| 1064 |
3 |
sinclairrf |
|
| 1065 |
|
|
MACROS
|
| 1066 |
|
|
================================================================================
|
| 1067 |
|
|
There are 3 types of macros used by the assembler.
|
| 1068 |
|
|
|
| 1069 |
|
|
The first kind of macros are built in to the assembler and are required to
|
| 1070 |
|
|
encode instructions that have embedded values or have mandatory subsequent
|
| 1071 |
|
|
instructions. These include function calls, jump instructions, function return,
|
| 1072 |
|
|
and memory accesses as follows:
|
| 1073 |
|
|
.call(function,[op])
|
| 1074 |
|
|
.callc(function,[op])
|
| 1075 |
|
|
.fetch(ramName)
|
| 1076 |
|
|
.fetch+(ramName)
|
| 1077 |
|
|
.fetch-(ramName)
|
| 1078 |
|
|
.jump(label,[op])
|
| 1079 |
|
|
.jumpc(label,[op])
|
| 1080 |
|
|
.return([op])
|
| 1081 |
|
|
.store(ramName)
|
| 1082 |
|
|
.store+(ramName)
|
| 1083 |
|
|
.store-(ramName)
|
| 1084 |
|
|
|
| 1085 |
|
|
The second kind of macros are designed to ease access to input and output
|
| 1086 |
|
|
operations and for memory accesses and to help ensure these operations are
|
| 1087 |
|
|
correctly constructed. These are defined as Python scripts in the
|
| 1088 |
|
|
core/9x8/macros directory and are automatically loaded into the assembler.
|
| 1089 |
|
|
These macros are:
|
| 1090 |
|
|
.fetchindexed(variable)
|
| 1091 |
|
|
.fetchoffset(variable,ix)
|
| 1092 |
|
|
.fetchvalue(variableName)
|
| 1093 |
|
|
.fetchvector(variableName,N)
|
| 1094 |
|
|
.inport(I_name)
|
| 1095 |
|
|
.outport(O_name[,op])
|
| 1096 |
|
|
.outstrobe(O_name)
|
| 1097 |
|
|
.storeindexed(variableName[,op])
|
| 1098 |
|
|
.storeoffset(variableName,ix[,op])
|
| 1099 |
|
|
.storevalue(variableName[,op])
|
| 1100 |
|
|
.storevector(variableName,N)
|
| 1101 |
|
|
|
| 1102 |
|
|
The third kind of macros are user-defined macros. These macros must be registered
|
| 1103 |
|
|
with the assembler using the ".macro" directive.
|
| 1104 |
|
|
|
| 1105 |
|
|
For example, the ".push32" macro is defined by macros/9x8/push32.py and can be
|
| 1106 |
|
|
used to push 32-bit (4-byte) values onto the data stack as follows:
|
| 1107 |
|
|
|
| 1108 |
|
|
.macro push32
|
| 1109 |
|
|
.constant C_X 0x87654321
|
| 1110 |
|
|
.main
|
| 1111 |
|
|
...
|
| 1112 |
|
|
.push32(0x12345678)
|
| 1113 |
|
|
.push32(C_X)
|
| 1114 |
|
|
.push32(${0x12345678^C_X})
|
| 1115 |
|
|
...
|
| 1116 |
|
|
|
| 1117 |
|
|
The following macros are provided in macros/9x8:
|
| 1118 |
|
|
.push16(v) push the 16-bit (2-byte) value "v" onto the data stack with the
|
| 1119 |
|
|
MSB at the top of the data stack
|
| 1120 |
4 |
sinclairrf |
.push24(v) push the 24-bit (3-byte) value "v" onto the data stack with the
|
| 1121 |
|
|
MSB at the top of the data stack
|
| 1122 |
3 |
sinclairrf |
.push32(v) push the 32-bit (4-byte) value "v" onto the data stack with the
|
| 1123 |
|
|
MSB at the top of the data stack
|
| 1124 |
4 |
sinclairrf |
.pushByte(v,ix)
|
| 1125 |
|
|
push the ix'th byte of v onto the data stack
|
| 1126 |
|
|
Note: ix=0 designates the LSB
|
| 1127 |
3 |
sinclairrf |
|
| 1128 |
|
|
Directories are searched in the following order for macros:
|
| 1129 |
|
|
.
|
| 1130 |
|
|
./macros
|
| 1131 |
|
|
include paths specified by the '-M' command line option.
|
| 1132 |
|
|
macros/9x8
|
| 1133 |
|
|
|
| 1134 |
|
|
The Python scripts in core/9x8/macros and macros/9x8 can be used as design
|
| 1135 |
|
|
examples for user-defined macros. The assembler does some type checking based
|
| 1136 |
|
|
on the list provided when the macro is registered by the "AddMacro" method, but
|
| 1137 |
|
|
additional type checking is often warranted by the macro "emitFunction" which
|
| 1138 |
|
|
emits the actual assembly code. The ".fetchvector" and ".storevector" macros
|
| 1139 |
4 |
sinclairrf |
demonstrate how to design variable-length macros. Several macros in
|
| 1140 |
|
|
core/9x8/macros illustrate designing macros with optional arguments.
|
| 1141 |
3 |
sinclairrf |
|
| 1142 |
|
|
It is not an error to repeat the ".macro MACRO_NAME" directive for user-defined
|
| 1143 |
|
|
macros. The assembler will issue a fatal error if a user-defined macro
|
| 1144 |
|
|
conflicts with a built-in macro.
|
| 1145 |
|
|
|
| 1146 |
|
|
|
| 1147 |
2 |
sinclairrf |
CONDITIONAL COMPILATION
|
| 1148 |
|
|
================================================================================
|
| 1149 |
|
|
The computer compiler and assembler recognize conditional compilation as
|
| 1150 |
|
|
follows: .IFDEF, .IFNDEF, .ELSE, and .ENDIF can be used in the architecture
|
| 1151 |
|
|
file and they can be used to conditionally include functions, files, etc. within
|
| 1152 |
|
|
the assembly code; .ifdef, .ifndef, .else, and .endif can be used in function
|
| 1153 |
|
|
bodies, variable bodies, etc. to conditionally include assembly code, symbols,
|
| 1154 |
|
|
or data. Conditionals cannot cross file boundaries.
|
| 1155 |
|
|
|
| 1156 |
|
|
The computer compiler examines the list of defined symbols such as I/O ports,
|
| 1157 |
|
|
I/O signals, etc. to evaluate the true/false condition associated with the
|
| 1158 |
|
|
.IFDEF and .IFNDEF commands. The "-D" option to the computer compiler is
|
| 1159 |
|
|
provided to define symbols for enabling conditionally compiled configuration
|
| 1160 |
|
|
commands. Similarly, the assembler examines the list of I/O ports, I/O signals,
|
| 1161 |
|
|
parameters, constants, etc. to evaluate the .IFDEF, .IFNDEF, .ifdef, and .ifndef
|
| 1162 |
|
|
conditionals.
|
| 1163 |
|
|
|
| 1164 |
|
|
For example, a diagnostic UART can be conditionally included using the
|
| 1165 |
|
|
configuration commands:
|
| 1166 |
|
|
|
| 1167 |
|
|
.IFDEF ENABLE_UART
|
| 1168 |
|
|
PORTCOMMENT Diagnostic UART
|
| 1169 |
|
|
PERIPHERAL UART_Tx outport=O_UART_TX ...
|
| 1170 |
|
|
.ENDIF
|
| 1171 |
|
|
|
| 1172 |
|
|
And the assembly code can include conditional code fragments such as the following,
|
| 1173 |
|
|
where the existence of the output port is used to determine whether or not to
|
| 1174 |
|
|
send a character to that output port:
|
| 1175 |
|
|
|
| 1176 |
|
|
.ifdef(O_UART_TX) 'A' .outport(O_UART_TX) .endif
|
| 1177 |
|
|
|
| 1178 |
|
|
Invoking the computer compiler with "-D ENABLE_UART" will generate a module with
|
| 1179 |
|
|
the UART peripheral and will enable the conditional code sending the 'A'
|
| 1180 |
|
|
character to the UART port.
|
| 1181 |
|
|
|
| 1182 |
|
|
The following code can be used to preclude multiple attempted inclusions of an
|
| 1183 |
|
|
assembly library file.
|
| 1184 |
|
|
|
| 1185 |
|
|
; put these two lines near the top of the file
|
| 1186 |
|
|
.IFNDEF C_FILENAME_INCLUDED
|
| 1187 |
|
|
.constant C_FILENAME_INCLUDED 1
|
| 1188 |
|
|
; put the library body here
|
| 1189 |
|
|
...
|
| 1190 |
|
|
; put this line at the bottom of the file
|
| 1191 |
|
|
.ENDIF ; .IFNDEF C_FILENAME_INCLUDED
|
| 1192 |
|
|
|
| 1193 |
|
|
The ".INCLUDE" configuration command can be used to read configuration commands
from additional sources.


SIMULATIONS
================================================================================

Simulations have been performed with Icarus Verilog, Verilator, and Xilinx'
ISIM. Icarus Verilog is good for short, simple simulations and is used for the
core and peripheral test benches; Verilator for long simulations of large,
complex systems; and ISIM when Xilinx-specific cores are used. Verilator is
the fastest simulator I've encountered. Verilator is also used for lint
checking in the core test benches.


MEM INITIALIZATION FILE
================================================================================

A memory initialization file is produced during compilation. This file can be
used with tools such as Xilinx' data2mem to modify the SRAM contents without
having to rebuild the entire system. It is restricted to the opcode memory
initialization. The file must be processed before it can be used by specific
tools; see doc/MemoryInitialization.html.

WARNING: The values of parameters used in the assembly code must match the
instantiated design.


THEORY OF OPERATION
================================================================================

Registers are used for the top of data stack, "T", and the next-to-top of the
data stack, "N". The data stack is a separate memory. This means that the
"DATA_STACK N" configuration command actually allows N+2 values in the data
stack since T and N are not stored in the N-element deep data stack.

The return stack is similar in that "R" is the top of the return stack and the
"RETURN_STACK N" configuration command allocates an additional N words of
memory. The width of the return stack is the wider of the 8-bit data width and
the program counter width.

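The stack sizing described above can be summarized with a little arithmetic.
The Python sketch below is illustrative only: the function names and the
"pc_width" parameter are hypothetical and are not part of SSBCC.

```python
DATA_WIDTH = 8  # the 9x8 core has an 8-bit data path


def data_stack_capacity(n):
    """Values held for "DATA_STACK n": T, N, and n entries in memory."""
    return n + 2


def return_stack_capacity(n):
    """Values held for "RETURN_STACK n": R and n entries in memory."""
    return n + 1


def return_stack_width(pc_width):
    """Return stack width in bits: the wider of the data and PC widths."""
    return max(DATA_WIDTH, pc_width)


print(data_stack_capacity(16))    # 18
print(return_stack_capacity(16))  # 17
print(return_stack_width(10))     # 10
print(return_stack_width(7))      # 8
```

For instance, a hypothetical 10-bit program counter gives a 10-bit wide return
stack, while a 7-bit program counter still needs the full 8-bit data width.
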
The program counter is always either incremented by 1 or is set to an address
as controlled by jump, jumpc, call, callc, and return instructions. The
registered program counter is used to read the next opcode from the instruction
memory and this opcode is also registered in the memory. This means that there
is a 1 clock cycle delay between the address changing and the associated
instruction being performed. This is also part of the architecture required to
have the processor operate at one instruction per clock cycle.

Separate ALUs are used for the program counter, adders, logical operations, etc.
and MUXes are used to select the values desired for the destination registers.
The instruction execution consists of translating the 6 msb of the opcode into
MUX settings and performing opcode-dependent ALU operations as controlled by the
3 lsb of the opcode (during the first half of the clock cycle) and then setting
the T, N, R, memories, etc. as controlled by the computed MUX settings.

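The split of the 9-bit opcode into its 6 msb and 3 lsb can be sketched in
Python as follows. The function name "split_opcode" is hypothetical and the
example bit pattern is arbitrary; it does not model any particular instruction
encoding.

```python
def split_opcode(opcode):
    """Split a 9-bit opcode into its 6 msb and its 3 lsb."""
    if not 0 <= opcode < (1 << 9):
        raise ValueError('opcode must fit in 9 bits')
    upper6 = (opcode >> 3) & 0x3F  # field translated into MUX settings
    lower3 = opcode & 0x07         # field controlling the ALU operation
    return upper6, lower3


print(split_opcode(0b101010011))  # (42, 3)
```
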
The "core.v" file is the code for these operations. Within this file there are
several "@xxx@" strings that specify where the computer compiler is to insert
code such as I/O declarations, memories, inport interpretation, outport
generation, peripherals, etc.

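The general idea of expanding "@xxx@" markers can be sketched in Python as
shown below. This is not how the computer compiler is actually implemented: the
function name, the marker names "@PORTS@" and "@BODY@", and the Verilog
fragments are all hypothetical.

```python
import re


def expand_template(template, fragments):
    """Replace each "@NAME@" marker with the generated code for NAME.

    Hypothetical sketch; marker names and the lookup table are assumptions.
    """
    def substitute(match):
        name = match.group(1)
        if name not in fragments:
            raise KeyError('no generated code for marker @%s@' % name)
        return fragments[name]

    return re.sub(r'@([A-Z_]+)@', substitute, template)


template = 'module demo(\n@PORTS@\n);\n@BODY@\nendmodule\n'
fragments = {
    'PORTS': '  input wire i_clk,\n  input wire i_rst',
    'BODY': '  // generated logic goes here',
}
print(expand_template(template, fragments))
```

Raising an error for an unknown marker, rather than leaving it in place, keeps
an unexpanded "@xxx@" string from reaching the generated HDL.
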
The file structure, i.e., putting the core and the assembler in "core/9x8"
should facilitate application-specific modification of the processor. For
example, the store+, store-, fetch+, and fetch- instructions could be replaced
with additional stack manipulation operations, arithmetic operations with 2 byte
results, etc. Simply copy the "9x8" directory to something like "9x8_XXX" and
make your modifications in that directory. The 8-bit peripherals should still
work, but the 9x8 library functions may need rework to accommodate the
modifications.


MISCELLANEOUS
================================================================================

Features and peripherals are still being added and the documentation is
incomplete. The output HDL is currently restricted to Verilog although a VHDL
package file is automatically generated by the computer compiler.

The "INVERT_RESET" configuration command is used to indicate an active-low reset
is input to the micro controller rather than an active-high reset.

A VHDL package file is automatically generated by the computer compiler.