OpenCores
URL https://opencores.org/ocsvn/ssbcc/ssbcc/trunk

SSBCC.9x8 is a free Small Stack-Based Computer Compiler with a 9-bit opcode,
8-bit data core designed to facilitate FPGA HDL development.

The primary design criteria are:
- high speed (to avoid timing issues)
- low fabric utilization
- vendor independent
- development tools available for all operating systems

It has been used in Spartan-3A, Spartan-6, Virtex-6, and Artix-7 FPGAs and has
been built for Altera, Lattice, and other Xilinx devices.  It is faster and
usually smaller than vendor provided processors.

The compiler takes an architecture file that describes the micro controller
memory spaces, inputs and outputs, and peripherals and which specifies the HDL
language and source assembly.  It generates a single HDL module implementing
the entire micro controller.  No user-written HDL is required to instantiate
I/Os, program memory, etc.

The features are:
- high speed, low fabric utilization
- vendor-independent Verilog output with a VHDL package file
- simple Forth-like assembly language (41 instructions)
- single cycle instruction execution
- automatic generation of I/O ports
- configurable instruction, data stack, return stack, and memory utilization
- extensible set of peripherals (I2C busses, UARTs, AXI4-Lite busses, etc.)
- extensible set of macros
- memory initialization file to facilitate code development without rebuilds
- simulation diagnostics to facilitate identifying code errors
- conditionally included I/Os and peripherals, functions, and assembly code

SSBCC has been used for the following projects:
- operate a media translator from a parallel camera interface to an OMAP GPMC
  interface, detect and report bus errors and hardware errors, and act as an
  SPI slave to the OMAP
- operate two UART interfaces and multiple PWM controlled 2-lead bi-color LEDs
- operate and monitor the Artix-7 fabric in a Zynq system using AXI4-Lite
  master and slave buses, I2C buses for timing-critical voltage measurements

The only external tool required is Python 2.7.


DESCRIPTION
================================================================================

The computer compiler uses an architectural description of the processor stating
the sizes of the instruction memory, data stack, and return stack; the input and
output ports; RAM and ROM types and sizes; and peripherals.

The instructions are all single-cycle.  The instructions include
- 4 arithmetic instructions:  addition, subtraction, increment, and decrement
- 3 bit-wise logical instructions:  and, or, and exclusive or
- 7 shift and rotation instructions: <<0, <<1, 0>>, 1>>, <<msb, lsb>>, and msb>>
- 4 logical instructions:  0=, 0<>, -1=, -1<>
- 6 Forth-like data stack instructions:  drop, dup, nip, over, push, swap
- 3 Forth-like return stack instructions:  >r, r>, r@
- 2 input and output
- 6 memory read and write with optional address post increment and post decrement
- 2 jump and conditional jump
- 2 call and conditional call
- 1 function return
- 1 nop

The 9x8 address space is up to 8K.  This is achieved by pushing the 8 lsb of the
target address onto the data stack immediately before the jump or call
instruction and by encoding the 5 msb of the address within the jump or call
instruction.  The instruction immediately following a jump, call, or return is
executed before the instruction sequence at the destination address is executed
(this is illustrated later).
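
As a minimal sketch of this address split (not taken from the SSBCC sources),
the Python below separates a 13-bit target address into the byte that is pushed
onto the data stack and the 5 msb carried by the jump or call instruction.  The
names "split_address" and "push_opcode" are hypothetical.  The push-opcode
pattern (bit 8 set, value in the low 8 bits) is read off the memory
initialization listing for the LED example in this README.

```python
def split_address(address):
    """Split a 13-bit instruction address into (lsb8, msb5)."""
    if not 0 <= address < 2 ** 13:
        raise ValueError("9x8 addresses must fit in 13 bits (8K)")
    lsb8 = address & 0xFF          # pushed onto the data stack before the jump/call
    msb5 = (address >> 8) & 0x1F   # encoded within the jump/call instruction
    return lsb8, msb5


def push_opcode(value):
    """9-bit opcode that pushes an 8-bit value: bit 8 set, value in bits 7..0."""
    return 0x100 | (value & 0xFF)


lsb8, msb5 = split_address(0x01A)   # ":inner" in "pause" of the LED example
print(hex(push_opcode(lsb8)), msb5)
```

For address 0x01A this prints "0x11a 0", which agrees with the 9'h11A push
that precedes "jumpc inner" in the LED example listing.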

Up to four banks of memory, either RAM or ROM, are available.  Each of these can
be up to 256 bytes long, providing a total of up to 1 kB of memory.

The assembly language is Forth-like.  Built-in macros are used to encode the
jump and call instructions and to encode the 2-bit memory bank index in memory
store and fetch instructions.

The computer compiler and assembler are written in Python 2.7.  Peripherals are
implemented by Python modules which generate the I/O ports and the peripheral
HDL.

The computer compiler is documented in the doc directory.  The 9x8 core is
documented in the core/9x8/doc directory.  Several examples are provided.

The computer compiler and assembler are fully functional and there are no known
bugs.


SPEED AND RESOURCE UTILIZATION
================================================================================
These device speed and resource utilization results are copied from the build
tests.  The full results are listed in core/9x8/build/uc/uc_led.9x8 which
represents a minimal processor implementation (clock, reset, and one output).
See the uc_peripherals.9x8 file for results for a more complicated
implementation.  Device-specific scripts state how these performance numbers
were obtained.

VENDOR          DEVICE          BEST SPEED      SMALLEST RESOURCE UTILIZATION
------          ------          ----------      -------------------------------
Altera          Cyclone-III     190.6 MHz       282 LEs           (preliminary)
Altera          Cyclone-IV      192.1 MHz       281 LEs           (preliminary)
Altera          Stratix-V       372.9 MHz       198 ALUTs         (preliminary)
Lattice         LCMXO2-640ZE-3   98.4 MHz       206 LUTs          (preliminary)
Lattice         LFE2-6E-7       157.9 MHz       203 LUTs          (preliminary)
Xilinx          Spartan-3A      148.3 MHz       130 slices, 231 4-input LUTs
Xilinx          Spartan-6       200.0 MHz        36 slices, 120 Slice LUTs
Xilinx          Virtex-6        275.7 MHz        38 slices, 122 Slice LUTs (preliminary)

Disclaimer:  As with other embedded processors, these are maximum performance
claims.  Realistic implementations will produce slower maximum clock rates,
particularly with lots of I/O ports and peripherals and with the constraint of
coexisting with other subsystems in the FPGA fabric.  What these performance
numbers do provide is an estimate of the amount of slack available.  For
example, you can't realistically expect to get 110 MHz from a processor that,
under ideal conditions, places and routes at 125 MHz, but you can with a
processor that synthesizes at 150 MHz.


EXAMPLE:
================================================================================

The LED flasher example demonstrates the simplicity of the architectural
specification and the Forth-like assembly language.

The architecture file, named "led.9x8", with the comments and user header
removed, is as follows:

  ARCHITECTURE    core/9x8 Verilog

  INSTRUCTION     2048
  RETURN_STACK    32
  DATA_STACK      32

  PORTCOMMENT LED on/off signal
  OUTPORT 1-bit o_led O_LED

  ASSEMBLY led.s

The ARCHITECTURE configuration command specifies the 9x8 core and the Verilog
language.  The INSTRUCTION, RETURN_STACK, and DATA_STACK configuration commands
specify the sizes of the instruction space, return stack, and data stack.  The
content of the PORTCOMMENT configuration command is inserted in the module
declaration -- this facilitates identifying signals in micro controllers with a
lot of inputs and outputs.  The single OUTPORT statement specifies a 1-bit
signal named "o_led".  This signal is accessed in the assembly code through the
symbol "O_LED".  The ASSEMBLY command specifies the single input file "led.s",
which is listed below.  The output module will be "led.v".

The "led.s" assembly file is as follows:

  ; Consume 256*5+4 clock cycles.
  ; ( - )
  .function pause
    0 :inner 1- dup .jumpc(inner) drop
  .return

  ; Repeat "pause" 256 times.
  ; ( - )
  .function repause
    0 :inner .call(pause) 1- dup .jumpc(inner) drop
  .return

  ; main program (as an infinite loop)
  .main
    0 :inner 1 ^ dup .outport(O_LED) .call(repause) .jump(inner)

This example is coded in a traditional Forth structure with the conditional
jumps consuming the top of the data stack.  Examining the "pause" function, the
".function" directive specifies the start of a function and the function name.
The "0" instruction pushes the value "0" onto the top of the data stack.
":inner" is a label for a jump instruction.  The "1-" instruction decrements the
top of the data stack.  "dup" is the Forth instruction to push a duplicate of
the top of the data stack onto the data stack.  The ".jumpc(inner)" macro
expands to three instructions as follows:  (1) push the 8 lsb of the address at
"inner" onto the data stack, (2) the conditional jump instruction with the 5 msb
of the address of "inner" (the jumpc instruction also drops the top of the data
stack with its partial address), and (3) a "drop" instruction to drop the
duplicated loop count from the top of the data stack.  Finally, the "drop"
instruction drops the loop count from the top of the data stack and the
".return" macro generates the "return" instruction and a "nop" instruction.

The function "repause" calls the "pause" function 256 times.  The main program
body is identified by the directive ".main".  This function runs an infinite
loop that toggles the lsb of the LED output, outputs the LED setting, and calls
the "repause" function.

A tighter version of the loop in the "pause" function can be written as

  ; Consume 256*3+3 clock cycles.
  ; ( - )
  .function pause
    0xFF :inner .jumpc(inner,1-) .return(drop)

Each iteration is now 3 cycles long:  the "drop" that is normally part of the
".jumpc" macro has been replaced by the decrement instruction, and the final
"drop" instruction has replaced the default "nop" instruction that is normally
part of the ".return" macro.  Note that the decrement is performed after the
non-zero comparison in the "jumpc" instruction.

A version of the "pause" function that consumes exactly 1000 clock cycles is:

  .function pause
    ${(1000-4)/4-1} :inner nop .jumpc(inner,1-) drop .return
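
The cycle counts quoted for the three "pause" variants can be checked with a
little arithmetic.  The sketch below assumes one clock per instruction, a
three-instruction ".jumpc" expansion, and a two-instruction ".return"
expansion, as described above.  The helper name "pause_cycles" is illustrative
and is not part of SSBCC.

```python
def pause_cycles(loop_instructions, iterations, overhead_instructions):
    """Clock cycles used by a single-cycle-per-instruction countdown loop."""
    return loop_instructions * iterations + overhead_instructions


# Original: loop is "1- dup <push> jumpc drop"; overhead is "0 drop return nop".
original = pause_cycles(5, 256, 4)

# Tighter: loop is "<push> jumpc 1-"; overhead is "0xFF return drop".
tighter = pause_cycles(3, 256, 3)

# 1000-cycle version: loop is "nop <push> jumpc 1-"; overhead is
# "<initial value> drop return nop".  The jumpc test happens before the
# decrement, so an initial count of N executes the loop N+1 times.
initial_count = (1000 - 4) // 4 - 1
exact = pause_cycles(4, initial_count + 1, 4)

print(original, tighter, initial_count, exact)
```

This prints "1284 771 248 1000", i.e. 256*5+4, 256*3+3, the computed initial
loop count, and the 1000-cycle total.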

The instruction memory initialization for the processor module includes the
instruction mnemonics being performed at each address and replaces the "list"
file output from traditional assemblers.  The following is the memory
initialization for this LED flasher example.  The main program always starts at
address zero and functions are included in the order encountered.  Unused
library functions are not included in the generated instruction list.

  reg [8:0] s_opcodeMemory[2047:0];
  initial begin
    // .main
    s_opcodeMemory['h000] = 9'h100; // 0x00
    s_opcodeMemory['h001] = 9'h101; // :inner 0x01
    s_opcodeMemory['h002] = 9'h052; // ^
    s_opcodeMemory['h003] = 9'h008; // dup
    s_opcodeMemory['h004] = 9'h100; // O_LED
    s_opcodeMemory['h005] = 9'h038; // outport
    s_opcodeMemory['h006] = 9'h054; // drop
    s_opcodeMemory['h007] = 9'h10D; //
    s_opcodeMemory['h008] = 9'h0C0; // call repause
    s_opcodeMemory['h009] = 9'h000; // nop
    s_opcodeMemory['h00A] = 9'h101; //
    s_opcodeMemory['h00B] = 9'h080; // jump inner
    s_opcodeMemory['h00C] = 9'h000; // nop
    // repause
    s_opcodeMemory['h00D] = 9'h100; // 0x00
    s_opcodeMemory['h00E] = 9'h119; // :inner
    s_opcodeMemory['h00F] = 9'h0C0; // call pause
    s_opcodeMemory['h010] = 9'h000; // nop
    s_opcodeMemory['h011] = 9'h05C; // 1-
    s_opcodeMemory['h012] = 9'h008; // dup
    s_opcodeMemory['h013] = 9'h10E; //
    s_opcodeMemory['h014] = 9'h0A0; // jumpc inner
    s_opcodeMemory['h015] = 9'h054; // drop
    s_opcodeMemory['h016] = 9'h054; // drop
    s_opcodeMemory['h017] = 9'h028; // return
    s_opcodeMemory['h018] = 9'h000; // nop
    // pause
    s_opcodeMemory['h019] = 9'h100; // 0x00
    s_opcodeMemory['h01A] = 9'h05C; // :inner 1-
    s_opcodeMemory['h01B] = 9'h008; // dup
    s_opcodeMemory['h01C] = 9'h11A; //
    s_opcodeMemory['h01D] = 9'h0A0; // jumpc inner
    s_opcodeMemory['h01E] = 9'h054; // drop
    s_opcodeMemory['h01F] = 9'h054; // drop
    s_opcodeMemory['h020] = 9'h028; // return
    s_opcodeMemory['h021] = 9'h000; // nop
    s_opcodeMemory['h022] = 9'h000;
    s_opcodeMemory['h023] = 9'h000;
    s_opcodeMemory['h024] = 9'h000;
    ...
    s_opcodeMemory['h7FF] = 9'h000;
  end


DATA and STRINGS
================================================================================

Values are pushed onto the data stack by stating the value.  For example,

  0x10 0x20 'x'

will successively push the values 0x10, 0x20, and the character 'x' onto the
data stack.  The character 'x' will be at the top of the data stack after these
3 instructions.

Numeric values can be represented in binary, octal, decimal, and hex.  Binary
values start with the two characters "0b" followed by a sequence of binary
digits; octal numbers start with a "0" followed by a sequence of octal digits;
decimal values can start with a "+" or "-", have a non-zero first digit, and
have zero or more subsequent decimal digits; and hex values start with the two
characters "0x" followed by a sequence of hex digits.

Examples of equivalent numeric values are:
  binary:   0b01  0b10010
  octal:    01    022
  decimal:  1     18
  hex:      0x1   0x12
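
The literal rules above can be sketched as a small Python classifier.  This is
an illustration and not the assembler's own parser:  the name "parse_value" is
hypothetical, and treating a bare "0" as octal zero and accepting an upper-case
"0X" prefix are assumptions of the sketch.

```python
import re

# Each entry is (pattern, base).  Order matters: the "0b" and "0x" prefixes
# must be tried before the plain leading-zero octal form.
_PATTERNS = [
    (re.compile(r"0b([01]+)$"), 2),
    (re.compile(r"0[xX]([0-9A-Fa-f]+)$"), 16),
    (re.compile(r"(0[0-7]*)$"), 8),
    (re.compile(r"([+-]?[1-9][0-9]*)$"), 10),
]


def parse_value(text):
    """Return the integer value of a numeric literal, or raise ValueError."""
    for pattern, base in _PATTERNS:
        match = pattern.match(text)
        if match:
            return int(match.group(1), base)
    raise ValueError("unrecognized numeric literal: %r" % text)


print([parse_value(s) for s in ("0b10010", "022", "18", "0x12")])
```

All four spellings from the table above evaluate to 18.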

See the COMPUTED VALUES section for using computed values in the assembler.

There are four ways to specify strings in the assembler.  Simply stating the
string

  "Hello World!"

puts the characters in the string onto the data stack with the letter 'H' at the
top of the data stack.  I.e., the individual push operations are

  '!' 'd' 'l' ... 'e' 'H'

Prepending an 'N' before the double quote, like

  N"Hello World!"

puts a null-terminated string onto the data stack.  I.e., the value under the
'!' will be a 0x00 and the instruction sequence would be

  0x0 '!' 'd' 'l' ... 'e' 'H'

Forth uses counted strings, which are specified here as

  C"Hello World!"

In this case the number of characters, 12, in the string is pushed onto the data
stack after the 'H', i.e., the instruction sequence would be

  '!' 'd' 'l' ... 'e' 'H' 12

Finally, a lesser-counted string specified like

  c"Hello World!"

is similar to the Forth-like counted string except that the value pushed onto
the data stack is one less than the number of characters in the string.  Here
the value pushed onto the data stack after the 'H' would be 11 instead of 12.
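
The four string forms can be summarized by the push sequence each one
generates.  The following Python sketch mirrors the descriptions above; the
function name "string_pushes" is hypothetical and escape sequences are not
handled.

```python
def string_pushes(text, kind=""):
    """Return the values pushed for "text", N"text", C"text", or c"text".

    The list is in push order, so the last element ends up on top of the
    data stack.
    """
    pushes = [ord(ch) for ch in reversed(text)]   # first character pushed last
    if kind == "N":
        pushes.insert(0, 0)                       # null terminator is deepest
    elif kind == "C":
        pushes.append(len(text))                  # character count on top
    elif kind == "c":
        pushes.append(len(text) - 1)              # count minus one on top
    elif kind != "":
        raise ValueError("unknown string prefix: %r" % kind)
    return pushes


print(string_pushes("Hello World!", "C")[-2:])
```

This prints "[72, 12]":  the 'H' (0x48) followed by the character count on top.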

Simple strings are useful for constructing more complex strings in conjunction
with other string functions.  For example, to transmit the hex values of the
top 2 values in the data stack, do something like:

  ; move the top 2 values to the return stack
  >r >r
  ; push the tail of the message onto the data stack
  N"\n\r"
  ; convert the 2 values to 2-digit hex values, LSB deepest in the stack
  r> .call(string_byte_to_hex)
  r> .call(string_byte_to_hex)
  ; prepend the identification message
  "Message:  "
  ; transmit the string, using the null terminator to terminate the loop
  :loop_transmit .outport(O_UART_TX) .jumpc(loop_transmit,nop) drop

A lesser-counted string would be used like:

  c"Status Message\r\n"
  :loop_msg swap .outport(O_UART_TX) .jumpc(loop_msg,1-) drop

These four string formats can also be used for variable definitions.  For
example, 3 variables could be allocated and initialized as follows:

  .memory ROM myrom
  .variable fred N"fred"
  .variable joe  c"joe"
  .variable moe  "moe"

These are equivalent to

  .variable fred 'f' 'r' 'e' 'd'  0
  .variable joe   2  'j' 'o' 'e'
  .variable moe  'm' 'o' 'e'

with 5 bytes allocated for the variable fred, 4 bytes for joe, and 3 bytes for
moe.

The following escaped characters are recognized:

  '\0'     null character
  '\a'     bell
  '\b'     backspace
  '\f'     form feed
  '\n'     line feed
  '\r'     carriage return
  '\t'     horizontal tab
  "\0ooo"  3-digit octal value
  "\xXX"   2-digit hex value where X is one of 0-9, a-f, or A-F
  "\Xxx"   alternate form for 2-digit hex value
  "\\"     backslash character

Unrecognized escaped characters are simply treated as that character.  For
example, '\m' is treated as the single character 'm' and '\'' is treated as the
single quote character.


INSTRUCTIONS
================================================================================

The 41 instructions are as follows (see core/9x8/doc/opcodes.html for detailed
descriptions).  Here, T is the top of the data stack, N is the next-to-top of
the data stack, and R is the top of the return stack.  All of these are the
values at the start of the instruction.

The nop instruction does nothing:

  nop           no operation

Mathematical operations drop one value from the data stack and replace the new
top with the stated value:

  &             bitwise and of N and T
  +             N + T
  -             N - T
  ^             bitwise exclusive or of N and T
  or            bitwise or of N and T

Increment and decrement replace the top of the data stack with the stated
result.

  1+            replace T with T+1
  1-            replace T with T-1

Comparison operations replace the top of the data stack with the results of the
comparison:

  -1<>          replace T with -1 if T != -1, otherwise set T to 0
  -1=           replace T with 0 if T != -1, otherwise leave T as -1
  0<>           replace T with -1 if T != 0, otherwise leave T as 0
  0=            replace T with -1 if T == 0, otherwise set T to 0

Shift/rotate operations replace the top of the data stack with the result of the
specified shift/rotate.

  0>>           shift T right 1 bit and set the msb to 0
  1>>           shift T right 1 bit and set the msb to 1
  <<0           shift T left 1 bit and set the lsb to 0
  <<1           shift T left 1 bit and set the lsb to 1
  <<msb         rotate T left 1 bit
  lsb>>         rotate T right 1 bit
  msb>>         shift T right 1 bit and set the msb to the old msb
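
The shift and rotate behaviors can be modeled on an 8-bit value as in the
Python sketch below.  It is a behavioral illustration only, not the core's HDL.
The function name "shift_rotate" is hypothetical, and the rotate-left mnemonic
is written as "<<msb" here on the assumption that this is its name.

```python
MASK = 0xFF  # T is an 8-bit value


def shift_rotate(op, t):
    """Apply one 9x8 shift/rotate instruction to the 8-bit value t."""
    t &= MASK
    if op == "0>>":
        return t >> 1
    if op == "1>>":
        return (t >> 1) | 0x80
    if op == "<<0":
        return (t << 1) & MASK
    if op == "<<1":
        return ((t << 1) | 0x01) & MASK
    if op == "<<msb":                      # assumed name for rotate left
        return ((t << 1) | (t >> 7)) & MASK
    if op == "lsb>>":                      # rotate right
        return (t >> 1) | ((t & 0x01) << 7)
    if op == "msb>>":                      # keep the old msb
        return (t >> 1) | (t & 0x80)
    raise ValueError("unknown shift/rotate instruction: %r" % op)


print(hex(shift_rotate("msb>>", 0x82)), hex(shift_rotate("lsb>>", 0x82)))
```

For T = 0x82 this prints "0xc1 0x41":  "msb>>" preserves the set msb while
"lsb>>" rotates the clear lsb into the msb.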

Note:  There is no "<

Stack manipulation instructions are as follows:

  >r            push T onto the return stack and drop T from the data stack
  drop          drop T from the data stack
  dup           push T onto the data stack
  nip           drop N from the data stack
  over          push N onto the data stack
  push          push a single byte onto the data stack, see the preceding DATA
                and STRINGS section
  r>            push R onto the data stack and drop R from the return stack
  r@            push R onto the data stack
  swap          swap N and T

Jump and call and their conditional variants are as follows and must use the
associated macro:

  call          call instruction -- use the .call macro
  callc         conditional call instruction -- use the .callc macro
  jump          jump instruction -- use the .jump macro
  jumpc         conditional jump instruction -- use the .jumpc macro
  return        return instruction -- use the .return macro

See the MEMORY section for details for these memory operations.  T is the
address for the instructions, N is the value stored.  Chained fetches insert the
value below T.  Chained stores drop N.

  fetch         memory fetch, replace T with the value fetched
  fetch+        chained memory fetch, retain and increment the address
  fetch-        chained memory fetch, retain and decrement the address
  store         memory store, drop T (N is the next value of T)
  store+        chained memory store, retain and increment the address
  store-        chained memory store, retain and decrement the address
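
The stack effects of the plain and chained memory operations can be sketched
with a Python list as the data stack (the last element is T).  This model is an
illustration under stated assumptions:  it ignores the 2-bit memory bank index,
it assumes the address wraps at 8 bits, and the function names are
hypothetical.

```python
def fetch(stack, memory, step=None):
    """fetch (step=None), fetch+ (step=+1), or fetch- (step=-1)."""
    address = stack.pop()                      # T is the address
    stack.append(memory[address])              # fetched value
    if step is not None:                       # chained: value sits below T
        stack.append((address + step) & 0xFF)


def store(stack, memory, step=None):
    """store (step=None), store+ (step=+1), or store- (step=-1)."""
    address = stack.pop()                      # T is the address
    if step is None:
        memory[address] = stack[-1]            # N stays and becomes the new T
    else:
        memory[address] = stack.pop()          # chained: N is dropped
        stack.append((address + step) & 0xFF)  # updated address is the new T


memory = [0] * 256
stack = [0x22, 0x11, 0x10]         # two values, then the start address on top
store(stack, memory, +1)           # memory[0x10] = 0x11, address becomes 0x11
store(stack, memory, +1)           # memory[0x11] = 0x22, address becomes 0x12
print(memory[0x10:0x12], stack)
```

This prints "[17, 34] [18]":  both values are written to consecutive addresses
and only the advanced address remains on the stack.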

See the INPORT and OUTPORT section for details for the input and output port
operations:

  inport        input port operation
  outport       output port operation

The .call, .callc, .jump, and .jumpc macros encode the 3 instructions required
to perform a call or jump, including the instruction that follows it.  The
default third instruction is "nop" for .call and .jump and it is "drop" for
.callc and .jumpc.  The default can be changed by specifying the optional second
argument.  The .call and .callc macros must specify a function identified by the
.function directive and the .jump and .jumpc macros must specify a label.

The .function directive takes the name of the function and the function body.
Function bodies must end with a .return or a .jump macro.  The .main directive
defines the body of the main function, i.e., the function at which the processor
starts.

The .include directive is used to read additional assembly code.  You can, for
example, put the main function in uc.s, define constants and such in consts.s,
define the memories and variables in myram.s, and include UART utilities in
uart.s.  These files could be included in uc.s through the following lines:

  .include consts.s
  .include myram.s
  .include uart.s

The assembler only includes functions that can be reached from the main
function.  Unused functions will not consume instruction space.
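
Selecting only the reachable functions amounts to a graph traversal from the
main function.  The Python below is a generic sketch of that idea and is not the
assembler's actual implementation; "reachable_functions" and "unused_helper"
are hypothetical names.

```python
def reachable_functions(calls, entry="main"):
    """Return the set of functions reachable from entry through calls."""
    reached = set()
    pending = [entry]
    while pending:
        name = pending.pop()
        if name in reached:
            continue
        reached.add(name)
        pending.extend(calls.get(name, ()))
    return reached


# Call graph of the LED flasher plus one hypothetical unused library function.
calls = {
    "main": ["repause"],
    "repause": ["pause"],
    "pause": [],
    "unused_helper": ["pause"],
}
print(sorted(reachable_functions(calls)))
```

This prints "['main', 'pause', 'repause']"; "unused_helper" is never reached
and so would not occupy instruction space.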


INPORT and OUTPORT
================================================================================

The INPORT and OUTPORT configuration commands are used to specify 2-state inputs
and outputs.  For example

  INPORT 8-bit i_value I_VALUE

specifies a single 8-bit input signal named "i_value" for the module.  The port
is accessed in assembly by ".inport(I_VALUE)" which is equivalent to the
two-instruction sequence "I_VALUE inport".  To input an 8-bit value from a FIFO
and send a single-clock-cycle wide acknowledgment strobe, use

  INPORT 8-bit,strobe i_fifo,o_fifo_ack I_FIFO

The assembly ".inport(I_FIFO)" will automatically send an acknowledgment strobe
to the FIFO through "o_fifo_ack".

A write port to an 8-bit FIFO is similarly specified by

  OUTPORT 8-bit,strobe o_fifo,o_fifo_wr O_FIFO

The assembly ".outport(O_FIFO)", which is equivalent to "O_FIFO outport drop",
will automatically send a write strobe to the FIFO through "o_fifo_wr".

Multiple signals can be packed into a single input or output port by defining
them in comma separated lists.  The associated bit masks can be defined
coincident with the port definition as follows:

  INPORT 1-bit,1-bit i_fifo_full,i_fifo_empty I_FIFO_STATUS
  CONSTANT C_FIFO_STATUS__FULL  0x02
  CONSTANT C_FIFO_STATUS__EMPTY 0x01

Checking the "full" status of the FIFO can be done by the following assembly
sequence:

  .inport(I_FIFO_STATUS) C_FIFO_STATUS__FULL &

Multiple bits can be masked using a computed value as follows (see below for
more details):

  .inport(I_FIFO_STATUS) ${C_FIFO_STATUS__FULL|C_FIFO_STATUS__EMPTY} &

The "${...}" creates an instruction to push the 8-bit value in the braces onto
the data stack.  The computation is performed using the Python "eval" function
in the context of the program constants, memory addresses, and memory sizes.
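
A minimal sketch of how such a computed value might be evaluated is shown
below.  It is not the assembler's code:  the function name "computed_value" is
hypothetical, and the accepted range of -128 to 255 is an assumption made for
the sketch.

```python
def computed_value(expression, constants):
    """Evaluate the text inside ${...} and return it as an 8-bit value."""
    # Only the supplied names are visible; builtins are hidden.  This is a
    # convenience for the sketch, not a security boundary.
    value = eval(expression, {"__builtins__": {}}, dict(constants))
    if not -128 <= value <= 255:
        raise ValueError("%r does not fit in 8 bits" % expression)
    return value & 0xFF


constants = {"C_FIFO_STATUS__FULL": 0x02, "C_FIFO_STATUS__EMPTY": 0x01}
print(computed_value("C_FIFO_STATUS__FULL|C_FIFO_STATUS__EMPTY", constants))
```

This prints "3".  Note that the sketch runs under Python 3, where "/" produces
a float; an expression written for Python 2.7 integer division, such as
"(1000-4)/4-1", would need "//" to stay an integer there.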

Preceding all of these by

  PORTCOMMENT external FIFO

produces the following in the Verilog module statement.  The I/O ports are
listed in the order in which they are declared.

  // external FIFO
  input  wire       [7:0] i_fifo,
  output reg              o_fifo_ack,
  output reg        [7:0] o_fifo,
  output reg              o_fifo_wr,
  input  wire             i_fifo_full,
  input  wire             i_fifo_empty

The HDL to implement the inputs and outputs is computer generated.  Identifying
the port name in the architecture file eliminates the possibility of
inconsistent port numbers between the HDL and the assembly.  Specifying the bit
mapping for the assembly code immediately after the port definition helps
prevent inconsistencies between the port definition and the bit mapping in the
assembly code.

The normal initial value for an outport is zero.  This can be changed by
including an optional initial value as follows.  This initial value will be
applied on system startup and when the micro controller is reset.

  OUTPORT 4-bit=4'hA o_signal O_SIGNAL

An isolated output strobe can also be created using:

  OUTPORT strobe o_strobe O_STROBE

The assembly ".outstrobe(O_STROBE)", which is equivalent to "O_STROBE outport",
is used to generate the strobe.  Since "O_STROBE" is a strobe-only outport, the
".outport" macro cannot be used with it.  Similarly, the ".outstrobe" macro
will generate an error if it is invoked with an outport that does have data.

A single-bit "set-reset" input port type is also included.  This sets a register
when an external strobe is received and clears the register when the port is
read.  For example, to capture an external timer for a polled loop, include the
following in the architecture file:

  PORTCOMMENT external timer
  INPORT set-reset i_timer I_TIMER

The following is the assembly code to conditionally call two functions when the
timer event is encountered:

  .inport(I_TIMER)
    .callc(timer_event_1,nop)
    .callc(timer_event_2)

The "nop" in the first conditional call prevents the conditional from being
dropped from the data stack so that it can be used by the subsequent conditional
function call.


PERIPHERAL
================================================================================

Peripherals are implemented via Python modules.  For example, an open drain I/O
signal, such as is required for an I2C bus, does not fit the INPORT and OUTPORT
functionality.  Instead, an "open_drain" peripheral is provided by the Python
script in "core/9x8/peripherals/open_drain.py".  This puts a tri-state I/O in
the module statement, allows it to be read through an "inport" instruction, and
allows it to be set low or released through an "outport" instruction.  An I2C
bus with separate SCL and SDA ports can then be incorporated into the processor
as follows:

  PORTCOMMENT     I2C bus
  PERIPHERAL      open_drain      inport=I_SCL \
                                  outport=O_SCL \
                                  iosignal=io_scl
  PERIPHERAL      open_drain      inport=I_SDA \
                                  outport=O_SDA \
                                  iosignal=io_sda

The default width for this peripheral is 1 bit.  The module statement will then
include the lines

  // I2C bus
  inout  wire     io_scl,
  inout  wire     io_sda

The assembly code to set the io_scl signal low is "0 .outport(O_SCL)" and to
release it is "1 .outport(O_SCL)".  These instruction sequences are actually
"0 O_SCL outport drop" and "1 O_SCL outport drop" respectively.  The "outport"
instruction drops the top of the data stack (which contained the port number)
and sends the next-to-the-top of the data stack to the designated output port.
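
The stack behavior of "outport" and of the ".outport" macro can be sketched in
Python with a list as the data stack (the last element is T).  The function
names and the port number chosen for O_SCL are hypothetical.

```python
def outport(stack, ports):
    """Model the "outport" instruction on a list-based data stack."""
    port = stack.pop()          # T holds the port number and is dropped
    ports[port] = stack[-1]     # N is sent to the port and stays on the stack


def outport_macro(stack, ports, port):
    """Model ".outport(PORT)", i.e. "PORT outport drop"."""
    stack.append(port)          # push the port number
    outport(stack, ports)
    stack.pop()                 # the trailing "drop" removes the value


O_SCL = 0                       # hypothetical port number for this sketch
ports = {}
stack = [0]                     # "0": request that io_scl be driven low
outport_macro(stack, ports, O_SCL)
print(ports, stack)
```

This prints "{0: 0} []":  the value reached the port and the data stack is left
empty, which is why the macro includes the trailing "drop".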

Two examples of I2C device operation are included in the examples directory.

The following peripherals are provided:
  adder_16bit   16-bit adder/subtractor
  AXI4_Lite_Master
                32-bit read/write AXI4-Lite Master
                Note:  The synchronous version has been tested on hardware.
  AXI4_Lite_Slave_DualPortRAM
                dual-port-RAM interface for the micro controller to act as an
                AXI4-Lite slave
  big_inport    shift reads from a single INPORT to construct a wide input
  big_outport   shift writes to a single OUTPORT to construct a wide output
  counter       counter for number of received high cycles from signal
  inFIFO_async  input FIFO with an asynchronous write clock
  latch         latch wide inputs for sampling
  monitor_stack simulation diagnostic (see below)
  open_drain    for software-implemented I2C buses or similar
  outFIFO_async output FIFO with an asynchronous read clock
  PWM_8bit      PWM generator with an 8-bit control
  timer         timing for polled loops or similar
  trace         simulation diagnostic (see below)
  UART          bidirectional UART
  UART_Rx       receive UART
  UART_Tx       transmit UART
  wide_strobe   1 to 8 bit strobe generator

The following command illustrates how to display the help message for
peripherals:

  echo "ARCHITECTURE core/9x8 Verilog" | ssbcc -P "big_inport help" - | less

User defined peripherals can be in the same directory as the architecture file
or a subdirectory named "peripherals".


PARAMETER and LOCALPARAM
================================================================================

Parameters are incorporated through the PARAMETER and LOCALPARAM configuration
commands.  For example, the clock frequency in hertz is needed for UARTs for
their baud rate generator.  The configuration command

  PARAMETER G_CLK_FREQ_HZ 97_000_000

specifies the clock frequency as 97 MHz.  The HDL instantiating the processor
can change this specification.  The frequency can also be changed through the
command-line invocation of the computer compiler.  For example,

  ssbcc -G "G_CLK_FREQ_HZ=100_000_000" myprogram.9x8

specifies that a frequency of 100 MHz be used instead of the default frequency
of 97 MHz.

The LOCALPARAM configuration command can be used to specify parameters that
should not be changed by the surrounding HDL.  For example,

  LOCALPARAM L_VERSION 24'h00_00_00

specifies a 24-bit parameter named "L_VERSION".  The 8-bit major, minor, and
build sections of the parameter can be accessed in an assembly program using
"L_VERSION[16+:8]", "L_VERSION[8+:8]", and "L_VERSION[0+:8]".

For both parameters and localparams, the default range is "[0+:8]".  The
instruction memory is initialized using the parameter value during synthesis,
not the value used to initialize the parameter.  That is, the instruction memory
initialization will be:

  s_opcodeMemory[...] = { 1'b1, L_VERSION[16+:8] };
697
 
698
The value of the localparam can be set when the computer compiler is run using
699
the "-G" option.  For example,
700
 
701
  ssbcc -G "L_VERSION=24'h01_04_03" myprogram.9x8
702
 
703
can be used in a makefile to set the version number for a release without
704
modifying the micro controller architecture file.
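
The ranges used above follow the Verilog indexed part-select form
"[lsb+:width]".  As a quick check of which byte each range selects, here is a
minimal Python sketch.  It is not part of SSBCC and the helper name
"part_select" is hypothetical.  It pulls the major, minor, and build bytes out
of the version value from the "-G" example:

```python
def part_select(value, lsb, width=8):
    """Return the bits value[lsb +: width], as in a Verilog indexed part-select."""
    return (value >> lsb) & ((1 << width) - 1)

# Version value from the "-G" example: 24'h01_04_03
L_VERSION = 0x010403

major = part_select(L_VERSION, 16)  # L_VERSION[16+:8]
minor = part_select(L_VERSION, 8)   # L_VERSION[8+:8]
build = part_select(L_VERSION, 0)   # L_VERSION[0+:8], the default range

print(major, minor, build)  # prints "1 4 3"
```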
 
 
DIAGNOSTICS AND DEBUGGING
================================================================================

A 3-character, human-readable version of the opcode can be included in
simulation waveform outputs by adding "--display-opcode" to the ssbcc command.

The stack health can be monitored during simulation by including the
"monitor_stack" peripheral through the command line.  For example, the LED
flasher example can be generated using

  ssbcc -P monitor_stack led.9x8

This allows the architecture file to be unchanged between simulation and an FPGA
build.

Stack errors include underflow and overflow, malformed data validity, and
incorrect use of the values on the return stack (returns to data values and data
operations on return addresses).  Other errors include out-of-range memory,
inport, and outport operations.

When stack errors are detected, the last 50 instructions are dumped to the
console and the simulation terminates.  The dump includes the PC, numeric
opcode, textual representation of the opcode, data stack pointer, next-to-top of
the data stack, top of the data stack, top of the return stack, and the return
stack pointer.  Invalid stack values are displayed as "XX".  The length of the
history dumped is configurable.
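
The bounded history can be pictured as a ring buffer that keeps only the most
recent entries.  The Python sketch below illustrates that idea only.  It is not
the Verilog generated for the "monitor_stack" peripheral, and the class name,
the recorded fields, and the opcode values are placeholders:

```python
from collections import deque

HISTORY_LENGTH = 50  # history length quoted in the text; it is configurable

class History:
    """Keep only the most recent executed instructions."""

    def __init__(self, length=HISTORY_LENGTH):
        # A deque with maxlen silently discards the oldest entry when full.
        self.entries = deque(maxlen=length)

    def record(self, pc, opcode, name):
        self.entries.append((pc, opcode, name))

    def dump(self):
        # One line per retained instruction, oldest first.
        return ["%04X %03X %s" % entry for entry in self.entries]

history = History()
for pc in range(120):                 # placeholder trace of 120 instructions
    history.record(pc, 0x000, "nop")  # placeholder opcode and mnemonic
lines = history.dump()                # only PCs 70 through 119 remain
```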
 
Out-of-range PC checks are also performed if the instruction space is not a
power of 2.

A "trace" peripheral is also provided that dumps the entire execution history.
This was used to validate the processor core.


MEMORY ARCHITECTURE
================================================================================

The DATA_STACK, RETURN_STACK, INSTRUCTION, and MEMORY configuration commands
allocate memory for the data stack, return stack, instruction ROM, and memory
RAM and ROM respectively.  The data stack, return stack, and memories are
normally instantiated as dual-port LUT-based memories with asynchronous reads
while the instruction memory is always instantiated with a synchronous read
architecture.

The COMBINE configuration command is used to coalesce memories and to convert
LUT-based memories to synchronous SRAM-based memories.  For example, the large
SRAMs in modern FPGAs are ideal for storing the instruction opcodes and their
dual-ported access allows either the data stack or the return stack to be
stored in a relatively small region at the end of the large instruction memory.
Memories, which require dual-ported operation, can also be instantiated in
large RAMs either individually or in combination with each other.  Conversion
to SRAM-based memories is also useful for FPGA architectures that do not have
efficient LUT-based memories.

The INSTRUCTION configuration command allocates memory for the processor
instruction space.  It has the form "INSTRUCTION N" or "INSTRUCTION N*M" where
N must be a power of 2.  The first form is used if the desired instruction
memory size is a power of 2.  The second form is used to allocate M memory
blocks of size N where M is not a power of 2.  For example, on an Altera
Cyclone III, the configuration command "INSTRUCTION 1024*3" allocates three
M9Ks for the instruction space, saving one M9K as compared to the configuration
command "INSTRUCTION 4096".

The DATA_STACK configuration command allocates memory for the data stack.  It
has the form "DATA_STACK N" where N is the commanded size of the data stack.
N must be a power of 2.

The RETURN_STACK configuration command allocates memory for the return stack and
has the same format as the DATA_STACK configuration command.

The MEMORY configuration command is used to define one to four memories, either
RAM or ROM, with up to 256 bytes each.  If no MEMORY configuration command is
issued, then no memories are allocated for the processor.  The MEMORY
configuration command has the format "MEMORY {RAM|ROM} name N" where
"{RAM|ROM}" specifies either a RAM or a ROM, name is the name of the memory and
must start with an alphabetic character, and the size of the memory, N, must be
a power of 2.  For example, "MEMORY RAM myram 64" allocates 64 bytes of memory
to form a RAM named myram.  Similarly, "MEMORY ROM lut 256" defines a 256 byte
ROM named lut.  More details on using memories are provided in the next
section.

The COMBINE configuration command can be used to combine the various memories
for more efficient processor implementation as follows:
 
  COMBINE INSTRUCTION,
  COMBINE 
  COMBINE ,
  COMBINE 

where  is one of DATA_STACK, RETURN_STACK, or a list of one
or more ROMs and  is a list of one or more RAMs and/or ROMs.  The first
configuration command reserves space at the end of the instruction memory for
the DATA_STACK, RETURN_STACK, or listed ROMs.
 
The SRAM_WIDTH configuration command is used to make the memory allocations more
efficient when the SRAM block width is more than 9 bits.  For example,
Altera's Cyclone V family has 10-bit wide memory blocks and the configuration
command "SRAM_WIDTH 10" is appropriate.  The configuration command
sequence

  INSTRUCTION     1024
  RETURN_STACK    32
  SRAM_WIDTH      10
  COMBINE         INSTRUCTION,RETURN_STACK

will use a single 10-bit memory entry for each element of the return stack
instead of packing the 10-bit values into two memory entries of a 9-bit wide
memory.
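
The arithmetic behind this example can be sketched in Python.  This is an
illustration rather than the compiler's code: the function names are
hypothetical, the 9-bit default width is inferred from the text above, and each
return stack element is assumed to occupy a whole number of memory entries.

```python
def pc_width(instruction_words):
    """Bits needed to address instruction_words instructions."""
    return max(1, (instruction_words - 1).bit_length())

def return_stack_width(instruction_words, data_width=8):
    """The return stack is the wider of the data width and the PC width."""
    return max(data_width, pc_width(instruction_words))

def entries_per_element(element_width, sram_width=9):
    """Memory entries needed to hold one stack element (ceiling division)."""
    return -(-element_width // sram_width)

width = return_stack_width(1024)               # 10 bits for "INSTRUCTION 1024"
default_entries = entries_per_element(width)   # 2 entries in a 9-bit wide memory
wide_entries = entries_per_element(width, 10)  # 1 entry with "SRAM_WIDTH 10"
```

Under these assumptions, "RETURN_STACK 32" would reserve 64 entries of the
combined memory at the default width and 32 entries with "SRAM_WIDTH 10".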
 
The following illustrates a possible configuration for a Spartan-6 with a
2048-long SRAM and relatively large 64-deep data stack.  The data stack will be
in the last 64 elements of the instruction memory and the instruction space will
be reduced to 1984 words.

  INSTRUCTION   2048
  DATA_STACK    64
  COMBINE       INSTRUCTION,DATA_STACK
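
The split of the combined memory in this example can be checked with a small
Python sketch.  The function name is hypothetical and the sketch assumes one
memory entry per data stack element, which holds for 8-bit data in a memory
that is at least 8 bits wide.

```python
def combine_with_instruction(total_words, stack_entries):
    """Split a combined memory into instruction space and stack region.

    Returns (instruction_words, stack_base).  The stack occupies the
    addresses stack_base through total_words-1.
    """
    if stack_entries >= total_words:
        raise ValueError("stack does not fit in the instruction memory")
    instruction_words = total_words - stack_entries
    return instruction_words, instruction_words

words, base = combine_with_instruction(2048, 64)
print(words, base)  # prints "1984 1984"
```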
 
The following illustrates a possible configuration for a Cyclone-III with three
M9Ks for the instruction ROM and the data stack.

  INSTRUCTION   1024*3
  DATA_STACK    64
  COMBINE       INSTRUCTION,DATA_STACK

WARNING:  Some devices, such as Xilinx' Spartan-3A devices, do not support
asynchronous reads, so the COMBINE configuration command does not work for them.

WARNING:  Xilinx XST does not correctly infer a Block RAM when the
"COMBINE INSTRUCTION,RETURN_STACK" configuration command is used and the
instruction space is 1024 instructions or larger.  Xilinx is supposed to fix
this in a future release of Vivado so the fix will only apply to 7-series or
later FPGAs.


MEMORY
================================================================================

The MEMORY configuration command is used as follows to allocate a 128-byte RAM
named "myram" and to allocate a 32-byte ROM named "myrom".  Zero to four
memories can be allocated, each with up to 256 bytes.

  MEMORY RAM myram 128
  MEMORY ROM myrom  32

The assembly code to lay out the memory uses the ".memory" directive to identify
the memory and the ".variable" directive to identify the symbol and its content.
Single or multiple values can be listed and "*N" can be used to identify a
repeat count.

  .memory RAM myram
  .variable a 0
  .variable b 0
  .variable c 0 0 0 0
  .variable d 0*4

  .memory ROM myrom
  .variable coeff_table 0x04
                        0x08
                        0x10
                        0x20
  .variable hello_world N"Hello World!\r\n"
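
To make the layout concrete, here is a Python sketch of how the "myram"
variables above could map to offsets.  It is an illustration, not the
assembler's implementation.  It assumes that variables are placed consecutively
from offset 0 in declaration order, and the helper names "expand" and "layout"
are hypothetical.

```python
def expand(tokens):
    """Expand a list of value tokens, honoring the "*N" repeat count."""
    values = []
    for token in tokens:
        if "*" in token:
            value, count = token.split("*")
            values.extend([int(value, 0)] * int(count, 0))
        else:
            values.append(int(token, 0))
    return values

def layout(variables):
    """Assign consecutive offsets to (name, tokens) pairs.

    Returns the offset of each variable and the initial memory image.
    """
    offsets, image = {}, []
    for name, tokens in variables:
        offsets[name] = len(image)
        image.extend(expand(tokens))
    return offsets, image

myram = [("a", ["0"]), ("b", ["0"]), ("c", ["0", "0", "0", "0"]), ("d", ["0*4"])]
offsets, image = layout(myram)
print(offsets)     # prints "{'a': 0, 'b': 1, 'c': 2, 'd': 6}"
print(len(image))  # prints "10", i.e., 10 of the 128 bytes are initialized
```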
 
Single values are fetched from or stored to memory using the following assembly:

  .fetchvalue(a)
  0x12 .storevalue(b)

Multi-byte values are fetched or stored as follows.  This copies the four values
from coeff_table, which is stored in a ROM, to d.

  .fetchvector(coeff_table,4) .storevector(d,4)

The memory size is available using computed values (see below) and can be used
to clear the entire memory, etc.

The available single-cycle memory operation macros are:
  .fetch(mem_name)      replaces T with the value at the address T in the memory
                        mem_name
                        Note:  .fetchram(var_name) is safer.
  .fetch+(mem_name)     pushes the value at address T in the memory mem_name
                        into the data stack below T and increments T
                        Note:  This is useful for fetching successive values
                               from memory into the data stack.
                        Note:  .fetchram+(var_name) is safer.
  .fetch-(mem_name)     similar to .fetch+ but decrements T
                        Note:  .fetchram-(var_name) is safer.
  .store(ram_name)      stores N at address T in the RAM ram_name, also drops
                        the top of the data stack
                        Note:  .storeram(var_name) is safer.
  .store+(ram_name)     stores N at address T in the RAM ram_name, also drops N
                        from the data stack and increments T
                        Note:  .storeram+(var_name) is safer.
  .store-(ram_name)     similar to .store+ but decrements T
                        Note:  .storeram-(var_name) is safer.

The following multi-cycle macros provide more generalized access to the
memories:
  .fetchindexed(var_name)
                        uses the top of the data stack as an index into var_name
                        Note:  This is equivalent to the 3 instruction sequence
                               "var_name + .fetch(mem_name)"
  .fetchoffset(var_name,offset)
                        fetches the single-byte value of var_name offset by
                        "offset" bytes
                        Note:  This is equivalent to
                               "${var_name+offset} .fetch(mem_name)"
  .fetchram(var_name)   is similar to the .fetch(mem_name) macro except that the
                        variable name is used to identify the memory instead of
                        the name of the memory
  .fetchram+(var_name)  is similar to the .fetch+(mem_name) macro except that
                        the variable name is used to identify the memory instead
                        of the name of the memory
  .fetchram-(var_name)  is similar to the .fetch-(mem_name) macro except that
                        the variable name is used to identify the memory instead
                        of the name of the memory
  .fetchvalue(var_name) fetches the single-byte value of var_name
                        Note:  This is equivalent to "var_name .fetch(mem_name)"
                               where mem_name is the memory in which var_name is
                               stored.
  .fetchvalueoffset(var_name,offset)
                        fetches the single-byte value stored at var_name+offset
                        Note:  This is equivalent to
                               "${var_name+offset} .fetch(mem_name)"
                               where mem_name is the memory in which var_name is
                               stored.
  .fetchvector(var_name,N)
                        fetches N values starting at var_name into the data
                        stack with the value at var_name at the top and the
                        value at var_name+N-1 deep in the stack.
                        Note:  This is equivalent to the N+1 operation sequence
                               "${var_name+N-1} .fetch-(mem_name) ...
                               .fetch-(mem_name) .fetch(mem_name)"
                               where ".fetch-(mem_name)" is repeated N-1 times.
  .storeindexed(var_name)
                        uses the top of the data stack as an index into var_name
                        into which to store the next-to-top of the data stack.
                        Note:  This is equivalent to the 4 instruction sequence
                               "var_name + .store(mem_name) drop".
                        Note:  The default "drop" instruction can be overridden
                               by providing the optional second argument
                               similarly to the .storevalue macro.
  .storeoffset(var_name,offset)
                        stores the single-byte value at the top of the data
                        stack at var_name offset by "offset" bytes
                        Note:  This is equivalent to
                               "${var_name+offset} .store(mem_name) drop"
                        Note:  The optional third argument is as per the
                               optional second argument of .storevalue
  .storeram(var_name)   is similar to the .store(mem_name) macro except that the
                        variable name is used to identify the RAM instead of the
                        name of the RAM
  .storeram+(var_name)  is similar to the .store+(mem_name) macro except that
                        the variable name is used to identify the RAM instead of
                        the name of the RAM
  .storeram-(var_name)  is similar to the .store-(mem_name) macro except that
                        the variable name is used to identify the RAM instead of
                        the name of the RAM
  .storevalue(var_name) stores the single-byte value at the top of the data
                        stack at var_name
                        Note:  This is equivalent to
                               "var_name .store(mem_name) drop"
                        Note:  The default "drop" instruction can be replaced by
                               providing the optional second argument.  For
                               example, the following instruction will store and
                               then decrement the value at the top of the data
                               stack:
                                 .storevalue(var_name,1-)
  .storevector(var_name,N)
                        Does the reverse of the .fetchvector macro.
                        Note:  This is equivalent to the N+2 operation sequence
                               "var_name .store+(mem_name) ... .store+(mem_name)
                               .store(mem_name) drop"
                               where ".store+(mem_name)" is repeated N-1 times.
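
The expansions of .fetchvector and .storevector can be traced with a small
Python model of the data stack.  This is a sketch based only on the macro
descriptions above, not the processor's implementation.  The stack is a Python
list with T as its last element, the function names are hypothetical, and the
addresses used for coeff_table (0) and d (6) are assumed for illustration.

```python
def fetch(stack, mem):
    """.fetch: replace T with the value at address T."""
    stack[-1] = mem[stack[-1]]

def fetch_dec(stack, mem):
    """.fetch-: push the value at address T below T, then decrement T."""
    addr = stack.pop()
    stack.append(mem[addr])
    stack.append((addr - 1) & 0xFF)

def store(stack, mem):
    """.store: store N at address T, then drop the top of the stack."""
    addr = stack.pop()
    mem[addr] = stack[-1]

def store_inc(stack, mem):
    """.store+: store N at address T, drop N, then increment T."""
    addr = stack.pop()
    mem[addr] = stack.pop()
    stack.append((addr + 1) & 0xFF)

def fetchvector(stack, mem, addr, n):
    """${addr+n-1}, then n-1 copies of .fetch-, then .fetch."""
    stack.append(addr + n - 1)
    for _ in range(n - 1):
        fetch_dec(stack, mem)
    fetch(stack, mem)

def storevector(stack, mem, addr, n):
    """addr, then n-1 copies of .store+, then .store and drop."""
    stack.append(addr)
    for _ in range(n - 1):
        store_inc(stack, mem)
    store(stack, mem)
    stack.pop()  # the trailing "drop"

rom = [0x04, 0x08, 0x10, 0x20]  # coeff_table, assumed at address 0 of the ROM
ram = [0] * 16                  # d is assumed to start at address 6 of the RAM
stack = []
fetchvector(stack, rom, 0, 4)   # the value at coeff_table ends up on top
snapshot = list(stack)          # [0x20, 0x10, 0x08, 0x04], top of stack last
storevector(stack, ram, 6, 4)   # ram[6:10] now matches coeff_table
```

In this model the copy preserves the order of the values in memory and leaves
the data stack empty.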
 
The .fetchvector and .storevector macros are intended to work with values stored
MSB first in memory and with the MSB toward the top of the data stack,
similarly to the Forth language with multi-word values.  To demonstrate how
this data structure works, consider the examples of decrementing and
incrementing a two-byte value on the data stack:

  ; Decrement a 2-byte value
  ;   swap 1- swap      - decrement the LSB
  ;   over -1=          - puts -1 on the top of the data stack if the LSB rolled
  ;                       over from 0 to -1, puts 0 on the top otherwise
  ;   +                 - decrements the MSB if the LSB rolled over
  ; ( u_LSB u_MSB - u_LSB' u_MSB' )
  .function decrement_2byte
  swap 1- swap over -1= .return(+)

  ; Increment a 2-byte value
  ;   swap 1+ swap      - increment the LSB
  ;   over 0=           - puts -1 on the top of the data stack if the LSB rolled
  ;                       over from 0xFF to 0, puts 0 on the top otherwise
  ;   -                 - increments the MSB if the LSB rolled over (by
  ;                       subtracting -1)
  ; ( u_LSB u_MSB - u_LSB' u_MSB' )
  .function increment_2byte
  swap 1+ swap over 0= .return(-)
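
The same arithmetic can be checked in Python.  The sketch below mirrors the two
functions on 8-bit values, with the comparison results represented as 0xFF
(that is, -1) for true and 0 for false as described in the comments above.  It
is a model for checking the carry and borrow logic, not generated code.

```python
def decrement_2byte(lsb, msb):
    """Model of "swap 1- swap over -1= +" on 8-bit values."""
    lsb = (lsb - 1) & 0xFF                # swap 1- swap
    flag = 0xFF if lsb == 0xFF else 0x00  # over -1=
    msb = (msb + flag) & 0xFF             # + (adding -1 borrows from the MSB)
    return lsb, msb

def increment_2byte(lsb, msb):
    """Model of "swap 1+ swap over 0= -" on 8-bit values."""
    lsb = (lsb + 1) & 0xFF                # swap 1+ swap
    flag = 0xFF if lsb == 0x00 else 0x00  # over 0=
    msb = (msb - flag) & 0xFF             # - (subtracting -1 carries into the MSB)
    return lsb, msb

print(decrement_2byte(0x00, 0x01))  # prints "(255, 0)":  0x0100 - 1 = 0x00FF
print(increment_2byte(0xFF, 0x00))  # prints "(0, 1)":    0x00FF + 1 = 0x0100
```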
 
 
COMPUTED VALUES
================================================================================

Computed values can be pushed on the stack using a "${...}" where the "..." is
evaluated in Python and cannot have any spaces.

For example, a loop that should be run 5 times can be coded as:

  ${5-1} :loop ... .jumpc(loop,1-) drop

which is a clearer indication that the loop is to be run 5 times than is the
instruction sequence

  4 :loop ...
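
The reason the loop starts at 4 rather than 5 can be seen in a small Python
model of the loop.  The model assumes that ".jumpc" tests the top of the data
stack and that the "1-" supplied as its second argument executes whether or not
the jump is taken.  That behavior is an assumption of this sketch and the
function name is hypothetical.

```python
def run_loop(initial):
    """Count executions of the body of "initial :loop ... .jumpc(loop,1-) drop"."""
    t = initial & 0xFF
    iterations = 0
    while True:
        iterations += 1     # the loop body ("...") runs here
        taken = (t != 0)    # .jumpc jumps if T is non-zero ...
        t = (t - 1) & 0xFF  # ... and "1-" decrements the counter either way
        if not taken:
            break
    return iterations       # the final "drop" discards the counter

print(run_loop(5 - 1))  # prints "5"
```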
 
Constants can be accessed in the computation.  For example, a block of memory
can be allocated as follows:

  .constant C_RESERVE
  .memory RAM myram
  ...
  .variable reserved 0*${C_RESERVE}

and the block of reserved memory can be cleared using the following loop:

  ${C_RESERVE-1} :loop 0 over .storeindexed(reserved) .jumpc(loop,1-) drop

The offsets of variables in their memory can also be accessed through a computed
value.  The value of reserved could also be cleared as follows:

  ${reserved-1} ${C_RESERVE-1} :loop >r

  r> .jumpc(loop,-1) drop drop

The body of this version of the loop is the same length as the first version.
In general, it is better to use the memory macros to access variables as they
ensure the correct memory is accessed.

The sizes of memories can also be accessed using computed values.  If "myram" is
a RAM, then "${size['myram']}" will push the size of "myram" on the stack.  As
an example, the following code will clear the entire RAM:

  ${size['myram']-1} :loop 0 swap .jumpc(loop,.store-(myram)) drop

The lengths of I/O signals can also be accessed using computed values.  If
"o_mask" is a mask, then "${size['o_mask']}" will push the size of the mask on
the stack and "${2**size['o_mask']-1}" will push a value that sets all the bits
of the mask.  The I/O signals include I/O signals instantiated by peripherals.
For example, for the configuration command

  PERIPHERAL big_outport outport=O_BIG outsignal=o_big width=47

the width of the output signal is accessible using "${size['o_big']}".  You can
set the wide signal to all zeroes using:

  ${(size['o_big']+7)/8-1} :loop 0 .outport(O_BIG) .jumpc(loop,1-) drop
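
The computed value in the last example rounds the signal width up to a whole
number of bytes.  The Python sketch below works through the numbers for the
47-bit signal.  It assumes that the "/" in the expression behaves as integer
division, which is written "//" here, and the dictionary is only a stand-in for
the assembler's "size" lookup.

```python
def outport_writes(width):
    """Number of 8-bit writes needed to cover a signal of the given width."""
    return (width + 7) // 8

def loop_start_value(width):
    """Value of (size+7)/8-1 under integer division."""
    return (width + 7) // 8 - 1

size = {"o_big": 47}  # stand-in for the assembler's size dictionary
print(loop_start_value(size["o_big"]), outport_writes(size["o_big"]))  # prints "5 6"
```

A loop counter that starts at 5 runs the loop body 6 times, so 6 bytes (48
bits) are written to cover the 47-bit signal.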
 
 
MACROS
================================================================================

There are 3 types of macros used by the assembler.

The first kind of macro is built into the assembler and is required to encode
instructions that have embedded values or have mandatory subsequent
instructions.  These include function calls, jump instructions, function return,
and memory accesses as follows:
  .call(function[,op])
  .callc(function[,op])
  .fetch(ramName)
  .fetch+(ramName)
  .fetch-(ramName)
  .jump(label[,op])
  .jumpc(label[,op])
  .return([op])
  .store(ramName)
  .store+(ramName)
  .store-(ramName)

The second kind of macro is designed to ease access to input and output
operations and memory accesses and to help ensure these operations are
correctly constructed.  These are defined as Python scripts in the
core/9x8/macros directory and are automatically loaded into the assembler.
These macros are:
  .fetchindexed(variable)
  .fetchoffset(variable,ix)
  .fetchvalue(variableName)
  .fetchvector(variableName,N)
  .inport(I_name)
  .outport(O_name[,op])
  .outstrobe(O_name)
  .storeindexed(variableName[,op])
  .storeoffset(variableName,ix[,op])
  .storevalue(variableName[,op])
  .storevector(variableName,N)

The third kind of macro is the user-defined macro.  These macros must be
registered with the assembler using the ".macro" directive.

For example, the ".push32" macro is defined by macros/9x8/push32.py and can be
used to push 32-bit (4-byte) values onto the data stack as follows:

  .macro push32
  .constant C_X 0x87654321
  .main
    ...
    .push32(0x12345678)
    .push32(C_X)
    .push32(${0x12345678^C_X})
    ...

The following macros are provided in macros/9x8:
  .push16(v)    push the 16-bit (2-byte) value "v" onto the data stack with the
                MSB at the top of the data stack
  .push24(v)    push the 24-bit (3-byte) value "v" onto the data stack with the
                MSB at the top of the data stack
  .push32(v)    push the 32-bit (4-byte) value "v" onto the data stack with the
                MSB at the top of the data stack
  .pushByte(v,ix)
                push the ix'th byte of v onto the data stack
                Note:  ix=0 designates the LSB

Directories are searched in the following order for macros:
  .
  ./macros
  include paths specified by the '-M' command line option.
  macros/9x8
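
The search order can be pictured with the following Python sketch.  It is not
the assembler's code.  The function names are hypothetical, the macro file is
assumed to be named after the macro with a ".py" extension (as with push32.py
above), and "macros/9x8" is assumed to be relative to the SSBCC installation
directory passed in as "ssbcc_root".

```python
import os

def macro_search_path(include_paths, ssbcc_root):
    """Directories to search for a macro, in the documented order."""
    return ([".", os.path.join(".", "macros")]
            + list(include_paths)                           # from '-M' options
            + [os.path.join(ssbcc_root, "macros", "9x8")])  # assumed location

def find_macro(name, include_paths=(), ssbcc_root="."):
    """Return the first existing "<name>.py" along the search path, else None."""
    for directory in macro_search_path(include_paths, ssbcc_root):
        candidate = os.path.join(directory, name + ".py")
        if os.path.isfile(candidate):
            return candidate
    return None
```

With this ordering, a macro file in the current directory takes precedence over
a file with the same name in macros/9x8.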
 
The Python scripts in core/9x8/macros and macros/9x8 can be used as design
examples for user-defined macros.  The assembler does some type checking based
on the list provided when the macro is registered by the "AddMacro" method, but
additional type checking is often warranted by the macro "emitFunction" which
emits the actual assembly code.  The ".fetchvector" and ".storevector" macros
demonstrate how to design variable-length macros.  Several macros in
core/9x8/macros illustrate designing macros with optional arguments.

It is not an error to repeat the ".macro MACRO_NAME" directive for user-defined
macros.  The assembler will issue a fatal error if a user-defined macro
conflicts with a built-in macro.


CONDITIONAL COMPILATION
================================================================================

The computer compiler and assembler recognize conditional compilation as
follows:  .IFDEF, .IFNDEF, .ELSE, and .ENDIF can be used in the architecture
file and they can be used to conditionally include functions, files, etc.
within the assembly code; .ifdef, .ifndef, .else, and .endif can be used in
function bodies, variable bodies, etc. to conditionally include assembly code,
symbols, or data.  Conditionals cannot cross file boundaries.

The computer compiler examines the list of defined symbols such as I/O ports,
I/O signals, etc. to evaluate the true/false condition associated with the
.IFDEF and .IFNDEF commands.  The "-D" option to the computer compiler is
provided to define symbols for enabling conditionally compiled configuration
commands.  Similarly, the assembler examines the list of I/O ports, I/O signals,
parameters, constants, etc. to evaluate the .IFDEF, .IFNDEF, .ifdef, and .ifndef
conditionals.

For example, a diagnostic UART can be conditionally included using the
configuration commands:

  .IFDEF ENABLE_UART
  PORTCOMMENT Diagnostic UART
  PERIPHERAL UART_Tx outport=O_UART_TX ...
  .ENDIF

And the assembly code can include conditional code fragments such as the
following, where the existence of the output port is used to determine whether
or not to send a character to that output port:

  .ifdef(O_UART_TX) 'A' .outport(O_UART_TX) .endif

Invoking the computer compiler with "-D ENABLE_UART" will generate a module with
the UART peripheral and will enable the conditional code sending the 'A'
character to the UART port.

The following code can be used to preclude multiple attempted inclusions of an
assembly library file.

  ; put these two lines near the top of the file
  .IFNDEF C_FILENAME_INCLUDED
  .constant C_FILENAME_INCLUDED 1
  ; put the library body here
  ...
  ; put this line at the bottom of the file
  .ENDIF ; .IFNDEF C_FILENAME_INCLUDED

The ".INCLUDE" configuration command can be used to read configuration commands
from additional sources.


SIMULATIONS
================================================================================

Simulations have been performed with Icarus Verilog, Verilator, and Xilinx'
ISIM.  Icarus Verilog is good for short, simple simulations and is used for the
core and peripheral test benches; Verilator for long simulations of large,
complex systems; and ISIM when Xilinx-specific cores are used.  Verilator is
the fastest simulator I've encountered.  Verilator is also used for lint
checking in the core test benches.


MEM INITIALIZATION FILE
================================================================================

A memory initialization file is produced during compilation.  This file can be
used with tools such as Xilinx' data2mem to modify the SRAM contents without
having to rebuild the entire system.  It is restricted to the opcode memory
initialization.  The file must be processed before it can be used by specific
tools; see doc/MemoryInitialization.html.

WARNING:  The values of parameters used in the assembly code must match the
instantiated design.


THEORY OF OPERATION
================================================================================

Registers are used for the top of the data stack, "T", and the next-to-top of
the data stack, "N".  The data stack is a separate memory.  This means that the
"DATA_STACK N" configuration command actually allows N+2 values in the data
stack since T and N are not stored in the N-element deep data stack.
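
The N+2 capacity can be illustrated with a behavioral Python model in which T
and N are registers and only deeper values go to the stack memory.  This is a
sketch of the bookkeeping, not the Verilog in core.v.  The class name is
hypothetical and the exceptions stand in for conditions that the hardware
itself does not report.

```python
class DataStack:
    """Data stack with T and N held in registers and the rest in memory."""

    def __init__(self, depth):
        self.depth = depth  # the N of "DATA_STACK N"
        self.t = None       # top of the data stack
        self.n = None       # next-to-top of the data stack
        self.memory = []    # separate stack memory, at most depth entries
        self.count = 0      # number of valid values held

    def push(self, value):
        if self.count >= self.depth + 2:
            raise OverflowError("data stack overflow")
        if self.count >= 2:
            self.memory.append(self.n)  # the old N spills into the memory
        self.n, self.t = self.t, value & 0xFF
        self.count += 1

    def pop(self):
        if self.count == 0:
            raise IndexError("data stack underflow")
        value = self.t
        self.t = self.n
        self.n = self.memory.pop() if self.count > 2 else None
        self.count -= 1
        return value

stack = DataStack(4)    # models "DATA_STACK 4"
for value in range(6):  # 4 + 2 values fit
    stack.push(value)
```

In this model a seventh push raises the overflow error, which corresponds to
the overflow condition reported by the stack monitor during simulation.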
 
The return stack is similar in that "R" is the top of the return stack and the
"RETURN_STACK N" configuration command allocates an additional N words of
memory.  The return stack width is the wider of the 8-bit data width and the
program counter width.

The program counter is always either incremented by 1 or is set to an address
as controlled by jump, jumpc, call, callc, and return instructions.  The
registered program counter is used to read the next opcode from the instruction
memory and this opcode is also registered in the memory.  This means that there
is a 1 clock cycle delay between the address changing and the associated
instruction being performed.  This is also part of the architecture required to
have the processor operate at one instruction per clock cycle.

Separate ALUs are used for the program counter, adders, logical operations, etc.
and MUXes are used to select the values desired for the destination registers.
The instruction execution consists of translating the upper 6 msb of the opcode
into MUX settings and performing opcode-dependent ALU operations as controlled
by the 3 lsb of the opcode (during the first half of the clock cycle) and then
setting the T, N, R, memories, etc. as controlled by the computed MUX settings.

The "core.v" file is the code for these operations.  Within this file there are
several "@xxx@" strings that specify where the computer compiler is to insert
code such as I/O declarations, memories, inport interpretation, outport
generation, peripherals, etc.

The file structure, i.e., putting the core and the assembler in "core/9x8",
should facilitate application-specific modification of the processor.  For
example, the store+, store-, fetch+, and fetch- instructions could be replaced
with additional stack manipulation operations, arithmetic operations with 2 byte
results, etc.  Simply copy the "9x8" directory to something like "9x8_XXX" and
make your modifications in that directory.  The 8-bit peripherals should still
work, but the 9x8 library functions may need rework to accommodate the
modifications.


MISCELLANEOUS
================================================================================

Features and peripherals are still being added and the documentation is
incomplete.  The output HDL is currently restricted to Verilog although a VHDL
package file is automatically generated by the computer compiler.

The "INVERT_RESET" configuration command is used to indicate an active-low reset
is input to the micro controller rather than an active-high reset.
