SSBCC.9x8 is a free Small Stack-Based Computer Compiler with a 9-bit opcode,
8-bit data core designed to facilitate FPGA HDL development.

The primary design criteria are:
- high speed (to avoid timing issues)
- low fabric utilization
- vendor independent
- development tools available for all operating systems

It has been used in Spartan-3A, Spartan-6, Virtex-6, and Artix-7 FPGAs and has
been built for Altera, Lattice, and other Xilinx devices.  It is faster and
usually smaller than vendor provided processors.

The compiler takes an architecture file that describes the micro controller
memory spaces, inputs and outputs, and peripherals and which specifies the HDL
language and source assembly.  It generates a single HDL module implementing
the entire micro controller.  No user-written HDL is required to instantiate
I/Os, program memory, etc.
 
The features are:
- high speed, low fabric utilization
- vendor-independent Verilog output with a VHDL package file
- simple Forth-like assembly language (43 instructions)
- single cycle instruction execution
- automatic generation of I/O ports
- configurable instruction, data stack, return stack, and memory utilization
- extensible set of peripherals (I2C buses, UARTs, AXI4-Lite buses, etc.)
- extensible set of macros
- memory initialization file to facilitate code development without rebuilds
- simulation diagnostics to facilitate identifying code errors
- conditionally included I/Os and peripherals, functions, and assembly code
 
SSBCC has been used for the following projects:
- operate a media translator from a parallel camera interface to an OMAP GPMC
  interface, detect and report bus errors and hardware errors, and act as an
  SPI slave to the OMAP
- operate two UART interfaces and multiple PWM controlled 2-lead bi-color LEDs
- operate and monitor the Artix-7 fabric in a Zynq system using AXI4-Lite
  master and slave buses and I2C buses for timing-critical voltage measurements
 
The only external tool required is Python 2.7.


DESCRIPTION
================================================================================
 
The computer compiler uses an architectural description of the processor stating
the sizes of the instruction memory, data stack, and return stack; the input and
output ports; RAM and ROM types and sizes; and peripherals.

The instructions are all single-cycle.  The instructions include
- 4 arithmetic instructions:  addition, subtraction, increment, and decrement
- 2 carry bit instructions:  +c and -c for addition and subtraction respectively
- 3 bit-wise logical instructions:  and, or, and exclusive or
- 7 shift and rotation instructions: <<0, <<1, 0>>, 1>>, <<msb, msb>>, and lsb>>
- 4 logical instructions:  0=, 0<>, -1=, -1<>
- 6 Forth-like data stack instructions:  drop, dup, nip, over, push, swap
- 3 Forth-like return stack instructions:  >r, r>, r@
- 2 input and output
- 6 memory read and write with optional address post increment and post decrement
- 2 jump and conditional jump
- 2 call and conditional call
- 1 function return
- 1 nop
 
The 9x8 address space is up to 8K.  This is achieved by pushing the 8 lsb of the
target address onto the data stack immediately before the jump or call
instruction and by encoding the 5 msb of the address within the jump or call
instruction.  The instruction immediately following a jump, call, or return is
executed before the instruction sequence at the destination address is executed
(this is illustrated later).
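
As a rough illustration of the address split described above, the following
Python sketch (not part of SSBCC) divides a 13-bit address into the 8 lsb that
are pushed onto the data stack and the 5 msb that are carried by the jump or
call instruction.  The function names are hypothetical.  The push encoding, a
set msb followed by the 8-bit value, is taken from the memory initialization
listing in this README.

```python
def split_address(addr):
    """Split a 13-bit instruction address into (5 msb, 8 lsb)."""
    if not 0 <= addr < 8192:
        raise ValueError("address must fit in 13 bits (8K instruction space)")
    return (addr >> 8) & 0x1F, addr & 0xFF


def push_opcode(value):
    """9-bit opcode that pushes an 8-bit value: msb set, value in the low 8 bits.

    This mirrors listing entries such as 9'h10D for pushing 0x0D.
    """
    return 0x100 | (value & 0xFF)


# Address 0x00D is where "repause" starts in the LED flasher listing.
msb5, lsb8 = split_address(0x00D)
print(msb5, hex(push_opcode(lsb8)))
```

How the 5 msb are packed into the jump or call opcode is not modeled here
because the listing only shows addresses whose 5 msb are zero.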
 
Up to four banks of memory, either RAM or ROM, are available.  Each of these can
be up to 256 bytes long, providing a total of up to 1 kB of memory.

The assembly language is Forth-like.  Built-in macros are used to encode the
jump and call instructions and to encode the 2-bit memory bank index in memory
store and fetch instructions.

The computer compiler and assembler are written in Python 2.7.  Peripherals are
implemented by Python modules which generate the I/O ports and the peripheral
HDL.

The computer compiler is documented in the doc directory.  The 9x8 core is
documented in the core/9x8/doc directory.  Several examples are provided.

The computer compiler and assembler are fully functional and there are no known
bugs.


SPEED AND RESOURCE UTILIZATION
================================================================================
These device speed and resource utilization results are copied from the build
tests.  The full results are listed in the core/9x8/build directories.  The
tests use a minimal processor implementation (clock, reset, and one output).
Device-specific scripts state how these performance numbers were obtained.
 
VENDOR          DEVICE          BEST SPEED      SMALLEST RESOURCE UTILIZATION
------          ------          ----------      -------------------------------
Altera          Cyclone-III     190.6 MHz       282 LEs           (preliminary)
Altera          Cyclone-IV      192.1 MHz       281 LEs           (preliminary)
Altera          Stratix-V       372.9 MHz       198 ALUTs         (preliminary)
Lattice         LCMXO2-640ZE-3   98.4 MHz       206 LUTs          (preliminary)
Lattice         LFE2-6E-7       157.9 MHz       203 LUTs          (preliminary)
Xilinx          Artix-7         TBD             163 slice LUTs (48 slices)
Xilinx          Kintex-7        TBD             158 slice LUTs (44 slices)
Xilinx          Spartan-3A      149.4 MHz       232 4-input LUTs (129 slices)
Xilinx          Spartan-6       193.7 MHz       124 slice LUTs (34 slices)
Xilinx          Virtex-6        275.7 MHz       122 slice LUTs (38 slices) (preliminary)
 
Disclaimer:  As with other embedded processors, these are maximum performance
claims.  Realistic implementations will produce slower maximum clock rates,
particularly with lots of I/O ports and peripherals and with the constraint of
coexisting with other subsystems in the FPGA fabric.  What these performance
numbers do provide is an estimate of the amount of slack available.  For
example, you can't realistically expect to get 110 MHz from a processor that,
under ideal conditions, places and routes at 125 MHz, but you can with a
processor that is demonstrated to place and route at 150 MHz.


EXAMPLE:
================================================================================
 
The LED flasher example demonstrates the simplicity of the architectural
specification and the Forth-like assembly language.

The architecture file, named "led.9x8", with the comments and user header
removed, is as follows:

  ARCHITECTURE    core/9x8 Verilog

  INSTRUCTION     2048
  RETURN_STACK    32
  DATA_STACK      32

  PORTCOMMENT LED on/off signal
  OUTPORT 1-bit o_led O_LED

  ASSEMBLY led.s

The ARCHITECTURE configuration command specifies the 9x8 core and the Verilog
language.  The INSTRUCTION, RETURN_STACK, and DATA_STACK configuration commands
specify the sizes of the instruction space, return stack, and data stack.  The
content of the PORTCOMMENT configuration command is inserted in the module
declaration -- this facilitates identifying signals in micro controllers with a
lot of inputs and outputs.  The single OUTPORT statement specifies a 1-bit
signal named "o_led".  This signal is accessed in the assembly code through the
symbol "O_LED".  The ASSEMBLY command specifies the single input file "led.s",
which is listed below.  The output module will be "led.v".
 
The "led.s" assembly file is as follows:

  ; Consume 256*5+4 clock cycles.
  ; ( - )
  .function pause
    0 :inner 1- dup .jumpc(inner) drop
  .return

  ; Repeat "pause" 256 times.
  ; ( - )
  .function repause
    0 :inner .call(pause) 1- dup .jumpc(inner) drop
  .return

  ; main program (as an infinite loop)
  .main
    0 :inner 1 ^ dup .outport(O_LED) .call(repause) .jump(inner)
 
This example is coded in a traditional Forth structure with the conditional
jumps consuming the top of the data stack.  Examining the "pause" function, the
".function" directive specifies the start of a function and the function name.
The "0" instruction pushes the value "0" onto the top of the data stack.
":inner" is a label for a jump instruction.  The "1-" instruction decrements the
top of the data stack.  "dup" is the Forth instruction to push a duplicate of
the top of the data stack onto the data stack.  The ".jumpc(inner)" macro
expands to three instructions as follows:  (1) push the 8 lsb of the address at
"inner" onto the data stack, (2) the conditional jump instruction with the 5 msb
of the address of "inner" (the jumpc instruction also drops the top of the data
stack with its partial address), and (3) a "drop" instruction to drop the
duplicated loop count from the top of the data stack.  Finally, the "drop"
instruction drops the loop count from the top of the data stack and the
".return" macro generates the "return" instruction and a "nop" instruction.

The function "repause" calls the "pause" function 256 times.  The main program
body is identified by the directive ".main".  This function runs an infinite
loop that toggles the lsb of the LED output, outputs the LED setting, and calls
the "repause" function.
 
A tighter version of the loop in the "pause" function can be written as

  ; Consume 256*3+3 clock cycles.
  ; ( - )
  .function pause
    0xFF :inner .jumpc(inner,1-) .return(drop)

which is 3 cycles long for each iteration.  The "drop" that is normally part
of the ".jumpc" macro has been replaced by the decrement instruction, and the
final "drop" instruction has replaced the default "nop" instruction that is
normally part of the ".return" macro.  Note that the decrement is performed
after the non-zero comparison in the "jumpc" instruction.
 
A version of the "pause" function that consumes exactly 1000 clock cycles is:

  .function pause
    ${(1000-4)/4-1} :inner nop .jumpc(inner,1-) drop .return
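
The cycle counts quoted for these "pause" variants can be checked with a small
Python model.  This is only a sketch:  it assumes one cycle per instruction and
that the loop runs once for each count value from the initial value down to
zero, and the function name is hypothetical.  The assembler evaluates "${...}"
under Python 2.7, where "/" on integers is integer division, so the sketch uses
"//" to behave the same way under Python 3.

```python
def pause_cycles(initial, cycles_per_iteration=4, overhead=4):
    """Cycles consumed by a count-down "pause" loop.

    For the 1000-cycle variant the loop body (nop, push of the address lsb,
    jumpc, 1-) costs 4 cycles and runs for the count values
    initial, initial-1, ..., 0, i.e. initial+1 times.  The overhead is the
    initial push, the final drop, the return, and the nop after the return.
    """
    return (initial + 1) * cycles_per_iteration + overhead


# Same expression as in the assembly source, written with integer division.
initial = (1000 - 4) // 4 - 1
print(initial, pause_cycles(initial))

# The tighter variant: 3 cycles per iteration, 3 cycles of overhead.
print(pause_cycles(0xFF, cycles_per_iteration=3, overhead=3))
```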
 
The instruction memory initialization for the processor module includes the
instruction mnemonics being performed at each address and replaces the "list"
file output from traditional assemblers.  The following is the memory
initialization for this LED flasher example.  The main program always starts at
address zero and functions are included in the order encountered.  Unused
library functions are not included in the generated instruction list.
 
  reg [8:0] s_opcodeMemory[2047:0];
  initial begin
    // .main
    s_opcodeMemory['h000] = 9'h100; // 0x00
    s_opcodeMemory['h001] = 9'h101; // :inner 0x01
    s_opcodeMemory['h002] = 9'h052; // ^
    s_opcodeMemory['h003] = 9'h008; // dup
    s_opcodeMemory['h004] = 9'h100; // O_LED
    s_opcodeMemory['h005] = 9'h038; // outport
    s_opcodeMemory['h006] = 9'h054; // drop
    s_opcodeMemory['h007] = 9'h10D; //
    s_opcodeMemory['h008] = 9'h0C0; // call repause
    s_opcodeMemory['h009] = 9'h000; // nop
    s_opcodeMemory['h00A] = 9'h101; //
    s_opcodeMemory['h00B] = 9'h080; // jump inner
    s_opcodeMemory['h00C] = 9'h000; // nop
    // repause
    s_opcodeMemory['h00D] = 9'h100; // 0x00
    s_opcodeMemory['h00E] = 9'h119; // :inner
    s_opcodeMemory['h00F] = 9'h0C0; // call pause
    s_opcodeMemory['h010] = 9'h000; // nop
    s_opcodeMemory['h011] = 9'h05C; // 1-
    s_opcodeMemory['h012] = 9'h008; // dup
    s_opcodeMemory['h013] = 9'h10E; //
    s_opcodeMemory['h014] = 9'h0A0; // jumpc inner
    s_opcodeMemory['h015] = 9'h054; // drop
    s_opcodeMemory['h016] = 9'h054; // drop
    s_opcodeMemory['h017] = 9'h028; // return
    s_opcodeMemory['h018] = 9'h000; // nop
    // pause
    s_opcodeMemory['h019] = 9'h100; // 0x00
    s_opcodeMemory['h01A] = 9'h05C; // :inner 1-
    s_opcodeMemory['h01B] = 9'h008; // dup
    s_opcodeMemory['h01C] = 9'h11A; //
    s_opcodeMemory['h01D] = 9'h0A0; // jumpc inner
    s_opcodeMemory['h01E] = 9'h054; // drop
    s_opcodeMemory['h01F] = 9'h054; // drop
    s_opcodeMemory['h020] = 9'h028; // return
    s_opcodeMemory['h021] = 9'h000; // nop
    s_opcodeMemory['h022] = 9'h000;
    s_opcodeMemory['h023] = 9'h000;
    s_opcodeMemory['h024] = 9'h000;
    ...
    s_opcodeMemory['h7FF] = 9'h000;
  end


DATA and STRINGS
================================================================================
 
Values are pushed onto the data stack by stating the value.  For example,

  0x10 0x20 'x'

will successively push the values 0x10, 0x20, and the character 'x' onto the
data stack.  The character 'x' will be at the top of the data stack after these
3 instructions.
 
Numeric values can be represented in binary, octal, decimal, and hex.  Binary
values start with the two characters "0b" followed by a sequence of binary
digits; octal numbers start with a "0" followed by a sequence of octal digits;
decimal values can start with a "+" or "-", have a non-zero first digit, and
have zero or more additional decimal digits; and hex values start with the two
characters "0x" followed by a sequence of hex digits.
 
Examples of equivalent numeric values are:
  binary:   0b01  0b10010
  octal:    01    022
  decimal:  1     18
  hex:      0x1   0x12
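
The literal rules above can be restated as a short Python sketch.  This is not
the assembler's own parser; the function name is hypothetical and accepting
either case for the "0b" and "0x" prefixes is an assumption made here.

```python
import re


def parse_literal(text):
    """Convert a numeric literal in one of the four described forms to an int."""
    if re.match(r"^0[bB][01]+$", text):          # binary: 0b followed by bits
        return int(text[2:], 2)
    if re.match(r"^0[xX][0-9a-fA-F]+$", text):   # hex: 0x followed by hex digits
        return int(text[2:], 16)
    if re.match(r"^0[0-7]*$", text):             # octal: leading 0, octal digits
        return int(text, 8)
    if re.match(r"^[+-]?[1-9][0-9]*$", text):    # decimal: non-zero first digit
        return int(text, 10)
    raise ValueError("unrecognized numeric literal: %r" % text)


# The four spellings from the table of equivalent values.
values = [parse_literal(s) for s in ("0b10010", "022", "18", "0x12")]
print(values)
```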
 
See the COMPUTED VALUES section for using computed values in the assembler.

There are four ways to specify strings in the assembler.  Simply stating the
string

  "Hello World!"

puts the characters in the string onto the data stack with the letter 'H' at the
top of the data stack.  I.e., the individual push operations are

  '!' 'd' 'l' ... 'e' 'H'

Prepending an 'N' to the double quote, like

  N"Hello World!"

puts a null-terminated string onto the data stack.  I.e., the value under the
'!' will be a 0x00 and the instruction sequence would be

  0x0 '!' 'd' 'l' ... 'e' 'H'

Forth uses counted strings, which are specified here as

  C"Hello World!"

In this case the number of characters, 12, in the string is pushed onto the data
stack after the 'H', i.e., the instruction sequence would be

  '!' 'd' 'l' ... 'e' 'H' 12

Finally, a lesser-counted string specified like

  c"Hello World!"

is similar to the Forth-like counted string except that the value pushed onto
the data stack is one less than the number of characters in the string.  Here
the value pushed onto the data stack after the 'H' would be 11 instead of 12.
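
The push order of the four string forms can be summarized with the following
Python sketch.  It only models the order described above; the function name and
the "kind" argument are hypothetical and escape sequences are not handled.

```python
def string_pushes(text, kind=""):
    """Return the values in the order they are pushed onto the data stack.

    kind is "" (plain), "N" (null-terminated), "C" (counted), or
    "c" (lesser-counted).  The last element ends up on top of the stack.
    """
    chars = [ord(ch) for ch in reversed(text)]   # last character is pushed first
    if kind == "":
        return chars
    if kind == "N":
        return [0] + chars                       # terminator sits below the string
    if kind == "C":
        return chars + [len(text)]               # count ends up on top
    if kind == "c":
        return chars + [len(text) - 1]           # count minus one ends up on top
    raise ValueError("unknown string kind: %r" % kind)


for kind in ("", "N", "C", "c"):
    print(repr(kind), string_pushes("Hello World!", kind))
```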
 
Simple strings are useful for constructing more complex strings in conjunction
with other string functions.  For example, to transmit the hex values of the
top 2 values in the data stack, do something like:

  ; move the top 2 values to the return stack
  >r >r
  ; push the tail of the message onto the data stack
  N"\n\r"
  ; convert the 2 values to 2-digit hex values, LSB deepest in the stack
  r> .call(string_byte_to_hex)
  r> .call(string_byte_to_hex)
  ; prepend the identification message
  "Message:  "
  ; transmit the string, using the null terminator to terminate the loop
  :loop_transmit .outport(O_UART_TX) .jumpc(loop_transmit,nop) drop

A lesser-counted string would be used like:

  c"Status Message\r\n"
  :loop_msg swap .outport(O_UART_TX) .jumpc(loop_msg,1-) drop
 
These four string formats can also be used for variable definitions.  For
example, 3 variables could be allocated and initialized as follows:

  .memory ROM myrom
  .variable fred N"fred"
  .variable joe  c"joe"
  .variable moe  "moe"

These are equivalent to

  .variable fred 'f' 'r' 'e' 'd'  0
  .variable joe   2  'j' 'o' 'e'
  .variable moe  'm' 'o' 'e'

with 5 bytes allocated for the variable fred, 4 bytes for joe, and 3 bytes for
moe.
 
The following escaped characters are recognized:

  '\0'     null character
  '\a'     bell
  '\b'     backspace
  '\f'     form feed
  '\n'     line feed
  '\r'     carriage return
  '\t'     horizontal tab
  "\0ooo"  3-digit octal value
  "\xXX"   2-digit hex value where X is one of 0-9, a-f, or A-F
  "\Xxx"   alternate form for 2-digit hex value
  "\\"     backslash character

Unrecognized escaped characters are simply treated as that character.  For
example, '\m' is treated as the single character 'm' and '\'' is treated as the
single quote character.


INSTRUCTIONS
================================================================================
 
The 43 instructions are as follows (see core/9x8/doc/opcodes.html for detailed
descriptions).  Here, T is the top of the data stack, N is the next-to-top of
the data stack, and R is the top of the return stack.  All of these are the
values at the start of the instruction.

The nop instruction does nothing:

  nop           no operation

Mathematical operations drop one value from the data stack and replace the new
top with the stated value:

  &             bitwise and of N and T
  +             N + T
  -             N - T
  ^             bitwise exclusive or of N and T
  or            bitwise or of N and T

Push the carry bit for addition or subtraction onto the data stack (see
lib/9x8/math.s for examples on using +c and -c for multi-byte arithmetic):

  +c            carry bit for N+T
  -c            carry bit for N-T

Increment and decrement replace the top of the data stack with the stated
result.

  1+            replace T with T+1
  1-            replace T with T-1

Comparison operations replace the top of the data stack with the results of the
comparison:

  -1<>          replace T with -1 if T != -1, otherwise set T to 0
  -1=           replace T with 0 if T != -1, otherwise leave T as -1
  0<>           replace T with -1 if T != 0, otherwise leave T as 0
  0=            replace T with -1 if T == 0, otherwise set T to 0
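
The four comparison instructions can be modeled in Python as follows.  This is
a sketch that assumes T is held as an unsigned 8-bit value, so that -1 is
represented as 0xFF; the function names are hypothetical.

```python
MINUS_ONE = 0xFF  # -1 as an 8-bit two's complement value (assumption: 8-bit T)


def zero_eq(t):
    """0= : -1 if T == 0, otherwise 0."""
    return MINUS_ONE if t == 0 else 0


def zero_ne(t):
    """0<> : -1 if T != 0, otherwise 0."""
    return MINUS_ONE if t != 0 else 0


def minus_one_eq(t):
    """-1= : 0 if T != -1, otherwise -1."""
    return 0 if t != MINUS_ONE else MINUS_ONE


def minus_one_ne(t):
    """-1<> : -1 if T != -1, otherwise 0."""
    return MINUS_ONE if t != MINUS_ONE else 0


for t in (0x00, 0x01, 0xFF):
    print(hex(t), hex(zero_eq(t)), hex(zero_ne(t)),
          hex(minus_one_eq(t)), hex(minus_one_ne(t)))
```

Each result is either all ones or all zeros, so it can be combined directly
with the bitwise instructions.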
 
Shift/rotate operations replace the top of the data stack with the result of
the specified shift/rotate.

  0>>           shift T right 1 bit and set the msb to 0
  1>>           shift T right 1 bit and set the msb to 1
  <<0           shift T left 1 bit and set the lsb to 0
  <<1           shift T left 1 bit and set the lsb to 1
  <<msb         rotate T left 1 bit
  lsb>>         rotate T right 1 bit
  msb>>         shift T right 1 bit and set the msb to the old msb
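
As an illustration, the right shifts, the left shifts with a fixed lsb, and the
right rotate can be modeled on an 8-bit value in Python.  This is a sketch
based only on the one-line descriptions above; the function names are
hypothetical.

```python
def shr0(t):
    """0>> : shift right 1 bit, msb becomes 0."""
    return (t >> 1) & 0x7F


def shr1(t):
    """1>> : shift right 1 bit, msb becomes 1."""
    return ((t >> 1) & 0x7F) | 0x80


def shl0(t):
    """<<0 : shift left 1 bit, lsb becomes 0."""
    return (t << 1) & 0xFE


def shl1(t):
    """<<1 : shift left 1 bit, lsb becomes 1."""
    return ((t << 1) & 0xFE) | 0x01


def rotate_right(t):
    """lsb>> : rotate right 1 bit, the old lsb becomes the msb."""
    return ((t >> 1) & 0x7F) | ((t & 0x01) << 7)


def shr_keep_msb(t):
    """msb>> : shift right 1 bit, the msb keeps its old value."""
    return ((t >> 1) & 0x7F) | (t & 0x80)


print(hex(shr0(0x81)), hex(shr1(0x02)), hex(shl0(0x81)),
      hex(shl1(0x81)), hex(rotate_right(0x01)), hex(shr_keep_msb(0x82)))
```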
 
Note:  There is no "<
 
Stack manipulation instructions are as follows:

  >r            push T onto the return stack and drop T from the data stack
  drop          drop T from the data stack
  dup           push T onto the data stack
  nip           drop N from the data stack
  over          push N onto the data stack
  push          push a single byte onto the data stack, see the preceding DATA
                and STRINGS section
  r>            push R onto the data stack and drop R from the return stack
  r@            push R onto the data stack
  swap          swap N and T
 
Jump and call and their conditional variants are as follows and must use the
associated macro:

  call          call instruction -- use the .call macro
  callc         conditional call instruction -- use the .callc macro
  jump          jump instruction -- use the .jump macro
  jumpc         conditional jump instruction -- use the .jumpc macro
  return        return instruction -- use the .return macro

See the MEMORY section for details for these memory operations.  T is the
address for the instructions, N is the value stored.  Chained fetches insert the
value below T.  Chained stores drop N.

  fetch         memory fetch, replace T with the value fetched
  fetch+        chained memory fetch, retain and increment the address
  fetch-        chained memory fetch, retain and decrement the address
  store         memory store, drop T (N is the next value of T)
  store+        chained memory store, retain and increment the address
  store-        chained memory store, retain and decrement the address

See the INPORT and OUTPORT section for details for the input and output port
operations:

  inport        input port operation
  outport       output port operation
 
The .call, .callc, .jump, and .jumpc macros encode the 3 instructions required
to perform a call or jump along with the subsequent instructions.  The default
third instruction is "nop" for .call and .jump and it is "drop" for .callc and
.jumpc.  The default can be changed by specifying the optional second argument.
The .call and .callc macros must specify a function identified by the .function
directive and the .jump and .jumpc macros must specify a label.

The .function directive takes the name of the function and the function body.
Function bodies must end with a .return or a .jump macro.  The .main directive
defines the body of the main function, i.e., the function at which the processor
starts.

The .include directive is used to read additional assembly code.  You can, for
example, put the main function in uc.s, define constants and such in consts.s,
define the memories and variables in myram.s, and include UART utilities in
uart.s.  These files could be included in uc.s through the following lines:

  .include consts.s
  .include myram.s
  .include uart.s

The assembler only includes functions that can be reached from the main
function.  Unused functions will not consume instruction space.


INPORT and OUTPORT
================================================================================
 
The INPORT and OUTPORT configuration commands are used to specify 2-state inputs
and outputs.  For example

  INPORT 8-bit i_value I_VALUE

specifies a single 8-bit input signal named "i_value" for the module.  The port
is accessed in assembly by ".inport(I_VALUE)" which is equivalent to the
two-instruction sequence "I_VALUE inport".  To input an 8-bit value from a FIFO
and send a single-clock-cycle wide acknowledgment strobe, use

  INPORT 8-bit,strobe i_fifo,o_fifo_ack I_FIFO

The assembly ".inport(I_FIFO)" will automatically send an acknowledgment strobe
to the FIFO through "o_fifo_ack".

A write port to an 8-bit FIFO is similarly specified by

  OUTPORT 8-bit,strobe o_fifo,o_fifo_wr O_FIFO

The assembly ".outport(O_FIFO)", which is equivalent to "O_FIFO outport drop",
will automatically send a write strobe to the FIFO through "o_fifo_wr".

Multiple signals can be packed into a single input or output port by defining
them in comma separated lists.  The associated bit masks can be defined
coincident with the port definition as follows:

  INPORT 1-bit,1-bit i_fifo_full,i_fifo_empty I_FIFO_STATUS
  CONSTANT C_FIFO_STATUS__FULL  0x02
  CONSTANT C_FIFO_STATUS__EMPTY 0x01

Checking the "full" status of the FIFO can be done by the following assembly
sequence:

  .inport(I_FIFO_STATUS) C_FIFO_STATUS__FULL &

Multiple bits can be masked using a computed value as follows (see below for
more details):

  .inport(I_FIFO_STATUS) ${C_FIFO_STATUS__FULL|C_FIFO_STATUS__EMPTY} &

The "${...}" creates an instruction to push the 8-bit value in the braces onto
the data stack.  The computation is performed using the Python "eval" function
in the context of the program constants, memory addresses, and memory sizes.
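
The evaluation of a computed value can be pictured with the following Python
sketch.  It is not the assembler's implementation:  the function name, the
restricted globals passed to "eval", and the 8-bit range check are assumptions
made for illustration.  Only the two constants come from the example above.

```python
# Constants as they would be defined by the CONSTANT configuration commands.
constants = {
    "C_FIFO_STATUS__FULL": 0x02,
    "C_FIFO_STATUS__EMPTY": 0x01,
}


def computed_value(expression, context):
    """Evaluate the body of a "${...}" and reduce it to an 8-bit value."""
    # Assumption: only the supplied names are visible to the expression.
    value = eval(expression, {"__builtins__": {}}, dict(context))
    if not -128 <= value <= 255:
        raise ValueError("computed value does not fit in 8 bits: %r" % value)
    return value & 0xFF


mask = computed_value("C_FIFO_STATUS__FULL|C_FIFO_STATUS__EMPTY", constants)
print(hex(mask))

# Masking a status byte in which only the "full" bit is set.
print(hex(0x02 & mask))
```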
 
Preceding all of these by

  PORTCOMMENT external FIFO

produces the following in the Verilog module statement.  The I/O ports are
listed in the order in which they are declared.

  // external FIFO
  input  wire       [7:0] i_fifo,
  output reg              o_fifo_ack,
  output reg        [7:0] o_fifo,
  output reg              o_fifo_wr,
  input  wire             i_fifo_full,
  input  wire             i_fifo_empty

The HDL to implement the inputs and outputs is computer generated.  Identifying
the port name in the architecture file eliminates the possibility of
inconsistent port numbers between the HDL and the assembly.  Specifying the bit
mapping for the assembly code immediately after the port definition helps
prevent inconsistencies between the port definition and the bit mapping in the
assembly code.
 
The normal initial value for an outport is zero.  This can be changed by
including an optional initial value as follows.  This initial value will be
applied on system startup and when the micro controller is reset.

  OUTPORT 4-bit=4'hA o_signal O_SIGNAL

An isolated output strobe can also be created using:

  OUTPORT strobe o_strobe O_STROBE

The assembly ".outstrobe(O_STROBE)", which is equivalent to "O_STROBE outport",
is used to generate the strobe.  Since "O_STROBE" is a strobe-only outport, the
".outport" macro cannot be used with it.  Similarly, the ".outstrobe" macro
will generate an error if it is invoked with an outport that does have data.
 
A single-bit "set-reset" input port type is also included.  This sets a register
when an external strobe is received and clears the register when the port is
read.  For example, to capture an external timer for a polled-loop, include the
following in the architecture file:

  PORTCOMMENT external timer
  INPORT set-reset i_timer I_TIMER

The following is the assembly code to conditionally call two functions when the
timer event is encountered:

  .inport(I_TIMER)
    .callc(timer_event_1,nop)
    .callc(timer_event_2)

The "nop" in the first conditional call prevents the conditional from being
dropped from the data stack so that it can be used by the subsequent conditional
function call.


PERIPHERAL
================================================================================
 
Peripherals are implemented via Python modules.  For example, an open drain I/O
signal, such as is required for an I2C bus, does not fit the INPORT and OUTPORT
functionality.  Instead, an "open_drain" peripheral is provided by the Python
script in "core/9x8/peripherals/open_drain.py".  This puts a tri-state I/O in
the module statement, allows it to be read through an "inport" instruction, and
allows it to be set low or released through an "outport" instruction.  An I2C
bus with separate SCL and SDA ports can then be incorporated into the processor
as follows:

  PORTCOMMENT     I2C bus
  PERIPHERAL      open_drain      inport=I_SCL \
                                  outport=O_SCL \
                                  iosignal=io_scl
  PERIPHERAL      open_drain      inport=I_SDA \
                                  outport=O_SDA \
                                  iosignal=io_sda

The default width for this peripheral is 1 bit.  The module statement will then
include the lines

  // I2C bus
  inout  wire     io_scl,
  inout  wire     io_sda

The assembly code to set the io_scl signal low is "0 .outport(O_SCL)" and to
release it is "1 .outport(O_SCL)".  These instruction sequences are actually
"0 O_SCL outport drop" and "1 O_SCL outport drop" respectively.  The "outport"
instruction drops the top of the data stack (which contained the port number)
and sends the next-to-the-top of the data stack to the designated output port.

Two examples of I2C device operation are included in the examples directory.
 
The following peripherals are provided:
  adder_16bit   16-bit adder/subtractor
  AXI4_Lite_Master
                32-bit read/write AXI4-Lite Master
                Note:  The synchronous version has been tested on hardware.
  AXI4_Lite_Slave_DualPortRAM
                dual-port-RAM interface for the micro controller to act as an
                AXI4-Lite slave
  big_inport    shift reads from a single INPORT to construct a wide input
  big_outport   shift writes to a single OUTPORT to construct a wide output
  counter       counter for number of received high cycles from signal
  inFIFO_async  input FIFO with an asynchronous write clock
  latch         latch wide inputs for sampling
  monitor_stack simulation diagnostic (see below)
  open_drain    for software-implemented I2C buses or similar
  outFIFO_async output FIFO with an asynchronous read clock
  PWM_8bit      PWM generator with an 8-bit control
  timer         timing for polled loops or similar
  trace         simulation diagnostic (see below)
  UART          bidirectional UART
  UART_Rx       receive UART
  UART_Tx       transmit UART
  wide_strobe   1 to 8 bit strobe generator
 
The following command illustrates how to display the help message for
peripherals:

  echo "ARCHITECTURE core/9x8 Verilog" | ssbcc -P "big_inport help" - | less

User defined peripherals can be in the same directory as the architecture file
or a subdirectory named "peripherals".


PARAMETER and LOCALPARAM
================================================================================
 
674
Parameters are incorporated through the PARAMETER and LOCALPARAM configuration
675
commands.  For example, the clock frequency in hertz is needed for UARTs for
676
their baud rate generator.  The configuration command
677
 
678
  PARAMETER G_CLK_FREQ_HZ 97_000_000
679
 
680
specifies the clock frequency as 97 MHz.  The HDL instantiating the processor
681
can change this specification.  The frequency can also be changed through the
682
command-line invocation of the computer compiler.  For example,
683
 
684
  ssbcc -G "G_CLK_FREQ_HZ=100_000_000" myprogram.9x8
685
 
686
specifies that a frequency of 100 MHz be used instead of the default frequency
687
of 97 MHz.
688
 
689
The LOCALPARAM configuration command can be used to specify parameters that
690
should not be changed by the surrounding HDL.  For example,
691
 
692
  LOCALPARAM L_VERSION 24'h00_00_00
693
 
694
specifies a 24-bit parameter named "L_VERSION".  The 8-bit major, minor, and
695
build sections of the parameter can be accessed in an assembly program using
696
"L_VERSION[16+:8]", "L_VERSION[8+:8]", and "L_VERSION[0+:8]".
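
The "[base+:width]" ranges read like Verilog indexed part-selects:  "width"
bits starting at bit "base".  The following Python sketch only illustrates
that arithmetic.  The helper name "part_select" is hypothetical and is not
part of ssbcc, and the sample value 24'h01_04_03 is chosen for illustration.

```python
def part_select(value, base, width=8):
    """Return value[base+:width], i.e. `width` bits starting at bit `base`."""
    return (value >> base) & ((1 << width) - 1)


# Sample 24-bit version value (24'h01_04_03), chosen for illustration.
L_VERSION = 0x010403

major = part_select(L_VERSION, 16)  # L_VERSION[16+:8]
minor = part_select(L_VERSION, 8)   # L_VERSION[8+:8]
build = part_select(L_VERSION, 0)   # L_VERSION[0+:8]

print(major, minor, build)
```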
 
For both parameters and localparams, the default range is "[0+:8]".  The
instruction memory is initialized using the parameter value during synthesis,
not the value used to initialize the parameter.  That is, the instruction memory
initialization will be:

  s_opcodeMemory[...] = { 1'b1, L_VERSION[16+:8] };

The value of the localparam can be set when the computer compiler is run using
the "-G" option.  For example,

  ssbcc -G "L_VERSION=24'h01_04_03" myprogram.9x8

can be used in a makefile to set the version number for a release without
modifying the micro controller architecture file.
 
 
DIAGNOSTICS AND DEBUGGING
================================================================================

A 3-character, human-readable version of the opcode can be included in
simulation waveform outputs by adding "--display-opcode" to the ssbcc command.

The stack health can be monitored during simulation by including the
"monitor_stack" peripheral through the command line.  For example, the LED
flasher example can be generated using

  ssbcc -P monitor_stack led.9x8

This allows the architecture file to be unchanged between simulation and an FPGA
build.

Stack errors include underflow and overflow, malformed data validity, and
incorrect use of the values on the return stack (returns to data values and data
operations on return addresses).  Other errors include out-of-range memory,
inport, and outport operations.

When stack errors are detected the last 50 instructions are dumped to the
console and the simulation terminates.  The dump includes the PC, numeric
opcode, textual representation of the opcode, data stack pointer, next-to-top of
the data stack, top of the data stack, top of the return stack, and the return
stack pointer.  Invalid stack values are displayed as "XX".  The length of the
history dumped is configurable.
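
The bounded history can be pictured as a fixed-length queue that discards its
oldest entry when a new one arrives.  The Python sketch below only illustrates
that idea.  It is not the monitor_stack implementation, and the record layout
and the names "record" and "dump" are hypothetical.

```python
from collections import deque

HISTORY_LENGTH = 50  # number of instructions kept, as in the text above

# A deque with maxlen drops its oldest entry automatically when full.
history = deque(maxlen=HISTORY_LENGTH)


def record(pc, opcode):
    """Remember one executed instruction (hypothetical record layout)."""
    history.append((pc, opcode))


def dump():
    """Format the retained history, oldest entry first."""
    return ["%04X %03X" % entry for entry in history]


# Simulate 200 executed instructions; only the last 50 are retained.
for pc in range(200):
    record(pc, 0x100 | (pc & 0xFF))

lines = dump()
```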
 
Out-of-range PC checks are also performed if the instruction space is not a
power of 2.

A "trace" peripheral is also provided that dumps the entire execution history.
This was used to validate the processor core.
 
 
MEMORY ARCHITECTURE
================================================================================

The DATA_STACK, RETURN_STACK, INSTRUCTION, and MEMORY configuration commands
allocate memory for the data stack, return stack, instruction ROM, and memory
RAM and ROM respectively.  The data stack, return stack, and memories are
normally instantiated as dual-port LUT-based memories with asynchronous reads
while the instruction memory is always instantiated with a synchronous read
architecture.

The COMBINE configuration command is used to coalesce memories and to convert
LUT-based memories to synchronous SRAM-based memories.  For example, the large
SRAMs in modern FPGAs are ideal for storing the instruction opcodes and their
dual-ported access allows either the data stack or the return stack to be
stored in a relatively small region at the end of the large instruction memory.
Memories, which require dual-ported operation, can also be instantiated in
large RAMs either individually or in combination with each other.  Conversion
to SRAM-based memories is also useful for FPGA architectures that do not have
efficient LUT-based memories.

The INSTRUCTION configuration command allocates memory for the processor
instruction space.  It has the form "INSTRUCTION N" or "INSTRUCTION N*M" where
N must be a power of 2.  The first form is used if the desired instruction
memory size is a power of 2.  The second form is used to allocate M memory
blocks of size N where M is not a power of 2.  For example, on an Altera
Cyclone III, the configuration command "INSTRUCTION 1024*3" allocates three
M9Ks for the instruction space, saving one M9K as compared to the configuration
command "INSTRUCTION 4096".
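
The block arithmetic behind the two forms can be sketched in Python.  This is
an illustration only, not the ssbcc parser:  the function names are
hypothetical and the sketch assumes one memory block holds 1024 9-bit opcodes,
as in the M9K example above.

```python
def parse_instruction(arg):
    """Parse "N" or "N*M"; return (block_size, block_count, total_words)."""
    if "*" in arg:
        n_text, m_text = arg.split("*")
        n, m = int(n_text), int(m_text)
    else:
        n, m = int(arg), 1
    if n <= 0 or n & (n - 1):
        raise ValueError("N must be a power of 2")
    return n, m, n * m


def blocks_needed(total_words, words_per_block=1024):
    """Memory blocks needed, assuming 1024 opcodes per block."""
    return -(-total_words // words_per_block)  # ceiling division


three_blocks = blocks_needed(parse_instruction("1024*3")[2])
four_blocks = blocks_needed(parse_instruction("4096")[2])
```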
 
The DATA_STACK configuration command allocates memory for the data stack.  It
has the form "DATA_STACK N" where N is the commanded size of the data stack.
N must be a power of 2.

The RETURN_STACK configuration command allocates memory for the return stack and
has the same format as the DATA_STACK configuration command.

The MEMORY configuration command is used to define one to four memories, either
RAM or ROM, with up to 256 bytes each.  If no MEMORY configuration command is
issued, then no memories are allocated for the processor.  The MEMORY
configuration command has the format "MEMORY {RAM|ROM} name N" where
"{RAM|ROM}" specifies either a RAM or a ROM, name is the name of the memory and
must start with an alphabetic character, and the size of the memory, N, must be
a power of 2.  For example, "MEMORY RAM myram 64" allocates 64 bytes of memory
to form a RAM named myram.  Similarly, "MEMORY ROM lut 256" defines a 256 byte
ROM named lut.  More details on using memories are provided in the next section.
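
The constraints on the MEMORY configuration command can be summarized as a
small Python check.  This is a sketch of the rules stated above, not the
validation code used by ssbcc, and the function name "check_memory" is
hypothetical.

```python
def check_memory(command):
    """Validate "MEMORY {RAM|ROM} name N"; return (kind, name, size)."""
    fields = command.split()
    if len(fields) != 4 or fields[0] != "MEMORY":
        raise ValueError("expected: MEMORY {RAM|ROM} name N")
    kind, name, size = fields[1], fields[2], int(fields[3])
    if kind not in ("RAM", "ROM"):
        raise ValueError("memory type must be RAM or ROM")
    if not name[0].isalpha():
        raise ValueError("name must start with an alphabetic character")
    # Power of 2 and no more than 256 bytes.
    if size <= 0 or size & (size - 1) or size > 256:
        raise ValueError("size must be a power of 2 no larger than 256")
    return kind, name, size
```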
 
The COMBINE configuration command can be used to combine the various memories
for more efficient processor implementation as follows:

  COMBINE INSTRUCTION,
  COMBINE 
  COMBINE ,
  COMBINE 

where  is one of DATA_STACK, RETURN_STACK, or a list of one
or more ROMs and  is a list of one or more RAMs and/or ROMs.  The first
configuration command reserves space at the end of the instruction memory for
the DATA_STACK, RETURN_STACK, or listed ROMs.
 
The SRAM_WIDTH configuration command is used to make the memory allocations more
efficient when the SRAM block width is more than 9 bits.  For example,
Altera's Cyclone V family has 10-bit wide memory blocks and the configuration
command "SRAM_WIDTH 10" is appropriate.  The configuration command
sequence

  INSTRUCTION     1024
  RETURN_STACK    32
  SRAM_WIDTH      10
  COMBINE         INSTRUCTION,RETURN_STACK

will use a single 10-bit memory entry for each element of the return stack
instead of packing the 10-bit values into two memory entries of a 9-bit wide
memory.
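
The entry counts quoted above follow from the width of a return stack element,
which is the wider of the 8-bit data width and the program counter width.  The
Python sketch below reproduces that arithmetic.  The function names are
hypothetical and the code is not taken from ssbcc.

```python
def return_stack_width(instruction_words, data_width=8):
    """Wider of the data width and the program counter width, in bits."""
    pc_width = (instruction_words - 1).bit_length()
    return max(data_width, pc_width)


def entries_per_element(element_width, sram_width=9):
    """Memory entries needed to hold one element (ceiling division)."""
    return -(-element_width // sram_width)


width = return_stack_width(1024)             # 10-bit program counter
default_entries = entries_per_element(width)  # 9-bit wide memory
wide_entries = entries_per_element(width, sram_width=10)
```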
 
The following illustrates a possible configuration for a Spartan-6 with a
2048-long SRAM and relatively large 64-deep data stack.  The data stack will be
in the last 64 elements of the instruction memory and the instruction space will
be reduced to 1984 words.

  INSTRUCTION   2048
  DATA_STACK    64
  COMBINE       INSTRUCTION,DATA_STACK

The following illustrates a possible configuration for a Cyclone-III with three
M9Ks for the instruction ROM and the data stack.

  INSTRUCTION   1024*3
  DATA_STACK    64
  COMBINE       INSTRUCTION,DATA_STACK
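
The space left for opcodes in these two configurations can be checked with the
following Python sketch.  The function name is hypothetical.  The 1984-word
figure is stated above, while the Cyclone-III figure is an assumption obtained
by applying the same subtraction.

```python
def instruction_space(block_words, blocks, combined_words):
    """Opcode words left after reserving the end of the memory."""
    return block_words * blocks - combined_words


spartan6_words = instruction_space(2048, 1, 64)   # INSTRUCTION 2048
cyclone3_words = instruction_space(1024, 3, 64)   # INSTRUCTION 1024*3
```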
 
WARNING:  Some devices, such as Xilinx' Spartan-3A devices, do not support
asynchronous reads, so the COMBINE configuration command does not work for them.

WARNING:  Xilinx XST does not correctly infer a Block RAM when the
"COMBINE INSTRUCTION,RETURN_STACK" configuration command is used and the
instruction space is 1024 instructions or larger.  Xilinx is supposed to fix
this in a future release of Vivado so the fix will only apply to 7-series or
later FPGAs.
 
 
MEMORY
================================================================================

The MEMORY configuration command is used as follows to allocate a 128-byte RAM
named "myram" and to allocate a 32-byte ROM named "myrom".  Zero to four
memories can be allocated, each with up to 256 bytes.

  MEMORY RAM myram 128
  MEMORY ROM myrom  32

The assembly code to lay out the memory uses the ".memory" directive to identify
the memory and the ".variable" directive to identify the symbol and its content.
Single or multiple values can be listed and "*N" can be used to identify a
repeat count.

  .memory RAM myram
  .variable a 0
  .variable b 0
  .variable c 0 0 0 0
  .variable d 0*4

  .memory ROM myrom
  .variable coeff_table 0x04
                        0x08
                        0x10
                        0x20
  .variable hello_world N"Hello World!\r\n"

Single values are fetched from or stored to memory using the following assembly:

  .fetchvalue(a)
  0x12 .storevalue(b)

Multi-byte values are fetched or stored as follows.  This copies the four values
from coeff_table, which is stored in a ROM, to d.

  .fetchvector(coeff_table,4) .storevector(d,4)

The memory size is available using computed values (see below) and can be used
to clear the entire memory, etc.
 
The available single-cycle memory operation macros are:
  .fetch(mem_name)      replaces T with the value at the address T in the memory
                        mem_name
                        Note:  .fetchram(var_name) is safer.
  .fetch+(mem_name)     pushes the value at address T in the memory mem_name
                        into the data stack below T and increments T
                        Note:  This is useful for fetching successive values
                               from memory into the data stack.
                        Note:  .fetchram+(var_name) is safer.
  .fetch-(mem_name)     similar to .fetch+ but decrements T
                        Note:  .fetchram-(var_name) is safer.
  .store(ram_name)      stores N at address T in the RAM ram_name, also drops
                        the top of the data stack
                        Note:  .storeram(var_name) is safer.
  .store+(ram_name)     stores N at address T in the RAM ram_name, also drops N
                        from the data stack and increments T
                        Note:  .storeram+(var_name) is safer.
  .store-(ram_name)     similar to .store+ but decrements T
                        Note:  .storeram-(var_name) is safer.
 
The following multi-cycle macros provide more generalized access to the
memories:
  .fetchindexed(var_name)
                        uses the top of the data stack as an index into var_name
                        Note:  This is equivalent to the 3 instruction sequence
                               "var_name + .fetch(mem_name)"
  .fetchoffset(var_name,offset)
                        fetches the single-byte value of var_name offset by
                        "offset" bytes
                        Note:  This is equivalent to
                               "${var_name+offset} .fetch(mem_name)"
  .fetchram(var_name)   is similar to the .fetch(mem_name) macro except that the
                        variable name is used to identify the memory instead of
                        the name of the memory
  .fetchram+(var_name)  is similar to the .fetch+(mem_name) macro except that
                        the variable name is used to identify the memory instead
                        of the name of the memory
  .fetchram-(var_name)  is similar to the .fetch-(mem_name) macro except that
                        the variable name is used to identify the memory instead
                        of the name of the memory
  .fetchvalue(var_name) fetches the single-byte value of var_name
                        Note:  This is equivalent to "var_name .fetch(mem_name)"
                               where mem_name is the memory in which var_name is
                               stored.
  .fetchvalueoffset(var_name,offset)
                        fetches the single-byte value stored at var_name+offset
                        Note:  This is equivalent to
                               "${var_name+offset} .fetch(mem_name)"
                               where mem_name is the memory in which var_name is
                               stored.
  .fetchvector(var_name,N)
                        fetches N values starting at var_name into the data
                        stack with the value at var_name at the top and the
                        value at var_name+N-1 deep in the stack.
                        Note:  This is equivalent to the N+1 operation sequence
                               "${var_name+N-1} .fetch-(mem_name) ...
                               .fetch-(mem_name) .fetch(mem_name)"
                               where ".fetch-(mem_name)" is repeated N-1 times.
  .storeindexed(var_name)
                        uses the top of the data stack as an index into var_name
                        into which to store the next-to-top of the data stack.
                        Note:  This is equivalent to the 4 instruction sequence
                               "var_name + .store(mem_name) drop".
                        Note:  The default "drop" instruction can be overridden
                               by providing the optional second argument
                               similarly to the .storevalue macro.
  .storeoffset(var_name,offset)
                        stores the single-byte value at the top of the data
                        stack at var_name offset by "offset" bytes
                        Note:  This is equivalent to
                               "${var_name+offset} .store(mem_name) drop"
                        Note:  The optional third argument is as per the
                               optional second argument of .storevalue
  .storeram(var_name)   is similar to the .store(mem_name) macro except that the
                        variable name is used to identify the RAM instead of the
                        name of the RAM
  .storeram+(var_name)  is similar to the .store+(mem_name) macro except that
                        the variable name is used to identify the RAM instead of
                        the name of the RAM
  .storeram-(var_name)  is similar to the .store-(mem_name) macro except that
                        the variable name is used to identify the RAM instead of
                        the name of the RAM
  .storevalue(var_name) stores the single-byte value at the top of the data
                        stack at var_name
                        Note:  This is equivalent to
                               "var_name .store(mem_name) drop"
                        Note:  The default "drop" instruction can be replaced by
                               providing the optional second argument.  For
                               example, the following instruction will store and
                               then decrement the value at the top of the data
                               stack:
                                 .storevalue(var_name,1-)
  .storevector(var_name,N)
                        Does the reverse of the .fetchvector macro.
                        Note:  This is equivalent to the N+2 operation sequence
                               "var_name .store+(mem_name) ... .store+(mem_name)
                               .store(mem_name) drop"
                               where ".store+(mem_name)" is repeated N-1 times.
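
The expansions of .fetchvector and .storevector can be traced with a small
Python model of the data stack.  This is a sketch built only from the macro
descriptions above, not the assembler or the core.  Each list holds the stack
with its top at the end, and the addresses used for coeff_table and d are
hypothetical.

```python
def fetch(stack, mem):
    """.fetch: replace T with the value at address T."""
    addr = stack.pop()
    stack.append(mem[addr])


def fetch_minus(stack, mem):
    """.fetch-: push the value at address T below T, then decrement T."""
    addr = stack.pop()
    stack.append(mem[addr])
    stack.append((addr - 1) & 0xFF)


def store(stack, mem):
    """.store: store N at address T and drop T."""
    addr = stack.pop()
    mem[addr] = stack[-1]


def store_plus(stack, mem):
    """.store+: store N at address T, drop N, and increment T."""
    addr = stack.pop()
    mem[addr] = stack.pop()
    stack.append((addr + 1) & 0xFF)


def fetchvector(stack, mem, var, n):
    stack.append(var + n - 1)      # ${var_name+N-1}
    for _ in range(n - 1):
        fetch_minus(stack, mem)
    fetch(stack, mem)


def storevector(stack, mem, var, n):
    stack.append(var)              # var_name
    for _ in range(n - 1):
        store_plus(stack, mem)
    store(stack, mem)
    stack.pop()                    # trailing "drop"


rom = [0x04, 0x08, 0x10, 0x20]     # coeff_table at hypothetical address 0
ram = [0] * 16                     # d at hypothetical address 6
stack = []
fetchvector(stack, rom, 0, 4)
after_fetch = list(stack)          # value at coeff_table is on top
storevector(stack, ram, 6, 4)
```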
 
The .fetchvector and .storevector macros are intended to work with values stored
MSB first in memory and with the MSB toward the top of the data stack,
similarly to the Forth language with multi-word values.  To demonstrate how
this data structure works, consider the examples of decrementing and
incrementing a two-byte value on the data stack:

  ; Decrement a 2-byte value
  ;   swap 1- swap      - decrement the LSB
  ;   over -1=          - puts -1 on the top of the data stack if the LSB rolled
  ;                       over from 0 to -1, puts 0 on the top otherwise
  ;   +                 - decrements the MSB if the LSB rolled over
  ; ( u_LSB u_MSB - u_LSB' u_MSB' )
  .function decrement_2byte
  swap 1- swap over -1= .return(+)

  ; Increment a 2-byte value
  ;   swap 1+ swap      - increment the LSB
  ;   over 0=           - puts -1 on the top of the data stack if the LSB rolled
  ;                       over from 0xFF to 0, puts 0 on the top otherwise
  ;   -                 - increments the MSB if the LSB rolled over (by
  ;                       subtracting -1)
  ; ( u_LSB u_MSB - u_LSB' u_MSB' )
  .function increment_2byte
  swap 1+ swap over 0= .return(-)
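
The byte arithmetic performed by these two functions can be checked with a
Python model.  This is a sketch of the comments above using 8-bit wrap-around
arithmetic, not a simulation of the core, and the Python function names simply
mirror the assembly function names.

```python
def decrement_2byte(lsb, msb):
    """Model of: swap 1- swap over -1= .return(+)"""
    lsb = (lsb - 1) & 0xFF                  # swap 1- swap
    flag = 0xFF if lsb == 0xFF else 0x00    # over -1=  (-1 is 0xFF in 8 bits)
    msb = (msb + flag) & 0xFF               # +  (adding -1 borrows from MSB)
    return lsb, msb


def increment_2byte(lsb, msb):
    """Model of: swap 1+ swap over 0= .return(-)"""
    lsb = (lsb + 1) & 0xFF                  # swap 1+ swap
    flag = 0xFF if lsb == 0x00 else 0x00    # over 0=
    msb = (msb - flag) & 0xFF               # -  (subtracting -1 carries to MSB)
    return lsb, msb


print(decrement_2byte(0x00, 0x01))
print(increment_2byte(0xFF, 0x00))
```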
 
 
COMPUTED VALUES
================================================================================

Computed values can be pushed on the stack using a "${...}" where the "..." is
evaluated in Python and cannot have any spaces.

For example, a loop that should be run 5 times can be coded as:

  ${5-1} :loop ... .jumpc(loop,1-) drop

which is a clearer indication that the loop is to be run 5 times than is the
instruction sequence

  4 :loop ...
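
The reason the initial count is one less than the number of iterations can be
seen in a small Python model of the loop.  The model assumes that .jumpc jumps
when the top of the data stack is non-zero and that the "1-" argument is
executed in place of the default "drop".  Both points are assumptions about
the instruction semantics rather than statements from this section.

```python
def run_loop(initial):
    """Count body executions of:  initial :loop ... .jumpc(loop,1-) drop"""
    t = initial & 0xFF          # loop counter on the top of the data stack
    count = 0
    while True:
        count += 1              # the "..." loop body
        taken = t != 0          # .jumpc tests the top of the data stack
        t = (t - 1) & 0xFF      # "1-" runs whether or not the jump is taken
        if not taken:
            break               # the trailing "drop" discards the counter
    return count


iterations = run_loop(5 - 1)
```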
 
Constants can be accessed in the computation.  For example, a block of memory
can be allocated as follows:

  .constant C_RESERVE
  .memory RAM myram
  ...
  .variable reserved 0*${C_RESERVE}

and the block of reserved memory can be cleared using the following loop:

  ${C_RESERVE-1} :loop 0 over .storeindexed(reserved) .jumpc(loop,1-) drop

The offsets of variables in their memory can also be accessed through a computed
value.  The value of reserved could also be cleared as follows:

  ${reserved-1} ${C_RESERVE-1} :loop >r

  r> .jumpc(loop,-1) drop drop

The body of this version of the loop is the same length as the first version.
In general, it is better to use the memory macros to access variables as they
ensure the correct memory is accessed.

The sizes of memories can also be accessed using computed values.  If "myram" is
a RAM, then "${size['myram']}" will push the size of "myram" on the stack.  As
an example, the following code will clear the entire RAM:

  ${size['myram']-1} :loop 0 swap .jumpc(loop,.store-(myram)) drop

The lengths of I/O signals can also be accessed using computed values.  If
"o_mask" is a mask, then "${size['o_mask']}" will push the size of the mask on
the stack and "${2**size['o_mask']-1}" will push a value that sets all the bits
of the mask.  The I/O signals include I/O signals instantiated by peripherals.
For example, for the configuration command

  PERIPHERAL big_outport outport=O_BIG outsignal=o_big width=47

the width of the output signal is accessible using "${size['o_big']}".  You can
set the wide signal to all zeroes using:

  ${(size['o_big']+7)/8-1} :loop 0 .outport(O_BIG) .jumpc(loop,1-) drop
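
The computed loop count rounds the signal width up to a whole number of 8-bit
writes.  The Python sketch below reproduces that arithmetic for the 47-bit
example.  The function names are hypothetical, and the sketch uses "//" so
that the integer division is explicit.

```python
def outport_writes(width):
    """Number of 8-bit writes needed to cover a signal of `width` bits."""
    return (width + 7) // 8


def loop_initial_value(width):
    """Initial loop counter:  one less than the number of writes."""
    return outport_writes(width) - 1


writes = outport_writes(47)
initial = loop_initial_value(47)
```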
 
 
MACROS
================================================================================
There are 3 types of macros used by the assembler.

The first kind of macros are built in to the assembler and are required to
encode instructions that have embedded values or have mandatory subsequent
instructions.  These include function calls, jump instructions, function return,
and memory accesses as follows:
  .call(function,[op])
  .callc(function,[op])
  .fetch(ramName)
  .fetch+(ramName)
  .fetch-(ramName)
  .jump(label,[op])
  .jumpc(label,[op])
  .return([op])
  .store(ramName)
  .store+(ramName)
  .store-(ramName)

The second kind of macros are designed to ease access to input and output
operations and for memory accesses and to help ensure these operations are
correctly constructed.  These are defined as python scripts in the
core/9x8/macros directory and are automatically loaded into the assembler.
These macros are:
  .fetchindexed(variable)
  .fetchoffset(variable,ix)
  .fetchvalue(variableName)
  .fetchvector(variableName,N)
  .inport(I_name)
  .outport(O_name[,op])
  .outstrobe(O_name)
  .storeindexed(variableName[,op])
  .storeoffset(variableName,ix[,op])
  .storevalue(variableName[,op])
  .storevector(variableName,N)

The third kind of macro is user-defined macros.  These macros must be registered
with the assembler using the ".macro" directive.

For example, the ".push32" macro is defined by macros/9x8/push32.py and can be
used to push 32-bit (4-byte) values onto the data stack as follows:

  .macro push32
  .constant C_X 0x87654321
  .main
    ...
    .push32(0x12345678)
    .push32(C_X)
    .push32(${0x12345678^C_X})
    ...

The following macros are provided in macros/9x8:
  .push16(v)    push the 16-bit (2-byte) value "v" onto the data stack with the
                MSB at the top of the data stack
  .push24(v)    push the 24-bit (3-byte) value "v" onto the data stack with the
                MSB at the top of the data stack
  .push32(v)    push the 32-bit (4-byte) value "v" onto the data stack with the
                MSB at the top of the data stack
  .pushByte(v,ix)
                push the ix'th byte of v onto the data stack
                Note:  ix=0 designates the LSB
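
The byte ordering produced by these macros can be illustrated in Python.  This
sketch models only the resulting values and is not the code in macros/9x8.
The function names are hypothetical and each list holds the data stack with
its top at the end.

```python
def push_byte(value, ix):
    """Byte selected by .pushByte(v,ix); ix=0 designates the LSB."""
    return (value >> (8 * ix)) & 0xFF


def push_multibyte(value, nbytes):
    """Bytes in push order:  LSB first so the MSB ends up on top."""
    return [push_byte(value, ix) for ix in range(nbytes)]


C_X = 0x87654321
stack = push_multibyte(0x12345678, 4)          # .push32(0x12345678)
xored = push_multibyte(0x12345678 ^ C_X, 4)    # .push32(${0x12345678^C_X})
```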
 
Directories are searched in the following order for macros:
  .
  ./macros
  include paths specified by the '-M' command line option.
  macros/9x8

The python scripts in core/9x8/macros and macros/9x8 can be used as design
examples for user-defined macros.  The assembler does some type checking based
on the list provided when the macro is registered by the "AddMacro" method, but
additional type checking is often warranted by the macro "emitFunction" which
emits the actual assembly code.  The ".fetchvector" and ".storevector" macros
demonstrate how to design variable-length macros.  Several macros in
core/9x8/macros illustrate designing macros with optional arguments.

It is not an error to repeat the ".macro MACRO_NAME" directive for user-defined
macros.  The assembler will issue a fatal error if a user-defined macro
conflicts with a built-in macro.
 
 
CONDITIONAL COMPILATION
================================================================================
The computer compiler and assembler recognize conditional compilation as
follows:  .IFDEF, .IFNDEF, .ELSE, and .ENDIF can be used in the architecture
file and they can be used to conditionally include functions, files, etc within
the assembly code; .ifdef, .ifndef, .else, and .endif can be used in function
bodies, variable bodies, etc. to conditionally include assembly code, symbols,
or data.  Conditionals cannot cross file boundaries.

The computer compiler examines the list of defined symbols such as I/O ports,
I/O signals, etc. to evaluate the true/false condition associated with the
.IFDEF and .IFNDEF commands.  The "-D" option to the computer compiler is
provided to define symbols for enabling conditionally compiled configuration
commands.  Similarly, the assembler examines the list of I/O ports, I/O signals,
parameters, constants, etc. to evaluate the .IFDEF, .IFNDEF, .ifdef, and .ifndef
conditionals.

For example, a diagnostic UART can be conditionally included using the
configuration commands:

  .IFDEF ENABLE_UART
  PORTCOMMENT Diagnostic UART
  PERIPHERAL UART_Tx outport=O_UART_TX ...
  .ENDIF

And the assembly code can include conditional code fragments such as the
following, where the existence of the output port is used to determine whether
or not to send a character to that output port:

  .ifdef(O_UART_TX) 'A' .outport(O_UART_TX) .endif

Invoking the computer compiler with "-D ENABLE_UART" will generate a module with
the UART peripheral and will enable the conditional code sending the 'A'
character to the UART port.

The following code can be used to preclude multiple attempted inclusions of an
assembly library file.

  ; put these two lines near the top of the file
  .IFNDEF C_FILENAME_INCLUDED
  .constant C_FILENAME_INCLUDED 1
  ; put the library body here
  ...
  ; put this line at the bottom of the file
  .ENDIF ; .IFNDEF C_FILENAME_INCLUDED

The ".INCLUDE" configuration command can be used to read configuration commands
from additional sources.
 
 
SIMULATIONS
================================================================================

Simulations have been performed with Icarus Verilog, Verilator, and Xilinx'
ISIM.  Icarus Verilog is good for short, simple simulations and is used for the
core and peripheral test benches; Verilator for long simulations of large,
complex systems; and ISIM when Xilinx-specific cores are used.  Verilator is
the fastest simulator I've encountered.  Verilator is also used for lint
checking in the core test benches.
 
 
MEM INITIALIZATION FILE
================================================================================

A memory initialization file is produced during compilation.  This file can be
used with tools such as Xilinx' data2mem to modify the SRAM contents without
having to rebuild the entire system.  It is restricted to the opcode memory
initialization.  The file must be processed before it can be used by specific
tools; see doc/MemoryInitialization.html.

WARNING:  The values of parameters used in the assembly code must match the
instantiated design.
 
 
THEORY OF OPERATION
================================================================================

Registers are used for the top of data stack, "T", and the next-to-top of the
data stack, "N".  The data stack is a separate memory.  This means that the
"DATA_STACK N" configuration command actually allows N+2 values in the data
stack since T and N are not stored in the N-element deep data stack.

The return stack is similar in that "R" is the top of the return stack and the
"RETURN_STACK N" allocates an additional N words of memory.  The return stack is
the wider of the 8-bit data width and the program counter width.

The program counter is always either incremented by 1 or is set to an address
as controlled by jump, jumpc, call, callc, and return instructions.  The
registered program counter is used to read the next opcode from the instruction
memory and this opcode is also registered in the memory.  This means that there
is a 1 clock cycle delay between the address changing and the associated
instruction being performed.  This is also part of the architecture required to
have the processor operate at one instruction per clock cycle.

Separate ALUs are used for the program counter, adders, logical operations, etc.
and MUXes are used to select the values desired for the destination registers.
The instruction execution consists of translating the upper 6 msb of the opcode
into MUX settings and performing opcode-dependent ALU operations as controlled
by the 3 lsb of the opcode (during the first half of the clock cycle) and then
setting the T, N, R, memories, etc. as controlled by the computed MUX settings.
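
The split of the 9-bit opcode into its upper 6 bits and lower 3 bits can be
written out in Python.  This sketch only shows the bit slicing described
above.  The function name is hypothetical and no actual opcode encodings are
implied.

```python
def split_opcode(opcode):
    """Return (upper 6 bits, lower 3 bits) of a 9-bit opcode."""
    if not 0 <= opcode < 512:
        raise ValueError("opcode must fit in 9 bits")
    upper6 = opcode >> 3    # translated into MUX settings
    lower3 = opcode & 0x7   # controls the opcode-dependent ALU operation
    return upper6, lower3


fields = split_opcode(0b101010011)
```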
 
The "core.v" file is the code for these operations.  Within this file there are
several "@xxx@" strings that specify where the computer compiler is to insert
code such as I/O declarations, memories, inport interpretation, outport
generation, peripherals, etc.

The file structure, i.e., putting the core and the assembler in "core/9x8"
should facilitate application-specific modification of the processor.  For
example, the store+, store-, fetch+, and fetch- instructions could be replaced
with additional stack manipulation operations, arithmetic operations with 2 byte
results, etc.  Simply copy the "9x8" directory to something like "9x8_XXX" and
make your modifications in that directory.  The 8-bit peripherals should still
work, but the 9x8 library functions may need rework to accommodate the
modifications.
 
 
MISCELLANEOUS
================================================================================

Features and peripherals are still being added and the documentation is
incomplete.  The output HDL is currently restricted to Verilog although a VHDL
package file is automatically generated by the computer compiler.

The "INVERT_RESET" configuration command is used to indicate an active-low reset
is input to the micro controller rather than an active-high reset.

A VHDL package file is automatically generated by the computer compiler.
