OpenCores
URL https://opencores.org/ocsvn/ssbcc/ssbcc/trunk

Subversion Repositories ssbcc

[/] [ssbcc/] [trunk/] [README] - Blame information for rev 9


Line No. Rev Author Line
1 2 sinclairrf
SSBCC.9x8 is a free Small Stack-Based Computer Compiler with a 9-bit opcode,
2 4 sinclairrf
8-bit data core designed to facilitate FPGA HDL development.
3 2 sinclairrf
 
4 4 sinclairrf
The primary design criteria are:
5
- high speed (to avoid timing issues)
6
- low fabric utilization
7
- vendor independent
8
- development tools available for all operating systems
9
 
10
It has been used in Spartan-3A, Spartan-6, Virtex-6, and Artix-7 FPGAs and has
11
been built for Altera, Lattice, and other Xilinx devices.  It is faster and
12
usually smaller than vendor provided processors.
13
 
14 2 sinclairrf
The compiler takes an architecture file that describes the micro controller
15
memory spaces, inputs and outputs, and peripherals and which specifies the HDL
16
language and source assembly.  It generates a single HDL module implementing
17
the entire micro controller.  No user-written HDL is required to instantiate
18
I/Os, program memory, etc.
19
 
20 4 sinclairrf
The features are:
21
- high speed, low fabric utilization
22
- vendor-independent Verilog output with a VHDL package file
23 7 sinclairrf
- simple Forth-like assembly language (43 instructions)
24 4 sinclairrf
- single cycle instruction execution
25
- automatic generation of I/O ports
26
- configurable instruction, data stack, return stack, and memory utilization
27
- extensible set of peripherals (I2C busses, UARTs, AXI4-Lite busses, etc.)
28
- extensible set of macros
29
- memory initialization file to facilitate code development without rebuilds
30
- simulation diagnostics to facilitate identifying code errors
31
- conditionally included I/Os and peripherals, functions, and assembly code
32
 
33 2 sinclairrf
SSBCC has been used for the following projects:
34
- operate a media translator from a parallel camera interface to an OMAP GPMC
35
  interface, detect and report bus errors and hardware errors, and act as an
36
  SPI slave to the OMAP
37
- operate two UART interfaces and multiple PWM controlled 2-lead bi-color LEDs
38
- operate and monitor the Artix-7 fabric in a Zynq system using AXI4-Lite
39
  master and slave buses, I2C buses for timing-critical voltage measurements
40
 
41 4 sinclairrf
The only external tool required is Python 2.7.
42 2 sinclairrf
 
43 4 sinclairrf
 
44 2 sinclairrf
DESCRIPTION
45
================================================================================
46
 
47
The computer compiler uses an architectural description of the processor stating
48
the sizes of the instruction memory, data stack, and return stack; the input and
49
output ports; RAM and ROM types and sizes; and peripherals.
50
 
51 9 sinclairrf
The instructions are all single-cycle.  The instructions are:
52 4 sinclairrf
- 4 arithmetic instructions:  addition, subtraction, increment, and decrement
53 7 sinclairrf
- 2 carry bit instructions:  +c and -c for addition and subtraction respectively
54 4 sinclairrf
- 3 bit-wise logical instructions:  and, or, and exclusive or
55 9 sinclairrf
- 7 shift and rotation instructions: <<0, <<1, 0>>, 1>>, <<msb, msb>>, and lsb>>
56 4 sinclairrf
- 4 logical instructions:  0=, 0<>, -1=, -1<>
57
- 6 Forth-like data stack instructions:  drop, dup, nip, over, push, swap
58
- 3 Forth-like return stack instructions:  >r, r>, r@
59 9 sinclairrf
- 2 I/O: inport, outport
60 4 sinclairrf
- 6 memory read and write with optional address post increment and post decrement
61
- 2 jump and conditional jump
62
- 2 call and conditional call
63
- 1 function return
64
- 1 nop
65 2 sinclairrf
 
66
The 9x8 address space is up to 8K.  This is achieved by pushing the 8 lsb of the
67
target address onto the data stack immediately before the jump or call
68
instruction and by encoding the 5 msb of the address within the jump or call
69
instruction.  The instruction immediately following a jump, call, or return is
70
executed before the instruction sequence at the destination address is executed
71
(this is illustrated later).
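
The address split can be sketched in Python.  This is an illustration, not
SSBCC source:  the function name is hypothetical, the base opcodes are read off
the memory initialization listing in the EXAMPLE section (where all address msb
bits are zero), and placing the 5 msb in the low bits of the jump or call opcode
is an assumption.

```python
# Base opcodes as they appear in the LED flasher memory listing.
# Where the 5 address msb sit inside the opcode is assumed, not documented here.
BASE_OPCODE = {"jump": 0x080, "jumpc": 0x0A0, "call": 0x0C0}
PUSH_OPCODE = 0x100  # 9'h1xx pushes the 8-bit value xx


def encode_transfer(kind, address):
    """Return the (push, jump-or-call) opcode pair for a 13-bit address."""
    if not 0 <= address < 8192:
        raise ValueError("address must fit in 13 bits (8K instructions)")
    lsb8 = address & 0xFF          # pushed onto the data stack first
    msb5 = (address >> 8) & 0x1F   # carried inside the jump or call opcode
    return PUSH_OPCODE | lsb8, BASE_OPCODE[kind] | msb5


# "call repause" in the listing: repause starts at address 0x00D.
print([hex(op) for op in encode_transfer("call", 0x00D)])  # ['0x10d', '0xc0']
```

With 8 + 5 = 13 address bits the largest reachable instruction space is
2**13 = 8192 instructions, which is the 8K limit stated above.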
72
 
73
Up to four banks of memory, either RAM or ROM, are available.  Each of these can
74
be up to 256 bytes long, providing a total of up to 1 kB of memory.
75
 
76 4 sinclairrf
The assembly language is Forth-like.  Built-in macros are used to encode the
77
jump and call instructions and to encode the 2-bit memory bank index in memory
78
store and fetch instructions.
79 2 sinclairrf
 
80
The computer compiler and assembler are written in Python 2.7.  Peripherals are
81
implemented by Python modules which generate the I/O ports and the peripheral
82
HDL.
83
 
84
The computer compiler is documented in the doc directory.  The 9x8 core is
85
documented in the core/9x8/doc directory.  Several examples are provided.
86
 
87
The computer compiler and assembler are fully functional and there are no known
88
bugs.
89
 
90
 
91
SPEED AND RESOURCE UTILIZATION
92
================================================================================
93 9 sinclairrf
 
94 2 sinclairrf
These device speed and resource utilization results are copied from the build
95 7 sinclairrf
tests.  The full results are listed in the core/9x8/build directories.  The
96
tests use a minimal processor implementation (clock, reset, and one output).
97
Device-specific scripts state how these performance numbers were obtained.
98 2 sinclairrf
 
99
VENDOR          DEVICE          BEST SPEED      SMALLEST RESOURCE UTILIZATION
------          ------          ----------      -------------------------------
Altera          Cyclone-III     190.6 MHz       282 LEs           (preliminary)
Altera          Cyclone-IV      192.1 MHz       281 LEs           (preliminary)
Altera          Stratix-V       372.9 MHz       198 ALUTs         (preliminary)
Lattice         LCMXO2-640ZE-3   98.4 MHz       206 LUTs          (preliminary)
Lattice         LFE2-6E-7       157.9 MHz       203 LUTs          (preliminary)
Xilinx          Artix-7         TBD             163 slice LUTs (48 slices)
Xilinx          Kintex-7        TBD             158 slice LUTs (44 slices)
Xilinx          Spartan-3A      149.4 MHz       232 4-input LUTs (129 slices)
Xilinx          Spartan-6       193.7 MHz       124 slice LUTs (34 slices)
Xilinx          Virtex-6        275.7 MHz       122 slice LUTs (38 slices) (preliminary)
111 2 sinclairrf
 
112
Disclaimer:  As with other embedded processors, these are maximum performance
claims.  Realistic implementations will produce slower maximum clock rates,
particularly with lots of I/O ports and peripherals and with the constraint of
coexisting with other subsystems in the FPGA fabric.  What these performance
numbers do provide is an estimate of the amount of slack available.  For
example, you can't realistically expect to get 110 MHz from a processor that,
under ideal conditions, places and routes at 125 MHz, but you can with a
processor that is demonstrated to place and route at 150 MHz.
120 2 sinclairrf
 
121
 
122
EXAMPLE
123
================================================================================
124
 
125
The LED flasher example demonstrates the simplicity of the architectural
126
specification and the Forth-like assembly language.
127
 
128
The architecture file, named "led.9x8", with the comments and user header
129
removed, is as follows:
130
 
131
  ARCHITECTURE    core/9x8 Verilog
132
 
133
  INSTRUCTION     2048
134
  RETURN_STACK    32
135
  DATA_STACK      32
136
 
137
  PORTCOMMENT LED on/off signal
138
  OUTPORT 1-bit o_led O_LED
139
 
140
  ASSEMBLY led.s
141
 
142
The ARCHITECTURE configuration command specifies the 9x8 core and the Verilog
143
language.  The INSTRUCTION, RETURN_STACK, and DATA_STACK configuration commands
144
specify the sizes of the instruction space, return stack, and data stack.  The
145
content of the PORTCOMMENT configuration command is inserted in the module
146
declaration -- this facilitates identifying signals in micro controllers with a
147
lot of inputs and outputs.  The single OUTPORT statement specifies a 1-bit
148
signal named "o_led".  This signal is accessed in the assembly code through the
149
symbol "O_LED".  The ASSEMBLY command specifies the single input file "led.s",
which is listed below.  The output module will be "led.v".
151
 
152
The "led.s" assembly file is as follows:
153
 
154
  ; Consume 256*5+4 clock cycles.
  ; ( - )
  .function pause
    0 :inner 1- dup .jumpc(inner) drop
  .return

  ; Repeat "pause" 256 times.
  ; ( - )
  .function repause
    0 :inner .call(pause) 1- dup .jumpc(inner) drop
  .return

  ; main program (as an infinite loop)
  .main
    0 :inner 1 ^ dup .outport(O_LED) .call(repause) .jump(inner)
169
 
170
This example is coded in a traditional Forth structure with the conditional
171
jumps consuming the top of the data stack.  Examining the "pause" function, the
172
".function" directive specifies the start of a function and the function name.
173
The "0" instruction pushes the value "0" onto the top of the data stack.
174
":inner" is a label for a jump instruction.  The "1-" instruction decrements the
175
top of the data stack.  "dup" is the Forth instruction to push a duplicate of
176
the top of the data stack onto the data stack.  The ".jumpc(inner)" macro
177
expands to three instructions as follows:  (1) push the 8 lsb of the address at
178
"inner" onto the data stack, (2) the conditional jump instruction with the 5 msb
179
of the address of "inner" (the jumpc instruction also drops the top of the data
180
stack with its partial address), and (3) a "drop" instruction to drop the
181
duplicated loop count from the top of the data stack.  Finally, the "drop"
182
instruction drops the loop count from the top of the data stack and the
183
".return" macro generates the "return" instruction and a "nop" instruction.
184
 
185
The function "repause" calls the "pause" function 256 times.  The main program
186
body is identified by the directive ".main".  This function runs an infinite loop
187
that toggles the lsb of the LED output, outputs the LED setting, and calls the
188
"repause" function.
189
 
190
A tighter version of the loop in the "pause" function can be written as
191
 
192
  ; Consume 256*3+3 clock cycles.
193
  ; ( - )
194
  .function pause
195
    0xFF :inner .jumpc(inner,1-) .return(drop)
196
 
197
which is 3 cycles long for each iteration; the "drop" that is normally part
198
of the ".jumpc" macro has been replaced by the decrement instruction, and the
199
final "drop" instruction has replaced the default "nop" instruction that is
200
normally part of the ".return" macro.  Note that the decrement is performed
201
after the non-zero comparison in the "jumpc" instruction.
202
 
203
A version of the "pause" function that consumes exactly 1000 clock cycles is:
204
 
205
  .function pause
206
    ${(1000-4)/4-1} :inner nop .jumpc(inner,1-) drop .return
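
The cycle counts quoted for the three versions of "pause" can be checked with a
small Python model.  This is a sketch, not part of SSBCC:  the function name is
hypothetical, and the per-pass and overhead cycle counts are taken from the
comments in the listings above (the overhead covers the initial push and the
return sequence).  Note that "(1000-4)/4-1" is an integer division under
Python 2.7; Python 3 needs "//" to get the same result.

```python
def loop_cycles(initial, cycles_per_pass, overhead, decrement_first=False):
    """Count clock cycles for an 8-bit count-down loop.

    decrement_first=True models "1- dup .jumpc(inner)", where the count is
    decremented before it is tested.  False models ".jumpc(inner,1-)", where
    the conditional jump sees the count before the decrement.
    """
    count = initial & 0xFF
    cycles = overhead
    while True:
        cycles += cycles_per_pass
        if decrement_first:
            count = (count - 1) & 0xFF
            tested = count
        else:
            tested = count
            count = (count - 1) & 0xFF
        if tested == 0:          # jump not taken, loop exits
            return cycles


print(loop_cycles(0, 5, 4, decrement_first=True))   # 256*5+4 = 1284
print(loop_cycles(0xFF, 3, 3))                      # 256*3+3 = 771
print(loop_cycles((1000 - 4) // 4 - 1, 4, 4))       # 1000
```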
207
 
208
The instruction memory initialization for the processor module includes the
209
instruction mnemonics being performed at each address and replaces the "list"
210
file output from traditional assemblers.  The following is the memory
211
initialization for this LED flasher example.  The main program always starts at
212
address zero and functions are included in the order encountered.  Unused
213
library functions are not included in the generated instruction list.
214
 
215
  reg [8:0] s_opcodeMemory[2047:0];
216
  initial begin
217
    // .main
218
    s_opcodeMemory['h000] = 9'h100; // 0x00
219
    s_opcodeMemory['h001] = 9'h101; // :inner 0x01
220
    s_opcodeMemory['h002] = 9'h052; // ^
221
    s_opcodeMemory['h003] = 9'h008; // dup
222
    s_opcodeMemory['h004] = 9'h100; // O_LED
223
    s_opcodeMemory['h005] = 9'h038; // outport
224
    s_opcodeMemory['h006] = 9'h054; // drop
225
    s_opcodeMemory['h007] = 9'h10D; //
226
    s_opcodeMemory['h008] = 9'h0C0; // call repause
227
    s_opcodeMemory['h009] = 9'h000; // nop
228
    s_opcodeMemory['h00A] = 9'h101; //
229
    s_opcodeMemory['h00B] = 9'h080; // jump inner
230
    s_opcodeMemory['h00C] = 9'h000; // nop
231
    // repause
232
    s_opcodeMemory['h00D] = 9'h100; // 0x00
233
    s_opcodeMemory['h00E] = 9'h119; // :inner
234
    s_opcodeMemory['h00F] = 9'h0C0; // call pause
235
    s_opcodeMemory['h010] = 9'h000; // nop
236
    s_opcodeMemory['h011] = 9'h05C; // 1-
237
    s_opcodeMemory['h012] = 9'h008; // dup
238
    s_opcodeMemory['h013] = 9'h10E; //
239
    s_opcodeMemory['h014] = 9'h0A0; // jumpc inner
240
    s_opcodeMemory['h015] = 9'h054; // drop
241
    s_opcodeMemory['h016] = 9'h054; // drop
242
    s_opcodeMemory['h017] = 9'h028; // return
243
    s_opcodeMemory['h018] = 9'h000; // nop
244
    // pause
245
    s_opcodeMemory['h019] = 9'h100; // 0x00
246
    s_opcodeMemory['h01A] = 9'h05C; // :inner 1-
247
    s_opcodeMemory['h01B] = 9'h008; // dup
248
    s_opcodeMemory['h01C] = 9'h11A; //
249
    s_opcodeMemory['h01D] = 9'h0A0; // jumpc inner
250
    s_opcodeMemory['h01E] = 9'h054; // drop
251
    s_opcodeMemory['h01F] = 9'h054; // drop
252
    s_opcodeMemory['h020] = 9'h028; // return
253
    s_opcodeMemory['h021] = 9'h000; // nop
254
    s_opcodeMemory['h022] = 9'h000;
255
    s_opcodeMemory['h023] = 9'h000;
256
    s_opcodeMemory['h024] = 9'h000;
257
    ...
258
    s_opcodeMemory['h7FF] = 9'h000;
259
  end
260
 
261
 
262
DATA and STRINGS
263
================================================================================
264
 
265
Values are pushed onto the data stack by stating the value.  For example,
266
 
267
  0x10 0x20 'x'
268
 
269
will successively push the values 0x10, 0x20, and the character 'x' onto the
270
data stack.  The character 'x' will be at the top of the data stack after these
271
3 instructions.
272
 
273 5 sinclairrf
Numeric values can be represented in binary, octal, decimal, and hex.  Binary
274
values start with the two characters "0b" followed by a sequence of binary
275
digits; octal numbers start with a "0" followed by a sequence of octal digits;
276
decimal values can start with a "+" or "-", have a non-zero first digit, and have
277
zero or more decimal digits; and hex values start with the two characters "0X"
278
followed by a sequence of hex digits.
279 2 sinclairrf
 
280 5 sinclairrf
Examples of equivalent numeric values are:
281
  binary:   0b01  0b10010
282
  octal:    01    022
283
  decimal:  1     18
284
  hex:      0x1   0x12
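
The four formats can be told apart by their prefixes.  The following Python
sketch shows one way to do that; it is not the SSBCC assembler's parser, the
function name is hypothetical, and accepting either letter case in the "0b" and
"0x" prefixes, as well as treating a bare "0" as zero, are assumptions.

```python
import re


def parse_literal(text):
    """Convert one numeric literal to an integer using the rules above."""
    if re.fullmatch(r"0[bB][01]+", text):
        return int(text[2:], 2)          # binary
    if re.fullmatch(r"0[xX][0-9a-fA-F]+", text):
        return int(text[2:], 16)         # hex
    if re.fullmatch(r"0[0-7]*", text):
        return int(text, 8)              # octal (a bare "0" gives zero)
    if re.fullmatch(r"[+-]?[1-9][0-9]*", text):
        return int(text, 10)             # decimal, non-zero first digit
    raise ValueError("unrecognized numeric literal: %r" % text)


# The equivalent values from the table above.
print([parse_literal(s) for s in ("0b10010", "022", "18", "0x12")])
```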
285
 
286
See the COMPUTED VALUES section for using computed values in the assembler.
287
 
288 2 sinclairrf
There are four ways to specify strings in the assembler.  Simply stating the
289
string
290
 
291
  "Hello World!"
292
 
293
puts the characters in the string onto the data stack with the letter 'H' at the
294
top of the data stack.  I.e., the individual push operations are
295
 
296
  '!' 'd' 'l' ... 'e' 'H'
297
 
298
Placing an 'N' before the double quote, like
299
 
300
  N"Hello World!"
301
 
302
puts a null-terminated string onto the data stack.  I.e., the value under the
303
'!' will be a 0x00 and the instruction sequence would be
304
 
305
  0x0 '!' 'd' 'l' ... 'e' 'H'
306
 
307
Forth uses counted strings, which are specified here as
308
 
309
  C"Hello World!"
310
 
311 4 sinclairrf
In this case the number of characters, 12, in the string is pushed onto the data
312
stack after the 'H', i.e., the instruction sequence would be
313 2 sinclairrf
 
314
  '!' 'd' 'l' ... 'e' 'H' 12
315
 
316
Finally, a lesser-counted string specified like
317
 
318
  c"Hello World!"
319
 
320
is similar to the Forth-like counted string except that the value pushed onto
321
the data stack is one less than the number of characters in the string.  Here
322
the value pushed onto the data stack after the 'H' would be 11 instead of 12.
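
The push order for the four string formats can be summarized in Python.  This
is a sketch of the behavior described above, not assembler source; the function
name is hypothetical and escape sequences are not handled.

```python
def string_pushes(text, kind=""):
    """List the values pushed, in order, for "text", N"text", C"text", c"text"."""
    pushes = [ord(ch) for ch in reversed(text)]  # last character is pushed first
    if kind == "N":
        pushes.insert(0, 0)             # null terminator sits deepest
    elif kind == "C":
        pushes.append(len(text))        # character count ends up on top
    elif kind == "c":
        pushes.append(len(text) - 1)    # count minus one ends up on top
    elif kind != "":
        raise ValueError("unknown string prefix: %r" % kind)
    return pushes


print(string_pushes("Hello World!", "C")[-1])   # 12
print(string_pushes("Hello World!", "c")[-1])   # 11
```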
323
 
324
Simple strings are useful for constructing more complex strings in conjunction
325
with other string functions.   For example, to transmit the hex values of the
326
top 2 values in the data stack, do something like:
327
 
328
  ; move the top 2 values to the return stack
329
  >r >r
330
  ; push the tail of the message onto the data stack
331
  N"\n\r"
332
  ; convert the 2 values to 2-digit hex values, LSB deepest in the stack
333
  r> .call(string_byte_to_hex)
334
  r> .call(string_byte_to_hex)
335
  ; pre-pend the identification message
336
  "Message:  "
337
  ; transmit the string, using the null terminator to terminate the loop
338
  :loop_transmit .outport(O_UART_TX) .jumpc(loop_transmit,nop) drop
339
 
340
A lesser-counted string would be used like:
341
 
342
  c"Status Message\r\n"
343
  :loop_msg swap .outport(O_UART_TX) .jumpc(loop_msg,1-) drop
344
 
345
These four string formats can also be used for variable definitions.  For
346
example 3 variables could be allocated and initialized as follows:
347
 
348
  .memory ROM myrom
349
  .variable fred N"fred"
350
  .variable joe  c"joe"
351
  .variable moe  "moe"
352
 
353
These are equivalent to
354
 
355
  .variable fred 'f' 'r' 'e' 'd'  0
356
  .variable joe   2  'j' 'o' 'e'
357
  .variable moe  'm' 'o' 'e'
358
 
359
with 5 bytes allocated for the variable fred, 4 bytes for joe, and 3 bytes for
360
moe.
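
The memory images for these variables can be modeled as follows.  This Python
sketch is an illustration only:  the function name is hypothetical, and the
layout for the C"..." form (count first) is assumed by analogy with the c"..."
form shown above.

```python
def variable_bytes(text, kind=""):
    """Return the byte values stored for a .variable initialized by a string."""
    data = [ord(ch) for ch in text]
    if kind == "":
        return data
    if kind == "N":
        return data + [0]                 # null terminator follows the text
    if kind == "C":
        return [len(text)] + data         # assumed: count precedes the text
    if kind == "c":
        return [len(text) - 1] + data     # count minus one precedes the text
    raise ValueError("unknown string prefix: %r" % kind)


for name, kind in (("fred", "N"), ("joe", "c"), ("moe", "")):
    print(name, len(variable_bytes(name, kind)))   # sizes 5, 4, and 3
```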
361
 
362
The following escaped characters are recognized:
363
 
364
  '\0'     null character
365
  '\a'     bell
366
  '\b'     backspace
367
  '\f'     form feed
368
  '\n'     line feed
369
  '\r'     carriage return
370
  '\t'     horizontal tab
371
  "\0ooo"  3-digit octal value
372
  "\xXX"   2-digit hex value where X is one of 0-9, a-f, or A-F
373
  "\Xxx"   alternate form for 2-digit hex value
374
  "\\"     backslash character
375
 
376
Unrecognized escaped characters are simply treated as that character.  For
377
example, '\m' is treated as the single character 'm' and '\'' is treated as the
378
single quote character.
379
 
380
 
381
INSTRUCTIONS
382
================================================================================
383
 
384 7 sinclairrf
The 43 instructions are as follows (see core/9x8/doc/opcodes.html for detailed
385 2 sinclairrf
descriptions).  Here, T is the top of the data stack, N is the next-to-top of
386
the data stack, and R is the top of the return stack.  All of these are the
387
values at the start of the instruction.
388
 
389
The nop instruction does nothing:
390
 
391
  nop           no operation
392
 
393
Mathematical operations drop one value from the data stack and replace the new
394
top with the stated value:
395
 
396
  &             bitwise and of N and T
397
  +             N + T
398
  -             N - T
399
  ^             bitwise exclusive or of N and T
400
  or            bitwise or of N and T
401
 
402 7 sinclairrf
Push the carry bit for addition or subtraction onto the data stack (see
403
lib/9x8/math.s for examples on using +c and -c for multi-byte arithmetic):
404
 
405
  +c            carry bit for N+T
406
  -c            carry bit for N-T
407
 
408 2 sinclairrf
Increment and decrement replace the top of the data stack with the stated
409
result.
410
 
411
  1+            replace T with T+1
412
  1-            replace T with T-1
413
 
414
Comparison operations replace the top of the data stack with the results of the
415
comparison:
416
 
417
  -1<>          replace T with -1 if T != -1, otherwise set T to 0
418
  -1=           replace T with 0 if T != -1, otherwise leave T as -1
419
  0<>           replace T with -1 if T != 0, otherwise leave T as 0
420
  0=            replace T with -1 if T == 0, otherwise set T to 0
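
In an 8-bit core the value -1 is 0xFF, so each comparison leaves either 0xFF or
0x00 on the stack.  A Python model of the four descriptions above (the function
name is hypothetical; this is not SSBCC source):

```python
def compare(op, t):
    """Apply one comparison instruction to the 8-bit top of stack T."""
    t &= 0xFF
    truth = {
        "0=": t == 0x00,
        "0<>": t != 0x00,
        "-1=": t == 0xFF,
        "-1<>": t != 0xFF,
    }[op]
    return 0xFF if truth else 0x00   # all ones (-1) for true, zero for false


print(hex(compare("0=", 0)), hex(compare("-1<>", 0xFF)))   # 0xff 0x0
```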
421
 
422
Shift/rotate operations replace the top of the data stack with the result of the
423
specified shift/rotate.
424
 
425
  0>>           shift T right 1 bit and set the msb to 0
  1>>           shift T right 1 bit and set the msb to 1
  <<0           shift T left 1 bit and set the lsb to 0
  <<1           shift T left 1 bit and set the lsb to 1
  <<msb         rotate T left 1 bit
  lsb>>         rotate T right 1 bit
  msb>>         shift T right 1 bit and set the msb to the old msb
432
 
433
Note:  There is no "<<lsb" instruction.
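
The shift and rotate operations can be modeled on an 8-bit value in Python.
This is a sketch, not SSBCC source.  The function name is hypothetical, and the
seventh mnemonic, taken here to be "<<msb" with the meaning "rotate T left 1
bit", is an assumption made by symmetry with "lsb>>".

```python
def shift(op, t):
    """Apply one shift or rotate instruction to the 8-bit top of stack T."""
    t &= 0xFF
    msb, lsb = t >> 7, t & 1
    left, right = (t << 1) & 0xFF, t >> 1
    results = {
        "<<0": left,
        "<<1": left | 1,
        "<<msb": left | msb,          # rotate left (assumed mnemonic)
        "0>>": right,
        "1>>": right | 0x80,
        "msb>>": right | (msb << 7),  # arithmetic shift right
        "lsb>>": right | (lsb << 7),  # rotate right
    }
    return results[op]


print(hex(shift("lsb>>", 0x01)), hex(shift("msb>>", 0x80)))   # 0x80 0xc0
```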
434
 
435
Stack manipulation instructions are as follows:
436
 
437
  >r            push T onto the return stack and drop T from the data stack
438
  drop          drop T from the data stack
439
  dup           push T onto the data stack
440
  nip           drop N from the data stack
441
  over          push N onto the data stack
442
  push          push a single byte onto the data stack, see the preceding DATA
443
                and STRINGS section
444
  r>            push R onto the data stack and drop R from the return stack
445
  r@            push R onto the data stack
446
  swap          swap N and T
447
 
448
Jump and call and their conditional variants are as follows and must use the
449
associated macro:
450
 
451
  call          call instruction -- use the .call macro
452
  callc         conditional call instruction -- use the .callc macro
453
  jump          jump instruction -- use the .jump macro
454
  jumpc         conditional jump instruction -- use the .jumpc macro
455
  return        return instruction -- use the .return macro
456
 
457
See the MEMORY section for details for these memory operations.  T is the
458
address for the instructions, N is the value stored.  Chained fetches insert the
459
value below T.  Chained stores drop N.
460
 
461
  fetch         memory fetch, replace T with the value fetched
462
  fetch+        chained memory fetch, retain and increment the address
463
  fetch-        chained memory fetch, retain and decrement the address
464
  store         memory store, drop T (N is the next value of T)
465
  store+        chained memory store, retain and increment the address
466
  store-        chained memory store, retain and decrement the address
467
 
468
See the INPORT and OUTPORT section for details for the input and output port
469
operations:
470
 
471
  inport        input port operation
472
  outport       output port operation
473
 
474
The .call, .callc, .jump, and .jumpc macros encode the 3 instructions required
475
to perform a call or jump along with the subsequent instructions.  The default
476
third instruction is "nop" for .call and .jump and it is "drop" for .callc and
477
.jumpc.  The default can be changed by specifying the optional second argument.
478
The .call and .callc macros must specify a function identified by the .function
479
directive and the .jump and .jumpc macros must specify a label.
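
The three-instruction expansion and its default third instruction can be
sketched in Python.  This is an illustration only:  the function name and the
textual notation for the address fields ("[7:0]" and "[12:8]") are hypothetical
and are not the assembler's output format.

```python
DEFAULT_THIRD = {".call": "nop", ".jump": "nop", ".callc": "drop", ".jumpc": "drop"}


def expand(macro, target, third=None):
    """Expand a jump or call macro into its three-instruction sequence."""
    if macro not in DEFAULT_THIRD:
        raise ValueError("not a jump or call macro: %r" % macro)
    return [
        "push %s[7:0]" % target,               # 8 lsb of the address
        "%s %s[12:8]" % (macro[1:], target),   # opcode carrying the 5 msb
        third if third is not None else DEFAULT_THIRD[macro],
    ]


print(expand(".jumpc", "inner"))         # default third instruction: drop
print(expand(".jumpc", "inner", "1-"))   # as in .jumpc(inner,1-)
```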
480
 
481
The .function directive takes the name of the function and the function body.
482
Function bodies must end with a .return or a .jump macro.  The .main directive
483
defines the body of the main function, i.e., the function at which the processor
484
starts.
485
 
486
The .include directive is used to read additional assembly code.  You can, for
487
example, put the main function in uc.s, define constants and such in consts.s,
488
define the memories and variables in myram.s, and include UART utilities in
489
uart.s.  These files could be included in uc.s through the following lines:
490
 
491
  .include consts.s
492
  .include myram.s
493
  .include uart.s
494
 
495
The assembler only includes functions that can be reached from the main
496
function.  Unused functions will not consume instruction space.
497
 
498
 
499
INPORT and OUTPORT
500
================================================================================
501
 
502
The INPORT and OUTPORT configuration commands are used to specify 2-state inputs
503
and outputs.  For example
504
 
505
  INPORT 8-bit i_value I_VALUE
506
 
507
specifies a single 8-bit input signal named "i_value" for the module.  The port
508
is accessed in assembly by ".inport(I_VALUE)" which is equivalent to the
509
two-instruction sequence "I_VALUE inport".  To input an 8-bit value from a FIFO
510
and send a single-clock-cycle wide acknowledgment strobe, use
511
 
512
  INPORT 8-bit,strobe i_fifo,o_fifo_ack I_FIFO
513
 
514
The assembly ".inport(I_FIFO)" will automatically send an acknowledgment strobe
515
to the FIFO through "o_fifo_ack".
516
 
517
A write port to an 8-bit FIFO is similarly specified by
518
 
519
  OUTPORT 8-bit,strobe o_fifo,o_fifo_wr O_FIFO
520
 
521
The assembly ".outport(O_FIFO)", which is equivalent to "O_FIFO outport drop",
522
will automatically send a write strobe to the FIFO through "o_fifo_wr".
523
 
524
Multiple signals can be packed into a single input or output port by defining
525
them in comma separated lists.  The associated bit masks can be defined
526
coincident with the port definition as follows:
527
 
528
  INPORT 1-bit,1-bit i_fifo_full,i_fifo_empty I_FIFO_STATUS
529
  CONSTANT C_FIFO_STATUS__FULL  0x02
530
  CONSTANT C_FIFO_STATUS__EMPTY 0x01
531
 
532
Checking the "full" status of the FIFO can be done by the following assembly
533
sequence:
534
 
535
  .inport(I_FIFO_STATUS) C_FIFO_STATUS__FULL &
536
 
537
Multiple bits can be masked using a computed value as follows (see below for
538
more details):
539
 
540
  .inport(I_FIFO_STATUS) ${C_FIFO_STATUS__FULL|C_FIFO_STATUS__EMPTY} &
541
 
542
The "${...}" creates an instruction to push the 8-bit value in the braces onto
543
the data stack.  The computation is performed using the Python "eval" function
544
in the context of the program constants, memory addresses, and memory sizes.
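
The idea of evaluating the text inside "${...}" against the known symbols can
be sketched in Python.  This is not the assembler's implementation:  the
function name, the empty builtins table, and the range check are assumptions
made for the illustration.

```python
def computed_value(expression, symbols):
    """Evaluate the text inside ${...} against a table of known symbols."""
    # Symbols (constants, addresses, sizes) are supplied as the local namespace.
    value = eval(expression, {"__builtins__": {}}, dict(symbols))
    if not -128 <= value <= 255:
        raise ValueError("computed value does not fit in 8 bits")
    return value & 0xFF


constants = {"C_FIFO_STATUS__FULL": 0x02, "C_FIFO_STATUS__EMPTY": 0x01}
print(computed_value("C_FIFO_STATUS__FULL|C_FIFO_STATUS__EMPTY", constants))  # 3
```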
545
 
546
Preceding all of these by
547
 
548
  PORTCOMMENT external FIFO
549
 
550
produces the following in the Verilog module statement.  The I/O ports are
551
listed in the order in which they are declared.
552
 
553
  // external FIFO
554
  input  wire       [7:0] i_fifo,
555
  output reg              o_fifo_ack,
556
  output reg        [7:0] o_fifo,
557
  output reg              o_fifo_wr,
558
  input  wire             i_fifo_full,
559
  input  wire             i_fifo_empty
560
 
561
The HDL to implement the inputs and outputs is computer generated.  Identifying
562
the port name in the architecture file eliminates the possibility of
563
inconsistent port numbers between the HDL and the assembly.  Specifying the bit
564
mapping for the assembly code immediately after the port definition helps
565
prevent inconsistencies between the port definition and the bit mapping in the
566
assembly code.
567
 
568
The normal initial value for an outport is zero.  This can be changed by
569
including an optional initial value as follows.  This initial value will be
570
applied on system startup and when the micro controller is reset.
571
 
572
  OUTPORT 4-bit=4'hA o_signal O_SIGNAL
573
 
574
An isolated output strobe can also be created using:
575
 
576
  OUTPORT strobe o_strobe O_STROBE
577
 
578
The assembly ".outstrobe(O_STROBE)", which is equivalent to "O_STROBE outport",
is used to generate the strobe.  Since "O_STROBE" is a strobe-only outport, the
".outport" macro cannot be used with it.  Similarly, the ".outstrobe" macro
generates an error if it is invoked with an outport that does have data.
583
 
584
A single-bit "set-reset" input port type is also included.  This sets a register
585
when an external strobe is received and clears the register when the port is
586
read.  For example, to capture an external timer for a polled-loop, include the
587
following in the architecture file:
588
 
589
  PORTCOMMENT external timer
590
  INPORT set-reset i_timer I_TIMER
591
 
592
The following is the assembly code to conditionally call two functions when the
593
timer event is encountered:
594
 
595
  .inport(I_TIMER)
596
    .callc(timer_event_1,nop)
597
    .callc(timer_event_2)
598
 
599
The "nop" in the first conditional call prevents the conditional from being
600
dropped from the data stack so that it can be used by the subsequent conditional
601
function call.
602
 
603 9 sinclairrf
The input from a set-reset INPORT is a pure flag.  I.e., either all of the bits
604
are zero or all of the bits are one.  This can be used as part of executing a
605
loop a fixed number of times.  For example, the inperiod argument of the
606
servo_motor peripheral can be used to receive a strobe every time the PWM goes
607
high.  The following loop will wait for 10 occurrences of the rising edge of the
608
servo_motor PWM before proceeding to the next block of code:
609 2 sinclairrf
 
610 9 sinclairrf
  10 :loop .inport(I_INPERIOD) + .jumpc(loop,nop) drop
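
The loop works because adding an all-ones flag (0xFF) to an 8-bit count is the
same as decrementing it, while adding 0x00 leaves it unchanged.  A Python model
of the loop, with a hypothetical function name and a made-up sequence of flag
reads:

```python
def polls_until_done(count, flags):
    """Return how many port reads the loop performs before it exits."""
    polls = 0
    for flag in flags:                    # each read returns 0x00 or 0xFF
        polls += 1
        count = (count + flag) & 0xFF     # "+": adding 0xFF decrements
        if count == 0:                    # .jumpc falls through on zero
            return polls
    raise RuntimeError("flag sequence ended before the loop exited")


# The flag is set on every third read; ten events take thirty reads.
print(polls_until_done(10, [0x00, 0x00, 0xFF] * 10))   # 30
```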
611
 
612
 
613 2 sinclairrf
PERIPHERAL
614
================================================================================
615
 
616
Peripherals are implemented via Python modules.  For example, an open drain I/O
617
signal, such as is required for an I2C bus, does not fit the INPORT and OUTPORT
618
functionality.  Instead, an "open_drain" peripheral is provided by the Python
619
script in "core/9x8/peripherals/open_drain.py".  This puts a tri-state I/O in
620
the module statement, allows it to be read through an "inport" instruction, and
621
allows it to be set low or released through an "outport" instruction.  An I2C
622
bus with separate SCL and SDA ports can then be incorporated into the processor
623
as follows:
624
 
625
  PORTCOMMENT     I2C bus
626
  PERIPHERAL      open_drain      inport=I_SCL \
627
                                  outport=O_SCL \
628
                                  iosignal=io_scl
629
  PERIPHERAL      open_drain      inport=I_SDA \
630
                                  outport=O_SDA \
631
                                  iosignal=io_sda
632
 
633
The default width for this peripheral is 1 bit.  The module statement will then
634
include the lines
635
 
636
  // I2C bus
637
  inout  wire     io_scl,
638
  inout  wire     io_sda
639
 
640
The assembly code to set the io_scl signal low is "0 .outport(O_SCL)" and to
641
release it is "1 .outport(O_SCL)".  These instruction sequences are actually
642
"0 O_SCL outport drop" and "1 O_SCL outport drop" respectively.  The "outport"
643
instruction drops the top of the data stack (which contained the port number)
644
and sends the next-to-the-top of the data stack to the designated output port.
645
 
646
Two examples of I2C device operation are included in the examples directory.
647
 
648
The following peripherals are provided:
649
  adder_16bit   16-bit adder/subtractor
650
  AXI4_Lite_Master
651
                32-bit read/write AXI4-Lite Master
652
                Note:  The synchronous version has been tested on hardware.
653
  AXI4_Lite_Slave_DualPortRAM
654
                dual-port-RAM interface for the micro controller to act as an
655
                AXI4-Lite slave
656
  big_inport    shift reads from a single INPORT to construct a wide input
657
  big_outport   shift writes to a single OUTPORT to construct a wide output
658
  counter       counter for number of received high cycles from signal
659
  inFIFO_async  input FIFO with an asynchronous write clock
660
  latch         latch wide inputs for sampling
661
  monitor_stack simulation diagnostic (see below)
662
  open_drain    for software-implemented I2C buses or similar
663
  outFIFO_async output FIFO with an asynchronous read clock
664
  PWM_8bit      PWM generator with an 8-bit control
665 9 sinclairrf
  servo_motor   PWM modulation suitable for servo motor or similar control
666
  stepper_motor stepper motor controller with acceleration
667 2 sinclairrf
  timer         timing for polled loops or similar
668
  trace         simulation diagnostic (see below)
669
  UART          bidirectional UART
670
  UART_Rx       receive UART
671
  UART_Tx       transmit UART
672 3 sinclairrf
  wide_strobe   1 to 8 bit strobe generator
673 2 sinclairrf
 
674
The following command illustrates how to display the help message for
675
peripherals:
676
 
677
  echo "ARCHITECTURE core/9x8 Verilog" | ssbcc -P "big_inport help" - | less
678
 
679
User-defined peripherals can be in the same directory as the architecture file
680
or a subdirectory named "peripherals".


PARAMETER and LOCALPARAM
================================================================================

Parameters are incorporated through the PARAMETER and LOCALPARAM configuration
commands.  For example, the clock frequency in hertz is needed for UARTs for
their baud rate generator.  The configuration command

  PARAMETER G_CLK_FREQ_HZ 97_000_000

specifies the clock frequency as 97 MHz.  The HDL instantiating the processor
can change this specification.  The frequency can also be changed through the
command-line invocation of the computer compiler.  For example,

  ssbcc -G "G_CLK_FREQ_HZ=100_000_000" myprogram.9x8

specifies that a frequency of 100 MHz be used instead of the default frequency
of 97 MHz.

The LOCALPARAM configuration command can be used to specify parameters that
should not be changed by the surrounding HDL.  For example,

  LOCALPARAM L_VERSION 24'h00_00_00

specifies a 24-bit parameter named "L_VERSION".  The 8-bit major, minor, and
build sections of the parameter can be accessed in an assembly program using
"L_VERSION[16+:8]", "L_VERSION[8+:8]", and "L_VERSION[0+:8]".

For both parameters and localparams, the default range is "[0+:8]".  The
instruction memory is initialized using the parameter value during synthesis,
not the value used to initialize the parameter.  That is, the instruction memory
initialization will be:

  s_opcodeMemory[...] = { 1'b1, L_VERSION[16+:8] };

The value of the localparam can be set when the computer compiler is run using
the "-G" option.  For example,

  ssbcc -G "L_VERSION=24'h01_04_03" myprogram.9x8

can be used in a makefile to set the version number for a release without
modifying the micro controller architecture file.
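
As an illustration of the indexed ranges described in this section, the
following Python sketch models how a range such as "[16+:8]" selects one byte
of a 24-bit value.  The helper name "part_select" is hypothetical and is not
part of ssbcc; the value is the 24'h01_04_03 used with "-G" above.

```python
def part_select(value, lsb, width=8):
    """Model the Verilog indexed part-select value[lsb+:width]."""
    return (value >> lsb) & ((1 << width) - 1)


# The version from the "-G" example: 24'h01_04_03.
L_VERSION = 0x010403

major = part_select(L_VERSION, 16)   # L_VERSION[16+:8]
minor = part_select(L_VERSION, 8)    # L_VERSION[8+:8]
build = part_select(L_VERSION, 0)    # L_VERSION[0+:8], the default range
print(major, minor, build)
```

Each range yields a value that fits in the 8-bit data width of the core.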


DIAGNOSTICS AND DEBUGGING
================================================================================

A 3-character, human-readable version of the opcode can be included in
simulation waveform outputs by adding "--display-opcode" to the ssbcc command.

The stack health can be monitored during simulation by including the
"monitor_stack" peripheral through the command line.  For example, the LED
flasher example can be generated using

  ssbcc -P monitor_stack led.9x8

This allows the architecture file to be unchanged between simulation and an FPGA
build.

Stack errors include underflow and overflow, malformed data validity, and
incorrect use of the values on the return stack (returns to data values and data
operations on return addresses).  Other errors include out-of-range memory,
inport, and outport operations.

When stack errors are detected, the last 50 instructions are dumped to the
console and the simulation terminates.  The dump includes the PC, numeric
opcode, textual representation of the opcode, data stack pointer, next-to-top of
the data stack, top of the data stack, top of the return stack, and the return
stack pointer.  Invalid stack values are displayed as "XX".  The length of the
history dumped is configurable.
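
The bounded history described above behaves like a ring buffer.  The following
Python sketch is only a model of that idea, not the Verilog used by the
"monitor_stack" peripheral; the class name, field layout, and output format
are assumptions made for illustration.

```python
from collections import deque


class HistoryDump:
    """Keep only the most recent `depth` executed instructions."""

    def __init__(self, depth=50):
        self.entries = deque(maxlen=depth)

    def record(self, pc, opcode):
        # The oldest entry is discarded automatically once the deque is full.
        self.entries.append((pc, opcode))

    def dump(self):
        # Hypothetical format: PC and 9-bit opcode in hexadecimal.
        return ["%04X %03X" % (pc, opcode) for pc, opcode in self.entries]


history = HistoryDump(depth=3)
for pc in range(5):
    history.record(pc, 0x100 + pc)
print("\n".join(history.dump()))
```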

Out-of-range PC checks are also performed if the instruction space is not a
power of 2.

A "trace" peripheral is also provided that dumps the entire execution history.
This was used to validate the processor core.


MEMORY ARCHITECTURE
================================================================================

The DATA_STACK, RETURN_STACK, INSTRUCTION, and MEMORY configuration commands
allocate memory for the data stack, return stack, instruction ROM, and memory
RAM and ROM respectively.  The data stack, return stack, and memories are
normally instantiated as dual-port LUT-based memories with asynchronous reads
while the instruction memory is always instantiated with a synchronous read
architecture.

The COMBINE configuration command is used to coalesce memories and to convert
LUT-based memories to synchronous SRAM-based memories.  For example, the large
SRAMs in modern FPGAs are ideal for storing the instruction opcodes and their
dual-ported access allows either the data stack or the return stack to be
stored in a relatively small region at the end of the large instruction memory.
Memories, which require dual-ported operation, can also be instantiated in
large RAMs either individually or in combination with each other.  Conversion
to SRAM-based memories is also useful for FPGA architectures that do not have
efficient LUT-based memories.

The INSTRUCTION configuration command allocates memory for the processor
instruction space.  It has the form "INSTRUCTION N" or "INSTRUCTION N*M" where
N must be a power of 2.  The first form is used if the desired instruction
memory size is a power of 2.  The second form is used to allocate M memory
blocks of size N where M is not a power of 2.  For example, on an Altera
Cyclone III, the configuration command "INSTRUCTION 1024*3" allocates three
M9Ks for the instruction space, saving one M9K as compared to the configuration
command "INSTRUCTION 4096".

The DATA_STACK configuration command allocates memory for the data stack.  It
has the form "DATA_STACK N" where N is the commanded size of the data stack.
N must be a power of 2.

The RETURN_STACK configuration command allocates memory for the return stack and
has the same format as the DATA_STACK configuration command.
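
The size rules for the INSTRUCTION command can be sketched in Python as
follows.  The function names are hypothetical and this is not the parser used
by ssbcc; the figure of 1024 opcodes per memory block is an assumption taken
from the Cyclone III example above.

```python
WORDS_PER_BLOCK = 1024  # assumption: one M9K holds 1024 9-bit opcodes


def parse_instruction(spec):
    """Return (N, M, total words) for an "N" or "N*M" size argument."""
    if "*" in spec:
        n_text, m_text = spec.split("*")
        n, m = int(n_text), int(m_text)
    else:
        n, m = int(spec), 1
    if n <= 0 or n & (n - 1):
        raise ValueError("N must be a power of 2")
    return n, m, n * m


def blocks_needed(spec):
    """Number of memory blocks needed for the instruction space."""
    total = parse_instruction(spec)[2]
    return -(-total // WORDS_PER_BLOCK)  # ceiling division


print(blocks_needed("1024*3"), blocks_needed("4096"))
```

Under these assumptions "1024*3" needs three blocks and "4096" needs four,
which is the one-block saving mentioned above.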

The MEMORY configuration command is used to define one to four memories, either
RAM or ROM, with up to 256 bytes each.  If no MEMORY configuration command is
issued, then no memories are allocated for the processor.  The MEMORY
configuration command has the format "MEMORY {RAM|ROM} name N" where
"{RAM|ROM}" specifies either a RAM or a ROM, name is the name of the memory and
must start with an alphabetic character, and the size of the memory, N, must be
a power of 2.  For example, "MEMORY RAM myram 64" allocates 64 bytes of memory
to form a RAM named myram.  Similarly, "MEMORY ROM lut 256" defines a 256 byte
ROM named lut.  More details on using memories are provided in the next section.

The COMBINE configuration command can be used to combine the various memories
for more efficient processor implementation as follows:

  COMBINE INSTRUCTION,<basic_element>
  COMBINE <basic_element>
  COMBINE <basic_element>,<ramlist>
  COMBINE <ramlist>

where <basic_element> is one of DATA_STACK, RETURN_STACK, or a list of one
or more ROMs and <ramlist> is a list of one or more RAMs and/or ROMs.  The first
configuration command reserves space at the end of the instruction memory for
the DATA_STACK, RETURN_STACK, or listed ROMs.

The SRAM_WIDTH configuration command is used to make the memory allocations more
efficient when the SRAM block width is more than 9 bits.  For example,
Altera's Cyclone V family has 10-bit wide memory blocks and the configuration
command "SRAM_WIDTH 10" is appropriate.  The configuration command
sequence

  INSTRUCTION     1024
  RETURN_STACK    32
  SRAM_WIDTH      10
  COMBINE         INSTRUCTION,RETURN_STACK

will use a single 10-bit memory entry for each element of the return stack
instead of packing the 10-bit values into two memory entries of a 9-bit wide
memory.
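
The arithmetic behind this example can be sketched in Python.  The function
names are hypothetical; the rule that the return stack is as wide as the wider
of the 8-bit data width and the program counter is taken from the THEORY OF
OPERATION section, and the ceiling-division packing is an assumption about how
wide values are split across narrower memory entries.

```python
def pc_width(instruction_words):
    """Bits needed to address every word of the instruction space."""
    return max(1, (instruction_words - 1).bit_length())


def return_stack_width(instruction_words, data_width=8):
    """The return stack holds both data values and return addresses."""
    return max(data_width, pc_width(instruction_words))


def entries_per_element(element_width, sram_width=9):
    """Memory entries consumed by one stack element (ceiling division)."""
    return -(-element_width // sram_width)


width = return_stack_width(1024)
print(width, entries_per_element(width, 9), entries_per_element(width, 10))
```

For "INSTRUCTION 1024" this gives a 10-bit return stack, which needs two
entries per element in a 9-bit wide memory and one entry with "SRAM_WIDTH 10".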

The following illustrates a possible configuration for a Spartan-6 with a
2048-long SRAM and relatively large 64-deep data stack.  The data stack will be
in the last 64 elements of the instruction memory and the instruction space will
be reduced to 1984 words.

  INSTRUCTION   2048
  DATA_STACK    64
  COMBINE       INSTRUCTION,DATA_STACK
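
The resulting address split can be checked with a short Python sketch.  The
function name is hypothetical, and the sketch assumes each stack element
occupies exactly one word at the end of the instruction memory, as described
for this example.

```python
def combine_with_instruction(instruction_words, stack_depth):
    """Return inclusive address ranges for the opcodes and the stack."""
    usable = instruction_words - stack_depth
    if usable <= 0:
        raise ValueError("stack does not fit in the instruction memory")
    return {
        "instruction": (0, usable - 1),
        "stack": (usable, instruction_words - 1),
    }


layout = combine_with_instruction(2048, 64)
print(layout)
```

With 2048 words and a 64-deep data stack, 1984 words remain for instructions
(addresses 0 through 1983) and the stack occupies addresses 1984 through 2047.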

The following illustrates a possible configuration for a Cyclone-III with three
M9Ks for the instruction ROM and the data stack.

  INSTRUCTION   1024*3
  DATA_STACK    64
  COMBINE       INSTRUCTION,DATA_STACK

WARNING:  Some devices, such as Xilinx' Spartan-3A devices, do not support
asynchronous reads, so the COMBINE configuration command does not work for them.

WARNING:  Xilinx XST does not correctly infer a Block RAM when the
"COMBINE INSTRUCTION,RETURN_STACK" configuration command is used and the
instruction space is 1024 instructions or larger.  Xilinx is supposed to fix
this in a future release of Vivado so the fix will only apply to 7-series or
later FPGAs.


MEMORY
================================================================================

The MEMORY configuration command is used as follows to allocate a 128-byte RAM
named "myram" and to allocate a 32-byte ROM named "myrom".  Zero to four
memories can be allocated, each with up to 256 bytes.

  MEMORY RAM myram 128
  MEMORY ROM myrom  32

The assembly code to lay out the memory uses the ".memory" directive to identify
the memory and the ".variable" directive to identify the symbol and its content.
Single or multiple values can be listed and "*N" can be used to identify a
repeat count.

  .memory RAM myram
  .variable a 0
  .variable b 0
  .variable c 0 0 0 0
  .variable d 0*4

  .memory ROM myrom
  .variable coeff_table 0x04
                        0x08
                        0x10
                        0x20
  .variable hello_world N"Hello World!\r\n"
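
To make the RAM layout above concrete, the following Python sketch computes
variable offsets.  It assumes that variables are placed consecutively from
address 0 in declaration order, which is not stated explicitly in this
document, and the function name is hypothetical.

```python
def layout(variables):
    """Map each variable name to its offset; also return the bytes used."""
    offsets, address = {}, 0
    for name, values in variables:
        offsets[name] = address
        address += len(values)
    return offsets, address


# The "myram" declarations from the example above; "0*4" is four zeros.
myram = [("a", [0]), ("b", [0]), ("c", [0, 0, 0, 0]), ("d", [0] * 4)]
offsets, used = layout(myram)
print(offsets, used)
```

Under that assumption the four variables use 10 of the 128 bytes in "myram".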

Single values are fetched from or stored to memory using the following assembly:

  .fetchvalue(a)
  0x12 .storevalue(b)

Multi-byte values are fetched or stored as follows.  This copies the four values
from coeff_table, which is stored in a ROM, to d.

  .fetchvector(coeff_table,4) .storevector(d,4)

The memory size is available using computed values (see below) and can be used
to clear the entire memory, etc.

The available single-cycle memory operation macros are:
  .fetch(mem_name)      replaces T with the value at the address T in the memory
                        mem_name
                        Note:  .fetchram(var_name) is safer.
  .fetch+(mem_name)     pushes the value at address T in the memory mem_name
                        into the data stack below T and increments T
                        Note:  This is useful for fetching successive values
                               from memory into the data stack.
                        Note:  .fetchram+(var_name) is safer.
  .fetch-(mem_name)     similar to .fetch+ but decrements T
                        Note:  .fetchram-(var_name) is safer.
  .store(ram_name)      stores N at address T in the RAM ram_name, also drops
                        the top of the data stack
                        Note:  .storeram(var_name) is safer.
  .store+(ram_name)     stores N at address T in the RAM ram_name, also drops N
                        from the data stack and increments T
                        Note:  .storeram+(var_name) is safer.
  .store-(ram_name)     similar to .store+ but decrements T
                        Note:  .storeram-(var_name) is safer.

The following multi-cycle macros provide more generalized access to the
memories:
  .fetchindexed(var_name)
                        uses the top of the data stack as an index into var_name
                        Note:  This is equivalent to the 3 instruction sequence
                               "var_name + .fetch(mem_name)"
  .fetchoffset(var_name,offset)
                        fetches the single-byte value of var_name offset by
                        "offset" bytes
                        Note:  This is equivalent to
                               "${var_name+offset} .fetch(mem_name)"
  .fetchram(var_name)   is similar to the .fetch(mem_name) macro except that the
                        variable name is used to identify the memory instead of
                        the name of the memory
  .fetchram+(var_name)  is similar to the .fetch+(mem_name) macro except that
                        the variable name is used to identify the memory instead
                        of the name of the memory
  .fetchram-(var_name)  is similar to the .fetch-(mem_name) macro except that
                        the variable name is used to identify the memory instead
                        of the name of the memory
  .fetchvalue(var_name) fetches the single-byte value of var_name
                        Note:  This is equivalent to "var_name .fetch(mem_name)"
                               where mem_name is the memory in which var_name is
                               stored.
  .fetchvalueoffset(var_name,offset)
                        fetches the single-byte value stored at var_name+offset
                        Note:  This is equivalent to
                               "${var_name+offset} .fetch(mem_name)"
                               where mem_name is the memory in which var_name is
                               stored.
  .fetchvector(var_name,N)
                        fetches N values starting at var_name into the data
                        stack with the value at var_name at the top and the
                        value at var_name+N-1 deep in the stack.
                        Note:  This is equivalent to the N+1 operation sequence
                               "${var_name+N-1} .fetch-(mem_name) ...
                               .fetch-(mem_name) .fetch(mem_name)"
                               where ".fetch-(mem_name)" is repeated N-1 times.
  .storeindexed(var_name)
                        uses the top of the data stack as an index into var_name
                        into which to store the next-to-top of the data stack.
                        Note:  This is equivalent to the 4 instruction sequence
                               "var_name + .store(mem_name) drop".
                        Note:  The default "drop" instruction can be overridden
                               by providing the optional second argument
                               similarly to the .storevalue macro.
  .storeoffset(var_name,offset)
                        stores the single-byte value at the top of the data
                        stack at var_name offset by "offset" bytes
                        Note:  This is equivalent to
                               "${var_name+offset} .store(mem_name) drop"
                        Note:  The optional third argument is as per the
                               optional second argument of .storevalue
  .storeram(var_name)   is similar to the .store(mem_name) macro except that the
                        variable name is used to identify the RAM instead of the
                        name of the RAM
  .storeram+(var_name)  is similar to the .store+(mem_name) macro except that
                        the variable name is used to identify the RAM instead of
                        the name of the RAM
  .storeram-(var_name)  is similar to the .store-(mem_name) macro except that
                        the variable name is used to identify the RAM instead of
                        the name of the RAM
  .storevalue(var_name) stores the single-byte value at the top of the data
                        stack at var_name
                        Note:  This is equivalent to
                               "var_name .store(mem_name) drop"
                        Note:  The default "drop" instruction can be replaced by
                               providing the optional second argument.  For
                               example, the following instruction will store and
                               then decrement the value at the top of the data
                               stack:
                                 .storevalue(var_name,1-)
  .storevector(var_name,N)
                        Does the reverse of the .fetchvector macro.
                        Note:  This is equivalent to the N+2 operation sequence
                               "var_name .store+(mem_name) ... .store+(mem_name)
                               .store(mem_name) drop"
                               where ".store+(mem_name)" is repeated N-1 times.
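
The equivalent sequences listed for .fetchvector and .storevector can be
traced with a small Python model of the data stack.  This is a sketch built
only from the macro descriptions above, not the assembler's implementation;
the addresses chosen for "coeff_table" and "d" are assumptions.

```python
def fetch(stack, mem):
    """.fetch: replace T with the value at address T."""
    stack[-1] = mem[stack[-1]]


def fetch_minus(stack, mem):
    """.fetch-: push the value at address T below T, then decrement T."""
    addr = stack.pop()
    stack.extend([mem[addr], (addr - 1) & 0xFF])


def store_plus(stack, mem):
    """.store+: store N at address T, drop N, then increment T."""
    addr, value = stack.pop(), stack.pop()
    mem[addr] = value
    stack.append((addr + 1) & 0xFF)


def store(stack, mem):
    """.store: store N at address T, then drop T."""
    addr = stack.pop()
    mem[addr] = stack[-1]


def fetchvector(stack, mem, base, n):
    stack.append(base + n - 1)
    for _ in range(n - 1):
        fetch_minus(stack, mem)
    fetch(stack, mem)


def storevector(stack, mem, base, n):
    stack.append(base)
    for _ in range(n - 1):
        store_plus(stack, mem)
    store(stack, mem)
    stack.pop()  # the trailing "drop"


rom = [0x04, 0x08, 0x10, 0x20]  # coeff_table, assumed to start at address 0
ram = [0] * 16                  # hypothetical RAM with "d" assumed at address 6
stack = []                      # the last element is the top of the stack
fetchvector(stack, rom, 0, 4)
snapshot = list(stack)
storevector(stack, ram, 6, 4)
print(snapshot, ram[6:10])
```

After .fetchvector the value stored at var_name is at the top of the stack,
and .storevector writes the values back in the same memory order.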

The .fetchvector and .storevector macros are intended to work with values stored
MSB first in memory and with the MSB toward the top of the data stack,
similarly to the Forth language with multi-word values.  To demonstrate how
this data structure works, consider the examples of decrementing and
incrementing a two-byte value on the data stack:

  ; Decrement a 2-byte value
  ;   swap 1- swap      - decrement the LSB
  ;   over -1=          - puts -1 on the top of the data stack if the LSB rolled
  ;                       over from 0 to -1, puts 0 on the top otherwise
  ;   +                 - decrements the MSB if the LSB rolled over
  ; ( u_LSB u_MSB - u_LSB' u_MSB' )
  .function decrement_2byte
  swap 1- swap over -1= .return(+)

  ; Increment a 2-byte value
  ;   swap 1+ swap      - increment the LSB
  ;   over 0=           - puts -1 on the top of the data stack if the LSB rolled
  ;                       over from 0xFF to 0, puts 0 on the top otherwise
  ;   -                 - increments the MSB if the LSB rolled over (by
  ;                       subtracting -1)
  ; ( u_LSB u_MSB - u_LSB' u_MSB' )
  .function increment_2byte
  swap 1+ swap over 0= .return(-)
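
The arithmetic in these two functions can be checked with the following Python
model.  It follows the comments above and assumes 8-bit wraparound, with the
comparison instructions leaving 0xFF (that is, -1) for true and 0 for false.

```python
def decrement_2byte(lsb, msb):
    lsb = (lsb - 1) & 0xFF               # swap 1- swap
    flag = 0xFF if lsb == 0xFF else 0    # over -1=
    msb = (msb + flag) & 0xFF            # + (adding -1 on rollover)
    return lsb, msb


def increment_2byte(lsb, msb):
    lsb = (lsb + 1) & 0xFF               # swap 1+ swap
    flag = 0xFF if lsb == 0 else 0       # over 0=
    msb = (msb - flag) & 0xFF            # - (subtracting -1 on rollover)
    return lsb, msb


print(decrement_2byte(0x00, 0x01), increment_2byte(0xFF, 0x00))
```

Decrementing 0x0100 gives 0x00FF and incrementing 0x00FF gives 0x0100, with
each result written as (LSB, MSB).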


COMPUTED VALUES
================================================================================

Computed values can be pushed on the stack using a "${...}" where the "..." is
evaluated in Python and cannot have any spaces.

For example, a loop that should be run 5 times can be coded as:

  ${5-1} :loop ... .jumpc(loop,1-) drop

which is a clearer indication that the loop is to be run 5 times than is the
instruction sequence

  4 :loop ...
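
The reason an initial count of 4 gives 5 passes can be seen in the following
Python model of the loop.  It assumes that .jumpc tests the top of the data
stack and that the "1-" given as its second argument runs after the test
whether or not the jump is taken; that reading is an assumption and should be
checked against the instruction documentation.

```python
def count_iterations(initial):
    """Count passes through ":loop ... .jumpc(loop,1-) drop"."""
    t = initial & 0xFF
    iterations = 0
    while True:
        iterations += 1       # the loop body "..."
        taken = t != 0        # .jumpc jumps when T is non-zero
        t = (t - 1) & 0xFF    # "1-" decrements the counter either way
        if not taken:
            break
    return iterations         # the final "drop" discards the counter


print(count_iterations(5 - 1))
```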

Constants can be accessed in the computation.  For example, a block of memory
can be allocated as follows:

  .constant C_RESERVE
  .memory RAM myram
  ...
  .variable reserved 0*${C_RESERVE}

and the block of reserved memory can be cleared using the following loop:

  ${C_RESERVE-1} :loop 0 over .storeindexed(reserved) .jumpc(loop,1-) drop

The offsets of variables in their memory can also be accessed through a computed
value.  The value of reserved could also be cleared as follows:

  ${reserved-1} ${C_RESERVE-1} :loop >r

  r> .jumpc(loop,-1) drop drop

The body of this version of the loop is the same length as the first version.
In general, it is better to use the memory macros to access variables as they
ensure the correct memory is accessed.

The sizes of memories can also be accessed using computed values.  If "myram" is
a RAM, then "${size['myram']}" will push the size of "myram" on the stack.  As
an example, the following code will clear the entire RAM:

  ${size['myram']-1} :loop 0 swap .jumpc(loop,.store-(myram)) drop

The lengths of I/O signals can also be accessed using computed values.  If
"o_mask" is a mask, then "${size['o_mask']}" will push the size of the mask on
the stack and "${2**size['o_mask']-1}" will push a value that sets all the bits
of the mask.  The I/O signals include I/O signals instantiated by peripherals.
For example, for the configuration command

  PERIPHERAL big_outport outport=O_BIG outsignal=o_big width=47

the width of the output signal is accessible using "${size['o_big']}".  You can
set the wide signal to all zeroes using:

  ${(size['o_big']+7)/8-1} :loop 0 .outport(O_BIG) .jumpc(loop,1-) drop
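
The expressions used above can be tried directly in Python.  The sketch below
uses "//" so that the byte count is an integer under Python 3; the "/" in the
assembly example gives the same result only where integer division applies,
which is an assumption about the Python version running ssbcc.  The function
names are hypothetical.

```python
def byte_writes(width_bits):
    """Number of 8-bit writes needed to fill a signal of the given width."""
    return (width_bits + 7) // 8


def all_ones_mask(width_bits):
    """Value with every bit of a width_bits-wide signal set."""
    return 2 ** width_bits - 1


size = {"o_big": 47}                    # width from the PERIPHERAL example
writes = byte_writes(size["o_big"])
loop_start = writes - 1                 # initial loop counter
print(writes, loop_start, hex(all_ones_mask(3)))
```

A 47-bit signal needs 6 byte-wide writes, so the loop counter starts at 5.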


MACROS
================================================================================

There are 3 types of macros used by the assembler.

The first kind of macros are built in to the assembler and are required to
encode instructions that have embedded values or have mandatory subsequent
instructions.  These include function calls, jump instructions, function return,
and memory accesses as follows:
  .call(function,[op])
  .callc(function,[op])
  .fetch(ramName)
  .fetch+(ramName)
  .fetch-(ramName)
  .jump(label,[op])
  .jumpc(label,[op])
  .return([op])
  .store(ramName)
  .store+(ramName)
  .store-(ramName)

The second kind of macros are designed to ease access to input and output
operations and for memory accesses and to help ensure these operations are
correctly constructed.  These are defined as Python scripts in the
core/9x8/macros directory and are automatically loaded into the assembler.
These macros are:
  .fetchindexed(variable)
  .fetchoffset(variable,ix)
  .fetchvalue(variableName)
  .fetchvector(variableName,N)
  .inport(I_name)
  .outport(O_name[,op])
  .outstrobe(O_name)
  .storeindexed(variableName[,op])
  .storeoffset(variableName,ix[,op])
  .storevalue(variableName[,op])
  .storevector(variableName,N)

The third kind of macros are user-defined macros.  These macros must be
registered with the assembler using the ".macro" directive.

For example, the ".push32" macro is defined by macros/9x8/push32.py and can be
used to push 32-bit (4-byte) values onto the data stack as follows:

  .macro push32
  .constant C_X 0x87654321
  .main
    ...
    .push32(0x12345678)
    .push32(C_X)
    .push32(${0x12345678^C_X})
    ...

The following macros are provided in macros/9x8:
  .push16(v)    push the 16-bit (2-byte) value "v" onto the data stack with the
                MSB at the top of the data stack
  .push24(v)    push the 24-bit (3-byte) value "v" onto the data stack with the
                MSB at the top of the data stack
  .push32(v)    push the 32-bit (4-byte) value "v" onto the data stack with the
                MSB at the top of the data stack
  .pushByte(v,ix)
                push the ix'th byte of v onto the data stack
                Note:  ix=0 designates the LSB
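
The byte order produced by these macros can be illustrated in Python.  This
sketch only models the resulting stack contents and is not the code in
macros/9x8; the function names are hypothetical.

```python
def push_byte(value, ix):
    """Byte ix of value, where ix=0 designates the LSB."""
    return (value >> (8 * ix)) & 0xFF


def push_bytes(value, nbytes):
    """Stack contents, bottom first, after pushing an nbytes-wide value."""
    return [push_byte(value, ix) for ix in range(nbytes)]


stack = push_bytes(0x12345678, 4)
print([hex(b) for b in stack])
```

The LSB is pushed first, so the MSB (0x12 here) ends up at the top of the data
stack, which is the last element of the list.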

Directories are searched in the following order for macros:
  .
  ./macros
  include paths specified by the '-M' command line option.
  macros/9x8

The Python scripts in core/9x8/macros and macros/9x8 can be used as design
examples for user-defined macros.  The assembler does some type checking based
on the list provided when the macro is registered by the "AddMacro" method, but
additional type checking is often warranted in the macro "emitFunction", which
emits the actual assembly code.  The ".fetchvector" and ".storevector" macros
demonstrate how to design variable-length macros.  Several macros in
core/9x8/macros illustrate designing macros with optional arguments.

It is not an error to repeat the ".macro MACRO_NAME" directive for user-defined
macros.  The assembler will issue a fatal error if a user-defined macro
conflicts with a built-in macro.


CONDITIONAL COMPILATION
================================================================================

Conditional compilation is accepted in the architecture file and in the assembly
source files and is based on whether or not the referenced symbol is defined.

In the architecture file, symbols are defined by CONSTANT, INPORT, OUTPORT, and
PERIPHERAL statements, or by the "-D D_name" argument to ssbcc.  The lines

  .IFDEF name
  <statements>
  .ENDIF

will then include the lines "<statements>" if the symbol "name" is defined.
The ".IFNDEF" command is similar except that the statements are included if the
symbol is not defined.  A ".ELSE" is also provided.

In an assembly file, directives are conditionally included using the .IFDEF and
.IFNDEF directives, an optional .ELSE directive, and the terminating .ENDIF
directive.  Note that this is done at the directive level, i.e. function
declarations, memory declarations, and so forth.  Within a function, code is
conditionally included starting with a ".ifdef(name)" or a ".ifndef(name)", an
optional ".else" and a terminating ".endif".
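
The line-level behavior of .IFDEF, .IFNDEF, .ELSE, and .ENDIF can be modeled
with the following Python sketch.  It is not the ssbcc implementation; the
function name is hypothetical, and the handling of ".define" and of nested
conditionals is an assumption made for illustration.

```python
def filter_lines(lines, defined):
    """Return the lines kept after conditional-compilation processing."""
    defined = set(defined)
    active = []   # one flag per open conditional, innermost last
    kept = []
    for line in lines:
        words = line.split()
        key = words[0] if words else ""
        if key in (".IFDEF", ".IFNDEF"):
            is_defined = words[1] in defined
            active.append(is_defined if key == ".IFDEF" else not is_defined)
        elif key == ".ELSE":
            active[-1] = not active[-1]
        elif key == ".ENDIF":
            active.pop()
        elif all(active):
            if key == ".define":
                defined.add(words[1])
            kept.append(line)
    if active:
        raise ValueError("missing .ENDIF")
    return kept


arch = [".IFDEF D_ENABLE_UART", "PORTCOMMENT Diagnostic UART", ".ENDIF"]
print(filter_lines(arch, {"D_ENABLE_UART"}), filter_lines(arch, set()))
```

The first call keeps the PORTCOMMENT line and the second call keeps nothing.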

For example, a diagnostic UART can be conditionally included in a program by
including the following lines in the architecture file:

  .IFDEF D_ENABLE_UART
  PORTCOMMENT Diagnostic UART
  PERIPHERAL UART_Tx outport=O_UART_TX ...
  .ENDIF

A "uart_tx" function can be optionally created using code similar to the
following.  Note that the symbol for the .outport macro is used to determine
whether or not the function is defined since that is more closely related to
whether or not the function can be defined.  The function definition must
precede any ".ifdef(uart_tx)" conditionals used to output diagnostics.

  .IFDEF O_UART_TX
  .function uart_tx
    :loop .outport(O_UART_TX) .jumpc(loop,nop) .return(drop)
  .ENDIF

Diagnostics in the assembly code can be included using either

  .ifdef(D_ENABLE_UART) N"Msg\r\n" .call(uart_tx) .endif

or

  .ifdef(uart_tx) N"Msg\r\n" .call(uart_tx) .endif

Conditional compilation cannot cross file boundaries.

The assembler also recognizes the ".define" directive.  For example, specific
diagnostics could be enabled if the UART is instantiated as follows:

  .IFDEF O_UART_TX
  .define D_DEBUG_FUNCTION_A
  .ENDIF

  ...

  .ifdef(D_DEBUG_FUNCTION_A) N"Debug Msg\r\n" .call(uart_tx) .endif

The following code illustrates how to preclude multiple attempted inclusions of
an assembly library file.

  ; put these two lines near the top of the file
  .IFNDEF D_FILENAME_INCLUDED
  .define D_FILENAME_INCLUDED
  ; put the library body here
  ...
  ; put this line at the bottom of the file
  .ENDIF ; .IFNDEF D_FILENAME_INCLUDED

The ".INCLUDE" configuration command can be used to read configuration commands
from additional sources.  For example, the following code will conditionally
include a general UART library if the outport O_UART_TX is defined:

  .IFDEF O_UART_TX
  .INCLUDE uart.s
  .ENDIF


SIMULATIONS
================================================================================

Simulations have been performed with Icarus Verilog, Verilator, and Xilinx'
ISIM.  Icarus Verilog is good for short, simple simulations and is used for the
core and peripheral test benches; Verilator for long simulations of large,
complex systems; and ISIM when Xilinx-specific cores are used.  Verilator is
the fastest simulator I've encountered.  Verilator is also used for lint
checking in the core test benches.


MEM INITIALIZATION FILE
================================================================================

A memory initialization file is produced during compilation.  This file can be
used with tools such as Xilinx' data2mem to modify the SRAM contents without
having to rebuild the entire system.  It is restricted to the opcode memory
initialization.  The file must be processed before it can be used by specific
tools; see doc/MemoryInitialization.html.

WARNING:  The values of parameters used in the assembly code must match the
instantiated design.


THEORY OF OPERATION
================================================================================

Registers are used for the top of data stack, "T", and the next-to-top of the
data stack, "N".  The data stack is a separate memory.  This means that the
"DATA_STACK N" configuration command actually allows N+2 values in the data
stack since T and N are not stored in the N-element deep data stack.

The return stack is similar in that "R" is the top of the return stack and the
"RETURN_STACK N" allocates an additional N words of memory.  The return stack
width is the wider of the 8-bit data width and the program counter width.
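
The capacities implied by these two paragraphs can be written as a short
Python sketch.  The function names are hypothetical; the counts follow
directly from the registers T, N, and R being held outside the stack
memories.

```python
def data_stack_capacity(depth):
    """DATA_STACK depth plus the T and N registers."""
    return depth + 2


def return_stack_capacity(depth):
    """RETURN_STACK depth plus the R register."""
    return depth + 1


print(data_stack_capacity(64), return_stack_capacity(32))
```

For example, "DATA_STACK 64" holds up to 66 values and "RETURN_STACK 32" holds
up to 33 values.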

The program counter is always either incremented by 1 or is set to an address
as controlled by jump, jumpc, call, callc, and return instructions.  The
registered program counter is used to read the next opcode from the instruction
memory and this opcode is also registered in the memory.  This means that there
is a 1 clock cycle delay between the address changing and the associated
instruction being performed.  This is also part of the architecture required to
have the processor operate at one instruction per clock cycle.

Separate ALUs are used for the program counter, adders, logical operations, etc.
and MUXes are used to select the values desired for the destination registers.
The instruction execution consists of translating the upper 6 msb of the opcode
into MUX settings and performing opcode-dependent ALU operations as controlled
by the 3 lsb of the opcode (during the first half of the clock cycle) and then
setting the T, N, R, memories, etc. as controlled by the computed MUX settings.
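
The split of the 9-bit opcode into these two fields can be sketched in Python.
The function names are hypothetical and the meaning of individual field values
is not modeled; the push encoding follows the "{ 1'b1, value }" form shown in
the PARAMETER and LOCALPARAM section.

```python
def split_opcode(opcode):
    """Return (upper 6 bits, lower 3 bits) of a 9-bit opcode."""
    if not 0 <= opcode < 512:
        raise ValueError("opcode must fit in 9 bits")
    return opcode >> 3, opcode & 0x7


def push_opcode(value):
    """9-bit opcode with the msb set and an 8-bit value in the low bits."""
    return 0x100 | (value & 0xFF)


print(split_opcode(push_opcode(0x12)))
```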

The "core.v" file is the code for these operations.  Within this file there are
several "@xxx@" strings that specify where the computer compiler is to insert
code such as I/O declarations, memories, inport interpretation, outport
generation, peripherals, etc.

The file structure, i.e., putting the core and the assembler in "core/9x8"
should facilitate application-specific modification of the processor.  For
example, the store+, store-, fetch+, and fetch- instructions could be replaced
with additional stack manipulation operations, arithmetic operations with 2 byte
results, etc.  Simply copy the "9x8" directory to something like "9x8_XXX" and
make your modifications in that directory.  The 8-bit peripherals should still
work, but the 9x8 library functions may need rework to accommodate the
modifications.


MISCELLANEOUS
================================================================================

Features and peripherals are still being added and the documentation is
incomplete.  The output HDL is currently restricted to Verilog although a VHDL
package file is automatically generated by the computer compiler.

The "INVERT_RESET" configuration command is used to indicate an active-low reset
is input to the micro controller rather than an active-high reset.
