OpenCores
URL https://opencores.org/ocsvn/ssbcc/ssbcc/trunk

Subversion Repositories ssbcc

ssbcc/trunk/README (revision 12)

SSBCC.9x8 is a free Small Stack-Based Computer Compiler with a 9-bit opcode,
8-bit data core designed to facilitate FPGA HDL development.

The primary design criteria are:
- high speed (to avoid timing issues)
- low fabric utilization
- vendor independent
- development tools available for all operating systems

It has been used in Spartan-3A, Spartan-6, Virtex-6, and Artix-7 FPGAs and has
been built for Altera, Lattice, and other Xilinx devices.  It is faster than,
and usually smaller than, vendor-provided processors.

The compiler takes an architecture file that describes the micro controller's
memory spaces, inputs and outputs, and peripherals, and that specifies the HDL
language and the assembly source.  It generates a single HDL module
implementing the entire micro controller.  No user-written HDL is required to
instantiate I/Os, program memory, etc.

The features are:
- high speed, low fabric utilization
- vendor-independent Verilog output with a VHDL package file
- simple Forth-like assembly language (43 instructions)
- single-cycle instruction execution
- automatic generation of I/O ports
- configurable instruction, data stack, return stack, and memory utilization
- extensible set of peripherals (I2C buses, UARTs, AXI4-Lite buses, etc.)
- optional interrupt peripheral and interrupt handler
- extensible set of macros
- memory initialization file to facilitate code development without rebuilds
- simulation diagnostics to facilitate identifying code errors
- conditionally included I/Os and peripherals, functions, and assembly code

SSBCC has been used for the following projects:
- operate a media translator from a parallel camera interface to an OMAP GPMC
  interface, detect and report bus errors and hardware errors, and act as an
  SPI slave to the OMAP
- operate two UART interfaces and multiple PWM-controlled 2-lead bi-color LEDs
- operate and monitor the Artix-7 fabric in a Zynq system using AXI4-Lite
  master and slave buses, and I2C buses for timing-critical voltage
  measurements

The only external tool required is Python 2.7.


DESCRIPTION
================================================================================

The computer compiler uses an architectural description of the processor stating
the sizes of the instruction memory, data stack, and return stack; the input and
output ports; RAM and ROM types and sizes; and peripherals.

All instructions execute in a single clock cycle.  The instructions are:
- 4 arithmetic instructions:  addition, subtraction, increment, and decrement
- 2 carry bit instructions:  +c and -c for addition and subtraction respectively
- 3 bit-wise logical instructions:  and, or, and exclusive or
- 7 shift and rotation instructions: <<0, <<1, 0>>, 1>>, <<msb, msb>>, and lsb>>
- 4 comparison instructions:  0=, 0<>, -1=, -1<>
- 6 Forth-like data stack instructions:  drop, dup, nip, over, push, swap
- 3 Forth-like return stack instructions:  >r, r>, r@
- 2 I/O instructions:  inport, outport
- 6 memory read and write instructions with optional address post-increment
  and post-decrement
- 2 jump instructions:  jump and conditional jump
- 2 call instructions:  call and conditional call
- 1 function return
- 1 nop

The 9x8 instruction address space is up to 8K (13-bit addresses).  This is
achieved by pushing the 8 lsb of the target address onto the data stack
immediately before the jump or call instruction and by encoding the 5 msb of the
address within the jump or call instruction.  The instruction immediately
following a jump, call, or return is executed before the instruction sequence at
the destination address is executed (this is illustrated later).
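
The following Python 3 sketch shows how a 13-bit target address can be split
between the pushed byte and the jump or call opcode.  The opcode values (push
0x100, jump 0x080, jumpc 0x0A0, call 0x0C0, nop 0x000, drop 0x054) are read off
the LED example's memory initialization listing.  Placing the 5 msb in the low
5 bits of the jump or call opcode is an assumption, since every target in that
listing is below 0x100.  The function name expand_branch is hypothetical.

```python
# Opcode values taken from the LED example's memory initialization listing.
PUSH = 0x100    # push: 0x100 | 8-bit value
NOP = 0x000
DROP = 0x054
JUMP = 0x080
JUMPC = 0x0A0
CALL = 0x0C0

# Default instruction that follows the jump or call instruction.
DEFAULT_NEXT = {JUMP: NOP, CALL: NOP, JUMPC: DROP}


def expand_branch(opcode, target, next_op=None):
    """Return the 3 opcodes for a .jump/.jumpc/.call macro (hypothetical).

    Assumption: the 5 msb of the 13-bit target occupy the low 5 bits
    of the jump/call opcode.
    """
    if not 0 <= target < 0x2000:
        raise ValueError("target is outside the 8K instruction space")
    lsb = target & 0xFF           # pushed onto the data stack first
    msb = (target >> 8) & 0x1F    # folded into the jump/call opcode
    if next_op is None:
        next_op = DEFAULT_NEXT[opcode]
    return [PUSH | lsb, opcode | msb, next_op]


# ".call(repause)" and ".jump(inner)" from the LED example's .main:
print([hex(op) for op in expand_branch(CALL, 0x00D)])  # ['0x10d', '0xc0', '0x0']
print([hex(op) for op in expand_branch(JUMP, 0x001)])  # ['0x101', '0x80', '0x0']
```

Both results match addresses 'h007 to 'h009 and 'h00A to 'h00C of that listing.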

Up to four banks of memory, either RAM or ROM, are available.  Each of these can
be up to 256 bytes long, providing a total of up to 1 kB of memory.

The assembly language is Forth-like.  Built-in macros are used to encode the
jump and call instructions and to encode the 2-bit memory bank index in memory
store and fetch instructions.

The computer compiler and assembler are written in Python 2.7.  Peripherals are
implemented by Python modules which generate the I/O ports and the peripheral
HDL.

The computer compiler is documented in the doc directory.  The 9x8 core is
documented in the core/9x8/doc directory.  Several examples are provided.

The computer compiler and assembler are fully functional, and there are no known
bugs.


SPEED AND RESOURCE UTILIZATION
================================================================================

These device speed and resource utilization results are copied from the build
tests.  The full results are listed in the core/9x8/build directories.  The
tests use a minimal processor implementation (clock, reset, and one output).
Device-specific scripts state how these performance numbers were obtained.

VENDOR          DEVICE          BEST SPEED      SMALLEST RESOURCE UTILIZATION
------          ------          ----------      -------------------------------
Altera          Cyclone-III     190.6 MHz       282 LEs           (preliminary)
Altera          Cyclone-IV      192.1 MHz       281 LEs           (preliminary)
Altera          Stratix-V       372.9 MHz       198 ALUTs         (preliminary)
Lattice         LCMXO2-640ZE-3   98.4 MHz       206 LUTs          (preliminary)
Lattice         LFE2-6E-7       157.9 MHz       203 LUTs          (preliminary)
Xilinx          Artix-7         TBD             163 slice LUTs (48 slices)
Xilinx          Kintex-7        TBD             158 slice LUTs (44 slices)
Xilinx          Spartan-3A      149.4 MHz       232 4-input LUTs (129 slices)
Xilinx          Spartan-6       193.7 MHz       124 Slice LUTs (34 slices)
Xilinx          Virtex-6        275.7 MHz       122 Slice LUTs (38 slices) (p.)

Disclaimer:  As with other embedded processors, these are maximum performance
numbers.  Realistic implementations will have lower maximum clock rates,
particularly with many I/O ports and peripherals and when the processor has to
coexist with other subsystems in the FPGA fabric.  What these performance
numbers do provide is an estimate of the amount of timing slack available.  For
example, you can't realistically expect to get 110 MHz from a processor that,
under ideal conditions, places and routes at 125 MHz, but you can with a
processor that is demonstrated to place and route at 150 MHz.
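
The slack in that example can be expressed in nanoseconds by comparing clock
periods.  This Python 3 sketch (the function names are illustrative only)
reproduces the 110 MHz comparison:

```python
def period_ns(freq_mhz):
    """Clock period in nanoseconds for a frequency given in MHz."""
    return 1000.0 / freq_mhz


def slack_ns(required_mhz, demonstrated_mhz):
    """Timing margin left when a design that places and routes at
    demonstrated_mhz is clocked at required_mhz (negative = too slow)."""
    return period_ns(required_mhz) - period_ns(demonstrated_mhz)


for demonstrated in (125.0, 150.0):
    margin = slack_ns(110.0, demonstrated)
    share = 100.0 * margin / period_ns(110.0)
    print("%.0f MHz: %.2f ns of slack (%.0f%% of the period)"
          % (demonstrated, margin, share))
```

The 125 MHz result leaves about 1.09 ns (12% of the 110 MHz period) for the
extra logic and routing of a real design, while the 150 MHz result leaves about
2.42 ns (27%).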

EXAMPLE
================================================================================

The LED flasher example demonstrates the simplicity of the architectural
specification and the Forth-like assembly language.

The architecture file, named "led.9x8", with the comments and user header
removed, is as follows:

  ARCHITECTURE    core/9x8 Verilog

  INSTRUCTION     2048
  RETURN_STACK    32
  DATA_STACK      32

  PORTCOMMENT LED on/off signal
  OUTPORT 1-bit o_led O_LED

  ASSEMBLY led.s

The ARCHITECTURE configuration command specifies the 9x8 core and the Verilog
language.  The INSTRUCTION, RETURN_STACK, and DATA_STACK configuration commands
specify the sizes of the instruction space, return stack, and data stack.  The
content of the PORTCOMMENT configuration command is inserted in the module
declaration; this makes it easier to identify signals in micro controllers with
a lot of inputs and outputs.  The single OUTPORT statement specifies a 1-bit
signal named "o_led".  This signal is accessed in the assembly code through the
symbol "O_LED".  The ASSEMBLY command specifies the single input file "led.s",
which is listed below.  The output module will be "led.v".

The "led.s" assembly file is as follows:

  ; Consume 256*5+4 clock cycles.
  ; ( - )
  .function pause
    0 :inner 1- dup .jumpc(inner) drop
  .return

  ; Repeat "pause" 256 times.
  ; ( - )
  .function repause
    0 :inner .call(pause) 1- dup .jumpc(inner) drop
  .return

  ; main program (as an infinite loop)
  .main
    0 :inner 1 ^ dup .outport(O_LED) .call(repause) .jump(inner)

This example is coded in a traditional Forth structure with the conditional
jumps consuming the top of the data stack.  Examining the "pause" function, the
".function" directive specifies the start of a function and the function name.
The "0" instruction pushes the value "0" onto the top of the data stack.
":inner" is a label for a jump instruction.  The "1-" instruction decrements the
top of the data stack.  "dup" is the Forth instruction to push a duplicate of
the top of the data stack onto the data stack.  The ".jumpc(inner)" macro
expands to three instructions as follows:  (1) push the 8 lsb of the address at
"inner" onto the data stack, (2) the conditional jump instruction, which encodes
the 5 msb of the address of "inner" and drops the partial address from the top
of the data stack, and (3) a "drop" instruction to drop the duplicated loop
count from the top of the data stack.  Finally, the "drop" instruction drops the
loop count from the top of the data stack and the ".return" macro generates the
"return" instruction and a "nop" instruction.

The function "repause" calls the "pause" function 256 times.  The main program
body is identified by the directive ".main".  This function runs an infinite
loop that toggles the lsb of the LED output, outputs the LED setting, and calls
the "repause" function.

A tighter version of the loop in the "pause" function can be written as

  ; Consume 256*3+3 clock cycles.
  ; ( - )
  .function pause
    0xFF :inner .jumpc(inner,1-) .return(drop)

Each iteration of this loop takes 3 cycles.  The "drop" that is normally part
of the ".jumpc" macro has been replaced by the decrement instruction, and the
final "drop" instruction has replaced the default "nop" instruction that is
normally part of the ".return" macro.  Note that the decrement is performed
after the non-zero comparison in the "jumpc" instruction, which is why the loop
count starts at 0xFF instead of 0.
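
The cycle counts in the two "pause" comments can be checked with a small
Python 3 model.  The sketch below (hypothetical names, not part of SSBCC) counts
passes through an 8-bit down-counting loop, with the decrement either before the
"jumpc" test, as in the original version, or after it, as in the tighter version.
Only the cycles inside the function are counted, not the call itself.

```python
def count_passes(start, test_first):
    """Passes through an 8-bit down-counting jumpc loop (hypothetical).

    test_first=False models "0 :inner 1- dup .jumpc(inner)": the count
    is decremented and then tested.  test_first=True models
    "0xFF :inner .jumpc(inner,1-)": jumpc tests the count and the 1-
    that follows it decrements the count afterwards.
    """
    count, passes = start, 0
    while True:
        passes += 1
        if not test_first:
            count = (count - 1) & 0xFF
        taken = count != 0
        if test_first:
            count = (count - 1) & 0xFF
        if not taken:
            return passes


# Original: push 0, 5 cycles per pass (1- dup push jumpc drop),
# then drop, return, and nop.
original = 1 + 5 * count_passes(0, test_first=False) + 3
# Tighter: push 0xFF, 3 cycles per pass (push jumpc 1-),
# then return and the drop that follows it.
tighter = 1 + 3 * count_passes(0xFF, test_first=True) + 2
print(original, tighter)  # 1284 771
```

With the test ahead of the decrement, count_passes(0, test_first=True) returns
1: a starting count of 0 exits on the first pass.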

A version of the "pause" function that consumes exactly 1000 clock cycles is:

  .function pause
    ${(1000-4)/4-1} :inner nop .jumpc(inner,1-) drop .return
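
The "${...}" expression is evaluated by Python.  This Python 3 sketch (the
function name pause_count is hypothetical) shows where its terms come from and
generalizes it to other cycle counts.  It uses divmod to keep the arithmetic in
integers and to reject totals the loop cannot hit exactly.

```python
def pause_count(total_cycles):
    """Initial count n for  ${n} :inner nop .jumpc(inner,1-) drop .return
    so that the function takes exactly total_cycles clocks (hypothetical).

    Fixed cost: the initial push, drop, return, and nop = 4 cycles.
    Each pass: nop, address push, jumpc, and 1- = 4 cycles.  jumpc tests
    the count before 1- decrements it, so a count of n makes n+1 passes.
    """
    passes, leftover = divmod(total_cycles - 4, 4)
    if leftover:
        raise ValueError("total_cycles - 4 must be a multiple of 4")
    count = passes - 1
    if not 0 <= count <= 0xFF:
        raise ValueError("loop count does not fit in 8 bits")
    return count


print(pause_count(1000))  # 248, the value of ${(1000-4)/4-1}
```

Under these assumptions the loop can produce any multiple of 4 from 8 to 1028
clock cycles.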

The instruction memory initialization for the processor module annotates each
address with the mnemonic of the instruction stored there, replacing the "list"
file output by traditional assemblers.  The following is the memory
initialization for this LED flasher example.  The main program always starts at
address zero and functions are included in the order in which they are
encountered.  Unused library functions are not included in the generated
instruction list.

  reg [8:0] s_opcodeMemory[2047:0];
  initial begin
    // .main
    s_opcodeMemory['h000] = 9'h100; // 0x00
    s_opcodeMemory['h001] = 9'h101; // :inner 0x01
    s_opcodeMemory['h002] = 9'h052; // ^
    s_opcodeMemory['h003] = 9'h008; // dup
    s_opcodeMemory['h004] = 9'h100; // O_LED
    s_opcodeMemory['h005] = 9'h038; // outport
    s_opcodeMemory['h006] = 9'h054; // drop
    s_opcodeMemory['h007] = 9'h10D; //
    s_opcodeMemory['h008] = 9'h0C0; // call repause
    s_opcodeMemory['h009] = 9'h000; // nop
    s_opcodeMemory['h00A] = 9'h101; //
    s_opcodeMemory['h00B] = 9'h080; // jump inner
    s_opcodeMemory['h00C] = 9'h000; // nop
    // repause
    s_opcodeMemory['h00D] = 9'h100; // 0x00
    s_opcodeMemory['h00E] = 9'h119; // :inner
    s_opcodeMemory['h00F] = 9'h0C0; // call pause
    s_opcodeMemory['h010] = 9'h000; // nop
    s_opcodeMemory['h011] = 9'h05C; // 1-
    s_opcodeMemory['h012] = 9'h008; // dup
    s_opcodeMemory['h013] = 9'h10E; //
    s_opcodeMemory['h014] = 9'h0A0; // jumpc inner
    s_opcodeMemory['h015] = 9'h054; // drop
    s_opcodeMemory['h016] = 9'h054; // drop
    s_opcodeMemory['h017] = 9'h028; // return
    s_opcodeMemory['h018] = 9'h000; // nop
    // pause
    s_opcodeMemory['h019] = 9'h100; // 0x00
    s_opcodeMemory['h01A] = 9'h05C; // :inner 1-
    s_opcodeMemory['h01B] = 9'h008; // dup
    s_opcodeMemory['h01C] = 9'h11A; //
    s_opcodeMemory['h01D] = 9'h0A0; // jumpc inner
    s_opcodeMemory['h01E] = 9'h054; // drop
    s_opcodeMemory['h01F] = 9'h054; // drop
    s_opcodeMemory['h020] = 9'h028; // return
    s_opcodeMemory['h021] = 9'h000; // nop
    s_opcodeMemory['h022] = 9'h000;
    s_opcodeMemory['h023] = 9'h000;
    s_opcodeMemory['h024] = 9'h000;
    ...
    s_opcodeMemory['h7FF] = 9'h000;
  end
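
To read a listing like this one, a partial disassembler is enough.  The Python 3
sketch below knows only the opcodes that appear in the listing above (it is not
the full 9x8 instruction set; see core/9x8/doc/opcodes.html for that), and
treating the low 5 bits of jump, jumpc, and call as the address msb is an
assumption.  The table and function names are hypothetical.

```python
# Partial opcode table, read off the LED example listing only.
MNEMONICS = {
    0x000: "nop",
    0x008: "dup",
    0x028: "return",
    0x038: "outport",
    0x052: "^",
    0x054: "drop",
    0x05C: "1-",
}
BRANCHES = {0x080: "jump", 0x0A0: "jumpc", 0x0C0: "call"}


def decode(opcode):
    """Mnemonic for a 9-bit opcode, limited to the opcodes listed above."""
    if opcode & 0x100:                  # msb set: push an 8-bit value
        return "push 0x%02X" % (opcode & 0xFF)
    if opcode in MNEMONICS:
        return MNEMONICS[opcode]
    base = opcode & 0x1E0               # assumption: low 5 bits = address msb
    if base in BRANCHES:
        return "%s (address msb 0x%02X)" % (BRANCHES[base], opcode & 0x1F)
    return "unknown 0x%03X" % opcode


# The .main function from the listing above.
main = [0x100, 0x101, 0x052, 0x008, 0x100, 0x038, 0x054,
        0x10D, 0x0C0, 0x000, 0x101, 0x080, 0x000]
for address, opcode in enumerate(main):
    print("%03X  %s" % (address, decode(opcode)))
```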


DATA and STRINGS
================================================================================

Values are pushed onto the data stack by stating the value.  For example,

  0x10 0x20 'x'

will successively push the values 0x10, 0x20, and the character 'x' onto the
data stack.  The character 'x' will be at the top of the data stack after these
3 instructions.

Numeric values can be represented in binary, octal, decimal, and hex.  Binary
values start with the two characters "0b" followed by a sequence of binary
digits; octal numbers start with a "0" followed by a sequence of octal digits;
decimal values can start with a "+" or "-", have a non-zero first digit, and
have zero or more additional decimal digits; and hex values start with the two
characters "0x" followed by a sequence of hex digits.

Examples of equivalent numeric values are:
  binary:   0b01  0b10010
  octal:    01    022
  decimal:  1     18
  hex:      0x1   0x12
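
The four numeric formats can be summarized as a small parser.  This Python 3
sketch (parse_number and push_byte are hypothetical names) follows the rules
above; accepting both "0x" and "0X" and wrapping negative decimals into 8 bits
(e.g. -1 becomes 0xFF) are assumptions of the sketch.

```python
import re


def parse_number(token):
    """Parse a numeric literal following the rules stated above."""
    if re.fullmatch(r"0b[01]+", token):
        return int(token[2:], 2)
    if re.fullmatch(r"0[xX][0-9a-fA-F]+", token):   # "0X" accepted: assumption
        return int(token[2:], 16)
    if re.fullmatch(r"0[0-7]*", token):              # "0" alone is zero
        return int(token, 8)
    if re.fullmatch(r"[+-]?[1-9][0-9]*", token):
        return int(token, 10)
    raise ValueError("not a numeric literal: %r" % token)


def push_byte(token):
    """Byte pushed onto the 8-bit data stack (assumes two's-complement
    wrapping for negative decimal values)."""
    return parse_number(token) & 0xFF


for row in (["0b01", "01", "1", "0x1"], ["0b10010", "022", "18", "0x12"]):
    print([parse_number(t) for t in row])  # [1, 1, 1, 1] then [18, 18, 18, 18]
```

Note that "09" is rejected: it starts with "0" but is not a valid octal number.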

See the COMPUTED VALUES section for using computed values in the assembler.

There are four ways to specify strings in the assembler.  Simply stating the
string

  "Hello World!"

puts the characters in the string onto the data stack with the letter 'H' at the
top of the data stack.  I.e., the individual push operations are

  '!' 'd' 'l' ... 'e' 'H'

Prepending an 'N' to the double quote, like

  N"Hello World!"

puts a null-terminated string onto the data stack.  I.e., the value under the
'!' will be a 0x00 and the instruction sequence would be

  0x0 '!' 'd' 'l' ... 'e' 'H'

Forth uses counted strings, which are specified here as

  C"Hello World!"

In this case the number of characters, 12, in the string is pushed onto the data
stack after the 'H', i.e., the instruction sequence would be

  '!' 'd' 'l' ... 'e' 'H' 12

Finally, a lesser-counted string specified like

  c"Hello World!"

is similar to the Forth-like counted string except that the value pushed onto
the data stack is one less than the number of characters in the string.  Here
the value pushed onto the data stack after the 'H' would be 11 instead of 12.
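
The push order for the four string forms can be sketched in Python 3 as
follows.  The helper name string_pushes is hypothetical, and escape sequences
are not expanded in this sketch.

```python
def string_pushes(literal):
    """Values pushed, in push order, for a plain, N, C, or c string
    literal (hypothetical helper; escape sequences are not expanded)."""
    prefix, text = "", literal
    if literal[:1] in ("N", "C", "c"):
        prefix, text = literal[0], literal[1:]
    if len(text) < 2 or text[0] != '"' or text[-1] != '"':
        raise ValueError("expected a double-quoted string")
    pushes = [ord(ch) for ch in reversed(text[1:-1])]  # first char pushed last
    if prefix == "N":
        return [0x00] + pushes             # terminator is deepest
    if prefix == "C":
        return pushes + [len(pushes)]      # count ends up on top
    if prefix == "c":
        return pushes + [len(pushes) - 1]  # one less than the count
    return pushes


print(string_pushes('C"Hi"'))  # [105, 72, 2]
```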

Simple strings are useful for constructing more complex strings in conjunction
with other string functions.  For example, to transmit the hex values of the
top 2 values on the data stack, do something like:

  ; move the top 2 values to the return stack
  >r >r
  ; push the tail of the message onto the data stack
  N"\n\r"
  ; convert the 2 values to 2-digit hex values, LSB deepest in the stack
  r> .call(string_byte_to_hex)
  r> .call(string_byte_to_hex)
  ; prepend the identification message
  "Message:  "
  ; transmit the string, using the null terminator to terminate the loop
  :loop_transmit .outport(O_UART_TX) .jumpc(loop_transmit,nop) drop

A lesser-counted string would be used like:

  c"Status Message\r\n"
  :loop_msg swap .outport(O_UART_TX) .jumpc(loop_msg,1-) drop

Because "jumpc" tests the count before the "1-" decrements it, starting from a
count one less than the string length transmits every character exactly once.

These four string formats can also be used for variable definitions.  For
example, 3 variables could be allocated and initialized as follows:

  .memory ROM myrom
  .variable fred N"fred"
  .variable joe  c"joe"
  .variable moe  "moe"

These are equivalent to

  .variable fred 'f' 'r' 'e' 'd'  0
  .variable joe   2  'j' 'o' 'e'
  .variable moe  'm' 'o' 'e'

with 5 bytes allocated for the variable fred, 4 bytes for joe, and 3 bytes for
moe.
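
The byte layout of such variables can be sketched in Python 3 as below.  The
helper name variable_bytes is hypothetical, escape sequences are not expanded,
and the layout for C"..." (count first, like c"...") is an assumption because
the example above does not show it.

```python
def variable_bytes(literal):
    """Bytes stored, in address order, for '.variable name <literal>'
    (hypothetical helper; escape sequences are not expanded)."""
    prefix, text = "", literal
    if literal[:1] in ("N", "C", "c"):
        prefix, text = literal[0], literal[1:]
    if len(text) < 2 or text[0] != '"' or text[-1] != '"':
        raise ValueError("expected a double-quoted string")
    body = [ord(ch) for ch in text[1:-1]]  # first character at lowest address
    if prefix == "N":
        return body + [0]                  # null terminator last
    if prefix == "C":
        return [len(body)] + body          # assumed: count first
    if prefix == "c":
        return [len(body) - 1] + body      # count minus one first
    return body


for lit in ('N"fred"', 'c"joe"', '"moe"'):
    print(lit, variable_bytes(lit))
```

Each layout is the reverse of the order in which the same string form is pushed
onto the data stack, so the first character of the string sits at the lowest
address.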
362
 
363
The following escaped characters are recognized:
364
 
365
  '\0'     null character
366
  '\a'     bell
367
  '\b'     backspace
368
  '\f'     form feed
369
  '\n'     line feed
370
  '\r'     carriage return
371
  '\t'     horizontal tab
372
  "\0ooo"  3-digit octal value
373
  "\xXX"   2-digit hex value where X is one of 0-9, a-f, or A-F
374
  "\Xxx"   alternate form for 2-digit hex value
375
  "\\"     backslash character
376
 
377
Unrecognized escaped characters are simple treated as that character.  For
378
example, '\m' is treated as the single character 'm' and '\'' is treated as the
379
single quote character.
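
The escape rules above can be sketched as a Python 3 decoder (decode_escapes is
a hypothetical name).  Reading "\0ooo" as "\0" followed by exactly three octal
digits, with "\0" alone meaning the null character, is this sketch's
interpretation of the table.

```python
OCTAL_DIGITS = "01234567"
HEX_DIGITS = "0123456789abcdefABCDEF"
SIMPLE_ESCAPES = {"0": "\0", "a": "\a", "b": "\b", "f": "\f",
                  "n": "\n", "r": "\r", "t": "\t", "\\": "\\"}


def decode_escapes(body):
    """Expand the escape sequences in the body of a string literal."""
    out, i = [], 0
    while i < len(body):
        if body[i] != "\\" or i + 1 == len(body):
            out.append(body[i])                    # ordinary character
            i += 1
            continue
        code = body[i + 1]
        octal = body[i + 2:i + 5]
        hexa = body[i + 2:i + 4]
        if (code == "0" and len(octal) == 3
                and all(d in OCTAL_DIGITS for d in octal)):
            out.append(chr(int(octal, 8)))         # "\0ooo"
            i += 5
        elif (code in "xX" and len(hexa) == 2
                and all(d in HEX_DIGITS for d in hexa)):
            out.append(chr(int(hexa, 16)))         # "\xXX" or "\Xxx"
            i += 4
        else:
            # Known single-character escapes; anything else is itself,
            # so "\m" gives "m" and "\'" gives "'".
            out.append(SIMPLE_ESCAPES.get(code, code))
            i += 2
    return "".join(out)


print(repr(decode_escapes(r"Status\x21\r\n")))  # 'Status!\r\n'
```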


INSTRUCTIONS
================================================================================

The 43 instructions are as follows (see core/9x8/doc/opcodes.html for detailed
descriptions).  Here, T is the top of the data stack, N is the next-to-top of
the data stack, and R is the top of the return stack.  All of these are the
values at the start of the instruction.

The nop instruction does nothing:

  nop           no operation

Mathematical operations drop one value from the data stack and replace the new
top with the stated value:

  &             bitwise and of N and T
  +             N + T
  -             N - T
  ^             bitwise exclusive or of N and T
  or            bitwise or of N and T

Push the carry bit for addition or subtraction onto the data stack (see
lib/9x8/math.s for examples of using +c and -c for multi-byte arithmetic):

  +c            carry bit for N+T
  -c            carry bit for N-T

Increment and decrement replace the top of the data stack with the stated
result:

  1+            replace T with T+1
  1-            replace T with T-1

Comparison operations replace the top of the data stack with the results of the
comparison:

  -1<>          replace T with -1 if T != -1, otherwise set T to 0
  -1=           replace T with 0 if T != -1, otherwise leave T as -1
  0<>           replace T with -1 if T != 0, otherwise leave T as 0
  0=            replace T with -1 if T == 0, otherwise set T to 0

Shift/rotate operations replace the top of the data stack with the result of the
specified shift/rotate:

  0>>           shift T right 1 bit and set the msb to 0
  1>>           shift T right 1 bit and set the msb to 1
  <<0           shift T left 1 bit and set the lsb to 0
  <<1           shift T left 1 bit and set the lsb to 1
  <<msb         rotate T left 1 bit
  lsb>>         rotate T right 1 bit
  msb>>         shift T right 1 bit and set the msb to the old msb

Note:  There is no "<<lsb" instruction.
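
The comparison and shift/rotate descriptions can be checked against a Python 3
reference model of T as an 8-bit value, with -1 represented as 0xFF.  The
function names are hypothetical, and the "<<msb" behavior follows the
description above rather than the core's HDL.

```python
MASK = 0xFF  # the data stack is 8 bits wide


def compare(op, t):
    """Result of a comparison instruction; -1 is represented as 0xFF."""
    results = {
        "0=": t == 0,
        "0<>": t != 0,
        "-1=": t == 0xFF,
        "-1<>": t != 0xFF,
    }
    return 0xFF if results[op] else 0x00


def shift(op, t):
    """Result of a shift/rotate instruction applied to the 8-bit value t."""
    msb, lsb = (t >> 7) & 1, t & 1
    if op == "0>>":
        return t >> 1
    if op == "1>>":
        return (t >> 1) | 0x80
    if op == "msb>>":
        return (t >> 1) | (msb << 7)      # keep the old msb
    if op == "lsb>>":
        return (t >> 1) | (lsb << 7)      # rotate right
    if op == "<<0":
        return (t << 1) & MASK
    if op == "<<1":
        return ((t << 1) & MASK) | 1
    if op == "<<msb":
        return ((t << 1) & MASK) | msb    # rotate left
    raise ValueError("unknown shift/rotate instruction: " + op)


print(hex(shift("msb>>", 0x80)), hex(compare("0=", 0)))  # 0xc0 0xff
```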

Stack manipulation instructions are as follows:

  >r            push T onto the return stack and drop T from the data stack
  drop          drop T from the data stack
  dup           push T onto the data stack
  nip           drop N from the data stack
  over          push N onto the data stack
  push          push a single byte onto the data stack (see the preceding DATA
                and STRINGS section)
  r>            push R onto the data stack and drop R from the return stack
  r@            push R onto the data stack
  swap          swap N and T
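
The stack effects in this list can be modeled with two Python lists whose last
elements are T and R.  This Python 3 sketch uses a hypothetical class name and,
unlike the hardware stacks sized by DATA_STACK and RETURN_STACK, does not limit
the stack depth.

```python
class Stacks:
    """Data stack (T is the last element) and return stack (R is the
    last element) for the stack manipulation instructions."""

    def __init__(self, data=()):
        self.data = list(data)
        self.ret = []

    def execute(self, op):
        d, r = self.data, self.ret
        if op == ">r":
            r.append(d.pop())
        elif op == "r>":
            d.append(r.pop())
        elif op == "r@":
            d.append(r[-1])
        elif op == "drop":
            d.pop()
        elif op == "dup":
            d.append(d[-1])
        elif op == "nip":
            del d[-2]
        elif op == "over":
            d.append(d[-2])
        elif op == "swap":
            d[-1], d[-2] = d[-2], d[-1]
        else:
            raise ValueError("not a stack instruction: " + op)
        return self


print(Stacks([1, 2, 3]).execute("over").data)  # [1, 2, 3, 2]
```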

Jump, call, and return instructions, including the conditional jump and call
variants, are as follows; each must be used through the associated macro:

  call          call instruction -- use the .call macro
  callc         conditional call instruction -- use the .callc macro
  jump          jump instruction -- use the .jump macro
  jumpc         conditional jump instruction -- use the .jumpc macro
  return        return instruction -- use the .return macro

See the MEMORY section for details on these memory operations.  T is the
address for all of them, and N is the value stored by the store instructions.
Chained fetches insert the fetched value below T.  Chained stores drop N.

  fetch         memory fetch, replace T with the value fetched
  fetch+        chained memory fetch, retain and increment the address
  fetch-        chained memory fetch, retain and decrement the address
  store         memory store, drop T (N becomes the new T)
  store+        chained memory store, retain and increment the address
  store-        chained memory store, retain and decrement the address
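
The stack effects of the six memory instructions can be sketched in Python 3 as
follows.  The class name is hypothetical, bank selection is not modeled, and
wrapping the chained address within 8 bits is an assumption.

```python
class MemoryBank:
    """One 256-byte bank and the stack effects of the six memory
    instructions (hypothetical model; bank selection is not modeled).
    The data stack is a Python list with T as its last element."""

    def __init__(self):
        self.mem = [0] * 256

    def execute(self, op, stack):
        addr = stack[-1] & 0xFF
        step = {"fetch+": 1, "store+": 1, "fetch-": -1, "store-": -1}.get(op, 0)
        if op == "fetch":
            stack[-1] = self.mem[addr]            # replace T
        elif op in ("fetch+", "fetch-"):
            stack.insert(-1, self.mem[addr])      # value goes below T
            stack[-1] = (addr + step) & 0xFF      # assumed 8-bit wrap
        elif op == "store":
            self.mem[addr] = stack[-2]
            stack.pop()                           # drop T, N becomes T
        elif op in ("store+", "store-"):
            self.mem[addr] = stack[-2]
            del stack[-2]                         # drop N, keep address
            stack[-1] = (addr + step) & 0xFF
        else:
            raise ValueError("not a memory instruction: " + op)
        return stack


bank = MemoryBank()
stack = [ord("b"), ord("a"), 0x10]   # T is the address 0x10
bank.execute("store+", stack)        # mem[0x10] = 'a'
bank.execute("store+", stack)        # mem[0x11] = 'b'
print(stack, bank.mem[0x10:0x12])    # [18] [97, 98]
```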

See the INPORT and OUTPORT section for details on the input and output port
operations:

  inport        input port operation
  outport       output port operation

The .call, .callc, .jump, and .jumpc macros encode the 3 instructions required
to perform a call or jump:  the push of the 8 lsb of the target address, the
call or jump instruction itself, and the instruction immediately following the
call or jump, which is always executed.  The default third instruction is "nop"
for .call and .jump and "drop" for .callc and .jumpc.  The default can be
changed by specifying the optional second argument.  The .call and .callc macros
must specify a function identified by the .function directive, and the .jump
and .jumpc macros must specify a label.

The .function directive takes the name of the function and the function body.
Function bodies must end with a .return or a .jump macro.  The .main directive
defines the body of the main function, i.e., the function at which the processor
starts.

The .include directive is used to read additional assembly code.  You can, for
example, put the main function in uc.s, define constants and such in consts.s,
define the memories and variables in ram.s, and include UART utilities in
uart.s.  These files could be included in uc.s through the following lines:

  .include consts.s
  .include ram.s
  .include uart.s

The assembler only includes functions that can be reached from the main
function.  Unused functions will not consume instruction space.


INPORT and OUTPORT
================================================================================

The INPORT and OUTPORT configuration commands are used to specify 2-state inputs
and outputs.  For example,

  INPORT 8-bit i_value I_VALUE

specifies a single 8-bit input signal named "i_value" for the module.  The port
is accessed in assembly by ".inport(I_VALUE)", which is equivalent to the
two-instruction sequence "I_VALUE inport".  To input an 8-bit value from a FIFO
and send a single-clock-cycle-wide acknowledgment strobe, use

  INPORT 8-bit,strobe i_fifo,o_fifo_ack I_FIFO

The assembly ".inport(I_FIFO)" automatically sends an acknowledgment strobe to
the FIFO through "o_fifo_ack".

A write port to an 8-bit FIFO is similarly specified by

  OUTPORT 8-bit,strobe o_fifo,o_fifo_wr O_FIFO

The assembly ".outport(O_FIFO)", which is equivalent to "O_FIFO outport drop",
automatically sends a write strobe to the FIFO through "o_fifo_wr".

Multiple signals can be packed into a single input or output port by defining
them in comma-separated lists.  The associated bit masks can be defined
coincident with the port definition as follows:

  INPORT 1-bit,1-bit i_fifo_full,i_fifo_empty I_FIFO_STATUS
  CONSTANT C_FIFO_STATUS__FULL  0x02
  CONSTANT C_FIFO_STATUS__EMPTY 0x01

Checking the "full" status of the FIFO can be done by the following assembly
sequence:

  .inport(I_FIFO_STATUS) C_FIFO_STATUS__FULL &

Multiple bits can be masked using a computed value as follows (see below for
more details):

  .inport(I_FIFO_STATUS) ${C_FIFO_STATUS__FULL|C_FIFO_STATUS__EMPTY} &

The "${...}" creates an instruction to push the 8-bit value of the expression
in the braces onto the data stack.  The computation is performed using the
Python "eval" function in the context of the program constants, memory
addresses, and memory sizes.
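
The packing order and the computed mask can be illustrated with a Python 3
sketch.  Both function names are hypothetical; the packing order (first signal
listed in the most significant bits) is inferred from the C_FIFO_STATUS__FULL
and C_FIFO_STATUS__EMPTY constants above, and the 8-bit range check is this
sketch's own choice.

```python
def pack_port(signals):
    """Pack (width, value) pairs into one port value, first signal
    listed in the most significant bits."""
    value = 0
    for width, bits in signals:
        value = (value << width) | (bits & ((1 << width) - 1))
    return value


def computed_value(expression, constants):
    """Evaluate a ${...} expression against a table of constants with
    Python's eval (no builtins) and check that it fits in 8 bits."""
    value = eval(expression, {"__builtins__": {}}, dict(constants))
    if not 0 <= value <= 0xFF:
        raise ValueError("computed value does not fit in 8 bits")
    return value


constants = {"C_FIFO_STATUS__FULL": 0x02, "C_FIFO_STATUS__EMPTY": 0x01}
mask = computed_value("C_FIFO_STATUS__FULL|C_FIFO_STATUS__EMPTY", constants)
status = pack_port([(1, 1), (1, 0)])    # i_fifo_full=1, i_fifo_empty=0
print(mask, status & constants["C_FIFO_STATUS__FULL"])   # 3 2
```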

Preceding all of these by

  PORTCOMMENT external FIFO

produces the following in the Verilog module statement.  The I/O ports are
listed in the order in which they are declared.

  // external FIFO
  input  wire       [7:0] i_fifo,
  output reg              o_fifo_ack,
  output reg        [7:0] o_fifo,
  output reg              o_fifo_wr,
  input  wire             i_fifo_full,
  input  wire             i_fifo_empty

The HDL to implement the inputs and outputs is computer-generated.  Identifying
the port name in the architecture file eliminates the possibility of
inconsistent port numbers between the HDL and the assembly.  Specifying the bit
mapping for the assembly code immediately after the port definition helps
prevent inconsistencies between the port definition and the bit mapping in the
assembly code.

The default initial value for an outport is zero.  This can be changed by
including an optional initial value as follows.  This initial value will be
applied on system startup and when the micro controller is reset.

  OUTPORT 4-bit=4'hA o_signal O_SIGNAL

An isolated output strobe can also be created using:

  OUTPORT strobe o_strobe O_STROBE

The assembly ".outstrobe(O_STROBE)", which is equivalent to "O_STROBE outport",
is used to generate the strobe.  Since "O_STROBE" is a strobe-only outport, the
".outport" macro cannot be used with it.  Similarly, the ".outstrobe" macro
generates an error if it is invoked with an outport that has data.

A single-bit "set-reset" input port type is also included.  This sets a register
when an external strobe is received and clears the register when the port is
read.  For example, to capture an external timer for a polled loop, include the
following in the architecture file:

  PORTCOMMENT external timer
  INPORT set-reset i_timer I_TIMER

The following is the assembly code to conditionally call two functions when the
timer event is encountered:

  .inport(I_TIMER)
    .callc(timer_event_1,nop)
    .callc(timer_event_2)

The "nop" in the first conditional call prevents the condition from being
dropped from the data stack so that it can be used by the subsequent conditional
function call.

The input from a set-reset INPORT is a pure flag.  I.e., either all of the bits
are zero or all of the bits are one.  Since all ones is -1, adding the flag to a
counter decrements the counter when the event occurred and leaves it unchanged
otherwise, which can be used to execute a loop a fixed number of times.  For
example, the inperiod argument of the servo_motor peripheral can be used to
receive a strobe every time the PWM goes high.  The following loop will wait for
10 occurrences of the rising edge of the servo_motor PWM before proceeding to
the next block of code:

  10 :loop .inport(I_INPERIOD) + .jumpc(loop,nop) drop
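
The counting loop can be simulated in Python 3 to see how the pure flag drives
it.  The function name and the sample pattern are hypothetical; each sample says
whether the set-reset register was set when the port was read.

```python
def polls_until_count(strobes, start=10):
    """Simulate  start :loop .inport(I_INPERIOD) + .jumpc(loop,nop) drop
    (hypothetical model).

    strobes holds one boolean per pass through the loop: whether the
    set-reset register had been set when the port was read.  Returns the
    number of passes made before the loop falls through.
    """
    count = start
    for passes, seen in enumerate(strobes, start=1):
        flag = 0xFF if seen else 0x00     # pure flag: all ones or all zeros
        count = (count + flag) & 0xFF     # adding 0xFF (-1) decrements
        if count == 0:                    # jumpc does not jump; drop follows
            return passes
    raise RuntimeError("ran out of samples before the loop finished")


# Suppose the PWM rising edge is seen on every third read of the port.
samples = [i % 3 == 2 for i in range(100)]
print(polls_until_count(samples))  # 30
```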


PERIPHERAL
================================================================================

Peripherals are implemented via Python modules.  For example, an open-drain I/O
signal, such as is required for an I2C bus, does not fit the INPORT and OUTPORT
functionality.  Instead, an "open_drain" peripheral is provided by the Python
script in "core/9x8/peripherals/open_drain.py".  This puts a tri-state I/O in
the module statement, allows it to be read through an "inport" instruction, and
allows it to be set low or released through an "outport" instruction.  An I2C
bus with separate SCL and SDA ports can then be incorporated into the processor
as follows:

  PORTCOMMENT     I2C bus
  PERIPHERAL      open_drain      inport=I_SCL \
                                  outport=O_SCL \
                                  iosignal=io_scl
  PERIPHERAL      open_drain      inport=I_SDA \
                                  outport=O_SDA \
                                  iosignal=io_sda

The default width for this peripheral is 1 bit.  The module statement will then
include the lines

  // I2C bus
  inout  wire     io_scl,
  inout  wire     io_sda

The assembly code to set the io_scl signal low is "0 .outport(O_SCL)" and to
release it is "1 .outport(O_SCL)".  These instruction sequences are actually
"0 O_SCL outport drop" and "1 O_SCL outport drop" respectively.  The "outport"
instruction drops the top of the data stack (which contained the port number)
and sends the next-to-the-top of the data stack to the designated output port.
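
Writing "1" releases the line rather than driving it high, so the level read
back depends on every device on the bus.  This Python 3 sketch models an
open-drain line with a pull-up as the wired AND of all devices' outputs; the
function name is illustrative, and using only the lsb of each written value is
an assumption for the default 1-bit width.

```python
def open_drain_level(drivers):
    """Level of an open-drain line with a pull-up resistor.

    drivers holds the value each device last wrote: 0 pulls the line
    low, 1 releases it.  The line is high only if every device releases
    it, i.e. the bus is a wired AND.
    """
    return int(all(value & 1 for value in drivers))


# The micro controller has released SCL but another device is still
# holding it low, so reading I_SCL returns 0 until that device releases it.
print(open_drain_level([1, 0]), open_drain_level([1, 1]))   # 0 1
```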
646
 
647
Two examples of I2C device operation are included in the examples directory.
648
 
649
The following peripherals are provided:
650
  adder_16bit   16-bit adder/subtractor
651
  AXI4_Lite_Master
652
                32-bit read/write AXI4-Lite Master
653
                Note:  The synchronous version has been tested on hardware.
654
  AXI4_Lite_Slave_DualPortRAM
655
                dual-port-RAM interface for the micro controller to act as an
656
                AXI4-Lite slave
657
  big_inport    shift reads from a single INPORT to construct a wide input
658
  big_outport   shift writes to a single OUTPORT to construct a wide output
659
  counter       counter for number of received high cycles from signal
660
  inFIFO_async  input FIFO with an asynchronous write clock
661
  latch         latch wide inputs for sampling
662
  monitor_stack simulation diagnostic (see below)
663
  open_drain    for software-implemented I2C buses or similar
664
  outFIFO_async output FIFO with an asynchronous read clock
665
  PWM_8bit      PWM generator with an 8-bit control
666 9 sinclairrf
  servo_motor   PWM modulation suitable for servo motor or similar control
667
  stepper_motor stepper motor controller with acceleration
668 2 sinclairrf
  timer         timing for polled loops or similar
669
  trace         simulation diagnostic (see below)
670
  UART          bidirectional UART
671
  UART_Rx       receive UART
672
  UART_Tx       transmit UART
673 3 sinclairrf
  wide_strobe   1 to 8 bit strobe generator
674 2 sinclairrf
 
The following command illustrates how to display the help message for
peripherals:

  echo "ARCHITECTURE core/9x8 Verilog" | ssbcc -P "big_inport help" - | less

User-defined peripherals can be in the same directory as the architecture file
or in a subdirectory named "peripherals".
 
 
INTERRUPTS
================================================================================

Interrupts are enabled by including an interrupt peripheral in the architecture
file and an interrupt handler in the assembly code.

Incorporating an interrupt peripheral adds the following output ports as
strobes:

  O_INTERRUPT_DIS       disable interrupts
  O_INTERRUPT_ENA       enable interrupts

and the following 3 macros:

  .dis          disable interrupts
                Note:  This is equivalent to .outstrobe(O_INTERRUPT_DIS)
  .ena          enable interrupts
                Note:  This is equivalent to .outstrobe(O_INTERRUPT_ENA)
  .returni      return from interrupt handler and enable interrupts
                Note:  This is equivalent to the 3 instruction sequence
                       "O_INTERRUPT_ENA return outport"
                Note:  The assembler prohibits the ".return" macro in the
                       interrupt handler and the ".returni" macro anywhere other
                       than the interrupt handler.

See examples/interrupt for a demonstration of an interrupt handler for two
events, one external to the processor and one internal to the processor.

The interrupt peripheral included in core/9x8/peripherals handles one to eight
interrupt sources.  See core/9x8/doc/interrupt.html for instructions on creating
custom interrupt peripherals, possibly with more interrupt sources.
 
 
PARAMETER and LOCALPARAM
================================================================================

Parameters are incorporated through the PARAMETER and LOCALPARAM configuration
commands.  For example, UARTs need the clock frequency in hertz for their baud
rate generators.  The configuration command

  PARAMETER G_CLK_FREQ_HZ 97_000_000

specifies the clock frequency as 97 MHz.  The HDL instantiating the processor
can change this specification.  The frequency can also be changed through the
command-line invocation of the computer compiler.  For example,

  ssbcc -G "G_CLK_FREQ_HZ=100_000_000" myprogram.9x8

specifies that a frequency of 100 MHz be used instead of the default frequency
of 97 MHz.

The LOCALPARAM configuration command can be used to specify parameters that
should not be changed by the surrounding HDL.  For example,

  LOCALPARAM L_VERSION 24'h00_00_00

specifies a 24-bit parameter named "L_VERSION".  The 8-bit major, minor, and
build sections of the parameter can be accessed in an assembly program using
"L_VERSION[16+:8]", "L_VERSION[8+:8]", and "L_VERSION[0+:8]".
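
As a quick illustration of the indexed part-select notation, the following
Python sketch models the Verilog expression value[base +: width] and applies it
to the 24'h01_04_03 version number used with "-G" in this section.  It only
models the Verilog semantics and is not part of ssbcc; the helper name
part_select is hypothetical.

```python
def part_select(value: int, base: int, width: int) -> int:
    """Model of the Verilog indexed part-select value[base +: width]."""
    return (value >> base) & (2**width - 1)


L_VERSION = 0x01_04_03  # 24'h01_04_03

major = part_select(L_VERSION, 16, 8)  # L_VERSION[16+:8]
minor = part_select(L_VERSION, 8, 8)   # L_VERSION[8+:8]
build = part_select(L_VERSION, 0, 8)   # L_VERSION[0+:8], the default range

# A push opcode is initialized as { 1'b1, value }: bit 8 set, value in bits 7..0.
push_major = 0x100 | major

print(major, minor, build, hex(push_major))  # 1 4 3 0x101
```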
 
For both parameters and localparams, the default range is "[0+:8]".  The
instruction memory is initialized with a reference to the parameter, not with
the value used to initialize the parameter, so the value in effect during
synthesis is used.  That is, the instruction memory initialization will be:

  s_opcodeMemory[...] = { 1'b1, L_VERSION[16+:8] };

The value of the localparam can be set when the computer compiler is run using
the "-G" option.  For example,

  ssbcc -G "L_VERSION=24'h01_04_03" myprogram.9x8

can be used in a makefile to set the version number for a release without
modifying the micro controller architecture file.
 
 
DIAGNOSTICS AND DEBUGGING
================================================================================

A 3-character, human-readable version of the opcode can be included in
simulation waveform outputs by adding "--display-opcode" to the ssbcc command.

The stack health can be monitored during simulation by including the
"monitor_stack" peripheral through the command line.  For example, the LED
flasher example can be generated using

  ssbcc -P monitor_stack led.9x8

This allows the architecture file to remain unchanged between simulation and an
FPGA build.

Stack errors include underflow and overflow, invalid stack values, and
incorrect use of the values on the return stack (returns to data values and data
operations on return addresses).  Other errors include out-of-range addresses
for memory, inport, and outport operations.

When stack errors are detected, the last 50 instructions are dumped to the
console and the simulation terminates.  The dump includes the PC, numeric
opcode, textual representation of the opcode, data stack pointer, next-to-top of
the data stack, top of the data stack, top of the return stack, and the return
stack pointer.  Invalid stack values are displayed as "XX".  The length of the
history dumped is configurable.
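
The "last N instructions" history behaves like a fixed-length ring buffer.  The
Python sketch below shows the idea with collections.deque(maxlen=...).  The
class name and the three recorded fields are illustrative assumptions (the real
dump also includes stack pointers and stack values), and this is not how the
monitor_stack peripheral is implemented in HDL.

```python
from collections import deque


class InstructionHistory:
    """Keep only the most recent `depth` executed instructions."""

    def __init__(self, depth: int = 50):
        # A deque with maxlen discards the oldest entry when a new one arrives.
        self.entries = deque(maxlen=depth)

    def record(self, pc: int, opcode: int, text: str) -> None:
        self.entries.append((pc, opcode, text))

    def dump(self) -> list:
        """Return the retained history, oldest entry first."""
        return list(self.entries)


history = InstructionHistory(depth=50)
for pc in range(120):
    history.record(pc, 0x000, "...")  # placeholder opcode and mnemonic
dump = history.dump()
print(len(dump), dump[0][0], dump[-1][0])  # 50 70 119
```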
 
Out-of-range PC checks are also performed if the instruction space is not a
power of 2.

A "trace" peripheral is also provided that dumps the entire execution history.
This was used to validate the processor core.


MEMORY ARCHITECTURE
================================================================================

The DATA_STACK, RETURN_STACK, INSTRUCTION, and MEMORY configuration commands
allocate memory for the data stack, return stack, instruction ROM, and memory
RAM and ROM respectively.  The data stack, return stack, and memories are
normally instantiated as dual-port LUT-based memories with asynchronous reads,
while the instruction memory is always instantiated with a synchronous read
architecture.

The COMBINE configuration command is used to coalesce memories and to convert
LUT-based memories to synchronous SRAM-based memories.  For example, the large
SRAMs in modern FPGAs are ideal for storing the instruction opcodes, and their
dual-ported access allows either the data stack or the return stack to be
stored in a relatively small region at the end of the large instruction memory.
Memories that require dual-ported operation can also be instantiated in large
RAMs either individually or in combination with each other.  Conversion to
SRAM-based memories is also useful for FPGA architectures that do not have
efficient LUT-based memories.

The INSTRUCTION configuration command allocates memory for the processor
instruction space.  It has the form "INSTRUCTION N" or "INSTRUCTION N*M" where N
must be a power of 2.  The first form is used if the desired instruction memory
size is a power of 2.  The second form is used to allocate M memory blocks of
size N where M is not a power of 2.  For example, on an Altera Cyclone III, the
configuration command "INSTRUCTION 1024*3" allocates three M9Ks for the
instruction space, saving one M9K as compared to the configuration command
"INSTRUCTION 4096".
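
The block-count arithmetic behind the Cyclone III example can be checked with a
short Python sketch.  It assumes each M9K is used as a 1024-word by 9-bit memory
and only parses the two INSTRUCTION forms described above; the function names
are hypothetical.

```python
def is_power_of_2(n: int) -> bool:
    return n > 0 and (n & (n - 1)) == 0


def parse_instruction(spec: str) -> tuple:
    """Parse the argument of "INSTRUCTION N" or "INSTRUCTION N*M" into (N, M)."""
    if "*" in spec:
        n_text, m_text = spec.split("*")
        n, m = int(n_text), int(m_text)
    else:
        n, m = int(spec), 1
    if not is_power_of_2(n):
        raise ValueError("N must be a power of 2")
    return n, m


def blocks_needed(spec: str, block_words: int) -> int:
    """Number of block_words-deep memory blocks covering the instruction space."""
    n, m = parse_instruction(spec)
    return -(-(n * m) // block_words)  # ceiling division


# Assumption: one M9K holds 1024 9-bit opcodes.
print(blocks_needed("1024*3", 1024), blocks_needed("4096", 1024))  # 3 4
```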
 
The DATA_STACK configuration command allocates memory for the data stack.  It
has the form "DATA_STACK N" where N is the requested size of the data stack.
N must be a power of 2.

The RETURN_STACK configuration command allocates memory for the return stack and
has the same format as the DATA_STACK configuration command.

The MEMORY configuration command is used to define one to four memories, either
RAM or ROM, with up to 256 bytes each.  If no MEMORY configuration command is
issued, then no memories are allocated for the processor.  The MEMORY
configuration command has the format "MEMORY {RAM|ROM} name N" where
"{RAM|ROM}" specifies either a RAM or a ROM, name is the name of the memory and
must start with an alphabetic character, and the size of the memory, N, must be
a power of 2.  For example, "MEMORY RAM myram 64" allocates 64 bytes of memory
to form a RAM named myram.  Similarly, "MEMORY ROM lut 256" defines a 256-byte
ROM named lut.  More details on using memories are provided in the next section.

The COMBINE configuration command can be used to combine the various memories
for more efficient processor implementation as follows:

  COMBINE INSTRUCTION,
  COMBINE 
  COMBINE ,
  COMBINE 

where  is one of DATA_STACK, RETURN_STACK, or a list of one
or more ROMs and  is a list of one or more RAMs and/or ROMs.  The first
configuration command reserves space at the end of the instruction memory for
the DATA_STACK, RETURN_STACK, or listed ROMs.

The SRAM_WIDTH configuration command is used to make the memory allocations more
efficient when the SRAM block width is more than 9 bits.  For example,
Altera's Cyclone V family has 10-bit wide memory blocks and the configuration
command "SRAM_WIDTH 10" is appropriate.  The configuration command
sequence

  INSTRUCTION     1024
  RETURN_STACK    32
  SRAM_WIDTH      10
  COMBINE         INSTRUCTION,RETURN_STACK

will use a single 10-bit memory entry for each element of the return stack
instead of packing the 10-bit values into two memory entries of a 9-bit wide
memory.
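
The entry counts in that example follow from the return stack width.  The
Python sketch below computes it from the instruction space, assuming (as stated
under THEORY OF OPERATION) that the return stack is the wider of the 8-bit data
width and the program counter width, and that a wider element simply occupies as
many SRAM entries as it needs.  The function names are hypothetical.

```python
def pc_width(instruction_words: int) -> int:
    """Address bits needed for the instruction space."""
    return max(1, (instruction_words - 1).bit_length())


def return_stack_width(instruction_words: int) -> int:
    # The wider of the 8-bit data width and the program counter width.
    return max(8, pc_width(instruction_words))


def entries_per_element(instruction_words: int, sram_width: int) -> int:
    """SRAM entries used by one return stack element (ceiling division)."""
    width = return_stack_width(instruction_words)
    return -(-width // sram_width)


print(return_stack_width(1024))       # 10
print(entries_per_element(1024, 9))   # 2: default 9-bit wide memory
print(entries_per_element(1024, 10))  # 1: with "SRAM_WIDTH 10"
```

For the 32-deep return stack above, that is 64 entries in a 9-bit wide memory
versus 32 entries with "SRAM_WIDTH 10".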
 
The following illustrates a possible configuration for a Spartan-6 with a
2048-word SRAM and a relatively large 64-deep data stack.  The data stack will
be in the last 64 elements of the instruction memory and the instruction space
will be reduced to 1984 words.

  INSTRUCTION   2048
  DATA_STACK    64
  COMBINE       INSTRUCTION,DATA_STACK

The following illustrates a possible configuration for a Cyclone III with three
M9Ks for the instruction ROM and the data stack.

  INSTRUCTION   1024*3
  DATA_STACK    64
  COMBINE       INSTRUCTION,DATA_STACK
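
The usable instruction space left by "COMBINE INSTRUCTION,DATA_STACK" is simple
subtraction.  This Python sketch reproduces the 1984-word figure for the
Spartan-6 example and applies the same rule to the Cyclone III example; it
assumes, as the Spartan-6 example implies, that each 8-bit data stack element
takes one entry of the 9-bit instruction memory.  The function name is
hypothetical.

```python
def usable_instruction_words(spec: str, stack_depth: int) -> int:
    """Words left for opcodes after COMBINE INSTRUCTION,DATA_STACK.

    spec is the INSTRUCTION argument ("N" or "N*M"); the data stack is
    assumed to take one instruction-memory entry per element.
    """
    n, _, m = spec.partition("*")
    total = int(n) * (int(m) if m else 1)
    if stack_depth >= total:
        raise ValueError("the data stack does not fit in the instruction memory")
    return total - stack_depth


print(usable_instruction_words("2048", 64))    # 1984 (Spartan-6 example)
print(usable_instruction_words("1024*3", 64))  # 3008 (Cyclone III example)
```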
 
WARNING:  Some devices, such as Xilinx' Spartan-3A devices, do not support
asynchronous reads, so the COMBINE configuration command does not work for them.

WARNING:  Xilinx XST does not correctly infer a Block RAM when the
"COMBINE INSTRUCTION,RETURN_STACK" configuration command is used and the
instruction space is 1024 instructions or larger.  Xilinx is supposed to fix
this in a future release of Vivado, so the fix will only apply to 7-series or
later FPGAs.


MEMORY
================================================================================

The MEMORY configuration command is used as follows to allocate a 128-byte RAM
named "myram" and to allocate a 32-byte ROM named "myrom".  Zero to four
memories can be allocated, each with up to 256 bytes.

  MEMORY RAM myram 128
  MEMORY ROM myrom  32

The assembly code to lay out the memory uses the ".memory" directive to identify
the memory and the ".variable" directive to identify the symbol and its content.
Single or multiple values can be listed and "*N" can be used to identify a
repeat count.

  .memory RAM myram
  .variable a 0
  .variable b 0
  .variable c 0 0 0 0
  .variable d 0*4

  .memory ROM myrom
  .variable coeff_table 0x04
                        0x08
                        0x10
                        0x20
  .variable hello_world N"Hello World!\r\n"
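
One way to picture the .variable directives is as a packed byte layout.  The
Python sketch below assigns offsets in declaration order and expands "V*N"
repeat counts.  Back-to-back packing is an assumption about the assembler,
N"..." strings are not handled, and the helper name is hypothetical.

```python
def layout(variables):
    """Return ({name: offset}, contents) for (name, [tokens]) pairs.

    Assumes variables are packed back to back in declaration order and that a
    token "V*N" expands to N copies of V.
    """
    offsets, contents = {}, []
    for name, tokens in variables:
        offsets[name] = len(contents)
        for token in tokens:
            value, _, count = token.partition("*")
            contents.extend([int(value, 0)] * (int(count) if count else 1))
    return offsets, contents


ram_offsets, ram = layout([
    ("a", ["0"]),
    ("b", ["0"]),
    ("c", ["0", "0", "0", "0"]),
    ("d", ["0*4"]),
])
rom_offsets, rom = layout([("coeff_table", ["0x04", "0x08", "0x10", "0x20"])])

print(ram_offsets, len(ram))  # {'a': 0, 'b': 1, 'c': 2, 'd': 6} 10
print(rom)                    # [4, 8, 16, 32]
```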
 
Single values are fetched from or stored to memory using the following assembly:

  .fetchvalue(a)
  0x12 .storevalue(b)

Multi-byte values are fetched or stored as follows.  This copies the four values
from coeff_table, which is stored in a ROM, to d.

  .fetchvector(coeff_table,4) .storevector(d,4)

The memory size is available using computed values (see below) and can be used
to clear the entire memory, etc.
 
The available single-cycle memory operation macros are:
  .fetch(mem_name)      replaces T with the value at the address T in the memory
                        mem_name
                        Note:  .fetchram(var_name) is safer.
  .fetch+(mem_name)     pushes the value at address T in the memory mem_name
                        into the data stack below T and increments T
                        Note:  This is useful for fetching successive values
                               from memory into the data stack.
                        Note:  .fetchram+(var_name) is safer.
  .fetch-(mem_name)     similar to .fetch+ but decrements T
                        Note:  .fetchram-(var_name) is safer.
  .store(ram_name)      stores N at address T in the RAM ram_name, also drops
                        the top of the data stack
                        Note:  .storeram(var_name) is safer.
  .store+(ram_name)     stores N at address T in the RAM ram_name, also drops N
                        from the data stack and increments T
                        Note:  .storeram+(var_name) is safer.
  .store-(ram_name)     similar to .store+ but decrements T
                        Note:  .storeram-(var_name) is safer.
 
The following multi-cycle macros provide more generalized access to the
memories:
  .fetchindexed(var_name)
                        uses the top of the data stack as an index into var_name
                        Note:  This is equivalent to the 3 instruction sequence
                               "var_name + .fetch(mem_name)"
  .fetchoffset(var_name,offset)
                        fetches the single-byte value of var_name offset by
                        "offset" bytes
                        Note:  This is equivalent to
                                 "${var_name+offset} .fetch(mem_name)"
                               where mem_name is the memory in which var_name is
                               stored.
  .fetchram(var_name)   is similar to the .fetch(mem_name) macro except that the
                        variable name is used to identify the memory instead of
                        the name of the memory
  .fetchram+(var_name)  is similar to the .fetch+(mem_name) macro except that
                        the variable name is used to identify the memory instead
                        of the name of the memory
  .fetchram-(var_name)  is similar to the .fetch-(mem_name) macro except that
                        the variable name is used to identify the memory instead
                        of the name of the memory
  .fetchvalue(var_name) fetches the single-byte value of var_name
                        Note:  This is equivalent to
                                 "var_name .fetch(mem_name)"
                               where mem_name is the memory in which var_name is
                               stored.
  .fetchvector(var_name,N)
                        fetches N values starting at var_name into the data
                        stack with the value at var_name at the top and the
                        value at var_name+N-1 deep in the stack.
                        Note:  This is equivalent to the N+1 operation sequence
                                 "${var_name+N-1} .fetch-(mem_name) ...
                                 .fetch-(mem_name) .fetch(mem_name)"
                               where ".fetch-(mem_name)" is repeated N-1 times.
  .storeindexed(var_name)
                        uses the top of the data stack as an index into var_name
                        into which to store the next-to-top of the data stack.
                        Note:  This is equivalent to the 4 instruction sequence
                                 "var_name + .store(mem_name) drop".
                        Note:  The default "drop" instruction can be overridden
                               by providing the optional second argument
                               similarly to the .storevalue macro.
  .storeoffset(var_name,offset)
                        stores the single-byte value at the top of the data
                        stack at var_name offset by "offset" bytes
                        Note:  This is equivalent to
                                 "${var_name+offset} .store(mem_name) drop"
                               where mem_name is the memory in which var_name is
                               stored.
                        Note:  The default "drop" instruction can be replaced by
                               providing the optional third argument.
  .storeram(var_name)   is similar to the .store(mem_name) macro except that the
                        variable name is used to identify the RAM instead of the
                        name of the RAM
  .storeram+(var_name)  is similar to the .store+(mem_name) macro except that
                        the variable name is used to identify the RAM instead of
                        the name of the RAM
  .storeram-(var_name)  is similar to the .store-(mem_name) macro except that
                        the variable name is used to identify the RAM instead of
                        the name of the RAM
  .storevalue(var_name) stores the single-byte value at the top of the data
                        stack at var_name
                        Note:  This is equivalent to
                                 "var_name .store(mem_name) drop"
                        Note:  The default "drop" instruction can be replaced by
                               providing the optional second argument.  For
                               example, the following instruction will store and
                               then decrement the value at the top of the data
                               stack:
                                 .storevalue(var_name,1-)
  .storevector(var_name,N)
                        does the reverse of the .fetchvector macro.
                        Note:  This is equivalent to the N+2 operation sequence
                                 "var_name .store+(mem_name) ... .store+(mem_name)
                                 .store(mem_name) drop"
                               where ".store+(mem_name)" is repeated N-1 times.
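
The expansions listed for .fetchvector and .storevector can be checked against
the single-cycle semantics with a small Python model.  It follows the
descriptions above for .fetch, .fetch-, .store, and .store+ on an 8-bit stack
(top of stack at the end of the list).  For simplicity it keeps both variables
in one memory, whereas in the coeff_table/d example the source is a ROM and the
destination a RAM.  The class, method names, and offsets are hypothetical.

```python
MASK = 0xFF  # 8-bit data and 8-bit memory addresses


class Machine:
    """Data stack (top = last list element) plus one 256-byte memory."""

    def __init__(self):
        self.stack = []
        self.mem = [0] * 256

    def push(self, value):
        self.stack.append(value & MASK)

    def drop(self):
        self.stack.pop()

    def fetch(self):        # .fetch: replace T with the value at address T
        self.stack[-1] = self.mem[self.stack[-1]]

    def fetch_minus(self):  # .fetch-: push mem[T] below T, then decrement T
        addr = self.stack.pop()
        self.stack.append(self.mem[addr])
        self.stack.append((addr - 1) & MASK)

    def store(self):        # .store: store N at address T, drop T
        addr = self.stack.pop()
        self.mem[addr] = self.stack[-1]

    def store_plus(self):   # .store+: store N at address T, drop N, increment T
        addr = self.stack.pop()
        self.mem[addr] = self.stack.pop()
        self.stack.append((addr + 1) & MASK)

    def fetchvector(self, var, n):  # "${var+n-1} .fetch- ... .fetch"
        self.push(var + n - 1)
        for _ in range(n - 1):
            self.fetch_minus()
        self.fetch()

    def storevector(self, var, n):  # "var .store+ ... .store drop"
        self.push(var)
        for _ in range(n - 1):
            self.store_plus()
        self.store()
        self.drop()


m = Machine()
coeff_table, d = 0, 16  # hypothetical offsets
m.mem[coeff_table:coeff_table + 4] = [0x04, 0x08, 0x10, 0x20]
m.fetchvector(coeff_table, 4)
after_fetch = list(m.stack)
print(after_fetch)              # [32, 16, 8, 4]: the value at coeff_table is on top
m.storevector(d, 4)
print(m.mem[d:d + 4], m.stack)  # [4, 8, 16, 32] []
```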
 
The .fetchvector and .storevector macros are intended to work with values stored
MSB first in memory and with the MSB toward the top of the data stack,
similarly to the Forth language with multi-word values.  To demonstrate how
this data structure works, consider the examples of decrementing and
incrementing a two-byte value on the data stack:

  ; Decrement a 2-byte value
  ;   swap 1- swap      - decrement the LSB
  ;   over -1=          - puts -1 on the top of the data stack if the LSB rolled
  ;                       over from 0 to -1, puts 0 on the top otherwise
  ;   +                 - decrements the MSB if the LSB rolled over
  ; ( u_LSB u_MSB - u_LSB' u_MSB' )
  .function decrement_2byte
  swap 1- swap over -1= .return(+)

  ; Increment a 2-byte value
  ;   swap 1+ swap      - increment the LSB
  ;   over 0=           - puts -1 on the top of the data stack if the LSB rolled
  ;                       over from 0xFF to 0, puts 0 on the top otherwise
  ;   -                 - increments the MSB if the LSB rolled over (by
  ;                       subtracting -1)
  ; ( u_LSB u_MSB - u_LSB' u_MSB' )
  .function increment_2byte
  swap 1+ swap over 0= .return(-)
 
 
COMPUTED VALUES
================================================================================

Computed values can be pushed on the stack using "${...}", where the "..." is
evaluated in Python and cannot contain any spaces.

For example, a loop that should be run 5 times can be coded as:

  ${5-1} :loop ... .jumpc(loop,1-) drop

which is a clearer indication that the loop is to be run 5 times than is the
instruction sequence

  4 :loop ...
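
The "${5-1}" form works because of how the loop counter is tested.  The Python
sketch below counts how many times such a loop body runs, assuming (consistent
with the 5-iteration example) that .jumpc tests T and the operation in its
argument, here 1-, then executes before the jump takes effect.  It models only
the counter, and the function name is hypothetical.

```python
def count_iterations(start: int) -> int:
    """How often the body of "${start} :loop ... .jumpc(loop,1-) drop" runs.

    Assumed semantics: .jumpc decides on the value of T before the 1- in its
    argument decrements it, and jumps back only if that value was nonzero.
    """
    t = start & 0xFF
    runs = 0
    while True:
        runs += 1               # the loop body executes once
        jump = t != 0           # .jumpc tests T ...
        t = (t - 1) & 0xFF      # ... while 1- decrements the counter
        if not jump:
            break
    return runs                 # the trailing drop discards the counter


print(count_iterations(5 - 1))  # 5
```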
 
Constants can be accessed in the computation.  For example, a block of memory
can be allocated as follows:

  .constant C_RESERVE
  .memory RAM myram
  ...
  .variable reserved 0*${C_RESERVE}

and the block of reserved memory can be cleared using the following loop:

  ${C_RESERVE-1} :loop 0 over .storeindexed(reserved) .jumpc(loop,1-) drop

The offsets of variables in their memory can also be accessed through a computed
value.  The reserved block could also be cleared as follows:

  ${reserved-1} ${C_RESERVE-1} :loop >r

  r> .jumpc(loop,1-) drop drop

The body of this version of the loop is the same length as the first version.
In general, it is better to use the memory macros to access variables as they
ensure the correct memory is accessed.
 
The sizes of memories can also be accessed using computed values.  If "myram" is
a RAM, then "${size['myram']}" will push the size of "myram" on the stack.  As
an example, the following code will clear the entire RAM:

  ${size['myram']-1} :loop 0 swap .jumpc(loop,.store-(myram)) drop

The widths of I/O signals can also be accessed using computed values.  If
"o_mask" is a mask, then "${size['o_mask']}" will push the width of the mask on
the stack and "${2**size['o_mask']-1}" will push a value that sets all the bits
of the mask.  This includes I/O signals instantiated by peripherals.
For example, for the configuration command

  PERIPHERAL big_outport outport=O_BIG outsignal=o_big width=47

the width of the output signal is accessible using "${size['o_big']}".  You can
set the wide signal to all zeroes using:

  ${(size['o_big']+7)//8-1} :loop 0 .outport(O_BIG) .jumpc(loop,1-) drop
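
Because the text inside "${...}" is evaluated as a Python expression, the
arithmetic in these examples can be tried directly in Python.  The sketch below
evaluates the expressions against a stand-in size table; o_mask's width is a
made-up assumption, and restricting eval this way is only an illustration, not
necessarily how ssbcc sets up its evaluation.  Floor division (//) keeps the
byte count an integer under Python 3, where / always produces a float.

```python
# Stand-in for the assembler's size table.  o_big's width (47) and myram's size
# (128) come from the examples in this document; o_mask's width is assumed.
size = {"myram": 128, "o_mask": 5, "o_big": 47}


def computed(expr: str) -> int:
    """Evaluate the text between "${" and "}" as a Python expression."""
    if " " in expr:
        raise ValueError("computed values cannot contain spaces")
    return eval(expr, {"__builtins__": {}}, {"size": size})


print(computed("5-1"))                     # 4
print(computed("size['myram']-1"))         # 127
print(computed("2**size['o_mask']-1"))     # 31
print(computed("(size['o_big']+7)//8-1"))  # 5, i.e. 6 bytes cover 47 bits
```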
 
 
MACROS
================================================================================

There are 3 types of macros used by the assembler.

The first kind of macros are built into the assembler and are required to
encode instructions that have embedded values or have mandatory subsequent
instructions.  These include function calls, jump instructions, function return,
and memory accesses as follows:
  .call(function,[op])
  .callc(function,[op])
  .fetch(ramName)
  .fetch+(ramName)
  .fetch-(ramName)
  .jump(label,[op])
  .jumpc(label,[op])
  .return([op])
  .store(ramName)
  .store+(ramName)
  .store-(ramName)

The second kind of macros are designed to ease input and output operations and
memory accesses and to help ensure these operations are correctly constructed.
These are defined as Python scripts in the core/9x8/macros directory and are
automatically loaded into the assembler.  These macros are:
  .fetchindexed(variable)
  .fetchoffset(variable,ix)
  .fetchvalue(variableName)
  .fetchvector(variableName,N)
  .inport(I_name)
  .outport(O_name[,op])
  .outstrobe(O_name)
  .storeindexed(variableName[,op])
  .storeoffset(variableName,ix[,op])
  .storevalue(variableName[,op])
  .storevector(variableName,N)
 
The third kind of macro is user-defined macros.  These macros must be registered
with the assembler using the ".macro" directive.

For example, the ".push32" macro is defined by macros/9x8/push32.py and can be
used to push 32-bit (4-byte) values onto the data stack as follows:

  .macro push32
  .constant C_X 0x87654321
  .main
    ...
    .push32(0x12345678)
    .push32(C_X)
    .push32(${0x12345678^C_X})
    ...

The following macros are provided in macros/9x8:
  .push16(v)    push the 16-bit (2-byte) value "v" onto the data stack with the
                MSB at the top of the data stack
  .push24(v)    push the 24-bit (3-byte) value "v" onto the data stack with the
                MSB at the top of the data stack
  .push32(v)    push the 32-bit (4-byte) value "v" onto the data stack with the
                MSB at the top of the data stack
  .pushByte(v,ix)
                push the ix'th byte of v onto the data stack
                Note:  ix=0 designates the LSB
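
The byte ordering these macros produce can be illustrated in Python.  The sketch
computes the ix'th byte as .pushByte is described and lists the bytes in the
order they would need to be pushed so that the MSB ends on top of the data
stack, as .push16, .push24, and .push32 are described.  It is not the code in
macros/9x8, and the function names are hypothetical.

```python
def push_byte(v: int, ix: int) -> int:
    """The ix'th byte of v; ix=0 designates the LSB."""
    return (v >> (8 * ix)) & 0xFF


def push_order(v: int, nbytes: int) -> list:
    """Bytes of v in push order (first pushed first), leaving the MSB on top."""
    return [push_byte(v, ix) for ix in range(nbytes)]


stack = push_order(0x12345678, 4)  # like .push32(0x12345678)
print([hex(b) for b in stack])     # ['0x78', '0x56', '0x34', '0x12'] (top is last)
```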
 
Directories are searched in the following order for macros:
  .
  ./macros
  include paths specified by the '-M' command-line option.
  macros/9x8

The Python scripts in core/9x8/macros and macros/9x8 can be used as design
examples for user-defined macros.  The assembler does some type checking based
on the list provided when the macro is registered by the "AddMacro" method, but
additional type checking is often warranted in the macro's "emitFunction",
which emits the actual assembly code.  The ".fetchvector" and ".storevector"
macros demonstrate how to design variable-length macros.  Several macros in
core/9x8/macros illustrate designing macros with optional arguments.

It is not an error to repeat the ".macro MACRO_NAME" directive for user-defined
macros.  The assembler will issue a fatal error if a user-defined macro
conflicts with a built-in macro.


CONDITIONAL COMPILATION
================================================================================

Conditional compilation is accepted in the architecture file and in the assembly
source files and is based on whether or not the referenced symbol is defined.

In the architecture file, symbols are defined by CONSTANT, INPORT, OUTPORT, and
PERIPHERAL statements, or by the "-D D_name" argument to ssbcc.  The lines

  .IFDEF name
  
  .ENDIF

will then include the lines "" if the symbol "name" is defined.
The ".IFNDEF" command is similar except that the statements are included if the
symbol is not defined.  A ".ELSE" is also provided.
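
The line-selection rules for .IFDEF, .IFNDEF, .ELSE, and .ENDIF can be modeled
with a small stack of conditions.  The Python sketch below is not ssbcc's
parser: it treats each line independently, supports nesting, and also honors
.define so that the include-guard pattern shown later in this section behaves as
described.  The function name and the handling of .define lines are
illustrative assumptions.

```python
def preprocess(lines, defined):
    """Return the lines kept by .IFDEF/.IFNDEF/.ELSE/.ENDIF for the given symbols."""
    defined = set(defined)
    active = [True]  # whether lines at each nesting level are being kept
    taken = []       # whether each open .IFDEF/.IFNDEF condition was true
    kept = []
    for line in lines:
        words = line.split()
        cmd = words[0] if words else ""
        if cmd in (".IFDEF", ".IFNDEF"):
            cond = (words[1] in defined) == (cmd == ".IFDEF")
            taken.append(cond)
            active.append(active[-1] and cond)
        elif cmd == ".ELSE":
            active[-1] = active[-2] and not taken[-1]
        elif cmd == ".ENDIF":
            taken.pop()
            active.pop()
        elif active[-1]:
            if cmd == ".define":
                defined.add(words[1])
            kept.append(line)
    if taken:
        raise ValueError("missing .ENDIF")
    return kept


example = [".IFDEF D_ENABLE_UART", "PORTCOMMENT Diagnostic UART", ".ENDIF"]
print(preprocess(example, {"D_ENABLE_UART"}))  # ['PORTCOMMENT Diagnostic UART']
print(preprocess(example, set()))              # []
```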
 
In an assembly file, directives are conditionally included using the .IFDEF and
.IFNDEF directives, an optional .ELSE directive, and the terminating .ENDIF
directive.  Note that this is done at the directive level, i.e., function
declarations, memory declarations, and so forth.  Within a function, code is
conditionally included starting with a ".ifdef(name)" or a ".ifndef(name)", an
optional ".else", and a terminating ".endif".

For example, a diagnostic UART can be conditionally included in a program by
including the following lines in the architecture file:

  .IFDEF D_ENABLE_UART
  PORTCOMMENT Diagnostic UART
  PERIPHERAL UART_Tx outport=O_UART_TX ...
  .ENDIF

A "uart_tx" function can be optionally created using code similar to the
following.  Note that the symbol for the .outport macro is used to determine
whether or not the function is defined since that is more closely related to
whether or not the function can be defined.  The function definition must
precede any ".ifdef(uart_tx)" conditionals used to output diagnostics.

  .IFDEF O_UART_TX
  .function uart_tx
    :loop .outport(O_UART_TX) .jumpc(loop,nop) .return(drop)
  .ENDIF

Diagnostics in the assembly code can be included using either

  .ifdef(D_ENABLE_UART) N"Msg\r\n" .call(uart_tx) .endif

or

  .ifdef(uart_tx) N"Msg\r\n" .call(uart_tx) .endif

Conditional compilation cannot cross file boundaries.
 
The assembler also recognizes the ".define" directive.  For example, specific
diagnostics could be enabled if the UART is instantiated as follows:

  .IFDEF O_UART_TX
  .define D_DEBUG_FUNCTION_A
  .ENDIF

  ...

  .ifdef(D_DEBUG_FUNCTION_A) N"Debug Msg\r\n" .call(uart_tx) .endif

The following code illustrates how to preclude multiple attempted inclusions of
an assembly library file.

  ; put these two lines near the top of the file
  .IFNDEF D_FILENAME_INCLUDED
  .define D_FILENAME_INCLUDED
  ; put the library body here
  ...
  ; put this line at the bottom of the file
  .ENDIF ; .IFNDEF D_FILENAME_INCLUDED

The ".INCLUDE" configuration command can be used to read configuration commands
from additional sources.  For example, the following code will conditionally
include a general UART library if the outport O_UART_TX is defined:

  .IFDEF O_UART_TX
  .INCLUDE uart.s
  .ENDIF


SIMULATIONS
================================================================================

Simulations have been performed with Icarus Verilog, Verilator, and Xilinx'
ISIM.  Icarus Verilog is good for short, simple simulations and is used for the
core and peripheral test benches; Verilator for long simulations of large,
complex systems; and ISIM when Xilinx-specific cores are used.  Verilator is
the fastest simulator I've encountered.  Verilator is also used for lint
checking in the core test benches.
 
 
MEM INITIALIZATION FILE
================================================================================

A memory initialization file is produced during compilation.  This file can be
used with tools such as Xilinx' data2mem to modify the SRAM contents without
having to rebuild the entire system.  It is restricted to the opcode memory
initialization.  The file must be processed before it can be used by specific
tools; see doc/MemoryInitialization.html.

WARNING:  The values of parameters used in the assembly code must match the
instantiated design.


THEORY OF OPERATION
================================================================================

Registers are used for the top of data stack, "T", and the next-to-top of the
data stack, "N".  The remainder of the data stack is a separate memory.  This
means that the "DATA_STACK N" configuration command actually allows N+2 values
in the data stack since T and N are not stored in the N-element deep data stack.
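
The N+2 capacity can be seen in a behavioral Python model in which T and N are
registers and only deeper values spill into the DATA_STACK memory.  This is a
sketch of the counting argument only, not of the core's HDL; the class name and
the overflow exception are illustrative.

```python
class DataStack:
    """T and N are registers; deeper values go to the DATA_STACK memory."""

    def __init__(self, depth: int):
        self.depth = depth  # the N in "DATA_STACK N"
        self.regs = []      # at most two values: [N, T]
        self.memory = []    # the depth-element stack memory

    def push(self, value: int) -> None:
        if len(self.regs) == 2:
            if len(self.memory) == self.depth:
                raise OverflowError("data stack overflow")
            self.memory.append(self.regs.pop(0))  # the old N spills into memory
        self.regs.append(value & 0xFF)

    def __len__(self) -> int:
        return len(self.regs) + len(self.memory)


s = DataStack(depth=4)
for v in range(6):
    s.push(v)  # 4 values in memory plus N and T
try:
    s.push(6)
    overflowed = False
except OverflowError:
    overflowed = True
print(len(s), overflowed)  # 6 True
```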
 
1315
The return stack is similar in that "R" is the top of the return stack and the
1316
"RETURN_STACK N" allocates an additional N words of memory.  The return stack is
1317
the wider of the 8-bit data width and the program counter width.
1318
 
The program counter is always either incremented by 1 or set to an address as
controlled by the jump, jumpc, call, callc, and return instructions.  The
registered program counter is used to read the next opcode from the instruction
memory, and this opcode is also registered in the memory.  This means that
there is a 1 clock cycle delay between the address changing and the associated
instruction being performed.  This pipelining is also part of what allows the
processor to execute one instruction per clock cycle.
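 
The one-cycle fetch latency can be seen in a small cycle-level Python model.
This is a sketch only: it assumes straight-line code (no jump, call, or return),
models the instruction memory as a list, and uses placeholder opcode names; the
function name fetch_trace is hypothetical.

```python
def fetch_trace(program, cycles):
    """Return (cycle, pc, executing opcode) tuples for straight-line code.

    Both the program counter and the opcode read from the instruction memory
    are registers, so they update together on each rising clock edge.
    """
    pc = 0         # registered program counter after reset
    opcode = None  # registered memory output; nothing fetched yet
    trace = []
    for cycle in range(cycles):
        trace.append((cycle, pc, opcode))
        # Rising clock edge: the memory registers the opcode at the old PC
        # while the PC advances by 1.
        opcode = program[pc] if pc < len(program) else None
        pc += 1
    return trace


for cycle, pc, opcode in fetch_trace(["op0", "op1", "op2"], 4):
    print(f"cycle {cycle}: pc={pc} executing={opcode}")
```

In the trace, the instruction at address k is performed in the cycle after the
program counter holds k, while the memory is already reading address k+1.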
 
Separate ALUs are used for the program counter, adders, logical operations,
etc., and MUXes are used to select the values desired for the destination
registers.  Instruction execution consists of translating the 6 msb of the
9-bit opcode into MUX settings and performing opcode-dependent ALU operations
as controlled by the 3 lsb of the opcode (during the first half of the clock
cycle), and then setting T, N, R, the memories, etc. as controlled by the
computed MUX settings.
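 
The "compute everything, then select" structure can be sketched in Python.  The
opcode field split follows the text above; the lsb-to-operation table is
hypothetical and is NOT the actual 9x8 instruction encoding, and the way the
6 msb drive the register MUXes is not modeled.

```python
def split_opcode(opcode):
    """Split a 9-bit opcode into its 6 msb (MUX settings) and 3 lsb (ALU op)."""
    if not 0 <= opcode < 1 << 9:
        raise ValueError("opcode must fit in 9 bits")
    return (opcode >> 3) & 0x3F, opcode & 0x7


def alu_outputs(t, n):
    """Every ALU result is computed in parallel from the 8-bit T and N."""
    return {
        "add": (n + t) & 0xFF,
        "sub": (n - t) & 0xFF,
        "and": n & t,
        "or": n | t,
        "xor": n ^ t,
    }


# Hypothetical lsb -> operation table (partial; not the real 9x8 encoding).
HYPOTHETICAL_ALU_SELECT = {0: "add", 1: "sub", 2: "and", 3: "or", 4: "xor"}


def mux_result(opcode, t, n):
    """Pick one precomputed ALU result, as a MUX selects a register input."""
    _msb6, lsb3 = split_opcode(opcode)  # _msb6 would drive the register MUXes
    return alu_outputs(t, n)[HYPOTHETICAL_ALU_SELECT[lsb3]]
```

Computing all candidate results every cycle and selecting one keeps the logic
depth short, which is consistent with the single-cycle, high-speed goal.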
 
The "core.v" file contains the code for these operations.  Within this file
there are several "@xxx@" strings that specify where the computer compiler is
to insert code such as I/O declarations, memories, inport interpretation,
outport generation, peripherals, etc.
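 
Marker substitution of this kind can be sketched with Python's re.sub and a
replacement function.  The marker names and the generated Verilog below are
hypothetical, not the actual markers in "core.v"; the pattern only assumes a
name of letters, digits, and underscores between two "@" characters, so
Verilog event controls such as "@(posedge i_clk)" are left untouched.

```python
import re

# Assumed marker syntax: "@" + identifier + "@".
MARKER = re.compile(r"@([A-Za-z_][A-Za-z0-9_]*)@")


def fill_template(template, insertions):
    """Replace each @NAME@ marker with generated code from `insertions`."""

    def substitute(match):
        name = match.group(1)
        if name not in insertions:
            raise KeyError(f"no generated code for marker @{name}@")
        return insertions[name]

    return MARKER.sub(substitute, template)


core_template = "module core(\n@PORTS@\n);\n@MEMORIES@\nendmodule\n"
print(fill_template(core_template, {
    "PORTS": "  input wire i_clk",                     # hypothetical port
    "MEMORIES": "  reg [8:0] opcode_memory[0:1023];",  # hypothetical memory
}))
```

Raising on an unknown marker makes a missing code generator fail loudly instead
of leaving a stray "@xxx@" string in the output HDL.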
 
The file structure, i.e., putting the core and the assembler in "core/9x8",
should facilitate application-specific modification of the processor.  For
example, the store+, store-, fetch+, and fetch- instructions could be replaced
with additional stack manipulation operations, arithmetic operations with
2-byte results, etc.  Simply copy the "9x8" directory to something like
"9x8_XXX" and make your modifications in that directory.  The 8-bit
peripherals should still work, but the 9x8 library functions may need rework
to accommodate the modifications.
 
 
MISCELLANEOUS
================================================================================
 
Features and peripherals are still being added and the documentation is
incomplete.  The output HDL is currently restricted to Verilog, although a VHDL
package file is automatically generated by the computer compiler.
 
The "INVERT_RESET" configuration command indicates that the reset input to the
micro controller is active low rather than active high.