OpenCores
URL https://opencores.org/ocsvn/ssbcc/ssbcc/trunk

SSBCC.9x8 is a free Small Stack-Based Computer Compiler with a 9-bit opcode,
8-bit data core.  It creates vendor-independent, high-speed, low fabric
utilization micro controllers for FPGAs.  It has been used in Spartan-3A,
Spartan-6, Virtex-6, and Artix-7 FPGAs and has been built for Altera, Lattice,
and other Xilinx devices.  It is faster and usually smaller than vendor-provided
processors.

The compiler takes an architecture file that describes the micro controller
memory spaces, inputs and outputs, and peripherals and which specifies the HDL
language and source assembly.  It generates a single HDL module implementing
the entire micro controller.  No user-written HDL is required to instantiate
I/Os, program memory, etc.

SSBCC has been used for the following projects:
- operate a media translator from a parallel camera interface to an OMAP GPMC
  interface, detect and report bus errors and hardware errors, and act as an
  SPI slave to the OMAP
- operate two UART interfaces and multiple PWM controlled 2-lead bi-color LEDs
- operate and monitor the Artix-7 fabric in a Zynq system using AXI4-Lite
  master and slave buses and I2C buses for timing-critical voltage measurements
 
 
DESCRIPTION
================================================================================

The computer compiler uses an architectural description of the processor stating
the sizes of the instruction memory, data stack, and return stack; the input and
output ports; RAM and ROM types and sizes; and peripherals.

The instructions are all single-cycle.  The instructions include
- pushing an 8-bit value onto the data stack
- arithmetic operations:  addition, subtraction, increment, and decrement
- bit-wise logical operations:  and, or, and exclusive or
- rotations
- logical operations:  0=, 0<>, -1=, -1<>
- Forth-like data stack operations:  dup, over, swap, drop, nip
- Forth-like return stack operations:  >r, r>, r@
- input and output port operations
- memory read and write with optional address post increment and post decrement
- jumps and conditional jumps
- calls and conditional calls
- function return

The 9x8 address space is up to 8K.  This is achieved by pushing the 8 lsb of the
target address onto the data stack immediately before the jump or call
instruction and by encoding the 5 msb of the address within the jump or call
instruction.  The instruction immediately following a jump, call, or return is
executed before the instruction sequence at the destination address is executed
(this is illustrated later).
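
The address split described above can be sketched in Python.  This is only an
illustration, not code taken from SSBCC:  the helper names are hypothetical, and
the push opcode layout (bit 8 set, value in the low 8 bits) is inferred from the
generated memory listing in the EXAMPLE section.

```python
ADDRESS_BITS = 13  # 5 msb in the jump/call opcode + 8 lsb pushed on the stack


def split_address(address):
    """Return (lsb8, msb5) for an instruction address (hypothetical helper)."""
    if not 0 <= address < (1 << ADDRESS_BITS):
        raise ValueError("address is outside the 8K instruction space")
    return address & 0xFF, address >> 8


def push_opcode(value):
    """9-bit opcode that pushes an 8-bit value: bit 8 set, value in bits 7..0
    (layout inferred from the generated listing, not from the core source)."""
    return 0x100 | (value & 0xFF)


lsb8, msb5 = split_address(0x00D)  # address of "repause" in the LED example
print("%03X" % push_opcode(lsb8), msb5)
```

For address 0x00D this gives the push opcode 0x10D and a 5 msb value of 0, which
matches the "9'h10D" word that precedes "call repause" in that listing.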
 
Up to four banks of memory, either RAM or ROM, are available.  Each of these can
be up to 256 bytes long, providing a total of up to 1 kB of memory.

The assembly language is Forth-like.  Macros are used to encode the jump and
call instructions and to encode the 2-bit memory bank index in memory store and
fetch instructions.

The computer compiler and assembler are written in Python 2.7.  Peripherals are
implemented by Python modules which generate the I/O ports and the peripheral
HDL.

The computer compiler is documented in the doc directory.  The 9x8 core is
documented in the core/9x8/doc directory.  Several examples are provided.

The computer compiler and assembler are fully functional and there are no known
bugs.

Features and peripherals are still being added and the documentation is
incomplete.  The output HDL is currently restricted to Verilog although a VHDL
package file is automatically generated by the computer compiler.
 
 
SPEED AND RESOURCE UTILIZATION
================================================================================
These device speed and resource utilization results are copied from the build
tests.  The full results are listed in core/9x8/build/uc/uc_led.9x8 which
represents a minimal processor implementation (clock, reset, and one output).
See the uc_peripherals.9x8 file for results for a more complicated
implementation.  Device-specific scripts state how these performance numbers
were obtained.

VENDOR          DEVICE          BEST SPEED      SMALLEST RESOURCE UTILIZATION
------          ------          ----------      -------------------------------
Altera          Cyclone-III     190.6 MHz       282 LEs           (preliminary)
Altera          Cyclone-IV      192.1 MHz       281 LEs           (preliminary)
Altera          Stratix-V       372.9 MHz       198 ALUTs         (preliminary)
Lattice         LCMXO2-640ZE-3   98.4 MHz       206 LUTs          (preliminary)
Lattice         LFE2-6E-7       157.9 MHz       203 LUTs          (preliminary)
Xilinx          Spartan-3A      148.3 MHz       130 slices, 231 4-input LUTs
Xilinx          Spartan-6       200.0 MHz        36 slices, 120 Slice LUTs
Xilinx          Virtex-6        275.7 MHz        38 slices, 122 Slice LUTs (p.)

Disclaimer:  As with other embedded processors, these are maximum performance
claims.  Realistic implementations will produce slower maximum clock rates,
particularly with lots of I/O ports and peripherals and with the constraint of
coexisting with other subsystems in the FPGA fabric.  What these performance
numbers do provide is an estimate of the amount of slack available.  For
example, you can't realistically expect to get 110 MHz from a processor that,
under ideal conditions, places and routes at 125 MHz, but you can with a
processor that synthesizes at more than 150 MHz.
 
 
EXAMPLE
================================================================================

The LED flasher example demonstrates the simplicity of the architectural
specification and the Forth-like assembly language.

The architecture file, named "led.9x8", with the comments and user header
removed, is as follows:

  ARCHITECTURE    core/9x8 Verilog

  INSTRUCTION     2048
  RETURN_STACK    32
  DATA_STACK      32

  PORTCOMMENT LED on/off signal
  OUTPORT 1-bit o_led O_LED

  ASSEMBLY led.s

The ARCHITECTURE configuration command specifies the 9x8 core and the Verilog
language.  The INSTRUCTION, RETURN_STACK, and DATA_STACK configuration commands
specify the sizes of the instruction space, return stack, and data stack.  The
content of the PORTCOMMENT configuration command is inserted in the module
declaration -- this facilitates identifying signals in micro controllers with a
lot of inputs and outputs.  The single OUTPORT statement specifies a 1-bit
signal named "o_led".  This signal is accessed in the assembly code through the
symbol "O_LED".  The ASSEMBLY command specifies the single input file "led.s",
which is listed below.  The output module will be "led.v".
 
The "led.s" assembly file is as follows:

  ; Consume 256*5+4 clock cycles.
  ; ( - )
  .function pause
    0 :inner 1- dup .jumpc(inner) drop
  .return

  ; Repeat "pause" 256 times.
  ; ( - )
  .function repause
    0 :inner .call(pause) 1- dup .jumpc(inner) drop
  .return

  ; main program (as an infinite loop)
  .main
    0 :inner 1 ^ dup .outport(O_LED) .call(repause) .jump(inner)
 
This example is coded in a traditional Forth structure with the conditional
jumps consuming the top of the data stack.  Examining the "pause" function, the
".function" directive specifies the start of a function and the function name.
The "0" instruction pushes the value "0" onto the top of the data stack.
":inner" is a label for a jump instruction.  The "1-" instruction decrements the
top of the data stack.  "dup" is the Forth instruction to push a duplicate of
the top of the data stack onto the data stack.  The ".jumpc(inner)" macro
expands to three instructions as follows:  (1) push the 8 lsb of the address at
"inner" onto the data stack, (2) the conditional jump instruction with the 5 msb
of the address of "inner" (the jumpc instruction also drops the top of the data
stack with its partial address), and (3) a "drop" instruction to drop the
duplicated loop count from the top of the data stack.  Finally, the "drop"
instruction drops the loop count from the top of the data stack and the
".return" macro generates the "return" instruction and a "nop" instruction.

The function "repause" calls the "pause" function 256 times.  The main program
body is identified by the directive ".main".  This function runs an infinite
loop that toggles the lsb of the LED output, outputs the LED setting, and calls
the "repause" function.
 
A tighter version of the loop in the "pause" function can be written as

  ; Consume 256*3+3 clock cycles.
  ; ( - )
  .function pause
    0xFF :inner .jumpc(inner,1-) .return(drop)

which is 3 cycles long for each iteration.  The "drop" that is normally part
of the ".jumpc" macro has been replaced by the decrement instruction and the
final "drop" instruction has replaced the default "nop" instruction that is
normally part of the ".return" macro.  Note that the decrement is performed
after the non-zero comparison in the "jumpc" instruction.

A version of the "pause" function that consumes exactly 1000 clock cycles is:

  .function pause
    ${(1000-4)/4-1} :inner nop .jumpc(inner,1-) drop .return
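
The cycle counts quoted for the three "pause" variants follow from counting
single-cycle instructions.  The Python sketch below reproduces that arithmetic;
the helper names are hypothetical and are not part of SSBCC.  It uses "//"
because the README expression "${(1000-4)/4-1}" relies on "/" being an integer
division, which is only true under Python 2.

```python
def loop_cycles(iterations, instructions_per_iteration, overhead):
    """Total clock cycles for a delay loop of single-cycle instructions."""
    return iterations * instructions_per_iteration + overhead


def initial_count(total_cycles, instructions_per_iteration=4, overhead=4):
    """Loop count to push so that a 4-instruction loop meets a cycle budget.

    Mirrors ${(1000-4)/4-1} from the 1000-cycle "pause" function.
    """
    count = (total_cycles - overhead) // instructions_per_iteration - 1
    if not 0 <= count <= 0xFF:
        raise ValueError("loop count does not fit in 8 bits")
    return count


print(loop_cycles(256, 5, 4))                      # original "pause"
print(loop_cycles(256, 3, 3))                      # tighter "pause"
print(loop_cycles(initial_count(1000) + 1, 4, 4))  # 1000-cycle "pause"
```

The "+ 1" in the last line reflects that the loop body runs once for every
counter value from the initial count down to zero, because the decrement
happens after the non-zero test.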
 
The instruction memory initialization for the processor module includes the
instruction mnemonics being performed at each address and replaces the "list"
file output from traditional assemblers.  The following is the memory
initialization for this LED flasher example.  The main program always starts at
address zero and functions are included in the order encountered.  Unused
library functions are not included in the generated instruction list.

  reg [8:0] s_opcodeMemory[2047:0];
  initial begin
    // .main
    s_opcodeMemory['h000] = 9'h100; // 0x00
    s_opcodeMemory['h001] = 9'h101; // :inner 0x01
    s_opcodeMemory['h002] = 9'h052; // ^
    s_opcodeMemory['h003] = 9'h008; // dup
    s_opcodeMemory['h004] = 9'h100; // O_LED
    s_opcodeMemory['h005] = 9'h038; // outport
    s_opcodeMemory['h006] = 9'h054; // drop
    s_opcodeMemory['h007] = 9'h10D; //
    s_opcodeMemory['h008] = 9'h0C0; // call repause
    s_opcodeMemory['h009] = 9'h000; // nop
    s_opcodeMemory['h00A] = 9'h101; //
    s_opcodeMemory['h00B] = 9'h080; // jump inner
    s_opcodeMemory['h00C] = 9'h000; // nop
    // repause
    s_opcodeMemory['h00D] = 9'h100; // 0x00
    s_opcodeMemory['h00E] = 9'h119; // :inner
    s_opcodeMemory['h00F] = 9'h0C0; // call pause
    s_opcodeMemory['h010] = 9'h000; // nop
    s_opcodeMemory['h011] = 9'h05C; // 1-
    s_opcodeMemory['h012] = 9'h008; // dup
    s_opcodeMemory['h013] = 9'h10E; //
    s_opcodeMemory['h014] = 9'h0A0; // jumpc inner
    s_opcodeMemory['h015] = 9'h054; // drop
    s_opcodeMemory['h016] = 9'h054; // drop
    s_opcodeMemory['h017] = 9'h028; // return
    s_opcodeMemory['h018] = 9'h000; // nop
    // pause
    s_opcodeMemory['h019] = 9'h100; // 0x00
    s_opcodeMemory['h01A] = 9'h05C; // :inner 1-
    s_opcodeMemory['h01B] = 9'h008; // dup
    s_opcodeMemory['h01C] = 9'h11A; //
    s_opcodeMemory['h01D] = 9'h0A0; // jumpc inner
    s_opcodeMemory['h01E] = 9'h054; // drop
    s_opcodeMemory['h01F] = 9'h054; // drop
    s_opcodeMemory['h020] = 9'h028; // return
    s_opcodeMemory['h021] = 9'h000; // nop
    s_opcodeMemory['h022] = 9'h000;
    s_opcodeMemory['h023] = 9'h000;
    s_opcodeMemory['h024] = 9'h000;
    ...
    s_opcodeMemory['h7FF] = 9'h000;
  end
 
 
DATA and STRINGS
================================================================================

Values are pushed onto the data stack by stating the value.  For example,

  0x10 0x20 'x'

will successively push the values 0x10, 0x20, and the character 'x' onto the
data stack.  The character 'x' will be at the top of the data stack after these
3 instructions.

See the COMPUTED VALUES section for using computed values in the assembler.

There are four ways to specify strings in the assembler.  Simply stating the
string

  "Hello World!"

puts the characters in the string onto the data stack with the letter 'H' at the
top of the data stack.  I.e., the individual push operations are

  '!' 'd' 'l' ... 'e' 'H'

Prepending an 'N' to the double quote, like

  N"Hello World!"

puts a null-terminated string onto the data stack.  I.e., the value under the
'!' will be a 0x00 and the instruction sequence would be

  0x0 '!' 'd' 'l' ... 'e' 'H'

Forth uses counted strings, which are specified here as

  C"Hello World!"

In this case the number of characters, 12 in this example, in the string is
pushed onto the data stack after the 'H', i.e., the instruction sequence would
be

  '!' 'd' 'l' ... 'e' 'H' 12

Finally, a lesser-counted string specified like

  c"Hello World!"

is similar to the Forth-like counted string except that the value pushed onto
the data stack is one less than the number of characters in the string.  Here
the value pushed onto the data stack after the 'H' would be 11 instead of 12.
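
The four push orders described above can be summarized with a small Python
model.  This is a sketch of the documented behavior only, not the assembler's
implementation; "push_sequence" is a hypothetical name and escape sequences are
not handled.

```python
def push_sequence(text, prefix=""):
    """Byte values in the order they are pushed onto the data stack.

    prefix is "" (plain), "N" (null-terminated), "C" (counted), or
    "c" (lesser-counted), following the four string forms in the text.
    """
    chars = [ord(ch) for ch in reversed(text)]  # last character pushed first
    if prefix == "":
        return chars
    if prefix == "N":
        return [0] + chars                      # terminator ends up deepest
    if prefix == "C":
        return chars + [len(text)]              # count ends up on top
    if prefix == "c":
        return chars + [len(text) - 1]          # count minus one on top
    raise ValueError("unknown string prefix: %r" % prefix)


for prefix in ("", "N", "C", "c"):
    print(prefix or "plain", push_sequence("Hello World!", prefix))
```

The last element of each list is the value left on top of the data stack.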
 
Simple strings are useful for constructing more complex strings in conjunction
with other string functions.  For example, to transmit the hex values of the
top 2 values in the data stack, do something like:

  ; move the top 2 values to the return stack
  >r >r
  ; push the tail of the message onto the data stack
  N"\n\r"
  ; convert the 2 values to 2-digit hex values, LSB deepest in the stack
  r> .call(string_byte_to_hex)
  r> .call(string_byte_to_hex)
  ; prepend the identification message
  "Message:  "
  ; transmit the string, using the null terminator to terminate the loop
  :loop_transmit .outport(O_UART_TX) .jumpc(loop_transmit,nop) drop

A lesser-counted string would be used like:

  c"Status Message\r\n"
  :loop_msg swap .outport(O_UART_TX) .jumpc(loop_msg,1-) drop

These four string formats can also be used for variable definitions.  For
example, 3 variables could be allocated and initialized as follows:

  .memory ROM myrom
  .variable fred N"fred"
  .variable joe  c"joe"
  .variable moe  "moe"

These are equivalent to

  .variable fred 'f' 'r' 'e' 'd'  0
  .variable joe   2  'j' 'o' 'e'
  .variable moe  'm' 'o' 'e'

with 5 bytes allocated for the variable fred, 4 bytes for joe, and 3 bytes for
moe.
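
Note that the byte order in memory differs from the push order on the data
stack.  The following Python sketch models only the three variable layouts
shown above; "variable_bytes" is a hypothetical helper and the layout for the
C"..." form is deliberately left out because the text does not show it.

```python
def variable_bytes(text, prefix=""):
    """Bytes allocated for a string-initialized variable, lowest address first."""
    chars = [ord(ch) for ch in text]
    if prefix == "":
        return chars
    if prefix == "N":
        return chars + [0]               # null terminator follows the text
    if prefix == "c":
        return [len(text) - 1] + chars   # lesser count precedes the text
    raise ValueError("layout not shown in the text for prefix %r" % prefix)


for name, prefix in (("fred", "N"), ("joe", "c"), ("moe", "")):
    data = variable_bytes(name, prefix)
    print(name, len(data), data)
```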
 
The following escaped characters are recognized:

  '\0'     null character
  '\a'     bell
  '\b'     backspace
  '\f'     form feed
  '\n'     line feed
  '\r'     carriage return
  '\t'     horizontal tab
  "\0ooo"  3-digit octal value
  "\xXX"   2-digit hex value where X is one of 0-9, a-f, or A-F
  "\Xxx"   alternate form for 2-digit hex value
  "\\"     backslash character

Unrecognized escaped characters are simply treated as that character.  For
example, '\m' is treated as the single character 'm' and '\'' is treated as the
single quote character.
 
 
INSTRUCTIONS
================================================================================

The 41 instructions are as follows (see core/9x8/doc/opcodes.html for detailed
descriptions).  Here, T is the top of the data stack, N is the next-to-top of
the data stack, and R is the top of the return stack.  All of these are the
values at the start of the instruction.

The nop instruction does nothing:

  nop           no operation

Mathematical operations drop one value from the data stack and replace the new
top with the stated value:

  &             bitwise and of N and T
  +             N + T
  -             N - T
  ^             bitwise exclusive or of N and T
  or            bitwise or of N and T

Increment and decrement replace the top of the data stack with the stated
result.

  1+            replace T with T+1
  1-            replace T with T-1

Comparison operations replace the top of the data stack with the results of the
comparison:

  -1<>          replace T with -1 if T != -1, otherwise set T to 0
  -1=           replace T with 0 if T != -1, otherwise leave T as -1
  0<>           replace T with -1 if T != 0, otherwise leave T as 0
  0=            replace T with -1 if T == 0, otherwise set T to 0
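
As an illustration, the four comparison results can be modeled on 8-bit values
in Python, where -1 is represented as 0xFF.  This is a behavioral sketch of the
table above with hypothetical names, not the core's HDL.

```python
TRUE, FALSE = 0xFF, 0x00  # -1 and 0 as 8-bit values

COMPARISONS = {
    "0=":   lambda t: TRUE if t == 0x00 else FALSE,
    "0<>":  lambda t: TRUE if t != 0x00 else FALSE,
    "-1=":  lambda t: TRUE if t == 0xFF else FALSE,
    "-1<>": lambda t: TRUE if t != 0xFF else FALSE,
}


def compare(op, t):
    """New value of T after the comparison instruction named by op."""
    return COMPARISONS[op](t & 0xFF)


for op in ("0=", "0<>", "-1=", "-1<>"):
    print(op, [hex(compare(op, t)) for t in (0x00, 0x05, 0xFF)])
```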
 
Shift/rotate operations replace the top of the data stack with the result of the
specified shift/rotate.

  0>>           shift T right 1 bit and set the msb to 0
  1>>           shift T right 1 bit and set the msb to 1
  <<0           shift T left 1 bit and set the lsb to 0
  <<1           shift T left 1 bit and set the lsb to 1
  <<msb         rotate T left 1 bit
  lsb>>         rotate T right 1 bit
  msb>>         shift T right 1 bit and set the msb to the old msb
 
Note:  There is no "<<lsb" instruction.
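
The bit-level effect of the two-operand-free shift and rotate instructions can
be sketched in Python on 8-bit values.  Only the six mnemonics whose
descriptions are fully legible above are modeled; the function name is
hypothetical and this is not the core's implementation.

```python
def shift(op, t):
    """New 8-bit value of T after the shift/rotate instruction named by op."""
    t &= 0xFF
    msb, lsb = t >> 7, t & 1
    if op == "0>>":
        return t >> 1
    if op == "1>>":
        return (t >> 1) | 0x80
    if op == "<<0":
        return (t << 1) & 0xFF
    if op == "<<1":
        return ((t << 1) & 0xFF) | 0x01
    if op == "lsb>>":
        return (t >> 1) | (lsb << 7)   # rotate right: old lsb becomes msb
    if op == "msb>>":
        return (t >> 1) | (msb << 7)   # arithmetic shift: msb is preserved
    raise ValueError("unmodeled instruction: %r" % op)


for op in ("0>>", "1>>", "<<0", "<<1", "lsb>>", "msb>>"):
    print(op, "%02X" % shift(op, 0x81))
```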
 
Stack manipulation instructions are as follows:

  >r            push T onto the return stack and drop T from the data stack
  drop          drop T from the data stack
  dup           push T onto the data stack
  nip           drop N from the data stack
  over          push N onto the data stack
  push          push a single byte onto the data stack, see the preceding DATA
                and STRINGS section
  r>            push R onto the data stack and drop R from the return stack
  r@            push R onto the data stack
  swap          swap N and T
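
For readers less familiar with Forth, the stack effects listed above can be
mimicked with two Python lists.  The class and method names below are
hypothetical; the model ignores stack depth limits and is only meant to show
which values move where.

```python
class Stacks(object):
    """Data and return stacks; the last list element is T (or R)."""

    def __init__(self, data=()):
        self.data = list(data)
        self.ret = []

    def to_r(self):      # >r
        self.ret.append(self.data.pop())

    def r_from(self):    # r>
        self.data.append(self.ret.pop())

    def r_fetch(self):   # r@
        self.data.append(self.ret[-1])

    def drop(self):
        self.data.pop()

    def dup(self):
        self.data.append(self.data[-1])

    def nip(self):
        del self.data[-2]

    def over(self):
        self.data.append(self.data[-2])

    def swap(self):
        self.data[-1], self.data[-2] = self.data[-2], self.data[-1]


s = Stacks([0x10, 0x20])   # N = 0x10, T = 0x20
s.over()
print(s.data)
```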
 
Jump and call and their conditional variants are as follows and must use the
associated macro:

  call          call instruction -- use the .call macro
  callc         conditional call instruction -- use the .callc macro
  jump          jump instruction -- use the .jump macro
  jumpc         conditional jump instruction -- use the .jumpc macro
  return        return instruction -- use the .return macro

See the MEMORY section for details for these memory operations.  T is the
address for the instructions, N is the value stored.  Chained fetches insert the
value below T.  Chained stores drop N.

  fetch         memory fetch, replace T with the value fetched
  fetch+        chained memory fetch, retain and increment the address
  fetch-        chained memory fetch, retain and decrement the address
  store         memory store, drop T (N is the next value of T)
  store+        chained memory store, retain and increment the address
  store-        chained memory store, retain and decrement the address

See the INPORT and OUTPORT section for details for the input and output port
operations:

  inport        input port operation
  outport       output port operation

The .call, .callc, .jump, and .jumpc macros encode the 3 instructions required
to perform a call or jump:  the push of the 8 lsb of the address, the call or
jump itself, and the instruction that follows it.  The default third
instruction is "nop" for .call and .jump and it is "drop" for .callc and
.jumpc.  The default can be changed by specifying the optional second argument.
The .call and .callc macros must specify a function identified by the .function
directive and the .jump and .jumpc macros must specify a label.
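
The macro expansion can be pictured with the following Python sketch.  It is
not the assembler's code:  "expand" is a hypothetical helper, the target is
given as a numeric address rather than a symbol, and the textual form of the
returned instructions is invented for this illustration.

```python
DEFAULT_THIRD = {
    ".call": "nop", ".jump": "nop",      # unconditional: pad with a nop
    ".callc": "drop", ".jumpc": "drop",  # conditional: drop the tested value
}


def expand(macro, target_address, third=None):
    """Return the three instructions a call/jump macro stands for."""
    if macro not in DEFAULT_THIRD:
        raise ValueError("not a call or jump macro: %r" % macro)
    lsb8 = target_address & 0xFF
    msb5 = (target_address >> 8) & 0x1F
    return [
        "push 0x%02X" % lsb8,                  # 8 lsb of the target address
        "%s msb=0x%02X" % (macro[1:], msb5),   # opcode carrying the 5 msb
        third or DEFAULT_THIRD[macro],         # executed after the jump/call
    ]


print(expand(".jumpc", 0x01A))
print(expand(".jumpc", 0x01A, "1-"))
print(expand(".call", 0x00D))
```

The second call corresponds to ".jumpc(inner,1-)" in the tighter "pause" loop,
where the decrement takes the place of the default "drop".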
 
The .function directive takes the name of the function and the function body.
Function bodies must end with a .return or a .jump macro.  The .main directive
defines the body of the main function, i.e., the function at which the processor
starts.

The .include directive is used to read additional assembly code.  You can, for
example, put the main function in uc.s, define constants and such in consts.s,
define the memories and variables in myram.s, and include UART utilities in
uart.s.  These files could be included in uc.s through the following lines:

  .include consts.s
  .include myram.s
  .include uart.s

The assembler only includes functions that can be reached from the main
function.  Unused functions will not consume instruction space.
 
 
INPORT and OUTPORT
================================================================================

The INPORT and OUTPORT configuration commands are used to specify 2-state inputs
and outputs.  For example,

  INPORT 8-bit i_value I_VALUE

specifies a single 8-bit input signal named "i_value" for the module.  The port
is accessed in assembly by ".inport(I_VALUE)" which is equivalent to the
two-instruction sequence "I_VALUE inport".  To input an 8-bit value from a FIFO
and send a single-clock-cycle wide acknowledgment strobe, use

  INPORT 8-bit,strobe i_fifo,o_fifo_ack I_FIFO

The assembly ".inport(I_FIFO)" will automatically send an acknowledgment strobe
to the FIFO through "o_fifo_ack".

A write port to an 8-bit FIFO is similarly specified by

  OUTPORT 8-bit,strobe o_fifo,o_fifo_wr O_FIFO

The assembly ".outport(O_FIFO)" which is equivalent to "O_FIFO outport drop"
will automatically send a write strobe to the FIFO through "o_fifo_wr".

Multiple signals can be packed into a single input or output port by defining
them in comma separated lists.  The associated bit masks can be defined
coincident with the port definition as follows:

  INPORT 1-bit,1-bit i_fifo_full,i_fifo_empty I_FIFO_STATUS
  CONSTANT C_FIFO_STATUS__FULL  0x02
  CONSTANT C_FIFO_STATUS__EMPTY 0x01

Checking the "full" status of the FIFO can be done by the following assembly
sequence:

  .inport(I_FIFO_STATUS) C_FIFO_STATUS__FULL &

Multiple bits can be masked using a computed value as follows (see below for
more details):

  .inport(I_FIFO_STATUS) ${C_FIFO_STATUS__FULL|C_FIFO_STATUS__EMPTY} &

The "${...}" creates an instruction to push the 8-bit value in the braces onto
the data stack.  The computation is performed using the Python "eval" function
in the context of the program constants, memory addresses, and memory sizes.
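
A minimal sketch of how such a computed value could be evaluated is shown
below.  It assumes only what the text states (Python "eval" with the program
constants in scope); the function name, the restricted builtins, and the range
check are assumptions for this illustration and may differ from what the
assembler actually does.

```python
def computed_value(expression, constants):
    """Evaluate the text inside ${...} with the constants as local names."""
    value = eval(expression, {"__builtins__": {}}, dict(constants))
    if not isinstance(value, int) or not 0 <= value <= 0xFF:
        raise ValueError("computed value does not fit in 8 bits: %r" % (value,))
    return value


constants = {"C_FIFO_STATUS__FULL": 0x02, "C_FIFO_STATUS__EMPTY": 0x01}
mask = computed_value("C_FIFO_STATUS__FULL|C_FIFO_STATUS__EMPTY", constants)
print("0x%02X" % mask)
```

With the two constants defined above, the mask pushed by the "${...}"
expression is 0x03.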
 
Preceding all of these by

  PORTCOMMENT external FIFO

produces the following in the Verilog module statement.  The I/O ports are
listed in the order in which they are declared.

  // external FIFO
  input  wire       [7:0] i_fifo,
  output reg              o_fifo_ack,
  output reg        [7:0] o_fifo,
  output reg              o_fifo_wr,
  input  wire             i_fifo_full,
  input  wire             i_fifo_empty

The HDL to implement the inputs and outputs is computer generated.  Identifying
the port name in the architecture file eliminates the possibility of
inconsistent port numbers between the HDL and the assembly.  Specifying the bit
mapping for the assembly code immediately after the port definition helps
prevent inconsistencies between the port definition and the bit mapping in the
assembly code.

The normal initial value for an outport is zero.  This can be changed by
including an optional initial value as follows.  This initial value will be
applied on system startup and when the micro controller is reset.

  OUTPORT 4-bit=4'hA o_signal O_SIGNAL

An isolated output strobe can also be created using:

  OUTPORT strobe o_strobe O_STROBE

The assembly ".outstrobe(O_STROBE)" which is equivalent to "O_STROBE outport"
is used to generate the strobe.  Since "O_STROBE" is a strobe-only outport, the
".outport" macro cannot be used with it.  Similarly, the ".outstrobe" macro
generates an error if it is invoked with an outport that does have data.

A single-bit "set-reset" input port type is also included.  This sets a register
when an external strobe is received and clears the register when the port is
read.  For example, to capture an external timer for a polled-loop, include the
following in the architecture file:

  PORTCOMMENT external timer
  INPORT set-reset i_timer I_TIMER

The following is the assembly code to conditionally call two functions when the
timer event is encountered:

  .inport(I_TIMER)
    .callc(timer_event_1,nop)
    .callc(timer_event_2)

The "nop" in the first conditional call prevents the conditional from being
dropped from the data stack so that it can be used by the subsequent conditional
function call.
 
 
PERIPHERAL
================================================================================

Peripherals are implemented via Python modules.  For example, an open drain I/O
signal, such as is required for an I2C bus, does not fit the INPORT and OUTPORT
functionality.  Instead, an "open_drain" peripheral is provided by the Python
script in "core/9x8/peripherals/open_drain.py".  This puts a tri-state I/O in
the module statement, allows it to be read through an "inport" instruction, and
allows it to be set low or released through an "outport" instruction.  An I2C
bus with separate SCL and SDA ports can then be incorporated into the processor
as follows:

  PORTCOMMENT     I2C bus
  PERIPHERAL      open_drain      inport=I_SCL \
                                  outport=O_SCL \
                                  iosignal=io_scl
  PERIPHERAL      open_drain      inport=I_SDA \
                                  outport=O_SDA \
                                  iosignal=io_sda

The default width for this peripheral is 1 bit.  The module statement will then
include the lines

  // I2C bus
  inout  wire     io_scl,
  inout  wire     io_sda

The assembly code to set the io_scl signal low is "0 .outport(O_SCL)" and to
release it is "1 .outport(O_SCL)".  These instruction sequences are actually
"0 O_SCL outport drop" and "1 O_SCL outport drop" respectively.  The "outport"
instruction drops the top of the data stack (which contained the port number)
and sends the next-to-the-top of the data stack to the designated output port.

Two examples of I2C device operation are included in the examples directory.

The following peripherals are provided:
  adder_16bit   16-bit adder/subtractor
  AXI4_Lite_Master
                32-bit read/write AXI4-Lite Master
                Note:  The synchronous version has been tested on hardware.
  AXI4_Lite_Slave_DualPortRAM
                dual-port-RAM interface for the micro controller to act as an
                AXI4-Lite slave
  big_inport    shift reads from a single INPORT to construct a wide input
  big_outport   shift writes to a single OUTPORT to construct a wide output
  counter       counter for number of received high cycles from signal
  inFIFO_async  input FIFO with an asynchronous write clock
  latch         latch wide inputs for sampling
  monitor_stack simulation diagnostic (see below)
  open_drain    for software-implemented I2C buses or similar
  outFIFO_async output FIFO with an asynchronous read clock
  PWM_8bit      PWM generator with an 8-bit control
  timer         timing for polled loops or similar
  trace         simulation diagnostic (see below)
  UART          bidirectional UART
  UART_Rx       receive UART
  UART_Tx       transmit UART
  wide_strobe   1 to 8 bit strobe generator

The following command illustrates how to display the help message for
peripherals:

  echo "ARCHITECTURE core/9x8 Verilog" | ssbcc -P "big_inport help" - | less

User defined peripherals can be in the same directory as the architecture file
or a subdirectory named "peripherals".
 
 
PARAMETER and LOCALPARAM
================================================================================

Parameters are incorporated through the PARAMETER and LOCALPARAM configuration
commands.  For example, the clock frequency in hertz is needed for UARTs for
their baud rate generator.  The configuration command

  PARAMETER G_CLK_FREQ_HZ 97_000_000

specifies the clock frequency as 97 MHz.  The HDL instantiating the processor
can change this specification.  The frequency can also be changed through the
command-line invocation of the computer compiler.  For example,

  ssbcc -G "G_CLK_FREQ_HZ=100_000_000" myprogram.9x8

specifies that a frequency of 100 MHz be used instead of the default frequency
of 97 MHz.

The LOCALPARAM configuration command can be used to specify parameters that
should not be changed by the surrounding HDL.  For example,

  LOCALPARAM L_VERSION 24'h00_00_00

specifies a 24-bit parameter named "L_VERSION".  The 8-bit major, minor, and
build sections of the parameter can be accessed in an assembly program using
"L_VERSION[16+:8]", "L_VERSION[8+:8]", and "L_VERSION[0+:8]".
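
The "[base+:width]" notation is the Verilog indexed part-select.  As a quick
illustration of which bits each selection picks, the Python sketch below applies
it to the example release value 24'h01_04_03; "part_select" is a hypothetical
helper, not part of SSBCC.

```python
def part_select(value, base, width=8):
    """Equivalent of the Verilog part-select value[base +: width]."""
    return (value >> base) & ((1 << width) - 1)


L_VERSION = 0x010403  # 24'h01_04_03

major = part_select(L_VERSION, 16)
minor = part_select(L_VERSION, 8)
build = part_select(L_VERSION, 0)
print(major, minor, build)
```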
 
For both parameters and localparams, the default range is "[0+:8]".  The
instruction memory is initialized using the parameter value during synthesis,
not the value used to initialize the parameter.  That is, the instruction memory
initialization will be:

  s_opcodeMemory[...] = { 1'b1, L_VERSION[16+:8] };

The value of the localparam can be set when the computer compiler is run using
the "-G" option.  For example,

  ssbcc -G "L_VERSION=24'h01_04_03" myprogram.9x8

can be used in a makefile to set the version number for a release without
modifying the micro controller architecture file.
 
 
DIAGNOSTICS AND DEBUGGING
================================================================================

A 3-character, human-readable version of the opcode can be included in
simulation waveform outputs by adding "--display-opcode" to the ssbcc command.

The stack health can be monitored during simulation by including the
"monitor_stack" peripheral through the command line.  For example, the LED
flasher example can be generated using

  ssbcc -P monitor_stack led.9x8

This allows the architecture file to be unchanged between simulation and an FPGA
build.

Stack errors include underflow and overflow, malformed data validity, and
incorrect use of the values on the return stack (returns to data values and data
operations on return addresses).  Other errors include out-of-range memory,
inport, and outport operations.

When stack errors are detected, the last 50 instructions are dumped to the
console and the simulation terminates.  The dump includes the PC, numeric
opcode, textual representation of the opcode, data stack pointer, next-to-top of
the data stack, top of the data stack, top of the return stack, and the return
stack pointer.  Invalid stack values are displayed as "XX".  The length of the
history dumped is configurable.
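The "last 50 instructions" behavior amounts to a fixed-length history buffer.
The Python sketch below shows the idea only; it is not the monitor_stack
implementation, the class name is hypothetical, and the recorded opcode and
mnemonic are placeholder values.

```python
from collections import deque

class InstructionHistory:
    """Keep only the most recent `depth` executed instructions."""

    def __init__(self, depth=50):
        # A deque with maxlen silently discards the oldest entry when full.
        self.entries = deque(maxlen=depth)

    def record(self, pc, opcode, name):
        self.entries.append((pc, opcode, name))

    def dump(self):
        # One line per instruction: PC, numeric opcode, textual opcode.
        return ["%03X %03X %s" % entry for entry in self.entries]

history = InstructionHistory()
for pc in range(60):                  # placeholder execution trace
    history.record(pc, 0x000, "nop")
lines = history.dump()
print(len(lines), lines[0], lines[-1])
```

Only the newest 50 of the 60 recorded entries survive, so the dump starts at
the eleventh instruction.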
 
Out-of-range PC checks are also performed if the instruction space is not a
power of 2.

A "trace" peripheral is also provided that dumps the entire execution history.
This was used to validate the processor core.
 
 
MEMORY ARCHITECTURE
================================================================================

The DATA_STACK, RETURN_STACK, INSTRUCTION, and MEMORY configuration commands
allocate memory for the data stack, return stack, instruction ROM, and memory
RAM and ROM respectively.  The data stack, return stack, and memories are
normally instantiated as dual-port LUT-based memories with asynchronous reads
while the instruction memory is always instantiated with a synchronous read
architecture.

The COMBINE configuration command is used to coalesce memories and to convert
LUT-based memories to synchronous SRAM-based memories.  For example, the large
SRAMs in modern FPGAs are ideal for storing the instruction opcodes and their
dual-ported access allows either the data stack or the return stack to be
stored in a relatively small region at the end of the large instruction memory.
Memories, which require dual-ported operation, can also be instantiated in
large RAMs either individually or in combination with each other.  Conversion
to SRAM-based memories is also useful for FPGA architectures that do not have
efficient LUT-based memories.

The INSTRUCTION configuration command allocates memory for the processor
instruction space.  It has the form "INSTRUCTION N" or "INSTRUCTION N*M" where
N must be a power of 2.  The first form is used if the desired instruction
memory size is a power of 2.  The second form is used to allocate M memory
blocks of size N where M is not a power of 2.  For example, on an Altera
Cyclone III, the configuration command "INSTRUCTION 1024*3" allocates three
M9Ks for the instruction space, saving one M9K as compared to the configuration
command "INSTRUCTION 4096".
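The arithmetic behind the two forms can be sketched in Python.  The function
name "parse_instruction" is hypothetical and this is not the ssbcc parser; it
only checks the power-of-2 rule stated above and reports the block count and
total size.

```python
def parse_instruction(arg):
    """Parse "N" or "N*M" and return (block size, block count, total words)."""
    n_text, _, m_text = arg.partition("*")
    n = int(n_text)
    m = int(m_text) if m_text else 1
    if n <= 0 or n & (n - 1):
        raise ValueError("N must be a power of 2")
    return n, m, n * m

print(parse_instruction("1024*3"))   # three blocks of 1024 words
print(parse_instruction("4096"))     # one power-of-2 allocation
```

Assuming each 1024-word block maps to one M9K as in the Cyclone III example,
"1024*3" needs three blocks where "4096" would need four.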
 
The DATA_STACK configuration command allocates memory for the data stack.  It
has the form "DATA_STACK N" where N is the commanded size of the data stack.
N must be a power of 2.

The RETURN_STACK configuration command allocates memory for the return stack and
has the same format as the DATA_STACK configuration command.

The MEMORY configuration command is used to define one to four memories, either
RAM or ROM, with up to 256 bytes each.  If no MEMORY configuration command is
issued, then no memories are allocated for the processor.  The MEMORY
configuration command has the format "MEMORY {RAM|ROM} name N" where
"{RAM|ROM}" specifies either a RAM or a ROM, name is the name of the memory and
must start with an alphabetic character, and the size of the memory, N, must be
a power of 2.  For example, "MEMORY RAM myram 64" allocates 64 bytes of memory
to form a RAM named myram.  Similarly, "MEMORY ROM lut 256" defines a 256 byte
ROM named lut.  More details on using memories are provided in the next
section.
 
The COMBINE configuration command can be used to combine the various memories
for more efficient processor implementation as follows:

  COMBINE INSTRUCTION,
  COMBINE 
  COMBINE ,
  COMBINE 

where  is one of DATA_STACK, RETURN_STACK, or a list of one
or more ROMs and  is a list of one or more RAMs and/or ROMs.  The first
configuration command reserves space at the end of the instruction memory for
the DATA_STACK, RETURN_STACK, or listed ROMs.
 
The SRAM_WIDTH configuration command is used to make the memory allocations more
efficient when the SRAM block width is more than 9 bits.  For example,
Altera's Cyclone V family has 10-bit wide memory blocks and the configuration
command "SRAM_WIDTH 10" is appropriate.  The configuration command
sequence

  INSTRUCTION     1024
  RETURN_STACK    32
  SRAM_WIDTH      10
  COMBINE         INSTRUCTION,RETURN_STACK

will use a single 10-bit memory entry for each element of the return stack
instead of packing the 10-bit values into two memory entries of a 9-bit wide
memory.
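The saving can be estimated with a short Python sketch.  The function names are
hypothetical and the packing model (a whole number of memory entries per stack
element) is an assumption drawn from the description above, not from the ssbcc
source.

```python
def return_stack_width(instruction_words, data_width=8):
    """Width of one return stack element: the wider of data and PC widths."""
    pc_width = (instruction_words - 1).bit_length()
    return max(data_width, pc_width)

def entries_per_element(element_width, sram_width=9):
    """Memory entries needed to hold one element (ceiling division)."""
    return -(-element_width // sram_width)

width = return_stack_width(1024)              # 10-bit PC for 1024 instructions
default_entries = 32 * entries_per_element(width, 9)
wide_entries = 32 * entries_per_element(width, 10)
print(width, default_entries, wide_entries)
```

Under these assumptions the 32-deep return stack occupies 64 entries of a 9-bit
wide memory but only 32 entries when "SRAM_WIDTH 10" is given.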
 
The following illustrates a possible configuration for a Spartan-6 with a
2048-long SRAM and relatively large 64-deep data stack.  The data stack will be
in the last 64 elements of the instruction memory and the instruction space will
be reduced to 1984 words.

  INSTRUCTION   2048
  DATA_STACK    64
  COMBINE       INSTRUCTION,DATA_STACK

The following illustrates a possible configuration for a Cyclone-III with three
M9Ks for the instruction ROM and the data stack.

  INSTRUCTION   1024*3
  DATA_STACK    64
  COMBINE       INSTRUCTION,DATA_STACK
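The resulting address split can be sketched in Python.  The function name is
hypothetical, and the sketch assumes that each 8-bit data stack element takes
exactly one memory entry at the end of the combined memory.

```python
def combined_layout(total_words, stack_depth):
    """Return (instruction addresses, stack addresses) for a combined memory."""
    first_stack_word = total_words - stack_depth
    return range(0, first_stack_word), range(first_stack_word, total_words)

# Spartan-6 example:  INSTRUCTION 2048 with DATA_STACK 64
instr, stack = combined_layout(2048, 64)
print(len(instr), stack[0], stack[-1])

# Cyclone-III example:  INSTRUCTION 1024*3 with DATA_STACK 64
instr3, stack3 = combined_layout(1024 * 3, 64)
print(len(instr3), stack3[0], stack3[-1])
```

For the Spartan-6 case this reproduces the 1984-word instruction space stated
above; the Cyclone-III figure follows from the same assumption.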
 
WARNING:  Some devices, such as Xilinx' Spartan-3A devices, do not support
asynchronous reads, so the COMBINE configuration command does not work for them.

WARNING:  Xilinx XST does not correctly infer a Block RAM when the
"COMBINE INSTRUCTION,RETURN_STACK" configuration command is used and the
instruction space is 1024 instructions or larger.  Xilinx is supposed to fix
this in a future release of Vivado so the fix will only apply to 7-series or
later FPGAs.
 
 
MEMORY
================================================================================

The MEMORY configuration command is used as follows to allocate a 128-byte RAM
named "myram" and to allocate a 32-byte ROM named "myrom".  Zero to four
memories can be allocated, each with up to 256 bytes.

  MEMORY RAM myram 128
  MEMORY ROM myrom  32

The assembly code to lay out the memory uses the ".memory" directive to identify
the memory and the ".variable" directive to identify the symbol and its content.
Single or multiple values can be listed and "*N" can be used to identify a
repeat count.

  .memory RAM myram
  .variable a 0
  .variable b 0
  .variable c 0 0 0 0
  .variable d 0*4

  .memory ROM myrom
  .variable coeff_table 0x04
                        0x08
                        0x10
                        0x20
  .variable hello_world N"Hello World!\r\n"

Single values are fetched from or stored to memory using the following assembly:

  .fetchvalue(a)
  0x12 .storevalue(b)

Multi-byte values are fetched or stored as follows.  This copies the four values
from coeff_table, which is stored in a ROM, to d.

  .fetchvector(coeff_table,4) .storevector(d,4)

The memory size is available using computed values (see below) and can be used
to clear the entire memory, etc.
 
The available single-cycle memory operation macros are:
  .fetch(mem_name)      replaces T with the value at the address T in the memory
                        mem_name
  .fetch+(mem_name)     pushes the value at address T in the memory mem_name
                        into the data stack below T and increments T
                        Note:  This is useful for fetching successive values
                               from memory into the data stack.
  .fetch-(mem_name)     similar to .fetch+ but decrements T
  .store(ram_name)      stores N at address T in the RAM ram_name, also drops
                        the top of the data stack
  .store+(ram_name)     stores N at address T in the RAM ram_name, also drops N
                        from the data stack and increments T
  .store-(ram_name)     similar to .store+ but decrements T
 
The following multi-cycle macros provide more generalized access to the
memories:
  .fetchvalue(var_name) fetches the single-byte value of var_name
                        Note:  This is equivalent to "var_name .fetch(mem_name)"
                               where mem_name is the memory in which var_name is
                               stored.
  .fetchindexed(var_name)
                        uses the top of the data stack as an index into var_name
                        Note:  This is equivalent to the 3 instruction sequence
                               "var_name + .fetch(mem_name)"
  .fetchoffset(var_name,offset)
                        fetches the single-byte value of var_name offset by
                        "offset" bytes
                        Note:  This is equivalent to
                               "${var_name+offset} .fetch(mem_name)"
  .fetchvector(var_name,N)
                        fetches N values starting at var_name into the data
                        stack with the value at var_name at the top and the
                        value at var_name+N-1 deep in the stack.
                        Note:  This is equivalent to the N+1 operation sequence
                               "${var_name+N-1} .fetch-(mem_name) ...
                               .fetch-(mem_name) .fetch(mem_name)"
                               where ".fetch-(mem_name)" is repeated N-1 times.
  .storevalue(var_name) stores the single-byte value at the top of the data
                        stack at var_name
                        Note:  This is equivalent to
                               "var_name .store(mem_name) drop"
                        Note:  The default "drop" instruction can be replaced by
                               providing the optional second argument.  For
                               example, the following instruction will store and
                               then decrement the value at the top of the data
                               stack:
                                 .storevalue(var_name,1-)
  .storeindexed(var_name)
                        uses the top of the data stack as an index into var_name
                        into which to store the next-to-top of the data stack.
                        Note:  This is equivalent to the 4 instruction sequence
                               "var_name + .store(mem_name) drop".
                        Note:  The default "drop" instruction can be overridden
                               by providing the optional second argument
                               similarly to the .storevalue macro.
  .storeoffset(var_name,offset)
                        stores the single-byte value at the top of the data
                        stack at var_name offset by "offset" bytes
                        Note:  This is equivalent to
                               "${var_name+offset} .store(mem_name) drop"
                        Note:  The optional third argument is as per the
                               optional second argument of .storevalue
  .storevector(var_name,N)
                        Does the reverse of the .fetchvector macro.
                        Note:  This is equivalent to the N+2 operation sequence
                               "var_name .store+(mem_name) ... .store+(mem_name)
                               .store(mem_name) drop"
                               where ".store+(mem_name)" is repeated N-1 times.
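The stack effects of the vector macros can be traced with a small Python model.
This is a sketch built only from the macro descriptions above, not the ssbcc
implementation:  the data stack is a Python list whose last element is T, the
memories are plain lists, and all function names are hypothetical.

```python
def fetch(stack, mem):
    stack[-1] = mem[stack[-1]]           # replace T with the value at address T

def fetch_minus(stack, mem):
    addr = stack.pop()
    stack.append(mem[addr])              # fetched value goes below T
    stack.append((addr - 1) & 0xFF)      # decremented address stays as T

def store(stack, mem):
    addr = stack.pop()                   # drop T (the address)
    mem[addr] = stack[-1]                # N is stored and becomes the new T

def store_plus(stack, mem):
    addr = stack.pop()
    mem[addr] = stack.pop()              # store N and drop it
    stack.append((addr + 1) & 0xFF)      # incremented address stays as T

def fetchvector(stack, mem, var, n):
    stack.append(var + n - 1)            # ${var_name+N-1}
    for _ in range(n - 1):
        fetch_minus(stack, mem)
    fetch(stack, mem)

def storevector(stack, mem, var, n):
    stack.append(var)                    # var_name
    for _ in range(n - 1):
        store_plus(stack, mem)
    store(stack, mem)
    stack.pop()                          # trailing "drop"

rom = [0x04, 0x08, 0x10, 0x20]           # coeff_table at ROM address 0
ram = [0] * 8                            # d assumed to sit at RAM address 4
stack = []
fetchvector(stack, rom, 0, 4)
fetched = list(stack)                    # value at coeff_table ends up on top
storevector(stack, ram, 4, 4)
print(fetched, ram[4:8], stack)
```

In this model the copy preserves the order of the values in memory and leaves
the data stack empty, matching ".fetchvector(coeff_table,4) .storevector(d,4)".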
 
The .fetchvector and .storevector macros are intended to work with values stored
MSB first in memory and with the MSB toward the top of the data stack,
similarly to the Forth language with multi-word values.  To demonstrate how
this data structure works, consider the examples of decrementing and
incrementing a two-byte value on the data stack:

  ; Decrement a 2-byte value
  ;   swap 1- swap      - decrement the LSB
  ;   over -1=          - puts -1 on the top of the data stack if the LSB rolled
  ;                       over from 0 to -1, puts 0 on the top otherwise
  ;   +                 - decrements the MSB if the LSB rolled over
  ; ( u_LSB u_MSB - u_LSB' u_MSB' )
  .function decrement_2byte
  swap 1- swap over -1= .return(+)

  ; Increment a 2-byte value
  ;   swap 1+ swap      - increment the LSB
  ;   over 0=           - puts -1 on the top of the data stack if the LSB rolled
  ;                       over from 0xFF to 0, puts 0 on the top otherwise
  ;   -                 - increments the MSB if the LSB rolled over (by
  ;                       subtracting -1)
  ; ( u_LSB u_MSB - u_LSB' u_MSB' )
  .function increment_2byte
  swap 1+ swap over 0= .return(-)
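The rollover trick in these two functions can be checked with 8-bit arithmetic
in Python.  The sketch mirrors the commented steps above using ordinary
variables instead of a stack; it is a model for checking the arithmetic, not
generated code.

```python
def decrement_2byte(lsb, msb):
    lsb = (lsb - 1) & 0xFF                    # swap 1- swap
    flag = 0xFF if lsb == 0xFF else 0x00      # over -1=  (-1 is 0xFF in 8 bits)
    msb = (msb + flag) & 0xFF                 # +  (adding -1 borrows from MSB)
    return lsb, msb

def increment_2byte(lsb, msb):
    lsb = (lsb + 1) & 0xFF                    # swap 1+ swap
    flag = 0xFF if lsb == 0x00 else 0x00      # over 0=
    msb = (msb - flag) & 0xFF                 # -  (subtracting -1 carries)
    return lsb, msb

print(increment_2byte(0xFF, 0x12))   # LSB rolls over, MSB is incremented
print(decrement_2byte(0x00, 0x13))   # LSB rolls under, MSB is decremented
```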
 
 
COMPUTED VALUES
================================================================================

Computed values can be pushed on the stack using a "${...}" where the "..." is
evaluated in Python and cannot have any spaces.

For example, a loop that should be run 5 times can be coded as:

  ${5-1} :loop ... .jumpc(loop,1-) drop

which is a clearer indication that the loop is to be run 5 times than is the
instruction sequence

  4 :loop ...
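The reason the initial count is one less than the number of passes can be seen
in a small Python model of the loop.  The model assumes that ".jumpc" branches
when the top of the data stack is non-zero and that the "1-" in its argument
runs whether or not the branch is taken; both are assumptions of this sketch.

```python
def count_passes(initial):
    """Count executions of the body of 'initial :loop ... .jumpc(loop,1-) drop'."""
    t = initial & 0xFF
    passes = 0
    while True:
        passes += 1            # the loop body "..."
        taken = (t != 0)       # jumpc tests the loop counter in T
        t = (t - 1) & 0xFF     # "1-" replaces the default drop
        if not taken:
            break              # fall through to the final "drop"
    return passes

print(count_passes(5 - 1))
```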
 
Constants can be accessed in the computation.  For example, a block of memory
can be allocated as follows:

  .constant C_RESERVE
  .memory RAM myram
  ...
  .variable reserved 0*${C_RESERVE}

and the block of reserved memory can be cleared using the following loop:

  ${C_RESERVE-1} :loop 0 over .storeindexed(reserved) .jumpc(loop,1-) drop

The offsets of variables in their memory can also be accessed through a computed
value.  The reserved block could also be cleared as follows:

  ${reserved-1} ${C_RESERVE-1} :loop >r

  r> .jumpc(loop,-1) drop drop

The body of this version of the loop is the same length as the first version.
In general, it is better to use the memory macros to access variables as they
ensure the correct memory is accessed.

The sizes of memories can also be accessed using computed values.  If "myram" is
a RAM, then "${size['myram']}" will push the size of "myram" on the stack.  As
an example, the following code will clear the entire RAM:

  ${size['myram']-1} :loop 0 swap .jumpc(loop,.store-(myram)) drop

The lengths of I/O signals can also be accessed using computed values.  If
"o_mask" is a mask, then "${size['o_mask']}" will push the size of the mask on
the stack and "${2**size['o_mask']-1}" will push a value that sets all the bits
of the mask.  The I/O signals include I/O signals instantiated by peripherals.
For example, for the configuration command

  PERIPHERAL big_outport outport=O_BIG outsignal=o_big width=47

the width of the output signal is accessible using "${size['o_big']}".  You can
set the wide signal to all zeroes using:

  ${(size['o_big']+7)/8-1} :loop 0 .outport(O_BIG) .jumpc(loop,1-) drop
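The expressions above can be tried outside the assembler with a few lines of
Python.  This is a hedged sketch rather than the assembler's evaluator:  the
function name, the value chosen for C_RESERVE, and the 4-bit width of o_mask
are all hypothetical.  Note that the sketch uses "//" for the byte count; the
"/" in the expression above only yields an integer under Python 2 division
rules.

```python
def evaluate_computed(expr, symbols):
    """Evaluate the text inside "${...}" against a table of known symbols."""
    if " " in expr:
        raise ValueError("computed values cannot contain spaces")
    # No builtins are exposed; only the supplied symbols are visible.
    return eval(expr, {"__builtins__": {}}, dict(symbols))

symbols = {
    "C_RESERVE": 16,                                   # hypothetical constant
    "size": {"myram": 128, "o_mask": 4, "o_big": 47},  # memory and signal sizes
}

print(evaluate_computed("C_RESERVE-1", symbols))
print(evaluate_computed("size['myram']-1", symbols))
print(evaluate_computed("2**size['o_mask']-1", symbols))
print(evaluate_computed("(size['o_big']+7)//8-1", symbols))
```

With a 47-bit signal the last expression evaluates to 5, so the loop writes six
bytes to cover the whole signal.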
 
 
MACROS
================================================================================
There are 3 types of macros used by the assembler.

The first kind of macro is built into the assembler and is required to encode
instructions that have embedded values or have mandatory subsequent
instructions.  These include function calls, jump instructions, function return,
and memory accesses as follows:
  .call(function,[op])
  .callc(function,[op])
  .fetch(ramName)
  .fetch+(ramName)
  .fetch-(ramName)
  .jump(label,[op])
  .jumpc(label,[op])
  .return([op])
  .store(ramName)
  .store+(ramName)
  .store-(ramName)

The second kind of macro is designed to ease input and output operations and
memory accesses and to help ensure these operations are correctly constructed.
These are defined as Python scripts in the core/9x8/macros directory and are
automatically loaded into the assembler.  These macros are:
  .fetchindexed(variable)
  .fetchoffset(variable,ix)
  .fetchvalue(variableName)
  .fetchvector(variableName,N)
  .inport(I_name)
  .outport(O_name[,op])
  .outstrobe(O_name)
  .storeindexed(variableName[,op])
  .storeoffset(variableName,ix[,op])
  .storevalue(variableName[,op])
  .storevector(variableName,N)
 
The third kind of macro is user-defined.  These macros must be registered with
the assembler using the ".macro" directive.

For example, the ".push32" macro is defined by macros/9x8/push32.py and can be
used to push 32-bit (4-byte) values onto the data stack as follows:

  .macro push32
  .constant C_X 0x87654321
  .main
    ...
    .push32(0x12345678)
    .push32(C_X)
    .push32(${0x12345678^C_X})
    ...

The following macros are provided in macros/9x8:
  .push16(v)    push the 16-bit (2-byte) value "v" onto the data stack with the
                MSB at the top of the data stack
  .push32(v)    push the 32-bit (4-byte) value "v" onto the data stack with the
                MSB at the top of the data stack

Directories are searched in the following order for macros:
  .
  ./macros
  include paths specified by the '-M' command line option.
  macros/9x8
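The search order can be pictured as a first-match lookup over a list of
directories.  The Python sketch below is illustrative only:  the function
names are hypothetical, resolving "macros/9x8" against an installation
directory is an assumption, and so is the "<name>.py" file naming, which is
inferred from the push32.py example.

```python
import os

def macro_search_path(include_paths=(), install_dir="."):
    """List the directories in the order they would be searched."""
    return ([".", "./macros"]
            + list(include_paths)                       # '-M' options, in order
            + [os.path.join(install_dir, "macros", "9x8")])

def find_macro(name, include_paths=(), install_dir="."):
    """Return the first "<name>.py" found along the search path, else None."""
    for directory in macro_search_path(include_paths, install_dir):
        candidate = os.path.join(directory, name + ".py")
        if os.path.isfile(candidate):
            return candidate
    return None

print(macro_search_path(["/opt/mymacros"], "/usr/local/ssbcc"))
```

Because the first match wins, a macro file placed in the project directory
would take precedence over one with the same name later in the list.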
 
The Python scripts in core/9x8/macros and macros/9x8 can be used as design
examples for user-defined macros.  The assembler does some type checking based
on the list provided when the macro is registered by the "AddMacro" method, but
additional type checking is often warranted by the macro "emitFunction" which
emits the actual assembly code.  The ".fetchvector" and ".storevector" macros
demonstrate how to design variable-length macros.

It is not an error to repeat the ".macro MACRO_NAME" directive for user-defined
macros.  The assembler will issue a fatal error if a user-defined macro
conflicts with a built-in macro.
 
 
CONDITIONAL COMPILATION
================================================================================
The computer compiler and assembler recognize conditional compilation as
follows:  .IFDEF, .IFNDEF, .ELSE, and .ENDIF can be used in the architecture
file and they can be used to conditionally include functions, files, etc. within
the assembly code; .ifdef, .ifndef, .else, and .endif can be used in function
bodies, variable bodies, etc. to conditionally include assembly code, symbols,
or data.  Conditionals cannot cross file boundaries.

The computer compiler examines the list of defined symbols such as I/O ports,
I/O signals, etc. to evaluate the true/false condition associated with the
.IFDEF and .IFNDEF commands.  The "-D" option to the computer compiler is
provided to define symbols for enabling conditionally compiled configuration
commands.  Similarly, the assembler examines the list of I/O ports, I/O signals,
parameters, constants, etc. to evaluate the .IFDEF, .IFNDEF, .ifdef, and .ifndef
conditionals.

For example, a diagnostic UART can be conditionally included using the
configuration commands:

  .IFDEF ENABLE_UART
  PORTCOMMENT Diagnostic UART
  PERIPHERAL UART_Tx outport=O_UART_TX ...
  .ENDIF

And the assembly code can include conditional code fragments such as the
following, where the existence of the output port is used to determine whether
or not to send a character to that output port:

  .ifdef(O_UART_TX) 'A' .outport(O_UART_TX) .endif

Invoking the computer compiler with "-D ENABLE_UART" will generate a module with
the UART peripheral and will enable the conditional code sending the 'A'
character to the UART port.
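The effect of "-D" on the configuration commands can be modeled as a simple
line filter.  The Python sketch below is not the ssbcc parser; the function
name is hypothetical and the only error it reports is an unterminated
conditional.  The "INSTRUCTION 1024" line is added here just to show an
unconditional command.

```python
def filter_conditionals(lines, defined):
    """Keep the lines whose enclosing .IFDEF/.IFNDEF conditions all hold."""
    kept = []
    active = []                 # one flag per open conditional
    for line in lines:
        words = line.split()
        key = words[0] if words else ""
        if key in (".IFDEF", ".IFNDEF"):
            present = words[1] in defined
            active.append(present if key == ".IFDEF" else not present)
        elif key == ".ELSE":
            active[-1] = not active[-1]
        elif key == ".ENDIF":
            active.pop()
        elif all(active):       # also true when no conditional is open
            kept.append(line)
    if active:
        raise ValueError("unterminated conditional")
    return kept

config = [
    ".IFDEF ENABLE_UART",
    "PORTCOMMENT Diagnostic UART",
    "PERIPHERAL UART_Tx outport=O_UART_TX ...",
    ".ENDIF",
    "INSTRUCTION 1024",
]
with_uart = filter_conditionals(config, {"ENABLE_UART"})
without_uart = filter_conditionals(config, set())
print(len(with_uart), len(without_uart))
```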
 
The following code can be used to preclude multiple attempted inclusions of an
assembly library file.

  ; put these two lines near the top of the file
  .IFNDEF C_FILENAME_INCLUDED
  .constant C_FILENAME_INCLUDED 1
  ; put the library body here
  ...
  ; put this line at the bottom of the file
  .ENDIF ; .IFNDEF C_FILENAME_INCLUDED

The ".INCLUDE" configuration command can be used to read configuration commands
from additional sources.
 
 
SIMULATIONS
================================================================================

Simulations have been performed with Icarus Verilog, Verilator, and Xilinx'
ISIM.  Icarus Verilog is good for short, simple simulations and is used for the
core and peripheral test benches; Verilator for long simulations of large,
complex systems; and ISIM when Xilinx-specific cores are used.  Verilator is
the fastest simulator I've encountered.  Verilator is also used for lint
checking in the core test benches.
 
 
MEM INITIALIZATION FILE
================================================================================

A memory initialization file is produced during compilation.  This file can be
used with tools such as Xilinx' data2mem to modify the SRAM contents without
having to rebuild the entire system.  It is restricted to the opcode memory
initialization.  The file must be processed before it can be used by specific
tools; see doc/MemoryInitialization.html.

WARNING:  The values of parameters used in the assembly code must match the
instantiated design.
 
 
THEORY OF OPERATION
================================================================================

Registers are used for the top of data stack, "T", and the next-to-top of the
data stack, "N".  The data stack is a separate memory.  This means that the
"DATA_STACK N" configuration command actually allows N+2 values in the data
stack since T and N are not stored in the N-element deep data stack.
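The N+2 capacity can be illustrated with a small behavioral model in Python.
This is a sketch of the register arrangement described above, not the
generated HDL; the class and attribute names are hypothetical.

```python
class DataStack:
    """T and N are registers; deeper values live in a `depth`-entry memory."""

    def __init__(self, depth):
        self.depth = depth
        self.count = 0              # valid values, including T and N
        self.T = 0
        self.N = 0
        self.mem = []

    def push(self, value):
        if self.count >= 2:
            if len(self.mem) == self.depth:
                raise OverflowError("data stack overflow")
            self.mem.append(self.N)     # N spills into the stack memory
        self.N = self.T
        self.T = value & 0xFF
        self.count += 1

    def pop(self):
        if self.count == 0:
            raise IndexError("data stack underflow")
        value = self.T
        self.T = self.N
        self.N = self.mem.pop() if self.mem else 0
        self.count -= 1
        return value

stack = DataStack(4)                # models "DATA_STACK 4"
for value in range(6):              # 4 memory entries plus T and N
    stack.push(value)
print(stack.count, len(stack.mem), stack.T, stack.N)
```

In the model a seventh push raises an overflow, since the first six values
already fill T, N, and the four memory entries.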
 
The return stack is similar in that "R" is the top of the return stack and the
"RETURN_STACK N" allocates an additional N words of memory.  The return stack is
the wider of the 8-bit data width and the program counter width.

The program counter is always either incremented by 1 or is set to an address
as controlled by jump, jumpc, call, callc, and return instructions.  The
registered program counter is used to read the next opcode from the instruction
memory and this opcode is also registered in the memory.  This means that there
is a 1 clock cycle delay between the address changing and the associated
instruction being performed.  This is also part of the architecture required to
have the processor operate at one instruction per clock cycle.

Separate ALUs are used for the program counter, adders, logical operations, etc.
and MUXes are used to select the values desired for the destination registers.
The instruction execution consists of translating the upper 6 msb of the opcode
into MUX settings and performing opcode-dependent ALU operations as controlled
by the 3 lsb of the opcode (during the first half of the clock cycle) and then
setting the T, N, R, memories, etc. as controlled by the computed MUX settings.

The "core.v" file is the code for these operations.  Within this file there are
several "@xxx@" strings that specify where the computer compiler is to insert
code such as I/O declarations, memories, inport interpretation, outport
generation, peripherals, etc.

The file structure, i.e., putting the core and the assembler in "core/9x8",
should facilitate application-specific modification of the processor.  For
example, the store+, store-, fetch+, and fetch- instructions could be replaced
with additional stack manipulation operations, arithmetic operations with 2 byte
results, etc.  Simply copy the "9x8" directory to something like "9x8_XXX" and
make your modifications in that directory.  The 8-bit peripherals should still
work, but the 9x8 library functions may need rework to accommodate the
modifications.
 
 
MISCELLANEOUS
================================================================================

The "INVERT_RESET" configuration command is used to indicate an active-low reset
is input to the micro controller rather than an active-high reset.

A VHDL package file is automatically generated by the computer compiler.
