# **Minimal Openrisc System on Chip Implementation**

The Minimal System on Chip is a System-on-Chip (SoC) implementation with standard IP Cores available at OpenCores. This implementation is composed by a standard project, comprehending the standard IP Cores necessary for a SoC embedding the OpenRISC implementation or 1200.

This project idea is to offer a SoC, which can be uploaded to every FPGA and be compatible with every FPGA board, without the requirement of changing its code. In order to deliver such a project, the project has been based on a standard memory implementation and the Advanced Debug System, which allows system debug with the same cables used for FPGA configuration.

The adaptation of the project to a target board is made in 2 steps maximum. First the minsoc\_defines.v file has to be adjusted, generally one has to only uncomment his FPGA manufacturer and FPGA model definitions. After that a constraint file for your specific pinout has to be created. There are constraint files for standard boards also, in the backend directory of the project.

Furthermore the project offers for this same SoC a working testbench and firmwares. The actual testbench can be run out of the box using Icarus Verilog v. 9.1. The firmwares are nearly the same of those of orpsoc\_v2. The differences are for now, that the known uart "hello world" example now runs with interrupts and a new Ethernet example has been added to it.

To complete, the size of the standard memory of the impementation can be adapted to your needs/possibilities by defining its address width inside of the same minsoc defines.v file.

#### How To

- 1. download minsoc
  - a) download further necessary IP cores
    - cd minsoc/rtl/verilog
    - > svn co http://opencores.org/ocsvn/adv debug sys/adv debug sys/trunk adv debug sys
    - > svn co http://opencores.org/ocsvn/ethmac/ethmac/trunk ethmac
    - > svn co http://opencores.org/ocsvn/openrisc/openrisc/trunk/or1200 or1200
    - > svn co http://opencores.org/ocsvn/uart16550/uart16550/trunk uart16550
- 2. install tools (GNU toolchain and adv jtag bridge)
  - a) Follow: <a href="http://www.opencores.org/openrisc.gnu">http://www.opencores.org/openrisc.gnu</a> toolchain (to install binutils, gcc, gdb)
  - b) To debug and load the firmware you have to use the new advanced\_debug\_system. This project is included in the minsoc files inside of minsoc/rtl/verilog/adv\_debug\_sys. There you can find the software in Software and the documentation, which shall help you to go under Doc.
    - > change the Makefile in Software/adv jtag bridge and compile the software using make.
    - > sudo make install

- ➤ Copy the description file of your FPGA to your home directory "cp /opt/Xilinx/10.1/ISE/spartan3e/data/xc3s500e fg320.bsd ~/"
- c) With the adv\_jtag\_bridge you can also debug your simulation. To do so, the simulation has to include a vpi module. This has to be compiled by your system. The sources are found under "minsoc/adv debug sys/Software/adv jtag bridge/sim lib/icarus".
  - > cd minsoc/adv debug sys/Software/adv jtag bridge/sim lib/icarus
  - > make
  - cp minsoc\_adv\_debug\_sys/Software/adv\_jtag\_bridge/sim\_lib/icarus/jp-io-vpi.vpi minsoc/bench/verilog/vpi
- d) The adv\_jtag\_bridge connect the debug system to gdb, the GNU debugger. But the actual version of gdb has some issues, which have to be corrected before use. To do so, the adv\_jtag\_bridge software includes a patch for gdb. Save the patch to the gdb source code directory installed by the toolchain installation script and patch it:
  - cp minsoc/adv\_debug\_sys/Software/adv\_jtag\_bridge/gdb-6.8-bz436037-reg-no-longer-active.patch toolchain build directory/gdb-6.8
  - cd toolchain\_build\_directory/gdb-6.8
  - > patch -p1 < gdb-6.8-bz436037-reg-no-longer-active.patch
  - > make
  - sudo make install

### 3. Compile software

- a) edit sw/support/orp.ld line 14 LENGTH = 0x00006E00 to
  - $\triangleright$  your memory amount in Bytes  $4 \cdot 2^{MEMORYADRWIDTH}$ , where MEMORYADRWIDTH is defined in `define MEMORY\_ADR\_WIDTH in "minsoc\_defines.v"

```
\xi \cdot \nabla^{MEMORYADRWIDTH} minus ORIGIN = 0x00001200
(e.g. 4*2^13 = 32,768 Bytes = 0x8000 \mid LENGTH = 0x8000 - 0x1200 = 0x6E00)
```

- b) select your STACK size on board.h line 16 #define STACK SIZE 0x01000
  - > change your IN CLK if not using 25000000 (25MHz)
- c) inside of sw/support make clean, make all
- d) inside of sw/utils make clean, make all
- e) inside of the target software (e.g. sw/uart) make clean, make all

### 4. Simulation

- a) Install Icarus Verilog
  - You will need at least version 0.9.1 (<a href="ftp://ftp.icarus.com/pub/eda/verilog/v0.9/">ftp://ftp.icarus.com/pub/eda/verilog/v0.9/</a>)
- b) configure your system: minsoc defines.v (not necessary)
  - ➤ Note: (since revision 17) Ethernet functionality of testbench does not base on eth\_phy.v anymore. This means that you can uncomment ETHERNET on minsoc\_defines.v and simulate it without simulation speed decrease and without changing any file.

- ➤ **Previous to revision 17:** if you uncomment "`define ETHERNET" you have to:
  - edit "minsoc/sim/run/generate bench":
    - substitute "../bin/minsoc\_model\_fast.txt" for "../bin/minsoc\_model\_complete.txt"
  - THIS WILL SLOW DOWN YOUR SIMULATION BY FACTOR 300
- c) configure minsoc bench defines.v (not necessary)
  - ➤ Your testbench will use a memory model, not actually the same memory controller the implementation uses. This enables the option "'define INITIALIZE\_MEMORY\_MODEL", where the firmware is loaded to the memory before testbench start.
  - You may use the actual implementation memory:
    - comment "'define INITIALIZE MEMORY MODEL"
    - edit minsoc/sim/run/generate bench
      - substitute "../bin/minsoc model.txt" for "../bin/minsoc memory.txt"
    - You might want to uncomment "'define START\_UP", it loads the firmware to a SPI memory. At start of testbench the system reads this memory and loads the firmware to main memory. Takes +-3 min. This is possible to be used for a real system, all you have to do is uncomment "'define START\_UP" from minsoc/rtl/verilog/minsoc defines.v.
- d) Modify testbench (not necessary)
- e) command to start testbench and select firmware
  - > from minsoc/sim/run/
    - ./generate\_bench
    - ./run bench <your firmware.hex>
      - ./run bench ../../sw/uart/uart-nocache-twobyte-sizefirst.hex
- f) Debuging the testbench (3 terminals)
  - > terminal 1: from minsoc/sim/run/
    - ./generate bench
    - ./run bench <your firmware.hex>
      - ./run\_bench ../../sw/uart/uart-nocache-twobyte-sizefirst.hex
  - terminal 2: from minsoc/sim/run
    - ./start server
  - > terminal 3: at minsoc/sw/uart
    - or32-elf-gdb uart-nocache.or32
    - target remote :9999

- load
  - if you have INITIALIZE\_MEMORY\_MODEL enabled you don't have to do this
  - if you have START\_UP and waited for the message: "Memory start-up completed..." you also don't need this
- set \$pc=0x100
- c

## 5. Implementation

- a) configure minsoc defines.v (recommended values for different devices under 7. Examples)
  - ➤ Select your FPGA device by uncommenting its manufacturer and commenting all other manufacturers. Select then your FPGA model by uncommenting it in case you have a Xilinx FPGA, for Altera comment all out.
  - ➤ 'define MEMORY\_ADR\_WIDTH 13 defines the amount of memory you get. The depth is defined by 2<sup>MEMORYADRWIDTH</sup>, since its data width is 32 bits, the amount in Bytes is 4 times its depth. (this is not allowed to be less than 12, 11 is the memory block address width)
  - ➤ choose a clock division for your global clock related to your design max speed by changing the definition: "`define CLOCK\_DIVISOR 5". If you have an Altera device please use only even numbers for the division, odd numbers are going to be rounded down.
  - ➤ Define your RESET polarity: uncomment "'define POSITIVE\_RESET" for an active high reset or "'define NEGATIVE\_RESET" for an active low reset and comment the other.
- b) configure or 1200\_defines.v (optional -> reduce logic usage) (recommended values for different devices under 7. Examples)
  - ➤ Target FPGA memories (OR1200\_XILINX\_RAMB16 for Xilinx, Spartan 3 and above, OR1200 ALTERA LPM for all Altera)
  - > Type of register file RAM (generic, twoport or dual port) (dual port is supported by Xilinx BRAM and Altera)(select only one of the three)
  - ➤ If not using Linux you can comment out OR1200\_QMEM\_IMPLEMENTED.
- c) define user constrains for system pinout (edit backend/yourboard.ucf file)
- d) create project in project manager (ISE, Quartus), include files
- e) synthesize, P&R and upload bitfile
- f) connect the cable to the selected JTAG TAP
- 6. use GDB to upload software and debug for simulation and implementation
  - a) start adv jtag bridge
    - ➤ cd ~/
    - > sudo adv jtag bridge xpc3 (xess, usbblaster, xpc usb, ft2232)

- Let the program running and open another terminal
- b) Open a terminal program (e.g. gtkterm)
  - > configure port to a serial port connected to your board
  - > configure bitrate to 115200
- c) start gdb, load firmware (example)
  - > cd minsoc/sw/uart
  - > or32-elf-gdb uart-nocache.or32
  - > target remote :9999
  - > load
  - > set \$pc=0x100
  - > c
- d) Inside of gtkterm "Hello World." should have appeared, if you press any key inside of gtkterm the processor will return the next alphabetical letter (press a, it returns b)
- 7. Examples: different constraint files for different boards  $\rightarrow$  inside of backend directory
  - a) Spartan 3A DSP 1800 (100%)
    - minsoc\_defines.v
      - no definitions change, ready to go
    - > or1200 defines.v (optional, reduce logic use)
      - uncomment `define OR1200\_XILINX\_RAMB16
      - uncomment `define OR1200 RFRAM DUALPORT
      - comment `define OR1200 RFRAM GENERIC
  - b) Spartan 3E Starter Kit **no Ethernet** (100%)
    - > minsoc defines.v
      - comment `define SPARTAN3A
      - uncomment `define SPARTAN3E
      - change CLOCK DIVISOR from 5 to 2
      - comment `define ETHERNET
    - > or1200\_defines.v
      - uncomment `define OR1200\_XILINX\_RAMB16
  - c) Spartan 3E Starter Kit with Ethernet (not tested)
    - > Synthesis properties:
      - Optimization Goal: Area
      - Optimization Effort: High

- > minsoc defines.v
  - comment `define SPARTAN3A
  - uncomment `define SPARTAN3E
  - let CLOCK DIVISOR at 5
  - change MEMORY\_ADR\_WIDTH from 13 to 12
  - uncomment `define ETHERNET
  - comment `define UART
    - this is not necessary, though you will get 99% device usage if not commenting, 89% otherwise.
- > or1200\_defines.v
  - uncomment 'define OR1200 XILINX RAMB16
  - comment 'define OR1200 MULT IMPLEMENTED
  - comment 'define OR1200 MAC IMPLEMENTED
  - uncomment `define OR1200 RFRAM DUALPORT
  - comment `define OR1200 RFRAM GENERIC
  - comment `define OR1200 PM IMPLEMENTED
  - comment `define OR1200 QMEM IMPLEMENTED
  - comment `define OR1200 CFGR IMPLEMENTED
- > eth defines.v
  - uncomment `define ETH\_FIFO\_XILINX
  - uncomment `define ETH XILINX RAMB4
- > Collateral effects:
  - from sw/support/Makefile.inc line 7:
    - GCC\_OPT=-mhard-mul -g to GCC\_OPT=-msoft-mul -g
  - change sw/support/orp.ld:
    - ram: LENGTH = from 0x00006E00 to 0x00002E00
    - this is not much memory, I recommend the inclusion of the wb\_ddr project to minsoc to use your DDR SRAM memory
  - Further area optimization possibilities: (not necessary, DON'T DO)
    - Turn off: pic, tick timer or debug unit
- d) Altera Devices
  - > minsoc defines.v
    - uncoment `define ALTERA FPGA

- comment `define XILINX\_FPGA
- comment `define SPARTAN3A
- select your memory amount "'define MEMORY ADR WIDTH 13"
- choose a clock division for your global clock related to your design max speed by changing the definition: "'define CLOCK\_DIVISOR 5". Since you have an Altera device please use only even numbers for the division, odd numbers are going to be rounded down
- Define your RESET polarity uncommenting "'define POSITIVE\_RESET" or "'define NEGATIVE RESET" and commenting the other.
- > or1200\_defines.v
  - uncomment `define OR1200\_ALTERA\_LPM
  - uncomment `define OR1200\_RFRAM\_DUALPORT
  - comment `define OR1200 RFRAM GENERIC
  - comment 'define OR1200 QMEM IMPLEMENTED

#### To Do:

- 1. Implement a instantiable standard memory verilog file (90%)
  - a) compatible for simulation, xilinx, altera, asics (100%)
  - b) resizable (100%)
  - c) Able to read in 1 clock cycle (30%)
    - > minsoc\_onchip\_ram\_top needs 2 cycles to complete a read operation, because the read acknowledge is triggered on the rising edge of wb\_clk after wb\_cyc has been set
      - Attempt to change it to negedge led to non running system for XILINX\_RAMB16: neither CPU self test of adv\_jtag\_bridge nor firmware running did work
    - ➤ Change back onchip\_ram.v to generic implementation to posedge because it correspond to the reality of internal FPGA memories. (100%)
    - phasing the clock connection from onchip\_ram\_top to the onchip\_rams by 180° it works both in bench and on board. onchip\_ram(.clk(~wb\_clk\_i));
      - Though, self test of or32 by adv jtag bridge fails, has to be debugged
        - maybe the adv\_jtag\_bridge or the debug unit is stalling the cpu before it can complete something. It stores a program in the memory from 0x0 to 0x28.
          - memory 2 clks: runs till 0x20, stalls for 1 instruction, runs (0x28,0x10,0x14)
          - memory 1 clk: runs till 0x24, stalls for 1 instruction, runs (0x10)
        - difference: signal aborted r from or1200 iwb biu is supressed for 1 clk

memory while not for 2 clk at instruction 0x20 (reading instruction from addr 0x20)

- It is not standard to use the negative clock edge to solve this speed issue (check wishbone specification). Generally this speed decrease would get better with burst accesses, if the acknowledges after the first come instantaneously.
  - But for that, OR1200\_WB\_B3 has to be uncommented in or1200\_defines.v and minsoc\_onchip\_ram\_top.v has to be changed accordingly to use the signals: wb bte i and wb cti i
  - This is also only relevant if cache is implemented, otherwise all accesses from instruction wishbone interface are single accesses. (I guess)
- 2. Use a tc\_top.v from older orpsoc, which manages the system memory and enables connection of peripherals (100%)
- 3. Implement a standard clock divider, which is automatically configured by the system definition file (75%)
  - a) Standard (100%)
  - b) Xilinx (100%)
  - c) Altera (0%)
    - > For now Altera clock divider is implemented as the Standard
  - d) implement in a separated file (100%)
- 4. Implement a standard an unique system definition file, where one can select: (100%)
  - a) how much memory to instantiate (100%)
  - b) FPGA manufacturer and type (100%)
  - c) which JTAG Tap to use (Generic of FPGA) (100%)
    - ➤ which fpga, overwrite internal jtag options.v(100%)
  - d) system clock, clock divider (100%)
  - e) Which interfaces to be connected to the system (UART and ETH) (100%)
- 5. Have standard software which can be directly compiled with make to be uploaded to the system (40%)
  - a) Have it (100%)
  - b) direct set amount of memory and stack size for software based on system definition file (0%)
  - c) direct adopt system address space as configured in definition file to the device drivers written in sw/support directory (0%)
- 6. Have a standard testbench, with which one can simulate through iverilog easily and which can be easily redefined to simulate the environment (interfaces read and write functions) (90%)
  - a) read and write for most interfaces (ETH, UART, CAN, I2C, SPI) (70%)
  - b) regular testbench for the SoC (100%)

- runnable testbench (100%)
- ➤ debug interface (100%)
- > uart output (100%)
- > spi start-up (100%)
  - start-up rom memory (or1k-startup project) (100%)
  - create a SPI model to load the firmare to it at begin (100%)
    - create a memory, which can be written by \$readmemh and read from through spi interface to the or1k-startup project (100%)
    - Add some glue logic to the SoC switch to assign the first commands to the openrisc to jump to the address of the or1k\_startup 0x40000000. (100%)
      - instead of setting the address as in orpsoc 1, assert the instruction through wb\_rim\_dat\_i . (100%)
- Fill memory with a \$readmem code for fast firmware upload on simulation. (100%)
  - It is not allowed to access instances created by generate through a variable, so it is not possible to access each memory block from the memory to load the firmware before testbench execution (now working 100%)
    - Create a memory model, which changes the address width of the memory blocks allowing any memory depth with only 4 instances. Substitute the memory controller used by the implementation with it for the testbench. This way both testbench and implementation work. (100%)
      - This is only necessary for the `define INITIALIZE\_MEMORY\_MODEL, where the firmware is uploaded to the memory even before testbench execution.
      - To use the real memory from the implementation instead, comment this and change minsoc/sim/run/generate\_bench script to use "minsoc/sim/bin/minsoc\_memory\_fast.txt" instead of "minsoc/sim/bin/minsoc\_model\_fast.txt".
  - Filled memory does not run the complete program: ("o World.") ( now working 100%)
    - After reset by gdb (set \$pc=0x100), continue leads to SIGBUS error. ( now working 100%)
      - this seems to happen from the debug side, debug has asserted du\_stall, why?
      - Reloading the program by gdb works, if we split the memory into 4 blocks (100%)
    - SIGBUS error and ("o World.") output happened, because after the program size bytes the following program goes into the memory starting from address 0x4 not 0x0. (now working 100%)
- > select firmware on command line (100%)

- ➤ Including tb\_eth\_defines.v , eth\_phy\_defines.v , eth\_phy.v reduces simulation speed by factor 300 more or less, solve this (100%)
  - minsoc/sim/bin/minsoc\_model\_fast.txt removes them from bench initialization
    - This requires you to comment `define ETHERNET from minsoc/rtl/verilog/minsoc defines.v
    - Then edit generate\_bench to use minsoc\_model\_fast.txt
  - · minsoc model fast runs hello world in 13 seconds
    - minsoc model complete would run it in 65 minutes
  - eth\_phy.v removed from project, instead eth\_rx\_send task has been ported to the main testbench and is working. It does not handle collisions, but collisions only happen with more communicating nodes, which are not standard here. Speed is back to normal and task is functional.
- 7. Have different constraint files for different boards  $\rightarrow$  inside of backend directory
  - a) Spartan 3A DSP 1800 (100%)
  - b) Spartan 3E Starter Kit (100%)
    - minsoc\_defines.v
      - comment `define SPARTAN3A
      - uncomment `define SPARTAN3E
      - change CLOCK DIVISOR from 5 to 2
      - comment `define ETHERNET
    - > or1200 defines.v
      - uncomment `define OR1200\_XILINX\_RAMB16

#### To Do v. 2:

- 1. Add memory interfaces for external memory
  - a) SDRAM, DDR, DDR2
- 2. Look for a way to allow automatically insertion of new modules to minsoc top:
  - a) memory address input
  - b) automatic wishbone connection for minsoc top
  - c) automatic connection to minsoc tc top
  - d) Full switch function, according to amount of masters, i.e.:
    - ➤ 2 masters, 2 buses, 2 router: arbiter has to assign correct bus to the calling master)
  - e) Switch issues, "minsoc tc top.v":
    - > modules instantiated by generate cannot be accessed through variable later on

- > maybe it can be done with parameters, macro and loop only
- > possibility of perlilog use for that
  - hardens testbench creation and raises compatibility issues (I suppose)