Line 1... |
Line 1... |
Let's Get It Started!
|
Let's Get It Started!
|
|
|
To make your NEORV32 project run, follow the guides from the upcoming sections. Follow these guides
|
This user guide uses the NEORV32 project _as is_ from the official `neorv32` repository.
|
step by step and in the presented order.
|
To make your first NEORV32 project run, follow the guides from the upcoming sections. It is recommended to
|
|
follow these guides step by step and in the presented order.
|
|
|
|
[TIP]
|
|
This guide uses the minimalistic and platform/toolchain agnostic SoC test setups from
|
|
`rtl/test_setups` for illustration. You can use one of the provided test setups for
|
|
your first FPGA tests. Alternatively, have a look at the `setups` folder,
|
|
which provides more sophisticated example setups for various FPGAs/FPGA boards and toolchains.
|
|
|
|
|
:sectnums:
|
:sectnums:
|
== Software Toolchain Setup
|
== Software Toolchain Setup
|
|
|
To compile (and debug) executables for the NEORV32 a RISC-V toolchain is required.
|
To compile (and debug) executables for the NEORV32 a RISC-V toolchain is required.
|
There are two possibilities to get this:
|
There are two possibilities to get this:
|
|
|
1. Download and _build_ the official RISC-V GNU toolchain yourself
|
1. Download and _build_ the official RISC-V GNU toolchain yourself.
|
2. Download and install a prebuilt version of the toolchain; this might also be done via the package manager / app store of your OS
|
2. Download and install a prebuilt version of the toolchain; this might also be done via the package manager / app store of your OS
|
|
|
[TIP]
|
[NOTE]
|
The default toolchain prefix for this project is **`riscv32-unknown-elf-`**. Of course you can use any other RISC-V
|
The default toolchain prefix (`RISCV_PREFIX` variable) for this project is **`riscv32-unknown-elf-`**. Of course you can use any other RISC-V
|
toolchain (like `riscv64-unknown-elf-`) that is capable of emitting code for an `rv32` architecture. Just change the _RISCV_PREFIX_ variable in the application
|
toolchain (like `riscv64-unknown-elf-`) that is capable of emitting code for an `rv32` architecture. Just change `RISCV_PREFIX`
|
makefile(s) according to your needs or define this variable when invoking the makefile.
|
according to your needs.
|
|
|
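As a minimal sketch of selecting the prefix at invocation time, the helper below prints the first toolchain prefix whose `gcc` is found on `PATH`. The function name `pick_prefix` and the list of candidate prefixes are illustrative assumptions and not part of the NEORV32 makefiles.

```bash
# pick_prefix: print the first toolchain prefix whose gcc is on PATH.
# Hypothetical helper; the candidate list is an assumption for illustration.
pick_prefix() {
  for p in riscv32-unknown-elf- riscv64-unknown-elf-; do
    if command -v "${p}gcc" >/dev/null 2>&1; then
      printf '%s\n' "$p"
      return 0
    fi
  done
  # No matching toolchain found
  return 1
}
```

The result could then be handed to the makefile on the command line, for example `make RISCV_PREFIX="$(pick_prefix)" clean_all all` from one of the `sw/example` folders.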
[IMPORTANT]
|
|
Keep in mind that – for instance – a rv32imc toolchain only provides library code compiled with
|
|
compressed (_C_) and `mul`/`div` instructions (_M_)! Hence, this code cannot be executed (without
|
|
emulation) on an architecture without these extensions!
|
|
|
|
|
|
:sectnums:
|
:sectnums:
|
=== Building the Toolchain from Scratch
|
=== Building the Toolchain from Scratch
|
|
|
Line 37... |
Line 40... |
----
|
----
|
riscv-gnu-toolchain$ ./configure --prefix=/opt/riscv --with-arch=rv32i --with-abi=ilp32
|
riscv-gnu-toolchain$ ./configure --prefix=/opt/riscv --with-arch=rv32i --with-abi=ilp32
|
riscv-gnu-toolchain$ make
|
riscv-gnu-toolchain$ make
|
----
|
----
|
|
|
|
[IMPORTANT]
|
|
Keep in mind that – for instance – a toolchain built with `--with-arch=rv32imc` only provides library code compiled with
|
|
compressed (`C`) and `mul`/`div` instructions (`M`)! Hence, this code cannot be executed (without
|
|
emulation) on an architecture without these extensions!
|
|
|
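The rule from the note above (library code only runs on a CPU that implements every extension the library was compiled for) can be sketched as a small check. This is a simplified illustration: the function name `lib_runs_on` is hypothetical, and it only understands `rv32` followed by single-letter extensions (no `Z*` names).

```bash
# lib_runs_on LIB_ARCH CPU_ARCH
# Succeeds if every single-letter extension of LIB_ARCH is also in CPU_ARCH.
# Simplification: only "rv32" plus single-letter extensions are handled.
lib_runs_on() {
  lib=${1#rv32}
  cpu=${2#rv32}
  while [ -n "$lib" ]; do
    ext=${lib%"${lib#?}"}   # first remaining extension letter
    lib=${lib#?}            # drop it from the list
    case "$cpu" in
      *"$ext"*) ;;          # extension is implemented by the CPU
      *) return 1 ;;        # extension missing: library code cannot run
    esac
  done
  return 0
}
```

For example, `lib_runs_on rv32imc rv32i` fails because the `M` and `C` extensions are missing on the CPU side, whereas `lib_runs_on rv32i rv32imc` succeeds.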
|
|
:sectnums:
|
:sectnums:
|
=== Downloading and Installing a Prebuilt Toolchain
|
=== Downloading and Installing a Prebuilt Toolchain
|
|
|
Alternatively, you can download a prebuilt toolchain.
|
Alternatively, you can download a prebuilt toolchain.
|
Line 101... |
Line 109... |
|
|
// ####################################################################################################################
|
// ####################################################################################################################
|
:sectnums:
|
:sectnums:
|
== General Hardware Setup
|
== General Hardware Setup
|
|
|
This guide will set up a NEORV32 project for FPGA implementation (or simulation only) _from scratch_
|
This guide shows the basics of setting up a NEORV32 project for FPGA implementation (or simulation only)
|
|
_from scratch_. It uses a _simplified_ test "SoC" setup of the processor to keep things simple at the beginning.
|
|
This simple setup is intended for evaluation or as "hello world" project to check out the NEORV32
|
|
on _your_ FPGA board.
|
|
|
[TIP]
|
[TIP]
|
If you want to use a complete pre-defined setup to start with, check out the
|
If you want to use a more sophisticated pre-defined setup to start with, check out the
|
project's `setups` folder (https://github.com/stnolting/neorv32/tree/master/setups),
|
`setups` folder, which provides example setups for various FPGAs, boards and toolchains.
|
which provides (script-based) demo setups for various FPGA boards and toolchains.
|
|
|
The NEORV32 project features two minimalistic pre-configured test setups in
|
|
https://github.com/stnolting/neorv32/blob/master/rtl/test_setups[`rtl/test_setups`].
|
|
Both test setups only implement very basic processor and CPU features.
|
|
The main difference between the two setups is the processor boot concept, i.e. how to get a software executable
|
|
_into_ the processor:
|
|
|
|
* **`rtl/test_setups/neorv32_testsetup_approm.vhd`**: this setup does not require a connection via UART. The
|
|
software executable is "installed" into the bitstream to initialize a read-only memory. Use this setup
|
|
if your FPGA board does _not_ provide a UART interface.
|
|
* **`rtl/test_setups/neorv32_testsetup_bootloader.vhd`**: this setup uses the UART and the default NEORV32
|
|
bootloader to upload new software executables. Use this setup if your board _does_ provide a UART interface.
|
|
|
|
.NEORV32 "hello world" test setup (`rtl/test_setups/neorv32_testsetup_bootloader.vhd`)
|
|
image::neorv32_test_setup.png[align=center]
|
|
|
This tutorial uses a _simplified_ test setup of the processor
|
.External Clock Source
|
to keep things simple at the beginning as this setup is intended as
|
[NOTE]
|
evaluation or "hello world" project to check out the NEORV32.
|
These test setups are intended to be directly used as **design top entity**. Of course you can also instantiate them
|
|
into another design unit. If your FPGA board only provides _very fast_ external clock sources (like on the FOMU board)
|
|
you might need to add clock management components (PLLs, DCMs, MMCMs, ...) to the test setup or to the according top entity
|
|
if you instantiate one of the test setups.
|
|
|
[start=1]
|
[start=1]
|
. Create a new project with your FPGA EDA tool of choice.
|
. Create a new project with your FPGA EDA tool of choice.
|
. Add all VHDL files from the project's `rtl/core` folder to your project. Make sure to _reference_ the
|
. Add all VHDL files from the project's `rtl/core` folder to your project.
|
files only – do not copy them.
|
|
. Make sure to add all the rtl files to a new library called `neorv32`. If your FPGA tool does not
|
. Make sure to add all the rtl files to a new library called `neorv32`. If your FPGA tool does not
|
provide a field to enter the library name, check out the "properties" menu of the added rtl files.
|
provide a field to enter the library name, check out the "properties" menu of the added rtl files.
|
. The `rtl/core/neorv32_top.vhd` VHDL file is the top entity of the NEORV32 processor. If you
|
. The `rtl/core/neorv32_top.vhd` VHDL file is the top entity of the NEORV32 processor, which can be
|
already have a design, instantiate this unit into your design and proceed.
|
instantiated into the "real" project. However, in this tutorial we will use one of the pre-defined
|
|
test setups from `rtl/test_setups` (see above).
|
|
|
[IMPORTANT]
|
[IMPORTANT]
|
Make sure to include the `neorv32` package into your design when instantiating the processor: add
|
Make sure to include the `neorv32` package into your design when instantiating the processor: add
|
`library neorv32;` and `use neorv32.neorv32_package.all;` to your design unit.
|
`library neorv32;` and `use neorv32.neorv32_package.all;` to your design unit.
|
|
|
[start=5]
|
[start=5]
|
. If you do not have a design yet and just want to check out the NEORV32 – no problem! This guide
|
. Add the pre-defined test setup of choice to the project, too, and select it as _top entity_.
|
uses a simplified top entity, that encapsulates the actual processor top entity: add the
|
. The entity of both test setups
|
`rtl/templates/processor/neorv32_ProcessorTop_Test.vhd` VHDL file to your project, too, and
|
provide a minimal set of configuration generics, that might have to be adapted to match your FPGA and board:
|
select it as _top entity_.
|
|
. This test setup provides a minimal test hardware setup:
|
|
|
|
.NEORV32 "hello world" test setup
|
|
image::neorv32_test_setup.png[align=center]
|
|
|
|
[start=7]
|
.Test setup entity - configuration generics
|
. It only implements some very basic processor and CPU features. Also, only the
|
|
minimum number of signals is propagated to the outer world.
|
|
. However, a minimal setup-specific configuration of the NEORV32 processor is required to make it run
|
|
on your FPGA board of choice. Only the absolutely required modifications will be made while
|
|
keeping the default configuration for the remaining configuration options:
|
|
|
|
.Cut-out of `neorv32_ProcessorTop_Test.vhd` showing the processor instance and its configuration
|
|
[source,vhdl]
|
[source,vhdl]
|
----
|
----
|
neorv32_top_inst: neorv32_top
|
generic (
|
generic map (
|
-- adapt these for your setup --
|
-- General --
|
CLOCK_FREQUENCY : natural := 100000000; <1>
|
CLOCK_FREQUENCY => 100000000, -- in Hz # <1>
|
MEM_INT_IMEM_SIZE : natural := 16*1024; <2>
|
INT_BOOTLOADER_EN => true,
|
MEM_INT_DMEM_SIZE : natural := 8*1024 <3>
|
...
|
);
|
-- Internal instruction memory --
|
|
MEM_INT_IMEM_EN => true,
|
|
MEM_INT_IMEM_SIZE => 16*1024, # <2>
|
|
-- Internal data memory --
|
|
MEM_INT_DMEM_EN => true,
|
|
MEM_INT_DMEM_SIZE => 8*1024, # <3>
|
|
...
|
|
----
|
----
|
<1> Clock frequency of `clk_i` signal in Hertz
|
<1> Clock frequency of `clk_i` signal in Hertz
|
<2> Default size of internal instruction memory: 16kB
|
<2> Default size of internal instruction memory: 16kB
|
<3> Default size of internal data memory: 8kB
|
<3> Default size of internal data memory: 8kB
|
|
|
[start=9]
|
[start=7]
|
. There is one generic that has to be set according to your FPGA board setup: the actual clock frequency
|
. If you feel like it – or if your FPGA does not provide sufficient resources – you can modify the
|
of the top's clock input signal (`clk_i`). Use the _CLOCK_FREQUENCY_ generic to specify your clock source's
|
_memory sizes_ (`MEM_INT_IMEM_SIZE` and `MEM_INT_DMEM_SIZE` – marked with notes "2" and "3"). But as mentioned
|
frequency in Hertz (Hz) (note "1").
|
above, let's keep things simple at first and use the standard configuration for now.
|
. If you feel like it – or if your FPGA does not provide many resources – you can modify the
|
. There is one generic that _has to be set according to your FPGA board_ setup: the actual clock frequency
|
**memory sizes** (_MEM_INT_IMEM_SIZE_ and _MEM_INT_DMEM_SIZE_ – marked with notes "2" and "3") or even
|
of the top's clock input signal (`clk_i`). Use the `CLOCK_FREQUENCY` generic to specify your clock source's
|
exclude certain ISA extensions and peripheral modules from implementation - but as mentioned above, let's keep things
|
frequency in Hertz (Hz).
|
simple at first and use the standard configuration for now.
|
|
|
|
[NOTE]
|
[NOTE]
|
If you have changed the default memory configuration (_MEM_INT_IMEM_SIZE_ and _MEM_INT_DMEM_SIZE_ generics)
|
If you have changed the default memory configuration (`MEM_INT_IMEM_SIZE` and `MEM_INT_DMEM_SIZE` generics)
|
keep those new sizes in mind – these values are required for setting
|
keep those new sizes in mind – these values are required for setting
|
up the software framework in the next section <<_general_software_framework_setup>>.
|
up the software framework in the next section <<_general_software_framework_setup>>.
|
|
|
[start=11]
|
[start=9]
|
. Depending on your FPGA tool of choice, it is time to assign the signals of the test setup top entity to
|
. Depending on your FPGA tool of choice, it is time to assign the signals of the test setup top entity to
|
the according pins of your FPGA board. All the signals can be found in the entity declaration:
|
the according pins of your FPGA board. All the signals can be found in the entity declaration of the
|
|
corresponding test setup:
|
|
|
.Entity signals of `neorv32_test_setup.vhd`
|
.Entity signals of `neorv32_testsetup_approm.vhd`
|
[source,vhdl]
|
[source,vhdl]
|
----
|
----
|
entity neorv32_test_setup is
|
|
port (
|
port (
|
-- Global control --
|
-- Global control --
|
clk_i : in std_ulogic := '0'; -- global clock, rising edge
|
clk_i : in std_ulogic; -- global clock, rising edge
|
rstn_i : in std_ulogic := '0'; -- global reset, low-active, async
|
rstn_i : in std_ulogic; -- global reset, low-active, async
|
|
-- GPIO --
|
|
gpio_o : out std_ulogic_vector(7 downto 0) -- parallel output
|
|
);
|
|
----
|
|
|
|
.Entity signals of `neorv32_testsetup_bootloader.vhd`
|
|
[source,vhdl]
|
|
----
|
|
port (
|
|
-- Global control --
|
|
clk_i : in std_ulogic; -- global clock, rising edge
|
|
rstn_i : in std_ulogic; -- global reset, low-active, async
|
-- GPIO --
|
-- GPIO --
|
gpio_o : out std_ulogic_vector(7 downto 0); -- parallel output
|
gpio_o : out std_ulogic_vector(7 downto 0); -- parallel output
|
-- UART0 --
|
-- UART0 --
|
uart0_txd_o : out std_ulogic; -- UART0 send data
|
uart0_txd_o : out std_ulogic; -- UART0 send data
|
uart0_rxd_i : in std_ulogic := '0' -- UART0 receive data
|
uart0_rxd_i : in std_ulogic -- UART0 receive data
|
);
|
);
|
end neorv32_test_setup;
|
|
----
|
----
|
|
|
[start=12]
|
.Signal Polarity
|
|
[NOTE]
|
|
If your FPGA board has inverse polarity for certain input/output you can add `not` gates. Example: The reset signal
|
|
`rstn_i` is low-active by default; the LEDs connected to `gpio_o` high-active by default.
|
|
You can do this in your board top if you instantiate the test setup,
|
|
or _inside_ the test setup if this is your top entity (low-active LEDs example: `gpio_o <= NOT con_gpio_o(7 downto 0);`).
|
|
|
|
[start=10]
|
. Attach the clock input `clk_i` to your clock source and connect the reset line `rstn_i` to a button of
|
. Attach the clock input `clk_i` to your clock source and connect the reset line `rstn_i` to a button of
|
your FPGA board. Check whether it is low-active or high-active – the reset signal of the processor is
|
your FPGA board. Check whether it is low-active or high-active – the reset signal of the processor is
|
**low-active**, so maybe you need to invert the input signal.
|
**low-active**, so maybe you need to invert the input signal.
|
. If possible, connect at least bit `0` of the GPIO output port `gpio_o` to a high-active LED (invert
|
. If possible, connect _at least_ bit `0` of the GPIO output port `gpio_o` to an LED (see "Signal Polarity" note above).
|
the signal when your LEDs are low-active). This LED will be used as status LED for the setup.
|
. Finally, if you are using the UART-based test setup (`neorv32_testsetup_bootloader.vhd`)
|
. Finally, if your FPGA board provides a serial host interface (USB-to-serial converter),
|
connect the UART communication signals `uart0_txd_o` and `uart0_rxd_i` to the host interface (e.g. USB-UART converter).
|
connect the UART communication signals `uart0_txd_o` and `uart0_rxd_i`.
|
|
. Perform the project HDL compilation (synthesis, mapping, bitstream generation).
|
. Perform the project HDL compilation (synthesis, mapping, bitstream generation).
|
. Program the generated bitstream into your FPGA and press the button connected to the reset signal.
|
. Program the generated bitstream into your FPGA and press the button connected to the reset signal.
|
. Done! The assigned status LED should be flashing now for some seconds before permanently lighting up.
|
. Done! The LED at `gpio_o(0)` should be flashing now.
|
|
|
|
[TIP]
|
|
After the GCC toolchain for compiling RISC-V source code is ready (chapter <<_general_software_framework_setup>>),
|
|
you can advance to one of these chapters to learn how to get a software executable into your processor setup:
|
|
* If you are using the `neorv32_testsetup_approm.vhd` setup: See section <<_installing_an_executable_directly_into_memory>>.
|
|
* If you are using the `neorv32_testsetup_bootloader.vhd` setup: See section <<_uploading_and_starting_of_a_binary_executable_image_via_uart>>.
|
|
|
|
|
|
|
|
|
// ####################################################################################################################
|
// ####################################################################################################################
|
Line 600... |
Line 631... |
|
|
|
|
|
|
// ####################################################################################################################
|
// ####################################################################################################################
|
:sectnums:
|
:sectnums:
|
|
== Application-Specific Processor Configuration
|
|
|
|
Due to the processor's configuration options, which are mainly defined via the top entity VHDL generics, the SoC
|
|
can be tailored to the application-specific requirements. Note that this chapter does not focus on optional
|
|
_SoC features_ like IO/peripheral modules. It rather gives ideas on how to optimize for _overall goals_
|
|
like performance and area.
|
|
|
|
[NOTE]
|
|
Please keep in mind that optimizing the design in one direction (like performance) will also affect other potential
|
|
optimization goals (like area and energy).
|
|
|
|
=== Optimize for Performance
|
|
|
|
The following points show some concepts to optimize the processor for performance regardless of the costs
|
|
(i.e. increasing area and energy requirements):
|
|
|
|
* Enable all performance-related RISC-V CPU extensions that implement dedicated hardware accelerators instead
|
|
of emulating operations entirely in software: `M`, `C`, `Zfinx`
|
|
* Enable mapping of complex CPU operations to dedicated hardware: `FAST_MUL_EN => true` to use DSP slices for
|
|
multiplications, `FAST_SHIFT_EN => true` to use a fast barrel shifter for shift operations.
|
|
* Implement the instruction cache: `ICACHE_EN => true`
|
|
* Use as much _internal_ memory as possible to reduce memory access latency: `MEM_INT_IMEM_EN => true` and
|
|
`MEM_INT_DMEM_EN => true`, maximize `MEM_INT_IMEM_SIZE` and `MEM_INT_DMEM_SIZE`
|
|
* Increase the CPU's instruction prefetch buffer size: `CPU_IPB_ENTRIES`
|
|
* _To be continued..._
|
|
|
|
|
|
=== Optimize for Size
|
|
|
|
The NEORV32 is a size-optimized processor system that is intended to fit into tiny niches within large SoC
|
|
designs or to be used as a customized microcontroller in really tiny / low-power FPGAs (like Lattice iCE40).
|
|
Here are some ideas on how to make the processor even smaller while maintaining its _general purpose system_
|
|
concept and maximum RISC-V compatibility.
|
|
|
|
**SoC**
|
|
|
|
* This is obvious, but exclude all unused optional IO/peripheral modules from synthesis via the processor
|
|
configuration generics.
|
|
* If an IO module provides an option to configure the number of "channels", constrain this number to the
|
|
actually required value (e.g. the PWM module `IO_PWM_NUM_CH` or the external interrupt controller `XIRQ_NUM_CH`).
|
|
* Reduce the FIFO sizes of implemented modules (e.g. `SLINK_TX_FIFO`).
|
|
* Disable the instruction cache (`ICACHE_EN => false`) if the design only uses processor-internal IMEM
|
|
and DMEM memories.
|
|
* _To be continued..._
|
|
|
|
**CPU**
|
|
|
|
* Use the _embedded_ RISC-V CPU architecture extension (`CPU_EXTENSION_RISCV_E`) to reduce block RAM utilization.
|
|
* The compressed instructions extension (`CPU_EXTENSION_RISCV_C`) requires additional logic for the decoder but
|
|
also reduces program code size by approximately 30%.
|
|
* If not explicitly used/required, constrain the CPU's counter sizes: `CPU_CNT_WIDTH` for `[m]instret[h]`
|
|
(number of instructions) and `[m]cycle[h]` (number of cycles) counters. You can even remove these counters
|
|
by setting `CPU_CNT_WIDTH => 0` if they are not used at all (note, this is not RISC-V compliant).
|
|
* Reduce the CPU's prefetch buffer size (`CPU_IPB_ENTRIES`).
|
|
* Map CPU shift operations to a small and iterative shifter unit (`FAST_SHIFT_EN => false`).
|
|
* If you have unused DSP blocks available, you can map multiplication operations to those slices instead of
|
|
using LUTs to implement the multiplier (`FAST_MUL_EN => true`).
|
|
* If there is no need to execute division in hardware, use the `Zmmul` extension instead of the full-scale
|
|
`M` extension.
|
|
* Disable CPU extensions that are not explicitly used (`A`, `U`, `Zfinx`).
|
|
* _To be continued..._
|
|
|
|
=== Optimize for Clock Speed
|
|
|
|
The NEORV32 Processor and CPU are designed to provide minimal logic between register stages to keep the
|
|
critical path as short as possible. When enabling additional extensions or modules, the impact on the existing
|
|
logic is also kept at a minimum to prevent timing degradation. If there is a major impact on existing
|
|
logic (example: many physical memory protection address configuration registers) the VHDL code automatically
|
|
adds additional register stages to maintain critical path length. Obviously, this increases operation latency.
|
|
|
|
In order to optimize for a minimal critical path (= maximum clock speed) the following points should be considered:
|
|
|
|
* Complex CPU extensions (in terms of hardware requirements) should be avoided (examples: floating-point unit, physical memory protection).
|
|
* Large carry chains (>32-bit) should be avoided (constrain CPU counter sizes: e.g. `CPU_CNT_WIDTH => 32` and `HPM_NUM_CNTS => 32`).
|
|
* If the target FPGA provides sufficient DSP resources, CPU multiplication operations can be mapped to DSP slices (`FAST_MUL_EN => true`)
|
|
reducing LUT usage and critical path impact while also increasing overall performance.
|
|
* Use the synchronous (registered) RX path configuration of the external memory interface (`MEM_EXT_ASYNC_RX => false`).
|
|
* _To be continued..._
|
|
|
|
[NOTE]
|
|
The short and fixed-length critical path makes it possible to integrate the core into existing clock domains.
|
|
So no clock domain-crossing and no sub-clock generation is required. However, for very high clock
|
|
frequencies (this is technology / platform dependent) clock domain crossing becomes crucial for chip-internal
|
|
connections.
|
|
|
|
|
|
=== Optimize for Energy
|
|
|
|
There are no _dedicated_ configuration options to optimize the processor for energy (minimal consumption;
|
|
energy/instruction ratio) yet. However, a reduced processor area (<<_optimize_for_size>>) will also reduce
|
|
static energy consumption.
|
|
|
|
To optimize your setup for low-power applications, you can make use of the CPU sleep mode (`wfi` instruction).
|
|
Put the CPU to sleep mode whenever possible. Disable all processor modules that are not actually used (exclude them
|
|
from synthesis if they will _never_ be used; disable the module via its control register if the module is not
|
|
_currently_ used). When in sleep mode, you can keep a timer module running (MTIME or the watchdog) to wake up
|
|
the CPU again. Since the wake up is triggered by _any_ interrupt, the external interrupt controller can also
|
|
be used to wake up the CPU again. By this, all timers (and all other modules) can be deactivated as well.
|
|
|
|
.Processor-internal clock generator shutdown
|
|
[TIP]
|
|
If _no_ IO/peripheral module is currently enabled, the processor's internal clock generator circuit will be
|
|
shut down reducing switching activity and thus, dynamic energy consumption.
|
|
|
|
|
|
|
|
|
|
// ####################################################################################################################
|
|
:sectnums:
|
== Customizing the Internal Bootloader
|
== Customizing the Internal Bootloader
|
|
|
The NEORV32 bootloader provides several options to configure and customize it for a certain application setup.
|
The NEORV32 bootloader provides several options to configure and customize it for a certain application setup.
|
This configuration is done by passing _defines_ when compiling the bootloader. Of course you can also
|
This configuration is done by passing _defines_ when compiling the bootloader. Of course you can also
|
modify the bootloader source code to provide a setup that perfectly fits your needs.
|
modify the bootloader source code to provide a setup that perfectly fits your needs.
|
Line 630... |
Line 770... |
4+^| Boot configuration
|
4+^| Boot configuration
|
| `AUTO_BOOT_SPI_EN` | `0` | `0`, `1` | Set `1` to enable immediate boot from external SPI flash
|
| `AUTO_BOOT_SPI_EN` | `0` | `0`, `1` | Set `1` to enable immediate boot from external SPI flash
|
| `AUTO_BOOT_OCD_EN` | `0` | `0`, `1` | Set `1` to enable boot via on-chip debugger (OCD)
|
| `AUTO_BOOT_OCD_EN` | `0` | `0`, `1` | Set `1` to enable boot via on-chip debugger (OCD)
|
| `AUTO_BOOT_TIMEOUT` | `8` | _any_ | Time in seconds after which the auto-boot sequence starts (if there is no UART input by the user); set to 0 to disable the auto-boot sequence
|
| `AUTO_BOOT_TIMEOUT` | `8` | _any_ | Time in seconds after which the auto-boot sequence starts (if there is no UART input by the user); set to 0 to disable the auto-boot sequence
|
4+^| SPI configuration
|
4+^| SPI configuration
|
|
| `SPI_EN` | `1` | `0`, `1` | Set `1` to enable the usage of the SPI module (including load/store executables from/to SPI flash options)
|
| `SPI_FLASH_CS` | `0` | `0` ... `7` | SPI chip select output (`spi_csn_o`) for selecting flash
|
| `SPI_FLASH_CS` | `0` | `0` ... `7` | SPI chip select output (`spi_csn_o`) for selecting flash
|
| `SPI_FLASH_SECTOR_SIZE` | `65536` | _any_ | SPI flash sector size in bytes
|
| `SPI_FLASH_SECTOR_SIZE` | `65536` | _any_ | SPI flash sector size in bytes
|
| `SPI_FLASH_CLK_PRSC` | `CLK_PRSC_8` | `CLK_PRSC_2` `CLK_PRSC_4` `CLK_PRSC_8` `CLK_PRSC_64` `CLK_PRSC_128` `CLK_PRSC_1024` `CLK_PRSC_2048` `CLK_PRSC_4096` | SPI clock pre-scaler (dividing main processor clock)
|
| `SPI_FLASH_CLK_PRSC` | `CLK_PRSC_8` | `CLK_PRSC_2` `CLK_PRSC_4` `CLK_PRSC_8` `CLK_PRSC_64` `CLK_PRSC_128` `CLK_PRSC_1024` `CLK_PRSC_2048` `CLK_PRSC_4096` | SPI clock pre-scaler (dividing main processor clock)
|
| `SPI_BOOT_BASE_ADDR` | `0x08000000` | _any_ 32-bit value | Defines the _base_ address of the executable in external flash
|
| `SPI_BOOT_BASE_ADDR` | `0x08000000` | _any_ 32-bit value | Defines the _base_ address of the executable in external flash
|
|=======================
|
|=======================
|
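Since the bootloader options are passed as C defines, several `NAME=VALUE` pairs from the table above can be turned into compiler flags with a small helper. This is only a sketch: `defines_to_flags` is a hypothetical function, and it is an assumption that the bootloader makefile accepts the flags via `USER_FLAGS` in the same way as the example makefiles.

```bash
# defines_to_flags NAME=VALUE...
# Print the given configuration pairs as space-separated -D compiler flags.
# Hypothetical helper for illustration only.
defines_to_flags() {
  flags=""
  for kv in "$@"; do
    # Add a separating space only if flags is already non-empty
    flags="${flags:+$flags }-D$kv"
  done
  printf '%s\n' "$flags"
}
```

A call like `defines_to_flags AUTO_BOOT_TIMEOUT=20 SPI_FLASH_CS=1` yields `-DAUTO_BOOT_TIMEOUT=20 -DSPI_FLASH_CS=1`, which could then be appended to `USER_FLAGS` when invoking the bootloader's makefile.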
Line 818... |
Line 959... |
|
|
// ####################################################################################################################
|
// ####################################################################################################################
|
:sectnums:
|
:sectnums:
|
== Simulating the Processor
|
== Simulating the Processor
|
|
|
.WORK IN PROGRESS
|
|
[WARNING]
|
|
This Section Is Under Construction! +
|
|
+
|
|
FIXME!
|
|
|
|
:sectnums:
|
:sectnums:
|
=== Testbench
|
=== Testbench
|
|
|
The NEORV32 project features a simple default testbench (`sim/neorv32_tb.simple.vhd`) that can be used to simulate
|
The NEORV32 project features a simple, plain-VHDL (no third-party libraries) default testbench (`sim/neorv32_tb.simple.vhd`)
|
and test the processor setup. This testbench features a 100MHz clock and enables all optional peripheral and
|
that can be used to simulate and test the processor setup. This testbench features a 100MHz clock and enables all optional
|
CPU extensions except for the `E` extension and the TRNG IO module (that CANNOT be simulated due to its
|
peripheral and CPU extensions except for the `E` extension and the TRNG IO module (that CANNOT be simulated due to its
|
combinatorial (looped) oscillator architecture).
|
combinatorial (looped) architecture).
|
|
|
The simulation setup is configured via the "User Configuration" section located right at the beginning of
|
The simulation setup is configured via the "User Configuration" section located right at the beginning of
|
the testbench's architecture. Each configuration constant provides comments to explain the functionality.
|
the testbench's architecture. Each configuration constant provides comments to explain the functionality.
|
|
|
Besides the actual NEORV32 Processor, the testbench also simulates "external" components that are connected
|
Besides the actual NEORV32 Processor, the testbench also simulates "external" components that are connected
|
Line 858... |
Line 993... |
| `0x80000000` | `dmem_size_c` | `r/w/e, a, 8/16/32` | external DMEM
|
| `0x80000000` | `dmem_size_c` | `r/w/e, a, 8/16/32` | external DMEM
|
| `0xf0000000` | 64 bytes | `r/w/e, !a, 8/16/32` | external "IO" memory, atomic accesses will fail
|
| `0xf0000000` | 64 bytes | `r/w/e, !a, 8/16/32` | external "IO" memory, atomic accesses will fail
|
| `0xff000000` | 4 bytes | `-/w/-, a, -/-/32` | memory-mapped register to trigger "machine external", "machine software" and "SoC Fast Interrupt" interrupts
|
| `0xff000000` | 4 bytes | `-/w/-, a, -/-/32` | memory-mapped register to trigger "machine external", "machine software" and "SoC Fast Interrupt" interrupts
|
|=======================
|
|=======================
|
|
|
The simulated NEORV32 does not use the bootloader and directly boots the current application image (from
|
|
the `rtl/core/neorv32_application_image.vhd` image file). Make sure to use the `all` target of the
|
|
makefile to install your application as VHDL image after compilation:
|
|
|
|
[source, bash]
|
|
----
|
|
sw/example/blink_led$ make clean_all all
|
|
----
|
|
|
|
.Simulation-Optimized CPU/Processors Modules
|
|
[NOTE]
|
[NOTE]
|
The `sim/rtl_modules` folder provides simulation-optimized versions of certain CPU/processor modules.
|
The simulated NEORV32 does not use the bootloader and _directly boots_ the current application image (from
|
These alternatives can be used to replace the default CPU/processor HDL files to allow faster/easier/more
|
the `rtl/core/neorv32_application_image.vhd` image file).
|
efficient simulation. **These files are not intended for synthesis!**
|
|
|
|
**Simulation Console Output**
|
|
|
|
|
.UART output during simulation
|
|
[NOTE]
|
Data written to the NEORV32 UART0 / UART1 transmitter is sent to a virtual UART receiver implemented
|
Data written to the NEORV32 UART0 / UART1 transmitter is sent to a virtual UART receiver implemented
|
as part of the testbench. Received chars are sent to the simulator console and are also stored to a log file
|
as part of the testbench. Received chars are sent to the simulator console and are also stored to a log file
|
(`neorv32.testbench_uart0.out` for UART0, `neorv32.testbench_uart1.out` for UART1) inside the simulator home folder.
|
(`neorv32.testbench_uart0.out` for UART0, `neorv32.testbench_uart1.out` for UART1) inside the simulation's home folder.
|
|
**Please note that printing via the native UART receiver takes a lot of time.** For faster simulation console output
|
|
see section <<_faster_simulation_console_output>>.
|
|
|
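After a simulation run, the UART log files named above can be inspected from the shell. The following is a minimal sketch; `show_uart_logs` is a hypothetical helper, and the folder argument (defaulting to the current directory) stands in for the simulation's home folder.

```bash
# show_uart_logs [DIR]
# Print the testbench UART log files found in DIR (default: current folder).
# Hypothetical helper; only the two log file names come from the testbench.
show_uart_logs() {
  dir=${1:-.}
  for f in "$dir"/neorv32.testbench_uart0.out "$dir"/neorv32.testbench_uart1.out; do
    if [ -f "$f" ]; then
      printf '== %s ==\n' "${f##*/}"   # header with the bare file name
      cat "$f"
    fi
  done
}
```

Log files that do not exist (for example because UART1 was never used) are silently skipped.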
|
|
:sectnums:
|
:sectnums:
|
=== Faster Simulation Console Output
|
=== Faster Simulation Console Output
|
|
|
Line 907... |
Line 1033... |
[source, bash]
|
[source, bash]
|
----
|
----
|
sw/example/blink_led$ make USER_FLAGS+=-DUART0_SIM_MODE clean_all all
|
sw/example/blink_led$ make USER_FLAGS+=-DUART0_SIM_MODE clean_all all
|
----
|
----
|
|
|
The provided define changes the default UART0/UART1 setup function so that it sets the simulation
mode flag in the corresponding UART's control register.

[NOTE]
The UART simulation output (to file and to screen) outputs "complete lines" at once. A line is
completed with a line feed (newline, ASCII `\n` = 10).

----
neorv32/sim$ sh ghdl_sim.sh --stop-time=20ms
----


:sectnums:
=== In-Console Application Simulation

To directly compile and run a program in the console (using the default testbench and GHDL
as simulator) you can use the `sim` makefile target. Make sure to use the UART simulation mode
(`USER_FLAGS+=-DUART0_SIM_MODE` and/or `USER_FLAGS+=-DUART1_SIM_MODE`) to get
faster / direct-to-console UART output.

[source, bash]
----
sw/example/blink_led$ make USER_FLAGS+=-DUART0_SIM_MODE clean_all sim
[...]
Blinking LED demo program
----


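When both UARTs should use the simulation mode, the two defines can be combined into a single `USER_FLAGS` argument. The sketch below assumes that `make` accepts the combined value as one quoted argument; the helper name `sim_mode_flags` is hypothetical and not part of the NEORV32 makefiles.

[source, bash]
----
# Hypothetical helper (not part of NEORV32): build the USER_FLAGS argument
# for the given UART numbers, e.g. "sim_mode_flags 0 1".
sim_mode_flags() {
  flags=""
  for uart in "$@"; do
    # append one define per requested UART, separated by a single space
    flags="${flags:+$flags }-DUART${uart}_SIM_MODE"
  done
  printf 'USER_FLAGS+=%s\n' "$flags"
}

# Example use (quotes keep the value together as one make argument):
# sw/example/blink_led$ make "$(sim_mode_flags 0 1)" clean_all sim
----
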
:sectnums:
=== Hello World!

To do a quick test of the NEORV32, make sure to have https://github.com/ghdl/ghdl[GHDL] and a
https://github.com/stnolting/riscv-gcc-prebuilt[RISC-V gcc toolchain] installed. Then navigate to the project's
`sw/example/hello_world` folder and run `make USER_FLAGS+=-DUART0_SIM_MODE MARCH=-march=rv32imac clean_all sim`:

[TIP]
The simulator will output some _sanity check_ notes (and warnings or even errors if something is ill-configured)
right at the beginning of the simulation to give a brief overview of the actual NEORV32 SoC and CPU configurations.

[source, bash]
----
stnolting@Einstein:/mnt/n/Projects/neorv32/sw/example/hello_world$ make USER_FLAGS+=-DUART0_SIM_MODE MARCH=-march=rv32imac clean_all sim
../../../sw/lib/source/neorv32_uart.c: In function 'neorv32_uart0_setup':
../../../sw/lib/source/neorv32_uart.c:301:4: warning: #warning UART0_SIM_MODE (primary UART) enabled! Sending all UART0.TX data to text.io simulation output instead of real UART0 transmitter. Use this for simulations only! [-Wcpp]
  301 |   #warning UART0_SIM_MODE (primary UART) enabled! Sending all UART0.TX data to text.io simulation output instead of real UART0 transmitter. Use this for simulations only!
      |    ^~~~~~~
Memory utilization:
   text    data     bss     dec     hex filename
   4612       0     120    4732    127c main.elf
Compiling ../../../sw/image_gen/image_gen
Installing application image to ../../../rtl/core/neorv32_application_image.vhd
Simulating neorv32_application_image.vhd...
Tip: Compile application with USER_FLAGS+=-DUART[0/1]_SIM_MODE to auto-enable UART[0/1]'s simulation mode (redirect UART output to simulator console).
Using simulation runtime args: --stop-time=10ms
../rtl/core/neorv32_top.vhd:347:3:@0ms:(assertion note): NEORV32 PROCESSOR IO Configuration: GPIO MTIME UART0 UART1 SPI TWI PWM WDT CFS SLINK NEOLED XIRQ
../rtl/core/neorv32_top.vhd:370:3:@0ms:(assertion note): NEORV32 PROCESSOR CONFIG NOTE: Boot configuration: Direct boot from memory (processor-internal IMEM).
../rtl/core/neorv32_top.vhd:394:3:@0ms:(assertion note): NEORV32 PROCESSOR CONFIG NOTE: Implementing on-chip debugger (OCD).
../rtl/core/neorv32_cpu.vhd:169:3:@0ms:(assertion note): NEORV32 CPU ISA Configuration (MARCH): RV32IMACU_Zbb_Zicsr_Zifencei_Zfinx_Debug
../rtl/core/neorv32_cpu.vhd:189:3:@0ms:(assertion note): NEORV32 CPU CONFIG NOTE: Implementing NO dedicated hardware reset for uncritical registers (default, might reduce area). Set package constant = TRUE to configure a DEFINED reset value for all CPU registers.
../rtl/core/neorv32_imem.vhd:107:3:@0ms:(assertion note): NEORV32 PROCESSOR CONFIG NOTE: Implementing processor-internal IMEM as ROM (16384 bytes), pre-initialized with application (4612 bytes).
../rtl/core/neorv32_dmem.vhd:89:3:@0ms:(assertion note): NEORV32 PROCESSOR CONFIG NOTE: Implementing processor-internal DMEM (RAM, 8192 bytes).
../rtl/core/neorv32_wishbone.vhd:136:3:@0ms:(assertion note): NEORV32 PROCESSOR CONFIG NOTE: External Bus Interface - Implementing STANDARD Wishbone protocol.
../rtl/core/neorv32_wishbone.vhd:140:3:@0ms:(assertion note): NEORV32 PROCESSOR CONFIG NOTE: External Bus Interface - Implementing auto-timeout (255 cycles).
../rtl/core/neorv32_wishbone.vhd:144:3:@0ms:(assertion note): NEORV32 PROCESSOR CONFIG NOTE: External Bus Interface - Implementing LITTLE-endian byte order.
../rtl/core/neorv32_wishbone.vhd:148:3:@0ms:(assertion note): NEORV32 PROCESSOR CONFIG NOTE: External Bus Interface - Implementing registered RX path.
../rtl/core/neorv32_slink.vhd:161:3:@0ms:(assertion note): NEORV32 PROCESSOR CONFIG NOTE: Implementing 8 RX and 8 TX stream links.

##
## ## ## ##
## ## ######### ######## ######## ## ## ######## ######## ## ################
#### ## ## ## ## ## ## ## ## ## ## ## ## ## #### ####
## ## ## ## ## ## ## ## ## ## ## ## ## ## ###### ##
## ## ## ######### ## ## ######### ## ## ##### ## ## #### ###### ####
## ## ## ## ## ## ## ## ## ## ## ## ## ## ###### ##
## #### ## ## ## ## ## ## ## ## ## ## ## #### ####
## ## ######### ######## ## ## ## ######## ########## ## ################
## ## ## ##
##
Hello world! :)
----


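The _sanity check_ notes follow a fixed pattern, so individual settings can be picked out of a captured console log. As a rough sketch, the helper below extracts the ISA string from the `NEORV32 CPU ISA Configuration (MARCH)` note shown above. The helper name `sim_isa_config` and the log file name `sim.log` are hypothetical; how the simulator output is captured depends on your setup.

[source, bash]
----
# Hypothetical helper (not part of NEORV32): read simulator output on stdin
# and print the ISA string reported by the CPU sanity-check note.
sim_isa_config() {
  # drop everything up to and including the note's label, print only matching lines
  sed -n 's/^.*NEORV32 CPU ISA Configuration (MARCH): //p'
}

# Example use, assuming the console output was saved to a file named sim.log:
# sim_isa_config < sim.log
----
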
:sectnums:
=== Advanced Simulation using VUNIT

.WORK IN PROGRESS
[WARNING]
This Section Is Under Construction! +
+
FIXME!

The NEORV32 provides a more sophisticated simulation setup using https://vunit.github.io/[VUNIT].
The corresponding VUNIT-based testbench is `sim/neorv32_tb.vhd`.

**WORK-IN-PROGRESS**



// ####################################################################################################################
:sectnums:
== Building the Documentation