OpenCores

Rev 62	Rev 63
Line 1...	Line 1...
`Let's Get It Started!`	`Let's Get It Started!`

`To make your NEORV32 project run, follow the guides from the upcoming sections. Follow these guides`	This user guide uses the NEORV32 project _as is_ from the official `neorv32` repository.
`step by step and in the presented order.`	`To make your first NEORV32 project run, follow the guides from the upcoming sections. It is recommended to`
	`follow these guides step by step and preferably in the presented order.`

	`[TIP]`
	`This guide uses the minimalistic and platform/toolchain agnostic SoC test setups from`
	`rtl/test_setups` for illustration. You can use one of the provided test setups for
	your first FPGA tests. Alternatively, have a look at the `setups` folder,
	`which provides more sophisticated example setups for various FPGAs/FPGA boards and toolchains.`


`:sectnums:`	`:sectnums:`
`== Software Toolchain Setup`	`== Software Toolchain Setup`

`To compile (and debug) executables for the NEORV32 a RISC-V toolchain is required.`	`To compile (and debug) executables for the NEORV32 a RISC-V toolchain is required.`
`There are two possibilities to get this:`	`There are two possibilities to get this:`

`1. Download and _build_ the official RISC-V GNU toolchain yourself`	`1. Download and _build_ the official RISC-V GNU toolchain yourself.`
`2. Download and install a prebuilt version of the toolchain; this might also be done via the package manager / app store of your OS`	`2. Download and install a prebuilt version of the toolchain; this might also be done via the package manager / app store of your OS`

`[TIP]`	`[NOTE]`
The default toolchain prefix for this project is `riscv32-unknown-elf-`. Of course you can use any other RISC-V	The default toolchain prefix (`RISCV_PREFIX` variable) for this project is `riscv32-unknown-elf-`. Of course you can use any other RISC-V
toolchain (like `riscv64-unknown-elf-`) that is capable of emitting code for an `rv32` architecture. Just change the _RISCV_PREFIX_ variable in the application	toolchain (like `riscv64-unknown-elf-`) that is capable of emitting code for an `rv32` architecture. Just change `RISCV_PREFIX`
`makefile(s) according to your needs or define this variable when invoking the makefile.`	`according to your needs.`
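
As a rough sketch of how the prefix can be chosen, the following shell snippet looks for a matching compiler on the `PATH` and prints the `make` invocation that overrides `RISCV_PREFIX`. The helper name `pick_prefix` and the candidate list are illustrative assumptions; the `clean_all all` targets are borrowed from the examples later in this guide.

```shell
#!/bin/sh
# pick_prefix (hypothetical helper): print the first toolchain prefix whose
# GCC binary is found on PATH; return non-zero if none is installed.
pick_prefix() {
  for prefix in riscv32-unknown-elf- riscv64-unknown-elf-; do
    if command -v "${prefix}gcc" >/dev/null 2>&1; then
      printf '%s\n' "$prefix"
      return 0
    fi
  done
  return 1
}

if RISCV_PREFIX=$(pick_prefix); then
  # Variables given on the make command line override the makefile defaults.
  echo "make RISCV_PREFIX=$RISCV_PREFIX clean_all all"
else
  echo "no RISC-V GCC found on PATH" >&2
fi
```

Passing the variable on the command line leaves the makefile itself untouched, which is convenient when several toolchains are installed side by side.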

`[IMPORTANT]`
`Keep in mind that – for instance – a rv32imc toolchain only provides library code compiled with`
compressed (_C_) and `mul`/`div` instructions (_M_)! Hence, this code cannot be executed (without
`emulation) on an architecture without these extensions!`


`:sectnums:`	`:sectnums:`
`=== Building the Toolchain from Scratch`	`=== Building the Toolchain from Scratch`

Line 37...	Line 40...
`----`	`----`
`riscv-gnu-toolchain$ ./configure --prefix=/opt/riscv --with-arch=rv32i --with-abi=ilp32`	`riscv-gnu-toolchain$ ./configure --prefix=/opt/riscv --with-arch=rv32i --with-abi=ilp32`
`riscv-gnu-toolchain$ make`	`riscv-gnu-toolchain$ make`
`----`	`----`

	`[IMPORTANT]`
	Keep in mind that – for instance – a toolchain built with `--with-arch=rv32imc` only provides library code compiled with
	compressed (`C`) and `mul`/`div` instructions (`M`)! Hence, this code cannot be executed (without
	`emulation) on an architecture without these extensions!`


`:sectnums:`	`:sectnums:`
`=== Downloading and Installing a Prebuilt Toolchain`	`=== Downloading and Installing a Prebuilt Toolchain`

`Alternatively, you can download a prebuilt toolchain.`	`Alternatively, you can download a prebuilt toolchain.`
Line 101...	Line 109...

`// ####################################################################################################################`	`// ####################################################################################################################`
`:sectnums:`	`:sectnums:`
`== General Hardware Setup`	`== General Hardware Setup`

`This guide will setup a NEORV32 project for FPGA implementation (or simulation only) _from scratch_`	`This guide shows the basics of setting up a NEORV32 project for FPGA implementation (or simulation only)`
	`_from scratch_. It uses a _simplified_ test "SoC" setup of the processor to keep things simple at the beginning.`
	`This simple setup is intended for evaluation or as "hello world" project to check out the NEORV32`
	`on _your_ FPGA board.`

`[TIP]`	`[TIP]`
`If you want to use a complete pre-defined setup to start with, check out the`	`If you want to use a more sophisticated pre-defined setup to start with, check out the`
project's `setups` folder (https://github.com/stnolting/neorv32/tree/master/setups),	`setups` folder, which provides example setups for various FPGAs, boards and toolchains.
`which provides (script-based) demo setups for various FPGA boards and toolchains.`
	`The NEORV32 project features two minimalistic pre-configured test setups in`
	https://github.com/stnolting/neorv32/blob/master/rtl/test_setups[`rtl/test_setups`].
	`Both test setups only implement very basic processor and CPU features.`
	`The main difference between the two setups is the processor boot concept - so how to get a software executable`
	`_into_ the processor:`

	* `rtl/test_setups/neorv32_testsetup_approm.vhd`: this setup does not require a connection via UART. The
	`software executable is "installed" into the bitstream to initialize a read-only memory. Use this setup`
	`if your FPGA board does _not_ provide a UART interface.`
	* `rtl/test_setups/neorv32_testsetup_bootloader.vhd`: this setup uses the UART and the default NEORV32
	`bootloader to upload new software executables. Use this setup if your board _does_ provide a UART interface.`

	.NEORV32 "hello world" test setup (`rtl/test_setups/neorv32_testsetup_bootloader.vhd`)
	`image::neorv32_test_setup.png[align=center]`

`This tutorial uses a _simplified_ test setup of the processor`	`.External Clock Source`
`to keeps things simple at the beginning as this setup is intended as`	`[NOTE]`
`evaluation or "hello world" project to check out the NEORV32.`	`These test setups are intended to be directly used as design top entity. Of course you can also instantiate them`
	`into another design unit. If your FPGA board only provides _very fast_ external clock sources (like on the FOMU board)`
	`you might need to add clock management components (PLLs, DCMs, MMCMs, ...) to the test setup or to the according top entity`
	`if you instantiate one of the test setups.`

`[start=1]`	`[start=1]`
`. Create a new project with your FPGA EDA tool of choice.`	`. Create a new project with your FPGA EDA tool of choice.`
. Add all VHDL files from the project's `rtl/core` folder to your project. Make sure to _reference_ the	. Add all VHDL files from the project's `rtl/core` folder to your project.
`files only – do not copy them.`
. Make sure to add all the rtl files to a new library called `neorv32`. If your FPGA tool does not	. Make sure to add all the rtl files to a new library called `neorv32`. If your FPGA tool does not
`provide a field to enter the library name, check out the "properties" menu of the added rtl files.`	`provide a field to enter the library name, check out the "properties" menu of the added rtl files.`
. The `rtl/core/neorv32_top.vhd` VHDL file is the top entity of the NEORV32 processor. If you	. The `rtl/core/neorv32_top.vhd` VHDL file is the top entity of the NEORV32 processor, which can be
`already have a design, instantiate this unit into your design and proceed.`	`instantiated into the "real" project. However, in this tutorial we will use one of the pre-defined`
	test setups from `rtl/test_setups` (see above).

`[IMPORTANT]`	`[IMPORTANT]`
Make sure to include the `neorv32` package into your design when instantiating the processor: add	Make sure to include the `neorv32` package into your design when instantiating the processor: add
`library neorv32;` and `use neorv32.neorv32_package.all;` to your design unit.	`library neorv32;` and `use neorv32.neorv32_package.all;` to your design unit.

`[start=5]`	`[start=5]`
`. If you do not have a design yet and just want to check out the NEORV32 – no problem! This guide`	`. Add the pre-defined test setup of choice to the project, too, and select it as _top entity_.`
`uses a simplified top entity, that encapsulates the actual processor top entity: add the`	`. The entities of both test setups`
`rtl/templates/processor/neorv32_ProcessorTop_Test.vhd` VHDL file to your project, too, and	`provide a minimal set of configuration generics, that might have to be adapted to match your FPGA and board:`
`select it as _top entity_.`
`. This test setup provides a minimal test hardware setup:`

`.NEORV32 "hello world" test setup`
`image::neorv32_test_setup.png[align=center]`

`[start=7]`	`.Test setup entity - configuration generics`
`. It only implements some very basic processor and CPU features. Also, only the`
`minimum number of signals is propagated to the outer world.`
`. However, a minimal setup-specific configuration of the NEORV32 processor is required to make it run`
`on your FPGA board of choice. Only the absolutely required modifications will be made while`
`keeping the default configuration for the remaining configuration options:`

.Cut-out of `neorv32_ProcessorTop_Test.vhd` showing the processor instance and its configuration
`[source,vhdl]`	`[source,vhdl]`
`----`	`----`
`neorv32_top_inst: neorv32_top`	`generic (`
`generic map (`	`-- adapt these for your setup --`
`-- General --`	`CLOCK_FREQUENCY : natural := 100000000; <1>`
`CLOCK_FREQUENCY => 100000000, -- in Hz # <1>`	`MEM_INT_IMEM_SIZE : natural := 16*1024; <2>`
`INT_BOOTLOADER_EN => true,`	`MEM_INT_DMEM_SIZE : natural := 8*1024 <3>`
`...`	`);`
`-- Internal instruction memory --`
`MEM_INT_IMEM_EN => true,`
`MEM_INT_IMEM_SIZE => 16*1024, # <2>`
`-- Internal data memory --`
`MEM_INT_DMEM_EN => true,`
`MEM_INT_DMEM_SIZE => 8*1024, # <3>`
`...`
`----`	`----`
<1> Clock frequency of `clk_i` signal in Hertz	<1> Clock frequency of `clk_i` signal in Hertz
`<2> Default size of internal instruction memory: 16kB`	`<2> Default size of internal instruction memory: 16kB`
`<3> Default size of internal data memory: 8kB`	`<3> Default size of internal data memory: 8kB`

`[start=9]`	`[start=7]`
`. There is one generic that has to be set according to your FPGA board setup: the actual clock frequency`	`. If you feel like it – or if your FPGA does not provide sufficient resources – you can modify the`
of the top's clock input signal (`clk_i`). Use the _CLOCK_FREQUENC_Y generic to specify your clock source's	_memory sizes_ (`MEM_INT_IMEM_SIZE` and `MEM_INT_DMEM_SIZE` – marked with notes "2" and "3"). But as mentioned
`frequency in Hertz (Hz) (note "1").`	`above, let's keep things simple at first and use the standard configuration for now.`
`. If you feel like it – or if your FPGA does not provide many resources – you can modify the`	`. There is one generic that _has to be set according to your FPGA board_ setup: the actual clock frequency`
`memory sizes (_MEM_INT_IMEM_SIZE_ and _MEM_INT_DMEM_SIZE_ – marked with notes "2" and "3") or even`	of the top's clock input signal (`clk_i`). Use the `CLOCK_FREQUENCY` generic to specify your clock source's
`exclude certain ISA extensions and peripheral modules from implementation - but as mentioned above, let's keep things`	`frequency in Hertz (Hz).`
`simple at first and use the standard configuration for now.`

`[NOTE]`	`[NOTE]`
`If you have changed the default memory configuration (_MEM_INT_IMEM_SIZE_ and _MEM_INT_DMEM_SIZE_ generics)`	If you have changed the default memory configuration (`MEM_INT_IMEM_SIZE` and `MEM_INT_DMEM_SIZE` generics)
`keep those new sizes in mind – these values are required for setting`	`keep those new sizes in mind – these values are required for setting`
`up the software framework in the next section <<_general_software_framework_setup>>.`	`up the software framework in the next section <<_general_software_framework_setup>>.`
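
To keep track of the configured sizes, it can help to write them down as plain byte counts and hex values. The snippet below is only a small arithmetic sketch; the helper name `size_hex` is hypothetical, and the two values are the defaults shown in the generics listing above.

```shell
#!/bin/sh
# size_hex (hypothetical helper): print a byte count as decimal and 32-bit hex.
size_hex() {
  printf '%d bytes = 0x%08x\n' "$1" "$1"
}

size_hex $((16 * 1024))   # default MEM_INT_IMEM_SIZE
size_hex $((8 * 1024))    # default MEM_INT_DMEM_SIZE
```

Replace the two expressions with your own values if you changed the generics.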

`[start=11]`	`[start=9]`
`. Depending on your FPGA tool of choice, it is time to assign the signals of the test setup top entity to`	`. Depending on your FPGA tool of choice, it is time to assign the signals of the test setup top entity to`
`the according pins of your FPGA board. All the signals can be found in the entity declaration:`	`the according pins of your FPGA board. All the signals can be found in the entity declaration of the`
	`corresponding test setup:`

.Entity signals of `neorv32_test_setup.vhd`	.Entity signals of `neorv32_testsetup_approm.vhd`
`[source,vhdl]`	`[source,vhdl]`
`----`	`----`
`entity neorv32_test_setup is`
`port (`	`port (`
`-- Global control --`	`-- Global control --`
`clk_i : in std_ulogic := '0'; -- global clock, rising edge`	`clk_i : in std_ulogic; -- global clock, rising edge`
`rstn_i : in std_ulogic := '0'; -- global reset, low-active, async`	`rstn_i : in std_ulogic; -- global reset, low-active, async`
	`-- GPIO --`
	`gpio_o : out std_ulogic_vector(7 downto 0) -- parallel output`
	`);`
	`----`

	.Entity signals of `neorv32_testsetup_bootloader.vhd`
	`[source,vhdl]`
	`----`
	`port (`
	`-- Global control --`
	`clk_i : in std_ulogic; -- global clock, rising edge`
	`rstn_i : in std_ulogic; -- global reset, low-active, async`
`-- GPIO --`	`-- GPIO --`
`gpio_o : out std_ulogic_vector(7 downto 0); -- parallel output`	`gpio_o : out std_ulogic_vector(7 downto 0); -- parallel output`
`-- UART0 --`	`-- UART0 --`
`uart0_txd_o : out std_ulogic; -- UART0 send data`	`uart0_txd_o : out std_ulogic; -- UART0 send data`
`uart0_rxd_i : in std_ulogic := '0' -- UART0 receive data`	`uart0_rxd_i : in std_ulogic -- UART0 receive data`
`);`	`);`
`end neorv32_test_setup;`
`----`	`----`

`[start=12]`	`.Signal Polarity`
	`[NOTE]`
	If your FPGA board has inverse polarity for certain inputs/outputs, you can add `not` gates. Example: the reset signal
	`rstn_i` is low-active by default; the LEDs connected to `gpio_o` are high-active by default.
	`You can do this in your board top if you instantiate the test setup,`
	or _inside_ the test setup if this is your top entity (low-active LEDs example: `gpio_o <= NOT con_gpio_o(7 downto 0);`).

	`[start=10]`
. Attach the clock input `clk_i` to your clock source and connect the reset line `rstn_i` to a button of	. Attach the clock input `clk_i` to your clock source and connect the reset line `rstn_i` to a button of
`your FPGA board. Check whether it is low-active or high-active – the reset signal of the processor is`	`your FPGA board. Check whether it is low-active or high-active – the reset signal of the processor is`
`low-active, so maybe you need to invert the input signal.`	`low-active, so maybe you need to invert the input signal.`
. If possible, connect at least bit `0` of the GPIO output port `gpio_o` to a high-active LED (invert	. If possible, connect _at least_ bit `0` of the GPIO output port `gpio_o` to an LED (see "Signal Polarity" note above).
`the signal when your LEDs are low-active). This LED will be used as status LED for the setup.`	. Finally, if you are using the UART-based test setup (`neorv32_testsetup_bootloader.vhd`)
`. Finally, if your FPGA board provides a serial host interface (USB-to-serial converter) interface,`	connect the UART communication signals `uart0_txd_o` and `uart0_rxd_i` to the host interface (e.g. USB-UART converter).
connect the UART communication signals `uart0_txd_o` and `uart0_rxd_i`.
`. Perform the project HDL compilation (synthesis, mapping, bitstream generation).`	`. Perform the project HDL compilation (synthesis, mapping, bitstream generation).`
`. Program the generated bitstream into your FPGA and press the button connected to the reset signal.`	`. Program the generated bitstream into your FPGA and press the button connected to the reset signal.`
`. Done! The assigned status LED should be flashing now for some sections before permanently lighting up.`	. Done! The LED at `gpio_o(0)` should be flashing now.

	`[TIP]`
	`After the GCC toolchain for compiling RISC-V source code is ready (chapter <<_general_software_framework_setup>>),`
	`you can advance to one of these chapters to learn how to get a software executable into your processor setup:`
	* If you are using the `neorv32_testsetup_approm.vhd` setup: See section <<_installing_an_executable_directly_into_memory>>.
	* If you are using the `neorv32_testsetup_bootloader.vhd` setup: See section <<_uploading_and_starting_of_a_binary_executable_image_via_uart>>.




`// ####################################################################################################################`	`// ####################################################################################################################`
Line 600...	Line 631...



`// ####################################################################################################################`	`// ####################################################################################################################`
`:sectnums:`	`:sectnums:`
	`== Application-Specific Processor Configuration`

	`Due to the processor's configuration options, which are mainly defined via the top entity VHDL generics, the SoC`
	`can be tailored to the application-specific requirements. Note that this chapter does not focus on optional`
	`_SoC features_ like IO/peripheral modules. It rather gives ideas on how to optimize for _overall goals_`
	`like performance and area.`

	`[NOTE]`
	`Please keep in mind that optimizing the design in one direction (like performance) will also affect other potential`
	`optimization goals (like area and energy).`

	`=== Optimize for Performance`

	`The following points show some concepts to optimize the processor for performance regardless of the costs`
	`(i.e. increasing area and energy requirements):`

	`* Enable all performance-related RISC-V CPU extensions that implement dedicated hardware accelerators instead`
	of emulating operations entirely in software: `M`, `C`, `Zfinx`
	* Enable mapping of complex CPU operations to dedicated hardware: `FAST_MUL_EN => true` to use DSP slices for
	multiplications, `FAST_SHIFT_EN => true` to use a fast barrel shifter for shift operations.
	* Implement the instruction cache: `ICACHE_EN => true`
	* Use as much _internal_ memory as possible to reduce memory access latency: `MEM_INT_IMEM_EN => true` and
	`MEM_INT_DMEM_EN => true`, maximize `MEM_INT_IMEM_SIZE` and `MEM_INT_DMEM_SIZE`
	* Increase the CPU's instruction prefetch buffer size: `CPU_IPB_ENTRIES`
	`* _To be continued..._`


	`=== Optimize for Size`

	`The NEORV32 is a size-optimized processor system that is intended to fit into tiny niches within large SoC`
	`designs or to be used as a customized microcontroller in really tiny / low-power FPGAs (like Lattice iCE40).`
	`Here are some ideas how to make the processor even smaller while maintaining its _general purpose system_`
	`concept and maximum RISC-V compatibility.`

	`SoC`

	`* This is obvious, but exclude all unused optional IO/peripheral modules from synthesis via the processor`
	`configuration generics.`
	`* If an IO module provides an option to configure the number of "channels", constrain this number to the`
	actually required value (e.g. the PWM module `IO_PWM_NUM_CH` or the external interrupt controller `XIRQ_NUM_CH`).
	* Reduce the FIFO sizes of implemented modules (e.g. `SLINK_TX_FIFO`).
	* Disable the instruction cache (`ICACHE_EN => false`) if the design only uses processor-internal IMEM
	`and DMEM memories.`
	`* _To be continued..._`

	`CPU`

	* Use the _embedded_ RISC-V CPU architecture extension (`CPU_EXTENSION_RISCV_E`) to reduce block RAM utilization.
	* The compressed instructions extension (`CPU_EXTENSION_RISCV_C`) requires additional logic for the decoder but
	`also reduces program code size by approximately 30%.`
	* If not explicitly used/required, constrain the CPU's counter sizes: `CPU_CNT_WIDTH` for `[m]instret[h]`
	(number of instructions) and `[m]cycle[h]` (number of cycles) counters. You can even remove these counters
	by setting `CPU_CNT_WIDTH => 0` if they are not used at all (note, this is not RISC-V compliant).
	* Reduce the CPU's prefetch buffer size (`CPU_IPB_ENTRIES`).
	* Map CPU shift operations to a small and iterative shifter unit (`FAST_SHIFT_EN => false`).
	`* If you have unused DSP blocks available, you can map multiplication operations to those slices instead of`
	using LUTs to implement the multiplier (`FAST_MUL_EN => true`).
	* If there is no need to execute division in hardware, use the `Zmmul` extension instead of the full-scale
	`M` extension.
	* Disable CPU extensions that are not explicitly used (`A`, `U`, `Zfinx`).
	`* _To be continued..._`

	`=== Optimize for Clock Speed`

	`The NEORV32 Processor and CPU are designed to provide minimal logic between register stages to keep the`
	`critical path as short as possible. When enabling additional extensions or modules the impact on the existing`
	`logic is also kept at a minimum to prevent timing degradation. If there is a major impact on existing`
	`logic (example: many physical memory protection address configuration registers) the VHDL code automatically`
	`adds additional register stages to maintain critical path length. Obviously, this increases operation latency.`

	`In order to optimize for a minimal critical path (= maximum clock speed) the following points should be considered:`

	`* Complex CPU extensions (in terms of hardware requirements) should be avoided (examples: floating-point unit, physical memory protection).`
	* Large carry chains (>32-bit) should be avoided (constrain CPU counter sizes: e.g. `CPU_CNT_WIDTH => 32` and `HPM_NUM_CNTS => 32`).
	* If the target FPGA provides sufficient DSP resources, CPU multiplication operations can be mapped to DSP slices (`FAST_MUL_EN => true`)
	`reducing LUT usage and critical path impact while also increasing overall performance.`
	* Use the synchronous (registered) RX path configuration of the external memory interface (`MEM_EXT_ASYNC_RX => false`).
	`* _To be continued..._`

	`[NOTE]`
	`The short and fixed-length critical path allows integrating the core into existing clock domains.`
	`So no clock domain-crossing and no sub-clock generation is required. However, for very high clock`
	`frequencies (this is technology / platform dependent) clock domain crossing becomes crucial for chip-internal`
	`connections.`


	`=== Optimize for Energy`

	`There are no _dedicated_ configuration options to optimize the processor for energy (minimal consumption;`
	`energy/instruction ratio) yet. However, a reduced processor area (<<_optimize_for_size>>) will also reduce`
	`static energy consumption.`

	To optimize your setup for low-power applications, you can make use of the CPU sleep mode (`wfi` instruction).
	`Put the CPU to sleep mode whenever possible. Disable all processor modules that are not actually used (exclude them`
	`from synthesis if they will _never_ be used; disable the module via its control register if the module is not`
	`_currently_ used). When in sleep mode, you can keep a timer module running (MTIME or the watchdog) to wake up`
	`the CPU again. Since the wake up is triggered by _any_ interrupt, the external interrupt controller can also`
	`be used to wake up the CPU again. By this, all timers (and all other modules) can be deactivated as well.`

	`.Processor-internal clock generator shutdown`
	`[TIP]`
	`If _no_ IO/peripheral module is currently enabled, the processor's internal clock generator circuit will be`
	`shut down reducing switching activity and thus, dynamic energy consumption.`




	`// ####################################################################################################################`
	`:sectnums:`
`== Customizing the Internal Bootloader`	`== Customizing the Internal Bootloader`

`The NEORV32 bootloader provides several options to configure and customize it for a certain application setup.`	`The NEORV32 bootloader provides several options to configure and customize it for a certain application setup.`
`This configuration is done by passing _defines_ when compiling the bootloader. Of course you can also`	`This configuration is done by passing _defines_ when compiling the bootloader. Of course you can also`
`modify the bootloader source code to provide a setup that perfectly fits your needs.`	`modify the bootloader source code to provide a setup that perfectly fits your needs.`
Line 630...	Line 770...
`4+^\| Boot configuration`	`4+^\| Boot configuration`
\| `AUTO_BOOT_SPI_EN` \| `0` \| `0`, `1` \| Set `1` to enable immediate boot from external SPI flash	\| `AUTO_BOOT_SPI_EN` \| `0` \| `0`, `1` \| Set `1` to enable immediate boot from external SPI flash
\| `AUTO_BOOT_OCD_EN` \| `0` \| `0`, `1` \| Set `1` to enable boot via on-chip debugger (OCD)	\| `AUTO_BOOT_OCD_EN` \| `0` \| `0`, `1` \| Set `1` to enable boot via on-chip debugger (OCD)
\| `AUTO_BOOT_TIMEOUT` \| `8` \| _any_ \| Time in seconds after the auto-boot sequence starts (if there is no UART input by user); set to 0 to disable the auto-boot sequence	\| `AUTO_BOOT_TIMEOUT` \| `8` \| _any_ \| Time in seconds after the auto-boot sequence starts (if there is no UART input by user); set to 0 to disable the auto-boot sequence
`4+^\| SPI configuration`	`4+^\| SPI configuration`
	\| `SPI_EN` \| `1` \| `0`, `1` \| Set `1` to enable the usage of the SPI module (including load/store executables from/to SPI flash options)
\| `SPI_FLASH_CS` \| `0` \| `0` ... `7` \| SPI chip select output (`spi_csn_o`) for selecting flash	\| `SPI_FLASH_CS` \| `0` \| `0` ... `7` \| SPI chip select output (`spi_csn_o`) for selecting flash
\| `SPI_FLASH_SECTOR_SIZE` \| `65536` \| _any_ \| SPI flash sector size in bytes	\| `SPI_FLASH_SECTOR_SIZE` \| `65536` \| _any_ \| SPI flash sector size in bytes
\| `SPI_FLASH_CLK_PRSC` \| `CLK_PRSC_8` \| `CLK_PRSC_2` `CLK_PRSC_4` `CLK_PRSC_8` `CLK_PRSC_64` `CLK_PRSC_128` `CLK_PRSC_1024` `CLK_PRSC_2048` `CLK_PRSC_4096` \| SPI clock pre-scaler (dividing main processor clock)	\| `SPI_FLASH_CLK_PRSC` \| `CLK_PRSC_8` \| `CLK_PRSC_2` `CLK_PRSC_4` `CLK_PRSC_8` `CLK_PRSC_64` `CLK_PRSC_128` `CLK_PRSC_1024` `CLK_PRSC_2048` `CLK_PRSC_4096` \| SPI clock pre-scaler (dividing main processor clock)
\| `SPI_BOOT_BASE_ADDR` \| `0x08000000` \| _any_ 32-bit value \| Defines the _base_ address of the executable in external flash	\| `SPI_BOOT_BASE_ADDR` \| `0x08000000` \| _any_ 32-bit value \| Defines the _base_ address of the executable in external flash
`\|=======================`	`\|=======================`
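
As a hedged illustration of "passing defines", the sketch below assembles a single `USER_FLAGS` argument from `NAME=VALUE` pairs taken from the table above. The helper name `bootloader_flags` is hypothetical, and it is an assumption that the bootloader makefile accepts `USER_FLAGS` in the same way the application makefiles in this guide do; the make targets are intentionally left out.

```shell
#!/bin/sh
# bootloader_flags (hypothetical helper): turn NAME=VALUE pairs into a single
# USER_FLAGS argument for make, one -D define per pair.
bootloader_flags() {
  defs=""
  for pair in "$@"; do
    defs="$defs -D$pair"
  done
  printf 'USER_FLAGS+=%s\n' "${defs# }"
}

# Example: longer auto-boot timeout, SPI support disabled.
bootloader_flags AUTO_BOOT_TIMEOUT=20 SPI_EN=0
```

Because the result contains spaces, it has to be quoted when handed to make, for example `make "$(bootloader_flags AUTO_BOOT_TIMEOUT=20 SPI_EN=0)"` followed by the targets your bootloader makefile provides.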
Line 818...	Line 959...

`// ####################################################################################################################`	`// ####################################################################################################################`
`:sectnums:`	`:sectnums:`
`== Simulating the Processor`	`== Simulating the Processor`

`.WORK IN PROGRESS`
`[WARNING]`
`This Section Is Under Construction! +`
`+`
`FIXME!`

`:sectnums:`	`:sectnums:`
`=== Testbench`	`=== Testbench`

The NEORV32 project features a simple default testbench (`sim/neorv32_tb.simple.vhd`) that can be used to simulate	The NEORV32 project features a simple, plain-VHDL (no third-party libraries) default testbench (`sim/neorv32_tb.simple.vhd`)
`and test the processor setup. This testbench features a 100MHz clock and enables all optional peripheral and`	`that can be used to simulate and test the processor setup. This testbench features a 100MHz clock and enables all optional`
CPU extensions except for the `E` extension and the TRNG IO module (that CANNOT be simulated due to its	peripheral and CPU extensions except for the `E` extension and the TRNG IO module (that CANNOT be simulated due to its
`combinatorial (looped) oscillator architecture).`	`combinatorial (looped) architecture).`

`The simulation setup is configured via the "User Configuration" section located right at the beginning of`	`The simulation setup is configured via the "User Configuration" section located right at the beginning of`
`the testbench's architecture. Each configuration constant provides comments to explain the functionality.`	`the testbench's architecture. Each configuration constant provides comments to explain the functionality.`

`Besides the actual NEORV32 Processor, the testbench also simulates "external" components that are connected`	`Besides the actual NEORV32 Processor, the testbench also simulates "external" components that are connected`
Line 858...	Line 993...
\| `0x80000000` \| `dmem_size_c` \| `r/w/e, a, 8/16/32` \| external DMEM	\| `0x80000000` \| `dmem_size_c` \| `r/w/e, a, 8/16/32` \| external DMEM
\| `0xf0000000` \| 64 bytes \| `r/w/e, !a, 8/16/32` \| external "IO" memory, atomic accesses will fail	\| `0xf0000000` \| 64 bytes \| `r/w/e, !a, 8/16/32` \| external "IO" memory, atomic accesses will fail
\| `0xff000000` \| 4 bytes \| `-/w/-, a, -/-/32` \| memory-mapped register to trigger "machine external", "machine software" and "SoC Fast Interrupt" interrupts	\| `0xff000000` \| 4 bytes \| `-/w/-, a, -/-/32` \| memory-mapped register to trigger "machine external", "machine software" and "SoC Fast Interrupt" interrupts
`\|=======================`	`\|=======================`

`The simulated NEORV32 does not use the bootloader and directly boots the current application image (from`
the `rtl/core/neorv32_application_image.vhd` image file). Make sure to use the `all` target of the
`makefile to install your application as VHDL image after compilation:`

`[source, bash]`
`----`
`sw/example/blink_led$ make clean_all all`
`----`

`.Simulation-Optimized CPU/Processors Modules`
`[NOTE]`	`[NOTE]`
The `sim/rtl_modules` folder provides simulation-optimized versions of certain CPU/processor modules.	`The simulated NEORV32 does not use the bootloader and _directly boots_ the current application image (from`
`These alternatives can be used to replace the default CPU/processor HDL files to allow faster/easier/more`	the `rtl/core/neorv32_application_image.vhd` image file).
`efficient simulation. These files are not intended for synthesis!`

`Simulation Console Output`

	`.UART output during simulation`
	`[NOTE]`
`Data written to the NEORV32 UART0 / UART1 transmitter is sent to a virtual UART receiver implemented`	`Data written to the NEORV32 UART0 / UART1 transmitter is sent to a virtual UART receiver implemented`
`as part of the testbench. Received chars are sent to the simulator console and are also stored to a log file`	`as part of the testbench. Received chars are sent to the simulator console and are also stored to a log file`
(`neorv32.testbench_uart0.out` for UART0, `neorv32.testbench_uart1.out` for UART1) inside the simulator home folder.	(`neorv32.testbench_uart0.out` for UART0, `neorv32.testbench_uart1.out` for UART1) inside the simulation's home folder.
	`Please note that printing via the native UART receiver takes a lot of time. For faster simulation console output`
	`see section <<_faster_simulation_console_output>>.`
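
After a simulation run, the log files named above can be inspected with a few lines of shell. This is a minimal sketch; the helper name `show_uart_logs` is hypothetical, and the folder argument should point to wherever your simulator was started.

```shell
#!/bin/sh
# show_uart_logs (hypothetical helper): print the testbench UART log files
# found in the given simulation folder (defaults to the current directory).
show_uart_logs() {
  dir=${1:-.}
  for log in "$dir"/neorv32.testbench_uart0.out "$dir"/neorv32.testbench_uart1.out; do
    if [ -f "$log" ]; then
      echo "== $log =="
      cat "$log"
    fi
  done
}

show_uart_logs .
```

Log files that do not exist are skipped silently, so the helper also works when only UART0 is used.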


`:sectnums:`	`:sectnums:`
`=== Faster Simulation Console Output`	`=== Faster Simulation Console Output`

Line 907...	Line 1033...
`[source, bash]`	`[source, bash]`
`----`	`----`
`sw/example/blink_led$ make USER_FLAGS+=-DUART0_SIM_MODE clean_all all`	`sw/example/blink_led$ make USER_FLAGS+=-DUART0_SIM_MODE clean_all all`
`----`	`----`

`The provided define will change the default UART0/UART1 setup function in order to set the simulation mode flag in the according UART's control register.`	`The provided define will change the default UART0/UART1 setup function in order to set the simulation`
	`mode flag in the according UART's control register.`

`[NOTE]`	`[NOTE]`
`The UART simulation output (to file and to screen) outputs "complete lines" at once. A line is`	`The UART simulation output (to file and to screen) outputs "complete lines" at once. A line is`
completed with a line feed (newline, ASCII `\n` = 10).	completed with a line feed (newline, ASCII `\n` = 10).

Line 927...	Line 1054...
`----`	`----`
`neorv32/sim$ sh ghdl_sim.sh --stop-time=20ms`	`neorv32/sim$ sh ghdl_sim.sh --stop-time=20ms`
`----`	`----`


	`:sectnums:`
	`=== In-Console Application Simulation`

	`To directly compile and run a program in the console (using the default testbench and GHDL`
	as simulator) you can use the `sim` makefile target. Make sure to use the UART simulation mode
	(`USER_FLAGS+=-DUART0_SIM_MODE` and/or `USER_FLAGS+=-DUART1_SIM_MODE`) to get
	`faster / direct-to-console UART output.`

	`[source, bash]`
	`----`
	`sw/example/blink_led$ make USER_FLAGS+=-DUART0_SIM_MODE clean_all sim`
	`[...]`
	`Blinking LED demo program`
	`----`
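
If both UARTs should use the simulation mode, the two defines can be combined in one call. The sketch below only assembles and prints the command line; `uart_sim_flags` is a hypothetical helper, and it assumes that repeating `USER_FLAGS+=` on the `make` command line appends each define.

```bash
# Hypothetical helper: build the USER_FLAGS arguments for the given UART indices (0 and/or 1).
uart_sim_flags() {
  local flags=() idx
  for idx in "$@"; do
    case "$idx" in
      0|1) flags+=("USER_FLAGS+=-DUART${idx}_SIM_MODE") ;;
      *)   echo "unsupported UART index: $idx" >&2; return 1 ;;
    esac
  done
  echo "${flags[*]}"
}

# Print (do not run) the resulting make call for UART0 and UART1
echo "make $(uart_sim_flags 0 1) clean_all sim"
```

Printing the command first makes it easy to check the flags before starting a longer simulation run.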


	`:sectnums:`
	`=== Hello World!`

	`To do a quick test of the NEORV32, make sure to have https://github.com/ghdl/ghdl[GHDL] and a`
	`https://github.com/stnolting/riscv-gcc-prebuilt[RISC-V gcc toolchain] installed, then navigate to the project's`
	`sw/example/hello_world` folder and run `make USER_FLAGS+=-DUART0_SIM_MODE MARCH=-march=rv32imac clean_all sim`:

	`[TIP]`
	`The simulator will output some _sanity check_ notes (and warnings or even errors if something is ill-configured)`
	`right at the beginning of the simulation to give a brief overview of the actual NEORV32 SoC and CPU configurations.`

	`[source, bash]`
	`----`
	`stnolting@Einstein:/mnt/n/Projects/neorv32/sw/example/hello_world$ make USER_FLAGS+=-DUART0_SIM_MODE MARCH=-march=rv32imac clean_all sim`
	`../../../sw/lib/source/neorv32_uart.c: In function 'neorv32_uart0_setup':`
	`../../../sw/lib/source/neorv32_uart.c:301:4: warning: #warning UART0_SIM_MODE (primary UART) enabled! Sending all UART0.TX data to text.io simulation output instead of real UART0 transmitter. Use this for simulations only! [-Wcpp]`
	`301 \| #warning UART0_SIM_MODE (primary UART) enabled! Sending all UART0.TX data to text.io simulation output instead of real UART0 transmitter. Use this for simulations only!`
	`\| ^~~~~~~`
	`Memory utilization:`
	`text data bss dec hex filename`
	`4612 0 120 4732 127c main.elf`
	`Compiling ../../../sw/image_gen/image_gen`
	`Installing application image to ../../../rtl/core/neorv32_application_image.vhd`
	`Simulating neorv32_application_image.vhd...`
	`Tip: Compile application with USER_FLAGS+=-DUART[0/1]_SIM_MODE to auto-enable UART[0/1]'s simulation mode (redirect UART output to simulator console).`
	`Using simulation runtime args: --stop-time=10ms`
	`../rtl/core/neorv32_top.vhd:347:3:@0ms:(assertion note): NEORV32 PROCESSOR IO Configuration: GPIO MTIME UART0 UART1 SPI TWI PWM WDT CFS SLINK NEOLED XIRQ`
	`../rtl/core/neorv32_top.vhd:370:3:@0ms:(assertion note): NEORV32 PROCESSOR CONFIG NOTE: Boot configuration: Direct boot from memory (processor-internal IMEM).`
	`../rtl/core/neorv32_top.vhd:394:3:@0ms:(assertion note): NEORV32 PROCESSOR CONFIG NOTE: Implementing on-chip debugger (OCD).`
	`../rtl/core/neorv32_cpu.vhd:169:3:@0ms:(assertion note): NEORV32 CPU ISA Configuration (MARCH): RV32IMACU_Zbb_Zicsr_Zifencei_Zfinx_Debug`
	`../rtl/core/neorv32_cpu.vhd:189:3:@0ms:(assertion note): NEORV32 CPU CONFIG NOTE: Implementing NO dedicated hardware reset for uncritical registers (default, might reduce area). Set package constant = TRUE to configure a DEFINED reset value for all CPU registers.`
	`../rtl/core/neorv32_imem.vhd:107:3:@0ms:(assertion note): NEORV32 PROCESSOR CONFIG NOTE: Implementing processor-internal IMEM as ROM (16384 bytes), pre-initialized with application (4612 bytes).`
	`../rtl/core/neorv32_dmem.vhd:89:3:@0ms:(assertion note): NEORV32 PROCESSOR CONFIG NOTE: Implementing processor-internal DMEM (RAM, 8192 bytes).`
	`../rtl/core/neorv32_wishbone.vhd:136:3:@0ms:(assertion note): NEORV32 PROCESSOR CONFIG NOTE: External Bus Interface - Implementing STANDARD Wishbone protocol.`
	`../rtl/core/neorv32_wishbone.vhd:140:3:@0ms:(assertion note): NEORV32 PROCESSOR CONFIG NOTE: External Bus Interface - Implementing auto-timeout (255 cycles).`
	`../rtl/core/neorv32_wishbone.vhd:144:3:@0ms:(assertion note): NEORV32 PROCESSOR CONFIG NOTE: External Bus Interface - Implementing LITTLE-endian byte order.`
	`../rtl/core/neorv32_wishbone.vhd:148:3:@0ms:(assertion note): NEORV32 PROCESSOR CONFIG NOTE: External Bus Interface - Implementing registered RX path.`
	`../rtl/core/neorv32_slink.vhd:161:3:@0ms:(assertion note): NEORV32 PROCESSOR CONFIG NOTE: Implementing 8 RX and 8 TX stream links.`

	`##`
	`## ## ## ##`
	`## ## ######### ######## ######## ## ## ######## ######## ## ################`
	`#### ## ## ## ## ## ## ## ## ## ## ## ## ## #### ####`
	`## ## ## ## ## ## ## ## ## ## ## ## ## ## ###### ##`
	`## ## ## ######### ## ## ######### ## ## ##### ## ## #### ###### ####`
	`## ## ## ## ## ## ## ## ## ## ## ## ## ## ###### ##`
	`## #### ## ## ## ## ## ## ## ## ## ## ## #### ####`
	`## ## ######### ######## ## ## ## ######## ########## ## ################`
	`## ## ## ##`
	`##`
	`Hello world! :)`
	`----`


	`:sectnums:`

Line 1...

Let's Get It Started!

Let's Get It Started!

To make your NEORV32 project run, follow the guides from the upcoming sections. Follow these guides

This user guide uses the NEORV32 project _as is_ from the official `neorv32` repository.

step by step and in the presented order.

To make your first NEORV32 project run, follow the guides from the upcoming sections. It is recommended to

follow these guides step by step and eventually in the presented order.

[TIP]

This guide uses the minimalistic and platform/toolchain agnostic SoC test setups from

`rtl/test_setups` for illustration. You can use one of the provided test setups for

your first FPGA tests. Alternatively, have a look at the `setups` folder,

which provides more sophisticated example setups for various FPGAs/FPGA boards and toolchains.

:sectnums:

:sectnums:

== Software Toolchain Setup

== Software Toolchain Setup

To compile (and debug) executables for the NEORV32 a RISC-V toolchain is required.

To compile (and debug) executables for the NEORV32 a RISC-V toolchain is required.

There are two possibilities to get this:

There are two possibilities to get this:

1. Download and _build_ the official RISC-V GNU toolchain yourself

1. Download and _build_ the official RISC-V GNU toolchain yourself.

2. Download and install a prebuilt version of the toolchain; this might also be done via the package manager / app store of your OS

2. Download and install a prebuilt version of the toolchain; this might also be done via the package manager / app store of your OS

[TIP]

[NOTE]

The default toolchain prefix for this project is **`riscv32-unknown-elf-`**. Of course you can use any other RISC-V

The default toolchain prefix (`RISCV_PREFIX` variable) for this project is **`riscv32-unknown-elf-`**. Of course you can use any other RISC-V

toolchain (like `riscv64-unknown-elf-`) that is capable of emitting code for a `rv32` architecture. Just change the _RISCV_PREFIX_ variable in the application

toolchain (like `riscv64-unknown-elf-`) that is capable of emitting code for a `rv32` architecture. Just change `RISCV_PREFIX`

makefile(s) according to your needs or define this variable when invoking the makefile.

according to your needs.
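
Before building, it can help to check whether a compiler with the chosen prefix is actually reachable. This is a minimal sketch, assuming the toolchain's `bin` folder has been added to `PATH`; `have_toolchain` is a hypothetical helper and not part of the NEORV32 makefiles.

```bash
# Prefix to probe; falls back to the project default named above.
RISCV_PREFIX="${RISCV_PREFIX:-riscv32-unknown-elf-}"

# Hypothetical helper: succeed if "<prefix>gcc" can be found in PATH.
have_toolchain() {
  command -v "${1}gcc" >/dev/null 2>&1
}

if have_toolchain "$RISCV_PREFIX"; then
  echo "found ${RISCV_PREFIX}gcc"
else
  echo "${RISCV_PREFIX}gcc not found in PATH"
fi
```

To probe a different toolchain, export `RISCV_PREFIX` (for example `riscv64-unknown-elf-`) before running the snippet.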

[IMPORTANT]

Keep in mind that – for instance – a rv32imc toolchain only provides library code compiled with

compressed (_C_) and `mul`/`div` instructions (_M_)! Hence, this code cannot be executed (without

emulation) on an architecture without these extensions!

:sectnums:

:sectnums:

=== Building the Toolchain from Scratch

=== Building the Toolchain from Scratch

Line 37...

Line 40...

----

----

riscv-gnu-toolchain$ ./configure --prefix=/opt/riscv --with-arch=rv32i --with-abi=ilp32

riscv-gnu-toolchain$ ./configure --prefix=/opt/riscv --with-arch=rv32i --with-abi=ilp32

riscv-gnu-toolchain$ make

riscv-gnu-toolchain$ make

----

----

[IMPORTANT]

Keep in mind that – for instance – a toolchain built with `--with-arch=rv32imc` only provides library code compiled with

compressed (`C`) and `mul`/`div` instructions (`M`)! Hence, this code cannot be executed (without

emulation) on an architecture without these extensions!

:sectnums:

:sectnums:

=== Downloading and Installing a Prebuilt Toolchain

=== Downloading and Installing a Prebuilt Toolchain

Alternatively, you can download a prebuilt toolchain.

Alternatively, you can download a prebuilt toolchain.

Line 101...

Line 109...

// ####################################################################################################################

// ####################################################################################################################

:sectnums:

:sectnums:

== General Hardware Setup

== General Hardware Setup

This guide will set up a NEORV32 project for FPGA implementation (or simulation only) _from scratch_

This guide shows the basics of setting up a NEORV32 project for FPGA implementation (or simulation only)

_from scratch_. It uses a _simplified_ test "SoC" setup of the processor to keep things simple at the beginning.

This simple setup is intended for evaluation or as "hello world" project to check out the NEORV32

on _your_ FPGA board.

[TIP]

[TIP]

If you want to use a complete pre-defined setup to start with, check out the

If you want to use a more sophisticated pre-defined setup to start with, check out the

project's `setups` folder (https://github.com/stnolting/neorv32/tree/master/setups),

`setups` folder, which provides example setups for various FPGAs, boards and toolchains.

which provides (script-based) demo setups for various FPGA boards and toolchains.

The NEORV32 project features two minimalistic pre-configured test setups in

https://github.com/stnolting/neorv32/blob/master/rtl/test_setups[`rtl/test_setups`].

Both test setups only implement very basic processor and CPU features.

The main difference between the two setups is the processor boot concept - so how to get a software executable

_into_ the processor:

* **`rtl/test_setups/neorv32_testsetup_approm.vhd`**: this setup does not require a connection via UART. The

software executable is "installed" into the bitstream to initialize a read-only memory. Use this setup

if your FPGA board does _not_ provide a UART interface.

* **`rtl/test_setups/neorv32_testsetup_bootloader.vhd`**: this setup uses the UART and the default NEORV32

bootloader to upload new software executables. Use this setup if your board _does_ provide a UART interface.

.NEORV32 "hello world" test setup (`rtl/test_setups/neorv32_testsetup_bootloader.vhd`)

image::neorv32_test_setup.png[align=center]

This tutorial uses a _simplified_ test setup of the processor

.External Clock Source

to keep things simple at the beginning as this setup is intended as

[NOTE]

evaluation or "hello world" project to check out the NEORV32.

These test setups are intended to be directly used as **design top entity**. Of course you can also instantiate them

into another design unit. If your FPGA board only provides _very fast_ external clock sources (like on the FOMU board)

you might need to add clock management components (PLLs, DCMs, MMCMs, ...) to the test setup or to the according top entity

if you instantiate one of the test setups.

[start=1]

[start=1]

. Create a new project with your FPGA EDA tool of choice.

. Create a new project with your FPGA EDA tool of choice.

. Add all VHDL files from the project's `rtl/core` folder to your project. Make sure to _reference_ the

. Add all VHDL files from the project's `rtl/core` folder to your project.

files only – do not copy them.

. Make sure to add all the rtl files to a new library called `neorv32`. If your FPGA tool does not

. Make sure to add all the rtl files to a new library called `neorv32`. If your FPGA tool does not

provide a field to enter the library name, check out the "properties" menu of the added rtl files.

provide a field to enter the library name, check out the "properties" menu of the added rtl files.

. The `rtl/core/neorv32_top.vhd` VHDL file is the top entity of the NEORV32 processor. If you

. The `rtl/core/neorv32_top.vhd` VHDL file is the top entity of the NEORV32 processor, which can be

already have a design, instantiate this unit into your design and proceed.

instantiated into the "real" project. However, in this tutorial we will use one of the pre-defined

test setups from `rtl/test_setups` (see above).

[IMPORTANT]

[IMPORTANT]

Make sure to include the `neorv32` package into your design when instantiating the processor: add

Make sure to include the `neorv32` package into your design when instantiating the processor: add

`library neorv32;` and `use neorv32.neorv32_package.all;` to your design unit.

`library neorv32;` and `use neorv32.neorv32_package.all;` to your design unit.

[start=5]

[start=5]

. If you do not have a design yet and just want to check out the NEORV32 – no problem! This guide

. Add the pre-defined test setup of choice to the project, too, and select it as _top entity_.

uses a simplified top entity, that encapsulates the actual processor top entity: add the

. The entity of both test setups

`rtl/templates/processor/neorv32_ProcessorTop_Test.vhd` VHDL file to your project, too, and

provide a minimal set of configuration generics, that might have to be adapted to match your FPGA and board:

select it as _top entity_.

. This test setup provides a minimal test hardware setup:

.NEORV32 "hello world" test setup

image::neorv32_test_setup.png[align=center]

[start=7]

.Test setup entity - configuration generics

. It only implements some very basic processor and CPU features. Also, only the

minimum number of signals is propagated to the outer world.

. However, a minimal setup-specific configuration of the NEORV32 processor is required to make it run

on your FPGA board of choice. Only the absolutely required modifications will be made while

keeping the default configuration for the remaining configuration options:

.Cut-out of `neorv32_ProcessorTop_Test.vhd` showing the processor instance and its configuration

[source,vhdl]

[source,vhdl]

----

----

neorv32_top_inst: neorv32_top

  generic (

generic map (

    -- adapt these for your setup --

  -- General --

    CLOCK_FREQUENCY   : natural := 100000000; <1>

  CLOCK_FREQUENCY   => 100000000, -- in Hz # <1>

    MEM_INT_IMEM_SIZE : natural := 16*1024;   <2>

  INT_BOOTLOADER_EN => true,

    MEM_INT_DMEM_SIZE : natural := 8*1024     <3>

...

);

  -- Internal instruction memory --

  MEM_INT_IMEM_EN   => true,

  MEM_INT_IMEM_SIZE => 16*1024, -- <2>

  -- Internal data memory --

  MEM_INT_DMEM_EN   => true,

  MEM_INT_DMEM_SIZE => 8*1024, -- <3>

...

----

----

<1> Clock frequency of `clk_i` signal in Hertz

<1> Clock frequency of `clk_i` signal in Hertz

<2> Default size of internal instruction memory: 16kB

<2> Default size of internal instruction memory: 16kB

<3> Default size of internal data memory: 8kB

<3> Default size of internal data memory: 8kB

[start=9]

[start=7]

. There is one generic that has to be set according to your FPGA board setup: the actual clock frequency

. If you feel like it – or if your FPGA does not provide sufficient resources – you can modify the

of the top's clock input signal (`clk_i`). Use the _CLOCK_FREQUENCY_ generic to specify your clock source's

_memory sizes_ (`MEM_INT_IMEM_SIZE` and `MEM_INT_DMEM_SIZE` – marked with notes "2" and "3"). But as mentioned

frequency in Hertz (Hz) (note "1").

above, let's keep things simple at first and use the standard configuration for now.

. If you feel like it – or if your FPGA does not provide many resources – you can modify the

. There is one generic that _has to be set according to your FPGA board_ setup: the actual clock frequency

**memory sizes** (_MEM_INT_IMEM_SIZE_ and _MEM_INT_DMEM_SIZE_ – marked with notes "2" and "3") or even

of the top's clock input signal (`clk_i`). Use the `CLOCK_FREQUENCY` generic to specify your clock source's

exclude certain ISA extensions and peripheral modules from implementation - but as mentioned above, let's keep things

frequency in Hertz (Hz).

simple at first and use the standard configuration for now.

[NOTE]

[NOTE]

If you have changed the default memory configuration (_MEM_INT_IMEM_SIZE_ and _MEM_INT_DMEM_SIZE_ generics)

If you have changed the default memory configuration (`MEM_INT_IMEM_SIZE` and `MEM_INT_DMEM_SIZE` generics)

keep those new sizes in mind – these values are required for setting

keep those new sizes in mind – these values are required for setting

up the software framework in the next section <<_general_software_framework_setup>>.

up the software framework in the next section <<_general_software_framework_setup>>.
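
A quick size check can show whether an application still fits after changing the memory configuration. The sketch below assumes the default IMEM size from the listing above and treats the image size as roughly the `text` plus `data` values reported by the toolchain's size output; `fits_in_imem` is a hypothetical helper and the byte counts are example numbers.

```bash
# IMEM size in bytes; default taken from the test setup generics.
MEM_INT_IMEM_SIZE=$((16 * 1024))

# Hypothetical helper: succeed if <text> + <data> bytes fit into the IMEM.
fits_in_imem() {
  local text="$1" data="$2"
  [ $((text + data)) -le "$MEM_INT_IMEM_SIZE" ]
}

# Example numbers: 4612 bytes of text, 0 bytes of data
if fits_in_imem 4612 0; then
  echo "image fits into IMEM ($MEM_INT_IMEM_SIZE bytes)"
else
  echo "image too large for IMEM ($MEM_INT_IMEM_SIZE bytes)"
fi
```

Adjust `MEM_INT_IMEM_SIZE` in the snippet whenever the corresponding generic is changed.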

[start=11]

[start=9]

. Depending on your FPGA tool of choice, it is time to assign the signals of the test setup top entity to

. Depending on your FPGA tool of choice, it is time to assign the signals of the test setup top entity to

the according pins of your FPGA board. All the signals can be found in the entity declaration:

the according pins of your FPGA board. All the signals can be found in the entity declaration of the

corresponding test setup:

.Entity signals of `neorv32_test_setup.vhd`

.Entity signals of `neorv32_testsetup_approm.vhd`

[source,vhdl]

[source,vhdl]

----

----

entity neorv32_test_setup is

  port (

  port (

    -- Global control --

    -- Global control --

    clk_i       : in std_ulogic := '0'; -- global clock, rising edge

    clk_i       : in  std_ulogic; -- global clock, rising edge

    rstn_i      : in std_ulogic := '0'; -- global reset, low-active, async

    rstn_i      : in  std_ulogic; -- global reset, low-active, async

    -- GPIO --

    gpio_o      : out std_ulogic_vector(7 downto 0) -- parallel output

);

----

.Entity signals of `neorv32_testsetup_bootloader.vhd`

[source,vhdl]

----

  port (

    -- Global control --

    clk_i       : in  std_ulogic; -- global clock, rising edge

    rstn_i      : in  std_ulogic; -- global reset, low-active, async

    -- GPIO --

    -- GPIO --

    gpio_o      : out std_ulogic_vector(7 downto 0); -- parallel output

    gpio_o      : out std_ulogic_vector(7 downto 0); -- parallel output

    -- UART0 --

    -- UART0 --

    uart0_txd_o : out std_ulogic; -- UART0 send data

    uart0_txd_o : out std_ulogic; -- UART0 send data

    uart0_rxd_i : in std_ulogic := '0' -- UART0 receive data

    uart0_rxd_i : in  std_ulogic  -- UART0 receive data

);

);

end neorv32_test_setup;

----

----

[start=12]

.Signal Polarity

[NOTE]

If your FPGA board has inverse polarity for certain input/output you can add `not` gates. Example: The reset signal

`rstn_i` is low-active by default; the LEDs connected to `gpio_o` are high-active by default.

You can do this in your board top if you instantiate the test setup,

or _inside_ the test setup if this is your top entity (low-active LEDs example: `gpio_o <= NOT con_gpio_o(7 downto 0);`).

[start=10]

. Attach the clock input `clk_i` to your clock source and connect the reset line `rstn_i` to a button of

. Attach the clock input `clk_i` to your clock source and connect the reset line `rstn_i` to a button of

your FPGA board. Check whether it is low-active or high-active – the reset signal of the processor is

your FPGA board. Check whether it is low-active or high-active – the reset signal of the processor is

**low-active**, so maybe you need to invert the input signal.

**low-active**, so maybe you need to invert the input signal.

. If possible, connect at least bit `0` of the GPIO output port `gpio_o` to a high-active LED (invert

. If possible, connect _at least_ bit `0` of the GPIO output port `gpio_o` to a LED (see "Signal Polarity" note above).

the signal when your LEDs are low-active). This LED will be used as status LED for the setup.

. Finally, if you are using the UART-based test setup (`neorv32_testsetup_bootloader.vhd`)

. Finally, if your FPGA board provides a serial host interface (USB-to-serial converter),

connect the UART communication signals `uart0_txd_o` and `uart0_rxd_i` to the host interface (e.g. USB-UART converter).

connect the UART communication signals `uart0_txd_o` and `uart0_rxd_i`.

. Perform the project HDL compilation (synthesis, mapping, bitstream generation).

. Perform the project HDL compilation (synthesis, mapping, bitstream generation).

. Program the generated bitstream into your FPGA and press the button connected to the reset signal.

. Program the generated bitstream into your FPGA and press the button connected to the reset signal.

. Done! The assigned status LED should be flashing now for some seconds before permanently lighting up.

. Done! The LED at `gpio_o(0)` should be flashing now.

[TIP]

After the GCC toolchain for compiling RISC-V source code is ready (chapter <<_general_software_framework_setup>>),

you can advance to one of these chapters to learn how to get a software executable into your processor setup:

* If you are using the `neorv32_testsetup_approm.vhd` setup: See section <<_installing_an_executable_directly_into_memory>>.

* If you are using the `neorv32_testsetup_bootloader.vhd` setup: See section <<_uploading_and_starting_of_a_binary_executable_image_via_uart>>.

// ####################################################################################################################

// ####################################################################################################################

Line 600...

Line 631...

// ####################################################################################################################

// ####################################################################################################################

:sectnums:

:sectnums:

== Application-Specific Processor Configuration

Due to the processor's configuration options, which are mainly defined via the top entity VHDL generics, the SoC

can be tailored to the application-specific requirements. Note that this chapter does not focus on optional

_SoC features_ like IO/peripheral modules. It rather gives ideas on how to optimize for _overall goals_

like performance and area.

[NOTE]

Please keep in mind that optimizing the design in one direction (like performance) will also affect other potential

optimization goals (like area and energy).

=== Optimize for Performance

The following points show some concepts to optimize the processor for performance regardless of the costs

(i.e. increasing area and energy requirements):

* Enable all performance-related RISC-V CPU extensions that implement dedicated hardware accelerators instead

of emulating operations entirely in software:  `M`, `C`, `Zfinx`

* Enable mapping of complex CPU operations to dedicated hardware: `FAST_MUL_EN => true` to use DSP slices for

multiplications, `FAST_SHIFT_EN => true` to use a fast barrel shifter for shift operations.

* Implement the instruction cache: `ICACHE_EN => true`

* Use as much _internal_ memory as possible to reduce memory access latency: `MEM_INT_IMEM_EN => true` and

`MEM_INT_DMEM_EN => true`, maximize `MEM_INT_IMEM_SIZE` and `MEM_INT_DMEM_SIZE`

* Increase the CPU's instruction prefetch buffer size: `CPU_IPB_ENTRIES`

* _To be continued..._

=== Optimize for Size

The NEORV32 is a size-optimized processor system that is intended to fit into tiny niches within large SoC

designs or to be used as a customized microcontroller in really tiny / low-power FPGAs (like Lattice iCE40).

Here are some ideas how to make the processor even smaller while maintaining its _general purpose system_

concept and maximum RISC-V compatibility.

**SoC**

* This is obvious, but exclude all unused optional IO/peripheral modules from synthesis via the processor

configuration generics.

* If an IO module provides an option to configure the number of "channels", constrain this number to the

actually required value (e.g. the PWM module `IO_PWM_NUM_CH` or the external interrupt controller `XIRQ_NUM_CH`).

* Reduce the FIFO sizes of implemented modules (e.g. `SLINK_TX_FIFO`).

* Disable the instruction cache (`ICACHE_EN => false`) if the design only uses processor-internal IMEM

and DMEM memories.

* _To be continued..._

**CPU**

* Use the _embedded_ RISC-V CPU architecture extension (`CPU_EXTENSION_RISCV_E`) to reduce block RAM utilization.

* The compressed instructions extension (`CPU_EXTENSION_RISCV_C`) requires additional logic for the decoder but

also reduces program code size by approximately 30%.

* If not explicitly used/required, constrain the CPU's counter sizes: `CPU_CNT_WIDTH` for `[m]instret[h]`

(number of instructions) and `[m]cycle[h]` (number of cycles) counters. You can even remove these counters

by setting `CPU_CNT_WIDTH => 0` if they are not used at all (note, this is not RISC-V compliant).

* Reduce the CPU's prefetch buffer size (`CPU_IPB_ENTRIES`).

* Map CPU shift operations to a small and iterative shifter unit (`FAST_SHIFT_EN => false`).

* If you have unused DSP blocks available, you can map multiplication operations to those slices instead of

using LUTs to implement the multiplier (`FAST_MUL_EN => true`).

* If there is no need to execute division in hardware, use the `Zmmul` extension instead of the full-scale

`M` extension.

* Disable CPU extensions that are not explicitly used (`A`, `U`, `Zfinx`).

* _To be continued..._

=== Optimize for Clock Speed

The NEORV32 Processor and CPU are designed to provide minimal logic between register stages to keep the

critical path as short as possible. When enabling additional extensions or modules the impact on the existing

logic is also kept at a minimum to prevent timing degradation. If there is a major impact on existing

logic (example: many physical memory protection address configuration registers) the VHDL code automatically

adds additional register stages to maintain critical path length. Obviously, this increases operation latency.

In order to optimize for a minimal critical path (= maximum clock speed) the following points should be considered:

* Complex CPU extensions (in terms of hardware requirements) should be avoided (examples: floating-point unit, physical memory protection).

* Large carry chains (>32-bit) should be avoided (constrain CPU counter sizes: e.g. `CPU_CNT_WIDTH => 32` and `HPM_CNT_WIDTH => 32`).

* If the target FPGA provides sufficient DSP resources, CPU multiplication operations can be mapped to DSP slices (`FAST_MUL_EN => true`)

reducing LUT usage and critical path impact while also increasing overall performance.

* Use the synchronous (registered) RX path configuration of the external memory interface (`MEM_EXT_ASYNC_RX => false`).

* _To be continued..._

[NOTE]

The short and fixed-length critical path allows the core to be integrated into existing clock domains.

So no clock domain-crossing and no sub-clock generation is required. However, for very high clock

frequencies (this is technology / platform dependent) clock domain crossing becomes crucial for chip-internal

connections.

=== Optimize for Energy

There are no _dedicated_ configuration options to optimize the processor for energy (minimal consumption;

energy/instruction ratio) yet. However, a reduced processor area (<<_optimize_for_size>>) will also reduce

static energy consumption.

To optimize your setup for low-power applications, you can make use of the CPU sleep mode (`wfi` instruction).

Put the CPU to sleep mode whenever possible. Disable all processor modules that are not actually used (exclude them

from synthesis if they will _never_ be used; disable the module via its control register if the module is not

_currently_ used). When in sleep mode, you can keep a timer module running (MTIME or the watch dog) to wake up

the CPU again. Since the wake up is triggered by _any_ interrupt, the external interrupt controller can also

be used to wake up the CPU again. By this, all timers (and all other modules) can be deactivated as well.

.Processor-internal clock generator shutdown

[TIP]

If _no_ IO/peripheral module is currently enabled, the processor's internal clock generator circuit will be

shut down reducing switching activity and thus, dynamic energy consumption.

// ####################################################################################################################

:sectnums:

== Customizing the Internal Bootloader

== Customizing the Internal Bootloader

The NEORV32 bootloader provides several options to configure and customize it for a certain application setup.

The NEORV32 bootloader provides several options to configure and customize it for a certain application setup.

This configuration is done by passing _defines_ when compiling the bootloader. Of course you can also

This configuration is done by passing _defines_ when compiling the bootloader. Of course you can also

modify the bootloader source code to provide a setup that perfectly fits your needs.

modify the bootloader source code to provide a setup that perfectly fits your needs.

Line 630...

Line 770...

4+^| Boot configuration

4+^| Boot configuration

| `AUTO_BOOT_SPI_EN`  | `0` | `0`, `1` | Set `1` to enable immediate boot from external SPI flash

| `AUTO_BOOT_SPI_EN`  | `0` | `0`, `1` | Set `1` to enable immediate boot from external SPI flash

| `AUTO_BOOT_OCD_EN`  | `0` | `0`, `1` | Set `1` to enable boot via on-chip debugger (OCD)

| `AUTO_BOOT_OCD_EN`  | `0` | `0`, `1` | Set `1` to enable boot via on-chip debugger (OCD)

| `AUTO_BOOT_TIMEOUT` | `8` | _any_ | Time in seconds after the auto-boot sequence starts (if there is no UART input by user); set to 0 to disable the auto-boot sequence

| `AUTO_BOOT_TIMEOUT` | `8` | _any_ | Time in seconds after the auto-boot sequence starts (if there is no UART input by user); set to 0 to disable the auto-boot sequence

4+^| SPI configuration

4+^| SPI configuration

| `SPI_EN`                | `1` | `0`, `1` | Set `1` to enable the usage of the SPI module (including load/store executables from/to SPI flash options)

| `SPI_FLASH_CS`          | `0` | `0` ... `7` | SPI chip select output (`spi_csn_o`) for selecting flash

| `SPI_FLASH_CS`          | `0` | `0` ... `7` | SPI chip select output (`spi_csn_o`) for selecting flash

| `SPI_FLASH_SECTOR_SIZE` | `65536` | _any_ | SPI flash sector size in bytes

| `SPI_FLASH_SECTOR_SIZE` | `65536` | _any_ | SPI flash sector size in bytes

| `SPI_FLASH_CLK_PRSC`    | `CLK_PRSC_8`  | `CLK_PRSC_2` `CLK_PRSC_4` `CLK_PRSC_8` `CLK_PRSC_64` `CLK_PRSC_128` `CLK_PRSC_1024` `CLK_PRSC_2048` `CLK_PRSC_4096` | SPI clock pre-scaler (dividing main processor clock)

| `SPI_FLASH_CLK_PRSC`    | `CLK_PRSC_8`  | `CLK_PRSC_2` `CLK_PRSC_4` `CLK_PRSC_8` `CLK_PRSC_64` `CLK_PRSC_128` `CLK_PRSC_1024` `CLK_PRSC_2048` `CLK_PRSC_4096` | SPI clock pre-scaler (dividing main processor clock)

| `SPI_BOOT_BASE_ADDR`    | `0x08000000` | _any_ 32-bit value | Defines the _base_ address of the executable in external flash

| `SPI_BOOT_BASE_ADDR`    | `0x08000000` | _any_ 32-bit value | Defines the _base_ address of the executable in external flash

|=======================

|=======================

Line 818...

Line 959...

// ####################################################################################################################

// ####################################################################################################################

:sectnums:

:sectnums:

== Simulating the Processor

== Simulating the Processor

.WORK IN PROGRESS

[WARNING]

This Section Is Under Construction! +

FIXME!

:sectnums:

:sectnums:

=== Testbench

=== Testbench

The NEORV32 project features a simple default testbench (`sim/neorv32_tb.simple.vhd`) that can be used to simulate
and test the processor setup. This testbench features a 100MHz clock and enables all optional peripheral and
CPU extensions except for the `E` extension and the TRNG IO module (that CANNOT be simulated due to its
combinatorial (looped) oscillator architecture).

The NEORV32 project features a simple, plain-VHDL (no third-party libraries) default testbench (`sim/neorv32_tb.simple.vhd`)
that can be used to simulate and test the processor setup. This testbench features a 100MHz clock and enables all optional
peripheral and CPU extensions except for the `E` extension and the TRNG IO module (that CANNOT be simulated due to its
combinatorial (looped) architecture).

The simulation setup is configured via the "User Configuration" section located right at the beginning of
the testbench's architecture. Each configuration constant provides comments to explain the functionality.

Besides the actual NEORV32 Processor, the testbench also simulates "external" components that are connected

Line 858...

Line 993...

| `0x80000000` | `dmem_size_c` | `r/w/e,  a, 8/16/32` | external DMEM

| `0xf0000000` |      64 bytes | `r/w/e, !a, 8/16/32` | external "IO" memory, atomic accesses will fail

| `0xff000000` |       4 bytes | `-/w/-,  a,  -/-/32` | memory-mapped register to trigger "machine external", "machine software" and "SoC Fast Interrupt" interrupts

|=======================

|=======================
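
The rows above can be read as a small address decoder. The Python sketch below models only the three regions visible in this excerpt. It assumes that the attribute column means read/write/execute, atomic access allowed (`a`) or not (`!a`), and permitted access widths in bits. The 8 KiB value passed for `dmem_size_c` and all helper names are hypothetical; the real size is set in the testbench.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Region:
    base: int
    size: int
    read: bool
    write: bool
    execute: bool
    atomic: bool
    widths: Tuple[int, ...]
    name: str


def make_map(dmem_size: int) -> List[Region]:
    """Build the regions listed in the table (dmem_size stands in for dmem_size_c)."""
    return [
        Region(0x80000000, dmem_size, True, True, True, True, (8, 16, 32), "external DMEM"),
        Region(0xF0000000, 64, True, True, True, False, (8, 16, 32), 'external "IO" memory'),
        Region(0xFF000000, 4, False, True, False, True, (32,), "interrupt trigger register"),
    ]


def find_region(regions: List[Region], addr: int) -> Optional[Region]:
    """Return the region containing addr, or None if the address is unmapped."""
    for region in regions:
        if region.base <= addr < region.base + region.size:
            return region
    return None


def access_ok(regions: List[Region], addr: int, kind: str, width: int, atomic: bool = False) -> bool:
    """Check an access of kind 'r', 'w' or 'x' against the region attributes."""
    region = find_region(regions, addr)
    if region is None:
        return False
    allowed = {"r": region.read, "w": region.write, "x": region.execute}[kind]
    return allowed and width in region.widths and (region.atomic or not atomic)
```

With this model, an atomic access to the "IO" memory at `0xf0000000` is rejected, which matches the table's remark that atomic accesses will fail there.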

The simulated NEORV32 does not use the bootloader and directly boots the current application image (from
the `rtl/core/neorv32_application_image.vhd` image file). Make sure to use the `all` target of the
makefile to install your application as VHDL image after compilation:

[source, bash]
----
sw/example/blink_led$ make clean_all all
----

.Simulation-Optimized CPU/Processors Modules
[NOTE]
The `sim/rtl_modules` folder provides simulation-optimized versions of certain CPU/processor modules.
These alternatives can be used to replace the default CPU/processor HDL files to allow faster/easier/more
efficient simulation. **These files are not intended for synthesis!**

[NOTE]
The simulated NEORV32 does not use the bootloader and _directly boots_ the current application image (from
the `rtl/core/neorv32_application_image.vhd` image file).

**Simulation Console Output**

Data written to the NEORV32 UART0 / UART1 transmitter is sent to a virtual UART receiver implemented
as part of the testbench. Received chars are sent to the simulator console and are also stored to a log file
(`neorv32.testbench_uart0.out` for UART0, `neorv32.testbench_uart1.out` for UART1) inside the simulator home folder.

.UART output during simulation
[NOTE]
Data written to the NEORV32 UART0 / UART1 transmitter is sent to a virtual UART receiver implemented
as part of the testbench. Received chars are sent to the simulator console and are also stored to a log file
(`neorv32.testbench_uart0.out` for UART0, `neorv32.testbench_uart1.out` for UART1) inside the simulation's home folder.
**Please note that printing via the native UART receiver takes a lot of time.** For faster simulation console output
see section <<_faster_simulation_console_output>>.
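
A back-of-the-envelope estimate shows why output through the native UART is slow in simulation: every character occupies the serial line for a full frame, and the simulator has to evaluate every clock cycle in between. The Python sketch below assumes 19200 baud and 8N1 framing (10 bits per character). Both are assumptions chosen for illustration and are not stated in this section; only the 100 MHz clock comes from the testbench description.

```python
def uart_char_time_s(baud: int, bits_per_frame: int = 10) -> float:
    """Time on the wire for one character (start + 8 data + stop bits by default)."""
    return bits_per_frame / baud


def uart_char_cycles(f_clk_hz: int, baud: int, bits_per_frame: int = 10) -> float:
    """Number of processor clock cycles that elapse while one character is sent."""
    return f_clk_hz * bits_per_frame / baud


if __name__ == "__main__":
    message = "Hello world! :)\n"
    cycles = len(message) * uart_char_cycles(100_000_000, 19200)
    print(f"{len(message)} chars take about {cycles:,.0f} simulated clock cycles")
```

Under these assumptions a single character costs tens of thousands of simulated clock cycles, so even short messages dominate the simulation time.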

:sectnums:
=== Faster Simulation Console Output

Line 907...

Line 1033...

[source, bash]
----
sw/example/blink_led$ make USER_FLAGS+=-DUART0_SIM_MODE clean_all all
----

The provided define will change the default UART0/UART1 setup function in order to set the simulation
mode flag in the corresponding UART's control register.

[NOTE]
The UART simulation output (to file and to screen) outputs "complete lines" at once. A line is
completed with a line feed (newline, ASCII `\n` = 10).
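
The line-wise behaviour described in the note can be pictured with a small model. The Python sketch below is not the testbench implementation (which is written in VHDL); the class name `LineBufferedSink` and its `emit` callback are hypothetical and only mirror the described behaviour: characters are collected and handed on as one line when a line feed arrives.

```python
from typing import Callable, List


class LineBufferedSink:
    """Collects characters and emits a complete line on each line feed."""

    def __init__(self, emit: Callable[[str], None]) -> None:
        self._emit = emit
        self._buf: List[str] = []

    def put_char(self, ch: str) -> None:
        if ch == "\n":
            # Line feed completes the line: hand it over and start a new one.
            self._emit("".join(self._buf))
            self._buf.clear()
        else:
            self._buf.append(ch)


if __name__ == "__main__":
    sink = LineBufferedSink(print)
    for ch in "Blinking LED demo program\nno newline yet":
        sink.put_char(ch)
```

One practical consequence in this model is that text without a terminating `\n` stays in the buffer, so a program's final message should end with a newline if it is expected to show up.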

Line 927...

Line 1054...

----
neorv32/sim$ sh ghdl_sim.sh --stop-time=20ms
----

:sectnums:
=== In-Console Application Simulation

To directly compile and run a program in the console (using the default testbench and GHDL
as simulator) you can use the `sim` makefile target. Make sure to use the UART simulation mode
(`USER_FLAGS+=-DUART0_SIM_MODE` and/or `USER_FLAGS+=-DUART1_SIM_MODE`) to get
faster / direct-to-console UART output.

[source, bash]
----
sw/example/blink_led$ make USER_FLAGS+=-DUART0_SIM_MODE clean_all sim
[...]
Blinking LED demo program
----
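
When simulations are started from a script, it can help to assemble the `make` invocation programmatically. The Python sketch below only builds the argument list from the flags and targets shown in this guide; the function name `build_sim_command` and its parameters are hypothetical and not part of the NEORV32 repository.

```python
from typing import List, Optional


def build_sim_command(uart0_sim: bool = True,
                      uart1_sim: bool = False,
                      march: Optional[str] = None) -> List[str]:
    """Assemble the argument list for the application makefile's `sim` target."""
    cmd = ["make"]
    if uart0_sim:
        cmd.append("USER_FLAGS+=-DUART0_SIM_MODE")
    if uart1_sim:
        cmd.append("USER_FLAGS+=-DUART1_SIM_MODE")
    if march is not None:
        cmd.append(f"MARCH=-march={march}")
    cmd += ["clean_all", "sim"]
    return cmd


if __name__ == "__main__":
    # Print the command instead of running it; a toolchain and GHDL are required to execute it.
    print(" ".join(build_sim_command(march="rv32imac")))
```

The resulting list could be passed to `subprocess.run(cmd, cwd=...)` with the working directory set to an example folder such as `sw/example/blink_led`.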

:sectnums:
=== Hello World!

To do a quick test of the NEORV32, make sure that https://github.com/ghdl/ghdl[GHDL] and a
https://github.com/stnolting/riscv-gcc-prebuilt[RISC-V gcc toolchain] are installed. Then navigate to the project's
`sw/example/hello_world` folder and run `make USER_FLAGS+=-DUART0_SIM_MODE MARCH=-march=rv32imac clean_all sim`.

[TIP]
The simulator will output some _sanity check_ notes (and warnings or even errors if something is ill-configured)
right at the beginning of the simulation to give a brief overview of the actual NEORV32 SoC and CPU configurations.

[source, bash]
----
stnolting@Einstein:/mnt/n/Projects/neorv32/sw/example/hello_world$ make USER_FLAGS+=-DUART0_SIM_MODE MARCH=-march=rv32imac clean_all sim
../../../sw/lib/source/neorv32_uart.c: In function 'neorv32_uart0_setup':
../../../sw/lib/source/neorv32_uart.c:301:4: warning: #warning UART0_SIM_MODE (primary UART) enabled! Sending all UART0.TX data to text.io simulation output instead of real UART0 transmitter. Use this for simulations only! [-Wcpp]
  301 |   #warning UART0_SIM_MODE (primary UART) enabled! Sending all UART0.TX data to text.io simulation output instead of real UART0 transmitter. Use this for simulations only!
      |    ^~~~~~~
Memory utilization:
   text    data     bss     dec     hex filename
   4612       0     120    4732    127c main.elf
Compiling ../../../sw/image_gen/image_gen
Installing application image to ../../../rtl/core/neorv32_application_image.vhd
Simulating neorv32_application_image.vhd...
Tip: Compile application with USER_FLAGS+=-DUART[0/1]_SIM_MODE to auto-enable UART[0/1]'s simulation mode (redirect UART output to simulator console).
Using simulation runtime args: --stop-time=10ms
../rtl/core/neorv32_top.vhd:347:3:@0ms:(assertion note): NEORV32 PROCESSOR IO Configuration: GPIO MTIME UART0 UART1 SPI TWI PWM WDT CFS SLINK NEOLED XIRQ
../rtl/core/neorv32_top.vhd:370:3:@0ms:(assertion note): NEORV32 PROCESSOR CONFIG NOTE: Boot configuration: Direct boot from memory (processor-internal IMEM).
../rtl/core/neorv32_top.vhd:394:3:@0ms:(assertion note): NEORV32 PROCESSOR CONFIG NOTE: Implementing on-chip debugger (OCD).
../rtl/core/neorv32_cpu.vhd:169:3:@0ms:(assertion note): NEORV32 CPU ISA Configuration (MARCH): RV32IMACU_Zbb_Zicsr_Zifencei_Zfinx_Debug
../rtl/core/neorv32_cpu.vhd:189:3:@0ms:(assertion note): NEORV32 CPU CONFIG NOTE: Implementing NO dedicated hardware reset for uncritical registers (default, might reduce area). Set package constant  = TRUE to configure a DEFINED reset value for all CPU registers.
../rtl/core/neorv32_imem.vhd:107:3:@0ms:(assertion note): NEORV32 PROCESSOR CONFIG NOTE: Implementing processor-internal IMEM as ROM (16384 bytes), pre-initialized with application (4612 bytes).
../rtl/core/neorv32_dmem.vhd:89:3:@0ms:(assertion note): NEORV32 PROCESSOR CONFIG NOTE: Implementing processor-internal DMEM (RAM, 8192 bytes).
../rtl/core/neorv32_wishbone.vhd:136:3:@0ms:(assertion note): NEORV32 PROCESSOR CONFIG NOTE: External Bus Interface - Implementing STANDARD Wishbone protocol.
../rtl/core/neorv32_wishbone.vhd:140:3:@0ms:(assertion note): NEORV32 PROCESSOR CONFIG NOTE: External Bus Interface - Implementing auto-timeout (255 cycles).
../rtl/core/neorv32_wishbone.vhd:144:3:@0ms:(assertion note): NEORV32 PROCESSOR CONFIG NOTE: External Bus Interface - Implementing LITTLE-endian byte order.
../rtl/core/neorv32_wishbone.vhd:148:3:@0ms:(assertion note): NEORV32 PROCESSOR CONFIG NOTE: External Bus Interface - Implementing registered RX path.
../rtl/core/neorv32_slink.vhd:161:3:@0ms:(assertion note): NEORV32 PROCESSOR CONFIG NOTE: Implementing 8 RX and 8 TX stream links.
                                                                                       ##
                                                                                       ##         ##   ##   ##
 ##     ##   #########   ########    ########   ##      ##   ########    ########      ##       ################
####    ##  ##          ##      ##  ##      ##  ##      ##  ##      ##  ##      ##     ##     ####            ####
## ##   ##  ##          ##      ##  ##      ##  ##      ##          ##         ##      ##       ##   ######   ##
##  ##  ##  #########   ##      ##  #########   ##      ##      #####        ##        ##     ####   ######   ####
##   ## ##  ##          ##      ##  ##    ##     ##    ##           ##     ##          ##       ##   ######   ##
##    ####  ##          ##      ##  ##     ##     ##  ##    ##      ##   ##            ##     ####            ####
##     ##    #########   ########   ##      ##      ##       ########   ##########     ##       ################
                                                                                       ##         ##   ##   ##
                                                                                       ##
Hello world! :)
----
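
The _sanity check_ notes in the transcript follow a regular `file:line:column:@time:(kind): message` shape, so they can be pulled out of a captured log for a quick configuration summary. The Python sketch below is a hedged example: the regular expression is derived only from the lines shown above (other GHDL versions or simulators may format assertions differently), and the helper names `parse_notes` and `isa_string` are hypothetical.

```python
import re
from typing import Dict, List, Optional

# Pattern inferred from the transcript, e.g.
# ../rtl/core/neorv32_cpu.vhd:169:3:@0ms:(assertion note): NEORV32 CPU ISA Configuration ...
NOTE_RE = re.compile(
    r"^(?P<file>[^:\s]+):(?P<line>\d+):(?P<col>\d+):@(?P<time>[^:]+):"
    r"\((?P<kind>[^)]+)\): (?P<msg>.*)$"
)

ISA_PREFIX = "NEORV32 CPU ISA Configuration (MARCH): "


def parse_notes(text: str) -> List[Dict[str, str]]:
    """Return one dict per simulator assertion line found in the log text."""
    notes = []
    for raw in text.splitlines():
        match = NOTE_RE.match(raw.strip())
        if match:
            notes.append(match.groupdict())
    return notes


def isa_string(notes: List[Dict[str, str]]) -> Optional[str]:
    """Extract the reported ISA configuration string, if present."""
    for note in notes:
        if note["msg"].startswith(ISA_PREFIX):
            return note["msg"][len(ISA_PREFIX):]
    return None
```

Compiler warnings such as the `neorv32_uart.c:301:4: warning:` line do not match the pattern because they lack the `@time` field, so only simulator assertions are collected.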

:sectnums:
=== Advanced Simulation using VUNIT

.WORK IN PROGRESS
[WARNING]
This Section Is Under Construction! +
FIXME!

The NEORV32 provides a more sophisticated simulation setup using https://vunit.github.io/[VUNIT].
The corresponding VUNIT-based testbench is `sim/neorv32_tb.vhd`.

**WORK-IN-PROGRESS**

// ####################################################################################################################

:sectnums:
== Building the Documentation


[/] [neorv32/] [trunk/] [docs/] [userguide/] [content.adoc] - Diff between revs 62 and 63