Let's Get It Started!

This user guide uses the NEORV32 project _as is_ from the official `neorv32` repository.
To make your first NEORV32 project run, follow the guides from the upcoming sections. It is recommended to
follow these guides step by step and, ideally, in the presented order.

[TIP]
This guide uses the minimalistic and platform/toolchain agnostic SoC test setups from
`rtl/test_setups` for illustration. You can use one of the provided test setups for
your first FPGA tests. Alternatively, have a look at the `setups` folder,
which provides more sophisticated example setups for various FPGAs/FPGA boards and toolchains.

:sectnums:
== Software Toolchain Setup

To compile (and debug) executables for the NEORV32, a RISC-V toolchain is required.
There are two possibilities to get this:

1. Download and _build_ the official RISC-V GNU toolchain yourself.
2. Download and install a prebuilt version of the toolchain; this might also be done via the package manager / app store of your OS.

[NOTE]
The default toolchain prefix (`RISCV_PREFIX` variable) for this project is **`riscv32-unknown-elf-`**. Of course you can use any other RISC-V
toolchain (like `riscv64-unknown-elf-`) that is capable of emitting code for a `rv32` architecture. Just change `RISCV_PREFIX`
according to your needs.

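
As a rough illustration of choosing `RISCV_PREFIX`, the following sketch prints the first of the two prefixes named above whose GCC is found on the `PATH`. The helper name `pick_riscv_prefix` is hypothetical and not part of the NEORV32 project. Passing the result to `make` on the command line relies on standard GNU make variable overriding; it is an assumption that the project makefile does not prevent this.

```shell
# Hypothetical helper (not part of the NEORV32 makefiles): print the first
# toolchain prefix whose GCC executable is found on the PATH.
pick_riscv_prefix() {
  for prefix in riscv32-unknown-elf- riscv64-unknown-elf-; do
    if command -v "${prefix}gcc" >/dev/null 2>&1; then
      printf '%s\n' "$prefix"
      return 0
    fi
  done
  return 1  # no matching toolchain found
}

# Usage sketch (assumption: the makefile accepts RISCV_PREFIX from the command line):
# make RISCV_PREFIX="$(pick_riscv_prefix)" clean_all exe
```

If neither prefix is found the function returns a non-zero status, so a calling script can stop before invoking `make`.
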

:sectnums:
=== Building the Toolchain from Scratch

To build the toolchain by yourself you can follow the guide from the official https://github.com/riscv-collab/riscv-gnu-toolchain GitHub page.
You need to make sure the generated toolchain fits the architecture of the NEORV32 core. To get a toolchain that even supports minimal
ISA extension configurations, it is recommended to compile for `rv32i` only. Please note that a toolchain built for this minimal ISA can still
generate code for further ISA extensions like `m` or `c`. Of course you can use a _multilib_ approach to generate toolchains for several target ISAs at once.

.Configuring GCC build for `rv32i` (minimal ISA)
[source,bash]
----
riscv-gnu-toolchain$ ./configure --prefix=/opt/riscv --with-arch=rv32i --with-abi=ilp32
riscv-gnu-toolchain$ make
----

[IMPORTANT]
Keep in mind that - for instance - a toolchain built with `--with-arch=rv32imc` only provides library code compiled with
compressed (`C`) and `mul`/`div` instructions (`M`)! Hence, this code cannot be executed (without
emulation) on an architecture without these extensions!

:sectnums:
=== Downloading and Installing a Prebuilt Toolchain

Alternatively, you can download a prebuilt toolchain.

:sectnums:
==== Use the Toolchain I Have Built

I have compiled a GCC toolchain on a 64-bit x86 Ubuntu (Ubuntu on Windows, actually) and uploaded it to
GitHub. You can directly download the corresponding toolchain archive as a single _zip-file_ within a packed
release from https://github.com/stnolting/riscv-gcc-prebuilt.

Unpack the downloaded toolchain archive and copy the content to a location in your file system (e.g.
`/opt/riscv`). More information about downloading and installing my prebuilt toolchains can be found in
the repository's README.

:sectnums:
==== Use a Third Party Toolchain

Of course you can also use any other prebuilt version of the toolchain. There are a lot of RISC-V GCC packages out there -
even for Windows. On Linux systems you might even be able to fetch a toolchain via your distribution's package manager.

[IMPORTANT]
Make sure the toolchain can (also) emit code for a `rv32i` architecture, uses the `ilp32` or `ilp32e` ABI and **was not built** using
CPU extensions that are not supported by the NEORV32 (like `D`).

:sectnums:
=== Installation

Now you have the toolchain binaries. The last step is to add them to your `PATH` environment variable (if you have not
already done so): make sure to add the _binaries_ folder (`bin`) of your toolchain.

[source,bash]
----
$ export PATH=$PATH:/opt/riscv/bin
----

You should add this command to your `.bashrc` (if you are using bash) to automatically add the RISC-V
toolchain at every console start.

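
When the `PATH` extension lives in `.bashrc`, it can be evaluated more than once, for example in nested shells. A minimal sketch of one way to avoid duplicate entries is shown below; the helper name `add_to_path` is hypothetical, and `/opt/riscv/bin` is the example install location used in this guide.

```shell
# Append a directory to PATH only if it is not already listed.
add_to_path() {
  case ":$PATH:" in
    *":$1:"*) ;;             # already present, nothing to do
    *) PATH="$PATH:$1" ;;    # append at the end
  esac
}

add_to_path /opt/riscv/bin
export PATH
```

Appending (rather than prepending) keeps any tools already on the `PATH` ahead of the toolchain binaries; prepending would be the alternative if the RISC-V tools should take priority.
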
:sectnums:
=== Testing the Installation

To make sure everything works fine, navigate to an example project in the NEORV32 example folder and
execute the following command:

[source,bash]
----
neorv32/sw/example/blink_led$ make check
----

This will test all the tools required for generating NEORV32 executables.
Everything is working fine if `Toolchain check OK` appears at the end.

<<<
// ####################################################################################################################
:sectnums:
== General Hardware Setup

This guide shows the basics of setting up a NEORV32 project for FPGA implementation (or simulation only)
_from scratch_. It uses a _simplified_ test "SoC" setup of the processor to keep things simple at the beginning.
This simple setup is intended for evaluation or as a "hello world" project to check out the NEORV32
on _your_ FPGA board.

[TIP]
If you want to use a more sophisticated pre-defined setup to start with, check out the
`setups` folder, which provides example setups for various FPGAs, boards and toolchains.

The NEORV32 project features two minimalistic pre-configured test setups in
https://github.com/stnolting/neorv32/blob/master/rtl/test_setups[`rtl/test_setups`].
Both test setups only implement very basic processor and CPU features.
The main difference between the two setups is the processor boot concept, that is, how to get a software executable
_into_ the processor:

* **`rtl/test_setups/neorv32_testsetup_approm.vhd`**: this setup does not require a connection via UART. The
software executable is "installed" into the bitstream to initialize a read-only memory. Use this setup
if your FPGA board does _not_ provide a UART interface.
* **`rtl/test_setups/neorv32_testsetup_bootloader.vhd`**: this setup uses the UART and the default NEORV32
bootloader to upload new software executables. Use this setup if your board _does_ provide a UART interface.

.NEORV32 "hello world" test setup (`rtl/test_setups/neorv32_testsetup_bootloader.vhd`)
image::neorv32_test_setup.png[align=center]

.External Clock Source
[NOTE]
These test setups are intended to be directly used as **design top entity**. Of course you can also instantiate them
into another design unit. If your FPGA board only provides _very fast_ external clock sources (like on the FOMU board)
you might need to add clock management components (PLLs, DCMs, MMCMs, ...) to the test setup or to the corresponding top entity
if you instantiate one of the test setups.

[start=1]
. Create a new project with your FPGA EDA tool of choice.
. Add all VHDL files from the project's `rtl/core` folder to your project.

.Internal Memories
[IMPORTANT]
For a _general_ first setup (technology-independent) use the _default_ memory architectures for the internal memories
(IMEM and DMEM). These are located in `rtl/core/mem`, so **make sure to add the files from `rtl/core/mem` to your project, too**. +
+
If synthesis cannot efficiently map those default memory descriptions to the available memory resources, you can later replace the
default memory architectures by optimized platform-specific memory architectures. **Example:** The `setups/radiant/UPduino_v3`
example setup uses optimized memory primitives. Hence, it does not include the default memory architectures from
`rtl/core/mem` as these are replaced by device-specific implementations. However, it still has to include the entity
definitions from `rtl/core`.

[start=3]
. Make sure to add all the rtl files to a new library called `neorv32`. If your FPGA tool does not
provide a field to enter the library name, check out the "properties" menu of the added rtl files.

.Compile order
[NOTE]
Some tools (like Lattice Radiant) might require a _manual compile order_ of the VHDL source files to identify the dependencies.
The package file `neorv32_package.vhd` should be analyzed first, followed by the memory image files (`neorv32_application_image.vhd`
and `neorv32_bootloader_image.vhd`) and the entity-only files (`neorv32_*mem.entity.vhd`).

[start=4]
. The `rtl/core/neorv32_top.vhd` VHDL file is the top entity of the NEORV32 processor, which can be
instantiated into the "real" project. However, in this tutorial we will use one of the pre-defined
test setups from `rtl/test_setups` (see above).

[IMPORTANT]
Make sure to include the `neorv32` package into your design when instantiating the processor: add
`library neorv32;` and `use neorv32.neorv32_package.all;` to your design unit.

[start=5]
. Add the pre-defined test setup of choice to the project, too, and select it as _top entity_.
. The entities of both test setups
provide a minimal set of configuration generics that might have to be adapted to match your FPGA and board:

.Test setup entity - configuration generics
[source,vhdl]
----
generic (
  -- adapt these for your setup --
  CLOCK_FREQUENCY   : natural := 100000000; <1>
  MEM_INT_IMEM_SIZE : natural := 16*1024;   <2>
  MEM_INT_DMEM_SIZE : natural := 8*1024     <3>
);
----
<1> Clock frequency of `clk_i` signal in Hertz
<2> Default size of internal instruction memory: 16kB
<3> Default size of internal data memory: 8kB

[start=7]
. If you feel like it - or if your FPGA does not provide sufficient resources - you can modify the
_memory sizes_ (`MEM_INT_IMEM_SIZE` and `MEM_INT_DMEM_SIZE` - marked with notes "2" and "3"). But as mentioned
above, let's keep things simple at first and use the standard configuration for now.
. There is one generic that _has to be set according to your FPGA board_ setup: the actual clock frequency
of the top's clock input signal (`clk_i`). Use the `CLOCK_FREQUENCY` generic to specify your clock source's
frequency in Hertz (Hz).

[NOTE]
If you have changed the default memory configuration (`MEM_INT_IMEM_SIZE` and `MEM_INT_DMEM_SIZE` generics)
keep those new sizes in mind - these values are required for setting
up the software framework in the next section <<_general_software_framework_setup>>.

[start=9]
. Depending on your FPGA tool of choice, it is time to assign the signals of the test setup top entity to
the corresponding pins of your FPGA board. All the signals can be found in the entity declaration of the
corresponding test setup:

.Entity signals of `neorv32_testsetup_approm.vhd`
[source,vhdl]
----
port (
  -- Global control --
  clk_i  : in  std_ulogic; -- global clock, rising edge
  rstn_i : in  std_ulogic; -- global reset, low-active, async
  -- GPIO --
  gpio_o : out std_ulogic_vector(7 downto 0) -- parallel output
);
----

.Entity signals of `neorv32_testsetup_bootloader.vhd`
[source,vhdl]
----
port (
  -- Global control --
  clk_i       : in  std_ulogic; -- global clock, rising edge
  rstn_i      : in  std_ulogic; -- global reset, low-active, async
  -- GPIO --
  gpio_o      : out std_ulogic_vector(7 downto 0); -- parallel output
  -- UART0 --
  uart0_txd_o : out std_ulogic; -- UART0 send data
  uart0_rxd_i : in  std_ulogic  -- UART0 receive data
);
----

.Signal Polarity
[NOTE]
If your FPGA board has inverse polarity for certain inputs/outputs you can add `not` gates. Example: The reset signal
`rstn_i` is low-active by default; the LEDs connected to `gpio_o` are high-active by default.
You can do this in your board top if you instantiate the test setup,
or _inside_ the test setup if this is your top entity (low-active LEDs example: `gpio_o <= NOT con_gpio_o(7 downto 0);`).

[start=10]
. Attach the clock input `clk_i` to your clock source and connect the reset line `rstn_i` to a button of
your FPGA board. Check whether it is low-active or high-active - the reset signal of the processor is
**low-active**, so maybe you need to invert the input signal.
. If possible, connect _at least_ bit `0` of the GPIO output port `gpio_o` to an LED (see "Signal Polarity" note above).
. Finally, if you are using the UART-based test setup (`neorv32_testsetup_bootloader.vhd`)
connect the UART communication signals `uart0_txd_o` and `uart0_rxd_i` to the host interface (e.g. USB-UART converter).
. Perform the project HDL compilation (synthesis, mapping, bitstream generation).
. Program the generated bitstream into your FPGA and press the button connected to the reset signal.
. Done! The LED at `gpio_o(0)` should be flashing now.

[TIP]
After the GCC toolchain for compiling RISC-V source code is ready (chapter <<_general_software_framework_setup>>),
you can advance to one of these chapters to learn how to get a software executable into your processor setup:
* If you are using the `neorv32_testsetup_approm.vhd` setup: See section <<_installing_an_executable_directly_into_memory>>.
* If you are using the `neorv32_testsetup_bootloader.vhd` setup: See section <<_uploading_and_starting_of_a_binary_executable_image_via_uart>>.

<<<
// ####################################################################################################################
:sectnums:
== General Software Framework Setup

To allow executables to be _actually executed_ on the NEORV32 Processor the configuration of the software framework
has to be aware of the hardware configuration. This guide focuses on the memory configuration. To enable
certain CPU ISA features refer to the <<_enabling_risc_v_cpu_extensions>> section.

[TIP]
If you have **not** changed the _default_ memory configuration in section <<_general_hardware_setup>>
you are already done and you can skip the rest of this guide.

[start=1]
. Open the NEORV32 linker script `sw/common/neorv32.ld` with a text editor. Right at the
beginning of this script you will find the `MEMORY` configuration listing the different memory sections:

.Cut-out of the linker script `neorv32.ld`: `ram` memory section configuration
[source,c]
----
MEMORY
{
  ram  (rwx) : ORIGIN = 0x80000000, LENGTH = DEFINED(make_bootloader) ? 512 : 8*1024 <1>
...
----
<1> Size of the data memory address space (right-most value) (internal/external DMEM); here 8kB

[start=2]
. We only need to change the `ram` section, which represents the available data address space.
If you have changed the DMEM size (`MEM_INT_DMEM_SIZE` generic), adapt the `LENGTH` parameter of the `ram`
section (here: `8*1024`) so it is equal to your DMEM hardware configuration.

[IMPORTANT]
Make sure you only modify the _right-most_ value (here: 8*1024)! +
The "`512`" is not relevant for the application.

[start=3]
. Done! Save your changes and close the linker script.

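
To double-check the edited linker script against the hardware configuration, the right-most `LENGTH` value can be extracted and evaluated. The following is only a sketch: the helper name `ram_length_bytes` is hypothetical, and it assumes the `ram` line keeps the `... ? <bootloader size> : <application size>` form shown in the cut-out above.

```shell
# Hypothetical helper: print the right-most LENGTH value of a `ram` line in bytes.
# Assumes the form "LENGTH = <condition> ? <bootloader size> : <application size>".
ram_length_bytes() {
  # Keep only the expression after the ':' of the conditional (digits, '*', 'x', spaces).
  expr_text=$(printf '%s\n' "$1" |
    sed -n 's/.*LENGTH *=.*? *[^:]*: *\([0-9*x ]*\).*/\1/p')
  # Let the shell evaluate products such as 8*1024.
  echo $(( $expr_text ))
}

line='  ram  (rwx) : ORIGIN = 0x80000000, LENGTH = DEFINED(make_bootloader) ? 512 : 8*1024'
ram_length_bytes "$line"
```

The printed number should equal the value of the `MEM_INT_DMEM_SIZE` generic used in the hardware setup.
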
.Advanced: Section base address and size
[IMPORTANT]
More information can be found in the datasheet section https://stnolting.github.io/neorv32/#_address_space[Address Space].

<<<
// ####################################################################################################################
:sectnums:
== Application Program Compilation

This guide shows how to compile an example C-code application into a NEORV32 executable that
can be uploaded via the bootloader or the on-chip debugger.

[IMPORTANT]
If your FPGA board does not provide such an interface - don't worry!
Section <<_installing_an_executable_directly_into_memory>> shows how to
run custom programs on your FPGA setup without having a UART.

[start=1]
. Open a terminal console and navigate to one of the project's example programs. For instance, navigate to the
simple `sw/example/blink_led` example program. This program uses the NEORV32 GPIO module to display
an 8-bit counter on the lowest eight bits of the `gpio_o` output port.
. To compile the project and generate an executable simply execute:

[source,bash]
----
neorv32/sw/example/blink_led$ make clean_all exe
----

[start=3]
. We are using the `clean_all` target to make sure everything is rebuilt.
. This will compile and link the application sources together with all the included libraries. At the end,
your application is transformed into an ELF file (`main.elf`). The _NEORV32 image generator_ (in `sw/image_gen`)
takes this file and creates a final executable. The makefile will show the resulting memory utilization and
the executable size:

[source,bash]
----
neorv32/sw/example/blink_led$ make clean_all exe
Memory utilization:
   text    data     bss     dec     hex filename
   3176       0     120    3296     ce0 main.elf
Compiling ../../../sw/image_gen/image_gen
Executable (neorv32_exe.bin) size in bytes:
3188
----

[start=5]
. That's it. The `exe` target has created the actual executable `neorv32_exe.bin` in the current folder
that is ready to be uploaded to the processor.

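
The memory utilization report can be used for a quick plausibility check against the configured memory sizes. The sketch below uses the numbers from the listing above and the default sizes of the test setups (16kB IMEM, 8kB DMEM). It assumes the usual embedded layout in which `text` and `data` are stored in the instruction memory while `data` and `bss` occupy the data memory at run time; stack and heap are not accounted for.

```shell
# Sizes reported for main.elf in the memory utilization listing (bytes).
text=3176; data=0; bss=120
# Default memory sizes of the test setups (bytes).
imem_size=$((16 * 1024)); dmem_size=$((8 * 1024))

# Assumption: text+data are stored in IMEM, data+bss occupy DMEM at run time.
imem_used=$((text + data))
dmem_used=$((data + bss))

# Print "fits" if the first value does not exceed the second one.
fits() { if [ "$1" -le "$2" ]; then echo "fits"; else echo "too large"; fi; }

echo "IMEM: $imem_used of $imem_size bytes ($(fits "$imem_used" "$imem_size"))"
echo "DMEM: $dmem_used of $dmem_size bytes ($(fits "$dmem_used" "$dmem_size"))"
```

If the memory sizes were changed in the hardware setup, `imem_size` and `dmem_size` have to be adapted accordingly.
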
[TIP]
The compilation process will also create a `main.asm` assembly listing file in the current folder, which
shows the actual assembly code of the application.

<<<
// ####################################################################################################################
:sectnums:
== Uploading and Starting of a Binary Executable Image via UART

Follow this guide to use the bootloader to upload an executable via UART.

[NOTE]
This concept uses the default "Indirect Boot" scenario that uses the bootloader to upload new executables.
See datasheet section https://stnolting.github.io/neorv32/#_indirect_boot[Indirect Boot] for more information.

[IMPORTANT]
If your FPGA board does not provide such an interface - don't worry!
Section <<_installing_an_executable_directly_into_memory>> shows how to
run custom programs on your FPGA setup without having a UART.

[start=1]
. Connect the primary UART (UART0) interface of your FPGA board to a serial port of your host computer.
. Start a terminal program. In this tutorial, I am using TeraTerm for Windows. You can download it for free
from https://ttssh2.osdn.jp/index.html.en

[NOTE]
_Any_ terminal program that can connect to a serial port should work. However, make sure the program
can transfer data in _raw_ byte mode without any protocol overhead around it.

[start=3]
. Open a connection to the serial port your UART is connected to. Configure the terminal settings according to the
following parameters:

* 19200 Baud
* 8 data bits
* 1 stop bit
* no parity bits
* _no_ transmission/flow control protocol
* receiver (host computer) newline on `\r\n` (carriage return & newline)

[start=4]
. Also make sure that single chars are sent from your computer _without_ any consecutive "new line" or "carriage
return" commands (this is highly dependent on your terminal application of choice, TeraTerm only
sends the raw chars by default).
. Press the NEORV32 reset button to restart the bootloader. The status LED starts blinking and the
bootloader intro screen appears in your console. Hurry up and press any key (hit space!) to abort the
automatic boot sequence and to start the actual bootloader user interface console.

.Bootloader console; aborted auto-boot sequence
[source,bash]
----
<< NEORV32 Bootloader >>

BLDV: Mar 23 2021
HWV:  0x01050208
CLK:  0x05F5E100
MISA: 0x40901105
ZEXT: 0x00000023
PROC: 0x0EFF0037
IMEM: 0x00004000 bytes @ 0x00000000
DMEM: 0x00002000 bytes @ 0x80000000

Autoboot in 8s. Press key to abort.
Aborted.

Available commands:
h: Help
r: Restart
u: Upload
s: Store to flash
l: Load from flash
e: Execute
CMD:>
----

[start=6]
. Execute the "Upload" command by typing `u`. Now the bootloader is waiting for a binary executable to be sent.

[source,bash]
----
CMD:> u
Awaiting neorv32_exe.bin...
----

[start=7]
. Use the "send file" option of your terminal program to send a NEORV32 executable (`neorv32_exe.bin`).
. Again, make sure to transmit the executable in raw binary mode (no transfer protocol).
When using TeraTerm, select the "binary" option in the send file dialog.
. If everything went fine, OK will appear in your terminal:

[source,bash]
----
CMD:> u
Awaiting neorv32_exe.bin... OK
----

[start=10]
. The executable is now in the instruction memory of the processor. To execute the program right
now run the "Execute" command by typing `e`:

[source,bash]
----
CMD:> u
Awaiting neorv32_exe.bin... OK
CMD:> e
Booting...
Blinking LED demo program
----

[start=11]
. If everything went fine, you should see the LEDs blinking.

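
The hexadecimal values in the bootloader intro screen can be cross-checked against the hardware configuration. As a small worked example, the following sketch converts the `CLK`, `IMEM` and `DMEM` values from the console listing above to decimal. Reading `CLK` as the clock frequency in Hertz is an assumption based on the `CLOCK_FREQUENCY` generic of the test setups.

```shell
# Values copied from the bootloader intro screen.
clk_hz=$((0x05F5E100))      # CLK field, assumed to be the clock frequency in Hz
imem_bytes=$((0x00004000))  # IMEM size in bytes
dmem_bytes=$((0x00002000))  # DMEM size in bytes

echo "CLK:  $clk_hz Hz"
echo "IMEM: $imem_bytes bytes"
echo "DMEM: $dmem_bytes bytes"
```

With the default configuration this corresponds to 100000000 Hz, 16*1024 bytes of IMEM and 8*1024 bytes of DMEM, matching the generics of the test setup.
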
[NOTE]
The bootloader will print error codes if something went wrong.
See section https://stnolting.github.io/neorv32/#_bootloader[Bootloader] of the NEORV32 datasheet for more information.

[TIP]
See section <<_programming_an_external_spi_flash_via_the_bootloader>> to learn how to use an external SPI
flash for nonvolatile program storage.

[TIP]
Executables can also be uploaded via the **on-chip debugger**.
See section <<_debugging_with_gdb>> for more information.

<<<
// ####################################################################################################################
:sectnums:
== Installing an Executable Directly Into Memory

If you do not want to use the bootloader (or the on-chip debugger) for executable upload or if your setup does not provide
a serial interface for that, you can also directly install an application into embedded memory.

This concept uses the "Direct Boot" scenario that implements the processor-internal IMEM as ROM, which is
pre-initialized with the application's executable during synthesis. Hence, it provides _non-volatile_ storage of the
executable inside the processor. This storage cannot be altered during runtime and any source code modification of
the application requires re-programming the FPGA via the bitstream.

[TIP]
See datasheet section https://stnolting.github.io/neorv32/#_direct_boot[Direct Boot] for more information.

Using the IMEM as ROM:

* for this boot concept the bootloader is no longer required
* this concept only works for the internal IMEM (but can be extended to work with external memories coupled via the processor's bus interface)
* make sure that the memory components (like block RAM) the IMEM is mapped to support an initialization via the bitstream

511 |
60 |
zero_gravi |
[start=1]
|
512 |
61 |
zero_gravi |
. At first, make sure your processor setup actually implements the internal IMEM: the `MEM_INT_IMEM_EN` generics has to be set to `true`:
|
513 |
|
|
|
514 |
|
|
.Processor top entity configuration - enable internal IMEM
|
515 |
|
|
[source,vhdl]
|
516 |
|
|
----
|
517 |
|
|
-- Internal Instruction memory --
|
518 |
|
|
MEM_INT_IMEM_EN => true, -- implement processor-internal instruction memory
|
519 |
|
|
----
|
520 |
|
|
|
521 |
|
|
[start=2]
|
522 |
|
|
. For this setup we do not want the bootloader to be implemented at all. Disable implementation of the bootloader by setting the
|
523 |
62 |
zero_gravi |
`INT_BOOTLOADER_EN` generic to `false`. This will also modify the processor-internal IMEM so it is initialized with the executable during synthesis.
|
524 |
61 |
zero_gravi |
|
525 |
|
|
.Processor top entity configuration - disable internal bootloader
|
526 |
|
|
[source,vhdl]
|
527 |
|
|
----
|
528 |
|
|
-- General --
|
529 |
|
|
INT_BOOTLOADER_EN => false, -- boot configuration: false = boot from int/ext (I)MEM
|
530 |
|
|
----
|
531 |
|
|
|
532 |
|
|
[start=3]
|
533 |
|
|
. To generate an "initialization image" for the IMEM that contains the actual application, run the `install` target when compiling your application:
|
534 |
|
|
|
535 |
|
|
[source,bash]
|
536 |
|
|
----
|
537 |
|
|
neorv32/sw/example/blink_led$ make clean_all install
|
538 |
|
|
Memory utilization:
|
539 |
|
|
text data bss dec hex filename
|
540 |
|
|
3176 0 120 3296 ce0 main.elf
|
541 |
|
|
Compiling ../../../sw/image_gen/image_gen
|
542 |
|
|
Installing application image to ../../../rtl/core/neorv32_application_image.vhd
|
543 |
|
|
----
|
544 |
|
|
|
545 |
|
|
[start=4]
|
546 |
|
|
. The `install` target has compiled all the application sources but instead of creating an executable (`neorv32_exe.bit`) that can be uploaded via the
|
547 |
|
|
bootloader, it has created a VHDL memory initialization image `core/neorv32_application_image.vhd`.
|
548 |
|
|
. This VHDL file is automatically copied to the core's rtl folder (`rtl/core`) so it will be included for the next synthesis.
|
549 |
|
|
. Perform a new synthesis. The IMEM will be built as pre-initialized ROM (inferring embedded memories if possible).
|
550 |
|
|
. Upload your bitstream. Your application code now resides unchangeable in the processor's IMEM and is directly executed after reset.
|
551 |
|
|
|
552 |
|
|
|
553 |
|
|
The synthesis tool / simulator will print asserts to inform about the (IMEM) memory / boot configuration:
|
554 |
|
|
|
555 |
|
|
[source]
|
556 |
|
|
----
|
557 |
|
|
NEORV32 PROCESSOR CONFIG NOTE: Boot configuration: Direct boot from memory (processor-internal IMEM).
|
558 |
|
|
NEORV32 PROCESSOR CONFIG NOTE: Implementing processor-internal IMEM as ROM (3176 bytes), pre-initialized with application.
|
559 |
|
|
----
|
560 |
|
|
|
561 |
|
|
|
562 |
|
|
|
563 |
|
|
<<<
|
564 |
|
|
// ####################################################################################################################
|
565 |
|
|
:sectnums:
|
566 |
|
|
== Setup of a New Application Program Project
|
567 |
|
|
|
568 |
|
|
[start=1]
|
569 |
|
|
. The easiest way of creating a _new_ software application project is to copy an _existing_ one. This will keep all
|
570 |
|
|
file dependencies. For example you can copy `sw/example/blink_led` to `sw/example/flux_capacitor`.
|
571 |
|
|
. If you want to place your application somewhere outside `sw/example` you need to adapt the application's makefile.
|
572 |
66 |
zero_gravi |
In the makefile you will find a variable that keeps the relative or absolute path to the NEORV32 repository home
|
573 |
60 |
zero_gravi |
folder. Just modify this variable according to your new project's home location:
|
574 |
|
|
|
575 |
|
|
[source,makefile]
|
576 |
|
|
----
|
577 |
|
|
# Relative or absolute path to the NEORV32 home folder (use default if not set by user)
|
578 |
|
|
NEORV32_HOME ?= ../../..
|
579 |
|
|
----
|
580 |
|
|
|
581 |
|
|
[start=3]
|
582 |
61 |
zero_gravi |
. If your project contains additional source files outside of the project folder, you can add them to
|
583 |
|
|
the `APP_SRC` variable:
|
584 |
60 |
zero_gravi |
|
585 |
|
|
[source,makefile]
|
586 |
|
|
----
|
587 |
|
|
# User's application sources (add additional files here)
|
588 |
|
|
APP_SRC = $(wildcard *.c) ../somewhere/some_file.c
|
589 |
|
|
----
|
590 |
|
|
|
591 |
|
|
[start=4]
|
592 |
61 |
zero_gravi |
. You also can add a folder containing your application's include files to the
|
593 |
|
|
`APP_INC` variable (do not forget the `-I` prefix):
|
594 |
60 |
zero_gravi |
|
595 |
|
|
[source,makefile]
|
596 |
|
|
----
|
597 |
|
|
# User's application include folders (don't forget the '-I' before each entry)
|
598 |
|
|
APP_INC = -I . -I ../somewhere/include_stuff_folder
|
599 |
|
|
----
|
600 |
|
|
|
601 |
|
|
|
602 |
|
|
|
603 |
|
|
<<<
|
604 |
|
|
// ####################################################################################################################
|
605 |
|
|
:sectnums:
|
606 |
|
|
== Enabling RISC-V CPU Extensions
|
607 |
|
|
|
608 |
61 |
zero_gravi |
Whenever you enable/disable a RISC-V CPU extension via the corresponding `CPU_EXTENSION_RISCV_x` generic, you need to
|
609 |
60 |
zero_gravi |
adapt the toolchain configuration so the compiler can actually generate the corresponding code for it.
|
610 |
|
|
|
611 |
|
|
To do so, open the makefile of your project (for example `sw/example/blink_led/makefile`) and scroll to the
"USER CONFIGURATION" section right at the beginning of the file. You need to modify the `MARCH` variable and possibly
the `MABI` variable according to your CPU hardware configuration.
|
614 |
60 |
zero_gravi |
|
615 |
|
|
[source,makefile]
|
616 |
|
|
----
|
617 |
|
|
# CPU architecture and ABI
|
618 |
65 |
zero_gravi |
MARCH ?= rv32i <1>
|
619 |
|
|
MABI ?= ilp32 <2>
|
620 |
60 |
zero_gravi |
----
|
621 |
|
|
<1> MARCH = Machine architecture ("ISA string")
|
622 |
|
|
<2> MABI = Machine binary interface
|
623 |
|
|
|
624 |
61 |
zero_gravi |
For example, if you enable the RISC-V `C` extension (16-bit compressed instructions) via the `CPU_EXTENSION_RISCV_C`
|
625 |
62 |
zero_gravi |
generic (set `true`) you need to add the `c` extension also to the `MARCH` ISA string in order to make the compiler
|
626 |
61 |
zero_gravi |
emit compressed instructions.
|
627 |
60 |
zero_gravi |
|
628 |
62 |
zero_gravi |
.Privileged Architecture Extensions
|
629 |
|
|
[IMPORTANT]
|
630 |
|
|
Privileged architecture extensions like `Zicsr` or `Zifencei` are "used" _implicitly_ by the compiler. Hence, the corresponding
instructions will only be generated when "encoded" via inline assembly or when linking the corresponding libraries. In this case,
these instructions will _always_ be emitted (even if the corresponding extension is not specified in `MARCH`). +
**I recommend _not_ specifying any privileged architecture extensions in `MARCH`.**
|
634 |
|
|
|
635 |
61 |
zero_gravi |
[WARNING]
|
636 |
|
|
The ISA extensions enabled in hardware can be a superset of the extensions enabled in software, but not the other way
|
637 |
|
|
around. For example, generating compressed instructions for a CPU configuration that has the `c` extension disabled
|
638 |
|
|
will cause _illegal instruction exceptions_ at runtime.
|
639 |
60 |
zero_gravi |
|
640 |
61 |
zero_gravi |
You can also override the default `MARCH` and `MABI` configurations from the makefile when invoking `make`:
|
641 |
|
|
|
642 |
60 |
zero_gravi |
[source,bash]
|
643 |
|
|
----
|
644 |
65 |
zero_gravi |
$ make MARCH=rv32ic clean_all all
|
645 |
60 |
zero_gravi |
----
|
646 |
|
|
|
647 |
|
|
[NOTE]
|
648 |
62 |
zero_gravi |
The RISC-V ISA string for `MARCH` follows a certain _canonical_ structure:
|
649 |
|
|
`rv32[i/e][m][a][f][d][g][q][c][b][v][n]...` For example `rv32imac` is valid while `rv32icma` is not.
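
The ordering rule can be made concrete with a small host-side check. The following C sketch is only an illustration: the function name `isa_string_is_canonical` is hypothetical and not part of the NEORV32 software framework. It assumes a lower-case string, handles only the `rv32` prefix and the single-letter extensions in the order quoted above, and ignores multi-letter extensions (such as `Zicsr`).

[source,c]
----
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper (illustration only).
 * Checks a lower-case ISA string against the single-letter order
 * rv32[i/e][m][a][f][d][g][q][c][b][v][n] quoted in the note above. */
bool isa_string_is_canonical(const char *isa) {
  static const char order[] = "mafdgqcbvn"; /* extensions after the base ISA */

  if (strncmp(isa, "rv32", 4) != 0) {
    return false; /* only the rv32 prefix is handled in this sketch */
  }
  isa += 4;

  if ((*isa != 'i') && (*isa != 'e')) {
    return false; /* the base ISA letter is mandatory */
  }
  isa++;

  const char *pos = order; /* first position that may still be matched */
  for (; *isa != '\0'; isa++) {
    const char *hit = strchr(pos, *isa);
    if (hit == NULL) {
      return false; /* unknown letter, duplicate, or out of order */
    }
    pos = hit + 1; /* all following letters have to appear later in the order */
  }
  return true;
}
----

With this sketch, `isa_string_is_canonical("rv32imac")` returns `true` while `isa_string_is_canonical("rv32icma")` returns `false`, matching the example from the note.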
|
650 |
60 |
zero_gravi |
|
651 |
|
|
|
652 |
|
|
|
653 |
|
|
<<<
|
654 |
|
|
// ####################################################################################################################
|
655 |
|
|
:sectnums:
|
656 |
63 |
zero_gravi |
== Application-Specific Processor Configuration
|
657 |
|
|
|
658 |
|
|
Due to the processor's configuration options, which are mainly defined via the top entity VHDL generics, the SoC
|
659 |
|
|
can be tailored to the application-specific requirements. Note that this chapter does not focus on optional
|
660 |
|
|
_SoC features_ like IO/peripheral modules. It rather gives ideas on how to optimize for _overall goals_
|
661 |
|
|
like performance and area.
|
662 |
|
|
|
663 |
|
|
[NOTE]
|
664 |
|
|
Please keep in mind that optimizing the design in one direction (like performance) will also affect other potential
|
665 |
|
|
optimization goals (like area and energy).
|
666 |
|
|
|
667 |
|
|
=== Optimize for Performance
|
668 |
|
|
|
669 |
|
|
The following points show some concepts to optimize the processor for performance regardless of the costs
|
670 |
|
|
(i.e. increasing area and energy requirements):
|
671 |
|
|
|
672 |
|
|
* Enable all performance-related RISC-V CPU extensions that implement dedicated hardware accelerators instead
|
673 |
|
|
of emulating operations entirely in software: `M`, `C`, `Zfinx`
|
674 |
|
|
* Enable mapping of complex CPU operations to dedicated hardware: `FAST_MUL_EN => true` to use DSP slices for
|
675 |
|
|
multiplications, `FAST_SHIFT_EN => true` to use a fast barrel shifter for shift operations.
|
676 |
|
|
* Implement the instruction cache: `ICACHE_EN => true`
|
677 |
|
|
* Use as much _internal_ memory as possible to reduce memory access latency: `MEM_INT_IMEM_EN => true` and
|
678 |
|
|
`MEM_INT_DMEM_EN => true`, maximize `MEM_INT_IMEM_SIZE` and `MEM_INT_DMEM_SIZE`
|
679 |
|
|
* Increase the CPU's instruction prefetch buffer size: `CPU_IPB_ENTRIES`
|
680 |
|
|
* _To be continued..._
|
681 |
|
|
|
682 |
|
|
|
683 |
|
|
=== Optimize for Size
|
684 |
|
|
|
685 |
|
|
The NEORV32 is a size-optimized processor system that is intended to fit into tiny niches within large SoC
|
686 |
|
|
designs or to be used as a customized microcontroller in really tiny / low-power FPGAs (like Lattice iCE40).
|
687 |
|
|
Here are some ideas on how to make the processor even smaller while maintaining its _general purpose system_
|
688 |
|
|
concept and maximum RISC-V compatibility.
|
689 |
|
|
|
690 |
|
|
**SoC**
|
691 |
|
|
|
692 |
|
|
* This is obvious, but exclude all unused optional IO/peripheral modules from synthesis via the processor
|
693 |
|
|
configuration generics.
|
694 |
|
|
* If an IO module provides an option to configure the number of "channels", constrain this number to the
|
695 |
|
|
actually required value (e.g. the PWM module `IO_PWM_NUM_CH` or the external interrupt controller `XIRQ_NUM_CH`).
|
696 |
|
|
* Reduce the FIFO sizes of implemented modules (e.g. `SLINK_TX_FIFO`).
|
697 |
|
|
* Disable the instruction cache (`ICACHE_EN => false`) if the design only uses processor-internal IMEM
|
698 |
|
|
and DMEM memories.
|
699 |
|
|
* _To be continued..._
|
700 |
|
|
|
701 |
|
|
**CPU**
|
702 |
|
|
|
703 |
|
|
* Use the _embedded_ RISC-V CPU architecture extension (`CPU_EXTENSION_RISCV_E`) to reduce block RAM utilization.
|
704 |
|
|
* The compressed instructions extension (`CPU_EXTENSION_RISCV_C`) requires additional logic for the decoder but
|
705 |
|
|
also reduces program code size by approximately 30%.
|
706 |
|
|
* If not explicitly used/required, constrain the CPU's counter sizes: `CPU_CNT_WIDTH` for `[m]instret[h]`
|
707 |
|
|
(number of instructions) and `[m]cycle[h]` (number of cycles) counters. You can even remove these counters
|
708 |
|
|
by setting `CPU_CNT_WIDTH => 0` if they are not used at all (note, this is not RISC-V compliant).
|
709 |
|
|
* Reduce the CPU's prefetch buffer size (`CPU_IPB_ENTRIES`).
|
710 |
|
|
* Map CPU shift operations to a small and iterative shifter unit (`FAST_SHIFT_EN => false`).
|
711 |
|
|
* If you have unused DSP blocks available, you can map multiplication operations to those slices instead of
|
712 |
|
|
using LUTs to implement the multiplier (`FAST_MUL_EN => true`).
|
713 |
|
|
* If there is no need to execute division in hardware, use the `Zmmul` extension instead of the full-scale
|
714 |
|
|
`M` extension.
|
715 |
|
|
* Disable CPU extensions that are not explicitly used (`A`, `U`, `Zfinx`).
|
716 |
|
|
* _To be continued..._
|
717 |
|
|
|
718 |
|
|
=== Optimize for Clock Speed
|
719 |
|
|
|
720 |
|
|
The NEORV32 Processor and CPU are designed to provide minimal logic between register stages to keep the
|
721 |
|
|
critical path as short as possible. When enabling additional extensions or modules the impact on the existing
|
722 |
|
|
logic is also kept at a minimum to prevent timing degradation. If there is a major impact on existing
|
723 |
|
|
logic (example: many physical memory protection address configuration registers) the VHDL code automatically
|
724 |
|
|
adds additional register stages to maintain critical path length. Obviously, this increases operation latency.
|
725 |
|
|
|
726 |
|
|
In order to optimize for a minimal critical path (= maximum clock speed) the following points should be considered:
|
727 |
|
|
|
728 |
|
|
* Complex CPU extensions (in terms of hardware requirements) should be avoided (examples: floating-point unit, physical memory protection).
|
729 |
|
|
* Large carry chains (>32-bit) should be avoided (constrain CPU counter sizes: e.g. `CPU_CNT_WIDTH => 32` and `HPM_CNT_WIDTH => 32`).
|
730 |
|
|
* If the target FPGA provides sufficient DSP resources, CPU multiplication operations can be mapped to DSP slices (`FAST_MUL_EN => true`)
|
731 |
|
|
reducing LUT usage and critical path impact while also increasing overall performance.
|
732 |
|
|
* Use the synchronous (registered) RX path configuration of the external memory interface (`MEM_EXT_ASYNC_RX => false`).
|
733 |
|
|
* _To be continued..._
|
734 |
|
|
|
735 |
|
|
[NOTE]
|
736 |
|
|
The short and fixed-length critical path allows integrating the core into existing clock domains.
|
737 |
|
|
So no clock domain-crossing and no sub-clock generation is required. However, for very high clock
|
738 |
|
|
frequencies (this is technology / platform dependent) clock domain crossing becomes crucial for chip-internal
|
739 |
|
|
connections.
|
740 |
|
|
|
741 |
|
|
|
742 |
|
|
=== Optimize for Energy
|
743 |
|
|
|
744 |
|
|
There are no _dedicated_ configuration options to optimize the processor for energy (minimal consumption;
|
745 |
|
|
energy/instruction ratio) yet. However, a reduced processor area (<<_optimize_for_size>>) will also reduce
|
746 |
|
|
static energy consumption.
|
747 |
|
|
|
748 |
|
|
To optimize your setup for low-power applications, you can make use of the CPU sleep mode (`wfi` instruction).
|
749 |
|
|
Put the CPU to sleep mode whenever possible. Disable all processor modules that are not actually used (exclude them
|
750 |
|
|
from synthesis if they will _never_ be used; disable the module via its control register if the module is not
|
751 |
|
|
_currently_ used). When in sleep mode, you can keep a timer module running (MTIME or the watchdog) to wake up
|
752 |
|
|
the CPU again. Since the wake up is triggered by _any_ interrupt, the external interrupt controller can also
|
753 |
|
|
be used to wake up the CPU again. By this, all timers (and all other modules) can be deactivated as well.
|
754 |
|
|
|
755 |
|
|
.Processor-internal clock generator shutdown
|
756 |
|
|
[TIP]
|
757 |
|
|
If _no_ IO/peripheral module is currently enabled, the processor's internal clock generator circuit will be
|
758 |
|
|
shut down reducing switching activity and thus, dynamic energy consumption.
|
759 |
|
|
|
760 |
|
|
|
761 |
|
|
|
762 |
|
|
<<<
|
763 |
|
|
// ####################################################################################################################
|
764 |
|
|
:sectnums:
|
765 |
64 |
zero_gravi |
== Adding Custom Hardware Modules
|
766 |
|
|
|
767 |
|
|
In resemblance to the RISC-V ISA, the NEORV32 processor was designed to ease customization and _extensibility_.
|
768 |
|
|
The processor provides several predefined options to add application-specific custom hardware modules and accelerators.
|
769 |
|
|
|
770 |
|
|
|
771 |
|
|
=== Standard (_External_) Interfaces
|
772 |
|
|
|
773 |
|
|
The processor already provides a set of standard interfaces that are intended to connect _chip-external_ devices.
|
774 |
|
|
However, these interfaces can also be used chip-internally. The most suitable interfaces are
|
775 |
|
|
https://stnolting.github.io/neorv32/#_general_purpose_input_and_output_port_gpio[GPIO],
|
776 |
|
|
https://stnolting.github.io/neorv32/#_primary_universal_asynchronous_receiver_and_transmitter_uart0[UART],
|
777 |
|
|
https://stnolting.github.io/neorv32/#_serial_peripheral_interface_controller_spi[SPI] and
|
778 |
|
|
https://stnolting.github.io/neorv32/#_two_wire_serial_interface_controller_twi[TWI].
|
779 |
|
|
|
780 |
|
|
The SPI and (especially) the GPIO interfaces might be the most straightforward approaches since they
|
781 |
|
|
have a minimal protocol overhead. Device-specific interrupt capabilities can be added using the
|
782 |
|
|
https://stnolting.github.io/neorv32/#_external_interrupt_controller_xirq[External Interrupt Controller (XIRQ)].
|
783 |
|
|
Beyond simplicity, these interfaces only provide a very limited bandwidth and require more sophisticated
|
784 |
|
|
software handling ("bit-banging" for the GPIO).
|
785 |
|
|
|
786 |
|
|
|
787 |
|
|
=== External Bus Interface
|
788 |
|
|
|
789 |
|
|
The https://stnolting.github.io/neorv32/#_processor_external_memory_interface_wishbone_axi4_lite[External Bus Interface]
|
790 |
|
|
provides the classic approach to connect to custom IP. By default, the bus interface implements the widely adopted
|
791 |
|
|
Wishbone interface standard. However, this project also includes wrappers to bridge to other protocol standards like ARM's
|
792 |
|
|
AXI4-Lite or Intel's Avalon. By using a full-featured bus protocol, complex SoC structures can be implemented (including
|
793 |
|
|
several modules and even multi-core architectures). Many FPGA EDA tools provide graphical editors to build and customize
|
794 |
|
|
whole SoC architectures and even include pre-defined IP libraries.
|
795 |
|
|
|
796 |
|
|
.Example AXI SoC using Xilinx Vivado
|
797 |
|
|
image::neorv32_axi_soc.png[]
|
798 |
|
|
|
799 |
|
|
The bus interface uses a memory-mapped approach. All data transfers are handled by simple load/store operations since the
|
800 |
|
|
external bus interface is mapped into the processor's https://stnolting.github.io/neorv32/#_address_space[address space].
|
801 |
|
|
This allows very simple yet high-bandwidth communication.
|
802 |
|
|
|
803 |
|
|
|
804 |
|
|
=== Stream Link Interface
|
805 |
|
|
|
806 |
|
|
The NEORV32 https://stnolting.github.io/neorv32/#_stream_link_interface_slink[Stream Link Interface] provides
|
807 |
|
|
point-to-point, unidirectional and parallel data channels that can be used to transfer streaming data. In
|
808 |
|
|
contrast to the external bus interface, the streaming data does not provide any kind of "direction" control,
|
809 |
|
|
so it can be seen as "constant address bursts". The stream link interface provides less protocol overhead
|
810 |
|
|
and less latency than the bus interface. Furthermore, FIFOs can be configured for each direction (RX/TX) to
|
811 |
|
|
allow more CPU-independent operation.
|
812 |
|
|
|
813 |
|
|
|
814 |
|
|
=== Custom Functions Subsystem
|
815 |
|
|
|
816 |
66 |
zero_gravi |
The NEORV32 https://stnolting.github.io/neorv32/#_custom_functions_subsystem_cfs[Custom Functions Subsystem] is
|
817 |
|
|
an "empty" template for a processor-internal module. It provides 32 32-bit memory-mapped interface
|
818 |
64 |
zero_gravi |
registers that can be used to communicate with any arbitrary custom design logic. The intention of this
|
819 |
|
|
subsystem is to provide a simple base, where the user can concentrate on implementing the actual design logic
|
820 |
|
|
rather than taking care of the communication between the CPU/software and the design logic. The interface
|
821 |
|
|
registers are already allocated within the processor's address space and are supported by the software framework
|
822 |
|
|
via low-level hardware access mechanisms. Additionally, the CFS provides a direct pre-defined interrupt channel to
|
823 |
66 |
zero_gravi |
the CPU, which is also supported by the NEORV32 runtime environment.
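
From the software side, such a block of memory-mapped interface registers is typically accessed through `volatile` pointers. The sketch below is a minimal illustration only: the type `cfs_regs_t`, the helper names and the base address `CFS_BASE_HYPOTHETICAL` are assumptions made for this example and are not taken from the NEORV32 software framework. The actual base address has to be looked up in the processor's address space documentation.

[source,c]
----
#include <stdint.h>

#define CFS_NUM_REGS 32 /* 32 interface registers, 32 bit each (see text) */

/* Hypothetical register layout: one 32-bit word per interface register. */
typedef struct {
  volatile uint32_t reg[CFS_NUM_REGS];
} cfs_regs_t;

/* Placeholder address for illustration only, NOT the real CFS base address. */
#define CFS_BASE_HYPOTHETICAL ((cfs_regs_t *)0xF0000000u)

/* Write 'value' to interface register 'index' (0..31).
 * Out-of-range indices are ignored. */
static inline void cfs_write(cfs_regs_t *cfs, unsigned index, uint32_t value) {
  if (index < CFS_NUM_REGS) {
    cfs->reg[index] = value; /* plain 32-bit store to the register */
  }
}

/* Read interface register 'index' (0..31).
 * Returns 0 for an out-of-range index. */
static inline uint32_t cfs_read(const cfs_regs_t *cfs, unsigned index) {
  return (index < CFS_NUM_REGS) ? cfs->reg[index] : 0u;
}
----

On the target, the helpers would be called with the real register block, for example `cfs_write(CFS_BASE_HYPOTHETICAL, 0, data)` once the placeholder address has been replaced. Passing the register block as a pointer also allows exercising the helpers on a host machine with an ordinary variable of type `cfs_regs_t`.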
|
824 |
64 |
zero_gravi |
|
825 |
|
|
|
826 |
|
|
|
827 |
|
|
<<<
|
828 |
|
|
// ####################################################################################################################
|
829 |
|
|
:sectnums:
|
830 |
61 |
zero_gravi |
== Customizing the Internal Bootloader
|
831 |
60 |
zero_gravi |
|
832 |
61 |
zero_gravi |
The NEORV32 bootloader provides several options to configure and customize it for a certain application setup.
|
833 |
|
|
This configuration is done by passing _defines_ when compiling the bootloader. Of course you can also
|
834 |
|
|
modify the bootloader source code to provide a setup that perfectly fits your needs.
|
835 |
60 |
zero_gravi |
|
836 |
61 |
zero_gravi |
[IMPORTANT]
|
837 |
|
|
Each time the bootloader sources are modified, the bootloader has to be re-compiled (and re-installed to the
|
838 |
|
|
bootloader ROM) and the processor has to be re-synthesized.
|
839 |
60 |
zero_gravi |
|
840 |
61 |
zero_gravi |
[NOTE]
|
841 |
|
|
Keep in mind that the maximum size of the bootloader is limited to 32kB and that it should be compiled using the
|
842 |
|
|
base ISA `rv32i` only to ensure it can work independently of the actual CPU configuration.
|
843 |
60 |
zero_gravi |
|
844 |
61 |
zero_gravi |
.Bootloader configuration parameters
|
845 |
|
|
[cols="<2,^1,^2,<6"]
|
846 |
|
|
[options="header", grid="rows"]
|
847 |
|
|
|=======================
|
848 |
|
|
| Parameter | Default | Legal values | Description
|
849 |
|
|
4+^| Serial console interface
|
850 |
|
|
| `UART_EN` | `1` | `0`, `1` | Set to `0` to disable UART0 (no serial console at all)
|
851 |
|
|
| `UART_BAUD` | `19200` | _any_ | Baud rate of UART0
|
852 |
|
|
4+^| Status LED
|
853 |
|
|
| `STATUS_LED_EN` | `1` | `0`, `1` | Enable bootloader status led ("heart beat") at `GPIO` output port pin #`STATUS_LED_PIN` when `1`
|
854 |
|
|
| `STATUS_LED_PIN` | `0` | `0` ... `31` | `GPIO` output pin used for the high-active status LED
|
855 |
|
|
4+^| Boot configuration
|
856 |
|
|
| `AUTO_BOOT_SPI_EN` | `0` | `0`, `1` | Set `1` to enable immediate boot from external SPI flash
|
857 |
|
|
| `AUTO_BOOT_OCD_EN` | `0` | `0`, `1` | Set `1` to enable boot via on-chip debugger (OCD)
|
858 |
|
|
| `AUTO_BOOT_TIMEOUT` | `8` | _any_ | Time in seconds after which the auto-boot sequence starts (if there is no UART input by the user); set to 0 to disable the auto-boot sequence
|
859 |
|
|
4+^| SPI configuration
|
860 |
63 |
zero_gravi |
| `SPI_EN` | `1` | `0`, `1` | Set `1` to enable the usage of the SPI module (including load/store executables from/to SPI flash options)
|
861 |
61 |
zero_gravi |
| `SPI_FLASH_CS` | `0` | `0` ... `7` | SPI chip select output (`spi_csn_o`) for selecting flash
|
862 |
|
|
| `SPI_FLASH_SECTOR_SIZE` | `65536` | _any_ | SPI flash sector size in bytes
|
863 |
|
|
| `SPI_FLASH_CLK_PRSC` | `CLK_PRSC_8` | `CLK_PRSC_2` `CLK_PRSC_4` `CLK_PRSC_8` `CLK_PRSC_64` `CLK_PRSC_128` `CLK_PRSC_1024` `CLK_PRSC_2048` `CLK_PRSC_4096` | SPI clock pre-scaler (dividing main processor clock)
|
864 |
|
|
| `SPI_BOOT_BASE_ADDR` | `0x08000000` | _any_ 32-bit value | Defines the _base_ address of the executable in external flash
|
865 |
|
|
|=======================
|
866 |
60 |
zero_gravi |
|
867 |
61 |
zero_gravi |
Each configuration parameter is implemented as C-language `define` that can be manually overridden (_redefined_) when
|
868 |
|
|
invoking the bootloader's makefile. The corresponding parameter and its new value have to be _appended_
|
869 |
64 |
zero_gravi |
(using `+=`) to the makefile `USER_FLAGS` variable. Make sure to use the `-D` prefix here.
|
870 |
60 |
zero_gravi |
|
871 |
61 |
zero_gravi |
For example, to configure a UART baud rate of 57600 and redirect the status LED to output pin 20,
|
872 |
|
|
use the following command (_in_ the bootloader's source folder `sw/bootloader`):
|
873 |
60 |
zero_gravi |
|
874 |
61 |
zero_gravi |
.Example: customizing, re-compiling and re-installing the bootloader
|
875 |
|
|
[source,console]
|
876 |
60 |
zero_gravi |
----
|
877 |
61 |
zero_gravi |
$ make USER_FLAGS+=-DUART_BAUD=57600 USER_FLAGS+=-DSTATUS_LED_PIN=20 clean_all bootloader
|
878 |
60 |
zero_gravi |
----
|
879 |
|
|
|
880 |
61 |
zero_gravi |
[NOTE]
|
881 |
|
|
The `clean_all` target ensures that all libraries are re-compiled. The `bootloader` target will automatically
|
882 |
|
|
compile and install the bootloader to the HDL boot ROM (updating `rtl/core/neorv32_bootloader_image.vhd`).
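
Overriding a parameter via `-D` works when the default value in the sources is guarded so that it only applies if nothing was defined on the command line. The following C sketch shows this common pattern for two parameters from the table above. It is an assumption about how such defaults are usually written, not a copy of the bootloader sources, and the helper function names are made up for illustration.

[source,c]
----
/* Defaults from the table above. They only take effect if the symbol was
 * not already defined on the compiler command line (e.g. -DUART_BAUD=57600). */
#ifndef UART_BAUD
  #define UART_BAUD 19200
#endif

#ifndef STATUS_LED_PIN
  #define STATUS_LED_PIN 0
#endif

/* Illustrative helpers (hypothetical names) that expose the effective values. */
unsigned long effective_uart_baud(void) {
  return UART_BAUD;
}

unsigned int effective_status_led_pin(void) {
  return STATUS_LED_PIN;
}
----

Compiled without additional flags, the helpers return the defaults `19200` and `0`. Compiled with `-DUART_BAUD=57600 -DSTATUS_LED_PIN=20`, they return `57600` and `20`, because the `#ifndef` guards skip the default definitions.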
|
883 |
60 |
zero_gravi |
|
884 |
61 |
zero_gravi |
:sectnums:
|
885 |
|
|
=== Bootloader Boot Configuration
|
886 |
60 |
zero_gravi |
|
887 |
61 |
zero_gravi |
The bootloader provides several _boot configurations_ that define where the actual application's executable
|
888 |
|
|
shall be fetched from. Note that the non-default boot configurations provide a smaller memory footprint
|
889 |
|
|
reducing boot ROM implementation costs.
|
890 |
60 |
zero_gravi |
|
891 |
61 |
zero_gravi |
:sectnums!:
|
892 |
|
|
==== Default Boot Configuration
|
893 |
60 |
zero_gravi |
|
894 |
61 |
zero_gravi |
The _default_ bootloader configuration provides a UART-based user interface that allows uploading new executables
|
895 |
|
|
at any time. Optionally, the executable can also be programmed to an external SPI flash by the bootloader (see
|
896 |
|
|
section <<_programming_an_external_spi_flash_via_the_bootloader>>).
|
897 |
60 |
zero_gravi |
|
898 |
61 |
zero_gravi |
This configuration also provides an _automatic boot sequence_ (auto-boot) which will start fetching an executable
|
899 |
|
|
from external SPI flash using the default SPI configuration. By this, the default bootloader configuration
|
900 |
|
|
provides a "non-volatile program storage" mechanism that automatically boots from external SPI flash
|
901 |
|
|
(after `AUTO_BOOT_TIMEOUT`) while still providing the option to re-program SPI flash at any time
|
902 |
|
|
via the UART interface.
|
903 |
60 |
zero_gravi |
|
904 |
61 |
zero_gravi |
:sectnums!:
|
905 |
|
|
==== `AUTO_BOOT_SPI_EN`
|
906 |
60 |
zero_gravi |
|
907 |
61 |
zero_gravi |
The automatic boot from SPI flash (enabled when `AUTO_BOOT_SPI_EN` is `1`) will fetch an executable from an external
|
908 |
|
|
SPI flash (using the according _SPI configuration_) right after reset. The bootloader will start fetching
|
909 |
|
|
the image at SPI flash base address `SPI_BOOT_BASE_ADDR`.
|
910 |
60 |
zero_gravi |
|
911 |
61 |
zero_gravi |
Note that there is _no_ UART console to interact with the bootloader. However, this boot configuration will
|
912 |
|
|
output minimal status messages via UART (if `UART_EN` is `1`).
|
913 |
60 |
zero_gravi |
|
914 |
61 |
zero_gravi |
:sectnums!:
|
915 |
|
|
==== `AUTO_BOOT_OCD_EN`
|
916 |
60 |
zero_gravi |
|
917 |
61 |
zero_gravi |
If `AUTO_BOOT_OCD_EN` is `1` the bootloader is implemented as minimal "halt loop" to be used with the on-chip debugger.
|
918 |
|
|
After initializing the hardware, the CPU waits in this endless loop until the on-chip debugger takes control over
|
919 |
|
|
the core (to upload and run the actual executable). See section <<_debugging_using_the_on_chip_debugger>>
|
920 |
|
|
for more information on how to use the on-chip debugger to upload and run executables.
|
921 |
60 |
zero_gravi |
|
922 |
61 |
zero_gravi |
[NOTE]
|
923 |
|
|
All bootloader boot configurations support uploading new executables via the on-chip debugger.
|
924 |
60 |
zero_gravi |
|
925 |
61 |
zero_gravi |
[WARNING]
|
926 |
|
|
Note that this boot configuration does not load any executable at all! Hence,
|
927 |
62 |
zero_gravi |
this boot configuration is intended to be used with the on-chip debugger only.
|
928 |
60 |
zero_gravi |
|
929 |
|
|
|
930 |
|
|
|
931 |
61 |
zero_gravi |
<<<
|
932 |
|
|
// ####################################################################################################################
|
933 |
|
|
:sectnums:
|
934 |
|
|
== Programming an External SPI Flash via the Bootloader
|
935 |
60 |
zero_gravi |
|
936 |
61 |
zero_gravi |
The default processor-internal NEORV32 bootloader supports automatic booting from an external SPI flash.
|
937 |
|
|
This guide shows how to write an executable to the SPI flash via the bootloader so it can be automatically
|
938 |
|
|
fetched and executed after processor reset. For example, you can use a section of the FPGA bitstream configuration
|
939 |
|
|
memory to store an application executable.
|
940 |
60 |
zero_gravi |
|
941 |
61 |
zero_gravi |
[NOTE]
|
942 |
|
|
This section assumes the _default_ configuration of the NEORV32 bootloader.
|
943 |
|
|
See section <<_customizing_the_internal_bootloader>> on how to customize the bootloader and its settings
|
944 |
|
|
(for example the SPI chip-select port, the SPI clock speed or the flash base address for storing the executable).
|
945 |
60 |
zero_gravi |
|
946 |
|
|
|
947 |
61 |
zero_gravi |
:sectnums:
|
948 |
|
|
=== SPI Flash
|
949 |
60 |
zero_gravi |
|
950 |
61 |
zero_gravi |
The bootloader can access an SPI compatible flash via the processor top entity's SPI port. By default, the flash
|
951 |
|
|
chip-select line is connected to `spi_csn_o(0)` and the SPI clock runs at 1/8 of the processor's main clock frequency.
|
952 |
|
|
The SPI flash has to support single-byte read and write, 24-bit addresses and at least the following standard commands:
|
953 |
60 |
zero_gravi |
|
954 |
61 |
zero_gravi |
* READ `0x03`
|
955 |
|
|
* READ STATUS `0x05`
|
956 |
|
|
* WRITE ENABLE `0x06`
|
957 |
|
|
* PAGE PROGRAM `0x02`
|
958 |
|
|
* SECTOR ERASE `0xD8`
|
959 |
|
|
* READ ID `0x9E`
|
960 |
60 |
zero_gravi |
|
961 |
61 |
zero_gravi |
Compatible (FPGA configuration) SPI flash memories are, for example, the "Winbond W25Q64FV" or the "Micron N25Q032A".
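
With 24-bit addressing, a standard read access consists of the command byte followed by three address bytes, most significant byte first. The C sketch below encodes the command bytes listed above and assembles such a frame. The enum and function names are hypothetical and the code is not taken from the bootloader sources.

[source,c]
----
#include <stdint.h>

/* Command bytes as listed above. */
enum spi_flash_cmd {
  SPI_FLASH_CMD_PAGE_PROGRAM = 0x02,
  SPI_FLASH_CMD_READ         = 0x03,
  SPI_FLASH_CMD_READ_STATUS  = 0x05,
  SPI_FLASH_CMD_WRITE_ENABLE = 0x06,
  SPI_FLASH_CMD_READ_ID      = 0x9E,
  SPI_FLASH_CMD_SECTOR_ERASE = 0xD8
};

/* Fill 'frame' with a command byte followed by a 24-bit address (MSB first).
 * Address bits above bit 23 are dropped. */
void spi_flash_build_frame(uint8_t frame[4], enum spi_flash_cmd cmd, uint32_t addr) {
  frame[0] = (uint8_t)cmd;
  frame[1] = (uint8_t)((addr >> 16) & 0xFFu); /* address bits 23..16 */
  frame[2] = (uint8_t)((addr >> 8) & 0xFFu);  /* address bits 15..8  */
  frame[3] = (uint8_t)(addr & 0xFFu);         /* address bits 7..0   */
}
----

For example, `spi_flash_build_frame(frame, SPI_FLASH_CMD_READ, 0x123456)` yields the bytes `0x03 0x12 0x34 0x56`, which would be shifted out while the chip-select line is asserted.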
|
962 |
60 |
zero_gravi |
|
963 |
|
|
|
964 |
|
|
:sectnums:
|
965 |
61 |
zero_gravi |
=== Programming an Executable
|
966 |
60 |
zero_gravi |
|
967 |
|
|
[start=1]
|
968 |
|
|
. At first, reset the NEORV32 processor and wait until the bootloader start screen appears in your terminal program.
|
969 |
|
|
. Abort the auto boot sequence and start the user console by pressing any key.
|
970 |
61 |
zero_gravi |
. Press `u` to upload the executable that you want to store to the external flash:
|
971 |
60 |
zero_gravi |
|
972 |
|
|
[source]
|
973 |
|
|
----
|
974 |
|
|
CMD:> u
|
975 |
|
|
Awaiting neorv32_exe.bin...
|
976 |
|
|
----
|
977 |
|
|
|
978 |
|
|
[start=4]
|
979 |
61 |
zero_gravi |
. Send the executable in raw binary format via your terminal program. When the upload is completed and "OK"
|
980 |
60 |
zero_gravi |
appears, press `p` to trigger the programming of the flash (do not execute the image via the `e`
|
981 |
|
|
command as this might corrupt the image):
|
982 |
|
|
|
983 |
|
|
[source]
|
984 |
|
|
----
|
985 |
|
|
CMD:> u
|
986 |
|
|
Awaiting neorv32_exe.bin... OK
|
987 |
|
|
CMD:> p
|
988 |
|
|
Write 0x000013FC bytes to SPI flash @ 0x08000000? (y/n)
|
989 |
|
|
----
|
990 |
|
|
|
991 |
|
|
[start=5]
|
992 |
|
|
. The bootloader shows the size of the executable and the base address inside the SPI flash where the
|
993 |
|
|
executable is going to be stored. A prompt appears: Type `y` to start the programming or type `n` to
|
994 |
61 |
zero_gravi |
abort.
|
995 |
60 |
zero_gravi |
|
996 |
61 |
zero_gravi |
[TIP]
|
997 |
|
|
Section <<_customizing_the_internal_bootloader>> shows the corresponding C-language `define` that can be modified
|
998 |
|
|
to specify the base address of the executable inside the SPI flash.
|
999 |
|
|
|
1000 |
60 |
zero_gravi |
[source]
|
1001 |
|
|
----
|
1002 |
|
|
CMD:> u
|
1003 |
|
|
Awaiting neorv32_exe.bin... OK
|
1004 |
|
|
CMD:> p
|
1005 |
61 |
zero_gravi |
Write 0x000013FC bytes to SPI flash @ 0x08000000? (y/n) y
|
1006 |
60 |
zero_gravi |
Flashing... OK
|
1007 |
|
|
CMD:>
|
1008 |
|
|
----
|
1009 |
|
|
|
1010 |
|
|
[start=6]
|
1011 |
|
|
. If "OK" appears in the terminal line, the programming process was successful. Now you can use the
|
1012 |
|
|
auto boot sequence to automatically boot your application from the flash at system start-up without
|
1013 |
|
|
any user interaction.
|
1014 |
|
|
|
1015 |
|
|
|
1016 |
|
|
|
1017 |
|
|
<<<
|
1018 |
|
|
// ####################################################################################################################
|
1019 |
|
|
:sectnums:
|
1020 |
61 |
zero_gravi |
== Packaging the Processor as IP block for Xilinx Vivado Block Designer
|
1021 |
|
|
|
1022 |
62 |
zero_gravi |
[start=1]
|
1023 |
64 |
zero_gravi |
. Import all the core files from `rtl/core` (including default internal memory architectures from `rtl/core/mem`)
|
1024 |
|
|
and assign them to a _new_ design library `neorv32`.
|
1025 |
62 |
zero_gravi |
. Instantiate the `rtl/wrappers/neorv32_top_axi4lite.vhd` module.
|
1026 |
|
|
. Then either directly use that module in a new block-design ("Create Block Design", right-click -> "Add Module",
|
1027 |
|
|
that's easier for a first try) or package it ("Tools", "Create and Package new IP") for the use in other projects.
|
1028 |
|
|
. Connect your AXI-peripheral directly to the core's AXI4-Interface if you only have one, or to an AXI-Interconnect
|
1029 |
|
|
(from the IP-catalog) if you have multiple peripherals.
|
1030 |
|
|
. Connect ALL the `ACLK` and `ARESETN` pins of all peripherals and interconnects to the processor's clock and reset
|
1031 |
|
|
signals to have a _unified_ clock and reset domain (easier for a first setup).
|
1032 |
|
|
. Open the "Address Editor" tab and let Vivado assign the base-addresses for the AXI-peripherals (you can modify them
|
1033 |
|
|
according to your needs).
|
1034 |
|
|
. For all FPGA-external signals (like UART signals) make all the connections you need "external"
|
1035 |
|
|
(right-click on the signal/pin -> "Make External").
|
1036 |
|
|
. Save everything, let Vivado create an HDL wrapper for the block design and choose this wrapper as your _Top Level Design_.
|
1037 |
|
|
. Define your constraints and generate your bitstream.
|
1038 |
61 |
zero_gravi |
|
1039 |
65 |
zero_gravi |
.TWI Tri-State Drivers
|
1040 |
|
|
[IMPORTANT]
|
1041 |
|
|
Set the synthesis option "global" when generating the block design to maintain the internal TWI tri-state drivers.
|
1042 |
|
|
|
1043 |
62 |
zero_gravi |
[NOTE]
|
1044 |
65 |
zero_gravi |
Guide provided by GitHub user https://github.com/AWenzel83[`AWenzel83`] (see
|
1045 |
|
|
https://github.com/stnolting/neorv32/discussions/52#discussioncomment-819013). ❤️
|
1046 |
61 |
zero_gravi |
|
1047 |
|
|
|
1048 |
62 |
zero_gravi |
|
1049 |
61 |
zero_gravi |
<<<
|
1050 |
|
|
// ####################################################################################################################
|
1051 |
|
|
:sectnums:
|
1052 |
60 |
zero_gravi |
== Simulating the Processor
|
1053 |
|
|
|
1054 |
64 |
zero_gravi |
The NEORV32 project includes a core CPU, built-in peripherals in the Processor Subsystem, and additional peripherals in
|
1055 |
|
|
the templates and examples.
|
1056 |
|
|
Therefore, there is a wide range of possible testing and verification strategies.
|
1057 |
|
|
|
1058 |
|
|
On the one hand, a simple smoke testbench allows ensuring that functionality is correct from a software point of view.
|
1059 |
|
|
That is used for running the RISC-V architecture tests, in order to guarantee compliance with the ISA specification(s).
|
1060 |
|
|
|
1061 |
|
|
On the other hand, http://vunit.github.io/[VUnit] and http://vunit.github.io/verification_components/user_guide.html[Verification Components] are used for verifying the functionality of the various peripherals from a hardware point of view.
|
1062 |
|
|
|
1063 |
61 |
zero_gravi |
:sectnums:
|
1064 |
|
|
=== Testbench
|
1065 |
|
|
|
1066 |
64 |
zero_gravi |
A plain-VHDL (no third-party libraries) testbench (`sim/simple/neorv32_tb.simple.vhd`) can be used for simulating and
|
1067 |
|
|
testing the processor.
|
1068 |
|
|
This testbench features a 100 MHz clock and enables all optional peripherals and CPU extensions except for the `E`
|
1069 |
|
|
extension and the TRNG IO module (that CANNOT be simulated due to its combinatorial (looped) architecture).
|
1070 |
60 |
zero_gravi |
|
1071 |
|
|
The simulation setup is configured via the "User Configuration" section located right at the beginning of
|
1072 |
|
|
the testbench's architecture. Each configuration constant provides comments to explain the functionality.
|
1073 |
|
|
|
1074 |
|
|
Besides the actual NEORV32 Processor, the testbench also simulates "external" components that are connected
|
1075 |
|
|
to the processor's external bus/memory interface. These components are:
|
1076 |
|
|
|
1077 |
|
|
* an external instruction memory (that also allows booting from it)
|
1078 |
|
|
* an external data memory
|
1079 |
|
|
* an external memory to simulate "external IO devices"
|
1080 |
|
|
* a memory-mapped register to trigger the processor's interrupt signals
|
1081 |
|
|
|
1082 |
|
|
The following table shows the base addresses of these four components and their default configuration and
|
1083 |
64 |
zero_gravi |
properties:
|
1084 |
60 |
zero_gravi |
|
1085 |
64 |
zero_gravi |
[NOTE]
|
1086 |
|
|
====
|
1087 |
|
|
Attributes:
|
1088 |
|
|
|
1089 |
|
|
* `r` = read
|
1090 |
|
|
* `w` = write
|
1091 |
|
|
* `e` = execute
|
1092 |
|
|
* `a` = atomic accesses possible
|
1093 |
|
|
* `8` = byte-accessible
|
1094 |
|
|
* `16` = half-word-accessible
|
1095 |
|
|
* `32` = word-accessible
|
1096 |
|
|
====
|
1097 |
|
|
|
1098 |
60 |
zero_gravi |
.Testbench: processor-external memories
|
1099 |
|
|
[cols="^4,>3,^5,<11"]
|
1100 |
|
|
[options="header",grid="rows"]
|
1101 |
|
|
|=======================
|
1102 |
|
|
| Base address | Size | Attributes | Description
|
1103 |
|
|
| `0x00000000` | `imem_size_c` | `r/w/e, a, 8/16/32` | external IMEM (initialized with application image)
|
1104 |
|
|
| `0x80000000` | `dmem_size_c` | `r/w/e, a, 8/16/32` | external DMEM
|
1105 |
|
|
| `0xf0000000` | 64 bytes | `r/w/e, !a, 8/16/32` | external "IO" memory, atomic accesses will fail
|
1106 |
|
|
| `0xff000000` | 4 bytes | `-/w/-, a, -/-/32` | memory-mapped register to trigger "machine external", "machine software" and "SoC Fast Interrupt" interrupts
|
1107 |
|
|
|=======================
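The address decoding implied by this table can be sketched in a few lines of Python. This is an illustration only and not part of the testbench: the `imem_size` and `dmem_size` arguments stand in for the `imem_size_c` and `dmem_size_c` constants, and the sizes used in the example call are assumptions, not the testbench defaults.

```python
# Illustrative model of the testbench's external address map (not part of NEORV32).
# The IMEM/DMEM sizes are parameters because the values of imem_size_c and
# dmem_size_c are testbench constants that are not stated in this guide.

def tb_regions(imem_size, dmem_size):
    """Return (base, size, name) tuples for the four external components."""
    return [
        (0x00000000, imem_size, "external IMEM"),
        (0x80000000, dmem_size, "external DMEM"),
        (0xF0000000, 64, 'external "IO" memory'),
        (0xFF000000, 4, "interrupt trigger register"),
    ]

def decode(addr, imem_size, dmem_size):
    """Name the component that contains addr, or None if nothing responds."""
    for base, size, name in tb_regions(imem_size, dmem_size):
        if base <= addr < base + size:
            return name
    return None

if __name__ == "__main__":
    # 16 kB / 8 kB are example sizes only (assumption).
    for addr in (0x00000100, 0x80000004, 0xF000003C, 0xF0000040, 0xFF000000):
        print(f"0x{addr:08x} -> {decode(addr, 16 * 1024, 8 * 1024)}")
```

An address that falls outside all four regions (such as `0xF0000040` here) returns `None`, meaning that no simulated external component is mapped at that address.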
|
1108 |
|
|
|
1109 |
64 |
zero_gravi |
[IMPORTANT]
|
1110 |
63 |
zero_gravi |
The simulated NEORV32 does not use the bootloader and _directly boots_ the current application image (from
|
1111 |
|
|
the `rtl/core/neorv32_application_image.vhd` image file).
|
1112 |
60 |
zero_gravi |
|
1113 |
63 |
zero_gravi |
.UART output during simulation
|
1114 |
64 |
zero_gravi |
[IMPORTANT]
|
1115 |
60 |
zero_gravi |
Data written to the NEORV32 UART0 / UART1 transmitter is sent to a virtual UART receiver implemented
|
1116 |
|
|
as part of the testbench. Received characters are sent to the simulator console and are also stored in a log file
|
1117 |
63 |
zero_gravi |
(`neorv32.testbench_uart0.out` for UART0, `neorv32.testbench_uart1.out` for UART1) inside the simulation's home folder.
|
1118 |
|
|
**Please note that printing via the native UART receiver takes a lot of time.** For faster simulation console output
|
1119 |
|
|
see section <<_faster_simulation_console_output>>.
|
1120 |
60 |
zero_gravi |
|
1121 |
|
|
|
1122 |
61 |
zero_gravi |
:sectnums:
|
1123 |
|
|
=== Faster Simulation Console Output
|
1124 |
|
|
|
1125 |
60 |
zero_gravi |
When printing data via the UART the communication speed will always be based on the configured BAUD
|
1126 |
|
|
rate. For a simulation this might take some time. To have faster output you can enable the **simulation mode**
|
1127 |
64 |
zero_gravi |
for UART0/UART1 (see section https://stnolting.github.io/neorv32/#_primary_universal_asynchronous_receiver_and_transmitter_uart0[Documentation: Primary Universal Asynchronous Receiver and Transmitter (UART0)]).
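The cost of baud-rate-accurate printing can be estimated with a short Python sketch. It assumes the testbench's 100 MHz clock, a standard 8N1 frame (10 bits per character) and a baud rate of 19200. The frame format and the baud rate are assumptions for illustration, so substitute the values of your own setup.

```python
# Rough estimate of simulated clock cycles per UART character.
# Assumptions: 8N1 framing (1 start + 8 data + 1 stop bit) and 19200 baud.

def cycles_per_char(f_clk_hz, baud, bits_per_frame=10):
    """Clock cycles needed to shift out one UART frame (rounded down)."""
    return f_clk_hz * bits_per_frame // baud

def cycles_for_text(text, f_clk_hz=100_000_000, baud=19200):
    """Clock cycles needed to print a whole string back-to-back."""
    return len(text) * cycles_per_char(f_clk_hz, baud)

if __name__ == "__main__":
    msg = "Hello world! :)\n"
    print(cycles_per_char(100_000_000, 19200), "cycles per character")
    print(cycles_for_text(msg), "cycles for", repr(msg))
```

Under these assumptions every character occupies roughly fifty thousand simulated clock cycles, which is why the simulation mode is preferable for console output.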
|
1128 |
60 |
zero_gravi |
|
1129 |
64 |
zero_gravi |
ASCII data sent to UART0|UART1 will be immediately printed to the simulator console and logged to files in the simulator
|
1130 |
|
|
execution directory:
|
1131 |
60 |
zero_gravi |
|
1132 |
64 |
zero_gravi |
* `neorv32.uart?.sim_mode.text.out`: ASCII data.
|
1133 |
|
|
* `neorv32.uart?.sim_mode.data.out`: all written 32-bit words dumped as 8-char hexadecimal values.
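
The data dump can be post-processed with a small script. The following Python sketch assumes that the file contains one 8-character hexadecimal value per line; this layout is an assumption and should be checked against an actual dump. The sample data in the example is made up.

```python
# Sketch: decode a UART simulation-mode data dump.
# Assumption: the dump holds one 8-character hexadecimal word per line.

def parse_data_dump(text):
    """Convert the dump text into a list of 32-bit integers."""
    words = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if len(line) != 8:
            raise ValueError(f"expected 8 hex characters, got {line!r}")
        words.append(int(line, 16))
    return words

def low_bytes_as_text(words):
    """Interpret the lowest byte of every word as one character."""
    return "".join(chr(w & 0xFF) for w in words)

if __name__ == "__main__":
    sample = "00000048\n00000069\n0000000a\n"  # made-up sample data
    words = parse_data_dump(sample)
    print(words)
    print(repr(low_bytes_as_text(words)))
```

To process a real dump, read the file first, for example `parse_data_dump(open("neorv32.uart0.sim_mode.data.out").read())`.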
|
1134 |
60 |
zero_gravi |
|
1135 |
64 |
zero_gravi |
You can "automatically" enable the simulation mode of UART0/UART1 when compiling an application.
|
1136 |
|
|
In this case, the "real" UART0/UART1 transmitter unit is permanently disabled.
|
1137 |
|
|
To enable the simulation mode just compile and install your application and add _UART?_SIM_MODE_ to the compiler's
|
1138 |
|
|
_USER_FLAGS_ variable (do not forget the `-D` prefix):
|
1139 |
60 |
zero_gravi |
|
1140 |
|
|
[source, bash]
|
1141 |
|
|
----
|
1142 |
|
|
sw/example/blink_led$ make USER_FLAGS+=-DUART0_SIM_MODE clean_all all
|
1143 |
|
|
----
|
1144 |
|
|
|
1145 |
63 |
zero_gravi |
The provided define will change the default UART0/UART1 setup function in order to set the simulation
|
1146 |
|
|
mode flag in the corresponding UART's control register.
|
1147 |
60 |
zero_gravi |
|
1148 |
|
|
[NOTE]
|
1149 |
|
|
The UART simulation output (to file and to screen) outputs "complete lines" at once. A line is
|
1150 |
|
|
completed with a line feed (newline, ASCII `\n` = 10).
|
1151 |
|
|
|
1152 |
|
|
|
1153 |
61 |
zero_gravi |
:sectnums:
|
1154 |
64 |
zero_gravi |
=== Simulation using a shell script (with GHDL)
|
1155 |
60 |
zero_gravi |
|
1156 |
64 |
zero_gravi |
To simulate the processor using _GHDL_ navigate to the `sim/simple/` folder and run the provided shell script.
|
1157 |
61 |
zero_gravi |
Any arguments that are provided while executing this script are passed to GHDL.
|
1158 |
|
|
For example, the simulation time can be set to 20 ms by passing `--stop-time=20ms` as an argument.
|
1159 |
60 |
zero_gravi |
|
1160 |
|
|
[source, bash]
|
1161 |
|
|
----
|
1162 |
64 |
zero_gravi |
neorv32/sim/simple$ sh ghdl_sim.sh --stop-time=20ms
|
1163 |
60 |
zero_gravi |
----
|
1164 |
|
|
|
1165 |
|
|
|
1166 |
63 |
zero_gravi |
:sectnums:
|
1167 |
64 |
zero_gravi |
=== Simulation using Application Makefiles (In-Console with GHDL)
|
1168 |
60 |
zero_gravi |
|
1169 |
63 |
zero_gravi |
To directly compile and run a program in the console (using the default testbench and GHDL
|
1170 |
|
|
as simulator) you can use the `sim` makefile target. Make sure to use the UART simulation mode
|
1171 |
|
|
(`USER_FLAGS+=-DUART0_SIM_MODE` and/or `USER_FLAGS+=-DUART1_SIM_MODE`) to get
|
1172 |
|
|
faster / direct-to-console UART output.
|
1173 |
|
|
|
1174 |
|
|
[source, bash]
|
1175 |
|
|
----
|
1176 |
|
|
sw/example/blink_led$ make USER_FLAGS+=-DUART0_SIM_MODE clean_all sim
|
1177 |
|
|
[...]
|
1178 |
|
|
Blinking LED demo program
|
1179 |
|
|
----
|
1180 |
|
|
|
1181 |
|
|
|
1182 |
|
|
:sectnums:
|
1183 |
64 |
zero_gravi |
==== Hello World!
|
1184 |
63 |
zero_gravi |
|
1185 |
64 |
zero_gravi |
To do a quick test of the NEORV32 make sure to have https://github.com/ghdl/ghdl[GHDL] and a
|
1186 |
|
|
https://github.com/stnolting/riscv-gcc-prebuilt[RISC-V gcc toolchain] installed.
|
1187 |
65 |
zero_gravi |
Navigate to the project's `sw/example/hello_world` folder and run `make USER_FLAGS+=-DUART0_SIM_MODE MARCH=rv32imac clean_all sim`:
|
1188 |
63 |
zero_gravi |
|
1189 |
|
|
[TIP]
|
1190 |
|
|
The simulator will output some _sanity check_ notes (and warnings or even errors if something is ill-configured)
|
1191 |
|
|
right at the beginning of the simulation to give a brief overview of the actual NEORV32 SoC and CPU configurations.
|
1192 |
|
|
|
1193 |
|
|
[source, bash]
|
1194 |
|
|
----
|
1195 |
65 |
zero_gravi |
stnolting@Einstein:/mnt/n/Projects/neorv32/sw/example/hello_world$ make USER_FLAGS+=-DUART0_SIM_MODE MARCH=rv32imac clean_all sim
|
1196 |
63 |
zero_gravi |
../../../sw/lib/source/neorv32_uart.c: In function 'neorv32_uart0_setup':
|
1197 |
|
|
../../../sw/lib/source/neorv32_uart.c:301:4: warning: #warning UART0_SIM_MODE (primary UART) enabled! Sending all UART0.TX data to text.io simulation output instead of real UART0 transmitter. Use this for simulations only! [-Wcpp]
|
1198 |
64 |
zero_gravi |
301 | #warning UART0_SIM_MODE (primary UART) enabled! Sending all UART0.TX data to text.io simulation output instead of real UART0 transmitter. Use this for simulations only! <1>
|
1199 |
63 |
zero_gravi |
| ^~~~~~~
|
1200 |
|
|
Memory utilization:
|
1201 |
|
|
text data bss dec hex filename
|
1202 |
64 |
zero_gravi |
4612 0 120 4732 127c main.elf <2>
|
1203 |
63 |
zero_gravi |
Compiling ../../../sw/image_gen/image_gen
|
1204 |
64 |
zero_gravi |
Installing application image to ../../../rtl/core/neorv32_application_image.vhd <3>
|
1205 |
63 |
zero_gravi |
Simulating neorv32_application_image.vhd...
|
1206 |
64 |
zero_gravi |
Tip: Compile application with USER_FLAGS+=-DUART[0/1]_SIM_MODE to auto-enable UART[0/1]'s simulation mode (redirect UART output to simulator console). <4>
|
1207 |
|
|
Using simulation runtime args: --stop-time=10ms <5>
|
1208 |
|
|
../rtl/core/neorv32_top.vhd:347:3:@0ms:(assertion note): NEORV32 PROCESSOR IO Configuration: GPIO MTIME UART0 UART1 SPI TWI PWM WDT CFS SLINK NEOLED XIRQ <6>
|
1209 |
63 |
zero_gravi |
../rtl/core/neorv32_top.vhd:370:3:@0ms:(assertion note): NEORV32 PROCESSOR CONFIG NOTE: Boot configuration: Direct boot from memory (processor-internal IMEM).
|
1210 |
|
|
../rtl/core/neorv32_top.vhd:394:3:@0ms:(assertion note): NEORV32 PROCESSOR CONFIG NOTE: Implementing on-chip debugger (OCD).
|
1211 |
|
|
../rtl/core/neorv32_cpu.vhd:169:3:@0ms:(assertion note): NEORV32 CPU ISA Configuration (MARCH): RV32IMACU_Zbb_Zicsr_Zifencei_Zfinx_Debug
|
1212 |
|
|
../rtl/core/neorv32_cpu.vhd:189:3:@0ms:(assertion note): NEORV32 CPU CONFIG NOTE: Implementing NO dedicated hardware reset for uncritical registers (default, might reduce area). Set package constant = TRUE to configure a DEFINED reset value for all CPU registers.
|
1213 |
|
|
../rtl/core/neorv32_imem.vhd:107:3:@0ms:(assertion note): NEORV32 PROCESSOR CONFIG NOTE: Implementing processor-internal IMEM as ROM (16384 bytes), pre-initialized with application (4612 bytes).
|
1214 |
|
|
../rtl/core/neorv32_dmem.vhd:89:3:@0ms:(assertion note): NEORV32 PROCESSOR CONFIG NOTE: Implementing processor-internal DMEM (RAM, 8192 bytes).
|
1215 |
|
|
../rtl/core/neorv32_wishbone.vhd:136:3:@0ms:(assertion note): NEORV32 PROCESSOR CONFIG NOTE: External Bus Interface - Implementing STANDARD Wishbone protocol.
|
1216 |
|
|
../rtl/core/neorv32_wishbone.vhd:140:3:@0ms:(assertion note): NEORV32 PROCESSOR CONFIG NOTE: External Bus Interface - Implementing auto-timeout (255 cycles).
|
1217 |
|
|
../rtl/core/neorv32_wishbone.vhd:144:3:@0ms:(assertion note): NEORV32 PROCESSOR CONFIG NOTE: External Bus Interface - Implementing LITTLE-endian byte order.
|
1218 |
|
|
../rtl/core/neorv32_wishbone.vhd:148:3:@0ms:(assertion note): NEORV32 PROCESSOR CONFIG NOTE: External Bus Interface - Implementing registered RX path.
|
1219 |
|
|
../rtl/core/neorv32_slink.vhd:161:3:@0ms:(assertion note): NEORV32 PROCESSOR CONFIG NOTE: Implementing 8 RX and 8 TX stream links.
|
1220 |
64 |
zero_gravi |
<7>
|
1221 |
63 |
zero_gravi |
##
|
1222 |
|
|
## ## ## ##
|
1223 |
|
|
## ## ######### ######## ######## ## ## ######## ######## ## ################
|
1224 |
|
|
#### ## ## ## ## ## ## ## ## ## ## ## ## ## #### ####
|
1225 |
|
|
## ## ## ## ## ## ## ## ## ## ## ## ## ## ###### ##
|
1226 |
|
|
## ## ## ######### ## ## ######### ## ## ##### ## ## #### ###### ####
|
1227 |
|
|
## ## ## ## ## ## ## ## ## ## ## ## ## ## ###### ##
|
1228 |
|
|
## #### ## ## ## ## ## ## ## ## ## ## ## #### ####
|
1229 |
|
|
## ## ######### ######## ## ## ## ######## ########## ## ################
|
1230 |
|
|
## ## ## ##
|
1231 |
|
|
##
|
1232 |
|
|
Hello world! :)
|
1233 |
|
|
----
|
1234 |
64 |
zero_gravi |
<1> Notification that the "simulation mode" of UART0 is enabled (by the `USER_FLAGS+=-DUART0_SIM_MODE` makefile flag). All UART0 output is sent to the simulator console.
|
1235 |
|
|
<2> Final executable size (`text`) and _static_ data memory requirements (`data`, `bss`).
|
1236 |
|
|
<3> The application code is _installed_ as pre-initialized IMEM. This is the default approach for simulation.
|
1237 |
|
|
<4> A note regarding UART "simulation mode", but we have already enabled that.
|
1238 |
|
|
<5> List of (default) arguments that were sent to the simulator. Here: maximum simulation time (10ms).
|
1239 |
|
|
<6> "Sanity checks" from the core's VHDL files. These reports give some brief information about the SoC/CPU configuration (-> generics). If there are problems with the current configuration, an ERROR will appear.
|
1240 |
|
|
<7> Execution of the actual program starts.
|
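The memory utilization line from the log above can be checked against the memory sizes reported by the sanity checks (16384 bytes of IMEM, 8192 bytes of DMEM). The following Python sketch is a simplified model: it assumes that `text` and `data` are stored in the instruction memory and that `data` and `bss` occupy the data memory at run time.

```python
# Sketch: check a `size`-style report against the memory sizes from the log.
# Simplifying assumption: text + data live in IMEM, data + bss live in DMEM.

def parse_size_line(line):
    """Parse the 'text data bss dec hex filename' values of one report line."""
    text, data, bss, dec, hex_total, name = line.split()
    report = {"text": int(text), "data": int(data), "bss": int(bss),
              "dec": int(dec), "hex": int(hex_total, 16), "file": name}
    # The dec and hex columns both hold the sum of the three sections.
    if report["dec"] != report["text"] + report["data"] + report["bss"]:
        raise ValueError("dec column is not text + data + bss")
    if report["hex"] != report["dec"]:
        raise ValueError("hex column does not match dec column")
    return report

def fits(report, imem_bytes, dmem_bytes):
    """True if the image fits into IMEM and the static data fits into DMEM."""
    image = report["text"] + report["data"]
    static_ram = report["data"] + report["bss"]
    return image <= imem_bytes and static_ram <= dmem_bytes

if __name__ == "__main__":
    r = parse_size_line("4612 0 120 4732 127c main.elf")
    print(r)
    print("fits:", fits(r, imem_bytes=16384, dmem_bytes=8192))
```

For the values shown in the log, 4612 + 0 + 120 = 4732 = `0x127c`, and the 4612-byte image matches the application size reported by the IMEM sanity check.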
1241 |
63 |
zero_gravi |
|
1242 |
|
|
|
1243 |
|
|
:sectnums:
|
1244 |
64 |
zero_gravi |
=== Advanced Simulation using VUnit
|
1245 |
63 |
zero_gravi |
|
1246 |
64 |
zero_gravi |
https://vunit.github.io/[VUnit] is an open source unit testing framework for VHDL/SystemVerilog.
|
1247 |
|
|
It allows continuous and automated testing of HDL code by complementing traditional testing methodologies.
|
1248 |
|
|
The motto of VUnit is _"testing early and often"_ through automation.
|
1249 |
63 |
zero_gravi |
|
1250 |
64 |
zero_gravi |
VUnit is composed of a http://vunit.github.io/py/ui.html[Python interface] and multiple optional
|
1251 |
|
|
http://vunit.github.io/vhdl_libraries.html[VHDL libraries].
|
1252 |
|
|
The Python interface allows declaring sources and simulation options, and it handles the compilation, execution and
|
1253 |
|
|
gathering of the results regardless of the simulator used.
|
1254 |
|
|
That allows having a single `run.py` script to be used with GHDL, ModelSim/QuestaSim, Riviera PRO, etc.
|
1255 |
|
|
VUnit's VHDL libraries, on the other hand, provide utilities for assertions, logging, virtual queues, CSV file handling, etc.
|
1256 |
|
|
The http://vunit.github.io/verification_components/user_guide.html[Verification Component Library] uses those features
|
1257 |
|
|
for abstracting away bit-toggling when verifying standard interfaces such as Wishbone, AXI, Avalon, UARTs, etc.
|
1258 |
63 |
zero_gravi |
|
1259 |
64 |
zero_gravi |
Testbench sources in `sim` (such as `sim/neorv32_tb.vhd` and `sim/uart_rx*.vhd`) use VUnit's VHDL libraries for testing
|
1260 |
|
|
NEORV32 and peripherals.
|
1261 |
66 |
zero_gravi |
The entry-point for executing the tests is `sim/run.py`.
|
1262 |
63 |
zero_gravi |
|
1263 |
64 |
zero_gravi |
[source, bash]
|
1264 |
|
|
----
|
1265 |
|
|
# ./sim/run.py -l
|
1266 |
|
|
neorv32.neorv32_tb.all
|
1267 |
|
|
Listed 1 tests
|
1268 |
63 |
zero_gravi |
|
1269 |
64 |
zero_gravi |
# ./sim/run.py -v
|
1270 |
|
|
Compiling into neorv32: rtl/core/neorv32_uart.vhd passed
|
1271 |
|
|
Compiling into neorv32: rtl/core/neorv32_twi.vhd passed
|
1272 |
|
|
Compiling into neorv32: rtl/core/neorv32_trng.vhd passed
|
1273 |
|
|
...
|
1274 |
|
|
----
|
1275 |
63 |
zero_gravi |
|
1276 |
64 |
zero_gravi |
See http://vunit.github.io/user_guide.html[VUnit: User Guide] and http://vunit.github.io/cli.html[VUnit: Command Line Interface] for further info about VUnit's features.
|
1277 |
|
|
|
1278 |
|
|
|
1279 |
60 |
zero_gravi |
<<<
|
1280 |
|
|
// ####################################################################################################################
|
1281 |
|
|
:sectnums:
|
1282 |
|
|
== Building the Documentation
|
1283 |
|
|
|
1284 |
61 |
zero_gravi |
The documentation (datasheet + user guide) is written using `asciidoc`. The according source files
|
1285 |
|
|
can be found in `docs/...`. The documentation of the software framework is written _in-code_ using `doxygen`.
|
1286 |
60 |
zero_gravi |
|
1287 |
62 |
zero_gravi |
A makefile in the project's `docs` directory is provided to build all of the documentation as HTML pages
|
1288 |
60 |
zero_gravi |
or as PDF documents.
|
1289 |
|
|
|
1290 |
|
|
[TIP]
|
1291 |
61 |
zero_gravi |
Pre-rendered PDFs are available online as _nightly pre-releases_: https://github.com/stnolting/neorv32/releases.
|
1292 |
60 |
zero_gravi |
The HTML-based documentation is also available online at the project's https://stnolting.github.io/neorv32/[GitHub Pages].
|
1293 |
|
|
|
1294 |
|
|
The makefile provides a help target to show all available build options and their according outputs.
|
1295 |
|
|
|
1296 |
|
|
[source,bash]
|
1297 |
|
|
----
|
1298 |
62 |
zero_gravi |
neorv32/docs$ make help
|
1299 |
60 |
zero_gravi |
----
|
1300 |
|
|
|
1301 |
|
|
.Example: Generate HTML documentation (data sheet) using `asciidoctor`
|
1302 |
|
|
[source,bash]
|
1303 |
|
|
----
|
1304 |
62 |
zero_gravi |
neorv32/docs$ make html
|
1305 |
60 |
zero_gravi |
----
|
1306 |
|
|
|
1307 |
|
|
[TIP]
|
1308 |
|
|
If you don't have `asciidoctor` / `asciidoctor-pdf` installed, you can still generate all the documentation using
|
1309 |
|
|
a _docker container_ via `make container`.
|
1310 |
|
|
|
1311 |
|
|
|
1312 |
|
|
|
1313 |
|
|
<<<
|
1314 |
|
|
// ####################################################################################################################
|
1315 |
|
|
:sectnums:
|
1316 |
65 |
zero_gravi |
== Zephyr RTOS Support 🪁
|
1317 |
|
|
|
1318 |
|
|
The NEORV32 processor is supported by upstream Zephyr RTOS: https://docs.zephyrproject.org/latest/boards/riscv/neorv32/doc/index.html
|
1319 |
|
|
|
1320 |
|
|
[IMPORTANT]
|
1321 |
|
|
The absolute path to the NEORV32 executable image generator binary (`.../neorv32/sw/image_gen`) has to be added to the `PATH` variable
|
1322 |
|
|
so the Zephyr build system can generate executables and memory-initialization images.
|
1323 |
|
|
|
1324 |
|
|
[NOTE]
|
1325 |
|
|
Zephyr OS port provided by GitHub user https://github.com/henrikbrixandersen[henrikbrixandersen]
|
1326 |
|
|
(see https://github.com/stnolting/neorv32/discussions/172). ❤️
|
1327 |
|
|
|
1328 |
|
|
|
1329 |
|
|
|
1330 |
|
|
<<<
|
1331 |
|
|
// ####################################################################################################################
|
1332 |
|
|
:sectnums:
|
1333 |
60 |
zero_gravi |
== FreeRTOS Support
|
1334 |
|
|
|
1335 |
|
|
A NEORV32-specific port and a simple demo for FreeRTOS (https://github.com/FreeRTOS/FreeRTOS) are
|
1336 |
61 |
zero_gravi |
available in the `sw/example/demo_freeRTOS` folder. See the according documentation (`sw/example/demo_freeRTOS/README.md`)
|
1337 |
|
|
for more information.
|
1338 |
60 |
zero_gravi |
|
1339 |
|
|
|
1340 |
|
|
|
1341 |
|
|
// ####################################################################################################################
|
1342 |
|
|
:sectnums:
|
1343 |
|
|
== RISC-V Architecture Test Framework
|
1344 |
|
|
|
1345 |
|
|
The NEORV32 Processor passes the according tests provided by the official RISC-V Architecture Test Suite
|
1346 |
|
|
(V2.0+), which is available online at GitHub: https://github.com/riscv/riscv-arch-test
|
1347 |
|
|
|
1348 |
|
|
All files required for executing the test framework on a simulated instance of the processor (including port
|
1349 |
62 |
zero_gravi |
files) are located in the `sw/isa-test` folder of the NEORV32 repository. The test framework is executed via the
|
1350 |
|
|
`sim/run_riscv_arch_test.sh` script. Take a look at the provided `sim/README.md`
|
1351 |
|
|
(https://github.com/stnolting/neorv32/tree/master/sim[online at GitHub])
|
1352 |
60 |
zero_gravi |
file for more information on how to run the tests and how testing is conducted in detail.
|
1353 |
|
|
|
1354 |
|
|
|
1355 |
|
|
|
1356 |
|
|
<<<
|
1357 |
|
|
// ####################################################################################################################
|
1358 |
|
|
:sectnums:
|
1359 |
|
|
== Debugging using the On-Chip Debugger
|
1360 |
|
|
|
1361 |
61 |
zero_gravi |
The NEORV32 on-chip debugger allows _online_ in-system debugging via an external JTAG access port from a
|
1362 |
60 |
zero_gravi |
host machine. The general flow is independent of the host machine's operating system. However, this tutorial uses
|
1363 |
|
|
Windows and Linux (Ubuntu on Windows) in parallel.
|
1364 |
|
|
|
1365 |
61 |
zero_gravi |
[TIP]
|
1366 |
|
|
See datasheet section https://stnolting.github.io/neorv32/#_on_chip_debugger_ocd[On Chip Debugger (OCD)]
|
1367 |
|
|
for more information.
|
1368 |
|
|
|
1369 |
60 |
zero_gravi |
[NOTE]
|
1370 |
|
|
This tutorial uses `gdb` to **directly upload an executable** to the processor. If you are using the default
|
1371 |
|
|
processor setup _with_ internal instruction memory (IMEM) make sure it is implemented as RAM
|
1372 |
61 |
zero_gravi |
(_INT_BOOTLOADER_EN_ generic = true).
|
1373 |
60 |
zero_gravi |
|
1374 |
64 |
zero_gravi |
[IMPORTANT]
|
1375 |
|
|
The on-chip debugger is only implemented if the _ON_CHIP_DEBUGGER_EN_ generic is set _true_. Furthermore, it requires
|
1376 |
|
|
the `Zicsr` and `Zifencei` CPU extensions to be implemented (top generics _CPU_EXTENSION_RISCV_Zicsr_
|
1377 |
|
|
and _CPU_EXTENSION_RISCV_Zifencei_ = true).
|
1378 |
60 |
zero_gravi |
|
1379 |
64 |
zero_gravi |
|
1380 |
60 |
zero_gravi |
:sectnums:
|
1381 |
|
|
=== Hardware Requirements
|
1382 |
|
|
|
1383 |
|
|
Make sure the on-chip debugger of your NEORV32 setup is implemented (_ON_CHIP_DEBUGGER_EN_ generic = true).
|
1384 |
|
|
Connect a JTAG adapter to the NEORV32 `jtag_*` interface signals. If you do not have a full-scale JTAG adapter, you can
|
1385 |
|
|
also use an FTDI-based adapter like the "FT2232H-56Q Mini Module", which is a simple and inexpensive FTDI breakout board.
|
1386 |
|
|
|
1387 |
|
|
.JTAG pin mapping
|
1388 |
|
|
[cols="^3,^2,^2"]
|
1389 |
|
|
[options="header",grid="rows"]
|
1390 |
|
|
|=======================
|
1391 |
|
|
| NEORV32 top signal | JTAG signal | FTDI port
|
1392 |
|
|
| `jtag_tck_i` | TCK | D0
|
1393 |
|
|
| `jtag_tdi_i` | TDI | D1
|
1394 |
|
|
| `jtag_tdo_o` | TDO | D2
|
1395 |
|
|
| `jtag_tms_i` | TMS | D3
|
1396 |
|
|
| `jtag_trst_i` | TRST | D4
|
1397 |
|
|
|=======================
|
1398 |
|
|
|
1399 |
|
|
[TIP]
|
1400 |
|
|
The low-active JTAG _test reset_ (TRST) signal is _optional_ as a reset can also be triggered via the TAP controller.
|
1401 |
|
|
If TRST is not used make sure to pull the signal _high_.
|
1402 |
|
|
|
1403 |
|
|
|
1404 |
|
|
:sectnums:
|
1405 |
|
|
=== OpenOCD
|
1406 |
|
|
|
1407 |
|
|
The NEORV32 on-chip debugger can be accessed using the https://github.com/riscv/riscv-openocd[RISC-V port of OpenOCD].
|
1408 |
|
|
Prebuilt binaries can be obtained - for example - from https://www.sifive.com/software[SiFive]. A pre-configured
|
1409 |
|
|
OpenOCD configuration file (`sw/openocd/openocd_neorv32.cfg`) is available that allows easy access to the NEORV32 CPU.
|
1410 |
|
|
|
1411 |
|
|
[NOTE]
|
1412 |
|
|
You might need to adapt `ftdi_vid_pid`, `ftdi_channel` and `ftdi_layout_init` in `sw/openocd/openocd_neorv32.cfg`
|
1413 |
|
|
according to your interface chip and your operating system.
|
1414 |
|
|
|
1415 |
|
|
[TIP]
|
1416 |
|
|
If you want to modify the JTAG clock speed (via `adapter speed` in `sw/openocd/openocd_neorv32.cfg`) make sure to meet
|
1417 |
|
|
the clock requirements noted in https://stnolting.github.io/neorv32/#_debug_module_dm[Documentation: Debug Transport Module (DTM)].
|
1418 |
|
|
|
1419 |
|
|
To access the processor using OpenOCD, open a terminal and start OpenOCD with the pre-configured configuration file.
|
1420 |
|
|
|
1421 |
|
|
.Connecting via OpenOCD (on Windows)
|
1422 |
|
|
[source, bash]
|
1423 |
|
|
--------------------------
|
1424 |
|
|
N:\Projects\neorv32\sw\openocd>openocd -f openocd_neorv32.cfg
|
1425 |
|
|
Open On-Chip Debugger 0.11.0-rc1+dev (SiFive OpenOCD 0.10.0-2020.12.1)
|
1426 |
|
|
Licensed under GNU GPL v2
|
1427 |
|
|
For bug reports:
|
1428 |
|
|
https://github.com/sifive/freedom-tools/issues
|
1429 |
|
|
1
|
1430 |
|
|
Info : Listening on port 6666 for tcl connections
|
1431 |
|
|
Info : Listening on port 4444 for telnet connections
|
1432 |
|
|
Info : clock speed 1000 kHz
|
1433 |
|
|
Info : JTAG tap: neorv32.cpu tap/device found: 0x0cafe001 (mfg: 0x000 (), part: 0xcafe, ver: 0x0)
|
1434 |
|
|
Info : datacount=1 progbufsize=2
|
1435 |
|
|
Info : Disabling abstract command reads from CSRs.
|
1436 |
|
|
Info : Examined RISC-V core; found 1 harts
|
1437 |
|
|
Info : hart 0: XLEN=32, misa=0x40801105
|
1438 |
|
|
Info : starting gdb server for neorv32.cpu.0 on 3333
|
1439 |
|
|
Info : Listening on port 3333 for gdb connections
|
1440 |
|
|
--------------------------
|
1441 |
|
|
|
1442 |
|
|
OpenOCD has successfully connected to the NEORV32 on-chip debugger and has examined the CPU (showing the content of
|
1443 |
|
|
the `misa` CSR). Now you can use `gdb` to connect via port 3333.
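
The `misa` value printed by OpenOCD can be decoded by hand or with a short script. The following Python sketch follows the standard RISC-V layout of `misa` (base width in the two topmost bits, one bit per extension letter in bits 25 to 0). The function name is chosen for this example only.

```python
# Sketch: decode a RISC-V `misa` value as reported by OpenOCD.

def decode_misa(misa, xlen=32):
    """Return (base width in bits, list of extension letters) for a misa value."""
    mxl = (misa >> (xlen - 2)) & 0x3          # top two bits encode the base width
    width = {1: 32, 2: 64, 3: 128}.get(mxl)   # None if the field is zero
    letters = [chr(ord("A") + bit) for bit in range(26) if misa & (1 << bit)]
    return width, letters

if __name__ == "__main__":
    width, ext = decode_misa(0x40801105)
    print(f"XLEN={width}, extensions={''.join(ext)}")
```

For `0x40801105` this yields a 32-bit base ISA with the `A`, `C`, `I` and `M` bits set, plus `X`, which flags the presence of non-standard extensions.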
|
1444 |
|
|
|
1445 |
|
|
|
1446 |
|
|
:sectnums:
|
1447 |
|
|
=== Debugging with GDB
|
1448 |
|
|
|
1449 |
|
|
This guide uses the simple "blink example" from `sw/example/blink_led` as a simplified test application to
|
1450 |
|
|
show the basics of in-system debugging.
|
1451 |
|
|
|
1452 |
|
|
At first, the application needs to be compiled. We will use the minimal machine architecture configuration
|
1453 |
|
|
(`rv32i`) here to be independent of the actual processor/CPU configuration.
|
1454 |
|
|
Navigate to `sw/example/blink_led` and compile the application:
|
1455 |
|
|
|
1456 |
|
|
.Compile the test application
|
1457 |
|
|
[source, bash]
|
1458 |
|
|
--------------------------
|
1459 |
65 |
zero_gravi |
.../neorv32/sw/example/blink_led$ make MARCH=rv32i USER_FLAGS+=-g clean_all all
|
1460 |
60 |
zero_gravi |
--------------------------
|
1461 |
|
|
|
1462 |
64 |
zero_gravi |
.Adding debug symbols to the executable
|
1463 |
|
|
[NOTE]
|
1464 |
|
|
`USER_FLAGS+=-g` passes the `-g` flag to the compiler so it adds debug information/symbols
|
1465 |
|
|
to the generated ELF file. This is optional but will provide more sophisticated information for debugging
|
1466 |
|
|
(like source file line numbers).
|
1467 |
|
|
|
1468 |
60 |
zero_gravi |
This will generate an ELF file `main.elf` that contains all the symbols required for debugging.
|
1469 |
|
|
Furthermore, an assembly listing file `main.asm` is generated that we will use to define breakpoints.
|
1470 |
|
|
|
1471 |
|
|
Open another terminal in `sw/example/blink_led` and start `gdb`.
|
1472 |
61 |
zero_gravi |
The GNU debugger is part of the toolchain (see <<_software_toolchain_setup>>).
|
1473 |
60 |
zero_gravi |
|
1474 |
|
|
.Starting GDB (on Linux (Ubuntu on Windows))
|
1475 |
|
|
[source, bash]
|
1476 |
|
|
--------------------------
|
1477 |
|
|
.../neorv32/sw/example/blink_led$ riscv32-unknown-elf-gdb
|
1478 |
|
|
GNU gdb (GDB) 10.1
|
1479 |
|
|
Copyright (C) 2020 Free Software Foundation, Inc.
|
1480 |
|
|
License GPLv3+: GNU GPL version 3 or later
|
1481 |
|
|
This is free software: you are free to change and redistribute it.
|
1482 |
|
|
There is NO WARRANTY, to the extent permitted by law.
|
1483 |
|
|
Type "show copying" and "show warranty" for details.
|
1484 |
|
|
This GDB was configured as "--host=x86_64-pc-linux-gnu --target=riscv32-unknown-elf".
|
1485 |
|
|
Type "show configuration" for configuration details.
|
1486 |
|
|
For bug reporting instructions, please see:
|
1487 |
|
|
.
|
1488 |
|
|
Find the GDB manual and other documentation resources online at:
|
1489 |
|
|
.
|
1490 |
|
|
|
1491 |
|
|
For help, type "help".
|
1492 |
|
|
Type "apropos word" to search for commands related to "word".
|
1493 |
|
|
(gdb)
|
1494 |
|
|
--------------------------
|
1495 |
|
|
|
1496 |
64 |
zero_gravi |
Now connect to OpenOCD using the default port 3333 on your machine.
|
1497 |
|
|
We will use the previously generated ELF file `main.elf` from the `blink_led` example.
|
1498 |
|
|
Finally, upload the program to the processor and start debugging.
|
1499 |
60 |
zero_gravi |
|
1500 |
|
|
[NOTE]
|
1501 |
|
|
The executable that is uploaded to the processor is **not** the default NEORV32 executable (`neorv32_exe.bin`) that
|
1502 |
|
|
is used for uploading via the bootloader. Instead, all the required sections (like `.text`) are extracted from `main.elf`
|
1503 |
|
|
by GDB and uploaded via the debugger's indirect memory access.
|
1504 |
|
|
|
1505 |
|
|
.Running GDB
|
1506 |
|
|
[source, bash]
|
1507 |
|
|
--------------------------
|
1508 |
64 |
zero_gravi |
(gdb) target extended-remote localhost:3333 <1>
|
1509 |
60 |
zero_gravi |
Remote debugging using localhost:3333
|
1510 |
|
|
warning: No executable has been specified and target does not support
|
1511 |
|
|
determining executable automatically. Try using the "file" command.
|
1512 |
|
|
0xffff0c94 in ?? () <2>
|
1513 |
|
|
(gdb) file main.elf <3>
|
1514 |
|
|
A program is being debugged already.
|
1515 |
|
|
Are you sure you want to change the file? (y or n) y
|
1516 |
|
|
Reading symbols from main.elf...
|
1517 |
|
|
(gdb) load <4>
|
1518 |
|
|
Loading section .text, size 0xd0c lma 0x0
|
1519 |
|
|
Loading section .rodata, size 0x39c lma 0xd0c
|
1520 |
|
|
Start address 0x00000000, load size 4264
|
1521 |
|
|
Transfer rate: 43 KB/sec, 2132 bytes/write.
|
1522 |
|
|
(gdb)
|
1523 |
|
|
--------------------------
|
1524 |
|
|
<1> Connect to OpenOCD
|
1525 |
|
|
<2> The CPU was still executing code from the bootloader ROM - but that does not matter here
|
1526 |
|
|
<3> Select `main.elf` from the `blink_led` example
|
1527 |
|
|
<4> Upload the executable
|
1528 |
|
|
|
After the upload, GDB will make the processor jump to the beginning of the uploaded executable
(by default, this is the beginning of the instruction memory at `0x00000000`), skipping the bootloader.
The CPU is halted right before it executes the `blink_led` application.

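The interactive steps shown in the GDB session can also be collected in a GDB command file. The following is a minimal sketch and not part of the NEORV32 repository: the file name `neorv32_upload.gdb` is made up for illustration, and reading `RISCV_PREFIX` from the shell environment is an assumption (the project defines it as a makefile variable). It relies only on GDB's standard `-x <file>` option.

```shell
# Write the GDB commands from the session into a command file.
# The file name is arbitrary (hypothetical, chosen for this sketch).
cat > neorv32_upload.gdb <<'EOF'
target extended-remote localhost:3333
file main.elf
load
EOF

# Assumption: the GDB binary uses the project's default toolchain prefix
# unless RISCV_PREFIX is set in the environment.
GDB="${RISCV_PREFIX:-riscv32-unknown-elf-}gdb"

# Only print the command line, so the sketch also works without hardware attached.
gdb_cmd="$GDB -x neorv32_upload.gdb"
echo "$gdb_cmd"
```

Run the printed command from the folder that contains `main.elf` while OpenOCD is listening on port 3333. GDB does not ask for confirmation when commands are read from a command file, so the `file` step needs no manual `y`.
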
:sectnums:
==== Breakpoint Example

The following steps are just a small showcase that illustrates a simple debugging scheme.

While compiling `blink_led`, an assembly listing file `main.asm` was generated.
Open this file with a text editor to check out what the CPU is going to do when resumed.

The `blink_led` example implements a simple counter on the 8 lowest GPIO output ports. The program uses
a "busy wait" to create a visible delay between increments. This waiting is done by calling the `neorv32_cpu_delay_ms`
function. We will add a _breakpoint_ right at the end of this wait function so we can step through the iterations
of the counter.

.Cut-out from `main.asm` generated from the `blink_led` example
[source, assembly]
--------------------------
00000688 <__neorv32_cpu_delay_ms_end>:
 688:   01c12083    lw      ra,28(sp)
 68c:   02010113    addi    sp,sp,32
 690:   00008067    ret
--------------------------

The very last instruction of the `neorv32_cpu_delay_ms` function is `ret` (= return)
at address `0x690` in this example. Add this address as a _breakpoint_ in GDB.

[NOTE]
The address might be different if you use a different version of the software framework or
if different ISA options are configured.

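Because the address can change, it may be convenient to look it up in the listing instead of copying it by hand. The sketch below uses a hypothetical helper (`find_ret_addr` is not part of the NEORV32 framework) and assumes the listing has the objdump-style layout of the cut-out above. To stay self-contained it works on a copy of that cut-out named `main_excerpt.asm`; in practice, point it at the real `main.asm`.

```shell
# Stand-in for main.asm: the cut-out shown above (a real listing is much longer).
cat > main_excerpt.asm <<'EOF'
00000688 <__neorv32_cpu_delay_ms_end>:
 688:   01c12083    lw      ra,28(sp)
 68c:   02010113    addi    sp,sp,32
 690:   00008067    ret
EOF

# Hypothetical helper: print the address of the first `ret` after a label.
#   $1 = label name without angle brackets, $2 = listing file
find_ret_addr() {
  awk -v label="<$1>:" '
    index($0, label) { in_func = 1; next }   # label line found, start scanning
    in_func && $NF == "ret" {                # first return instruction after the label
      sub(/:$/, "", $1)                      # strip the trailing colon from the address
      print "0x" $1
      exit
    }
  ' "$2"
}

bp_addr=$(find_ret_addr __neorv32_cpu_delay_ms_end main_excerpt.asm)
echo "b * $bp_addr"   # prints "b * 0x690"
```

The printed line is the GDB command to enter at the `(gdb)` prompt.
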
.Adding a GDB breakpoint
[source, bash]
--------------------------
(gdb) b * 0x690
Breakpoint 1 at 0x690
--------------------------

.How do breakpoints work?
[TIP]
The NEORV32 on-chip debugger does not provide any hardware breakpoints (RISC-V "trigger modules") that compare an address like the PC
with a predefined value. Instead, GDB modifies the actual executable in IMEM: the instruction at the address
of the specified breakpoint is replaced by an `ebreak` / `c.ebreak` instruction. Whenever execution reaches this instruction, debug mode is
re-entered and the debugger restores the original instruction at this address to maintain the original program behavior.

Now execute `c` (= continue). The CPU will resume operation until it hits the breakpoint.
This way we can "step" from increment to increment.

.Iterating from breakpoint to breakpoint
[source, bash]
--------------------------
Breakpoint 1 at 0x690
(gdb) c
Continuing.

Breakpoint 1, 0x00000690 in neorv32_cpu_delay_ms ()
(gdb) c
Continuing.

Breakpoint 1, 0x00000690 in neorv32_cpu_delay_ms ()
(gdb) c
Continuing.
--------------------------

include::../legal.adoc[]