:sectnums:
== Overview

[quote]
____
RISC-V - Instruction Sets Want To Be Free!
____

The NEORV32footnote:[Pronounced "neo-R-V-thirty-two" or "neo-risc-five-thirty-two" in its long form.] is an open-source
RISC-V compatible processor system that is intended as a *ready-to-go* auxiliary processor within larger SoC
designs or as a stand-alone custom / customizable microcontroller.

The system is highly configurable and provides optional common peripherals like embedded memories,
timers, serial interfaces, general purpose IO ports and an external bus interface to connect custom IP like
memories, NoCs and other peripherals. On-line and in-system debugging is supported by an OpenOCD/gdb
compatible on-chip debugger accessible via JTAG.

The software framework of the processor comes with application makefiles, software libraries for all CPU
and processor features, a bootloader, a runtime environment and several example programs – including a port
of the CoreMark MCU benchmark and the official RISC-V architecture test suite. RISC-V GCC is used as
default toolchain (https://github.com/stnolting/riscv-gcc-prebuilt[prebuilt toolchains are also provided]).

[TIP]
The project's change log is available in https://github.com/stnolting/neorv32/blob/master/CHANGELOG.md[CHANGELOG.md]
in the root directory of the NEORV32 repository. Please also check out the <<_legal>> section.
 
:sectnums!:
=== Structure

Chapter <<_neorv32_processor_soc>>

* top entity signals and configuration generics, address space layout, internal peripheral devices and interrupts, internal
memories and caches, internal bus architecture, external bus interface

Chapter <<_neorv32_central_processing_unit_cpu>>

* instruction set(s) and extensions, instruction timing, control and status registers, traps, exceptions and interrupts,
hardware execution safety, native bus interface

Chapter <<_on_chip_debugger_ocd>>

* on-chip debugging compatible with the "Minimal RISC-V Debug Specification Version 0.13.2"

Chapter <<_software_framework>>

* core libraries, bootloader, makefiles, runtime environment

Chapter <<_lets_get_it_started>>

* toolchain installation and setup, hardware setup, software setup, application compilation, simulating the processor,
debugging using the on-chip debugger

[TIP]
Links in this document are <<_structure,highlighted>>.
 
<<<
// ####################################################################################################################
:sectnums:
=== Project Key Features

* **NEORV32 CPU**: 32-bit `rv32i` RISC-V CPU - passes the official RISC-V architecture tests
* official https://github.com/riscv/riscv-isa-manual/blob/master/marchid.md[RISC-V open source architecture ID]
* optional RISC-V CPU extensions:
** `A` - atomic memory access operations
** `B` - bit-manipulation instructions
** `C` - 16-bit compressed instructions
** `E` - embedded CPU version (reduced register file size)
** `M` - integer multiplication and division hardware
** `U` - less-privileged _user_ mode
** `Zfinx` - single-precision floating-point unit
** `Zicsr` - control and status register access (privileged architecture)
** `Zifencei` - instruction stream synchronization
** `PMP` - physical memory protection
** `HPM` - hardware performance monitors
* **Software framework**
** GCC-based toolchain - prebuilt toolchains available; application compilation based on GNU makefiles
** internal bootloader with serial user interface
** core libraries for high-level usage of the provided functions and peripherals
** runtime environment and several example programs
** doxygen-based documentation of the software framework; a deployed version is available at https://stnolting.github.io/neorv32/sw/files.html
** FreeRTOS port + demos available
* **NEORV32 Processor**: highly-configurable full-scale microcontroller-like processor system / SoC based on the NEORV32 CPU with optional standard peripherals:
** serial interfaces (UARTs, TWI, SPI)
** timers and counters (WDT, MTIME, NCO)
** general purpose IO, PWM and native NeoPixel (TM) compatible smart LED interface
** embedded memories / caches for data, instructions and bootloader
** external memory interface (Wishbone or AXI4-Lite)
* on-chip debugger compatible with OpenOCD and gdb
* fully synchronous design, no latches, no gated clocks
* completely described in behavioral, platform-independent VHDL
* small hardware footprint and high operating frequency
 
<<<
// ####################################################################################################################
:sectnums:
=== Project Folder Structure

...................................
neorv32            - Project home folder
├.ci              - Scripts for continuous integration
├boards           - Example setups for various FPGA boards
├CHANGELOG.md     - Project change log
├docs             - Project documentation
│├doxygen_build  - Software framework documentation (generated by doxygen)
│├src_adoc       - AsciiDoc sources for this document
│├references     - Data sheets and RISC-V specs.
│└figures        - Figures and logos
├riscv-arch-test  - Port files for the official RISC-V architecture tests
├rtl              - VHDL sources
│├core           - Sources of the CPU & SoC
│└top_templates  - Alternate/additional top entities/wrappers
├sim              - Simulation files
│├ghdl           - Simulation scripts for GHDL
│├rtl_modules    - Processor modules for simulation-only
│└vivado         - Pre-configured Xilinx ISIM waveform
└sw               - Software framework
 ├bootloader      - Sources and scripts for the NEORV32 internal bootloader
 ├common          - Linker script and crt0.S start-up code
 ├example         - Various example programs
 │└...
 ├ocd_firmware    - Source code for on-chip debugger's "park loop"
 ├openocd         - OpenOCD on-chip debugger configuration files
 ├image_gen       - Helper program to generate NEORV32 executables
 └lib             - Processor core library
  ├include        - Header files (*.h)
  └source         - Source files (*.c)
...................................

[NOTE]
There are further files and folders starting with a dot which – for example – contain
data/configurations only relevant for git or for the continuous integration framework (`.ci`).
 
<<<
// ####################################################################################################################
:sectnums:
=== VHDL File Hierarchy

All necessary VHDL hardware description files are located in the project's `rtl/core` folder. The top entity
of the entire processor including all the required configuration generics is **`neorv32_top.vhd`**.

[IMPORTANT]
All core VHDL files from the list below have to be assigned to a new design library named **`neorv32`**. Additional
files, like alternative top entities, can be assigned to any library.
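
As a minimal sketch of the library requirement above, the following Python snippet only _builds_ GHDL analysis
command lines that place core files into a library named `neorv32` (GHDL's `--work=` option selects the target
library). The helper name `ghdl_analyze_commands` and the three-file subset are assumptions made for illustration;
the snippet does not run GHDL and does not resolve the full compile order of the design.

```python
from pathlib import PurePosixPath

LIBRARY = "neorv32"                      # design library name required for all core files
CORE_DIR = PurePosixPath("rtl/core")     # location of the core VHDL sources
PACKAGE_FILE = "neorv32_package.vhd"     # main package file, needed by the other units


def ghdl_analyze_commands(core_files, library=LIBRARY):
    """Return one 'ghdl -a' argument list per file, with the package file first.

    Assumption: only the package is moved to the front; a real build has to
    respect the complete dependency order of the design units.
    """
    # sorted() is stable, so all non-package files keep their given order.
    ordered = sorted(core_files, key=lambda name: name != PACKAGE_FILE)
    return [["ghdl", "-a", f"--work={library}", str(CORE_DIR / name)] for name in ordered]


# Hypothetical subset of the core file list, for illustration only.
commands = ghdl_analyze_commands(["neorv32_cpu.vhd", "neorv32_top.vhd", "neorv32_package.vhd"])
for cmd in commands:
    print(" ".join(cmd))
```

Files outside the core list, such as alternative top entities, could be passed with a different `library` argument,
since they are not bound to the `neorv32` library.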
 
...................................
neorv32_top.vhd                      - NEORV32 Processor top entity
├neorv32_boot_rom.vhd               - Bootloader ROM
│└neorv32_bootloader_image.vhd     - Bootloader boot ROM memory image
├neorv32_busswitch.vhd              - Processor bus switch for CPU buses (I&D)
├neorv32_bus_keeper.vhd             - Processor-internal bus monitor
├neorv32_icache.vhd                 - Processor-internal instruction cache
├neorv32_cfs.vhd                    - Custom functions subsystem
├neorv32_cpu.vhd                    - NEORV32 CPU top entity
│├neorv32_package.vhd              - Processor/CPU main VHDL package file
│├neorv32_cpu_alu.vhd              - Arithmetic/logic unit
│├neorv32_cpu_bus.vhd              - Bus interface unit + physical memory protection
│├neorv32_cpu_control.vhd          - CPU control, exception/IRQ system and CSRs
││└neorv32_cpu_decompressor.vhd   - Compressed instructions decoder
│├neorv32_cpu_cp_fpu.vhd           - Floating-point co-processor (Zfinx extension)
│├neorv32_cpu_cp_muldiv.vhd        - Mul/Div co-processor (M extension)
│└neorv32_cpu_regfile.vhd          - Data register file
├neorv32_debug_dm.vhd               - on-chip debugger: debug module
├neorv32_debug_dtm.vhd              - on-chip debugger: debug transfer module
├neorv32_dmem.vhd                   - Processor-internal data memory
├neorv32_gpio.vhd                   - General purpose input/output port unit
├neorv32_imem.vhd                   - Processor-internal instruction memory
│└neorv32_application_image.vhd    - IMEM application initialization image
├neorv32_mtime.vhd                  - Machine system timer
├neorv32_nco.vhd                    - Numerically-controlled oscillator
├neorv32_neoled.vhd                 - NeoPixel (TM) compatible smart LED interface
├neorv32_pwm.vhd                    - Pulse-width modulation controller
├neorv32_spi.vhd                    - Serial peripheral interface controller
├neorv32_sysinfo.vhd                - System configuration information memory
├neorv32_trng.vhd                   - True random number generator
├neorv32_twi.vhd                    - Two wire serial interface controller
├neorv32_uart.vhd                   - Universal async. receiver/transmitter
├neorv32_wdt.vhd                    - Watchdog timer
└neorv32_wb_interface.vhd           - External (Wishbone) bus interface
...................................
 
<<<
// ####################################################################################################################
:sectnums:
=== FPGA Implementation Results

This chapter shows exemplary implementation results of the NEORV32 CPU and Processor. Please note that
the provided results are only a relative measure: logic functions of different modules might be merged
across entity boundaries, so the actual utilization results might vary a bit.
 
:sectnums:
==== CPU

[cols="<2,<8"]
[grid="topbot"]
|=======================
| Hardware version: | `1.5.5.5`
| Top entity:       | `rtl/core/neorv32_cpu.vhd`
|=======================

[cols="<5,>1,>1,>1,>1,>1"]
[options="header",grid="rows"]
|=======================
| CPU                                   | LEs  | FFs  | MEM bits | DSPs | _f~max~_
| `rv32i`                               |  980 |  409 | 1024     | 0    | 123 MHz
| `rv32i_Zicsr`                         | 1835 |  856 | 1024     | 0    | 124 MHz
| `rv32im_Zicsr`                        | 2443 | 1134 | 1024     | 0    | 124 MHz
| `rv32imc_Zicsr`                       | 2669 | 1149 | 1024     | 0    | 125 MHz
| `rv32imac_Zicsr`                      | 2685 | 1156 | 1024     | 0    | 124 MHz
| `rv32imac_Zicsr` + `debug_mode`       | 3058 | 1225 | 1024     | 0    | 120 MHz
| `rv32imac_Zicsr` + `u`                | 2698 | 1162 | 1024     | 0    | 124 MHz
| `rv32imac_Zicsr_Zifencei` + `u`       | 2715 | 1162 | 1024     | 0    | 122 MHz
| `rv32imac_Zicsr_Zifencei_Zfinx` + `u` | 4004 | 1812 | 1024     | 7    | 121 MHz
|=======================
 
:sectnums:
==== Processor Modules

[cols="<2,<8"]
[grid="topbot"]
|=======================
| Hardware version: | `1.5.5.9`
| Top entity:       | `rtl/core/neorv32_top.vhd`
|=======================

.Hardware utilization by the processor modules (mandatory core modules in **bold**)
[cols="<2,<8,>1,>1,>2,>1"]
[options="header",grid="rows"]
|=======================
| Module        | Description                                         | LEs | FFs | MEM bits | DSPs
| Boot ROM      | Bootloader ROM (4kB)                                |   3 |   1 |    32768 |    0
| **BUSKEEPER** | Processor-internal bus monitor                      |  11 |   6 |        0 |    0
| **BUSSWITCH** | Bus mux for CPU instr. and data interface           |  49 |   8 |        0 |    0
| CFS           | Custom functions subsystem                          |   - |   - |        - |    -
| DMEM          | Processor-internal data memory (8kB)                |  18 |   2 |    65536 |    0
| DM            | On-chip debugger - debug module                     | 493 | 240 |        0 |    0
| DTM           | On-chip debugger - debug transfer module (JTAG)     | 254 | 218 |        0 |    0
| GPIO          | General purpose input/output ports                  |  67 |  65 |        0 |    0
| iCACHE        | Instruction cache (1x4 blocks, 256 bytes per block) | 220 | 154 |     8192 |    0
| IMEM          | Processor-internal instruction memory (16kB)        |   6 |   2 |   131072 |    0
| MTIME         | Machine system timer                                | 289 | 200 |        0 |    0
| NCO           | Numerically-controlled oscillator                   | 254 | 226 |        0 |    0
| NEOLED        | Smart LED Interface (NeoPixel/WS2812) [4xFIFO]      | 347 | 309 |        0 |    0
| PWM           | Pulse-width modulation controller (4 channels)      |  71 |  69 |        0 |    0
| SPI           | Serial peripheral interface                         | 138 | 124 |        0 |    0
| **SYSINFO**   | System configuration information memory             |  10 |  10 |        0 |    0
| TRNG          | True random number generator                        | 132 | 105 |        0 |    0
| TWI           | Two-wire interface                                  |  77 |  44 |        0 |    0
| UART0/1       | Universal asynchronous receiver/transmitter 0/1     | 176 | 132 |        0 |    0
| WDT           | Watchdog timer                                      |  60 |  45 |        0 |    0
| WISHBONE      | External memory interface                           | 129 | 104 |        0 |    0
|=======================
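
The _MEM bits_ column can be cross-checked against the memory sizes given in the module descriptions. The following
Python sketch converts the bit counts copied from the table above into kB, assuming that "kB" in this document means
1024 bytes; the helper name `bits_to_kb` is chosen for illustration only.

```python
# "MEM bits" values copied from the processor modules table.
MEM_BITS = {
    "Boot ROM": 32768,
    "DMEM": 65536,
    "iCACHE": 8192,
    "IMEM": 131072,
}


def bits_to_kb(bits):
    """Convert a number of memory bits to kB (assumption: 1 kB = 1024 bytes)."""
    return bits / 8 / 1024


sizes_kb = {name: bits_to_kb(bits) for name, bits in MEM_BITS.items()}
for name, size in sizes_kb.items():
    print(f"{name}: {size:g} kB")
```

Under this assumption the results agree with the descriptions: 4 kB for the boot ROM, 8 kB for the DMEM, 16 kB for the
IMEM, and 1 kB for the instruction cache (4 blocks of 256 bytes each).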
 
<<<
:sectnums:
==== Exemplary Setups

[TIP]
Exemplary setups for different technologies and various FPGA boards can be found in the `boards` folder
(https://github.com/stnolting/neorv32/tree/master/boards).

The following table shows exemplary NEORV32 processor implementation results for different FPGA
platforms. Most setups use the default peripheral configuration (like no CFS, no caches and no
TRNG), no external memory interface and only internal instruction and data memories (IMEM uses 16kB
and DMEM uses 8kB memory space).

[cols="<2,<8"]
[grid="topbot"]
|=======================
| Hardware version: | `1.4.9.0`
|=======================

.Hardware utilization for exemplary NEORV32 setups
[cols="<4,<5,<4,<4,<3,<3,<3,<4,<4,<3"]
[options="header",grid="rows"]
|=======================
| Vendor  | FPGA                             | Board            | Toolchain               | CPU                               | LUT        | FF         | DSP    | Memory                        | _f_
| Intel   | Cyclone IV `EP4CE22F17-C6N`      | Terasic DE0-Nano | Quartus Prime Lite 20.1 | `rv32imcu_Zicsr_Zifencei` + `PMP` | 3813 (17%) | 1890 (8%)  | 0 (0%) | Memory bits: 231424 (38%)     | 119 MHz
| Lattice | iCE40 UltraPlus `iCE40UP5KSG48I` | Upduino v3.0     | Radiant 2.1             | `rv32icu_Zicsr_Zifencei`          | 5123 (97%) | 1972 (37%) | 0 (0%) | EBR: 12 (40%) SPRAM: 4 (100%) | 24 MHz
| Xilinx  | Artix-7 `XC7A35TICSG324-1L`      | Arty A7-35T      | Vivado 2019.2           | `rv32imcu_Zicsr_Zifencei` + `PMP` | 2465 (12%) | 1912 (5%)  | 0 (0%) | BRAM: 8 (16%)                 | 100 MHz
|=======================

**Notes**

* The Lattice iCE40 UltraPlus setup uses the FPGA's SPRAM memory primitives for the internal IMEM and DMEM (each 64kB).
* The Upduino and the Arty board have on-board SPI flash memories for storing the FPGA configuration. These devices can also be used by the default NEORV32 bootloader to store and automatically boot an application program after reset (both tested successfully).
* The setups with PMP implement 2 regions with a minimal granularity of 64kB.
* No HPM counters are used.
 
<<<
// ####################################################################################################################
:sectnums:
=== CPU Performance

:sectnums:
==== CoreMark Benchmark

.Configuration
[cols="<2,<8"]
[grid="topbot"]
|=======================
| Hardware:       | 32kB IMEM, 16kB DMEM, no caches, 100MHz clock
| CoreMark:       | 2000 iterations, MEM_METHOD is MEM_STACK
| Compiler:       | RISCV32-GCC 10.1.0
| Peripherals:    | UART for printing the results
| Compiler flags: | default, see makefile
|=======================

The performance of the NEORV32 was tested and evaluated using the https://www.eembc.org/coremark/[CoreMark CPU benchmark]. This
benchmark focuses on testing the capabilities of the CPU core itself rather than the performance of the whole
system. The corresponding source code and the SW project can be found in the `sw/example/coremark` folder.

The resulting CoreMark score is defined as CoreMark iterations per second.
The execution time is determined via the RISC-V `[m]cycle[h]` CSRs. The relative CoreMark score is
defined as CoreMark score divided by the CPU's clock frequency in MHz.
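
The two definitions above can be written down as a short calculation. The following Python sketch is only an
illustration: the function names are made up, and the cycle count in the example is a hypothetical round value rather
than a measured result.

```python
def coremark_score(iterations, cycles, clock_hz):
    """CoreMark score: iterations per second, with the run time derived from a cycle count."""
    seconds = cycles / clock_hz
    return iterations / seconds


def coremarks_per_mhz(score, clock_hz):
    """Relative CoreMark score: score divided by the clock frequency in MHz."""
    return score / (clock_hz / 1e6)


CLOCK_HZ = 100_000_000       # 100 MHz clock, as in the configuration table
ITERATIONS = 2000            # as in the configuration table
CYCLES = 5_500_000_000       # hypothetical cycle count (55 s at 100 MHz), for illustration only

score = coremark_score(ITERATIONS, CYCLES, CLOCK_HZ)
relative = coremarks_per_mhz(score, CLOCK_HZ)
print(f"CoreMark score: {score:.2f}, CoreMarks/MHz: {relative:.4f}")
```

Because the relative score is normalized to the clock frequency, it allows comparing CPU configurations independently
of the clock speed a particular FPGA setup achieves.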
 
[cols="<2,<8"]
[grid="topbot"]
|=======================
| Hardware version: | `1.4.9.8`
|=======================

.CoreMark results
[cols="<4,>1,>1,>1"]
[options="header",grid="rows"]
|=======================
| CPU (incl. `Zicsr`)                         | Executable size | CoreMark Score | CoreMarks/MHz
| `rv32i`                                     |     28756 bytes |          36.36 | **0.3636**
| `rv32im`                                    |     27516 bytes |          68.97 | **0.6897**
| `rv32imc`                                   |     22008 bytes |          68.97 | **0.6897**
| `rv32imc` + _FAST_MUL_EN_                   |     22008 bytes |          86.96 | **0.8696**
| `rv32imc` + _FAST_MUL_EN_ + _FAST_SHIFT_EN_ |     22008 bytes |          90.91 | **0.9091**
|=======================

[NOTE]
All executables were generated using maximum optimization `-O3`.
The _FAST_MUL_EN_ configuration uses DSPs for the multiplier of the _M_ extension (enabled via the
_FAST_MUL_EN_ generic). The _FAST_SHIFT_EN_ configuration uses a barrel shifter for CPU shift
operations (enabled via the _FAST_SHIFT_EN_ generic).
 
<<<
:sectnums:
==== Instruction Timing

The NEORV32 CPU is based on a multi-cycle architecture. Each instruction is executed in a sequence of
several consecutive micro operations. Hence, each instruction requires several clock cycles to execute.

The average CPI (cycles per instruction) depends on the instruction mix of a specific application and also on
the available CPU extensions. The following table shows the performance results for successfully (!) running
2000 CoreMark iterations.

The average CPI is computed by dividing the total number of required clock cycles (only the timed core to
avoid distortion due to IO wait cycles) by the number of executed instructions (`[m]instret[h]` CSRs). The
executables were generated using optimization `-O3`.
 
[cols="<2,<8"]
[grid="topbot"]
|=======================
| Hardware version: | `1.4.9.8`
|=======================

.CoreMark instruction timing
[cols="<4,>2,>2,>2"]
[options="header",grid="rows"]
|=======================
| CPU (incl. `Zicsr`)                         | Required clock cycles | Executed instructions | Average CPI
| `rv32i`                                     |            5595750503 | 1466028607            | **3.82**
| `rv32im`                                    |            2966086503 |  598651143            | **4.95**
| `rv32imc`                                   |            2981786734 |  611814918            | **4.87**
| `rv32imc` + _FAST_MUL_EN_                   |            2399234734 |  611814918            | **3.92**
| `rv32imc` + _FAST_MUL_EN_ + _FAST_SHIFT_EN_ |            2265135174 |  611814948            | **3.70**
|=======================
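
The average CPI column follows directly from the other two columns. As a small sketch (the function name and the
dictionary layout are chosen for illustration), the following Python snippet recomputes it from the cycle and
instruction counts copied from the table above:

```python
# (required clock cycles, executed instructions) copied from the instruction timing table.
RESULTS = {
    "rv32i": (5595750503, 1466028607),
    "rv32im": (2966086503, 598651143),
    "rv32imc": (2981786734, 611814918),
    "rv32imc + FAST_MUL_EN": (2399234734, 611814918),
    "rv32imc + FAST_MUL_EN + FAST_SHIFT_EN": (2265135174, 611814948),
}


def average_cpi(cycles, instructions):
    """Average cycles per instruction: total clock cycles divided by retired instructions."""
    return cycles / instructions


cpi = {config: average_cpi(*counts) for config, counts in RESULTS.items()}
for config, value in cpi.items():
    print(f"{config}: {value:.2f}")
```

Rounded to two decimal places, the computed values match the _Average CPI_ column.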
 
[TIP]
The _FAST_MUL_EN_ configuration uses DSPs for the multiplier of the M extension (enabled via the
_FAST_MUL_EN_ generic). The _FAST_SHIFT_EN_ configuration uses a barrel shifter for CPU shift
operations (enabled via the _FAST_SHIFT_EN_ generic).

[TIP]
More information regarding the execution time of each implemented instruction can be found in
chapter <<_instruction_timing>>.
 
