OpenCores

Line 90...

### To-Do / Wish List / Help Wanted

### To-Do / Wish List / Help Wanted

* Use LaTeX for data sheet

* Use LaTeX for data sheet

* Further size and performance optimization *(work in progress)*

* Further size and performance optimization *[work in progress]*

* Add associativity configuration for instruction cache

* Add associativity configuration for instruction cache

* Add a data cache

* Add *data* cache

* Burst mode for the external memory/bus interface

* Burst mode for the external memory/bus interface

* RISC-V `B` extension ([bitmanipulation](https://github.com/riscv/riscv-bitmanip)) *(shelved)*

* RISC-V `F` (using `Zfinx`?) CPU extension (single-precision floating point) *[planning]*

* RISC-V `B` CPU extension ([bitmanipulation](https://github.com/riscv/riscv-bitmanip)) *[shelved]*

* Synthesis results (+ wrappers?) for more/specific platforms

* Synthesis results (+ wrappers?) for more/specific platforms

* More support for FreeRTOS

* More support for FreeRTOS (like *all* traps)

* Port additional RTOSs (like [Zephyr](https://github.com/zephyrproject-rtos/zephyr) or [RIOT](https://www.riot-os.org))

* Port additional RTOSs (like [Zephyr](https://github.com/zephyrproject-rtos/zephyr) or [RIOT](https://www.riot-os.org))

* Single-precision floating point unit (`F`) *(planned)*

* Implement further RISC-V (or custom?) CPU extensions

* Implement further RISC-V (or custom?) CPU extensions

* Add debugger ([RISC-V debug spec](https://github.com/riscv/riscv-debug-spec))

* Add debugger ([RISC-V debug spec](https://github.com/riscv/riscv-debug-spec))

* Add memory-mapped trigger to testbench to quit simulation (using VHDL2008's `use std.env.finish;`) - but how? :thinking:

* Add memory-mapped trigger to testbench to quit simulation (maybe using VHDL2008's `use std.env.finish`?)

* ...

* ...

* [Ideas?](#ContributeFeedbackQuestions)

* [Ideas?](#ContributeFeedbackQuestions)

Line 187...

**Privileged architecture / CSR access** (`Zicsr` extension):

**Privileged architecture / CSR access** (`Zicsr` extension):

  * Privilege levels: `M-mode` (Machine mode)

  * Privilege levels: `M-mode` (Machine mode)

  * CSR access instructions: `CSRRW` `CSRRS` `CSRRC` `CSRRWI` `CSRRSI` `CSRRCI`

  * CSR access instructions: `CSRRW` `CSRRS` `CSRRC` `CSRRWI` `CSRRSI` `CSRRCI`

  * System instructions: `MRET` `WFI`

  * System instructions: `MRET` `WFI`

  * Pseudo-instructions are not listed

  * Pseudo-instructions are not listed

  * Counter CSRs: `cycle` `cycleh` `instret` `instreth` `time` `timeh` `mcycle` `mcycleh` `minstret` `minstreth` `mcounteren` `mcountinhibit`

  * Counter CSRs: `[m]cycle[h]` `[m]instret[h]` `time[h]` `[m]hpmcounter*[h]`(3..31, configurable) `mcounteren` `mcountinhibit` `mhpmevent*`(3..31, configurable)

  * Machine CSRs: `mstatus` `mstatush` `misa`(read-only!) `mie` `mtvec` `mscratch` `mepc` `mcause` `mtval` `mip` `mvendorid` [`marchid`](https://github.com/riscv/riscv-isa-manual/blob/master/marchid.md) `mimpid` `mhartid` `mzext`(custom)

  * Machine CSRs: `mstatus[h]` `misa`(read-only!) `mie` `mtvec` `mscratch` `mepc` `mcause` `mtval` `mip` `mvendorid` [`marchid`](https://github.com/riscv/riscv-isa-manual/blob/master/marchid.md) `mimpid` `mhartid` `mzext`(custom)

  * Supported exceptions and interrupts:

  * Supported exceptions and interrupts:

    * Misaligned instruction address

    * Misaligned instruction address

    * Instruction access fault (via unacknowledged bus access after timeout)

    * Instruction access fault (via unacknowledged bus access after timeout)

    * Illegal instruction

    * Illegal instruction

    * Breakpoint (via `ebreak` instruction)

    * Breakpoint (via `ebreak` instruction)

Line 212...

**Privileged architecture / instruction stream synchronization** (`Zifencei` extension):

**Privileged architecture / instruction stream synchronization** (`Zifencei` extension):

  * System instructions: `FENCE.I` (among others, used to clear and reload instruction cache)

  * System instructions: `FENCE.I` (among others, used to clear and reload instruction cache)

**Privileged architecture / Physical memory protection** (`PMP`, requires `Zicsr` extension):

**Privileged architecture / Physical memory protection** (`PMP`, requires `Zicsr` extension):

  * Additional machine CSRs: `pmpcfg0` `pmpcfg1` `pmpaddr0` `pmpaddr1` `pmpaddr2` `pmpaddr3` `pmpaddr4` `pmpaddr5` `pmpaddr6` `pmpaddr7`

  * Configurable number of regions

  * Additional machine CSRs: `pmpcfg*`(0..15) `pmpaddr*`(0..63)
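The `pmpcfg*`(0..15) and `pmpaddr*`(0..63) ranges fit together if each 32-bit `pmpcfg` register packs four 8-bit region configurations, which is the standard RV32 layout. The sketch below assumes that layout; `pmp_cfg_location` is a hypothetical helper used only for illustration and is not part of the NEORV32 software.

```python
# Sketch: locate the configuration byte of a PMP region on RV32.
# Assumption: standard RISC-V layout, four 8-bit entries per 32-bit pmpcfg CSR.
# pmp_cfg_location is a hypothetical name, not a NEORV32 library function.

def pmp_cfg_location(region):
    """Return (pmpcfg register index, bit offset of the region's config byte)."""
    if not 0 <= region <= 63:
        raise ValueError("PMP region index must be in 0..63")
    return region // 4, (region % 4) * 8


if __name__ == "__main__":
    for region in (0, 5, 63):
        reg, shift = pmp_cfg_location(region)
        print(f"region {region}: pmpcfg{reg}, bits {shift + 7}..{shift}")
```

With 64 regions this yields register indices 0 through 15, matching the CSR ranges listed above.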

### Non-RISC-V-Compliant Issues

### Non-RISC-V-Compliant Issues

* CPU and Processor are BIG-ENDIAN, but this should be no problem as the external memory bus interface provides big- and little-endian configurations

* CPU and Processor are BIG-ENDIAN, but this should be no problem as the external memory bus interface provides big- and little-endian configurations

* `misa` CSR is read-only - no dynamic enabling/disabling of synthesized CPU extensions during runtime; for compatibility: write accesses (in m-mode) are ignored and do not cause an exception

* `misa` CSR is read-only - no dynamic enabling/disabling of synthesized CPU extensions during runtime; for compatibility: write accesses (in m-mode) are ignored and do not cause an exception

* The physical memory protection (**PMP**) only supports `NAPOT` mode, a minimal granularity of 8 bytes and only up to 8 regions

* The physical memory protection (**PMP**) only supports `NAPOT` mode yet and a minimal granularity of 8 bytes

* The `A` extension only implements `lr.w` and `sc.w` instructions yet. However, these instructions are sufficient to emulate all further AMO operations

* The `A` extension only implements `lr.w` and `sc.w` instructions yet. However, these instructions are sufficient to emulate all further AMO operations

* The `mcause` trap code `0x80000000` (originally reserved in the RISC-V specs) is used to indicate a hardware reset (non-maskable reset).

* The `mcause` trap code `0x80000000` (originally reserved in the RISC-V specs) is used to indicate a hardware reset (as "non-maskable interrupt").
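Since the PMP supports only `NAPOT` mode, every region has to be a naturally aligned power-of-two block of at least 8 bytes. The following sketch shows how a `pmpaddr` value could be derived for such a region, assuming the standard RISC-V NAPOT encoding (address shifted right by two, low bits filled with ones according to the region size). `napot_pmpaddr` is a hypothetical helper name, and the example base address is chosen arbitrarily.

```python
# Sketch: encode a naturally aligned power-of-two region as a NAPOT pmpaddr value.
# Assumption: standard RISC-V NAPOT encoding; napot_pmpaddr is a hypothetical helper.

def napot_pmpaddr(base, size):
    """Return the pmpaddr value for a region of `size` bytes starting at `base`."""
    if size < 8 or size & (size - 1):
        raise ValueError("size must be a power of two and at least 8 bytes")
    if base % size:
        raise ValueError("base must be aligned to the region size")
    # pmpaddr holds address bits [33:2]; the trailing ones encode the region size.
    return (base >> 2) | ((size >> 3) - 1)


if __name__ == "__main__":
    # Example: a 64kB region at an arbitrarily chosen base address.
    print(hex(napot_pmpaddr(0x80000000, 64 * 1024)))
```

For the minimal 8-byte region the size field contributes no trailing ones, so the value is simply the base address shifted right by two.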

### NEORV32-Specific CPU Extensions

### NEORV32-Specific CPU Extensions

The NEORV32-specific extensions are always enabled and are indicated via the `X` bit in the `misa` CSR.

The NEORV32-specific extensions are always enabled and are indicated via the `X` bit in the `misa` CSR.

Line 241...

Line 242...

### NEORV32 CPU

### NEORV32 CPU

This chapter shows exemplary implementation results of the NEORV32 CPU for an **Intel Cyclone IV EP4CE22F17C6N FPGA** on

This chapter shows exemplary implementation results of the NEORV32 CPU for an **Intel Cyclone IV EP4CE22F17C6N FPGA** on

a DE0-nano board. The design was synthesized using **Intel Quartus Prime Lite 20.1** ("balanced implementation"). The timing

a DE0-nano board. The design was synthesized using **Intel Quartus Prime Lite 20.1** ("balanced implementation"). The timing

information is derived from the Timing Analyzer / Slow 1200mV 0C Model. If not otherwise specified, the default configuration

information is derived from the Timing Analyzer / Slow 1200mV 0C Model. If not otherwise specified, the default configuration

of the CPU's generics is assumed (for example no PMP). No constraints were used at all. The `u` and `Zifencei` extensions have

of the CPU's generics is assumed (e.g. no physical memory protection, no hardware performance monitors).

a negligible impact on the hardware requirements.

No constraints were used at all. The `u` and `Zifencei` extensions have a negligible impact on the hardware requirements.

Results generated for hardware version [`1.4.9.2`](https://github.com/stnolting/neorv32/blob/master/CHANGELOG.md).

Results generated for hardware version [`1.4.9.2`](https://github.com/stnolting/neorv32/blob/master/CHANGELOG.md).

| CPU Configuration                       | LEs        | FFs      | Memory bits | DSPs | f_max   |

| CPU Configuration                       | LEs        | FFs      | Memory bits | DSPs | f_max   |

|:----------------------------------------|:----------:|:--------:|:-----------:|:----:|:-------:|

|:----------------------------------------|:----------:|:--------:|:-----------:|:----:|:-------:|

Line 304...

Line 305...

The FPGA-specific memory components can be found in [`rtl/fpga_specific`](https://github.com/stnolting/neorv32/blob/master/rtl/fpga_specific/lattice_ice40up).

The FPGA-specific memory components can be found in [`rtl/fpga_specific`](https://github.com/stnolting/neorv32/blob/master/rtl/fpga_specific/lattice_ice40up).

* The clock frequencies marked with a "c" are constrained clocks. The remaining ones are _f_max_ results from the place and route timing reports.

* The clock frequencies marked with a "c" are constrained clocks. The remaining ones are _f_max_ results from the place and route timing reports.

* The Upduino and the Arty board have on-board SPI flash memories for storing the FPGA configuration. These devices can also be used by the default NEORV32

* The Upduino and the Arty board have on-board SPI flash memories for storing the FPGA configuration. These devices can also be used by the default NEORV32

bootloader to store and automatically boot an application program after reset (both tested successfully).

bootloader to store and automatically boot an application program after reset (both tested successfully).

* The setups with `PMP` implement 2 regions with a minimal granularity of 64kB.

* The setups with `PMP` implement 2 regions with a minimal granularity of 64kB.

* No HPM counters are implemented.

## Performance

## Performance

Line 315...

Line 317...

The [CoreMark CPU benchmark](https://www.eembc.org/coremark) was executed on the NEORV32 and is available in the

The [CoreMark CPU benchmark](https://www.eembc.org/coremark) was executed on the NEORV32 and is available in the

[sw/example/coremark](https://github.com/stnolting/neorv32/blob/master/sw/example/coremark) project folder. This benchmark

[sw/example/coremark](https://github.com/stnolting/neorv32/blob/master/sw/example/coremark) project folder. This benchmark

tests the capabilities of a CPU itself rather than the functions provided by the whole system / SoC.

tests the capabilities of a CPU itself rather than the functions provided by the whole system / SoC.

Results generated for hardware version [`1.4.7.0`](https://github.com/stnolting/neorv32/blob/master/CHANGELOG.md).

~~~

~~~

**Configuration**

**Configuration**

Hardware:       32kB IMEM, 16kB DMEM, no caches, 100MHz clock

Hardware:       32kB IMEM, 16kB DMEM, no caches(!), 100MHz clock

CoreMark:       2000 iterations, MEM_METHOD is MEM_STACK

CoreMark:       2000 iterations, MEM_METHOD is MEM_STACK

Compiler:       RISCV32-GCC 10.1.0 (rv32i toolchain)

Compiler:       RISCV32-GCC 10.1.0 (rv32i toolchain)

Compiler flags: default, see makefile

Compiler flags: default, see makefile

Peripherals:    UART for printing the results

Peripherals:    UART for printing the results

~~~

~~~

| CPU                                         | Executable Size | Optimization | CoreMark Score | CoreMarks/MHz |

Results generated for hardware version [`1.4.9.8`](https://github.com/stnolting/neorv32/blob/master/CHANGELOG.md).

| CPU (including `Zicsr`)                     | Executable Size | Optimization | CoreMark Score | CoreMarks/MHz |

|:--------------------------------------------|:---------------:|:------------:|:--------------:|:-------------:|

|:--------------------------------------------|:---------------:|:------------:|:--------------:|:-------------:|

| `rv32i`                                     |    27 424 bytes |        `-O3` |          35.71 |    **0.3571** |

| `rv32i`                                     |    28 756 bytes |        `-O3` |          36.36 |    **0.3636** |

| `rv32im`                                    |    26 232 bytes |        `-O3` |          66.66 |    **0.6666** |

| `rv32im`                                    |    27 516 bytes |        `-O3` |          68.97 |    **0.6897** |

| `rv32imc`                                   |    20 876 bytes |        `-O3` |          66.66 |    **0.6666** |

| `rv32imc`                                   |    22 008 bytes |        `-O3` |          68.97 |    **0.6897** |

| `rv32imc` + `FAST_MUL_EN`                   |    20 876 bytes |        `-O3` |          83.33 |    **0.8333** |

| `rv32imc` + `FAST_MUL_EN`                   |    22 008 bytes |        `-O3` |          86.96 |    **0.8696** |

| `rv32imc` + `FAST_MUL_EN` + `FAST_SHIFT_EN` |    20 876 bytes |        `-O3` |          86.96 |    **0.8696** |

| `rv32imc` + `FAST_MUL_EN` + `FAST_SHIFT_EN` |    22 008 bytes |        `-O3` |          90.91 |    **0.9091** |

The `FAST_MUL_EN` configuration uses DSPs for the multiplier of the `M` extension (enabled via the `FAST_MUL_EN` generic). The `FAST_SHIFT_EN` configuration

The `FAST_MUL_EN` configuration uses DSPs for the multiplier of the `M` extension (enabled via the `FAST_MUL_EN` generic). The `FAST_SHIFT_EN` configuration

uses a barrel shifter for CPU shift operations (enabled via the `FAST_SHIFT_EN` generic).

uses a barrel shifter for CPU shift operations (enabled via the `FAST_SHIFT_EN` generic).

When the `C` extension is enabled, branches to an unaligned uncompressed instruction require additional instruction fetch cycles.

When the `C` extension is enabled, branches to an unaligned uncompressed instruction require additional instruction fetch cycles.
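The CoreMarks/MHz column is the CoreMark score divided by the clock frequency in MHz (100MHz in the configuration above). A minimal sketch of that normalization, using the rev 42 scores from the table; `coremarks_per_mhz` is a hypothetical helper name.

```python
# Sketch: normalize CoreMark scores to the clock frequency.
# Scores are the rev 42 values from the table above; the clock is the stated 100MHz.
# coremarks_per_mhz is a hypothetical helper, not part of the CoreMark port.

CLOCK_MHZ = 100

SCORES = {
    "rv32i": 36.36,
    "rv32im": 68.97,
    "rv32imc": 68.97,
    "rv32imc + FAST_MUL_EN": 86.96,
    "rv32imc + FAST_MUL_EN + FAST_SHIFT_EN": 90.91,
}


def coremarks_per_mhz(score, clock_mhz=CLOCK_MHZ):
    """Return the CoreMark score per MHz of clock frequency."""
    return score / clock_mhz


if __name__ == "__main__":
    for config, score in SCORES.items():
        print(f"{config}: {coremarks_per_mhz(score):.4f} CoreMarks/MHz")
```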

Line 345...

Line 347...

### Instruction Cycles

### Instruction Cycles

The NEORV32 CPU is based on a two-stage pipelined architecture. Each stage uses a multi-cycle processing scheme. Hence,

The NEORV32 CPU is based on a two-stage pipelined architecture. Each stage uses a multi-cycle processing scheme. Hence,

each instruction requires several clock cycles to execute (2 cycles for ALU operations, ..., 40 cycles for divisions).

each instruction requires several clock cycles to execute (2 cycles for ALU operations, ..., 40 cycles for divisions).

The average CPI (cycles per instruction) depends on the instruction mix of a specific application and also on the available

The average CPI (cycles per instruction) depends on the instruction mix of a specific application and also on the available

CPU extensions.

CPU extensions. *By default* the CPU-internal shifter (e.g. for the `SLL` instruction) as well as the multiplier and divider of the

Please note that by default the CPU-internal shifter (e.g. for the `SLL` instruction) as well as the multiplier and divider of the

`M` extension use a bit-serial approach and require several cycles for completion.

`M` extension use a bit-serial approach and require several cycles for completion.

The following table shows the performance results for successfully running 2000 CoreMark

The following table shows the performance results for successfully running 2000 CoreMark

iterations, which reflects a pretty good "real-life" work load. The average CPI is computed by

iterations, which reflects a pretty good "real-life" work load. The average CPI is computed by

dividing the total number of required clock cycles (only the timed core to avoid distortion due to IO wait cycles; sampled via the `cycle[h]` CSRs)

dividing the total number of required clock cycles (only the timed core to avoid distortion due to IO wait cycles; sampled via the `cycle[h]` CSRs)

by the number of executed instructions (`instret[h]` CSRs). The executables were generated using optimization `-O3`.

by the number of executed instructions (`instret[h]` CSRs). The executables were generated using optimization `-O3`.

Results generated for hardware version [`1.4.7.0`](https://github.com/stnolting/neorv32/blob/master/CHANGELOG.md).

Results generated for hardware version [`1.4.9.8`](https://github.com/stnolting/neorv32/blob/master/CHANGELOG.md).

| CPU                                         | Required Clock Cycles | Executed Instructions | Average CPI |

| CPU  (including `Zicsr`)                    | Required Clock Cycles | Executed Instructions | Average CPI |

|:--------------------------------------------|----------------------:|----------------------:|:-----------:|

|:--------------------------------------------|----------------------:|----------------------:|:-----------:|

| `rv32i`                                     |         5 648 997 774 |         1 469 233 238 |    **3.84** |

| `rv32i`                                     |         5 595 750 503 |         1 466 028 607 |    **3.82** |

| `rv32im`                                    |         3 036 749 774 |           601 871 338 |    **5.05** |

| `rv32im`                                    |         2 966 086 503 |           598 651 143 |    **4.95** |

| `rv32imc`                                   |         3 036 959 882 |           615 034 616 |    **4.94** |

| `rv32imc`                                   |         2 981 786 734 |           611 814 918 |    **4.87** |

| `rv32imc` + `FAST_MUL_EN`                   |         2 454 407 882 |           615 034 588 |    **3.99** |

| `rv32imc` + `FAST_MUL_EN`                   |         2 399 234 734 |           611 814 918 |    **3.92** |

| `rv32imc` + `FAST_MUL_EN` + `FAST_SHIFT_EN` |         2 320 308 322 |           615 034 676 |    **3.77** |

| `rv32imc` + `FAST_MUL_EN` + `FAST_SHIFT_EN` |         2 265 135 174 |           611 814 948 |    **3.70** |

The `FAST_MUL_EN` configuration uses DSPs for the multiplier of the `M` extension (enabled via the `FAST_MUL_EN` generic). The `FAST_SHIFT_EN` configuration

The `FAST_MUL_EN` configuration uses DSPs for the multiplier of the `M` extension (enabled via the `FAST_MUL_EN` generic). The `FAST_SHIFT_EN` configuration

uses a barrel shifter for CPU shift operations (enabled via the `FAST_SHIFT_EN` generic).

uses a barrel shifter for CPU shift operations (enabled via the `FAST_SHIFT_EN` generic).

When the `C` extension is enabled, branches to an unaligned uncompressed instruction require additional instruction fetch cycles.

When the `C` extension is enabled, branches to an unaligned uncompressed instruction require additional instruction fetch cycles.
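The average CPI values in the table follow directly from the two counter readings: required clock cycles divided by executed instructions. The sketch below recomputes them from the rev 42 rows; `average_cpi` is a hypothetical helper name, and the counter values are copied from the table rather than read from hardware.

```python
# Sketch: recompute the average CPI from the cycle and instruction counts.
# Values are the rev 42 rows of the table above (cycle[h] and instret[h] readings).
# average_cpi is a hypothetical helper name.

RESULTS = {
    "rv32i": (5_595_750_503, 1_466_028_607),
    "rv32im": (2_966_086_503, 598_651_143),
    "rv32imc": (2_981_786_734, 611_814_918),
    "rv32imc + FAST_MUL_EN": (2_399_234_734, 611_814_918),
    "rv32imc + FAST_MUL_EN + FAST_SHIFT_EN": (2_265_135_174, 611_814_948),
}


def average_cpi(cycles, instructions):
    """Return the average number of clock cycles per executed instruction."""
    return cycles / instructions


if __name__ == "__main__":
    for config, (cycles, instructions) in RESULTS.items():
        print(f"{config}: CPI = {average_cpi(cycles, instructions):.2f}")
```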

Line 591...

Line 590...

> S. Nolting, "The NEORV32 Processor", github.com/stnolting/neorv32

> S. Nolting, "The NEORV32 Processor", github.com/stnolting/neorv32

#### BSD 3-Clause License

#### BSD 3-Clause License

Copyright (c) 2020, Stephan Nolting. All rights reserved.

Copyright (c) 2021, Stephan Nolting. All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are

Redistribution and use in source and binary forms, with or without modification, are

permitted provided that the following conditions are met:

permitted provided that the following conditions are met:

1. Redistributions of source code must retain the above copyright notice, this list of

1. Redistributions of source code must retain the above copyright notice, this list of

Subversion Repositories neorv32

[/] [neorv32/] [trunk/] [README.md] - Diff between revs 41 and 42

Rev 41	Rev 42
Line 90...	Line 90...


`### To-Do / Wish List / Help Wanted`	`### To-Do / Wish List / Help Wanted`

`* Use LaTeX for data sheet`	`* Use LaTeX for data sheet`
`* Further size and performance optimization (work in progress)`	`* Further size and performance optimization [work in progress]`
`* Add associativity configuration for instruction cache`	`* Add associativity configuration for instruction cache`
`* Add a data cache`	`* Add data cache`
`* Burst mode for the external memory/bus interface`	`* Burst mode for the external memory/bus interface`
* RISC-V `B` extension ([bitmanipulation](https://github.com/riscv/riscv-bitmanip)) (shelved)	* RISC-V `F` (using `Zfinx`?) CPU extension (single-precision floating point) [planning]
	* RISC-V `B` CPU extension ([bitmanipulation](https://github.com/riscv/riscv-bitmanip)) [shelved]
`* Synthesis results (+ wrappers?) for more/specific platforms`	`* Synthesis results (+ wrappers?) for more/specific platforms`
`* More support for FreeRTOS`	`* More support for FreeRTOS (like all traps)`
`* Port additional RTOSs (like [Zephyr](https://github.com/zephyrproject-rtos/zephyr) or [RIOT](https://www.riot-os.org))`	`* Port additional RTOSs (like [Zephyr](https://github.com/zephyrproject-rtos/zephyr) or [RIOT](https://www.riot-os.org))`
* Single-precision floating point unit (`F`) (planned)
`* Implement further RISC-V (or custom?) CPU extensions`	`* Implement further RISC-V (or custom?) CPU extensions`
`* Add debugger ([RISC-V debug spec](https://github.com/riscv/riscv-debug-spec))`	`* Add debugger ([RISC-V debug spec](https://github.com/riscv/riscv-debug-spec))`
* Add memory-mapped trigger to testbench to quit simulation (using VHDL2008's `use std.env.finish;`) - but how? :thinking:	* Add memory-mapped trigger to testbench to quit simulation (maybe using VHDL2008's `use std.env.finish`?)
`* ...`	`* ...`
`* [Ideas?](#ContributeFeedbackQuestions)`	`* [Ideas?](#ContributeFeedbackQuestions)`



Line 187...	Line 187...
Privileged architecture / CSR access (`Zicsr` extension):	Privileged architecture / CSR access (`Zicsr` extension):
* Privilege levels: `M-mode` (Machine mode)	* Privilege levels: `M-mode` (Machine mode)
* CSR access instructions: `CSRRW` `CSRRS` `CSRRC` `CSRRWI` `CSRRSI` `CSRRCI`	* CSR access instructions: `CSRRW` `CSRRS` `CSRRC` `CSRRWI` `CSRRSI` `CSRRCI`
* System instructions: `MRET` `WFI`	* System instructions: `MRET` `WFI`
`* Pseudo-instructions are not listed`	`* Pseudo-instructions are not listed`
* Counter CSRs: `cycle` `cycleh` `instret` `instreth` `time` `timeh` `mcycle` `mcycleh` `minstret` `minstreth` `mcounteren` `mcountinhibit`	* Counter CSRs: `[m]cycle[h]` `[m]instret[h]` `time[h]` `[m]hpmcounter[h]`(3..31, configurable) `mcounteren` `mcountinhibit` `mhpmevent`(3..31, configurable)
* Machine CSRs: `mstatus` `mstatush` `misa`(read-only!) `mie` `mtvec` `mscratch` `mepc` `mcause` `mtval` `mip` `mvendorid` [`marchid`](https://github.com/riscv/riscv-isa-manual/blob/master/marchid.md) `mimpid` `mhartid` `mzext`(custom)	* Machine CSRs: `mstatus[h]` `misa`(read-only!) `mie` `mtvec` `mscratch` `mepc` `mcause` `mtval` `mip` `mvendorid` [`marchid`](https://github.com/riscv/riscv-isa-manual/blob/master/marchid.md) `mimpid` `mhartid` `mzext`(custom)
`* Supported exceptions and interrupts:`	`* Supported exceptions and interrupts:`
`* Misaligned instruction address`	`* Misaligned instruction address`
`* Instruction access fault (via unacknowledged bus access after timeout)`	`* Instruction access fault (via unacknowledged bus access after timeout)`
`* Illegal instruction`	`* Illegal instruction`
* Breakpoint (via `ebreak` instruction)	* Breakpoint (via `ebreak` instruction)
Line 212...	Line 212...

Privileged architecture / instruction stream synchronization (`Zifencei` extension):	Privileged architecture / instruction stream synchronization (`Zifencei` extension):
* System instructions: `FENCE.I` (among others, used to clear and reload instruction cache)	* System instructions: `FENCE.I` (among others, used to clear and reload instruction cache)

Privileged architecture / Physical memory protection (`PMP`, requires `Zicsr` extension):	Privileged architecture / Physical memory protection (`PMP`, requires `Zicsr` extension):
* Additional machine CSRs: `pmpcfg0` `pmpcfg1` `pmpaddr0` `pmpaddr1` `pmpaddr2` `pmpaddr3` `pmpaddr4` `pmpaddr5` `pmpaddr6` `pmpaddr7`	`* Configurable number of regions`
	* Additional machine CSRs: `pmpcfg`(0..15) `pmpaddr`(0..63)


`### Non-RISC-V-Compliant Issues`	`### Non-RISC-V-Compliant Issues`

`* CPU and Processor are BIG-ENDIAN, but this should be no problem as the external memory bus interface provides big- and little-endian configurations`	`* CPU and Processor are BIG-ENDIAN, but this should be no problem as the external memory bus interface provides big- and little-endian configurations`
* `misa` CSR is read-only - no dynamic enabling/disabling of synthesized CPU extensions during runtime; for compatibility: write accesses (in m-mode) are ignored and do not cause an exception	* `misa` CSR is read-only - no dynamic enabling/disabling of synthesized CPU extensions during runtime; for compatibility: write accesses (in m-mode) are ignored and do not cause an exception
* The physical memory protection (PMP) only supports `NAPOT` mode, a minimal granularity of 8 bytes and only up to 8 regions	* The physical memory protection (PMP) only supports `NAPOT` mode yet and a minimal granularity of 8 bytes
* The `A` extension only implements `lr.w` and `sc.w` instructions yet. However, these instructions are sufficient to emulate all further AMO operations	* The `A` extension only implements `lr.w` and `sc.w` instructions yet. However, these instructions are sufficient to emulate all further AMO operations
* The `mcause` trap code `0x80000000` (originally reserved in the RISC-V specs) is used to indicate a hardware reset (non-maskable reset).	* The `mcause` trap code `0x80000000` (originally reserved in the RISC-V specs) is used to indicate a hardware reset (as "non-maskable interrupt").


`### NEORV32-Specific CPU Extensions`	`### NEORV32-Specific CPU Extensions`

The NEORV32-specific extensions are always enabled and are indicated via the `X` bit in the `misa` CSR.	The NEORV32-specific extensions are always enabled and are indicated via the `X` bit in the `misa` CSR.
Line 241...	Line 242...
`### NEORV32 CPU`	`### NEORV32 CPU`

`This chapter shows exemplary implementation results of the NEORV32 CPU for an Intel Cyclone IV EP4CE22F17C6N FPGA on`	`This chapter shows exemplary implementation results of the NEORV32 CPU for an Intel Cyclone IV EP4CE22F17C6N FPGA on`
`a DE0-nano board. The design was synthesized using Intel Quartus Prime Lite 20.1 ("balanced implementation"). The timing`	`a DE0-nano board. The design was synthesized using Intel Quartus Prime Lite 20.1 ("balanced implementation"). The timing`
`information is derived from the Timing Analyzer / Slow 1200mV 0C Model. If not otherwise specified, the default configuration`	`information is derived from the Timing Analyzer / Slow 1200mV 0C Model. If not otherwise specified, the default configuration`
of the CPU's generics is assumed (for example no PMP). No constraints were used at all. The `u` and `Zifencei` extensions have	`of the CPU's generics is assumed (e.g. no physical memory protection, no hardware performance monitors).`
`a negligible impact on the hardware requirements.`	No constraints were used at all. The `u` and `Zifencei` extensions have a negligible impact on the hardware requirements.

Results generated for hardware version [`1.4.9.2`](https://github.com/stnolting/neorv32/blob/master/CHANGELOG.md).	Results generated for hardware version [`1.4.9.2`](https://github.com/stnolting/neorv32/blob/master/CHANGELOG.md).

`\| CPU Configuration \| LEs \| FFs \| Memory bits \| DSPs \| f_max \|`	`\| CPU Configuration \| LEs \| FFs \| Memory bits \| DSPs \| f_max \|`
`\|:----------------------------------------\|:----------:\|:--------:\|:-----------:\|:----:\|:-------:\|`	`\|:----------------------------------------\|:----------:\|:--------:\|:-----------:\|:----:\|:-------:\|`
Line 304...	Line 305...
The FPGA-specific memory components can be found in [`rtl/fpga_specific`](https://github.com/stnolting/neorv32/blob/master/rtl/fpga_specific/lattice_ice40up).	The FPGA-specific memory components can be found in [`rtl/fpga_specific`](https://github.com/stnolting/neorv32/blob/master/rtl/fpga_specific/lattice_ice40up).
`* The clock frequencies marked with a "c" are constrained clocks. The remaining ones are _f_max_ results from the place and route timing reports.`	`* The clock frequencies marked with a "c" are constrained clocks. The remaining ones are _f_max_ results from the place and route timing reports.`
`* The Upduino and the Arty board have on-board SPI flash memories for storing the FPGA configuration. These devices can also be used by the default NEORV32`	`* The Upduino and the Arty board have on-board SPI flash memories for storing the FPGA configuration. These devices can also be used by the default NEORV32`
`bootloader to store and automatically boot an application program after reset (both tested successfully).`	`bootloader to store and automatically boot an application program after reset (both tested successfully).`
* The setups with `PMP` implement 2 regions with a minimal granularity of 64kB.	* The setups with `PMP` implement 2 regions with a minimal granularity of 64kB.
	`* No HPM counters are implemented.`



`## Performance`	`## Performance`

Line 315...	Line 317...

`The [CoreMark CPU benchmark](https://www.eembc.org/coremark) was executed on the NEORV32 and is available in the`	`The [CoreMark CPU benchmark](https://www.eembc.org/coremark) was executed on the NEORV32 and is available in the`
`[sw/example/coremark](https://github.com/stnolting/neorv32/blob/master/sw/example/coremark) project folder. This benchmark`	`[sw/example/coremark](https://github.com/stnolting/neorv32/blob/master/sw/example/coremark) project folder. This benchmark`
`tests the capabilities of a CPU itself rather than the functions provided by the whole system / SoC.`	`tests the capabilities of a CPU itself rather than the functions provided by the whole system / SoC.`

Results generated for hardware version [`1.4.7.0`](https://github.com/stnolting/neorv32/blob/master/CHANGELOG.md).

`~~~`	`~~~`
`Configuration`	`Configuration`
`Hardware: 32kB IMEM, 16kB DMEM, no caches, 100MHz clock`	`Hardware: 32kB IMEM, 16kB DMEM, no caches(!), 100MHz clock`
`CoreMark: 2000 iterations, MEM_METHOD is MEM_STACK`	`CoreMark: 2000 iterations, MEM_METHOD is MEM_STACK`
`Compiler: RISCV32-GCC 10.1.0 (rv32i toolchain)`	`Compiler: RISCV32-GCC 10.1.0 (rv32i toolchain)`
`Compiler flags: default, see makefile`	`Compiler flags: default, see makefile`
`Peripherals: UART for printing the results`	`Peripherals: UART for printing the results`
`~~~`	`~~~`

`\| CPU \| Executable Size \| Optimization \| CoreMark Score \| CoreMarks/MHz \|`	Results generated for hardware version [`1.4.9.8`](https://github.com/stnolting/neorv32/blob/master/CHANGELOG.md).

	\| CPU (including `Zicsr`) \| Executable Size \| Optimization \| CoreMark Score \| CoreMarks/MHz \|
`\|:--------------------------------------------\|:---------------:\|:------------:\|:--------------:\|:-------------:\|`	`\|:--------------------------------------------\|:---------------:\|:------------:\|:--------------:\|:-------------:\|`
\| `rv32i` \| 27 424 bytes \| `-O3` \| 35.71 \| 0.3571 \|	\| `rv32i` \| 28 756 bytes \| `-O3` \| 36.36 \| 0.3636 \|
\| `rv32im` \| 26 232 bytes \| `-O3` \| 66.66 \| 0.6666 \|	\| `rv32im` \| 27 516 bytes \| `-O3` \| 68.97 \| 0.6897 \|
\| `rv32imc` \| 20 876 bytes \| `-O3` \| 66.66 \| 0.6666 \|	\| `rv32imc` \| 22 008 bytes \| `-O3` \| 68.97 \| 0.6897 \|
\| `rv32imc` + `FAST_MUL_EN` \| 20 876 bytes \| `-O3` \| 83.33 \| 0.8333 \|	\| `rv32imc` + `FAST_MUL_EN` \| 22 008 bytes \| `-O3` \| 86.96 \| 0.8696 \|
\| `rv32imc` + `FAST_MUL_EN` + `FAST_SHIFT_EN` \| 20 876 bytes \| `-O3` \| 86.96 \| 0.8696 \|	\| `rv32imc` + `FAST_MUL_EN` + `FAST_SHIFT_EN` \| 22 008 bytes \| `-O3` \| 90.91 \| 0.9091 \|

The `FAST_MUL_EN` configuration uses DSPs for the multiplier of the `M` extension (enabled via the `FAST_MUL_EN` generic). The `FAST_SHIFT_EN` configuration	The `FAST_MUL_EN` configuration uses DSPs for the multiplier of the `M` extension (enabled via the `FAST_MUL_EN` generic). The `FAST_SHIFT_EN` configuration
uses a barrel shifter for CPU shift operations (enabled via the `FAST_SHIFT_EN` generic).	uses a barrel shifter for CPU shift operations (enabled via the `FAST_SHIFT_EN` generic).

When the `C` extension is enabled, branches to an unaligned uncompressed instruction require additional instruction fetch cycles.	When the `C` extension is enabled, branches to an unaligned uncompressed instruction require additional instruction fetch cycles.
Line 345...	Line 347...
`### Instruction Cycles`	`### Instruction Cycles`

`The NEORV32 CPU is based on a two-stage pipelined architecture. Each stage uses a multi-cycle processing scheme. Hence,`	`The NEORV32 CPU is based on a two-stage pipelined architecture. Each stage uses a multi-cycle processing scheme. Hence,`
`each instruction requires several clock cycles to execute (2 cycles for ALU operations, ..., 40 cycles for divisions).`	`each instruction requires several clock cycles to execute (2 cycles for ALU operations, ..., 40 cycles for divisions).`
`The average CPI (cycles per instruction) depends on the instruction mix of a specific application and also on the available`	`The average CPI (cycles per instruction) depends on the instruction mix of a specific application and also on the available`
`CPU extensions.`	CPU extensions. By default the CPU-internal shifter (e.g. for the `SLL` instruction) as well as the multiplier and divider of the

Please note that by default the CPU-internal shifter (e.g. for the `SLL` instruction) as well as the multiplier and divider of the
`M` extension use a bit-serial approach and require several cycles for completion.	`M` extension use a bit-serial approach and require several cycles for completion.

`The following table shows the performance results for successfully running 2000 CoreMark`	`The following table shows the performance results for successfully running 2000 CoreMark`
`iterations, which reflects a pretty good "real-life" workload. The average CPI is computed by`	`iterations, which reflects a pretty good "real-life" workload. The average CPI is computed by`
dividing the total number of required clock cycles (only the timed core to avoid distortion due to IO wait cycles; sampled via the `cycle[h]` CSRs)	dividing the total number of required clock cycles (only the timed core to avoid distortion due to IO wait cycles; sampled via the `cycle[h]` CSRs)
by the number of executed instructions (`instret[h]` CSRs). The executables were generated using optimization `-O3`.	by the number of executed instructions (`instret[h]` CSRs). The executables were generated using optimization `-O3`.

Results generated for hardware version [`1.4.7.0`](https://github.com/stnolting/neorv32/blob/master/CHANGELOG.md).	Results generated for hardware version [`1.4.9.8`](https://github.com/stnolting/neorv32/blob/master/CHANGELOG.md).

`\| CPU \| Required Clock Cycles \| Executed Instructions \| Average CPI \|`	\| CPU (including `Zicsr`) \| Required Clock Cycles \| Executed Instructions \| Average CPI \|
`\|:--------------------------------------------\|----------------------:\|----------------------:\|:-----------:\|`	`\|:--------------------------------------------\|----------------------:\|----------------------:\|:-----------:\|`
\| `rv32i` \| 5 648 997 774 \| 1 469 233 238 \| 3.84 \|	\| `rv32i` \| 5 595 750 503 \| 1 466 028 607 \| 3.82 \|
\| `rv32im` \| 3 036 749 774 \| 601 871 338 \| 5.05 \|	\| `rv32im` \| 2 966 086 503 \| 598 651 143 \| 4.95 \|
\| `rv32imc` \| 3 036 959 882 \| 615 034 616 \| 4.94 \|	\| `rv32imc` \| 2 981 786 734 \| 611 814 918 \| 4.87 \|
\| `rv32imc` + `FAST_MUL_EN` \| 2 454 407 882 \| 615 034 588 \| 3.99 \|	\| `rv32imc` + `FAST_MUL_EN` \| 2 399 234 734 \| 611 814 918 \| 3.92 \|
\| `rv32imc` + `FAST_MUL_EN` + `FAST_SHIFT_EN` \| 2 320 308 322 \| 615 034 676 \| 3.77 \|	\| `rv32imc` + `FAST_MUL_EN` + `FAST_SHIFT_EN` \| 2 265 135 174 \| 611 814 948 \| 3.70 \|


The `FAST_MUL_EN` configuration uses DSPs for the multiplier of the `M` extension (enabled via the `FAST_MUL_EN` generic). The `FAST_SHIFT_EN` configuration	The `FAST_MUL_EN` configuration uses DSPs for the multiplier of the `M` extension (enabled via the `FAST_MUL_EN` generic). The `FAST_SHIFT_EN` configuration
uses a barrel shifter for CPU shift operations (enabled via the `FAST_SHIFT_EN` generic).	uses a barrel shifter for CPU shift operations (enabled via the `FAST_SHIFT_EN` generic).

When the `C` extension is enabled, branches to an unaligned uncompressed instruction require additional instruction fetch cycles.	When the `C` extension is enabled, branches to an unaligned uncompressed instruction require additional instruction fetch cycles.
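
As a rough cross-check of the table above, the average CPI is the cycle count divided by the number of executed instructions. The sketch below is host-side Python, not NEORV32 code. The helper names `combine_csr` and `average_cpi` are made up for illustration, and it assumes each 64-bit counter has been read as a 32-bit low word (`cycle` / `instret`) plus a 32-bit high word (`cycleh` / `instreth`).

```python
def combine_csr(high: int, low: int) -> int:
    """Join a high word (e.g. cycleh) and a low word (e.g. cycle) into one 64-bit count."""
    return ((high & 0xFFFFFFFF) << 32) | (low & 0xFFFFFFFF)


def average_cpi(cycles: int, instructions: int) -> float:
    """Average cycles per instruction: total clock cycles / executed instructions."""
    if instructions <= 0:
        raise ValueError("instruction count must be positive")
    return cycles / instructions


# Values copied from the rv32i row of the hardware version 1.4.9.8 column.
rv32i_cycles = 5_595_750_503
rv32i_instructions = 1_466_028_607

print(f"rv32i average CPI: {average_cpi(rv32i_cycles, rv32i_instructions):.2f}")
```

The same division applied to the other rows should reproduce the Average CPI column to two decimal places.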
Line 591...	Line 590...

`> S. Nolting, "The NEORV32 Processor", github.com/stnolting/neorv32`	`> S. Nolting, "The NEORV32 Processor", github.com/stnolting/neorv32`

`#### BSD 3-Clause License`	`#### BSD 3-Clause License`

`Copyright (c) 2020, Stephan Nolting. All rights reserved.`	`Copyright (c) 2021, Stephan Nolting. All rights reserved.`

`Redistribution and use in source and binary forms, with or without modification, are`	`Redistribution and use in source and binary forms, with or without modification, are`
`permitted provided that the following conditions are met:`	`permitted provided that the following conditions are met:`

`1. Redistributions of source code must retain the above copyright notice, this list of`	`1. Redistributions of source code must retain the above copyright notice, this list of`