OpenCores
URL https://opencores.org/ocsvn/neorv32/neorv32/trunk

Subversion Repositories neorv32

Compare Revisions

  • This comparison shows the changes necessary to convert path
    /neorv32/trunk/docs
    from Rev 59 to Rev 60

Rev 59 → Rev 60

/src_adoc/cpu_csr.adoc File deleted
/src_adoc/neorv32.adoc File deleted
/src_adoc/soc_spi.adoc File deleted
/src_adoc/soc_wdt.adoc File deleted
/src_adoc/icons/tip.png Cannot display: file marked as a binary type. svn:mime-type = application/octet-stream
src_adoc/icons/tip.png
Property changes :
Deleted: svn:mime-type
## -1 +0,0 ##
-application/octet-stream
\ No newline at end of property
Index: src_adoc/icons/important.png
===================================================================
Cannot display: file marked as a binary type.
svn:mime-type = application/octet-stream
Index: src_adoc/icons/important.png
===================================================================
--- src_adoc/icons/important.png (revision 59)
+++ src_adoc/icons/important.png (nonexistent)
src_adoc/icons/important.png
Property changes :
Deleted: svn:mime-type
## -1 +0,0 ##
-application/octet-stream
\ No newline at end of property
Index: src_adoc/icons/warning.png
===================================================================
Cannot display: file marked as a binary type.
svn:mime-type = application/octet-stream
Index: src_adoc/icons/warning.png
===================================================================
--- src_adoc/icons/warning.png (revision 59)
+++ src_adoc/icons/warning.png (nonexistent)
src_adoc/icons/warning.png
Property changes :
Deleted: svn:mime-type
## -1 +0,0 ##
-application/octet-stream
\ No newline at end of property
Index: src_adoc/icons/note.png
===================================================================
Cannot display: file marked as a binary type.
svn:mime-type = application/octet-stream
Index: src_adoc/icons/note.png
===================================================================
--- src_adoc/icons/note.png (revision 59)
+++ src_adoc/icons/note.png (nonexistent)
src_adoc/icons/note.png
Property changes :
Deleted: svn:mime-type
## -1 +0,0 ##
-application/octet-stream
\ No newline at end of property
Index: src_adoc/soc_twi.adoc
===================================================================
--- src_adoc/soc_twi.adoc (revision 59)
+++ src_adoc/soc_twi.adoc (nonexistent)
@@ -1,84 +0,0 @@
-<<<
-:sectnums:
-==== Two-Wire Serial Interface Controller (TWI)
-
-[cols="<3,<3,<4"]
-[frame="topbot",grid="none"]
-|=======================
-| Hardware source file(s): | neorv32_twi.vhd |
-| Software driver file(s): | neorv32_twi.c |
-| | neorv32_twi.h |
-| Top entity port: | `twi_sda_io` | 1-bit bi-directional serial data
-| | `twi_scl_io` | 1-bit bi-directional serial clock
-| Configuration generics: | _IO_TWI_EN_ | implement TWI controller when _true_
-| CPU interrupts: | fast IRQ channel 7 | transmission done interrupt (see <<_processor_interrupts>>)
-|=======================
-
-**Theory of Operation**
-
-The two-wire interface – also called "I²C" – is a widely used interface for connecting several on-board
-components. Since this interface only needs two signals (the serial data line `twi_sda_io` and the serial
-clock line `twi_scl_io`) – regardless of the number of connected devices – it allows easy interconnection of
-several peripheral nodes.
-
-The NEORV32 TWI implements a **TWI controller**. It features "clock stretching" (if enabled via the control
-register), so a slow peripheral can halt the transmission by pulling the SCL line low. Currently, **no multi-controller
-support** is available. Also, the NEORV32 TWI unit cannot operate in peripheral mode.
-
-The TWI is enabled via the _TWI_CT_EN_ bit in the _TWI_CT_ control register. The user program can start / stop a
-transmission by issuing a START or STOP condition. These conditions are generated by setting the
-corresponding bits (_TWI_CT_START_ or _TWI_CT_STOP_) in the control register.
-
-Data is sent by writing a byte to the _TWI_DATA_ register. Received data can also be read from this
-register. The TWI controller is busy (transmitting data or performing a START or STOP condition) as long as the
-_TWI_CT_BUSY_ bit in the control register is set.
-
-An accessed peripheral has to acknowledge each transferred byte. When the _TWI_CT_ACK_ bit is set after a
-completed transmission, the accessed peripheral has sent an acknowledge (ACK). If it is cleared after a
-transmission, the peripheral has sent a not-acknowledge (NACK). The NEORV32 TWI controller can also
-send an ACK by itself ("controller acknowledge _MACK_") after a transmission by pulling SDA low during the
-ACK time slot. Set the _TWI_CT_MACK_ bit to activate this feature. If this bit is cleared, the ACK/NACK of the
-peripheral is sampled in this time slot instead (normal mode).
-
-In summary, the following independent TWI operations can be triggered by the application program:
-
-* send START condition (also as REPEATED START condition)
-* send STOP condition
-* send (at least) one byte while also sampling one byte from the bus
-
-[IMPORTANT]
-The serial clock (SCL) and the serial data (SDA) lines can only be actively driven low by the
-controller. Hence, external pull-up resistors are required for these lines.
-
-The TWI clock frequency is defined via the 3-bit _TWI_CT_PRSCx_ clock prescaler.
-The following prescalers are available:
-
-.TWI prescaler configuration
-[cols="<4,^1,^1,^1,^1,^1,^1,^1,^1"]
-[options="header",grid="rows"]
-|=======================
-| **`TWI_CT_PRSCx`** | `0b000` | `0b001` | `0b010` | `0b011` | `0b100` | `0b101` | `0b110` | `0b111`
-| Resulting `clock_prescaler` | 2 | 4 | 8 | 64 | 128 | 1024 | 2048 | 4096
-|=======================
-
-Based on the _TWI_CT_PRSCx_ configuration, the actual TWI clock frequency f~SCL~ is derived from the processor main clock f~main~ and is determined by:
-
-_**f~SCL~**_ = _f~main~[Hz]_ / (4 * `clock_prescaler`)
-
-.TWI register map
-[cols="<2,<2,<4,^1,<7"]
-[options="header",grid="all"]
-|=======================
-| Address | Name [C] | Bit(s), Name [C] | R/W | Function
-.10+<| `0xffffffb0` .10+<| _TWI_CT_ <|`0` _TWI_CT_EN_ ^| r/w <| TWI enable
- <|`1` _TWI_CT_START_ ^| r/w <| generate START condition
- <|`2` _TWI_CT_STOP_ ^| r/w <| generate STOP condition
- <|`3` _TWI_CT_PRSC0_ ^| r/w .3+<| 3-bit clock prescaler select
- <|`4` _TWI_CT_PRSC1_ ^| r/w
- <|`5` _TWI_CT_PRSC2_ ^| r/w
- <|`6` _TWI_CT_MACK_ ^| r/w <| generate controller ACK for each transmission ("MACK")
- <|`7` _TWI_CT_CKSTEN_ ^| r/w <| allow clock-stretching by peripherals when set
- <|`30` _TWI_CT_ACK_ ^| r/- <| ACK received when set
- <|`31` _TWI_CT_BUSY_ ^| r/- <| transfer/START/STOP in progress when set
-| `0xffffffb4` | _TWI_DATA_ |`7:0` _TWI_DATA_MSB_ : _TWI_DATA_LSB_ | r/w | receive/transmit data
-|=======================
Index: src_adoc/soc_pwm.adoc
===================================================================
--- src_adoc/soc_pwm.adoc (revision 59)
+++ src_adoc/soc_pwm.adoc (nonexistent)
@@ -1,65 +0,0 @@
-<<<
-:sectnums:
-==== Pulse-Width Modulation Controller (PWM)
-
-[cols="<3,<3,<4"]
-[frame="topbot",grid="none"]
-|=======================
-| Hardware source file(s): | neorv32_pwm.vhd |
-| Software driver file(s): | neorv32_pwm.c |
-| | neorv32_pwm.h |
-| Top entity port: | `pwm_o` | 4-channel PWM output (1-bit per channel)
-| Configuration generics: | _IO_PWM_EN_ | implement PWM controller when _true_
-| CPU interrupts: | none |
-|=======================
-
-**Theory of Operation**
-
-The PWM controller implements a pulse-width modulation controller with four independent channels and 8-bit
-resolution per channel. It is based on an 8-bit counter with four programmable threshold comparators that
-control the actual duty cycle of each channel. The controller can be used to drive an RGB LED with 24-bit
-true color, to dim LCD back-lights or even for "analog" control. An external integrator (RC low-pass filter)
-can be used to smooth the generated "analog" signals.
-
-The PWM controller is activated by setting the _PWM_CT_EN_ bit in the module's control register _PWM_CT_. When this
-bit is cleared, the unit is reset and all PWM output channels are set to zero.
-The 8-bit duty cycle for each channel, which represents the channel's "intensity", is defined via the corresponding 8-bit _PWM_DUTY_CHx_ byte in the _PWM_DUTY_ register.
-Based on the duty cycle _PWM_DUTY_CHx_, the intensity of each channel can be computed by the following formula:
-
-_**Intensity~x~**_ = _PWM_DUTY_CHx_ / (2^8^)
-
-The frequency of the generated PWM signals is defined by the PWM operating clock. This clock is derived
-from the main processor clock and divided by a prescaler via the 3-bit _PWM_CT_PRSCx_ in the unit's control
-register. The following prescalers are available:
-
-.PWM prescaler configuration
-[cols="<4,^1,^1,^1,^1,^1,^1,^1,^1"]
-[options="header",grid="rows"]
-|=======================
-| **`PWM_CT_PRSCx`** | `0b000` | `0b001` | `0b010` | `0b011` | `0b100` | `0b101` | `0b110` | `0b111`
-| Resulting `clock_prescaler` | 2 | 4 | 8 | 64 | 128 | 1024 | 2048 | 4096
-|=======================
-
-The resulting PWM frequency is defined by:
-
-_**f~PWM~**_ = _f~main~[Hz]_ / (2^8^ * `clock_prescaler`)
-
-[TIP]
-A more sophisticated frequency generation option is provided by the numerically-controlled oscillator
-module (see section <<_numerically_controller_oscillator_nco>>).
-
-<<<
-.PWM register map
-[cols="<4,<5,<10,^2,<11"]
-[options="header",grid="all"]
-|=======================
-| Address | Name [C] | Bit(s), Name [C] | R/W | Function
-.4+<| `0xffffffb8` .4+<| _PWM_CT_ <|`0` _PWM_CT_EN_ ^| r/w <| PWM enable
- <|`1` _PWM_CT_PRSC0_ ^| r/w .3+<| 3-bit clock prescaler select
- <|`2` _PWM_CT_PRSC1_ ^| r/w
- <|`3` _PWM_CT_PRSC2_ ^| r/w
-.4+<| `0xffffffbc` .4+<| _PWM_DUTY_ <|`7:0` _PWM_DUTY_CH0_MSB_ : _PWM_DUTY_CH0_LSB_ ^| r/w <| 8-bit duty cycle for channel 0
- <|`15:8` _PWM_DUTY_CH1_MSB_ : _PWM_DUTY_CH1_LSB_ ^| r/w <| 8-bit duty cycle for channel 1
- <|`23:16` _PWM_DUTY_CH2_MSB_ : _PWM_DUTY_CH2_LSB_ ^| r/w <| 8-bit duty cycle for channel 2
- <|`31:24` _PWM_DUTY_CH3_MSB_ : _PWM_DUTY_CH3_LSB_ ^| r/w <| 8-bit duty cycle for channel 3
-|=======================
Index: src_adoc/index.adoc
===================================================================
--- src_adoc/index.adoc (revision 59)
+++ src_adoc/index.adoc (nonexistent)
@@ -1,29 +0,0 @@
-= The NEORV32 RISC-V Processor
-:author: Dipl.-Ing. Stephan Nolting
-:email: stnolting@gmail.com
-:description: A size-optimized, customizable and open-source full-scale 32-bit RISC-V soft-core CPU and SoC written in platform-independent VHDL.
-:revnumber: v1.5.5.9
-:doctype: book
-:sectnums:
-:icons: font
-:imagesdir: figures
-:stem:
-:reproducible:
-:listing-caption: Listing
-:toc: left
-:toclevels: 4
-:title-logo-image: neorv32_logo_dark.png[pdfwidth=6.25in,align=center]
-:favicon: figures/icon.png
-
-image::neorv32_logo_transparent.png[align=center]
-
-image::riscv_logo.png[width=350,align=center]
-
-[.text-center]
-https://github.com/stnolting/neorv32[image:https://img.shields.io/badge/GitHub-stnolting%2Fneorv32-ffbd00?style=flat-square&logo=github&[title='homepage']]
-https://github.com/stnolting/neorv32/blob/master/LICENSE[image:https://img.shields.io/github/license/stnolting/neorv32?longCache=true&style=flat-square[title='license']]
-https://github.com/stnolting/neorv32/releases/tag/nightly[image:https://img.shields.io/badge/data%20sheet-PDF-ffbd00?longCache=true&style=flat-square&logo=asciidoctor[title='datasheet (pdf)']]
-https://stnolting.github.io/neorv32/sw/files.html[image:https://img.shields.io/badge/doxygen-HTML-ffbd00?longCache=true&style=flat-square&logo=Doxygen[title='doxygen']]
-
-
-include::content.adoc[]
Index: src_adoc/content.adoc
===================================================================
--- src_adoc/content.adoc (revision 59)
+++ src_adoc/content.adoc (nonexistent)
@@ -1,86 +0,0 @@
-<<<
-// ####################################################################################################################
-:sectnums!:
-== Proprietary and Legal Notice
-
-* "GitHub" is a Subsidiary of Microsoft Corporation.
-* "Vivado" and "Artix" are trademarks of Xilinx Inc.
-* "AXI" and "AXI4-Lite" are trademarks of Arm Holdings plc.
-* "ModelSim" is a trademark of Mentor Graphics – A Siemens Business.
-* "Quartus Prime" and "Cyclone" are trademarks of Intel Corporation.
-* "iCE40", "UltraPlus" and "Radiant" are trademarks of Lattice Semiconductor Corporation.
-* "Windows" is a trademark of Microsoft Corporation.
-* "Tera Term" copyright by T. Teranishi.
-* Timing diagrams made with WaveDrom Editor.
-* "NeoPixel" is a trademark of Adafruit Industries.
-
-Icons from https://www.flaticon.com and made by
-link:https://www.freepik.com[Freepik], link:https://www.flaticon.com/authors/good-ware[Good Ware],
-link:https://www.flaticon.com/authors/pixel-perfect[Pixel perfect], link:https://www.flaticon.com/authors/vectors-market[Vectors Market]
-
-
-**Limitation of Liability for External Links**
-
-This document contains links to the websites of third parties ("external links"). As the content of these websites
-is not under our control, we cannot assume any liability for such external content. In all cases, the provider of
-information of the linked websites is liable for the content and accuracy of the information provided. At the
-point in time when the links were placed, no infringements of the law were recognizable to us. As soon as an
-infringement of the law becomes known to us, we will immediately remove the link in question.
-
-**Disclaimer**
-
-This project is released under the BSD 3-Clause license. No copyright infringement
-intended. Other implied or used projects might have different licensing – see their documentation to get more information.
-
-**A big shoutout to all https://github.com/stnolting/neorv32/graphs/contributors[contributors],
-who helped improve this project! ❤️**
-
-<<<
-:sectnums!:
-== BSD 3-Clause License
-Copyright (c) 2021, Stephan Nolting. All rights reserved.
-
-Redistribution and use in source and binary forms, with or without modification, are permitted provided that
-the following conditions are met:
-
-. Redistributions of source code must retain the above copyright notice, this list of conditions and the
-following disclaimer.
-. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and
-the following disclaimer in the documentation and/or other materials provided with the distribution.
-. Neither the name of the copyright holder nor the names of its contributors may be used to endorse or
-promote products derived from this software without specific prior written permission.
-
-THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
-AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
-IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
-ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE
-LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
-CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
-SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
-INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
-CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
-ARISING IN ANY WAY OUT OF
-
-
-==========================
-**The NEORV32 RISC-V Processor** +
-Copyright (c) 2021, by Dipl.-Ing. Stephan Nolting. All rights reserved. +
-HQ: https://github.com/stnolting/neorv32 +
-Contact: stnolting@gmail.com +
-_made in Hanover, Germany_
-==========================
-
-
-// ####################################################################################################################
-
-include::overview.adoc[]
-
-include::soc.adoc[]
-
-include::cpu.adoc[]
-
-include::software.adoc[]
-
-include::on_chip_debugger.adoc[]
-
-include::getting_started.adoc[]
Index: src_adoc/soc_neoled.adoc
===================================================================
--- src_adoc/soc_neoled.adoc (revision 59)
+++ src_adoc/soc_neoled.adoc (nonexistent)
@@ -1,193 +0,0 @@
-<<<
-:sectnums:
-==== Smart LED Interface (NEOLED)
-
-[cols="<3,<3,<4"]
-[frame="topbot",grid="none"]
-|=======================
-| Hardware source file(s): | neorv32_neoled.vhd |
-| Software driver file(s): | neorv32_neoled.c |
-| | neorv32_neoled.h |
-| Top entity port: | `neoled_o` | 1-bit serial data
-| Configuration generics: | _IO_NEOLED_EN_ | implement NEOLED when _true_
-| CPU interrupts: | fast IRQ channel 9 | NEOLED interrupt (see <<_processor_interrupts>>)
-|=======================
-
-**Theory of Operation**
-
-The NEOLED module provides a dedicated interface for "smart RGB LEDs" like the WS2812 or WS2811.
-These LEDs provide a single interface wire that uses an asynchronous serial protocol for transmitting color
-data. Basically, data is transferred via LED-internal shift registers, which allows cascading an unlimited
-number of smart LEDs. The protocol provides a RESET command to strobe the transmitted data into the
-LED PWM driver registers after data has shifted throughout all LEDs in a chain.
-
-[NOTE]
-The NEOLED interface is compatible with the "Adafruit Industries NeoPixel" products, which feature
-WS2812 (or older WS2811) smart LEDs (see link:https://learn.adafruit.com/adafruit-neopixel-uberguide).
-
-The interface provides a single 1-bit output `neoled_o` to drive an arbitrary number of LEDs.
-Since the NEOLED module provides 24-bit and 32-bit operating modes, a mixed setup with RGB LEDs (24-bit color)
-and RGBW LEDs (32-bit color including a dedicated white LED chip) is also possible.
-
-**Theory of Operation – Protocol**
-
-The interface of the WS2812 LEDs uses an 800kHz carrier signal. Data is transmitted in a serial manner
-starting with LSB-first. The intensity for each R, G & B LED chip (= color code) is defined via an 8-bit
-value. The actual data bits are transferred by modifying the duty cycle of the signal (the timings for the
-WS2812 are shown below). A RESET command is "sent" by pulling the data line LOW for at least 50μs.
-
-.WS2812 bit-level protocol - taken from the "Adafruit NeoPixel Überguide"
-image::neopixel.png[align=center]
-
-.WS2812 interface timing
-[cols="<2,<2,<6"]
-[grid="all"]
-|=======================
-| T~total~ (T~carrier~) | 1.25μs +/- 300ns | period for a single bit
-| T~0H~ | 0.4μs +/- 150ns | high-time for sending a `0`
-| T~0L~ | 0.8μs +/- 150ns | low-time for sending a `0`
-| T~1H~ | 0.85μs +/- 150ns | high-time for sending a `1`
-| T~1L~ | 0.45μs +/- 150ns | low-time for sending a `1`
-| RESET | Above 50μs | low-time for sending a RESET command
-|=======================
-
-**Theory of Operation – NEOLED Module**
-
-The NEOLED module provides two accessible interface registers: the control register _NEOLED_CT_ and the
-TX data register _NEOLED_DATA_. The NEOLED module is globally enabled via the control register's
-_NEOLED_CT_EN_ bit. Clearing this bit will terminate any current operation, reset the module and
-set the `neoled_o` output to zero. The precise timing (implementing the **WS2812** protocol) and transmission
-mode are fully programmable via the _NEOLED_CT_ register to provide maximum flexibility.
-
-**Timing Configuration**
-
-The basic carrier frequency (800kHz for the WS2812 LEDs) is configured via a 3-bit main clock prescaler (_NEOLED_CT_PRSCx_, see table below)
-that scales the main processor clock f~main~ and a 5-bit cycle multiplier _NEOLED_CT_T_TOT_x_.
-
-.NEOLED prescaler configuration
-[cols="<4,^1,^1,^1,^1,^1,^1,^1,^1"]
-[options="header",grid="rows"]
-|=======================
-| **`NEOLED_CT_PRSCx`** | `0b000` | `0b001` | `0b010` | `0b011` | `0b100` | `0b101` | `0b110` | `0b111`
-| Resulting `clock_prescaler` | 2 | 4 | 8 | 64 | 128 | 1024 | 2048 | 4096
-|=======================
-
-The duty-cycles (or more precisely: the high- and low-times for sending either a '1' bit or a '0' bit) are
-defined via the 5-bit _NEOLED_CT_T_ONE_H_x_ and _NEOLED_CT_T_ZERO_H_x_ values, respectively. These programmable
-timing constants allow adapting the interface to a wide variety of smart LED protocols (for example WS2812 vs.
-WS2811).
-
-**Timing Configuration – Example (WS2812)**
-
-Generate the base clock f~TX~ for the NEOLED TX engine:
-
-* processor clock f~main~ = 100 MHz
-* _NEOLED_CT_PRSCx_ = `0b001` = f~main~ / 4
-
-_**f~TX~**_ = _f~main~[Hz]_ / `clock_prescaler` = 100MHz / 4 = 25MHz
-
-_**T~TX~**_ = 1 / _**f~TX~**_ = 40ns
-
-Generate carrier period (T~carrier~) and *high-times* (duty cycle) for sending `0` (T~0H~) and `1` (T~1H~) bits:
-
-* _NEOLED_CT_T_TOT_ = `0b11110` (= decimal 30)
-* _NEOLED_CT_T_ZERO_H_ = `0b01010` (= decimal 10)
-* _NEOLED_CT_T_ONE_H_ = `0b10100` (= decimal 20)
-
-_**T~carrier~**_ = _**T~TX~**_ * _NEOLED_CT_T_TOT_ = 40ns * 30 = 1.2µs
-
-_**T~0H~**_ = _**T~TX~**_ * _NEOLED_CT_T_ZERO_H_ = 40ns * 10 = 0.4µs
-
-_**T~1H~**_ = _**T~TX~**_ * _NEOLED_CT_T_ONE_H_ = 40ns * 20 = 0.8µs
-
-[TIP]
-The NEOLED SW driver library (`neorv32_neoled.h`) provides a simplified configuration
-function that configures all timing parameters for driving WS2812 LEDs based on the processor
-clock configuration.
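The arithmetic of the WS2812 timing example can be cross-checked with a short script. This is a minimal sketch and not part of the NEORV32 software framework: the function name `neoled_timing_ns` and its parameter names are hypothetical. It only applies the two relations stated in the text, f~TX~ = f~main~ / `clock_prescaler` and T = T~TX~ * (number of ticks).

```python
from fractions import Fraction


def neoled_timing_ns(f_main_hz, clock_prescaler, t_tot, t_zero_h, t_one_h):
    """Return (T_carrier, T_0H, T_1H) in nanoseconds as exact fractions.

    Hypothetical helper: it mirrors the formulas of the example, not the
    actual driver function from neorv32_neoled.h.
    """
    # T_TX = clock_prescaler / f_main, expressed in nanoseconds
    t_tx_ns = Fraction(clock_prescaler * 10**9, f_main_hz)
    return t_tx_ns * t_tot, t_tx_ns * t_zero_h, t_tx_ns * t_one_h


# Values from the example: 100 MHz, prescaler 4, T_TOT=30, ZERO_H=10, ONE_H=20
t_carrier, t_0h, t_1h = neoled_timing_ns(100_000_000, 4, 30, 10, 20)
print(t_carrier, t_0h, t_1h)  # 1200 400 800
```

With 40 ns per tick, 30 ticks give a carrier period of 1200 ns (1.2 µs), which lies inside the 1.25 µs +/- 300 ns window of the WS2812 timing table.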
-
-**RGB / RGBW Configuration**
-
-NeoPixels are available in two "color" versions: LEDs with three chips providing RGB color and LEDs with
-four chips providing RGB color plus a dedicated white LED chip (= RGBW). Since the intensity of every
-LED chip is defined via an 8-bit value, the RGB LEDs require a frame of 24 bits per module and the RGBW
-LEDs require a frame of 32 bits per module.
-
-The data transfer quantity of the NEOLED module can be configured via the _NEOLED_MODE_EN_ control
-register bit. If this bit is cleared, the NEOLED interface operates in 24-bit mode and will transmit bits `23:0` of
-the data written to _NEOLED_DATA_. If _NEOLED_MODE_EN_ is set, the NEOLED interface operates in 32-bit
-mode and will transmit bits `31:0` of the data written to _NEOLED_DATA_.
-
-**TX Data FIFO**
-
-The interface features a TX data buffer (a FIFO) to allow CPU-independent operation. The buffer depth
-is configured via the `tx_buffer_entries_c` constant (default = 4 entries) in the module's VHDL source
-file `rtl/core/neorv32_neoled.vhd`. The current configuration can be read via the _NEOLED_CT_BUFS_x_
-control register bits, which return log2(`tx_buffer_entries_c`).
-
-When writing data to the _NEOLED_DATA_ register the data is automatically written to the TX buffer. Whenever
-data is available in the buffer the serial transmission engine will take it and transmit it to the LEDs.
-
-The data transfer size (_NEOLED_MODE_EN_) can be modified at any time since this control register bit is also buffered
-in the FIFO. This allows arbitrarily mixing RGB and RGBW LEDs in the chain.
-
-[WARNING]
-Please note that the timing configurations (_NEOLED_CT_PRSCx_, _NEOLED_CT_T_TOT_x_,
-_NEOLED_CT_T_ONE_H_x_ and _NEOLED_CT_T_ZERO_H_x_) are NOT stored to the buffer. Changing
-these values while the buffer is not empty or the TX engine is still sending will cause data corruption.
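The reason the transfer size can change between writes is that the mode bit travels through the FIFO together with each data word. The following is a behavioral sketch of that idea only, not a description of the VHDL: the class `NeoledTxFifoModel` and its methods are hypothetical, and rejecting a write to a full buffer is an assumption, since the text does not say what the hardware does in that case.

```python
from collections import deque


class NeoledTxFifoModel:
    """Software model (assumption): every entry holds (data word, mode bit)."""

    def __init__(self, depth=4):  # 4 is the documented default depth
        self.depth = depth
        self.entries = deque()

    def write(self, data, mode_32bit):
        """Queue one word with the mode that was active at write time."""
        if len(self.entries) >= self.depth:
            return False  # assumption: a write to a full buffer is dropped
        self.entries.append((data & 0xFFFFFFFF, bool(mode_32bit)))
        return True

    def transmit_next(self):
        """Pop the oldest entry; return (payload, bit count) or None."""
        if not self.entries:
            return None
        data, mode_32bit = self.entries.popleft()
        if mode_32bit:
            return data, 32            # bits 31:0 are sent
        return data & 0x00FFFFFF, 24   # bits 23:0 are sent


fifo = NeoledTxFifoModel()
fifo.write(0xAABBCCDD, mode_32bit=False)  # RGB LED, 24-bit frame
fifo.write(0x11223344, mode_32bit=True)   # RGBW LED, 32-bit frame
print(fifo.transmit_next())  # (12307677, 24)
print(fifo.transmit_next())  # (287454020, 32)
```

Because each entry keeps its own mode, the second write does not affect how the first word is sent. The timing fields have no such per-entry copy, which is why changing them while data is pending corrupts the transfer.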
-
-**Status Configuration**
-
-The NEOLED module features two read-only status bits in the control register: _NEOLED_CT_BUSY_ and
-_NEOLED_CT_TX_STATUS_.
-
-If the _NEOLED_CT_TX_STATUS_ flag is set, the serial TX engine is still busy sending serial data to the LED strips.
-If the flag is cleared, the TX engine is idle and the serial data output `neoled_o` is set LOW.
-
-The _NEOLED_CT_BUSY_ flag provides a programmable option to check for the TX buffer state. The control
-register's _NEOLED_CT_BSCON_ bit is used to configure the "meaning" of the _NEOLED_CT_BUSY_ flag. The
-condition for sending an interrupt request (IRQ) to the CPU is also configured via the _NEOLED_CT_BSCON_
-bit.
-
-[cols="^5,^8,^8"]
-[options="header",grid="all"]
-|=======================
-| _NEOLED_CT_BSCON_ | _NEOLED_CT_BUSY_ | Sending an IRQ when ...
-| 0 | the busy flag will clear if there **IS at least one free entry** in the TX buffer | the IRQ will fire if **at least one entry GETS free** in the TX buffer
-| 1 | the busy flag will clear if the **whole TX buffer IS empty** | the IRQ will fire if the **whole TX buffer GETS empty**
-|=======================
-
-When _NEOLED_CT_BSCON_ is set, the CPU can write up to `tx_buffer_entries_c` new data words to
-_NEOLED_DATA_ without checking the busy flag _NEOLED_CT_BUSY_. This greatly relaxes the time constraints for
-sending a continuous data stream to the LEDs (as an idle time beyond 50μs will trigger the LEDs' RESET
-command).
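The two meanings of the busy flag can be written down as a small truth function. This is a sketch under stated assumptions: `neoled_busy` and `words_safe_to_write` are hypothetical names, the buffer fill level is modeled as a plain integer, and the value returned for _NEOLED_CT_BSCON_ = 0 (one word) is inferred from "at least one free entry" rather than stated in the text.

```python
def neoled_busy(bscon, fill_level, depth):
    """Model of the busy flag for a TX buffer holding fill_level of depth entries."""
    if not 0 <= fill_level <= depth:
        raise ValueError("fill_level must be within 0..depth")
    if bscon == 0:
        # clears as soon as at least one entry is free
        return fill_level == depth
    # bscon == 1: clears only when the whole buffer is empty
    return fill_level != 0


def words_safe_to_write(bscon, depth):
    """Words that can be written after busy has cleared, without re-checking."""
    return depth if bscon == 1 else 1  # the '1' case is an inference


depth = 4  # documented default of tx_buffer_entries_c
for fill in range(depth + 1):
    print(fill, neoled_busy(0, fill, depth), neoled_busy(1, fill, depth))
```

With the flag configured for "buffer empty", software can refill the buffer in bursts of `tx_buffer_entries_c` words, which is what relaxes the 50μs idle-time constraint.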
-
-<<<
-.NEOLED register map
-[cols="<4,<5,<9,^2,<9"]
-[options="header",grid="all"]
-|=======================
-| Address | Name [C] | Bit(s), Name [C] | R/W | Function
-.22+<| `0xffffffd8` .22+<| _NEOLED_CT_ <|`0` _NEOLED_CT_EN_ ^| r/w <| NEOLED enable
- <|`1` _NEOLED_CT_MODE_ ^| r/w <| data transfer size; `0`=24-bit; `1`=32-bit
- <|`2` _NEOLED_CT_BSCON_ ^| r/w <| busy flag / IRQ trigger configuration (see table above)
- <|`3` _NEOLED_CT_PRSC0_ ^| r/w <| 3-bit clock prescaler, bit 0
- <|`4` _NEOLED_CT_PRSC1_ ^| r/w <| 3-bit clock prescaler, bit 1
- <|`5` _NEOLED_CT_PRSC2_ ^| r/w <| 3-bit clock prescaler, bit 2
- <|`6` _NEOLED_CT_BUFS0_ ^| r/- .4+<| 4-bit log2(`tx_buffer_entries_c`)
- <|`7` _NEOLED_CT_BUFS1_ ^| r/-
- <|`8` _NEOLED_CT_BUFS2_ ^| r/-
- <|`9` _NEOLED_CT_BUFS3_ ^| r/-
- <|`10` _NEOLED_CT_T_TOT_0_ ^| r/w .5+| 5-bit pulse clock ticks per total single-bit period (T~total~)
- <|`11` _NEOLED_CT_T_TOT_1_ ^| r/w
- <|`12` _NEOLED_CT_T_TOT_2_ ^| r/w
- <|`13` _NEOLED_CT_T_TOT_3_ ^| r/w
- <|`14` _NEOLED_CT_T_TOT_4_ ^| r/w
- <|`20` _NEOLED_CT_ONE_H_0_ ^| r/w .5+<| 5-bit pulse clock ticks per high-time for sending a one-bit (T~1H~)
- <|`21` _NEOLED_CT_ONE_H_1_ ^| r/w
- <|`22` _NEOLED_CT_ONE_H_2_ ^| r/w
- <|`23` _NEOLED_CT_ONE_H_3_ ^| r/w
- <|`24` _NEOLED_CT_ONE_H_4_ ^| r/w
- <|`30` _NEOLED_CT_TX_STATUS_ ^| r/- <| transmit engine busy when `1`
- <|`31` _NEOLED_CT_BUSY_ ^| r/- <| busy / buffer status flag; configured via _NEOLED_CT_BSCON_ (see table above)
-| `0xffffffdc` | _NEOLED_DATA_ <|`31:0` / `23:0` ^| -/w <| TX data (32-/24-bit)
-|=======================
Index: src_adoc/soc_trng.adoc
===================================================================
--- src_adoc/soc_trng.adoc (revision 59)
+++ src_adoc/soc_trng.adoc (nonexistent)
@@ -1,84 +0,0 @@
-<<<
-:sectnums:
-==== True Random-Number Generator (TRNG)
-
-[cols="<3,<3,<4"]
-[frame="topbot",grid="none"]
-|=======================
-| Hardware source file(s): | neorv32_trng.vhd |
-| Software driver file(s): | neorv32_trng.c |
-| | neorv32_trng.h |
-| Top entity port: | none |
-| Configuration generics: | _IO_TRNG_EN_ | implement TRNG when _true_
-| CPU interrupts: | none |
-|=======================
-
-**Theory of Operation**
-
-The NEORV32 true random number generator provides _physical true random numbers_ for your application.
-Instead of using a pseudo RNG like an LFSR, the TRNG of the processor uses a simple, straight-forward ring
-oscillator as physical entropy source. Hence, voltage and thermal fluctuations are used to provide true
-physical random data.
-
-[NOTE]
-The TRNG features a platform independent architecture without FPGA-specific primitives, macros or
-attributes.
-
-**Architecture**
-
-The NEORV32 TRNG is based on simple ring oscillators, which are implemented as an inverter chain with
-an odd number of inverters. A **latch** is used to decouple each individual inverter. Basically, this architecture
-is some kind of asynchronous LFSR.
-
-The outputs of several ring oscillators are synchronized using two registers and are XORed together. The
-resulting output is de-biased using a von-Neumann randomness extractor. This de-biased output is further
-processed by a simple 8-bit Fibonacci LFSR to improve whitening. After at least 8 clock cycles the state of
-the LFSR is sampled and provided as final data output.
-
-To prevent the synthesis tool from doing logic optimization and thus, removing all but one inverter, the
-TRNG uses simple latches to decouple an inverter and its actual output. The latches are reset when the
-TRNG is disabled and are enabled one by one by a "real" shift register when the TRNG is activated. This
-construct can be synthesized for any FPGA platform. Thus, the NEORV32 TRNG provides a platform
-independent architecture.
-
-**TRNG Configuration**
-
-The TRNG uses several ring-oscillators, where the next oscillator provides a slightly longer chain (more
-inverters) than the one before.
-This increment is constant for all implemented oscillators. This setup can be
-customized by modifying the "Advanced Configuration" constants in the TRNG's VHDL file:
-
-* The `num_roscs_c` constant defines the total number of ring oscillators in the system. `num_inv_start_c`
-defines the number of inverters used by the first ring oscillator (has to be an odd number). Each additional
-ring oscillator provides `num_inv_inc_c` more inverters than the one before (has to be an even number).
-* The LFSR-based post-processing can be deactivated using the `lfsr_en_c` constant. The polynomial tap
-mask of the LFSR can be customized using `lfsr_taps_c`.
-
-**Using the TRNG**
-
-The TRNG features a single register for status and data access. When the _TRNG_CT_EN_ control register bit is
-set, the TRNG is enabled and starts operation. As soon as the _TRNG_CT_VALID_ bit is set, the currently
-sampled 8-bit random data byte can be obtained from the lowest 8 bits of the _TRNG_CT_ register
-(_TRNG_CT_DATA_MSB_ : _TRNG_CT_DATA_LSB_). The _TRNG_CT_VALID_ bit is automatically cleared
-when reading the control register.
-
-[IMPORTANT]
-The TRNG needs at least 8 clock cycles to generate a new random byte. During this sampling time
-the current output random data is kept stable in the output register until a valid sampling of the new byte has
-completed.
-
-Randomness "Quality"
-I have not verified the quality of the generated random numbers (for example using NIST test suites). The
-quality is strongly affected by the actual configuration of the TRNG and the resulting FPGA mapping/routing.
-However, generating larger histograms of the generated random numbers shows an equal distribution (binary
-average of the random numbers = 127). A simple evaluation test/demo program can be found in
-`sw/example/demo_trng`.
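The polling sequence described under "Using the TRNG" can be sketched as follows. This is an illustration only, not the NEORV32 driver: `trng_decode`, `trng_get_byte` and the `read_ct` callback are hypothetical names, with `read_ct` standing in for a 32-bit read of the _TRNG_CT_ register. The bit positions (data in bits 7:0, enable in bit 30, valid in bit 31) are taken from the TRNG register map.

```python
TRNG_CT_DATA_MASK = 0xFF  # bits 7:0, random data byte
TRNG_CT_EN = 30           # TRNG enable bit
TRNG_CT_VALID = 31        # data valid bit


def trng_decode(ct_word):
    """Return the random byte of one register read, or None if not valid."""
    if (ct_word >> TRNG_CT_VALID) & 1:
        return ct_word & TRNG_CT_DATA_MASK
    return None


def trng_get_byte(read_ct, max_polls=1000):
    """Poll read_ct() until the valid bit is set; give up after max_polls."""
    for _ in range(max_polls):
        byte = trng_decode(read_ct())
        if byte is not None:
            return byte
    return None


# Simulated register reads: enabled but not valid twice, then valid with 0x3C
samples = iter([1 << TRNG_CT_EN, 1 << TRNG_CT_EN,
                (1 << TRNG_CT_VALID) | (1 << TRNG_CT_EN) | 0x3C])
print(trng_get_byte(lambda: next(samples)))  # 60
```

Since the valid bit is cleared by reading the register, each read is decoded exactly once here; reading the register a second time to fetch the data would lose the byte.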
- -.TRNG register map -[cols="<2,<2,<4,^1,<7"] -[options="header",grid="all"] -|======================= -| Address | Name [C] | Bit(s), Name [C] | R/W | Function -.3+<| `0xffffff88` .3+<| _TRNG_CT_ <|`7:0` _TRNG_CT_DATA_MSB_ : _TRNG_CT_DATA_MSB_ ^| r/- <| 8-bit random data output - <|`30` _TRNG_CT_EN_ ^| r/w <| TRNG enable - <|`31` _TRNG_CT_VALID_ ^| r/- <| random data output is valid when set -|======================= Index: src_adoc/soc_mtime.adoc =================================================================== --- src_adoc/soc_mtime.adoc (revision 59) +++ src_adoc/soc_mtime.adoc (nonexistent) @@ -1,50 +0,0 @@ -<<< -:sectnums: -==== Machine System Timer (MTIME) - -[cols="<3,<3,<4"] -[frame="topbot",grid="none"] -|======================= -| Hardware source file(s): | neorv32_mtime.vhd | -| Software driver file(s): | neorv32_mtime.c | -| | neorv32_mtime.h | -| Top entity port: | `mtime_i` | System time input from external MTIME -| | `mtime_o` | System time output (64-bit) for SoC -| Configuration generics: | _IO_MTIME_EN_ | implement MTIME when _true_ -| CPU interrupts: | `MTI` | machine timer interrupt (see <<_processor_interrupts>>) -|======================= - -**Theory of Operation** - -The MTIME machine system timer implements the memory-mapped MTIME timer from the official RISC-V -specifications. This unit features a 64-bit system timer incremented with the primary processor clock. -The current system time can also be obtained using the `time[h]` CSRs and is made available for processor-external -use via the top's `mtime_o` signal. - -[NOTE] -If the processor-internal **MTIME unit is NOT implemented**, the top's `mtime_i` input signal is used to update the `time[h]` CSRs -and the `MTI` machine timer interrupt) CPU interrupt is directly connected to the top's `mtime_irq_i` input. - -The 64-bit system time can be accessed via the `MTIME_LO` and `MTIME_HI` memory-mapped registers (read/write) and also via -the CPU's `time[h]` CSRs (read-only). 
A 64-bit time compare register – accessible via the memory-mapped `MTIMECMP_LO` and `MTIMECMP_HI` -registers – is used to configure an interrupt to the CPU. The interrupt is triggered -whenever `MTIME` (high & low part) >= `MTIMECMP` (high & low part) and is directly forwarded to the CPU's `MTI` interrupt. - -[TIP] -The interrupt request is a single-shot signal, -so the CPU is triggered once if the system time is greater than or equal to the compare time. Hence, -another MTIME IRQ is only possible when updating `MTIMECMP`. - -The 64-bit counter and the 64-bit comparator are implemented as 2×32-bit counters and comparators with a -registered carry to prevent a 64-bit carry chain and thus, to simplify timing closure. - -.MTIME register map -[cols="<3,<3,^1,^1,<6"] -[options="header",grid="all"] -|======================= -| Address | Name [C] | Bits | R/W | Function -| `0xffffff90` | _MTIME_LO_ | 31:0 | r/w | machine system time, low word -| `0xffffff94` | _MTIME_HI_ | 31:0 | r/w | machine system time, high word -| `0xffffff98` | _MTIMECMP_LO_ | 31:0 | r/w | time compare, low word -| `0xffffff9c` | _MTIMECMP_HI_ | 31:0 | r/w | time compare, high word -|======================= Index: src_adoc/soc_uart.adoc =================================================================== --- src_adoc/soc_uart.adoc (revision 59) +++ src_adoc/soc_uart.adoc (nonexistent) @@ -1,216 +0,0 @@ -<<< -:sectnums: -==== Primary Universal Asynchronous Receiver and Transmitter (UART0) - -[cols="<3,<3,<4"] -[frame="topbot",grid="none"] -|======================= -| Hardware source file(s): | neorv32_uart.vhd | -| Software driver file(s): | neorv32_uart.c | -| | neorv32_uart.h | -| Top entity port: | `uart0_txd_o` | serial transmitter output UART0 -| | `uart0_rxd_i` | serial receiver input UART0 -| | `uart0_rts_o` | flow control: RX ready to receive -| | `uart0_cts_i` | flow control: TX allowed to send -| Configuration generics: | _IO_UART0_EN_ | implement UART0 when _true_ -| CPU interrupts: | fast IRQ 
channel 2 | RX done interrupt -| | fast IRQ channel 3 | TX done interrupt (see <<_processor_interrupts>>) -|======================= - -[IMPORTANT] -Please note that ALL default example programs and software libraries of the NEORV32 software -framework (including the bootloader and the runtime environment) use the primary UART -(_UART0_) as default user console interface. For compatibility, all C-language function calls to -`neorv32_uart_*` are mapped to the corresponding primary UART (_UART0_) `neorv32_uart0_*` -functions. - -**Theory of Operation** - -In most cases, the UART is a standard interface used to establish a communication channel between the -computer/user and an application running on the processor platform. The NEORV32 UARTs feature a -standard frame configuration: 8 data bits, an optional parity bit (even or odd) and 1 stop bit. -The parity and the actual Baudrate are configurable by software. - -The UART0 is enabled by setting the _UART_CT_EN_ bit in the UART control register _UART0_CT_. The actual -transmission Baudrate (like 19200) is configured via the 12-bit _UART_CT_BAUDxx_ baud prescaler (`baud_rate`) and the -3-bit _UART_CT_PRSCx_ clock prescaler. - -.UART prescaler configuration -[cols="<4,^1,^1,^1,^1,^1,^1,^1,^1"] -[options="header",grid="rows"] -|======================= -| **`UART_CT_PRSCx`** | `0b000` | `0b001` | `0b010` | `0b011` | `0b100` | `0b101` | `0b110` | `0b111` -| Resulting `clock_prescaler` | 2 | 4 | 8 | 64 | 128 | 1024 | 2048 | 4096 -|======================= - -_**Baudrate**_ = (_f~main~[Hz]_ / `clock_prescaler`) / (`baud_rate` + 1) - -A new transmission is started by writing the data byte to be sent to the lowest byte of the _UART0_DATA_ register. The -transfer is completed when the _UART_CT_TX_BUSY_ control register flag returns to zero. A new received byte -is available when the _UART_DATA_AVAIL_ flag of the _UART0_DATA_ register is set. 
A "frame error" in a received byte -(broken stop bit) is indicated via the _UART_DATA_FERR_ flag in the _UART0_DATA_ register. - -**RX Double-Buffering** - -The UART receive engine provides a simple data buffer with two entries. These two entries are transparent -for the user. The transmitting device can send up to 2 chars to the UART without risking data loss. If another -char is sent before at least one char has been read from the buffer, data loss occurs. This situation can be -detected via the receiver overrun flag _UART_DATA_OVERR_ in the _UART0_DATA_ register. The flag is -automatically cleared after reading _UART0_DATA_. - -**Parity Modes** - -A parity bit is added if the _UART_CT_PMODE1_ flag is set. When _UART_CT_PMODE0_ is zero the UART -operates in "even parity" mode. If this flag is set, the UART operates in "odd parity" mode. Parity errors in -received data are indicated via the _UART_DATA_PERR_ flag in the _UART0_DATA_ register. This flag is updated with each new -received character. A frame error in the received data (i.e. stop bit is not set) is indicated via the -_UART_DATA_FERR_ flag in the _UART0_DATA_ register. This flag is also updated with each new received character. - -**Hardware Flow Control – RTS/CTS** - -The UART supports hardware flow control using the standard CTS (clear to send) and/or RTS (ready to send -/ ready to receive "RTR") signals. Both hardware flow control mechanisms can be individually enabled. - -If **RTS hardware flow control** is enabled by setting the _UART_CT_RTS_EN_ control register flag, the UART -will pull the `uart0_rts_o` signal low if the UART's receiver is idle and no received data is waiting to get read by -application software. As long as this signal is low the connected device can send new data. `uart0_rts_o` is always LOW if the UART is disabled. - -The RTS line is de-asserted (going high) as soon as the start bit of a new incoming char has been -detected. 
The transmitting device continues sending the current char and can also send another char -(due to the RX double-buffering), which is done by most terminal programs. Any additional data sent -when RTS is still asserted will overwrite the RX input buffer causing data loss. This will set the _UART_DATA_OVERR_ flag in the -_UART0_DATA_ register. Any read access to this register clears the flag again. - -If **CTS hardware flow control** is enabled by setting the _UART_CT_CTS_EN_ control register flag, the UART's -transmitter will not start sending a new char until the `uart0_cts_i` signal goes low. If new data to be -sent is written to the UART data register while `uart0_cts_i` is not asserted (=low), the UART will wait for -`uart0_cts_i` to become asserted (=high) before sending starts. During this time, the UART busy flag -_UART_CT_TX_BUSY_ remains set. - -If `uart0_cts_i` is asserted, no new data transmission will be started by the UART. The state of the `uart0_cts_i` -signal has no effect on a transmission that is already in progress. - -Signal changes on `uart0_cts_i` during an active transmission are ignored. Application software can check -the current state of the `uart0_cts_i` input signal via the _UART_CT_CTS_ control register flag. - -[TIP] -Please note that – just like the RXD and TXD signals – the RTS and CTS signals have to be **cross**-coupled -between devices. - -**Interrupts** - -The UART features two interrupts: the "TX done interrupt" is triggered when a transmit operation (sending) has finished. The "RX -done interrupt" is triggered when a data byte has been received. If the UART0 is not implemented, the UART0 interrupts are permanently tied to zero. - -[NOTE] -The UART's RX interrupt is always triggered when a new data word has arrived – regardless of the -state of the RX double-buffer. - -**Simulation Mode** - -The default UART0 operation will transmit any data written to the _UART0_DATA_ register via the serial TX line at -the defined baud rate. 
Even though the default testbench provides a simulated UART0 receiver, which -outputs any received char to the simulator console, such a transmission takes a lot of time. To accelerate -UART0 output during simulation (and also to dump large amounts of data for further processing like -verification) the UART0 features a **simulation mode**. - -The simulation mode is enabled by setting the _UART_CT_SIM_MODE_ bit in the UART0's control register -_UART0_CT_. Any other UART0 configuration bits are irrelevant, but the UART0 has to be enabled via the -_UART_CT_EN_ bit. When the simulation mode is enabled, any written char to _UART0_DATA_ (bits 7:0) is -directly output as ASCII char to the simulator console. Additionally, all text is also stored to a text file -`neorv32.uart0.sim_mode.text.out` in the simulation home folder. Furthermore, the whole 32-bit word -written to _UART0_DATA_ is stored as plain 8-char hexadecimal value to a second text file -`neorv32.uart0.sim_mode.data.out` also located in the simulation home folder. - -If the UART is configured for simulation mode there will be **NO physical UART0 transmissions via -`uart0_txd_o`** at all. Furthermore, no interrupts (RX done or TX done) will be triggered in any situation. - -[TIP] -More information regarding the simulation-mode of the UART0 can be found in section <<_simulating_the_processor>>. 
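The Baudrate formula and the prescaler table from the UART0 "Theory of Operation" can be sketched as a small configuration helper. This is an illustrative sketch, not the NEORV32 driver code: the function names `uart_baud_config` and `uart_actual_baud` are hypothetical, and the strategy of picking the smallest prescaler whose `baud_rate` value fits into 12 bits is an assumption. Only the prescaler values and the formula itself are taken from the text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* clock_prescaler values selected by UART_CT_PRSCx = 0..7 (from the prescaler table). */
static const uint32_t uart_prsc_table[8] = {2, 4, 8, 64, 128, 1024, 2048, 4096};

/* Hypothetical helper: find the smallest prescaler select for which
 * baud_rate = f_main / (clock_prescaler * baud) - 1 fits into the 12-bit field.
 * The division truncates, so the resulting Baudrate may be slightly above the target. */
bool uart_baud_config(uint32_t f_main_hz, uint32_t baud,
                      uint32_t *prsc_sel, uint32_t *baud_val)
{
    assert(prsc_sel != NULL && baud_val != NULL);
    if (baud == 0u) {
        return false;
    }
    for (uint32_t sel = 0u; sel < 8u; sel++) {
        /* div corresponds to (baud_rate + 1) in the formula */
        uint64_t div = (uint64_t)f_main_hz / ((uint64_t)uart_prsc_table[sel] * baud);
        if (div >= 1u && div <= 4096u) {
            *prsc_sel = sel;
            *baud_val = (uint32_t)div - 1u;
            return true;
        }
    }
    return false; /* target Baudrate not reachable with this clock */
}

/* Baudrate = (f_main / clock_prescaler) / (baud_rate + 1) */
uint32_t uart_actual_baud(uint32_t f_main_hz, uint32_t prsc_sel, uint32_t baud_val)
{
    assert(prsc_sel < 8u);
    return (f_main_hz / uart_prsc_table[prsc_sel]) / (baud_val + 1u);
}
```

For a 100 MHz clock and a target of 19200 Baud, this sketch selects prescaler setting `0b000` with a `baud_rate` value of 2603. A real driver may round instead of truncate to reduce the Baudrate error.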
- -.UART0 register map -[cols="<6,<7,<10,^2,<18"] -[options="header",grid="all"] -|======================= -| Address | Name [C] | Bit(s), Name [C] | R/W | Function -.12+<| `0xffffffa0` .12+<| _UART0_CT_ <|`11:0` _UART_CT_BAUDxx_ ^| r/w <| 12-bit BAUD configuration value - <|`12` _UART_CT_SIM_MODE_ ^| r/w <| enable **simulation mode** - <|`20` _UART_CT_RTS_EN_ ^| r/w <| enable RTS hardware flow control - <|`21` _UART_CT_CTS_EN_ ^| r/w <| enable CTS hardware flow control - <|`22` _UART_CT_PMODE0_ ^| r/w .2+<| parity bit enable and configuration (`00`/`01`= no parity; `10`=even parity; `11`=odd parity) - <|`23` _UART_CT_PMODE1_ ^| r/w - <|`24` _UART_CT_PRSC0_ ^| r/w .3+<| 3-bit baudrate clock prescaler select - <|`25` _UART_CT_PRSC1_ ^| r/w - <|`26` _UART_CT_PRSC2_ ^| r/w - <|`27` _UART_CT_CTS_ ^| r/- <| current state of UART's CTS input signal - <|`28` _UART_CT_EN_ ^| r/w <| UART enable - <|`31` _UART_CT_TX_BUSY_ ^| r/- <| transmitter busy flag -.6+<| `0xffffffa4` .6+<| _UART0_DATA_ <|`7:0` _UART_DATA_MSB_ : _UART_DATA_LSB_ ^| r/w <| receive/transmit data (8-bit) - <|`31:0` - ^| -/w <| **simulation data output** - <|`28` _UART_DATA_PERR_ ^| r/- <| RX parity error - <|`29` _UART_DATA_FERR_ ^| r/- <| RX data frame error (stop bit not set) - <|`30` _UART_DATA_OVERR_ ^| r/- <| RX data overrun - <|`31` _UART_DATA_AVAIL_ ^| r/- <| RX data available when set -|======================= - - - -<<< -// #################################################################################################################### -:sectnums: -==== Secondary Universal Asynchronous Receiver and Transmitter (UART1) - -[cols="<3,<3,<4"] -[frame="topbot",grid="none"] -|======================= -| Hardware source file(s): | neorv32_uart.vhd | -| Software driver file(s): | neorv32_uart.c | -| | neorv32_uart.h | -| Top entity port: | `uart1_txd_o` | serial transmitter output UART1 -| | `uart1_rxd_i` | serial receiver input UART1 -| | `uart1_rts_o` | flow control: RX ready to receive -| | 
`uart1_cts_i` | flow control: TX allowed to send -| Configuration generics: | _IO_UART1_EN_ | implement UART1 when _true_ -| CPU interrupts: | fast IRQ channel 4 | RX done interrupt -| | fast IRQ channel 5 | TX done interrupt (see <<_processor_interrupts>>) -|======================= - -**Theory of Operation** - -The secondary UART (UART1) is functionally identical to the primary UART (<<_primary_universal_asynchronous_receiver_and_transmitter_uart0>>). -UART1 has different addresses for -the control register (_UART1_CT_) and the data register (_UART1_DATA_) – see the register map below. However, the -register bits/flags use the same bit positions and naming. Furthermore, the "RX done" and "TX done" interrupts are -mapped to different CPU fast interrupt channels. - -**Simulation Mode** - -The secondary UART (UART1) provides the same simulation options as the primary UART. However, -output data is written to UART1-specific files: `neorv32.uart1.sim_mode.text.out` is used to store -plain ASCII text and `neorv32.uart1.sim_mode.data.out` is used to store full 32-bit hexadecimal -encoded data words. 
- -.UART1 register map -[cols="<6,<7,<10,^2,<18"] -[options="header",grid="all"] -|======================= -| Address | Name [C] | Bit(s), Name [C] | R/W | Function -.12+<| `0xffffffd0` .12+<| _UART1_CT_ <|`11:0` _UART_CT_BAUDxx_ ^| r/w <| 12-bit BAUD configuration value - <|`12` _UART_CT_SIM_MODE_ ^| r/w <| enable **simulation mode** - <|`20` _UART_CT_RTS_EN_ ^| r/w <| enable RTS hardware flow control - <|`21` _UART_CT_CTS_EN_ ^| r/w <| enable CTS hardware flow control - <|`22` _UART_CT_PMODE0_ ^| r/w .2+<| parity bit enable and configuration (`00`/`01`= no parity; `10`=even parity; `11`=odd parity) - <|`23` _UART_CT_PMODE1_ ^| r/w - <|`24` _UART_CT_PRSC0_ ^| r/w .3+<| 3-bit baudrate clock prescaler select - <|`25` _UART_CT_PRSC1_ ^| r/w - <|`26` _UART_CT_PRSC2_ ^| r/w - <|`27` _UART_CT_CTS_ ^| r/- <| current state of UART's CTS input signal - <|`28` _UART_CT_EN_ ^| r/w <| UART enable - <|`31` _UART_CT_TX_BUSY_ ^| r/- <| transmitter busy flag -.6+<| `0xffffffd4` .6+<| _UART1_DATA_ <|`7:0` _UART_DATA_MSB_ : _UART_DATA_LSB_ ^| r/w <| receive/transmit data (8-bit) - <|`31:0` - ^| -/w <| **simulation data output** - <|`28` _UART_DATA_PERR_ ^| r/- <| RX parity error - <|`29` _UART_DATA_FERR_ ^| r/- <| RX data frame error (stop bit not set) - <|`30` _UART_DATA_OVERR_ ^| r/- <| RX data overrun - <|`31` _UART_DATA_AVAIL_ ^| r/- <| RX data available when set -|======================= Index: src_adoc/soc_cfs.adoc =================================================================== --- src_adoc/soc_cfs.adoc (revision 59) +++ src_adoc/soc_cfs.adoc (nonexistent) @@ -1,103 +0,0 @@ -<<< -:sectnums: -==== Custom Functions Subsystem (CFS) - -[cols="<3,<3,<4"] -[frame="topbot",grid="none"] -|======================= -| Hardware source file(s): | neorv32_cfs.vhd | -| Software driver file(s): | neorv32_cfs.c | -| | neorv32_cfs.h | -| Top entity port: | `cfs_in_i` | custom input conduit -| | `cfs_out_o` | custom output conduit -| Configuration generics: | _IO_CFS_EN_ | implement CFS 
when _true_ -| | _IO_CFS_CONFIG_ | custom generic conduit -| | _IO_CFS_IN_SIZE_ | size of `cfs_in_i` -| | _IO_CFS_OUT_SIZE_ | size of `cfs_out_o` -| CPU interrupts: | fast IRQ channel 1 | CFS interrupt (see <<_processor_interrupts>>) -|======================= - -**Theory of Operation** - -The custom functions subsystem can be used to implement application-specific user-defined co-processors -(like encryption or arithmetic accelerators) or peripheral/communication interfaces. In contrast to connecting -custom hardware accelerators via the external memory interface, the CFS provides a convenient and low-latency -extension and customization option. - -The CFS provides up to 32x 32-bit memory-mapped registers (see register map table below). The actual -functionality of these registers has to be defined by the hardware designer. - -[INFO] -Take a look at the template CFS VHDL source file (`rtl/core/neorv32_cfs.vhd`). The file is highly -commented to illustrate all aspects that are relevant for implementing custom CFS-based co-processor designs. - -**CFS Software Access** - -The CFS memory-mapped registers can be accessed by software using the provided C-language aliases (see -register map table below). Note that all interface registers provide 32-bit access data of type `uint32_t`. - -[source,c] ----- -// C-code CFS usage example -CFS_REG_0 = (uint32_t)some_data_array[i]; // write to CFS register 0 -uint32_t temp = CFS_REG_20; // read from CFS register 20 ----- - -**CFS Interrupt** - -The CFS provides a single one-shot interrupt request signal mapped to the CPU's fast interrupt channel 1. -See section <<_processor_interrupts>> for more information. - -**CFS Configuration Generic** - -By default, the CFS provides a single 32-bit `std_(u)logic_vector` configuration generic _IO_CFS_CONFIG_ -that is available in the processor's top entity. This generic can be used to pass custom configuration options -from the top entity down to the CFS entity. 
- -**CFS Custom IOs** - -By default, the CFS also provides two unidirectional input and output conduits `cfs_in_i` and `cfs_out_o`. -These signals are propagated to the processor's top entity. The actual use of these signals has to be defined -by the hardware designer. The size of the input signal conduit `cfs_in_i` is defined via the (top's) _IO_CFS_IN_SIZE_ configuration -generic (default = 32-bit). The size of the output signal conduit `cfs_out_o` is defined via the (top's) -_IO_CFS_OUT_SIZE_ configuration generic (default = 32-bit). If the custom function subsystem is not implemented -(_IO_CFS_EN_ = false) the `cfs_out_o` signal is tied to all-zero. - -.CFS register map -[cols="^4,<5,^2,^3,<14"] -[options="header",grid="all"] -|======================= -| Address | Name [C] | Bit(s) | R/W | Function -| `0xffffff00` | _CFS_REG_0_ |`31:0` | (r)/(w) | custom CFS interface register 0 -| `0xffffff04` | _CFS_REG_1_ |`31:0` | (r)/(w) | custom CFS interface register 1 -| `0xffffff08` | _CFS_REG_2_ |`31:0` | (r)/(w) | custom CFS interface register 2 -| `0xffffff0c` | _CFS_REG_3_ |`31:0` | (r)/(w) | custom CFS interface register 3 -| `0xffffff10` | _CFS_REG_4_ |`31:0` | (r)/(w) | custom CFS interface register 4 -| `0xffffff14` | _CFS_REG_5_ |`31:0` | (r)/(w) | custom CFS interface register 5 -| `0xffffff18` | _CFS_REG_6_ |`31:0` | (r)/(w) | custom CFS interface register 6 -| `0xffffff1c` | _CFS_REG_7_ |`31:0` | (r)/(w) | custom CFS interface register 7 -| `0xffffff20` | _CFS_REG_8_ |`31:0` | (r)/(w) | custom CFS interface register 8 -| `0xffffff24` | _CFS_REG_9_ |`31:0` | (r)/(w) | custom CFS interface register 9 -| `0xffffff28` | _CFS_REG_10_ |`31:0` | (r)/(w) | custom CFS interface register 10 -| `0xffffff2c` | _CFS_REG_11_ |`31:0` | (r)/(w) | custom CFS interface register 11 -| `0xffffff30` | _CFS_REG_12_ |`31:0` | (r)/(w) | custom CFS interface register 12 -| `0xffffff34` | _CFS_REG_13_ |`31:0` | (r)/(w) | custom CFS interface register 13 -| `0xffffff38` | 
_CFS_REG_14_ |`31:0` | (r)/(w) | custom CFS interface register 14 -| `0xffffff3c` | _CFS_REG_15_ |`31:0` | (r)/(w) | custom CFS interface register 15 -| `0xffffff40` | _CFS_REG_16_ |`31:0` | (r)/(w) | custom CFS interface register 16 -| `0xffffff44` | _CFS_REG_17_ |`31:0` | (r)/(w) | custom CFS interface register 17 -| `0xffffff48` | _CFS_REG_18_ |`31:0` | (r)/(w) | custom CFS interface register 18 -| `0xffffff4c` | _CFS_REG_19_ |`31:0` | (r)/(w) | custom CFS interface register 19 -| `0xffffff50` | _CFS_REG_20_ |`31:0` | (r)/(w) | custom CFS interface register 20 -| `0xffffff54` | _CFS_REG_21_ |`31:0` | (r)/(w) | custom CFS interface register 21 -| `0xffffff58` | _CFS_REG_22_ |`31:0` | (r)/(w) | custom CFS interface register 22 -| `0xffffff5c` | _CFS_REG_23_ |`31:0` | (r)/(w) | custom CFS interface register 23 -| `0xffffff60` | _CFS_REG_24_ |`31:0` | (r)/(w) | custom CFS interface register 24 -| `0xffffff64` | _CFS_REG_25_ |`31:0` | (r)/(w) | custom CFS interface register 25 -| `0xffffff68` | _CFS_REG_26_ |`31:0` | (r)/(w) | custom CFS interface register 26 -| `0xffffff6c` | _CFS_REG_27_ |`31:0` | (r)/(w) | custom CFS interface register 27 -| `0xffffff70` | _CFS_REG_28_ |`31:0` | (r)/(w) | custom CFS interface register 28 -| `0xffffff74` | _CFS_REG_29_ |`31:0` | (r)/(w) | custom CFS interface register 29 -| `0xffffff78` | _CFS_REG_30_ |`31:0` | (r)/(w) | custom CFS interface register 30 -| `0xffffff7c` | _CFS_REG_31_ |`31:0` | (r)/(w) | custom CFS interface register 31 -|======================= Index: src_adoc/soc_icache.adoc =================================================================== --- src_adoc/soc_icache.adoc (revision 59) +++ src_adoc/soc_icache.adoc (nonexistent) @@ -1,50 +0,0 @@ -<<< -:sectnums: -==== Processor-Internal Instruction Cache (iCACHE) - -[cols="<3,<3,<4"] -[frame="topbot",grid="none"] -|======================= -| Hardware source file(s): | neorv32_icache.vhd | -| Software driver file(s): | none | _implicitly used_ -| Top entity port: | 
none | -| Configuration generics: | _ICACHE_EN_ | implement processor-internal instruction cache when _true_ -| | _ICACHE_NUM_BLOCKS_ | number of cache blocks (pages/lines) -| | _ICACHE_BLOCK_SIZE_ | size of a cache block in bytes -| | _ICACHE_ASSOCIATIVITY_ | associativity / number of sets -| CPU interrupts: | none | -|======================= - -The processor features an optional cache for instructions to compensate for memories with high latency. The -cache is directly connected to the CPU's instruction fetch interface and provides fully transparent buffering -of instruction fetch accesses to the entire 4GB address space. - -[IMPORTANT] -The instruction cache is intended to accelerate instruction fetch via the external memory interface. -Since all processor-internal memories provide an access latency of one cycle (by default), caching -internal memories does not bring any performance gain. However, it _might_ reduce traffic on the -processor-internal bus. - -The cache is implemented if the _ICACHE_EN_ generic is true. The size of the cache memory is defined via -_ICACHE_BLOCK_SIZE_ (the size of a single cache block/page/line in bytes; has to be a power of two and >= -4 bytes), _ICACHE_NUM_BLOCKS_ (the total amount of cache blocks; has to be a power of two and >= 1) and -the actual cache associativity _ICACHE_ASSOCIATIVITY_ (number of sets; 1 = direct-mapped, 2 = 2-way set-associative, -has to be a power of two and >= 1). - -If the cache associativity (_ICACHE_ASSOCIATIVITY_) is > 1 the LRU replacement policy (least recently -used) is used. - -[TIP] -Keep the features of the targeted FPGA's memory resources (block RAM) in mind when configuring -the cache size/layout to maximize and optimize resource utilization. - -By executing the `fence.i` instruction (`Zifencei` CPU extension) the cache is cleared and a reload from -main memory is forced. Among other things, this allows self-modifying code to be implemented. 
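The power-of-two constraints on the iCACHE generics, and the block alignment mentioned under "Bus Access Fault Handling", can be sketched as follows. This is an illustrative host-side check and not part of the NEORV32 sources: the function names are hypothetical, and only the stated constraints (block size is a power of two and at least 4 bytes; number of blocks and associativity are powers of two and at least 1) are taken from the text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if x is a power of two (which also implies x >= 1). */
static bool is_pow2(uint32_t x)
{
    return x != 0u && (x & (x - 1u)) == 0u;
}

/* Check the constraints stated for the iCACHE configuration generics. */
bool icache_config_valid(uint32_t num_blocks, uint32_t block_size, uint32_t assoc)
{
    return is_pow2(block_size) && block_size >= 4u  /* ICACHE_BLOCK_SIZE    */
        && is_pow2(num_blocks)                      /* ICACHE_NUM_BLOCKS    */
        && is_pow2(assoc);                          /* ICACHE_ASSOCIATIVITY */
}

/* Base address of the block-size-aligned cache block that contains addr. */
uint32_t icache_block_base(uint32_t addr, uint32_t block_size)
{
    assert(is_pow2(block_size));
    return addr & ~(block_size - 1u);
}
```

With a block size of 64 bytes, for example, a miss at address `0x00001234` would fall into the block starting at `0x00001200`. Because the block size is a power of two, the alignment reduces to a simple bit mask.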
- -**Bus Access Fault Handling** - -The cache always loads a complete cache block (_ICACHE_BLOCK_SIZE_ bytes) aligned to the size of a cache -block if a miss is detected. If any of the accessed addresses within a single block do not successfully -acknowledge (i.e. issuing an error signal or timing out) the whole cache block is invalidated and any access to -an address within this cache block will also raise an instruction fetch bus error fault exception. - Index: src_adoc/soc_wishbone.adoc =================================================================== --- src_adoc/soc_wishbone.adoc (revision 59) +++ src_adoc/soc_wishbone.adoc (nonexistent) @@ -1,157 +0,0 @@ -<<< -:sectnums: -==== Processor-External Memory Interface (WISHBONE) (AXI4-Lite) - -[cols="<3,<3,<4"] -[frame="topbot",grid="none"] -|======================= -| Hardware source file(s): | neorv32_wishbone.vhd | -| Software driver file(s): | none | _implicitly used_ -| Top entity port: | `wb_tag_o` | request tag output (3-bit) -| | `wb_adr_o` | address output (32-bit) -| | `wb_dat_i` | data input (32-bit) -| | `wb_dat_o` | data output (32-bit) -| | `wb_we_o` | write enable (1-bit) -| | `wb_sel_o` | byte enable (4-bit) -| | `wb_stb_o` | strobe (1-bit) -| | `wb_cyc_o` | valid cycle (1-bit) -| | `wb_lock_o` | exclusive access request (1-bit) -| | `wb_ack_i` | acknowledge (1-bit) -| | `wb_err_i` | bus error (1-bit) -| | `fence_o` | an executed `fence` instruction -| | `fencei_o` | an executed `fence.i` instruction -| Configuration generics: | _MEM_EXT_EN_ | enable external memory interface when _true_ -| | _MEM_EXT_TIMEOUT_ | number of clock cycles after which an unacknowledged external bus access will auto-terminate (0 = disabled) -| Configuration constants in VHDL package file `neorv32_package.vhd`: | `wb_pipe_mode_c` | when _false_ (default): classic/standard Wishbone protocol; when _true_: pipelined Wishbone protocol -| | `xbus_big_endian_c` | byte-order (Endianness) of external memory interface (true=BIG 
(default), false=little) -| CPU interrupts: | none | -|======================= - -The external memory interface uses the Wishbone interface protocol. The external interface port is available -when the _MEM_EXT_EN_ generic is _true_. This interface can be used to attach external memories, custom -hardware accelerators, additional IO devices or all other kinds of IP blocks. All memory accesses from the -CPU that do not target the internal bootloader ROM, the internal IO region or the internal data/instruction -memories (if implemented at all) are forwarded to the Wishbone gateway and thus to the external memory -interface. - -[TIP] -When using the default processor setup, all access addresses between 0x00000000 and -0xffff0000 (= beginning of processor-internal BOOT ROM) are delegated to the external memory -/ bus interface if they are not targeting the (actually enabled/implemented) processor-internal -instruction memory (IMEM) or the (actually enabled/implemented) processor-internal data memory -(DMEM). See section <<_address_space>> for more information. - -**Wishbone Bus Protocol** - -The external memory interface either uses **standard** ("classic") Wishbone transactions (default) or -**pipelined** Wishbone transactions. The transaction protocol is configured via the `wb_pipe_mode_c` constant -in the main VHDL package file (`rtl/neorv32_package.vhd`): - -[source,vhdl] ----- --- (external) bus interface -- -constant wb_pipe_mode_c : boolean := false; ----- - -When `wb_pipe_mode_c` is disabled, all bus control signals including _STB_ are active (and stable) until the -transfer is acknowledged/terminated. If `wb_pipe_mode_c` is enabled, all bus control signals except _STB_ are active -(and stable) until the transfer is acknowledged/terminated. In this case, _STB_ is active only during the very -first bus clock cycle. 
- -.Exemplary Wishbone bus accesses using "classic" and "pipelined" protocol -[cols="^2,^2"] -[grid="none"] -|======================= -a| image::wishbone_classic_read.png[700,300] -a| image::wishbone_pipelined_write.png[700,300] -| **Classic** Wishbone read access | **Pipelined** Wishbone write access -|======================= - - -[TIP] -A detailed description of the implemented Wishbone bus protocol and the according interface signals -can be found in the data sheet "Wishbone B4 – WISHBONE System-on-Chip (SoC) Interconnection -Architecture for Portable IP Cores". A copy of this document can be found in the docs folder of this -project. - -**Interface Latency** - -The Wishbone gateway introduces two additional latency cycles: Processor-outgoing and -incoming signals -are fully registered. Thus, any access from the CPU to a processor-external device requires +2 clock cycles. - -**Bus Access Timeout** - -The Wishbone bus interface provides an option to configure a bus access timeout counter. The _MEM_EXT_TIMEOUT_ -top generic is used to specify the _maximum_ time (in clock cycles) a bus access can be pending before it is automatically -terminated. If _MEM_EXT_TIMEOUT_ is set to zero, the timeout is disabled and a bus access can take an arbitrary number of cycles to complete. - -When _MEM_EXT_TIMEOUT_ is greater than zero, the Wishbone adapter starts an internal countdown whenever the CPU -accesses a memory address via the external memory interface. If the accessed memory / device does not acknowledge (via `wb_ack_i`) -or terminate (via `wb_err_i`) the transfer within _MEM_EXT_TIMEOUT_ clock cycles, the bus access is automatically canceled -(setting `wb_cyc_o` low again) and a load/store/instruction fetch bus access fault exception is raised. - -[TIP] -This feature can be used as **safety guard** if the external memory system does not check for "address space holes". 
That means that addresses, which -do not belong to a certain memory or device, do not permanently stall the processor due to an unacknowledged/unterminated bus access. If the external -memory system can guarantee to acknowledge **any** bus access (even if it targets an unimplemented address) the timeout feature should be disabled -(_MEM_EXT_TIMEOUT_ = 0). - -**Wishbone Tag** - -The 3-bit Wishbone `wb_tag_o` signal provides additional information regarding the access type. This signal -is compatible with the AXI4 _AxPROT_ signal. - -* `wb_tag_o(0)` 1: privileged access (CPU is in machine mode); 0: unprivileged access -* `wb_tag_o(1)` always zero (indicating "secure access") -* `wb_tag_o(2)` 1: instruction fetch access, 0: data access - -**Exclusive / Atomic Bus Access** - -If the atomic memory access CPU extension (via _CPU_EXTENSION_RISCV_A_) is enabled, the CPU can -request an atomic/exclusive bus access via the external memory interface. - -The load-reserved instruction (`lr.w`) will set the `wb_lock_o` signal telling the bus interconnect to establish a -reservation for the currently accessed address (start of an exclusive access). This signal will stay asserted until -another memory access instruction is executed (for example a `sc.w`). - -The memory system has to make sure that no other entity can access the reserved address until `wb_lock_o` -is released again. If this attempt fails, the memory system has to assert `wb_err_i` in order to indicate that the -reservation was broken. - -[TIP] -See section <<_bus_interface>> for the CPU bus interface protocol. - -**Endianness** - -The NEORV32 CPU and the Processor setup are BIG-endian architectures. However, to allow a connection -to a little-endian memory system the external bus interface provides an Endianness configuration. The -Endianness can be configured via the global `xbus_big_endian_c` constant in the main VHDL package file -(rtl/neorv32_package.vhd). 
By default, the external memory interface uses BIG-endian byte-order. - -[source,vhdl] ----- --- (external) bus interface -- -constant xbus_big_endian_c : boolean := true; ----- - -Application software can check the Endianness configuration of the external bus interface via the -_SYSINFO_FEATURES_MEM_EXT_ENDIAN_ flag in the processor's SYSINFO module (see section -<<_system_configuration_information_memory_sysinfo>> for more information). - -**AXI4-Lite Connectivity** - -The AXI4-Lite wrapper (`rtl/top_templates/neorv32_top_axi4lite.vhd`) provides a Wishbone-to- -AXI4-Lite bridge, compatible with Xilinx Vivado (IP packager and block design editor). All entity signals of -this wrapper are of type _std_logic_ or _std_logic_vector_, respectively. - -The AXI Interface has been verified using Xilinx Vivado IP Packager and Block Designer. The AXI -interface port signals are automatically detected when packaging the core. - -.Example AXI SoC using Xilinx Vivado -image::neorv32_axi_soc.png[] - -[WARNING] -Using the auto-termination timeout feature (_MEM_EXT_TIMEOUT_ greater than zero) is **not AXI4 compliant** as the AXI protocol does not support canceling of -bus transactions. Therefore, the NEORV32 top wrapper with AXI4-Lite interface (`rtl/top_templates/neorv32_top_axi4lite`) configures _MEM_EXT_TIMEOUT_ = 0 by default. - - Index: src_adoc/on_chip_debugger.adoc =================================================================== --- src_adoc/on_chip_debugger.adoc (revision 59) +++ src_adoc/on_chip_debugger.adoc (nonexistent) @@ -1,603 +0,0 @@ -<<< -:sectnums: -== On-Chip Debugger (OCD) - -The NEORV32 Processor features an _on-chip debugger_ (OCD) implementing **execution-based debugging** that is compatible -with the **Minimal RISC-V Debug Specification Version 0.13.2**. -Please refer to this spec for in-depth information. -A copy of the specification is available in `docs/references/riscv-debug-release.pdf`. 
-The NEORV32 OCD provides the following key features: - -* JTAG test access port -* run-control of the CPU: halting, single-stepping and resuming -* executing arbitrary programs during debugging -* accessing core registers (direct access to GPRs, indirect access to CSRs via program buffer) -* indirect access to the whole processor address space (via program buffer) -* compatible with the https://github.com/riscv/riscv-openocd[RISC-V port of OpenOCD]; - pre-built binaries can be obtained for example from https://www.sifive.com/software[SiFive] - -[NOTE] -The OCD requires additional resources for implementation and _might_ also increase the critical path resulting in lower -performance. If the OCD is not really required for the _final_ implementation, it can be disabled and thus, -discarded from implementation. In this case all circuitry of the debugger is completely removed (no impact -on area, energy or timing at all). - -[TIP] -A simple example on how to use the NEORV32 on-chip debugger in combination with `OpenOCD` and `gdb` -is shown in chapter <<_debugging_using_the_on_chip_debugger>>. - -The NEORV32 on-chip debugger complex is based on three hardware modules: - -.NEORV32 on-chip debugger complex -image::neorv32_ocd_complex.png[align=center] - -[start=1] -. <<_debug_transport_module_dtm>> (`rtl/core/neorv32_debug_dtm.vhd`): External JTAG access tap to allow an external - adapter to interface with the _debug module (DM)_ using the _debug module interface (dmi)_. -. <<_debug_module_dm>> (`rtl/core/neorv32_debug_dm.vhd`): Debugger control unit that is configured by the DTM via the - _dmi_. From the CPU's "point of view" this module behaves as a memory-mapped "peripheral" that can be accessed - via the processor-internal bus. 
The memory-mapped registers provide an internal _data buffer_ for data transfer - from/to the DM, a _code ROM_ containing the "park loop" code, a _program buffer_ to allow the debugger to - execute small programs defined by the DM and a _status register_ that is used to communicate - _halt_, _resume_ and _execute_ requests/acknowledges from/to the DM. -. CPU <<_cpu_debug_mode>> extension (part of `rtl/core/neorv32_cpu_control.vhd`): - This extension provides the "debug execution mode" which executes the "park loop" code from the DM. - The mode also provides additional CSRs. - -**Theory of Operation** - -When debugging the system using the OCD, the debugger issues a halt request to the CPU (via the CPU's -`db_halt_req_i` signal) to make the CPU enter _debug mode_. In this state, the application-defined architectural -state of the system/CPU is "frozen" so the debugger can monitor and even modify it. -While in debug mode, the CPU executes the "park loop" code from the _code ROM_ of the DM. -This park loop implements an endless loop, in which the CPU polls the memory-mapped _status register_ that is -controlled by the _debug module (DM)_. The flags of this register are used to communicate _requests_ from -the DM and to _acknowledge_ them by the CPU: trigger execution of the program buffer or resume the halted -application. - - - -<<< -// #################################################################################################################### -:sectnums: -=== Debug Transport Module (DTM) - -The debug transport module (VHDL module: `rtl/core/neorv32_debug_dtm.vhd`) provides a JTAG test access port (TAP). -The DTM is the first entity in the debug system, which connects an external debugger via JTAG to the next debugging -entity: the debug module (DM). -External access is provided by the following top-level ports. 
- -.JTAG top level signals -[cols="^2,^2,^2,<8"] -[options="header",grid="rows"] -|======================= -| Name | Width | Direction | Description -| `jtag_trst_i` | 1 | in | TAP reset (low-active); this signal is optional, make sure to pull it _high_ if it is not used -| `jtag_tck_i` | 1 | in | serial clock -| `jtag_tdi_i` | 1 | in | serial data input -| `jtag_tdo_o` | 1 | out | serial data output -| `jtag_tms_i` | 1 | in | mode select -|======================= - -.JTAG Clock -[IMPORTANT] -The actual JTAG clock signal is **not** used as primary clock. Instead it is used to synchronize -JTAG accesses, while all internal operations trigger on the system clock. Hence, no additional clock domain is required -for integration of this module. -However, this constrains the maximum JTAG clock (`jtag_tck_i`) frequency to be less than or equal to -1/4 of the system clock (`clk_i`) frequency. - -[NOTE] -If the on-chip debugger is disabled (_ON_CHIP_DEBUGGER_EN_ = false) the JTAG serial input `jtag_tdi_i` is directly -connected to the JTAG serial output `jtag_tdo_o` to maintain the JTAG chain. - -[WARNING] -The NEORV32 JTAG TAP does not provide a _boundary scan_ function (yet?). Hence, physical device pins cannot be accessed. - -The DTM uses the "debug module interface (dmi)" to access the actual debug module (DM). -These accesses are controlled by TAP-internal registers. -Each register is selected by the JTAG instruction register (`IR`) and accessed through the JTAG data register (`DR`). - -[NOTE] -The DTM's instruction and data registers can be accessed using OpenOCD's `irscan` and `drscan` commands. -The RISC-V port of OpenOCD also provides low-level commands (`riscv dmi_read` & `riscv dmi_write`) to access the _dmi_ -debug module interface. - -JTAG access is conducted via the *instruction register* `IR`, which is 5 bits wide, and several *data registers* `DR` -with different sizes. 
-The data registers are accessed by writing the according address to the instruction register. -The following table shows the available data registers: - -.JTAG TAP registers -[cols="^2,^2,^2,<8"] -[options="header",grid="rows"] -|======================= -| Address (via `IR`) | Name | Size [bits] | Description -| `00001` | `IDCODE` | 32 | identifier, default: `0x0CAFE001` (configurable via package's `jtag_tap_idcode_*` constants) -| `10000` | `DTMCS` | 32 | debug transport module control and status register -| `10001` | `DMI` | 41 | debug module interface (_dmi_); 7-bit address, 32-bit read/write data, 2-bit operation (`00` = NOP; `10` = write; `01` = read) -| others | `BYPASS` | 1 | default JTAG bypass register -|======================= - -[INFO] -See the https://github.com/riscv/riscv-debug-spec[RISC-V debug specification] for more information regarding the data -registers and operations. -A local copy can be found in `docs/references`. - - - -<<< -// #################################################################################################################### -:sectnums: -=== Debug Module (DM) - -According to the RISC-V debug specification, the DM (VHDL module: `rtl/core/neorv32_debug_dm.vhd`) -acts as a translation interface between abstract operations issued by the debugger and the platform-specific -debugger implementation. It supports the following features (excerpt from the debug spec): - -* Gives the debugger necessary information about the implementation. -* Allows the hart to be halted and resumed and provides status of the current state. -* Provides abstract read and write access to the halted hart's GPRs. -* Provides access to a reset signal that allows debugging from the very first instruction after reset. -* Provides a mechanism to allow debugging the hart immediately out of reset. (_still experimental_) -* Provides a Program Buffer to force the hart to execute arbitrary instructions. -* Allows memory access from a hart's point of view. 
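The 41-bit `DMI` data register listed in the JTAG TAP registers table can be illustrated with a small host-side sketch. The field positions used here (address in bits 40:34, data in bits 33:2, operation in bits 1:0) follow the RISC-V debug specification 0.13 for a 7-bit address and are not spelled out in this document; the function and constant names are hypothetical and not part of the NEORV32 sources or of OpenOCD.

```python
# Hypothetical helper names; field layout assumed from the RISC-V debug spec 0.13
# (address = bits 40:34, data = bits 33:2, op = bits 1:0).
DMI_OP_NOP = 0b00
DMI_OP_READ = 0b01
DMI_OP_WRITE = 0b10


def dmi_pack(address, data, op):
    """Pack a DMI access into one 41-bit word for the JTAG data register."""
    if not 0 <= address < (1 << 7):
        raise ValueError("DMI address must fit in 7 bits")
    if not 0 <= data < (1 << 32):
        raise ValueError("DMI data must fit in 32 bits")
    if op not in (DMI_OP_NOP, DMI_OP_READ, DMI_OP_WRITE):
        raise ValueError("unsupported DMI operation")
    return (address << 34) | (data << 2) | op


def dmi_unpack(word):
    """Split a 41-bit DMI word into (address, data, op)."""
    return (word >> 34) & 0x7F, (word >> 2) & 0xFFFFFFFF, word & 0b11


# Example: write 0x00000001 (dmactive) to dmcontrol at DM address 0x10.
word = dmi_pack(0x10, 0x00000001, DMI_OP_WRITE)
```

A word built this way is what a tool would shift through `DR` after selecting address `10001` in `IR`.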
- -The NEORV32 DM follows the "Minimal RISC-V External Debug Specification" to provide full debugging -capabilities while keeping resource (area) requirements at a minimum level. -It implements the **execution based debugging scheme** for a single hart and provides the following -hardware features: - -* program buffer with 2 entries and implicit `ebreak` instruction afterwards -* no _direct_ bus access (indirect bus access via the CPU) -* abstract commands: "access register" plus auto-execution -* no _dedicated_ halt-on-reset capabilities yet (but can be emulated) - -The DM provides two "sides of access": access from the DTM via the _debug module interface (dmi)_ and access from the -CPU via the processor-internal bus. From the DTM's point of view, the DM implements a set of <<_dm_registers>> that -are used to control and monitor the actual debugging. From the CPU's point of view, the DM implements several -memory-mapped registers (within the _normal_ address space) that are used for communicating debugging control -and status (<<_dm_cpu_access>>). - - -:sectnums: -==== DM Registers - -The DM is controlled via a set of registers that are accessed via the DTM's _dmi_. -The "Minimal RISC-V Debug Specification" requires only a subset of the registers specified in the spec. -The following registers are implemented. -Write accesses to any other registers are ignored and read accesses will always return zero. -Register names that are encapsulated in "( )" are not actually implemented; however, they are listed to explicitly show -their functionality. 
- -.Available DM registers -[cols="^2,^3,<7"] -[options="header",grid="rows"] -|======================= -| Address | Name | Description -| `0x04` | `data0` | Abstract data 0, used for data transfer between debugger and processor -| `0x10` | `dmcontrol` | Debug module control -| `0x11` | `dmstatus` | Debug module status -| `0x12` | `hartinfo` | Hart information -| `0x16` | `abstracts` | Abstract control and status -| `0x17` | `command` | Abstract command -| `0x18` | `abstractauto` | Abstract command auto-execution -| `0x1d` | (`nextdm`) | Base address of _next_ DM; read as zero to indicate there is only _one_ DM -| `0x20` | `progbuf0` | Program buffer 0 -| `0x21` | `progbuf1` | Program buffer 1 -| `0x38` | (`sbcs`) | System bus access control and status; read as zero to indicate there is no _direct_ system bus access -| `0x40` | `haltsum0` | Halt summary 0 -|======================= - - -:sectnums!: -===== **`data`** - -[cols="4,27,>7"] -[frame="topbot",grid="none"] -|====== -| 0x04 | **Abstract data 0** | `data0` -3+| Reset value: _UNDEFINED_ -3+| Basic read/write registers to be used with abstract command (for example to read/write data from/to CPU GPRs). -|====== - - -:sectnums!: -===== **`dmcontrol`** - -[cols="4,27,>7"] -[frame="topbot",grid="none"] -|====== -| 0x10 | **Debug module control register** | `dmcontrol` -3+| Reset value: 0x00000000 -3+| Control of the overall debug module and the hart. The following table shows all implemented bits. All remaining bits/bit-fields are configured as "zero" and are -read-only. Writing '1' to these bits/fields will be ignored. 
-|====== - -.`dmcontrol` - debug module control register bits -[cols="^1,^2,^1,<8"] -[options="header",grid="rows"] -|======================= -| Bit | Name [RISC-V] | R/W | Description -| 31 | `haltreq` | -/w | set/clear hart halt request -| 30 | `resumereq` | -/w | request hart to resume -| 28 | `ackhavereset` | -/w | write `1` to clear `*havereset` flags -| 1 | `ndmreset` | r/w | put whole processor into reset when `1` -| 0 | `dmactive` | r/w | DM enable; writing `0`-`1` will reset the DM -|======================= - - -:sectnums!: -===== **`dmstatus`** - -[cols="4,27,>7"] -[frame="topbot",grid="none"] -|====== -| 0x11 | **Debug module status register** | `dmstatus` -3+| Reset value: 0x00000000 -3+| Current status of the overall debug module and the hart. The entire register is read-only. -|====== - -.`dmstatus` - debug module status register bits -[cols="^1,^2,<10"] -[options="header",grid="rows"] -|======================= -| Bit | Name [RISC-V] | Description -| 31:23 | _reserved_ | reserved; always zero -| 22 | `impebreak` | always `1`; indicates an implicit `ebreak` instruction after the last program buffer entry -| 21:20 | _reserved_ | reserved; always zero -| 19 | `allhavereset` .2+| `1` when the hart is in reset -| 18 | `anyhavereset` -| 17 | `allresumeack` .2+| `1` when the hart has acknowledged a resume request -| 16 | `anyresumeack` -| 15 | `allnonexistent` .2+| always zero to indicate the hart is always existent -| 14 | `anynonexistent` -| 13 | `allunavail` .2+| `1` when the DM is disabled to indicate the hart is unavailable -| 12 | `anyunavail` -| 11 | `allrunning` .2+| `1` when the hart is running -| 10 | `anyrunning` -| 9 | `allhalted` .2+| `1` when the hart is halted -| 8 | `anyhalted` -| 7 | `authenticated` | always `1`; there is no authentication -| 6 | `authbusy` | always `0`; there is no authentication -| 5 | `hasresethaltreq` | always `0`; halt-on-reset is not supported (directly) -| 4 | `confstrptrvalid` | always `0`; no configuration string 
available -| 3:0 | `version` | `0010` - DM is compatible to version 0.13 -|======================= - - -:sectnums!: -===== **`hartinfo`** - -[cols="4,27,>7"] -[frame="topbot",grid="none"] -|====== -| 0x12 | **Hart information** | `hartinfo` -3+| Reset value: see below -3+| This register gives information about the hart. The entire register is read-only. -|====== - -.`hartinfo` - hart information register bits -[cols="^1,^2,<8"] -[options="header",grid="rows"] -|======================= -| Bit | Name [RISC-V] | Description -| 31:24 | _reserved_ | reserved; always zero -| 23:20 | `nscratch` | `0001`, number of `dscratch*` CPU registers = 1 -| 19:17 | _reserved_ | reserved; always zero -| 16 | `dataccess` | `0`, the `data` registers are shadowed in the hart's address space -| 15:12 | `datasize` | `0001`, number of 32-bit words in the address space dedicated to shadowing the `data` registers = 1 -| 11:0 | `dataaddr` | = `dm_data_base_c(11:0)`, signed base address of `data` words (see address map in <<_dm_cpu_access>>) -|======================= - - -:sectnums!: -===== **`abstracts`** - -[cols="4,27,>7"] -[frame="topbot",grid="none"] -|====== -| 0x16 | **Abstract control and status** | `abstracts` -3+| Reset value: see below -3+| Command execution info and status. 
-|====== - -.`abstracts` - abstract control and status register bits -[cols="^1,^2,^1,<8"] -[options="header",grid="rows"] -|======================= -| Bit | Name [RISC-V] | R/W | Description -| 31:29 | _reserved_ | r/- | reserved; always zero -| 28:24 | `progbufsize` | r/- | `0010`; size of the program buffer (`progbuf`) = 2 entries -| 23:13 | _reserved_ | r/- | reserved; always zero -| 12 | `busy` | r/- | `1` when a command is being executed -| 11 | _reserved_ | r/- | reserved; always zero -| 10:8 | `cmderr` | r/w | error during command execution (see below); has to be cleared by writing `111` -| 7:4 | _reserved_ | r/- | reserved; always zero -| 3:0 | `datacount` | r/- | `0001`; number of implemented `data` registers for abstract commands = 1 -|======================= - -Error codes in `cmderr` (highest priority first): - -* `000` - no error -* `100` - command cannot be executed since hart is not in expected state -* `011` - exception during command execution -* `010` - unsupported command -* `001` - invalid DM register read/write while command is/was executing - - -:sectnums!: -===== **`command`** - -[cols="4,27,>7"] -[frame="topbot",grid="none"] -|====== -| 0x17 | **Abstract command** | `command` -3+| Reset value: 0x00000000 -3+| Writing this register will trigger the execution of an abstract command. A new command can only be executed if -`cmderr` is zero. The entire register is write-only (reads will return zero). -|====== - -[NOTE] -The NEORV32 DM only supports **Access Register** abstract commands. These commands can only access the -hart's GPRs (abstract command register index `0x1000` - `0x101f`). 
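As a rough illustration of the "access register" restriction in the note above, the following sketch assembles a 32-bit `command` word for a GPR access. The bit positions (`cmdtype` 31:24, `aarsize` 22:20, `postexec` 18, `transfer` 17, `write` 16, `regno` 15:0) are taken from the RISC-V debug specification 0.13; the function name and its parameters are hypothetical and only serve this example.

```python
# Hypothetical helper; bit positions assumed from the RISC-V debug spec 0.13.
GPR_REGNO_BASE = 0x1000  # abstract register index of x0
AARSIZE_32BIT = 0b010    # the only access size accepted by the NEORV32 DM


def access_register_command(gpr_index, write, transfer=True, postexec=False):
    """Build an 'access register' abstract command for GPR x<gpr_index>."""
    if not 0 <= gpr_index <= 31:
        raise ValueError("only GPRs x0..x31 (regno 0x1000..0x101f) are supported")
    regno = GPR_REGNO_BASE + gpr_index
    return (
        (0x00 << 24)               # cmdtype = 0: access register
        | (AARSIZE_32BIT << 20)    # aarsize = 32-bit
        | (int(postexec) << 18)    # run the program buffer afterwards
        | (int(transfer) << 17)    # actually perform the read/write
        | (int(write) << 16)       # 1: data0 -> GPR, 0: GPR -> data0
        | regno
    )


read_x8 = access_register_command(8, write=False)   # copy x8 to data0
write_x1 = access_register_command(1, write=True)   # copy data0 to x1
```

Writing such a word to DM register `0x17` starts the command; the result of a read is then expected in `data0`.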
- -.`command` - abstract command register - "access register" commands only -[cols="^1,^2,<8"] -[options="header",grid="rows"] -|======================= -| Bit | Name [RISC-V] | Description / required value -| 31:24 | `cmdtype` | `00000000` to indicate "access register" command -| 23 | _reserved_ | reserved, has to be `0` when writing -| 22:20 | `aarsize` | `010` to indicate 32-bit accesses -| 19 | `aarpostincrement` | `0`, postincrement is not supported -| 18 | `postexec` | if set the program buffer is executed _after_ the command -| 17 | `transfer` | if set the operation in `write` is conducted -| 16 | `write` | `1`: copy `data0` to `[regno]`; `0`: copy `[regno]` to `data0` -| 15:0 | `regno` | GPR-access only; has to be `0x1000` - `0x101f` -|======================= - - -:sectnums!: -===== **`abstractauto`** - -[cols="4,27,>7"] -[frame="topbot",grid="none"] -|====== -| 0x18 | **Abstract command auto-execution** | `abstractauto` -3+| Reset value: 0x00000000 -3+| Register to configure when a read/write access to a DM register repeats execution of the last abstract command. -|====== - -.`abstractauto` - Abstract command auto-execution register bits -[cols="^1,^2,^1,<8"] -[options="header",grid="rows"] -|======================= -| Bit | Name [RISC-V] | R/W | Description -| 17 | `autoexecprogbuf[1]` | r/w | when set reading/writing from/to `progbuf1` will execute `command` again -| 16 | `autoexecprogbuf[0]` | r/w | when set reading/writing from/to `progbuf0` will execute `command` again -| 0 | `autoexecdata[0]` | r/w | when set reading/writing from/to `data0` will execute `command` again -|======================= - - -:sectnums!: -===== **`progbuf`** - -[cols="4,27,>7"] -[frame="topbot",grid="none"] -|====== -| 0x20 | **Program buffer 0** | `progbuf0` -| 0x21 | **Program buffer 1** | `progbuf1` -3+| Reset value: `NOP`-instruction -3+| General purpose program buffer for the DM. 
-|====== - - -:sectnums!: -===== **`haltsum0`** - -[cols="4,27,>7"] -[frame="topbot",grid="none"] -|====== -| 0x40 | **Halt summary 0** | `haltsum0` -3+| Reset value: _UNDEFINED_ -3+| Bit 0 of this register is set if the hart is halted (all remaining bits are always zero). The entire register is read-only. -|====== - -:sectnums: -==== DM CPU Access - -From the CPU's point of view, the DM behaves as a memory-mapped peripheral that includes - -* a small ROM that contains the code for the "park loop", which is executed when the CPU is _in_ debug mode. -* a program buffer populated by the debugger host to execute small programs -* a data buffer to transfer data between the processor and the debugger host -* a status register to communicate debugging requests - -.Park Loop Code Sources -[NOTE] -The assembly sources of the **park loop code** are available in `sw/ocd-firmware/park_loop.S`. Please note, that these -sources are not intended to be changed by the user. Hence, the makefile does not provide an automatic option -to compile and "install" the debugger ROM code into the HDL sources and requires a manual copy -(see `sw/ocd-firmware/README.md`). - -The DM uses a total address space of 128 words of the CPU's address space (= 512 bytes) divided into four sections -of 32 words (= 128 bytes) each. -Please note, that the program buffer, the data buffer and the status register only use a few effective words in this -address space. However, these effective addresses are mirrored to fill up the whole 128 bytes of the section. -Hence, any CPU access within this address space will succeed. 
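The sectioning and mirroring described above can be sketched as a small address decoder. The base address `0xfffff800` and the "actual size" of each section are taken from the DM CPU access address map; the modulo-based mirroring is an assumption about how the effective words repeat within a section, and all names in the sketch are hypothetical.

```python
# Hypothetical decoder; base address and sizes from the DM address map,
# modulo mirroring is an assumed model of "mirrored to fill up the section".
DM_BASE = 0xFFFFF800
SECTION_BYTES = 128
SECTIONS = (
    ("code ROM", 128),        # park loop code
    ("program buffer", 16),
    ("data buffer", 4),
    ("status register", 4),
)


def dm_decode(addr):
    """Map a CPU address inside the DM range to (section name, effective byte offset)."""
    offset = addr - DM_BASE
    if not 0 <= offset < SECTION_BYTES * len(SECTIONS):
        raise ValueError("address is outside the DM address range")
    name, actual_size = SECTIONS[offset // SECTION_BYTES]
    # Mirrored copies of the implemented words repeat throughout the section.
    return name, (offset % SECTION_BYTES) % actual_size
```

For example, `dm_decode(0xfffff904)` would resolve to the same data buffer word as `dm_decode(0xfffff900)` under this model.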
- -.DM CPU access - address map (divided into four sections) -[cols="^2,^4,^2,<7"] -[options="header",grid="rows"] -|======================= -| Base address | Name [VHDL package] | Actual size | Description -| `0xfffff800` | `dm_code_base_c` (= `dm_base_c`) | 128 bytes | Code ROM for the "park loop" code -| `0xfffff880` | `dm_pbuf_base_c` | 16 bytes | Program buffer, provided by DM -| `0xfffff900` | `dm_data_base_c` | 4 bytes | Data buffer (`dm.data0`) -| `0xfffff980` | `dm_sreg_base_c` | 4 bytes | Control and status register -|======================= - -[NOTE] -From the CPU's point of view, the DM is mapped to an _"unused"_ address range within the processor's -<<_address_space>> right between the bootloader ROM (BOOTROM) and the actual processor-internal IO -space at addresses `0xfffff800` - `0xfffff9ff`. - -When the CPU enters or re-enters (for example via `ebreak` in the DM's program buffer) debug mode, it jumps to -the beginning of the DM's "park loop" code ROM at `dm_code_base_c`. This is the _normal entry point_ for the -park loop code. If an exception is encountered during debug mode, the CPU jumps to `dm_code_base_c + 4`, -which is the _exception entry point_. - -**Status Register** - -The status register provides a direct communication channel between the CPU executing the park loop and the -host-controlled controller of the DM. Note that all bits that can be written by the CPU (acknowledge flags) -cause a single-shot (1-cycle) signal to the DM controller and auto-clear (always read as zero). -The bits that are driven by the DM controller are read-only to the CPU and keep their state until the CPU -acknowledges the according request. 
- -.DM CPU access - status register -[cols="^2,^2,^2,<8"] -[options="header",grid="rows"] -|======================= -| Bit | Name | CPU access | Description -| 0 | `halt_ack` | -/w | Set by the CPU to indicate that the CPU is halted and keeps iterating in the park loop -| 1 | `resume_req` | r/- | Set by the DM to tell the CPU to resume normal operation (leave parking loop and leave debug mode via `dret` instruction) -| 2 | `resume_ack` | -/w | Set by the CPU to acknowledge that the CPU is now going to leave parking loop & debug mode -| 3 | `execute_req` | r/- | Set by the DM to tell the CPU to leave debug mode and execute the instructions from the program buffer; CPU will re-enter parking loop afterwards -| 4 | `execute_ack` | -/w | Set by the CPU to acknowledge that the CPU is now going to execute the program buffer -| 5 | `exception_ack` | -/w | Set by the CPU to inform the DM that an exception occurred during execution of the park loop or during execution of the program buffer -|======================= - - - -<<< -// #################################################################################################################### -:sectnums: -=== CPU Debug Mode - -The NEORV32 CPU Debug Mode (part of `rtl/core/neorv32_cpu_control.vhd`) is compatible to the "Minimal RISC-V Debug Specification 0.13.2". -It is enabled/implemented by setting the CPU generic _CPU_EXTENSION_RISCV_DEBUG_ to "true" (done by setting processor -generic _ON_CHIP_DEBUGGER_EN_). -It provides a new operation mode called "debug mode". -When enabled, three additional CSRs are available (section <<_cpu_debug_mode_csrs>>) and also the "return from debug mode" -instruction `dret` is available when the CPU is "in" debug mode. - -[IMPORTANT] -The CPU _debug mode_ requires the `Zicsr` CPU extension to be implemented (top generic _CPU_EXTENSION_RISCV_Zicsr_ = true). - -The CPU debug mode is entered when one of the following events appear: - -[start=1] -. 
executing `ebreak` instruction (when `dcsr.ebreakm` is set and in machine mode OR when `dcsr.ebreaku` is set and in user mode) -. debug halt request from external DM (via CPU signal `db_halt_req_i`, high-active, triggering on rising-edge) -. finished executing of a single instruction while in single-step debugging mode (enabled via `dcsr.step`) - -From a hardware point of view, these "entry conditions" are special synchronous (`ebreak` instruction) or asynchronous -(single-stepping "interrupt"; halt request "interrupt") traps, that are handled invisibly by the control logic. - -Whenever the CPU **enters debug mode** it performs the following operations: - -* move `pc` to `dpc` -* copy the hart's current privilege level to `dcsr.prv` -* set `dcsr.cause` according to the cause why debug mode is entered -* **no update** of `mepc`, `mcause`, `mtval` and `mstatus` CSRs -* load the address configured via the CPU _CPU_DEBUG_ADDR_ generic to the `pc` to jump to "debugger park loop" code in the debug module (DM) - -When the CPU **is in debug mode** the following things are important: - -* while in debug mode, the CPU executes the parking loop and the program buffer provided by the DM if requested -* effective CPU privilege level is `machine` mode, PMP is not active -* if an exception occurs - * if the exception was caused by any debug-mode entry action the CPU jumps to the _normal entry point_ - ( = _CPU_DEBUG_ADDR_) of the park loop again (for example when executing `ebreak` in debug mode) - * for all other exception sources the CPU jumps to the _exception entry point_ ( = _CPU_DEBUG_ADDR_ + 4) - to signal an exception to the DM and restarts the park loop again afterwards -* interrupts are masked - including NMIs; interrupts can be enabled _during the execution of single-stepped instructions_ - when `dcsr.stepie` is set -* if the DM makes a resume request, the park loop exits and the CPU leaves debug mode (executing `dret`) - -Debug mode is left either by executing the 
`dret` instruction footnote:[`dret` should only be executed _inside_ the debugger -"park loop" code (-> code ROM in the debug module (DM).)] (_in_ debug mode) or by performing -a hardware reset of the CPU. Executing `dret` outside of debug mode will raise an illegal instruction exception. -Whenever the CPU **leaves debug mode** the following things happen: - -* set the hart's current privilege level according to `dcsr.prv` -* restore `pc` from `dpc` -* resume normal operation at `pc` - - -:sectnums: -==== CPU Debug Mode CSRs - -Two additional CSRs are required by the _Minimal RISC-V Debug Specification_: The debug mode control and status register -`dcsr` and the program counter `dpc`. Providing a general purpose scratch register for debug mode (`dscratch0`) allows -faster execution of programs provided by the debugger, since _one_ general purpose register can be backed up and -directly used. - -[NOTE] -The debug-mode control and status registers (CSRs) are only accessible when the CPU is _in_ debug mode. -If these CSRs are accessed outside of debug mode (for example when in `machine` mode) an illegal instruction exception -is raised. - - -:sectnums!: -===== **`dcsr`** - -[cols="4,27,>7"] -[frame="topbot",grid="none"] -|====== -| 0x7b0 | **Debug control and status register** | `dcsr` -3+| Reset value: 0x00000000 -3+| The `dcsr` CSR is compatible to the RISC-V debug spec. It is used to configure debug mode and provides additional status information. -The following bits are implemented. The remaining bits are read-only and always read as zero. 
-|====== - -.Debug control and status register bits -[cols="^1,^2,^1,<8"] -[options="header",grid="rows"] -|======================= -| Bit | Name [RISC-V] | R/W | Event -| 31:28 | `xdebugver` | r/- | always `0100` - indicates external debug support exists -| 27:16 | - | r/- | _reserved_, read as zero -| 15 | `ebreakm` | r/w | `ebreak` instructions in `machine` mode _enter_ debug mode when set -| 14 | [line-through]#`ebreakh`# | r/- | `0` - hypervisor mode not available -| 13 | [line-through]#`ebreaks`# | r/- | `0` - supervisor mode not available -| 12 | `ebreaku` | r/w | `ebreak` instructions in `user` mode _enter_ debug mode when set -| 11 | `stepie` | r/w | enable interrupts when in single-stepping mode -| 10 | [line-through]#`stopcount`# | r/- | `0` - counters increment as usual -| 9 | [line-through]#`stoptime`# | r/- | `0` - timers increment as usual -| 8:6 | `cause` | r/- | cause identifier - why was debug mode entered -| 5 | - | r/- | _reserved_, read as zero -| 4 | `mprven` | r/- | `0` - `mstatus.mprv` is ignored when in debug mode -| 3 | `nmip` | r/- | set when the non-maskable CPU/processor interrupt is pending -| 2 | `step` | r/w | enable single-stepping when set -| 1:0 | `prv` | r/w | CPU privilege level before/after debug mode -|======================= - - -:sectnums!: -===== **`dpc`** - -[cols="4,27,>7"] -[frame="topbot",grid="none"] -|====== -| 0x7b1 | **Debug program counter** | `dpc` -3+| Reset value: _UNDEFINED_ -3+| The `dpc` CSR is compatible to the RISC-V debug spec. It is used to store the current program counter when -debug mode is entered. The `dret` instruction will return to `dpc` by moving `dpc` to `pc`. -|====== - - -:sectnums!: -===== **`dscratch0`** - -[cols="4,27,>7"] -[frame="topbot",grid="none"] -|====== -| 0x7b2 | **Debug scratch register 0** | `dscratch0` -3+| Reset value: _UNDEFINED_ -3+| The `dscratch0` CSR is compatible to the RISC-V debug spec. It provides a general purpose debug mode-only scratch register. 
-|====== - - Index: src_adoc/soc_nco.adoc =================================================================== --- src_adoc/soc_nco.adoc (revision 59) +++ src_adoc/soc_nco.adoc (nonexistent) @@ -1,129 +0,0 @@ -<<< -:sectnums: -==== Numerically-Controlled Oscillator (NCO) - -[cols="<3,<3,<4"] -[frame="topbot",grid="none"] -|======================= -| Hardware source file(s): | neorv32_nco.vhd | -| Software driver file(s): | neorv32_nco.c | -| | neorv32_nco.h | -| Top entity port: | `nco_o` | NCO output (3x 1-bit channels) -| Configuration generics: | _IO_NCO_EN_ | implement NCO when _true_ -| CPU interrupts: | none | -|======================= - -**Theory of Operation** - -The numerically-controlled oscillator (NCO) provides a precise arbitrary linear frequency generator with -three independent channels. Based on a **direct digital synthesis** core, the NCO features a 20-bit wide -accumulator that is incremented with a programmable "tuning word". Whenever the accumulator overflows, a -flip flop is toggled that provides the actual frequency output. The accumulator increment is driven by one of -eight configurable clock sources, which are derived from the processor's main clock. - -The NCO features four accessible registers: the control register _NCO_CT_ and three _NCO_TUNE_CHi_ registers for -the tuning word of each channel i. The NCO is globally enabled by setting the _NCO_CT_EN_ bit in the control -register. If this bit is cleared, the accumulators of all channels are reset. The clock source for each channel i is -selected via the three bits _NCO_CT_CHi_PRSCx_ prescaler. The resulting clock is generated from the main -processor clock (f~main~) divided by the selected prescaler. 
- -.NCO prescaler configuration -[cols="<4,^1,^1,^1,^1,^1,^1,^1,^1"] -[options="header",grid="rows"] -|======================= -| **`NCO_CT_CHi_PRSCx`** | `0b000` | `0b001` | `0b010` | `0b011` | `0b100` | `0b101` | `0b110` | `0b111` -| Resulting `clock_prescaler` | 2 | 4 | 8 | 64 | 128 | 1024 | 2048 | 4096 -|======================= - -The resulting output frequency of each channel i is defined by the following equation: - -_**f~NCO~(i)**_ = ( _f~main~[Hz]_ / `clock_prescaler`(i) ) * (`tuning_word`(i) / (2*2^20+1^)) - -The maximum NCO frequency f~NCOmax~ is configured when using the minimal clock prescaler and a maximum all-one -tuning word: - -_**f~NCOmax~**_ = ( _f~main~[Hz]_ / 2 ) * ((2^20^ - 1) / (2*2^20+1^)) - -The minimum "frequency" is always 0 Hz when the tuning word is zero. The frequency resolution f~NCOres~ is -defined using the maximum clock prescaler and a minimal non-zero tuning word (= 1): - -_**f~NCOres~**_ = ( _f~main~[Hz]_ / 4096 ) * (1 / (2*2^20+1^)) - -Assuming a processor frequency of f~main~ = 100 MHz the maximum NCO output frequency is f~NCOmax~ = 12.499 -MHz with an NCO frequency resolution of f~NCOres~ = 0.00582 Hz. - -**Advanced Configuration** - -The idle polarity of each channel is configured via the _NCO_CT_CHi_IDLE_POL_ flag and can be either `0` -(idle low) or `1` (idle high), which effectively allows inverting the NCO output. If the NCO is globally disabled -by clearing the _NCO_CT_EN_ flag, `nco_o(i)` output bit i is set to the according _NCO_CT_CHi_IDLE_POL_. - -The current state of each NCO channel output can be read by software via the NCO_CT_CHi_OUTPUT bit. -The NCO frequency output is normally available via the top nco_o output signal. The according channel -output can be permanently set to zero by clearing the according NCO_CT_CHi_OE bit. - -Each NCO channel can operate either in standard mode or in pulse mode. The mode is configured via the -according channel's NCO_CT_CHi_MODE control register bit. 
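The frequency equations above can be checked numerically with a short host-side sketch. It assumes the divisor `2*2^(20+1)` = 2^22 and a 20-bit tuning word, as read from the equations and the quoted 100 MHz figures; the function and constant names are hypothetical and not part of the NEORV32 software driver.

```python
# Hypothetical helper for evaluating the NCO output frequency equation.
NCO_PRESCALERS = (2, 4, 8, 64, 128, 1024, 2048, 4096)  # indexed by NCO_CT_CHi_PRSCx
TUNING_WORD_BITS = 20
NCO_DIVISOR = 2 * 2 ** (TUNING_WORD_BITS + 1)  # = 2^22, per the equation above


def nco_frequency(f_main_hz, prsc_sel, tuning_word):
    """Return the NCO channel output frequency in Hz."""
    if not 0 <= prsc_sel < len(NCO_PRESCALERS):
        raise ValueError("prescaler select must be 0..7")
    if not 0 <= tuning_word < (1 << TUNING_WORD_BITS):
        raise ValueError("tuning word must fit in 20 bits")
    return (f_main_hz / NCO_PRESCALERS[prsc_sel]) * (tuning_word / NCO_DIVISOR)


# 100 MHz main clock: maximum frequency and frequency resolution.
f_max = nco_frequency(100e6, 0, (1 << TUNING_WORD_BITS) - 1)
f_res = nco_frequency(100e6, 7, 1)
```

With these assumptions `f_max` comes out just below 12.5 MHz and `f_res` at roughly 0.0058 Hz, in line with the figures quoted in the text.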
- -**_Standard_ Operation Mode** - -If this _NCO_CT_CHi_MODE_ bit of channel i is cleared, the channel operates in standard mode providing a -frequency with **exactly 50% duty cycle** (T~high~ = T~low~). - -**_Pulse_ Operation Mode** - -If the _NCO_CT_CHi_MODE_ bit of channel i is set, the channel operates in pulse mode. In this mode, the duty -cycle can be modified to generate active pulses with variable length. Note that the "active" pulse polarity is defined -by the inverted _NCO_CT_CHi_IDLE_POL_ bit. - -Eight different pulse lengths are available. The active pulse length is defined as number of NCO clock -cycles, where the NCO clock is defined via the clock prescaler bits _NCO_CT_CHi_PRSCx_. The pulse length -of channel i is programmed by the 3-bit _NCO_CT_CHi_PULSEx_ configuration: - -.NCO pulse length configuration -[cols="<4,^1,^1,^1,^1,^1,^1,^1,^1"] -[options="header",grid="rows"] -|======================= -| **`NCO_CT_CHi_PULSEx`** | `0b000` | `0b001` | `0b010` | `0b011` | `0b100` | `0b101` | `0b110` | `0b111` -| Pulse length (in NCO clock cycles) | 2 | 4 | 8 | 16 | 32 | 64 | 128 | 256 -|======================= - -If _NCO_CT_CHi_IDLE_POL_ is cleared, T~high~ is defined by the _NCO_CT_CHi_PULSEx_ configuration and T~low~ = -T – T~high~. If _NCO_CT_CHi_IDLE_POL_ is set, T~low~ is defined by the _NCO_CT_CHi_PULSEx_ configuration and -T~high~ = T – T~low~. - -The actual output frequency of the channel (defined via the clock prescaler and the tuning word) is not -affected by the pulse configuration. - -For simple PWM applications, that do not require a precise frequency but a more flexible duty cycle -configuration, see section <<_pulse_width_modulation_controller_pwm>>. 
- -<<< -.NCO register map -[cols="<4,<3,<9,^2,<11"] -[options="header",grid="all"] -|======================= -| Address | Name [C] | Bit(s), Name [C] | R/W | Function -.22+<| `0xffffffc0` .22+<| _NCO_CT_ ^|`0` _NCO_CT_EN_ ^| r/w <| NCO enable - 3+^| Channel 0 `nco_o(0)` - ^|`1` _NCO_CT_CH0_MODE_ ^| r/w <| output mode (`0`=fixed 50% duty cycle; `1`=pulse mode) - ^|`2` _NCO_CT_CH0_IDLE_POL_ ^| r/w <| output idle polarity - ^|`3` _NCO_CT_CH0_OE_ ^| r/w <| enable output to `nco_o(0)` - ^|`4` _NCO_CT_CH0_OUTPUT_ ^| r/- <| current state of `nco_o(0)` - ^|`7:5` _NCO_CT_CH0_PRSC2_ : _NCO_CT_CH0_PRSC0_ ^| r/w <| 3-bit clock prescaler select - ^|`10:8` _NCO_CT_CH0_PULSE2_ : _NCO_CT_CH0_PULSE0_ ^| r/w <| 3-bit pulse length select - 3+^| Channel 1 `nco_o(1)` - ^|`11` _NCO_CT_CH1_MODE_ ^| r/w <| output mode (`0`=fixed 50% duty cycle; `1`=pulse mode) - ^|`12` _NCO_CT_CH1_IDLE_POL_ ^| r/w <| output idle polarity - ^|`13` _NCO_CT_CH1_OE_ ^| r/w <| enable output to `nco_o(1)` - ^|`14` _NCO_CT_CH1_OUTPUT_ ^| r/- <| current state of `nco_o(1)` - ^|`17:15` _NCO_CT_CH1_PRSC2_ : _NCO_CT_CH1_PRSC0_ ^| r/w <| 3-bit clock prescaler select - ^|`20:18` _NCO_CT_CH1_PULSE2_ : _NCO_CT_CH1_PULSE0_ ^| r/w <| 3-bit pulse length select - 3+^| Channel 2 `nco_o(2)` - ^|`21` _NCO_CT_CH2_MODE_ ^| r/w <| output mode (`0`=fixed 50% duty cycle; `1`=pulse mode) - ^|`22` _NCO_CT_CH2_IDLE_POL_ ^| r/w <| output idle polarity - ^|`23` _NCO_CT_CH2_OE_ ^| r/w <| enable output to `nco_o(2)` - ^|`24` _NCO_CT_CH2_OUTPUT_ ^| r/- <| current state of `nco_o(2)` - ^|`27:25` _NCO_CT_CH2_PRSC2_ : _NCO_CT_CH2_PRSC0_ ^| r/w <| 3-bit clock prescaler select - ^|`30:28` _NCO_CT_CH2_PULSE2_ : _NCO_CT_CH2_PULSE0_ ^| r/w <| 3-bit pulse length select -|======================= Index: src_adoc/soc.adoc =================================================================== --- src_adoc/soc.adoc (revision 59) +++ src_adoc/soc.adoc (nonexistent) @@ -1,957 +0,0 @@ - -// 
#################################################################################################################### -:sectnums: -== NEORV32 Processor (SoC) - -The NEORV32 Processor is based on the NEORV32 CPU. Together with common peripheral -interfaces and embedded memories it provides a RISC-V-based full-scale microcontroller-like SoC platform. - -image::neorv32_processor.png[align=center] - -**Key Features** - -* _optional_ processor-internal data and instruction memories (<<_data_memory_dmem,**DMEM**>>/<<_instruction_memory_imem,**IMEM**>>) + cache (<<_processor_internal_instruction_cache_icache,**iCACHE**>>) -* _optional_ internal bootloader (<<_bootloader_rom_bootrom,**BOOTROM**>>) with UART console & SPI flash boot option -* _optional_ machine system timer (<<_machine_system_timer_mtime,**MTIME**>>), RISC-V-compatible -* _optional_ two independent universal asynchronous receivers and transmitters (<<_primary_universal_asynchronous_receiver_and_transmitter_uart0,**UART0**>>, <<_secondary_universal_asynchronous_receiver_and_transmitter_uart1,**UART1**>>) with optional hardware flow control (RTS/CTS) -* _optional_ 8/16/24/32-bit serial peripheral interface controller (<<_serial_peripheral_interface_controller_spi,**SPI**>>) with 8 dedicated CS lines -* _optional_ two wire serial interface controller (<<_two_wire_serial_interface_controller_twi,**TWI**>>), compatible to the I²C standard -* _optional_ general purpose parallel IO port (<<_general_purpose_input_and_output_port_gpio,**GPIO**>>), 32xOut, 32xIn -* _optional_ 32-bit external bus interface, Wishbone b4 / AXI4-Lite compatible (<<_processor_external_memory_interface_wishbone_axi4_lite,**WISHBONE**>>) -* _optional_ watchdog timer (<<_watchdog_timer_wdt,**WDT**>>) -* _optional_ PWM controller with 4 channels and 8-bit duty cycle resolution (<<_pulse_width_modulation_controller_pwm,**PWM**>>) -* _optional_ ring-oscillator-based true random number generator (<<_true_random_number_generator_trng,**TRNG**>>) 
-* _optional_ custom functions subsystem for custom co-processor extensions (<<_custom_functions_subsystem_cfs,**CFS**>>) -* _optional_ numerically-controlled oscillator (<<_numerically_controlled_oscillator_nco,**NCO**>>) with 3 independent channels -* _optional_ NeoPixel(TM)/WS2812-compatible smart LED interface (<<_smart_led_interface_neoled,**NEOLED**>>) -* _optional_ on-chip debugger with JTAG TAP (<<_on_chip_debugger_ocd,**OCD**>>) -* system configuration information memory to check HW configuration via software (<<_system_configuration_information_memory_sysinfo,**SYSINFO**>>) - - -<<< -// #################################################################################################################### -:sectnums: -=== Processor Top Entity - Signals - -The following table shows all interface ports of the processor top entity (`rtl/core/neorv32_top.vhd`). -The type of all signals is _std_ulogic_ or _std_ulogic_vector_, respectively. - -[TIP] -A wrapper for the NEORV32 Processor setup providing resolved port signals can be found in -`rtl/top_templates/neorv32_top_stdlogic.vhd`. - -[cols="<3,^2,^2,<11"] -[options="header",grid="rows"] -|======================= -| Signal | Width | Dir. | Function -4+^| **Global Control** -| `clk_i` | 1 | in | global clock line, all registers triggering on rising edge -| `rstn_i` | 1 | in | global reset, asynchronous, **low-active** -4+^| **JTAG Access Port for <<_on_chip_debugger_ocd>>** -| `jtag_trst_i` | 1 | in | TAP reset, low-active (optionalfootnote:[Pull high if not used.]) -| `jtag_tck_i ` | 1 | in | serial clock -| `jtag_tdi_i ` | 1 | in | serial data input -| `jtag_tdo_o ` | 1 | out | serial data outputfootnote:[If the on-chip debugger is not implemented (_ON_CHIP_DEBUGGER_EN_ = false) `jtag_tdi_i` is directly forwarded to `jtag_tdo_o` to maintain the JTAG chain.] 
-| `jtag_tms_i ` | 1 | in | mode select -4+^| **External Bus Interface (<<_processor_external_memory_interface_wishbone_axi4_lite,WISHBONE>>)** -| `wb_tag_o` | 3 | out | tag (access type identifier) -| `wb_adr_o` | 32 | out | destination address -| `wb_dat_i` | 32 | in | read data -| `wb_dat_o` | 32 | out | write data -| `wb_we_o` | 1 | out | write enable ('0' = read transfer) -| `wb_sel_o` | 4 | out | byte enable -| `wb_stb_o` | 1 | out | strobe -| `wb_cyc_o` | 1 | out | valid cycle -| `wb_lock_o`| 1 | out | exclusive access request -| `wb_ack_i` | 1 | in | transfer acknowledge -| `wb_err_i` | 1 | in | transfer error -4+^| **Advanced Memory Control Signals** -| `fence_o` | 1 | out | indicates an executed _fence_ instruction -| `fencei_o` | 1 | out | indicates an executed _fence.i_ instruction -4+^| **General Purpose Inputs & Outputs (<<_general_purpose_input_and_output_port_gpio,GPIO>>)** -| `gpio_o` | 32 | out | general purpose parallel output -| `gpio_i` | 32 | in | general purpose parallel input -4+^| **Primary Universal Asynchronous Receiver/Transmitter (<<_primary_universal_asynchronous_receiver_and_transmitter_uart0,UART0>>)** -| `uart0_txd_o` | 1 | out | UART0 serial transmitter -| `uart0_rxd_i` | 1 | in | UART0 serial receiver -| `uart0_rts_o` | 1 | out | UART0 RX ready to receive new char -| `uart0_cts_i` | 1 | in | UART0 TX allowed to start sending -4+^| **Secondary Universal Asynchronous Receiver/Transmitter (<<_secondary_universal_asynchronous_receiver_and_transmitter_uart1,UART1>>)** -| `uart1_txd_o` | 1 | out | UART1 serial transmitter -| `uart1_rxd_i` | 1 | in | UART1 serial receiver -| `uart1_rts_o` | 1 | out | UART1 RX ready to receive new char -| `uart1_cts_i` | 1 | in | UART1 TX allowed to start sending -4+^| **Serial Peripheral Interface Controller (<<_serial_peripheral_interface_controller_spi,SPI>>)** -| `spi_sck_o` | 1 | out | SPI controller clock line -| `spi_sdo_o` | 1 | out | SPI serial data output -| `spi_sdi_i` | 1 | in | SPI serial data 
input -| `spi_csn_o` | 8 | out | SPI dedicated chip select (low-active) -4+^| **Two-Wire Interface Controller (<<_two_wire_serial_interface_controller_twi,TWI>>)** -| `twi_sda_io` | 1 | inout | TWI serial data line -| `twi_scl_io` | 1 | inout | TWI serial clock line -4+^| **Custom Functions Subsystem (<<_custom_functions_subsystem_cfs,CFS>>)** -| `cfs_in_i` | 32 | in | custom CFS input signal conduit -| `cfs_out_o` | 32 | out | custom CFS output signal conduit -4+^| **Pulse-Width Modulation Channels (<<_pulse_width_modulation_controller_pwm,PWM>>)** -| `pwm_o` | 4 | out | pulse-width modulated channels -4+^| **Numerically-Controlled Oscillator (<<_numerically_controlled_oscillator_nco,NCO>>)** -| `nco_o` | 3 | out | NCO output channels -4+^| **Smart LED Interface - NeoPixel(TM) compatible (<<_smart_led_interface_neoled,NEOLED>>)** -| `neoled_o` | 1 | out | asynchronous serial data output -4+^| **System time (<<_machine_system_timer_mtime,MTIME>>)** -| `mtime_i` | 64 | in | machine timer time (to `time[h]` CSRs) from _external MTIME_ unit if the processor-internal _MTIME_ unit is NOT implemented -| `mtime_o` | 64 | out | machine timer time from _internal MTIME_ unit if processor-internal _MTIME_ unit IS implemented -4+^| **<<_processor_interrupts>>** -| `nm_irq_i` | 1 | in | non-maskable interrupt -| `soc_firq_i` | 6 | in | platform fast interrupt channels (custom) -| `mtime_irq_i` | 1 | in | machine timer interrupt (RISC-V) -| `msw_irq_i` | 1 | in | machine software interrupt (RISC-V) -| `mext_irq_i` | 1 | in | machine external interrupt (RISC-V) -|======================= - - -<<< -// #################################################################################################################### -:sectnums: -=== Processor Top Entity - Generics - -This is a list of all configuration generics of the NEORV32 processor top entity `rtl/core/neorv32_top.vhd`. 
-The generic name is shown in orange, followed by the type printed in black and concluded by the default -value printed in light gray. - -[TIP] -The NEORV32 generics allow you to configure the system according to your needs. The generics are -used to control implementation of certain CPU extensions and peripheral modules and even allow you to -optimize the system for certain design goals like minimal area or maximum performance. - -[TIP] -Privileged software can determine the actual CPU and processor configuration via the `misa` and -`mzext` (see <<_machine_trap_setup>> and <<_neorv32_specific_custom_csrs>>) CSRs and via the memory-mapped _SYSINFO_ module (see <<_system_configuration_information_memory_sysinfo>>), -respectively. - -[TIP] -If optional modules (like CPU extensions or peripheral devices) are *not enabled* the corresponding circuitry **will not be synthesized at all**. -Hence, the disabled modules do not increase area and power requirements and do not impact the timing. - -**Generic Description** - -The description of each generic provides the following summary: - -.Generic description -[cols="4,4,2"] -[frame="all",grid="none"] -|====== -| _Generic_ | _type_ | _default value_ -3+| _Description_ -|====== - -<<< -// #################################################################################################################### -:sectnums: -==== General - -See section <<_system_configuration_information_memory_sysinfo>> for more information. - - -[cols="4,4,2"] -[frame="all",grid="none"] -|====== -| **CLOCK_FREQUENCY** | _natural_ | 0 -3+| The clock frequency of the processor's `clk_i` input port in Hertz (Hz). -|====== - - -[cols="4,4,2"] -[frame="all",grid="none"] -|====== -| **BOOTLOADER_EN** | _boolean_ | true -3+| Implement the boot ROM, pre-initialized with the bootloader image when true. This will also change the -processor's boot address from the beginning of the instruction memory address space (default = -0x00000000) to the base address of the boot ROM. 
See section <<_bootloader>> for more information. -|====== - - -[cols="4,4,2"] -[frame="all",grid="none"] -|====== -| **USER_CODE** | _std_ulogic_vector(31 downto 0)_ | x"00000000" -3+| Custom user code that can be read by software via the _SYSINFO_ module. -|====== - - -[cols="4,4,2"] -[frame="all",grid="none"] -|====== -| **HW_THREAD_ID** | _natural_ | 0 -3+| The hart ID of the CPU. Can be read via the `mhartid` CSR. Hart IDs must be unique within a system. -|====== - - -[cols="4,4,2"] -[frame="all",grid="none"] -|====== -| **ON_CHIP_DEBUGGER_EN** | _boolean_ | false -3+| Implement on-chip debugger (OCD). See chapter <<_on_chip_debugger_ocd>>. -|====== - - -// #################################################################################################################### -:sectnums: -==== RISC-V CPU Extensions - -See section <<_instruction_sets_and_extensions>> for more information. - - -[cols="4,4,2"] -[frame="all",grid="none"] -|====== -| **CPU_EXTENSION_RISCV_A** | _boolean_ | false -3+| Implement atomic memory access operations when _true_. -|====== - - -[cols="4,4,2"] -[frame="all",grid="none"] -|====== -| **CPU_EXTENSION_RISCV_B** | _boolean_ | false -3+| Implement bit manipulation instructions when _true_. -|====== - - -[cols="4,4,2"] -[frame="all",grid="none"] -|====== -| **CPU_EXTENSION_RISCV_C** | _boolean_ | false -3+| Implement compressed instructions (16-bit) when _true_. -|====== - - -[cols="4,4,2"] -[frame="all",grid="none"] -|====== -| **CPU_EXTENSION_RISCV_E** | _boolean_ | false -3+| Implement the embedded CPU extension (only implement the first 16 data registers) when _true_. -|====== - - -[cols="4,4,2"] -[frame="all",grid="none"] -|====== -| **CPU_EXTENSION_RISCV_M** | _boolean_ | false -3+| Implement integer multiplication and division instructions when _true_. -|====== - - -[cols="4,4,2"] -[frame="all",grid="none"] -|====== -| **CPU_EXTENSION_RISCV_U** | _boolean_ | false -3+| Implement less-privileged user mode when _true_. 
-|====== - - -[cols="4,4,2"] -[frame="all",grid="none"] -|====== -| **CPU_EXTENSION_RISCV_Zfinx** | _boolean_ | false -3+| Implement the 32-bit single-precision floating-point extension (using integer registers) when _true_. For -more information see section <<_zfinx_single_precision_floating_point_operations>>. -|====== - - -[cols="4,4,2"] -[frame="all",grid="none"] -|====== -| **CPU_EXTENSION_RISCV_Zicsr** | _boolean_ | true -3+| Implement the control and status register (CSR) access instructions when _true_. Note: When this option is -disabled, the complete privileged architecture / trap system will be excluded from synthesis. Hence, no interrupts, no exceptions and -no machine information will be available. -|====== - - -[cols="4,4,2"] -[frame="all",grid="none"] -|====== -| **CPU_EXTENSION_RISCV_Zifencei** | _boolean_ | false -3+| Implement the instruction fetch synchronization instruction _fence.i_. For example, this option is required -for self-modifying code (and/or for i-cache flushes). -|====== - - -// #################################################################################################################### -:sectnums: -==== Extension Options - -See section <<_instruction_sets_and_extensions>> for more information. - -[cols="4,4,2"] -[frame="all",grid="none"] -|====== -| **FAST_MUL_EN** | _boolean_ | false -3+| When this generic is enabled, the multiplier of the `M` extension is realized using DSP blocks instead of an -iterative bit-serial approach. This generic is only relevant when the multiplier and divider CPU extension is -enabled (_CPU_EXTENSION_RISCV_M_ is _true_). -|====== - - -[cols="4,4,2"] -[frame="all",grid="none"] -|====== -| **FAST_SHIFT_EN** | _boolean_ | false -3+| When this generic is enabled, the shifter unit of the CPU's ALU is implemented as a fast barrel shifter (requiring -more hardware resources). 
-|====== - - -[cols="4,4,2"] -[frame="all",grid="none"] -|====== -| **TINY_SHIFT_EN** | _boolean_ | false -3+| If this generic is enabled, the shifter unit of the CPU's ALU is implemented as a (slow but tiny) single-bit iterative shifter -(requires up to 32 clock cycles for a shift operation, but reduces the hardware footprint). The configuration of -this generic is ignored if _FAST_SHIFT_EN_ is _true_. -|====== - - -[cols="4,4,2"] -[frame="all",grid="none"] -|====== -| **CPU_CNT_WIDTH** | _natural_ | 0 -3+| This generic configures the total size of the CPU's `cycle` and `instret` CSRs (low word + high word). See -section <<_machine_counters_and_timers>> for more information. Note: Configurations with _CPU_CNT_WIDTH_ -less than 64 are not RISC-V compliant. -|====== - - -// #################################################################################################################### -:sectnums: -==== Physical Memory Protection (PMP) - -See section <<_pmp_physical_memory_protection>> for more information. - - -[cols="4,4,2"] -[frame="all",grid="none"] -|====== -| **PMP_NUM_REGIONS** | _natural_ | 0 -3+| Total number of implemented protection regions (0..64). If this generic is zero, no physical memory -protection logic will be implemented at all. Setting _PMP_NUM_REGIONS_ > 0 will set the _CSR_MZEXT_PMP_ flag -in the `mzext` CSR. -|====== - - -[cols="4,4,2"] -[frame="all",grid="none"] -|====== -| **PMP_MIN_GRANULARITY** | _natural_ | 64*1024 -3+| Minimal region granularity in bytes. Has to be a power of two. Has to be at least 8 bytes. -|====== - - -// #################################################################################################################### -:sectnums: -==== Hardware Performance Monitors (HPM) - -See section <<_hpm_hardware_performance_monitors>> for more information. - -[cols="4,4,2"] -[frame="all",grid="none"] -|====== -| **HPM_NUM_CNTS** | _natural_ | 0 -3+| Total number of implemented hardware performance monitor counters (0..29). 
If this generic is zero, no -hardware performance monitor logic will be implemented at all. Setting _HPM_NUM_CNTS_ > 0 will set the _CSR_MZEXT_HPM_ flag -in the `mzext` CSR. -|====== - - -[cols="4,4,2"] -[frame="all",grid="none"] -|====== -| **HPM_CNT_WIDTH** | _natural_ | 40 -3+| This generic defines the total LSB-aligned size of each HPM counter (size(`[m]hpmcounter*h`) + -size(`[m]hpmcounter*`)). The maximum value is 64, the minimum is 1. If the size is less than 64 bits, the -unused MSB-aligned counter bits are hardwired to zero. -|====== - - -// #################################################################################################################### -:sectnums: -==== Internal Instruction Memory - -See sections <<_address_space>> and <<_instruction_memory_imem>> for more information. - - -[cols="4,4,2"] -[frame="all",grid="none"] -|====== -| **MEM_INT_IMEM_EN** | _boolean_ | true -3+| Implement processor internal instruction memory (IMEM) when _true_. -|====== - - -[cols="4,4,2"] -[frame="all",grid="none"] -|====== -| **MEM_INT_IMEM_SIZE** | _natural_ | 16*1024 -3+| Size in bytes of the processor internal instruction memory (IMEM). Has no effect when _MEM_INT_IMEM_EN_ is _false_. -|====== - - -[cols="4,4,2"] -[frame="all",grid="none"] -|====== -| **MEM_INT_IMEM_ROM** | _boolean_ | false -3+| Implement processor-internal instruction memory as read-only memory, which will be initialized with the -application image at synthesis time. Has no effect when _MEM_INT_IMEM_EN_ is _false_. -|====== - - -// #################################################################################################################### -:sectnums: -==== Internal Data Memory - -See sections <<_address_space>> and <<_data_memory_dmem>> for more information. - -[cols="4,4,2"] -[frame="all",grid="none"] -|====== -| **MEM_INT_DMEM_EN** | _boolean_ | true -3+| Implement processor internal data memory (DMEM) when _true_. 
-|====== - - -[cols="4,4,2"] -[frame="all",grid="none"] -|====== -| **MEM_INT_DMEM_SIZE** | _natural_ | 8*1024 -3+| Size in bytes of the processor-internal data memory (DMEM). Has no effect when _MEM_INT_DMEM_EN_ is _false_. -|====== - - -// #################################################################################################################### -:sectnums: -==== Internal Cache Memory - -See section <<_processor_internal_instruction_cache_icache>> for more information. - -[cols="4,4,2"] -[frame="all",grid="none"] -|====== -| **ICACHE_EN** | _boolean_ | false -3+| Implement processor internal instruction cache when _true_. -|====== - - -[cols="4,4,2"] -[frame="all",grid="none"] -|====== -| **ICACHE_NUM_BLOCK** | _natural_ | 4 -3+| Number of blocks (cache "pages" or "lines") in the instruction cache. Has to be a power of two. Has no -effect when _ICACHE_EN_ is _false_. -|====== - - -[cols="4,4,2"] -[frame="all",grid="none"] -|====== -| **ICACHE_BLOCK_SIZE** | _natural_ | 64 -3+| Size in bytes of each block in the instruction cache. Has to be a power of two. Has no effect when -_ICACHE_EN_ is _false_. -|====== - - -[cols="4,4,2"] -[frame="all",grid="none"] -|====== -| **ICACHE_ASSOCIATIVITY** | _natural_ | 1 -3+| Associativity (= number of sets) of the instruction cache. Has to be a power of two. Allowed configurations: -`1` = 1 set, direct mapped; `2` = 2-way set-associative. Has no effect when _ICACHE_EN_ is _false_. -|====== - - -// #################################################################################################################### -:sectnums: -==== External Memory Interface - -See sections <<_address_space>> and <<_processor_external_memory_interface_wishbone_axi4_lite>> for more information. - -[cols="4,4,2"] -[frame="all",grid="none"] -|====== -| **MEM_EXT_EN** | _boolean_ | false -3+| Implement external bus interface (WISHBONE) when _true_. 
-|====== - - -[cols="4,4,2"] -[frame="all",grid="none"] -|====== -| **MEM_EXT_TIMEOUT** | _natural_ | 255 -3+| Clock cycles after which a pending external bus access will auto-terminate and raise a bus fault exception. Set to 0 to disable auto-timeout. -|====== - - -// #################################################################################################################### -:sectnums: -==== Processor Peripheral/IO Modules - -See section <<_processor_internal_modules>> for more information. - -[cols="4,4,2"] -[frame="all",grid="none"] -|====== -| **IO_GPIO_EN** | _boolean_ | true -3+| Implement general purpose input/output port unit (GPIO) when _true_. -See section <<_general_purpose_input_and_output_port_gpio>> for more information. -|====== - - -[cols="4,4,2"] -[frame="all",grid="none"] -|====== -| **IO_MTIME_EN** | _boolean_ | true -3+| Implement machine system timer (MTIME) when _true_. -See section <<_machine_system_timer_mtime>> for more information. -|====== - - -[cols="4,4,2"] -[frame="all",grid="none"] -|====== -| **IO_UART0_EN** | _boolean_ | true -3+| Implement primary universal asynchronous receiver/transmitter (UART0) when _true_. -See section <<_primary_universal_asynchronous_receiver_and_transmitter_uart0>> for -more information. -|====== - - -[cols="4,4,2"] -[frame="all",grid="none"] -|====== -| **IO_UART1_EN** | _boolean_ | true -3+| Implement secondary universal asynchronous receiver/transmitter (UART1) when _true_. -See section <<_secondary_universal_asynchronous_receiver_and_transmitter_uart1>> for more information. -|====== - - -[cols="4,4,2"] -[frame="all",grid="none"] -|====== -| **IO_SPI_EN** | _boolean_ | true -3+| Implement serial peripheral interface controller (SPI) when _true_. -See section <<_serial_peripheral_interface_controller_spi>> for more information. -|====== - - -[cols="4,4,2"] -[frame="all",grid="none"] -|====== -| **IO_TWI_EN** | _boolean_ | true -3+| Implement two-wire interface controller (TWI) when _true_. 
-See section <<_two_wire_serial_interface_controller_twi>> for -more information. -|====== - - -[cols="4,4,2"] -[frame="all",grid="none"] -|====== -| **IO_PWM_EN** | _boolean_ | true -3+| Implement pulse-width modulation controller (PWM) when _true_. -See section <<_pulse_width_modulation_controller_pwm>> for more information. -|====== - - -[cols="4,4,2"] -[frame="all",grid="none"] -|====== -| **IO_WDT_EN** | _boolean_ | true -3+| Implement watchdog timer (WDT) when _true_. See section <<_watchdog_timer_wdt>> for more -information. -|====== - - -[cols="4,4,2"] -[frame="all",grid="none"] -|====== -| **IO_TRNG_EN** | _boolean_ | false -3+| Implement true random number generator (TRNG) when _true_. See section <<_true_random_number_generator_trng>> for more information. -|====== - - -[cols="4,4,2"] -[frame="all",grid="none"] -|====== -| **IO_CFS_EN** | _boolean_ | false -3+| Implement custom functions subsystem (CFS) when _true_. See section <<_custom_functions_subsystem_cfs>> for more information. -|====== - - -[cols="4,4,2"] -[frame="all",grid="none"] -|====== -| **IO_CFS_CONFIG** | _std_ulogic_vector(31 downto 0)_ | x"00000000" -3+| This is a "conduit" generic that can be used to pass user-defined CFS implementation flags to the custom -functions subsystem entity. See section <<_custom_functions_subsystem_cfs>> for more information. -|====== - - -[cols="4,4,2"] -[frame="all",grid="none"] -|====== -| **IO_CFS_IN_SIZE** | _positive_ | 32 -3+| Defines the size of the CFS input signal conduit (`cfs_in_i`). See section <<_custom_functions_subsystem_cfs>> for more information. -|====== - - -[cols="4,4,2"] -[frame="all",grid="none"] -|====== -| **IO_CFS_OUT_SIZE** | _positive_ | 32 -3+| Defines the size of the CFS output signal conduit (`cfs_out_o`). See section <<_custom_functions_subsystem_cfs>> for more information. 
-|====== - - -[cols="4,4,2"] -[frame="all",grid="none"] -|====== -| **IO_NCO_EN** | _boolean_ | true -3+| Implement numerically-controlled oscillator (NCO) when _true_. -See section <<_numerically_controlled_oscillator_nco>> for more information. -|====== - - -[cols="4,4,2"] -[frame="all",grid="none"] -|====== -| **IO_NEOLED_EN** | _boolean_ | true -3+| Implement smart LED interface (WS2812 / NeoPixel(TM)-compatible) (NEOLED) when _true_. -See section <<_smart_led_interface_neoled>> for more information. -|====== - - -<<< -// #################################################################################################################### -:sectnums: -=== Processor Interrupts - -[TIP] -The interrupt request signals have specific `mip` CSR bits (see <<_machine_trap_handling>>), specific -`mie` CSR bits (see <<_machine_trap_setup>>) and specific `mcause` CSR trap codes and trap -priorities. For more information (also regarding the signaling protocol) see section <<_traps_exceptions_and_interrupts>>. - -**RISC-V Standard Interrupts** - -The processor setup features the standard RISC-V interrupt lines for "machine timer interrupt", "machine -software interrupt" and "machine external interrupt". The software and external interrupt lines are available -via the processor's top entity. By default, the timer interrupt is connected to the internal machine -timer unit MTIME (<<_machine_system_timer_mtime>>). If this module has not been enabled for -synthesis, the machine timer interrupt is also available via the processor's top entity. - -**NEORV32-Specific Fast Interrupt Requests** - -As part of the custom/NEORV32-specific CPU extensions, the CPU features 16 fast interrupt request signals -(`FIRQ0` – `FIRQ15`). - -The fast interrupt request signals are divided into two groups. The FIRQs with higher priority (FIRQ0 – -FIRQ9) are dedicated for processor-internal usage. 
The FIRQs with lower priority (FIRQ10 – FIRQ15) are -available for custom usage via the processor's top entity signal `soc_firq_i`. - -The mapping of the 16 FIRQ channels is shown in the following table (the channel number corresponds to the FIRQ priority): - -.NEORV32 fast interrupt channel mapping -[cols="^1,<2,<7"] -[options="header",grid="rows"] -|======================= -| Channel | Source | Description -| 0 | _WDT_ | watchdog timeout interrupt -| 1 | _CFS_ | custom functions subsystem (CFS) interrupt (user-defined) -| 2 | _UART0_ (RXD) | UART0 data received interrupt (RX complete) -| 3 | _UART0_ (TXD) | UART0 sending done interrupt (TX complete) -| 4 | _UART1_ (RXD) | UART1 data received interrupt (RX complete) -| 5 | _UART1_ (TXD) | UART1 sending done interrupt (TX complete) -| 6 | _SPI_ | SPI transmission done interrupt -| 7 | _TWI_ | TWI transmission done interrupt -| 8 | _GPIO_ | GPIO input pin-change interrupt -| 9 | _NEOLED_ | NEOLED buffer TX empty / not full interrupt -| 10:15 | `soc_firq_i(5:0)` | Custom platform use; available via processor's top signal -|======================= - -**Non-Maskable Interrupt** - -The NEORV32 features a single non-maskable interrupt source via the `nm_irq_i` top -entity signal that can be used to signal critical system conditions. This interrupt source _cannot_ be disabled. Hence, it does _not_ provide -configuration/status flags in the `mie` and `mip` CSRs. The RISC-V-compatible `mcause` value `0x80000000` is used to indicate the non-maskable interrupt. - -<<< -// #################################################################################################################### -:sectnums: -=== Address Space - -By default, the total 32-bit (4GB) address space of the NEORV32 Processor is divided into four main regions: - -1. Instruction memory (IMEM) space – for instructions and constants. -2. Data memory (DMEM) space – for application runtime data (heap, stack, etc.). -3. 
Bootloader ROM address space – for the processor-internal bootloader. -4. IO/peripheral address space – for the processor-internal IO/peripheral devices (e.g., UART). - -.NEORV32 processor - address space (default configuration) -image::address_space.png[900] - -[TIP] -These four memory regions are handled by the linker when compiling a NEORV32 executable. -See section <<_executable_image_format>> for more information. - -**Address Space Layout** - -The general address space layout consists of two main configuration constants: `ispace_base_c` defining -the base address of the instruction memory address space and `dspace_base_c` defining the base address of -the data memory address space. Both constants are defined in the NEORV32 VHDL package file -`rtl/core/neorv32_package.vhd`: - -[source,vhdl] ----- --- Architecture Configuration ---------------------------------------------------- --- ---------------------------------------------------------------------------------- -constant ispace_base_c : std_ulogic_vector(31 downto 0) := x"00000000"; -constant dspace_base_c : std_ulogic_vector(31 downto 0) := x"80000000"; ----- - -The default configuration assumes the instruction memory address space starting at address _0x00000000_ -and the data memory address space starting at _0x80000000_. Both values can be modified for a specific -setup and the address space may overlap or can be completely identical. - -The base address of the bootloader (at _0xFFFF0000_) and the IO region (at _0xFFFFFF00_) for the peripheral -devices are also defined in the package and are fixed. These address regions cannot be used for other -applications – even if the bootloader or all IO devices are not implemented. 
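If the base addresses are moved away from their defaults, the requirements in the warning that follows can be checked with a short script before synthesis. This is a minimal sketch: the helper name is hypothetical, and it assumes that "aligned to the memory size" means the base address is an integer multiple of the memory size in bytes.

```python
def base_address_ok(base: int, mem_size: int) -> bool:
    """Check a candidate ispace_base_c/dspace_base_c value.

    base     -- candidate base address (32-bit)
    mem_size -- size in bytes of the internal memory placed at that base
                (MEM_INT_IMEM_SIZE or MEM_INT_DMEM_SIZE)
    """
    if not 0 <= base <= 0xFFFFFFFF:
        raise ValueError("base must be a 32-bit address")
    if mem_size <= 0:
        raise ValueError("mem_size must be positive")
    aligned_to_word = base % 4 == 0          # requirement 1: 4-byte boundary
    aligned_to_size = base % mem_size == 0   # requirement 2 (as assumed above)
    return aligned_to_word and aligned_to_size
```

With the default generics, `base_address_ok(0x80000000, 8 * 1024)` holds for the DMEM, whereas a 16kB IMEM placed at `0x00001000` would fail the second requirement.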
- -[WARNING] -When using the processor-internal data and/or instruction memories (DMEM/IMEM) and using a non-default -configuration for the `dspace_base_c` and/or `ispace_base_c` base addresses, the -following requirements have to be fulfilled: -**1.** Both base addresses have to be aligned to a 4-byte boundary. -**2.** Both base addresses have to be aligned to the corresponding internal memory sizes. - -:sectnums: -==== CPU Data and Instruction Access - -The CPU can access all of the 4GB address space from the instruction fetch interface (**I**) and also from the -data access interface (**D**). These two CPU interfaces are multiplexed by a simple bus switch -(`rtl/core/neorv32_busswitch.vhd`) into a _single_ processor-internal bus. All processor-internal -memories, peripherals and also the external memory interface are connected to this bus. Hence, both CPU -interfaces (instruction fetch & data access) have access to the same (**identical**) address space making the -setup a modified von-Neumann architecture. - -.Processor-internal bus architecture -image::neorv32_bus.png[1300] - -[NOTE] -The internal processor bus might appear as a bottleneck. In order to reduce congestion on this bus -(when instruction fetch and data interface access the bus at the same time) the instruction fetch of -the CPU is equipped with a prefetch buffer. Instruction fetches can be further buffered using the i-cache. -Furthermore, data accesses (loads and stores) have higher priority than instruction fetch -accesses. - -[IMPORTANT] -Please note that all processor-internal components including the peripheral/IO devices can also be -accessed from programs running in less-privileged user mode. For example, if the system relies on -a periodic interrupt from the _MTIME_ timer unit, user-level programs could alter the _MTIME_ -configuration corrupting this interrupt. This kind of security issue can be mitigated using the -PMP system (see <<_machine_physical_memory_protection>>). 
- -:sectnums: -==== Physical Memory Attributes - -The processor setup defines the following simple attributes for the four processor-internal address space regions: - -* `r` – read access (from CPU data access interface, e.g. via "load") -* `w` – write access (from CPU data access interface, e.g. via "store") -* `x` – execute access (from CPU instruction fetch interface) -* `a` – atomic access (from CPU data access interface) -* `8` – byte (8-bit)-accessible (when writing) -* `16` – half-word (16-bit)-accessible (when writing) -* `32` – word (32-bit)-accessible (when writing) - -The following table shows the provided physical memory attributes of each region. Additional attributes (like -denying execute rights for certain regions of the IMEM) can be provided using the RISC-V <<_machine_physical_memory_protection>> extension. - -[cols="^1,^2,^2,^3,^2"] -[options="header",grid="rows"] -|======================= -| # | Region | Base address | Size | Attributes -| 4 | IO/peripheral devices | 0xffffff00 | 256 bytes | `r/w/a/32` -| 3 | bootloader ROM | 0xffff0000 | up to 32kB| `r/x/a` -| 2 | DMEM | 0x80000000 | up to 2GB (-64kB) | `r/w/x/a/8/16/32` -| 1 | IMEM | 0x00000000 | up to 2GB | `r/w/x/a/8/16/32` -|======================= - -Only the CPU of the processor has access to the internal memories and IO devices, hence all accesses are -always exclusive. Accessing a memory region in a way that violates the provided attributes will trigger a -load/store/instruction fetch access exception or will return a failed atomic access result, respectively. - -The physical memory attributes of memories and/or devices connected via the external bus interface have to -be defined by those components or the interconnection fabric. - -:sectnums: -==== Internal Memories - -The processor can implement internal memories for instructions (IMEM) and data (DMEM), which will be -mapped to FPGA block RAMs. 
The implementation of these memories is controlled via the boolean -_MEM_INT_IMEM_EN_ and _MEM_INT_DMEM_EN_ generics. - -The sizes of these memories are configured via the _MEM_INT_IMEM_SIZE_ and _MEM_INT_DMEM_SIZE_ -generics (in bytes), respectively. The processor-internal instruction memory (IMEM) can optionally be -implemented as true ROM (_MEM_INT_IMEM_ROM_), which is initialized with the application code during -synthesis. - -If the processor-internal IMEM is implemented, it is located right at the base address of the instruction -address space (default `ispace_base_c` = _0x00000000_). Vice versa, the processor-internal data memory is -located right at the beginning of the data address space (default `dspace_base_c` = _0x80000000_) when -implemented. - -:sectnums: -==== External Memory/Bus Interface - -Any CPU access (data or instructions), which does not fulfill one of the following conditions, is forwarded -to the <<_processor_external_memory_interface_wishbone_axi4_lite>>: - -* access to the processor-internal IMEM and processor-internal IMEM is implemented -* access to the processor-internal DMEM and processor-internal DMEM is implemented -* access to the bootloader ROM and beyond → addresses >= _BOOTROM_BASE_ (default 0xFFFF0000) will never be forwarded to the external memory interface - -The external bus interface is available when the _MEM_EXT_EN_ generic is _true_. If this interface is -deactivated, any access exceeding the internal memories or peripheral devices will trigger a bus access fault -exception. If _MEM_EXT_TIMEOUT_ is greater than zero, any external bus access that is not acknowledged or terminated -within _MEM_EXT_TIMEOUT_ clock cycles will auto-timeout and raise the corresponding bus fault exception. 
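The forwarding rule listed above can be sketched as a small software model. This is only an illustration, not the VHDL implementation: the function name `is_forwarded_externally`, its parameters and the example memory sizes are assumptions made for this sketch; only the default base addresses are taken from the text. The sketch also assumes that the external interface is enabled (_MEM_EXT_EN_ = _true_).

```python
# Default base addresses as stated in the text.
IMEM_BASE = 0x00000000      # default ispace_base_c
DMEM_BASE = 0x80000000      # default dspace_base_c
BOOTROM_BASE = 0xFFFF0000   # default BOOTROM_BASE


def is_forwarded_externally(addr, imem_en=True, imem_size=16 * 1024,
                            dmem_en=True, dmem_size=8 * 1024):
    """Return True if an access to `addr` would go to the external bus.

    Hypothetical helper: the sizes are example values, not defaults
    taken from the documentation.
    """
    addr &= 0xFFFFFFFF
    # Bootloader ROM and everything above it is never forwarded.
    if addr >= BOOTROM_BASE:
        return False
    # Implemented internal IMEM claims its own address range.
    if imem_en and IMEM_BASE <= addr < IMEM_BASE + imem_size:
        return False
    # Implemented internal DMEM claims its own address range.
    if dmem_en and DMEM_BASE <= addr < DMEM_BASE + dmem_size:
        return False
    return True
```

With the example sizes, an access to 0x00004000 lies just beyond the 16kB IMEM and is therefore forwarded, while an access to 0xFFFFFF00 (IO region) is never forwarded.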
- - - -<<< -// #################################################################################################################### -:sectnums: -=== Processor-Internal Modules - -Basically, the processor is a SoC consisting of the NEORV32 CPU, peripheral/IO devices, embedded -memories, an external memory interface and a bus infrastructure to interconnect all units. Additionally, the -system implements an internal reset generator and a global clock generator/divider. - -**Internal Reset Generator** - -Most processor-internal modules – except for the CPU and the watchdog timer – do not have a dedicated -reset signal. However, all devices can be reset by software by clearing the corresponding unit's control -register. The automatically included application start-up code will perform such a software-reset of all -modules to ensure a clean system reset state. The hardware reset signal of the processor can either be -triggered via the external reset pin (`rstn_i`, low-active) or by the internal watchdog timer (if implemented). -Before the external reset signal is applied to the system, it is filtered (so no spike can generate a reset, a -minimum active reset period of one clock cycle is required) and extended to have a minimal duration of four -clock cycles. - -**Internal Clock Divider** - -An internal clock divider generates 8 clock signals derived from the processor's main clock input `clk_i`. -These derived clock signals are not actual _clock signals_. Instead, they are derived from a simple counter and -are used as "clock enable" signals by the different processor modules. Thus, the whole design operates using -only the main clock signal (single clock domain). Some of the processor peripherals like the Watchdog or the -UARTs can select one of the derived clock enable signals for their internal operation. If none of the -connected modules requires a clock signal from the divider, it is automatically deactivated to reduce dynamic -power. 
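How clock enables can be derived from a single counter may be sketched as follows. This is a behavioral model under stated assumptions, not the actual hardware description: the generator name `clock_enables`, the 12-bit counter width and the choice of pulsing on the rising edge of a counter bit are all assumptions of this sketch. The divider values are those of the prescaler table in this section.

```python
def clock_enables(num_cycles, dividers=(2, 4, 8, 64, 128, 1024, 2048, 4096)):
    """Yield one tuple of enable flags per main-clock cycle.

    Each flag is True for exactly one main-clock cycle every `div`
    cycles, so a module using it effectively runs at f/div while still
    being clocked by the main clock only (single clock domain).
    """
    counter = 0
    for _ in range(num_cycles):
        nxt = (counter + 1) & 0xFFF  # assumed 12-bit free-running counter
        # A divider of 2**k pulses when counter bit k-1 changes from 0 to 1.
        yield tuple(bool(~counter & nxt & (div >> 1)) for div in dividers)
        counter = nxt
```

Counting the pulses over 4096 cycles gives 2048 pulses for f/2 and a single pulse for f/4096, which matches the intended division ratios.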
- -The peripheral devices, which feature a time-based configuration, provide a three-bit prescaler select in their -according control register to select one out of the eight available clocks. The mapping of the prescaler select -bits to the actually obtained clock is shown in the table below. Here, f represents the processor main clock -from the top entity's `clk_i` signal. - -[cols="<3,^1,^1,^1,^1,^1,^1,^1,^1"] -[grid="rows"] -|======================= -| Prescaler bits: | `0b000` | `0b001` | `0b010` | `0b011` | `0b100` | `0b101` | `0b110` | `0b111` -| Resulting clock: | _f/2_ | _f/4_ | _f/8_ | _f/64_ | _f/128_ | _f/1024_| _f/2048_| _f/4096_ -|======================= - -**Peripheral / IO Devices** - -The processor-internal peripheral/IO devices are located at the end of the 32-bit address space at base -address _0xFFFFFF00_. A region of 256 bytes is reserved for these devices. All peripheral/IO devices are -accessed using a memory-mapped scheme. A special linker script as well as the NEORV32 core software -library abstract the specific memory layout for the user. - -[IMPORTANT] -When accessing an IO device that has not been implemented (via the according _IO_x_EN_ generic), a -load/store access fault exception is triggered. - -[IMPORTANT] -The peripheral/IO devices can only be written in full-word mode (i.e. 32-bit). Byte or half-word -(8/16-bit) writes will trigger a store access fault exception. Read accesses are not size constrained. -Processor-internal memories as well as modules connected to the external memory interface can still -be written with a byte-wide granularity. - -[TIP] -You should use the provided core software library to interact with the peripheral devices. This -prevents incompatibilities with future versions, since the hardware driver functions handle all the -register and register bit accesses. - -[TIP] -Most of the IO devices do not have a hardware reset. 
Instead, the devices are reset via software by -writing zero to the unit's control register. A general software-based reset of all devices is done by the -application start-up code `crt0.S`. - -**Nomenclature for the Peripheral / IO Devices Listing** - -Each peripheral device chapter features a register map showing accessible control and data registers of the -according device including the implemented control and status bits. You can directly interact with these -registers/bits via the provided _C-code defines_. These defines are set in the main processor core library -include file `sw/lib/include/neorv32.h`. The registers and/or register bits, which can be accessed -directly using plain C-code, are marked with a "[C]". - -Not all registers or register bits can be arbitrarily read/written. The following read/write access types are -available: - -* `r/w` registers / bits can be read and written -* `r/-` registers / bits are read-only; any write access to them has no effect -* `-/w` these registers / bits are write-only; they auto-clear in the next cycle and are always read as zero - -[TIP] -Bits / registers that are not listed in the register map tables are not (yet) implemented. These registers -/ bits are always read as zero. A write access to them has no effect, but user programs should only -write zero to them to remain compatible with future extensions. - -[TIP] -When writing to read-only registers, the access is nevertheless acknowledged, but no actual data is -written. When reading data from a write-only register the result is undefined. 
- - -include::soc_imem.adoc[] - -include::soc_dmem.adoc[] - -include::soc_bootrom.adoc[] - -include::soc_icache.adoc[] - -include::soc_wishbone.adoc[] - -include::soc_gpio.adoc[] - -include::soc_wdt.adoc[] - -include::soc_mtime.adoc[] - -include::soc_uart.adoc[] - -include::soc_spi.adoc[] - -include::soc_twi.adoc[] - -include::soc_pwm.adoc[] - -include::soc_trng.adoc[] - -include::soc_cfs.adoc[] - -include::soc_nco.adoc[] - -include::soc_neoled.adoc[] - -include::soc_sysinfo.adoc[] - - Index: src_adoc/soc_bootrom.adoc =================================================================== --- src_adoc/soc_bootrom.adoc (revision 59) +++ src_adoc/soc_bootrom.adoc (nonexistent) @@ -1,36 +0,0 @@ -<<< -:sectnums: -==== Bootloader ROM (BOOTROM) - -[cols="<3,<3,<4"] -[frame="topbot",grid="none"] -|======================= -| Hardware source file(s): | neorv32_boot_rom.vhd | -| Software driver file(s): | none | _implicitly used_ -| Top entity port: | none | -| Configuration generics: | _BOOTLOADER_EN_ | implement processor-internal bootloader when _true_ -| CPU interrupts: | none | -|======================= - -As the name already suggests, the boot ROM contains the read-only bootloader image. When the bootloader -is enabled via the _BOOTLOADER_EN_ generic it is directly executed after system reset. - -The bootloader ROM is located at address 0xFFFF0000. This location is fixed and the bootloader ROM size -must not exceed 32kB. The bootloader read-only memory is automatically initialized during synthesis via the -`rtl/core/neorv32_bootloader_image.vhd` file, which is generated when compiling and installing the -bootloader sources. - -The bootloader ROM address space cannot be used for other applications even when the bootloader is not -implemented. - -**Boot Configuration** - -If the bootloader is implemented, the CPU starts execution after reset right at the beginning of the boot -ROM. 
If the bootloader is not implemented, the CPU starts execution at the beginning of the instruction -memory space (defined via `ispace_base_c` constant in the `neorv32_package.vhd` VHDL package file, -default `ispace_base_c` = 0x00000000). In this case, the instruction memory has to contain a valid -executable – either by using the internal IMEM with an initialization during synthesis or by a user-defined -initialization process. - -[TIP] -See section <<_bootloader>> for more information regarding the bootloader's boot process and configuration options. Index: src_adoc/soc_dmem.adoc =================================================================== --- src_adoc/soc_dmem.adoc (revision 59) +++ src_adoc/soc_dmem.adoc (nonexistent) @@ -1,19 +0,0 @@ -<<< -:sectnums: -==== Data Memory (DMEM) - -[cols="<3,<3,<4"] -[frame="topbot",grid="none"] -|======================= -| Hardware source file(s): | neorv32_dmem.vhd | -| Software driver file(s): | none | _implicitly used_ -| Top entity port: | none | -| Configuration generics: | _MEM_INT_DMEM_EN_ | implement processor-internal DMEM when _true_ -| | _MEM_INT_DMEM_SIZE_ | DMEM size in bytes -| CPU interrupts: | none | -|======================= - -Implementation of the processor-internal data memory is enabled via the processor's _MEM_INT_DMEM_EN_ -generic. The size in bytes is defined via the _MEM_INT_DMEM_SIZE_ generic. If the DMEM is implemented, -the memory is mapped into the data memory space and located right at the beginning of the data memory -space (default `dspace_base_c` = 0x80000000). The DMEM is always implemented as RAM. 
Index: src_adoc/cpu.adoc =================================================================== --- src_adoc/cpu.adoc (revision 59) +++ src_adoc/cpu.adoc (nonexistent) @@ -1,1024 +0,0 @@ -:sectnums: -== NEORV32 Central Processing Unit (CPU) - -image::riscv_logo.png[width=350,align=center] - -**Key Features** - -* 32-bit pipelined/multi-cycle in-order `rv32` RISC-V CPU -* Optional RISC-V extensions: `rv32[i/e][m][a][c][b][u]` + `[Zfinx][Zicsr][Zifencei]` + `[debug_mode]` (for on-chip debugging) -* Compatible with the RISC-V user specifications and a subset of the RISC-V privileged architecture specifications – passes the official RISC-V Architecture Tests (v2+) -* Official RISC-V open-source architecture ID -* Standard RISC-V interrupts (_external_, _timer_, _software_) plus 16 _fast_ interrupts and 1 non-maskable interrupt -* Supports most of the traps from the RISC-V specifications (including bus access exceptions) and traps on all unimplemented/illegal/malformed instructions -* Optional physical memory protection (PMP), compatible with the RISC-V specifications -* Optional hardware performance monitors (HPM) for application benchmarking -* Separate interfaces for instruction fetch and data access (merged into single bus via a bus switch for -the NEORV32 processor) -* BIG-endian byte order -* Configurable hardware reset -* No hardware support of unaligned data/instruction accesses – they will trigger an exception. If the C extension is enabled, instructions -can also be 16-bit aligned and a misaligned instruction address exception is not possible anymore - -[NOTE] -It is recommended to use the **NEORV32 Processor** as default top instance even if you only want to use the actual -CPU. Simply disable all the processor-internal modules via the generics and you will get a "CPU -wrapper" that provides a minimal CPU environment and an external bus interface (like AXI4). This -setup also lets you keep using the default bootloader and software framework. 
From this base you -can start building your own SoC. Of course you can also use the CPU in its true stand-alone mode. - - -<<< -// #################################################################################################################### -:sectnums: -=== Architecture - -The NEORV32 CPU was designed from scratch based only on the official ISA / privileged architecture -specifications. The following figure shows the simplified architecture of the CPU. - -image::neorv32_cpu.png[align=center] - -The CPU uses a pipelined architecture with basically two main stages. The first stage (IF – instruction fetch) -is responsible for fetching new instruction data from memory via the fetch engine. The instruction data is -stored to a FIFO – the instruction prefetch buffer. The issue engine takes this data and assembles 32-bit -instruction words for the next pipeline stage. Compressed instructions – if enabled – are also decompressed -in this stage. The second stage (EX – execution) is responsible for actually executing the fetched instructions -via the execute engine. - -These two pipeline stages are based on a multi-cycle processing engine. So the processing of each stage for a -certain operation can take several cycles. Since the IF and EX stages are decoupled via the instruction -prefetch buffer, both stages can operate in parallel and with overlapping operations. Hence, the optimal CPI -(cycles per instruction) is 2, but it can be significantly higher, for instance when executing loads/stores, -multi-cycle operations like divisions, or when the instruction fetch engine has to reload the prefetch buffers -due to a taken branch. - -Basically, the NEORV32 CPU is somewhere between a classical pipelined architecture, where each stage -requires exactly one processing cycle (if not stalled) and a classical multi-cycle architecture, which executes -every single instruction in a series of consecutive micro-operations. 
The combination of these two classical -design paradigms allows faster instruction execution than a pure multi-cycle approach (due to -the pipelined approach) at a reduced hardware footprint (due to the multi-cycle approach). - -The CPU provides independent interfaces for instruction fetch and data access. These two bus interfaces are -merged into a single processor-internal bus via a bus switch. Hence, memory locations including peripheral -devices are mapped to a single 32-bit address space making the architecture a modified Von-Neumann -Architecture. - - -// #################################################################################################################### -:sectnums: -=== RISC-V Compatibility - -The NEORV32 CPU passes the rv32_m/I, rv32_m/M, rv32_m/C, rv32_m/privilege, and -rv32_m/Zifencei tests of the official RISC-V Architecture Tests (GitHub). The port files for the -NEORV32 processor are located in the repository's `riscv-arch-test` folder. See section <<_risc_v_architecture_test_framework>> -for information on how to run the tests on the NEORV32. - -.**RISC-V `rv32_m/C` Tests** -................................... -Check cadd-01 ... OK -Check caddi-01 ... OK -Check caddi16sp-01 ... OK -Check caddi4spn-01 ... OK -Check cand-01 ... OK -Check candi-01 ... OK -Check cbeqz-01 ... OK -Check cbnez-01 ... OK -Check cebreak-01 ... OK -Check cj-01 ... OK -Check cjal-01 ... OK -Check cjalr-01 ... OK -Check cjr-01 ... OK -Check cli-01 ... OK -Check clui-01 ... OK -Check clw-01 ... OK -Check clwsp-01 ... OK -Check cmv-01 ... OK -Check cnop-01 ... OK -Check cor-01 ... OK -Check cslli-01 ... OK -Check csrai-01 ... OK -Check csrli-01 ... OK -Check csub-01 ... OK -Check csw-01 ... OK -Check cswsp-01 ... OK -Check cxor-01 ... OK --------------------------------- -OK: 27/27 RISCV_TARGET=neorv32 RISCV_DEVICE=C XLEN=32 -................................... - -.**RISC-V `rv32_m/I` Tests** -................................... -Check add-01 ... 
OK -Check addi-01 ... OK -Check and-01 ... OK -Check andi-01 ... OK -Check auipc-01 ... OK -Check beq-01 ... OK -Check bge-01 ... OK -Check bgeu-01 ... OK -Check blt-01 ... OK -Check bltu-01 ... OK -Check bne-01 ... OK -Check fence-01 ... OK -Check jal-01 ... OK -Check jalr-01 ... OK -Check lb-align-01 ... OK -Check lbu-align-01 ... OK -Check lh-align-01 ... OK -Check lhu-align-01 ... OK -Check lui-01 ... OK -Check lw-align-01 ... OK -Check or-01 ... OK -Check ori-01 ... OK -Check sb-align-01 ... OK -Check sh-align-01 ... OK -Check sll-01 ... OK -Check slli-01 ... OK -Check slt-01 ... OK -Check slti-01 ... OK -Check sltiu-01 ... OK -Check sltu-01 ... OK -Check sra-01 ... OK -Check srai-01 ... OK -Check srl-01 ... OK -Check srli-01 ... OK -Check sub-01 ... OK -Check sw-align-01 ... OK -Check xor-01 ... OK -Check xori-01 ... OK --------------------------------- -OK: 38/38 RISCV_TARGET=neorv32 RISCV_DEVICE=I XLEN=32 -................................... - -.**RISC-V `rv32_m/M` Tests** -................................... -Check div-01 ... OK -Check divu-01 ... OK -Check mul-01 ... OK -Check mulh-01 ... OK -Check mulhsu-01 ... OK -Check mulhu-01 ... OK -Check rem-01 ... OK -Check remu-01 ... OK --------------------------------- -OK: 8/8 RISCV_TARGET=neorv32 RISCV_DEVICE=M XLEN=32 -................................... - -.**RISC-V `rv32_m/privilege` Tests** -................................... -Check ebreak ... OK -Check ecall ... OK -Check misalign-beq-01 ... OK -Check misalign-bge-01 ... OK -Check misalign-bgeu-01 ... OK -Check misalign-blt-01 ... OK -Check misalign-bltu-01 ... OK -Check misalign-bne-01 ... OK -Check misalign-jal-01 ... OK -Check misalign-lh-01 ... OK -Check misalign-lhu-01 ... OK -Check misalign-lw-01 ... OK -Check misalign-sh-01 ... OK -Check misalign-sw-01 ... OK -Check misalign1-jalr-01 ... OK -Check misalign2-jalr-01 ... 
OK --------------------------------- -OK: 16/16 RISCV_TARGET=neorv32 RISCV_DEVICE=privilege XLEN=32 -................................... - -.**RISC-V `rv32_m/Zifencei` Tests** -................................... -Check Fencei ... OK --------------------------------- -OK: 1/1 RISCV_TARGET=neorv32 RISCV_DEVICE=Zifencei XLEN=32 -................................... - - -<<< -:sectnums: -==== RISC-V Incompatibility Issues and Limitations - -This list shows the currently known issues regarding full RISC-V-compatibility. More specific information -can be found in section <<_instruction_sets_and_extensions>>. - -[IMPORTANT] -CPU and Processor are BIG-ENDIAN, but this should be no problem as the external memory bus -interface provides big- and little-endian configurations. See section <<_processor_external_memory_interface_wishbone_axi4_lite>> for more information. - -[IMPORTANT] -The `misa` CSR is read-only. It shows the synthesized CPU extensions. Hence, all implemented -CPU extensions are always active and cannot be enabled/disabled dynamically during runtime. Any -write access to it (in machine mode) is ignored and will not cause any exception or side-effects. - -[IMPORTANT] -The `mip` CSR is read-only. Pending IRQs can be cleared using the `mie` CSR. - -[IMPORTANT] -The physical memory protection (see section <<_machine_physical_memory_protection>>) -currently only supports the modes _OFF_ and _NAPOT_ and a minimal granularity of 8 bytes per region. - -[IMPORTANT] -The `A` CPU extension (atomic memory access) currently only implements the `lr.w` and `sc.w` instructions. -However, these instructions are sufficient to emulate all further AMO operations. - - -==== NEORV32-Specific (Custom) Extensions - -The NEORV32-specific extensions are always enabled and are indicated by the set `X` bit in the `misa` CSR. - -[NOTE] -The CPU provides 16 _fast interrupts_, which are controlled via custom bits in the `mie` -and `mip` CSRs. 
This extension is mapped to bits that are available for custom use (according to the -RISC-V specs). Also, custom trap codes for `mcause` are implemented. - -[NOTE] -A custom CSR `mzext` is available that can be used to check for implemented `Z*` CPU extensions -(for example `Zifencei`). This CSR is mapped to the official "custom CSR address region". - -[NOTE] -All undefined/unimplemented/malformed/illegal instructions raise an illegal instruction exception -(see <<_execution_safety>>). - - -<<< -// #################################################################################################################### -:sectnums: -=== CPU Top Entity - Signals - -The following table shows all interface signals of the CPU top entity `rtl/core/neorv32_cpu.vhd`. The -type of all signals is _std_ulogic_ or _std_ulogic_vector_, respectively. The "Dir." column shows the signal -direction seen from the CPU. - -.NEORV32 CPU top entity signals -[cols="<2,^1,^1,<6"] -[options="header", grid="rows"] -|======================= -| Signal | Width | Dir. 
| Function -4+^| **Global Signals** -| `clk_i` | 1 | in | global clock line, all registers triggering on rising edge -| `rstn_i` | 1 | in | global reset, low-active -| `sleep_o` | 1 | out | CPU is in sleep mode when set -4+^| **Instruction Bus Interface (<<_bus_interface>>)** -| `i_bus_addr_o` | 32 | out | destination address -| `i_bus_rdata_i` | 32 | in | read data -| `i_bus_wdata_o` | 32 | out | write data (always zero) -| `i_bus_ben_o` | 4 | out | byte enable -| `i_bus_we_o` | 1 | out | write transaction (always zero) -| `i_bus_re_o` | 1 | out | read transaction -| `i_bus_lock_o` | 1 | out | exclusive access request (always zero) -| `i_bus_ack_i` | 1 | in | bus transfer acknowledge from accessed peripheral -| `i_bus_err_i` | 1 | in | bus transfer terminate from accessed peripheral -| `i_bus_fence_o` | 1 | out | indicates an executed _fence.i_ instruction -| `i_bus_priv_o` | 2 | out | current CPU privilege level -4+^| **Data Bus Interface (<<_bus_interface>>)** -| `d_bus_addr_o` | 32 | out | destination address -| `d_bus_rdata_i` | 32 | in | read data -| `d_bus_wdata_o` | 32 | out | write data -| `d_bus_ben_o` | 4 | out | byte enable -| `d_bus_we_o` | 1 | out | write transaction -| `d_bus_re_o` | 1 | out | read transaction -| `d_bus_lock_o` | 1 | out | exclusive access request -| `d_bus_ack_i` | 1 | in | bus transfer acknowledge from accessed peripheral -| `d_bus_err_i` | 1 | in | bus transfer terminate from accessed peripheral -| `d_bus_fence_o` | 1 | out | indicates an executed _fence_ instruction -| `d_bus_priv_o` | 2 | out | current CPU privilege level -4+^| **System Time (see <<_timeh>> CSR)** -| `time_i` | 64 | in | system time input (from MTIME) -4+^| **Non-Maskable Interrupt (<<_traps_exceptions_and_interrupts>>)** -| `nm_irq_i` | 1 | in | non-maskable interrupt -4+^| **Interrupts, RISC-V-compatible (<<_traps_exceptions_and_interrupts>>)** -| `msw_irq_i` | 1 | in | RISC-V machine software interrupt -| `mext_irq_i` | 1 | in | RISC-V machine external 
interrupt -| `mtime_irq_i` | 1 | in | RISC-V machine timer interrupt -4+^| **Fast Interrupts, NEORV32-specific (<<_traps_exceptions_and_interrupts>>)** -| `firq_i` | 16 | in | fast interrupt request signals -| `firq_ack_o` | 16 | out | fast interrupt acknowledge signals -4+^| **Enter Debug Mode Request (<<_on_chip_debugger_ocd>>)** -| `db_halt_req_i` | 1 | in | request CPU to halt and enter debug mode -|======================= - -<<< -// #################################################################################################################### -:sectnums: -=== CPU Top Entity - Generics - -Most of the CPU configuration generics are a subset of the actual Processor configuration generics (see section <<_processor_top_entity_generics>>) -and are not listed here. However, the CPU provides some _specific_ generics that are used to configure the CPU for the -NEORV32 processor setup. These generics are assigned by the processor setup only and are not available for user-defined configuration. -The _specific_ generics are listed below. - -[cols="4,4,2"] -[frame="all",grid="none"] -|====== -| **CPU_BOOT_ADDR** | _std_ulogic_vector(31 downto 0)_ | 0x00000000 -3+| This address defines the reset address at which the CPU starts fetching instructions after reset. In terms of the NEORV32 processor, this -generic is configured with the base address of the bootloader ROM (default) or with the base address of the processor-internal instruction -memory (IMEM) if the bootloader is disabled (_BOOTLOADER_EN_ = _false_). See section <<_address_space>> for more information. -|====== - -[cols="4,4,2"] -[frame="all",grid="none"] -|====== -| **CPU_DEBUG_ADDR** | _std_ulogic_vector(31 downto 0)_ | 0x00000000 -3+| This address defines the entry address for the "execution based" on-chip debugger. By default, this generic is configured with the base address -of the debugger memory. See section <<_on_chip_debugger_ocd>> for more information. 
-|====== - -[cols="4,4,2"] -[frame="all",grid="none"] -|====== -| **CPU_EXTENSION_RISCV_DEBUG** | _boolean_ | false -3+| Implement RISC-V-compatible "debug" CPU operation mode. See section <<_cpu_debug_mode>> for more information. -|====== - - -<<< -// #################################################################################################################### -:sectnums: -=== Instruction Sets and Extensions - -The NEORV32 is a RISC-V `rv32i` architecture that provides several optional RISC-V CPU and ISA -(instruction set architecture) extensions. For more information regarding the RISC-V ISA extensions please -see _The RISC-V Instruction Set Manual – Volume I: Unprivileged ISA_ and _The RISC-V Instruction Set Manual -Volume II: Privileged Architecture_, which are available in the project's `docs/references` folder. - -[TIP] -The CPU can discover available ISA extensions via the <<_misa>> and <<_mzext>> CSRs or by executing an instruction -and checking for an _illegal instruction exception_. - - -==== **`A`** - Atomic Memory Access - -Atomic memory access instructions (for implementing semaphores and mutexes) are available when the -`CPU_EXTENSION_RISCV_A` configuration generic is _true_. In this case the following additional instructions -are available: - -* `lr.w`: load-reserved -* `sc.w`: store-conditional - -[NOTE] -Even though only the `lr.w` and `sc.w` instructions are implemented so far, all further atomic operations -(read-modify-write instructions) can be emulated using these two instructions. Furthermore, the -instructions' ordering flags (`aq` and `rl`) are ignored by the CPU hardware. Using any other (not yet -implemented) AMO (atomic memory operation) will trigger an illegal instruction exception. - -[NOTE] -The atomic instructions have special requirements for memory system / bus interconnect. More -information can be found in sections <<_bus_interface>> and <<_processor_external_memory_interface_wishbone_axi4_lite>>, respectively. 
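The statement that further atomic operations can be emulated with `lr.w` and `sc.w` can be illustrated with a toy software model of an atomic add. This is a sketch under assumptions: the class `ReservationMemory`, its methods and the helper `amoadd_w` are hypothetical names, and the rule that any plain store clears the reservation is a simplification, not a description of the NEORV32 bus behavior. The only convention taken from the RISC-V specification is that `sc.w` returns zero on success and a nonzero value on failure.

```python
class ReservationMemory:
    """Toy word memory with a single load-reserved/store-conditional slot."""

    def __init__(self):
        self.words = {}
        self.reserved_addr = None

    def lr_w(self, addr):
        # Load the word and register a reservation for this address.
        self.reserved_addr = addr
        return self.words.get(addr, 0)

    def sc_w(self, addr, value):
        # Store only if the reservation is still valid; 0 means success.
        ok = self.reserved_addr == addr
        self.reserved_addr = None
        if ok:
            self.words[addr] = value & 0xFFFFFFFF
        return 0 if ok else 1

    def store(self, addr, value):
        # Assumed simplification: any plain store drops the reservation.
        self.words[addr] = value & 0xFFFFFFFF
        self.reserved_addr = None


def amoadd_w(mem, addr, operand):
    """Emulate an atomic add: retry the LR/SC pair until the store succeeds."""
    while True:
        old = mem.lr_w(addr)
        if mem.sc_w(addr, old + operand) == 0:
            return old  # like an AMO, return the value before the update
```

The retry loop is the essential part: if anything invalidates the reservation between the load and the store, the store fails and the whole read-modify-write sequence is repeated.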
- - -==== **`B`** - Bit-Manipulation - -The bit-manipulation instruction extension is available when the `CPU_EXTENSION_RISCV_B` configuration generic -is _true_. Note that not all sub-extensions are implemented yet. When the bit-manipulation extension is enabled -the following instructions are available: - -* base subset **`Zbb`**: `clz`, `ctz`, `cpop`, `sext.b`, `sext.h`, `min[u]`, `max[u]`, `andn`, `orn`, `xnor`, `rol`, `ror`, -`rori`, `c.xor`, `zext` (_pseudo instruction_ for `pack rd, rs, zero`), `rev8` (_pseudo instruction_ for `grevi rd, rs, -8`), -`orc.b` (_pseudo instruction_ for `gorci rd, rs, 7`) -* single-bit operations **`Zbs`**: `sbset[i]`, `sbclr[i]`, `sbinv[i]`, `sbext[i]` -* shifted-add operations **`Zba`**: `sh1add`, `sh2add`, `sh3add` - -[WARNING] -The bit manipulation extension is not yet officially ratified and the NEORV32 implementation is still -_work-in-progress_. There is no software support in the upstream GCC RISC-V port yet. However, an intrinsic library -is provided to utilize the provided bit manipulation extension from C-language code (see -`sw/example/bit_manipulation`). - -[NOTE] -The current version of the bit manipulation specs that are supported by the NEORV32 can be found -in `docs/references/bitmanip-draft.pdf`. - - -==== **`C`** - Compressed Instructions - -Compressed 16-bit instructions are available when the `CPU_EXTENSION_RISCV_C` configuration generic is -_true_. In this case the following instructions are available: - -* `c.addi4spn`, `c.lw`, `c.sw`, `c.nop`, `c.addi`, `c.jal`, `c.li`, `c.addi16sp`, `c.lui`, `c.srli`, `c.srai`, `c.andi`, `c.sub`, -`c.xor`, `c.or`, `c.and`, `c.j`, `c.beqz`, `c.bnez`, `c.slli`, `c.lwsp`, `c.jr`, `c.mv`, `c.ebreak`, `c.jalr`, `c.add`, `c.swsp` - -[NOTE] -When the compressed instructions extension is enabled, branches to an _unaligned_ and _uncompressed_ address require -an additional instruction fetch to load the required second half-word of that instruction. 
The performance can be increased -again by forcing a 32-bit alignment of branch target addresses. By default, this is enforced via the GCC `-falign-functions=4`, -`-falign-labels=4`, `-falign-loops=4` and `-falign-jumps=4` compile flags (via the makefile). - - -==== **`E`** - Embedded CPU - -The embedded CPU extension reduces the size of the general purpose register file from 32 entries to 16 entries to reduce hardware -requirements. This extension is enabled when the `CPU_EXTENSION_RISCV_E` configuration generic is _true_. Accesses to registers beyond -`x15` will raise an _illegal instruction exception_. - -Due to the reduced register file an alternate ABI (**`ilp32e`**) is required for the toolchain. - - -==== **`I`** - Base Integer ISA -The CPU always supports the complete `rv32i` base integer instruction set. This base set is always enabled -regardless of the setting of the remaining extensions. The base instruction set includes the following -instructions: - -* immediates: `lui`, `auipc` -* jumps: `jal`, `jalr` -* branches: `beq`, `bne`, `blt`, `bge`, `bltu`, `bgeu` -* memory: `lb`, `lh`, `lw`, `lbu`, `lhu`, `sb`, `sh`, `sw` -* alu: `addi`, `slti`, `sltiu`, `xori`, `ori`, `andi`, `slli`, `srli`, `srai`, `add`, `sub`, `sll`, `slt`, `sltu`, `xor`, `srl`, `sra`, `or`, `and` -* environment: `ecall`, `ebreak`, `fence` - -[NOTE] -In order to keep the hardware footprint low, the CPU's shift unit uses a hybrid parallel/serial approach. Shift -operations are split in coarse shifts (multiples of 4) and a final fine shift (0 to 3). The total execution -time depends on the shift amount. Alternatively, the shift operations can be processed completely in parallel by a fast -(but large) barrel shifter when the `FAST_SHIFT_EN` generic is _true_. In that case, shift operations -complete within 2 cycles regardless of the shift amount. Shift operations can also be executed in a pure serial manner when -the `TINY_SHIFT_EN` generic is _true_. 
In that case, shift operations take up to 32 cycles depending on the shift amount. - -[NOTE] -Internally, the `fence` instruction does not perform any operation inside the CPU. It only sets the -top’s `d_bus_fence_o` signal high for one cycle to inform the memory system a `fence` instruction has been -executed. Any flags within the `fence` instruction word are ignored by the hardware. - - -==== **`M`** - Integer Multiplication and Division - -Hardware-accelerated integer multiplication and division instructions are available when the -`CPU_EXTENSION_RISCV_M` configuration generic is _true_. In this case the following instructions are -available: - -* multiplication: `mul`, `mulh`, `mulhsu`, `mulhu` -* division: `div`, `divu`, `rem`, `remu` - -[NOTE] -By default, multiplication and division operations are executed in a bit-serial approach. -Alternatively, the multiplier core can be implemented using DSP blocks if the `FAST_MUL_EN` -generic is _true_ allowing faster execution. Multiplications and divisions -always require a fixed number of cycles to complete - regardless of the input operands. - - -==== **`U`** - Less-Privileged User Mode - -Adds the less-privileged _user mode_ when the `CPU_EXTENSION_RISCV_U` configuration generic is _true_. For -instance, user-level code cannot access machine-mode CSRs. Furthermore, access to the address space (like -peripheral/IO devices) can be limited via the physical memory protection (_PMP_) unit for code running in user mode. - - -==== **`Zfinx`** Single-Precision Floating-Point Operations - -The `Zfinx` floating-point extension is an alternative to the `F` floating-point extension that also uses the -integer register file `x` to store and operate on floating-point data (hence, `F-in-x`). Since no dedicated floating-point `f` -register file exists, the `Zfinx` extension requires fewer hardware resources and features faster context changes. 
-This also implies that there are NO dedicated `f` register file related load/store or move instructions. The -official RISC-V specifications can be found here: https://github.com/riscv/riscv-zfinx - -The NEORV32 floating-point unit used by the `Zfinx` extension is compatible with the _IEEE-754_ specifications. - -The `Zfinx` extension currently supports single-precision (`.s` suffix) only (so it is a direct alternative to the `F` -extension). The `Zfinx` extension is implemented when the `CPU_EXTENSION_RISCV_Zfinx` configuration -generic is _true_. In this case the following instructions and CSRs are available: - -* conversion: `fcvt.s.w`, `fcvt.s.wu`, `fcvt.w.s`, `fcvt.wu.s` -* comparison: `fmin.s`, `fmax.s`, `feq.s`, `flt.s`, `fle.s` -* computational: `fadd.s`, `fsub.s`, `fmul.s` -* sign-injection: `fsgnj.s`, `fsgnjn.s`, `fsgnjx.s` -* number classification: `fclass.s` - -* additional CSRs: `fcsr`, `frm`, `fflags` - -[WARNING] -Fused multiply-add instructions `f[n]m[add/sub].s` are not supported! -Division `fdiv.s` and square root `fsqrt.s` instructions are not supported yet! - -[WARNING] -Subnormal numbers (also "de-normalized" numbers) are not supported by the NEORV32 FPU. -Subnormal numbers (exponent = 0) are _flushed to zero_ (setting them to +/- 0) before entering the -FPU's processing core. If a computational instruction (like `fmul.s`) generates a subnormal result, the -result is also flushed to zero during normalization. - -[WARNING] -The `Zfinx` extension is not yet officially ratified, but is expected to stay unchanged. There is no -software support for the `Zfinx` extension in the upstream GCC RISC-V port yet. However, an -intrinsic library is provided to utilize the `Zfinx` floating-point extension from C-language -code (see `sw/example/floating_point_test`). 
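
The flush-to-zero rule from the subnormal warning above can be modeled on a host machine. The following C sketch is only an illustration of that rule and not the FPU's actual implementation; the function name `ftz_model` is hypothetical. It treats every single-precision bit pattern with a zero exponent field as subnormal (or zero) and replaces it with a zero of the same sign, as the text describes.

```c
#include <stdint.h>

/* Hypothetical host-side model of the flush-to-zero rule: any
 * single-precision value whose 8-bit exponent field is zero is
 * replaced by a zero that keeps the original sign bit. */
uint32_t ftz_model(uint32_t bits)
{
    const uint32_t sign     = bits & 0x80000000u; /* bit 31 */
    const uint32_t exponent = bits & 0x7F800000u; /* bits 30..23 */
    return (exponent == 0u) ? sign : bits;
}
```

Such a model can be handy when comparing results from the FPU against a software reference that does handle subnormal numbers.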
- - -==== **`Zicsr`** Control and Status Register Access / Privileged Architecture - -The CSR access instructions as well as the exception and interrupt system (= the privileged architecture) are implemented when the -`CPU_EXTENSION_RISCV_Zicsr` configuration generic is _true_. In this case the following instructions are -available: - -* CSR access: `csrrw`, `csrrs`, `csrrc`, `csrrwi`, `csrrsi`, `csrrci` -* environment: `mret`, `wfi` - -[WARNING] -If the `Zicsr` extension is disabled the CPU does not provide any kind of interrupt or exception -support at all. In order to provide the full spectrum of functions and to allow a secure execution -environment, the `Zicsr` extension should always be enabled. - -[NOTE] -The "wait for interrupt" instruction `wfi` works like a sleep command. When executed, the CPU is -halted until a valid interrupt request occurs. To wake up again, the according interrupt source has to -be enabled via the `mie` CSR and the global interrupt enable flag in `mstatus` has to be set. - - -==== **`Zifencei`** Instruction Stream Synchronization - -The `Zifencei` CPU extension is implemented if the `CPU_EXTENSION_RISCV_Zifencei` configuration -generic is _true_. It allows manual synchronization of the instruction stream via the following instruction: - -* `fence.i` - -[NOTE] -The `fence.i` instruction resets the CPU's internal instruction fetch engine and flushes the prefetch buffer. -This allows a clean re-fetch of modified data from memory. Also, the top's `i_bus_fencei_o` signal is set -high for one cycle to inform the memory system. Any additional flags within the `fence.i` instruction word -are ignored by the hardware. - -[NOTE] -If the `Zifencei` extension is disabled (_CPU_EXTENSION_RISCV_Zifencei_ generic = false), -a `fence.i` instruction will be executed as a `nop` (and will **not trap**) and none of the functions -described above will be executed. 
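
A typical use of `fence.i` is code that writes instructions to memory and then executes them. The sketch below shows one way this could look in C, assuming a GCC-compatible compiler with inline assembly and a toolchain that accepts the `fence.i` mnemonic. The helper names `sync_instruction_stream` and `load_code` are hypothetical and not part of the NEORV32 software library. On non-RISC-V hosts the barrier degrades to a plain compiler barrier so the sketch still builds.

```c
#include <stdint.h>

/* Synchronize the instruction stream after code was written to memory.
 * On RISC-V targets this emits `fence.i`; elsewhere it is only a
 * compiler barrier (assumption: GCC-style inline assembly). */
void sync_instruction_stream(void)
{
#if defined(__riscv)
    __asm__ volatile ("fence.i" ::: "memory");
#else
    __asm__ volatile ("" ::: "memory");
#endif
}

/* Hypothetical helper: copy n_words instruction words to dst and
 * synchronize before the caller jumps to the freshly written code. */
void load_code(volatile uint32_t *dst, const uint32_t *src, uint32_t n_words)
{
    for (uint32_t i = 0; i < n_words; i++) {
        dst[i] = src[i];
    }
    sync_instruction_stream();
}
```

Keep in mind the note above: with the `Zifencei` extension disabled, `fence.i` executes as a `nop`, so the prefetch buffer is not flushed.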
- - -==== **`PMP`** Physical Memory Protection - -The NEORV32 physical memory protection (PMP) is compatible with the PMP specified by the RISC-V specs. -The CPU PMP currently supports only _NAPOT_ mode and a minimal region size (granularity) of 8 bytes. Larger minimal sizes can be configured -via the top `PMP_MIN_GRANULARITY` generic to reduce hardware requirements. The physical memory protection system is implemented when the -`PMP_NUM_REGIONS` configuration generic is >0. In this case the following additional CSRs are available: - -* `pmpcfg*` (0..15, depending on configuration): PMP configuration registers -* `pmpaddr*` (0..63, depending on configuration): PMP address registers - -See section <<_machine_physical_memory_protection>> for more information regarding the PMP CSRs. - -**Configuration** - -The actual number of regions and the minimal region granularity are defined via the top entity -`PMP_MIN_GRANULARITY` and `PMP_NUM_REGIONS` generics. `PMP_MIN_GRANULARITY` defines the minimal available -granularity of each region in bytes. `PMP_NUM_REGIONS` defines the total number of implemented regions and thus, the -number of available `pmpcfg*` and `pmpaddr*` CSRs. - -When implementing more PMP regions than a _certain critical limit_ *an additional register stage -is automatically inserted* into the CPU's memory interfaces to reduce critical path length. Unfortunately, this will also -increase the latency of instruction fetches and data access by +1 cycle. - -The critical limit can be adapted for custom use by a constant from the main VHDL package file -(`rtl/core/neorv32_package.vhd`). The default value is 8: - -[source,vhdl] ----- --- "critical" number of PMP regions -- -constant pmp_num_regions_critical_c : natural := 8; ----- - -**Operation** - -Any memory access address (from the CPU's instruction fetch or data access interface) is checked against all -of the specified (configured via `pmpaddr*` and enabled via `pmpcfg*`) PMP regions. 
If an -address accesses one of these regions, the configured access rights (attributes in `pmpcfg*`) are checked: - -* a write access (store) will fail if no write attribute is set -* a read access (load) will fail if no read attribute is set -* an instruction fetch access will fail if no execute attribute is set - -If an access to a protected region does not have the according access rights (attributes) it will raise the according -_instruction/load/store access fault exception_. - -By default, all PMP checks are enforced for user-level programs only. If you wish to enforce the physical -memory protection also for machine-level programs you need to activate the _locked bit_ in the according -`pmpcfg*` configuration. - -[IMPORTANT] -After updating the address configuration registers `pmpaddr*` the system requires up to 33 cycles for -internal (iterative) computations before the configuration becomes valid. - -[NOTE] -For more information regarding RISC-V physical memory protection see the official _The RISC-V -Instruction Set Manual – Volume II: Privileged Architecture_ specifications. - - -==== **`HPM`** Hardware Performance Monitors - -In addition to the mandatory cycle (`[m]cycle[h]`) and instruction (`[m]instret[h]`) counters the NEORV32 CPU provides -up to 29 hardware performance monitors (HPM 3..31), which can be used to benchmark applications. Each HPM consists of an -N-bit wide counter (split into a high-word 32-bit CSR and a low-word 32-bit CSR), where N is defined via the top's -`HPM_CNT_WIDTH` generic (1..64-bit), and a corresponding event configuration CSR. The event configuration -CSR defines the architectural events that lead to an increment of the associated HPM counter. - -The cycle, time and instructions-retired counters (`[m]cycle[h]`, `time[h]`, `[m]instret[h]`) are -mandatory performance monitors on every RISC-V platform and have a fixed increment event. For example, -the instructions-retired counter increments with each executed instruction. 
The actual hardware performance -monitors are optional and can be configured to increment on arbitrary hardware events. The number of -available HPMs is configured via the top's `HPM_NUM_CNTS` generic at synthesis time. Assigning a zero will exclude -all HPM logic from the design. - -Depending on the configuration, the following additional CSRs are available: - -* counters: `[m]hpmcounter*[h]` (3..31, depending on configuration) -* event configuration: `mhpmevent*` (3..31, depending on configuration) - -User-level access to the counter registers `hpmcounter*[h]` can be individually restricted via the `mcounteren` CSR. -Auto-increment of the HPMs can be individually deactivated via the `mcountinhibit` CSR. - -If `HPM_NUM_CNTS` is lower than the maximum value (=29) the remaining HPMs are not implemented. -However, accessing their associated CSRs will not raise an illegal instruction exception. These CSRs are -read-only and will always return 0. - -[NOTE] -For a list of all allocated HPM-related CSRs and all provided event configurations see section <<_hardware_performance_monitors_hpm>>. - - -<<< -// #################################################################################################################### -:sectnums: -=== Instruction Timing - -The instruction timing listed in the table below shows the required clock cycles for executing a certain -instruction. These instruction cycles assume a bus access without additional wait states and a filled -pipeline. - -Average CPI (cycles per instruction) values for "real applications" such as the CoreMark benchmark for different CPU -configurations are presented in <<_cpu_performance>>. 
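
As a reading aid for the shift entries in the table below, the default (hybrid) shifter latency of `3 + SA/4 + SA%4` cycles can be evaluated with a few lines of C. This is a minimal sketch that only restates the formula from the table, assuming integer division for `SA/4`; the function name `shift_cycles_default` is made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Cycle estimate for the default hybrid shifter as given in the
 * timing table: 3 + SA/4 + SA%4, with SA being the shift amount. */
uint32_t shift_cycles_default(uint32_t sa)
{
    assert(sa < 32u); /* RV32 shift amounts are 5 bits wide */
    return 3u + (sa / 4u) + (sa % 4u);
}
```

With this model a shift by 0 costs 3 cycles and a shift by 31 costs 13 cycles (7 coarse steps plus 3 fine steps), which shows why the execution time depends on the shift amount.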
- -.Clock cycles per instruction -[cols="<2,^1,^4,<3"] -[options="header", grid="rows"] -|======================= -| Class | ISA | Instruction(s) | Execution cycles -| ALU | `I/E` | `addi` `slti` `sltiu` `xori` `ori` `andi` `add` `sub` `slt` `sltu` `xor` `or` `and` `lui` `auipc` | 2 -| ALU | `C` | `c.addi4spn` `c.nop` `c.addi` `c.li` `c.addi16sp` `c.lui` `c.andi` `c.sub` `c.xor` `c.or` `c.and` `c.add` `c.mv` | 2 -| ALU | `I/E` | `slli` `srli` `srai` `sll` `srl` `sra` | 3 + SAfootnote:[Shift amount.]/4 + SA%4; FAST_SHIFTfootnote:[Barrel shift when `FAST_SHIFT_EN` is enabled.]: 4; TINY_SHIFTfootnote:[Serial shift when `TINY_SHIFT_EN` is enabled.]: 2..32 -| ALU | `C` | `c.srli` `c.srai` `c.slli` | 3 + SAfootnote:[Shift amount.]/4 + SA%4; FAST_SHIFTfootnote:[Barrel shift when `FAST_SHIFT_EN` is enabled.]: 4; TINY_SHIFTfootnote:[Serial shift when `TINY_SHIFT_EN` is enabled.]: 2..32 -| Branches | `I/E` | `beq` `bne` `blt` `bge` `bltu` `bgeu` | Taken: 5 + MLfootnote:[Memory latency.]; Not taken: 3 -| Branches | `C` | `c.beqz` `c.bnez` | Taken: 5 + MLfootnote:[Memory latency.]; Not taken: 3 -| Jumps / Calls | `I/E` | `jal` `jalr` | 4 + ML -| Jumps / Calls | `C` | `c.jal` `c.j` `c.jr` `c.jalr` | 4 + ML -| Memory access | `I/E` | `lb` `lh` `lw` `lbu` `lhu` `sb` `sh` `sw` | 4 + ML -| Memory access | `C` | `c.lw` `c.sw` `c.lwsp` `c.swsp` | 4 + ML -| Memory access | `A` | `lr.w` `sc.w` | 4 + ML -| Multiplication | `M` | `mul` `mulh` `mulhsu` `mulhu` | 2+31+3; FAST_MULfootnote:[DSP-based multiplication; enabled via `FAST_MUL_EN`.]: 5 -| Division | `M` | `div` `divu` `rem` `remu` | 22+32+4 -| Bit-manipulation - arithmetic/logic | `B(Zbb)` | `sext.b` `sext.h` `min` `minu` `max` `maxu` `andn` `orn` `xnor` `zext`(pack) `rev8`(grevi) `orc.b`(gorci) | 3 -| Bit-manipulation - shifts | `B(Zbb)` | `clz` `ctz` | 3 + 0..32 -| Bit-manipulation - shifts | `B(Zbb)` | `cpop` | 3 + 32 -| Bit-manipulation - shifts | `B(Zbb)` | `rol` `ror` `rori` | 3 + SA -| Bit-manipulation - single-bit | 
`B(Zbs)` | `sbset[i]` `sbclr[i]` `sbinv[i]` `sbext[i]` | 3 -| Bit-manipulation - shifted-add | `B(Zba)` | `sh1add` `sh2add` `sh3add` | 3 -| CSR access | `Zicsr` | `csrrw` `csrrs` `csrrc` `csrrwi` `csrrsi` `csrrci` | 4 -| System | `I/E`+`Zicsr` | `ecall` `ebreak` | 4 -| System | `I/E` | `fence` | 3 -| System | `C`+`Zicsr` | `c.ebreak` | 4 -| System | `Zicsr` | `mret` `wfi` | 5 -| System | `Zifencei` | `fence.i` | 5 -| Floating-point - arithmetic | `Zfinx` | `fadd.s` | 110 -| Floating-point - arithmetic | `Zfinx` | `fsub.s` | 112 -| Floating-point - arithmetic | `Zfinx` | `fmul.s` | 22 -| Floating-point - compare | `Zfinx` | `fmin.s` `fmax.s` `feq.s` `flt.s` `fle.s` | 13 -| Floating-point - misc | `Zfinx` | `fsgnj.s` `fsgnjn.s` `fsgnjx.s` `fclass.s` | 12 -| Floating-point - conversion | `Zfinx` | `fcvt.w.s` `fcvt.wu.s` | 47 -| Floating-point - conversion | `Zfinx` | `fcvt.s.w` `fcvt.s.wu` | 48 -|======================= - -[NOTE] -The presented values of the *floating-point execution cycles* are average values – obtained from -4096 instruction executions using pseudo-random input values. The execution time for emulating the -instructions (using pure-software libraries) is ~17..140 times higher. - - - -// #################################################################################################################### -include::cpu_csr.adoc[] - - - -<<< -// #################################################################################################################### -:sectnums: -==== Execution Safety - -The hardware of the NEORV32 CPU was designed for maximum *execution safety*. If the `Zicsr` CPU -extension is enabled, the core supports **all** traps specified by the official RISC-V specifications (obviously, -not the ones that are related to yet unimplemented extensions/features). Thus, the CPU provides well-defined -hardware fall-backs for (nearly) everything that can go wrong. 
Even if any kind of trap is triggered, the core -is always in a defined and fully synchronized state throughout the whole architecture (i.e. no need to undo -out-of-order operations) that allows predictable execution behavior at any time. - -**Core Safety Features** - -* Due to the acknowledged memory accesses the CPU is _always_ in sync with the memory system (no speculative execution / out-of-order states). -* The CPU supports all bus exceptions including bus access exceptions that are triggered if an -accessed address does not respond or encounters an internal error during access (which is a rare -feature in many open-source RISC-V cores). -* The CPU raises an illegal instruction trap for **all** unimplemented/malformed/illegal instructions (to support _full_ virtualization). -* If user-level code tries to read from machine-level-only CSRs (like `mstatus`) an illegal instruction -exception is raised. The result of this operation is always zero (though, machine-level -code handling this exception can modify the target register of the illegal access-causing -instruction to allow full virtualization). Illegal write accesses to machine CSRs will not write any data at all. -* Illegal user-level memory accesses to protected addresses or address regions (via physical memory -protection) will not be conducted at all (no actual write and no actual read; prevents triggering of -memory-mapped devices). Illegal load operations will not return any data (the instruction's -destination register will not be written at all). 
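
The protected address regions mentioned in the last item are configured in _NAPOT_ mode (see the PMP section). As a hedged illustration, the following C sketch computes a `pmpaddr*` value for a naturally aligned power-of-two region using the NAPOT encoding from the RISC-V privileged architecture specification. The function name `pmp_napot_addr` is hypothetical, and actually writing the value to a CSR is intentionally left out.

```c
#include <assert.h>
#include <stdint.h>

/* Encode a naturally aligned power-of-two (NAPOT) region.
 * pmpaddr holds address bits [33:2]; the number of trailing one
 * bits k selects a region size of 2^(k+3) bytes. */
uint32_t pmp_napot_addr(uint32_t base, uint32_t size)
{
    assert(size >= 8u);                 /* minimal granularity named in the PMP section */
    assert((size & (size - 1u)) == 0u); /* size must be a power of two */
    assert((base & (size - 1u)) == 0u); /* base must be aligned to size */
    return (base >> 2) | ((size >> 3) - 1u);
}
```

For example, a 64 KiB region starting at `0x80000000` is encoded as `0x20001FFF`: 13 trailing ones select a size of 2^16 bytes.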
- - - -<<< -// #################################################################################################################### -:sectnums: -==== Traps, Exceptions and Interrupts - -In this document a (maybe) special nomenclature regarding traps is used: - -* _interrupts_ = asynchronous exceptions -* _exceptions_ = synchronous exceptions -* _traps_ = exceptions + interrupts (synchronous or asynchronous exceptions) - -Whenever an exception or interrupt is triggered, the CPU transfers control to the address stored in the `mtvec` -CSR. The cause of the according interrupt or exception can be determined via the content of the `mcause` -CSR. The address that reflected the current program counter when a trap was taken is stored to `mepc`. -Additional information regarding the cause of the trap can be retrieved from `mtval`. - -The traps are prioritized. If several exceptions occur at once only the one with highest priority is triggered. If -several interrupts trigger at once, the one with highest priority is triggered while the remaining ones are -queued. After completing the interrupt handler the interrupt with the second highest priority will be issued and -so on. - - -**Memory Access Exceptions** - -If a load operation causes any exception, the destination register is not written at all. Exceptions caused by a -misalignment or a physical memory protection fault do not trigger a bus read-operation at all. -Exceptions caused by a store address misalignment or a store physical memory protection fault do not trigger -a bus write-operation at all. - - -**Instruction Atomicity** - -All instructions execute as atomic operations – interrupts can only trigger between two instructions. - - -**Custom Fast Interrupt Request Lines** - -As a custom extension, the NEORV32 CPU features 16 fast interrupt request lines via the `firq_i` CPU (/Processor) top -entity signals. 
These interrupts have custom configuration and status flags in the `mie` and `mip` CSRs and also -provide custom trap codes in `mcause`. - - -**Non-Maskable Interrupt** - -The NEORV32 CPU features a single non-maskable interrupt source via the `nm_irq_i` CPU (/Processor) top -entity signal that can be used to signal critical system conditions. This interrupt source _cannot_ be disabled at all (not even in interrupt service routines). -Hence, it does _not_ provide configuration/status flags in the `mie` and `mip` CSRs. The RISC-V-compatible -`mcause` value `0x80000000` is used to indicate the non-maskable interrupt. - -[IMPORTANT] -All CPU/Processor interrupt request signals are triggered when the signal is _high_ for exactly one cycle (being high for several cycles might -cause multiple triggering of the interrupt). - - -<<< -// #################################################################################################################### -:sectnums!: -===== NEORV32 Trap Listing - -.NEORV32 trap listing -[cols="3,6,5,14,11,4,4"] -[options="header",grid="rows"] -|======================= -| Prio. 
| `mcause` | [RISC-V] | ID [C] | Cause | `mepc` | `mtval` -| 1 | `0x80000000` | 1.0 | _TRAP_CODE_NMI_ | non-maskable interrupt | _I-PC_ | _0_ -| 2 | `0x8000000B` | 1.11 | _TRAP_CODE_MEI_ | machine external interrupt | _I-PC_ | _0_ -| 3 | `0x80000003` | 1.3 | _TRAP_CODE_MSI_ | machine software interrupt | _I-PC_ | _0_ -| 4 | `0x80000007` | 1.7 | _TRAP_CODE_MTI_ | machine timer interrupt | _I-PC_ | _0_ -| 5 | `0x80000010` | 1.16 | _TRAP_CODE_FIRQ_0_ | fast interrupt request channel 0 | _I-PC_ | _0_ -| 6 | `0x80000011` | 1.17 | _TRAP_CODE_FIRQ_1_ | fast interrupt request channel 1 | _I-PC_ | _0_ -| 7 | `0x80000012` | 1.18 | _TRAP_CODE_FIRQ_2_ | fast interrupt request channel 2 | _I-PC_ | _0_ -| 8 | `0x80000013` | 1.19 | _TRAP_CODE_FIRQ_3_ | fast interrupt request channel 3 | _I-PC_ | _0_ -| 9 | `0x80000014` | 1.20 | _TRAP_CODE_FIRQ_4_ | fast interrupt request channel 4 | _I-PC_ | _0_ -| 10 | `0x80000015` | 1.21 | _TRAP_CODE_FIRQ_5_ | fast interrupt request channel 5 | _I-PC_ | _0_ -| 11 | `0x80000016` | 1.22 | _TRAP_CODE_FIRQ_6_ | fast interrupt request channel 6 | _I-PC_ | _0_ -| 12 | `0x80000017` | 1.23 | _TRAP_CODE_FIRQ_7_ | fast interrupt request channel 7 | _I-PC_ | _0_ -| 13 | `0x80000018` | 1.24 | _TRAP_CODE_FIRQ_8_ | fast interrupt request channel 8 | _I-PC_ | _0_ -| 14 | `0x80000019` | 1.25 | _TRAP_CODE_FIRQ_9_ | fast interrupt request channel 9 | _I-PC_ | _0_ -| 15 | `0x8000001a` | 1.26 | _TRAP_CODE_FIRQ_10_ | fast interrupt request channel 10 | _I-PC_ | _0_ -| 16 | `0x8000001b` | 1.27 | _TRAP_CODE_FIRQ_11_ | fast interrupt request channel 11 | _I-PC_ | _0_ -| 17 | `0x8000001c` | 1.28 | _TRAP_CODE_FIRQ_12_ | fast interrupt request channel 12 | _I-PC_ | _0_ -| 18 | `0x8000001d` | 1.29 | _TRAP_CODE_FIRQ_13_ | fast interrupt request channel 13 | _I-PC_ | _0_ -| 19 | `0x8000001e` | 1.30 | _TRAP_CODE_FIRQ_14_ | fast interrupt request channel 14 | _I-PC_ | _0_ -| 20 | `0x8000001f` | 1.31 | _TRAP_CODE_FIRQ_15_ | fast interrupt request channel 15 | _I-PC_ | _0_ -| 
21 | `0x00000001` | 0.1 | _TRAP_CODE_I_ACCESS_ | instruction access fault | _B-ADR_ | _PC_ -| 22 | `0x00000002` | 0.2 | _TRAP_CODE_I_ILLEGAL_ | illegal instruction | _PC_ | _Inst_ -| 23 | `0x00000000` | 0.0 | _TRAP_CODE_I_MISALIGNED_ | instruction address misaligned | _B-ADR_ | _PC_ -| 24 | `0x0000000B` | 0.11 | _TRAP_CODE_MENV_CALL_ | environment call from M-mode (ECALL in machine-mode) | _PC_ | _PC_ -| 25 | `0x00000008` | 0.8 | _TRAP_CODE_UENV_CALL_ | environment call from U-mode (ECALL in user-mode) | _PC_ | _PC_ -| 26 | `0x00000003` | 0.3 | _TRAP_CODE_BREAKPOINT_ | breakpoint (EBREAK) | _PC_ | _PC_ -| 27 | `0x00000006` | 0.6 | _TRAP_CODE_S_MISALIGNED_ | store address misaligned | _B-ADR_ | _B-ADR_ -| 28 | `0x00000004` | 0.4 | _TRAP_CODE_L_MISALIGNED_ | load address misaligned | _B-ADR_ | _B-ADR_ -| 29 | `0x00000007` | 0.7 | _TRAP_CODE_S_ACCESS_ | store access fault | _B-ADR_ | _B-ADR_ -| 30 | `0x00000005` | 0.5 | _TRAP_CODE_L_ACCESS_ | load access fault | _B-ADR_ | _B-ADR_ -|======================= - -**Notes** - -The "Prio." column shows the priority of each trap. The highest priority is 1. The "`mcause`" column shows the -cause ID of the according trap that is written to the `mcause` CSR. The "[RISC-V]" column shows the interrupt/exception code value from the -official RISC-V privileged architecture manual. The "[C]" names are defined by the NEORV32 core library (`sw/lib/include/neorv32.h`) and can -be used in plain C code. 
The "`mepc`" and "`mtval`" columns show the value written to the -`mepc` and `mtval` CSRs when a trap is triggered: - -* _I-PC_ - address of interrupted instruction (instruction has not been executed/completed yet) -* _B-ADR_ - bad memory access address that caused the trap -* _PC_ - address of instruction that caused the trap -* _0_ - zero -* _Inst_ - the faulting instruction itself - - - -<<< -// #################################################################################################################### -:sectnums: -==== Bus Interface - -The CPU provides two independent bus interfaces: One for fetching instructions (`i_bus_*`) and one for -accessing data (`d_bus_*`) via load and store operations. Both interfaces use the same interface protocol. - -:sectnums: -===== Address Space - -The CPU is a 32-bit architecture with separate instruction and data interfaces making it a Harvard -Architecture. Each of these interfaces can access an address space of up to 2^32^ bytes (4GB). The memory -system is based on 32-bit words with a minimal granularity of 1 byte. Please note that the NEORV32 CPU -does not support unaligned memory accesses _in hardware_ – however, a software-based handling can be -implemented as any unaligned memory access will trigger an according exception. - -:sectnums: -===== Interface Signals - -The following table shows the signals of the data and instruction interfaces seen from the CPU -(`*_o` signals are driven by the CPU / outputs, `*_i` signals are read by the CPU / inputs). 
- -.CPU bus interface -[cols="<2,^1,<7"] -[options="header",grid="rows"] -|======================= -| Signal | Size | Function -| `bus_addr_o` | 32 | access address -| `bus_rdata_i` | 32 | data input for read operations -| `bus_wdata_o` | 32 | data output for write operations -| `bus_ben_o` | 4 | byte enable signal for write operations -| `bus_we_o` | 1 | bus write access -| `bus_re_o` | 1 | bus read access -| `bus_lock_o` | 1 | exclusive access request -| `bus_ack_i` | 1 | accessed peripheral indicates a successful completion of the bus transaction -| `bus_err_i` | 1 | accessed peripheral indicates an error during the bus transaction -| `bus_fence_o` | 1 | this signal is set for one cycle when the CPU executes a data/instruction fence operation -| `bus_priv_o` | 2 | current CPU privilege level -|======================= - -[NOTE] -Currently, there are no pipelined or overlapping operations implemented within the same bus interface. -So only a single transfer request can be "on the fly". - -:sectnums: -===== Protocol - -A bus request is triggered either by the `bus_re_o` signal (for reading data) or by the `bus_we_o` signal (for -writing data). These signals are active for exactly one cycle and initiate either a read or a write transaction. The transaction is -completed when the accessed peripheral either sets the `bus_ack_i` signal (-> successful completion) or the -`bus_err_i` signal is set (-> failed completion). All these control signals are only active (= high) for one -single cycle. An error indicated via the `bus_err_i` signal during a transfer will trigger the according instruction bus -access fault or load/store bus access fault exception. - -[NOTE] -The transfer can be completed directly in the same cycle as it was initiated (via the `bus_re_o` or `bus_we_o` -signal) if the peripheral sets `bus_ack_i` or `bus_err_i` high for one cycle. However, in order to shorten the critical path such "asynchronous" -completion should be avoided. 
The default processor-internal modules provide exactly **one cycle delay** between initiation and completion of transfers. - -.Bus Keeper: Processor-internal memories and memory-mapped devices with variable / high latency -[IMPORTANT] -Processor-internal peripherals or memories do not have to respond within one cycle after the transfer initiation (= latency > 1 cycle). -However, the bus transaction has to be completed (= acknowledged) within a certain **response time window**. This time window is defined -by the global `max_proc_int_response_time_c` constant (default = 15 cycles) from the processor's VHDL package file (`rtl/core/neorv32_package.vhd`). -It defines the maximum number of cycles after which an _unacknowledged_ processor-internal bus transfer will time out and raise a **bus fault exception**. -The _BUSKEEPER_ hardware module (`rtl/core/neorv32_bus_keeper.vhd`) keeps track of all _internal_ bus transactions. If any bus operation times out -(for example when accessing "address space holes") this unit will issue a bus error to the CPU that will raise the according instruction fetch or data access bus exception. -Note that **the bus keeper does not track external accesses via the external memory bus interface**. However, the external memory bus interface also provides -an _optional_ bus timeout (see section <<_processor_external_memory_interface_wishbone_axi4_lite>>). - -**Exemplary Bus Accesses** - -.Example bus accesses: see read/write access description below -[cols="^2,^2"] -[grid="none"] -|======================= -a| image::cpu_interface_read_long.png[read,300,150] -a| image::cpu_interface_write_long.png[write,300,150] -| Read access | Write access -|======================= - -**Write Access** - -For a write access, the accessed address (`bus_addr_o`), the data to be written (`bus_wdata_o`) and the byte -enable signals (`bus_ben_o`) are set when `bus_we_o` goes high. These three signals are kept stable until the -transaction is completed. 
In the example the accessed peripheral cannot answer directly in the next -cycle after issuing. Here, the transaction is successful and the peripheral sets the `bus_ack_i` signal several -cycles after issuing. - -**Read Access** - -For a read access, the accessed address (`bus_addr_o`) is set when `bus_re_o` goes high. The address is kept -stable until the transaction is completed. In the example the accessed peripheral cannot answer -directly in the next cycle after issuing. The peripheral has to apply the read data right in the same cycle as -the bus transaction is completed (here, the transaction is successful and the peripheral sets the `bus_ack_i` -signal). - -**Access Boundaries** - -The instruction interface will always access memory on word (= 32-bit) boundaries even if fetching -compressed (16-bit) instructions. The data interface can access memory on byte (= 8-bit), half-word (= 16-bit) -and word (= 32-bit) boundaries. - -**Exclusive (Atomic) Access** - -The CPU can access memory in an exclusive manner by generating a load-reserved and store-conditional -combination. Normally, these combinations should target the same memory address. - -The CPU starts an exclusive access to memory via the _load-reserved instruction_ (`lr.w`). This instruction -will set the CPU-internal _exclusive access lock_, which directly drives the `d_bus_lock_o` signal. It is the task of -the memory system to manage this exclusive access reservation by storing the according access address and -the source of the access itself (for example via the CPU ID in a multi-core system). - -When the CPU executes a _store-conditional instruction_ (`sc.w`) the _CPU-internal exclusive access lock_ is -evaluated to check if the exclusive access was successful. If the lock is still OK, the instruction will write-back -zero and will allow the according store operation to the memory system. 
If the lock is broken, the -instruction will write-back non-zero and will not generate an actual memory store operation. - -The CPU-internal exclusive access lock is broken if at least one of the following situations occurs: - -* when executing any other memory-access operation than `lr.w` -* when any trap (sync. or async.) is triggered (for example to force a context switch) -* when the memory system signals a bus error (via the `bus_err_i` signal) - -[TIP] -For more information regarding the SoC-level behavior and requirements of atomic operations see -section <<_processor_external_memory_interface_wishbone_axi4_lite>>. - -**Memory Barriers** - -Whenever the CPU executes a fence instruction, the according interface signal is set high for one cycle -(`d_bus_fence_o` for a _fence_ instruction; `i_bus_fence_o` for a _fencei_ instruction). It is the task of the -memory system to perform the necessary operations (like a cache flush and refill). - - - -<<< -// #################################################################################################################### -:sectnums: -==== CPU Hardware Reset - -In order to reduce routing constraints (and by this the actual hardware requirements), most uncritical -registers of the NEORV32 CPU as well as most registers of the whole NEORV32 Processor do not use **a -dedicated hardware reset**. "Uncritical registers" in this context means that the initial value of these registers -after power-up is not relevant for a defined CPU boot process. - -**Rationale** - -A good example to illustrate the concept of uncritical registers is a pipelined processing engine. Each stage -of the engine features an N-bit _data register_ and a 1-bit _status register_. The status register is set when the -data in the according data register is valid. At the end of the pipeline the status register might trigger a writeback -of the processing result to some kind of memory. 
The initial status of the data registers after power-up is -irrelevant as long as the status registers are all reset to a defined value that indicates there is no valid data in -the pipeline’s data registers. Therefore, the pipeline data registers do not require a dedicated reset as they do not -control the actual operation (in contrast to the status register). This makes the pipeline data registers from -this example "uncritical registers". - -**NEORV32 CPU Reset** - -In terms of the NEORV32 CPU, there are several pipeline registers, state machine registers and even status -and control registers (CSRs) that do not require a defined initial state to ensure a correct boot process. The -pipeline registers will get initialized by the CPU’s internal state machines, which are initialized from the main -control engine that actually features a defined reset. The initialization of most of the CPU's core CSRs (like -interrupt control) is done by the software (to be more specific, this is done by the `crt0.S` start-up code). - -During the very early boot process (where `crt0.S` is running) there is no chance for undefined behavior due to -the lack of dedicated hardware resets of certain CSRs. For example the machine interrupt-enable CSR (`mie`) -does not provide a dedicated reset. The value after reset of this register is uncritical as interrupts cannot fire -because the global interrupt enable flag in the status register (`mstatus(mie)`) provides a dedicated -hardware reset setting it to low (globally disabling interrupts). - -**Reset Configuration** - -Most CPU-internal registers do feature an asynchronous reset in the VHDL code, but the "don't care" value -(VHDL `'-'`) is used for initialization of the uncritical registers, effectively generating a flip-flop without a -reset. However, certain applications or situations (like advanced gate-level / timing simulations) might -require a more deterministic reset state. 
For this case, a defined reset level (reset-to-low) of all registers can -be enabled via a constant in the main VHDL package file (`rtl/core/neorv32_package.vhd`): - -[source,vhdl] ----- --- "critical" number of PMP regions -- -constant dedicated_reset_c : boolean := false; -- use dedicated hardware reset value -for UNCRITICAL registers (FALSE=reset value is irrelevant (might simplify HW), -default; TRUE=defined LOW reset value) ----- Index: src_adoc/getting_started.adoc =================================================================== --- src_adoc/getting_started.adoc (revision 59) +++ src_adoc/getting_started.adoc (nonexistent) @@ -1,1168 +0,0 @@ -:sectnums: -== Let's Get It Started! - -To make your NEORV32 project run, follow the guides from the upcoming sections. Follow these guides -step by step and in the presented order. - -:sectnums: -=== Toolchain Setup - -There are two possibilities to get the actual RISC-V GCC toolchain: - -1. Download and _build_ the official RISC-V GNU toolchain yourself -2. Download and install a prebuilt version of the toolchain - -[NOTE] -The default toolchain prefix for this project is **`riscv32-unknown-elf`**. Of course you can use any other RISC-V -toolchain (like `riscv64-unknown-elf`) that is capable of emitting code for a `rv32` architecture. Just change the _RISCV_TOOLCHAIN_ variable in the application -makefile(s) according to your needs or define this variable when invoking the makefile. - -[IMPORTANT] -Keep in mind that – for instance – a rv32imc toolchain only provides library code compiled with -compressed (_C_) and `mul`/`div` instructions (_M_)! Hence, this code cannot be executed (without -emulation) on an architecture without these extensions! - - -:sectnums: -==== Building the Toolchain from Scratch - -To build the toolchain by yourself you can follow the guide from the official https://github.com/riscv/riscv-gnu-toolchain GitHub page. - -The official RISC-V repository uses submodules. 
You need the `--recursive` option to fetch the submodules -automatically: - -[source,bash] ----- -$ git clone --recursive https://github.com/riscv/riscv-gnu-toolchain ----- - -Download and install the prerequisite standard packages: - -[source,bash] ----- -$ sudo apt-get install autoconf automake autotools-dev curl python3 libmpc-dev libmpfr-dev libgmp-dev gawk build-essential bison flex texinfo gperf libtool patchutils bc zlib1g-dev libexpat-dev ----- - -To build the Linux cross-compiler, pick an install path. If you choose, say, `/opt/riscv`, then add -`/opt/riscv/bin` to your `PATH` variable. - -[source,bash] ----- -$ export PATH=$PATH:/opt/riscv/bin ----- - -Then, simply run the following commands and configuration in the RISC-V GNU toolchain source folder to compile a -`rv32i` toolchain: - -[source,bash] ----- -riscv-gnu-toolchain$ ./configure --prefix=/opt/riscv --with-arch=rv32i --with-abi=ilp32 -riscv-gnu-toolchain$ make ----- - -After a while you will get `riscv32-unknown-elf-gcc` and all of its friends in your `/opt/riscv/bin` folder. - - -:sectnums: -==== Downloading and Installing a Prebuilt Toolchain - -Alternatively, you can download a prebuilt toolchain. - -**Use The Toolchain I Have Built** - -I have compiled the toolchain on a 64-bit x86 Ubuntu (Ubuntu on Windows, actually) and uploaded it to -GitHub. You can directly download the corresponding toolchain archive as a single _zip-file_ within a packed -release from github.com/stnolting/riscv-gcc-prebuilt. - -Unpack the downloaded toolchain archive and copy the content to a location in your file system (e.g. -`/opt/riscv`). More information about downloading and installing my prebuilt toolchains can be found in -the repository's README. - -**Use a Third Party Toolchain** - -Of course you can also use any other prebuilt version of the toolchain. There are a lot of RISC-V GCC packages out there - -even for Windows. 
- -[IMPORTANT] -Make sure the toolchain can (also) emit code for a `rv32i` architecture, uses the `ilp32` or `ilp32e` ABI and **was not built** using -CPU extensions that are not supported by the NEORV32 (like `D`). - - -:sectnums: -==== Installation - -Now you have the binaries. The last step is to add them to your `PATH` environment variable (if you have not -already done so). Make sure to add the binaries folder (`bin`) of your toolchain. - -[source,bash] ----- -$ export PATH=$PATH:/opt/riscv/bin ----- - -You should add this command to your `.bashrc` (if you are using bash) to automatically add the RISC-V -toolchain at every console start. - -:sectnums: -==== Testing the Installation - -To make sure everything works fine, navigate to an example project in the NEORV32 example folder and -execute the following command: - -[source,bash] ----- -neorv32/sw/example/blink_led$ make check ----- - -This will test all the tools required for the NEORV32. Everything is working fine if "Toolchain check OK" appears at the end. - - - -<<< -// #################################################################################################################### -:sectnums: -=== General Hardware Setup - -The following steps are required to generate a bitstream for your FPGA board. If you want to run the -NEORV32 processor in simulation only, the following steps might also apply. - -[TIP] -Check out the example setups in the `boards` folder (@GitHub: https://github.com/stnolting/neorv32/tree/master/boards), which provides script-based -demo projects for various FPGA boards. - -In this tutorial we will use a test implementation of the processor – using many of the processor's optional -modules but just propagating the minimal signals to the outer world. Hence, this guide is intended as -evaluation or "hello world" project to check out the NEORV32. A little note: The order of the following -steps might be a little different for your specific EDA tool. - -[start=0] -. 
Create a new project with your FPGA EDA tool of choice. -. Add all VHDL files from the project's `rtl/core` folder to your project. Make sure to _reference_ the -files only – do not copy them. -. Make sure to add all the rtl files to a new library called **`neorv32`**. If your FPGA tool does not -provide a field to enter the library name, check out the "properties" menu of the rtl files. -. The `rtl/core/neorv32_top.vhd` VHDL file is the top entity of the NEORV32 processor. If you -already have a design, instantiate this unit into your design and proceed. -. If you do not have a design yet and just want to check out the NEORV32 – no problem! In this guide -we will use a simplified top entity that encapsulates the actual processor top entity: add the -`rtl/core/top_templates/neorv32_test_setup.vhd` VHDL file to your project too, and -select it as top entity. -. This test setup provides a minimal test hardware setup: - -.NEORV32 "hello world" test setup -image::neorv32_test_setup.png[align=center] - -[start=7] -. This test setup only implements some very basic processor and CPU features. Also, only the -minimum number of signals is propagated to the outer world. Please note that the reset input signal -`rstn_i` is **low-active**. -. The configuration of the NEORV32 processor is done using the generics of the instantiated processor -top entity. Let's keep things simple at first and use the default configuration: - -.Cut-out of `neorv32_test_setup.vhd` showing the processor instance and its configuration -[source,vhdl] ----- -neorv32_top_inst: neorv32_top -generic map ( - -- General -- - CLOCK_FREQUENCY => 100000000, -- in Hz # <1> - BOOTLOADER_EN => true, - USER_CODE => x"00000000", - ... - -- Internal instruction memory -- - MEM_INT_IMEM_EN => true, - MEM_INT_IMEM_SIZE => 16*1024, # <2> - MEM_INT_IMEM_ROM => false, - -- Internal data memory -- - MEM_INT_DMEM_EN => true, - MEM_INT_DMEM_SIZE => 8*1024, # <3> - ... 
----- -<1> Clock frequency of `clk_i` in Hertz -<2> Default size of internal instruction memory: 16kB (no need to change that _now_) -<3> Default size of internal data memory: 8kB (no need to change that _now_) - -[start=9] -. There is one generic that has to be set according to your FPGA / board: The clock frequency of the -top's clock input signal (`clk_i`). Use the _CLOCK_FREQUENCY_ generic to specify your clock source's -frequency in Hertz (Hz) (note "1"). -. If you feel like it – or if your FPGA does not provide so many resources – you can modify the -**memory sizes** (_MEM_INT_IMEM_SIZE_ and _MEM_INT_DMEM_SIZE_ – marked with notes "2" and "3") or even -exclude certain ISA extensions and peripheral modules from implementation - but as mentioned above, let's keep things -simple at first and use the standard configuration for now. - -[NOTE] -Keep the internal instruction and data memory sizes in mind – these values are required for setting -up the software framework in the next section <<_general_software_framework_setup>>. - -[start=11] -. Depending on your FPGA tool of choice, it is time to assign the signals of the test setup top entity to -the corresponding pins of your FPGA board. All the signals can be found in the entity declaration: - -.Entity signals of `neorv32_test_setup.vhd` -[source,vhdl] ----- -entity neorv32_test_setup is - port ( - -- Global control -- - clk_i : in std_ulogic := '0'; -- global clock, rising edge - rstn_i : in std_ulogic := '0'; -- global reset, low-active, async - -- GPIO -- - gpio_o : out std_ulogic_vector(7 downto 0); -- parallel output - -- UART0 -- - uart0_txd_o : out std_ulogic; -- UART0 send data - uart0_rxd_i : in std_ulogic := '0' -- UART0 receive data -); -end neorv32_test_setup; ----- - -[start=12] -. Attach the clock input `clk_i` to your clock source and connect the reset line `rstn_i` to a button of -your FPGA board. 
Check whether it is low-active or high-active – the reset signal of the processor is -**low-active**, so maybe you need to invert the input signal. -. If possible, connect at least bit `0` of the GPIO output port `gpio_o` to a high-active LED (invert -the signal when your LEDs are low-active) - this LED will be used as status LED by the bootloader. -. Finally, connect the primary UART's (UART0) communication signals `uart0_txd_o` and -`uart0_rxd_i` to your serial host interface (USB-to-serial converter). -. Perform the project HDL compilation (synthesis, mapping, bitstream generation). -. Download the generated bitstream into your FPGA ("program" it) and press the reset button (just to -make sure everything is in sync). -. Done! If you have assigned the bootloader status LED, it should be -flashing now and you should receive the bootloader start prompt in your UART console (check the baudrate!). - - - -<<< -// #################################################################################################################### -:sectnums: -=== General Software Framework Setup - -While your synthesis tool is crunching the NEORV32 HDL files, it is time to configure the project's software -framework for your processor hardware setup. - -[start=1] -. You need to tell the linker the actual size of the processor's instruction and data memories. This always has to be in sync -with the *hardware memory configuration* (done in section <<_general_hardware_setup>>). -. Open the NEORV32 linker script `sw/common/neorv32.ld` with a text editor. Right at the -beginning of the linker script you will find the **MEMORY** configuration showing two regions: `rom` and `ram` - -.Cut-out of the linker script `neorv32.ld`: Memory configuration -[source,c] ----- -MEMORY -{ - rom (rx) : ORIGIN = DEFINED(make_bootloader) ? 0xFFFF0000 : 0x00000000, LENGTH = DEFINED(make_bootloader) ? 
4*1024 : 16*1024 # <1> - ram (rwx) : ORIGIN = 0x80000000, LENGTH = 8*1024 # <2> -} ----- -<1> Size of internal instruction memory (IMEM): 16kB -<2> Size of internal data memory (DMEM): 8kB - -[WARNING] -The `rom` region provides conditional assignments (via the _make_bootloader_ symbol) for the _origin_ -and the _length_ configuration depending on whether the executable is built as normal application (for the IMEM) or -as bootloader code (for the BOOTROM). To modify the IMEM configuration of the `rom` region, -make sure to **only edit the right-most values** for `ORIGIN` and `LENGTH` (on the line marked with note "1"). - -[start=3] -. There are four parameters that are relevant here (only the right-most value for the `rom` section): The _origin_ -and the _length_ of the instruction memory (region name `rom`) and the _origin_ and the _length_ of the data -memory (region name `ram`). These four parameters always have to be in sync with your hardware memory -configuration as described in section <<_general_hardware_setup>>. - -[IMPORTANT] -The `rom` _ORIGIN_ parameter has to be equal to the configuration of the NEORV32 `ispace_base_c` -(default: 0x00000000) VHDL package (`rtl/core/neorv32_package.vhd`) configuration constant. The `ram` _ORIGIN_ parameter has to -be equal to the configuration of the NEORV32 `dspace_base_c` (default: 0x80000000) VHDL -package (`rtl/core/neorv32_package.vhd`) configuration constant. - -[IMPORTANT] -The `rom` _LENGTH_ and the `ram` _LENGTH_ parameters have to match the configured memory sizes. For -instance, if the system does not have any external memories connected, the `rom` _LENGTH_ parameter -has to be equal to the processor-internal IMEM size (defined via top's _MEM_INT_IMEM_SIZE_ generic) -and the `ram` _LENGTH_ parameter has to be equal to the processor-internal DMEM size (defined via top's -_MEM_INT_DMEM_SIZE_ generic). 
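As a quick cross-check of the memory configuration described above, the following shell sketch evaluates the default `LENGTH` expressions of the linker script and prints them as 8-digit hexadecimal values, which is the form the bootloader start screen uses for its `IMEM:` and `DMEM:` lines. This is only an illustration: the helper `fmt_size` and the variables `rom_len` and `ram_len` are hypothetical names and not part of the NEORV32 software framework, and the sizes are assumed to be the unchanged defaults.

```shell
#!/bin/sh
# Hypothetical helper (not part of NEORV32): format a byte count as 8-digit hex.
fmt_size() {
  printf '0x%08X' "$1"
}

# Default sizes as written in the linker script (assumed to be unchanged).
rom_len=$((16*1024))   # rom LENGTH, has to equal the MEM_INT_IMEM_SIZE generic
ram_len=$((8*1024))    # ram LENGTH, has to equal the MEM_INT_DMEM_SIZE generic

echo "rom LENGTH: ${rom_len} bytes ($(fmt_size "$rom_len"))"
echo "ram LENGTH: ${ram_len} bytes ($(fmt_size "$ram_len"))"
```

If the hexadecimal values printed here differ from the sizes that the bootloader reports for the IMEM and the DMEM, the linker script and the hardware configuration are most likely out of sync.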
- - - -<<< -// #################################################################################################################### -:sectnums: -=== Application Program Compilation - -[start=1] -. Open a terminal console and navigate to one of the project's example programs. For instance navigate to the -simple `sw/example/blink_led` example program. This program uses the NEORV32 GPIO unit to display -an 8-bit counter on the lowest eight bits of the `gpio_o` output port. -. To compile the project and generate an executable simply execute: - -[source,bash] ----- -neorv32/sw/example/blink_led$ make exe ----- - -[start=3] -. This will compile and link the application sources together with all the included libraries. At the end, -your application is transformed into an ELF file (`main.elf`). The *NEORV32 image generator* (in `sw/image_gen`) takes this file and creates a -final executable. The makefile will show the resulting memory utilization and the executable size: - -[source,bash] ----- -neorv32/sw/example/blink_led$ make exe -Memory utilization: - text data bss dec hex filename - 852 0 0 852 354 main.elf -Executable (neorv32_exe.bin) size in bytes: -864 ----- - -[start=4] -. That's it. The `exe` target has created the actual executable `neorv32_exe.bin` in the current -folder, which is ready to be uploaded to the processor via the bootloader's UART interface. - -[TIP] -The compilation process will also create a `main.asm` assembly listing file in the project directory, which -shows the actual assembly code of the complete application. - - - -<<< -// #################################################################################################################### -:sectnums: -=== Uploading and Starting of a Binary Executable Image via UART - -You have just created the executable. Now it is time to upload it to the processor. There are basically two -options to do so. - -[TIP] -Executables can also be uploaded via the **on-chip debugger**. 
-See section <<_debugging_with_gdb>> for more information. - -**Option 1** - -The NEORV32 makefiles provide an upload target that allows you to directly upload an executable from the -command line. Reset the processor and execute: - -[source,bash] ----- -sw/example/blink_led$ make COM_PORT=/dev/ttyUSB1 upload ----- - -Replace `/dev/ttyUSB1` with the actual serial port you are using to communicate with the processor. You -might have to use `sudo make ...` if the targeted device requires elevated access rights. - - -**Option 2** - -The "better" option is to use a standard terminal program to upload an executable. This provides a more -comfortable way as you can directly interact with the bootloader console. Additionally, using a terminal program -also allows you to communicate directly with the uploaded application. - -[start=1] -. Connect the primary UART (UART0) interface of your FPGA board to a serial port of your -computer or use a USB-to-serial adapter. -. Start a terminal program. In this tutorial, I am using TeraTerm for Windows. You can download it from https://ttssh2.osdn.jp/index.html.en - -[WARNING] -Make sure your terminal program can transfer the executable in raw byte mode without any protocol stuff around it. - -[start=3] -. Open a connection to the corresponding serial port. Configure the terminal according to the -following parameters: - -* 19200 Baud -* 8 data bits -* 1 stop bit -* no parity bits -* no transmission/flow control protocol! (just raw byte mode) -* newline on `\r\n` (carriage return & newline) - -[start=4] -. Also make sure that single chars are transmitted without any consecutive "new line" or "carriage -return" commands (this is highly dependent on your terminal application of choice, TeraTerm only -sends the raw chars by default). -. Press the NEORV32 reset button to restart the bootloader. The status LED starts blinking and the -bootloader intro screen appears in your console. Hurry up and press any key (hit space!) 
to abort the -automatic boot sequence and to start the actual bootloader user interface console. - -.Bootloader console; aborted auto-boot sequence -[source,bash] ----- -<< NEORV32 Bootloader >> - -BLDV: Mar 23 2021 -HWV: 0x01050208 -CLK: 0x05F5E100 -USER: 0x10000DE0 -MISA: 0x40901105 -ZEXT: 0x00000023 -PROC: 0x0EFF0037 -IMEM: 0x00004000 bytes @ 0x00000000 -DMEM: 0x00002000 bytes @ 0x80000000 - -Autoboot in 8s. Press key to abort. -Aborted. - -Available commands: -h: Help -r: Restart -u: Upload -s: Store to flash -l: Load from flash -e: Execute -CMD:> ----- - -[start=6] -. Execute the "Upload" command by typing `u`. Now the bootloader is waiting for a binary executable -to be sent. - -[source,bash] ----- -CMD:> u -Awaiting neorv32_exe.bin... ----- - -[start=7] -. Use the "send file" option of your terminal program to transmit the previously generated binary executable `neorv32_exe.bin`. -. Again, make sure to transmit the executable in raw binary mode (no transfer protocol, no additional -header stuff). When using TeraTerm, select the "binary" option in the send file dialog. -. If everything went fine, OK will appear in your terminal: - -[source,bash] ----- -CMD:> u -Awaiting neorv32_exe.bin... OK ----- - -[start=10] -. The executable now resides in the instruction memory of the processor. To execute the program right -now run the "Execute" command by typing `e`: - -[source,bash] ----- -CMD:> u -Awaiting neorv32_exe.bin... OK -CMD:> e -Booting... -Blinking LED demo program ----- - -[start=11] -. Now you should see the LEDs counting. - - - -<<< -// #################################################################################################################### -:sectnums: -=== Setup of a New Application Program Project - -Done with all the introduction tutorials and those example programs? Then it is time to start your own -application project! - -[start=1] -. 
The easiest way of creating a *new* project is to make a copy of an *existing* project (like the -`blink_led` project) inside the `sw/example` folder. By this, all file dependencies are kept and you can -start coding and compiling. -. If you want to place the project folder somewhere else you need to adapt the project's makefile. In -the makefile you will find a variable that keeps the relative or absolute path to the NEORV32 home -folder. Just modify this variable according to your new project's home location: - -[source,makefile] ----- -# Relative or absolute path to the NEORV32 home folder (use default if not set by user) -NEORV32_HOME ?= ../../.. ----- - -[start=3] -. If your project contains additional source files outside of the project folder, you can add them to the _APP_SRC_ variable: - -[source,makefile] ----- -# User's application sources (add additional files here) -APP_SRC = $(wildcard *.c) ../somewhere/some_file.c ----- - -[start=4] -. You also need to add the folder containing the include files of your new project to the _APP_INC_ variable (do not forget the `-I` prefix): - -[source,makefile] ----- -# User's application include folders (don't forget the '-I' before each entry) -APP_INC = -I . -I ../somewhere/include_stuff_folder ----- - -[start=5] -. If you feel like it, you can change the default optimization level: - -[source,makefile] ----- -# Compiler effort -EFFORT = -Os ----- - -[TIP] -All the assignments made to the makefile variables can also be done "inline" when invoking the makefile. For example: `$ make EFFORT=-Os clean_all exe` - - - - -<<< -// #################################################################################################################### -:sectnums: -=== Enabling RISC-V CPU Extensions - -Whenever you enable/disable a RISC-V CPU extension via the corresponding _CPU_EXTENSION_RISCV_x_ generic, you need to -adapt the toolchain configuration so the compiler can actually generate matching code for it. 
- -To do so, open the makefile of your project (for example `sw/example/blink_led/makefile`) and scroll to the -"USER CONFIGURATION" section right at the beginning of the file. You need to modify the _MARCH_ variable and possibly -the _MABI_ variable according to your CPU hardware configuration. - -[source,makefile] ----- -# CPU architecture and ABI -MARCH = -march=rv32i # <1> -MABI = -mabi=ilp32 # <2> ----- -<1> MARCH = Machine architecture ("ISA string") -<2> MABI = Machine binary interface - -For example when you enable the RISC-V `C` extension (16-bit compressed instructions) via the _CPU_EXTENSION_RISCV_C_ generic (set _true_) you need -to add the 'c' extension also to the _MARCH_ ISA string. - -You can also override the default _MARCH_ and _MABI_ configurations from the makefile when invoking the makefile: - -[source,bash] ----- -$ make MARCH=-march=rv32ic clean_all all ----- - -[NOTE] -The RISC-V ISA string (for _MARCH_) follows a certain canonical structure: -`rv32[i/e][m][a][f][d][g][q][c][b][v][n]...` For example `rv32imac` is valid while `rv32icma` is not valid. - - - - -<<< -// #################################################################################################################### -:sectnums: -=== Building a Non-Volatile Application without External Boot Memory - -The primary purpose of the bootloader is to allow an easy and fast update of the current application. In particular, this is very handy -during the development stage of a project as you can upload modified programs at any time via the UART. -Maybe at some time your project has become mature and you want to actually _embed_ your processor -including the application. - -There are two options to provide _non-volatile_ storage of your application. The simplest (but also most constrained) one is to implement the IMEM -as true ROM to contain your program. 
The second option is to use an external boot memory - this concept is shown in a different section: -<<_programming_an_external_spi_flash_via_the_bootloader>>. - -Using the IMEM as ROM: - -* for this boot concept the bootloader is no longer required -* this concept only works for the internal IMEM (but can be extended to work with external memories coupled via the processor's bus interface) -* make sure that the memory components (like block RAM) the IMEM is mapped to support an initialization via the bitstream - -[start=1] -. At first, compile your application code by running the `make install` command: - -[source,bash] ----- -neorv32/sw/example/blink_led$ make install -Memory utilization: - text data bss dec hex filename - 852 0 0 852 354 main.elf -Executable (neorv32_exe.bin) size in bytes: -864 -Installing application image to ../../../rtl/core/neorv32_application_image.vhd ----- - -[start=2] -. The `install` target has created an executable, too, but this time also in the form of a VHDL memory -initialization file. During synthesis, this initialization will become part of the final FPGA bitstream, which -in turn initializes the IMEM's memory primitives. -. To allow a direct boot of this image without interference of the bootloader you _can_ deactivate the implementation of -the bootloader via the corresponding top entity's generic: - -[source,vhdl] ----- -BOOTLOADER_EN => false, -- implement processor-internal bootloader? # <1> ----- -<1> Set to _false_ to make the CPU directly boot from the IMEM. In this case the BOOTROM is discarded from the design. - -[start=4] -. When the bootloader is deactivated, the corresponding module (BOOTROM) is removed from the design and the CPU will start booting -at the base address of the instruction memory space (IMEM base address) making the CPU directly execute your -application after reset. -. The IMEM could still be modified, since it is implemented as RAM by default, which might corrupt your -executable. 
To prevent this and to implement the IMEM as true ROM (and possibly to save some -more hardware resources), activate the "IMEM as ROM" feature using the processor's corresponding top entity -generic: - -[source,vhdl] ----- -MEM_INT_IMEM_ROM => true, -- implement processor-internal instruction memory as ROM ----- - -[start=6] -. Perform a new synthesis and upload your bitstream. Your application code now resides unchangeable -in the processor's IMEM and is directly executed after reset. - - - - -<<< -// #################################################################################################################### -:sectnums: -=== Customizing the Internal Bootloader - -The bootloader provides several configuration options to customize it for your specific applications. The -most important user-defined configuration options are available as C `#defines` right at the beginning of the -bootloader source code (`sw/bootloader/bootloader.c`): - -.Cut-out from the bootloader source code `bootloader.c`: configuration parameters -[source,c] ----- -/** UART BAUD rate */ -#define BAUD_RATE (19200) -/** Enable auto-boot sequence if != 0 */ -#define AUTOBOOT_EN (1) -/** Time until the auto-boot sequence starts (in seconds) */ -#define AUTOBOOT_TIMEOUT 8 -/** Set to 0 to disable bootloader status LED */ -#define STATUS_LED_EN (1) -/** SPI_DIRECT_BOOT_EN: Define/uncomment to enable SPI direct boot */ -//#define SPI_DIRECT_BOOT_EN -/** Bootloader status LED at GPIO output port */ -#define STATUS_LED (0) -/** SPI flash boot image base address (warning! address might wrap-around!) 
*/ -#define SPI_FLASH_BOOT_ADR (0x00800000) -/** SPI flash chip select line at spi_csn_o */ -#define SPI_FLASH_CS (0) -/** Default SPI flash clock prescaler */ -#define SPI_FLASH_CLK_PRSC (CLK_PRSC_8) -/** SPI flash sector size in bytes (default = 64kb) */ -#define SPI_FLASH_SECTOR_SIZE (64*1024) -/** ASCII char to start fast executable upload process */ -#define FAST_UPLOAD_CMD '#' ----- - -**Changing the Default Size of the Bootloader ROM** - -The NEORV32 default bootloader uses 4kB of storage. This is also the default size of the BOOTROM memory component. -If your new/modified bootloader exceeds this size, you need to modify the boot ROM configurations. - -[start=1] -. Open the processor's main package file `rtl/core/neorv32_package.vhd` and edit the -`boot_size_c` constant according to your requirements. The boot ROM size must not exceed 32kB -and should be a power of two (for optimal hardware mapping). - -[source,vhdl] ----- --- Bootloader ROM -- -constant boot_size_c : natural := 4*1024; -- bytes ----- - -[start=2] -. Now open the NEORV32 linker script `sw/common/neorv32.ld` and adapt the _LENGTH_ parameter -of the `rom` region according to your new memory size. `boot_size_c` and the `rom` _LENGTH_ attribute always have to be -identical. Do **not modify** the _ORIGIN_ of the `rom` section. - -[source,c] ----- -MEMORY -{ - rom (rx) : ORIGIN = DEFINED(make_bootloader) ? 0xFFFF0000 : 0x00000000, LENGTH = DEFINED(make_bootloader) ? 4*1024 : 16*1024 # <1> - ram (rwx) : ORIGIN = 0x80000000, LENGTH = 8*1024 -} ----- -<1> Bootloader ROM default size = 4*1024 bytes (**left** value) - -[IMPORTANT] -The `rom` region provides conditional assignments (via symbol `make_bootloader`) for the origin -and the length depending on whether the executable is built as normal application (for the IMEM) or -as bootloader code (for the BOOTROM). To modify the BOOTLOADER memory size, make -sure to edit the first (left) value for the length (note "1"). 
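The two size constraints stated above (at most 32kB, preferably a power of two) can be checked before editing `boot_size_c` and the linker script. The following shell sketch is only an illustration under that reading of the text; the function name `check_boot_size` is hypothetical and not part of the NEORV32 sources or makefiles.

```shell
#!/bin/sh
# Hypothetical helper (not part of NEORV32): classify a candidate boot ROM size
# given in bytes. Prints "invalid", "too large" (more than 32kB),
# "not a power of two" (possible, but hardware mapping might be suboptimal) or "ok".
check_boot_size() {
  size=$1
  if [ "$size" -le 0 ]; then
    echo "invalid"
  elif [ "$size" -gt $((32*1024)) ]; then
    echo "too large"
  elif [ $(( size & (size - 1) )) -ne 0 ]; then
    echo "not a power of two"
  else
    echo "ok"
  fi
}

check_boot_size $((4*1024))    # default boot ROM size
check_boot_size $((6*1024))    # fits, but is not a power of two
check_boot_size $((64*1024))   # exceeds the 32kB limit
```

A size that passes this check still has to be entered identically in both places: `boot_size_c` in the VHDL package and the left `LENGTH` value of the `rom` region in the linker script.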
- -**Re-Compiling and Re-Installing the Bootloader** - -Whenever you have modified the bootloader you need to recompile and re-install it and re-synthesize your design. - -[start=1] -. Compile and install the bootloader using the explicit `bootloader` makefile target. - -[source,bash] ----- -neorv32/sw/bootloader$ make bootloader ----- - -[start=2] -. Now perform a new synthesis / HDL compilation to update the bitstream with the new bootloader -image (some synthesis tools also allow updating only the BRAM initialization without re-running -the entire synthesis process). - -[NOTE] -The bootloader is intended to work regardless of the actual NEORV32 hardware configuration – -especially when it comes to CPU extensions. Hence, the bootloader should be built using the -minimal `rv32i` ISA only (`rv32e` would be even better). - - - - -<<< -// #################################################################################################################### -:sectnums: -=== Programming an External SPI Flash via the Bootloader - -As described in section <<_external_spi_flash_for_booting>> the bootloader provides an option to store an application image to an external SPI flash -and to read this image back for booting. These steps show how to store a - -[start=1] -. At first, reset the NEORV32 processor and wait until the bootloader start screen appears in your terminal program. -. Abort the auto boot sequence and start the user console by pressing any key. -. Press `u` to upload the program image that you want to store to the external flash: - -[source] ----- -CMD:> u -Awaiting neorv32_exe.bin... ----- - -[start=4] -. Send the binary in raw binary mode via your terminal program. When the upload is completed and "OK" -appears, press `p` to trigger the programming of the flash (do not execute the image via the `e` -command as this might corrupt the image): - -[source] ----- -CMD:> u -Awaiting neorv32_exe.bin... OK -CMD:> p -Write 0x000013FC bytes to SPI flash @ 0x00800000? 
(y/n) ----- - -[start=5] -. The bootloader shows the size of the executable and the base address inside the SPI flash where the -executable is going to be stored. A prompt appears: Type `y` to start the programming or type `n` to -abort. See section <<_external_spi_flash_for_booting>> for more information on how to configure the base address. - -[source] ----- -CMD:> u -Awaiting neorv32_exe.bin... OK -CMD:> p -Write 0x000013FC bytes to SPI flash @ 0x00800000? (y/n) y -Flashing... OK -CMD:> ----- - -[start=6] -. If "OK" appears in the terminal line, the programming process was successful. Now you can use the -auto boot sequence to automatically boot your application from the flash at system start-up without -any user interaction. - - - -<<< -// #################################################################################################################### -:sectnums: -=== Simulating the Processor - -**Testbench** - -The NEORV32 project features a simple default testbench (`sim/neorv32_tb.vhd`) that can be used to simulate -and test the processor setup. This testbench features a 100MHz clock and enables all optional peripherals and -CPU extensions except for the `E` extension and the TRNG IO module (that CANNOT be simulated due to its -combinatorial (looped) oscillator architecture). - -The simulation setup is configured via the "User Configuration" section located right at the beginning of -the testbench's architecture. Each configuration constant provides comments to explain the functionality. - -Besides the actual NEORV32 Processor, the testbench also simulates "external" components that are connected -to the processor's external bus/memory interface. 
These components are: - -* an external instruction memory (that also allows booting from it) -* an external data memory -* an external memory to simulate "external IO devices" -* a memory-mapped register to trigger the processor's interrupt signals - -The following table shows the base addresses of these four components and their default configuration and -properties (attributes: `r` = read, `w` = write, `e` = execute, `a` = atomic accesses possible, `8` = byte-accessible, `16` = -half-word-accessible, `32` = word-accessible). - -.Testbench: processor-external memories -[cols="^4,>3,^5,<11"] -[options="header",grid="rows"] -|======================= -| Base address | Size | Attributes | Description -| `0x00000000` | `imem_size_c` | `r/w/e, a, 8/16/32` | external IMEM (initialized with application image) -| `0x80000000` | `dmem_size_c` | `r/w/e, a, 8/16/32` | external DMEM -| `0xf0000000` | 64 bytes | `r/w/e, !a, 8/16/32` | external "IO" memory, atomic accesses will fail -| `0xff000000` | 4 bytes | `-/w/-, a, -/-/32` | memory-mapped register to trigger "machine external", "machine software" and "SoC Fast Interrupt" interrupts -|======================= - -The simulated NEORV32 does not use the bootloader and directly boots the current application image (from -the `rtl/core/neorv32_application_image.vhd` image file). Make sure to use the `all` target of the -makefile to install your application as VHDL image after compilation: - -[source, bash] ----- -sw/example/blink_led$ make clean_all all ----- - -.Simulation-Optimized CPU/Processors Modules -[NOTE] -The `sim/rtl_modules` folder provides simulation-optimized versions of certain CPU/processor modules. -These alternatives can be used to replace the default CPU/processor HDL files to allow faster/easier/more -efficient simulation. 
**These files are not intended for synthesis!** - -**Simulation Console Output** - -Data written to the NEORV32 UART0 / UART1 transmitter is sent to a virtual UART receiver implemented -as part of the testbench. Received chars are sent to the simulator console and are also stored to a log file -(`neorv32.testbench_uart0.out` for UART0, `neorv32.testbench_uart1.out` for UART1) inside the simulator home folder. - -**Faster Simulation Console Output** - -When printing data via the UART the communication speed will always be based on the configured BAUD -rate. For a simulation this might take some time. To have faster output you can enable the **simulation mode** -of UART0/UART1 (see section <<_primary_universal_asynchronous_receiver_and_transmitter_uart0>>). - -ASCII data sent to UART0 will be immediately printed to the simulator console. Additionally, the -ASCII data is logged in a file (`neorv32.uart0.sim_mode.text.out`) in the simulator home folder. All -written 32-bit data is also dumped as 8-char hexadecimal value into a file -(`neorv32.uart0.sim_mode.data.out`) also in the simulator home folder. - -ASCII data sent to UART1 will be immediately printed to the simulator console. Additionally, the -ASCII data is logged in a file (`neorv32.uart1.sim_mode.text.out`) in the simulator home folder. All -written 32-bit data is also dumped as 8-char hexadecimal value into a file -(`neorv32.uart1.sim_mode.data.out`) also in the simulator home folder. - -You can "automatically" enable the simulation mode of UART0/UART1 when compiling an application. In this case the -"real" UART0/UART1 transmitter unit is permanently disabled. 
To enable the simulation mode just compile -and install your application and add _UART0_SIM_MODE_ for UART0 and/or _UART1_SIM_MODE_ for UART1 to -the compiler's _USER_FLAGS_ variable (do not forget the `-D` prefix flag): - -[source, bash] ----- -sw/example/blink_led$ make USER_FLAGS+=-DUART0_SIM_MODE clean_all all ----- - -The provided define will change the default UART0/UART1 setup function in order to set the simulation mode flag in the corresponding UART's control register. - -[NOTE] -The UART simulation output (to file and to screen) outputs "complete lines" at once. A line is -completed with a line feed (newline, ASCII `\n` = 10). - -**Simulation with Xilinx Vivado** - -The project features a default Vivado simulation waveform configuration in `sim/vivado`. - -**Simulation with GHDL** - -To simulate the processor using _GHDL_ navigate to the `sim` folder and run the provided shell script. All arguments are passed to GHDL. -For example the simulation time can be configured using `--stop-time=4ms` as argument. - -[source, bash] ----- -neorv32/sim$ sh ghdl_sim.sh --stop-time=4ms ----- - - - -<<< -// #################################################################################################################### -:sectnums: -=== Building the Software Framework Documentation - -All core library software sources (libraries `sw/lib`, example programs `sw/example`, ...) are extensively documented using _doxygen_. -To build the documentation yourself, navigate to the project's `docs` folder and run _doxygen_: - -[source,bash] ----- -neorv32/docs$ doxygen Doxyfile ----- - -This will generate the `docs/doxygen_build` folder. To view the documentation, open the -`docs/doxygen_build/html/index.html` file with your browser of choice. Click on the "files" tab to -see a list of all documented files. - -[TIP] -The documentation is automatically built and deployed to GitHub pages by a CI workflow (https://stnolting.github.io/neorv32/sw/files.html). 
- - - -// #################################################################################################################### -:sectnums: -=== Building the Project Documentation - -This data sheet is written using `asciidoc`. The source files are located in `docs/src_adoc`. -A makefile in the project's root directory is provided to either build a single PDF file -or to build an HTML-based documentation. - -Pre-rendered PDFs are available online as nightly pre-releases: https://github.com/stnolting/neorv32/releases. -The HTML-based documentation is also available online at the project's https://stnolting.github.io/neorv32/[GitHub Pages]. - -.Generate PDF documentation `docs/NEORV32.pdf` using `asciidoctor-pdf` -[source,bash] ----- -neorv32$ make pdf ----- - -.Generate HTML documentation `docs/index.html` using `asciidoctor` -[source,bash] ----- -neorv32$ make html ----- - -[TIP] -If you don't have `asciidoctor` / `asciidoctor-pdf` installed, you can still generate all the documentation using -a _docker container_ via `make container`. - - -<<< -// #################################################################################################################### -:sectnums: -=== FreeRTOS Support - -A NEORV32-specific port and a simple demo for FreeRTOS (https://github.com/FreeRTOS/FreeRTOS) are -available in the `sw/example/demo_freeRTOS` folder. - -See the corresponding documentation (`sw/example/demo_freeRTOS/README.md`) for more information. 
- - - -// #################################################################################################################### -:sectnums: -=== RISC-V Architecture Test Framework - -The NEORV32 Processor passes the corresponding tests provided by the official RISC-V Architecture Test Suite -(V2.0+), which is available online at GitHub: https://github.com/riscv/riscv-arch-test - -All files required for executing the test framework on a simulated instance of the processor (including port -files) are located in the `riscv-arch-test` folder in the root directory of the NEORV32 repository. Take a -look at the provided `riscv-arch-test/README.md` (https://github.com/stnolting/neorv32/blob/master/riscv-arch-test/README.md[online at GitHub]) -file for more information on how to run the tests and how testing is conducted in detail. - - - -<<< -// #################################################################################################################### -:sectnums: -=== Debugging using the On-Chip Debugger - -The NEORV32 <<_on_chip_debugger_ocd>> allows _online_ in-system debugging via an external JTAG access port from a -host machine. The general flow is independent of the host machine's operating system. However, this tutorial uses -Windows and Linux (Ubuntu on Windows) in parallel. - -[NOTE] -This tutorial uses `gdb` to **directly upload an executable** to the processor. If you are using the default -processor setup _with_ internal instruction memory (IMEM) make sure it is implemented as RAM -(_MEM_INT_IMEM_ROM_ generic = false). - - -:sectnums: -==== Hardware Requirements - -Make sure the on-chip debugger of your NEORV32 setup is implemented (_ON_CHIP_DEBUGGER_EN_ generic = true). -Connect a JTAG adapter to the NEORV32 `jtag_*` interface signals. If you do not have a full-scale JTAG adapter, you can -also use an FTDI-based adapter like the "FT2232H-56Q Mini Module", which is a simple and inexpensive FTDI breakout board. 
- -.JTAG pin mapping -[cols="^3,^2,^2"] -[options="header",grid="rows"] -|======================= -| NEORV32 top signal | JTAG signal | FTDI port -| `jtag_tck_i` | TCK | D0 -| `jtag_tdi_i` | TDI | D1 -| `jtag_tdo_o` | TDO | D2 -| `jtag_tms_i` | TMS | D3 -| `jtag_trst_i` | TRST | D4 -|======================= - -[TIP] -The low-active JTAG _test reset_ (TRST) signal is _optional_ as a reset can also be triggered via the TAP controller. -If TRST is not used make sure to pull the signal _high_. - - -:sectnums: -==== OpenOCD - -The NEORV32 on-chip debugger can be accessed using the https://github.com/riscv/riscv-openocd[RISC-V port of OpenOCD]. -Prebuilt binaries can be obtained - for example - from https://www.sifive.com/software[SiFive]. A pre-configured -OpenOCD configuration file (`sw/openocd/openocd_neorv32.cfg`) is available that allows easy access to the NEORV32 CPU. - -[NOTE] -You might need to adapt `ftdi_vid_pid`, `ftdi_channel` and `ftdi_layout_init` in `sw/openocd/openocd_neorv32.cfg` -according to your interface chip and your operating system. - -[TIP] -If you want to modify the JTAG clock speed (via `adapter speed` in `sw/openocd/openocd_neorv32.cfg`) make sure to meet -the clock requirements noted in <<_debug_transport_module_dtm>>. - -To access the processor using OpenOCD, open a terminal and start OpenOCD with the pre-configured configuration file. 
- -.Connecting via OpenOCD (on Windows) -[source, bash] --------------------------- -N:\Projects\neorv32\sw\openocd>openocd -f openocd_neorv32.cfg -Open On-Chip Debugger 0.11.0-rc1+dev (SiFive OpenOCD 0.10.0-2020.12.1) -Licensed under GNU GPL v2 -For bug reports: - https://github.com/sifive/freedom-tools/issues -1 -Info : Listening on port 6666 for tcl connections -Info : Listening on port 4444 for telnet connections -Info : clock speed 1000 kHz -Info : JTAG tap: neorv32.cpu tap/device found: 0x0cafe001 (mfg: 0x000 (), part: 0xcafe, ver: 0x0) -Info : datacount=1 progbufsize=2 -Info : Disabling abstract command reads from CSRs. -Info : Examined RISC-V core; found 1 harts -Info : hart 0: XLEN=32, misa=0x40801105 -Info : starting gdb server for neorv32.cpu.0 on 3333 -Info : Listening on port 3333 for gdb connections --------------------------- - -OpenOCD has successfully connected to the NEORV32 on-chip debugger and has examined the CPU (showing the content of -the `misa` CSR). Now you can use `gdb` to connect via port 3333. - - -:sectnums: -==== Debugging with GDB - -This guide uses the simple "blink example" from `sw/example/blink_led` as a simplified test application to -show the basics of in-system debugging. - -At first, the application needs to be compiled. We will use the minimal machine architecture configuration -(`rv32i`) here to be independent of the actual processor/CPU configuration. -Navigate to `sw/example/blink_led` and compile the application: - -.Compile the test application -[source, bash] --------------------------- -.../neorv32/sw/example/blink_led$ make MARCH=-march=rv32i clean_all all --------------------------- - -This will generate an ELF file `main.elf` that contains all the symbols required for debugging. -Furthermore, an assembly listing file `main.asm` is generated that we will use to define breakpoints. - -Open another terminal in `sw/example/blink_led` and start `gdb`. -The GNU debugger is part of the toolchain (see <<_toolchain_setup>>). 
- -.Starting GDB (on Linux (Ubuntu on Windows)) -[source, bash] --------------------------- -.../neorv32/sw/example/blink_led$ riscv32-unknown-elf-gdb -GNU gdb (GDB) 10.1 -Copyright (C) 2020 Free Software Foundation, Inc. -License GPLv3+: GNU GPL version 3 or later -This is free software: you are free to change and redistribute it. -There is NO WARRANTY, to the extent permitted by law. -Type "show copying" and "show warranty" for details. -This GDB was configured as "--host=x86_64-pc-linux-gnu --target=riscv32-unknown-elf". -Type "show configuration" for configuration details. -For bug reporting instructions, please see: -. -Find the GDB manual and other documentation resources online at: - . - -For help, type "help". -Type "apropos word" to search for commands related to "word". -(gdb) --------------------------- - -Now connect to OpenOCD using the default port 3333 on your local machine. -Set the ELF file we want to debug to the recently generated `main.elf` from the `blink_led` example. -Finally, upload the program to the processor. - -[NOTE] -The executable that is uploaded to the processor is **not** the default NEORV32 executable (`neorv32_exe.bin`) that -is used for uploading via the bootloader. Instead, all the required sections (like `.text`) are extracted from `main.elf` -by GDB and uploaded via the debugger's indirect memory access. - -.Running GDB -[source, bash] --------------------------- -(gdb) target remote localhost:3333 <1> -Remote debugging using localhost:3333 -warning: No executable has been specified and target does not support -determining executable automatically. Try using the "file" command. -0xffff0c94 in ?? () <2> -(gdb) file main.elf <3> -A program is being debugged already. -Are you sure you want to change the file? (y or n) y -Reading symbols from main.elf... 
-(gdb) load <4> -Loading section .text, size 0xd0c lma 0x0 -Loading section .rodata, size 0x39c lma 0xd0c -Start address 0x00000000, load size 4264 -Transfer rate: 43 KB/sec, 2132 bytes/write. -(gdb) --------------------------- -<1> Connect to OpenOCD -<2> The CPU was still executing code from the bootloader ROM - but that does not matter here -<3> Select `main.elf` from the `blink_led` example -<4> Upload the executable - -After the upload, GDB will make the processor jump to the beginning of the uploaded executable -(by default, this is the beginning of the instruction memory at `0x00000000`) skipping the bootloader -and halting the CPU right before executing the `blink_led` application. - - -:sectnums: -===== Breakpoint Example - -The following steps are just a small showcase that illustrates a simple debugging scheme. - -While compiling `blink_led`, an assembly listing file `main.asm` was generated. -Open this file with a text editor to check out what the CPU is going to do when resumed. - -The `blink_led` example implements a simple counter on the 8 lowest GPIO output ports. The program uses -"busy wait" to have a visible delay between increments. This waiting is done by calling the `neorv32_cpu_delay_ms` -function. We will add a _breakpoint_ right at the end of this wait function so we can step through the iterations -of the counter. - -.Cut-out from `main.asm` generated from the `blink_led` example -[source, assembly] --------------------------- -00000688 <__neorv32_cpu_delay_ms_end>: - 688: 01c12083 lw ra,28(sp) - 68c: 02010113 addi sp,sp,32 - 690: 00008067 ret --------------------------- - -The very last instruction of the `neorv32_cpu_delay_ms` function is `ret` (= return) -at hexadecimal `690` in this example. Add this address as _breakpoint_ to GDB. - -[NOTE] -The address might be different if you use a different version of the software framework or -if different ISA options are configured. 
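Because the breakpoint address can move between builds, it can be looked up from the listing instead of being read off by hand. The following is a minimal host-side sketch, not part of the NEORV32 software framework: the helper name `find_ret_address` is hypothetical, and it assumes that `main.asm` uses the objdump-style layout shown in the cut-out above.

```python
import re

# Matches objdump-style instruction lines such as " 690:   00008067   ret".
INSN_RE = re.compile(r"^\s*([0-9a-fA-F]+):\s+[0-9a-fA-F]+\s+(\S+)")
# Matches symbol headers such as "00000688 <__neorv32_cpu_delay_ms_end>:".
LABEL_RE = re.compile(r"^[0-9a-fA-F]+\s+<([^>]+)>:")


def find_ret_address(listing: str, label: str):
    """Return the address of the first `ret` below `label`, or None."""
    in_label = False
    for line in listing.splitlines():
        header = LABEL_RE.match(line)
        if header:
            # A new symbol starts; remember whether it is the one we want.
            in_label = header.group(1) == label
            continue
        insn = INSN_RE.match(line)
        if in_label and insn and insn.group(2) == "ret":
            return int(insn.group(1), 16)
    return None


# Sample taken from the cut-out above (addresses differ between builds).
SAMPLE = """\
00000688 <__neorv32_cpu_delay_ms_end>:
 688:   01c12083   lw ra,28(sp)
 68c:   02010113   addi sp,sp,32
 690:   00008067   ret
"""

addr = find_ret_address(SAMPLE, "__neorv32_cpu_delay_ms_end")
print(f"b * {addr:#x}")
```

For the sample this prints the GDB command `b * 0x690`. To use it on a real build, the contents of `main.asm` would be read from disk and passed in instead of `SAMPLE`.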
- -.Adding a GDB breakpoint -[source, bash] --------------------------- -(gdb) b * 0x690 -Breakpoint 1 at 0x690 --------------------------- - -Now execute `c` (= continue). The CPU will resume operation until it hits the break-point. -By this we can "step" from increment to increment. - -.Iterating from breakpoint to breakpoint -[source, bash] --------------------------- -Breakpoint 1 at 0x690 -(gdb) c -Continuing. - -Breakpoint 1, 0x00000690 in neorv32_cpu_delay_ms () -(gdb) c -Continuing. - -Breakpoint 1, 0x00000690 in neorv32_cpu_delay_ms () -(gdb) c -Continuing. --------------------------- - Index: src_adoc/neorv32-theme.yml =================================================================== --- src_adoc/neorv32-theme.yml (revision 59) +++ src_adoc/neorv32-theme.yml (nonexistent) @@ -1,48 +0,0 @@ -extends: default -page: - margin: [0.8in, 0.67in, 0.75in, 0.67in] -link: - font-color: #edac00 -image: - align: center -caption: - align: center -running-content: - start-at: toc -header: - height: 0.65in - vertical-align: bottom - image-vertical-align: bottom - font-size: 11 - border-color: #000000 - border-width: 1 - recto: - left: - content: '*The NEORV32 Processor*' - right: - content: '*Visit on https://github.com/stnolting/neorv32[GitHub]*' - verso: - left: - content: '*The NEORV32 Processor*' - right: - content: '*Visit on https://github.com/stnolting/neorv32[GitHub]*' -footer: - start-at: toc - height: 0.75in - font-size: 10 - border-color: #000000 - border-width: 1 - recto: - left: - content: '{page-number} / {page-count}' - center: - content: 'Copyright (c) 2021, Stephan Nolting. All rights reserved.' 
- right: - content: '{docdate}' - verso: - left: - content: '{page-number} / {page-count}' - center: - content: 'NEORV32 Version: {revnumber}' - right: - content: '{docdate}' Index: datasheet/content.adoc =================================================================== --- datasheet/content.adoc (nonexistent) +++ datasheet/content.adoc (revision 60) @@ -0,0 +1,14 @@ +<<< +// #################################################################################################################### + +include::overview.adoc[] + +include::soc.adoc[] + +include::cpu.adoc[] + +include::software.adoc[] + +include::on_chip_debugger.adoc[] + +include::../legal.adoc[] Index: datasheet/cpu.adoc =================================================================== --- datasheet/cpu.adoc (nonexistent) +++ datasheet/cpu.adoc (revision 60) @@ -0,0 +1,1005 @@ +:sectnums: +== NEORV32 Central Processing Unit (CPU) + +image::riscv_logo.png[width=350,align=center] + +**Key Features** + +* 32-bit pipelined/multi-cycle in-order `rv32` RISC-V CPU +* Optional RISC-V extensions: `rv32[i/e][m][a][c][u]` + `[Zfinx][Zicsr][Zifencei]` + `[debug_mode]` (for on-chip debugging) +* Compatible to the RISC-V user specifications and a subset of the RISC-V privileged architecture specifications – passes the official RISC-V Architecture Tests (v2+) +* Official RISC-V open-source architecture ID +* Standard RISC-V interrupts (_external_, _timer_, _software_) plus 16 _fast_ interrupts and 1 non-maskable interrupt +* Supports most of the traps from the RISC-V specifications (including bus access exceptions) and traps on all unimplemented/illegal/malformed instructions +* Optional physical memory protection (PMP), compatible to the RISC-V specifications +* Optional hardware performance monitors (HPM) for application benchmarking +* Separated interfaces for instruction fetch and data access (merged into single bus via a bus switch for +the NEORV32 processor) +* little-endian byte order +* Configurable 
hardware reset +* No hardware support of unaligned data/instruction accesses – they will trigger an exception. If the C extension is enabled instructions +can also be 16-bit aligned and a misaligned instruction address exception is not possible anymore + +[NOTE] +It is recommended to use the **NEORV32 Processor** as default top instance even if you only want to use the actual +CPU. Simply disable all the processor-internal modules via the generics and you will get a "CPU +wrapper" that provides a minimal CPU environment and an external bus interface (like AXI4). This +setup also allows you to keep using the default bootloader and software framework. From this base you +can start building your own SoC. Of course you can also use the CPU in its true stand-alone mode. + +[NOTE] +This documentation assumes the reader is familiar with the official RISC-V "User" and "Privileged Architecture" specifications. + +<<< +// #################################################################################################################### +:sectnums: +=== Architecture + +The NEORV32 CPU was designed from scratch based only on the official ISA / privileged architecture +specifications. The following figure shows the simplified architecture of the CPU. + +image::neorv32_cpu.png[align=center] + +The CPU uses a pipelined architecture with basically two main stages. The first stage (IF – instruction fetch) +is responsible for fetching new instruction data from memory via the fetch engine. The instruction data is +stored to a FIFO – the instruction prefetch buffer. The issue engine takes this data and assembles 32-bit +instruction words for the next pipeline stage. Compressed instructions – if enabled – are also decompressed +in this stage. The second stage (EX – execution) is responsible for actually executing the fetched instructions +via the execute engine. + +These two pipeline stages are based on a multi-cycle processing engine. 
So the processing of each stage for +certain operations can take several cycles. Since the IF and EX stages are decoupled via the instruction +prefetch buffer, both stages can operate in parallel and with overlapping operations. Hence, the optimal CPI +(cycles per instruction) is 2, but it can be significantly higher: for instance, when executing loads/stores, +multi-cycle operations like divisions, or when the instruction fetch engine has to reload the prefetch buffers +due to a taken branch. + +Basically, the NEORV32 CPU is somewhere between a classical pipelined architecture, where each stage +requires exactly one processing cycle (if not stalled) and a classical multi-cycle architecture, which executes +every single instruction in a series of consecutive micro-operations. The combination of these two classical +design paradigms allows faster instruction execution than a pure multi-cycle approach (due to +the pipelined approach) at a reduced hardware footprint (due to the multi-cycle approach). + +The CPU provides independent interfaces for instruction fetch and data access. These two bus interfaces are +merged into a single processor-internal bus via a bus switch. Hence, memory locations including peripheral +devices are mapped to a single 32-bit address space making the architecture a modified Von-Neumann +Architecture. + + +// #################################################################################################################### +:sectnums: +=== RISC-V Compatibility + +The NEORV32 CPU passes the rv32_m/I, rv32_m/M, rv32_m/C, rv32_m/privilege, and +rv32_m/Zifencei tests of the official RISC-V Architecture Tests (GitHub). The port files for the +NEORV32 processor are located in the repository's `riscv-arch-test` folder. See section <<_risc_v_architecture_test_framework>> +for information on how to run the tests on the NEORV32. + +.**RISC-V `rv32_m/C` Tests** +................................... +Check cadd-01 ... OK +Check caddi-01 ... 
OK +Check caddi16sp-01 ... OK +Check caddi4spn-01 ... OK +Check cand-01 ... OK +Check candi-01 ... OK +Check cbeqz-01 ... OK +Check cbnez-01 ... OK +Check cebreak-01 ... OK +Check cj-01 ... OK +Check cjal-01 ... OK +Check cjalr-01 ... OK +Check cjr-01 ... OK +Check cli-01 ... OK +Check clui-01 ... OK +Check clw-01 ... OK +Check clwsp-01 ... OK +Check cmv-01 ... OK +Check cnop-01 ... OK +Check cor-01 ... OK +Check cslli-01 ... OK +Check csrai-01 ... OK +Check csrli-01 ... OK +Check csub-01 ... OK +Check csw-01 ... OK +Check cswsp-01 ... OK +Check cxor-01 ... OK +-------------------------------- +OK: 27/27 RISCV_TARGET=neorv32 RISCV_DEVICE=C XLEN=32 +................................... + +.**RISC-V `rv32_m/I` Tests** +................................... +Check add-01 ... OK +Check addi-01 ... OK +Check and-01 ... OK +Check andi-01 ... OK +Check auipc-01 ... OK +Check beq-01 ... OK +Check bge-01 ... OK +Check bgeu-01 ... OK +Check blt-01 ... OK +Check bltu-01 ... OK +Check bne-01 ... OK +Check fence-01 ... OK +Check jal-01 ... OK +Check jalr-01 ... OK +Check lb-align-01 ... OK +Check lbu-align-01 ... OK +Check lh-align-01 ... OK +Check lhu-align-01 ... OK +Check lui-01 ... OK +Check lw-align-01 ... OK +Check or-01 ... OK +Check ori-01 ... OK +Check sb-align-01 ... OK +Check sh-align-01 ... OK +Check sll-01 ... OK +Check slli-01 ... OK +Check slt-01 ... OK +Check slti-01 ... OK +Check sltiu-01 ... OK +Check sltu-01 ... OK +Check sra-01 ... OK +Check srai-01 ... OK +Check srl-01 ... OK +Check srli-01 ... OK +Check sub-01 ... OK +Check sw-align-01 ... OK +Check xor-01 ... OK +Check xori-01 ... OK +-------------------------------- +OK: 38/38 RISCV_TARGET=neorv32 RISCV_DEVICE=I XLEN=32 +................................... + +.**RISC-V `rv32_m/M` Tests** +................................... +Check div-01 ... OK +Check divu-01 ... OK +Check mul-01 ... OK +Check mulh-01 ... OK +Check mulhsu-01 ... OK +Check mulhu-01 ... OK +Check rem-01 ... OK +Check remu-01 ... 
OK +-------------------------------- +OK: 8/8 RISCV_TARGET=neorv32 RISCV_DEVICE=M XLEN=32 +................................... + +.**RISC-V `rv32_m/privilege` Tests** +................................... +Check ebreak ... OK +Check ecall ... OK +Check misalign-beq-01 ... OK +Check misalign-bge-01 ... OK +Check misalign-bgeu-01 ... OK +Check misalign-blt-01 ... OK +Check misalign-bltu-01 ... OK +Check misalign-bne-01 ... OK +Check misalign-jal-01 ... OK +Check misalign-lh-01 ... OK +Check misalign-lhu-01 ... OK +Check misalign-lw-01 ... OK +Check misalign-sh-01 ... OK +Check misalign-sw-01 ... OK +Check misalign1-jalr-01 ... OK +Check misalign2-jalr-01 ... OK +-------------------------------- +OK: 16/16 RISCV_TARGET=neorv32 RISCV_DEVICE=privilege XLEN=32 +................................... + +.**RISC-V `rv32_m/Zifencei` Tests** +................................... +Check Fencei ... OK +-------------------------------- +OK: 1/1 RISCV_TARGET=neorv32 RISCV_DEVICE=Zifencei XLEN=32 +................................... + + +<<< +:sectnums: +==== RISC-V Incompatibility Issues and Limitations + +This list shows the currently known issues regarding full RISC-V-compatibility. More specific information +can be found in section <<_instruction_sets_and_extensions>>. + +[IMPORTANT] +The `misa` CSR is read-only. It shows the synthesized CPU extensions. Hence, all implemented +CPU extensions are always active and cannot be enabled/disabled dynamically during runtime. Any +write access to it (in machine mode) is ignored and will not cause any exception or side-effects. + +[IMPORTANT] +The `mip` CSR is read-only. Pending IRQs can be cleared using the `mie` CSR. + +[IMPORTANT] +The `mtval` CSR is read-only. + +[IMPORTANT] +The physical memory protection (see section <<_machine_physical_memory_protection>>) +only supports the modes _OFF_ and _NAPOT_ yet and a minimal granularity of 8 bytes per region. 
+ +[IMPORTANT] +The `A` CPU extension (atomic memory access) only implements the `lr.w` and `sc.w` instructions yet. +However, these instructions are sufficient to emulate all further AMO operations. + + +<<< +// #################################################################################################################### +:sectnums: +=== CPU Top Entity - Signals + +The following table shows all interface signals of the CPU top entity `rtl/core/neorv32_cpu.vhd`. The +type of all signals is _std_ulogic_ or _std_ulogic_vector_, respectively. The "Dir." column shows the signal +direction seen from the CPU. + +.NEORV32 CPU top entity signals +[cols="<2,^1,^1,<6"] +[options="header", grid="rows"] +|======================= +| Signal | Width | Dir. | Function +4+^| **Global Signals** +| `clk_i` | 1 | in | global clock line, all registers triggering on rising edge +| `rstn_i` | 1 | in | global reset, low-active +| `sleep_o` | 1 | out | CPU is in sleep mode when set +4+^| **Instruction Bus Interface (<<_bus_interface>>)** +| `i_bus_addr_o` | 32 | out | destination address +| `i_bus_rdata_i` | 32 | in | read data +| `i_bus_wdata_o` | 32 | out | write data (always zero) +| `i_bus_ben_o` | 4 | out | byte enable +| `i_bus_we_o` | 1 | out | write transaction (always zero) +| `i_bus_re_o` | 1 | out | read transaction +| `i_bus_lock_o` | 1 | out | exclusive access request (always zero) +| `i_bus_ack_i` | 1 | in | bus transfer acknowledge from accessed peripheral +| `i_bus_err_i` | 1 | in | bus transfer terminate from accessed peripheral +| `i_bus_fence_o` | 1 | out | indicates an executed _fence.i_ instruction +| `i_bus_priv_o` | 2 | out | current CPU privilege level +4+^| **Data Bus Interface (<<_bus_interface>>)** +| `d_bus_addr_o` | 32 | out | destination address +| `d_bus_rdata_i` | 32 | in | read data +| `d_bus_wdata_o` | 32 | out | write data +| `d_bus_ben_o` | 4 | out | byte enable +| `d_bus_we_o` | 1 | out | write transaction +| `d_bus_re_o` | 1 | out | read 
transaction +| `d_bus_lock_o` | 1 | out | exclusive access request +| `d_bus_ack_i` | 1 | in | bus transfer acknowledge from accessed peripheral +| `d_bus_err_i` | 1 | in | bus transfer terminate from accessed peripheral +| `d_bus_fence_o` | 1 | out | indicates an executed _fence_ instruction +| `d_bus_priv_o` | 2 | out | current CPU privilege level +4+^| **System Time (see <<_timeh>> CSR)** +| `time_i` | 64 | in | system time input (from MTIME) +4+^| **Non-Maskable Interrupt (<<_traps_exceptions_and_interrupts>>)** +| `nm_irq_i` | 1 | in | non-maskable interrupt +4+^| **Interrupts, RISC-V-compatible (<<_traps_exceptions_and_interrupts>>)** +| `msw_irq_i` | 1 | in | RISC-V machine software interrupt +| `mext_irq_i` | 1 | in | RISC-V machine external interrupt +| `mtime_irq_i` | 1 | in | RISC-V machine timer interrupt +4+^| **Fast Interrupts, NEORV32-specific (<<_traps_exceptions_and_interrupts>>)** +| `firq_i` | 16 | in | fast interrupt request signals +| `firq_ack_o` | 16 | out | fast interrupt acknowledge signals +4+^| **Enter Debug Mode Request (<<_on_chip_debugger_ocd>>)** +| `db_halt_req_i` | 1 | in | request CPU to halt and enter debug mode +|======================= + +<<< +// #################################################################################################################### +:sectnums: +=== CPU Top Entity - Generics + +Most of the CPU configuration generics are a subset of the actual Processor configuration generics (see section <<_processor_top_entity_generics>>). +and are not listed here. However, the CPU provides some _specific_ generics that are used to configure the CPU for the +NEORV32 processor setup. These generics are assigned by the processor setup only and are not available for user defined configuration. +The _specific_ generics are listed below. 
+ +[cols="4,4,2"] +[frame="all",grid="none"] +|====== +| **CPU_BOOT_ADDR** | _std_ulogic_vector(31 downto 0)_ | 0x00000000 +3+| This address defines the reset address at which the CPU starts fetching instructions after reset. In terms of the NEORV32 processor, this +generic is configured with the base address of the bootloader ROM (default) or with the base address of the processor-internal instruction +memory (IMEM) if the bootloader is disabled (_BOOTLOADER_EN_ = _false_). See section <<_address_space>> for more information. +|====== + +[cols="4,4,2"] +[frame="all",grid="none"] +|====== +| **CPU_DEBUG_ADDR** | _std_ulogic_vector(31 downto 0)_ | 0x00000000 +3+| This address defines the entry address for the "execution based" on-chip debugger. By default, this generic is configured with the base address +of the debugger memory. See section <<_on_chip_debugger_ocd>> for more information. +|====== + +[cols="4,4,2"] +[frame="all",grid="none"] +|====== +| **CPU_EXTENSION_RISCV_DEBUG** | _boolean_ | false +3+| Implement RISC-V-compatible "debug" CPU operation mode. See section <<_cpu_debug_mode>> for more information. +|====== + + +<<< +// #################################################################################################################### +:sectnums: +=== Instruction Sets and Extensions + +The NEORV32 is a RISC-V `rv32i` architecture that provides several optional RISC-V CPU and ISA +(instruction set architecture) extensions. For more information regarding the RISC-V ISA extensions please +see _The RISC-V Instruction Set Manual – Volume I: Unprivileged ISA_ and _The RISC-V Instruction Set Manual +Volume II: Privileged Architecture_, which are available in the project's `docs/references` folder. + +[TIP] +The CPU can discover available ISA extensions via the <<_misa>> and <<_mzext>> CSRs or by executing an instruction +and checking for an _illegal instruction exception_. 
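As an illustration of what the `misa` CSR encodes, here is a small host-side sketch that decodes a raw `misa` value according to the standard RISC-V layout (MXL in the two top bits, one bit per extension letter in bits 25:0). The function name `decode_misa` is hypothetical and not part of the NEORV32 software framework; the example value `0x40801105` is the one reported in the OpenOCD log elsewhere in this diff. The NEORV32-specific `mzext` CSR is not covered here.

```python
def decode_misa(misa: int):
    """Split a 32-bit misa value into (xlen, extension letters)."""
    mxl = (misa >> 30) & 0x3  # MXL field, bits 31:30 on a 32-bit core
    xlen = {1: 32, 2: 64, 3: 128}.get(mxl)  # None if MXL is zero
    # Bits 25:0 map to the letters A..Z (bit 0 = A, bit 25 = Z).
    letters = [chr(ord("A") + i) for i in range(26) if (misa >> i) & 1]
    return xlen, "".join(letters)


# Value taken from the OpenOCD log ("hart 0: XLEN=32, misa=0x40801105").
xlen, exts = decode_misa(0x40801105)
print(xlen, exts)
```

For this value the sketch prints `32 ACIMX`: a 32-bit core with the `A`, `C`, `I` and `M` extensions, plus the `X` bit that marks non-standard extensions.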
+ + +==== **`A`** - Atomic Memory Access + +Atomic memory access instructions (for implementing semaphores and mutexes) are available when the +`CPU_EXTENSION_RISCV_A` configuration generic is _true_. In this case the following additional instructions +are available: + +* `lr.w`: load-reserved +* `sc.w`: store-conditional + +[NOTE] +Even though only the `lr.w` and `sc.w` instructions are implemented so far, all further atomic operations +(load-modify-write instructions) can be emulated using these two instructions. Furthermore, the +instruction’s ordering flags (`aq` and `rl`) are ignored by the CPU hardware. Using any other (not yet +implemented) AMO (atomic memory operation) will trigger an illegal instruction exception. + +[NOTE] +The atomic instructions have special requirements for the memory system / bus interconnect. More +information can be found in sections <<_bus_interface>> and <<_processor_external_memory_interface_wishbone_axi4_lite>>, respectively. + + +==== **`C`** - Compressed Instructions + +Compressed 16-bit instructions are available when the `CPU_EXTENSION_RISCV_C` configuration generic is +_true_. In this case the following instructions are available: + +* `c.addi4spn`, `c.lw`, `c.sw`, `c.nop`, `c.addi`, `c.jal`, `c.li`, `c.addi16sp`, `c.lui`, `c.srli`, `c.srai`, `c.andi`, `c.sub`, +`c.xor`, `c.or`, `c.and`, `c.j`, `c.beqz`, `c.bnez`, `c.slli`, `c.lwsp`, `c.jr`, `c.mv`, `c.ebreak`, `c.jalr`, `c.add`, `c.swsp` + +[NOTE] +When the compressed instructions extension is enabled, branches to an _unaligned_ and _uncompressed_ address require +an additional instruction fetch to load the required second half-word of that instruction. The performance can be increased +again by forcing a 32-bit alignment of branch target addresses. By default, this is enforced via the GCC `-falign-functions=4`, +`-falign-labels=4`, `-falign-loops=4` and `-falign-jumps=4` compile flags (via the makefile). 
+ + +==== **`E`** - Embedded CPU + +The embedded CPU extension reduces the size of the general purpose register file from 32 entries to 16 entries to reduce hardware +requirements. This extension is enabled when the `CPU_EXTENSION_RISCV_E` configuration generic is _true_. Accesses to registers beyond +`x15` will raise an _illegal instruction exception_. + +Due to the reduced register file an alternate ABI (**`ilp32e`**) is required for the toolchain. + + +==== **`I`** - Base Integer ISA +The CPU always supports the complete `rv32i` base integer instruction set. This base set is always enabled +regardless of the setting of the remaining extensions. The base instruction set includes the following +instructions: + +* immediates: `lui`, `auipc` +* jumps: `jal`, `jalr` +* branches: `beq`, `bne`, `blt`, `bge`, `bltu`, `bgeu` +* memory: `lb`, `lh`, `lw`, `lbu`, `lhu`, `sb`, `sh`, `sw` +* alu: `addi`, `slti`, `sltiu`, `xori`, `ori`, `andi`, `slli`, `srli`, `srai`, `add`, `sub`, `sll`, `slt`, `sltu`, `xor`, `srl`, `sra`, `or`, `and` +* environment: `ecall`, `ebreak`, `fence` + +[NOTE] +In order to keep the hardware footprint low, the CPU's shift unit uses a hybrid parallel/serial approach. Shift +operations are split into coarse shifts (multiples of 4) and a final fine shift (0 to 3). The total execution +time depends on the shift amount. Alternatively, the shift operations can be processed completely in parallel by a fast +(but large) barrel shifter when the `FAST_SHIFT_EN` generic is _true_. In that case, shift operations +complete within 2 cycles regardless of the shift amount. Shift operations can also be executed in a pure serial manner when +the `TINY_SHIFT_EN` generic is _true_. In that case, shift operations take up to 32 cycles depending on the shift amount. + +[NOTE] +Internally, the `fence` instruction does not perform any operation inside the CPU. 
It only sets the +top’s `d_bus_fence_o` signal high for one cycle to inform the memory system a `fence` instruction has been +executed. Any flags within the `fence` instruction word are ignored by the hardware. + + +==== **`M`** - Integer Multiplication and Division + +Hardware-accelerated integer multiplication and division instructions are available when the +`CPU_EXTENSION_RISCV_M` configuration generic is _true_. In this case the following instructions are +available: + +* multiplication: `mul`, `mulh`, `mulhsu`, `mulhu` +* division: `div`, `divu`, `rem`, `remu` + +[NOTE] +By default, multiplication and division operations are executed in a bit-serial approach. +Alternatively, the multiplier core can be implemented using DSP blocks if the `FAST_MUL_EN` +generic is _true_ allowing faster execution. Multiplications and divisions +always require a fixed number of cycles to complete - regardless of the input operands. + + +==== **`U`** - Less-Privileged User Mode + +Adds the less-privileged _user mode_ when the `CPU_EXTENSION_RISCV_U` configuration generic is _true_. For +instance, user-level code cannot access machine-mode CSRs. Furthermore, access to the address space (like +peripheral/IO devices) can be limited via the physical memory protection (_PMP_) unit for code running in user mode. + + +==== **`X`** - NEORV32-Specific (Custom) Extensions + +The NEORV32-specific extensions are always enabled and are indicated by the set `X` bit in the `misa` CSR. + +[NOTE] +The CPU provides 16 _fast interrupts_ (`FIRQ`), which are controlled via custom bits in the `mie` +and `mip` CSRs. This extension is mapped to bits that are available for custom use (according to the +RISC-V specs). Also, custom trap codes for `mcause` are implemented. + +[NOTE] +The CPU provides a single _non-maskable_ interrupt (`NMI`) that also provides a custom trap code for `mcause`. 
+ +[NOTE] +A custom CSR `mzext` is available that can be used to check for implemented `Z*` CPU extensions +(for example `Zifencei`). This CSR is mapped to the official "custom CSR address region". + +[NOTE] +All undefined/unimplemented/malformed/illegal instructions do raise an illegal instruction exception +(see <<_execution_safety>>). + + +==== **`Zfinx`** Single-Precision Floating-Point Operations + +The `Zfinx` floating-point extension is an alternative to the `F` floating-point extension that uses the +integer register file `x` to store and operate on floating-point data (hence, `F-in-x`). Since no dedicated floating-point `f` +register file exists, the `Zfinx` extension requires fewer hardware resources and features faster context changes. +This also implies that there are NO dedicated `f` register file related load/store or move instructions. The +official RISC-V specifications can be found here: https://github.com/riscv/riscv-zfinx + +The NEORV32 floating-point unit used by the `Zfinx` extension is compatible with the _IEEE-754_ specifications. + +The `Zfinx` extension only supports single-precision (`.s` suffix) so far (so it is a direct alternative to the `F` +extension). The `Zfinx` extension is implemented when the `CPU_EXTENSION_RISCV_Zfinx` configuration +generic is _true_. In this case the following instructions and CSRs are available: + +* conversion: `fcvt.s.w`, `fcvt.s.wu`, `fcvt.w.s`, `fcvt.wu.s` +* comparison: `fmin.s`, `fmax.s`, `feq.s`, `flt.s`, `fle.s` +* computational: `fadd.s`, `fsub.s`, `fmul.s` +* sign-injection: `fsgnj.s`, `fsgnjn.s`, `fsgnjx.s` +* number classification: `fclass.s` + +* additional CSRs: `fcsr`, `frm`, `fflags` + +[WARNING] +Fused multiply-add instructions `f[n]m[add/sub].s` are not supported! +Division `fdiv.s` and square root `fsqrt.s` instructions are not supported yet! + +[WARNING] +Subnormal numbers (also "de-normalized" numbers) are not supported by the NEORV32 FPU. 
+Subnormal numbers (exponent = 0) are _flushed to zero_ (setting them to +/- 0) before entering the +FPU's processing core. If a computational instruction (like `fmul.s`) generates a subnormal result, the +result is also flushed to zero during normalization. + +[WARNING] +The `Zfinx` extension is not yet officially ratified, but is expected to stay unchanged. There is no +software support for the `Zfinx` extension in the upstream GCC RISC-V port yet. However, an +intrinsic library is provided to utilize the provided `Zfinx` floating-point extension from C-language +code (see `sw/example/floating_point_test`). + + +==== **`Zicsr`** Control and Status Register Access / Privileged Architecture + +The CSR access instructions as well as the exception and interrupt system (= the privileged architecture) are implemented when the +`CPU_EXTENSION_RISCV_Zicsr` configuration generic is _true_. In this case the following instructions are +available: + +* CSR access: `csrrw`, `csrrs`, `csrrc`, `csrrwi`, `csrrsi`, `csrrci` +* environment: `mret`, `wfi` + +[WARNING] +If the `Zicsr` extension is disabled the CPU does not provide any kind of interrupt or exception +support at all. In order to provide the full spectrum of functions and to allow a secure execution +environment, the `Zicsr` extension should always be enabled. + +[NOTE] +The "wait for interrupt" instruction `wfi` works like a sleep command. When executed, the CPU is +halted until a valid interrupt request occurs. To wake up again, the corresponding interrupt source has to +be enabled via the `mie` CSR and the global interrupt enable flag in `mstatus` has to be set. + + +==== **`Zifencei`** Instruction Stream Synchronization + +The `Zifencei` CPU extension is implemented if the `CPU_EXTENSION_RISCV_Zifencei` configuration +generic is _true_. 
It allows manual synchronization of the instruction stream via the following instruction: + +* `fence.i` + +[NOTE] +The `fence.i` instruction resets the CPU's internal instruction fetch engine and flushes the prefetch buffer. +This allows a clean re-fetch of modified data from memory. Also, the top's `i_bus_fencei_o` signal is set +high for one cycle to inform the memory system. Any additional flags within the `fence.i` instruction word +are ignored by the hardware. + +[NOTE] +If the `Zifencei` extension is disabled (_CPU_EXTENSION_RISCV_Zifencei_ generic = false) a `fence.i` +instruction will be executed as `nop` (and will **not trap**) and none of the functions +described above will be executed. + + +==== **`PMP`** Physical Memory Protection + +The NEORV32 physical memory protection (PMP) is compatible with the PMP specified by the RISC-V specs. +The CPU PMP currently supports only _NAPOT_ mode with a minimal region size (granularity) of 8 bytes. Larger minimal sizes can be configured +via the top `PMP_MIN_GRANULARITY` generic to reduce hardware requirements. The physical memory protection system is implemented when the +`PMP_NUM_REGIONS` configuration generic is >0. In this case the following additional CSRs are available: + +* `pmpcfg*` (0..15, depending on configuration): PMP configuration registers +* `pmpaddr*` (0..63, depending on configuration): PMP address registers + +See section <<_machine_physical_memory_protection>> for more information regarding the PMP CSRs. + +**Configuration** + +The actual number of regions and the minimal region granularity are defined via the top entity +`PMP_MIN_GRANULARITY` and `PMP_NUM_REGIONS` generics. `PMP_MIN_GRANULARITY` defines the minimal available +granularity of each region in bytes. `PMP_NUM_REGIONS` defines the total number of implemented regions and thus, the +number of available `pmpcfg*` and `pmpaddr*` CSRs. 
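
Because only NAPOT (naturally aligned power-of-two) regions are supported, software has to express each region as a single `pmpaddr*` value. The following C sketch shows the standard RISC-V NAPOT encoding for a region of at least 8 bytes; the function name `pmp_napot_addr` is hypothetical and chosen for this example only. The sketch does not account for a larger `PMP_MIN_GRANULARITY` setting.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical helper: encode a NAPOT region as a pmpaddr value.
// `base` must be aligned to `size`; `size` must be a power of two >= 8 bytes.
// pmpaddr holds address bits [33:2]; the number of trailing one-bits
// encodes the region size (no trailing ones = 8 bytes, one = 16 bytes, ...).
uint32_t pmp_napot_addr(uint32_t base, uint32_t size) {
  assert(size >= 8u);                  // minimal NAPOT region size
  assert((size & (size - 1u)) == 0u);  // size is a power of two
  assert((base & (size - 1u)) == 0u);  // base is naturally aligned
  return (base >> 2) | ((size >> 3) - 1u);
}
```

As an example, a 4 KiB region starting at `0x80000000` yields `0x200001FF`. The resulting value would be written to the selected `pmpaddr*` CSR, with the matching `pmpcfg*` entry set to NAPOT mode.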
+ +When implementing more PMP regions than a _certain critical limit_ *an additional register stage +is automatically inserted* into the CPU's memory interfaces to reduce critical path length. Unfortunately, this will also +increase the latency of instruction fetches and data accesses by +1 cycle. + +The critical limit can be adapted for custom use by a constant from the main VHDL package file +(`rtl/core/neorv32_package.vhd`). The default value is 8: + +[source,vhdl] +---- +-- "critical" number of PMP regions -- +constant pmp_num_regions_critical_c : natural := 8; +---- + +**Operation** + +Any memory access address (from the CPU's instruction fetch or data access interface) is tested if it is accessing any +of the specified (configured via `pmpaddr*` and enabled via `pmpcfg*`) PMP regions. If an +address accesses one of these regions, the configured access rights (attributes in `pmpcfg*`) are checked: + +* a write access (store) will fail if no write attribute is set +* a read access (load) will fail if no read attribute is set +* an instruction fetch access will fail if no execute attribute is set + +If an access to a protected region does not have the required access rights (attributes) it will raise the corresponding +_instruction/load/store access fault exception_. + +By default, all PMP checks are enforced for user-level programs only. If you wish to enforce the physical +memory protection also for machine-level programs you need to activate the _locked bit_ in the corresponding +`pmpcfg*` configuration. + +[IMPORTANT] +After updating the address configuration registers `pmpaddr*` the system requires up to 33 cycles for +internal (iterative) computations before the configuration becomes valid. + +[NOTE] +For more information regarding RISC-V physical memory protection see the official _The RISC-V +Instruction Set Manual – Volume II: Privileged Architecture_ specifications. 
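
The access check described under *Operation* can be modeled in software roughly as follows. This is a behavioral sketch for a single region only, not a description of the actual hardware implementation. The `pmpcfg` bit positions (`R`, `W`, `X`, `A`, `L`) follow the RISC-V privileged specification; all identifiers (`pmp_napot_match`, `pmp_access_allowed`, `access_t`, the `PMPCFG_*` macros) are assumptions introduced for this example.

```c
#include <stdbool.h>
#include <stdint.h>

// pmpcfg entry bits as defined by the RISC-V privileged specification
#define PMPCFG_R       0x01u // read permission
#define PMPCFG_W       0x02u // write permission
#define PMPCFG_X       0x04u // execute permission
#define PMPCFG_A_MASK  0x18u // address-matching mode field
#define PMPCFG_A_NAPOT 0x18u // naturally aligned power-of-two region
#define PMPCFG_L       0x80u // locked: also enforced for machine mode

typedef enum { ACCESS_READ, ACCESS_WRITE, ACCESS_EXECUTE } access_t;

// True if `addr` lies inside the (enabled) NAPOT region described by pmpaddr/cfg.
bool pmp_napot_match(uint32_t pmpaddr, uint8_t cfg, uint32_t addr) {
  if ((cfg & PMPCFG_A_MASK) != PMPCFG_A_NAPOT) {
    return false; // region disabled or not in NAPOT mode
  }
  // The trailing one-bits of pmpaddr (plus one more bit) are "don't care" bits.
  uint32_t ignore = pmpaddr ^ (pmpaddr + 1u);
  return ((addr >> 2) & ~ignore) == (pmpaddr & ~ignore);
}

// True if an access that matched a region is permitted by its attributes.
bool pmp_access_allowed(uint8_t cfg, access_t access, bool machine_mode) {
  if (machine_mode && (cfg & PMPCFG_L) == 0u) {
    return true; // checks apply to machine mode only if the region is locked
  }
  switch (access) {
    case ACCESS_READ:    return (cfg & PMPCFG_R) != 0u;
    case ACCESS_WRITE:   return (cfg & PMPCFG_W) != 0u;
    case ACCESS_EXECUTE: return (cfg & PMPCFG_X) != 0u;
  }
  return false;
}
```

In this model, a user-mode store to a matching region without the write attribute is rejected, which corresponds to the store access fault exception raised by the hardware.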
+ + +==== **`HPM`** Hardware Performance Monitors + +In addition to the mandatory cycles (`[m]cycle[h]`) and instruction (`[m]instret[h]`) counters the NEORV32 CPU provides +up to 29 hardware performance monitors (HPM 3..31), which can be used to benchmark applications. Each HPM consists of an +N-bit wide counter (split in a high-word 32-bit CSR and a low-word 32-bit CSR), where N is defined via the top's +`HPM_CNT_WIDTH` generic (0..64-bit), and a corresponding event configuration CSR. The event configuration +CSR defines the architectural events that lead to an increment of the associated HPM counter. + +The cycle, time and instructions-retired counters (`[m]cycle[h]`, `time[h]`, `[m]instret[h]`) are +mandatory performance monitors on every RISC-V platform and have fixed increment events. For example, +the instructions-retired counter increments with each executed instruction. The actual hardware performance +monitors are optional and can be configured to increment on arbitrary hardware events. The number of +available HPMs is configured via the top's `HPM_NUM_CNTS` generic at synthesis time. Assigning a zero will exclude +all HPM logic from the design. + +Depending on the configuration, the following additional CSRs are available: + +* counters: `[m]hpmcounter*[h]` (3..31, depending on configuration) +* event configuration: `mhpmevent*` (3..31, depending on configuration) + +User-level access to the counter registers `hpmcounter*[h]` can be individually restricted via the `mcounteren` CSR. +Auto-increment of the HPMs can be individually deactivated via the `mcountinhibit` CSR. + +If `HPM_NUM_CNTS` is lower than the maximum value (=29) the remaining HPMs are not implemented. +However, accessing their associated CSRs will not raise an illegal instruction exception. These CSRs are +read-only and will always return 0. + +[NOTE] +For a list of all allocated HPM-related CSRs and all provided event configurations see section <<_hardware_performance_monitors_hpm>>. 
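
Since each counter is split into a high-word and a low-word CSR, a 64-bit value cannot be read in a single instruction on a 32-bit CPU. A common RISC-V idiom is to read the high word, then the low word, then the high word again, and to retry if the high word changed in between. The C sketch below models this with function pointers so it can run on a host; `read_counter64`, `csr_read_fn` and the `sim_*` stand-ins are hypothetical names for this example. On the target, the two callbacks would be replaced by the actual CSR read accesses.

```c
#include <stdint.h>

// Callback type standing in for a 32-bit CSR read (assumption for this sketch).
typedef uint32_t (*csr_read_fn)(void);

// Read a 64-bit counter that is exposed as two 32-bit halves.
// Retries if the low word overflowed into the high word between the reads.
uint64_t read_counter64(csr_read_fn read_hi, csr_read_fn read_lo) {
  uint32_t hi, lo, hi_check;
  do {
    hi = read_hi();
    lo = read_lo();
    hi_check = read_hi();
  } while (hi != hi_check);
  return ((uint64_t)hi << 32) | lo;
}

// Host-side stand-in for a free-running counter, placed just before a
// low-word overflow; every low-word read advances it by one tick.
static uint64_t sim_counter = 0x00000001FFFFFFFFull;

uint32_t sim_read_hi(void) { return (uint32_t)(sim_counter >> 32); }

uint32_t sim_read_lo(void) {
  uint32_t lo = (uint32_t)sim_counter;
  sim_counter += 1u;
  return lo;
}
```

With the simulated counter above, the first attempt observes a changed high word and is discarded; the second attempt returns a consistent value.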
+ + +<<< +// #################################################################################################################### +:sectnums: +=== Instruction Timing + +The instruction timing listed in the table below shows the required clock cycles for executing a certain +instruction. These instruction cycles assume a bus access without additional wait states and a filled +pipeline. + +Average CPI (cycles per instruction) values for "real applications" such as the CoreMark benchmark for different CPU +configurations are presented in <<_cpu_performance>>. + +.Clock cycles per instruction +[cols="<2,^1,^4,<3"] +[options="header", grid="rows"] +|======================= +| Class | ISA | Instruction(s) | Execution cycles +| ALU | `I/E` | `addi` `slti` `sltiu` `xori` `ori` `andi` `add` `sub` `slt` `sltu` `xor` `or` `and` `lui` `auipc` | 2 +| ALU | `C` | `c.addi4spn` `c.nop` `c.addi` `c.li` `c.addi16sp` `c.lui` `c.andi` `c.sub` `c.xor` `c.or` `c.and` `c.add` `c.mv` | 2 +| ALU | `I/E` | `slli` `srli` `srai` `sll` `srl` `sra` | 3 + SAfootnote:[Shift amount.]/4 + SA%4; FAST_SHIFTfootnote:[Barrel shift when `FAST_SHIFT_EN` is enabled.]: 4; TINY_SHIFTfootnote:[Serial shift when `TINY_SHIFT_EN` is enabled.]: 2..32 +| ALU | `C` | `c.srli` `c.srai` `c.slli` | 3 + SAfootnote:[Shift amount.]/4 + SA%4; FAST_SHIFTfootnote:[Barrel shift when `FAST_SHIFT_EN` is enabled.]: 4; TINY_SHIFTfootnote:[Serial shift when `TINY_SHIFT_EN` is enabled.]: 2..32 +| Branches | `I/E` | `beq` `bne` `blt` `bge` `bltu` `bgeu` | Taken: 5 + MLfootnote:[Memory latency.]; Not taken: 3 +| Branches | `C` | `c.beqz` `c.bnez` | Taken: 5 + MLfootnote:[Memory latency.]; Not taken: 3 +| Jumps / Calls | `I/E` | `jal` `jalr` | 4 + ML +| Jumps / Calls | `C` | `c.jal` `c.j` `c.jr` `c.jalr` | 4 + ML +| Memory access | `I/E` | `lb` `lh` `lw` `lbu` `lhu` `sb` `sh` `sw` | 4 + ML +| Memory access | `C` | `c.lw` `c.sw` `c.lwsp` `c.swsp` | 4 + ML +| Memory access | `A` | `lr.w` `sc.w` | 4 + ML +| Multiplication | 
`M` | `mul` `mulh` `mulhsu` `mulhu` | 2+31+3; FAST_MULfootnote:[DSP-based multiplication; enabled via `FAST_MUL_EN`.]: 5 +| Division | `M` | `div` `divu` `rem` `remu` | 22+32+4 +| Bit-manipulation - arithmetic/logic | `B(Zbb)` | `sext.b` `sext.h` `min` `minu` `max` `maxu` `andn` `orn` `xnor` `zext`(pack) `rev8`(grevi) `orc.b`(gorci) | 3 +| Bit-manipulation - shifts | `B(Zbb)` | `clz` `ctz` | 3 + 0..32 +| Bit-manipulation - shifts | `B(Zbb)` | `cpop` | 3 + 32 +| Bit-manipulation - shifts | `B(Zbb)` | `rol` `ror` `rori` | 3 + SA +| Bit-manipulation - single-bit | `B(Zbs)` | `sbset[i]` `sbclr[i]` `sbinv[i]` `sbext[i]` | 3 +| Bit-manipulation - shifted-add | `B(Zba)` | `sh1add` `sh2add` `sh3add` | 3 +| CSR access | `Zicsr` | `csrrw` `csrrs` `csrrc` `csrrwi` `csrrsi` `csrrci` | 4 +| System | `I/E`+`Zicsr` | `ecall` `ebreak` | 4 +| System | `I/E` | `fence` | 3 +| System | `C`+`Zicsr` | `c.ebreak` | 4 +| System | `Zicsr` | `mret` `wfi` | 5 +| System | `Zifencei` | `fence.i` | 5 +| Floating-point - arithmetic | `Zfinx` | `fadd.s` | 110 +| Floating-point - arithmetic | `Zfinx` | `fsub.s` | 112 +| Floating-point - arithmetic | `Zfinx` | `fmul.s` | 22 +| Floating-point - compare | `Zfinx` | `fmin.s` `fmax.s` `feq.s` `flt.s` `fle.s` | 13 +| Floating-point - misc | `Zfinx` | `fsgnj.s` `fsgnjn.s` `fsgnjx.s` `fclass.s` | 12 +| Floating-point - conversion | `Zfinx` | `fcvt.w.s` `fcvt.wu.s` | 47 +| Floating-point - conversion | `Zfinx` | `fcvt.s.w` `fcvt.s.wu` | 48 +|======================= + +[NOTE] +The presented values of the *floating-point execution cycles* are average values – obtained from +4096 instruction executions using pseudo-random input values. The execution time for emulating the +instructions (using pure-software libraries) is ~17..140 times higher. 
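
To make the formula-style entries of the timing table more concrete, the following C sketch evaluates two of them: the default (hybrid) shifter cost `3 + SA/4 + SA%4` and the branch cost (`5 + ML` if taken, `3` otherwise). The function names are hypothetical, and the sketch only restates the table values under the same assumptions (no additional wait states, filled pipeline).

```c
#include <stdint.h>

// Cycles for a shift with the default hybrid shifter: 3 + SA/4 + SA%4,
// where SA is the shift amount (0..31).
uint32_t shift_cycles_default(uint32_t sa) {
  sa &= 31u; // only the lower 5 bits of the shift amount are relevant on rv32
  return 3u + (sa / 4u) + (sa % 4u);
}

// Cycles for a conditional branch; `mem_latency` is the ML term from the table.
uint32_t branch_cycles(int taken, uint32_t mem_latency) {
  return taken ? (5u + mem_latency) : 3u;
}
```

For example, a shift by 31 positions costs 3 + 7 + 3 = 13 cycles with the default shifter, while a shift by 4 costs only 4 cycles.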
+ + + +// #################################################################################################################### +include::cpu_csr.adoc[] + + + +<<< +// #################################################################################################################### +:sectnums: +==== Execution Safety + +The hardware of the NEORV32 CPU was designed for maximum *execution safety*. If the `Zicsr` CPU +extension is enabled, the core supports **all** traps specified by the official RISC-V specifications (obviously, +not the ones that are related to yet unimplemented extensions/features). Thus, the CPU provides well-defined +hardware fall-backs for (nearly) everything that can go wrong. Even if any kind of trap is triggered, the core +is always in a defined and fully synchronized state throughout the whole architecture (i.e. no need to undo +out-of-order operations) that allows predictable execution behavior at any time. + +**Core Safety Features** + +* Due to the acknowledged memory accesses the CPU is _always_ in sync with the memory system (no speculative execution / out-of-order states). +* The CPU supports all bus exceptions including bus access exceptions that are triggered if an +accessed address does not respond or encounters an internal error during access (which is a rare +feature in many open-source RISC-V cores). +* The CPU raises an illegal instruction trap for **all** unimplemented/malformed/illegal instructions (to support _full_ virtualization). +* If user-level code tries to read from machine-level-only CSRs (like `mstatus`) an illegal instruction +exception is raised. The result of this operation is always zero (though, machine-level +code handling this exception can modify the target register of the illegal access-causing +instruction to allow full virtualization). Illegal write accesses to machine CSRs will not write any data at all. 
+* Illegal user-level memory accesses to protected addresses or address regions (via physical memory +protection) will not be conducted at all (no actual write and no actual read; prevents triggering of +memory-mapped devices). Illegal load operations will not return any data (the instruction's +destination register will not be written at all). + + + +<<< +// #################################################################################################################### +:sectnums: +==== Traps, Exceptions and Interrupts + +In this document a (maybe) special nomenclature regarding traps is used: + +* _interrupts_ = asynchronous exceptions +* _exceptions_ = synchronous exceptions +* _traps_ = exceptions + interrupts (synchronous or asynchronous exceptions) + +Whenever an exception or interrupt is triggered, the CPU transfers control to the address stored in the `mtvec` +CSR. The cause of the corresponding interrupt or exception can be determined via the content of the `mcause` +CSR. The address that reflected the current program counter when a trap was taken is stored to `mepc`. +Additional information regarding the cause of the trap can be retrieved from `mtval`. + +The traps are prioritized. If several exceptions occur at once only the one with highest priority is triggered. If +several interrupts trigger at once, the one with highest priority is triggered while the remaining ones are +queued. After completing the interrupt handler the interrupt with the second highest priority will be issued and +so on. + + +**Memory Access Exceptions** + +If a load operation causes any exception, the destination register is not written at all. Exceptions caused by a +misalignment or a physical memory protection fault do not trigger a bus read-operation at all. +Exceptions caused by a store address misalignment or a store physical memory protection fault do not trigger +a bus write-operation at all. 
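
A trap handler typically starts by decoding `mcause` to find out what happened. The C sketch below splits an `mcause` value into the interrupt flag (most significant bit) and the cause code, and additionally maps the codes 16..31 of the NEORV32 fast interrupts to their channel number, following the `mcause` values given in the trap listing of this section. The type `trap_info_t` and the function `decode_mcause` are hypothetical names for this example; the NEORV32 core library provides its own `TRAP_CODE_*` definitions, which are not used here.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
  bool is_interrupt;  // mcause[31]: 1 = interrupt, 0 = exception
  uint32_t code;      // mcause[30:0]: cause code
  int firq_channel;   // 0..15 for fast interrupt requests, -1 otherwise
} trap_info_t;

// Hypothetical helper: split a raw mcause value into its components.
trap_info_t decode_mcause(uint32_t mcause) {
  trap_info_t info;
  info.is_interrupt = (mcause >> 31) != 0u;
  info.code = mcause & 0x7FFFFFFFu;
  info.firq_channel = -1;
  // Fast interrupt channels 0..15 use interrupt codes 16..31.
  if (info.is_interrupt && info.code >= 16u && info.code <= 31u) {
    info.firq_channel = (int)(info.code - 16u);
  }
  return info;
}
```

With this decoding, `0x80000010` is reported as fast interrupt channel 0, while `0x00000002` is reported as a synchronous exception with code 2 (illegal instruction).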
+ + +**Instruction Atomicity** + +All instructions execute as atomic operations – interrupts can only trigger between two instructions. + + +**Custom Fast Interrupt Request Lines** + +As a custom extension, the NEORV32 CPU features 16 fast interrupt request lines via the `firq_i` CPU (/Processor) top +entity signals. These interrupts have custom configuration and status flags in the `mie` and `mip` CSRs and also +provide custom trap codes in `mcause`. + + +**Non-Maskable Interrupt** + +The NEORV32 CPU features a single non-maskable interrupt source via the `nm_irq_i` CPU (/Processor) top +entity signal that can be used to signal critical system conditions. This interrupt source _cannot_ be disabled at all (not even in interrupt service routines). +Hence, it does _not_ provide configuration/status flags in the `mie` and `mip` CSRs. The RISC-V-compatible +`mcause` value `0x80000000` is used to indicate the non-maskable interrupt. + +[IMPORTANT] +All CPU/Processor interrupt request signals are triggered when the signal is _high_ for exactly one cycle (being high for several cycles might +cause multiple triggering of the interrupt). + + +<<< +// #################################################################################################################### +:sectnums!: +===== NEORV32 Trap Listing + +.NEORV32 trap listing +[cols="3,6,5,14,11,4,4"] +[options="header",grid="rows"] +|======================= +| Prio. 
| `mcause` | [RISC-V] | ID [C] | Cause | `mepc` | `mtval` +| 1 | `0x80000000` | 1.0 | _TRAP_CODE_NMI_ | non-maskable interrupt | _I-PC_ | _0_ +| 2 | `0x8000000B` | 1.11 | _TRAP_CODE_MEI_ | machine external interrupt | _I-PC_ | _0_ +| 3 | `0x80000003` | 1.3 | _TRAP_CODE_MSI_ | machine software interrupt | _I-PC_ | _0_ +| 4 | `0x80000007` | 1.7 | _TRAP_CODE_MTI_ | machine timer interrupt | _I-PC_ | _0_ +| 5 | `0x80000010` | 1.16 | _TRAP_CODE_FIRQ_0_ | fast interrupt request channel 0 | _I-PC_ | _0_ +| 6 | `0x80000011` | 1.17 | _TRAP_CODE_FIRQ_1_ | fast interrupt request channel 1 | _I-PC_ | _0_ +| 7 | `0x80000012` | 1.18 | _TRAP_CODE_FIRQ_2_ | fast interrupt request channel 2 | _I-PC_ | _0_ +| 8 | `0x80000013` | 1.19 | _TRAP_CODE_FIRQ_3_ | fast interrupt request channel 3 | _I-PC_ | _0_ +| 9 | `0x80000014` | 1.20 | _TRAP_CODE_FIRQ_4_ | fast interrupt request channel 4 | _I-PC_ | _0_ +| 10 | `0x80000015` | 1.21 | _TRAP_CODE_FIRQ_5_ | fast interrupt request channel 5 | _I-PC_ | _0_ +| 11 | `0x80000016` | 1.22 | _TRAP_CODE_FIRQ_6_ | fast interrupt request channel 6 | _I-PC_ | _0_ +| 12 | `0x80000017` | 1.23 | _TRAP_CODE_FIRQ_7_ | fast interrupt request channel 7 | _I-PC_ | _0_ +| 13 | `0x80000018` | 1.24 | _TRAP_CODE_FIRQ_8_ | fast interrupt request channel 8 | _I-PC_ | _0_ +| 14 | `0x80000019` | 1.25 | _TRAP_CODE_FIRQ_9_ | fast interrupt request channel 9 | _I-PC_ | _0_ +| 15 | `0x8000001a` | 1.26 | _TRAP_CODE_FIRQ_10_ | fast interrupt request channel 10 | _I-PC_ | _0_ +| 16 | `0x8000001b` | 1.27 | _TRAP_CODE_FIRQ_11_ | fast interrupt request channel 11 | _I-PC_ | _0_ +| 17 | `0x8000001c` | 1.28 | _TRAP_CODE_FIRQ_12_ | fast interrupt request channel 12 | _I-PC_ | _0_ +| 18 | `0x8000001d` | 1.29 | _TRAP_CODE_FIRQ_13_ | fast interrupt request channel 13 | _I-PC_ | _0_ +| 19 | `0x8000001e` | 1.30 | _TRAP_CODE_FIRQ_14_ | fast interrupt request channel 14 | _I-PC_ | _0_ +| 20 | `0x8000001f` | 1.31 | _TRAP_CODE_FIRQ_15_ | fast interrupt request channel 15 | _I-PC_ | _0_ +| 
21 | `0x00000001` | 0.1 | _TRAP_CODE_I_ACCESS_ | instruction access fault | _B-ADR_ | _PC_ +| 22 | `0x00000002` | 0.2 | _TRAP_CODE_I_ILLEGAL_ | illegal instruction | _PC_ | _Inst_ +| 23 | `0x00000000` | 0.0 | _TRAP_CODE_I_MISALIGNED_ | instruction address misaligned | _B-ADR_ | _PC_ +| 24 | `0x0000000B` | 0.11 | _TRAP_CODE_MENV_CALL_ | environment call from M-mode (ECALL in machine-mode) | _PC_ | _PC_ +| 25 | `0x00000008` | 0.8 | _TRAP_CODE_UENV_CALL_ | environment call from U-mode (ECALL in user-mode) | _PC_ | _PC_ +| 26 | `0x00000003` | 0.3 | _TRAP_CODE_BREAKPOINT_ | breakpoint (EBREAK) | _PC_ | _PC_ +| 27 | `0x00000006` | 0.6 | _TRAP_CODE_S_MISALIGNED_ | store address misaligned | _B-ADR_ | _B-ADR_ +| 28 | `0x00000004` | 0.4 | _TRAP_CODE_L_MISALIGNED_ | load address misaligned | _B-ADR_ | _B-ADR_ +| 29 | `0x00000007` | 0.7 | _TRAP_CODE_S_ACCESS_ | store access fault | _B-ADR_ | _B-ADR_ +| 30 | `0x00000005` | 0.5 | _TRAP_CODE_L_ACCESS_ | load access fault | _B-ADR_ | _B-ADR_ +|======================= + +**Notes** + +The "Prio." column shows the priority of each trap. The highest priority is 1. The "`mcause`" column shows the +cause ID of the corresponding trap that is written to the `mcause` CSR. The "[RISC-V]" column shows the interrupt/exception code value from the +official RISC-V privileged architecture manual. The "[C]" names are defined by the NEORV32 core library (`sw/lib/include/neorv32.h`) and can +be used in plain C code. 
The "`mepc`" and "`mtval`" columns show the value written to the +`mepc` and `mtval` CSRs when a trap is triggered: + +* _I-PC_ - address of interrupted instruction (instruction has not been executed/completed yet) +* _B-ADR_ - bad memory access address that caused the trap +* _PC_ - address of instruction that caused the trap +* _0_ - zero +* _Inst_ - the faulting instruction itself + + + +<<< +// #################################################################################################################### +:sectnums: +==== Bus Interface + +The CPU provides two independent bus interfaces: One for fetching instructions (`i_bus_*`) and one for +accessing data (`d_bus_*`) via load and store operations. Both interfaces use the same interface protocol. + +:sectnums: +===== Address Space + +The CPU is a 32-bit architecture with separated instruction and data interfaces making it a Harvard +Architecture. Each of these interfaces can access an address space of up to 2^32^ bytes (4 GB). The memory +system is based on 32-bit words with a minimal granularity of 1 byte. Please note that the NEORV32 CPU +does not support unaligned memory accesses _in hardware_ – however, software-based handling can be +implemented as any unaligned memory access will trigger a corresponding exception. + +:sectnums: +===== Interface Signals + +The following table shows the signals of the data and instruction interfaces seen from the CPU +(`*_o` signals are driven by the CPU / outputs, `*_i` signals are read by the CPU / inputs). 
+ +.CPU bus interface +[cols="<2,^1,<7"] +[options="header",grid="rows"] +|======================= +| Signal | Size | Function +| `bus_addr_o` | 32 | access address +| `bus_rdata_i` | 32 | data input for read operations +| `bus_wdata_o` | 32 | data output for write operations +| `bus_ben_o` | 4 | byte enable signal for write operations +| `bus_we_o` | 1 | bus write access +| `bus_re_o` | 1 | bus read access +| `bus_lock_o` | 1 | exclusive access request +| `bus_ack_i` | 1 | accessed peripheral indicates a successful completion of the bus transaction +| `bus_err_i` | 1 | accessed peripheral indicates an error during the bus transaction +| `bus_fence_o` | 1 | this signal is set for one cycle when the CPU executes a data/instruction fence operation +| `bus_priv_o` | 2 | current CPU privilege level +|======================= + +[NOTE] +Currently, there are no pipelined or overlapping operations implemented within the same bus interface. +So only a single transfer request can be "on the fly". + +:sectnums: +===== Protocol + +A bus request is triggered either by the `bus_re_o` signal (for reading data) or by the `bus_we_o` signal (for +writing data). These signals are active for exactly one cycle and initiate either a read or a write transaction. The transaction is +completed when the accessed peripheral either sets the `bus_ack_i` signal (-> successful completion) or the +`bus_err_i` signal is set (-> failed completion). All these control signals are only active (= high) for one +single cycle. An error indicated via the `bus_err_i` signal during a transfer will trigger the corresponding instruction bus +access fault or load/store bus access fault exception. + +[NOTE] +The transfer can be completed directly in the same cycle as it was initiated (via the `bus_re_o` or `bus_we_o` +signal) if the peripheral sets `bus_ack_i` or `bus_err_i` high for one cycle. However, in order to shorten the critical path such "asynchronous" +completion should be avoided. 
The default processor-internal modules provide exactly **one cycle delay** between initiation and completion of transfers. + +.Bus Keeper: Processor-internal memories and memory-mapped devices with variable / high latency +[IMPORTANT] +Processor-internal peripherals or memories do not have to respond within one cycle after the transfer initiation (= latency > 1 cycle). +However, the bus transaction has to be completed (= acknowledged) within a certain **response time window**. This time window is defined +by the global `max_proc_int_response_time_c` constant (default = 15 cycles) from the processor's VHDL package file (`rtl/neorv32_package.vhd`). +It defines the maximum number of cycles after which an _unacknowledged_ processor-internal bus transfer will timeout and raise a **bus fault exception**. +The _BUSKEEPER_ hardware module (`rtl/core/neorv32_bus_keeper.vhd`) keeps track of all _internal_ bus transactions. If any bus operation times out +(for example when accessing "address space holes") this unit will issue a bus error to the CPU that will raise the corresponding instruction fetch or data access bus exception. +Note that **the bus keeper does not track external accesses via the external memory bus interface**. However, the external memory bus interface also provides +an _optional_ bus timeout (see section <<_processor_external_memory_interface_wishbone_axi4_lite>>). + +**Exemplary Bus Accesses** + +.Example bus accesses: see read/write access description below +[cols="^2,^2"] +[grid="none"] +|======================= +a| image::cpu_interface_read_long.png[read,300,150] +a| image::cpu_interface_write_long.png[write,300,150] +| Read access | Write access +|======================= + +**Write Access** + +For a write access, the accessed address (`bus_addr_o`), the data to be written (`bus_wdata_o`) and the byte +enable signals (`bus_ben_o`) are set when `bus_we_o` goes high. These three signals are kept stable until the +transaction is completed. 
In the example the accessed peripheral cannot answer directly in the next +cycle after issuing. Here, the transaction is successful and the peripheral sets the `bus_ack_i` signal several +cycles after issuing. + +**Read Access** + +For a read access, the accessed address (`bus_addr_o`) is set when `bus_re_o` goes high. The address is kept +stable until the transaction is completed. In the example the accessed peripheral cannot answer +directly in the next cycle after issuing. The peripheral has to apply the read data right in the same cycle as +the bus transaction is completed (here, the transaction is successful and the peripheral sets the `bus_ack_i` +signal). + +**Access Boundaries** + +The instruction interface will always access memory on word (= 32-bit) boundaries even if fetching +compressed (16-bit) instructions. The data interface can access memory on byte (= 8-bit), half-word (= 16-bit) +and word (= 32-bit) boundaries. + +**Exclusive (Atomic) Access** + +The CPU can access memory in an exclusive manner by generating a load-reserved and store-conditional +combination. Normally, these combinations should target the same memory address. + +The CPU starts an exclusive access to memory via the _load-reserved instruction_ (`lr.w`). This instruction +will set the CPU-internal _exclusive access lock_, which directly drives the `d_bus_lock_o` signal. It is the task of +the memory system to manage this exclusive access reservation by storing the according access address and +the source of the access itself (for example via the CPU ID in a multi-core system). + +When the CPU executes a _store-conditional instruction_ (`sc.w`) the _CPU-internal exclusive access lock_ is +evaluated to check if the exclusive access was successful. If the lock is still OK, the instruction will write-back +zero and will allow the according store operation to the memory system. 
If the lock is broken, the +instruction will write-back non-zero and will not generate an actual memory store operation. + +The CPU-internal exclusive access lock is broken if at least one of the following situations occurs: + +* when executing any memory-access operation other than `lr.w` +* when any trap (sync. or async.) is triggered (for example to force a context switch) +* when the memory system signals a bus error (via the `bus_err_i` signal) + +[TIP] +For more information regarding the SoC-level behavior and requirements of atomic operations see +section <<_processor_external_memory_interface_wishbone_axi4_lite>>. + +**Memory Barriers** + +Whenever the CPU executes a fence instruction, the according interface signal is set high for one cycle +(`d_bus_fence_o` for a _fence_ instruction; `i_bus_fence_o` for a _fencei_ instruction). It is the task of the +memory system to perform the necessary operations (like a cache flush and refill). + + + +<<< +// #################################################################################################################### +:sectnums: +==== CPU Hardware Reset + +In order to reduce routing constraints (and by this the actual hardware requirements), most uncritical +registers of the NEORV32 CPU as well as most registers of the whole NEORV32 Processor do not use **a +dedicated hardware reset**. "Uncritical registers" in this context means that the initial value of these registers +after power-up is not relevant for a defined CPU boot process. + +**Rationale** + +A good example to illustrate the concept of uncritical registers is a pipelined processing engine. Each stage +of the engine features an N-bit _data register_ and a 1-bit _status register_. The status register is set when the +data in the according data register is valid. At the end of the pipeline the status register might trigger a writeback +of the processing result to some kind of memory. 
The initial status of the data registers after power-up is +irrelevant as long as the status registers are all reset to a defined value that indicates there is no valid data in +the pipeline’s data registers. Therefore, the pipeline data registers do not require a dedicated reset as they do not +control the actual operation (in contrast to the status registers). This makes the pipeline data registers from +this example "uncritical registers". + +**NEORV32 CPU Reset** + +In terms of the NEORV32 CPU, there are several pipeline registers, state machine registers and even status +and control registers (CSRs) that do not require a defined initial state to ensure a correct boot process. The +pipeline registers will get initialized by the CPU’s internal state machines, which are initialized from the main +control engine that actually features a defined reset. The initialization of most of the CPU's core CSRs (like +interrupt control) is done by the software (to be more specific, this is done by the `crt0.S` start-up code). + +During the very early boot process (where `crt0.S` is running) there is no chance for undefined behavior due to +the lack of dedicated hardware resets of certain CSRs. For example the machine interrupt-enable CSR (`mie`) +does not provide a dedicated reset. The value after reset of this register is uncritical as interrupts cannot fire +because the global interrupt enable flag in the status register (`mstatus(mie)`) provides a dedicated +hardware reset setting it to low (globally disabling interrupts). + +**Reset Configuration** + +Most CPU-internal registers do feature an asynchronous reset in the VHDL code, but the "don't care" value +(VHDL `'-'`) is used for initialization of the uncritical registers, effectively generating a flip-flop without a +reset. However, certain applications or situations (like advanced gate-level / timing simulations) might +require a more deterministic reset state. 
For this case, a defined reset level (reset-to-low) of all registers can +be enabled via a constant in the main VHDL package file (`rtl/core/neorv32_package.vhd`): + +[source,vhdl] +---- +-- "critical" number of PMP regions -- +constant dedicated_reset_c : boolean := false; -- use dedicated hardware reset value +-- for UNCRITICAL registers (FALSE=reset value is irrelevant (might simplify HW), +-- default; TRUE=defined LOW reset value) +---- Index: datasheet/cpu_csr.adoc =================================================================== --- datasheet/cpu_csr.adoc (nonexistent) +++ datasheet/cpu_csr.adoc (revision 60) @@ -0,0 +1,793 @@ +<<< +:sectnums: +=== Control and Status Registers (CSRs) + +The following table shows a summary of all available CSRs. The address field defines the CSR address for +the CSR access instructions. The *[ASM]* name can be used for (inline) assembly code and is directly +understood by the assembler/compiler. The *[C]* names are defined by the NEORV32 core library and can be +used as immediates in plain C code. The *R/W* column shows whether the CSR can be read and/or written. +The NEORV32-specific CSRs are mapped to the official "custom CSRs" CSR address space. + +[IMPORTANT] +The CSRs, the CSR-related instructions as well as the complete exception/interrupt processing +system are only available when the `CPU_EXTENSION_RISCV_Zicsr` generic is _true_. + +[IMPORTANT] +When trying to write to a read-only CSR (like the `time` CSR), when trying to access a nonexistent +CSR or when trying to access a machine-mode CSR from less-privileged user-mode, an +illegal instruction exception is raised. + +[NOTE] +CSR reset value: Please note that most of the CSRs do *NOT* provide a dedicated reset. Hence, +these CSRs are not initialized by a hardware reset and keep an *UNDEFINED* value until they are +explicitly initialized by the software (normally, this is already done by the NEORV32-specific +`crt0.S` start-up code). 
For more information see section <<_cpu_hardware_reset>>. + +**CSR Listing** + +The description of each single CSR provides the following summary: + +.CSR description +[cols="4,27,>7"] +[frame="topbot",grid="none"] +|====== +| _Address_ | _Description_ | _ASM alias_ +3+| Reset value: _CSR content after hardware reset_ (also see <<_cpu_hardware_reset>>) +3+| _Detailed description_ +|====== + +.Not Implemented CSRs / CSR Bits +[IMPORTANT] +All CSR bits that are unused / not implemented / not shown are _hardwired to zero_. All CSRs that are not +implemented at all (and are not "disabled" using certain configuration generics) will trigger an exception on +access. The CSRs that are implemented within the NEORV32 might cause an exception if they are disabled. +See the according CSR description for more information. + +.Debug Mode CSRs +[IMPORTANT] +The _debug mode_ CSRs are not listed here since they are only accessible in debug mode and not during normal CPU operation. +See section <<_cpu_debug_mode_csrs>>. + + +<<< +// #################################################################################################################### +**CSR Listing Notes** + +CSRs with the following notes ... 
+ +* `C` - have or are a custom CPU extension (that is allowed by the RISC-V specs) +* `R` - are read-only (in contrast to the originally specified r/w capability) +* `S` - have a constrained compatibility; for example not all specified bits are available + +.NEORV32 Control and Status Registers (CSRs) +[cols="<4,<6,<11,^3,<11,^3"] +[options="header"] +|======================= +| Address | Name [ASM] | Name [C] | R/W | Function | Note +6+^| **<<_floating_point_csrs>>** +| 0x001 | <<_fflags>> | _CSR_FFLAGS_ | r/w | Floating-point accrued exceptions | +| 0x002 | <<_frm>> | _CSR_FRM_ | r/w | Floating-point dynamic rounding mode | +| 0x003 | <<_fcsr>> | _CSR_FCSR_ | r/w | Floating-point control and status (`frm` + `fflags`) | +6+^| **<<_machine_trap_setup>>** +| 0x300 | <<_mstatus>> | _CSR_MSTATUS_ | r/w | Machine status register | `S` +| 0x301 | <<_misa>> | _CSR_MISA_ | r/- | Machine CPU ISA and extensions | `R` +| 0x304 | <<_mie>> | _CSR_MIE_ | r/w | Machine interrupt enable register | `C` +| 0x305 | <<_mtvec>> | _CSR_MTVEC_ | r/w | Machine trap-handler base address (for ALL traps) | +| 0x306 | <<_mcounteren>> | _CSR_MCOUNTEREN_ | r/w | Machine counter-enable register | `S` +6+^| **<<_machine_trap_handling>>** +| 0x340 | <<_mscratch>> | _CSR_MSCRATCH_ | r/w | Machine scratch register | +| 0x341 | <<_mepc>> | _CSR_MEPC_ | r/w | Machine exception program counter | +| 0x342 | <<_mcause>> | _CSR_MCAUSE_ | r/w | Machine trap cause | `C` +| 0x343 | <<_mtval>> | _CSR_MTVAL_ | r/- | Machine bad address or instruction | `R` +| 0x344 | <<_mip>> | _CSR_MIP_ | r/- | Machine interrupt pending register | `CR` +6+^| **<<_machine_physical_memory_protection>>** +| 0x3a0 .. 0x3af | <<_pmpcfg, `pmpcfg0`>> .. <<_pmpcfg, `pmpcfg15`>> | _CSR_PMPCFG0_ .. _CSR_PMPCFG15_ | r/w | Physical memory protection config. for region 0..63 | `S` +| 0x3b0 .. 0x3ef | <<_pmpaddr, `pmpaddr0`>> .. <<_pmpaddr, `pmpaddr63`>> | _CSR_PMPADDR0_ .. _CSR_PMPADDR63_ | r/w | Physical memory protection addr. 
register region 0..63 | +6+^| **<<_machine_counters_and_timers>>** +| 0xb00 | <<_mcycleh, `mcycle`>> | _CSR_MCYCLE_ | r/w | Machine cycle counter low word | +| 0xb02 | <<_minstreth, `minstret`>> | _CSR_MINSTRET_ | r/w | Machine instruction-retired counter low word | +| 0xb80 | <<_mcycleh>> | _CSR_MCYCLEH_ | r/w | Machine cycle counter high word | +| 0xb82 | <<_minstreth>> | _CSR_MINSTRETH_ | r/w | Machine instruction-retired counter high word | +| 0xc00 | <<_cycleh, `cycle`>> | _CSR_CYCLE_ | r/- | Cycle counter low word | +| 0xc01 | <<_timeh, `time`>> | _CSR_TIME_ | r/- | System time (from MTIME) low word | +| 0xc02 | <<_instreth, `instret`>> | _CSR_INSTRET_ | r/- | Instruction-retired counter low word | +| 0xc80 | <<_cycleh>> | _CSR_CYCLEH_ | r/- | Cycle counter high word | +| 0xc81 | <<_timeh>> | _CSR_TIMEH_ | r/- | System time (from MTIME) high word | +| 0xc82 | <<_instreth>> | _CSR_INSTRETH_ | r/- | Instruction-retired counter high word | +6+^| **<<_hardware_performance_monitors_hpm>>** +| 0x323 .. 0x33f | <<_mhpmevent, `mhpmevent3`>> .. <<_mhpmevent, `mhpmevent31`>> | _CSR_MHPMEVENT3_ .. _CSR_MHPMEVENT31_ | r/w | Machine performance-monitoring event selector 3..31 | `C` +| 0xb03 .. 0xb1f | <<_mhpmcounterh, `mhpmcounter3`>> .. <<_mhpmcounterh, `mhpmcounter31`>> | _CSR_MHPMCOUNTER3_ .. _CSR_MHPMCOUNTER31_ | r/w | Machine performance-monitoring counter 3..31 low word | +| 0xb83 .. 0xb9f | <<_mhpmcounterh, `mhpmcounter3h`>> .. <<_mhpmcounterh, `mhpmcounter31h`>> | _CSR_MHPMCOUNTER3H_ .. _CSR_MHPMCOUNTER31H_ | r/w | Machine performance-monitoring counter 3..31 high word | +| 0xc03 .. 0xc1f | <<_hpmcounterh, `hpmcounter3`>> .. <<_hpmcounterh, `hpmcounter31`>> | _CSR_HPMCOUNTER3_ .. _CSR_HPMCOUNTER31_ | r/- | Performance-monitoring counter 3..31 low word | +| 0xc83 .. 0xc9f | <<_hpmcounterh, `hpmcounter3h`>> .. <<_hpmcounterh, `hpmcounter31h`>> | _CSR_HPMCOUNTER3H_ .. 
_CSR_HPMCOUNTER31H_ | r/- | Performance-monitoring counter 3..31 high word | +6+^| **<<_machine_counter_setup>>** +| 0x320 | <<_mcountinhibit>> | _CSR_MCOUNTINHIBIT_ | r/w | Machine counter-inhibit register | +6+^| **<<_machine_information_registers>>** +| 0xf11 | <<_mvendorid>> | _CSR_MVENDORID_ | r/- | Vendor ID | +| 0xf12 | <<_marchid>> | _CSR_MARCHID_ | r/- | Architecture ID | +| 0xf13 | <<_mimpid>> | _CSR_MIMPID_ | r/- | Machine implementation ID / version | +| 0xf14 | <<_mhartid>> | _CSR_MHARTID_ | r/- | Machine thread ID | +6+^| **<<_neorv32_specific_custom_csrs>>** +| 0xfc0 | <<_mzext>> | _CSR_MZEXT_ | r/- | Available `Z*` CPU extensions | +|======================= + + + +<<< +// #################################################################################################################### +:sectnums: +==== Floating-Point CSRs + +These CSRs are available if the `Zfinx` extension is enabled (`CPU_EXTENSION_RISCV_Zfinx` is _true_). +Otherwise any access to the floating-point CSRs will raise an illegal instruction exception. + + +:sectnums!: +===== **`fflags`** + +[cols="4,27,>7"] +[frame="topbot",grid="none"] +|====== +| 0x001 | **Floating-point accrued exceptions** | `fflags` +3+| Reset value: _UNDEFINED_ +3+| The `fflags` CSR is compatible to the RISC-V specifications. It shows the accrued ("accumulated") +exception flags in the lowest 5 bits. This CSR is only available if a floating-point CPU extension is enabled. +See the RISC-V ISA spec for more information. +|====== + + +:sectnums!: +===== **`frm`** + +[cols="4,27,>7"] +[frame="topbot",grid="none"] +|====== +| 0x002 | **Floating-point dynamic rounding mode** | `frm` +3+| Reset value: _UNDEFINED_ +3+| The `frm` CSR is compatible to the RISC-V specifications and is used to configure the rounding modes using +the lowest 3 bits. This CSR is only available if a floating-point CPU extension is enabled. See the RISC-V +ISA spec for more information. 
+|====== + + +:sectnums!: +===== **`fcsr`** + +[cols="4,27,>7"] +[frame="topbot",grid="none"] +|====== +| 0x003 | **Floating-point control and status register** | `fcsr` +3+| Reset value: _UNDEFINED_ +3+| The `fcsr` CSR is compatible to the RISC-V specifications. It provides combined read/write access to the +`fflags` and `frm` CSRs. This CSR is only available if a floating-point CPU extension is enabled. See the +RISC-V ISA spec for more information. +|====== + + +<<< +// #################################################################################################################### +:sectnums: +==== Machine Trap Setup + +:sectnums!: +===== **`mstatus`** + +[cols="4,27,>7"] +[frame="topbot",grid="none"] +|====== +| 0x300 | **Machine status register - low word** | `mstatus` +3+| Reset value: _0x00000020.00000000_ +3+| The `mstatus` CSR is compatible to the RISC-V specifications. It shows the CPU's current execution state. +The following bits are implemented (all remaining bits are always zero and are read-only). +|====== + +.Machine status register +[cols="^1,<3,^1,<5"] +[options="header",grid="rows"] +|======================= +| Bit | Name [C] | R/W | Function +| 12:11 | _CSR_MSTATUS_MPP_H_ : _CSR_MSTATUS_MPP_L_ | r/w | Previous machine privilege level, 11 = machine (M) level, 00 = user (U) level +| 7 | _CSR_MSTATUS_MPIE_ | r/w | Previous machine global interrupt enable flag state +| 3 | _CSR_MSTATUS_MIE_ | r/w | Machine global interrupt enable flag +|======================= + +When entering an exception/interrupt, the `MIE` flag is copied to `MPIE` and cleared afterwards. When leaving +the exception/interrupt (via the `mret` instruction), `MPIE` is copied back to `MIE`. + + +:sectnums!: +===== **`misa`** + +[cols="4,27,>7"] +[frame="topbot",grid="none"] +|====== +| 0x301 | **ISA and extensions** | `misa` +3+| Reset value: _configuration dependent_ +3+| The `misa` CSR gives information about the actual CPU features. 
The lowest 26 bits show the implemented +CPU extensions. The following bits are implemented (all remaining bits are always zero and are read-only). +|====== + +[IMPORTANT] +The `misa` CSR is not fully RISC-V-compatible as it is read-only. Hence, implemented CPU +extensions cannot be switched on/off during runtime. For compatibility reasons any write access to this +CSR is simply ignored and will NOT cause an illegal instruction exception. + +.Machine ISA and extension register +[cols="^1,<3,^1,<5"] +[options="header",grid="rows"] +|======================= +| Bit | Name [C] | R/W | Function +| 31:30 | _CSR_MISA_MXL_HI_EXT_ : _CSR_MISA_MXL_LO_EXT_ | r/- | 32-bit architecture indicator (always _01_) +| 23 | _CSR_MISA_X_EXT_ | r/- | `X` extension bit is always set to indicate custom non-standard extensions +| 20 | _CSR_MISA_U_EXT_ | r/- | `U` CPU extension (user mode) available, set when _CPU_EXTENSION_RISCV_U_ enabled +| 12 | _CSR_MISA_M_EXT_ | r/- | `M` CPU extension (mul/div) available, set when _CPU_EXTENSION_RISCV_M_ enabled +| 8 | _CSR_MISA_I_EXT_ | r/- | `I` CPU base ISA, cleared when _CPU_EXTENSION_RISCV_E_ enabled +| 4 | _CSR_MISA_E_EXT_ | r/- | `E` CPU extension (embedded) available, set when _CPU_EXTENSION_RISCV_E_ enabled +| 2 | _CSR_MISA_C_EXT_ | r/- | `C` CPU extension (compressed instruction) available, set when _CPU_EXTENSION_RISCV_C_ enabled +| 0 | _CSR_MISA_A_EXT_ | r/- | `A` CPU extension (atomic memory access) available, set when _CPU_EXTENSION_RISCV_A_ enabled +|======================= + +[TIP] +Information regarding the available RISC-V Z* _sub-extensions_ (like `Zicsr` or `Zfinx`) can be found in the <<_mzext>> CSR. + + +:sectnums!: +===== **`mie`** + +[cols="4,27,>7"] +[frame="topbot",grid="none"] +|====== +| 0x304 | **Machine interrupt-enable register** | `mie` +3+| Reset value: _UNDEFINED_ +3+| The `mie` CSR is compatible to the RISC-V specifications and features custom extensions for the fast +interrupt channels. 
It is used to enable specific interrupt sources. Please note that interrupts also have to be +globally enabled via the `CSR_MSTATUS_MIE` flag of the `mstatus` CSR. The following bits are implemented +(all remaining bits are always zero and are read-only): +|====== + +.Machine interrupt-enable register +[cols="^1,<3,^1,<5"] +[options="header",grid="rows"] +|======================= +| Bit | Name [C] | R/W | Function +| 31:16 | _CSR_MIE_FIRQ15E_ : _CSR_MIE_FIRQ0E_ | r/w | Fast interrupt channel 15..0 enable +| 11 | _CSR_MIE_MEIE_ | r/w | Machine _external_ interrupt enable +| 7 | _CSR_MIE_MTIE_ | r/w | Machine _timer_ interrupt enable (from _MTIME_) +| 3 | _CSR_MIE_MSIE_ | r/w | Machine _software_ interrupt enable +|======================= + + +:sectnums!: +===== **`mtvec`** + +[cols="4,27,>7"] +[frame="topbot",grid="none"] +|====== +| 0x305 | **Machine trap-handler base address** | `mtvec` +3+| Reset value: _UNDEFINED_ +3+| The `mtvec` CSR is compatible to the RISC-V specifications. It stores the base address for ALL machine +traps. Thus, it defines the main entry point for exception/interrupt handling regardless of the actual trap +source. The lowest two bits of this register are always zero and cannot be modified (= fixed address mode). +|====== + +.Machine trap-handler base address +[cols="^1,^1,<8"] +[options="header",grid="rows"] +|======================= +| Bit | R/W | Function +| 31:2 | r/w | 4-byte aligned base address of trap base handler +| 1:0 | r/- | Always zero +|======================= + + +:sectnums!: +===== **`mcounteren`** + +[cols="4,27,>7"] +[frame="topbot",grid="none"] +|====== +| 0x306 | **Machine counter enable** | `mcounteren` +3+| Reset value: _UNDEFINED_ +3+| The `mcounteren` CSR is compatible to the RISC-V specifications. The bits of this CSR define which +counter/timer CSRs can be accessed (read) from code running in a less-privileged mode. 
For example, +if user-level code tries to read from a counter/timer CSR without having access, the illegal instruction +exception is raised. The following table shows all implemented bits (all remaining bits are always zero and +are read-only). If user mode is not implemented (_CPU_EXTENSION_RISCV_U_ = _false_) all bits of the +`mcounteren` CSR are tied to zero. +|====== + +.Machine counter enable register +[cols="^1,<3,^1,<5"] +[options="header",grid="rows"] +|======================= +| Bit | Name [C] | R/W | Function +| 31:3 | _CSR_MCOUNTEREN_HPM31_ : _CSR_MCOUNTEREN_HPM3_ | r/w | User-level code is allowed to read `hpmcounter*[h]` CSRs when set +| 2 | _CSR_MCOUNTEREN_IR_ | r/w | User-level code is allowed to read `instret[h]` CSRs when set +| 1 | _CSR_MCOUNTEREN_TM_ | r/w | User-level code is allowed to read `time[h]` CSRs when set +| 0 | _CSR_MCOUNTEREN_CY_ | r/w | User-level code is allowed to read `cycle[h]` CSRs when set +|======================= + + +<<< +// #################################################################################################################### +:sectnums: +==== Machine Trap Handling + +:sectnums!: +===== **`mscratch`** + +[cols="4,27,>7"] +[frame="topbot",grid="none"] +|====== +| 0x340 | **Scratch register for machine trap handlers** | `mscratch` +3+| Reset value: _UNDEFINED_ +3+| The `mscratch` CSR is compatible to the RISC-V specifications. It is a general purpose scratch register that +can be used by the exception/interrupt handler. The content of this register after reset is undefined. +|====== + +:sectnums!: +===== **`mepc`** + +[cols="4,27,>7"] +[frame="topbot",grid="none"] +|====== +| 0x341 | **Machine exception program counter** | `mepc` +3+| Reset value: _UNDEFINED_ +3+| The `mepc` CSR is compatible to the RISC-V specifications. For exceptions (like an illegal instruction) this +register provides the address of the exception-causing instruction. 
For interrupts (like a machine timer +interrupt) this register provides the address of the next not-yet-executed instruction. +|====== + +:sectnums!: +===== **`mcause`** + +[cols="4,27,>7"] +[frame="topbot",grid="none"] +|====== +| 0x342 | **Machine trap cause** | `mcause` +3+| Reset value: _UNDEFINED_ +3+| The `mcause` CSR is compatible to the RISC-V specifications. It shows the cause ID for a taken trap. +|====== + +.Machine trap cause register +[cols="^1,^1,<8"] +[options="header",grid="rows"] +|======================= +| Bit | R/W | Function +| 31 | r/w | `1` if the trap is caused by an interrupt (`0` if the trap is caused by an exception) +| 30:5 | r/- | _Reserved_, read as zero +| 4:0 | r/w | Trap ID, see <<_neorv32_trap_listing>> +|======================= + +:sectnums!: +===== **`mtval`** + +[cols="4,27,>7"] +[frame="topbot",grid="none"] +|====== +| 0x343 | **Machine bad address or instruction** | `mtval` +3+| Reset value: _UNDEFINED_ +3+| The `mtval` CSR is compatible to the RISC-V specifications. When a trap is triggered, the CSR shows either +the faulting address (for misaligned/faulting load/stores/fetch) or the faulting instruction itself (for illegal +instructions). For interrupts the CSR is set to zero. +|====== + +.Machine bad address or instruction register +[cols="^5,^5"] +[options="header",grid="rows"] +|======================= +| Trap cause | `mtval` content +| misaligned instruction fetch address or instruction fetch access fault | address of faulting instruction fetch +| breakpoint | program counter (= address) of faulting instruction itself +| misaligned load address, load access fault, misaligned store address or store access fault | program counter (= address) of faulting instruction itself +| illegal instruction | actual instruction word of faulting instruction +| anything else including interrupts | _0x00000000_ (always zero) +|======================= + +[IMPORTANT] +The NEORV32 `mtval` CSR is read-only. 
A write access will raise an illegal instruction exception. + +:sectnums!: +===== **`mip`** + +[cols="4,27,>7"] +[frame="topbot",grid="none"] +|====== +| 0x344 | **Machine interrupt pending** | `mip` +3+| Reset value: _0x00000000_ +3+| The `mip` CSR is _partly_ compatible to the RISC-V specifications and also provides custom extensions. It shows currently pending interrupts. Since this register is +read-only, a pending interrupt can only be cleared by disabling and re-enabling the according `mie` CSR bit. Writing to this CSR will +raise an illegal instruction exception. The following CSR bits are implemented (all remaining bits are always zero and are read-only). +|====== + +.Machine interrupt pending register +[cols="^1,<3,^1,<5"] +[options="header",grid="rows"] +|======================= +| Bit | Name [C] | R/W | Function +| 31:16 | _CSR_MIP_FIRQ15P_ : _CSR_MIP_FIRQ0P_ | r/- | fast interrupt channel 15..0 pending +| 11 | _CSR_MIP_MEIP_ | r/- | machine _external_ interrupt pending +| 7 | _CSR_MIP_MTIP_ | r/- | machine _timer_ interrupt pending +| 3 | _CSR_MIP_MSIP_ | r/- | machine _software_ interrupt pending +|======================= + + +<<< +// #################################################################################################################### +:sectnums: +==== Machine Physical Memory Protection + +The available physical memory protection logic is configured via the _PMP_NUM_REGIONS_ and +_PMP_MIN_GRANULARITY_ top entity generics. _PMP_NUM_REGIONS_ defines the number of implemented +protection regions and thus, the availability of the according `pmpcfg*` and `pmpaddr*` CSRs. + +[TIP] +If trying to access a PMP-related CSR beyond _PMP_NUM_REGIONS_ **no illegal instruction +exception** is triggered. The according CSRs are read-only (writes are ignored) and always return zero. 
+ +[IMPORTANT] +The RISC-V-compatible NEORV32 physical memory protection only implements the _NAPOT_ +(naturally aligned power-of-two region) mode with a minimal region granularity of 8 bytes. + + +:sectnums!: +===== **`pmpcfg`** + +[cols="4,27,>7"] +[frame="topbot",grid="none"] +|====== +| 0x3a0 - 0x3af| **Physical memory protection configuration registers** | `pmpcfg0` - `pmpcfg15` +3+| Reset value: _0x00000000_ +3+| The `pmpcfg*` CSRs are compatible to the RISC-V specifications. They are used to configure the protected +regions, where each `pmpcfg*` CSR provides configuration bits for four regions. The following bits (for the +first PMP configuration entry) are implemented (all remaining bits are always zero and are read-only): +|====== + +.Physical memory protection configuration register entry +[cols="^1,^3,^1,<11"] +[options="header",grid="rows"] +|======================= +| Bit | RISC-V name | R/W | Function +| 7 | _L_ | r/w | lock bit, can be set, but cannot be cleared again (only via CPU reset) +| 6:5 | - | r/- | reserved, read as zero +| 4:3 | _A_ | r/w | mode configuration; only OFF (`00`) and NAPOT (`11`) are supported +| 2 | _X_ | r/w | execute permission +| 1 | _W_ | r/w | write permission +| 0 | _R_ | r/w | read permission +|======================= + + +:sectnums!: +===== **`pmpaddr`** + +[cols="4,27,>7"] +[frame="topbot",grid="none"] +|====== +| 0x3b0 - 0x3ef| **Physical memory protection address registers** | `pmpaddr0` - `pmpaddr63` +3+| Reset value: _UNDEFINED_ +3+| The `pmpaddr*` CSRs are compatible to the RISC-V specifications. They are used to configure the base +address and the region size. +|====== + +[NOTE] +When configuring PMP make sure to set `pmpaddr*` before activating the according region via +`pmpcfg*`. When changing the PMP configuration, deactivate the according region via `pmpcfg*` +before modifying `pmpaddr*`. 
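The values written to `pmpaddr*` and `pmpcfg*` can be derived with a few lines of C. The following is only a minimal sketch: the helper names `pmp_napot_addr` and `pmp_cfg_entry` are hypothetical (they are not part of the NEORV32 core library), and the encoding follows the standard RISC-V NAPOT scheme together with the `pmpcfg` bit layout shown in the table above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not a NEORV32 library function): encode a NAPOT
 * region for a pmpaddr* CSR. size must be a power of two of at least
 * 8 bytes and base must be aligned to size. */
static uint32_t pmp_napot_addr(uint32_t base, uint32_t size)
{
    assert(size >= 8u && (size & (size - 1u)) == 0u); /* power of two, >= 8 */
    assert((base & (size - 1u)) == 0u);               /* naturally aligned  */
    /* pmpaddr holds the address shifted right by 2; the run of trailing
     * one-bits in the low part selects the region size. */
    return (base >> 2) | ((size >> 3) - 1u);
}

/* Hypothetical helper: build one 8-bit pmpcfg entry in NAPOT mode.
 * L = bit 7, A = bits 4:3 (NAPOT = 0b11), X = bit 2, W = bit 1, R = bit 0. */
static uint8_t pmp_cfg_entry(int lock, int x, int w, int r)
{
    return (uint8_t)(((lock ? 1u : 0u) << 7) | (3u << 3) |
                     ((x ? 1u : 0u) << 2) | ((w ? 1u : 0u) << 1) |
                     (r ? 1u : 0u));
}
```

On the target, the result of `pmp_napot_addr` would be written to the `pmpaddr*` CSR first and the `pmp_cfg_entry` byte to the matching `pmpcfg*` field afterwards, which matches the ordering required by the note above.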
+ + +<<< +// #################################################################################################################### +:sectnums: +==== (Machine) Counters and Timers + +[IMPORTANT] +The _CPU_CNT_WIDTH_ generic defines the total size of the CPU's `[m]cycle` and `[m]instret` +counter CSRs (low and high words combined); the time CSRs are not affected by this generic. Any +configuration with _CPU_CNT_WIDTH_ less than 64 is not RISC-V compliant. + +[IMPORTANT] +If _CPU_CNT_WIDTH_ is less than 64 (the default value) and greater than or equal to 32, the according +MSBs of `[m]cycleh` and `[m]instreth` are read-only and always read as zero. This configuration +will also set the _ZXSCNT_ flag in the `mzext` CSR. + +[IMPORTANT] +If _CPU_CNT_WIDTH_ is less than 32 and greater than 0, the `[m]cycleh` and `[m]instreth` CSRs do not +exist and any access will raise an illegal instruction exception. Furthermore, the according MSBs of +`[m]cycle` and `[m]instret` are read-only and always read as zero. This configuration will also +set the _ZXSCNT_ flag in the `mzext` CSR. + +[IMPORTANT] +If _CPU_CNT_WIDTH_ is 0, the `[m]cycleh`, `[m]cycle`, `[m]instreth` and `[m]instret` CSRs do not +exist and any access will raise an illegal instruction exception. This configuration will also set the +_ZXNOCNT_ flag in the `mzext` CSR. + + +:sectnums!: +===== **`cycle[h]`** + +[cols="4,27,>7"] +[frame="topbot",grid="none"] +|====== +| 0xc00 | **Cycle counter - low word** | `cycle` +| 0xc80 | **Cycle counter - high word** | `cycleh` +3+| Reset value: _UNDEFINED_ +3+| The `cycle[h]` CSR is compatible to the RISC-V specifications. It shows the lower/upper 32-bit of the 64-bit cycle +counter. The `cycle[h]` CSR is a read-only shadowed copy of the `mcycle[h]` CSR. 
+|====== + + +:sectnums!: +===== **`time[h]`** + +[cols="4,27,>7"] +[frame="topbot",grid="none"] +|====== +| 0xc01 | **System time - low word** | `time` +| 0xc81 | **System time - high word** | `timeh` +3+| Reset value: _UNDEFINED_ +3+| The `time[h]` CSR is compatible to the RISC-V specifications. It shows the lower/upper 32-bit of the 64-bit system +time. The system time is either generated by the processor-internal _MTIME_ system timer unit (if _IO_MTIME_EN_ = _true_) or can be provided by an +external timer unit via the processor's `mtime_i` signal (if _IO_MTIME_EN_ = _false_). +This CSR is read-only. Change the system time via the _MTIME_ unit. +|====== + + +:sectnums!: +===== **`instret[h]`** + +[cols="4,27,>7"] +[frame="topbot",grid="none"] +|====== +| 0xc02 | **Instructions-retired counter - low word** | `instret` +| 0xc82 | **Instructions-retired counter - high word** | `instreth` +3+| Reset value: _UNDEFINED_ +3+| The `instret[h]` CSR is compatible to the RISC-V specifications. It shows the lower/upper 32-bit of the 64-bit retired +instructions counter. The `instret[h]` CSR is a read-only shadowed copy of the `minstret[h]` CSR. +|====== + + +:sectnums!: +===== **`mcycle[h]`** + +[cols="4,27,>7"] +[frame="topbot",grid="none"] +|====== +| 0xb00 | **Machine cycle counter - low word** | `mcycle` +| 0xb80 | **Machine cycle counter - high word** | `mcycleh` +3+| Reset value: _UNDEFINED_ +3+| The `mcycle[h]` CSR is compatible to the RISC-V specifications. It shows the lower/upper 32-bit of the 64-bit cycle +counter. The `mcycle[h]` CSR can also be written when in machine mode and is copied to the `cycle[h]` CSR. 
+|====== + + +:sectnums!: +===== **`minstret[h]`** + +[cols="4,27,>7"] +[frame="topbot",grid="none"] +|====== +| 0xb02 | **Machine instructions-retired counter - low word** | `minstret` +| 0xb82 | **Machine instructions-retired counter - high word** | `minstreth` +3+| Reset value: _UNDEFINED_ +3+| The `minstret[h]` CSR is compatible to the RISC-V specifications. It shows the lower/upper 32-bit of the 64-bit retired +instructions counter. The `minstret[h]` CSR can also be written when in machine mode and is copied to the `instret[h]` CSR. +|====== + + + +<<< +// #################################################################################################################### +:sectnums: +==== Hardware Performance Monitors (HPM) + +The available hardware performance logic is configured via the _HPM_NUM_CNTS_ top entity generic. +_HPM_NUM_CNTS_ defines the number of implemented performance monitors and thus, the availability of the +according `[m]hpmcounter*[h]` and `mhpmevent*` CSRs. + +The total size of the HPMs can be configured before synthesis via the _HPM_CNT_WIDTH_ generic (0..64-bit). + +[TIP] +If trying to access an HPM-related CSR beyond _HPM_NUM_CNTS_ **no illegal instruction exception is +triggered**. The according CSRs are read-only (writes are ignored) and always return zero. + +[NOTE] +The total LSB-aligned HPM counter size (low word CSR + high word CSR) is defined via the +_HPM_CNT_WIDTH_ generic (0..64-bit). If _HPM_CNT_WIDTH_ is less than 64, all unused MSB-aligned +bits are hardwired to zero. + + +:sectnums!: +===== **`mhpmevent`** + +[cols="4,27,>7"] +[frame="topbot",grid="none"] +|====== +| 0x323 - 0x33f | **Machine hardware performance monitor event selector** | `mhpmevent3` - `mhpmevent31` +3+| Reset value: _UNDEFINED_ +3+| The `mhpmevent*` CSRs are compatible to the RISC-V specifications. The configuration of these CSRs defines +the architectural events that cause the according `[m]hpmcounter*[h]` counters to increment. 
All available events are +listed in the table below. If more than one event is selected, the according counter will increment if any of +the enabled events is observed (logical OR). Note that the counter will only increment by 1 step per clock +cycle even if more than one event is observed. If the CPU is in sleep mode, no HPM counter will increment +at all. +|====== + +.HPM event selector +[cols="^1,<3,^1,<5"] +[options="header",grid="rows"] +|======================= +| Bit | Name [C] | R/W | Event +| 0 | _HPMCNT_EVENT_CY_ | r/w | active clock cycle (not in sleep) +| 1 | - | r/- | _not implemented, always read as zero_ +| 2 | _HPMCNT_EVENT_IR_ | r/w | retired instruction +| 3 | _HPMCNT_EVENT_CIR_ | r/w | retired compressed instruction +| 4 | _HPMCNT_EVENT_WAIT_IF_ | r/w | instruction fetch memory wait cycle (if more than 1 cycle memory latency) +| 5 | _HPMCNT_EVENT_WAIT_II_ | r/w | instruction issue pipeline wait cycle (if more than 1 cycle latency), caused by pipeline flushes (like taken branches) +| 6 | _HPMCNT_EVENT_WAIT_MC_ | r/w | multi-cycle ALU operation wait cycle +| 7 | _HPMCNT_EVENT_LOAD_ | r/w | load operation +| 8 | _HPMCNT_EVENT_STORE_ | r/w | store operation +| 9 | _HPMCNT_EVENT_WAIT_LS_ | r/w | load/store memory wait cycle (if more than 1 cycle memory latency) +| 10 | _HPMCNT_EVENT_JUMP_ | r/w | unconditional jump +| 11 | _HPMCNT_EVENT_BRANCH_ | r/w | conditional branch (taken or not taken) +| 12 | _HPMCNT_EVENT_TBRANCH_ | r/w | taken conditional branch +| 13 | _HPMCNT_EVENT_TRAP_ | r/w | entered trap +| 14 | _HPMCNT_EVENT_ILLEGAL_ | r/w | illegal instruction exception +|======================= + + +:sectnums!: +===== **`hpmcounter[h]`** + +[cols="4,27,>7"] +[frame="topbot",grid="none"] +|====== 
+| 0xc03 - 0xc1f | **Hardware performance monitor - counter low** | `hpmcounter3` - `hpmcounter31` +| 0xc83 - 0xc9f | **Hardware performance monitor - counter high** | `hpmcounter3h` - `hpmcounter31h` +3+| Reset value: _UNDEFINED_ +3+| The `hpmcounter*[h]` CSRs are compatible to the RISC-V specifications. These CSRs provide the lower/upper 32-bit +of arbitrary event counters (64-bit). These CSRs are read-only and provide a shadowed copy of the according +`mhpmcounter*[h]` CSRs. The event(s) that trigger an increment of these counters are selected via the according +`mhpmevent*` CSRs. +|====== + + +:sectnums!: +===== **`mhpmcounter[h]`** + +[cols="4,27,>7"] +[frame="topbot",grid="none"] +|====== +| 0xb03 - 0xb1f | **Machine hardware performance monitor - counter low** | `mhpmcounter3` - `mhpmcounter31` +| 0xb83 - 0xb9f | **Machine hardware performance monitor - counter high** | `mhpmcounter3h` - `mhpmcounter31h` +3+| Reset value: _UNDEFINED_ +3+| The `mhpmcounter*[h]` CSRs are compatible to the RISC-V specifications. These CSRs provide the lower/upper 32-bit +of arbitrary event counters (64-bit). The `mhpmcounter*[h]` CSRs can also be written and are copied to the +`hpmcounter*[h]` CSRs. The event(s) that trigger an increment of these counters are selected via the according +`mhpmevent*` CSRs. +|====== + + +<<< +// #################################################################################################################### +:sectnums: +==== Machine Counter Setup + +:sectnums!: +===== **`mcountinhibit`** + +[cols="4,27,>7"] +[frame="topbot",grid="none"] +|====== +| 0x320 | **Machine counter-inhibit register** | `mcountinhibit` +3+| Reset value: _UNDEFINED_ +3+| The `mcountinhibit` CSR is compatible to the RISC-V specifications. The bits in this register define which +counter/timer CSRs are allowed to perform an automatic increment. Automatic update is enabled if the +according bit in `mcountinhibit` is cleared. 
The following bits are implemented (all remaining bits are +always zero and are read-only). +|====== + +.Machine counter-inhibit register +[cols="^1,<3,^1,<5"] +[options="header",grid="rows"] +|======================= +| Bit | Name [C] | R/W | Description +| 0 | _CSR_MCOUNTINHIBIT_CY_ | r/w | the `[m]cycle[h]` CSRs will auto-increment with each clock cycle (if CPU is not in sleep state) when cleared +| 2 | _CSR_MCOUNTINHIBIT_IR_ | r/w | the `[m]instret[h]` CSRs will auto-increment with each committed instruction when cleared +| 3:31 | _CSR_MCOUNTINHIBIT_HPM3_ : _CSR_MCOUNTINHIBIT_HPM31_ | r/w | the `[m]hpmcounter*[h]` CSRs will auto-increment according to the configured `mhpmevent*` selector when cleared +|======================= + + +<<< +// #################################################################################################################### +:sectnums: +==== Machine Information Registers + + +:sectnums!: +===== **`mvendorid`** + +[cols="4,27,>7"] +[frame="topbot",grid="none"] +|====== +| 0xf11 | **Machine vendor ID** | `mvendorid` +3+| Reset value: _0x00000000_ +3+| The `mvendorid` CSR is compatible to the RISC-V specifications. It is read-only and always reads zero. +|====== + + +:sectnums!: +===== **`marchid`** + +[cols="4,27,>7"] +[frame="topbot",grid="none"] +|====== +| 0xf12 | **Machine architecture ID** | `marchid` +3+| Reset value: _0x00000013_ +3+| The `marchid` CSR is compatible to the RISC-V specifications. It is read-only and shows the NEORV32 +official _RISC-V open-source architecture ID_ (decimal: 19, 32-bit hexadecimal: 0x00000013). +|====== + + +:sectnums!: +===== **`mimpid`** + +[cols="4,27,>7"] +[frame="topbot",grid="none"] +|====== +| 0xf13 | **Machine implementation ID** | `mimpid` +3+| Reset value: _HW version number_ +3+| The `mimpid` CSR is compatible to the RISC-V specifications. It is read-only and shows the version of the +NEORV32 as BCD-coded number (example: `mimpid` = _0x01020312_ → 01.02.03.12 → version 1.2.3.12). 
+|====== + + +:sectnums!: +===== **`mhartid`** + +[cols="4,27,>7"] +[frame="topbot",grid="none"] +|====== +| 0xf14 | **Machine hardware thread ID** | `mhartid` +3+| Reset value: _HW_THREAD_ID_ generic +3+| The `mhartid` CSR is compatible to the RISC-V specifications. It is read-only and shows the core's hart ID, +which is assigned via the CPU's _HW_THREAD_ID_ generic. +|====== + + + +<<< +// #################################################################################################################### +:sectnums: +==== NEORV32-Specific Custom CSRs + + +:sectnums!: +===== **`mzext`** + +[cols="4,27,>7"] +[frame="topbot",grid="none"] +|====== +| 0xfc0 | **Available Z* extensions** | `mzext` +3+| Reset value: _0x00000000_ +3+| The `mzext` CSR is a custom read-only CSR that shows the implemented Z* extensions. The following bits +are implemented (all remaining bits are always zero). +|====== + +.`mzext` - available Z* extensions register bits +[cols="^1,<3,^1,<5"] +[options="header",grid="rows"] +|======================= +| Bit | Name [C] | R/W | Description +| 0 | _CPU_MZEXT_ZICSR_ | r/- | `Zicsr` extension available (enabled via _CPU_EXTENSION_RISCV_Zicsr_ generic) +| 1 | _CPU_MZEXT_ZIFENCEI_ | r/- | `Zifencei` extension available (enabled via _CPU_EXTENSION_RISCV_Zifencei_ generic) +| 5 | _CPU_MZEXT_ZFINX_ | r/- | `Zfinx` extension available (enabled via _CPU_EXTENSION_RISCV_Zfinx_ generic) +| 6 | _CPU_MZEXT_ZXSCNT_ | r/- | custom extension: "Small CPU counters": `cycle[h]` & `instret[h]` CSRs have less than 64 bits when set (when _CPU_CNT_WIDTH_ generic is less than 64) +| 7 | _CPU_MZEXT_ZXNOCNT_ | r/- | custom extension: "NO CPU counters": `cycle[h]` & `instret[h]` CSRs are not available at all when set (when _CPU_CNT_WIDTH_ generic is 0) +| 8 | _CSR_MZEXT_PMP_ | r/- | PMP (physical memory protection) extension available (_PMP_NUM_REGIONS_ generic > 0) +| 9 | _CSR_MZEXT_HPM_ | r/- | HPM (hardware performance monitors) extension available 
(_HPM_NUM_CNTS_ generic > 0) +| 10 | _CSR_MZEXT_DEBUGMODE_ | r/- | RISC-V "CPU debug mode" extension available (enabled via _CPU_EXTENSION_RISCV_DEBUG_ generic) +|======================= Index: datasheet/index.adoc =================================================================== --- datasheet/index.adoc (nonexistent) +++ datasheet/index.adoc (revision 60) @@ -0,0 +1,32 @@ += The NEORV32 RISC-V Processor: Datasheet +:title: [Datasheet] The NEORV32 RISC-V Processor +:author: Dipl.-Ing. Stephan Nolting +:email: stnolting@gmail.com +:description: A size-optimized, customizable and open-source full-scale 32-bit RISC-V soft-core CPU and SoC written in platform-independent VHDL. +:revnumber: v1.5.6.0 +:doctype: book +:sectnums: +:icons: font +:imagesdir: img +:stem: +:reproducible: +:listing-caption: Listing +:toc: left +:toclevels: 4 +:title-logo-image: neorv32_logo_dark.png[pdfwidth=6.25in,align=center] +:favicon: img/icon.png + +image::neorv32_logo_transparent.png[align=center] + +image::riscv_logo.png[width=350,align=center] + +[.text-center] +https://github.com/stnolting/neorv32[image:https://img.shields.io/badge/GitHub-stnolting%2Fneorv32-ffbd00?style=flat-square&logo=github&[title='homepage']] +https://github.com/stnolting/neorv32/blob/master/LICENSE[image:https://img.shields.io/github/license/stnolting/neorv32?longCache=true&style=flat-square[title='license']] +https://github.com/stnolting/neorv32/releases/tag/nightly[image:https://img.shields.io/badge/data%20sheet-PDF-ffbd00?longCache=true&style=flat-square&logo=asciidoctor[title='datasheet (pdf)']] +https://github.com/stnolting/neorv32/releases/tag/nightly[image:https://img.shields.io/badge/user%20guide-PDF-ffbd00?longCache=true&style=flat-square&logo=asciidoctor[title='userguide (pdf)']] +https://stnolting.github.io/neorv32/ug[image:https://img.shields.io/badge/-HTML-ffbd00?longCache=true&style=flat-square[title='userguide (html)']] 
+https://stnolting.github.io/neorv32/sw/files.html[image:https://img.shields.io/badge/doxygen-HTML-ffbd00?longCache=true&style=flat-square&logo=Doxygen[title='doxygen']] + + +include::content.adoc[] Index: datasheet/main.adoc =================================================================== --- datasheet/main.adoc (nonexistent) +++ datasheet/main.adoc (revision 60) @@ -0,0 +1,34 @@ += The NEORV32 RISC-V Processor: Datasheet +:author: Dipl.-Ing. Stephan Nolting +:email: stnolting@gmail.com +:description: A size-optimized, customizable and open-source full-scale 32-bit RISC-V soft-core CPU and SoC written in platform-independent VHDL. +:revnumber: v1.5.6.0 +:doctype: book +:sectnums: +:icons: image +:iconsdir: ../icons +:imagesdir: ../figures +:stem: +:reproducible: +:listing-caption: Listing +:toc: macro +:toclevels: 4 +:title-logo-image: image:neorv32_logo_dark.png[pdfwidth=6.25in,align=center] +// Uncomment next line to set page size (default is A4) +//:pdf-page-size: Letter + + +<<< +// #################################################################################################################### +.**Documentation** +[TIP] +The online documentation of the project (a.k.a. 
the **data sheet**) is available on GitHub-pages: https://stnolting.github.io/neorv32/ + + + +The online documentation of the **software framework** is also available on GitHub-pages: https://stnolting.github.io/neorv32/sw/files.html + + +<<< +// #################################################################################################################### +toc::[] + +include::content.adoc[] Index: datasheet/on_chip_debugger.adoc =================================================================== --- datasheet/on_chip_debugger.adoc (nonexistent) +++ datasheet/on_chip_debugger.adoc (revision 60) @@ -0,0 +1,602 @@ +<<< +:sectnums: +== On-Chip Debugger (OCD) + +The NEORV32 Processor features an _on-chip debugger_ (OCD) implementing **execution-based debugging** that is compatible +to the **Minimal RISC-V Debug Specification Version 0.13.2**. +Please refer to this spec for in-depth information. +A copy of the specification is available in `docs/references/riscv-debug-release.pdf`. +The NEORV32 OCD provides the following key features: + +* JTAG test access port +* run-control of the CPU: halting, single-stepping and resuming +* executing arbitrary programs during debugging +* accessing core registers (direct access to GPRs, indirect access to CSRs via program buffer) +* indirect access to the whole processor address space (via program buffer) +* compatible to the https://github.com/riscv/riscv-openocd[RISC-V port of OpenOCD]; + pre-built binaries can be obtained for example from https://www.sifive.com/software[SiFive] + +[NOTE] +The OCD requires additional resources for implementation and _might_ also lengthen the critical path, resulting in lower +performance. If the OCD is not really required for the _final_ implementation, it can be disabled and thus +discarded from implementation. In this case all circuitry of the debugger is completely removed (no impact +on area, energy or timing at all). 
+ +[TIP] +A simple example on how to use the NEORV32 on-chip debugger in combination with `OpenOCD` and `gdb` +is shown in chapter <<_debugging_using_the_on_chip_debugger>>. + +The NEORV32 on-chip debugger complex is based on three hardware modules: + +.NEORV32 on-chip debugger complex +image::neorv32_ocd_complex.png[align=center] + +[start=1] +. <<_debug_transport_module_dtm>> (`rtl/core/neorv32_debug_dtm.vhd`): External JTAG access tap to allow an external + adapter to interface with the _debug module (DM)_ using the _debug module interface (dmi)_. +. <<_debug_module_dm>> (`rtl/core/neorv32_debug_dm.vhd`): Debugger control unit that is configured by the DTM via + the _dmi_. From the CPU's "point of view" this module behaves as a memory-mapped "peripheral" that can be accessed + via the processor-internal bus. The memory-mapped registers provide an internal _data buffer_ for data transfer + from/to the DM, a _code ROM_ containing the "park loop" code, a _program buffer_ to allow the debugger to + execute small programs defined by the DM and a _status register_ that is used to communicate + _halt_, _resume_ and _execute_ requests/acknowledges from/to the DM. +. CPU <<_cpu_debug_mode>> extension (part of `rtl/core/neorv32_cpu_control.vhd`): + This extension provides the "debug execution mode" which executes the "park loop" code from the DM. + The mode also provides additional CSRs. + +**Theory of Operation** + +When debugging the system using the OCD, the debugger issues a halt request to the CPU (via the CPU's +`db_halt_req_i` signal) to make the CPU enter _debug mode_. In this state, the application-defined architectural +state of the system/CPU is "frozen" so the debugger can monitor and even modify it. +While in debug mode, the CPU executes the "park loop" code from the _code ROM_ of the DM. +This park loop implements an endless loop, in which the CPU polls the memory-mapped _status register_ that is +controlled by the _debug module (DM)_. 
The flags of this register are used to communicate _requests_ from +the DM and to _acknowledge_ them by the CPU: trigger execution of the program buffer or resume the halted +application. + + + +<<< +// #################################################################################################################### +:sectnums: +=== Debug Transport Module (DTM) + +The debug transport module (VHDL module: `rtl/core/neorv32_debug_dtm.vhd`) provides a JTAG test access port (TAP). +The DTM is the first entity in the debug system, which connects an external debugger via JTAG to the next debugging +entity: the debug module (DM). +External access is provided by the following top-level ports. + +.JTAG top level signals +[cols="^2,^2,^2,<8"] +[options="header",grid="rows"] +|======================= +| Name | Width | Direction | Description +| `jtag_trst_i` | 1 | in | TAP reset (low-active); this signal is optional, make sure to pull it _high_ if it is not used +| `jtag_tck_i` | 1 | in | serial clock +| `jtag_tdi_i` | 1 | in | serial data input +| `jtag_tdo_o` | 1 | out | serial data output +| `jtag_tms_i` | 1 | in | mode select +|======================= + +.JTAG Clock +[IMPORTANT] +The actual JTAG clock signal is **not** used as primary clock. Instead it is used to synchronize +JTAG accesses, while all internal operations trigger on the system clock. Hence, no additional clock domain is required +for integration of this module. +However, this constrains the maximum JTAG clock (`jtag_tck_i`) frequency to be less than or equal to +1/4 of the system clock (`clk_i`) frequency. + +[NOTE] +If the on-chip debugger is disabled (_ON_CHIP_DEBUGGER_EN_ = false) the JTAG serial input `jtag_tdi_i` is directly +connected to the JTAG serial output `jtag_tdo_o` to maintain the JTAG chain. + +[WARNING] +The NEORV32 JTAG TAP does not provide a _boundary scan_ function (yet?). Hence, physical device pins cannot be accessed. 
+ +The DTM uses the "debug module interface (dmi)" to access the actual debug module (DM). +These accesses are controlled by TAP-internal registers. +Each register is selected by the JTAG instruction register (`IR`) and accessed through the JTAG data register (`DR`). + +[NOTE] +The DTM's instruction and data registers can be accessed using OpenOCD's `irscan` and `drscan` commands. +The RISC-V port of OpenOCD also provides low-level commands (`riscv dmi_read` & `riscv dmi_write`) to access the _dmi_ +debug module interface. + +JTAG access is conducted via the *instruction register* `IR`, which is 5 bits wide, and several *data registers* `DR` +with different sizes. +The data registers are accessed by writing the according address to the instruction register. +The following table shows the available data registers: + +.JTAG TAP registers +[cols="^2,^2,^2,<8"] +[options="header",grid="rows"] +|======================= +| Address (via `IR`) | Name | Size [bits] | Description +| `00001` | `IDCODE` | 32 | identifier, default: `0x0CAFE001` (configurable via package's `jtag_tap_idcode_*` constants) +| `10000` | `DTMCS` | 32 | debug transport module control and status register +| `10001` | `DMI` | 41 | debug module interface (_dmi_); 7-bit address, 32-bit read/write data, 2-bit operation (`00` = NOP; `10` = write; `01` = read) +| others | `BYPASS` | 1 | default JTAG bypass register +|======================= + +[NOTE] +See the https://github.com/riscv/riscv-debug-spec[RISC-V debug specification] for more information regarding the data +registers and operations. +A local copy can be found in `docs/references`. 
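As a rough illustration of the 41-bit `DMI` data register listed in the TAP register table, the following C sketch packs and unpacks a _dmi_ scan word. It assumes the field order of the RISC-V debug specification 0.13 (address in bits 40:34, data in bits 33:2, operation in bits 1:0). The function and constant names are hypothetical and are not part of the NEORV32 sources or of OpenOCD.

```c
#include <stdint.h>

/* dmi operation codes as listed in the JTAG TAP register table */
enum { DMI_OP_NOP = 0, DMI_OP_READ = 1, DMI_OP_WRITE = 2 };

/* Pack a 41-bit dmi scan word into the low bits of a 64-bit integer.
 * Assumed layout (RISC-V debug spec 0.13): [40:34] address, [33:2] data, [1:0] op.
 * All names here are illustrative only. */
static uint64_t dmi_pack(uint8_t addr, uint32_t data, unsigned op) {
  return ((uint64_t)(addr & 0x7fu) << 34) /* 7-bit DM register address */
       | ((uint64_t)data << 2)            /* 32-bit read/write data */
       | (uint64_t)(op & 0x3u);           /* 2-bit operation */
}

/* Field extractors for a word shifted out of the DMI data register */
static uint8_t  dmi_addr(uint64_t w) { return (uint8_t)((w >> 34) & 0x7fu); }
static uint32_t dmi_data(uint64_t w) { return (uint32_t)((w >> 2) & 0xffffffffu); }
static unsigned dmi_op(uint64_t w)   { return (unsigned)(w & 0x3u); }
```

For example, a write of `0x00000001` to the DM register at address `0x10` would be encoded as `dmi_pack(0x10, 0x00000001, DMI_OP_WRITE)` before being shifted into `DR` with `IR` = `10001`.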
+ + + +<<< +// #################################################################################################################### +:sectnums: +=== Debug Module (DM) + +According to the RISC-V debug specification, the DM (VHDL module: `rtl/core/neorv32_debug_dm.vhd`) +acts as a translation interface between abstract operations issued by the debugger and the platform-specific +debugger implementation. It supports the following features (excerpt from the debug spec): + +* Gives the debugger necessary information about the implementation. +* Allows the hart to be halted and resumed and provides status of the current state. +* Provides abstract read and write access to the halted hart's GPRs. +* Provides access to a reset signal that allows debugging from the very first instruction after reset. +* Provides a mechanism to allow debugging the hart immediately out of reset. (_still experimental_) +* Provides a Program Buffer to force the hart to execute arbitrary instructions. +* Allows memory access from a hart's point of view. + +The NEORV32 DM follows the "Minimal RISC-V External Debug Specification" to provide full debugging +capabilities while keeping resource (area) requirements at a minimum level. +It implements the **execution based debugging scheme** for a single hart and provides the following +hardware features: + +* program buffer with 2 entries and implicit `ebreak` instruction afterwards +* no _direct_ bus access (indirect bus access via the CPU) +* abstract commands: "access register" plus auto-execution +* no _dedicated_ halt-on-reset capabilities yet (but can be emulated) + +The DM provides two "sides of access": access from the DTM via the _debug module interface (dmi)_ and access from the +CPU via the processor-internal bus. From the DTM's point of view, the DM implements a set of <<_dm_registers>> that +are used to control and monitor the actual debugging. 
From the CPU's point of view, the DM implements several +memory-mapped registers (within the _normal_ address space) that are used for communicating debugging control +and status (<<_dm_cpu_access>>). + + +:sectnums: +==== DM Registers + +The DM is controlled via a set of registers that are accessed via the DTM's _dmi_. +The "Minimal RISC-V Debug Specification" requires only a subset of the registers specified in the spec. +The following registers are implemented. +Write accesses to any other register are ignored and read accesses will always return zero. +Register names that are encapsulated in "( )" are not actually implemented; however, they are listed to explicitly show +their functionality. + +.Available DM registers +[cols="^2,^3,<7"] +[options="header",grid="rows"] +|======================= +| Address | Name | Description +| `0x04` | `data0` | Abstract data 0, used for data transfer between debugger and processor +| `0x10` | `dmcontrol` | Debug module control +| `0x11` | `dmstatus` | Debug module status +| `0x12` | `hartinfo` | Hart information +| `0x16` | `abstractcs` | Abstract control and status +| `0x17` | `command` | Abstract command +| `0x18` | `abstractauto` | Abstract command auto-execution +| `0x1d` | (`nextdm`) | Base address of _next_ DM; read as zero to indicate there is only _one_ DM +| `0x20` | `progbuf0` | Program buffer 0 +| `0x21` | `progbuf1` | Program buffer 1 +| `0x38` | (`sbcs`) | System bus access control and status; read as zero to indicate there is no _direct_ system bus access +| `0x40` | `haltsum0` | Halt summary 0 +|======================= + + +:sectnums!: +===== **`data`** + +[cols="4,27,>7"] +[frame="topbot",grid="none"] +|====== +| 0x04 | **Abstract data 0** | `data0` +3+| Reset value: _UNDEFINED_ +3+| Basic read/write register to be used with abstract commands (for example to read/write data from/to CPU GPRs). 
+|====== + + +:sectnums!: +===== **`dmcontrol`** + +[cols="4,27,>7"] +[frame="topbot",grid="none"] +|====== +| 0x10 | **Debug module control register** | `dmcontrol` +3+| Reset value: 0x00000000 +3+| Control of the overall debug module and the hart. The following table shows all implemented bits. All remaining bits/bit-fields are hardwired to zero and are +read-only. Writes to these bits/fields are ignored. +|====== + +.`dmcontrol` - debug module control register bits +[cols="^1,^2,^1,<8"] +[options="header",grid="rows"] +|======================= +| Bit | Name [RISC-V] | R/W | Description +| 31 | `haltreq` | -/w | set/clear hart halt request +| 30 | `resumereq` | -/w | request hart to resume +| 28 | `ackhavereset` | -/w | write `1` to clear `*havereset` flags +| 1 | `ndmreset` | r/w | put whole processor into reset when `1` +| 0 | `dmactive` | r/w | DM enable; writing `0`-`1` will reset the DM +|======================= + + +:sectnums!: +===== **`dmstatus`** + +[cols="4,27,>7"] +[frame="topbot",grid="none"] +|====== +| 0x11 | **Debug module status register** | `dmstatus` +3+| Reset value: 0x00000000 +3+| Current status of the overall debug module and the hart. The entire register is read-only. 
+|====== + +.`dmstatus` - debug module status register bits +[cols="^1,^2,<10"] +[options="header",grid="rows"] +|======================= +| Bit | Name [RISC-V] | Description +| 31:23 | _reserved_ | reserved; always zero +| 22 | `impebreak` | always `1`; indicates an implicit `ebreak` instruction after the last program buffer entry +| 21:20 | _reserved_ | reserved; always zero +| 19 | `allhavereset` .2+| `1` when the hart is in reset +| 18 | `anyhavereset` +| 17 | `allresumeack` .2+| `1` when the hart has acknowledged a resume request +| 16 | `anyresumeack` +| 15 | `allnonexistent` .2+| always zero to indicate the hart is always existent +| 14 | `anynonexistent` +| 13 | `allunavail` .2+| `1` when the DM is disabled to indicate the hart is unavailable +| 12 | `anyunavail` +| 11 | `allrunning` .2+| `1` when the hart is running +| 10 | `anyrunning` +| 9 | `allhalted` .2+| `1` when the hart is halted +| 8 | `anyhalted` +| 7 | `authenticated` | always `1`; there is no authentication +| 6 | `authbusy` | always `0`; there is no authentication +| 5 | `hasresethaltreq` | always `0`; halt-on-reset is not supported (directly) +| 4 | `confstrptrvalid` | always `0`; no configuration string available +| 3:0 | `version` | `0010` - DM is compatible to version 0.13 +|======================= + + +:sectnums!: +===== **`hartinfo`** + +[cols="4,27,>7"] +[frame="topbot",grid="none"] +|====== +| 0x12 | **Hart information** | `hartinfo` +3+| Reset value: see below +3+| This register gives information about the hart. The entire register is read-only. 
+|====== + +.`hartinfo` - hart information register bits +[cols="^1,^2,<8"] +[options="header",grid="rows"] +|======================= +| Bit | Name [RISC-V] | Description +| 31:24 | _reserved_ | reserved; always zero +| 23:20 | `nscratch` | `0001`, number of `dscratch*` CPU registers = 1 +| 19:17 | _reserved_ | reserved; always zero +| 16 | `dataaccess` | `0`, the `data` registers are shadowed in the hart's address space +| 15:12 | `datasize` | `0001`, number of 32-bit words in the address space dedicated to shadowing the `data` registers = 1 +| 11:0 | `dataaddr` | = `dm_data_base_c(11:0)`, signed base address of `data` words (see address map in <<_dm_cpu_access>>) +|======================= + + +:sectnums!: +===== **`abstractcs`** + +[cols="4,27,>7"] +[frame="topbot",grid="none"] +|====== +| 0x16 | **Abstract control and status** | `abstractcs` +3+| Reset value: see below +3+| Command execution info and status. +|====== + +.`abstractcs` - abstract control and status register bits +[cols="^1,^2,^1,<8"] +[options="header",grid="rows"] +|======================= +| Bit | Name [RISC-V] | R/W | Description +| 31:29 | _reserved_ | r/- | reserved; always zero +| 28:24 | `progbufsize` | r/- | `0010`; size of the program buffer (`progbuf`) = 2 entries +| 23:13 | _reserved_ | r/- | reserved; always zero +| 12 | `busy` | r/- | `1` when a command is being executed +| 11 | _reserved_ | r/- | reserved; always zero +| 10:8 | `cmderr` | r/w | error during command execution (see below); has to be cleared by writing `111` +| 7:4 | _reserved_ | r/- | reserved; always zero +| 3:0 | `datacount` | r/- | `0001`; number of implemented `data` registers for abstract commands = 1 +|======================= + +Error codes in `cmderr` (highest priority first): + +* `000` - no error +* `100` - command cannot be executed since hart is not in expected state +* `011` - exception during command execution +* `010` - unsupported command +* `001` - invalid DM register read/write while command is/was 
executing + + +:sectnums!: +===== **`command`** + +[cols="4,27,>7"] +[frame="topbot",grid="none"] +|====== +| 0x17 | **Abstract command** | `command` +3+| Reset value: 0x00000000 +3+| Writing this register will trigger the execution of an abstract command. A new command can only be executed if +`cmderr` is zero. The entire register is write-only (reads will return zero). +|====== + +[NOTE] +The NEORV32 DM only supports **Access Register** abstract commands. These commands can only access the +hart's GPRs (abstract command register index `0x1000` - `0x101f`). + +.`command` - abstract command register - "access register" commands only +[cols="^1,^2,<8"] +[options="header",grid="rows"] +|======================= +| Bit | Name [RISC-V] | Description / required value +| 31:24 | `cmdtype` | `00000000` to indicate "access register" command +| 23 | _reserved_ | reserved, has to be `0` when writing +| 22:20 | `aarsize` | `010` to indicate 32-bit accesses +| 19 | `aarpostincrement` | `0`, postincrement is not supported +| 18 | `postexec` | if set the program buffer is executed _after_ the command +| 17 | `transfer` | if set the operation in `write` is conducted +| 16 | `write` | `1`: copy `data0` to `[regno]`; `0`: copy `[regno]` to `data0` +| 15:0 | `regno` | GPR-access only; has to be `0x1000` - `0x101f` +|======================= + + +:sectnums!: +===== **`abstractauto`** + +[cols="4,27,>7"] +[frame="topbot",grid="none"] +|====== +| 0x18 | **Abstract command auto-execution** | `abstractauto` +3+| Reset value: 0x00000000 +3+| Register to configure when a read/write access to a DM register repeats execution of the last abstract command. 
+|====== + +.`abstractauto` - Abstract command auto-execution register bits +[cols="^1,^2,^1,<8"] +[options="header",grid="rows"] +|======================= +| Bit | Name [RISC-V] | R/W | Description +| 17 | `autoexecprogbuf[1]` | r/w | when set reading/writing from/to `progbuf1` will execute `command` again +| 16 | `autoexecprogbuf[0]` | r/w | when set reading/writing from/to `progbuf0` will execute `command` again +| 0 | `autoexecdata[0]` | r/w | when set reading/writing from/to `data0` will execute `command` again +|======================= + + +:sectnums!: +===== **`progbuf`** + +[cols="4,27,>7"] +[frame="topbot",grid="none"] +|====== +| 0x20 | **Program buffer 0** | `progbuf0` +| 0x21 | **Program buffer 1** | `progbuf1` +3+| Reset value: `NOP`-instruction +3+| General purpose program buffer for the DM. +|====== + + +:sectnums!: +===== **`haltsum0`** + +[cols="4,27,>7"] +[frame="topbot",grid="none"] +|====== +| 0x40 | **Halt summary 0** | `haltsum0` +3+| Reset value: _UNDEFINED_ +3+| Bit 0 of this register is set if the hart is halted (all remaining bits are always zero). The entire register is read-only. +|====== + +:sectnums: +==== DM CPU Access + +From the CPU's point of view, the DM behaves as a memory-mapped peripheral that includes + +* a small ROM that contains the code for the "park loop", which is executed when the CPU is _in_ debug mode. +* a program buffer populated by the debugger host to execute small programs +* a data buffer to transfer data between the processor and the debugger host +* a status register to communicate debugging requests + +.Park Loop Code Sources +[NOTE] +The assembly sources of the **park loop code** are available in `sw/ocd-firmware/park_loop.S`. Please note that these +sources are not intended to be changed by the user. Hence, the makefile does not provide an automatic option +to compile and "install" the debugger ROM code into the HDL sources and requires a manual copy +(see `sw/ocd-firmware/README.md`). 
+ +The DM uses a total address space of 128 words of the CPU's address space (= 512 bytes) divided into four sections +of 32 words (= 128 bytes) each. +Please note that the program buffer, the data buffer and the status register only use a few effective words in this +address space. However, these effective addresses are mirrored to fill up the whole 128 bytes of the section. +Hence, any CPU access within this address space will succeed. + +.DM CPU access - address map (divided into four sections) +[cols="^2,^4,^2,<7"] +[options="header",grid="rows"] +|======================= +| Base address | Name [VHDL package] | Actual size | Description +| `0xfffff800` | `dm_code_base_c` (= `dm_base_c`) | 128 bytes | Code ROM for the "park loop" code +| `0xfffff880` | `dm_pbuf_base_c` | 16 bytes | Program buffer, provided by DM +| `0xfffff900` | `dm_data_base_c` | 4 bytes | Data buffer (`dm.data0`) +| `0xfffff980` | `dm_sreg_base_c` | 4 bytes | Control and status register +|======================= + +[NOTE] +From the CPU's point of view, the DM is mapped to an _"unused"_ address range within the processor's +<<_address_space>> right between the bootloader ROM (BOOTROM) and the actual processor-internal IO +space at addresses `0xfffff800` - `0xfffff9ff`. + +When the CPU enters or re-enters (for example via `ebreak` in the DM's program buffer) debug mode, it jumps to +the beginning of the DM's "park loop" code ROM at `dm_code_base_c`. This is the _normal entry point_ for the +park loop code. If an exception is encountered during debug mode, the CPU jumps to `dm_code_base_c + 4`, +which is the _exception entry point_. + +**Status Register** + +The status register provides a direct communication channel between the CPU executing the park loop and the +host-controlled controller of the DM. Note that all bits that can be written by the CPU (acknowledge flags) +cause a single-shot (1-cycle) signal to the DM controller and auto-clear (always read as zero). 
+The bits that are driven by the DM controller are read-only to the CPU and keep their state until the CPU +acknowledges the corresponding request. + +.DM CPU access - status register +[cols="^2,^2,^2,<8"] +[options="header",grid="rows"] +|======================= +| Bit | Name | CPU access | Description +| 0 | `halt_ack` | -/w | Set by the CPU to indicate that the CPU is halted and keeps iterating in the park loop +| 1 | `resume_req` | r/- | Set by the DM to tell the CPU to resume normal operation (leave parking loop and leave debug mode via `dret` instruction) +| 2 | `resume_ack` | -/w | Set by the CPU to acknowledge that the CPU is now going to leave parking loop & debug mode +| 3 | `execute_req` | r/- | Set by the DM to tell the CPU to leave debug mode and execute the instructions from the program buffer; CPU will re-enter parking loop afterwards +| 4 | `execute_ack` | -/w | Set by the CPU to acknowledge that the CPU is now going to execute the program buffer +| 5 | `exception_ack` | -/w | Set by the CPU to inform the DM that an exception occurred during execution of the park loop or during execution of the program buffer +|======================= + + + +<<< +// #################################################################################################################### +:sectnums: +=== CPU Debug Mode + +The NEORV32 CPU Debug Mode `DB` (part of `rtl/core/neorv32_cpu_control.vhd`) is compatible with the "Minimal RISC-V Debug Specification 0.13.2". +It is enabled/implemented by setting the CPU generic _CPU_EXTENSION_RISCV_DEBUG_ to "true" (done by setting processor +generic _ON_CHIP_DEBUGGER_EN_). +It provides a new operation mode called "debug mode". +When enabled, three additional CSRs are available (section <<_cpu_debug_mode_csrs>>) and also the "return from debug mode" +instruction `dret` is available when the CPU is "in" debug mode. 
+ +[IMPORTANT] +The CPU _debug mode_ requires the `Zicsr` CPU extension to be implemented (top generic _CPU_EXTENSION_RISCV_Zicsr_ = true). + +The CPU debug mode is entered when one of the following events occurs: + +[start=1] +. executing `ebreak` instruction (when `dcsr.ebreakm` is set and in machine mode OR when `dcsr.ebreaku` is set and in user mode) +. debug halt request from external DM (via CPU signal `db_halt_req_i`, high-active, triggering on rising-edge) +. finished execution of a single instruction while in single-step debugging mode (enabled via `dcsr.step`) + +From a hardware point of view, these "entry conditions" are special synchronous (`ebreak` instruction) or asynchronous +(single-stepping "interrupt"; halt request "interrupt") traps that are handled invisibly by the control logic. + +Whenever the CPU **enters debug mode** it performs the following operations: + +* move `pc` to `dpc` +* copy the hart's current privilege level to `dcsr.prv` +* set `dcsr.cause` according to the cause why debug mode is entered +* **no update** of `mepc`, `mcause`, `mtval` and `mstatus` CSRs +* load the address configured via the CPU _CPU_DEBUG_ADDR_ generic to the `pc` to jump to "debugger park loop" code in the debug module (DM) + +When the CPU **is in debug mode** the following things are important: + +* while in debug mode, the CPU executes the parking loop and the program buffer provided by the DM if requested +* effective CPU privilege level is `machine` mode, PMP is not active +* if an exception occurs + * if the exception was caused by any debug-mode entry action the CPU jumps to the _normal entry point_ + ( = _CPU_DEBUG_ADDR_) of the park loop again (for example when executing `ebreak` in debug mode) + * for all other exception sources the CPU jumps to the _exception entry point_ ( = _CPU_DEBUG_ADDR_ + 4) + to signal an exception to the DM and restarts the park loop again afterwards +* interrupts _including_ non-maskable interrupts are disabled; however, 
they will be buffered and executed when the CPU has left debug mode +* if the DM makes a resume request, the park loop exits and the CPU leaves debug mode (executing `dret`) + +Debug mode is left either by executing the `dret` instruction footnote:[`dret` should only be executed _inside_ the debugger +"park loop" code (-> code ROM in the debug module (DM).)] (_in_ debug mode) or by performing +a hardware reset of the CPU. Executing `dret` outside of debug mode will raise an illegal instruction exception. +Whenever the CPU **leaves debug mode** the following things happen: + +* set the hart's current privilege level according to `dcsr.prv` +* restore `pc` from `dpc` +* resume normal operation at `pc` + + +:sectnums: +==== CPU Debug Mode CSRs + +Two additional CSRs are required by the _Minimal RISC-V Debug Specification_: The debug mode control and status register +`dcsr` and the program counter `dpc`. Providing a general purpose scratch register for debug mode (`dscratch0`) allows +faster execution of programs provided by the debugger, since _one_ general purpose register can be backed up and +directly used. + +[NOTE] +The debug-mode control and status registers (CSRs) are only accessible when the CPU is _in_ debug mode. +If these CSRs are accessed outside of debug mode (for example when in `machine` mode) an illegal instruction exception +is raised. + + +:sectnums!: +===== **`dcsr`** + +[cols="4,27,>7"] +[frame="topbot",grid="none"] +|====== +| 0x7b0 | **Debug control and status register** | `dcsr` +3+| Reset value: 0x00000000 +3+| The `dcsr` CSR is compatible with the RISC-V debug spec. It is used to configure debug mode and provides additional status information. +The following bits are implemented. The remaining bits are read-only and always read as zero. 
+|====== + +.Debug control and status register bits +[cols="^1,^2,^1,<8"] +[options="header",grid="rows"] +|======================= +| Bit | Name [RISC-V] | R/W | Event +| 31:28 | `xdebugver` | r/- | always `0100` - indicates external debug support exists +| 27:16 | - | r/- | _reserved_, read as zero +| 15 | `ebreakm` | r/w | `ebreak` instructions in `machine` mode will _enter_ debug mode when set +| 14 | [line-through]#`ebreakh`# | r/- | `0` - hypervisor mode not supported +| 13 | [line-through]#`ebreaks`# | r/- | `0` - supervisor mode not supported +| 12 | `ebreaku` | r/w | `ebreak` instructions in `user` mode will _enter_ debug mode when set +| 11 | [line-through]#`stepie`# | r/- | `0` - IRQs are disabled during single-stepping +| 10 | [line-through]#`stopcount`# | r/- | `0` - counters increment as usual +| 9 | [line-through]#`stoptime`# | r/- | `0` - timers increment as usual +| 8:6 | `cause` | r/- | cause identifier - why was debug mode entered +| 5 | - | r/- | _reserved_, read as zero +| 4 | `mprven` | r/- | `0` - `mstatus.mprv` is ignored when in debug mode +| 3 | `nmip` | r/- | set when the non-maskable CPU/processor interrupt is pending +| 2 | `step` | r/w | enable single-stepping when set +| 1:0 | `prv` | r/w | CPU privilege level before/after debug mode +|======================= + + +:sectnums!: +===== **`dpc`** + +[cols="4,27,>7"] +[frame="topbot",grid="none"] +|====== +| 0x7b1 | **Debug program counter** | `dpc` +3+| Reset value: _UNDEFINED_ +3+| The `dpc` CSR is compatible with the RISC-V debug spec. It is used to store the current program counter when +debug mode is entered. The `dret` instruction will return to `dpc` by moving `dpc` to `pc`. +|====== + + +:sectnums!: +===== **`dscratch0`** + +[cols="4,27,>7"] +[frame="topbot",grid="none"] +|====== +| 0x7b2 | **Debug scratch register 0** | `dscratch0` +3+| Reset value: _UNDEFINED_ +3+| The `dscratch0` CSR is compatible with the RISC-V debug spec. 
It provides a general purpose debug mode-only scratch register. +|====== + + Index: datasheet/overview.adoc =================================================================== --- datasheet/overview.adoc (nonexistent) +++ datasheet/overview.adoc (revision 60) @@ -0,0 +1,392 @@ +:sectnums: +== Overview + +[quote] +____ +RISC-V - Instruction Sets Want To Be Free! +____ + +The NEORV32footnote:[Pronounced "neo-R-V-thirty-two" or "neo-risc-five-thirty-two" in its long form.] is an open-source +RISC-V compatible processor system that is intended as *ready-to-go* auxiliary processor within a larger SoC +designs or as stand-alone custom / customizable microcontroller. + +The system is highly configurable and provides optional common peripherals like embedded memories, +timers, serial interfaces, general purpose IO ports and an external bus interface to connect custom IP like +memories, NoCs and other peripherals. On-line and in-system debugging is supported by an OpenOCD/gdb +compatible on-chip debugger accessible via JTAG. + +The software framework of the processor comes with application makefiles, software libraries for all CPU +and processor features, a bootloader, a runtime environment and several example programs – including a port +of the CoreMark MCU benchmark and the official RISC-V architecture test suite. RISC-V GCC is used as +default toolchain (https://github.com/stnolting/riscv-gcc-prebuilt[prebuilt toolchains are also provided]). + +[TIP] +The project's change log is available in https://github.com/stnolting/neorv32/blob/master/CHANGELOG.md[CHANGELOG.md] +in the root directory of the NEORV32 repository. Please also check out the <<_legal>> section. 
+ + + +:sectnums!: +=== Structure + +Chapter <<_neorv32_processor_soc>> + +* top entity signals and configuration generics, address space layout, internal peripheral devices and interrupts, internal +memories and caches, internal bus architecture, external bus interface + +Chapter <<_neorv32_central_processing_unit_cpu>> + +* instruction set(s) and extensions, instruction timing, control and status registers, traps, exceptions and interrupts, +hardware execution safety, native bus interface + +Chapter <<_on_chip_debugger_ocd>> + +* on-chip debugging compatible with the "Minimal RISC-V Debug Specification Version 0.13.2". + +Chapter <<_software_framework>> + +* core libraries, bootloader, makefiles, runtime environment + +Chapter <<_lets_get_it_started>> + +* toolchain installation and setup, hardware setup, software setup, application compilation, simulating the processor, +debugging using the on-chip debugger + +[TIP] +Links in this document are <<_structure,highlighted>>. + + + +<<< +// #################################################################################################################### +:sectnums: +=== Project Key Features + +* **NEORV32 CPU**: 32-bit `rv32i` RISC-V CPU - passes the official RISC-V architecture tests +* official https://github.com/riscv/riscv-isa-manual/blob/master/marchid.md[RISC-V open source architecture ID] +* optional RISC-V CPU extensions: +** `A` - atomic memory access operations +** `B` - bit-manipulation instructions +** `C` - 16-bit compressed instructions +** `E` - embedded CPU version (reduced register file size) +** `M` - integer multiplication and division hardware +** `U` - less-privileged _user_ mode +** `Zfinx` - single-precision floating-point unit +** `Zicsr` - control and status register access (privileged architecture) +** `Zifencei` - instruction stream synchronization +** `PMP` - physical memory protection +** `HPM` - hardware performance monitors +* **Software framework** +** GCC-based toolchain - prebuilt 
toolchains available; application compilation based on GNU makefiles +** internal bootloader with serial user interface +** core libraries for high-level usage of the provided functions and peripherals +** runtime environment and several example programs +** doxygen-based documentation of the software framework; a deployed version is available at https://stnolting.github.io/neorv32/sw/files.html +** FreeRTOS port + demos available +* **NEORV32 Processor**: highly-configurable full-scale microcontroller-like processor system / SoC based on the NEORV32 CPU with optional standard peripherals: +** serial interfaces (UARTs, TWI, SPI) +** timers and counters (WDT, MTIME, NCO) +** general purpose IO and PWM and native NeoPixel (c) compatible smart LED interface +** embedded memories / caches for data, instructions and bootloader +** external memory interface (Wishbone or AXI4-Lite) +* on-chip debugger compatible with OpenOCD and gdb +* fully synchronous design, no latches, no gated clocks +* completely described in behavioral, platform-independent VHDL +* small hardware footprint and high operating frequency + + +<<< +// #################################################################################################################### +:sectnums: +=== Project Folder Structure + +................................... +neorv32 - Project home folder +├.ci - Scripts for continuous integration +├boards - Example setups for various FPGA boards +├CHANGELOG.md - Project change log +├docs - Project documentation +│├doxygen_build - Software framework documentation (generated by doxygen) +│├src_adoc - AsciiDoc sources for this document +│├references - Data sheets and RISC-V specs. 
+│└figures - Figures and logos +├riscv-arch-test - Port files for the official RISC-V architecture tests +├rtl - VHDL sources +│├core - Sources of the CPU & SoC +│└top_templates - Alternate/additional top entities/wrappers +├sim - Simulation files +│├ghdl - Simulation scripts for GHDL +│├rtl_modules - Processor modules for simulation-only +│└vivado - Pre-configured Xilinx ISIM waveform +└sw - Software framework + ├bootloader - Sources and scripts for the NEORV32 internal bootloader + ├common - Linker script and crt0.S start-up code + ├example - Various example programs + │└... + ├ocd_firmware - source code for on-chip debugger's "park loop" + ├openocd - OpenOCD on-chip debugger configuration files + ├image_gen - Helper program to generate NEORV32 executables + └lib - Processor core library + ├include - Header files (*.h) + └source - Source files (*.c) +................................... + +[NOTE] +There are further files and folders starting with a dot which – for example – contain +data/configurations only relevant for git or for the continuous integration framework (`.ci`). + + +<<< +// #################################################################################################################### +:sectnums: +=== VHDL File Hierarchy + +All necessary VHDL hardware description files are located in the project's `rtl/core` folder. The top entity +of the entire processor including all the required configuration generics is **`neorv32_top.vhd`**. + +[IMPORTANT] +All core VHDL files from the list below have to be assigned to a new design library named **`neorv32`**. Additional +files, like alternative top entities, can be assigned to any library. + +................................... 
+neorv32_top.vhd - NEORV32 Processor top entity +├neorv32_boot_rom.vhd - Bootloader ROM +│└neorv32_bootloader_image.vhd - Bootloader boot ROM memory image +├neorv32_busswitch.vhd - Processor bus switch for CPU buses (I&D) +├neorv32_bus_keeper.vhd - Processor-internal bus monitor +├neorv32_icache.vhd - Processor-internal instruction cache +├neorv32_cfs.vhd - Custom functions subsystem +├neorv32_cpu.vhd - NEORV32 CPU top entity +│├neorv32_package.vhd - Processor/CPU main VHDL package file +│├neorv32_cpu_alu.vhd - Arithmetic/logic unit +│├neorv32_cpu_bus.vhd - Bus interface unit + physical memory protection +│├neorv32_cpu_control.vhd - CPU control, exception/IRQ system and CSRs +││└neorv32_cpu_decompressor.vhd - Compressed instructions decoder +│├neorv32_cpu_cp_fpu.vhd - Floating-point co-processor (Zfinx extension) +│├neorv32_cpu_cp_muldiv.vhd - Mul/Div co-processor (M extension) +│└neorv32_cpu_regfile.vhd - Data register file +├neorv32_debug_dm.vhd - on-chip debugger: debug module +├neorv32_debug_dtm.vhd - on-chip debugger: debug transfer module +├neorv32_dmem.vhd - Processor-internal data memory +├neorv32_gpio.vhd - General purpose input/output port unit +├neorv32_imem.vhd - Processor-internal instruction memory +│└neorv32_application_image.vhd - IMEM application initialization image +├neorv32_mtime.vhd - Machine system timer +├neorv32_nco.vhd - Numerically-controlled oscillator +├neorv32_neoled.vhd - NeoPixel (TM) compatible smart LED interface +├neorv32_pwm.vhd - Pulse-width modulation controller +├neorv32_spi.vhd - Serial peripheral interface controller +├neorv32_sysinfo.vhd - System configuration information memory +├neorv32_trng.vhd - True random number generator +├neorv32_twi.vhd - Two wire serial interface controller +├neorv32_uart.vhd - Universal async. receiver/transmitter +├neorv32_wdt.vhd - Watchdog timer +└neorv32_wb_interface.vhd - External (Wishbone) bus interface +................................... 
+ + +<<< +// #################################################################################################################### +:sectnums: +=== FPGA Implementation Results + +This chapter shows exemplary implementation results of the NEORV32 CPU and Processor. Please note, that +the provided results are just a relative measure as logic functions of different modules might be merged +between entity boundaries, so the actual utilization results might vary a bit. + +:sectnums: +==== CPU + +[cols="<2,<8"] +[grid="topbot"] +|======================= +| Hardware version: | `1.5.5.5` +| Top entity: | `rtl/core/neorv32_cpu.vhd` +|======================= + +[cols="<5,>1,>1,>1,>1,>1"] +[options="header",grid="rows"] +|======================= +| CPU | LEs | FFs | MEM bits | DSPs | _f~max~_ +| `rv32i` | 980 | 409 | 1024 | 0 | 123 MHz +| `rv32i_Zicsr` | 1835 | 856 | 1024 | 0 | 124 MHz +| `rv32im_Zicsr` | 2443 | 1134 | 1024 | 0 | 124 MHz +| `rv32imc_Zicsr` | 2669 | 1149 | 1024 | 0 | 125 MHz +| `rv32imac_Zicsr` | 2685 | 1156 | 1024 | 0 | 124 MHz +| `rv32imac_Zicsr` + `debug_mode` | 3058 | 1225 | 1024 | 0 | 120 MHz +| `rv32imac_Zicsr` + `u` | 2698 | 1162 | 1024 | 0 | 124 MHz +| `rv32imac_Zicsr_Zifencei` + `u` | 2715 | 1162 | 1024 | 0 | 122 MHz +| `rv32imac_Zicsr_Zifencei_Zfinx` + `u` | 4004 | 1812 | 1024 | 7 | 121 MHz +|======================= + + +:sectnums: +==== Processor Modules + +[cols="<2,<8"] +[grid="topbot"] +|======================= +| Hardware version: | `1.5.5.9` +| Top entity: | `rtl/core/neorv32_top.vhd` +|======================= + +.Hardware utilization by the processor modules (mandatory core modules in **bold**) +[cols="<2,<8,>1,>1,>2,>1"] +[options="header",grid="rows"] +|======================= +| Module | Description | LEs | FFs | MEM bits | DSPs +| Boot ROM | Bootloader ROM (4kB) | 3 | 1 | 32768 | 0 +| **BUSKEEPER** | Processor-internal bus monitor | 11 | 6 | 0 | 0 +| **BUSSWITCH** | Bus mux for CPU instr. 
and data interface | 49 | 8 | 0 | 0 +| CFS | Custom functions subsystem | - | - | - | - +| DMEM | Processor-internal data memory (8kB) | 18 | 2 | 65536 | 0 +| DM | On-chip debugger - debug module | 493 | 240 | 0 | 0 +| DTM | On-chip debugger - debug transfer module (JTAG) | 254 | 218 | 0 | 0 +| GPIO | General purpose input/output ports | 67 | 65 | 0 | 0 +| iCACHE | Instruction cache (1x4 blocks, 256 bytes per block) | 220 | 154 | 8192 | 0 +| IMEM | Processor-internal instruction memory (16kB) | 6 | 2 | 131072 | 0 +| MTIME | Machine system timer | 289 | 200 | 0 | 0 +| NCO | Numerically-controlled oscillator | 254 | 226 | 0 | 0 +| NEOLED | Smart LED Interface (NeoPixel/WS2812) [4xFIFO] | 347 | 309 | 0 | 0 +| PWM | Pulse-width modulation controller (4 channels) | 71 | 69 | 0 | 0 +| SPI | Serial peripheral interface | 138 | 124 | 0 | 0 +| **SYSINFO** | System configuration information memory | 10 | 10 | 0 | 0 +| TRNG | True random number generator | 132 | 105 | 0 | 0 +| TWI | Two-wire interface | 77 | 44 | 0 | 0 +| UART0/1 | Universal asynchronous receiver/transmitter 0/1 | 176 | 132 | 0 | 0 +| WDT | Watchdog timer | 60 | 45 | 0 | 0 +| WISHBONE | External memory interface | 129 | 104 | 0 | 0 +|======================= + + +<<< +:sectnums: +==== Exemplary Setups + +[TIP] +Exemplary setups for different technologies and various FPGA boards can be found in the `boards` folder +(https://github.com/stnolting/neorv32/tree/master/boards). + +The following table shows exemplary NEORV32 processor implementation results for different FPGA +platforms. Most setups use the default peripheral configuration (like no CFS, no caches and no +TRNG), no external memory interface and only internal instruction and data memories (IMEM uses 16kB +and DMEM uses 8kB memory space). 
+ +[cols="<2,<8"] +[grid="topbot"] +|======================= +| Hardware version: | `1.4.9.0` +|======================= + +.Hardware utilization for exemplary NEORV32 setups +[cols="<4,<5,<4,<4,<3,<3,<3,<4,<4,<3"] +[options="header",grid="rows"] +|======================= +| Vendor | FPGA | Board | Toolchain | CPU | LUT | FF | DSP | Memory | _f_ +| Intel | Cyclone IV `EP4CE22F17-C6N` | Terasic DE0-Nano | Quartus Prime Lite 20.1 | `rv32imcu_Zicsr_Zifencei` + `PMP` | 3813 (17%) | 1890 (8%) | 0 (0%) | Memory bits: 231424 (38%) | 119 MHz +| Lattice | iCE40 UltraPlus `iCE40UP5KSG48I` | Upduino v3.0 | Radiant 2.1 | `rv32icu_Zicsr_Zifencei` | 5123 (97%) | 1972 (37%) | 0 (0%) | EBR: 12 (40%) SPRAM: 4 (100%) | 24 MHz +| Xilinx | Artix-7 `XC7A35TICSG324-1L` | Arty A7-35T | Vivado 2019.2 | `rv32imcu_Zicsr_Zifencei` + `PMP` | 2465 (12%) | 1912 (5%) | 0 (0%) | BRAM: 8 (16%) | 100 MHz +|======================= + +**Notes** + +* The Lattice iCE40 UltraPlus setup uses the FPGA's SPRAM memory primitives for the internal IMEM and DMEM (each 64kB). +* The Upduino and the Arty board have on-board SPI flash memories for storing the FPGA configuration. These devices can also be used by the default NEORV32 bootloader to store and automatically boot an application program after reset (both tested successfully). +* The setups with PMP implement 2 regions with a minimal granularity of 64kB. +* No HPM counters are used. 
+ + +<<< +// #################################################################################################################### +:sectnums: +=== CPU Performance + +:sectnums: +==== CoreMark Benchmark + +.Configuration +[cols="<2,<8"] +[grid="topbot"] +|======================= +| Hardware: | 32kB IMEM, 16kB DMEM, no caches, 100MHz clock +| CoreMark: | 2000 iterations, MEM_METHOD is MEM_STACK +| Compiler: | RISCV32-GCC 10.1.0 +| Peripherals: | UART for printing the results +| Compiler flags: | default, see makefile +|======================= + +The performance of the NEORV32 was tested and evaluated using the https://www.eembc.org/coremark/[CoreMark CPU benchmark]. This +benchmark focuses on testing the capabilities of the CPU core itself rather than the performance of the whole +system. The corresponding source code and the SW project can be found in the `sw/example/coremark` folder. + +The resulting CoreMark score is defined as CoreMark iterations per second. +The execution time is determined via the RISC-V `[m]cycle[h]` CSRs. The relative CoreMark score is +defined as CoreMark score divided by the CPU's clock frequency in MHz. + +[cols="<2,<8"] +[grid="topbot"] +|======================= +| Hardware version: | `1.4.9.8` +|======================= + +.CoreMark results +[cols="<4,>1,>1,>1"] +[options="header",grid="rows"] +|======================= +| CPU (incl. `Zicsr`) | Executable size | CoreMark Score | CoreMarks/MHz +| `rv32i` | 28756 bytes | 36.36 | **0.3636** +| `rv32im` | 27516 bytes | 68.97 | **0.6897** +| `rv32imc` | 22008 bytes | 68.97 | **0.6897** +| `rv32imc` + _FAST_MUL_EN_ | 22008 bytes | 86.96 | **0.8696** +| `rv32imc` + _FAST_MUL_EN_ + _FAST_SHIFT_EN_ | 22008 bytes | 90.91 | **0.9091** +|======================= + +[NOTE] +All executables were generated using maximum optimization `-O3`. +The _FAST_MUL_EN_ configuration uses DSPs for the multiplier of the _M_ extension (enabled via the +_FAST_MUL_EN_ generic). 
The _FAST_SHIFT_EN_ configuration uses a barrel shifter for CPU shift +operations (enabled via the _FAST_SHIFT_EN_ generic). + + +<<< +:sectnums: +==== Instruction Timing + +The NEORV32 CPU is based on a multi-cycle architecture. Each instruction is executed in a sequence of +several consecutive micro operations. Hence, each instruction requires several clock cycles to execute. + +The average CPI (cycles per instruction) depends on the instruction mix of a specific application and also on +the available CPU extensions. The following table shows the performance results for successfully (!) running +2000 CoreMark iterations. + +The average CPI is computed by dividing the total number of required clock cycles (only the timed core to +avoid distortion due to IO wait cycles) by the number of executed instructions (`[m]instret[h]` CSRs). The +executables were generated using optimization `-O3`. + +[cols="<2,<8"] +[grid="topbot"] +|======================= +| Hardware version: | `1.4.9.8` +|======================= + +.CoreMark instruction timing +[cols="<4,>2,>2,>2"] +[options="header",grid="rows"] +|======================= +| CPU (incl. `Zicsr`) | Required clock cycles | Executed instructions | Average CPI +| `rv32i` | 5595750503 | 1466028607 | **3.82** +| `rv32im` | 2966086503 | 598651143 | **4.95** +| `rv32imc` | 2981786734 | 611814918 | **4.87** +| `rv32imc` + _FAST_MUL_EN_ | 2399234734 | 611814918 | **3.92** +| `rv32imc` + _FAST_MUL_EN_ + _FAST_SHIFT_EN_ | 2265135174 | 611814948 | **3.70** +|======================= + +[TIP] +The _FAST_MUL_EN_ configuration uses DSPs for the multiplier of the M extension (enabled via the +_FAST_MUL_EN_ generic). The _FAST_SHIFT_EN_ configuration uses a barrel shifter for CPU shift +operations (enabled via the _FAST_SHIFT_EN_ generic). + +[TIP] +More information regarding the execution time of each implemented instruction can be found in +chapter <<_instruction_timing>>. 
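
The average CPI values in the "CoreMark instruction timing" table can be reproduced directly from the listed cycle and instruction counts. The following is a minimal sketch in Python, not part of the NEORV32 sources: the `RESULTS` dictionary and the `average_cpi` helper are hypothetical names chosen for illustration, and the numbers are simply copied from the table.

```python
# Cycle and instruction counts copied from the "CoreMark instruction timing" table.
# RESULTS and average_cpi are illustrative names, not part of the NEORV32 project.
RESULTS = {
    "rv32i": (5595750503, 1466028607),
    "rv32im": (2966086503, 598651143),
    "rv32imc": (2981786734, 611814918),
    "rv32imc + FAST_MUL_EN": (2399234734, 611814918),
    "rv32imc + FAST_MUL_EN + FAST_SHIFT_EN": (2265135174, 611814948),
}


def average_cpi(cycles: int, instructions: int) -> float:
    """Average cycles per instruction: required clock cycles / executed instructions."""
    if instructions <= 0:
        raise ValueError("instruction count must be positive")
    return cycles / instructions


if __name__ == "__main__":
    # Print one line per CPU configuration, rounded to two decimals as in the table.
    for config, (cycles, instructions) in RESULTS.items():
        print(f"{config}: {average_cpi(cycles, instructions):.2f}")
```

On real hardware the two inputs would come from the `[m]cycle[h]` and `[m]instret[h]` CSRs, sampled around the timed part of the benchmark only.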
+ Index: datasheet/soc.adoc =================================================================== --- datasheet/soc.adoc (nonexistent) +++ datasheet/soc.adoc (revision 60) @@ -0,0 +1,1097 @@ + +// #################################################################################################################### +:sectnums: +== NEORV32 Processor (SoC) + +The NEORV32 Processor is based on the NEORV32 CPU. Together with common peripheral +interfaces and embedded memories it provides a RISC-V-based full-scale microcontroller-like SoC platform. + +image::neorv32_processor.png[align=center] + +**Key Features** + +* _optional_ processor-internal data and instruction memories (<<_data_memory_dmem,**DMEM**>>/<<_instruction_memory_imem,**IMEM**>>) + cache (<<_processor_internal_instruction_cache_icache,**iCACHE**>>) +* _optional_ internal bootloader (<<_bootloader_rom_bootrom,**BOOTROM**>>) with UART console & SPI flash boot option +* _optional_ machine system timer (<<_machine_system_timer_mtime,**MTIME**>>), RISC-V-compatible +* _optional_ two independent universal asynchronous receivers and transmitters (<<_primary_universal_asynchronous_receiver_and_transmitter_uart0,**UART0**>>, <<_secondary_universal_asynchronous_receiver_and_transmitter_uart1,**UART1**>>) with optional hardware flow control (RTS/CTS) +* _optional_ 8/16/24/32-bit serial peripheral interface controller (<<_serial_peripheral_interface_controller_spi,**SPI**>>) with 8 dedicated CS lines +* _optional_ two wire serial interface controller (<<_two_wire_serial_interface_controller_twi,**TWI**>>), compatible to the I²C standard +* _optional_ general purpose parallel IO port (<<_general_purpose_input_and_output_port_gpio,**GPIO**>>), 32xOut, 32xIn +* _optional_ 32-bit external bus interface, Wishbone b4 / AXI4-Lite compatible (<<_processor_external_memory_interface_wishbone_axi4_lite,**WISHBONE**>>) +* _optional_ watchdog timer (<<_watchdog_timer_wdt,**WDT**>>) +* _optional_ PWM controller with up to 60 channels 
& 8-bit duty cycle resolution (<<_pulse_width_modulation_controller_pwm,**PWM**>>) +* _optional_ ring-oscillator-based true random number generator (<<_true_random_number_generator_trng,**TRNG**>>) +* _optional_ custom functions subsystem for custom co-processor extensions (<<_custom_functions_subsystem_cfs,**CFS**>>) +* _optional_ numerically-controlled oscillator (<<_numerically_controlled_oscillator_nco,**NCO**>>) with 3 independent channels +* _optional_ NeoPixel(TM)/WS2812-compatible smart LED interface (<<_smart_led_interface_neoled,**NEOLED**>>) +* _optional_ on-chip debugger with JTAG TAP (<<_on_chip_debugger_ocd,**OCD**>>) +* system configuration information memory to check HW configuration via software (<<_system_configuration_information_memory_sysinfo,**SYSINFO**>>) + + +<<< +// #################################################################################################################### +:sectnums: +=== Processor Top Entity - Signals + +The following table shows all interface ports of the processor top entity (`rtl/core/neorv32_top.vhd`). +The type of all signals is _std_ulogic_ or _std_ulogic_vector_, respectively. + +[TIP] +A wrapper for the NEORV32 Processor setup providing resolved port signals can be found in +`rtl/top_templates/neorv32_top_stdlogic.vhd`. + +[cols="<3,^2,^2,<11"] +[options="header",grid="rows"] +|======================= +| Signal | Width | Dir. 
| Function +4+^| **Global Control** +| `clk_i` | 1 | in | global clock line, all registers triggering on rising edge +| `rstn_i` | 1 | in | global reset, asynchronous, **low-active** +4+^| **JTAG Access Port for <<_on_chip_debugger_ocd>>** +| `jtag_trst_i` | 1 | in | TAP reset, low-active (optionalfootnote:[Pull high if not used.]) +| `jtag_tck_i ` | 1 | in | serial clock +| `jtag_tdi_i ` | 1 | in | serial data input +| `jtag_tdo_o ` | 1 | out | serial data outputfootnote:[If the on-chip debugger is not implemented (_ON_CHIP_DEBUGGER_EN_ = false) `jtag_tdi_i` is directly forwarded to `jtag_tdo_o` to maintain the JTAG chain.] +| `jtag_tms_i ` | 1 | in | mode select +4+^| **External Bus Interface (<<_processor_external_memory_interface_wishbone_axi4_lite,WISHBONE>>)** +| `wb_tag_o` | 3 | out | tag (access type identifier) +| `wb_adr_o` | 32 | out | destination address +| `wb_dat_i` | 32 | in | read data +| `wb_dat_o` | 32 | out | write data +| `wb_we_o` | 1 | out | write enable ('0' = read transfer) +| `wb_sel_o` | 4 | out | byte enable +| `wb_stb_o` | 1 | out | strobe +| `wb_cyc_o` | 1 | out | valid cycle +| `wb_lock_o`| 1 | out | exclusive access request +| `wb_ack_i` | 1 | in | transfer acknowledge +| `wb_err_i` | 1 | in | transfer error +4+^| **Advanced Memory Control Signals** +| `fence_o` | 1 | out | indicates an executed _fence_ instruction +| `fencei_o` | 1 | out | indicates an executed _fencei_ instruction +4+^| **General Purpose Inputs & Outputs (<<_general_purpose_input_and_output_port_gpio,GPIO>>)** +| `gpio_o` | 32 | out | general purpose parallel output +| `gpio_i` | 32 | in | general purpose parallel input +4+^| **Primary Universal Asynchronous Receiver/Transmitter (<<_primary_universal_asynchronous_receiver_and_transmitter_uart0,UART0>>)** +| `uart0_txd_o` | 1 | out | UART0 serial transmitter +| `uart0_rxd_i` | 1 | in | UART0 serial receiver +| `uart0_rts_o` | 1 | out | UART0 RX ready to receive new char +| `uart0_cts_i` | 1 | in | UART0 TX allowed to 
start sending +4+^| **Secondary Universal Asynchronous Receiver/Transmitter (<<_secondary_universal_asynchronous_receiver_and_transmitter_uart1,UART1>>)** +| `uart1_txd_o` | 1 | out | UART1 serial transmitter +| `uart1_rxd_i` | 1 | in | UART1 serial receiver +| `uart1_rts_o` | 1 | out | UART1 RX ready to receive new char +| `uart1_cts_i` | 1 | in | UART1 TX allowed to start sending +4+^| **Serial Peripheral Interface Controller (<<_serial_peripheral_interface_controller_spi,SPI>>)** +| `spi_sck_o` | 1 | out | SPI controller clock line +| `spi_sdo_o` | 1 | out | SPI serial data output +| `spi_sdi_i` | 1 | in | SPI serial data input +| `spi_csn_o` | 8 | out | SPI dedicated chip select (low-active) +4+^| **Two-Wire Interface Controller (<<_two_wire_serial_interface_controller_twi,TWI>>)** +| `twi_sda_io` | 1 | inout | TWI serial data line +| `twi_scl_io` | 1 | inout | TWI serial clock line +4+^| **Custom Functions Subsystem (<<_custom_functions_subsystem_cfs,CFS>>)** +| `cfs_in_i` | 32 | in | custom CFS input signal conduit +| `cfs_out_o` | 32 | out | custom CFS output signal conduit +4+^| **Pulse-Width Modulation Channels (<<_pulse_width_modulation_controller_pwm,PWM>>)** +| `pwm_o` | 4 | out | pulse-width modulated channels +4+^| **Numerically-Controlled Oscillator (<<_numerically_controlled_oscillator_nco,NCO>>)** +| `nco_o` | 3 | out | NCO output channels +4+^| **Smart LED Interface - NeoPixel(TM) compatible (<<_smart_led_interface_neoled,NEOLED>>)** +| `neoled_o` | 1 | out | asynchronous serial data output +4+^| **System time (<<_machine_system_timer_mtime,MTIME>>)** +| `mtime_i` | 64 | in | machine timer time (to `time[h]` CSRs) from _external MTIME_ unit if the processor-internal _MTIME_ unit is NOT implemented +| `mtime_o` | 64 | out | machine timer time from _internal MTIME_ unit if processor-internal _MTIME_ unit IS implemented +4+^| **<<_processor_interrupts>>** +| `nm_irq_i` | 1 | in | non-maskable interrupt +| `soc_firq_i` | 6 | in | platform fast interrupt 
channels (custom) +| `mtime_irq_i` | 1 | in | machine timer interrupt (RISC-V) +| `msw_irq_i` | 1 | in | machine software interrupt (RISC-V) +| `mext_irq_i` | 1 | in | machine external interrupt (RISC-V) +|======================= + + +<<< +// #################################################################################################################### +:sectnums: +=== Processor Top Entity - Generics + +This is a list of all configuration generics of the NEORV32 processor top entity rtl/neorv32_top.vhd. +The generic name is shown in orange, followed by the type printed in black and concluded by the default +value printed in light gray. + +[TIP] +The NEORV32 generics allow you to configure the system according to your needs. The generics are +used to control implementation of certain CPU extensions and peripheral modules and even allow you to +optimize the system for certain design goals like minimal area or maximum performance. + +[TIP] +Privileged software can determine the actual CPU and processor configuration via the `misa` and +`mzext` (see <<_machine_trap_setup>> and <<_neorv32_specific_custom_csrs>>) CSRs and via the memory-mapped _SYSINFO_ module (see <<_system_configuration_information_memory_sysinfo>>), +respectively. + +[TIP] +If optional modules (like CPU extensions or peripheral devices) are *not enabled* the according circuitry **will not be synthesized at all**. +Hence, the disabled modules do not increase area and power requirements and do not impact the timing. + +**Generic Description** + +The description of each generic provides the following summary: + +.Generic description +[cols="4,4,2"] +[frame="all",grid="none"] +|====== +| _Generic_ | _type_ | _default value_ +3+| _Description_ +|====== + +<<< +// #################################################################################################################### +:sectnums: +==== General + +See section <<_system_configuration_information_memory_sysinfo>> for more information. 
+ +:sectnums!: +===== _CLOCK_FREQUENCY_ + +[cols="4,4,2"] +[frame="all",grid="none"] +|====== +| **CLOCK_FREQUENCY** | _natural_ | 0 +3+| The clock frequency of the processor's `clk_i` input port in Hertz (Hz). +|====== + + +:sectnums!: +===== _BOOTLOADER_EN_ + +[cols="4,4,2"] +[frame="all",grid="none"] +|====== +| **BOOTLOADER_EN** | _boolean_ | true +3+| Implement the boot ROM, pre-initialized with the bootloader image when true. This will also change the +processor's boot address from the beginning of the instruction memory address space (default = +0x00000000) to the base address of the boot ROM. See section <<_bootloader>> for more information. +|====== + + +:sectnums!: +===== _USER_CODE_ + +[cols="4,4,2"] +[frame="all",grid="none"] +|====== +| **USER_CODE** | _std_ulogic_vector(31 downto 0)_ | x"00000000" +3+| Custom user code that can be read by software via the _SYSINFO_ module. +|====== + + +:sectnums!: +===== _HW_THREAD_ID_ + +[cols="4,4,2"] +[frame="all",grid="none"] +|====== +| **HW_THREAD_ID** | _natural_ | 0 +3+| The hart ID of the CPU. Can be read via the `mhartid` CSR. Hart IDs must be unique within a system. +|====== + + +:sectnums!: +===== _ON_CHIP_DEBUGGER_EN_ + +[cols="4,4,2"] +[frame="all",grid="none"] +|====== +| **ON_CHIP_DEBUGGER_EN** | _boolean_ | false +3+| Implement on-chip debugger (OCD). See chapter <<_on_chip_debugger_ocd>>. +|====== + + +// #################################################################################################################### +:sectnums: +==== RISC-V CPU Extensions + +See section <<_instruction_sets_and_extensions>> for more information. + + +:sectnums!: +===== _CPU_EXTENSION_RISCV_A_ + +[cols="4,4,2"] +[frame="all",grid="none"] +|====== +| **CPU_EXTENSION_RISCV_A** | _boolean_ | false +3+| Implement atomic memory access operations when _true_. 
+|====== + + +:sectnums!: +===== _CPU_EXTENSION_RISCV_C_ + +[cols="4,4,2"] +[frame="all",grid="none"] +|====== +| **CPU_EXTENSION_RISCV_C** | _boolean_ | false +3+| Implement compressed instructions (16-bit) when _true_. +|====== + + +:sectnums!: +===== _CPU_EXTENSION_RISCV_E_ + +[cols="4,4,2"] +[frame="all",grid="none"] +|====== +| **CPU_EXTENSION_RISCV_E** | _boolean_ | false +3+| Implement the embedded CPU extension (only implement the first 16 data registers) when _true_. +|====== + + +:sectnums!: +===== _CPU_EXTENSION_RISCV_M_ + +[cols="4,4,2"] +[frame="all",grid="none"] +|====== +| **CPU_EXTENSION_RISCV_M** | _boolean_ | false +3+| Implement integer multiplication and division instructions when _true_. +|====== + + +:sectnums!: +===== _CPU_EXTENSION_RISCV_U_ + +[cols="4,4,2"] +[frame="all",grid="none"] +|====== +| **CPU_EXTENSION_RISCV_U** | _boolean_ | false +3+| Implement less-privileged user mode when _true_. +|====== + + +:sectnums!: +===== _CPU_EXTENSION_RISCV_Zfinx_ + +[cols="4,4,2"] +[frame="all",grid="none"] +|====== +| **CPU_EXTENSION_RISCV_Zfinx** | _boolean_ | false +3+| Implement the 32-bit single-precision floating-point extension (using integer registers) when _true_. For +more information see section <<_zfinx_single_precision_floating_point_operations>>. +|====== + + +:sectnums!: +===== _CPU_EXTENSION_RISCV_Zicsr_ + +[cols="4,4,2"] +[frame="all",grid="none"] +|====== +| **CPU_EXTENSION_RISCV_Zicsr** | _boolean_ | true +3+| Implement the control and status register (CSR) access instructions when true. Note: When this option is +disabled, the complete privileged architecture / trap system will be excluded from synthesis. Hence, no interrupts, no exceptions and +no machine information will be available. 
+|====== + + +:sectnums!: +===== _CPU_EXTENSION_RISCV_Zifencei_ + +[cols="4,4,2"] +[frame="all",grid="none"] +|====== +| **CPU_EXTENSION_RISCV_Zifencei** | _boolean_ | false +3+| Implement the instruction fetch synchronization instruction _fence.i_. For example, this option is required +for self-modifying code (and/or for i-cache flushes). +|====== + + +// #################################################################################################################### +:sectnums: +==== Extension Options + +See section <<_instruction_sets_and_extensions>> for more information. + + +:sectnums!: +===== _FAST_MUL_EN_ + +[cols="4,4,2"] +[frame="all",grid="none"] +|====== +| **FAST_MUL_EN** | _boolean_ | false +3+| When this generic is enabled, the multiplier of the `M` extension is realized using DSP blocks instead of an +iterative bit-serial approach. This generic is only relevant when the multiplier and divider CPU extension is +enabled (<<_cpu_extension_riscv_m>> is _true_). +|====== + + +:sectnums!: +===== _FAST_SHIFT_EN_ + +[cols="4,4,2"] +[frame="all",grid="none"] +|====== +| **FAST_SHIFT_EN** | _boolean_ | false +3+| When this generic is enabled, the shifter unit of the CPU's ALU is implemented as a fast barrel shifter (requiring +more hardware resources). +|====== + + +:sectnums!: +===== _TINY_SHIFT_EN_ + +[cols="4,4,2"] +[frame="all",grid="none"] +|====== +| **TINY_SHIFT_EN** | _boolean_ | false +3+| If this generic is enabled, the shifter unit of the CPU's ALU is implemented as a (slow but tiny) single-bit iterative shifter +(requires up to 32 clock cycles for a shift operation, but reduces the hardware footprint). The configuration of +this generic is ignored if <<_fast_shift_en>> is _true_. +|====== + + +:sectnums!: +===== _CPU_CNT_WIDTH_ + +[cols="4,4,2"] +[frame="all",grid="none"] +|====== +| **CPU_CNT_WIDTH** | _natural_ | 0 +3+| This generic configures the total size of the CPU's `cycle` and `instret` CSRs (low word + high word). 
+The maximum value is 64, the minimum is 0. See +section <<_machine_counters_and_timers>> for more information. Note: Configurations with <<_cpu_cnt_width>> +less than 64 are not RISC-V compliant. +|====== + + +// #################################################################################################################### +:sectnums: +==== Physical Memory Protection (PMP) + +See section <<_pmp_physical_memory_protection>> for more information. + + +:sectnums!: +===== _PMP_NUM_REGIONS_ + +[cols="4,4,2"] +[frame="all",grid="none"] +|====== +| **PMP_NUM_REGIONS** | _natural_ | 0 +3+| Total number of implemented protection regions (0..64). If this generic is zero, no physical memory +protection logic will be implemented at all. Setting <<_pmp_num_regions>> > 0 will set the _CSR_MZEXT_PMP_ flag +in the <<_mzext>> CSR. +|====== + + +:sectnums!: +===== _PMP_MIN_GRANULARITY_ + +[cols="4,4,2"] +[frame="all",grid="none"] +|====== +| **PMP_MIN_GRANULARITY** | _natural_ | 64*1024 +3+| Minimal region granularity in bytes. Has to be a power of two. Has to be at least 8 bytes. +|====== + + +// #################################################################################################################### +:sectnums: +==== Hardware Performance Monitors (HPM) + +See section <<_hpm_hardware_performance_monitors>> for more information. + + +:sectnums!: +===== _HPM_NUM_CNTS_ + +[cols="4,4,2"] +[frame="all",grid="none"] +|====== +| **HPM_NUM_CNTS** | _natural_ | 0 +3+| Total number of implemented hardware performance monitor counters (0..29). If this generic is zero, no +hardware performance monitor logic will be implemented at all. Setting <<_hpm_num_cnts>> > 0 will set the _CSR_MZEXT_HPM_ flag +in the <<_mzext>> CSR. 
+|====== + + +:sectnums!: +===== _HPM_CNT_WIDTH_ + +[cols="4,4,2"] +[frame="all",grid="none"] +|====== +| **HPM_CNT_WIDTH** | _natural_ | 40 +3+| This generic defines the total LSB-aligned size of each HPM counter (size(`[m]hpmcounter*h`) + +size(`[m]hpmcounter*`)). The maximum value is 64, the minimal is 0. If the size is less than 64-bit, the +unused MSB-aligned counter bits are hardwired to zero. +|====== + + +// #################################################################################################################### +:sectnums: +==== Internal Instruction Memory + +See sections <<_address_space>> and <<_instruction_memory_imem>> for more information. + + +:sectnums!: +===== _MEM_INT_IMEM_EN_ + +[cols="4,4,2"] +[frame="all",grid="none"] +|====== +| **MEM_INT_IMEM_EN** | _boolean_ | true +3+| Implement processor internal instruction memory (IMEM) when _true_. +|====== + + +:sectnums!: +===== _MEM_INT_IMEM_SIZE_ + +[cols="4,4,2"] +[frame="all",grid="none"] +|====== +| **MEM_INT_IMEM_SIZE** | _natural_ | 16*1024 +3+| Size in bytes of the processor internal instruction memory (IMEM). Has no effect when _MEM_INT_IMEM_EN_ is _false_. +|====== + + +:sectnums!: +===== _MEM_INT_IMEM_ROM_ + +[cols="4,4,2"] +[frame="all",grid="none"] +|====== +| **MEM_INT_IMEM_ROM** | _boolean_ | false +3+| Implement processor-internal instruction memory as read-only memory, which will be initialized with the +application image at synthesis time. Has no effect when _MEM_INT_IMEM_EN_ is _false_. +|====== + + +// #################################################################################################################### +:sectnums: +==== Internal Data Memory + +See sections <<_address_space>> and <<_data_memory_dmem>> for more information. + + +:sectnums!: +===== _MEM_INT_DMEM_EN_ + +[cols="4,4,2"] +[frame="all",grid="none"] +|====== +| **MEM_INT_DMEM_EN** | _boolean_ | true +3+| Implement processor internal data memory (DMEM) when _true_. 
+|====== + + +:sectnums!: +===== _MEM_INT_DMEM_SIZE_ + +[cols="4,4,2"] +[frame="all",grid="none"] +|====== +| **MEM_INT_DMEM_SIZE** | _natural_ | 8*1024 +3+| Size in bytes of the processor-internal data memory (DMEM). Has no effect when _MEM_INT_DMEM_EN_ is _false_. +|====== + + +// #################################################################################################################### +:sectnums: +==== Internal Cache Memory + +See section <<_processor_internal_instruction_cache_icache>> for more information. + + +:sectnums!: +===== _ICACHE_EN_ + +[cols="4,4,2"] +[frame="all",grid="none"] +|====== +| **ICACHE_EN** | _boolean_ | false +3+| Implement processor internal instruction cache when _true_. +|====== + + +:sectnums!: +===== _ICACHE_NUM_BLOCK_ + +[cols="4,4,2"] +[frame="all",grid="none"] +|====== +| **ICACHE_NUM_BLOCKS** | _natural_ | 4 +3+| Number of blocks (cache "pages" or "lines") in the instruction cache. Has to be a power of two. Has no +effect when _ICACHE_EN_ is _false_. +|====== + + +:sectnums!: +===== _ICACHE_BLOCK_SIZE_ + +[cols="4,4,2"] +[frame="all",grid="none"] +|====== +| **ICACHE_BLOCK_SIZE** | _natural_ | 64 +3+| Size in bytes of each block in the instruction cache. Has to be a power of two. Has no effect when +_ICACHE_EN_ is _false_. +|====== + + +:sectnums!: +===== _ICACHE_ASSOCIATIVITY_ + +[cols="4,4,2"] +[frame="all",grid="none"] +|====== +| **ICACHE_ASSOCIATIVITY** | _natural_ | 1 +3+| Associativity (= number of sets) of the instruction cache. Has to be a power of two. Allowed configurations: +`1` = 1 set, direct mapped; `2` = 2-way set-associative. Has no effect when _ICACHE_EN_ is _false_. +|====== + + +// #################################################################################################################### +:sectnums: +==== External Memory Interface + +See sections <<_address_space>> and <<_processor_external_memory_interface_wishbone_axi4_lite>> for more information. 
+ + +:sectnums!: +===== _MEM_EXT_EN_ + +[cols="4,4,2"] +[frame="all",grid="none"] +|====== +| **MEM_EXT_EN** | _boolean_ | false +3+| Implement external bus interface (WISHBONE) when _true_. +|====== + + +:sectnums!: +===== _MEM_EXT_TIMEOUT_ + +[cols="4,4,2"] +[frame="all",grid="none"] +|====== +| **MEM_EXT_TIMEOUT** | _natural_ | 255 +3+| Clock cycles after which a pending external bus access will auto-terminate and raise a bus fault exception. Set to 0 to disable auto-timeout. +|====== + + +// #################################################################################################################### +:sectnums: +==== Processor Peripheral/IO Modules + +See section <<_processor_internal_modules>> for more information. + + +:sectnums!: +===== _IO_GPIO_EN_ + +[cols="4,4,2"] +[frame="all",grid="none"] +|====== +| **IO_GPIO_EN** | _boolean_ | true +3+| Implement general purpose input/output port unit (GPIO) when _true_. +See section <<_general_purpose_input_and_output_port_gpio>> for more information. +|====== + + +:sectnums!: +===== _IO_MTIME_EN_ + +[cols="4,4,2"] +[frame="all",grid="none"] +|====== +| **IO_MTIME_EN** | _boolean_ | true +3+| Implement machine system timer (MTIME) when _true_. +See section <<_machine_system_timer_mtime>> for more information. +|====== + + +:sectnums!: +===== _IO_UART0_EN_ + +[cols="4,4,2"] +[frame="all",grid="none"] +|====== +| **IO_UART0_EN** | _boolean_ | true +3+| Implement primary universal asynchronous receiver/transmitter (UART0) when _true_. +See section <<_primary_universal_asynchronous_receiver_and_transmitter_uart0>> for +more information. +|====== + + +:sectnums!: +===== _IO_UART1_EN_ + +[cols="4,4,2"] +[frame="all",grid="none"] +|====== +| **IO_UART1_EN** | _boolean_ | true +3+| Implement secondary universal asynchronous receiver/transmitter (UART1) when _true_. +See section <<_secondary_universal_asynchronous_receiver_and_transmitter_uart1>> for more information. 
+|====== + + +:sectnums!: +===== _IO_SPI_EN_ + +[cols="4,4,2"] +[frame="all",grid="none"] +|====== +| **IO_SPI_EN** | _boolean_ | true +3+| Implement serial peripheral interface controller (SPI) when _true_. +See section <<_serial_peripheral_interface_controller_spi>> for more information. +|====== + + +:sectnums!: +===== _IO_TWI_EN_ + +[cols="4,4,2"] +[frame="all",grid="none"] +|====== +| **IO_TWI_EN** | _boolean_ | true +3+| Implement two-wire interface controller (TWI) when _true_. +See section <<_two_wire_serial_interface_controller_twi>> for +more information. +|====== + + +:sectnums!: +===== _IO_PWM_NUM_CH_ + +[cols="4,4,2"] +[frame="all",grid="none"] +|====== +| **IO_PWM_NUM_CH** | _natural_ | 4 +3+| Number of pulse-width modulation (PWM) channels (0..60) to implement. The PWM controller is _not_ implemented if zero. +See section <<_pulse_width_modulation_controller_pwm>> for more information. +|====== + + +:sectnums!: +===== _IO_WDT_EN_ + +[cols="4,4,2"] +[frame="all",grid="none"] +|====== +| **IO_WDT_EN** | _boolean_ | true +3+| Implement watchdog timer (WDT) when _true_. See section <<_watchdog_timer_wdt>> for more +information. +|====== + + +:sectnums!: +===== _IO_TRNG_EN_ + +[cols="4,4,2"] +[frame="all",grid="none"] +|====== +| **IO_TRNG_EN** | _boolean_ | false +3+| Implement true-random number generator (TRNG) when _true_. See section <<_true_random_number_generator_trng>> for more information. +|====== + + +:sectnums!: +===== _IO_CFS_EN_ + +[cols="4,4,2"] +[frame="all",grid="none"] +|====== +| **IO_CFS_EN** | _boolean_ | false +3+| Implement custom functions subsystem (CFS) when _true_. See section <<_custom_functions_subsystem_cfs>> for more information. 
+|====== + + +:sectnums!: +===== _IO_CFS_CONFIG_ + +[cols="4,4,2"] +[frame="all",grid="none"] +|====== +| **IO_CFS_CONFIG** | _std_ulogic_vector(31 downto 0)_ | x"00000000" +3+| This is a "conduit" generic that can be used to pass user-defined CFS implementation flags to the custom +functions subsystem entity. See section <<_custom_functions_subsystem_cfs>> for more information. +|====== + + +:sectnums!: +===== _IO_CFS_IN_SIZE_ + +[cols="4,4,2"] +[frame="all",grid="none"] +|====== +| **IO_CFS_IN_SIZE** | _positive_ | 32 +3+| Defines the size of the CFS input signal conduit (`cfs_in_i`). See section <<_custom_functions_subsystem_cfs>> for more information. +|====== + + +:sectnums!: +===== _IO_CFS_OUT_SIZE_ + +[cols="4,4,2"] +[frame="all",grid="none"] +|====== +| **IO_CFS_OUT_SIZE** | _positive_ | 32 +3+| Defines the size of the CFS output signal conduit (`cfs_out_o`). See section <<_custom_functions_subsystem_cfs>> for more information. +|====== + + +:sectnums!: +===== _IO_NCO_EN_ + +[cols="4,4,2"] +[frame="all",grid="none"] +|====== +| **IO_NCO_EN** | _boolean_ | true +3+| Implement numerically-controlled oscillator (NCO) when _true_. +See section <<_numerically_controlled_oscillator_nco>> for more information. +|====== + + +:sectnums!: +===== _IO_NEOLED_EN_ + +[cols="4,4,2"] +[frame="all",grid="none"] +|====== +| **IO_NEOLED_EN** | _boolean_ | true +3+| Implement smart LED interface (WS2812 / NeoPixel(TM)-compatible) (NEOLED) when _true_. +See section <<_smart_led_interface_neoled>> for more information. +|====== + + +<<< +// #################################################################################################################### +:sectnums: +=== Processor Interrupts + +[TIP] +The interrupt request signals have specific `mip` CSR bits (see <<_machine_trap_handling>>), specific +`mie` CSR bits (see <<_machine_trap_setup>>) and specific `mcause` CSR trap codes and trap +priorities. 
For more information (also regarding the signaling protocol) see section <<_traps_exceptions_and_interrupts>>. + +**RISC-V Standard Interrupts** + +The processor setup features the standard RISC-V interrupt lines for "machine timer interrupt", "machine +software interrupt" and "machine external interrupt". The software and external interrupt lines are available +via the processor's top entity. By default, the timer interrupt is connected to the internal machine timer +MTIME timer unit (<<_machine_system_timer_mtime>>). If this module has not been enabled for +synthesis, the machine timer interrupt is also available via the processor's top entity. + +**NEORV32-Specific Fast Interrupt Requests** + +As part of the custom/NEORV32-specific CPU extensions, the CPU features 16 fast interrupt request signals +(`FIRQ0` – `FIRQ15`). + +The fast interrupt request signals are divided into two groups. The FIRQs with higher priority (FIRQ0 – +FIRQ9) are dedicated for processor-internal usage. The FIRQs with lower priority (FIRQ10 – FIRQ15) are +available for custom usage via the processor's top entity signal `soc_firq_i`. 
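The split into the two FIRQ groups can be sketched in C as follows. This is only an illustration: the helper names `firq_is_internal` and `soc_firq_to_channel` are hypothetical and not part of the NEORV32 software library, and the sketch assumes that bit 0 of `soc_firq_i` drives FIRQ10.

```c
#include <stdbool.h>

/* Number of fast interrupt request channels (FIRQ0 .. FIRQ15). */
#define FIRQ_NUM_CHANNELS 16
/* First channel that is available for custom use via soc_firq_i. */
#define FIRQ_FIRST_CUSTOM 10

/* Hypothetical helper: true if the channel is dedicated to processor-internal use. */
static bool firq_is_internal(int channel) {
  return (channel >= 0) && (channel < FIRQ_FIRST_CUSTOM);
}

/* Hypothetical helper: map a soc_firq_i bit index (0..5) to its FIRQ channel.
   Assumption: bit 0 corresponds to FIRQ10. Returns -1 for an invalid index. */
static int soc_firq_to_channel(int bit) {
  if ((bit < 0) || (bit >= (FIRQ_NUM_CHANNELS - FIRQ_FIRST_CUSTOM))) {
    return -1;
  }
  return FIRQ_FIRST_CUSTOM + bit;
}
```

Since the channel number corresponds to the priority, every custom channel returned by `soc_firq_to_channel` has a lower priority than any processor-internal channel.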
+ +The mapping of the 16 FIRQ channels is shown in the following table (the channel number corresponds to the FIRQ priority): + +.NEORV32 fast interrupt channel mapping +[cols="^1,<2,<7"] +[options="header",grid="rows"] +|======================= +| Channel | Source | Description +| 0 | _WDT_ | watchdog timeout interrupt +| 1 | _CFS_ | custom functions subsystem (CFS) interrupt (user-defined) +| 2 | _UART0_ (RXD) | UART0 data received interrupt (RX complete) +| 3 | _UART0_ (TXD) | UART0 sending done interrupt (TX complete) +| 4 | _UART1_ (RXD) | UART1 data received interrupt (RX complete) +| 5 | _UART1_ (TXD) | UART1 sending done interrupt (TX complete) +| 6 | _SPI_ | SPI transmission done interrupt +| 7 | _TWI_ | TWI transmission done interrupt +| 8 | _GPIO_ | GPIO input pin-change interrupt +| 9 | _NEOLED_ | NEOLED buffer TX empty / not full interrupt +| 10:15 | `soc_firq_i(5:0)` | Custom platform use; available via processor's top signal +|======================= + +**Non-Maskable Interrupt** + +The NEORV32 features a single non-maskable interrupt source via the `nm_irq_i` top +entity signal that can be used to signal critical system conditions. This interrupt source _cannot_ be disabled. Hence, it does _not_ provide +configuration/status flags in the `mie` and `mip` CSRs. The RISC-V-compatible `mcause` value `0x80000000` is used to indicate the non-maskable interrupt. + +<<< +// #################################################################################################################### +:sectnums: +=== Address Space + +By default, the total 32-bit (4GB) address space of the NEORV32 Processor is divided into four main regions: + +1. Instruction memory (IMEM) space – for instructions and constants. +2. Data memory (DMEM) space – for application runtime data (heap, stack, etc.). +3. Bootloader ROM address space – for the processor-internal bootloader. +4. IO/peripheral address space – for the processor-internal IO/peripheral devices (e.g., UART). 
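The four address space regions listed above can be told apart by simple address comparisons. The following C sketch illustrates this for the default configuration only. The type and function names are hypothetical, and the base addresses are assumptions taken from the default layout of this datasheet (instruction space at 0x00000000, data space at 0x80000000, bootloader ROM at 0xFFFF0000, IO region at 0xFFFFFE00).

```c
#include <stdint.h>

/* Default base addresses (assumed; a non-default configuration changes them). */
#define ISPACE_BASE  0x00000000u /* instruction memory address space */
#define DSPACE_BASE  0x80000000u /* data memory address space */
#define BOOTROM_BASE 0xFFFF0000u /* bootloader ROM address space */
#define IO_BASE      0xFFFFFE00u /* IO/peripheral address space */

/* Hypothetical region identifiers for the four main regions. */
typedef enum {
  REGION_IMEM,
  REGION_DMEM,
  REGION_BOOTROM,
  REGION_IO
} region_t;

/* Classify an address by comparing against the base addresses, highest first. */
static region_t classify_address(uint32_t addr) {
  if (addr >= IO_BASE) {
    return REGION_IO;
  }
  if (addr >= BOOTROM_BASE) {
    return REGION_BOOTROM;
  }
  if (addr >= DSPACE_BASE) {
    return REGION_DMEM;
  }
  return REGION_IMEM; /* everything from ISPACE_BASE up to DSPACE_BASE - 1 */
}
```

The sketch only names the address space a given address falls into. Whether a memory or device is actually present at that address depends on the processor configuration.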
+ +.NEORV32 processor - address space (default configuration) +image::address_space.png[900] + +[TIP] +These four memory regions are handled by the linker when compiling a NEORV32 executable. +See section <<_executable_image_format>> for more information. + +**Address Space Layout** + +The general address space layout is determined by two main configuration constants: `ispace_base_c` defining +the base address of the instruction memory address space and `dspace_base_c` defining the base address of +the data memory address space. Both constants are defined in the NEORV32 VHDL package file +`rtl/core/neorv32_package.vhd`: + +[source,vhdl] +---- +-- Architecture Configuration ---------------------------------------------------- +-- ---------------------------------------------------------------------------------- +constant ispace_base_c : std_ulogic_vector(31 downto 0) := x"00000000"; +constant dspace_base_c : std_ulogic_vector(31 downto 0) := x"80000000"; +---- + +The default configuration assumes the instruction memory address space starting at address _0x00000000_ +and the data memory address space starting at _0x80000000_. Both values can be modified for a specific +setup and the two address spaces may overlap or even be completely identical. + +The base addresses of the bootloader (at _0xFFFF0000_) and of the IO region (at _0xFFFFFE00_) for the peripheral +devices are also defined in the package and are fixed. These address regions cannot be used for other +applications – even if the bootloader or all IO devices are not implemented. + +[WARNING] +When using the processor-internal data and/or instruction memories (DMEM/IMEM) and using a non-default +configuration for the `dspace_base_c` and/or `ispace_base_c` base addresses, the +following requirements have to be fulfilled: +**1.** Both base addresses have to be aligned to a 4-byte boundary. +**2.** Both base addresses have to be aligned to the respective internal memory sizes. 
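The two alignment requirements from the warning above can be checked with a few lines of C before a non-default configuration is synthesized. This is a minimal sketch, not part of the NEORV32 sources: the function names are hypothetical, and it assumes that "aligned to the memory size" means the base address is a multiple of a power-of-two memory size.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if size is a non-zero power of two. */
static bool is_power_of_two(uint32_t size) {
  return (size != 0u) && ((size & (size - 1u)) == 0u);
}

/* Hypothetical check of one base address against the two requirements:
   1. aligned to a 4-byte boundary,
   2. aligned to the size of the internal memory located at this base
      (assumption: the size is a power of two). */
static bool base_address_is_valid(uint32_t base, uint32_t mem_size) {
  if ((base & 0x3u) != 0u) {
    return false; /* violates requirement 1 */
  }
  if (!is_power_of_two(mem_size)) {
    return false; /* outside the assumption of this sketch */
  }
  return (base & (mem_size - 1u)) == 0u; /* requirement 2 */
}
```

With the default values, `base_address_is_valid(0x80000000u, 8u * 1024u)` holds for the data memory, whereas a data space base of 0x80001000 would fail for an 8kB DMEM because it is not a multiple of the memory size.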
+ +:sectnums: +==== CPU Data and Instruction Access + +The CPU can access all of the 4GB address space from the instruction fetch interface (**I**) and also from the +data access interface (**D**). These two CPU interfaces are multiplexed by a simple bus switch +(`rtl/core/neorv32_busswitch.vhd`) into a _single_ processor-internal bus. All processor-internal +memories, peripherals and also the external memory interface are connected to this bus. Hence, both CPU +interfaces (instruction fetch & data access) have access to the same (**identical**) address space making the +setup a modified von-Neumann architecture. + +.Processor-internal bus architecture +image::neorv32_bus.png[1300] + +[NOTE] +The internal processor bus might appear to be a bottleneck. In order to reduce congestion on this bus +(when instruction fetch and data interface access the bus at the same time) the instruction fetch of +the CPU is equipped with a prefetch buffer. Instruction fetches can be further buffered using the i-cache. +Furthermore, data accesses (loads and stores) have higher priority than instruction fetch +accesses. + +[IMPORTANT] +Please note that all processor-internal components including the peripheral/IO devices can also be +accessed from programs running in less-privileged user mode. For example, if the system relies on +a periodic interrupt from the _MTIME_ timer unit, user-level programs could alter the _MTIME_ +configuration corrupting this interrupt. This kind of security issue can be mitigated using the +PMP system (see <<_machine_physical_memory_protection>>). + +:sectnums: +==== Physical Memory Attributes + +The processor setup defines the following simple attributes for the four processor-internal address space regions: + +* `r` – read access (from CPU data access interface, e.g. via "load") +* `w` – write access (from CPU data access interface, e.g. 
via "store") +* `x` – execute access (from CPU instruction fetch interface) +* `a` – atomic access (from CPU data access interface) +* `8` – byte (8-bit)-accessible (when writing) +* `16` – half-word (16-bit)-accessible (when writing) +* `32` – word (32-bit)-accessible (when writing) + +The following table shows the provided physical memory attributes of each region. Additional attributes (like +denying the execute right for a certain region of the IMEM) can be provided using the RISC-V <<_machine_physical_memory_protection>> extension. + +[cols="^1,^2,^2,^3,^2"] +[options="header",grid="rows"] +|======================= +| # | Region | Base address | Size | Attributes +| 4 | IO/peripheral devices | 0xfffffe00 | 512 bytes | `r/w/a/32` +| 3 | bootloader ROM | 0xffff0000 | up to 32kB| `r/x/a` +| 2 | DMEM | 0x80000000 | up to 2GB (-64kB) | `r/w/x/a/8/16/32` +| 1 | IMEM | 0x00000000 | up to 2GB | `r/w/x/a/8/16/32` +|======================= + +Only the CPU of the processor has access to the internal memories and IO devices, hence all accesses are +always exclusive. Accessing a memory region in a way that violates the provided attributes will trigger a +load/store/instruction fetch access exception or will return a failed atomic access result, respectively. + +The physical memory attributes of memories and/or devices connected via the external bus interface have to be +defined by those components or the interconnection fabric. + +:sectnums: +==== Internal Memories + +The processor can implement internal memories for instructions (IMEM) and data (DMEM), which will be +mapped to FPGA block RAMs. The implementation of these memories is controlled via the boolean +<<_mem_int_imem_en>> and <<_mem_int_dmem_en>> generics. + +The sizes of these memories are configured via the _MEM_INT_IMEM_SIZE_ and _MEM_INT_DMEM_SIZE_ +generics (in bytes), respectively. 
The processor-internal instruction memory (IMEM) can optionally be +implemented as true ROM (<<_mem_int_imem_rom>>), which is initialized with the application code during +synthesis. + +If the processor-internal IMEM is implemented, it is located right at the base address of the instruction +address space (default `ispace_base_c` = _0x00000000_). Vice versa, the processor-internal data memory is +located right at the beginning of the data address space (default `dspace_base_c` = _0x80000000_) when +implemented. + +:sectnums: +==== External Memory/Bus Interface + +Any CPU access (data or instructions), which does not fulfill one of the following conditions, is forwarded +to the <<_processor_external_memory_interface_wishbone_axi4_lite>>: + +* access to the processor-internal IMEM and processor-internal IMEM is implemented +* access to the processor-internal DMEM and processor-internal DMEM is implemented +* access to the bootloader ROM and beyond → addresses >= _BOOTROM_BASE_ (default 0xFFFF0000) will never be forwarded to the external memory interface + +The external bus interface is available when the <<_mem_ext_en>> generic is _true_. If this interface is +deactivated, any access exceeding the internal memories or peripheral devices will trigger a bus access fault +exception. If <<_mem_ext_timeout>> is greater than zero any external bus access that is not acknowledged or terminated +within <<_mem_ext_timeout>> clock cycles will auto-timeout and raise the according bus fault exception. + + + +<<< +// #################################################################################################################### +:sectnums: +=== Processor-Internal Modules + +Basically, the processor is a SoC consisting of the NEORV32 CPU, peripheral/IO devices, embedded +memories, an external memory interface and a bus infrastructure to interconnect all units. Additionally, the +system implements an internal reset generator and a global clock generator/divider. 
+ +**Internal Reset Generator** + +Most processor-internal modules – except for the CPU and the watchdog timer – do not have a dedicated +reset signal. However, all devices can be reset by software by clearing the corresponding unit's control +register. The automatically included application start-up code (`crt0.S`) will perform a software-reset of all +modules to ensure a clean system reset state. + +The hardware reset signal of the processor can either be +triggered via the external reset pin (`rstn_i`, low-active) or by the internal watchdog timer (if implemented). +Before the external reset signal is applied to the system, it is extended to have a minimal duration of eight +clock cycles. + +**Internal Clock Divider** + +An internal clock divider generates 8 clock signals derived from the processor's main clock input `clk_i`. +These derived clock signals are not actual _clock signals_. Instead, they are derived from a simple counter and +are used as "clock enable" signals by the different processor modules. Thus, the whole design operates using +only the main clock signal (single clock domain). Some of the processor peripherals like the Watchdog or the +UARTs can select one of the derived clock enable signals for their internal operation. If none of the +connected modules requires a clock signal from the divider, it is automatically deactivated to reduce dynamic +power. + +The peripheral devices, which feature a time-based configuration, provide a three-bit prescaler select in their +according control register to select one out of the eight available clocks. The mapping of the prescaler select +bits to the actually obtained clock is shown in the table below. Here, f represents the processor main clock +from the top entity's `clk_i` signal. 
+ +[cols="<3,^1,^1,^1,^1,^1,^1,^1,^1"] +[grid="rows"] +|======================= +| Prescaler bits: | `0b000` | `0b001` | `0b010` | `0b011` | `0b100` | `0b101` | `0b110` | `0b111` +| Resulting clock: | _f/2_ | _f/4_ | _f/8_ | _f/64_ | _f/128_ | _f/1024_| _f/2048_| _f/4096_ +|======================= + +**Peripheral / IO Devices** + +The processor-internal peripheral/IO devices are located at the end of the 32-bit address space at base +address _0xFFFFFE00_. A region of 512 bytes is reserved for these devices. Hence, all peripheral/IO devices are +accessed using a memory-mapped scheme. A special linker script as well as the NEORV32 core software +library abstract the specific memory layout for the user. + +[IMPORTANT] +When accessing an IO device that has not been implemented (via the according _IO_x_EN_ generic), a +load/store access fault exception is triggered. + +[IMPORTANT] +The peripheral/IO devices can only be written in full-word mode (i.e. 32-bit). Byte or half-word +(8/16-bit) writes will trigger a store access fault exception. Read accesses are not size constrained. +Processor-internal memories as well as modules connected to the external memory interface can still +be written with a byte-wide granularity. + +[TIP] +You should use the provided core software library to interact with the peripheral devices. This +prevents incompatibilities with future versions, since the hardware driver functions handle all the +register and register bit accesses. + +[TIP] +Most of the IO devices do not have a hardware reset. Instead, the devices are reset via software by +writing zero to the unit's control register. A general software-based reset of all devices is done by the +application start-up code `crt0.S`. + +**Nomenclature for the Peripheral / IO Devices Listing** + +Each peripheral device chapter features a register map showing accessible control and data registers of the +according device including the implemented control and status bits. 
You can directly interact with these +registers/bits via the provided _C-code defines_. These defines are set in the main processor core library +include file `sw/lib/include/neorv32.h`. The registers and/or register bits, which can be accessed +directly using plain C-code, are marked with a "[C]". + +Not all registers or register bits can be arbitrarily read/written. The following read/write access types are +available: + +* `r/w` registers / bits can be read and written +* `r/-` registers / bits are read-only; any write access to them has no effect +* `-/w` these registers / bits are write-only; they auto-clear in the next cycle and are always read as zero + +[TIP] +Bits / registers that are not listed in the register map tables are not (yet) implemented. These registers +/ bits are always read as zero. A write access to them has no effect, but user programs should only +write zero to them to remain compatible with future extensions. + +[TIP] +When writing to read-only registers, the access is nevertheless acknowledged, but no actual data is +written. When reading data from a write-only register the result is undefined. 
+ + +include::soc_imem.adoc[] + +include::soc_dmem.adoc[] + +include::soc_bootrom.adoc[] + +include::soc_icache.adoc[] + +include::soc_wishbone.adoc[] + +include::soc_gpio.adoc[] + +include::soc_wdt.adoc[] + +include::soc_mtime.adoc[] + +include::soc_uart.adoc[] + +include::soc_spi.adoc[] + +include::soc_twi.adoc[] + +include::soc_pwm.adoc[] + +include::soc_trng.adoc[] + +include::soc_cfs.adoc[] + +include::soc_nco.adoc[] + +include::soc_neoled.adoc[] + +include::soc_sysinfo.adoc[] + + Index: datasheet/soc_bootrom.adoc =================================================================== --- datasheet/soc_bootrom.adoc (nonexistent) +++ datasheet/soc_bootrom.adoc (revision 60) @@ -0,0 +1,36 @@ +<<< +:sectnums: +==== Bootloader ROM (BOOTROM) + +[cols="<3,<3,<4"] +[frame="topbot",grid="none"] +|======================= +| Hardware source file(s): | neorv32_boot_rom.vhd | +| Software driver file(s): | none | _implicitly used_ +| Top entity port: | none | +| Configuration generics: | _BOOTLOADER_EN_ | implement processor-internal bootloader when _true_ +| CPU interrupts: | none | +|======================= + +As the name already suggests, the boot ROM contains the read-only bootloader image. When the bootloader +is enabled via the _BOOTLOADER_EN_ generic it is directly executed after system reset. + +The bootloader ROM is located at address 0xFFFF0000. This location is fixed and the bootloader ROM size +must not exceed 32kB. The bootloader read-only memory is automatically initialized during synthesis via the +`rtl/core/neorv32_bootloader_image.vhd` file, which is generated when compiling and installing the +bootloader sources. + +The bootloader ROM address space cannot be used for other applications even when the bootloader is not +implemented. + +**Boot Configuration** + +If the bootloader is implemented, the CPU starts execution after reset right at the beginning of the boot +ROM. 
If the bootloader is not implemented, the CPU starts execution at the beginning of the instruction +memory space (defined via `ispace_base_c` constant in the `neorv32_package.vhd` VHDL package file, +default `ispace_base_c` = 0x00000000). In this case, the instruction memory has to contain a valid +executable – either by using the internal IMEM with an initialization during synthesis or by a user-defined +initialization process. + +[TIP] +See section <<_bootloader>> for more information regarding the bootloader's boot process and configuration options. Index: datasheet/soc_cfs.adoc =================================================================== --- datasheet/soc_cfs.adoc (nonexistent) +++ datasheet/soc_cfs.adoc (revision 60) @@ -0,0 +1,76 @@ +<<< +:sectnums: +==== Custom Functions Subsystem (CFS) + +[cols="<3,<3,<4"] +[frame="topbot",grid="none"] +|======================= +| Hardware source file(s): | neorv32_cfs.vhd | +| Software driver file(s): | neorv32_cfs.c | +| | neorv32_cfs.h | +| Top entity port: | `cfs_in_i` | custom input conduit +| | `cfs_out_o` | custom output conduit +| Configuration generics: | _IO_CFS_EN_ | implement CFS when _true_ +| | _IO_CFS_CONFIG_ | custom generic conduit +| | _IO_CFS_IN_SIZE_ | size of `cfs_in_i` +| | _IO_CFS_OUT_SIZE_ | size of `cfs_out_o` +| CPU interrupts: | fast IRQ channel 1 | CFS interrupt (see <<_processor_interrupts>>) +|======================= + +**Theory of Operation** + +The custom functions subsystem can be used to implement application-specific user-defined co-processors +(like encryption or arithmetic accelerators) or peripheral/communication interfaces. In contrast to connecting +custom hardware accelerators via the external memory interface, the CFS provides a convenient and low-latency +extension and customization option. + +The CFS provides up to 32x 32-bit memory-mapped registers (see register map table below). The actual +functionality of these registers has to be defined by the hardware designer. 
+ +[NOTE] +Take a look at the template CFS VHDL source file (`rtl/core/neorv32_cfs.vhd`). The file is highly +commented to illustrate all aspects that are relevant for implementing custom CFS-based co-processor designs. + +**CFS Software Access** + +The CFS memory-mapped registers can be accessed by software using the provided C-language aliases (see +register map table below). Note that all interface registers are accessed as 32-bit data of type `uint32_t`. + +[source,c] +---- +// C-code CFS usage example +CFS_REG_0 = (uint32_t)some_data_array[i]; // write to CFS register 0 +uint32_t temp = CFS_REG_20; // read from CFS register 20 +---- + +**CFS Interrupt** + +The CFS provides a single one-shot interrupt request signal mapped to the CPU's fast interrupt channel 1. +See section <<_processor_interrupts>> for more information. + +**CFS Configuration Generic** + +By default, the CFS provides a single 32-bit `std_(u)logic_vector` configuration generic _IO_CFS_CONFIG_ +that is available in the processor's top entity. This generic can be used to pass custom configuration options +from the top entity down to the CFS entity. + +**CFS Custom IOs** + +By default, the CFS also provides two unidirectional input and output conduits `cfs_in_i` and `cfs_out_o`. +These signals are propagated to the processor's top entity. The actual use of these signals has to be defined +by the hardware designer. The size of the input signal conduit `cfs_in_i` is defined via the (top's) _IO_CFS_IN_SIZE_ configuration +generic (default = 32-bit). The size of the output signal conduit `cfs_out_o` is defined via the (top's) +_IO_CFS_OUT_SIZE_ configuration generic (default = 32-bit). If the custom functions subsystem is not implemented +(_IO_CFS_EN_ = false) the `cfs_out_o` signal is tied to all-zero. 
+ +.CFS register map +[cols="^4,<5,^2,^3,<14"] +[options="header",grid="all"] +|======================= +| Address | Name [C] | Bit(s) | R/W | Function +| `0xfffffe00` | _CFS_REG_0_ |`31:0` | (r)/(w) | custom CFS interface register 0 +| `0xfffffe04` | _CFS_REG_1_ |`31:0` | (r)/(w) | custom CFS interface register 1 +| ... | ... |`31:0` | (r)/(w) | ... +| `0xfffffe78` | _CFS_REG_30_ |`31:0` | (r)/(w) | custom CFS interface register 30 +| `0xfffffe7c` | _CFS_REG_31_ |`31:0` | (r)/(w) | custom CFS interface register 31 +|======================= Index: datasheet/soc_dmem.adoc =================================================================== --- datasheet/soc_dmem.adoc (nonexistent) +++ datasheet/soc_dmem.adoc (revision 60) @@ -0,0 +1,19 @@ +<<< +:sectnums: +==== Data Memory (DMEM) + +[cols="<3,<3,<4"] +[frame="topbot",grid="none"] +|======================= +| Hardware source file(s): | neorv32_dmem.vhd | +| Software driver file(s): | none | _implicitly used_ +| Top entity port: | none | +| Configuration generics: | _MEM_INT_DMEM_EN_ | implement processor-internal DMEM when _true_ +| | _MEM_INT_DMEM_SIZE_ | DMEM size in bytes +| CPU interrupts: | none | +|======================= + +Implementation of the processor-internal data memory is enabled via the processor's _MEM_INT_DMEM_EN_ +generic. The size in bytes is defined via the _MEM_INT_DMEM_SIZE_ generic. If the DMEM is implemented, +the memory is mapped into the data memory space and located right at the beginning of the data memory +space (default `dspace_base_c` = 0x80000000). The DMEM is always implemented as RAM. 
Index: datasheet/soc_gpio.adoc =================================================================== --- datasheet/soc_gpio.adoc (nonexistent) +++ datasheet/soc_gpio.adoc (revision 60) @@ -0,0 +1,42 @@ +<<< +:sectnums: +==== General Purpose Input and Output Port (GPIO) + +[cols="<3,<3,<4"] +[frame="topbot",grid="none"] +|======================= +| Hardware source file(s): | neorv32_gpio.vhd | +| Software driver file(s): | neorv32_gpio.c | +| | neorv32_gpio.h | +| Top entity port: | `gpio_o` | 32-bit parallel output port +| | `gpio_i` | 32-bit parallel input port +| Configuration generics: | _IO_GPIO_EN_ | implement GPIO port when _true_ +| CPU interrupts: | FIRQ channel 8 | pin-change interrupt (see <<_processor_interrupts>>) +|======================= + +**Theory of Operation** + +The general purpose parallel IO port unit provides a simple 32-bit parallel input port and a 32-bit parallel +output port. These ports can be used chip-externally (for example to drive status LEDs, connect buttons, etc.) +or system-internally to provide control signals for other IP modules. When the module is disabled for +implementation the GPIO output port is tied to zero. + +**Pin-Change Interrupt** + +The parallel input port `gpio_i` features a single pin-change interrupt. Whenever an input pin has a low-to-high +or high-to-low transition, the interrupt is triggered. By default, the pin-change interrupt is disabled and +can be enabled using a bit mask that has to be written to the _GPIO_INPUT_ register. Each set bit in this mask +enables the pin-change interrupt for the corresponding input pin. If more than one input pin is enabled for +triggering the pin-change interrupt, any transition on one of the enabled input pins will trigger the CPU's pin-change +interrupt. If the module is disabled for implementation, the pin-change interrupt is also permanently +disabled. 
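The masking behaviour of the pin-change interrupt can be expressed as a short software model. This is only an illustrative sketch of the rule stated above (any transition on an enabled pin triggers the interrupt), not a description of the actual hardware; the function name `gpio_pin_change` is hypothetical.

```c
#include <stdint.h>

// Software model: returns 1 if any input pin that is enabled in irq_mask
// differs between two consecutive samples of the 32-bit input port.
static int gpio_pin_change(uint32_t prev_in, uint32_t curr_in, uint32_t irq_mask) {
    uint32_t changed = prev_in ^ curr_in; // bits that toggled (either direction)
    return (changed & irq_mask) != 0u;    // only enabled pins count
}
```

A transition on a pin whose mask bit is cleared is ignored in this model, which mirrors the default state where the interrupt is disabled for all pins (mask = 0).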
+ +.GPIO unit register map +[cols="<2,<2,^1,^1,<6"] +[options="header",grid="rows"] +|======================= +| Address | Name [C] | Bit(s) | R/W | Function +.2+<| `0xffffff80` .2+<| _GPIO_INPUT_ ^| 31:0 ^| r/- <| parallel input port + ^| 31:0 ^| -/w <| parallel input pin-change IRQ enable mask +| `0xffffff84` | _GPIO_OUTPUT_ | 31:0 | r/w | parallel output port +|======================= Index: datasheet/soc_icache.adoc =================================================================== --- datasheet/soc_icache.adoc (nonexistent) +++ datasheet/soc_icache.adoc (revision 60) @@ -0,0 +1,50 @@ +<<< +:sectnums: +==== Processor-Internal Instruction Cache (iCACHE) + +[cols="<3,<3,<4"] +[frame="topbot",grid="none"] +|======================= +| Hardware source file(s): | neorv32_icache.vhd | +| Software driver file(s): | none | _implicitly used_ +| Top entity port: | none | +| Configuration generics: | _ICACHE_EN_ | implement processor-internal instruction cache when _true_ +| | _ICACHE_NUM_BLOCKS_ | number of cache blocks (pages/lines) +| | _ICACHE_BLOCK_SIZE_ | size of a cache block in bytes +| | _ICACHE_ASSOCIATIVITY_ | associativity / number of sets +| CPU interrupts: | none | +|======================= + +The processor features an optional cache for instructions to compensate for memories with high latency. The +cache is directly connected to the CPU's instruction fetch interface and provides fully transparent buffering +of instruction fetch accesses to the entire 4GB address space. + +[IMPORTANT] +The instruction cache is intended to accelerate instruction fetch via the external memory interface. +Since all processor-internal memories provide an access latency of one cycle (by default), caching +internal memories does not bring any performance gain. However, it _might_ reduce traffic on the +processor-internal bus. + +The cache is implemented if the _ICACHE_EN_ generic is true. 
The size of the cache memory is defined via +_ICACHE_BLOCK_SIZE_ (the size of a single cache block/page/line in bytes; has to be a power of two and >= +4 bytes), _ICACHE_NUM_BLOCKS_ (the total amount of cache blocks; has to be a power of two and >= 1) and +the actual cache associativity _ICACHE_ASSOCIATIVITY_ (number of sets; 1 = direct-mapped, 2 = 2-way set-associative, +has to be a power of two and >= 1). + +If the cache associativity (_ICACHE_ASSOCIATIVITY_) is > 1 the LRU (least recently +used) replacement policy is used. + +[TIP] +Keep the features of the targeted FPGA's memory resources (block RAM) in mind when configuring +the cache size/layout to maximize and optimize resource utilization. + +By executing the `fence.i` instruction (`Zifencei` CPU extension) the cache is cleared and a reload from +main memory is forced. Among other things, this allows implementing self-modifying code. + +**Bus Access Fault Handling** + +The cache always loads a complete cache block (_ICACHE_BLOCK_SIZE_ bytes) aligned to the size of a cache +block if a miss is detected. If any of the accessed addresses within a single block does not successfully +acknowledge (i.e. issuing an error signal or timing out) the whole cache block is invalidated and any access to +an address within this cache block will also raise an instruction fetch bus error fault exception. 
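The configuration constraints and the block alignment described above can be illustrated with a few lines of C. This is a minimal sketch under the stated rules only (power-of-two generics, block size of at least 4 bytes, block-aligned refill); the helper names `is_pow2`, `icache_config_ok` and `icache_block_base` are hypothetical and do not exist in the NEORV32 sources.

```c
#include <stdint.h>

// True if x is a power of two (and therefore also >= 1).
static int is_pow2(uint32_t x) {
    return x != 0u && (x & (x - 1u)) == 0u;
}

// Checks the generic constraints named in the text.
static int icache_config_ok(uint32_t block_size, uint32_t num_blocks, uint32_t assoc) {
    return is_pow2(block_size) && block_size >= 4u &&
           is_pow2(num_blocks) && is_pow2(assoc);
}

// Base address of the cache block that contains addr.
// block_size must be a power of two for the mask to be valid.
static uint32_t icache_block_base(uint32_t addr, uint32_t block_size) {
    return addr & ~(block_size - 1u);
}
```

With a 64-byte block size, a miss at address `0x0000107C` would refill the block starting at `0x00001040`; in this model a bus error anywhere in that range affects the whole block.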
+ Index: datasheet/soc_imem.adoc =================================================================== --- datasheet/soc_imem.adoc (nonexistent) +++ datasheet/soc_imem.adoc (revision 60) @@ -0,0 +1,31 @@ +<<< +:sectnums: +==== Instruction Memory (IMEM) + +[cols="<3,<3,<4"] +[frame="topbot",grid="none"] +|======================= +| Hardware source file(s): | neorv32_imem.vhd | +| Software driver file(s): | none | _implicitly used_ +| Top entity port: | none | +| Configuration generics: | _MEM_INT_IMEM_EN_ | implement processor-internal IMEM when _true_ +| | _MEM_INT_IMEM_SIZE_ | IMEM size in bytes +| | _MEM_INT_IMEM_ROM_ | implement IMEM as ROM when _true_ +| CPU interrupts: | none | +|======================= + +Implementation of the processor-internal instruction memory is enabled via the processor's +_MEM_INT_IMEM_EN_ generic. The size in bytes is defined via the _MEM_INT_IMEM_SIZE_ generic. If the +IMEM is implemented, the memory is mapped into the instruction memory space and located right at the +beginning of the instruction memory space (default `ispace_base_c` = 0x00000000). + +By default, the IMEM is implemented as RAM, so the content can be modified during run time. This is +required when using a bootloader that can update the content of the IMEM at any time. If you do not need +the bootloader anymore – since your application development has completed and you want the program to +permanently reside in the internal instruction memory – the IMEM can also be implemented as true _read-only_ +memory. In this case set the _MEM_INT_IMEM_ROM_ generic of the processor's top entity to _true_. + +When the IMEM is implemented as ROM, it will be initialized during synthesis with the actual application +program image. The compiler toolchain will generate a VHDL initialization +file `rtl/core/neorv32_application_image.vhd`, which is automatically inserted into the IMEM. If +the IMEM is implemented as RAM (default), the memory will **not be initialized** at all. 
Index: datasheet/soc_mtime.adoc =================================================================== --- datasheet/soc_mtime.adoc (nonexistent) +++ datasheet/soc_mtime.adoc (revision 60) @@ -0,0 +1,50 @@ +<<< +:sectnums: +==== Machine System Timer (MTIME) + +[cols="<3,<3,<4"] +[frame="topbot",grid="none"] +|======================= +| Hardware source file(s): | neorv32_mtime.vhd | +| Software driver file(s): | neorv32_mtime.c | +| | neorv32_mtime.h | +| Top entity port: | `mtime_i` | System time input from external MTIME +| | `mtime_o` | System time output (64-bit) for SoC +| Configuration generics: | _IO_MTIME_EN_ | implement MTIME when _true_ +| CPU interrupts: | `MTI` | machine timer interrupt (see <<_processor_interrupts>>) +|======================= + +**Theory of Operation** + +The MTIME machine system timer implements the memory-mapped MTIME timer from the official RISC-V +specifications. This unit features a 64-bit system timer incremented with the primary processor clock. +The current system time can also be obtained using the `time[h]` CSRs and is made available for processor-external +use via the top's `mtime_o` signal. + +[NOTE] +If the processor-internal **MTIME unit is NOT implemented**, the top's `mtime_i` input signal is used to update the `time[h]` CSRs +and the `MTI` (machine timer interrupt) CPU interrupt is directly connected to the top's `mtime_irq_i` input. + +The 64-bit system time can be accessed via the `MTIME_LO` and `MTIME_HI` memory-mapped registers (read/write) and also via +the CPU's `time[h]` CSRs (read-only). A 64-bit time compare register – accessible via the memory-mapped `MTIMECMP_LO` and `MTIMECMP_HI` +registers – is used to configure an interrupt to the CPU. The interrupt is triggered +whenever `MTIME` (high & low part) >= `MTIMECMP` (high & low part) and is directly forwarded to the CPU's `MTI` interrupt. 
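Because the 64-bit time is exposed as two 32-bit words while the counter keeps running, software that reads both halves may observe a carry between the two accesses. A common way to handle this on 32-bit systems is to re-read the high word and retry; the sketch below shows that generic technique together with the compare condition from the text. It is not taken from the NEORV32 driver, and the names `read_time64` and `mtime_irq_condition` are hypothetical.

```c
#include <stdint.h>

// Read a 64-bit counter that is exposed as two 32-bit words without tearing:
// sample high, low, high again and retry if the high word changed in between.
static uint64_t read_time64(const volatile uint32_t *lo, const volatile uint32_t *hi) {
    uint32_t h0, l, h1;
    do {
        h0 = *hi;
        l  = *lo;
        h1 = *hi;
    } while (h0 != h1);
    return ((uint64_t)h1 << 32) | l;
}

// Model of the compare rule from the text: the interrupt condition holds
// whenever MTIME >= MTIMECMP (full 64-bit comparison).
static int mtime_irq_condition(uint64_t mtime, uint64_t mtimecmp) {
    return mtime >= mtimecmp;
}
```

On real hardware the two pointers would refer to the `MTIME_LO` and `MTIME_HI` registers; here they are left as parameters so the function can also be exercised against plain variables.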
+ +[TIP] +The interrupt request is a single-shot signal, +so the CPU is triggered once if the system time is greater than or equal to the compare time. Hence, +another MTIME IRQ is only possible when updating `MTIMECMP`. + +The 64-bit counter and the 64-bit comparator are implemented as 2×32-bit counters and comparators with a +registered carry to prevent a 64-bit carry chain and thus, to simplify timing closure. + +.MTIME register map +[cols="<3,<3,^1,^1,<6"] +[options="header",grid="all"] +|======================= +| Address | Name [C] | Bits | R/W | Function +| `0xffffff90` | _MTIME_LO_ | 31:0 | r/w | machine system time, low word +| `0xffffff94` | _MTIME_HI_ | 31:0 | r/w | machine system time, high word +| `0xffffff98` | _MTIMECMP_LO_ | 31:0 | r/w | time compare, low word +| `0xffffff9c` | _MTIMECMP_HI_ | 31:0 | r/w | time compare, high word +|======================= Index: datasheet/soc_nco.adoc =================================================================== --- datasheet/soc_nco.adoc (nonexistent) +++ datasheet/soc_nco.adoc (revision 60) @@ -0,0 +1,129 @@ +<<< +:sectnums: +==== Numerically-Controlled Oscillator (NCO) + +[cols="<3,<3,<4"] +[frame="topbot",grid="none"] +|======================= +| Hardware source file(s): | neorv32_nco.vhd | +| Software driver file(s): | neorv32_nco.c | +| | neorv32_nco.h | +| Top entity port: | `nco_o` | NCO output (3x 1-bit channels) +| Configuration generics: | _IO_NCO_EN_ | implement NCO when _true_ +| CPU interrupts: | none | +|======================= + +**Theory of Operation** + +The numerically-controlled oscillator (NCO) provides a precise arbitrary linear frequency generator with +three independent channels. Based on a **direct digital synthesis** core, the NCO features a 20-bit wide +accumulator that is incremented with a programmable "tuning word". Whenever the accumulator overflows, a +flip flop is toggled that provides the actual frequency output. 
The accumulator increment is driven by one of +eight configurable clock sources, which are derived from the processor's main clock. + +The NCO features four accessible registers: the control register _NCO_CT_ and three _NCO_TUNE_CHi_ registers for +the tuning word of each channel i. The NCO is globally enabled by setting the _NCO_CT_EN_ bit in the control +register. If this bit is cleared, the accumulators of all channels are reset. The clock source for each channel i is +selected via the three bits _NCO_CT_CHi_PRSCx_ prescaler. The resulting clock is generated from the main +processor clock (f~main~) divided by the selected prescaler. + +.NCO prescaler configuration +[cols="<4,^1,^1,^1,^1,^1,^1,^1,^1"] +[options="header",grid="rows"] +|======================= +| **`NCO_CT_CHi_PRSCx`** | `0b000` | `0b001` | `0b010` | `0b011` | `0b100` | `0b101` | `0b110` | `0b111` +| Resulting `clock_prescaler` | 2 | 4 | 8 | 64 | 128 | 1024 | 2048 | 4096 +|======================= + +The resulting output frequency of each channel i is defined by the following equation: + +_**f~NCO~(i)**_ = ( _f~main~[Hz]_ / `clock_prescaler`(i) ) * (`tuning_word`(i) / 2*2^20+1^) + +The maximum NCO frequency f~NCOmax~ is configured when using the minimal clock prescaler and a maximum all-one +tuning word (= 2^20^-1): + +_**f~NCOmax~**_ = ( _f~main~[Hz]_ / 2 ) * ((2^20^-1) / 2*2^20+1^) + +The minimum "frequency" is always 0 Hz when the tuning word is zero. The frequency resolution f~NCOres~ is +defined using the maximum clock prescaler and a minimal non-zero tuning word (= 1): + +_**f~NCOres~**_ = ( _f~main~[Hz]_ / 4096 ) * (1 / 2*2^20+1^) + +Assuming a processor frequency of f~main~ = 100 MHz the maximum NCO output frequency is f~NCOmax~ = 12.499 +MHz with an NCO frequency resolution of f~NCOres~ = 0.00582 Hz. 
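The frequency equation above can be checked numerically with a few lines of C. This is a minimal sketch, assuming the divisor term reads as 2 · 2^(20+1) = 2^22 (which reproduces the 12.499 MHz and 0.00582 Hz figures quoted for a 100 MHz clock) and assuming a 20-bit tuning word that matches the accumulator width. The function name `nco_freq_hz` and the macro `NCO_ACC_BITS` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define NCO_ACC_BITS 20u // accumulator width stated in the text

// Output frequency (Hz) of one channel:
// (f_main / clock_prescaler) * tuning_word / (2 * 2^(20+1))
static double nco_freq_hz(double f_main_hz, uint32_t clock_prescaler, uint32_t tuning_word) {
    assert(clock_prescaler != 0u);
    assert(tuning_word < (1u << NCO_ACC_BITS)); // assumed 20-bit tuning word
    const double denom = 2.0 * (double)(1u << (NCO_ACC_BITS + 1u)); // 2^22
    return (f_main_hz / (double)clock_prescaler) * ((double)tuning_word / denom);
}
```

Calling `nco_freq_hz(100e6, 2, (1u << 20) - 1u)` gives the maximum output frequency and `nco_freq_hz(100e6, 4096, 1)` the resolution for the 100 MHz example.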
+ +**Advanced Configuration** + +The idle polarity of each channel is configured via the _NCO_CT_CHi_IDLE_POL_ flag and can be either `0` +(idle low) or `1` (idle high), which effectively allows inverting the NCO output. If the NCO is globally disabled +by clearing the _NCO_CT_EN_ flag, `nco_o(i)` output bit i is set to the according _NCO_CT_CHi_IDLE_POL_. + +The current state of each NCO channel output can be read by software via the _NCO_CT_CHi_OUTPUT_ bit. +The NCO frequency output is normally available via the top `nco_o` output signal. The according channel +output can be permanently set to zero by clearing the according _NCO_CT_CHi_OE_ bit. + +Each NCO channel can operate either in standard mode or in pulse mode. The mode is configured via the +according channel's _NCO_CT_CHi_MODE_ control register bit. + +**_Standard_ Operation Mode** + +If the _NCO_CT_CHi_MODE_ bit of channel i is cleared, the channel operates in standard mode providing a +frequency with **exactly 50% duty cycle** (T~high~ = T~low~). + +**_Pulse_ Operation Mode** + +If the _NCO_CT_CHi_MODE_ bit of channel i is set, the channel operates in pulse mode. In this mode, the duty +cycle can be modified to generate active pulses with variable length. Note that the "active" pulse polarity is defined +by the inverted _NCO_CT_CHi_IDLE_POL_ bit. + +Eight different pulse lengths are available. The active pulse length is defined as number of NCO clock +cycles, where the NCO clock is defined via the clock prescaler bits _NCO_CT_CHi_PRSCx_. 
The pulse length +of channel i is programmed by the 3-bit _NCO_CT_CHi_PULSEx_ configuration: + +.NCO pulse length configuration +[cols="<4,^1,^1,^1,^1,^1,^1,^1,^1"] +[options="header",grid="rows"] +|======================= +| **`NCO_CT_CHi_PULSEx`** | `0b000` | `0b001` | `0b010` | `0b011` | `0b100` | `0b101` | `0b110` | `0b111` +| Pulse length (in NCO clock cycles) | 2 | 4 | 8 | 16 | 32 | 64 | 128 | 256 +|======================= + +If _NCO_CT_CHi_IDLE_POL_ is cleared, T~high~ is defined by the _NCO_CT_CHi_PULSEx_ configuration and T~low~ = +T – T~high~. If _NCO_CT_CHi_IDLE_POL_ is set, T~low~ is defined by the _NCO_CT_CHi_PULSEx_ configuration and +T~high~ = T – T~low~. + +The actual output frequency of the channel (defined via the clock prescaler and the tuning word) is not +affected by the pulse configuration. + +For simple PWM applications that do not require a precise frequency but a more flexible duty cycle +configuration, see section <<_pulse_width_modulation_controller_pwm>>. + +<<< +.NCO register map +[cols="<4,<3,<9,^2,<11"] +[options="header",grid="all"] +|======================= +| Address | Name [C] | Bit(s), Name [C] | R/W | Function +.22+<| `0xffffffc0` .22+<| _NCO_CT_ ^|`0` _NCO_CT_EN_ ^| r/w <| NCO enable + 3+^| Channel 0 `nco_o(0)` + ^|`1` _NCO_CT_CH0_MODE_ ^| r/w <| output mode (`0`=fixed 50% duty cycle; `1`=pulse mode) + ^|`2` _NCO_CT_CH0_IDLE_POL_ ^| r/w <| output idle polarity + ^|`3` _NCO_CT_CH0_OE_ ^| r/w <| enable output to `nco_o(0)` + ^|`4` _NCO_CT_CH0_OUTPUT_ ^| r/- <| current state of `nco_o(0)` + ^|`7:5` _NCO_CT_CH0_PRSC2_ : _NCO_CT_CH0_PRSC0_ ^| r/w <| 3-bit clock prescaler select + ^|`10:8` _NCO_CT_CH0_PULSE2_ : _NCO_CT_CH0_PULSE0_ ^| r/w <| 3-bit pulse length select + 3+^| Channel 1 `nco_o(1)` + ^|`11` _NCO_CT_CH1_MODE_ ^| r/w <| output mode (`0`=fixed 50% duty cycle; `1`=pulse mode) + ^|`12` _NCO_CT_CH1_IDLE_POL_ ^| r/w <| output idle polarity + ^|`13` _NCO_CT_CH1_OE_ ^| r/w <| enable output to `nco_o(1)` + ^|`14` 
_NCO_CT_CH1_OUTPUT_ ^| r/- <| current state of `nco_o(1)` + ^|`17:15` _NCO_CT_CH1_PRSC2_ : _NCO_CT_CH1_PRSC0_ ^| r/w <| 3-bit clock prescaler select + ^|`20:18` _NCO_CT_CH1_PULSE2_ : _NCO_CT_CH1_PULSE0_ ^| r/w <| 3-bit pulse length select + 3+^| Channel 2 `nco_o(2)` + ^|`21` _NCO_CT_CH2_MODE_ ^| r/w <| output mode (`0`=fixed 50% duty cycle; `1`=pulse mode) + ^|`22` _NCO_CT_CH2_IDLE_POL_ ^| r/w <| output idle polarity + ^|`23` _NCO_CT_CH2_OE_ ^| r/w <| enable output to `nco_o(2)` + ^|`24` _NCO_CT_CH2_OUTPUT_ ^| r/- <| current state of `nco_o(2)` + ^|`27:25` _NCO_CT_CH2_PRSC2_ : _NCO_CT_CH2_PRSC0_ ^| r/w <| 3-bit clock prescaler select + ^|`30:28` _NCO_CT_CH2_PULSE2_ : _NCO_CT_CH2_PULSE0_ ^| r/w <| 3-bit pulse length select +|======================= Index: datasheet/soc_neoled.adoc =================================================================== --- datasheet/soc_neoled.adoc (nonexistent) +++ datasheet/soc_neoled.adoc (revision 60) @@ -0,0 +1,193 @@ +<<< +:sectnums: +==== Smart LED Interface (NEOLED) + +[cols="<3,<3,<4"] +[frame="topbot",grid="none"] +|======================= +| Hardware source file(s): | neorv32_neoled.vhd | +| Software driver file(s): | neorv32_neoled.c | +| | neorv32_neoled.h | +| Top entity port: | `neoled_o` | 1-bit serial data +| Configuration generics: | _IO_NEOLED_EN_ | implement NEOLED when _true_ +| CPU interrupts: | fast IRQ channel 9 | NEOLED interrupt (see <<_processor_interrupts>>) +|======================= + +**Theory of Operation** + +The NEOLED module provides a dedicated interface for "smart RGB LEDs" like the WS2812 or WS2811. +These LEDs provide a single interface wire that uses an asynchronous serial protocol for transmitting color +data. Basically, data is transferred via LED-internal shift registers, which allows cascading an unlimited +number of smart LEDs. The protocol provides a RESET command to strobe the transmitted data into the +LED PWM driver registers after data has shifted throughout all LEDs in a chain. 
+ +[NOTE] +The NEOLED interface is compatible with the "Adafruit Industries NeoPixel" products, which feature +WS2812 (or older WS2811) smart LEDs (see link:https://learn.adafruit.com/adafruit-neopixel-uberguide). + +The interface provides a single 1-bit output `neoled_o` to drive an arbitrary number of LEDs. Since the +NEOLED module provides 24-bit and 32-bit operating modes, a mixed setup with RGB LEDs (24-bit color) +and RGBW LEDs (32-bit color including a dedicated white LED chip) is also possible. + +**Theory of Operation – Protocol** + +The interface of the WS2812 LEDs uses an 800kHz carrier signal. Data is transmitted in a serial manner +starting with LSB-first. The intensity for each R, G & B LED chip (= color code) is defined via an 8-bit +value. The actual data bits are transferred by modifying the duty cycle of the signal (the timings for the +WS2812 are shown below). A RESET command is "sent" by pulling the data line LOW for at least 50μs. + +.WS2812 bit-level protocol - taken from the "Adafruit NeoPixel Überguide" +image::neopixel.png[align=center] + +.WS2812 interface timing +[cols="<2,<2,<6"] +[grid="all"] +|======================= +| T~total~ (T~carrier~) | 1.25μs +/- 300ns | period for a single bit +| T~0H~ | 0.4μs +/- 150ns | high-time for sending a `0` +| T~0L~ | 0.8μs +/- 150ns | low-time for sending a `0` +| T~1H~ | 0.85μs +/- 150ns | high-time for sending a `1` +| T~1L~ | 0.45μs +/- 150ns | low-time for sending a `1` +| RESET | Above 50μs | low-time for sending a RESET command +|======================= + +**Theory of Operation – NEOLED Module** + +The NEOLED module provides two accessible interface registers: the control register _NEOLED_CT_ and the +TX data register _NEOLED_DATA_. The NEOLED module is globally enabled via the control register's +_NEOLED_CT_EN_ bit. Clearing this bit will terminate any current operation, reset the module and +set the `neoled_o` output to zero. 
The precise timing (implementing the **WS2812** protocol) and transmission +mode are fully programmable via the _NEOLED_CT_ register to provide maximum flexibility. + +**Timing Configuration** + +The basic carrier frequency (800kHz for the WS2812 LEDs) is configured via a 3-bit main clock prescaler (_NEOLED_CT_PRSCx_, see table below) +that scales the main processor clock f~main~ and a 5-bit cycle multiplier _NEOLED_CT_T_TOT_x_. + +.NEOLED prescaler configuration +[cols="<4,^1,^1,^1,^1,^1,^1,^1,^1"] +[options="header",grid="rows"] +|======================= +| **`NEOLED_CT_PRSCx`** | `0b000` | `0b001` | `0b010` | `0b011` | `0b100` | `0b101` | `0b110` | `0b111` +| Resulting `clock_prescaler` | 2 | 4 | 8 | 64 | 128 | 1024 | 2048 | 4096 +|======================= + +The duty-cycles (or more precisely: the high- and low-times for sending either a '1' bit or a '0' bit) are +defined via the 5-bit _NEOLED_CT_T_ONE_H_x_ and _NEOLED_CT_T_ZERO_H_x_ values, respectively. These programmable +timing constants allow adapting the interface to a wide variety of smart LED protocols (for example WS2812 vs. +WS2811). 
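The relation between the main clock, the prescaler and the 5-bit tick counts can be expressed as a single multiplication. The following is a minimal sketch of that arithmetic only; it does not touch any hardware register, and the function name `neoled_time_ns` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

// Duration in nanoseconds of `ticks` TX-engine clock cycles, where one cycle
// lasts clock_prescaler / f_main seconds. `ticks` models a 5-bit field such as
// the total-period or high-time values.
static double neoled_time_ns(double f_main_hz, uint32_t clock_prescaler, uint32_t ticks) {
    assert(f_main_hz > 0.0);
    assert(ticks < 32u); // 5-bit timing fields
    return (double)ticks * (double)clock_prescaler * 1e9 / f_main_hz;
}
```

For a 100 MHz main clock and a prescaler of 4, one tick lasts 40 ns, so 30 ticks give a 1200 ns bit period, which lies within the 1.25μs +/- 300ns window quoted for the WS2812.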
+ +**Timing Configuration – Example (WS2812)** + +Generate the base clock f~TX~ for the NEOLED TX engine: + +* processor clock f~main~ = 100 MHz +* _NEOLED_CT_PRSCx_ = `0b001` = f~main~ / 4 + +_**f~TX~**_ = _f~main~[Hz]_ / `clock_prescaler` = 100MHz / 4 = 25MHz + +_**T~TX~**_ = 1 / _**f~TX~**_ = 40ns + +Generate carrier period (T~carrier~) and *high-times* (duty cycle) for sending `0` (T~0H~) and `1` (T~1H~) bits: + +* _NEOLED_CT_T_TOT_ = `0b11110` (= decimal 30) +* _NEOLED_CT_T_ZERO_H_ = `0b01010` (= decimal 10) +* _NEOLED_CT_T_ONE_H_ = `0b10100` (= decimal 20) + +_**T~carrier~**_ = _**T~TX~**_ * _NEOLED_CT_T_TOT_ = 40ns * 30 = 1.2µs + +_**T~0H~**_ = _**T~TX~**_ * _NEOLED_CT_T_ZERO_H_ = 40ns * 10 = 0.4µs + +_**T~1H~**_ = _**T~TX~**_ * _NEOLED_CT_T_ONE_H_ = 40ns * 20 = 0.8µs + +[TIP] +The NEOLED SW driver library (`neorv32_neoled.h`) provides a simplified configuration +function that configures all timing parameters for driving WS2812 LEDs based on the processor +clock configuration. + +**RGB / RGBW Configuration** + +NeoPixels are available in two "color" versions: LEDs with three chips providing RGB color and LEDs with +four chips providing RGB color plus a dedicated white LED chip (= RGBW). Since the intensity of every +LED chip is defined via an 8-bit value the RGB LEDs require a frame of 24-bit per module and the RGBW +LEDs require a frame of 32-bit per module. + +The data transfer quantity of the NEOLED module can be configured via the _NEOLED_CT_MODE_ control +register bit. If this bit is cleared, the NEOLED interface operates in 24-bit mode and will transmit bits `23:0` of +the data written to _NEOLED_DATA_. If _NEOLED_CT_MODE_ is set, the NEOLED interface operates in 32-bit +mode and will transmit bits `31:0` of the data written to _NEOLED_DATA_. + +**TX Data FIFO** + +The interface features a TX data buffer (a FIFO) to allow CPU-independent operation. 
The buffer depth +is configured via the `tx_buffer_entries_c` constant (default = 4 entries) in the module's VHDL source +file `rtl/core/neorv32_neoled.vhd`. The current configuration can be read via the _NEOLED_CT_BUFS_x_ +control register bits, which return log2(`tx_buffer_entries_c`). + +When writing data to the _NEOLED_DATA_ register the data is automatically written to the TX buffer. Whenever +data is available in the buffer the serial transmission engine will take it and transmit it to the LEDs. + +The data transfer size (_NEOLED_CT_MODE_) can be modified at any time since this control register bit is also buffered +in the FIFO. This allows arbitrarily mixing RGB and RGBW LEDs in the chain. + +[WARNING] +Please note that the timing configurations (_NEOLED_CT_PRSCx_, _NEOLED_CT_T_TOT_x_, +_NEOLED_CT_T_ONE_H_x_ and _NEOLED_CT_T_ZERO_H_x_) are NOT stored to the buffer. Changing +these values while the buffer is not empty or the TX engine is still sending will cause data corruption. + +**Status Configuration** + +The NEOLED module features two read-only status bits in the control register: _NEOLED_CT_BUSY_ and +_NEOLED_CT_TX_STATUS_. + +If the _NEOLED_CT_TX_STATUS_ flag is set, the serial TX engine is still busy sending serial data to the LED strips. +If the flag is cleared, the TX engine is idle and the serial data output `neoled_o` is set LOW. + +The _NEOLED_CT_BUSY_ flag provides a programmable option to check for the TX buffer state. The control +register's _NEOLED_CT_BSCON_ bit is used to configure the "meaning" of the _NEOLED_CT_BUSY_ flag. The +condition for sending an interrupt request (IRQ) to the CPU is also configured via the _NEOLED_CT_BSCON_ +bit. + +[cols="^5,^8,^8"] +[options="header",grid="all"] +|======================= +| _NEOLED_CT_BSCON_ | _NEOLED_CT_BUSY_ | Sending an IRQ when ... 
+| 0 | the busy flag will clear if there **IS at least one free entry** in the TX buffer | the IRQ will fire if **at least one entry GETS free** in the TX buffer +| 1 | the busy flag will clear if the **whole TX buffer IS empty** | the IRQ will fire if the **whole TX buffer GETS empty** +|======================= + +When _NEOLED_CT_BSCON_ is set, the CPU can write up to `tx_buffer_entries_c` new data words to +_NEOLED_DATA_ without checking the busy flag _NEOLED_CT_BUSY_. This greatly relaxes timing constraints for +sending a continuous data stream to the LEDs (as an idle time beyond 50μs will trigger the LEDs' RESET +command). + +<<< +.NEOLED register map +[cols="<4,<5,<9,^2,<9"] +[options="header",grid="all"] +|======================= +| Address | Name [C] | Bit(s), Name [C] | R/W | Function +.22+<| `0xffffffd8` .22+<| _NEOLED_CT_ <|`0` _NEOLED_CT_EN_ ^| r/w <| NEOLED enable + <|`1` _NEOLED_CT_MODE_ ^| r/w <| data transfer size; `0`=24-bit; `1`=32-bit + <|`2` _NEOLED_CT_BSCON_ ^| r/w <| busy flag / IRQ trigger configuration (see table above) + <|`3` _NEOLED_CT_PRSC0_ ^| r/w <| 3-bit clock prescaler, bit 0 + <|`4` _NEOLED_CT_PRSC1_ ^| r/w <| 3-bit clock prescaler, bit 1 + <|`5` _NEOLED_CT_PRSC2_ ^| r/w <| 3-bit clock prescaler, bit 2 + <|`6` _NEOLED_CT_BUFS0_ ^| r/- .4+<| 4-bit log2(`tx_buffer_entries_c`) + <|`7` _NEOLED_CT_BUFS1_ ^| r/- + <|`8` _NEOLED_CT_BUFS2_ ^| r/- + <|`9` _NEOLED_CT_BUFS3_ ^| r/- + <|`10` _NEOLED_CT_T_TOT_0_ ^| r/w .5+| 5-bit pulse clock ticks per total single-bit period (T~total~) + <|`11` _NEOLED_CT_T_TOT_1_ ^| r/w + <|`12` _NEOLED_CT_T_TOT_2_ ^| r/w + <|`13` _NEOLED_CT_T_TOT_3_ ^| r/w + <|`14` _NEOLED_CT_T_TOT_4_ ^| r/w + <|`20` _NEOLED_CT_ONE_H_0_ ^| r/w .5+<| 5-bit pulse clock ticks per high-time for sending a one-bit (T~H1~) + <|`21` _NEOLED_CT_ONE_H_1_ ^| r/w + <|`22` _NEOLED_CT_ONE_H_2_ ^| r/w + <|`23` _NEOLED_CT_ONE_H_3_ ^| r/w + <|`24` _NEOLED_CT_ONE_H_4_ ^| r/w + <|`30` _NEOLED_CT_TX_STATUS_ ^| r/- <| transmit engine busy when `1` + 
<|`31` _NEOLED_CT_BUSY_ ^| r/- <| busy / buffer status flag; configured via _NEOLED_CT_BSCON_ (see table above) +| `0xffffffdc` | _NEOLED_DATA_ <|`31:0` / `23:0` ^| -/w <| TX data (32-/24-bit) +|======================= Index: datasheet/soc_pwm.adoc =================================================================== --- datasheet/soc_pwm.adoc (nonexistent) +++ datasheet/soc_pwm.adoc (revision 60) @@ -0,0 +1,79 @@ +<<< +:sectnums: +==== Pulse-Width Modulation Controller (PWM) + +[cols="<3,<3,<4"] +[frame="topbot",grid="none"] +|======================= +| Hardware source file(s): | neorv32_pwm.vhd | +| Software driver file(s): | neorv32_pwm.c | +| | neorv32_pwm.h | +| Top entity port: | `pwm_o` | up to 60 PWM output channels (1-bit per channel) +| Configuration generics: | _IO_PWM_NUM_CH_ | number of PWM channels to implement (0..60) +| CPU interrupts: | none | +|======================= + +The PWM controller implements a pulse-width modulation controller with up to 60 independent channels and 8- +bit resolution per channel. The actual number of implemented channels is defined by the _IO_PWM_NUM_CH_ generic. +Setting this generic to zero will completely remove the PWM controller from the design. + +The PWM controller is based on an 8-bit base counter with a programmable threshold comparators for each channel +that defines the actual duty cycle. The controller can be used to drive fancy RGB-LEDs with 24- +bit true color, to dim LCD back-lights or even for "analog" control. An external integrator (RC low-pass filter) +can be used to smooth the generated "analog" signals. + +**Theory of Operation** + +The PWM controller is activated by setting the _PWM_CT_EN_ bit in the module's control register _PWM_CT_. When this +bit is cleared, the unit is reset and all PWM output channels are set to zero. +The 8-bit duty cycle for each channel, which represents the channel's "intensity", is defined via an 8-bit value. 
The module +provides up to 15 duty cycle registers _PWM_DUTY0_ to _PWM_DUTY14_ (depending on the number of implemented channels). +Each register contains the duty cycle configuration for 4 consecutive channels. For example, the duty cycle of channel 0 +is defined via bits 7:0 in _PWM_DUTY0_. The duty cycle of channel 1 is defined via bits 15:8 in _PWM_DUTY0_. +Channel 4's duty cycle is defined via bits 7:0 in _PWM_DUTY1_ and so on. + +[NOTE] +Regardless of the configuration of _IO_PWM_NUM_CH_ all module registers can be accessed without raising an exception. +Software can discover the number of available channels by writing 0xff to all duty cycle configuration bytes and +reading those values back. The duty cycle of channels that were not implemented always reads as zero. + +Based on the configured duty cycle, the corresponding intensity of the channel can be computed by the following formula: + +_**Intensity~x~**_ = _PWM_DUTY_CHx_ / (2^8^) + +The base frequency of the generated PWM signals is defined by the PWM core clock. This clock is derived +from the main processor clock and divided by a prescaler selected via the 3-bit _PWM_CT_PRSCx_ field in the unit's control +register. 
The following prescalers are available: + +.PWM prescaler configuration +[cols="<4,^1,^1,^1,^1,^1,^1,^1,^1"] +[options="header",grid="rows"] +|======================= +| **`PWM_CT_PRSCx`** | `0b000` | `0b001` | `0b010` | `0b011` | `0b100` | `0b101` | `0b110` | `0b111` +| Resulting `clock_prescaler` | 2 | 4 | 8 | 64 | 128 | 1024 | 2048 | 4096 +|======================= + +The resulting PWM base frequency is defined by: + +_**f~PWM~**_ = _f~main~[Hz]_ / (2^8^ * `clock_prescaler`) + +<<< +.PWM register map +[cols="<4,<4,<6,^2,<8"] +[options="header",grid="all"] +|======================= +| Address | Name [C] | Bit(s), Name [C] | R/W | Function +.4+<| `0xfffffe80` .4+<| _PWM_CT_ <|`0` _PWM_CT_EN_ ^| r/w | PWM enable + <|`1` _PWM_CT_PRSC0_ ^| r/w .3+<| 3-bit clock prescaler select + <|`2` _PWM_CT_PRSC1_ ^| r/w + <|`3` _PWM_CT_PRSC2_ ^| r/w +.4+<| `0xfffffe84` .4+<| _PWM_DUTY0_ <|`7:0` ^| r/w <| 8-bit duty cycle for channel 0 + <|`15:8` ^| r/w <| 8-bit duty cycle for channel 1 + <|`23:16` ^| r/w <| 8-bit duty cycle for channel 2 + <|`31:24` ^| r/w <| 8-bit duty cycle for channel 3 +| ... | ... | ... | r/w | ... 
+.4+<| `0xfffffebc` .4+<| _PWM_DUTY14_ <|`7:0` ^| r/w <| 8-bit duty cycle for channel 56 + <|`15:8` ^| r/w <| 8-bit duty cycle for channel 57 + <|`23:16` ^| r/w <| 8-bit duty cycle for channel 58 + <|`31:24` ^| r/w <| 8-bit duty cycle for channel 59 +|======================= Index: datasheet/soc_spi.adoc =================================================================== --- datasheet/soc_spi.adoc (nonexistent) +++ datasheet/soc_spi.adoc (revision 60) @@ -0,0 +1,75 @@ +<<< +:sectnums: +==== Serial Peripheral Interface Controller (SPI) + +[cols="<3,<3,<4"] +[frame="topbot",grid="none"] +|======================= +| Hardware source file(s): | neorv32_spi.vhd | +| Software driver file(s): | neorv32_spi.c | +| | neorv32_spi.h | +| Top entity port: | `spi_sck_o` | 1-bit serial clock output +| | `spi_sdo_o` | 1-bit serial data output +| | `spi_sdi_i` | 1-bit serial data input +| | `spi_csn_o` | 8-bit dedicated chip select (low-active) +| Configuration generics: | _IO_SPI_EN_ | implement SPI controller when _true_ +| CPU interrupts: | fast IRQ channel 6 | transmission done interrupt (see <<_processor_interrupts>>) +|======================= + +**Theory of Operation** + +SPI is a synchronous serial transmission interface. The NEORV32 SPI transceiver allows 8-, 16-, 24- and 32- +bit long transmissions. The unit provides 8 dedicated chip select signals via the top entity's `spi_csn_o` +signal. + +The SPI unit is enabled via the _SPI_CT_EN_ bit in the _SPI_CT_ control register. The idle clock polarity is configured via the _SPI_CT_CPHA_ +bit and can be low (`0`) or high (`1`) during idle. The data quantity to be transferred within a +single transmission is defined via the _SPI_CT_SIZEx_ bits. The unit supports 8-bit (`00`), 16-bit (`01`), 24- +bit (`10`) and 32-bit (`11`) transfers. Whenever a transfer is completed, the "transmission done interrupt" is triggered. +A transmission is still in progress as long as the _SPI_CT_BUSY_ flag is set. 
+ +The SPI controller features 8 dedicated chip-select lines. These lines are controlled via the control register's _SPI_CT_CSx_ bits. When +a specific _SPI_CT_CSx_ bit is **set**, the corresponding chip select line `spi_csn_o(x)` goes **low** (low-active chip select lines). + +The SPI clock frequency is defined via the 3-bit _SPI_CT_PRSCx_ clock prescaler. The following prescalers +are available: + +.SPI prescaler configuration +[cols="<4,^1,^1,^1,^1,^1,^1,^1,^1"] +[options="header",grid="rows"] +|======================= +| **`SPI_CT_PRSCx`** | `0b000` | `0b001` | `0b010` | `0b011` | `0b100` | `0b101` | `0b110` | `0b111` +| Resulting `clock_prescaler` | 2 | 4 | 8 | 64 | 128 | 1024 | 2048 | 4096 +|======================= + +Based on the _SPI_CT_PRSCx_ configuration, the actual SPI clock frequency f~SPI~ is derived from the processor's main clock f~main~ and is determined by: + +_**f~SPI~**_ = _f~main~[Hz]_ / (2 * `clock_prescaler`) + +A transmission is started when writing data to the _SPI_DATA_ register. The data must be LSB-aligned. So if +the SPI transceiver is configured for transfers of less than 32 bits, the transmit data must be placed +into the lowest 8/16/24 bits of _SPI_DATA_. Likewise, the received data is always LSB-aligned. 
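The prescaler table, the f~SPI~ relation and the LSB-alignment rule above can be sketched in C. This is only an illustration under stated assumptions, not part of the NEORV32 driver library: the helper names `spi_clock_hz` and `spi_align_tx` and the lookup table `spi_prsc_lut` are hypothetical and were introduced for this example. The size codes follow the _SPI_CT_SIZEx_ encoding.

```c
#include <stdint.h>

/* Hypothetical lookup table: clock_prescaler for each 3-bit SPI_CT_PRSCx value. */
static const uint32_t spi_prsc_lut[8] = {2, 4, 8, 64, 128, 1024, 2048, 4096};

/* f_SPI = f_main / (2 * clock_prescaler); prsc_sel is the 3-bit SPI_CT_PRSCx value. */
uint32_t spi_clock_hz(uint32_t f_main_hz, uint32_t prsc_sel) {
  return f_main_hz / (2u * spi_prsc_lut[prsc_sel & 7u]);
}

/* Keep only the LSB-aligned part of the data word that the selected transfer
 * size uses (size_sel: 0 = 8-bit, 1 = 16-bit, 2 = 24-bit, 3 = 32-bit). */
uint32_t spi_align_tx(uint32_t data, uint32_t size_sel) {
  static const uint32_t mask[4] = {0x000000FFu, 0x0000FFFFu, 0x00FFFFFFu, 0xFFFFFFFFu};
  return data & mask[size_sel & 3u];
}
```

For a 100 MHz main clock and `SPI_CT_PRSCx` = `0b000`, `spi_clock_hz` evaluates 100 MHz / (2 * 2) = 25 MHz.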
+ +.SPI register map +[cols="<2,<2,<4,^1,<7"] +[options="header",grid="all"] +|======================= +| Address | Name [C] | Bit(s), Name [C] | R/W | Function +.16+<| `0xffffffa8` .16+<| _SPI_CT_ <|`0` _SPI_CT_CS0_ ^| r/w .8+<| Direct chip-select 0..7; setting `spi_csn_o(x)` low when set + <|`1` _SPI_CT_CS1_ ^| r/w + <|`2` _SPI_CT_CS2_ ^| r/w + <|`3` _SPI_CT_CS3_ ^| r/w + <|`4` _SPI_CT_CS4_ ^| r/w + <|`5` _SPI_CT_CS5_ ^| r/w + <|`6` _SPI_CT_CS6_ ^| r/w + <|`7` _SPI_CT_CS7_ ^| r/w + <|`8` _SPI_CT_EN_ ^| r/w <| SPI enable + <|`9` _SPI_CT_CPHA_ ^| r/w <| polarity of `spi_sck_o` when idle + <|`10` _SPI_CT_PRSC0_ ^| r/w .3+| 3-bit clock prescaler select + <|`11` _SPI_CT_PRSC1_ ^| r/w + <|`12` _SPI_CT_PRSC2_ ^| r/w + <|`14` _SPI_CT_SIZE0_ ^| r/w .2+<| transfer size (`00`=8-bit, `01`=16-bit, `10`=24-bit, `11`=32-bit) + <|`15` _SPI_CT_SIZE1_ ^| r/w + <|`31` _SPI_CT_BUSY_ ^| r/- <| transmission in progress when set +| `0xffffffac` | _SPI_DATA_ |`31:0` | r/w | receive/transmit data, LSB-aligned +|======================= Index: datasheet/soc_sysinfo.adoc =================================================================== --- datasheet/soc_sysinfo.adoc (nonexistent) +++ datasheet/soc_sysinfo.adoc (revision 60) @@ -0,0 +1,67 @@ +<<< +:sectnums: +==== System Configuration Information Memory (SYSINFO) + +[cols="<3,<3,<4"] +[frame="topbot",grid="none"] +|======================= +| Hardware source file(s): | neorv32_sysinfo.vhd | +| Software driver file(s): | (neorv32.h) | +| Top entity port: | none | +| Configuration generics: | * | most of the top's configuration generics +| CPU interrupts: | none | +|======================= + +**Theory of Operation** + +The SYSINFO allows the application software to determine the setting of most of the processor's top entity +generics that are related to processor/SoC configuration. All registers of this unit are read-only. + +This device is always implemented – regardless of the actual hardware configuration. 
The bootloader as well +as the NEORV32 software runtime environment require information from this device (like memory layout +and default clock speed) for correct operation. + +.SYSINFO register map +[cols="<2,<4,<7"] +[options="header",grid="all"] +|======================= +| Address | Name [C] | Function +| `0xffffffe0` | _SYSINFO_CLK_ | clock speed in Hz (via top's _CLOCK_FREQUENCY_ generic) +| `0xffffffe4` | _SYSINFO_USER_CODE_ | custom user code, assigned via top's _USER_CODE_ generic +| `0xffffffe8` | _SYSINFO_FEATURES_ | specific hardware configuration (see next table) +| `0xffffffec` | _SYSINFO_CACHE_ | cache configuration information (see next table) +| `0xfffffff0` | _SYSINFO_ISPACE_BASE_ | instruction address space base (defined via `ispace_base_c` constant in the `neorv32_package.vhd` file) +| `0xfffffff4` | _SYSINFO_IMEM_SIZE_ | internal IMEM size in bytes (defined via top's _MEM_INT_IMEM_SIZE_ generic) +| `0xfffffff8` | _SYSINFO_DSPACE_BASE_ | data address space base (defined via `dspace_base_c` constant in the `neorv32_package.vhd` file) +| `0xfffffffc` | _SYSINFO_DMEM_SIZE_ | internal DMEM size in bytes (defined via top's _MEM_INT_DMEM_SIZE_ generic) +|======================= + + +._SYSINFO_FEATURES_ bits +[cols="^1,<10,<11"] +[options="header",grid="all"] +|======================= +| Bit | Name [C] | Function +| `0` | _SYSINFO_FEATURES_BOOTLOADER_ | set if the processor-internal bootloader is implemented (via top's _BOOTLOADER_EN_ generic) +| `1` | _SYSINFO_FEATURES_MEM_EXT_ | set if the external Wishbone bus interface is implemented (via top's _MEM_EXT_EN_ generic) +| `2` | _SYSINFO_FEATURES_MEM_INT_IMEM_ | set if the processor-internal IMEM is implemented (via top's _MEM_INT_IMEM_EN_ generic) +| `3` | _SYSINFO_FEATURES_MEM_INT_IMEM_ROM_ | set if the processor-internal IMEM is read-only (via top's _MEM_INT_IMEM_ROM_ generic) +| `4` | _SYSINFO_FEATURES_MEM_INT_DMEM_ | set if the processor-internal DMEM is implemented (via top's _MEM_INT_DMEM_EN_ 
generic) +| `5` | _SYSINFO_FEATURES_MEM_EXT_ENDIAN_ | set if external bus interface uses BIG-endian byte-order (via package's `xbus_big_endian_c` constant) +| `6` | _SYSINFO_FEATURES_ICACHE_ | set if processor-internal instruction cache is implemented (via _ICACHE_EN_ generic) +| `14` | _SYSINFO_FEATURES_OCD_ | set if the on-chip debugger is implemented (via _ON_CHIP_DEBUGGER_EN_ generic) +| `15` | _SYSINFO_FEATURES_HW_RST_ | set if a dedicated hardware reset of all core registers is implemented (via package's _dedicated_reset_c_ constant) +| `16` | _SYSINFO_FEATURES_IO_GPIO_ | set if the GPIO is implemented (via top's _IO_GPIO_EN_ generic) +| `17` | _SYSINFO_FEATURES_IO_MTIME_ | set if the MTIME is implemented (via top's _IO_MTIME_EN_ generic) +| `18` | _SYSINFO_FEATURES_IO_UART0_ | set if the primary UART0 is implemented (via top's _IO_UART0_EN_ generic) +| `19` | _SYSINFO_FEATURES_IO_SPI_ | set if the SPI is implemented (via top's _IO_SPI_EN_ generic) +| `20` | _SYSINFO_FEATURES_IO_TWI_ | set if the TWI is implemented (via top's _IO_TWI_EN_ generic) +| `21` | _SYSINFO_FEATURES_IO_PWM_ | set if the PWM is implemented (via top's _IO_PWM_EN_ generic) +| `22` | _SYSINFO_FEATURES_IO_WDT_ | set if the WDT is implemented (via top's _IO_WDT_EN_ generic) +| `23` | _SYSINFO_FEATURES_IO_CFS_ | set if the custom functions subsystem is implemented (via top's _IO_CFS_EN_ generic) +| `24` | _SYSINFO_FEATURES_IO_TRNG_ | set if the TRNG is implemented (via top's _IO_TRNG_EN_ generic) +| `25` | _SYSINFO_FEATURES_IO_NCO_ | set if the NCO is implemented (via top's _IO_NCO_EN_ generic) +| `26` | _SYSINFO_FEATURES_IO_UART1_ | set if the secondary UART1 is implemented (via top's _IO_UART1_EN_ generic) +| `27` | _SYSINFO_FEATURES_IO_NEOLED_ | set if the NEOLED is implemented (via top's _IO_NEOLED_EN_ generic) +|======================= 
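Testing a single bit of the _SYSINFO_FEATURES_ word could look like the following C sketch. It is a minimal illustration only: the helper `sysinfo_has_feature` and the `FEAT_BIT_*` constants are hypothetical names chosen for this example (the bit positions are copied from the table above), and the features word is passed in as a plain value so that the snippet also runs on a host machine.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical constants; bit positions taken from the SYSINFO_FEATURES table. */
enum {
  FEAT_BIT_IO_SPI    = 19,
  FEAT_BIT_IO_TWI    = 20,
  FEAT_BIT_IO_NEOLED = 27
};

/* Return true if feature bit 'bit' (0..31) is set in a SYSINFO_FEATURES word. */
bool sysinfo_has_feature(uint32_t features, unsigned bit) {
  return ((features >> (bit & 31u)) & 1u) != 0u;
}
```

On the target, the `features` argument would presumably be obtained by reading the 32-bit word at address `0xffffffe8` listed in the SYSINFO register map.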
Index: datasheet/soc_trng.adoc =================================================================== --- datasheet/soc_trng.adoc (nonexistent) +++ datasheet/soc_trng.adoc (revision 60) @@ -0,0 +1,84 @@ +<<< +:sectnums: +==== True Random-Number Generator (TRNG) + +[cols="<3,<3,<4"] +[frame="topbot",grid="none"] +|======================= +| Hardware source file(s): | neorv32_trng.vhd | +| Software driver file(s): | neorv32_trng.c | +| | neorv32_trng.h | +| Top entity port: | none | +| Configuration generics: | _IO_TRNG_EN_ | implement TRNG when _true_ +| CPU interrupts: | none | +|======================= + +**Theory of Operation** + +The NEORV32 true random number generator provides _physical true random numbers_ for your application. +Instead of using a pseudo RNG like an LFSR, the TRNG of the processor uses a simple, straightforward ring +oscillator as physical entropy source. Hence, voltage and thermal fluctuations are used to provide true +physical random data. + +[NOTE] +The TRNG features a platform independent architecture without FPGA-specific primitives, macros or +attributes. + +**Architecture** + +The NEORV32 TRNG is based on simple ring oscillators, which are implemented as an inverter chain with +an odd number of inverters. A **latch** is used to decouple each individual inverter. Basically, this architecture +is some kind of asynchronous LFSR. + +The outputs of several ring oscillators are synchronized using two registers and are XORed together. The +resulting output is de-biased using a von-Neumann randomness extractor. This de-biased output is further +processed by a simple 8-bit Fibonacci LFSR to improve whitening. After at least 8 clock cycles the state of +the LFSR is sampled and provided as final data output. + +To prevent the synthesis tool from doing logic optimization and thus removing all but one inverter, the +TRNG uses simple latches to decouple an inverter and its actual output. 
The latches are reset when the +TRNG is disabled and are enabled one by one by a "real" shift register when the TRNG is activated. This +construct can be synthesized for any FPGA platform. Thus, the NEORV32 TRNG provides a platform +independent architecture. + +**TRNG Configuration** + +The TRNG uses several ring-oscillators, where the next oscillator provides a slightly longer chain (more +inverters) than the one before. This increment is constant for all implemented oscillators. This setup can be +customized by modifying the "Advanced Configuration" constants in the TRNG's VHDL file: + +* The `num_roscs_c` constant defines the total number of ring oscillators in the system. `num_inv_start_c` +defines the number of inverters used by the first ring oscillator (has to be an odd number). Each additional +ring oscillator provides `num_inv_inc_c` more inverters than the one before (has to be an even number). +* The LFSR-based post-processing can be deactivated using the `lfsr_en_c` constant. The polynomial tap +mask of the LFSR can be customized using `lfsr_taps_c`. + +**Using the TRNG** + +The TRNG features a single register for status and data access. When the _TRNG_CT_EN_ control register bit is +set, the TRNG is enabled and starts operation. As soon as the _TRNG_CT_VALID_ bit is set, the currently +sampled 8-bit random data byte can be obtained from the lowest 8 bits of the _TRNG_CT_ register +(_TRNG_CT_DATA_MSB_ : _TRNG_CT_DATA_LSB_). The _TRNG_CT_VALID_ bit is automatically cleared +when reading the control register. + +[IMPORTANT] +The TRNG needs at least 8 clock cycles to generate a new random byte. During this sampling time +the current output random data is kept stable in the output register until a valid sampling of the new byte has +completed. + +Randomness "Quality" +I have not verified the quality of the generated random numbers (for example using NIST test suites). 
The +quality is strongly affected by the actual configuration of the TRNG and the resulting FPGA mapping/routing. +However, generating larger histograms of the generated random numbers shows an equal distribution (binary +average of the random numbers = 127). A simple evaluation test/demo program can be found in +`sw/example/demo_trng`. + +.TRNG register map +[cols="<2,<2,<4,^1,<7"] +[options="header",grid="all"] +|======================= +| Address | Name [C] | Bit(s), Name [C] | R/W | Function +.3+<| `0xffffff88` .3+<| _TRNG_CT_ <|`7:0` _TRNG_CT_DATA_MSB_ : _TRNG_CT_DATA_LSB_ ^| r/- <| 8-bit random data output + <|`30` _TRNG_CT_EN_ ^| r/w <| TRNG enable + <|`31` _TRNG_CT_VALID_ ^| r/- <| random data output is valid when set +|======================= Index: datasheet/soc_twi.adoc =================================================================== --- datasheet/soc_twi.adoc (nonexistent) +++ datasheet/soc_twi.adoc (revision 60) @@ -0,0 +1,84 @@ +<<< +:sectnums: +==== Two-Wire Serial Interface Controller (TWI) + +[cols="<3,<3,<4"] +[frame="topbot",grid="none"] +|======================= +| Hardware source file(s): | neorv32_twi.vhd | +| Software driver file(s): | neorv32_twi.c | +| | neorv32_twi.h | +| Top entity port: | `twi_sda_io` | 1-bit bi-directional serial data +| | `twi_scl_io` | 1-bit bi-directional serial clock +| Configuration generics: | _IO_TWI_EN_ | implement TWI controller when _true_ +| CPU interrupts: | fast IRQ channel 7 | transmission done interrupt (see <<_processor_interrupts>>) +|======================= + +**Theory of Operation** + +The two wire interface – also called "I²C" – is a very common interface for connecting several on-board +components. Since this interface only needs two signals (the serial data line `twi_sda_io` and the serial +clock line `twi_scl_io`) – regardless of the number of connected devices – it allows easy interconnections of +several peripheral nodes. + +The NEORV32 TWI implements a **TWI controller**. 
It features "clock stretching" (if enabled via the control +register), so a slow peripheral can halt the transmission by pulling the SCL line low. Currently, **no multi-controller +support** is available. Also, the NEORV32 TWI unit cannot operate in peripheral mode. + +The TWI is enabled via the _TWI_CT_EN_ bit in the _TWI_CT_ control register. The user program can start / stop a +transmission by issuing a START or STOP condition. These conditions are generated by setting the +corresponding bits (_TWI_CT_START_ or _TWI_CT_STOP_) in the control register. + +Data is sent by writing a byte to the _TWI_DATA_ register. Received data can also be read from this +register. The TWI controller is busy (transmitting data or performing a START or STOP condition) as long as the +_TWI_CT_BUSY_ bit in the control register is set. + +An accessed peripheral has to acknowledge each transferred byte. When the _TWI_CT_ACK_ bit is set after a +completed transmission, the accessed peripheral has sent an acknowledge. If it is cleared after a +transmission, the peripheral has sent a not-acknowledge (NACK). The NEORV32 TWI controller can also +send an ACK by itself ("controller acknowledge _MACK_") after a transmission by pulling SDA low during the +ACK time slot. Set the _TWI_CT_MACK_ bit to activate this feature. If this bit is cleared, the ACK/NACK of the +peripheral is sampled in this time slot instead (normal mode). + +In summary, the following independent TWI operations can be triggered by the application program: + +* send START condition (also as REPEATED START condition) +* send STOP condition +* send (at least) one byte while also sampling one byte from the bus + +[IMPORTANT] +The serial clock (SCL) and the serial data (SDA) lines can only be actively driven low by the +controller. Hence, external pull-up resistors are required for these lines. + +The TWI clock frequency is defined via the 3-bit _TWI_CT_PRSCx_ clock prescaler. 
The following prescalers +are available: + +.TWI prescaler configuration +[cols="<4,^1,^1,^1,^1,^1,^1,^1,^1"] +[options="header",grid="rows"] +|======================= +| **`TWI_CT_PRSCx`** | `0b000` | `0b001` | `0b010` | `0b011` | `0b100` | `0b101` | `0b110` | `0b111` +| Resulting `clock_prescaler` | 2 | 4 | 8 | 64 | 128 | 1024 | 2048 | 4096 +|======================= + +Based on the _TWI_CT_PRSCx_ configuration, the actual TWI clock frequency f~SCL~ is derived from the processor main clock f~main~ and is determined by: + +_**f~SCL~**_ = _f~main~[Hz]_ / (4 * `clock_prescaler`) + +.TWI register map +[cols="<2,<2,<4,^1,<7"] +[options="header",grid="all"] +|======================= +| Address | Name [C] | Bit(s), Name [C] | R/W | Function +.10+<| `0xffffffb0` .10+<| _TWI_CT_ <|`0` _TWI_CT_EN_ ^| r/w <| TWI enable + <|`1` _TWI_CT_START_ ^| r/w <| generate START condition + <|`2` _TWI_CT_STOP_ ^| r/w <| generate STOP condition + <|`3` _TWI_CT_PRSC0_ ^| r/w .3+<| 3-bit clock prescaler select + <|`4` _TWI_CT_PRSC1_ ^| r/w + <|`5` _TWI_CT_PRSC2_ ^| r/w + <|`6` _TWI_CT_MACK_ ^| r/w <| generate controller ACK for each transmission ("MACK") + <|`7` _TWI_CT_CKSTEN_ ^| r/w <| allow clock-stretching by peripherals when set + <|`30` _TWI_CT_ACK_ ^| r/- <| ACK received when set + <|`31` _TWI_CT_BUSY_ ^| r/- <| transfer/START/STOP in progress when set +| `0xffffffb4` | _TWI_DATA_ |`7:0` _TWI_DATA_MSB_ : TWI_DATA_LSB_ | r/w | receive/transmit data +|======================= Index: datasheet/soc_uart.adoc =================================================================== --- datasheet/soc_uart.adoc (nonexistent) +++ datasheet/soc_uart.adoc (revision 60) @@ -0,0 +1,216 @@ +<<< +:sectnums: +==== Primary Universal Asynchronous Receiver and Transmitter (UART0) + +[cols="<3,<3,<4"] +[frame="topbot",grid="none"] +|======================= +| Hardware source file(s): | neorv32_uart.vhd | +| Software driver file(s): | neorv32_uart.c | +| | neorv32_uart.h | +| Top entity port: | `uart0_txd_o` | 
serial transmitter output UART0 +| | `uart0_rxd_i` | serial receiver input UART0 +| | `uart0_rts_o` | flow control: RX ready to receive +| | `uart0_cts_i` | flow control: TX allowed to send +| Configuration generics: | _IO_UART0_EN_ | implement UART0 when _true_ +| CPU interrupts: | fast IRQ channel 2 | RX done interrupt +| | fast IRQ channel 3 | TX done interrupt (see <<_processor_interrupts>>) +|======================= + +[IMPORTANT] +Please note that ALL default example programs and software libraries of the NEORV32 software +framework (including the bootloader and the runtime environment) use the primary UART +(_UART0_) as default user console interface. For compatibility, all C-language function calls to +`neorv32_uart_*` are mapped to the corresponding primary UART (_UART0_) `neorv32_uart0_*` +functions. + +**Theory of Operation** + +In most cases, the UART is a standard interface used to establish a communication channel between the +computer/user and an application running on the processor platform. The NEORV32 UARTs feature a +standard frame configuration: 8 data bits, an optional parity bit (even or odd) and 1 stop bit. +The parity and the actual Baudrate are configurable by software. + +The UART0 is enabled by setting the _UART_CT_EN_ bit in the UART control register _UART0_CT_. The actual +transmission Baudrate (like 19200) is configured via the 12-bit _UART_CT_BAUDxx_ baud prescaler (`baud_rate`) and the +3-bit _UART_CT_PRSCx_ clock prescaler. 
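One way to pick the two prescaler fields for a target Baudrate is sketched below in C, using the relation Baudrate = (f~main~ / `clock_prescaler`) / (`baud_rate` + 1) and the `clock_prescaler` values 2, 4, 8, 64, 128, 1024, 2048, 4096 given in this section. This is an illustrative search, not necessarily what the NEORV32 driver does; the function names `uart_pick_prsc` and `uart_baud_val` and the table `uart_prsc_lut` are hypothetical.

```c
#include <stdint.h>

/* Hypothetical lookup table: clock_prescaler for each 3-bit UART_CT_PRSCx value. */
static const uint32_t uart_prsc_lut[8] = {2, 4, 8, 64, 128, 1024, 2048, 4096};

/* Truncated baud_rate value for a given prescaler selection:
 * baud_rate = f_main / (clock_prescaler * baudrate) - 1 (0 if not reachable). */
uint32_t uart_baud_val(uint32_t f_main_hz, uint32_t baudrate, uint32_t prsc_sel) {
  if (baudrate == 0u) {
    return 0u;
  }
  uint64_t div = (uint64_t)f_main_hz / ((uint64_t)uart_prsc_lut[prsc_sel & 7u] * baudrate);
  return (div == 0u) ? 0u : (uint32_t)(div - 1u);
}

/* Smallest prescaler selection whose baud_rate value fits into 12 bits;
 * returns -1 if the requested Baudrate cannot be configured. */
int uart_pick_prsc(uint32_t f_main_hz, uint32_t baudrate) {
  if (baudrate == 0u) {
    return -1;
  }
  for (uint32_t sel = 0u; sel < 8u; sel++) {
    uint64_t div = (uint64_t)f_main_hz / ((uint64_t)uart_prsc_lut[sel] * baudrate);
    if (div >= 1u && (div - 1u) <= 0xFFFu) {
      return (int)sel;
    }
  }
  return -1;
}
```

For f~main~ = 100 MHz and a target of 19200 Baud, the search stops at `0b000` (`clock_prescaler` = 2) with `baud_rate` = 2603, which gives 50 MHz / 2604, roughly 19201 Baud.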
+ +.UART prescaler configuration +[cols="<4,^1,^1,^1,^1,^1,^1,^1,^1"] +[options="header",grid="rows"] +|======================= +| **`UART_CT_PRSCx`** | `0b000` | `0b001` | `0b010` | `0b011` | `0b100` | `0b101` | `0b110` | `0b111` +| Resulting `clock_prescaler` | 2 | 4 | 8 | 64 | 128 | 1024 | 2048 | 4096 +|======================= + +_**Baudrate**_ = (_f~main~[Hz]_ / `clock_prescaler`) / (`baud_rate` + 1) + +A new transmission is started by writing the data byte to be sent to the lowest byte of the _UART0_DATA_ register. The +transfer is completed when the _UART_CT_TX_BUSY_ control register flag returns to zero. A new received byte +is available when the _UART_DATA_AVAIL_ flag of the _UART0_DATA_ register is set. A "frame error" in a received byte +(broken stop bit) is indicated via the _UART_DATA_FERR_ flag in the _UART0_DATA_ register. + +**RX Double-Buffering** + +The UART receive engine provides a simple data buffer with two entries. These two entries are transparent +for the user. The transmitting device can send up to 2 chars to the UART without risking data loss. If another +char is sent before at least one char has been read from the buffer, data loss occurs. This situation can be +detected via the receiver overrun flag _UART_DATA_OVERR_ in the _UART0_DATA_ register. The flag is +automatically cleared after reading _UART0_DATA_. + +**Parity Modes** + +The parity bit is added if the _UART_CT_PMODE1_ flag is set. When _UART_CT_PMODE0_ is zero the UART +operates in "even parity" mode. If this flag is set, the UART operates in "odd parity" mode. Parity errors in +received data are indicated via the _UART_DATA_PERR_ flag in the _UART0_DATA_ register. This flag is updated with each new +received character. A frame error in the received data (i.e. stop bit is not set) is indicated via the +_UART_DATA_FERR_ flag in the _UART0_DATA_ register. 
This flag is also updated with each new received character. + +**Hardware Flow Control – RTS/CTS** + +The UART supports hardware flow control using the standard CTS (clear to send) and/or RTS (ready to send +/ ready to receive "RTR") signals. Both hardware flow control mechanisms can be individually enabled. + +If **RTS hardware flow control** is enabled by setting the _UART_CT_RTS_EN_ control register flag, the UART +will pull the `uart0_rts_o` signal low if the UART's receiver is idle and no received data is waiting to be read by +application software. As long as this signal is low the connected device can send new data. `uart0_rts_o` is always LOW if the UART is disabled. + +The RTS line is de-asserted (going high) as soon as the start bit of a new incoming char has been +detected. The transmitting device continues sending the current char and can also send another char +(due to the RX double-buffering), which is done by most terminal programs. Any additional data sent +while RTS is still de-asserted will overwrite the RX input buffer, causing data loss. This will set the _UART_DATA_OVERR_ flag in the +_UART0_DATA_ register. Any read access to this register clears the flag again. + +If **CTS hardware flow control** is enabled by setting the _UART_CT_CTS_EN_ control register flag, the UART's +transmitter will not start sending a new char until the `uart0_cts_i` signal goes low. If new data to be +sent is written to the UART data register while `uart0_cts_i` is not asserted (=high), the UART will wait for +`uart0_cts_i` to become asserted (=low) before sending starts. During this time, the UART busy flag +_UART_CT_TX_BUSY_ remains set. + +If `uart0_cts_i` is not asserted (=high), no new data transmission will be started by the UART. The state of the `uart0_cts_i` +signal has no effect on a transmission already in progress. + +Signal changes on `uart0_cts_i` during an active transmission are ignored. 
Application software can check +the current state of the `uart0_cts_i` input signal via the _UART_CT_CTS_ control register flag. + +[TIP] +Please note that – just like the RXD and TXD signals – the RTS and CTS signals have to be **cross**-coupled +between devices. + +**Interrupts** + +The UART features two interrupts: the "TX done interrupt" is triggered when a transmit operation (sending) has finished. The "RX +done interrupt" is triggered when a data byte has been received. If the UART0 is not implemented, the UART0 interrupts are permanently tied to zero. + +[NOTE] +The UART's RX interrupt is always triggered when a new data word has arrived – regardless of the +state of the RX double-buffer. + +**Simulation Mode** + +The default UART0 operation will transmit any data written to the _UART0_DATA_ register via the serial TX line at +the defined baud rate. Even though the default testbench provides a simulated UART0 receiver, which +outputs any received char to the simulator console, such a transmission takes a lot of time. To accelerate +UART0 output during simulation (and also to dump large amounts of data for further processing like +verification) the UART0 features a **simulation mode**. + +The simulation mode is enabled by setting the _UART_CT_SIM_MODE_ bit in the UART0's control register +_UART0_CT_. Any other UART0 configuration bits are irrelevant, but the UART0 has to be enabled via the +_UART_CT_EN_ bit. When the simulation mode is enabled, any char written to _UART0_DATA_ (bits 7:0) is +directly output as ASCII char to the simulator console. Additionally, all text is also stored to a text file +`neorv32.uart0.sim_mode.text.out` in the simulation home folder. Furthermore, the whole 32-bit word +written to _UART0_DATA_ is stored as plain 8-char hexadecimal value to a second text file +`neorv32.uart0.sim_mode.data.out` also located in the simulation home folder. 
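The control word needed for simulation mode can be sketched in C as follows. This is a minimal illustration under stated assumptions: the helper name `uart_sim_mode_ctrl` and the two `*_BIT` macros are hypothetical, and the bit positions (12 for _UART_CT_SIM_MODE_, 28 for _UART_CT_EN_) are taken from the UART0 register map of this section.

```c
#include <stdint.h>

/* Hypothetical macros; bit positions as listed in the UART0 register map. */
#define UART_CT_SIM_MODE_BIT 12u
#define UART_CT_EN_BIT       28u

/* Return 'ctrl' with both the UART enable bit and the simulation-mode bit set.
 * All other configuration bits are left unchanged, since they are
 * irrelevant while simulation mode is active. */
uint32_t uart_sim_mode_ctrl(uint32_t ctrl) {
  return ctrl | (1u << UART_CT_EN_BIT) | (1u << UART_CT_SIM_MODE_BIT);
}
```

In a simulation, the returned value would be written to _UART0_CT_; each word subsequently written to _UART0_DATA_ should then appear in the two output files described above.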
+ +If the UART is configured for simulation mode there will be **NO physical UART0 transmissions via +`uart0_txd_o`** at all. Furthermore, no interrupts (RX done or TX done) will be triggered in any situation. + +[TIP] +More information regarding the simulation-mode of the UART0 can be found in section <<_simulating_the_processor>>. + +.UART0 register map +[cols="<6,<7,<10,^2,<18"] +[options="header",grid="all"] +|======================= +| Address | Name [C] | Bit(s), Name [C] | R/W | Function +.12+<| `0xffffffa0` .12+<| _UART0_CT_ <|`11:0` _UART_CT_BAUDxx_ ^| r/w <| 12-bit BAUD configuration value + <|`12` _UART_CT_SIM_MODE_ ^| r/w <| enable **simulation mode** + <|`20` _UART_CT_RTS_EN_ ^| r/w <| enable RTS hardware flow control + <|`21` _UART_CT_CTS_EN_ ^| r/w <| enable CTS hardware flow control + <|`22` _UART_CT_PMODE0_ ^| r/w .2+<| parity bit enable and configuration (`00`/`01`= no parity; `10`=even parity; `11`=odd parity) + <|`23` _UART_CT_PMODE1_ ^| r/w + <|`24` _UART_CT_PRSC0_ ^| r/w .3+<| 3-bit baudrate clock prescaler select + <|`25` _UART_CT_PRSC1_ ^| r/w + <|`26` _UART_CT_PRSC2_ ^| r/w + <|`27` _UART_CT_CTS_ ^| r/- <| current state of UART's CTS input signal + <|`28` _UART_CT_EN_ ^| r/w <| UART enable + <|`31` _UART_CT_TX_BUSY_ ^| r/- <| transmitter busy flag +.6+<| `0xffffffa4` .6+<| _UART0_DATA_ <|`7:0` _UART_DATA_MSB_ : _UART_DATA_LSB_ ^| r/w <| receive/transmit data (8-bit) + <|`31:0` - ^| -/w <| **simulation data output** + <|`28` _UART_DATA_PERR_ ^| r/- <| RX parity error + <|`29` _UART_DATA_FERR_ ^| r/- <| RX data frame error (stop bit not set) + <|`30` _UART_DATA_OVERR_ ^| r/- <| RX data overrun + <|`31` _UART_DATA_AVAIL_ ^| r/- <| RX data available when set +|======================= + + + +<<< +// #################################################################################################################### +:sectnums: +==== Secondary Universal Asynchronous Receiver and Transmitter (UART1) + +[cols="<3,<3,<4"] 
+[frame="topbot",grid="none"] +|======================= +| Hardware source file(s): | neorv32_uart.vhd | +| Software driver file(s): | neorv32_uart.c | +| | neorv32_uart.h | +| Top entity port: | `uart1_txd_o` | serial transmitter output UART1 +| | `uart1_rxd_i` | serial receiver input UART1 +| | `uart1_rts_o` | flow control: RX ready to receive +| | `uart1_cts_i` | flow control: TX allowed to send +| Configuration generics: | _IO_UART1_EN_ | implement UART1 when _true_ +| CPU interrupts: | fast IRQ channel 4 | RX done interrupt +| | fast IRQ channel 5 | TX done interrupt (see <<_processor_interrupts>>) +|======================= + +**Theory of Operation** + +The secondary UART (UART1) is functionally identical to the primary UART (<<_primary_universal_asynchronous_receiver_and_transmitter_uart0>>). +UART1 uses different addresses for +the control register (_UART1_CT_) and the data register (_UART1_DATA_) – see the register map below. However, the +register bits/flags use the same bit positions and naming. Furthermore, the "RX done" and "TX done" interrupts are +mapped to different CPU fast interrupt channels. + +**Simulation Mode** + +The secondary UART (UART1) provides the same simulation options as the primary UART. However, +output data is written to UART1-specific files: `neorv32.uart1.sim_mode.text.out` is used to store +plain ASCII text and `neorv32.uart1.sim_mode.data.out` is used to store full 32-bit hexadecimal +encoded data words. 
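Because both UARTs share the same data register layout, one decoding routine can serve _UART0_DATA_ and _UART1_DATA_ alike. The sketch below is a host-side illustration only: `uart_rx_t` and `uart_decode_data` are hypothetical names, and the bit positions are the ones given in the register maps of this section.

```c
#include <stdbool.h>
#include <stdint.h>

/* Decoded view of one 32-bit read from a UART data register (hypothetical type). */
typedef struct {
    uint8_t data;       /* bits 7:0, received byte */
    bool    parity_err; /* bit 28, RX parity error */
    bool    frame_err;  /* bit 29, RX frame error (stop bit not set) */
    bool    overrun;    /* bit 30, RX data overrun */
    bool    avail;      /* bit 31, RX data available */
} uart_rx_t;

/* Split a raw data register word into its documented fields. */
static uart_rx_t uart_decode_data(uint32_t raw)
{
    uart_rx_t rx;
    rx.data       = (uint8_t)(raw & 0xFFu);
    rx.parity_err = ((raw >> 28) & 1u) != 0u;
    rx.frame_err  = ((raw >> 29) & 1u) != 0u;
    rx.overrun    = ((raw >> 30) & 1u) != 0u;
    rx.avail      = ((raw >> 31) & 1u) != 0u;
    return rx;
}
```

Decoding from a single copy of the register value is a deliberate choice here: the UART0 description notes that a read access clears the overrun flag, so reading the register a second time just to inspect the flags could lose that information.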
+ +.UART1 register map +[cols="<6,<7,<10,^2,<18"] +[options="header",grid="all"] +|======================= +| Address | Name [C] | Bit(s), Name [C] | R/W | Function +.12+<| `0xffffffd0` .12+<| _UART1_CT_ <|`11:0` _UART_CT_BAUDxx_ ^| r/w <| 12-bit BAUD configuration value + <|`12` _UART_CT_SIM_MODE_ ^| r/w <| enable **simulation mode** + <|`20` _UART_CT_RTS_EN_ ^| r/w <| enable RTS hardware flow control + <|`21` _UART_CT_CTS_EN_ ^| r/w <| enable CTS hardware flow control + <|`22` _UART_CT_PMODE0_ ^| r/w .2+<| parity bit enable and configuration (`00`/`01`= no parity; `10`=even parity; `11`=odd parity) + <|`23` _UART_CT_PMODE1_ ^| r/w + <|`24` _UART_CT_PRSC0_ ^| r/w .3+<| 3-bit baudrate clock prescaler select + <|`25` _UART_CT_PRSC1_ ^| r/w + <|`26` _UART_CT_PRSC2_ ^| r/w + <|`27` _UART_CT_CTS_ ^| r/- <| current state of UART's CTS input signal + <|`28` _UART_CT_EN_ ^| r/w <| UART enable + <|`31` _UART_CT_TX_BUSY_ ^| r/- <| transmitter busy flag +.6+<| `0xffffffd4` .6+<| _UART1_DATA_ <|`7:0` _UART_DATA_MSB_ : _UART_DATA_LSB_ ^| r/w <| receive/transmit data (8-bit) + <|`31:0` - ^| -/w <| **simulation data output** + <|`28` _UART_DATA_PERR_ ^| r/- <| RX parity error + <|`29` _UART_DATA_FERR_ ^| r/- <| RX data frame error (stop bit not set) + <|`30` _UART_DATA_OVERR_ ^| r/- <| RX data overrun + <|`31` _UART_DATA_AVAIL_ ^| r/- <| RX data available when set +|======================= Index: datasheet/soc_wdt.adoc =================================================================== --- datasheet/soc_wdt.adoc (nonexistent) +++ datasheet/soc_wdt.adoc (revision 60) @@ -0,0 +1,69 @@ +<<< +:sectnums: +==== Watchdog Timer (WDT) + +[cols="<3,<3,<4"] +[frame="topbot",grid="none"] +|======================= +| Hardware source file(s): | neorv32_wdt.vhd | +| Software driver file(s): | neorv32_wdt.c | +| | neorv32_wdt.h | +| Top entity port: | none | +| Configuration generics: | _IO_WDT_EN_ | implement watchdog timer when _true_ +| CPU interrupts: | FIRQ channel 0 | watchdog timer 
overflow (see <<_processor_interrupts>>) +|======================= + +**Theory of Operation** + +The watchdog (WDT) provides a last resort for safety-critical applications. The WDT has an internal 20-bit +wide counter that needs to be reset every now and then by the user program. If the counter overflows, either +a system reset or an interrupt is generated (depending on the configured operation mode). + +Configuration of the watchdog is done by a single control register _WDT_CT_. The watchdog is enabled by +setting the _WDT_CT_EN_ bit. The clock used to increment the internal counter is selected via the 3-bit +_WDT_CT_CLK_SELx_ prescaler: + +[cols="^3,^3,>4"] +[options="header",grid="rows"] +|======================= +| **`WDT_CT_CLK_SELx`** | Main clock prescaler | Timeout period in clock cycles +| `0b000` | 2 | 2 097 152 +| `0b001` | 4 | 4 194 304 +| `0b010` | 8 | 8 388 608 +| `0b011` | 64 | 67 108 864 +| `0b100` | 128 | 134 217 728 +| `0b101` | 1024 | 1 073 741 824 +| `0b110` | 2048 | 2 147 483 648 +| `0b111` | 4096 | 4 294 967 296 +|======================= + +Whenever the internal timer overflows the watchdog executes one of two possible actions: Either a hard +processor reset is triggered or an interrupt is requested at CPU's fast interrupt channel #0. The +WDT_CT_MODE bit defines the action to be taken on an overflow: When cleared, the Watchdog will trigger an +IRQ, when set the WDT will cause a system reset. The configured actions can also be triggered manually at +any time by setting the _WDT_CT_FORCE_ bit. The watchdog is reset by setting the _WDT_CT_RESET_ bit. + +The cause of the last action of the watchdog can be determined via the _WDT_CT_RCAUSE_ flag. If this flag is +zero, the processor has been reset via the external reset signal. If this flag is set the last system reset was +initiated by the watchdog. + +The Watchdog control register can be locked in order to protect the current configuration. The lock is +activated by setting bit _WDT_CT_LOCK_. 
In the locked state any write access to the configuration flags is +ignored (see table below, "writable if locked"). Read accesses to the control register are not affected. The +lock can only be removed by a system reset (via external reset signal or via a watchdog reset action). + +.WDT register map +[cols="<2,<2,<4,^1,^2,<4"] +[options="header",grid="all"] +|======================= +| Address | Name [C] | Bit(s), Name [C] | R/W | Writable if locked | Function +.9+<| `0xffffff8c` .9+<| _WDT_CT_ <|`0` _WDT_CT_EN_ ^| r/w ^| no <| watchdog enable + <|`1` _WDT_CT_CLK_SEL0_ ^| r/w ^| no .3+<| 3-bit clock prescaler select + <|`2` _WDT_CT_CLK_SEL1_ ^| r/w ^| no + <|`3` _WDT_CT_CLK_SEL2_ ^| r/w ^| no + <|`4` _WDT_CT_MODE_ ^| r/w ^| no <| overflow action: `1`=reset, `0`=IRQ + <|`5` _WDT_CT_RCAUSE_ ^| r/- ^| - <| cause of last system reset: `0`=caused by external reset signal, `1`=caused by watchdog + <|`6` _WDT_CT_RESET_ ^| -/w ^| yes <| watchdog reset when set, auto-clears + <|`7` _WDT_CT_FORCE_ ^| -/w ^| yes <| force configured watchdog action when set, auto-clears + <|`8` _WDT_CT_LOCK_ ^| r/w ^| no <| lock access to configuration when set, clears only on system reset (via external reset signal OR watchdog reset action = reset) +|======================= Index: datasheet/soc_wishbone.adoc =================================================================== --- datasheet/soc_wishbone.adoc (nonexistent) +++ datasheet/soc_wishbone.adoc (revision 60) @@ -0,0 +1,158 @@ +<<< +:sectnums: +==== Processor-External Memory Interface (WISHBONE) (AXI4-Lite) + +[cols="<3,<3,<4"] +[frame="topbot",grid="none"] +|======================= +| Hardware source file(s): | neorv32_wishbone.vhd | +| Software driver file(s): | none | _implicitly used_ +| Top entity port: | `wb_tag_o` | request tag output (3-bit) +| | `wb_adr_o` | address output (32-bit) +| | `wb_dat_i` | data input (32-bit) +| | `wb_dat_o` | data output (32-bit) +| | `wb_we_o` | write enable (1-bit) +| | `wb_sel_o` | byte enable 
(4-bit) +| | `wb_stb_o` | strobe (1-bit) +| | `wb_cyc_o` | valid cycle (1-bit) +| | `wb_lock_o` | exclusive access request (1-bit) +| | `wb_ack_i` | acknowledge (1-bit) +| | `wb_err_i` | bus error (1-bit) +| | `fence_o` | an executed `fence` instruction +| | `fencei_o` | an executed `fence.i` instruction +| Configuration generics: | _MEM_EXT_EN_ | enable external memory interface when _true_ +| | _MEM_EXT_TIMEOUT_ | number of clock cycles after which an unacknowledged external bus access will auto-terminate (0 = disabled) +| Configuration constants in VHDL package file `neorv32_package.vhd`: | `wb_pipe_mode_c` | when _false_ (default): classic/standard Wishbone protocol; when _true_: pipelined Wishbone protocol +| | `xbus_big_endian_c` | byte-order (Endianness) of external memory interface; true=BIG, false=little (default) +| CPU interrupts: | none | +|======================= + +The external memory interface uses the Wishbone interface protocol. The external interface port is available +when the _MEM_EXT_EN_ generic is _true_. This interface can be used to attach external memories, custom +hardware accelerators, additional IO devices or all other kinds of IP blocks. All memory accesses from the +CPU that do not target the internal bootloader ROM, the internal IO region or the internal data/instruction +memories (if implemented at all) are forwarded to the Wishbone gateway and thus to the external memory +interface. + +[TIP] +When using the default processor setup, all access addresses between 0x00000000 and +0xffff0000 (= beginning of processor-internal BOOT ROM) are delegated to the external memory +/ bus interface if they are not targeting the (actually enabled/implemented) processor-internal +instruction memory (IMEM) or the (actually enabled/implemented) processor-internal data memory +(DMEM). See section <<_address_space>> for more information. 
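The forwarding rule from the tip above can be modelled on the host as a small address check. This is a minimal sketch under stated assumptions: `mem_region_t`, `in_region` and `is_external_access` are hypothetical names, the IMEM/DMEM base addresses and sizes are supplied by the caller (a size of zero stands for "not implemented"), and every address at or above `0xffff0000` is treated as processor-internal.

```c
#include <stdbool.h>
#include <stdint.h>

/* Start of the processor-internal BOOT ROM as stated in the tip above. */
#define SKETCH_BOOTROM_BASE 0xFFFF0000u

/* Hypothetical description of an internal memory; size == 0 means "not implemented". */
typedef struct {
    uint32_t base;
    uint32_t size; /* in bytes */
} mem_region_t;

/* True if addr lies inside an implemented region. */
static bool in_region(uint32_t addr, mem_region_t r)
{
    return (r.size != 0u) && (addr >= r.base) && ((addr - r.base) < r.size);
}

/* Model of the forwarding decision: accesses below the BOOT ROM base that hit
 * neither IMEM nor DMEM are delegated to the external bus interface.
 * Assumption of this sketch: everything at or above the BOOT ROM base is internal. */
static bool is_external_access(uint32_t addr, mem_region_t imem, mem_region_t dmem)
{
    if (addr >= SKETCH_BOOTROM_BASE) {
        return false;
    }
    return !in_region(addr, imem) && !in_region(addr, dmem);
}
```

With an IMEM that is not implemented (`size == 0`), an instruction fetch from its nominal address range would be forwarded to the external interface in this model, which matches the "actually enabled/implemented" wording of the tip.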
+ +**Wishbone Bus Protocol** + +The external memory interface either uses **standard** ("classic") Wishbone transactions (default) or +**pipelined** Wishbone transactions. The transaction protocol is configured via the `wb_pipe_mode_c` constant +in the main VHDL package file (`rtl/neorv32_package.vhd`): + +[source,vhdl] +---- +-- (external) bus interface -- +constant wb_pipe_mode_c : boolean := false; +---- + +When `wb_pipe_mode_c` is disabled, all bus control signals including _STB_ are active (and stable) until the +transfer is acknowledged/terminated. If `wb_pipe_mode_c` is enabled, all bus control signals except _STB_ are active +(and stable) until the transfer is acknowledged/terminated. In this case, _STB_ is active only during the very +first bus clock cycle. + +.Exemplary Wishbone bus accesses using "classic" and "pipelined" protocol +[cols="^2,^2"] +[grid="none"] +|======================= +a| image::wishbone_classic_read.png[700,300] +a| image::wishbone_pipelined_write.png[700,300] +| **Classic** Wishbone read access | **Pipelined** Wishbone write access +|======================= + + +[TIP] +A detailed description of the implemented Wishbone bus protocol and the according interface signals +can be found in the data sheet "Wishbone B4 – WISHBONE System-on-Chip (SoC) Interconnection +Architecture for Portable IP Cores". A copy of this document can be found in the docs folder of this +project. + +**Interface Latency** + +The Wishbone gateway introduces two additional latency cycles: Processor-outgoing and -incoming signals +are fully registered. Thus, any access from the CPU to a processor-external device requires +2 clock cycles. + +**Bus Access Timeout** + +The Wishbone bus interface provides an option to configure a bus access timeout counter. The _MEM_EXT_TIMEOUT_ +top generic is used to specify the _maximum_ time (in clock cycles) a bus access can be pending before it is automatically +terminated. 
If _MEM_EXT_TIMEOUT_ is set to zero, the timeout is disabled and a bus access can take an arbitrary number of cycles to complete. + +When _MEM_EXT_TIMEOUT_ is greater than zero, the Wishbone adapter starts an internal countdown whenever the CPU +accesses a memory address via the external memory interface. If the accessed memory / device does not acknowledge (via `wb_ack_i`) +or terminate (via `wb_err_i`) the transfer within _MEM_EXT_TIMEOUT_ clock cycles, the bus access is automatically canceled +(setting `wb_cyc_o` low again) and a load/store/instruction fetch bus access fault exception is raised. + +[TIP] +This feature can be used as **safety guard** if the external memory system does not check for "address space holes". That means that addresses, which +do not belong to a certain memory or device, do not permanently stall the processor due to an unacknowledged/unterminated bus access. If the external +memory system can guarantee to acknowledge or terminate **any** bus access (even if it targets an unimplemented address) the timeout feature should be disabled +(_MEM_EXT_TIMEOUT_ = 0). + +**Wishbone Tag** + +The 3-bit wishbone `wb_tag_o` signal provides additional information regarding the access type. This signal +is compatible with the AXI4 _AxPROT_ signal. + +* `wb_tag_o(0)` 1: privileged access (CPU is in machine mode); 0: unprivileged access +* `wb_tag_o(1)` always zero (indicating "secure access") +* `wb_tag_o(2)` 1: instruction fetch access, 0: data access + +**Exclusive / Atomic Bus Access** + +If the atomic memory access CPU extension (via _CPU_EXTENSION_RISCV_A_) is enabled, the CPU can +request an atomic/exclusive bus access via the external memory interface. + +The load-reserved instruction (`lr.w`) will set the `wb_lock_o` signal telling the bus interconnect to establish a +reservation for the current accessed address (start of an exclusive access). This signal will stay asserted until +another memory access instruction is executed (for example a `sc.w`). 
+ +The memory system has to make sure that no other entity can access the reserved address until `wb_lock_o` +is released again. If this attempt fails, the memory system has to assert `wb_err_i` in order to indicate that the +reservation was broken. + +[TIP] +See section <<_bus_interface>> for the CPU bus interface protocol. + +**Endianness** + +The NEORV32 CPU and the Processor setup are *little-endian* architectures. To allow direct connection +to a big-endian memory system the external bus interface provides an _Endianness configuration_. The +Endianness (of the external memory interface) can be configured via the global `xbus_big_endian_c` +constant in the main VHDL package file (`rtl/neorv32_package.vhd`). By default, the external memory +interface uses little-endian byte-order. + +[source,vhdl] +---- +-- (external) bus interface -- +constant xbus_big_endian_c : boolean := true; +---- + +Application software can check the Endianness configuration of the external bus interface via the +_SYSINFO_FEATURES_MEM_EXT_ENDIAN_ flag in the processor's SYSINFO module (see section +<<_system_configuration_information_memory_sysinfo>> for more information). + +**AXI4-Lite Connectivity** + +The AXI4-Lite wrapper (`rtl/top_templates/neorv32_top_axi4lite.vhd`) provides a Wishbone-to- +AXI4-Lite bridge, compatible with Xilinx Vivado (IP packager and block design editor). All entity signals of +this wrapper are of type _std_logic_ or _std_logic_vector_, respectively. + +The AXI Interface has been verified using Xilinx Vivado IP Packager and Block Designer. The AXI +interface port signals are automatically detected when packaging the core. + +.Example AXI SoC using Xilinx Vivado +image::neorv32_axi_soc.png[] + +[WARNING] +Using the auto-termination timeout feature (_MEM_EXT_TIMEOUT_ greater than zero) is **not AXI4 compliant** as the AXI protocol does not support canceling of +bus transactions. 
Therefore, the NEORV32 top wrapper with AXI4-Lite interface (`rtl/top_templates/neorv32_top_axi4lite`) configures _MEM_EXT_TIMEOUT_ = 0 by default. + + Index: datasheet/software.adoc =================================================================== --- datasheet/software.adoc (nonexistent) +++ datasheet/software.adoc (revision 60) @@ -0,0 +1,610 @@ +:sectnums: +== Software Framework + +To make actual use of the NEORV32 processor, the project comes with a complete software eco-system. This +ecosystem is based on the RISC-V port of the GCC GNU Compiler Collection and consists of the following elementary parts: + +[cols="<6,<4"] +[grid="none"] +|======================= +| Application/bootloader start-up code | `sw/common/crt0.S` +| Application/bootloader linker script | `sw/common/neorv32.ld` +| Core hardware driver libraries | `sw/lib/include/` & `sw/lib/source/` +| Makefiles | e.g. `sw/example/blink_led/makefile` +| Auxiliary tool for generating NEORV32 executables | `sw/image_gen/` +| Default bootloader | `sw/bootloader/bootloader.c` +|======================= + +Last but not least, the NEORV32 ecosystem provides some example programs for testing the hardware, for +illustrating the usage of peripherals and for general getting in touch with the project (`sw/example`). + +// #################################################################################################################### +:sectnums: +=== Compiler Toolchain + +The toolchain for this project is based on the free RISC-V GCC-port. You can find the compiler sources and +build instructions on the official RISC-V GNU toolchain GitHub page: https://github.com/riscv/riscv-gnutoolchain. + +The NEORV32 implements a 32-bit base integer architecture (`rv32i`) and a 32-bit integer and soft-float ABI +(ilp32), so make sure you build an according toolchain. 
+ +Alternatively, you can download my prebuilt `rv32i/e` toolchains for 64-bit x86 Linux from: https://github.com/stnolting/riscv-gcc-prebuilt + +The default toolchain prefix used by the project's makefiles is (can be changed in the makefiles): **`riscv32-unknown-elf`** + +[TIP] +More information regarding the toolchain (building from scratch or downloading the prebuilt ones) +can be found in section <<_toolchain_setup>>. + + + +<<< +// #################################################################################################################### +:sectnums: +=== Core Libraries + +The NEORV32 project provides a set of C libraries that allow easy use of the processor/CPU features. +Just include the main NEORV32 library file in your application's source file(s): + +[source,c] +---- +#include <neorv32.h> +---- + +Together with the makefile, this will automatically include all the processor's header files located in +`sw/lib/include` into your application. The actual source files of the core libraries are located in +`sw/lib/source` and are automatically included into the source list of your software project. The following +files are currently part of the NEORV32 core library: + +[cols="<3,<4,<8"] +[options="header",grid="rows"] +|======================= +| C source file | C header file | Description +| - | `neorv32.h` | main NEORV32 definitions and library file +| `neorv32_cfs.c` | `neorv32_cfs.h` | HW driver (stub)footnote:[This driver file only represents a stub, since the real CFS drivers are defined by the actual CFS implementation.] 
functions for the custom functions subsystem +| `neorv32_cpu.c` | `neorv32_cpu.h` | HW driver functions for the NEORV32 **CPU** +| `neorv32_gpio.c` | `neorv32_gpio.h` | HW driver functions for the **GPIO** +| - | `neorv32_intrinsics.h` | macros for custom intrinsics/instructions +| `neorv32_mtime.c` | `neorv32_mtime.h` | HW driver functions for the **MTIME** +| `neorv32_nco.c` | `neorv32_nco.h` | HW driver functions for the **NCO** +| `neorv32_neoled.c` | `neorv32_neoled.h` | HW driver functions for the **NEOLED** +| `neorv32_pwm.c` | `neorv32_pwm.h` | HW driver functions for the **PWM** +| `neorv32_rte.c` | `neorv32_rte.h` | NEORV32 **runtime environment** and helpers +| `neorv32_spi.c` | `neorv32_spi.h` | HW driver functions for the **SPI** +| `neorv32_trng.c` | `neorv32_trng.h` | HW driver functions for the **TRNG** +| `neorv32_twi.c` | `neorv32_twi.h` | HW driver functions for the **TWI** +| `neorv32_uart.c` | `neorv32_uart.h` | HW driver functions for the **UART0** and **UART1** +| `neorv32_wdt.c` | `neorv32_wdt.h` | HW driver functions for the **WDT** +|======================= + +.Documentation +[TIP] +All core library software sources are highly documented using _doxygen_. See section <>. +The documentation is automatically built and deployed to GitHub pages by the CI workflow (:https://stnolting.github.io/neorv32/sw/files.html). + + + + +<<< +// #################################################################################################################### +:sectnums: +=== Application Makefile + +Application compilation is based on **GNU makefiles**. Each project in the `sw/example` folder features +a makefile. All these makefiles are identical. When creating a new project, copy an existing project folder or +at least the makefile to your new project folder. I suggest to create new projects also in `sw/example` to keep +the file dependencies. 
Of course, these dependencies can be manually configured via makefile variables +when your project is located somewhere else. + +Before you can use the makefiles, you need to install the RISC-V GCC toolchain. Also, you have to add the +installation folder of the compiler to your system's `PATH` variable. More information can be found in chapter +<<_lets_get_it_started>>. + +The makefile is invoked by simply executing `make` in your console: + +[source,bash] +---- +neorv32/sw/example/blink_led$ make +---- + +:sectnums: +==== Targets + +Just executing `make` will display the help menu listing all available targets. The following targets are +available: + +[cols="<3,<15"] +[grid="none"] +|======================= +| `help` | Show a short help text explaining all available targets. +| `check` | Check the compiler toolchain. You should run this target at least once after installing the toolchain. +| `info` | Show the makefile configuration (see next chapter). +| `exe` | Compile all sources and generate application executable for upload via bootloader. +| `install` | Compile all sources, generate executable (via exe target) for upload via bootloader and generate and install IMEM VHDL initialization image file `rtl/core/neorv32_application_image.vhd`. +| `all` | Execute `exe` and `install`. +| `clean` | Remove all generated files in the current folder. +| `clean_all` | Remove all generated files in the current folder and also remove the compiled core libraries and the compiled image generator tool. +| `bootloader` | Compile all sources, generate executable and generate and install BOOTROM VHDL initialization image file `rtl/core/neorv32_bootloader_image.vhd`. This target modifies the ROM origin and length in the linker script by setting the `make_bootloader` define. 
+| `upload` | Upload NEORV32 executable to the bootloader via serial port +|======================= + +[TIP] +An assembly listing file (`main.asm`) is created by the compilation flow for further analysis or debugging purposes. + +:sectnums: +==== Configuration + +The compilation flow is configured via variables right at the beginning of the makefile: + +[source,makefile] +---- +# ***************************************************************************** +# USER CONFIGURATION +# ***************************************************************************** +# User's application sources (*.c, *.cpp, *.s, *.S); add additional files here +APP_SRC ?= $(wildcard ./*.c) $(wildcard ./*.s) $(wildcard ./*.cpp) $(wildcard ./*.S) +# User's application include folders (don't forget the '-I' before each entry) +APP_INC ?= -I . +# User's application include folders - for assembly files only (don't forget the '-I' before each entry) +ASM_INC ?= -I . +# Optimization +EFFORT ?= -Os +# Compiler toolchain +RISCV_TOOLCHAIN ?= riscv32-unknown-elf +# CPU architecture and ABI +MARCH ?= -march=rv32i +MABI ?= -mabi=ilp32 +# User flags for additional configuration (will be added to compiler flags) +USER_FLAGS ?= +# Serial port for executable upload via bootloader +COM_PORT ?= /dev/ttyUSB0 +# Relative or absolute path to the NEORV32 home folder +NEORV32_HOME ?= ../../.. +# ***************************************************************************** +---- + +[cols="<3,<10"] +[grid="none"] +|======================= +| _APP_SRC_ | The source files of the application (`*.c`, `*.cpp`, `*.S` and `*.s` files are allowed; files of these types in the project folder are automatically added via wildcards). Additional files can be added; separated by white spaces +| _APP_INC_ | Include file folders; separated by white spaces; must be defined with `-I` prefix +| _ASM_INC_ | Include file folders that are used only for the assembly source files (`*.S`/`*.s`). 
+| _EFFORT_ | Optimization level, optimize for size (`-Os`) is default; legal values: `-O0`, `-O1`, `-O2`, `-O3`, `-Os` +| _RISCV_TOOLCHAIN_ | The toolchain prefix to be used; follows the naming convention "architecture-vendor-output" +| _MARCH_ | The targeted RISC-V architecture/ISA. Only `rv32` is supported by the NEORV32. Enable compiler support of optional CPU extensions by adding the according extension letter (e.g. `rv32im` for _M_ CPU extension). See section <<_enabling_risc_v_cpu_extensions>>. +| _MABI_ | The default 32-bit integer ABI. +| _USER_FLAGS_ | Additional flags that will be forwarded to the compiler tools +| _NEORV32_HOME_ | Relative or absolute path to the NEORV32 project home folder. Adapt this if the makefile/project is not in the project's `sw/example` folder. +| _COM_PORT_ | Default serial port for executable upload to bootloader. +|======================= + +:sectnums: +==== Default Compiler Flags + +The following default compiler flags are used for compiling an application. These flags are defined via the +`CC_OPTS` variable. Custom flags can be appended via the `USER_FLAGS` variable to the `CC_OPTS` variable. + +[cols="<3,<9"] +[grid="none"] +|======================= +| `-Wall` | Enable all compiler warnings. +| `-ffunction-sections` | Put functions and data segment in independent sections. This allows a code optimization as dead code and unused data can be easily removed. +| `-nostartfiles` | Do not use the default start code. The makefiles use the NEORV32-specific start-up code instead (`sw/common/crt0.S`). +| `-Wl,--gc-sections` | Make the linker perform dead code elimination. +| `-lm` | Include/link with `math.h`. +| `-lc` | Search for the standard C library when linking. +| `-lgcc` | Make sure we have no unresolved references to internal GCC library subroutines. +| `-mno-fdiv` | Use builtin software functions for floating-point divisions and square roots (since the according instructions are not supported yet). 
+| `-falign-functions=4` .4+| Force a 32-bit alignment of functions and labels (branch/jump/call targets). This increases performance as it simplifies instruction fetch when using the C extension. As a drawback this will also slightly increase the program code size. +| `-falign-labels=4` +| `-falign-loops=4` +| `-falign-jumps=4` +|======================= + +[TIP] +The makefile configuration variables can be (re-)defined directly when invoking the makefile. For +example: `$ make MARCH=-march=rv32ic clean_all exe` + + + +<<< +// #################################################################################################################### +:sectnums: +=== Executable Image Format + +When all the application sources have been compiled and linked, a final executable file has to be generated. +For this purpose, the makefile uses the NEORV32-specific linker script `sw/common/neorv32.ld`. This linker script defines three memory sections: +`rom`, `ram` and `iodev`. These sections have specific access attributes: Read access (`r`), write access (`w`) and executable (`x`). + +.Linker memory sections +[cols="<2,^1,<7"] +[options="header",grid="rows"] +|======================= +| Memory section | Attributes | Description +| `rom` | `rx` | Instruction memory (IMEM) **OR** bootloader ROM +| `ram` | `rwx` | Data memory (DMEM) +| `iodev` | `rw` | Memory-mapped IO/peripheral devices +|======================= + +The `iodev` section is reserved for processor-internal memory-mapped IO and peripheral devices. The linker does not use this section at all +and just passes the start and end addresses of this section to the start-up code `crt0.S` (see next section). + +[NOTE] +The `rom` region is used to place the instructions of "normal" applications. If the bootloader is being compiled, the makefile defines the `make_bootloader` +symbol, which changes the _ORIGIN_ (base address) and _LENGTH_ (size) attributes of the `rom` region according to the BOOTROM definitions. 
+ +The linker maps all the regions from the compiled object files into only four final sections: `.text`, `.rodata`, `.data` and `.bss` +using the specified memory section. These four regions contain everything required for the application to run: + +.Executable regions +[cols="<1,<9"] +[options="header",grid="rows"] +|======================= +| Region | Description +| `.text` | Executable instructions generated from the start-up code and all application sources. +| `.rodata` | Constants (like strings) from the application; also the initial data for initialized variables. +| `.data` | This section is required for the address generation of fixed (= global) variables only. +| `.bss` | This section is required for the address generation of dynamic memory constructs only. +|======================= + +The `.text` and `.rodata` sections are mapped to processor's instruction memory space and the `.data` and +`.bss` sections are mapped to the processor's data memory space. Finally, the `.text`, `.rodata` and `.data` sections are extracted and concatenated into a single file +**`main.bin`**. + +**Executable Image Generator** + +The **`main.bin`** file is processed by the NEORV32 image generator (`sw/image_gen`) to generate the final +executable. It is automatically compiled when invoking the makefile. The image generator can generate three +types of executables, selected by a flag when calling the generator: + +[cols="<1,<9"] +[grid="none"] +|======================= +| `-app_bin` | Generates an executable binary file `neorv32_exe.bin` (for UART uploading via the bootloader). +| `-app_img` | Generates an executable VHDL memory initialization image for the processor-internal IMEM. This option generates the `rtl/core/neorv32_application_image.vhd` file. +| `-bld_img` | Generates an executable VHDL memory initialization image for the processor-internal BOOT ROM. This option generates the `rtl/core/neorv32_bootloader_image.vhd` file. 
+|======================= + +All these options are managed by the makefile – so you don't actually have to think about them. The normal +application compilation flow will generate the `neorv32_exe.bin` file in the current software project folder +ready for upload via UART to the NEORV32 bootloader. + +The actual executable provides a very small header consisting of three 32-bit words located right at the +beginning of the file. This header is generated by the image generator. The first word of the executable is the signature +word and is always `0x4788cafe`. Based on this word, the bootloader can identify a valid image file. The next word represents the size of the actual program +image in bytes. A simple "complement" checksum of the actual program image is given by the third word. This +provides a simple protection against data transmission or storage errors. + + +=== Start-Up Code (crt0) + +The CPU (and also the processor) requires a minimal start-up and initialization code to bring the CPU (and the SoC) into a stable and initialized state before the +actual application can be executed. This start-up code is located in `sw/common/crt0.S` and is automatically linked with _every_ application program. +The `crt0.S` is directly executed right after a reset and performs the following operations: + +* Initialize integer registers `x1 - x31` (or `x1 - x15` when using the `E` CPU extension) to a defined value. +* Initialize all CPU core CSRs and also install a default "dummy" trap handler for _all_ traps. +* Initialize the global pointer `gp` and the stack pointer `sp` according to the `.data` segment layout provided by the linker script. +* Clear IO area: Write zero to all memory-mapped registers within the IO region (`iodev` section). If certain devices have not been implemented, a bus access fault exception will occur. This exception is captured by the dummy trap handler. +* Clear the `.bss` section defined by the linker script. 
+* Copy read-only data from the `.text` section to the `.data` section to set initialized variables. +* Call the application's `main` function (with no arguments: `argc` = `argv` = 0). +* If the `main` function returns, the processor goes to an endless sleep mode (using a simple loop or via the `wfi` instruction if available). + + +<<< +// #################################################################################################################### +:sectnums: +=== Bootloader + +The default bootloader (`sw/bootloader/bootloader.c`) of the NEORV32 processor allows uploading +new program executables at any time. If there is an external SPI flash connected to the processor (like the +FPGA's configuration memory), the bootloader can store the program executable to it. After reset, the +bootloader can directly boot from the flash without any user interaction. + +[WARNING] +The bootloader is only implemented when the BOOTLOADER_EN generic is true and requires the +CSR access CPU extension (CPU_EXTENSION_RISCV_Zicsr generic is true). + +[IMPORTANT] +The bootloader requires the primary UART (UART0) for user interaction (_IO_UART0_EN_ generic is _true_). + +[IMPORTANT] +For the automatic boot from an SPI flash, the SPI controller has to be implemented (_IO_SPI_EN_ +generic is _true_) and the machine system timer MTIME has to be implemented (_IO_MTIME_EN_ +generic is _true_), too, to allow an auto-boot timeout counter. + +[WARNING] +The bootloader is intended to work independently of the actual hardware configuration. Hence, it +should be compiled with the minimal base ISA only. The current version of the bootloader uses the +`rv32i` ISA – so it will not work on `rv32e` architectures. To make the bootloader work on an embedded +CPU configuration or on any other more sophisticated configuration, recompile it using the corresponding ISA +(see section <<_customizing_the_internal_bootloader>>). 
+ +To interact with the bootloader, connect the primary UART (UART0) signals (`uart0_txd_o` and +`uart0_rxd_i`) of the processor's top entity via a serial port (-adapter) to your computer (hardware flow control is +not used, so the corresponding interface signals can be ignored), configure your +terminal program using the following settings and perform a reset of the processor. + +Terminal console settings (`19200-8-N-1`): + +* 19200 Baud +* 8 data bits +* no parity bit +* 1 stop bit +* newline on `\r\n` (carriage return, newline) +* no transfer protocol / control flow protocol - just the raw byte stuff + +The bootloader uses the LSB of the top entity's `gpio_o` output port as high-active status LED (all other +output pins are set to low level by the bootloader). After reset, this LED will start blinking at ~2Hz and the +following intro screen should show up in your terminal: + +[source] +---- +<< NEORV32 Bootloader >> + +BLDV: Mar 23 2021 +HWV: 0x01050208 +CLK: 0x05F5E100 +USER: 0x10000DE0 +MISA: 0x40901105 +ZEXT: 0x00000023 +PROC: 0x0EFF0037 +IMEM: 0x00004000 bytes @ 0x00000000 +DMEM: 0x00002000 bytes @ 0x80000000 + +Autoboot in 8s. Press key to abort. +---- + +This start-up screen also gives some brief information about the bootloader and several system configuration parameters: + +[cols="<2,<15"] +[grid="none"] +|======================= +| `BLDV` | Bootloader version (build date). +| `HWV` | Processor hardware version (from the `mimpid` CSR) in BCD format (example: `0x01040606` = v1.4.6.6). +| `USER` | Custom user code (from the _USER_CODE_ generic). +| `CLK` | Processor clock speed in Hz (via the SYSINFO module, from the _CLOCK_FREQUENCY_ generic). +| `MISA` | CPU extensions (from the `misa` CSR). +| `ZEXT` | CPU sub-extensions (from the `mzext` CSR). +| `PROC` | Processor configuration (via the SYSINFO module, from the IO_* and MEM_* configuration generics). +| `IMEM` | IMEM memory base address and size in bytes (from the _MEM_INT_IMEM_SIZE_ generic). 
+| `DMEM` | DMEM memory base address and size in byte (from the _MEM_INT_DMEM_SIZE_ generic). +|======================= + +Now you have 8 seconds to press any key. Otherwise, the bootloader starts the auto boot sequence. When +you press any key within the 8 seconds, the actual bootloader user console starts: + +[source] +---- +<< NEORV32 Bootloader >> + +BLDV: Mar 23 2021 +HWV: 0x01050208 +CLK: 0x05F5E100 +USER: 0x10000DE0 +MISA: 0x40901105 +ZEXT: 0x00000023 +PROC: 0x0EFF0037 +IMEM: 0x00004000 bytes @ 0x00000000 +DMEM: 0x00002000 bytes @ 0x80000000 + +Autoboot in 8s. Press key to abort. +Aborted. + +Available commands: +h: Help +r: Restart +u: Upload +s: Store to flash +l: Load from flash +e: Execute +CMD:> +---- + +The auto-boot countdown is stopped and now you can enter a command from the list to perform the +corresponding operation: + +* `h`: Show the help text (again) +* `r`: Restart the bootloader and the auto-boot sequence +* `u`: Upload new program executable (`neorv32_exe.bin`) via UART into the instruction memory +* `s`: Store executable to SPI flash at `spi_csn_o(0)` +* `l`: Load executable from SPI flash at `spi_csn_o(0)` +* `e`: Start the application, which is currently stored in the instruction memory (IMEM) +* `#`: Shortcut for executing u and e afterwards (not shown in help menu) + +A new executable can be uploaded via UART by executing the `u` command. After that, the executable can be directly +executed via the `e` command. To store the recently uploaded executable to an attached SPI flash press `s`. To +directly load an executable from the SPI flash press `l`. The bootloader and the auto-boot sequence can be +manually restarted via the `r` command. + +[TIP] +The CPU is in machine level privilege mode after reset. When the bootloader boots an application, +this application is also started in machine level privilege mode. 
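The three-word header described in the _Executable Image Format_ section (signature, size, checksum) is what lets a receiver decide whether an executable is valid. The following host-side C sketch illustrates such a check. It is not the bootloader's actual source code: all type and function names are hypothetical, the header words are assumed to be available as host-order 32-bit integers, and the "complement" checksum is _assumed_ to be the two's complement of the wrapping 32-bit sum of all image words (so that sum plus checksum equals zero).

```c
#include <stddef.h>
#include <stdint.h>

/* Signature word as stated in the "Executable Image Format" section. */
#define EXE_SIGNATURE 0x4788CAFEu

/* Hypothetical result codes for this sketch (not the bootloader's error codes). */
typedef enum {
  IMG_OK = 0,
  IMG_BAD_SIGNATURE,
  IMG_BAD_SIZE,
  IMG_BAD_CHECKSUM
} img_status_t;

/* ASSUMPTION: checksum = two's complement of the wrapping 32-bit sum
 * of all program image words. */
uint32_t img_checksum(const uint32_t *image, size_t num_words) {
  uint32_t sum = 0;
  for (size_t i = 0; i < num_words; i++) {
    sum += image[i]; /* unsigned arithmetic wraps modulo 2^32 */
  }
  return (uint32_t)(~sum + 1u);
}

/* Layout of exe: [0] signature, [1] image size in bytes, [2] checksum,
 * [3..] program image words. total_words counts the header as well. */
img_status_t img_check(const uint32_t *exe, size_t total_words) {
  if (total_words < 3 || exe[0] != EXE_SIGNATURE) {
    return IMG_BAD_SIGNATURE;
  }
  uint32_t size_bytes = exe[1];
  if ((size_bytes % 4u) != 0 || (size_bytes / 4u) != (total_words - 3)) {
    return IMG_BAD_SIZE;
  }
  if (img_checksum(&exe[3], total_words - 3) != exe[2]) {
    return IMG_BAD_CHECKSUM;
  }
  return IMG_OK;
}
```

A host tool could, for instance, run `img_check()` on the words read from `neorv32_exe.bin` before starting an upload; whether the real image generator uses exactly this checksum definition should be verified against `sw/image_gen`.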
+ +:sectnums: +==== External SPI Flash for Booting + +If you want the NEORV32 bootloader to automatically fetch and execute an application at system start, you +can store it to an external SPI flash. The advantage of the external memory is to have a non-volatile program +storage, which can be re-programmed at any time just by executing some bootloader commands. Thus, no +FPGA bitstream recompilation is required at all. + +**SPI Flash Requirements** + +The bootloader can access an SPI compatible flash via the processor top entity's SPI port. The flash has to be connected to +chip select `spi_csn_o(0)`. The flash must be capable of operating at least at 1/8 of the processor's main +clock. Only single read and write byte operations are used. The address has to be 24 bit long. Furthermore, +the SPI flash has to support at least the following commands: + +* READ (`0x03`) +* READ STATUS (`0x05`) +* WRITE ENABLE (`0x06`) +* PAGE PROGRAM (`0x02`) +* SECTOR ERASE (`0xD8`) +* READ ID (`0x9E`) + +Compatible (FPGA configuration) SPI flash memories are for example the "Winbond W25Q64FV" or the "Micron N25Q032A". + +**SPI Flash Configuration** + +The base address `SPI_FLASH_BOOT_ADR` for the executable image inside the SPI flash is defined in the +"user configuration" section of the bootloader source code (`sw/bootloader/bootloader.c`). Most +FPGAs that use an external configuration flash store the golden configuration bitstream at base address 0. +Make sure there is no address collision between the FPGA bitstream and the application image. 
You need to +change the default sector size if your flash has a sector size greater or less than 64kB: + +[source,c] +---- +/** SPI flash boot image base address */ +#define SPI_FLASH_BOOT_ADR 0x00800000 +/** SPI flash sector size in bytes */ +#define SPI_FLASH_SECTOR_SIZE (64*1024) +---- + +[IMPORTANT] +For any change you make inside the bootloader, you have to recompile the bootloader (see section +<<_customizing_the_internal_bootloader>>) and do a new synthesis of the processor. + + +:sectnums: +==== Auto Boot Sequence +When you reset the NEORV32 processor, the bootloader waits 8 seconds for a user console input before it +starts the automatic boot sequence. This sequence tries to fetch a valid boot image from the external SPI +flash, connected to SPI chip select `spi_csn_o(0)`. If a valid boot image is found and can be successfully +transferred into the instruction memory, it is automatically started. If no SPI flash was detected or if there +was no valid boot image found, the bootloader stalls and the status LED is permanently activated. + + +:sectnums: +==== Bootloader Error Codes + +If something goes wrong during bootloader operation, an error code is shown. In this case the processor +stalls, a bell command and one of the following error codes are sent to the terminal, the bootloader status +LED is permanently activated and the system must be reset manually. + +[cols="<2,<13"] +[grid="rows"] +|======================= +| **`ERROR_0`** | If you try to transfer an invalid executable (via UART or from the external SPI flash), this error message shows up. There might be a transfer protocol configuration error in the terminal program. See section <<_uploading_and_starting_of_a_binary_executable_image_via_uart>> for more information. Also, if no SPI flash was found during an auto-boot attempt, this message will be displayed. +| **`ERROR_1`** | Your program is way too big for the processor's internal instruction memory. 
Increase the memory size or reduce (optimize!) your application code. +| **`ERROR_2`** | This indicates a checksum error. Something went wrong during the transfer of the program image (upload via UART or loading from the external SPI flash). If the error was caused by a UART upload, just try it again. When the error was generated during a flash access, the stored image might be corrupted. +| **`ERROR_3`** | This error occurs if the attached SPI flash cannot be accessed. Make sure you have the right type of flash and that it is properly connected to the NEORV32 SPI port using chip select #0. +| **`ERROR_4`** | The instruction memory is marked as read-only. Set the _MEM_INT_IMEM_ROM_ generic to _false_ to allow write accesses. +| **`ERROR_5`** | This error pops up when an unexpected exception or interrupt was triggered. The cause of the trap (`mcause` CSR) is displayed for further investigation. This might be caused if an ISA extension is used that has not been synthesized. +| **`ERROR_?`** | Something really bad happened when there is no specific error code available :( +|======================= + + + +<<< +// #################################################################################################################### +:sectnums: +=== NEORV32 Runtime Environment + +The NEORV32 provides a minimal runtime environment (RTE) that takes care of a stable +and _safe_ execution environment by handling _all_ traps (including interrupts). + +[NOTE] +Using the RTE is **optional**. The RTE provides a simple and comfortable way of delegating traps while making sure that all traps (even though they are not +explicitly used by the application) are handled correctly. Performance-optimized applications or embedded operating systems should not use the RTE for delegating traps. + +When execution enters the application's `main` function, the actual runtime environment is responsible for catching all implemented exceptions +and interrupts. 
To activate the NEORV32 RTE execute the following function: + +[source,c] +---- +void neorv32_rte_setup(void); +---- + +This setup initializes the `mtvec` CSR, which provides the base entry point for all trap +handlers. The address stored to this register reflects the first-level exception handler provided by the +NEORV32 RTE. Whenever an exception or interrupt is triggered, this first-level handler is called. + +The first-level handler performs a complete context save, analyzes the source of the exception/interrupt and +calls the corresponding second-level exception handler, which actually takes care of the exception/interrupt +handling. For this, the RTE manages a private look-up table to store the addresses of the corresponding trap +handlers. + +After the initial setup of the RTE, each entry in the trap handler's look-up table is initialized with a debug +handler that outputs detailed hardware information via the **primary UART (UART0)** when triggered. This +is intended as a fall-back for debugging or for accidentally-triggered exceptions/interrupts. +For instance, an illegal instruction exception caught by the RTE debug handler might look like this in the UART0 output: + +[source] +---- + Illegal instruction @0x000002d6, MTVAL=0x00001537 +---- + +To install the **actual application's trap handlers** the NEORV32 RTE provides functions for installing and +un-installing trap handlers for each implemented exception/interrupt source. 
+ +[source,c] +---- +int neorv32_rte_exception_install(uint8_t id, void (*handler)(void)); +---- + +[cols="<5,<12"] +[options="header",grid="rows"] +|======================= +| ID name [C] | Description / trap causing entry +| `RTE_TRAP_I_MISALIGNED` | instruction address misaligned +| `RTE_TRAP_I_ACCESS` | instruction (bus) access fault +| `RTE_TRAP_I_ILLEGAL` | illegal instruction +| `RTE_TRAP_BREAKPOINT` | breakpoint (`ebreak` instruction) +| `RTE_TRAP_L_MISALIGNED` | load address misaligned +| `RTE_TRAP_L_ACCESS` | load (bus) access fault +| `RTE_TRAP_S_MISALIGNED` | store address misaligned +| `RTE_TRAP_S_ACCESS` | store (bus) access fault +| `RTE_TRAP_MENV_CALL` | environment call from machine mode (`ecall` instruction) +| `RTE_TRAP_UENV_CALL` | environment call from user mode (`ecall` instruction) +| `RTE_TRAP_MTI` | machine timer interrupt +| `RTE_TRAP_MEI` | machine external interrupt +| `RTE_TRAP_MSI` | machine software interrupt +| `RTE_TRAP_FIRQ_0` : `RTE_TRAP_FIRQ_15` | fast interrupt channel 0..15 +|======================= + +When installing a custom handler function for any of these exceptions/interrupts, make sure the function uses +**no attributes** (especially no interrupt attribute!), has no arguments and no return value like in the following +example: + +[source,c] +---- +void handler_xyz(void) { + + // handle exception/interrupt... +} +---- + +[WARNING] +Do NOT use the `((interrupt))` attribute for the application exception handler functions! This +will place an `mret` instruction at the end of the function, making it impossible to return to the first-level +exception handler of the RTE, which will cause stack corruption. + +Example: Installation of the MTIME interrupt handler: + +[source,c] +---- +neorv32_rte_exception_install(RTE_TRAP_MTI, handler_xyz); +---- + +To remove a previously installed exception handler call the corresponding un-install function from the NEORV32 +runtime environment. 
This will replace the previously installed handler by the initial debug handler, so even +un-installed exceptions and interrupts are still captured. + +[source,c] +---- +int neorv32_rte_exception_uninstall(uint8_t id); +---- + +Example: Removing the MTIME interrupt handler: + +[source,c] +---- +neorv32_rte_exception_uninstall(RTE_TRAP_MTI); +---- + +[TIP] +More information regarding the NEORV32 runtime environment can be found in the doxygen +software documentation (also available online at https://stnolting.github.io/neorv32/sw/files.html[GitHub pages]). Index: figures/address_space.png =================================================================== Cannot display: file marked as a binary type. svn:mime-type = application/octet-stream Index: figures/neorv32_processor.png =================================================================== Cannot display: file marked as a binary type. svn:mime-type = application/octet-stream Index: icons/important.png =================================================================== Cannot display: file marked as a binary type. svn:mime-type = application/octet-stream Index: icons/important.png =================================================================== --- icons/important.png (nonexistent) +++ icons/important.png (revision 60)
icons/important.png Property changes : Added: svn:mime-type ## -0,0 +1 ## +application/octet-stream \ No newline at end of property Index: icons/note.png =================================================================== Cannot display: file marked as a binary type. svn:mime-type = application/octet-stream Index: icons/note.png =================================================================== --- icons/note.png (nonexistent) +++ icons/note.png (revision 60)
icons/note.png Property changes : Added: svn:mime-type ## -0,0 +1 ## +application/octet-stream \ No newline at end of property Index: icons/tip.png =================================================================== Cannot display: file marked as a binary type. svn:mime-type = application/octet-stream Index: icons/tip.png =================================================================== --- icons/tip.png (nonexistent) +++ icons/tip.png (revision 60)
icons/tip.png Property changes : Added: svn:mime-type ## -0,0 +1 ## +application/octet-stream \ No newline at end of property Index: icons/warning.png =================================================================== Cannot display: file marked as a binary type. svn:mime-type = application/octet-stream Index: icons/warning.png =================================================================== --- icons/warning.png (nonexistent) +++ icons/warning.png (revision 60)
icons/warning.png Property changes : Added: svn:mime-type ## -0,0 +1 ## +application/octet-stream \ No newline at end of property Index: userguide/content.adoc =================================================================== --- userguide/content.adoc (nonexistent) +++ userguide/content.adoc (revision 60) @@ -0,0 +1,1160 @@ +Let's Get It Started! + +To make your NEORV32 project run, follow the guides from the upcoming sections. Follow these guides +step by step and in the presented order. + +:sectnums: +== Toolchain Setup + +There are two possibilities to get the actual RISC-V GCC toolchain: + +1. Download and _build_ the official RISC-V GNU toolchain yourself +2. Download and install a prebuilt version of the toolchain + +[NOTE] +The default toolchain prefix for this project is **`riscv32-unknown-elf`**. Of course you can use any other RISC-V +toolchain (like `riscv64-unknown-elf`) that is capable of emitting code for a `rv32` architecture. Just change the _RISCV_TOOLCHAIN_ variable in the application +makefile(s) according to your needs or define this variable when invoking the makefile. + +[IMPORTANT] +Keep in mind that – for instance – a rv32imc toolchain only provides library code compiled with +compressed (_C_) and `mul`/`div` instructions (_M_)! Hence, this code cannot be executed (without +emulation) on an architecture without these extensions! + + +:sectnums: +=== Building the Toolchain from Scratch + +To build the toolchain by yourself you can follow the guide from the official https://github.com/riscv/riscv-gnu-toolchain GitHub page. + +The official RISC-V repository uses submodules. 
You need the `--recursive` option to fetch the submodules +automatically: + +[source,bash] +---- +$ git clone --recursive https://github.com/riscv/riscv-gnu-toolchain +---- + +Download and install the prerequisite standard packages: + +[source,bash] +---- +$ sudo apt-get install autoconf automake autotools-dev curl python3 libmpc-dev libmpfr-dev libgmp-dev gawk build-essential bison flex texinfo gperf libtool patchutils bc zlib1g-dev libexpat-dev +---- + +To build the Linux cross-compiler, pick an install path. If you choose, say, `/opt/riscv`, then add +`/opt/riscv/bin` to your `PATH` variable. + +[source,bash] +---- +$ export PATH=$PATH:/opt/riscv/bin +---- + +Then, simply run the following commands and configuration in the RISC-V GNU toolchain source folder to compile a +`rv32i` toolchain: + +[source,bash] +---- +riscv-gnu-toolchain$ ./configure --prefix=/opt/riscv --with-arch=rv32i --with-abi=ilp32 +riscv-gnu-toolchain$ make +---- + +After a while you will get `riscv32-unknown-elf-gcc` and all of its friends in your `/opt/riscv/bin` folder. + + +:sectnums: +=== Downloading and Installing a Prebuilt Toolchain + +Alternatively, you can download a prebuilt toolchain. + +**Use The Toolchain I Have Built** + +I have compiled the toolchain on a 64-bit x86 Ubuntu (Ubuntu on Windows, actually) and uploaded it to +GitHub. You can directly download the corresponding toolchain archive as single _zip-file_ within a packed +release from github.com/stnolting/riscv-gcc-prebuilt. + +Unpack the downloaded toolchain archive and copy the content to a location in your file system (e.g. +`/opt/riscv`). More information about downloading and installing my prebuilt toolchains can be found in +the repository's README. + +**Use a Third Party Toolchain** + +Of course you can also use any other prebuilt version of the toolchain. There are a lot of RISC-V GCC packages out there - +even for Windows. 
+ +[IMPORTANT] +Make sure the toolchain can (also) emit code for a `rv32i` architecture, uses the `ilp32` or `ilp32e` ABI and **was not built** using +CPU extensions that are not supported by the NEORV32 (like `D`). + + +:sectnums: +=== Installation + +Now you have the binaries. The last step is to add them to your `PATH` environment variable (if you have not +already done so). Make sure to add the binaries folder (`bin`) of your toolchain. + +[source,bash] +---- +$ export PATH=$PATH:/opt/riscv/bin +---- + +You should add this command to your `.bashrc` (if you are using bash) to automatically add the RISC-V +toolchain at every console start. + +:sectnums: +=== Testing the Installation + +To make sure everything works fine, navigate to an example project in the NEORV32 example folder and +execute the following command: + +[source,bash] +---- +neorv32/sw/example/blink_led$ make check +---- + +This will test all the tools required for the NEORV32. Everything is working fine if "Toolchain check OK" appears at the end. + + + +<<< +// #################################################################################################################### +:sectnums: +== General Hardware Setup + +The following steps are required to generate a bitstream for your FPGA board. If you want to run the +NEORV32 processor in simulation only, the following steps might also apply. + +[TIP] +Check out the example setups in the `boards` folder (@GitHub: https://github.com/stnolting/neorv32/tree/master/boards), which provides script-based +demo projects for various FPGA boards. + +In this tutorial we will use a test implementation of the processor – using many of the processor's optional +modules but just propagating the minimal signals to the outer world. Hence, this guide is intended as +evaluation or "hello world" project to check out the NEORV32. A little note: The order of the following +steps might be a little different for your specific EDA tool. + +[start=0] +. 
Create a new project with your FPGA EDA tool of choice. +. Add all VHDL files from the project's `rtl/core` folder to your project. Make sure to _reference_ the +files only – do not copy them. +. Make sure to add all the rtl files to a new library called **`neorv32`**. If your FPGA tool does not +provide a field to enter the library name, check out the "properties" menu of the rtl files. +. The `rtl/core/neorv32_top.vhd` VHDL file is the top entity of the NEORV32 processor. If you +already have a design, instantiate this unit into your design and proceed. +. If you do not have a design yet and just want to check out the NEORV32 – no problem! In this guide +we will use a simplified top entity that encapsulates the actual processor top entity: add the +`rtl/core/top_templates/neorv32_test_setup.vhd` VHDL file to your project too, and +select it as top entity. +. This test setup provides a minimal test hardware setup: + +.NEORV32 "hello world" test setup +image::neorv32_test_setup.png[align=center] + +[start=7] +. This test setup only implements some very basic processor and CPU features. Also, only the +minimum number of signals is propagated to the outer world. Please note that the reset input signal +`rstn_i` is **low-active**. +. The configuration of the NEORV32 processor is done using the generics of the instantiated processor +top entity. Let's keep things simple at first and use the default configuration: + +.Cut-out of `neorv32_test_setup.vhd` showing the processor instance and its configuration +[source,vhdl] +---- +neorv32_top_inst: neorv32_top +generic map ( + -- General -- + CLOCK_FREQUENCY => 100000000, -- in Hz # <1> + BOOTLOADER_EN => true, + USER_CODE => x"00000000", + ... + -- Internal instruction memory -- + MEM_INT_IMEM_EN => true, + MEM_INT_IMEM_SIZE => 16*1024, # <2> + MEM_INT_IMEM_ROM => false, + -- Internal data memory -- + MEM_INT_DMEM_EN => true, + MEM_INT_DMEM_SIZE => 8*1024, # <3> + ... 
+---- +<1> Clock frequency of `clk_i` in Hertz +<2> Default size of internal instruction memory: 16kB (no need to change that _now_) +<3> Default size of internal data memory: 8kB (no need to change that _now_) + +[start=9] +. There is one generic that has to be set according to your FPGA / board: The clock frequency of the +top's clock input signal (`clk_i`). Use the _CLOCK_FREQUENCY_ generic to specify your clock source's +frequency in Hertz (Hz) (note "1"). +. If you feel like it – or if your FPGA does not provide that many resources – you can modify the +**memory sizes** (_MEM_INT_IMEM_SIZE_ and _MEM_INT_DMEM_SIZE_ – marked with notes "2" and "3") or even +exclude certain ISA extensions and peripheral modules from implementation - but as mentioned above, let's keep things +simple at first and use the standard configuration for now. + +[NOTE] +Keep the internal instruction and data memory sizes in mind – these values are required for setting +up the software framework in the next section <<_general_software_framework_setup>>. + +[start=11] +. Depending on your FPGA tool of choice, it is time to assign the signals of the test setup top entity to +the corresponding pins of your FPGA board. All the signals can be found in the entity declaration: + +.Entity signals of `neorv32_test_setup.vhd` +[source,vhdl] +---- +entity neorv32_test_setup is + port ( + -- Global control -- + clk_i : in std_ulogic := '0'; -- global clock, rising edge + rstn_i : in std_ulogic := '0'; -- global reset, low-active, async + -- GPIO -- + gpio_o : out std_ulogic_vector(7 downto 0); -- parallel output + -- UART0 -- + uart0_txd_o : out std_ulogic; -- UART0 send data + uart0_rxd_i : in std_ulogic := '0' -- UART0 receive data +); +end neorv32_test_setup; +---- + +[start=12] +. Attach the clock input `clk_i` to your clock source and connect the reset line `rstn_i` to a button of +your FPGA board. 
Check whether it is low-active or high-active – the reset signal of the processor is +**low-active**, so maybe you need to invert the input signal. +. If possible, connect at least bit `0` of the GPIO output port `gpio_o` to a high-active LED (invert +the signal when your LEDs are low-active) - this LED will be used as status LED by the bootloader. +. Finally, connect the primary UART's (UART0) communication signals `uart0_txd_o` and +`uart0_rxd_i` to your serial host interface (USB-to-serial converter). +. Perform the project HDL compilation (synthesis, mapping, bitstream generation). +. Download the generated bitstream into your FPGA ("program" it) and press the reset button (just to +make sure everything is in sync). +. Done! If you have assigned the bootloader status LED, it should be +flashing now and you should receive the bootloader start prompt in your UART console (check the baudrate!). + + + +<<< +// #################################################################################################################### +:sectnums: +== General Software Framework Setup + +While your synthesis tool is crunching the NEORV32 HDL files, it is time to configure the project's software +framework for your processor hardware setup. + +[start=1] +. You need to tell the linker the actual size of the processor's instruction and data memories. This always has to be in sync +with the *hardware memory configuration* (done in section <<_general_hardware_setup>>). +. Open the NEORV32 linker script `sw/common/neorv32.ld` with a text editor. Right at the +beginning of the linker script you will find the **MEMORY** configuration showing two regions: `rom` and `ram` + +.Cut-out of the linker script `neorv32.ld`: Memory configuration +[source,c] +---- +MEMORY +{ + rom (rx) : ORIGIN = DEFINED(make_bootloader) ? 0xFFFF0000 : 0x00000000, LENGTH = DEFINED(make_bootloader) ? 
4*1024 : 16*1024 # <1> + ram (rwx) : ORIGIN = 0x80000000, LENGTH = 8*1024 # <2> +} +---- +<1> Size of internal instruction memory (IMEM): 16kB +<2> Size of internal data memory (DMEM): 8kB + +[WARNING] +The `rom` region provides conditional assignments (via the _make_bootloader_ symbol) for the _origin_ +and the _length_ configuration depending on whether the executable is built as normal application (for the IMEM) or +as bootloader code (for the BOOTROM). To modify the IMEM configuration of the `rom` region, +make sure to **only edit the right-most values** for `ORIGIN` and `LENGTH` (marked with notes "1" and "2"). + +[start=3] +. There are four parameters that are relevant here (only the right-most value for the `rom` section): The _origin_ +and the _length_ of the instruction memory (region name `rom`) and the _origin_ and the _length_ of the data +memory (region name `ram`). These four parameters always have to be in sync with your hardware memory +configuration as described in section <<_general_hardware_setup>>. + +[IMPORTANT] +The `rom` _ORIGIN_ parameter has to be equal to the `ispace_base_c` configuration constant +(default: 0x00000000) of the NEORV32 VHDL package (`rtl/core/neorv32_package.vhd`). The `ram` _ORIGIN_ parameter has to +be equal to the `dspace_base_c` configuration constant (default: 0x80000000) of the NEORV32 VHDL +package (`rtl/core/neorv32_package.vhd`). + +[IMPORTANT] +The `rom` _LENGTH_ and the `ram` _LENGTH_ parameters have to match the configured memory sizes. For +instance, if the system does not have any external memories connected, the `rom` _LENGTH_ parameter +has to be equal to the processor-internal IMEM size (defined via top's _MEM_INT_IMEM_SIZE_ generic) +and the `ram` _LENGTH_ parameter has to be equal to the processor-internal DMEM size (defined via top's +_MEM_INT_DMEM_SIZE_ generic). 
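How _ORIGIN_ and _LENGTH_ bound a memory region can be made concrete with a small host-side C sketch. It mirrors the default application values from the linker script cut-out above (`rom`: 16kB at 0x00000000, `ram`: 8kB at 0x80000000). The type and function names are hypothetical and only serve as illustration; they are not part of the NEORV32 software framework.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of one linker MEMORY region. */
typedef struct {
  const char *name;
  uint32_t origin; /* ORIGIN: base address */
  uint32_t length; /* LENGTH: size in bytes */
} mem_region_t;

/* Default application values taken from the linker script cut-out. */
static const mem_region_t regions[] = {
  { "rom", 0x00000000u, 16u * 1024u },
  { "ram", 0x80000000u,  8u * 1024u },
};

/* True if the byte range [addr, addr + size) lies completely inside r.
 * Written with subtractions only, so no 32-bit overflow can occur. */
bool region_contains(const mem_region_t *r, uint32_t addr, uint32_t size) {
  if (addr < r->origin) {
    return false;
  }
  uint32_t offset = addr - r->origin;
  return offset <= r->length && size <= r->length - offset;
}

/* Returns the name of the region holding the range, or NULL if none does. */
const char *region_of(uint32_t addr, uint32_t size) {
  for (size_t i = 0; i < sizeof(regions) / sizeof(regions[0]); i++) {
    if (region_contains(&regions[i], addr, size)) {
      return regions[i].name;
    }
  }
  return NULL;
}
```

With these values, a range that starts at the `rom` origin and is larger than 16kB is not contained in any region. This corresponds to an application that no longer fits into the configured instruction memory.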
+ + + +<<< +// #################################################################################################################### +:sectnums: +== Application Program Compilation + +[start=1] +. Open a terminal console and navigate to one of the project's example programs. For instance, navigate to the +simple `sw/example/blink_led` example program. This program uses the NEORV32 GPIO unit to display +an 8-bit counter on the lowest eight bits of the `gpio_o` output port. +. To compile the project and generate an executable, simply execute: + +[source,bash] +---- +neorv32/sw/example/blink_led$ make exe +---- + +[start=3] +. This will compile and link the application sources together with all the included libraries. At the end, +your application is transformed into an ELF file (`main.elf`). The *NEORV32 image generator* (in `sw/image_gen`) takes this file and creates a +final executable. The makefile will show the resulting memory utilization and the executable size: + +[source,bash] +---- +neorv32/sw/example/blink_led$ make exe +Memory utilization: + text data bss dec hex filename + 852 0 0 852 354 main.elf +Executable (neorv32_exe.bin) size in bytes: +864 +---- + +[start=4] +. That's it. The `exe` target has created the actual executable `neorv32_exe.bin` in the current +folder, which is ready to be uploaded to the processor via the bootloader's UART interface. + +[TIP] +The compilation process will also create a `main.asm` assembly listing file in the project directory, which +shows the actual assembly code of the complete application. + + + +<<< +// #################################################################################################################### +:sectnums: +== Uploading and Starting of a Binary Executable Image via UART + +You have just created the executable. Now it is time to upload it to the processor. There are basically two +options to do so. + +[TIP] +Executables can also be uploaded via the **on-chip debugger**. 
+See section <<_debugging_with_gdb>> for more information. + +**Option 1** + +The NEORV32 makefiles provide an upload target that allows you to upload an executable directly from the +command line. Reset the processor and execute: + +[source,bash] +---- +sw/example/blink_led$ make COM_PORT=/dev/ttyUSB1 upload +---- + +Replace `/dev/ttyUSB1` with the actual serial port you are using to communicate with the processor. You +might have to use `sudo make ...` if the targeted device requires elevated access rights. + + +**Option 2** + +The "better" option is to use a standard terminal program to upload an executable. This provides a more +comfortable way as you can directly interact with the bootloader console. Additionally, using a terminal program +also allows you to communicate directly with the uploaded application. + +[start=1] +. Connect the primary UART (UART0) interface of your FPGA board to a serial port of your +computer or use a USB-to-serial adapter. +. Start a terminal program. In this tutorial, I am using TeraTerm for Windows. You can download it from https://ttssh2.osdn.jp/index.html.en + +[WARNING] +Make sure your terminal program can transfer the executable in raw byte mode without any additional protocol overhead. + +[start=3] +. Open a connection to the corresponding serial port. Configure the terminal according to the +following parameters: + +* 19200 Baud +* 8 data bits +* 1 stop bit +* no parity bits +* no transmission/flow control protocol! (just raw byte mode) +* newline on `\r\n` (carriage return & newline) + +[start=4] +. Also make sure that single chars are transmitted without any consecutive "new line" or "carriage +return" commands (this is highly dependent on your terminal application of choice, TeraTerm only +sends the raw chars by default). +. Press the NEORV32 reset button to restart the bootloader. The status LED starts blinking and the +bootloader intro screen appears in your console. Hurry up and press any key (hit space!) 
to abort the +automatic boot sequence and to start the actual bootloader user interface console. + +.Bootloader console; aborted auto-boot sequence +[source,bash] +---- +<< NEORV32 Bootloader >> + +BLDV: Mar 23 2021 +HWV: 0x01050208 +CLK: 0x05F5E100 +USER: 0x10000DE0 +MISA: 0x40901105 +ZEXT: 0x00000023 +PROC: 0x0EFF0037 +IMEM: 0x00004000 bytes @ 0x00000000 +DMEM: 0x00002000 bytes @ 0x80000000 + +Autoboot in 8s. Press key to abort. +Aborted. + +Available commands: +h: Help +r: Restart +u: Upload +s: Store to flash +l: Load from flash +e: Execute +CMD:> +---- + +[start=6] +. Execute the "Upload" command by typing `u`. Now the bootloader is waiting for a binary executable +to be sent. + +[source,bash] +---- +CMD:> u +Awaiting neorv32_exe.bin... +---- + +[start=7] +. Use the "send file" option of your terminal program to transmit the previously generated binary executable `neorv32_exe.bin`. +. Again, make sure to transmit the executable in raw binary mode (no transfer protocol, no additional +headers). When using TeraTerm, select the "binary" option in the send file dialog. +. If everything went fine, OK will appear in your terminal: + +[source,bash] +---- +CMD:> u +Awaiting neorv32_exe.bin... OK +---- + +[start=10] +. The executable now resides in the instruction memory of the processor. To execute the program right +now run the "Execute" command by typing `e`: + +[source,bash] +---- +CMD:> u +Awaiting neorv32_exe.bin... OK +CMD:> e +Booting... +Blinking LED demo program +---- + +[start=11] +. Now you should see the LEDs counting. + + + +<<< +// #################################################################################################################### +:sectnums: +== Setup of a New Application Program Project + +Done with all the introduction tutorials and those example programs? Then it is time to start your own +application project! + +[start=1] +. 
The easiest way of creating a *new* project is to make a copy of an *existing* project (like the +`blink_led` project) inside the `sw/example` folder. This way, all file dependencies are kept and you can +start coding and compiling. +. If you want to place the project folder somewhere else you need to adapt the project's makefile. In +the makefile you will find a variable that keeps the relative or absolute path to the NEORV32 home +folder. Just modify this variable according to your new project's home location: + +[source,makefile] +---- +# Relative or absolute path to the NEORV32 home folder (use default if not set by user) +NEORV32_HOME ?= ../../.. +---- + +[start=3] +. If your project contains additional source files outside of the project folder, you can add them to the _APP_SRC_ variable: + +[source,makefile] +---- +# User's application sources (add additional files here) +APP_SRC = $(wildcard *.c) ../somewhere/some_file.c +---- + +[start=4] +. You also need to add the folder containing the include files of your new project to the _APP_INC_ variable (do not forget the `-I` prefix): + +[source,makefile] +---- +# User's application include folders (don't forget the '-I' before each entry) +APP_INC = -I . -I ../somewhere/include_stuff_folder +---- + +[start=5] +. If you feel like it, you can change the default optimization level: + +[source,makefile] +---- +# Compiler effort +EFFORT = -Os +---- + +[TIP] +All the assignments made to the makefile variables can also be done "inline" when invoking the makefile. For example: `$ make EFFORT=-Os clean_all exe` + + + + +<<< +// #################################################################################################################### +:sectnums: +== Enabling RISC-V CPU Extensions + +Whenever you enable/disable a RISC-V CPU extension via the according _CPU_EXTENSION_RISCV_x_ generic, you need to +adapt the toolchain configuration so the compiler can actually generate matching code for it. 
+ +To do so, open the makefile of your project (for example `sw/example/blink_led/makefile`) and scroll to the +"USER CONFIGURATION" section right at the beginning of the file. You need to modify the _MARCH_ variable and possibly +the _MABI_ variable according to your CPU hardware configuration. + +[source,makefile] +---- +# CPU architecture and ABI +MARCH = -march=rv32i # <1> +MABI = -mabi=ilp32 # <2> +---- +<1> MARCH = Machine architecture ("ISA string") +<2> MABI = Machine binary interface + +For example, when you enable the RISC-V `C` extension (16-bit compressed instructions) via the _CPU_EXTENSION_RISCV_C_ generic (set _true_) you need +to add the 'c' extension also to the _MARCH_ ISA string. + +You can also override the default _MARCH_ and _MABI_ configurations from the makefile when invoking the makefile: + +[source,bash] +---- +$ make MARCH=-march=rv32ic clean_all all +---- + +[NOTE] +The RISC-V ISA string (for _MARCH_) follows a certain canonical structure: +`rv32[i/e][m][a][f][d][g][q][c][b][v][n]...` For example `rv32imac` is valid while `rv32icma` is not valid. + + + + +<<< +// #################################################################################################################### +:sectnums: +== Building a Non-Volatile Application without External Boot Memory + +The primary purpose of the bootloader is to allow an easy and fast update of the current application. In particular, this is very handy +during the development stage of a project as you can upload modified programs at any time via the UART. +Maybe at some time your project has become mature and you want to actually _embed_ your processor +including the application. + +There are two options to provide _non-volatile_ storage of your application. The simplest (but also most constrained) one is to implement the IMEM +as true ROM to contain your program. 
The second option is to use an external boot memory - this concept is shown in a different section: +<<_programming_an_external_spi_flash_via_the_bootloader>>. + +Using the IMEM as ROM: + +* for this boot concept the bootloader is no longer required +* this concept only works for the internal IMEM (but can be extended to work with external memories coupled via the processor's bus interface) +* make sure that the memory components (like block RAM) the IMEM is mapped to support an initialization via the bitstream + +[start=1] +. At first, compile your application code by running the `make install` command: + +[source,bash] +---- +neorv32/sw/example/blink_led$ make install +Memory utilization: + text data bss dec hex filename + 852 0 0 852 354 main.elf +Executable (neorv32_exe.bin) size in bytes: +864 +Installing application image to ../../../rtl/core/neorv32_application_image.vhd +---- + +[start=2] +. The `install` target has created an executable, too, but this time also in the form of a VHDL memory +initialization file. During synthesis, this initialization will become part of the final FPGA bitstream, which +in turn initializes the IMEM's memory primitives. +. To allow a direct boot of this image without interference of the bootloader you _can_ deactivate the implementation of +the bootloader via the according top entity's generic: + +[source,vhdl] +---- +BOOTLOADER_EN => false, -- implement processor-internal bootloader? # <1> +---- +<1> Set to _false_ to make the CPU directly boot from the IMEM. In this case the BOOTROM is discarded from the design. + +[start=4] +. When the bootloader is deactivated, the according module (BOOTROM) is removed from the design and the CPU will start booting +at the base address of the instruction memory space (IMEM base address) so that the CPU directly executes your +application after reset. +. The IMEM could be still modified, since it is implemented as RAM by default, which might corrupt your +executable. 
To prevent this and to implement the IMEM as true ROM (and possibly saving some +more hardware resources), activate the "IMEM as ROM" feature using the processor's according top entity +generic: + +[source,vhdl] +---- +MEM_INT_IMEM_ROM => true, -- implement processor-internal instruction memory as ROM +---- + +[start=6] +. Perform a new synthesis and upload your bitstream. Your application code now resides unchangeable +in the processor's IMEM and is directly executed after reset. + + + + +<<< +// #################################################################################################################### +:sectnums: +== Customizing the Internal Bootloader + +The bootloader provides several configuration options to customize it for your specific applications. The +most important user-defined configuration options are available as C `#defines` right at the beginning of the +bootloader source code `sw/bootloader/bootloader.c`: + +.Cut-out from the bootloader source code `bootloader.c`: configuration parameters +[source,c] +---- +/** UART BAUD rate */ +#define BAUD_RATE (19200) +/** Enable auto-boot sequence if != 0 */ +#define AUTOBOOT_EN (1) +/** Time until the auto-boot sequence starts (in seconds) */ +#define AUTOBOOT_TIMEOUT 8 +/** Set to 0 to disable bootloader status LED */ +#define STATUS_LED_EN (1) +/** SPI_DIRECT_BOOT_EN: Define/uncomment to enable SPI direct boot */ +//#define SPI_DIRECT_BOOT_EN +/** Bootloader status LED at GPIO output port */ +#define STATUS_LED (0) +/** SPI flash boot image base address (warning! address might wrap-around!) 
*/ +#define SPI_FLASH_BOOT_ADR (0x00800000) +/** SPI flash chip select line at spi_csn_o */ +#define SPI_FLASH_CS (0) +/** Default SPI flash clock prescaler */ +#define SPI_FLASH_CLK_PRSC (CLK_PRSC_8) +/** SPI flash sector size in bytes (default = 64kb) */ +#define SPI_FLASH_SECTOR_SIZE (64*1024) +/** ASCII char to start fast executable upload process */ +#define FAST_UPLOAD_CMD '#' +---- + +**Changing the Default Size of the Bootloader ROM** + +The NEORV32 default bootloader uses 4kB of storage. This is also the default size of the BOOTROM memory component. +If your new/modified bootloader exceeds this size, you need to modify the boot ROM configuration. + +[start=1] +. Open the processor's main package file `rtl/core/neorv32_package.vhd` and edit the +`boot_size_c` constant according to your requirements. The boot ROM size must not exceed 32kB +and should be a power of two (for optimal hardware mapping). + +[source,vhdl] +---- +-- Bootloader ROM -- +constant boot_size_c : natural := 4*1024; -- bytes +---- + +[start=2] +. Now open the NEORV32 linker script `sw/common/neorv32.ld` and adapt the _LENGTH_ parameter +of the `rom` according to your new memory size. `boot_size_c` and the `rom` _LENGTH_ attribute must always be +identical. Do **not modify** the _ORIGIN_ of the `rom` section. + +[source,c] +---- +MEMORY +{ + rom (rx) : ORIGIN = DEFINED(make_bootloader) ? 0xFFFF0000 : 0x00000000, LENGTH = DEFINED(make_bootloader) ? 4*1024 : 16*1024 # <1> + ram (rwx) : ORIGIN = 0x80000000, LENGTH = 8*1024 +} +---- +<1> Bootloader ROM default size = 4*1024 bytes (**left** value) + +[IMPORTANT] +The `rom` region provides conditional assignments (via symbol `make_bootloader`) for the origin +and the length depending on whether the executable is built as normal application (for the IMEM) or +as bootloader code (for the BOOTROM). To modify the BOOTLOADER memory size, make +sure to edit the first (left) value for the length (note "1"). 
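The two size rules stated above (at most 32kB, preferably a power of two) can be checked before touching `boot_size_c` and the linker script. The following is only a minimal shell sketch of that check. The function name `check_boot_size` and its messages are hypothetical and not part of the NEORV32 makefiles.

```shell
#!/bin/sh
# Hypothetical helper (not part of NEORV32): check a candidate boot ROM size
# against the rules from this section. Accepts an expression such as '4*1024'.
check_boot_size() {
  size=$(( $1 ))
  if [ "$size" -le 0 ]; then
    echo "invalid size"
    return 1
  fi
  if [ "$size" -gt $(( 32 * 1024 )) ]; then
    echo "too large (limit is 32kB)"
    return 1
  fi
  # A power of two has exactly one bit set, so size & (size - 1) is zero.
  if [ $(( size & (size - 1) )) -ne 0 ]; then
    echo "ok (not a power of two)"
  else
    echo "ok"
  fi
}

check_boot_size '4*1024'  # default boot ROM size, prints: ok
check_boot_size '8*1024'  # prints: ok
```

A size above the 32kB limit makes the function return a non-zero status, so it could also be used as a guard in a build script.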
+ +**Re-Compiling and Re-Installing the Bootloader** + +Whenever you have modified the bootloader you need to recompile and re-install it and re-synthesize your design. + +[start=1] +. Compile and install the bootloader using the explicit `bootloader` makefile target. + +[source,bash] +---- +neorv32/sw/bootloader$ make bootloader +---- + +[start=2] +. Now perform a new synthesis / HDL compilation to update the bitstream with the new bootloader +image (some synthesis tools also allow updating only the BRAM initialization without re-running +the entire synthesis process). + +[NOTE] +The bootloader is intended to work regardless of the actual NEORV32 hardware configuration – +especially when it comes to CPU extensions. Hence, the bootloader should be built using the +minimal `rv32i` ISA only (`rv32e` would be even better). + + + + +<<< +// #################################################################################################################### +:sectnums: +== Programming an External SPI Flash via the Bootloader + +As described in section https://stnolting.github.io/neorv32/#_external_spi_flash_for_booting[Documentation: External SPI Flash for Booting] +the bootloader provides an option to store an application image to an external SPI flash +and to read this image back for booting. These steps show how to store a + +[start=1] +. At first, reset the NEORV32 processor and wait until the bootloader start screen appears in your terminal program. +. Abort the auto boot sequence and start the user console by pressing any key. +. Press `u` to upload the program image that you want to store to the external flash: + +[source] +---- +CMD:> u +Awaiting neorv32_exe.bin... +---- + +[start=4] +. Send the binary in raw binary mode via your terminal program. 
When the upload is completed and "OK" +appears, press `p` to trigger the programming of the flash (do not execute the image via the `e` +command as this might corrupt the image): + +[source] +---- +CMD:> u +Awaiting neorv32_exe.bin... OK +CMD:> p +Write 0x000013FC bytes to SPI flash @ 0x00800000? (y/n) +---- + +[start=5] +. The bootloader shows the size of the executable and the base address inside the SPI flash where the +executable is going to be stored. A prompt appears: Type `y` to start the programming or type `n` to +abort. See section <<_external_spi_flash_for_booting>> for more information on how to configure the base address. + +[source] +---- +CMD:> u +Awaiting neorv32_exe.bin... OK +CMD:> p +Write 0x000013FC bytes to SPI flash @ 0x00800000? (y/n) y +Flashing... OK +CMD:> +---- + +[start=6] +. If "OK" appears in the terminal line, the programming process was successful. Now you can use the +auto boot sequence to automatically boot your application from the flash at system start-up without +any user interaction. + + + +<<< +// #################################################################################################################### +:sectnums: +== Simulating the Processor + +**Testbench** + +The NEORV32 project features a simple default testbench (`sim/neorv32_tb.vhd`) that can be used to simulate +and test the processor setup. This testbench features a 100MHz clock and enables all optional peripheral and +CPU extensions except for the `E` extension and the TRNG IO module (that CANNOT be simulated due to its +combinatorial (looped) oscillator architecture). + +The simulation setup is configured via the "User Configuration" section located right at the beginning of +the testbench's architecture. Each configuration constant provides comments to explain the functionality. + +Besides the actual NEORV32 Processor, the testbench also simulates "external" components that are connected +to the processor's external bus/memory interface. 
These components are: + +* an external instruction memory (that also allows booting from it) +* an external data memory +* an external memory to simulate "external IO devices" +* a memory-mapped register to trigger the processor's interrupt signals + +The following table shows the base addresses of these four components and their default configuration and +properties (attributes: `r` = read, `w` = write, `e` = execute, `a` = atomic accesses possible, `8` = byte-accessible, `16` = +half-word-accessible, `32` = word-accessible). + +.Testbench: processor-external memories +[cols="^4,>3,^5,<11"] +[options="header",grid="rows"] +|======================= +| Base address | Size | Attributes | Description +| `0x00000000` | `imem_size_c` | `r/w/e, a, 8/16/32` | external IMEM (initialized with application image) +| `0x80000000` | `dmem_size_c` | `r/w/e, a, 8/16/32` | external DMEM +| `0xf0000000` | 64 bytes | `r/w/e, !a, 8/16/32` | external "IO" memory, atomic accesses will fail +| `0xff000000` | 4 bytes | `-/w/-, a, -/-/32` | memory-mapped register to trigger "machine external", "machine software" and "SoC Fast Interrupt" interrupts +|======================= + +The simulated NEORV32 does not use the bootloader and directly boots the current application image (from +the `rtl/core/neorv32_application_image.vhd` image file). Make sure to use the `all` target of the +makefile to install your application as VHDL image after compilation: + +[source, bash] +---- +sw/example/blink_led$ make clean_all all +---- + +.Simulation-Optimized CPU/Processors Modules +[NOTE] +The `sim/rtl_modules` folder provides simulation-optimized versions of certain CPU/processor modules. +These alternatives can be used to replace the default CPU/processor HDL files to allow faster/easier/more +efficient simulation. 
**These files are not intended for synthesis!** + +**Simulation Console Output** + +Data written to the NEORV32 UART0 / UART1 transmitter is sent to a virtual UART receiver implemented +as part of the testbench. Received chars are sent to the simulator console and are also stored to a log file +(`neorv32.testbench_uart0.out` for UART0, `neorv32.testbench_uart1.out` for UART1) inside the simulator home folder. + +**Faster Simulation Console Output** + +When printing data via the UART the communication speed will always be based on the configured BAUD +rate. For a simulation this might take some time. To have faster output you can enable the **simulation mode** +of UART0/UART1 (see section https://stnolting.github.io/neorv32/#_primary_universal_asynchronous_receiver_and_transmitter_uart0[Documentation: Primary Universal Asynchronous Receiver and Transmitter (UART0)]). + +ASCII data sent to UART0 will be immediately printed to the simulator console. Additionally, the +ASCII data is logged in a file (`neorv32.uart0.sim_mode.text.out`) in the simulator home folder. All +written 32-bit data is also dumped as 8-char hexadecimal value into a file +(`neorv32.uart0.sim_mode.data.out`) also in the simulator home folder. + +ASCII data sent to UART1 will be immediately printed to the simulator console. Additionally, the +ASCII data is logged in a file (`neorv32.uart1.sim_mode.text.out`) in the simulator home folder. All +written 32-bit data is also dumped as 8-char hexadecimal value into a file +(`neorv32.uart1.sim_mode.data.out`) also in the simulator home folder. + +You can "automatically" enable the simulation mode of UART0/UART1 when compiling an application. In this case the +"real" UART0/UART1 transmitter unit is permanently disabled. 
To enable the simulation mode just compile +and install your application and add _UART0_SIM_MODE_ for UART0 and/or _UART1_SIM_MODE_ for UART1 to +the compiler's _USER_FLAGS_ variable (do not forget the `-D` prefix): + +[source, bash] +---- +sw/example/blink_led$ make USER_FLAGS+=-DUART0_SIM_MODE clean_all all +---- + +The provided define will change the default UART0/UART1 setup function in order to set the simulation mode flag in the according UART's control register. + +[NOTE] +The UART simulation output (to file and to screen) outputs "complete lines" at once. A line is +completed with a line feed (newline, ASCII `\n` = 10). + +**Simulation with Xilinx Vivado** + +The project features a default Vivado simulation waveform configuration in `sim/vivado`. + +**Simulation with GHDL** + +To simulate the processor using _GHDL_ navigate to the `sim` folder and run the provided shell script. All arguments are passed to GHDL. +For example the simulation time can be configured using `--stop-time=4ms` as argument. + +[source, bash] +---- +neorv32/sim$ sh ghdl_sim.sh --stop-time=4ms +---- + + + +<<< +// #################################################################################################################### +:sectnums: +== Building the Documentation + +The documentation is written using `asciidoc`. The according source files can be found in `docs/...`. +The documentation of the software framework is written _in-code_ using `doxygen`. + +A makefile in the project's root directory is provided to either build all of the documentation as HTML pages +or as PDF documents. + +[TIP] +Pre-rendered PDFs are available online as nightly pre-releases: https://github.com/stnolting/neorv32/releases. +The HTML-based documentation is also available online at the project's https://stnolting.github.io/neorv32/[GitHub Pages]. + +The makefile provides a help target to show all available build options and their according outputs. 
+ +[source,bash] +---- +neorv32$ make help +---- + +.Example: Generate HTML documentation (data sheet) using `asciidoctor` +[source,bash] +---- +neorv32$ make html +---- + +[TIP] +If you don't have `asciidoctor` / `asciidoctor-pdf` installed, you can still generate all the documentation using +a _docker container_ via `make container`. + + + +// #################################################################################################################### +:sectnums: +== Building the Project Documentation + + + +<<< +// #################################################################################################################### +:sectnums: +== FreeRTOS Support + +A NEORV32-specific port and a simple demo for FreeRTOS (https://github.com/FreeRTOS/FreeRTOS) are +available in the `sw/example/demo_freeRTOS` folder. + +See the according documentation (`sw/example/demo_freeRTOS/README.md`) for more information. + + + +// #################################################################################################################### +:sectnums: +== RISC-V Architecture Test Framework + +The NEORV32 Processor passes the according tests provided by the official RISC-V Architecture Test Suite +(V2.0+), which is available online at GitHub: https://github.com/riscv/riscv-arch-test + +All files required for executing the test framework on a simulated instance of the processor (including port +files) are located in the `riscv-arch-test` folder in the root directory of the NEORV32 repository. Take a +look at the provided `riscv-arch-test/README.md` (https://github.com/stnolting/neorv32/blob/master/riscv-arch-test/README.md[online at GitHub]) +file for more information on how to run the tests and how testing is conducted in detail. 
+ + + +<<< +// #################################################################################################################### +:sectnums: +== Debugging using the On-Chip Debugger + +The NEORV32 https://stnolting.github.io/neorv32/#_on_chip_debugger_ocd[Documentation: On-Chip Debugger] +allows _online_ in-system debugging via an external JTAG access port from a +host machine. The general flow is independent of the host machine's operating system. However, this tutorial uses +Windows and Linux (Ubuntu on Windows) in parallel. + +[NOTE] +This tutorial uses `gdb` to **directly upload an executable** to the processor. If you are using the default +processor setup _with_ internal instruction memory (IMEM) make sure it is implemented as RAM +(_MEM_INT_IMEM_ROM_ generic = false). + + +:sectnums: +=== Hardware Requirements + +Make sure the on-chip debugger of your NEORV32 setup is implemented (_ON_CHIP_DEBUGGER_EN_ generic = true). +Connect a JTAG adapter to the NEORV32 `jtag_*` interface signals. If you do not have a full-scale JTAG adapter, you can +also use an FTDI-based adapter like the "FT2232H-56Q Mini Module", which is a simple and inexpensive FTDI breakout board. + +.JTAG pin mapping +[cols="^3,^2,^2"] +[options="header",grid="rows"] +|======================= +| NEORV32 top signal | JTAG signal | FTDI port +| `jtag_tck_i` | TCK | D0 +| `jtag_tdi_i` | TDI | D1 +| `jtag_tdo_o` | TDO | D2 +| `jtag_tms_i` | TMS | D3 +| `jtag_trst_i` | TRST | D4 +|======================= + +[TIP] +The low-active JTAG _test reset_ (TRST) signal is _optional_ as a reset can also be triggered via the TAP controller. +If TRST is not used make sure to pull the signal _high_. + + +:sectnums: +=== OpenOCD + +The NEORV32 on-chip debugger can be accessed using the https://github.com/riscv/riscv-openocd[RISC-V port of OpenOCD]. +Prebuilt binaries can be obtained - for example - from https://www.sifive.com/software[SiFive]. 
A pre-configured +OpenOCD configuration file (`sw/openocd/openocd_neorv32.cfg`) is available that allows easy access to the NEORV32 CPU. + +[NOTE] +You might need to adapt `ftdi_vid_pid`, `ftdi_channel` and `ftdi_layout_init` in `sw/openocd/openocd_neorv32.cfg` +according to your interface chip and your operating system. + +[TIP] +If you want to modify the JTAG clock speed (via `adapter speed` in `sw/openocd/openocd_neorv32.cfg`) make sure to meet +the clock requirements noted in https://stnolting.github.io/neorv32/#_debug_module_dm[Documentation: Debug Transport Module (DTM)]. + +To access the processor using OpenOCD, open a terminal and start OpenOCD with the pre-configured configuration file. + +.Connecting via OpenOCD (on Windows) +[source, bash] +-------------------------- +N:\Projects\neorv32\sw\openocd>openocd -f openocd_neorv32.cfg +Open On-Chip Debugger 0.11.0-rc1+dev (SiFive OpenOCD 0.10.0-2020.12.1) +Licensed under GNU GPL v2 +For bug reports: + https://github.com/sifive/freedom-tools/issues +1 +Info : Listening on port 6666 for tcl connections +Info : Listening on port 4444 for telnet connections +Info : clock speed 1000 kHz +Info : JTAG tap: neorv32.cpu tap/device found: 0x0cafe001 (mfg: 0x000 (), part: 0xcafe, ver: 0x0) +Info : datacount=1 progbufsize=2 +Info : Disabling abstract command reads from CSRs. +Info : Examined RISC-V core; found 1 harts +Info : hart 0: XLEN=32, misa=0x40801105 +Info : starting gdb server for neorv32.cpu.0 on 3333 +Info : Listening on port 3333 for gdb connections +-------------------------- + +OpenOCD has successfully connected to the NEORV32 on-chip debugger and has examined the CPU (showing the content of +the `misa` CSR). Now you can use `gdb` to connect via port 3333. + + +:sectnums: +=== Debugging with GDB + +This guide uses the simple "blink example" from `sw/example/blink_led` as simplified test application to +show the basics of in-system debugging. + +At first, the application needs to be compiled. 
We will use the minimal machine architecture configuration +(`rv32i`) here to be independent of the actual processor/CPU configuration. +Navigate to `sw/example/blink_led` and compile the application: + +.Compile the test application +[source, bash] +-------------------------- +.../neorv32/sw/example/blink_led$ make MARCH=-march=rv32i clean_all all +-------------------------- + +This will generate an ELF file `main.elf` that contains all the symbols required for debugging. +Furthermore, an assembly listing file `main.asm` is generated that we will use to define breakpoints. + +Open another terminal in `sw/example/blink_led` and start `gdb`. +The GNU debugger is part of the toolchain (see <<_toolchain_setup>>). + +.Starting GDB (on Linux (Ubuntu on Windows)) +[source, bash] +-------------------------- +.../neorv32/sw/example/blink_led$ riscv32-unknown-elf-gdb +GNU gdb (GDB) 10.1 +Copyright (C) 2020 Free Software Foundation, Inc. +License GPLv3+: GNU GPL version 3 or later +This is free software: you are free to change and redistribute it. +There is NO WARRANTY, to the extent permitted by law. +Type "show copying" and "show warranty" for details. +This GDB was configured as "--host=x86_64-pc-linux-gnu --target=riscv32-unknown-elf". +Type "show configuration" for configuration details. +For bug reporting instructions, please see: +. +Find the GDB manual and other documentation resources online at: + . + +For help, type "help". +Type "apropos word" to search for commands related to "word". +(gdb) +-------------------------- + +Now connect to OpenOCD using the default port 3333 on your local machine. +Set the ELF file we want to debug to the recently generated `main.elf` from the `blink_led` example. +Finally, upload the program to the processor. + +[NOTE] +The executable that is uploaded to the processor is **not** the default NEORV32 executable (`neorv32_exe.bin`) that +is used for uploading via the bootloader. 
Instead, all the required sections (like `.text`) are extracted from `main.elf` +by GDB and uploaded via the debugger's indirect memory access. + +.Running GDB +[source, bash] +-------------------------- +(gdb) target remote localhost:3333 <1> +Remote debugging using localhost:3333 +warning: No executable has been specified and target does not support +determining executable automatically. Try using the "file" command. +0xffff0c94 in ?? () <2> +(gdb) file main.elf <3> +A program is being debugged already. +Are you sure you want to change the file? (y or n) y +Reading symbols from main.elf... +(gdb) load <4> +Loading section .text, size 0xd0c lma 0x0 +Loading section .rodata, size 0x39c lma 0xd0c +Start address 0x00000000, load size 4264 +Transfer rate: 43 KB/sec, 2132 bytes/write. +(gdb) +-------------------------- +<1> Connect to OpenOCD +<2> The CPU was still executing code from the bootloader ROM - but that does not matter here +<3> Select `main.elf` from the `blink_led` example +<4> Upload the executable + +After the upload, GDB will make the processor jump to the beginning of the uploaded executable +(by default, this is the beginning of the instruction memory at `0x00000000`) skipping the bootloader +and halting the CPU right before executing the `blink_led` application. + + +:sectnums: +==== Breakpoint Example + +The following steps are just a small showcase that illustrates a simple debugging scheme. + +While compiling `blink_led`, an assembly listing file `main.asm` was generated. +Open this file with a text editor to check out what the CPU is going to do when resumed. + +The `blink_led` example implements a simple counter on the 8 lowest GPIO output ports. The program uses +a "busy wait" to have a visible delay between increments. This waiting is done by calling the `neorv32_cpu_delay_ms` +function. We will add a _breakpoint_ right at the end of this wait function so we can step through the iterations +of the counter. 
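As an alternative to searching the listing by hand, the lookup can be scripted. The following is a minimal sketch, not part of the NEORV32 software framework: it assumes that `main.asm` uses the usual objdump-style layout (a `<label>:` line followed by `address: opcode mnemonic` lines), and the helper name `find_instructions` and the embedded sample text are hypothetical and only serve the illustration.

```python
import re

# Label line, e.g. "00000688 <__neorv32_cpu_delay_ms_end>:"
LABEL_RE = re.compile(r"^\s*([0-9a-fA-F]+)\s+<([^>]+)>:\s*$")
# Instruction line, e.g. " 690: 00008067 ret" (address, opcode, mnemonic)
INSN_RE = re.compile(r"^\s*([0-9a-fA-F]+):\s+([0-9a-fA-F]+)\s+(\S+)")


def find_instructions(listing, mnemonic):
    """Return (label, address) pairs for every instruction with this mnemonic.

    'label' is the most recent "<name>:" line seen above the instruction,
    or None if the instruction appears before any label.
    """
    hits = []
    label = None
    for line in listing.splitlines():
        match = LABEL_RE.match(line)
        if match:
            label = match.group(2)
            continue
        match = INSN_RE.match(line)
        if match and match.group(3) == mnemonic:
            hits.append((label, int(match.group(1), 16)))
    return hits


# Hypothetical sample in the assumed listing format; in practice the text
# would be read from the generated main.asm file instead.
SAMPLE = """\
00000688 <__neorv32_cpu_delay_ms_end>:
 688: 01c12083 lw ra,28(sp)
 68c: 02010113 addi sp,sp,32
 690: 00008067 ret
"""

# Print one GDB breakpoint command per matching instruction.
for label, addr in find_instructions(SAMPLE, "ret"):
    print(f"b * {addr:#x}")
```

The script only prints candidate `b * <address>` commands. A real listing contains many `ret` instructions, so the label still has to be checked to pick the one that belongs to the delay function.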
+ +.Cut-out from `main.asm` generated from the `blink_led` example +[source, assembly] +-------------------------- +00000688 <__neorv32_cpu_delay_ms_end>: + 688: 01c12083 lw ra,28(sp) + 68c: 02010113 addi sp,sp,32 + 690: 00008067 ret +-------------------------- + +The very last instruction of the `neorv32_cpu_delay_ms` function is `ret` (= return) +at address `0x690` in this example. Add this address as a _breakpoint_ in GDB. + +[NOTE] +The address might be different if you use a different version of the software framework or +if different ISA options are configured. + +.Adding a GDB breakpoint +[source, bash] +-------------------------- +(gdb) b * 0x690 +Breakpoint 1 at 0x690 +-------------------------- + +Now execute `c` (= continue). The CPU will resume operation until it hits the breakpoint. +This way, we can "step" from increment to increment. + +.Iterating from breakpoint to breakpoint +[source, bash] +-------------------------- +Breakpoint 1 at 0x690 +(gdb) c +Continuing. + +Breakpoint 1, 0x00000690 in neorv32_cpu_delay_ms () +(gdb) c +Continuing. + +Breakpoint 1, 0x00000690 in neorv32_cpu_delay_ms () +(gdb) c +Continuing. +-------------------------- + +include::../legal.adoc[] Index: userguide/index.adoc =================================================================== --- userguide/index.adoc (nonexistent) +++ userguide/index.adoc (revision 60) @@ -0,0 +1,32 @@ += The NEORV32 RISC-V Processor: User Guide +:title: [User Guide] The NEORV32 RISC-V Processor +:author: Dipl.-Ing. Stephan Nolting +:email: stnolting@gmail.com +:description: A size-optimized, customizable and open-source full-scale 32-bit RISC-V soft-core CPU and SoC written in platform-independent VHDL. 
+:revnumber: v1.5.6.0 +:doctype: book +:sectnums: +:icons: font +:imagesdir: ../img +:stem: +:reproducible: +:listing-caption: Listing +:toc: left +:toclevels: 4 +:title-logo-image: neorv32_logo_dark.png[pdfwidth=6.25in,align=center] +:favicon: ../img/icon.png + +image::neorv32_logo_transparent.png[align=center] + +image::riscv_logo.png[width=350,align=center] + +[.text-center] +https://github.com/stnolting/neorv32[image:https://img.shields.io/badge/GitHub-stnolting%2Fneorv32-ffbd00?style=flat-square&logo=github&[title='homepage']] +https://github.com/stnolting/neorv32/blob/master/LICENSE[image:https://img.shields.io/github/license/stnolting/neorv32?longCache=true&style=flat-square[title='license']] +https://github.com/stnolting/neorv32/releases/tag/nightly[image:https://img.shields.io/badge/data%20sheet-PDF-ffbd00?longCache=true&style=flat-square&logo=asciidoctor[title='datasheet (pdf)']] +https://stnolting.github.io/neorv32[image:https://img.shields.io/badge/-HTML-ffbd00?longCache=true&style=flat-square[title='datasheet (html)']] +https://github.com/stnolting/neorv32/releases/tag/nightly[image:https://img.shields.io/badge/user%20guide-PDF-ffbd00?longCache=true&style=flat-square&logo=asciidoctor[title='userguide (pdf)']] +https://stnolting.github.io/neorv32/sw/files.html[image:https://img.shields.io/badge/doxygen-HTML-ffbd00?longCache=true&style=flat-square&logo=Doxygen[title='doxygen']] + + +include::content.adoc[] Index: userguide/main.adoc =================================================================== --- userguide/main.adoc (nonexistent) +++ userguide/main.adoc (revision 60) @@ -0,0 +1,34 @@ += The NEORV32 RISC-V Processor: User Guide +:author: Dipl.-Ing. Stephan Nolting +:email: stnolting@gmail.com +:description: A size-optimized, customizable and open-source full-scale 32-bit RISC-V soft-core CPU and SoC written in platform-independent VHDL. 
+:revnumber: v1.5.6.0 +:doctype: book +:sectnums: +:icons: image +:iconsdir: ../icons +:imagesdir: ../figures +:stem: +:reproducible: +:listing-caption: Listing +:toc: macro +:toclevels: 4 +:title-logo-image: image:neorv32_logo_dark.png[pdfwidth=6.25in,align=center] +// Uncomment next line to set page size (default is A4) +//:pdf-page-size: Letter + + +<<< +// #################################################################################################################### +.**Documentation** +[TIP] +The online documentation of the project (a.k.a. the **data sheet**) is available on GitHub-pages: https://stnolting.github.io/neorv32/ + + + +The online documentation of the **software framework** is also available on GitHub-pages: https://stnolting.github.io/neorv32/sw/files.html + + +<<< +// #################################################################################################################### +toc::[] + +include::content.adoc[] Index: legal.adoc =================================================================== --- legal.adoc (nonexistent) +++ legal.adoc (revision 60) @@ -0,0 +1,112 @@ +<<< +:sectnums: +== Legal + +// #################################################################################################################### +:sectnums!: +=== License + +**BSD 3-Clause License** + +Copyright (c) 2021, Stephan Nolting. All rights reserved. + +Redistribution and use in source and binary forms, with or without modification, are permitted provided that +the following conditions are met: + +. Redistributions of source code must retain the above copyright notice, this list of conditions and the +following disclaimer. +. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and +the following disclaimer in the documentation and/or other materials provided with the distribution. +. 
Neither the name of the copyright holder nor the names of its contributors may be used to endorse or +promote products derived from this software without specific prior written permission. + +THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" +AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE +IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE +ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE +LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR +CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF +SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS +INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN +CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) +ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + + +========================== +**The NEORV32 RISC-V Processor** + +Copyright (c) 2021, by Dipl.-Ing. Stephan Nolting. All rights reserved. + +HQ: https://github.com/stnolting/neorv32 + +Contact: stnolting@gmail.com + +_made in Hanover, Germany_ +========================== + + +<<< +// #################################################################################################################### +:sectnums!: +=== Proprietary Notice + +* "GitHub" is a subsidiary of Microsoft Corporation. +* "Vivado" and "Artix" are trademarks of Xilinx Inc. +* "AXI" and "AXI4-Lite" are trademarks of Arm Holdings plc. +* "ModelSim" is a trademark of Mentor Graphics – A Siemens Business. +* "Quartus Prime" and "Cyclone" are trademarks of Intel Corporation. +* "iCE40", "UltraPlus" and "Radiant" are trademarks of Lattice Semiconductor Corporation. +* "Windows" is a trademark of Microsoft Corporation. +* "Tera Term" copyright by T. Teranishi. +* Timing diagrams made with WaveDrom Editor. +* "NeoPixel" is a trademark of Adafruit Industries. +* Documentation made with `asciidoctor`. 
+ +PDF icons from https://www.flaticon.com and made by +link:https://www.freepik.com[Freepik], link:https://www.flaticon.com/authors/good-ware[Good Ware], +link:https://www.flaticon.com/authors/pixel-perfect[Pixel perfect], link:https://www.flaticon.com/authors/vectors-market[Vectors Market] + + +:sectnums!: +=== Disclaimer + +This project is released under the BSD 3-Clause license. No copyright infringement +intended. Other implied or used projects might have different licensing – see their documentation to get more information. + + +:sectnums!: +=== Limitation of Liability for External Links + +This document contains links to the websites of third parties ("external links"). As the content of these websites +is not under our control, we cannot assume any liability for such external content. In all cases, the provider of +information of the linked websites is liable for the content and accuracy of the information provided. At the +point in time when the links were placed, no infringements of the law were recognizable to us. As soon as an +infringement of the law becomes known to us, we will immediately remove the link in question. + + +:sectnums!: +=== Citing + +If you are using the NEORV32 or parts of the project in some kind of publication, please cite it as follows: + +.BibTeX +[source] +---- +@misc{nolting20, + author = {Nolting, S.}, + title = {The NEORV32 RISC-V Processor}, + year = {2020}, + publisher = {GitHub}, + journal = {GitHub repository}, + howpublished = {\url{https://github.com/stnolting/neorv32}} +} +---- + +:sectnums!: +=== Acknowledgments + +**A big shoutout to all https://github.com/stnolting/neorv32/graphs/contributors[contributors], +who helped improving this project! ❤️** + +https://riscv.org[RISC-V] - instruction sets want to be free! 
+ + + + + Index: neorv32-theme.yml =================================================================== --- neorv32-theme.yml (nonexistent) +++ neorv32-theme.yml (revision 60) @@ -0,0 +1,48 @@ +extends: default +page: + margin: [0.8in, 0.67in, 0.75in, 0.67in] +link: + font-color: #edac00 +image: + align: center +caption: + align: center +running-content: + start-at: toc +header: + height: 0.65in + vertical-align: bottom + image-vertical-align: bottom + font-size: 11 + border-color: #000000 + border-width: 1 + recto: + left: + content: '*The NEORV32 Processor*' + right: + content: '*Visit on https://github.com/stnolting/neorv32[GitHub]*' + verso: + left: + content: '*The NEORV32 Processor*' + right: + content: '*Visit on https://github.com/stnolting/neorv32[GitHub]*' +footer: + start-at: toc + height: 0.75in + font-size: 10 + border-color: #000000 + border-width: 1 + recto: + left: + content: '{page-number} / {page-count}' + center: + content: 'Copyright (c) 2021, Stephan Nolting. All rights reserved.' + right: + content: '{docdate}' + verso: + left: + content: '{page-number} / {page-count}' + center: + content: 'NEORV32 Version: {revnumber}' + right: + content: '{docdate}'

