Opensource RISC-V implemented from scratch in one night!

## Table of Contents

- [Introduction](#introduction)
- [History](#history)
- [Project Background](#project-background)
- [Directory Description](#directory-description)
- ["src" Directory](#src-directory)
- ["sim" Directory](#sim-directory)
- ["rtl" Directory](#rtl-directory)

[...]

Developed in a magic night of 19 Aug, 2018 between 2am and 8am, the
*DarkRISCV* softcore started as a proof of concept for the opensource
RISC-V instruction set.

Although the code is small and crude when compared with other RISC-V
implementations, the *DarkRISCV* has lots of impressive features:

- implements most of the RISC-V RV32E instruction set
- implements most of the RISC-V RV32I instruction set (missing csr*, e* and fence*)
- works up to 220MHz in a Kintex-7 and up to 100MHz in a cheap Spartan-6
- can sustain 1 clock per instruction most of the time
- flexible Harvard architecture (easy to integrate a cache controller)
- works fine in real Xilinx, Altera and Lattice FPGAs
- works fine with gcc 9.0.0 for RISC-V (no patches required!)
- uses between 1000-1500 LUTs (core only with LUT6 technology, depending on enabled features)
- optional RV32E support (works better with LUT4 FPGAs)
- optional 16x16-bit MAC instruction (for digital signal processing)
- optional coarse-grained multi-threading (MT)
- no interlock between pipeline stages!
- BSD license: can be used anywhere with no restrictions!

Some extra features are planned for the future or under development:

- interrupt controller (under tests)
- cache controller (under tests)
- gpio and timer (under tests)
- sdram controller w/ data scrambler
- branch predictor (under tests)
- ethernet controller (GbE)
- multi-processing (SMP)
- network on chip (NoC)
- rv64i support (not as easy as it appears...)
- dynamic bus sizing and big-endian support
- user/supervisor modes
- debug support
- misaligned memory access
- bridge for 8/16/32-bit buses

And many other features!

Feel free to make suggestions and good hacking! o/

## History

The initial concept was based on my other early 16-bit RISC processors and
composed of a simplified two-stage pipeline, where an instruction is fetched
from an instruction memory in the first clock and then the instruction is
decoded/executed in the second clock. The pipeline is overlapped without
interlocks, in a way that the *DarkRISCV* can reach the performance of one

[...]

obfuscated but beautiful Verilog code. After lots of exciting sleepless
nights of work and the help of lots of colleagues, the *DarkRISCV* reached a
very good quality result, in a way that the code compiled by the standard
GCC for RV32I worked fine.

After two years of development, a three-stage pipeline working
with a single clock phase is also available, resulting in a better
distribution between the decode and execute stages. In this case the
instruction is fetched in the first clock from a blockram, decoded in the
second clock and executed in the third clock.

[...]

optimizations, but according to the latest measurements, the 3-stage
pipeline version can reach 0.7 instructions per clock (IPC), smaller
than the measured IPC of 0.85 in the case of the 2-stage pipeline version.

Anyway, with the 3-stage pipeline and some other expensive optimizations,
the *DarkRISCV* can reach up to 100MHz in a low-cost Spartan-6, which results in
more performance when compared with the 2-stage pipeline version (typically
50MHz).

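
As a rough sanity check of the trade-off above, throughput can be estimated as IPC multiplied by the clock frequency. The sketch below uses only the figures quoted in this section (IPC 0.7 at 100MHz for the 3-stage pipeline, IPC 0.85 at a typical 50MHz for the 2-stage pipeline). The helper name `throughput_mips` is hypothetical and not part of the project, and the estimate ignores memory wait states and other system effects:

```python
def throughput_mips(ipc: float, clock_mhz: float) -> float:
    """Estimate millions of instructions per second as IPC x clock (MHz)."""
    return ipc * clock_mhz


# Figures quoted in the text above; the helper name is hypothetical.
three_stage = throughput_mips(ipc=0.7, clock_mhz=100.0)
two_stage = throughput_mips(ipc=0.85, clock_mhz=50.0)

print(f"3-stage: {three_stage:.1f} MIPS")
print(f"2-stage: {two_stage:.1f} MIPS")
print(f"speedup: {three_stage / two_stage:.2f}x")
```

Under these assumptions the 3-stage version comes out at roughly 70 MIPS against roughly 42.5 MIPS for the 2-stage version, so the higher clock more than compensates for the lower IPC.
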
## Project Background

The main motivation for the *DarkRISCV* was to create a migration path for some
projects around the 680x0/Coldfire family.

[...]

Step 1: Clone the DarkRISCV repo to your local machine using the command below.

    git clone https://github.com/darklife/darkriscv.git

Pre Setup Guide for MacOS:

This document covers all the dependencies, and the steps to install them,
needed to use the DarkRISCV ecosystem on MacOS.

Essentially, the ecosystem cannot be used directly on MacOS because one of the
dependencies, the Xilinx ISE 14.7 Design Suite, currently does not support
MacOS.

To overcome this issue, we need to run Linux/Windows on MacOS
using one of the two methods below:

a) WineSkin, which is a kind of Windows emulator that runs the Windows
application natively but intercepts and emulates the Windows calls to map
them directly to macOS.

b) VirtualBox (or VMware, Parallels, etc) in order to run a complete Windows
OS or Linux, which appears to be far better than the WineSkin option.

I used the second method and installed VMware Fusion to install Linux Mint.
Please find below the links I used to obtain the download files.

Dependencies:

1. Icarus Verilog
   a. Bison

[...]

2. Xilinx 14.7 ISE

Icarus Verilog Setup:

The steps have been condensed for the Linux operating system. Complete steps
for all other OS platforms are available at
https://iverilog.fandom.com/wiki/Installation_Guide.

Step 1: Download the Icarus Verilog tar file from
ftp://ftp.icarus.com/pub/eda/verilog/ . Always install the latest version.
Verilog-10.3 is the latest version as of now.

Step 2: Extract the tar file using ‘% tar -zxvf verilog-version.tar.gz’.

Step 3: Go to the Verilog folder using ‘cd verilog-version’. Here it is cd
verilog-10.3.

Step 4: Check if you have the following tools installed: Flex, Bison,
g++ and gcc. If not, use ‘sudo apt-get install flex bison g++ gcc’ in a
terminal to install them. Restart the system once for the changes to take effect.

Step 5: Run the commands below in the verilog-10.3 directory:

1. ./configure
2. make
3. sudo make install

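
The check in Step 4 can be automated before running `./configure`. The following is a minimal sketch, not part of the DarkRISCV or Icarus Verilog distributions: the helper name `missing_tools` is hypothetical, `make` is added to the list as an assumption because Step 5 relies on it, and the suggested `apt-get` line assumes a Debian-like system where the package names match the tool names:

```python
import shutil


def missing_tools(tools):
    """Return the tools from `tools` that are not found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


# Tools named in Step 4, plus make (an assumption: Step 5 relies on it).
required = ["flex", "bison", "g++", "gcc", "make"]
missing = missing_tools(required)

if missing:
    print("Missing:", " ".join(missing))
    print("Try: sudo apt-get install " + " ".join(missing))
else:
    print("All build dependencies found.")
```
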

[...]

1. For 64-bit architecture
   a. sudo apt-get install libncurses5 libncursesw-dev
2. For 32-bit architecture
   a. sudo apt-get install libncurses5:i386

Once all prerequisites are installed, go to the root directory and run the
commands below:

    cd darkriscv
    make (use sudo if required)

[...]

    CROSS = riscv32-embedded-elf
    CCPATH = /usr/local/share/gcc-$(CROSS)/bin/
    ICARUS = /usr/local/bin/iverilog
    BOARD = avnet_microboard_lx9

Just update the configuration according to your system configuration, type
*make* and hope everything is in the correct location! You will probably
need to fix some paths and set some others in the PATH environment variable,
but it will eventually work.

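
Before typing *make*, it can help to see what the configured paths expand to and whether they exist. The sketch below is only an illustration under stated assumptions: it parses the four example settings shown above with a simplified expansion of `$(NAME)` references (real `make` expansion rules are richer), and it assumes the cross compiler binary is named `$(CROSS)-gcc` inside `CCPATH`, which is a common toolchain convention rather than something this document states. The names `CONFIG_TEXT` and `parse_config` are hypothetical:

```python
import os
import re

# Example values copied from the configuration shown above.
CONFIG_TEXT = """
CROSS = riscv32-embedded-elf
CCPATH = /usr/local/share/gcc-$(CROSS)/bin/
ICARUS = /usr/local/bin/iverilog
BOARD = avnet_microboard_lx9
"""


def parse_config(text):
    """Parse `KEY = value` lines and expand $(KEY) references to earlier keys."""
    values = {}
    for line in text.splitlines():
        if "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        # Replace each $(NAME) with an already-parsed value, if there is one.
        value = re.sub(
            r"\$\((\w+)\)",
            lambda m: values.get(m.group(1), m.group(0)),
            value,
        )
        values[key] = value
    return values


config = parse_config(CONFIG_TEXT)
# Assumption: the compiler follows the usual <prefix>-gcc naming.
gcc = os.path.join(config["CCPATH"], config["CROSS"] + "-gcc")

for label, path in [("compiler", gcc), ("simulator", config["ICARUS"])]:
    status = "ok" if os.path.exists(path) else "NOT FOUND"
    print(f"{label}: {path} [{status}]")
```

Any path reported as not found is a candidate for one of the fixes mentioned above.
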

And, when everything is correctly configured, the result will be something
like this:

```$
# make
make -C src all CROSS=riscv32-embedded-elf CCPATH=/usr/local/share/gcc-riscv32-embedded-elf/bin/ ARCH=rv32e HARVARD=1
make[1]: Entering directory `/home/marcelo/Documents/Verilog/darkriscv/v38/src'
[...]
```

[...]

And one number for speed grade 3 devices:

- Kintex-7: 221MHz

Although Vivado is far slower and shows pessimistic numbers for the same FPGAs
when compared with ISE, I guess Vivado is more realistic and, at least, it
supports the new Spartan-7, which shows very good numbers (almost the same
as the Artix-7!).

These values are only for reference. The real values depend on some options
in the core, such as the number of pipeline stages, how the memories are
connected, etc. Basically, the best clock is reached by the 3-stage
pipeline version (up to 100MHz in a Spartan-6), but it requires at least 1

[...]

- https://www.twitch.tv/videos/850859857 instruction decode and execute - part 1/3 (08h56)
- https://www.twitch.tv/videos/852082786 instruction decode and execute - part 2/3 (10h56)
- https://www.twitch.tv/videos/858055433 instruction decode and execute - part 3/3 - SoC simulation (10h24)
- TBD tests in the Lattice FPGA

Unfortunately, the video set is currently in Portuguese only and there are a lot
of parallel discussions about technology, including the fix of Teske's
notebook online! I hope that in the future it will be possible to edit the video set
and, maybe, create English subtitles.

About the processor itself, it is a microcode-oriented concept with a
classic von Neumann architecture, designed to more easily support different
ISAs. It is really very different from the traditional RISC cores that we
find around! Also, it includes a very good ecosystem around opensource
tools, such as Icarus, Yosys and gtkWave!

Although not finished yet (95% done!), I think it is very illustrative of the RISC-V design:

- rv32e instruction set: very reduced (37) and very orthogonal bit patterns (6)
- rv32e register set: 16x32-bit register bank and a 32-bit program counter
- rv32e ALU with basic operations for reg/imm and reg/reg instructions
- rv32e instruction decode: very simple to understand, very direct to implement
- rv32e software support: the GCC support provides an easy way to generate code and test it!

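
To make the point about orthogonal bit patterns and simple decoding more concrete, the sketch below extracts the fixed-position fields of a 32-bit instruction word, following the base RISC-V encoding. It is an illustration only and not code from the riskow or DarkRISCV repositories; the function names `decode_fields` and `imm_i` are hypothetical:

```python
def decode_fields(word: int) -> dict:
    """Split a 32-bit RISC-V instruction word into its fixed-position fields."""
    return {
        "opcode": word & 0x7F,          # bits 6:0
        "rd": (word >> 7) & 0x1F,       # bits 11:7
        "funct3": (word >> 12) & 0x07,  # bits 14:12
        "rs1": (word >> 15) & 0x1F,     # bits 19:15
        "rs2": (word >> 20) & 0x1F,     # bits 24:20
        "funct7": (word >> 25) & 0x7F,  # bits 31:25
    }


def imm_i(word: int) -> int:
    """Sign-extended 12-bit immediate of an I-type instruction (bits 31:20)."""
    imm = (word >> 20) & 0xFFF
    return imm - 0x1000 if imm & 0x800 else imm


# addi x1, x0, 5 encodes as 0x00500093 (opcode 0x13, I-type).
word = 0x00500093
fields = decode_fields(word)
print(hex(fields["opcode"]), fields["rd"], fields["rs1"], imm_i(word))
```

Because the register fields sit at the same bit positions in every format that uses them, a decoder can extract them before it knows the instruction type. In RV32E only register numbers 0 to 15 are valid, matching the 16-entry register bank listed above.
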

Teske's proposal is not to design the fastest RISC-V core ever (we already
have lots of faster cores with CPI ~ 1, such as the darkriscv, vexriscv,
etc), but to create a clean, reliable and comprehensible RISC-V core.

You can check the code in the following repository:

- https://github.com/racerxdl/riskow