OpenCores
URL https://opencores.org/ocsvn/darkriscv/darkriscv/trunk

Subversion Repositories darkriscv

[/] [darkriscv/] [trunk/] [README.md] - Diff between revs 2 and 4

Rev 2 Rev 4
Line 2... Line 2...
Opensource RISC-V implemented from scratch in one night!
Opensource RISC-V implemented from scratch in one night!
 
 
## Table of Contents
## Table of Contents
 
 
- [Introduction](#introduction)
- [Introduction](#introduction)
 
- [History](#history)
- [Project Background](#project-background)
- [Project Background](#project-background)
- [Directory Description](#directory-description)
- [Directory Description](#directory-description)
- ["src" Directory](#src-directory)
- ["src" Directory](#src-directory)
- ["sim" Directory](#sim-directory)
- ["sim" Directory](#sim-directory)
- ["rtl" Directory](#rtl-directory)
- ["rtl" Directory](#rtl-directory)
Line 21... Line 22...
 
 
Developed in a magic night of 19 Aug, 2018 between 2am and 8am, the
Developed in a magic night of 19 Aug, 2018 between 2am and 8am, the
*DarkRISCV* softcore started as a proof of concept for the opensource
*DarkRISCV* softcore started as a proof of concept for the opensource
RISC-V instruction set.
RISC-V instruction set.
 
 
 
Although the code is small and crude when compared with other RISC-V
 
implementations, the *DarkRISCV* has lots of impressive features:
 
 
 
- implements most of the RISC-V RV32E instruction set
 
- implements most of the RISC-V RV32I instruction set (missing csr*, e* and fence*)
 
- works up to 220MHz in a Kintex-7 and up to 100MHz in a cheap Spartan-6
 
- can sustain 1 clock per instruction most of the time
 
- flexible harvard architecture (easy to integrate a cache controller)
 
- works fine in real Xilinx, Altera and Lattice FPGAs
 
- works fine with gcc 9.0.0 for RISC-V (no patches required!)
 
- uses between 1000-1500LUTs (core only with LUT6 technology, depending on enabled features)
 
- optional RV32E support (works better with LUT4 FPGAs)
 
- optional 16x16-bit MAC instruction (for digital signal processing)
 
- optional coarse-grained multi-threading (MT)
 
- no interlock between pipeline stages!
 
- BSD license: can be used anywhere with no restrictions!
 
 
 
Some extra features are planned for the future or under development:
 
 
 
- interrupt controller (under tests)
 
- cache controller (under tests)
 
- gpio and timer (under tests)
 
- sdram controller w/ data scrambler
 
- branch predictor (under tests)
 
- ethernet controller (GbE)
 
- multi-processing (SMP)
 
- network on chip (NoC)
 
- rv64i support (not so easy as it appears...)
 
- dynamic bus sizing and big-endian support
 
- user/supervisor modes
 
- debug support
 
- misaligned memory access
 
- bridge for 8/16/32-bit buses
 
 
 
And many other features!
 
 
 
Feel free to make suggestions and good hacking! o/
 
 
 
## History
 
 
The initial concept was based on my other early 16-bit RISC processors and
The initial concept was based on my other early 16-bit RISC processors and
composed of a simplified two-stage pipeline, where an instruction is fetched
composed of a simplified two-stage pipeline, where an instruction is fetched
from an instruction memory in the first clock and then the instruction is
from an instruction memory in the first clock and then the instruction is
decoded/executed in the second clock.  The pipeline is overlapped without
decoded/executed in the second clock.  The pipeline is overlapped without
interlocks, in a way that the *DarkRISCV* can reach the performance of one
interlocks, in a way that the *DarkRISCV* can reach the performance of one
Line 38... Line 79...
obfuscated but beautiful Verilog code.  After lots of exciting sleepless
obfuscated but beautiful Verilog code.  After lots of exciting sleepless
nights of work and the help of lots of colleagues, the *DarkRISCV* reached a
nights of work and the help of lots of colleagues, the *DarkRISCV* reached a
very good quality result, in a way that the code compiled by the standard
very good quality result, in a way that the code compiled by the standard
GCC for RV32I worked fine.
GCC for RV32I worked fine.
 
 
Nowadays, after two years of development, a three stage pipeline working
After two years of development, a three stage pipeline working
with a single clock phase is also available, resulting in a better
with a single clock phase is also available, resulting in a better
distribution between the decode and execute stages.  In this case the
distribution between the decode and execute stages.  In this case the
instruction is fetched in the first clock from a blockram, decoded in the
instruction is fetched in the first clock from a blockram, decoded in the
second clock and executed in the third clock.
second clock and executed in the third clock.
 
 
Line 53... Line 94...
optimizations, but according to the latest measurements, the 3-stage
optimizations, but according to the latest measurements, the 3-stage
pipeline version can reach an instructions-per-clock (IPC) figure of 0.7, lower
pipeline version can reach an instructions-per-clock (IPC) figure of 0.7, lower
than the measured IPC of 0.85 in the case of the 2-stage pipeline version.
than the measured IPC of 0.85 in the case of the 2-stage pipeline version.
 
 
Anyway, with the 3-stage pipeline and some other expensive optimizations,
Anyway, with the 3-stage pipeline and some other expensive optimizations,
the *DarkRISCV* can reach 100MHz in a low-cost Spartan-6, which results in
the *DarkRISCV* can reach up to 100MHz in a low-cost Spartan-6, which results in
more performance when compared with the 2-stage pipeline version (typically
more performance when compared with the 2-stage pipeline version (typically
50MHz).
50MHz).
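
As a rough cross-check of this claim, effective throughput can be estimated as IPC multiplied by clock frequency. The sketch below uses only the figures quoted in this section (IPC 0.7 at 100MHz for the 3-stage pipeline, IPC 0.85 at 50MHz for the 2-stage pipeline). The `mips` helper is a hypothetical name introduced for illustration, and real results depend on the workload and on the core options.

```shell
#!/bin/sh
# Hypothetical helper: estimate millions of instructions per second
# as IPC * clock (in MHz). awk performs the floating-point multiply.
mips() {
    LC_ALL=C awk -v ipc="$1" -v mhz="$2" 'BEGIN { printf "%.1f\n", ipc * mhz }'
}

mips 0.7 100    # 3-stage pipeline at 100MHz (Spartan-6)
mips 0.85 50    # 2-stage pipeline at its typical 50MHz
```

With these inputs the estimate is 70.0 against 42.5, so the lower IPC of the 3-stage version is more than offset by the higher clock.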
 
 
Although the code is small and crude when compared with other RISC-V
 
implementations, the *DarkRISCV* has lots of impressive features:
 
 
 
- implements most of the RISC-V RV32I instruction set (missing csr*, e* and fence*)
 
- works up to 100MHz (Spartan-6) and sustains 1 clock per instruction most of the time
 
- flexible harvard architecture (easy to integrate a cache controller)
 
- works fine in real Xilinx and Lattice FPGAs
 
- works fine with gcc 9.0.0 for RISC-V (no patches required!)
 
- uses between 1000-1500LUTs, depending on enabled features (Xilinx LUT6)
 
- optional RV32E support (works better with LUT4 FPGAs)
 
- optional 16x16-bit MAC instruction (for signal processing)
 
- optional coarse-grained multi-threading (MT)
 
- no interlock between pipeline stages
 
- BSD license: can be used anywhere with no restrictions!
 
 
 
Some extra features are planned for the future or under development:
 
 
 
- interrupt controller (under tests)
 
- cache controller (under tests)
 
- gpio and timer (under tests)
 
- sdram controller w/ data scrambler
 
- branch predictor (under tests)
 
- ethernet controller (GbE)
 
- multi-processing (SMP)
 
- network on chip (NoC)
 
- rv64i support (not so easy as appears...)
 
- dynamic bus size and big-endian support
 
- user/supervisor modes
 
- debug support
 
 
 
And many other features!
 
 
 
Feel free to make suggestions and good hacking! o/
 
 
 
## Project Background
## Project Background
 
 
The main motivation for the *DarkRISCV* was to create a migration path for some
The main motivation for the *DarkRISCV* was to create a migration path for some
projects around the 680x0/Coldfire family.
projects around the 680x0/Coldfire family.
 
 
Line 163... Line 170...
Step 1: Clone the DarkRISCV repo to your local machine using the command below.
Step 1: Clone the DarkRISCV repo to your local machine using the command below.
git clone https://github.com/darklife/darkriscv.git
git clone https://github.com/darklife/darkriscv.git
 
 
Pre Setup Guide for MacOS:
Pre Setup Guide for MacOS:
 
 
The document covers all the dependencies and the steps to install them in order to successfully use the DarkRISCV ecosystem on MacOS.
The document covers all the dependencies and the steps to install them in
 
order to successfully use the DarkRISCV ecosystem on MacOS.
 
 
 
Essentially, the ecosystem cannot be used directly on MacOS because one of the
 
dependencies, the Xilinx ISE 14.7 Design Suite, currently does not support
 
MacOS.
 
 
Essentially, the ecosystem cannot be used directly on MacOS because one of the dependencies, the Xilinx ISE 14.7 Design Suite, currently does not support MacOS.
To overcome this issue, we need to run Linux/Windows on MacOS using one of
 
the two methods below:
 
 
To overcome this issue, we need to run Linux/Windows on MacOS using one of the two methods below:
a) WineSkin, which is a kind of Windows emulator that runs the Windows
 
application natively but intercepts and emulates the Windows calls to map
 
them directly onto macOS.
 
 
a) WineSkin, which is a kind of Windows emulator that runs the Windows application natively but intercepts and emulates the Windows calls to map them directly onto macOS.
b) VirtualBox (or VMware, Parallels, etc) in order to run a complete Windows
b) VirtualBox (or VMware, Parallels, etc) in order to run a complete Windows OS or Linux, which appears to be far better than the WineSkin option.
OS or Linux, which appears to be far better than the WineSkin option.
 
 
I used the second method and installed VMware Fusion to install Linux Mint. Please find below the links I used to obtain download files.
I used the second method and installed VMware Fusion to install Linux Mint.
 
Please find below the links I used to obtain download files.
 
 
Dependencies:
Dependencies:
 
 
1.  Icarus Verilog
1.  Icarus Verilog
a.  Bison
a.  Bison
Line 187... Line 203...
2.  Xilinx 14.7 ISE
2.  Xilinx 14.7 ISE
 
 
 
 
Icarus Verilog Setup:
Icarus Verilog Setup:
 
 
The steps have been condensed for the Linux operating system. Complete steps for all other OS platforms are available on https://iverilog.fandom.com/wiki/Installation_Guide.
The steps have been condensed for the Linux operating system.  Complete steps
 
for all other OS platforms are available on
Step 1: Download the Icarus Verilog tar file from ftp://ftp.icarus.com/pub/eda/verilog/ . Always install the latest version. Verilog-10.3 is the latest version as of now.
https://iverilog.fandom.com/wiki/Installation_Guide.
 
 
 
Step 1: Download the Icarus Verilog tar file from
 
ftp://ftp.icarus.com/pub/eda/verilog/ .  Always install the latest version.
 
Verilog-10.3 is the latest version as of now.
 
 
Step 2: Extract the tar file using ‘% tar -zxvf verilog-version.tar.gz’.
Step 2: Extract the tar file using ‘% tar -zxvf verilog-version.tar.gz’.
 
 
Step 3: Go to the Verilog folder using ‘cd verilog-version’. Here it is ‘cd verilog-10.3’.
Step 3: Go to the Verilog folder using ‘cd verilog-version’.  Here it is ‘cd
 
verilog-10.3’.
 
 
Step 4: Check if you have the following packages installed: Flex, Bison, g++ and gcc. If not, use ‘sudo apt-get install flex bison g++ gcc’ in a terminal to install them. Restart the system once for the changes to take effect.
Step 4: Check if you have the following packages installed: Flex, Bison,
 
g++ and gcc.  If not, use ‘sudo apt-get install flex bison g++ gcc’ in a
 
terminal to install them.  Restart the system once for the changes to take effect.
 
 
Step 5: Run the commands below in the verilog-10.3 directory:
Step 5: Run the commands below in the verilog-10.3 directory:
1.  ./configure
1.  ./configure
2.  make
2.  make
3.  sudo make install
3.  sudo make install
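
Steps 2 to 5 can be collected into a small script. This is a minimal sketch, assuming the tarball has already been downloaded to the current directory and that it extracts into a `verilog-<version>` directory. The `run` and `build_iverilog` functions and the `DRY_RUN` variable are hypothetical names introduced here, not part of the Icarus or DarkRISCV tooling.

```shell
#!/bin/sh
# Hypothetical wrapper around steps 2-5. With DRY_RUN=1 (the default)
# the commands are only printed, so the sketch can be inspected safely.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Extract, configure, build and install a given Icarus Verilog version.
build_iverilog() {
    version="$1"                                  # e.g. 10.3
    run tar -zxvf "verilog-$version.tar.gz" || return 1
    run cd "verilog-$version" || return 1
    run ./configure || return 1
    run make || return 1
    run sudo make install
}

build_iverilog 10.3
```

Setting `DRY_RUN=0` before running the script executes the commands instead of printing them. Note that shell commands are case-sensitive, so `make` and `sudo` must be typed in lowercase.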
Line 221... Line 244...
1.  For 64 bit architecture
1.  For 64 bit architecture
a.  sudo apt-get install libncurses5 libncursesw-dev
a.  sudo apt-get install libncurses5 libncursesw-dev
2.  For 32 bit architecture
2.  For 32 bit architecture
a.  sudo apt-get install libncurses5:i386
a.  sudo apt-get install libncurses5:i386
 
 
Once all prerequisites are installed, go to the root directory and run the commands below:
Once all prerequisites are installed, go to the root directory and run the
 
commands below:
 
 
cd darkriscv
cd darkriscv
make (use sudo if required)
make (use sudo if required)
 
 
 
 
Line 238... Line 262...
        CROSS = riscv32-embedded-elf
        CROSS = riscv32-embedded-elf
        CCPATH = /usr/local/share/gcc-$(CROSS)/bin/
        CCPATH = /usr/local/share/gcc-$(CROSS)/bin/
        ICARUS = /usr/local/bin/iverilog
        ICARUS = /usr/local/bin/iverilog
        BOARD  = avnet_microboard_lx9
        BOARD  = avnet_microboard_lx9
 
 
Just update the configuration according to your system configuration,
Just update the configuration according to your system configuration, type
type *make* and hope everything is in the correct location! You probably will
*make* and hope everything is in the correct location!  You probably will
need to fix some paths and set some others in the PATH environment variable, but
need to fix some paths and set some others in the PATH environment variable,
it will eventually work.
but it will eventually work.
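
Before running *make*, it can help to confirm that the configured tools actually exist. The following is a minimal sketch, assuming the default values quoted above and assuming the cross compiler is named `$(CROSS)-gcc` inside `CCPATH`. The `check_tool` helper is a hypothetical name introduced for illustration, and the paths should be replaced with those of your system.

```shell
#!/bin/sh
# Hypothetical pre-flight check for the paths set in the Makefile.
# The values mirror the defaults quoted above; adjust them as needed.
CROSS=riscv32-embedded-elf
CCPATH=/usr/local/share/gcc-$CROSS/bin/
ICARUS=/usr/local/bin/iverilog

# Report whether an expected executable exists; return non-zero if not.
check_tool() {
    if [ -x "$1" ]; then
        echo "ok: $1"
    else
        echo "missing: $1"
        return 1
    fi
}

status=0
check_tool "${CCPATH}${CROSS}-gcc" || status=1   # compiler name is an assumption
check_tool "$ICARUS" || status=1
[ "$status" -eq 0 ] || echo "fix the paths above before running make"
```

Any line reported as missing points at a variable that still needs to be updated in the configuration.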
 
 
And, when everything is correctly configured, the result will be something like this:
And, when everything is correctly configured, the result will be something
 
like this:
 
 
```$
```$
# make
# make
make -C src all             CROSS=riscv32-embedded-elf CCPATH=/usr/local/share/gcc-riscv32-embedded-elf/bin/ ARCH=rv32e HARVARD=1
make -C src all             CROSS=riscv32-embedded-elf CCPATH=/usr/local/share/gcc-riscv32-embedded-elf/bin/ ARCH=rv32e HARVARD=1
make[1]: Entering directory `/home/marcelo/Documents/Verilog/darkriscv/v38/src'
make[1]: Entering directory `/home/marcelo/Documents/Verilog/darkriscv/v38/src'
Line 798... Line 823...
 
 
And one number for speed grade 3 devices:
And one number for speed grade 3 devices:
 
 
- Kintex-7:     221MHz
- Kintex-7:     221MHz
 
 
Although Vivado is far slower and shows more pessimistic numbers for the same FPGAs when
Although Vivado is far slower and shows more pessimistic numbers for the same FPGAs
compared with ISE, I guess Vivado is more realistic and, at least, it supports the
when compared with ISE, I guess Vivado is more realistic and, at least, it
new Spartan-7, which shows very good numbers (almost the same as the Artix-7!).
supports the new Spartan-7, which shows very good numbers (almost the same
 
as the Artix-7!).
 
 
Those values are only for reference.  The real values depend on some options
Those values are only for reference.  The real values depend on some options
in the core, such as the number of pipeline stages, how the memories are
in the core, such as the number of pipeline stages, how the memories are
connected, etc.  Basically, the best clock is reached by the 3-stage
connected, etc.  Basically, the best clock is reached by the 3-stage
pipeline version (up to 100MHz in a Spartan-6), but it requires at least 1
pipeline version (up to 100MHz in a Spartan-6), but it requires at least 1
Line 1031... Line 1057...
- https://www.twitch.tv/videos/850859857 instruction decode and execute - part 1/3 (08h56)
- https://www.twitch.tv/videos/850859857 instruction decode and execute - part 1/3 (08h56)
- https://www.twitch.tv/videos/852082786 instruction decode and execute - part 2/3 (10h56)
- https://www.twitch.tv/videos/852082786 instruction decode and execute - part 2/3 (10h56)
- https://www.twitch.tv/videos/858055433 instruction decode and execute - part 3/3 - SoC simulation (10h24)
- https://www.twitch.tv/videos/858055433 instruction decode and execute - part 3/3 - SoC simulation (10h24)
- TBD tests in the Lattice FPGA
- TBD tests in the Lattice FPGA
 
 
Unfortunately the video set is currently in Portuguese only and there are a lot of
Unfortunately the video set is currently in Portuguese only and there are a lot
parallel discussions about technology, including the fixing of Teske's notebook
of parallel discussions about technology, including the fixing of Teske's
online! I hope in the future it will be possible to edit the video set and, maybe,
notebook online!  I hope in the future it will be possible to edit the video set
create English subtitles.
and, maybe, create English subtitles.
 
 
About the processor itself, it is a microcode oriented concept with a classic
About the processor itself, it is a microcode oriented concept with a
von Neumann architecture, designed to more easily support different ISAs. It is really
classic von Neumann architecture, designed to more easily support different
very different from the traditional RISC cores that we find around! Also, it includes
ISAs.  It is really very different from the traditional RISC cores that we
a very good eco-system around opensource tools, such as Icarus, Yosys and gtkWave!
find around!  Also, it includes a very good eco-system around opensource
 
tools, such as Icarus, Yosys and gtkWave!
 
 
Although not finished yet (95% done!), I think it is very illustrative about the RISC-V design:
Although not finished yet (95% done!), I think it is very illustrative about the RISC-V design:
 
 
- rv32e instruction set: very reduced (37) and very orthogonal bit patterns (6)
- rv32e instruction set: very reduced (37) and very orthogonal bit patterns (6)
- rv32e register set: 16x32-bit register bank and a 32-bit program counter
- rv32e register set: 16x32-bit register bank and a 32-bit program counter
- rv32e ALU with basic operations for reg/imm and reg/reg instructions
- rv32e ALU with basic operations for reg/imm and reg/reg instructions
- rv32e instruction decode: very simple to understand, very direct to implement
- rv32e instruction decode: very simple to understand, very direct to implement
- rv32e software support: the GCC support provides an easy way to generate code and test it!
- rv32e software support: the GCC support provides an easy way to generate code and test it!
 
 
Teske's proposal is not to design the fastest RISC-V core ever (we already have lots
Teske's proposal is not to design the fastest RISC-V core ever (we already
of faster cores with CPI ~ 1, such as the darkriscv, vexriscv, etc), but to create a clean,
have lots of faster cores with CPI ~ 1, such as the darkriscv, vexriscv,
reliable and comprehensible RISC-V core.
etc), but to create a clean, reliable and comprehensible RISC-V core.
 
 
You can check the code in the following repository:
You can check the code in the following repository:
 
 
- https://github.com/racerxdl/riskow
- https://github.com/racerxdl/riskow
 
 

© copyright 1999-2024 OpenCores.org, equivalent to Oliscience, all rights reserved. OpenCores®, registered trademark.