Thor Superscaler

Project maintainers

Finch, Robert

Details

Name: thor
Created: Dec 14, 2015
Updated: Apr 28, 2019
SVN Updated: May 11, 2019
Bugs: 0 reported / 0 solved


Other project properties

Category: Processor
Language: Verilog
Development status: Mature
Additional info:
WishBone compliant: No
WishBone version: n/a
License: LGPL

Description

Work is progressing on version seven of the core. Version eight has been shelved. There are many updates and improvements to the compiler and assembler. A number of changes to the ISA have been made. The docs are somewhat out of date with respect to the running target.

The core is now up to version eight, which is in the works. The author felt it might be a good idea to release a snapshot of some of the version seven files. The guts of the core are in the FT64.v file.
Version seven of the core is similar to version five, with some improvements. The register specifier fields in the instructions have been rearranged to make decoding more efficient. Version seven is able to clear the screen and boot into a small monitor program; it runs a little better than version five. Version seven allows speculative loads to be disabled and features bus randomization on failed speculative execution to help mitigate security and integrity issues.

These cores are up to version five now. The v5 core uses three instruction lengths (16, 32, and 48 bits) to reduce code footprint. v5 is very different from prior versions and is more configurable. A write buffer is now also available. The v5 core is substantially smaller than previous versions, largely due to a better decoder implementation.

Thor II / FT64 is a very different machine from Thor. It remains a two-way superscalar core.
- Thor II uses a fixed 32 bit instruction format
- no predicate registers
Thor Superscaler is a two-way superscalar processing core. On trivial test code it can execute up to two instructions per clock cycle, for a minimum CPI of 0.5.
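
The CPI figure follows directly from the issue width. As a minimal sketch (the function names are illustrative and not part of the project), the best case for a two-way core can be worked out like this:

```python
def cpi(cycles: int, instructions: int) -> float:
    """Cycles per instruction for a run of the given length."""
    if instructions <= 0:
        raise ValueError("instructions must be positive")
    return cycles / instructions


def min_cycles(instructions: int, issue_width: int = 2) -> int:
    """Fewest cycles possible when every cycle completes issue_width instructions."""
    return -(-instructions // issue_width)  # ceiling division


n = 1000
print(cpi(min_cycles(n), n))  # best case for a two-way core
```

Real code falls short of this bound whenever dependencies, cache misses, or branch mispredictions keep both issue slots from being filled.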

Features

FT64v5 features include
- 32 general purpose registers
- 32 floating point registers
- 32 vector registers, length 63
- instruction L1, L2, and data caches
- (2,2) correlating branch prediction with a branch target buffer (BTB)
- return address prediction (RSB)
- load speculation
- dual instruction decoders
- dual asymmetrical ALUs
- one floating-point unit
- one branch unit
- one memory unit (1-3 load channels)
- seven-entry write buffer
- precise interrupts and exceptions
- 16-bit compressed instruction set, plus 32- and 48-bit instruction forms
- vector and SIMD operations
- fine-grained SMT (simultaneous multi-threading)
- four operating levels
Approx. Size 61,000 LCs.
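
For readers unfamiliar with the "(2,2) correlating" entry in the list above, here is a minimal behavioural sketch in Python. It is a textbook-style model, not the core's RTL: the table size, the PC indexing, and all names are assumptions, and the BTB is not modelled.

```python
class CorrelatingPredictor:
    """(2,2) predictor: 2 bits of global history select one of four
    2-bit saturating counters in the row indexed by the branch address."""

    def __init__(self, index_bits: int = 6):  # table size is an assumption
        self.index_mask = (1 << index_bits) - 1
        self.history = 0  # outcomes of the last two branches, newest in bit 0
        # Counter values 0..3; 2 or 3 predicts taken. Start weakly not-taken.
        self.table = [[1] * 4 for _ in range(1 << index_bits)]

    def _row(self, pc: int):
        return self.table[(pc >> 2) & self.index_mask]

    def predict(self, pc: int) -> bool:
        return self._row(pc)[self.history] >= 2

    def update(self, pc: int, taken: bool) -> None:
        row = self._row(pc)
        count = row[self.history]
        row[self.history] = min(count + 1, 3) if taken else max(count - 1, 0)
        self.history = ((self.history << 1) | int(taken)) & 0b11


def count_misses(predictor, pc, outcomes):
    """Run a sequence of branch outcomes and count mispredictions."""
    misses = 0
    for taken in outcomes:
        if predictor.predict(pc) != taken:
            misses += 1
        predictor.update(pc, taken)
    return misses


predictor = CorrelatingPredictor()
alternating = [True, False] * 20  # a branch that flips direction every time
print(count_misses(predictor, 0x100, alternating))
```

A plain 2-bit counter keeps mispredicting a branch that alternates like this. With two bits of history the predictor settles on correct predictions after a short warm-up, which is the point of the correlating scheme.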

Thor II's features include
- 32 general purpose registers
- 32 floating point registers
- 32 vector registers, length 63
- instruction L1, L2, and data caches
- (2,2) correlating branch prediction with a branch target buffer (BTB)
- return address prediction (RSB)
- load speculation
- dual asymmetrical ALUs
- one floating-point unit
- one branch unit
- one memory unit
- precise interrupts and exceptions
- vector and SIMD operations
- fine-grained SMT (simultaneous multi-threading)
- eight operating levels
Approx. Size 210,000 LCs.
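
The return address prediction (RSB) entry in the lists above can be illustrated with a small stack model. This is a sketch of the general technique only. The depth and the overflow policy (drop the oldest entry) are assumptions, not details taken from the core.

```python
class ReturnStackBuffer:
    """Small stack of predicted return addresses."""

    def __init__(self, depth: int = 8):  # depth is an assumed value
        self.depth = depth
        self.stack = []

    def on_call(self, return_address: int) -> None:
        """Record the address following a call instruction."""
        if len(self.stack) == self.depth:
            self.stack.pop(0)  # full: discard the oldest entry
        self.stack.append(return_address)

    def predict_return(self):
        """Predicted target for a return, or None if the buffer is empty."""
        return self.stack.pop() if self.stack else None


rsb = ReturnStackBuffer(depth=2)
rsb.on_call(0x1004)
rsb.on_call(0x2008)
print(hex(rsb.predict_return()))  # the most recent call returns first
```

Calls and returns nest, so a stack predicts return targets well until the call depth exceeds the buffer depth.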

Thor's features include
- 64 general purpose registers
- loop counter
- 16 code address registers
- 8 segment registers
- 16 predicate registers (8 for the 32-bit version)
- debug registers
- predicated instruction execution
- instruction and data caches
- (2,2) correlating branch prediction
- load speculation
- string instructions
- dual asymmetrical ALUs
- one memory unit
- precise interrupts and exceptions
- two operating modes (kernel and user)
Approx. Size 90,000 LCs (32-bit limited version)

Current Status

09/14/2018
FTv5 is still being developed and tested on an FPGA. The integer part of the design is able to run into the thousands of instructions in simulation. SMT is not tested. Floating point still needs to be adapted from v2.

Thor is still being developed and tested on an FPGA.
04/14/2016
A problem was found and fixed in which multiplies were being issued to ALU#1, which doesn't normally support them. In an FPGA, Thor now gets a couple of dozen lines further into the start-up code before crashing.

03/15/2016
Some more updates to the compiler. In theory there is now partial support for C++ style classes (single inheritance, no templates, and no virtual functions).

02/19/2016
The compiler and assembler have been updated. The assembler didn't support the cpuid instruction. The compiler was assigning global variables to registers in an unsafe fashion. In the RTL code, the cpuid instruction should have been restricted to executing only on ALU#0, but it wasn't.
The emulator has been updated as well.
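
The ALU-related fixes in these entries both concern issue restrictions on the asymmetrical ALUs. A rough sketch of that kind of check, in Python rather than the core's Verilog, might look like the following. The operation names and the set of ALU#0-only operations are illustrative and based only on the two cases mentioned in this log.

```python
# Operations assumed to be supported only by ALU#0 (from the status notes:
# multiplies and cpuid). The real RTL may restrict more than these.
ALU0_ONLY = {"mul", "cpuid"}


def eligible_alus(op: str):
    """ALU numbers that are allowed to execute the given operation."""
    return [0] if op in ALU0_ONLY else [0, 1]


def pick_alu(op: str, busy):
    """First free eligible ALU, or None if the instruction must wait."""
    for alu in eligible_alus(op):
        if alu not in busy:
            return alu
    return None


print(pick_alu("add", busy={0}))  # a simple op can fall through to ALU#1
print(pick_alu("mul", busy={0}))  # a multiply must wait for ALU#0
```

Issuing a restricted operation to the wrong unit, as in the bugs described above, corresponds to skipping the `eligible_alus` check.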

02/15/2016
The Thor emulator has most of the instruction set supported now. Some of the oddball instructions aren't supported yet. The emulator is capable of running the BIOS to the start-up menu. A few software bugs in the BIOS were fixed with the aid of the emulator. In the real FPGA the software hangs after clearing the green debug screen.

Thor is able to run for several thousand cycles before crashing.
01/05/2016
Several bugs have been fixed and Thor now runs significantly better.
The Thor compiler (C64) has been updated and an attempt is being made to get Thor to run compiled code.

A software emulator for Thor has been started. It is capable of running the first few lines of the boot ROM, but not much else.

Goals

A goal for Thor is to encompass every piece of a modern processor. As such,
Thor has superscalar operation and both segmentation and paging units.
Thor is much more complex than a bare-bones RISC processor; studying it is
not recommended without a good knowledge of CPU architecture.
Thor makes use of variable-length instructions in order to improve code
density. Even with a whole byte dedicated to predication, the average
instruction length is working out to be about 34 bits, just slightly larger
than a 32-bit RISC instruction. Thor's feature set means that it is a large
core. The plan is to make it even larger in the future. An eventual goal for
Thor is to support vector operations through the superscalar pipeline.

Some Eventual Goals:

A larger queue.
- A larger reorder queue may make better use of predication as more basic
blocks would fit into the queue. With Thor's current short queue an average
branch clears out only about 3 or 4 instructions. That means predication
is applied only for short instruction sequences which isn't all that useful.
The first step would be to double the queue length to 16 instructions from
eight.

Wide FP and ALU units.
- Wide functional units would allow processing multiple vector elements
and SIMD type instructions as single entities in the pipeline. Wide units
mean that it isn't necessary to increase the number of commit buses. Only
the bus width increases. 4x normal width would be a starting point.
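
To illustrate what treating several elements as a single entity means, here is a small Python model of a lane-wise add on one wide word. The 32-bit lane width is an assumption. The four lanes follow the "4x normal width" starting point mentioned above.

```python
LANE_BITS = 32  # assumed element width
LANES = 4       # "4x normal width"
LANE_MASK = (1 << LANE_BITS) - 1


def pack(elements):
    """Pack LANES elements into one wide word, element 0 in the low bits."""
    word = 0
    for i, value in enumerate(elements):
        word |= (value & LANE_MASK) << (i * LANE_BITS)
    return word


def unpack(word):
    """Split a wide word back into its LANES elements."""
    return [(word >> (i * LANE_BITS)) & LANE_MASK for i in range(LANES)]


def wide_add(a, b):
    """Add lane by lane; carries do not cross lane boundaries."""
    result = 0
    for i in range(LANES):
        shift = i * LANE_BITS
        lane_sum = ((a >> shift) & LANE_MASK) + ((b >> shift) & LANE_MASK)
        result |= (lane_sum & LANE_MASK) << shift
    return result


x = pack([1, 2, 3, 4])
y = pack([10, 20, 30, 40])
print(unpack(wide_add(x, y)))
```

The whole operation travels through the pipeline as one instruction with one wide result, which is why only the commit bus width needs to grow rather than the number of buses.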

More functional units.
- More functional units would allow more parallel processing potential. This
would have additional issue logic overhead.

Fetch, enque and commit of three or more instructions at a time.
- Thor currently processes a maximum of two instructions at a time.
Using a combination of more superscalar ways and wider
functional units, Thor would be able to process a number of vector
elements at once.

64 bit (or wider) datapath processing
- The current FPGA makes use of only 32 bit processing in order to conserve
resources. While there is an option to control the bus width, it hasn't
been tested significantly.

Multi-core processing

These changes would likely make Thor about an order of magnitude (10x-20x)
larger, or around 2M LCs.