FT816Float - Floating point accelerator

Name: ft816float
Created: Dec 9, 2014
Updated: Jun 3, 2023
SVN Updated: Jun 3, 2023
Bugs: 2 reported / 2 solved

Other project properties

Category: Arithmetic core
Development status: Alpha
Additional info:
WishBone compliant: No
WishBone version: n/a
License: LGPL


06/03/2023 - Added sine / cosine modules. Although they use 64-bit values, the results are only estimates and are not accurate to full precision, because the required tables must be generated with 128-bit FP, which has not been done yet.

12/14/2022 - Added 96-bit densely-packed-decimal floating-point modules. The DF Divide requires thousands of clock cycles to execute; it may be faster to implement division in software.

08/24/2022 - Found a bug in the floating-point modules having to do with denormal numbers and zero. Zero was treated as a denormal number, so the exponent was incorrectly incremented by one during calculations. The line looks like:

    wire [fp64Pkg::EMSB:0] xad = xa|adn; // operand a exponent, compensated for denormalized numbers

It should be:

    wire [fp64Pkg::EMSB:0] xad = xa|(adn&~az); // operand a exponent, compensated for denormalized numbers

This applies to addsub, multiply, divide and fma modules.
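
The effect of the fix can be illustrated with a small software model. This is only a sketch in Python, not the project's code: it assumes that adn is set when the exponent field is all zeros and that az is set when the operand is plus or minus zero, which is what the signal names and the description above suggest. The function names are hypothetical.

```python
# Hypothetical Python model of the fp64 exponent compensation (not the core's code).
EXP_BITS = 11   # fp64 exponent field width
MAN_BITS = 52   # fp64 stored significand width

def unpack_fields(bits: int):
    """Split a 64-bit pattern into (exponent field, mantissa field)."""
    xa = (bits >> MAN_BITS) & ((1 << EXP_BITS) - 1)
    ma = bits & ((1 << MAN_BITS) - 1)
    return xa, ma

def compensated_exponent(bits: int, fixed: bool = True) -> int:
    """Return xad, the exponent compensated for denormalized numbers."""
    xa, ma = unpack_fields(bits)
    adn = 1 if xa == 0 else 0                # assumed: exponent field is all zeros
    az = 1 if (xa == 0 and ma == 0) else 0   # assumed: operand is +/- zero
    if fixed:
        return xa | (adn & ~az & 1)          # corrected expression
    return xa | adn                          # original expression

ZERO = 0x0000000000000000
SMALLEST_DENORMAL = 0x0000000000000001
ONE = 0x3FF0000000000000

print(compensated_exponent(ZERO, fixed=False))   # original line bumps zero's exponent
print(compensated_exponent(ZERO, fixed=True))    # corrected line leaves it at zero
print(compensated_exponent(SMALLEST_DENORMAL))   # denormals are still compensated
```

Under these assumptions only the zero operand changes behaviour; denormal and normal operands produce the same xad with either expression.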

03/06/2022 - A bug was found in the carry chain for the Karatsuba version of multipliers. This showed up in about 1 in 5000 test cases. The bug has been fixed. Approximately 50,000 random test cases were run for the 128x128 multiplier without errors.
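
For readers unfamiliar with the technique, the sketch below shows a generic recursive Karatsuba multiply in Python together with a random test loop in the spirit of the testing described above. It is not the core's implementation and does not model its carry chain; the function names, the base width of 16 bits and the test count are all choices made for this illustration.

```python
import random

def karatsuba(a: int, b: int, width: int = 128, base_width: int = 16) -> int:
    """Multiply two non-negative integers using Karatsuba's three-multiply split."""
    if width <= base_width:
        return a * b                      # small enough: multiply directly
    half = width // 2
    mask = (1 << half) - 1
    a_hi, a_lo = a >> half, a & mask
    b_hi, b_lo = b >> half, b & mask
    z2 = karatsuba(a_hi, b_hi, half, base_width)   # high * high
    z0 = karatsuba(a_lo, b_lo, half, base_width)   # low * low
    # The sums may be one bit wider than 'half'; this carry is the part that
    # needs care in hardware. Python integers absorb it automatically.
    z1 = karatsuba(a_hi + a_lo, b_hi + b_lo, half, base_width) - z2 - z0
    return (z2 << (2 * half)) + (z1 << half) + z0

def run_random_tests(count: int, seed: int = 1) -> int:
    """Return the number of mismatches against Python's built-in multiply."""
    rng = random.Random(seed)
    mismatches = 0
    for _ in range(count):
        a = rng.getrandbits(128)
        b = rng.getrandbits(128)
        if karatsuba(a, b) != a * b:
            mismatches += 1
    return mismatches

print(run_random_tests(1000))
```

An error rate of roughly 1 in 5000 means a few hundred random cases can easily miss such a bug, which is why a run of tens of thousands of cases is the more meaningful check.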

01/14/2021 - Moving the decimal floating-point so that it uses an IEEE format. Added pack and unpack modules. Modules using IEEE format have the size appended to the module name as in

12/20/2020 - Decimal floating-point modules have been parameterized to allow the number of digits processed to vary. The default is to process 33 digits using a 152-bit format like the 128-bit format just with more digits. When moving data into or out of the decimal floating-point unit, a densely packed decimal format is used which allows packing the 33 digits into a 128-bit value.

12/17/2020 - A decimal floating-point adder/subtractor and multiplier were added. The DFP (decimal floating-point) format is not a standard one. It consists of 128 bits for the number. Digits are represented as packed BCD numbers. The low order 108 bits contain the significand of 27 decimal digits. The next 16 bits are an exponent which is a power of 10. Above that is an additional nybble that contains a NaN indicator, the sign of the number, an Infinity indicator, then the sign of the exponent.
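
The field layout described above can be sketched in Python as follows. Two points are assumptions, not taken from the project: the four flags are placed in the top nybble in the order listed (NaN in the most significant bit, exponent sign in the least), and the 16-bit exponent is treated as an opaque field because its encoding is not stated here. The function names are hypothetical.

```python
SIG_DIGITS = 27
SIG_BITS = 4 * SIG_DIGITS    # 108 bits of packed BCD significand
EXP_BITS = 16

def digits_to_bcd(digits: str) -> int:
    """Pack up to 27 decimal digits as BCD, one digit per nybble."""
    if len(digits) > SIG_DIGITS or any(c not in "0123456789" for c in digits):
        raise ValueError("expected at most 27 decimal digits")
    value = 0
    for ch in digits.rjust(SIG_DIGITS, "0"):
        value = (value << 4) | int(ch)
    return value

def bcd_to_digits(bcd: int) -> str:
    """Unpack 27 BCD nybbles into a 27-character digit string."""
    return "".join(str((bcd >> (4 * i)) & 0xF) for i in reversed(range(SIG_DIGITS)))

def pack_dfp(nan: int, sign: int, inf: int, exp_sign: int,
             exp_field: int, digits: str) -> int:
    """Assemble the 128-bit word; flag bit order is an assumption."""
    flags = (nan << 3) | (sign << 2) | (inf << 1) | exp_sign
    exp_part = (exp_field & ((1 << EXP_BITS) - 1)) << SIG_BITS
    return (flags << (SIG_BITS + EXP_BITS)) | exp_part | digits_to_bcd(digits)

def unpack_dfp(word: int):
    """Return (nan, sign, inf, exp_sign, exp_field, digits)."""
    flags = (word >> (SIG_BITS + EXP_BITS)) & 0xF
    exp_field = (word >> SIG_BITS) & ((1 << EXP_BITS) - 1)
    digits = bcd_to_digits(word & ((1 << SIG_BITS) - 1))
    return (flags >> 3) & 1, (flags >> 2) & 1, (flags >> 1) & 1, flags & 1, exp_field, digits

word = pack_dfp(0, 1, 0, 0, 0x0002, "125")
print(hex(word))
print(unpack_dfp(word))
```

The widths add up as stated: 108 + 16 + 4 = 128 bits.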

Decimal Floating-Point Format

11/01/2020 - A fully pipelined posit multiplier was added. It is hand pipelined and able to issue every other clock with a latency of 13 clock cycles.

10/30/2020 - Posit to integer converter added. Note that some modules are under the BSD3 license.

10/16/2020 - Posit divider added. The results were compared with results from the PACoGen project. They sometimes differ in the last bit. This is due to a more accurate lookup table used for the Newton-Raphson (NR) iterations.

04/16/2020 - added some support for posit numbers. (see

07/06/2019 - Updated the square root core to allow restarting the calculation any time load is active.

06/14/2019 - Updates have been made to improve the accuracy of the cores. The normalizer needed an extra bit for results generated by the FMA. Please be wary about use. The author has limited testing resources.

06/11/2019 - Added a fused multiplier-adder (FMA) with a latency of 17, also commonly called a MAC. Testing shows that the output is sometimes off by one in the LSB versus the results generated by a program on the workstation. It could have to do with the way the test data was generated. It seems there may also be a pipeline skew issue with the core. It's supposed to be able to process a new set of operands every clock cycle, but works better if the latency expires first.

06/09/2019 - Added an adder/subtractor with a latency of 10. This adder/subtractor should be capable of a much higher clock frequency than the current adder/subtractor, which has a latency of two.

06/06/2019 - Reciprocal estimate function added. The estimate is accurate to about eight bits. It uses a piecewise linear approximation from a 1024 entry lookup table, then interpolates. Comparing the results of the reciprocal generated by the workstation (fpRes_tv.txt) and simulation results (fpRes_tvo.txt) is a bit tricky. For some reason the order of the output is a little scrambled. Also added reciprocal square root estimate and sigmoid function estimates.
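
The table-plus-interpolation idea can be sketched in Python as below. This is an illustration only: it works on a significand in the range [1, 2) using double-precision arithmetic, so it does not reproduce the core's fixed-point widths, its table contents, or its stated accuracy. The table holds one extra endpoint entry so the last segment can be interpolated; the names are hypothetical.

```python
TABLE_SIZE = 1024

# Reciprocal sampled at the breakpoints 1 + i/1024 for i = 0..1024.
RECIP_TABLE = [1.0 / (1.0 + i / TABLE_SIZE) for i in range(TABLE_SIZE + 1)]

def reciprocal_estimate(m: float) -> float:
    """Estimate 1/m for a significand m in [1, 2) by linear interpolation."""
    if not 1.0 <= m < 2.0:
        raise ValueError("significand must be in [1, 2)")
    pos = (m - 1.0) * TABLE_SIZE
    index = int(pos)          # which table segment m falls in
    frac = pos - index        # position inside that segment, 0 <= frac < 1
    y0 = RECIP_TABLE[index]
    y1 = RECIP_TABLE[index + 1]
    return y0 + (y1 - y0) * frac

for m in (1.0, 1.25, 1.5, 1.999):
    est = reciprocal_estimate(m)
    print(m, est, abs(est - 1.0 / m))
```

A full reciprocal would also negate the exponent and renormalize; that step is omitted here.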

10/10/2018 - Goldschmidt divider added
The floating-point divider may now use a Goldschmidt divider. This is the divider used in some modern microprocessors and it converges in very few clock cycles (around six clocks).

fpDiv – when using the Goldschmidt divider, the divider result is sometimes off by 1 in the least significant bit. It may be too high but is usually too low if it’s off. This is due to the fact that only four extra bits are being calculated in the divider. Increasing the number of extra bits calculated would help to obtain a match to the workstation results. The Goldschmidt divider may not be suitable for an FPGA implementation. It uses a fair number of resources in an FPGA in part due to the need for two wide single cycle multiply operations.
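
As a rough illustration of how Goldschmidt division converges, here is a floating-point sketch in Python. It is not the fpDiv implementation: it has no seed table, no extra guard bits, and the default of six iterations is simply enough for double precision when starting from a denominator scaled into [0.5, 1). The function name is hypothetical. Each iteration performs two multiplies by the same factor, which corresponds to the two wide multiply operations mentioned above.

```python
import math

def goldschmidt_divide(n: float, d: float, iterations: int = 6) -> float:
    """Approximate n / d by repeatedly scaling numerator and denominator."""
    if d == 0.0:
        raise ZeroDivisionError("division by zero")
    sign = -1.0 if (n < 0.0) != (d < 0.0) else 1.0
    n, d = abs(n), abs(d)
    mant, exp = math.frexp(d)      # d = mant * 2**exp with 0.5 <= mant < 1
    num = math.ldexp(n, -exp)      # scale the numerator by the same power of two
    den = mant
    for _ in range(iterations):
        f = 2.0 - den              # correction factor
        num *= f                   # first multiply
        den *= f                   # second multiply; den approaches 1
    return sign * num              # once den is ~1, num is ~n/d

print(goldschmidt_divide(300.0, 25.0))
print(goldschmidt_divide(-1.0, 4.0))
```

If the denominator is written as 1 - e, each iteration replaces e with e squared, so the number of correct bits roughly doubles per step.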

02/09/2018 - a fix was made to the position of the sticky bit
02/05/2018 - Some more testing has been done
About 8,000 single precision random test values have been fed into fpMul, fpAddsub, and fpDiv, and the output checked against the output produced by a desktop workstation.
Results match in many cases but differ in others.
- Underflow output isn't the same; results for very small numbers might be off.

12/09/2016 - Some rudimentary testing has been done on the fp units at 128 bit and 80 bit precision. It correctly calculates the following:
10.0 + 10.0 = 20.
10.0 * 10.0 = 100.
300.0 / 25.0 = 12.
1.0 + 1.0 = 2
1.0 + 0.0 = 1
1.0 - 1.0/65536 = 0.99998474121095

7/10/2016 - This project is now a bit of a misnomer because it includes cores for IEEE compatible operations as well as the original FT816 core. Rather than start another project I just decided to lump the cores together in this one. FT816Float.v is the original unit which shouldn't require any other modules to use.
added missing redor64 function for floating point unit

3/24/2016 - Added FloatToInt and IntToFloat cores with single cycle latency

The FT816 floating point accelerator consists of two ninety-six bit floating point accumulators between which floating point or fixed point operations occur. Basic operations include ADD, SUB, MUL, DIV, FIX2FLT, FLT2FIX, SWAP, NEG and ABS. The floating point accumulators operate as a memory mapped device placed by default between $FEA200 and $FEA2FF. The floating point accelerator communicates through a byte wide data port and a twenty-four bit address port. It was intended primarily for use with smaller byte oriented CPUs like the 65xx and 68xx series, in order to provide them with some floating point capability.