Name: double_fpu

Created: Jan 7, 2009

Updated: Jul 2, 2017

SVN Updated: Mar 10, 2009

Bugs: 1 reported / 0 solved

Category: Arithmetic core

Language: Verilog

Development status: Alpha

Additional info: Design done

WishBone compliant: No

WishBone version: n/a

License: LGPL

- The unit is designed to be synchronous to one global clock. All registers are updated on the rising edge of the clock.

- All registers can be reset with one global reset.

- The multiply operation is broken up to take advantage of the 25 x 18 multipliers in the Virtex5 DSP48E slices. The 25 x 18 two's complement multiplier can perform a 24 x 17 unsigned multiply, so it takes 9 DSP48E slices to perform the 53 x 53 bit multiply (52 stored fraction bits plus the implicit leading 1) required to multiply two double-precision floating point numbers.

- fpu_double.v is the top-level module. The input signals are:

- 1) clk

- 2) rst

- 3) enable

- 4) rmode (rounding mode)

- 5) fpu_op (operation code)

- 6) opa (64-bit floating point number)

- 7) opb (64-bit floating point number)

- The output signals are:

- 1) out (64-bit floating point output)

- 2) ready (goes high when the output is ready)

- 3) underflow

- 4) overflow

- 5) inexact

- 6) exception

- 7) invalid

- Each operation takes the following number of clock cycles to complete:

- 1. addition: 20 clock cycles

- 2. subtraction: 21 clock cycles

- 3. multiplication: 24 clock cycles

- 4. division: 71 clock cycles

- These latencies are longer than in some floating point units, because supporting denormalized numbers requires several additional logic levels, which lengthens the latency.
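The multiply decomposition rests on the identity that a wide product equals the sum of shifted narrow partial products. The Python sketch below checks that identity for 53-bit significands cut into 24-bit and 17-bit pieces, the unsigned widths that fit a signed 25 x 18 multiplier when its sign bit is held at 0. The chunking scheme and the names `split` and `mul53` are illustrative assumptions: this naive tiling produces 12 partial products (some only a few bits wide) and does not reproduce how the core maps the work onto its 9 DSP48E slices.

```python
# Sketch (assumed tiling, not the core's exact mapping): form a 53 x 53
# unsigned product from partial products no wider than 24 x 17 bits.

A_CHUNK = 24  # unsigned width that fits the 25-bit signed port
B_CHUNK = 17  # unsigned width that fits the 18-bit signed port
WIDTH = 53    # 52 fraction bits plus the implicit leading 1


def split(value: int, width: int, chunk: int) -> list[tuple[int, int]]:
    """Cut an unsigned `width`-bit value into (shift, piece) pairs."""
    mask = (1 << chunk) - 1
    return [(shift, (value >> shift) & mask) for shift in range(0, width, chunk)]


def mul53(a: int, b: int) -> int:
    """Multiply two 53-bit unsigned significands via narrow partial products."""
    if not (0 <= a < (1 << WIDTH) and 0 <= b < (1 << WIDTH)):
        raise ValueError("operands must be 53-bit unsigned integers")
    total = 0
    for shift_a, piece_a in split(a, WIDTH, A_CHUNK):
        for shift_b, piece_b in split(b, WIDTH, B_CHUNK):
            # Each piece_a * piece_b is at most a 24 x 17 unsigned multiply.
            total += (piece_a * piece_b) << (shift_a + shift_b)
    return total


if __name__ == "__main__":
    m = (1 << WIDTH) - 1              # largest 53-bit value
    print(mul53(m, m) == m * m)       # True
    print(mul53(m, m).bit_length())   # 106
```

The 106-bit result is why the product path is wider than the operands; the final significand is recovered from it by normalization and rounding.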
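Because opa, opb and out carry raw 64-bit IEEE-754 double-precision words, a testbench has to move between real numbers and bit patterns. A minimal Python sketch using the standard `struct` module follows; the helper names `to_bits`, `from_bits` and `fields` are hypothetical and not part of the core. On typical IEEE-754 platforms Python's float arithmetic rounds to nearest even, so it can serve as a reference model only for the round-to-nearest mode.

```python
import struct


def to_bits(x: float) -> int:
    """Return the 64-bit IEEE-754 pattern of x (what opa/opb would carry)."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]


def from_bits(bits: int) -> float:
    """Interpret a 64-bit pattern (e.g. the value on out) as a double."""
    return struct.unpack(">d", struct.pack(">Q", bits))[0]


def fields(bits: int) -> tuple[int, int, int]:
    """Split a pattern into (sign, biased exponent, 52-bit fraction)."""
    return bits >> 63, (bits >> 52) & 0x7FF, bits & ((1 << 52) - 1)


if __name__ == "__main__":
    opa, opb = to_bits(1.0), to_bits(2.5)
    print(f"{opa:016X} {opb:016X}")  # 3FF0000000000000 4004000000000000
    # Reference result for addition under round-to-nearest:
    print(f"{to_bits(from_bits(opa) + from_bits(opb)):016X}")  # 400C000000000000
    # Smallest positive denormal: exponent field 0, fraction 1.
    print(fields(to_bits(5e-324)))  # (0, 0, 1)
```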

- version 1

- pipelined versions of add/sub and multiply are included in the "pipeline" folder
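The trade-off between the sequential unit and the pipelined add/sub can be estimated from the latencies this page gives: 20 cycles per addition for the sequential unit, versus a 24-cycle latency for the pipelined add/sub followed by one result per clock. The Python sketch below does that arithmetic under an idealized assumption (operations issued back to back with no extra handshake cycles); the function and constant names are hypothetical.

```python
ADD_LATENCY_SEQUENTIAL = 20  # addition, non-pipelined unit
ADD_LATENCY_PIPELINED = 24   # add/sub, pipelined version


def sequential_cycles(n_ops: int, latency: int) -> int:
    """Cycles to finish n_ops when each op starts after the previous one ends.

    Idealized: assumes no extra cycles between operations.
    """
    return n_ops * latency


def pipelined_cycles(n_ops: int, latency: int) -> int:
    """Cycles until the last result when one op is issued per clock."""
    return 0 if n_ops == 0 else latency + (n_ops - 1)


if __name__ == "__main__":
    for n in (1, 10, 1000):
        print(n,
              sequential_cycles(n, ADD_LATENCY_SEQUENTIAL),
              pipelined_cycles(n, ADD_LATENCY_PIPELINED))
    # 1 20 24
    # 10 200 33
    # 1000 20000 1023
```

For a single addition the sequential unit finishes first (20 vs. 24 cycles); the pipeline pays off once several independent operations can be issued back to back.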

IEEE-754 compliant double-precision floating point unit. Four operations (addition, subtraction, multiplication, division) are supported, as are the four rounding modes (round to nearest, round toward 0, round toward +inf, round toward -inf). The unit also supports denormalized numbers, which is rare: most floating point units treat denormalized numbers as zero. The unit can run at clock frequencies up to 230 MHz on a Virtex5 target device.
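Each rounding mode decides what to do with result bits that do not fit in the 53-bit significand. The Python sketch below applies the four modes to a sign-magnitude integer by discarding its low `drop` bits, with the discarded remainder standing in for the guard, round and sticky bits. It is a generic illustration, not the core's rounding logic: round to nearest is modeled with ties to even (the IEEE-754 default), the mode strings are labels chosen here rather than the `rmode` encoding, and renormalization after a carry out of the significand is left out.

```python
def round_drop(value: int, drop: int, mode: str) -> int:
    """Remove the low `drop` bits of a signed integer significand.

    mode is one of "nearest" (ties to even), "zero", "+inf", "-inf".
    Works on sign and magnitude separately, as IEEE-754 does.
    """
    if drop < 0:
        raise ValueError("drop must be non-negative")
    negative = value < 0
    q, r = divmod(abs(value), 1 << drop)  # kept bits, discarded bits
    half = (1 << drop) >> 1               # weight of the first discarded bit
    if mode == "nearest":
        # Round up above the halfway point; on an exact tie, round to even.
        if r > half or (r == half and r and q & 1):
            q += 1
    elif mode == "+inf":
        q += 1 if (r and not negative) else 0  # only positive values grow
    elif mode == "-inf":
        q += 1 if (r and negative) else 0      # only negative values grow in magnitude
    elif mode != "zero":
        raise ValueError(f"unknown mode {mode!r}")
    return -q if negative else q


if __name__ == "__main__":
    # 11 = 0b10.11 with two bits to discard, i.e. 2.75
    for mode in ("nearest", "zero", "+inf", "-inf"):
        print(mode, round_drop(11, 2, mode), round_drop(-11, 2, mode))
    # nearest 3 -3
    # zero 2 -2
    # +inf 3 -2
    # -inf 2 -3
```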

Pipelined versions of add/sub and multiply are also available in the pipeline folder. Add/sub has a latency of 24 clock cycles, after which a new result is available on every clock cycle. Multiply has a latency of 21 clock cycles. The pipelined versions treat denormalized numbers as 0.
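The difference between full denormal support and treating denormals as zero appears when a result falls below the smallest normal double (about 2.2e-308). The Python sketch below compares Python's own IEEE-754 subtraction, which keeps denormal results, with a hypothetical flush-to-zero model. Flushing both inputs and the result is an assumption of this sketch, not a documented detail of the pipelined cores.

```python
import math
import sys

SMALLEST_NORMAL = sys.float_info.min  # 2**-1022, about 2.2e-308


def flush_to_zero(x: float) -> float:
    """Replace a denormal (subnormal) value with a zero of the same sign."""
    if x != 0.0 and abs(x) < SMALLEST_NORMAL:
        return math.copysign(0.0, x)
    return x


def ftz_sub(a: float, b: float) -> float:
    """Hypothetical flush-to-zero subtraction: flush inputs and result."""
    return flush_to_zero(flush_to_zero(a) - flush_to_zero(b))


if __name__ == "__main__":
    a, b = 3e-308, 2.5e-308  # both normal numbers, and a != b
    print(a - b)             # nonzero denormal result (gradual underflow)
    print(ftz_sub(a, b))     # 0.0
```

With gradual underflow, a != b guarantees a - b != 0; the flush-to-zero model loses that property, which is one practical reason to keep denormal support in the sequential unit.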

© copyright 1999-2018 OpenCores.org, equivalent to Oliscience, all rights reserved. OpenCores®, registered trademark.