OpenCores
URL https://opencores.org/ocsvn/double_fpu/double_fpu/trunk

This file describes an IEEE 754 compliant, double-precision floating point unit
written in Verilog.  The module consists of the following files:
 
1.      fpu_double.v (top level)
2.      fpu_add.v
3.      fpu_sub.v
4.      fpu_mul.v
5.      fpu_div.v
6.      fpu_round.v
7.      fpu_exceptions.v
 
A testbench file, containing 50 test-case operations, is also included:
1.      fpu_tb.v
 
This unit has been extensively simulated, covering all operations, rounding modes, exceptions
such as underflow and overflow, and even obscure corner cases, such as results that cross from
the denormalized range into the normalized range, and vice versa.
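
The four rounding modes covered by the simulation can be illustrated outside the
hardware.  The following is a minimal Python sketch, not taken from fpu_round.v, of how
an integer significand carrying extra low-order bits could be rounded under each mode.
The function name, the mode strings, and the argument layout are assumptions made for
this illustration only.

```python
def round_significand(negative, sig, extra_bits, mode):
    """Drop `extra_bits` low-order bits from the magnitude `sig`.

    Returns (rounded_magnitude, inexact).  `mode` is one of the
    hypothetical strings "nearest", "zero", "+inf", "-inf".
    A real rounder must also renormalize if the increment carries
    out of the significand; that step is omitted here.
    """
    if extra_bits <= 0:
        raise ValueError("extra_bits must be positive")
    kept = sig >> extra_bits                  # bits that survive
    rem = sig & ((1 << extra_bits) - 1)       # bits being discarded
    half = 1 << (extra_bits - 1)              # exactly one half ULP
    if rem == 0:
        return kept, False                    # exact, nothing to round
    if mode == "nearest":
        # Round to nearest, ties to even.
        round_up = rem > half or (rem == half and (kept & 1) == 1)
    elif mode == "zero":
        round_up = False                      # truncate the magnitude
    elif mode == "+inf":
        round_up = not negative               # magnitude grows only if positive
    elif mode == "-inf":
        round_up = negative                   # magnitude grows only if negative
    else:
        raise ValueError("unknown rounding mode: " + mode)
    return kept + (1 if round_up else 0), True
```

The second return value corresponds to the idea behind an inexact flag: it is true
whenever discarded bits were nonzero.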
 
The floating point unit supports denormalized numbers,
4 operations (add, subtract, multiply, divide), and 4 rounding
modes (nearest, zero, +inf, -inf).  The unit was synthesized with an
estimated frequency of 230 MHz for a Virtex-5 target device.  The synthesis results
are below.

fpu_double.v is the top-level module, and it contains the input
and output signals of the unit.  The unit was designed to be synchronous with
one global clock, and all of the registers can be reset with a synchronous global reset.
When the input signals (a and b operands, fpu operation code, rounding mode code) are
available, set the enable input high, then set it low after 2 clock cycles.  When the
operation is complete and the output is available, the ready signal will go high.  To start
the next operation, set the enable input high again.
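
The enable/ready handshake described above can be sketched as a cycle-by-cycle trace.
This Python model is only an illustration: it assumes that the latency is counted from
the first cycle on which enable is high and that ready rises once that many cycles have
elapsed.  Neither assumption is confirmed by this README, so check the exact timing
against fpu_tb.v.  The names drive_operation and LATENCY are hypothetical.

```python
# Cycle counts per operation, as listed in this README.
LATENCY = {"add": 20, "sub": 21, "mul": 24, "div": 71}

def drive_operation(op, enable_cycles=2):
    """Yield (cycle, enable, ready) tuples for a single operation.

    Assumption: cycle 0 is the first cycle with enable high, and
    ready goes high on cycle LATENCY[op].
    """
    latency = LATENCY[op]
    for cycle in range(latency + 1):
        enable = 1 if cycle < enable_cycles else 0   # high for 2 cycles, then low
        ready = 1 if cycle >= latency else 0         # high when the result is out
        yield cycle, enable, ready

if __name__ == "__main__":
    for cycle, enable, ready in drive_operation("add"):
        print(f"cycle {cycle:2d}  enable={enable}  ready={ready}")
```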
 
Each operation takes the following number of clock cycles to complete:
1.      addition:               20 clock cycles
2.      subtraction:            21 clock cycles
3.      multiplication:         24 clock cycles
4.      division:               71 clock cycles
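
To put these cycle counts in terms of time, the short Python sketch below converts them
to nanoseconds at the estimated 230 MHz clock.  The throughput figure assumes that only
one operation is in flight at a time, which is an assumption based on the handshake
description and not a statement from the design files.

```python
# Cycle counts copied from the list above.
CYCLES = {"addition": 20, "subtraction": 21, "multiplication": 24, "division": 71}

def latency_ns(cycles, clock_mhz=230.0):
    """Time for one operation in nanoseconds at the given clock."""
    return cycles * 1000.0 / clock_mhz

def ops_per_second(cycles, clock_mhz=230.0):
    """Operations per second, assuming one operation in flight at a time."""
    return clock_mhz * 1e6 / cycles

if __name__ == "__main__":
    for name, cycles in CYCLES.items():
        mops = ops_per_second(cycles) / 1e6
        print(f"{name:15s} {latency_ns(cycles):7.1f} ns  {mops:5.2f} Mops/s")
```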
 
These latencies are longer than those of other floating point units, but supporting denormalized
numbers requires more signals and logic levels to accommodate gradual underflow.  The supported
clock speed of 230 MHz helps make up for the large number of clock cycles required for each
operation to complete.  If you run at a lower clock speed, the code can be changed to
reduce the number of registers and the latency of each operation.  I purposely added register
stages to get the code to synthesize to a faster clock frequency, but of course,
this led to longer latency.  Which is more important depends on your application.
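
Gradual underflow, mentioned above, can be observed with ordinary software doubles.
The Python sketch below assumes the host uses IEEE 754 binary64 for float and does not
flush denormals to zero, which holds on common platforms.  The helper names are
hypothetical and are not part of this design.

```python
import struct
import sys

def fields(x):
    """Return (sign, biased_exponent, fraction) of a double."""
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    sign = bits >> 63
    exponent = (bits >> 52) & 0x7FF
    fraction = bits & ((1 << 52) - 1)
    return sign, exponent, fraction

def is_denormal(x):
    """True when the exponent field is zero and the fraction is not."""
    _, exponent, fraction = fields(x)
    return exponent == 0 and fraction != 0

smallest_normal = sys.float_info.min   # 2**-1022
below = smallest_normal / 2            # 2**-1023, falls into the denormal range
smallest_denormal = 5e-324             # 2**-1074, only the lowest fraction bit set

if __name__ == "__main__":
    print(fields(smallest_normal), is_denormal(smallest_normal))
    print(fields(below), is_denormal(below))
    print(below * 2 == smallest_normal)  # doubling crosses back to normalized
```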
 
The following output signals are also available: underflow, overflow, inexact, exception,
and invalid.  They are compliant with the IEEE-754 definition of each signal.  The unit
will handle QNaN and SNaN inputs per the standard.
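
For readers preparing test operands, the Python sketch below classifies a 64-bit
pattern by its exponent and fraction fields.  It assumes the common convention that a
NaN with the most significant fraction bit set is quiet and one with that bit clear is
signaling.  Whether fpu_exceptions.v decodes NaNs in exactly this way is not stated in
this README.  The function name classify is hypothetical.

```python
def classify(bits):
    """Classify a 64-bit IEEE 754 double given as an integer bit pattern."""
    exponent = (bits >> 52) & 0x7FF          # 11-bit biased exponent
    fraction = bits & ((1 << 52) - 1)        # 52-bit stored fraction
    if exponent == 0x7FF:
        if fraction == 0:
            return "infinity"
        # Assumed convention: top fraction bit set means quiet NaN.
        return "qnan" if (fraction >> 51) == 1 else "snan"
    if exponent == 0:
        return "zero" if fraction == 0 else "denormal"
    return "normal"

if __name__ == "__main__":
    for pattern in (0x7FF8000000000000, 0x7FF0000000000001,
                    0x7FF0000000000000, 0x0000000000000001):
        print(f"{pattern:016X} {classify(pattern)}")
```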
 
I'm planning on adding more operations, like square root, sin, cos, tan, etc.,
so check back for updates.
 
Multiply:
The multiply module is written specifically for a Virtex-5 target device.  The DSP48E slices
can perform a 25-bit by 18-bit two's-complement multiply (24-bit by 17-bit unsigned multiply),
so I broke up the multiply to fit these DSP48E slices.  The breakdown is similar to the design
in Figure 4-15 of the Xilinx user guide "Virtex-5 FPGA XtremeDSP Design Considerations",
also known as UG193.  You can find this document at xilinx.com by searching for "UG193".
Depending on your device, the multiply can be changed to match the bit-widths of the available
multipliers.  A total of 9 DSP48E slices are used to do the 53-bit by 53-bit multiply of the
significands of 2 floating point numbers.
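
The principle of breaking a wide multiply into multiplier-sized pieces can be sketched
in Python.  This is not the partition used in fpu_mul.v: a plain split into 24-bit and
17-bit unsigned chunks gives 12 partial products, whereas the design uses 9 DSP48E
slices, so its actual breakdown must differ.  The function names and chunk widths are
assumptions chosen to match the unsigned multiplier size quoted above.

```python
def split(value, width, total_bits):
    """Split `value` into (chunk, shift) pairs of `width` bits, low chunk first."""
    mask = (1 << width) - 1
    return [((value >> shift) & mask, shift)
            for shift in range(0, total_bits, width)]

def multiply_by_parts(a, b, total_bits=53, a_width=24, b_width=17):
    """Multiply two unsigned `total_bits` values using narrow partial products.

    Returns (product, number_of_partial_products).
    """
    partials = [(chunk_a * chunk_b, shift_a + shift_b)
                for chunk_a, shift_a in split(a, a_width, total_bits)
                for chunk_b, shift_b in split(b, b_width, total_bits)]
    # Each partial product is shifted to its weight and accumulated.
    product = sum(value << shift for value, shift in partials)
    return product, len(partials)

if __name__ == "__main__":
    a = (1 << 53) - 1              # all-ones 53-bit significand
    b = (1 << 52) | 12345          # hidden bit set plus some fraction bits
    product, count = multiply_by_parts(a, b)
    print(product == a * b, count)
```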
 
If you have any questions, please email me at: davidklun@gmail.com
 
Thanks,
David Lundgren
 
-----

Synthesis Results:
 
Performance Summary
*******************

Worst slack in design: -0.971

                   Requested     Estimated     Requested     Estimated                Clock        Clock
Starting Clock     Frequency     Frequency     Period        Period        Slack      Type         Group
-----------------------------------------------------------------------------------------------------------
fpu|clk            300.0 MHz     232.3 MHz     3.333         4.304         -0.971     inferred
==========================================================================
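
The figures in the performance summary are related by simple arithmetic, which the
Python sketch below reproduces.  It assumes the periods and slack are in nanoseconds,
which is consistent with a 300 MHz request corresponding to a 3.333 period.

```python
def period_ns(freq_mhz):
    """Clock period in nanoseconds for a frequency in MHz."""
    return 1000.0 / freq_mhz

def freq_mhz(period):
    """Clock frequency in MHz for a period in nanoseconds."""
    return 1000.0 / period

requested_period = period_ns(300.0)            # requested 300.0 MHz
estimated_period = 4.304                       # estimated period from the report
slack = requested_period - estimated_period    # negative: the request is not met
estimated_freq = freq_mhz(estimated_period)    # frequency the design can reach

if __name__ == "__main__":
    print(f"requested period {requested_period:.3f}")
    print(f"slack            {slack:.3f}")
    print(f"estimated freq   {estimated_freq:.1f} MHz")
```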
 
---------------------------------------
Resource Usage Report for fpu

Mapping to part: xc5vsx95tff1136-2
Cell usage:
DSP48E          9 uses
FD              5 uses
FDR             519 uses
FDRE            3920 uses
FDRSE           1 use
GND             6 uses
LD              6 uses
MUXCY           35 uses
MUXCY_L         704 uses
MUXF7           1 use
VCC             5 uses
XORCY           491 uses
XORCY_L         12 uses
LUT1            185 uses
LUT2            725 uses
LUT3            1523 uses
LUT4            738 uses
LUT5            604 uses
LUT6            2506 uses
 
I/O ports: 206
I/O primitives: 205
IBUF           135 uses
OBUF           70 uses

BUFGP          1 use

I/O Register bits:                  0
Register bits not including I/Os:   4445 (7%)
Latch bits not including I/Os:      6 (0%)

Global Clock Buffers: 1 of 32 (3%)

Total load per clock:
   fpu|clk: 4454

Mapping Summary:
Total  LUTs: 6281 (10%)
 