OpenCores
URL https://opencores.org/ocsvn/fpu_double/fpu_double/trunk

Subversion Repositories fpu_double: trunk/Readme.txt (rev 5)

The following describes an IEEE 754-compliant, double-precision floating point unit
written in VHDL.  The module consists of the following files:

1.  fpu_double.vhd (top level)
2.  fpu_add.vhd
3.  fpu_sub.vhd
4.  fpu_mul.vhd
5.  fpu_div.vhd
6.  fpu_round.vhd
7.  fpu_exceptions.vhd
8.  fpupack.vhd
9.  comppack.vhd
 
A testbench file containing 50 test-case operations is also included:

1.  fpu_double_TB.vhd
 
This unit has been extensively simulated, covering all 4 operations, all rounding modes,
exceptions such as underflow and overflow, and obscure corner cases such as results that
cross from denormalized to normalized and vice versa.
 
The floating point unit supports denormalized numbers, 4 operations (add, subtract,
multiply, divide), and 4 rounding modes (round to nearest, round toward zero, round toward
+infinity, round toward -infinity).  The unit was synthesized for a Virtex5 target device
with an estimated clock frequency of 185 MHz; the synthesis results are below.
fpu_double.vhd is the top-level module, and it contains the input and output signals of
the unit.
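The four rounding modes differ only in how the bits beyond the target precision are disposed of. The sketch below models that on an unsigned significand with `extra` low-order bits to drop. It assumes that "nearest" means IEEE 754 round-to-nearest with ties to even, and the function name `round_significand` is hypothetical; this is a behavioral illustration, not a description of the logic in fpu_round.vhd.

```python
# Behavioral sketch of the four rmode settings (not the fpu_round.vhd logic).
# Assumption: "nearest" = IEEE 754 round-to-nearest, ties-to-even.
NEAREST, ZERO, POS_INF, NEG_INF = 0b00, 0b01, 0b10, 0b11  # rmode encoding


def round_significand(sig: int, extra: int, sign: int, rmode: int) -> tuple[int, bool]:
    """Drop the `extra` low bits of an unsigned significand.

    Returns (rounded significand, inexact flag). A carry out of the top bit
    would require renormalizing the exponent; this sketch leaves that out.
    """
    kept = sig >> extra
    dropped = sig & ((1 << extra) - 1)
    half = (1 << (extra - 1)) if extra else 0
    if dropped == 0:
        return kept, False                 # exact: no rounding, not inexact
    if rmode == NEAREST:
        # Round up past the halfway point; on an exact tie, round to even.
        round_up = dropped > half or (dropped == half and kept & 1 == 1)
    elif rmode == ZERO:
        round_up = False                   # truncate the magnitude
    elif rmode == POS_INF:
        round_up = sign == 0               # only positive values grow
    else:  # NEG_INF
        round_up = sign == 1               # only negative values grow in magnitude
    return kept + (1 if round_up else 0), True


print(round_significand(0b1011, 2, 0, NEAREST))   # 0b10 with dropped bits 11
print(round_significand(0b1010, 2, 0, NEAREST))   # exact tie, kept value is even
```

Working on sign and magnitude separately mirrors the IEEE 754 encoding, where the sign bit is stored apart from the significand, so directed rounding only needs to look at the sign to decide whether truncation or incrementing moves toward the chosen infinity.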
 
28
The input and output signals to the unit are the following:
29
 
30
1. clk  (global)
31
2. rst  (global)
32
2. enable   (set high, then low, to start operation)
33
3. rmode (rounding mode, 2 bits, 00 = nearest, 01 = zero,
34
                        10 = pos inf, 11 = neg inf)
35
4. fpu_op (operation code, 3 bits, 000 = add, 001 = subtract,
36
                        010 = multiply, 011 = divide, others are not used)
37
5. opa, opb (input operands, 64 bits, Big-endian order,
38
                        bit 63 = sign, bits 62-52 exponent, bits 51-0 mantissa)
39
6. out_fp   (output from operation, 64 bits, Big-endian order,
40
                        same ordering as inputs)
41
7. ready        (goes high when output is available)
42
8. underflow
43
9. overflow
44
10. inexact
45
11. exception - see IEEE 754 definition
46
12. invalid   - see IEEE 754 definition
47
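The opa/opb layout is the standard IEEE 754 binary64 encoding, so a host-side script can produce operand values for a testbench from ordinary floats. A minimal sketch using Python's `struct` module; the helper names `to_bits` and `fields` are illustrative only.

```python
import struct


def to_bits(x: float) -> int:
    """Return the 64-bit IEEE 754 pattern of x as an integer (bit 63 = MSB)."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]


def fields(word: int) -> tuple[int, int, int]:
    """Split a 64-bit word into (sign, exponent, mantissa) as laid out on opa/opb."""
    sign = (word >> 63) & 0x1           # bit 63
    exponent = (word >> 52) & 0x7FF     # bits 62-52 (11 bits, bias 1023)
    mantissa = word & ((1 << 52) - 1)   # bits 51-0 (hidden bit is not stored)
    return sign, exponent, mantissa


word = to_bits(-2.5)
print(f'opa = x"{word:016X}"')   # hex literal suitable for a VHDL testbench
print(fields(word))
```

For -2.5 this prints `opa = x"C004000000000000"` and `(1, 1024, 1125899906842624)`: the value is -1.25 x 2^1, so the biased exponent is 1023 + 1 and only mantissa bit 50 (the 0.25) is set.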
 
The unit was designed to be synchronous with one global clock, and all of the registers
can be reset with a synchronous global reset.
When the input signals (operands opa and opb, operation code fpu_op, and rounding mode
rmode) are available, set the enable input high, then set it low after 2 clock cycles.
When the operation is complete and the output is available, the ready signal goes high.
To start the next operation, set the enable input high again.
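The enable/ready handshake can be sketched as a cycle-by-cycle driver loop. The DUT here is a hypothetical stand-in (`make_toy_dut`), not a model of fpu_double.vhd: it assumes ready rises a fixed number of cycles after the first clock edge on which enable is sampled high and then stays high. The exact cycle on which the real unit raises ready should be checked in simulation with fpu_double_TB.vhd.

```python
from typing import Callable


def make_toy_dut(latency: int) -> Callable[[int], int]:
    """Hypothetical stand-in for fpu_double: ready rises `latency` cycles after
    the first cycle on which enable is sampled high, then stays high."""
    count = None  # None until the operation has started

    def step(enable: int) -> int:
        nonlocal count
        if count is None:
            if enable:
                count = 0          # operation starts on this clock edge
            return 0
        count += 1
        return 1 if count >= latency else 0

    return step


def run_transaction(dut_step: Callable[[int], int], max_cycles: int = 200) -> int:
    """Hold enable high for 2 cycles, drive it low, and return the cycle
    index on which ready is first seen high."""
    for cycle in range(max_cycles):
        enable = 1 if cycle < 2 else 0   # high for cycles 0 and 1, then low
        if dut_step(enable):
            return cycle
    raise TimeoutError("ready never went high")


print(run_transaction(make_toy_dut(latency=20)))  # addition
```

A real testbench follows the same shape: apply operands, pulse enable, then wait on ready instead of counting a fixed number of cycles, which keeps it correct if the latency of an operation changes.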
 
Each operation takes the following number of clock cycles to complete:
1.  addition:        20 clock cycles
2.  subtraction:     21 clock cycles
3.  multiplication:  24 clock cycles
4.  division:        71 clock cycles
 
These latencies are longer than those of other floating point units, but supporting
denormalized numbers requires more signals and logic levels to accommodate gradual
underflow.  The supported clock speed of 185 MHz makes up for the large number of clock
cycles each operation needs.  If your design runs at a lower clock speed, the code can be
changed to reduce the number of registers and the latency of each operation.  I purposely
added register stages, reducing the number of logic levels between registers, so that the
code would synthesize to a faster clock frequency; of course, this led to longer latency.
Which trade-off matters more depends on your application.
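Converting the cycle counts to wall-clock time makes the latency trade-off concrete. This sketch uses the estimated period of 5.382 ns from the synthesis report; the 100 MHz comparison clock is a hypothetical example, not a measured design.

```python
# Documented cycle counts and the estimated period from the synthesis report.
CYCLES = {"add": 20, "sub": 21, "mul": 24, "div": 71}
PERIOD_NS = 5.382  # estimated period for fpu_double|clk (about 185.8 MHz)


def latency_ns(op: str, period_ns: float = PERIOD_NS) -> float:
    """Wall-clock latency of one operation."""
    return CYCLES[op] * period_ns


def break_even_cycles(op: str, other_mhz: float) -> float:
    """Cycles a design clocked at `other_mhz` could spend on the same
    operation and still match this unit's wall-clock latency."""
    return latency_ns(op) * other_mhz / 1000.0


for op in CYCLES:
    print(f"{op}: {latency_ns(op):.1f} ns")
print(f"add at 100 MHz: {break_even_cycles('add', 100.0):.2f} cycles")
```

Addition comes out at about 107.6 ns and division at about 382.1 ns, so a hypothetical 100 MHz unit would have to finish an addition in fewer than 11 cycles to be faster in absolute time.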
 
The following output signals are also available: underflow, overflow, inexact, exception,
and invalid.  Each is compliant with the IEEE 754 definition of that signal.  The unit
handles QNaN and SNaN inputs per the standard.
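Telling quiet and signaling NaNs apart, along with zeros, infinities, and denormalized numbers, depends only on the exponent and mantissa fields. A minimal classifier sketch follows; it assumes the common convention, recommended by IEEE 754-2008, that mantissa bit 51 set marks a QNaN, and `classify` is an illustrative name, not a function from the VHDL sources.

```python
def classify(word: int) -> str:
    """Classify a 64-bit IEEE 754 double pattern (bit 63 = sign)."""
    exponent = (word >> 52) & 0x7FF
    mantissa = word & ((1 << 52) - 1)
    if exponent == 0x7FF:
        if mantissa == 0:
            return "infinity"
        # Assumption: quiet bit = mantissa MSB (bit 51), per IEEE 754-2008.
        return "QNaN" if mantissa >> 51 else "SNaN"
    if exponent == 0:
        # Exponent field 0: zero, or a denormalized value with no hidden bit.
        return "zero" if mantissa == 0 else "denormalized"
    return "normalized"


print(classify(0x7FF8000000000000))  # QNaN
print(classify(0x7FF0000000000001))  # SNaN
```

Under IEEE 754, an arithmetic operation on an SNaN operand signals the invalid exception, while QNaN operands propagate without signaling it.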
 
I'm planning to add more operations, such as square root, sin, cos, and tan,
so check back for updates.
 
Multiply:
The multiply module is written specifically for a Virtex5 target device.  Each DSP48E slice
can perform a 25-bit by 18-bit two's-complement multiply (equivalently, a 24-bit by 17-bit
unsigned multiply), so I broke up the multiply to fit these DSP48E slices.  The breakdown
is similar to the design in Figure 4-15 of the Xilinx user guide "Virtex-5 FPGA XtremeDSP
Design Considerations", also known as UG193, which you can find at xilinx.com by searching
for "UG193".  Depending on your device, the multiply can be changed to match the bit widths
of the available multipliers.  A total of 9 DSP48E slices are used to perform the 53-bit by
53-bit multiply of the two significands (52 stored mantissa bits plus the hidden bit).
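Breaking a wide multiply into narrow ones relies on summing shifted partial products. The sketch below shows that identity with operand A cut into 24-bit pieces and operand B into 17-bit pieces, so every partial product fits a 24 x 17 unsigned multiplier. The chunk widths and function names are assumptions for illustration: this uniform split needs 12 partial products for 53 x 53, so it is not the 9-slice arrangement used in fpu_mul.vhd.

```python
def split(value: int, width: int, chunk: int) -> list[tuple[int, int]]:
    """Cut the low `width` bits of value into (piece, bit_offset) pairs."""
    mask = (1 << chunk) - 1
    return [((value >> off) & mask, off) for off in range(0, width, chunk)]


def chunked_multiply(a: int, b: int, width: int = 53,
                     a_chunk: int = 24, b_chunk: int = 17) -> tuple[int, int]:
    """Multiply two unsigned `width`-bit values from narrow partial products.

    Every partial product is at most a_chunk x b_chunk bits, i.e. small enough
    for one 24 x 17 unsigned multiplier. Returns (product, partial_product_count).
    """
    total, count = 0, 0
    for a_piece, a_off in split(a, width, a_chunk):
        for b_piece, b_off in split(b, width, b_chunk):
            total += (a_piece * b_piece) << (a_off + b_off)  # shift into place
            count += 1
    return total, count


# Two 53-bit significands: hidden bit (bit 52) set plus arbitrary fraction bits.
a = (1 << 52) | 0x8_0000_0000_0001
b = (1 << 52) | 0xF_FFFF_FFFF_FFFF
product, n = chunked_multiply(a, b)
print(product == a * b, n)
```

In hardware the shifted additions map onto the DSP48E post-adders and cascade paths or onto fabric adders; how the pieces are chosen determines how many slices are needed.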
 
If you have any questions, please email me at: davidklun@gmail.com

Thanks,
David Lundgren

-----
 
Synthesis Results:

Performance Summary
*******************

Worst slack in design: -2.049

                   Requested     Estimated     Requested     Estimated                Clock        Clock
Starting Clock     Frequency     Frequency     Period        Period        Slack      Type         Group
----------------------------------------------------------------------------------------------------------------------
fpu_double|clk     300.0 MHz     185.8 MHz     3.333         5.382         -2.049     inferred     Inferred_clkgroup_0
======================================================================================================================
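The Performance Summary row follows from the relation between clock frequency and period (periods and slack in ns). The derivation below only reproduces the report's own numbers: the requested 300 MHz was not met, and the negative slack corresponds to the estimated 185.8 MHz.

```latex
\begin{aligned}
T_{\mathrm{req}} &= \frac{1}{300\ \mathrm{MHz}} \approx 3.333\ \mathrm{ns}, \\
f_{\mathrm{est}} &= \frac{1}{T_{\mathrm{est}}} = \frac{1}{5.382\ \mathrm{ns}} \approx 185.8\ \mathrm{MHz}, \\
\mathrm{slack} &= T_{\mathrm{req}} - T_{\mathrm{est}} = 3.333 - 5.382 = -2.049\ \mathrm{ns}.
\end{aligned}
```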
 
---------------------------------------
Resource Usage Report for fpu_double

Mapping to part: xc5vsx95tff1136-2
Cell usage:
DSP48E          9 uses
FD              3 uses
FDE             21 uses
FDR             587 uses
FDRE            3767 uses
FDRS            8 uses
FDRSE           51 uses
GND             6 uses
MUXCY           20 uses
MUXCY_L         598 uses
MUXF7           2 uses
VCC             6 uses
XORCY           497 uses
XORCY_L         5 uses
LUT1            187 uses
LUT2            742 uses
LUT3            1591 uses
LUT4            847 uses
LUT5            589 uses
LUT6            2613 uses
 
I/O ports: 206
I/O primitives: 205
IBUF           135 uses
OBUF           70 uses

BUFGP          1 use
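The I/O counts follow directly from the signal widths in the list of input and output signals: clk through opb are inputs and out_fp through invalid are outputs. The clock enters through the BUFGP rather than an IBUF, which is why the report shows 135 IBUFs for 136 input bits and 205 I/O primitives for 206 ports. A quick check:

```python
# Port widths from the signal list; clk through opb are inputs,
# out_fp through invalid are outputs.
INPUTS = {"clk": 1, "rst": 1, "enable": 1, "rmode": 2, "fpu_op": 3,
          "opa": 64, "opb": 64}
OUTPUTS = {"out_fp": 64, "ready": 1, "underflow": 1, "overflow": 1,
           "inexact": 1, "exception": 1, "invalid": 1}

n_in = sum(INPUTS.values())
n_out = sum(OUTPUTS.values())
print(n_in, n_out, n_in + n_out)  # input bits, output bits, total ports
```

This prints `136 70 206`, matching the 135 IBUF plus 1 BUFGP inputs, the 70 OBUF outputs, and the 206 I/O ports in the report.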
 
I/O Register bits:                  0
Register bits not including I/Os:   4437 (7%)

Global Clock Buffers: 1 of 32 (3%)

Total load per clock:
   fpu_double|clk: 4446

Mapping Summary:
Total LUTs: 6569 (11%)

Mapper successful!
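The totals at the end of the report can be cross-checked against the cell usage list: the six flip-flop primitives add up to the register-bit count and the six LUT primitives to the LUT total. Reading the clock load as flip-flops plus DSP48E slices is an assumption, but it matches the reported 4446.

```python
# Cell counts copied from the Resource Usage Report.
FLIP_FLOPS = {"FD": 3, "FDE": 21, "FDR": 587, "FDRE": 3767, "FDRS": 8, "FDRSE": 51}
LUTS = {"LUT1": 187, "LUT2": 742, "LUT3": 1591, "LUT4": 847, "LUT5": 589, "LUT6": 2613}
DSP48E = 9

register_bits = sum(FLIP_FLOPS.values())   # "Register bits not including I/Os"
total_luts = sum(LUTS.values())            # "Total LUTs"
clock_load = register_bits + DSP48E        # assumed makeup of "Total load per clock"

print(register_bits, total_luts, clock_load)
```

This prints `4437 6569 4446`, the three totals in the report.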
 
