======================================================================================================
DP-ICSI log  (C implementation of a fast logarithmic approximation unit based on ICSI log V2 0.6 Beta)

DP/SP LAU    (FPGA unit that implements the ICSI log algorithm in VHDL)
======================================================================================================


Version 0.2 beta
Build date: September 20th, 2009


Introduction
------------

Software:

This package contains a C implementation of the ICSI logarithm approximation algorithm originally introduced in

O. Vinyals, G. Friedland: "A Hardware-Independent Fast Logarithm Approximation with Adjustable Accuracy".
Tenth IEEE International Symposium on Multimedia (ISM 2008), pp. 61-65, December 2008.

The new C function has been adjusted to support double precision inputs, whereas the official implementation of the algorithm
supports only single precision. Furthermore, it detects invalid inputs, which makes the function compatible with
the IEEE 754 standard and the GNU library log() function.


Hardware:

This package also contains a VHDL implementation of the ICSI logarithm approximation algorithm, described in

N. Alachiotis, A. Stamatakis: "Efficient Floating-Point Logarithm Unit for FPGAs". Accepted for publication at the RAW workshop,
held in conjunction with IPDPS 2010, Atlanta, Georgia, April 2010.

The SP-LAU (Single Precision Logarithm Approximation Unit) implements the algorithm for single precision inputs.
The DP-LAU (Double Precision Logarithm Approximation Unit) implements the algorithm for double precision inputs.

Both units support invalid input detection.

All implementations in this package calculate an approximation of the natural logarithm.

For more details about the software implementation, see the readme file and the paper of the ICSI log.


Package Structure
-----------------

This package contains the following files and folders:

-README                     : This file.

-DP-ICSILog/DP-ICSILog.c    : C file that contains the implementation adjusted for double precision and an example of how to use the function.

-Virtex 5/SP-LAU            : This folder contains the VHDL source files as well as the .xco and .ngc files of the IPs used to implement the single precision unit on Virtex 5.

-Virtex 5/DP-LAU            : This folder contains the VHDL source files as well as the .xco and .ngc files of the IPs used to implement the double precision unit on Virtex 5.

-Virtex 4/SP-LAU            : This folder contains the VHDL source files as well as the .xco and .ngc files of the IPs used to implement the single precision unit on Virtex 4.

-Virtex 4/DP-LAU            : This folder contains the VHDL source files as well as the .xco and .ngc files of the IPs used to implement the double precision unit on Virtex 4.

-COE Files                  : This folder contains COE files to be used if one needs to adjust the accuracy of the unit.

-PAO Files                  : This folder contains PAO files that contain the Peripheral Analysis Order for the SP and DP LAUs.


Usage of the DP-ICSILog
-----------------------

The DP-ICSILog.c file contains the necessary global variables, the functions that need to be
called in order to use the DP-ICSILog function, and a usage example.


Interface of the LAU
--------------------

The toplevel module of the LAU is sp_fp_log_v2 for the single precision logarithm approximation unit
and dp_fp_log_v2 for the double precision unit.

sp/dp : Single Precision / Double Precision
fp    : Floating Point
log   : Logarithm
V2    : Because the mantissa lookup table has been initialized using the respective function of the ICSILog V2 0.6 Beta software.
        (Version 2 of this function doubled the precision of the unit compared to Version 1.)

The interface of the unit is defined as follows (where two values are separated by a slash, the first applies to the SP unit and the second to the DP unit):

entity sp_fp_log_v2/dp_fp_log_v2 is
        Port ( rst        : in  STD_LOGIC;                         -- The reset signal.
               clk        : in  STD_LOGIC;                         -- The clock signal.
               valid_in   : in  STD_LOGIC;                         -- Indicates a valid number at the input port of the unit.
               input_val  : in  STD_LOGIC_VECTOR(31/63 downto 0);  -- The input number.
               valid_out  : out STD_LOGIC;                         -- Indicates a valid number at the output port of the unit.
               output_val : out STD_LOGIC_VECTOR(31/63 downto 0)   -- The output number: the approximation of the logarithm of the input number.
              );
end sp_fp_log_v2/dp_fp_log_v2;


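When preparing stimuli for a simulation or checking values captured on the board, it helps to convert between C floating-point values and the 32-bit or 64-bit words on input_val and output_val. The helpers below are a sketch that assumes the ports carry raw IEEE 754 bit patterns and that the host uses IEEE 754 for float and double. The function names are hypothetical and not part of this package.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helpers: map C values to and from the words on the LAU ports. */

/* float -> 32-bit word for the input_val port of sp_fp_log_v2 */
uint32_t sp_to_input_val(float x)
{
    uint32_t w;
    memcpy(&w, &x, sizeof w);   /* copy the bit pattern, no numeric conversion */
    return w;
}

/* 32-bit word read from output_val of sp_fp_log_v2 -> float */
float sp_from_output_val(uint32_t w)
{
    float x;
    memcpy(&x, &w, sizeof x);
    return x;
}

/* double -> 64-bit word for the input_val port of dp_fp_log_v2 */
uint64_t dp_to_input_val(double x)
{
    uint64_t w;
    memcpy(&w, &x, sizeof w);
    return w;
}

/* 64-bit word read from output_val of dp_fp_log_v2 -> double */
double dp_from_output_val(uint64_t w)
{
    double x;
    memcpy(&x, &w, sizeof x);
    return x;
}
```

For example, sp_to_input_val(1.0f) yields 0x3F800000, which can be written to a stimulus file and applied to input_val while valid_in is asserted.
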
Implementation Details
----------------------

The VHDL units have been designed using the Xilinx 10.1 Design Suite.

ISE 10.1 was used to create the units.

Coregen was used to create all the IPs used in the units.

The released LAUs use a mantissa lookup table with 4,096 entries.

Target devices are Virtex 4 and Virtex 5 FPGAs.

To use the units on another FPGA, one needs to change the IPs used. The FPGA must provide enough block RAMs (their number
depends on the desired accuracy and thus on the size of the mantissa lookup table) and enough DSP slices (3 DSP slices are occupied).

One can use the COE files in the COE Files folder to regenerate the mantissa lookup table for a different accuracy and resource occupation.

Both units have a latency of 22 cycles on Virtex 5 and 28 cycles on Virtex 4. The latency does not depend on the size of the mantissa lookup table and thus on the accuracy.

The released units occupy 2% of the hardware resources on the Virtex 5 SX95T FPGA and can operate at the following clock frequencies,
as reported by the static timing report:

353.4 MHz for the SP-LAU on the V5SX95T-2 and
320.6 MHz for the DP-LAU on the V5SX95T-2.


Verification Details
--------------------

Modelsim 6.3f was used for extensive post place and route simulations.

The development board HTG-V5-PCIE by HiTech Global, populated with a V5SX95T-1 FPGA, was used to verify the LAUs.

ChipScope Pro Analyzer was used for advanced on-chip debugging and verification of the units.


IP Configuration Details for the Virtex 5 LAUs
----------------------------------------------

The IPs used for the implementations are the following:
(The configuration options that are not mentioned were not selected.)


comp_eq_000000000000:

Comparator,
Operation: A=B,
Data Type: Unsigned,
Input Width: 12,
Port B Constant: 000000000000,
Pipeline Stages: 0,
Output Options: Registered Output,
Synchronous Settings: Clear


comp_eq_000000000000000:

Comparator,
Operation: A=B,
Data Type: Unsigned,
Input Width: 15,
Port B Constant: 000000000000000,
Pipeline Stages: 0,
Output Options: Registered Output,
Synchronous Settings: Clear


comp_eq_8ones:

Comparator,
Operation: A=B,
Data Type: Unsigned,
Input Width: 8,
Port B Constant: 11111111,
Pipeline Stages: 0,
Output Options: Registered Output,
Synchronous Settings: Clear


comp_eq_11ones:

Comparator,
Operation: A=B,
Data Type: Unsigned,
Input Width: 11,
Port B Constant: 11111111111,
Pipeline Stages: 0,
Output Options: Registered Output,
Synchronous Settings: Clear


comp_eq_22zeros:

Comparator,
Operation: A=B,
Data Type: Unsigned,
Input Width: 22,
Port B Constant: 00000...0000,
Pipeline Stages: 0,
Output Options: Registered Output,
Synchronous Settings: Clear


comp_eq_51zeros:

Comparator,
Operation: A=B,
Data Type: Unsigned,
Input Width: 51,
Port B Constant: 00000...0000,
Pipeline Stages: 0,
Output Options: Registered Output,
Synchronous Settings: Clear


comp_eq_111111:

Comparator,
Operation: A=B,
Data Type: Unsigned,
Input Width: 6,
Port B Constant: 111111,
Pipeline Stages: 0,
Output Options: Registered Output,
Synchronous Settings: Clear


comp_eq_111111111:

Comparator,
Operation: A=B,
Data Type: Unsigned,
Input Width: 9,
Port B Constant: 111111111,
Pipeline Stages: 0,
Output Options: Registered Output,
Synchronous Settings: Clear


exp_lut_MEM:

Block Memory Generator,
Memory Type: Single Port ROM,
Read Width: 9,
Read Depth: 128


mant_lut_MEM:

Block Memory Generator,
Memory Type: Single Port ROM,
Read Width: 27,
Read Depth: 4096 (depends on the desired accuracy)


All the registers used are RAM-based Shift Registers. The width and depth of each register are indicated by its name.
For example: reg_1b_1c is a register of 1 bit and 1 clock cycle of latency.


sp_fp_add:

Floating Point,
Operation Selection: Add,
Precision: Single,
Architecture Optimization: High Speed,
Family Optimizations: Full Usage,
Latency and Rate Configuration: Use Maximum Latency


sp_fp_mult:

Floating Point,
Operation Selection: Multiply,
Precision: Single,
Architecture Optimization: High Speed,
Family Optimizations: Medium Usage,
Latency and Rate Configuration: Use Maximum Latency


Note:
For the Virtex 4 LAUs, the Coregen Project Settings were changed from Virtex 5 to Virtex 4 and all the above IPs were regenerated under the new project settings.
The only exception is the RAM-based Shift Registers that operate in parallel with the sp_fp_add and sp_fp_mult IPs: their depth (clock delay)
was changed according to the latency of the sp_fp_add and sp_fp_mult IPs.


Authors and Contact Details
---------------------------

Nikos Alachiotis           alachiot@in.tum.de
Alexandros Stamatakis      stamatak@in.tum.de


Copyright
---------

These programs are free software; you can redistribute them and/or modify
them under the terms of the GNU Lesser General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.

The programs are distributed in the hope that they will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.


Further Information
-------------------

The FPGA units SP-LAU and DP-LAU are exact implementations of the SP-ICSILog and the DP-ICSILog algorithms respectively.
Furthermore, they detect invalid inputs such as NaN, inf, -inf, or zero.

For more information on the LAU see the paper:

N. Alachiotis, A. Stamatakis: "Efficient Floating-Point Logarithm Unit for FPGAs". Accepted for publication at the RAW workshop,
held in conjunction with IPDPS 2010, Atlanta, Georgia, April 2010.

For more information on the ICSI log algorithm see the paper:

O. Vinyals, G. Friedland: "A Hardware-Independent Fast Logarithm Approximation with Adjustable Accuracy".
Tenth IEEE International Symposium on Multimedia (ISM 2008), pp. 61-65, December 2008.

and/or download the official single precision C implementation from:

http://linux.softpedia.com/get/Programming/Libraries/ICSILog-41333.shtml


Citation
--------

By using this component you agree to cite it as: "Efficient Floating-Point Logarithm Unit for FPGAs", by Nikos Alachiotis and Alexandros Stamatakis, accepted for publication at the RAW workshop, held in conjunction with IPDPS 2010.


Release Notes
-------------

Version : 0.2 beta
Build date : September 20th, 2009
 * Support for Virtex 4 FPGAs as well.
 * FPGA verification.

Version : 0.1 beta
Build date : August 2nd, 2009
 * Support for Virtex 5 FPGAs only.
 * Tested by using extensive post place and route simulations.
