OpenCores
URL https://opencores.org/ocsvn/fp_log/fp_log/trunk

Subversion Repositories fp_log

[/] [fp_log/] [trunk/] [README] - Rev 3

Compare with Previous | Blame | View Log

======================================================================================================
DP-ICSI log  (C implementation of a fast logarithmic approximation unit based on ICSI log V2 0.6 Beta)

DP/SP LAU    (FPGA unit that implements the ICSI log algorithm in VHDL)
======================================================================================================


Version 0.2 beta
Build date: August 2nd, 2009



Introduction
------------

Software : 

This package contains a C implementation of the ICSI logarithm approximation algorithm originally introduced in

O.Vinyals, G.Friedland, A Hardware-Independent Fast Logarithm Approximation with Adjustable Accuracy. 
Tenth IEEE International Symposium on Multimedia, 2008. ISM 2008. pp. 61-65, December 2008.

The new C function has been adjusted to support double precision inputs in contrast to the official implementation of the algorithm 
which supports only single precision. Furthermore, there is invalid input detection which makes the function fully compatible with 
the IEEE 754 standard and the GNU library log() function.


Hardware:

This package also contains a VHDL implementation of the ICSI logarithm approximation algorithm described in

N. Alachiotis, A. Stamatakis: "Efficient Floating-Point Logarithm Unit for FPGAs". Accepted for publication at RAW workshop, 
held in conjunction with IPDPS 2010, Atlanta, Georgia, April, 2010.

The SP-LAU (Single Precision Logarithm Approximation Unit) implements the algorithm and supports single precision inputs. 
The DP-LAU (Double Precision Logarithm Approximation Unit) implements the algorithm and supports double precision inputs.

Both units support invalid input detection.

All implementations in this package calculate an approximation of the natural logarithm.

For more details about the software implementation see the respective readme file and paper for the ICSI log.



Package Structure
-----------------

This package contains the following files and folder:

-README                                 : This file

-DP-ICSILog/DP-ICSILog.c        : C file that contains the adjusted for double precision implementation and an example of how to use the function.

-Virtex 5/SP-LAU                        : This folder contains the VHDL source files as well as .xco and .ngc files of the IPs that have been used to implement the single precision unit on Virtex 5.

-Virtex 5/DP-LAU                        : This folder contains the VHDL source files as well as .xco and .ngc files of the IPs that have been used to implement the double precision unit on Virtex 5.

-Virtex 4/SP-LAU                        : This folder contains the VHDL source files as well as .xco and .ngc files of the IPs that have been used to implement the single precision unit on Virtex 4.

-Virtex 4/DP-LAU                        : This folder contains the VHDL source files as well as .xco and .ngc files of the IPs that have been used to implement the double precision unit on Virtex 4.

-COE Files                              : This folder contains COE files to be used if one needs to adjust the accuracy of the unit. 

-PAO Files                              : This folder contains PAO files that contain the Peripheral Analysis Order for the SP and DP LAUs.              




Usage of the DP-ICSILog  
-----------------------


The DP-ICSILog.c file contains the necessay global variables and functions that need to be
called in order to use the DP-ICSILog function as well as an example.



Interface of the LAU 
--------------------

The toplevel module of the LAU is sp_fp_log_v2 for the single precision logarithm approximation unit
and the dp_fp_log_v2 for double precison.

sp/dp : Single Precision / Double Precision
fp    : Floating Point
log   : Logarithm
V2    : Because the mantissa lookup table has been initialized using the respective function of the ICSILog V2 0.6 Beta software.
        (The Version 2 of this function doubled the precision of the unit comparing to Version 1)

The interface of the unit is defined as follows:

entity sp_fp_log_v2/dp_fp_log_v2 is
        Port ( rst : in STD_LOGIC;          -- The reset signal
               clk : in STD_LOGIC;          -- The clock signal
               valid_in: in STD_LOGIC;      -- Signal that indicates valid number at the input port of the unit.
               input_val: STD_LOGIC_VECTOR(31/63 downto 0);     -- The input number.
               valid_out : STD_LOGIC;       -- Signal that indicates valid number at the output port of the unit.
               output_val : STD_LOGIC_VECTOR(31/63 downto 0)   -- The output number, the approximation of the logarithm of the input number.
              );
end sp_fp_log_v2/dp_fp_log_v2;



Implementation Details
----------------------

The VHDL units have been designed using the Xilinx 10.1 Design Suite.

ISE 10.1 was used to create the unit.

Coregen was used to create all the IPs used in this unit. 

The released LAUs use a mantissa lookup table with 4,096 entries.

Target devices are Virtex 4 and Virtex 5 FPGAs. 

One needs to change the IPs used in order to use the unit on any FPGA that meets the demands of number of block rams (This number
depends on the desired accuracy and thus on the size of the mantissa lookup table) and number of DSP slices (3 DSP slices are occupied).

One can use the coe files in the COE file folder to regenerate the mantissa lookup table for different accuracy and resources occupation.

Both units have a latency of 22 cycles (Virtex 5) and 28 cycles (Virtex 4) which is the same irrespective of the size of the mantissa lookup table used and thus the accuracy.

The released units occupy 2% of the hardware resources on the Virtex 5 SX95T FPGA and can operate with the following clock frequencies
as they were reported by the static timing report:

353.4 MHz for the SP-LAU on the V5SX95T-2 and
320.6 MHz for the DP-LAU on the V5SX95T-2 .



Verification Details
--------------------

Modelsim 6.3f was used for extensive post place and route simulations.

The development board HTG-V5-PCIE by HiTech Global populated with a V5SX95T-1 FPGA was used to verify the LAUs.

ChiScope Pro Analyzer was used for advanced on-chip debugging and verification of the units.




 
IP Configuration Details for the Virtex 5 LAUs
----------------------------------------------

The IPs used for the implementations are the following:
(The configuration options that are not mentioned were not selected.)


comp_eq_000000000000 : 

Comparator , 
Operation :A=B, 
Data Type: Unsigned, 
Input Width: 12, 
Port B Constant: 000000000000, 
Pipeline Stages: 0, 
Output Options:Registered Output, 
Synchronous Settings: Clear


comp_eq_000000000000000 : 

Comparator , 
Operation :A=B, 
Data Type: Unsigned, 
Input Width: 15, 
Port B Constant: 000000000000000, 
Pipeline Stages: 0, 
Output Options:Registered Output, 
Synchronous Settings: Clear


comp_eq_8ones : 

Comparator , 
Operation :A=B, 
Data Type: Unsigned, 
Input Width: 8, 
Port B Constant: 11111111, 
Pipeline Stages: 0, 
Output Options:Registered Output, 
Synchronous Settings: Clear


comp_eq_11ones : 

Comparator , 
Operation :A=B, 
Data Type: Unsigned, 
Input Width: 11, 
Port B Constant: 11111111111, 
Pipeline Stages: 0, 
Output Options:Registered Output, 
Synchronous Settings: Clear


comp_eq_22zeros : 

Comparator , 
Operation :A=B, 
Data Type: Unsigned, 
Input Width: 22, 
Port B Constant: 00000...0000, 
Pipeline Stages: 0, 
Output Options:Registered Output, 
Synchronous Settings: Clear


comp_eq_51zeros : 

Comparator , 
Operation :A=B, 
Data Type: Unsigned, 
Input Width: 51, 
Port B Constant: 00000...0000, 
Pipeline Stages: 0, 
Output Options:Registered Output, 
Synchronous Settings: Clear


comp_eq_111111 : 

Comparator , 
Operation :A=B, 
Data Type: Unsigned, 
Input Width: 6, 
Port B Constant: 111111, 
Pipeline Stages: 0, 
Output Options:Registered Output, 
Synchronous Settings: Clear


comp_eq_111111111 : 

Comparator , 
Operation :A=B, 
Data Type: Unsigned, 
Input Width: 9, 
Port B Constant: 111111111, 
Pipeline Stages: 0, 
Output Options:Registered Output, 
Synchronous Settings: Clear


exp_lut_MEM :

Block Memory Generator,
Memory Type: Single Port ROM,
Read Width: 9
Read Depth: 128


mant_lut_MEM :

Block Memory Generator,
Memory Type: Single Port ROM,
Read Width: 27
Read Depth: 4096 (depends on the desired accuracy)


All the registers used are RAM-based Shift Registers. The width and depth of each register is indicated by the name.
For example: reg_1b_1c is a register of 1 bit and 1 clock latency.


sp_fp_add:

Floating Point,
Operation Selection: Add,
Precision: Single,
Architecture Optimization: High Speed,
Family Optimizations: Full Usage,
Latency and Rate Configuration: Use Maximum Latency


sp_fp_mult:

Floating Point,
Operation Selection: Multiply,
Precision: Single,
Architecture Optimization: High Speed,
Family Optimizations: Medium Usage,
Latency and Rate Configuration: Use Maximum Latency


Note: 
The Coregen Project Settings were changed from Virtex 5 to Virtex 4 and all the above IPs were regenerated under the current project settings, 
except only for the RAM-based Shift Registers that operate in parallel with the sp_fp_add and sp_fp_mult IPs. In this case the depth (clock delay)
was changed according to the latency of the sp_fp_add and sp_fp_mult IPs.



Authors and Contact Details 
---------------------------

Nikos Alachiotis                        alachiot@in.tum.de
Alexandros Stamatakis           stamatak@in.tum.de



Copyright 
---------

These programs are free software; you can redistribute it and/or modify
it under the terms of the GNU Lesser General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.

The programs are distributed in the hope that they will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.


Further Information
-------------------

The FPGA units SP-LAU and DP-LAU are exact implementations of the SP-ICSILog and the DP-ICSILog algorithms respectively.
Furthermore there is support for invalid input detection like nan, inf, -inf or zero.

For more information on the LAU see the paper:

N. Alachiotis, A. Stamatakis: "Efficient Floating-Point Logarithm Unit for FPGAs". Accepted for publication at RAW workshop, 
held in conjunction with IPDPS 2010, Atlanta, Georgia, April, 2010.

For more information on the ICSI log algorithm see the paper:

O.Vinyals, G.Friedland, A Hardware-Independent Fast Logarithm Approximation with Adjustable Accuracy. 
Tenth IEEE International Symposium on Multimedia, 2008. ISM 2008. pp. 61-65, December 2008.

or/and download the official single precision C implementation from:

http://linux.softpedia.com/get/Programming/Libraries/ICSILog-41333.shtml


Citation
--------

By using this component you agree to cite it as: "Efficient Floating-Point Logarithm Unit for FPGAs", by Nikos Alachiotis and Alexandros Stamatakis, accapted for publication at RAW workhsop, held in conjunction with IPDPS 2010.


Release Notes
------------

Version : 0.2 beta
Build date : September 20th, 2009
 * support for Virtex 4 FPGAs as well
 * FPGA verification

Version : 0.1 beta
Build date : August 2nd, 2009
 * support for Virtex 5 FPGAs only
 * Tested by using extensive post place and route simulations.






Compare with Previous | Blame | View Log

powered by: WebSVN 2.1.0

© copyright 1999-2024 OpenCores.org, equivalent to Oliscience, all rights reserved. OpenCores®, registered trademark.