OpenCores

Rev 2	Rev 3
`======================================================================================================`	`======================================================================================================`
`DP-ICSI log (C implementation of a fast logarithmic approximation unit based on ICSI log V2 0.6 Beta)`	`DP-ICSI log (C implementation of a fast logarithmic approximation unit based on ICSI log V2 0.6 Beta)`

`DP/SP LAU (FPGA unit that implements the ICSI log algorithm in VHDL)`	`DP/SP LAU (FPGA unit that implements the ICSI log algorithm in VHDL)`
`======================================================================================================`	`======================================================================================================`


`Version 0.2 beta`	`Version 0.2 beta`
`Build date: August 2nd, 2009`	`Build date: August 2nd, 2009`



`Introduction`	`Introduction`
`------------`	`------------`

`Software :`	`Software :`

`This package contains a C implementation of the ICSI logarithm approximation algorithm originally introduced in`	`This package contains a C implementation of the ICSI logarithm approximation algorithm originally introduced in`

`O.Vinyals, G.Friedland, A Hardware-Independent Fast Logarithm Approximation with Adjustable Accuracy.`	`O.Vinyals, G.Friedland, A Hardware-Independent Fast Logarithm Approximation with Adjustable Accuracy.`
`Tenth IEEE International Symposium on Multimedia, 2008. ISM 2008. pp. 61-65, December 2008.`	`Tenth IEEE International Symposium on Multimedia, 2008. ISM 2008. pp. 61-65, December 2008.`

`The new C function has been adjusted to support double precision inputs in contrast to the official implementation of the algorithm`	`The new C function has been adjusted to support double precision inputs in contrast to the official implementation of the algorithm`
`which supports only single precision. Furthermore, there is invalid input detection which makes the function fully compatible with`	`which supports only single precision. Furthermore, there is invalid input detection which makes the function fully compatible with`
`the IEEE 754 standard and the GNU library log() function.`	`the IEEE 754 standard and the GNU library log() function.`


`Hardware:`	`Hardware:`

`This package also contains a VHDL implementation of the ICSI logarithm approximation algorithm described in`	`This package also contains a VHDL implementation of the ICSI logarithm approximation algorithm described in`

`N. Alachiotis, A. Stamatakis: "Efficient Floating-Point Logarithm Unit for FPGAs". Accepted for publication at RAW workshop,`	`N. Alachiotis, A. Stamatakis: "Efficient Floating-Point Logarithm Unit for FPGAs". Accepted for publication at RAW workshop,`
`held in conjunction with IPDPS 2010, Atlanta, Georgia, April, 2010.`	`held in conjunction with IPDPS 2010, Atlanta, Georgia, April, 2010.`

`The SP-LAU (Single Precision Logarithm Approximation Unit) implements the algorithm and supports single precision inputs.`	`The SP-LAU (Single Precision Logarithm Approximation Unit) implements the algorithm and supports single precision inputs.`
`The DP-LAU (Double Precision Logarithm Approximation Unit) implements the algorithm and supports double precision inputs.`	`The DP-LAU (Double Precision Logarithm Approximation Unit) implements the algorithm and supports double precision inputs.`

`Both units support invalid input detection.`	`Both units support invalid input detection.`

`All implementations in this package calculate an approximation of the natural logarithm.`	`All implementations in this package calculate an approximation of the natural logarithm.`

`For more details about the software implementation see the respective readme file and paper for the ICSI log.`	`For more details about the software implementation see the respective readme file and paper for the ICSI log.`



`Package Structure`	`Package Structure`
`-----------------`	`-----------------`

`This package contains the following files and folder:`	`This package contains the following files and folder:`

`-README : This file`	`-README : This file`

`-DP-ICSILog/DP-ICSILog.c : C file that contains the adjusted for double precision implementation and an example of how to use the function.`	`-DP-ICSILog/DP-ICSILog.c : C file that contains the adjusted for double precision implementation and an example of how to use the function.`

`-Virtex 5/SP-LAU : This folder contains the VHDL source files as well as .xco and .ngc files of the IPs that have been used to implement the single precision unit on Virtex 5.`	`-Virtex 5/SP-LAU : This folder contains the VHDL source files as well as .xco and .ngc files of the IPs that have been used to implement the single precision unit on Virtex 5.`

`-Virtex 5/DP-LAU : This folder contains the VHDL source files as well as .xco and .ngc files of the IPs that have been used to implement the double precision unit on Virtex 5.`	`-Virtex 5/DP-LAU : This folder contains the VHDL source files as well as .xco and .ngc files of the IPs that have been used to implement the double precision unit on Virtex 5.`

`-Virtex 4/SP-LAU : This folder contains the VHDL source files as well as .xco and .ngc files of the IPs that have been used to implement the single precision unit on Virtex 4.`	`-Virtex 4/SP-LAU : This folder contains the VHDL source files as well as .xco and .ngc files of the IPs that have been used to implement the single precision unit on Virtex 4.`

`-Virtex 4/DP-LAU : This folder contains the VHDL source files as well as .xco and .ngc files of the IPs that have been used to implement the double precision unit on Virtex 4.`	`-Virtex 4/DP-LAU : This folder contains the VHDL source files as well as .xco and .ngc files of the IPs that have been used to implement the double precision unit on Virtex 4.`

`-COE Files : This folder contains COE files to be used if one needs to adjust the accuracy of the unit.`	`-COE Files : This folder contains COE files to be used if one needs to adjust the accuracy of the unit.`

`-PAO Files : This folder contains PAO files that contain the Peripheral Analysis Order for the SP and DP LAUs.`	`-PAO Files : This folder contains PAO files that contain the Peripheral Analysis Order for the SP and DP LAUs.`




`Usage of the DP-ICSILog`	`Usage of the DP-ICSILog`
`-----------------------`	`-----------------------`


`The DP-ICSILog.c file contains the necessay global variables and functions that need to be`	`The DP-ICSILog.c file contains the necessay global variables and functions that need to be`
`called in order to use the DP-ICSILog function as well as an example.`	`called in order to use the DP-ICSILog function as well as an example.`



`Interface of the LAU`	`Interface of the LAU`
`--------------------`	`--------------------`

`The toplevel module of the LAU is sp_fp_log_v2 for the single precision logarithm approximation unit`	`The toplevel module of the LAU is sp_fp_log_v2 for the single precision logarithm approximation unit`
`and the dp_fp_log_v2 for double precison.`	`and the dp_fp_log_v2 for double precison.`

`sp/dp : Single Precision / Double Precision`	`sp/dp : Single Precision / Double Precision`
`fp : Floating Point`	`fp : Floating Point`
`log : Logarithm`	`log : Logarithm`
`V2 : Because the mantissa lookup table has been initialized using the respective function of the ICSILog V2 0.6 Beta software.`	`V2 : Because the mantissa lookup table has been initialized using the respective function of the ICSILog V2 0.6 Beta software.`
`(The Version 2 of this function doubled the precision of the unit comparing to Version 1)`	`(The Version 2 of this function doubled the precision of the unit comparing to Version 1)`

`The interface of the unit is defined as follows:`	`The interface of the unit is defined as follows:`

`entity sp_fp_log_v2/dp_fp_log_v2 is`	`entity sp_fp_log_v2/dp_fp_log_v2 is`
`Port ( rst : in STD_LOGIC; -- The reset signal`	`Port ( rst : in STD_LOGIC; -- The reset signal`
`clk : in STD_LOGIC; -- The clock signal`	`clk : in STD_LOGIC; -- The clock signal`
`valid_in: in STD_LOGIC; -- Signal that indicates valid number at the input port of the unit.`	`valid_in: in STD_LOGIC; -- Signal that indicates valid number at the input port of the unit.`
`input_val: STD_LOGIC_VECTOR(31/63 downto 0); -- The input number.`	`input_val: STD_LOGIC_VECTOR(31/63 downto 0); -- The input number.`
`valid_out : STD_LOGIC; -- Signal that indicates valid number at the output port of the unit.`	`valid_out : STD_LOGIC; -- Signal that indicates valid number at the output port of the unit.`
`output_val : STD_LOGIC_VECTOR(31/63 downto 0) -- The output number, the approximation of the logarithm of the input number.`	`output_val : STD_LOGIC_VECTOR(31/63 downto 0) -- The output number, the approximation of the logarithm of the input number.`
`);`	`);`
`end sp_fp_log_v2/dp_fp_log_v2;`	`end sp_fp_log_v2/dp_fp_log_v2;`



`Implementation Details`	`Implementation Details`
`----------------------`	`----------------------`

`The VHDL units have been designed using the Xilinx 10.1 Design Suite.`	`The VHDL units have been designed using the Xilinx 10.1 Design Suite.`

`ISE 10.1 was used to create the unit.`	`ISE 10.1 was used to create the unit.`

`Coregen was used to create all the IPs used in this unit.`	`Coregen was used to create all the IPs used in this unit.`

`The released LAUs use a mantissa lookup table with 4,096 entries.`	`The released LAUs use a mantissa lookup table with 4,096 entries.`

`Target devices are Virtex 4 and Virtex 5 FPGAs.`	`Target devices are Virtex 4 and Virtex 5 FPGAs.`

`One needs to change the IPs used in order to use the unit on any FPGA that meets the demands of number of block rams (This number`	`One needs to change the IPs used in order to use the unit on any FPGA that meets the demands of number of block rams (This number`
`depends on the desired accuracy and thus on the size of the mantissa lookup table) and number of DSP slices (3 DSP slices are occupied).`	`depends on the desired accuracy and thus on the size of the mantissa lookup table) and number of DSP slices (3 DSP slices are occupied).`

`One can use the coe files in the COE file folder to regenerate the mantissa lookup table for different accuracy and resources occupation.`	`One can use the coe files in the COE file folder to regenerate the mantissa lookup table for different accuracy and resources occupation.`

`Both units have a latency of 22 cycles (Virtex 5) and 28 cycles (Virtex 4) which is the same irrespective of the size of the mantissa lookup table used and thus the accuracy.`	`Both units have a latency of 22 cycles (Virtex 5) and 28 cycles (Virtex 4) which is the same irrespective of the size of the mantissa lookup table used and thus the accuracy.`

`The released units occupy 2% of the hardware resources on the Virtex 5 SX95T FPGA and can operate with the following clock frequencies`	`The released units occupy 2% of the hardware resources on the Virtex 5 SX95T FPGA and can operate with the following clock frequencies`
`as they were reported by the static timing report:`	`as they were reported by the static timing report:`

`353.4 MHz for the SP-LAU on the V5SX95T-2 and`	`353.4 MHz for the SP-LAU on the V5SX95T-2 and`
`320.6 MHz for the DP-LAU on the V5SX95T-2 .`	`320.6 MHz for the DP-LAU on the V5SX95T-2 .`



`Verification Details`	`Verification Details`
`--------------------`	`--------------------`

`Modelsim 6.3f was used for extensive post place and route simulations.`	`Modelsim 6.3f was used for extensive post place and route simulations.`

`The development board HTG-V5-PCIE by HiTech Global populated with a V5SX95T-1 FPGA was used to verify the LAUs.`	`The development board HTG-V5-PCIE by HiTech Global populated with a V5SX95T-1 FPGA was used to verify the LAUs.`

`ChiScope Pro Analyzer was used for advanced on-chip debugging and verification of the units.`	`ChiScope Pro Analyzer was used for advanced on-chip debugging and verification of the units.`





`IP Configuration Details for the Virtex 5 LAUs`	`IP Configuration Details for the Virtex 5 LAUs`
`----------------------------------------------`	`----------------------------------------------`

`The IPs used for the implementations are the following:`	`The IPs used for the implementations are the following:`
`(The configuration options that are not mentioned were not selected.)`	`(The configuration options that are not mentioned were not selected.)`


`comp_eq_000000000000 :`	`comp_eq_000000000000 :`

`Comparator ,`	`Comparator ,`
`Operation :A=B,`	`Operation :A=B,`
`Data Type: Unsigned,`	`Data Type: Unsigned,`
`Input Width: 12,`	`Input Width: 12,`
`Port B Constant: 000000000000,`	`Port B Constant: 000000000000,`
`Pipeline Stages: 0,`	`Pipeline Stages: 0,`
`Output Options:Registered Output,`	`Output Options:Registered Output,`
`Synchronous Settings: Clear`	`Synchronous Settings: Clear`


`comp_eq_000000000000000 :`	`comp_eq_000000000000000 :`

`Comparator ,`	`Comparator ,`
`Operation :A=B,`	`Operation :A=B,`
`Data Type: Unsigned,`	`Data Type: Unsigned,`
`Input Width: 15,`	`Input Width: 15,`
`Port B Constant: 000000000000000,`	`Port B Constant: 000000000000000,`
`Pipeline Stages: 0,`	`Pipeline Stages: 0,`
`Output Options:Registered Output,`	`Output Options:Registered Output,`
`Synchronous Settings: Clear`	`Synchronous Settings: Clear`


`comp_eq_8ones :`	`comp_eq_8ones :`

`Comparator ,`	`Comparator ,`
`Operation :A=B,`	`Operation :A=B,`
`Data Type: Unsigned,`	`Data Type: Unsigned,`
`Input Width: 8,`	`Input Width: 8,`
`Port B Constant: 11111111,`	`Port B Constant: 11111111,`
`Pipeline Stages: 0,`	`Pipeline Stages: 0,`
`Output Options:Registered Output,`	`Output Options:Registered Output,`
`Synchronous Settings: Clear`	`Synchronous Settings: Clear`


`comp_eq_11ones :`	`comp_eq_11ones :`

`Comparator ,`	`Comparator ,`
`Operation :A=B,`	`Operation :A=B,`
`Data Type: Unsigned,`	`Data Type: Unsigned,`
`Input Width: 11,`	`Input Width: 11,`
`Port B Constant: 11111111111,`	`Port B Constant: 11111111111,`
`Pipeline Stages: 0,`	`Pipeline Stages: 0,`
`Output Options:Registered Output,`	`Output Options:Registered Output,`
`Synchronous Settings: Clear`	`Synchronous Settings: Clear`


`comp_eq_22zeros :`	`comp_eq_22zeros :`

`Comparator ,`	`Comparator ,`
`Operation :A=B,`	`Operation :A=B,`
`Data Type: Unsigned,`	`Data Type: Unsigned,`
`Input Width: 22,`	`Input Width: 22,`
`Port B Constant: 00000...0000,`	`Port B Constant: 00000...0000,`
`Pipeline Stages: 0,`	`Pipeline Stages: 0,`
`Output Options:Registered Output,`	`Output Options:Registered Output,`
`Synchronous Settings: Clear`	`Synchronous Settings: Clear`


`comp_eq_51zeros :`	`comp_eq_51zeros :`

`Comparator ,`	`Comparator ,`
`Operation :A=B,`	`Operation :A=B,`
`Data Type: Unsigned,`	`Data Type: Unsigned,`
`Input Width: 51,`	`Input Width: 51,`
`Port B Constant: 00000...0000,`	`Port B Constant: 00000...0000,`
`Pipeline Stages: 0,`	`Pipeline Stages: 0,`
`Output Options:Registered Output,`	`Output Options:Registered Output,`
`Synchronous Settings: Clear`	`Synchronous Settings: Clear`


`comp_eq_111111 :`	`comp_eq_111111 :`

`Comparator ,`	`Comparator ,`
`Operation :A=B,`	`Operation :A=B,`
`Data Type: Unsigned,`	`Data Type: Unsigned,`
`Input Width: 6,`	`Input Width: 6,`
`Port B Constant: 111111,`	`Port B Constant: 111111,`
`Pipeline Stages: 0,`	`Pipeline Stages: 0,`
`Output Options:Registered Output,`	`Output Options:Registered Output,`
`Synchronous Settings: Clear`	`Synchronous Settings: Clear`


`comp_eq_111111111 :`	`comp_eq_111111111 :`

`Comparator ,`	`Comparator ,`
`Operation :A=B,`	`Operation :A=B,`
`Data Type: Unsigned,`	`Data Type: Unsigned,`
`Input Width: 9,`	`Input Width: 9,`
`Port B Constant: 111111111,`	`Port B Constant: 111111111,`
`Pipeline Stages: 0,`	`Pipeline Stages: 0,`
`Output Options:Registered Output,`	`Output Options:Registered Output,`
`Synchronous Settings: Clear`	`Synchronous Settings: Clear`


`exp_lut_MEM :`	`exp_lut_MEM :`

`Block Memory Generator,`	`Block Memory Generator,`
`Memory Type: Single Port ROM,`	`Memory Type: Single Port ROM,`
`Read Width: 9`	`Read Width: 9`
`Read Depth: 128`	`Read Depth: 128`


`mant_lut_MEM :`	`mant_lut_MEM :`

`Block Memory Generator,`	`Block Memory Generator,`
`Memory Type: Single Port ROM,`	`Memory Type: Single Port ROM,`
`Read Width: 27`	`Read Width: 27`
`Read Depth: 4096 (depends on the desired accuracy)`	`Read Depth: 4096 (depends on the desired accuracy)`


`All the registers used are RAM-based Shift Registers. The width and depth of each register is indicated by the name.`	`All the registers used are RAM-based Shift Registers. The width and depth of each register is indicated by the name.`
`For example: reg_1b_1c is a register of 1 bit and 1 clock latency.`	`For example: reg_1b_1c is a register of 1 bit and 1 clock latency.`


`sp_fp_add:`	`sp_fp_add:`

`Floating Point,`	`Floating Point,`
`Operation Selection: Add,`	`Operation Selection: Add,`
`Precision: Single,`	`Precision: Single,`
`Architecture Optimization: High Speed,`	`Architecture Optimization: High Speed,`
`Family Optimizations: Full Usage,`	`Family Optimizations: Full Usage,`
`Latency and Rate Configuration: Use Maximum Latency`	`Latency and Rate Configuration: Use Maximum Latency`


`sp_fp_mult:`	`sp_fp_mult:`

`Floating Point,`	`Floating Point,`
`Operation Selection: Multiply,`	`Operation Selection: Multiply,`
`Precision: Single,`	`Precision: Single,`
`Architecture Optimization: High Speed,`	`Architecture Optimization: High Speed,`
`Family Optimizations: Medium Usage,`	`Family Optimizations: Medium Usage,`
`Latency and Rate Configuration: Use Maximum Latency`	`Latency and Rate Configuration: Use Maximum Latency`


`Note:`	`Note:`
`The Coregen Project Settings were changed from Virtex 5 to Virtex 4 and all the above IPs were regenerated under the current project settings,`	`The Coregen Project Settings were changed from Virtex 5 to Virtex 4 and all the above IPs were regenerated under the current project settings,`
`except only for the RAM-based Shift Registers that operate in parallel with the sp_fp_add and sp_fp_mult IPs. In this case the depth (clock delay)`	`except only for the RAM-based Shift Registers that operate in parallel with the sp_fp_add and sp_fp_mult IPs. In this case the depth (clock delay)`
`was changed according to the latency of the sp_fp_add and sp_fp_mult IPs.`	`was changed according to the latency of the sp_fp_add and sp_fp_mult IPs.`



`Authors and Contact Details`	`Authors and Contact Details`
`---------------------------`	`---------------------------`

`Nikos Alachiotis alachiot@in.tum.de`	`Nikos Alachiotis alachiot@in.tum.de`
`Alexandros Stamatakis stamatak@in.tum.de`	`Alexandros Stamatakis stamatak@in.tum.de`



`Copyright`	`Copyright`
`---------`	`---------`

`These programs are free software; you can redistribute it and/or modify`	`These programs are free software; you can redistribute it and/or modify`
`it under the terms of the GNU General Public License as published by`	`it under the terms of the GNU Lesser General Public License as published by`
`the Free Software Foundation; either version 2 of the License, or`	`the Free Software Foundation; either version 2 of the License, or`
`(at your option) any later version.`	`(at your option) any later version.`

`The programs are distributed in the hope that they will be useful,`	`The programs are distributed in the hope that they will be useful,`
`but WITHOUT ANY WARRANTY; without even the implied warranty of`	`but WITHOUT ANY WARRANTY; without even the implied warranty of`
`MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the`	`MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the`
`GNU General Public License for more details.`	`GNU General Public License for more details.`


`Further Information`	`Further Information`
`-------------------`	`-------------------`

`The FPGA units SP-LAU and DP-LAU are exact implementations of the SP-ICSILog and the DP-ICSILog algorithms respectively.`	`The FPGA units SP-LAU and DP-LAU are exact implementations of the SP-ICSILog and the DP-ICSILog algorithms respectively.`
`Furthermore there is support for invalid input detection like nan, inf, -inf or zero.`	`Furthermore there is support for invalid input detection like nan, inf, -inf or zero.`

`For more information on the LAU see the paper:`	`For more information on the LAU see the paper:`

`N. Alachiotis, A. Stamatakis: "Efficient Floating-Point Logarithm Unit for FPGAs". Accepted for publication at RAW workshop,`	`N. Alachiotis, A. Stamatakis: "Efficient Floating-Point Logarithm Unit for FPGAs". Accepted for publication at RAW workshop,`
`held in conjunction with IPDPS 2010, Atlanta, Georgia, April, 2010.`	`held in conjunction with IPDPS 2010, Atlanta, Georgia, April, 2010.`

`For more information on the ICSI log algorithm see the paper:`	`For more information on the ICSI log algorithm see the paper:`

`O.Vinyals, G.Friedland, A Hardware-Independent Fast Logarithm Approximation with Adjustable Accuracy.`	`O.Vinyals, G.Friedland, A Hardware-Independent Fast Logarithm Approximation with Adjustable Accuracy.`
`Tenth IEEE International Symposium on Multimedia, 2008. ISM 2008. pp. 61-65, December 2008.`	`Tenth IEEE International Symposium on Multimedia, 2008. ISM 2008. pp. 61-65, December 2008.`

`or/and download the official single precision C implementation from:`	`or/and download the official single precision C implementation from:`

`http://linux.softpedia.com/get/Programming/Libraries/ICSILog-41333.shtml`	`http://linux.softpedia.com/get/Programming/Libraries/ICSILog-41333.shtml`


`Citation`	`Citation`
`--------`	`--------`

`By using this component you agree to cite it as: "Efficient Floating-Point Logarithm Unit for FPGAs", by Nikos Alachiotis and Alexandros Stamatakis, accapted for publication at RAW workhsop, held in conjunction with IPDPS 2010.`	`By using this component you agree to cite it as: "Efficient Floating-Point Logarithm Unit for FPGAs", by Nikos Alachiotis and Alexandros Stamatakis, accapted for publication at RAW workhsop, held in conjunction with IPDPS 2010.`


`Release Notes`	`Release Notes`
`------------`	`------------`

`Version : 0.2 beta`	`Version : 0.2 beta`
`Build date : September 20th, 2009`	`Build date : September 20th, 2009`
`* support for Virtex 4 FPGAs as well`	`* support for Virtex 4 FPGAs as well`
`* FPGA verification`	`* FPGA verification`

`Version : 0.1 beta`	`Version : 0.1 beta`
`Build date : August 2nd, 2009`	`Build date : August 2nd, 2009`
`* support for Virtex 5 FPGAs only`	`* support for Virtex 5 FPGAs only`
`* Tested by using extensive post place and route simulations.`	`* Tested by using extensive post place and route simulations.`

======================================================================================================

======================================================================================================

DP-ICSI log  (C implementation of a fast logarithmic approximation unit based on ICSI log V2 0.6 Beta)

DP-ICSI log  (C implementation of a fast logarithmic approximation unit based on ICSI log V2 0.6 Beta)

DP/SP LAU    (FPGA unit that implements the ICSI log algorithm in VHDL)

DP/SP LAU    (FPGA unit that implements the ICSI log algorithm in VHDL)

======================================================================================================

======================================================================================================

Version 0.2 beta

Version 0.2 beta

Build date: August 2nd, 2009

Build date: August 2nd, 2009

Introduction

Introduction

------------

------------

Software :

Software :

This package contains a C implementation of the ICSI logarithm approximation algorithm originally introduced in

This package contains a C implementation of the ICSI logarithm approximation algorithm originally introduced in

O.Vinyals, G.Friedland, A Hardware-Independent Fast Logarithm Approximation with Adjustable Accuracy.

O.Vinyals, G.Friedland, A Hardware-Independent Fast Logarithm Approximation with Adjustable Accuracy.

Tenth IEEE International Symposium on Multimedia, 2008. ISM 2008. pp. 61-65, December 2008.

Tenth IEEE International Symposium on Multimedia, 2008. ISM 2008. pp. 61-65, December 2008.

The new C function has been adjusted to support double precision inputs in contrast to the official implementation of the algorithm

The new C function has been adjusted to support double precision inputs in contrast to the official implementation of the algorithm

which supports only single precision. Furthermore, there is invalid input detection which makes the function fully compatible with

which supports only single precision. Furthermore, there is invalid input detection which makes the function fully compatible with

the IEEE 754 standard and the GNU library log() function.

the IEEE 754 standard and the GNU library log() function.

Hardware:

Hardware:

This package also contains a VHDL implementation of the ICSI logarithm approximation algorithm described in

This package also contains a VHDL implementation of the ICSI logarithm approximation algorithm described in

N. Alachiotis, A. Stamatakis: "Efficient Floating-Point Logarithm Unit for FPGAs". Accepted for publication at RAW workshop,

N. Alachiotis, A. Stamatakis: "Efficient Floating-Point Logarithm Unit for FPGAs". Accepted for publication at RAW workshop,

held in conjunction with IPDPS 2010, Atlanta, Georgia, April, 2010.

held in conjunction with IPDPS 2010, Atlanta, Georgia, April, 2010.

The SP-LAU (Single Precision Logarithm Approximation Unit) implements the algorithm and supports single precision inputs.

The SP-LAU (Single Precision Logarithm Approximation Unit) implements the algorithm and supports single precision inputs.

The DP-LAU (Double Precision Logarithm Approximation Unit) implements the algorithm and supports double precision inputs.

The DP-LAU (Double Precision Logarithm Approximation Unit) implements the algorithm and supports double precision inputs.

Both units support invalid input detection.

Both units support invalid input detection.

All implementations in this package calculate an approximation of the natural logarithm.

All implementations in this package calculate an approximation of the natural logarithm.

For more details about the software implementation see the respective readme file and paper for the ICSI log.

For more details about the software implementation see the respective readme file and paper for the ICSI log.

Package Structure

Package Structure

-----------------

-----------------

This package contains the following files and folder:

This package contains the following files and folder:

-README                                 : This file

-README                                 : This file

-DP-ICSILog/DP-ICSILog.c        : C file that contains the adjusted for double precision implementation and an example of how to use the function.

-DP-ICSILog/DP-ICSILog.c        : C file that contains the adjusted for double precision implementation and an example of how to use the function.

-Virtex 5/SP-LAU                        : This folder contains the VHDL source files as well as .xco and .ngc files of the IPs that have been used to implement the single precision unit on Virtex 5.

-Virtex 5/SP-LAU                        : This folder contains the VHDL source files as well as .xco and .ngc files of the IPs that have been used to implement the single precision unit on Virtex 5.

-Virtex 5/DP-LAU                        : This folder contains the VHDL source files as well as .xco and .ngc files of the IPs that have been used to implement the double precision unit on Virtex 5.

-Virtex 5/DP-LAU                        : This folder contains the VHDL source files as well as .xco and .ngc files of the IPs that have been used to implement the double precision unit on Virtex 5.

-Virtex 4/SP-LAU                        : This folder contains the VHDL source files as well as .xco and .ngc files of the IPs that have been used to implement the single precision unit on Virtex 4.

-Virtex 4/SP-LAU                        : This folder contains the VHDL source files as well as .xco and .ngc files of the IPs that have been used to implement the single precision unit on Virtex 4.

-Virtex 4/DP-LAU                        : This folder contains the VHDL source files as well as .xco and .ngc files of the IPs that have been used to implement the double precision unit on Virtex 4.

-Virtex 4/DP-LAU                        : This folder contains the VHDL source files as well as .xco and .ngc files of the IPs that have been used to implement the double precision unit on Virtex 4.

-COE Files                              : This folder contains COE files to be used if one needs to adjust the accuracy of the unit.

-COE Files                              : This folder contains COE files to be used if one needs to adjust the accuracy of the unit.

-PAO Files                              : This folder contains PAO files that contain the Peripheral Analysis Order for the SP and DP LAUs.

-PAO Files                              : This folder contains PAO files that contain the Peripheral Analysis Order for the SP and DP LAUs.

Usage of the DP-ICSILog

Usage of the DP-ICSILog

-----------------------

-----------------------

The DP-ICSILog.c file contains the necessay global variables and functions that need to be

The DP-ICSILog.c file contains the necessay global variables and functions that need to be

called in order to use the DP-ICSILog function as well as an example.

called in order to use the DP-ICSILog function as well as an example.

Interface of the LAU

Interface of the LAU

--------------------

--------------------

The toplevel module of the LAU is sp_fp_log_v2 for the single precision logarithm approximation unit

The toplevel module of the LAU is sp_fp_log_v2 for the single precision logarithm approximation unit

and the dp_fp_log_v2 for double precison.

and the dp_fp_log_v2 for double precison.

sp/dp : Single Precision / Double Precision

sp/dp : Single Precision / Double Precision

fp    : Floating Point

fp    : Floating Point

log   : Logarithm

log   : Logarithm

V2    : Because the mantissa lookup table has been initialized using the respective function of the ICSILog V2 0.6 Beta software.

V2    : Because the mantissa lookup table has been initialized using the respective function of the ICSILog V2 0.6 Beta software.

        (The Version 2 of this function doubled the precision of the unit comparing to Version 1)

        (The Version 2 of this function doubled the precision of the unit comparing to Version 1)

The interface of the unit is defined as follows:

The interface of the unit is defined as follows:

entity sp_fp_log_v2/dp_fp_log_v2 is

entity sp_fp_log_v2/dp_fp_log_v2 is

        Port ( rst : in STD_LOGIC;          -- The reset signal

        Port ( rst : in STD_LOGIC;          -- The reset signal

               clk : in STD_LOGIC;          -- The clock signal

               clk : in STD_LOGIC;          -- The clock signal

               valid_in: in STD_LOGIC;      -- Signal that indicates valid number at the input port of the unit.

               valid_in: in STD_LOGIC;      -- Signal that indicates valid number at the input port of the unit.

               input_val: STD_LOGIC_VECTOR(31/63 downto 0);     -- The input number.

               input_val: STD_LOGIC_VECTOR(31/63 downto 0);     -- The input number.

               valid_out : STD_LOGIC;       -- Signal that indicates valid number at the output port of the unit.

               valid_out : STD_LOGIC;       -- Signal that indicates valid number at the output port of the unit.

               output_val : STD_LOGIC_VECTOR(31/63 downto 0)   -- The output number, the approximation of the logarithm of the input number.

               output_val : STD_LOGIC_VECTOR(31/63 downto 0)   -- The output number, the approximation of the logarithm of the input number.

);

);

end sp_fp_log_v2/dp_fp_log_v2;

end sp_fp_log_v2/dp_fp_log_v2;

Implementation Details

Implementation Details

----------------------

----------------------

The VHDL units have been designed using the Xilinx 10.1 Design Suite.

The VHDL units have been designed using the Xilinx 10.1 Design Suite.

ISE 10.1 was used to create the unit.

ISE 10.1 was used to create the unit.

Coregen was used to create all the IPs used in this unit.

Coregen was used to create all the IPs used in this unit.

The released LAUs use a mantissa lookup table with 4,096 entries.

The released LAUs use a mantissa lookup table with 4,096 entries.

Target devices are Virtex 4 and Virtex 5 FPGAs.

Target devices are Virtex 4 and Virtex 5 FPGAs.

One needs to change the IPs used in order to use the unit on any FPGA that meets the demands of number of block rams (This number

One needs to change the IPs used in order to use the unit on any FPGA that meets the demands of number of block rams (This number

depends on the desired accuracy and thus on the size of the mantissa lookup table) and number of DSP slices (3 DSP slices are occupied).

depends on the desired accuracy and thus on the size of the mantissa lookup table) and number of DSP slices (3 DSP slices are occupied).

One can use the coe files in the COE file folder to regenerate the mantissa lookup table for different accuracy and resources occupation.

One can use the coe files in the COE file folder to regenerate the mantissa lookup table for different accuracy and resources occupation.

Both units have a latency of 22 cycles (Virtex 5) and 28 cycles (Virtex 4) which is the same irrespective of the size of the mantissa lookup table used and thus the accuracy.

Both units have a latency of 22 cycles (Virtex 5) and 28 cycles (Virtex 4) which is the same irrespective of the size of the mantissa lookup table used and thus the accuracy.

The released units occupy 2% of the hardware resources on the Virtex 5 SX95T FPGA and can operate with the following clock frequencies

The released units occupy 2% of the hardware resources on the Virtex 5 SX95T FPGA and can operate with the following clock frequencies

as they were reported by the static timing report:

as they were reported by the static timing report:

353.4 MHz for the SP-LAU on the V5SX95T-2 and

353.4 MHz for the SP-LAU on the V5SX95T-2 and

320.6 MHz for the DP-LAU on the V5SX95T-2 .

320.6 MHz for the DP-LAU on the V5SX95T-2 .

Verification Details

Verification Details

--------------------

--------------------

Modelsim 6.3f was used for extensive post place and route simulations.

Modelsim 6.3f was used for extensive post place and route simulations.

The development board HTG-V5-PCIE by HiTech Global populated with a V5SX95T-1 FPGA was used to verify the LAUs.

The development board HTG-V5-PCIE by HiTech Global populated with a V5SX95T-1 FPGA was used to verify the LAUs.

ChiScope Pro Analyzer was used for advanced on-chip debugging and verification of the units.

ChiScope Pro Analyzer was used for advanced on-chip debugging and verification of the units.

IP Configuration Details for the Virtex 5 LAUs

IP Configuration Details for the Virtex 5 LAUs

----------------------------------------------

----------------------------------------------

The IPs used for the implementations are the following:

The IPs used for the implementations are the following:

(The configuration options that are not mentioned were not selected.)

(The configuration options that are not mentioned were not selected.)

comp_eq_000000000000 :

comp_eq_000000000000 :

Comparator ,

Comparator ,

Operation :A=B,

Operation :A=B,

Data Type: Unsigned,

Data Type: Unsigned,

Input Width: 12,

Input Width: 12,

Port B Constant: 000000000000,

Port B Constant: 000000000000,

Pipeline Stages: 0,

Pipeline Stages: 0,

Output Options:Registered Output,

Output Options:Registered Output,

Synchronous Settings: Clear

Synchronous Settings: Clear

comp_eq_000000000000000 :

comp_eq_000000000000000 :

Comparator ,

Comparator ,

Operation :A=B,

Operation :A=B,

Data Type: Unsigned,

Data Type: Unsigned,

Input Width: 15,

Input Width: 15,

Port B Constant: 000000000000000,

Port B Constant: 000000000000000,

Pipeline Stages: 0,

Pipeline Stages: 0,

Output Options:Registered Output,

Output Options:Registered Output,

Synchronous Settings: Clear

Synchronous Settings: Clear

comp_eq_8ones :

comp_eq_8ones :

Comparator ,

Comparator ,

Operation :A=B,

Operation :A=B,

Data Type: Unsigned,

Data Type: Unsigned,

Input Width: 8,

Input Width: 8,

Port B Constant: 11111111,

Port B Constant: 11111111,

Pipeline Stages: 0,

Pipeline Stages: 0,

Output Options:Registered Output,

Output Options:Registered Output,

Synchronous Settings: Clear

Synchronous Settings: Clear

comp_eq_11ones :

comp_eq_11ones :

Comparator ,

Comparator ,

Operation :A=B,

Operation :A=B,

Data Type: Unsigned,

Data Type: Unsigned,

Input Width: 11,

Input Width: 11,

Port B Constant: 11111111111,

Port B Constant: 11111111111,

Pipeline Stages: 0,

Pipeline Stages: 0,

Output Options:Registered Output,

Output Options:Registered Output,

Synchronous Settings: Clear

Synchronous Settings: Clear

comp_eq_22zeros :

comp_eq_22zeros :

Comparator ,

Comparator ,

Operation :A=B,

Operation :A=B,

Data Type: Unsigned,

Data Type: Unsigned,

Input Width: 22,

Input Width: 22,

Port B Constant: 00000...0000,

Port B Constant: 00000...0000,

Pipeline Stages: 0,

Pipeline Stages: 0,

Output Options:Registered Output,

Output Options:Registered Output,

Synchronous Settings: Clear

Synchronous Settings: Clear

comp_eq_51zeros :

comp_eq_51zeros :

Comparator ,

Comparator ,

Operation :A=B,

Operation :A=B,

Data Type: Unsigned,

Data Type: Unsigned,

Input Width: 51,

Input Width: 51,

Port B Constant: 00000...0000,

Port B Constant: 00000...0000,

Pipeline Stages: 0,

Pipeline Stages: 0,

Output Options:Registered Output,

Output Options:Registered Output,

Synchronous Settings: Clear

Synchronous Settings: Clear

comp_eq_111111 :

comp_eq_111111 :

Comparator ,

Comparator ,

Operation :A=B,

Operation :A=B,

Data Type: Unsigned,

Data Type: Unsigned,

Input Width: 6,

Input Width: 6,

Port B Constant: 111111,

Port B Constant: 111111,

Pipeline Stages: 0,

Pipeline Stages: 0,

Output Options:Registered Output,

Output Options:Registered Output,

Synchronous Settings: Clear

Synchronous Settings: Clear

comp_eq_111111111 :

comp_eq_111111111 :

Comparator ,

Comparator ,

Operation :A=B,

Operation :A=B,

Data Type: Unsigned,

Data Type: Unsigned,

Input Width: 9,

Input Width: 9,

Port B Constant: 111111111,

Port B Constant: 111111111,

Pipeline Stages: 0,

Pipeline Stages: 0,

Output Options:Registered Output,

Output Options:Registered Output,

Synchronous Settings: Clear

Synchronous Settings: Clear

exp_lut_MEM :

exp_lut_MEM :

Block Memory Generator,

Block Memory Generator,

Memory Type: Single Port ROM,

Memory Type: Single Port ROM,

Read Width: 9

Read Width: 9

Read Depth: 128

Read Depth: 128

mant_lut_MEM :

mant_lut_MEM :

Block Memory Generator,

Block Memory Generator,

Memory Type: Single Port ROM,

Memory Type: Single Port ROM,

Read Width: 27

Read Width: 27

Read Depth: 4096 (depends on the desired accuracy)

Read Depth: 4096 (depends on the desired accuracy)

All the registers used are RAM-based Shift Registers. The width and depth of each register is indicated by the name.

All the registers used are RAM-based Shift Registers. The width and depth of each register is indicated by the name.

For example: reg_1b_1c is a register of 1 bit and 1 clock latency.

For example: reg_1b_1c is a register of 1 bit and 1 clock latency.

sp_fp_add:

sp_fp_add:

Floating Point,

Floating Point,

Operation Selection: Add,

Operation Selection: Add,

Precision: Single,

Precision: Single,

Architecture Optimization: High Speed,

Architecture Optimization: High Speed,

Family Optimizations: Full Usage,

Family Optimizations: Full Usage,

Latency and Rate Configuration: Use Maximum Latency

Latency and Rate Configuration: Use Maximum Latency

sp_fp_mult:

sp_fp_mult:

Floating Point,

Floating Point,

Operation Selection: Multiply,

Operation Selection: Multiply,

Precision: Single,

Precision: Single,

Architecture Optimization: High Speed,

Architecture Optimization: High Speed,

Family Optimizations: Medium Usage,

Family Optimizations: Medium Usage,

Latency and Rate Configuration: Use Maximum Latency

Latency and Rate Configuration: Use Maximum Latency

Note:

Note:

The Coregen Project Settings were changed from Virtex 5 to Virtex 4 and all the above IPs were regenerated under the current project settings,

The Coregen Project Settings were changed from Virtex 5 to Virtex 4 and all the above IPs were regenerated under the current project settings,

except only for the RAM-based Shift Registers that operate in parallel with the sp_fp_add and sp_fp_mult IPs. In this case the depth (clock delay)

except only for the RAM-based Shift Registers that operate in parallel with the sp_fp_add and sp_fp_mult IPs. In this case the depth (clock delay)

was changed according to the latency of the sp_fp_add and sp_fp_mult IPs.

was changed according to the latency of the sp_fp_add and sp_fp_mult IPs.

Authors and Contact Details

Authors and Contact Details

---------------------------

---------------------------

Nikos Alachiotis                        alachiot@in.tum.de

Nikos Alachiotis                        alachiot@in.tum.de

Alexandros Stamatakis           stamatak@in.tum.de

Alexandros Stamatakis           stamatak@in.tum.de

Copyright

Copyright

---------

---------

These programs are free software; you can redistribute it and/or modify

These programs are free software; you can redistribute it and/or modify

it under the terms of the GNU General Public License as published by

it under the terms of the GNU Lesser General Public License as published by

the Free Software Foundation; either version 2 of the License, or

the Free Software Foundation; either version 2 of the License, or

(at your option) any later version.

(at your option) any later version.

The programs are distributed in the hope that they will be useful,

The programs are distributed in the hope that they will be useful,

but WITHOUT ANY WARRANTY; without even the implied warranty of

but WITHOUT ANY WARRANTY; without even the implied warranty of

MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the

MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the

GNU General Public License for more details.

GNU General Public License for more details.

Further Information

Further Information

-------------------

-------------------

The FPGA units SP-LAU and DP-LAU are exact implementations of the SP-ICSILog and the DP-ICSILog algorithms respectively.

The FPGA units SP-LAU and DP-LAU are exact implementations of the SP-ICSILog and the DP-ICSILog algorithms respectively.

Furthermore there is support for invalid input detection like nan, inf, -inf or zero.

Furthermore there is support for invalid input detection like nan, inf, -inf or zero.

For more information on the LAU see the paper:

For more information on the LAU see the paper:

N. Alachiotis, A. Stamatakis: "Efficient Floating-Point Logarithm Unit for FPGAs". Accepted for publication at RAW workshop,

N. Alachiotis, A. Stamatakis: "Efficient Floating-Point Logarithm Unit for FPGAs". Accepted for publication at RAW workshop,

held in conjunction with IPDPS 2010, Atlanta, Georgia, April, 2010.

held in conjunction with IPDPS 2010, Atlanta, Georgia, April, 2010.

For more information on the ICSI log algorithm see the paper:

For more information on the ICSI log algorithm see the paper:

O.Vinyals, G.Friedland, A Hardware-Independent Fast Logarithm Approximation with Adjustable Accuracy.

O.Vinyals, G.Friedland, A Hardware-Independent Fast Logarithm Approximation with Adjustable Accuracy.

Tenth IEEE International Symposium on Multimedia, 2008. ISM 2008. pp. 61-65, December 2008.

Tenth IEEE International Symposium on Multimedia, 2008. ISM 2008. pp. 61-65, December 2008.

or/and download the official single precision C implementation from:

or/and download the official single precision C implementation from:

http://linux.softpedia.com/get/Programming/Libraries/ICSILog-41333.shtml

http://linux.softpedia.com/get/Programming/Libraries/ICSILog-41333.shtml

Citation

Citation

--------

--------

By using this component you agree to cite it as: "Efficient Floating-Point Logarithm Unit for FPGAs", by Nikos Alachiotis and Alexandros Stamatakis, accapted for publication at RAW workhsop, held in conjunction with IPDPS 2010.

By using this component you agree to cite it as: "Efficient Floating-Point Logarithm Unit for FPGAs", by Nikos Alachiotis and Alexandros Stamatakis, accapted for publication at RAW workhsop, held in conjunction with IPDPS 2010.

Release Notes

Release Notes

------------

------------

Version : 0.2 beta

Version : 0.2 beta

Build date : September 20th, 2009

Build date : September 20th, 2009

 * support for Virtex 4 FPGAs as well

 * support for Virtex 4 FPGAs as well

 * FPGA verification

 * FPGA verification

Version : 0.1 beta

Version : 0.1 beta

Build date : August 2nd, 2009

Build date : August 2nd, 2009

 * support for Virtex 5 FPGAs only

 * support for Virtex 5 FPGAs only

 * Tested by using extensive post place and route simulations.

 * Tested by using extensive post place and route simulations.

Browse

Tools

Subversion Repositories fp_log

[/] [fp_log/] [trunk/] [README] - Diff between revs 2 and 3