URL https://opencores.org/ocsvn/hdl-deflate/hdl-deflate/trunk

Subversion Repositories hdl-deflate

[/] [hdl-deflate/] [trunk/] [README.md] - Blame information for rev 7

Details | Compare with Previous | View Log


# HDL-deflate
FPGA implementation of deflate (de)compress RFC 1950/1951
 
This design is implemented in MyHDL (www.myhdl.org) and can be translated to Verilog or VHDL.
 
It has been verified in Icarus, Xilinx Vivado and on a physical Xilinx device (Digilent Arty).
 
Usage should be clear from the test bench in `test_deflate.py`.
 
# Tunable parameters
 
    OBSIZE = 8192   # Size of output buffer (BRAM)
                    # You need 32768 to compress ALL valid deflate streams!
 
    IBSIZE = 2048   # Size of input buffer (LUT-RAM)
 
    CWINDOW = 32    # Search window for compression
 
## Compression efficiency
 
One can use a sliding window to reduce the size of the input buffer and the LUT-usage.
 
The minimal value is 2 * CWINDOW (64 bytes), the UnitTest in `test_deflate.py`
uses this strategy.
 
By default the compressor will reduce repeated 3/4/5 byte sequences in the search window to 15 bit.
This will result in a decent compression ratio for many real life input data patterns.
 
At the expense of additional LUTs one can improve this by enlarging the `CWINDOW` or expanding
the matching code to include 6/7/8/9/10 byte matches. Set `MATCH10` to `True` in the top of `deflate.py`
to activate this option.
 
Another strategy for data sets with just a small set of used byte values would be
to use a dedicated pre-computed Huffman tree. I could add this if there is interest, but it is probably
better to use a more dense coding in your FPGA application data in the first place.
 
## Compression speed
 
To reduce LUT usage the original implementation matched each slot in the search window in a dedicated clock cycle.
By setting `FAST` to `True` it will generate the logic to match the whole window in a single cycle.
The effective speed will be around 1 input byte every two cycles.
 
## Disabling functionality to save LUTs
 
The compress mode can be disabled by setting `COMPRESS` to `False`.
 
The decompress mode can be disabled by setting `DECOMPRESS` to `False`.
 
As an option you can disable dynamic tree decompression by setting `DYNAMIC` to `False`.
This will save a lot of LUT-ram and HDL-Deflate compressed output is always using a static tree,
but zlib will normally generate dynamic trees. Set zlib option `Z_FIXED` to generate streams with
a static tree.
 
In general the size of `leaves` and `d_leaves` can be reduced a lot when the maximal length of the input stream
is less than 32768. One can replace `test_data()` in `test_deflate.py` with a specific version which generates
typical test data for the intended FPGA application, and repeatedly halve the sizes of the `leaves` arrays
until the test fails.
 
FAST MATCH10 compress only has quite good resource usage.
 
## Practical considerations
 
In general HDL-Deflate is interesting when speed is important. When speed is not a real issue using a (soft)
CPU with zlib is probably the better approach. Especially decompression is also quite fast with a CPU and HDL-Deflate
needs a lot of LUTs when configured to decompress ANY deflate input stream. Compression is another story because it
is a LOT faster in hardware with the `FAST` option and uses a reasonable amount of LUTs.
 
# FPGA validation
 
## Minimal with smaller leaves arrays
 
Resource|Estimation
--------|----------
LUT     |7116
LUTRAM  |800
FF      |2265
BRAM    |4
 
## Compress False and smaller leaves arrays
 
Resource|Estimation
--------|----------
LUT     |5769
LUTRAM  |512
FF      |2169
BRAM    |4
 
## Decompress False and FAST and MATCH10
 
Resource|Estimation
--------|----------
LUT     |5118
LUTRAM  |84
FF      |1577
BRAM    |2
 
## FAST
 
Resource|Estimation
--------|----------
LUT     |8246
LUTRAM  |704
FF      |2520
BRAM    |4
 
## FAST and MATCH10
 
Resource|Estimation
--------|----------
LUT     |11888
LUTRAM  |536
FF      |3308
BRAM    |4
 
## Speed
 
The Vivado timing report fails at 100Mhz, but the test bench runs fine on my Arty at 100Mhz.
 
# Future Improvements (when there is interest)
 
* ~~Reduce LUT usage~~
* ~~Improve speed from current 80Mhz to 100Mhz~~
* ~~Improve compression performance~~

Line No.	Rev	Author	Line
1	7	tomtor	`# HDL-deflate`
2			`FPGA implementation of deflate (de)compress RFC 1950/1951`
3
4			`This design is implemented in MyHDL (www.myhdl.org) and can be translated to Verilog or VHDL.`
5
6			`It has been verified in Icarus, Xilinx Vivado and on a physical Xilinx device (Digilent Arty).`
7
8			Usage should be clear from the test bench in `test_deflate.py`.
9
10			`# Tunable parameters`
11
12			`OBSIZE = 8192 # Size of output buffer (BRAM)`
13			`# You need 32768 to compress ALL valid deflate streams!`
14
15			`IBSIZE = 2048 # Size of input buffer (LUT-RAM)`
16
17			`CWINDOW = 32 # Search window for compression`
18
19			`## Compression efficiency`
20
21			`One can use a sliding window to reduce the size of the input buffer and the LUT-usage.`
22
23			The minimal value is 2 * CWINDOW (64 bytes), the UnitTest in `test_deflate.py`
24			`uses this strategy.`
25
26			`By default the compressor will reduce repeated 3/4/5 byte sequences in the search window to 15 bit.`
27			`This will result in a decent compression ratio for many real life input data patterns.`
28
29			At the expense of additional LUTs one can improve this by enlarging the `CWINDOW` or expanding
30			the matching code to include 6/7/8/9/10 byte matches. Set `MATCH10` to `True` in the top of `deflate.py`
31			`to activate this option.`
32
33			`Another strategy for data sets with just a small set of used byte values would be`
34			`to use a dedicated pre-computed Huffman tree. I could add this if there is interest, but it is probably`
35			`better to use a more dense coding in your FPGA application data in the first place.`
36
37			`## Compression speed`
38
39			`To reduce LUT usage the original implementation matched each slot in the search window in a dedicated clock cycle.`
40			By setting `FAST` to `True` it will generate the logic to match the whole window in a single cycle.
41			`The effective speed will be around 1 input byte every two cycles.`
42
43			`## Disabling functionality to save LUTs`
44
45			The compress mode can be disabled by setting `COMPRESS` to `False`.
46
47			The decompress mode can be disabled by setting `DECOMPRESS` to `False`.
48
49			As an option you can disable dynamic tree decompression by setting `DYNAMIC` to `False`.
50			`This will save a lot of LUT-ram and HDL-Deflate compressed output is always using a static tree,`
51			but zlib will normally generate dynamic trees. Set zlib option `Z_FIXED` to generate streams with
52			`a static tree.`
53
54			In general the size of `leaves` and `d_leaves` can be reduced a lot when the maximal length of the input stream
55			is less than 32768. One can replace `test_data()` in `test_deflate.py` with a specific version which generates
56			typical test data for the intended FPGA application, and repeatedly halve the sizes of the `leaves` arrays
57			`until the test fails.`
58
59			`FAST MATCH10 compress only has quite good resource usage.`
60
61			`## Practical considerations`
62
63			`In general HDL-Deflate is interesting when speed is important. When speed is not a real issue using a (soft)`
64			`CPU with zlib is probably the better approach. Especially decompression is also quite fast with a CPU and HDL-Deflate`
65			`needs a lot of LUTs when configured to decompress ANY deflate input stream. Compression is another story because it`
66			is a LOT faster in hardware with the `FAST` option and uses a reasonable amount of LUTs.
67
68			`# FPGA validation`
69
70			`## Minimal with smaller leaves arrays`
71
72			`Resource\|Estimation`
73			`--------\|----------`
74			`LUT \|7116`
75			`LUTRAM \|800`
76			`FF \|2265`
77			`BRAM \|4`
78
79			`## Compress False and smaller leaves arrays`
80
81			`Resource\|Estimation`
82			`--------\|----------`
83			`LUT \|5769`
84			`LUTRAM \|512`
85			`FF \|2169`
86			`BRAM \|4`
87
88			`## Decompress False and FAST and MATCH10`
89
90			`Resource\|Estimation`
91			`--------\|----------`
92			`LUT \|5118`
93			`LUTRAM \|84`
94			`FF \|1577`
95			`BRAM \|2`
96
97			`## FAST`
98
99			`Resource\|Estimation`
100			`--------\|----------`
101			`LUT \|8246`
102			`LUTRAM \|704`
103			`FF \|2520`
104			`BRAM \|4`
105
106			`## FAST and MATCH10`
107
108			`Resource\|Estimation`
109			`--------\|----------`
110			`LUT \|11888`
111			`LUTRAM \|536`
112			`FF \|3308`
113			`BRAM \|4`
114
115			`## Speed`
116
117			`The Vivado timing report fails at 100Mhz, but the test bench runs fine on my Arty at 100Mhz.`
118
119			`# Future Improvements (when there is interest)`
120
121			`* ~~Reduce LUT usage~~`
122			`* ~~Improve speed from current 80Mhz to 100Mhz~~`
123			`* ~~Improve compression performance~~`

Browse

Tools

Subversion Repositories hdl-deflate

[/] [hdl-deflate/] [trunk/] [README.md] - Blame information for rev 7