OpenCores
URL https://opencores.org/ocsvn/hdl-deflate/hdl-deflate/trunk

Subversion Repositories hdl-deflate

[/] [hdl-deflate/] [trunk/] [README.md] - Blame information for rev 7

Details | Compare with Previous | View Log

Line No. Rev Author Line
1 7 tomtor
# HDL-deflate
2
FPGA implementation of deflate (de)compress RFC 1950/1951
3
 
4
This design is implemented in MyHDL (www.myhdl.org) and can be translated to Verilog or VHDL.
5
 
6
It has been verified in Icarus, Xilinx Vivado and on a physical Xilinx device (Digilent Arty).
7
 
8
Usage should be clear from the test bench in `test_deflate.py`.
9
 
10
# Tunable parameters
11
 
12
    OBSIZE = 8192   # Size of output buffer (BRAM)
13
                    # You need 32768 to compress ALL valid deflate streams!
14
 
15
    IBSIZE = 2048   # Size of input buffer (LUT-RAM)
16
 
17
    CWINDOW = 32    # Search window for compression
18
 
19
## Compression efficiency
20
 
21
One can use a sliding window to reduce the size of the input buffer and the LUT-usage.
22
 
23
The minimal value is 2 * CWINDOW (64 bytes), the UnitTest in `test_deflate.py`
24
uses this strategy.
25
 
26
By default the compressor will reduce repeated 3/4/5 byte sequences in the search window to 15 bit.
27
This will result in a decent compression ratio for many real life input data patterns.
28
 
29
At the expense of additional LUTs one can improve this by enlarging the `CWINDOW` or expanding
30
the matching code to include 6/7/8/9/10 byte matches. Set `MATCH10` to `True` in the top of `deflate.py`
31
to activate this option.
32
 
33
Another strategy for data sets with just a small set of used byte values would be
34
to use a dedicated pre-computed Huffman tree. I could add this if there is interest, but it is probably
35
better to use a more dense coding in your FPGA application data in the first place.
36
 
37
## Compression speed
38
 
39
To reduce LUT usage the original implementation matched each slot in the search window in a dedicated clock cycle.
40
By setting `FAST` to `True` it will generate the logic to match the whole window in a single cycle.
41
The effective speed will be around 1 input byte every two cycles.
42
 
43
## Disabling functionality to save LUTs
44
 
45
The compress mode can be disabled by setting `COMPRESS` to `False`.
46
 
47
The decompress mode can be disabled by setting `DECOMPRESS` to `False`.
48
 
49
As an option you can disable dynamic tree decompression by setting `DYNAMIC` to `False`.
50
This will save a lot of LUT-ram and HDL-Deflate compressed output is always using a static tree,
51
but zlib will normally generate dynamic trees. Set zlib option `Z_FIXED` to generate streams with
52
a static tree.
53
 
54
In general the size of `leaves` and `d_leaves` can be reduced a lot when the maximal length of the input stream
55
is less than 32768. One can replace `test_data()` in `test_deflate.py` with a specific version which generates
56
typical test data for the intended FPGA application, and repeatedly halve the sizes of the `leaves` arrays
57
until the test fails.
58
 
59
FAST MATCH10 compress only has quite good resource usage.
60
 
61
## Practical considerations
62
 
63
In general HDL-Deflate is interesting when speed is important. When speed is not a real issue using a (soft)
64
CPU with zlib is probably the better approach. Especially decompression is also quite fast with a CPU and HDL-Deflate
65
needs a lot of LUTs when configured to decompress ANY deflate input stream. Compression is another story because it
66
is a LOT faster in hardware with the `FAST` option and uses a reasonable amount of LUTs.
67
 
68
# FPGA validation
69
 
70
## Minimal with smaller leaves arrays
71
 
72
Resource|Estimation
73
--------|----------
74
LUT     |7116
75
LUTRAM  |800
76
FF      |2265
77
BRAM    |4
78
 
79
## Compress False and smaller leaves arrays
80
 
81
Resource|Estimation
82
--------|----------
83
LUT     |5769
84
LUTRAM  |512
85
FF      |2169
86
BRAM    |4
87
 
88
## Decompress False and FAST and MATCH10
89
 
90
Resource|Estimation
91
--------|----------
92
LUT     |5118
93
LUTRAM  |84
94
FF      |1577
95
BRAM    |2
96
 
97
## FAST
98
 
99
Resource|Estimation
100
--------|----------
101
LUT     |8246
102
LUTRAM  |704
103
FF      |2520
104
BRAM    |4
105
 
106
## FAST and MATCH10
107
 
108
Resource|Estimation
109
--------|----------
110
LUT     |11888
111
LUTRAM  |536
112
FF      |3308
113
BRAM    |4
114
 
115
## Speed
116
 
117
The Vivado timing report fails at 100Mhz, but the test bench runs fine on my Arty at 100Mhz.
118
 
119
# Future Improvements (when there is interest)
120
 
121
* ~~Reduce LUT usage~~
122
* ~~Improve speed from current 80Mhz to 100Mhz~~
123
* ~~Improve compression performance~~

powered by: WebSVN 2.1.0

© copyright 1999-2024 OpenCores.org, equivalent to Oliscience, all rights reserved. OpenCores®, registered trademark.