This project implements a sorter able to sort a continuous stream of data, consisting of records labeled with "sort keys".
Sorter sorts one record every two clock cycles.
Sorter is based on the heap sort algorithm. Efficient implementation is assured thanks to the use of internal dual port
RAM in FPGA.
The required size of heap is equal to the expected maximum distance between unsorted records in the data stream.
The sorter implemented in this project is designed for sorting of stream of constant length records.
The main supposed application area are the multichannel data acquisition systems, where data records
labelled with time stamps are transmitted through multiple channels with different latencies
to the data concentrator, which should sort data according to their time stamps and source identifiers
before sending them to further processing.
Each record contains the "sort key" and the payload.
In the simplest case, the sort key may be a N-bit bit vector, treated as unsigned integer number.
Of course in case of lengthy data stream, the sort key will reach its maximum
value of 2^N-1 and then wrap to 0.
For the sorter it is fully acceptable, as long, as the capacity of the sorter (maximum number of data records
stored in the sorter) summed with the maximum distance between unsorted records in the input data stream is less than half of the maximum value of the sort-key.
In this case we may compare sort keys of the newly received data record and data records stored in the sorter
using simple operation:
signal unsigned sort_key1(N-1 downto 0);
signal unsigned sort_key2(N-1 downto 0);
signal signed sort_key_diff(N-1 downto 0);
Correct operation of the sorter requires, that the heap is initially filled with the records labelled
with smallest possible values of the sort key. Similarly, when all data are sent to the sorter, it is necessary to send to the input of the sorter records with the biggest possible value of sort key.
The "smallest possible" and the "biggest possible" values are implemented using additional bit fields "init" and "valid". This implementation allows as also to mark temporary interruptions in the stream of data, as shown below:
init=1, valid=0 - "initial record"
The sorter is filled with initial records on the beginning of operation. The "initial record" is "smaller" than any data record. On the output of the sorter initial records may be discarded.
init=1, valid=1 - "end record"
When all data are transferred to the sorter, you should feed the sorter with "end records" to keep the data flowing through the sorter, and to "flush" last sorted data out of the sorter. The "end record" is "bigger" than any data record. When first "end record" appears on the output, all data are sorted.
init=0, valid=1 - "standard data record"
This record contains the useful data. The time-stamp is stored in the "sort-key" and user data in the "payload".
init=0, valid=0 - "empty data record"
When in the particular sorting cycle there are no valid data on the input of the sorter, the last data records remain in the sorter. Therefore you may feed the sorter with "empty data records" with sort key equal to to the highest recent sort key, to keep data flowing through the sorter. On the output the "empty data records" may get discarded.
The main entity of the sorter is defined as follows:
entity sorter_sys is
NLEVELS : integer := SYS_NLEVELS -- number of levels in the sorter heap
din : in T_DATA_REC;
we : in std_logic;
dout : out T_DATA_REC;
dav : out std_logic;
clk : in std_logic;
rst_n : in std_logic;
ready : out std_logic);
To customize it to your needs, you should adjust definition of the T_DATA_REC
type in the sorter_pkg.vhd
Additionally you may also need to modify functions for converting data records into the std_logic_vector and from the std_logic_vector (tdrec2stlv
You may also change definition of the T_SORT_KEY type
, but then probably you should also change the function sort_cmp_lt
used to compare sort keys in two data records.
The sources provide also complete testbench for my sorter. To check it, you should simply adjust parameters given in the "sort_test_gen.py" file and run "make".
The Python script will generate the sorter configuration and input records. Then the makefile will compile the testbench and run the simulation.
Finally the records on the output will be analysed. If everything works fine, the "Test passed!" message will be displayed.
The simulation is prepared for the Linux system, and uses the ghdl
If you want to synthesize the sorter for use in real FPGA, you should replace the implementation found in dpram4.vhd
with implementation based on:
The source is available in dpram_inf.vhd
All my sources in this archive are published under the BSD license. You can use them and modify them, however you should keep the information about the original author.
First implementation of the sorter has been described in my paper:
Wojciech M. Zabołotny, "Dual port memory based Heapsort implementation for FPGA", Proc. SPIE 8008, 80080E (2011); doi:10.1117/12.905281
I'll be glad if you cite my paper, when you publish something based on my sources.
The archive contains also some sources (dpram4.vhd, dpram_inf.vhd) which were obtained from sources with unclear licensing conditions - so I simply provide them for completeness, but I'm not able to set any specific licensing for them.
I don't know whether my sorter infringes any patents. If you want to use it for commercial purposes, you should check it yourself. I also don't know if my sorter works correctly in all possible conditions. I provide it "AS IS" without any warranty. You use it on your own risk!
First version of the sorter is available on my own website:
HLS implementation for Xilinx FPGAs
I have prepared different implementations of my heap sorter in C using the Xilinx HLS compiler.
The detailed description of those implementations and their properties may be found in my paper:
Wojciech M. Zabołotny, "Implementation of heapsort in programmable logic with high-level synthesis", Proc. SPIE 10808 (2018); doi:10.1117/12.2502093