OpenCores
URL https://opencores.org/ocsvn/lateq/lateq/trunk

The pipelined architecture is often used in high-speed FPGA cores.
In complex designs, data processing is often split into multiple
paths performing some (possibly different) operations in parallel.
In such a case it may be difficult to keep the same latency
(measured in clock periods) in all paths.
 
GUI-based tools (e.g. Xilinx System Generator or Altera DSP Builder)
take care of equalizing (balancing) the latencies (delays)
of different paths.
However, it seems that up to now there has been no good solution for designers
implementing their designs in HDL.
This project offers a methodology that automatically balances
latencies in different paths of a pipelined core, so that data arriving
at a certain processing block, or appearing on the output, are properly
aligned in time.
As estimating the delay by analyzing the source code may be difficult
and error-prone (if the author uses non-standard solutions), the method
is based on simulation.
 
Data going through the IP core are labelled (in simulation only) with
an additional "time marker", which the user has to generate at the input.
The directives "-- pragma translate_off" and "-- pragma translate_on" are
used to restrict the generation and processing of those labels to simulation.
 
Wherever the user wants to equalize the latency of certain data paths, he/she
places a special block (latency equalizer - "lateq").
The equalized data paths are routed through shift registers whose lengths are
calculated from the results of the previous simulation (the initial length is 0).
The block may work in two simulation modes and one synthesis mode.
 
1. In the standard simulation mode it reports the time markers
of the data in different paths. This "delay report" is written to a file
(the next version may use a function written in C++ and called via VHPI to
analyze those reports without writing them to a file, or a dedicated program
connected via a named socket). The delay report file is then analyzed by
another program, "latreadgen.py", which generates a dedicated function returning
the appropriate delay for each path in each equalizer block.
 
2. If the delays are already correctly selected, the user may set the
parameter switching on the "final verification". In this mode the delay
report is not generated, so the simulation is faster and no disk
space is used for the report. In this mode any inconsistency of the time
markers on the output of the "lateq" block causes a simulation error.
 
3. In the synthesis mode, all instructions related to time markers
and their processing are switched off using the "-- pragma translate_off" and
"-- pragma translate_on" directives. Therefore the system does not affect
the performance of the IP core.
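
The delay calculation behind the two simulation modes can be sketched in
Python as follows. This is only a minimal illustration, not the code of
latreadgen.py: the function names and the list-of-markers input format are
assumptions made for this example. It relies on the fact that a path with
higher latency delivers older data, i.e. a smaller time marker, so each path
has to be delayed by the difference between its marker and the smallest one.

```python
def equalizing_delays(markers):
    """Return the extra delay (in clock cycles) needed for each path.

    `markers` holds the time markers seen in the same clock cycle on the
    paths of one equalizer (hypothetical input format).
    """
    oldest = min(markers)  # marker carried by the slowest path
    return [m - oldest for m in markers]


def check_aligned(markers):
    """Mimic the "final verification" mode: fail on unequal markers."""
    if len(set(markers)) > 1:
        raise ValueError(f"unequal latencies: {markers}")


# Markers taken from the demo error message: out0=0, out1=-1.
delays = equalizing_delays([0, -1])
print(delays)  # path 0 is the faster one and needs one extra stage
```

With the delays applied, all paths carry the same marker and check_aligned
passes silently.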
 
The proposed system is offered in two versions.
1. The first version, located in the "single_type" subdirectory, assumes that
all time-aligned data in all data paths are of the same type.
This allows the use of a versatile latency checking and equalizing block
implemented in pure VHDL.
 
2. The second version, located in the "various_types" subdirectory, assumes
that the datapath may use a varying number of data items of different types.
Unfortunately it is not possible (yet?) to implement such a flexible latency
checking and equalizing block in pure VHDL.
Generic types were introduced only in VHDL-2008, and they are still
not fully supported by most synthesis tools. Moreover, to implement the needed
block we would need to have records of different types as ports, and then
iterate through the fields of these records.
To solve that problem, the project contains the tool "lateqgen.py", which generates
such a "latency checker and equalizer" (LECQ) for a particular number of paths of
user-provided types. The calling syntax is:
 
lateqgen.py entity_name output_file type_for_path0 type_for_path1 ...
 
The number of paths in the LECQ block is defined by the number of types provided
at the end of the command line. Of course you can use the same type in two paths.
In this case you simply use the same type name more than once.
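
How the argument list maps to the number of paths can be sketched in Python.
This is not the actual lateqgen.py code; the function name and the example
entity, file and type names are hypothetical, and the list is assumed not to
include the script name itself.

```python
def parse_lateqgen_args(argv):
    """Split an argument list that follows the calling syntax above."""
    if len(argv) < 3:
        raise ValueError("need entity_name, output_file and at least one type")
    entity_name, output_file, *path_types = argv
    return entity_name, output_file, path_types


# Hypothetical invocation with three paths, two of them sharing a type.
entity, out_file, types = parse_lateqgen_args(
    ["my_lateq", "my_lateq.vhd", "T_MY_DATA", "T_MY_DATA", "T_OTHER"]
)
print(len(types))  # number of paths in the generated block
```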
 
To allow passing data together with the time markers through the design, the
data must be encapsulated in record types with an optional (simulation-only)
time marker field (lateq_mrk).
Due to the way the VHDL sources are generated, the name of
each user type passing through the LECQ must start with "T_"
(e.g. "T_MY_DATA"). Additionally, the user must define a constant, which will
be used to initialize all registers. The name of this constant is derived from
the type name by replacing the initial "T_" with "C_" and appending "_INIT"
(so in our example it will be "C_MY_DATA_INIT").
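
The naming rule can be written down as a small Python helper. The function
name is hypothetical and the code is only a sketch of the rule stated above,
not an excerpt from lateqgen.py.

```python
def init_constant_name(type_name):
    """Derive the name of the initialization constant from a type name."""
    if not type_name.startswith("T_"):
        raise ValueError(f'type name must start with "T_": {type_name}')
    # Replace the leading "T_" with "C_" and append "_INIT".
    return "C_" + type_name[2:] + "_INIT"


print(init_constant_name("T_MY_DATA"))
```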
 
One more thing must be explained. Each LECQ block
must be uniquely identified. In theory we could use the VHDL INSTANCE_NAME
attribute for that.
Unfortunately it appears that the generated instance names differ between
simulators. Most likely they will also be different in synthesis tools,
so it would not be possible to pass the delays found in simulation
to synthesis (maybe for a single set of tools, like Vivado and its simulator,
it would be possible to adapt the tools to convert those instance names
appropriately).
To avoid this problem the user should pass a unique LEQ_ID generic
to each instance of the delay equalizer.
What if this block is located in another one, used multiple times?
In this case the user should implement the LEQ_ID generic also in this
container block, and pass a unique value to it. Then the LEQ_ID generic
value of the container block instance should be concatenated with
the ":" string and the unique value used for the particular instance of the LECQ block.
If the blocks are instantiated in a for-generate loop, the user should also concatenate
the loop variable value converted to a string (with the integer'image function),
again separated with ":".
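
The resulting identifier scheme can be modelled in Python to see what the
concatenated strings look like. In the design itself the concatenation is
done in VHDL; the helper below and the names "PROC" and "EQ1" are hypothetical
and serve only as an illustration.

```python
def leq_id(*parts):
    """Join container IDs, loop indices and the local ID with ':'."""
    return ":".join(str(p) for p in parts)


# A container "PROC" instantiated four times in a generate loop,
# each instance holding one equalizer with the local ID "EQ1".
ids = [leq_id("PROC", i, "EQ1") for i in range(4)]
print(ids[3])

# Every equalizer instance must end up with a distinct identifier.
assert len(set(ids)) == len(ids)
```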
 
To allow you to try out the described technology, the project contains a relatively
simple system with 64 inputs from ADCs measuring the signals from a certain
particle detector.
The signal generated by a particle passing through the detector is distributed
between neighbouring channels (strips). To find the position and energy of
the particle, the system first finds the strip with the maximum signal level.
Then it selects a predefined number of channels surrounding that "central" strip.
For the selected channels it calculates the sum of the signals
(the energy of the particle) and the sum of each signal multiplied by its distance
from the central channel (so the center of gravity of the registered signal may be
calculated to improve the resolution of the detector).
The final calculation of the hit position is done in the testbench, as I didn't want to
increase the project's complexity by implementing a divider block.
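
The processing described above, including the final division done in the
testbench, can be summarized by a short Python model. It is a sketch only:
the function name, the half_width parameter and the clipping of the window at
the detector edges are assumptions, and the sample values are made up.

```python
def hit_parameters(samples, half_width):
    """Return (charge, position) of a hit found in one set of samples."""
    # Strip with the maximum signal level (the "central" strip).
    center = max(range(len(samples)), key=lambda i: samples[i])
    # Window of channels around the central strip, clipped at the edges.
    lo = max(0, center - half_width)
    hi = min(len(samples) - 1, center + half_width)
    window = range(lo, hi + 1)
    charge = sum(samples[i] for i in window)
    moment = sum(samples[i] * (i - center) for i in window)
    # Center of gravity; assumes a non-zero charge.
    position = center + moment / charge
    return charge, position


samples = [0] * 64
samples[9:12] = [10, 30, 20]  # made-up signal spread over three strips
charge, position = hit_parameters(samples, half_width=1)
print(charge, position)
```

In this example the position lies slightly above strip 10, because the
right-hand neighbour carries more signal than the left-hand one.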
 
Finding the maximum and calculating the sums are performed in hierarchical
tree-based comparators and adders. The user may define the number of inputs handled
in each node of the tree (parameters "EX1_NOF_INS_IN_CMP" and "EX1_NOF_INS_IN_ADD"
in the ex1_trees_pkg.vhd file). You can play with these values, changing the number
of levels, and hence the delay in different paths.
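
The relation between the number of inputs per node and the number of tree
levels can be estimated with a few lines of Python. The function is a
hypothetical helper, and reading the result as a latency assumes one register
stage per tree level, which is not stated in this description.

```python
def tree_levels(n_inputs, ins_per_node):
    """Number of levels of a tree reducing n_inputs to a single output."""
    if ins_per_node < 2:
        raise ValueError("each node must handle at least 2 inputs")
    levels = 0
    remaining = n_inputs
    while remaining > 1:
        # Ceiling division: a partially filled node still takes one node.
        remaining = -(-remaining // ins_per_node)
        levels += 1
    return levels


for k in (2, 4, 8):
    print(k, tree_levels(64, k))
```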
 
The two versions of the demo also use different implementations of time markers.
The first one simply uses integers starting from -1 and incremented on every clock pulse,
so the implementation will fail after 2^31 clock cycles.
The second version uses time markers defined as integers starting from -1 and
incremented on every clock pulse until a predefined C_LATEQ_MRK_MAX value is reached.
On the next pulse the time marker returns to 0. Such an implementation allows much
longer simulations and works correctly as long as the highest latency difference is below
C_LATEQ_MRK_MAX/2-1 (of course the latency difference must be calculated in a special way).
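
One possible way to handle such wrapping markers is sketched below in Python.
The function names are hypothetical and the exact arithmetic used in the VHDL
sources may differ; the sketch only shows the general idea of a signed
difference taken modulo the marker period.

```python
def next_marker(marker, mrk_max):
    """Advance a time marker by one clock pulse, wrapping back to 0."""
    return 0 if marker >= mrk_max else marker + 1


def marker_diff(a, b, mrk_max):
    """Signed distance from marker b to marker a, allowing for wrap-around.

    Valid only when the real difference is small compared with mrk_max.
    """
    period = mrk_max + 1  # markers take the values 0..mrk_max
    d = (a - b) % period
    if d > period // 2:
        d -= period  # interpret large distances as negative ones
    return d


# With mrk_max = 10, marker 1 is three pulses after marker 9 (9, 10, 0, 1).
print(marker_diff(1, 9, 10), marker_diff(9, 1, 10))
```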
 
HOW TO USE PROVIDED DEMOS
To use the demos you need the following packages:
Python3
GHDL
gtkwave
 
To start with a fresh configuration, with all additional delays set to 0, you should type:
make initial
 
Then you can test the design with:
make final
 
This performs the simulation in the "final mode". As the design is not yet synchronized properly,
you should see an error message like this:
src/ex1_eq_mf.vhd:116:16:@80ns:(report failure): EQ1 inequal latencies: out0=0, out1=-1
To synchronize the design correctly, you should type:
make synchro
 
This runs the simulation in the analysis mode, then creates the correct function for configuring
the LECQ blocks, and finally runs the simulation once again.
You should see a report of two properly detected pulses:
src/ex1_proc_tb.vhd:122:11:@300ns:(report note): Hit with charge: 2.5e2 at 1.475999999999999e1
src/ex1_proc_tb.vhd:122:11:@380ns:(report note): Hit with charge: 2.649999999999999e2 at 2.549056603773585e1
 
You can run "make reader" to analyse the signals inside the design with the gtkwave viewer.
You can play with the number of inputs of a single adder and comparator by changing the constants
EX1_NOF_INS_IN_CMP and EX1_NOF_INS_IN_ADD in the file ex1_trees_pkg.vhd.
 
You will see how the delay in different paths changes, and how the system adapts to these changes.
 