Line 10... |
Line 10... |
Correct execution is indicated by the decoded output being a descending count
|
Correct execution is indicated by the decoded output being a descending count
|
from 223 to 1, repeated 3 times.
|
from 223 to 1, repeated 3 times.
|
|
|
For synthesis, import into a C based design tool, set appropriate HW parameters
|
For synthesis, import into a C based design tool, set appropriate HW parameters
|
and synthesize for a particular platform.
|
and synthesize for a particular platform.
|
|
|
|
For our case study of generating hardware from this code, the following
|
|
transformations and directives were used for improving performance:
|
|
|
|
- Break up complex functions into smaller & simpler computational
|
|
segments/loops to allow independent optimizations, for example see
|
|
berlekamp.cpp, each loop labelled for further directives.
|
|
|
|
- Modify loops dependent on dynamic bounds to use static bounds and
|
|
mask output to give correct result.
|
|
|
|
- Unroll 'For' loops with fixed static bounds maximally, marked in
|
|
code as comments corresponding to each loop to be unrolled. Loops
|
|
unrolled in berlekamp.cpp, chien_search.cpp, gf_arith.cpp,
|
|
syndrome.cpp.
|
|
|
|
- Synthesize hardware units for individual functions rather than the
|
|
entire logic as a single unit, marked in code as comments for
|
|
independent synthesis.
|
|
|
|
- Use appropriate streaming buffers for communication between
|
|
functional units. Buffers should allow fine-grained pipeline
|
|
parallelism across a single data block where possible.
|