OpenCores
async vs sync reset in or1200
by jt_eaton on Nov 16, 2011
jt_eaton
Posts: 142
Joined: Aug 18, 2008
Last seen: Sep 29, 2018
I downloaded the new minsoc release candidate and it ran the UART simulation, so I thought I would strip out the Ethernet MAC and drop it onto a Digilent Nexys2 board. It didn't fit, which surprised me, since I have a similar design that fits nicely. Then I remembered that I had converted the old or1200 from an asynchronous reset to a synchronous one. So I edited out all of the "or posedge wb_rst" text, and the design then fit and gave me "hello World." as expected.
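
The edit described here can be scripted. A minimal Python sketch, assuming every asynchronous reset appears literally as "or posedge wb_rst" in a sensitivity list; the helper name strip_async_reset and the sample always block are made up for illustration and are not taken from the or1200 sources:

```python
import re


def strip_async_reset(verilog_text: str, reset_name: str = "wb_rst") -> str:
    """Remove 'or posedge <reset_name>' from Verilog sensitivity lists.

    Hypothetical helper: a purely textual edit that leaves the reset
    branch inside the block untouched, so it becomes a synchronous reset.
    """
    pattern = re.compile(r"\s+or\s+posedge\s+" + re.escape(reset_name) + r"\b")
    return pattern.sub("", verilog_text)


# Illustrative flop with an asynchronous reset (not real or1200 code).
sample = """always @(posedge clk or posedge wb_rst)
  if (wb_rst) q <= 1'b0;
  else        q <= d;
"""

converted = strip_async_reset(sample)
print(converted)
```

A textual edit like this does not check that each block still behaves correctly, so the result should be diffed and re-simulated before it is trusted.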


I then compared the two Xilinx WebPACK design summaries, and they show exactly what an async reset costs in a modern FPGA design. The difference is rather striking. The biggest difference in the raw totals comes from the SRAMs: I only inferred them instead of setting the `XILINX define, and in the async build about half of the LUTs went to build four of the SRAMs. The interesting number is the count of 4-input LUTs used as logic, which is an apples-to-apples comparison of the same logic with the two reset styles. Async uses 42% more logic LUTs than sync.


Number used as logic: 8,895 async reset
Number used as logic: 6,253 sync reset
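
For anyone who wants to check the 42% figure, here is a small Python sketch of the arithmetic. The two counts are the "used as logic" numbers quoted above; the variable names are just for illustration:

```python
# LUTs "used as logic" reported for the two builds.
async_logic_luts = 8895
sync_logic_luts = 6253

# Extra logic LUTs in the async build, and the overhead relative to sync.
extra_luts = async_logic_luts - sync_logic_luts
overhead = extra_luts / sync_logic_luts

print(f"extra logic LUTs with async reset: {extra_luts}")
print(f"overhead relative to sync reset: {overhead:.1%}")
```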


John Eaton


original async reset:
-----------------------------------------------

Design Summary:
Number of errors: 2
Number of warnings: 93
Logic Utilization:
Number of Slice Flip Flops: 2,612 out of 17,344 15%
Number of SLICEMs: 4,888 out of 4,336 112% (OVERMAPPED)
(SLICEMs can only be placed in SLICEM sites.)
Number of 4 input LUTs: 17,119 out of 17,344 98%
Logic Distribution:
Number of occupied Slices: 9,207 out of 8,672 106% (OVERMAPPED)
Number of Slices containing only related logic: 9,207 out of 9,207 100%
Number of Slices containing unrelated logic: 0 out of 9,207 0%
*See NOTES below for an explanation of the effects of unrelated logic.
Total Number of 4 input LUTs: 17,245 out of 17,344 99%
Number used as logic: 8,895
Number used as a route-thru: 126
Number used for Dual Port RAMs: 8,224
(Two LUTs used per Dual Port RAM)

The Slice Logic Distribution report is not meaningful if the design is
over-mapped for a non-slice resource or if Placement fails.

Number of bonded IOBs: 112 out of 250 44%
Number of RAMB16s: 8 out of 28 28%
Number of BUFGMUXs: 5 out of 24 20%
Number of DCMs: 1 out of 8 12%
Number of BSCANs: 1 out of 1 100%
Number of MULT18X18SIOs: 4 out of 28 14%

Average Fanout of Non-Clock Nets: 5.43




Converted sync reset
----------------------------------------------------

Design Summary:
Number of errors: 0
Number of warnings: 96
Logic Utilization:
Number of Slice Flip Flops: 2,543 out of 17,344 14%
Number of 4 input LUTs: 6,285 out of 17,344 36%
Logic Distribution:
Number of occupied Slices: 3,942 out of 8,672 45%
Number of Slices containing only related logic: 3,942 out of 3,942 100%
Number of Slices containing unrelated logic: 0 out of 3,942 0%
*See NOTES below for an explanation of the effects of unrelated logic.
Total Number of 4 input LUTs: 6,411 out of 17,344 36%
Number used as logic: 6,253
Number used as a route-thru: 126
Number used for Dual Port RAMs: 32
(Two LUTs used per Dual Port RAM)

The Slice Logic Distribution report is not meaningful if the design is
over-mapped for a non-slice resource or if Placement fails.

Number of bonded IOBs: 112 out of 250 44%
Number of RAMB16s: 12 out of 28 42%
Number of BUFGMUXs: 5 out of 24 20%
Number of DCMs: 1 out of 8 12%
Number of BSCANs: 1 out of 1 100%
Number of MULT18X18SIOs: 4 out of 28 14%

Average Fanout of Non-Clock Nets: 3.75




RE: async vs sync reset in or1200
by olof on Nov 16, 2011
olof
Posts: 218
Joined: Feb 10, 2010
Last seen: Dec 17, 2018
I'm not sure I quite understand what you mean. What you describe is that XST fails to identify some constructs as Block RAM and has to map them to logic. You would have to manually instantiate RAMB16 in those places to get a fair comparison.

Async

Number of RAMB16s: 8 out of 28 28%
...
Total Number of 4 input LUTs: 17,245 out of 17,344 99%
Number used for Dual Port RAMs: 8,224

Sync
Number of RAMB16s: 12 out of 28 42%
...
Total Number of 4 input LUTs: 6,411 out of 17,344 36%
Number used for Dual Port RAMs: 32
RE: async vs sync reset in or1200
by jt_eaton on Nov 16, 2011
jt_eaton
Posts: 142
Joined: Aug 18, 2008
Last seen: Sep 29, 2018
> I'm not sure I quite understand what you mean. What you describe is that XST fails to identify some constructs as Block RAM and has to map them to logic. You would have to manually instantiate RAMB16 in those places to get a fair comparison.

XST did fail to identify 4 block SRAMs, so it built them out of LUTs. This shows up in the fact that about half of the async design's LUTs are used as dual port RAM. If you look only at the number of LUTs used as logic, that shows the effect the reset style has on how Xilinx compiles an async reset versus a sync one. Even if I called out the Xilinx SRAM blocks, the async design would still be 42% larger.
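
The LUT breakdown in the two summaries can be cross-checked with a short Python sketch. The numbers are copied from the design summaries; the dictionary layout is just an illustration. It confirms that logic, route-thru and dual port RAM LUTs add up to the reported totals, and prints how much of each total went to RAM:

```python
# Per build: (used as logic, route-thru, dual port RAM, reported total LUTs)
summaries = {
    "async": (8895, 126, 8224, 17245),
    "sync": (6253, 126, 32, 6411),
}

ram_share = {}
for name, (logic, route_thru, ram, total) in summaries.items():
    # The three categories should account for every LUT in the total.
    assert logic + route_thru + ram == total
    ram_share[name] = ram / total
    print(f"{name}: {ram_share[name]:.1%} of LUTs used as dual port RAM")
```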


Look up Xilinx white paper WP-231 for an explanation of why this is happening.

Bottom line: Don't use async resets. I know everyone loved them 10 years ago, but they have no place in today's code.

John Eaton






original async reset:
-----------------------------------------------
Total Number of 4 input LUTs: 17,245 out of 17,344 99%
Number used as logic: 8,895
Number used as a route-thru: 126
Number used for Dual Port RAMs: 8,224
(Two LUTs used per Dual Port RAM)

Converted sync reset
----------------------------------------------------
Total Number of 4 input LUTs: 6,411 out of 17,344 36%
Number used as logic: 6,253
Number used as a route-thru: 126
Number used for Dual Port RAMs: 32



RE: async vs sync reset in or1200
by olof on Nov 17, 2011
olof
Posts: 218
Joined: Feb 10, 2010
Last seen: Dec 17, 2018
Well, you have to consider that the OpenRISC is used in ASICs too, where you probably will want to have asynchronous set and synchronous release. Other FPGA vendors may also have a different best-practice.

In my opinion, resets shall be kept asynchronous. It's then up to the top-level designer to decide if the reset shall be synchronized to a clock domain before it enters the IP core.


The numbers you present are very interesting, but I still want to see the async reset with manually instantiated Block RAM. Part of the extra LUT count could be a side effect of increased replication and higher fanouts, as you are driving a lot more logic in the async case.
RE: async vs sync reset in or1200
by moogyd on Nov 17, 2011
moogyd
Posts: 15
Joined: Nov 22, 2008
Last seen: Jun 26, 2019
Hi,

I agree with Olof. For ASIC, I *much* prefer asynchronous reset on all FFs.

Why?
- ATPG and Test: It's easy to put all registers into a known state at startup
- Multiple (On/Off) Power Domains: Last design I was involved in, an external IP with synchronous reset added lots of complexity where we needed to switch domains on and off
- Timing is potentially better (no reset in data path)
RE: async vs sync reset in or1200
by firefalcon on Nov 17, 2011
firefalcon
Posts: 99
Joined: Jan 10, 2011
Last seen: Mar 26, 2024
I am with John and Xilinx on this one. FPGAs naturally support synchronous resets, and timing is actually easier and more accurate when using synchronous resets.

http://www.sunburst-design.com/papers/CummingsSNUG2002SJ_Resets_rev1_1.pdf
http://www.asic-world.com/tidbits/all_reset.html
http://only-vlsi.blogspot.com/2009/05/synchronous-reset-vs-asynchronous-reset.html
RE: async vs sync reset in or1200
by olof on Nov 17, 2011
olof
Posts: 218
Joined: Feb 10, 2010
Last seen: Dec 17, 2018
> I am with John and Xilinx on this one. FPGAs naturally support synchronous resets, and timing is actually easier and more accurate when using synchronous resets.
>
> http://www.sunburst-design.com/papers/CummingsSNUG2002SJ_Resets_rev1_1.pdf
> http://www.asic-world.com/tidbits/all_reset.html
> http://only-vlsi.blogspot.com/2009/05/synchronous-reset-vs-asynchronous-reset.html


That second link had only good things to say about async resets, except that they are sensitive to metastability.

I don't know about all FPGAs, but at least the Virtex-5 CLBs (same as Spartan-6) have a selectable async or sync reset on all slice FFs, so they support both styles equally well (see page 174 in ug190 v5.3, or fire up fpga_editor, which doesn't really work well on the latest Ubuntu, by the way). You really want sync resets on state machines, but most other places can do without a reset or have an async reset.

As I said before, the best of both worlds is asynchronous set and synchronous release (which is what happens if you use @posedge rst). I find it hard to see why it would require more logic, as you only need one FF per clock domain to do this, given that you have enough reset nets. I think this is an interesting topic, and I would like to investigate it more. I'm even willing to admit I'm wrong if someone comes up with more evidence :)
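
To make "asynchronous set, synchronous release" concrete, here is a rough cycle-level model in Python. It is only a behavioural sketch: the class name ResetSynchronizer and the two-stage depth are assumptions chosen for illustration (a common textbook arrangement), not something taken from the or1200 or minsoc sources.

```python
class ResetSynchronizer:
    """Behavioural model: reset asserts immediately, releases on clock edges."""

    def __init__(self, stages: int = 2):
        # Power up with reset asserted (1 = in reset).
        self.flops = [1] * stages

    def async_assert(self) -> None:
        # Asynchronous path: every stage is set at once, no clock edge needed.
        self.flops = [1] * len(self.flops)

    def clock_edge(self, rst_in: int) -> None:
        if rst_in:
            self.async_assert()
        else:
            # Synchronous path: a 0 shifts in, so the release only reaches
            # the output after len(self.flops) clock edges.
            self.flops = [0] + self.flops[:-1]

    @property
    def rst_out(self) -> int:
        return self.flops[-1]


sync = ResetSynchronizer()
sync.clock_edge(0)
sync.clock_edge(0)
print("released:", sync.rst_out == 0)

sync.async_assert()                      # reset hits between clock edges
print("asserted at once:", sync.rst_out == 1)

sync.clock_edge(0)                       # input released, output still held
print("still held after one edge:", sync.rst_out == 1)
sync.clock_edge(0)
print("released after two edges:", sync.rst_out == 0)
```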

RE: async vs sync reset in or1200
by jt_eaton on Nov 17, 2011
jt_eaton
Posts: 142
Joined: Aug 18, 2008
Last seen: Sep 29, 2018

The biggest problem with discussions about resets is that you need to distinguish between whether or not you are talking about the reset system or the style of flipflop reset. Your reset system can be synchronous or asynchronous as can your choice of flipflop reset style. But these two choices are completely independent. I can build a synchronous reset system that uses asynchronous reset flipflops or I can build an asynchronous reset system that uses synchronous style flipflops.

You always want to build an asynchronous reset system, but you should choose the flipflop reset style that works best with your target technology. If you are implementing a design in 7400 TTL then async is your best choice. For everyone else it is synchronous flipflops for an asynchronous reset system.

The trick to doing that is to read Cliff Cummings' papers on how to build a synchronous reset distribution tree. That is the best way to get power-on reset to all corners of your chip in a controlled fashion. Then you build with synchronous reset flops inside your components and put a wrapper around the outputs that forces all outputs into their reset state while the filtered reset signal is active.

You now have a design that is the black box equivalent to the same one using async reset style flipflops. Your test engineer can tell the difference but the mission mode performance is the same.


BTW: Synchronous reset flops do not add more logic or slow down your data path. Synthesis can identify any mission mode logic that also forces the output into the same state as reset and will simply piggyback the power-on reset onto that already existing logic. It will also reorder the logic so that the late arriving signal logic is pushed up the cone towards the output while the early arriving signal logic is pushed down toward the inputs. If you follow Cliff's advice then reset will be one of your earliest arriving signals and will not be on the critical path.

You can also reduce your reset logic by designing for a multicycle reset. You don't have to reset every flipflop on the first clock cycle in reset. The distribution tree will give you a minimum reset pulse that is predictable, and the reset wrapper will control your outputs from the first cycle. You only have to make sure that every flop is reset before the last clock of the reset pulse.
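
A toy Python model of the multicycle idea, under the assumption of a simple shift-style pipeline in which only the first stage has a synchronous reset and every later stage just copies its predecessor. The function name flush_pipeline and the 4-stage depth are illustrative only:

```python
def flush_pipeline(stages, reset_cycles, reset_value=0):
    """Return the pipeline contents after holding reset for reset_cycles clocks.

    Only stage 0 is reset directly; the reset value then ripples down
    the pipeline one stage per clock.
    """
    stages = list(stages)
    for _ in range(reset_cycles):
        stages = [reset_value] + stages[:-1]
    return stages


power_up = [1, 0, 1, 1]                    # arbitrary power-up contents
too_short = flush_pipeline(power_up, 3)    # last stage still holds stale data
long_enough = flush_pipeline(power_up, 4)  # every stage has been cleared

print(too_short)
print(long_enough)
```

The point of the sketch is that the reset pulse only has to be at least as long as the deepest chain of unreset flops; the output wrapper covers the cycles before that.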


John Eaton






RE: async vs sync reset in or1200
by moogyd on Nov 18, 2011
moogyd
Posts: 15
Joined: Nov 22, 2008
Last seen: Jun 26, 2019


> You always want to build an asynchronous reset system, but you should choose the flipflop reset style that works best with your target technology. If you are implementing a design in 7400 TTL then async is your best choice. For everyone else it is synchronous flipflops for an asynchronous reset system.

If you consider a soft IP (i.e. RTL delivery), it must be designed to support multiple targets, i.e. ASIC and FPGA.



> You now have a design that is the black box equivalent to the same one using async reset style flipflops. Your test engineer can tell the difference but the mission mode performance is the same.
>
> BTW: Synchronous reset flops do not add more logic or slow down your data path. Synthesis can identify any mission mode logic that also forces the output into the same state as reset and will simply piggyback the power-on reset onto that already existing logic. It will also reorder the logic so that the late arriving signal logic is pushed up the cone towards the output while the early arriving signal logic is pushed down toward the inputs. If you follow Cliff's advice then reset will be one of your earliest arriving signals and will not be on the critical path.

Synchronous reset does add to the data path logic depth. Even if it is simply changing a 2-input AND to a 3-input AND on the D input of a flop, it adds to the timing.
You are correct that the synthesis tool will re-order logic etc. to meet timing (but this may increase runtime).


> You can also reduce your reset logic by designing for a multicycle reset. You don't have to reset every flipflop on the first clock cycle in reset. The distribution tree will give you a minimum reset pulse that is predictable, and the reset wrapper will control your outputs from the first cycle. You only have to make sure that every flop is reset before the last clock of the reset pulse.

I think this reset wrapper is more complex than is implied here. Consider clock gating, switched on/off power domains etc. I have always found it is much easier with async flops.
With synchronous de-assertion (as recommended by Cummings), timing can also be guaranteed.


BTW, does anyone have a copy of the Re-use Methodology Manual? I would be interested to hear what it has to say.

Steven
RE: async vs sync reset in or1200
by pekon on Nov 18, 2011
pekon
Posts: 29
Joined: Mar 6, 2009
Last seen: Dec 14, 2020
I second the opinion that "the best of two worlds is asynchronous set and synchronous release".
From my experience, I can share the following.

------------------
Advantages:
------------------
(1) Keeps your reset logic simple.
(2) All flops get reset in a single cycle. (A synchronous reset needs clocks to be running for the duration of reset so that the whole pipeline of flops is cleared.)
(3) "Comparatively" less dynamic current load during power-up, as clocks can remain gated. This might be advantageous at system level if you have a weak power supply, because during power-up everything starts sourcing current in a single go (even the unused logic).

------------------
Disadvantages:
------------------
(1) De-assertion needs to be synchronized on a per-clock-domain basis.
(2) The flop standard cell is a bit larger (due to the extra circuitry inside the cell for the asynchronous clear), so there is a small area hit (LUT hit).
(3) From an STA (static timing analysis) point of view, you need to block the arcs for assertion from the async reset to the output of the flop.

------------------------------------------------------------------------------------
If the FPGA/technology does not support "async reset flops", but the RTL has "async reset flops"
------------------------------------------------------------------------------------
(1) During netlist synthesis, you can force the synthesis tool to convert all async-reset flops to sync-reset flops.

(2) You can always convert it into a "synchronous reset" by adding a synchronizer in front of it, but at power-up you need to extend the "system level" reset to a longer duration so that the synchronizer captures it. However, this can be done external to the block as and when required.

(3) Time the paths while building the reset tree.

(4) Extend reset until all flops in the pipeline are cleared.


(still, async reset with sync de-assertion is best)
with regards, pekon
RE: async vs sync reset in or1200
by jt_eaton on Nov 18, 2011
jt_eaton
Posts: 142
Joined: Aug 18, 2008
Last seen: Sep 29, 2018



> If you consider a soft IP (i.e. RTL delivery), it must be designed to support multiple targets, i.e. ASIC and FPGA.

Which means that hard coding either way will be wrong for at least some cases. We need to make it user selectable so that users can try it both ways and use what is best for their target.




> Synchronous reset does add to the data path logic depth. Even if it is simply changing a 2-input AND to a 3-input AND on the D input of a flop, it adds to the timing.

Yes, and using a larger async flop instead of a smaller sync one will grow your design and make all your paths longer. Arguments like these are meaningless. You have to route your design for your target in an A/B comparison to find the answer for your design.




> I think this reset wrapper is more complex than is implied here. Consider clock gating, switched on/off power domains etc. I have always found it is much easier with async flops.
> With synchronous de-assertion (as recommended by Cummings), timing can also be guaranteed.

You treat clock gating exactly as you treat a different clock domain. You resync your reset to that domain and use it with all the other gated clock flops. If you gate your clock to once every four cycles but fail to resync the reset to that domain, then reset can come in as a single-cycle path. Since all your other paths are four-cycle paths, synthesis will treat reset as the critical path and give it all the resources it needs. You don't want that.

When you switch power you put a wrapper around the component that blocks the inputs from driving in and holds the outputs at a static latched value. All I am saying is that you need to force those latched values to their reset values during reset.


> BTW, does anyone have a copy of the Re-use Methodology Manual? I would be interested to hear what it has to say.

Keating actually recommends async simply because it is more commonly used and "interoperability is more important that(sic) any small differences in ease of implementation".

I am seeing a 40% premium for using async resets and I don't consider that a small difference. Plus the RMM is about 10 years old. Things have changed since then. We have gone from processes where gate delays determined your clock speed to today's processes where routing delays predominate.

There are a lot of things that used to be true that are now completely turned around. Reset style is one of them. We need to stay ahead of the technology.

John Eaton

BTW:

Another thing that has changed is full adder design. I was taught that a ripple carry full adder was small but slow. If you wanted to go faster, you used a carry lookahead design. Today, if you design a ripple carry adder where the carry chain length is optimized, you cannot improve on its performance. Any attempt to route the carry away from the array and back will cost you more in routing delays than you will recover from bypassing chunks of the carry chain.
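
For readers who have not looked at one in a while, a ripple carry adder is just a chain of full adders with each carry-out feeding the next carry-in. A bit-level Python sketch follows; the function names and the 8-bit default width are chosen only for illustration, and the model says nothing about routing delay, which is the real point of the comparison:

```python
def full_adder(a: int, b: int, cin: int):
    """One-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def ripple_carry_add(x: int, y: int, width: int = 8):
    """Add two unsigned values bit by bit, rippling the carry upward."""
    carry = 0
    result = 0
    for i in range(width):
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i
    return result, carry  # carry is the final carry-out


print(ripple_carry_add(100, 57))
print(ripple_carry_add(200, 100))
```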

Designers need to realize that the techniques that they have used their entire career and are comfortable with may no longer work.



RE: async vs sync reset in or1200
by pekon on Nov 20, 2011
pekon
Posts: 29
Joined: Mar 6, 2009
Last seen: Dec 14, 2020
Hi John Eaton,


> I am seeing a 40% premium for using async resets and I don't consider that a small difference. Plus the RMM is about 10 years old. Things have changed since then. We have gone from processes where gate delays determined your clock speed to today's processes where routing delays predominate.
>
> There are a lot of things that used to be true that are now completely turned around. Reset style is one of them. We need to stay ahead of the technology.
>
> John Eaton

I agree with your point about the hefty area penalty, but such "important" design decisions should be taken cautiously. A single mistake or wrong approach here might make your design buggy and often un-reusable, which is far more costly these days than a 40% area hit, because time to market with the correct features and the right ingredients is what is required to survive.
I would like to highlight one more issue with using "sync-reset flops" at the "outputs" of a design.

----------------------------------------------------------------------------
CASE 1: Possible issue with "sync-reset", due to duplication of synchronizers
----------------------------------------------------------------------------
There are duplicated synchronizers on reset paths (A) and (B),
and the output of the design is OUT = A ^ B = A XOR B.

On actual silicon, the number of cycles a synchronizer takes to propagate reset is non-deterministic (1-3).
Suppose the synchronizer on path (A) takes 2 cycles, while the synchronizer on path (B) takes only 1 cycle.
(Assuming both A and B are at '1' before reset, something like the waveform below will happen.)

|--0--|--1--|--2--|--3--|--4--|--5--| (clock cycles)

|__/--------------------------------- (Reset)

|=====|=====|=====|_____|_____|_____| (A)

|=====|=====|_____|_____|_____|_____| (B)

|_____|_____|=====|_____|_____|_____| (OUT = A XOR B)

(NOTE: signal edges might be somewhat skewed in the above waveform due to white-space formatting)

Issue: OUT (A^B) will have a glitch at cycle 2 due to the mismatch in synchronizer timings.

----------------------------------------------------------------------
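
The CASE 1 waveform can be reproduced with a few lines of Python. This is only a cycle-level sketch under the same assumptions as the example: both flops hold '1' before reset, reset asserts during cycle 0, and each sync-reset flop needs one more clock edge after its synchronizer output changes. The helper name flop_trace is made up for illustration.

```python
def flop_trace(sync_latency: int, cycles: int = 6, reset_at: int = 0):
    """Per-cycle value of a sync-reset flop that starts at 1.

    The synchronizer delays reset by sync_latency cycles, and the flop
    itself clears on the following clock edge.
    """
    clear_cycle = reset_at + sync_latency + 1
    return [0 if cycle >= clear_cycle else 1 for cycle in range(cycles)]


a = flop_trace(sync_latency=2)           # path (A): slower synchronizer
b = flop_trace(sync_latency=1)           # path (B): faster synchronizer
out = [x ^ y for x, y in zip(a, b)]      # OUT = A XOR B

print("A  :", a)
print("B  :", b)
print("OUT:", out)
```

With equal latencies on both paths the XOR stays at 0 throughout, which is why the mismatch between the duplicated synchronizers is the source of the glitch.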

So, I propose one more implementation format, which serves both purposes (reduced area hit + no need for wrappers):
(1) Use async-reset flops for all registered outputs, and for the last flops feeding combinational outputs of the design.
(2) Use sync-reset flops for any paths that are internal to the design.

ADVANTAGES:
(1) Reduces the area penalty, as only the flops that are "driving" the external world are async-reset flops, while most of the other data-path flops can be made sync-reset.
(2) Externally pluggable buses/modules need not worry about reset-tree delays, as any signal coming to them will be reset asynchronously.

DISADVANTAGES:
(1) This puts some burden on the RTL designer and requires repeated checks that the guidelines have been adhered to.
(2) Reset assertion needs to be extended to multiple cycles, until all pipelines inside the design are synchronously cleared.


(Please analyze and give feedback.)

with regards, pekon