OpenCores
3 Clock Cycle Problem
by mericisgenc on Sep 20, 2013
mericisgenc
Posts: 10
Joined: Jun 6, 2013
Last seen: Mar 8, 2018
Hello,
I have been using minsoc and observing important signals with an oscilloscope. I noticed that it takes at least 3 clock cycles for the program counter to advance to the next value. Even a NOP instruction takes 3 clock cycles! Is this a problem with my system, or is it just two clock cycles of latency introduced by the Debug Unit? If it is the latter, is there any way to avoid the latency (by somehow disabling the Debug Unit, or something else)?
Thanks in advance.
RE: 3 Clock Cycle Problem
by mericisgenc on Sep 25, 2013
Hello everyone again,
I don't know whether anyone has read this topic and thought about the problem, but the situation persists. My NOP instructions still take `3 CLOCK CYCLES`. Could we at least theorize about possible causes, please?
RE: 3 Clock Cycle Problem
by mericisgenc on Sep 26, 2013
Hello,
I commented out all of the statements that disabled synthesis of the IC, DC, IMMU, DMMU, etc. However, I still get the same result. I also have the ic_en signal on the oscilloscope and it is still LOW. I attached an oscilloscope screenshot showing the least significant 8 bits of the PC (Bus2), the opcode (Bus1), the clock (D1), the ic_en signal (Bus3), and the icqmem_ack_ic signal (D0).
I would appreciate it if you could help me find a possible solution.
Thanks in advance!
RE: 3 Clock Cycle Problem
by rfajardo on Sep 27, 2013
rfajardo
Posts: 306
Joined: Jun 12, 2008
Last seen: Jan 6, 2020
Hello Mericisgenc,

I believe the Wishbone initiators of the OpenRISC have registered outputs and can therefore make only one data access every two cycles. See Chapter 4 of the Wishbone specification:
http://opencores.org/opencores,wishbone

That means the memory can only be read once every two cycles when the OpenRISC accesses memory directly. As soon as you turn on the Cache or the QMEM, burst transfers can copy data from memory into the Cache or QMEM, which reduces the access penalty. The CPU then accesses the Cache or QMEM instead, and these accesses should complete in a single clock cycle.

However, every memory access that has not already been copied to the Cache has to go out to memory and takes even longer. The hit/miss rate of the Cache depends on the code being executed, so it is difficult to say in advance how long instructions will take. Turning on the Cache and QMEM will help, but it does not guarantee that the data will already be available there.
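
As a rough illustration of the point above, the average fetch cost can be modelled as a weighted mix of cache hits and misses. This is only a sketch: the function name and the cycle counts passed in are assumptions chosen for illustration, not measured OR1200 figures, and a real miss may cost considerably more than an uncached access because a whole cache line is refilled.

```python
def average_fetch_cycles(hit_rate, hit_cycles, miss_cycles):
    """Expected cycles per instruction fetch for a given cache hit rate.

    hit_cycles and miss_cycles are illustrative inputs supplied by the
    caller; they are not taken from the OR1200 documentation.
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * hit_cycles + (1.0 - hit_rate) * miss_cycles


# Assumed example: 1 cycle on a hit, 3 cycles on a miss.
for rate in (0.0, 0.5, 0.9, 1.0):
    print(rate, average_fetch_cycles(rate, hit_cycles=1, miss_cycles=3))
```

With these assumed numbers, a hit rate of zero reproduces the 3 cycles per instruction reported at the top of the thread, and the average only approaches 1 as the hit rate approaches 100%.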

Generally, the Debug Unit does not interfere with the CPU program counter. It only does so if you are actually stepping over each instruction. Your best bet would be to take a piece of linear code with a known number of instructions, execute it in a loop, and measure the time to completion. That gives you a good measure of the instructions per unit time you are getting, and you can then compare the times with and without the Caches and QMEM for the same piece of code.
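
The measurement described above boils down to a small calculation once the loop has been timed. The sketch below assumes you know the clock frequency, the number of instructions in the loop body, and the iteration count; the function name and the numbers in the usage example are hypothetical.

```python
def cycles_per_instruction(elapsed_seconds, clock_hz,
                           instructions_per_iteration, iterations):
    """Average clock cycles spent per executed instruction.

    elapsed_seconds is the measured run time of the whole loop, for
    example read off an oscilloscope from a GPIO toggled before and
    after the loop.
    """
    total_instructions = instructions_per_iteration * iterations
    if total_instructions <= 0:
        raise ValueError("need a positive number of executed instructions")
    total_cycles = elapsed_seconds * clock_hz
    return total_cycles / total_instructions


# Hypothetical run: 25 MHz clock, 100 instructions per pass, 10000 passes,
# 0.12 s measured. This works out to roughly 3 cycles per instruction.
print(cycles_per_instruction(0.12, 25e6, 100, 10000))
```

Running the same loop with and without the Caches and QMEM and comparing the two results gives the comparison suggested above.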

I hope this helps you a little in your chase for the instructions / cycle measurement.
Best regards,
Raul
RE: 3 Clock Cycle Problem
by mericisgenc on Oct 4, 2013
Thanks a lot Raul!
The problem was that I had not set the necessary enable bits in the Special Purpose Registers HIGH (ICE, DMMU, etc.). Also, after uncommenting the lines for the IC, DMMU, DC, and IMMU units in or1200_defines, you have to be careful because the "line size" of the cache might not be the same as the one in the board.h file. You may therefore need to increase the one in the defines file to 1w_8k..
Best regards.
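
For readers hitting the same issue, the enable bits mentioned in the last post live in the OpenRISC Supervision Register (SR). The sketch below only computes the SR value with the chosen units enabled; it is a host-side illustration, not firmware. The bit positions follow the OpenRISC 1000 architecture manual, while the function name and the example starting value are assumptions. On the target, the resulting value would be written with `l.mtspr`, and the MMUs should only be enabled once their translation entries are set up.

```python
# Enable bits in the OpenRISC Supervision Register (SR).
SR_DCE = 1 << 3  # data cache enable
SR_ICE = 1 << 4  # instruction cache enable
SR_DME = 1 << 5  # data MMU enable
SR_IME = 1 << 6  # instruction MMU enable


def sr_with_units_enabled(sr, icache=True, dcache=True,
                          immu=False, dmmu=False):
    """Return the SR value with the requested enable bits set.

    Hypothetical helper: it leaves every other SR bit unchanged.
    """
    if icache:
        sr |= SR_ICE
    if dcache:
        sr |= SR_DCE
    if immu:
        sr |= SR_IME
    if dmmu:
        sr |= SR_DME
    return sr


# Example starting value (assumed): 0x8001, caches and MMUs all disabled.
print(hex(sr_with_units_enabled(0x8001)))
```

If the instruction cache bit is never set, signals such as ic_en stay LOW even when the cache has been synthesized, which matches the symptom described earlier in the thread.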