WB DDR3 SDRAM Controller
by robfinch on Jul 28, 2016
robfinch
Posts: 28
Joined: Sep 29, 2005
Last seen: Nov 18, 2024
I think the 32-bit access cycles are a bit limiting. Interfacing to the controller is likely to go through another layer in order to gain multiple ports to memory. Most accesses to the DRAM will be reads to fill cache lines or buffers. It would be better to have a much wider access (128 bits); that allows one to build the system at a slower clock rate than the high-speed memory clock. In a system I'm currently working on, four 128-bit reads (512 bits total) are performed in a pipelined manner in order to get the necessary bandwidth.
Also, the output of the SDRAM controller may have to drive another memory-port controller in order to support multiple ports to memory. Translating 128-bit data into smaller chunks could be handled by that controller.
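
As a rough illustration of what I mean (a sketch only; the module and signal names are hypothetical, not from any existing core), such a translation layer could accept one 128-bit word and unload it as four 32-bit words:

```verilog
module wide_to_narrow (
    input  wire         i_clk,
    input  wire         i_valid,   // a 128-bit word is offered this cycle
    input  wire [127:0] i_data,
    output wire         o_busy,    // still unloading the previous word
    output reg          o_valid,   // a 32-bit word is valid this cycle
    output wire  [31:0] o_data
  );
  reg [127:0] sreg;   // the word currently being unloaded
  reg   [2:0] count;  // 32-bit words remaining

  initial { o_valid, count } = 0;
  always @(posedge i_clk)
    if (i_valid && !o_busy) begin
      sreg    <= i_data;
      count   <= 3'd4;
      o_valid <= 1'b1;
    end else if (count != 0) begin
      sreg    <= { 32'h0, sreg[127:32] }; // shift the next word down
      count   <= count - 1'b1;
      o_valid <= (count > 3'd1);
    end else
      o_valid <= 1'b0;

  // A new 128-bit word may be accepted on the same cycle the last
  // 32-bit word goes out, so back-to-back words leave no gap.
  assign o_busy = (count > 3'd1);
  assign o_data = sreg[31:0];
endmodule
```
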
Does the controller use burst mode to access the DDR RAM? What is the burst length?
RE: WB DDR3 SDRAM Controller
by dgisselq on Jul 29, 2016
dgisselq
Posts: 247
Joined: Feb 20, 2015
Last seen: Oct 24, 2024
Thank you for writing and sharing your thoughts. I think the core will support your approach nicely.

A couple of thoughts:

  1. Please understand that this core is currently a work in progress, and bear with it, as it does not work yet. As an example from yesterday, the startup code didn't even bring the device out of reset, much less the code to access the device. I made a lot of progress, but like I said, it is a work in progress.
  2. The clock on the memory device side will always be 200MHz, with a 32-bit data path. Anything less would be to sacrifice performance.
  3. The DDR3 specification offers two burst length modes: 4x16 bit and 8x16 bit. However, upon examination, the 4x16 bit mode offers exactly the same timing as the 8x16 bit mode--there's no speed advantage to using 4x16 bit mode, as you still need stall cycles covering the other 4x16 bits. Therefore, this controller is being designed for eight word bursts. As one might expect, these bursts must be aligned upon appropriate boundaries (see the alignment sketch after this list).
  4. Building a front end that will lower the clock rate to 50MHz, and increase the transfer size to 128 MHz should be quite doable, although it's not the task that I will be working on.
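
To illustrate the alignment point in item 3 (a sketch only; the parameter and signal names are hypothetical, not from the core):

```verilog
// An eight-beat (BL8) burst covers eight consecutive 16-bit device
// words, so the low three bits of the word address select the position
// within the burst and the rest form the aligned burst base.
module burst_align #(parameter AW = 28) (
    input  wire [AW-1:0] i_addr,        // requested 16-bit word address
    output wire [AW-1:0] o_burst_base,  // aligned base of the BL8 burst
    output wire [2:0]    o_offset       // position within the burst
  );
  assign o_burst_base = { i_addr[AW-1:3], 3'b000 };
  assign o_offset     = i_addr[2:0];
endmodule
```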

I've gotta run now, or I might write longer.

Dan

RE: WB DDR3 SDRAM Controller
by dgisselq on Jul 29, 2016
dgisselq
Posts: 247
Joined: Feb 20, 2015
Last seen: Oct 24, 2024
Gosh, did I say 128MHz? I meant 128 bits ...
RE: WB DDR3 SDRAM Controller
by robfinch on Jul 29, 2016
robfinch
Posts: 28
Joined: Sep 29, 2005
Last seen: Nov 18, 2024

  1. The DDR3 specification offers two burst length modes: 4x16 bit and 8x16 bit.

    Are those word bursts? Wouldn't it be 4x32 and 8x32 for a 16-pin-wide DDR? Doesn't DDR3 use 2 bits per pin and multiple voltage levels? And I'm assuming the core would be interfaced to a PHY provided by the vendor?
  2. Building a front end that will lower the clock rate to 50MHz, and increase the ...
    I was thinking that burst mode would have to fill a FIFO of some sort. The clock rate on the FIFO read could be 1/4 that of the write, while the read side is 4x wider (see the gearbox sketch after this list). I think this has to be built into the controller, rather than added as a front end, or it will increase the latency of memory operations (due to cascaded FIFOs).
  3. Maybe some core parameters to control the bus widths and clocking ratios.
  4. The controller will likely be interfaced to another component to provide multiple channels/ports to the memory.
  5. I'm used to using the vendor's MIG controller.
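
A minimal sketch of the gearbox half of item 2 (the names are illustrative; a real design would put a proper asynchronous FIFO behind this to cross from the memory clock into the 1/4-rate system clock):

```verilog
// Pack four 32-bit burst beats into one 128-bit word.
module pack4x32 (
    input  wire         i_clk,     // memory-side (write) clock
    input  wire         i_valid,   // one 32-bit beat arrives this cycle
    input  wire  [31:0] i_data,
    output reg          o_valid,   // 128-bit word complete (one cycle)
    output reg  [127:0] o_data
  );
  reg [1:0] count;
  initial { o_valid, count } = 0;

  always @(posedge i_clk) begin
    o_valid <= 1'b0;
    if (i_valid) begin
      // First beat ends up in bits [31:0], last in bits [127:96]
      o_data <= { i_data, o_data[127:32] };
      count  <= count + 1'b1;
      if (count == 2'b11)
        o_valid <= 1'b1;  // fourth beat completes the word
    end
  end
endmodule
```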

Rob

RE: WB DDR3 SDRAM Controller
by dgisselq on Jul 29, 2016
dgisselq
Posts: 247
Joined: Feb 20, 2015
Last seen: Oct 24, 2024
Rob,

I appreciate your inputs!


  1. I'm used to using the vendor's MIG controller.

    I would love to produce something that works similarly to the proprietary MIG! However, please understand that this is my first attempt at getting anything running at all using DDR3. As a result, I'm not sure how parameterizable I'll be able to make it--especially the first time around. I'd like to make it parameterized by device timing, perhaps even device data path width, but there are a couple of parameters that suggest such an approach might be difficult. As I recall, row activation time was one of those. As a result, this may become a secondary goal behind just getting it up and running.

    Further, if someone else wants to help by testing this on hardware with other timing requirements, that'll help to make sure that this works across multiple hardware domains.

    For now, I'm working with a DDR3-1600 with 11-11-11 tRCD-tRP-tCL. It is the chip on the Arty, so you can look up details from there.

  2. My goal is to create a memory controller that will work with the standard pipeline mode of the Wishbone B4 spec. The bus width is likely going to be closely tied to the hardware, which means it will be 32 bits at 5ns. My goal is to pipeline this, so that it can sustain this rate as long as addresses are increasing. Thus, you should be able to work with this from a 128-bit/20ns bus without too much delay--certainly no more than 35ns, no? Would this be reasonable? (Let's see, the delay within the controller may be as much as 90ns from read request to result, so ... this is small, no?)
  3. The controller will likely be interfaced to another component to provide multiple channels/ports to the memory.

    I will not be building multiple channels/ports to memory--it's just not in my requirement set. That said, I find this to be a fascinating requirement that would be fun to hear and discuss further. Still, I think a wrapper should be able to handle this requirement nicely.

  4. Now, back to your first question:

    The DDR3 specification offers two burst length modes: 4x16 bit and 8x16 bit.

    Are those word bursts? Wouldn't it be 4x32 and 8x32 for a 16-pin-wide DDR? Doesn't DDR3 use 2 bits per pin and multiple voltage levels? And I'm assuming the core would be interfaced to a PHY provided by the vendor?

    I just looked it up: Figure 33 on Page 62 of the JEDEC DDR3 spec makes it clear that the burst-length-8 mode occupies 4 standard clock cycles, or 8 double-data-rate cycles. Hence, for a 16-bit wide memory, there will be 8 beats of 16 bits each taking place at twice the clock rate, or equivalently, 4 cycles of 32 bits each taking place at the clock rate (the arithmetic is worked just after this list).

    As for the PHY to get up to double data rates, I have a vendor-dependent core that will probably be made part of this project as well. That core specifically handles the data line wires: it takes a clock, an output enable line, and two bits in, and it produces two bits out. Hopefully, that core will be sufficient to handle business ... but I've got a ways to go still before I'll be able to test that. Worst case, there will be no way to build a similar core for other vendors, and the core will require another 5ns of delay. We'll see what time tells as I build this out.
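
To put rough numbers on that (using the 200MHz clock and 16-bit device discussed above): a burst-length-8 access is eight 16-bit beats at double data rate, so it occupies 8/2 = 4 clocks of 5ns each, or 20ns, and moves 8 x 16 = 128 bits. That works out to one 32-bit word per 5ns clock--which is where the 32-bits-at-5ns figure in item 2 comes from--or 6.4 Gbit/s of peak transfer rate.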

Is this a controller you would like to use for your projects as well (once completed and working ...)? I would certainly welcome any help with testing the core on other devices, if that's a help you would be able to provide! Just--give me about a week or so to get the core up and running first ....

Thanks,

Dan

RE: WB DDR3 SDRAM Controller
by dgisselq on Jul 31, 2016
dgisselq
Posts: 247
Joined: Feb 20, 2015
Last seen: Oct 24, 2024
Rob,

There is one clock lost to capturing request parameters from the bus. This is due to the typical high fanout of the bus, and the difficulty of doing logic on bus inputs downstream of that fanout. This makes sense if the controller is attached directly to the bus.

It doesn't make sense if the memory controller is attached directly to a subcontroller, such as the one you mention above. Whether or not the bus inputs are registered could easily become a parameter of the entire system. It might save 5ns or so--I'm not sure how tight your requirements are.

Of course, if you are like me, my speed requirement is: as fast as possible, and prove to me how fast you can get it. Under that justification, we should put such a parameter in place. :)
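
For illustration, such a parameter might look something like this (a sketch only, with hypothetical names; not the core's actual interface):

```verilog
// Optionally register the high-fanout bus inputs.
module bus_reg #(parameter OPT_REGISTERED = 1'b1) (
    input  wire        i_clk,
    input  wire        i_stb,
    input  wire [31:0] i_data,
    output wire        o_stb,
    output wire [31:0] o_data
  );
  generate if (OPT_REGISTERED) begin : REGISTERED
    // Spend one clock to break the fanout path
    reg        r_stb;
    reg [31:0] r_data;
    initial r_stb = 1'b0;
    always @(posedge i_clk)
      { r_stb, r_data } <= { i_stb, i_data };
    assign { o_stb, o_data } = { r_stb, r_data };
  end else begin : PASSTHROUGH
    // Save the clock when the controller sits behind a subcontroller
    assign { o_stb, o_data } = { i_stb, i_data };
  end endgenerate
endmodule
```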

Dan

RE: WB DDR3 SDRAM Controller
by robfinch on Jul 31, 2016
robfinch
Posts: 28
Joined: Sep 29, 2005
Last seen: Nov 18, 2024
I've been wondering if the DDR3 controller could be used to control a DDR2 SDRAM by adjusting the latencies appropriately. I've been trying to establish what the differences are between DDR3 and DDR2, and other than internals like the prefetch buffer size, I've not found any. I don't have a DDR3 system to test on, but sometime in the future I may.

I would like to use the controller to further isolate systems from vendor dependence.
RE: WB DDR3 SDRAM Controller
by robfinch on Jul 31, 2016
robfinch
Posts: 28
Joined: Sep 29, 2005
Last seen: Nov 18, 2024
I thought a diagram of how I currently use a DDR controller might help.
MPMC.png (59 kb)
RE: WB DDR3 SDRAM Controller
by dgisselq on Jul 31, 2016
dgisselq
Posts: 247
Joined: Feb 20, 2015
Last seen: Oct 24, 2024

I don't know the differences between DDR2 and DDR3, other than that there is already a DDR2 controller on this site. I don't know how well or poorly it works; I haven't tried it.

As for your application, that brings up some interesting strategy questions within the controller. Consider:

  • It costs 15ns to activate/open a row. Any previously open row in that bank must be closed before a new one can be opened.
  • If the row isn't being used, it costs another 15ns to precharge/close it.
  • Once opened, a row can be kept open until a different one is needed, or the next refresh cycle forces all rows to be closed.
  • If the bank has an open row, but it is open to the wrong row, it will therefore cost at least 30ns to close that row and open the row of interest within that bank. I say at least, since if there is a pending write to the bank, the pending write must complete (up to 45ns) and then another 25ns must pass before the row can be closed (IIRC--this is a criterion I have yet to place into my design).

I am tuning the controller I am building for an application that would like to read items sequentially through memory. That means that when I activate (open) a row of a bank, I am likely to activate the next bank over as well, closing both first if necessary. Further, I am not planning to close any banks until either 1) the mandatory refresh period times out and I am forced to close all banks, 2) the bank is open to the wrong row, or 3) the prior bank is being opened and closing this one will help prepare it for continuous reading.

The problem is, if you have a group of 8+ actors trying to deal with memory, the sequential access assumption may not make the most sense. It may make more sense to optimize access around completely random accesses. In that case, you would want to close any bank row that isn't being used up front, saving the 15ns cost of closing it once you discover you need that bank activated for a different row.

I'm not going to swear that I have the optimal strategy. It's just the one I'm building right now. I will welcome thoughts others might have as to what strategy makes the most sense.
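
Either policy needs the same underlying bookkeeping: which banks are open, and to which rows. A minimal sketch (signal names are illustrative, not taken from the core):

```verilog
module bank_status #(parameter ROWBITS = 14) (
    input  wire               i_clk,
    input  wire               i_activate,   // ACT: open i_row in i_bank
    input  wire               i_precharge,  // PRE: close i_bank
    input  wire               i_refresh,    // REF: all banks get closed
    input  wire [2:0]         i_bank,
    input  wire [ROWBITS-1:0] i_row,
    output wire               o_hit         // i_bank already open to i_row
  );
  reg [7:0]         open_bank;          // one open flag per bank
  reg [ROWBITS-1:0] open_row [0:7];     // which row each bank has open

  initial open_bank = 8'h00;
  always @(posedge i_clk)
    if (i_refresh)
      open_bank <= 8'h00;               // refresh forces all rows closed
    else if (i_activate) begin
      open_bank[i_bank] <= 1'b1;
      open_row[i_bank]  <= i_row;
    end else if (i_precharge)
      open_bank[i_bank] <= 1'b0;

  // A request "hits" if its bank is open to the requested row; otherwise
  // the controller pays the precharge and/or activate penalties above.
  assign o_hit = open_bank[i_bank] && (open_row[i_bank] == i_row);
endmodule
```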

Just some thoughts to consider--indeed, thoughts that you don't get to consider with a proprietary memory controller.

Dan

RE: WB DDR3 SDRAM Controller
by robfinch on Aug 2, 2016
robfinch
Posts: 28
Joined: Sep 29, 2005
Last seen: Nov 18, 2024
Although there are 8+ actors in the system, I think I can arrange for the one with the greatest sequential access requirements (the bitmap display) to stay within a single bank (or two, if page flipping), and arrange the system so that other things are not accessing that bank.

Just looking at the SDRAM controller as it is now, it looks like one must write all 128 bits for a write transaction. There doesn't seem to be an i_wb_sel line for byte write enables (or even 32-bit word enables). This would mean that data updates have to be read-modify-write cycles.
RE: WB DDR3 SDRAM Controller
by dgisselq on Aug 2, 2016
dgisselq
Posts: 247
Joined: Feb 20, 2015
Last seen: Oct 24, 2024

Yeah, I tend not to use the sel lines and do everything at 32 bits. I intend to discuss this, and my reason(s) for it, at the upcoming ORCONF.

That said, the _sel lines would be extremely easy to implement in this case, even with this controller. They would come in and get treated like data, up until the point that the data lines get placed on the bus. At that point, they would be placed onto the udm and ldm lines. IIRC, there's a direct translation, so again, putting _sel lines in would not be too difficult. Even better, from my standpoint as someone who doesn't use them, I expect the Xilinx tools would be smart enough to optimize them out, so it wouldn't hurt if they were there. This would give you 8-bit write resolution into the memory.
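
That direct translation might look roughly like this (a sketch, assuming a 16-bit device where udm and ldm mask the upper and lower byte of each beat; which 16-bit half rides on which beat depends on the controller's word ordering):

```verilog
module sel_to_dm (
    input  wire [3:0] i_wb_sel,    // one enable per byte of a 32-bit word
    output wire [1:0] o_dm_beat0,  // {udm, ldm} while bits [15:0] drive the pins
    output wire [1:0] o_dm_beat1   // {udm, ldm} while bits [31:16] drive the pins
  );
  // DM is active high on DDR3: a 1 masks (skips) that byte of the beat,
  // whereas a Wishbone sel bit of 1 means "write this byte" -- so invert.
  assign o_dm_beat0 = ~i_wb_sel[1:0];
  assign o_dm_beat1 = ~i_wb_sel[3:2];
endmodule
```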

As far as 32-bit word enables, they are actually accomplished within the memory controller. Remember, I've been saying that 32-bits is the natural word size of this controller. It really is. Even if you want to do 128-bit transactions, you're going to find yourself doing a series of 32-bit transactions.

To do 32-bit transactions with this controller, and to leave out one or two values from a 128-bit word, just write to the controller all the words you want to write to the memory: one word per clock, as in the Wishbone pipeline spec. If you want to skip a 32-bit word--skip it. The logic within the controller will detect that and mask out the missing write. No further work necessary. Sure, you could disagree with me and rewrite the controller to remove this logic, simplifying it for 128-bit writes, but ... well, that's the wonderful thing about the GPL.

If you go the road I just described, do be aware that when writing a 128-bit word, if you don't have a word to write on every clock, you may find yourself using multiple 128-bit transactions, which would take nearly twice as long. Hence, if you wish to write w[0], w[2], w[3], where w[0:3] make up a 128-bit word, write each of w[0], w[1], and w[3] on separate consecutive clocks. Don't stall on the w[2] clock and pick it up again with w[3]--write w[3] on the third clock. The controller will then write to the memory, and mask out the writing of w[2]. (Actually, if this is a "new" transaction, whose bank is not "activated", the controller is likely to stall for a cycle or two after you've written w[0] and w[1]. You might still get w[2] in before the controller notices it's missing. This wouldn't work, though, for a missing w[1] or w[3].)
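
Concretely, clock by clock (a sketch using the Wishbone B4 pipeline signals; base and w[] are illustrative names, not the core's):

```verilog
// The low two address bits select the 32-bit word within an aligned
// 128-bit group.
//
//   clk 1: CYC=1 STB=1  ADDR={base,2'b00}  DATA=w[0]
//   clk 2: CYC=1 STB=1  ADDR={base,2'b01}  DATA=w[1]
//   clk 3: CYC=1 STB=1  ADDR={base,2'b11}  DATA=w[3]   // no stall; skip w[2]
//
// No strobe ever addresses {base,2'b10}, so the controller masks that
// 32-bit lane when the burst goes out to the device.
```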

Dan

RE: WB DDR3 SDRAM Controller
by dgisselq on Aug 2, 2016
dgisselq
Posts: 247
Joined: Feb 20, 2015
Last seen: Oct 24, 2024
Sigh, you know I do profread my answers before sending them ...

The example should read if you wish to write w[0], w[1], and w[3]--not w[0], w[2] and w[3]--otherwise the example isn't consistent.

Sorry.

Dan

RE: WB DDR3 SDRAM Controller
by olof on Aug 2, 2016
olof
Posts: 218
Joined: Feb 10, 2010
Last seen: Dec 17, 2018
Hi guys,

I've been meaning to enter the discussion sooner, but haven't had time. I did some work on this problem a few years ago, but was never able to finish it. I actually revisited the code quite recently with the ambition of pushing something out.

I started out from the excellent wb_sdram_ctrl core (https://github.com/skristiansson/wb_sdram_ctrl/), which we have been using a lot for SDRAM interfaces in many OpenRISC-based SoCs. It provides an SDRAM interface on one end and multiple Wishbone slave ports on the other. The Wishbone ports each contain a small cache (coherent between the ports). The reason for the cache is to always do full bursts to the RAM, in order to mitigate some of the latency we would otherwise see from sequential single-word accesses.

The work I was doing on that IP was to split it up into three different cores:

1. An SDRAM phy with a DFI interface. DFI (http://ddr-phy.org/) is a standard interface between a controller and a phy, used to separate the technology-specific phy from the (possibly) technology-independent controller. This would allow us to just switch out the phy depending on the FPGA and SDRAM/DDR* chips we want to interface (a trimmed port sketch follows after this list).
2. A controller that can be configured for SDRAM/DDR2/DDR3 operation. Much of the controller is the same for all technologies; the differences are mostly in the init state machine and some commands.

3. A multiport cached wishbone arbiter. This could be used as a system cache. Likely towards a memory controller, but in theory, anything could be hooked up to this.
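
To give a flavor of the controller/phy split in item 1, here is a stub with a few of the command and data signals the DFI spec defines (heavily trimmed; the widths and full signal list depend on the DFI version and memory type, so treat this as a sketch):

```verilog
module dfi_phy_stub (
    input  wire        clk,
    // DFI command interface, driven by the controller
    input  wire [13:0] dfi_address,
    input  wire  [2:0] dfi_bank,
    input  wire        dfi_ras_n,
    input  wire        dfi_cas_n,
    input  wire        dfi_we_n,
    // DFI write-data interface
    input  wire [31:0] dfi_wrdata,
    input  wire        dfi_wrdata_en,
    // DFI read-data interface, driven back by the phy
    output wire [31:0] dfi_rddata,
    output wire        dfi_rddata_valid
    // ... plus the actual pads out to the SDRAM/DDR device
  );
  // The technology-specific serialization and IO primitives would live
  // here; tied off in this stub.
  assign dfi_rddata       = 32'h0;
  assign dfi_rddata_valid = 1'b0;
endmodule
```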

I'd love to finish this work myself, as I've come a pretty long way, but I don't really have the time right now. I'm happy, though, to share the code I have right now, and to discuss this in more detail. For more in-depth discussions, I'm usually available on the #openrisc channel on irc.freenode.net.

//Olof
RE: WB DDR3 SDRAM Controller
by dgisselq on Aug 3, 2016
dgisselq
Posts: 247
Joined: Feb 20, 2015
Last seen: Oct 24, 2024
Rob,

I noticed that Xilinx orders their memory addresses as BANK:ROW:COLUMN. If this is how you were judging whether your peripherals might stay within a given bank, I should warn you that this is not how I am ordering memory. I am ordering memory as in ROW:BANK:COLUMN.

The reason for this is simply pipeline performance. Before I get to the end of a row's column addresses, I can close and open the next bank. This allows me to do continuous reading/writing without stalling. Had I instead used BANK:ROW:COLUMN, I would need to stall at the end of the COLUMN addresses to close the row within the bank and to open a new row within the same bank. That is inefficient speed-wise, and so I have chosen the other approach.
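
The difference is just which address bits feed which field (a sketch with illustrative field widths; a real controller derives them from the device):

```verilog
module addr_map #(
    parameter RB = 14,  // row bits
    parameter BB = 3,   // bank bits
    parameter CB = 10   // column bits
  ) (
    input  wire [RB+BB+CB-1:0] i_addr,
    output wire [RB-1:0] o_row,
    output wire [BB-1:0] o_bank,
    output wire [CB-1:0] o_col
  );
  // This core: ROW:BANK:COLUMN -- crossing the end of the column space
  // increments the bank first, so the next bank can already have been
  // activated and the stream never stalls.
  assign { o_row, o_bank, o_col } = i_addr;
  // The Xilinx ordering, for contrast, would be BANK:ROW:COLUMN:
  //   assign { o_bank, o_row, o_col } = i_addr;
endmodule
```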

Olof,

Your description sounds very familiar to what Rob is interested in. Thanks for pointing it out!

Dan

RE: WB DDR3 SDRAM Controller
by robfinch on Aug 3, 2016
robfinch
Posts: 28
Joined: Sep 29, 2005
Last seen: Nov 18, 2024
I should warn you that this is not how I am ordering memory. I am ordering memory as in ROW:BANK:COLUMN.

Thanks Dan, I was watching out for that. I will likely end up using a modified version of the controller. I'd like to at least add select lines to it.

Olof
I started out from the excellent wb_sdram_ctrl core (https://github.com/skristiansson/wb_sdram_ctrl/) that we have been using a lot ...

1. A SDRAM phy with a DFI interface. DFI (http://ddr-phy.org/) ...
2. A controller that can be configured for SDRAM/DDR2/DDR3 operations. ...

This generally sounds like what I'm after.
3. A multiport cached wishbone arbiter. This could be used as a system cache. ...

Is this a single cache with multiple ports?
I've been using separate read nano-caches for each port. In one version of the controller there are several parallel caches for the CPU port, in order to support different threads; the thread ID determines which cache is used. Writes go directly through to the SDRAM controller. (I believe the vendor's DRAM controller queues writes.) I'm also using slightly different port configurations based on the port number/purpose.
The multi-port controller is non-generic in nature at the moment.