OpenCores
URL https://opencores.org/ocsvn/next186/next186/trunk

Subversion Repositories next186

[/] [next186/] [trunk/] [Next186_BIU_2T_delayread.v] - Diff between revs 2 and 3

Go to most recent revision | Show entire file | Details | Blame | View Log

Rev 2 Rev 3
Line 11... Line 11...
// Author: Nicolae Dumitrache 
// Author: Nicolae Dumitrache 
// e-mail: ndumitrache@opencores.org
// e-mail: ndumitrache@opencores.org
//
//
/////////////////////////////////////////////////////////////////////////////////
/////////////////////////////////////////////////////////////////////////////////
// 
// 
// Copyright (C) 2011 Nicolae Dumitrache
// Copyright (C) 2012 Nicolae Dumitrache
// 
// 
// This source file may be used and distributed without 
// This source file may be used and distributed without 
// restriction provided that this copyright statement is not 
// restriction provided that this copyright statement is not 
// removed from the file and that any derivative work contains 
// removed from the file and that any derivative work contains 
// the original copyright notice and the associated disclaimer.
// the original copyright notice and the associated disclaimer.
Line 37... Line 37...
// from http://www.opencores.org/lgpl.shtml 
// from http://www.opencores.org/lgpl.shtml 
// 
// 
///////////////////////////////////////////////////////////////////////////////////
///////////////////////////////////////////////////////////////////////////////////
// Additional Comments: 
// Additional Comments: 
//
//
// Description: Next186 Bus Interface Unit, 32bit width SRAM, double CPU frequency (2T), delayed read (1T)
//      - Links the CPU with a 32bit static synchronous RAM (or cache) 
// 
//      - Able to address up to 1MB 
//      The CPU is able to execute (up to) one instrusction/clock with sepparate data and instruction buses.
//      - 16byte instruction queue 
//      This particular bus interface links the CPU with a 32bit width memory bus and it is able to address up to 1MB. 
//      - Works at 2 X CPU frequency (80Mhz on Spartan3AN), requiring minimum 2T for an instruction.
//      It have a 16byte instruction queue, and works at up to 80Mhz, allowing the CPU to execute an instruction at 2T. It is possible
//      - The 32bit data bus and the double CPU clock allows the instruction queue to be almost always full, avoiding the CPU starving. 
//              to implement a BIU with sepparate data/instruction buses, which will run at the same frequency as the CPU, but it requires more resources.
 
//
 
//      The 32bit data bus width and the double BIU clock allows the instruction queue to be almost always full, avoiding the CPU starving. 
 
//      The data un-alignement penalties are required only when data words crosses the 4byte boundaries.
//      The data un-alignement penalties are required only when data words crosses the 4byte boundaries.
 
//
//////////////////////////////////////////////////////////////////////////////////
//////////////////////////////////////////////////////////////////////////////////
// How to compute each instruction duration, in clock cycles (please note that it is specific to this particular BIU implementation):
//
 
// How to compute each instruction duration, in clock cycles (for this particular BIU implementation!):
//
//
//      1 - From the Next186_features.doc see for each instruction how many T states are required (you will notice they are always
//      1 - From the Next186_features.doc see for each instruction how many T states are required (you will notice they are always
//              less or equal than 486 and much less than the original 80186
//              less or equal than 486 and much less than the original 80186
// 2 - multiply this number by 2 - the BIU works at double ALU frequency because it needs to multiplex the data and instructions,
// 2 - Multiply this number by 2 - the BIU works at double ALU frequency because it needs to multiplex the data and instructions,
//              in order to keep the ALU permanently feed with instructions. The 16bit queue acts like a flexible instruction buffer.
//              in order to keep the ALU permanently feed with instructions. The 16bit queue acts like a flexible instruction buffer.
// 3 - add penalties, as follows:
// 3 - Add penalties, as follows:
//                      +1T for each memory read - because of the synchronous SRAM which need this extra cycle to deliver the data
//                      +1T for each memory read - because of the synchronous SRAM which need this extra cycle to deliver the data
//                      +2T for each jump - required to flush and re-fill the instruction queue
//                      +2T for each jump - required to flush and re-fill the instruction queue
//                      +1T for each 16bit(word) read/write which overlaps the 4byte boundary - specific to 32bit bus width
//                      +1T for each 16bit(word) read/write which overlaps the 4byte boundary - specific to 32bit bus width
//                      +1T if the jump is made at an address with the latest 2bits 11 - specific to 32bit bus width
//                      +1T if the jump is made at an address with the latest 2bits 11 - specific to 32bit bus width
//                      +1T when the instruction queue empties - this case appears very rare, when a lot of 5-6 bytes memory write 
//                      +1T when the instruction queue empties - this case appears very rare, when a lot of 5-6 bytes memory write instructions are executed in direct sequence
//                              instructions are executed one after the other
//
//              Some examples:
//              Some examples:
//                      - the instruction "inc word ptr [1]" will require 5T (2x2T inc M + 1T read)
//              - "lea ax,[bx+si+1234]" requires 2T
//                      - the instruction "inc word ptr [3]" will require 7T (2x2T inc M + 1T read + 1T unaligned read + 1T unaligned write)
//              - "add ax, 2345" requires 2T
//                      - the instruction "imul ax,bx,234" will require 4T (2x2T imul)
//              - "xchg ax, bx" requires 4T
//                      - the instruction "loop <address = 1>" will require 4T (2x1T loop + 2T flush)
//              - "inc word ptr [1]" requires 5T (2x2T inc M + 1T read)
//                      - the instruction "loop <address = 3>" will require 5T (2x1T loop + 2T flush + 1T unaligned jump)
//              - "inc word ptr [3]" requires 7T (2x2T inc M + 1T read + 1T unaligned read + 1T unaligned write)
//                      - the instruction "call <address = 0>" will require 4T (2x1T call near + 2T flush 
//              - "imul ax,bx,234" requires 4T (2x2T imul)
//                      - the instruction "ret <address = 0>" will require 5T (2x2T ret + 1T read penalty)
//              - "loop address != 3(mod 4)" requires 4T (2x1T loop + 2T flush)
 
//              - "loop address == 3(mod 4)" requires 5T (2x1T loop + 2T flush + 1T unaligned jump)
 
//              - "call address 0" requires 4T (2x1T call near + 2T flush
 
//              - "ret address 0" requires 5T (2x2T ret + 1T read penalty)
 
//
//////////////////////////////////////////////////////////////////////////////////
//////////////////////////////////////////////////////////////////////////////////
 
 
`timescale 1ns / 1ps
`timescale 1ns / 1ps
 
 
 
 

powered by: WebSVN 2.1.0

© copyright 1999-2024 OpenCores.org, equivalent to Oliscience, all rights reserved. OpenCores®, registered trademark.