URL
https://opencores.org/ocsvn/ion/ion/trunk
Subversion Repositories ion
Compare Revisions
This comparison shows the changes necessary to convert path /ion from Rev 44 to Rev 45
Rev 44 → Rev 45
/trunk/doc/ion_project.txt
5,7 → 5,7
|
Last modified: Feb/02/2011 |
|
Send bug reports or comments to ja_rd[at]hotmail.com |
Send bug reports or comments to ja_rd[at]hotmail[dot]com |
|
|
|
137,7 → 137,7
which load interlocking has been implemented, the core is less efficient |
than that -- more on this later.) |
|
The core can't read and write at the same time; this is a fundamental |
The core can't read and write data at the same time; this is a fundamental |
limitation of the core structure: doing both at the same time would take |
one more read port in the register bank -- too expensive. |
|
222,7 → 222,7
When byte_we(i) is active, the matching byte at data_wr should be stored |
at address data_wr_addr. byte_we(0) is for the LSB, byte_we(3) for the MSB. |
Note that since the CPU is big endian, the MSB has the lowest address and |
LSB the highest. the memory system does not need to care about that. |
LSB the highest. The memory system does not need to care about that. |
|
Write cycles span a single clock cycle and never cause data-hazard stalls. |
|
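The byte-enable rule above can be sketched as a small behavioral model.
This is only an illustration, not the project's memory code: the function
name apply_write, the dict-based memory, and the assumption that
data_wr_addr is treated as word-aligned are all hypothetical. It assumes
byte lane i of data_wr carries bits 8*i+7..8*i, so lane 0 is the LSB.

```python
def apply_write(mem, data_wr_addr, data_wr, byte_we):
    """Store the enabled bytes of a 32-bit word into a byte-addressed dict.

    byte_we is a 4-bit mask; bit 0 enables the LSB lane, bit 3 the MSB lane.
    """
    base = data_wr_addr & ~0x3  # assumption: low address bits are ignored
    for lane in range(4):
        if (byte_we >> lane) & 1:
            byte = (data_wr >> (8 * lane)) & 0xFF
            # Big endian: the MSB (lane 3) goes to the lowest byte address.
            mem[base + (3 - lane)] = byte
    return mem


if __name__ == "__main__":
    mem = apply_write({}, 0x0800, 0x11223344, 0b1111)
    print({hex(a): hex(v) for a, v in sorted(mem.items())})
```

With all four enables active, the MSB 0x11 lands at 0x0800 and the LSB 0x44
at 0x0803. With only byte_we(3) active, just the byte at 0x0800 is written.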
248,8 → 248,7
|
Note the two back-to-back stores to addresses 0x0800 and 0x0900. They are |
produced by two consecutive S* instructions (SB and SH in the example), |
and can only be done this fast because of the Harvard architecture -- |
with a Von Neumann the read port would be used for opcode fetches too. |
and can only be done this fast because of the Harvard architecture. |
|
|
2.1.3 Memory wait cycles |
256,8 → 255,10
|
Memory wait cycles have already been implemented and tested with a 'stub' |
cache (module mips_cache_stub). This 'cache' is actually just an interface |
to external 16-bit wide memory. |
to external 16-bit wide memory meant for simulation only. It is far too |
rough to be of any use in a real system. |
The memory wait state logic works with this stub module but I expect it to |
change when the final cache implementation is done. |
|
In short, the 'mem_wait' input will unconditionally stall all pipeline |
stages as long as it is active. It is meant to be used by the cache at cache |
270,7 → 271,7
2.2 Pipeline |
|
Here is where I would explain the structure of the cpu in detail; these |
brief comments will have to until I write some real documentation. |
brief comments will have to wait until I write some real documentation. |
|
This section could really use a diagram; since it can take me days to draw |
one, that will have to wait for a further revision. |
285,7 → 286,7
* FETCH-0 : Instruction address is in code_rd_addr bus |
* FETCH-1 : Instruction opcode is in code_rd bus |
* ALU/MEM : ALU operation or memory read/write cycle is done OR |
Memory read/data address is on data_rd/wr_address bus AND |
Memory read/data address is on data_rd/wr_address bus OR |
Memory write data is on data_wr bus |
* LOAD : Memory read data is on data_rd bus |
|
308,7 → 309,7
|
rbank[$gp] | 0x0001 | |
|
|< fetch1>|< 0 >|< 1 >| |
|< fetch0>|< 0 >|< 1 >| |
|
==== Chronogram 3.B: stages for instruction "lw a0,16(v0)" ============ |
____ ____ ____ ____ ____ |
352,8 → 353,9
be used in stage 1 because it is read early (the read port is loaded at the |
same time as the instruction opcode). That is, a small part of the |
instruction decoding is done on stage FETCH-1. Bearing in mind that the code |
ram is meant to be the exact same type of block as the register bank, and we |
will bundle the whole ALU delay plus the reg bank delay in stage 1, it does |
ram is meant to be the exact same type of block as the register bank (or |
faster if the register bank is implemented with distributed RAM), and we |
will cram the whole ALU delay plus the reg bank delay in stage 1, it does |
not hurt moving a tiny part of the decoding to the previous cycle. |
|
All registers but a few exceptions belong squarely to one of the pipeline |
379,7 → 381,7
|
Note how the register bank ports belong in different stages even if it's |
the same physical device. No conflict here, hazards are handled properly |
(by explicit vhdl code, not using synthesis pragmas, etc.). |
(logic defined with explicit vhdl code, not with synthesis pragmas, etc.). |
|
|
There is a small number of global registers that don't belong to any |
411,7 → 413,8
|
a) If an instruction needs to access a register which was modified by the |
previous instruction, we have a data hazard -- because the register bank is |
synchronous. |
synchronous, a memory location can't be read in the same cycle it is updated |
-- we will get the pre-update value. |
|
b) A memory load into a register Rd produces its result a cycle late, so if |
the instruction after the load needs to access Rd there is a conflict. |
425,7 → 428,8
multiplexors are implemented. Note that hazard is detected separately for |
both read ports of the reg bank (p0_rbank_rs_hazard and p0_rbank_rt_hazard). |
Note that this logic is strictly regular vhdl code -- no need to rely here |
on the synthesis tool to add the bypass logic for us. |
on the synthesis tool to add the bypass logic for us. This gets us some |
measure of vendor independence. |
|
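The hazard in (a) and its bypass can be sketched with a toy behavioral
model. This is a sketch under assumptions, not a transcription of the vhdl:
the class SyncRegBank and the function read_with_bypass are hypothetical
names, and the local flag 'hazard' only loosely plays the role of signals
such as p0_rbank_rs_hazard. One read port is modelled; the core checks both.

```python
class SyncRegBank:
    """Synchronous register bank: a read sees the pre-update value."""

    def __init__(self, size=32):
        self.regs = [0] * size
        self.read_out = 0  # registered output of the read port

    def clock(self, rd_index, wr_index=None, wr_data=0):
        # The read port samples the old contents on the clock edge...
        self.read_out = self.regs[rd_index]
        # ...and the write is committed on the same edge.
        if wr_index is not None:
            self.regs[wr_index] = wr_data


def read_with_bypass(bank, rd_index, wr_index, wr_data):
    """Clock the bank once and return the value after the bypass mux."""
    # Hazard: the register being read is the one being written this cycle.
    hazard = wr_index is not None and wr_index == rd_index
    bank.clock(rd_index, wr_index, wr_data)
    # On a hazard, forward the write data instead of the stale read value.
    return wr_data if hazard else bank.read_out
```

Reading and writing the same register in one cycle leaves the stale value
in read_out, while the multiplexed result is the freshly written data.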
As for conflict (b), in the original MIPS-I architecture it was the job |
of the programmer to make sure that a loaded value was not used before it |
490,7 → 494,8
|
Note how read and write cycles are spaced instead of being interleaved, as |
they would if interlocking was implemented efficiently (in this example, |
there was a real hazard, register $a0, but that's coincidence). |
there was a real hazard, register $a0, but that's coincidence -- I need to |
find a better example in the listing files...). |
|
|
2.5 Exceptions |
500,9 → 505,13
|
Both do a limited version of the regular MIPS exception behavior. |
They save their own address to EPC, abort the following instruction, and |
jump to the exception vector 0x03c. All as per the specs. |
jump to the exception vector 0x03c. All as per the specs except the vector |
address. |
|
The following instruction is aborted even if it is a load or a jump. |
The following instruction is aborted even if it is a load or a jump, and |
traps work as specified even from a delay slot -- in that case, the address |
saved to EPC is not the victim instruction's but the preceding jump |
instruction's as explained in [1], page 64. |
|
Plasma used to save in epc the address of the instruction after break or |
syscall. This core will use the standard MIPS way instead. |
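The EPC rule described above can be written down as a short sketch. It is
an illustration only: the function take_trap and its in_delay_slot argument
are hypothetical, and the vector 0x03c is treated as an absolute address,
which is an assumption. The one hard fact relied on is that MIPS
instructions are 4 bytes long, so the jump preceding a delay slot sits at
pc - 4.

```python
EXCEPTION_VECTOR = 0x03C  # vector address quoted in the text


def take_trap(pc, in_delay_slot):
    """Return (epc, next_pc) for a break or syscall located at address pc."""
    # Standard MIPS behavior: save the trapping instruction's own address,
    # or the preceding jump's address when the trap is in a delay slot.
    epc = pc - 4 if in_delay_slot else pc
    return epc, EXCEPTION_VECTOR


if __name__ == "__main__":
    for pc, slot in ((0x0100, False), (0x0104, True)):
        epc, nxt = take_trap(pc, slot)
        print(f"pc={pc:#06x} delay_slot={slot} epc={epc:#06x} next={nxt:#05x}")
```

Saving the jump's address in the delay-slot case lets the handler return to
the jump, so that the jump and its delay slot are executed again together.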
798,6 → 807,7
The project includes a pre-generated demo, the 'hello world' code sample. |
This is just for convenience, so that you can launch some demo on hardware |
without installing the C toolchain. |
|
A constraints file is provided ('/vhdl/demo/c2sb_demo.csv') which includes |
all the pin constraints for the default target board, in CSV format. |
|
877,7 → 887,8
tried it on Quartus and ISE). Specifically, it does not instance memory |
blocks (relying instead on memory inference) or clock managers or buffers. |
This has its drawbacks but is a stated goal of the project -- in the long |
run it pays, I think. |
run it pays, I think. Vendor-specific hardware has its uses but should not |
be instantiated needlessly. |
|
|
|
962,7 → 973,7
or later versions but I haven't tested. |
|
Note: all of the above info is in the script itself, and can be shown |
with command line option -h. Since it will beb more up to date than this |
with command line option -h. Since it will be more up to date than this |
doc, you're advised to read the script. |
|
|