MPX 32-bit CPU :: Overview
‘MPX’ is a 32-bit soft-core processor written in Verilog (and originally in VHDL).
It is a pipelined RISC processor that implements the majority of the MIPS-I ISA, excluding the formerly patented unaligned load/store instructions (lwl, lwr, swl, swr) and the multiplier / divider instructions (mult, multu, div, divu).
By not including native multiplication & division instructions, the pipeline is simplified and the core is smaller.
GCC was modified to provide the option to disable the MIPS™ mult & div instructions as well as to enable turning off the patented unaligned memory access instructions.
Multiplication & division are provided in software by replacement functions in the C library (mulsi3, divsi3, etc.), but can optionally be provided by ‘trapping’ on execution of the unsupported instructions (hence allowing a standard release of GCC to be used).
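As a rough illustration of what such software replacement routines do, here is a minimal sketch in C, assuming a plain shift-and-add multiply and a restoring divide. The names soft_mul32 and soft_udiv32 are hypothetical and are not the project's or libgcc's actual implementations; the divide-by-zero result is an arbitrary choice made for this sketch.

```c
#include <stdint.h>

/* Hypothetical stand-in for a mulsi3-style routine: multiplies using only
   shifts and adds, so no mult/multu instruction is needed. The low 32 bits
   of the product are the same for signed and unsigned operands. */
uint32_t soft_mul32(uint32_t a, uint32_t b)
{
    uint32_t result = 0;
    while (b != 0) {
        if (b & 1u)
            result += a;   /* add the shifted multiplicand for each set bit of b */
        a <<= 1;
        b >>= 1;
    }
    return result;
}

/* Hypothetical stand-in for an unsigned divide routine: bit-by-bit restoring
   division. Returns the quotient and, if rem is non-NULL, stores the remainder. */
uint32_t soft_udiv32(uint32_t n, uint32_t d, uint32_t *rem)
{
    uint32_t q = 0;
    uint32_t r = 0;

    if (d == 0) {              /* assumption of this sketch: saturate on divide by zero */
        if (rem)
            *rem = n;
        return 0xFFFFFFFFu;
    }
    for (int i = 31; i >= 0; i--) {
        r = (r << 1) | ((n >> i) & 1u);   /* bring down the next dividend bit */
        if (r >= d) {
            r -= d;
            q |= (1u << i);
        }
    }
    if (rem)
        *rem = r;
    return q;
}
```

Both loops depend on the operand width (up to 32 iterations), which is why a hardware multiplier block helps in multiply-heavy code.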
MPX can execute 1 instruction every cycle, except for memory access instructions, which take 2 cycles (see below). It ‘features’ both a load delay slot & a branch delay slot.
When executing from internal single cycle memory, MPX is able to achieve 65.57 DMIPS (Dhrystone 1.1) @ 57 MHz on a Xilinx Spartan 6 FPGA.
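For comparison with other cores, the figure above can be normalised to DMIPS/MHz. The sketch below only restates the quoted numbers; it assumes the conventional definition of 1 DMIPS as 1757 Dhrystones per second (the VAX 11/780 baseline), and the function names are hypothetical.

```c
/* Conventional Dhrystone baseline: 1 DMIPS = 1757 Dhrystones per second. */
#define DHRYSTONES_PER_SEC_PER_DMIPS 1757.0

/* Normalise a DMIPS score by the clock frequency in MHz. */
double dmips_per_mhz(double dmips, double clock_mhz)
{
    return dmips / clock_mhz;
}

/* Convert a DMIPS score back to raw Dhrystone iterations per second. */
double dhrystones_per_second(double dmips)
{
    return dmips * DHRYSTONES_PER_SEC_PER_DMIPS;
}
```

With the quoted values, dmips_per_mhz(65.57, 57.0) comes to roughly 1.15 DMIPS/MHz.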
MPX is implemented using a 4 stage pipeline.
As the architecture has a branch delay slot, by the time a branch is resolved in stage 2 the instruction fetch for PC+4 has already been scheduled in stage 1, so no part of the pipeline needs to be flushed on a branch operation.
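The effect of the branch delay slot on execution order can be sketched with the usual pc / next_pc model of a MIPS-style machine. This is an illustrative C model, not the MPX RTL: the insn_t type and trace_with_delay_slot function are hypothetical, and every branch is treated as taken.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical instruction record: either a taken branch to 'target'
   (a byte address) or any non-branching instruction. */
typedef struct {
    int      is_branch;
    uint32_t target;
} insn_t;

/* Records the byte address of each executed instruction into 'trace' and
   returns how many were executed. The instruction at PC+4 (the delay slot)
   always executes before a branch redirects the fetch. */
size_t trace_with_delay_slot(const insn_t *prog, size_t n_insns,
                             uint32_t *trace, size_t max_trace)
{
    uint32_t pc = 0;        /* instruction being executed */
    uint32_t next_pc = 4;   /* fetch already scheduled behind it */
    size_t count = 0;

    while (pc / 4 < n_insns && count < max_trace) {
        const insn_t *in = &prog[pc / 4];
        uint32_t after = next_pc + 4;   /* default: sequential fetch */

        trace[count++] = pc;
        if (in->is_branch)
            after = in->target;         /* redirect only the fetch after the delay slot */

        pc = next_pc;                   /* the already-fetched instruction still runs */
        next_pc = after;
    }
    return count;
}
```

For a five-instruction program whose first instruction branches to address 16, the recorded order is 0, 4, 16: the instruction at address 4 is not discarded, so nothing needs to be flushed.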
MPX is a pipelined von Neumann architecture (shared data & instruction bus), which lends itself to connecting to single-ported RAM / external memory interfaces.
This means that memory access instructions cause a ‘bubble’ instruction to be inserted into the pipeline. Interrupts are also a source of pipeline bubbles.
All other data hazards in the pipeline are resolved by forwarding logic.
An instruction/data memory pause (or cache miss) results in the pipeline being stalled.
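The timing rules above suggest a simple first-order cycle estimate: one cycle per instruction, one extra bubble per memory access instruction, plus any memory stall cycles. The C sketch below is a back-of-envelope model under those assumptions only; it ignores pipeline fill and interrupt bubbles, and the function names are hypothetical.

```c
#include <stdint.h>

/* First-order estimate: every instruction costs one cycle, each memory
   access instruction adds one bubble, and stall cycles are added on top. */
uint64_t mpx_estimate_cycles(uint64_t total_insns, uint64_t mem_insns,
                             uint64_t stall_cycles)
{
    return total_insns + mem_insns + stall_cycles;
}

/* Effective cycles per instruction under the same model. */
double mpx_effective_cpi(uint64_t total_insns, uint64_t mem_insns,
                         uint64_t stall_cycles)
{
    if (total_insns == 0)
        return 0.0;   /* avoid dividing by zero for an empty trace */
    return (double)mpx_estimate_cycles(total_insns, mem_insns, stall_cycles)
           / (double)total_insns;
}
```

For example, 1000 instructions of which 250 are loads or stores, running from single cycle memory with no stalls, give an estimate of 1250 cycles, or a CPI of 1.25.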
The RTL code can be simulated using Verilator, an open source tool that compiles Verilog to a C++ model.
The ‘Verilated’ model can be used to execute code in a cycle accurate way (with peripherals).
This core has successfully been used in a Spartan 6 based project to decode MP3s.
The core (which is clocked at 57 MHz) is able to play 320 kbps MP3s in real-time with the aid of a Xilinx multiplier block.