This is a "clean" reimplementation of the Vautomation uRISC processor core (aka the "V8", also named the Arclite core) based on ISA documentation only.
It implements the full v8 architecture with a few additions, most of which are optional:
* Thirty-six basic instructions (and four new instructions)
* 8-bit PSR (Program Status Register) with Zero, Carry, Negative, and Interrupt status bits, and 4 general-purpose status bits.
* Eight 8-bit registers, R0 through R7.
* Accumulator register (R0)
* A 16-bit program counter
* Any two adjacent registers may be paired to create a 16-bit index register.
* Three basic addressing modes: addressed, indexed, and indexed with offset
The design adds a few new features, which can be enabled through generics:
* An optional auto-increment for indexed addressing modes ("LDX R4++" is equivalent to "LDX R4 ; UPP R4" )
* A new branching instruction, DBNZ (Decrement, and Branch if Not Zero)
* A new math instruction, MUL, which uses on-board multipliers.
* The interrupt mask can now be set with the new instructions SMSK and GMSK
* The RSP instruction may now optionally be converted into a "relocate SP" instruction with the ability to copy between R1:R0 <> SP based on the state of a user-specified CPU flag.
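The register pairing and the optional auto-increment described above can be sketched as a small behavioral model. This is an illustrative Python sketch, not the VHDL source: the function names (`pair_value`, `upp`, `ldx`) and the list-based register file are hypothetical. It assumes that the pair (Rn+1:Rn) keeps its low byte in Rn, that LDX loads the accumulator R0 from the address held in the pair, and that UPP increments the pair as a 16-bit value.

```python
def pair_value(regs, n):
    """Return the 16-bit value of the pair (Rn+1:Rn).

    Assumption: Rn holds the low byte and Rn+1 the high byte.
    """
    return (regs[n + 1] << 8) | regs[n]


def upp(regs, n):
    """Assumed UPP behavior: increment the pair (Rn+1:Rn), wrapping at 16 bits."""
    value = (pair_value(regs, n) + 1) & 0xFFFF
    regs[n] = value & 0xFF
    regs[n + 1] = value >> 8


def ldx(regs, memory, n, auto_increment=False):
    """Assumed LDX behavior: load R0 from the address held in (Rn+1:Rn).

    With auto_increment set, this models "LDX Rn++", i.e. "LDX Rn ; UPP Rn".
    """
    regs[0] = memory[pair_value(regs, n)]
    if auto_increment:
        upp(regs, n)


# Example: walk two bytes that straddle a page boundary using R5:R4.
memory = bytearray(65536)
memory[0x12FF] = 0xAB
memory[0x1300] = 0xCD
regs = [0] * 8
regs[4], regs[5] = 0xFF, 0x12      # R5:R4 = 0x12FF
ldx(regs, memory, 4, auto_increment=True)
first = regs[0]                    # byte read from 0x12FF
ldx(regs, memory, 4, auto_increment=True)
second = regs[0]                   # byte read from 0x1300
```

The carry from the low register into the high register is the reason the increment is modeled on the pair as a whole rather than on R4 alone.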
The Open8 is being designed to work optimally in newer FPGA architectures. It assumes 2 clocks for memory and register file latency.
This design has now been fielded as a test stimulus controller hosted in an Altera 3C40, not once, but twice. It primarily serves as a data acquisition controller / packet generator in those designs, and has performed trouble-free for well over a year. Additionally, as part of the test stimulus system, the Open8 is responsible for synchronizing output frequencies with the device under test. Due to the nature of these calculations, a 16-bit ALU/co-processor was written to "hardware accelerate" common math functions, rather than writing emulations in assembly. This ALU has been included in the SVN repository. The Open8 and its ALU coprocessor use about 2400 LEs in the FPGA.
It has also been fielded in several non-shipping test instruments and small emulators hosted in Altera 3C16's, where it performs a variety of tasks. I am presently looking to use it as a packet processor to bridge between a PC and a custom digital waveform generator design.
- Model is written in VHDL ('93)
- Simple RISC architecture and instruction set. All instructions fit in a single byte, with either 1 or 2 operands.
- 16-bit PC / address allows for 64kB of directly accessible memory (can be expanded with paging)
- Flat memory model allows code or data to be placed anywhere in the memory map, as well as easily supporting self-modifying code.
- Moderate number of general purpose registers
+ Eight byte-wide registers.
+ Any two registers may be paired as (Rn+1:Rn) to create an index register
+ R0 acts as the accumulator
- 8 interrupts, 1 NMI, 7 maskable. Interrupt controller is built into the core.
+ Interrupt controller keeps track of interrupt order and priority
+ Interrupt mask is controllable through two new instructions, SMSK and GMSK.
- Reasonably small gate-count, with strong fMax in "low-end" devices.
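The interaction between the interrupt mask and priority resolution noted in the list above can be illustrated with a short sketch. This is a simplified Python model, not the core's actual controller. The names `NMI_BIT` and `next_interrupt` are hypothetical, and it assumes that interrupt 0 is the NMI, that a set mask bit enables the matching interrupt, and that lower-numbered interrupts have higher priority. It does not model the arrival-order tracking that the real controller performs.

```python
NMI_BIT = 0x01  # assumption: interrupt 0 is the non-maskable interrupt


def next_interrupt(pending, mask):
    """Return the number of the interrupt to service next, or None.

    pending: 8-bit field of requested interrupts (bit n = interrupt n).
    mask:    8-bit enable mask, as would be set with SMSK (assumed 1 = enabled).
    The NMI bit is forced on, so it cannot be masked.
    """
    enabled = pending & (mask | NMI_BIT) & 0xFF
    if enabled == 0:
        return None
    # Isolate the lowest set bit; lower number is assumed to be higher priority.
    lowest = enabled & -enabled
    return lowest.bit_length() - 1


# Interrupts 2 and 4 are pending; with everything enabled, 2 wins.
winner_all_enabled = next_interrupt(0b00010100, 0xFF)
# With only interrupt 4 enabled, interrupt 2 is held off.
winner_masked = next_interrupt(0b00010100, 0b00010000)
```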
- As part of spring cleaning, the core has had a few rewrites to optimize performance/complexity. Along the way, several subtle bugs were discovered and fixed.
+ The address generation logic has been split out of the main FSM section. This did require a slight modification to the LDX instruction so that the addresses were all based purely on CPU state and not the instruction itself. This means LDX now takes as long as LDO to execute, but the result is a much simpler, much faster core.
+ The program counter logic was greatly simplified such that it only increments or loads. To stop the counter, the offset value is set to 0x02. This allows for fewer multiplexors in the PC logic.
+ Similarly, the ALU gained an instruction (for GMSK) so that the ALU_Ctrl.Data field could be eliminated. This field is now permanently wired to Operand1. The GMSK ALU instruction simply moves the interrupt mask into R0 as before, but now as an implicit part of the ALU design as opposed to using the LDI instruction in a non-conventional manner.
+ The CPU_Halt signal was brought back from the dead, except now it works within the context of the instruction decode logic. Much like INT N, it enters a wait state. However, the program counter is rewound to the instruction that would have been executed in that slot, and the wait state exits on deassertion of the CPU_Halt input. The result is a halt system that has very little effect on either fMax or gate count.
- This design has now been used in several fielded instruments and has proven stable over time. I'm presently writing it into a new test tool. As such, I figured it was time to upload the assembler I prefer: a variant of the WLA assembler modified for use with the Open8. I have used versions of this assembler on Windows and Linux for years. It isn't as feature-rich as binutils, but I prefer it for straight assembly work.
- Complete! The CPU has been synthesized and tested on an Altera DE2 board (Cyclone II 2C35).
- [UPDATE: the Hi-Tech compiler is no longer available.] Hi-Tech has now made their C compiler for the v8/Arclite architecture available as a demo. Note, the Open8 implements instructions that aren't in the stock v8/Arc core, so some of the generated code could probably be accelerated with a bit of hand optimization. (the DBNZ Rn instruction won't be used in loops for example)
- Source VHDL for the Open8 can be retrieved from either the "download" link, or from the SVN repository, above.
- An assembly language reference manual has been added to the source repository (March 20, 2011)
- A port of GNU binutils is in the SVN repository. This is a beta release, and has not yet been incorporated into the official binutils source base. Please report any bugs here, not at the binutils bugzilla.
- The Open8 is getting its first real use in a test set. It is implemented alongside a number of hardware accelerators, relegating it to primarily moving things around in memory, but so far it has performed well. There are some minor alterations, including an option to replace BRK with WAI - or WAit_for_Interrupt. When selected, there is no longer a true NOP available, but the ability to halt the processor waiting for an interrupt is a useful capability.
- BRK_Implements_WAI is tested, and shown to work correctly. An updated processor model has been checked in to SVN.
- The Open8 has now successfully been fielded! The core in question used the new features recently checked in, and has worked remarkably well as a supervisory processor in a larger FPGA design. The whole system features a lot of hardware accelerators, including a 16-bit, bus-addressed ALU to handle some of the math, but using the Open8 has allowed the design to be a lot more flexible.
- A port of the GNU C/C++ compiler is underway, with no release date yet targeted. The calling conventions are still under design, and there will likely be changes to the instruction set to make it easier for the compiler to generate efficient code.
- A few bugs were found while regression testing an updated version of the Open8 processor core. Apparently the vectored interrupt controller didn't always obey priority. Also, it appears that auto-incrementing indexed loads and stores didn't complete execution of the UPP command. Both have been corrected.
- The ALU control signals were pipelined to improve fMax on smaller parts. This allowed a design targeting an Altera Cyclone 3C16 to go from ~60MHz to ~132MHz (without trying; the target frequency was 100MHz). Unfortunately, this also means that all math instructions (opcodes 0 through 15 and GMSK) now take 3 clock cycles to execute, like the MUL and UPP instructions, instead of one. The only other instruction to suffer increased latency was the DBNZ instruction, which requires the status register to update before continuing. All other instructions retain their existing latencies. This does imply that code should be regression tested on the model, as the total execution time in clock cycles will increase.
- As part of the update, a lot of superfluous code was stripped out. The model should be a lot easier to understand.
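As a rough way to gauge the impact of the ALU pipelining change noted above, the added cycle count can be estimated from the number of math instructions a routine executes. This is a back-of-the-envelope Python sketch with hypothetical names (`added_cycles`, `runtime_us`). It only uses the 1-cycle and 3-cycle figures stated above and ignores the DBNZ increase, since its exact latency is not given here. It is not a substitute for regression testing on the model.

```python
ALU_CYCLES_BEFORE = 1  # math instruction latency before pipelining (from the note above)
ALU_CYCLES_AFTER = 3   # math instruction latency after pipelining (from the note above)


def added_cycles(alu_instructions_executed):
    """Extra clock cycles caused by the longer math instruction latency.

    DBNZ is deliberately not modeled; its new latency is not stated.
    """
    return alu_instructions_executed * (ALU_CYCLES_AFTER - ALU_CYCLES_BEFORE)


def runtime_us(cycles, clock_mhz):
    """Convert a cycle count to microseconds at the given clock frequency."""
    return cycles / clock_mhz


# Example: a routine that executes 1000 math instructions.
extra = added_cycles(1000)              # 2000 additional cycles
extra_time = runtime_us(extra, 100.0)   # time cost at the 100MHz target
```

Whether the higher achievable clock rate offsets the extra cycles depends on the instruction mix of the code being run.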