The following page has been lifted from the ZipCPU specification. It may be a touch out of date, though. The current honest assessment may be found in the last chapter of the ZipCPU specification, and is maintained on GitHub.
Having now worked with the ZipCPU for a while, it is worth offering an honest assessment of how well it works and how well it was designed. At the end of this assessment, I will propose some changes that may take place in a later version of the ZipCPU to make it better.
The ZipCPU was designed to be a simple and lightweight CPU. It has achieved this end nicely. The proof of this is the full multitasking operating system built for Digilent's CMod S6 board, based around a very small Spartan 6/LX4 FPGA.
As a result, the ZipCPU also makes a good starting point for anyone who wishes to build a general purpose CPU and then to experiment with building and adding particular features. Modifications should be simple enough.
Indeed, a non-pipelined version of the ZipBones (with no peripherals) has been built that uses only 1.3k 6-LUTs. When using pipelining, the full cache, and all of the peripherals, the ZipSystem can take up to 4.5k LUTs. Where it fits in between is a function of your needs.
A new implementation using an iCE40 FPGA suggests that the ZipCPU SoC will fit within the 4k 4-input LUTs of the iCE40 HX4K FPGA, but only just barely.
The ZipCPU was designed to be an implementable soft core that could be placed within an FPGA, controlling actions internal to the FPGA. This version of the CPU in particular has been updated to be more general purpose: as of version 2.0, the ZipCPU supports octet-level access across the bus.
Still, it fills this role rather nicely. Other capabilities common to more general purpose CPUs, such as double-precision floating point capability, vector registers, and vector operations, have been left out. However, the ZipCPU was never designed to be such a general purpose CPU, but rather a small system within a chip.
The extremely simplified instruction set of the ZipCPU was a good choice. Although it lacks many of the commonly used instructions from other architectures, PUSH, POP, JSR, and RET among them, the simplified instruction set has demonstrated an amazing versatility. I will contend, therefore, to anyone who will listen, that this instruction set offers a full and complete capability for whatever a user might wish to do, with the only exception being accelerated floating-point support.
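As an illustration of that versatility, each of the missing instructions reduces to a short idiom built from the base instruction set. The expansions below are a hedged sketch in ZipCPU-style assembly, not a definitive listing: the register roles (SP as stack pointer, PC as program counter, R0 as the return-address register) follow the conventions described in the specification, and the exact operand ordering should be checked against the assembler documentation.

```asm
; PUSH R1: make room on the stack, then store
	SUB	4,SP
	SW	R1,(SP)

; POP R1: load, then release the stack space
	LW	(SP),R1
	ADD	4,SP

; JSR subroutine: save the return address in R0, then branch
	MOV	.Lret(PC),R0
	BRA	subroutine
.Lret:

; RET: jump back through the saved return address
	MOV	R0,PC
```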
The burst load/store approach using the wishbone pipelining mode is novel, and can be used to greatly increase the speed of the processor--even without a data cache.
The novel approach to interrupts greatly facilitates the development of interrupt handlers from within high level languages.
The approach involves a single interrupt "vector" only, and simply switches the CPU back to the instruction where it left off. With this approach, interrupt handlers no longer need careful assembly-language scripting to save their context upon an interrupt.
At the same time, if most modern systems handle interrupt vectoring in software anyway, why maintain complicated hardware support for it?
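In practice, this means the context switch itself can be written directly in supervisor-mode code. The fragment below is a hedged sketch of the idea only, using the specification's uRx notation for user registers as accessed from supervisor mode; the assumption here is that R1 points at the task's save area, and that MOV is the instruction used to reach the user register set.

```asm
; Save the interrupted user context (repeat for all 16 user registers)
	MOV	uR0,R2
	SW	R2,(R1)
	MOV	uR1,R2
	SW	R2,4(R1)
	; ... uR2 through uPC ...

; Later, restore another task's context ...
	LW	(R1),R2
	MOV	R2,uR0
	; ... remaining registers ...

; ... and return to userspace, resuming where that task left off
	RTU
```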
Binutils, GCC, and newlib backends exist for the ZipCPU.
The ZipCPU does not (yet) support a data cache. One is currently under development, but it has yet to be integrated.
The ZipCPU compensates for this lack via its burst memory capability. Further, performance tests using Dhrystone (and on-chip memory ...) suggest that the ZipCPU is no slower than other processors containing a data cache.
Many other instruction sets offer three operand instructions, whereas the ZipCPU only offers two operand instructions. This means that it may take the ZipCPU more instructions to do many of the same operations. The good part of this is that it gives the ZipCPU a greater amount of flexibility in its immediate operand mode, although that increased flexibility isn't necessarily as valuable as one might like.
The impact of this lack of three operand instructions is application dependent, but does not appear to be too severe.
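To make the cost concrete: a three-operand machine can compute d = a + b in a single instruction, whereas a two-operand machine must first copy one source into the destination. The sketch below uses illustrative register assignments (Ra, Rb, Rd are hypothetical), and the immediate shown is merely an example of the larger immediates the two-operand encoding leaves room for.

```asm
; Three-operand ISA (one instruction):
;	ADD	Rd,Ra,Rb	; Rd = Ra + Rb

; Two-operand ZipCPU equivalent (two instructions):
	MOV	Ra,Rd		; Rd = Ra
	ADD	Rb,Rd		; Rd = Rd + Rb

; The encoding bits saved by dropping the third operand instead
; buy a wide immediate field, e.g.:
	ADD	0x12345,Rd	; large immediate in a single instruction
```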
The ZipCPU doesn't support out of order execution.
I suppose it could be modified to do so, but then it would no longer be the "simple" and low LUT count CPU it was designed to be.
Although switching to the interrupt context in the ZipCPU design doesn't itself require a tremendous swapping of registers, any task swap (such as swapping to a task waiting on an interrupt) still requires saving and restoring all 16 user registers. That's a lot of memory movement just to service an interrupt.
This isn't nearly as bad as it sounds, however, since most RISC architectures have 32 registers that will need to be swapped upon any context swap.
The ZipCPU is by no means generic: it will never handle addresses larger than 32-bits (4GB) without a complete and total redesign. This may limit its utility as a generic CPU in the future, although as an embedded CPU within an FPGA this isn't really much of a restriction.
While a toolchain does exist for the ZipCPU, it isn't yet fully featured. The ZipCPU does not yet have any support for soft floating point arithmetic, nor does it have any gdb support. These may be provided in future versions.
This section could also be labeled as my "To do" list. It outlines where you may expect new features in the future. Currently, there are five primary features on my to do list:
The lack of any floating point capability, either hard or soft, makes porting math software to the ZipCPU difficult. Simply building a soft floating point library (such as finishing the GCC port) will solve this.
A preliminary data cache, implemented as a write-through cache, has been developed. Adding it should require only a few changes to the CPU. I expect future versions of the CPU will permit this as an option.
The first version of such an MMU has already been written--you can find it within the repository. This MMU exists as a peripheral of the ZipCPU. Integrating this MMU into the ZipCPU will involve slowing down memory stores so that they can be accomplished synchronously, as well as determining how and when particular cache lines need to be invalidated.
Why a small scale CPU needs a hefty floating point unit, I'm not certain, but many application contexts require the ability to do floating point math.
If you have any thoughts on these potential upgrade paths, please feel free to submit them to the "Cores" forum or write me directly.