Quick link to the design document v08.06: http://www.mediafire.com/view/vp3hb8phe3t5yh4
Quick link to everything including C++ sim: http://www.mediafire.com/download/844087o9v6c914j
v08.06 - 2015-09-04 - PUBLIC RELEASE -
- Fixed horrific bug that goes all the way back to v06.01:
  - Copy path through the ALU not specified for op_dat immediate data!
- Fixed horrific bug that goes all the way back to v05.03:
  - AB Pops not inhibited during decode of IRQ cycle!
- Bit reduction opcodes now return 1/0 rather than -1/0 (more useful?).
- Added conditional (A?0) GTO opcodes.
- Removed op_jmp_4 (A?B) opcodes - hogging too much opcode space.
- Added op_sk1 and op_sk2 (A?B) opcodes.
- Added A odd tests to SKP & SK2 opcodes.
- Removed redundant PC+1 & PC+2 lit logic.
- The core port now sports the RBUS master interface.
- Added parameter MEM_ROM_W to protect ROM area in low main memory.
- Added parameter XSR_LIVE_MASK to enable / disable XSR inputs.
- Moved remaining trivial registering of RBUS bridge to the data ring.
- MEM_IM_W is now 5, moved *2 shift for 32 bit access address
offset into op_decode.
- Removed thread clear events from error register. Cleared
threads can report this through some other mechanism if needed.
- New component: hive_in_cond.sv to handle XSR & register set
input conditioning (identical functionality).
- Fixed bug regarding register set input data edge detection
option masks (mask vectors weren't indexed).
- Register set now distributed rather than in one component,
which makes the design more modular. "RBUS" is the internal bus.
- Removed most interstage feedback, now almost completely feedforward:
  - Push stack selector encoded binary w/ enable rather than one-hot.
  - Stack push moved to stage 5.
  - Stack errors pipelined to next cycle @ stage 0.
  - Opcode error and thread clear reporting @ stage 0.
- Shuffled opcodes, removed opcode type CODE_T, streamlined default decoding.
- Passes new verification & functional testing.
- EP3C5E144C: 2420 LEs, 194.1 MHz.
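As a rough illustration of the bit reduction change in the v08.06 notes above, here is a minimal Python sketch of AND / OR / XOR reductions over a 32 bit value that return 1/0 rather than -1/0. The function names are hypothetical and are not Hive opcode names, and the 32 bit width is an assumption taken from the 32 bit data references elsewhere on this page.

```python
MASK32 = 0xFFFFFFFF  # assumed 32 bit data width


def reduce_and(a):
    """Return 1 if every one of the 32 bits is set, else 0."""
    return 1 if (a & MASK32) == MASK32 else 0


def reduce_or(a):
    """Return 1 if any of the 32 bits is set, else 0."""
    return 1 if (a & MASK32) != 0 else 0


def reduce_xor(a):
    """Return the parity of the 32 bits: 1 for an odd number of set bits."""
    return bin(a & MASK32).count("1") & 1


# Because the result is 1/0, it can be accumulated directly into a count.
words = [0x0, 0x3, 0x7, 0xFFFFFFFF]
odd_parity_count = sum(reduce_xor(w) for w in words)
```

One possible convenience of a 1/0 result is that it can be summed or used as an index without first negating or masking it, as the last two lines suggest.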
Quick link to the Excel simulator v06.01: http://www.mediafire.com/download/4vy7d202xu7fdbs
Quick link to the design document v06.01: http://www.mediafire.com/view/ghtn03wqe4a6k0z
Quick link to the SystemVerilog code v06.01: http://www.mediafire.com/download/9iloxic8535cdt7
v06.01 - 2014-07-13
- Major changes in hive_main_mem.sv to support 16 & 32 bit aligned and unaligned access for literals and memory R/W.
- Main memory BRAM now a dual entity to provide separate addressing of high and low and to circumvent bootcode init issues.
- R/W immediate field offset is based on 16 bit access.
- New / different opcodes:
- op_cpy_ls : 16 bit copy low signed
- op_cpy_lu : 16 bit copy low unsigned
- op_lit : 32 bit literal
- op_lit_ls : 16 bit literal low signed
- op_lit_lu : 16 bit literal low unsigned
- op_mem_ir : 32 bit memory read
- op_mem_irls : 16 bit memory read low signed
- op_mem_iw : 32 bit memory write
- op_mem_iwl : 16 bit memory write low
- Some juggling of opcode order to hopefully ease decode.
- hive_alu_logical.sv rearranged a bit, removed a and default path.
- Passes all boot code verification & functional tests.
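The "low signed" and "low unsigned" variants in the v06.01 opcode list above differ only in how the low 16 bits are widened to 32 bits. A minimal Python sketch of that distinction follows; the function names are hypothetical, and this shows ordinary sign extension and zero extension rather than the actual Hive implementation.

```python
def low_unsigned(x):
    """Keep the low 16 bits and zero-extend to 32 bits."""
    return x & 0xFFFF


def low_signed(x):
    """Keep the low 16 bits and sign-extend to a 32 bit pattern."""
    lo = x & 0xFFFF
    # If bit 15 is set, fill the upper 16 bits with ones.
    return lo | 0xFFFF0000 if lo & 0x8000 else lo


value = 0x12348001
zero_extended = low_unsigned(value)   # upper half cleared
sign_extended = low_signed(value)     # upper half copies bit 15
```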
Quick link to the SystemVerilog code v05.04: http://www.mediafire.com/download/zry439dg14rz6ab
v05.04 is now written in synthesizable SystemVerilog! Lots of minor edits, magic numbers are all in packages.
Quick link to the Excel sim v05.03: http://www.mediafire.com/download/ypii57k6c6z713h
Quick link to the verilog code v05.03: http://www.mediafire.com/download/1niwno3c2ncnbxq
Quick link to the design document v05.03: http://www.mediafire.com/download/1tjszeo0kmy14ym
v05.03 has more extensive interrupt support, 32 bit register access, and an updated design document. Footprint is a bit smaller and top speed is a bit faster than previous versions due to cleanup / rewrite / edits.
With v04.05 Hive now has 8 stacks per thread and a UART!
Hive is a general purpose soft processor core intended for instantiation in an FPGA when CPU functionality is desired but an ARM or similar would be overkill. The Hive core is complex enough to be useful, with a wide data path, a relatively full set of instructions, high code density, and good ALU utilization, yet it has very basic control structures and minimal internal state, so it is simple enough for a human to easily grasp and program at the lowest level without any special tools. It fits in the smallest of current FPGAs with sufficient resources left over for peripherals (as well as other unrelated logic) and operates at or near the top speed of the device DSP hardware.
Hive isn’t an acronym; the name is meant to suggest the swarm of activity in an insect hive: many threads sharing the same program and data space, individually beavering away on separate tasks, and cooperating to accomplish larger goals. Because the memory space is shared, thread intercommunication is straightforward, and threads can all share a single instance of code, subroutines, and data sets, which enables code compaction via global factoring.
The novel hybrid stack / register construct employed reduces the need for a plethora of registers and allows for small operand indexes in the opcode. This construct, coupled with explicit stack pointer control in the form of a pop bit for each stack index, minimizes the confusing and inefficient stack gymnastics (swap, pick, roll, copying to thwart auto-consumption, etc.) normally associated with conventional stack machines, and also minimizes the saving and restoring of register contents normally associated with conventional register machines.
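The pop bit idea can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the class and function names are hypothetical, the count of 8 stacks comes from the v04.05 note on this page, and writing the result back to stack A is an assumption made for the example, not a statement of how Hive opcodes actually behave.

```python
class StackSet:
    """Toy model of a small set of indexed stacks for one thread."""

    def __init__(self, count=8):
        self.stacks = [[] for _ in range(count)]

    def read(self, index, pop):
        """Return the top of a stack, consuming it only if pop is set."""
        stack = self.stacks[index]
        return stack.pop() if pop else stack[-1]

    def push(self, index, value):
        self.stacks[index].append(value)


def alu_add(ss, a_idx, a_pop, b_idx, b_pop):
    """Add the tops of stacks A and B; push the 32 bit sum onto A (assumed)."""
    a = ss.read(a_idx, a_pop)
    b = ss.read(b_idx, b_pop)
    ss.push(a_idx, (a + b) & 0xFFFFFFFF)


ss = StackSet()
ss.push(0, 5)
ss.push(1, 7)
# Pop A, keep B: stack 0 top is replaced, stack 1 is untouched.
alu_add(ss, a_idx=0, a_pop=True, b_idx=1, b_pop=False)
# Keep A, pop B: stack 0 grows like a register file write, stack 1 is consumed.
alu_add(ss, a_idx=0, a_pop=False, b_idx=1, b_pop=True)
```

With the pop bit clear, an operand behaves like a register that can be read repeatedly; with it set, the operand is consumed as on a conventional stack machine, so no swap or duplicate step is needed to preserve a value.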
Hive employs a naturally emergent form of multi-threaded scheduling which eliminates all pipeline hazards and provides the programmer with as many equal bandwidth threads – each with its own independent interrupt – as pipeline stages. Processors that employ this form of pipelining are classified as “barrel” processors.
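The barrel scheduling described above can be illustrated with a small Python model. It assumes a hypothetical 8 stage pipeline (the stage count is chosen for illustration only) and simply rotates thread IDs through the stages, one stage per clock.

```python
STAGES = 8  # assumed pipeline depth, for illustration only


def thread_in_stage(clock, stage, stages=STAGES):
    """Thread ID occupying a given pipeline stage at a given clock."""
    return (clock - stage) % stages


def occupancy(clock, stages=STAGES):
    """Thread IDs in stages 0..stages-1 at a given clock."""
    return [thread_in_stage(clock, s, stages) for s in range(stages)]


# At every clock each stage holds a different thread, so no two
# instructions from the same thread are ever in flight together.
snapshot = occupancy(0)

# Thread 0 reaches the last stage at clock 7 and re-enters stage 0
# at clock 8, after its previous instruction has fully completed.
last_stage_thread = thread_in_stage(7, STAGES - 1)
next_issue_thread = thread_in_stage(8, 0)
```

Since a thread's next instruction only enters the pipeline after its previous one has left, there is nothing to forward or stall within a thread, and every thread gets the same 1/STAGES share of the clock.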
Hive is a largely stateless design (no pipeline bubbles, no registered ALU flags that may or may not be automatically updated, no reserved data registers, no pending operations, no branch prediction, etc.), so subroutines require no overhead and interrupts consume a single branch cycle. Subroutine and interrupt calculations can be performed directly and immediately with complete disregard for what may be transpiring in other contexts.