OpenCores

CRISC CPU :: Overview

Details

Name: crisc_cpu
Created: Feb 24, 2011
Updated: May 19, 2017
SVN: No files checked in

Other project properties

Category: Processor
Language: VHDL
Development status: Planning
Additional info: none
WishBone compliant: No
WishBone version: n/a
License: LGPL

CRISC CPU Overview

The CRISC CPU is a Complex Reduced Instruction Set CPU. It has four 16-bit data registers, seven 32-bit data registers, four 16-bit address registers and 20 opcodes. Top execution speed is one instruction per clock cycle. The CPU can perform single-precision IEEE 754 floating-point calculations in software. The top design goal is a good compromise between FPGA real estate and (relatively) high CPU speed. The first implementation targets the Altera Cyclone II EP2C20. The floating-point functions sin(), cos(), ... use the CORDIC algorithm. The CPU registers are designed to allow a CORDIC implementation with minimal memory access: all CORDIC-relevant data is kept in registers.
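As an illustration of the CORDIC idea mentioned above, here is a minimal sketch of the rotation mode in Python. It is not the CRISC implementation: it uses ordinary floating-point numbers, and the function name and the iteration count are assumptions chosen for the example. The three working values x, y and z are the only state the loop needs, which is why the algorithm fits well into a small register set.

```python
import math

def cordic_sin_cos(angle, iterations=24):
    """Return (sin, cos) of angle in radians for |angle| <= pi/2 (CORDIC rotation mode)."""
    # Table of atan(2^-i); in hardware these would be precomputed constants.
    atans = [math.atan(2.0 ** -i) for i in range(iterations)]
    # Gain compensation: product of 1/sqrt(1 + 2^-2i) over all iterations.
    k = 1.0
    for i in range(iterations):
        k /= math.sqrt(1.0 + 2.0 ** (-2 * i))
    # Start on the x axis, pre-scaled by k, with the full angle still to rotate.
    x, y, z = k, 0.0, angle
    for i in range(iterations):
        d = 1.0 if z >= 0 else -1.0          # rotate towards z == 0
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * atans[i]
    return y, x

s, c = cordic_sin_cos(0.5)
```

Each iteration needs only a shift (the multiplication by 2^-i), an add or subtract, and one table constant.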

The opcode fetch and parameter fetch bus interface translates between the CRISC outer world of variable-length instructions and the CRISC inner world of fixed-length instructions. The memory read interface needs 1 to 2 memory cycles to read the opcode and 0 to 3 memory cycles to read the optional parameter. Memory fetches can work independently of the ALUs, so limited pipelining is possible. The memory write interface can output 8-bit data in 1 memory cycle, 16-bit data in 1 or 2 memory cycles and 32-bit data in 2 or 3 memory cycles. Memory writes can also work in parallel with the ALUs. The larger number of cycles is needed if the data is not aligned, that is, if the data starts at an odd memory address.
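The cycle counts for data transfers can be summarised in a few lines of Python. This is only a sketch of the rule stated above; the function name is an assumption made for the example.

```python
def memory_cycles(size_bits, address):
    """Memory cycles needed to transfer a data item over the 16-bit data bus."""
    if size_bits not in (8, 16, 32):
        raise ValueError("size must be 8, 16 or 32 bits")
    if size_bits == 8:
        return 1                      # a single byte always fits in one cycle
    words = size_bits // 16           # cycles needed when aligned
    aligned = (address % 2 == 0)      # aligned means an even byte address
    return words if aligned else words + 1

for size in (8, 16, 32):
    print(size, memory_cycles(size, 0x1000), memory_cycles(size, 0x1001))
```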

The three types of CPU registers each have their own ALU. The address ALU can work in parallel with the data ALUs, but the two data ALUs cannot work in parallel with each other. The 16-bit data ALU can perform logic operations such as NOT, AND, OR and XOR, but cannot perform shift operations. The 32-bit data ALU can perform shift operations but cannot perform logic operations. All ALUs can perform arithmetic and sign-extend operations. The address ALU can only perform ADD; the other ALUs can also perform SUB and NEG.
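The division of work between the three ALUs can be written down as a small capability table. The sketch below is only a restatement of the paragraph above in Python; the ALU names "address", "data16" and "data32" and the label "SIGN_EXTEND" are made up for the example.

```python
# Operations each ALU supports, as described in the text (names are hypothetical).
ALU_OPS = {
    "address": {"ADD", "SIGN_EXTEND"},
    "data16": {"ADD", "SUB", "NEG", "SIGN_EXTEND", "NOT", "AND", "OR", "XOR"},
    "data32": {"ADD", "SUB", "NEG", "SIGN_EXTEND", "ASR", "LSR", "ASL", "LSL"},
}

def alus_for(op):
    """Return the sorted names of the ALUs that can execute the operation."""
    return sorted(name for name, ops in ALU_OPS.items() if op in ops)

def can_run_in_parallel(alu_a, alu_b):
    """The address ALU may overlap with a data ALU; the two data ALUs may not overlap."""
    return alu_a != alu_b and "address" in (alu_a, alu_b)

print(alus_for("XOR"), alus_for("ASR"), alus_for("ADD"))
```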

The CRISC CPU has some constant registers with the values 0, 1 and 2. The value 0 is used for the NEG opcode. The values 1 and 2 are used to increment or decrement the program counter PC and the A and R registers. The OF latch holds the (sign-extended) offset for the "data offset indirect" addressing mode. The JR latch holds the (sign-extended) offset for the "program relative" addressing mode. The C0 and C1 FIFO (first in, first out) holds the opcode. The RD0 to RD3 latches hold the 8, 16 or 32 bit input parameter. The WD0 to WD3 latches hold the 8, 16 or 32 bit output parameter. The AD latch holds the address for a memory read or write operation. The N (negative), Z (zero), C (carry) and V (overflow) flags are part of the ALUs.
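The sign extension applied to the offsets in the OF and JR latches can be sketched as follows. The helper names are assumptions, and the example assumes an 8-bit offset added to a 16-bit program counter value; which PC value the hardware uses as the base (before or after the fetch) is not specified here, so the caller passes it in.

```python
def sign_extend(value, from_bits, to_bits=16):
    """Sign-extend a from_bits wide value to to_bits, returned as an unsigned pattern."""
    value &= (1 << from_bits) - 1
    if value & (1 << (from_bits - 1)):   # sign bit set: value is negative
        value -= 1 << from_bits
    return value & ((1 << to_bits) - 1)

def relative_target(pc, offset8):
    """Target of a program-relative jump: base PC plus sign-extended 8-bit offset."""
    return (pc + sign_extend(offset8, 8)) & 0xFFFF

print(hex(sign_extend(0xFE, 8)))           # 0xfffe, i.e. -2 as a 16-bit pattern
print(hex(relative_target(0x1000, 0xFE)))  # 0xffe, two bytes back
```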

This CPU wants to be boring. It is heavily influenced by the Motorola 68000. In one respect this CPU is special: every register can handle only two data sizes, called "half data size" and "full data size". The Motorola 68000 and Intel 80386 registers can handle three data sizes: 8 bit, 16 bit and 32 bit. The CRISC CPU tries to handle the details of different data sizes, and especially of sign extension and zero extension, more smartly than the older CPUs.

CRISC CPU block diagram 20110302

Top level Description


CRISC CPU Instruction set

Name      Class       C                       Purpose
--------  ----------  ----------------------  ------------------------------------------------
RET       control     return;                 return from subroutine
NOT       logic       R0 = ~R0                invert bits, one's complement
NEG       arithmetic  R0 = -R0                change sign, two's complement
ASR       shift       R0 >>= 1 or R0 >>= 8    arithmetic shift right, keep sign
LSR       shift       R0 >>= 1 or R0 >>= 8    logical shift right, fill with zero
ASL, LSL  shift       R0                      shift left, fill with zero
TST       arithmetic  N,Z = f(R0)             set N and Z flags depending on register contents
INC       arithmetic  ++R0                    increment register contents
DEC       arithmetic  --R0                    decrement register contents
AND       logic       R0 &= R1                logical AND
OR        logic       R0 |= R1                logical OR
XOR       logic       R0 ^= R1                logical EXCLUSIVE OR
ADD       arithmetic  R0 += R1                add contents of R1 to contents of R0
SUB       arithmetic  R0 -= R1                subtract contents of R1 from contents of R0
CMP       arithmetic  N,C,V,Z = f(R0 - R1)    like SUB but only sets flags
JMP       control     goto X;                 unconditional jump
Jcc       control     if (), while(), for()   conditional jump
Scc       control     R0 = f(N,C,V,Z)         load R0 with -1 or 0, depending on condition code cc
JSR       control     func();                 jump to subroutine
LD        transport                           copy memory to register or register to memory with optional sign extend
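The flag-setting behaviour of CMP and the result of Scc can be illustrated with a short Python sketch. It is not taken from the CRISC sources: the carry flag is modelled as a borrow, as on the Motorola 68000, which is an assumption, and the function names are invented for the example.

```python
def cmp_flags(r0, r1, bits=16):
    """Flags produced by R0 - R1 without storing the result (CMP)."""
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    r0 &= mask
    r1 &= mask
    diff = (r0 - r1) & mask
    return {
        "N": bool(diff & sign),                        # result is negative
        "Z": diff == 0,                                # result is zero
        "C": r0 < r1,                                  # borrow (assumed 68000 convention)
        "V": bool((r0 ^ r1) & (r0 ^ diff) & sign),     # signed overflow
    }

def scc(condition_true, bits=16):
    """Scc: all ones (-1) if the condition holds, otherwise 0."""
    return (1 << bits) - 1 if condition_true else 0

flags = cmp_flags(1, 2)
print(flags, hex(scc(flags["N"] != flags["V"])))   # signed "less than" is N xor V
```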


CRISC CPU Addressing mode

Addressing mode              C                 Assembler          Description
---------------------------  ----------------  -----------------  --------------------------------------------------
Implied                      return;           RET                return from subroutine
Immediate                    r0 = 0xAFFE;      LDW R0,#$AFFE      load register with constant
Register direct              r0 = r1;          LDW R0,R1          load register with contents of another register
Program direct               func();           JSR func           call subroutine
Program relative             goto near;        JMP near           jump
Program indirect             (*a0)();          JSR (A0)           call subroutine, address is in register
Data direct                  r0 = variable;    LDW R0,variable    load register with contents of memory element
Data indirect                r0 = *a0;         LDW R0,(A0)        load register with the contents of the memory
                                                                  element that is addressed by register A0
Data predecrement indirect   r0 = *--a0;       LDW R0,-(A0)       decrement register A0 by the size of the memory
                                                                  element. Load register R0 with the contents of the
                                                                  memory element that is addressed by register A0
Data postincrement indirect  r0 = *a0++;       LDW R0,(A0)+       load register R0 with the contents of the memory
                                                                  cell that is addressed by register A0. Increment
                                                                  register A0 by the size of the memory element.
Data offset indirect         r0 = a0->offset;  LDW R0,offset(A0)  load register R0 with the contents of the memory
                                                                  element that is addressed by register A0 plus
                                                                  offset. Offset is a constant.
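The data addressing modes can be tried out on a toy memory model. The following Python sketch is only an illustration: the class and method names are hypothetical, only 16-bit (LDW) accesses are modelled, and the low byte is assumed to sit at the lower address.

```python
class ToyMachine:
    """Tiny model with a 64 KiB byte memory and one address register A0."""

    def __init__(self):
        self.mem = bytearray(65536)
        self.a0 = 0

    def read16(self, addr):
        # Assumption: low byte at the lower address.
        lo = self.mem[addr & 0xFFFF]
        hi = self.mem[(addr + 1) & 0xFFFF]
        return lo | (hi << 8)

    def ldw_indirect(self):            # LDW R0,(A0)
        return self.read16(self.a0)

    def ldw_predecrement(self):        # LDW R0,-(A0)
        self.a0 = (self.a0 - 2) & 0xFFFF
        return self.read16(self.a0)

    def ldw_postincrement(self):       # LDW R0,(A0)+
        value = self.read16(self.a0)
        self.a0 = (self.a0 + 2) & 0xFFFF
        return value

    def ldw_offset(self, offset):      # LDW R0,offset(A0)
        return self.read16((self.a0 + offset) & 0xFFFF)

m = ToyMachine()
m.mem[0x100:0x104] = bytes([0xFE, 0xAF, 0x34, 0x12])
m.a0 = 0x100
first = m.ldw_postincrement()    # reads 0xAFFE, A0 becomes 0x102
second = m.ldw_indirect()        # reads 0x1234, A0 unchanged
```

Predecrement followed by postincrement on the same register behaves like a push and pop pair on a stack.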


CRISC CPU Instruction format

1 Byte Instruction Control (ret, ...)
 7 6 5 4 3 2 1 0
+-----+---------+
|0 0 0| Opcode  |
+-----+---------+

2 Byte Instruction Control (scc, ...)
 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0
+-----+-+-------+-----------+---+
|1 0 0|1|  CC   |0 0 0 0 0 0|rg |
+-----+-+-------+-----------+---+

2 Byte Instruction Jump/Call Program relative
 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0
+-----+-+-------+---------------+
|0 0 1|S|  CC   | 8 Bit constant|
+-----+-+-------+---------------+

3 Byte Instruction Jump/Call Program direct, Program indirect
 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0
+-----+-+-------+-------------------------------+
|0 1 0|S|  CC   |       16 Bit constant         |
+-----+-+-------+-------------------------------+

2 Byte Instruction ALU (and, add, shift, inc, dec, ...)
 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0
+-----+-+-------+---+---+---+---+
|0 1 1|L|  ALU  |al2|rg2|al1|rg1|
+-----+-+-------+---+---+---+---+

2 Byte Instruction Load/Store
 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0
+-----+-+---+---+-+---+-----+---+
|1 0 0|0|sl1|rg1|d|ext| sl2 |rg2|
+-----+-+---+---+-+---+-----+---+

3 Byte Instruction Load/Store
 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0
+-----+-+---+---+-+---+-----+---+---------------+
|1 0 1|0|sl1|rg1|d|ext| sl2 |rg2| 8 Bit constant|
+-----+-+---+---+-+---+-----+---+---------------+

4 Byte Instruction Load/Store
 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0
+-----+-+---+---+-+---+-----+---+-------------------------------+
|1 1 0|0|sl1|rg1|d|ext| sl2 |rg2|       16 Bit constant         |
+-----+-+---+---+-+---+-----+---+-------------------------------+

6 Byte Instruction Load/Store
 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0
+-----+-+---+---+-+---+-----+---+---------------------------------------------------------------+
|1 1 1|0|sl1|rg1|0|0 0| sl2 |rg2|                      32 Bit constant                          |
+-----+-+---+---+-+---+-----+---+---------------------------------------------------------------+
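Reading the diagrams above, the top three bits of the first byte select the instruction format and therefore the instruction length. A decoder for that first byte might look like the following Python sketch. The function name and the format labels are assumptions, and bit 4 is only checked where the diagrams use it to tell two formats apart (prefix 1 0 0).

```python
def decode_first_byte(byte):
    """Return (format label, instruction length in bytes) for a first opcode byte."""
    prefix = (byte >> 5) & 0x7      # bits 7..5
    bit4 = (byte >> 4) & 0x1        # bit 4
    if prefix == 0b000:
        return ("control", 1)
    if prefix == 0b001:
        return ("jump/call program relative", 2)
    if prefix == 0b010:
        return ("jump/call program direct/indirect", 3)
    if prefix == 0b011:
        return ("alu", 2)
    if prefix == 0b100:
        # Bit 4 separates the 2-byte control form (Scc) from 2-byte load/store.
        return ("control", 2) if bit4 else ("load/store", 2)
    if prefix == 0b101:
        return ("load/store", 3)
    if prefix == 0b110:
        return ("load/store", 4)
    return ("load/store", 6)        # prefix 0b111

for b in (0x00, 0x20, 0x40, 0x60, 0x80, 0x90, 0xA0, 0xC0, 0xE0):
    print(hex(b), decode_first_byte(b))
```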


The CRISC CPU has a 16-bit data bus and a 16-bit address bus. Inside the CPU there are six buses of 16 bits each. Three arithmetic logic units (ALUs) are available. The CPU does not have much of a pipeline. In this description I talk about a single clock; a real implementation may use a two-phase non-overlapping clock (NORA).

Because the CPU combines byte addressing with a word-wide data bus, a data item (instruction, parameter) can be aligned or not. A data item is aligned if bit A0 of the A latch is zero when it is fetched. If aligned, an 8-bit data item is on the lower 8 bits of the data bus and a 16-bit data item uses the full data bus. If not aligned, an 8-bit data item is on the higher 8 bits of the data bus; for a 16-bit data item, the lower 8 bits are on the higher 8 bits of the data bus and the higher 8 bits are on the lower 8 bits of the data bus in the next memory fetch. An 8-bit data item or an aligned 16-bit data item needs 1 memory fetch. An aligned 32-bit data item or a non-aligned 16-bit data item needs 2 memory fetches. A non-aligned 32-bit data item needs 3 memory fetches.
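The byte-lane rules can be checked with a small Python model in which memory is a list of 16-bit words, as seen on the data bus, and the CPU addresses bytes. The function name and the list-based memory are assumptions made for the sketch.

```python
def read_item(word_mem, address, size_bits):
    """Assemble a data item from 16-bit bus words; return (value, number of fetches)."""
    value = 0
    fetched = []                                   # word indices already fetched
    for i in range(size_bits // 8):
        byte_addr = address + i
        word_index = byte_addr >> 1                # which 16-bit word holds this byte
        if word_index not in fetched:
            fetched.append(word_index)             # a new memory fetch is needed
        word = word_mem[word_index]
        # Even byte address: lower 8 bits of the bus; odd address: higher 8 bits.
        byte = (word >> 8) & 0xFF if byte_addr & 1 else word & 0xFF
        value |= byte << (8 * i)                   # low byte of the item comes first
    return value, len(fetched)

words = [0x2211, 0x4433, 0x6655]
print(read_item(words, 0, 16))   # aligned 16-bit item, one fetch
print(read_item(words, 1, 16))   # non-aligned 16-bit item, two fetches
print(read_item(words, 1, 32))   # non-aligned 32-bit item, three fetches
```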

The short hand notations for the CPU behaviour are:
on(X, Y) latch X activates tristate output Y.
off(X, Y) latch X de-activates tristate output Y.
forward(X) set level triggered latch X to forward data.
hold(X) set level triggered latch X to hold data.
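To make the four shorthand operations concrete, here is a minimal behavioural sketch in Python of a level-triggered latch with tristate outputs. It only illustrates the notation and is not the VHDL design; the class name, the drive_input method and the bus_value helper are hypothetical.

```python
class Latch:
    """Level-triggered (transparent) latch that can drive several tristate buses."""

    def __init__(self, value=0):
        self.value = value
        self.transparent = False
        self.driving = set()          # names of buses this latch currently drives

    def forward(self):                # forward(X): input passes through to the latch
        self.transparent = True

    def hold(self):                   # hold(X): latch keeps its current value
        self.transparent = False

    def drive_input(self, data):      # data arriving at the latch input
        if self.transparent:
            self.value = data

    def on(self, bus):                # on(X, Y): activate tristate output Y
        self.driving.add(bus)

    def off(self, bus):               # off(X, Y): de-activate tristate output Y
        self.driving.discard(bus)

def bus_value(bus, latches):
    """Value on a bus, None if nobody drives it; two drivers are an error."""
    drivers = [latch for latch in latches if bus in latch.driving]
    if len(drivers) > 1:
        raise RuntimeError("bus contention on " + bus)
    return drivers[0].value if drivers else None

# R0 += R1 in the style of cycle T2: drive both operands, add, latch the result.
r0, r1 = Latch(5), Latch(7)
r0.on("left Low-bus")
r1.on("right Low-bus")
total = bus_value("left Low-bus", [r0, r1]) + bus_value("right Low-bus", [r0, r1])
r0.forward()
r0.drive_input(total & 0xFFFF)
r0.hold()
r0.off("left Low-bus")
r1.off("right Low-bus")
```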

The typical fetch/decode/execute/write back cycles are:
CPU cycle T0 (opcode fetch, 8 bit):
High-phase:
Addr ALU: op = *PC++: on(PC latch, right Addr-bus), on(+1 latch, left Addr-bus), forward(ALU latches), ALU = +
Addressbus: ADDR = PC: forward(A latch)
Low-phase:
Addr ALU: op = *PC++: hold(ALU latches), forward(PC latch), off(PC latch, right Addr-bus), off(+1 latch, left Addr-bus)
Addressbus: out(PC): hold(A latch)
Databus: op = DATA: forward(Opcode FIFO)

CPU cycle T1 (decode):
High-phase:
Databus: op = DATA: hold(Opcode FIFO)
Sequencer: set up state machine initial state
Low-phase:

CPU cycle T2 (execute, 16 bit, data):
High-phase:
Data ALU: R0 += R1: on(R0 latch, left Low-bus), on(R1 latch, right Low-bus), forward(ALU latches), ALU = +
Low-phase:
Data ALU: R0 += R1: hold(ALU latches), forward(R0 latch), off(R0 latch, left Low-bus), off(R1 latch, right Low-bus)

CPU cycle T3 (write back, 16 bit, aligned):
High-phase:
Addr ALU: mem[A0] = R0: on(A0 latch, right Addr-bus)
Addressbus: ADDR = A0: forward(A latch)
Data ALU: mem[A0] = R0: on(R0 latch, left Low-bus)
Databus: DATA = R0: forward(WB LH latch), forward(WB LL latch), on(WB H TS), on(WB L TS)
Low-phase:
Addr ALU: mem[A0] = R0: off(A0 latch, right Addr-bus)
Addressbus: ADDR = A0: hold(A latch)
Data ALU: mem[A0] = R0: off(R0 latch, left Low-bus)
Databus: DATA = R0: hold(WB LH latch), hold(WB LL latch), on(WB H TS), on(WB L TS)
High-phase:
Databus: off(WB H TS), off(WB L TS)

... to be continued

© copyright 1999-2017 OpenCores.org, equivalent to ORSoC AB, all rights reserved. OpenCores®, registered trademark.