Subversion Repositories zipcpu
Rev 1 → Rev 2
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; |
; |
; Filename: lodsto.S |
; |
; Project: Zip CPU -- a small, lightweight, RISC CPU soft core |
; |
; Purpose: A quick test of whether or not we can execute loads and |
; stores. The test does not report success or failure, so |
; you will need to observe it in a simulator to know if it |
; worked or didn't. |
; |
; Creator: Dan Gisselquist, Ph.D. |
; Gisselquist Technology, LLC |
; |
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; |
; |
; Copyright (C) 2015, Gisselquist Technology, LLC |
; |
; This program is free software (firmware): you can redistribute it and/or |
; modify it under the terms of the GNU General Public License as published |
; by the Free Software Foundation, either version 3 of the License, or (at |
; your option) any later version. |
; |
; This program is distributed in the hope that it will be useful, but WITHOUT |
; ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or |
; FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License |
; for more details. |
; |
; License: GPL, v3, as defined and found on www.gnu.org, |
; http://www.gnu.org/licenses/gpl.html |
; |
; |
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; |
; |
start: |
LDI $2,R2 |
LOD $5(PC),R0 |
LOD $5(PC),R1 |
STO R1,(R0) |
LDI $1(PC),R0 |
infloop: |
MOV R0,PC |
MOV R0,PC |
MOV R0,PC |
.DAT 0xc0000000 |
.DAT 0x8001ffff |
|
|
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; |
; |
; Filename: pcpc.S |
; |
; Project: Zip CPU -- a small, lightweight, RISC CPU soft core |
; |
; Purpose: A quick test of whether or not the busy command works. |
; The test does not report success or failure, so you will need |
; to observe it in a simulator to know if it worked or not. |
; |
; Creator: Dan Gisselquist, Ph.D. |
; Gisselquist Technology, LLC |
; |
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; |
; |
; Copyright (C) 2015, Gisselquist Technology, LLC |
; |
; This program is free software (firmware): you can redistribute it and/or |
; modify it under the terms of the GNU General Public License as published |
; by the Free Software Foundation, either version 3 of the License, or (at |
; your option) any later version. |
; |
; This program is distributed in the hope that it will be useful, but WITHOUT |
; ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or |
; FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License |
; for more details. |
; |
; License: GPL, v3, as defined and found on www.gnu.org, |
; http://www.gnu.org/licenses/gpl.html |
; |
; |
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; |
; |
start: |
LDI $1,R0 |
MOV $1(R0),R1 |
MOV $2(R0),R2 |
MOV $3(R0),R3 |
MOV $4(R0),R4 |
MOV $5(R0),R5 |
MOV $6(R0),R6 |
MOV $7(R0),R7 |
MOV $8(R0),R8 |
MOV $9(R0),R9 |
MOV $10(R0),R10 |
MOV $11(R0),R11 |
MOV $12(R0),R12 |
MOV $13(R0),R13 ; R14 is CC, R15 is PC |
LDI $0,R0 |
BUSY ; This should create an endless loop here |
; MOV R0,R0 |
; MOV R0,R0 |
; MOV R0,R0 ; By this point, the loop should've started |
LDI $10,R0 ; If we ever get here, we've got problems |
ADD $1(R0),R1 |
ADD $2(R0),R2 |
ADD $3(R0),R3 |
MOV R0,R0 |
MOV R0,R0 ; If we ever get here, we've got problems |
HALT |
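For reference, the test bench later on this page recognizes BUSY by its encoded word, 0x2f0f7fff, which it comments as `BRA PC-1`. The sketch below is a minimal, hypothetical C++ helper showing why that word loops forever. The names `ZIP_BUSY_WORD`, `is_busy_word`, and `low15_signed` are made up for illustration, and the idea that the low 15 bits hold a signed offset is an assumption inferred from that comment, not taken from the CPU specification.

```cpp
#include <cassert>
#include <cstdint>

// Word that the test bench compares against when looking for BUSY.
// The test bench comments this encoding as "BRA PC-1".
constexpr uint32_t ZIP_BUSY_WORD = 0x2f0f7fffu;

// True when the instruction word is the BUSY encoding above.
constexpr bool is_busy_word(uint32_t insn) {
    return insn == ZIP_BUSY_WORD;
}

// Sign-extend the low 15 bits of an instruction word.
// ASSUMPTION: the immediate offset lives in bits [14:0]; this layout is
// inferred from the test bench comments and may not match the real ISA.
constexpr int32_t low15_signed(uint32_t insn) {
    return (insn & 0x4000u)
        ? static_cast<int32_t>(insn & 0x7fffu) - 0x8000
        : static_cast<int32_t>(insn & 0x7fffu);
}

// Under that assumption the offset in BUSY is -1, i.e. a branch back to
// the instruction itself, which is why execution never moves past it.
static_assert(low15_signed(ZIP_BUSY_WORD) == -1, "BUSY offset should be -1");
```

If the layout assumption holds, a simulator can flag a stuck CPU simply by comparing the word at the current instruction address with `ZIP_BUSY_WORD`.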
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; |
; |
; Filename: ivec.S |
; |
; Project: Zip CPU -- a small, lightweight, RISC CPU soft core |
; |
; Purpose: Just to test whether or not a timer works as desired. This |
; will set the timer to interrupt every millisecond, and then |
; update a counter on every interrupt. |
; |
; On any failure, the processor will execute a BUSY command. |
; |
; Creator: Dan Gisselquist, Ph.D. |
; Gisselquist Technology, LLC |
; |
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; |
; |
; Copyright (C) 2015, Gisselquist Technology, LLC |
; |
; This program is free software (firmware): you can redistribute it and/or |
; modify it under the terms of the GNU General Public License as published |
; by the Free Software Foundation, either version 3 of the License, or (at |
; your option) any later version. |
; |
; This program is distributed in the hope that it will be useful, but WITHOUT |
; ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or |
; FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License |
; for more details. |
; |
; License: GPL, v3, as defined and found on www.gnu.org, |
; http://www.gnu.org/licenses/gpl.html |
; |
; |
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; |
; |
; Registers: |
; sR0 Peripheral address |
; sR2 Interrupt controller command |
; sR3 Timer peripheral command |
; sR4 User program entry address (Could also be (re)entry address, |
; but isn't in this implementation) |
; sR5 Whether or not we've gotten the first interrupt |
; sR6 Number of times we've been interrupted |
; sR7 Number of times R6 has overflowed |
reset: |
CLR R0 ; Load the address of the interrupt controller |
LDIHI $c000h,R0 ; into R0 |
LDI $-1,R2 ; Acknowledge and disable all interrupts |
LDIHI $7fffh,R2 ; |
STO R2,(R0) ; |
; Set the timer for a programmable interrupt, every 100k clocks, |
; or roughly 1,000 times a second on a 100 MHz clock. |
LDIHI $0xc001h,R3 ; R3 = 100k, save that the top two bits are |
LDILO $0x86a0h,R3 ; also set (start timer, and auto reload) |
STO R3,$6(R0) |
; Now that timer-C is set, let's enable its interrupts |
LDIHI $8004h,R2 ; Leaving the bottom all ones acknowledges and |
STO R2,(R0) ; clears any interrupts (again) |
; Clear our counter variables |
CLR R5 |
CLR R6 |
CLR R7 |
; Program our wait for interrupt routine |
MOV $8(PC),R4 |
MOV R4,uPC |
RTU |
on_first_interrupt: |
ADD $1,R5 |
setup_for_next_interrupt: |
RTU |
on_subsequent_interrupt: |
ADD $1,R6 |
ADD.C $1,R7 |
BRA $-4 |
haltcpu: |
BUSY ; We've failed if we ever get here |
|
waitforinterrupt: |
WAIT |
BRA $-2 |
MOV $0,R0 |
MOV $0,R0 |
BUSY |
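The timer word that ivec.S loads into R3 can be checked with a little arithmetic: 100,000 clocks is 0x186a0, and setting the top two bits gives 0xc00186a0. The following is a minimal C++ sketch of that calculation, not part of the project. The names `TIMER_START_AND_RELOAD`, `timer_command`, and `interrupts_per_second` are hypothetical, and which of the two top bits means "start" and which means "auto reload" is not stated in the source, so they are kept together here.

```cpp
#include <cassert>
#include <cstdint>

// ASSUMPTION: the two control bits are bits 31 and 30, per the ivec.S
// comment that "the top two bits" start the timer and enable auto reload.
constexpr uint32_t TIMER_START_AND_RELOAD = 0xc0000000u;

// Combine the control bits with the interval, given in clock ticks.
// The interval is masked so it cannot spill into the control bits.
constexpr uint32_t timer_command(uint32_t interval_clocks) {
    return TIMER_START_AND_RELOAD | (interval_clocks & 0x3fffffffu);
}

// How many times per second the timer fires for a given clock rate.
constexpr uint32_t interrupts_per_second(uint32_t clock_hz,
                                         uint32_t interval_clocks) {
    return clock_hz / interval_clocks;
}

// 100k clocks with both control bits set matches the LDIHI/LDILO pair.
static_assert(timer_command(100000u) == 0xc00186a0u, "timer word mismatch");
// A 100 MHz clock divided by 100k clocks is 1,000 interrupts per second.
static_assert(interrupts_per_second(100000000u, 100000u) == 1000u,
              "interrupt rate mismatch");
```

The upper half of the result (0xc001) and the lower half (0x86a0) are the two immediates that the LDIHI and LDILO instructions load.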
/////////////////////////////////////////////////////////////////////////////// |
// |
// Filename: zippy_tb.cpp |
// |
// Project: Zip CPU -- a small, lightweight, RISC CPU soft core |
// |
// Purpose: A bench simulator for the CPU. Eventually, you should be |
// able to give this program the name of a piece of compiled |
// code to load into memory. For now, we hand assemble with the |
// computer's help. |
// |
// |
// Creator: Dan Gisselquist, Ph.D. |
// Gisselquist Technology, LLC |
// |
/////////////////////////////////////////////////////////////////////////////// |
// |
// Copyright (C) 2015, Gisselquist Technology, LLC |
// |
// This program is free software (firmware): you can redistribute it and/or |
// modify it under the terms of the GNU General Public License as published |
// by the Free Software Foundation, either version 3 of the License, or (at |
// your option) any later version. |
// |
// This program is distributed in the hope that it will be useful, but WITHOUT |
// ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or |
// FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License |
// for more details. |
// |
// License: GPL, v3, as defined and found on www.gnu.org, |
// http://www.gnu.org/licenses/gpl.html |
// |
// |
/////////////////////////////////////////////////////////////////////////////// |
// |
// |
#include <signal.h> |
#include <time.h> |
|
#include <ctype.h> |
#include <ncurses.h> |
|
#include "verilated.h" |
#include "Vzipsystem.h" |
|
#include "testb.h" |
// #include "twoc.h" |
// #include "qspiflashsim.h" |
#include "memsim.h" |
#include "zopcodes.h" |
#include "zparser.h" |
|
#define CMD_REG 0 |
#define CMD_DATA 1 |
#define CMD_HALT (1<<10) |
#define CMD_STALL (1<<9) |
#define CMD_STEP (1<<8) |
#define CMD_INT (1<<7) |
#define CMD_RESET (1<<6) |
|
|
// No particular "parameters" need definition or redefinition here. |
class ZIPPY_TB : public TESTB<Vzipsystem> { |
public: |
unsigned long m_tx_busy_count; |
MEMSIM m_mem; |
// QSPIFLASHSIM m_flash; |
FILE *dbg_fp; |
bool dbg_flag, bomb; |
|
ZIPPY_TB(void) : m_mem(1<<20) { |
//dbg_fp = fopen("dbg.txt", "w"); |
dbg_fp = NULL; |
dbg_flag = false; |
bomb = false; |
} |
|
void reset(void) { |
// m_flash.debug(false); |
TESTB<Vzipsystem>::reset(); |
} |
|
bool on_tick(void) { |
tick(); |
return true; |
} |
|
void showval(int y, int x, const char *lbl, unsigned int v) { |
mvprintw(y,x, "%s: 0x%08x", lbl, v); |
} |
|
void dispreg(int y, int x, const char *n, unsigned int v) { |
// 4,4,8,1 = 17 of 20, +3 = 19 |
mvprintw(y, x, "%s: 0x%08x", n, v); |
} |
|
void showreg(int y, int x, const char *n, int r) { |
// 4,4,8,1 = 17 of 20, +3 = 19 |
mvprintw(y, x, "%s: 0x%08x", n, m_core->v__DOT__thecpu__DOT__regset[r]); |
addch( ((r == m_core->v__DOT__thecpu__DOT__dcdA) |
&&(m_core->v__DOT__thecpu__DOT__dcdvalid) |
&&(m_core->v__DOT__thecpu__DOT__dcdA_rd)) |
?'a':' '); |
addch( ((r == m_core->v__DOT__thecpu__DOT__dcdB) |
&&(m_core->v__DOT__thecpu__DOT__dcdvalid) |
&&(m_core->v__DOT__thecpu__DOT__dcdB_rd)) |
?'b':' '); |
addch( ((r == m_core->v__DOT__thecpu__DOT__wr_reg_id) |
&&(m_core->v__DOT__thecpu__DOT__wr_reg_ce)) |
?'W':' '); |
} |
|
void showins(int y, const char *lbl, const int ce, const int valid, |
const int gie, const int stall, const unsigned int pc) { |
char line[80]; |
|
if (ce) |
mvprintw(y, 0, "Ck "); |
else |
mvprintw(y, 0, " "); |
if (stall) |
printw("Stl "); |
else |
printw(" "); |
printw("%s: 0x%08x", lbl, pc); |
|
if (valid) { |
if (gie) attroff(A_BOLD); |
else attron(A_BOLD); |
zipi_to_string(m_mem[pc], line); |
printw(" %-20s", &line[1]); |
} else { |
attroff(A_BOLD); |
printw(" (0x%08x)%28s", m_mem[pc],""); |
} |
attroff(A_BOLD); |
} |
|
void dbgins(const char *lbl, const int ce, const int valid, |
const int gie, const int stall, const unsigned int pc) { |
char line[80]; |
|
if (!dbg_fp) |
return; |
|
if (ce) |
fprintf(dbg_fp, "%s Ck ", lbl); |
else |
fprintf(dbg_fp, "%s ", lbl); |
if (stall) |
fprintf(dbg_fp, "Stl "); |
else |
fprintf(dbg_fp, " "); |
fprintf(dbg_fp, "0x%08x: ", pc); |
|
if (valid) { |
zipi_to_string(m_mem[pc], line); |
fprintf(dbg_fp, " %-20s\n", &line[1]); |
} else { |
fprintf(dbg_fp, " (0x%08x)\n", m_mem[pc]); |
} |
} |
|
void show_state(void) { |
int ln= 0; |
|
mvprintw(ln,0, "Peripherals-SS"); ln++; |
/* |
showval(ln, 1, "TRAP", m_core->v__DOT__trap_data); |
mvprintw(ln, 17, "%s%s", |
((m_core->v__DOT__sys_cyc) |
&&(m_core->v__DOT__sys_we) |
&&(m_core->v__DOT__sys_addr == 0))?"W":" ", |
(m_core->v__DOT__trap_int)?"I":" "); |
*/ |
showval(ln, 1, "PIC ", m_core->v__DOT__pic_data); |
showval(ln,21, "WDT ", m_core->v__DOT__watchdog__DOT__r_value); |
showval(ln,41, "CACH", m_core->v__DOT__manualcache__DOT__cache_base); |
showval(ln,61, "PIC2", m_core->v__DOT__ctri__DOT__r_int_state); |
|
ln++; |
showval(ln, 1, "TMRA", m_core->v__DOT__timer_a__DOT__r_value); |
showval(ln,21, "TMRB", m_core->v__DOT__timer_b__DOT__r_value); |
showval(ln,41, "TMRC", m_core->v__DOT__timer_c__DOT__r_value); |
showval(ln,61, "JIF ", m_core->v__DOT__jiffies__DOT__r_counter); |
|
ln++; |
showval(ln, 1, "UTSK", m_core->v__DOT__utc_data); |
showval(ln,21, "UMST", m_core->v__DOT__umc_data); |
showval(ln,41, "UPST", m_core->v__DOT__upc_data); |
showval(ln,61, "UAST", m_core->v__DOT__uac_data); |
|
ln++; |
mvprintw(ln, 40, "%s %s", |
(m_core->v__DOT__cpu_halt)? "CPU-HALT": " ", |
(m_core->v__DOT__cpu_reset)?"CPU-RESET":" "); ln++; |
mvprintw(ln, 40, "%s %s %s 0x%02x", |
(m_core->v__DOT__cmd_halt)? "HALT": " ", |
(m_core->v__DOT__cmd_reset)?"RESET":" ", |
(m_core->v__DOT__cmd_step)? "STEP" :" ", |
(m_core->v__DOT__cmd_addr)&0x3f); |
if (m_core->v__DOT__thecpu__DOT__gie) |
attroff(A_BOLD); |
else |
attron(A_BOLD); |
mvprintw(ln, 0, "Supervisor Registers"); |
ln++; |
|
showreg(ln, 1, "sR0 ", 0); |
showreg(ln,21, "sR1 ", 1); |
showreg(ln,41, "sR2 ", 2); |
showreg(ln,61, "sR3 ", 3); ln++; |
|
showreg(ln, 1, "sR4 ", 4); |
showreg(ln,21, "sR5 ", 5); |
showreg(ln,41, "sR6 ", 6); |
showreg(ln,61, "sR7 ", 7); ln++; |
|
showreg(ln, 1, "sR8 ", 8); |
showreg(ln,21, "sR9 ", 9); |
showreg(ln,41, "sR10", 10); |
showreg(ln,61, "sR11", 11); ln++; |
|
showreg(ln, 1, "sR12", 12); |
showreg(ln,21, "sSP ", 13); |
mvprintw(ln,41, "sCC :%s%s%s%s%s%s%s", |
(m_core->v__DOT__thecpu__DOT__step)?"STP":" ", |
(m_core->v__DOT__thecpu__DOT__sleep)?"SLP":" ", |
(m_core->v__DOT__thecpu__DOT__gie)?"GIE":" ", |
(m_core->v__DOT__thecpu__DOT__iflags&8)?"V":" ", |
(m_core->v__DOT__thecpu__DOT__iflags&4)?"N":" ", |
(m_core->v__DOT__thecpu__DOT__iflags&2)?"C":" ", |
(m_core->v__DOT__thecpu__DOT__iflags&1)?"Z":" "); |
mvprintw(ln,61, "sPC : 0x%08x", m_core->v__DOT__thecpu__DOT__ipc); |
ln++; |
|
if (m_core->v__DOT__thecpu__DOT__gie) |
attron(A_BOLD); |
else |
attroff(A_BOLD); |
mvprintw(ln, 0, "User Registers"); ln++; |
showreg(ln, 1, "uR0 ", 16); |
showreg(ln,21, "uR1 ", 17); |
showreg(ln,41, "uR2 ", 18); |
showreg(ln,61, "uR3 ", 19); ln++; |
|
showreg(ln, 1, "uR4 ", 20); |
showreg(ln,21, "uR5 ", 21); |
showreg(ln,41, "uR6 ", 22); |
showreg(ln,61, "uR7 ", 23); ln++; |
|
showreg(ln, 1, "uR8 ", 24); |
showreg(ln,21, "uR9 ", 25); |
showreg(ln,41, "uR10", 26); |
showreg(ln,61, "uR11", 27); ln++; |
|
showreg(ln, 1, "uR12", 28); |
showreg(ln,21, "uSP ", 29); |
mvprintw(ln,41, "uCC :%s%s%s%s%s%s%s", |
(m_core->v__DOT__thecpu__DOT__step)?"STP":" ", |
(m_core->v__DOT__thecpu__DOT__sleep)?"SLP":" ", |
(m_core->v__DOT__thecpu__DOT__gie)?"GIE":" ", |
(m_core->v__DOT__thecpu__DOT__flags&8)?"V":" ", |
(m_core->v__DOT__thecpu__DOT__flags&4)?"N":" ", |
(m_core->v__DOT__thecpu__DOT__flags&2)?"C":" ", |
(m_core->v__DOT__thecpu__DOT__flags&1)?"Z":" "); |
mvprintw(ln,61, "uPC : 0x%08x", m_core->v__DOT__thecpu__DOT__upc); |
|
attroff(A_BOLD); |
ln+=1; |
|
mvprintw(ln, 0, "PFPIPE: rda=%08x/%d, bas=%08x, off=%08x, nv=%08x", |
m_core->v__DOT__thecpu__DOT__pf__DOT__r_addr, |
m_core->v__DOT__thecpu__DOT__pf__DOT__r_cv, |
m_core->v__DOT__thecpu__DOT__pf__DOT__r_cache_base, |
m_core->v__DOT__thecpu__DOT__pf__DOT__r_cache_offset, |
m_core->v__DOT__thecpu__DOT__pf__DOT__r_nvalid); |
ln++; |
mvprintw(ln, 0, "PF BUS: %3s %3s %s @0x%08x[0x%08x] -> %s %s %08x", |
(m_core->v__DOT__thecpu__DOT__pf_cyc)?"CYC":" ", |
(m_core->v__DOT__thecpu__DOT__pf_stb)?"STB":" ", |
" ", // (m_core->v__DOT__thecpu__DOT__pf_we )?"WE":" ", |
(m_core->v__DOT__thecpu__DOT__pf_addr), |
0, // (m_core->v__DOT__thecpu__DOT__pf_data), |
(m_core->v__DOT__thecpu__DOT__pf_ack)?"ACK":" ", |
(m_core->v__DOT__cpu_stall)?"STL":" ", |
(m_core->v__DOT__wb_data)); ln++; |
|
mvprintw(ln, 0, "MEMBUS: %3s %3s %s @0x%08x[0x%08x] -> %s %s %08x", |
(m_core->v__DOT__thecpu__DOT__mem_cyc)?"CYC":" ", |
(m_core->v__DOT__thecpu__DOT__mem_stb)?"STB":" ", |
(m_core->v__DOT__thecpu__DOT__mem_we )?"WE":" ", |
(m_core->v__DOT__thecpu__DOT__mem_addr), |
(m_core->v__DOT__thecpu__DOT__mem_data), |
(m_core->v__DOT__thecpu__DOT__mem_ack)?"ACK":" ", |
(m_core->v__DOT__cpu_stall)?"STL":" ", |
(m_core->v__DOT__thecpu__DOT__mem_result)); ln++; |
|
mvprintw(ln, 0, "SYSBUS: %3s %3s %s @0x%08x[0x%08x] -> %s %s %08x", |
(m_core->o_wb_cyc)?"CYC":" ", |
(m_core->o_wb_stb)?"STB":" ", |
(m_core->o_wb_we )?"WE":" ", |
(m_core->o_wb_addr), |
(m_core->o_wb_data), |
(m_core->i_wb_ack)?"ACK":" ", |
(m_core->i_wb_stall)?"STL":" ", |
(m_core->i_wb_data)); ln+=2; |
|
showins(ln, "I ", |
!m_core->v__DOT__thecpu__DOT__dcd_stalled, |
m_core->v__DOT__thecpu__DOT__pf_valid, |
//m_core->v__DOT__thecpu__DOT__instruction_gie, |
m_core->v__DOT__thecpu__DOT__gie, |
0, |
// m_core->v__DOT__thecpu__DOT__instruction_pc); ln++; |
m_core->v__DOT__thecpu__DOT__pf_pc); ln++; |
|
showins(ln, "Dc", |
m_core->v__DOT__thecpu__DOT__dcd_ce, |
m_core->v__DOT__thecpu__DOT__dcdvalid, |
m_core->v__DOT__thecpu__DOT__dcd_gie, |
m_core->v__DOT__thecpu__DOT__dcd_stalled, |
m_core->v__DOT__thecpu__DOT__dcd_pc-1); ln++; |
|
showins(ln, "Op", |
m_core->v__DOT__thecpu__DOT__op_ce, |
m_core->v__DOT__thecpu__DOT__opvalid, |
m_core->v__DOT__thecpu__DOT__op_gie, |
m_core->v__DOT__thecpu__DOT__op_stall, |
m_core->v__DOT__thecpu__DOT__op_pc-1); ln++; |
|
showins(ln, "Al", |
m_core->v__DOT__thecpu__DOT__alu_ce, |
m_core->v__DOT__thecpu__DOT__alu_pc_valid, |
m_core->v__DOT__thecpu__DOT__alu_gie, |
m_core->v__DOT__thecpu__DOT__alu_stall, |
m_core->v__DOT__thecpu__DOT__alu_pc-1); ln++; |
|
mvprintw(ln-4, 48, |
(m_core->v__DOT__thecpu__DOT__new_pc)?"new-pc":" "); |
printw("(%s:%02x,%x)", |
(m_core->v__DOT__thecpu__DOT__set_cond)?"SET":" ", |
(m_core->v__DOT__thecpu__DOT__opF&0x0ff), |
(m_core->v__DOT__thecpu__DOT__op_gie) |
? (m_core->v__DOT__thecpu__DOT__w_uflags) |
: (m_core->v__DOT__thecpu__DOT__w_iflags)); |
|
printw("(%s%s%s:%02x)", |
(m_core->v__DOT__thecpu__DOT__opF_wr)?"OF":" ", |
(m_core->v__DOT__thecpu__DOT__alF_wr)?"FL":" ", |
(m_core->v__DOT__thecpu__DOT__wr_flags_ce)?"W":" ", |
(m_core->v__DOT__thecpu__DOT__alu_flags)); |
/* |
mvprintw(ln-3, 48, "dcdI : 0x%08x", |
m_core->v__DOT__thecpu__DOT__dcdI); |
mvprintw(ln-2, 48, "r_opB: 0x%08x", |
m_core->v__DOT__thecpu__DOT__opB); |
*/ |
mvprintw(ln-3, 48, "Op(%x)%8x %8x->%08x", |
m_core->v__DOT__thecpu__DOT__opn, |
m_core->v__DOT__thecpu__DOT__opA, |
m_core->v__DOT__thecpu__DOT__opB, |
m_core->v__DOT__thecpu__DOT__alu_result); |
mvprintw(ln-1, 48, "MEM: %s%s %s%s %s %-5s", |
(m_core->v__DOT__thecpu__DOT__opM)?"M":" ", |
(m_core->v__DOT__thecpu__DOT__mem_ce)?"CE":" ", |
(m_core->v__DOT__thecpu__DOT__mem_we)?"Wr ":"Rd ", |
(m_core->v__DOT__thecpu__DOT__mem_stalled)?"PIPE":" ", |
(m_core->v__DOT__thecpu__DOT__mem_valid)?"MEMV":" ", |
zop_regstr[(m_core->v__DOT__thecpu__DOT__mem_wreg&0x1f)^0x10]); |
} |
|
unsigned int cmd_read(unsigned int a) { |
if (dbg_fp) { |
dbg_flag= true; |
fprintf(dbg_fp, "CMD-READ(%d)\n", a); |
} |
wb_write(CMD_REG, CMD_HALT|(a&0x3f)); |
while((wb_read(CMD_REG) & CMD_STALL) == 0) |
; |
unsigned int v = wb_read(CMD_DATA); |
|
if (dbg_flag) |
fprintf(dbg_fp, "CMD-READ(%d) = 0x%08x\n", a, |
v); |
dbg_flag = false; |
return v; |
} |
|
void read_state(void) { |
int ln= 0; |
|
mvprintw(ln,0, "Peripherals-RS"); ln++; |
showval(ln, 1, "PIC ", cmd_read(32+ 0)); |
showval(ln,21, "WDT ", cmd_read(32+ 1)); |
showval(ln,41, "CACH", cmd_read(32+ 2)); |
showval(ln,61, "PIC2", cmd_read(32+ 3)); |
ln++; |
showval(ln, 1, "TMRA", cmd_read(32+ 4)); |
showval(ln,21, "TMRB", cmd_read(32+ 5)); |
showval(ln,41, "TMRC", cmd_read(32+ 6)); |
showval(ln,61, "JIF ", cmd_read(32+ 7)); |
|
ln++; |
showval(ln, 1, "UTSK", cmd_read(32+12)); |
showval(ln,21, "UMST", cmd_read(32+13)); |
showval(ln,41, "UPST", cmd_read(32+14)); |
showval(ln,61, "UAST", cmd_read(32+15)); |
|
ln++; |
ln++; |
unsigned int cc = cmd_read(14); |
if (dbg_fp) fprintf(dbg_fp, "CC = %08x, gie = %d\n", cc, |
m_core->v__DOT__thecpu__DOT__gie); |
if (cc & 0x020) |
attroff(A_BOLD); |
else |
attron(A_BOLD); |
mvprintw(ln, 0, "Supervisor Registers"); |
ln++; |
|
dispreg(ln, 1, "sR0 ", cmd_read(0)); |
dispreg(ln,21, "sR1 ", cmd_read(1)); |
dispreg(ln,41, "sR2 ", cmd_read(2)); |
dispreg(ln,61, "sR3 ", cmd_read(3)); ln++; |
|
dispreg(ln, 1, "sR4 ", cmd_read(4)); |
dispreg(ln,21, "sR5 ", cmd_read(5)); |
dispreg(ln,41, "sR6 ", cmd_read(6)); |
dispreg(ln,61, "sR7 ", cmd_read(7)); ln++; |
|
dispreg(ln, 1, "sR8 ", cmd_read( 8)); |
dispreg(ln,21, "sR9 ", cmd_read( 9)); |
dispreg(ln,41, "sR10", cmd_read(10)); |
dispreg(ln,61, "sR11", cmd_read(11)); ln++; |
|
dispreg(ln, 1, "sR12", cmd_read(12)); |
dispreg(ln,21, "sSP ", cmd_read(13)); |
|
mvprintw(ln,41, "sCC :%s%s%s%s%s%s%s", |
(cc & 0x040)?"STP":" ", |
(cc & 0x020)?"GIE":" ", |
(cc & 0x010)?"SLP":" ", |
(cc&8)?"V":" ", |
(cc&4)?"N":" ", |
(cc&2)?"C":" ", |
(cc&1)?"Z":" "); |
mvprintw(ln,61, "sPC : 0x%08x", cmd_read(15)); |
ln++; |
|
if (cc & 0x020) |
attron(A_BOLD); |
else |
attroff(A_BOLD); |
mvprintw(ln, 0, "User Registers"); ln++; |
dispreg(ln, 1, "uR0 ", cmd_read(16)); |
dispreg(ln,21, "uR1 ", cmd_read(17)); |
dispreg(ln,41, "uR2 ", cmd_read(18)); |
dispreg(ln,61, "uR3 ", cmd_read(19)); ln++; |
|
dispreg(ln, 1, "uR4 ", cmd_read(20)); |
dispreg(ln,21, "uR5 ", cmd_read(21)); |
dispreg(ln,41, "uR6 ", cmd_read(22)); |
dispreg(ln,61, "uR7 ", cmd_read(23)); ln++; |
|
dispreg(ln, 1, "uR8 ", cmd_read(24)); |
dispreg(ln,21, "uR9 ", cmd_read(25)); |
dispreg(ln,41, "uR10", cmd_read(26)); |
dispreg(ln,61, "uR11", cmd_read(27)); ln++; |
|
dispreg(ln, 1, "uR12", cmd_read(28)); |
dispreg(ln,21, "uSP ", cmd_read(29)); |
cc = cmd_read(30); |
mvprintw(ln,41, "uCC :%s%s%s%s%s%s%s", |
(cc&0x040)?"STP":" ", |
(cc&0x020)?"GIE":" ", |
(cc&0x010)?"SLP":" ", |
(cc&8)?"V":" ", |
(cc&4)?"N":" ", |
(cc&2)?"C":" ", |
(cc&1)?"Z":" "); |
mvprintw(ln,61, "uPC : 0x%08x", cmd_read(31)); |
|
attroff(A_BOLD); |
ln+=2; |
|
ln+=3; |
|
showins(ln, "I ", |
!m_core->v__DOT__thecpu__DOT__dcd_stalled, |
m_core->v__DOT__thecpu__DOT__pf_valid, |
m_core->v__DOT__thecpu__DOT__gie, |
0, |
// m_core->v__DOT__thecpu__DOT__instruction_pc); ln++; |
m_core->v__DOT__thecpu__DOT__pf_pc); ln++; |
|
showins(ln, "Dc", |
m_core->v__DOT__thecpu__DOT__dcd_ce, |
m_core->v__DOT__thecpu__DOT__dcdvalid, |
m_core->v__DOT__thecpu__DOT__dcd_gie, |
m_core->v__DOT__thecpu__DOT__dcd_stalled, |
m_core->v__DOT__thecpu__DOT__dcd_pc-1); ln++; |
|
showins(ln, "Op", |
m_core->v__DOT__thecpu__DOT__op_ce, |
m_core->v__DOT__thecpu__DOT__opvalid, |
m_core->v__DOT__thecpu__DOT__op_gie, |
m_core->v__DOT__thecpu__DOT__op_stall, |
m_core->v__DOT__thecpu__DOT__op_pc-1); ln++; |
|
showins(ln, "Al", |
m_core->v__DOT__thecpu__DOT__alu_ce, |
m_core->v__DOT__thecpu__DOT__alu_pc_valid, |
m_core->v__DOT__thecpu__DOT__alu_gie, |
m_core->v__DOT__thecpu__DOT__alu_stall, |
m_core->v__DOT__thecpu__DOT__alu_pc-1); ln++; |
} |
void tick(void) { |
int gie = m_core->v__DOT__thecpu__DOT__gie; |
/* |
m_core->i_qspi_dat = m_flash(m_core->o_qspi_cs_n, |
m_core->o_qspi_sck, |
m_core->o_qspi_dat); |
*/ |
|
m_mem(m_core->o_wb_cyc, m_core->o_wb_stb, m_core->o_wb_we, |
m_core->o_wb_addr & ((1<<20)-1), m_core->o_wb_data, |
m_core->i_wb_ack, m_core->i_wb_stall,m_core->i_wb_data); |
|
if ((dbg_flag)&&(dbg_fp)) { |
fprintf(dbg_fp, "DBG %s %s %s @0x%08x/%d[0x%08x] %s %s [0x%08x] %s %s %s%s%s%s%s%s%s%s\n", |
(m_core->i_dbg_cyc)?"CYC":" ", |
(m_core->i_dbg_stb)?"STB": |
((m_core->v__DOT__dbg_stb)?"DBG":" "), |
((m_core->i_dbg_we)?"WE":" "), |
(m_core->i_dbg_addr),0, |
m_core->i_dbg_data, |
(m_core->o_dbg_ack)?"ACK":" ", |
(m_core->o_dbg_stall)?"STALL":" ", |
(m_core->o_dbg_data), |
(m_core->v__DOT__cpu_halt)?"CPU-HALT ":"", |
(m_core->v__DOT__cpu_dbg_stall)?"CPU-DBG_STALL":"", |
(m_core->v__DOT__thecpu__DOT__dcdvalid)?"DCDV ":"", |
(m_core->v__DOT__thecpu__DOT__opvalid)?"OPV ":"", |
(m_core->v__DOT__thecpu__DOT__pf_cyc)?"PCYC ":"", |
(m_core->v__DOT__thecpu__DOT__mem_cyc)?"MCYC ":"", |
(m_core->v__DOT__thecpu__DOT__alu_wr)?"ALUW ":"", |
(m_core->v__DOT__thecpu__DOT__alu_ce)?"ALCE ":"", |
(m_core->v__DOT__thecpu__DOT__alu_valid)?"ALUV ":"", |
(m_core->v__DOT__thecpu__DOT__mem_valid)?"MEMV ":""); |
fprintf(dbg_fp, " SYS %s %s %s @0x%08x/%d[0x%08x] %s [0x%08x]\n", |
(m_core->v__DOT__sys_cyc)?"CYC":" ", |
(m_core->v__DOT__sys_stb)?"STB":" ", |
(m_core->v__DOT__sys_we)?"WE":" ", |
(m_core->v__DOT__sys_addr), |
(m_core->v__DOT__dbg_addr), |
(m_core->v__DOT__sys_data), |
(m_core->v__DOT__dbg_ack)?"ACK":" ", |
(m_core->v__DOT__wb_data)); |
} |
|
if (dbg_fp) |
fprintf(dbg_fp, "CEs %d/0x%08x,%d/0x%08x DCD: ->%02x, OP: ->%02x, ALU: halt=%d,%d ce=%d, valid=%d, wr=%d Reg=%02x, IPC=%08x, UPC=%08x\n", |
m_core->v__DOT__thecpu__DOT__dcd_ce, |
m_core->v__DOT__thecpu__DOT__dcd_pc, |
m_core->v__DOT__thecpu__DOT__op_ce, |
m_core->v__DOT__thecpu__DOT__op_pc, |
m_core->v__DOT__thecpu__DOT__dcdA, |
m_core->v__DOT__thecpu__DOT__opR, |
m_core->v__DOT__cmd_halt, |
m_core->v__DOT__cpu_halt, |
m_core->v__DOT__thecpu__DOT__alu_ce, |
m_core->v__DOT__thecpu__DOT__alu_valid, |
m_core->v__DOT__thecpu__DOT__alu_wr, |
m_core->v__DOT__thecpu__DOT__alu_reg, |
m_core->v__DOT__thecpu__DOT__ipc, |
m_core->v__DOT__thecpu__DOT__upc); |
|
if ((dbg_fp)&&(!gie)&&(m_core->v__DOT__thecpu__DOT__w_release_from_interrupt)) { |
fprintf(dbg_fp, "RELEASE: int=%d, %d/%02x[%08x] ?/%02x[0x%08x], ce=%d %d,%d,%d\n", |
m_core->v__DOT__pic_interrupt, |
m_core->v__DOT__thecpu__DOT__wr_reg_ce, |
m_core->v__DOT__thecpu__DOT__wr_reg_id, |
m_core->v__DOT__thecpu__DOT__wr_reg_vl, |
m_core->v__DOT__cmd_addr, |
m_core->v__DOT__dbg_idata, |
m_core->v__DOT__thecpu__DOT__master_ce, |
m_core->v__DOT__thecpu__DOT__alu_wr, |
m_core->v__DOT__thecpu__DOT__alu_valid, |
m_core->v__DOT__thecpu__DOT__mem_valid); |
} else if ((dbg_fp)&&(gie)&&(m_core->v__DOT__thecpu__DOT__w_switch_to_interrupt)) { |
fprintf(dbg_fp, "SWITCH: %d/%02x[%08x] ?/%02x[0x%08x], ce=%d %d,%d,%d, F%02x,%02x\n", |
m_core->v__DOT__thecpu__DOT__wr_reg_ce, |
m_core->v__DOT__thecpu__DOT__wr_reg_id, |
m_core->v__DOT__thecpu__DOT__wr_reg_vl, |
m_core->v__DOT__cmd_addr, |
m_core->v__DOT__dbg_idata, |
m_core->v__DOT__thecpu__DOT__master_ce, |
m_core->v__DOT__thecpu__DOT__alu_wr, |
m_core->v__DOT__thecpu__DOT__alu_valid, |
m_core->v__DOT__thecpu__DOT__mem_valid, |
m_core->v__DOT__thecpu__DOT__w_iflags, |
m_core->v__DOT__thecpu__DOT__w_uflags); |
fprintf(dbg_fp, "\tbrk=%d,%d\n", |
m_core->v__DOT__thecpu__DOT__break_en, |
m_core->v__DOT__thecpu__DOT__op_break); |
} |
|
TESTB<Vzipsystem>::tick(); |
if ((dbg_fp)&&(gie != m_core->v__DOT__thecpu__DOT__gie)) { |
fprintf(dbg_fp, "SWITCH FROM %s to %s: sPC = 0x%08x uPC = 0x%08x pf_pc = 0x%08x\n", |
(gie)?"User":"Supervisor", |
(gie)?"Supervisor":"User", |
m_core->v__DOT__thecpu__DOT__ipc, |
m_core->v__DOT__thecpu__DOT__upc, |
m_core->v__DOT__thecpu__DOT__pf_pc); |
} if (dbg_fp) { |
dbgins("Op - ", m_core->v__DOT__thecpu__DOT__op_ce, |
m_core->v__DOT__thecpu__DOT__opvalid, |
m_core->v__DOT__thecpu__DOT__op_gie, |
m_core->v__DOT__thecpu__DOT__op_stall, |
m_core->v__DOT__thecpu__DOT__op_pc-1); |
dbgins("Al - ", |
m_core->v__DOT__thecpu__DOT__alu_ce, |
m_core->v__DOT__thecpu__DOT__alu_pc_valid, |
m_core->v__DOT__thecpu__DOT__alu_gie, |
m_core->v__DOT__thecpu__DOT__alu_stall, |
m_core->v__DOT__thecpu__DOT__alu_pc-1); |
|
} |
|
if (m_core->v__DOT__cpu_dbg_we) { |
printf("WRITE-ENABLE!!\n"); |
bomb = true; |
} |
} |
|
bool test_success(void) { |
return ((!m_core->v__DOT__thecpu__DOT__gie) |
&&(m_core->v__DOT__thecpu__DOT__sleep)); |
} |
|
bool test_failure(void) { |
return ((m_core->v__DOT__thecpu__DOT__alu_pc_valid) |
&&(!m_core->v__DOT__thecpu__DOT__alu_gie) |
&&(m_mem[m_core->v__DOT__thecpu__DOT__alu_pc-1] |
== 0x2f0f7fff)); |
} |
|
void wb_write(unsigned a, unsigned int v) { |
mvprintw(0,35, "%40s", ""); |
mvprintw(0,40, "wb_write(%d,%x)", a, v); |
m_core->i_dbg_cyc = 1; |
m_core->i_dbg_stb = 1; |
m_core->i_dbg_we = 1; |
m_core->i_dbg_addr = a & 1; |
m_core->i_dbg_data = v; |
|
tick(); |
while(m_core->o_dbg_stall) |
tick(); |
|
m_core->i_dbg_stb = 0; |
while(!m_core->o_dbg_ack) |
tick(); |
|
// Release the bus |
m_core->i_dbg_cyc = 0; |
m_core->i_dbg_stb = 0; |
tick(); |
mvprintw(0,35, "%40s", ""); |
mvprintw(0,40, "wb_write -- complete"); |
} |
|
unsigned long wb_read(unsigned a) { |
unsigned int v; |
mvprintw(0,35, "%40s", ""); |
mvprintw(0,40, "wb_read(0x%08x)", a); |
m_core->i_dbg_cyc = 1; |
m_core->i_dbg_stb = 1; |
m_core->i_dbg_we = 0; |
m_core->i_dbg_addr = a & 1; |
|
tick(); |
while(m_core->o_dbg_stall) |
tick(); |
|
m_core->i_dbg_stb = 0; |
while(!m_core->o_dbg_ack) |
tick(); |
v = m_core->o_dbg_data; |
|
// Release the bus |
m_core->i_dbg_cyc = 0; |
m_core->i_dbg_stb = 0; |
tick(); |
|
mvprintw(0,35, "%40s", ""); |
mvprintw(0,40, "wb_read = 0x%08x", v); |
|
return v; |
} |
|
}; |
|
|
int main(int argc, char **argv) { |
Verilated::commandArgs(argc, argv); |
ZIPPY_TB *tb = new ZIPPY_TB(); |
ZPARSER zp; |
|
printf("uCC = %d\n", (int)zp.ZIP_uCC); |
printf("MOV CC,R0 = 0x%08x\n", zp.op_mov(0,zp.ZIP_uCC, zp.ZIP_R0)); |
// = 0x200e8000 |
// Op = 0x2 |
// Result = 0x0, R0 (Supervisor/default) |
// Cond = 0x0 |
// BReg = 0xe (CC) |
// BMap = 1, BReg = uCC |
// |
|
initscr(); |
raw(); |
noecho(); |
keypad(stdscr, true); |
|
// mem[0x00000] = 0xbe000010; // Halt instruction |
unsigned int mptr = 0; |
/* |
tb->m_mem[mptr++] = 0x30000000; // 0: CLR R0 |
tb->m_mem[mptr++] = 0x21000000; // 1: MOV R0,R1 |
tb->m_mem[mptr++] = 0x22000001; // 2: MOV $1+R0,R2 |
tb->m_mem[mptr++] = 0x23000002; // 3: MOV $2+R0,R3 |
tb->m_mem[mptr++] = 0x24000022; // 4: MOV $22h+R0,R4 |
tb->m_mem[mptr++] = 0x25100377; // 5: MOV $377h+R0,uR5 |
tb->m_mem[mptr++] = 0x4e000000; // 6: NOOP |
tb->m_mem[mptr++] = 0xa0120000; // 7: ADD R2,R0 |
tb->m_mem[mptr++] = 0xa0000020; // 8: ADD $32,R0 |
tb->m_mem[mptr++] = 0xa00fffdf; // 9: ADD -$33,R0 |
tb->m_mem[mptr++] = 0xc02fffff; // A: NOT.Z R0 |
tb->m_mem[mptr++] = 0xc0100000; // B: CLRF R0 |
tb->m_mem[mptr++] = 0x31000005; // C: LDI $5,R1 |
tb->m_mem[mptr++] = 0x00110000; // D: CMP R0,R1 |
tb->m_mem[mptr++] = 0xc0afffff; // E: NOT.LT R0 |
tb->m_mem[mptr++] = 0xc1cfffff; // F: NOT.GE R1 |
tb->m_mem[mptr++] = 0x621ffff9; // 10: LOD $-7(PC),R2 |
tb->m_mem[mptr++] = 0x4f13dead; // 11: LODIHI $deadh,R3 |
tb->m_mem[mptr++] = 0x4f03beef; // 12: LODILO $beefh,R3 |
tb->m_mem[mptr++] = 0x731f0002; // 13: STO R3,$2(PC) |
*/ |
|
/* |
tb->m_mem[mptr++] = zp.op_clr(zp.ZIP_R12);// 0: CLR R12 |
tb->m_mem[mptr++] = 0x4f1cc000; // 1: LODIHI $c000h,R12 |
tb->m_mem[mptr++] = 0x2c1c0000; // 2: MOV R12,uR12 |
tb->m_mem[mptr++] = 0x2f1f000a; // 3: MOV $12+PC,uPC |
tb->m_mem[mptr++] = 0x4f108001; // 4: LODIHI $8001,R0 // Turn on trap |
tb->m_mem[mptr++] = 0x4f00ffff; // 5: LODILO $ffff,R0 // interrupts |
tb->m_mem[mptr++] = 0x701c0001; // 6: STO R0,$1(R12) |
tb->m_mem[mptr++] = 0xbe000020; // 7: RTU // Switch to user mode |
tb->m_mem[mptr++] = 0x601c0000; // 8: LOD (R12),R0 // Check the result |
tb->m_mem[mptr++] = 0x00000000; // A: CMP $0,R0 |
tb->m_mem[mptr++] = 0x2f4f0001; // B: BNZ $1+PC |
tb->m_mem[mptr++] = 0xbe000010; // C: HALT // On SUCCESS |
tb->m_mem[mptr++] = 0x2f0f7fff; // D: BRA PC-1 // On FAILURE |
*/ |
|
|
tb->m_mem[mptr++] = zp.op_clr(zp.ZIP_R0); // 0: CLR R0 |
tb->m_mem[mptr++] = zp.op_mov(zp.ZIP_R0,zp.ZIP_R1); // 1: MOV R0,R1 |
tb->m_mem[mptr++] = zp.op_mov(1,zp.ZIP_R0,zp.ZIP_R2); // 2: MOV $1+R0,R2 |
tb->m_mem[mptr++] = zp.op_mov(2,zp.ZIP_R0,zp.ZIP_R3); // 3: MOV $2+R0,R3 |
tb->m_mem[mptr++] = zp.op_mov(0x022, zp.ZIP_R0, zp.ZIP_R4); // 4: MOV $22h+R0,R4 |
tb->m_mem[mptr++] = zp.op_mov(0x377, zp.ZIP_R0, zp.ZIP_uR5); // 5: MOV $377h+R0,uR5 |
tb->m_mem[mptr++] = zp.op_noop(); // 6: NOOP |
tb->m_mem[mptr++] = zp.op_add(0,zp.ZIP_R2,zp.ZIP_R0); // 7: ADD R2,R0 |
tb->m_mem[mptr++] = zp.op_add(32,zp.ZIP_R0); // 8: ADD $32,R0 |
tb->m_mem[mptr++] = zp.op_add(-33,zp.ZIP_R0); // 9: ADD -$33,R0 |
tb->m_mem[mptr++] = zp.op_not(zp.ZIPC_Z, zp.ZIP_R0); // A: NOT.Z R0 |
tb->m_mem[mptr++] = zp.op_clrf(zp.ZIP_R0); // B: CLRF R0 |
tb->m_mem[mptr++] = zp.op_ldi(5,zp.ZIP_R1); // C: LDI $5,R1 |
tb->m_mem[mptr++] = zp.op_cmp(0,zp.ZIP_R0,zp.ZIP_R1); // D: CMP R0,R1 |
tb->m_mem[mptr++] = zp.op_not(zp.ZIPC_LT, zp.ZIP_R0); // E: NOT.LT R0 |
tb->m_mem[mptr++] = zp.op_not(zp.ZIPC_GE, zp.ZIP_R1); // F: NOT.GE R1 |
tb->m_mem[mptr++] = zp.op_lod(-7,zp.ZIP_PC, zp.ZIP_R2); // 10: LOD $-7(PC),R2 |
tb->m_mem[mptr++] = zp.op_ldihi(0xdead, zp.ZIP_R3); // 11: LODIHI $deadh,R3 |
tb->m_mem[mptr++] = zp.op_ldilo(0xbeef, zp.ZIP_R3); // 12: LODILO $beefh,R3 |
|
// Let's build a software test bench. |
tb->m_mem[mptr++] = zp.op_clr(zp.ZIP_R12);// 0: CLR R12 |
tb->m_mem[mptr++] = zp.op_ldihi(0xc000,zp.ZIP_R12); |
tb->m_mem[mptr++] = zp.op_mov(zp.ZIP_R12,zp.ZIP_uR12); |
tb->m_mem[mptr++] = zp.op_mov(10,zp.ZIP_PC,zp.ZIP_uPC); |
tb->m_mem[mptr++] = zp.op_clr(zp.ZIP_R0); // Clear R0, and disable ints |
tb->m_mem[mptr++] = zp.op_sto(zp.ZIP_R0,0,zp.ZIP_R12); |
tb->m_mem[mptr++] = zp.op_rtu(); // 7: RTU // Switch to user mode |
tb->m_mem[mptr++] = zp.op_mov(0,zp.ZIP_uCC, zp.ZIP_R0); // Check result |
tb->m_mem[mptr++] = zp.op_tst(-256,zp.ZIP_R0); |
tb->m_mem[mptr++] = zp.op_bnz(1); |
tb->m_mem[mptr++] = zp.op_halt();// On SUCCESS |
tb->m_mem[mptr++] = zp.op_busy(); // On FAILURE |
|
|
// Now for a series of tests. If the test fails, call the trap |
// interrupt with the test number that failed. Upon completion, |
// call the trap with #0. |
|
// Test LDI to PC |
// Some data registers |
tb->m_mem[mptr] = mptr + 5 + 0x0100000; mptr++; |
tb->m_mem[mptr++] = zp.op_ldi(0x020,zp.ZIP_CC); // LDI $GIE,CC |
tb->m_mem[mptr++] = zp.op_ldi(0x0200,zp.ZIP_R11); // LDI $200h,R11 |
tb->m_mem[mptr++] = zp.op_lod(-4,zp.ZIP_PC,zp.ZIP_PC); // 1: LOD $-4(PC),PC |
tb->m_mem[mptr++] = zp.op_clr(zp.ZIP_R11); // 2: CLR R11 |
tb->m_mem[mptr++] = zp.op_noop(); // 3: NOOP |
tb->m_mem[mptr++] = zp.op_cmp(0,zp.ZIP_R11); // 4: CMP $0,R11 |
tb->m_mem[mptr++] = zp.op_mov(zp.ZIPC_Z, 0, zp.ZIP_R11,zp.ZIP_R10); // 5: MOV.Z R11,R10 |
tb->m_mem[mptr++] = zp.op_mov(zp.ZIPC_Z, 0, zp.ZIP_R11,zp.ZIP_CC); // 5: MOV.Z R11,CC |
tb->m_mem[mptr++] = zp.op_add(1,zp.ZIP_R0); // 6: ADD $1,R0 |
tb->m_mem[mptr++] = zp.op_add(1,zp.ZIP_R0); // 7: ADD $1,R0 |
|
// Let's test whether overflow works |
tb->m_mem[mptr++] = zp.op_ldi(0x0300,zp.ZIP_R11); // 0: LDI $300h,R11 |
tb->m_mem[mptr++] = zp.op_ldi(-1,zp.ZIP_R0); // 1: LDI $-1,R0 |
tb->m_mem[mptr++] = zp.op_lsr(1,zp.ZIP_R0); // 2: LSR $1,R0 // R0 = max int |
tb->m_mem[mptr++] = zp.op_add(1,zp.ZIP_R0); // 3: ADD $1,R0 // Should set ovfl |
tb->m_mem[mptr++] = zp.op_bv(1); // 4: BV $1+PC |
tb->m_mem[mptr++] = zp.op_mov(0,zp.ZIP_R11, zp.ZIP_CC); // 5: FAIL! if here |
// Overflow set from subtraction |
tb->m_mem[mptr++] = zp.op_ldi(0x0400,zp.ZIP_R11); // 6: LDI $400h,R11 |
tb->m_mem[mptr++] = zp.op_ldi(1,zp.ZIP_R0); // 7: LDI $1,R0 |
tb->m_mem[mptr++] = 0x5000001f; // 8: ROL $31,R0 |
tb->m_mem[mptr++] = zp.op_sub(1,zp.ZIP_R0); // 9: SUB $1,R0 // Should set ovfl |
tb->m_mem[mptr++] = zp.op_bv(1); // A: BV $1+PC |
tb->m_mem[mptr++] = zp.op_mov(0,zp.ZIP_R11, zp.ZIP_CC); // B: FAIL! if here |
// Overflow set from LSR |
tb->m_mem[mptr++] = zp.op_ldi(0x0500,zp.ZIP_R11); // C: LDI $500h,R11 |
tb->m_mem[mptr++] = zp.op_ldi(1,zp.ZIP_R0); // D: LDI $1,R0 |
tb->m_mem[mptr++] = 0x5000001f; // E: ROL $31,R0 |
tb->m_mem[mptr++] = zp.op_lsr(1,zp.ZIP_R0); // F: LSR $1,R0 |
tb->m_mem[mptr++] = zp.op_bv(1); // 10: BV $1+PC |
tb->m_mem[mptr++] = zp.op_mov(0,zp.ZIP_R11, zp.ZIP_CC); // 11: FAIL! if here |
// Overflow set from LSL |
tb->m_mem[mptr++] = zp.op_ldi(0x0600,zp.ZIP_R11); // 12: LDI $600h,R11 |
tb->m_mem[mptr++] = zp.op_ldi(1,zp.ZIP_R0); // 13: LDI $1,R0 |
tb->m_mem[mptr++] = 0x5000001e; // 14: ROL $30,R0 |
tb->m_mem[mptr++] = zp.op_lsl(1,zp.ZIP_R0); // 15: LSL $1,R0 |
tb->m_mem[mptr++] = zp.op_bv(1); // 16: BV $1+PC |
tb->m_mem[mptr++] = zp.op_mov(0,zp.ZIP_R11, zp.ZIP_CC); // 17: FAIL! if here |
// Overflow set from LSL, negative to positive |
tb->m_mem[mptr++] = zp.op_ldi(0x0700,zp.ZIP_R11); // 18: LDI $700h,R11 |
tb->m_mem[mptr++] = zp.op_ldi(1,zp.ZIP_R0); // 19: LDI $1,R0 |
tb->m_mem[mptr++] = 0x5000001f; // 1A: ROL $31,R0 |
tb->m_mem[mptr++] = zp.op_lsl(1,zp.ZIP_R0); // 1B: LSL $1,R0 |
tb->m_mem[mptr++] = zp.op_bv(1); // 1C: BV $1+PC |
tb->m_mem[mptr++] = zp.op_mov(0,zp.ZIP_R11, zp.ZIP_CC); // 1D: FAIL! if here |
|
|
// Test carry |
tb->m_mem[mptr++] = zp.op_ldi(0x01000, zp.ZIP_R11); // 0: LDI $1000h,R11 |
tb->m_mem[mptr++] = zp.op_ldi(-1, zp.ZIP_R0); // 1: LDI $-1,R0 |
tb->m_mem[mptr++] = zp.op_add(1, zp.ZIP_R0); // 2: ADD $1,R0 |
tb->m_mem[mptr++] = zp.op_tst(2, zp.ZIP_CC); // 3: TST $2,CC // Is the carry set? |
tb->m_mem[mptr++] = zp.op_mov(zp.ZIPC_Z,0,zp.ZIP_R11, zp.ZIP_CC); // 4: FAIL! if here |
// and carry from subtraction |
tb->m_mem[mptr++] = zp.op_ldi(0x01100, zp.ZIP_R11); // 0: LDI $1100h,R11 |
tb->m_mem[mptr++] = zp.op_sub(1, zp.ZIP_R0); // 1: SUB $1,R0 |
tb->m_mem[mptr++] = zp.op_tst(2, zp.ZIP_CC); // 2: TST $2,CC // Is the carry set? |
tb->m_mem[mptr++] = zp.op_mov(zp.ZIPC_Z,0,zp.ZIP_R11, zp.ZIP_CC); // 3: FAIL! if here |
|
|
|
// Let's try a loop: for(i=0; i<5; i++) |
// We'll use R0=i, Immediates for 5 |
tb->m_mem[mptr++] = zp.op_ldi(0x01200, zp.ZIP_R11); // 0: LDI $1200h,R11 |
tb->m_mem[mptr++] = zp.op_clr(zp.ZIP_R0); // 0: CLR R0 |
tb->m_mem[mptr++] = zp.op_noop(); |
tb->m_mem[mptr++] = zp.op_add(1, zp.ZIP_R0); // 2: R0 = R0 + 1 |
tb->m_mem[mptr++] = zp.op_cmp(5, zp.ZIP_R0); // 3: CMP $5,R0 |
tb->m_mem[mptr++] = zp.op_blt(-4); // 4: BLT PC-4 |
// |
// Let's try a reverse loop. Such loops are usually cheaper to |
// implement, and this one is no different: 2 loop instructions |
// (minus setup instructions) vs 3 from before. |
// R0 = 5; (from before) |
// do { |
// } while (R0 > 0); |
tb->m_mem[mptr++] = zp.op_ldi(0x01300, zp.ZIP_R11); // 0: LDI $1300h,R11 |
tb->m_mem[mptr++] = zp.op_noop(); // 5: NOOP |
tb->m_mem[mptr++] = zp.op_sub( 1, zp.ZIP_R0); // 6: R0 = R0 - 1 |
tb->m_mem[mptr++] = zp.op_bgt(-3); // 7: BGT PC-3 |
// How about the same thing with a >= comparison? |
// R1 = 5; // Need to do this explicitly |
// do { |
// } while(R1 >= 0); |
tb->m_mem[mptr++] = zp.op_ldi(0x01400, zp.ZIP_R11); // 0: LDI $1400h,R11 |
tb->m_mem[mptr++] = zp.op_ldi(5, zp.ZIP_R1); |
tb->m_mem[mptr++] = zp.op_noop(); |
tb->m_mem[mptr++] = zp.op_sub(1, zp.ZIP_R1); |
tb->m_mem[mptr++] = zp.op_bge(-3); |
|
// Let's try the reverse loop again, only this time we'll store our |
// loop variable in memory. |
// R0 = 5; (from before) |
// do { |
// } while (R0 > 0); |
tb->m_mem[mptr++] = zp.op_ldi(0x01500, zp.ZIP_R11); // 0: LDI $1500h,R11 |
tb->m_mem[mptr++] = zp.op_bra(1); // Give us a memory location |
tb->m_mem[mptr++] = 5; // Loop five times |
tb->m_mem[mptr++] = zp.op_mov(-2, zp.ZIP_PC, zp.ZIP_R1); // Get var adr |
tb->m_mem[mptr++] = zp.op_clr(zp.ZIP_R2); |
tb->m_mem[mptr++] = zp.op_ldi(5, zp.ZIP_R0); |
tb->m_mem[mptr++] = zp.op_sto(zp.ZIP_R0,0,zp.ZIP_R1); |
tb->m_mem[mptr++] = zp.op_add(1,zp.ZIP_R2); |
tb->m_mem[mptr++] = zp.op_add(14,zp.ZIP_R0); |
tb->m_mem[mptr++] = zp.op_lod(0,zp.ZIP_R1,zp.ZIP_R0); |
tb->m_mem[mptr++] = zp.op_sub( 1, zp.ZIP_R0); |
tb->m_mem[mptr++] = zp.op_bgt(-6); |
tb->m_mem[mptr++] = zp.op_cmp( 5, zp.ZIP_R2); |
tb->m_mem[mptr++] = zp.op_mov(zp.ZIPC_NZ, 0, zp.ZIP_R11, zp.ZIP_CC); |
|
// Return success / Test the trap interrupt |
tb->m_mem[mptr++] = zp.op_clr(zp.ZIP_R11); // 0: CLR R11 |
tb->m_mem[mptr++] = zp.op_mov(zp.ZIP_R11, zp.ZIP_CC); |
tb->m_mem[mptr++] = zp.op_noop(); // 2: NOOP // Give it a chance to take |
tb->m_mem[mptr++] = zp.op_noop(); // 3: NOOP // effect |
|
// Go into an infinite loop if the trap fails |
// Permanent loop instruction -- a busy halt if you will |
tb->m_mem[mptr++] = zp.op_busy(); // 4: BRA PC-1 |
|
// And, in case we miss a halt ... |
tb->m_mem[mptr++] = zp.op_halt(); // HALT |
|
tb->reset(); |
int chv = 'q'; |
const bool live_debug_mode = true; |
|
if (live_debug_mode) { |
bool done = false, halted = true, manual = true; |
|
halfdelay(1); |
tb->wb_write(CMD_REG, CMD_HALT | CMD_RESET); |
// while((tb->wb_read(CMD_REG) & (CMD_HALT|CMD_STALL))==(CMD_HALT|CMD_STALL)) |
// tb->show_state(); |
|
while(!done) { |
chv = getch(); |
switch(chv) { |
case 'h': case 'H': |
tb->wb_write(CMD_REG, CMD_HALT); |
if (!halted) |
erase(); |
halted = true; |
break; |
case 'g': case 'G': |
tb->wb_write(CMD_REG, 0); |
if (halted) |
erase(); |
halted = false; |
manual = false; |
break; |
case 'q': case 'Q': |
done = true; |
break; |
case 'r': case 'R': |
tb->wb_write(CMD_REG, CMD_RESET|CMD_HALT); |
halted = true; |
erase(); |
break; |
case 's': case 'S': |
tb->wb_write(CMD_REG, CMD_STEP); |
manual = false; |
break; |
case 't': case 'T': |
manual = true; |
tb->tick(); |
break; |
case ERR: |
default: |
if (!manual) |
tb->tick(); |
} |
|
if (manual) { |
tb->show_state(); |
} else if (halted) { |
if (tb->dbg_fp) |
fprintf(tb->dbg_fp, "\n\nREAD-STATE ******\n"); |
tb->read_state(); |
} else |
tb->show_state(); |
|
if (tb->m_core->i_rst) |
done =true; |
if (tb->bomb) |
done = true; |
} |
|
} else { // Manual stepping mode |
tb->show_state(); |
|
while('q' != tolower(chv = getch())) { |
tb->tick(); |
tb->show_state(); |
|
if (tb->test_success()) |
break; |
else if (tb->test_failure()) |
break; |
} |
} |
|
endwin(); |
|
if (tb->test_success()) |
printf("SUCCESS!\n"); |
else if (tb->test_failure()) |
printf("TEST FAILED!\n"); |
else if (chv == 'q') |
printf("chv = %c\n", chv); |
exit(0); |
} |
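
// The overflow and carry tests in the program above rely on the usual two's
// complement rules. As a rough sketch only, the checks for ADD and SUB might
// be written as below. This is the textbook rule, not necessarily how the
// Zip CPU derives its flags, and it does not model the shift cases (LSR/LSL)
// that the program also tests. All function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Signed overflow on a + b: both operands share a sign, and the sign of the
// result differs from it.
bool add_overflows(uint32_t a, uint32_t b) {
    uint32_t sum = a + b;                     // wraps modulo 2^32
    return (((a ^ sum) & (b ^ sum)) >> 31) != 0;
}

// Signed overflow on a - b: the operands differ in sign, and the sign of the
// result differs from the sign of a.
bool sub_overflows(uint32_t a, uint32_t b) {
    uint32_t diff = a - b;                    // wraps modulo 2^32
    return (((a ^ b) & (a ^ diff)) >> 31) != 0;
}

// Unsigned carry out of a + b: the wrapped sum is smaller than an operand.
bool add_carries(uint32_t a, uint32_t b) {
    return (uint32_t)(a + b) < a;
}
```

// For example, add_overflows(0x7fffffff, 1) and sub_overflows(0x80000000, 1)
// are both true, which corresponds to the "max int plus one" and "ROL $31
// then SUB $1" cases in the test program.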
|
//////////////////////////////////////////////////////////////////////////// |
// |
// Filename: twoc.h |
// |
// Project: A Doubletime Pipelined FFT |
// |
// Purpose: Some various two's complement related C++ helper routines. |
// Specifically, these help extract signed numbers from |
// packed bitfields, while guaranteeing that the upper bits |
// are properly sign extended (or not) as desired. |
// |
// Creator: Dan Gisselquist, Ph.D. |
// Gisselquist Technology, LLC |
// |
/////////////////////////////////////////////////////////////////////////// |
// |
// Copyright (C) 2015, Gisselquist Technology, LLC |
// |
// This program is free software (firmware): you can redistribute it and/or |
// modify it under the terms of the GNU General Public License as published |
// by the Free Software Foundation, either version 3 of the License, or (at |
// your option) any later version. |
// |
// This program is distributed in the hope that it will be useful, but WITHOUT |
// ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or |
// FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License |
// for more details. |
// |
// You should have received a copy of the GNU General Public License along |
// with this program. (It's in the $(ROOT)/doc directory, run make with no |
// target there if the PDF file isn't present.) If not, see |
// <http://www.gnu.org/licenses/> for a copy. |
// |
// License: GPL, v3, as defined and found on www.gnu.org, |
// http://www.gnu.org/licenses/gpl.html |
// |
// |
/////////////////////////////////////////////////////////////////////////// |
#ifndef TWOC_H |
#define TWOC_H |
|
extern long sbits(const long val, const int bits); |
extern unsigned long ubits(const long val, const int bits); |
|
#endif |
|
//////////////////////////////////////////////////////////////////////////////// |
// |
// Filename: testb.h |
// |
// Project: Zip CPU -- a small, lightweight, RISC CPU core |
// |
// Purpose: A wrapper for a common interface to a clocked FPGA core |
// being exercised in Verilator. |
// |
// Creator: Dan Gisselquist, Ph.D. |
// Gisselquist Technology, LLC |
// |
//////////////////////////////////////////////////////////////////////////////// |
// |
// Copyright (C) 2015, Gisselquist Technology, LLC |
// |
// This program is free software (firmware): you can redistribute it and/or |
// modify it under the terms of the GNU General Public License as published |
// by the Free Software Foundation, either version 3 of the License, or (at |
// your option) any later version. |
// |
// This program is distributed in the hope that it will be useful, but WITHOUT |
// ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or |
// FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License |
// for more details. |
// |
// You should have received a copy of the GNU General Public License along |
// with this program. (It's in the $(ROOT)/doc directory, run make with no |
// target there if the PDF file isn't present.) If not, see |
// <http://www.gnu.org/licenses/> for a copy. |
// |
// License: GPL, v3, as defined and found on www.gnu.org, |
// http://www.gnu.org/licenses/gpl.html |
// |
// |
//////////////////////////////////////////////////////////////////////////////// |
#ifndef TESTB_H |
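#define TESTB_H |

#include <stdio.h> // printf(), NULL |

template <class VA> class TESTB { |
public: |
VA *m_core; |
unsigned long m_tickcount; |

// Start the tick count at zero so tick() is well defined before reset() |
TESTB(void) : m_tickcount(0l) { m_core = new VA; } |
virtual ~TESTB(void) { delete m_core; m_core = NULL; } |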
|
virtual void eval(void) { |
m_core->eval(); |
} |
|
virtual void tick(void) { |
m_core->i_clk = 0; |
eval(); |
m_core->i_clk = 1; |
eval(); |
|
m_tickcount++; |
} |
|
virtual void reset(void) { |
m_core->i_rst = 1; |
tick(); |
m_core->i_rst = 0; |
m_tickcount = 0l; |
printf("RESET\n"); |
} |
}; |
|
#endif |
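
// A sketch of how the wrapper above might be used. So that it compiles on its
// own, it carries a condensed copy of the template (TESTB_SKETCH, without the
// "RESET" message) and a hypothetical FakeCore that stands in for a
// Verilator-generated class. Neither name is part of this repository.

```cpp
#include <cassert>

// Condensed copy of the TESTB template, so the sketch is self-contained.
template <class VA> class TESTB_SKETCH {
public:
    VA *m_core;
    unsigned long m_tickcount;

    TESTB_SKETCH(void) : m_core(new VA), m_tickcount(0) {}
    virtual ~TESTB_SKETCH(void) { delete m_core; }

    virtual void eval(void) { m_core->eval(); }
    virtual void tick(void) {
        m_core->i_clk = 0; eval();    // falling edge
        m_core->i_clk = 1; eval();    // rising edge
        m_tickcount++;
    }
    virtual void reset(void) {
        m_core->i_rst = 1; tick();    // hold reset for one clock
        m_core->i_rst = 0;
        m_tickcount = 0;
    }
};

// Hypothetical core: a counter that advances on each rising clock edge and
// clears whenever i_rst is high on that edge.
struct FakeCore {
    unsigned char i_clk, i_rst, last_clk;
    unsigned int  o_count;
    FakeCore(void) : i_clk(0), i_rst(0), last_clk(0), o_count(0) {}
    void eval(void) {
        if (i_clk && !last_clk)       // posedge i_clk
            o_count = i_rst ? 0 : o_count + 1;
        last_clk = i_clk;
    }
};

// Resets the fake core, clocks it n times, and returns its counter value.
unsigned int run_fake_core(unsigned int n) {
    TESTB_SKETCH<FakeCore> tb;
    tb.reset();
    for (unsigned int i = 0; i < n; i++)
        tb.tick();
    return tb.m_core->o_count;
}
```

// Calling run_fake_core(5) clocks the counter five times after reset. A real
// test bench would derive from the template and override tick() to drive the
// core's other inputs between clock edges.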
//////////////////////////////////////////////////////////////////////////////// |
// |
// Filename: memsim.cpp |
// |
// Project: Zip CPU -- a small, lightweight, RISC CPU core |
// |
// Purpose: This creates a memory like device to act on a WISHBONE bus. |
// It doesn't exercise the bus thoroughly, but does give some |
// exercise to the bus to see whether or not the bus master |
// can control it. |
// |
// |
// Creator: Dan Gisselquist, Ph.D. |
// Gisselquist Technology, LLC |
// |
//////////////////////////////////////////////////////////////////////////////// |
// |
// Copyright (C) 2015, Gisselquist Technology, LLC |
// |
// This program is free software (firmware): you can redistribute it and/or |
// modify it under the terms of the GNU General Public License as published |
// by the Free Software Foundation, either version 3 of the License, or (at |
// your option) any later version. |
// |
// This program is distributed in the hope that it will be useful, but WITHOUT |
// ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or |
// FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License |
// for more details. |
// |
// You should have received a copy of the GNU General Public License along |
// with this program. (It's in the $(ROOT)/doc directory, run make with no |
// target there if the PDF file isn't present.) If not, see |
// <http://www.gnu.org/licenses/> for a copy. |
// |
// License: GPL, v3, as defined and found on www.gnu.org, |
// http://www.gnu.org/licenses/gpl.html |
// |
// |
//////////////////////////////////////////////////////////////////////////////// |
#include <stdio.h> |
#include <assert.h> |
#include "memsim.h" |
|
MEMSIM::MEMSIM(const unsigned int nwords) { |
unsigned int nxt; |
for(nxt=1; nxt < nwords; nxt<<=1) |
; |
m_len = nxt; m_mask = nxt-1; |
m_mem = new BUSW[m_len]; |
} |
|
MEMSIM::~MEMSIM(void) { |
delete[] m_mem; |
} |
|
void MEMSIM::load(const char *fname) { |
FILE *fp; |
unsigned int nr; |

// The image is raw binary data, so open it in binary mode |
fp = fopen(fname, "rb"); |
if (!fp) { |
fprintf(stderr, "Could not open/load file '%s'\n", |
fname); |
perror("O/S Err"); |
fprintf(stderr, "\tInitializing memory with zero instead.\n"); |
nr = 0; |
} else { |
nr = (unsigned int)fread(m_mem, sizeof(BUSW), m_len, fp); |
fclose(fp); |

if (nr != m_len) { |
fprintf(stderr, "Only read %u of %u words\n", |
nr, m_len); |
fprintf(stderr, "\tFilling the rest with zero.\n"); |
} |
} |

// Zero whatever part of the memory the file did not fill |
for(; nr<m_len; nr++) |
m_mem[nr] = 0l; |
} |
|
void MEMSIM::apply(const unsigned char wb_cyc, |
const unsigned char wb_stb, const unsigned char wb_we, |
const BUSW wb_addr, const BUSW wb_data, |
unsigned char &o_ack, unsigned char &o_stall, BUSW &o_data) { |
if ((wb_cyc)&&(wb_stb)) { |
if (wb_we) |
m_mem[wb_addr & m_mask] = wb_data; |
o_ack = 1; |
o_stall= 0; |
o_data = m_mem[wb_addr & m_mask]; |
|
/* |
printf("MEMBUS -- ACK %s 0x%08x - 0x%08x\n", |
(wb_we)?"WRITE":"READ", |
wb_addr, o_data); |
*/ |
} else { |
o_ack = 0; |
o_stall = 0; |
} |
} |
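
// A sketch of the two ideas the memory model above depends on: the length is
// rounded up to a power of two so that an address can be wrapped with a simple
// mask, and any strobed request is acknowledged in the same evaluation.
// MemSketch is a hypothetical, condensed stand-in (std::vector storage, no
// file loading), not the MEMSIM class itself.

```cpp
#include <cassert>
#include <vector>

typedef unsigned int BUSW;

class MemSketch {
public:
    std::vector<BUSW> m_mem;
    BUSW m_mask;

    explicit MemSketch(unsigned int nwords) {
        unsigned int len = 1;
        while (len < nwords)      // round up to the next power of two
            len <<= 1;
        m_mem.assign(len, 0);
        m_mask = len - 1;         // works only because len is a power of two
    }

    // One bus evaluation: acknowledge any strobed request immediately.
    void apply(unsigned char cyc, unsigned char stb, unsigned char we,
               BUSW addr, BUSW data,
               unsigned char &o_ack, unsigned char &o_stall, BUSW &o_data) {
        o_stall = 0;              // this model never stalls the bus
        if (cyc && stb) {
            if (we)
                m_mem[addr & m_mask] = data;
            o_ack  = 1;
            o_data = m_mem[addr & m_mask];
        } else
            o_ack = 0;
    }
};

// Writes one word, then reads it back through an aliased address.
BUSW mem_alias_demo(void) {
    MemSketch mem(1000);          // rounds up to 1024 words
    unsigned char ack = 0, stall = 0;
    BUSW rd = 0;
    mem.apply(1, 1, 1, 5, 0xdeadbeefu, ack, stall, rd);   // write word 5
    mem.apply(1, 1, 0, 5 + 1024, 0, ack, stall, rd);      // read its alias
    return rd;
}
```

// Because addresses are masked rather than range checked, address 5+1024
// reads back the word written to address 5. This is convenient for a
// simulation, but it also means out-of-range accesses are silently wrapped.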
|
|
//////////////////////////////////////////////////////////////////////////// |
// |
// Filename: twoc.cpp |
// |
// Project: A Doubletime Pipelined FFT |
// |
// Purpose: Some various two's complement related C++ helper routines. |
// Specifically, these help extract signed numbers from |
// packed bitfields, while guaranteeing that the upper bits |
// are properly sign extended (or not) as desired. |
// |
// Creator: Dan Gisselquist, Ph.D. |
// Gisselquist Technology, LLC |
// |
/////////////////////////////////////////////////////////////////////////// |
// |
// Copyright (C) 2015, Gisselquist Technology, LLC |
// |
// This program is free software (firmware): you can redistribute it and/or |
// modify it under the terms of the GNU General Public License as published |
// by the Free Software Foundation, either version 3 of the License, or (at |
// your option) any later version. |
// |
// This program is distributed in the hope that it will be useful, but WITHOUT |
// ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or |
// FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License |
// for more details. |
// |
// You should have received a copy of the GNU General Public License along |
// with this program. (It's in the $(ROOT)/doc directory, run make with no |
// target there if the PDF file isn't present.) If not, see |
// <http://www.gnu.org/licenses/> for a copy. |
// |
// License: GPL, v3, as defined and found on www.gnu.org, |
// http://www.gnu.org/licenses/gpl.html |
// |
// |
/////////////////////////////////////////////////////////////////////////// |
#include "twoc.h" |
|
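long sbits(const long val, const int bits) { |
long r; |

r = val & ((1l<<bits)-1); |
// Sign extend: if the top bit of the field is set, set every bit above it. |
// The mask is inverted rather than shifting -1l left, since left-shifting |
// a negative value is undefined in older C++ standards. |
if (r & (1l << (bits-1))) |
r |= ~((1l<<bits)-1); |
return r; |
} |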
|
unsigned long ubits(const long val, const int bits) { |
unsigned long r = val & ((1l<<bits)-1); |
return r; |
} |
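
// A sketch of what the two helpers above compute, using stand-alone copies
// (sbits_sketch and ubits_sketch are hypothetical names) so that it compiles
// on its own. It assumes 0 < bits < the number of bits in a long.

```cpp
#include <cassert>

// Extract the low 'bits' bits of val and sign extend them.
long sbits_sketch(const long val, const int bits) {
    const long mask = (1l << bits) - 1;    // low 'bits' bits set
    long r = val & mask;                   // isolate the field
    if (r & (1l << (bits - 1)))            // is the field's sign bit set?
        r |= ~mask;                        // then set every bit above it
    return r;
}

// Extract the low 'bits' bits of val without sign extension.
unsigned long ubits_sketch(const long val, const int bits) {
    return (unsigned long)(val & ((1l << bits) - 1));
}

// Returns true when the sketch behaves as the comments describe.
bool twoc_demo(void) {
    return (sbits_sketch(0xF, 4) == -1)       // 1111b is -1 in four bits
        && (sbits_sketch(0x7, 4) == 7)        // 0111b stays positive
        && (sbits_sketch(0x1F8, 4) == -8)     // upper bits are dropped first
        && (ubits_sketch(-1, 4) == 0xFul);    // no sign extension
}
```

// A typical use would be pulling a signed immediate out of a packed
// instruction word: mask off the field, then let the helper restore the sign.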
|
|
################################################################################ |
# |
# Filename: Makefile |
# |
# Project: Zip CPU -- a small, lightweight, RISC CPU soft core |
# |
# Purpose: This makefile builds the final verilator simulation of the |
# zipsystem. Specifically, it builds the final C++ portion |
# of the simulator, and thus the final simulator executable. |
# |
# This simulator depends upon the ncurses library. |
# |
# |
# Creator: Dan Gisselquist, Ph.D. |
# Gisselquist Technology, LLC |
# |
################################################################################ |
# |
# Copyright (C) 2015, Gisselquist Technology, LLC |
# |
# This program is free software (firmware): you can redistribute it and/or |
# modify it under the terms of the GNU General Public License as published |
# by the Free Software Foundation, either version 3 of the License, or (at |
# your option) any later version. |
# |
# This program is distributed in the hope that it will be useful, but WITHOUT |
# ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or |
# FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License |
# for more details. |
# |
# License: GPL, v3, as defined and found on www.gnu.org, |
# http://www.gnu.org/licenses/gpl.html |
# |
# |
################################################################################ |
# |
all: zippy_tb |
|
CXX := g++ |
FLAGS := -Wall -Og -g |
ZASM := ../../sw/zasm |
INCS := -I../../rtl/obj_dir/ -I/usr/share/verilator/include -I../../sw/zasm |
SOURCES := zippy_tb.cpp memsim.cpp twoc.cpp $(ZASM)/zopcodes.cpp $(ZASM)/zparser.cpp |
RAWLIB := /usr/share/verilator/include/verilated.cpp ../../rtl/obj_dir/Vzipsystem__ALL.a |
LIBS := $(RAWLIB) -lncurses |
|
zippy_tb: $(SOURCES) $(RAWLIB) $(ZASM)/zopcodes.h $(ZASM)/zparser.h |
$(CXX) $(FLAGS) $(INCS) $(SOURCES) $(LIBS) -o $@ |
|
//////////////////////////////////////////////////////////////////////////////// |
// |
// Filename: memsim.h |
// |
// Project: Zip CPU -- a small, lightweight, RISC CPU core |
// |
// Purpose: This creates a memory like device to act on a WISHBONE bus. |
// It doesn't exercise the bus thoroughly, but does give some |
// exercise to the bus to see whether or not the bus master |
// can control it. |
// |
// |
// Creator: Dan Gisselquist, Ph.D. |
// Gisselquist Technology, LLC |
// |
//////////////////////////////////////////////////////////////////////////////// |
// |
// Copyright (C) 2015, Gisselquist Technology, LLC |
// |
// This program is free software (firmware): you can redistribute it and/or |
// modify it under the terms of the GNU General Public License as published |
// by the Free Software Foundation, either version 3 of the License, or (at |
// your option) any later version. |
// |
// This program is distributed in the hope that it will be useful, but WITHOUT |
// ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or |
// FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License |
// for more details. |
// |
// You should have received a copy of the GNU General Public License along |
// with this program. (It's in the $(ROOT)/doc directory, run make with no |
// target there if the PDF file isn't present.) If not, see |
// <http://www.gnu.org/licenses/> for a copy. |
// |
// License: GPL, v3, as defined and found on www.gnu.org, |
// http://www.gnu.org/licenses/gpl.html |
// |
// |
//////////////////////////////////////////////////////////////////////////////// |
#ifndef MEMSIM_H |
#define MEMSIM_H |
|
class MEMSIM { |
public: |
typedef unsigned int BUSW; |
typedef unsigned char uchar; |
|
BUSW *m_mem, m_len, m_mask; |
|
|
MEMSIM(const unsigned int nwords); |
~MEMSIM(void); |
void load(const char *fname); |
void apply(const uchar wb_cyc, const uchar wb_stb, const uchar wb_we, |
const BUSW wb_addr, const BUSW wb_data, |
uchar &o_ack, uchar &o_stall, BUSW &o_data); |
void operator()(const uchar wb_cyc, const uchar wb_stb, const uchar wb_we, |
const BUSW wb_addr, const BUSW wb_data, |
uchar &o_ack, uchar &o_stall, BUSW &o_data) { |
apply(wb_cyc, wb_stb, wb_we, wb_addr, wb_data, o_ack, o_stall, o_data); |
} |
BUSW &operator[](const BUSW addr) { return m_mem[addr&m_mask]; } |
}; |
|
#endif |
module memops(i_clk, i_rst, i_stb, |
i_op, i_addr, i_data, i_oreg, |
o_busy, o_valid, o_wreg, o_result, |
o_wb_cyc, o_wb_stb, o_wb_we, o_wb_addr, o_wb_data, |
i_wb_ack, i_wb_stall, i_wb_data); |
input i_clk, i_rst; |
input i_stb; |
// CPU interface |
input i_op; |
input [31:0] i_addr; |
input [31:0] i_data; |
input [4:0] i_oreg; |
// CPU outputs |
output wire o_busy; |
output reg o_valid; |
output reg [4:0] o_wreg; |
output reg [31:0] o_result; |
// Wishbone outputs |
output reg o_wb_cyc, o_wb_stb, o_wb_we; |
output reg [31:0] o_wb_addr, o_wb_data; |
// Wishbone inputs |
input i_wb_ack, i_wb_stall; |
input [31:0] i_wb_data; |
|
always @(posedge i_clk) |
if (i_rst) |
o_wb_cyc <= 1'b0; |
else if (o_wb_cyc) |
begin |
o_wb_stb <= (o_wb_stb)&&(i_wb_stall); |
o_wb_cyc <= (~i_wb_ack); |
end else if (i_stb) // New memory operation |
begin |
// Grab the wishbone |
o_wb_cyc <= 1'b1; |
o_wb_stb <= 1'b1; |
o_wb_we <= i_op; |
o_wb_data <= i_data; |
o_wb_addr <= i_addr; |
end |
|
initial o_valid = 1'b0; |
always @(posedge i_clk) |
o_valid <= (o_wb_cyc)&&(i_wb_ack)&&(~o_wb_we)&&(~i_rst); |
assign o_busy = o_wb_cyc; |
|
always @(posedge i_clk) |
if ((i_stb)&&(~o_wb_cyc)) |
o_wreg <= i_oreg; |
always @(posedge i_clk) |
if ((o_wb_cyc)&&(i_wb_ack)) |
o_result <= i_wb_data; |
endmodule |
//////////////////////////////////////////////////////////////////////////////// |
// |
// Filename: prefetch.v |
// |
// Project: Zip CPU -- a small, lightweight, RISC CPU soft core |
// |
// Purpose: This is a very simple instruction fetch approach. It gets |
// one instruction at a time. Future versions should pipeline |
// fetches and perhaps even cache results--this doesn't do that. |
// It should, however, be simple enough to get things running. |
// |
// The interface is fascinating. The 'i_pc' input wire is just |
// a suggestion of what to load. Other wires may be loaded |
// instead. i_pc is what must be output, not necessarily input. |
// |
// Creator: Dan Gisselquist, Ph.D. |
// Gisselquist Technology, LLC |
// |
//////////////////////////////////////////////////////////////////////////////// |
// |
// Copyright (C) 2015, Gisselquist Technology, LLC |
// |
// This program is free software (firmware): you can redistribute it and/or |
// modify it under the terms of the GNU General Public License as published |
// by the Free Software Foundation, either version 3 of the License, or (at |
// your option) any later version. |
// |
// This program is distributed in the hope that it will be useful, but WITHOUT |
// ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or |
// FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License |
// for more details. |
// |
// License: GPL, v3, as defined and found on www.gnu.org, |
// http://www.gnu.org/licenses/gpl.html |
// |
// |
//////////////////////////////////////////////////////////////////////////////// |
// |
// Flash requires a minimum of 4 clocks per byte to read, so that would be |
// 4*(4 bytes/32-bit word) = 16 clocks per word read, and that's in pipeline |
// mode, which this prefetch does not support. In non-pipelined mode, the |
// flash will require (16+6+6)*2 = 56 clocks plus 16 clocks per word read, |
// or 72 clocks to fetch one instruction. |
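//
// The arithmetic in the note above can be checked with a few lines of C++.
// Every figure is taken from that note; the constant and function names are
// hypothetical. The note does not say what the individual 16, 6 and 6 terms
// stand for, so they are kept as given.

```cpp
#include <cassert>

const int CLOCKS_PER_BYTE = 4;
const int BYTES_PER_WORD  = 4;
const int CLOCKS_PER_WORD = CLOCKS_PER_BYTE * BYTES_PER_WORD;   // 16

// Fixed cost paid by each non-pipelined read, as given in the note.
const int SETUP_CLOCKS    = (16 + 6 + 6) * 2;                   // 56

// Clocks to fetch one instruction word when every fetch pays the fixed cost.
int flash_fetch_clocks(void) {
    return SETUP_CLOCKS + CLOCKS_PER_WORD;
}
```

// flash_fetch_clocks() gives the 72 clocks quoted above, of which only 16 are
// spent moving the instruction itself.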
module prefetch(i_clk, i_rst, i_ce, i_pc, i_aux, |
o_i, o_pc, o_aux, o_valid, |
o_wb_cyc, o_wb_stb, o_wb_we, o_wb_addr, o_wb_data, |
i_wb_ack, i_wb_stall, i_wb_data); |
parameter AW = 1; |
input i_clk, i_rst, i_ce; |
input [31:0] i_pc; |
input [(AW-1):0] i_aux; |
output reg [31:0] o_i; |
output reg [31:0] o_pc; |
output reg [(AW-1):0] o_aux; |
output wire o_valid; |
// Wishbone outputs |
output reg o_wb_cyc, o_wb_stb; |
output wire o_wb_we; |
output reg [31:0] o_wb_addr; |
output wire [31:0] o_wb_data; |
// And return inputs |
input i_wb_ack, i_wb_stall; |
input [31:0] i_wb_data; |
|
assign o_wb_we = 1'b0; |
assign o_wb_data = 32'h0000; |
|
// Let's build it simple and upgrade later: For each instruction |
// we do one bus cycle to get the instruction. Later we should |
// pipeline this, but for now let's just do one at a time. |
initial o_wb_cyc = 1'b0; |
initial o_wb_stb = 1'b0; |
initial o_wb_addr= 0; |
always @(posedge i_clk) |
if (i_rst) |
begin |
o_wb_cyc <= 1'b0; |
if (o_wb_cyc) |
o_wb_addr <= 0; |
end else if ((i_ce)&&(~o_wb_cyc)&&(o_wb_addr == i_pc)) |
begin // Single value cache check |
o_aux <= i_aux; |
// o_i was already set during the last bus cycle |
end else if ((i_ce)&&(~o_wb_cyc)) // Initiate a bus cycle |
begin |
o_wb_cyc <= 1'b1; |
o_wb_stb <= 1'b1; |
o_wb_addr <= i_pc; |
o_aux <= i_aux; |
end else if (o_wb_cyc) // Independent of ce |
begin |
if ((o_wb_cyc)&&(o_wb_stb)&&(~i_wb_stall)) |
o_wb_stb <= 1'b0; |
if (i_wb_ack) |
o_wb_cyc <= 1'b0; |
end |
|
always @(posedge i_clk) |
if ((o_wb_cyc)&&(i_wb_ack)) |
o_i <= i_wb_data; |
always @(posedge i_clk) |
if ((o_wb_cyc)&&(i_wb_ack)) |
o_pc <= o_wb_addr; |
|
assign o_valid = (i_pc == o_pc)&&(i_aux == o_aux)&&(~o_wb_cyc); |
|
endmodule |
//////////////////////////////////////////////////////////////////////////////// |
// |
// Filename: regset.v |
// |
// Project: Zip CPU -- a small, lightweight, RISC CPU soft core |
// |
// Purpose: |
// |
// Creator: Dan Gisselquist, Ph.D. |
// Gisselquist Technology, LLC |
// |
//////////////////////////////////////////////////////////////////////////////// |
// |
// Copyright (C) 2015, Gisselquist Technology, LLC |
// |
// This program is free software (firmware): you can redistribute it and/or |
// modify it under the terms of the GNU General Public License as published |
// by the Free Software Foundation, either version 3 of the License, or (at |
// your option) any later version. |
// |
// This program is distributed in the hope that it will be useful, but WITHOUT |
// ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or |
// FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License |
// for more details. |
// |
// License: GPL, v3, as defined and found on www.gnu.org, |
// http://www.gnu.org/licenses/gpl.html |
// |
// |
//////////////////////////////////////////////////////////////////////////////// |
// |
module pipefetch(i_clk, i_rst, i_new_pc, i_stall_n, i_pc, |
o_i, o_pc, o_v, |
o_wb_cyc, o_wb_stb, o_wb_we, o_wb_addr, o_wb_data, |
i_wb_ack, i_wb_stall, i_wb_data); |
parameter LGCACHELEN = 6, CACHELEN=(1<<LGCACHELEN), BUSW=32; |
input i_clk, i_rst, i_new_pc, i_stall_n; |
input [(BUSW-1):0] i_pc; |
output reg [(BUSW-1):0] o_i; |
output reg [(BUSW-1):0] o_pc; |
output wire o_v; |
// |
output reg o_wb_cyc, o_wb_stb; |
output wire o_wb_we; |
output reg [(BUSW-1):0] o_wb_addr; |
output wire [(BUSW-1):0] o_wb_data; |
// |
input i_wb_ack, i_wb_stall; |
input [(BUSW-1):0] i_wb_data; |
|
// Fixed bus outputs: we read from the bus only, never write. |
// Thus the output data is ... irrelevant and don't care. We set it |
// to zero just to set it to something. |
assign o_wb_we = 1'b0; |
assign o_wb_data = 0; |
|
reg [(BUSW-1):0] r_cache_base, r_cache_offset; |
reg [(LGCACHELEN):0] r_nvalid, r_acks_waiting; |
reg [(BUSW-1):0] cache[0:(CACHELEN-1)]; |
|
wire [(LGCACHELEN-1):0] c_cache_offset; |
assign c_cache_offset = r_cache_offset[(LGCACHELEN-1):0]; |
|
reg r_addr_set; |
reg [(BUSW-1):0] r_addr; |
|
wire [(BUSW-1):0] bus_nvalid; |
assign bus_nvalid = { {(BUSW-LGCACHELEN-1){1'b0}}, r_nvalid }; |
|
initial r_nvalid = 0; |
initial r_cache_base = 0; |
always @(posedge i_clk) |
begin |
if (i_rst) |
o_wb_cyc <= 1'b0; |
else if ((~o_wb_cyc)&&(i_new_pc)&&(r_nvalid != 0) |
&&(i_pc > r_cache_base) |
&&(i_pc < r_cache_base + bus_nvalid)) |
begin |
// The new instruction is in our cache, do nothing |
// with the bus here. |
end else if ((o_wb_cyc)&&(i_new_pc)&&(r_nvalid != 0) |
&&((i_pc < r_cache_base) |
||(i_pc >= r_cache_base + CACHELEN))) |
begin |
// We need to abandon our bus action to start over in |
// a new region, setting up a new cache. This may |
// happen mid cycle while waiting for a result. By |
// dropping o_wb_cyc, we state that we are no longer |
// interested in that result--whatever it might be. |
o_wb_cyc <= 1'b0; |
o_wb_stb <= 1'b0; |
end else if ((~o_wb_cyc)&&( |
((i_new_pc)&&((r_nvalid == 0) |
||(i_pc < r_cache_base) |
||(i_pc >= r_cache_base + CACHELEN))) |
||((r_addr_set)&&((r_addr < r_cache_base) |
||(r_addr >= r_cache_base + CACHELEN))) |
)) |
begin |
// Start a bus transaction |
o_wb_cyc <= 1'b1; |
o_wb_stb <= 1'b1; |
o_wb_addr <= (i_new_pc) ? i_pc : r_addr; |
r_acks_waiting <= 0; |
r_nvalid <= 0; |
r_cache_base <= (i_new_pc) ? i_pc : r_addr; |
r_cache_offset <= 0; |
end else if ((~o_wb_cyc)&&(r_addr_set) |
&&(r_addr >= r_cache_base |
+ (1<<(LGCACHELEN-2)) |
+ (1<<(LGCACHELEN-1)))) |
begin |
// If we're using the last quarter of the cache, then |
// let's start a bus transaction to extend the cache. |
o_wb_cyc <= 1'b1; |
o_wb_stb <= 1'b1; |
o_wb_addr <= r_cache_base + (1<<(LGCACHELEN)); |
r_acks_waiting <= 0; |
r_nvalid <= r_nvalid - (1<<(LGCACHELEN-2)); |
r_cache_base <= r_cache_base + (1<<(LGCACHELEN-2)); |
r_cache_offset <= r_cache_offset + (1<<(LGCACHELEN-2)); |
end else if (o_wb_cyc) |
begin |
// This handles everything ... but the case where |
// while reading we need to extend our cache. |
if ((o_wb_stb)&&(~i_wb_stall)) |
begin |
o_wb_addr <= o_wb_addr + 1; |
if (o_wb_addr - r_cache_base >= CACHELEN-1) |
o_wb_stb <= 1'b0; |
end |
|
if ((o_wb_stb)&&(~i_wb_stall)&&(~i_wb_ack)) |
r_acks_waiting <= r_acks_waiting |
+ ((i_wb_ack)? 0:1); |
else if ((i_wb_ack)&&((~o_wb_stb)||(i_wb_stall))) |
r_acks_waiting <= r_acks_waiting - 1; |
|
if (i_wb_ack) |
begin |
cache[r_nvalid[(LGCACHELEN-1):0]+c_cache_offset] <= i_wb_data; |
r_nvalid <= r_nvalid + 1; |
if ((r_acks_waiting == 1)&&(~o_wb_stb)) |
o_wb_cyc <= 1'b0; |
end |
end |
end |
|
initial r_addr_set = 1'b0; |
always @(posedge i_clk) |
if (i_rst) |
r_addr_set <= 1'b0; |
else if (i_new_pc) |
r_addr_set <= 1'b1; |
|
// Now, read from the cache |
wire w_cv; // Cache valid, address is in the cache |
reg r_cv; |
assign w_cv = ((r_nvalid != 0)&&(r_addr>=r_cache_base) |
&&(r_addr-r_cache_base < bus_nvalid)); |
always @(posedge i_clk) |
r_cv <= (~i_new_pc)&&(w_cv); |
assign o_v = (r_cv)&&(~i_new_pc); |
|
always @(posedge i_clk) |
if (i_new_pc) |
r_addr <= i_pc; |
else if ((i_stall_n)&&(w_cv)) |
r_addr <= r_addr + 1; |
|
wire [(LGCACHELEN-1):0] c_rdaddr, c_cache_base; |
assign c_cache_base = r_cache_base[(LGCACHELEN-1):0]; |
assign c_rdaddr = r_addr[(LGCACHELEN-1):0]-c_cache_base+c_cache_offset; |
always @(posedge i_clk) |
if (i_stall_n) |
o_i <= cache[c_rdaddr]; |
always @(posedge i_clk) |
if (i_stall_n) |
o_pc <= r_addr; |
|
|
endmodule |
/////////////////////////////////////////////////////////////////////////// |
// |
// Filename: cpuops.v |
// |
// Project: Zip CPU -- a small, lightweight, RISC CPU soft core |
// |
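// Purpose: The ALU for the Zip CPU. Given a four-bit opcode and two |
// 32-bit operands, it registers the result together with the |
// Z, C, N and V flags. |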
// |
// Creator: Dan Gisselquist, Ph.D. |
// Gisselquist Technology, LLC |
// |
/////////////////////////////////////////////////////////////////////////// |
// |
// Copyright (C) 2015, Gisselquist Technology, LLC |
// |
// This program is free software (firmware): you can redistribute it and/or |
// modify it under the terms of the GNU General Public License as published |
// by the Free Software Foundation, either version 3 of the License, or (at |
// your option) any later version. |
// |
// This program is distributed in the hope that it will be useful, but WITHOUT |
// ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or |
// FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License |
// for more details. |
// |
// License: GPL, v3, as defined and found on www.gnu.org, |
// http://www.gnu.org/licenses/gpl.html |
// |
// |
/////////////////////////////////////////////////////////////////////////// |
// |
module cpuops(i_clk, i_rst, i_ce, i_valid, i_op, i_a, i_b, o_c, o_f, o_valid); |
input i_clk, i_rst, i_ce; |
input [3:0] i_op; |
input [31:0] i_a, i_b; |
input i_valid; |
output reg [31:0] o_c; |
output wire [3:0] o_f; |
output reg o_valid; |
|
wire [63:0] w_rol_tmp; |
assign w_rol_tmp = { i_a, i_a } << i_b[4:0]; |
wire [31:0] w_rol_result; |
assign w_rol_result = w_rol_tmp[63:32]; // Won't set flags |
|
wire z, n, v; |
reg c, pre_sign, set_ovfl; |
always @(posedge i_clk) |
if (i_ce) |
set_ovfl <=((((i_op==4'h0)||(i_op==4'h8)) // SUB&CMP |
&&(i_a[31] != i_b[31])) |
||((i_op==4'ha)&&(i_a[31] == i_b[31])) // ADD |
||(i_op == 4'hd) // LSL |
||(i_op == 4'hf)); // LSR |
always @(posedge i_clk) |
if (i_ce) |
begin |
pre_sign <= (i_a[31]); |
c <= 1'b0; |
case(i_op) |
4'h0: { c, o_c } <= {(i_b>i_a),i_a - i_b};// CMP (SUB) |
4'h1: o_c <= i_a & i_b; // BTST (And) |
4'h2: o_c <= i_b; // MOV |
// 4'h3: o_c <= { i_b[15:0],i_a[15:6],6'h20};//TRAP |
// 4'h4: o_c <= i_a[15:0] * i_b[15:0]; |
4'h5: o_c <= w_rol_result; // ROL |
4'h6: o_c <= { i_a[31:16], i_b[15:0] }; // LODILO |
4'h7: o_c <= { i_b[15:0], i_a[15:0] }; // LODIHI |
4'h8: { c, o_c } <= {(i_b>i_a), i_a - i_b }; // Sub |
4'h9: o_c <= i_a & i_b; // And |
4'ha: { c, o_c } <= i_a + i_b; // Add |
4'hb: o_c <= i_a | i_b; // Or |
4'hc: o_c <= i_a ^ i_b; // Xor |
4'hd: { c, o_c } <= {1'b0, i_a } << i_b[4:0]; // LSL |
4'he: { c, o_c } <= { i_a[31],i_a}>> (i_b[4:0]);// ASR |
4'hf: { c, o_c } <= { 1'b0, i_a } >> (i_b[4:0]);// LSR |
default: o_c <= i_b; // MOV, LDI |
endcase |
end |
|
assign z = (o_c == 32'h0000); |
assign n = (o_c[31]); |
assign v = (set_ovfl)&&(pre_sign != o_c[31]); |
|
assign o_f = { v, n, c, z }; |
|
initial o_valid = 1'b0; |
always @(posedge i_clk) |
if (i_rst) |
o_valid <= 1'b0; |
else if (i_ce) |
o_valid <= i_valid; |
else |
o_valid <= 1'b0; |
endmodule |
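//
// A minimal, self-checking testbench sketch for the cpuops module above.
// It is not part of the original sources: the module name cpuops_tb, the
// apply and check tasks, and the chosen test vectors are assumptions made
// for illustration only. Each check applies one operation for a single
// clock with i_ce high, then compares o_c and o_f = { V, N, C, Z }.
//
module cpuops_tb;
	reg		clk, rst, ce, valid;
	reg	[3:0]	op;
	reg	[31:0]	a, b;
	wire	[31:0]	c;
	wire	[3:0]	f;
	wire		c_valid;
	integer		errors;

	cpuops	dut(clk, rst, ce, valid, op, a, b, c, f, c_valid);

	always	#5 clk = ~clk;

	// Present one operation on a falling edge, let the next rising
	// edge register it, then drop i_ce so the outputs hold.
	task apply;
		input	[3:0]	t_op;
		input	[31:0]	t_a, t_b;
		begin
			@(negedge clk);
			op = t_op; a = t_a; b = t_b;
			ce = 1'b1; valid = 1'b1;
			@(negedge clk);
			ce = 1'b0; valid = 1'b0;
		end
	endtask

	// Compare the registered result and flags against expected values
	task check;
		input	[31:0]	e_c;
		input	[3:0]	e_f;
		begin
			if ((c !== e_c)||(f !== e_f))
			begin
				errors = errors + 1;
				$display("MISMATCH: c=%h f=%b, expected c=%h f=%b",
					c, f, e_c, e_f);
			end
		end
	endtask

	initial begin
		clk = 1'b0;
		errors = 0;
		rst = 1'b1; ce = 1'b0; valid = 1'b0;
		op = 4'h0; a = 32'h0; b = 32'h0;
		@(negedge clk);
		rst = 1'b0;

		// ADD: 0x7fffffff + 1 wraps negative, so N and V are set
		apply(4'ha, 32'h7fffffff, 32'h00000001);
		check(32'h80000000, 4'b1100);

		// SUB: 5 - 5 = 0, so only Z is set
		apply(4'h8, 32'h00000005, 32'h00000005);
		check(32'h00000000, 4'b0001);

		// ROL: rotating 0x80000001 left by one gives 0x00000003
		apply(4'h5, 32'h80000001, 32'h00000001);
		check(32'h00000003, 4'b0000);

		if (errors == 0)
			$display("cpuops_tb: all checks passed");
		$finish;
	end
endmodule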
/////////////////////////////////////////////////////////////////////////////// |
// |
// Filename: zipcpu.v |
// |
// Project: Zip CPU -- a small, lightweight, RISC CPU soft core |
// |
// Purpose: This is the top level module holding the core of the Zip CPU |
// together. The Zip CPU is designed to be as simple as possible. |
// The instruction set is about as RISC as you can get: only 16 |
// instruction types are supported (one of which isn't yet |
// implemented ...). Please see the accompanying iset.html file |
// for a description of these instructions. |
// |
// All instructions are 32-bits wide. All bus accesses, both |
// address and data, are 32-bits over a wishbone bus. |
// |
// The Zip CPU is fully pipelined with the following pipeline stages: |
// |
// 1. Prefetch, returns the instruction from memory. On the |
// Basys board that I'm working on, one instruction may be |
// issued every 20 clocks or so, unless and until I implement a |
// cache or local memory. |
// |
// 2. Instruction Decode |
// |
// 3. Read Operands |
// |
// 4. Apply Instruction |
// |
// 5. Write-back Results |
// |
// A lot of difficult work has been placed into the pipeline stall |
// handling. My original proposal was not to allow pipeline stalls at all. |
// The idea would be that the CPU would just run every clock and whatever |
// stalled answer took place would just get fixed a clock or two later, |
// meaning that the compiler could just schedule everything out. |
// This idea died at the memory interface, which can take a variable |
// amount of time to read or write any value, thus the whole CPU needed |
// to stall on a stalled memory access. |
// |
// My next idea was to just let things complete. I.e., once an instruction |
// starts, it continues to completion no matter what and we go on. This |
// failed at writing the PC. If the PC gets written by something such as |
// a MOV PC,PC+5 instruction, the write lands 3 (or however long the |
// pipeline is) clocks later. If what happens in those clocks depends |
// upon how far the instruction fetch has filled the pipeline, then the |
// CPU has non-deterministic behavior. |
// |
// This leads to two possibilities: either *everything* stalls upon a |
// stall condition, or partial results need to be destroyed before |
// they are written. This is made more difficult by the fact that |
// once a command is written to the memory unit, whether it be a |
// read or a write, there is no undoing it--since peripherals on the |
// bus may act upon the answer with whatever side effects they might |
// have. (For example, writing a '1' to the interrupt register will |
// clear certain interrupts ...) Further, since the memory ops depend |
// upon conditions, we'll need to wait for the condition codes to |
// be available before executing a memory op. Thus, memory ops can |
// proceed without stalling whenever either the previous instruction |
// doesn't write the flags register, or when the memory instruction doesn't |
// depend upon the flags register. |
// |
// The other possibility is that we leave independent instruction |
// execution behind, so that the pipeline is always full and stalls, |
// or moves forward, together on every clock. |
// |
// For now, we pick the first approach: independent instruction execution. |
// Thus, if stage 2 stalls, stages 3-5 may still complete the instructions |
// in their pipeline. This leaves another problem: what happens on a |
// MOV -1+PC,PC instruction? There will be four instructions behind this |
// one (or is it five?) that will need to be 'cancelled'. So here's |
// the plan: Anything can be cancelled before the ALU/MEM stage, |
// since memory ops cannot be cancelled after being issued. Thus, the |
// ALU/MEM stage must stall if any prior instruction is going to write |
// the PC register (i.e. JMP). |
// |
// Further, let's define a "STALL" as a reason to not execute a stage |
// due to some condition at or beyond the stage, and let's define |
// a VALID flag to mean that this stage has completed. Thus, the clock |
// enable for a stage is (STG[n-1]VALID)&&((~STG[n]VALID)||(~STG[n]STALL)). |
// The ALU/MEM stages will also depend upon a master clock enable |
// (~SLEEP) condition as well. |
// |
// |
// |
// Creator: Dan Gisselquist, Ph.D. |
// Gisselquist Technology, LLC |
// |
/////////////////////////////////////////////////////////////////////////////// |
// |
// Copyright (C) 2015, Gisselquist Technology, LLC |
// |
// This program is free software (firmware): you can redistribute it and/or |
// modify it under the terms of the GNU General Public License as published |
// by the Free Software Foundation, either version 3 of the License, or (at |
// your option) any later version. |
// |
// This program is distributed in the hope that it will be useful, but WITHOUT |
// ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or |
// FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License |
// for more details. |
// |
// License: GPL, v3, as defined and found on www.gnu.org, |
// http://www.gnu.org/licenses/gpl.html |
// |
// |
/////////////////////////////////////////////////////////////////////////////// |
// |
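//
// A hedged sketch of the clock-enable rule described in the header above,
// (STG[n-1]VALID)&&((~STG[n]VALID)||(~STG[n]STALL)), applied to a single
// generic stage. The module name stage_sketch and all of its ports are
// hypothetical and appear nowhere else in this design; zipcpu applies a
// similar pattern inline to dcdvalid and opvalid.
//
module stage_sketch(i_clk, i_rst, i_prev_valid, i_stall, i_data,
		o_ce, o_valid, o_data);
	input			i_clk, i_rst;
	input			i_prev_valid;	// STG[n-1]VALID
	input			i_stall;	// STG[n]STALL
	input		[31:0]	i_data;
	output	wire		o_ce;
	output	reg		o_valid;	// STG[n]VALID
	output	reg	[31:0]	o_data;

	// Accept new work when the prior stage has something for us and
	// we are either empty or free to hand our contents downstream.
	assign	o_ce = (i_prev_valid)&&((~o_valid)||(~i_stall));

	initial	o_valid = 1'b0;
	always @(posedge i_clk)
		if (i_rst)
			o_valid <= 1'b0;
		else if (o_ce)
			o_valid <= 1'b1;
		else if (~i_stall)
			// Nothing new arrived and our contents moved on
			o_valid <= 1'b0;

	always @(posedge i_clk)
		if (o_ce)
			o_data <= i_data;
endmodule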
`define CPU_PC_REG 4'hf |
`define CPU_CC_REG 4'he |
`define CPU_BREAK_BIT 7 |
`define CPU_STEP_BIT 6 |
`define CPU_GIE_BIT 5 |
`define CPU_SLEEP_BIT 4 |
module zipcpu(i_clk, i_rst, i_interrupt, |
// Debug interface |
i_halt, i_dbg_reg, i_dbg_we, i_dbg_data, |
o_dbg_stall, o_dbg_reg, |
o_break, |
// CPU interface to the wishbone bus |
o_wb_cyc, o_wb_stb, o_wb_we, o_wb_addr, o_wb_data, |
i_wb_ack, i_wb_stall, i_wb_data, |
// Accounting/CPU usage interface |
o_mem_stall, o_pf_stall, o_alu_stall); |
parameter RESET_ADDRESS=32'h0100000; |
input i_clk, i_rst, i_interrupt; |
// Debug interface -- inputs |
input i_halt; |
input [4:0] i_dbg_reg; |
input i_dbg_we; |
input [31:0] i_dbg_data; |
// Debug interface -- outputs |
output reg o_dbg_stall; |
output reg [31:0] o_dbg_reg; |
output wire o_break; |
// Wishbone interface -- outputs |
output wire o_wb_cyc, o_wb_stb, o_wb_we; |
output wire [31:0] o_wb_addr, o_wb_data; |
// Wishbone interface -- inputs |
input i_wb_ack, i_wb_stall; |
input [31:0] i_wb_data; |
// Accounting outputs ... to help us count stalls and usage |
output wire o_mem_stall; |
output wire o_pf_stall; |
output wire o_alu_stall; |
|
|
// Registers |
reg [31:0] regset [0:31]; |
reg [3:0] flags, iflags; // (BREAKEN,STEP,GIE,SLEEP ), V, N, C, Z |
wire master_ce; |
wire [7:0] w_uflags, w_iflags; |
reg step, gie, sleep, break_en; |
|
wire [4:0] mem_wreg; |
wire mem_busy, mem_rdbusy; |
|
reg [31:0] pf_pc; |
reg new_pc; |
|
// |
// |
// PIPELINE STAGE #1 :: Prefetch |
// Variable declarations |
// |
wire pf_ce, dcd_stalled; |
wire pf_cyc, pf_stb, pf_we, pf_busy, pf_ack, pf_stall; |
wire [31:0] pf_addr, pf_data; |
wire [31:0] instruction, instruction_pc; |
wire pf_valid, instruction_gie; |
|
// |
// |
// PIPELINE STAGE #2 :: Instruction Decode |
// Variable declarations |
// |
// |
reg opvalid, op_wr_pc, op_break; |
wire op_stall, dcd_ce; |
reg [3:0] dcdOp; |
reg [4:0] dcdA, dcdB; |
reg [3:0] dcdF; |
reg dcdA_rd, dcdA_wr, dcdB_rd, dcdvalid, |
dcdM, dcdF_wr, dcd_gie, dcd_break; |
reg [31:0] dcd_pc; |
reg [23:0] r_dcdI; |
wire dcdA_stall, dcdB_stall, dcdF_stall; |
|
|
|
// |
// |
// PIPELINE STAGE #3 :: Read Operands |
// Variable declarations |
// |
// |
// |
// Now, let's read our operands |
reg [4:0] alu_reg; |
reg [3:0] opn; |
reg [4:0] opR; |
reg [1:0] opA_cc, opB_cc; |
reg [31:0] r_opA, r_opB, op_pc; |
wire [31:0] opA_nowait, opB_nowait, opA, opB; |
reg opR_wr, opM, opF_wr, op_gie, |
opA_rd, opB_rd; |
wire [7:0] opFl; |
`ifdef NEWCODE |
reg [6:0] r_opF; |
wire [8:0] opF; |
`else |
reg [8:0] opF; // Assigned within a clocked always block |
`endif |
wire op_ce; |
|
|
|
// |
// |
// PIPELINE STAGE #4 :: ALU / Memory |
// Variable declarations |
// |
// |
reg [31:0] alu_pc; |
reg alu_pc_valid; |
wire alu_ce, alu_stall; |
wire [31:0] alu_result; |
wire [3:0] alu_flags; |
wire alu_valid; |
wire set_cond; |
reg alu_wr, alF_wr, alu_gie; |
|
|
|
wire mem_ce, mem_stalled; |
wire mem_valid, mem_ack, mem_stall, |
mem_cyc, mem_stb, mem_we; |
wire [31:0] mem_addr, mem_data, mem_result; |
|
|
|
// |
// |
// PIPELINE STAGE #5 :: Write-back |
// Variable declarations |
// |
wire wr_reg_ce, wr_flags_ce, wr_write_pc; |
wire [4:0] wr_reg_id; |
wire [31:0] wr_reg_vl; |
wire w_switch_to_interrupt, w_release_from_interrupt; |
reg [31:0] upc, ipc; |
|
|
|
// |
// MASTER: clock enable. |
// |
assign master_ce = (~i_halt)&&(~o_break)&&(~sleep)&&(~mem_rdbusy); |
|
|
// |
// PIPELINE STAGE #1 :: Prefetch |
// Calculate stall conditions |
assign pf_ce = (~dcd_stalled); |
|
// |
// PIPELINE STAGE #2 :: Instruction Decode |
// Calculate stall conditions |
assign dcd_ce = (pf_valid)&&(~dcd_stalled); |
assign dcd_stalled = (dcdvalid)&&( |
(op_stall) |
||((dcdA_stall)||(dcdB_stall)||(dcdF_stall)) |
||((opvalid)&&(op_wr_pc))); |
// |
// PIPELINE STAGE #3 :: Read Operands |
// Calculate stall conditions |
assign op_stall = (opvalid)&&( |
((mem_stalled)&&(opM)) |
||((alu_stall)&&(~opM))); |
assign op_ce = (dcdvalid)&&((~opvalid)||(~op_stall)); |
|
// |
// PIPELINE STAGE #4 :: ALU / Memory |
// Calculate stall conditions |
assign alu_stall = (((~master_ce)||(mem_rdbusy))&&(opvalid)&&(~opM)) |
||((opvalid)&&(wr_reg_ce)&&(wr_reg_id == { op_gie, `CPU_PC_REG })); |
assign alu_ce = (master_ce)&&(opvalid)&&(~opM)&&(~alu_stall)&&(~new_pc); |
// |
assign mem_ce = (master_ce)&&(opvalid)&&(opM)&&(~mem_stalled)&&(~new_pc)&&(set_cond); |
assign mem_stalled = (mem_busy)||((opvalid)&&(opM)&&( |
(~master_ce) |
// Stall waiting for flags to be valid |
||((~opF[8])&&( |
((wr_reg_ce)&&(wr_reg_id[4:0] == {op_gie,`CPU_CC_REG})))) |
// Do I need this last condition? |
//||((wr_flags_ce)&&(alu_gie==op_gie)))) |
// Or waiting for a write to the PC register |
||((wr_reg_ce)&&(wr_reg_id[4] == op_gie)&&(wr_write_pc)))); |
|
|
// |
// |
// PIPELINE STAGE #1 :: Prefetch |
// |
// |
`ifdef SINGLE_FETCH |
prefetch pf(i_clk, i_rst, (pf_ce), pf_pc, gie, |
instruction, instruction_pc, instruction_gie, |
pf_valid, |
pf_cyc, pf_stb, pf_we, pf_addr, |
pf_data, |
pf_ack, pf_stall, i_wb_data); |
`else // Pipe fetch |
pipefetch pf(i_clk, i_rst, new_pc, ~dcd_stalled, pf_pc, |
instruction, instruction_pc, pf_valid, |
pf_cyc, pf_stb, pf_we, pf_addr, pf_data, |
pf_ack, pf_stall, i_wb_data); |
assign instruction_gie = gie; |
`endif |
|
always @(posedge i_clk) |
if (i_rst) |
dcdvalid <= 1'b0; |
else if (dcd_ce) |
dcdvalid <= (~new_pc); |
else if ((~dcd_stalled)||(new_pc)) |
dcdvalid <= 1'b0; |
|
always @(posedge i_clk) |
if (dcd_ce) |
begin |
dcd_pc <= instruction_pc+1; |
|
// Record what operation we are doing |
dcdOp <= instruction[31:28]; |
|
// Default values |
dcdA[4:0] <= { instruction_gie, instruction[27:24] }; |
dcdB[4:0] <= { instruction_gie, instruction[19:16] }; |
dcdM <= 1'b0; |
dcdF_wr <= 1'b1; |
dcd_break <= 1'b0; |
|
// Set the condition under which we do this operation. |
// dcdF[3] marks the instruction as unconditional; dcdF[2:0] |
// is the condition code, which the read-operands stage |
// expands into a flag mask and value (opF). |
dcdF <= { (instruction[23:21]==3'h0), instruction[23:21] }; |
casez(instruction[31:28]) |
4'h2: begin // Move instruction |
if (~instruction_gie) |
begin |
dcdA[4] <= instruction[20]; |
dcdB[4] <= instruction[15]; |
end |
dcdA_wr <= 1'b1; |
dcdA_rd <= 1'b0; |
dcdB_rd <= 1'b1; |
r_dcdI <= { {(9){instruction[14]}}, instruction[14:0] }; |
dcdF_wr <= 1'b0; // Don't write flags |
end |
4'h3: begin // Load immediate |
dcdA_wr <= 1'b1; |
dcdA_rd <= 1'b0; |
dcdB_rd <= 1'b0; |
r_dcdI <= { instruction[23:0] }; |
dcdF_wr <= 1'b0; // Don't write flags |
dcdF <= 4'h8; // This is unconditional |
dcdOp <= 4'h2; |
end |
4'h4: begin // Load immediate special |
dcdF_wr <= 1'b0; // Don't write flags |
r_dcdI <= { 8'h00, instruction[15:0] }; |
if (instruction[27:24] == 4'he) |
begin |
// NOOP instruction |
dcdA_wr <= 1'b0; |
dcdA_rd <= 1'b0; |
dcdB_rd <= 1'b0; |
dcdOp <= 4'h2; |
dcd_break <= 1'b1;//Could be a break ins |
end else if (instruction[27:24] == 4'hf) |
begin // Load partial immediate(s) |
dcdA_wr <= 1'b1; |
dcdA_rd <= 1'b1; |
dcdB_rd <= 1'b0; |
dcdA[4:0] <= { instruction_gie, instruction[19:16] }; |
dcdOp <= { 3'h3, instruction[20] }; |
end else begin |
; // Multiply instruction place holder |
end end |
4'b011?: begin // Load/Store |
dcdF_wr <= 1'b0; // Don't write flags |
dcdA_wr <= (~instruction[28]); // Write on loads |
dcdA_rd <= (instruction[28]); // Read on stores |
dcdB_rd <= instruction[20]; |
if (instruction[20]) |
r_dcdI <= { {(8){instruction[15]}}, instruction[15:0] }; |
else |
r_dcdI <= { {(4){instruction[19]}}, instruction[19:0] }; |
dcdM <= 1'b1; // Memory operation |
end |
default: begin |
dcdA <= { instruction_gie, instruction[27:24] }; |
dcdB <= { instruction_gie, instruction[19:16] }; |
dcdA_wr <= (instruction[31])||(instruction[31:28]==4'h5); |
dcdA_rd <= 1'b1; |
dcdB_rd <= instruction[20]; |
if (instruction[20]) |
r_dcdI <= { {(8){instruction[15]}}, instruction[15:0] }; |
else |
r_dcdI <= { {(4){instruction[19]}}, instruction[19:0] }; |
end |
endcase |
|
|
dcd_gie <= instruction_gie; |
end |
|
|
// |
// |
// PIPELINE STAGE #3 :: Read Operands (Registers) |
// |
// |
|
always @(posedge i_clk) |
if (op_ce) // &&(dcdvalid)) |
begin |
if ((wr_reg_ce)&&(wr_reg_id == dcdA)) |
r_opA <= wr_reg_vl; |
else if (dcdA == { dcd_gie, `CPU_PC_REG }) |
r_opA <= dcd_pc; |
else if (dcdA[3:0] == `CPU_PC_REG) |
r_opA <= (dcdA[4])?upc:ipc; |
else |
r_opA <= regset[dcdA]; |
end |
wire [31:0] dcdI; |
assign dcdI = { {(8){r_dcdI[23]}}, r_dcdI }; |
always @(posedge i_clk) |
if (op_ce) // &&(dcdvalid)) |
begin |
if (~dcdB_rd) |
r_opB <= dcdI; |
else if ((wr_reg_ce)&&(wr_reg_id == dcdB)) |
r_opB <= wr_reg_vl + dcdI; |
else if (dcdB == { dcd_gie, `CPU_PC_REG }) |
r_opB <= dcd_pc + dcdI; |
else if (dcdB[3:0] == `CPU_PC_REG) |
r_opB <= ((dcdB[4])?upc:ipc) + dcdI; |
else |
r_opB <= regset[dcdB] + dcdI; |
end |
|
// The logic here has become more complex than it should be, no thanks |
// to Xilinx's Vivado trying to help. The conditions are supposed to |
// be two sets of four bits: the top bits specify what bits matter, the |
// bottom specify what those top bits must equal. However, two of the |
// conditions check whether bits are on, and those are the only two |
// conditions checking those bits. Therefore, Vivado complains that |
// these two bits are redundant. Hence the convoluted expression |
// below, arriving at what we finally want in the (now wire net) |
// opF. |
`ifdef NEWCODE |
always @(posedge i_clk) |
if (op_ce) |
begin // Set the flag condition codes |
case(dcdF[2:0]) |
3'h0: r_opF <= 7'h80; // Always |
3'h1: r_opF <= 7'h11; // Z |
3'h2: r_opF <= 7'h10; // NE |
3'h3: r_opF <= 7'h20; // GE (!N) |
3'h4: r_opF <= 7'h30; // GT (!N&!Z) |
3'h5: r_opF <= 7'h24; // LT |
3'h6: r_opF <= 7'h02; // C |
3'h7: r_opF <= 7'h08; // V |
endcase |
end |
assign opF = { r_opF[6], r_opF[3], r_opF[5], r_opF[1], r_opF[4:0] }; |
`else |
always @(posedge i_clk) |
if (op_ce) |
begin // Set the flag condition codes |
case(dcdF[2:0]) |
3'h0: opF <= 9'h100; // Always |
3'h1: opF <= 9'h011; // Z |
3'h2: opF <= 9'h010; // NE |
3'h3: opF <= 9'h040; // GE (!N) |
3'h4: opF <= 9'h050; // GT (!N&!Z) |
3'h5: opF <= 9'h044; // LT |
3'h6: opF <= 9'h022; // C |
3'h7: opF <= 9'h088; // V |
endcase |
end |
`endif |
|
always @(posedge i_clk) |
if (i_rst) |
opvalid <= 1'b0; |
else if (op_ce) |
// Do we have a valid instruction? |
// The decoder may vote to stall one of its |
// instructions based upon something we currently |
// have in our queue. This instruction must then |
// move forward, and get a stall cycle inserted. |
// Hence, the test on dcd_stalled here. If we must |
// wait until our operands are valid, then we aren't |
// valid yet until then. |
opvalid<= (~new_pc)&&(dcdvalid)&&(~dcd_stalled); |
else if ((~op_stall)||(new_pc)) |
opvalid <= 1'b0; |
|
// Here's part of our debug interface. When we recognize a break |
// instruction, we set the op_break flag. That'll prevent this |
// instruction from entering the ALU, and cause an interrupt before |
// this instruction. Thus, returning to this code will cause the |
// break to repeat and continue upon return. To get out of this |
// condition, replace the break instruction with what it is supposed |
// to be, step through it, and then replace it back. In this fashion, |
// a debugger can step through code. |
always @(posedge i_clk) |
if (i_rst) |
op_break <= 1'b0; |
else if (op_ce) |
op_break <= (dcd_break)&&(r_dcdI[15:0] == 16'h0001); |
else if ((~op_stall)||(new_pc)) |
op_break <= 1'b0; |
|
always @(posedge i_clk) |
if (op_ce) |
begin |
opn <= dcdOp; // Which ALU operation? |
opM <= dcdM; // Is this a memory operation? |
// Will we write the flags/CC Register with our result? |
opF_wr <= dcdF_wr; |
// Will we be writing our results into a register? |
opR_wr <= dcdA_wr; |
// What register will these results be written into? |
opR <= dcdA; |
// User level (1), vs supervisor (0)/interrupts disabled |
op_gie <= dcd_gie; |
|
// We're not done with these yet--we still need them |
// for the unclocked assign. We need the unclocked |
// assign so that there's no wait state between an |
// ALU or memory result and the next register that may |
// use that value. |
opA_cc <= {dcdA[4], (dcdA[3:0] == `CPU_CC_REG) }; |
opA_rd <= dcdA_rd; |
opB_cc <= {dcdB[4], (dcdB[3:0] == `CPU_CC_REG) }; |
opB_rd <= dcdB_rd; |
op_pc <= dcd_pc; |
// |
op_wr_pc <= ((dcdA_wr)&&(dcdA[3:0] == `CPU_PC_REG)); |
end |
assign opFl = (op_gie)?(w_uflags):(w_iflags); |
|
// This is tricky. First, the PC and Flags registers aren't kept in the |
// register set but in special registers of their own. So step one |
// is to select the right register. Step two is to replace that |
// register with the results of an ALU or memory operation, if such |
// results are now available. Otherwise, we'd need to insert a wait |
// state of some type. |
// |
// The alternative approach would be to define some sort of |
// op_stall wire, which would stall any upstream stage. |
// We'll create a flag here to start our coordination. Once we |
// define this flag to something other than just plain zero, then |
// the stalls will already be in place. |
assign dcdA_stall = (dcdvalid)&&(dcdA_rd)&& |
(((opvalid)&&(opR_wr)&&(opR == dcdA)) |
||((mem_busy)&&(~mem_we)&&(mem_wreg == dcdA)) |
||((mem_valid)&&(mem_wreg == dcdA))); |
assign dcdB_stall = (dcdvalid)&&(dcdB_rd) |
&&(((opvalid)&&(opR_wr)&&(opR == dcdB)) |
||((mem_busy)&&(~mem_we)&&(mem_wreg == dcdB)) |
||((mem_valid)&&(mem_wreg == dcdB))); |
assign dcdF_stall = (dcdvalid)&&(((dcdF[3]) |
||(dcdA[3:0]==`CPU_CC_REG) |
||(dcdB[3:0]==`CPU_CC_REG)) |
&&((opvalid)&&(opR[3:0] == `CPU_CC_REG)) |
||((dcdF[3])&&(dcdM)&&(opvalid)&&(opF_wr))); |
assign opA = { r_opA[31:8], ((opA_cc[0]) ? |
((opA_cc[1])?w_uflags:w_iflags) : r_opA[7:0]) }; |
assign opB = { r_opB[31:8], ((opB_cc[0]) ? |
((opB_cc[1])?w_uflags:w_iflags) : r_opB[7:0]) }; |
|
// |
// |
// PIPELINE STAGE #4 :: Apply Instruction |
// |
// |
cpuops doalu(i_clk, i_rst, alu_ce, |
(opvalid)&&(~opM), opn, opA, opB, |
alu_result, alu_flags, alu_valid); |
|
assign set_cond = ((opF[7:4]&opFl[3:0])==opF[3:0]); |
initial alF_wr = 1'b0; |
initial alu_wr = 1'b0; |
always @(posedge i_clk) |
if (i_rst) |
begin |
alu_wr <= 1'b0; |
alF_wr <= 1'b0; |
end else if (alu_ce) |
begin |
alu_reg <= opR; |
alu_wr <= (opR_wr)&&(set_cond); |
alF_wr <= (opF_wr)&&(set_cond); |
end else begin |
// These are strobe signals, so clear them if not |
// set for any particular clock |
alu_wr <= 1'b0; |
alF_wr <= 1'b0; |
end |
always @(posedge i_clk) |
if ((alu_ce)||(mem_ce)) |
alu_gie <= op_gie; |
always @(posedge i_clk) |
if ((alu_ce)||(mem_ce)) |
alu_pc <= op_pc; |
initial alu_pc_valid = 1'b0; |
always @(posedge i_clk) |
alu_pc_valid <= (~i_rst)&&(master_ce)&&(opvalid)&&(~new_pc) |
&&((~opM) |
||(~mem_stalled)); |
|
memops domem(i_clk, i_rst, mem_ce, |
(opn[0]), opB, opA, opR, |
mem_busy, mem_valid, mem_wreg, mem_result, |
mem_cyc, mem_stb, mem_we, mem_addr, mem_data, |
mem_ack, mem_stall, i_wb_data); |
assign mem_rdbusy = ((mem_cyc)&&(~mem_we)); |
|
// Either the prefetch or the instruction gets the memory bus, but |
// never both. |
wbarbiter #(32,32) pformem(i_clk, i_rst, |
// Prefetch access to the arbiter |
pf_addr, pf_data, pf_we, pf_stb, pf_cyc, pf_ack, pf_stall, |
// Memory access to the arbiter |
mem_addr, mem_data, mem_we, mem_stb, mem_cyc, mem_ack, mem_stall, |
// Common wires, in and out, of the arbiter |
o_wb_addr, o_wb_data, o_wb_we, o_wb_stb, o_wb_cyc, i_wb_ack, |
i_wb_stall); |
|
// |
// |
// PIPELINE STAGE #5 :: Write-back results |
// |
// |
// This stage is not allowed to stall. If results are ready to be |
// written back, they are written back at all cost. A sleeping CPU |
// won't prevent write back, nor will debug modes, halting the CPU, or |
// anything else. Indeed, the (master_ce) bit is only as relevant |
// as knowing something is available for writeback. |
|
// |
// Write back to our generic register set ... |
// When shall we write back? On one of two conditions |
// Note that the flags needed to be checked before issuing the |
// bus instruction, so they don't need to be checked here. |
// Further, alu_wr includes (set_cond), so we don't need to |
// check for that here either. |
assign wr_reg_ce = ((alu_wr)&&(alu_valid))||(mem_valid); |
// Which register shall be written? |
assign wr_reg_id = (alu_wr)?alu_reg:mem_wreg; |
// Are we writing to the PC? |
assign wr_write_pc = (wr_reg_id[3:0] == `CPU_PC_REG); |
// What value to write? |
assign wr_reg_vl = (alu_wr)?alu_result:mem_result; |
always @(posedge i_clk) |
if (wr_reg_ce) |
regset[wr_reg_id] <= wr_reg_vl; |
|
// |
// Write back to the condition codes/flags register ... |
// When shall we write to our flags register? alF_wr already |
// includes the set condition ... |
assign wr_flags_ce = (alF_wr)&&(alu_valid); |
assign w_uflags = { 1'b0, step, 1'b1, sleep, ((wr_flags_ce)&&(alu_gie))?alu_flags:flags }; |
assign w_iflags = { break_en, 1'b0, 1'b0, sleep, ((wr_flags_ce)&&(~alu_gie))?alu_flags:iflags }; |
// What value to write? |
always @(posedge i_clk) |
// If explicitly writing the register itself |
if ((wr_reg_ce)&&(wr_reg_id[4:0] == { 1'b1, `CPU_CC_REG })) |
flags <= wr_reg_vl[3:0]; |
// Otherwise if we're setting the flags from an ALU operation |
else if ((wr_flags_ce)&&(alu_gie)) |
flags <= alu_flags; |
else if ((i_halt)&&(i_dbg_we) |
&&(i_dbg_reg == { 1'b1, `CPU_CC_REG })) |
flags <= i_dbg_data[3:0]; |
|
always @(posedge i_clk) |
if ((wr_reg_ce)&&(wr_reg_id[4:0] == { 1'b0, `CPU_CC_REG })) |
iflags <= wr_reg_vl[3:0]; |
else if ((wr_flags_ce)&&(~alu_gie)) |
iflags <= alu_flags; |
else if ((i_halt)&&(i_dbg_we) |
&&(i_dbg_reg == { 1'b0, `CPU_CC_REG })) |
iflags <= i_dbg_data[3:0]; |
|
// The 'break' enable bit. This bit can only be set from supervisor |
// mode. It controls what the CPU does upon encountering a break |
// instruction. |
// |
// The goal, upon encountering a break is that the CPU should stop and |
// not execute the break instruction, choosing instead to enter into |
// either interrupt mode or halt first. |
// if ((break_en) AND (break_instruction)) // user mode or not |
// HALT CPU |
// else if (break_instruction) // only in user mode |
// set an interrupt flag, go to supervisor mode |
// allow supervisor to step the CPU. |
// Upon a CPU halt, any break condition will be reset. The |
// external debugger will then need to deal with whatever |
// condition has taken place. |
initial break_en = 1'b0; |
always @(posedge i_clk) |
if ((i_rst)||(i_halt)) |
break_en <= 1'b0; |
else if ((wr_reg_ce)&&(wr_reg_id[4:0] == {1'b0, `CPU_CC_REG})) |
break_en <= wr_reg_vl[`CPU_BREAK_BIT]; |
assign o_break = (break_en)&&(op_break); |
|
|
// The sleep register. Setting the sleep register causes the CPU to |
// sleep until the next interrupt. Setting the sleep register within |
// interrupt mode causes the processor to halt until a reset. This is |
// a panic/fault halt. |
always @(posedge i_clk) |
if ((i_rst)||((i_interrupt)&&(gie))) |
sleep <= 1'b0; |
else if ((wr_reg_ce)&&(wr_reg_id[3:0] == `CPU_CC_REG)) |
sleep <= wr_reg_vl[`CPU_SLEEP_BIT]; |
else if ((i_halt)&&(i_dbg_we) |
&&(i_dbg_reg == { 1'b1, `CPU_CC_REG })) |
sleep <= i_dbg_data[`CPU_SLEEP_BIT]; |
|
always @(posedge i_clk) |
if ((i_rst)||(w_switch_to_interrupt)) |
step <= 1'b0; |
else if ((wr_reg_ce)&&(~alu_gie)&&(wr_reg_id[4:0] == {1'b1,`CPU_CC_REG})) |
step <= wr_reg_vl[`CPU_STEP_BIT]; |
else if ((i_halt)&&(i_dbg_we) |
&&(i_dbg_reg == { 1'b1, `CPU_CC_REG })) |
step <= i_dbg_data[`CPU_STEP_BIT]; |
else if ((master_ce)&&(alu_pc_valid)&&(step)&&(gie)) |
step <= 1'b0; |
|
// The GIE register. Only interrupts can disable the interrupt register |
assign w_switch_to_interrupt = (gie)&&( |
// On interrupt (obviously) |
(i_interrupt) |
// If we are stepping the CPU |
||((master_ce)&&(alu_pc_valid)&&(step)) |
// If we encounter a break instruction, and the break |
// enable isn't set. |
||((master_ce)&&(op_break)) |
// If we write to the CC register |
||((wr_reg_ce)&&(~wr_reg_vl[`CPU_GIE_BIT]) |
&&(wr_reg_id[4:0] == { 1'b1, `CPU_CC_REG })) |
// Or if, in debug mode, we write to the CC register |
||((i_halt)&&(i_dbg_we)&&(~i_dbg_data[`CPU_GIE_BIT]) |
&&(i_dbg_reg == { 1'b1, `CPU_CC_REG})) |
); |
assign w_release_from_interrupt = (~gie)&&(~i_interrupt) |
// Then if we write the CC register |
&&(((wr_reg_ce)&&(wr_reg_vl[`CPU_GIE_BIT]) |
&&(wr_reg_id[4:0] == { 1'b0, `CPU_CC_REG })) |
// Or if, in debug mode, we write the CC register |
||((i_halt)&&(i_dbg_we)&&(i_dbg_data[`CPU_GIE_BIT]) |
&&(i_dbg_reg == { 1'b0, `CPU_CC_REG})) |
); |
always @(posedge i_clk) |
if (i_rst) |
gie <= 1'b0; |
else if (w_switch_to_interrupt) |
gie <= 1'b0; |
else if (w_release_from_interrupt) |
gie <= 1'b1; |
|
// |
// Write backs to the PC register, and general increments of it |
// We support two: upc and ipc. If the instruction is normal, |
// we increment upc, if interrupt level we increment ipc. If |
// the instruction writes the PC, we write whichever PC is appropriate. |
// |
// Do we need to clear all our partial results from the pipeline? |
// What happens when the pipeline has gie and ~gie instructions within |
// it? Do we clear both? What if a gie instruction tries to clear |
// a non-gie instruction? |
always @(posedge i_clk) |
if (i_rst) |
upc <= RESET_ADDRESS; |
else if ((wr_reg_ce)&&(wr_reg_id[4])&&(wr_write_pc)) |
upc <= wr_reg_vl; |
else if ((alu_gie)&&(alu_pc_valid)) |
upc <= alu_pc; |
else if ((i_halt)&&(i_dbg_we) |
&&(i_dbg_reg == { 1'b1, `CPU_PC_REG })) |
upc <= i_dbg_data; |
|
always @(posedge i_clk) |
if (i_rst) |
ipc <= RESET_ADDRESS; |
else if ((wr_reg_ce)&&(~wr_reg_id[4])&&(wr_write_pc)) |
ipc <= wr_reg_vl; |
else if ((~alu_gie)&&(alu_pc_valid)) |
ipc <= alu_pc; |
else if ((i_halt)&&(i_dbg_we) |
&&(i_dbg_reg == { 1'b0, `CPU_PC_REG })) |
ipc <= i_dbg_data; |
|
always @(posedge i_clk) |
if (i_rst) |
pf_pc <= RESET_ADDRESS; |
else if (w_switch_to_interrupt) |
pf_pc <= ipc; |
else if (w_release_from_interrupt) |
pf_pc <= upc; |
else if ((wr_reg_ce)&&(wr_reg_id[4] == gie)&&(wr_write_pc)) |
pf_pc <= wr_reg_vl; |
else if ((i_halt)&&(i_dbg_we) |
&&(i_dbg_reg[4:0] == { gie, `CPU_PC_REG})) |
pf_pc <= i_dbg_data; |
// else if (pf_ce) |
else if (dcd_ce) |
pf_pc <= pf_pc + 1; |
|
initial new_pc = 1'b1; |
always @(posedge i_clk) |
if (i_rst) |
new_pc <= 1'b1; |
else if (w_switch_to_interrupt) |
new_pc <= 1'b1; |
else if (w_release_from_interrupt) |
new_pc <= 1'b1; |
else if ((wr_reg_ce)&&(wr_reg_id[4] == gie)&&(wr_write_pc)) |
new_pc <= 1'b1; |
else if ((i_halt)&&(i_dbg_we) |
&&(i_dbg_reg[4:0] == { gie, `CPU_PC_REG})) |
new_pc <= 1'b1; |
else |
new_pc <= 1'b0; |
|
// |
// The debug interface |
always @(posedge i_clk) |
begin |
o_dbg_reg <= regset[i_dbg_reg]; |
if (i_dbg_reg[3:0] == `CPU_PC_REG) |
o_dbg_reg <= (i_dbg_reg[4])?upc:ipc; |
else if (i_dbg_reg[3:0] == `CPU_CC_REG) |
o_dbg_reg <= { 25'h00, step, gie, sleep, |
((i_dbg_reg[4])?flags:iflags) }; |
end |
always @(posedge i_clk) |
o_dbg_stall <= (~i_halt)||(pf_cyc)||(mem_cyc)||(mem_busy) |
||((~opvalid)&&(~i_rst)) |
||((~dcdvalid)&&(~i_rst)); |
|
// |
// |
// Produce accounting outputs: Account for any CPU stalls, so we can |
// later evaluate how well we are doing. |
// |
// |
assign o_mem_stall = (~i_halt)&&(~sleep)&&(opvalid)&&(mem_busy) |
&&(~pf_cyc); |
assign o_pf_stall = (~i_halt)&&(~sleep)&&(((pf_ce)&&(~pf_valid)) |
||((opvalid)&&(mem_busy)&&(pf_cyc))); |
// assign o_alu_stall = (~i_halt)&&(~sleep)&&(~mem_busy)&& |
// ((alu_stall)||(~alu_valid)); |
assign o_alu_stall = alu_pc_valid; |
endmodule |
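//
// A small sketch of the condition test used for conditional execution in
// zipcpu.v above: each three-bit condition expands to a four-bit mask and
// a four-bit value, and the instruction executes when
// (mask & flags) == value, with flags = { V, N, C, Z }. The module name
// cond_sketch and its ports are assumptions for illustration only; the
// mask/value pairs mirror the low eight bits of the opF table.
//
module cond_sketch(i_cond, i_flags, o_take);
	input		[2:0]	i_cond;		// Condition field
	input		[3:0]	i_flags;	// { V, N, C, Z }
	output	wire		o_take;

	reg	[7:0]	mv;	// { mask[3:0], value[3:0] }
	always @(*)
		case(i_cond)
		3'h0: mv = 8'h00;	// Always: no bits are tested
		3'h1: mv = 8'h11;	// Z set
		3'h2: mv = 8'h10;	// NE: Z clear
		3'h3: mv = 8'h40;	// GE: N clear
		3'h4: mv = 8'h50;	// GT: N and Z both clear
		3'h5: mv = 8'h44;	// LT: N set
		3'h6: mv = 8'h22;	// C set
		3'h7: mv = 8'h88;	// V set
		endcase

	assign	o_take = ((mv[7:4] & i_flags) == mv[3:0]);
endmodule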
/////////////////////////////////////////////////////////////////////////// |
// |
// Filename: zipsystem.v |
// |
// Project: Zip CPU -- a small, lightweight, RISC CPU soft core |
// |
// Purpose: This portion of the ZIP CPU implements a number of soft |
// peripherals placed near the CPU core. The functionality |
// sits on the data bus, and does not include any true |
// external hardware peripherals. The peripherals here |
// include: |
// |
// |
// Local interrupt controller--for any/all of the interrupts generated |
// here. This would include a pin for interrupts generated |
// elsewhere, so this interrupt controller could be a master |
// handling all interrupts. My interrupt controller would work |
// for this purpose. |
// |
// The ZIP-CPU supports only one interrupt because, as I understand |
// modern systems (Linux), they tend to send all interrupts to the |
// same interrupt vector anyway. Hence, that's what we do here. |
// |
// Bus Error interrupts -- generates an interrupt any time the wishbone |
// bus produces an error on a given access, for whatever purpose. |
// It also records the address on the bus at the time of the error. |
// |
// Trap instructions |
// Writing to this "register" will always create an interrupt. |
// After the interrupt, this register may be read to see what |
// value had been written to it. |
// |
// Bit reverse register ... ? |
// |
// (Potentially an eventual floating point co-processor ...) |
// |
// Real-time clock |
// |
// Interval timer(s) (Count down from fixed value, and either stop on |
// zero, or issue an interrupt and restart automatically on zero) |
// These can be implemented as watchdog timers if desired--the |
// only difference is that a watchdog timer's interrupt feeds the |
// reset line instead of the processor interrupt line. |
// |
// Watch-dog timer: this is the same as an interval timer, only its |
// interrupt/time-out line is wired to the reset line instead of |
// the interrupt line of the CPU. |
// |
// ROM Memory map |
// Set a register to control this map, and a DMA will begin to |
// fill this memory from a slower FLASH. Once filled, accesses |
// will be from this memory instead of the FLASH. |
// |
// |
// Doing some market comparison, let's look at what peripherals a TI |
// MSP430 might offer: MSP's may have I2C ports, SPI, UART, DMA, ADC, |
// Comparators, 16,32-bit timers, 16x16 or 32x32 multipliers, AES, |
// brown-out-reset(s), real-time-clocks, temperature sensors, USB ports, |
// Spy-Bi-Wire, UART Boot-strap Loader (BSL), programmable digital I/O, |
// and watchdog-timers. |
// |
// Creator: Dan Gisselquist, Ph.D. |
// Gisselquist Technology, LLC |
// |
/////////////////////////////////////////////////////////////////////////// |
// |
// Copyright (C) 2015, Gisselquist Technology, LLC |
// |
// This program is free software (firmware): you can redistribute it and/or |
// modify it under the terms of the GNU General Public License as published |
// by the Free Software Foundation, either version 3 of the License, or (at |
// your option) any later version. |
// |
// This program is distributed in the hope that it will be useful, but WITHOUT |
// ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or |
// FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License |
// for more details. |
// |
// License: GPL, v3, as defined and found on www.gnu.org, |
// http://www.gnu.org/licenses/gpl.html |
// |
// |
/////////////////////////////////////////////////////////////////////////// |
// |
`define PERIPHBASE 32'hc0000000 |
`define INTCTRL 4'h0 // |
`define WATCHDOG 4'h1 // Interrupt generates reset signal |
`define CACHECTRL 4'h2 // Sets IVEC[0] |
`define CTRINT 4'h3 // Sets IVEC[5] |
`define TIMER_A 4'h4 // Sets IVEC[4] |
`define TIMER_B 4'h5 // Sets IVEC[3] |
`define TIMER_C 4'h6 // Sets IVEC[2] |
`define JIFFIES 4'h7 // Sets IVEC[1] |
|
`define MSTR_TASK_CTR 4'h8 |
`define MSTR_MSTL_CTR 4'h9 |
`define MSTR_PSTL_CTR 4'ha |
`define MSTR_ASTL_CTR 4'hb |
`define USER_TASK_CTR 4'hc |
`define USER_MSTL_CTR 4'hd |
`define USER_PSTL_CTR 4'he |
`define USER_ASTL_CTR 4'hf |
|
`define CACHEBASE 16'hc010 // |
// `define RTC_CLOCK 32'hc0000008 // A global something |
// `define BITREV 32'hc0000003 |
// |
// DBGCTRL |
// 10 HALT |
// 9 HALT(ED) |
// 8 STEP (W=1 steps, and returns to halted) |
// 7 INTERRUPT-FLAG |
// 6 RESET_FLAG |
// ADDRESS: |
// 5 PERIPHERAL-BIT |
// [4:0] REGISTER-ADDR |
// DBGDATA |
// read/writes internal registers |
module zipsystem(i_clk, i_rst, |
// Wishbone master interface from the CPU |
o_wb_cyc, o_wb_stb, o_wb_we, o_wb_addr, o_wb_data, |
i_wb_ack, i_wb_stall, i_wb_data, |
// Incoming interrupts |
i_ext_int, |
// Wishbone slave interface for debugging purposes |
i_dbg_cyc, i_dbg_stb, i_dbg_we, i_dbg_addr, i_dbg_data, |
o_dbg_ack, o_dbg_stall, o_dbg_data); |
parameter RESET_ADDRESS=32'h0100000; |
input i_clk, i_rst; |
// Wishbone master |
output wire o_wb_cyc, o_wb_stb, o_wb_we; |
output wire [31:0] o_wb_addr; |
output wire [31:0] o_wb_data; |
input i_wb_ack, i_wb_stall; |
input [31:0] i_wb_data; |
// Incoming interrupts |
input i_ext_int; |
// Wishbone slave |
input i_dbg_cyc, i_dbg_stb, i_dbg_we, i_dbg_addr; |
input [31:0] i_dbg_data; |
output wire o_dbg_ack; |
output wire o_dbg_stall; |
output wire [31:0] o_dbg_data; |
|
wire [31:0] ext_idata; |
|
// Delay the debug port by one clock, to meet timing requirements |
wire dbg_cyc, dbg_stb, dbg_we, dbg_addr, dbg_stall; |
wire [31:0] dbg_idata, dbg_odata; |
reg dbg_ack; |
busdelay #(1,32) wbdelay(i_clk, |
i_dbg_cyc, i_dbg_stb, i_dbg_we, i_dbg_addr, i_dbg_data, |
o_dbg_ack, o_dbg_stall, o_dbg_data, |
dbg_cyc, dbg_stb, dbg_we, dbg_addr, dbg_idata, |
dbg_ack, dbg_stall, dbg_odata); |
|
// |
// |
// |
wire sys_cyc, sys_stb, sys_we; |
wire [3:0] sys_addr; |
wire [31:0] cpu_addr; |
wire [31:0] sys_data; |
// wire sys_ack, sys_stall; |
|
// |
// The external debug interface |
// |
// We offer only a limited interface here, requiring a pre-register |
// write to set the local address. This interface allows access to |
// the Zip System on a debug basis only, and not to the rest of the |
// wishbone bus. Further, to access these registers, the control |
// register must first be accessed to both stop the CPU and to |
// set the following address in question. Hence all accesses require |
// two accesses: write the address to the control register (and halt |
// the CPU if not halted), then read/write the data from the data |
// register. |
// |
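// Worked example (illustrative only, derived from the cmd_* register
// logic below rather than from any separate specification): a write to
// debug address 0 latches bit 10 as the halt request and bits [5:0] as
// the register address, so
//	32'h00000403 = (1<<10) | 6'h03
//		requests a halt and selects CPU register 5'h03, and
//	32'h00000424 = (1<<10) | (1<<5) | `TIMER_A
//		requests a halt and selects the Timer A peripheral.
// Once the CPU has halted, a read or write of debug address 1 accesses
// whichever register was selected.
//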
wire cpu_break; |
reg cmd_reset, cmd_halt, cmd_step; |
reg [5:0] cmd_addr; |
initial cmd_reset = 1'b1; |
initial cmd_halt = 1'b1; |
initial cmd_step = 1'b0; |
always @(posedge i_clk) |
if (i_rst) |
begin |
cmd_halt <= 1'b0; |
cmd_step <= 1'b0; |
cmd_reset<= 1'b0; |
cmd_addr <= 6'h00; |
end else if ((dbg_cyc)&&(dbg_stb) |
&&(dbg_we)&&(~dbg_addr)) |
begin |
cmd_halt <= dbg_idata[10]; |
cmd_step <= dbg_idata[ 8]; |
cmd_reset<= dbg_idata[ 6]; |
cmd_addr <= dbg_idata[5:0]; |
end else if (cmd_step) |
begin |
cmd_halt <= 1'b1; |
cmd_step <= 1'b0; |
end else if (cpu_break) |
cmd_halt <= 1'b1; |
wire cpu_reset; |
assign cpu_reset = (i_rst)||(cmd_reset)||(wdt_reset); |
|
wire cpu_halt, cpu_dbg_stall; |
assign cpu_halt = (cmd_halt)&&(~cmd_step); |
wire [31:0] pic_data; |
wire [31:0] cmd_data; |
assign cmd_data = { 21'h00, cmd_halt, (~cpu_dbg_stall), 1'b0, pic_data[15], |
cpu_reset, cmd_addr }; |
|
`ifdef USE_TRAP |
// |
// The TRAP peripheral |
// |
wire trap_ack, trap_stall, trap_int; |
wire [31:0] trap_data; |
ziptrap trapp(i_clk, |
sys_cyc, (sys_stb)&&(sys_addr == `TRAP_ADDR), sys_we, |
sys_data, |
trap_ack, trap_stall, trap_data, trap_int); |
`endif |
|
// |
// The WATCHDOG Timer |
// |
wire wdt_ack, wdt_stall, wdt_reset; |
wire [31:0] wdt_data; |
ziptimer watchdog(i_clk, cpu_reset, ~cmd_halt, |
sys_cyc, ((sys_stb)&&(sys_addr == `WATCHDOG)), sys_we, |
sys_data, |
wdt_ack, wdt_stall, wdt_data, wdt_reset); |
|
// |
// The Flash Cache, a pre-read cache to memory that can be used to |
// create a fast memory access area |
// |
wire cache_int; |
wire [31:0] cache_data; |
wire cache_stb, cache_ack, cache_stall; |
wire fc_cyc, fc_stb, fc_we, fc_ack, fc_stall; |
wire [31:0] fc_data, fc_addr; |
flashcache #(10) manualcache(i_clk, |
sys_cyc, cache_stb, |
((sys_stb)&&(sys_addr == `CACHECTRL)), |
sys_we, cpu_addr[9:0], sys_data, |
cache_ack, cache_stall, cache_data, |
// Need the outgoing CACHE wishbone bus |
fc_cyc, fc_stb, fc_we, fc_addr, fc_data, |
fc_ack, fc_stall, ext_idata, |
// Cache interrupt, for upon completion |
cache_int); |
|
|
// Counters -- for performance measurement and accounting |
// |
// Here's the stuff we'll be counting .... |
// |
wire cpu_mem_stall, cpu_pf_stall, cpu_alu_stall; |
|
// |
// The master counters will, in general, not be reset. They'll be used |
// for an overall counter. |
// |
// Master task counter |
wire mtc_ack, mtc_stall, mtc_int; |
wire [31:0] mtc_data; |
zipcounter mtask_ctr(i_clk, (~cmd_halt), sys_cyc, |
(sys_stb)&&(sys_addr == `MSTR_TASK_CTR), |
sys_we, sys_data, |
mtc_ack, mtc_stall, mtc_data, mtc_int); |
|
// Master Memory-Stall counter |
wire mmc_ack, mmc_stall, mmc_int; |
wire [31:0] mmc_data; |
zipcounter mmstall_ctr(i_clk,(~cmd_halt)&&(cpu_mem_stall), sys_cyc, |
(sys_stb)&&(sys_addr == `MSTR_MSTL_CTR), |
sys_we, sys_data, |
mmc_ack, mmc_stall, mmc_data, mmc_int); |
|
// Master PreFetch-Stall counter |
wire mpc_ack, mpc_stall, mpc_int; |
wire [31:0] mpc_data; |
zipcounter mpstall_ctr(i_clk,(~cmd_halt)&&(cpu_pf_stall), sys_cyc, |
(sys_stb)&&(sys_addr == `MSTR_PSTL_CTR), |
sys_we, sys_data, |
mpc_ack, mpc_stall, mpc_data, mpc_int); |
|
// Master ALU-Stall counter |
wire mac_ack, mac_stall, mac_int; |
wire [31:0] mac_data; |
zipcounter mastall_ctr(i_clk,(~cmd_halt)&&(cpu_alu_stall), sys_cyc, |
(sys_stb)&&(sys_addr == `MSTR_ASTL_CTR), |
sys_we, sys_data, |
mac_ack, mac_stall, mac_data, mac_int); |
|
// |
// The user counters are different from those of the master. They will |
// be reset any time a task is given control of the CPU. |
// |
// User task counter |
wire utc_ack, utc_stall, utc_int; |
wire [31:0] utc_data; |
zipcounter utask_ctr(i_clk,(~cmd_halt), sys_cyc, |
(sys_stb)&&(sys_addr == `USER_TASK_CTR), |
sys_we, sys_data, |
utc_ack, utc_stall, utc_data, utc_int); |
|
// User Memory-Stall counter |
wire umc_ack, umc_stall, umc_int; |
wire [31:0] umc_data; |
zipcounter umstall_ctr(i_clk,(~cmd_halt)&&(cpu_mem_stall), sys_cyc, |
(sys_stb)&&(sys_addr == `USER_MSTL_CTR), |
sys_we, sys_data, |
umc_ack, umc_stall, umc_data, umc_int); |
|
// User PreFetch-Stall counter |
wire upc_ack, upc_stall, upc_int; |
wire [31:0] upc_data; |
zipcounter upstall_ctr(i_clk,(~cmd_halt)&&(cpu_pf_stall), sys_cyc, |
(sys_stb)&&(sys_addr == `USER_PSTL_CTR), |
sys_we, sys_data, |
upc_ack, upc_stall, upc_data, upc_int); |
|
// User ALU-Stall counter |
wire uac_ack, uac_stall, uac_int; |
wire [31:0] uac_data; |
zipcounter uastall_ctr(i_clk,(~cmd_halt)&&(cpu_alu_stall), sys_cyc, |
(sys_stb)&&(sys_addr == `USER_ASTL_CTR), |
sys_we, sys_data, |
uac_ack, uac_stall, uac_data, uac_int); |
|
// A little bit of pre-cleanup (actr = accounting counters) |
wire actr_ack, actr_stall; |
wire [31:0] actr_data; |
assign actr_ack = ((mtc_ack | mmc_ack | mpc_ack | mac_ack) |
|(utc_ack | umc_ack | upc_ack | uac_ack)); |
assign actr_stall = ((mtc_stall | mmc_stall | mpc_stall | mac_stall) |
|(utc_stall | umc_stall | upc_stall|uac_stall)); |
assign actr_data = ((mtc_ack) ? mtc_data |
: ((mmc_ack) ? mmc_data |
: ((mpc_ack) ? mpc_data |
: ((mac_ack) ? mac_data |
: ((utc_ack) ? utc_data |
: ((umc_ack) ? umc_data |
: ((upc_ack) ? upc_data |
: uac_data))))))); |
|
|
|
// |
// Counter Interrupt controller |
// |
reg ctri_ack; |
wire ctri_stall, ctri_int, ctri_sel; |
wire [7:0] ctri_vector; |
wire [31:0] ctri_data; |
assign ctri_sel = (sys_cyc)&&(sys_stb)&&(sys_addr == `CTRINT); |
assign ctri_vector = { mtc_int, mmc_int, mpc_int, mac_int, |
utc_int, umc_int, upc_int, uac_int }; |
icontrol #(8) ctri(i_clk, cpu_reset, (ctri_sel)&&(sys_addr==`CTRINT), |
sys_data, ctri_data, ctri_vector, ctri_int); |
always @(posedge i_clk) |
ctri_ack <= ctri_sel; |
|
|
// |
// Timer A |
// |
wire tma_ack, tma_stall, tma_int; |
wire [31:0] tma_data; |
ziptimer timer_a(i_clk, cpu_reset, ~cmd_halt, |
sys_cyc, (sys_stb)&&(sys_addr == `TIMER_A), sys_we, |
sys_data, |
tma_ack, tma_stall, tma_data, tma_int); |
|
// |
// Timer B |
// |
wire tmb_ack, tmb_stall, tmb_int; |
wire [31:0] tmb_data; |
ziptimer timer_b(i_clk, cpu_reset, ~cmd_halt, |
sys_cyc, (sys_stb)&&(sys_addr == `TIMER_B), sys_we, |
sys_data, |
tmb_ack, tmb_stall, tmb_data, tmb_int); |
|
// |
// Timer C |
// |
wire tmc_ack, tmc_stall, tmc_int; |
wire [31:0] tmc_data; |
ziptimer timer_c(i_clk, cpu_reset, ~cmd_halt, |
sys_cyc, (sys_stb)&&(sys_addr == `TIMER_C), sys_we, |
sys_data, |
tmc_ack, tmc_stall, tmc_data, tmc_int); |
|
// |
// JIFFIES |
// |
wire jif_ack, jif_stall, jif_int; |
wire [31:0] jif_data; |
zipjiffies jiffies(i_clk, ~cmd_halt, |
sys_cyc, (sys_stb)&&(sys_addr == `JIFFIES), sys_we, |
sys_data, |
jif_ack, jif_stall, jif_data, jif_int); |
|
// |
// The programmable interrupt controller peripheral |
// |
wire pic_interrupt; |
wire [6:0] int_vector; |
assign int_vector = { i_ext_int, ctri_int, tma_int, tmb_int, tmc_int, |
jif_int, cache_int }; |
icontrol #(7) pic(i_clk, cpu_reset, |
(sys_cyc)&&(sys_stb)&&(sys_we) |
&&(sys_addr==`INTCTRL), |
sys_data, pic_data, |
int_vector, pic_interrupt); |
reg pic_ack; |
always @(posedge i_clk) |
pic_ack <= (sys_cyc)&&(sys_stb)&&(sys_addr == `INTCTRL); |
|
// |
// The CPU itself |
// |
wire cpu_cyc, cpu_stb, cpu_we, cpu_dbg_we; |
wire [31:0] cpu_data, wb_data; |
wire cpu_ack, cpu_stall; |
wire [31:0] cpu_dbg_data; |
assign cpu_dbg_we = ((dbg_cyc)&&(dbg_stb)&&(~cmd_addr[5]) |
&&(dbg_we)&&(dbg_addr)); |
zipcpu #(RESET_ADDRESS) thecpu(i_clk, cpu_reset, pic_interrupt, |
cpu_halt, cmd_addr[4:0], cpu_dbg_we, |
dbg_idata, cpu_dbg_stall, cpu_dbg_data, |
cpu_break, |
cpu_cyc, cpu_stb, cpu_we, cpu_addr, cpu_data, |
cpu_ack, cpu_stall, wb_data, |
cpu_mem_stall, cpu_pf_stall, cpu_alu_stall); |
|
// Now, arbitrate the bus ... first for the local peripherals |
assign sys_cyc = (cpu_cyc)||((cpu_halt)&&(~cpu_dbg_stall)&&(dbg_cyc)); |
assign sys_stb = (cpu_cyc) |
? ((cpu_stb)&&(cpu_addr[31:4] == 28'hc000000)) |
: ((dbg_stb)&&(dbg_addr)&&(cmd_addr[5])); |
|
assign sys_we = (cpu_cyc) ? cpu_we : dbg_we; |
assign sys_addr= (cpu_cyc) ? cpu_addr[3:0] : cmd_addr[3:0]; |
assign sys_data= (cpu_cyc) ? cpu_data : dbg_idata; |
assign cache_stb=((cpu_cyc)&&(cpu_stb)&&(cpu_addr[31:16]==`CACHEBASE)); |
|
// Return debug response values |
assign dbg_odata = (~dbg_addr)?cmd_data |
:((~cmd_addr[5])?cpu_dbg_data : wb_data); |
initial dbg_ack = 1'b0; |
always @(posedge i_clk) |
dbg_ack <= (dbg_cyc)&&(dbg_stb)&& |
((~dbg_addr)||((cpu_halt)&&(~cpu_dbg_stall))); |
assign dbg_stall=(dbg_addr)&&(dbg_cyc) |
&&((cpu_cyc)||(~cpu_halt)||(cpu_dbg_stall)); |
|
// Now for the external wishbone bus |
// Need to arbitrate between the flash cache and the CPU |
// The way this works, though, the CPU will stall once the flash |
// cache gets access to the bus--the CPU will be stuck until the |
// flash cache is finished with the bus. |
wire ext_cyc, ext_stb, ext_we; |
wire cpu_ext_ack, cpu_ext_stall, ext_ack, ext_stall; |
wire [31:0] ext_addr, ext_odata; |
wbarbiter #(32,32) flashvcpu(i_clk, i_rst, |
fc_addr, fc_data, fc_we, fc_stb, fc_cyc, |
fc_ack, fc_stall, |
cpu_addr, cpu_data, cpu_we, |
((cpu_stb)&&(~sys_stb)&&(~cache_stb)), |
cpu_cyc, cpu_ext_ack, cpu_ext_stall, |
ext_addr, ext_odata, ext_we, ext_stb, |
ext_cyc, ext_ack, ext_stall); |
|
busdelay #(32,32) extbus(i_clk, |
ext_cyc, ext_stb, ext_we, ext_addr, ext_odata, |
ext_ack, ext_stall, ext_idata, |
o_wb_cyc, o_wb_stb, o_wb_we, o_wb_addr, o_wb_data, |
i_wb_ack, i_wb_stall, i_wb_data); |
|
wire tmr_ack; |
assign tmr_ack = (tma_ack|tmb_ack|tmc_ack|jif_ack); |
wire [31:0] tmr_data; |
assign tmr_data = (tma_ack)?tma_data |
:(tmb_ack ? tmb_data |
:(tmc_ack ? tmc_data |
:jif_data)); |
assign wb_data = (tmr_ack|wdt_ack)?((tmr_ack)?tmr_data:wdt_data) |
:((actr_ack|cache_ack)?((actr_ack)?actr_data:cache_data) |
:((pic_ack|ctri_ack)?((pic_ack)?pic_data:ctri_data) |
:(ext_idata))); |
|
assign cpu_stall = (tma_stall | tmb_stall | tmc_stall | jif_stall |
| wdt_stall | cache_stall |
| cpu_ext_stall); |
assign cpu_ack = (tmr_ack|wdt_ack|cache_ack|cpu_ext_ack|ctri_ack|actr_ack|pic_ack); |
endmodule |
/////////////////////////////////////////////////////////////////////////// |
// |
// Filename: wbarbiter.v |
// |
// Project: Zip CPU -- a small, lightweight, RISC CPU soft core |
// |
// Purpose: At some point in time, I might wish to have two masters connect |
// to the same wishbone bus. As an example, I might wish to have |
// both the instruction fetch and the load/store operators |
// of my Zip CPU access the same bus. How shall they both |
// get access to the same resource? This module allows the |
// wishbone interfaces from two sources to drive the bus, while |
// guaranteeing that only one drives the bus at a time. |
// |
// The core logic works like this: |
// |
// 1. If 'A' or 'B' asserts the o_cyc line, a bus cycle will begin, |
// with access granted to whoever requested it. |
// 2. If both 'A' and 'B' assert o_cyc at the same time, only 'A' |
// will be granted the bus. (If the alternating parameter |
// is set, A and B will alternate who gets the bus in |
// this case.) |
// 3. The bus will remain owned by whomever the bus was granted to |
// until they deassert the o_cyc line. |
// 4. At the end of a bus cycle, o_cyc is guaranteed to be |
// deasserted (low) for one clock. |
// 5. On the next clock, bus arbitration takes place again. If |
// 'A' requests the bus, no matter how long 'B' was |
// waiting, 'A' will then be granted the bus. (Unless |
// again the alternating parameter is set, then the |
// access is guaranteed to switch to B.) |
// |
// |
// Creator: Dan Gisselquist, Ph.D. |
// Gisselquist Technology, LLC |
// |
/////////////////////////////////////////////////////////////////////////// |
// |
// Copyright (C) 2015, Gisselquist Technology, LLC |
// |
// This program is free software (firmware): you can redistribute it and/or |
// modify it under the terms of the GNU General Public License as published |
// by the Free Software Foundation, either version 3 of the License, or (at |
// your option) any later version. |
// |
// This program is distributed in the hope that it will be useful, but WITHOUT |
// ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or |
// FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License |
// for more details. |
// |
// License: GPL, v3, as defined and found on www.gnu.org, |
// http://www.gnu.org/licenses/gpl.html |
// |
// |
/////////////////////////////////////////////////////////////////////////// |
// |
`define WBA_ALTERNATING |
module wbarbiter(i_clk, i_rst, |
// Bus A |
i_a_adr, i_a_dat, i_a_we, i_a_stb, i_a_cyc, o_a_ack, o_a_stall, |
// Bus B |
i_b_adr, i_b_dat, i_b_we, i_b_stb, i_b_cyc, o_b_ack, o_b_stall, |
// Both buses |
o_adr, o_dat, o_we, o_stb, o_cyc, i_ack, i_stall); |
// 18 bits will address one MB, 4 bytes at a time. |
// 19 bits will allow the ability to address things other than just |
// the 1MB of memory we are expecting. |
parameter DW=32, AW=19; |
// Wishbone doesn't use an i_ce signal. While it could, they dislike |
// what it would (might) do to the synchronous reset signal, i_rst. |
input i_clk, i_rst; |
input [(AW-1):0] i_a_adr, i_b_adr; |
input [(DW-1):0] i_a_dat, i_b_dat; |
input i_a_we, i_a_stb, i_a_cyc; |
input i_b_we, i_b_stb, i_b_cyc; |
output wire o_a_ack, o_b_ack, o_a_stall, o_b_stall; |
output wire [(AW-1):0] o_adr; |
output wire [(DW-1):0] o_dat; |
output wire o_we, o_stb, o_cyc; |
input i_ack, i_stall; |
|
// All the fancy stuff here is done with the three primary signals: |
// o_cyc |
// w_a_owner |
// w_b_owner |
// These signals are helped by r_cyc, r_a_owner, and r_b_owner. |
// If you understand these signals, all else will fall into place. |
|
// r_cyc just keeps track of the last o_cyc value. That way, on |
// the next clock we can tell if we've had one non-cycle before |
// starting another cycle. Specifically, no new cycles will be |
// allowed to begin unless r_cyc=0. |
reg r_cyc; |
always @(posedge i_clk) |
if (i_rst) |
r_cyc <= 1'b0; |
else |
r_cyc <= o_cyc; |
|
// Go high immediately (new cycle) if ... |
// Previous cycle was low and *someone* is requesting a bus cycle |
// Go low immediately if ... |
// We were just high and the owner no longer wants the bus |
// WISHBONE Spec recommends no logic between a FF and the o_cyc |
// line. This violates that recommendation. (Rec 3.15, p35) |
assign o_cyc = ((~r_cyc)&&((i_a_cyc)||(i_b_cyc))) || ((r_cyc)&&((w_a_owner)||(w_b_owner))); |
|
|
// Register keeping track of the last owner, wire keeping track of the |
// current owner allowing us to not lose a clock in arbitrating the |
// first clock of the bus cycle |
reg r_a_owner, r_b_owner; |
wire w_a_owner, w_b_owner; |
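`ifdef WBA_ALTERNATING |
reg r_a_last_owner; |
// Give the register a defined starting value, so that the first |
// contended arbitration is not decided by an unknown (X) in simulation |
initial r_a_last_owner = 1'b0; |
`endif |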
always @(posedge i_clk) |
if (i_rst) |
begin |
r_a_owner <= 1'b0; |
r_b_owner <= 1'b0; |
end else begin |
r_a_owner <= w_a_owner; |
r_b_owner <= w_b_owner; |
`ifdef WBA_ALTERNATING |
if (w_a_owner) |
r_a_last_owner <= 1'b1; |
else if (w_b_owner) |
r_a_last_owner <= 1'b0; |
`endif |
end |
// |
// If you are the owner, retain ownership until i_x_cyc is no |
// longer asserted. Likewise, you cannot become owner until o_cyc |
// is de-asserted for one cycle. |
// |
// 'A' is given arbitrary priority over 'B' |
// 'A' may own the bus only if he wants it. When 'A' drops i_a_cyc, |
// o_cyc must drop and so must w_a_owner on the same cycle. |
// However, when 'A' asserts i_a_cyc, he can only capture the bus if |
// it's had an idle cycle. |
// The same is true for 'B' with one exception: if both contend for the |
// bus on the same cycle, 'A' arbitrarily wins. |
`ifdef WBA_ALTERNATING |
assign w_a_owner = (i_a_cyc) // if A requests ownership, and either |
&& ((r_a_owner) // A has already been recognized or |
|| ((~r_cyc) // the bus is free and |
&&((~i_b_cyc) // B has not requested, or if he |
||(~r_a_last_owner)) )); // has, it's A's turn |
assign w_b_owner = (i_b_cyc)&& ((r_b_owner) || ((~r_cyc)&&((~i_a_cyc)||(r_a_last_owner)) )); |
`else |
assign w_a_owner = (i_a_cyc)&& ((r_a_owner) || (~r_cyc) ); |
assign w_b_owner = (i_b_cyc)&& ((r_b_owner) || ((~r_cyc)&&(~i_a_cyc)) ); |
`endif |
|
// Realistically, if neither master owns the bus, the output is a |
// don't care. Thus we trigger off whether or not 'A' owns the bus. |
// If 'B' owns it all we care is that 'A' does not. Likewise, if |
// neither owns the bus then the values on the various lines are |
// irrelevant. |
assign o_adr = (w_a_owner) ? i_a_adr : i_b_adr; |
assign o_dat = (w_a_owner) ? i_a_dat : i_b_dat; |
assign o_we = (w_a_owner) ? i_a_we : i_b_we; |
assign o_stb = (o_cyc) && ((w_a_owner) ? i_a_stb : i_b_stb); |
|
// We cannot allow the return acknowledgement to ever go high if |
// the master in question does not own the bus. Hence we force it |
// low if the particular master doesn't own the bus. |
assign o_a_ack = (w_a_owner) ? i_ack : 1'b0; |
assign o_b_ack = (w_b_owner) ? i_ack : 1'b0; |
|
// Stall must be asserted on the same cycle the input master asserts |
// the bus, if the bus isn't granted to him. |
assign o_a_stall = (w_a_owner) ? i_stall : 1'b1; |
assign o_b_stall = (w_b_owner) ? i_stall : 1'b1; |
|
endmodule |
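// Illustrative sketch only, not part of the Zip CPU sources: a minimal
// simulation testbench exercising the arbitration rules described in the
// header above.  The name tb_wbarbiter, the addresses 32'ha and 32'hb,
// and the stimulus timing are assumptions made for the example.  It
// assumes WBA_ALTERNATING is defined, as it is at the top of this file.
module tb_wbarbiter;
	reg		clk, rst, a_cyc, a_stb, b_cyc, b_stb;
	wire		a_ack, a_stall, b_ack, b_stall;
	wire	[31:0]	adr, dat;
	wire		we, stb, cyc;

	// Master A always presents address 32'ha, master B 32'hb.  The
	// slave side never stalls and never acknowledges.
	wbarbiter #(32,32) dut(clk, rst,
		32'h0000000a, 32'h0, 1'b0, a_stb, a_cyc, a_ack, a_stall,
		32'h0000000b, 32'h0, 1'b0, b_stb, b_cyc, b_ack, b_stall,
		adr, dat, we, stb, cyc, 1'b0, 1'b0);

	initial	clk = 1'b0;
	always	#5 clk = ~clk;

	initial begin
		rst = 1'b1;
		{ a_cyc, a_stb, b_cyc, b_stb } = 4'h0;
		// Hold reset for two rising edges; change inputs on falling
		// edges so they are stable when the arbiter samples them
		repeat(2) @(posedge clk);
		@(negedge clk);
		rst = 1'b0;
		// 1. A requests alone and is granted the bus
		{ a_cyc, a_stb } = 2'b11;
		@(negedge clk);
		$display("A alone: cyc=%b stb=%b adr=%h", cyc, stb, adr);
		// 2. A releases the bus, so o_cyc drops
		{ a_cyc, a_stb } = 2'b00;
		@(negedge clk);
		$display("idle:    cyc=%b", cyc);
		// 3. Both request together.  A held the bus last, so the
		//    alternating rule grants it to B while A is stalled.
		{ a_cyc, a_stb, b_cyc, b_stb } = 4'hf;
		@(negedge clk);
		$display("A and B: cyc=%b adr=%h a_stall=%b b_stall=%b",
			cyc, adr, a_stall, b_stall);
		$finish;
	end
endmodule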
|
/////////////////////////////////////////////////////////////////////////// |
// |
// Filename: busdelay.v |
// |
// Project: Zip CPU -- a small, lightweight, RISC CPU soft core |
// |
// Purpose: Delay any access to the wishbone bus by a single clock. |
// |
// When the first Zip System would not meet the timing requirements of |
// the board it was placed upon, this bus delay was added to help out. |
// It may no longer be necessary, having cleaned some other problems up |
// first, but it will remain here as a means of alleviating timing |
// problems. |
// |
// The specific problem takes place on the stall line: a wishbone master |
// *must* know on the first clock whether or not the bus will stall. |
// |
// |
// |
// |
// Creator: Dan Gisselquist, Ph.D. |
// Gisselquist Technology, LLC |
// |
/////////////////////////////////////////////////////////////////////////// |
// |
// Copyright (C) 2015, Gisselquist Technology, LLC |
// |
// This program is free software (firmware): you can redistribute it and/or |
// modify it under the terms of the GNU General Public License as published |
// by the Free Software Foundation, either version 3 of the License, or (at |
// your option) any later version. |
// |
// This program is distributed in the hope that it will be useful, but WITHOUT |
// ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or |
// FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License |
// for more details. |
// |
// License: GPL, v3, as defined and found on www.gnu.org, |
// http://www.gnu.org/licenses/gpl.html |
// |
// |
/////////////////////////////////////////////////////////////////////////// |
// |
module busdelay(i_clk, |
// The input bus |
i_wb_cyc, i_wb_stb, i_wb_we, i_wb_addr, i_wb_data, |
o_wb_ack, o_wb_stall, o_wb_data, |
// The delayed bus |
o_dly_cyc, o_dly_stb, o_dly_we, o_dly_addr, o_dly_data, |
i_dly_ack, i_dly_stall, i_dly_data); |
parameter AW=32, DW=32; |
input i_clk; |
// Input/master bus |
input i_wb_cyc, i_wb_stb, i_wb_we; |
input [(AW-1):0] i_wb_addr; |
input [(DW-1):0] i_wb_data; |
output reg o_wb_ack; |
output wire o_wb_stall; |
output reg [(DW-1):0] o_wb_data; |
// Delayed bus |
output reg o_dly_cyc, o_dly_stb, o_dly_we; |
output reg [(AW-1):0] o_dly_addr; |
output reg [(DW-1):0] o_dly_data; |
input i_dly_ack; |
input i_dly_stall; |
input [(DW-1):0] i_dly_data; |
|
initial o_dly_cyc = 1'b0; |
initial o_dly_stb = 1'b0; |
|
always @(posedge i_clk) |
o_dly_cyc <= i_wb_cyc; |
always @(posedge i_clk) |
if (~o_wb_stall) |
o_dly_stb <= i_wb_stb; |
always @(posedge i_clk) |
if (~o_wb_stall) |
o_dly_we <= i_wb_we; |
always @(posedge i_clk) |
if (~o_wb_stall) |
o_dly_addr<= i_wb_addr; |
always @(posedge i_clk) |
if (~o_wb_stall) |
o_dly_data <= i_wb_data; |
always @(posedge i_clk) |
o_wb_ack <= (i_dly_ack)&&(o_dly_cyc)&&(i_wb_cyc); |
always @(posedge i_clk) |
o_wb_data <= i_dly_data; |
|
// Our only non-delayed line, yet still really delayed. |
assign o_wb_stall = ((i_wb_cyc)&&(o_dly_cyc)&&(i_dly_stall)); |
|
endmodule |
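// Illustrative sketch only, not part of the Zip CPU sources: a minimal
// simulation testbench showing the latency that busdelay adds.  The name
// tb_busdelay, the address and data constants, and the zero-latency
// "slave" (i_dly_ack tied directly to o_dly_stb) are assumptions made
// for the example.
module tb_busdelay;
	reg		clk, cyc, stb;
	wire		ack, stall;
	wire	[31:0]	rdata;
	wire		dly_cyc, dly_stb, dly_we;
	wire	[31:0]	dly_addr, dly_data;

	busdelay #(32,32) dut(clk,
		cyc, stb, 1'b0, 32'h00000010, 32'h00000000,
		ack, stall, rdata,
		dly_cyc, dly_stb, dly_we, dly_addr, dly_data,
		dly_stb, 1'b0, 32'hdeadbeef);

	initial	clk = 1'b0;
	always	#5 clk = ~clk;

	initial begin
		{ cyc, stb } = 2'b00;
		repeat(2) @(posedge clk);
		@(negedge clk);
		// Issue a single read request
		{ cyc, stb } = 2'b11;
		@(negedge clk);
		// One rising edge later the request is on the delayed bus,
		// but no acknowledgement has reached the master yet
		$display("t+1: dly_stb=%b dly_addr=%h ack=%b",
			dly_stb, dly_addr, ack);
		stb = 1'b0;
		@(negedge clk);
		// A second rising edge brings the ack and data back
		$display("t+2: dly_stb=%b ack=%b rdata=%h",
			dly_stb, ack, rdata);
		cyc = 1'b0;
		@(negedge clk);
		$finish;
	end
endmodule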
################################################################################ |
# |
# Filename: Makefile |
# |
# Project: Zip CPU -- a small, lightweight, RISC CPU soft core |
# |
# Purpose: This makefile builds a verilator simulation of the zipsystem. |
# It does not make the system within Vivado or Quartus. |
# |
# |
# Creator: Dan Gisselquist, Ph.D. |
# Gisselquist Technology, LLC |
# |
################################################################################ |
# |
# Copyright (C) 2015, Gisselquist Technology, LLC |
# |
# This program is free software (firmware): you can redistribute it and/or |
# modify it under the terms of the GNU General Public License as published |
# by the Free Software Foundation, either version 3 of the License, or (at |
# your option) any later version. |
# |
# This program is distributed in the hope that it will be useful, but WITHOUT |
# ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or |
# FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License |
# for more details. |
# |
# License: GPL, v3, as defined and found on www.gnu.org, |
# http://www.gnu.org/licenses/gpl.html |
# |
# |
################################################################################ |
# |
.PHONY: all |
all: zipsystem |
|
CORED:= core |
PRPHD:= peripherals |
AUXD := aux |
VSRC := zipsystem.v \ |
$(PRPHD)/flashcache.v $(PRPHD)/icontrol.v \ |
$(PRPHD)/zipcounter.v $(PRPHD)/zipjiffies.v \ |
$(PRPHD)/ziptimer.v $(PRPHD)/ziptrap.v \ |
$(CORED)/zipcpu.v $(CORED)/cpuops.v \ |
$(CORED)/pipefetch.v $(CORED)/prefetch.v \ |
$(CORED)/memops.v \ |
$(AUXD)/busdelay.v $(AUXD)/wbarbiter.v |
|
VOBJ := obj_dir |
|
$(VOBJ)/Vzipsystem.cpp: $(VSRC) |
verilator -cc -y $(CORED)/ -y $(PRPHD) -y $(AUXD) zipsystem.v |
|
$(VOBJ)/Vzipsystem__ALL.a: $(VOBJ)/Vzipsystem.cpp $(VOBJ)/Vzipsystem.h |
cd $(VOBJ) && $(MAKE) -f Vzipsystem.mk |
|
.PHONY: zipsystem |
zipsystem: $(VOBJ)/Vzipsystem__ALL.a |
|
.PHONY: clean |
clean: |
rm -rf $(VOBJ) |
/////////////////////////////////////////////////////////////////////////// |
// |
// Filename: flashcache.v |
// |
// Project: Zip CPU -- a small, lightweight, RISC CPU soft core |
// |
// Purpose: Since my Zip CPU has primary access to a flash, which requires |
// nearly 24 clock cycles per read, this 'cache' module |
// is offered to minimize the effect. The CPU may now request |
// some amount of flash to be copied into this on-chip RAM, |
// and then access it with nearly zero latency. |
// |
// Interface: |
// FlashCache sits on the Wishbone bus as both a slave and a master. |
// Slave requests for memory will get mapped to a local RAM, from which |
// reads and writes may take place. |
// |
// This cache supports a single control register: the base wishbone address |
// of the device to copy memory from. The bottom bit of this address must |
// be zero (or it will be silently rendered as zero). When read, this |
// bottom bit will indicate 1) that the controller is still loading memory |
// into the cache, or 0) that the cache is ready to be used. |
// |
// Writing to this register will initiate a memory copy from the (new) |
// address. Once done, the loading bit will be cleared and an interrupt |
// generated. |
// |
// Where this memory is placed on the wishbone bus is entirely up to the |
// wishbone bus control logic. Setting the memory base to an |
// address controlled by this flashcache will produce unusable |
// results, and may well hang the bus. |
// Reads from the memory before the copy is complete will return |
// immediately with the value if the read address is less than |
// the current copy address, or else they will stall until the |
// read address is less than the copy address. |
// |
// Creator: Dan Gisselquist, Ph.D. |
// Gisselquist Technology, LLC |
// |
/////////////////////////////////////////////////////////////////////////// |
// |
// Copyright (C) 2015, Gisselquist Technology, LLC |
// |
// This program is free software (firmware): you can redistribute it and/or |
// modify it under the terms of the GNU General Public License as published |
// by the Free Software Foundation, either version 3 of the License, or (at |
// your option) any later version. |
// |
// This program is distributed in the hope that it will be useful, but WITHOUT |
// ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or |
// FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License |
// for more details. |
// |
// License: GPL, v3, as defined and found on www.gnu.org, |
// http://www.gnu.org/licenses/gpl.html |
// |
// |
/////////////////////////////////////////////////////////////////////////// |
// |
module flashcache(i_clk, |
// Wishbone control interface |
i_wb_cyc, i_wb_stb,i_wb_ctrl_stb, i_wb_we, i_wb_addr, i_wb_data, |
o_wb_ack, o_wb_stall, o_wb_data, |
// Wishbone copy interface |
o_cp_cyc, o_cp_stb, o_cp_we, o_cp_addr, o_cp_data, |
i_cp_ack, i_cp_stall, i_cp_data, |
o_int); |
parameter LGCACHELEN=10; // 4 kB |
input i_clk; |
// Control interface, CPU interface to cache |
input i_wb_cyc, i_wb_stb,i_wb_ctrl_stb, i_wb_we; |
input [(LGCACHELEN-1):0] i_wb_addr; |
input [31:0] i_wb_data; |
output reg o_wb_ack; |
output wire o_wb_stall; |
output wire [31:0] o_wb_data; |
// Interface to peripheral bus, including flash |
output reg o_cp_cyc, o_cp_stb; |
output wire o_cp_we; |
output reg [31:0] o_cp_addr; |
output wire [31:0] o_cp_data; |
input i_cp_ack, i_cp_stall; |
input [31:0] i_cp_data; |
// And an interrupt to send once we complete |
output reg o_int; |
|
reg loading; |
reg [31:0] cache_base; |
reg [31:0] cache [0:((1<<LGCACHELEN)-1)]; |
|
// Decouple writing the cache base from the highly delayed bus lines |
reg wr_cache_base_flag; |
reg [31:0] wr_cache_base_value; |
always @(posedge i_clk) |
wr_cache_base_flag <= ((i_wb_cyc)&&(i_wb_ctrl_stb)&&(i_wb_we)); |
always @(posedge i_clk) |
wr_cache_base_value<= { i_wb_data[31:1], 1'b0 }; |
|
initial cache_base = 32'hffffffff; |
always @(posedge i_clk) |
if (wr_cache_base_flag) |
cache_base <= wr_cache_base_value; |
|
reg new_cache_base; |
initial new_cache_base = 1'b0; |
always @(posedge i_clk) |
if ((wr_cache_base_flag)&&(cache_base != wr_cache_base_value)) |
new_cache_base <= 1'b1; |
else |
new_cache_base <= 1'b0; |
|
reg [(LGCACHELEN-1):0] rdaddr; |
initial loading = 1'b0; |
always @(posedge i_clk) |
if (new_cache_base) |
begin |
loading <= 1'b1; |
o_cp_cyc <= 1'b0; |
end else if ((~o_cp_cyc)&&(loading)) |
begin |
o_cp_cyc <= 1'b1; |
end else if (o_cp_cyc) |
begin |
// Handle the ack/read line |
if (i_cp_ack) |
begin |
if (&rdaddr) |
begin |
o_cp_cyc <= 1'b0; |
loading <= 1'b0; |
end |
end |
end |
always @(posedge i_clk) |
if (~o_cp_cyc) |
o_cp_addr <= cache_base; |
else if ((o_cp_cyc)&&(o_cp_stb)&&(~i_cp_stall)) |
o_cp_addr <= o_cp_addr + 1; |
always @(posedge i_clk) |
if ((~o_cp_cyc)&&(loading)) |
o_cp_stb <= 1'b1; |
else if ((o_cp_cyc)&&(o_cp_stb)&&(~i_cp_stall)) |
begin |
// We've made our last request |
if (o_cp_addr >= cache_base + { {(32-LGCACHELEN-1){1'b0}}, 1'b1, {(LGCACHELEN){1'b0}}}) |
o_cp_stb <= 1'b0; |
end |
always @(posedge i_clk) |
if (~loading) |
rdaddr <= 0; |
else if ((o_cp_cyc)&&(i_cp_ack)) |
rdaddr <= rdaddr + 1; |
|
initial o_int = 1'b0; |
always @(posedge i_clk) |
if ((o_cp_cyc)&&(i_cp_ack)&&(&rdaddr)) |
o_int <= 1'b1; |
else |
o_int <= 1'b0; |
|
assign o_cp_we = 1'b0; |
assign o_cp_data = 32'h00; |
|
|
// |
// Writes to our cache ... always delayed by a clock. |
// Clock 0 : Write request |
// Clock 1 : Write takes place |
// Clock 2 : Available for reading |
// |
reg we; |
reg [(LGCACHELEN-1):0] waddr; |
reg [31:0] wval; |
always @(posedge i_clk) |
we <= (loading)?((o_cp_cyc)&&(i_cp_ack)):(i_wb_cyc)&&(i_wb_stb)&&(i_wb_we); |
always @(posedge i_clk) |
waddr <= (loading)?rdaddr:i_wb_addr; |
always @(posedge i_clk) |
wval <= (loading)?i_cp_data:i_wb_data; |
|
always @(posedge i_clk) |
if (we) |
cache[waddr] <= wval; |
|
reg [31:0] cache_data; |
always @(posedge i_clk) |
if ((i_wb_cyc)&&(i_wb_stb)) |
cache_data <= cache[i_wb_addr]; |
|
always @(posedge i_clk) |
o_wb_ack <= (i_wb_cyc)&&( |
((i_wb_stb)&&(~loading)) |
||(i_wb_ctrl_stb)); |
reg ctrl; |
always @(posedge i_clk) |
ctrl <= i_wb_ctrl_stb; |
assign o_wb_data = (ctrl)?({cache_base[31:1],loading}):cache_data; |
assign o_wb_stall = (loading)&&(~o_wb_ack); |
|
endmodule |
/////////////////////////////////////////////////////////////////////////// |
// |
// Filename: zipcounter.v |
// |
// Project: Zip CPU -- a small, lightweight, RISC CPU soft core |
// |
// Purpose: |
// A very, _very_ simple counter. Its purpose doesn't really |
// include rollover, but it will interrupt on rollover. It can be set, |
// although my design concept is that it can be reset. It cannot be |
// halted. It will always produce interrupts--whether or not they are |
// handled interrupts is another question--that's up to the interrupt |
// controller. |
// |
// My intention is to use this counter for process accounting: I should |
// be able to use this to count clock ticks of processor time assigned to |
// each task by resetting the counter at the beginning of every task |
// interval, and reading the result at the end of the interval. As long |
// as the interval is less than 2^32 clocks, there should be no problem. |
// Similarly, this can be used to measure CPU wishbone bus stalls, |
// prefetch stalls, or other CPU stalls (i.e. stalling as part of a JMP |
// instruction, or a read from the condition codes following a write). |
// |
// |
// Creator: Dan Gisselquist, Ph.D. |
// Gisselquist Technology, LLC |
// |
/////////////////////////////////////////////////////////////////////////// |
// |
// Copyright (C) 2015, Gisselquist Technology, LLC |
// |
// This program is free software (firmware): you can redistribute it and/or |
// modify it under the terms of the GNU General Public License as published |
// by the Free Software Foundation, either version 3 of the License, or (at |
// your option) any later version. |
// |
// This program is distributed in the hope that it will be useful, but WITHOUT |
// ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or |
// FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License |
// for more details. |
// |
// License: GPL, v3, as defined and found on www.gnu.org, |
// http://www.gnu.org/licenses/gpl.html |
// |
// |
/////////////////////////////////////////////////////////////////////////// |
// |
module zipcounter(i_clk, i_ce, |
i_wb_cyc, i_wb_stb, i_wb_we, i_wb_data, |
o_wb_ack, o_wb_stall, o_wb_data, |
o_int); |
parameter BW = 32; |
input i_clk, i_ce; |
// Wishbone inputs |
input i_wb_cyc, i_wb_stb, i_wb_we; |
input [(BW-1):0] i_wb_data; |
// Wishbone outputs |
output reg o_wb_ack; |
output wire o_wb_stall; |
output reg [(BW-1):0] o_wb_data; |
// Interrupt line |
output reg o_int; |
|
initial o_wb_data = 32'h00; |
always @(posedge i_clk) |
if ((i_wb_cyc)&&(i_wb_stb)&&(i_wb_we)) |
o_wb_data <= i_wb_data; |
else if (i_ce) |
o_wb_data <= o_wb_data + 1; |
|
initial o_int = 0; |
always @(posedge i_clk) |
if (i_ce) |
o_int <= &o_wb_data; |
else |
o_int <= 1'b0; |
|
initial o_wb_ack = 1'b0; |
always @(posedge i_clk) |
o_wb_ack <= (i_wb_cyc)&&(i_wb_stb); |
assign o_wb_stall = 1'b0; |
endmodule |
//////////////////////////////////////////////////////////////////////////////// |
// |
// Filename: zipjiffies.v |
// |
// Project: Zip CPU -- a small, lightweight, RISC CPU soft core |
// |
// Purpose: This peripheral is motivated by the Linux use of 'jiffies'. |
// A process, in Linux, can request to be put to sleep until a certain |
// number of 'jiffies' have elapsed. Using this interface, the CPU can |
// read the number of 'jiffies' from this peripheral (it only has the |
// one location in address space), add the sleep length to it, and |
// write the result back to the peripheral. The zipjiffies peripheral |
// will record the value written to it only if it is nearer the current |
// counter value than the last current waiting interrupt time. If no |
// other interrupts are waiting, and this time is in the future, it will |
// be enabled. (There is currently no way to disable a jiffie interrupt |
// once set.) The processor may then place this sleep request into a |
// list among other sleep requests. Once the timer expires, it would |
// write the next jiffy request to the peripheral and wake up the process |
// whose timer had expired. |
// |
// Quite elementary, really. |
// |
// Interface: |
// This peripheral contains one register: a counter. Reads from the |
// register return the current value of the counter. Writes within |
// the (N-1) bit space following the current time set an interrupt. |
// Writes of values that occurred in the last 2^(N-1) ticks will be |
// ignored. The timer then interrupts when its value equals that time. |
// Multiple writes cause the jiffies timer to select the nearest possible |
// interrupt. Upon an interrupt, the next interrupt time/value is cleared |
// and will need to be reset if the CPU wants to get notified again. With |
// only the single interface, there is no way of knowing when the next |
// interrupt is scheduled for, neither is there any way to slow down the |
// interrupt timer in case you don't want it overflowing as often and you |
// wish to wait more jiffies than it supports. Thus, currently, if you |
// have a timer you wish to wait upon that is more than 2^31 into the |
// future, you would need to set timers along the way, wake up on those |
// timers, and set further timers until you finally get to your |
// destination. |
// |
// |
// Creator: Dan Gisselquist, Ph.D. |
// Gisselquist Technology, LLC |
// |
//////////////////////////////////////////////////////////////////////////////// |
// |
// Copyright (C) 2015, Gisselquist Technology, LLC |
// |
// This program is free software (firmware): you can redistribute it and/or |
// modify it under the terms of the GNU General Public License as published |
// by the Free Software Foundation, either version 3 of the License, or (at |
// your option) any later version. |
// |
// This program is distributed in the hope that it will be useful, but WITHOUT |
// ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or |
// FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License |
// for more details. |
// |
// License: GPL, v3, as defined and found on www.gnu.org, |
// http://www.gnu.org/licenses/gpl.html |
// |
// |
//////////////////////////////////////////////////////////////////////////////// |
// |
module zipjiffies(i_clk, i_ce, |
i_wb_cyc, i_wb_stb, i_wb_we, i_wb_data, |
o_wb_ack, o_wb_stall, o_wb_data, |
o_int); |
parameter BW = 32, VW = (BW-2); |
input i_clk, i_ce; |
// Wishbone inputs |
input i_wb_cyc, i_wb_stb, i_wb_we; |
input [(BW-1):0] i_wb_data; |
// Wishbone outputs |
output reg o_wb_ack; |
output wire o_wb_stall; |
output wire [(BW-1):0] o_wb_data; |
// Interrupt line |
output reg o_int; |
|
// |
// Our counter logic: The counter is always counting up--it cannot |
// be stopped or altered. It's really quite simple. Okay, not quite. |
// We still support the clock enable line. We do this in order to |
// support debugging, so that if we get everything running inside a |
// debugger, the timers all slow down so that everything can be stepped |
// together, one clock at a time. |
// |
reg [(BW-1):0] r_counter; |
always @(posedge i_clk) |
if (i_ce) |
r_counter <= r_counter+1; |
|
// |
// Writes to the counter set an interrupt--but only if they are in the |
// future as determined by the signed result of an unsigned subtract. |
// |
reg int_set, new_set; |
reg [(BW-1):0] int_when, new_when; |
wire signed [(BW-1):0] till_when, till_wb; |
assign till_when = int_when-r_counter; |
assign till_wb = new_when-r_counter; |
initial o_int = 1'b0; |
initial int_set = 1'b0; |
initial new_set = 1'b0; |
always @(posedge i_clk) |
begin |
o_int <= 1'b0; |
if ((i_ce)&&(int_set)&&(r_counter == int_when)) |
begin // Interrupts are self-clearing |
o_int <= 1'b1; // Set the interrupt flag |
int_set <= 1'b0;// Clear the interrupt |
end |
|
new_set <= 1'b0; |
if ((new_set)&&(till_wb > 0)&&((till_wb<till_when)||(~int_set))) |
begin |
int_when <= new_when; |
int_set <= ((int_set)||(till_wb>0)); |
end |
|
// Delay things by a clock to simplify our logic |
if ((i_wb_cyc)&&(i_wb_stb)&&(i_wb_we)) |
begin |
new_set <= 1'b1; |
new_when<= i_wb_data; |
end |
end |
|
// |
// Acknowledge any wishbone accesses -- everything we did took only |
// one clock anyway. |
// |
always @(posedge i_clk) |
o_wb_ack <= (i_wb_cyc)&&(i_wb_stb); |
assign o_wb_data = r_counter; |
assign o_wb_stall = 1'b0; |
|
endmodule |
/////////////////////////////////////////////////////////////////////////// |
// |
// Filename: ziptimer.v |
// |
// Project: Zip CPU -- a small, lightweight, RISC CPU soft core |
// |
// Purpose: |
// |
// Interface: |
// Two options: |
// 1. One combined register for both control and value, and ... |
// The reload value is set any time the timer data value is "set". |
// Reading the register returns the timer value. Controls are |
// set so that writing a value to the timer automatically starts |
// it counting down. |
// 2. Two registers, one for control one for value. |
// The control register would have the reload value in it. |
// On the clock when the interface is set to zero the interrupt is set. |
// Hence setting the timer to zero will disable the timer without |
// setting any interrupts. Thus setting it to five will count |
// 5 clocks: 5, 4, 3, 2, 1, Interrupt. |
// |
// |
// Control bits: |
// Start_n/Stop. Writing a '0' starts the timer, '1' stops it. |
// Thus, ignoring this bit sets it to start. |
// AutoReload. If set, then on reset the timer automatically |
// loads the last set value and starts over. This is |
// useful for distinguishing between a one-time interrupt |
// timer, and a repetitive interval timer. |
// (COUNT: If set, the timer only ticks whenever an external |
// line goes high. What this external line is ... is |
// not specified here. This, however, breaks my |
// interface ideal of having our peripheral set not depend |
// upon anything. Hence, this is an advanced option |
// enabled at compile time only.) |
// (INTEN. Interrupt enable--reaching zero always creates an |
// interrupt, so this control bit isn't needed. The |
// interrupt controller can be used to mask the interrupt.) |
// (COUNT-DOWN/UP: This timer is *only* a count-down timer. |
// There is no means of setting it to count up.) |
// WatchDog |
// This timer can be implemented as a watchdog timer simply by |
// connecting the interrupt line to the reset line of the CPU. |
// When the timer then expires, it will trigger a CPU reset. |
// |
// |
// Creator: Dan Gisselquist, Ph.D. |
// Gisselquist Technology, LLC |
// |
/////////////////////////////////////////////////////////////////////////// |
// |
// Copyright (C) 2015, Gisselquist Technology, LLC |
// |
// This program is free software (firmware): you can redistribute it and/or |
// modify it under the terms of the GNU General Public License as published |
// by the Free Software Foundation, either version 3 of the License, or (at |
// your option) any later version. |
// |
// This program is distributed in the hope that it will be useful, but WITHOUT |
// ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or |
// FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License |
// for more details. |
// |
// License: GPL, v3, as defined and found on www.gnu.org, |
// http://www.gnu.org/licenses/gpl.html |
// |
// |
/////////////////////////////////////////////////////////////////////////// |
// |
module ziptimer(i_clk, i_rst, i_ce, |
i_wb_cyc, i_wb_stb, i_wb_we, i_wb_data, |
o_wb_ack, o_wb_stall, o_wb_data, |
o_int); |
parameter BW = 32, VW = (BW-2); |
input i_clk, i_rst, i_ce; |
// Wishbone inputs |
input i_wb_cyc, i_wb_stb, i_wb_we; |
input [(BW-1):0] i_wb_data; |
// Wishbone outputs |
output reg o_wb_ack; |
output wire o_wb_stall; |
output wire [(BW-1):0] o_wb_data; |
// Interrupt line |
output reg o_int; |
|
reg r_auto_reload, r_running; |
reg [(VW-1):0] r_reload_value; |
initial r_running = 1'b0; |
initial r_auto_reload = 1'b0; |
always @(posedge i_clk) |
if (i_rst) |
begin |
r_running <= 1'b0; |
r_auto_reload <= 1'b0; |
end else if ((i_wb_cyc)&&(i_wb_stb)&&(i_wb_we)) |
begin |
r_running <= (~i_wb_data[(BW-1)])&&(|i_wb_data[(BW-2):0]); |
r_auto_reload <= (i_wb_data[(BW-2)]); |
|
// If auto-reload mode is being set with a nonzero value, |
// record that value as the auto-reload value |
if ((i_wb_data[(BW-2)])&&(|i_wb_data[(BW-3):0])) |
r_reload_value <= i_wb_data[(BW-3):0]; |
end |
|
reg [(VW-1):0] r_value; |
initial r_value = 0; |
always @(posedge i_clk) |
if ((r_running)&&(|r_value)&&(i_ce)) |
begin |
r_value <= r_value - 1; |
end else if ((r_running)&&(r_auto_reload)) |
r_value <= r_reload_value; |
else if ((~r_running)&&(i_wb_cyc)&&(i_wb_stb)&&(i_wb_we)) |
r_value <= i_wb_data[(VW-1):0]; |
|
// Set the interrupt on our last tick. |
initial o_int = 1'b0; |
always @(posedge i_clk) |
if (i_ce) |
o_int <= (r_running)&&(r_value == { {(VW-1){1'b0}}, 1'b1 }); |
else |
o_int <= 1'b0; |
|
initial o_wb_ack = 1'b0; |
always @(posedge i_clk) |
o_wb_ack <= (i_wb_cyc)&&(i_wb_stb); |
assign o_wb_stall = 1'b0; |
|
assign o_wb_data = { ~r_running, r_auto_reload, r_value }; |
|
endmodule |
/////////////////////////////////////////////////////////////////////////// |
// |
// Filename: ziptrap.v |
// |
// Project: Zip CPU -- a small, lightweight, RISC CPU soft core |
// |
// Purpose: On any write, generate an interrupt. On any read, return |
// the value from the last write. |
// |
// This peripheral was added to the Zip System to compensate for the lack |
// of any trap instruction within the Zip instruction set. Such an |
// instruction is used heavily by modern operating systems to switch |
// from a user process to a system process. Since there was no way |
// to build such an interface without a trap instruction, this was added |
// to accomplish that purpose. |
// |
// However, in early simulation testing it was discovered that this |
// approach would not be very suitable: the interrupt was not generated |
// the next clock as one would expect. Hence, executing a trap became: |
// |
// TRAP $5 MOV $TrapAddr, R0 |
// LDI $5,R1 |
// STO R1,(R0) |
// NOOP |
// NOOP -- here the trap would take effect |
// ADD $5,R6 ADD $5,R6 |
// |
// This was too cumbersome, necessitating NOOPS and such. Therefore, |
// the CC register was extended to hold a trap value. This leads to |
// |
// TRAP $5 LDI $500h,CC |
// ; Trap executes immediately, user sees no |
// ; delays, no extra wait instructions. |
// ADD $5,R6 ADD $5,R6 |
// |
// (BTW: The add is just the "next instruction", whatever that may be.) |
// Note the difference: there's no longer any need to load the trap |
// address into a register (something that usually could not be done with |
// a move, but rather a LDIHI/LDILO pair). There's no longer any wait |
// for the Wishbone bus, which could've introduced a variable delay. |
// Neither are there any wait states while waiting for the system process |
// to take over and respond. Oh, and another difference, the new approach |
// no longer requires the system to activate an interrupt line--the user |
// process can always initiate such an interrupt. Hence, the new |
// solution is better, rendering this peripheral obsolete. |
// |
// It is maintained here to document this part of the learning process. |
// |
// |
// |
// Creator: Dan Gisselquist, Ph.D. |
// Gisselquist Technology, LLC |
// |
/////////////////////////////////////////////////////////////////////////// |
// |
// Copyright (C) 2015, Gisselquist Technology, LLC |
// |
// This program is free software (firmware): you can redistribute it and/or |
// modify it under the terms of the GNU General Public License as published |
// by the Free Software Foundation, either version 3 of the License, or (at |
// your option) any later version. |
// |
// This program is distributed in the hope that it will be useful, but WITHOUT |
// ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or |
// FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License |
// for more details. |
// |
// License: GPL, v3, as defined and found on www.gnu.org, |
// http://www.gnu.org/licenses/gpl.html |
// |
// |
/////////////////////////////////////////////////////////////////////////// |
// |
module ziptrap(i_clk, |
i_wb_cyc, i_wb_stb, i_wb_we, i_wb_data, |
o_wb_ack, o_wb_stall, o_wb_data, |
o_int); |
parameter BW = 32; // Bus width |
input i_clk; |
// Wishbone inputs |
input i_wb_cyc, i_wb_stb, i_wb_we; |
input [(BW-1):0] i_wb_data; |
// Wishbone outputs |
output reg o_wb_ack; |
output wire o_wb_stall; |
output reg [(BW-1):0] o_wb_data; |
// Interrupt output |
output reg o_int; |
|
initial o_wb_ack = 1'b0; |
always @(posedge i_clk) |
o_wb_ack <= ((i_wb_cyc)&&(i_wb_stb)); |
assign o_wb_stall = 1'b0; |
|
// Initially set to some out of bounds value, such as all ones. |
initial o_wb_data = {(BW){1'b1}}; |
always @(posedge i_clk) |
if ((i_wb_cyc)&&(i_wb_stb)&&(i_wb_we)) |
o_wb_data <= i_wb_data; |
|
// Set the interrupt bit on any write. |
initial o_int = 1'b0; |
always @(posedge i_clk) |
if ((i_wb_cyc)&&(i_wb_stb)&&(i_wb_we)) |
o_int <= 1'b1; |
else |
o_int <= 1'b0; |
|
endmodule |
//////////////////////////////////////////////////////////////////////////////// |
// |
// Filename: icontrol.v |
// |
// Project: Zip CPU -- a small, lightweight, RISC CPU soft core |
// |
// Purpose: An interrupt controller, for managing many interrupt sources. |
// |
// This interrupt controller started from the question of how best to |
// design a simple interrupt controller. As such, it has a few nice |
// qualities to it: |
// 1. This is wishbone compliant |
// 2. It sits on a 32-bit wishbone data bus |
// 3. It only consumes one address on that wishbone bus. |
// 4. There are no extra delays associated with reading this |
// device. |
// 5. Common operations can all be done in one clock. |
// |
// So, how shall this be used? First, the 32-bit word is broken down as |
// follows: |
// |
// Bit 31 - This is the global interrupt enable bit. If set, interrupts |
// will be generated and passed on as they come in. |
// Bits 16-30 - These are specific interrupt enable lines. If set, |
// interrupts from source (bit#-16) will be enabled. |
// To set this line and enable interrupts from this source, write |
// to the register with this bit set and the global enable set. |
// To disable this line, write to this register with global enable |
// bit not set, but this bit set. (Writing a zero to any of these |
// bits has no effect, either setting or unsetting them.) |
// Bit 15 - This is the any interrupt pin. If any interrupt is pending, |
// this bit will be set. |
// Bits 0-14 - These are interrupt bits. When set, an interrupt is |
// pending from the corresponding source--regardless of whether |
// it was enabled. (If not enabled, it won't generate an |
// interrupt, but it will still register here.) To clear any |
// of these bits, write a '1' to the corresponding bit. Writing |
// a zero to any of these bits has no effect. |
// |
// The peripheral also sports a parameter, IUSED, which can be set |
// to any value between 1 and (buswidth/2-1, or) 15 inclusive. This will |
// be the number of interrupts handled by this routine. (Without the |
// parameter, Vivado was complaining about unused bits. With it, we can |
// keep the complaints down and still use the routine). |
// |
// To get access to more than 15 interrupts, chain these together, so |
// that one interrupt controller device feeds another. |
// |
// |
// Creator: Dan Gisselquist, Ph.D. |
// Gisselquist Technology, LLC |
// |
//////////////////////////////////////////////////////////////////////////////// |
// |
// Copyright (C) 2015, Gisselquist Technology, LLC |
// |
// This program is free software (firmware): you can redistribute it and/or |
// modify it under the terms of the GNU General Public License as published |
// by the Free Software Foundation, either version 3 of the License, or (at |
// your option) any later version. |
// |
// This program is distributed in the hope that it will be useful, but WITHOUT |
// ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or |
// FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License |
// for more details. |
// |
// License: GPL, v3, as defined and found on www.gnu.org, |
// http://www.gnu.org/licenses/gpl.html |
// |
// |
//////////////////////////////////////////////////////////////////////////////// |
// |
module icontrol(i_clk, i_reset, i_wr, i_proc_bus, o_proc_bus, |
i_brd_ints, o_interrupt_strobe); |
parameter IUSED = 15; |
input i_clk, i_reset; |
input i_wr; |
input [31:0] i_proc_bus; |
output wire [31:0] o_proc_bus; |
input [(IUSED-1):0] i_brd_ints; |
output reg o_interrupt_strobe; |
|
reg [(IUSED-1):0] r_int_state; |
reg [(IUSED-1):0] r_int_enable; |
wire [(IUSED-1):0] nxt_int_state; |
reg r_any, r_interrupt, r_gie; |
|
assign nxt_int_state = (r_int_state|i_brd_ints); |
initial r_int_state = 0; |
always @(posedge i_clk) |
if (i_reset) |
r_int_state <= 0; |
else if (i_wr) |
r_int_state <= nxt_int_state & (~i_proc_bus[(IUSED-1):0]); |
else |
r_int_state <= nxt_int_state; |
initial r_int_enable = 0; |
always @(posedge i_clk) |
if (i_reset) |
r_int_enable <= 0; |
else if ((i_wr)&&(i_proc_bus[31])) |
r_int_enable <= r_int_enable | i_proc_bus[(16+IUSED-1):16]; |
else if ((i_wr)&&(~i_proc_bus[31])) |
r_int_enable <= r_int_enable & (~ i_proc_bus[(16+IUSED-1):16]); |
|
initial r_gie = 1'b0; |
always @(posedge i_clk) |
if (i_reset) |
r_gie <= 1'b0; |
else if (i_wr) |
r_gie <= i_proc_bus[31]; |
|
initial r_any = 1'b0; |
always @(posedge i_clk) |
r_any <= ((r_int_state & r_int_enable) != 0); |
initial r_interrupt = 1'b0; |
always @(posedge i_clk) |
r_interrupt <= r_gie & r_any; |
|
generate |
if (IUSED < 15) |
begin |
assign o_proc_bus = { |
r_gie, { {(15-IUSED){1'b0}}, r_int_enable }, |
r_any, { {(15-IUSED){1'b0}}, r_int_state } }; |
end else begin |
assign o_proc_bus = { r_gie, r_int_enable, r_any, r_int_state }; |
end endgenerate |
|
reg int_condition; |
initial int_condition = 1'b0; |
initial o_interrupt_strobe = 1'b0; |
always @(posedge i_clk) |
if (i_reset) |
begin |
int_condition <= 1'b0; |
o_interrupt_strobe <= 1'b0; |
end else if (~r_interrupt) // This might end up generating |
begin // many, many, (wild many) interrupts |
int_condition <= 1'b0; |
o_interrupt_strobe <= 1'b0; |
end else if ((~int_condition)&&(r_interrupt)) |
begin |
int_condition <= 1'b1; |
o_interrupt_strobe <= 1'b1; |
end else |
o_interrupt_strobe <= 1'b0; |
|
endmodule |
Zip CPU Goals
The original goal of the Zip CPU was a simple CPU. For this reason, all instructions have been designed to be as simple as possible, and are all designed to be executed in one instruction cycle per instruction, barring pipeline stalls. This has resulted in the choice to drop push and pop instructions, pre-increment and post-decrement addressing modes, and more.
For those who like buzzwords, the Zip CPU is:
- A 32-bit CPU: All registers are 32 bits, addresses are 32 bits, instructions are 32 bits wide, etc.
- A RISC CPU. There is no microcode for executing instructions.
- A Load/Store architecture. (Only load and store instructions can access memory.)
- Wishbone compliant. All peripherals are accessed just like memory across this bus.
- A Von-Neumann architecture. (The instructions and data share a common bus.)
- A pipelined architecture, having stages for Prefetch, Decode, Read-Operand, the ALU/Memory unit, and Write-back
Now, however, that I've worked on the Zip CPU for a while, it is not nearly as simple as I originally hoped. Worse, I've had to adjust to create capabilities that I was never expecting to need. These include:
- External Debug: Once placed upon an FPGA, I'm going to need a means of debugging this CPU. That means that there needs to be an external register that can control the CPU: reset it, halt it, step it, and tell whether it is running or not. Another register sits alongside this one, to allow the external controller to examine registers internal to the CPU.
- Internal Debug: Being able to run a debugger from within a user process requires an ability to step a user process from within a debugger. It also requires a break instruction that can be substituted for any other instruction, and substituted back. The break is actually difficult: the break instruction cannot be allowed to execute. That way, upon a break, the debugger should be able to jump back into the user process to step the instruction that would've been at the break point initially, and then to replace the break after passing it.
- Prefetch Cache: My original implementation had a very simple prefetch stage. Any time the PC changed the prefetch would go and fetch the new instruction. While this was perhaps the simplest approach, it cost roughly five clocks for every instruction. This was deemed unacceptable, as I wanted a CPU that could execute instructions in one cycle. I therefore have a prefetch cache that issues pipelined wishbone accesses to memory and then pushes instructions at the CPU. Sadly, this accounts for about 20% of the logic in the entire CPU, or 15% of the logic in the entire system.
- Operating System: In order to support an operating system, interrupts and so forth, the CPU needs to support supervisor and user modes, as well as a means of switching between them. For example, the user needs a means of executing a system call. This is the purpose of the 'trap' instruction. This instruction needs to place the CPU into supervisor mode (here equivalent to disabling interrupts), as well as handing it a parameter such as identifying which O/S function was called.
My initial approach to building a trap instruction was to create an external peripheral which, when written to, would generate an interrupt and could return the last value written to it. This failed timing requirements, however: the CPU executed two instructions while waiting for the trap interrupt to take place. Since then, I've decided to keep the rest of the CC register for that purpose so that a write to the CC register, with the GIE bit cleared, could be used to execute a trap.
Modern timesharing systems also depend upon a Timer interrupt to handle task swapping. For the Zip CPU, this interrupt is handled external to the CPU as part of the CPU System, found in zipsystem.v. The timer module itself is found in ziptimer.v.
- Pipeline Stalls: My original plan was to not support pipeline stalls at all, but rather to require the compiler to properly schedule instructions so that stalls would never be necessary. After trying to build such an architecture, I gave up, having learned some things:
For example, in order to facilitate interrupt handling and debug stepping, the CPU needs to know what instructions have finished, and which have not. In other words, it needs to know where it can restart the pipeline from. Once restarted, it must act as though it had never stopped. This killed my idea of delayed branching, since what would be the appropriate program counter to restart at? The one the CPU was going to branch to, or the ones in the delay slots?
So I switched to a model of discrete execution: Once an instruction enters into either the ALU or memory unit, the instruction is guaranteed to complete. If the logic recognizes a branch or a condition that would render the instruction entering into this stage possibly inappropriate (e.g., a conditional branch preceding a store instruction), then the pipeline stalls for one cycle until the conditional branch completes. Then, if it generates a new PC address, the preceding stages are all wiped clean.
The discrete execution model allows such things as sleeping: if the CPU is put to "sleep", the ALU and memory stages stall and back up everything before them. Likewise, anything that has entered the ALU or memory stage when the CPU is placed to sleep continues to completion.
To handle this logic, each pipeline stage has three control signals: a valid signal, a stall signal, and a clock enable signal. In general, a stage stalls if its contents are valid and the next step is stalled. This allows the pipeline to fill any time a later stage stalls.
- Verilog Modules: When examining how other processors worked here on OpenCores, many of them had one separate module per pipeline stage. While this appeared to me to be a fascinating and commendable idea, my own implementation didn't work out quite so nicely.
As an example, the decode module produces a lot of control wires and registers. Creating a module out of this, with only the simplest of logic within it, seemed to be more a lesson in passing wires around, rather than encapsulating logic.
Another example was the register writeback section. I would love this section to be a module in its own right, and many have made them such. However, other modules depend upon writeback results other than just what's placed in the register (i.e., the control wires). For these reasons, I didn't manage to fit this section into its own module.
The result is that the majority of the CPU code can be found in the zipcpu.v file.
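The valid/stall rule described under Pipeline Stalls can be sketched as a small software model. This is only an illustration in C, not the logic in zipcpu.v: the type `pipeline_t`, the function `pipeline_update`, and the `forced` argument are hypothetical names, and the clock-enable rule shown is an assumption (a stage loads when it is not stalled and the stage before it holds something valid).

```c
#include <assert.h>
#include <stdbool.h>

/* Stage names follow the pipeline listed earlier; the enum is hypothetical. */
enum { PREFETCH, DECODE, READ_OPS, ALU_MEM, WRITEBACK, NSTAGES };

typedef struct {
    bool valid[NSTAGES]; /* stage currently holds a live instruction */
    bool stall[NSTAGES]; /* stage cannot accept new work this clock  */
    bool ce[NSTAGES];    /* stage's registers may load this clock    */
} pipeline_t;

/*
 * Recompute stall[] and ce[] from valid[].  `forced` names one stage that
 * is stalled for its own reasons (e.g. ALU_MEM while the CPU sleeps), or
 * -1 if no stage is.
 */
static void pipeline_update(pipeline_t *p, int forced)
{
    assert(forced >= -1 && forced < NSTAGES);

    bool next_stalled = false; /* nothing follows write-back */
    for (int s = NSTAGES - 1; s >= 0; s--) {
        /* Rule from the text: valid contents and a stalled successor. */
        p->stall[s] = (s == forced) || (p->valid[s] && next_stalled);
        next_stalled = p->stall[s];
    }

    /* Assumed clock-enable rule: load when not stalled and the previous
     * stage has something valid to hand over (prefetch has no predecessor). */
    for (int s = 0; s < NSTAGES; s++)
        p->ce[s] = !p->stall[s] && (s == 0 || p->valid[s - 1]);
}
```

With every stage valid, a stall forced at `ALU_MEM` backs up all the way to `PREFETCH`. If `READ_OPS` is empty instead, the stall stops there and the earlier stages keep moving, which is how the pipeline fills while a later stage is stalled.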
Zip CPU Instruction Set
The Zip CPU supports a set of two-operand instructions, where the first operand (always a register) is the result. The only exception is the store instruction, where the first operand (always a register) is the source of the data to be stored.
Register Set
The Zip CPU supports two sets of sixteen 32-bit registers, a supervisor and a user set. The supervisor set is used in interrupt mode, whereas the user set is used otherwise. Of this register set, the Program Counter (PC) is register 15, whereas the status register (SR) or condition code register (CC) is register 14. By convention, the stack pointer will be register 13 and noted as (SP)--although the instruction set allows it to be anything. The CPU can access both register sets via move instructions from the supervisor state, whereas the user state can only access the user registers.
The status register is special, and bears further mention. The lower 8 bits of the status register form a set of condition codes. Writes to other bits are preserved, and can be used as part of the trap architecture--examined by the O/S upon any interrupt, cleared before returning.
Of the eight condition codes, the bottom four are the current flags: + Zero (Z), + Carry (C), + Negative (N), + and Overflow (V). + +
The next bit is a clock enable (0 to enable) or sleep bit (1 to put + the CPU to sleep). Setting this bit will cause the CPU to + wait for an interrupt (if interrupts are enabled), or to + completely halt (if interrupts are disabled). +
The sixth bit is a global interrupt enable bit (GIE). When this + sixth bit is a '1' interrupts will be enabled, else disabled. When + interrupts are disabled, the CPU will be in supervisor mode, otherwise + it is in user mode. Thus, to execute a context switch, one only + need enable or disable interrupts. (When an interrupt line goes + high, interrupts will automatically be disabled, as the CPU goes + and deals with its context switch.) +
Experimental: The seventh bit is a step bit. This bit can be set from supervisor mode only. After setting this bit, should the supervisor mode process switch to user mode, it would then execute one instruction in user mode before returning to supervisor mode. Then, upon return to supervisor mode, this bit will be automatically cleared. This bit has no effect on the CPU while in supervisor mode.
This functionality was added to allow a userspace debugger to operate on a user process, working through supervisor mode of course.
Experimental: The eighth bit is a break enable bit. This controls whether a break instruction will halt the processor for an external debugger (break enabled), or whether the break instruction will simply set the STEP bit and send the CPU into interrupt mode. This bit can only be set within supervisor mode.
This functionality was added to enable an external debugger to + set and manage breakpoints. +
The status register bits are shown below: +
7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
---|---|---|---|---|---|---|---|
BREAKEN | STEP | GIE | SLEEP | V | N | C | Z |
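To make the bit layout above concrete, here is a minimal Python sketch that names each of the low eight status register bits and summarises a register value. The constant and function names are my own illustrations, not identifiers from the Zip CPU sources; only the bit positions and their meanings come from the table and the description above.

```python
# Bit positions taken from the status register table above.
# All names here are illustrative; they do not come from the CPU sources.
CC_Z = 1 << 0        # Zero flag
CC_C = 1 << 1        # Carry flag
CC_N = 1 << 2        # Negative flag
CC_V = 1 << 3        # Overflow flag
CC_SLEEP = 1 << 4    # 1 puts the CPU to sleep
CC_GIE = 1 << 5      # 1 enables interrupts (user mode)
CC_STEP = 1 << 6     # experimental single-step bit
CC_BREAKEN = 1 << 7  # experimental break enable bit


def describe_cc(cc):
    """Summarise the low eight bits of a status register value."""
    flag_bits = (("V", CC_V), ("N", CC_N), ("C", CC_C), ("Z", CC_Z))
    return {
        "flags": "".join(name for name, bit in flag_bits if cc & bit),
        "sleep": bool(cc & CC_SLEEP),
        # Interrupts enabled means user mode, disabled means supervisor mode.
        "mode": "user" if cc & CC_GIE else "supervisor",
        "step": bool(cc & CC_STEP),
        "break_enabled": bool(cc & CC_BREAKEN),
    }


def is_halted(cc):
    """SLEEP with interrupts disabled halts; with GIE set it is only a wait."""
    return bool(cc & CC_SLEEP) and not (cc & CC_GIE)
```

For example, `describe_cc(CC_GIE | CC_Z)` reports user mode with only the Z flag set, while `is_halted(CC_SLEEP)` is true because interrupts are disabled.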
Conditions
Most, although not quite all, instructions are conditional. From the four condition code flags, eight conditions are defined. These are:
Code | Mnemonic | Condition |
---|---|---|
3'h0 | (None) | Always |
3'h1 | .Z | Equal (Zero set) |
3'h2 | .NE | Not equal to (!Z) |
3'h3 | .GE | Greater than or equal (N not set, Z irrelevant) |
3'h4 | .GT | Greater than (N not set, Z not set) |
3'h5 | .LT | Less than (N set) |
3'h6 | .C | Carry set |
3'h7 | .V | Overflow set |
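The condition table can be read as a small lookup from the 3-bit code to a test on the flags. The following Python sketch follows the table row by row; the function name and its argument order are assumptions made for illustration only.

```python
def condition_met(code, z, c, n, v):
    """Return True when the 3-bit condition code is satisfied by the flags.

    z, c, n and v are the Zero, Carry, Negative and Overflow flags.
    The mapping mirrors the condition table above, row by row.
    """
    table = {
        0: True,                 # (None): always
        1: z,                    # .Z:  zero set
        2: not z,                # .NE: zero clear
        3: not n,                # .GE: N not set, Z irrelevant
        4: (not n) and (not z),  # .GT: N not set, Z not set
        5: n,                    # .LT: N set
        6: c,                    # .C:  carry set
        7: v,                    # .V:  overflow set
    }
    return bool(table[code & 7])
```

Note that, as the table defines them, none of the conditions consults the overflow flag except `.V` itself, and there is no direct "carry clear" test, so a branch on not-carry needs a workaround.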
Operand B
Many instruction forms have a 21-bit source "Operand B" associated with them. This Operand B is either equal to a register plus a signed immediate offset, or an immediate offset by itself. This value is encoded as:
Bit 20 | Bits 19..0 |
---|---|
1'b0 | Signed Immediate Value |
1'b1 | Register, followed by Signed immediate offset |
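A decoder for this field might look like the Python sketch below. The text fixes bit 20 as the selector and the 20-bit immediate form, but it does not state how the remaining 20 bits are divided in the register form. I have assumed a 4-bit register number in bits 19..16 (enough for sixteen registers) followed by a 16-bit signed offset; treat that split, and all of the names, as assumptions.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's complement number."""
    value &= (1 << bits) - 1
    sign = 1 << (bits - 1)
    return (value ^ sign) - sign


def decode_operand_b(field):
    """Split a 21-bit Operand B into (register or None, signed immediate)."""
    field &= (1 << 21) - 1
    if field & (1 << 20):
        # ASSUMED layout: register in bits 19..16, 16-bit signed offset below.
        reg = (field >> 16) & 0xF
        return reg, sign_extend(field, 16)
    # Bit 20 clear: the remaining 20 bits are a signed immediate.
    return None, sign_extend(field, 20)


def operand_b_value(field, regs):
    """Compute the 32-bit Operand B value given a list of register values."""
    reg, imm = decode_operand_b(field)
    base = regs[reg] if reg is not None else 0
    return (base + imm) & 0xFFFFFFFF
```

With `regs[13] = 100`, the field `(1 << 20) | (13 << 16) | 0xFFFF` evaluates to 99 under these assumptions, i.e. the register value plus an offset of -1.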
Address Mode(s)
The Zip CPU supports two addressing modes: register plus immediate, and immediate address. Addresses are therefore encoded in the same fashion as Operand B's, shown above.

A lot of long hard thought was put into whether to allow pre/post increment and decrement addressing modes. Having found no way to use these operators without taking two or more clocks per instruction, I have removed these addressing modes from the realm of possibilities. This means that the Zip CPU has no native way of executing push, pop, return, or jump to subroutine operations.
Move Operands
The previous set of operands would be perfect and complete, save only that the CPU needs access to non-supervisory registers while in supervisory mode. Therefore, the MOV instruction is special and offers access to these registers ... when in supervisory mode. To keep the compiler simple, the extra bits are ignored in non-supervisory mode (as though they didn't exist), rather than being mapped to new instructions or additional capabilities. The bits indicating which register set each register lies within are the A-map and B-map bits. Further, because a load immediate instruction exists, there is no move capability between an immediate and a register: all moves come from either a register or a register plus an offset.
This actually leads to a bit of a problem: since the MOV instruction + encodes which register set each register is coming from or moving to, + how shall a compiler or assembler know how to compile a MOV instruction + without knowing the mode of the CPU at the time? For this reason, + the compiler will assume all MOV registers are supervisor registers, + and display them as normal. Anything with the user bit set will + be treated as a user register. The CPU will quietly ignore the + supervisor bits while in user mode, and anything marked as a user + register will always be valid. +
Native Instructions
Op Code | Encoding | Remaining fields, from the high bits down to bit 0 | Sets CC? (Y/N) |
---|---|---|---|
CMP(Sub) | 4'h0 | Data Reg, Conditions, Operand B | Y |
BTST(And) | 4'h1 | Data Reg, Conditions, Operand B | Y |
MOV | 4'h2 | Data Reg, Conditions, A-Map, B-Reg, B-Map, Immediate | N |
LODI | 4'h3 | Result Reg, 24-bit Signed Immediate | N |
NOOP | 4'h4 | 4'he, 24'h00 | N |
BREAK | 4'h4 | 4'he, 24'h01 | N |
LODIHI | 4'h4 | 4'hf, Conditions, 1'b1, Result Reg, 16-bit Immediate | N |
LODILO | 4'h4 | 4'hf, Conditions, 1'b0, Result Reg, 16-bit Immediate | N |
16-b MPY | 4'h4 | Result Reg, Conditions, Operand B (Reserved for) | N |
ROL | 4'h5 | Result Reg, Conditions, 2'b11, Operand Reg, 6'h00 (Unused/Reserved), 1'b0, Immediate | N |
 | | 1'b0, Rotate amount, 6'h00 (Unused/Reserved), 1'b0, Immediate | N |
LOD | 4'h6 | Resulting Reg, Conditions, Address: Register+Immediate, or Immediate | N |
STO | 4'h7 | Data Reg, Conditions, Address: Register+Immediate, or Immediate | N |
SUB | 4'h8 | Result Reg, Conditions, Operand B | Y |
AND | 4'h9 | Result Reg, Conditions, Operand B | Y |
ADD | 4'ha | Result Reg, Conditions, Operand B | Y |
OR | 4'hb | Result Reg, Conditions, Operand B | Y |
XOR | 4'hc | Result Reg, Conditions, Operand B | Y |
LSL/ASL | 4'hd | Result Reg, Conditions, Operand B | Y |
ASR | 4'he | Result Reg, Conditions, Operand B | Y |
LSR | 4'hf | Result Reg, Conditions, Operand B | Y |
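For the common "register, conditions, Operand B" form used by CMP, BTST, LOD, STO and the ALU instructions, the field widths add up to 32 bits as 4 + 4 + 3 + 21. The Python sketch below splits and rebuilds such a word on the assumption that the fields are packed in that order from bit 31 downward; the exact bit positions and the function names are my assumptions, and the other instruction forms (MOV, LODI, the 4'h4 group, ROL) are not handled.

```python
def split_instruction(word):
    """Split a 32-bit word of the 'reg, conditions, Operand B' form.

    ASSUMED packing: op code in bits 31..28, register in bits 27..24,
    conditions in bits 23..21, Operand B in bits 20..0.
    """
    word &= 0xFFFFFFFF
    return {
        "opcode": (word >> 28) & 0xF,
        "reg": (word >> 24) & 0xF,
        "cond": (word >> 21) & 0x7,
        "operand_b": word & 0x1FFFFF,
    }


def join_instruction(opcode, reg, cond, operand_b):
    """Inverse of split_instruction under the same assumed packing."""
    return (
        ((opcode & 0xF) << 28)
        | ((reg & 0xF) << 24)
        | ((cond & 0x7) << 21)
        | (operand_b & 0x1FFFFF)
    )
```

Under these assumptions, an ADD (4'ha) to register 3 with condition 3'h1 and an immediate Operand B of 5 round-trips through the two functions unchanged.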
Derived Instructions
Mapped | Actual | Notes |
---|---|---|
ADD Ra,Rx; ADDC Rb,Ry | ADD Ra,Rx; ADD.C $1,Ry; ADD Rb,Ry | Add with carry |
BRA.cond +/-$Addr | MOV.cond $Addr+PC,PC | Branch/jump on condition. Works for 14 bit address offsets. |
 | LDI $Addr,Rx; ADD.cond Rx,PC | Branch/jump on condition. Works for 23 bit address offsets, but costs a register, an extra instruction, and sets the flags. |
BNC PC+$Addr | TEST $Carry,CC; MOV.Z PC+$addr,PC | Example of a branch on an unsupported condition, in this case a branch on not carry |
CLRF.NZ Rx | XOR.NZ Rx,Rx | Clear Rx, and flags, if the Z-bit is not set |
CLR Rx | LDI $0,Rx | Clears Rx, leaves flags untouched. This instruction cannot be conditional. |
EXCH.W Rx | ROL $16,Rx | Exchanges the top and bottom 16-bit words of Rx |
HALT | OR $SLEEP,CC | Executed while in interrupt mode. In user mode this is simply a wait until interrupt instruction. |
INT | AND $!GIE,CC | Without setting an interrupt flag or trap vector, the O/S might not know what to do with this instruction. Therefore the trap version is recommended. |
IRET | OR $GIE,CC | |
JMP R6+$Addr | MOV $Addr(R6),PC | |
JSR PC+$Addr | SUB $1,SP; MOV $3+PC,R0; STO R0,1(SP); MOV $Addr+PC,PC; ADD $1,SP | Jump to Subroutine. |
 | MOV $3+PC,R12; MOV $addr+PC,PC | This is the high speed version of the call, necessitating a register to hold the last PC address. In its favor, this method doesn't suffer the mandatory memory access of the other approach. |
JTU | OR $GIE,CC | Also known as a JUMP-To-USER space command, also known as IRET. |
LDI.l $val,Rx | LDIHI HIBITS($val),Rx; LDILO LOBITS($val),Rx | Sadly, there's not enough instruction space to load a complete immediate value into any register. Therefore, fully loading any register takes two cycles. The LDIHI (load immediate high) and LDILO (load immediate low) instructions have been created to facilitate this. |
LOD.b $addr,Rx | LDI $addr,Ra; LDI $addr,Rb; LSR $2,Ra; AND $3,Rb; LOD (Ra),Rx; LSL $3,Rb; SUB $32,Rb; ROL Rb,Rx; AND $0ffh,Rx | This CPU is designed for 32-bit word length instructions. Byte addressing is not supported by the CPU or the bus, so it therefore takes more work to do. Note that in this example, $Addr is a byte-wise address, where all other addresses are 32-bit wordlength addresses. For this reason, we needed to drop the bottom two bits. |
LSL $1,Rx; LSLC $1,Ry | LSL $1,Ry; LSL $1,Rx; OR.C $1,Ry | Logical shift left with carry. Note that the instruction order is now backwards, to keep the conditions valid. That is, LSL sets the carry flag, so if we did this the other way with Rx before Ry, then the condition flag wouldn't have been right for an OR correction at the end. |
LSR $1,Rx; LSRC $1,Ry | CLR Rz; LSR $1,Ry; LDIHI.C $8000h,Rz; LSR $1,Rx; OR Rz,Rx | Logical shift right with carry |
NEG Rx | XOR $-1,Rx; ADD $1,Rx | |
NOOP | NOOP | While there are many operations that do nothing, such as MOV Rx,Rx, or OR $0,Rx, these operations have consequences in that they might stall the bus if Rx isn't ready yet. For this reason, we have a dedicated NOOP instruction. |
NOT Rx | XOR $-1,Rx | |
POP Rx | LOD $-1(SP),Rx; ADD $1,SP | Note that for interrupt purposes, one can never depend upon the value at (SP). Hence you read from it, then increment it, lest having incremented it first something then comes along and writes to that value before you can read the result. |
PUSH Rx | SUB $1,SP; STO Rx,$1(SP) | |
RESET | STO $1,$watchdog(R12); NOOP; NOOP | This depends upon the peripheral base address being in R12. Another opportunity might be to jump to the reset address from within supervisor mode. |
RET | LOD $-1(SP),R0; MOV $-1+SP,SP; MOV R0,PC | An alternative might be to LOD $-1(SP),PC, followed by depending upon the calling program to ADD $1,SP. |
 | MOV R12,PC | This is the high(er) speed version, that doesn't touch the stack. As such, it doesn't suffer a stall on memory read/write to the stack. |
STEP Rr,Rt | LSR $1,Rr; XOR.C Rt,Rr | Step a Galois implementation of a Linear Feedback Shift Register, Rr, using taps Rt |
STO.b Rx,$addr | LDI $addr,Ra; LDI $addr,Rb; LSR $2,Ra; AND $3,Rb; SUB $32,Rb; LOD (Ra),Ry; AND $0ffh,Rx; AND $-0ffh,Ry; ROL Rb,Rx; OR Rx,Ry; STO Ry,(Ra) | This CPU and its bus are not optimized for byte-wise operations. Note that in this example, $addr is a byte-wise address, whereas in all of our other examples it is a 32-bit word address. Further, this instruction implies a byte ordering, such as big or little endian. |
SWAP Rx,Ry | XOR Ry,Rx; XOR Rx,Ry; XOR Ry,Rx | While no extra registers are needed, this example does take 3 clocks. |
TRAP #X | LDILO $x,CC | This approach uses the unused bits of the CC register as a TRAP address. If these bits are zero, no trap has occurred. Unlike my previous approach, which was to use a trap peripheral, this approach has no delay associated with it. To work, the supervisor will need to clear this register following any trap, and the user will need to be careful to only set this register prior to a trap condition. Likewise, when setting this value, the user will need to make certain that the SLEEP and GIE bits are not set in $x. LDI would also work, however using LDILO permits the use of conditional traps (i.e., trap if the zero flag is set). Should you wish to trap off of a register value, you could equivalently load $x into the register and then MOV it into the CC register. |
TST Rx | TST $-1,Rx | Set the condition codes based upon Rx. Could also do a CMP $0,Rx, ADD $0,Rx, SUB $0,Rx, etc, AND $-1,Rx, etc. The TST and CMP approaches won't stall future pipeline stages looking for the value of Rx. |
WAIT | OR $SLEEP,CC | Wait 'til interrupt. In an interrupts disabled context, this becomes a HALT instruction. |
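Three of the derived sequences above (STEP, SWAP and NEG) are easy to check with a small Python model of 32-bit registers. This is only a sketch: it assumes that LSR places the bit shifted out into the carry flag, which the STEP entry implies but the text does not state outright, and the function names are mine.

```python
MASK32 = 0xFFFFFFFF


def lfsr_step(rr, rt):
    """Model 'LSR $1,Rr; XOR.C Rt,Rr': one step of a Galois LFSR.

    ASSUMPTION: LSR leaves the bit it shifted out in the carry flag.
    """
    carry = rr & 1
    rr = (rr & MASK32) >> 1
    if carry:  # XOR.C only executes when the carry is set
        rr ^= rt & MASK32
    return rr


def xor_swap(rx, ry):
    """Model 'XOR Ry,Rx; XOR Rx,Ry; XOR Ry,Rx' with the result in the last operand."""
    rx ^= ry
    ry ^= rx
    rx ^= ry
    return rx, ry


def neg(rx):
    """Model 'XOR $-1,Rx; ADD $1,Rx': two's complement negation."""
    return (((rx & MASK32) ^ MASK32) + 1) & MASK32
```

Calling `xor_swap(3, 5)` returns `(5, 3)`, and `neg(1)` returns `0xFFFFFFFF`, which is -1 as a 32-bit two's complement value.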
Pipeline Stages
- PREFETCH: Read instruction from memory (cache if possible)
  - A lack of an instruction, or a waiting memory operation, stalls the pipeline.
- DECODE: Decode instruction into op code, register(s) to read, and immediate offset.
  - INPUT: Instruction
  - OUTPUT: 5-bit register address of result, 5-bit register address of an input and usage flag (this register is used), 5-bit register address of second input and usage flag, 32-bit immediate offset.
    decode(i_clk, (i_ce)&(~stall), i_instr, i_gie, i_pc, o_opcode, o_ccode, o_wr_back, o_wr_reg, o_ra_read, o_ra_reg, o_rb_read, o_rb_reg, o_immediate, o_memop, o_wr, o_iodec);
  - Move instruction gets one decoder, produces two register addresses, use flag set to one on register B, address A is unused.
  - Load/Store instructions produce two registers, an immediate, and two flags
  - Operand B type instructions produce two registers, an immediate, and a use flag
  - LDI produces one register, a (longer) immediate, and sets the use flag to zero (second register isn't used)
  - This section never stalls. On an external stall it simply doesn't update its outputs. Outputs are available one clock after the instruction is valid.
- READ OPERANDS: Read registers and apply any immediate values to them.
  - This should stall if a source operand is pending.
- Split into two tracks: A) ALU: accomplish simple instruction, B) MEMOPS: memory read/write.
  - Loads stall instructions that access the register until it is written to the register set.
  - Condition codes are available upon completion
  - Issuing an instruction to the memory while the memory is busy will stall the bus. If the bus deadlocks, only a reset will release the CPU. (Watchdog timer, anyone?)
- WRITE-BACK: Conditionally write back the result to register set, applying the condition. This routine is bi-re-entrant. Either the memory or the simple instruction may request a register write. Memory writes take priority, stalling the other track.
  - This stage will stall the pipeline if both memory and op try to write to the registers at the same time.
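The write-back rule in the last stage (either track may request a write, memory wins, the other track stalls) can be sketched in a few lines of Python. This is a behavioural model under my own naming, not a transcription of the Verilog: each request is either None or a hypothetical (register index, value, condition) tuple.

```python
def writeback(regs, mem_req, alu_req):
    """Apply at most one register write and report whether the ALU must stall.

    Each request is None or a tuple (reg_index, value, condition_true).
    The tuple shape and this function name are illustrative assumptions.
    regs is a mutable list of 32-bit register values, updated in place.
    """
    if mem_req is not None:
        reg, value, cond = mem_req
        if cond:  # the write is conditional on the instruction's condition
            regs[reg] = value & 0xFFFFFFFF
        # Memory took the write port; a simultaneous ALU write has to wait.
        return alu_req is not None
    if alu_req is not None:
        reg, value, cond = alu_req
        if cond:
            regs[reg] = value & 0xFFFFFFFF
    return False
```

When both tracks request a write in the same clock, the function writes only the memory result and returns True, so the caller would present the same ALU request again on the next clock.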
Pipeline Logic
How the CPU handles some instruction combinations can be telling when determining what happens in the pipeline. For example:
Instruction(s) | Issue | Choice |
---|---|---|
Delayed Branching | What happens in debug mode? That is, what happens when a debugger tries to single step an instruction? While I can easily single step the computer in either user or supervisor mode from externally, this processor does not appear able to step the CPU in user mode from within user mode--gosh, not even from within supervisor mode--such as if a process had a debugger attached. As the processor exists, I would have one result stepping the CPU from a debugger, and another stepping it externally. This is unacceptable. | |
MOV R0,R1; MOV R1,R2 | What value does R2 get, the value of R1 before the first move or the value of R0? Placing the value of R0 into R1 requires a pipeline stall, and possibly two, as I have the pipeline designed. | R2 must equal R0 at the end of this operation. This may stall the pipeline 1-2 cycles. |
CMP R0,R1; MOV.EQ $x,PC | At issue is the same item as above, save that the CMP instruction updates the flags that the MOV instruction depends upon. | Condition codes must be updated and available immediately for the next instruction without stalling the pipeline. |
CMP R0,R1; MOV CC,R2 | At issue is the fact that the logic supporting the CC register is more complicated than the logic supporting any other register. | This will create a stall of 1-2 clock cycles. |
ADD $5,R0; BTST $8,CC | Test for overflow (or not). At issue is the load of the condition codes for the BTST instruction, which takes place two clocks before the prior instruction writes it back. | Negotiable for simplified logic. Let's stall here, 1-2 clocks. |
ADD $x,PC; MOV R0,R1 | Will the instruction following the jump take place before the jump? In other words, is the MOV to the PC register handled differently from an ADD to the PC register? | MOVs and ADDs use the same logic (simplifies the logic). |
MOV $x,PC; MOV R0,R1 | Will the instruction following the jump take place before the jump? Or must the pipeline "turn off" any outputs associated with a jump once it recognizes that the jump has taken place? Alternatively, the pipeline could stall until the result of the MOV was available. | Negotiable for simplified logic |
MOV $x,PC; MOV R0,R1 | Will the instruction following the jump take place before the jump sometimes but not all times? Might the pipeline not be full when the jump takes place, and thus the MOV instruction never gets loaded due to a stalled pre-fetch? Or is the MOV a dependable instruction, guaranteed to be executed (or not) no matter what the JMP does? Perhaps the compiler is required to insert 2-3 NOOPs following a jump, just to keep the pipeline doing something reliably? | Negotiable (at present). Highly desired that the behavior is the same regardless of the prefetch speed. |
MOV.EQ $x,PC; MOV $y,PC | Where will instructions take place next? On a delayed jump instruction, this means that instruction $x will be executed followed by $y, and execution will not continue at $x as desired | Negotiable. |
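The "MOV R0,R1; MOV R1,R2" case is an ordinary read-after-write hazard, and the choice above (stall until R1 is written) can be modelled with a tiny scoreboard. The Python class below is a sketch with invented names; it tracks only which registers have a write in flight, and it deliberately ignores the condition codes, which the table says must be forwarded without a stall.

```python
class Scoreboard:
    """Track registers whose new value has not yet been written back.

    Illustrative only: the class and method names are not from the CPU sources.
    """

    def __init__(self):
        self.pending = set()

    def issue(self, dest):
        """Record that an instruction writing `dest` has entered the pipeline."""
        self.pending.add(dest)

    def retire(self, dest):
        """Record that the write to `dest` has reached the register set."""
        self.pending.discard(dest)

    def must_stall(self, sources):
        """The read-operands stage stalls if any source is still pending."""
        return any(src in self.pending for src in sources)
```

After `issue(1)` for the first MOV, `must_stall([1])` is true for the second MOV until `retire(1)` is called, at which point R2 can safely receive the value that came from R0.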
As I've studied this, I find several approaches to handling pipeline + issues. These approaches (and their consequences) are listed below. +
Condition/Case | Discussion |
---|---|
All issued instructions complete. Stages stall individually. | What about a slow pre-fetch? Nominally, this works well: any issued instruction just runs to completion. If there are four issued instructions in the pipeline, with the writeback instruction being a write-to-PC instruction, the other three instructions naturally finish. This approach fails when reading instructions from the flash, since such reads require N clocks to complete. Thus there may be only one instruction in the pipeline if reading from flash, or a full pipeline if reading from cache. Each of these approaches would produce a different response. This is unacceptable. |
Issued instructions may be canceled. Stages stall individually. | First problem: Memory operations cannot be canceled, even reads may have side effects on peripherals that cannot be canceled later. Further, in the case of an interrupt, it's difficult to know what to cancel. What happens in a MOV.C $x,PC followed by a MOV $y,PC instruction? Which gets canceled? Because it isn't clear what would need to be canceled, this is not doable. |
All issued instructions complete. All stages are filled, or the entire pipeline stalls. | What about debug control? What about register writes taking an extra clock stage? MOV R0,R1; MOV R1,R2 should place the value of R0 into R2. How do you restart the pipeline after an interrupt? What address do you use? The last issued instruction? But the branch delay slots may make that invalid! Reading from the CPU debug port in this case yields inconsistent results: the CPU will halt or step with instructions stuck in the pipeline. Reading registers will give no indication of what is going on in the pipeline, just the results of completed operations, not of operations that have been started and not yet completed. Perhaps we should just report the state of the CPU based upon what instructions (PC values) have successfully completed? Thus the debug instruction is the one that will write registers on the next clock. The next problem, though, is how to deal with the read operand pipeline stage needing the result from the register pipeline. |
All instructions that enter into the memory module *must* complete. Issued instructions from the prefetch, decode, or operand read stages may or may not complete. Jumps into code must be valid, so that interrupt returns may be valid. All instructions entering the ALU complete. | This looks to be the simplest approach. While the logic may be difficult, this appears to be the only re-entrant approach. A new_pc flag will be high anytime the PC changes in an unpredictable way (i.e., it doesn't increment). This includes jumps as well as interrupts and interrupt returns. Whenever this flag may go high, memory operations and ALU operations will stall until the result is known. When the flag does go high, anything in the prefetch, decode, and read-op stage will be invalidated. |
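The effect of the new_pc flag on the pipeline can be summarised as "invalidate the three front stages, leave the ALU and memory stages alone". The Python sketch below models only that rule; the stage names and the dictionary of valid bits are my own shorthand rather than signals from zipcpu.v.

```python
# Stages that are invalidated when new_pc goes high, per the discussion above.
FLUSHED_STAGES = ("prefetch", "decode", "read_op")


def apply_new_pc(valid, new_pc):
    """Return the per-stage valid bits after one clock.

    `valid` maps a stage name to True/False. The stage names are illustrative.
    Instructions already in the ALU or memory stage are left to complete.
    """
    if not new_pc:
        return dict(valid)
    return {
        stage: (False if stage in FLUSHED_STAGES else is_valid)
        for stage, is_valid in valid.items()
    }
```

With every stage valid and `new_pc` high, the result keeps the ALU and memory entries valid and clears the other three, so whatever was fetched behind a jump never reaches the ALU.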