M65C02 Microprocessor Core
=======================

Copyright (C) 2012-2013, Michael A. Morris <morrisma@mchsi.com>.
All Rights Reserved.

Released under LGPL.

News
----

Recently completed tests have verified that the M65C02 soft-processor operates
as designed at a frequency of 73.728 MHz in an XC3S50A-4VQG100I FPGA. See below
for a more complete description of Release 2.72, with which this milestone was
achieved.

General Description
-------------------

This project provides a microprogrammed synthesizable IP core compatible with
the WDC and Rockwell 65C02 microprocessors.

It is provided as a core. Several external components are required to form a
functioning processor: (1) memory, (2) interrupt controller, and (3) I/O
interface buffers. The Verilog testbench provided demonstrates a simple
configuration for a functioning processor implemented with the M65C02 core:
M65C02_Core. The M65C02 core supports the full instruction set of the W65C02.

The core accepts an interrupt signal from an external interrupt controller.
The core provides the interrupt mask bit to the external interrupt controller,
and expects the controller to handle the detection of the NMI edge and the
prioritization of the interrupt sources, and to provide the interrupt and
exception vectors. The core also provides an indication of whether the BRK
instruction is being executed. With this additional information, the external
interrupt controller is expected to provide the same vector for the BRK
exception as the vector for the IRQ interrupt request, or another suitable
vector. This approach to interrupt handling can be used to support a vectored
interrupt structure with more interrupt sources than the original processor
implementation supported: NMI, RST, and IRQ.

With Release 2.x, the core now provides a microcycle length controller as an
integral component of the M65C02 Microprogram Controller (MPC). The M65C02
core microprogram can now inform the external memory controller, on a cycle by
cycle basis, of the memory cycle type.
Logic external to the core can use this
output to map the memory cycle to whatever memory is appropriate, and to drive
the microcycle length inputs of the core to extend each microcycle if
necessary. Thus, the Release 2.x core no longer assumes that the external
memory is implemented as an asynchronous memory device, and as a result, the
core no longer expects that the memory will accept an address and return the
read data at that address in the same cycle. With the built-in microcycle
length controller, single cycle LUT-based zero page memory, 2 cycle internal
block RAM memory, and 4 cycle external memory can easily be supported. A Wait
input can also be used to extend the 4 cycle microcycles, i.e. to add wait
states, so a wide variety of memories can be easily supported; the only
limitation is the set of memory types supported by the user-supplied external
memory controller.

The core provides a large number of status and control signals that external
logic may use. It also provides access to many internal signals such as all of
the registers, A, X, Y, S, and P. The *Mode*, *Done*, *SC*, and *RMW* status
outputs may be used to provide additional signals to external devices.

*Mode* provides an indication of the kind of instruction being executed:

    0 - STP - stop processor instruction executed,
    1 - INV - invalid instruction (uniformly treated as single cycle NOPs),
    2 - BRK - break instruction being executed,
    3 - JMP - branch/jump/return (Bcc, BBRx/BBSx, JMP/JSR, RTS/RTI),
    4 - STK - stack access (PHA/PLA, PHX/PLX, PHY/PLY),
    5 - INT - single cycle instruction (INC/DEC A, TAX/TXA, SEI/CLI, etc.),
    6 - MEM - multi-cycle instruction with memory access for operands,
    7 - WAI - wait for interrupt instruction being executed.

*Done* is asserted during the instruction fetch of the next instruction.
During that fetch cycle, all instructions complete execution.
Thus, the M65C02
is pipelined, and executes many instructions in fewer cycles than the 65C02.

*SC* is used to indicate a single cycle instruction.

*RMW* indicates that a read-modify-write instruction will be performed. External
logic can use this signal to lock memory.

*IO_Op* indicates the I/O cycle required. *IO_Op* signals data memory writes,
data memory reads, and instruction memory reads. Therefore, external logic may
implement separate data and instruction memories and potentially double the
amount of memory that an implementation may access.

Implementation
--------------

The implementation of the core provided consists of six Verilog source files
and several memory initialization files:

    M65C02_Core.v           - Top level module
    M65C02_MPCv3.v          - M65C02 MPC with microcycle length controller
    M65C02_AddrGen.v        - M65C02 Address Generator module
    M65C02_ALU.v            - M65C02 ALU module
    M65C02_BIN.v            - M65C02 Binary Mode Adder module
    M65C02_BCD.v            - M65C02 Decimal Mode Adder module

    M65C02_Decoder_ROM.coe  - M65C02 core microprogram ALU control fields
    M65C02_uPgm_V3a.coe     - M65C02 core microprogram (sequence control)

    M65C02_Core.ucf         - User Constraints File: period and pin LOCs
    M65C02.tcl              - Project settings file

    tb_M65C02_Core.v        - Completed core testbench with test RAM
    M65C02_Tst3.txt         - Memory configuration file of M65C02 "ROM" program
    M65C02_Tst3.a65         - Kingswood A65 assembler source code test program

    tb_M65C02_ALU.v         - testbench for the ALU module
    tb_M65C02_BCD.v         - testbench for the BCD adder module

Synthesis
---------

The objective for the core is to synthesize such that the FF-FF speed is 100 MHz
or higher in a Xilinx XC3S200AN-5FGG256 FPGA using Xilinx ISE 10.1i SP3. In that
regard, the core provided meets and exceeds that objective.
Using the settings
provided in the M65C02.tcl file, the ISE 10.1i tool implements the design and
reports that the 10.000 ns period (100 MHz) constraint is satisfied.

The ISE 10.1i SP3 implementation results are as follows:

    Number of Slice FFs:            191
    Number of 4-input LUTs:         747
    Number of Occupied Slices:      459
    Total Number of 4-input LUTs:   760 (13 used as route-throughs)
    Number of BUFGMUXs:             1
    Number of RAMB16BWEs:           2   (M65C02_Decoder_ROM, M65C02_uPgm_V3a)

    Best Case Achievable:           9.962 ns (0.038 ns Setup, 1.028 ns Hold)

Status
------

Design and verification are complete.

Release Notes
-------------

### Release 1

Release 1 of the M65C02 had an issue in that address wrapping for zero page
addressing was not properly implemented. Unlike the W65C02 and MOS6502, the
M65C02 synthesizable core implemented the addressing modes, but allowed page
boundaries to be crossed for all addressing modes. This initial behavior is
more like that of the WDC 65C802/816 microprocessors in native mode. With this
release, Release 2, the zero page addressing modes of the M65C02 core behave
like those of the WDC W65C02.

Following Release 1, a couple of quick patches were made to the zero page
addressing, but these failed to address all of the issues. Release 2 uses the
same basic next address generation logic, except that it now allows the
microcode to control when addresses are computed modulo 256. With this change,
all outstanding issues with respect to zero page addressing have been
corrected.

### Release 2

Release 2 has reworked the Microprogram Controller (MPC) to include a
microcycle length controller directly. With this new MPC, it is expected that
it will be easier to adapt the core to use LUT RAM for page 0 (data page) and
page 1 (stack page), and to attach an external memory controller with variable
length access cycles. The microcycle length controller allows 1, 2, or 4 cycle
microcycles.
Neither the 1 nor the 2 cycle microcycle supports wait state
insertion, but the 4 cycle microcycle allows the insertion of wait states.
With this architecture, LUT and internal Block RAMs can be used to provide
high speed operation. The 4 cycle external memory microcycle should easily
allow the core to support asynchronous or synchronous external memory. Release
1 allowed variable length microcycles, but the address-based mechanism
implemented was difficult to use in practice. Release 1 targeted a single
cycle memory like that provided by the distributed LUT RAMs of the target
FPGAs. The approach used in Release 2 should make it much easier to adapt the
M65C02 core.

#### Release 2.1

Release 2.1 has modified the core to export signals to an external memory
controller that would allow the memory controller to drive the core logic with
the required microcycle length value for the next microcycle. The test bench for
the core is running in parallel with the original Release 1 (with zero page
addressing corrected) core (M65C02_Base.v) so that a self-checking configuration
is achieved between the two cores and the common test program. Release 2.1 also
includes a modified memory model module, M65C02_RAM.v, that supports all three
types of memory that are expected to be used with the core: LUT (page 0), BRAM
(page 1 and internal program/data memory), and external pipelined SynchRAM.

#### Release 2.2

Release 2.2 has been tested using microcycles of 1, 2, or 4 cycles in length.
During testing, some old issues returned when multi-cycle microcycles were
used. With single cycle microcycles there were no problems with either of the
two cores: M65C02_Core.v or M65C02_Base.v. For example, with 2 and 4 cycle
microcycles, the modification of the PSW before the first instruction of the
ISR was found to be taking place several microcycles before it should. This
issue was tracked down to the fact that the microprogram ROMs and the PSW
update logic were not being qualified by the internal Rdy signal, or
end-of-microcycle.
In the single cycle microcycle case, previous corrections applied
to address this issue still worked, but the single cycle solutions applied did
not generalize to the multi-cycle cases. Thus, several modules were modified
so that ISR, BCD, and zero page addressing modes now behave correctly for
single and multi-cycle microcycles.

#### Release 2.3

Release 2.3 implements the standard 6502/65C02 vector fetch operations and
adds the WAI and STP instructions. Both versions are updated to incorporate
these features. The testbench has been modified to include another M65C02_RAM
module, and to separate the two modules into "ROM" at high memory and "RAM" at
low memory. The test program has been updated to include initialization of
"RAM" by the test program running from "ROM". Initialization of the stack
pointer is still part of the core logic, and the test program expects that S
is initialized to 0xFF on reset, and that the reset vector fetch sequence does
not modify the stack. In other words, the Release 2.3 core does not write to
the stack before fetching the vector and starting execution at that address.

#### Release 2.4

Release 2.4 incorporates the 32 Rockwell instruction opcodes and the WAI and STP
instructions.

#### Release 2.5

Release 2.5 makes some minor modifications to the M65C02 core module to allow
the output of some signals that allow the generation of interface signals such
as the active low Vector Pull output of the W65C02S microprocessor. In
addition to bringing out these signals, Release 2.5 also provides an
implementation of a standalone microprocessor, or system-on-chip, which
demonstrates how the M65C02 core can be used to provide a stand-alone
implementation of a 65C02 processor.
This implementation is composed of the
following files:

    M65C02.v        - M65C02 microprocessor demonstration
    ClkGen.xaw      - Xilinx Architecture Wizard clock generator file
    M65C02.ucf      - User Constraints File: period and pin LOCs
    M65C02.tcl      - Project settings file
    tb_M65C02.v     - M65C02 testbench with RAM/ROM and interrupt sources

The header of the M65C02.v module provides details of the differences between
the 65C02 microprocessor implementation represented by M65C02.v and a
65C02 processor implementation as represented by the WDC W65C02S microprocessor.

The M65C02 implementation is targeted at an XC3S50A-4VQG100I FPGA. The User
Constraints File (ucf) has been developed so that the resulting implementation
can be used as a fully functional microprocessor when attached to external I/O
devices, external SRAM device(s) (25ns or faster), and an external NOR Flash
device (4kB, 45ns or faster). A development board is presently being developed
to demonstrate the M65C02, and to provide a suitable platform for further
development of the remaining FPGA resources into a more complete
system-on-chip based on the M65C02 core.

The Xilinx ISE 10.1i SP3 synthesis results for the M65C02 are as follows:

                                        Used    Avail   %
    Number of Slice Flip Flops          200     1408    14%
    Number of 4 input LUTs              736     1408    52%
    Logic Distribution
    Number of occupied Slices           426     704     60%
    Number of Slices related logic      426     426     100%
    Number of Slices unrelated logic    0       426     0%
    Total Number of 4 input LUTs        745     1408    52%
    Number used as logic                735
    Number used as a route-thru         9
    Number used as Shift registers      1
    Number of bonded IOBs
    Number of bonded pads               53      68      77%
    IOB Flip Flops                      79
    Number of BUFGMUXs                  4       24      16%
    Number of DCMs                      1       2       50%
    Number of RAMB16BWEs                2       3       66%

    Best Case Achievable:   13.213ns (0.037ns Setup, 1.023ns Hold)

Please read the header and other comments for more details on the M65C02
processor implementation.
In particular, read and understand the discussion
regarding the use of an FPGA-specific clock multiplexer to manage the memory
cycle length in lieu of supporting wait state generation/insertion.

##### Release 2.6

Modified the M65C02 processor to use the last available block RAM in the
XC3S50A-xVQG100I device as a 2kB Boot/Monitor ROM. Added an external pin to
inhibit writes into this block RAM. The UCF file includes a PULLUP on the pin
which enables writes. Also modified the clock stretch logic to only apply when
system ROM, CE[2], or User ROM, CE[1], are addressed. The Boot/Monitor
ROM/RAM, IO (CE[3]), and User RAM, CE[0], do not use the clock stretching
logic and therefore require devices able to respond in a single memory cycle of
the M65C02, ~25ns.

Adding the additional (internal) device select and data multiplexer to the
M65C02 caused a drop in performance. External memory operating frequency
decreased from ~20 MHz (max) to ~16 MHz for a -5 speed grade part. There was
also an increase in the size of the implementation, but that was expected and
did use a reasonable number of additional resources.

The following table summarizes PAR results for the new release of the M65C02
processor:

                                        Used    Avail   %
    Number of Slice Flip Flops          205     1408    14%
    Number of 4 input LUTs              724     1408    51%
    Logic Distribution
    Number of occupied Slices           443     704     62%
    Number of Slices related logic      443     443     100%
    Number of Slices unrelated logic    0       426     0%
    Total Number of 4 input LUTs        732     1408    51%
    Number used as logic                723
    Number used as a route-thru         8
    Number used as Shift registers      1
    Number of bonded IOBs
    Number of bonded pads               54      68      79%
    IOB Flip Flops                      80
    Number of BUFGMUXs                  4       24      16%
    Number of DCMs                      1       2       50%
    Number of RAMB16BWEs                3       3       100%

    Best Case Achievable:   15.147ns (0.003ns Setup, 0.817ns Hold)

The modified files are:

    M65C02.v        - M65C02 microprocessor demonstration
    M65C02.ucf      - User Constraints File: period and pin LOCs
    tb_M65C02.v     - M65C02 testbench with RAM/ROM and interrupt sources

Additional work is needed for verification, but this release
successfully
executes the same test program as the previous release of the M65C02 processor
and the M65C02 core.

##### Release 2.7

Modified the Release 2.6 M65C02 processor to use a newly released version of the
microprogram controller. The new microprogram controller, M65C02_MPCv4.v,
modifies the behavior of the built-in microcycle length controller. It fixes the
microcycle length to 4, and adds four additional states by which external
devices can request wait states. The new microprogram controller adds wait
states in integer multiples of the memory cycle. In this way, the clock stretch
logic built using a FF and a BUFGMUX clock multiplexer can be removed, and the
external Phi1O and Phi2O signals will maintain their natural 50% duty cycle
characteristic.

The change to the microprogram controller required a change to the core and to
the interface between the core and the M65C02 processor. Within the core, the
change in the microprogram controller removed the need for the cycle extension
logic used to insert an extra state in the microcycle whenever a BCD instruction
is executed. That extra cycle is only needed when the core is operating with
single cycle memory. Since the microcycle is fixed to 4 with the new
microprogram controller, the BCD mode microcycle extension logic was removed.

The interface change refers to the need to increase the width of the microstate
signal, MC, from 2 to 3 bits. Within the M65C02 processor, the additional states
supported by the larger MC port required that the clock enable for the external
memory data input register be modified. The nominal external input data sampling
point is cycle 3, falling edge of Phi2O. With wait states, the data sampling
point becomes cycle 3 or cycle 7. For data sampling, the external Rdy input
signal must also be asserted.
A final change to the M65C02 processor is that the
Phi1O and Phi2O signals are now set and reset using four microstate decode
signals rather than two.

The incorporation of the last block memory into the design resulted in a loss
of performance. The M65C02 processor is unable to maintain an external memory
cycle rate of 18.432 MHz when the internal block RAM is included. The
additional decode and input data multiplexer impose a path delay that lowers
the memory interface operating speed to 16 MHz. Thus, the nearest baud rate
frequency is 14.7456 MHz.

Operating at 14.7456 MHz requires external devices to request a wait state if
they are unable to accept or supply data within 33.908ns. (At 16 MHz
operation, the access time requirement is 31.25ns.) A single wait state
extends the memory access time to 101.725ns. At 14.7456 MHz or 16 MHz, the
memory cycle characteristics of the M65C02 processor allow the use of low-cost
high-speed asynchronous SRAMs, and with one wait state, low-cost NOR Flash
EEPROMs in 45, 55, 70, or 90ns speed grades.

The following table summarizes PAR results for Release 2.7 of the M65C02
processor:

                                        Used    Avail   %
    Number of Slice Flip Flops          205     1408    14%
    Number of 4 input LUTs              720     1408    51%
    Number of occupied Slices           401     704     56%
    Number of Slices related logic      401     401     100%
    Number of Slices unrelated logic    0       401     0%
    Total Number of 4 input LUTs        728     1408    51%
    Number used as logic                719
    Number used as a route-thru         8
    Number used as Shift registers      1
    Number of bonded IOBs
    Number of bonded pads               54      68      79%
    IOB Flip Flops                      79
    Number of BUFGMUXs                  4       24      16%
    Number of DCMs                      1       2       50%
    Number of RAMB16BWEs                3       3       100%

    Best Case Achievable:   15.625ns (0.000ns Setup, 0.961ns Hold)

The files modified in this release are:

    M65C02.v        - M65C02 microprocessor demonstration
    M65C02_Core.v   - M65C02 core logic
    M65C02_MPCv4.v  - M65C02 core microprogram controller
    M65C02.ucf      - User Constraints File: period and pin LOCs
    M65C02.tcl      - M65C02 ISE tool configurations/settings
    tb_M65C02.v     - M65C02 testbench with RAM/ROM and interrupt sources

Testing with the current testbench demonstrates that the M65C02 processor
correctly executes the 65C02 test program, M65C02_Tst3.a65, used in previous
testing of the M65C02 core with tb_M65C02_Core.v. That provides confidence
that the integration of the core logic with the memory interface, interrupt
handler, reset controller, and internal block RAM did not introduce any errors
related to the core. However, the circuits in the wrapper around the core
logic have not been extensively tested. The testing that has been performed to
date indicates these circuits are operating correctly, but it covers only the
nominal cases and not those cases on the margins.

For example, the interrupt handler has demonstrated that it is able to handle
vector generation for RST, IRQ, and BRK; NMI vector processing has not yet
been tested. Another behavior not yet tested is the reset logic's
characteristic that requires the external nRst signal to be asserted for four
cycles of the input clock before it is recognized. Nor has the related
behavior been tested, namely that a loss of lock of the internal clock
generator will assert reset to the M65C02 processor.

##### Release 2.71

Corrected logic for generating an internal reset signal, Rst, based on an
external reset, nRst, and the state of the DCM_Locked signal. The vector
reduction operator that had been applied, '&', was incorrect. The correct
vector reduction operator is '|', or logic OR. The correction has been made,
and the FPGA correctly drives the nRstO output with the complement of the
internal reset signal, Rst.

The changes have been made to the M65C02.v module, and only that module has
been loaded into the MAM65C02 GitHub repository.

##### Release 2.72

Improved the timing of the soft-core microprocessor, M65C02, by using a more
efficient scheme for the internal bus multiplexers.
Previous releases of the
core, M65C02_Core, and the soft-core microprocessor used multiplexers
generated using _switch/case select_ constructs.

Although these constructs are an effective and fast means for generating bus
multiplexers, there are some penalties. This latest release has resorted to
using one-hot decode ROMs tied to the various bus selects in the
implementation, and then forcing the various data sources to connect to the
busses as gated signals. When not gated, a logic 0 is driven onto the bus. At
the terminal end, a simple OR gate is used to collect all of the desired gated
signals.

The result of this effort has been a significant improvement in the
combinatorial path delays. Prior to this optimization, the synthesizer
reported a maximum clock frequency of ~55 MHz. After the OR bus optimization
was fully incorporated, the synthesizer reports a maximum clock frequency of
~74 MHz. This is nearly a 35% improvement in the combinatorial path delays.

The resulting improvement is sufficient to allow the soft-core processor to
support an operating speed of **73.728 MHz**, which corresponds to a
microcycle rate of **18.432 MHz**, i.e. the rate at which single cycle
instructions execute, given this core's 4 cycle microcycle.

Beyond the higher clock rate, the improvement in path delays has allowed the
core to be synthesized, Mapped, and PARed for minimum area.
The result is a significant reduction in the resource utilization in the
target XC3S50A-4VQG100I FPGA.

The following table summarizes PAR results for Release 2.72 of the M65C02
processor: **XC3S50A-4VQG100I**

                                        Used    Avail   %
    Number of Slice Flip Flops          248     1408    17%
    Number of 4 input LUTs              647     1408    45%
    Number of occupied Slices           400     704     56%
    Number of Slices related logic      400     400     100%
    Number of Slices unrelated logic    0       400     0%
    Total Number of 4 input LUTs        661     1408    46%
    Number used as logic                646
    Number used as a route-thru         14
    Number used as Shift registers      1
    Number of bonded IOBs
    Number of bonded pads               54      68      79%
    IOB Flip Flops                      79
    Number of BUFGMUXs                  3       24      16%
    Number of DCMs                      1       2       50%
    Number of RAMB16BWEs                3       3       100%

    Best Case Achievable:   13.516ns (0.047ns Setup, 1.021ns Hold)

The files modified in this release are:

    M65C02.v            - M65C02 microprocessor demonstration
    M65C02_Core.v       - M65C02 core logic
    M65C02_AddrGen.v    - M65C02 core address generator
    M65C02_ALU.v        - M65C02 core ALU
    M65C02_BIN.v        - M65C02 ALU Binary mode adder
    M65C02_BCD.v        - M65C02 ALU Decimal mode adder
    M65C02.ucf          - User Constraints File: period and pin LOCs
    M65C02.tcl          - M65C02 ISE tool configurations/settings

Additional optimizations in the ALU can be applied, but with the improvements
made with this release, a -5 speed grade part can be made to operate at 90+
MHz. If higher speeds are needed, then further optimization, including adding
pipeline registers to the ALU, can be made. Some pipelining can be easily
added because of the 4 clock microcycle around which the soft-core processor
is built.

##### Release 2.73

Improved the modularity of the M65C02 top level module by creating modules for
clock generation and interrupt handling. Updated the design document, and
deleted unnecessary files.
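
The clock and memory timing figures quoted in the Release 2.7 and 2.72 notes
can be cross-checked with a little arithmetic. The Python sketch below is only
an illustration: it assumes that the nominal access window is half of one
4-clock memory cycle and that each wait state adds one full memory cycle. That
assumption is inferred from the quoted numbers rather than stated in the
notes, and the function names are hypothetical, not part of the M65C02 sources.

```python
# Cross-check of the timing figures quoted in the release notes.
# ASSUMPTION (inferred from the quoted numbers, not stated in the notes):
# the nominal access window is half of one memory cycle, and every wait
# state adds one full memory cycle to that window.

CLOCKS_PER_MICROCYCLE = 4  # Release 2.7 fixes the microcycle length to 4


def microcycle_rate_mhz(core_clock_mhz):
    """Memory (microcycle) rate in MHz for a given core clock in MHz."""
    return core_clock_mhz / CLOCKS_PER_MICROCYCLE


def access_time_ns(memory_rate_mhz, wait_states=0):
    """Access window in ns: half a memory cycle plus one cycle per wait state."""
    cycle_ns = 1000.0 / memory_rate_mhz
    return cycle_ns * (0.5 + wait_states)


def speedup_percent(before_mhz, after_mhz):
    """Relative increase in clock frequency, in percent."""
    return (after_mhz / before_mhz - 1.0) * 100.0


if __name__ == "__main__":
    print(f"{microcycle_rate_mhz(73.728):.3f} MHz")   # 18.432 MHz
    print(f"{access_time_ns(14.7456):.3f} ns")        # 33.908 ns
    print(f"{access_time_ns(14.7456, 1):.3f} ns")     # 101.725 ns
    print(f"{access_time_ns(16.0):.3f} ns")           # 31.250 ns
    print(f"{speedup_percent(55.0, 74.0):.1f} %")     # 34.5 %
```

Under these assumptions the sketch reproduces the 18.432 MHz microcycle rate,
the 33.908ns and 31.25ns access time requirements, the 101.725ns access time
with one wait state, and the "nearly 35%" improvement from ~55 MHz to ~74 MHz.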
