Description
This project demonstrates the use of the P16C5x soft-processor core, found elsewhere on opencores.org, in a system-on-chip. The project targets a small FPGA, the Xilinx XC3S50A-4VQ100I. It integrates the P16C5x PIC-compatible processor core, an SPI Master module (SPIxIF), a Synchronous Serial Peripheral (SSP) slave module (SSP_Slv), an SSP UART (SSP_UART), and an inferred 4096 x 12 Block RAM program memory. (The SPIxIF, SSP_Slv, and SSP_UART modules can all be found on opencores.org.)
The P16C5x module is a PIC-compatible processor core that supports the 12-bit base architecture of the Microchip PIC16 product line. It extends the base architecture with one additional address line into program memory. The base architecture does not implement the PA[2] program bank bit of the STATUS register; the P16C5x module implements that bit and widens the two-level stack by one bit, so that a complete 4096 x 12 program space is available.
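The effect of the extra PA[2] bit on the program address can be sketched as follows. This is a minimal illustration that assumes the usual baseline PIC convention, in which GOTO supplies a 9-bit literal and the PA bits supply the upper address bits; the function and constant names are hypothetical and are not taken from the P16C5x RTL, which should be consulted for the actual behavior.

```python
# Sketch only: models how a GOTO target could be formed, assuming the
# baseline PIC convention (9-bit literal from the instruction, upper
# address bits from the PA field of STATUS). Names are hypothetical.

BASE_PA_BITS = 2      # PA[1:0], as in the base architecture
EXT_PA_BITS = 3       # PA[2:0], with the extra bit described above
LITERAL_BITS = 9      # literal field of GOTO (assumption, see lead-in)


def program_words(pa_bits: int) -> int:
    """Number of addressable program words for a given PA field width."""
    return 1 << (pa_bits + LITERAL_BITS)


def goto_target(pa: int, literal: int, pa_bits: int = EXT_PA_BITS) -> int:
    """Concatenate the PA field and the 9-bit literal into one address."""
    if not 0 <= pa < (1 << pa_bits):
        raise ValueError("PA value does not fit in the PA field")
    if not 0 <= literal < (1 << LITERAL_BITS):
        raise ValueError("literal does not fit in 9 bits")
    return (pa << LITERAL_BITS) | literal


if __name__ == "__main__":
    print(program_words(BASE_PA_BITS))        # words reachable with PA[1:0]
    print(program_words(EXT_PA_BITS))         # words reachable with PA[2:0]
    print(hex(goto_target(0b111, 0x1FF)))     # highest reachable address
```

With two PA bits the reachable space is 2048 words; the third bit doubles it to the 4096 words provided by the Block RAM program memory.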
For compatibility with readily available PIC-compatible tools from Microchip, third-party vendors, and open-source suppliers, the P16C5x core has been parameterized so that its reset vector matches the corresponding vector of the PIC16C57/PIC16C59 microcomputer. The internal register/RAM memory map of the P16C5x core has been set to be compatible with that of the PIC16C57 microcomputer: (1) I/O ports A, B, and C are implemented; and (2) internal RAM is set to 72 bytes. (It is possible to increase the size of internal memory to support the bank-switched memory of the PIC16C59, but the size of the UART FIFOs may have to be changed to make room for the additional processor core RAM in the small FPGA chosen as the target for this project. Changing the FPGA to an XC3S200A-4VQ100I is possible; that choice would allow the processor memory to be increased, enable the use of Block RAMs for the UART FIFOs, and allow a second SSP_UART module to be added to the M16C5x soft-microcomputer.)
Unlike a Microchip PIC16C57/PIC16C59 microcomputer, the I/O ports are not built into the M16C5x's P16C5x soft-core processor module. Instead, the P16C5x soft-core provides a parallel data bus with one-hot control signals for writing the three TRIS write-only registers and the three output data registers and reading the three input data registers. This allows the core's integrator the flexibility to create custom peripherals which are tightly integrated with the processor core in a manner that reduces the number of instructions needed to access the custom peripherals.
In the M16C5x, the SPI master interface module is integrated into the core using the TRIS C register as a write-only register. The SPI transmit and receive data registers are mapped to the Port C data output and data input registers, respectively. Furthermore, to take advantage of the capability of the SPIxIF module to operate with FIFOs connected, two 16x8 distributed RAM FIFOs are attached to the SPIxIF as the transmit and receive data ports. This gives the P16C5x processor core the opportunity to service other I/O (beyond the scope of this demonstration) or perform other computation while the SPI master peripheral completes an SPI transaction on its own.
Beyond the testing performed with the simulator and various test benches, the M16C5x has been tested on a working board using the XC3S50A-4VQ100I FPGA. A simple test program was written using MPLAB (8.91) that converts lower case ASCII alpha characters into upper case characters, and vice versa. After configuring the SPI master and the SSP UART, the program polls the UART, transforms the data, and writes it back to the UART. Even with all of this activity on the internal SPI bus, the M16C5x is able to process data at rates up to 921.6 kbaud without errors or dropouts.
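The transformation performed by the test program can be modeled in a few lines. This is a host-side sketch of the logic only, not the project's PIC firmware; the function names are hypothetical. It relies on the fact that upper and lower case ASCII letters differ only in bit 5 (0x20).

```python
# Sketch of the case-swapping echo described above. The real test program
# runs on the M16C5x; this only models the byte transformation.

def swap_case(byte: int) -> int:
    """Toggle the case of an ASCII letter; pass every other byte through."""
    is_upper = 0x41 <= byte <= 0x5A   # 'A'..'Z'
    is_lower = 0x61 <= byte <= 0x7A   # 'a'..'z'
    if is_upper or is_lower:
        return byte ^ 0x20            # bit 5 distinguishes the two cases
    return byte


def echo(received: bytes) -> bytes:
    """Transform a received byte stream as the polling loop would."""
    return bytes(swap_case(b) for b in received)


if __name__ == "__main__":
    print(echo(b"Hello, M16C5x!"))
```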
In the target FPGA, which is the smallest part in the Spartan 3A FPGA family in its lowest speed grade, the M16C5x easily reports post-synthesis speeds in excess of 57 MHz, and maps, places, and routes (with only simple period constraints) with reported and verified post-PAR performance better than 60 MHz. Since the core is a single-cycle core, this is a substantial improvement over the equivalent Microchip products, which are 5 MHz (effective instruction rate) devices.
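The arithmetic behind this comparison is short enough to write out. The sketch below uses the 60 MHz and 5 MHz figures quoted above; the instruction budget per UART character additionally assumes 10 bit times per character (one start bit, eight data bits, one stop bit), which is an assumption about the framing and not something stated in this description.

```python
# Back-of-the-envelope figures, using the numbers quoted in the text.
# BITS_PER_CHAR is an assumption (8N1 framing), not a documented setting.

CORE_CLOCK_HZ = 60e6          # post-PAR clock rate quoted above
CYCLES_PER_INSTRUCTION = 1    # single-cycle core
PIC_INSTRUCTION_RATE = 5e6    # effective rate of the Microchip parts
BAUD_RATE = 921_600           # highest rate reported in testing
BITS_PER_CHAR = 10            # assumed framing: start + 8 data + stop


def speedup() -> float:
    """Ratio of M16C5x instruction rate to the Microchip instruction rate."""
    return (CORE_CLOCK_HZ / CYCLES_PER_INSTRUCTION) / PIC_INSTRUCTION_RATE


def instructions_per_char() -> float:
    """Instructions available per received character at BAUD_RATE."""
    chars_per_second = BAUD_RATE / BITS_PER_CHAR
    return (CORE_CLOCK_HZ / CYCLES_PER_INSTRUCTION) / chars_per_second


if __name__ == "__main__":
    print(f"speedup: {speedup():.1f}x")
    print(f"instructions per character: {instructions_per_char():.0f}")
```

Under these assumptions the core has roughly 650 instructions available for each character at 921.6 kbaud.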
A final component of the M16C5x project is the demonstration of the use of the Xilinx tool, Data2Mem, that allows specially formatted ASCII hexadecimal files to be written into the block RAMs of the device during the generation of the configuration images, i.e. directly inserted by BitGen. This allows a third party developer to write/modify the contents of the M16C5x program memories without requiring the resulting data to be loaded into the Block RAMs through re-synthesis and MAP/PAR operations. The resulting improvement in the turnaround time for non-RTL modifications, i.e. firmware-only mods, is dramatic, and the process is far less error prone.
The TCL script included in the RTL source directory allows the integrator of this core to take advantage of this capability. (This capability is likely available from any FPGA vendor supporting soft-core processors. It is expected that the Altera (NIOS-II) and Lattice (Mico-32) toolsets provide the same type of capability, but it has not been verified that these toolsets support it in their base (free) configurations.) The project provides a Block Memory Map (BMM) file, and sets the mapper and the configuration bitstream generator (BitGen) to use the BMM file. The project also provides a Windows executable (and its source code) for a simple filter/console program that converts Microchip MPLAB Intel Hex output files into Data2Mem-compatible MEM files.
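To illustrate the kind of conversion such a filter performs, here is a minimal sketch in Python. It is not the utility shipped with the project. It assumes that the Intel Hex file stores each 12-bit program word as two bytes, low byte first, at twice the word address, and that the MEM file should carry one `@address` tag followed by one hexadecimal word per line using word addresses. Whether the addresses in the MEM file must be word or byte addresses depends on how the BMM file defines the address space, so that choice should be checked against the project's BMM file. Records that do not belong to program memory (for example configuration words) are not treated specially here.

```python
# Sketch of an Intel Hex to MEM conversion. All function names are
# hypothetical; see the lead-in for the assumptions about word packing
# and MEM addressing.

def parse_intel_hex(text: str) -> dict:
    """Return {byte_address: byte_value} from Intel Hex text."""
    memory = {}
    upper = 0                                   # extended linear address
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if not line.startswith(":"):
            raise ValueError("record does not start with ':'")
        raw = bytes.fromhex(line[1:])
        if sum(raw) & 0xFF:                     # checksum covers whole record
            raise ValueError("bad checksum: " + line)
        count, address, rtype = raw[0], (raw[1] << 8) | raw[2], raw[3]
        data = raw[4:4 + count]
        if rtype == 0x00:                       # data record
            for offset, value in enumerate(data):
                memory[upper + address + offset] = value
        elif rtype == 0x01:                     # end-of-file record
            break
        elif rtype == 0x04:                     # extended linear address
            upper = ((data[0] << 8) | data[1]) << 16
    return memory


def to_mem(memory: dict, depth: int = 4096) -> str:
    """Emit one '@address word' line per populated 12-bit program word."""
    lines = []
    for word_address in range(depth):
        low = memory.get(2 * word_address)
        high = memory.get(2 * word_address + 1)
        if low is None and high is None:
            continue                            # leave unprogrammed words out
        word = (((high or 0) << 8) | (low or 0)) & 0xFFF
        lines.append(f"@{word_address:04X} {word:03X}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sample = ":04000000FF0F000AE4\n:00000001FF\n"
    print(to_mem(parse_intel_hex(sample)), end="")
```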
This core has been used with MPLAB and the CCS C compiler tools. A utility for converting from Intel Hex to Xilinx MEM files has been provided as part of this SoC project.
Synthesis/PAR Results
The data provided in this section represents the synthesis/PAR results of building the project for an XC3S50A-4VQ100I FPGA to achieve best performance. Thus, synthesis is performed with speed as its primary objective; resource sharing is used, and register balancing (forward and backward) is allowed. Mapping is performed with an area objective to compress the resulting image as much as possible. Simple timing constraints are applied to the three internal clock domains, with the primary objective being to achieve a minimum operating speed of 60 MHz for the P16C5x core, 66.667 MHz for the SPI Master (internal SPI bus), and 100 MHz for the SSP UART. The UART, although capable of operating at higher speeds, is fed a 29.4912 MHz reference clock.
Module Level Utilization
Module Level Utilization | Sun Nov 3 07:42:40 2013 |
Module | Partition | Slices | Slice Reg | LUTs | LUTRAM | BRAM | MULT18X18 | BUFG | DCM |
M16C5x/ | | 166/1103 | 24/604 | 73/1265 | 0/211 | 3/3 | 0/0 | 1/4 | 0/1 |
M16C5x/CPU | | 231/387 | 116/202 | 306/488 | 40/40 | 0/0 | 0/0 | 0/0 | 0/0 |
M16C5x/CPU/ALU | | 78/78 | 13/13 | 112/112 | 0/0 | 0/0 | 0/0 | 0/0 | 0/0 |
M16C5x/CPU/IDEC | | 78/78 | 73/73 | 70/70 | 0/0 | 0/0 | 0/0 | 0/0 | 0/0 |
M16C5x/ClkGen | | 10/20 | 11/24 | 5/8 | 1/1 | 0/0 | 0/0 | 1/3 | 0/1 |
M16C5x/ClkGen/ClkGen | | 4/4 | 4/4 | 1/1 | 0/0 | 0/0 | 0/0 | 2/2 | 1/1 |
M16C5x/ClkGen/FE1 | | 4/4 | 6/6 | 1/1 | 0/0 | 0/0 | 0/0 | 0/0 | 0/0 |
M16C5x/ClkGen/FE2 | | 2/2 | 3/3 | 1/1 | 0/0 | 0/0 | 0/0 | 0/0 | 0/0 |
M16C5x/SPI | | 5/90 | 8/75 | 0/135 | 0/34 | 0/0 | 0/0 | 0/0 | 0/0 |
M16C5x/SPI/MSTR | | 43/43 | 39/39 | 67/67 | 0/0 | 0/0 | 0/0 | 0/0 | 0/0 |
M16C5x/SPI/RF | | 21/21 | 14/14 | 33/33 | 16/16 | 0/0 | 0/0 | 0/0 | 0/0 |
M16C5x/SPI/TF | | 21/21 | 14/14 | 35/35 | 18/18 | 0/0 | 0/0 | 0/0 | 0/0 |
M16C5x/UART | | 0/440 | 0/279 | 0/561 | 0/136 | 0/0 | 0/0 | 0/0 | 0/0 |
M16C5x/UART/SSP_Slv | | 50/50 | 37/37 | 28/28 | 0/0 | 0/0 | 0/0 | 0/0 | 0/0 |
M16C5x/UART/UART | | 138/390 | 81/242 | 193/533 | 0/136 | 0/0 | 0/0 | 0/0 | 0/0 |
M16C5x/UART/UART/BRG | | 15/15 | 13/13 | 26/26 | 0/0 | 0/0 | 0/0 | 0/0 | 0/0 |
M16C5x/UART/UART/INT | | 7/28 | 4/25 | 5/11 | 0/0 | 0/0 | 0/0 | 0/0 | 0/0 |
M16C5x/UART/UART/INT/FE1 | | 4/4 | 3/3 | 1/1 | 0/0 | 0/0 | 0/0 | 0/0 | 0/0 |
M16C5x/UART/UART/INT/FE2 | | 2/2 | 3/3 | 1/1 | 0/0 | 0/0 | 0/0 | 0/0 | 0/0 |
M16C5x/UART/UART/INT/RE1 | | 4/4 | 4/4 | 1/1 | 0/0 | 0/0 | 0/0 | 0/0 | 0/0 |
M16C5x/UART/UART/INT/RE2 | | 4/4 | 4/4 | 1/1 | 0/0 | 0/0 | 0/0 | 0/0 | 0/0 |
M16C5x/UART/UART/INT/RE3 | | 2/2 | 3/3 | 1/1 | 0/0 | 0/0 | 0/0 | 0/0 | 0/0 |
M16C5x/UART/UART/INT/RE4 | | 5/5 | 4/4 | 1/1 | 0/0 | 0/0 | 0/0 | 0/0 | 0/0 |
M16C5x/UART/UART/RCV | | 35/35 | 26/26 | 56/56 | 0/0 | 0/0 | 0/0 | 0/0 | 0/0 |
M16C5x/UART/UART/RED1 | | 4/4 | 4/4 | 2/2 | 0/0 | 0/0 | 0/0 | 0/0 | 0/0 |
M16C5x/UART/UART/RED2 | | 3/3 | 4/4 | 2/2 | 0/0 | 0/0 | 0/0 | 0/0 | 0/0 |
M16C5x/UART/UART/RED3 | | 4/4 | 4/4 | 2/2 | 0/0 | 0/0 | 0/0 | 0/0 | 0/0 |
M16C5x/UART/UART/RED4 | | 4/4 | 4/4 | 2/2 | 0/0 | 0/0 | 0/0 | 0/0 | 0/0 |
M16C5x/UART/UART/RED5 | | 5/5 | 4/4 | 2/2 | 0/0 | 0/0 | 0/0 | 0/0 | 0/0 |
M16C5x/UART/UART/RF1 | | 55/55 | 20/20 | 93/93 | 72/72 | 0/0 | 0/0 | 0/0 | 0/0 |
M16C5x/UART/UART/TF1 | | 51/51 | 20/20 | 85/85 | 64/64 | 0/0 | 0/0 | 0/0 | 0/0 |
M16C5x/UART/UART/TMR | | 20/20 | 17/17 | 26/26 | 0/0 | 0/0 | 0/0 | 0/0 | 0/0 |
M16C5x/UART/UART/XMT | | 28/28 | 20/20 | 33/33 | 0/0 | 0/0 | 0/0 | 0/0 | 0/0 |
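Each X/Y pair in the table appears to follow the usual convention for this kind of report: X is the amount used by the module itself and Y is the amount used by the module together with its submodules. A quick way to check that reading is to confirm that a parent's total equals its own figure plus the totals of its children. The sketch below does this for the Slices column of the top-level module and the CPU; the figures are copied from the table, while the grouping of children under each parent is an interpretation of the report rather than something stated in it.

```python
# Consistency check for the "own/total" reading of the utilization table.
# Values are (own, total) slice counts copied from the table; the
# parent/child grouping is an interpretation of the report layout.

SLICES = {
    "M16C5x": (166, 1103),
    "CPU": (231, 387),
    "ALU": (78, 78),
    "IDEC": (78, 78),
    "ClkGen": (10, 20),
    "SPI": (5, 90),
    "UART": (0, 440),
}

CHILDREN = {
    "M16C5x": ["CPU", "ClkGen", "SPI", "UART"],
    "CPU": ["ALU", "IDEC"],
}


def total_is_consistent(parent: str) -> bool:
    """True if own + sum of children's totals equals the parent's total."""
    own, total = SLICES[parent]
    child_sum = sum(SLICES[child][1] for child in CHILDREN[parent])
    return own + child_sum == total


if __name__ == "__main__":
    for name in CHILDREN:
        print(name, total_is_consistent(name))
```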
Timing Constraints
Timing Constraints | Sun Nov 3 07:41:40 2013 |
Met | Constraint | Check | Worst Case Slack | Best Case Achievable | Timing Errors | Timing Score |
Yes | TS_Clk = PERIOD TIMEGRP "Clk" 16.666 ns HIGH 50% | SETUP | 0.039ns | 16.627ns | 0 | 0 |
 | | HOLD | 0.834ns | | 0 | 0 |
Yes | TS_SPI_SCK = PERIOD TIMEGRP "SPI_SCK" 15 ns HIGH 50% | SETUP | 0.337ns | 14.326ns | 0 | 0 |
 | | HOLD | 1.064ns | | 0 | 0 |
Yes | TS_Clk_UART = PERIOD TIMEGRP "Clk_UART" 10 ns HIGH 50% | SETUP | 1.318ns | 8.682ns | 0 | 0 |
 | | HOLD | 0.785ns | | 0 | 0 |
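The period figures in the timing report translate directly into clock frequencies. The short sketch below converts both the constrained period and the best case achievable period of each clock domain into MHz; the values are copied from the report, and the dictionary and function names are only illustrative.

```python
# Convert the constrained and best-case periods from the timing report
# into frequencies. Periods are in nanoseconds, copied from the report.

PERIODS_NS = {
    # clock group: (constrained period, best case achievable period)
    "Clk": (16.666, 16.627),
    "SPI_SCK": (15.0, 14.326),
    "Clk_UART": (10.0, 8.682),
}


def period_to_mhz(period_ns: float) -> float:
    """Frequency in MHz for a clock period given in nanoseconds."""
    return 1000.0 / period_ns


if __name__ == "__main__":
    for group, (constraint, best_case) in PERIODS_NS.items():
        print(f"{group}: constraint {period_to_mhz(constraint):.3f} MHz, "
              f"best case {period_to_mhz(best_case):.3f} MHz")
```

The constrained periods correspond to the 60 MHz, 66.667 MHz, and 100 MHz targets given at the start of this section.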
Xilinx Design Summary
M16C5x Project Status (07/05/2013 - 18:41:59) |
Project File: | M16C5x.ise | Current State: | Programming File Generated |
Module Name: | M16C5x | | |
Target Device: | xc3s50a-4vq100 | | |
Product Version: | ISE 10.1.03 - Foundation | Routing Results: | All Signals Completely Routed |
Design Goal: | Balanced | Timing Constraints: | All Constraints Met |
Design Strategy: | Xilinx Default (unlocked) | Final Timing Score: | 0 |
Device Utilization Summary |
Logic Utilization | Used | Available | Utilization | Note(s) |
Number of Slice Flip Flops | 604 | 1,408 | 42% | |
Number of 4 input LUTs | 1,217 | 1,408 | 86% | |
Logic Distribution | | | | |
Number of occupied Slices | 692 | 704 | 98% | |
Number of Slices containing only related logic | 692 | 692 | 100% | |
Number of Slices containing unrelated logic | 0 | 692 | 0% | |
Total Number of 4 input LUTs | 1,265 | 1,408 | 89% | |
Number used as logic | 1,006 | | | |
Number used as a route-thru | 48 | | | |
Number used as 16x1 RAMs | 8 | | | |
Number used for Dual Port RAMs | 170 | | | |
Number used for 32x1 RAMs | 32 | | | |
Number used as Shift registers | 1 | | | |
Number of bonded IOBs | 20 | 68 | 29% | |
IOB Flip Flops | 5 | | | |
Number of BUFGMUXs | 4 | 24 | 16% | |
Number of DCMs | 1 | 2 | 50% | |
Number of RAMB16BWEs | 3 | 3 | 100% | |
Date Generated: 11/03/2013 - 07:36:08
Date Generated: 11/02/2013 - 13:56:37