M16C5x Soft-Core Microcomputer
=======================

Copyright (C) 2013, Michael A. Morris.
All Rights Reserved.

Released under LGPL.

General Description
-------------------

This project demonstrates the use of a PIC16C5x-compatible core as an
FPGA-based processor. The core provided is instruction set compatible, but it
is not a cycle accurate model of any particular PIC microcomputer. It
implements the 12-bit instruction set, the timer 0 module, the pre-scaler, and
the watchdog timer.

As configured, the core supports single cycle (1) operation with internal
block RAM serving as program memory. In addition to the block RAM program
store, a 4x clock generator and reset controller are included as part of the
demonstration.

Three I/O ports are supported, but they are accessed as external registers and
buffers using a bidirectional data bus. The TRIS I/O control registers are
similarly supported. Thus, the core's user is able to map the TRIS and I/O
port registers in a manner appropriate to the intended application.

Read-modify-write operations on the I/O ports do not generate read strobes.
Read strobes of the three I/O ports are generated only if the ports are being
read using MOVF xxx,0 instructions. Similarly, the write enables for the three
I/O ports are asserted whenever the ports are updated. This occurs during MOVWF
instructions, or during read-modify-write operations such as XORF, MOVF, etc.
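The strobe rules above can be summarized as a small behavioral model. The Python sketch below is only an illustration and is not taken from the Verilog source: the function name `port_strobes`, the set of mnemonics treated as read-modify-write, and the `d` convention (0 = result to W, 1 = result to the file register) are assumptions. Instructions the text does not mention, such as the bit-oriented BCF/BSF, are not modeled.

```python
# Hypothetical behavioral model of the I/O port strobes described above.
# Assumption: byte-oriented PIC16C5x instructions that carry a destination
# bit "d" count as read-modify-write when d = 1.
READ_MODIFY_WRITE = {
    "XORF", "IORF", "ANDF", "ADDWF", "SUBWF", "COMF", "INCF", "DECF",
    "RLF", "RRF", "SWAPF", "MOVF", "INCFSZ", "DECFSZ",
}


def port_strobes(mnemonic, d=1):
    """Return (read_strobe, write_enable) for one access to an I/O port."""
    m = mnemonic.upper()
    if m == "MOVF" and d == 0:
        # Plain read of the port into W: the only case with a read strobe.
        return (True, False)
    if m == "MOVWF":
        # Plain write of W to the port.
        return (False, True)
    if m in READ_MODIFY_WRITE and d == 1:
        # The port is updated, but no read strobe is generated.
        return (False, True)
    # Everything else is outside the scope of this sketch.
    return (False, False)
```

For example, `port_strobes("MOVF", 0)` reports a read strobe only, while `port_strobes("XORF", 1)` reports a write enable only.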

Implementation
--------------

The implementation of the core provided consists of several Verilog source files
and memory initialization files:

    M16C5x.v         - Top level module
    M16C5x_ClkGen.v  - M16C5x Clock/Reset Generator
    P16C5x.v         - PIC16C5x-compatible processor core
    P16C5x_IDEC.v    - ROM-based instruction decoder for PIC16C5x core
    P16C5x_ALU.v     - Arithmetic & Logic Unit for PIC16C5x core
    M16C5x_SPI.v     - High-Speed, FIFO-buffered SPI Master Interface
    DPSFmnCE.v       - Configurable Depth/Width LUT-based Synch FIFO
    TF_Init.coe      - Transmit FIFO Initialization file
    RF_Init.coe      - Receive FIFO Initialization file
    SPIxIF.v         - Configurable Master SPI I/F with clock Generator
    M16C5x_UART.v    - UART with Serial Interface
    SSPx_Slv.v       - SSP-compatible Slave Interface
    SSP_UART.v       - SSP-compatible UART
    re1ce.v          - Rising Edge Clock Domain Crossing Synchronizer
    DPSFmnCE.v       - Configurable Depth/Width LUT-based Synch FIFO
    UART_TF.coe      - UART Transmit FIFO Initialization file
    UART_RF.coe      - UART Receive FIFO Initialization file
    UART_BRG.v       - UART Baud Rate Generator
    UART_TXSM.v      - UART Transmit State Machine (includes SR)
    UART_RXSM.v      - UART Receive State Machine (includes SR)
    UART_RTO.v       - UART Receive Timeout Generator
    UART_INT.v       - UART Interrupt Generator

    M16C5x_Test.coe  - M16C5x Test Program Memory Initialization File
    M16C5x_Tst2.coe  - M16C5x Test #2 Program Memory Initialization File
    M16C5x_Tst3.coe  - M16C5x Test #3 Program Memory Initialization File
    M16C5x_Tst4.coe  - M16C5x Test #4 Program Memory Initialization File

    M16C5x.ucf       - M16C5x User Constraint File
    M16C5x.bmm       - M16C5x Block RAM Memory Map File

Verilog testbench files are included for the processor core, the FIFO, and the
SPI modules.

    tb_M16C5x.v      - testbench for the soft-core processor module
    tb_P16C5x.v      - testbench for the processor core module
    tb_DPSFmnCE.v    - testbench for the LUT-based FIFO module
    tb_SPIxIF.v      - testbench for the SPI Master Interface module

Also provided are the MPLAB project and the source files used to create the
memory initialization files for testing the microcomputer application. These
files are found in the MPLAB subdirectory of the Code directory.

Finally, the configuration of the Xilinx tools used to synthesize, map, place,
and route is captured in the TCL file:

    M16C5x_3S50A.tcl - TCL file for XC3S50A-4VQG100I FPGA

Run this TCL script from within the TCL console of ISE, or examine it in a
text editor, to set up the project files and to set the tools to the options
used to achieve the results provided here.

A utility program is also provided to convert MPLAB Intel Hex programming files
into MEM files for use with the Xilinx Data2MEM utility program, which speeds
the process of incorporating program/data/parameter data into block RAMs. The
TCL file also incorporates the process parameter changes needed to get the BMM
file processed by Map/PAR/Bitgen.

    IH2MEM.c         - Source code for Intel Hex to MEM utility
    IH2MEM.exe       - Windows Executable (32-bit)

    M16C5x_Tst3.mem  - M16C5x Test #3 Program Memory Data2Mem File
    M16C5x_Tst4.mem  - M16C5x Test #4 Program Memory Data2Mem File
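To illustrate what an Intel Hex to MEM conversion involves, here is a minimal Python sketch. It is not a port of IH2MEM.c, whose exact output layout is not shown here. The sketch assumes that the hex file stores each 12-bit instruction as two bytes, low byte first, and that the MEM file uses an `@address` tag followed by hex words; the function name `ihex_to_mem` and the 3-digit word format are illustrative choices.

```python
def ihex_to_mem(hex_text, word_mask=0xFFF):
    """Convert Intel HEX text into '@addr word word ...' lines (a sketch)."""
    lines = []
    for raw in hex_text.splitlines():
        raw = raw.strip()
        if not raw:
            continue
        if not raw.startswith(":"):
            raise ValueError("not an Intel HEX record: %r" % raw)
        record = bytes.fromhex(raw[1:])
        # All bytes of a valid record, checksum included, sum to 0 mod 256.
        if sum(record) & 0xFF != 0:
            raise ValueError("checksum error: %r" % raw)
        count = record[0]
        addr = (record[1] << 8) | record[2]
        rtype = record[3]
        data = record[4:4 + count]
        if rtype == 0x01:      # end-of-file record
            break
        if rtype != 0x00:      # this sketch ignores non-data records
            continue
        if count % 2 or addr % 2:
            raise ValueError("expected whole 16-bit words: %r" % raw)
        # Assumption: little-endian byte pairs, masked to 12-bit words.
        words = [(data[i] | (data[i + 1] << 8)) & word_mask
                 for i in range(0, count, 2)]
        # Byte address / 2 gives the program memory word address.
        lines.append("@%04X %s" % (addr // 2,
                                   " ".join("%03X" % w for w in words)))
    return "\n".join(lines)


example = ":04000000000AFF0FE4\n:00000001FF"
print(ihex_to_mem(example))
```

The example record holds two words at word address 0. A real conversion would also need to respect the address ranges declared in the BMM file, which this sketch does not attempt.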

Synthesis
---------

The primary objective of the M16C5x is to synthesize a processor core, 4kW of
program memory, a buffered SPI master, and a buffered UART into a Xilinx
XC3S50A-4VQG100I FPGA. The present implementation includes the P16C5x core,
4kW of program memory, a dual-channel SPI Master I/F, and an SSP-compatible
UART supporting baud rates from 3 Mbps to 1200 bps.

Using ISE 10.1i SP3, the implementation results for an XC3S50A-4VQG100I are as
follows:

    Number of Slice FFs:             619 of 1408    43%
    Number of 4-input LUTs:         1287 of 1408    92%
    Number of Occupied Slices:       701 of  704    99%
    Total Number of 4-input LUTs:   1333 of 1408    94%

        Logic:                      1052
        Route-Through:                46
        16x1 RAMs:                     8
        Dual-Port RAMs:              194
        32x1 RAMs:                    32
        Shift Registers:               1

    Number of BUFGMUXs:                4 of   24    16%
    Number of DCMs:                    1 of    2    50%
    Number of RAMB16BWEs:              3 of    3   100%

    Best Case Achievable:   12.381 ns (0.119 ns Setup, 0.691 ns Hold)
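As a quick cross-check of the figures above, the short Python sketch below adds up the LUT breakdown and converts the best-case period into a frequency. The variable names are illustrative; the numbers are copied from the report.

```python
# LUT usage by function, copied from the implementation results above.
lut_breakdown = {
    "Logic": 1052,
    "16x1 RAMs": 8,
    "Dual-Port RAMs": 194,
    "32x1 RAMs": 32,
    "Shift Registers": 1,
}
ROUTE_THROUGH = 46

# LUTs used for logic, RAM, and shift registers.
luts_used = sum(lut_breakdown.values())
# Adding the route-through LUTs gives the reported total.
total_luts = luts_used + ROUTE_THROUGH

# Best-case achievable period of 12.381 ns expressed in MHz.
best_case_mhz = 1000.0 / 12.381

print(luts_used, total_luts, round(best_case_mhz, 2))
```

The two sums match the reported 1287 and 1333 LUTs, and the best-case period corresponds to a little under 81 MHz.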

Status
------

Design and initial verification are complete. Verification using ISim, MPLAB,
and a board with an XC3S200AN-4VQG100I FPGA, various oscillators, SEEPROMs,
and RS-232/RS-485 transceivers is underway.

In-circuit testing of the M16C5x soft-core microcomputer has demonstrated that
the M16C5x can operate at up to **147.4560 MHz**. At this internal system clock
frequency, a 10x multiplication of the external reference oscillator, the SPI
shift clock divider must be set to divide the system clock by 4, which
generates an SPI shift clock frequency of 36.864 MHz. Various combinations of
the DCM multiplier have been generated and tested in the XC3S200A-4VQG100I
FPGA. The following table shows the system clock frequencies tested, the SPI
shift clock frequencies tested, and the maximum achievable standard UART bit
rate:

    DCM Multiplier   System Clock (MHz)   SPI Clock (MHz)   Max UART Bit Rate (Mbps)
    4x               58.9824              29.4912           3.6864
    5x               73.7280              36.8640           0.9216
    6x               88.4736              44.2368           0.9216
    6.5x             95.8464              47.9232           0.4608
    7x               103.2192             51.6096           0.9216
    7.5x             110.5920             55.2960           0.4608
    8x               117.9648             58.9824           7.3728
    8.5x             125.3376             62.6688           0.4608
    10x              147.4560             36.8640           1.8432
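The clock columns of the table follow directly from the DCM multiplier. The Python sketch below reproduces them under two assumptions: the reference oscillator is the 14.7456 MHz part named in the release notes, and the SPI shift clock is the system clock divided by 2 unless stated otherwise (the divide-by-2 default is inferred from the table rows, not taken from the source code). The function names are illustrative.

```python
F_IN_MHZ = 14.7456  # external reference oscillator, per the release notes


def clocks(multiplier, spi_divider=2):
    """Return (system clock, SPI shift clock) in MHz for a DCM multiplier."""
    system = F_IN_MHZ * multiplier
    return round(system, 4), round(system / spi_divider, 4)


def period_ns(freq_mhz):
    """Clock period in nanoseconds for a frequency given in MHz."""
    return 1000.0 / freq_mhz


for mult, div in [(4, 2), (8, 2), (10, 4)]:
    print(mult, clocks(mult, div))
```

With `spi_divider=4`, the 10x row gives the 36.864 MHz SPI shift clock quoted in the text.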

These results are only applicable to this particular configuration. The period
constraint for the system clock is set for 12.5 ns, or 80 MHz. The
relationship between the clock enable, 0.5 of the system clock, does not seem
to be accommodated by the reported performance values. Further investigation is
needed to establish if the results provided in the previous table should be
accepted as the performance limits of the M16C5x core in this FPGA family.

A board has been configured with an XC3S50A-4VQG100I component, and it
operates as expected at 80 MHz. A new internal resource configuration makes
the UART clock, Clk_UART, a fixed output of the DCM. The UART clock is fixed
at 2x ClkIn, or as is the case in this test configuration, 29.4912 MHz.

The results of testing like that performed above with the XC3S200A-4VQG100I
are shown below. They indicate that the upper operating frequency is limited
to **140.0832 MHz**. This upper limit is most likely imposed by the reduction
in routing resources. The utilization factor is **99%** in an XC3S50A-4VQG100I
FPGA, and _~50%_ in an XC3S200A-4VQG100I FPGA. The larger number of LUTs/Slices
and routing resources allows Map and Place greater flexibility to satisfy the
timing constraints.

    DCM Multiplier   System Clock (MHz)   SPI Clock (MHz)   Max UART Bit Rate (Mbps)
    4x               58.9824              29.4912           1.8432
    8x               117.9648             58.9824           1.8432
    8.5x             125.3376             62.6688           1.8432
    9x               132.7104             66.3552           1.8432
    9.5x             140.0832             70.0416           1.8432

Release Notes
-------------

### Release 1.0

In this release, the M16C5x has been synthesized, mapped, placed, routed, and
used to configure an FPGA. The FPGA used for this initial test of the M16C5x
was the XC3S200A-4VQG100I FPGA. The test program provided demonstrated that
the M16C5x was executing the program in the same manner as simulated with the
MPLAB simulator.

Using an external 14.7456 MHz oscillator, selected for use with the UART,
square waves were generated by the core to illuminate external LEDs using the
upper 6 bits of PortA. The square waves have the appropriate ratios, and the
frequency of the fastest LED drive signal is ~4.753 kHz.

The clock generator multiplies the input frequency to 58.9824 MHz, which
results in an effective instruction frequency of 29.4912 MHz because of the
two cycle nature of the core. The instruction loop is essentially
`8*(8+3*256)`, which equals 6208 cycles per LED toggle. The measured toggle
frequency of the fastest LED is approximately equal to 29.4912 MHz / 6208, or
4.750 kHz.
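The arithmetic behind the expected toggle frequency can be checked with a few lines of Python. The inner term of 8 in the loop expression is inferred from the stated total of 6208 cycles; the variable names are illustrative.

```python
SYSTEM_CLOCK_HZ = 58.9824e6           # 4x the 14.7456 MHz oscillator
F_INSTR_HZ = SYSTEM_CLOCK_HZ / 2      # two clock cycles per instruction

# Loop length as described above; the inner 8 is inferred from 6208.
cycles_per_toggle = 8 * (8 + 3 * 256)

toggle_hz = F_INSTR_HZ / cycles_per_toggle

print(cycles_per_toggle, round(toggle_hz, 1))
```

The computed value of about 4750.5 Hz sits within 0.1% of the measured ~4.753 kHz.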

Work will continue to verify the testbench results with the FPGA. The next
release should include the UART, and test the ability of the core to
send/receive data using the FIFOs at rates of 115,200 baud or greater.

### Release 2.0

In this release, the UART has been added. An update has been made to the SPI
I/F Master function; the update corrects a fault with the framing of SPI Mode 3
frames with shift lengths greater than 1 byte. A correction, not fully tested
or verified, was made to the P16C5x core to correct anomalous behavior for
BTFSC/BTFSS instructions.

The UART has been integrated with the Release 1.0 core. Verification of the
integrated interface is underway.

### Release 2.1

Testing with an M16C5x core processor program assembled using MPLAB and ISim
showed that polling of the UART status register to determine whether the
transmit FIFO was empty or not (using the iTFE interrupt flag) would clear the
generated interrupt flags before they had actually been captured and shifted
in the SSP response to the core.

This indicated a clock domain crossing issue in the interrupt clearing logic.
This release fixes that issue. Previous use of the UART did not poll the USR,
so this problem did not manifest itself in a reasonable amount of time, if
ever. In other words, the synchronization fault has been present all along in
the implementation, but the module's usage in the application (or testbench)
did not present the conditions under which the fault manifests.

The correction required registering the USR data on the SSP clock domain, and
qualifying the clearing of the interrupt flags on the basis of whether the
flag is set in both domains when the USR is read. The addition of the register
reduced the logic utilization, and only a small additional time delay was
incurred. The resulting design is still able to fit into a Spartan 3A
XC3S50A-4VQG100I FPGA.

Modified the UART Baud Rate Generator. Removed the fixed 16x12 ROM that
provided the pre-scaler and divider constants for a fixed set of 16 baud
rates. Added a 12-bit, write-only register, BRR - Baud Rate Register, that can
be used to set the baud rate to 1/16 of the processor clock or lower. With a
58.9824 MHz oscillator, the baud rate can range from 3.6864 Mbps down to
900 bps. Set the default baud rate to 9600 for a 58.9824 MHz UART clock.
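The stated range is consistent with a 16x oversampling clock followed by a divisor of 1 to 4096. The Python sketch below works through that arithmetic. It treats the 12 bits as one combined divisor, which is an assumption: how the BRR splits them between prescaler and divider, and whether the register holds the divisor or the divisor minus one, is not described here. The function name `brr_divisor` is illustrative.

```python
def brr_divisor(clk_hz, baud, bits=12):
    """Return (divisor, actual_baud) for a 16x-oversampled baud generator.

    Assumes a single combined divisor in the range 1..2**bits.
    """
    base = clk_hz / 16.0               # highest baud rate available
    divisor = int(round(base / baud))  # nearest integer divisor
    if not 1 <= divisor <= 2 ** bits:
        raise ValueError("baud rate out of range for this clock")
    return divisor, base / divisor


UART_CLK_HZ = 58982400  # 58.9824 MHz

for baud in (3686400, 115200, 9600, 900):
    print(baud, brr_divisor(UART_CLK_HZ, baud))
```

With this clock, 9600 baud needs a divisor of 384, and the largest divisor, 4096, gives the 900 bps lower limit quoted above.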

Utilization for an XC3S50A-4VQG100I FPGA is 100%. The 128 byte LUT-based
receive FIFO can be reduced to accommodate some additional functions. Synthesis
and MAP/PAR are able to implement the design. There is also some place holder
logic that can be used for other purposes.

### Release 2.2

Updated the soft-core so as to be able to parameterize the microcontroller
from the top module. Changed the frequency multiplication from 4 to 5 in order
to test operation at the frequency to which the UCF constrains the Map/PAR
tools. The input clock is driven by a 14.7456 MHz oscillator, and the clock
multiplier (DCM) generates **73.7280 MHz**. The default baud rate, 9600,
required that the default settings be adjusted. All other parameters remain
the same.

Also added a Block RAM Memory Map file to the project. Utilized Xilinx's
Data2MEM tool to insert modified program contents into the affected Block RAMs
using MEM files derived from standard MPLAB outputs. A tutorial on this subject
is being prepared and will be released on an associated Wiki soon.

### Release 2.3

Updated the soft-core microcomputer. Fixed the UART clock, Clk_UART, to twice
the input frequency. This means that the UART operates with a fixed reference
frequency, unlike Release 2.2 where Clk_UART was set to the system clock
frequency.

Also added asynchronous resets to several registers in the UART so that it
would simulate correctly with ISim. Direct control of the UART prescaler and
divider was previously untested using the simulation. With the change made to
the UART baud rate generator, the reset/power-on values of these two logic
functions are unknown. The unknowns, "X", propagate through the baud rate
generator and prevent the simulator from resolving the state of the internal
baud rate clock of the UART. Thus, although the rest of the circuits simulate
as expected, the transmit shift register never shifts because there's an
"unknown" signal level applied on the bit clock.

### Release 2.4

Polling the UART's Receive Data Register (RDR) uncovered a race condition like
that previously found and corrected with regard to polling the UART Status
Register (USR). Correction required registering the RDR in the SCK clock
domain, and qualifying the read enable pulse for the receive FIFO so that it
is only generated if the Receive Rdy flag is present in the SCK clock domain.
Otherwise, the Receive FIFO is not read, which prevents the inadvertent
clearing of the FIFO empty flag.

Test Program 4, M16C5x_Tst4.asm, is used to test the receive signal path.
Hyperterminal and Tera Term were used to send (without local echo) several
large text files through the M16C5x UART. The test program polls the RDR, and
if a character is received without error, then upper case characters are
converted to lower case characters, and vice-versa. Using a Keyspan Quad Port
USB serial port adapter, characters were sent to the M16C5x at a rate of
921.6k baud, the highest programmable baud rate supported by the Keyspan
device. The echo back to the terminal emulator appeared to be without error.
(**Note:** _the two wire RS-232 mode of the UART was used for this test. The
ADM3232 charge-pump RS-232 transceiver appeared to work well at this frequency.
Some slew rate limiting is visible on an O-scope, but it appears to be
tolerable. These tests were conducted while the core was operating at
**117.9648 MHz**._)
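The case conversion performed by the test program can be modeled on a host for comparing the echoed text against the file that was sent. The Python sketch below is an assumption about the behavior, not a transcription of M16C5x_Tst4.asm: it flips bit 5 of ASCII letters and passes every other byte through unchanged. The function names are illustrative.

```python
def swap_case(byte):
    """Swap the case of an ASCII letter; return other bytes unchanged."""
    if 0x41 <= byte <= 0x5A or 0x61 <= byte <= 0x7A:
        return byte ^ 0x20   # bit 5 separates 'A'..'Z' from 'a'..'z'
    return byte


def expected_echo(data):
    """Model of the echo for a block of bytes received without error."""
    return bytes(swap_case(b) for b in data)


print(expected_echo(b"Hello, M16C5x!"))
```

Applying `expected_echo` twice returns the original data, which gives a simple host-side check of a captured echo.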

This release is expected to be the last public release of this soft-core
microcomputer. The released core and peripherals are sufficient to demonstrate
a non-trivial FPGA implementation of a soft-core microcomputer. Further
developments will be focused on improving access to the internal block RAMs,
and improving the I/O capabilities of the released core.

### Release 2.5

Converted the core to operate in a single cycle mode with the block RAM
memories of the FPGA. Operating frequency, in a -4 Spartan 3A FPGA, is 60+
MHz. This rate is equivalent to the 117.9648 MHz reported above for Release
2.4. Some combinatorial path improvements were made to the processor core,
P16C5x, by using wired-OR bus connections rather than explicit multiplexers.
These improvements also provided some reductions in the resource utilization
of the project.