M16C5x Soft-Core Microcomputer
=======================

Copyright (C) 2013, Michael A. Morris.
All Rights Reserved.

Released under LGPL.

General Description
-------------------
 
This project demonstrates the use of a PIC16C5x-compatible core as an
FPGA-based processor. The core provided is instruction set compatible, but it
is not a cycle accurate model of any particular PIC microcomputer. It
implements the 12-bit instruction set, the timer 0 module, the pre-scaler, and
the watchdog timer.

As configured, the core supports single cycle (1) operation with internal
block RAM serving as program memory. In addition to the block RAM program
store, a 4x clock generator and reset controller is included as part of the
demonstration.

Three I/O ports are supported, but they are accessed as external registers and
buffers using a bidirectional data bus. The TRIS I/O control registers are
similarly supported. Thus, the core's user is able to map the TRIS and I/O
port registers in a manner appropriate to the intended application.

Read-modify-write operations on the I/O ports do not generate read strobes.
Read strobes of the three I/O ports are generated only if the ports are being
read using MOVF xxx,0 instructions. Similarly, the write enables for the three
I/O ports are asserted whenever the ports are updated. This occurs during MOVWF
instructions, or during read-modify-write operations such as XORF, MOVF, etc.
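The strobe rules in the preceding paragraph can be summarized as a small truth function. The following is only a behavioural sketch in Python, not the Verilog used by the core: the function name `port_strobes`, the `RMW_OPS` set, and the use of the destination bit `d` (0 selects W, 1 selects the port register) are assumptions made for illustration, and only the mnemonics named in the text are listed.

```python
# Hypothetical model of the I/O port strobe rules described in the text.
# RMW_OPS lists only the read-modify-write mnemonics the text names;
# a real decoder would cover the full instruction set.
RMW_OPS = {"XORF", "MOVF"}


def port_strobes(op: str, d: int = 1) -> tuple[bool, bool]:
    """Return (read_strobe, write_enable) for an instruction aimed at a port.

    d is the destination bit: 0 sends the result to W, 1 back to the port.
    """
    # Only a plain read into W (MOVF xxx,0) produces a read strobe.
    read_strobe = op == "MOVF" and d == 0
    # The port is updated by MOVWF, or by a read-modify-write with d = 1.
    write_enable = op == "MOVWF" or (op in RMW_OPS and d == 1)
    return read_strobe, write_enable


for op, d in [("MOVF", 0), ("MOVWF", 1), ("XORF", 1), ("XORF", 0)]:
    print(op, d, port_strobes(op, d))
```

Under these assumptions an instruction never asserts both strobes, which matches the statement that read-modify-write operations do not generate read strobes.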
 
Implementation
--------------

The implementation of the core provided consists of several Verilog source files
and memory initialization files:

    M16C5x.v                - Top level module
        M16C5x_ClkGen.v     - M16C5x Clock/Reset Generator
        P16C5x.v            - PIC16C5x-compatible processor core
            P16C5x_IDEC.v   - ROM-based instruction decoder for PIC16C5x core
            P16C5x_ALU.v    - Arithmetic & Logic Unit for PIC16C5x core
        M16C5x_SPI.v        - High-Speed, FIFO-buffered SPI Master Interface
            DPSFmnCE.v      - Configurable Depth/Width LUT-based Synch FIFO
                TF_Init.coe - Transmit FIFO Initialization file
                RF_Init.coe - Receive FIFO Initialization file
            SPIxIF.v        - Configurable Master SPI I/F with clock Generator
        M16C5x_UART.v       - UART with Serial Interface
            SSPx_Slv.v      - SSP-compatible Slave Interface
            SSP_UART.v      - SSP-compatible UART
                re1ce.v     - Rising Edge Clock Domain Crossing Synchronizer
                DPSFmnCE.v  - Configurable Depth/Width LUT-based Synch FIFO
                    UART_TF.coe - UART Transmit FIFO Initialization file
                    UART_RF.coe - UART Receive FIFO Initialization file
                UART_BRG.v  - UART Baud Rate Generator
                UART_TXSM.v - UART Transmit State Machine (includes SR)
                UART_RXSM.v - UART Receive State Machine (includes SR)
                UART_RTO.v  - UART Receive Timeout Generator
                UART_INT.v  - UART Interrupt Generator

        M16C5x_Test.coe     - M16C5x Test Program Memory Initialization File
        M16C5x_Tst2.coe     - M16C5x Test #2 Program Memory Initialization File
        M16C5x_Tst3.coe     - M16C5x Test #3 Program Memory Initialization File
        M16C5x_Tst4.coe     - M16C5x Test #4 Program Memory Initialization File

        M16C5x.ucf          - M16C5x User Constraint File
        M16C5x.bmm          - M16C5x Block RAM Memory Map File
 
Verilog testbench files are included for the processor core, the FIFO, and the
SPI modules.

    tb_M16C5x.v             - testbench for the soft-core processor module
    tb_P16C5x.v             - testbench for the processor core module
    tb_DPSFmnCE.v           - testbench for the LUT-based FIFO module
    tb_SPIxIF.v             - testbench for the SPI Master Interface module
 
Also provided are the MPLAB project and the source files used to create the
memory initialization files for testing the microcomputer application. These
files are found in the MPLAB subdirectory of the Code directory.

Finally, the configuration of the Xilinx tools used to synthesize, map, place,
and route is captured in the TCL file:

        M16C5x_3S50A.tcl    - TCL file for XC3S50A-4VQG100I FPGA

Run this TCL script from within the TCL console of ISE, or examine it in a
text editor, to set up the project files and to set the tools to the options
used to achieve the results provided here.
 
A utility program has been added to convert MPLAB Intel Hex programming files
into MEM files for use with the Xilinx Data2MEM utility program, which speeds
the process of incorporating program/data/parameter data into block RAMs. The
TCL file also incorporates the process parameter changes needed to get the BMM
file processed by Map/PAR/Bitgen.

    IH2MEM.c                    - Source code for Intel Hex to MEM utility
    IH2MEM.exe                  - Windows Executable (32-bit)

        M16C5x_Tst3.mem         - M16C5x Test #3 Program Memory Data2Mem File
        M16C5x_Tst4.mem         - M16C5x Test #4 Program Memory Data2Mem File
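To give a feel for the kind of conversion such a utility performs, here is a minimal Python sketch of an Intel Hex to MEM translation. It is not a port of IH2MEM.c, whose exact behaviour is not described here. The function names, the choice of word addresses after `@`, the three-hex-digit word format, and the little-endian pairing of bytes into program words are all assumptions made for illustration.

```python
def parse_ihex(text: str) -> dict[int, int]:
    """Collect {byte_address: byte} from Intel Hex data records (type 00)."""
    mem: dict[int, int] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if not line.startswith(":"):
            raise ValueError("record must start with ':'")
        raw = bytes.fromhex(line[1:])
        count, addr, rtype = raw[0], (raw[1] << 8) | raw[2], raw[3]
        if len(raw) != count + 5:
            raise ValueError("record length does not match byte count")
        # All bytes of a record, checksum included, sum to 0 modulo 256.
        if sum(raw) & 0xFF:
            raise ValueError("checksum mismatch")
        if rtype == 0x01:        # end-of-file record
            break
        if rtype == 0x00:        # data record; other types are ignored here
            for i, value in enumerate(raw[4:4 + count]):
                mem[addr + i] = value
    return mem


def to_mem(mem: dict[int, int], digits: int = 3) -> str:
    """Pair bytes into little-endian words and emit '@addr' plus hex words."""
    lines: list[str] = []
    previous = None
    for word_addr in sorted({a // 2 for a in mem}):
        low = mem.get(2 * word_addr, 0)
        high = mem.get(2 * word_addr + 1, 0)
        if previous is None or word_addr != previous + 1:
            lines.append(f"@{word_addr:04X}")   # new address tag after a gap
        lines.append(f"{(high << 8) | low:0{digits}X}")
        previous = word_addr
    return "\n".join(lines)


# Two hand-built data records followed by the end-of-file record.
example = ":020000000A0CE8\n:020004000008F2\n:00000001FF\n"
print(to_mem(parse_ihex(example)))
```

The address tag is re-emitted only when the word addresses are not contiguous, which keeps the output compact for the mostly linear program images that an assembler produces.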
 
Synthesis
---------

The primary objective of the M16C5x is to synthesize a processor core, 4kW of
program memory, a buffered SPI master, and a buffered UART into a Xilinx
XC3S50A-4VQG100I FPGA. The present implementation includes the P16C5x core,
4kW of program memory, a dual-channel SPI Master I/F, and an SSP-compatible
UART supporting baud rates from 3 Mbps to 1200 bps.

Using ISE 10.1i SP3, the implementation results for an XC3S50A-4VQ100I are as
follows:

    Number of Slice FFs:                619 of 1408      43%
    Number of 4-input LUTs:            1287 of 1408      92%
    Number of Occupied Slices:          701 of  704      99%
    Total Number of 4-input LUTs:      1333 of 1408      94%

                    Logic:             1052
                    Route-Through:       46
                    16x1 RAMs:            8
                    Dual-Port RAMs:     194
                    32x1 RAMs:           32
                    Shift Registers:      1

    Number of BUFGMUXs:                   4 of   24      16%
    Number of DCMs:                       1 of    2      50%
    Number of RAMB16BWEs:                 3 of    3     100%

    Best Case Achievable:           12.381 ns (0.119 ns Setup, 0.691 ns Hold)
 
Status
------

Design and initial verification are complete. Verification using ISim, MPLAB,
and a board with an XC3S200AN-4VQG100I FPGA, various oscillators, SEEPROMs,
and RS-232/RS-485 transceivers is underway.

In circuit testing of the M16C5x soft-core microcomputer has demonstrated that
the M16C5x can operate at up to **147.4560 MHz**. At this internal system clock
frequency, a 10x multiplication of the external reference oscillator, the SPI
shift clock divider must be set to divide the system clock by 4, which
generates an SPI shift clock frequency of 36.864 MHz. Various combinations of
the DCM multiplier have been generated and tested in the XC3S200A-4VQG100I
FPGA. The following table shows the system clock frequencies tested, the SPI
shift clock frequencies tested, and the maximum achievable standard UART bit
rate:

    DCM Multiplier  System Clock (MHz)  SPI Clock (MHz) Max UART bit rate (Mbps)
        4x               58.9824            29.4912         3.6864
        5x               73.7280            36.8640         0.9216
        6x               88.4736            44.2368         0.9216
        6.5x             95.8464            47.9232         0.4608
        7x              103.2192            51.6096         0.9216
        7.5x            110.5920            55.2960         0.4608
        8x              117.9648            58.9824         7.3728
        8.5x            125.3376            62.6688         0.4608
        10x             147.4560            36.8640         1.8432
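The clock columns of the table follow directly from the reference oscillator frequency. The short Python sketch below reproduces a few rows. It assumes the 14.7456 MHz oscillator mentioned in the release notes, and an SPI divider of 2 for every row except 10x, which is what the tabulated values imply. The helper name `clocks` is made up for this example.

```python
F_IN_MHZ = 14.7456  # external reference oscillator (from the release notes)


def clocks(multiplier: float, spi_divider: int = 2) -> tuple[float, float]:
    """Return (system clock, SPI shift clock) in MHz for a DCM multiplier."""
    f_sys = F_IN_MHZ * multiplier
    return round(f_sys, 4), round(f_sys / spi_divider, 4)


# The 10x row uses a divide-by-4 SPI setting, as stated in the text.
for mult, div in [(4, 2), (6.5, 2), (8, 2), (10, 4)]:
    f_sys, f_spi = clocks(mult, div)
    print(f"{mult}x: system {f_sys:.4f} MHz, SPI {f_spi:.4f} MHz")
```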
 
These results are only applicable to this particular configuration. The period
constraint for the system clock is set for 12.5 ns, or 80 MHz. The
relationship between the clock enable, 0.5 of the system clock, does not seem
to be accommodated by the reported performance values. Further investigation is
needed to establish if the results provided in the previous table should be
accepted as the performance limits of the M16C5x core in this FPGA family.

A board has been configured with an XC3S50A-4VQG100I component, and it
operates as expected at 80 MHz. A new internal resource configuration makes
the UART clock, Clk_UART, a fixed output of the DCM. The UART clock is fixed
at 2x ClkIn, or as is the case in this test configuration, 29.4912 MHz.

Testing like that performed above with the XC3S200A-4VQG100I is shown below.
It indicates that the upper operating frequency is limited to
**140.0832 MHz**. This upper limit is most likely imposed by the reduction in
routing resources. The utilization factor in an XC3S50A-4VQG100I FPGA is
**99%**, and _~50%_ in an XC3S200A-4VQG100I FPGA. The larger number of
LUTs/Slices and routing resources allows Map and Place greater flexibility to
satisfy the timing constraints.

    DCM Multiplier  System Clock (MHz)  SPI Clock (MHz) Max UART bit rate (Mbps)
        4x               58.9824            29.4912         1.8432
        8x              117.9648            58.9824         1.8432
        8.5x            125.3376            62.6688         1.8432
        9x              132.7104            66.3552         1.8432
        9.5x            140.0832            70.0416         1.8432
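The constant value in the last column is a consequence of the fixed UART clock. As a quick check, the Python lines below derive it from the figures given in this README. The 16x oversampling factor is taken from the release notes' statement that the baud rate is set from 1/16 of the clock. The variable names are illustrative only.

```python
F_IN_HZ = 14_745_600   # ClkIn, the 14.7456 MHz reference oscillator
OVERSAMPLE = 16        # baud rate is derived from 1/16 of the UART clock

clk_uart_hz = 2 * F_IN_HZ                  # Clk_UART is fixed at 2x ClkIn
max_bit_rate = clk_uart_hz // OVERSAMPLE   # independent of the DCM multiplier
print(clk_uart_hz, max_bit_rate)           # 29491200 1843200
```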
 
Release Notes
-------------

###Release 1.0

In this release, the M16C5x has been synthesized, mapped, placed, routed, and
used to configure an FPGA. The FPGA used for this initial test of the M16C5x
was the XC3S200A-4VQG100I FPGA. The test program provided demonstrated that
the M16C5x was executing the program in the same manner as simulated with the
MPLAB simulator.

Using an external 14.7456 MHz oscillator, selected for use with the UART,
square waves were generated by the core to illuminate external LEDs using the
upper 6 bits of PortA. The square waves have the appropriate ratios, and the
frequency of the fastest LED drive signal is ~4.753 kHz.

The clock generator multiplies the input frequency to 58.9824 MHz which
results in an effective instruction frequency of 29.4912 MHz because of the
two cycle nature of the core. The instruction loop is essentially 8*(8+3*256),
which equals 6208 cycles per LED toggle. The measured toggle frequency of the
fastest LED is approximately equal to 29.4912 MHz / 6208, or 4.750 kHz.
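The expected toggle frequency can be reproduced with a few lines of arithmetic. The Python sketch below uses only figures quoted in the paragraph above. The variable names are made up for the example.

```python
F_SYS_HZ = 58_982_400          # DCM output: 4 x 14.7456 MHz
f_instr_hz = F_SYS_HZ / 2      # two clock cycles per instruction
CYCLES_PER_TOGGLE = 6208       # loop length quoted in the text

f_toggle_hz = f_instr_hz / CYCLES_PER_TOGGLE
print(f"{f_toggle_hz:.1f} Hz")  # 4750.5 Hz
```

The result is within about 0.1% of the ~4.753 kHz measured on the fastest LED.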
 
Work will continue to verify the testbench results with the FPGA. The next
release should include the UART, and test the ability of the core to
send/receive data using the FIFOs at rates of 115,200 baud or greater.

###Release 2.0

In this release, the UART has been added. An update has been made to the SPI
I/F Master function; the update corrects a fault with the framing of SPI Mode 3
frames with shift lengths greater than 1 byte. A correction, not fully tested
or verified, was made to the P16C5x core to correct anomalous behavior for
BTFSC/BTFSS instructions.

The UART is integrated with the Release 1.0 core. Verification of the
integrated interface is underway.
 
###Release 2.1

Testing with an M16C5x core processor program assembled using
MPLAB and ISim showed that polling of the UART status register to determine
whether the transmit FIFO was empty or not (using the iTFE interrupt flag)
would clear the generated interrupt flags before they had actually been
captured and shifted in the SSP response to the core.

This indicated a clock domain crossing issue in the interrupt clearing logic.
This release fixes that issue. Previous use of the UART did not poll the USR,
so this problem did not manifest itself in a reasonable amount of time, if
ever. In other words, the synchronization fault has been present all along in
the implementation, but the module's usage in the application (or testbench)
did not present the conditions under which the fault manifests.

The correction required registering the USR data on the SSP clock domain, and
qualifying the clearing of the interrupt flags on the basis of whether the
flag is set in both domains when the USR is read. The addition of the register
reduced the logic utilization, and only a small additional time delay was
incurred. The resulting design is still able to fit into a Spartan 3A
XC3S50A-4VQG100I FPGA.

Modified the UART Baud Rate Generator. Removed the fixed 16x12 ROM that
provided the pre-scaler and divider constants for a fixed set of 16 baud
rates. Added a 12-bit, write-only register, BRR - Baud Rate Register, that can
be used to set the baud rate from 1/16 of the processor clock. With a
58.9824 MHz oscillator, the baud rate can range from 3.6864 Mbps down to
900 bps. Set the default baud rate to 9600 for a 58.9824 MHz UART clock.
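The quoted range is what a 12-bit divider applied to 1/16 of the clock would give. The Python sketch below checks the two end points and the default rate. It assumes a plain divide-by-N relationship with N from 1 to 4096. How the BRR register value actually encodes N (for example N or N-1) is not stated here, so `baud` and its `divider` argument are hypothetical.

```python
F_UART_HZ = 58_982_400   # UART clock quoted in the text
OVERSAMPLE = 16          # baud rate is derived from 1/16 of the clock
BRR_BITS = 12            # width of the Baud Rate Register


def baud(divider: int) -> float:
    """Baud rate for an assumed divide-by-`divider` setting (1..4096)."""
    if not 1 <= divider <= 2 ** BRR_BITS:
        raise ValueError("divider out of range for a 12-bit register")
    return F_UART_HZ / (OVERSAMPLE * divider)


print(baud(1), baud(4096), baud(384))   # 3686400.0 900.0 9600.0
```

Under this assumption the default 9600 baud corresponds to a division ratio of 384.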
 
Utilization for an XC3S50A-4VQG100I FPGA is 100%. The 128 byte LUT-based
receive FIFO can be reduced to accommodate some additional functions. Synthesis
and MAP/PAR are able to implement the design. There is also some place holder
logic that can be used for other purposes.

###Release 2.2

Updated the soft-core so as to be able to parameterize the microcontroller
from the top module. Changed the frequency multiplication from 4 to 5 in order
to test operation at the frequency to which the UCF constrains the Map/PAR
tools. The input clock is driven by a 14.7456 MHz oscillator, and the clock
multiplier (DCM) generates **73.7280 MHz**. The default baud rate, 9600,
required that the default settings be adjusted. All other parameters remain
the same.

Also added a Block RAM Memory Map file to the project. Utilized Xilinx's
Data2MEM tool to insert modified program contents into the affected Block RAMs
using MEM files derived from standard MPLAB outputs. A tutorial on this subject
is being prepared and will be released on an associated Wiki soon.
 
###Release 2.3

Updated the soft-core microcomputer. Fixed the UART clock, Clk_UART, to twice
the input frequency. This means that the UART operates with a fixed reference
frequency unlike Release 2.2 where Clk_UART was set to the system clock
frequency.

Also added asynchronous resets to several registers in the UART so that it
would simulate correctly with ISim. Direct control of the UART prescaler and
divider was previously untested using the simulation. With the change made to
the baud rate generator of the UART, the reset/power-on values of these two
logic functions are unknown. The unknowns, "X", propagate through the baud rate
generator and prevent the simulator from resolving the state of the internal
baud rate clock of the UART. Thus, although the rest of the circuits simulate
as expected, the transmit shift register never shifts because there's an
"unknown" signal level applied on the bit clock.
 
###Release 2.4

Polling the UART's Receive Data Register (RDR) uncovered a race condition like
that previously found and corrected in regards to polling the UART Status
Register (USR). Correction required registering the RDR in the SCK clock
domain, and qualifying the read enable pulse for the receive FIFO so that it
is only generated if the Receive Rdy flag is present in the SCK clock domain.
Otherwise, the Receive FIFO is not read which prevents the inadvertent
clearing of the FIFO empty flag.

Test Program 4, M16C5x_Tst4.asm, is used to test the receive signal path.
Hyperterminal and Tera Term were used to send (without local echo) several
large text files through the M16C5x UART. The test program polls the RDR, and
if a character is received without error, then upper case characters are
converted to lower case characters, and vice-versa. Using a Keyspan Quad Port
USB serial port adapter, characters were sent to the M16C5x at a rate of 921.6k
baud, the highest programmable baud rate supported by the Keyspan device. The
echo back to the terminal emulator appeared to be without error. (**Note:**
_the two wire RS-232 mode of the UART was used for this test. The ADM3232
charge-pump RS-232 transceiver appeared to work well at this frequency. Some
slew rate limiting is visible on an O-scope, but it appears to be tolerable.
These tests were conducted while the core was operating at **117.9648 MHz**._)
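The echo behaviour of the test program can be mimicked on a host for comparison against what the terminal emulator shows. The following Python stand-in is not derived from M16C5x_Tst4.asm. The function name and the XOR-with-0x20 technique are assumptions chosen because they are a common way to swap ASCII letter case.

```python
def echo_swap_case(byte: int) -> int:
    """Swap ASCII letter case; pass every other byte through unchanged."""
    if 0x41 <= byte <= 0x5A or 0x61 <= byte <= 0x7A:   # 'A'-'Z' or 'a'-'z'
        return byte ^ 0x20
    return byte


received = b"Hello, M16C5x!"
print(bytes(echo_swap_case(b) for b in received))   # b'hELLO, m16c5X!'
```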
 
This release is expected to be the last public release of this soft-core
microcomputer. The released core and peripherals are sufficient to demonstrate
a non-trivial FPGA implementation of a soft-core microcomputer. Further
developments will be focused on improving access to the internal block RAMs,
and improving the I/O capabilities of the released core.

###Release 2.5

Converted the core to operate in a single cycle mode with the block RAM
memories of the FPGA. Operating frequency, in a -4 Spartan 3A FPGA, is 60+
MHz. This rate is equivalent to the 117.9648 MHz reported above for Release
2.4. Some combinatorial path improvements were made to the processor core,
P16C5x, by using wired-OR bus connections rather than explicit multiplexers.
These improvements also provided some reductions in the resource utilization
of the project.
