FPGA Implementations of ECO32
=============================

eco32-00
--------

This is essentially the same as the solution of assignment 9 of the
course "Hardware for Embedded Systems", i.e., an implementation of
ECO32e. The differences are:
a) The reset circuit is moved to a subdirectory of its own. The
   duration of the reset pulse is reduced to 2^24/50MHz = 0.34 sec,
   a quarter of the original duration.
b) The reset circuit is connected to the pushbutton on the carrier
   board, which has been designated for reset by the manufacturer.
c) The bus controller is moved to a subdirectory of its own.
d) The top-level description is transformed from a schematic into
   plain text. This in turn eliminates the need for top-level
   symbols of the Reset/ROM/RAM/Busctrl/CPU/DSP/KBD circuits.
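
The arithmetic in a) can be checked with a few lines of Python. This
is only a sketch: it assumes the pulse length is set by a free-running
counter of the given width, which the expression 2^24/50MHz suggests
but the notes do not state. The function name is made up for this
example.

```python
# Reset pulse length, assuming a free-running counter clocked at 50 MHz.
CLOCK_HZ = 50_000_000  # clock frequency taken from item a)


def reset_pulse_seconds(counter_bits: int) -> float:
    """Time needed for a counter of the given width to wrap around once."""
    return (2 ** counter_bits) / CLOCK_HZ


new_duration = reset_pulse_seconds(24)  # roughly a third of a second
old_duration = new_duration * 4         # item a): the new pulse is a quarter of the old one

print(f"new: {new_duration:.4f} s, old: {old_duration:.4f} s")
```

Under this assumption the original pulse corresponds to a 26-bit
counter, since four times 2^24 is 2^26.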
 
 
eco32-01
--------

We have a new module, "ser", which represents the circuit for a
serial interface (8 bit data, no parity, 1 stop bit, 38400 baud).
The data is buffered twice in both directions. The module is
instantiated once; the data in/out lines are connected to the
RS232 interface on the carrier board. The bus controller got
the necessary additional connections to drive the module.
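
To get a feeling for the numbers behind this serial format, here is a
small Python sketch. It assumes that the baud rate is derived from the
50 MHz clock mentioned for eco32-00 by a plain integer divider; the
notes do not say how the "ser" module actually generates its timing.

```python
# Timing of an 8N1 serial line at 38400 baud.
CLOCK_HZ = 50_000_000        # assumption: same 50 MHz clock as the reset circuit
BAUD = 38_400
BITS_PER_FRAME = 1 + 8 + 1   # start bit + 8 data bits + 1 stop bit, no parity

# Hypothetical integer divider that turns the system clock into the bit clock.
divider = round(CLOCK_HZ / BAUD)
actual_baud = CLOCK_HZ / divider
relative_error = abs(actual_baud - BAUD) / BAUD

# Upper bound on the character rate when frames follow each other without gaps.
frames_per_second = BAUD // BITS_PER_FRAME

print(divider, f"{relative_error:.6%}", frames_per_second)
```

With an integer divider the achievable rate deviates only slightly
from the nominal one, far below what an asynchronous receiver
tolerates.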
 
 
eco32-02
--------

The fake RAM module is replaced by a preliminary implementation of
real RAM. It uses the block RAM of the FPGA (instead of the SDRAM
mounted as an extra chip on the board that the final implementation
will use). It is therefore very small in size: 4 blocks of 16K bits
each yield a total size of 2 KWords (8K bytes).
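
The size calculation can be spelled out as follows. The 32-bit word
width is the ECO32 word size; the number of address bits is derived
here and is not taken from the notes.

```python
# Capacity of the preliminary block RAM implementation.
BLOCKS = 4
BITS_PER_BLOCK = 16 * 1024   # "16K bits" per block
WORD_BITS = 32               # ECO32 word size

total_bits = BLOCKS * BITS_PER_BLOCK
total_bytes = total_bits // 8
total_words = total_bits // WORD_BITS

# Number of word-address bits needed to select one of the words.
address_bits = (total_words - 1).bit_length()

print(total_bytes, total_words, address_bits)
```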
 
 
eco32-03
--------

This revision corrects an error which should have been corrected
a long time ago: the instructions ldb and ldh never sign-extended
their loaded data. On top of that, the instructions ldbu and ldhu
never placed zeroes into the bit positions 31 to 8 and 31 to 16,
respectively. This went undetected so far, because the implementation
of the bus did this already, although it is not explicitly requested.
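
The intended behaviour of the four load instructions can be modelled
in Python as shown below. This is a behavioural sketch, not the CPU's
Verilog; the function names are made up, and the raw bus value is
deliberately allowed to carry garbage in its upper bits to show that
the CPU itself has to clean them up.

```python
# Behavioural model of ldb/ldh/ldbu/ldhu result formation (32-bit registers).
WORD_MASK = 0xFFFFFFFF


def zero_extend(value: int, bits: int) -> int:
    """Keep the low 'bits' bits and clear everything above them."""
    return value & ((1 << bits) - 1)


def sign_extend(value: int, bits: int) -> int:
    """Copy bit 'bits-1' into all higher positions of a 32-bit word."""
    value = zero_extend(value, bits)
    sign = 1 << (bits - 1)
    return ((value ^ sign) - sign) & WORD_MASK


def load_result(opcode: str, bus_data: int) -> int:
    """Value written to the destination register for a given load."""
    if opcode == "ldb":
        return sign_extend(bus_data, 8)
    if opcode == "ldh":
        return sign_extend(bus_data, 16)
    if opcode == "ldbu":
        return zero_extend(bus_data, 8)
    if opcode == "ldhu":
        return zero_extend(bus_data, 16)
    raise ValueError(f"not a sub-word load: {opcode}")


print(hex(load_result("ldb", 0x80)), hex(load_result("ldbu", 0xABCDEF80)))
```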
 
 
eco32-04
--------

This version got a shift unit. It is connected in parallel to the
ALU, feeding its output into an expanded multiplexer. Because
arithmetic right shifts are slow, shifting needs an extra cycle
to complete. Even then it was necessary to request the place and
route effort level "high" to get by with a clock period of 20 nsec.


eco32-05
--------

Again there was an error to correct: I tried to scroll the display
by copying the display memory contents and discovered that reading
the memory needs an additional bus cycle (because the memory is
clocked). A simple state machine had to be written, which in turn
needed the reset signal. I changed the top-level description of
the display from a schematic to plain text.


eco32-06
--------

An easy job: I implemented the "jalr" instruction.


eco32-07
--------

This is the first step in getting the real memory to work:
I integrated the clock/reset module from my SDRAM controller
experiments. I also corrected the naming of the flash ROM
signals; all active-low signals are now consistently named
with a trailing "_n".


eco32-08
--------

We now have a working SDRAM controller!


eco32-09
--------

Second serial interface added.


eco32-10
--------

Branches based on signed comparisons added.


eco32-11
--------

Timer added.


eco32-12
--------

Multiply, divide, and remainder instructions done.


eco32-13
--------

A first attempt to introduce virtual addressing: a totally
minimalistic MMU consisting of two AND gates which suppress
the two MSBs of the virtual address if they are set. If
they are not, too bad - the virtual address is then mapped
to physical address 0.
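
One way to read this description is sketched below in Python. It
models only the address mapping as described in the text (two MSBs
set: strip them; otherwise: physical address 0). How the two AND
gates are wired to achieve this is not stated in the notes, so the
function is an interpretation and its name is made up.

```python
# Behavioural reading of the minimal eco32-13 "MMU".
def map_address(virtual: int) -> int:
    """Map a 32-bit virtual address to a physical address."""
    virtual &= 0xFFFFFFFF
    if (virtual >> 30) == 0b11:
        # Both MSBs set: suppress them, keep the remaining 30 bits.
        return virtual & 0x3FFFFFFF
    # Anything else cannot be translated yet and ends up at address 0.
    return 0


for va in (0xC0000000, 0xE0000004, 0x80001234, 0x00001000):
    print(hex(va), "->", hex(map_address(va)))
```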
 
 
eco32-14
--------

A couple of steps to make interrupts available:

a) The CPU gets an input vector of 16 interrupt request lines which
   are all tied to 0 in the top-level design external to the CPU.

b) The timer circuit's control register gets an interrupt enable bit,
   which gates the 'timer expired' status bit onto an additional
   output line, the timer's interrupt request line. This line is
   connected to the CPU's irq line 14.

c) Inside the CPU there must be a set of 4 special registers. They
   are implemented in a separate module. Two instructions (mvfs and
   mvts) transfer data between the standard and the special register
   sets. The data input of the special register set is connected to
   the standard register data output 2; the write enable signal for
   the special register set is controlled by the CPU's state machine.
   The data output of the special register set is connected to the
   data input 2 multiplexer of the standard register set, which has
   to be widened by one input (and by one control line also). The
   register number which selects the special register from/to which
   reading/writing should take place comes from the instruction
   register's immediate constant. The two new instructions get one
   extra state each in the CPU's state machine.

d) For interrupts and exceptions to take place there must be four
   additional values available which can be loaded into the PC:
   0xE0000004   general interrupts (V-bit of the PSW off)
   0xC0000004   general interrupts (V-bit of the PSW on)
   0xE0000008   user TLB miss (V-bit of the PSW off)
   0xC0000008   user TLB miss (V-bit of the PSW on)
   The contents of the special register 0 (the PSW) are needed at
   several places in the description of the CPU's state machine.
   They have to be set also, independently of the mvts instruction.
   Therefore an extra data path from/to the special register set
   is established, together with a separate write signal for the
   PSW. The state machine gets two new states, one to acknowledge
   interrupts and another one to implement the rfx instruction.
   Each instruction tests a specific 'interrupt trigger line'
   before returning to state 1 (instruction fetch). If it is set,
   the state machine branches to the 'interrupt' state. In this
   way we don't need a separate state before the 'instruction
   fetch' state to check for interrupts (and also avoid the
   unpleasant alternative: to merge interrupt detection into
   the fetch state - think of the already-incremented pc, for
   example). The trigger signal is set if there is any interrupt
   request present, its mask is open, and the global interrupt
   enable (in the PSW) is set. The ECO32 architecture defines
   5 bits in the PSW to be the priority of the last acknowledged
   interrupt. Therefore a priority encoder takes the vector of
   interrupt requests (possibly modified by closed mask bits)
   and determines the highest unmasked interrupt from that. The
   two additional states in the state machine also handle the
   two stacks (each three positions deep) for the 'interrupt
   enable' and 'user mode' flags within the PSW.

e) Since its construction, the ALU had two unused function encodings;
   they had been assigned to add and subtract, but were never used.
   They now deliver either the first or the second operand of the ALU
   to the output, unchanged. This simplifies three instructions (ldhi,
   jr, rfx) as well as the interrupt state in the CPU's state machine.
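
The trigger logic and the choice of the handler address from step d)
can be sketched in Python as follows. Two points are assumptions of
this sketch and not taken from the notes: a mask bit of 1 means
"open", and among several pending requests the one with the highest
line number wins. All function names are made up.

```python
from typing import Optional

NUM_IRQ_LINES = 16


def highest_pending(requests: int, mask: int) -> Optional[int]:
    """Priority encoder: number of the winning unmasked request, if any.

    Assumption: mask bit = 1 lets the request through, and the highest
    line number has the highest priority.
    """
    pending = requests & mask & ((1 << NUM_IRQ_LINES) - 1)
    if pending == 0:
        return None
    return pending.bit_length() - 1


def irq_trigger(requests: int, mask: int, global_enable: bool) -> bool:
    """Set if an unmasked request is present and interrupts are enabled."""
    return global_enable and highest_pending(requests, mask) is not None


def handler_address(vector_bit: bool, user_tlb_miss: bool) -> int:
    """One of the four PC values listed in step d)."""
    base = 0xC0000000 if vector_bit else 0xE0000000
    return base + (0x8 if user_tlb_miss else 0x4)


timer_and_other = (1 << 14) | (1 << 4)   # timer on line 14 plus some other request
print(highest_pending(timer_and_other, 0xFFFF),
      hex(handler_address(vector_bit=False, user_tlb_miss=False)))
```

Masking the timer line in this example lets the request on line 4
win, which mirrors the phrase "possibly modified by closed mask bits"
above.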
 
 
eco32-15
--------

We now have the 'trap' instruction. This is an important first
example of an exception.


eco32-16
--------

This version accepts the four TLB instructions as valid instructions
(but treats them as no-ops).


eco32-17
--------

A couple of steps to make exceptions work:

a) There are only 16 interrupts, so irq_priority is only [3:0] wide.
   The leading bit of the interrupt/exception priority in the PSW is
   explicitly set to 0 in state 15 (interrupt).

b) Generally, states returning to state 1 (instruction fetch) check
   the signal irq_trigger for pending interrupts and branch to state
   15 (interrupt) if it is set. This should NOT be done if the current
   state could possibly set the PSW to disable interrupts. So states
   15 (interrupt), 22 (mvts), 23 (rfx), and 24 (trap) don't do this
   check any longer. On the other hand, delaying the acceptance of
   a pending interrupt for a whole instruction would come as a hard
   surprise for an unsuspecting system programmer. It would in fact
   be possible to write an instruction sequence which never accepts
   any interrupts, although interrupts are expected to be enabled for
   one instruction:
       mvts $5,PSW    ; disable interrupts
   label:
       mvts $4,PSW    ; enable interrupts
       mvts $5,PSW    ; disable interrupts
       j label
   This cannot be tolerated. Therefore an additional state is inserted,
   just to check irq_trigger, computed from the new value of the PSW.
   This certainly makes no sense for interrupt and trap, because the new
   value of the interrupt enable flag in the PSW is known to be 0. So
   the new state is only reached from states 22 (mvts) and 23 (rfx).
   First, I did some renumbering of states:
   Renamed state 25 to 26 (TLB instruction).
   Renamed state 24 to 25 (trap).
   Then the additional state is called state 24.

c) Because the trap instruction is merely one of several possible causes
   for an exception, its execution state (25, see step b) above) can be
   used to implement exceptions. The exception number must be communicated
   to this state. We therefore have a 4-bit register named 'exc_priority'
   which must be set by any state transition to state 25. Its contents
   are appended to a leading 1 and then represent the exception priority
   which is found in the PSW.

d) The following exceptions are implemented:
     trap instruction exception
     illegal instruction exception
     divide instruction exception

e) The 'bus timeout exception' is implemented with the help of a counter
   which is activated if the bus is enabled and its wait line active.
   When the counter expires, the exception execution state is entered.
   There is a catch: if the bus timeout occurs during instruction fetch,
   the PC still has its old value, i.e., it must not get decremented while
   handling the exception. This could be handled best by just another
   state (renaming state 26 to 27, and using the new state 26 for
   exception handling without decrementing the PC).

f) The 'privileged instruction exception' isn't difficult to implement
   but can only be tested if a TLB is present (because the test program
   must enter user mode in order to trigger the exception - and in user
   mode, instructions cannot be executed at addresses which have their
   MSB set without triggering a 'privileged address exception').
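
Steps a) to c) can be summarised in a small Python model. It shows
how the 5-bit priority field of the PSW is composed and how the
additional state 24 decides between 'interrupt' and 'instruction
fetch'. The state numbers are those given above after renumbering;
the function and constant names are made up, and the model leaves out
everything else these states do.

```python
# State numbers as used in the text (after the renumbering in step b).
FETCH, INTERRUPT, MVTS, RFX, CHECK_IRQ, EXCEPTION = 1, 15, 22, 23, 24, 25


def psw_priority_for_interrupt(irq_priority: int) -> int:
    """Step a): leading bit 0, followed by the 4-bit irq_priority."""
    return irq_priority & 0xF


def psw_priority_for_exception(exc_priority: int) -> int:
    """Step c): leading bit 1, followed by the 4-bit exc_priority."""
    return 0x10 | (exc_priority & 0xF)


def next_state(state: int, irq_trigger: bool) -> int:
    """Successor of the states discussed in step b)."""
    if state in (MVTS, RFX):
        # The PSW may just have changed: evaluate irq_trigger one state later.
        return CHECK_IRQ
    if state == CHECK_IRQ:
        return INTERRUPT if irq_trigger else FETCH
    if state in (INTERRUPT, EXCEPTION):
        # Interrupts are known to be disabled now, so no check is needed.
        return FETCH
    raise ValueError(f"state {state} is not part of this sketch")


# An 'mvts' that enables interrupts while a request is pending:
state = next_state(MVTS, irq_trigger=True)    # -> 24
state = next_state(state, irq_trigger=True)   # -> 15, the interrupt is accepted
print(state, psw_priority_for_exception(3))
```

With this arrangement the instruction sequence shown in step b) can
no longer lock out interrupts: the request is accepted right after
the mvts that enables them.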
 
 
eco32-18
--------

This intermediate version got a new bus controller which no longer
mirrors RAM and ROM in their respective upper address spaces but
signals a bus timeout instead.


eco32-19
--------

This version implements the MMU with a TLB (first of two parts).

a) Add the TLB module. It consists of an "input section" (32 comparators
   working in parallel, and a priority encoder which computes the binary
   representation of the number of one of the matching comparators), and
   an "output section" which merely delivers the previously stored frame
   number and permission bits of the frame. The output section's memory
   is addressed by the output of the priority encoder. The two sections
   together implement a fully associative address translation cache.

b) Change the MMU from a purely combinational circuit to one which needs
   a single clock cycle to compute its output. This is necessary because
   the RAM which stores frame numbers in the TLB output section also needs
   one cycle to read its contents.

c) In the controller of the CPU add one state before each bus cycle state
   (i.e., three states: fetch, load, and store). These additional states
   perform the address translation from a virtual to a physical address.
   I added three new states (28..30) which now implement the bus cycles
   and reassigned the old state numbers (1, 12, 14) to the states which
   do address translations.

d) The MMU must implement several functions:
     no operation, hold output
     map virtual to physical address
     execute tbs
     execute tbwr
     execute tbri
     execute tbwi
   The controller instructs the MMU which function is to be executed.

e) The tbwr instruction needs a "random" index. This can be generated
   by a counter which counts down at every clock pulse, instruction
   fetch, or address mapping request. There is a catch: if the counter
   counted on every clock pulse and each instruction needed a
   multiple of 2 clock pulses to complete, then only half the entries
   of the TLB would be used. Thus counting instructions is safer, and
   furthermore counting address mappings is cheaper than that (because
   address mapping is already one of the functions of the MMU and
   therefore easily detectable).

f) The values of the special registers 1 (TLB Index), 2 (TLB EntryHi),
   and 3 (TLB EntryLo) are needed within the MMU. The MMU also must
   write new values to these registers under certain circumstances.
   Three dedicated signals for each of these special registers (old
   value, write enable, new value) enable the MMU to do so.

g) In principle, the tbri instruction needs two clock cycles to do
   its work: one cycle to read the TLB and another one to write the
   data to special register 3. This can be reduced to a single clock
   cycle (write to special register 3) if the RAM's contents are read
   out by default within every clock cycle.
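
The structure from steps a) and e) can be illustrated with a
behavioural Python model. Several details are assumptions of this
sketch and not taken from the notes: a page size of 4 KB (12 offset
bits), the lowest matching index winning in the priority encoder, and
the use of page number -1 for entries that have never been written.
The class and function names are made up; timing (steps b and g) is
not modelled.

```python
from dataclasses import dataclass
from typing import Optional, Set

TLB_ENTRIES = 32
PAGE_BITS = 12  # assumption: 4 KB pages


@dataclass
class TlbEntry:
    page: int = -1       # -1: never written, matches no page (model convenience)
    frame: int = 0
    write: bool = False
    valid: bool = False


class Tlb:
    def __init__(self) -> None:
        self.entries = [TlbEntry() for _ in range(TLB_ENTRIES)]
        self.random_index = TLB_ENTRIES - 1

    def match(self, page: int) -> Optional[int]:
        """Input section: 32 comparators plus a priority encoder."""
        hits = [i for i, e in enumerate(self.entries) if e.page == page]
        return hits[0] if hits else None   # assumption: lowest index wins

    def translate(self, vaddr: int) -> Optional[int]:
        """Map a virtual address; None stands for a TLB miss."""
        # Step e): the "random" index counts down on every mapping request.
        self.random_index = (self.random_index - 1) % TLB_ENTRIES
        index = self.match(vaddr >> PAGE_BITS)
        if index is None:
            return None
        entry = self.entries[index]        # output section, addressed by index
        return (entry.frame << PAGE_BITS) | (vaddr & ((1 << PAGE_BITS) - 1))

    def write_random(self, page: int, frame: int, write: bool, valid: bool) -> int:
        """Rough equivalent of tbwr: overwrite the entry at the random index."""
        index = self.random_index
        self.entries[index] = TlbEntry(page, frame, write, valid)
        return index


def indices_seen(steps_between_uses: int, uses: int = 64) -> Set[int]:
    """Which indices a down-counter offers if sampled every N counts."""
    counter, seen = TLB_ENTRIES - 1, set()
    for _ in range(uses):
        counter = (counter - steps_between_uses) % TLB_ENTRIES
        seen.add(counter)
    return seen


tlb = Tlb()
missed = tlb.translate(0x00001234)                  # nothing stored yet
slot = tlb.write_random(page=0x1, frame=0x2A, write=True, valid=True)
print(missed, slot, hex(tlb.translate(0x00001234)))
print(len(indices_seen(1)), len(indices_seen(2)))   # the catch described in step e)
```

The last line shows the catch from step e): a counter that is only
ever sampled after an even number of counts never reaches half of the
32 entries.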
 
 
eco32-20
--------

This version implements the MMU with a TLB (second of two parts).

a) Detect privileged and illegal address exceptions within the state
   machine. In order to do so, virtual address bits 31, 1, and 0 must
   be available there. The exceptions are detected in the address
   translation states (1, 12, 14). Control is transferred to state
   25 (or 26 in case of violation during instruction fetch) with
   exc_priority set accordingly. Although not yet needed for the bus,
   the bus size lines must be set to the intended transfer width
   already in the translation states in order to detect illegal
   addresses there (before the bus is actually accessed). Last but
   not least the MMU must not try to map an address if that triggered
   one of the two exceptions.

b) The TLB supplies three control signals (tlb_missed, tlb_invalid,
   and tlb_wrtprot) which are needed to detect the three exceptions
   "TLB miss", "TLB entry invalid", and "page frame write protected".
   The first of these, tlb_missed, is generated in the "input section"
   of the TLB and has to be delayed for one clock cycle so that it
   appears at the TLB output at the same time the other two signals do.
   The three signals are routed to the CPU's state machine. Because
   they are valid only after the address translation took place (the
   valid and write bits are stored together with the frame number),
   the error conditions can only be detected in the bus cycle states.
   The actual bus cycle however must suppress its bus enable signal,
   if any exception has been detected.
   Attention: the three control signals must be de-asserted if the
   address in question is directly mapped (i.e., has its two MSBs set).

c) The tlb_missed signal has in fact to be split into two signals:
   tlb_kmissed (MSB of address is 1) and tlb_umissed (MSB is 0). This
   must be done in order to route "user TLB misses" to another start
   address. Furthermore, the V bit in the PSW has to be considered and
   the ISR start address modified accordingly.

d) The three write enable signals for the three special TLB registers
   are best produced within the main CPU state machine, because they
   are dependent on the opcode if one of the TLB instructions is
   executed. They must also be asserted according to any exception
   which has been detected.
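
The checks from steps a) to c) can be written down as a few Python
predicates. This is a sketch under assumptions: "illegal address" is
taken to mean an address that is not aligned to the transfer width
(which is why bits 1 and 0 are needed), and "privileged address" is
taken to mean an access with the MSB set while in user mode, as
described for eco32-17. The function names are made up; only the
signal names tlb_kmissed and tlb_umissed come from the text.

```python
from typing import Optional


def privileged_address(vaddr: int, user_mode: bool) -> bool:
    """User-mode access to an address with bit 31 set."""
    return user_mode and ((vaddr >> 31) & 1) == 1


def illegal_address(vaddr: int, size_bytes: int) -> bool:
    """Misaligned access; size_bytes is the transfer width (1, 2 or 4)."""
    if size_bytes not in (1, 2, 4):
        raise ValueError("transfer width must be 1, 2 or 4 bytes")
    return (vaddr & (size_bytes - 1)) != 0


def directly_mapped(vaddr: int) -> bool:
    """Addresses with both MSBs set bypass the TLB."""
    return ((vaddr >> 30) & 0b11) == 0b11


def tlb_miss_kind(vaddr: int, tlb_missed: bool) -> Optional[str]:
    """Split tlb_missed into tlb_kmissed / tlb_umissed as in step c)."""
    if directly_mapped(vaddr) or not tlb_missed:
        return None                       # signal de-asserted, see step b)
    return "tlb_kmissed" if (vaddr >> 31) & 1 else "tlb_umissed"


print(illegal_address(0x1002, 4), illegal_address(0x1002, 2),
      tlb_miss_kind(0x00000010, True), tlb_miss_kind(0xC0000010, True))
```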
 
 
eco32-21
--------

I changed the display description from a schematic to plain Verilog.


eco32-22
--------

The display has got character attributes: one attribute byte per
character stored in the display memory. The bits in the attribute
byte are loosely imitating those from the good old CGA adapter in
text mode.
  Bit 7:  blinking foreground
  Bit 6:  background red
  Bit 5:  background green
  Bit 4:  background blue
  Bit 3:  intensified foreground
  Bit 2:  foreground red
  Bit 1:  foreground green
  Bit 0:  foreground blue
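
The bit assignment can be turned into a small decoder, shown here as
a Python sketch. The function name and the shape of the result are
made up for this example; only the bit positions are taken from the
table above.

```python
# Decode a display attribute byte according to the bit table above.
def decode_attribute(attr: int) -> dict:
    attr &= 0xFF
    return {
        "blink": bool(attr & 0x80),                      # bit 7
        "background": ((attr >> 6) & 1,                  # red
                       (attr >> 5) & 1,                  # green
                       (attr >> 4) & 1),                 # blue
        "intense": bool(attr & 0x08),                    # bit 3
        "foreground": ((attr >> 2) & 1,                  # red
                       (attr >> 1) & 1,                  # green
                       attr & 1),                        # blue
    }


# 0x1F: intensified white foreground on a blue background, not blinking.
print(decode_attribute(0x1F))
```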
 
 
eco32-23
--------

Now the keyboard can interrupt the CPU.


eco32-24
--------

Project re-organized. All source files are now located under a single
directory "src". Now it is easier to clean up a project after editing
or testing: simply remove all files and directories except "src" and
the project manager's control file "eco32.npl".


eco32-25
--------

The reset circuit had the following problem: although an externally
applied reset signal (produced by pressing the "reset" pushbutton)
was internally recognized for initializing the CPU, it did not work
the other way around, which is important when re-loading the FPGA.
In this case, the CPU was reset, but the external devices, especially
the disk drive, did not get a reset signal. So the drive could get
out of sync with its controller. The reset circuit now actively drives
the external bidirectional reset line when performing a reset, as well
as observing this line when not actively driving it.


eco32-26
--------

This is the first version with a real IDE disk attached! Thanks to
Martin Geisse, who did a very nice job.


eco32-27
--------

The two serial interfaces are now able to generate interrupt requests.
As far as I can see, the implementation is now functionally complete.


eco32-28
--------

The IDE disk interface had a small problem with reading/writing a block
of 8 sectors in a single operation. Fixed.


eco32-29
--------

Same as eco32-28, but with an ISE Version 11 project file. Because
it is now possible to develop exclusively under Linux (including
download to the FPGA board), all source files were converted to
newline-only line endings.