\documentclass[11pt]{article}
\usepackage{graphicx} % needed for including graphics e.g. EPS, PS
\usepackage{multirow}
\usepackage{alltt}
\topmargin -1.5cm % read Lamport p.163
\oddsidemargin -0.04cm % read Lamport p.163
\evensidemargin -0.04cm % same as oddsidemargin but for left-hand pages
\textwidth 16.59cm
\textheight 21.94cm
%\pagestyle{empty} % Uncomment if don't want page numbers
\parskip 7.2pt % sets spacing between paragraphs
%\renewcommand{\baselinestretch}{1.5} % Uncomment for 1.5 spacing between lines
\parindent 15pt % sets leading space for paragraphs

\begin{document}

\section{Basic behavior}
\label{basics}

The microcoded machine ($\mu$M) is built around a register bank and an 8-bit
ALU with registered operands T1 and T2. It performs all its operations in two
cycles, so I have divided it into two stages: an operand stage and an ALU stage.
This is nothing more than a 2-stage pipeline. \\

In the operand stage, registers T1 and T2 are loaded with either the
contents of the register bank (RB) or the input signal DI.\\
In the ALU stage, the ALU output is written back into the RB or loaded
into the output register DO. In addition, flags are updated, or not, according to
the microinstruction ($\mu$I) in execution.\\

Every microinstruction controls the operation of the operand stage
and the succeeding ALU stage; that is, the execution of a $\mu$I extends over 2
consecutive clock cycles, and microinstructions overlap each other. This means
that the part of the $\mu$I that controls the 2nd stage has to be pipelined; in
the VHDL code, I have divided the $\mu$I into a field\_1 and a field\_2, the
latter of which is registered (pipelined) and controls the 2nd $\mu$M stage
(ALU). \\
Many of the control signals are encoded in the microinstructions in what I
have improperly called flags. You will see many references to flags in the
following text (\#end, \#decode, etc.). They are just signals that you can
activate individually in each $\mu$I; some are active in the 1st stage, some in
the 2nd. They are all explained in section~\ref{ucodeFlags}. \\
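The overlap between consecutive microinstructions can be pictured with a small Python model. This is only an illustrative sketch, not code from the core: the function name \texttt{run\_pipeline} and the strings used as field contents are made up for the example, and the only behavior taken from the text is that field\_2 is registered and therefore acts one cycle after field\_1.

```python
def run_pipeline(microcode):
    """Return, per clock cycle, which fields drive each stage.

    microcode: list of (field_1, field_2) tuples, one per microinstruction.
    Each trace entry is (operand stage control, ALU stage control).
    """
    trace = []
    field_2_reg = None  # registered copy of field_2 (assumed idle after reset)
    # One extra empty microinstruction lets the last field_2 drain.
    for field_1, field_2 in microcode + [(None, None)]:
        # field_1 of the current uI drives the operand stage while the
        # registered field_2 of the previous uI drives the ALU stage.
        trace.append((field_1, field_2_reg))
        field_2_reg = field_2  # clock edge: field_2 is pipelined
    return trace


# Hypothetical two-instruction microprogram, for illustration only.
trace = run_pipeline([("T1=_a", "NOP"), ("T2=_b", "_a=add")])
```

Each entry of \texttt{trace} pairs the operand-stage control of one $\mu$I with the ALU-stage control of the previous one, which is the overlap described above.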

Note that microinstructions are atomic: both stages are guaranteed to
execute in all circumstances. Once the 1st stage of a $\mu$I has executed,
the only thing that can prevent the execution of the 2nd stage is a reset.\\
It might have been easier to design the machine so that microinstructions
executed in one cycle, thus needing no pipeline for the $\mu$I itself. I
arbitrarily chose to 'split' the microcode execution, figuring that it would be
easier for me to understand and program the microcode; in hindsight it may have
been a mistake but in the end, once the debugging is over, it makes little
difference.\\

The core as it is now does not support wait states: it does all its
external accesses (memory or i/o, read or write) in one clock cycle. It would
not be difficult to improve this with a small modification to the
micromachine, without changes to the microcode.\\
Since the microcode ROM is the same type of memory as will be used for program
memory, the main advantage of microprogramming is lost. Thus, it would make
sense to develop the core a bit further with support for wait states, so it
could take advantage of the speed difference between the FPGA and slow external
memory.\\
The register bank reads asynchronously, while writes are synchronous. This
is the standard behaviour of a Spartan LUT-based RAM. The register bank holds
all the 8080 registers, including the accumulator, plus temporary, 'hidden'
registers (x,y,w,z). Only the PSW register is held outside the register bank, in
a DFF-8 register.

\section{Micromachine control}
\label{umachineControl}

\subsection{Microcode operation}
\label{ucodeOperation}

There is little more to the core than what has already been said; all the
CPU operations are microcoded, including interrupt response, reset and
instruction opcode fetch. The microcode source code can be seen in file
\texttt{ucode/light8080.m80}, in a format I expect will be less obscure than a
plain VHDL constant table.\\

The microcode table is a synchronous ROM with 512 32-bit words, designed
to fit in a Spartan-3 block RAM. Each 32-bit word makes up a microinstruction.
The microcode 'program counter' (uc\_addr in the VHDL code) is thus a 9-bit
register.\\
Out of those 512 words, 256 (the upper half of the table) are used as a
jump table for instruction decoding. Each entry at 256+NN contains a 'JSR'
$\mu$I to the start of the microcode for the instruction whose opcode is NN.
This seemingly inefficient use of RAM is in fact an optimization for the
Spartan-3 architecture to which this design is tailored: the 2KB RAM blocks
are too large for the microcode so I chose to fill them up with the decoding
table.\\
This scheme is less than efficient where smaller RAM blocks are available (e.g.
Altera Stratix).\\
The jump table is built automatically by the microcode
assembler, as explained in section~\ref{ucodeAssembler}.\\
The upper half of the table can only be used for decoding; JSR
instructions can only point to the lower half, and execution from address 0x0ff
rolls over to 0x00 (or would; the actual microcode does not use this
'feature').\\
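The address arithmetic described above can be sketched in a few lines of Python. This is an illustrative model only, not code from the core: the names \texttt{decode\_address}, \texttt{next\_sequential} and \texttt{JUMP\_TABLE\_BASE} are made up for the example, and the 8-bit wrap of the sequential counter is assumed from the 0x0ff to 0x00 rollover mentioned in the text.

```python
JUMP_TABLE_BASE = 0x100  # upper half of the 512-word microcode table


def decode_address(opcode):
    """Jump table entry for an 8080 opcode NN: address 256 + NN."""
    if not 0 <= opcode <= 0xFF:
        raise ValueError("8080 opcodes are 8 bits wide")
    return JUMP_TABLE_BASE | opcode


def next_sequential(uc_addr):
    """Sequential execution in the lower half: 0x0ff rolls over to 0x000."""
    return (uc_addr + 1) & 0xFF
```

For example, the HLT opcode 0x76 would select the jump table entry at 0x176.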

The ucode address counter uc\_addr has a number of possible sources: the
micromachine supports one level of micro-subroutine calls; it can also
return from those calls; uc\_addr gets loaded with some constant values upon
reset, interrupt or instruction fetch. And finally, there is the decoding jump
table mentioned above. So, in summary, these are the possible sources of
uc\_addr each cycle:

\begin{itemize}
\item Constant value of 0x0001 at reset (see VHDL source for details).
\item Constant value of 0x0003 at the beginning (fetch cycle) of every
instruction.
\item Constant value of 0x0007 at interrupt acknowledge.
\item uc\_addr + 1 in normal microinstruction execution.
\item Some 8-bit value included in JSR microinstructions (calls).
\item The return value preserved in the last JSR (used when flag \#ret is
raised).
\item The jump table address 256+NN, where NN is the opcode read from DI, when
flag \#decode is raised.
\end{itemize}

All of this is readily apparent, I hope, by inspecting the VHDL source.
Note that there is only one jump microinstruction (JSR), which doubles as 'call':
whenever a jump is taken the 1-level-deep 'return stack' is loaded with
the return address (address of the $\mu$I following the jump). You just have to
ignore the return address when you don't need it (e.g. the jumps in the decoding
jump table). I admit this scheme is awkward and inflexible; but it was the first
I devised, it works and fits the area budget: more than enough in this project.
A list of all predefined, 'special' microcode addresses follows.\\
\begin{itemize}
\item \textbf{0x001: reset}\\
After reset, the $\mu$I program counter (uc\_addr in the VHDL code) is
initialized to 0x00. The program counter works as a pre-increment counter when
reading the microcode ROM, so the $\mu$I at address 0 never gets executed (unless
'rolling over' from address 0x0ff, which the actual microcode does not). Reset
starts at address 1 and takes 2 microinstructions to clear PC to 0x0000. It does
nothing else. After clearing the PC the microcode runs into the fetch routine.
\item \textbf{0x003: fetch}\\
The fetch routine places the PC on the address output lines while postincrementing
it, and then enables a memory read cycle. In doing so it relies on
T2 being 0x00 (necessary for the ADC to behave like an INC in the oversimplified
ALU), which is always true by design. After the fetch is done, the \#decode flag
is raised, which instructs the micromachine to take the value in the DI signal
(data input from external memory) and load it into the IR and the microcode
address counter, while setting the high address bit to 1. At the resulting
address there will be a JSR $\mu$I pointing to the microcode for the 8080 opcode in
question (the microcode assembler will make sure of that). The \#decode flag will
also clear registers T1 and T2.
\item \textbf{0x007: halt}\\
Whenever a HALT instruction is executed, the \#halt flag is raised, which,
when used in the same $\mu$I as flag \#end, makes the micromachine jump to this
address. The $\mu$I at this address does nothing but raise flags \#halt and \#end. The
micromachine will keep jumping to this address until the halt state is left,
something which can only happen by reset or by interrupt. The \#halt flag, when
raised, sets the halt output signal, which will be cleared when the CPU exits
the halt state.
\end{itemize}
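The remark in the fetch routine about ADC behaving like an INC when T2 is 0x00 can be checked with a short Python sketch. It is an illustration under stated assumptions, not the actual microcode: the helper names \texttt{adc} and \texttt{inc16} are hypothetical, and the carry-in of 1 for the low byte is assumed to be provided by the microcode (the text does not say how).

```python
def adc(t1, t2, cy):
    """8-bit ADC: T2 + T1 + CY. Returns (result, carry_out)."""
    total = t1 + t2 + cy
    return total & 0xFF, total >> 8


def inc16(pc):
    """Increment a 16-bit PC with two 8-bit ADCs while T2 is held at 0x00."""
    # ASSUMPTION: the low-byte operation sees a carry-in of 1.
    low, carry = adc(pc & 0xFF, 0x00, 1)   # 0 + T1 + 1 behaves like INC
    high, _ = adc(pc >> 8, 0x00, carry)    # carry ripples into the high byte
    return (high << 8) | low
```

With T2 at any other value the ADC would add that value as well, which is why the fetch routine depends on T2 being clear.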

\subsection{Conditional jumps}
\label{conditionalJumps}

There is a conditional branch microinstruction: TJSR. This instruction
tests a certain condition and, if the condition is true, performs exactly as JSR.
Otherwise, it ends the microcode execution exactly as if the flag \#end had been
raised. This microinstruction has been made for the conditional branches and
returns of the 8080 CPU and is not flexible enough for any other use.
The condition tested is encoded in the register IR, in the field ccc (bits
5..3), as encoded in the conditional instructions of the 8080; you can look
them up in any 8080 reference. Flags are updated in the 2nd stage, so a TJSR
cannot test the flags modified by the previous $\mu$I. But it is not necessary; this
instruction will always be used to test conditions set by previous 8080
instructions, separated at least by the opcode fetch $\mu$Is, and probably many
more. Thus, the condition flags will always be valid upon testing.
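As a reference for the ccc field, the condition test can be sketched in Python as follows. The condition order (NZ, Z, NC, C, PO, PE, P, M) and the PSW bit positions are those of the standard 8080; the names \texttt{CONDITIONS} and \texttt{condition\_true} are hypothetical, and the sketch says nothing about how the VHDL actually implements the test.

```python
# Standard 8080 PSW flag bit positions.
FLAG_S, FLAG_Z, FLAG_P, FLAG_CY = 0x80, 0x40, 0x04, 0x01

# Indexed by ccc: (flag mask, required flag state).
CONDITIONS = [
    (FLAG_Z, False),   # 000 NZ
    (FLAG_Z, True),    # 001 Z
    (FLAG_CY, False),  # 010 NC
    (FLAG_CY, True),   # 011 C
    (FLAG_P, False),   # 100 PO
    (FLAG_P, True),    # 101 PE
    (FLAG_S, False),   # 110 P
    (FLAG_S, True),    # 111 M
]


def condition_true(ir, psw):
    """Evaluate the condition encoded in IR bits 5..3 against the PSW."""
    ccc = (ir >> 3) & 0x07
    mask, wanted = CONDITIONS[ccc]
    return bool(psw & mask) == wanted
```

For instance, JZ (opcode 0xCA) has ccc = 001, so the branch is taken only when the Z bit of the PSW is set.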

\subsection{Implicit operations}
\label{implicitOperations}

Most micromachine operations happen only when explicitly commanded. But
some happen automatically and have to be taken into account when coding the
microprogram:

\begin{enumerate}
\item Register IR is loaded automatically when the flag \#decode is raised. The
microcode program counter is loaded automatically with the same value as
the IR (with the high address bit set to 1), as has been explained above. From
that point on, execution resumes
normally: the jump table contains normal JSR microinstructions.
\item T1 is cleared to 0x00 at reset, when the flag \#decode is active or when
the flag \#clrt1 is used.
\item T2 is cleared to 0x00 at reset, when the flag \#decode is active or when
the flag \#end is used.
\item Microcode flow control:
  \begin{enumerate}
  \item When flag \#end is raised, execution continues at $\mu$code address
  0x0003.
  \item When both flags \#halt and \#end are raised, execution continues at
  $\mu$code address 0x0007, unless there is an interrupt pending.
  \item Otherwise, when flag \#ret is raised, execution continues at the address
  following the last JSR executed. If such a return is attempted before a JSR
  has executed since the last reset, the results are undefined; this
  should never happen with the microcode source as it is now.
  \item If none of the above flags are used, the next $\mu$I is executed.
  \end{enumerate}
\end{enumerate}
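The flow control rules above can be summarized with a small Python function. This is a sketch of the rules as written in this document, not a transcription of the VHDL: the function name, the representation of flags as a set of strings and the \texttt{irq\_pending} argument are all hypothetical, and JSR/TJSR and \#decode are deliberately left out.

```python
FETCH_ADDR = 0x003
HALT_ADDR = 0x007


def next_uc_addr(uc_addr, flags, ret_addr, irq_pending=False):
    """Pick the next microcode address from the flags of the current uI.

    flags: set of flag names, e.g. {"#end", "#halt"}.
    ret_addr: address saved by the last JSR (one-level return stack).
    """
    if "#end" in flags:
        if "#halt" in flags and not irq_pending:
            return HALT_ADDR
        # With an interrupt pending #halt is ignored, so this is a plain #end.
        return FETCH_ADDR
    if "#ret" in flags:
        return ret_addr
    # No flow control flag: fall through to the next microinstruction.
    return (uc_addr + 1) & 0xFF
```

Note that \#end and \#ret belong to the same flag group and so cannot appear together in one $\mu$I; the order of the tests only mirrors the order of the list.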

Notice that both T1 and T2 are cleared at the end of the opcode fetch, so
they are guaranteed to be 0x00 at the beginning of the instruction microcode.
T2 is also cleared at the end of the instruction microcode, so it is
guaranteed clear for its use in the opcode fetch microcode. T1 can be cleared
if a microinstruction so requires. Refer to the section on microinstruction
flags (section~\ref{ucodeFlags}).

\section{Microinstructions}
\label{uinstructions}

The microcode for the CPU is a source text file encoded in a format
described below. This 'microcode source' is assembled by the microcode assembler
(described in section~\ref{ucodeAssembler}), which then builds a microcode table
in VHDL format. There's
nothing stopping you from assembling the microcode by hand directly in the VHDL
source, and in a machine this simple it might have been better.

\subsection{Microcode source format}
\label{ucodeFormat}

The microcode source format is more similar to some early assembly language
than to other microcodes you may have seen. Each non-blank,
non-comment line of code contains a single microinstruction in the format
informally described below:

% there must be some cleaner way to do this in TeX...

\begin{alltt}
\textless microinstruction line \textgreater :=
    [\textless label \textgreater]\footnote{Labels appear alone by themselves in a line} \textbar
    \textless operand stage control \textgreater ; \textless ALU stage control \textgreater [; [\textless flag list \textgreater]] \textbar
    JSR \textless destination address \textgreater\textbar TJSR \textless destination address \textgreater
\\
\textless label \textgreater := \{':' immediately followed by a common identifier\}
\textless destination address \textgreater := \{an identifier defined as a label anywhere in the file\}
\textless operand stage control \textgreater := \textless op\_reg \textgreater = \textless op\_src \textgreater \textbar NOP
\textless op\_reg \textgreater := T1 \textbar T2
\textless op\_src \textgreater := \textless register \textgreater \textbar DI \textbar \textless IR register \textgreater
\textless IR register \textgreater := \{s\}\textbar\{d\}\textbar\{p\}0\textbar\{p\}1\footnote{Registers are specified by IR field}
\textless register \textgreater := \_a\textbar\_b\textbar\_c\textbar\_d\textbar\_e\textbar\_h\textbar\_l\textbar\_f\textbar\_ph\textbar\_pl\textbar\_x\textbar\_y\textbar\_z\textbar\_w\textbar\_sh\textbar\_sl
\textless ALU stage control \textgreater := \textless alu\_dst \textgreater = \textless alu\_op \textgreater \textbar NOP
\textless alu\_dst \textgreater := \textless register \textgreater \textbar DO
\textless alu\_op \textgreater := add\textbar adc\textbar sub\textbar sbb\textbar and\textbar orl\textbar not\textbar xrl\textbar rla\textbar rra\textbar rlca\textbar rrca\textbar aaa\textbar
    t1\textbar rst\textbar daa\textbar cpc\textbar sec\textbar psw
\textless flag list \textgreater := \textless flag \textgreater [, \textless flag \textgreater ...]
\textless flag \textgreater := \#decode\textbar\#di\textbar\#ei\textbar\#io\textbar\#auxcy\textbar\#clrt1\textbar\#halt\textbar\#end\textbar\#ret\textbar\#rd\textbar\#wr\textbar\#setacy\textbar
    \#ld\_al\textbar\#ld\_addr\textbar\#fp\_c\textbar\#fp\_r\textbar\#fp\_rc\textbar\#clr\_acy \footnote{There are some restrictions on the flags that can be used together} \\
\end{alltt}

Please bear in mind that this is just an informal description; I made
it up from my personal notes and the assembler source. The ultimate reference is
the microcode source itself and the assembler source.\\
Due to the way that flags have been encoded (there's less than one bit per
flag in the microinstruction), there are restrictions on what flags can be used
together. See section~\ref{ucodeFlags}.

The assembler will complain if the source does not comply with the
expected format, but the syntax check is somewhat weak.
In the microcode source you will see words like \_\_reset, \_\_fetch, etc.
which don't fit the above syntax. Those were supposed to be assembler pragmas,
which the assembler would use to enforce the alignment of the microinstructions
to certain addresses. I finally decided not to use them and align the
instructions myself. The assembler ignores them but I kept them as a reminder.

The 1st part of the $\mu$I controls the ALU operand stage; we can load either
T1 or T2 with either the contents of the input signal DI, or the selected
register from the register bank. Or, we can do nothing (NOP).\\
The 2nd part of the $\mu$I controls the ALU stage; we can instruct the ALU to
perform some operation on the operands T1 and T2 loaded by this same
instruction, in the previous stage; and we can select where to load the ALU
result, either in the output register DO or in the register bank. Or we can do
none of the above (NOP).

The write address for the register bank used in the 2nd stage has to be the
same as the read address used in the 1st stage; that is, if both $\mu$I parts use the
RB, both have to use the same address (the assembler will enforce this
restriction). This is due to an early, silly mistake that I chose not to fix:
there is a single $\mu$I field that holds both addresses.\\
This is a very annoying limitation that unduly complicates the microcode
and wastes many microcode slots for no saving in hardware; I just did not want
to make any major refactors until the project is working. As
you can see in the VHDL source, the machine is prepared to use 2 independent
address fields with little modification. I may make this improvement and others
in a later version, but only when I deem the design 'finished' (since the design
as it is already exceeds my modest performance target).
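To make the line format and the single-address restriction more concrete, here is a minimal Python sketch of how a microinstruction line could be split and checked. It is not the logic of \texttt{util/uasm.pl}: the function names are hypothetical, labels and JSR/TJSR lines are not handled, and a register bank operand is simply assumed to be any name starting with an underscore or a brace.

```python
def parse_uinst(line):
    """Split '<operand stage> ; <ALU stage> [; <flag list>]' into its parts."""
    parts = [p.strip() for p in line.split(";")]
    if len(parts) < 2:
        raise ValueError("expected '<operand stage> ; <ALU stage>'")
    flags = []
    if len(parts) > 2 and parts[2]:
        flags = [f.strip() for f in parts[2].split(",")]
    return parts[0], parts[1], flags


def rb_register(stage):
    """Return the register bank operand used by a stage, or None."""
    if stage.upper() == "NOP":
        return None
    dst, src = [s.strip() for s in stage.split("=", 1)]
    for name in (dst, src):
        # ASSUMPTION: '_x' and '{x}' names are register bank operands.
        if name.startswith("_") or name.startswith("{"):
            return name
    return None


def check_same_rb_address(line):
    """True when the line respects the single register address restriction."""
    operand, alu, _ = parse_uinst(line)
    read_reg, write_reg = rb_register(operand), rb_register(alu)
    return read_reg is None or write_reg is None or read_reg == write_reg
```

The lines used to exercise the sketch, such as \texttt{T1 = \_a ; \_a = adc ; \#fp\_rc}, are made-up examples that follow the grammar; they are not quoted from the microcode source.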

\subsection{Microcode ALU operations}
\label{ucodeAluOps}

\begin{tabular}{|l|l|l|l|}
\hline
\multicolumn{4}{|c|}{ALU operations} \\
\hline
Operation & Encoding & Result & Notes \\
\hline ADD & 001100 & T2 + T1 & \\
\hline ADC & 001101 & T2 + T1 + CY & \\
\hline SUB & 001110 & T2 - T1 & \\
\hline SBB & 001111 & T2 - T1 - CY & \\
\hline AND & 000100 & T1 AND T2 & \\
\hline ORL & 000101 & T1 OR T2 & \\
\hline NOT & 000110 & NOT T1 & \\
\hline XRL & 000111 & T1 XOR T2 & \\
\hline RLA & 000000 & 8080 RLC & \\
\hline RRA & 000001 & 8080 RRC & \\
\hline RLCA & 000010 & 8080 RAL & \\
\hline RRCA & 000011 & 8080 RAR & \\
\hline T1 & 010111 & T1 & \\
\hline RST & 011111 & 8*IR(5..3) & as per RST instruction \\
\hline DAA & 101000 & DAA T1 & but only after executing 2 in a row \\
\hline CPC & 101100 & UNDEFINED & CY complemented \\
\hline SEC & 101101 & UNDEFINED & CY set \\
\hline PSW & 110000 & PSW & \\
\hline
\end{tabular}

Notice that ALU operation DAA takes two cycles to complete; it uses a
dedicated circuit with an extra pipeline stage. So it has to be executed twice
in a row before taking the result; refer to the microcode source for an example.\\
The PSW register is updated with the ALU result at every cycle, whatever
ALU operation is executed, though every ALU operation computes flags by
different means, as is apparent in the case of CY. Which flags are updated,
and which keep their previous values, is defined by a microinstruction field
named flag\_pattern. See the VHDL code for details.
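The selective update controlled by flag\_pattern can be pictured with the following Python sketch. It assumes the standard 8080 PSW layout (S, Z, AC, P and CY in bits 7, 6, 4, 2 and 0) and the meaning of the two flag\_pattern bits given in the notes to the microcode binary format (bit 0 selects CY, bit 1 selects all the other flags); the function and constant names are hypothetical and the VHDL remains the reference.

```python
CY_MASK = 0x01     # carry flag, PSW bit 0
OTHER_MASK = 0xD4  # S, Z, AC and P: PSW bits 7, 6, 4 and 2


def update_psw(psw, alu_flags, flag_pattern):
    """Merge freshly computed ALU flags into the PSW under flag_pattern."""
    mask = 0
    if flag_pattern & 0b01:
        mask |= CY_MASK      # flag_pattern(0): update CY
    if flag_pattern & 0b10:
        mask |= OTHER_MASK   # flag_pattern(1): update all flags except CY
    # Selected bits come from the ALU, the rest keep their previous value.
    return (psw & ~mask & 0xFF) | (alu_flags & mask)
```

With flag\_pattern equal to 00 the PSW is returned unchanged, which corresponds to a $\mu$I that updates no flags.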

\subsection{Microcode binary format}
\label{ucodeBinFormat}

\begin{tabular}{|l|l|l|}
\hline
\multicolumn{3}{|c|}{Microcode word bitfields} \\ \hline
POS & VHDL NAME & PURPOSE \\ \hline
31..29 & uc\_flags1 & Encoded flag of group 1 (see section~\ref{ucodeFlags}) \\ \hline
28..26 & uc\_flags2 & Encoded flag of group 2 (see section~\ref{ucodeFlags}) \\ \hline
25 & load\_addr & Address register load enable (note 1) \\ \hline
24 & load\_al & AL load enable (note 1) \\ \hline
23 & load\_t1 & T1 load enable \\ \hline
22 & load\_t2 & T2 load enable \\ \hline
21 & mux\_in & T1/T2 source mux control (0 for DI, 1 for reg bank) \\ \hline
20..19 & rb\_addr\_sel & Register bank address source control (note 2) \\ \hline
18..15 & ra\_field & Register bank address (used both for write and read) \\ \hline
14 & clr\_acy & Clear CY and AC, see explanation below (pipelined signal) \\ \hline
13..10 & (unused) & Reserved for write register bank address, unused yet \\ \hline
11..10 & uc\_jmp\_addr(7..6) & JSR/TJSR jump address, higher 2 bits (note 5) \\ \hline
9..8 & flag\_pattern & PSW flag update control (note 3) (pipelined signal) \\ \hline
7 & load\_do & DO load enable (note 4) (pipelined signal) \\ \hline
6 & we\_rb & Register bank write enable (pipelined signal) \\ \hline
5..0 & uc\_jmp\_addr(5..0) & JSR/TJSR jump address, lower 6 bits (note 5) \\ \hline
5..0 & (several) & Encoded ALU operation \\ \hline
\end{tabular}

\begin{itemize}
\item {\bf Note 1: load\_al}\\
AL is a temporary register for the lower byte of the external 16 bit
address. The memory interface (and the IO interface) assumes external
synchronous memory, so the 16 bit address has to be externally loaded as
commanded by load\_addr.
Note that both halves of the address signal load directly from the
register bank output; you can load AL with PC, for instance, in the same cycle
in which you modify the PC; AL will load with the pre-modified value.

\item {\bf Note 2: rb\_addr\_sel}\\
A microinstruction can access any register as specified by ra\_field, or
the register fields in the 8080 instruction opcode: S, D and RP (the
microinstruction can select which register of the pair). In the microcode source
this is encoded like this:
\begin{description}
\item[\{s\}] $\Rightarrow$ 0 \& SSS
\item[\{d\}] $\Rightarrow$ 0 \& DDD
\item[\{p\}0] $\Rightarrow$ 1 \& PP \& 0 (HIGH byte of register pair)
\item[\{p\}1] $\Rightarrow$ 1 \& PP \& 1 (LOW byte of register pair)
\end{description}
{\small SSS = IR(2 downto 0) (source register)\\
DDD = IR(5 downto 3) (destination register)\\
PP = IR(5 downto 4) (register pair)}

\item {\bf Note 3: flag\_pattern}\\
Selects which flags of the PSW, if any, will be updated by the
microinstruction:
\begin{itemize}
\item When flag\_pattern(0)='1', CY is updated in the PSW.
\item When flag\_pattern(1)='1', all flags other than CY are updated in the PSW.
\end{itemize}

\item {\bf Note 4: load\_do}\\
DO is the data output register that is loaded with the ALU output, so the
load enable signal is pipelined.

\item {\bf Note 5: JSR-H and JSR-L}\\
These fields overlap existing fields which are unused in JSR/TJSR
instructions (fields which can be used with no secondary effects).

\end{itemize}
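The bitfield table can be turned into a small Python decoder, which may be handy when inspecting the generated microcode table by hand. The bit positions are taken from the table above; everything else is an assumption made for this sketch: the names \texttt{FIELDS}, \texttt{decode\_uinst} and \texttt{jsr\_target} are hypothetical, and \texttt{alu\_op} is just a label chosen here for bits 5..0, which the table lists as "(several)".

```python
# name: (msb, lsb), as listed in the bitfield table
FIELDS = {
    "uc_flags1": (31, 29), "uc_flags2": (28, 26),
    "load_addr": (25, 25), "load_al": (24, 24),
    "load_t1": (23, 23), "load_t2": (22, 22), "mux_in": (21, 21),
    "rb_addr_sel": (20, 19), "ra_field": (18, 15), "clr_acy": (14, 14),
    "flag_pattern": (9, 8), "load_do": (7, 7), "we_rb": (6, 6),
    "alu_op": (5, 0),  # hypothetical name for the encoded ALU operation
}


def decode_uinst(word):
    """Split a 32-bit microcode word into its named bitfields."""
    return {
        name: (word >> lsb) & ((1 << (msb - lsb + 1)) - 1)
        for name, (msb, lsb) in FIELDS.items()
    }


def jsr_target(word):
    """8-bit JSR/TJSR target: bits 11..10 (high part) and 5..0 (low part)."""
    return (((word >> 10) & 0x3) << 6) | (word & 0x3F)
```

The jump address is handled separately because, as note 5 explains, its two halves overlap fields that are otherwise unused in JSR/TJSR microinstructions.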

\subsection{Microcode flags}
\label{ucodeFlags}

Flags is what I have called those signals of the microinstruction that you
assert individually in the microcode source. Due to the way they have been
encoded, I have separated them into two groups. Only one flag in each group can be
used in any instruction. These are all the flags in the format they appear in
the microcode source:

\begin{itemize}
\item Flags from group 1: use only one of these
  \begin{itemize}
  \item \#decode : Load address counter and IR with contents of data input
  lines, thus starting opcode decoding.
  \item \#ei : Set interrupt enable register.
  \item \#di : Reset interrupt enable register.
  \item \#io : Activate io signal for 1st cycle.
  \item \#auxcy : Use aux carry instead of regular carry for this $\mu$I.
  \item \#clrt1 : Clear T1 at the end of 1st cycle.
  \item \#halt : Jump to microcode address 0x07 without saving return value,
  when used with flag \#end, and only if there is no interrupt
  pending. Ignored otherwise.
  \end{itemize}

\item Flags from group 2: use only one of these
  \begin{itemize}
  \item \#setacy : Set aux carry at the start of 1st cycle (used for ++).
  \item \#end : Jump to microinstruction address 3 after the present $\mu$I.
  \item \#ret : Jump to address saved by the last JSR or TJSR $\mu$I.
  \item \#rd : Activate rd signal for the 2nd cycle.
  \item \#wr : Activate wr signal for the 2nd cycle.
  \end{itemize}

\item Independent flags: no restrictions
  \begin{itemize}
  \item \#ld\_al : Load AL register with register bank output as read by
  operation 1 (used in memory and io access).
  \item \#ld\_addr : Load address register (H byte = register bank output as read
  by operation 1, L byte = AL).
  Activate vma signal for 1st cycle.
  \item \#clr\_acy : Clear PSW flags AC and CY, except for AND instructions
  (ALU operation = 000100), where AC is set.
  Meant to be used with flag \#fp\_rc for the logic instructions (AND, OR, XOR).
  See \ref{compatibility} for a note about compatibility with the original 8080.
  \end{itemize}

\item PSW update flags: use only one of these
  \begin{itemize}
  \item \#fp\_r : This instruction updates all PSW flags except for C.
  \item \#fp\_c : This instruction updates only the C flag in the PSW.
  \item \#fp\_rc : This instruction updates all the flags in the PSW.
  \end{itemize}

\end{itemize}
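The grouping rules above lend themselves to a simple mechanical check. The following Python sketch shows one way to do it; it is not how \texttt{util/uasm.pl} checks flags (I make no claim about that), and the names \texttt{GROUP\_1}, \texttt{GROUP\_2}, \texttt{PSW\_UPDATE}, \texttt{INDEPENDENT} and \texttt{check\_flags} are hypothetical.

```python
GROUP_1 = {"#decode", "#ei", "#di", "#io", "#auxcy", "#clrt1", "#halt"}
GROUP_2 = {"#setacy", "#end", "#ret", "#rd", "#wr"}
PSW_UPDATE = {"#fp_r", "#fp_c", "#fp_rc"}
INDEPENDENT = {"#ld_al", "#ld_addr", "#clr_acy"}


def check_flags(flags):
    """Return a list of error strings; empty when the combination is legal."""
    errors = []
    known = GROUP_1 | GROUP_2 | PSW_UPDATE | INDEPENDENT
    for flag in flags:
        if flag not in known:
            errors.append(f"unknown flag {flag}")
    exclusive = (("group 1", GROUP_1), ("group 2", GROUP_2),
                 ("PSW update", PSW_UPDATE))
    for name, group in exclusive:
        used = sorted(f for f in set(flags) if f in group)
        if len(used) > 1:
            errors.append(f"only one {name} flag allowed, got {', '.join(used)}")
    return errors
```

For example, \#halt together with \#end is accepted because the two flags belong to different groups, while \#rd together with \#wr is rejected.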

\section{Notes on the microcode assembler}
\label{ucodeAssembler}

The microcode assembler is a Perl script (\texttt{util/uasm.pl}). Please refer
to the comments in the script for a reference on the usage of the assembler.\\
I will admit up front that the microcode source format and the assembler
program itself are a mess. They were hacked quickly and then often retouched
but never redesigned, in order to avoid the 'never ending project' syndrome.\\
Please note that use of the assembler, and the microcode assembly source,
is optional and perhaps overkill for this simple core. All you need to build the
core is the VHDL source file.\\

The Perl assembler itself accounted for more than half of all the bugs I caught
during development.
Though the assembler certainly saved me a lot of mistakes in the hand-assembly
of the microcode, a half-cooked assembler like
this one may do more harm than good. I expect that the program now behaves
correctly; I have made a lot of modifications to the microcode source for
testing purposes and I have not found any more bugs in the assembler. But you
have been warned: don't trust the assembler too much (in case someone actually
wants to mess with these things at all).\\
The assembler reads a
microcode text source file and writes to stdout a microcode table in the form of
a chunk of VHDL code. You are supposed to capture that output and paste it into
the VHDL source (actually, I used another Perl script to do that, but I don't
want to overcomplicate an already messy documentation).\\
The assembler can do some other simple operations on the source, for debug
purposes. The invocation options are documented in the program file.\\
You don't need any extra Perl modules or libraries; any distribution of Perl 5
will do. Earlier versions should work too but might not; I haven't
tested.

\section{CPU details}
\label{cpuDetails}

\subsection{Synchronous memory and i/o interface}
\label{syncMem}

The core is designed to connect to external synchronous memory similar to
the internal FPGA RAM blocks found in the Spartan series. It can be used with
asynchronous RAM provided that you add the necessary registers (I have used it
with external SRAM included on a development board with no trouble).

Signal 'vma' is the master read/write enable. It is designed to be used as
a synchronous rd/wr enable. All other memory/io signals are only valid when vma
is active. Read data is sampled on the positive clock edge following deassertion
of vma. That is, the core expects external memory and io to behave as an
internal FPGA block RAM would.\\
I think the interface is simple enough to be fully described by the
comments in the header of the VHDL source file.
509 |
|
|
|
510 |
|
|
\subsection{Interrupt response}
\label{irqResponse}

Interrupt response has been greatly simplified, but it follows the outline
of the original procedure. The biggest difference is that inta is
active for the entire duration of the instruction, and not only the opcode fetch
cycle.

Whenever a high value is sampled on line intr at a positive clock edge,
an interrupt pending flag is raised internally. After the current instruction
finishes execution, the interrupt pending flag is sampled. If active, it is
cleared, interrupts are disabled and the processor enters an inta cycle. If
inactive, the processor enters a fetch cycle as usual.
The inta cycle is identical to a fetch cycle, except that the inta
signal is asserted high.

The processor will fetch an opcode during the first inta cycle and will
execute it normally, except that the PC increment will not happen and inta will be
high for the duration of the instruction. Note that although the PC increment is
inhibited while inta is high, the PC can be explicitly changed (rst, jmp, etc.).
After the special inta instruction execution is done, the processor
resumes normal execution, with interrupts disabled.\\
The above means that any instruction (even XTHL, which the original 8080
forbids) can be used as an interrupt vector and will be executed normally. The
core has been tested with rst, lxi and inr, for example.

Since there's no M1 signal available, feeding multi-byte instructions as
interrupt vectors can be a little complicated. It is up to you to deal with this
situation (e.g. use only single-byte vectors or make up some sort of cycle
counter).

\subsection{Instruction timing}
\label{timing}

This core is slower than the original in terms of clocks per instruction.
Since the original 8080 was itself one of the slowest micros ever, this does not
say much for the core. Yet, one of these clocked at 50 MHz would outperform an
original 8080 at 25 MHz, which is fast enough for many control applications,
except that there are possibly better alternatives.\\
A comparative table follows.

\begin{tabular}{|l|l|l|l|l|l|l|}
\hline
\multicolumn{7}{|c|}{Instruction timing in clock cycles (core vs. original)} \\ \hline
Opcode & Intel 8080 & Light8080 & & Opcode & Intel 8080 & Light8080 \\ \hline
MOV r1, r2 & 5 & 6 & & XRA M & 7 & 9 \\ \hline
MOV r, M & 7 & 9 & & XRI data & 7 & 9 \\ \hline
MOV M, r & 7 & 9 & & ORA r & 4 & 6 \\ \hline
MVI r, data & 7 & 9 & & ORA M & 7 & 9 \\ \hline
MVI M, data & 10 & 12 & & ORI data & 7 & 9 \\ \hline
LXI rp, data16 & 10 & 14 & & CMP r & 4 & 6 \\ \hline
LDA addr & 13 & 16 & & CMP M & 7 & 9 \\ \hline
STA addr & 13 & 16 & & CPI data & 7 & 9 \\ \hline
LHLD addr & 16 & 19 & & RLC & 4 & 5 \\ \hline
SHLD addr & 16 & 19 & & RRC & 4 & 5 \\ \hline
LDAX rp & 7 & 9 & & RAL & 4 & 5 \\ \hline
STAX rp & 7 & 9 & & RAR & 4 & 5 \\ \hline
XCHG & 4 & 16 & & CMA & 4 & 5 \\ \hline
ADD r & 4 & 6 & & CMC & 4 & 5 \\ \hline
ADD M & 7 & 9 & & STC & 4 & 5 \\ \hline
ADI data & 7 & 9 & & JMP & 10 & 15 \\ \hline
ADC r & 4 & 6 & & Jcc & 10 & 12/16 \\ \hline
ADC M & 7 & 9 & & CALL & 17 & 29 \\ \hline
ACI data & 7 & 9 & & Ccc & 11/17 & 12/30 \\ \hline
SUB r & 4 & 6 & & RET & 10 & 14 \\ \hline
SUB M & 7 & 9 & & Rcc & 5/11 & 5/15 \\ \hline
SUI data & 7 & 9 & & RST n & 11 & 20 \\ \hline
SBB r & 4 & 6 & & PCHL & 5 & 8 \\ \hline
SBB M & 7 & 9 & & PUSH rp & 11 & 19 \\ \hline
SBI data & 7 & 9 & & PUSH PSW & 11 & 19 \\ \hline
INR r & 5 & 6 & & POP rp & 10 & 14 \\ \hline
INR M & 10 & 13 & & POP PSW & 10 & 14 \\ \hline
INX rp & 5 & 6 & & XTHL & 18 & 32 \\ \hline
DCR r & 5 & 6 & & SPHL & 5 & 8 \\ \hline
DCR M & 10 & 14 & & EI & 4 & 5 \\ \hline
DCX rp & 5 & 6 & & DI & 4 & 5 \\ \hline
DAD rp & 10 & 8 & & IN port & 10 & 14 \\ \hline
DAA & 4 & 6 & & OUT port & 10 & 14 \\ \hline
ANA r & 4 & 6 & & HLT & 7 & 5 \\ \hline
ANA M & 7 & 9 & & NOP & 4 & 5 \\ \hline
ANI data & 7 & 9 & & & & \\ \hline
XRA r & 4 & 6 & & & & \\ \hline
\end{tabular}
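
As a rough check of the 50 MHz versus 25 MHz comparison, the execution time of
a single instruction is its cycle count divided by the clock frequency. The
symbols $t$, $N$ and $f$ are introduced here only for illustration, and the
cycle counts are taken from the table at face value:
\[
t = \frac{N}{f}
\]
\begin{eqnarray*}
\mbox{ADD r:} & & 6 / 50\,\mathrm{MHz} = 120\,\mathrm{ns}
  \quad \mbox{vs.} \quad 4 / 25\,\mathrm{MHz} = 160\,\mathrm{ns} \\
\mbox{CALL:} & & 29 / 50\,\mathrm{MHz} = 580\,\mathrm{ns}
  \quad \mbox{vs.} \quad 17 / 25\,\mathrm{MHz} = 680\,\mathrm{ns} \\
\mbox{XCHG:} & & 16 / 50\,\mathrm{MHz} = 320\,\mathrm{ns}
  \quad \mbox{vs.} \quad 4 / 25\,\mathrm{MHz} = 160\,\mathrm{ns}
\end{eqnarray*}
In other words, the core at twice the clock rate comes out ahead whenever its
cycle count is less than twice the original one. Going by the table, XCHG is
the one listed instruction for which that is not the case.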

\clearpage

\subsection{Binary compatibility with the original 8080}
\label{compatibility}

Flag AC (auxiliary carry) does not work exactly as in the original 8080. In the
original 8080, ANI and ANA don't clear AC but set it to the OR'ing of
bits 3 of the ALU operands.

In this core, these two instructions instead set the AC flag to 1. In this, the
core is compatible with the 8085 and not with the 8080.
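
The difference can be written compactly. In the notation below, which is
introduced here only for illustration, $a_3$ and $b_3$ stand for bit 3 of the
two ALU operands of ANA or ANI:
\[
AC_{8080} = a_3 \lor b_3, \qquad AC_{\mathrm{core}} = 1 .
\]
For example, executing ANI 02h with A = 04h involves two operands with bit 3
clear, so an original 8080 would leave AC = 0 whereas this core sets AC = 1.
The two results agree only when at least one operand has bit 3 set.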

That is the only difference from the original 8080 that I am aware of.
Unfortunately, the only test bench that I have available right now is not
exhaustive enough to pick up that kind of detail. Until I develop a stronger test
bench, full compatibility with the 8080 can't be guaranteed.

\end{document}