<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">
<HTML>
<HEAD>
<TITLE>html/Opcode_Fetch</TITLE>
<META NAME="generator" CONTENT="HTML::TextToHTML v2.46">
<LINK REL="stylesheet" TYPE="text/css" HREF="lecture.css">
</HEAD>
<BODY>
<P><table class="ttop"><th class="tpre"><a href="04_Cpu_Core.html">Previous Lesson</a></th><th class="ttop"><a href="toc.html">Table of Contents</a></th><th class="tnxt"><a href="06_Data_Path.html">Next Lesson</a></th></table>
<hr>

<H1><A NAME="section_1">5 OPCODE FETCH</A></H1>

<P>In this lesson we will design the opcode fetch stage of the CPU core.
The opcode fetch stage is the simplest stage in the pipeline.
It is the stage that puts life into the CPU core by generating a sequence
of opcodes that are then decoded and executed. The opcode fetch stage
is sometimes called the <STRONG>sequencer</STRONG> of the CPU.

<P>Since we use the Harvard architecture with separate program and data
memories, we can simply instantiate the program memory in the opcode fetch
stage. If you need more memory than your FPGA provides internally, then you
can design address and data buses towards an external memory instead
(or in addition). Most current FPGAs provide a lot of internal memory,
so we can keep things simple.

<P>The opcode fetch stage contains a sub-component <STRONG>pmem</STRONG>, which is
the program memory. The main purpose of the opcode fetch stage is to
manipulate the program counter (<STRONG>PC</STRONG>) and to produce opcodes.
The <STRONG>PC</STRONG> is a local signal:

<P><br>

<pre class="vhdl">
signal L_PC             : std_logic_vector(15 downto 0);
<pre class="filename">
src/opc_fetch.vhd
</pre></pre>
<P>

<P><br>

<P>The <STRONG>PC</STRONG> is loaded with its next value on every clock edge.
At the same time, the <STRONG>T0</STRONG> output is loaded with the inverse of <STRONG>WAIT</STRONG>,
so <STRONG>T0</STRONG> is cleared on the clock edge after <STRONG>WAIT</STRONG> was raised.
This causes <STRONG>T0</STRONG> to be '1' on the first cycle of a 2-cycle
instruction and '0' on the second cycle:

<P><br>

<pre class="vhdl">
lpc: process(I_CLK)
begin
    if (rising_edge(I_CLK)) then
        L_PC <= L_NEXT_PC;
        L_T0 <= not L_WAIT;
    end if;
end process;
<pre class="filename">
src/opc_fetch.vhd
</pre></pre>
<P>
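
<P>The interplay of <STRONG>PC</STRONG>, <STRONG>T0</STRONG>, and <STRONG>WAIT</STRONG> can be tried out in isolation. The following is a minimal simulation sketch, not part of the lecture sources: the entity <STRONG>t0_sketch</STRONG> and its input <STRONG>I_TWO_CYCLE</STRONG> (assumed to be '1' while the current opcode needs two cycles) are hypothetical, numeric_std's <STRONG>unsigned</STRONG> replaces <STRONG>std_logic_vector</STRONG> arithmetic, and <STRONG>WAIT</STRONG> is derived from <STRONG>T0</STRONG> in the same way as in the two-cycle logic of the opcode fetch stage.

<pre class="vhdl">
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical sketch (not from the lecture sources): PC register and T0 flag.
-- I_TWO_CYCLE is an assumed input, '1' while the current opcode needs 2 cycles.
entity t0_sketch is
    port (  I_CLK       : in  std_logic;
            I_CLR       : in  std_logic;
            I_TWO_CYCLE : in  std_logic;
            Q_PC        : out unsigned(15 downto 0);
            Q_T0        : out std_logic;
            Q_WAIT      : out std_logic);
end t0_sketch;

architecture behavioral of t0_sketch is
    signal L_PC   : unsigned(15 downto 0) := (others => '0');
    signal L_T0   : std_logic := '1';
    signal L_WAIT : std_logic;
begin
    -- WAIT is raised only in the first cycle (T0 = '1') of a two-cycle opcode
    L_WAIT <= L_T0 and I_TWO_CYCLE and not I_CLR;

    seq: process(I_CLK)
    begin
        if (rising_edge(I_CLK)) then
            if (I_CLR = '1') then
                L_PC <= (others => '0');
            elsif (L_WAIT = '0') then       -- WAIT freezes the PC
                L_PC <= L_PC + 1;
            end if;
            L_T0 <= not L_WAIT;             -- same rule as in process lpc
        end if;
    end process;

    Q_PC   <= L_PC;
    Q_T0   <= L_T0;
    Q_WAIT <= L_WAIT;
end behavioral;
</pre>

<P>With <STRONG>I_TWO_CYCLE</STRONG> held at '1', <STRONG>WAIT</STRONG> is raised only while <STRONG>T0</STRONG> is '1', so the <STRONG>PC</STRONG> advances on every second clock; with it held at '0', the <STRONG>PC</STRONG> advances on every clock.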

<P><br>

<P>The next value of the <STRONG>PC</STRONG> depends on the <STRONG>CLR</STRONG>, <STRONG>WAIT</STRONG>, <STRONG>LOAD_PC</STRONG>, and
<STRONG>LONG_OP</STRONG> signals:

<P><br>

<pre class="vhdl">
L_NEXT_PC <= X"0000"        when (I_CLR     = '1')
        else L_PC           when (L_WAIT    = '1')
        else I_NEW_PC       when (I_LOAD_PC = '1')
        else L_PC + X"0002" when (L_LONG_OP = '1')
        else L_PC + X"0001";
<pre class="filename">
src/opc_fetch.vhd
</pre></pre>
<P>
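
<P>Because the conditions are evaluated in order, the priority of the signals can also be expressed as a pure function, which makes it easy to check in a testbench. The sketch below is hypothetical: the package <STRONG>fetch_pc_pkg</STRONG> and function <STRONG>next_pc</STRONG> are not part of the lecture sources, numeric_std's <STRONG>unsigned</STRONG> type is used, and the parameter <STRONG>wt</STRONG> stands for <STRONG>WAIT</STRONG> because <STRONG>wait</STRONG> is a reserved word in VHDL.

<pre class="vhdl">
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical package (not from the lecture sources): next-PC priority chain.
package fetch_pc_pkg is
    function next_pc(clr, wt, load_pc, long_op : std_logic;
                     pc, new_pc                : unsigned(15 downto 0))
                     return unsigned;
end package fetch_pc_pkg;

package body fetch_pc_pkg is
    function next_pc(clr, wt, load_pc, long_op : std_logic;
                     pc, new_pc                : unsigned(15 downto 0))
                     return unsigned is
    begin
        if (clr = '1') then             -- reset overrides everything
            return to_unsigned(0, 16);
        elsif (wt = '1') then           -- WAIT: keep the current PC
            return pc;
        elsif (load_pc = '1') then      -- jump requested by the execution stage
            return new_pc;
        elsif (long_op = '1') then      -- two-word opcode: skip its second word
            return pc + 2;
        else                            -- ordinary one-word opcode
            return pc + 1;
        end if;
    end function next_pc;
end package body fetch_pc_pkg;
</pre>

<P>For example, with a <STRONG>PC</STRONG> of 0x0100 and only <STRONG>LONG_OP</STRONG> set, <STRONG>next_pc</STRONG> returns 0x0102, while a set <STRONG>CLR</STRONG> returns 0 regardless of the other inputs.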

<P><br>

<P>The <STRONG>CLR</STRONG> signal, which overrides all others, resets the <STRONG>PC</STRONG> to 0. It
is generated at power on and when the reset input of the CPU is
asserted. The <STRONG>WAIT</STRONG> signal freezes the <STRONG>PC</STRONG> at its current value. It
is used when an instruction needs two <STRONG>CLK</STRONG> cycles to complete. The
<STRONG>LOAD_PC</STRONG> signal causes the <STRONG>PC</STRONG> to be loaded with the value on the
<STRONG>NEW_PC</STRONG> input. The <STRONG>LOAD_PC</STRONG> signal is driven by the execution stage
when a jump instruction is executed. If none of <STRONG>CLR</STRONG>, <STRONG>WAIT</STRONG>, or <STRONG>LOAD_PC</STRONG>
is asserted, then the <STRONG>PC</STRONG> is advanced to the next instruction. If the current
instruction is one of <STRONG>JMP</STRONG>, <STRONG>CALL</STRONG>, <STRONG>LDS</STRONG>, or <STRONG>STS</STRONG>, then it has a length
of two 16-bit words and <STRONG>LONG_OP</STRONG> is set. This causes the <STRONG>PC</STRONG> to be
incremented by 2 rather than by the normal instruction length of 1:

<P><br>

<pre class="vhdl">
-- Two word opcodes:
--
--        9       3210
-- 1001 000d dddd 0000 kkkk kkkk kkkk kkkk - LDS
-- 1001 001d dddd 0000 kkkk kkkk kkkk kkkk - STS
-- 1001 010k kkkk 110k kkkk kkkk kkkk kkkk - JMP
-- 1001 010k kkkk 111k kkkk kkkk kkkk kkkk - CALL
--
L_LONG_OP <= '1' when (((P_OPC(15 downto  9) = "1001010") and
                        (P_OPC( 3 downto  2) = "11"))           -- JMP, CALL
                    or ((P_OPC(15 downto 10) = "100100") and
                        (P_OPC( 3 downto  0) = "0000")))        -- LDS, STS
            else '0';
<pre class="filename">
src/opc_fetch.vhd
</pre></pre>
<P>
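
<P>One way to convince oneself that the <STRONG>LONG_OP</STRONG> decoding accepts exactly the four two-word instructions is to evaluate it on a few encodings. The sketch below wraps the same comparisons in a hypothetical function <STRONG>is_long_op</STRONG> (the package name is also made up) and checks it against hand-assembled opcodes whose address and register bits are all zero, for example 0x940C for <STRONG>JMP</STRONG> and 0x9000 for <STRONG>LDS</STRONG>.

<pre class="vhdl">
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical package: the LONG_OP decoding of opc_fetch.vhd as a function.
package long_op_pkg is
    function is_long_op(opc : std_logic_vector(15 downto 0)) return std_logic;
end package long_op_pkg;

package body long_op_pkg is
    function is_long_op(opc : std_logic_vector(15 downto 0)) return std_logic is
    begin
        if    ((opc(15 downto  9) = "1001010") and (opc(3 downto 2) = "11"))   -- JMP, CALL
           or ((opc(15 downto 10) = "100100")  and (opc(3 downto 0) = "0000")) -- LDS, STS
        then
            return '1';
        end if;
        return '0';
    end function is_long_op;
end package body long_op_pkg;
</pre>

<P><STRONG>RET</STRONG> (0x9508) shares bits 15..9 with <STRONG>JMP</STRONG> and <STRONG>CALL</STRONG> but is rejected because its bits 3..2 are "10"; likewise, LD r0, Z+ (0x9001) is rejected because its low nibble is not 0000.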

<P><br>

<P>The <STRONG>CLR</STRONG>, <STRONG>SKIP</STRONG>, and <STRONG>I_INTVEC</STRONG> inputs are used to force a <STRONG>NOP</STRONG>
(no operation) opcode or an "interrupt opcode" onto the output of the
opcode fetch stage. An interrupt opcode is an opcode that does not
belong to the normal instruction set of the CPU (and is therefore not
generated by assemblers or compilers), but is used internally to trigger
interrupt processing (pushing the PC, clearing the interrupt enable flag,
and jumping to a specific location) further down in the pipeline.

<P><br>

<pre class="vhdl">
L_INVALIDATE <= I_CLR or I_SKIP;

Q_OPC <= X"00000000" when (L_INVALIDATE = '1')
    else P_OPC       when (I_INTVEC(5) = '0')
    else (X"000000" & "00" & I_INTVEC);     -- "interrupt opcode"
<pre class="filename">
src/opc_fetch.vhd
</pre></pre>
<P>

<P><br>

<P><STRONG>CLR</STRONG> is derived from the reset input and also resets the program counter.
<STRONG>SKIP</STRONG> comes from the execution stage and is used to invalidate parts
of the pipeline, for example when a decision was made to take a conditional
branch. This will be explained in more detail in the lesson about branching.
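
<P>The output selection can be checked in the same way. The function below mirrors the <STRONG>Q_OPC</STRONG> assignment; its name and package are hypothetical, and its <STRONG>invalidate</STRONG> parameter stands for <STRONG>L_INVALIDATE</STRONG> (that is, <STRONG>CLR</STRONG> or <STRONG>SKIP</STRONG>). An <STRONG>INTVEC</STRONG> of "101011" (bit 5 set, interrupt number 11) yields the interrupt opcode 0x0000002B.

<pre class="vhdl">
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical package mirroring the Q_OPC selection of opc_fetch.vhd.
package fetch_opc_pkg is
    function fetch_opc(invalidate : std_logic;
                       intvec     : std_logic_vector( 5 downto 0);
                       p_opc      : std_logic_vector(31 downto 0))
                       return std_logic_vector;
end package fetch_opc_pkg;

package body fetch_opc_pkg is
    function fetch_opc(invalidate : std_logic;
                       intvec     : std_logic_vector( 5 downto 0);
                       p_opc      : std_logic_vector(31 downto 0))
                       return std_logic_vector is
    begin
        if (invalidate = '1') then
            return x"00000000";                     -- NOP
        elsif (intvec(5) = '0') then
            return p_opc;                           -- normal opcode
        else
            return x"000000" & "00" & intvec;       -- "interrupt opcode"
        end if;
    end function fetch_opc;
end package body fetch_opc_pkg;
</pre>

<P>Since invalidation is tested first, a <STRONG>CLR</STRONG> or <STRONG>SKIP</STRONG> takes precedence over a pending interrupt.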

<H2><A NAME="section_1_1">5.1 Program Memory</A></H2>

<P>The program memory is declared as follows:

<P><br>

<pre class="vhdl">
entity prog_mem is
    port (  I_CLK       : in  std_logic;

            I_WAIT      : in  std_logic;
            I_PC        : in  std_logic_vector(15 downto 0); -- word address
            I_PM_ADR    : in  std_logic_vector(11 downto 0); -- byte address

            Q_OPC       : out std_logic_vector(31 downto 0);
            Q_PC        : out std_logic_vector(15 downto 0);
            Q_PM_DOUT   : out std_logic_vector( 7 downto 0));
end prog_mem;
<pre class="filename">
src/prog_mem.vhd
</pre></pre>
<P>

<P><br>

<H3><A NAME="section_1_1_1">5.1.1 Dual Port Memory</A></H3>

<P>The program memory is a dual port memory. This means that two different
memory locations can be read or written at the same time. We don't
write to the program memory, but we would like to read two addresses
at the same time. The reason is the <STRONG>LPM</STRONG> (load program memory)
instructions. These instructions read from the program memory while
the program memory is fetching the next instructions. In a way these
instructions violate the Harvard architecture, but on the other hand
they are extremely useful for string constants in C. Rather than
initializing the (typically smaller) data memory with these constants,
one can leave them in program memory and access them using <STRONG>LPM</STRONG>
instructions.

<P>Without a dual port memory, we would have needed to stop the pipeline
during the execution of <STRONG>LPM</STRONG> instructions. Use of dual port memory
avoids this additional complexity.

<P>The second port used for <STRONG>LPM</STRONG> instructions consists of the address
input <STRONG>PM_ADR</STRONG> and the data output <STRONG>PM_DOUT</STRONG>. <STRONG>PM_ADR</STRONG> is a 12-bit
byte address (and consequently <STRONG>PM_DOUT</STRONG> is an 8-bit output). In
contrast, the other port uses an 11-bit word address.

<P>The other signals of the program memory belong to the first port,
which is used for opcode fetches.
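
<P>The lecture uses Xilinx <STRONG>RAMB4_S4_S4</STRONG> primitives for this. As a hedged, vendor-neutral illustration of what "dual port" means here, the following behavioral sketch describes a small read-only memory with two independent synchronous read ports. Everything in it is hypothetical: the entity name <STRONG>dp_rom</STRONG>, the 8-word size, and the contents. Port A is word-addressed and has an enable (like the opcode fetch port), while port B is byte-addressed (like <STRONG>PM_ADR</STRONG>) and returns the low byte of a word for an even byte address, which is the byte order the AVR <STRONG>LPM</STRONG> instruction uses.

<pre class="vhdl">
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical behavioral dual-port ROM (not the lecture's RAMB4-based design).
-- Port A: word address, for opcode fetch, holds its output when I_ENA = '0'.
-- Port B: byte address, used like the LPM port.  Contents are made up:
-- the byte at byte address n holds the value 16#11# * n.
entity dp_rom is
    port (  I_CLK    : in  std_logic;
            I_ENA    : in  std_logic;
            I_ADDR_A : in  unsigned(2 downto 0);
            I_ADDR_B : in  unsigned(3 downto 0);
            Q_DOA    : out std_logic_vector(15 downto 0);
            Q_DOB    : out std_logic_vector( 7 downto 0));
end dp_rom;

architecture behavioral of dp_rom is
    type rom_t is array (0 to 7) of std_logic_vector(15 downto 0);
    constant ROM : rom_t := (x"1100", x"3322", x"5544", x"7766",
                             x"9988", x"BBAA", x"DDCC", x"FFEE");
begin
    rd: process(I_CLK)
        variable word : std_logic_vector(15 downto 0);
    begin
        if (rising_edge(I_CLK)) then
            if (I_ENA = '1') then                           -- port A: synchronous read
                Q_DOA <= ROM(to_integer(I_ADDR_A));
            end if;
            word := ROM(to_integer(I_ADDR_B(3 downto 1)));  -- port B: select the word
            if (I_ADDR_B(0) = '0') then                     -- then the byte (low first)
                Q_DOB <= word( 7 downto 0);
            else
                Q_DOB <= word(15 downto 8);
            end if;
        end if;
    end process;
end behavioral;
</pre>

<P>Many synthesis tools can map such a description onto block RAM, but whether they do depends on the tool and the device, so this sketch is not a drop-in replacement for the instantiated primitives.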

<H3><A NAME="section_1_1_2">5.1.2 Look-ahead for two word instructions</A></H3>

<P>The vast majority of AVR instructions are single-word (16-bit) instructions.
There are 4 exceptions, which are <STRONG>CALL</STRONG>, <STRONG>JMP</STRONG>, <STRONG>LDS</STRONG>, and <STRONG>STS</STRONG>. These
instructions carry an address in the word following the opcode: the target
address for <STRONG>CALL</STRONG> and <STRONG>JMP</STRONG>, and the data memory address for
<STRONG>LDS</STRONG> and <STRONG>STS</STRONG>.

<P>There are two ways to handle such opcodes. One way is to look back in the
pipeline when the second word is needed. When one of these instructions
reaches the execution stage, the following word is in the decoding
stage (so we could fetch it from there). This approach can lead to complications,
however, when it comes to invalidating the pipeline, inserting interrupts,
and the like.

<P>The other way, and the one we choose, is to divide the program memory into
an even memory (holding the words at even addresses) and an odd memory
(holding the words at odd addresses). The internal memory modules in an FPGA
are small anyway, and therefore using two memories is almost as simple as using
one (both would consist of a number of smaller modules).

<P>There are two cases to consider: (1) an even <STRONG>PC</STRONG> (shown on the left of the
following figure) and (2) an odd <STRONG>PC</STRONG> (shown on the right). In both cases
we want the word at address <STRONG>PC</STRONG> of the (combined) memory to appear in the lower word
of the <STRONG>OPC</STRONG> output and the next word (at <STRONG>PC</STRONG>+1) in the upper word of <STRONG>OPC</STRONG>.

<P><br>

<P><img src="opcode_fetch_2.png">

<P><br>

<P>We observe the following:

<UL>
<LI>the odd memory address is <STRONG>PC[10:1]</STRONG> in both cases.
<LI>the even memory address is <STRONG>PC[10:1]</STRONG> + <STRONG>PC[0]</STRONG> in both cases.
<LI>the data outputs of the two memories are either straight or crossed,
depending (only) on <STRONG>PC[0]</STRONG>.
</UL>
<P>In VHDL, we express this as follows:

<P><br>

<pre class="vhdl">
L_PC_O <= I_PC(10 downto 1);
L_PC_E <= I_PC(10 downto 1) + ("000000000" & I_PC(0));
Q_OPC(15 downto  0) <= M_OPC_E when L_PC_0 = '0' else M_OPC_O;
Q_OPC(31 downto 16) <= M_OPC_E when L_PC_0 = '1' else M_OPC_O;
<pre class="filename">
src/prog_mem.vhd
</pre></pre>
<P>

<P><br>

<P>Because the memories are synchronous, their outputs belong to the
<STRONG>PC</STRONG> and <STRONG>PM_ADR</STRONG> of the previous cycle. The output multiplexers must
therefore use the previous <STRONG>PC</STRONG> and <STRONG>PM_ADR</STRONG> as well, so we remember the
lower bit(s) in signals <STRONG>PC_0</STRONG> and <STRONG>PM_ADR_1_0</STRONG>:

<P><br>

<pre class="vhdl">
pc0: process(I_CLK)
begin
    if (rising_edge(I_CLK)) then
        Q_PC <= I_PC;
        L_PM_ADR_1_0 <= I_PM_ADR(1 downto 0);
        if (I_WAIT = '0') then
            L_PC_0 <= I_PC(0);
        end if;
    end if;
end process;
<pre class="filename">
src/prog_mem.vhd
</pre></pre>
<P>
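
<P>The observations above can be checked exhaustively in simulation. The sketch below is hypothetical (the package <STRONG>lookahead_pkg</STRONG>, the function <STRONG>lookahead</STRONG>, and the 16-word size are made up): it applies the same address equations to an even and an odd half and selects straight or crossed outputs. Unlike the real design it is purely combinational, so it uses the current <STRONG>PC[0]</STRONG> directly instead of the registered <STRONG>L_PC_0</STRONG>.

<pre class="vhdl">
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical model of the even/odd program memory split (combinational,
-- i.e. without the one-cycle delay of the real synchronous memories).
package lookahead_pkg is
    constant WORDS : natural := 16;                 -- combined size in words
    type half_t is array (0 to WORDS/2 - 1) of std_logic_vector(15 downto 0);
    function lookahead(even_mem, odd_mem : half_t;
                       pc                : unsigned(3 downto 0))
                       return std_logic_vector;
end package lookahead_pkg;

package body lookahead_pkg is
    function lookahead(even_mem, odd_mem : half_t;
                       pc                : unsigned(3 downto 0))
                       return std_logic_vector is
        variable pc_o, pc_e : natural;
        variable m_e, m_o   : std_logic_vector(15 downto 0);
    begin
        pc_o := to_integer(pc(3 downto 1));                             -- PC[3:1]
        pc_e := (pc_o + to_integer(pc(0 downto 0))) mod (WORDS/2);      -- PC[3:1] + PC[0], wrapping
        m_e  := even_mem(pc_e);
        m_o  := odd_mem(pc_o);
        if (pc(0) = '0') then
            return m_o & m_e;       -- straight: word PC+1 & word PC
        else
            return m_e & m_o;       -- crossed:  word PC+1 & word PC
        end if;
    end function lookahead;
end package body lookahead_pkg;
</pre>

<P>For example, for a <STRONG>PC</STRONG> of 3 the odd half is read at address 1 (word 3) and the even half at address 2 (word 4), and the crossed outputs place word 3 in the lower and word 4 in the upper 16 bits of the result.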

<P><br>

<P>The split into two memories makes the entire program memory 32-bit
wide. Note that the <STRONG>PC</STRONG> is a word address, while <STRONG>PM_ADR</STRONG> is a byte address.
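
<P>This lesson does not show how the 8-bit <STRONG>Q_PM_DOUT</STRONG> is selected from the 32 bits read through the second port. A plausible sketch assumes that <STRONG>M_PMD_E</STRONG> and a corresponding odd-half signal (called <STRONG>M_PMD_O</STRONG> here, an assumption) are the 16-bit port B outputs of the two halves, and that bytes are ordered as for the AVR <STRONG>LPM</STRONG> instruction (byte address 2w is the low byte of word w). The registered <STRONG>PM_ADR_1_0</STRONG> then selects the byte: bit 1 picks the even or odd word, bit 0 the low or high byte. The package and function names are hypothetical.

<pre class="vhdl">
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical byte selection for the LPM port (assumes AVR byte order:
-- byte address 2w is the low byte of word w).
package pm_byte_pkg is
    function pm_byte(pm_adr_1_0       : std_logic_vector( 1 downto 0);
                     m_pmd_e, m_pmd_o : std_logic_vector(15 downto 0))
                     return std_logic_vector;
end package pm_byte_pkg;

package body pm_byte_pkg is
    function pm_byte(pm_adr_1_0       : std_logic_vector( 1 downto 0);
                     m_pmd_e, m_pmd_o : std_logic_vector(15 downto 0))
                     return std_logic_vector is
    begin
        if    (pm_adr_1_0 = "00") then return m_pmd_e( 7 downto 0);  -- even word, low byte
        elsif (pm_adr_1_0 = "01") then return m_pmd_e(15 downto 8);  -- even word, high byte
        elsif (pm_adr_1_0 = "10") then return m_pmd_o( 7 downto 0);  -- odd word, low byte
        else                           return m_pmd_o(15 downto 8);  -- odd word, high byte
        end if;
    end function pm_byte;
end package body pm_byte_pkg;
</pre>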

<H3><A NAME="section_1_1_3">5.1.3 Memory block instantiation and initialization</A></H3>

<P>The entire program memory consists of 8 memory modules, four
for the even half (components <STRONG>pe_0</STRONG>, <STRONG>pe_1</STRONG>, <STRONG>pe_2</STRONG>, and <STRONG>pe_3</STRONG>) and
four for the odd half (<STRONG>po_0</STRONG>, <STRONG>po_1</STRONG>, <STRONG>po_2</STRONG>, and <STRONG>po_3</STRONG>).
Each module provides 4 of the 16 data bits of its half.

<P>We explain the first module in detail:

<P><br>

<pre class="vhdl">
pe_0 : RAMB4_S4_S4 ---------------------------------------------------------
generic map(INIT_00 => pe_0_00, INIT_01 => pe_0_01, INIT_02 => pe_0_02,
            INIT_03 => pe_0_03, INIT_04 => pe_0_04, INIT_05 => pe_0_05,
            INIT_06 => pe_0_06, INIT_07 => pe_0_07, INIT_08 => pe_0_08,
            INIT_09 => pe_0_09, INIT_0A => pe_0_0A, INIT_0B => pe_0_0B,
            INIT_0C => pe_0_0C, INIT_0D => pe_0_0D, INIT_0E => pe_0_0E,
            INIT_0F => pe_0_0F)
port map(ADDRA => L_PC_E,                   ADDRB => I_PM_ADR(11 downto 2),
         CLKA  => I_CLK,                    CLKB  => I_CLK,
         DIA   => "0000",                   DIB   => "0000",
         ENA   => L_WAIT_N,                 ENB   => '1',
         RSTA  => '0',                      RSTB  => '0',
         WEA   => '0',                      WEB   => '0',
         DOA   => M_OPC_E(3 downto 0),      DOB   => M_PMD_E(3 downto 0));
<pre class="filename">
src/prog_mem.vhd
</pre></pre>
<P>

<P><br>

<P>The first line instantiates a module of type <STRONG>RAMB4_S4_S4</STRONG>, which
is a dual-port memory module with two 4-bit ports. For a Xilinx
FPGA you can use these modules directly by uncommenting the
use of the UNISIM library. For functional simulation we have provided
a <STRONG>RAMB4_S4_S4.vhd</STRONG> component in the test directory. This component
emulates the real <STRONG>RAMB4_S4_S4</STRONG> as closely as needed.

<P>The next lines define the content of each memory module by means of
a generic map. The elements of the generic map (like <STRONG>pe_0_00</STRONG>, <STRONG>pe_0_01</STRONG>, and
so forth) define the initial memory content of the instantiated module.
The constants <STRONG>pe_0_00</STRONG>, <STRONG>pe_0_01</STRONG>, ... are themselves defined in <STRONG>prog_mem_content.vhd</STRONG>,
which is made visible in the library section:

<P><br>

<pre class="vhdl">
use work.prog_mem_content.all;
<pre class="filename">
src/prog_mem.vhd
</pre></pre>
<P>

<P><br>

<P>The process from a C (or C++) source file <STRONG>hello.c</STRONG> to the final
FPGA is then:

<UL>
<LI>write, compile, and link <STRONG>hello.c</STRONG> (produces <STRONG>hello.hex</STRONG>).
<LI>generate <STRONG>prog_mem_content.vhd</STRONG> from <STRONG>hello.hex</STRONG> (by means of the tool
<STRONG>make_mem</STRONG>, which is provided with this lecture).
<LI>simulate, synthesize and implement the design.
<LI>create a bitstream file.
<LI>flash the FPGA (or serial PROM).
</UL>
<P>There are other ways of initializing the memory modules, such as
updating sections of the bitstream file, but we found the above sequence
easier to use.

<P>The generic map is followed by the port map of the memory module.
The two addresses <STRONG>ADDRA</STRONG> and <STRONG>ADDRB</STRONG> of the two ports come from
the <STRONG>PC</STRONG> and <STRONG>PM_ADR</STRONG> inputs as already described.

<P>Both ports are clocked from <STRONG>CLK</STRONG>. Since the program memory is read-only,
the <STRONG>DIA</STRONG> and <STRONG>DIB</STRONG> inputs are not used (set to 0000) and <STRONG>WEA</STRONG> and <STRONG>WEB</STRONG>
are 0. <STRONG>RSTA</STRONG> and <STRONG>RSTB</STRONG> are not used either and are set to 0.
<STRONG>ENA</STRONG> is used for keeping the <STRONG>OPC</STRONG> when the pipeline is stopped, while
<STRONG>ENB</STRONG> is permanently enabled ('1'). The memory outputs <STRONG>DOA</STRONG> and <STRONG>DOB</STRONG> go to the output
multiplexers of the two ports.

<H3><A NAME="section_1_1_4">5.1.4 Delayed PC</A></H3>

<P><STRONG>Q_PC</STRONG> is <STRONG>I_PC</STRONG> delayed by one clock. The program memory is a synchronous
memory, which has the consequence that the program memory output <STRONG>OPC</STRONG>
for a given <STRONG>I_PC</STRONG> is always one clock cycle behind, as shown on the left
of the figure below.

<P><br>

<P><img src="opcode_fetch_1.png">

<P><br>

<P>By delaying <STRONG>I_PC</STRONG> by one clock, we re-align <STRONG>Q_PC</STRONG> and <STRONG>OPC</STRONG> as shown on the right:

<P><br>

<pre class="vhdl">
Q_PC <= I_PC;
<pre class="filename">
src/prog_mem.vhd
</pre></pre>
<P>

<P><br>

<H2><A NAME="section_1_2">5.2 Two Cycle Opcodes</A></H2>

<P>The vast majority of instructions execute in one cycle. Some need
two cycles, for example because they involve reading a synchronous memory.
For these opcodes the <STRONG>WAIT</STRONG> signal is generated on the first cycle:

<P><br>

<pre class="vhdl">
-- Two cycle opcodes:
--
-- 1001 000d dddd .... - LDS etc.
-- 1001 0101 0000 1000 - RET
-- 1001 0101 0001 1000 - RETI
-- 1001 1001 AAAA Abbb - SBIC
-- 1001 1011 AAAA Abbb - SBIS
-- 1111 110r rrrr 0bbb - SBRC
-- 1111 111r rrrr 0bbb - SBRS
--
L_WAIT <= '0'  when (L_INVALIDATE = '1')
     else '0'  when (I_INTVEC(5)  = '1')
     else L_T0 when ((P_OPC(15 downto  9) = "1001000" )     -- LDS etc.
                  or (P_OPC(15 downto  8) = "10010101")     -- RET etc.
                  or ((P_OPC(15 downto 10) = "100110")      -- SBIC, SBIS
                      and P_OPC(8) = '1')
                  or (P_OPC(15 downto 10) = "111111"))      -- SBRC, SBRS
     else '0';
<pre class="filename">
src/opc_fetch.vhd
</pre></pre>
<P>
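
<P>To see exactly which opcodes the <STRONG>WAIT</STRONG> logic treats as two-cycle opcodes, the opcode part of the condition can be lifted into a function. The sketch below is hypothetical (package and function names are made up) and leaves out the <STRONG>L_INVALIDATE</STRONG>, <STRONG>INTVEC</STRONG>, and <STRONG>T0</STRONG> terms. Note that the "RET etc." pattern matches every opcode whose upper byte is 0x95, and the "LDS etc." pattern matches every opcode starting with 1001 000, not only <STRONG>LDS</STRONG>.

<pre class="vhdl">
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical package: the opcode part of the L_WAIT condition.
package two_cycle_pkg is
    function is_two_cycle(opc : std_logic_vector(15 downto 0)) return boolean;
end package two_cycle_pkg;

package body two_cycle_pkg is
    function is_two_cycle(opc : std_logic_vector(15 downto 0)) return boolean is
    begin
        return (opc(15 downto  9) = "1001000")                      -- LDS etc.
            or (opc(15 downto  8) = "10010101")                     -- RET etc.
            or ((opc(15 downto 10) = "100110") and opc(8) = '1')    -- SBIC, SBIS
            or (opc(15 downto 10) = "111111");                      -- SBRC, SBRS
    end function is_two_cycle;
end package body two_cycle_pkg;
</pre>

<P>For these opcodes <STRONG>L_WAIT</STRONG> equals <STRONG>L_T0</STRONG>, so <STRONG>WAIT</STRONG> is raised only in the first cycle; in the second cycle <STRONG>T0</STRONG> is '0', <STRONG>WAIT</STRONG> drops, and the <STRONG>PC</STRONG> advances.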

<P><br>

<H2><A NAME="section_1_3">5.3 Interrupts</A></H2>

<P>The opcode fetch stage is also responsible for part of the interrupt
handling. Interrupts are generated in the I/O block by setting
<STRONG>INTVEC</STRONG> to a value with the highest bit set:

<P><br>

<pre class="vhdl">
if (L_RX_INT_ENABLED and U_RX_READY) = '1' then
    if (L_INTVEC(5) = '0') then     -- no interrupt pending
        L_INTVEC <= "101011";       -- _VECTOR(11)
    end if;
elsif (L_TX_INT_ENABLED and not U_TX_BUSY) = '1' then
    if (L_INTVEC(5) = '0') then     -- no interrupt pending
        L_INTVEC <= "101100";       -- _VECTOR(12)
    end if;
<pre class="filename">
src/io.vhd
</pre></pre>
<P>

<P><br>

<P>The highest bit of <STRONG>INTVEC</STRONG> indicates that the lower bits contain a
valid interrupt number. <STRONG>INTVEC</STRONG> proceeds to the CPU core, where
the upper bit is <STRONG>and</STRONG>'ed with the global interrupt enable bit (in the
status register):

<P><br>

<pre class="vhdl">
L_INTVEC_5 <= I_INTVEC(5) and R_INT_ENA;
<pre class="filename">
src/cpu_core.vhd
</pre></pre>
<P>

<P><br>

<P>The (possibly modified) <STRONG>INTVEC</STRONG> then proceeds to the opcode fetch stage.
If the global interrupt enable bit was set, then the next valid
opcode is replaced by an "interrupt opcode":

<P><br>

<pre class="vhdl">
Q_OPC <= X"00000000" when (L_INVALIDATE = '1')
    else P_OPC       when (I_INTVEC(5) = '0')
    else (X"000000" & "00" & I_INTVEC);     -- "interrupt opcode"
<pre class="filename">
src/opc_fetch.vhd
</pre></pre>
<P>

<P><br>

<P>The interrupt opcode uses a gap after the <STRONG>NOP</STRONG> instruction in the opcode
set of the AVR CPU. When the interrupt opcode reaches the execution
stage, it causes a branch to the location determined by the lower
bits of <STRONG>INTVEC</STRONG>, pushes the program counter, and clears the interrupt
enable bit. This happens a few clock cycles later. In the meantime
the opcode fetch stage keeps inserting interrupt instructions into
the pipeline. These additional interrupt instructions are
invalidated by the execution stage when the first interrupt instruction
reaches the execution stage.
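
<P>With <STRONG>INTVEC</STRONG> being 6 bits wide and bit 5 set, the lower 16 bits of an interrupt opcode lie in the range 0x0020 to 0x003F (the upper 16 bits are zero). This range is part of the gap between <STRONG>NOP</STRONG> (0x0000) and <STRONG>MOVW</STRONG> (0x01xx) that the AVR instruction set leaves unused. A later stage could therefore recognize interrupt opcodes and recover the interrupt number with a simple comparison, as in the hedged sketch below; the package and function names are hypothetical, and the lecture's execution stage may decode these opcodes differently.

<pre class="vhdl">
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical decoder for the interrupt opcode 0000 0000 001v vvvv.
package int_opc_pkg is
    function is_int_opc(opc : std_logic_vector(15 downto 0)) return boolean;
    function int_vector(opc : std_logic_vector(15 downto 0)) return natural;
end package int_opc_pkg;

package body int_opc_pkg is
    function is_int_opc(opc : std_logic_vector(15 downto 0)) return boolean is
    begin
        return opc(15 downto 5) = "00000000001";       -- bit 5 set, bits 15..6 zero
    end function is_int_opc;

    function int_vector(opc : std_logic_vector(15 downto 0)) return natural is
    begin
        return to_integer(unsigned(opc(4 downto 0)));  -- interrupt number
    end function int_vector;
end package body int_opc_pkg;
</pre>

<P>For the two vectors generated by the I/O block above, this yields interrupt numbers 11 (opcode 0x002B) and 12 (opcode 0x002C).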

<P><hr><BR>
<table class="ttop"><th class="tpre"><a href="04_Cpu_Core.html">Previous Lesson</a></th><th class="ttop"><a href="toc.html">Table of Contents</a></th><th class="tnxt"><a href="06_Data_Path.html">Next Lesson</a></th></table>
</BODY>
</HTML>