/*!
\defgroup pavr_intro Introduction
\par Goal
This project implements an \b 8 \b bit \b controller that is compatible with
Atmel's \ref pavr_avrarch "AVR architecture", using \b VHDL (VHSIC Hardware
Description Language, where VHSIC stands for Very High Speed Integrated
Circuits). \n
The device built here is not a specific controller of the AVR family, but rather
a maximally featured AVR controller. It is configurable enough to simulate most
AVR family controllers. \n
\b The \b goal is to obtain an AVR processor that is as powerful as possible (in
terms of MIPS), with a work budget of about 6 man-months. \n
\n
\par Approach
Among the 8 bit controllers on the market (as of 2002), Atmel's AVR core is
reasonably fast. Most instructions take one clock, and the instruction set is
(almost) RISC. In real-life applications, the average number of clocks per
instruction (CPI) is typically 1.2...1.7, depending on the application; CPI = 1.4
is a good average. The core has a short, 2-stage pipeline (fetch and execute).
In Atmel's 0.5 um technology, the core runs at 10...15 MHz. \n
\n
From the start, ways were sought to improve on the original core's
performance. \n
Since the original core already executes most instructions in one clock, two
ideas quickly come to mind: a deeper pipeline, and issuing more than one
instruction per clock (multi-issue). \n
A deeper pipeline is relatively straightforward. A 5 or 6 stage pipeline is
expected to increase the clock speed by about 3...4x. However, the resulting
average CPI is expected to be slightly higher than the original's, mainly
because of jumps, branches, calls and returns. These require the pipeline to
be flushed, at least partially, so some clocks are lost while the pipeline
refills. \n
The multi-issue approach was quickly rejected. The available time budget is
too small to implement both a deep pipeline and multi-issue, and multi-issue
without a deeper pipeline would not make much sense. \n
\n
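The trade-off above can be quantified with the relation MIPS = f_clk / CPI (f_clk in MHz). The following Python sketch (the helper name is hypothetical) compares the original core (15 MHz, CPI = 1.4, figures from the text) with a hypothetical deeper pipeline running at 3x the clock but with a worse average CPI, assumed here to be 1.7:

```python
def mips(f_clk_mhz: float, cpi: float) -> float:
    """Millions of instructions per second for a clock (in MHz) and an average CPI."""
    return f_clk_mhz / cpi


# Original 2-stage core: 15 MHz, typical CPI of 1.4 (figures from the text).
atmel = mips(15.0, 1.4)

# Hypothetical deeper pipeline: 3x the clock, but a worse average CPI (1.7 is
# assumed) because jumps, branches, calls and returns partially flush the pipe.
deep = mips(3 * 15.0, 1.7)

speedup = deep / atmel
print(f"original: {atmel:.1f} MIPS, deeper pipeline: {deep:.1f} MIPS, speedup {speedup:.2f}x")
```

Even with the higher CPI, the deeper pipeline comes out roughly 2.5x faster in this illustration (3 x 1.4 / 1.7 = 2.47); the same formula gives about 29 MIPS for 50 MHz at CPI = 1.7.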
\par Result
pAVR is a \b parameterizable and \b synthesizable AVR-compatible VHDL design
that has: \n
- 6 pipeline stages
- 1 instruction/clock for most instructions
- estimated clock frequency: \b ~50 \b MHz in \b 0.5 \b um technology, assuming
  that Atmel's core runs at 15 MHz in 0.5 um. \n
  That is about 3x the clock frequency of Atmel's original core.
- estimated MIPS at 50 MHz: \b 28 \b MIPS (typical), \b 50 \b MIPS (peak). \n
  That is about 3x Atmel's original core performance: at 15 MHz, Atmel's core
  delivers 10 MIPS typical and 15 MIPS peak.
- CPI: 1.7 clocks/instruction (typical), 1 clock/instruction (peak). \n
  Per clock, that is 0.75x (typical) and 1.00x (peak) Atmel's original core
  performance.
- The pAVR architecture is more computation-friendly than control-friendly.
  \ref pavr_pipeline_jumps "Jumps", \ref pavr_pipeline_branches "branches",
  \ref pavr_pipeline_skips "skips", \ref pavr_pipeline_calls "calls" and
  \ref pavr_pipeline_returns "returns" are relatively expensive in terms of
  clocks. A branch prediction scheme and a smarter return procedure could be
  considered as upgrades.

\n
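A common way to implement the branch prediction upgrade mentioned above is a table of 2 bit saturating counters indexed by a hash of the branch address (the todo list in the pipeline structure section proposes exactly this combination). The Python sketch below is illustrative only: the table size, the hash (simply the low address bits) and all names are assumptions, not part of pAVR.

```python
class TwoBitPredictor:
    """Table of 2 bit saturating counters, indexed by a hashed branch address.

    Counter states: 0, 1 = predict not taken; 2, 3 = predict taken.
    """

    def __init__(self, index_bits: int = 6):
        self.mask = (1 << index_bits) - 1
        self.table = [1] * (1 << index_bits)  # start weakly not taken

    def _index(self, pc: int) -> int:
        # Simplest possible hash: the low bits of the word address.
        return pc & self.mask

    def predict(self, pc: int) -> bool:
        return self.table[self._index(pc)] >= 2

    def update(self, pc: int, taken: bool) -> None:
        i = self._index(pc)
        if taken:
            self.table[i] = min(3, self.table[i] + 1)
        else:
            self.table[i] = max(0, self.table[i] - 1)


# A loop-closing branch: taken 9 times, then falls through once.
bp = TwoBitPredictor()
outcomes = [True] * 9 + [False]
hits = 0
for taken in outcomes:
    hits += int(bp.predict(0x0123) == taken)
    bp.update(0x0123, taken)
print(f"{hits}/{len(outcomes)} correct")
```

The 2 bit hysteresis is what makes this work for loops: the single not-taken outcome at loop exit only weakens the counter, so the next entry into the loop is still predicted taken.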
The \ref pavr_src "sources" are \b modular. They follow a set of common-sense
\ref pavr_src_conv "conventions" (process splitting strategy, signal naming,
etc.). As a result, pAVR is a fairly easily \b maintainable design. \n
Extensive \ref pavr_test "testing" was carried out. \n
pAVR is to be synthesized and programmed into an \ref pavr_fpga "FPGA". \n
\n
\par Project structure
This project is distributed in two forms: \b release and \b devel (development). \n
\n
The \b devel distribution contains:
- pAVR documentation
- VHDL sources for pAVR and associated VHDL tests
- test programs
- some utilities (preprocessor, some useful scripts)

In short, the devel distribution contains everything needed to develop this
project further. As a side note, this project was developed under Windows XP;
however, all the main software tools used here have Linux counterparts
(Doxygen, VHDL simulator, C compiler, TCL interpreter, text editor). \n
The documentation is generated with Doxygen. If you are not familiar with this
wonderful tool, please see www.doxygen.org. \n
The "doc" directory contains the documentation sources, along with scripts for
compiling the documentation, cleaning it up, and running (viewing) it. \n
The compilation result (HTML) is placed in the "doc/html" folder. The HTML
documentation is further compiled into a .CHM file (Microsoft Compiled HTML
Help, a compressed HTML format) that is placed in the "doc/chm" folder. CHM is
a very convenient file format: it provides almost all the features of HTML, it
is very small thanks to compression, and it is very handy (a single file
instead of a bunch of files and folders). However, this file format is still
Windows-bound: at the time of writing, there were neither compilers nor viewers
for Linux (but things might change soon...). \n
The "src" folder contains the pAVR VHDL sources, the VHDL tests and some
ModelSim macro files. \n
The "test" folder contains the test programs (ASM and ANSI C) used to test
pAVR. \n
The "tools" folder contains some utilities. The most important one is a text
preprocessor. XML-like tags are placed in the VHDL sources as comments; the
preprocessor parses these sources and interprets the tags. For example, some
tags isolate non-synthesizable code, so that it can easily be removed when
synthesizing pAVR. The preprocessor is also used to insert a common header
into all VHDL sources. \n
The "tools" folder also contains some scripts that build the devel or release
packages. \n
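The tag-stripping pass can be illustrated with a short Python sketch. The actual tag names used in the pAVR sources are not given here, so the non_synth tags below are hypothetical, as is the function name; the sketch only shows the idea of dropping VHDL lines enclosed between two comment-embedded, XML-like tags while leaving all other lines untouched.

```python
import re

# Hypothetical tags, written as VHDL comments on lines of their own.
OPEN_TAG = re.compile(r"^\s*--\s*<non_synth>\s*$")
CLOSE_TAG = re.compile(r"^\s*--\s*</non_synth>\s*$")


def strip_non_synth(vhdl: str) -> str:
    """Remove tagged non-synthesizable regions (tags included) from VHDL text."""
    out, inside = [], False
    for line in vhdl.splitlines():
        if OPEN_TAG.match(line):
            if inside:
                raise ValueError("nested non_synth tags")
            inside = True
        elif CLOSE_TAG.match(line):
            if not inside:
                raise ValueError("unmatched closing tag")
            inside = False
        elif not inside:
            out.append(line)
    if inside:
        raise ValueError("unterminated non_synth region")
    return "\n".join(out)


src = """signal a : std_logic;
-- <non_synth>
assert a = '0' report "debug only" severity note;
-- </non_synth>
signal b : std_logic;"""
print(strip_non_synth(src))
```

Keeping the tags inside VHDL comments means the tagged sources remain valid VHDL for simulation, while the stripped output is what goes to synthesis.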
\n
The \b release distribution contains only the documentation. However, all the
VHDL sources are embedded in the documentation, and are thus easily
accessible. \n
The release distribution comes in two flavors: HTML or CHM. My favorite is CHM,
because it is much more compact. However, HTML is still needed to view the
documentation under Linux. \n
\n
This project contains a few sub-projects that must be edited/compiled/run
independently (for example, generating the documentation, or compiling test
sources). For this purpose, I use a TCL console with stdin/stdout/stderr, and
a few buttons: edit/compile/run/clean. Each button launches a script with the
same name as the button, placed in the same folder as the console script. The
scripts' stdout/stderr are captured on the TCL console. I use this
"project manager" (the TCL console) in exactly the same way for, say, compiling
a C source or generating Doxygen documentation.
\n
\n
\n
*/


/*!
\defgroup pavr_avrarch AVR architecture
\par AVR features
- Load-store \ref pavr_avris "RISC" machine
- Harvard architecture, with separate program/data buses
- 2-stage pipeline: fetch and execute
- Most instructions execute in 1 clock.
- Variable instruction word width: 16 or 32 bits. Most instructions are 16 bits
  wide.
- Register File (RF) with 32 registers
- IO File (IOF) with 64 registers
- Loads and stores operate in the Unified Memory space. \n
  The Unified Memory (UM) is the space formed by concatenating the RF, IOF and
  Data Memory (DM), in this order. Thus, the RF begins at address 0 in the UM,
  the IOF at address 32 and the DM at address 96.
- Register File mapped pointer registers X, Y, Z, 16 bits each, for indirectly
  addressing the Data Memory and the Program Memory (PM). \n
  Pointer registers have pre-decrement and post-increment capabilities.

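The Unified Memory map above can be written as a small address decoder. In pAVR this decision is taken at run time by the DACU for every indirect load and store; the Python sketch below (hypothetical names) only illustrates the mapping. It returns the physical entity and an offset within it; whether pAVR's Data Memory is indexed by the raw UM address or by such an offset is an assumption of the sketch.

```python
from typing import Tuple

RF_SIZE = 32                    # Register File: UM addresses 0..31
IOF_SIZE = 64                   # IO File: UM addresses 32..95
IOF_BASE = RF_SIZE              # 32
DM_BASE = RF_SIZE + IOF_SIZE    # 96


def um_decode(addr: int) -> Tuple[str, int]:
    """Map a 16 bit Unified Memory address to (entity, offset inside the entity)."""
    if not 0 <= addr <= 0xFFFF:
        raise ValueError("UM addresses are 16 bits wide")
    if addr < IOF_BASE:
        return ("RF", addr)
    if addr < DM_BASE:
        return ("IOF", addr - IOF_BASE)
    return ("DM", addr - DM_BASE)  # assumed: DM indexed from its own base


for a in (0, 26, 32, 0x5F, 96, 0x0100):
    print(hex(a), um_decode(a))
```

Because the entity is selected by a data value (the pointer contents) rather than by the opcode, the same load instruction may need the RF, the IOF or the DM, which is exactly the pipelining problem discussed in the notes on AVR downsides.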
\todo
Add some AVR kernel schematics. \n
Add some AVR general considerations.

\par Notes on AVR downsides
Among 8 bit microcontrollers, the AVR architecture is relatively clean and
fast. Of course, it is not perfect. In the following, I will expand on some of
its drawbacks. \n
\n
Pipeline-friendliness issues:
- \b The \b Register \b File, \b IO \b File \b and \b Data \b Memory \b have
  \b a \b unified \b addressing \b space. \n
  From the point of view of the AVR instruction set, the Register File, IO File
  and Data Memory are very different entities. The obvious decision is to
  implement them physically as separate memory-like entities. Pipelining such a
  structure is straightforward, and a simple, fast pipeline can be built
  naturally: every memory-like entity can be assigned a fixed pipe stage during
  which it is accessed for reading or writing, with no more than one such
  elementary operation needed by any instruction. \n
  \b However, the AVR architecture has a unified addressing space for the
  Register File, IO File and Data Memory. This Unified Memory space can be
  accessed through indirect loads and stores, via dedicated pointer registers.
  Depending on the contents of the pointer register, the access goes to the
  Register File, the IO File or the Data Memory. This completely breaks the
  simple pipeline structure above, because instruction execution becomes
  \b data \b driven. For example, the Register File must now be accessed, say
  for reading, in more than one pipe stage. This is highly
  pipeline-destructive, because different instructions compete for the same
  hardware resources. \n
  Arbitration/stall schemes are required, and new data hazards must be dealt
  with. All of these are fairly complex, and come at a cost in both power
  consumption and speed. \n
  The unified address space does bring new addressing capabilities. However,
  they are unnatural and basically useless. Who would ever place the stack in
  the Register File or in the IO File? That would make some sense for low-end
  controllers that have no Data Memory at all and rely on a Register File
  mapped stack. However, the price paid for that is high. \n
  As a result, pAVR's loads and stores take 2 cycles. If the pointer registers
  could point only into the Data Memory space, loads and stores would naturally
  have taken a single clock.
- \b The \b Register \b File/IO \b File \b operand \b addresses \b don't
  \b have \b fixed \b positions \b in \b the \b instruction \b opcodes. \n
  Fixed operand positions would have allowed reducing the number of pipe stages
  from 6 to 5. As a result, a lower CPI would have been obtained, because of
  smaller cycle penalties on the instructions that modify the instruction flow
  (branches, jumps, calls etc.). That would also have meant lower power
  consumption, because of fewer registers and less combinational logic.
- \b The \b instruction \b has \b variable \b width: \b 16 \b or \b 32 \b bits. \n
  That is not pipeline-friendly. \n
  Each 32 bit instruction could easily have been replaced by two 16 bit
  instructions.

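The last two points can be illustrated with a few real AVR encodings, as given in Atmel's instruction set manual (the Python function names are hypothetical). The destination register field changes width and offset from one opcode to another, and the fetch logic must recognize the four 32 bit instructions (JMP, CALL, LDS and STS) by their first word before it knows where the next instruction starts:

```python
def is_32bit(op: int) -> bool:
    """True if op is the first word of a 32 bit instruction (JMP, CALL, LDS, STS)."""
    return ((op & 0xFE0E) in (0x940C, 0x940E)        # JMP, CALL: 1001 010k kkkk 11xk
            or (op & 0xFE0F) in (0x9000, 0x9200))    # LDS, STS: 1001 00xd dddd 0000


def dest_reg(op: int) -> int:
    """Destination register of a few opcodes, showing how the Rd field moves around."""
    if (op & 0xFC00) == 0x0C00:                # ADD Rd, Rr: 0000 11rd dddd rrrr
        return (op >> 4) & 0x1F                # 5 bit field: R0..R31
    if (op & 0xF000) == 0xE000:                # LDI Rd, K: 1110 KKKK dddd KKKK
        return 16 + ((op >> 4) & 0x0F)         # 4 bit field: R16..R31 only
    if (op & 0xFF00) == 0x9600:                # ADIW Rd+1:Rd, K: 1001 0110 KKdd KKKK
        return 24 + 2 * ((op >> 4) & 0x03)     # 2 bit field: R24, R26, R28, R30
    if (op & 0xFF00) == 0x0100:                # MOVW: 0000 0001 dddd rrrr
        return 2 * ((op >> 4) & 0x0F)          # 4 bit field: even registers only
    raise NotImplementedError(f"opcode {op:#06x} is not covered by this sketch")


# ADD R1,R2; LDI R16,0xFF; JMP 0 (two words); ADIW R30,1
program = [0x0C12, 0xEF0F, 0x940C, 0x0000, 0x9631]
pc = 0
while pc < len(program):
    op = program[pc]
    size = 2 if is_32bit(op) else 1      # the fetch logic must know this first
    print(f"pc={pc}: {op:#06x}, {size} word(s)")
    pc += size
```

Only a handful of opcodes are decoded here; a full decoder needs such a case for every instruction format, which illustrates why fixed operand fields would have simplified the Register File read in the decode stage.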
\n
Instruction set orthogonality issues:
- Pointer registers X, Y, Z have different addressing capabilities (for
  example, only Y and Z support displacement addressing with LDD/STD).
- Register File locations 0...15 have different addressing capabilities than
  RF locations 16...31 (for example, immediate instructions such as LDI, SUBI,
  ANDI and CPI only accept R16...R31).
- IO locations 0 to 31 support more addressing modes than IO locations 32 to
  63 (for example, the bit instructions SBI, CBI, SBIC and SBIS only reach IO
  locations 0 to 31).
- There are instructions that work on 16 bit words (for example, 16 bit
  register-to-register moves with MOVW). \n
  The existence of such instructions on an 8 bit RISC controller is
  questionable. That's not because such operations are not needed, but because
  the increase in complexity and irregularity is not justified. \n
  The cost/performance balance is negative for these instructions (we're still
  talking about a controller claimed to be RISC).
- Opcodes 0x95C8 and 0x9004 do exactly the same thing (LPM). \n
  Other such examples might exist. \n
  The instruction bits could have been used more carefully.
- CLR affects flags, while SER does not, even though they seem to be
  complementary instructions (CLR Rd is an alias of EOR Rd, Rd, whereas SER Rd
  is an alias of LDI Rd, 0xFF). \n
  This might be a design flaw in the original core \b or intended, for a
  hidden purpose, by whoever designed the AVR core. By the way, if I remember
  some old news correctly, AVR was designed not by Atmel, but by a Scandinavian
  company that was later acquired by Atmel.

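The LPM duplication is easy to see at the bit level. In Atmel's encoding, the implied-operand LPM is 0x95C8, LPM Rd, Z is 1001 000d dddd 0100 and LPM Rd, Z+ is 1001 000d dddd 0101, so LPM R0, Z encodes as 0x9004 and performs exactly the same operation as 0x95C8. A Python sketch (hypothetical function name) that decodes the three forms into a common (destination, post-increment) pair:

```python
from typing import Optional, Tuple


def decode_lpm(op: int) -> Optional[Tuple[int, bool]]:
    """Return (Rd, post_increment) for an LPM opcode, or None if op is not LPM."""
    if op == 0x95C8:                 # LPM: implied R0 destination, Z pointer
        return (0, False)
    if (op & 0xFE0F) == 0x9004:      # LPM Rd, Z:  1001 000d dddd 0100
        return ((op >> 4) & 0x1F, False)
    if (op & 0xFE0F) == 0x9005:      # LPM Rd, Z+: 1001 000d dddd 0101
        return ((op >> 4) & 0x1F, True)
    return None


print(decode_lpm(0x95C8), decode_lpm(0x9004))  # same operation, two opcodes
```

Scanning the whole 16 bit opcode space with this function finds exactly two encodings of LPM R0, Z.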
\n
\n
*/



/*!
\defgroup pavr_avris AVR instruction set
\htmlonly
<table border="1" cellpadding="4" cellspacing="0">
<caption><b>AVR instruction set</b></caption>
<tr>
<th>Arithmetic</th>
<th>Bit &amp; Others</th>
<th>Transfer</th>
<th>Jump</th>
<th>Branch</th>
<th>Call</th>
</tr>
<tr valign="top">
<td>
ADD Rd, Rr<br>
ADC Rd, Rr<br>
ADIW Rd+1:Rd, K6<br>
<br>
SUB Rd, Rr<br>
SUBI Rd, K8<br>
SBC Rd, Rr<br>
SBCI Rd, K8<br>
SBIW Rd+1:Rd, K6<br>
<br>
INC Rd<br>
DEC Rd<br>
<br>
AND Rd, Rr<br>
ANDI Rd, K8<br>
OR Rd, Rr<br>
ORI Rd, K8<br>
EOR Rd, Rr<br>
<br>
COM Rd<br>
NEG Rd<br>
CP Rd, Rr<br>
CPC Rd, Rr<br>
CPI Rd, K8<br>
SWAP Rd<br>
<br>
LSR Rd<br>
<br>
ROR Rd<br>
ASR Rd<br>
<br>
MUL Rd, Rr*<br>
MULS Rd, Rr<br>
MULSU Rd, Rr<br>
FMUL Rd, Rr<br>
FMULS Rd, Rr<br>
FMULSU Rd, Rr
</td>
<td>
BSET s<br>
BCLR s<br>
SBI A, b<br>
CBI A, b<br>
BST Rd, b<br>
BLD Rd, b<br>
<br>
NOP<br>
<br>
BREAK**<br>
SLEEP<br>
WDR
</td>
<td>
MOV Rd, Rr<br>
MOVW Rd+1:Rd, Rr+1:Rr<br>
<br>
IN Rd, A<br>
OUT A, Rr<br>
<br>
PUSH Rr<br>
POP Rd<br>
<br>
LDI Rd, K8<br>
LDS Rd, K16<br>
<br>
LD Rd, X<br>
LD Rd, -X<br>
LD Rd, X+<br>
<br>
LDD Rd, Y+K6<br>
LD Rd, -Y<br>
LD Rd, Y+<br>
<br>
LDD Rd, Z+K6<br>
LD Rd, -Z<br>
LD Rd, Z+<br>
<br>
STS K16, Rr<br>
<br>
ST X, Rr<br>
ST -X, Rr<br>
ST X+, Rr<br>
<br>
STD Y+K6, Rr<br>
ST -Y, Rr<br>
ST Y+, Rr<br>
<br>
STD Z+K6, Rr<br>
ST -Z, Rr<br>
ST Z+, Rr<br>
<br>
LPM<br>
LPM Rd, Z<br>
LPM Rd, Z+<br>
ELPM<br>
ELPM Rd, Z<br>
ELPM Rd, Z+<br>
<br>
SPM
</td>
<td>
RJMP K12<br>
IJMP<br>
EIJMP<br>
JMP K22
</td>
<td>
CPSE Rd, Rr<br>
<br>
SBRC Rr, b<br>
SBRS Rr, b<br>
<br>
SBIC A, b<br>
SBIS A, b<br>
<br>
BRBC s, K7<br>
BRBS s, K7
</td>
<td>
RCALL K12<br>
ICALL<br>
EICALL<br>
CALL K22<br>
<br>
RET<br>
RETI
</td>
</tr>
</table>
\endhtmlonly
\b * Multiplications are fully supported by the pipeline (in terms of timing,
wires and registers). However, the multiplication module itself is currently
only a null (stub) module in the ALU, and always returns zero for now. It will
be defined and plugged into the ALU in a future version of pAVR. \n
\b ** Italicized instructions are currently not implemented in pAVR. \n

\n
\n
*/


/*!
\defgroup pavr_implementation Implementation
*/


/*!
\defgroup pavr_control Pipeline structure
\ingroup pavr_implementation
\par Shift-like flow
pAVR has a pipeline with 6 stages:

1. read Program Memory (PM)
2. strobe Program Memory output into the instruction register (INSTR)
3. decode instruction and read Register File (RFRD)
4. strobe Register File output (OPS)
5. execution or Unified Memory access (ALU)
6. write Register File (RFWR)

\n
\image html pavr_pipestruct_01.gif \n
\n
Each pipeline stage is essentially an independent state machine. \n
\n
Basically, each pipeline stage receives values from the previous one, in a
\b shift-like flow. Only the `terminal' registers hold data that is actually
used; the preceding ones serve only for synchronization. \n
For example, this is how a particular hardware resource request flows through
pipeline stages s3 and s4 until it is processed in s5: \n
\n
\image html pavr_pipestruct_02.gif \n
\n
The \b exceptions to this `normal' flow are the \b stall and \b flush actions,
which can independently stall any stage or reset it to zero (force a nop into
it). Another exception is when several registers in such a chain are actually
used, not only the terminal one. \n
\n
Apart from the main pipeline stages above (s1-s6), there are a number of
pipeline stages needed only by a few instructions (such as 16 bit arithmetic,
some of the skips, and returns): s61, s51, s52, s53 and s54. During these
pipeline stages, the main stages are stalled. \n
\n
Stages s1 and s2 are common to all instructions. They bring the instruction
from Program Memory (PM) into the instruction register (instruction fetch
stages). \n
During stage s3, the instruction just read from PM is decoded. That is, the
following pipeline stages (s4, s5, s6, s61, s51, s52, s53, s54) are told what
to do, by means of dedicated registers. \n
\n
At a given moment, a pipe stage can perform one of the following actions:
- execute normally \n
  The registers in that stage are loaded with:
  - values from the previous stage, if that stage is other than s1, s2 or s3
  - some particular values, if that stage is s1 or s2 (these values are set by
    the Program Memory manager)
  - values from the instruction decoder, if that stage is s3
- flush (execute nop) \n
  All registers in that stage are reset to zero.
- stall \n
  All registers in that stage are kept unchanged.

\n
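The three actions can be modelled behaviorally. In the Python sketch below, the names are hypothetical and a single integer stands in for all the registers of a stage; the first stage of the chain loads an externally supplied value, standing in for what the Program Memory manager or the instruction decoder would provide. It replays the s3, s4, s5 request flow shown earlier, with one clock where s3 and s4 are stalled and s5 is flushed:

```python
from enum import Enum
from typing import List


class Action(Enum):
    NORMAL = 0  # load the value coming from the previous stage
    FLUSH = 1   # reset to zero, i.e. execute a nop
    STALL = 2   # keep the current value


def clock(regs: List[int], actions: List[Action], new: int) -> List[int]:
    """Advance a chain of stage registers by one clock edge.

    regs[0] is the first stage of the chain and loads 'new' when executing
    normally. All stages update at the same edge, so each one reads the old
    contents of the chain.
    """
    nxt = []
    for i, (reg, act) in enumerate(zip(regs, actions)):
        if act is Action.STALL:
            nxt.append(reg)
        elif act is Action.FLUSH:
            nxt.append(0)
        else:
            nxt.append(new if i == 0 else regs[i - 1])
    return nxt


N, F, S = Action.NORMAL, Action.FLUSH, Action.STALL

# chain = [s3, s4, s5]: a request bit decoded in s3 moves towards s5.
chain = [0, 0, 0]
chain = clock(chain, [N, N, N], new=1)  # [1, 0, 0]
chain = clock(chain, [N, N, N], new=0)  # [0, 1, 0]
chain = clock(chain, [S, S, F], new=0)  # s3, s4 stalled, nop in s5: [0, 1, 0]
chain = clock(chain, [N, N, N], new=0)  # request reaches s5: [0, 0, 1]
print(chain)
```

Every stage computes its next value from the old values of the chain, as clocked registers do; updating the list in place would let a value ripple through several stages in a single clock.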
\par Hardware resource managing
Pipeline stages can request access to hardware resources. Access to hardware
resources is done via dedicated hardware resource managers (one manager per
hardware resource, and one VHDL process per manager). \n
\n
Main hardware resources:
- Register File (RF)
- Bypass Unit (BPU)
  - Bypass Register 0 (Bypass chain 0) (BPR0)
  - Bypass Register 1 (Bypass chain 1) (BPR1)
  - Bypass Register 2 (Bypass chain 2) (BPR2)
- IO File (IOF)
- Status Register (SREG)
- Stack Pointer (SP)
- Arithmetic and Logic Unit (ALU)
- Data Access Control Unit (DACU)
- Program Memory (PM)
- Stall and Flush Unit (SFU)

\n
A given resource can receive only one such request at a time. If multiple
accesses to a resource are requested, its manager asserts an error during
simulation; that would indicate a design bug. \n
The pipeline is built so that each resource is normally accessed during a
fixed pipeline stage:
- RF is normally read in s3 and written in s6.
- IOF is normally read/written in s5.
- DM is normally read/written in s5.
- DACU is normally read/written in s5.
- PM is normally read in s1.

However, exceptions can occur. For example, LPM instructions need to read the
PM in stage s5, and loads/stores must be able to read/write the RF in stage
s5. \n
Exceptions are handled at the level of the hardware resource managers. \n

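A behavioral sketch of one manager, in Python. Class and method names are hypothetical; the two request names are the RF read port 1 requests quoted in the Read port 1 section (pavr_s3_rfrd1_rq and pavr_s5_dacu_rfrd1_rq). The manager collects the requests raised during a clock and raises an error if more than one is active, mirroring the simulation-time check described above; whether a request comes from the normal stage (s3) or from an exception path (the DACU in s5) is resolved inside the manager.

```python
from typing import Dict, Optional, Tuple


class ResourceManager:
    """Hypothetical model of one hardware resource manager."""

    def __init__(self, name: str):
        self.name = name
        self.requests: Dict[str, int] = {}

    def request(self, source: str, addr: int) -> None:
        """Record a request raised during the current clock."""
        self.requests[source] = addr

    def clock(self) -> Optional[Tuple[str, int]]:
        """Serve this clock's request (if any) and clear the request list."""
        active, self.requests = self.requests, {}
        if len(active) > 1:
            # In the VHDL this is a simulation-time assertion: it flags a design
            # bug, because the SFU should have prevented the conflict.
            raise AssertionError(f"{self.name}: conflicting requests {sorted(active)}")
        return next(iter(active.items()), None)


rf_rd1 = ResourceManager("RF read port 1")
rf_rd1.request("pavr_s3_rfrd1_rq", 5)        # normal operand read in s3
print(rf_rd1.clock())                         # ('pavr_s3_rfrd1_rq', 5)
rf_rd1.request("pavr_s5_dacu_rfrd1_rq", 26)  # load whose UM address falls in the RF
print(rf_rd1.clock())                         # ('pavr_s5_dacu_rfrd1_rq', 26)
```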
\par Stall and Flush Unit
Because of the exceptions above, different pipeline stages can compete for a
given hardware resource, so a mechanism must be provided to handle hardware
resource conflicts. The SFU implements this function by arbitrating hardware
resource requests: it stalls some instructions (some pipeline stages), while
allowing others to execute. \n
\n
Stall handling is done through two sets of signals:
- SFU requests (SFU inputs):
  - stall requests
  - flush requests
  - branch requests
  - skip requests
  - nop requests
- SFU control signals (SFU outputs):
  - stall control
  - flush control

There is one pair of stall-flush control signals for each of the pipeline
stages s1, s2, s3, s4, s5, s6.

\n
\image html pavr_hwres_sfu_01.gif
\n
Each instruction has an embedded stall behavior, which is decoded by the
instruction decoder. \n
Instructions in the pipeline, at different execution phases, access the SFU
exactly the way they access any other hardware resource: through SFU access
requests. \n
The SFU prioritizes stall/flush/branch/skip/nop requests and postpones younger
instructions until older instructions free the hardware resources (including
the SFU itself). Postponing is done through the stall-flush controls, on a
per-pipeline-stage basis. \n
The `SFU rule': when a resource conflict appears, the older instruction wins. \n
\n
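The SFU rule can be sketched as follows, in Python. The model is a simplification with assumptions of its own: each request is tagged with the main pipeline stage that raised it, a later stage always holds an older instruction, and a losing (younger) requester is stalled together with every stage upstream of it. The real SFU also handles flush, branch, skip and nop requests, which are not modelled here, and all names are hypothetical.

```python
from typing import Dict, Set, Tuple

STAGES = ["s1", "s2", "s3", "s4", "s5", "s6"]  # s6 holds the oldest instruction


def arbitrate(requests: Dict[str, str]) -> Tuple[Dict[str, str], Set[str]]:
    """Resolve conflicts between requests (stage -> requested resource).

    Returns (granted, stalled): the requests that are served this clock, and
    the set of stages that must be stalled.
    """
    granted: Dict[str, str] = {}
    stalled: Set[str] = set()
    for stage in reversed(STAGES):          # oldest instruction first
        if stage in stalled or stage not in requests:
            continue
        resource = requests[stage]
        if resource in granted.values():
            # Conflict with an older instruction: the younger one loses, and
            # every stage upstream of it has to wait as well.
            stalled.update(STAGES[: STAGES.index(stage) + 1])
        else:
            granted[stage] = resource
    return granted, stalled


# A load in s5 steals RF read port 1 from a younger instruction in s3.
granted, stalled = arbitrate({"s5": "RF read port 1", "s3": "RF read port 1", "s1": "PM"})
print(granted)          # {'s5': 'RF read port 1'}
print(sorted(stalled))  # ['s1', 's2', 's3']
```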
Some instructions need to insert a nop \b before the instruction `wave front',
to free hardware resources normally used by younger instructions. For example,
loads must `steal' Register File read port 1 from younger instructions. \n
Nops are inserted by stalling certain pipe stages and flushing other (or
possibly the same) stages. \n
Other instructions need a nop \b after the instruction wave front, so that the
previous instruction can complete and free hardware resources. For example,
stores must wait one clock, until the previous instruction frees the Register
File write port. \n
From the point of view of the control structure, the two situations differ
considerably. In the second situation, the instruction must stall and flush
itself, which raises additional problems. These are solved by a dedicated
noping state machine in stage s4, whose only purpose is to insert at most one
nop \b after any instruction. On the other hand, inserting nops \b before an
instruction wave front is straightforward, as any instruction can stall/flush
younger instructions by means of SFU requests. \n
\n
The specific SFU requests are listed \ref pavr_hwres_sfu "here".
\n

\par Shadowing
Consider the following situation: a load instruction reads the Data Memory
during pipe stage s5. Suppose that on the next clock an older instruction
stalls s6, the stage during which the Data Memory output was supposed to be
written into the Register File. One clock later the stall is removed and s6
requests to write the Register File, but the Data Memory output has changed
during the stall, so corrupted data would be written into the Register File.
With the shadow protocol, the Data Memory output is saved during the stall.
When the stall is removed, the Register File is written with the saved data. \n
\n
\b The \b shadow \b protocol \n
If a pipe stage is not permitted to place hardware resource requests, mark
every memory-like entity in that stage as having its output `shadowed', and
write its associated shadow register with the corresponding data output.
Otherwise, mark it as `unshadowed'. \n
As long as a memory-like entity is marked `shadowed', it is read (by whatever
entity needs it) from its associated shadow register, rather than directly from
its data output. \n
To allow shadowing during multiple successive stalls, shadow memory-like
entities only if they aren't already shadowed. \n
\n
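The shadow protocol can be sketched behaviorally for a single memory-like entity, here the Data Memory output that s6 writes into the Register File. The Python class and method names are hypothetical; only the three rules stated above are modelled, and the example replays the load scenario described at the start of this section:

```python
class ShadowedOutput:
    """Output of a memory-like entity, protected by a shadow register."""

    def __init__(self):
        self.shadowed = False
        self.shadow_reg = 0

    def clock(self, raw_output: int, requests_enabled: bool) -> None:
        """Apply the shadow protocol at a clock edge."""
        if requests_enabled:
            self.shadowed = False
        elif not self.shadowed:
            # First disabled clock: capture the output before it changes.
            # Later successive stalls leave the captured value alone.
            self.shadow_reg = raw_output
            self.shadowed = True

    def read(self, raw_output: int) -> int:
        """Value seen by consumers: the shadow copy while shadowed."""
        return self.shadow_reg if self.shadowed else raw_output


dm = ShadowedOutput()
dm_out = 0x42                              # DM output produced for the load (read in s5)
dm.clock(dm_out, requests_enabled=False)   # s6 stalled: shadow the output
dm_out = 0x99                              # DM output changes during the stall
dm.clock(dm_out, requests_enabled=False)   # still stalled: keep the first copy
rf_data = dm.read(dm_out)                  # stall removed: s6 writes the RF now
dm.clock(dm_out, requests_enabled=True)    # requests enabled again: unshadow
print(hex(rf_data))                        # 0x42, not the corrupted 0x99
```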
Basically, the condition that shadows a memory-like entity's output is `hardware resources are disabled during that stage'.
However, there are exceptions. For example, LPM family instructions steal
Program Memory access by stalling the instruction that would normally be
fetched at that time. Because of the stall, hardware resource requests become
disabled in that pipe stage. Still, LPM family instructions must be able to
access the Program Memory output directly. Here, the PM must not be shadowed,
even though all hardware requests are disabled by default during its pipe
stage s2 (during which the PM is normally accessed). \n
Fortunately, there are only a few such exceptions (holes in the shadow
protocol). Overall, the shadow protocol is still a good idea, as it allows
natural, automatic handling of a number of registers placed in delicate
areas. \n
\n

\todo
- Branch prediction with a hashed branch prediction table and a 2 bit predictor.
- Super-RAM interfacing to Program Memory. \n
  A super-RAM is a classic RAM with two additional lines: a mem_rq input and a
  mem_ack output. The device that reads/writes the super-RAM may place an
  access request only when the memory signals, via mem_ack, that it is ready;
  the request itself is placed via mem_rq. \n
  A super-RAM is a generalization of classic RAM: a super-RAM behaves as a
  classic RAM if it ignores mem_rq and keeps mem_ack continuously at 1. \n
  The super-RAM protocol is so flexible that, as an extreme example, it can
  interface the Program Memory to the controller serially (!). That means about
  2-3 wires instead of 38, without needing to modify anything in the
  controller. Of course, that would come with a very large speed penalty, but
  it allows choosing the most advantageous compromise between the number of
  wires and speed. The only thing to be done is to add a serial-to-parallel
  converter that complies with the super-RAM protocol. \n
  Once pAVR is made super-RAM compatible, it can still run from a regular RAM,
  as it does now, by ignoring the two extra lines. Thus, nothing is removed,
  only added, and no speed penalty should be paid. \n
  A simple way to add the super-RAM interface is to force nops into the
  pipeline as long as the serial-to-parallel converter works on an instruction
  word.
- Modify stall handling so that no nops are required \b after the instruction
  wave front. The instructions could take care of themselves: a request to a
  hardware resource that is already in use by an older instruction could
  \b automatically generate a stall. \n
  That would:
  - generally simplify instruction handling
  - make average instruction execution slightly faster.

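A behavioral Python sketch of the super-RAM handshake follows. Apart from mem_rq and mem_ack, all names are hypothetical, and so is the fixed per-word latency. The memory holds mem_ack at 0 while busy; the controller places a request only while mem_ack is 1 and, as suggested above, forces a nop into the pipeline on every clock spent waiting. A latency of 0 reproduces the classic RAM case, with mem_ack permanently at 1:

```python
from typing import List, Tuple


class SuperRAM:
    """Memory with the mem_rq / mem_ack handshake (latency per word is assumed)."""

    def __init__(self, words: List[int], latency: int):
        self.words = words
        self.latency = latency  # extra clocks per word, e.g. a slow serial link
        self.busy = 0           # clocks left before the current word is valid
        self.data = 0

    @property
    def mem_ack(self) -> int:
        """1 when the memory is ready for a new request (its data is then valid)."""
        return 1 if self.busy == 0 else 0

    def clock(self, mem_rq: int, addr: int) -> None:
        if self.busy:
            self.busy -= 1
        elif mem_rq:
            self.data = self.words[addr]  # only considered valid once mem_ack is 1
            self.busy = self.latency


def fetch_all(ram: SuperRAM, n: int) -> Tuple[List[int], int]:
    """Fetch words 0..n-1 through the handshake; return (words, nops inserted)."""
    fetched: List[int] = []
    nops = 0
    next_addr = 0
    pending = False
    while len(fetched) < n:
        mem_rq, addr = 0, 0
        if ram.mem_ack:
            if pending:                  # the previous request has completed
                fetched.append(ram.data)
                pending = False
            if next_addr < n:            # memory ready: place the next request
                mem_rq, addr = 1, next_addr
                next_addr += 1
                pending = True
        else:
            nops += 1                    # memory busy: force a nop into the pipeline
        ram.clock(mem_rq, addr)
    return fetched, nops


program = [0x0C12, 0xEF0F, 0x9631]
print(fetch_all(SuperRAM(program, latency=0), 3))  # classic RAM
print(fetch_all(SuperRAM(program, latency=4), 3))  # slow (e.g. serial) PM
```

With a latency of 4 clocks per word, fetching 3 words inserts 12 nops; with latency 0 it inserts none, which matches the claim that a classic RAM pays no speed penalty.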
\n
\n
*/



/*!
\defgroup pavr_hwres Hardware resources
\ingroup pavr_implementation
*/


/*!
\defgroup pavr_hwres_rf Register File
\ingroup pavr_hwres
The Register File is a 3-port memory, with 2 read ports and 1 write port. \n
It has 32 locations, 8 bits each. \n
Separate read and write ports are provided for the upper three 16 bit words,
which are the pointer registers X (at byte addresses 27:26), Y (29:28) and
Z (31:30). \n
The RF is placed at the beginning of the Unified Memory space. \n
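Since X, Y and Z live in Register File bytes 27:26, 29:28 and 31:30 (low byte at the even address), the pre-decrement and post-increment modes amount to a 16 bit read-modify-write of an RF byte pair. A Python sketch with hypothetical names follows; it models the architectural behavior only, not pAVR's dedicated pointer ports:

```python
from typing import List

POINTER_LOW = {"X": 26, "Y": 28, "Z": 30}  # RF address of each pointer's low byte


def get_ptr(rf: List[int], name: str) -> int:
    lo = POINTER_LOW[name]
    return rf[lo] | (rf[lo + 1] << 8)


def set_ptr(rf: List[int], name: str, value: int) -> None:
    lo = POINTER_LOW[name]
    value &= 0xFFFF                 # 16 bit pointers wrap around
    rf[lo], rf[lo + 1] = value & 0xFF, value >> 8


def effective_address(rf: List[int], name: str, mode: str) -> int:
    """Address used by LD/ST with mode '' (plain), '-' (pre-decrement) or '+' (post-increment)."""
    ptr = get_ptr(rf, name)
    if mode == "-":
        ptr = (ptr - 1) & 0xFFFF
        set_ptr(rf, name, ptr)       # pointer updated before the access
        return ptr
    if mode == "+":
        set_ptr(rf, name, ptr + 1)   # pointer updated after the access
    return ptr


rf = [0] * 32
set_ptr(rf, "X", 0x0100)
a1 = effective_address(rf, "X", "+")   # LD Rd, X+ : accesses 0x0100, X becomes 0x0101
a2 = effective_address(rf, "X", "-")   # LD Rd, -X : X becomes 0x0100, accesses 0x0100
print(hex(a1), hex(a2), hex(get_ptr(rf, "X")), rf[26], rf[27])
```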
\n
\image html pavr_hwres_rf_01.gif
\n
*/


/*!
\defgroup pavr_hwres_rf_rd1 Read port 1
\ingroup pavr_hwres_rf
\par Register File read port 1 connectivity
\n
\image html pavr_hwres_rf_rd1_01.gif
\n
\par Requests to RF read port 1
- pavr_s3_rfrd1_rq \n
  Most ALU-requiring instructions need to read an operand from RF read port 1
  in the same clock in which the instruction is decoded (here, "to read" means
  "to strobe the read input"). The read signal is activated, if necessary, via
  the RF read port 1 manager. \n
- pavr_s5_dacu_rfrd1_rq \n
  \anchor dacu_rq
  \b Note \b 1: This is a somewhat `misplaced' RF read port 1 request. To keep
  the controller compatible with the AVR architecture, loads and stores must
  operate in the Unified Memory space, which includes the Register File, IO
  File and Data Memory. Thus, it is possible for a LOAD to actually transfer
  data, for example, from RF to RF, rather than from DM to RF (depending on the
  addresses involved). \n
  The DACU manager decides which physical device has to be used, and places
  the corresponding calls to the appropriate hardware resource manager. This
  request is such a call. \n
  \b Note \b 2: The same situation occurs with `misplaced' RF writes in stores.
  Stores read from the RF, but can actually write any of RF, IOF or DM.
\n
*/


/*!
\defgroup pavr_hwres_rf_rd2 Read port 2
\ingroup pavr_hwres_rf
\par Register File read port 2 connectivity
\n
\image html pavr_hwres_rf_rd2_01.gif
\n
\par Requests to RF read port 2
- pavr_s3_rfrd2_rq \n
Needed by two-operand instructions (most ALU instructions, moves).
\n
*/

/*!
\defgroup pavr_hwres_rf_wr Write port
\ingroup pavr_hwres_rf
\par Register File write port connectivity
\n
\image html pavr_hwres_rf_wr_01.gif
\n
\par Requests to RF write port
- pavr_s6_aluoutlo8_rfwr_rq \n
Request to write the lower 8 bits of the ALU result into the Register File. \n
- pavr_s61_aluouthi8_rfwr_rq \n
Request to write the higher 8 bits of the ALU result into the RF. \n
- pavr_s6_iof_rfwr_rq \n
Request to write the IOF data output into the RF. \n
Needed by IN, BLD.
- pavr_s6_dacu_rfwr_rq \n
Request to write the Unified Memory data output (DACU data out) into the RF. \n
Needed by loads and POP.
- pavr_s6_pm_rfwr_rq \n
Request to write the Program Memory data output into the RF. \n
Needed by LPM, ELPM.
- pavr_s5_dacu_rfwr_rq \n
Request to write data read from the RF back into the RF. \n
Needed by stores and PUSH that target the RF. \n
See \ref dacu_rq "Note 2".
\n
*/

/*!
\defgroup pavr_hwres_rf_xwr X port
\ingroup pavr_hwres_rf
\par X port connectivity
\n
\image html pavr_hwres_rf_xwr_01.gif
\n
This is a read and write port. \n
The contents of the X register are permanently available for reading, under
the name `pavr_rf_x'. \n
The X write port consists of a data input (pavr_rf_x_di) and a write strobe
(pavr_rf_x_wr).
\par Requests to X write port
- pavr_s5_ldstincrampx_xwr_rq \n
Increment X. \n
If the controller has more than 64KB of Data Memory, then increment RAMPX:X
(24 bits) rather than X (16 bits). \n
Needed by loads and stores with postincrement. \n
- pavr_s5_ldstdecrampx_xwr_rq \n
Decrement X. \n
If the controller has more than 64KB of Data Memory, then decrement RAMPX:X
rather than X. \n
Needed by loads and stores with predecrement. \n
\n
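
The increment and decrement rule above can be sketched in C as follows. This is a hypothetical behavioral model, not pAVR code: ext_ptr_t and ext_ptr_add are made-up names, and big_dm stands for "the controller has more than 64KB of Data Memory". The same rule applies to Y with RAMPY and to Z with RAMPZ.

```c
#include <stdbool.h>
#include <stdint.h>

// Hypothetical model of an extended pointer: RAMPX (IO File) and X (r27:r26).
typedef struct {
    uint8_t ramp;   // RAMPX
    uint16_t ptr;   // X
} ext_ptr_t;

// Apply a postincrement (+1) or a predecrement (-1) to the pointer.
// big_dm: the controller has more than 64KB of Data Memory. Then RAMPX:X is
// updated as one 24 bit value; otherwise only X changes and RAMPX is kept.
static void ext_ptr_add(ext_ptr_t *p, int delta, bool big_dm) {
    if (big_dm) {
        uint32_t full = ((uint32_t)p->ramp << 16) | p->ptr;
        full = (full + (uint32_t)delta) & 0xFFFFFFu;  // 24 bit wrap-around
        p->ramp = (uint8_t)(full >> 16);
        p->ptr = (uint16_t)(full & 0xFFFFu);
    } else {
        p->ptr = (uint16_t)(p->ptr + delta);          // 16 bit wrap-around
    }
}
```

The carry from X into RAMPX only exists in the 24 bit case. With 64KB of Data Memory or less, X simply wraps around.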
*/

/*!
\defgroup pavr_hwres_rf_ywr Y port
\ingroup pavr_hwres_rf
\par Y port connectivity
\n
\image html pavr_hwres_rf_ywr_01.gif
\n
This is a read and write port. \n
\par Requests to Y write port
- pavr_s5_ldstincrampy_ywr_rq \n
Increment Y or RAMPY:Y. \n
Needed by loads and stores with postincrement. \n
- pavr_s5_ldstdecrampy_ywr_rq \n
Decrement Y or RAMPY:Y. \n
Needed by loads and stores with predecrement. \n
\n
*/

/*!
\defgroup pavr_hwres_rf_zwr Z port
\ingroup pavr_hwres_rf
\par Z port connectivity
\n
\image html pavr_hwres_rf_zwr_01.gif
\n
This is a read and write port. \n
\par Requests to Z write port
- pavr_s5_ldstincrampz_zwr_rq \n
Increment Z or RAMPZ:Z. \n
Needed by loads and stores with postincrement. \n
- pavr_s5_ldstdecrampz_zwr_rq \n
Decrement Z or RAMPZ:Z. \n
Needed by loads and stores with predecrement. \n
- pavr_s5_lpminc_zwr_rq \n
Increment Z. \n
Needed by LPM with postincrement. \n
- pavr_s5_elpmincrampz_zwr_rq \n
Increment RAMPZ:Z. \n
Needed by ELPM with postincrement. \n
\n
*/

/*!
\defgroup pavr_hwres_bpu Bypass Unit
\ingroup pavr_hwres
\par General considerations
The Bypass Unit (BPU) is a FIFO-like temporary storage area that keeps data to
be written into the Register File. \n
If an instruction computes a value that must be written into the Register File
(RF) (an ALU instruction, for example), it first writes the BPU, and then (or at
the same time) actually writes the RF. \n
If the following instructions need an RF operand from the same address where
the previous result is to be written, they read that operand from the BPU
rather than from the RF. \n
This way, `read before write' pipeline hazards are avoided. \n
\n
The specific situations where the BPU is needed are:
- when reading Register File operand(s). \n
Reading Register File operands is done through the BPU.
- when reading pointer registers. \n
Reading pointer registers is done through the BPU.

\par Details
The algorithm for using the BPU is:
- the instruction that wants to write a result into the RF first writes the
  BPU with 3 data fields:
  - the result itself
  - the result's address in the RF
  - a flag that marks this BPU entry as holding valid data (a so-called
    `active' flag)
- the next instruction(s) that need an operand from the RF read it through
  a dedicated function (combinational logic) that does the following:
  - checks all BPU entries and sees which ones are active (hold meaningful
    data)
  - compares the operand's address against the addresses in all active BPU
    entries
  - if a single address matches, takes the data in that BPU entry rather than
    the data from the RF
  - if multiple addresses match, takes the data in the most recent BPU entry.
    Although 2 matches in simultaneous BPU entries (entries written in the same
    clock, into different BPU chains) are possible in principle, this situation
    should never happen; it would indicate a design bug. This illegal situation
    would assert an error during simulation.
  - if no address matches, takes the data from the RF (as if the BPU did not
    exist).

\n
The maximum delay between a write to and a read from the RF is 4 clocks. Thus,
the BPU FIFO-like structure has a depth of 4. \n
On the other hand, the BPU must accept 3 one byte operands at a time (it must
have 3 write ports). The instructions that demand the most from the BPU are
stores with pre(post) decrement(increment): both the one byte data and a 2 byte
pointer register must be written into the BPU, as well as into the RF. The 3
bytes are simultaneously written into so-called `BPU chains' or `BPU registers'
(BPU chains 0, 1, 2; or BPU registers 0, 1, 2; or BPR0, BPR1, BPR2). \n
\n
The BPU has 3x4 entries, each consisting of:
- an 8 bit data field
- a 5 bit address field
- a flag that marks the entry as active or inactive

\par Accessing BPU:
\n
\image html pavr_hwres_bpu_01.gif
\n
\n
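
The BPU lookup described above can be sketched in C as follows. This is a simplified, hypothetical model (bpu_t, bpu_shift, bpu_write and bpu_read are not pAVR signals). It assumes that every chain ages by one position per clock, that entry 0 of each chain is the most recent one, and that an entry older than 4 clocks is no longer needed because its data has reached the RF. Two chains matching at the same age are reported as an error, mirroring the simulation assertion mentioned above.

```c
#include <stdbool.h>
#include <stdint.h>

#define BPU_CHAINS 3  // BPR0, BPR1, BPR2
#define BPU_DEPTH  4  // maximum write-to-read distance, in clocks

typedef struct {
    uint8_t data;     // 8 bit data field
    uint8_t addr;     // 5 bit RF address field
    bool active;      // entry holds valid data
} bpu_entry_t;

// e[c][0] is the most recent entry of chain c, e[c][3] the oldest.
typedef struct {
    bpu_entry_t e[BPU_CHAINS][BPU_DEPTH];
} bpu_t;

// One clock: every chain ages by one position. The oldest entry is dropped
// (its data is assumed to be in the RF by then); slot 0 becomes inactive.
static void bpu_shift(bpu_t *b) {
    for (int c = 0; c < BPU_CHAINS; c++) {
        for (int d = BPU_DEPTH - 1; d > 0; d--)
            b->e[c][d] = b->e[c][d - 1];
        b->e[c][0].active = false;
    }
}

// Write a result into the newest slot of one chain.
static void bpu_write(bpu_t *b, int chain, uint8_t addr, uint8_t data) {
    b->e[chain][0].data = data;
    b->e[chain][0].addr = (uint8_t)(addr & 31u);
    b->e[chain][0].active = true;
}

// Bypassed operand read: the most recent active entry with a matching address
// wins; with no match, the RF value is returned. The error flag is set if two
// chains match at the same age (a design bug, asserted during simulation).
static uint8_t bpu_read(const bpu_t *b, uint8_t addr, uint8_t rf_value,
                        bool *error) {
    *error = false;
    for (int d = 0; d < BPU_DEPTH; d++) {      // newest entries first
        int hits = 0;
        uint8_t data = 0;
        for (int c = 0; c < BPU_CHAINS; c++) {
            const bpu_entry_t *x = &b->e[c][d];
            if (x->active && x->addr == (addr & 31u)) {
                hits++;
                data = x->data;
            }
        }
        if (hits > 1)
            *error = true;
        if (hits > 0)
            return data;
    }
    return rf_value;
}
```

Scanning by age first and only then across chains is what gives "the most recent entry wins". It also makes the illegal case (two hits in one age slot) easy to detect.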
*/

/*!
\defgroup pavr_hwres_bpr0 Bypass chain 0
\ingroup pavr_hwres_bpu
\par Bypass chain 0 (BPR0) write port connectivity
\n
\image html pavr_hwres_bpr0_01.gif
\n
\par Requests to BPR0 write port
- pavr_s5_alu_bpr0wr_rq \n
Needed by regular ALU instructions. \n
- pavr_s6_iof_bpr0wr_rq \n
Needed by instructions that read the IO File (IN, BLD).
- pavr_s6_daculd_bpr0wr_rq \n
Needed by loads.
- pavr_s5_dacust_bpr0wr_rq \n
Needed by stores.
- pavr_s6_pmdo_bpr0wr_rq \n
Needed by LPM family instructions.
\n
*/

/*!
\defgroup pavr_hwres_bpr1 Bypass chain 1
\ingroup pavr_hwres_bpu
\par Bypass chain 1 (BPR1) write port connectivity
\n
\image html pavr_hwres_bpr1_01.gif
\n
\par Requests to BPR1 write port
- pavr_s5_alu_bpr1wr_rq \n
Needed by regular ALU instructions that have a 16 bit result (ADIW, SBIW,
MUL, MULS, MULSU, FMUL, FMULS, FMULSU, MOVW).
- pavr_s5_dacux_bpr12wr_rq \n
Needed by loads and stores with pre(post) decrement(increment). \n
The lower byte of the X pointer is written into BPR1.
- pavr_s5_dacuy_bpr12wr_rq \n
Needed by loads and stores with pre(post) decrement(increment). \n
The lower byte of the Y pointer is written into BPR1.
- pavr_s5_dacuz_bpr12wr_rq \n
Needed by loads and stores with pre(post) decrement(increment). \n
The lower byte of the Z pointer is written into BPR1.
*/

/*!
\defgroup pavr_hwres_bpr2 Bypass chain 2
\ingroup pavr_hwres_bpu
\par Bypass chain 2 (BPR2) write port connectivity
\n
\image html pavr_hwres_bpr2_01.gif
\n
\par Requests to BPR2 write port
- pavr_s5_dacux_bpr12wr_rq \n
Needed by loads and stores with pre(post) decrement(increment). \n
The higher byte of the X pointer is written into BPR2.
- pavr_s5_dacuy_bpr12wr_rq \n
Needed by loads and stores with pre(post) decrement(increment). \n
The higher byte of the Y pointer is written into BPR2.
- pavr_s5_dacuz_bpr12wr_rq \n
Needed by loads and stores with pre(post) decrement(increment). \n
The higher byte of the Z pointer is written into BPR2.
*/

/*!
\defgroup pavr_hwres_iof IO File
\ingroup pavr_hwres
The IO File is composed of a set of discrete registers that are grouped into a
memory-like entity. The IO File has a general, byte-oriented read/write port,
and separate read and write ports for each register in the IO File. \n
\n
\image html pavr_hwres_iof_01.gif
\n
Each IO File register is assigned a unique address in the IO space. That address
is defined in the constants definition file (`pavr-constants.vhd'). \n
The IO space is placed in the Unified Memory just above the RF, that is,
starting at address 32. \n
The IO address range is 0...63 (Unified Memory addresses 32...95). \n
Undefined IO registers read an undefined value. \n
\n
*/

/*!
\defgroup pavr_hwres_iof_gen General IO port
\ingroup pavr_hwres_iof
\par General IO File port connectivity
\n
\image html pavr_hwres_iof_gen_01.gif
\n
The general IO File port is a little more elaborate than a simple read/write
port. It can read bytes from IO registers to its output and write bytes from its
input to IO registers. It can also do some bit processing: load bits (from the T
flag in SREG to the output), store bits (from the input to the T flag in SREG),
set IO bits and clear IO bits. \n
An opcode has to be provided to select which of these actions the port
performs. \n
\n
The following \b opcodes are implemented for the IO File general port:
- read byte (needed by instructions IN, SBIC, SBIS)
- write byte (OUT)
- clear bit (CBI)
- set bit (SBI)
- load bit (BLD)
- store bit (BST)

\par Requests to this port
- pavr_s5_iof_rq \n
Needed by instructions that manipulate the IO File in stage s5: CBI, SBI, SBIC,
SBIS, BSET, BCLR, IN, OUT, BLD, BST.
- pavr_s6_iof_rq \n
Needed by instructions that manipulate the IO File in stage s6: CBI, SBI, BSET,
BCLR.
- pavr_s5_dacu_iof_rq \n
Needed by loads and stores that the DACU decodes as accessing the IO File. \n
\n
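
The six opcodes can be sketched in C as follows. This is a hypothetical behavioral model, not the pAVR port, and all names are made up. SREG is assumed to sit at IO address 0x3F as in Atmel's AVRs (in pAVR the address comes from the constants definition file). The load bit opcode is assumed to output the input byte with the selected bit replaced by T, which is what BLD needs.

```c
#include <stdint.h>

// Opcodes of the general IO File port, as listed above.
typedef enum {
    IOF_RD_BYTE,  // IN, SBIC, SBIS
    IOF_WR_BYTE,  // OUT
    IOF_CLR_BIT,  // CBI
    IOF_SET_BIT,  // SBI
    IOF_LD_BIT,   // BLD
    IOF_ST_BIT    // BST
} iof_opcode_t;

#define IOF_SREG_ADDR 0x3Fu  // assumed SREG IO address (as in Atmel's AVRs)
#define IOF_T_MASK    0x40u  // T flag = SREG(6)

// Hypothetical model of one access through the general port. io[] is the IO
// File, addr an IO address (0...63), bit a bit number (0...7), din the port's
// data input. Returns the port's data output (0 for opcodes without output).
static uint8_t iof_gen(uint8_t io[64], iof_opcode_t op, unsigned addr,
                       unsigned bit, uint8_t din) {
    const uint8_t mask = (uint8_t)(1u << (bit & 7u));
    addr &= 63u;
    switch (op) {
    case IOF_RD_BYTE:
        return io[addr];
    case IOF_WR_BYTE:
        io[addr] = din;
        break;
    case IOF_CLR_BIT:
        io[addr] = (uint8_t)(io[addr] & ~mask);
        break;
    case IOF_SET_BIT:
        io[addr] = (uint8_t)(io[addr] | mask);
        break;
    case IOF_LD_BIT:  // din with the selected bit replaced by T
        return (io[IOF_SREG_ADDR] & IOF_T_MASK) ? (uint8_t)(din | mask)
                                                : (uint8_t)(din & ~mask);
    case IOF_ST_BIT:  // T takes the selected bit of din
        if (din & mask)
            io[IOF_SREG_ADDR] = (uint8_t)(io[IOF_SREG_ADDR] | IOF_T_MASK);
        else
            io[IOF_SREG_ADDR] = (uint8_t)(io[IOF_SREG_ADDR] & ~IOF_T_MASK);
        break;
    }
    return 0;
}
```

The bit opcodes read-modify-write a single register in one access. For example, SBI on an IO register becomes one IOF_SET_BIT call rather than a read followed by a write.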
*/

/*!
\defgroup pavr_hwres_iof_sregwr SREG port
\ingroup pavr_hwres_iof
\par SREG port connectivity
\n
\image html pavr_hwres_iof_sreg_01.gif
\n
\par Requests to this port
- pavr_s5_alu_sregwr_rq \n
This signals that an instruction that uses the ALU wants to update the
arithmetic flags. \n
Flags I (general interrupt enable, SREG(7)) and T (transfer bit, SREG(6))
are left unchanged. \n
- pavr_s5_setiflag_sregwr_rq \n
This sets the I flag. \n
Only the RETI instruction needs this.
- pavr_s5_clriflag_sregwr_rq \n
This clears the I flag. \n
No instruction explicitly requests this. \n
It is only requested when an interrupt is acknowledged (during the resulting
implicit CALL). \n
\n
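
A minimal C sketch of the three SREG port requests. It assumes that the ALU delivers all six arithmetic flags (H S V N Z C, in bits 5...0 as in Atmel's AVRs) already combined into one byte. It also assumes that flags not affected by a given instruction are passed through unchanged by the ALU. The function names are hypothetical.

```c
#include <stdint.h>

// SREG bit numbers: I = 7 and T = 6 as stated above; H S V N Z C in bits
// 5...0 follow Atmel's AVR layout.
enum { SREG_C, SREG_Z, SREG_N, SREG_V, SREG_S, SREG_H, SREG_T, SREG_I };

// pavr_s5_alu_sregwr_rq: bits 5...0 come from the ALU, I and T are kept.
// alu_flags is assumed to already hold the final value of all six flags.
static uint8_t sreg_alu_update(uint8_t sreg, uint8_t alu_flags) {
    const uint8_t keep = (uint8_t)((1u << SREG_I) | (1u << SREG_T));
    return (uint8_t)((sreg & keep) | (alu_flags & (uint8_t)~keep));
}

// pavr_s5_setiflag_sregwr_rq (RETI).
static uint8_t sreg_set_i(uint8_t sreg) {
    return (uint8_t)(sreg | (1u << SREG_I));
}

// pavr_s5_clriflag_sregwr_rq (interrupt acknowledge).
static uint8_t sreg_clr_i(uint8_t sreg) {
    return (uint8_t)(sreg & ~(1u << SREG_I));
}
```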
*/

/*!
\defgroup pavr_hwres_iof_spwr SP port
\ingroup pavr_hwres_iof
\par SP port connectivity
\n
\image html pavr_hwres_iof_sp_01.gif
\n
This is the stack pointer. \n
It is 16 bits wide, being composed of two 8 bit registers, SPL and SPH. \n
The stack can reside anywhere in the Unified Memory space, that is, anywhere in
the RF, IOF or DM. It can even begin, for example, in the RF and continue in
the IOF. However, placing the stack in the IOF is likely to be a programming
error, as the IOF registers have dedicated functions. Quasi-random values
written from the stack into the IOF could, for example, unpredictably trigger
an interrupt, and in general cause unpredictable behavior of the controller. \n

\par Requests to this port
- pavr_s5_inc_spwr_rq \n
Increment SP (SPH & SPL) by 1. \n
Needed by POP.
- pavr_s5_dec_spwr_rq \n
Decrement SP by 1. \n
Needed by PUSH.
- pavr_s5_calldec_spwr_rq \n
Decrement SP by 1. \n
Needed by RCALL, ICALL, EICALL, CALL, interrupt implicit CALL.
- pavr_s51_calldec_spwr_rq \n
Decrement SP by 1. \n
Needed by RCALL, ICALL, EICALL, CALL, interrupt implicit CALL.
- pavr_s52_calldec_spwr_rq \n
Decrement SP by 1. \n
Needed by RCALL, ICALL, EICALL, CALL, interrupt implicit CALL.
- pavr_s5_retinc2_spwr_rq \n
Increment SP by 2. \n
Needed by RET, RETI.
- pavr_s51_retinc_spwr_rq \n
Increment SP by 1. \n
Needed by RET, RETI. \n
\n
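
To make the direction of the SP updates concrete, here is a C sketch of the stack at the architectural level. It shows the stack as the program sees it, not split into the pipelined s5/s51/s52 requests listed above. It assumes the usual AVR convention: a push stores at SP, then decrements SP; a pop increments SP, then reads. This matches PUSH using the decrement request and POP using the increment request. All names are hypothetical.

```c
#include <stdint.h>

// Hypothetical architectural model of the stack in the Unified Memory.
typedef struct {
    uint8_t um[65536];  // Unified Memory image (RF, IOF and DM)
    uint16_t sp;        // SPH:SPL
} stack_model_t;

// PUSH: store at SP, then decrement SP (pavr_s5_dec_spwr_rq).
static void push8(stack_model_t *m, uint8_t v) {
    m->um[m->sp] = v;
    m->sp--;
}

// POP: increment SP (pavr_s5_inc_spwr_rq), then read.
static uint8_t pop8(stack_model_t *m) {
    m->sp++;
    return m->um[m->sp];
}

// CALL family: the 3 byte return address is stored low byte first, in the
// order of the DACU requests pclo8, pcmid8, pchi8; SP decreases by 3.
static void call_push_pc(stack_model_t *m, uint32_t pc) {
    push8(m, (uint8_t)pc);
    push8(m, (uint8_t)(pc >> 8));
    push8(m, (uint8_t)(pc >> 16));
}

// RET, RETI: the high, middle and low bytes are read back in this order
// (pchi8, pcmid8, pclo8); SP increases by 3 in total.
static uint32_t ret_pop_pc(stack_model_t *m) {
    uint32_t hi = pop8(m);
    uint32_t mid = pop8(m);
    uint32_t lo = pop8(m);
    return (hi << 16) | (mid << 8) | lo;
}
```

The three single-step decrements of a CALL correspond to the s5, s51 and s52 requests. On the return side, the s5 "increment by 2" plus the s51 "increment by 1" give the same net change of 3.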
*/

/*!
\defgroup pavr_hwres_iof_rampxwr RAMPX port
\ingroup pavr_hwres_iof
\par RAMPX port connectivity
\n
\image html pavr_hwres_iof_rampx_01.gif
\n
\par Requests to this port
- pavr_s5_ldstincrampx_xwr_rq \n
Needed by loads and stores with postincrement. \n
RAMPX is modified only if the controller has more than 64 KB of Data Memory.
- pavr_s5_ldstdecrampx_xwr_rq \n
Needed by loads and stores with predecrement. \n
RAMPX is modified only if the controller has more than 64 KB of Data Memory. \n
\n
*/

/*!
\defgroup pavr_hwres_iof_rampywr RAMPY port
\ingroup pavr_hwres_iof
\par RAMPY port connectivity
\n
\image html pavr_hwres_iof_rampy_01.gif
\n
\par Requests to this port
- pavr_s5_ldstincrampy_ywr_rq \n
Needed by loads and stores with postincrement. \n
RAMPY is modified only if the controller has more than 64 KB of Data Memory.
- pavr_s5_ldstdecrampy_ywr_rq \n
Needed by loads and stores with predecrement. \n
RAMPY is modified only if the controller has more than 64 KB of Data Memory. \n
\n
*/

/*!
\defgroup pavr_hwres_iof_rampzwr RAMPZ port
\ingroup pavr_hwres_iof
\par RAMPZ port connectivity
\n
\image html pavr_hwres_iof_rampz_01.gif
\n
\par Requests to this port
- pavr_s5_ldstincrampz_zwr_rq \n
Needed by loads and stores with postincrement. \n
RAMPZ is modified only if the controller has more than 64 KB of Data Memory.
- pavr_s5_ldstdecrampz_zwr_rq \n
Needed by loads and stores with predecrement. \n
RAMPZ is modified only if the controller has more than 64 KB of Data Memory. \n
\n
*/

/*!
\defgroup pavr_hwres_iof_rampdwr RAMPD port
\ingroup pavr_hwres_iof
\par RAMPD port connectivity
\n
\image html pavr_hwres_iof_rampd_01.gif
\n
This is a trivial read-only port. \n
\n
The register RAMPD is used in controllers with more than 64KB of Data Memory,
to access the whole Data Memory space. \n
RAMPD is used by the instructions LDS (LoaD direct from data Space) and STS
(STore direct to data Space). In order to reach the desired data space address,
these instructions concatenate RAMPD with a 16 bit constant from the
instruction word (RAMPD:k16). \n
\n
In controllers with 64KB of Data Memory or less, this register is not used. \n
The RAMPD register can be written only through the IOF general read and write
port. \n
No instruction explicitly requests to write this register. \n
\n
*/

/*!
\defgroup pavr_hwres_iof_eindwr EIND port
\ingroup pavr_hwres_iof
\par EIND port connectivity
\n
\image html pavr_hwres_iof_eind_01.gif
\n
This is a trivial read-only port. \n
\n
The register EIND is used in controllers with more than 64K words of Program
Memory, to access the whole program space. \n
EIND is used by the instructions EICALL (Extended Indirect CALL) and EIJMP
(Extended Indirect JuMP). In order to reach the desired program space address,
these instructions concatenate EIND with the Z register (EIND:Z). \n
\n
In controllers with 64K words of Program Memory or less, this register is not
used. \n
The EIND register can be written only through the IOF general read and write
port. \n
No instruction explicitly requests to write this register. \n
\n
*/

/*!
\defgroup pavr_hwres_iof_perif Peripherals
\ingroup pavr_hwres_iof
Peripherals are only of secondary importance for this project. \n
However, an \ref pavr_hwres_iof_perif_pa "IO port", an
\ref pavr_hwres_iof_perif_int0 "external interrupt" and an
\ref pavr_hwres_iof_perif_t0 "8 bit timer" are implemented, in order to properly
test the interrupt system. \n
Peripherals have been designed to be \b decoupled from the kernel. They can be
easily upgraded without touching the kernel. \n
\n
*/

/*!
\defgroup pavr_hwres_iof_perif_pa Port A
\ingroup pavr_hwres_iof_perif
\par Port A structure
Port A offers 8 bidirectional general purpose IO lines. \n
Lines 0 and 1 also have alternate functions:
- line 0 can be used as \ref pavr_hwres_iof_perif_int0 "external interrupt 0" input
- line 1 can be used as \ref pavr_hwres_iof_perif_t0 "timer 0" clock input.

\n
Port A is managed through 3 IO File locations: \b PORTA, \b DDRA and \b PINA. \n
\b DDRA sets each pin's direction: DDRA(i)=0 means that line i is an input,
DDRA(i)=1 means that line i is an output. \n
When a value is written to the port, that value goes into \b PORTA. If DDRA
configures the corresponding lines as outputs, the contents of PORTA are
available on the external pins. However, if DDRA configures the lines as inputs
(DDRA(i)=0), then:
- if PORTA(i)=0, line i is a `pure' input (High Z). \n
- if PORTA(i)=1, line i is an input weakly pulled high. \n

\b PINA reads the physical value of the external lines, rather than PORTA. \n
\par Port A schematics
\n
\image html pavr_hwres_iof_perif_pa_01.gif
\n
\n
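
The pin behavior described above can be summarized by a small C sketch that computes what PINA reads. It is a hypothetical model with made-up names. ext_drive and ext_level describe an external device driving some lines, and a weak pull-up is assumed to lose against an external driver. Floating lines (inputs with PORTA(i)=0 that nothing drives) are reported in a separate mask because their value is undefined.

```c
#include <stdint.h>

// Hypothetical model of the value read from PINA.
// ext_drive: lines driven by an external device; ext_level: their levels.
// External drivers on output lines are ignored (that would be contention).
// The undefined mask receives the floating lines: inputs with PORTA(i)=0 that
// nothing drives. They are modeled as reading 0, but real hardware reads an
// undefined value.
static uint8_t porta_pina(uint8_t ddra, uint8_t porta, uint8_t ext_drive,
                          uint8_t ext_level, uint8_t *undefined) {
    uint8_t out = ddra;                          // DDRA(i)=1: output
    uint8_t in = (uint8_t)~ddra;                 // DDRA(i)=0: input
    uint8_t pullup = (uint8_t)(in & porta);      // weakly pulled high
    uint8_t ext = (uint8_t)(in & ext_drive);     // externally driven input
    uint8_t pin = (uint8_t)((out & porta)        // PORTA on output pins
                            | (ext & ext_level)  // external level wins
                            | (pullup & ~ext));  // otherwise pull-up reads 1
    *undefined = (uint8_t)(in & ~(ext | pullup));
    return pin;
}
```

For example, with lines 7...4 as outputs and PORTA = 0xA5, the outputs show 0xA0. Input lines 0 and 2 have their pull-ups enabled, but line 0 is pulled low by an external driver, so PINA reads 0xA4 and lines 1 and 3 are floating.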
*/

/*!
\defgroup pavr_hwres_iof_perif_int0 External interrupt 0
\ingroup pavr_hwres_iof_perif
\par Features
External interrupt 0 is physically mapped on line 0 (bit 0) of
\ref pavr_hwres_iof_perif_pa "port A". \n
\n
Its associated interrupt flag resides in the IO File register GIFR (General
Interrupt Flags Register): \n
\n
\image html pavr_hwres_iof_perif_int0_01.gif
\n
External interrupt 0 is enabled/disabled by setting/clearing bit 6 in the GIMSK
(General Interrupt Mask) register: \n
\n
\image html pavr_hwres_iof_perif_int0_02.gif
\n
If enabled, it can trigger an interrupt on a high-to-low transition, on a
low-to-high transition, or on a low level of the interrupt 0 input. This
behavior is selected by 2 bits in the MCUCR (Microcontroller Control) register: \n
\n
\image html pavr_hwres_iof_perif_int0_03.gif
\n
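
The three trigger modes can be sketched in C as follows. The encoding of the 2 MCUCR bits is given by the figure above. The values used here (00 low level, 10 falling edge, 11 rising edge, 01 reserved) are an assumption borrowed from Atmel's AT90S devices. The model is hypothetical: it samples the input once per clock and returns whether an interrupt should be requested.

```c
#include <stdbool.h>

// Assumed ISC01:ISC00 encoding (Atmel AT90S style); the pAVR encoding is
// the one shown in the MCUCR figure.
typedef enum {
    INT0_LOW_LEVEL = 0,  // 00: low level
    INT0_RESERVED = 1,   // 01: reserved
    INT0_FALLING = 2,    // 10: high-to-low transition
    INT0_RISING = 3      // 11: low-to-high transition
} int0_sense_t;

// Hypothetical per-clock INT0 detector. prev and cur are the levels of
// port A line 0 sampled on consecutive clocks; enabled is GIMSK bit 6.
static bool int0_trigger(int0_sense_t mode, bool prev, bool cur, bool enabled) {
    if (!enabled)
        return false;
    switch (mode) {
    case INT0_LOW_LEVEL:
        return !cur;
    case INT0_FALLING:
        return prev && !cur;
    case INT0_RISING:
        return !prev && cur;
    default:
        return false;
    }
}
```

The edge modes need the previous sample, so a real implementation keeps one flip-flop of history on the input line. The level mode needs none.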
*/

/*!
\defgroup pavr_hwres_iof_perif_t0 Timer 0
\ingroup pavr_hwres_iof_perif
\par Features
The IO File register that holds the current count is TCNT0. \n
Its behavior is controlled by a set of other IO File registers:
- TIFR (Timer Interrupt Flag Register) holds the Timer 0 interrupt flag: \n
\n
\image html pavr_hwres_iof_perif_t0_02.gif
\n
- TIMSK (Timer Interrupt Mask) contains the flag that enables/disables the
Timer 0 interrupt: \n
\n
\image html pavr_hwres_iof_perif_t0_03.gif
\n
- TCCR0 (Timer 0 Control Register) defines the prescaling source of Timer 0. \n
When the external input pin is selected, the Timer 0 clock source is line 1
of \ref pavr_hwres_iof_perif_pa "port A": \n
\n
\image html pavr_hwres_iof_perif_t0_01.gif
\n
\n
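
A C sketch of the Timer 0 count and overflow behavior. The clock select encoding below (the 3 low bits of TCCR0, as in Atmel's AT90S devices) is an assumption, since the actual pAVR encoding is given by the figure above. Likewise, the overflow flag and its TIMSK enable are modeled as plain booleans rather than at specific TIFR/TIMSK bit positions. All names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

// Assumed TCCR0 clock select (bits 2...0, as in Atmel's AT90S devices):
// 0 stop, 1...5 system clock divided by 1, 8, 64, 256, 1024,
// 6 and 7 external pin (falling and rising edge).
// Returns the prescaler division, or 0 when the timer is stopped or
// clocked from the external pin.
static unsigned t0_prescale(uint8_t tccr0) {
    static const unsigned div[8] = { 0, 1, 8, 64, 256, 1024, 0, 0 };
    return div[tccr0 & 7u];
}

typedef struct {
    uint8_t tcnt0;  // current count
    bool tov0;      // Timer 0 overflow flag (kept in TIFR)
} timer0_t;

// One timer clock: TCNT0 counts up and wraps from 0xFF to 0x00, which sets
// the overflow flag. Returns true while an interrupt should be requested,
// that is, while the flag is set and enabled in TIMSK (toie0).
static bool timer0_tick(timer0_t *t, bool toie0) {
    t->tcnt0++;
    if (t->tcnt0 == 0)
        t->tov0 = true;
    return t->tov0 && toie0;
}
```

Clearing the overflow flag (by software or by the interrupt acknowledge) is left out of this sketch.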
*/

/*!
\defgroup pavr_hwres_alu ALU
\ingroup pavr_hwres
\par ALU connectivity:
\n
\image html pavr_hwres_alu_01.gif
\n
\ref alu_pipe_ref_01 "Here" one can see how the ALU plugs into the pipeline. \n
\n
The ALU is a 100% \b combinational device. \n
It accepts 2 operands:
- a 16 bit operand \n
This is taken through the Bypass Unit.
- an 8 bit operand \n
This is taken through the Bypass Unit.

The ALU output is 16 bits wide. \n

\par ALU opcodes:
- NOP \n
- OP1 \n
Transfers operand 1 directly to the ALU output. \n
OP2 \n
Transfers operand 2 directly to the lower 8 bits of the ALU output. \n
- ADD8 \n
ADC8 \n
Adds, with carry, the lower 8 bits of operand 1 and operand 2. \n
SUB8 \n
SBC8 \n
- AND8 \n
EOR8 \n
OR8 \n
- INC8 \n
DEC8 \n
- COM8 \n
NEG8 \n
SWAP8 \n
- LSR8 \n
ASR8 \n
ROR8 \n
- ADD16 \n
Adds, without carry, operand 1 and operand 2 sign extended to 16 bits. \n
SUB16 \n
- MUL8 \n
MULS8 \n
MULSU8 \n
FMUL8 \n
FMULS8 \n
FMULSU8 \n

\par ALU flags:
- H (half carry)
- S (sign)
- V (two's complement overflow)
- N (negative)
- Z (zero)
- C (carry)
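
As an example of how the flags relate to an opcode, here is a C sketch of ADD8/ADC8 with the standard AVR flag definitions. H is the carry out of bit 3, V the two's complement overflow, N bit 7 of the result, S = N xor V, Z a zero result, and C the carry out of bit 7. It is a behavioral sketch, not the pAVR VHDL; the function name and flag masks are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

// Flag masks in the SREG(5...0) order of Atmel's AVR: H S V N Z C.
enum { F_C = 1 << 0, F_Z = 1 << 1, F_N = 1 << 2,
       F_V = 1 << 3, F_S = 1 << 4, F_H = 1 << 5 };

// Hypothetical model of the ALU ADD8 (carry_in = false) and ADC8
// (carry_in = true) opcodes. Returns the 8 bit result; flags receives H..C.
static uint8_t alu_add8(uint8_t a, uint8_t b, bool carry_in, uint8_t *flags) {
    unsigned cin = carry_in ? 1u : 0u;
    unsigned sum = (unsigned)a + b + cin;
    uint8_t r = (uint8_t)sum;
    uint8_t f = 0;
    if (((a & 0x0Fu) + (b & 0x0Fu) + cin) > 0x0Fu)
        f |= F_H;                          // carry out of bit 3
    if (~(a ^ b) & (a ^ r) & 0x80u)
        f |= F_V;                          // same-sign operands, sign changed
    if (r & 0x80u)
        f |= F_N;
    if (r == 0)
        f |= F_Z;
    if (sum > 0xFFu)
        f |= F_C;                          // carry out of bit 7
    if (!(f & F_N) != !(f & F_V))
        f |= F_S;                          // S = N xor V
    *flags = f;
    return r;
}
```

For example, 0x7F + 0x01 gives 0x80 with H, V and N set. The sign flag S stays clear because N and V cancel, which correctly reports that the true (unbounded) result, +128, is positive.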
*/

/*!
\defgroup pavr_hwres_dacu DACU
\ingroup pavr_hwres
\par Overview
The Data Address Calculation Unit (DACU) offers unified read and write access
over the concatenated RF, IOF and DM space, that is, over the Unified Memory
(UM) space. \n
Loads and stores operate in the UM space. They use the DACU to translate the
Unified Memory address into an RF, IOF or DM address. \n
The DACU takes requests to read or write the UM space, translates the UM
address into an RF, IOF or DM address, and transparently places requests to
read or write the specific hardware resource (RF, IOF or DM) that corresponds to
the given UM address. \n

\par Reading DACU
\n
\image html pavr_hwres_dacu_01.gif
\n
DACU read \b requests:
- pavr_s5_x_dacurd_rq \n
Needed by loads from the address given by the X pointer register.
- pavr_s5_y_dacurd_rq \n
Needed by loads from the address given by the Y pointer register.
- pavr_s5_z_dacurd_rq \n
Needed by loads from the address given by the Z pointer register.
- pavr_s5_sp_dacurd_rq \n
Needed by the POP instruction.
- pavr_s5_k16_dacurd_rq \n
Needed by the LDS instruction. \n
If the controller has more than 64KB of Data Memory, the Unified Memory
address is built by concatenating RAMPD with the 16 bit constant.
- pavr_s5_pchi8_dacurd_rq \n
The higher 8 bits of the PC are loaded from the stack. \n
Needed by the RET and RETI instructions.
- pavr_s51_pcmid8_dacurd_rq \n
The middle 8 bits of the PC are loaded from the stack. \n
Needed by the RET and RETI instructions.
- pavr_s52_pclo8_dacurd_rq \n
The lower 8 bits of the PC are loaded from the stack. \n
Needed by the RET and RETI instructions.

\n
In response to read requests, the DACU places read \b requests to the RF, IOF
or DM:
- pavr_s5_dacu_rfrd1_rq
- pavr_s5_dacu_iof_rq
- pavr_s5_dacu_dmrd_rq

\par Writing DACU
\n
\image html pavr_hwres_dacu_02.gif
\n
DACU write \b requests:
- pavr_s5_x_dacuwr_rq \n
Needed by stores to the address given by the X pointer register.
- pavr_s5_y_dacuwr_rq \n
Needed by stores to the address given by the Y pointer register.
- pavr_s5_z_dacuwr_rq \n
Needed by stores to the address given by the Z pointer register.
- pavr_s5_sp_dacuwr_rq \n
Needed by the PUSH instruction.
- pavr_s5_k16_dacuwr_rq \n
Needed by the STS instruction. \n
If the controller has more than 64KB of Data Memory, the Unified Memory
address is built by concatenating RAMPD with the 16 bit constant.
- pavr_s5_pclo8_dacuwr_rq \n
The lower 8 bits of the PC are stored on the stack. \n
Needed by CALL family instructions (CALL, RCALL, ICALL, EICALL, implicit
interrupt CALL).
- pavr_s51_pcmid8_dacuwr_rq \n
The middle 8 bits of the PC are stored on the stack. \n
Needed by CALL family instructions.
- pavr_s52_pchi8_dacuwr_rq \n
The higher 8 bits of the PC are stored on the stack. \n
Needed by CALL family instructions.

\n
In response to write requests, the DACU places write \b requests to the RF, IOF
or DM, and to the BPU:
- pavr_s5_dacu_rfwr_rq
- pavr_s5_dacu_iof_rq
- pavr_s5_dacu_dmwr_rq
- pavr_s5_dacust_bpr0wr_rq

\n
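
The address translation performed by the DACU can be sketched in C as follows. The RF and IOF boundaries follow the Unified Memory layout described in this chapter (RF at 0...31, IOF at 32...95). How DM addresses are numbered is an assumption of this sketch (they are counted from Unified Memory address 96), and every name used is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum { UM_RF, UM_IOF, UM_DM } um_device_t;

// Hypothetical model of the DACU address translation: returns the physical
// device and writes the address inside that device to dev_addr.
// Assumption: DM addresses are counted from UM address 96.
static um_device_t dacu_decode(uint32_t um_addr, uint32_t *dev_addr) {
    if (um_addr < 32u) {
        *dev_addr = um_addr;          // RF address 0...31
        return UM_RF;
    }
    if (um_addr < 96u) {
        *dev_addr = um_addr - 32u;    // IO address 0...63
        return UM_IOF;
    }
    *dev_addr = um_addr - 96u;        // DM address (assumed offset)
    return UM_DM;
}

// LDS/STS address: RAMPD:k16 when the controller has more than 64KB of
// Data Memory, otherwise only the 16 bit constant from the instruction.
static uint32_t dacu_k16_addr(uint8_t rampd, uint16_t k16, bool big_dm) {
    return big_dm ? (((uint32_t)rampd << 16) | k16) : k16;
}
```

For a load, a result of UM_RF, UM_IOF or UM_DM corresponds to the DACU placing pavr_s5_dacu_rfrd1_rq, pavr_s5_dacu_iof_rq or pavr_s5_dacu_dmrd_rq, respectively.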
*/

/*!
|
1431 |
|
|
\defgroup pavr_hwres_dm Data Memory
|
1432 |
|
|
\ingroup pavr_hwres
|
1433 |
|
|
\par Data Memory connectivity
|
1434 |
|
|
\n
|
1435 |
|
|
\image html pavr_hwres_dm_01.gif
|
1436 |
|
|
\n
|
1437 |
|
|
The Data Memory is a single port RAM. \n
|
1438 |
|
|
That port provides both read and write DM accesses. \n
|
1439 |
|
|
The DM is byte-organized, and its length is set by a constant in the
|
1440 |
|
|
constants definition file (`pavr-constants.vhd'). \n
|
1441 |
|
|
\n
|
1442 |
|
|
\par Requests to DM
|
1443 |
|
|
Requests to access DM come only from the DACU: \n
|
1444 |
|
|
- pavr_s5_dacu_dmrd_rq \n
|
1445 |
|
|
- pavr_s5_dacu_dmwr_rq \n
|
1446 |
|
|
\n
|
1447 |
|
|
*/
|
1448 |
|
|
|
1449 |
|
|
|
1450 |
|
|
|
1451 |
|
|
/*!
|
1452 |
|
|
\defgroup pavr_hwres_pm Program Memory
|
1453 |
|
|
\ingroup pavr_hwres
|
1454 |
|
|
\par PM handling
|
1455 |
|
|
\n
|
1456 |
|
|
\image html pavr_hwres_pm_01.gif
|
1457 |
|
|
\n
|
1458 |
|
|
The Program Memory is a single port RAM. \n
|
1459 |
|
|
That port provides read-only access. Support for the instruction SPM (Store
|
1460 |
|
|
Program Memory) is currently not provided. \n
|
1461 |
|
|
The PM is organized in 16 bit words, and its length is set by a constant in the
|
1462 |
|
|
constants definition file (`pavr-constants.vhd'). \n
|
1463 |
|
|
\n
|
1464 |
|
|
Apart from controlling the Program Memory, the PM manager also controls the Program
|
1465 |
|
|
Counter. \n
|
1466 |
|
|
Some PM access requests need to modify the PC, others don't. The only PM requests
|
1467 |
|
|
that don't modify the PC are the loads from PM (LPM and ELPM instructions). The
|
1468 |
|
|
other requests correspond to instructions that modify the instruction
|
1469 |
|
|
flow, thus modify the PC (jumps, branches, calls and returns). \n
|
1470 |
|
|
|
1471 |
|
|
\par Program Counter handling
|
1472 |
|
|
At a given time, the pipeline can process more than one instruction. Up to 6
|
1473 |
|
|
instructions can be processed simultaneously. Obviously, each of these
|
1474 |
|
|
instructions has its own address in the PM. \n
|
1475 |
|
|
One may ask: how is the Program Counter defined when two or more instructions
|
1476 |
|
|
are executed simultaneously? Whose address is considered to be the Program Counter? \n
|
1477 |
|
|
The answer is: the Program Counter is in fact composed of a set of registers. Each
|
1478 |
|
|
instruction in the pipeline has an associated Program Counter that follows it
|
1479 |
|
|
while flowing through the pipeline. Implementation details can be found in the
|
1480 |
|
|
description of \ref pavr_pipeline_jumps "jumps", \ref pavr_pipeline_branches "branches",
|
1481 |
|
|
\ref pavr_pipeline_skips "skips", \ref pavr_pipeline_calls "calls" and
|
1482 |
|
|
\ref pavr_pipeline_returns "returns". \n
|
1483 |
|
|
For example, when a relative jump computes the target address, it uses its
|
1484 |
|
|
own Program Counter rather than the address of the instruction being fetched at that
|
1485 |
|
|
moment from the PM. \n
|
1486 |
|
|
The instructions that modify the instruction flow (jumps, branches, skips, calls
|
1487 |
|
|
and returns) must be able to manipulate the program counters associated with
|
1488 |
|
|
pipeline stages s1, s2 and s3. However, this is not done directly, but via the
|
1489 |
|
|
Program Memory manager. The PM manager centralizes all instruction flow access
|
1490 |
|
|
requests (jump requests, branch requests, etc.) and takes care of the program
|
1491 |
|
|
counters in an organized and manageable manner. \n
|
1492 |
|
|
|
1493 |
|
|
\par Requests to PM
|
1494 |
|
|
- pavr_s5_lpm_pm_rq \n
|
1495 |
|
|
Needed by the LPM instruction. \n
|
1496 |
|
|
This request doesn't modify the instruction flow. \n
|
1497 |
|
|
- pavr_s5_elpm_pm_rq \n
|
1498 |
|
|
Needed by the ELPM instruction. \n
|
1499 |
|
|
This request doesn't modify the instruction flow. \n
|
1500 |
|
|
- pavr_s4_z_pm_rq \n
|
1501 |
|
|
Needed by ICALL and IJMP. \n
|
1502 |
|
|
- pavr_s4_zeind_pm_rq \n
|
1503 |
|
|
Needed by EICALL and EIJMP. \n
|
1504 |
|
|
- pavr_s4_k22abs_pm_rq \n
|
1505 |
|
|
Needed by CALL and JMP. \n
|
1506 |
|
|
To form the jump address, the 16 bit instruction constant is concatenated
|
1507 |
|
|
with a 6 bit constant previously read from the first instruction word.
|
1508 |
|
|
- pavr_s4_k12rel_pm_rq \n
|
1509 |
|
|
Needed by RCALL and RJMP. \n
|
1510 |
|
|
Note that pavr_s4_pc is a pipeline register that holds the Program Memory
|
1511 |
|
|
address of the instruction executing in pipeline stage s4. \n
|
1512 |
|
|
Because the relative jump actually occurs in stage s4, pavr_s4_pc is needed
|
1513 |
|
|
rather than the current Program Counter (pavr_pc).
|
1514 |
|
|
- pavr_s6_branch_pm_rq \n
|
1515 |
|
|
Needed by branch instructions (BRBC and BRBS). \n
|
1516 |
|
|
- pavr_s6_skip_pm_rq \n
|
1517 |
|
|
Needed by some skip instructions (CPSE, SBRC and SBRS). \n
|
1518 |
|
|
- pavr_s61_skip_pm_rq \n
|
1519 |
|
|
Needed by some skip instructions (SBIC and SBIS). \n
|
1520 |
|
|
- pavr_s4_k22int_pm_rq \n
|
1521 |
|
|
Needed by the implicit interrupt CALL. \n
|
1522 |
|
|
- pavr_s54_ret_pm_rq \n
|
1523 |
|
|
Needed by RET and RETI. \n
|
1524 |
|
|
\n
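A simplified Python model can illustrate the idea of one Program Counter per in-flight instruction. It is a sketch under stated assumptions: one instruction is fetched per clock, there are no stalls or flushes, only stages s1...s6 are modelled, and the class and method names are hypothetical. The relative target follows Atmel's definition of RJMP/RCALL (PC + k + 1), applied to the jump's own address.

```python
# Hypothetical behavioural model: idealized pipeline, one fetch per clock.
STAGES = 6  # main pipeline stages s1...s6


class PipelinePCs:
    """Each in-flight instruction carries the PM address it was fetched from."""

    def __init__(self, start_pc=0):
        self.pc = start_pc               # next fetch address (role of pavr_pc)
        self.stage_pc = [None] * STAGES  # stage_pc[0] is s1, stage_pc[5] is s6

    def clock(self):
        # Every PC moves one stage deeper; the newly fetched instruction enters s1.
        self.stage_pc = [self.pc] + self.stage_pc[:-1]
        self.pc += 1

    def s4_pc(self):
        """Address of the instruction currently in s4 (cf. pavr_s4_pc)."""
        return self.stage_pc[3]


def relative_target(own_pc, k12):
    """RJMP/RCALL target from the instruction's own address and a 12 bit offset."""
    k = k12 - 0x1000 if k12 & 0x800 else k12  # sign-extend the offset
    return (own_pc + k + 1) & 0x3FFFFF        # 22 bit Program Counter
```

Start from address 0 and run four clocks. The instruction fetched at address 0 has then reached s4, while the fetch address has already advanced to 4. An RJMP with offset 5 must therefore target 0 + 5 + 1 = 6. Using the fetch address instead would give 10, which is why the per-stage PC is needed.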
|
1525 |
|
|
*/
|
1526 |
|
|
|
1527 |
|
|
|
1528 |
|
|
|
1529 |
|
|
/*!
|
1530 |
|
|
\defgroup pavr_hwres_sfu Stall and Flush Unit
|
1531 |
|
|
\ingroup pavr_hwres
|
1532 |
|
|
The pipeline controls its own stall and flush status through specific stall and
|
1533 |
|
|
flush-related request signals. These requests are sent to the Stall and Flush
|
1534 |
|
|
Unit (SFU). The output of the SFU is a set of signals that directly control
|
1535 |
|
|
pipeline stages (a pair of stall and flush control signals for each stage): \n
|
1536 |
|
|
\n
|
1537 |
|
|
\image html pavr_hwres_sfu_01.gif
|
1538 |
|
|
\n
|
1539 |
|
|
\par Requests to SFU
|
1540 |
|
|
- stall requests \n
|
1541 |
|
|
The SFU stalls \b all younger stages. However, by stalling only, the
|
1542 |
|
|
current instruction is spawned into 2 instances. One of them must
|
1543 |
|
|
be killed (flushed). The younger instance is killed (the
|
1544 |
|
|
previous stage is flushed). \n
|
1545 |
|
|
Thus, a nop is introduced in the pipeline \b before the instruction
|
1546 |
|
|
wavefront. \n
|
1547 |
|
|
If more than one stage requests a stall at the same time, the older
|
1548 |
|
|
one has priority (the younger one will be stalled along with the
|
1549 |
|
|
others). Only after that is the younger one's stall acknowledged by means of
appropriate stall and flush control signals. \n
|
1551 |
|
|
Stall \b requests:
|
1552 |
|
|
- pavr_s3_stall_rq \n
|
1553 |
|
|
- pavr_s5_stall_rq \n
|
1554 |
|
|
- pavr_s6_stall_rq
|
1555 |
|
|
- flush requests \n
|
1556 |
|
|
The SFU simply flushes that stage. \n
|
1557 |
|
|
More than one flush can be acknowledged at the same time, without
|
1558 |
|
|
conflict. In practice, all flush requests happen to target the
|
1559 |
|
|
same pipeline stage, s2. \n
|
1560 |
|
|
Flush \b requests:
|
1561 |
|
|
- pavr_s3_flush_s2_rq \n
|
1562 |
|
|
- pavr_s4_flush_s2_rq \n
|
1563 |
|
|
- pavr_s4_ret_flush_s2_rq \n
|
1564 |
|
|
- pavr_s5_ret_flush_s2_rq \n
|
1565 |
|
|
- pavr_s51_ret_flush_s2_rq \n
|
1566 |
|
|
- pavr_s52_ret_flush_s2_rq \n
|
1567 |
|
|
- pavr_s53_ret_flush_s2_rq \n
|
1568 |
|
|
- pavr_s54_ret_flush_s2_rq \n
|
1569 |
|
|
- pavr_s55_ret_flush_s2_rq \n
|
1570 |
|
|
- branch requests \n
|
1571 |
|
|
The SFU flushes stages s2...s5, because the corresponding instructions were
|
1572 |
|
|
already fetched needlessly, and requests that the PC be loaded with the branch
|
1573 |
|
|
relative jump address. \n
|
1574 |
|
|
Branch \b requests:
|
1575 |
|
|
- pavr_s6_branch_rq \n
|
1576 |
|
|
- skip requests \n
|
1577 |
|
|
The SFU treats skips as branches that have the relative jump address equal to
|
1578 |
|
|
0, 1 or 2, depending on the skip condition and on the next instruction's length
|
1579 |
|
|
(16/32 bits). \n
|
1580 |
|
|
Skip \b requests:
|
1581 |
|
|
- pavr_s6_skip_rq \n
|
1582 |
|
|
- pavr_s61_skip_rq
|
1583 |
|
|
- nop requests \n
|
1584 |
|
|
The SFU stalls all younger instructions. The current instruction is
|
1585 |
|
|
spawned into 2 instances. The older instance is killed (the very
|
1586 |
|
|
same stage that requested the nop is flushed). \n
|
1587 |
|
|
Thus, a nop is introduced in the pipeline \b after the instruction
|
1588 |
|
|
wavefront. \n
|
1589 |
|
|
In order to do that, a micro-state machine is needed outside the
|
1590 |
|
|
pipeline, because otherwise that stage would indefinitely stall
|
1591 |
|
|
itself. \n
|
1592 |
|
|
Nop \b requests: \n
|
1593 |
|
|
- pavr_s4_nop_rq \n
|
1594 |
|
|
\par SFU control signals
|
1595 |
|
|
Each main pipeline stage (s1-s6) has 2 kinds of control signals, generated by
|
1596 |
|
|
the SFU:
|
1597 |
|
|
|
1598 |
|
|
stall control \n
|
1599 |
|
|
All registers in this stage are instructed to remain unchanged.
|
1600 |
|
|
All possible requests to hardware resources (such as RF, IOF, BPU,
|
1601 |
|
|
DACU, SREG, etc.) are reset (to 0).
|
1602 |
|
|
flush control \n
|
1603 |
|
|
All registers in this stage are reset (to 0), i.e. to the most "benign"
|
1604 |
|
|
state (a nop). Also, all requests to hardware resources are
|
1605 |
|
|
reset.
|
1606 |
|
|
|
1607 |
|
|
\n
|
1608 |
|
|
Each main pipeline stage has an associated flag that determines whether or not
|
1609 |
|
|
that stage has the right to access hardware resources. These flags are also
|
1610 |
|
|
managed by the SFU. \n
|
1611 |
|
|
Hardware resources enabling flags:
|
1612 |
|
|
- pavr_s1_hwrq_en
|
1613 |
|
|
- pavr_s2_hwrq_en
|
1614 |
|
|
- pavr_s3_hwrq_en
|
1615 |
|
|
- pavr_s4_hwrq_en
|
1616 |
|
|
- pavr_s5_hwrq_en
|
1617 |
|
|
- pavr_s6_hwrq_en
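
The arbitration rules stated above can be condensed into a small Python sketch. It is a simplification with hypothetical names. Stage numbers stand for s1...s6, and a larger number means an older stage. Only the rules spelled out in this section are modelled: the oldest stall request wins and all younger stages are stalled, every flush request targets s2, and a branch or skip flushes s2...s5. Several things are left out: the flush that removes the duplicated instance of a stalled instruction, nop requests, and interactions between simultaneous stall, flush and branch requests.

```python
# Hypothetical, simplified model of one clock of SFU arbitration.
def sfu_arbitrate(stall_rq, flush_s2_rq=False, branch_rq=False):
    """Combine one clock's SFU requests.

    stall_rq: set of stage numbers requesting a stall, e.g. {3, 5}.
    flush_s2_rq: True if any flush request is active.
    branch_rq: True if a branch or skip request is active.
    Returns (granted_stage, stalled_stages, flushed_stages)."""
    # The oldest (highest-numbered) requesting stage has priority.
    granted = max(stall_rq) if stall_rq else None
    # All stages younger than the winner are stalled; a younger requester is
    # simply stalled with them and gets its own stall acknowledged later.
    stalled = set(range(1, granted)) if granted else set()
    flushed = set()
    if flush_s2_rq:
        flushed.add(2)               # all flush requests target s2
    if branch_rq:
        flushed.update(range(2, 6))  # s2...s5 hold needlessly fetched instructions
    return granted, stalled, flushed
```

For example, simultaneous stall requests from s3 and s5 grant s5 first. The request from s3 is served later, after s3 has been held along with s1, s2 and s4.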
|
1618 |
|
|
*/
|
1619 |
|
|
|
1620 |
|
|
|
1621 |
|
|
|
1622 |
|
|
/*!
|
1623 |
|
|
\defgroup pavr_pipeline Pipeline details
|
1624 |
|
|
\ingroup pavr_implementation
|
1625 |
|
|
*/
|
1626 |
|
|
|
1627 |
|
|
|
1628 |
|
|
|
1629 |
|
|
/*!
|
1630 |
|
|
\defgroup pavr_pipeline_alu ALU
|
1631 |
|
|
\ingroup pavr_pipeline
|
1632 |
|
|
\par ALU description
|
1633 |
|
|
The ALU is not a potentially conflicting resource, as it is fully controlled by
|
1634 |
|
|
pipeline stage s5. \n
|
1635 |
|
|
\n
|
1636 |
|
|
There are two ALU operands. The first operand is taken either from RF read port 1,
|
1637 |
|
|
if it is an 8 bit operand, or from RF read port 1 (lower 8 bits) and RF
|
1638 |
|
|
read port 2 (upper 8 bits), if it is a 16 bit operand. The second operand is taken
|
1639 |
|
|
either from the RF read port 2 or directly from the instruction opcode; it is
|
1640 |
|
|
always 8 bits wide. \n
|
1641 |
|
|
Both operands are fed to the ALU through the Bypass Unit. \n
|
1642 |
|
|
All ALU-requiring instructions write their result into the Bypass Unit. \n
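Operand assembly can be restated as a Python sketch. The function name is hypothetical and the code is not the VHDL. One inference is flagged in the code: for a 16 bit operation, read port 2 already supplies the upper byte of the first operand. The second operand is therefore assumed to come from the opcode, as for ADIW and SBIW.

```python
# Hypothetical behavioural sketch of ALU operand selection.
def alu_operands(rf_rd1, rf_rd2, k8=None, wide=False):
    """Return (operand1, operand2) as fed to the ALU.

    rf_rd1, rf_rd2: bytes read from RF read ports 1 and 2.
    k8: constant taken from the opcode, or None to use read port 2.
    wide: True for a 16 bit first operand."""
    if wide:
        # Port 1 supplies the lower byte, port 2 the upper byte.
        op1 = ((rf_rd2 & 0xFF) << 8) | (rf_rd1 & 0xFF)
    else:
        op1 = rf_rd1 & 0xFF
    # The second operand is always 8 bits wide. For 16 bit operations such as
    # ADIW/SBIW it is assumed to come from the opcode (port 2 is already in use).
    op2 = (k8 if k8 is not None else rf_rd2) & 0xFF
    return op1, op2
```

For an ADIW-style addition, alu_operands(0xFF, 0x01, k8=1, wide=True) gives the operands 0x01FF and 1, and their 16 bit sum is 0x0200.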
|
1643 |
|
|
Details about the ALU hardware resource (connectivity, ALU opcodes) can be found
|
1644 |
|
|
\ref pavr_hwres_alu "here". \n
|
1645 |
|
|
Instructions that make use of the ALU-related pipeline registers:
|
1646 |
|
|
- ADD, ADC, ADIW
|
1647 |
|
|
- SUB, SUBI, SBC, SBCI, SBIW
|
1648 |
|
|
- INC, DEC
|
1649 |
|
|
- AND, ANDI
|
1650 |
|
|
- OR, ORI, EOR
|
1651 |
|
|
- COM, NEG, CP, CPC, CPI, SWAP
|
1652 |
|
|
- LSR, ROR, ASR
|
1653 |
|
|
- MUL, MULS, MULSU
|
1654 |
|
|
- FMUL, FMULS, FMULSU
|
1655 |
|
|
- MOV, MOVW
|
1656 |
|
|
|
1657 |
|
|
\par Plugging the ALU into the pipeline
|
1658 |
|
|
The pipeline registers related to ALU access are presented in the picture below. \n
|
1659 |
|
|
The instructions' timing can also be easily figured out from this picture. \n
|
1660 |
|
|
\anchor alu_pipe_ref_01
|
1661 |
|
|
\n
|
1662 |
|
|
\image html pavr_pipe_alu_01.gif
|
1663 |
|
|
\n
|
1664 |
|
|
*/
|
1665 |
|
|
|
1666 |
|
|
|
1667 |
|
|
/*!
|
1668 |
|
|
\defgroup pavr_pipeline_iof IOF access
|
1669 |
|
|
\ingroup pavr_pipeline
|
1670 |
|
|
\par A few details
|
1671 |
|
|
The IO File is accessed during stages s5 and/or s6. \n
|
1672 |
|
|
As presented \ref pavr_hwres_iof_gen "here", the IO File can do more than
|
1673 |
|
|
byte-oriented read-write operations. It can also do bit processing. \n
|
1674 |
|
|
The following data is provided to the IOF, for each pipeline stage in which IOF
|
1675 |
|
|
access is required:
|
1676 |
|
|
- byte address
|
1677 |
|
|
- bit address
|
1678 |
|
|
- opcode
|
1679 |
|
|
- byte data in
|
1680 |
|
|
|
1681 |
|
|
\par Accessing the IOF
|
1682 |
|
|
The main pipeline registers that implement IOF-accessing instructions are presented
|
1683 |
|
|
here: \n
|
1684 |
|
|
\n
|
1685 |
|
|
\image html pavr_pipe_iof_01.gif
|
1686 |
|
|
\n
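The inputs listed above (byte address, bit address, opcode, byte data in) suggest the following behavioural model of one IOF access. The opcode names are the ones used in the IOF tests: wrbyte, rdbyte, clrbit, setbit, stbit and ldbit. The function name and the exact semantics of stbit and ldbit are assumptions made for illustration.

```python
# Hypothetical behavioural model of one access through the IOF general port.
def iof_access(regs, opcode, byte_addr, bit_addr=0, data_in=0):
    """Apply one IOF operation to regs (a list of IO register bytes).

    Returns the byte or bit read, or None for operations that only write."""
    mask = 1 << bit_addr
    if opcode == "wrbyte":
        regs[byte_addr] = data_in & 0xFF
    elif opcode == "rdbyte":
        return regs[byte_addr]
    elif opcode == "clrbit":
        regs[byte_addr] &= ~mask & 0xFF
    elif opcode == "setbit":
        regs[byte_addr] |= mask
    elif opcode == "stbit":
        # Assumed: store bit 0 of data_in into the addressed bit.
        regs[byte_addr] = (regs[byte_addr] & ~mask & 0xFF) | ((data_in & 1) << bit_addr)
    elif opcode == "ldbit":
        # Assumed: read back the addressed bit as 0 or 1.
        return (regs[byte_addr] >> bit_addr) & 1
    else:
        raise ValueError(f"unknown IOF opcode: {opcode}")
    return None
```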
|
1687 |
|
|
*/
|
1688 |
|
|
|
1689 |
|
|
|
1690 |
|
|
/*!
|
1691 |
|
|
\defgroup pavr_pipeline_dacu DACU access
|
1692 |
|
|
\ingroup pavr_pipeline
|
1693 |
|
|
\par A few details
|
1694 |
|
|
The Data Address Calculation Unit (DACU) handles the Unified Memory, by mapping
|
1695 |
|
|
Unified Memory addresses into Register File, IO File or Data Memory addresses. \n
|
1696 |
|
|
It also transparently places access requests to the RF, IOF or DM in response to UM
|
1697 |
|
|
access requests. \n
|
1698 |
|
|
More details on DACU requests can be found \ref pavr_hwres_dacu "here". \n
|
1699 |
|
|
\par Plugging the DACU into the pipeline
|
1700 |
|
|
\n
|
1701 |
|
|
\image html pavr_pipe_dacu_01.gif
|
1702 |
|
|
\n
|
1703 |
|
|
*/
|
1704 |
|
|
|
1705 |
|
|
|
1706 |
|
|
/*!
|
1707 |
|
|
\defgroup pavr_pipeline_jumps Jumps
|
1708 |
|
|
\ingroup pavr_pipeline
|
1709 |
|
|
\par A few details
|
1710 |
|
|
There are 4 jump instructions:
|
1711 |
|
|
|
1712 |
|
|
RJMP (relative jump) \n
|
1713 |
|
|
The jump address is obtained by adding to the current Program Counter a 12
|
1714 |
|
|
bit signed offset obtained from the instruction word.
|
1715 |
|
|
IJMP (indirect jump) \n
|
1716 |
|
|
The jump address is read from the Z pointer register. \n
|
1717 |
|
|
The jump destination resides in the lower 64 Kwords of Program Memory. \n
|
1718 |
|
|
EIJMP (extended indirect jump) \n
|
1719 |
|
|
The jump address is read from EIND:Z (upper 6 bits from the EIND register in the IOF,
|
1720 |
|
|
and lower 16 bits from the Z pointer in the RF). \n
|
1721 |
|
|
This jump can reach the whole 22 bit address space of the Program Memory. \n
|
1722 |
|
|
JMP (long jump) \n
|
1723 |
|
|
The jump address is read from two consecutive instruction words. \n
|
1724 |
|
|
This jump can reach the whole 22 bit address space of the Program Memory. \n
|
1725 |
|
|
|
1726 |
|
|
\n
|
1727 |
|
|
When a jump is detected in the pipeline, the next two instructions (which were already
|
1728 |
|
|
needlessly fetched from the Program Memory) are flushed. Then, the Program Memory
|
1729 |
|
|
manager is asked permission to access the Program Memory and to modify the
|
1730 |
|
|
instruction flow (modify the Program Counter). \n
|
1731 |
|
|
After that, unless it gets flushed or stalled by an older instruction, the jump
|
1732 |
|
|
instruction will configure the pipeline to fetch from the new PM address. \n
|
1733 |
|
|
\n
|
1734 |
|
|
RJMP and JMP take 3 clocks, while IJMP and EIJMP take 4 clocks. \n
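The four target-address computations can be sketched in Python. This is an illustration, not pAVR's VHDL, and the function names are hypothetical. RJMP follows Atmel's definition (PC + k + 1), and the JMP field layout follows Atmel's opcode encoding. As stated above, only the lower 6 bits of EIND are used.

```python
# Hypothetical helpers computing jump targets for a 22 bit Program Counter.
PC_MASK = (1 << 22) - 1


def rjmp_target(pc, k12):
    """Relative jump: 12 bit signed offset, target = PC + k + 1."""
    k = k12 - 0x1000 if k12 & 0x800 else k12
    return (pc + k + 1) & PC_MASK


def ijmp_target(z):
    """Indirect jump: Z only, so the target lies in the lower 64 Kwords."""
    return z & 0xFFFF


def eijmp_target(eind, z):
    """Extended indirect jump: EIND supplies the upper 6 address bits."""
    return ((eind & 0x3F) << 16) | (z & 0xFFFF)


def jmp_target(first_word, second_word):
    """Long jump: 6 address bits from the first opcode word, 16 from the second.

    First word layout: 1001 010k kkkk 110k (bits 8..4 are k21..k17, bit 0 is k16)."""
    k_hi = ((first_word >> 3) & 0x3E) | (first_word & 0x01)
    return (k_hi << 16) | (second_word & 0xFFFF)
```

For example, jmp_target(0x940D, 0x0000) yields 0x10000, because bit 0 of the first word carries address bit 16.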
|
1735 |
|
|
|
1736 |
|
|
\par Jump state machine
|
1737 |
|
|
\n
|
1738 |
|
|
\image html pavr_pipe_jumps_01.gif
|
1739 |
|
|
\n
|
1740 |
|
|
*/
|
1741 |
|
|
|
1742 |
|
|
|
1743 |
|
|
/*!
|
1744 |
|
|
\defgroup pavr_pipeline_branches Branches
|
1745 |
|
|
\ingroup pavr_pipeline
|
1746 |
|
|
\par A few details
|
1747 |
|
|
Branches make a 7 bit relative jump conditional on the value of a bit in the Status
|
1748 |
|
|
Register. \n
|
1749 |
|
|
If the branch condition is not met, no further action is taken. However, if the
|
1750 |
|
|
branch condition is evaluated as true, then all previous stages are flushed and
|
1751 |
|
|
a branch is requested from the Stall and Flush Unit. The SFU, in turn, asks the PM
|
1752 |
|
|
manager permission to access the Program Memory and modify the program flow. \n
|
1753 |
|
|
Branches take place in stage s6. \n
|
1754 |
|
|
\n
|
1755 |
|
|
Not taken branches take 2 clocks, while taken branches take 4 clocks. \n
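The branch decision and its cost can be summarized as follows. In this Python sketch the function name is hypothetical. The target follows Atmel's definition of BRBS/BRBC (PC + k + 1 with a signed 7 bit k), and the clock counts are the pAVR figures given above.

```python
# Hypothetical sketch of branch evaluation; clock counts as stated for pAVR.
PC_MASK = (1 << 22) - 1  # 22 bit Program Counter


def branch_outcome(pc, sreg, s, k7, branch_if_set):
    """Evaluate BRBS (branch_if_set=True) or BRBC (branch_if_set=False).

    pc: address of the branch; sreg: Status Register value;
    s: tested SREG bit (0..7); k7: 7 bit two's complement offset.
    Returns (next_pc, clocks)."""
    bit_set = ((sreg >> s) & 1) == 1
    if bit_set != branch_if_set:
        return (pc + 1) & PC_MASK, 2      # not taken: fall through
    k = k7 - 0x80 if k7 & 0x40 else k7    # sign-extend the 7 bit offset
    return (pc + k + 1) & PC_MASK, 4      # taken: PM manager reloads the PC
```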
|
1756 |
|
|
|
1757 |
|
|
\par Branch state machine
|
1758 |
|
|
\n
|
1759 |
|
|
\image html pavr_pipe_branches_01.gif
|
1760 |
|
|
\n
|
1761 |
|
|
*/
|
1762 |
|
|
|
1763 |
|
|
|
1764 |
|
|
/*!
|
1765 |
|
|
\defgroup pavr_pipeline_skips Skips
|
1766 |
|
|
\ingroup pavr_pipeline
|
1767 |
|
|
\par A few details
|
1768 |
|
|
Skips are implemented as branches that have the relative target address equal to
|
1769 |
|
|
0, 1 or 2, depending on the skip condition and on whether the following
|
1770 |
|
|
instruction has 16 or 32 bits. \n
|
1771 |
|
|
There are two kinds of skips: one category that makes the skip request in stage
|
1772 |
|
|
s6 (like branches), and one that requests the skip in s61. The first category
|
1773 |
|
|
includes instructions CPSE (Compare registers and skip if equal), SBRC and SBRS
|
1774 |
|
|
(skip if bit in register is cleared/set). The second category includes SBIC, SBIS
|
1775 |
|
|
(Skip if bit in IO register is cleared/set). \n
|
1776 |
|
|
\n
|
1777 |
|
|
CPSE, SBRC and SBRS take 2 clocks if not taken, and 4 clocks if taken. \n
|
1778 |
|
|
SBIC and SBIS take 3 clocks if not taken, and 5 clocks if taken. \n
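To decide the offset of the skip "branch", the SFU must know whether the following instruction is one or two words long. The Python sketch below uses hypothetical function names. It recognizes the 32 bit instructions (JMP, CALL, LDS, STS) by their standard AVR opcode patterns and restates the clock counts given above. It is a behavioural illustration, not pAVR's decoder.

```python
# Hypothetical helpers; opcode masks follow Atmel's AVR instruction encoding.
def is_two_word(opcode):
    """True for the 32 bit AVR instructions JMP, CALL, LDS and STS."""
    return ((opcode & 0xFE0E) in (0x940C, 0x940E)       # JMP, CALL
            or (opcode & 0xFE0F) in (0x9000, 0x9200))   # LDS, STS


def skip_offset(condition_true, next_opcode):
    """Relative offset of the skip branch: 0 (no skip), 1 or 2 words."""
    if not condition_true:
        return 0
    return 2 if is_two_word(next_opcode) else 1


def skip_clocks(requested_in_s61, condition_true):
    """CPSE/SBRC/SBRS request in s6; SBIC/SBIS request one stage later, in s61."""
    base = 3 if requested_in_s61 else 2
    return base + 2 if condition_true else base
```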
|
1779 |
|
|
|
1780 |
|
|
\par Skip state machine
|
1781 |
|
|
\n
|
1782 |
|
|
\image html pavr_pipe_skips_01.gif
|
1783 |
|
|
\n
|
1784 |
|
|
*/
|
1785 |
|
|
|
1786 |
|
|
|
1787 |
|
|
/*!
|
1788 |
|
|
\defgroup pavr_pipeline_calls Calls
|
1789 |
|
|
\ingroup pavr_pipeline
|
1790 |
|
|
\par A few details
|
1791 |
|
|
There are 4 call instructions, analogous to the \ref pavr_pipeline_jumps "jump"
|
1792 |
|
|
instructions:
|
1793 |
|
|
|
1794 |
|
|
RCALL (relative call) \n
|
1795 |
|
|
The call address is obtained by adding to the current Program Counter a 12
|
1796 |
|
|
bit signed offset obtained from the instruction word.
|
1797 |
|
|
ICALL (indirect call) \n
|
1798 |
|
|
The call address is read from the Z pointer register. \n
|
1799 |
|
|
The destination resides in the lower 64 Kwords of Program Memory. \n
|
1800 |
|
|
EICALL (extended indirect call) \n
|
1801 |
|
|
The call address is read from EIND:Z (upper 6 bits from the EIND register in the IOF,
|
1802 |
|
|
and lower 16 bits from the Z pointer in the RF). \n
|
1803 |
|
|
This call can reach the whole 22 bit address space of the Program Memory. \n
|
1804 |
|
|
CALL (far call) \n
|
1805 |
|
|
The call address is read from two consecutive instruction words. \n
|
1806 |
|
|
This call can reach the whole 22 bit address space of the Program Memory. \n
|
1807 |
|
|
|
1808 |
|
|
\n
|
1809 |
|
|
Apart from these, there is another kind of call, automatically inserted into the
|
1810 |
|
|
pipeline when an interrupt is processed. In addition to what a regular call does, the
|
1811 |
|
|
implicit interrupt call also clears the general interrupt flag (flag I in the
|
1812 |
|
|
Status Register). This way, nested interrupts are disabled by default. However,
|
1813 |
|
|
they can be enabled explicitly. This behavior is questionable, but is implemented
|
1814 |
|
|
for the sake of AVR compatibility. \n
|
1815 |
|
|
After an interrupt generates an implicit call, further interrupts are disabled for
|
1816 |
|
|
4 clocks. This way, at least one instruction will be executed from the called
|
1817 |
|
|
subroutine. Only after that, another interrupt can change the instruction flow. \n
|
1818 |
|
|
\n
|
1819 |
|
|
All calls take 4 clocks. \n
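A few lines of Python can model the stack side of a call, including the extra work done by the implicit interrupt call. The sketch uses hypothetical names. It assumes the return address is pushed low byte first with SP post-decrement, like PUSH, following the order of the pclo8, pcmid8 and pchi8 DACU write requests. It uses bit 7 of SREG as the I flag.

```python
# Hypothetical behavioural sketch of the stack writes done by a call.
SREG_I = 1 << 7  # general interrupt flag I (bit 7 of SREG)


def push_return_address(mem, sp, ret_addr):
    """Push a 22 bit return address one byte at a time; return the new SP.

    Low, middle and high bytes are written in that order, each at SP
    followed by an SP decrement, so SP ends up 3 lower."""
    for byte in (ret_addr & 0xFF, (ret_addr >> 8) & 0xFF, (ret_addr >> 16) & 0x3F):
        mem[sp] = byte
        sp -= 1
    return sp


def interrupt_call(mem, sp, sreg, ret_addr, vector):
    """Implicit interrupt CALL: a regular call that also clears flag I.

    Returns (new_pc, new_sp, new_sreg)."""
    sp = push_return_address(mem, sp, ret_addr)
    return vector, sp, sreg & ~SREG_I & 0xFF
```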
|
1820 |
|
|
|
1821 |
|
|
\par Call state machine
|
1822 |
|
|
\n
|
1823 |
|
|
\image html pavr_pipe_calls_01.gif
|
1824 |
|
|
\n
|
1825 |
|
|
*/
|
1826 |
|
|
|
1827 |
|
|
|
1828 |
|
|
/*!
|
1829 |
|
|
\defgroup pavr_pipeline_returns Returns
|
1830 |
|
|
\ingroup pavr_pipeline
|
1831 |
|
|
\par A few details
|
1832 |
|
|
There are two kinds of returns:
|
1833 |
|
|
|
1834 |
|
|
RET \n
|
1835 |
|
|
Return from subroutine. \n
|
1836 |
|
|
The Program Counter is loaded with the return address (22 bit wide) read
|
1837 |
|
|
from the stack, and the Stack Pointer is incremented by 3.
|
1838 |
|
|
RETI \n
|
1839 |
|
|
The same as RET, but in addition sets the general interrupt flag (flag I in
|
1840 |
|
|
the Status Register).
|
1841 |
|
|
|
1842 |
|
|
\n
|
1843 |
|
|
Returns are the slowest instructions in the pAVR implementation of the AVR
|
1844 |
|
|
instruction set. They take 9 clocks. \n
|
1845 |
|
|
The first 2 clocks are spent waiting for the previous instructions to write the
|
1846 |
|
|
Unified Memory. During the next 5 clocks, the Program Counter is read from the Unified
|
1847 |
|
|
Memory. In a future version, this part might take only 4 clocks. Finally,
|
1848 |
|
|
another 2 clocks are spent while bringing the target instruction into the
|
1849 |
|
|
instruction register. \n
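The return sequence and its 9-clock budget can be sketched in Python. The names are hypothetical. The stack layout is an assumption consistent with the call sequence: the low byte is pushed first, SP is post-decremented on push and pre-incremented on pop.

```python
# Hypothetical behavioural sketch of RET/RETI.
SREG_I = 1 << 7  # general interrupt flag I (bit 7 of SREG)

# pAVR's RET/RETI clock budget, as described above.
RET_CLOCKS = 2 + 5 + 2  # wait for UM writes + read PC from UM + refill


def ret(mem, sp, sreg, is_reti=False):
    """Pop a 22 bit return address; return (pc, new_sp, new_sreg).

    The high byte sits at the lowest address, so it is read first;
    SP ends up incremented by 3."""
    hi = mem[sp + 1] & 0x3F
    mid = mem[sp + 2]
    lo = mem[sp + 3]
    if is_reti:
        sreg |= SREG_I  # RETI also sets the general interrupt flag
    return (hi << 16) | (mid << 8) | lo, sp + 3, sreg
```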
|
1850 |
|
|
\n
|
1851 |
|
|
\image html pavr_pipe_returns_01.gif
|
1852 |
|
|
\n
|
1853 |
|
|
*/
|
1854 |
|
|
|
1855 |
|
|
|
1856 |
|
|
|
1857 |
|
|
/*!
|
1858 |
|
|
\defgroup pavr_pipeline_int Interrupts
|
1859 |
|
|
\ingroup pavr_pipeline
|
1860 |
|
|
\par General
|
1861 |
|
|
The Interrupt System can force calls into pipeline stage s3 as a
|
1862 |
|
|
result of specific IO activity. \n
|
1863 |
|
|
\n
|
1864 |
|
|
\par Implementation
|
1865 |
|
|
The core of the Interrupt System is the Interrupt Manager module. It prioritizes
|
1866 |
|
|
the interrupt sources, checks if interrupts are enabled and if the pipeline is
|
1867 |
|
|
ready to process interrupts, and finally sends interrupt requests to the
|
1868 |
|
|
pipeline, together with the associated interrupt vector and other pipeline
|
1869 |
|
|
control signals. \n
|
1870 |
|
|
\n
|
1871 |
|
|
\image html pavr_hwres_int_01.gif
|
1872 |
|
|
\n
|
1873 |
|
|
The pipeline acknowledges interrupt requests by forcing the Instruction Decoder
|
1874 |
|
|
to decode a call instruction, with the absolute jump address given by the
|
1875 |
|
|
Interrupt Manager. The next 2 instructions, which were already needlessly fetched, are
|
1876 |
|
|
flushed. \n
|
1877 |
|
|
\n
|
1878 |
|
|
The interrupt vectors are parameterized, and can be placed anywhere in the
|
1879 |
|
|
Program Memory. \n
|
1880 |
|
|
Every interrupt has a parameterized priority. \n
|
1881 |
|
|
In the present implementation, up to 32 interrupt sources are handled. \n
|
1882 |
|
|
Two interrupt sources are implemented:
|
1883 |
|
|
\ref pavr_hwres_iof_perif_int0 "external interrupt 0" and
|
1884 |
|
|
\ref pavr_hwres_iof_perif_t0 "timer 0" interrupt. \n
|
1885 |
|
|
\n
|
1886 |
|
|
Because the Interrupt Manager shares much with the IO File, it is not
|
1887 |
|
|
built as a separate entity, but rather embedded into the IO File. The Interrupt
|
1888 |
|
|
Manager might be implemented as a separate entity in a future version of pAVR. \n
|
1889 |
|
|
\n
|
1890 |
|
|
The interrupt latency is 5 clocks (1 clock needed by the interrupt manager and
|
1891 |
|
|
4 clocks needed by the implicit call).
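
The selection step of the Interrupt Manager can be sketched in Python. This is a behavioural illustration with hypothetical names. Because the priorities are parameters in pAVR, the convention used here (a smaller priority number means a more urgent interrupt) is an assumption.

```python
# Hypothetical sketch; a smaller priority number is assumed to be more urgent.
NUM_SOURCES = 32
INTERRUPT_LATENCY = 1 + 4  # interrupt manager clock + implicit call clocks


def select_interrupt(pending, enabled, priority, vectors, i_flag, pipeline_ready):
    """Return (source, vector) of the interrupt to inject, or None.

    pending, enabled: bit masks with one bit per source.
    priority[n]: priority parameter of source n.
    vectors[n]: Program Memory address of the handler of source n."""
    if not (i_flag and pipeline_ready):
        return None
    active = pending & enabled
    candidates = [n for n in range(NUM_SOURCES) if (active >> n) & 1]
    if not candidates:
        return None
    best = min(candidates, key=lambda n: priority[n])
    return best, vectors[best]
```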
|
1892 |
|
|
\n
|
1893 |
|
|
*/
|
1894 |
|
|
|
1895 |
|
|
|
1896 |
|
|
/*!
|
1897 |
|
|
\defgroup pavr_pipeline_others Others
|
1898 |
|
|
\ingroup pavr_pipeline
|
1899 |
|
|
\par LPM/ELPM state machine
|
1900 |
|
|
This is how the LPM/ELPM state machine plugs into the pipeline: \n
|
1901 |
|
|
\n
|
1902 |
|
|
\image html pavr_pipe_others_01.gif
|
1903 |
|
|
\n
|
1904 |
|
|
*/
|
1905 |
|
|
|
1906 |
|
|
|
1907 |
|
|
|
1908 |
|
|
/*!
|
1909 |
|
|
\defgroup pavr_test Testing
|
1910 |
|
|
\par Testing strategy
|
1911 |
|
|
When testing a certain entity, the following \b testing \b strategy was adopted:
|
1912 |
|
|
|
1913 |
|
|
embed that entity into a larger one that also includes all other
|
1914 |
|
|
ingredients needed for a real-life simulation of the tested entity. Typical
|
1915 |
|
|
examples of such `other ingredients' are RAMs and multiplexers.
|
1916 |
|
|
run custom VHDL tests that cover as much as possible of the functionality
|
1917 |
|
|
of the device under test. Extreme cases are tested first.
|
1918 |
|
|
|
1919 |
|
|
\n
|
1920 |
|
|
Two kinds of tests were conducted on pAVR:
|
1921 |
|
|
|
1922 |
|
|
every module of pAVR was separately tested as described in the testing
|
1923 |
|
|
strategy above.
|
1924 |
|
|
pAVR as a whole was tested as described in the testing strategy above.
|
1925 |
|
|
|
1926 |
|
|
\n
|
1927 |
|
|
|
1928 |
|
|
\par Testing pAVR modules
|
1929 |
|
|
Each pAVR module was separately tested. \n
|
1930 |
|
|
The particular tests carried out are presented below, grouped by the entities
|
1931 |
|
|
under test:
|
1932 |
|
|
|
1933 |
|
|
\b utilities defined in `std_util.vhd' \n
|
1934 |
|
|
The associated test file is `test_std_util.vhd'. \n
|
1935 |
|
|
The utilities defined in `std_util.vhd' are:
|
1936 |
|
|
|
1937 |
|
|
type conversion routines often used throughout the other source files
|
1938 |
|
|
in this project
|
1939 |
|
|
basic arithmetic functions
|
1940 |
|
|
sign and zero-extend functions \n
|
1941 |
|
|
Both are tested in `test_std_util.vhd'. \n
|
1942 |
|
|
Extreme cases and typical cases are considered. \n
|
1943 |
|
|
vector comparison function \n
|
1944 |
|
|
Tested in `test_std_util.vhd'. \n
|
1945 |
|
|
Extreme cases and typical cases are considered. \n
|
1946 |
|
|
|
1947 |
|
|
\b ALU \n
|
1948 |
|
|
The associated tests are defined in `test_pavr_alu.vhd'. They consist of
|
1949 |
|
|
checking the ALU output and flags output for all ALU opcodes, one by one,
|
1950 |
|
|
for all of these situations:
|
1951 |
|
|
|
1952 |
|
|
carry in = 0
|
1953 |
|
|
carry in = 1
|
1954 |
|
|
additions generate overflow
|
1955 |
|
|
subtractions generate overflow
|
1956 |
|
|
|
1957 |
|
|
There are 26 ALU opcodes to be checked for each situation.
|
1958 |
|
|
\b Register \b File \n
|
1959 |
|
|
The associated tests are defined in `test_pavr_register_file.vhd'. \n
|
1960 |
|
|
The following tests are done:
|
1961 |
|
|
|
1962 |
|
|
read all ports, one at a time
|
1963 |
|
|
|
1964 |
|
|
read port 1 (RFRD1)
|
1965 |
|
|
read port 2 (RFRD2)
|
1966 |
|
|
write port (RFWR)
|
1967 |
|
|
write pointer register X (RFXWR)
|
1968 |
|
|
write pointer register Y (RFYWR)
|
1969 |
|
|
write pointer register Z (RFZWR)
|
1970 |
|
|
|
1971 |
|
|
combined RFRD1, RFRD2, RFWR \n
|
1972 |
|
|
They should work simultaneously.
|
1973 |
|
|
combined RFXWR, RFYWR, RFZWR \n
|
1974 |
|
|
They should work simultaneously.
|
1975 |
|
|
combined RFRD1, RFRD2, RFWR, RFXWR, RFYWR, RFZWR \n
|
1976 |
|
|
That is, all RF ports are accessed simultaneously. They should
|
1977 |
|
|
do their job. \n
|
1978 |
|
|
However, note that the pointer registers are accessible for writing
|
1979 |
|
|
by their own ports but also by the RF write port. Writing them via
|
1980 |
|
|
pointer register write ports overrides writing via the general write
|
1981 |
|
|
port. Even though concurrent writing could happen in a perfectly legal
|
1982 |
|
|
AVR implementation, AVR's behavior is unpredictable (which write port
|
1983 |
|
|
has priority). For pAVR, we chose the priority described above.
|
1984 |
|
|
|
1985 |
|
|
\b IO \b File \n
|
1986 |
|
|
The associated tests are defined in `test_pavr_io_file.vhd'. \n
|
1987 |
|
|
The following tests are performed on the IOF:
|
1988 |
|
|
|
1989 |
|
|
test the IOF general write/read/bit processing port. \n
|
1990 |
|
|
Test all opcodes that this port is capable of:
|
1991 |
|
|
|
1992 |
|
|
wrbyte
|
1993 |
|
|
rdbyte
|
1994 |
|
|
clrbit
|
1995 |
|
|
setbit
|
1996 |
|
|
stbit
|
1997 |
|
|
ldbit
|
1998 |
|
|
|
1999 |
|
|
test the IOF port A. \n
|
2000 |
|
|
Port A is intended to give pAVR pin-level IO connectivity with the
|
2001 |
|
|
outside world. \n
|
2002 |
|
|
Test reading from and writing to Port A. \n
|
2003 |
|
|
Test that Port A pins correctly take the appropriate logic values
|
2004 |
|
|
(high, low, high Z or weak high).
|
2005 |
|
|
test Timer 0.
|
2006 |
|
|
|
2007 |
|
|
test Timer 0 prescaler.
|
2008 |
|
|
test Timer 0 overflow.
|
2009 |
|
|
test Timer 0 interrupt.
|
2010 |
|
|
|
2011 |
|
|
test External Interrupt 0. \n
|
2012 |
|
|
Test if each possible configuration (activation on low level, rising
|
2013 |
|
|
edge or falling edge) correctly triggers External Interrupt 0 flag.
|
2014 |
|
|
|
2015 |
|
|
\b Data \b Memory \n
|
2016 |
|
|
The tests defined in `test_pavr_dm.vhd' are simple read-write confirmations
|
2017 |
|
|
that the Data Memory does its job.
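
The Register File tests above describe the write-port priority chosen for pAVR. It can be expressed as a small behavioural model of one pointer register. The function name and argument format are hypothetical; only the priority rule comes from the text.

```python
# Hypothetical model of one 16 bit pointer register (X, Y or Z) for one clock.
def next_pointer_value(current, general_wr=None, pointer_wr=None):
    """Return the pointer register value after concurrent writes.

    general_wr: optional (byte_index, value) written through the general RF
                write port; byte_index 0 is the low register, 1 the high one.
    pointer_wr: optional 16 bit value written through the dedicated pointer
                write port (RFXWR, RFYWR or RFZWR).
    pAVR's rule: the dedicated pointer port overrides the general port."""
    if pointer_wr is not None:
        return pointer_wr & 0xFFFF
    if general_wr is not None:
        index, value = general_wr
        shift = 8 * index
        return (current & ~(0xFF << shift) & 0xFFFF) | ((value & 0xFF) << shift)
    return current
```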
|
2018 |
|
|
|
2019 |
|
|
|
2020 |
|
|
\par Testing the pAVR entity
|
2021 |
|
|
pAVR as a whole was tested by building a top-level entity that embeds a pAVR, its
|
2022 |
|
|
Program Memory and some multiplexers. Those multiplexers are meant to give Program
|
2023 |
|
|
Memory control to the test entity (for properly setting up Program Memory
|
2024 |
|
|
contents) or to pAVR (while pAVR is actually being monitored as it executes
|
2025 |
|
|
instructions from the Program Memory). \n
|
2026 |
|
|
\n
|
2027 |
|
|
The binary file that will be executed by pAVR during the test is automatically
|
2028 |
|
|
loaded into the Program Memory using an ANSI C utility, TagScan. The test entity
|
2029 |
|
|
has a number of tags spread over the source code, as comments. The TagScan utility
|
2030 |
|
|
reads the binary file to be loaded, scans the test file, and inserts VHDL
|
2031 |
|
|
statements into the properly tagged places. These statements load the Program
|
2032 |
|
|
Memory using its own write port. This way of initializing the Program Memory
|
2033 |
|
|
seems more general (and surely more interesting) than using file IO VHDL
|
2034 |
|
|
functions. \n
|
2035 |
|
|
The TagScan utility is also used for other purposes, for example for inserting a
given header into all source files. It is heavily used as
|
2037 |
|
|
a general preprocessor. \n
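A short Python sketch can illustrate the Program Memory loading idea. TagScan itself is written in ANSI C, and its tag syntax is not documented here. The tag string, the VHDL signal names (pm_wr_addr, pm_wr_data, clk) and the function names below are therefore all hypothetical. The sketch only shows the principle: split a binary into 16 bit words, then expand a tag into VHDL write statements.

```python
# Hypothetical illustration of tag-driven PM loading; not TagScan's actual syntax.
TAG = "-- TAGSCAN: PM_LOAD"  # hypothetical tag comment in the test entity


def words_from_binary(data):
    """Split a binary image into 16 bit PM words (AVR images are little-endian)."""
    if len(data) % 2:
        data = data + bytes(1)  # pad an odd-length image with a zero byte
    return [data[i] | (data[i + 1] << 8) for i in range(0, len(data), 2)]


def pm_load_statements(words):
    """One write through the PM write port per word (hypothetical signals)."""
    stmts = []
    for addr, word in enumerate(words):
        stmts.append(f"pm_wr_addr <= std_logic_vector(to_unsigned({addr}, pm_wr_addr'length));")
        stmts.append(f'pm_wr_data <= x"{word:04X}";')
        stmts.append("wait until rising_edge(clk);")
    return stmts


def expand_tags(lines, words):
    """Copy the test entity source, inserting the statements after each tag."""
    out = []
    for line in lines:
        out.append(line)
        if line.strip() == TAG:
            indent = line[: len(line) - len(line.lstrip())]
            out.extend(indent + s for s in pm_load_statements(words))
    return out
```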
|
2038 |
|
|
\n
|
2039 |
|
|
Testing pAVR as a whole actually means designing and running binaries that put
|
2040 |
|
|
pAVR in extreme situations. \n
|
2041 |
|
|
The following tests are done:
|
2042 |
|
|
\htmlonly
|
2043 |
|
|
|
2044 |
|
|
Interrupts
|
2045 |
|
|
This exercises pAVR interrupt handling.
|
2046 |
|
|
All interrupts are tested.
|
2047 |
|
|
The associated peripherals (Port A, Timer 0 and External Interrupt 0) are
|
2048 |
|
|
put in a variety of conditions.
|
2049 |
|
|
Results:
|
2050 |
|
|
tbd
|
2051 |
|
|
General test
|
2052 |
|
|
This is a hand-written assembler source that is meant to be assembled and
|
2053 |
|
|
run on pAVR.
|
2054 |
|
|
It exercises each pAVR instruction, one by one.
|
2055 |
|
|
It tries to put pAVR in the most difficult situations for each instruction. For
|
2056 |
|
|
example, it exercises:
|
2057 |
|
|
|
2058 |
|
|
concurrent stalls
|
2059 |
|
|
stalls combined with 32 bit instructions
|
2060 |
|
|
stalls combined with instructions that change the instruction flow
|
2061 |
|
|
control hazard candidates (stress the Program Memory Manager and
|
2062 |
|
|
the Stall and Flush Unit)
|
2063 |
|
|
data hazard candidates (stress the Bypass Unit)
|
2064 |
|
|
|
2065 |
|
|
Results:
|
2066 |
|
|
Passed OK. The verification consisted of checking each instruction, each
|
2067 |
|
|
intermediate result and each relevant intermediate internal state.
|
2068 |
|
|
2069 |
|
|
2070 |
|
|
<table>
<tr><th>Assembler</th><th>Clocks</th><th>Instructions</th><th>CPI</th></tr>
<tr><td>avrasm32, by Atmel</td><td>667</td><td>361</td><td>1.85</td></tr>
</table>
|
2079 |
|
|
|
|
|
|
2080 |
|
|
Sieve
|
2081 |
|
|
Sieve of Eratosthenes; finds the first 100 prime numbers.
|
2082 |
|
|
Written in ANSI C.
|
2083 |
|
|
Results:
|
2084 |
|
|
2085 |
|
|
2086 |
|
|
<table>
<tr><th>Compiler</th><th>Clocks</th><th>Instructions</th><th>CPI</th></tr>
<tr><td>avr-gcc, O0</td><td>12170</td><td>8851</td><td>1.37</td></tr>
<tr><td>avr-gcc, O3</td><td>11946</td><td>8824</td><td>1.35</td></tr>
</table>
|
2100 |
|
|
|
|
|
|
|
2101 |
|
|
TagScan
|
2102 |
|
|
Exercises string manipulation routines.
|
2103 |
|
|
Written in ANSI C.
|
2104 |
|
|
Results:
|
2105 |
|
|
2106 |
|
|
2107 |
|
|
<table>
<tr><th>Compiler</th><th>Clocks</th><th>Instructions</th><th>CPI</th></tr>
<tr><td>tbd</td><td>tbd</td><td>tbd</td><td>tbd</td></tr>
</table>
|
2116 |
|
|
|
|
|
|
<br>
<b>C compiler</b><br>
Written in ANSI C.<br>
Results:<br>
<table border="1">
<tr><th>Compiler</th><th>Clocks</th><th>Instructions</th><th>CPI</th></tr>
<tr><td>tbd</td><td>tbd</td><td>tbd</td><td>tbd</td></tr>
</table>
<br>
<b>Waves</b><br>
Simulates waves on the surface of a liquid.<br>
Written in ANSI C.<br>
Uses floating point numbers (observation: the avr-gcc compiler seems to
take about 200 pAVR clocks per floating point operation).<br>
A mesh of only 5x5 points is considered, and only 5 iterations
are done. Larger values make the simulation unacceptably long on
the available computer.<br>
<br>
The result is checked by converting the array of 25 floats
into a scaled array of 25 chars, copying these chars from Data
Memory (by hand), constructing a 3D image of the result, and
comparing it to a reference 3D image.<br>
<br>
Results: passed OK. As expected, the char array under test exactly matches
the reference array.<br>
<table border="1">
<tr><th>Compiler</th><th>Clocks</th><th>Instructions</th><th>CPI</th></tr>
<tr><td>avr-gcc</td><td>209,175</td><td>122,236</td><td>1.71</td></tr>
</table>
\endhtmlonly
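
Each CPI value in the tables above is the clock count divided by the instruction count. The following is a minimal C sketch of that arithmetic. The function name cpi is hypothetical and is not part of the pAVR sources.

```c
// Clocks per instruction: total simulated clocks divided by the number
// of executed instructions. Returns 0 when no instruction was executed.
double cpi(unsigned long clocks, unsigned long instructions)
{
    if (instructions == 0)
        return 0.0;
    return (double)clocks / (double)instructions;
}
```

For example, the Waves row gives 209175 / 122236, which is about 1.71. The Sieve row built with -O3 gives 11946 / 8824, which is about 1.35.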
\n
*/



/*!
\defgroup pavr_test_bugs Bugs
\ingroup pavr_test
\par Errata to Atmel's AVR documentation:
The corrected versions of some paragraphs from Atmel's documentation are shown
below. \n
Original, wrong terms are struck through, while corrected terms are in bold: \n
- The following text can be found throughout the references. \n
\n
\htmlonly
<p>
"...<br>
RAMPD<br>
Register concatenated with the <s>Z register</s> <b>instruction word</b>
enabling direct addressing of the whole data space on MCUs with more than 64K
bytes data space.<br>
EIND<br>
Register concatenated with the <s>instruction word</s> <b>Z register</b>
enabling indirect jump and call to the whole program space on MCUs with more
than 64K <s>bytes</s> <b>words</b> program space.<br>
..."
</p>
\endhtmlonly
- In the `AVR Instruction Set' document, page 60:
\n
\htmlonly
<p>
"...<br>
V: Rd7 * <s>/Rd7</s> <b>/Rr7</b> * /R7 + /Rd7 * Rr7 * R7<br>
..."
</p>
\endhtmlonly
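
The corrected statements can be illustrated with a short C sketch. The function names are hypothetical, and the number of address bits above bit 15 depends on the particular device.

- The first function forms the target word address of EIJMP/EICALL from EIND and Z.
- The second forms the LDS/STS data address from RAMPD and the 16-bit constant taken from the instruction word.
- The third evaluates the corrected V expression for R = Rd - Rr. This is the usual two's complement overflow test for a subtraction.

The text does not say which instruction page 60 describes. Applying the expression to a register-register subtraction or compare is an assumption based on the Rd/Rr/R operands.

```c
#include <stdbool.h>
#include <stdint.h>

// EIJMP/EICALL: EIND supplies the bits above bit 15 of the target
// word address, the Z register supplies bits 15:0.
uint32_t eijmp_target(uint8_t eind, uint16_t z)
{
    return ((uint32_t)eind << 16) | z;
}

// LDS/STS on parts with more than 64K bytes of data space: RAMPD supplies
// the bits above bit 15 of the data address, the 16-bit constant k from
// the instruction word supplies bits 15:0.
uint32_t lds_address(uint8_t rampd, uint16_t k)
{
    return ((uint32_t)rampd << 16) | k;
}

static bool bit7(uint8_t x) { return (x >> 7) & 1u; }

// V flag of R = Rd - Rr using the corrected expression
// V = Rd7 * /Rr7 * /R7 + /Rd7 * Rr7 * R7  (* = AND, + = OR, / = NOT).
bool sub_v_flag(uint8_t rd, uint8_t rr)
{
    uint8_t r = (uint8_t)(rd - rr);
    return (bit7(rd) && !bit7(rr) && !bit7(r)) ||
           (!bit7(rd) && bit7(rr) && bit7(r));
}
```

For instance, 0x80 - 0x01 (that is, -128 - 1) sets V, while 0x05 - 0x0A (5 - 10) does not.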

\par Atmel's AVRStudio simulator bugs
- bug001
  - \b symptom: the NEG instruction computes the H flag with a different formula
    than the one given in the AVR Instruction Set (H=R3+Rd3). \n
    Whether the bug is in the simulator or in the document remains to be seen. \n
    Versions 3.53 and 4.04 of AVRStudio behave the same (weird) way. \n
    Example: initially having SREG=0x01 and R10=0xD9, NEG R10 sets SREG to 0x01
    instead of 0x21. \n
    The AVRStudio formula for H seems to be R3*(not Rd3) rather than R3+Rd3.
- bug002
  - \b symptom: when trying to set/reset port A pins, there is a 1 clock delay
    between the moment PORTA receives the bits and the moment PINA gets updated.
    Those events should have been simultaneous (of course, port A direction was
    considered already configured as output, by setting DDRA(i)=1).
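
NEG Rd computes 0x00 - Rd. A borrow out of bit 3 therefore happens exactly when the low nibble of Rd is nonzero. H = R3 + Rd3 expresses that condition: a nonzero low nibble either has bit 3 set, or it produces a result whose bit 3 is set.

The C sketch below reproduces the bug001 example. The function name and the studio_h switch are hypothetical.

- With the documented formula, NEG on 0xD9 starting from SREG=0x01 yields 0x21.
- With the formula AVRStudio appears to use, the same operation yields 0x01.

The other flags follow the usual NEG definitions: C is set unless the result is zero, and V is set only for a 0x80 result.

```c
#include <stdbool.h>
#include <stdint.h>

// SREG bit positions (standard AVR layout).
enum { SREG_C = 0, SREG_Z = 1, SREG_N = 2, SREG_V = 3,
       SREG_S = 4, SREG_H = 5, SREG_T = 6, SREG_I = 7 };

// Executes NEG Rd (R = 0x00 - Rd) and returns the new SREG value.
// H uses the documented formula H = R3 + Rd3 or, when studio_h is set,
// the formula AVRStudio appears to use, H = R3 * /Rd3.
// T and I are not affected by NEG and are copied through.
uint8_t neg_sreg(uint8_t sreg, uint8_t rd, bool studio_h)
{
    uint8_t r = (uint8_t)(0u - rd);
    bool r3  = (r >> 3) & 1u;
    bool rd3 = (rd >> 3) & 1u;
    bool h = studio_h ? (r3 && !rd3) : (r3 || rd3);
    bool n = (r >> 7) & 1u;
    bool v = (r == 0x80);   // only -128 cannot be negated
    bool z = (r == 0);
    bool c = (r != 0);
    bool s = (n != v);      // S = N xor V

    uint8_t out = sreg & (uint8_t)((1u << SREG_T) | (1u << SREG_I));
    out |= (uint8_t)((c << SREG_C) | (z << SREG_Z) | (n << SREG_N) |
                     (v << SREG_V) | (s << SREG_S) | (h << SREG_H));
    return out;
}
```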



\b pAVR \b bugs \b history
\par 28-31 July 2002
- The Program Memory and the Program Counter are handled in different places,
  even though they share much functionality. Moreover, the Program Counter has
  no explicit manager of its own. This makes PM and PC quite difficult to
  maintain. \n
  Reorganized PM and PC handling. Now they are handled by a common manager,
  the PM manager. \n
- Every test runs smoothly so far.

\par 27 July 2002
- The Stall and Flush Unit and the Shadow Manager are difficult to maintain
  because of too many rules and exceptions. \n
  Reorganized the SFU so that its behavior follows only one rule, the so-called
  `SFU rule': older hardware resource requests have priority over younger ones. \n
  Reorganized the Shadow Manager so that its behavior accurately implements
  the shadow protocol. However, a few exceptions still exist (such as LPM
  Program Memory handling or CPSE RF handling).
- *** Modelsim 5.3 behaves strangely again. \n
  It triggers the hardware managers' assertion warnings, but when the local
  conditions are investigated, the situation is perfectly legal. It seems that
  at a moment when a signal has a 0-1 transition and another one has a 1-0
  transition, there is a `small' (theoretically 0) amount of time during which
  both signals are considered 1, and that transient triggers the warning. That
  shouldn't happen; it seems to be a Modelsim bug. \n
  However, attempts to reproduce that behavior were unsuccessful. It only
  appears sometimes; the triggering condition is well hidden. \n
  For now, it's best to ignore these warnings during simulation. However, this
  means that those assertions don't fulfill their purpose.
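
The SFU rule can be pictured as a fixed-priority arbiter in which the instruction further down the pipeline always wins: a higher stage number means an older instruction. Below is a minimal C sketch under that reading. The function name and the request array are hypothetical; the array index equals the stage number.

```c
#include <stdbool.h>

#define N_STAGES 7   // indices 1..6 stand for stages s1..s6; index 0 unused

// Returns the stage whose request is served this clock, or -1 if no stage
// requests the resource. Scanning from s6 downwards makes the oldest
// requester win; younger requesters are not cancelled, they keep
// requesting and are served in a later clock.
int sfu_grant(const bool req[N_STAGES])
{
    for (int s = N_STAGES - 1; s >= 1; s--)
        if (req[s])
            return s;
    return -1;
}
```

For example, with requests pending in s2 and s4, s4 is served first and the s2 request simply waits. This is the "delay, don't kill" behavior described for the bug018 fix (23 July).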

\par 25 July 2002
- bug023
  - \b symptom: IJMP and EIJMP don't jump where they are supposed to, if the
    instruction before them modifies the Z pointer.
  - \b remedy: IJMP and EIJMP actually jump before the BPU even gets updated
    by the previous instruction. As they use the Register File mapped Z
    pointer to find the target address, they need to be held back for one
    clock (the Z pointer is modified in stage s5). \n
    Just request a nop in pipe stage s4. Now IJMP and EIJMP take 4 clocks
    (RJMP and JMP still take 3).
  - \b status: corrected
- bug024
  - \b symptom: loads don't work any more (!). They (sometimes) get garbage.
  - \b remedy: when correcting bug021, the shadow protocol was applied to
    all devices that could use it. That was wrong. The Data Address Calculation
    Unit must not use the shadow protocol, because it gets RF/IOF/DM
    exclusivity by means of stalling, and it must be granted access to these
    resources, even during stalls. \n
    When trying to read from Unified Memory, loads got data from the shadow
    registers, not directly from the RF/IOF/DM data out.
  - \b status: corrected the DACU
- bug025
  - \b symptom: JMP gets corrupted if the previous instruction is a load.
  - \b remedy: JMP is a 32 bit instruction. Its second word (a 16 bit constant)
    can get flushed by a stall in s5 requested by the previous instruction. \n
    Flush s2 requests issued in s3 and s4 are more delicate than other flushes.
    They can interfere with stalls requested by older instructions. They must
    be stallable, because older instructions might need that. If flush s2 is
    requested in s3 or s4 while older instructions require a stall, don't
    blindly flush s2; rather, do nothing and wait for the stall to end.
    Only after that, acknowledge the flush. \n
  - \b status: corrected
- bug026
  - \b symptom: CPSE doesn't skip the following instruction when it should.
  - \b remedy: the skip condition was picked as `not zero flag' instead of
    `zero flag'.
  - \b status: corrected
- bug027
  - \b symptom: SBIC and SBIS don't do their job.
  - \b remedy: IOF read access was simply not requested.
  - \b status: corrected the Instruction Decoder by placing an IOF request
    in pipe stage s5, for SBIC and SBIS
- bug028
  - \b symptom: RCALL doesn't work.
  - \b remedy:
    - the 12 bit relative offset wasn't initialized in the Instruction
      Decoder. Just do that (cut&paste the corresponding code line from RJMP,
      as the relative jump address is placed in the same bits of the
      instruction code).
    - the return address was correct for CALL but one larger than needed
      for RCALL. Actually, CALL and RCALL need \b different return addresses,
      as CALL has 32 bits and RCALL only 16. \n
      Modification: now, the current instruction's PC is \b conditionally
      incremented in pipe stage s4. A new set of wires and registers was
      introduced so that CALL can request an increment of its return address.
      RCALL doesn't need to do that.
  - \b status: corrected the Instruction Decoder, so that CALL requests an
    increment of its return address.
- note: all instructions seem to work.
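
bug028 comes down to two small address computations. Below is a C sketch under the usual AVR conventions: program memory is word addressed, and RJMP and RCALL share the same 12-bit offset field. The function names are hypothetical, and wrap-around at the end of program memory is ignored.

```c
#include <stdbool.h>
#include <stdint.h>

// Return address pushed by a call whose first word is at word address pc.
// The call itself is skipped: CALL (two 16-bit words) returns to pc + 2,
// RCALL (one word) returns to pc + 1.
uint32_t return_address(uint32_t pc, bool is_call)
{
    return pc + (is_call ? 2u : 1u);
}

// RJMP (0xCkkk) and RCALL (0xDkkk) hold their 12-bit signed word offset k
// in the same opcode bits, so one decoder line can serve both.
// The target is pc + k + 1.
uint32_t relative_target(uint32_t pc, uint16_t opcode)
{
    int32_t k = opcode & 0x0FFF;
    if (k & 0x0800)
        k -= 0x1000;                 // sign-extend the 12-bit offset
    return (uint32_t)((int32_t)pc + k + 1);
}
```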

\par 24 July 2002
- bug020
  - \b symptom: loads placed immediately after stores that modify their
    pointer get garbage.
  - \b remedy: loads and stores can modify their data pointer. However, the
    Bypass Unit must also be updated, because the pointer registers are
    placed in the Register File. The BPU wasn't updated.
  - \b status: corrected
  - \b note: the modularity of the design (separate hardware managers, a small
    set of conventions regarding signal naming, grouping of similar-function
    code) paid off. This bug required an intervention spread out over half a
    megabyte of code. The Data Address Calculation Unit and the Bypass Unit were
    modified, and new wires and registers were defined; some of them were renamed.
- bug021
  - \b symptom: stores that modify their pointer make the following
    instruction unable to update the Bypass Unit. Moreover, the BPU is
    written with garbage.
  - \b remedy: stores and the instruction after them can require
    simultaneous BPU writes. That's because these stores make intensive
    use of the BPU and eat all its write resources. They write 3 bytes: 2 of
    them in s5 (the modified pointer) and 1 in s6 (the data to be written
    into the Register File). The one written in s6 can be simultaneous with
    the following instruction's s5 BPU write request. \n
    To correct this bug, there are 2 options: \n
    1. add a stall in pipe stage s5 for all stores. That is, stores
    will take 3 clocks. \n
    2. increase the BPU width from 2 chains to 3 chains and modify the way
    stores make use of the Bypass Unit (write everything that has to be
    written, 3 bytes, in the same pipe stage, s5). This is more attractive
    because stores still need only 2 clocks. However, the Bypass Unit
    continues to grow (from an initial depth/width of 2/2 to the present 4/3). \n
    Option 2 was chosen. \n
    The Unified Memory architecture made this bug possible. Stores
    must be able to write the Register File and, consequently, write their
    data into the BPU along with the pointer they have modified.
  - \b status: corrected
- bug022
  - \b symptom: LPM always returns 0.
  - \b remedy: multiple bugs:
    - LPM stalled s2, then read the s2 status. Seeing it `busy', it gave up
      reading what it needed and kept pavr_pm_addr_int at its present value.
      The Program Memory Manager needs to be instructed to forcibly grant LPM
      instructions access to s2, even if s2 is stalled. Also, the shadow
      protocol must be bypassed.
    - LPM didn't update the BPU.
    - pointer registers were used directly in a few hardware managers, not via
      the BPU. This enables subtle read-before-write hazards (they escaped
      until now).
  - \b status: corrected
- note: \n
  LD Rd, -X; LD Rd, X; LD Rd, X+; \n
  LD Rd, -Y; LD Rd, Y; LD Rd, Y+; LDD Rd, Y+q; \n
  LD Rd, -Z; LD Rd, Z; LD Rd, Z+; LDD Rd, Z+q; \n
  ST -X, Rr; ST X, Rr; ST X+, Rr; \n
  ST -Y, Rr; ST Y, Rr; ST Y+, Rr; STD Y+q, Rr; \n
  ST -Z, Rr; ST Z, Rr; ST Z+, Rr; STD Z+q, Rr; \n
  LPM; LPM Rd, Z; LPM Rd, Z+ \n
  seem to work.
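
The Bypass Unit can be modelled as a small set of (address, data) chains. These chains hold Register File writes that are not yet visible in the RF itself. An operand read checks them from youngest to oldest before falling back to the RF.

The C sketch below is only a behavioral model under that assumption, using the 4/3 depth/width quoted for bug021. The type and function names are hypothetical. bug020 is a case where an instruction failed to load its entry into such a structure, so a later read fell through to a stale Register File value.

```c
#include <stdbool.h>
#include <stdint.h>

#define BPU_DEPTH 4   // clocks an entry stays in the unit (4/3 from bug021)
#define BPU_WIDTH 3   // writes that can enter in the same clock

typedef struct {
    bool    active;
    uint8_t addr;     // Register File address, 0..31
    uint8_t data;
} bpu_entry;

typedef struct {
    bpu_entry e[BPU_DEPTH][BPU_WIDTH];   // row 0 holds the youngest writes
} bpu;

void bpu_reset(bpu *b)
{
    for (int d = 0; d < BPU_DEPTH; d++)
        for (int w = 0; w < BPU_WIDTH; w++)
            b->e[d][w] = (bpu_entry){false, 0, 0};
}

// One clock: every row ages by one and this clock's writes enter row 0.
// The oldest row drops out (in the real pipeline its data has reached the
// Register File by then; this model leaves RF updates to the caller).
// Within one row the chains are assumed to hold distinct addresses.
void bpu_clock(bpu *b, const bpu_entry in[BPU_WIDTH])
{
    for (int d = BPU_DEPTH - 1; d > 0; d--)
        for (int w = 0; w < BPU_WIDTH; w++)
            b->e[d][w] = b->e[d - 1][w];
    for (int w = 0; w < BPU_WIDTH; w++)
        b->e[0][w] = in[w];
}

// Operand read: the youngest matching entry wins; with no match, the
// Register File value is used.
uint8_t bpu_read(const bpu *b, uint8_t addr, const uint8_t rf[32])
{
    for (int d = 0; d < BPU_DEPTH; d++)
        for (int w = 0; w < BPU_WIDTH; w++)
            if (b->e[d][w].active && b->e[d][w].addr == addr)
                return b->e[d][w].data;
    return rf[addr & 31];
}
```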

\par 23 July 2002
- bug016
  - \b symptom: read-before-write data hazards
  - \b remedy: the BLD instruction didn't update the BPU.
  - \b status: corrected
- bug017
  - \b symptom: BLD doesn't modify the target register.
  - \b remedy: while processing the pavr_s5_iof_rq IOF request, the IOF Manager
    set the IOF bit address to zero instead of pavr_s5_iof_bitaddr. Correct that.
  - \b status: corrected
- bug018
  - \b symptom: even though they work fine separately, POP, PUSH and MOVW one
    after another (in various combinations) don't.
  - \b remedy:
    This is a triple (!) bug:
    - MOVW requires a stall in s6 while POP requires a stall in s5. The
      two stalls are simultaneous. \n
      The Stall and Flush Unit doesn't properly handle multiple stalls. \n
      Modify the SFU so that the oldest stall doesn't kill the younger one(s),
      but only delays it (them).
    - The SP was incremented during a stall, and after the stall the DACU
      received a wrong pointer (the new SP). \n
      All hardware resources must be stallable. Presently they are not.
    - The instruction after MOVW, PUSH is skipped. The PM data out shadow
      register doesn't do its job. \n
      The shadow registers are updated every clock. That's not right. \n
      Update them only if they don't already hold meaningful data (check
      the corresponding `shadow_active' flag). Otherwise, during
      successive stalls they get corrupted.
    .
    This was a tough one.
  - \b status: corrected
- bug019
  - \b symptom: the sequence \n
    LDI R17, 0xC3 \n
    ST Z+, R17 \n
    results in storing garbage into memory.
  - \b remedy: the nop requests (placed by ST) increase the needed BPU depth
    by one. Thus, the BPU depth must be increased from 3 to 4.
  - \b status: corrected
- note:
  - CBI, SBI, BST, BLD, MOVW, IN, OUT, PUSH, POP, LDS, STS seem to work.
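
The fix for the third part of bug018 amounts to a capture-once rule for each shadow register. Below is a minimal C sketch of that rule for a single shadow register. The names are hypothetical, and the real pAVR shadow protocol has more cases, such as the LPM and CPSE exceptions mentioned on 27 July.

```c
#include <stdbool.h>
#include <stdint.h>

// One shadow register in front of a resource's data-out port
// (for example PM data out). Hypothetical model, not pAVR code.
typedef struct {
    bool     active;   // role of the `shadow_active' flag: holds saved data
    uint16_t value;
} shadow_reg;

// Called once per clock. While the consumer is stalled, the port value is
// captured only if the shadow doesn't already hold meaningful data;
// refreshing it every clock (the bug) would overwrite the saved word during
// successive stalls. After the stall, the saved word is delivered once.
// Returns the value the consumer should see this clock.
uint16_t shadow_clock(shadow_reg *s, bool stalled, uint16_t port_value)
{
    if (stalled) {
        if (!s->active) {
            s->value  = port_value;
            s->active = true;
        }
        return s->value;
    }
    if (s->active) {
        s->active = false;
        return s->value;
    }
    return port_value;
}
```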

\par 22 July 2002
- bug011
  - \b symptom: DEC actually does INC
  - \b remedy: ALU operand 2 is selected as -1 in pipe stage s5, and then the
    DEC-related code does out=op1-op2, which results in out=op1+1. \n
    Just make the ALU treat INC and DEC the same way (that is, out=op1+op2).
  - \b status: corrected
- bug012
  - \b symptom: the BPU doesn't do its job.
  - \b remedy: a stupid and time-consuming bug, caused by a (too) quick cut and
    paste in the BPU code.
  - \b status: corrected
  - \b note: Modelsim PE/Plus 5.3a_p1 has a cache problem. After correcting
    this bug, the same results came back after recompiling and restarting the
    simulation. It was enough to close Modelsim and open the project again
    for things to go fine. It's not the first time Modelsim behaves this way.
- bug013
  - \b symptom: the Z flag is computed wrongly for ALU opcodes that need 8 bit
    subtraction with carry.
  - \b remedy: Z=Z*oldZ
  - \b status: corrected
- bug014
  - \b symptom: the Z flag is computed wrongly for all ALU opcodes (!).
  - \b remedy: instead of and-ing the negated bits of the output, Z was
    computed by and-ing the output's bits.
  - \b status: corrected
- bug015
  - \b symptom: read-before-write data hazards related to the IN instruction
  - \b remedy: IN doesn't write the Bypass Unit. Do that. Nasty one, requiring
    new wires and registers.
  - \b status: corrected
  - \b note: the shadow manager was completed. Quite a lot of code, hopefully
    with no new bugs.
- notes:
  - MOV, INC, DEC, AND, ANDI, OR, ORI, EOR, COM, NEG, CP, CPC, CPI, SWAP, LSR,
    ROR, ASR, multiplications (timing only), BCLR, BSET seem to work.
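
The flag and DEC fixes from bug011, bug013 and bug014 are easy to state in C. Below is a minimal sketch with hypothetical function names. The list of instructions using the carry form of Z (SBC, SBCI, CPC) is taken from the AVR Instruction Set rather than from the text above.

```c
#include <stdbool.h>
#include <stdint.h>

// Z for ordinary ALU results: the AND of the negated result bits
// (bug014 and-ed the bits themselves instead).
bool z_flag(uint8_t r)
{
    bool z = true;
    for (int i = 0; i < 8; i++)
        z = z && !((r >> i) & 1u);
    return z;
}

// Z for subtractions with carry (SBC, SBCI, CPC): Z = Z(result) * oldZ,
// so a multi-byte compare such as CP followed by CPC ends with Z set
// only if every byte compared equal (bug013).
bool z_flag_with_carry(uint8_t r, bool old_z)
{
    return z_flag(r) && old_z;
}

// DEC as fixed in bug011: operand 2 is -1 and the ALU adds, exactly as
// INC does with operand 2 = +1.
uint8_t alu_dec(uint8_t op1)
{
    const uint8_t op2 = 0xFF;   // -1 in 8-bit two's complement
    return (uint8_t)(op1 + op2);
}
```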

\par 21 July 2002
- bug008
  - \b symptom: read-before-write data hazards.
  - \b remedy: the Bypass depth was increased from 2 to 3. Design bug. \n
    *** To update the documentation!
  - \b status: corrected.
- bug009
  - \b symptom: the 16 bit arithmetic instructions write only the lower byte of
    the result in the Register File if the next few instructions aren't
    nops.
  - \b remedy: 16 bit arithmetic instructions stalled s6. During the s6 stall,
    the Bypass Unit flushed a value that was needed later. A signal that can
    stall the BPU was needed. Now, the stall s6 requests also stall the BPU. \n
    Pretty tricky design bug. \n
    *** To update the documentation! \n
  - \b status: corrected.
- bug010
  - \b symptom: stalls needed by 16 bit arithmetic instructions cause the
    instruction placed 4 clocks later to be replaced by a nop
  - \b remedy: shadow registers were assigned, but never used. PM data out
    (and consequently, the instruction register) read a nop instead of the
    correct data that was read during the stall. Now the pipeline uses the
    shadow registers related to PM data out. \n
    *** The other shadow registers (related to DM, RF, IOF and DACU data out)
    are still unused! \n
    *** To update the documentation with shadow-related issues!
  - \b status: corrected. \n
- note:
  - ADD, ADC, ADIW, SUB, SUBI, SBC, SBIW seem to work.

\par 15 July 2002
- bug004
  - \b remedy: reporting this bug was itself a bug. The Register File works fine.
    This bug report was generated by modifying the X register (RF addr 27:26)
    and expecting the RF bulk data (RF addr 0...25) to be modified, which won't
    happen.
  - \b status: ok.
- bug005
  - \b remedy: DACU data out was duplicated, with 2 different names: pavr_dacu_do
    and pavr_s6_dacudo. pavr_dacu_do was only written, and pavr_s6_dacudo was
    only read. When RET tried to read the return address from the DACU, it got
    garbage, because it read DACU data out from pavr_s6_dacudo, which was never
    assigned any value. \n
    Cut out pavr_s6_dacudo. DACU data out is now unique, for both read and
    write (that is, pavr_dacu_do). Also, the documentation was updated.
  - \b status: corrected.
- bug006
  - \b symptom: CALL doesn't work.
  - \b remedy: in the SP Manager, pavr_s5_calldec_spwr_rq was written twice, and
    pavr_s52_calldec_spwr_rq wasn't written at all, because of a careless
    cut-and-paste. As a result, during CALL, the PC's lsByte was not stored.
  - \b status: corrected
- bug007
  - \b symptom: ALU flags are not defined.
  - \b remedy: the ALU flags input was not connected to SREG (zero-level
    assignment)
  - \b status: corrected
- notes:
  - RET, CALL seem to work.
  - pAVR runs its first complete program (12 instructions).
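
bug004 came from the way the pointer registers map onto the Register File: X = R27:R26, Y = R29:R28, Z = R31:R30, with the high byte in the higher address. Below is a C sketch of that mapping. It assumes a flat 32-byte array instead of pAVR's split bulk/pointer Register File, and the names are hypothetical.

```c
#include <stdint.h>

// Low-byte RF address of each pointer register; the high byte follows it.
enum { RF_X_LO = 26, RF_Y_LO = 28, RF_Z_LO = 30 };

uint16_t rf_pointer_read(const uint8_t rf[32], int lo)
{
    return (uint16_t)(rf[lo] | (rf[lo + 1] << 8));
}

// Writing X touches only RF addresses 26 and 27, never the bulk
// locations 0...25, which is why bug004 was not a real bug.
void rf_pointer_write(uint8_t rf[32], int lo, uint16_t value)
{
    rf[lo]     = (uint8_t)(value & 0xFF);
    rf[lo + 1] = (uint8_t)(value >> 8);
}
```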

\par 13 July 2002
- bug003
  - \b symptom: RET is a mess
  - \b remedy: during nop requests, stall must have higher priority than flush
    in s2. The Stall Manager (the nop request-related lines) must take care of
    that.
  - \b status: corrected
- bug004
  - \b symptom: the RF seems to be unable to write registers other than the
    pointer registers.
  - \b status: NOT corrected!
- bug005
  - \b symptom: RET is still a mess.
  - \b status: NOT corrected!
- bugs pool: 004, 005
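
bug003 here and bug025 (25 July) both come down to the same priority rule for stage s2: a stall wins over a flush, and the flush is deferred rather than lost. Below is a minimal C sketch of that decision. The enum, the function and the pending-flush flag are assumptions made for illustration, not pAVR signals.

```c
#include <stdbool.h>

typedef enum { S2_ADVANCE, S2_HOLD, S2_FLUSH } s2_action;

// One clock of the s2 decision. stall_rq: an older instruction needs the
// pipeline frozen; flush_rq: a request to turn s2 into a nop.
// A stall wins; a flush that arrives during a stall is remembered in
// *flush_pending and acknowledged in the first clock after the stall.
s2_action s2_decide(bool stall_rq, bool flush_rq, bool *flush_pending)
{
    if (flush_rq)
        *flush_pending = true;
    if (stall_rq)
        return S2_HOLD;
    if (*flush_pending) {
        *flush_pending = false;
        return S2_FLUSH;
    }
    return S2_ADVANCE;
}
```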

\par 27 June 2002
- bug001
  - \b symptom: read-before-write data hazards. Hmm, this kind of bug shouldn't
    have occurred.
  - \b remedy: LDI didn't update BPU0. Just do that.
  - \b status: corrected.
- bug002
  - \b symptom: while reading the code, something smelled bad.
  - \b remedy: the code that computes the branch/skip conditions was not written
    at all.
  - \b status: corrected.
- notes:
  - The controller has successfully executed its first instruction (an RJMP)!
    However, it was the only one... \n
  - The kernel seems to be easy to debug thanks to its regular structure.
  - RJMP, LDI, NOP seem to work.

\n
*/



/*!
\defgroup pavr_fpga FPGA prototyping
\ingroup pavr_test
No FPGAs have been burned so far. \n
\n
\n
\n
*/



/*!
\defgroup pavr_src Sources
\par Sources
The source package contains the following files, in compilation order:
- std_util.vhd
  - Type conversion routines often used throughout the other source files in
    this project
  - Basic arithmetic functions
  - Sign and zero-extend functions
  - Vector comparison function
- pavr_util.vhd
  - Bypass Unit access function
  - Interrupt arbiter function
- pavr_constants.vhd
  - Constants needed by pAVR
  - When customizing pAVR, look here and modify (seek-and-destroy).
- pavr_data_mem.vhd
- pavr_alu.vhd
- pavr_register_file.vhd
- pavr_io_file.vhd
- pavr_control.vhd
  - pAVR pipeline (pAVR kernel)

\par Test sources
The test sources in this package implement all the tests presented
\ref pavr_test "above". \n
The test source package contains the following files:
- test_pavr_alu.vhd \n
  Tests the ALU.
- test_pavr_control_interrupts.vhd \n
  This test is yet to be done.
- test_std_util.vhd \n
  Tests the utilities defined in `std_util.vhd'.
- test_pavr_data_mem.vhd \n
  Tests the Data Memory.
- test_pavr_register_file.vhd \n
  Tests the Register File.
- test_pavr_io_file.vhd \n
  Tests the IO File.
- test_pavr_constants.vhd \n
  Defines constants needed by the main test entity.
- test_pavr_util.vhd \n
  Defines utilities needed by the main test entity.
- test_pavr_pm.vhd \n
  Defines the Program Memory that is needed by the main test entity.
- test_pavr.vhd \n
  Defines the main test entity. \n
  Tests pAVR as a whole.

\anchor pavr_src_conv
\par Conventions used when writing the VHDL sources
The terminology used reflects the data flow. \n
For example, `pavr_s4_s6_rfwr_addr1' is assigned in s3 (by the instruction
decoder), shifts into `pavr_s5_s6_rfwr_addr1', and finally shifts into
`pavr_s6_rfwr_addr1' (terminal register). Only this last one carries
information actually used by hardware resource managers. This particular one
signals an access request to the Register File write port manager. \n
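
Below is a C sketch of the shift described above, with one struct field per signal. The struct and function names are hypothetical. Reading the sX_sY prefix as "currently in stage X, consumed in stage Y" is an interpretation of the example, not a stated rule.

```c
#include <stdint.h>

typedef struct {
    uint8_t s4_s6_rfwr_addr1;   // loaded from the decoder output of s3
    uint8_t s5_s6_rfwr_addr1;
    uint8_t s6_rfwr_addr1;      // terminal: seen by the RF write manager
} rfwr_chain;

// One rising clock edge. Updating from the terminal end backwards gives
// the same result as simultaneous VHDL signal assignment: every register
// takes the value its upstream neighbour held before the edge.
void rfwr_clock(rfwr_chain *c, uint8_t decoder_addr1)
{
    c->s6_rfwr_addr1    = c->s5_s6_rfwr_addr1;
    c->s5_s6_rfwr_addr1 = c->s4_s6_rfwr_addr1;
    c->s4_s6_rfwr_addr1 = decoder_addr1;
}
```

A value produced by the decoder therefore reaches the terminal register two clocks after it is first loaded, that is, when its instruction is in s6.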
\n
Process splitting strategy:
- requests to hardware resources are managed by dedicated processes, one
  VHDL process per hardware resource.
- a main asynchronous process (the instruction decoder) computes values that
  initialize the pipeline in s3.
- a main synchronous process assigns new values to the pipeline registers.

\todo
Replace the `next_...' signals family with a (pretty wide) state decoder.

\par Licensing
Please read the \ref pavr_about "licensing terms".
\n
*/



/*!
\defgroup pavr_ref References
\par References
Most of the documentation needed for this project was found on Atmel's website,
http://www.atmel.com. While working on this project (2002 Q1, Q2), it was
available in PDF format, free for downloading. \n
\n
The specific documents that were used are:
- "AVR Instruction Set", Atmel Corporation
- Datasheets for the controllers:
  - ATtiny28 series
  - AT90S2313
  - AT90S8535
  - ATmega8 series
  - ATmega103 series

\n
While designing pAVR's pipeline, I found many interesting ideas in the book
"Computer architecture - a quantitative approach", by J. Hennessy and D.
Patterson. If you are a processor designer, then this book is for you. \n

\par Errata
A few \ref pavr_test_bugs "bugs" have been found in Atmel's documents.
\n
\n
\n
*/



/*!
\defgroup pavr_thoughts Some final thoughts
\par Instead of a conclusion...
It's relatively easy to design a fast 8 bit controller. All that has to be done
is to follow the path well known from the big brothers, the 32 bit
controllers. The short story is: analyze what "typical programs" mean,
imagine a simple and fast instruction set, and implement it in a deep
pipeline (by the way, for these topics, I recommend "Computer
architecture - a quantitative approach", by J. Hennessy and D. Patterson). \n
\n
Then, why are the 8 bit controllers currently on the market so slow? The
instruction set, CPI and maximum frequency of current 8 bit uCs are poor. In
fact, they are so poor that we must consider factors other than pure uC design
to explain that. My guess is that market issues interfere destructively here.
Finding out how could be another project's goal... \n
\n
\n
*/



/*!
\defgroup pavr_about About ...
\par Project
\b pAVR (pipelined AVR) is an 8 bit RISC controller, compatible with Atmel's
AVR core, but about 3x faster in terms of both clock frequency and MIPS. \n
The increase in speed comes from a relatively deep pipeline.
\par Version
0.32
\par Date
2002 August 07
\par Author
Doru Cuturela, doruu@yahoo.com \n
\par Licensing
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version. \n
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details. \n
You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA

\par Note
The design effort for this project was about 6 months (2002, Feb-Aug), one
man working. \n
\n
*/