<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">
<HTML>
<HEAD>
<TITLE>html/Pipelining</TITLE>
<META NAME="generator" CONTENT="HTML::TextToHTML v2.46">
<LINK REL="stylesheet" TYPE="text/css" HREF="lecture.css">
</HEAD>
<BODY>
<P><table class="ttop"><th class="tpre"><a href="02_Top_Level.html">Previous Lesson</a></th><th class="ttop"><a href="toc.html">Table of Content</a></th><th class="tnxt"><a href="04_Cpu_Core.html">Next Lesson</a></th></table>
<hr>
<H1><A NAME="section_1">3 DIGRESSION: PIPELINING</A></H1>
<P>In this short lesson we will give a brief overview of a design technique
known as pipelining. Most readers will already be familiar with it; those
readers should take a day off or proceed to the next lesson.
<P>Assume we have a piece of combinational logic that has a long
propagation delay even in its fastest implementation. Such a long delay
is caused by the slowest path through the logic, which runs through
either many fast elements (like gates), a number of slower elements
(like adders or multipliers), or both.
<P>This is the situation where you should use pipelining. We will explain
it by means of an example. Consider the circuit shown in the following figure.
<P><br>
<P><img src="pipelining_1.png">
<P><br>
<P>The circuit is a sequential circuit consisting of three combinational
functions f1, f2, and f3, followed by a flip-flop at the output of f3.
<P>Let t1, t2, and t3 be the respective propagation delays of f1, f2, and f3.
Assume that the slowest path of the combinational logic runs from the
upper input of f1 to the output of f3. Then the total delay of the
combinational logic is t = t1 + t2 + t3, and the entire circuit cannot
be clocked at a frequency higher than 1/t.
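<P>The slowest path can be found mechanically: the arrival time at a block's
output is its own delay plus the latest arrival time among its inputs. The
Python sketch below is a hypothetical miniature of what a timing analyzer does
(it needs Python 3.9+ for graphlib); the block delays, the extra input c, and
the function name arrival_times are assumptions for illustration, not values
taken from the figure.

```python
# Hypothetical static-timing sketch: the arrival time at a block's output is its
# own propagation delay plus the latest arrival time among its inputs. The worst
# arrival time at the flip-flop input is the delay t that limits the clock.
from graphlib import TopologicalSorter


def arrival_times(delays_ns, fanin):
    """delays_ns: block name -> propagation delay in ns (primary inputs: 0).
    fanin: block name -> list of blocks or inputs driving it."""
    arrival = {}
    # static_order() yields every node after all of its predecessors.
    for node in TopologicalSorter(fanin).static_order():
        inputs = fanin.get(node, [])
        latest_input = max((arrival[i] for i in inputs), default=0.0)
        arrival[node] = latest_input + delays_ns.get(node, 0.0)
    return arrival


# Assumed topology: inputs a and b feed f1, f1 feeds f2, f2 feeds f3,
# and a third input c feeds f2 directly (a shorter path).
delays = {"f1": 4.0, "f2": 6.0, "f3": 5.0}
fanin = {"f1": ["a", "b"], "f2": ["f1", "c"], "f3": ["f2"]}

t = arrival_times(delays, fanin)["f3"]   # t1 + t2 + t3
f_max_mhz = 1000.0 / t                   # 1/t, converted from 1/ns to MHz
print(f"t = {t} ns, f_max = {f_max_mhz:.1f} MHz")
```

<P>With these assumed delays the path through f1, f2, and f3 is the slowest,
so t = 15 ns and the circuit is limited to about 66.7 MHz.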
<P>Pipelining is a technique that slightly increases the total delay of a
combinational circuit, but in return allows different parts of the logic
to work at the same time. The slight increase in total propagation delay
is more than compensated for by a much higher throughput.
<P>Pipelining divides complex combinational logic with a correspondingly long
delay into a number of stages and places flip-flops between the stages, as
shown in the next figure.
<P><br>
<P><img src="pipelining_2.png">
<P><br>
<P>The slowest path is now max(t1, t2, t3), and the new circuit can be clocked
with frequency 1/max(t1, t2, t3) instead of 1/(t1 + t2 + t3). If the
functions f1, f2, and f3 had equal propagation delays, the maximum
frequency of the new circuit would be three times that of the old circuit.
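<P>The following Python sketch compares the two clock limits for a few
hypothetical stage delays. By default it assumes ideal flip-flops, as the text
does; the optional parameter reg_overhead_ns is an assumption added here to
model the clock-to-output and setup time that every real flip-flop adds to the
path it terminates.

```python
# Hypothetical comparison of the clock limit before and after pipelining.
# reg_overhead_ns = 0 models ideal flip-flops, as in the text.

def f_max_unpipelined_mhz(stage_delays_ns, reg_overhead_ns=0.0):
    # One register at the output: the whole chain must settle in one cycle.
    return 1000.0 / (sum(stage_delays_ns) + reg_overhead_ns)


def f_max_pipelined_mhz(stage_delays_ns, reg_overhead_ns=0.0):
    # A register after every stage: only the slowest stage must settle per cycle.
    return 1000.0 / (max(stage_delays_ns) + reg_overhead_ns)


equal = [10.0, 10.0, 10.0]   # assumed t1 = t2 = t3 = 10 ns
skewed = [5.0, 20.0, 5.0]    # assumed delays with the same 30 ns total

for delays in (equal, skewed):
    before = f_max_unpipelined_mhz(delays)
    after = f_max_pipelined_mhz(delays)
    print(f"{delays}: {before:.1f} MHz -> {after:.1f} MHz ({after / before:.1f}x)")
```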
<P>When using pipelining, it is generally a good idea to divide the
combinational logic into pieces with similar delays. Another aspect is to
divide the combinational logic at places where the number of connections
between the pieces is small, since this reduces the number of flip-flops
that need to be inserted.
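<P>Finding pieces with similar delays can be automated for a simple chain of
blocks. The Python sketch below uses a small dynamic program (our choice for
illustration, not necessarily how the CPU in this lecture was partitioned)
that tries every set of cut points and keeps the one with the smallest
slowest stage. The helper best_split and the delay values are hypothetical,
and the sketch ignores the second criterion, the number of connections
crossing each cut.

```python
# Hypothetical helper: split a chain of blocks (listed in signal order) into
# k contiguous pipeline stages so that the slowest stage is as fast as possible.
# It only balances delays; the number of signals crossing each cut is ignored.
from functools import lru_cache


def best_split(delays_ns, k):
    """Return (max_stage_delay_ns, cuts). A cut at index i means a pipeline
    register sits between delays_ns[i - 1] and delays_ns[i]."""
    n = len(delays_ns)
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= number of blocks")
    prefix = [0.0]                      # prefix[i] = sum of delays_ns[:i]
    for d in delays_ns:
        prefix.append(prefix[-1] + d)

    @lru_cache(maxsize=None)
    def solve(start, stages):
        # Best (max stage delay, cuts) for delays_ns[start:] using `stages` stages.
        if stages == 1:
            return prefix[n] - prefix[start], ()
        best = (float("inf"), ())
        # The first stage covers delays_ns[start:end]; every later stage
        # must still receive at least one block.
        for end in range(start + 1, n - stages + 2):
            rest_delay, rest_cuts = solve(end, stages - 1)
            worst = max(prefix[end] - prefix[start], rest_delay)
            if worst < best[0]:
                best = (worst, (end,) + rest_cuts)
        return best

    return solve(0, k)


# Hypothetical chain of six blocks (delays in ns) split into three stages.
print(best_split([5.0, 3.0, 4.0, 4.0, 2.0, 6.0], 3))
```

<P>For these assumed delays, registers after the second and the fourth block
give three stages of exactly 8 ns each, one third of the 24 ns total.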
<P>The first design of the CPU described in this lecture had the opcode decoding
logic (which is combinational) and the data path logic combined. That design
had a worst-case path delay of over 50 ns (and hence a maximum frequency of
less than 20 MHz). After splitting off the opcode decoder, the worst-case path
delay was below 30 ns, which allows for a frequency of 33 MHz. We could have
divided the pipeline into even more stages (thereby increasing the maximum
frequency even further). This would, however, have obscured the design, so
we did not do it.
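<P>The frequencies quoted above follow from applying f = 1/t to the worst-case
path delay, using 1/(1 ns) = 1000 MHz:

```latex
f_{\max} = \frac{1}{t_{\text{worst}}}, \qquad
\frac{1}{50\,\text{ns}} = \frac{1000}{50}\,\text{MHz} = 20\,\text{MHz}, \qquad
\frac{1}{30\,\text{ns}} = \frac{1000}{30}\,\text{MHz} \approx 33.3\,\text{MHz}
```

<P>Since the frequency is inversely proportional to the delay, a delay above
50 ns means a frequency below 20 MHz, and a delay below 30 ns permits a
frequency above 33 MHz.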
<P>The reason for the improved throughput is that the different stages of a
pipeline work in parallel, whereas without pipelining the entire logic is
occupied by a single operation. How operations move through a pipeline is
often displayed like this (one color = one operation).
<P><br>
<P><img src="pipelining_3.png">
<P><br>
<P>This kind of diagram shows how each operation passes through the
different stages over time, and that in a given clock cycle different
stages work on different operations.
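<P>A diagram of this kind can be generated with a few lines of code. In the
hypothetical Python sketch below, one new operation enters the pipeline in
every clock cycle, so stage s works on operation (cycle - s); stages, cycles,
and operations are numbered from 0, and the letters A to D stand in for the
colors of the figure.

```python
# Hypothetical sketch of a pipeline occupancy diagram: with one new operation
# entering per clock cycle, stage s holds operation (cycle - s) in each cycle.

def occupancy(n_ops, n_stages):
    """Return one row per stage listing the operation each stage holds in
    each cycle, or None when the stage is idle."""
    n_cycles = n_ops + n_stages - 1   # the last operation leaves the last stage
    table = []
    for stage in range(n_stages):
        row = []
        for cycle in range(n_cycles):
            op = cycle - stage
            row.append(op if 0 <= op < n_ops else None)
        table.append(row)
    return table


# Four operations A..D flowing through the three stages f1, f2, f3.
names = "ABCD"
for stage, row in enumerate(occupancy(4, 3), start=1):
    cells = " ".join(names[op] if op is not None else "." for op in row)
    print(f"f{stage}: {cells}")
```

<P>Each column is one clock cycle. In the third and fourth cycles all three
stages are busy with different operations at the same time, which is where
the higher throughput comes from.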
<P>To summarize, pipelining typically results in:
<UL>
<LI>a slightly more complex design,
<LI>a moderately longer total delay, and
<LI>a considerable improvement in throughput.
</UL>
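<P>The effects listed above can be put into numbers with a small model. The
Python sketch below is illustrative only: it assumes the logic can be split
into stages of exactly equal delay and adds a hypothetical fixed flip-flop
overhead reg_ns (clock-to-output plus setup time) to every clock period; the
30 ns and 1 ns values are made up.

```python
# Hypothetical latency/throughput model for n_ops operations.
# reg_ns is an assumed per-register overhead (clock-to-output plus setup time).

def unpipelined(n_ops, logic_ns, reg_ns):
    # One register at the output: each operation occupies the logic for one period.
    period = logic_ns + reg_ns
    return {"period_ns": period, "latency_ns": period, "total_ns": n_ops * period}


def pipelined(n_ops, logic_ns, reg_ns, n_stages):
    # Assumes the logic splits into n_stages pieces of equal delay,
    # each followed by its own register.
    period = logic_ns / n_stages + reg_ns
    latency = n_stages * period
    # The first result appears after n_stages cycles, then one per cycle.
    total = (n_ops + n_stages - 1) * period
    return {"period_ns": period, "latency_ns": latency, "total_ns": total}


before = unpipelined(1000, logic_ns=30.0, reg_ns=1.0)
after = pipelined(1000, logic_ns=30.0, reg_ns=1.0, n_stages=3)
print(before)
print(after)
```

<P>With these numbers the latency of a single operation grows from 31 ns to
33 ns, while 1000 operations finish after 11022 ns instead of 31000 ns.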
<P><hr><BR>
<table class="ttop"><th class="tpre"><a href="02_Top_Level.html">Previous Lesson</a></th><th class="ttop"><a href="toc.html">Table of Content</a></th><th class="tnxt"><a href="04_Cpu_Core.html">Next Lesson</a></th></table>
</BODY>
</HTML>