OpenCores
URL https://opencores.org/ocsvn/cpu_lecture/cpu_lecture/trunk

Subversion Repositories cpu_lecture

[/] [cpu_lecture/] [trunk/] [html/] [03_Pipelining.html] - Blame information for rev 2


Rev 2, author: jsauermann
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">
<HTML>
<HEAD>
<TITLE>html/Pipelining</TITLE>
<META NAME="generator" CONTENT="HTML::TextToHTML v2.46">
<LINK REL="stylesheet" TYPE="text/css" HREF="lecture.css">
</HEAD>
<BODY>
<P><table class="ttop"><th class="tpre"><a href="02_Top_Level.html">Previous Lesson</a></th><th class="ttop"><a href="toc.html">Table of Content</a></th><th class="tnxt"><a href="04_Cpu_Core.html">Next Lesson</a></th></table>
<hr>

<H1><A NAME="section_1">3 DIGRESSION: PIPELINING</A></H1>

<P>In this short lesson we will give a brief overview of a design technique
known as pipelining. Most readers will already be familiar with it; those
readers should take a day off or proceed to the next lesson.

<P>Assume we have a piece of combinational logic that has a long
propagation delay even in its fastest implementation. The long delay
is caused by the slowest path through the logic, which runs
through either many fast elements (like gates) or a number of slower
elements (like adders or multipliers), or both.

<P>That is the situation where you should use pipelining. We will explain
it with an example. Consider the circuit shown in the following figure.

<P><br>

<P><img src="pipelining_1.png">

<P><br>

<P>The circuit is a sequential circuit that consists of three combinational
functions f1, f2, and f3 and a flip-flop at the output of f3.

<P>Let t1, t2, and t3 be the respective propagation delays of f1, f2, and f3.
Assume that the slowest path of the combinational logic runs from the
upper input of f1 to the output of f3. Then the total delay of
the combinational logic is t = t1 + t2 + t3. The entire circuit cannot be
clocked faster than the frequency 1/t.

<P>Pipelining is a technique that slightly increases the delay of a
combinational circuit, but in return allows different parts of the logic
to be used at the same time. The slight increase in total propagation delay
is more than compensated by a much higher throughput.

<P>Pipelining divides complex combinational logic with a correspondingly long
delay into a number of stages and places flip-flops between the stages as
shown in the next figure.

<P><br>

<P><img src="pipelining_2.png">

<P><br>

<P>The delay of the slowest path is now max(t1, t2, t3) (neglecting the small
delay added by the flip-flops), so the new circuit can be clocked
with frequency 1/max(t1, t2, t3) instead of 1/(t1 + t2 + t3). If the
functions f1, f2, and f3 had equal propagation delays, then the maximum
frequency of the new circuit would be three times that of the old circuit.
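
<P>The relation between stage delays and clock frequency can be checked with
a few lines of Python. This is only a sketch: the function name
max_clock_mhz and the delay values are assumptions made for illustration and
are not taken from the CPU described in this lecture, and flip-flop overhead
is ignored.

```python
def max_clock_mhz(stage_delays_ns, pipelined):
    """Return the maximum clock frequency in MHz for the given delays.

    Unpipelined, the clock period must cover the sum of all delays;
    pipelined, it only has to cover the slowest stage. Flip-flop
    overhead is ignored (a simplifying assumption).
    """
    period_ns = max(stage_delays_ns) if pipelined else sum(stage_delays_ns)
    return 1000.0 / period_ns  # 1/ns is GHz, so scale by 1000 to get MHz


# Hypothetical delays t1, t2, t3 in nanoseconds (illustration only).
delays_ns = [10.0, 10.0, 10.0]
f_plain = max_clock_mhz(delays_ns, pipelined=False)
f_piped = max_clock_mhz(delays_ns, pipelined=True)
print(f"unpipelined: {f_plain:.1f} MHz, pipelined: {f_piped:.1f} MHz")
```

<P>With three equal delays the pipelined frequency comes out three times as
high as the unpipelined one. With unequal delays the gain is smaller, because
the slowest stage alone determines the clock period.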

<P>When using pipelining, it is generally a good idea to divide the
combinational logic that is to be pipelined into pieces with similar delay.
Another consideration is to divide the combinational logic at places where the
number of connections between the pieces is small, since this reduces the
number of flip-flops that must be inserted.
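
<P>The first of these two rules can be sketched in Python. The example below
assumes that the slowest path is a simple chain of elements with known delays
and tries every possible set of cut positions; the function best_split and
the delay values are hypothetical. It only balances delays and does not count
the connections that each cut would have to register.

```python
from itertools import combinations


def best_split(delays_ns, stages):
    """Split a chain of element delays into `stages` contiguous pieces.

    Tries every possible set of cut positions and returns
    (worst_delay, pieces) for the split whose slowest piece is as fast
    as possible. Brute force, so only suitable for short chains.
    """
    n = len(delays_ns)
    best = None
    for cuts in combinations(range(1, n), stages - 1):
        bounds = (0,) + cuts + (n,)
        pieces = [delays_ns[a:b] for a, b in zip(bounds, bounds[1:])]
        worst = max(sum(piece) for piece in pieces)
        if best is None or worst < best[0]:
            best = (worst, pieces)
    return best


# Hypothetical element delays in ns along the slowest path.
chain_ns = [4, 6, 5, 5, 3, 7]
worst_ns, pieces = best_split(chain_ns, 3)
print(worst_ns, pieces)
```

<P>For this chain the best three-stage split is [4, 6], [5, 5], [3, 7], so
every stage has a delay of 10 ns and no stage limits the clock more than the
others.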

<P>The first design of the CPU described in this lecture had the opcode decoding
logic (which is combinational) and the data path logic combined. That design
had a worst path delay of over 50 ns (and hence a max. frequency of less
than 20 MHz). After splitting off the opcode decoder, the worst path delay
was below 30 ns, which allows for a frequency of 33 MHz. We could have
divided the pipeline into even more stages (and thereby increased the
max. frequency even further). This would, however, have obscured the design,
so we did not do it.

<P>The reason for the improved throughput is that the different stages of a
pipeline work in parallel, while without pipelining the entire logic would
be occupied by a single operation. The flow of operations through a pipeline
is often displayed like this (one color = one operation).

<P><br>

<P><img src="pipelining_3.png">

<P><br>

<P>This kind of diagram shows how each operation moves through the
different stages over time.
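
<P>A text version of such a diagram can be produced with a short Python
sketch. It assumes an ideal pipeline in which a new operation enters the
first stage in every clock cycle and nothing ever stalls; the function
pipeline_timeline and the operation names A to D are made up for this
illustration.

```python
def pipeline_timeline(num_stages, num_ops):
    """Return one row per stage; row[c] is the index of the operation
    in that stage during clock cycle c, or None if the stage is idle."""
    cycles = num_stages + num_ops - 1
    rows = []
    for stage in range(num_stages):
        row = []
        for cycle in range(cycles):
            op = cycle - stage  # operation `op` enters stage 0 in cycle `op`
            row.append(op if 0 <= op < num_ops else None)
        rows.append(row)
    return rows


# Three stages as in the f1/f2/f3 example; four hypothetical operations.
names = "ABCD"
for stage, row in enumerate(pipeline_timeline(3, len(names)), start=1):
    cells = " ".join("." if op is None else names[op] for op in row)
    print(f"f{stage}: {cells}")
```

<P>Each column is one clock cycle. Reading along a diagonal follows one
operation from f1 to f3, and reading down a column shows that up to three
operations are in progress at the same time.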

<P>To summarize, pipelining typically results in:

<UL>
  <LI>a slightly more complex design,
  <LI>a moderately longer total delay, and
  <LI>a considerable improvement in throughput.
</UL>
<P><hr><BR>
<table class="ttop"><th class="tpre"><a href="02_Top_Level.html">Previous Lesson</a></th><th class="ttop"><a href="toc.html">Table of Content</a></th><th class="tnxt"><a href="04_Cpu_Core.html">Next Lesson</a></th></table>
</BODY>
</HTML>


© copyright 1999-2024 OpenCores.org, equivalent to Oliscience, all rights reserved. OpenCores®, registered trademark.