URL
https://opencores.org/ocsvn/cpu_lecture/cpu_lecture/trunk
Subversion Repositories cpu_lecture
[/] [cpu_lecture/] [trunk/] [html/] [03_Pipelining.html] - Rev 21
Go to most recent revision | Compare with Previous | Blame | View Log
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd"> <HTML> <HEAD> <TITLE>html/Pipelining</TITLE> <META NAME="generator" CONTENT="HTML::TextToHTML v2.46"> <LINK REL="stylesheet" TYPE="text/css" HREF="lecture.css"> </HEAD> <BODY> <P><table class="ttop"><th class="tpre"><a href="02_Top_Level.html">Previous Lesson</a></th><th class="ttop"><a href="toc.html">Table of Content</a></th><th class="tnxt"><a href="04_Cpu_Core.html">Next Lesson</a></th></table> <hr> <H1><A NAME="section_1">3 DIGRESSION: PIPELINING</A></H1> <P>In this short lesson we will give a brief overview of a design technique known as pipelining. Most readers will already be familiar with it; those readers should take a day off or proceed to the next lesson. <P>Assume we have a piece of combinational logic that happens to have a long propagation delay even in its fastest implementation. The long delay is then caused by the slowest path through the logic, which will run through either many fast elements (like gates) or a number of slower elements (likes adders or multipliers), or both. <P>That is the situation where you should use pipelining. We will explain it by an example. Consider the circuit shown in the following figure. <P><br> <P><img src="pipelining_1.png"> <P><br> <P>The circuit is a sequential logic which consists of 3 combinational functions f1, f2, and f3 and a flip-flop at the output of f3. <P>Let t1, t2, and t3 be the respective propagation delays of f1, f2, and f3. Assume that the slowest path of the combinational logic runs from the upper input of f1 towards the output of f3. Then the total delay of the combinational is t = t1 + t2 + t3. The entire circuit cannot be clocked faster than with frequency 1/t. <P>Now pipelining is a technique that slightly increases the delay of a combinational circuit, but thereby allows different parts of the logic at the same time. The slight increase in total propagation delay is more than compensated by a much higher throughput. <P>Pipelining divides a complex combinational logic with an accordingly long delay into a number of stages and places flip-flops between the stages as shown in the next figure. <P><br> <P><img src="pipelining_2.png"> <P><br> <P>The slowest path is now max(t1, t2, t3) and the new circuit can be clocked with frequency 1/max(t1, t2, t3) instead of 1/(t1 + t2 + t3). If the functions f1, f2, and f3 had equal propagation delays, then the max. frequency of the new circuit would have tripled compared to the old circuit. <P>It is generally a good idea when using pipelining to divide the combinational logic that shall be pipelined into pieces with similar delay. Another aspect is to divide the combinational logic at places where the number of connections between the pieces is small since this reduces the number of flip-flops that are being inserted. <P>The first design of the CPU described in this lecture had the opcode decoding logic (which is combinational) and the data path logic combined. That design had a worst path delay of over 50 ns (and hence a max. frequency of less than 20 MHz). After splitting of the opcode decoder, the worst path delay was below 30 ns which allows for a frequency of 33 MHz. We could have divides the pipeline into even more stages (and thereby increasing the max. frequency even further). This would, however, have obscured the design so we did not do it. <P>The reason for the improved throughput is that the different stages of a pipeline work in parallel while without pipelining the entire logic would be occupied by a single operation. In a pipeline the single operation is often displayed like this (one color = one operation). <P><br> <P><img src="pipelining_3.png"> <P><br> <P>This kind of diagram shows how an operation is distributed over the different stages over time. <P>To summarize, pipelining typically results in: <UL> <LI>a slightly more complex design, <LI>a moderately longer total delay, and <LI>a considerable improvement in throughput. </UL> <P><hr><BR> <table class="ttop"><th class="tpre"><a href="02_Top_Level.html">Previous Lesson</a></th><th class="ttop"><a href="toc.html">Table of Content</a></th><th class="tnxt"><a href="04_Cpu_Core.html">Next Lesson</a></th></table> </BODY> </HTML>
Go to most recent revision | Compare with Previous | Blame | View Log