URL https://opencores.org/ocsvn/openarty/openarty/trunk

Subversion Repositories openarty

[/] [openarty/] [trunk/] [rtl/] [cpu/] [cpudefs.v] - Blame information for rev 50


////////////////////////////////////////////////////////////////////////////////
//
// Filename:    cpudefs.v
//
// Project:     OpenArty, an entirely open SoC based upon the Arty platform
//
// Purpose:     Some architectures have some needs, others have other needs.
//              Some of my projects need a Zip CPU with pipelining, others
//      can't handle the timing required to get the answer from the ALU
//      back into the input for the ALU.  As each project has
//      different needs, I can either 1) reconfigure my entire baseline prior
//      to building each project, or 2) host a configuration file which contains
//      the information regarding each baseline.  This file is that
//      configuration file.  It controls how the CPU (not the system,
//      peripherals, or other) is defined and implemented.  Several options
//      are available within here, making the Zip CPU pipelined or not,
//      able to handle a faster clock with more stalls or a slower clock with
//      no stalls, etc.
//
//      This file encapsulates those control options.
//
//      The number of LUTs the Zip CPU uses varies dramatically with the
//      options defined in this file.
//
//
// OpenArty comments:
//      My goal on the OpenArty is going to be using the CPU to its fullest
//      extent.  All features shall be turned on if they exist, full pipelines,
//      multiplies, divides, and hopefully even the 162MHz clock.  This file
//      reflects that purpose.
//
//
// Creator:     Dan Gisselquist, Ph.D.
//              Gisselquist Technology, LLC
//
////////////////////////////////////////////////////////////////////////////////
//
// Copyright (C) 2015-2016, Gisselquist Technology, LLC
//
// This program is free software (firmware): you can redistribute it and/or
// modify it under the terms of  the GNU General Public License as published
// by the Free Software Foundation, either version 3 of the License, or (at
// your option) any later version.
//
// This program is distributed in the hope that it will be useful, but WITHOUT
// ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
// FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
// for more details.
//
// You should have received a copy of the GNU General Public License along
// with this program.  (It's in the $(ROOT)/doc directory.  Run make with no
// target there if the PDF file isn't present.)  If not, see
// <http://www.gnu.org/licenses/> for a copy.
//
// License:     GPL, v3, as defined and found on www.gnu.org,
//              http://www.gnu.org/licenses/gpl.html
//
//
////////////////////////////////////////////////////////////////////////////////
//
//
`ifndef CPUDEFS_H
`define CPUDEFS_H
//
//
// The first couple options control the Zip CPU instruction set, and how
// it handles various instructions within the set:
//
//
// OPT_ILLEGAL_INSTRUCTION is part of a new section of code that is supposed
// to recognize illegal instructions and interrupt the CPU whenever one such
// instruction is encountered.  The goal is to create a soft floating point
// unit via this approach, that can then be replaced with a true floating point
// unit.  As I'm not there yet, it just catches illegal instructions and
// interrupts the CPU on any such instruction--when defined.  Otherwise,
// illegal instructions are quietly ignored and their behaviour is ...
// undefined. (Many get treated like NOOPs ...)
//
// I recommend setting this flag so strongly that I'm likely going to remove
// the option to turn this off in future versions of this CPU.
//
`define OPT_ILLEGAL_INSTRUCTION
//
//
//
// OPT_MULTIPLY controls whether or not the multiply is built and included
// in the ALU by default.  Set this option and a parameter will be set that
// includes the multiply.  (This parameter may still be overridden, as with
// any parameter ...)  If the multiply is not included and
// OPT_ILLEGAL_INSTRUCTION is set, then the multiply will create an illegal
// instruction that will then trip the illegal instruction trap.
//
// Either not defining this value, or defining it to zero, will disable the
// hardware multiply.  A value of '1' will cause the multiply to occur in one
// clock cycle only--often at the expense of the rest of the CPU's speed.
// A value of 2 will cause the multiply to have a single delay cycle, 3 will
// have two delay cycles, and 4 (or more) will have 3 delay cycles.
//
//
`define OPT_MULTIPLY    3
//
//
//
// OPT_DIVIDE controls whether or not the divide instruction is built and
// included into the ZipCPU by default.  Set this option and a parameter will
// be set that causes the divide unit to be included.  (This parameter may
// still be overridden, as with any parameter ...)  If the divide is not
// included and OPT_ILLEGAL_INSTRUCTION is set, then the divide will create
// an illegal instruction exception that will send the CPU into supervisor
// mode.
//
//
`define OPT_DIVIDE
//
//
//
// OPT_IMPLEMENT_FPU will (one day) control whether or not the floating point
// unit (once I have one) is built and included into the ZipCPU by default.
// At that time, if this option is set then a parameter will be set that
// causes the floating point unit to be included.  (This parameter may
// still be overridden, as with any parameter ...)  If the floating point unit
// is not included and OPT_ILLEGAL_INSTRUCTION is set, then as with the
// multiply and divide any floating point instruction will result in an illegal
// instruction exception that will send the CPU into supervisor mode.
//
//
// `define      OPT_IMPLEMENT_FPU
//
//
//
//
// OPT_SINGLE_FETCH controls whether or not the prefetch has a cache, and
// whether or not it can issue one instruction per clock.  When set, the
// prefetch has no cache, and only one instruction is fetched at a time.
// This effectively sets the CPU so that only one instruction is ever
// in the pipeline at once, and hence you may think of this as a "kill
// pipeline" option.  However, since the pipelined fetch component uses so
// much area on the FPGA, this is an important option to use in trimming down
// used area if necessary.  Hence, it needs to be maintained for that purpose.
// Be aware, though, it will drop your performance by a factor between 2x and
// 3x.
//
// We can either pipeline our fetches, or issue one fetch at a time.  Pipelined
// fetches are more complicated and therefore use more FPGA resources, while
// single fetches will cause the CPU to stall for about 5 clocks on each
// instruction, effectively reducing the instruction count per clock to
// about 0.2.  However, the area cost may be worth it.  Consider:
//
//      Slice LUTs              ZipSystem       ZipCPU
//      Single Fetching         2521            1734
//      Pipelined fetching      2796            2046
//      (These numbers may be dated, but should still be representative ...)
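//
// Working those figures through (plain arithmetic on the table above, not a
// new measurement): single fetching saves 2796 - 2521 = 275 LUTs in the
// ZipSystem (about 10%) and 2046 - 1734 = 312 LUTs in the ZipCPU alone
// (about 15%).  Against that, 0.2 instructions per clock is 5 clocks per
// instruction.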
//
// I recommend only defining this if you "need" to, if area is tight and
// speed isn't as important.  Otherwise, just leave this undefined.
//
// `define      OPT_SINGLE_FETCH
//
//
//
// The next several options are pipeline optimization options.  They make no
// sense in a single instruction fetch mode, hence we `ifndef them so they
// are only defined if we are in a full pipelined mode (i.e. OPT_SINGLE_FETCH
// is not defined).
//
`ifndef OPT_SINGLE_FETCH
//
//
//
// OPT_PIPELINED is the natural result and opposite of using the single
// instruction fetch unit.  If you are not using that unit, the ZipCPU will
// be pipelined.  The option is defined here more for readability than
// anything else, since OPT_PIPELINED makes more sense than OPT_SINGLE_FETCH,
// well ... that and it does a better job of explaining what is going on.
//
// In other words, leave this define alone--lest you break the ZipCPU.
//
`define OPT_PIPELINED
//
//
//
// OPT_TRADITIONAL_PFCACHE allows you to switch between one of two prefetch
// caches.  If enabled, a more traditional cache is implemented.  This more
// traditional cache (currently) uses many more LUTs, but it also reduces
// the stall count tremendously over the alternative hacked pipeline cache.
// (The traditional pfcache is also pipelined, whereas the pipeline cache
// implements a windowed approach to caching.)
//
// If you have the fabric to support this option, I recommend including it.
//
`define OPT_TRADITIONAL_PFCACHE
//
//
//
// OPT_EARLY_BRANCHING is an attempt to execute a BRA statement as early
// as possible, to avoid as many pipeline stalls on a branch as possible.
// It's not tremendously successful yet--BRA's still suffer stalls,
// but I intend to keep working on this approach until the number of stalls
// gets down to one or (ideally) zero.  (With the OPT_TRADITIONAL_PFCACHE, this
// gets down to a single stall cycle ...)  That way a "BRA" can be used as the
// compiler's branch prediction optimizer: BRA's barely stall, while branches
// on conditions will always suffer about 4 stall cycles or so.
//
// I recommend setting this flag, so as to turn early branching on.
//
`define OPT_EARLY_BRANCHING
//
//
//
// OPT_PIPELINED_BUS_ACCESS controls whether or not LOD/STO instructions
// can take advantage of pipelined bus instructions.  To be eligible, the
// operations must be identical (cannot pipeline loads and stores, just loads
// only or stores only), and the addresses must either be identical or one up
// from the previous address.  Further, the load/store string must all have
// the same conditional.  This approach gains the most use, in my humble
// opinion, when saving registers to or restoring registers from the stack
// at the beginning/end of a procedure, or when doing a context swap.
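//
// The eligibility test described above might be sketched as follows.  This
// is only an illustration: every signal name here is hypothetical, and none
// of them are taken from the ZipCPU sources.
//
//	assign	pipe_ok = (op_is_store == last_is_store) // loads only, or stores only
//		&&((op_addr == last_addr)		// the same address ...
//			||(op_addr == last_addr + 1'b1))	// ... or one up from it
//		&&(op_cond == last_cond);		// the same conditional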
//
// I recommend setting this flag, for performance reasons, especially if your
// wishbone bus can handle pipelined bus accesses.
//
`define OPT_PIPELINED_BUS_ACCESS
//
//
//
//
//
// The instruction set defines an optional compressed instruction set (CIS)
// complement.  These were at one time erroneously called Very Long Instruction
// Words.  They are more appropriately referred to as compressed instructions.
// The compressed instruction format allows two instructions to be packed into
// the same instruction word.  Some instructions can be compressed, not all.
// Compressed instructions take the same time to complete.  Set OPT_CIS to
// include these double instructions as part of the instruction set.  These
// instructions are designed to get more code density from the instruction set,
// and to hopefully take some pain off of the performance of the pre-fetch and
// instruction cache.
//
// These new instructions, however, also necessitate a change in the Zip
// CPU--the Zip CPU can no longer execute instructions atomically.  It must
// now execute non-CIS instructions, or CIS instruction pairs, atomically. 
// This logic has been added into the ZipCPU, but it has not (yet) been
// tested thoroughly.
//
//
`define OPT_CIS
//
//
//
`endif  // OPT_SINGLE_FETCH
//
//
//
// Now let's talk about peripherals for a moment.  These next two defines
// control whether the DMA controller is included in the Zip System, and
// whether or not the 8 accounting counters are also included.  Set these to
// include the respective peripherals, comment them out not to.
//
`define INCLUDE_DMA_CONTROLLER
`define INCLUDE_ACCOUNTING_COUNTERS
//
//
`define DEBUG_SCOPE
//
// The following is experimental:
// `define      OPT_NO_USERMODE // Savings: about 143 LUTs or so
//
`endif  // CPUDEFS_H
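
The comments in cpudefs.v say that setting an option causes "a parameter" to be set, which may still be overridden. The sketch below shows one way a module could turn these macros into overridable parameters. It is a minimal illustration, not the ZipCPU's own code: the module name `cpudefs_demo` and the parameter names `IMPLEMENT_MPY`, `IMPLEMENT_DIVIDE` and `MPY_DELAY` are assumptions made for this example, and it expects cpudefs.v to be on the simulator's include path.

```verilog
// Hypothetical demo module.  The module and parameter names are illustrative
// assumptions, not identifiers confirmed from the ZipCPU sources.
`include "cpudefs.v"

module	cpudefs_demo;
	// OPT_MULTIPLY carries a value, so pass it through.  A bare
	// "`define OPT_MULTIPLY" with no value would not compile here.
`ifdef	OPT_MULTIPLY
	parameter	IMPLEMENT_MPY = `OPT_MULTIPLY;
`else
	parameter	IMPLEMENT_MPY = 0;
`endif

	// OPT_DIVIDE is a plain flag: defined means "include the divide".
`ifdef	OPT_DIVIDE
	parameter	IMPLEMENT_DIVIDE = 1;
`else
	parameter	IMPLEMENT_DIVIDE = 0;
`endif

	// Delay cycles as the OPT_MULTIPLY comments describe them:
	// 0 or 1 -> none, 2 -> one, 3 -> two, 4 or more -> three.
	localparam	MPY_DELAY = (IMPLEMENT_MPY >= 4) ? 3
			: (IMPLEMENT_MPY >= 2) ? (IMPLEMENT_MPY - 1) : 0;

	initial begin
		$display("IMPLEMENT_MPY    = %0d", IMPLEMENT_MPY);
		$display("MPY_DELAY        = %0d", MPY_DELAY);
		$display("IMPLEMENT_DIVIDE = %0d", IMPLEMENT_DIVIDE);
		// OPT_PIPELINED only exists when OPT_SINGLE_FETCH is undefined
`ifdef	OPT_PIPELINED
		$display("fetch            = pipelined");
`else
		$display("fetch            = single");
`endif
		$finish;
	end
endmodule
```

With the settings in this revision of the file, the sketch should report a multiply setting of 3 with two delay cycles, the divide included, and a pipelined fetch. Declaring the values with `parameter` rather than `localparam` keeps them overridable at instantiation, which matches the "may still be overridden" remark in the comments.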

