URL https://opencores.org/ocsvn/forwardcom/forwardcom/trunk

Subversion Repositories forwardcom

[/] [forwardcom/] [manual/] [fwc_conclusion.tex] - Blame information for rev 142

Details | Compare with Previous | View Log


% chapter included in forwardcom.tex
\documentclass[forwardcom.tex]{subfiles}
\begin{document}
\RaggedRight
 
\chapter{Conclusion}
The ForwardCom instruction set architecture is a consistent, modular,
flexible, orthogonal, scalable, and expansible instruction set offering a good compromise between the RISC principle that gives fast decoding and efficient pipelining, and the CISC principle that gives a more compact code and more work done per instruction. Support for efficient out-of-order execution and vector processing is a basic part of the design rather than a suboptimal patch added later as we have seen in other systems.
\vv
 
There are relatively few instructions, but each instruction can be coded in many different variants with integer operands of different sizes and floating point operands of different precisions. The operands can be scalars or vectors of any length. Operands can be registers, immediate constants in various compact forms, or memory operands with different addressing modes.  Instructions can have predicates or masks with specified fallback values. Vectors have variable lengths with unlimited room for future expansions.
\vv
 
All in all, the same basic instruction can have many different variants with the same operation code where other instruction sets have many different instructions to cover the same diversity. Everything fits into a consistent template format that simplifies the hardware implementation. The design also has plenty of space for single-format instructions with fewer variants.
\vv
 
The instruction format is designed so that the microprocessor pipeline can be simple and efficient. All instructions fit into the same format templates and the same pipeline structure to facilitate an efficient hardware design. It is possible to make complex instructions with complex functionality, but only if this fits into the pipeline system and the timing constraints. Microcode is preferably not used.
\vv
 
The decoder front-end can load multiple instructions per clock cycle because it is easy to detect the length of each instruction, and the decoder needs only distinguish between a few different instruction sizes. Actually, it is
possible to make a working program with only single-word (32 bits) instructions, but it is highly recommended to also support double-word and triple-word instructions. \vv
 
It is possible to add support for longer instructions in future extensions, but the priority has been to avoid any bottleneck in the decoding of instruction length (which is a serious bottleneck in the x86 architecture).
\vv
 
The code format is designed to be compact in order to save code cache space. This compactness is obtained in several ways. The same instruction can be coded in different sizes with two- and three-operand forms, different sizes of immediate constants, shifted immediate constants, and relative addresses with different sizes of offsets and scale factors, while avoiding absolute addresses that would require 64 bits for the address alone. It is always possible to choose the smallest version of an instruction that fits the particular need. The load on the data cache can be reduced by storing immediate constants in the code rather than in memory operands.
\vv
 
Most instructions can have a mask register which is used for predication in scalar instructions and for masking in vector instructions. The same mask register is also used for specifying various options such as rounding mode, exception handling, etc., that would otherwise require extra bits in the instruction code.
\vv
 
The introduction of vector registers with variable length is an important improvement over the most common current architectures. The ForwardCom vector system has the following advantages:
 
\begin{itemize}
\item The system is scalable. Different microprocessors can have different maximum vector lengths with no upper limit. It can be used for small embedded systems as well as large supercomputers with very long vectors.
 
\item The same code can run on different microprocessors with different maximum vector lengths and automatically utilize the full vector capabilities of each microprocessor.
 
\item The code does not have to be recompiled when a new microprocessor version with longer vectors becomes available. Software developers do not have to maintain multiple versions of their software for different vector lengths.
 
\item The software can save and restore a vector register in a way that is guaranteed to work with future processors with longer vectors. The inability to do so is a big problem in current architectures.
 
\item Only the part of a vector register that is actually used needs to be saved and restored. Each vector register includes information about how many bytes of it are used. Therefore, no unnecessary resources are wasted on saving a full-length vector if it is unused or only partially used.
 
\item A special addressing mode supports a very efficient loop structure that will automatically use the maximum vector length on all but the last iteration of an array loop. The last iteration will automatically use a shorter vector to handle the remaining array elements in case the array size is not divisible by the maximum vector length. There is no need to handle the remaining elements separately outside the main loop and no need to make separate versions of the loop for different special cases.
 
\item Functions can have variable-length vector registers as parameters. This makes it easy for the compiler to vectorize loops that contain function calls.
 
\item Instructions with vector register operands need no extra information about the vector length because this information is included in the vector registers. This makes these instructions more compact. Instructions with vector memory operands do need this extra information, though.
 
\item The design takes into account the hardware requirements of microprocessors with very long vectors where transport delays across a vector may depend on the vector length.
 
\item A new efficient system with re-linkable function libraries eliminates the need for dynamic link libraries and shared objects.
 
\item Strong security features are built into the design.
 
\item Software standards guarantee compatibility between different compilers, programming languages, user interface frameworks, and operating systems.
 
\end{itemize}
 
The memory model is flexible with relative addresses. Everything is position-independent. Memory management is simpler than in many current systems.
There is little or no need for virtual address translation. The preferred implementation has no translation lookaside buffer (TLB) and no memory paging, but a simple on-chip memory map with variable-size memory sections. Problems with stack overflow, memory fragmentation, etc. can be avoided completely in most cases. Task switches will be fast because of the small memory map and because of the efficient mechanism for saving vector registers.
\vv
 
The principle that a fundamental redesign enables us to learn from history and integrate late additions into the basic design also applies to the whole ecosystem of ABI standard, function libraries, compilers, linkers, and operating system. By defining not only an instruction set, but also ABI standard, binary file format, interface library standard, etc. we get the further advantage that different compilers and different programming languages will be compatible with each other. It will be possible to write different parts of a program in different programming languages and to use the same function libraries with all compilers. Even different operating systems will be compatible to some degree. A feature for re-linking an executable file makes it possible to run the same binary program file in different operating systems or on different platforms where the appropriate user interface framework is selected when the program is installed.
\vv
 
We have also learned from past mistakes that it is difficult to predict future needs. While the ForwardCom instruction set is intended to be flexible with room for future extensions, we may ask whether the future will bring needs for new features that are difficult to integrate into our design and standards. The best way to prevent such unforeseen problems is to allow input and suggestions from the entire community of hardware and software developers. It is important that the design and standards are developed through an open process that allows everybody to comment and make suggestions. We have often seen the problems of leaving this to a commercial industry. The industry often makes short-term decisions for marketing reasons. Patents, license restrictions, and trade secrets harm competition and prevent niche operators from entering the market. The microprocessor industry often keeps new features and instruction set extensions secret for competitive reasons until it is too late to change them in case the IT community comes up with better proposals.
\vv
 
The ForwardCom project is being developed in an open process with contributions and ideas from various people based on the philosophy that the problems mentioned above can best be avoided through openness and collaboration.
\end{document}

Line No.	Rev	Author	Line
1	142	Agner	`% chapter included in forwardcom.tex`
2			`\documentclass[forwardcom.tex]{subfiles}`
3			`\begin{document}`
4			`\RaggedRight`
5
6			`\chapter{Conclusion}`
7			`The ForwardCom instruction set architecture is a consistent, modular,`
8			`flexible, orthogonal, scalable, and expansible instruction set offering a good compromise between the RISC principle that gives fast decoding and efficient pipelining, and the CISC principle that gives a more compact code and more work done per instruction. Support for efficient out-of-order execution and vector processing is a basic part of the design rather than a suboptimal patch added later as we have seen in other systems.`
9			`\vv`
10
11			There are relatively few instructions, but each instruction can be coded in many different variants with integer operands of different sizes and floating point operands of different precisions. The operands can be scalars or vectors of any length. Operands can be registers, immediate constants in various compact forms, or memory operands with different addressing modes. Instructions can have predicates or masks with specified fallback values. Vectors have variable lengths with unlimited room for future expansions.
12			`\vv`
13
14			`All in all, the same basic instruction can have many different variants with the same operation code where other instruction sets have many different instructions to cover the same diversity. Everything fits into a consistent template format that simplifies the hardware implementation. The design also has plenty of space for single-format instructions with fewer variants.`
15			`\vv`
16
17			`The instruction format is designed so that the microprocessor pipeline can be simple and efficient. All instructions fit into the same format templates and the same pipeline structure to facilitate an efficient hardware design. It is possible to make complex instructions with complex functionality, but only if this fits into the pipeline system and the timing constraints. Microcode is preferably not used.`
18			`\vv`
19
20			`The decoder front-end can load multiple instructions per clock cycle because it is easy to detect the length of each instruction, and the decoder needs only distinguish between a few different instruction sizes. Actually, it is`
21			`possible to make a working program with only single-word (32 bits) instructions, but it is highly recommended to also support double-word and triple-word instructions. \vv`
22
23			`It is possible to add support for longer instructions in future extensions, but the priority has been to avoid any bottleneck in the decoding of instruction length (which is a serious bottleneck in the x86 architecture).`
24			`\vv`
25
26			The code format is designed to be compact in order to save code cache space. This compactness is obtained in several ways. The same instruction can be coded in different sizes with two- and three-operand forms, different sizes of immediate constants, shifted immediate constants, and relative addresses with different sizes of offsets and scale factors, while avoiding absolute addresses that would require 64 bits for the address alone. It is always possible to choose the smallest version of an instruction that fits the particular need. The load on the data cache can be reduced by storing immediate constants in the code rather than in memory operands.
27			`\vv`
28
29			`Most instructions can have a mask register which is used for predication in scalar instructions and for masking in vector instructions. The same mask register is also used for specifying various options such as rounding mode, exception handling, etc., that would otherwise require extra bits in the instruction code.`
30			`\vv`
31
32			`The introduction of vector registers with variable length is an important improvement over the most common current architectures. The ForwardCom vector system has the following advantages:`
33
34			`\begin{itemize}`
35			`\item The system is scalable. Different microprocessors can have different maximum vector lengths with no upper limit. It can be used for small embedded systems as well as large supercomputers with very long vectors.`
36
37			`\item The same code can run on different microprocessors with different maximum vector lengths and automatically utilize the full vector capabilities of each microprocessor.`
38
39			`\item The code does not have to be recompiled when a new microprocessor version with longer vectors becomes available. Software developers do not have to maintain multiple versions of their software for different vector lengths.`
40
41			`\item The software can save and restore a vector register in a way that is guaranteed to work with future processors with longer vectors. The inability to do so is a big problem in current architectures.`
42
43			`\item Only the part of a vector register that is actually used needs to be saved and restored. Each vector register includes information about how many bytes of it are used. Therefore, no unnecessary resources are wasted on saving a full-length vector if it is unused or only partially used.`
44
45			\item A special addressing mode supports a very efficient loop structure that will automatically use the maximum vector length on all but the last iteration of an array loop. The last iteration will automatically use a shorter vector to handle the remaining array elements in case the array size is not divisible by the maximum vector length. There is no need to handle the remaining elements separately outside the main loop and no need to make separate versions of the loop for different special cases.
46
47			`\item Functions can have variable-length vector registers as parameters. This makes it easy for the compiler to vectorize loops that contain function calls.`
48
49			`\item Instructions with vector register operands need no extra information about the vector length because this information is included in the vector registers. This makes these instructions more compact. Instructions with vector memory operands do need this extra information, though.`
50
51			`\item The design takes into account the hardware requirements of microprocessors with very long vectors where transport delays across a vector may depend on the vector length.`
52
53			`\item A new efficient system with re-linkable function libraries eliminates the need for dynamic link libraries and shared objects.`
54
55			`\item Strong security features are built into the design.`
56
57			`\item Software standards guarantee compatibility between different compilers, programming languages, user interface frameworks, and operating systems.`
58
59			`\end{itemize}`
60
61			`The memory model is flexible with relative addresses. Everything is position-independent. Memory management is simpler than in many current systems.`
62			`There is little or no need for virtual address translation. The preferred implementation has no translation lookaside buffer (TLB) and no memory paging, but a simple on-chip memory map with variable-size memory sections. Problems with stack overflow, memory fragmentation, etc. can be avoided completely in most cases. Task switches will be fast because of the small memory map and because of the efficient mechanism for saving vector registers.`
63			`\vv`
64
65			The principle that a fundamental redesign enables us to learn from history and integrate late additions into the basic design also applies to the whole ecosystem of ABI standard, function libraries, compilers, linkers, and operating system. By defining not only an instruction set, but also ABI standard, binary file format, interface library standard, etc. we get the further advantage that different compilers and different programming languages will be compatible with each other. It will be possible to write different parts of a program in different programming languages and to use the same function libraries with all compilers. Even different operating systems will be compatible to some degree. A feature for re-linking an executable file makes it possible to run the same binary program file in different operating systems or on different platforms where the appropriate user interface framework is selected when the program is installed.
66			`\vv`
67
68			We have also learned from past mistakes that it is difficult to predict future needs. While the ForwardCom instruction set is intended to be flexible with room for future extensions, we may ask whether the future will bring needs for new features that are difficult to integrate into our design and standards. The best way to prevent such unforeseen problems is to allow input and suggestions from the entire community of hardware and software developers. It is important that the design and standards are developed through an open process that allows everybody to comment and make suggestions. We have often seen the problems of leaving this to a commercial industry. The industry often makes short-term decisions for marketing reasons. Patents, license restrictions, and trade secrets harm competition and prevent niche operators from entering the market. The microprocessor industry often keeps new features and instruction set extensions secret for competitive reasons until it is too late to change them in case the IT community comes up with better proposals.
69			`\vv`
70
71			`The ForwardCom project is being developed in an open process with contributions and ideas from various people based on the philosophy that the problems mentioned above can best be avoided through openness and collaboration.`
72			`\end{document}`

Browse

Tools

Subversion Repositories forwardcom

[/] [forwardcom/] [manual/] [fwc_conclusion.tex] - Blame information for rev 142