OpenCores


Proposal for a new high performance ISA


RE: Proposal for a new high performance ISA by dgisselq on Mar 27, 2016			dgisselq Posts: 247 Joined: Feb 20, 2015 Last seen: Mar 27, 2025
[q]There is an existing project like this: OR1K, the OpenRISC project. I am wondering why it is not mentioned in this proposal.[/q]
Well, let's see: if you'd read the proposal, you might've read how it differs from the OR1k instruction set in section 1.1. One of the unique concepts in this proposal is that of variable-length vectors. While I personally see it as a long shot, I'd still be glad to sit on the sidelines, cheer on Agner, and watch carefully how it turns out. If it does turn out well, it promises to be a game-changer.
JW, as for your comment, thank you: I just learned a new word, specious. Perhaps one of these days we can actually have a conversation about low power and FPGAs.
Yours,
Dan

RE: Proposal for a new high performance ISA by robfinch on Apr 5, 2016			robfinch Posts: 28 Joined: Sep 29, 2005 Last seen: Jun 5, 2025
I was just wondering what the interrupt response time would be like if one were to do a long vector divide, for instance. Is the vector operation going to be interruptible?

RE: Proposal for a new high performance ISA by Agner on Apr 5, 2016			Agner Posts: 12 Joined: Mar 23, 2016 Last seen: Mar 18, 2023
[q]I was just wondering what the interrupt response time would be like if one were to do a long vector divide, for instance. Is the vector operation going to be interruptible?[/q]
If you are talking about external interrupts, then all instructions should in principle be interruptible. If an instruction has long latency, it may be best to abort it and redo it after the interrupt in order to get a short response time. It is not possible to save a half-finished result in case of an interrupt.
Another problem is error traps. Traps are not the best way to track errors in vector code because an exception in one vector element will interrupt the whole vector. The behavior of a software program will then depend on the vector length, which can be different on different processors in my proposal. Errors in floating point calculations can be detected through the propagation of INF and NAN values, but it is a problem how to detect overflow in integer vector calculations. This has been discussed on my blog ( http://www.agner.org/optimize/blog/read.php?i=421 ), where various solutions have been proposed. None of them seems perfect, so I would like to hear your opinion as well. Here is a list of proposed solutions for detecting overflow in integer vector calculations:
Proposal 1. Use the mask/flags register for both input and output, and propagate overflow flags through this register. The opcode has a 3-bit field which can specify vector register v1 - v7 as mask/flags register. The mask is used for conditionally executing some elements in a vector and turning others off. Other bits in the mask/flags register are used for floating point options such as trap control, rounding mode, enabling subnormal numbers, etc. Proposal 1 is to use some bits in the mask/flags register as output to indicate signed and unsigned integer overflow. The problem is that you get an extra output. If instructions can have two outputs, then the OOO scheduler and the register renamer will be more complicated. You also get extra dependencies for subsequent instructions that use the same flags register, so the compiler will have to use multiple flags registers to remove spurious dependencies.
Proposal 2. Extra flags in the vector register itself. This proposal is to have one carry/overflow bit for each 32 bits in all vector registers. The overflow information is then saved in the same register as the result of the instruction. The mask/flags register, which is now input only, has option bits for choosing between signed and unsigned integer overflow and for whether the overflow bit is propagated from input register operands to the output operand. There is a special instruction for saving a vector register on task switches and in a callee-save situation. This instruction will save everything to guarantee that the register can be restored completely. Such an instruction is needed anyway because I have decided to store the vector length in the variable-length vector registers. (Then you only have to save the part of the vector register that is actually used.) The problems with proposal 2 are: You can propagate the overflow information through register operands, but not memory operands. This will be a problem for the compiler. And it is a lot of extra machinery for something that is rarely used. On the other hand, it is possible that detection of integer overflow will be used more if the hardware makes it easy and efficient.
Proposal 3. Make extra instructions for integer add, multiply, etc. which do not output the result, only the boolean information on whether it overflows. These instructions may have an extra input operand for propagating the overflow information. The problems with this proposal are that you have to do all calculations twice, and that the propagation creates additional dependency chains which can slow down OOO execution.
Any suggestions? Which solution do you think is best?
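To make the trade-off in Proposal 3 concrete, here is a minimal Python sketch of how a result-only add and a separate flags-only add might pair up. It is only an illustration: the function names (`vec_add`, `vec_add_overflow`), the 32-bit element size, and the use of Python lists as "vectors" are assumptions made for this sketch and are not taken from the proposal.

```python
from typing import List, Optional

BITS = 32                      # assumed element width for this sketch
MASK = (1 << BITS) - 1
INT_MIN = -(1 << (BITS - 1))
INT_MAX = (1 << (BITS - 1)) - 1


def wrap_signed(x: int) -> int:
    """Wrap an unbounded Python int to a signed 32-bit two's complement value."""
    x &= MASK
    return x - (1 << BITS) if x & (1 << (BITS - 1)) else x


def vec_add(a: List[int], b: List[int]) -> List[int]:
    """Ordinary vector add: one output, the wrapped results, no flags."""
    return [wrap_signed(x + y) for x, y in zip(a, b)]


def vec_add_overflow(a: List[int], b: List[int],
                     flags_in: Optional[List[bool]] = None) -> List[bool]:
    """Hypothetical companion instruction: outputs only per-element overflow
    booleans, ORed with optional incoming flags so that errors propagate."""
    if flags_in is None:
        flags_in = [False] * len(a)
    return [f or not (INT_MIN <= x + y <= INT_MAX)
            for x, y, f in zip(a, b, flags_in)]


a = [INT_MAX, 1, -5]
b = [1, 2, -6]
total = vec_add(a, b)             # first pass: the results
flags = vec_add_overflow(a, b)    # second pass: only the overflow information

# A later add keeps the earlier overflow visible through the extra input operand.
zero = [0, 0, 0]
total2 = vec_add(total, zero)
flags2 = vec_add_overflow(total, zero, flags)
```

The sketch shows both costs named above: every addition is evaluated twice, and `flags2` depends on `flags`, which is the extra dependency chain.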

RE: Proposal for a new high performance ISA by robfinch on Apr 5, 2016			robfinch Posts: 28 Joined: Sep 29, 2005 Last seen: Jun 5, 2025
I like proposal #2 the best, but I'm a neophyte when it comes to vector code. It should be possible to represent the status bits as another set of registers with the bits packed into the register, e.g. 32 sets of flag bits fitting into one 64-bit register. You also might need a status bit for divide by zero and square root of zero.
What about combining #2 and #3 somehow? Make the result output wider to accommodate status flags which are stored in the vector registers, then use a second instruction, if desired, that works with status results from multiple vector elements at the same time. If interrupts were locked out (1 cycle) until the second instruction completes, it wouldn't be necessary to save and restore the extra bits to memory.
For proposal #3: note that the second set of instructions, which detects overflow / carry / divide by zero and executes optionally after a vector instruction, doesn't have to perform the whole calculation over again; it can bit fiddle and be done in a single cycle.
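As a rough illustration of the "bit fiddle" idea for addition, the sketch below derives signed overflow and unsigned carry from the operands and the already-computed wrapped sum using only XOR, AND, and a compare, then packs two flag bits per element into one integer, so that 32 elements would fill a 64-bit register. The function names, the 32-bit element width, and the flag layout (bit 0 = signed overflow, bit 1 = unsigned carry) are assumptions for this sketch, not part of any proposal in the thread.

```python
from typing import List

BITS = 32                     # assumed element width
MASK = (1 << BITS) - 1


def signed_overflow(a: int, b: int, r: int) -> int:
    """a, b, r are 32-bit patterns; r is the wrapped sum a + b.
    Overflow happened iff a and b share a sign and r has the other sign."""
    return (((a ^ r) & (b ^ r)) >> (BITS - 1)) & 1


def unsigned_carry(a: int, r: int) -> int:
    """A carry out of the top bit happened iff the wrapped sum is below an operand."""
    return 1 if r < a else 0


def pack_add_flags(a_vec: List[int], b_vec: List[int], r_vec: List[int]) -> int:
    """Pack 2 flag bits per element (hypothetical layout) into one integer."""
    assert len(a_vec) <= 32, "32 elements x 2 bits fill a 64-bit register"
    packed = 0
    for i, (a, b, r) in enumerate(zip(a_vec, b_vec, r_vec)):
        a &= MASK
        b &= MASK
        r &= MASK
        flags = signed_overflow(a, b, r) | (unsigned_carry(a, r) << 1)
        packed |= flags << (2 * i)
    return packed


a_vec = [0x7FFFFFFF, 0xFFFFFFFF, 1]
b_vec = [1, 1, 2]
r_vec = [(a + b) & MASK for a, b in zip(a_vec, b_vec)]  # results from the vector add
status = pack_add_flags(a_vec, b_vec, r_vec)
```

Here element 0 overflows as a signed add and element 1 carries as an unsigned add, so `status` has bit 0 and bit 3 set. The same trick does not cover multiply or divide by zero, which would need their own checks.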

RE: Proposal for a new high performance ISA by Agner on May 11, 2016			Agner Posts: 12 Joined: Mar 23, 2016 Last seen: Mar 18, 2023
Thank you for your comments and ideas. My proposal has now been updated. The most important changes are:
- It has got a name: CRISC1
- The length of a vector is now stored in the vector register. When saving a long vector register you only need to save the part of the register that is actually used.
- Instructions can have no more than one output dependency and up to five input dependencies.
- All application-level instructions are now defined. System instructions are not defined yet.
The updated proposal is at: http://www.agner.org/optimize/instructionset.pdf
Further discussion at: http://www.agner.org/optimize/blog/read.php?i=421
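The point about storing the vector length in the register can be pictured with a small Python model. This is only a sketch under stated assumptions: the 64-byte maximum, the 4-byte little-endian length header, and the `VectorRegister` class with its `load`, `save`, and `restore` methods are invented for illustration and do not describe the actual save instruction or its memory format.

```python
import struct

MAX_BYTES = 64  # hypothetical maximum vector length of one implementation


class VectorRegister:
    """Toy model of a vector register that carries its own length."""

    def __init__(self) -> None:
        self.length = 0
        self.data = bytearray(MAX_BYTES)

    def load(self, payload: bytes) -> None:
        """Fill the register; the length travels with the data."""
        if len(payload) > MAX_BYTES:
            raise ValueError("payload exceeds the maximum vector length")
        self.data = bytearray(MAX_BYTES)
        self.length = len(payload)
        self.data[:self.length] = payload

    def save(self) -> bytes:
        """Save only the used part, preceded by the length (assumed format)."""
        return struct.pack("<I", self.length) + bytes(self.data[:self.length])

    def restore(self, image: bytes) -> None:
        """Rebuild the register from an image produced by save()."""
        (length,) = struct.unpack_from("<I", image, 0)
        self.load(image[4:4 + length])


reg = VectorRegister()
reg.load(bytes(range(16)))     # only 16 of the 64 bytes are in use
image = reg.save()             # 4-byte header + 16 bytes instead of 64

other = VectorRegister()
other.restore(image)
```

In this model a task switch or callee-save writes 20 bytes for a 16-byte vector rather than the full 64, which is the saving the update describes.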

RE: Proposal for a new high performance ISA by dgisselq on May 11, 2016			dgisselq Posts: 247 Joined: Feb 20, 2015 Last seen: Mar 27, 2025
Agner,
Is your choice of CRISC1 in any way related to the CRISC architecture, or a reference to it?
You mentioned a laundry list of tasks a while back for your next steps: assembler, compiler, linker, simulator, profiler, FPGA, ASIC (and I'll add debugger). Will these tasks be conducted publicly where we can all watch, or quietly? In other words, will you post your work to OpenCores as a work in progress, to github, or somewhere else?
As another related question, how involved do you personally expect to be with these tasks?
Others,
May I commend to your reading attention Agner's blog? There are some fascinating articles there that I truly enjoyed reading, regarding how particular architectural features of x86-based computers were implemented and how code for them may be optimized.
Dan

Now on Github: Proposal for a new high performance ISA by Agner on Jun 26, 2016			Agner Posts: 12 Joined: Mar 23, 2016 Last seen: Mar 18, 2023
I have now made a repository on Github for the development of this ISA and software toolchain. The address is https://github.com/ForwardCom. ForwardCom stands for Forward Compatible Computer system (the previous name CRISC was not available).
Unfortunately, Github has no mailing list system or forum feature. Right now the discussion is taking place mainly at my own website, http://agner.org/optimize/blog/. I don't know if we can host the discussion here on OpenCores? The project is much more than the development of a hardware core - it is a complete vertical redesign of ISA, ABI standards, memory management system, software toolchain, and of course microprocessor core. The development of the hardware belongs here on OpenCores anyway, of course.
The latest version of the manual is now at https://github.com/ForwardCom/manual/raw/master/forwardcom.pdf.
New in this version:
- Security features added.
- Support for dual stack.
- Some instructions and formats modified, including more formats for jump and call instructions.
- System call, system return and trap instructions added.
- New addressing mode for arrays with bounds checking.
- Memory management and ABI standards described in more detail.
- Instruction list in comma separated file instruction_list.csv to be used by assemblers, emulators, debuggers, etc.
- Object file format defined in file elf_forwardcom.h


© copyright 1999-2025 OpenCores.org, equivalent to Oliscience, all rights reserved. OpenCores®, registered trademark.