



Re: or1200 execution units
by Unknown on Feb 19, 2004
>> "mul" and "mac" instructions have their results ready after three cycles.
> "mac" takes effectively one clock cycle if there is no data hazard. Otherwise it takes 3.

mac takes less time than mul? I thought mac uses the multiplier from mul. Isn't that why mac requires mul to be implemented?
> BTW what would be excellent is if you had a look at code size. It seems the current toolchain is not very efficient when it comes to code size. I know that folks have used a MIPS compiler in the past and mapped the generated code to openrisc. The two architectures are similar enough that this is possible. I hear that the MIPS compiler generates smaller code than GCC. So investing in the direction of code size would be very valuable (I know a couple of openrisc based ASIC projects where code size is very important...)

Optimization is done by gcc. So I believe that getting smaller code is a matter of describing the machine more precisely. I noticed that some instructions are not used by gcc yet, so maybe there is some potential. I will examine the target definitions when I find some time for it.

Heiko
Re: or1200 execution units
by Unknown on Feb 19, 2004
> "mac" takes effectively one clock cycle if there is no data hazard.
> Otherwise it takes 3.
> mac takes less time than mul? I thought mac uses the multiplier from mul. Isn't that why mac requires mul to be implemented?

Yes, mac uses the same hardware resources as mul. But because it has no GPR destination register, only MACLO/MACHI as its destination, it can complete in effectively 1 clock cycle (unless an l.macrc follows, in which case it will take 3 clock cycles).

Anyway, there is no definition for l.mac in gcc. I tried to add l.mac, but I never succeeded in implementing it in the .md file. I don't know why it didn't work. I tried to copy mac definitions from other ports of GCC, but it simply didn't emit the l.mac insn.
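
The timing rule described in this message can be written down as a small model. This is only an illustrative sketch, not something taken from the or1200 sources: the function name `effective_cycles` and the constants are hypothetical, and every instruction other than l.mul and l.mac is simply assumed to cost one cycle.

```python
# Hypothetical timing model based only on the numbers quoted in this thread.
MUL_CYCLES = 3           # l.mul: result ready after three cycles
MAC_CYCLES = 1           # l.mac: writes only MACLO/MACHI, no GPR destination
MAC_BEFORE_MACRC = 3     # l.mac directly followed by l.macrc


def effective_cycles(insns):
    """Return the effective cycle count of each mnemonic in `insns`.

    Assumption: any instruction that is not l.mul or l.mac costs one cycle.
    """
    costs = []
    for i, insn in enumerate(insns):
        following = insns[i + 1] if i + 1 < len(insns) else None
        if insn == "l.mul":
            costs.append(MUL_CYCLES)
        elif insn == "l.mac":
            # The accumulator is only read back by l.macrc, so the full
            # latency is visible only when l.macrc comes next.
            costs.append(MAC_BEFORE_MACRC if following == "l.macrc" else MAC_CYCLES)
        else:
            costs.append(1)
    return costs
```

For example, in a run of several l.mac insns that ends with l.macrc, only the last l.mac would pay the three-cycle cost under this model.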
> Optimization is done by gcc. So I believe that getting smaller code is a matter of describing the machine more precisely. I noticed that some instructions are not used by gcc yet, maybe there is some potential. I will examine the target definitions as I find some time for it.
> Heiko

I can think of two optimizations; they would be speed optimizations as well:
- right now l.sfXX is always followed by a cond branch instruction (or a cond branch is always preceded by an l.sfXX insn). Splitting this pair into two separate insns would be good for speed and maybe also for size (?)
- right now l.sfXXi are not implemented (I tried to implement them and you can see some attempts in or32.c, but it didn't work properly all the time: it emitted code but sometimes gcc crashed)
- some optional insns like l.addc are not implemented

regards, Damjan
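
To illustrate why the immediate compare forms could help code size, here is a rough sketch of the kind of folding a compiler could do once l.sfXXi is available. It is not how gcc or or32.c does it: the tuple encoding, the function name `fold_immediate_compares`, and the `dead_after` set are all hypothetical. The sketch assumes the constant is loaded with `l.addi rT,r0,imm` and, to stay clear of sign-extension questions, only folds constants in the range 0..32767.

```python
def fold_immediate_compares(insns, dead_after):
    """Fold `l.addi rT,r0,imm` + `l.sfXX rA,rT` into `l.sfXXi rA,imm`.

    insns: list of tuples, e.g. ("l.addi", "r5", "r0", 10) or ("l.sfeq", "r3", "r5").
    dead_after: set of indices i such that the temporary compared at insns[i]
                is not used again (assumed to be supplied by the caller).
    """
    out = []
    i = 0
    while i < len(insns):
        cur = insns[i]
        nxt = insns[i + 1] if i + 1 < len(insns) else None
        if (nxt is not None
                and cur[0] == "l.addi" and cur[2] == "r0"
                and 0 <= cur[3] <= 32767                # conservative range
                and nxt[0].startswith("l.sf")
                and not nxt[0].endswith("i")            # register form only
                and nxt[2] == cur[1] and nxt[1] != cur[1]
                and (i + 1) in dead_after):
            # One insn instead of two: the constant moves into the compare.
            out.append((nxt[0] + "i", nxt[1], cur[3]))
            i += 2
        else:
            out.append(cur)
            i += 1
    return out
```

Each successful fold saves one instruction, which is where the size (and speed) benefit would come from.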
_______________________________________________
http://www.opencores.org/mailman/listinfo/openrisc
Re: or1200 execution units
by Unknown on Feb 20, 2004
OK, based on that information, here's the new function unit description. It looks like we can express the machine with only one function unit, since no insn can execute before the previous insn has been executed.

(define_function_unit "pipeline" 1 0 (eq_attr "type" "shift,add,logic,extend,move,compare") 1 1)
(define_function_unit "pipeline" 1 0 (eq_attr "type" "mul") 3 1)
(define_function_unit "pipeline" 1 0 (eq_attr "type" "load") 2 1)
(define_function_unit "pipeline" 1 0 (eq_attr "type" "store") 1 1)

Heiko
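
In a define_function_unit entry, the last two numbers are the ready delay (cycles until the result can be used) and the issue delay (cycles until the unit accepts the next insn). The following sketch shows roughly what those numbers imply for a straight-line sequence. It is a simplified, hypothetical model, not GCC's scheduler: the `schedule` function and the tuple format are made up for illustration, and only the delay values come from the description above.

```python
# (ready_delay, issue_delay) per insn type, copied from the unit description.
UNIT = {
    "shift": (1, 1), "add": (1, 1), "logic": (1, 1),
    "extend": (1, 1), "move": (1, 1), "compare": (1, 1),
    "mul": (3, 1), "load": (2, 1), "store": (1, 1),
}


def schedule(insns):
    """Return the issue cycle of each insn under the single-unit model.

    insns: list of (type, dest_reg_or_None, [source_regs]) in program order.
    """
    ready_at = {}     # register -> first cycle at which its value is usable
    next_issue = 0    # earliest cycle at which the unit takes another insn
    issue_cycles = []
    for kind, dest, srcs in insns:
        ready_delay, issue_delay = UNIT[kind]
        # Wait for the unit and for every source operand.
        cycle = max([next_issue] + [ready_at.get(reg, 0) for reg in srcs])
        issue_cycles.append(cycle)
        if dest is not None:
            ready_at[dest] = cycle + ready_delay
        next_issue = cycle + issue_delay
    return issue_cycles
```

Under this model, an add that consumes a mul result stalls until cycle 3, while an independent insn placed between the two fills one of the otherwise idle cycles. That is the kind of reordering the description is meant to let the scheduler find.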



