OpenCores
Re: or1200 execution units
by Unknown on Feb 19, 2004
>> "mul" and "mac" instructions have their results ready after three cycles.
>
> "mac" takes effectively one clock cycle if there is no data hazard.
> Otherwise it takes 3.

mac takes less time than mul? I thought mac uses the multiplier from
mul. Isn't that why mac requires mul to be implemented?

> BTW what would be excellent is if you had a look at code size. It seems
> the current toolchain is not very efficient when it comes to code size. I know
> that folks have used a MIPS compiler in the past and mapped the generated code
> to openrisc. The two architectures are similar enough that this is possible. I
> hear that the MIPS compiler generates smaller code than GCC. So investing in
> code size would be very valuable (I know a couple of
> openrisc based ASIC projects where code size is very important...)


Optimization is done by gcc. So I believe that getting smaller code is a
matter of describing the machine more precisely. I noticed that some
instructions are not used by gcc yet, so maybe there is some potential. I
will examine the target definitions as soon as I find some time for it.

Heiko

Re: or1200 execution units
by Unknown on Feb 19, 2004
>> "mac" takes effectively one clock cycle if there is no data hazard.
>> Otherwise it takes 3.
>
> mac takes less time than mul? I thought mac uses the multiplier from
> mul. Isn't that why mac requires mul to be implemented?


Yes, mac uses the same hardware resources as mul. But because its destination
is only MACLO/MACHI rather than a GPR, it can effectively complete in 1 clock
cycle (unless an l.macrc follows, in which case it will take 3 clock cycles).
Anyway, there is no definition for l.mac in gcc. I tried to add l.mac but I
never managed to implement it in the .md file. I don't know why it didn't
work. I tried copying mac definitions from other GCC ports, but gcc simply
didn't emit the l.mac insn.
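
The timing rule above can be written down as a small cost model. This is only a sketch under stated assumptions: the enum tags and function names are hypothetical (they come from neither or1200 nor gcc), every other insn is charged one cycle, and only the "l.macrc immediately follows l.mac" case from the post is modelled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical opcode tags for this sketch; not taken from or1200 or gcc. */
enum insn { INSN_MAC, INSN_MACRC, INSN_OTHER };

/* Cycles charged to l.mac as described in the post: 1 normally,
 * 3 when the next insn is l.macrc and has to wait for the result. */
static unsigned mac_cost(int followed_by_macrc)
{
    return followed_by_macrc ? 3u : 1u;
}

/* Estimate cycles for a sequence. Assumption: every insn other than
 * l.mac costs exactly one cycle. */
unsigned estimate_cycles(const enum insn *seq, size_t n)
{
    assert(seq != NULL || n == 0);
    unsigned cycles = 0;
    for (size_t i = 0; i < n; i++) {
        if (seq[i] == INSN_MAC) {
            int next_is_macrc = (i + 1 < n) && seq[i + 1] == INSN_MACRC;
            cycles += mac_cost(next_is_macrc);
        } else {
            cycles += 1;
        }
    }
    return cycles;
}
```

Under this model a run of back-to-back l.mac insns pays the 3-cycle cost only once, on the last l.mac before the l.macrc that reads the accumulator.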

> Optimization is done by gcc. So I believe that getting smaller code is a
> matter of describing the machine more precisely. I noticed that some
> instructions are not used by gcc yet, so maybe there is some potential. I
> will examine the target definitions as soon as I find some time for it.


I can think of three optimizations; they would be speed optimizations as
well:
- right now l.sfXX is always followed by a cond branch instruction (or, put
another way, a cond branch is always preceded by an l.sfXX insn). Splitting
this pair into two separate insns would be good for speed and maybe also for
size (?)
- right now the l.sfXXi insns are not implemented (I tried to implement them
and you can see some attempts in or32.c, but it didn't work properly all the
time: it emitted code but sometimes gcc crashed)
- some optional insns like l.addc are not implemented
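
To illustrate the last item: an add-with-carry insn matters mostly for multi-word arithmetic. The sketch below adds two 64-bit values held as 32-bit halves. The struct and function names are hypothetical, and the remark about what a compiler emits without l.addc is an assumption, not something verified against the or32 port.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type: a 64-bit value split into 32-bit words. */
struct u64_pair { uint32_t lo, hi; };

struct u64_pair add64_via_halves(struct u64_pair a, struct u64_pair b)
{
    struct u64_pair r;
    r.lo = a.lo + b.lo;               /* plain add, wraps modulo 2^32 */
    uint32_t carry = (r.lo < a.lo);   /* carry out of the low word */
    /* An add-with-carry insn folds this extra addition into the high-word
     * add; without one the carry has to be computed and added separately. */
    r.hi = a.hi + b.hi + carry;
    return r;
}

/* Small self-check: 0x00000000FFFFFFFF + 1 == 0x0000000100000000. */
void add64_demo(void)
{
    struct u64_pair a = { 0xFFFFFFFFu, 0u };
    struct u64_pair b = { 1u, 0u };
    struct u64_pair r = add64_via_halves(a, b);
    assert(r.lo == 0u && r.hi == 1u);
}
```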

regards,
Damjan




Re: or1200 execution units
by Unknown on Feb 20, 2004
OK, based on that information, here's the new function unit description.
It looks like we can express the machine with only one function unit,
since no insn can execute before the previous insn has been executed.

;; Fields: name, multiplicity, simultaneity, test, ready-delay, issue-delay.
(define_function_unit "pipeline" 1 0
  (eq_attr "type" "shift,add,logic,extend,move,compare") 1 1)
(define_function_unit "pipeline" 1 0 (eq_attr "type" "mul") 3 1)
(define_function_unit "pipeline" 1 0 (eq_attr "type" "load") 2 1)
(define_function_unit "pipeline" 1 0 (eq_attr "type" "store") 1 1)
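
As a rough illustration of what these numbers mean for a scheduler, here is a table-driven sketch in C. It is not how gcc implements scheduling. It assumes a single in-order unit, an issue delay of 1 for every type, and that an insn only ever depends on the insn directly before it; the type tags and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tags mirroring the "type" attribute values above;
 * T_ALU stands for shift, add, logic, extend, move and compare. */
enum insn_type { T_ALU, T_MUL, T_LOAD, T_STORE, T_COUNT };

/* Ready delays copied from the function unit description. */
static const unsigned ready_delay[T_COUNT] = {
    [T_ALU] = 1, [T_MUL] = 3, [T_LOAD] = 2, [T_STORE] = 1
};

struct insn {
    enum insn_type type;
    int uses_prev_result;   /* nonzero if it reads the previous insn's result */
};

/* Cycle in which the last result of the sequence becomes ready. */
unsigned schedule_length(const struct insn *seq, size_t n)
{
    assert(seq != NULL || n == 0);
    unsigned issue = 0;     /* cycle in which the current insn issues */
    unsigned finish = 0;    /* latest cycle in which any result is ready */
    for (size_t i = 0; i < n; i++) {
        assert(seq[i].type < T_COUNT);
        if (i > 0) {
            unsigned next = issue + 1;   /* issue delay is 1 for all types */
            unsigned prev_ready = issue + ready_delay[seq[i - 1].type];
            if (seq[i].uses_prev_result && prev_ready > next)
                next = prev_ready;       /* stall until the operand is ready */
            issue = next;
        }
        unsigned done = issue + ready_delay[seq[i].type];
        if (done > finish)
            finish = done;
    }
    return finish;
}
```

In this model a mul followed by an add that consumes its result finishes in cycle 4, while the same pair without the dependency finishes in cycle 3, which is the kind of gap the scheduler tries to fill with independent insns.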

Heiko