The or32 toolchain appears to be incapable of even simple optimizations.
Below is a sample from a simple loop that writes a value to consecutive words in memory.
This is what the or32 toolchain from git generated with -O3:
    22f8: 18 40 ff ff  l.movhi r2,0xffff          //constant i is checked against
    22fc: d4 03 20 00  l.sw    0x0(r3),r4         //Store value to memory
    2300: a8 42 0a f0  l.ori   r2,r2,0xaf0        //constant i is checked against
    2304: 9c 63 ff fc  l.addi  r3,r3,0xfffffffc   //--i
    2308: e4 23 10 00  l.sfne  r3,r2              //check i
    230c: 13 ff ff fb  l.bf    22f8
This is what I would expect to be generated (note the branch target change):
    22f8: 18 40 ff ff  l.movhi r2,0xffff          //constant i is checked against
    22fc: a8 42 0a f0  l.ori   r2,r2,0xaf0        //constant i is checked against
    2300: d4 03 20 00  l.sw    0x0(r3),r4         //Store value to memory
    2304: 9c 63 ff fc  l.addi  r3,r3,0xfffffffc   //--i
    2308: e4 23 10 00  l.sfne  r3,r2              //check i
    230c: 13 ff ff fb  l.bf    2300
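For reference, the loop has roughly the following shape in C. This is only a sketch with hypothetical names (fill_words_down, start, end, value), not the exact source from the linked build. In the real code the end address is a compile-time constant, which is why the compiler materializes it with the l.movhi/l.ori pair; here it is passed as a parameter so the function can be tried on any buffer.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch: store `value` to every word from `start` down to,
 * but not including, `end`, stepping 4 bytes (one word) at a time.
 * Precondition: start is above end, otherwise the loop never terminates. */
void fill_words_down(volatile uint32_t *start, volatile uint32_t *end,
                     uint32_t value)
{
    assert(start > end);

    volatile uint32_t *p = start;
    do {
        *p = value;        /* corresponds to l.sw 0x0(r3),r4          */
        p--;               /* corresponds to l.addi r3,r3,0xfffffffc  */
    } while (p != end);    /* corresponds to l.sfne r3,r2 and l.bf    */
}
```

The end value does not change inside the loop, so I would expect the two instructions that build it to run once before the loop rather than on every iteration.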
I am using rc1 from: http://git.openrisc.net/cgit.cgi/jonas/toolchain/
I will update my toolchain and get back to you.
Regardless, here is a simplified build showing the issue: http://www.wuala.com/Deathbob/Files/OR32/Toolchain_LoopBug.zip
Thanks for taking a look at it since my toolchain is still compiling...
I figured it was low priority, but it will never be fixed if you don't know about it. I don't expect most people to look at the generated assembly (I only did so to squeeze some cycles out of an interrupt).
Regards, Thomas