low compile efficiency
by bruceli on Aug 27, 2010
I find that our or32-elf-gcc behaves in a weird way: there is a lot of redundancy in the compiled output. All I need is to store the byte value 35 (0x23) at address 0x9000_0007, but the disassembled code shows the compiler handling this task in a complex way, so the code size is unnecessarily big. Can anybody explain why GCC works like this, and give some hints on how to solve this problem? Thanks a lot!!!
----------------- disassembled code (the comments are added by me) ---------------------
#define REG8(add) *((volatile unsigned char *)(add))
#define MSG_UART_BR_LSB_ADDR 0x90000007

void msg_uart_init(void)
{
    f4e4: 9c 21 ff d0   l.addi  r1,r1,0xffffffd0
/* baud rate config */
REG8(MSG_UART_BR_LSB_ADDR) = 35;
    f4e8: 18 60 90 00   l.movhi r3,0x9000        // load the high half of the address into r3: r3 = 0x9000_0000
    f4ec: d4 01 18 20   l.sw    0x20(r1),r3      // store r3 to the stack (sp+0x20)
    f4f0: 84 81 00 20   l.lwz   r4,0x20(r1)      // load the stack slot (sp+0x20) into r4
    f4f4: a8 84 00 07   l.ori   r4,r4,0x7        // r4 |= 0x7, so r4 = 0x9000_0007, the value of MSG_UART_BR_LSB_ADDR
    f4f8: d4 01 20 24   l.sw    0x24(r1),r4      // store r4 to the stack (sp+0x24)
    f4fc: 9c 60 00 23   l.addi  r3,r0,0x23       // move 35 (0x23) into r3
    f500: d4 01 18 1c   l.sw    0x1c(r1),r3      // store r3 to the stack (sp+0x1c)
    f504: 84 81 00 1c   l.lwz   r4,0x1c(r1)      // load the stack slot (sp+0x1c) into r4, now r4 = 0x23
    f508: d8 01 20 1b   l.sb    0x1b(r1),r4      // store the low byte of r4 to the stack (sp+0x1b)
    f50c: 8c 81 00 1b   l.lbz   r4,0x1b(r1)      // load the byte at sp+0x1b back into r4
    f510: 84 61 00 24   l.lwz   r3,0x24(r1)      // load the stack slot (sp+0x24) into r3
    f514: d8 03 20 00   l.sb    0x0(r3),r4       // store r4 (35) to the address in r3 (0x9000_0007)
REG8(MSG_UART_BR_MSB_ADDR) = 0;
    f518: 18 60 90 00   l.movhi r3,0x9000
    f51c: d4 01 18 14   l.sw    0x14(r1),r3
    f520: 84 81 00 14   l.lwz   r4,0x14(r1)
    f524: a8 84 00 08   l.ori   r4,r4,0x8
    f528: d4 01 20 28   l.sw    0x28(r1),r4
.......
RE: low compile efficiency
by jeremybennett on Aug 27, 2010
Hi Bruceli, this problem only occurs in unoptimized code. It was due to a bug, which by coincidence was spotted and fixed yesterday. Try the version of GCC in SVN. Here are some results from that compiler, using a stripped-down version of your test case:
#define REG8(add) *((volatile unsigned char *)(add))
#define MSG_UART_BR_LSB_ADDR 0x90000007
void msg_uart_init(void)
{
/* baud rate config */
REG8(MSG_UART_BR_LSB_ADDR) = 35;
}
The assembly code is as follows:
l.addi r1,r1,-4
l.sw 0(r1),r2
l.addi r2,r1,4
l.movhi r3,hi(-1879048192)
l.ori r4,r3,7
l.addi r3,r0,35
l.sb 0(r4),r3
l.lwz r2,0(r1)
l.jr r9
l.addi r1,r1,4
For comparison, this is the assembly code with -O2:
l.addi r1,r1,-4
l.sw 0(r1),r2
l.addi r2,r1,4
l.movhi r3,hi(-1879048192)
l.addi r4,r0,35
l.ori r3,r3,7
l.sb 0(r3),r4
l.lwz r2,0(r1)
l.jr r9
l.addi r1,r1,4
Apart from some register-name juggling, this is the same. However, if we tell GCC to omit the frame pointer (-fomit-frame-pointer), the code is much better, even without optimization. GCC even realizes that it doesn't need the stack pointer (r1) either:
l.movhi r3,hi(-1879048192)
l.ori r4,r3,7
l.addi r3,r0,35
l.sb 0(r4),r3
l.jr r9
l.nop
This still is not perfect. The l.sb instruction could have been placed in the delay slot of l.jr, avoiding the l.nop. This is a failing in GCC: I know its handling of epilogues is not good and needs improving.
HTH,
Jeremy
RE: low compile efficiency
by mikerez on Aug 27, 2010
> This problem only occurs in unoptimized code. It was due to a bug, which by coincidence was spotted and fixed yesterday. Try the version of GCC in SVN.
What I am not sure about is whether that bug affects only unoptimized code. I think local allocation matters for optimized code too, but this is more difficult to check; maybe in the future.
Mikhail
RE: low compile efficiency
by bruceli on Aug 30, 2010
Hi JB and Mikhail,
I applied the patch, and now GCC gives a pretty neat result. The disassembled code is listed below. Thank you!
-------------------------------------------
void msg_uart_init(void)
{
    c3e8: 9c 21 ff fc   l.addi  r1,r1,0xfffffffc
    c3ec: d4 01 10 00   l.sw    0x0(r1),r2
    c3f0: 9c 41 00 04   l.addi  r2,r1,0x4
/* baud rate config */
REG8(MSG_UART_BR_LSB_ADDR) = 35;
    c3f4: 18 60 90 00   l.movhi r3,0x9000
    c3f8: a8 83 00 07   l.ori   r4,r3,0x7
    c3fc: 9c 60 00 23   l.addi  r3,r0,0x23
    c400: d8 04 18 00   l.sb    0x0(r4),r3
REG8(MSG_UART_BR_MSB_ADDR) = 0;
    c404: 18 60 90 00   l.movhi r3,0x9000
......
-------------------------------------------