OpenCores
low compile efficiency
by bruceli on Aug 27, 2010
bruceli
Posts: 5
Joined: Feb 3, 2009
Last seen: Feb 10, 2011
I find that our or32-elf-gcc works in a weird way: there is a lot of redundancy in the
compiled result.

What I need is to write the byte value 35 (0x23) to address 0x9000_0007. In the
disassembled code, however, the compiler handles this task in a complex way, bouncing
values through the stack, so the code is unnecessarily big. Can anybody explain why gcc
works like this and give some hints on how to solve the problem?

thanks a lot!!!

----------------- disassembled code (the comments were added by me) ---------------------

#define REG8(add) *((volatile unsigned char *)(add))
#define MSG_UART_BR_LSB_ADDR 0x90000007

void msg_uart_init(void)
{
f4e4: 9c 21 ff d0 l.addi r1,r1,0xffffffd0
/* baud rate config */
REG8(MSG_UART_BR_LSB_ADDR) = 35;
f4e8: 18 60 90 00 l.movhi r3,0x9000 // load high address to r3 high part, then r3 = 0x9000_0000
f4ec: d4 01 18 20 l.sw 0x20(r1),r3 // store r3 to stack (@sp+0x20)
f4f0: 84 81 00 20 l.lwz r4,0x20(r1) // move stack (@sp+0x20) to r4
f4f4: a8 84 00 07 l.ori r4,r4,0x7 // r4|0x7, then r4 = 0x9000_0007, value of MSG_UART_BR_LSB_ADDR
f4f8: d4 01 20 24 l.sw 0x24(r1),r4 // store r4 to stack (@sp+0x24)
f4fc: 9c 60 00 23 l.addi r3,r0,0x23 // move 35(0x23) to r3
f500: d4 01 18 1c l.sw 0x1c(r1),r3 // move r3 to stack (@sp+0x1c)
f504: 84 81 00 1c l.lwz r4,0x1c(r1) // move stack (@sp+0x1c) to r4, now r4 = 0x23
f508: d8 01 20 1b l.sb 0x1b(r1),r4 // move r4 to stack (@sp+0x1b)
f50c: 8c 81 00 1b l.lbz r4,0x1b(r1) // load stack (@sp+0x1b) to r4
f510: 84 61 00 24 l.lwz r3,0x24(r1) // load stack (@sp+0x24) to r3
f514: d8 03 20 00 l.sb 0x0(r3),r4 // move r4(35) to address (EA)r3 (0x9000_0007)
REG8(MSG_UART_BR_MSB_ADDR) = 0;
f518: 18 60 90 00 l.movhi r3,0x9000
f51c: d4 01 18 14 l.sw 0x14(r1),r3
f520: 84 81 00 14 l.lwz r4,0x14(r1)
f524: a8 84 00 08 l.ori r4,r4,0x8
f528: d4 01 20 28 l.sw 0x28(r1),r4
.......
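
For anyone who wants to reproduce the comparison on another GCC target, here is a minimal host-runnable sketch of the same byte store. It is only an illustration: reg8_write, fake_uart and demo_write_lsb are hypothetical names that do not appear in the original code, and an ordinary array stands in for the UART register at 0x90000007. The macro also gains outer parentheses, which the original does not have.

```c
#include <stdint.h>

/* Same idea as the macro in the post; the outer parentheses are an addition
   so the macro stays safe when used inside a larger expression. */
#define REG8(add) (*((volatile unsigned char *)(add)))

/* Hypothetical host-side stand-in for the UART register window. */
static unsigned char fake_uart[16];

/* Hypothetical helper: store one byte through REG8 at the given address. */
static void reg8_write(uintptr_t addr, unsigned char value)
{
    REG8(addr) = value;
}

/* Offset 7 mirrors the low bits of 0x90000007; the base here is just an array. */
static unsigned char demo_write_lsb(void)
{
    reg8_write((uintptr_t)&fake_uart[7], 35);
    return fake_uart[7];
}
```

Compiling a file like this with gcc -S, once at -O0 and once at -O2, should make it easy to see how much of the output is the store itself and how much is stack traffic.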
RE: low compile efficiency
by jeremybennett on Aug 27, 2010
jeremybennett
Posts: 689
Joined: May 29, 2008
Last seen: Feb 9, 2012

Hi Bruceli

This problem only occurs in unoptimized code. It was due to a bug, which by coincidence was spotted and fixed yesterday. Try the version of GCC in SVN.

Here are some results from that compiler, using a stripped down version of your test case.

#define REG8(add) *((volatile unsigned char *)(add))
#define MSG_UART_BR_LSB_ADDR 0x90000007

void msg_uart_init(void)
{
  /* baud rate config */
  REG8(MSG_UART_BR_LSB_ADDR) = 35;
}

The assembly code is as follows:

        l.addi  r1,r1,-4
        l.sw    0(r1),r2
        l.addi  r2,r1,4
        l.movhi r3,hi(-1879048192)
        l.ori   r4,r3,7
        l.addi  r3,r0,35
        l.sb    0(r4),r3
        l.lwz   r2,0(r1)
        l.jr    r9
        l.addi  r1,r1,4

For comparison, this is the assembly code with -O2:

        l.addi  r1,r1,-4
        l.sw    0(r1),r2
        l.addi  r2,r1,4
        l.movhi r3,hi(-1879048192)
        l.addi  r4,r0,35
        l.ori   r3,r3,7
        l.sb    0(r3),r4
        l.lwz   r2,0(r1)
        l.jr    r9
        l.addi  r1,r1,4

Apart from some register renaming and instruction reordering, this is the same. However, if we tell GCC to omit the frame pointer (-fomit-frame-pointer), the code is much better, even without optimization. GCC even realizes it doesn't need to adjust the stack pointer (r1) either.

        l.movhi r3,hi(-1879048192)
        l.ori   r4,r3,7
        l.addi  r3,r0,35
        l.sb    0(r4),r3
        l.jr    r9
        l.nop

This is still not perfect. The l.sb instruction could have been placed in the delay slot of l.jr, which would have avoided the l.nop. This represents a failing in the GCC compiler. I know its handling of epilogues is not good and needs improving.
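
The hi(-1879048192) operand in the listings above can look odd next to the 0x9000 shown in the disassembly. The following is a rough sketch in plain C that models the l.movhi / l.ori pair, only to show the arithmetic. The function names (or1k_movhi, or1k_ori, hi16, build_uart_lsb_addr) are made up for this illustration and are not part of any toolchain.

```c
#include <assert.h>
#include <stdint.h>

/* Model of l.movhi rD,K: the 16-bit immediate goes into the upper half,
   the lower half is cleared. */
static uint32_t or1k_movhi(uint16_t k)
{
    return (uint32_t)k << 16;
}

/* Model of l.ori rD,rA,K: the zero-extended 16-bit immediate is ORed into rA. */
static uint32_t or1k_ori(uint32_t ra, uint16_t k)
{
    return ra | (uint32_t)k;
}

/* Upper 16 bits of a 32-bit constant, i.e. what hi() selects in the listing. */
static uint16_t hi16(uint32_t value)
{
    return (uint16_t)(value >> 16);
}

/* Hypothetical helper: rebuild the register address the way the listing does. */
static uint32_t build_uart_lsb_addr(void)
{
    /* The compiler prints the constant as a signed 32-bit number;
       -1879048192 has the same bit pattern as 0x90000000. */
    const int32_t printed = -1879048192;
    uint32_t base = (uint32_t)printed;

    uint32_t r3 = or1k_movhi(hi16(base)); /* l.movhi r3,hi(-1879048192) */
    uint32_t r4 = or1k_ori(r3, 7);        /* l.ori   r4,r3,7            */

    assert(r4 == UINT32_C(0x90000007));
    return r4;
}
```

So the two instructions are the minimum needed to materialize a full 32-bit address on this architecture; only the surrounding stack traffic in the original listing was redundant.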

HTH

Jeremy

--
Tel: +44 (1590) 610184
Cell: +44 (7970) 676050
SkypeID: jeremybennett
Email: jeremy.bennett@embecosm.com
Web: www.embecosm.com

RE: low compile efficiency
by mikerez on Aug 27, 2010
mikerez
Posts: 3
Joined: May 14, 2009
Last seen: Apr 12, 2011
>This problem only occurs in unoptimized code. It was due to a bug, which by coincidence was spotted and fixed yesterday. Try the version of GCC in SVN.

What I am not sure about is whether that bug affects unoptimized code only.
I think local allocation matters for optimized code too.
That is more difficult to check, though; maybe in the future.

Mikhail.
RE: low compile efficiency
by bruceli on Aug 30, 2010
bruceli
Posts: 5
Joined: Feb 3, 2009
Last seen: Feb 10, 2011
Hi jb and Mikhail,
I applied the patch, and now gcc gives a pretty neat result. I list the disassembled
code below.
Thank you!

-------------------------------------------
void msg_uart_init(void)
{
c3e8: 9c 21 ff fc l.addi r1,r1,0xfffffffc
c3ec: d4 01 10 00 l.sw 0x0(r1),r2
c3f0: 9c 41 00 04 l.addi r2,r1,0x4
/* baud rate config */
REG8(MSG_UART_BR_LSB_ADDR) = 35;
c3f4: 18 60 90 00 l.movhi r3,0x9000
c3f8: a8 83 00 07 l.ori r4,r3,0x7
c3fc: 9c 60 00 23 l.addi r3,r0,0x23
c400: d8 04 18 00 l.sb 0x0(r4),r3
REG8(MSG_UART_BR_MSB_ADDR) = 0;
c404: 18 60 90 00 l.movhi r3,0x9000
......
-------------------------------------------