OpenCores


about or32-uclinux-gcc

about or32-uclinux-gcc by Unknown on Jan 13, 2004
Hi all,

I used or32-uclinux-gcc to generate an assembly file from a FIR filter coded in C.

////// c code /////
    #include <stdlib.h>
    #include <stdio.h>

    int main()
    {
        int i;
        int N;
        int F;
        int x[10], c[10];

        F = 0;
        N = 10;
        for (i = 0; i < N; i++) {
            x[i] = i;
            c[i] = 2*i;
        }
        for (i = 0; i < N; i++) {
            F = F + x[i]*c[i];
        }
    }

I found that the generated assembly code is inefficient: inside the loops it spends a lot of time calculating the address of each array element. For example:

    l.lwz   r4,-4(r2)    # SI load
    l.addi  r3,r0,4      # move immediate
    l.mul   r4,r4,r3
    l.addi  r3,r2,-52
    l.add   r5,r3,r4     # calculating the address of x[i]

This wastes a lot of time. Can anyone tell me whether there are any compiler options that can be used to generate optimized assembly code?

Thank you very much.

Regards,
Stephen

//// the detailed ASM code here ///
    .file   "fir.c"
.text
    .align 4
.proc _main
    .global _main
    .type   _main,@function
_main:
    # 00111100000100000000000000000000
    # gpr_save_area 0 vars 92 current_function_outgoing_args_size 0
    l.addi  r1,r1,-96    # reserve 96 bytes for storing the data: 20*4 + 4 (i) + 4 (N) + 4 (F) + 4 (for storing r2 value)
    l.sw    0(r1),r2
    l.addi  r2,r1,96
    l.addi  r3,r0,0      # move immediate
    l.sw    -12(r2),r3   # F
    l.addi  r4,r0,10     # move immediate N
    l.sw    -8(r2),r4
    l.addi  r3,r0,0      # move immediate i
    l.sw    -4(r2),r3
.L2:
    l.lwz   r4,-4(r2)    # SI load
    l.lwz   r3,-8(r2)    # SI load
    l.sflts r4,r3        # if r4 < r3, the flag is set, otherwise cleared
    l.bf    .L5          # if flag set, jump to .L5
    l.nop                # nop delay slot
    l.j     .L3
    l.nop                # nop delay slot
.L5:
    l.lwz   r4,-4(r2)    # SI load # load index i
    l.addi  r3,r0,4      # move immediate # r3 stores 4 (4 bytes = 32 bits)
    l.mul   r4,r4,r3     # calculate the offset address
    l.addi  r3,r2,-52    # the address for storing x[0]
    l.add   r3,r3,r4     # address for x[i]
    l.lwz   r4,-4(r2)    # SI load # load index i
    l.sw    0(r3),r4     # x[i] = i;
    l.lwz   r4,-4(r2)    # SI load
    l.addi  r3,r0,4      # move immediate
    l.mul   r4,r4,r3     # calculating the offset address
    l.addi  r3,r2,-92    # the address for storing c[0]
    l.add   r5,r3,r4     # address for storing c[i]
    l.lwz   r4,-4(r2)    # SI load # load index i
    l.addi  r3,r0,2      # move immediate, r3 stores the constant 2
    l.mul   r3,r4,r3     # 2*i
    l.sw    0(r5),r3     # c[i] = 2*i
    l.lwz   r3,-4(r2)    # SI load, load index
    l.addi  r3,r3,1      # index increased by 1
    l.sw    -4(r2),r3    # update the index value
    l.j     .L2
    l.nop                # nop delay slot
.L3:
    l.addi  r3,r0,0      # move immediate # reset the index i to zero for new loop
    l.sw    -4(r2),r3
.L6:
    l.lwz   r4,-4(r2)    # SI load (if i<N)
    l.lwz   r3,-8(r2)    # SI load
    l.sflts r4,r3
    l.bf    .L9
    l.nop                # nop delay slot
    l.j     .L7
    l.nop                # nop delay slot
.L9:                     # i<N
    l.lwz   r4,-4(r2)    # SI load
    l.addi  r3,r0,4      # move immediate
    l.mul   r4,r4,r3
    l.addi  r3,r2,-52
    l.add   r5,r3,r4     # calculating the address of x[i]
    l.lwz   r4,-4(r2)    # SI load
    l.addi  r3,r0,4      # move immediate
    l.mul   r4,r4,r3
    l.addi  r3,r2,-92
    l.add   r3,r3,r4     # calculating the address of c[i]
    l.lwz   r4,0(r5)     # SI load, get the value of x[i]
    l.lwz   r3,0(r3)     # SI load, get the value of c[i]
    l.mul   r4,r4,r3     # x[i]*c[i]
    l.lwz   r3,-12(r2)   # SI load, load the value of F
    l.add   r3,r3,r4     # F = F + x[i]*c[i]
    l.sw    -12(r2),r3   # store the value of F into address -12(r2)
    l.lwz   r3,-4(r2)    # SI load # load index
    l.addi  r3,r3,1      # index increased by 1
    l.sw    -4(r2),r3    # update index
    l.j     .L6
    l.nop                # nop delay slot
.L7:
    l.ori   r11,r3,0     # move reg to reg
    l.lwz   r2,0(r1)     # restore the value of r2
    l.jr    r9           # jump to effective address stored in r9
    l.addi  r1,r1,96     # restore the value of r1
.endproc _main
.Lfe1:
    .size   _main,.Lfe1-_main
    .ident  "GCC: (GNU) 3.1 20020121 (experimental)"

about or32-uclinux-gcc by Unknown on Jan 13, 2004
[q]
I found that the generated ASM code is inefficient. i.e. in a loop operation, it takes a lot of time on calculating the address of storing the data, for example

    l.lwz   r4,-4(r2)    # SI load
    l.addi  r3,r0,4      # move immediate
    l.mul   r4,r4,r3
    l.addi  r3,r2,-52
    l.add   r5,r3,r4     # calculating the address of x[i]
[/q]

Seems like gcc has problems recognizing the l.muli instruction.

Marko

about or32-uclinux-gcc by Unknown on Jan 13, 2004
[q]
[q]
    l.lwz   r4,-4(r2)    # SI load
    l.addi  r3,r0,4      # move immediate
    l.mul   r4,r4,r3
    l.addi  r3,r2,-52
    l.add   r5,r3,r4     # calculating the address of x[i]
[/q]
seems like gcc has problems recognizing l.muli instruction.
[/q]

The l.muli instruction is not used by our gcc; adding support for it might be worth doing too. I'm no asm guru, but it looks like a shift would suffice here as well. One of the l.add instructions could also be saved if the array address were calculated outside the loop.

Stephen, do you have optimization (-O or -O2) enabled?

Heiko
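
Heiko's two suggestions (a shift instead of the multiply by 4, and computing the array base address once outside the loop) can also be approximated at the C source level by walking pointers through the arrays instead of indexing them. The following is only a sketch: the function names dot_indexed and dot_pointers are hypothetical and do not come from the thread, and whether this particular or32 gcc 3.1 build emits better code for the pointer form has not been verified here.

```c
#include <stddef.h>

/* Indexed form, as in the original post: every x[i] and c[i]
 * implies an address computation of the form base + i * 4. */
int dot_indexed(const int *x, const int *c, size_t n)
{
    size_t i;
    int acc = 0;

    for (i = 0; i < n; i++) {
        acc += x[i] * c[i];
    }
    return acc;
}

/* Pointer-walking form: the base addresses are taken once, before
 * the loop, and each iteration only advances the two pointers, so
 * no per-element multiply is needed to form an address. */
int dot_pointers(const int *x, const int *c, size_t n)
{
    const int *end = x + n;   /* one past the last element of x */
    int acc = 0;

    while (x != end) {
        acc += (*x) * (*c);
        x++;
        c++;
    }
    return acc;
}
```

With x[i] = i and c[i] = 2*i for i = 0..9, as in Stephen's program, both functions return 570. An optimizing compiler may well produce the second form from the first on its own, so this rewrite mainly matters when optimization is off or the port's optimizer is weak.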

about or32-uclinux-gcc by Unknown on Jan 14, 2004
Hi Heiko,

Thanks for the comment. I had not enabled optimization (-O or -O2). I have now tried enabling it, but I ran into a problem. My C code is a FIR filter:

////// C source code /////
    #include <stdlib.h>
    #include <stdio.h>

    int main()
    {
        int i;
        int N;
        int F;
        int x[10], c[10];

        F = 0;
        N = 10;
        for (i = 0; i < N; i++) {
            x[i] = i;
            c[i] = 2*i;
        }
        for (i = 0; i < N; i++) {
            F = F + x[i]*c[i];
        }
    }

Something seems to be wrong in the assembly code generated with optimization enabled (wrong operation). There are two loops in the C code: the first does the initialization, the second does the calculation. The code for the first loop is OK. The code for the second loop is wrong: it just loops and does no calculation at all. The assembly code does not even have a memory location for the variable F. The details are shown in the listing below.

Any idea about it? Is this a bug in gcc when it runs with optimization enabled?

////// generated assembly code ////
    .file   "fir3.c"
.text
    .align 4
.proc _main
    .global _main
    .type   _main,@function
_main:
    # 00011111110100000000000000000000
    # gpr_save_area 0 vars 80 current_function_outgoing_args_size 0
    l.addi  r1,r1,-84
    l.sw    0(r1),r9
    l.addi  r7,r0,10     # move immediate
    l.addi  r6,r0,0      # move immediate
    l.addi  r9,r1,44
    l.addi  r8,r1,4
    # first loop for initialization, no problem
.L5:
    l.slli  r3,r6,2
    l.slli  r5,r6,1
    l.add   r4,r9,r3
    l.add   r3,r8,r3
    l.sw    0(r4),r6
    l.addi  r6,r6,1
    l.sflts r6,r7
    l.bf    .L5          # delay slot filled
    l.sw    0(r3),r5
    l.addi  r6,r0,0      # move immediate
    l.addi  r6,r6,1
    # the second loop, just looping, no operation, wrong (bug???)
.L16:
    l.sflts r6,r7
    l.bf    .L16         # delay slot filled
    l.addi  r6,r6,1
    l.lwz   r9,0(r1)
    l.jr    r9
    l.addi  r1,r1,84
.endproc _main
.Lfe1:
    .size   _main,.Lfe1-_main
    .ident  "GCC: (GNU) 3.1 20020121 (experimental)"

Heiko Panther <heiko.panther@web.de> wrote:
[q]
[q]
    l.lwz   r4,-4(r2)    # SI load
    l.addi  r3,r0,4      # move immediate
    l.mul   r4,r4,r3
    l.addi  r3,r2,-52
    l.add   r5,r3,r4     # calculating the address of x[i]
[/q]
seems like gcc has problems recognizing l.muli instruction.

the l.muli is not used by our gcc, a thing that might be worth doing too. I'm no asm guru, but this looks like a shift would suffice as well. And then, there could be one of the l.add saved if the array address was calculated outside the loop.

Stephen, do you have optimization (-O or -O2) enabled?

Heiko
[/q]

about or32-uclinux-gcc by Unknown on Jan 14, 2004
Stephen,

[q]
Any idea about it? Is it the bug of gcc when it operates with optimization option?
[/q]

I would guess that your code is optimized away because you're not using the results. Try using the results and see what happens then.

Heiko
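
One way to follow this advice, sketched under the assumption that the filter is moved into a helper called fir_sum (a hypothetical name, not from the thread): return F, so that the second loop has an observable result and the optimizer can no longer discard it as dead code. In the original program F is never read after the loop, which is why the compiler is allowed to drop the whole calculation.

```c
#define N 10   /* number of taps, same value as in the original post */

/* Same computation as Stephen's main(), but the accumulated value F
 * is returned to the caller, so the second loop has a visible effect. */
int fir_sum(void)
{
    int x[N], c[N];
    int i;
    int F = 0;

    /* first loop: initialization */
    for (i = 0; i < N; i++) {
        x[i] = i;
        c[i] = 2 * i;
    }

    /* second loop: the actual multiply-accumulate */
    for (i = 0; i < N; i++) {
        F = F + x[i] * c[i];
    }

    return F;   /* using the result keeps the calculation alive */
}
```

Printing the value from main, for example with printf("%d\n", fir_sum());, gives 570. Because every input here is a compile-time constant, a sufficiently aggressive compiler may still fold the whole computation into that constant; passing x and c in as function parameters is one way to make the loop code show up in the assembly output.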


© copyright 1999-2025 OpenCores.org, equivalent to Oliscience, all rights reserved. OpenCores®, registered trademark.