Why open processors are so much slower than commercial ones?
by Unknown on Aug 14, 2004
On Thursday 15 July 2004 16:05, Justin Young wrote:
Thank you for pointing that out, Austin.
You are right, what I meant by markup was that it goes faster! I worded that quite badly! Sorry everyone!
> Actually, this, in my experience, is not true. It is rare, these days,
> that you can hand compile code that runs faster than a compiler can,
> especially code of any size. Compilers have come a very long way over
> the past few decades.

You are right, but I recently took a matrix multiply program (hence my reference earlier), assembled it by hand, and compared it with the gcc output. The hand-compiled code ran a lot faster. But do note, it is only a small program.

I have tried many ways to code a matrix multiplication in C. In the worst case (a 512*512 matrix), the difference between implementations is a factor of 25...

    void mulMatrix11(const unsigned int n, const int A[n][n],
                     const int B[n][n], int C[n][n])
    {
      register int tmp6,tmp7,tmp8,tmp9;
      register int tmp10,tmp11,tmp12,tmp13;
      register unsigned int i,j,k,ii,jj,kk,jjj,kkk;

      for(ii=0; ii<...; ...)
        for(kk=0; kk<...; ...)
          for(i=ii; i<...; ...)
          {
            for(kkk=kk; kkk<...; ...)
            {
              __asm__ __volatile__ ("prefetchnta 16(%0) " : : "S" (&(A[i][kkk+4])) );
              tmp6 = A[i][kkk];
              tmp7 = A[i][kkk+1];
              tmp8 = A[i][kkk+2];
              tmp9 = A[i][kkk+3];
              for(j=0; j<...; ...)
              {
                __asm__ __volatile__ ("prefetchnta 512(%0)" : : "D"(&(B[kkk][j+128])));
                __asm__ __volatile__ ("prefetchnta 512(%0)" : : "S"(&(B[kkk+1][j+128])));
                __asm__ __volatile__ ("prefetchnta 512(%0)" : : "D"(&(B[kkk+2][j+128])));
                __asm__ __volatile__ ("prefetchnta 512(%0)" : : "S"(&(B[kkk+3][j+128])));
                for(jjj=j; jjj<...; ...)
                {
                  C[i][jjj]   += tmp6*B[kkk][jjj]+tmp7*B[kkk+1][jjj]
                               + tmp8*B[kkk+2][jjj]+tmp9*B[kkk+3][jjj];
                  C[i][jjj+1] += tmp6*B[kkk][jjj+1]+tmp7*B[kkk+1][jjj+1]
                               + tmp8*B[kkk+2][jjj+1]+tmp9*B[kkk+3][jjj+1];
                }
              }
            }
          }
    }

The previous code is between 45% (small matrices) and 400% faster than the following one in the average case.

    void mulMatrix1(int n, int A[n][n], int B[n][n], int C[n][n])
    {
      int i,j,k;

      for(i=0; i<n; i++)
        for(j=0; j<n; j++)
          for(k=0; k<n; k++)
            C[i][j] = C[i][j] + A[i][k]*B[k][j];
    }
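The optimized function above combines loop tiling, manual unrolling and prefetch hints. As a rough illustration of the tiling part only, here is a minimal portable sketch. The names `matmul_naive` and `matmul_blocked`, the flat row-major array layout and the `block` parameter are assumptions made for this example, not taken from the code in the post, and the sketch makes no claim about the speedups quoted there.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Reference version: plain triple loop computing C = A * B.
 * Matrices are n x n, stored row-major in flat arrays (an assumption
 * of this sketch, chosen to avoid variable-length array parameters). */
void matmul_naive(size_t n, const int *A, const int *B, int *C)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++) {
            int sum = 0;
            for (size_t k = 0; k < n; k++)
                sum += A[i * n + k] * B[k * n + j];
            C[i * n + j] = sum;
        }
}

static size_t min_size(size_t a, size_t b)
{
    return a < b ? a : b;
}

/* Tiled version: the i and k loops are split into tiles of `block`
 * iterations so that a small part of A and B is reused while it is
 * still likely to be in cache. `block` is a hypothetical tuning knob;
 * a good value depends on the cache sizes of the target machine. */
void matmul_blocked(size_t n, size_t block,
                    const int *A, const int *B, int *C)
{
    assert(block > 0);
    memset(C, 0, n * n * sizeof *C);   /* C is accumulated into below */

    for (size_t ii = 0; ii < n; ii += block)
        for (size_t kk = 0; kk < n; kk += block) {
            size_t i_end = min_size(ii + block, n);
            size_t k_end = min_size(kk + block, n);
            for (size_t i = ii; i < i_end; i++)
                for (size_t k = kk; k < k_end; k++) {
                    int a = A[i * n + k];   /* hoisted, reused for a whole row of B */
                    for (size_t j = 0; j < n; j++)
                        C[i * n + j] += a * B[k * n + j];
                }
        }
}
```

Both functions should produce the same result for any `block > 0`, including values that do not divide `n`, so the naive version can serve as a correctness check while trying out different tile sizes.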