OpenCores
Why open processors are so much slower than commercial ones?
by Unknown on Aug 14, 2004
On Thursday 15 July 2004 16:05, Justin Young wrote:
Thank you for pointing that out, Austin.
You are right, what I meant by markup was that it goes faster! I worded
that quite badly! Sorry everyone!

> Actually, this, in my experience, is not true. It is rare, these days,
> that you can hand compile code that runs faster than a compiler can,
> especially code of any size. Compilers have come a very long way over
> the past few decades.


You are right, but I recently compared a matrix multiply program (hence my
reference earlier) assembled by hand against the output of gcc, and the
hand-written code ran a lot faster. But do note, it is only a small program.


I have tried many ways to code a matrix multiplication in C. The difference
between implementations in the worst case (a 512*512 matrix) is a factor of 25...


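void mulMatrix11(const unsigned int n, const int A[n][n], const int B[n][n],
                 int C[n][n])
{
    /* The loop bounds and strides that followed each '<' are missing from
       this copy of the listing and are left as "...". */
    register int tmp6, tmp7, tmp8, tmp9;
    register int tmp10, tmp11, tmp12, tmp13;   /* declared but not used */

    register unsigned int i, j, k, ii, jj, kk, jjj, kkk;

    for (ii = 0; ii < ...)                 /* blocks of rows of A and C */
      for (kk = 0; kk < ...)               /* blocks along k */
        for (i = ii; i < ...)
        {
          for (kkk = kk; kkk < ...)        /* four values of k per pass */
          {
            __asm__ __volatile__ ("prefetchnta 16(%0)" : : "S" (&(A[i][kkk+4])));

            /* Keep four elements of row i of A in registers. */
            tmp6 = A[i][kkk];
            tmp7 = A[i][kkk+1];
            tmp8 = A[i][kkk+2];
            tmp9 = A[i][kkk+3];

            for (j = 0; j < ...)
            {
              /* Prefetch ahead in the four rows of B used below. */
              __asm__ __volatile__ ("prefetchnta 512(%0)" : : "D" (&(B[kkk][j+128])));
              __asm__ __volatile__ ("prefetchnta 512(%0)" : : "S" (&(B[kkk+1][j+128])));
              __asm__ __volatile__ ("prefetchnta 512(%0)" : : "D" (&(B[kkk+2][j+128])));
              __asm__ __volatile__ ("prefetchnta 512(%0)" : : "S" (&(B[kkk+3][j+128])));

              for (jjj = j; jjj < ...)     /* two columns of C per pass */
              {
                C[i][jjj]   += tmp6*B[kkk][jjj]     + tmp7*B[kkk+1][jjj]
                             + tmp8*B[kkk+2][jjj]   + tmp9*B[kkk+3][jjj];

                C[i][jjj+1] += tmp6*B[kkk][jjj+1]   + tmp7*B[kkk+1][jjj+1]
                             + tmp8*B[kkk+2][jjj+1] + tmp9*B[kkk+3][jjj+1];
              }
            }
          }
        }
}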


The code above is between 45% (small matrices) and 400% faster than the
following one in the average case.

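void mulMatrix1(int n, int A[n][n], int B[n][n], int C[n][n])
{
    int i, j, k;

    /* Plain triple loop; it accumulates into C, so C must start zeroed. */
    for (i = 0; i < n; i++)
        for (j = 0; j < n; j++)
            for (k = 0; k < n; k++)
                C[i][j] = C[i][j] + A[i][k]*B[k][j];
}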
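
For anyone who wants to experiment with the idea, here is a minimal sketch of only the tiling part of the faster version, with no unrolling and no prefetch hints, next to a plain reference loop. It is not the code measured above. The names mul_naive, mul_tiled and TILE are hypothetical, the tile edge of 32 is an assumed value and not one taken from this post, and the matrices are passed as flat row-major arrays rather than as int A[n][n] parameters.

```c
#include <stddef.h>
#include <string.h>

/* Assumed tile edge (hypothetical, not from the original post). */
#define TILE 32

/* Reference version: plain triple loop, C = A * B, row-major n x n. */
void mul_naive(size_t n, const int *A, const int *B, int *C)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++) {
            int sum = 0;
            for (size_t k = 0; k < n; k++)
                sum += A[i * n + k] * B[k * n + j];
            C[i * n + j] = sum;
        }
}

static size_t min_sz(size_t a, size_t b) { return a < b ? a : b; }

/* Tiled version: the i and k loops are split into TILE-sized blocks so
   that the TILE rows of B touched by one (ii, kk) block are reused for
   every row i of that block. The min_sz() calls handle an n that is not
   a multiple of TILE. */
void mul_tiled(size_t n, const int *A, const int *B, int *C)
{
    memset(C, 0, n * n * sizeof *C);          /* C is accumulated into */

    for (size_t ii = 0; ii < n; ii += TILE)
        for (size_t kk = 0; kk < n; kk += TILE)
            for (size_t i = ii; i < min_sz(ii + TILE, n); i++)
                for (size_t k = kk; k < min_sz(kk + TILE, n); k++) {
                    const int a = A[i * n + k];   /* reused for all j */
                    for (size_t j = 0; j < n; j++)
                        C[i * n + j] += a * B[k * n + j];
                }
}
```

Both functions should produce the same result for any n, so comparing their outputs is a cheap sanity check before timing anything. Whether tiling alone gives a speedup, and at which TILE value, depends on the cache sizes of the machine and on the compiler flags, so it has to be measured.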



© copyright 1999-2025 OpenCores.org, equivalent to Oliscience, all rights reserved. OpenCores®, registered trademark.