[petsc-users] Question about matrix permutation

Barry Smith bsmith at mcs.anl.gov
Sat Jan 30 14:41:03 CST 2010


On Jan 30, 2010, at 1:38 PM, Jed Brown wrote:
>
>> I owe you a beer if the difference is more than, say, 2 percent.
>
> Would you accept a 30 percent speedup instead of a 2 percent slowdown?
> Apply the attached patch, compile with GCC (I don't know whether other
> compilers provide the same __builtin_prefetch), and compare the following
> (in each pair, the top result is from before the patch).
>
> ./ex19 -ksp_type cgs -pc_type none -ksp_monitor -ksp_max_it 1000 -snes_max_it 1 -da_grid_x 50 -da_grid_y 50 -log_summary
> MatMult             2001 1.0 5.3909e+00 1.0 3.03e+09 1.0 0.0e+00 0.0e+00 0.0e+00 41 41  0  0  0  82 81  0  0  0   563
> MatMult             2001 1.0 3.9953e+00 1.0 3.03e+09 1.0 0.0e+00 0.0e+00 0.0e+00 38 41  0  0  0  77 81  0  0  0   759
>
> ./ex19 -ksp_type cgs -pc_type none -ksp_monitor -ksp_max_it 100 -snes_max_it 1 -da_grid_x 200 -da_grid_y 200 -log_summary
> MatMult              201 1.0 7.9618e+00 1.0 4.98e+09 1.0 0.0e+00 0.0e+00 0.0e+00 28 38  0  0  0  60 77  0  0  0   626
> MatMult              201 1.0 6.1575e+00 1.0 4.98e+09 1.0 0.0e+00 0.0e+00 0.0e+00 24 38  0  0  0  54 77  0  0  0   809
>
> ./ex19 -ksp_type cgs -pc_type none -ksp_monitor -ksp_max_it 100 -snes_max_it 1 -da_grid_x 300 -da_grid_y 300 -log_summary
> MatMult              201 1.0 1.7829e+01 1.0 1.12e+10 1.0 0.0e+00 0.0e+00 0.0e+00 27 38  0  0  0  60 77  0  0  0   630
> MatMult              201 1.0 1.3561e+01 1.0 1.12e+10 1.0 0.0e+00 0.0e+00 0.0e+00 24 38  0  0  0  53 77  0  0  0   828
>
>
> This blows me away.

    This looks great; push it to petsc-dev if it really works.

    BUT this is NOT hardware prefetching; it is specifically software
prefetching: you are telling the processor exactly what to prefetch and
when. I would call it hardware prefetching only when the hardware
detects a particular access pattern and then automatically extrapolates
that pattern to prefetch further along it.

    Barry

>
> Jed
>
> diff --git a/src/mat/impls/aij/seq/inode.c b/src/mat/impls/aij/seq/inode.c
> --- a/src/mat/impls/aij/seq/inode.c
> +++ b/src/mat/impls/aij/seq/inode.c
> @@ -392,6 +392,11 @@
>
> /* ----------------------------------------------------------- */
>
> +#define PetscPrefetchReadOnly(addr,len,loc) do {                        \
> +    char *__p = (char*)(addr),*__end = (char*)((addr)+(len));           \
> +    for ( ; __p < __end; __p += 64) __builtin_prefetch(__p,0,loc);      \
> +  } while(0)
> +
> #undef __FUNCT__
> #define __FUNCT__ "MatMult_SeqAIJ_Inode"
> static PetscErrorCode MatMult_SeqAIJ_Inode(Mat A,Vec xx,Vec yy)
> @@ -423,6 +428,8 @@
>     n    = ii[1] - ii[0];
>     nonzerorow += (n>0)*nsz;
>     ii  += nsz;
> +    PetscPrefetchReadOnly(idx+nsz*n,n,0);    /* Prefetch the indices for the block row after the current one */
> +    PetscPrefetchReadOnly(v1+nsz*n,nsz*n,0); /* Prefetch the values for the block row after the current one  */
>     sz   = n;                   /* No of non zeros in this row */
>                                 /* Switch on the size of Node */
>     switch (nsz){               /* Each loop in 'case' is unrolled */


