[petsc-users] Question about matrix permutation
Barry Smith
bsmith at mcs.anl.gov
Sat Jan 30 14:41:03 CST 2010
On Jan 30, 2010, at 1:38 PM, Jed Brown wrote:
>
>> I owe you a beer if the difference is more than say 2 percent.
>
> Would you accept a 30 percent speedup instead of a 2 percent slowdown?
> Apply the attached patch, compile with GCC (I don't know if other
> compilers have the same __builtin_prefetch), and compare the following
> (top result is before the patch).
>
> ./ex19 -ksp_type cgs -pc_type none -ksp_monitor -ksp_max_it 1000 -snes_max_it 1 -da_grid_x 50 -da_grid_y 50 -log_summary
> MatMult 2001 1.0 5.3909e+00 1.0 3.03e+09 1.0 0.0e+00 0.0e+00 0.0e+00 41 41 0 0 0 82 81 0 0 0 563
> MatMult 2001 1.0 3.9953e+00 1.0 3.03e+09 1.0 0.0e+00 0.0e+00 0.0e+00 38 41 0 0 0 77 81 0 0 0 759
>
> ./ex19 -ksp_type cgs -pc_type none -ksp_monitor -ksp_max_it 100 -snes_max_it 1 -da_grid_x 200 -da_grid_y 200 -log_summary
> MatMult 201 1.0 7.9618e+00 1.0 4.98e+09 1.0 0.0e+00 0.0e+00 0.0e+00 28 38 0 0 0 60 77 0 0 0 626
> MatMult 201 1.0 6.1575e+00 1.0 4.98e+09 1.0 0.0e+00 0.0e+00 0.0e+00 24 38 0 0 0 54 77 0 0 0 809
>
> ./ex19 -ksp_type cgs -pc_type none -ksp_monitor -ksp_max_it 100 -snes_max_it 1 -da_grid_x 300 -da_grid_y 300 -log_summary
> MatMult 201 1.0 1.7829e+01 1.0 1.12e+10 1.0 0.0e+00 0.0e+00 0.0e+00 27 38 0 0 0 60 77 0 0 0 630
> MatMult 201 1.0 1.3561e+01 1.0 1.12e+10 1.0 0.0e+00 0.0e+00 0.0e+00 24 38 0 0 0 53 77 0 0 0 828
>
>
> This blows me away.
This looks great; push it to petsc-dev if it really works.
BUT this is NOT hardware prefetching; it is specifically software
prefetching: you are telling the processor exactly what to prefetch
and when.
I would call it hardware prefetching only when the hardware detects a
particular pattern of access and then automatically extrapolates that
pattern to prefetch further along the pattern.
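
To make the distinction concrete, here is a minimal sketch of software prefetching in a plain CSR matrix-vector product. It is not the PETSc inode kernel: the names PREFETCH_READ, prefetch_range, csr_matvec and CACHE_LINE are hypothetical, the 64-byte line size is an assumption borrowed from the stride in the patch, and the fallback to a no-op for compilers without __builtin_prefetch is likewise an assumption. While row i is being processed, the code explicitly requests the indices and values of row i+1; with hardware prefetching there would be no such request in the source at all.

```c
#include <stddef.h>

/* Hypothetical portability wrapper (not a PETSc macro): use the GCC/Clang
   builtin where it exists, otherwise compile the hint away. */
#if defined(__GNUC__) || defined(__clang__)
#  define PREFETCH_READ(p) __builtin_prefetch((p), 0, 0) /* read, no temporal locality */
#else
#  define PREFETCH_READ(p) ((void)0)
#endif

#define CACHE_LINE 64 /* assumed cache-line size in bytes */

/* Issue one prefetch hint per cache line in [addr, addr + nbytes). */
static void prefetch_range(const void *addr, size_t nbytes)
{
  const char *p = (const char *)addr;
  size_t off;
  for (off = 0; off < nbytes; off += CACHE_LINE) PREFETCH_READ(p + off);
}

/* y = A*x for an n-row CSR matrix; rowptr has n+1 entries. */
static void csr_matvec(size_t n, const size_t *rowptr, const int *col,
                       const double *val, const double *x, double *y)
{
  size_t i, k;
  for (i = 0; i < n; i++) {
    size_t begin = rowptr[i], end = rowptr[i + 1];
    double sum = 0.0;

    /* Software prefetch: explicitly request the next row's data now,
       so it is (hopefully) in cache by the time row i is finished. */
    if (i + 1 < n) {
      size_t next_len = rowptr[i + 2] - end;
      prefetch_range(col + end, next_len * sizeof *col);
      prefetch_range(val + end, next_len * sizeof *val);
    }

    for (k = begin; k < end; k++) sum += val[k] * x[col[k]];
    y[i] = sum;
  }
}
```

The prefetch calls are only hints and do not change the result, so the kernel can be checked against a version without them; whether they help at all depends on the machine and has to be measured, as in the -log_summary runs quoted above.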
Barry
>
> Jed
>
> diff --git a/src/mat/impls/aij/seq/inode.c b/src/mat/impls/aij/seq/inode.c
> --- a/src/mat/impls/aij/seq/inode.c
> +++ b/src/mat/impls/aij/seq/inode.c
> @@ -392,6 +392,11 @@
>
> /* ----------------------------------------------------------- */
>
> +#define PetscPrefetchReadOnly(addr,len,loc) do { \
> + char *__p = (char*)(addr),*__end = (char*)((addr)+(len)); \
> + for ( ; __p < __end; __p += 64) __builtin_prefetch(__p,0,loc); \
> + } while(0)
> +
> #undef __FUNCT__
> #define __FUNCT__ "MatMult_SeqAIJ_Inode"
> static PetscErrorCode MatMult_SeqAIJ_Inode(Mat A,Vec xx,Vec yy)
> @@ -423,6 +428,8 @@
> n = ii[1] - ii[0];
> nonzerorow += (n>0)*nsz;
> ii += nsz;
> + PetscPrefetchReadOnly(idx+nsz*n,n,0); /* Prefetch the indices for the block row after the current one */
> + PetscPrefetchReadOnly(v1+nsz*n,nsz*n,0); /* Prefetch the values for the block row after the current one */
> sz = n; /* No of non zeros in this row */
> /* Switch on the size of Node */
> switch (nsz){ /* Each loop in 'case' is unrolled */