most efficient way to use local CSR matrix?

Chris Kees christopher.e.kees at usace.army.mil
Thu Oct 29 15:34:23 CDT 2009


I pulled the most recent version of petsc-dev and added superlu_dist.
The only case that fails is spooles on 1 processor with an mpiaij
matrix created with the MatCreate/MatSetSizes/... paradigm.  I would
suspect something in the spooles wrapper code.

code allocating mat:
MatCreate(Py_PETSC_COMM_WORLD,&self->m);
MatSetSizes(self->m,n,n,N,N);
MatSetType(self->m,MATMPIAIJ);
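For reference, the full creation sequence behind this snippet (as quoted from the earlier email later in this thread) is roughly:

  MatCreate(Py_PETSC_COMM_WORLD,&self->m);
  MatSetSizes(self->m,n,n,N,N);
  MatSetType(self->m,MATMPIAIJ);
  MatSetFromOptions(self->m);
  MatSeqAIJSetPreallocationCSR(self->m,SMP(L)->A.rowptr,SMP(L)->A.colind,(double*)(SMP(L)->A.nzval));
  MatMPIAIJSetPreallocationCSR(self->m,SMP(L)->A.rowptr,SMP(L)->A.colind,(double*)(SMP(L)->A.nzval));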

results:
-pc_type lu -pc_factor_mat_solver_package spooles -ksp_monitor_true_residual (spooles on 1 proc)
  0 KSP preconditioned resid norm 2.284405915285e+01 true resid norm 1.062801511746e+01 ||Ae||/||Ax|| 1.000000000000e+00
  1 KSP preconditioned resid norm 3.936838941525e-01 true resid norm 1.050257318153e+01 ||Ae||/||Ax|| 9.881970495395e-01
  2 KSP preconditioned resid norm 7.332823321496e-14 true resid norm 2.408053447453e-13 ||Ae||/||Ax|| 2.265760276816e-14

-pc_type lu -pc_factor_mat_solver_package spooles -ksp_monitor_true_residual (spooles on 8 procs)
  0 KSP preconditioned resid norm 2.284405915285e+01 true resid norm 1.062801511746e+01 ||Ae||/||Ax|| 1.000000000000e+00
  1 KSP preconditioned resid norm 2.898588418444e-14 true resid norm 1.045214942604e-13 ||Ae||/||Ax|| 9.834526306678e-15

-pc_type lu -pc_factor_mat_solver_package superlu_dist -ksp_monitor_true_residual
  0 KSP preconditioned resid norm 2.284405915285e+01 true resid norm 1.062801511746e+01 ||Ae||/||Ax|| 1.000000000000e+00
  1 KSP preconditioned resid norm 2.917052183641e-14 true resid norm 1.299023464819e-13 ||Ae||/||Ax|| 1.222263471084e-14

code allocating mat:

MatCreateMPIAIJ(Py_PETSC_COMM_WORLD,n,n,N,N,1,PETSC_NULL,max_nNeighbors,PETSC_NULL,&self->m);

results:
-pc_type lu -pc_factor_mat_solver_package spooles -ksp_monitor_true_residual (spooles on 1 proc)
  0 KSP preconditioned resid norm 2.284405915285e+01 true resid norm 1.062801511746e+01 ||Ae||/||Ax|| 1.000000000000e+00
  1 KSP preconditioned resid norm 3.093811333624e-14 true resid norm 1.046870732638e-13 ||Ae||/||Ax|| 9.850105791801e-15

On Oct 29, 2009, at 1:50 PM, Matthew Knepley wrote:

> On Thu, Oct 29, 2009 at 1:46 PM, Chris Kees <christopher.e.kees at usace.army.mil> wrote:
> Mark, Barry,
>
> Thanks for the help.  For now I'm sticking with the approach of  
> copying the csr matrix and using the csr data structures to do the  
> preallocations. I'll eventually get around to writing the code for  
> assembling directly into the petsc matrix.  I have two more questions.
>
> 1) On 1 processor, the linear solver is not converging in 1  
> iteration using -pc_type lu -pc_factor_mat_solver_package spooles.  
> It converges in one iteration in parallel or on a single processor  
> if I set -mat_type seqaij. Also, if I go back to the old way of  
> constructing the matrix it DOES converge in 1 iteration on 1  
> processor. That is, if I replace the MatCreate/MatSetSizes/...   
> calls from the earlier email, with what I had originally:
>
> MatCreateMPIAIJ(Py_PETSC_COMM_WORLD,n,n,N,N,1,PETSC_NULL,max_nNeighbors,PETSC_NULL,&self->m);
>
> then the solver converges in 1 iteration as expected.  This wouldn't  
> be a big deal except that we're trying to verify that our quadratic  
> finite elements are working, and the behavior for p1 and p2 elements  
> is different. The above observations are for p2, while all the cases  
> work as expected for p1. Should we just not be using mpiaij at all  
> on 1 processor?
>
> No, you have a problem.
>
> 2) In the code snippet I took from the petsc example there are two  
> preallocation calls (see below). Is that correct, or should I be  
> checking the matrix type and calling only the preallocation function  
> for the actual matrix type? i.e. something like
>
> MatCreate(...
> MatSetSizes(...
> MatSetFromOptions(...
> MatGetType(...
> if type==mpiaij: MatMPIAIJSetPreallocationCSR(...
>
> It does not hurt to call them all.
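> As a concrete sketch (this just combines the calls already shown in
> this thread, with rowptr/colind/nzval standing in for the SMP(L)->A
> arrays; error checking omitted):
>
>   Mat A;
>   MatCreate(PETSC_COMM_WORLD,&A);
>   MatSetSizes(A,n,n,N,N);
>   MatSetType(A,MATMPIAIJ);
>   MatSetFromOptions(A);   /* -mat_type seqaij/mpiaij can still override the type */
>   /* call both; the one that does not match the actual matrix type is a no-op */
>   MatSeqAIJSetPreallocationCSR(A,rowptr,colind,nzval);
>   MatMPIAIJSetPreallocationCSR(A,rowptr,colind,nzval);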
>
>   Matt
>
> Chris
>
> On Oct 21, 2009, at 11:50 AM, Mark F. Adams wrote:
>
> Chris, just a note,
>
> Perhaps I missed something here, but I do something similar to you,
> e.g. overlapping partitions, and PETSc is set up very well to be
> intrusive in your code (I sometimes write a little 'addvalue'
> wrapper), so you can assemble your FE matrices directly into a global
> matrix.  You use the (global) MatSetValues(...) and set the matrix
> option MAT_IGNORE_OFF_PROC_ENTRIES (or something like that) so that
> the off-processor matrix entries, which are computed redundantly, are
> thrown away and you get the correct matrix.
>
> As Barry said, you don't need exact non-zero counts to allocate
> storage in PETSc matrices; just don't underestimate.
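> To make that concrete, here is a minimal sketch of the pattern (the
> sizes and element-array names nLocal, nGlobal, maxNZperRow, nElements,
> nen, eRows, eVals are placeholders for whatever the application
> provides, not names from Chris's code; error checking omitted):
>
>   Mat      A;
>   PetscInt e;
>   MatCreate(PETSC_COMM_WORLD,&A);
>   MatSetSizes(A,nLocal,nLocal,nGlobal,nGlobal);
>   MatSetType(A,MATMPIAIJ);
>   /* an overestimate of nonzeros per row is fine, just don't underestimate */
>   MatMPIAIJSetPreallocation(A,maxNZperRow,PETSC_NULL,maxNZperRow,PETSC_NULL);
>   /* entries for rows owned by other processes are dropped; they are
>      computed redundantly on their owning process anyway */
>   MatSetOption(A,MAT_IGNORE_OFF_PROC_ENTRIES,PETSC_TRUE);
>   for (e = 0; e < nElements; e++) {
>     /* eRows[e]: global row/column indices, eVals[e]: dense element matrix */
>     MatSetValues(A,nen,eRows[e],nen,eRows[e],eVals[e],ADD_VALUES);
>   }
>   MatAssemblyBegin(A,MAT_FINAL_ASSEMBLY);
>   MatAssemblyEnd(A,MAT_FINAL_ASSEMBLY);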
>
> Mark
>
> On Oct 21, 2009, at 9:16 AM, Barry Smith wrote:
>
>
> On Oct 20, 2009, at 10:13 PM, Chris Kees wrote:
>
> Thanks. Just to make sure I'm following, when I create the matrix I  
> do:
>
>   MatCreate(PETSC_COMM_WORLD,&self->m);
>   MatSetSizes(self->m,n,n,N,N);
>   MatSetType(self->m,MATMPIAIJ);
>   MatSetFromOptions(self->m);
>   MatSeqAIJSetPreallocationCSR(self->m,SMP(L)->A.rowptr,SMP(L)->A.colind,(double*)(SMP(L)->A.nzval));
>   MatMPIAIJSetPreallocationCSR(self->m,SMP(L)->A.rowptr,SMP(L)->A.colind,(double*)(SMP(L)->A.nzval));
>
> and then I copy the matrix at each Newton iteration using the code  
> in the first message below.  You said "...PreallocationCSR() does  
> the copy very efficiently", but I don't think I'm supposed to  
> repeatedly call it, right?  Looking at the output some more it does  
> seem like building the preconditioner initially is taking much more  
> time than the initial pass through MatSetValues.
>
>  If the matrix nonzero structure stays the same, then yes, you need
> to call MatMPIAIJSetPreallocationCSR() only once and use the
> MatSetValues() calls each Newton step to update the nonzeros.
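> In other words (a minimal sketch, assuming the local CSR arrays
> rowptr/colind/a keep the same sparsity pattern each step and the
> matrix A already has its local-to-global mapping set, as in the
> snippets quoted above):
>
>   /* once, at setup: preallocate (and copy) from the local CSR structure */
>   MatMPIAIJSetPreallocationCSR(A,rowptr,colind,a);
>
>   /* every Newton step: refill the same nonzero pattern */
>   for (i = 0; i < ldim; i++) {
>     irow[0] = i;
>     MatSetValuesLocal(A,1,irow,rowptr[i+1]-rowptr[i],
>                       &colind[rowptr[i]],&a[rowptr[i]],INSERT_VALUES);
>   }
>   MatAssemblyBegin(A,MAT_FINAL_ASSEMBLY);
>   MatAssemblyEnd(A,MAT_FINAL_ASSEMBLY);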
>
>  Barry
>
>
> Chris
>
> On Oct 20, 2009, at 8:32 PM, Barry Smith wrote:
>
>
> Here's the deal. In parallel PETSc does not use a single CSR matrix
> on each process to hold the MPIAIJ matrix; each process stores a
> separate "diagonal" block and "off-diagonal" block. Hence if you
> store the matrix on a process as a single CSR matrix then it has to
> be copied into the PETSc data structure. MatMPIAIJSetPreallocationCSR()
> does the copy very efficiently. But you are right, until you free
> YOUR rowptr[], colind[] and a[] there are two copies of the matrix.
> One could argue that this is ok, because any preconditioner worth
> anything will take up at least that much memory later anyway.  For
> example, if you use the PETSc default preconditioner, block Jacobi
> with ILU(0) on each block, it will take pretty much the same amount
> of memory.
>
> For people who use SOR or some other preconditioner that requires  
> very little memory this is not satisfactory because they feel they  
> could run larger problems without the copy.
>
> I strongly recommend you use MatMPIAIJSetPreallocationCSR() and live
> with any memory limitation. The reason is that doing more than that
> has orders of magnitude more pain with generally little payoff.
> If you like pain, then you can
> 1) change your code to not build directly into CSR, and instead call
> MatSetValuesLocal() as you compute the entries. With this you need
> to figure out the preallocation yourself, which can be painful code
> (a rough sketch of that counting follows below).
> 2) write an entire matrix class that stores the matrix like you
> store it, not like PETSc stores it. This is a huge project, since you
> have to write your own matrix-vector product, ILU factorization, etc.
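> For option 1), the preallocation usually boils down to counting
> diagonal-block and off-diagonal-block nonzeros per locally owned row.
> A rough sketch of that counting, shown here against a CSR-like
> structure with global column indices (all names are illustrative, not
> from Chris's code; in practice you would derive the same counts from
> your mesh connectivity; error checking omitted):
>
>   PetscInt i, j, *d_nnz, *o_nnz;
>   /* [rstart, rstart+ldim) is the global index range of the owned rows */
>   PetscMalloc(ldim*sizeof(PetscInt),&d_nnz);
>   PetscMalloc(ldim*sizeof(PetscInt),&o_nnz);
>   for (i = 0; i < ldim; i++) {
>     d_nnz[i] = 0; o_nnz[i] = 0;
>     for (j = rowptr[i]; j < rowptr[i+1]; j++) {
>       if (colind[j] >= rstart && colind[j] < rstart+ldim) d_nnz[i]++;
>       else                                                o_nnz[i]++;
>     }
>   }
>   MatMPIAIJSetPreallocation(A,0,d_nnz,0,o_nnz);
>   PetscFree(d_nnz);
>   PetscFree(o_nnz);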
>
> Barry
>
>
>
> On Oct 20, 2009, at 8:04 PM, Chris Kees wrote:
>
> Hi,
>
> Our unstructured finite element code does a fairly standard  
> overlapping decomposition that allows it to calculate all the non- 
> zero column entries for the rows owned by the processor (rows  
> 0...ldim-1 in the local numbering).  We assemble them into a local  
> CSR matrix and then copy them into a PETSc matrix like this:
>
> for (int i=0; i<ldim; i++)
> {
>   irow[0] = i;
>   MatSetValuesLocal(PETSCMAT(par_L), 1, irow, rowptr[i+1]-rowptr[i],
>                     &colind[rowptr[i]], &a[rowptr[i]], INSERT_VALUES);
> }
>
> where rowptr and colind are the CSR indexing arrays in the local  
> numbering.
>
> I would like to eliminate the duplicate memory (and the copy above).  
> Is there a recommended way to let PETSc reference array 'a' directly  
> or a way to get rid of 'a' and translate our CSR assembly routines  
> to use low level PETSc data structures?
>
> So far I've added MatMPIAIJSetPreallocationCSR to try to speed up  
> the first solve, but the documentation is clear that 'a' is not a  
> pointer to the AIJ storage so I still need the copy. I tried using  
> MatCreateMPIAIJWithSplitArrays with the 'o' arrays set to zero but I  
> got indexing errors, and the documentation seems to imply that the
> local matrix is really not a standard CSR anyway. If that's true,
> then it seems like the options I have are to write a shell
> matrix for parallel CSR storage or write some new assembly routines  
> that use MatSetValuesLocal. I'm more concerned about the memory than  
> the overhead of MatSetValuesLocal, but it would certainly be easier  
> on me to use the CSR-based assembly routines we already have.
>
> Thanks,
> Chris
>
>
> -- 
> What most experimenters take for granted before they begin their  
> experiments is infinitely more interesting than any results to which  
> their experiments lead.
> -- Norbert Wiener
