most efficient way to use local CSR matrix?
Barry Smith
bsmith at mcs.anl.gov
Thu Oct 29 21:00:00 CDT 2009
Have you run this with the debug version of PETSc? It does
additional argument testing that might catch a funny value passed in.
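(e.g. an arch configured with --with-debugging=1, the configure default, rather than a --with-debugging=0 build)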
Barry
On Oct 29, 2009, at 5:02 PM, Chris Kees wrote:
> I think I added the correct test code and got a difference of 0, but
> spooles is still not returning the correct result, at least not
> until the 2nd iteration. I'm attaching some of the code and output.
>
> Chris
>
> code for mat creates
> ---------------------------
> MatCreateMPIAIJ(Py_PETSC_COMM_WORLD,n,n,N,N,
> 1,PETSC_NULL,max_nNeighbors,PETSC_NULL,&self->m2);
>
> MatCreate(Py_PETSC_COMM_WORLD,&self->m);
> MatSetSizes(self->m,n,n,N,N);
> MatSetFromOptions(self->m);
> MatSeqAIJSetPreallocationCSR(self->m,SMP(L)->A.rowptr,
>                              SMP(L)->A.colind,(double*)(SMP(L)->A.nzval));
> MatMPIAIJSetPreallocationCSR(self->m,SMP(L)->A.rowptr,
>                              SMP(L)->A.colind,(double*)(SMP(L)->A.nzval));
>
>
> code for difference in mats
> -----------------------------------
>
> MatAXPY(PETSCMAT2(par_L),-1.0,PETSCMAT(par_L),
>         DIFFERENT_NONZERO_PATTERN);
> PetscReal norm=-1.0;
> MatNorm(PETSCMAT2(par_L),NORM_INFINITY,&norm);
> std::cout<<"inf norm of L - L2 = "<<norm<<std::endl;
>
> results for -pc_type lu -pc_factor_mat_solver_package spooles
> -ksp_monitor_true_residual -mat_type mpiaij
> -----------------------------------
>
> inf norm of L - L2 = 0
> 0 KSP preconditioned resid norm 2.284405915285e+01 true resid norm 1.062801511746e+01 ||Ae||/||Ax|| 1.000000000000e+00
> 1 KSP preconditioned resid norm 3.936838941525e-01 true resid norm 1.050257318153e+01 ||Ae||/||Ax|| 9.881970495395e-01
> 2 KSP preconditioned resid norm 7.332823321496e-14 true resid norm 2.408053447453e-13 ||Ae||/||Ax|| 2.265760276816e-14
> On Oct 29, 2009, at 3:56 PM, Barry Smith wrote:
>
>>
>> Hmm, this should not happen. The matrix should be identical in
>> both cases (note the initial residual is the same in both cases so
>> they may be identical).
>>
>> Here's one more thing you can try. In the same routine create TWO
>> matrices; one each way, then use MatAXPY() to take the difference
>> between them.
>> Is it zero? If not, that will help lead to where the problem might
>> be.
>>
>> If that does not help the situation, can you send the code that
>> demonstrates this problem to petsc-maint at mcs.anl.gov
>>
>>
>> Barry
>>
>> On Oct 29, 2009, at 3:34 PM, Chris Kees wrote:
>>
>>> I pulled the most recent version of petsc-dev and added
>>> superlu_dist. The only case that fails is spooles on 1 processor
>>> with an mpiaij matrix created with the Create/SetSizes/...
>>> paradigm. I would suspect something in the spooles wrapper code.
>>>
>>> code allocating mat:
>>> MatCreate(Py_PETSC_COMM_WORLD,&self->m);
>>> MatSetSizes(self->m,n,n,N,N);
>>> MatSetType(self->m,MATMPIAIJ);
>>>
>>> results:
>>> -pc_type lu -pc_factor_mat_solver_package spooles
>>> -ksp_monitor_true_residual (spooles on 1 proc)
>>> 0 KSP preconditioned resid norm 2.284405915285e+01 true resid norm 1.062801511746e+01 ||Ae||/||Ax|| 1.000000000000e+00
>>> 1 KSP preconditioned resid norm 3.936838941525e-01 true resid norm 1.050257318153e+01 ||Ae||/||Ax|| 9.881970495395e-01
>>> 2 KSP preconditioned resid norm 7.332823321496e-14 true resid norm 2.408053447453e-13 ||Ae||/||Ax|| 2.265760276816e-14
>>>
>>> -pc_type lu -pc_factor_mat_solver_package spooles
>>> -ksp_monitor_true_residual (spooles on 8 procs)
>>> 0 KSP preconditioned resid norm 2.284405915285e+01 true resid norm 1.062801511746e+01 ||Ae||/||Ax|| 1.000000000000e+00
>>> 1 KSP preconditioned resid norm 2.898588418444e-14 true resid norm 1.045214942604e-13 ||Ae||/||Ax|| 9.834526306678e-15
>>>
>>> -pc_type lu -pc_factor_mat_solver_package superlu_dist
>>> -ksp_monitor_true_residual
>>> 0 KSP preconditioned resid norm 2.284405915285e+01 true resid norm 1.062801511746e+01 ||Ae||/||Ax|| 1.000000000000e+00
>>> 1 KSP preconditioned resid norm 2.917052183641e-14 true resid norm 1.299023464819e-13 ||Ae||/||Ax|| 1.222263471084e-14
>>>
>>> code allocating mat:
>>>
>>> MatCreateMPIAIJ(Py_PETSC_COMM_WORLD,n,n,N,N,
>>> 1,PETSC_NULL,max_nNeighbors,PETSC_NULL,&self->m);
>>>
>>> results:
>>> -pc_type lu -pc_factor_mat_solver_package spooles
>>> -ksp_monitor_true_residual (spooles on 1 proc)
>>> 0 KSP preconditioned resid norm 2.284405915285e+01 true resid norm 1.062801511746e+01 ||Ae||/||Ax|| 1.000000000000e+00
>>> 1 KSP preconditioned resid norm 3.093811333624e-14 true resid norm 1.046870732638e-13 ||Ae||/||Ax|| 9.850105791801e-15
>>>
>>> On Oct 29, 2009, at 1:50 PM, Matthew Knepley wrote:
>>>
>>>> On Thu, Oct 29, 2009 at 1:46 PM, Chris Kees
>>>> <christopher.e.kees at usace.army.mil> wrote:
>>>> Mark, Barry,
>>>>
>>>> Thanks for the help. For now I'm sticking with the approach of
>>>> copying the csr matrix and using the csr data structures to do
>>>> the preallocations. I'll eventually get around to writing the
>>>> code for assembling directly into the petsc matrix. I have two
>>>> more questions.
>>>>
>>>> 1) On 1 processor, the linear solver is not converging in 1
>>>> iteration using -pc_type lu -pc_factor_mat_solver_package
>>>> spooles. It converges in one iteration in parallel or on a single
>>>> processor if I set -mat_type seqaij. Also, if I go back to the
>>>> old way of constructing the matrix it DOES converge in 1
>>>> iteration on 1 processor. That is, if I replace the MatCreate/
>>>> MatSetSizes/... calls from the earlier email, with what I had
>>>> originally:
>>>>
>>>> MatCreateMPIAIJ(Py_PETSC_COMM_WORLD,n,n,N,N,
>>>> 1,PETSC_NULL,max_nNeighbors,PETSC_NULL,&self->m);
>>>>
>>>> then the solver converges in 1 iteration as expected. This
>>>> wouldn't be a big deal except that we're trying to verify that
>>>> our quadratic finite elements are working, and the behavior for
>>>> p1 and p2 elements is different. The above observations are for
>>>> p2, while all the cases work as expected for p1. Should we just
>>>> not be using mpiaij at all on 1 processor?
>>>>
>>>> No, you have a problem.
>>>>
>>>> 2) In the code snippet I took from the petsc example there are
>>>> two preallocation calls (see below). Is that correct, or should I
>>>> be checking the matrix type and calling only the preallocation
>>>> function for the actual matrix type? i.e. something like
>>>>
>>>> MatCreate(...
>>>> MatSetSizes(...
>>>> MatSetFromOptions(...
>>>> MatGetType(...
>>>> if type==mpiaij: MatMPIAIJSetPreallocationCSR(...
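>>>>
>>>> or, spelled out in C, roughly (an untested sketch of what I mean,
>>>> using strcmp from <string.h> on the type string):
>>>>
>>>> MatType type;
>>>> MatGetType(self->m,&type);
>>>> if (!strcmp(type,MATMPIAIJ)) {
>>>>   MatMPIAIJSetPreallocationCSR(self->m,SMP(L)->A.rowptr,
>>>>                                SMP(L)->A.colind,(double*)(SMP(L)->A.nzval));
>>>> } else if (!strcmp(type,MATSEQAIJ)) {
>>>>   MatSeqAIJSetPreallocationCSR(self->m,SMP(L)->A.rowptr,
>>>>                                SMP(L)->A.colind,(double*)(SMP(L)->A.nzval));
>>>> }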
>>>>
>>>> It does not hurt to call them all.
>>>>
>>>> Matt
>>>>
>>>> Chris
>>>>
>>>> On Oct 21, 2009, at 11:50 AM, Mark F. Adams wrote:
>>>>
>>>> Chris, just a note,
>>>>
>>>> Perhaps I missed something here, but I do something similar to you,
>>>> e.g. overlapping partitions, and PETSc is set up very well to work
>>>> directly in your code (I sometimes write a little 'addvalue'
>>>> wrapper) and assemble your FE matrices directly into a global
>>>> matrix. You use the (global) MatSetValues(...) and set the
>>>> matrix option MAT_IGNORE_OFF_PROC_ENTRIES (or something like
>>>> that) so that the off-processor matrix entries, which are
>>>> computed redundantly, are thrown away and you get the correct
>>>> matrix.
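>>>>
>>>> In other words, roughly (an untested sketch; nElements, nDOF,
>>>> elementDOF, and Ke here just stand in for whatever your FE code
>>>> already has):
>>>>
>>>> /* drop redundantly computed off-process entries instead of communicating them */
>>>> MatSetOption(m,MAT_IGNORE_OFF_PROC_ENTRIES,PETSC_TRUE);
>>>> for (PetscInt e = 0; e < nElements; e++) {
>>>>   /* global row/column indices and the dense element matrix for element e */
>>>>   MatSetValues(m,nDOF,&elementDOF[e*nDOF],nDOF,&elementDOF[e*nDOF],
>>>>                &Ke[e*nDOF*nDOF],ADD_VALUES);
>>>> }
>>>> MatAssemblyBegin(m,MAT_FINAL_ASSEMBLY);
>>>> MatAssemblyEnd(m,MAT_FINAL_ASSEMBLY);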
>>>>
>>>> As Barry said, you don't need exact non-zero counts to allocate
>>>> storage in PETSc matrices, just don't underestimate.
>>>>
>>>> Mark
>>>>
>>>> On Oct 21, 2009, at 9:16 AM, Barry Smith wrote:
>>>>
>>>>
>>>> On Oct 20, 2009, at 10:13 PM, Chris Kees wrote:
>>>>
>>>> Thanks. Just to make sure I'm following, when I create the matrix
>>>> I do:
>>>>
>>>> MatCreate(PETSC_COMM_WORLD,&self->m);
>>>> MatSetSizes(self->m,n,n,N,N);
>>>> MatSetType(self->m,MATMPIAIJ);
>>>> MatSetFromOptions(self->m);
>>>> MatSeqAIJSetPreallocationCSR(self->m,SMP(L)->A.rowptr,
>>>>                              SMP(L)->A.colind,(double*)(SMP(L)->A.nzval));
>>>> MatMPIAIJSetPreallocationCSR(self->m,SMP(L)->A.rowptr,
>>>>                              SMP(L)->A.colind,(double*)(SMP(L)->A.nzval));
>>>>
>>>> and then I copy the matrix at each Newton iteration using the
>>>> code in the first message below. You said "...PreallocationCSR()
>>>> does the copy very efficiently", but I don't think I'm supposed
>>>> to repeatedly call it, right? Looking at the output some more it
>>>> does seem like building the preconditioner initially is taking
>>>> much more time than the initial pass through MatSetValues.
>>>>
>>>> If the matrix nonzero structure stays the same then yes, you need to
>>>> call MatMPIAIJSetPreallocationCSR() only once and use the
>>>> MatSetValues() calls each Newton step to update the nonzeros.
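>>>>
>>>> That is, roughly (reusing the loop and variables from your message
>>>> below):
>>>>
>>>> /* setup, once */
>>>> MatMPIAIJSetPreallocationCSR(m,rowptr,colind,(double*)a);
>>>>
>>>> /* each Newton step: refill the same nonzero pattern */
>>>> for (int i=0;i<ldim;i++)
>>>>   {
>>>>     irow[0] = i;
>>>>     MatSetValuesLocal(m,1,irow,rowptr[i+1]-rowptr[i],
>>>>                       &colind[rowptr[i]],&a[rowptr[i]],INSERT_VALUES);
>>>>   }
>>>> MatAssemblyBegin(m,MAT_FINAL_ASSEMBLY);
>>>> MatAssemblyEnd(m,MAT_FINAL_ASSEMBLY);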
>>>>
>>>> Barry
>>>>
>>>>
>>>> Chris
>>>>
>>>> On Oct 20, 2009, at 8:32 PM, Barry Smith wrote:
>>>>
>>>>
>>>> Here's the deal. In parallel PETSc does not use a single CSR
>>>> matrix on each process to hold the MPIAIJ matrix. Hence if you
>>>> store the matrix on a process as a single CSR matrix then it has
>>>> to be copied into the PETSc data structure.
>>>> MatMPIAIJSetPreallocationCSR() does the copy very efficiently.
>>>> But you are right, until you free YOUR rowptr[], colind[] and a[]
>>>> there are two copies of the matrix.
>>>> One could argue that this is ok, because any preconditioner
>>>> worth anything will take up at least that much memory later
>>>> anyway. For example, if you use the PETSc default
>>>> preconditioner, block Jacobi with ILU(0) on each block, it will
>>>> take pretty much the same amount of memory.
>>>>
>>>> For people who use SOR or some other preconditioner that requires
>>>> very little memory this is not satisfactory because they feel
>>>> they could run larger problems without the copy.
>>>>
>>>> I strongly recommend you use MatMPIAIJSetPreallocationCSR() and
>>>> live with any memory limitation. The reason is that doing more
>>>> than that involves orders of magnitude more pain for generally
>>>> little payoff.
>>>> If you like pain then you can
>>>> 1) change your code to not build directly into CSR; instead call
>>>> MatSetValuesLocal() as you compute the entries. With this you
>>>> need to figure out the preallocation, which can be painful code
>>>> (see the rough sketch after this list).
>>>> 2) write an entire matrix class that stores the matrix the way you
>>>> store it, not the way PETSc stores it. This is a huge project, since
>>>> you have to write your own matrix-vector product, ILU factorization, etc.
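>>>>
>>>> For 1), the counts could come straight from your existing CSR arrays.
>>>> An untested sketch, assuming your local column numbering puts the n
>>>> locally owned dofs first (so columns >= n are ghosts and land in the
>>>> off-diagonal block):
>>>>
>>>> PetscInt *d_nnz = (PetscInt*)malloc(n*sizeof(PetscInt));
>>>> PetscInt *o_nnz = (PetscInt*)malloc(n*sizeof(PetscInt));
>>>> for (int i=0;i<n;i++)
>>>>   {
>>>>     d_nnz[i] = o_nnz[i] = 0;
>>>>     for (int j=rowptr[i];j<rowptr[i+1];j++)
>>>>       if (colind[j] < n) d_nnz[i]++; else o_nnz[i]++;
>>>>   }
>>>> MatMPIAIJSetPreallocation(m,0,d_nnz,0,o_nnz);
>>>> free(d_nnz); free(o_nnz);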
>>>>
>>>> Barry
>>>>
>>>>
>>>>
>>>> On Oct 20, 2009, at 8:04 PM, Chris Kees wrote:
>>>>
>>>> Hi,
>>>>
>>>> Our unstructured finite element code does a fairly standard
>>>> overlapping decomposition that allows it to calculate all the non-
>>>> zero column entries for the rows owned by the processor (rows
>>>> 0...ldim-1 in the local numbering). We assemble them into a
>>>> local CSR matrix and then copy them into a PETSc matrix like this:
>>>>
>>>> for (int i=0;i<ldim;i++)
>>>>   {
>>>>     irow[0] = i;
>>>>     MatSetValuesLocal(PETSCMAT(par_L),1,irow,rowptr[i+1]-rowptr[i],
>>>>                       &colind[rowptr[i]],&a[rowptr[i]],INSERT_VALUES);
>>>>   }
>>>>
>>>> where rowptr and colind are the CSR indexing arrays in the local
>>>> numbering.
>>>>
>>>> I would like to eliminate the duplicate memory (and the copy
>>>> above). Is there a recommended way to let PETSc reference array
>>>> 'a' directly or a way to get rid of 'a' and translate our CSR
>>>> assembly routines to use low level PETSc data structures?
>>>>
>>>> So far I've added MatMPIAIJSetPreallocationCSR to try to speed up
>>>> the first solve, but the documentation is clear that 'a' is not a
>>>> pointer to the AIJ storage so I still need the copy. I tried
>>>> using MatCreateMPIAIJWithSplitArrays with the 'o' arrays set to
>>>> zero but I got indexing errors, and the documentation really
>>>> seems to imply that the local matrix is really not a standard CSR
>>>> anyway. If that's true then it seems like the options I have are
>>>> to write a shell matrix for parallel CSR storage or write some
>>>> new assembly routines that use MatSetValuesLocal. I'm more
>>>> concerned about the memory than the overhead of
>>>> MatSetValuesLocal, but it would certainly be easier on me to use
>>>> the CSR-based assembly routines we already have.
>>>>
>>>> Thanks,
>>>> Chris
>>>>
>>>>
>>>> --
>>>> What most experimenters take for granted before they begin their
>>>> experiments is infinitely more interesting than any results to
>>>> which their experiments lead.
>>>> -- Norbert Wiener
>>>
>>
>