most efficient way to use local CSR matrix?

Matthew Knepley knepley at gmail.com
Thu Oct 29 13:50:02 CDT 2009


On Thu, Oct 29, 2009 at 1:46 PM, Chris Kees <
christopher.e.kees at usace.army.mil> wrote:

> Mark, Barry,
>
> Thanks for the help.  For now I'm sticking with the approach of copying the
> CSR matrix and using the CSR data structures to do the preallocations. I'll
> eventually get around to writing the code for assembling directly into the
> PETSc matrix.  I have two more questions.
>
> 1) On 1 processor, the linear solver is not converging in 1 iteration using
> -pc_type lu -pc_factor_mat_solver_package spooles. It converges in one
> iteration in parallel or on a single processor if I set -mat_type seqaij.
> Also, if I go back to the old way of constructing the matrix, it DOES
> converge in 1 iteration on 1 processor. That is, if I replace the
> MatCreate/MatSetSizes/... calls from the earlier email with what I had
> originally:
>
>
> MatCreateMPIAIJ(Py_PETSC_COMM_WORLD,n,n,N,N,1,PETSC_NULL,
>                 max_nNeighbors,PETSC_NULL,&self->m);
>
> then the solver converges in 1 iteration as expected.  This wouldn't be a
> big deal except that we're trying to verify that our quadratic finite
> elements are working, and the behavior for p1 and p2 elements is different.
> The above observations are for p2, while all the cases work as expected for
> p1. Should we just not be using mpiaij at all on 1 processor?
>

No, you have a problem.


> 2) In the code snippet I took from the PETSc example there are two
> preallocation calls (see below). Is that correct, or should I be checking
> the matrix type and calling only the preallocation function for the actual
> matrix type? I.e., something like:
>
> MatCreate(...
> MatSetSizes(...
> MatSetFromOptions(...
> MatGetType(...
> if type==mpiaij: MatMPIAIJSetPreallocationCSR(...
>

It does not hurt to call them all.

  Matt
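
For reference, a minimal sketch of the "call them all" pattern (hypothetical
names: A for the matrix, rowptr/colind/nzval for the local CSR arrays, n/N for
the local/global sizes; error checking omitted). Each *SetPreallocationCSR()
routine is ignored unless the matrix type matches it, which is why calling
both is harmless:

  #include <petscmat.h>

  Mat A;
  MatCreate(PETSC_COMM_WORLD, &A);
  MatSetSizes(A, n, n, N, N);
  MatSetFromOptions(A);   /* type becomes seqaij, mpiaij, ... per the options */
  /* The routine that does not match the selected type is a no-op,
     so both cases are covered without checking the type by hand. */
  MatSeqAIJSetPreallocationCSR(A, rowptr, colind, nzval);
  MatMPIAIJSetPreallocationCSR(A, rowptr, colind, nzval);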


> Chris
>
> On Oct 21, 2009, at 11:50 AM, Mark F. Adams wrote:
>
>> Chris, just a note,
>>
>> Perhaps I missed something here, but I do something similar to you, e.g.,
>> overlapping partitions, and PETSc is set up very well to be intrusive in your
>> code (I sometimes write a little 'addvalue' wrapper) and to assemble your FE
>> matrices directly into a global matrix.  You use the (global)
>> MatSetValues(...) and set the matrix option MAT_IGNORE_OFF_PROC_ENTRIES (or
>> something like that) so that these off-processor matrix entries, which are
>> computed redundantly, are thrown away and you get the correct matrix.
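
A rough sketch of the pattern Mark describes (the option is spelled
MAT_IGNORE_OFF_PROC_ENTRIES; A, nen, idx, and Ke are hypothetical names for
the assembled matrix, the number of element dofs, their global indices, and
the element matrix):

  /* Discard redundantly computed off-process entries instead of
     communicating them; each process keeps only entries for rows it owns. */
  MatSetOption(A, MAT_IGNORE_OFF_PROC_ENTRIES, PETSC_TRUE);

  /* element loop: add each element matrix with global indices */
  MatSetValues(A, nen, idx, nen, idx, Ke, ADD_VALUES);

  MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);
  MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);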
>>
>> As Barry said, you don't need exact non-zero counts to allocate storage in
>> PETSc matrices; just don't underestimate.
>>
>> Mark
>>
>> On Oct 21, 2009, at 9:16 AM, Barry Smith wrote:
>>
>>
>>> On Oct 20, 2009, at 10:13 PM, Chris Kees wrote:
>>>
>>>> Thanks. Just to make sure I'm following, when I create the matrix I do:
>>>>
>>>>   MatCreate(PETSC_COMM_WORLD,&self->m);
>>>>   MatSetSizes(self->m,n,n,N,N);
>>>>   MatSetType(self->m,MATMPIAIJ);
>>>>   MatSetFromOptions(self->m);
>>>>
>>>>   MatSeqAIJSetPreallocationCSR(self->m,SMP(L)->A.rowptr,SMP(L)->A.colind,
>>>>                                (double*)(SMP(L)->A.nzval));
>>>>
>>>>   MatMPIAIJSetPreallocationCSR(self->m,SMP(L)->A.rowptr,SMP(L)->A.colind,
>>>>                                (double*)(SMP(L)->A.nzval));
>>>>
>>>> and then I copy the matrix at each Newton iteration using the code in
>>>> the first message below.  You said "...PreallocationCSR() does the copy very
>>>> efficiently", but I don't think I'm supposed to repeatedly call it, right?
>>>>  Looking at the output some more it does seem like building the
>>>> preconditioner initially is taking much more time than the initial pass
>>>> through MatSetValues.
>>>>
>>>
>>>  If the matrix nonzero structure stays the same, then yes, you need to call
>>> MatMPIAIJSetPreallocationCSR() only once and use the MatSetValues() calls
>>> each Newton step to update the nonzeros.
>>>
>>>  Barry
>>>
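
In outline, the approach Barry recommends looks something like this, assuming
the nonzero structure really is fixed, the CSR arrays are the same ones passed
to the preallocation call, and the local-to-global mapping was set on the
matrix at creation (A here stands in for the PETSc matrix; the other names
follow Chris's snippet further below):

  /* once, at setup: preallocate (and fill) from the local CSR arrays */
  MatMPIAIJSetPreallocationCSR(A, rowptr, colind, nzval);

  /* every Newton step: refill the values, reusing the existing pattern */
  for (i = 0; i < ldim; i++) {
    irow[0] = i;
    MatSetValuesLocal(A, 1, irow, rowptr[i+1] - rowptr[i],
                      &colind[rowptr[i]], &a[rowptr[i]], INSERT_VALUES);
  }
  MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);
  MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);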
>>>
>>>> Chris
>>>>
>>>> On Oct 20, 2009, at 8:32 PM, Barry Smith wrote:
>>>>
>>>>
>>>>> Here's the deal. In parallel, PETSc does not use a single CSR matrix on
>>>>> each process to hold the MPIAIJ matrix. Hence if you store the matrix on a
>>>>> process as a single CSR matrix, then it has to be copied into the PETSc
>>>>> data structure. MatMPIAIJSetPreallocationCSR() does the copy very
>>>>> efficiently. But you are right, until you free YOUR rowptr[], colind[] and
>>>>> a[] there are two copies of the matrix.
>>>>> One could argue that this is ok, because any preconditioner
>>>>> worth anything will take up at least that much memory later anyway.  For
>>>>> example, if you use the PETSc default preconditioner, block Jacobi with
>>>>> ILU(0) on each block, it will take pretty much the same amount of memory.
>>>>>
>>>>> For people who use SOR or some other preconditioner that requires very
>>>>> little memory this is not satisfactory because they feel they could run
>>>>> larger problems without the copy.
>>>>>
>>>>> I strongly recommend you use MatMPIAIJSetPreallocationCSR() and live
>>>>> with any memory limitation. The reason is that doing more than that brings
>>>>> orders of magnitude more pain for generally little payoff.
>>>>> If you like pain, then you can
>>>>> 1) change your code to not build directly into CSR; instead call
>>>>> MatSetValuesLocal() as you compute the entries. With this you need to figure
>>>>> out the preallocation, which can be painful code.
>>>>> 2) write an entire matrix class that stores the matrix the way you store
>>>>> it, not the way PETSc stores it. This is a huge project, since you have to
>>>>> write your own matrix-vector product, ILU factorization, etc.
>>>>>
>>>>> Barry
>>>>>
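
For option 1 above, the "painful" part is the preallocation: for each locally
owned row you count how many couplings land in the diagonal block (columns
owned by this process) and how many land in the off-diagonal block. A hedged
sketch with hypothetical names (d_nnz/o_nnz for the counts, nen for the number
of element dofs, lrow/lcol for local indices, Ke for the element matrix;
PetscMalloc1 is the current allocation helper, older releases use PetscMalloc):

  PetscInt *d_nnz, *o_nnz;
  PetscMalloc1(n, &d_nnz);
  PetscMalloc1(n, &o_nnz);
  /* ... fill d_nnz[i]/o_nnz[i] from the mesh connectivity;
     overestimates are fine, underestimates are not ... */
  MatMPIAIJSetPreallocation(A, 0, d_nnz, 0, o_nnz);

  /* element loop: assemble directly, no intermediate CSR copy */
  MatSetValuesLocal(A, nen, lrow, nen, lcol, Ke, ADD_VALUES);

  MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);
  MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);
  PetscFree(d_nnz);
  PetscFree(o_nnz);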
>>>>>
>>>>>
>>>>> On Oct 20, 2009, at 8:04 PM, Chris Kees wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> Our unstructured finite element code does a fairly standard
>>>>>> overlapping decomposition that allows it to calculate all the non-zero
>>>>>> column entries for the rows owned by the processor (rows 0...ldim-1 in the
>>>>>> local numbering).  We assemble them into a local CSR matrix and then copy
>>>>>> them into a PETSc matrix like this:
>>>>>>
>>>>>> for (int i=0;i<ldim;i++)
>>>>>> {
>>>>>>   irow[0] = i;
>>>>>>   MatSetValuesLocal(PETSCMAT(par_L),1,irow,rowptr[i+1]-rowptr[i],
>>>>>>                     &colind[rowptr[i]],&a[rowptr[i]],INSERT_VALUES);
>>>>>> }
>>>>>>
>>>>>> where rowptr and colind are the CSR indexing arrays in the local
>>>>>> numbering.
>>>>>>
>>>>>> I would like to eliminate the duplicate memory (and the copy above).
>>>>>> Is there a recommended way to let PETSc reference array 'a' directly or a
>>>>>> way to get rid of 'a' and translate our CSR assembly routines to use low
>>>>>> level PETSc data structures?
>>>>>>
>>>>>> So far I've added MatMPIAIJSetPreallocationCSR to try to speed up the
>>>>>> first solve, but the documentation is clear that 'a' is not a pointer to the
>>>>>> AIJ storage, so I still need the copy. I tried using
>>>>>> MatCreateMPIAIJWithSplitArrays with the 'o' arrays set to zero, but I got
>>>>>> indexing errors, and the documentation really seems to imply that the local
>>>>>> matrix is not a standard CSR anyway. If that's true, then it seems
>>>>>> like the options I have are to write a shell matrix for parallel CSR storage
>>>>>> or write some new assembly routines that use MatSetValuesLocal. I'm more
>>>>>> concerned about the memory than the overhead of MatSetValuesLocal, but it
>>>>>> would certainly be easier on me to use the CSR-based assembly routines we
>>>>>> already have.
>>>>>>
>>>>>> Thanks,
>>>>>> Chris
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>>
>>
>


-- 
What most experimenters take for granted before they begin their experiments
is infinitely more interesting than any results to which their experiments
lead.
-- Norbert Wiener

