[petsc-users] A number of questions about DMDA with SNES and Quasi-Newton methods

zakaryah . zakaryah at gmail.com
Wed Nov 1 22:32:07 CDT 2017


I worked on the assumptions in my previous email and I at least partially
implemented the function to assign the couplings.  For row 0, which is the
redundant field, I set dnz[0] to end-start, and onz[0] to the size of the
matrix minus dnz[0].  For all other rows, I just increment the existing
values of dnz[i] and onz[i], since the coupling to the redundant field adds
one extra element beyond what's allocated for the DMDA stencil.

I see in the source that the FormCoupleLocations function is called once if
the DM has PREALLOC_ONLY set to true, but twice otherwise.  I assume that
the second call is for setting the nonzero structure.  Do I need to do this?

In any case, something is still not right.  Even with the extra elements
preallocated, the first assembly of the matrix is very slow.  I ran a test
problem on a single process with -info, and got this:

0] MatAssemblyEnd_SeqAIJ(): Matrix size: 1 X 1; storage space: 0 unneeded,1
used

[0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0

[0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 1

[0] MatCheckCompressedRow(): Found the ratio (num_zerorows
0)/(num_localrows 1) < 0.6. Do not use CompressedRow routines.

[0] MatSeqAIJCheckInode(): Found 1 nodes out of 1 rows. Not using Inode
routines

[0] MatAssemblyEnd_SeqAIJ(): Matrix size: 9009 X 9009; storage space: 0
unneeded,629703 used

[0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0

[0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 81

[0] MatCheckCompressedRow(): Found the ratio (num_zerorows
0)/(num_localrows 9009) < 0.6. Do not use CompressedRow routines.

[0] MatSeqAIJCheckInode(): Found 3003 nodes of 9009. Limit used: 5. Using
Inode routines

[0] MatAssemblyEnd_SeqAIJ(): Matrix size: 1 X 1; storage space: 0
unneeded,1 used

[0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0

[0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 1

[0] MatCheckCompressedRow(): Found the ratio (num_zerorows
0)/(num_localrows 1) < 0.6. Do not use CompressedRow routines.

[0] MatSeqAIJCheckInode(): Found 1 nodes out of 1 rows. Not using Inode
routines

[0] MatAssemblyEnd_SeqAIJ(): Matrix size: 9009 X 9009; storage space: 0
unneeded,629703 used

[0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0

[0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 81

[0] MatCheckCompressedRow(): Found the ratio (num_zerorows
0)/(num_localrows 9009) < 0.6. Do not use CompressedRow routines.

[0] MatSeqAIJCheckInode(): Found 3003 nodes of 9009. Limit used: 5. Using
Inode routines

[0] MatAssemblyEnd_SeqAIJ(): Matrix size: 9010 X 9010; storage space: 18018
unneeded,629704 used

[0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0

[0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 81

[0] MatCheckCompressedRow(): Found the ratio (num_zerorows
0)/(num_localrows 9010) < 0.6. Do not use CompressedRow routines.

[0] MatSeqAIJCheckInode(): Found 3004 nodes of 9010. Limit used: 5. Using
Inode routines

[0] PetscCommDuplicate(): Using internal PETSc communicator 139995446774208
30818112

[0] DMGetDMSNES(): Creating new DMSNES

[0] DMGetDMKSP(): Creating new DMKSP

[0] PetscCommDuplicate(): Using internal PETSc communicator 139995446773056
30194064

[0] PetscCommDuplicate(): Using internal PETSc communicator 139995446773056
30194064

[0] PetscCommDuplicate(): Using internal PETSc communicator 139995446773056
30194064

[0] PetscCommDuplicate(): Using internal PETSc communicator 139995446773056
30194064

  0 SNES Function norm 2.302381528359e+00

[0] PetscCommDuplicate(): Using internal PETSc communicator 139995446773056
30194064

[0] PetscCommDuplicate(): Using internal PETSc communicator 139995446773056
30194064

[0] PetscCommDuplicate(): Using internal PETSc communicator 139995446773056
30194064

[0] PetscCommDuplicate(): Using internal PETSc communicator 139995446773056
30194064

[0] PetscCommDuplicate(): Using internal PETSc communicator 139995446773056
30194064

[0] PetscCommDuplicate(): Using internal PETSc communicator 139995446773056
30194064

[0] MatAssemblyEnd_SeqAIJ(): Matrix size: 9010 X 9010; storage space:
126132 unneeded,647722 used

[0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 9610

[0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 9010

[0] MatCheckCompressedRow(): Found the ratio (num_zerorows
0)/(num_localrows 9010) < 0.6. Do not use CompressedRow routines.

[0] MatSeqAIJCheckInode(): Found 3004 nodes of 9010. Limit used: 5. Using
Inode routines

[0] MatAssemblyEnd_SeqAIJ(): Matrix size: 9010 X 9010; storage space: 0
unneeded,647722 used

[0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0

[0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 9010

[0] MatCheckCompressedRow(): Found the ratio (num_zerorows
0)/(num_localrows 9010) < 0.6. Do not use CompressedRow routines.


The 9610 mallocs during MatSetValues seem suspicious and are probably
what's taking so long with larger problems.  601 of them are apparently in
the call to set Jbh, and 9009 are in the call to set Jhb (b is the
redundant field, h is the DMDA field).  If I run with more than one
process, I get a segfault when a process which has rank greater than 0 sets
dnz or onz in the FormCoupleLocations call.

On Tue, Oct 31, 2017 at 10:40 PM, zakaryah . <zakaryah at gmail.com> wrote:

> Thanks Barry, that looks like exactly what I need.  I'm looking at pack.c
> and packm.c and I want to check my understanding of what my coupling
> function should do.  The relevant line in *DMCreateMatrix_Composite_AIJ *seems
> to be:
>
> (*com->FormCoupleLocations)(dm,NULL,dnz,onz,__rstart,__
> nrows,__start,__end);
>
> and I infer that dnz and onz are the number of nonzero elements in the
> diagonal and off-diagonal submatrices, for each row of the DMComposite
> matrix.  I suppose I can just set each of these in a for loop, but can I
> use the arguments to FormCoupleLocations as the range for the loop?  Which
> ones - __rstart to __rstart+__nrows?  How can I determine the number of
> rows on each processor from within the function that I pass?  From the
> preallocation macros it looks like __start to __end describe the range of
> the columns of the diagonal submatrix - is that right?  It looks like the
> ranges will be specific to each processor.  Do I just set the values in dnz
> and onz, or do I need to reduce them?
>
> Thanks for all the help!  Maybe if I get things working I can carve out
> the core of the code to make an example program for DMRedundant/Composite.
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20171101/330fb9b2/attachment-0001.html>


More information about the petsc-users mailing list