[petsc-users] A number of questions about DMDA with SNES and Quasi-Newton methods

Smith, Barry F. bsmith at mcs.anl.gov
Thu Nov 2 11:31:07 CDT 2017



> On Nov 1, 2017, at 10:32 PM, zakaryah . <zakaryah at gmail.com> wrote:
> 
> I worked on the assumptions in my previous email and I at least partially implemented the function to assign the couplings.  For row 0, which is the redundant field, I set dnz[0] to end-start, and onz[0] to the size of the matrix minus dnz[0].  For all other rows, I just increment the existing values of dnz[i] and onz[i], since the coupling to the redundant field adds one extra element beyond what's allocated for the DMDA stencil.

  Sounds reasonable.
> 
> I see in the source that the FormCoupleLocations function is called once if the DM has PREALLOC_ONLY set to true, but twice otherwise.  I assume that the second call is for setting the nonzero structure.  

  Yes
> Do I need to do this?

   You probably should.

   I would do a MatView() small DMDA on the matrix you obtain and then again after you add the numerical values. This will show you what values are not being properly allocated/put in when the matrix is created by the DM.
> 
> In any case, something is still not right.  Even with the extra elements preallocated, the first assembly of the matrix is very slow.  I ran a test problem on a single process with -info, and got this:
> 
> 0] MatAssemblyEnd_SeqAIJ(): Matrix size: 1 X 1; storage space: 0 unneeded,1 used
> 
> [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
> 
> [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 1
> 
> [0] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 1) < 0.6. Do not use CompressedRow routines.
> 
> [0] MatSeqAIJCheckInode(): Found 1 nodes out of 1 rows. Not using Inode routines
> 
> [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 9009 X 9009; storage space: 0 unneeded,629703 used
> 
> [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
> 
> [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 81
> 
> [0] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 9009) < 0.6. Do not use CompressedRow routines.
> 
> [0] MatSeqAIJCheckInode(): Found 3003 nodes of 9009. Limit used: 5. Using Inode routines
> 
> [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 1 X 1; storage space: 0 unneeded,1 used
> 
> [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
> 
> [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 1
> 
> [0] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 1) < 0.6. Do not use CompressedRow routines.
> 
> [0] MatSeqAIJCheckInode(): Found 1 nodes out of 1 rows. Not using Inode routines
> 
> [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 9009 X 9009; storage space: 0 unneeded,629703 used
> 
> [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
> 
> [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 81
> 
> [0] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 9009) < 0.6. Do not use CompressedRow routines.
> 
> [0] MatSeqAIJCheckInode(): Found 3003 nodes of 9009. Limit used: 5. Using Inode routines
> 
> [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 9010 X 9010; storage space: 18018 unneeded,629704 used

  Yes, really bad.
> 
> [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
> 
> [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 81
> 
> [0] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 9010) < 0.6. Do not use CompressedRow routines.
> 
> [0] MatSeqAIJCheckInode(): Found 3004 nodes of 9010. Limit used: 5. Using Inode routines
> 
> [0] PetscCommDuplicate(): Using internal PETSc communicator 139995446774208 30818112
> 
> [0] DMGetDMSNES(): Creating new DMSNES
> 
> [0] DMGetDMKSP(): Creating new DMKSP
> 
> [0] PetscCommDuplicate(): Using internal PETSc communicator 139995446773056 30194064
> 
> [0] PetscCommDuplicate(): Using internal PETSc communicator 139995446773056 30194064
> 
> [0] PetscCommDuplicate(): Using internal PETSc communicator 139995446773056 30194064
> 
> [0] PetscCommDuplicate(): Using internal PETSc communicator 139995446773056 30194064
> 
>   0 SNES Function norm 2.302381528359e+00 
> 
> [0] PetscCommDuplicate(): Using internal PETSc communicator 139995446773056 30194064
> 
> [0] PetscCommDuplicate(): Using internal PETSc communicator 139995446773056 30194064
> 
> [0] PetscCommDuplicate(): Using internal PETSc communicator 139995446773056 30194064
> 
> [0] PetscCommDuplicate(): Using internal PETSc communicator 139995446773056 30194064
> 
> [0] PetscCommDuplicate(): Using internal PETSc communicator 139995446773056 30194064
> 
> [0] PetscCommDuplicate(): Using internal PETSc communicator 139995446773056 30194064
> 
> [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 9010 X 9010; storage space: 126132 unneeded,647722 used
> 
> [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 9610
> 
> [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 9010
> 
> [0] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 9010) < 0.6. Do not use CompressedRow routines.
> 
> [0] MatSeqAIJCheckInode(): Found 3004 nodes of 9010. Limit used: 5. Using Inode routines
> 
> [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 9010 X 9010; storage space: 0 unneeded,647722 used
> 
> [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
> 
> [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 9010
> 
> [0] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 9010) < 0.6. Do not use CompressedRow routines.
> 
> 
> 
> The 9610 mallocs during MatSetValues seem suspicious and are probably what's taking so long with larger problems.  601 of them are apparently in the call to set Jbh, and 9009 are in the call to set Jhb (b is the redundant field, h is the DMDA field).  If I run with more than one process, I get a segfault when a process which has rank greater than 0 sets dnz or onz in the FormCoupleLocations call.
> 
> 
> On Tue, Oct 31, 2017 at 10:40 PM, zakaryah . <zakaryah at gmail.com> wrote:
> Thanks Barry, that looks like exactly what I need.  I'm looking at pack.c and packm.c and I want to check my understanding of what my coupling function should do.  The relevant line in DMCreateMatrix_Composite_AIJ seems to be:
> 
> (*com->FormCoupleLocations)(dm,NULL,dnz,onz,__rstart,__nrows,__start,__end);
> 
> and I infer that dnz and onz are the number of nonzero elements in the diagonal and off-diagonal submatrices, for each row of the DMComposite matrix.  I suppose I can just set each of these in a for loop, but can I use the arguments to FormCoupleLocations as the range for the loop?  Which ones - __rstart to __rstart+__nrows?  How can I determine the number of rows on each processor from within the function that I pass?  From the preallocation macros it looks like __start to __end describe the range of the columns of the diagonal submatrix - is that right?  It looks like the ranges will be specific to each processor.  Do I just set the values in dnz and onz, or do I need to reduce them?
> 
> Thanks for all the help!  Maybe if I get things working I can carve out the core of the code to make an example program for DMRedundant/Composite.
> 



More information about the petsc-users mailing list