[petsc-users] A number of questions about DMDA with SNES and Quasi-Newton methods

zakaryah . zakaryah at gmail.com
Thu Nov 2 16:03:37 CDT 2017


I ran MatView with the smallest possible grid size.  The elements which are
nonzero after the first assembly but not after the matrix is created are
exactly the couplings, i.e. row and column 0 off the diagonal, which I'm
trying to preallocate, and eventually place in the second call to
FormCoupleLocations, although right now it's only being called once - not
sure why prealloc_only is set - maybe because I have
MAT_NEW_NONZERO_ALLOCATION_ERR set to false?  I'm not sure why the Jbh
terms result in so few additional mallocs - are columns allocated in chunks
on the fly?  For example, when my DMDA size is 375, adding values to Jhb
results in 375 additional mallocs, but adding values to Jbh only adds 25
mallocs.

I guess my FormCoupleLocations is pretty incomplete, because incrementing
dnz and onz doesn't seem to have any effect.  I also don't know how this
function should behave - as I said before, writing to dnz[i] and onz[i]
where i goes from rstart to rstart+nrows-1 causes a segfault on processors
>0, and if I inspect those values of dnz[i] and onz[i] by writing them to a
file, they are nonsense when processor >0.  Are dnz and onz local to the
processor, i.e. should my iterations go from 0 to nrows-1?  What is the
general idea behind setting the structure from within FormCoupleLocations?
I'm currently not doing that at all.

Thanks for all the help!

On Thu, Nov 2, 2017 at 12:31 PM, Smith, Barry F. <bsmith at mcs.anl.gov> wrote:

>
>
> > On Nov 1, 2017, at 10:32 PM, zakaryah . <zakaryah at gmail.com> wrote:
> >
> > I worked on the assumptions in my previous email and I at least
> partially implemented the function to assign the couplings.  For row 0,
> which is the redundant field, I set dnz[0] to end-start, and onz[0] to the
> size of the matrix minus dnz[0].  For all other rows, I just increment the
> existing values of dnz[i] and onz[i], since the coupling to the redundant
> field adds one extra element beyond what's allocated for the DMDA stencil.
>
>   Sounds reasonable.
> >
> > I see in the source that the FormCoupleLocations function is called once
> if the DM has PREALLOC_ONLY set to true, but twice otherwise.  I assume
> that the second call is for setting the nonzero structure.
>
>   Yes
> > Do I need to do this?
>
>    You probably should.
>
>    I would do a MatView() small DMDA on the matrix you obtain and then
> again after you add the numerical values. This will show you what values
> are not being properly allocated/put in when the matrix is created by the
> DM.
> >
> > In any case, something is still not right.  Even with the extra elements
> preallocated, the first assembly of the matrix is very slow.  I ran a test
> problem on a single process with -info, and got this:
> >
> > 0] MatAssemblyEnd_SeqAIJ(): Matrix size: 1 X 1; storage space: 0
> unneeded,1 used
> >
> > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
> >
> > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 1
> >
> > [0] MatCheckCompressedRow(): Found the ratio (num_zerorows
> 0)/(num_localrows 1) < 0.6. Do not use CompressedRow routines.
> >
> > [0] MatSeqAIJCheckInode(): Found 1 nodes out of 1 rows. Not using Inode
> routines
> >
> > [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 9009 X 9009; storage space: 0
> unneeded,629703 used
> >
> > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
> >
> > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 81
> >
> > [0] MatCheckCompressedRow(): Found the ratio (num_zerorows
> 0)/(num_localrows 9009) < 0.6. Do not use CompressedRow routines.
> >
> > [0] MatSeqAIJCheckInode(): Found 3003 nodes of 9009. Limit used: 5.
> Using Inode routines
> >
> > [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 1 X 1; storage space: 0
> unneeded,1 used
> >
> > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
> >
> > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 1
> >
> > [0] MatCheckCompressedRow(): Found the ratio (num_zerorows
> 0)/(num_localrows 1) < 0.6. Do not use CompressedRow routines.
> >
> > [0] MatSeqAIJCheckInode(): Found 1 nodes out of 1 rows. Not using Inode
> routines
> >
> > [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 9009 X 9009; storage space: 0
> unneeded,629703 used
> >
> > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
> >
> > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 81
> >
> > [0] MatCheckCompressedRow(): Found the ratio (num_zerorows
> 0)/(num_localrows 9009) < 0.6. Do not use CompressedRow routines.
> >
> > [0] MatSeqAIJCheckInode(): Found 3003 nodes of 9009. Limit used: 5.
> Using Inode routines
> >
> > [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 9010 X 9010; storage space:
> 18018 unneeded,629704 used
>
>   Yes, really bad.
> >
> > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
> >
> > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 81
> >
> > [0] MatCheckCompressedRow(): Found the ratio (num_zerorows
> 0)/(num_localrows 9010) < 0.6. Do not use CompressedRow routines.
> >
> > [0] MatSeqAIJCheckInode(): Found 3004 nodes of 9010. Limit used: 5.
> Using Inode routines
> >
> > [0] PetscCommDuplicate(): Using internal PETSc communicator
> 139995446774208 30818112
> >
> > [0] DMGetDMSNES(): Creating new DMSNES
> >
> > [0] DMGetDMKSP(): Creating new DMKSP
> >
> > [0] PetscCommDuplicate(): Using internal PETSc communicator
> 139995446773056 30194064
> >
> > [0] PetscCommDuplicate(): Using internal PETSc communicator
> 139995446773056 30194064
> >
> > [0] PetscCommDuplicate(): Using internal PETSc communicator
> 139995446773056 30194064
> >
> > [0] PetscCommDuplicate(): Using internal PETSc communicator
> 139995446773056 30194064
> >
> >   0 SNES Function norm 2.302381528359e+00
> >
> > [0] PetscCommDuplicate(): Using internal PETSc communicator
> 139995446773056 30194064
> >
> > [0] PetscCommDuplicate(): Using internal PETSc communicator
> 139995446773056 30194064
> >
> > [0] PetscCommDuplicate(): Using internal PETSc communicator
> 139995446773056 30194064
> >
> > [0] PetscCommDuplicate(): Using internal PETSc communicator
> 139995446773056 30194064
> >
> > [0] PetscCommDuplicate(): Using internal PETSc communicator
> 139995446773056 30194064
> >
> > [0] PetscCommDuplicate(): Using internal PETSc communicator
> 139995446773056 30194064
> >
> > [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 9010 X 9010; storage space:
> 126132 unneeded,647722 used
> >
> > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is
> 9610
> >
> > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 9010
> >
> > [0] MatCheckCompressedRow(): Found the ratio (num_zerorows
> 0)/(num_localrows 9010) < 0.6. Do not use CompressedRow routines.
> >
> > [0] MatSeqAIJCheckInode(): Found 3004 nodes of 9010. Limit used: 5.
> Using Inode routines
> >
> > [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 9010 X 9010; storage space: 0
> unneeded,647722 used
> >
> > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
> >
> > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 9010
> >
> > [0] MatCheckCompressedRow(): Found the ratio (num_zerorows
> 0)/(num_localrows 9010) < 0.6. Do not use CompressedRow routines.
> >
> >
> >
> > The 9610 mallocs during MatSetValues seem suspicious and are probably
> what's taking so long with larger problems.  601 of them are apparently in
> the call to set Jbh, and 9009 are in the call to set Jhb (b is the
> redundant field, h is the DMDA field).  If I run with more than one
> process, I get a segfault when a process which has rank greater than 0 sets
> dnz or onz in the FormCoupleLocations call.
> >
> >
> > On Tue, Oct 31, 2017 at 10:40 PM, zakaryah . <zakaryah at gmail.com> wrote:
> > Thanks Barry, that looks like exactly what I need.  I'm looking at
> pack.c and packm.c and I want to check my understanding of what my coupling
> function should do.  The relevant line in DMCreateMatrix_Composite_AIJ
> seems to be:
> >
> > (*com->FormCoupleLocations)(dm,NULL,dnz,onz,__rstart,__
> nrows,__start,__end);
> >
> > and I infer that dnz and onz are the number of nonzero elements in the
> diagonal and off-diagonal submatrices, for each row of the DMComposite
> matrix.  I suppose I can just set each of these in a for loop, but can I
> use the arguments to FormCoupleLocations as the range for the loop?  Which
> ones - __rstart to __rstart+__nrows?  How can I determine the number of
> rows on each processor from within the function that I pass?  From the
> preallocation macros it looks like __start to __end describe the range of
> the columns of the diagonal submatrix - is that right?  It looks like the
> ranges will be specific to each processor.  Do I just set the values in dnz
> and onz, or do I need to reduce them?
> >
> > Thanks for all the help!  Maybe if I get things working I can carve out
> the core of the code to make an example program for DMRedundant/Composite.
> >
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20171102/e8628ffb/attachment-0001.html>


More information about the petsc-users mailing list