<div dir="ltr"><div class="gmail_default" style="font-size:small">OK thanks Barry! I've attached source for a minimal example which has the same structure as my problem.</div></div><div class="gmail_extra"><br><div class="gmail_quote">On Thu, Nov 2, 2017 at 5:36 PM, Smith, Barry F. <span dir="ltr"><<a href="mailto:bsmith@mcs.anl.gov" target="_blank">bsmith@mcs.anl.gov</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><br>
Please send your "simple" code that attempts to do the appropriate preallocation and we'll look at why it is failing.<br>
<span class="HOEnZb"><font color="#888888"><br>
Barry<br>
</font></span><div class="HOEnZb"><div class="h5"><br>
<br>
> On Nov 2, 2017, at 4:03 PM, zakaryah . <<a href="mailto:zakaryah@gmail.com">zakaryah@gmail.com</a>> wrote:<br>
><br>
> I ran MatView with the smallest possible grid size. The elements which are nonzero after the first assembly but not after the matrix is created are exactly the couplings, i.e. row and column 0 off the diagonal, which I'm trying to preallocate, and eventually place in the second call to FormCoupleLocations, although right now it's only being called once - not sure why prealloc_only is set - maybe because I have MAT_NEW_NONZERO_ALLOCATION_ERR set to false? I'm not sure why the Jbh terms result in so few additional mallocs - are columns allocated in chunks on the fly? For example, when my DMDA size is 375, adding values to Jhb results in 375 additional mallocs, but adding values to Jbh only adds 25 mallocs.<br>
><br>
> I guess my FormCoupleLocations is pretty incomplete, because incrementing dnz and onz doesn't seem to have any effect. I also don't know how this function should behave - as I said before, writing to dnz[i] and onz[i] where i goes from rstart to rstart+nrows-1 causes a segfault on processors >0, and if I inspect those values of dnz[i] and onz[i] by writing them to a file, they are nonsense when processor >0. Are dnz and onz local to the processor, i.e. should my iterations go from 0 to nrows-1? What is the general idea behind setting the structure from within FormCoupleLocations? I'm currently not doing that at all.<br>
><br>
> Thanks for all the help!<br>
><br>
> On Thu, Nov 2, 2017 at 12:31 PM, Smith, Barry F. <<a href="mailto:bsmith@mcs.anl.gov">bsmith@mcs.anl.gov</a>> wrote:<br>
><br>
><br>
> > On Nov 1, 2017, at 10:32 PM, zakaryah . <<a href="mailto:zakaryah@gmail.com">zakaryah@gmail.com</a>> wrote:<br>
> ><br>
> > I worked on the assumptions in my previous email and I at least partially implemented the function to assign the couplings. For row 0, which is the redundant field, I set dnz[0] to end-start, and onz[0] to the size of the matrix minus dnz[0]. For all other rows, I just increment the existing values of dnz[i] and onz[i], since the coupling to the redundant field adds one extra element beyond what's allocated for the DMDA stencil.<br>
><br>
> Sounds reasonable.<br>
> ><br>
> > I see in the source that the FormCoupleLocations function is called once if the DM has PREALLOC_ONLY set to true, but twice otherwise. I assume that the second call is for setting the nonzero structure.<br>
><br>
> Yes<br>
> > Do I need to do this?<br>
><br>
> You probably should.<br>
><br>
> I would do a MatView() small DMDA on the matrix you obtain and then again after you add the numerical values. This will show you what values are not being properly allocated/put in when the matrix is created by the DM.<br>
> ><br>
> > In any case, something is still not right. Even with the extra elements preallocated, the first assembly of the matrix is very slow. I ran a test problem on a single process with -info, and got this:<br>
> ><br>
> > 0] MatAssemblyEnd_SeqAIJ(): Matrix size: 1 X 1; storage space: 0 unneeded,1 used<br>
> ><br>
> > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0<br>
> ><br>
> > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 1<br>
> ><br>
> > [0] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 1) < 0.6. Do not use CompressedRow routines.<br>
> ><br>
> > [0] MatSeqAIJCheckInode(): Found 1 nodes out of 1 rows. Not using Inode routines<br>
> ><br>
> > [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 9009 X 9009; storage space: 0 unneeded,629703 used<br>
> ><br>
> > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0<br>
> ><br>
> > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 81<br>
> ><br>
> > [0] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 9009) < 0.6. Do not use CompressedRow routines.<br>
> ><br>
> > [0] MatSeqAIJCheckInode(): Found 3003 nodes of 9009. Limit used: 5. Using Inode routines<br>
> ><br>
> > [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 1 X 1; storage space: 0 unneeded,1 used<br>
> ><br>
> > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0<br>
> ><br>
> > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 1<br>
> ><br>
> > [0] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 1) < 0.6. Do not use CompressedRow routines.<br>
> ><br>
> > [0] MatSeqAIJCheckInode(): Found 1 nodes out of 1 rows. Not using Inode routines<br>
> ><br>
> > [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 9009 X 9009; storage space: 0 unneeded,629703 used<br>
> ><br>
> > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0<br>
> ><br>
> > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 81<br>
> ><br>
> > [0] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 9009) < 0.6. Do not use CompressedRow routines.<br>
> ><br>
> > [0] MatSeqAIJCheckInode(): Found 3003 nodes of 9009. Limit used: 5. Using Inode routines<br>
> ><br>
> > [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 9010 X 9010; storage space: 18018 unneeded,629704 used<br>
><br>
> Yes, really bad.<br>
> ><br>
> > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0<br>
> ><br>
> > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 81<br>
> ><br>
> > [0] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 9010) < 0.6. Do not use CompressedRow routines.<br>
> ><br>
> > [0] MatSeqAIJCheckInode(): Found 3004 nodes of 9010. Limit used: 5. Using Inode routines<br>
> ><br>
> > [0] PetscCommDuplicate(): Using internal PETSc communicator 139995446774208 30818112<br>
> ><br>
> > [0] DMGetDMSNES(): Creating new DMSNES<br>
> ><br>
> > [0] DMGetDMKSP(): Creating new DMKSP<br>
> ><br>
> > [0] PetscCommDuplicate(): Using internal PETSc communicator 139995446773056 30194064<br>
> ><br>
> > [0] PetscCommDuplicate(): Using internal PETSc communicator 139995446773056 30194064<br>
> ><br>
> > [0] PetscCommDuplicate(): Using internal PETSc communicator 139995446773056 30194064<br>
> ><br>
> > [0] PetscCommDuplicate(): Using internal PETSc communicator 139995446773056 30194064<br>
> ><br>
> > 0 SNES Function norm 2.302381528359e+00<br>
> ><br>
> > [0] PetscCommDuplicate(): Using internal PETSc communicator 139995446773056 30194064<br>
> ><br>
> > [0] PetscCommDuplicate(): Using internal PETSc communicator 139995446773056 30194064<br>
> ><br>
> > [0] PetscCommDuplicate(): Using internal PETSc communicator 139995446773056 30194064<br>
> ><br>
> > [0] PetscCommDuplicate(): Using internal PETSc communicator 139995446773056 30194064<br>
> ><br>
> > [0] PetscCommDuplicate(): Using internal PETSc communicator 139995446773056 30194064<br>
> ><br>
> > [0] PetscCommDuplicate(): Using internal PETSc communicator 139995446773056 30194064<br>
> ><br>
> > [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 9010 X 9010; storage space: 126132 unneeded,647722 used<br>
> ><br>
> > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 9610<br>
> ><br>
> > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 9010<br>
> ><br>
> > [0] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 9010) < 0.6. Do not use CompressedRow routines.<br>
> ><br>
> > [0] MatSeqAIJCheckInode(): Found 3004 nodes of 9010. Limit used: 5. Using Inode routines<br>
> ><br>
> > [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 9010 X 9010; storage space: 0 unneeded,647722 used<br>
> ><br>
> > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0<br>
> ><br>
> > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 9010<br>
> ><br>
> > [0] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 9010) < 0.6. Do not use CompressedRow routines.<br>
> ><br>
> ><br>
> ><br>
> > The 9610 mallocs during MatSetValues seem suspicious and are probably what's taking so long with larger problems. 601 of them are apparently in the call to set Jbh, and 9009 are in the call to set Jhb (b is the redundant field, h is the DMDA field). If I run with more than one process, I get a segfault when a process which has rank greater than 0 sets dnz or onz in the FormCoupleLocations call.<br>
> ><br>
> ><br>
> > On Tue, Oct 31, 2017 at 10:40 PM, zakaryah . <<a href="mailto:zakaryah@gmail.com">zakaryah@gmail.com</a>> wrote:<br>
> > Thanks Barry, that looks like exactly what I need. I'm looking at pack.c and packm.c and I want to check my understanding of what my coupling function should do. The relevant line in DMCreateMatrix_Composite_AIJ seems to be:<br>
> ><br>
> > (*com->FormCoupleLocations)(<wbr>dm,NULL,dnz,onz,__rstart,__<wbr>nrows,__start,__end);<br>
> ><br>
> > and I infer that dnz and onz are the number of nonzero elements in the diagonal and off-diagonal submatrices, for each row of the DMComposite matrix. I suppose I can just set each of these in a for loop, but can I use the arguments to FormCoupleLocations as the range for the loop? Which ones - __rstart to __rstart+__nrows? How can I determine the number of rows on each processor from within the function that I pass? From the preallocation macros it looks like __start to __end describe the range of the columns of the diagonal submatrix - is that right? It looks like the ranges will be specific to each processor. Do I just set the values in dnz and onz, or do I need to reduce them?<br>
> ><br>
> > Thanks for all the help! Maybe if I get things working I can carve out the core of the code to make an example program for DMRedundant/Composite.<br>
> ><br>
><br>
><br>
<br>
</div></div></blockquote></div><br></div>