Barry,<div><br></div><div>I've tracked down the problem.<div><br><div>I ran with -info -mat_view_info and with floating-point exceptions (FPEs) enabled, and got a SIGFPE after entering MatCreateMPIAIJWithSplitArrays (unfortunately, PETSc did not produce a stack trace). The SIGFPE was raised by a floating-point-to-integer typecast inside mat/interface/matrix.c:</div>
<div><br></div><div><div> if (mat->ops->getinfo) {</div><div> MatInfo info;</div><div> ierr = MatGetInfo(mat,MAT_GLOBAL_SUM,&info);CHKERRQ(ierr);</div><div> ierr = PetscViewerASCIIPrintf(viewer,"<b>total: nonzeros=%D</b>, allocated nonzeros=%D\n",<b>(PetscInt)info.nz_used</b>,(PetscInt)info.nz_allocated);CHKERRQ(ierr);</div>
<div> ierr = PetscViewerASCIIPrintf(viewer,"total number of mallocs used during MatSetValues calls =%D\n",(PetscInt)info.mallocs);CHKERRQ(ierr);</div><div> }</div></div><div><br></div><div>My sparse matrix has about 6 billion nonzeros. When I disable FPEs, I get a silent overflow when MatInfo.nz_used is converted from PetscLogDouble to a (32-bit) PetscInt:</div>
<div><div><br></div><div>Matrix Object: 96 MPI processes</div><div> type: mpiaij</div><div> rows=131857963, cols=18752388</div></div><div> total: <b>nonzeros=-2147483648</b>, allocated nonzeros=0 <br><br>and the code then runs just fine. Perhaps PETSc should cast nz_used to a 64-bit integer type (such as long long) instead of PetscInt?</div>
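<div><br></div><div>A minimal standalone sketch of the proposed check, in plain C rather than PETSc code. It assumes the nonzero count arrives as a double (as PetscLogDouble does) and range-checks it before any narrowing cast, then prints it as a 64-bit integer. Since 6 x 10^9 is larger than 2^31 - 1 = 2,147,483,647, the 32-bit cast cannot succeed. The names log_double, fits_int32, to_int64 and format_nz are hypothetical stand-ins, not PETSc API.</div>

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for PetscLogDouble, which is a double in PETSc. */
typedef double log_double;

/* Converts v to int32_t only when it is in range and returns 1; returns 0
   otherwise. Casting an out-of-range double straight to int32_t is undefined
   behavior in C; on x86 it typically yields INT32_MIN (-2147483648), or a
   SIGFPE when floating-point traps are enabled. */
static int fits_int32(log_double v, int32_t *out)
{
  if (v >= -2147483648.0 && v < 2147483648.0) {
    *out = (int32_t)v;
    return 1;
  }
  return 0; /* would overflow a 32-bit PetscInt */
}

/* Nonzero counts are non-negative and far below 2^63, so this conversion is
   well defined (a double holds every integer exactly up to 2^53). */
static int64_t to_int64(log_double v)
{
  assert(v >= 0.0 && v < 9223372036854775808.0); /* 0 <= v < 2^63 */
  return (int64_t)v;
}

/* Formats the summary line with 64-bit integers instead of a 32-bit PetscInt.
   Returns the number of characters snprintf would have written. */
static int format_nz(char *buf, size_t n, log_double nz_used, log_double nz_allocated)
{
  return snprintf(buf, n, "total: nonzeros=%" PRId64 ", allocated nonzeros=%" PRId64,
                  to_int64(nz_used), to_int64(nz_allocated));
}
```

<div>For example, format_nz(buf, sizeof buf, 6.0e9, 0.0) writes "total: nonzeros=6000000000, allocated nonzeros=0". Checking the range before casting avoids both the undefined conversion and the FPE trap, because comparing ordinary (non-NaN) doubles does not raise an invalid-operation exception.</div>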
<div><br></div><div><br>Mihai<br><div class="gmail_quote">On Thu, Nov 29, 2012 at 6:25 PM, Barry Smith <span dir="ltr"><<a href="mailto:bsmith@mcs.anl.gov" target="_blank">bsmith@mcs.anl.gov</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div class="im"><br>
On Nov 29, 2012, at 9:48 AM, Mihai Alexe <<a href="mailto:malexe@vt.edu">malexe@vt.edu</a>> wrote:<br>
<br>
> Hello all,<br>
><br>
> I am creating a large rectangular MPIAIJ matrix, then a shell NormalMatrix that eventually gets passed to a KSP object (all part of a constrained least-squares solver).<br>
> Code looks as follows:<br>
><br>
> //user.A_mat and user.Hess are PETSc Mat<br>
><br>
> info = MatCreateMPIAIJWithSplitArrays( PETSC_COMM_WORLD, *locrow, *loccol, nrow,<br>
> *ncol, onrowidx, oncolidx,<br>
> (PetscScalar*) onvals, offrowidx, offcolidx,<br>
> (PetscScalar*) values, &user.A_mat ); CHKERRQ(info);<br>
><br>
> info = MatCreateNormal( user.A_mat, &user.Hess ); CHKERRQ(info);<br>
> info = MatSetUp( user.Hess ); CHKERRQ(info);<br>
><br>
> Is MatSetUp() required for A or Hess to be initialized correctly? Or some call to MatSetPreallocation?<br>
</div><br>
No, you shouldn't need them. Try running with valgrind: <a href="http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind" target="_blank">http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind</a><br>
<span class="HOEnZb"><font color="#888888"><br>
Barry<br>
</font></span><div class="HOEnZb"><div class="h5"><br>
><br>
> My code crashes after displaying (with -info -mat_view_info):<br>
><br>
> [0] PetscCommDuplicate(): Duplicating a communicator 47534399113024 67425648 max tags = 2147483647<br>
> [0] PetscCommDuplicate(): Duplicating a communicator 47534399112000 67760592 max tags = 2147483647<br>
> [0] MatCreate_SeqAIJ_Inode(): Not using Inode routines due to -mat_no_inode<br>
> [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 8920860 X 1508490; storage space: 0 unneeded,34572269 used<br>
> [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0<br>
> [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 615<br>
> Matrix Object: 1 MPI processes<br>
> type: seqaij<br>
> rows=8920860, cols=1508490<br>
> total: nonzeros=34572269, allocated nonzeros=0<br>
> total number of mallocs used during MatSetValues calls =0<br>
> not using I-node routines<br>
> [0] PetscCommDuplicate(): Using internal PETSc communicator 47534399112000 67760592<br>
> [0] MatCreate_SeqAIJ_Inode(): Not using Inode routines due to -mat_no_inode<br>
> [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 8920860 X 18752388; storage space: 0 unneeded,1762711 used<br>
> [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0<br>
> [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 349<br>
> Matrix Object: 1 MPI processes<br>
> type: seqaij<br>
> rows=8920860, cols=18752388<br>
> total: nonzeros=1762711, allocated nonzeros=0<br>
> total number of mallocs used during MatSetValues calls =0<br>
> not using I-node routines<br>
> [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 8920860 X 1508490; storage space: 0 unneeded,34572269 used<br>
> [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0<br>
> [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 615<br>
> Matrix Object: 1 MPI processes<br>
> type: seqaij<br>
> rows=8920860, cols=1508490<br>
> total: nonzeros=34572269, allocated nonzeros=0<br>
> total number of mallocs used during MatSetValues calls =0<br>
> not using I-node routines<br>
> [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 8920860 X 18752388; storage space: 0 unneeded,1762711 used<br>
> [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0<br>
> [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 349<br>
> Matrix Object: 1 MPI processes<br>
> type: seqaij<br>
> rows=8920860, cols=18752388<br>
> total: nonzeros=1762711, allocated nonzeros=0<br>
> total number of mallocs used during MatSetValues calls =0<br>
> not using I-node routines<br>
> [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 8920860 X 1508490; storage space: 0 unneeded,34572269 used<br>
> [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0<br>
> [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 615<br>
> [0] PetscCommDuplicate(): Using internal PETSc communicator 47534399112000 67760592<br>
> [0] PetscCommDuplicate(): Using internal PETSc communicator 47534399112000 67760592<br>
> [0] VecScatterCreateCommon_PtoS(): Using blocksize 1 scatter<br>
> [0] VecScatterCreate(): General case: MPI to Seq<br>
> [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 8920860 X 38109; storage space: 0 unneeded,1762711 used<br>
> [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0<br>
> [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 349<br>
> Matrix Object: 160 MPI processes<br>
> type: mpiaij<br>
> rows=131858910, cols=18752388<br>
><br>
> The code ran just fine on a smaller (pruned) input dataset.<br>
> Unfortunately, I don't get a stack trace (I'm running in production mode; trying to switch to debug mode now).<br>
><br>
><br>
> Regards,<br>
> Mihai<br>
><br>
<br>
</div></div></blockquote></div><br></div></div></div>