On Fri, Jan 20, 2012 at 10:21 AM, Wen Jiang <span dir="ltr"><<a href="mailto:jiangwen84@gmail.com">jiangwen84@gmail.com</a>></span> wrote:<br><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
Hi, Matt<br><br>Could you give me more details on how to get a stack trace there? I know little about it. The job is submitted on the head node and runs on the compute nodes. <br></blockquote><div><br></div><div>1) Always run serial problems until you understand what is happening</div>
<div><br></div><div>2) Run with -start_in_debugger, and type 'cont' in the debugger (read about gdb)</div><div><br></div><div>3) When it stalls, Ctrl-C and then type 'where'</div><div><br></div><div> Matt</div>
<div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Thanks.<br><br>On Fri, Jan 20, 2012 at 9:44 AM, Wen Jiang <<a href="mailto:jiangwen84@gmail.com" target="_blank">jiangwen84@gmail.com</a>> wrote:<br>
<br>
> Hi Barry,<br>
><br>
> Thanks for your suggestion. I just added<br>
> <div>MatSetOption(mat, MAT_NEW_NONZERO_ALLOCATION_ERR, PETSC_TRUE) to my code, but I did not get<br>
> any error message about bad allocation, and my code still hangs<br>
> there. I attached the output file below. Thanks.<br>
><br>
<br>
Run with -start_in_debugger and get a stack trace. Note that your stashes<br>
are enormous. You might consider calling<br>
MatAssemblyBegin/End(A, MAT_FLUSH_ASSEMBLY) periodically during assembly.<br>
<br>
Matt<br>
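<br>
One detail worth keeping in mind with the flush suggestion above: MatAssemblyBegin/End are collective, so every rank has to make the same number of flush calls even when ranks own different numbers of elements. A minimal sketch of that bookkeeping follows. The helper names (flush_rounds, round_range) and the chunk size are hypothetical, and the PETSc/MPI calls appear only as comments, so this is an outline under those assumptions rather than the poster's actual assembly loop.<br>

```c
#include <assert.h>

/* Hypothetical helper (not a PETSc routine): number of flush rounds this
 * rank needs if it flushes the stash after every `chunk` local elements. */
long flush_rounds(long n_local, long chunk)
{
    if (n_local <= 0 || chunk <= 0) return 0;
    return (n_local + chunk - 1) / chunk;   /* ceiling division */
}

/* Hypothetical helper: half-open range [*begin, *end) of local elements
 * assembled in round r (r >= 0). Rounds past the local element count get
 * an empty range, so a rank with fewer elements can still take part in
 * the collective flush. */
void round_range(long n_local, long chunk, long r, long *begin, long *end)
{
    assert(chunk > 0 && r >= 0);
    long b = r * chunk;
    long e = b + chunk;
    if (b > n_local) b = n_local;
    if (e > n_local) e = n_local;
    *begin = b;
    *end   = e;
}

/* Intended use (PETSc/MPI calls shown as comments only, not compiled here):
 *
 *   rounds = maximum over all ranks of flush_rounds(n_local, chunk)
 *            (for example via MPI_Allreduce with MPI_MAX)
 *   for r = 0 .. rounds-1:
 *       round_range(n_local, chunk, r, &b, &e);
 *       for each local element in [b, e): MatSetValues(A, ..., ADD_VALUES);
 *       MatAssemblyBegin(A, MAT_FLUSH_ASSEMBLY);
 *       MatAssemblyEnd(A, MAT_FLUSH_ASSEMBLY);
 *   MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);
 *   MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);
 */
```

A smaller chunk keeps each stash (and each message) smaller at the cost of more communication rounds; a suitable value would have to be found by experiment.<br>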
<br>
<br>
> [0] VecAssemblyBegin_MPI(): Stash has 210720 entries, uses 12 mallocs.<br>
> [0] VecAssemblyBegin_MPI(): Block-Stash has 0 entries, uses 0 mallocs.<br>
> [5] MatAssemblyBegin_MPIAIJ(): Stash has 4806656 entries, uses 8 mallocs.<br>
> [6] MatAssemblyBegin_MPIAIJ(): Stash has 5727744 entries, uses 9 mallocs.<br>
> [4] MatAssemblyBegin_MPIAIJ(): Stash has 5964288 entries, uses 9 mallocs.<br>
> [7] MatAssemblyBegin_MPIAIJ(): Stash has 7408128 entries, uses 9 mallocs.<br>
> [3] MatAssemblyBegin_MPIAIJ(): Stash has 8123904 entries, uses 9 mallocs.<br>
> [2] MatAssemblyBegin_MPIAIJ(): Stash has 11544576 entries, uses 10 mallocs.<br>
> [0] MatStashScatterBegin_Private(): No of messages: 1<br>
> [0] MatStashScatterBegin_Private(): Mesg_to: 1: size: 107888648<br>
> [0] MatAssemblyBegin_MPIAIJ(): Stash has 13486080 entries, uses 10 mallocs.<br>
> [1] MatAssemblyBegin_MPIAIJ(): Stash has 16386048 entries, uses 10 mallocs.<br>
> [7] MatAssemblyEnd_SeqAIJ(): Matrix size: 11390 X 11390; storage space: 0<br>
> unneeded,2514194 used<br>
> [7] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0<br>
> [7] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 294<br>
> [7] Mat_CheckInode(): Found 11390 nodes out of 11390 rows. Not using Inode<br>
> routines<br>
> [7] PetscCommDuplicate(): Using internal PETSc communicator 47582902893600<br>
> 339106512<br>
> [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 11391 X 11391; storage space: 0<br>
> unneeded,2514537 used<br>
> [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0<br>
> [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 294<br>
> [0] Mat_CheckInode(): Found 11391 nodes out of 11391 rows. Not using Inode<br>
> routines<br>
> [0] PetscCommDuplicate(): Using internal PETSc communicator 46968795675680<br>
> 536030192<br>
> [0] MatSetUpMultiply_MPIAIJ(): Using block index set to define scatter<br>
> [6] MatAssemblyEnd_SeqAIJ(): Matrix size: 11390 X 11390; storage space: 0<br>
> unneeded,2499938 used<br>
> [6] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0<br>
> [6] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 294<br>
> [6] Mat_CheckInode(): Found 11390 nodes out of 11390 rows. Not using Inode<br>
> routines<br>
> [6] PetscCommDuplicate(): Using internal PETSc communicator 47399146302496<br>
> 509504096<br>
> [5] MatAssemblyEnd_SeqAIJ(): Matrix size: 11390 X 11390; storage space: 0<br>
> unneeded,2525390 used<br>
> [5] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0<br>
> [5] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 294<br>
> [5] Mat_CheckInode(): Found 11390 nodes out of 11390 rows. Not using Inode<br>
> routines<br>
> [5] PetscCommDuplicate(): Using internal PETSc communicator 47033309994016<br>
> 520223440<br>
> [1] MatAssemblyEnd_SeqAIJ(): Matrix size: 11391 X 11391; storage space: 0<br>
> unneeded,2500281 used<br>
> [1] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0<br>
> [1] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 294<br>
> [1] Mat_CheckInode(): Found 11391 nodes out of 11391 rows. Not using Inode<br>
> routines<br>
> [1] PetscCommDuplicate(): Using internal PETSc communicator 47149241441312<br>
> 163068544<br>
> [2] MatAssemblyEnd_SeqAIJ(): Matrix size: 11391 X 11391; storage space: 0<br>
> unneeded,2525733 used<br>
> [2] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0<br>
> [2] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 294<br>
> [2] Mat_CheckInode(): Found 11391 nodes out of 11391 rows. Not using Inode<br>
> routines<br>
> [2] PetscCommDuplicate(): Using internal PETSc communicator 47674980494368<br>
> 119371056<br>
><br>
><br>
><br>
> ><br>
>> > Since my code never finishes, I cannot get the summary file by adding<br>
>> -log_summary. Is there any other way to get a summary file?<br>
>><br>
><br>
> My guess is that you are running a larger problem on this system and<br>
>> your preallocation for the matrix is wrong, while in the small run you sent<br>
>> the preallocation is correct.<br>
>><br>
>> Usually the only thing that causes it to take forever is not the<br>
>> parallel communication but the preallocation. After you create the<br>
>> matrix and set its preallocation, call<br>
>> MatSetOption(mat, MAT_NEW_NONZERO_ALLOCATION_ERR, PETSC_TRUE); then run. It<br>
>> will stop with an error message if the preallocation is wrong.<br>
>><br>
>> Barry<br>
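<br>
For an MPIAIJ matrix, preallocation is given per row as two counts: nonzeros whose column falls in the diagonal block (columns owned by this rank) and nonzeros in the off-diagonal block (all other columns). A minimal sketch of that counting is below. The helper count_diag_offdiag and its CSR-style inputs are hypothetical, it uses plain int where PETSc code would use PetscInt, and it assumes each row lists every global column index exactly once.<br>

```c
#include <assert.h>

/* Hypothetical helper (not a PETSc routine).
 * Rows are the locally owned rows, stored CSR-style: row i has global
 * column indices cols[row_ptr[i]] .. cols[row_ptr[i+1]-1], each listed once.
 * Columns in [col_start, col_end) belong to the diagonal block; all others
 * belong to the off-diagonal block. */
void count_diag_offdiag(int n_local_rows, const int *row_ptr, const int *cols,
                        int col_start, int col_end, int *d_nnz, int *o_nnz)
{
    assert(col_start <= col_end);
    for (int i = 0; i < n_local_rows; ++i) {
        d_nnz[i] = 0;
        o_nnz[i] = 0;
        for (int k = row_ptr[i]; k < row_ptr[i + 1]; ++k) {
            if (cols[k] >= col_start && cols[k] < col_end)
                ++d_nnz[i];   /* column owned by this rank */
            else
                ++o_nnz[i];   /* column owned by another rank */
        }
    }
}
```

Arrays like d_nnz and o_nnz are the kind of input MatMPIAIJSetPreallocation expects. If the same column appears more than once in a row, the counts overestimate, which wastes memory but does not trigger new allocations during MatSetValues().<br>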
>><br>
>><br>
>><br>
>> ><br>
>> > BTW, my code runs without any problem on a shared-memory desktop<br>
>> with any number of processes.<br>
>> ><br>
>><br>
></div>
</blockquote></div><br><br clear="all"><div><br></div>-- <br>What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.<br>
-- Norbert Wiener<br>