[petsc-users] generate entries on 'wrong' process

Wen Jiang jiangwen84 at gmail.com
Fri Jan 20 10:21:59 CST 2012


Hi, Matt

Could you give me more details on how to get a stack trace there? I know
little about it. The job is submitted on the head node and runs on the
compute nodes.

Thanks.

On Fri, Jan 20, 2012 at 9:44 AM, Wen Jiang <jiangwen84 at gmail.com> wrote:

> Hi Barry,
>
> Thanks for your suggestion. I just added
> MatSetOption(mat, MAT_NEW_NONZERO_ALLOCATION_ERR,PETSC_TRUE) to my code,
> but I did not get any error message about bad allocation, and my code is
> stuck there. I attached the output file below. Thanks.
>

Run with -start_in_debugger and get a stack trace. Note that your stashes
are enormous. You might consider calling
MatAssemblyBegin/End(A, MAT_FLUSH_ASSEMBLY) periodically during assembly.

  Matt
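
To illustrate why periodic flushing helps, here is a toy counting model in
plain C. It is not PETSc code and does not reflect PETSc internals. The
function name peak_stash and its parameters are made up for this sketch,
and it assumes every element contributes the same number of off-process
entries. It shows only that flushing every few elements bounds the number
of entries waiting to be sent.

```c
#include <stddef.h>

/*
 * Toy model (hypothetical, not PETSc code): each element adds `per_elem`
 * off-process entries to a stash. If flush_every > 0, the stash is emptied
 * after every `flush_every` elements; in a real PETSc code this is where
 * MatAssemblyBegin(A, MAT_FLUSH_ASSEMBLY) and
 * MatAssemblyEnd(A, MAT_FLUSH_ASSEMBLY) would be called.
 * Returns the largest stash size reached during the loop.
 */
size_t peak_stash(size_t n_elems, size_t per_elem, size_t flush_every)
{
    size_t stash = 0, peak = 0;
    for (size_t e = 0; e < n_elems; e++) {
        stash += per_elem;                 /* entries owned by other ranks */
        if (stash > peak) peak = stash;
        if (flush_every > 0 && (e + 1) % flush_every == 0)
            stash = 0;                     /* entries sent to their owners */
    }
    return peak;
}
```

With 1000 elements and 64 off-process entries each, the model peaks at
64000 entries without flushing and at 6400 when flushing every 100
elements. In real code, keep in mind that MatAssemblyBegin/End are
collective, so every process has to call the flush the same number of
times, and the final call must still use MAT_FINAL_ASSEMBLY.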


> [0] VecAssemblyBegin_MPI(): Stash has 210720 entries, uses 12 mallocs.
> [0] VecAssemblyBegin_MPI(): Block-Stash has 0 entries, uses 0 mallocs.
> [5] MatAssemblyBegin_MPIAIJ(): Stash has 4806656 entries, uses 8 mallocs.
> [6] MatAssemblyBegin_MPIAIJ(): Stash has 5727744 entries, uses 9 mallocs.
> [4] MatAssemblyBegin_MPIAIJ(): Stash has 5964288 entries, uses 9 mallocs.
> [7] MatAssemblyBegin_MPIAIJ(): Stash has 7408128 entries, uses 9 mallocs.
> [3] MatAssemblyBegin_MPIAIJ(): Stash has 8123904 entries, uses 9 mallocs.
> [2] MatAssemblyBegin_MPIAIJ(): Stash has 11544576 entries, uses 10
> mallocs.
> [0] MatStashScatterBegin_Private(): No of messages: 1
> [0] MatStashScatterBegin_Private(): Mesg_to: 1: size: 107888648
> [0] MatAssemblyBegin_MPIAIJ(): Stash has 13486080 entries, uses 10
> mallocs.
> [1] MatAssemblyBegin_MPIAIJ(): Stash has 16386048 entries, uses 10
> mallocs.
> [7] MatAssemblyEnd_SeqAIJ(): Matrix size: 11390 X 11390; storage space: 0
> unneeded,2514194 used
> [7] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
> [7] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 294
> [7] Mat_CheckInode(): Found 11390 nodes out of 11390 rows. Not using Inode
> routines
> [7] PetscCommDuplicate(): Using internal PETSc communicator 47582902893600
> 339106512
> [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 11391 X 11391; storage space: 0
> unneeded,2514537 used
> [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
> [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 294
> [0] Mat_CheckInode(): Found 11391 nodes out of 11391 rows. Not using Inode
> routines
> [0] PetscCommDuplicate(): Using internal PETSc communicator 46968795675680
> 536030192
> [0] MatSetUpMultiply_MPIAIJ(): Using block index set to define scatter
> [6] MatAssemblyEnd_SeqAIJ(): Matrix size: 11390 X 11390; storage space: 0
> unneeded,2499938 used
> [6] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
> [6] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 294
> [6] Mat_CheckInode(): Found 11390 nodes out of 11390 rows. Not using Inode
> routines
> [6] PetscCommDuplicate(): Using internal PETSc communicator 47399146302496
> 509504096
> [5] MatAssemblyEnd_SeqAIJ(): Matrix size: 11390 X 11390; storage space: 0
> unneeded,2525390 used
> [5] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
> [5] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 294
> [5] Mat_CheckInode(): Found 11390 nodes out of 11390 rows. Not using Inode
> routines
> [5] PetscCommDuplicate(): Using internal PETSc communicator 47033309994016
> 520223440
> [1] MatAssemblyEnd_SeqAIJ(): Matrix size: 11391 X 11391; storage space: 0
> unneeded,2500281 used
> [1] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
> [1] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 294
> [1] Mat_CheckInode(): Found 11391 nodes out of 11391 rows. Not using Inode
> routines
> [1] PetscCommDuplicate(): Using internal PETSc communicator 47149241441312
> 163068544
> [2] MatAssemblyEnd_SeqAIJ(): Matrix size: 11391 X 11391; storage space: 0
> unneeded,2525733 used
> [2] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
> [2] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 294
> [2] Mat_CheckInode(): Found 11391 nodes out of 11391 rows. Not using Inode
> routines
> [2] PetscCommDuplicate(): Using internal PETSc communicator 47674980494368
> 119371056
>
>
>
> >
>> > Since my code never finishes, I cannot get the summary file by adding
>> -log_summary. Is there any other way to get a summary file?
>>
>
>>   My guess is that you are running a larger problem on this system and
>> your preallocation for the matrix is wrong, while in the small run you
>> sent the preallocation is correct.
>>
>>   Usually the thing that causes it to take forever is not the parallel
>> communication but the preallocation. After you create the matrix and set
>> its preallocation, call
>> MatSetOption(mat, MAT_NEW_NONZERO_ALLOCATION_ERR,PETSC_TRUE); then run.
>> It will stop with an error message if the preallocation is wrong.
>>
>>   Barry
>>
>>
>>
>> >
>> > BTW, my code runs without any problem on a shared-memory desktop
>> with any number of processes.
>> >
>>
>
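
As a companion to Barry's preallocation advice quoted above, here is a
minimal sketch in plain C of one way to count the per-row nonzeros that
preallocation needs. The Entry type and the count_nnz function are
hypothetical names for this example. It assumes the caller already has a
list of every (row, col) coupling for the rows it owns, in global
numbering, including couplings generated on other processes. It also uses
int for indices, whereas PETSc uses PetscInt.

```c
#include <assert.h>
#include <stdlib.h>

/* One (row, col) coupling in global numbering (hypothetical helper type). */
typedef struct { int row, col; } Entry;

/* Order entries by row, then by column, so duplicates become adjacent. */
static int cmp_entry(const void *a, const void *b)
{
    const Entry *x = (const Entry *)a, *y = (const Entry *)b;
    if (x->row != y->row) return (x->row > y->row) - (x->row < y->row);
    return (x->col > y->col) - (x->col < y->col);
}

/*
 * Count distinct nonzeros per locally owned row, split into the diagonal
 * block (columns in [rstart, rend)) and the off-diagonal block (all other
 * columns). `entries` is sorted in place; rows outside [rstart, rend) are
 * ignored. d_nnz and o_nnz must each hold rend - rstart ints.
 */
void count_nnz(Entry *entries, size_t n, int rstart, int rend,
               int *d_nnz, int *o_nnz)
{
    assert(rstart <= rend);
    for (int i = 0; i < rend - rstart; i++) d_nnz[i] = o_nnz[i] = 0;

    qsort(entries, n, sizeof(Entry), cmp_entry);
    for (size_t k = 0; k < n; k++) {
        /* Repeated contributions to the same (row, col) share one slot. */
        if (k > 0 && cmp_entry(&entries[k - 1], &entries[k]) == 0) continue;
        int r = entries[k].row, c = entries[k].col;
        if (r < rstart || r >= rend) continue;
        if (c >= rstart && c < rend) d_nnz[r - rstart]++;
        else                         o_nnz[r - rstart]++;
    }
}
```

The resulting arrays are the kind of per-row counts that
MatMPIAIJSetPreallocation(A, 0, d_nnz, 0, o_nnz) expects. Setting
MAT_NEW_NONZERO_ALLOCATION_ERR afterwards, as suggested above, would then
flag any row where the counts turned out too small.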