[petsc-users] generate entries on 'wrong' process

Matthew Knepley knepley at gmail.com
Fri Jan 20 10:25:00 CST 2012


On Fri, Jan 20, 2012 at 10:21 AM, Wen Jiang <jiangwen84 at gmail.com> wrote:

> Hi, Matt
>
> Could you tell me some more details about how to get a stack trace there?
> I know little about it. The job is submitted on the head node and runs on
> the compute nodes.
>

1) Always run serial problems until you understand what is happening

2) Run with -start_in_debugger, and type 'cont' in the debugger (read about
gdb)

3) When it stalls, Ctrl-C and then type 'where'

  Matt
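
On the stash sizes in the log quoted below: entries passed to MatSetValues() for rows owned by another process are held in a stash and sent to the owner during assembly, so a stash of millions of entries means most rows are being generated on the 'wrong' process. The following is only a plain-C sketch of that bookkeeping, not PETSc code. The helper names owned_range and count_off_process are hypothetical, and it assumes the rows are split the way PETSc splits them when local sizes are left as PETSC_DECIDE (the first N % size ranks get one extra row). Your code may use a different layout.

```c
#include <assert.h>

/* Half-open row range [start, end) owned by `rank` when N rows are split as
   evenly as possible over `size` ranks, the first N % size ranks getting one
   extra row. Assumption: this mirrors the default PETSC_DECIDE split. */
static void owned_range(long N, int size, int rank, long *start, long *end)
{
    assert(size > 0 && rank >= 0 && rank < size);
    long base  = N / size;   /* rows every rank gets */
    long extra = N % size;   /* number of ranks that get one more row */
    *start = rank * base + (rank < extra ? rank : extra);
    *end   = *start + base + (rank < extra ? 1 : 0);
}

/* Count how many of the n global row indices in `rows` fall outside
   [start, end). Each such entry would be stashed and communicated. */
static long count_off_process(const long *rows, long n, long start, long end)
{
    long off = 0;
    for (long i = 0; i < n; i++) {
        if (rows[i] < start || rows[i] >= end) {
            off++;
        }
    }
    return off;
}
```

As an illustration, a hypothetical global size of N = 91123 on 8 ranks gives 11391 rows to ranks 0 to 2 and 11390 rows to the rest, which is consistent with the local matrix sizes printed for the ranks shown in the log. Running count_off_process over the rows each rank actually touches in its element loop would show how much of the work lands in the stash.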


> Thanks.
>
> On Fri, Jan 20, 2012 at 9:44 AM, Wen Jiang <jiangwen84 at gmail.com> wrote:
>
> > Hi Barry,
> >
> > Thanks for your suggestion. I just added
> > MatSetOption(mat, MAT_NEW_NONZERO_ALLOCATION_ERR, PETSC_TRUE) to my code,
> > but I did not get any error information regarding bad allocation, and my
> > code is stuck there. I attached the output file below. Thanks.
> >
>
> Run with -start_in_debugger and get a stack trace. Note that your stashes
> are enormous. You might consider calling
> MatAssemblyBegin/End(A, MAT_FLUSH_ASSEMBLY) periodically during assembly.
>
>   Matt
>
>
> > [0] VecAssemblyBegin_MPI(): Stash has 210720 entries, uses 12 mallocs.
> > [0] VecAssemblyBegin_MPI(): Block-Stash has 0 entries, uses 0 mallocs.
> > [5] MatAssemblyBegin_MPIAIJ(): Stash has 4806656 entries, uses 8 mallocs.
> > [6] MatAssemblyBegin_MPIAIJ(): Stash has 5727744 entries, uses 9 mallocs.
> > [4] MatAssemblyBegin_MPIAIJ(): Stash has 5964288 entries, uses 9 mallocs.
> > [7] MatAssemblyBegin_MPIAIJ(): Stash has 7408128 entries, uses 9 mallocs.
> > [3] MatAssemblyBegin_MPIAIJ(): Stash has 8123904 entries, uses 9 mallocs.
> > [2] MatAssemblyBegin_MPIAIJ(): Stash has 11544576 entries, uses 10 mallocs.
> > [0] MatStashScatterBegin_Private(): No of messages: 1
> > [0] MatStashScatterBegin_Private(): Mesg_to: 1: size: 107888648
> > [0] MatAssemblyBegin_MPIAIJ(): Stash has 13486080 entries, uses 10 mallocs.
> > [1] MatAssemblyBegin_MPIAIJ(): Stash has 16386048 entries, uses 10 mallocs.
> > [7] MatAssemblyEnd_SeqAIJ(): Matrix size: 11390 X 11390; storage space: 0 unneeded,2514194 used
> > [7] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
> > [7] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 294
> > [7] Mat_CheckInode(): Found 11390 nodes out of 11390 rows. Not using Inode routines
> > [7] PetscCommDuplicate(): Using internal PETSc communicator 47582902893600 339106512
> > [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 11391 X 11391; storage space: 0 unneeded,2514537 used
> > [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
> > [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 294
> > [0] Mat_CheckInode(): Found 11391 nodes out of 11391 rows. Not using Inode routines
> > [0] PetscCommDuplicate(): Using internal PETSc communicator 46968795675680 536030192
> > [0] MatSetUpMultiply_MPIAIJ(): Using block index set to define scatter
> > [6] MatAssemblyEnd_SeqAIJ(): Matrix size: 11390 X 11390; storage space: 0 unneeded,2499938 used
> > [6] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
> > [6] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 294
> > [6] Mat_CheckInode(): Found 11390 nodes out of 11390 rows. Not using Inode routines
> > [6] PetscCommDuplicate(): Using internal PETSc communicator 47399146302496 509504096
> > [5] MatAssemblyEnd_SeqAIJ(): Matrix size: 11390 X 11390; storage space: 0 unneeded,2525390 used
> > [5] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
> > [5] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 294
> > [5] Mat_CheckInode(): Found 11390 nodes out of 11390 rows. Not using Inode routines
> > [5] PetscCommDuplicate(): Using internal PETSc communicator 47033309994016 520223440
> > [1] MatAssemblyEnd_SeqAIJ(): Matrix size: 11391 X 11391; storage space: 0 unneeded,2500281 used
> > [1] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
> > [1] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 294
> > [1] Mat_CheckInode(): Found 11391 nodes out of 11391 rows. Not using Inode routines
> > [1] PetscCommDuplicate(): Using internal PETSc communicator 47149241441312 163068544
> > [2] MatAssemblyEnd_SeqAIJ(): Matrix size: 11391 X 11391; storage space: 0 unneeded,2525733 used
> > [2] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
> > [2] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 294
> > [2] Mat_CheckInode(): Found 11391 nodes out of 11391 rows. Not using Inode routines
> > [2] PetscCommDuplicate(): Using internal PETSc communicator 47674980494368 119371056
> >
> >
> >
> >> > Since my code never finishes, I cannot get the summary files by adding
> >> > -log_summary. Is there any other way to get a summary file?
> >>
> >>   My guess is that you are running a larger problem on this system and
> >> your preallocation for the matrix is wrong, while in the small run you
> >> sent the preallocation is correct.
> >>
> >>   Usually what makes it take forever is not the parallel communication
> >> but the preallocation. After you create the matrix and set its
> >> preallocation, call
> >> MatSetOption(mat, MAT_NEW_NONZERO_ALLOCATION_ERR, PETSC_TRUE); then run. It
> >> will stop with an error message if the preallocation is wrong.
> >>
> >>   Barry
> >>
> >>
> >>
> >> >
> >> > BTW, my code runs without any problem on a shared-memory desktop
> >> > with any number of processes.
> >> >
> >>
> >
>



-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener