[petsc-users] How to understand these error messages

Jed Brown jedbrown at mcs.anl.gov
Sun Jun 23 12:03:30 CDT 2013


Barry Smith <bsmith at mcs.anl.gov> writes:

>    What kind of computer system are you running? What MPI does it use? These values are nonsense: MPI_SOURCE=-32766 MPI_TAG=-32766 

From configure.log, this is Intel MPI.  Can you ask their support what
this error condition is supposed to mean?  It's not clear to me that
MPI_SOURCE or MPI_TAG contains any meaningful information (though the
values could indicate an internal overflow), but the value of
MPI_ERROR should mean something.
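
If the status fields are populated at all, MPI_Error_string() can turn
that code into text.  A minimal sketch, dropped next to the failing
MPI_Waitany() where recv_status is already in scope (as are <mpi.h> and
<stdio.h> in matstash.c):

    char errstr[MPI_MAX_ERROR_STRING];
    int  errlen;
    /* Translate the implementation-specific error code into a message. */
    MPI_Error_string(recv_status.MPI_ERROR, errstr, &errlen);
    fprintf(stderr, "MPI_ERROR=%d: %s\n", recv_status.MPI_ERROR, errstr);

Keep in mind the standard only guarantees the MPI_ERROR field of a
status for the multiple-completion calls (MPI_Waitall() and friends);
for MPI_Waitany() the error normally comes back in the return code, so
the field may well be garbage here.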

>     Is it possible to run the code with valgrind?  
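
(For reference, the usual pattern for running a PETSc application under
valgrind, per the PETSc FAQ, is something like

    mpiexec -n 8 valgrind --tool=memcheck -q --num-callers=20 \
            --log-file=valgrind.log.%p ./linearElasticity -malloc off

where %p expands to each process id so every rank gets its own log.
At 10240 cores this is only practical on a much smaller reproduction.)
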
>
>     Any chance of running the code with a different compiler?
>
>    Barry
>
>
>
> On Jun 23, 2013, at 4:12 AM, Fande Kong <fd.kong at siat.ac.cn> wrote:
>
>> Thanks Jed,
>> 
>> I added your code into the petsc. I run my code with 10240 cores. I got the following error messages:
>> 
>> [6724]PETSC ERROR: --------------------- Error Message ------------------------------------
>> [6724]PETSC ERROR: Petsc has generated inconsistent data!
>> [6724]PETSC ERROR: Negative MPI source: stash->nrecvs=8 i=11 MPI_SOURCE=-32766 MPI_TAG=-32766 MPI_ERROR=20613892!
>> [6724]PETSC ERROR: ------------------------------------------------------------------------
>> [6724]PETSC ERROR: Petsc Release Version 3.4.1, unknown 
>> [6724]PETSC ERROR: See docs/changes/index.html for recent updates.
>> [6724]PETSC ERROR: See docs/faq.html for hints about trouble shooting.
>> [6724]PETSC ERROR: See docs/index.html for manual pages.
>> [6724]PETSC ERROR: ------------------------------------------------------------------------
>> [6724]PETSC ERROR: ./linearElasticity on a arch-linux2-cxx-debug named ys4350 by fandek Sun Jun 23 02:58:23 2013
>> [6724]PETSC ERROR: Libraries linked from /glade/p/work/fandek/petsc/arch-linux2-cxx-debug/lib
>> [6724]PETSC ERROR: Configure run at Sun Jun 23 00:46:05 2013
>> [6724]PETSC ERROR: Configure options --with-valgrind=1 --with-clanguage=cxx --with-shared-libraries=1 --with-dynamic-loading=1 --download-f-blas-lapack=1 --with-mpi=1 --download-parmetis=1 --download-metis=1 --with-64-bit-indices=1 --download-netcdf=1 --download-exodusii=1 --download-ptscotch=1 --download-hdf5=1 --with-debugging=yes
>> [6724]PETSC ERROR: ------------------------------------------------------------------------
>> [6724]PETSC ERROR: MatStashScatterGetMesg_Private() line 633 in /src/mat/utils/matstash.c
>> [6724]PETSC ERROR: MatAssemblyEnd_MPIAIJ() line 676 in /src/mat/impls/aij/mpi/mpiaij.c
>> [6724]PETSC ERROR: MatAssemblyEnd() line 4939 in /src/mat/interface/matrix.c
>> [6724]PETSC ERROR: SpmcsDMMeshCreatVertexMatrix() line 65 in meshreorder.cpp
>> [6724]PETSC ERROR: SpmcsDMMeshReOrderingMeshPoints() line 125 in meshreorder.cpp
>> [6724]PETSC ERROR: CreateProblem() line 59 in preProcessSetUp.cpp
>> [6724]PETSC ERROR: DMmeshInitialize() line 78 in mgInitialize.cpp
>> [6724]PETSC ERROR: main() line 71 in linearElasticity3d.cpp
>> Abort(77) on node 6724 (rank 6724 in comm 1140850688): application called MPI_Abort(MPI_COMM_WORLD, 77) - process 6724
>> [2921]PETSC ERROR: --------------------- Error Message ------------------------------------
>> [2921]PETSC ERROR: Petsc has generated inconsistent data!
>> [2921]PETSC ERROR: Negative MPI source: stash->nrecvs=15 i=3 MPI_SOURCE=-32766 MPI_TAG=-32766 MPI_ERROR=3825270!
>> [2921]PETSC ERROR: ------------------------------------------------------------------------
>> [2921]PETSC ERROR: Petsc Release Version 3.4.1, unknown 
>> [2921]PETSC ERROR: See docs/changes/index.html for recent updates.
>> [2921]PETSC ERROR: See docs/faq.html for hints about trouble shooting.
>> [2921]PETSC ERROR: See docs/index.html for manual pages.
>> [2921]PETSC ERROR: ------------------------------------------------------------------------
>> [2921]PETSC ERROR: ./linearElasticity on a arch-linux2-cxx-debug named ys0270 by fandek Sun Jun 23 02:58:23 2013
>> [2921]PETSC ERROR: Libraries linked from /glade/p/work/fandek/petsc/arch-linux2-cxx-debug/lib
>> [2921]PETSC ERROR: Configure run at Sun Jun 23 00:46:05 2013
>> [2921]PETSC ERROR: Configure options --with-valgrind=1 --with-clanguage=cxx --with-shared-libraries=1 --with-dynamic-loading=1 --download-f-blas-lapack=1 --with-mpi=1 --download-parmetis=1 --download-metis=1 --with-64-bit-indices=1 --download-netcdf=1 --download-exodusii=1 --download-ptscotch=1 --download-hdf5=1 --with-debugging=yes
>> [2921]PETSC ERROR: ------------------------------------------------------------------------
>> [2921]PETSC ERROR: MatStashScatterGetMesg_Private() line 633 in /src/mat/utils/matstash.c
>> [2921]PETSC ERROR: MatAssemblyEnd_MPIAIJ() line 676 in /src/mat/impls/aij/mpi/mpiaij.c
>> [2921]PETSC ERROR: MatAssemblyEnd() line 4939 in /src/mat/interface/matrix.c
>> [2921]PETSC ERROR: SpmcsDMMeshCreatVertexMatrix() line 65 in meshreorder.cpp
>> [2921]PETSC ERROR: SpmcsDMMeshReOrderingMeshPoints() line 125 in meshreorder.cpp
>> [2921]PETSC ERROR: CreateProblem() line 59 in preProcessSetUp.cpp
>> [2921]PETSC ERROR: DMmeshInitialize() line 78 in mgInitialize.cpp
>> [2921]PETSC ERROR: main() line 71 in linearElasticity3d.cpp
>> :
>> 
>> On Fri, Jun 21, 2013 at 4:33 AM, Jed Brown <jedbrown at mcs.anl.gov> wrote:
>> Fande Kong <fd.kong at siat.ac.cn> writes:
>> 
>> > The code works well with fewer cores, and it also works well with
>> > petsc-3.3-p7, but it does not work with petsc-3.4.1. If you can check
>> > the differences between petsc-3.3-p7 and petsc-3.4.1, you may be able
>> > to figure out the reason.
>> 
>> That is one way to start debugging, but there are no changes to the core
>> MatStash code, and many, many changes to PETSc in total.  The relevant
>> snippet of code is here:
>> 
>>     if (stash->reproduce) {
>>       i    = stash->reproduce_count++;
>>       ierr = MPI_Wait(stash->recv_waits+i,&recv_status);CHKERRQ(ierr);
>>     } else {
>>       ierr = MPI_Waitany(2*stash->nrecvs,stash->recv_waits,&i,&recv_status);CHKERRQ(ierr);
>>     }
>>     if (recv_status.MPI_SOURCE < 0) SETERRQ(PETSC_COMM_SELF,PETSC_ERR_PLIB,"Negative MPI source!");
>> 
>> So MPI returns correctly (stash->reproduce will be FALSE unless you
>> changed it).  You could change the line above to the following:
>> 
>>   if (recv_status.MPI_SOURCE < 0) SETERRQ5(PETSC_COMM_SELF,PETSC_ERR_PLIB,"Negative MPI source: stash->nrecvs=%D i=%d MPI_SOURCE=%d MPI_TAG=%d MPI_ERROR=%d",
>>                                           stash->nrecvs,i,recv_status.MPI_SOURCE,recv_status.MPI_TAG,recv_status.MPI_ERROR);
>> 
>> 
>> It would help to debug with --with-debugging=1, so that more checks for
>> corrupt data are performed.  You can still have the compiler optimize if
>> it takes a long time to reach the error condition.
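
(A sketch of that combination, assuming PETSc's usual configure
variables: keep the debugging checks but hand optimization flags to the
compilers explicitly,

    ./configure --with-debugging=1 COPTFLAGS='-g -O2' \
                CXXOPTFLAGS='-g -O2' FOPTFLAGS='-g -O2'

where the exact optimization flags depend on your compiler.)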
>> 
>> 
>> 
>> -- 
>> Fande Kong
>> ShenZhen Institutes of Advanced Technology
>> Chinese Academy of Sciences