[petsc-users] How to understand these error messages

Peter Lichtner peter.lichtner at gmail.com
Mon Jun 24 08:33:53 CDT 2013


Just in case this helps: I use Yellowstone to run PFLOTRAN with both the gcc and Intel compilers, using the developer version of PETSc. My configure script for Intel reads:

./config/configure.py --with-cc=mpicc --with-fc=mpif90 --with-cxx=mpicxx --with-clanguage=c --with-blas-lapack-dir=$BLAS_LAPACK_LIB_DIR --with-shared-libraries=0 --with-debugging=0 --download-hdf5=yes --download-parmetis=yes --download-metis=yes

echo $BLAS_LAPACK_LIB_DIR
/ncar/opt/intel/12.1.0.233/composer_xe_2013.1.117/mkl

module load cmake/2.8.10.2

Intel was a little faster than gcc.

...Peter

On Jun 24, 2013, at 1:53 AM, Fande Kong <fd.kong at siat.ac.cn> wrote:

> Hi Barry,
> 
> I switched to the GNU compiler and got similar results:
> 
> 
> [330]PETSC ERROR: --------------------- Error Message ------------------------------------
> [330]PETSC ERROR: Petsc has generated inconsistent data!
> [330]PETSC ERROR: Negative MPI source: stash->nrecvs=27 i=33 MPI_SOURCE=-32766 MPI_TAG=-32766 MPI_ERROR=5243744!
> [330]PETSC ERROR: ------------------------------------------------------------------------
> [330]PETSC ERROR: Petsc Release Version 3.4.1, unknown 
> [330]PETSC ERROR: See docs/changes/index.html for recent updates.
> [330]PETSC ERROR: See docs/faq.html for hints about trouble shooting.
> [330]PETSC ERROR: See docs/index.html for manual pages.
> [330]PETSC ERROR: ------------------------------------------------------------------------
> [330]PETSC ERROR: ./linearElasticity on a arch-linux2-cxx-opt_gnu named ys0554 by fandek Mon Jun 24 01:42:37 2013
> [330]PETSC ERROR: Libraries linked from /glade/p/work/fandek/petsc/arch-linux2-cxx-opt_gnu/lib
> [330]PETSC ERROR: Configure run at Mon Jun 24 00:34:40 2013
> [330]PETSC ERROR: Configure options --with-valgrind=1 --with-clanguage=cxx --with-shared-libraries=1 --with-dynamic-loading=1 --download-f-blas-lapack=1 --with-mpi=1 --download-parmetis=1 --download-metis=1 --with-64-bit-indices=1 --download-netcdf=1 --download-exodusii=1 --download-ptscotch=1 --download-hdf5=1 --with-debugging=no
> [330]PETSC ERROR: ------------------------------------------------------------------------
> [330]PETSC ERROR: MatStashScatterGetMesg_Private() line 633 in /src/mat/utils/matstash.c
> [330]PETSC ERROR: MatAssemblyEnd_MPIAIJ() line 676 in /src/mat/impls/aij/mpi/mpiaij.c
> [330]PETSC ERROR: MatAssemblyEnd() line 4939 in /src/mat/interface/matrix.c
> [330]PETSC ERROR: SpmcsDMMeshCreatVertexMatrix() line 65 in meshreorder.cpp
> [330]PETSC ERROR: SpmcsDMMeshReOrderingMeshPoints() line 125 in meshreorder.cpp
> [330]PETSC ERROR: CreateProblem() line 59 in preProcessSetUp.cpp
> [330]PETSC ERROR: DMmeshInitialize() line 78 in mgInitialize.cpp
> [330]PETSC ERROR: main() line 71 in linearElasticity3d.cpp
> 
> 
> 
> Thus, I think that it has nothing to do with the compiler.
> 
> 
> On Sun, Jun 23, 2013 at 11:45 PM, Fande Kong <fd.kong at siat.ac.cn> wrote:
> Thanks Barry,
> 
> I will try impi. 
> 
> I have another question: in your previous email, you asked whether I could switch to another compiler. Why do I need to change the compiler?
> 
> 
> On Mon, Jun 24, 2013 at 12:27 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
> 
>    Fande,
> 
>    We've seen trouble before with IBM MPI on large Intel systems at scale.
> 
>     From the previous configure.log you sent I see
> 
> sh: mpicc -show
> Executing: mpicc -show
> sh: /ncar/opt/intel/12.1.0.233/composer_xe_2011_sp1.11.339/bin/intel64/icc   -I/glade/apps/el6/include  -I/glade/apps/el6/usr/include  -I/glade/apps/opt/netcdf/4.2/intel/default/include  -Wl,-rpath,/ncar/opt/intel/12.1.0.233/composer_xe_2011_sp1.11.339/compiler/lib/intel64  -Wl,-rpath,/ncar/opt/intel/12.1.0.233/composer_xe_2011_sp1.11.339/compiler/lib/ia32  -L/glade/apps/el6/usr/lib  -L/glade/apps/el6/usr/lib64  -Wl,-rpath,/glade/apps/el6/usr/lib  -Wl,-rpath,/glade/apps/el6/usr/lib64  -L/glade/apps/opt/netcdf/4.2/intel/default/lib  -lnetcdf_c++4  -lnetcdff  -lnetcdf  -Wl,-rpath,/glade/apps/opt/netcdf/4.2/intel/default/lib  -m64 -D__64BIT__ -Wl,--allow-shlib-undefined -Wl,--enable-new-dtags -Wl,-rpath,/opt/ibmhpc/pe1209/mpich2/intel/lib64 -Wl,-rpath,/ncar/opt/intel/12.1.0.233/composer_xe_2011_sp1.11.339/compiler/lib/intel64 -I/opt/ibmhpc/pe1209/mpich2/intel/include64 -I/opt/ibmhpc/pe1209/base/include -L/opt/ibmhpc/pe1209/mpich2/intel/lib64 -lmpi -ldl -L/ncar/opt/intel/12.1.0.233/composer_xe_2011_sp1.11.339/compiler/lib/intel64 -lirc -lpthread -lrt
> 
>   Note the -I/opt/ibmhpc/pe1209/base/include -L/opt/ibmhpc/pe1209/mpich2/intel/lib64 -lmpi   which is probably some IBM hack job of some ancient mpich2
> 
>   Now the page http://www2.cisl.ucar.edu/resources/yellowstone/software/modules-intel-dependent  has the modules
> 
> 
> impi/4.0.3.008  This module loads the Intel MPI Library. See http://software.intel.com/en-us/intel-mpi-library/ for details.
> impi/4.1.0.030  This module loads the Intel MPI Library. See http://software.intel.com/en-us/intel-mpi-library/ for details.
> 
>   Perhaps you could load those modules with the Intel compilers and avoid the IBM MPI? If that solves the problem, then we know the IBM MPI is to blame. We are interested in working with you to determine the problem.
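> 
>   To help isolate this, a minimal standalone sketch (not taken from PETSc; the all-to-all pattern, the tag, and the message size are arbitrary assumptions) that mimics the MatStash pattern of wildcard MPI_Irecv followed by MPI_Waitany and checks the returned status could be run at the same scale under both MPI stacks:
> 
> #include <mpi.h>
> #include <stdio.h>
> #include <stdlib.h>
> 
> int main(int argc, char **argv)
> {
>   int          rank, size, i, idx, completed = 0;
>   MPI_Request *rreq, *sreq;
>   MPI_Status   status;
>   double      *rbuf, sbuf = 1.0;
> 
>   MPI_Init(&argc, &argv);
>   MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>   MPI_Comm_size(MPI_COMM_WORLD, &size);
> 
>   rreq = (MPI_Request*) malloc(size * sizeof(MPI_Request));
>   sreq = (MPI_Request*) malloc(size * sizeof(MPI_Request));
>   rbuf = (double*) malloc(size * sizeof(double));
> 
>   /* Post one wildcard receive per expected message, similar to MatStash */
>   for (i = 0; i < size; i++)
>     MPI_Irecv(&rbuf[i], 1, MPI_DOUBLE, MPI_ANY_SOURCE, 0, MPI_COMM_WORLD, &rreq[i]);
>   /* Send one small message to every rank, including self */
>   for (i = 0; i < size; i++)
>     MPI_Isend(&sbuf, 1, MPI_DOUBLE, i, 0, MPI_COMM_WORLD, &sreq[i]);
> 
>   /* Drain the receives with MPI_Waitany and check each returned status */
>   while (completed < size) {
>     MPI_Waitany(size, rreq, &idx, &status);
>     if (status.MPI_SOURCE < 0)
>       printf("[%d] bad status: index=%d MPI_SOURCE=%d MPI_TAG=%d MPI_ERROR=%d\n",
>              rank, idx, status.MPI_SOURCE, status.MPI_TAG, status.MPI_ERROR);
>     completed++;
>   }
>   MPI_Waitall(size, sreq, MPI_STATUSES_IGNORE);
> 
>   free(rreq); free(sreq); free(rbuf);
>   MPI_Finalize();
>   return 0;
> }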
> 
>    Barry
> 
> 
> 
> 
> On Jun 23, 2013, at 9:14 PM, Fande Kong <fd.kong at siat.ac.cn> wrote:
> 
> > Thanks Barry,
> > Thanks Jed,
> >
> > The computer I am using is Yellowstone (see http://en.wikipedia.org/wiki/Yellowstone_(supercomputer) or http://www2.cisl.ucar.edu/resources/yellowstone). The compiler is the Intel compiler, and the MPI is IBM MPI, which is part of IBM PE.
> >
> > With fewer unknowns (about 5 \times 10^7), the code runs correctly. With more unknowns (4 \times 10^8), the code produces the error messages. But even with that many unknowns (4 \times 10^8), the code can still run on fewer cores. This is very strange.
> >
> > When I switch to the GNU compiler, I cannot install PETSc; I get the following errors:
> >
> > *******************************************************************************
> >          UNABLE to CONFIGURE with GIVEN OPTIONS    (see configure.log for details):
> > -------------------------------------------------------------------------------
> > Downloaded exodusii could not be used. Please check install in /glade/p/work/fandek/petsc/arch-linux2-cxx-opt_gnu
> > *******************************************************************************
> >   File "./config/configure.py", line 293, in petsc_configure
> >     framework.configure(out = sys.stdout)
> >   File "/glade/p/work/fandek/petsc/config/BuildSystem/config/framework.py", line 933, in configure
> >     child.configure()
> >   File "/glade/p/work/fandek/petsc/config/BuildSystem/config/package.py", line 556, in configure
> >     self.executeTest(self.configureLibrary)
> >   File "/glade/p/work/fandek/petsc/config/BuildSystem/config/base.py", line 115, in executeTest
> >     ret = apply(test, args,kargs)
> >   File "/glade/p/work/fandek/petsc/config/BuildSystem/config/packages/exodusii.py", line 36, in configureLibrary
> >     config.package.Package.configureLibrary(self)
> >   File "/glade/p/work/fandek/petsc/config/BuildSystem/config/package.py", line 484, in configureLibrary
> >     for location, directory, lib, incl in self.generateGuesses():
> >   File "/glade/p/work/fandek/petsc/config/BuildSystem/config/package.py", line 238, in generateGuesses
> >     raise RuntimeError('Downloaded '+self.package+' could not be used. Please check install in '+d+'\n')
> >
> >
> > The configure.log is attached.
> >
> > Regards,
> > On Mon, Jun 24, 2013 at 1:03 AM, Jed Brown <jedbrown at mcs.anl.gov> wrote:
> > Barry Smith <bsmith at mcs.anl.gov> writes:
> >
> > >    What kind of computer system are you running? What MPI does it use? These values are nonsense MPI_SOURCE=-32766 MPI_TAG=-32766
> >
> > From configure.log, this is Intel MPI.  Can you ask their support what
> > this error condition is supposed to mean?  It's not clear to me that
> > MPI_SOURCE or MPI_TAG contain any meaningful information (though it
> > could be indicative of an internal overflow), but this value of
> > MPI_ERROR should mean something.
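> > 
> > One way to check is to translate the stored code with the standard
> > MPI_Error_class and MPI_Error_string calls.  A minimal sketch (the
> > helper below is hypothetical; recv_status stands for the status
> > returned by the failing MPI_Waitany):
> > 
> > #include <mpi.h>
> > #include <stdio.h>
> > 
> > /* Print a readable description of the MPI_ERROR field of a status.
> >    Note: the standard only requires the multiple-completion calls
> >    (e.g. MPI_Waitall) to set MPI_ERROR, so after MPI_Waitany the field
> >    may simply be uninitialized; if the value is not a valid error
> >    code, these calls will themselves raise an error. */
> > static void report_status_error(const MPI_Status *recv_status)
> > {
> >   char errstr[MPI_MAX_ERROR_STRING];
> >   int  len, errclass;
> > 
> >   MPI_Error_class(recv_status->MPI_ERROR, &errclass);
> >   MPI_Error_string(recv_status->MPI_ERROR, errstr, &len);
> >   printf("MPI_SOURCE=%d MPI_TAG=%d MPI_ERROR=%d (class %d): %s\n",
> >          recv_status->MPI_SOURCE, recv_status->MPI_TAG,
> >          recv_status->MPI_ERROR, errclass, errstr);
> > }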
> >
> > >     Is it possible to run the code with valgrind?
> > >
> > >     Any chance of running the code with a different compiler?
> > >
> > >    Barry
> > >
> > >
> > >
> > > On Jun 23, 2013, at 4:12 AM, Fande Kong <fd.kong at siat.ac.cn> wrote:
> > >
> > >> Thanks Jed,
> > >>
> > >> I added your code to PETSc and ran my code on 10240 cores. I got the following error messages:
> > >>
> > >> [6724]PETSC ERROR: --------------------- Error Message ------------------------------------
> > >> [6724]PETSC ERROR: Petsc has generated inconsistent data!
> > >> [6724]PETSC ERROR: Negative MPI source: stash->nrecvs=8 i=11 MPI_SOURCE=-32766 MPI_TAG=-32766 MPI_ERROR=20613892!
> > >> [6724]PETSC ERROR: ------------------------------------------------------------------------
> > >> [6724]PETSC ERROR: Petsc Release Version 3.4.1, unknown
> > >> [6724]PETSC ERROR: See docs/changes/index.html for recent updates.
> > >> [6724]PETSC ERROR: See docs/faq.html for hints about trouble shooting.
> > >> [6724]PETSC ERROR: See docs/index.html for manual pages.
> > >> [6724]PETSC ERROR: ------------------------------------------------------------------------
> > >> [6724]PETSC ERROR: ./linearElasticity on a arch-linux2-cxx-debug named ys4350 by fandek Sun Jun 23 02:58:23 2013
> > >> [6724]PETSC ERROR: Libraries linked from /glade/p/work/fandek/petsc/arch-linux2-cxx-debug/lib
> > >> [6724]PETSC ERROR: Configure run at Sun Jun 23 00:46:05 2013
> > >> [6724]PETSC ERROR: Configure options --with-valgrind=1 --with-clanguage=cxx --with-shared-libraries=1 --with-dynamic-loading=1 --download-f-blas-lapack=1 --with-mpi=1 --download-parmetis=1 --download-metis=1 --with-64-bit-indices=1 --download-netcdf=1 --download-exodusii=1 --download-ptscotch=1 --download-hdf5=1 --with-debugging=yes
> > >> [6724]PETSC ERROR: ------------------------------------------------------------------------
> > >> [6724]PETSC ERROR: MatStashScatterGetMesg_Private() line 633 in /src/mat/utils/matstash.c
> > >> [6724]PETSC ERROR: MatAssemblyEnd_MPIAIJ() line 676 in /src/mat/impls/aij/mpi/mpiaij.c
> > >> [6724]PETSC ERROR: MatAssemblyEnd() line 4939 in /src/mat/interface/matrix.c
> > >> [6724]PETSC ERROR: SpmcsDMMeshCreatVertexMatrix() line 65 in meshreorder.cpp
> > >> [6724]PETSC ERROR: SpmcsDMMeshReOrderingMeshPoints() line 125 in meshreorder.cpp
> > >> [6724]PETSC ERROR: CreateProblem() line 59 in preProcessSetUp.cpp
> > >> [6724]PETSC ERROR: DMmeshInitialize() line 78 in mgInitialize.cpp
> > >> [6724]PETSC ERROR: main() line 71 in linearElasticity3d.cpp
> > >> Abort(77) on node 6724 (rank 6724 in comm 1140850688): application called MPI_Abort(MPI_COMM_WORLD, 77) - process 6724
> > >> [2921]PETSC ERROR: --------------------- Error Message ------------------------------------
> > >> [2921]PETSC ERROR: Petsc has generated inconsistent data!
> > >> [2921]PETSC ERROR: Negative MPI source: stash->nrecvs=15 i=3 MPI_SOURCE=-32766 MPI_TAG=-32766 MPI_ERROR=3825270!
> > >> [2921]PETSC ERROR: ------------------------------------------------------------------------
> > >> [2921]PETSC ERROR: Petsc Release Version 3.4.1, unknown
> > >> [2921]PETSC ERROR: See docs/changes/index.html for recent updates.
> > >> [2921]PETSC ERROR: See docs/faq.html for hints about trouble shooting.
> > >> [2921]PETSC ERROR: See docs/index.html for manual pages.
> > >> [2921]PETSC ERROR: ------------------------------------------------------------------------
> > >> [2921]PETSC ERROR: ./linearElasticity on a arch-linux2-cxx-debug named ys0270 by fandek Sun Jun 23 02:58:23 2013
> > >> [2921]PETSC ERROR: Libraries linked from /glade/p/work/fandek/petsc/arch-linux2-cxx-debug/lib
> > >> [2921]PETSC ERROR: Configure run at Sun Jun 23 00:46:05 2013
> > >> [2921]PETSC ERROR: Configure options --with-valgrind=1 --with-clanguage=cxx --with-shared-libraries=1 --with-dynamic-loading=1 --download-f-blas-lapack=1 --with-mpi=1 --download-parmetis=1 --download-metis=1 --with-64-bit-indices=1 --download-netcdf=1 --download-exodusii=1 --download-ptscotch=1 --download-hdf5=1 --with-debugging=yes
> > >> [2921]PETSC ERROR: ------------------------------------------------------------------------
> > >> [2921]PETSC ERROR: MatStashScatterGetMesg_Private() line 633 in /src/mat/utils/matstash.c
> > >> [2921]PETSC ERROR: MatAssemblyEnd_MPIAIJ() line 676 in /src/mat/impls/aij/mpi/mpiaij.c
> > >> [2921]PETSC ERROR: MatAssemblyEnd() line 4939 in /src/mat/interface/matrix.c
> > >> [2921]PETSC ERROR: SpmcsDMMeshCreatVertexMatrix() line 65 in meshreorder.cpp
> > >> [2921]PETSC ERROR: SpmcsDMMeshReOrderingMeshPoints() line 125 in meshreorder.cpp
> > >> [2921]PETSC ERROR: CreateProblem() line 59 in preProcessSetUp.cpp
> > >> [2921]PETSC ERROR: DMmeshInitialize() line 78 in mgInitialize.cpp
> > >> [2921]PETSC ERROR: main() line 71 in linearElasticity3d.cpp
> > >> :
> > >>
> > >> On Fri, Jun 21, 2013 at 4:33 AM, Jed Brown <jedbrown at mcs.anl.gov> wrote:
> > >> Fande Kong <fd.kong at siat.ac.cn> writes:
> > >>
> > >> > The code works well with fewer cores. It also works well with
> > >> > petsc-3.3-p7, but it does not work with petsc-3.4.1. Thus, if you can check
> > >> > the differences between petsc-3.3-p7 and petsc-3.4.1, you can figure out
> > >> > the reason.
> > >>
> > >> That is one way to start debugging, but there are no changes to the core
> > >> MatStash code, and many, many changes to PETSc in total.  The relevant
> > >> snippet of code is here:
> > >>
> > >>     if (stash->reproduce) {
> > >>       i    = stash->reproduce_count++;
> > >>       ierr = MPI_Wait(stash->recv_waits+i,&recv_status);CHKERRQ(ierr);
> > >>     } else {
> > >>       ierr = MPI_Waitany(2*stash->nrecvs,stash->recv_waits,&i,&recv_status);CHKERRQ(ierr);
> > >>     }
> > >>     if (recv_status.MPI_SOURCE < 0) SETERRQ(PETSC_COMM_SELF,PETSC_ERR_PLIB,"Negative MPI source!");
> > >>
> > >> So MPI returns correctly (stash->reproduce will be FALSE unless you
> > >> changed it).  You could change the line above to the following:
> > >>
> > >>   if (recv_status.MPI_SOURCE < 0) SETERRQ5(PETSC_COMM_SELF,PETSC_ERR_PLIB,"Negative MPI source: stash->nrecvs=%D i=%d MPI_SOURCE=%d MPI_TAG=%d MPI_ERROR=%d",
> > >>                                           stash->nrecvs,i,recv_status.MPI_SOURCE,recv_status.MPI_TAG,recv_status.MPI_ERROR);
> > >>
> > >>
> > >> It would help to debug --with-debugging=1, so that more checks for
> > >> corrupt data are performed.  You can still make the compiler optimize if
> > >> it takes a long time to reach the error condition.
> > >>
> > >>
> > >>
> > >> --
> > >> Fande Kong
> > >> ShenZhen Institutes of Advanced Technology
> > >> Chinese Academy of Sciences
> >
> >
> >
> > --
> > Fande Kong
> > ShenZhen Institutes of Advanced Technology
> > Chinese Academy of Sciences
> > <configure.zip>
> 
> 
> 
> 
> 
> -- 
> Fande Kong
> ShenZhen Institutes of Advanced Technology
> Chinese Academy of Sciences
> 
> 
> 
> -- 
> Fande Kong
> ShenZhen Institutes of Advanced Technology
> Chinese Academy of Sciences

________________
Peter Lichtner
Santa Fe, NM 87507
(505) 692-4029 (c)
OFM Research/LANL Guest Scientist
