[petsc-users] How to understand these error messages

Fande Kong fd.kong at siat.ac.cn
Sun Jun 23 21:14:35 CDT 2013


Thanks Barry,
Thanks Jed,

The computer I am using is Yellowstone
(http://en.wikipedia.org/wiki/Yellowstone_(supercomputer), or
http://www2.cisl.ucar.edu/resources/yellowstone). The compiler is the Intel
compiler. The MPI is IBM MPI, which is part of IBM PE.

With fewer unknowns (about 5 \times 10^7), the code runs correctly. With
more unknowns (4 \times 10^8), the code produces the error messages. But with
the same large number of unknowns (4 \times 10^8), the code also runs
correctly on fewer cores. This is very strange.

When I switch to the GNU compiler, I cannot install PETSc. I get the
following errors:

*******************************************************************************
         UNABLE to CONFIGURE with GIVEN OPTIONS    (see configure.log for
details):
-------------------------------------------------------------------------------
Downloaded exodusii could not be used. Please check install in
/glade/p/work/fandek/petsc/arch-linux2-cxx-opt_gnu
*******************************************************************************
  File "./config/configure.py", line 293, in petsc_configure
    framework.configure(out = sys.stdout)
  File "/glade/p/work/fandek/petsc/config/BuildSystem/config/framework.py",
line 933, in configure
    child.configure()
  File "/glade/p/work/fandek/petsc/config/BuildSystem/config/package.py",
line 556, in configure
    self.executeTest(self.configureLibrary)
  File "/glade/p/work/fandek/petsc/config/BuildSystem/config/base.py", line
115, in executeTest
    ret = apply(test, args,kargs)
  File
"/glade/p/work/fandek/petsc/config/BuildSystem/config/packages/exodusii.py",
line 36, in configureLibrary
    config.package.Package.configureLibrary(self)
  File "/glade/p/work/fandek/petsc/config/BuildSystem/config/package.py",
line 484, in configureLibrary
    for location, directory, lib, incl in self.generateGuesses():
  File "/glade/p/work/fandek/petsc/config/BuildSystem/config/package.py",
line 238, in generateGuesses
    raise RuntimeError('Downloaded '+self.package+' could not be used.
Please check install in '+d+'\n')


The configure.log is attached.

Regards,
On Mon, Jun 24, 2013 at 1:03 AM, Jed Brown <jedbrown at mcs.anl.gov> wrote:

> Barry Smith <bsmith at mcs.anl.gov> writes:
>
> >    What kind of computer system are you running? What MPI does it use?
> These values are nonsense MPI_SOURCE=-32766 MPI_TAG=-32766
>
> From configure.log, this is Intel MPI.  Can you ask their support what
> this error condition is supposed to mean?  It's not clear to me that
> MPI_SOURCE or MPI_TAG contain any meaningful information (though it
> could be indicative of an internal overflow), but this value of
> MPI_ERROR should mean something.
>
> >     Is it possible to run the code with valgrind?
> >
> >     Any chance of running the code with a different compiler?
> >
> >    Barry
> >
> >
> >
> > On Jun 23, 2013, at 4:12 AM, Fande Kong <fd.kong at siat.ac.cn> wrote:
> >
> >> Thanks Jed,
> >>
> >> I added your code into the petsc. I run my code with 10240 cores. I got
> the following error messages:
> >>
> >> [6724]PETSC ERROR: --------------------- Error Message
> ------------------------------------
> >> [6724]PETSC ERROR: Petsc has generated inconsistent data!
> >> [6724]PETSC ERROR: Negative MPI source: stash->nrecvs=8 i=11
> MPI_SOURCE=-32766 MPI_TAG=-32766 MPI_ERROR=20613892!
> >> [6724]PETSC ERROR:
> ------------------------------------------------------------------------
> >> [6724]PETSC ERROR: Petsc Release Version 3.4.1, unknown
> >> [6724]PETSC ERROR: See docs/changes/index.html for recent updates.
> >> [6724]PETSC ERROR: See docs/faq.html for hints about trouble shooting.
> >> [6724]PETSC ERROR: See docs/index.html for manual pages.
> >> [6724]PETSC ERROR:
> ------------------------------------------------------------------------
> >> [6724]PETSC ERROR: ./linearElasticity on a arch-linux2-cxx-debug named
> ys4350 by fandek Sun Jun 23 02:58:23 2013
> >> [6724]PETSC ERROR: Libraries linked from
> /glade/p/work/fandek/petsc/arch-linux2-cxx-debug/lib
> >> [6724]PETSC ERROR: Configure run at Sun Jun 23 00:46:05 2013
> >> [6724]PETSC ERROR: Configure options --with-valgrind=1
> --with-clanguage=cxx --with-shared-libraries=1 --with-dynamic-loading=1
> --download-f-blas-lapack=1 --with-mpi=1 --d
> >> ownload-parmetis=1 --download-metis=1 --with-64-bit-indices=1
> --download-netcdf=1 --download-exodusii=1 --download-ptscotch=1
> --download-hdf5=1 --with-debugging=yes
> >> [6724]PETSC ERROR:
> ------------------------------------------------------------------------
> >> [6724]PETSC ERROR: MatStashScatterGetMesg_Private() line 633 in
> /src/mat/utilsmatstash.c
> >> [6724]PETSC ERROR: MatAssemblyEnd_MPIAIJ() line 676 in
> /src/mat/impls/aij/mpimpiaij.c
> >> [6724]PETSC ERROR: MatAssemblyEnd() line 4939 in
> /src/mat/interfacematrix.c
> >> [6724]PETSC ERROR: SpmcsDMMeshCreatVertexMatrix() line 65 in
> meshreorder.cpp
> >> [6724]PETSC ERROR: SpmcsDMMeshReOrderingMeshPoints() line 125 in
> meshreorder.cpp
> >> [6724]PETSC ERROR: CreateProblem() line 59 in preProcessSetUp.cpp
> >> [6724]PETSC ERROR: DMmeshInitialize() line 78 in mgInitialize.cpp
> >> [6724]PETSC ERROR: main() line 71 in linearElasticity3d.cpp
> >> Abort(77) on node 6724 (rank 6724 in comm 1140850688): application
> called MPI_Abort(MPI_COMM_WORLD, 77) - process 6724
> >> [2921]PETSC ERROR: --------------------- Error Message
> ------------------------------------
> >> [2921]PETSC ERROR: Petsc has generated inconsistent data!
> >> [2921]PETSC ERROR: Negative MPI source: stash->nrecvs=15 i=3
> MPI_SOURCE=-32766 MPI_TAG=-32766 MPI_ERROR=3825270!
> >> [2921]PETSC ERROR:
> ------------------------------------------------------------------------
> >> [2921]PETSC ERROR: Petsc Release Version 3.4.1, unknown
> >> [2921]PETSC ERROR: See docs/changes/index.html for recent updates.
> >> [2921]PETSC ERROR: See docs/faq.html for hints about trouble shooting.
> >> [2921]PETSC ERROR: See docs/index.html for manual pages.
> >> [2921]PETSC ERROR:
> ------------------------------------------------------------------------
> >> [2921]PETSC ERROR: ./linearElasticity on a arch-linux2-cxx-debug named
> ys0270 by fandek Sun Jun 23 02:58:23 2013
> >> [2921]PETSC ERROR: Libraries linked from
> /glade/p/work/fandek/petsc/arch-linux2-cxx-debug/lib
> >> [2921]PETSC ERROR: Configure run at Sun Jun 23 00:46:05 2013
> >> [2921]PETSC ERROR: Configure options --with-valgrind=1
> --with-clanguage=cxx --with-shared-libraries=1 --with-dynamic-loading=1
> --download-f-blas-lapack=1 --with-mpi=1 --download-parmetis=1
> --download-metis=1 --with-64-bit-indices=1 --download-netcdf=1
> --download-exodusii=1 --download-ptscotch=1 --download-hdf5=1
> --with-debugging=yes
> >> [2921]PETSC ERROR:
> ------------------------------------------------------------------------
> >> [2921]PETSC ERROR: MatStashScatterGetMesg_Private() line 633 in
> /src/mat/utilsmatstash.c
> >> [2921]PETSC ERROR: MatAssemblyEnd_MPIAIJ() line 676 in
> /src/mat/impls/aij/mpimpiaij.c
> >> [2921]PETSC ERROR: MatAssemblyEnd() line 4939 in
> /src/mat/interfacematrix.c
> >> [2921]PETSC ERROR: SpmcsDMMeshCreatVertexMatrix() line 65 in
> meshreorder.cpp
> >> [2921]PETSC ERROR: SpmcsDMMeshReOrderingMeshPoints() line 125 in
> meshreorder.cpp
> >> [2921]PETSC ERROR: CreateProblem() line 59 in preProcessSetUp.cpp
> >> [2921]PETSC ERROR: DMmeshInitialize() line 78 in mgInitialize.cpp
> >> [2921]PETSC ERROR: main() line 71 in linearElasticity3d.cpp
> >> :
> >>
> >> On Fri, Jun 21, 2013 at 4:33 AM, Jed Brown <jedbrown at mcs.anl.gov>
> wrote:
> >> Fande Kong <fd.kong at siat.ac.cn> writes:
> >>
> >> > The code works well with less cores. And It also works well with
> >> > petsc-3.3-p7. But it does not work with petsc-3.4.1. Thus, If you can
> check
> >> > the differences between petsc-3.3-p7 and petsc-3.4.1, you can figure
> out
> >> > the reason.
> >>
> >> That is one way to start debugging, but there are no changes to the core
> >> MatStash code, and many, many changes to PETSc in total.  The relevant
> >> snippet of code is here:
> >>
> >>     if (stash->reproduce) {
> >>       i    = stash->reproduce_count++;
> >>       ierr = MPI_Wait(stash->recv_waits+i,&recv_status);CHKERRQ(ierr);
> >>     } else {
> >>       ierr = MPI_Waitany(2*stash->nrecvs,stash->recv_waits,&i,&recv_status);CHKERRQ(ierr);
> >>     }
> >>     if (recv_status.MPI_SOURCE < 0) SETERRQ(PETSC_COMM_SELF,PETSC_ERR_PLIB,"Negative MPI source!");
> >>
> >> So MPI returns correctly (stash->reproduce will be FALSE unless you
> >> changed it).  You could change the line above to the following:
> >>
> >>   if (recv_status.MPI_SOURCE < 0) SETERRQ5(PETSC_COMM_SELF,PETSC_ERR_PLIB,"Negative MPI source: stash->nrecvs=%D i=%d MPI_SOURCE=%d MPI_TAG=%d MPI_ERROR=%d",
> >>                                            stash->nrecvs,i,recv_status.MPI_SOURCE,recv_status.MPI_TAG,recv_status.MPI_ERROR);
> >>
> >>
> >> It would help to debug --with-debugging=1, so that more checks for
> >> corrupt data are performed.  You can still make the compiler optimize if
> >> it takes a long time to reach the error condition.
> >>
> >>
> >>
> >> --
> >> Fande Kong
> >> ShenZhen Institutes of Advanced Technology
> >> Chinese Academy of Sciences
>
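
For reference, the kind of sanity check Jed suggested in the quoted text can
be sketched outside PETSc. This is only a minimal illustration, not the PETSc
code: `StatusFields` is a hypothetical stand-in for the MPI_SOURCE, MPI_TAG
and MPI_ERROR fields of MPI_Status so that it compiles without MPI, and
`check_recv_status` is a made-up helper name. The idea is that after a
wait-any call the returned index should lie in [0, nrequests) and the source
should be a valid rank in [0, comm_size).

```c
#include <stdio.h>

/* Hypothetical stand-in for the public fields of MPI_Status
   (MPI_SOURCE, MPI_TAG, MPI_ERROR), so no MPI header is needed. */
typedef struct {
  int source;
  int tag;
  int error;
} StatusFields;

/* Hypothetical helper: returns 0 if the completed receive looks sane,
   1 if the request index is out of range,
   2 if the source is not a valid rank,
   3 if the tag is negative.
   A diagnostic with all fields is printed to stderr on failure. */
static int check_recv_status(const StatusFields *st, int index,
                             int nrequests, int comm_size)
{
  int code = 0;

  if (index < 0 || index >= nrequests) {
    code = 1;
  } else if (st->source < 0 || st->source >= comm_size) {
    code = 2;
  } else if (st->tag < 0) {
    code = 3;
  }

  if (code != 0) {
    fprintf(stderr,
            "suspicious status: nrequests=%d i=%d source=%d tag=%d error=%d\n",
            nrequests, index, st->source, st->tag, st->error);
  }
  return code;
}
```

With the values from the rank 6724 log above (stash->nrecvs=8, so 16
requests; i=11; MPI_SOURCE=-32766), and assuming one rank per core so that
the communicator size is 10240, calling
`check_recv_status(&st, 11, 16, 10240)` returns 2: the index is in range but
the source is not a valid rank.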



-- 
Fande Kong
ShenZhen Institutes of Advanced Technology
Chinese Academy of Sciences
-------------- next part --------------
A non-text attachment was scrubbed...
Name: configure.zip
Type: application/zip
Size: 159246 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/petsc-users/attachments/20130624/996494a5/attachment-0001.zip>

