[petsc-users] How to understand these error messages

Jeff Hammond jhammond at alcf.anl.gov
Wed Jun 26 09:56:01 CDT 2013

This concerns IBM PE-MPI on iDataPlex, which is likely based upon the cluster implementation of PAMI, which is a completely different code base from the PAMI Blue Gene implementation.  If you can reproduce it on Blue Gene/Q, I will care.

As an IBM customer, NCAR can file bug reports directly with IBM for the products it has purchased.  There is a link to their support system on http://www2.cisl.ucar.edu/resources/yellowstone, which is the appropriate channel for Yellowstone users who have issues with the system software installed there.


----- Original Message -----
From: "Jed Brown" <jedbrown at mcs.anl.gov>
To: "Fande Kong" <fd.kong at siat.ac.cn>, "petsc-users" <petsc-users at mcs.anl.gov>
Cc: "Jeff Hammond" <jhammond at alcf.anl.gov>
Sent: Wednesday, June 26, 2013 9:21:48 AM
Subject: Re: [petsc-users] How to understand these error messages

Fande Kong <fd.kong at siat.ac.cn> writes:

> Hi Barry,
> If I use Intel MPI, my code runs correctly and produces
> correct results. Yes, you are right: the IBM MPI has some bugs.

Fande, please report this issue to IBM.

Jeff, Fande has a reproducible case: when running on 10k cores with
problem sizes over 100M, this error is reported:

      [6724]PETSC ERROR: Negative MPI source: stash->nrecvs=8 i=11
      MPI_SOURCE=-32766 MPI_TAG=-32766 MPI_ERROR=20613892!

It runs correctly for smaller problem sizes, smaller core counts, or for
all sizes when using Intel MPI.  This is on Yellowstone (iDataPlex, 4500
dx360 nodes).  Do you know someone at IBM that should be notified?
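
When filing a report, it can help to pull the numeric fields out of the log. The following is a minimal sketch, not part of PETSc: the regular expression is inferred only from the single message quoted above, and the names parse_negative_source and suspicious_fields, as well as the range check itself, are hypothetical helpers introduced for illustration.

```python
import re

# Pattern inferred from the one message quoted in this thread (assumption);
# other PETSc versions may format the message differently.
_PATTERN = re.compile(
    r"\[(?P<rank>\d+)\]PETSC ERROR: Negative MPI source: "
    r"stash->nrecvs=(?P<nrecvs>-?\d+)\s+i=(?P<i>-?\d+)\s+"
    r"MPI_SOURCE=(?P<source>-?\d+)\s+MPI_TAG=(?P<tag>-?\d+)\s+"
    r"MPI_ERROR=(?P<error>-?\d+)"
)


def parse_negative_source(text):
    """Return the integer fields of the message as a dict, or None if absent."""
    # The log wraps the message across two lines, so collapse whitespace first.
    match = _PATTERN.search(" ".join(text.split()))
    if match is None:
        return None
    return {name: int(value) for name, value in match.groupdict().items()}


def suspicious_fields(fields, comm_size):
    """List fields that cannot describe a completed receive (hypothetical check)."""
    problems = []
    # A completed receive should name a sender rank inside the communicator.
    if not 0 <= fields["source"] < comm_size:
        problems.append("source")
    # Tags of matched messages are non-negative.
    if fields["tag"] < 0:
        problems.append("tag")
    return problems
```

Calling parse_negative_source on the two log lines above and passing the result to suspicious_fields with the job's core count flags both the source and the tag, which is the detail worth including in a report to the MPI vendor.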

Jeff Hammond
Argonne Leadership Computing Facility
University of Chicago Computation Institute
jhammond at alcf.anl.gov / (630) 252-5381
ALCF docs: http://www.alcf.anl.gov/user-guides
