[mpich-discuss] Fatal error in PMPI_Alltoall: Other MPI error, error stack

Jeff Hammond jhammond at alcf.anl.gov
Thu Oct 18 20:16:31 CDT 2012


What do you mean "I've also watched the memory usage..."?  Are you
using a memory profiler, such as the one in TAU?  Calling "free" from
the command line while a simulation is running is not a reliable way
to determine if the application is allocating memory.
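For example, here is a minimal sketch (not TAU, just an illustration; the function name is hypothetical) of reporting each rank's peak resident set size from inside the application on Linux, which tells you far more than node-level "free" output:

#include <stdio.h>
#include <mpi.h>
#include <sys/resource.h>

/* Print this rank's peak resident set size (kilobytes on Linux).
 * Call it every few hundred iterations to see whether memory grows. */
void print_peak_rss(int iteration)
{
    struct rusage ru;
    int rank;

    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (getrusage(RUSAGE_SELF, &ru) == 0) {
        printf("iter %d, rank %d: peak RSS = %ld kB\n",
               iteration, rank, ru.ru_maxrss);
        fflush(stdout);
    }
}

A routine like this can be called from the Fortran solver through ISO_C_BINDING, or the same information can be read from /proc/self/status.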

Jeff

On Thu, Oct 18, 2012 at 5:24 PM, Ryan Crocker <rcrocker at uvm.edu> wrote:
> I'm using MPICH2 1.3.1, compiled with gcc and gfortran in 64-bit, on a Linux cluster on 144 processors with 2 GB per node.  I'm running an in-house flow solver written in Fortran, and for some reason I get this error:
>
> MXMPI:FATAL-ERROR:0:Fatal error in PMPI_Alltoall: Other MPI error, error stack:
> PMPI_Alltoall(773).....................: MPI_Alltoall(sbuf=0x273436e0, scount=22, MPI_DOUBLE_PRECISION, rbuf=0x1cd98fb0, rcount=22, MPI_DOUBLE_PRECISION, comm=0x84000001) failed
> MPIR_Alltoall_impl(651)................:
> MPIR_Alltoall(619).....................:
> MPIR_Alltoall_intra(206)...............:
> MPIR_Type_create_indexed_block_impl(48):
> MPID_Type_vector(57)...................: Out of memory
> APPLICATION TERMINATED WITH THE EXIT STRING: Hangup (signal 1)
>
> It happens about 2000 iterations into my run.  I've run the exact same simulation on a Mac workstation and I do not get this error.  I've also watched the memory usage, and it does not increase during my run on the workstation.
>
> So far I've tried adding an MPI_BARRIER in front of my alltoall calls, but it does not seem to help.  I've also updated to a local build of MPICH2 1.5, and though it is slower, I run into the same problem at the same iteration number.
>
> Ryan Crocker
> University of Vermont, School of Engineering
> Mechanical Engineering Department
> rcrocker at uvm.edu
> 315-212-7331
>
> _______________________________________________
> mpich-discuss mailing list     mpich-discuss at mcs.anl.gov
> To manage subscription options or unsubscribe:
> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss
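For reference, a minimal C sketch of the barrier-before-alltoall pattern described in the quoted message.  The buffers and their sizing are hypothetical; the count of 22 doubles per rank comes from the error stack, with C's MPI_DOUBLE standing in for Fortran's MPI_DOUBLE_PRECISION:

#include <stdlib.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int nranks;
    const int count = 22;              /* doubles sent to each rank */
    double *sbuf, *rbuf;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);

    /* One contiguous block of "count" doubles per destination rank. */
    sbuf = calloc((size_t)count * nranks, sizeof(double));
    rbuf = calloc((size_t)count * nranks, sizeof(double));

    /* The barrier synchronizes the ranks before the collective, but it
     * does not change how much memory the all-to-all itself needs. */
    MPI_Barrier(MPI_COMM_WORLD);
    MPI_Alltoall(sbuf, count, MPI_DOUBLE,
                 rbuf, count, MPI_DOUBLE, MPI_COMM_WORLD);

    free(sbuf);
    free(rbuf);
    MPI_Finalize();
    return 0;
}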



-- 
Jeff Hammond
Argonne Leadership Computing Facility
University of Chicago Computation Institute
jhammond at alcf.anl.gov / (630) 252-5381
http://www.linkedin.com/in/jeffhammond
https://wiki.alcf.anl.gov/parts/index.php/User:Jhammond

