[mpich-discuss] Fatal error in PMPI_Alltoall: Other MPI error, error stack

Ryan Crocker rcrocker at uvm.edu
Thu Oct 18 21:37:33 CDT 2012


I use Activity Monitor to see whether the virtual and physical memory used by the simulation increases per processor as the run progresses, and it does not, so there is no blatant memory leak.  I've also done a core dump, and I am below 2 GB of allocated memory per node on both the cluster and my workstation.  What I really need to know is what that error specifically means.  Is the node out of memory?  Are the send and receive buffers different sizes?  Something else?  I only copied one error below, but when I run in verbose mode every processor emits the same error, so that has me stumped.
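
For what it's worth, here is a rough sketch of what logging memory from inside the code could look like, instead of watching a GUI monitor or free.  It is plain C rather than the Fortran solver itself, log_rank_memory is just an illustrative name, and it assumes a Linux/glibc node, where getrusage reports ru_maxrss (the per-process high-water resident set) in kilobytes:

/* Hypothetical helper: log per-rank high-water memory so any growth
 * between all-to-all calls shows up in the run log.
 * Assumes Linux/glibc, where ru_maxrss is reported in kilobytes. */
#include <mpi.h>
#include <stdio.h>
#include <sys/resource.h>

void log_rank_memory(const char *tag, MPI_Comm comm)
{
    struct rusage ru;
    int rank;
    long kb, min_kb, max_kb;

    MPI_Comm_rank(comm, &rank);
    getrusage(RUSAGE_SELF, &ru);
    kb = ru.ru_maxrss;

    /* Reduce so only rank 0 prints the spread across the job. */
    MPI_Reduce(&kb, &min_kb, 1, MPI_LONG, MPI_MIN, 0, comm);
    MPI_Reduce(&kb, &max_kb, 1, MPI_LONG, MPI_MAX, 0, comm);
    if (rank == 0)
        printf("%s: maxrss min %ld kB, max %ld kB\n", tag, min_kb, max_kb);
}

Since ru_maxrss is a high-water mark it only ever grows, but printing it around the all-to-all every few hundred iterations would show whether any rank is creeping toward the 2 GB per-node limit.  A standalone sketch of the all-to-all call pattern itself is at the bottom of this message, after the quoted thread.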

On Oct 18, 2012, at 6:16 PM, Jeff Hammond wrote:

> What do you mean "I've also watched the memory usage..."?  Are you
> using a memory profiler, such as the one in TAU?  Calling "free" from
> the command line while a simulation is running is not a reliable way
> to determine if the application is allocating memory.
> 
> Jeff
> 
> On Thu, Oct 18, 2012 at 5:24 PM, Ryan Crocker <rcrocker at uvm.edu> wrote:
>> I'm using MPICH2 1.3.1, compiled 64-bit with gcc and gfortran, on a Linux cluster across 144 processors with 2 GB per node.  I'm running an in-house flow solver written in Fortran, and for some reason I get this error:
>> 
>> MXMPI:FATAL-ERROR:0:Fatal error in PMPI_Alltoall: Other MPI error, error stack:
>> PMPI_Alltoall(773).....................: MPI_Alltoall(sbuf=0x273436e0, scount=22, MPI_DOUBLE_PRECISION, rbuf=0x1cd98fb0, rcount=22, MPI_DOUBLE_PRECISION, comm=0x84000001) failed
>> MPIR_Alltoall_impl(651)................:
>> MPIR_Alltoall(619).....................:
>> MPIR_Alltoall_intra(206)...............:
>> MPIR_Type_create_indexed_block_impl(48):
>> MPID_Type_vector(57)...................: Out of memory
>> APPLICATION TERMINATED WITH THE EXIT STRING: Hangup (signal 1)
>> 
>> It happens about 2000 iterations into my run.  I've run the exact same simulation on a Mac workstation and I do not get this error.  I've also watched the memory usage there; it does not increase during the run on the workstation.
>> 
>> So far I've tried adding an MPI_BARRIER in front of my MPI_ALLTOALL calls, but that does not seem to help.  I've also built a local copy of MPICH2 1.5, and although it is slower, I run into the same problem at the same iteration number.
>> 
>> Ryan Crocker
>> University of Vermont, School of Engineering
>> Mechanical Engineering Department
>> rcrocker at uvm.edu
>> 315-212-7331
>> 
> 
> 
> 
> -- 
> Jeff Hammond
> Argonne Leadership Computing Facility
> University of Chicago Computation Institute
> jhammond at alcf.anl.gov / (630) 252-5381
> http://www.linkedin.com/in/jeffhammond
> https://wiki.alcf.anl.gov/parts/index.php/User:Jhammond
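
For reference, here is a standalone sketch of the call pattern from the error stack quoted above: 22 elements per destination rank, with MPI_DOUBLE as the C analogue of MPI_DOUBLE_PRECISION.  The iteration count, the use of MPI_COMM_WORLD, and the buffer contents are assumptions rather than anything taken from the solver; the point is only to see whether repeated all-to-alls of this size by themselves run a 2 GB node out of memory under MPICH2 1.3.1:

/* Standalone sketch of the call pattern from the error stack:
 * 22 doubles per destination rank, repeated many times.
 * The iteration count and communicator are assumptions, not the solver's. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    const int count = 22;   /* elements per destination rank, from the error stack */
    int rank, size, i, iter;
    double *sbuf, *rbuf;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    sbuf = malloc((size_t)count * size * sizeof(double));
    rbuf = malloc((size_t)count * size * sizeof(double));
    for (i = 0; i < count * size; i++)
        sbuf[i] = (double)rank;

    /* Run well past the ~2000 iterations at which the solver fails. */
    for (iter = 0; iter < 5000; iter++) {
        MPI_Alltoall(sbuf, count, MPI_DOUBLE, rbuf, count, MPI_DOUBLE,
                     MPI_COMM_WORLD);
        if (rank == 0 && iter % 500 == 0)
            printf("iteration %d completed\n", iter);
    }

    free(sbuf);
    free(rbuf);
    MPI_Finalize();
    return 0;
}

If a loop like this survives well past 2000 iterations on the cluster, the memory pressure is more likely coming from the solver's own allocations than from the MPI_Alltoall call itself.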

Ryan Crocker
University of Vermont, School of Engineering
Mechanical Engineering Department
rcrocker at uvm.edu
315-212-7331
