[mpich-discuss] Fwd: [Mpi-forum] The MPI Internal error running on Hopper

Rebecca Yuan xyuan at lbl.gov
Sat Jul 30 10:29:05 CDT 2011


Hello,

Could you please give me some suggestions to resolve the MPI problem on Hopper?

Thanks very much!

Rebecca

Begin forwarded message:

> From: Jeff Hammond <jeff.science at gmail.com>
> Date: July 30, 2011 8:17:09 AM PDT
> To: Main MPI Forum mailing list <mpi-forum at lists.mpi-forum.org>
> Subject: Re: [Mpi-forum] The MPI Internal error running on Hopper
> Reply-To: Main MPI Forum mailing list <mpi-forum at lists.mpi-forum.org>
> 

> Report this to NERSC support. This is not the appropriate mailing list
> for support of MPI implementations.
> 
> CrayMPI is an MPICH2-based implementation, so you can also try
> mpich-discuss at mcs.anl.gov, but it is still preferable to contact NERSC
> first, since they are the ones who own the Cray support contract for
> Hopper.
> 
> Jeff
> 
> Sent from my iPhone
> 
> On Jul 30, 2011, at 9:54 AM, "Xuefei (Rebecca) Yuan" <xyuan at lbl.gov> wrote:
> 
>> Hello, all,
>> 
>> I got an MPI internal error while running on a Cray XE6 machine (Hopper); the error message reads:
>> 
>> 
>> Rank 9 [Sat Jul 30 07:39:14 2011] [c5-2c2s1n3] Fatal error in PMPI_Wait: Other MPI error, error stack:
>> PMPI_Wait(179).....................: MPI_Wait(request=0x7fffffff7438, status=0x7fffffff7460) failed
>> MPIR_Wait_impl(69).................:
>> MPIDI_CH3I_Progress(370)...........:
>> MPIDI_CH3_PktHandler_EagerSend(606): Failed to allocate memory for an unexpected message. 0 unexpected messages queued.
>> Rank 63 [Sat Jul 30 07:39:14 2011] [c0-2c2s3n0] Fatal error in MPI_Irecv: Other MPI error, error stack:
>> MPI_Irecv(147): MPI_Irecv(buf=0x4a81890, count=52, MPI_DOUBLE, src=MPI_ANY_SOURCE, tag=MPI_ANY_TAG, comm=0x84000007, request=0x7fffffff7438) failed
>> MPID_Irecv(53): failure occurred while allocating memory for a request object
>> Rank 54 [Sat Jul 30 07:39:14 2011] [c1-2c2s3n2] Fatal error in PMPI_Isend: Internal MPI error!, error stack:
>> PMPI_Isend(148): MPI_Isend(buf=0x3d12a350, count=52, MPI_DOUBLE, dest=30, tag=21, comm=0xc4000003, request=0x3c9c12f0) failed
>> (unknown)(): Internal MPI error!
>> Rank 45 [Sat Jul 30 07:39:14 2011] [c1-2c2s2n3] Fatal error in PMPI_Isend: Internal MPI error!, error stack:
>> PMPI_Isend(148): MPI_Isend(buf=0x3c638de0, count=34, MPI_DOUBLE, dest=61, tag=21, comm=0x84000007, request=0x3c03be90) failed
>> (unknown)(): Internal MPI error!
>> Rank 36 [Sat Jul 30 07:39:14 2011] [c3-2c2s2n1] Fatal error in PMPI_Isend: Internal MPI error!, error stack:
>> PMPI_Isend(148): MPI_Isend(buf=0x3caaf170, count=52, MPI_DOUBLE, dest=28, tag=21, comm=0xc4000003, request=0x3c2e561c) failed
>> (unknown)(): Internal MPI error!
>> _pmii_daemon(SIGCHLD): [NID 00102] [c0-2c2s3n0] [Sat Jul 30 07:39:14 2011] PE 63 exit signal Aborted
>> _pmii_daemon(SIGCHLD): [NID 06043] [c3-2c2s2n1] [Sat Jul 30 07:39:14 2011] PE 36 exit signal Aborted
>> _pmii_daemon(SIGCHLD): [NID 06328] [c1-2c2s3n2] [Sat Jul 30 07:39:14 2011] PE 54 exit signal Aborted
>> _pmii_daemon(SIGCHLD): [NID 05565] [c5-2c2s1n3] [Sat Jul 30 07:39:14 2011] PE 9 exit signal Aborted
>> _pmii_daemon(SIGCHLD): [NID 06331] [c1-2c2s2n3] [Sat Jul 30 07:39:14 2011] PE 45 exit signal Aborted
>> [NID 00102] 2011-07-30 07:39:38 Apid 2986821: initiated application termination
>> 
>> So I checked the runtime tuning environment parameters for Hopper at
>> 
>> https://www.nersc.gov/users/computational-systems/hopper/running-jobs/runtime-tuning-options/#toc-anchor-1
>> 
>> I tried to increase MPI_GNI_MAX_EAGER_MSG_SIZE from 8192 to 131070, but it did not help.
>> 
>> Any suggestions on how I could resolve this error for MPI_Irecv() and MPI_Isend()? (See the sketch after the quoted message below.)
>> 
>> Thanks very much!
>> 
>> 
>> Xuefei (Rebecca) Yuan
>> Postdoctoral Fellow
>> Lawrence Berkeley National Laboratory
>> Tel: 1-510-486-7031
>> 
>> 
>> 
>> _______________________________________________
>> mpi-forum mailing list
>> mpi-forum at lists.mpi-forum.org
>> http://lists.mpi-forum.org/mailman/listinfo.cgi/mpi-forum