Hello,

Could you please give me some suggestions to resolve the MPI problem on Hopper?

Thanks very much!

Rebecca

Begin forwarded message:

From: Jeff Hammond <jeff.science@gmail.com>
Date: July 30, 2011 8:17:09 AM PDT
To: Main MPI Forum mailing list <mpi-forum@lists.mpi-forum.org>
Subject: Re: [Mpi-forum] The MPI Internal error running on Hopper
Reply-To: Main MPI Forum mailing list <mpi-forum@lists.mpi-forum.org>

Report to NERSC support. This is not the appropriate email list for support of MPI implementations.

Cray MPI is an MPICH2-based implementation, so you can also try mpich-discuss@mcs.anl.gov, but it is still preferred to contact NERSC first since they are the ones who own the Cray support contract for Hopper.

Jeff

Sent from my iPhone

On Jul 30, 2011, at 9:54 AM, "Xuefei (Rebecca) Yuan" <xyuan@lbl.gov> wrote:

> Hello, all,
>
> I got an MPI internal error while running on a Cray XE6 machine (Hopper); the error message reads:
>
> Rank 9 [Sat Jul 30 07:39:14 2011] [c5-2c2s1n3] Fatal error in PMPI_Wait: Other MPI error, error stack:
> PMPI_Wait(179).....................: MPI_Wait(request=0x7fffffff7438, status=0x7fffffff7460) failed
> MPIR_Wait_impl(69).................:
> MPIDI_CH3I_Progress(370)...........:
> MPIDI_CH3_PktHandler_EagerSend(606): Failed to allocate memory for an unexpected message. 0 unexpected messages queued.
> Rank 63 [Sat Jul 30 07:39:14 2011] [c0-2c2s3n0] Fatal error in MPI_Irecv: Other MPI error, error stack:
> MPI_Irecv(147): MPI_Irecv(buf=0x4a81890, count=52, MPI_DOUBLE, src=MPI_ANY_SOURCE, tag=MPI_ANY_TAG, comm=0x84000007, request=0x7fffffff7438) failed
> MPID_Irecv(53): failure occurred while allocating memory for a request object
> Rank 54 [Sat Jul 30 07:39:14 2011] [c1-2c2s3n2] Fatal error in PMPI_Isend: Internal MPI error!, error stack:
> PMPI_Isend(148): MPI_Isend(buf=0x3d12a350, count=52, MPI_DOUBLE, dest=30, tag=21, comm=0xc4000003, request=0x3c9c12f0) failed
> (unknown)(): Internal MPI error!
> Rank 45 [Sat Jul 30 07:39:14 2011] [c1-2c2s2n3] Fatal error in PMPI_Isend: Internal MPI error!, error stack:
> PMPI_Isend(148): MPI_Isend(buf=0x3c638de0, count=34, MPI_DOUBLE, dest=61, tag=21, comm=0x84000007, request=0x3c03be90) failed
> (unknown)(): Internal MPI error!
> Rank 36 [Sat Jul 30 07:39:14 2011] [c3-2c2s2n1] Fatal error in PMPI_Isend: Internal MPI error!, error stack:
> PMPI_Isend(148): MPI_Isend(buf=0x3caaf170, count=52, MPI_DOUBLE, dest=28, tag=21, comm=0xc4000003, request=0x3c2e561c) failed
> (unknown)(): Internal MPI error!
> _pmii_daemon(SIGCHLD): [NID 00102] [c0-2c2s3n0] [Sat Jul 30 07:39:14 2011] PE 63 exit signal Aborted
> _pmii_daemon(SIGCHLD): [NID 06043] [c3-2c2s2n1] [Sat Jul 30 07:39:14 2011] PE 36 exit signal Aborted
> _pmii_daemon(SIGCHLD): [NID 06328] [c1-2c2s3n2] [Sat Jul 30 07:39:14 2011] PE 54 exit signal Aborted
> _pmii_daemon(SIGCHLD): [NID 05565] [c5-2c2s1n3] [Sat Jul 30 07:39:14 2011] PE 9 exit signal Aborted
> _pmii_daemon(SIGCHLD): [NID 06331] [c1-2c2s2n3] [Sat Jul 30 07:39:14 2011] PE 45 exit signal Aborted
> [NID 00102] 2011-07-30 07:39:38 Apid 2986821: initiated application termination
>
> So I checked the environment parameters on Hopper at
>
> https://www.nersc.gov/users/computational-systems/hopper/running-jobs/runtime-tuning-options/#toc-anchor-1
>
> I tried to increase MPI_GNI_MAX_EAGER_MSG_SIZE from 8192 to 131070, but it did not help.
>
> Any suggestions on how to resolve this error for MPI_Irecv() and MPI_Isend()?
>
> Thanks very much!
>
> Xuefei (Rebecca) Yuan
> Postdoctoral Fellow
> Lawrence Berkeley National Laboratory
> Tel: 1-510-486-7031

_______________________________________________
mpi-forum mailing list
mpi-forum@lists.mpi-forum.org
http://lists.mpi-forum.org/mailman/listinfo.cgi/mpi-forum
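The failures in the log above ("Failed to allocate memory for an unexpected message", "failure occurred while allocating memory for a request object") come from the MPICH2 layer running out of memory for unexpected-message buffering and request objects. That usually means either the node is genuinely short of memory or the application lets unmatched sends and incomplete nonblocking requests accumulate. Below is a minimal C sketch of a bounded nonblocking neighbor exchange; the ring neighbors, message length, tag, and iteration count are hypothetical placeholders (the tag 21 and count 52 merely echo the error log), not taken from Rebecca's application. The point of the pattern is that receives are posted before the matching sends and every request is completed with MPI_Waitall each iteration, so neither the unexpected-message queue nor the pool of outstanding request objects grows with the number of iterations.

/*
 * Hedged sketch (not from the original thread): a nonblocking neighbor
 * exchange that keeps the number of outstanding MPI requests bounded.
 * The ring neighbors, message length, tag, and iteration count are
 * hypothetical placeholders.  Run with more than NNEIGHBORS ranks.
 */
#include <mpi.h>
#include <stdio.h>

#define NNEIGHBORS 4    /* hypothetical number of neighbors per rank   */
#define MSGLEN     52   /* doubles per message (echoes the error log)  */
#define NITER      1000 /* hypothetical iteration count                */

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Hypothetical ring neighbors: send "downstream", receive "upstream". */
    int send_to[NNEIGHBORS], recv_from[NNEIGHBORS];
    for (int i = 0; i < NNEIGHBORS; i++) {
        send_to[i]   = (rank + i + 1) % size;
        recv_from[i] = (rank - i - 1 + size) % size;
    }

    double sendbuf[NNEIGHBORS][MSGLEN], recvbuf[NNEIGHBORS][MSGLEN];
    MPI_Request req[2 * NNEIGHBORS];

    for (int it = 0; it < NITER; it++) {
        /* Pre-post the receives so the matching sends do not have to be
         * held in the receiver's unexpected-message queue. */
        for (int i = 0; i < NNEIGHBORS; i++)
            MPI_Irecv(recvbuf[i], MSGLEN, MPI_DOUBLE, recv_from[i], 21,
                      MPI_COMM_WORLD, &req[i]);

        for (int i = 0; i < NNEIGHBORS; i++) {
            for (int j = 0; j < MSGLEN; j++)
                sendbuf[i][j] = (double)(rank + it);
            MPI_Isend(sendbuf[i], MSGLEN, MPI_DOUBLE, send_to[i], 21,
                      MPI_COMM_WORLD, &req[NNEIGHBORS + i]);
        }

        /* Complete all 2*NNEIGHBORS requests before the next iteration so
         * request objects are released instead of accumulating. */
        MPI_Waitall(2 * NNEIGHBORS, req, MPI_STATUSES_IGNORE);
    }

    if (rank == 0)
        printf("completed %d iterations\n", NITER);
    MPI_Finalize();
    return 0;
}

If the application's exchange is already bounded like this and the failures persist, the allocation errors more likely point to the node itself running out of memory, which is the kind of thing the NERSC consultants recommended above can help diagnose.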