[mpich-discuss] Fatal error in PMPI_Bcast:

Dave Goodell goodell at mcs.anl.gov
Fri May 27 10:55:28 CDT 2011


This looks like a networking problem, caused by either a firewall or DNS (perhaps a bad /etc/hosts file?).  Are the firewalls disabled on these machines?  How are the hostnames configured?
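
One commonly reported form of the "bad /etc/hosts" problem is a host's own name being mapped to a loopback address, so that the address it resolves for itself is not reachable from the other node. The sketch below is only an illustration of that check, not an MPICH tool: the function name `loopback_hostnames` and the sample addresses are made up for the example, and only the host names `query` and `trigger` come from this thread.

```python
import ipaddress


def loopback_hostnames(hosts_text, cluster_hosts):
    """Return {hostname: address} for cluster hosts mapped to a loopback address.

    hosts_text is the content of an /etc/hosts-style file; cluster_hosts is
    the list of names used in the machinefile. Both are supplied by the caller.
    """
    wanted = set(cluster_hosts)
    flagged = {}
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        fields = line.split()
        address, names = fields[0], fields[1:]
        try:
            ip = ipaddress.ip_address(address)
        except ValueError:
            continue  # skip lines whose first field is not a plain IP address
        if ip.is_loopback:
            for name in names:
                if name in wanted:
                    flagged[name] = address
    return flagged


# Hypothetical file content; the addresses are invented for illustration.
SAMPLE = """\
127.0.0.1     localhost
127.0.1.1     query
192.168.1.11  trigger
"""

if __name__ == "__main__":
    for name, address in loopback_hostnames(SAMPLE, ["query", "trigger"]).items():
        print(f"{name} maps to loopback address {address}")
```

To try it on a real machine, pass the text of that machine's /etc/hosts and the names listed in the machinefile. An empty result does not rule out a firewall or other routing problem; it only excludes this one misconfiguration.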

What version of MPICH2 is this?  What configure options did you use when you built MPICH2?

-Dave

On May 27, 2011, at 10:49 AM CDT, Fujun Liu wrote:

> The cpi example does not work either. There is no error message, but it hangs and never finishes:
> 
> xxxx at query:~/MPI$ mpiexec -n 2 -f machinefile /home/netlab/MPI/mpich2-build/examples/cpi
> Process 1 of 2 is on query
> Process 0 of 2 is on trigger
> 
> I think my two hosts are still trying to communicate with each other. Any suggestions?
> 
> Best wishes,
> 
> 
> On Fri, May 27, 2011 at 9:42 AM, Dave Goodell <goodell at mcs.anl.gov> wrote:
> Does the "examples/cpi" program from the MPICH2 build directory work correctly for you when you run it on multiple nodes?
> 
> -Dave
> 
> On May 26, 2011, at 5:49 PM CDT, Fujun Liu wrote:
> 
> > Hi everyone,
> >
> > When I try one of the examples from http://beige.ucs.indiana.edu/I590/node62.html, I get the error message below. My MPI cluster has two hosts. If I run both processes on a single host, everything works fine, but if I run them across the two-host cluster, the error occurs. I think the two hosts can't send messages to or receive messages from each other, but I don't know how to resolve this.
> >
> > Thanks in advance!
> >
> > xxxx at query:~/MPI$ mpiexec -n 2 -f machinefile ./GreetMaster
> > Fatal error in PMPI_Bcast: Other MPI error, error stack:
> > PMPI_Bcast(1430).......................: MPI_Bcast(buf=0x7fff13114cb0, count=8192, MPI_CHAR, root=0, MPI_COMM_WORLD) failed
> > MPIR_Bcast_impl(1273)..................:
> > MPIR_Bcast_intra(1107).................:
> > MPIR_Bcast_binomial(143)...............:
> > MPIC_Recv(110).........................:
> > MPIC_Wait(540).........................:
> > MPIDI_CH3I_Progress(353)...............:
> > MPID_nem_mpich2_blocking_recv(905).....:
> > MPID_nem_tcp_connpoll(1823)............:
> > state_commrdy_handler(1665)............:
> > MPID_nem_tcp_recv_handler(1559)........:
> > MPID_nem_handle_pkt(587)...............:
> > MPIDI_CH3_PktHandler_EagerSend(632)....: failure occurred while posting a receive for message data (MPIDI_CH3_PKT_EAGER_SEND)
> > MPIDI_CH3U_Receive_data_unexpected(251): Out of memory (unable to allocate -1216907051 bytes)
> > [mpiexec at query] ONE OF THE PROCESSES TERMINATED BADLY: CLEANING UP
> > APPLICATION TERMINATED WITH THE EXIT STRING: Hangup (signal 1)
> >
> > --
> > Fujun Liu
> > Department of Computer Science, University of Kentucky, 2010.08-
> > fujun.liu at uky.edu, (859)229-3659
> >
> >
> >
> > _______________________________________________
> > mpich-discuss mailing list
> > mpich-discuss at mcs.anl.gov
> > https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss
> 


