[mpich2-dev] mpich2-1.4.1 communication error.

Dave Goodell goodell at mcs.anl.gov
Wed Jun 20 10:32:42 CDT 2012


[mpich-discuss at mcs.anl.gov would be a more appropriate list for these sorts of questions]

You said that you turned off the firewall, but that's usually the main culprit with these errors.  Please double check this on both machines.
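One way to double check is to test whether an arbitrary TCP port on one machine can be reached from the other, since MPI processes connect to each other directly on ports other than ssh. The script below is only a rough sketch: the script name, the helper names, and the port number 50000 in the usage note are made up for illustration, and a successful probe of one port does not prove that every port is open.

```python
import socket
import sys


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def listen_once(port):
    """Accept a single TCP connection on the given port, then return."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
        srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        srv.bind(("", port))  # listen on all interfaces
        srv.listen(1)
        conn, peer = srv.accept()
        conn.close()
        print("accepted connection from", peer[0])


# Does nothing unless a mode and its arguments are given on the command line.
if __name__ == "__main__" and len(sys.argv) >= 3:
    if sys.argv[1] == "listen":
        listen_once(int(sys.argv[2]))
    elif sys.argv[1] == "probe" and len(sys.argv) >= 4:
        ok = can_connect(sys.argv[2], int(sys.argv[3]))
        print("reachable" if ok else "blocked or unreachable")
```

For example, run "python3 portcheck.py listen 50000" on lianjie2 and "python3 portcheck.py probe lianjie2 50000" on the other machine, then repeat in the opposite direction.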

Another possibility is a DNS configuration problem.  Make sure that "host zhang-Lenovo-IdeaPad-Y470" and "host lianjie2" yield the same results on both machines.
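If the "host" command is not convenient, roughly the same comparison can be made with the system resolver, which also consults /etc/hosts. The script below is a minimal sketch, not an official MPICH tool; the script and function names are my own. It prints the IPv4 addresses each name resolves to and flags loopback addresses, on the assumption that a machine name resolving to a 127.x.x.x address would not be reachable from the other machine.

```python
import ipaddress
import socket
import sys


def resolve_ipv4(name):
    """Return the sorted IPv4 addresses that `name` resolves to ([] on failure)."""
    try:
        infos = socket.getaddrinfo(name, None, socket.AF_INET, socket.SOCK_STREAM)
    except socket.gaierror:
        return []
    # Each entry is (family, type, proto, canonname, (address, port)).
    return sorted({info[4][0] for info in infos})


def loopback_addresses(addresses):
    """Return the subset of `addresses` that are loopback (127.0.0.0/8)."""
    return [a for a in addresses if ipaddress.ip_address(a).is_loopback]


if __name__ == "__main__":
    # Host names are taken from the command line; nothing runs without them.
    for name in sys.argv[1:]:
        addrs = resolve_ipv4(name)
        print(name, "->", ", ".join(addrs) if addrs else "does not resolve")
        for bad in loopback_addresses(addrs):
            print("  warning:", bad, "is a loopback address")
```

Run it on both machines, for example "python3 checkhosts.py zhang-Lenovo-IdeaPad-Y470 lianjie2", and compare the output. Both names should map to the same 192.168.48.xxx addresses on each machine.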

-Dave

On Jun 20, 2012, at 6:18 AM CDT, 张磊 wrote:

> I've set up two Ubuntu 12.04 computers with mpich2-1.4.1p1, and the Hello World program works well on one machine or on both machines together.
> But I get an error when running the cpi example. Both machines can ping each other and ssh to each other without a password, and I also turned off the firewall (I've checked with ufw status). However, when I run the cpi example, I get the output below:
> 
> zhang at zhang-Lenovo-IdeaPad-Y470:~/test$ mpiexec -f hosts -np 2 ./cpi
> Process 0 of 2 is on zhang-Lenovo-IdeaPad-Y470
> Process 1 of 2 is on lianjie2
> Fatal error in PMPI_Reduce: Other MPI error, error stack:
> PMPI_Reduce(1270)...............: MPI_Reduce(sbuf=0xbf8f9b18, rbuf=0xbf8f9b20, count=1, MPI_DOUBLE, MPI_SUM, root=0, MPI_COMM_WORLD) failed
> MPIR_Reduce_impl(1087)..........: 
> MPIR_Reduce_intra(895)..........: 
> MPIR_Reduce_binomial(144).......: 
> MPIDI_CH3U_Recvq_FDU_or_AEP(380): Communication error with rank 1
> 
> [mpiexec at zhang-Lenovo-IdeaPad-Y470] control_cb (./pm/pmiserv/pmiserv_cb.c:321): assert (!closed) failed
> [mpiexec at zhang-Lenovo-IdeaPad-Y470] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
> [mpiexec at zhang-Lenovo-IdeaPad-Y470] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:181): error waiting for event
> [mpiexec at zhang-Lenovo-IdeaPad-Y470] main (./ui/mpich/mpiexec.c:405): process manager error waiting for completion
> The two computers are on a private network, 192.168.48.xxx. I would very much appreciate it if somebody could give me a hint on how to cope with this problem.
> 
> Thanks very much in advance for any tip.


