[mpich-discuss] (no subject)

T.R. Sanderson trs38 at cam.ac.uk
Tue Feb 22 07:32:24 CST 2011


Hi Pavan,

Thanks for the pointer.

Unfortunately none of the 4 points apply: SSH is fine, firewalls are
off, and we use external DNS.  Are there any other possible causes?

Best

Theo


On Tue, Feb 22, 2011 at 12:18 AM, Pavan Balaji <balaji at mcs.anl.gov> wrote:
>
> Here's an entry on the FAQ that describes this:
>
> http://wiki.mcs.anl.gov/mpich2/index.php/Frequently_Asked_Questions#Q:_My_MPI_program_aborts_with_an_error_saying_it_cannot_communicate_with_other_processes
>
>  -- Pavan
>
> On 02/21/2011 06:03 PM, T.R. Sanderson wrote:
>>
>> Hello, I've been testing my MPICH2 installation with cpi and am having
>> some issues. cpi runs fine on either computer. If I add two nodes to my
>> hosts file I receive the error below; if I only have one node in the
>> hosts file it runs happily even if executed via MPI.
>>
>> I would be very grateful for any advice, if you would like the verbose
>> output just let me know.
>>
>> Many thanks,
>> Theo
>>
>> trs38 at node0:~$ mpiexec.hydra -l -n 2 /root/mpich2-1.3.2/examples/cpi
>> [1] Process 1 of 2 is on node2
>> [0] Process 0 of 2 is on node1
>> [0] Fatal error in PMPI_Reduce: Other MPI error, error stack:
>> [0] PMPI_Reduce(1322)...............: MPI_Reduce(sbuf=0x7fffbfe9d028, rbuf=0x7fffbfe9d020, count=1, MPI_DOUBLE, MPI_SUM, root=0, MPI_COMM_WORLD) failed
>> [0] MPIR_Reduce_impl(1139)..........:
>> [0] MPIR_Reduce_intra(947)..........:
>> [0] MPIR_Reduce_binomial(176).......:
>> [0] MPIDI_CH3U_Recvq_FDU_or_AEP(380): Communication error with rank 1
>> [mpiexec at node0] ONE OF THE PROCESSES TERMINATED BADLY: CLEANING UP
>> [proxy:0:1 at node2] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:868): assert (!closed) failed
>> [proxy:0:1 at node2] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
>> [proxy:0:1 at node2] main (./pm/pmiserv/pmip.c:208): demux engine error waiting for event
>> APPLICATION TERMINATED WITH THE EXIT STRING: Hangup (signal 1)
>>
>> _______________________________________________
>> mpich-discuss mailing list
>> mpich-discuss at mcs.anl.gov
>> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss
>
> --
> Pavan Balaji
> http://www.mcs.anl.gov/~balaji
>

