[mpich-discuss] Fatal error in PMPI_Reduce: Other MPI error, error stac

T.R. Sanderson trs38 at cam.ac.uk
Tue Feb 22 07:53:31 CST 2011


Just noticed something rather odd - if I specify a port range using 

export MPICH_PORT_RANGE=50001:59999

the process just hangs after saying:

[1] Process 1 of 2 is on node2

[0] Process 0 of 2 is on node1

Both processes then sit at 100% CPU until I kill the program. Does that 
explain anything? If I don't specify a port range, I get the same error 
message as before.
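
If a firewall between the nodes is the suspect, one way to narrow this down is to check whether a port inside the chosen range is reachable from the other node at all. The following is only a rough sketch, not part of MPICH: the helper names parse_port_range and port_reachable are made up for illustration, and the probe assumes an nc (netcat) build that supports the -z and -w options. A port with nothing listening on it will look closed even without a firewall, so something would need to be listening on node2 for the probe to mean anything.

```shell
# Hypothetical helper: validate a "low:high" range and print "low high".
# Written for POSIX sh, so low and high are plain (global) variables.
parse_port_range() {
    case "$1" in
        # Reject empty input, stray characters, a missing bound, or extra colons.
        ''|*[!0-9:]*|:*|*:|*:*:*)
            echo "invalid port range: $1" >&2
            return 1 ;;
        *:*) ;;   # exactly one colon with digits on both sides
        *)
            echo "invalid port range: $1" >&2
            return 1 ;;
    esac
    low=${1%%:*}
    high=${1##*:}
    if [ "$low" -gt "$high" ]; then
        echo "invalid port range: $1" >&2
        return 1
    fi
    echo "$low $high"
}

# Hypothetical helper: succeed if host ($1) accepts a TCP connection on port ($2).
# Assumes nc supports -z (connect without sending data) and -w (timeout in seconds).
port_reachable() {
    nc -z -w 2 "$1" "$2" >/dev/null 2>&1
}

# Example use on node1, with a listener already running on node2:
#   set -- $(parse_port_range "$MPICH_PORT_RANGE")
#   port_reachable node2 "$1" && echo "port $1 reachable" || echo "port $1 not reachable"
```

If the probe fails for a port that is known to have a listener, that would point at filtering between the nodes rather than at MPICH itself.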

Best

Theo

On Feb 22 2011, Pavan Balaji wrote:

>
>Here's an entry on the FAQ that describes this:
>
>  
> http://wiki.mcs.anl.gov/mpich2/index.php/Frequently_Asked_Questions#Q:_My_MPI_program_aborts_with_an_error_saying_it_cannot_communicate_with_other_processes
>
>  -- Pavan
>
>On 02/21/2011 06:03 PM, T.R. Sanderson wrote:
>> Hello, I've been testing my MPICH2 installation with cpi and am having 
>> some issues. cpi runs fine on either computer. If I list two nodes in 
>> my hosts file, I receive the error below; if I list only one node, it 
>> runs happily even when launched via MPI.
>>
>> I would be very grateful for any advice, if you would like the verbose
>> output just let me know.
>>
>> Many thanks,
>> Theo
>>
>> trs38@node0:~$ mpiexec.hydra -l -n 2 /root/mpich2-1.3.2/examples/cpi
>> [1] Process 1 of 2 is on node2
>> [0] Process 0 of 2 is on node1
>> [0] Fatal error in PMPI_Reduce: Other MPI error, error stack:
>> [0] PMPI_Reduce(1322)...............: MPI_Reduce(sbuf=0x7fffbfe9d028, rbuf=0x7fffbfe9d020, count=1, MPI_DOUBLE, MPI_SUM, root=0, MPI_COMM_WORLD) failed
>> [0] MPIR_Reduce_impl(1139)..........:
>> [0] MPIR_Reduce_intra(947)..........:
>> [0] MPIR_Reduce_binomial(176).......:
>> [0] MPIDI_CH3U_Recvq_FDU_or_AEP(380): Communication error with rank 1
>> [mpiexec@node0] ONE OF THE PROCESSES TERMINATED BADLY: CLEANING UP
>> [proxy:0:1@node2] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:868): assert (!closed) failed
>> [proxy:0:1@node2] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
>> [proxy:0:1@node2] main (./pm/pmiserv/pmip.c:208): demux engine error waiting for event
>> APPLICATION TERMINATED WITH THE EXIT STRING: Hangup (signal 1)
>
>

