[mpich-discuss] Fatal error in PMPI_Reduce: Other MPI error, error stack:

T.R. Sanderson trs38 at cam.ac.uk
Wed Feb 23 10:12:17 CST 2011


Hi Pavan and Nicolas, 

Thank you for your responses. Pavan, the delay persists even if the number 
of ports is lowered, and I can listen on those ports with netcat.
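
For reference, this is roughly what I did to lower the range and test the 
ports (the exact range and port numbers are just the ones I happened to 
pick, and I am going by my reading of the docs that MPICH_PORT_RANGE 
restricts the listener ports):

# restrict MPICH to a small min:max port range before launching
export MPICH_PORT_RANGE=50000:50010

# on node5: listen on one of the ports
# (netcat-traditional wants "nc -l -p 50000" instead)
nc -l 50000

# on node0: connect to it; anything typed here should appear on node5
nc node5.rbc.internal.cam.ac.uk 50000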

>* Does "fine" include having double-checked that, in both directions,
>neither an interactive password nor any other interactive confirmation
>(e.g. yes/no) is ever needed?
Yes, I have set this up; no password or interactive confirmation is needed 
in either direction.
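
Concretely, checks along these lines complete without any prompt in either 
direction (hostnames as in my cluster, run from node0):

# node0 -> node5, and node0 -> node5 -> node0; neither hop should ask anything
ssh node5.rbc.internal.cam.ac.uk hostname
ssh node5.rbc.internal.cam.ac.uk ssh node0.rbc.internal.cam.ac.uk hostname

# BatchMode makes ssh fail rather than prompt, so any hidden prompt shows up
ssh -o BatchMode=yes node5.rbc.internal.cam.ac.uk true && echo ok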
>
>
>* Does your DNS correctly handle reverse lookup of all hosts involved?
>  http://en.wikipedia.org/wiki/Reverse_DNS_lookup
>
>* If you use the "host" command to look up the name of the IP of a
>name, do you end up with exactly the same name you started with? If
>not, what do you get if you look up the IP for that new name?

I have now amended my MPI hosts file to use the fully qualified domain 
names of the nodes, so yes, they do.

trs38@node0:~$ host node5.rbc.internal.cam.ac.uk
node5.rbc.internal.cam.ac.uk has address 192.168.128.205

trs38@node0:~$ host 192.168.128.205
205.128.168.192.in-addr.arpa domain name pointer node5.rbc.internal.cam.ac.uk.
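
To make sure that is not a one-off, a loop like the one below (node numbers 
illustrative, and I am assuming the node<N>.rbc.internal.cam.ac.uk naming 
throughout) confirms that forward and reverse lookups agree for every entry 
in the hosts file:

for n in 0 1 2 3 4 5; do
    fqdn=node$n.rbc.internal.cam.ac.uk
    ip=$(host "$fqdn" | awk '/has address/ {print $4}')
    host "$ip"    # should print exactly $fqdn again
done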

>
>
>> If I add two nodes to my hosts file I receive the error below,
>
>* Your log says you ran mpiexec on node0 while the file contained
>node1 and node2. That involves 3 nodes total, counting the head.
>Correct?
Correct, but that's just because I happened to post that log. Exactly the 
same thing happens if I run mpiexec from either of the nodes involved, or 
if I make the nodes involved node0 and node1.
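
For reference, the hosts file and invocation look essentially like this 
(program name illustrative; I am using hydra's -f flag, an mpd-based setup 
would take -machinefile instead):

trs38@node0:~$ cat hosts
node1.rbc.internal.cam.ac.uk
node2.rbc.internal.cam.ac.uk

trs38@node0:~$ mpiexec -f hosts -n 2 ./a.out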
>
>
>* What about varying the choice of those 2 out of 3?
I have tried quite a variety of combinations; the error is the same each time.
>
>* Is there something special about node0, or just a convention?
Nothing special; it is just a convention.
>
>
>> if I only have one node in the host file it runs happily even if 
>> executed via MPI
>
>* Does this hold for any one node you choose?
Yes, any single node I choose works on its own.
>
>* Does it still hold regardless of whether the node where you run
>mpiexec is or isn't the one you choose to put on the list?
Yes, it works regardless.
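
For completeness, one more check worth recording: launching a non-MPI 
program across the same pair of hosts exercises ssh and the process manager 
but never opens the sockets that MPI_Reduce needs, so it should separate 
launch problems from MPI wire-up problems (same hosts file as above, hydra 
syntax assumed):

mpiexec -f hosts -n 2 hostname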

Many thanks. I'm still scratching my head here, but it's nice to be ruling 
out some possible problems.


