[mpich-discuss] mpiexec woes

Janzen Brewer janzen.brewer at gtri.gatech.edu
Tue Aug 25 10:18:02 CDT 2009


I performed the steps in the troubleshooting guide and was able to get 
two nodes to handshake (with mpdcheck -s / -c), but the test for running 
/bin/hostname locally AND remotely failed. The command simply hung and 
didn't produce any output. I was able to use mpdlistjobs and mpdkilljob 
on one of the slave nodes to kill the job, so the nodes can obviously 
communicate.
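
One networking misconfiguration that could produce a hang like this is a node
whose own hostname resolves to a loopback address, or whose address does not
map back to its name. Whether that is the cause on this cluster is not
established. The sketch below is a rough, standalone name-resolution check in
Python. It is not part of MPICH2 and does not replace mpdcheck; the function
name check_resolution and the example hostnames are hypothetical.

```python
import socket


def check_resolution(hostname,
                     resolve=socket.gethostbyname,
                     reverse=socket.gethostbyaddr):
    """Return a list of human-readable problems found for `hostname`.

    `resolve` and `reverse` default to the system resolver; they are
    parameters only so the logic can be exercised without a network.
    """
    problems = []

    # Forward lookup: name -> IPv4 address.
    try:
        addr = resolve(hostname)
    except OSError as exc:
        return [f"{hostname}: forward lookup failed ({exc})"]

    # A loopback address is unreachable from the other nodes.
    if addr.startswith("127."):
        problems.append(f"{hostname} resolves to loopback address {addr}")

    # Reverse lookup: address -> (canonical name, aliases, addresses).
    try:
        canonical, aliases, _ = reverse(addr)
    except OSError as exc:
        problems.append(f"{addr}: reverse lookup failed ({exc})")
        return problems

    # Accept a match on either the full name or the short host part.
    short = hostname.split(".")[0]
    names = {canonical, *aliases}
    if not any(n == hostname or n.split(".")[0] == short for n in names):
        problems.append(f"{addr} maps back to {canonical}, not {hostname}")

    return problems


if __name__ == "__main__":
    import sys

    # Check the names given on the command line, or this machine's own name.
    for host in sys.argv[1:] or [socket.gethostname()]:
        issues = check_resolution(host)
        print(host, "OK" if not issues else "; ".join(issues))
```

Running it on each node with that node's own hostname, and then with the
master's hostname, would show whether every machine sees the same addresses.
An empty problem list only means the lookups are consistent; it says nothing
about firewalls or blocked ports.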

Janzen

Rajeev Thakur wrote:
> It could be a problem with the networking settings on the machines. To
> debug, you could follow the steps outlined in the Appendix of the MPICH2
> installation guide (using mpdcheck). And try with a smaller set of nodes
> first.
>
> Rajeev
>
>> -----Original Message-----
>> From: mpich-discuss-bounces at mcs.anl.gov 
>> [mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of Janzen Brewer
>> Sent: Thursday, August 20, 2009 8:14 AM
>> To: mpich-discuss at mcs.anl.gov
>> Subject: [mpich-discuss] mpiexec woes
>>
>> I'm implementing MPICH2 on a small GPU cluster. It will eventually be 
>> integrated with Condor 7.2, but for now it's running by itself. The 
>> cluster is set up such that the command:
>>
>> $ mpdboot -n 10 --ifhn=192.168.1.100 --rsh=rsh
>>
>> appears to start the daemon on all nodes. Running mpdtrace returns all
>> the nodes' hostnames, and 'mpdringtest 100' runs successfully. However,
>> when I try to run anything with mpiexec, the shell hangs indefinitely
>> and I have to kill it with mpdallexit from a separate shell. Here's the
>> particular command I've been using:
>>
>>
>> $ mpiexec -n 10 /bin/hostname
>>
>> This command works when the daemon is only booted on the master node
>> (i.e. the -n argument is 1 for both commands above). I've lurked around
>> but have been unable to find the solution.
>>
>> Thanks!
>> Janzen


