[mpich-discuss] mpiexec woes

Rajeev Thakur thakur at mcs.anl.gov
Thu Aug 20 13:22:13 CDT 2009


It could be a problem with the network settings on the machines. To
debug it, you could follow the steps outlined in the appendix of the MPICH2
installation guide (using mpdcheck). Also, try with a smaller set of nodes
first.
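
One network setting worth checking on every node is what the machine's own hostname resolves to. If it maps to a loopback address such as 127.0.0.1 or 127.0.1.1, other nodes may be handed an address that does not lead back to that machine. The following is a minimal sketch that assumes Python 3.9 or later on each node. It is not mpdcheck and does not reproduce mpdcheck's output. The function names are hypothetical.

```python
"""Check what this node's hostname resolves to (hypothetical helper, not mpdcheck)."""
import ipaddress
import socket


def is_loopback(addr: str) -> bool:
    """True for loopback addresses such as 127.0.0.1, 127.0.1.1 or ::1."""
    return ipaddress.ip_address(addr).is_loopback


def check_hostname_resolution() -> bool:
    """Print each IPv4 address the local hostname maps to.

    Returns False if the name does not resolve or resolves to a loopback address.
    """
    name = socket.gethostname()
    try:
        # gethostbyname_ex consults /etc/hosts and DNS; it returns IPv4 addresses only.
        _, _, addrs = socket.gethostbyname_ex(name)
    except OSError as err:  # socket.herror and socket.gaierror subclass OSError
        print(f"{name}: cannot resolve own hostname ({err})")
        return False
    ok = True
    for addr in addrs:
        if is_loopback(addr):
            ok = False
            print(f"{name} -> {addr}  WARNING: loopback, other nodes cannot use this")
        else:
            print(f"{name} -> {addr}")
    return ok


if __name__ == "__main__":
    check_hostname_resolution()
```

Run it on each of the ten nodes (for example over rsh, which mpdboot is already using) and look for a loopback warning. This is only a quick first pass; the full mpdcheck procedure in the guide still applies.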

Rajeev


> -----Original Message-----
> From: mpich-discuss-bounces at mcs.anl.gov 
> [mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of Janzen Brewer
> Sent: Thursday, August 20, 2009 8:14 AM
> To: mpich-discuss at mcs.anl.gov
> Subject: [mpich-discuss] mpiexec woes
> 
> I'm setting up MPICH2 on a small GPU cluster. It will eventually be 
> integrated with Condor 7.2, but for now it's running by itself. The 
> cluster is set up such that the command:
> 
> $ mpdboot -n 10 --ifhn=192.168.1.100 --rsh=rsh
> 
> appears to start the daemon on all nodes. Running mpdtrace returns all
> the nodes' hostnames, and 'mpdringtest 100' runs successfully. However,
> when I try to run anything with mpiexec, the shell hangs indefinitely
> and I have to kill it with mpdallexit from a separate shell. Here's the
> particular command I've been using:
> 
> 
> $ mpiexec -n 10 /bin/hostname
> 
> This command works when the daemon is only booted on the master node
> (i.e., the -n argument is 1 for both commands above). I've lurked
> around but have been unable to find the solution.
> 
> Thanks!
> Janzen
> 
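
A passing mpdringtest shows only that each daemon can reach its neighbour in the ring. mpiexec may also rely on other connections, such as remote processes reporting back to the node where mpiexec was started. A pair of nodes that cannot open TCP connections to each other directly, for example because of a host firewall, can therefore go unnoticed until mpiexec runs.

The sketch below checks one direction between two nodes, in the same spirit as the pairwise client/server test that mpdcheck offers. It assumes Python 3.9 or later. The script name `tcpcheck.py`, its `server`/`client` modes, and the helper functions are all hypothetical.

```python
"""Pairwise TCP reachability check between two nodes (hypothetical tcpcheck.py)."""
import socket
import sys


def open_listener(host: str = "", port: int = 0) -> socket.socket:
    """Bind a listening TCP socket; port 0 lets the OS pick a free port."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind((host, port))
    srv.listen(1)
    return srv


def accept_once(srv: socket.socket, timeout: float = 30.0) -> str:
    """Wait for one client and return its IP address, or '' on timeout/error."""
    srv.settimeout(timeout)
    try:
        conn, (peer, _) = srv.accept()
    except OSError:  # socket.timeout is an OSError subclass
        return ""
    conn.close()
    return peer


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def main(argv: list[str]) -> int:
    if len(argv) == 2 and argv[1] == "server":
        srv = open_listener()
        print(f"listening on {socket.gethostname()} port {srv.getsockname()[1]}")
        peer = accept_once(srv)
        srv.close()
        print(f"connection from {peer}" if peer else "no connection before timeout")
        return 0 if peer else 1
    if len(argv) == 4 and argv[1] == "client":
        ok = can_connect(argv[2], int(argv[3]))
        print("connected" if ok else "connection failed")
        return 0 if ok else 1
    print(f"usage: {argv[0]} server | client HOST PORT")
    return 2


# Only act when given arguments, so importing or running the file bare does nothing.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv))
```

Run `python3 tcpcheck.py server` on one node. Then, on another node, run `python3 tcpcheck.py client <host> <port>` using the port the server printed. Repeat with the roles swapped, and include the master node in the pairs.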


