[mpich-discuss] mpiexec woes
Rajeev Thakur
thakur at mcs.anl.gov
Thu Aug 20 13:22:13 CDT 2009
It could be a problem with the network settings on the machines. To
debug, follow the steps outlined in the appendix of the MPICH2
installation guide (they use mpdcheck), and try with a smaller set of
nodes first.
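
A rough sketch of what "a smaller set of nodes" could look like in practice.
The file names mpd.hosts and mpd.hosts.2 and the helper first_hosts are
made up for illustration, and the mpdcheck options in the comments are as I
recall them from the guide's appendix, so please check them against the
guide before relying on them:

```shell
#!/bin/sh
# first_hosts N FILE: print the first N non-blank lines of a hosts file.
# Hypothetical helper; the name and the hosts-file layout (one host per
# line) are assumptions, not something taken from the original report.
first_hosts() {
    grep -v '^[[:space:]]*$' "$2" | head -n "$1"
}

# Possible session, run by hand on the master node (needs the MPICH2 mpd
# tools on PATH; keep whatever --ifhn/--rsh options you normally pass):
#
#   first_hosts 2 mpd.hosts > mpd.hosts.2
#   mpdcheck -f mpd.hosts.2 -l      # check name resolution for those hosts
#   mpdboot -n 2 -f mpd.hosts.2 --rsh=rsh
#   mpiexec -n 2 /bin/hostname
#   mpdallexit
#
# If two nodes work, repeat with 3, 4, ... until mpiexec hangs; the node
# added last is the first one to look at. The appendix also describes a
# pairwise connectivity test: "mpdcheck -s" on one node prints a host and
# port, and "mpdcheck -c <host> <port>" on another node connects to it.
```

Growing the set one node at a time narrows the hang down to a specific
machine instead of all ten at once.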
Rajeev
> -----Original Message-----
> From: mpich-discuss-bounces at mcs.anl.gov
> [mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of Janzen Brewer
> Sent: Thursday, August 20, 2009 8:14 AM
> To: mpich-discuss at mcs.anl.gov
> Subject: [mpich-discuss] mpiexec woes
>
> I'm implementing MPICH2 on a small GPU cluster. It will eventually be
> integrated with Condor 7.2, but for now it's running by itself. The
> cluster is set up such that the command:
>
> $ mpdboot -n 10 --ifhn=192.168.1.100 --rsh=rsh
>
> appears to start the daemon on all nodes. Running mpdtrace returns all
> the nodes' hostnames, and 'mpdringtest 100' runs successfully. However,
> when I try to run anything with mpiexec, the shell hangs indefinitely
> and I have to kill it with mpdallexit from a separate shell. Here's the
> particular command I've been using:
>
>
> $ mpiexec -n 10 /bin/hostname
>
> This command works when the daemon is booted only on the master node
> (i.e. the -n argument is 1 for both commands above). I've lurked around
> but have been unable to find a solution.
>
> Thanks!
> Janzen
>