[mpich-discuss] mpiexec woes
Janzen Brewer
janzen.brewer at gtri.gatech.edu
Tue Aug 25 10:18:02 CDT 2009
I performed the steps in the troubleshooting guide and was able to get
two nodes to handshake (with mpdcheck -s / -c), but the test that runs
/bin/hostname both locally and remotely failed: the command hung without
producing any output. I was able to use mpdlistjobs and mpdkilljob
on one of the slave nodes to kill the job, so the nodes can obviously
communicate.
Janzen
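
One way to keep a hanging test like this from tying up the shell is to
wrap the launch in a time limit. The following is only a minimal sketch.
It assumes GNU coreutils timeout is installed on the master node, and
run_with_limit is a hypothetical helper name, not part of MPICH2.

```shell
#!/bin/sh
# Hypothetical helper (not part of MPICH2): run a command under a time
# limit so that a hung launch returns control to the shell.
# Usage: run_with_limit SECONDS COMMAND [ARGS...]
run_with_limit() {
    limit=$1
    shift
    # Assumes timeout(1) from GNU coreutils, which exits with status 124
    # when the limit expires before the command finishes.
    timeout "$limit" "$@"
    status=$?
    if [ "$status" -eq 124 ]; then
        echo "timed out after ${limit}s: $*" >&2
    fi
    return "$status"
}
```

For example, "run_with_limit 30 mpiexec -n 1 /bin/hostname" could be tried
first, then repeated with a larger -n. A return status of 124 would
indicate the hang described above. The job may still be registered with
the mpd ring afterwards, in which case mpdlistjobs and mpdkilljob remain
the way to clean it up.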
Rajeev Thakur wrote:
> It could be a problem with the networking settings on the machines. To
> debug, you could follow the steps outlined in the Appendix of the MPICH2
> installation guide (using mpdcheck). And try with a smaller set of nodes
> first.
>
> Rajeev
>
>
>
>> -----Original Message-----
>> From: mpich-discuss-bounces at mcs.anl.gov
>> [mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of Janzen Brewer
>> Sent: Thursday, August 20, 2009 8:14 AM
>> To: mpich-discuss at mcs.anl.gov
>> Subject: [mpich-discuss] mpiexec woes
>>
>> I'm implementing MPICH2 on a small GPU cluster. It will eventually be
>> integrated with Condor 7.2, but for now it's running by itself. The
>> cluster is set up such that the command:
>>
>> $ mpdboot -n 10 --ifhn=192.168.1.100 --rsh=rsh
>>
>> appears to start the daemon on all nodes. Running mpdtrace returns all
>> the nodes' hostnames and 'mpdringtest 100' runs successfully. However,
>> when I try to run anything with mpiexec, the shell hangs indefinitely
>> and I have to kill it with mpdallexit from a separate shell. Here's the
>> particular command I've been using:
>>
>>
>> $ mpiexec -n 10 /bin/hostname
>>
>> This command works when the daemon is only booted on the master node
>> (i.e. -n argument is 1 for both commands above). I've lurked around but
>> have been unable to find the solution.
>>
>> Thanks!
>> Janzen
>>
>>
>
>