[mpich-discuss] Hydra unable to execute jobs that use more than one node(host) under PBS RMK

Pavan Balaji balaji at mcs.anl.gov
Tue Jan 26 09:51:08 CST 2010


Mario,

This is good information. Yes, it shouldn't matter which process does
the ssh, and yes it is possible that closing stdin is the culprit. Would
you be willing to try out the trunk version of Hydra, which has a bunch
of fixes in this area?

http://www.mcs.anl.gov/research/projects/mpich2/downloads/tarballs/nightly/hydra

Note that the trunk has a few critical bugs that I'm working on right
now, so these nightly tarballs are only meant for testing, and not for
production use.
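
As a rough illustration of the closed-stdin hypothesis above (this is not
Hydra code; the child script and the helper name first_fd_in_child are made
up for the example): POSIX hands out the lowest-numbered free descriptor, so
a process that starts without fd 0 will get fd 0 for whatever it opens
first. That would be consistent with the lsof listing quoted below, where
fd 0 of the hanging ssh is its own TCP connection instead of a pipe. A
minimal sketch in Python, assuming a POSIX system:

```python
import subprocess
import sys

# Hypothetical child script: optionally close stdin, then open a pipe and
# print the descriptor number the kernel assigned to the pipe's read end.
CHILD_SCRIPT = """
import os, sys
if sys.argv[1] == "close-stdin":
    os.close(0)
read_end, write_end = os.pipe()
print(read_end)
"""


def first_fd_in_child(close_stdin: bool) -> int:
    """Run the child script and return the fd of the first pipe it opened."""
    mode = "close-stdin" if close_stdin else "keep-stdin"
    result = subprocess.run(
        [sys.executable, "-c", CHILD_SCRIPT, mode],
        stdin=subprocess.DEVNULL,   # give the child a valid fd 0 to start with
        capture_output=True,        # fds 1 and 2 become pipes to the parent
        text=True,
        check=True,
    )
    return int(result.stdout.strip())


if __name__ == "__main__":
    # With stdin closed, the new pipe lands on fd 0 and masquerades as stdin.
    print("stdin closed:", first_fd_in_child(True))
    # With stdin left open, fds 0-2 are taken, so the pipe gets a higher number.
    print("stdin open:  ", first_fd_in_child(False))
```

With stdin closed, the child should report descriptor 0; with stdin left
open it reports a descriptor above 2. A launcher can avoid the first case by
always giving each child some valid fd 0 (a pipe or /dev/null) rather than
closing it.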

 -- Pavan

On 01/21/2010 09:17 PM, Mário Costa wrote:
> Hi again,
> 
> I found out the problem comes up only on some specific ssh versions
> (in my case OpenSSH_4.2p1, OpenSSL 0.9.8a 11 Oct 2005), and it depends
> on the order of the processes in the command.
> 
> If I test
> 
> 1. mpiexec.hydra -bootstrap fork -n 1 /bin/true : ssh gorgon002 hostname
> I get the problem: it hangs and reports the following on stderr:
> 
> bad fd
> ssh_keysign: no reply
> key_sign failed
> 
> After some googling I found this
> (http://l-sourcemotel.gsfc.nasa.gov/pipermail/test-proj-1-commits/2006-March/000350.html),
> which looks like the problem I have.
> 
> 2. mpiexec.hydra -bootstrap fork -n 1 ssh gorgon002 hostname : /bin/true
> 
> Works fine!
> 
> Shouldn't it behave the same regardless of the order? Could you be
> closing the stdin (or changing it) of the second executable
> prematurely?
> 
> I've replaced hostname with sleep 5m so that I could list the open
> files with lsof; compare the differences:
> 
> 1. mpiexec.hydra -bootstrap fork -n 1 /bin/true : ssh gorgon002 sleep 5m
> 
> $>ps auxf
> mjscosta 27653  0.0  0.0  10100  2416 pts/1    Ss   01:00   0:00  |
>    \_ -bash
> mjscosta 28825  0.0  0.0   6076   704 pts/1    S+   02:59   0:00  |
>        \_ mpiexec.hydra -bootstrap fork -n 1 /bin/true : ssh gorgon002
> sleep 5m
> mjscosta 28826  0.0  0.0   6220   756 pts/1    S+   02:59   0:00  |
>            \_ /usr/bin/pmi_proxy --launch-mode 1 --proxy-port
> gorgon001 49063 --bootstrap fork --proxy-id 0
> mjscosta 28827  0.0  0.0      0     0 pts/1    Z+   02:59   0:00  |
>                \_ [true] <defunct>
> mjscosta 28828  0.0  0.0  24064  2500 pts/1    S+   02:59   0:00  |
>                \_ ssh gorgon002 sleep 5m
> mjscosta 28829  0.0  0.0      0     0 pts/1    Z+   02:59   0:00  |
>                    \_ [ssh-keysign] <defunct>
> 
> $>lsof 28828
> COMMAND   PID     USER   FD   TYPE  DEVICE    SIZE      NODE NAME
> ...
> ssh     28828 mjscosta    0u  IPv4 1839838               TCP
> gorgon001.lnec.pt:50989->gorgon002.lnec.pt:ssh (ESTABLISHED) <<
> ssh     28828 mjscosta    1w  FIFO     0,6           1839832 pipe
> ssh     28828 mjscosta    2w  FIFO     0,6           1839833 pipe
> ssh     28828 mjscosta    3u  IPv4 1839820               TCP *:58133 (LISTEN)
> ssh     28828 mjscosta    4u  IPv4 1839821               TCP *:49063 (LISTEN)
> ssh     28828 mjscosta    5r  FIFO     0,6           1839822 pipe
> ssh     28828 mjscosta    6u  IPv4 1839827               TCP
> gorgon001.lnec.pt:51911->gorgon001.lnec.pt:49063 (ESTABLISHED)
> ssh     28828 mjscosta    7u  IPv4 1839838               TCP
> gorgon001.lnec.pt:50989->gorgon002.lnec.pt:ssh (ESTABLISHED)
> ssh     28828 mjscosta    8w  FIFO     0,6           1839823 pipe
> ssh     28828 mjscosta    9w  FIFO     0,6           1839829 pipe
> ssh     28828 mjscosta   10w  FIFO     0,6           1839824 pipe
> ssh     28828 mjscosta   11r  FIFO     0,6           1839830 pipe
> ssh     28828 mjscosta   12w  FIFO     0,6           1839832 pipe
> ssh     28828 mjscosta   13r  FIFO     0,6           1839831 pipe
> ssh     28828 mjscosta   14w  FIFO     0,6           1839839 pipe
> ssh     28828 mjscosta   15w  FIFO     0,6           1839833 pipe
> ssh     28828 mjscosta   16r  FIFO     0,6           1839840 pipe
> ssh     28828 mjscosta   17w  FIFO     0,6           1839832 pipe
> ssh     28828 mjscosta   18w  FIFO     0,6           1839833 pipe
> 
> 2. mpiexec.hydra -bootstrap fork -n 1 ssh gorgon002 sleep 5m : /bin/true
> 
> $>ps auxf
> mjscosta 27653  0.0  0.0  10100  2416 pts/1    Ss   01:00   0:00  |
>    \_ -bash
> mjscosta 28870  0.0  0.0   6072   704 pts/1    S+   03:03   0:00  |
>        \_ mpiexec.hydra -bootstrap fork -n 1 ssh gorgon002 sleep 5m :
> /bin/true
> mjscosta 28871  0.0  0.0   6216   756 pts/1    S+   03:03   0:00  |
>            \_ /usr/bin/pmi_proxy --launch-mode 1 --proxy-port
> gorgon001 44391 --bootstrap fork --proxy-id 0
> mjscosta 28872  0.4  0.0  24064  2504 pts/1    S+   03:03   0:00  |
>                \_ ssh gorgon002 sleep 5m
> mjscosta 28873  0.0  0.0      0     0 pts/1    Z+   03:03   0:00  |
>                \_ [true] <defunct>
> 
> $>lsof 28872
> COMMAND   PID     USER   FD   TYPE  DEVICE    SIZE      NODE NAME
> ...
> ssh     28872 mjscosta    0r  FIFO     0,6           1839988 pipe <<
> ssh     28872 mjscosta    1w  FIFO     0,6           1839989 pipe
> ssh     28872 mjscosta    2w  FIFO     0,6           1839990 pipe
> ssh     28872 mjscosta    3u  IPv4 1839979               TCP *:41804 (LISTEN)
> ssh     28872 mjscosta    4u  IPv4 1839980               TCP *:44391 (LISTEN)
> ssh     28872 mjscosta    5r  FIFO     0,6           1839981 pipe
> ssh     28872 mjscosta    6u  IPv4 1839986               TCP
> gorgon001.lnec.pt:45713->gorgon001.lnec.pt:44391 (ESTABLISHED)
> ssh     28872 mjscosta    7r  FIFO     0,6           1839988 pipe
> ssh     28872 mjscosta    8w  FIFO     0,6           1839982 pipe
> ssh     28872 mjscosta    9u  IPv4 1839997               TCP
> gorgon001.lnec.pt:58955->gorgon002.lnec.pt:ssh (ESTABLISHED)
> ssh     28872 mjscosta   10w  FIFO     0,6           1839983 pipe
> ssh     28872 mjscosta   12w  FIFO     0,6           1839989 pipe
> ssh     28872 mjscosta   13w  FIFO     0,6           1839989 pipe
> ssh     28872 mjscosta   14w  FIFO     0,6           1839990 pipe
> ssh     28872 mjscosta   15w  FIFO     0,6           1839990 pipe
> 
> Anyway, it can be solved by updating to a more recent ssh version, which
> is why you can't reproduce it. Nonetheless, there is something in
> mpiexec.hydra that makes it work or fail depending on the order in
> which the commands are given.
> 
> Let me know what you think about this...
> 
> Thanks and Regards,
> 
> 2010/1/17 Mário Costa <mario.silva.costa at gmail.com>:
>> 2010/1/17 Pavan Balaji <balaji at mcs.anl.gov>:
>>> On 01/16/2010 07:13 PM, Mário Costa wrote:
>>>> I have one question: does mpiexec.hydra aggregate the outputs from all
>>>> launched MPI processes?
>>> Yes.
>>>
>>>> I think it might hang waiting for output from ssh that, for some
>>>> reason, never arrives. Could this be the case?
>>> Yes, that's my guess too. This behavior is also possible if the MPI
>>> processes hang. But an ssh problem seems more likely in this case. In
>>> the previous email, when you tried a non-MPI program, did it hang as well?
>> Yes, the same, in a deterministic way ...
>>> % mpiexec.hydra -rmk pbs hostname
>>>
>>>> Here we use LDAP on the nodes of the cluster. I've read something
>>>> about ssh processes becoming defunct due to LDAP ...
>>> Hmm.. This keeps getting more and more interesting :-).
>>>
>>>  -- Pavan
>>>
>>> --
>>> Pavan Balaji
>>> http://www.mcs.anl.gov/~balaji
>>>
>>
>>
>> --
>> Mário Costa
>>
>> Laboratório Nacional de Engenharia Civil
>> LNEC.CTI.NTIEC
>> Avenida do Brasil 101
>> 1700-066 Lisboa, Portugal
>> Tel : ++351 21 844 3911
>>

-- 
Pavan Balaji
http://www.mcs.anl.gov/~balaji

