[mpich-discuss] Hydra unable to execute jobs that use more than one node(host) under PBS RMK

Mário Costa mario.silva.costa at gmail.com
Thu Jan 21 21:17:07 CST 2010


Hi again,

I found out that the problem only shows up with some specific ssh
versions (in my case OpenSSH_4.2p1, OpenSSL 0.9.8a 11 Oct 2005), and
that it depends on the order of the executables in the command.

If I test

1. mpiexec.hydra -bootstrap fork -n 1 /bin/true : ssh gorgon002 hostname

I get the problem: it hangs and reports to stderr

bad fd
ssh_keysign: no reply
key_sign failed

After some googling I found this
(http://l-sourcemotel.gsfc.nasa.gov/pipermail/test-proj-1-commits/2006-March/000350.html),
which looks like the problem I have.
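
For context, ssh-keysign is the setuid helper ssh uses for host-based
authentication; ssh passes it the number of the file descriptor that
holds the connection to the remote host. If I remember the OpenSSH
sources correctly, the helper refuses to use that descriptor when it
collides with stdin or stdout, which would produce exactly the "bad fd"
above. A rough stand-alone paraphrase of that check (from memory, not
verbatim OpenSSH code):

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

static void fatal(const char *msg)
{
    fprintf(stderr, "%s\n", msg);
    exit(1);
}

/* conn_fd is the descriptor number ssh tells the helper the connection
   socket lives on */
static void check_connection_fd(int conn_fd)
{
    if (conn_fd < 0 || conn_fd == STDIN_FILENO || conn_fd == STDOUT_FILENO)
        fatal("bad fd");   /* helper exits; ssh then reports
                              "ssh_keysign: no reply" and
                              "key_sign failed"            */
}

int main(void)
{
    check_connection_fd(0);   /* what happens if the socket ended up on fd 0 */
    return 0;
}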

2. mpiexec.hydra -bootstrap fork -n 1 ssh gorgon002 hostname : /bin/true

Works fine!

Shouldn't it behave the same regardless of the order? Could you be
closing (or redirecting) the stdin of the second exec'd process
prematurely?
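
To make my suspicion concrete, here is a sketch of what I think happens
(my hypothesis only, not hydra or OpenSSH code): if the launcher closes
fd 0 before exec'ing the second command, the first descriptor that
command creates (in ssh's case the TCP connection to the remote host)
gets number 0, and the ssh-keysign check above rejects it. I believe
newer OpenSSH makes sure fds 0-2 are open (pointing them at /dev/null)
before doing anything else, which would explain why a newer ssh does not
show the problem.

/* Hypothetical demonstration: close stdin, then open a TCP socket and
   see which descriptor number it gets. */
#include <stdio.h>
#include <unistd.h>
#include <sys/socket.h>

int main(void)
{
    close(STDIN_FILENO);                  /* what I suspect happens to the
                                             second ":"-separated command */
    int s = socket(AF_INET, SOCK_STREAM, 0);
    printf("new socket got fd %d\n", s);  /* prints 0 when stdin was closed */
    return 0;
}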

I replaced hostname with sleep 5m so I could inspect the open files via
lsof; note the difference:

1. mpiexec.hydra -bootstrap fork -n 1 /bin/true : ssh gorgon002 sleep 5m

$>ps auxf
mjscosta 27653  0.0  0.0  10100  2416 pts/1    Ss   01:00   0:00  |
   \_ -bash
mjscosta 28825  0.0  0.0   6076   704 pts/1    S+   02:59   0:00  |
       \_ mpiexec.hydra -bootstrap fork -n 1 /bin/true : ssh gorgon002
sleep 5m
mjscosta 28826  0.0  0.0   6220   756 pts/1    S+   02:59   0:00  |
           \_ /usr/bin/pmi_proxy --launch-mode 1 --proxy-port
gorgon001 49063 --bootstrap fork --proxy-id 0
mjscosta 28827  0.0  0.0      0     0 pts/1    Z+   02:59   0:00  |
               \_ [true] <defunct>
mjscosta 28828  0.0  0.0  24064  2500 pts/1    S+   02:59   0:00  |
               \_ ssh gorgon002 sleep 5m
mjscosta 28829  0.0  0.0      0     0 pts/1    Z+   02:59   0:00  |
                   \_ [ssh-keysign] <defunct>

$>lsof -p 28828
COMMAND   PID     USER   FD   TYPE  DEVICE    SIZE      NODE NAME
...
ssh     28828 mjscosta    0u  IPv4 1839838               TCP
gorgon001.lnec.pt:50989->gorgon002.lnec.pt:ssh (ESTABLISHED) <<
ssh     28828 mjscosta    1w  FIFO     0,6           1839832 pipe
ssh     28828 mjscosta    2w  FIFO     0,6           1839833 pipe
ssh     28828 mjscosta    3u  IPv4 1839820               TCP *:58133 (LISTEN)
ssh     28828 mjscosta    4u  IPv4 1839821               TCP *:49063 (LISTEN)
ssh     28828 mjscosta    5r  FIFO     0,6           1839822 pipe
ssh     28828 mjscosta    6u  IPv4 1839827               TCP
gorgon001.lnec.pt:51911->gorgon001.lnec.pt:49063 (ESTABLISHED)
ssh     28828 mjscosta    7u  IPv4 1839838               TCP
gorgon001.lnec.pt:50989->gorgon002.lnec.pt:ssh (ESTABLISHED)
ssh     28828 mjscosta    8w  FIFO     0,6           1839823 pipe
ssh     28828 mjscosta    9w  FIFO     0,6           1839829 pipe
ssh     28828 mjscosta   10w  FIFO     0,6           1839824 pipe
ssh     28828 mjscosta   11r  FIFO     0,6           1839830 pipe
ssh     28828 mjscosta   12w  FIFO     0,6           1839832 pipe
ssh     28828 mjscosta   13r  FIFO     0,6           1839831 pipe
ssh     28828 mjscosta   14w  FIFO     0,6           1839839 pipe
ssh     28828 mjscosta   15w  FIFO     0,6           1839833 pipe
ssh     28828 mjscosta   16r  FIFO     0,6           1839840 pipe
ssh     28828 mjscosta   17w  FIFO     0,6           1839832 pipe
ssh     28828 mjscosta   18w  FIFO     0,6           1839833 pipe

2. mpiexec.hydra -bootstrap fork -n 1 ssh gorgon002 sleep 5m : /bin/true

$>ps auxf
mjscosta 27653  0.0  0.0  10100  2416 pts/1    Ss   01:00   0:00  |
   \_ -bash
mjscosta 28870  0.0  0.0   6072   704 pts/1    S+   03:03   0:00  |
       \_ mpiexec.hydra -bootstrap fork -n 1 ssh gorgon002 sleep 5m :
/bin/true
mjscosta 28871  0.0  0.0   6216   756 pts/1    S+   03:03   0:00  |
           \_ /usr/bin/pmi_proxy --launch-mode 1 --proxy-port
gorgon001 44391 --bootstrap fork --proxy-id 0
mjscosta 28872  0.4  0.0  24064  2504 pts/1    S+   03:03   0:00  |
               \_ ssh gorgon002 sleep 5m
mjscosta 28873  0.0  0.0      0     0 pts/1    Z+   03:03   0:00  |
               \_ [true] <defunct>

$>lsof -p 28872
COMMAND   PID     USER   FD   TYPE  DEVICE    SIZE      NODE NAME
...
ssh     28872 mjscosta    0r  FIFO     0,6           1839988 pipe <<
ssh     28872 mjscosta    1w  FIFO     0,6           1839989 pipe
ssh     28872 mjscosta    2w  FIFO     0,6           1839990 pipe
ssh     28872 mjscosta    3u  IPv4 1839979               TCP *:41804 (LISTEN)
ssh     28872 mjscosta    4u  IPv4 1839980               TCP *:44391 (LISTEN)
ssh     28872 mjscosta    5r  FIFO     0,6           1839981 pipe
ssh     28872 mjscosta    6u  IPv4 1839986               TCP
gorgon001.lnec.pt:45713->gorgon001.lnec.pt:44391 (ESTABLISHED)
ssh     28872 mjscosta    7r  FIFO     0,6           1839988 pipe
ssh     28872 mjscosta    8w  FIFO     0,6           1839982 pipe
ssh     28872 mjscosta    9u  IPv4 1839997               TCP
gorgon001.lnec.pt:58955->gorgon002.lnec.pt:ssh (ESTABLISHED)
ssh     28872 mjscosta   10w  FIFO     0,6           1839983 pipe
ssh     28872 mjscosta   12w  FIFO     0,6           1839989 pipe
ssh     28872 mjscosta   13w  FIFO     0,6           1839989 pipe
ssh     28872 mjscosta   14w  FIFO     0,6           1839990 pipe
ssh     28872 mjscosta   15w  FIFO     0,6           1839990 pipe

The lines marked << show the difference: in case 1 ssh's stdin (fd 0)
ends up being the TCP connection socket itself, while in case 2 it is a
pipe set up by pmi_proxy.

Anyway, the problem can be worked around by updating to a more recent
ssh version, which is probably why you can't reproduce it; nonetheless,
there is something in mpiexec.hydra that makes the behaviour depend on
the order in which the executables appear on the command line.

Let me know what you think about this...

Thanks and Regards,

2010/1/17 Mário Costa <mario.silva.costa at gmail.com>:
> 2010/1/17 Pavan Balaji <balaji at mcs.anl.gov>:
>>
>> On 01/16/2010 07:13 PM, Mário Costa wrote:
>>> I have one question: does mpiexec.hydra aggregate the outputs from all
>>> launched MPI processes?
>>
>> Yes.
>>
>>> I think it might hang waiting for the output of ssh, which for some
>>> reason never arrives; could this be the case?
>>
>> Yes, that's my guess too. This behavior is also possible if the MPI
>> processes hang. But an ssh problem seems more likely in this case. In
>> the previous email, when you tried a non-MPI program, did it hang as well?
>
> Yes, the same, in a deterministic way ...
>>
>> % mpiexec.hydra -rmk pbs hostname
>>
>>> Here we use LDAP on the nodes of the cluster; I've read something
>>> about ssh processes becoming defunct because of LDAP ...
>>
>> Hmm.. This keeps getting more and more interesting :-).
>>
>>  -- Pavan
>>
>> --
>> Pavan Balaji
>> http://www.mcs.anl.gov/~balaji
>>
>
>
>
> --
> Mário Costa
>
> Laboratório Nacional de Engenharia Civil
> LNEC.CTI.NTIEC
> Avenida do Brasil 101
> 1700-066 Lisboa, Portugal
> Tel : ++351 21 844 3911
>
-- 
Mário Costa

