[mpich-discuss] Hydra unable to execute jobs that use more than one node(host) under PBS RMK

Mário Costa mario.silva.costa at gmail.com
Sun Jan 17 06:47:52 CST 2010


2010/1/17 Pavan Balaji <balaji at mcs.anl.gov>:
>
> On 01/16/2010 07:13 PM, Mário Costa wrote:
>> I have one question: does mpiexec.hydra aggregate the output from all
>> launched MPI processes?
>
> Yes.
>
>> I think it might hang waiting for output from ssh that, for some
>> reason, never arrives. Could this be the case?
>
> Yes, that's my guess too. This behavior is also possible if the MPI
> processes hang. But an ssh problem seems more likely in this case. In
> the previous email, when you tried a non-MPI program, did it hang as well?

Yes, it hangs in the same way, deterministically, with this non-MPI command:
>
> % mpiexec.hydra -rmk pbs hostname
>
>> Here we use LDAP on the cluster nodes, and I've read something
>> about ssh processes becoming defunct because of LDAP ...
>
> Hmm.. This keeps getting more and more interesting :-).
>
>  -- Pavan
>
> --
> Pavan Balaji
> http://www.mcs.anl.gov/~balaji
>
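
To check whether ssh itself is what never returns, independently of Hydra, something like the following could be run from the first node of the job. This is only a minimal sketch, not part of Hydra: the names probe and ssh_command are made up for illustration, the 10 second timeout is an arbitrary choice, and it assumes the host names are passed on the command line (for example, the unique entries of the PBS node file).

```python
import subprocess
import sys


def probe(cmd, timeout_s=10.0):
    """Run cmd; return (status, output), status is 'ok', 'failed' or 'timeout'."""
    try:
        result = subprocess.run(
            cmd,
            stdin=subprocess.DEVNULL,   # the child must not wait on our terminal
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,   # merge stderr so ssh errors are visible
            text=True,
            timeout=timeout_s,          # the child is killed if it exceeds this
        )
    except subprocess.TimeoutExpired:
        return "timeout", ""
    status = "ok" if result.returncode == 0 else "failed"
    return status, result.stdout.strip()


def ssh_command(host):
    # BatchMode=yes makes ssh fail instead of prompting for a password.
    return ["ssh", "-o", "BatchMode=yes", host, "hostname"]


if __name__ == "__main__":
    # Hypothetical usage: python probe_ssh.py node01 node02
    for host in sys.argv[1:]:
        status, output = probe(ssh_command(host))
        print(f"{host}: {status} {output}")
```

A host that reports "timeout" here would point at ssh (or something it depends on, such as LDAP lookups) rather than at mpiexec.hydra.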



-- 
Mário Costa

Laboratório Nacional de Engenharia Civil
LNEC.CTI.NTIEC
Avenida do Brasil 101
1700-066 Lisboa, Portugal
Tel : ++351 21 844 3911

