[mpich-discuss] HYDRA and kill process

Torquil Macdonald Sørensen torquil at gmail.com
Fri Mar 25 10:09:38 CDT 2011


On 25/03/11 15:27, Pavan Balaji wrote:
>
> On 03/25/2011 03:37 AM, Torquil Macdonald Sørensen wrote:
>> I started a job consisting of 8 processes, 4 on hostA and 4 on hostB
>> using
>> "mpiexec -n 8 progfile". Hitting CTRL-c on hostA kills its four
>> processes, but
>> the remaining four still run on hostB, so I am forced to log in there
>> and kill
>> them myself.
>
> This case has almost always worked correctly for us. We haven't seen any
> bug reports for this. We need more information to figure out what's
> going on.
>
> 1. Did you set a separate host file environment variable? Are you using any
> resource manager (like SLURM, PBS, ...) in your environment? The reason
> I'm asking is that "mpiexec -n 8 progfile" by itself will not know anything
> about hostB, unless you have given it this information through some other means.
>
> 2. Are you seeing any errors while running the job before you try to
> kill it?
>
> 3. Did you try running one of the MPICH2 example programs (./examples/cpi)?
>
> -- Pavan

Thanks!

1) I have a host file ~/mpd.hosts, and I have set the environment variable
HYDRA_HOST_FILE to point to it. The file is essentially just:

hostA.uio.no:4
hostB.uio.no:4

I don't know what a "resource manager" is, but I don't think I'm using any such
thing. In any case, this worked fine before I switched to the newer MPICH2 with HYDRA.
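
To be concrete, the launch side amounts to roughly the following (progfile is just
a placeholder for the actual program):

export HYDRA_HOST_FILE=~/mpd.hosts
mpiexec -n 8 progfile    # ends up as 4 processes on hostA and 4 on hostB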

2) No errors of any kind from the MPI processes or from mpiexec. I have set up
passwordless ssh between hostA and hostB, and it is working correctly. When
logging into hostB from hostA I do get the message:

"X11 connection rejected because of wrong authentication.
Warning: untrusted X11 forwarding setup failed: xauth key data not generated
Warning: No xauth data; using fake authentication data for X11 forwarding.
Last login: Fri Mar 25 10:39:22 2011 from hostA.uio.no"

Could this be related? I assumed it was harmless and unrelated to MPICH, since it
only concerns X11 forwarding.
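
For what it's worth, the passwordless part on its own can be checked with something
like:

ssh hostB.uio.no hostname

which prints the remote hostname without ever asking for a password.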

3) No, but I will do that now and report back :-)
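
Concretely, I plan to run something along these lines from the MPICH2 build tree
(same host file as above), and then hit CTRL-c on hostA to see whether the four
processes on hostB also go away:

mpiexec -n 8 ./examples/cpi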

Best regards
Torquil Sørensen

