[MPICH] MPICH2 hangs on diskless SuSE 10.2 based cluster

Shaun Qualheim shaun at c-think.com
Wed Dec 20 16:04:21 CST 2006



Shaun Qualheim wrote:
> Hey everyone...
>
> I'm trying to figure out why I'm having an issue with running a job on 
> multiple machines here.
>
> It worked fine with a 32-bit SuSE 9.3 based setup.
>
> I started it up with a 64-bit base distro and kernel...
>
> I can start up the process with:
> mpd &; mpiexec -n 2 process &
> and it starts with no issues...
>
> When I try doing that with 2 machines though...
> mpdboot -n 2 &; mpiexec -n 2 process &
> It hangs for about 5 minutes and then starts up.
>
> After it starts up and runs to completion, I can issue the same 
> command again and it starts up right away.
>
> Any ideas of where to start here or what might be causing this issue?
>
> Thanks!
> Shaun
>
>
So, I'm getting a little more information from the users here...

Apparently it works just fine for some of the users but not for others.
Whichever way it goes for a given user, though, it's always reproducible.

Could this be a shell issue?

Shaun



