[MPICH] MPICH2 hangs on diskless SuSE 10.2 based cluster

Shaun Q shaun at c-think.com
Thu Dec 21 17:16:37 CST 2006


Rusty --

I did that. I can start mpd on one machine, manually connect a second
machine to it, run a ring test, and run a trace. I still see the same
problem.

FYI, I am also running the 1.0.5 release.

Any ideas?

Thanks!
Shaun

>I recommend that you follow the steps in the User's Guide, starting with
>the use of mpdcheck to make sure both machines are properly configured
>and can communicate with one another, and start the mpd's on these two
>machines "by hand" instead of going immediately to the more scalable but
>less informative mpdboot.
>
>Regards,
>Rusty
>
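
For reference, the by-hand sequence described above might look roughly
like the sketch below. It assumes the usual MPD tools shipped with MPICH2
1.0.x (mpdcheck, mpd, mpdtrace, mpdringtest) are on the PATH on both
machines. The names node1 and node2 and the port 55555 are placeholders,
not values from this thread. The function only prints the commands, so
each one can be run and checked individually on the host shown in brackets.

```shell
#!/bin/sh
# Print the by-hand bring-up steps for a two-host mpd ring.
# Nothing is executed here; the output is a checklist to paste from.
# Host names and the port are hypothetical placeholders.
manual_ring_steps() {
    first_host=$1    # host that runs the first mpd
    second_host=$2   # host that joins the ring
    mpd_port=$3      # port reported by "mpdtrace -l" on the first host

    # Connectivity check: a server on one host, a client on the other.
    echo "[$first_host] mpdcheck -s"
    echo "[$second_host] mpdcheck -c $first_host <port printed by mpdcheck -s>"

    # Start the first mpd and look up the port it is listening on.
    echo "[$first_host] mpd &"
    echo "[$first_host] mpdtrace -l"

    # Join the second mpd to the ring, then verify the ring.
    echo "[$second_host] mpd -h $first_host -p $mpd_port &"
    echo "[$second_host] mpdtrace"
    echo "[$first_host] mpdringtest 100"

    # Finally, launch a trivial non-MPI job across the ring.
    echo "[$first_host] mpiexec -n 2 hostname"
}

manual_ring_steps node1 node2 55555
```

Noting which step is the first to stall should help narrow down whether
the delay is in name resolution, in the ring itself, or in job startup.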

On Dec 20, 2006, at 6:33 AM, Shaun Qualheim wrote:


     Hey everyone...


     I'm trying to figure out why I'm having an issue with running a job
     on multiple machines here.

     It worked fine with a 32-bit SuSE 9.3 based setup.


     I started it up with a 64-bit base distro and kernel...


     I can start up the process with:
     mpd & mpiexec -n 2 process &
     and it starts with no issues...


     When I try doing that with 2 machines though...
     mpdboot -n 2 & mpiexec -n 2 process &
     it hangs for about 5 minutes and then starts up.


     After it starts up and runs to completion, I can issue the same
     command again and it starts up right away.
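
     To put numbers on the slow first launch versus the fast second one,
     a small helper along these lines could be used. This is only a
     sketch: elapsed_seconds is a made-up name, and it assumes a date
     command that supports +%s, as GNU date on Linux does.

```shell
#!/bin/sh
# Run a command, discard its output, and print how many whole seconds
# it took. elapsed_seconds is a hypothetical helper, not an MPICH tool.
elapsed_seconds() {
    start=$(date +%s)
    "$@" >/dev/null 2>&1
    end=$(date +%s)
    echo $((end - start))
}

# Example use once the mpd ring is up ("process" is the job from the
# message above); run it twice and compare the two numbers:
#   elapsed_seconds mpiexec -n 2 process
#   elapsed_seconds mpiexec -n 2 process
```

     Timing mpdboot separately in the same way would show whether the
     delay is in bringing up the ring or in starting the job.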

     Any ideas on where to start, or what might be causing this issue?


     Thanks!
     Shaun






