[mpich-discuss] Using mpd MPICH2-1.0.8 on 64-bit Mac cluster under SGE

Edric Ellis Edric.Ellis at mathworks.co.uk
Thu Mar 18 10:51:31 CDT 2010


Hi all,

I'm trying to get an MPD build of MPICH2-1.0.8 working on a 64-bit Mac cluster, with jobs being scheduled by SGE.

I'm submitting a shell script that calls mpdboot, using "rsh", on the hosts allocated by SGE. When I then attempt to run my process under mpiexec, the process that gets launched on the remote nodes seems to be "broken" in some way. For example, imagine I get allocated nodes "node01" and "node02". If I were to run (from within the shell script that I've submitted to SGE)

mpiexec -n 2 whoami

this would give me the expected output on "node01" where SGE has launched my wrapper script, but on "node02", I just see a numeric result from whoami. Also, if I attempt to do something more adventurous like
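For reference, a wrapper script of the kind described above might look roughly like the following. This is a minimal sketch under stated assumptions, not the actual script: the function names make_mpd_hosts, count_hosts and run_job are made up for illustration, and it assumes that SGE exports $PE_HOSTFILE with one "host slots queue processor-range" line per allocated host and that mpdboot, mpiexec and mpdallexit are on the PATH.

```shell
#!/bin/sh
# Sketch of an SGE job script that boots an mpd ring and runs a command.
# All function names here are hypothetical.

# Print the unique host names from an SGE PE hostfile (first column).
make_mpd_hosts() {
    awk '{ print $1 }' "$1" | sort -u
}

# Count the unique hosts in an SGE PE hostfile.
count_hosts() {
    make_mpd_hosts "$1" | wc -l | tr -d ' '
}

# Boot the ring over rsh, run the test command, then tear the ring down.
run_job() {
    hosts_file="${TMPDIR:-/tmp}/mpd.hosts.$$"
    make_mpd_hosts "$PE_HOSTFILE" > "$hosts_file"
    nhosts=$(count_hosts "$PE_HOSTFILE")

    mpdboot -n "$nhosts" -f "$hosts_file" -r rsh || return 1
    mpiexec -n "$nhosts" whoami
    status=$?

    mpdallexit
    rm -f "$hosts_file"
    return "$status"
}

# Only do anything when SGE has actually provided a host file.
if [ -n "${PE_HOSTFILE:-}" ]; then
    run_job
fi
```

Keeping the host-file handling in small functions makes it possible to check that part by hand on a login node before submitting anything to SGE.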

mpiexec -n 2 ping -c 1 node01

this shows that the process launched on "node02" cannot access the network. If I try to launch my actual application, it is completely broken by this lack of network access.
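One way to narrow this down is to have every rank report what it can see about its own identity. The sketch below is only an illustration: the names PROBE_CMD, probe_line and probe_all_ranks are made up for this example, and it assumes a POSIX sh plus the standard hostname and id utilities on every node.

```shell
#!/bin/sh
# Hypothetical probe: print the host name, numeric uid and resolved user name
# of the calling process. The command text is kept in a variable so that
# exactly the same probe can be run locally and under mpiexec.
PROBE_CMD='echo "host=$(hostname) uid=$(id -u) user=$(id -un 2>/dev/null)"'

# Run the probe in the current environment.
probe_line() {
    sh -c "$PROBE_CMD"
}

# Run the probe on N ranks (default 2) through the mpd ring.
probe_all_ranks() {
    mpiexec -n "${1:-2}" sh -c "$PROBE_CMD"
}
```

Calling probe_all_ranks 2 from the job script, and comparing each line with probe_line run from an ordinary login shell on the same node, would show whether the user name lookup fails only for processes started through mpd.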

Has anyone seen anything like this? I have no idea if the problem is with MPICH2, SGE, Mac, or something else. Any clues gratefully received. (At the moment, I haven't been able to try mpiexec outside of the control of SGE to rule that piece out, but I should be able to do that.)

Cheers,

Edric.