[mpich-discuss] Using mpd MPICH2-1.0.8 on 64-bit Mac cluster under SGE

Reuti reuti at Staff.Uni-Marburg.DE
Thu Mar 18 10:58:43 CDT 2010


Hi,

On 18.03.2010 at 16:51, Edric Ellis wrote:

> I’m trying to get an MPD build of MPICH2-1.0.8 working on a 64-bit  
> Mac cluster, with jobs being scheduled by SGE.
>
>
>
> I’m submitting a shell script which calls mpdboot based on the  
> hosts allocated by SGE, using “rsh”. When I then attempt to run my  
> process under mpiexec, the process that gets launched on the remote  
> nodes seems to be “broken” in some way. For example, imagine I get  
> allocated nodes “node01” and “node02”. If I were to run (from within
> my shell script that I’ve submitted to SGE)
>
Using one MPD ring per job on a unique port might help:

http://gridengine.sunsource.net/howto/mpich2-integration/mpich2-integration.html
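
As a rough illustration of the per-job ring idea (this is not the script from the howto, only a simplified sketch): the job script could build its own mpd hosts file from SGE's $PE_HOSTFILE, give the ring a per-job console name via MPD_CON_EXT, boot the ring, run, and tear it down again. The helper names pe_to_mpd_hosts and count_hosts and the location of the hosts file are my own assumptions, and it still uses plain rsh rather than the tight integration described in the howto.

```shell
#!/bin/sh
# Sketch only: one MPD ring per SGE job. Helper names and file
# locations are illustrative assumptions, not taken from the howto.

# Convert SGE's PE hostfile (one "host slots queue processor-range"
# entry per line) into "host:ncpus" lines for an mpd hosts file.
pe_to_mpd_hosts() {
    awk '{ print $1 ":" $2 }' "$1"
}

# Number of hosts (lines) listed in the PE hostfile.
count_hosts() {
    awk 'END { print NR }' "$1"
}

# Only try to start a ring when running inside an SGE parallel job.
if [ -n "${PE_HOSTFILE:-}" ] && [ -n "${JOB_ID:-}" ]; then
    # Per-job console extension, so that rings belonging to
    # different jobs of the same user do not share a console.
    MPD_CON_EXT="sge_${JOB_ID}"
    export MPD_CON_EXT

    hosts="${TMPDIR:-/tmp}/mpd.hosts.${JOB_ID}"
    pe_to_mpd_hosts "$PE_HOSTFILE" > "$hosts"

    # Assumption: one mpd per allocated host, started via rsh.
    mpdboot -n "$(count_hosts "$PE_HOSTFILE")" -f "$hosts" -r rsh

    mpiexec -n 2 whoami

    # Shut the ring down again when the job is done.
    mpdallexit
    rm -f "$hosts"
fi
```

Outside of an SGE job the last part is skipped, so the two helper functions can be tried on a hand-written hostfile first.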

I never tried it on a Mac though.

-- Reuti

>  mpiexec -n 2 whoami
>
>
>
> this would give me the expected output on “node01” where SGE has  
> launched my wrapper script, but on “node02”, I just see a numeric  
> result from whoami. Also, if I attempt to do something more  
> adventurous like
>
>
>
> mpiexec -n 2 ping -c node01
>
>
>
> this shows that the process launched on “node02” cannot access the
> network. If I try to launch my actual application, it is
> completely broken by this lack of access to the network.
>
>
>
> Has anyone seen anything like this? I have no idea if the problem
> is with MPICH2, SGE, Mac, or something else. Any clues gratefully
> received. (So far I haven’t been able to run mpiexec outside of
> SGE’s control to rule that piece out, but I should be able to do
> that.)
>
>
>
> Cheers,
>
>
>
> Edric.
