[mpich-discuss] sge and mpich2

Dave Goodell goodell at mcs.anl.gov
Mon Jun 20 17:02:20 CDT 2011


It looks like you are having problems with mpd: http://wiki.mcs.anl.gov/mpich2/index.php/Frequently_Asked_Questions#Q:_I_don.27t_like_.3CWHATEVER.3E_about_mpd.2C_or_I.27m_having_a_problem_with_mpdboot.2C_can_you_fix_it.3F

Please use hydra instead of MPD: http://wiki.mcs.anl.gov/mpich2/index.php/Using_the_Hydra_Process_Manager
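
As a rough sketch only (not tested on your cluster), the job script from your message could be adapted for Hydra along these lines. It assumes your MPICH2 build installs the Hydra launcher as mpiexec.hydra, that SGE sets PE_HOSTFILE, NSLOTS and JOB_ID for the job, and that Hydra's default ssh launcher can reach the compute nodes without a password. The helper name make_hydra_hosts and the file name hydra.hosts.$JOB_ID are made up for illustration. No mpd ring needs to be running.

```shell
#!/bin/bash
#$ -S /bin/bash
#$ -o test.log
#$ -e test.err
#$ -N TEST_Parallel
#$ -pe mpich 2
#$ -cwd

# Hypothetical helper: convert SGE's PE_HOSTFILE (one line per host,
# "host slots queue processor-range") into Hydra's host file format
# ("host:slots", one per line).
make_hydra_hosts() {
    awk '{ print $1 ":" $2 }' "$1"
}

# Launch only when running under SGE, where these variables are set.
if [ -n "$PE_HOSTFILE" ] && [ -n "$NSLOTS" ]; then
    hosts="hydra.hosts.$JOB_ID"   # illustrative file name, one per job
    make_hydra_hosts "$PE_HOSTFILE" > "$hosts"
    # Assumption: mpiexec.hydra is on PATH on the execution host.
    mpiexec.hydra -f "$hosts" -n "$NSLOTS" siesta < input > output
fi
```

If the hang was caused by the mpd ring, the output file should start filling shortly after the job enters state "r".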

-Dave

On Jun 20, 2011, at 4:25 PM CDT, c cook wrote:

> Hello,
> 
> I have been using sge6.01u4 to run serial jobs for some time.
> 
> The cluster I am using has 8+1 nodes with Opteron processors.
> 
> I wanted to take advantage of this as the software I am using has a parallel version.
> So I've installed mpich2 as the parallel environment and started the mpd daemon. Running mpdtrace -l shows all 8 slave nodes plus the head node.
> When I submit the job using this script:
> 
> #!/bin/bash
> #$ -S /bin/bash
> #$ -o test.log
> #$ -e test.err
> #$ -N TEST_Parallel
> #$ -pe mpich 2
> #$ -cwd
> 
> mpiexec -n $NSLOTS siesta < input > output
> 
> The scheduler accepts the job, and qstat shows it as running, but no output is produced. This goes on for days: the job stays in the queue with status "r" indefinitely.
> The only information I get is in the test.log file:
> 
> -catch_rsh /home/sge6.01u4/default/spool/cn105/active_jobs/4049.1/pe_hostfile
> cn105
> cn102
>  
> So it seems that the scheduler did its part.
> There is nothing in test.err; the output file is created, but it is empty.
> The nodes are named cn101 to cn108.
> 
> 
> The serial version works fine. This is the script I am using:
> 
> #!/bin/bash
> #$ -S /bin/bash
> #$ -o test.log
> #$ -e test.err
> #$ -N TEST
> #$ -cwd
> 
> siesta < input > output
> 
> 
> I may have missed something during the installation of mpich2.
> 
> Maybe some of you have encountered similar problems; any ideas are welcome.
> 
> Thanks,
> Eli


