[mpich-discuss] sge and mpich2
c cook
csecook at gmail.com
Mon Jun 20 16:25:04 CDT 2011
Hello,
I have been using sge6.01u4 to run serial jobs for some time.
The cluster I am using has 8+1 nodes with Opteron processors.
I wanted to take advantage of this as the software I am using has a parallel
version.
So I've installed mpich2 as the parallel environment and started the mpd
daemons. Running mpdtrace -l shows all 8 slave nodes + 1 head node.
Now, when I submit the job using this script:
#!/bin/bash
#$ -S /bin/bash
#$ -o test.log
#$ -e test.err
#$ -N TEST_Parallel
#$ -pe mpich 2
#$ -cwd
mpiexec -n $NSLOTS siesta < input > output
The scheduler starts the job and qstat shows it as running, but no output
is produced. This goes on for days: nothing happens, and the job stays in
the queue with status "r" forever.
The only information I get is in the test.log file:
-catch_rsh
/home/sge6.01u4/default/spool/cn105/active_jobs/4049.1/pe_hostfile
cn105
cn102
So it seems that the scheduler did its job.
There is nothing in test.err; the output file is created, but it is empty.
The nodes are named cn101 to cn108.
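Since test.log shows the pe_hostfile path and the two granted hosts, one thing that could be checked is whether mpiexec is actually being pointed at those hosts. The following is only a minimal sketch, not a confirmed fix. It assumes each pe_hostfile line begins with "<hostname> <slots>", and that the mpd-based mpiexec accepts a -machinefile with "host:slots" entries. The function name pe_hostfile_to_machines and the file name machines.$JOB_ID are made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper: convert an SGE pe_hostfile into "host:slots" lines.
# Assumption: every pe_hostfile line starts with "<hostname> <slots> ...".
pe_hostfile_to_machines() {
    local hostfile="$1"
    # Field 1 is the host name, field 2 the number of slots granted on it.
    awk 'NF >= 2 { print $1 ":" $2 }' "$hostfile"
}

# Possible use inside the job script (assumed, not verified on this cluster):
#   pe_hostfile_to_machines "$PE_HOSTFILE" > machines.$JOB_ID
#   mpiexec -machinefile machines.$JOB_ID -n $NSLOTS siesta < input > output
```

For the job above, this would be expected to list cn105 and cn102 with their slot counts, which could then be compared against what mpdtrace -l reports from inside the job.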
The serial version works fine. This is the script I am using:
#!/bin/bash
#$ -S /bin/bash
#$ -o test.log
#$ -e test.err
#$ -N TEST
#$ -cwd
siesta < input > output
I may have missed something during the installation of mpich2.
Maybe some of you have encountered similar problems; any ideas are welcome.
Thanks,
Eli