[mpich-discuss] mpich2 does not work with SGE

tilakraj dattaram tilakraj1985 at gmail.com
Wed Jul 13 05:13:13 CDT 2011


Hi Reuti

Thanks for your reply. I forgot to mention in my previous message that I had
already tried adding a parallel environment in SGE using qconf -ap. I did the
following:

qconf -Ap mpich

and then edited the PE configuration to:

pe_name            mpich
slots              999
user_lists         NONE
xuser_lists        NONE
start_proc_args    NONE
stop_proc_args     NONE
allocation_rule    $fill_up
control_slaves     TRUE
job_is_first_task  FALSE
urgency_slots      min
accounting_summary FALSE

This did not work. However, I don't see how SGE would know where to look for
mpich2/hydra here. I do see an mpi directory under $SGE_ROOT, which contains a
rocks-mpich.template file that reads as follows:

pe_name          mpich
slots            9999
user_lists       NONE
xuser_lists      NONE
start_proc_args  /opt/gridengine/mpi/startmpi.sh -catch_rsh $pe_hostfile
stop_proc_args   /opt/gridengine/mpi/stopmpi.sh
allocation_rule  $fill_up
control_slaves   TRUE
job_is_first_task FALSE
urgency_slots     min
accounting_summary TRUE
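
As far as I understand, the PE also has to be attached to a cluster queue
before jobs can request it. I assume something along these lines would do it
(all.q is just the usual Rocks/SGE default queue name, so this is a guess for
our setup):

# check which PEs the queue currently offers
qconf -sq all.q | grep pe_list

# add the mpich PE to the queue's pe_list (opens an editor) ...
qconf -mq all.q

# ... or do the same non-interactively
qconf -aattr queue pe_list mpich all.q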

Does SGE need re-configuration after the mpich2 install?
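
For reference, this is the kind of job script I would expect to work once the
PE is attached to a queue (the 16 slots and ./a.out are only placeholders, and
mpiexec is the hydra one from mpich2 1.4, which as far as I know detects the
SGE allocation on its own):

#!/bin/bash
#$ -S /bin/bash
#$ -cwd
#$ -pe mpich 16
# no -f hostfile needed; hydra should pick up the hosts granted by SGE
mpiexec -n $NSLOTS ./a.out

and then submit it with qsub jobscript.sh.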

Thanks in advance!

Regards
Tilak

> Date: Tue, 12 Jul 2011 13:19:18 +0200
> From: Reuti <reuti at staff.uni-marburg.de>
> Subject: Re: [mpich-discuss] mpich2 does not work with SGE
>
> Hi,
>
> Am 12.07.2011 um 13:03 schrieb tilakraj dattaram:
>
> > We have a rocks cluster with 10 nodes, with sun grid engine installed and
> running. I then installed the most recent version of mpich2 (1.4) on the
> master and compute nodes. However, we are unable to run parallel jobs
> through SGE (we can submit serial jobs without a problem). I am an SGE
> newbie, and most of the installation we have done was by following
> step-by-step tutorials on the web.
> >
> > The mpich2 manual says that hydra is the default process manager for
> mpich2, and I have checked that the mpiexec command points to mpiexec.hydra.
> Also, which mpicc, which mpiexec point to the desired location of mpich2. I
> understand that in this version of mpich2, hydra should be integrated with
> SGE by default. But maybe I am missing something here.
> >
> > We are able to run parallel jobs using command line by specifying a host
> file (e.g, mpiexec -f hostfile -np 16 ./a.out), but would like the resource
> manager to take care of allocating resources on the cluster.
>
> it's necessary to set up a so-called parallel environment (i.e. a PE) in
> SGE and request it during job submission. Then a plain mpirun without
> any hostfile or -np specification will do, as everything is delivered
> directly by SGE. If all is set up properly, you could even switch off
> `rsh` and `ssh` inside the cluster completely, as SGE's internal startup
> mechanism is then used to start processes on other nodes. In fact,
> disabling or limiting `ssh` to admin staff is a good way to check whether
> your parallel application has a tight integration into the queuing system,
> where all slave processes are also accounted correctly and are under full
> SGE control for deletion by `qdel`.
>
> For SGE there is also a mailing list:
> http://gridengine.org/mailman/listinfo/users
>
> -- Reuti