[mpich-discuss] mpich2 does not work with SGE
tilakraj dattaram
tilakraj1985 at gmail.com
Wed Jul 13 05:13:13 CDT 2011
Hi Reuti,
Thanks for your reply. I forgot to mention in my previous message that I had
already tried adding a parallel environment in SGE with qconf -ap. I did the
following:
qconf -Ap mpich
and then edited the PE file to read:
pe_name mpich
slots 999
user_lists NONE
xuser_lists NONE
start_proc_args NONE
stop_proc_args NONE
allocation_rule $fill_up
control_slaves TRUE
job_is_first_task FALSE
urgency_slots min
accounting_summary FALSE
This did not work. Also, with this setup I don't see how SGE would know where
to look for mpich2/hydra. I do see an mpi directory in $SGE_ROOT, which
contains a rocks-mpich.template file that reads:
pe_name mpich
slots 9999
user_lists NONE
xuser_lists NONE
start_proc_args /opt/gridengine/mpi/startmpi.sh -catch_rsh $pe_hostfile
stop_proc_args /opt/gridengine/mpi/stopmpi.sh
allocation_rule $fill_up
control_slaves TRUE
job_is_first_task FALSE
urgency_slots min
accounting_summary TRUE
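A minimal job-script sketch of what Reuti's reply describes (request the PE at submission, then call a plain mpiexec). SGE does not need to know where mpich2 is installed: for a parallel job it sets PE_HOSTFILE and NSLOTS in the job environment, and the template above passes the same file to startmpi.sh as $pe_hostfile. The script name, the job name, the 16-slot request and the assumption that Hydra picks up the SGE allocation by itself are illustrative, not confirmed for this cluster. The helper pe_hosts is hypothetical; it only prints the allocation in the host:slots form that a Hydra -f hostfile also accepts, so it can be compared with the hostfile that works from the command line.

```shell
#!/bin/sh
# Hypothetical job script (e.g. mpi_job.sh); submit with: qsub mpi_job.sh
# Lines starting with "#$" are read by qsub as embedded options.
#$ -N mpi_test
#$ -pe mpich 16
#$ -cwd
#$ -j y

# Hypothetical helper: print one "host:slots" line per pe_hostfile entry.
# Each pe_hostfile line looks like: <host> <slots> <queue> <processor range>
pe_hosts() {
    awk 'NF >= 2 { print $1 ":" $2 }' "$1"
}

# PE_HOSTFILE and NSLOTS are only set when the job runs inside a PE,
# so this block is skipped when the script is run outside SGE.
if [ -n "$PE_HOSTFILE" ]; then
    echo "SGE granted $NSLOTS slots on:"
    pe_hosts "$PE_HOSTFILE"
    # Assumption: Hydra detects the SGE allocation itself (tight
    # integration), so neither -f nor -np is passed here.
    mpiexec ./a.out
fi
```

Note that SGE attaches parallel environments to queues through the queue's pe_list attribute (editable with `qconf -mq all.q`, assuming the default queue name); a job requesting a PE that no queue lists cannot be scheduled.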
Does SGE need to be reconfigured after installing mpich2?
Thanks in advance!
Regards
Tilak
> Date: Tue, 12 Jul 2011 13:19:18 +0200
> From: Reuti <reuti at staff.uni-marburg.de>
> Subject: Re: [mpich-discuss] mpich2 does not work with SGE
> To: mpich-discuss at mcs.anl.gov
>
> Hi,
>
> On 12.07.2011 at 13:03, tilakraj dattaram wrote:
>
> > We have a Rocks cluster with 10 nodes, with Sun Grid Engine installed and
> > running. I then installed the most recent version of mpich2 (1.4) on the
> > master and compute nodes. However, we are unable to run parallel jobs
> > through SGE (we can submit serial jobs without a problem). I am an SGE
> > newbie, and we have done most of the installation by following
> > step-by-step tutorials on the web.
> >
> > The mpich2 manual says that hydra is the default process manager for
> > mpich2, and I have checked that the mpiexec command points to mpiexec.hydra.
> > Also, `which mpicc` and `which mpiexec` both point to the desired mpich2
> > installation. I understand that in this version of mpich2, hydra should be
> > integrated with SGE by default, but maybe I am missing something here.
> >
> > We are able to run parallel jobs from the command line by specifying a
> > host file (e.g., mpiexec -f hostfile -np 16 ./a.out), but would like the
> > resource manager to take care of allocating resources on the cluster.
>
> It's necessary to set up a so-called parallel environment (PE) in SGE and
> request it at job submission. Then a plain mpirun without any hostfile or
> -np specification will do, as everything is delivered directly by SGE. If
> everything is set up properly, you could even switch off `rsh` and `ssh`
> inside the cluster completely, as SGE's internal startup mechanism is then
> used to start processes on other nodes. In fact, disabling `ssh` or limiting
> it to admin staff is a good way to check whether your parallel application
> is tightly integrated into the queuing system, with all slave processes
> correctly accounted for and under full SGE control, so that `qdel` can
> delete them.
>
> For SGE there is also a mailing list:
> http://gridengine.org/mailman/listinfo/users
>
> -- Reuti