[mpich-discuss] mpd as system process?

Marc Moreau jebnor at gmail.com
Thu Aug 5 14:53:55 CDT 2010


I have been playing with this for some time with no luck whatsoever.  mpd
never boots and everything times out.  Here is what I get:

=== Begin ===
-catch_rsh /gridware/sge/default/spool/compute-1-24/active_jobs/53194.1/pe_hostfile
/opt/mpich2-1.2.1p1
compute-1-24:4
usage: start_mpich2 [-n <hostname>] mpich2-mpd-path [mpd-parameters ..]

where: 'hostname' gives the name of the target host
startmpich2.sh: check for mpd daemons (1 of 10)
startmpich2.sh: check for mpd daemons (2 of 10)
startmpich2.sh: check for mpd daemons (3 of 10)
startmpich2.sh: check for mpd daemons (4 of 10)
startmpich2.sh: check for mpd daemons (5 of 10)
startmpich2.sh: check for mpd daemons (6 of 10)
startmpich2.sh: check for mpd daemons (7 of 10)
startmpich2.sh: check for mpd daemons (8 of 10)
startmpich2.sh: check for mpd daemons (9 of 10)
startmpich2.sh: check for mpd daemons (10 of 10)
startmpich2.sh: got only 8 of 1 nodes, aborting
-catch_rsh /opt/mpich2-1.2.1p1
mpdallexit: cannot connect to local mpd
(/tmp/mpd2.console_marc.moreau_sge_53194.undefined); possible causes:
  1. no mpd is running on this host
  2. an mpd is running but was started without a "console" (-n option)
In case 1, you can start an mpd on this host with:
    mpd &
and you will be able to run jobs just on this host.
For more details on starting mpds on a set of hosts, see
the MPICH2 Installation Guide.
=== END ===

Any suggestions?

-- Marc

On Thu, Aug 5, 2010 at 11:53 AM, Reuti <reuti at staff.uni-marburg.de> wrote:
> Hi Marc,
>
> On 05.08.2010 at 19:46, Marc Moreau wrote:
>
>> I'm setting up MPICH2 on my cluster where users run many relatively
>> short processes (2-10 hours).  I am using Sun Grid Engine to manage
>> the scheduling. The problem that I am running into is that SGE kills
>> the mpd process when the job is done, even when other jobs are using
>> it.  So if there are multiple MPI jobs running on the same node, they
>> all die when the first process dies.
>>
>> As a solution I'd like to set everything up so that users can just
>> 'run' MPI jobs and not need to worry about starting and killing mpd
>> within each job.  I'm thinking it would be nice to set up mpd as a
>> system process and then have all the jobs run on the system mpd.  Is
>> this sane and possible? Any other solutions?
>
> Please have a look here:
>
> http://gridengine.sunsource.net/howto/mpich2-integration/mpich2-integration.html
>
> It will create one dedicated ring per job. The ring will be set up and removed by the PE start_proc_args/stop_proc_args scripts. Users just need to set the correct port number in their scripts (please check the demo script included in the archive for this).
>
> -- Reuti
> _______________________________________________
> mpich-discuss mailing list
> mpich-discuss at mcs.anl.gov
> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss
>
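
For anyone reading along, here is a minimal sketch of how a job script might point the mpd client commands at its own per-job ring. It is inferred from the console file name in the log above (mpd2.console_<user>_sge_<job>.<task>), not taken from the howto's demo script, so treat the details as assumptions. MPD_CON_EXT is the mpd environment variable that selects the console. The helper name mpd_con_ext, the fallback values, and the commented-out launch line are hypothetical.

```shell
#!/bin/sh
# JOB_ID and SGE_TASK_ID are set by SGE inside a job; the fallbacks below
# are placeholders so the sketch also runs outside the scheduler.
# For non-array jobs SGE reports the task id as "undefined", which would
# explain the ".undefined" suffix in the console file name in the log.
mpd_con_ext() {
    printf 'sge_%s.%s\n' "${JOB_ID:-0}" "${SGE_TASK_ID:-undefined}"
}

# mpd client commands (mpiexec, mpdtrace, mpdallexit) use MPD_CON_EXT
# to find the console of the ring they should talk to.
MPD_CON_EXT=$(mpd_con_ext)
export MPD_CON_EXT

echo "expected console: /tmp/mpd2.console_${USER:-$(id -un)}_${MPD_CON_EXT}"

# Hypothetical launch line; the machine file location and the program
# name are assumptions, not taken from this thread:
# mpiexec -machinefile "$TMPDIR/machines" -n "$NSLOTS" ./my_mpi_program
```

If the extension computed in the job script differs from the one the PE start script used when it booted the ring, the client commands will look for a console that does not exist and fail with "cannot connect to local mpd".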

