[mpich-discuss] multiple mpich2 jobs on a single node
Bryan Putnam
bfp at purdue.edu
Thu May 1 12:40:21 CDT 2008
On Thu, 1 May 2008, Rajeev Thakur wrote:
> Can you try running them with a single mpdboot?
>
> Rajeev
Rajeev,
Yes, it does work correctly when I use only one mpdboot as you describe.
However, this is within a PBS environment, and the user doesn't know
whether or not the daemon is already running.
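One way a job script could cope with this is to check for a running mpd before booting one. The following is only a rough sketch, not a tested fix: it assumes that mpdtrace exits with a non-zero status when it cannot reach a local mpd, and ensure_mpd is a made-up helper name, not part of MPICH2.

```shell
# Sketch: boot an mpd ring only if one is not already reachable.
# Assumption: mpdtrace exits non-zero when no local mpd answers.
# ensure_mpd is a hypothetical helper name, not an MPICH2 command.
ensure_mpd() {
    if mpdtrace >/dev/null 2>&1; then
        # A ring is already up (probably started by another job); reuse it.
        echo "mpd already running"
    else
        # No ring reachable; start one and report success.
        mpdboot && echo "mpd started"
    fi
}

# Intended use inside a PBS job script:
#   ensure_mpd
#   mpirun -np 4 ./a.out
```

Two jobs starting at the same moment could still race between the check and the boot, and this does not by itself stop the ring from going away when the job that started it finishes.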
Bryan
>
> > -----Original Message-----
> > From: owner-mpich-discuss at mcs.anl.gov
> > [mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of Bryan Putnam
> > Sent: Thursday, May 01, 2008 7:52 AM
> > To: mpich-discuss at mcs.anl.gov
> > Subject: [mpich-discuss] multiple mpich2 jobs on a single node
> >
> > Hi, perhaps this is an FAQ, but I can't seem to find it.
> >
> > I've noticed that if I have two mpich2 jobs running on, for
> > example, the same 8-processor node, submitted as
> >
> > mpdboot
> > mpirun -np 4 ./a.out
> >
> >
> > mpdboot
> > mpirun -np 4 ./a.out2
> >
> >
> > then when one of the jobs completes, it kills all the mpd processes
> > along with it, and the remaining job dies with the message
> >
> > job aborted; reason = mpd disappeared
> >
> >
> > Is there an easy fix for this?
> >
> > Thanks,
> > Bryan
> >
> >
> > --
> > Bryan Putnam
> > Rosen Center for Advanced Computing, Purdue University
> > Young Hall (Rm. 519)
> > 302 Wood Street
> > West Lafayette, IN 47907-2108
> > Ph 765-496-8225 Fax 765-494-0566
> > bfp at purdue.edu
> > http://www.purdue.edu/itap