[mpich-discuss] Re: [mvapich-discuss] Followup: mvapich2 issue regarding mpd timeout in mpiexec (fwd)
Brian Curtis
curtisbr at cse.ohio-state.edu
Mon Jun 9 11:29:58 CDT 2008
To address the problem identified by Dell (see the included emails), we
have enhanced the MPD mpiexec in MVAPICH2 so that recvTimeout is
computed as a configurable multiplier times the number of processes.
The multiplier can be set with the environment variable
MV2_MPD_RECVTIMEOUT_MULTIPLIER. We wanted to share this patch (see
below) with the MPICH2 community, since this issue is also present in
the latest release of MPICH2.
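To illustrate the effect, assume the default multiplier of 0.05 (the
process counts below are hypothetical examples): the computed timeout
matches the old hard-coded default of 20 seconds at 400 processes and
grows linearly from there. A minimal sketch of the patched computation:

    # Python 2 sketch of the computation added by the patch below;
    # the nProcs values are hypothetical, not measured limits.
    recvTimeoutMultiplier = 0.05  # default; override via MV2_MPD_RECVTIMEOUT_MULTIPLIER
    for nProcs in (400, 1000, 4000):
        print '%5d processes -> recvTimeout = %g seconds' % \
              (nProcs, nProcs * recvTimeoutMultiplier)
    # 400 -> 20, 1000 -> 50, 4000 -> 200 seconds

For example, setting MV2_MPD_RECVTIMEOUT_MULTIPLIER=0.2 in the
environment would give a 1000-process job the 200-second timeout that
Dell found sufficient.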
Brian
===================================================================
--- src/pm/mpd/mpiexec.py (revision 2667)
+++ src/pm/mpd/mpiexec.py (working copy)
@@ -841,6 +841,19 @@
         print 'no cmd specified'
         usage()
+    global recvTimeout
+    recvTimeoutMultiplier = 0.05
+    if os.environ.has_key('MV2_MPD_RECVTIMEOUT_MULTIPLIER'):
+        try:
+            recvTimeoutMultiplier = int(os.environ['MV2_MPD_RECVTIMEOUT_MULTIPLIER'])
+        except ValueError:
+            try:
+                recvTimeoutMultiplier = float(os.environ['MV2_MPD_RECVTIMEOUT_MULTIPLIER'])
+            except ValueError:
+                print 'Invalid MV2_MPD_RECVTIMEOUT_MULTIPLIER. Value must be a number.'
+                sys.exit(-1)
+    recvTimeout = nProcs * recvTimeoutMultiplier
+
     argsetLoRange = nextRange
     argsetHiRange = nextRange + nProcs - 1
     loRange = argsetLoRange
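One note on the parsing above: int() is tried before float(), so a
value such as '2' is kept as an integer while '0.05' raises ValueError
and falls through to the float branch; anything non-numeric makes
mpiexec exit with an error. A standalone illustration of that fallback
(Python 2; the sample strings are hypothetical):

    for val in ('2', '0.05', 'abc'):
        try:
            m = int(val)
        except ValueError:
            try:
                m = float(val)
            except ValueError:
                m = None  # the patch prints an error and exits here
        print '%r -> %r' % (val, m)
    # '2' -> 2, '0.05' -> 0.05, 'abc' -> None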
On Jun 5, 2008, at 11:32 PM, Dhabaleswar Panda wrote:
> Hi,
>
> We have received the following note regarding MPD timeout on large
> clusters. We had a discussion about this in our group, and we feel
> that the best solution would be to provide this timeout value in a
> `dynamic' manner based on the system size. For example, for a smaller
> system, a timeout value of 20 will be sufficient; for a larger system,
> this value may need to be set to 200 or higher. Would it be feasible
> to provide such a solution?
>
> Thanks,
>
> DK
>
> ---------- Forwarded message ----------
> Date: Thu, 29 May 2008 22:06:43 -0500
> From: David_Kewley at Dell.com
> To: mvapich-discuss at cse.ohio-state.edu
> Subject: [mvapich-discuss] Followup: mvapich2 issue regarding mpd timeout in mpiexec
>
> This is a followup to this thread:
>
> http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/2007-May/000834.html
>
> between Greg Bauer and Qi Gao.
>
> We had the same problem that Greg saw -- failure of mpiexec, with the
> characteristic error message "no msg recvd from mpd when expecting ack
> of request". It was resolved for us by setting recvTimeout in
> mpiexec.py to a higher value, just as Greg suggested and Qi concurred.
> The default value is 20; we chose 200 (we did not experiment with
> values between these two, so a lower value may work in many cases).
>
> I think this change should be made permanent in MVAPICH2. I do not
> think it will negatively impact anyone, because in the four cases
> where this timeout is used, if the timeout expires mpiexec immediately
> makes an error exit anyway. So the worst consequence is that mpiexec
> would take longer to fail (3 minutes longer if 200 is used instead of
> 20). The user who encounters this timeout has to fix the root cause of
> the timeout in order to get any work done, so they are not likely to
> encounter it repeatedly and thereby lose lots of runtime simply
> because the timeout is large. Is this analysis correct?
>
> Meanwhile, this change would clearly help at least some people with
> large clusters. We see failure with the default recvTimeout between
> 900 and 1000 processes; a larger recvTimeout allows us to scale to
> 3000 processes and beyond.
>
> The default setting does not cause failure if I make a simple, direct
> call to mpiexec. I only see it when I use mpirun.lsf to launch a
> large job. I think the failure in the LSF case is due to the longer
> time it presumably takes to launch LSF's TaskStarter for every
> process, etc. The time required seems to be O(#processes) in the LSF
> case. (We have LSF 6.2, with a local custom wrapper script for
> TaskStarter.)
>
> If you agree that this change to the value of recvTimeout is OK,
> please implement this one-line change in MVAPICH2, and consider
> contributing it upstream to MPICH2 as well.
>
> If you decline to make this change, at least it's now on the web that
> this change does fix the problem. :)
>
> Thanks,
> David
>
> David Kewley
> Dell Infrastructure Consulting Services
> Onsite Engineer at the Maui HPC Center
> Cell: 602-460-7617
> David_Kewley at Dell.com
>
> Dell Services: http://www.dell.com/services/
> How am I doing? Email my manager Russell_Kelly at Dell.com with any
> feedback.
>
>
> _______________________________________________
> mvapich-discuss mailing list
> mvapich-discuss at cse.ohio-state.edu
> http://mail.cse.ohio-state.edu/mailman/listinfo/mvapich-discuss