[mpich-discuss] Re: [mvapich-discuss] Followup: mvapich2 issue regarding mpd timeout in mpiexec (fwd)

Brian Curtis curtisbr at cse.ohio-state.edu
Mon Jun 9 11:29:58 CDT 2008


As a solution to the problem identified by Dell (see the included
emails), we have enhanced the MPD mpiexec in MVAPICH2 so that its
recvTimeout is computed from a configurable multiplier and the number
of processes. The multiplier can be set with the environment variable
MV2_MPD_RECVTIMEOUT_MULTIPLIER. We want to share this patch (see below)
with the MPICH2 community, since the issue is also present in the
latest release of MPICH2.
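
With the default multiplier of 0.05, the effective timeout now scales
with the job size instead of staying fixed at 20 seconds. Here is a
minimal standalone sketch of the computation (the names nProcs and
recvTimeout are taken from mpiexec.py; the process counts are only
illustrative):

    # Illustrative sketch: mirrors the recvTimeout computation in the patch below.
    recvTimeoutMultiplier = 0.05   # default used by the patch
    for nProcs in (400, 1000, 4000):
        recvTimeout = nProcs * recvTimeoutMultiplier
        print('%5d processes -> recvTimeout = %.1f seconds' % (nProcs, recvTimeout))

At 400 processes this matches the old fixed default of 20 seconds, and
at 4000 processes it reaches the value of 200 that Dell found necessary
on their cluster.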


Brian



===================================================================
--- src/pm/mpd/mpiexec.py       (revision 2667)
+++ src/pm/mpd/mpiexec.py       (working copy)
@@ -841,6 +841,19 @@
         print 'no cmd specified'
         usage()
 
+    global recvTimeout
+    recvTimeoutMultiplier = 0.05
+    if os.environ.has_key('MV2_MPD_RECVTIMEOUT_MULTIPLIER'):
+        try:
+            recvTimeoutMultiplier = int(os.environ['MV2_MPD_RECVTIMEOUT_MULTIPLIER'])
+        except ValueError:
+            try:
+                recvTimeoutMultiplier = float(os.environ['MV2_MPD_RECVTIMEOUT_MULTIPLIER'])
+            except ValueError:
+                print 'Invalid MV2_MPD_RECVTIMEOUT_MULTIPLIER. Value must be a number.'
+                sys.exit(-1)
+    recvTimeout = nProcs * recvTimeoutMultiplier
+
     argsetLoRange = nextRange
     argsetHiRange = nextRange + nProcs - 1
     loRange = argsetLoRange
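
For anyone who wants to see how the environment variable is interpreted
without reading the diff, here is a small standalone sketch of the same
parsing logic (not part of the patch; os.environ.get is used in place of
has_key so it also runs under newer Pythons, and the job size below is
hypothetical):

    # Standalone sketch of the MV2_MPD_RECVTIMEOUT_MULTIPLIER parsing above.
    import os
    import sys

    def parse_multiplier(default=0.05):
        val = os.environ.get('MV2_MPD_RECVTIMEOUT_MULTIPLIER')
        if val is None:
            return default
        try:
            return int(val)        # whole-number multipliers
        except ValueError:
            try:
                return float(val)  # fractional multipliers such as 0.1
            except ValueError:
                print('Invalid MV2_MPD_RECVTIMEOUT_MULTIPLIER. Value must be a number.')
                sys.exit(-1)

    if __name__ == '__main__':
        nProcs = 1024              # hypothetical job size
        print('recvTimeout = %s seconds' % (nProcs * parse_multiplier()))

Running this as 'MV2_MPD_RECVTIMEOUT_MULTIPLIER=0.1 python sketch.py'
prints 'recvTimeout = 102.4 seconds'.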




On Jun 5, 2008, at 11:32 PM, Dhabaleswar Panda wrote:

> Hi,
>
> We have received the following note re. MPD timeout on large clusters.
> We had a discussion about this in our group, and we feel that the best
> solution will be to provide this timeout value in a `dynamic' manner
> based on the system size. For example, for smaller systems a timeout
> value of 20 will be sufficient, while for larger systems this value
> may need to be set to 200 or higher. Will it be feasible to provide
> such a solution?
>
> Thanks,
>
> DK
>
> ---------- Forwarded message ----------
> Date: Thu, 29 May 2008 22:06:43 -0500
> From: David_Kewley at Dell.com
> To: mvapich-discuss at cse.ohio-state.edu
> Subject: [mvapich-discuss] Followup: mvapich2 issue regarding mpd
>     timeout in mpiexec
>
> This is a followup to this thread:
>
> http://mail.cse.ohio-state.edu/pipermail/mvapich-discuss/2007-May/000834.html
>
> between Greg Bauer and Qi Gao.
>
> We had the same problem that Greg saw -- failure of mpiexec, with the
> characteristic error message "no msg recvd from mpd when expecting ack
> of request". It was resolved for us by setting recvTimeout in
> mpiexec.py to a higher value, just as Greg suggested and Qi concurred.
> The default value is 20; we chose 200 (we did not experiment with
> values between these two, so a lower value may work in many cases).
>
> I think this change should be made permanent in MVAPICH2.  I do not
> think it will negatively impact anyone, because in the four cases  
> where
> this timeout is used, if the timeout expires mpiexec immediately makes
> an error exit anyway.  So the worst consequence is that mpiexec would
> take longer to fail (3 minutes longer if 200 is used instead of 20).
> The user who encounters this timeout has to fix the root cause of the
> timeout in order to get any work done, so they are not likely to
> encounter it repeatedly and thereby lose lots of runtime simply  
> because
> the timeout is large.  Is this analysis correct?
>
> Meanwhile, this change would clearly help at least some people with
> large clusters. We see failure with the default recvTimeout between
> 900 and 1000 processes; a larger recvTimeout allows us to scale to
> 3000 processes and beyond.
>
> The default setting does not cause failure if I make a simple, direct
> call to mpiexec.  I only see it when I use mpirun.lsf to launch a  
> large
> job.  I think the failure in the LSF case is due to the longer time it
> presumably takes to launch LSF's TaskStarter for every process, etc.
> The time required seems to be O(#processes) in the LSF case.  (We have
> LSF 6.2, with a local custom wrapper script for TaskStarter).
>
> If you agree that this change to the value of recvTimeout is OK,  
> please
> implement this one-line change in MVAPICH2, and consider  
> contributing it
> upstream to MPICH2 as well.
>
> If you decline to make this change, at least it's now on the web that
> this change does fix the problem. :)
>
> Thanks,
> David
>
> David Kewley
> Dell Infrastructure Consulting Services
> Onsite Engineer at the Maui HPC Center
> Cell: 602-460-7617
> David_Kewley at Dell.com



