[Swift-devel] misassignment of jobs

Mihael Hategan hategan at mcs.anl.gov
Sun Nov 21 20:49:13 CST 2010


Right. I would hold off on the service timeout. My tests show that it
has no impact, and, in theory, it both shouldn't have an impact and
should not be removed.

Mihael
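
For reference, the two extended values in the diff quoted below can be
checked with a short Java sketch. Only the two arithmetic expressions are
taken from the diff; the class and member names are hypothetical, and the
units are assumptions (milliseconds for the service IDLE_TIMEOUT, by
analogy with CONNECT_TIMEOUT = 2 * 60 * 1000, and seconds for the worker
$IDLETIMEOUT).

```java
import java.time.Duration;

public class TimeoutCheck {
    // Expression from the modified CoasterService.IDLE_TIMEOUT (assumed milliseconds).
    static final int SERVICE_IDLE_TIMEOUT_MS = 120 * 1000 * 30 * 240;

    // Expression from the modified worker.pl default (assumed seconds).
    static final int WORKER_IDLE_TIMEOUT_S = 4 * 60 * 60 * 24;

    // Same product in long arithmetic, to confirm the int constant did not overflow.
    static long serviceIdleTimeoutMsAsLong() {
        return 120L * 1000 * 30 * 240;
    }

    public static void main(String[] args) {
        System.out.println("service idle timeout (days): "
                + Duration.ofMillis(SERVICE_IDLE_TIMEOUT_MS).toDays());
        System.out.println("worker idle timeout (days): "
                + Duration.ofSeconds(WORKER_IDLE_TIMEOUT_S).toDays());
        System.out.println("service value fits in int: "
                + (serviceIdleTimeoutMsAsLong() == SERVICE_IDLE_TIMEOUT_MS));
    }
}
```

Under those unit assumptions the service value is 864,000,000 ms and the
worker value is 345,600 s, and the service product still fits in a Java
int.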

On Sun, 2010-11-21 at 20:45 -0600, Michael Wilde wrote:
> I was testing with the two mods below in place (long values in both worker timeout and service timeout).
> 
> - Mike
> 
> login1$ pwd
> /scratch/local/wilde/swift/src/trunk.gomods/cog/modules/provider-coaster
> login1$ 
> 
> login1$ svn diff
> Index: src/org/globus/cog/abstraction/coaster/service/CoasterService.java
> ===================================================================
> --- src/org/globus/cog/abstraction/coaster/service/CoasterService.java  (revision 2932)
> +++ src/org/globus/cog/abstraction/coaster/service/CoasterService.java  (working copy)
> @@ -41,7 +41,7 @@
>      public static final Logger logger = Logger
>              .getLogger(CoasterService.class);
> 
> -    public static final int IDLE_TIMEOUT = 120 * 1000;
> +    public static final int IDLE_TIMEOUT = 120 * 1000 /* extend it: */ * 30 * 240;
> 
>      public static final int CONNECT_TIMEOUT = 2 * 60 * 1000;
> 
> Index: resources/worker.pl
> ===================================================================
> --- resources/worker.pl (revision 2932)
> +++ resources/worker.pl (working copy)
> @@ -123,7 +123,7 @@
>  my $URISTR=$ARGV[0];
>  my $BLOCKID=$ARGV[1];
>  my $LOGDIR=$ARGV[2];
> -my $IDLETIMEOUT = ( $#ARGV <= 2 ) ? (4 * 60) : $ARGV[3];
> +my $IDLETIMEOUT = ( $#ARGV <= 2 ) ? (4 * 60 * 60 * 24) : $ARGV[3];
> 
> 
>  # REQUESTS holds a map of incoming requests
> login1$ 
> 
> 
> ----- Original Message -----
> > Ok. I will remove the idle timeouts from the worker. I do not expect any
> > negative consequences there given the reasoning I outlined before.
> > 
> > Mihael
> > 
> > On Sun, 2010-11-21 at 19:37 -0600, Michael Wilde wrote:
> > > OK, re bug 2: I didn't connect the symptoms of this issue with your
> > > earlier comments on timeouts, and just verified that you are
> > > correct: with the same extended timeouts I was using to try to keep
> > > a persistent coaster service up for an extended time, the failing
> > > case for bug 2 works.
> > >
> > > I'll try to reproduce bug 1 now, then 3.
> > >
> > > - Mike
> > >
> > >
> > > ----- Original Message -----
> > > > On Sun, 2010-11-21 at 17:10 -0600, Michael Wilde wrote:
> > > > > Mihael,
> > > > >
> > > > > If you're in fixin' mode,
> > > >
> > > > I've been in fixin' mode for the past two months :)
> > > >
> > > > > I'll spend some time now trying to reproduce the 3 coaster problems
> > > > > that are high on my "needed for users" list:
> > > > >
> > > > > 1. Swift hangs/fails talking to persistent server if it sits idle
> > > > > for a few minutes, even with large timeout values (which were
> > > > > possibly not set correctly or fully).
> > > > >
> > > > > 2. With normal coaster mode, if workers start timing out for lack
> > > > > of work, the Swift run dies.
> > > >
> > > > That one is addressed by removing the worker timeout. As I mentioned
> > > > in a previous email, that timeout is an artifact of an older worker
> > > > management scheme.
> > > >
> > > > >
> > > > > 3. Errors in provider staging at high volume.
> > > > >
> > > > > If you already have test cases for these issues, let me know, and
> > > > > I'll focus on the missing ones. But I'm assuming for now you need
> > > > > all three.
> > > >
> > > > I have test cases for 1 and 3. I couldn't reproduce the problems so
> > > > far.
> > > >
> > > > Mihael
> > >
> 




