[Swift-devel] misassignment of jobs
Michael Wilde
wilde at mcs.anl.gov
Sun Nov 21 20:45:38 CST 2010
I was testing with the two mods below in place (long values in both worker timeout and service timeout).
- Mike
login1$ pwd
/scratch/local/wilde/swift/src/trunk.gomods/cog/modules/provider-coaster
login1$
login1$ svn diff
Index: src/org/globus/cog/abstraction/coaster/service/CoasterService.java
===================================================================
--- src/org/globus/cog/abstraction/coaster/service/CoasterService.java (revision 2932)
+++ src/org/globus/cog/abstraction/coaster/service/CoasterService.java (working copy)
@@ -41,7 +41,7 @@
public static final Logger logger = Logger
.getLogger(CoasterService.class);
- public static final int IDLE_TIMEOUT = 120 * 1000;
+ public static final int IDLE_TIMEOUT = 120 * 1000 /* extend it: */ * 30 * 240;
public static final int CONNECT_TIMEOUT = 2 * 60 * 1000;
Index: resources/worker.pl
===================================================================
--- resources/worker.pl (revision 2932)
+++ resources/worker.pl (working copy)
@@ -123,7 +123,7 @@
my $URISTR=$ARGV[0];
my $BLOCKID=$ARGV[1];
my $LOGDIR=$ARGV[2];
-my $IDLETIMEOUT = ( $#ARGV <= 2 ) ? (4 * 60) : $ARGV[3];
+my $IDLETIMEOUT = ( $#ARGV <= 2 ) ? (4 * 60 * 60 * 24) : $ARGV[3];
# REQUESTS holds a map of incoming requests
login1$
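
For reference, here is a minimal sketch of what the two extended values in the diff work out to. The class and helper names are hypothetical and not part of the coaster code; only the arithmetic comes from the diff. It also checks that the extended service value still fits in a Java int, since IDLE_TIMEOUT is declared as int.

```java
import java.time.Duration;

// Hypothetical helper, not part of provider-coaster: it only reproduces
// the arithmetic from the two changed lines in the diff.
class TimeoutCheck {
    // CoasterService.IDLE_TIMEOUT, in milliseconds (before and after the change).
    static final long SERVICE_IDLE_MS_ORIGINAL = 120L * 1000;
    static final long SERVICE_IDLE_MS_EXTENDED = 120L * 1000 * 30 * 240;

    // worker.pl default $IDLETIMEOUT, in seconds (before and after the change).
    static final long WORKER_IDLE_S_ORIGINAL = 4L * 60;
    static final long WORKER_IDLE_S_EXTENDED = 4L * 60 * 60 * 24;

    // True if the value can be stored in a Java int without overflow.
    static boolean fitsInInt(long value) {
        return value >= Integer.MIN_VALUE && value <= Integer.MAX_VALUE;
    }

    public static void main(String[] args) {
        Duration service = Duration.ofMillis(SERVICE_IDLE_MS_EXTENDED);
        Duration worker = Duration.ofSeconds(WORKER_IDLE_S_EXTENDED);

        System.out.println("service idle timeout: " + service.toDays() + " days");
        System.out.println("worker idle timeout:  " + worker.toDays() + " days");
        System.out.println("service value fits in int: "
                + fitsInInt(SERVICE_IDLE_MS_EXTENDED));
    }
}
```

So the service timeout goes from 2 minutes to 10 days, and the worker default goes from 4 minutes to 4 days. The service value (864,000,000 ms) is below Integer.MAX_VALUE, so the int constant does not overflow.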
----- Original Message -----
> Ok. I will remove the idle timeouts from the worker. I do not expect any
> negative consequences there given the reasoning I outlined before.
>
> Mihael
>
> On Sun, 2010-11-21 at 19:37 -0600, Michael Wilde wrote:
> > OK, re bug 2: I didn't connect the symptoms of this issue with your
> > earlier comments on timeouts, and just verified that you are
> > correct: with the same extended timeouts I was using to try to keep
> > a persistent coaster service up for an extended time, the failing
> > case for bug 2 works.
> >
> > I'll try to reproduce bug 1 now, then 3.
> >
> > - Mike
> >
> >
> > ----- Original Message -----
> > > On Sun, 2010-11-21 at 17:10 -0600, Michael Wilde wrote:
> > > > Mihael,
> > > >
> > > > If you're in fixin' mode,
> > >
> > > I've been in fixin' mode for the past two months :)
> > >
> > > > I'll spend some time now trying to reproduce the 3 coaster problems
> > > > that are high on my "needed for users" list:
> > > >
> > > > 1. Swift hangs/fails talking to persistent server if it sits idle for
> > > > a few minutes, even with large timeout values (which were possibly not
> > > > set correctly or fully).
> > > >
> > > > 2. With normal coaster mode, if workers start timing out for lack
> > > > of work, the Swift run dies.
> > >
> > > That one is addressed by removing the worker timeout. As I mentioned in
> > > a previous email, that timeout is an artifact of an older worker
> > > management scheme.
> > >
> > > >
> > > > 3. Errors in provider staging at high volume.
> > > >
> > > > If you already have test cases for these issues, let me know, and I'll
> > > > focus on the missing ones. But I'm assuming for now you need all
> > > > three.
> > >
> > > I have test cases for 1 and 3. I couldn't reproduce the problems so
> > > far.
> > >
> > > Mihael
> >
--
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory