[Swift-devel] misassignment of jobs

Michael Wilde wilde at mcs.anl.gov
Sun Nov 21 20:45:38 CST 2010


I was testing with the two mods below in place (long values for both the worker idle timeout and the service idle timeout).

- Mike

login1$ pwd
/scratch/local/wilde/swift/src/trunk.gomods/cog/modules/provider-coaster
login1$ 

login1$ svn diff
Index: src/org/globus/cog/abstraction/coaster/service/CoasterService.java
===================================================================
--- src/org/globus/cog/abstraction/coaster/service/CoasterService.java  (revision 2932)
+++ src/org/globus/cog/abstraction/coaster/service/CoasterService.java  (working copy)
@@ -41,7 +41,7 @@
     public static final Logger logger = Logger
             .getLogger(CoasterService.class);

-    public static final int IDLE_TIMEOUT = 120 * 1000;
+    public static final int IDLE_TIMEOUT = 120 * 1000 /* extend it: */ * 30 * 240;

     public static final int CONNECT_TIMEOUT = 2 * 60 * 1000;

Index: resources/worker.pl
===================================================================
--- resources/worker.pl (revision 2932)
+++ resources/worker.pl (working copy)
@@ -123,7 +123,7 @@
 my $URISTR=$ARGV[0];
 my $BLOCKID=$ARGV[1];
 my $LOGDIR=$ARGV[2];
-my $IDLETIMEOUT = ( $#ARGV <= 2 ) ? (4 * 60) : $ARGV[3];
+my $IDLETIMEOUT = ( $#ARGV <= 2 ) ? (4 * 60 * 60 * 24) : $ARGV[3];


 # REQUESTS holds a map of incoming requests
login1$ 
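
For reference, here is a quick sanity check of what the two patched expressions above work out to. It is only a sketch: the class and method names are made up for illustration, and it assumes the service constant is in milliseconds and the worker default is in seconds, which is how the original values (120 * 1000 and 4 * 60) read. The main point is that the extended service value still fits in a Java int.

```java
public class TimeoutCheck {
    // Patched service-side expression from the diff (assumed milliseconds).
    static final int PATCHED_IDLE_TIMEOUT = 120 * 1000 * 30 * 240;

    // Patched worker-side default from the diff (assumed seconds).
    static final int PATCHED_WORKER_IDLE_TIMEOUT = 4 * 60 * 60 * 24;

    // Same service product computed in 64-bit arithmetic, so an int
    // overflow in the constant above would show up as a mismatch.
    static long serviceTimeoutExact() {
        return 120L * 1000L * 30L * 240L;
    }

    static boolean fitsInInt(long value) {
        return value >= Integer.MIN_VALUE && value <= Integer.MAX_VALUE;
    }

    static double millisToDays(long millis) {
        return millis / (1000.0 * 60 * 60 * 24);
    }

    static double secondsToDays(long seconds) {
        return seconds / (60.0 * 60 * 24);
    }

    public static void main(String[] args) {
        long exact = serviceTimeoutExact();
        System.out.println("service idle timeout (ms): " + exact);
        System.out.println("fits in int: " + fitsInInt(exact));
        System.out.println("service idle timeout (days): " + millisToDays(exact));
        System.out.println("worker idle timeout (s): " + PATCHED_WORKER_IDLE_TIMEOUT);
        System.out.println("worker idle timeout (days): "
                + secondsToDays(PATCHED_WORKER_IDLE_TIMEOUT));
    }
}
```

Under those unit assumptions the service timeout goes from 2 minutes to 10 days and the worker default from 4 minutes to 4 days, so neither should fire during a normal test run.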


----- Original Message -----
> Ok. I will remove the idle timeouts from the worker. I do not expect any
> negative consequences there given the reasoning I outlined before.
> 
> Mihael
> 
> On Sun, 2010-11-21 at 19:37 -0600, Michael Wilde wrote:
> > OK, re bug 2: I didn't connect the symptoms of this issue with your
> > earlier comments on timeouts, and just verified that you are
> > correct: with the same extended timeouts I was using to try to keep
> > a persistent coaster service up for an extended time, the failing
> > case for bug 2 works.
> >
> > I'll try to reproduce bug 1 now, then 3.
> >
> > - Mike
> >
> >
> > ----- Original Message -----
> > > On Sun, 2010-11-21 at 17:10 -0600, Michael Wilde wrote:
> > > > Mihael,
> > > >
> > > > If you're in fixin' mode,
> > >
> > > I've been in fixin' mode for the past two months :)
> > >
> > > > I'll spend some time now trying to reproduce the 3 coaster problems
> > > > that are high on my "needed for users" list:
> > > >
> > > > 1. Swift hangs/fails talking to a persistent server if it sits idle
> > > > for a few minutes, even with large timeout values (which were
> > > > possibly not set correctly or fully).
> > > >
> > > > 2. With normal coaster mode, if workers start timing out for lack
> > > > of work, the Swift run dies.
> > >
> > > That one is addressed by removing the worker timeout. As I mentioned
> > > in a previous email, that timeout is an artifact of an older worker
> > > management scheme.
> > >
> > > >
> > > > 3. Errors in provider staging at high volume.
> > > >
> > > > If you already have test cases for these issues, let me know, and
> > > > I'll focus on the missing ones. But I'm assuming for now you need
> > > > all three.
> > >
> > > I have test cases for 1 and 3. I couldn't reproduce the problems so
> > > far.
> > >
> > > Mihael
> >

-- 
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory



