[Swift-devel] replication/recall of jobs from slow queues

Mihael Hategan hategan at mcs.anl.gov
Wed May 7 23:05:51 CDT 2008


Ok. Second attempt. swift r1944, cog r2003.

This time it got some testing.

There is still some trouble: a possible race condition between the time
a job becomes active and the time its replicas are canceled. If
canceling takes long enough, more than one replica may well complete,
causing who knows what.
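
To make the race concrete, here is a minimal sketch of one way to guard
against it: accept only the first completion and ignore any later ones.
This is not the code in swift r1944 or cog r2003. The names ReplicaGroup,
on_completed and simulate are hypothetical and exist only for this
illustration. It also only protects the bookkeeping on the submitting
side; it does nothing about side effects of replicas that actually ran
remotely.

```python
import threading


class ReplicaGroup:
    """Tracks the replicas of one logical job (hypothetical, not a Swift/CoG class)."""

    def __init__(self, replica_ids):
        self._lock = threading.Lock()
        self._winner = None          # id of the first replica to complete
        self.replica_ids = list(replica_ids)
        self.ignored = []            # late completions that lost the race

    def on_completed(self, replica_id):
        """Return True only for the first replica that reports completion."""
        with self._lock:
            if self._winner is None:
                self._winner = replica_id
                return True
            self.ignored.append(replica_id)
            return False

    @property
    def winner(self):
        return self._winner


def simulate(n_replicas=3):
    """Racy case: every replica completes before any cancel takes effect."""
    group = ReplicaGroup(range(n_replicas))
    accepted = []
    barrier = threading.Barrier(n_replicas)

    def run(replica_id):
        barrier.wait()               # line the replicas up so they finish together
        if group.on_completed(replica_id):
            accepted.append(replica_id)

    threads = [threading.Thread(target=run, args=(i,)) for i in range(n_replicas)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return group, accepted
```

Calling simulate(4) lets four replicas complete at once; only one
on_completed call returns True, and the other three end up in
group.ignored. Which replica wins is not deterministic.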

On Mon, 2008-05-05 at 01:51 -0500, Mihael Hategan wrote:
> And then there's the "pbs simulator" I have. Anyway, the point is it
> (replication) doesn't work just yet (as one would easily have
> suspected). Stay tuned.
> 
> On Sun, 2008-05-04 at 17:50 -0500, Mihael Hategan wrote:
> > That makes sense. I think PBS has similar capabilities, but I'm not sure
> > how one would express that in cog/gram.
> > 
> > On Sun, 2008-05-04 at 22:43 +0000, Ben Clifford wrote:
> > > On Fri, 2 May 2008, Mihael Hategan wrote:
> > > 
> > > > I didn't have time to test this much (given that it's not very easy to
> > > > test), so probably there will be problems.
> > > 
> > > One way I was thinking of testing on a real site is to set profile keys so 
> > > that jobs go into a condor pool with a requirement to not run for a 
> > > specified time after submission (I think that is expressible in the 
> > > classad language). That should give reproducible at-least-one-resubmission 
> > > behaviour.
> > > 
> > 
> > _______________________________________________
> > Swift-devel mailing list
> > Swift-devel at ci.uchicago.edu
> > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
> 



