[Swift-devel] replication/recall of jobs from slow queues
Mihael Hategan
hategan at mcs.anl.gov
Wed May 7 23:05:51 CDT 2008
Ok. Second attempt. swift r1944, cog r2003.
This time it got some testing.
There is still one problem: a possible race condition between the time a
job becomes active and the time its replicas are canceled. If canceling
takes long enough, more than one replica may well complete, causing who
knows what.
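
To illustrate the race, here is a minimal sketch in Python. It is not the cog/Swift code; ReplicaGroup, on_active and on_completed are hypothetical names made up for this example. The assumption is that cancellation is asynchronous, so a second replica can still report completion after the first one became active. Under that assumption, one way to make the late completion harmless is to accept only the first completion under a lock:

```python
import threading


class ReplicaGroup:
    """Hypothetical holder for all replicas of one logical job."""

    def __init__(self, replica_ids):
        self._lock = threading.Lock()
        self._pending = set(replica_ids)  # replicas not yet active or canceled
        self._winner = None               # first replica whose completion was accepted

    def on_active(self, replica_id):
        """A replica started running; return the replicas that should be canceled."""
        with self._lock:
            self._pending.discard(replica_id)
            to_cancel = sorted(self._pending)
            self._pending.clear()
        # Cancel requests would be sent outside the lock; they may take a while.
        return to_cancel

    def on_completed(self, replica_id):
        """Return True only for the first completion; later ones are ignored."""
        with self._lock:
            if self._winner is None:
                self._winner = replica_id
                return True
            return False


group = ReplicaGroup(["a", "b", "c"])
print(group.on_active("a"))     # replicas to cancel
print(group.on_completed("b"))  # "b" finished before its cancel took effect
print(group.on_completed("a"))  # late completion is dropped
```

This only keeps a second completion from being processed twice. It does not undo any side effects the extra replica may already have had on the site.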
On Mon, 2008-05-05 at 01:51 -0500, Mihael Hategan wrote:
> And then there's the "pbs simulator" I have. Anyway, the point is it
> (replication) doesn't work just yet (as one would easily have
> suspected). Stay tuned.
>
> On Sun, 2008-05-04 at 17:50 -0500, Mihael Hategan wrote:
> > That makes sense. I think PBS has similar capabilities, but I'm not sure
> > how one would express that in cog/gram.
> >
> > On Sun, 2008-05-04 at 22:43 +0000, Ben Clifford wrote:
> > > On Fri, 2 May 2008, Mihael Hategan wrote:
> > >
> > > > I didn't have time to test this much (given that it's not very easy to
> > > > test), so probably there will be problems.
> > >
> > > One way I was thinking of testing on a real site is to set profile keys so
> > > that jobs go into a condor pool with a requirement to not run for a
> > > specified time after submission (I think that is expressible in the
> > > classad language). That should give reproducible at-least-one-resubmission
> > > behaviour.
> > >
> >
> > _______________________________________________
> > Swift-devel mailing list
> > Swift-devel at ci.uchicago.edu
> > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel