[Swift-devel] [Bug 72] Campaign for scaling wf up to 244 molecules

bugzilla-daemon at mcs.anl.gov bugzilla-daemon at mcs.anl.gov
Sun Jul 1 10:48:09 CDT 2007


http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=72





------- Comment #11 from iraicu at cs.uchicago.edu  2007-07-01 10:48 -------
(In reply to comment #9)
> (In reply to comment #7)
> > (In reply to comment #6)
> > > (In reply to comment #4)
> > The same machine (tg-v024) that we had trouble with before acted up again, I
> > should have removed it before we started the experiment.  If this is the
> > consensus, we can certainly try it again, and make sure this machine is not in
> > the resource pool.  Another idea is to increase the retry # from 3 to something
> > higher, maybe 10, 30, etc?
> 
> Not a good idea in the general case, since many times the error may not be
> something temporary. The swift scheduler takes bad machines into account and
> attempts to avoid submitting to them.
>
Yes, but in this case Falkon was the only set of resources available to Swift,
so giving up early means giving up on the entire workflow.  If the workflow
failed to complete because the number of failures reached the maximum of 3, I
would argue that it is worthwhile to raise this ceiling, at least when running
solely with Falkon, or at the very least for this experiment, so that we can
see the 244-molecule run succeed.  Remember that Falkon is much faster than
GRAM/PBS, so if errors happen quickly, as on this tg-v024 node, where each
failure takes <50 ms, then 1000s of errors can happen in a matter of seconds
to minutes.  I am not sure what the correct solution is, but it is something
to consider, as the dynamics of the problem are now different from what they
were before Falkon.
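
To make the concern concrete, here is a rough sketch of how a single
fast-failing node can burn through a small retry budget.  Everything in it is
an assumption for illustration only: the simulate() function, the 8 good
workers, the 60 s task runtime, and the greedy "next free worker takes the
next task" dispatch.  It is not how Swift or Falkon actually schedule, and it
deliberately ignores any bad-host avoidance.  Only the 244 tasks, the 50 ms
failure time, and the limit of 3 come from this thread.

```python
import heapq
from collections import deque


def simulate(num_tasks, good_workers, good_runtime_ms, bad_fail_ms, max_attempts):
    """Toy model: greedy dispatch with one bad worker that fails every task.

    Hypothetical sketch, not Swift's or Falkon's scheduler. A failed task is
    put back at the front of the queue until it has used max_attempts.
    Returns (completed, abandoned, total_failures).
    """
    # Heap entries are (time_worker_becomes_free_ms, worker_id, is_bad).
    workers = [(0, i, False) for i in range(good_workers)]
    workers.append((0, good_workers, True))  # the single bad node
    heapq.heapify(workers)

    queue = deque((task, 0) for task in range(num_tasks))  # (task, attempts)
    completed = abandoned = failures = 0

    while queue:
        now, wid, is_bad = heapq.heappop(workers)  # next worker to free up
        task, attempts = queue.popleft()
        if is_bad:
            failures += 1
            attempts += 1
            if attempts >= max_attempts:
                abandoned += 1  # retry budget exhausted, task is lost
            else:
                queue.appendleft((task, attempts))  # retry as soon as possible
            heapq.heappush(workers, (now + bad_fail_ms, wid, True))
        else:
            completed += 1  # assume good workers always succeed
            heapq.heappush(workers, (now + good_runtime_ms, wid, False))

    return completed, abandoned, failures


if __name__ == "__main__":
    # 244 tasks, 8 good workers at 60 s per task, bad node fails in 50 ms.
    print(simulate(244, 8, 60_000, 50, 3))  # prints (8, 236, 708)
```

Under these assumptions the bad node is free again every 50 ms while the good
workers are busy for a full minute, so it picks up every remaining task three
times in a row and exhausts the limit for all of them in about 35 seconds.
Raising max_attempts in the sketch, or removing the bad worker, shows how
sensitive the outcome is to those two choices.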

Ioan 

