[Swift-devel] [Bug 72] Campaign for scaling wf up to 244 molecules

bugzilla-daemon at mcs.anl.gov bugzilla-daemon at mcs.anl.gov
Sun Jul 1 00:09:12 CDT 2007


http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=72





------- Comment #7 from iraicu at cs.uchicago.edu  2007-07-01 00:09 -------
(In reply to comment #6)
> (In reply to comment #4)
> > Hi again,
> > Here is an update of yesterday's 244 molecule run.  The experiment ran further
> > than before, but it still did not complete.  There were 240 molecules that
> > completed successfully (in the previous run, no molecule finished), but 4
> > molecules still did not finish. 
> > 
> 
> Actually it looks like the tasks worked fine:
> bash-3.1$ cat MolDyn-244-63ar6atbg2ae1.log |grep "type=1.*ubmitted"|wc
>   24309  243090 2806214
> bash-3.1$ cat MolDyn-244-63ar6atbg2ae1.log |grep "type=1.*ailed"|wc
>    3614   36140  405816
> bash-3.1$ cat MolDyn-244-63ar6atbg2ae1.log |grep "type=1.*ompleted"|wc
>   20695  206950 2389556
> 
> All tasks are accounted for. It may be that some jobs failed 3 times in a row.
> From the logs it looks like the workflow almost finished and it got to the
> point where the error reporting was to be done. Perhaps the stack overflow that
> you saw occurred there, and perhaps the impossible size of the workflow might
> have something to do with it.
> 
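The accounting in the quoted comment (submitted = failed + completed) can be redone in one pass over the log. This is only a sketch: it reuses the three grep patterns quoted above and assumes nothing else about the log line layout. The function names are mine, not anything from Swift or Falkon.

```python
import re

# Same patterns as the quoted grep commands. Nothing else about the
# log format is assumed.
PATTERNS = {
    "submitted": re.compile(r"type=1.*ubmitted"),
    "failed": re.compile(r"type=1.*ailed"),
    "completed": re.compile(r"type=1.*ompleted"),
}


def count_task_events(lines):
    """Count lines matching each pattern (what `grep ... | wc -l` reports)."""
    counts = {name: 0 for name in PATTERNS}
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


def unaccounted(counts):
    """Submitted tasks that neither failed nor completed (0 means all accounted for)."""
    return counts["submitted"] - counts["failed"] - counts["completed"]
```

To run it, pass an open file handle for MolDyn-244-63ar6atbg2ae1.log to count_task_events() and then call unaccounted() on the result. With the counts quoted above, 24309 - 3614 - 20695 = 0, which is why all tasks are accounted for.
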
The same machine (tg-v024) that we had trouble with before acted up again; I
should have removed it before we started the experiment.  If the consensus is
that repeated job failures were the cause, we can certainly try again and make
sure this machine is not in the resource pool.  Another idea is to increase the
retry count from 3 to something higher, maybe 10 or 30.  Jobs can be
resubmitted relatively quickly with Falkon, so retrying many times is not a big
overhead, except that it takes longer for Swift to give up!
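
A rough sketch of the retry trade-off, under a strong assumption that probably does not hold here: each attempt fails independently with the same probability. The failure rate is estimated from the counts quoted above (3614 failed out of 24309 submitted). The per-attempt time is a placeholder, not a measurement, and the function names are made up for illustration.

```python
def exhaust_probability(p_fail, retries):
    """Chance that a job fails on every one of `retries` attempts.

    ASSUMPTION: attempts fail independently with probability p_fail.
    """
    if not 0.0 <= p_fail <= 1.0:
        raise ValueError("p_fail must be between 0 and 1")
    return p_fail ** retries


def give_up_delay(retries, seconds_per_attempt):
    """Worst-case time before a job that always fails is abandoned."""
    return retries * seconds_per_attempt


if __name__ == "__main__":
    p = 3614 / 24309            # failed / submitted, from the quoted counts
    seconds_per_attempt = 1.0   # PLACEHOLDER value, not measured
    for retries in (3, 10, 30):
        print(retries,
              exhaust_probability(p, retries),
              give_up_delay(retries, seconds_per_attempt))
```

If failures are concentrated on one bad machine, they are not independent: a job that keeps landing there fails every time regardless of the retry count, so removing tg-v024 from the pool matters more than raising the limit.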

Ioan

