[Swift-devel] CASP jobs hang - seems to be in coaster scheduling

Michael Wilde wilde at mcs.anl.gov
Thu Jul 1 10:23:40 CDT 2010


Sorry, false alarm - please ignore the request below.

The problem was indeed simply that the app requested a maxwalltime larger than the maxtime of any available coaster slot.

Can this condition be detected, so that a clear error message is issued and the run ends?
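
A minimal sketch of the kind of check meant here, written in Python purely for illustration. It is not Swift's coaster code: the function name, the exception class, and the assumption that both the job's walltime and the slot maxtimes are already available as plain seconds are all hypothetical.

```python
class WalltimeError(Exception):
    """Raised when a job cannot fit into any configured slot (hypothetical)."""


def check_walltime_fits(job_name, job_walltime_s, slot_maxtimes_s):
    """Fail fast if the requested walltime exceeds every slot's maxtime.

    job_walltime_s   -- walltime requested by the app, in seconds (assumed unit)
    slot_maxtimes_s  -- maxtime of each available slot, in seconds (assumed unit)
    Returns the longest slot maxtime when the job fits.
    """
    if not slot_maxtimes_s:
        raise WalltimeError(f"{job_name}: no slots are configured")
    longest = max(slot_maxtimes_s)
    if job_walltime_s > longest:
        # Without this check the job would sit in the queue indefinitely.
        raise WalltimeError(
            f"{job_name}: requested walltime {job_walltime_s}s exceeds the "
            f"longest available slot ({longest}s); the job can never be scheduled"
        )
    return longest
```

Run once per app before any jobs are submitted, a check like this would turn the silent hang into an immediate, explicit failure.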

- Mike

----- wilde at mcs.anl.gov wrote:

> [Mihael: help urgently needed on this if possible]
> 
> Aashish, I see the runs you submitted around 3-4AM this morning in
> /home/aashish/CASP/{T0608,T0610,T0611}
> 
> Each of them shows a problem similar to what we saw earlier last night
> with T0608: the script submits 300 jobs to the pads coaster pool, and
> none of them run.
> 
> In some of these scripts, the first round of 300 (boostThreader) jobs works
> fine, but the later round of 300 loops jobs gets "stuck".
> 
> Mihael, can you set aside some time as soon as possible this morning
> to look at these? These need to be submitted to CASP by 2PM CDT today,
> so attention to the problem is rather urgent.
> 
> The scripts are all coming from /home/aashish/RapLoops
> The swift release is from /home/wilde/swift/src/stable/...
> 
> In the above directories, you will find all source for scripts,
> mappers, tc, and sites, as well as all logs. In some of the Tnnnn
> directories (each one is a protein target for the CASP competition)
> you will see multiple runs, each with an outN file log of stdout/err
> and then a run directory for that run with all relevant files.
> 
> This *looks* like the familiar problem of trying to run an app whose
> maxwalltime won't fit into any available coaster slot, but the times in
> tc and sites.xml don't seem to explain that behavior.
> 
> This script has been running well since May; "slight" changes were
> made to work around the unavailability of GPFS on PADS this week, but
> we still can't figure out why these scripts are hanging in this
> manner.
> 
> - Mike
> 
> _______________________________________________
> Swift-devel mailing list
> Swift-devel at ci.uchicago.edu
> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel

-- 
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory



