[Swift-devel] CASP jobs hang - seems to be in coaster scheduling
Michael Wilde
wilde at mcs.anl.gov
Thu Jul 1 10:23:40 CDT 2010
Sorry, false alarm - please ignore the request below.
The problem was indeed simply that the jobs requested a maxwalltime larger than any available coaster maxtime slot.
Can this case be detected, so that a clear error message is issued and the run ends instead of hanging?
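
To make the request concrete, here is a minimal sketch of the kind of check I have in mind. It is written in Python purely for illustration and is not Swift or coaster code: the function and exception names are hypothetical, and it assumes both the job's maxwalltime and the coaster slot maxtime values have already been converted to seconds.

```python
class WalltimeTooLargeError(Exception):
    """Raised when a job can never fit into any configured coaster slot."""


def check_job_fits(job_name, job_walltime_s, slot_maxtimes_s):
    """Return the largest slot time if the job fits, else raise a clear error.

    All times are in seconds (an assumption of this sketch). The names used
    here are hypothetical and are not part of Swift or the coaster service.
    """
    if not slot_maxtimes_s:
        raise WalltimeTooLargeError(
            f"{job_name}: no coaster slots are configured")
    largest = max(slot_maxtimes_s)
    if job_walltime_s > largest:
        # Fail fast instead of leaving the job queued forever.
        raise WalltimeTooLargeError(
            f"{job_name}: requested maxwalltime {job_walltime_s}s exceeds "
            f"the largest coaster slot ({largest}s); "
            f"the job can never be scheduled")
    return largest


if __name__ == "__main__":
    try:
        check_job_fits("loops", 7200, [1800, 3600])
    except WalltimeTooLargeError as err:
        print(f"Aborting run: {err}")
```

The point of the sketch is that the comparison can be made when the job is submitted, before it is queued, so the run stops with a message naming the job and both times rather than sitting idle.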
- Mike
----- wilde at mcs.anl.gov wrote:
> [Mihael: help urgently needed on this if possible]
>
> Aashish, I see the runs you submitted around 3-4AM this morning in
> /home/aashish/CASP/{T0608,T0610,T0611}
>
> Each of them shows a problem similar to the one we saw last night
> with T0608: the script submits 300 jobs to the PADS coaster pool, and
> none of them run.
>
> In some of these scripts, the first round of 300 jobs (boostThreader)
> works fine, but the later round of 300 loops jobs gets "stuck".
>
> Mihael, can you set aside some time as soon as possible this morning
> to look at these? These need to be submitted to CASP by 2PM CDT today,
> so attention to the problem is rather urgent.
>
> The scripts are all coming from /home/aashish/RapLoops
> The swift release is from /home/wilde/swift/src/stable/...
>
> In the above directories, you will find all source for scripts,
> mappers, tc, and sites, as well as all logs. In some of the Tnnnn
> directories (each one is a protein target for the CASP competition)
> you will see multiple runs, each with an outN file log of stdout/err
> and then a run directory for that run with all relevant files.
>
> This *looks* like the familiar problem of trying to run an app whose
> maxwalltime won't fit into any available coaster slot, but the times in
> tc and sites.xml don't seem to explain that behavior.
>
> This script has been running well since May; "slight" changes were
> made to work around the unavailability of GPFS on PADS this week, but
> we still can't figure out why these scripts are hanging in this
> manner.
>
> - Mike
>
> _______________________________________________
> Swift-devel mailing list
> Swift-devel at ci.uchicago.edu
> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
--
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory