[Swift-devel] Re: Coaster error

Mihael Hategan hategan at mcs.anl.gov
Tue Aug 17 12:43:47 CDT 2010


On Tue, 2010-08-17 at 12:08 -0500, Jonathan Monette wrote:
> OK. I have run more tests on this problem. I am running on both
> localhost and pads. In the first stage of my workflow I run on
> localhost to collect some metadata. I then use this metadata to
> reproject the images, submitting these jobs to pads. All the images are
> reprojected and the jobs complete without error. After this, coasters is
> waiting for more jobs to submit to the workers while localhost is
> collecting more metadata. I believe coasters starts to shut down some of
> the workers because they are idle and it wants to free the resources on
> the machine (am I correct so far?).

You are.

> During the shutdown some workers are shut down successfully, but
> there are always 1 or 2 that fail to shut down, and I get the qdel
> error 153 I mentioned yesterday. If coasters fails to shut down a
> job, does the service terminate?

No. The qdel part is not critical and is used when workers don't shut
down cleanly or on time.

> I ask this because after the job fails to shut down, no more jobs
> are submitted to the queue and my script hangs, since it is waiting
> for the next stage in my workflow to complete. Is there a coaster
> parameter that tells coasters not to shut down the workers even if
> they become idle for a bit, or is this a legitimate error in coasters?

You are assuming that the shutdown failure has something to do with jobs
not being run. I do not think that's necessarily right.





