[Swift-devel] Re: Coaster error
Mihael Hategan
hategan at mcs.anl.gov
Tue Aug 17 12:43:47 CDT 2010
On Tue, 2010-08-17 at 12:08 -0500, Jonathan Monette wrote:
> Ok. I have run more tests on this problem. I am running on both
> localhost and pads. In the first stage of my workflow I run on
> localhost to collect some metadata. I then use this metadata to
> reproject the images, submitting these jobs to pads. All the images are
> reprojected and the stage completes without error. After this, coasters is
> waiting for more jobs to submit to the workers while localhost is
> collecting more metadata. I believe coasters starts to shut down some of
> the workers because they are idle and it wants to free the resources on the
> machine (am I correct so far?)
You are.
> During the shutdown some workers are
> shut down successfully, but there are always 1 or 2 that fail to shut down,
> and I get the qdel error 153 I mentioned yesterday. If coasters fails
> to shut down a job, does the service terminate?
No. The qdel part is not critical and is used when workers don't shut
down cleanly or on time.
> I ask this because after
> the job fails to shut down there are no more jobs being submitted to the
> queue and my script hangs since it is waiting for the next stage in my
> workflow to complete. Is there a coaster parameter that tells coasters
> not to shut down the workers even if they become idle for a bit, or
> is this a legitimate error in coasters?
You are assuming that the shutdown failure has something to do with jobs
not being run. I do not think that's necessarily right.