[Swift-devel] Re: Coaster error

Jonathan Monette jon.monette at gmail.com
Tue Aug 17 13:21:03 CDT 2010


OK, so the qdel error I am seeing can be ignored?  I am assuming that
the shutdown failure has something to do with the jobs being run, because
when I run a smaller data set (10 images instead of 1300) the shutdown
error happens at the end of the workflow, and I also get this error:

Failed to shut down channel
org.globus.cog.karajan.workflow.service.channels.ChannelException: Invalid channel: 1338035062: {}
     at org.globus.cog.karajan.workflow.service.channels.ChannelManager.getMetaChannel(ChannelManager.java:442)
     at org.globus.cog.karajan.workflow.service.channels.ChannelManager.getMetaChannel(ChannelManager.java:422)
     at org.globus.cog.karajan.workflow.service.channels.ChannelManager.shutdownChannel(ChannelManager.java:411)
     at org.globus.cog.karajan.workflow.service.channels.ChannelManager.handleChannelException(ChannelManager.java:284)
     at org.globus.cog.karajan.workflow.service.channels.AbstractStreamKarajanChannel.handleChannelException(AbstractStreamKarajanChannel.java:83)
     at org.globus.cog.karajan.workflow.service.channels.AbstractStreamKarajanChannel$Sender.run(AbstractStreamKarajanChannel.java:257)


On 8/17/10 12:43 PM, Mihael Hategan wrote:
> On Tue, 2010-08-17 at 12:08 -0500, Jonathan Monette wrote:
>    
>> Ok.  I have run more tests on this problem.  I am running on both
>> localhost and pads.  In the first stage of my workflow I run on
>> localhost to collect some metadata.  I then use this metadata to
>> reproject the images, submitting these jobs to pads.  All the images are
>> reprojected and the stage completes without error.  After this, coasters
>> is waiting for more jobs to submit to the workers while localhost is
>> collecting more metadata.  I believe coasters starts to shut down some of
>> the workers because they are idle and it wants to free the resources on
>> the machine (am I correct so far?)
>>      
> You are.
>
>    
>>    During the shutdown some workers are
>> shut down successfully, but there are always 1 or 2 that fail to shut
>> down, and I get the qdel error 153 I mentioned yesterday.  If coasters
>> fails to shut down a job, does the service terminate?
>>      
> No. The qdel part is not critical and is used when workers don't shut
> down cleanly or on time.
>
>    
>>    I ask this because after
>> the job fails to shut down, no more jobs are submitted to the
>> queue, and my script hangs since it is waiting for the next stage in my
>> workflow to complete.  Is there a coaster parameter that tells coasters
>> not to shut down the workers even if they become idle for a bit, or
>> is this a legitimate error in coasters?
>>      
> You are assuming that the shutdown failure has something to do with jobs
> not being run. I do not think that's necessarily right.

-- 
Jon

Computers are incredibly fast, accurate, and stupid. Human beings are incredibly slow, inaccurate, and brilliant. Together they are powerful beyond imagination.
- Albert Einstein



