[Swift-devel] Hangchecker tweak

Mihael Hategan hategan at mcs.anl.gov
Wed Jun 29 11:46:31 CDT 2011


Can you copy those logs to a ci machine?

On Wed, 2011-06-29 at 11:36 -0500, Ketan Maheshwari wrote:
> Mihael, All,
> 
> Continuing my experiments with the Swift trunk PBS coaster provider
> on Beagle: it seems that as soon as the hangchecker kicks in, it
> prevents jobs from being submitted.
> 
> In support of this hypothesis, I have about 12 runs with job throttle
> values ranging from 100 to 1000 where I observe that, when there is
> no activity for 10s while the stage-ins are being done, the
> hangchecker thread gets invoked and no further job submissions take
> place after that.
> 
> In one of the longer experiments, I also observed this after a long
> time while jobs were running: once the hangchecker was invoked,
> Swift did not submit any more jobs.
> 
> The following are two logs in which this phenomenon occurs:
> 
> /lustre/beagle/ketan/labs/modftdock/production/campaign5/ftdock-20110629-1020-x1la8psc.log
> /lustre/beagle/ketan/labs/modftdock/production/campaign5/ftdock-20110629-1021-y43tid39.log
> 
> Following is a log where it does not occur:
> 
> /lustre/beagle/ketan/labs/modftdock/production/campaign5/ftdock-20110629-1023-rq2eosp5.log
> 
> 
> Further notes:
> 1. In cases where jobs do not get submitted, no submit files are
> created in ~/.globus/scripts
> 2. I went through the log files; however, they seem to record only
> things that have happened, for instance vdl:execute2 lines for jobs
> that were submitted, with no such lines for jobs that were not. I
> could not find any error messages in the log that could indicate what
> has been happening.
> 
> To confirm the hypothesis, could you tell me how I can disable the
> hangchecker or increase the time period before it gets invoked?
> 
> Any other help you can offer to resolve this would be very useful.
> 
> Regards,
> -- 
> Ketan
> 
> 
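The log check described in note 2 above could be scripted roughly as
follows. This is only a sketch: it counts vdl:execute2 lines before and
after the first line that mentions the hangchecker. The exact text the
hangchecker writes to the log is not given in this thread, so the
HANG_MARKER pattern is an assumption to be adjusted against a real log,
and count_execute_lines is a hypothetical helper name.

```python
import re
import sys

# Assumption: the hangchecker's log lines contain some spelling of
# "hang checker". Adjust this pattern to match the real log text.
HANG_MARKER = re.compile(r"hang\s*checker", re.IGNORECASE)
# Marker mentioned in the thread for jobs that were submitted.
EXECUTE_MARKER = "vdl:execute2"


def count_execute_lines(lines):
    """Count vdl:execute2 lines before and after the first hangchecker line.

    Returns (before, after, hang_line), where hang_line is the 1-based
    number of the first line matching HANG_MARKER, or None if no line
    matches.
    """
    before = after = 0
    hang_line = None
    for number, line in enumerate(lines, start=1):
        if hang_line is None and HANG_MARKER.search(line):
            hang_line = number
            continue
        if EXECUTE_MARKER in line:
            if hang_line is None:
                before += 1
            else:
                after += 1
    return before, after, hang_line


if __name__ == "__main__":
    # Usage: python this_script.py run1.log run2.log ...
    for path in sys.argv[1:]:
        with open(path, errors="replace") as handle:
            before, after, hang_line = count_execute_lines(handle)
        print(f"{path}: first hangchecker line={hang_line}, "
              f"vdl:execute2 before={before}, after={after}")
```

If the hypothesis holds, the two affected logs would show a non-zero
"before" count and an "after" count of zero, while the unaffected log
would either have no hangchecker line or keep counting afterwards.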




