[Swift-devel] Hangchecker tweak

Ketan Maheshwari ketancmaheshwari at gmail.com
Wed Jun 29 11:36:48 CDT 2011


Mihael, All,

Continuing my experiments with the Swift trunk PBS coaster provider on
Beagle: it seems that as soon as the hangchecker kicks in, no further jobs
get submitted.

In support of this hypothesis, I have about 12 runs with different job
throttle values from 100 to 1000. In these runs I observe that when there is
no activity for 10s while the stage-ins are being done, the hangchecker
thread gets invoked and no further job submissions take place after that.

In one of the longer experiments, I also observed this after jobs had been
running for a long time: once the hangchecker gets invoked, Swift does not
submit any more jobs.

Following are two logs in which this phenomenon occurs:

/lustre/beagle/ketan/labs/modftdock/production/campaign5/ftdock-20110629-1020-x1la8psc.log
/lustre/beagle/ketan/labs/modftdock/production/campaign5/ftdock-20110629-1021-y43tid39.log

Following is a log where it does not occur:

/lustre/beagle/ketan/labs/modftdock/production/campaign5/ftdock-20110629-1023-rq2eosp5.log


Further notes:
1. In cases where jobs do not get submitted, no submit files are created in
~/.globus/scripts
2. I went through the log files; however, it seems the log only records
things that have happened, for instance vdl:execute2 lines for jobs that
were submitted, with no such lines for jobs that were not. I could not find
any error messages in the log that could indicate what has been happening.
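
A rough way to check the pattern across several logs is to count
vdl:execute2 lines before and after the first hangchecker line. The script
below is only a minimal sketch: the "hangchecker" substring it searches for
is an assumption about how the hangchecker shows up in the log, and the
function name split_counts is made up for illustration. Adjust the markers
to whatever the logs actually contain.

```python
import sys

# Assumed marker: the exact text the hangchecker writes to the log is not
# known here, so this substring (matched case-insensitively) is a guess.
HANG_MARKER = "hangchecker"
# Marker mentioned in the notes above for jobs that were submitted.
SUBMIT_MARKER = "vdl:execute2"


def split_counts(lines, hang_marker=HANG_MARKER, submit_marker=SUBMIT_MARKER):
    """Return (line number of first hang marker or None,
    submit lines before it, submit lines after it)."""
    first_hang = None
    before = 0
    after = 0
    for number, line in enumerate(lines, start=1):
        lowered = line.lower()
        if first_hang is None and hang_marker.lower() in lowered:
            first_hang = number
            continue  # the hang line itself is not counted as a submit line
        if submit_marker.lower() in lowered:
            if first_hang is None:
                before += 1
            else:
                after += 1
    return first_hang, before, after


if __name__ == "__main__":
    # Usage: python hangcheck_scan.py run1.log run2.log ...
    for path in sys.argv[1:]:
        with open(path, errors="replace") as handle:
            hang_line, before, after = split_counts(handle)
        print(f"{path}: first hang line={hang_line}, "
              f"execute2 before={before}, after={after}")
```

If the hypothesis holds, the two failing logs should show a nonzero "before"
count and a zero "after" count, while the good log should show no hang line
at all or a nonzero "after" count.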

To confirm the hypothesis, could you indicate how I could disable the
hangchecker or increase the time period before it gets invoked?

Any other help you can offer to resolve this would be very useful.

Regards,
-- 
Ketan