[Swift-devel] second wave of jobs do not start

Ketan Maheshwari ketan at mcs.anl.gov
Wed Mar 11 10:33:05 CDT 2015


Hi,

Running Swift trunk with coasters on ALCF, I am seeing that after the first
wave of jobs finishes, the second wave does not start.

After the first wave of jobs completes, the Swift progress text shows jobs in
the submitted state, while the queue (qstat) still shows a running status.
After a while the queue walltime expires, and no new jobs are submitted to
the queue.

Two worker log files are created for the run, so possibly the worker shuts
down and restarts for the second wave.

Attached are the run log and worker logs.

Thanks for any help debugging/fixing.
--
Ketan
-------------- next part --------------
A non-text attachment was scrubbed...
Name: worker-0311-5002070-000001.log
Type: application/octet-stream
Size: 1587762 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/swift-devel/attachments/20150311/3ba51f64/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: worker-0311-5002070-000000.log
Type: application/octet-stream
Size: 483696 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/swift-devel/attachments/20150311/3ba51f64/attachment-0001.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: run001.tgz
Type: application/x-gzip
Size: 37688 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/swift-devel/attachments/20150311/3ba51f64/attachment.bin>