[Swift-user] hung submission
Yadu Nand Babuji
yadunand at uchicago.edu
Sun May 3 08:32:46 CDT 2015
Hi Mark,
What you are seeing are progress reports that Swift prints every 30 seconds.
All they indicate is that your jobs were submitted to the queue for execution.
Swift has to wait until the local resource manager, in this case the SGE
scheduler, starts executing the jobs.
From your description, all I can gather is that you are seeing long wait
times, with no indication of any failure.
Could you check whether you can spot the jobs submitted by Swift in the queue?
To do this, open a separate terminal on the login node while your Swift run is
waiting in the Submitted state, and use qstat to see your jobs.
[coursa1 at login06 part05]$ qstat
job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
6593408 0.00000 B0503-2802 coursa1      qw    05/03/2015 14:28:40                                    1
6593409 0.00000 B0503-2802 coursa1      qw    05/03/2015 14:28:41                                    1
The qw state indicates that your jobs are waiting in the queue.
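If you would rather not read the qstat listing by eye, here is a minimal Python
sketch that counts jobs per state from qstat's text output. It is only an
illustration: the names count_job_states, JOB_ROW and SAMPLE are made up for
this example, and it assumes the default column order shown above (job-ID,
prior, name, user, state). Other qstat configurations may print rows differently.

```python
import re
from collections import Counter

# Assumed row layout (default qstat columns): job-ID, prior, name, user, state.
# Header and separator lines do not start with a numeric job-ID, so they are skipped.
JOB_ROW = re.compile(r"^\s*(\d+)\s+([\d.]+)\s+(\S+)\s+(\S+)\s+(\S+)\s")


def count_job_states(qstat_text):
    """Return a Counter mapping state codes (e.g. 'qw', 'r') to job counts."""
    states = Counter()
    for line in qstat_text.splitlines():
        match = JOB_ROW.match(line)
        if match:
            states[match.group(5)] += 1
    return states


# Sample text modelled on the two queued jobs in the listing above.
SAMPLE = """\
job-ID  prior   name       user         state submit/start at     queue  slots ja-task-ID
-----------------------------------------------------------------------------------------
6593408 0.00000 B0503-2802 coursa1      qw    05/03/2015 14:28:40            1
6593409 0.00000 B0503-2802 coursa1      qw    05/03/2015 14:28:41            1
"""

if __name__ == "__main__":
    for state, count in count_job_states(SAMPLE).items():
        print(state, count)
```

On the login node you could feed it live output instead of SAMPLE, for example
the stdout of subprocess.run(["qstat"], capture_output=True, text=True). If
every job is counted under qw, the jobs are still waiting for the scheduler.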
Thanks,
Yadu
On 05/03/2015 01:11 AM, Altaweel, Mark wrote:
> Hi,
>
> I tried executing Swift on our institution's SGE-based cluster and
> the submission seems hung or not executing properly. It has the
> following message:
>
> Swift 0.96-RC1 git-rev: c7a1dc478a40865f5639f186284697d53978bd48
> heads/release-0.96-swift 6274 (modified locally)
> RunID: run002
> Progress: Sun, 03 May 2015 07:00:29+0100
> Number of parameter combinations: 2
> Stride: 1
> Begin: 1, End: 1
> Begin: 2, End: 2
> Progress: Sun, 03 May 2015 07:00:30+0100 Submitted:2
> Error: No parallel environment specified
> Progress: Sun, 03 May 2015 07:01:00+0100 Submitted:2
> Progress: Sun, 03 May 2015 07:01:30+0100 Submitted:2
> Progress: Sun, 03 May 2015 07:02:00+0100 Submitted:2
> Progress: Sun, 03 May 2015 07:02:30+0100 Submitted:2
> Progress: Sun, 03 May 2015 07:03:00+0100 Submitted:2
> Progress: Sun, 03 May 2015 07:03:30+0100 Submitted:2
> Progress: Sun, 03 May 2015 07:04:00+0100 Submitted:2
> Progress: Sun, 03 May 2015 07:04:30+0100 Submitted:2
>
> This just repeats and does not seem to stop.
>
> The log file has the following messages, which also repeat:
>
> 2015-05-03 07:08:22,401+0100 INFO RuntimeStats$ProgressTicker HeapMax:
> 954728448, CrtHeap: 378535936, UsedHeap: 64559392, JVMThreads: 52
> 2015-05-03 07:08:23,401+0100 INFO RuntimeStats$ProgressTicker HeapMax:
> 954728448, CrtHeap: 378535936, UsedHeap: 64559432, JVMThreads: 52
> 2015-05-03 07:08:23,709+0100 INFO AbstractQueuePoller Actively
> monitored: 1, New: 0, Done: 0
> 2015-05-03 07:08:24,401+0100 INFO RuntimeStats$ProgressTicker HeapMax:
> 954728448, CrtHeap: 378535936, UsedHeap: 64584080, JVMThreads: 52
> 2015-05-03 07:08:25,401+0100 INFO RuntimeStats$ProgressTicker HeapMax:
> 954728448, CrtHeap: 378535936, UsedHeap: 64584120, JVMThreads: 52
> 2015-05-03 07:08:26,401+0100 INFO RuntimeStats$ProgressTicker HeapMax:
> 954728448, CrtHeap: 378535936, UsedHeap: 64584160, JVMThreads: 52
> 2015-05-03 07:08:27,401+0100 INFO RuntimeStats$ProgressTicker HeapMax:
> 954728448, CrtHeap: 378535936, UsedHeap: 64584200, JVMThreads: 52
> 2015-05-03 07:08:28,401+0100 INFO RuntimeStats$ProgressTicker HeapMax:
> 954728448, CrtHeap: 378535936, UsedHeap: 64584240, JVMThreads: 52
> 2015-05-03 07:08:29,401+0100 INFO RuntimeStats$ProgressTicker HeapMax:
> 954728448, CrtHeap: 378535936, UsedHeap: 64584280, JVMThreads: 52
>
>
> I did run this locally to see if anything is wrong with the submission
> and it worked fine with proper output.
>
> Thank you.
>
> Mark