[Swift-user] hung submission

Yadu Nand Babuji yadunand at uchicago.edu
Sun May 3 08:32:46 CDT 2015


Hi Mark,

What you are seeing are the progress reports that Swift prints every
30 seconds. All they indicate is that your jobs were submitted to the
queue for execution. Swift has to wait until the local resource
manager, in this case the SGE scheduler, starts executing the jobs.
From your description, all I can gather is that you are seeing long
wait times, with no indication of any failure.
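
If it helps to confirm that nothing is changing between reports, here is a
minimal Python sketch that parses the "Progress:" lines and checks whether
every report shows the same state counts. The regular expression and the
helper name parse_progress are my own assumptions based on the lines you
posted, not part of Swift itself.

```python
import re

# Assumed format: state names start with a letter and are followed by ":<count>".
STATE_RE = re.compile(r"([A-Za-z][A-Za-z ]*):(\d+)")

def parse_progress(line: str) -> dict:
    """Return {state: count} for one Swift 'Progress:' line (hypothetical helper)."""
    return {name: int(count) for name, count in STATE_RE.findall(line)}

# Lines copied from the run output in this thread.
lines = [
    "Progress: Sun, 03 May 2015 07:00:30+0100  Submitted:2",
    "Progress: Sun, 03 May 2015 07:01:00+0100  Submitted:2",
    "Progress: Sun, 03 May 2015 07:01:30+0100  Submitted:2",
]

snapshots = [parse_progress(line) for line in lines]
# True when every report matches the first one, i.e. no job changed state.
unchanged = all(s == snapshots[0] for s in snapshots)
print(snapshots[0], unchanged)
```

If the counts never change and everything stays under Submitted, the jobs
are most likely still sitting in the scheduler's queue.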

Could you check whether you can spot the jobs that Swift submitted to
the queue? To do this, open a separate terminal on the login node while
your Swift run is waiting in the Submitted state, and use qstat to see
your jobs.

[coursa1 at login06 part05]$ qstat
job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
6593408 0.00000 B0503-2802 coursa1      qw    05/03/2015 14:28:40                                    1
6593409 0.00000 B0503-2802 coursa1      qw    05/03/2015 14:28:41                                    1

The qw state indicates that your jobs are waiting in the queue.
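
If you want to check this repeatedly, the following is a rough Python
sketch that counts jobs per state from qstat output. It assumes the default
column layout shown above (state in the fifth column, numeric job-ID in the
first); count_job_states is a hypothetical helper name, and the sample text
is simply the listing from this message.

```python
from collections import Counter

def count_job_states(qstat_output: str) -> Counter:
    """Count jobs per state in plain qstat output (assumed default SGE layout)."""
    states = Counter()
    for line in qstat_output.splitlines():
        fields = line.split()
        # Job rows start with a numeric job-ID; header and separator rows do not.
        if len(fields) >= 5 and fields[0].isdigit():
            states[fields[4]] += 1  # fifth column holds the state, e.g. qw
    return states

sample = """\
job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
6593408 0.00000 B0503-2802 coursa1      qw    05/03/2015 14:28:40                                    1
6593409 0.00000 B0503-2802 coursa1      qw    05/03/2015 14:28:41                                    1
"""

print(count_job_states(sample))
```

On a real system you would feed it the text printed by qstat instead of the
sample string.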

Thanks,
Yadu


On 05/03/2015 01:11 AM, Altaweel, Mark wrote:
> Hi,
>
> I tried executing Swift on our institution's SGE-based cluster and 
> the submission seems to hang or not execute properly. It shows the 
> following message:
>
> Swift 0.96-RC1 git-rev: c7a1dc478a40865f5639f186284697d53978bd48 heads/release-0.96-swift 6274 (modified locally)
> RunID: run002
> Progress: Sun, 03 May 2015 07:00:29+0100
> Number of parameter combinations: 2
> Stride: 1
> Begin: 1, End: 1
> Begin: 2, End: 2
> Progress: Sun, 03 May 2015 07:00:30+0100  Submitted:2
> Error: No parallel environment specified
> Progress: Sun, 03 May 2015 07:01:00+0100  Submitted:2
> Progress: Sun, 03 May 2015 07:01:30+0100  Submitted:2
> Progress: Sun, 03 May 2015 07:02:00+0100  Submitted:2
> Progress: Sun, 03 May 2015 07:02:30+0100  Submitted:2
> Progress: Sun, 03 May 2015 07:03:00+0100  Submitted:2
> Progress: Sun, 03 May 2015 07:03:30+0100  Submitted:2
> Progress: Sun, 03 May 2015 07:04:00+0100  Submitted:2
> Progress: Sun, 03 May 2015 07:04:30+0100  Submitted:2
>
> This just repeats and does not seem to stop.
>
> The log file has the following messages, which also repeat:
>
> 2015-05-03 07:08:22,401+0100 INFO RuntimeStats$ProgressTicker HeapMax: 954728448, CrtHeap: 378535936, UsedHeap: 64559392, JVMThreads: 52
> 2015-05-03 07:08:23,401+0100 INFO RuntimeStats$ProgressTicker HeapMax: 954728448, CrtHeap: 378535936, UsedHeap: 64559432, JVMThreads: 52
> 2015-05-03 07:08:23,709+0100 INFO AbstractQueuePoller Actively monitored: 1, New: 0, Done: 0
> 2015-05-03 07:08:24,401+0100 INFO RuntimeStats$ProgressTicker HeapMax: 954728448, CrtHeap: 378535936, UsedHeap: 64584080, JVMThreads: 52
> 2015-05-03 07:08:25,401+0100 INFO RuntimeStats$ProgressTicker HeapMax: 954728448, CrtHeap: 378535936, UsedHeap: 64584120, JVMThreads: 52
> 2015-05-03 07:08:26,401+0100 INFO RuntimeStats$ProgressTicker HeapMax: 954728448, CrtHeap: 378535936, UsedHeap: 64584160, JVMThreads: 52
> 2015-05-03 07:08:27,401+0100 INFO RuntimeStats$ProgressTicker HeapMax: 954728448, CrtHeap: 378535936, UsedHeap: 64584200, JVMThreads: 52
> 2015-05-03 07:08:28,401+0100 INFO RuntimeStats$ProgressTicker HeapMax: 954728448, CrtHeap: 378535936, UsedHeap: 64584240, JVMThreads: 52
> 2015-05-03 07:08:29,401+0100 INFO RuntimeStats$ProgressTicker HeapMax: 954728448, CrtHeap: 378535936, UsedHeap: 64584280, JVMThreads: 52
>
>
> I did run this locally to see if anything is wrong with the submission 
> and it worked fine with proper output.
>
> Thank you.
>
> Mark
