[Swift-user] Queuing issue with SWIFT on Orthros, APS cluster
Justin M Wozniak
wozniak at mcs.anl.gov
Fri Aug 22 08:30:23 CDT 2014
This message was held by our mailing list; is this still an issue?
On 6/24/2014 7:32 PM, Hemant Sharma wrote:
> Hi guys,
>
> I'm having a problem with job queuing in SWIFT. I have a Swift script
> that should execute about 27000 iterations. To limit the
> initial memory size, I created the following config file:
>
> use.provider.staging=false
> provider.staging.pin.swiftfiles=false
> use.wrapper.staging=false
> status.mode=provider
> wrapperlog.always.transfer=true
> execution.retries=0
> lazy.errors=true
> sitedir.keep=true
> file.gc.enabled=false
> wrapper.parameter.mode=files
> foreach.max.threads=330
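As we understand it, foreach.max.threads caps how many iterations of a single foreach loop can be in progress at the same time. The Python sketch below is a rough model of such a cap using a counting semaphore. It is not Swift's scheduler, and ITERATIONS, POOL_SIZE, and the sleep standing in for an app call are hypothetical values chosen only for illustration. It is worth noting that the stalled run in the log holds at exactly Submitted:330, the same number as this cap.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

MAX_THREADS = 330   # mirrors foreach.max.threads=330 from the config above
ITERATIONS = 1000   # hypothetical; the real script runs about 27000 iterations
POOL_SIZE = 400     # hypothetical; more workers than the cap, so the cap is what binds

gate = threading.BoundedSemaphore(MAX_THREADS)  # one permit per allowed in-flight iteration
lock = threading.Lock()                          # protects the counters below
in_flight = 0
peak = 0


def run_iteration(i):
    global in_flight, peak
    with gate:  # blocks until one of the MAX_THREADS slots is free
        with lock:
            in_flight += 1
            peak = max(peak, in_flight)
        time.sleep(0.01)  # hypothetical stand-in for submitting and running one app call
        with lock:
            in_flight -= 1
    return i


with ThreadPoolExecutor(max_workers=POOL_SIZE) as pool:
    results = list(pool.map(run_iteration, range(ITERATIONS)))

print(f"completed={len(results)} peak_in_flight={peak} (cap={MAX_THREADS})")
```

In this model a slot is freed only when an iteration finishes, so iterations that sit in the queue keep holding their slots; whether that is what produced the Submitted:330 plateau here would need the attached log to confirm.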
>
> Sometimes, when I execute the script, it starts with 320 active jobs
> (on 320 processors), but after a while it gets stuck with 330
> submitted jobs, none of which are active. Example output to screen:
>
> Swift 0.94 swift-r6637 cog-r3742
>
> RunID: 20140624-1200-qke4z0v8
> Progress: time: Tue, 24 Jun 2014 12:00:30 -0500
> Progress: time: Tue, 24 Jun 2014 12:00:31 -0500 Selecting site:328
> Initializing site shared directory:1 Stage in:1
> Progress: time: Tue, 24 Jun 2014 12:00:32 -0500 Selecting site:10
> Stage in:277 Submitting:3 Submitted:40
> Progress: time: Tue, 24 Jun 2014 12:00:38 -0500 Selecting site:10
> Submitted:319 Active:1
> Progress: time: Tue, 24 Jun 2014 12:00:43 -0500 Selecting site:10
> Active:319 Checking status:1
> Progress: time: Tue, 24 Jun 2014 12:00:44 -0500 Selecting site:1
> Stage in:20 Active:208 Checking status:31 Stage out:70 Finished
> successfully:21
> Progress: time: Tue, 24 Jun 2014 12:00:45 -0500 Stage in:11
> Active:120 Checking status:10 Stage out:189 Finished successfully:31
> Progress: time: Tue, 24 Jun 2014 12:00:46 -0500 Stage in:25
> Active:117 Stage out:188 Finished successfully:54
> Progress: time: Tue, 24 Jun 2014 12:00:47 -0500 Initializing:1
> Selecting site:1 Stage in:46 Active:118 Stage out:164 Finished
> successfully:86
> Progress: time: Tue, 24 Jun 2014 12:00:48 -0500 Selecting site:2
> Stage in:102 Submitting:1 Submitted:2 Active:165 Checking
> status:1 Stage out:57 Finished successfully:199
> Progress: time: Tue, 24 Jun 2014 12:00:49 -0500 Submitted:5
> Active:324 Checking status:1 Finished successfully:265
> Progress: time: Tue, 24 Jun 2014 12:00:50 -0500 Submitted:12
> Active:317 Checking status:1 Finished successfully:272
> Progress: time: Tue, 24 Jun 2014 12:00:51 -0500 Submitted:22
> Active:307 Finished successfully:283
> Progress: time: Tue, 24 Jun 2014 12:00:52 -0500 Selecting site:1
> Stage in:13 Submitted:47 Active:223 Stage out:46 Finished
> successfully:321
> Progress: time: Tue, 24 Jun 2014 12:00:53 -0500 Stage in:28
> Submitted:73 Active:153 Stage out:75 Finished successfully:362
> Progress: time: Tue, 24 Jun 2014 12:00:55 -0500 Submitted:182
> Active:147 Checking status:1 Finished successfully:442
> Progress: time: Tue, 24 Jun 2014 12:00:57 -0500 Submitted:183
> Active:146 Checking status:1 Finished successfully:443
> Progress: time: Tue, 24 Jun 2014 12:01:00 -0500 Submitted:185
> Active:144 Checking status:1 Finished successfully:445
> Progress: time: Tue, 24 Jun 2014 12:01:01 -0500 Submitted:186
> Active:143 Checking status:1 Finished successfully:446
> Progress: time: Tue, 24 Jun 2014 12:01:02 -0500 Submitted:190
> Active:139 Checking status:1 Finished successfully:450
> Progress: time: Tue, 24 Jun 2014 12:01:05 -0500 Submitted:193
> Active:136 Checking status:1 Finished successfully:453
> Progress: time: Tue, 24 Jun 2014 12:01:07 -0500 Submitted:196
> Active:133 Checking status:1 Finished successfully:456
> Progress: time: Tue, 24 Jun 2014 12:01:09 -0500 Submitted:198
> Active:131 Checking status:1 Finished successfully:458
> Progress: time: Tue, 24 Jun 2014 12:01:10 -0500 Stage in:5
> Submitted:202 Active:63 Stage out:60 Finished successfully:467
> Progress: time: Tue, 24 Jun 2014 12:01:11 -0500 Submitted:273
> Active:56 Checking status:1 Finished successfully:533
> Progress: time: Tue, 24 Jun 2014 12:01:13 -0500 Submitted:282
> Active:47 Checking status:1 Finished successfully:542
> Progress: time: Tue, 24 Jun 2014 12:01:14 -0500 Submitting:1
> Submitted:292 Active:37 Finished successfully:553
> Progress: time: Tue, 24 Jun 2014 12:01:15 -0500 Submitted:298
> Active:31 Checking status:1 Finished successfully:558
> Progress: time: Tue, 24 Jun 2014 12:01:16 -0500 Submitted:305
> Active:24 Checking status:1 Finished successfully:565
> Progress: time: Tue, 24 Jun 2014 12:01:17 -0500 Submitted:307
> Active:22 Checking status:1 Finished successfully:567
> Progress: time: Tue, 24 Jun 2014 12:01:18 -0500 Submitted:313
> Active:16 Checking status:1 Finished successfully:573
> Progress: time: Tue, 24 Jun 2014 12:01:20 -0500 Submitted:315
> Active:14 Checking status:1 Finished successfully:575
> Progress: time: Tue, 24 Jun 2014 12:01:21 -0500 Submitted:317
> Active:12 Checking status:1 Finished successfully:577
> Progress: time: Tue, 24 Jun 2014 12:01:22 -0500 Submitted:319
> Active:10 Checking status:1 Finished successfully:579
> Progress: time: Tue, 24 Jun 2014 12:01:23 -0500 Submitted:320
> Active:9 Checking status:1 Finished successfully:580
> Progress: time: Tue, 24 Jun 2014 12:01:25 -0500 Submitted:323
> Active:6 Checking status:1 Finished successfully:583
> Progress: time: Tue, 24 Jun 2014 12:01:26 -0500 Submitted:324
> Active:5 Checking status:1 Finished successfully:584
> Progress: time: Tue, 24 Jun 2014 12:01:27 -0500 Submitted:325
> Active:4 Checking status:1 Finished successfully:585
> Progress: time: Tue, 24 Jun 2014 12:01:29 -0500 Submitted:326
> Active:3 Checking status:1 Finished successfully:586
> Progress: time: Tue, 24 Jun 2014 12:01:36 -0500 Submitted:327
> Active:2 Checking status:1 Finished successfully:587
> Progress: time: Tue, 24 Jun 2014 12:01:39 -0500 Submitted:328
> Active:1 Checking status:1 Finished successfully:588
> Progress: time: Tue, 24 Jun 2014 12:01:50 -0500 Submitted:329
> Checking status:1 Finished successfully:589
> Progress: time: Tue, 24 Jun 2014 12:02:00 -0500 Submitted:330
> Finished successfully:590
> Progress: time: Tue, 24 Jun 2014 12:02:30 -0500 Submitted:330
> Finished successfully:590
> Progress: time: Tue, 24 Jun 2014 12:03:00 -0500 Submitted:330
> Finished successfully:590
> Progress: time: Tue, 24 Jun 2014 12:03:30 -0500 Submitted:330
> Finished successfully:590
> Progress: time: Tue, 24 Jun 2014 12:04:00 -0500 Submitted:330
> Finished successfully:590
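When a run may be hanging like this, a quick first check is whether the progress counters have stopped changing. The following Python sketch is a hypothetical helper, not part of Swift: it parses "Progress:" lines in the format shown above and flags a run as stalled when the last few samples are identical, nothing is Active, and jobs remain Submitted. The sample lines are copied from the log with the wrapped lines rejoined; the window size of 3 is an arbitrary choice, and the check is a heuristic rather than a diagnosis.

```python
import re

# Progress samples copied from the log above, with the wrapped lines rejoined.
SAMPLES = [
    "Progress: time: Tue, 24 Jun 2014 12:01:50 -0500 Submitted:329 Checking status:1 Finished successfully:589",
    "Progress: time: Tue, 24 Jun 2014 12:02:00 -0500 Submitted:330 Finished successfully:590",
    "Progress: time: Tue, 24 Jun 2014 12:02:30 -0500 Submitted:330 Finished successfully:590",
    "Progress: time: Tue, 24 Jun 2014 12:03:00 -0500 Submitted:330 Finished successfully:590",
]

# Timestamp ends at the UTC offset (e.g. "-0500"); the counters follow it.
LINE_RE = re.compile(r"Progress: time: (.+?[+-]\d{4})\s*(.*)$")
# A counter is a name (letters and spaces, e.g. "Stage in") followed by ":<number>".
COUNT_RE = re.compile(r"([A-Za-z][A-Za-z ]*?):(\d+)")


def parse(line):
    """Return (timestamp, {state: count}) or None if the line is not a progress line."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    stamp, rest = m.groups()
    return stamp, {k: int(v) for k, v in COUNT_RE.findall(rest)}


def looks_stalled(samples, window=3):
    """True if the last `window` samples are identical, nothing is Active,
    and jobs are still sitting in Submitted."""
    parsed = [p for p in (parse(s) for s in samples) if p is not None]
    if len(parsed) < window:
        return False
    tail = [counts for _, counts in parsed[-window:]]
    unchanged = all(c == tail[0] for c in tail)
    return unchanged and tail[0].get("Active", 0) == 0 and tail[0].get("Submitted", 0) > 0


print(looks_stalled(SAMPLES))  # prints True
```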
>
> The issue does not reproduce reliably, and the number of successful
> jobs before the stall varies from run to run. Any ideas how to solve
> this problem? I'm attaching the log file.
>
> Thanks,
> Hemant
>
> Hemant Sharma
> Post-doctoral Researcher
> Advanced Photon Source
> Argonne National Laboratory
> Lemont, IL 60439
> USA
>
>
> _______________________________________________
> Swift-user mailing list
> Swift-user at ci.uchicago.edu
> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user
--
Justin M Wozniak