[Swift-devel] Re: reproducible problem running under coasters on ranger from communicado

Mihael Hategan hategan at mcs.anl.gov
Tue Apr 28 17:31:07 CDT 2009


[hategan at communicado coaster_logs]$ pwd
/home/wilde/oops/swift/output/rangeroutdir.20/coaster_logs
[hategan at communicado coaster_logs]$ cd coasters/
-bash: cd: coasters/: Permission denied


On Tue, 2009-04-28 at 17:19 -0500, Glen Max Hocky wrote:
> I had the following problem this morning and just reproduced it under Mike's login
> (I was showing him how to run the latest code and wanted to see whether the problem
> could be reproduced).
> 
> This is all with the latest svn version.
> Hundreds of jobs were in the active state and running on an equivalent number of
> CPUs on ranger. All of a sudden, all but 100 switched to a failed state. Then
> the run proceeded fairly normally until it crashed with a "coaster failed to start"
> error.
> 
> Clips of the errors are below.
> 
> All logs are in
> /home/wilde/oops/swift/output/rangeroutdir.20
> Coaster logs are in
> /home/wilde/oops/swift/output/rangeroutdir.20/coaster_logs
> 
> ------------------------------------
> 
> Progress:  Selecting site:3  Submitted:784  Active:113  Finished successfully:9
> Progress:  Selecting site:3  Submitted:512  Active:385  Finished successfully:9
> Progress:  Selecting site:3  Submitted:379  Active:518  Finished successfully:9
> Progress:  Selecting site:3  Submitted:337  Active:560  Finished successfully:9
> Progress:  Selecting site:3  Submitted:337  Active:560  Finished successfully:9
> Progress:  Selecting site:3  Submitted:337  Active:560  Finished successfully:9
> Progress:  Selecting site:3  Submitted:337  Active:559  Finished successfully:9 Failed but can retry:1
> Progress:  Selecting site:3  Submitted:335  Active:559  Finished successfully:9 Failed but can retry:3
> Progress:  Selecting site:3  Submitted:335  Active:543  Finished successfully:9 Failed but can retry:19
> Progress:  Selecting site:3  Submitted:333  Active:543  Finished successfully:9 Failed but can retry:21
> Progress:  Selecting site:3  Submitted:333  Active:527  Finished successfully:9 Failed but can retry:37
> Progress:  Selecting site:3  Submitted:333  Active:495  Finished successfully:9 Failed but can retry:69
> Progress:  Selecting site:3  Submitted:332  Active:481  Finished successfully:9 Failed but can retry:84
> Progress:  Selecting site:3  Submitted:332  Active:479  Finished successfully:9 Failed but can retry:86
> Progress:  Selecting site:3  Submitted:331  Active:465  Finished successfully:9 Failed but can retry:101
> Progress:  Selecting site:3  Submitted:331  Active:463  Finished successfully:9 Failed but can retry:103
> Progress:  Selecting site:3  Submitted:330  Active:447  Finished successfully:9 Failed but can retry:120
> Progress:  Selecting site:3  Submitted:329  Active:433  Finished successfully:9 Failed but can retry:135
> Progress:  Selecting site:3  Submitted:329  Active:415  Finished successfully:9 Failed but can retry:153
> Progress:  Selecting site:3  Submitted:329  Active:399  Finished successfully:9 Failed but can retry:169
> Progress:  Selecting site:3  Submitted:329  Active:383  Finished successfully:9 Failed but can retry:185
> Progress:  Selecting site:3  Submitted:328  Active:367  Finished successfully:9 Failed but can retry:202
> Progress:  Selecting site:3  Submitted:327  Active:351  Finished successfully:9 Failed but can retry:219
> Progress:  Selecting site:3  Submitted:326  Active:336  Finished successfully:9 Failed but can retry:235
> Progress:  Selecting site:3  Submitted:326  Active:319  Finished successfully:9 Failed but can retry:252
> Progress:  Selecting site:3  Submitted:220  Active:408  Finished successfully:9 Failed but can retry:269
> Progress:  Selecting site:3  Submitted:219  Active:363  Finished successfully:9 Failed but can retry:315
> Progress:  Selecting site:3  Submitted:216  Active:334  Finished successfully:9 Failed but can retry:347
> Progress:  Selecting site:3  Submitted:214  Active:303  Finished successfully:9 Failed but can retry:380
> Progress:  Selecting site:3  Submitted:214  Active:287  Finished successfully:9 Failed but can retry:396
> Progress:  Selecting site:3  Submitted:214  Active:271  Finished successfully:9 Failed but can retry:412
> Progress:  Selecting site:3  Submitted:213  Active:255  Finished successfully:9 Failed but can retry:429
> Progress:  Selecting site:3  Submitted:213  Active:239  Finished successfully:9 Failed but can retry:445
> Progress:  Selecting site:3  Submitted:213  Active:223  Finished successfully:9 Failed but can retry:461
> Progress:  Selecting site:3  Submitted:213  Active:207  Finished successfully:9 Failed but can retry:477
> Progress:  Selecting site:3  Submitted:212  Active:207  Finished successfully:9 Failed but can retry:478
> Progress:  Selecting site:3  Submitted:212  Active:175  Finished successfully:9 Failed but can retry:510
> Progress:  Selecting site:3  Submitted:211  Active:143  Finished successfully:9 Failed but can retry:543
> Progress:  Selecting site:3  Submitted:211  Active:112  Finished successfully:9 Failed but can retry:574
> Progress:  Selecting site:3  Submitted:211  Active:111  Finished successfully:9 Failed but can retry:575
> Progress:  Selecting site:3  Submitted:211  Active:96  Finished successfully:9 Failed but can retry:590
> Progress:  Selecting site:3  Submitted:211  Active:96  Finished successfully:9 Failed but can retry:590
> 
> 
> 
> -----------------------------------
> Failed to transfer wrapper log from oops-20090428-1642-ils1yrj8/info/a on ranger
> Progress:  Submitted:801  Active:44  Finished successfully:61 Failed but can retry:3
> Failed to transfer wrapper log from oops-20090428-1642-ils1yrj8/info/x on ranger
> Failed to transfer wrapper log from oops-20090428-1642-ils1yrj8/info/l on ranger
> Failed to transfer wrapper log from oops-20090428-1642-ils1yrj8/info/v on ranger
> Progress:  Stage in:1  Submitted:802  Active:42  Checking status:1  Finished successfully:61 Failed but can retry:2
> Failed to transfer wrapper log from oops-20090428-1642-ils1yrj8/info/3 on ranger
> Execution failed:
>         Exception in runramaSpeed:
> Arguments: [input/fasta/T1af7.fasta,
>   home/wilde/oops/swift/output/rangeroutdir.20/T1af7/T1af7.ST25.TU200.0000.secseq,
>   input/native/T1af7.pdb,
>   input/rama/T1af7.rama_map,
>   home/wilde/oops/swift/output/rangeroutdir.20/T1af7//ST25.TU200/0000/01/64/T1af7.ST25.TU200.0000.0164.pdt,
>   home/wilde/oops/swift/output/rangeroutdir.20/T1af7//ST25.TU200/0000/01/64/T1af7.ST25.TU200.0000.0164.rmsd,
>   164, DEFAULT_INIT_TEMP_=_25, TEMP_UPDATE_INTERVAL_=_200,
>   MAX_NUMBER_OF_ANNEALING_STEPS_=_0, KILL_TIME_=_30]
> Host: ranger
> Directory: oops-20090428-1642-ils1yrj8/jobs/3/runramaSpeed-383qd2aj
> stderr.txt: 
> 
> stdout.txt: 
> 
> ----
> 
> Caused by:
>         Failed to start worker: Worker ended prematurely
> Cleaning up...
> Shutting down service at https://129.114.50.163:49375
> Got channel MetaChannel: 3994917 -> GSSSChannel-null(1)
> - Done
> ------------------------------------------
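
The "Progress:" lines quoted above can be turned back into per-state counts to see how quickly jobs moved from Active to "Failed but can retry". The following is only a minimal Python sketch, assuming the quoted mail text is available as a string; PAIR and progress_records are hypothetical helper names and are not part of Swift or the coaster code. It handles lines that the mailer wrapped after "Finished successfully".

```python
import re

# Matches "State name:count" pairs such as "Failed but can retry:21".
PAIR = re.compile(r"([A-Za-z][A-Za-z ]*?):\s*(\d+)")

def progress_records(text):
    """Return one {state: count} dict per 'Progress:' entry in quoted mail text."""
    # Drop mail quote markers and rejoin lines that the mailer wrapped.
    unquoted = " ".join(line.lstrip("> ").strip() for line in text.splitlines())
    # Everything between two "Progress:" tokens belongs to one entry.
    chunks = unquoted.split("Progress:")[1:]
    return [{name: int(count) for name, count in PAIR.findall(chunk)}
            for chunk in chunks]

# Two consecutive entries copied from the quoted log.
sample = """\
> Progress:  Selecting site:3  Submitted:333  Active:527  Finished successfully:9 
> Failed but can retry:37
> Progress:  Selecting site:3  Submitted:333  Active:495  Finished successfully:9 
> Failed but can retry:69
"""

records = progress_records(sample)

# Change in Active and in "Failed but can retry" between consecutive entries.
deltas = [
    (b["Active"] - a["Active"],
     b["Failed but can retry"] - a["Failed but can retry"])
    for a, b in zip(records, records[1:])
]
print(deltas)  # [(-32, 32)]
```

In this pair of entries, 32 jobs left the Active state and the same number appeared as "Failed but can retry". The sketch only reads the counters; it says nothing about why the workers ended.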



