[Swift-devel] Re: reproducible problem running under coasters on ranger from communicado
Mihael Hategan
hategan at mcs.anl.gov
Tue Apr 28 17:31:07 CDT 2009
[hategan at communicado coaster_logs]$ pwd
/home/wilde/oops/swift/output/rangeroutdir.20/coaster_logs
[hategan at communicado coaster_logs]$ cd coasters/
-bash: cd: coasters/: Permission denied
On Tue, 2009-04-28 at 17:19 -0500, Glen Max Hocky wrote:
> I had the following problem this morning and just reproduced it under Mike's
> login. (I was showing him how to run the latest code and wanted to see whether
> the problem could be recreated.)
>
> This is all with the latest svn version.
> Hundreds of jobs were in the active state, running on an equivalent number of
> CPUs on ranger. All of a sudden, all but 100 switched to a failed state. Then
> the run proceeded fairly normally until it crashed with a "coaster failed to
> start" error.
>
> Excerpts of the errors are below.
>
> All logs are in:
> /home/wilde/oops/swift/output/rangeroutdir.20
> Coaster logs are in:
> /home/wilde/oops/swift/output/rangeroutdir.20/coaster_logs
>
> ------------------------------------
>
> Progress: Selecting site:3 Submitted:784 Active:113 Finished successfully:9
> Progress: Selecting site:3 Submitted:512 Active:385 Finished successfully:9
> Progress: Selecting site:3 Submitted:379 Active:518 Finished successfully:9
> Progress: Selecting site:3 Submitted:337 Active:560 Finished successfully:9
> Progress: Selecting site:3 Submitted:337 Active:560 Finished successfully:9
> Progress: Selecting site:3 Submitted:337 Active:560 Finished successfully:9
> Progress: Selecting site:3 Submitted:337 Active:559 Finished successfully:9 Failed but can retry:1
> Progress: Selecting site:3 Submitted:335 Active:559 Finished successfully:9 Failed but can retry:3
> Progress: Selecting site:3 Submitted:335 Active:543 Finished successfully:9 Failed but can retry:19
> Progress: Selecting site:3 Submitted:333 Active:543 Finished successfully:9 Failed but can retry:21
> Progress: Selecting site:3 Submitted:333 Active:527 Finished successfully:9 Failed but can retry:37
> Progress: Selecting site:3 Submitted:333 Active:495 Finished successfully:9 Failed but can retry:69
> Progress: Selecting site:3 Submitted:332 Active:481 Finished successfully:9 Failed but can retry:84
> Progress: Selecting site:3 Submitted:332 Active:479 Finished successfully:9 Failed but can retry:86
> Progress: Selecting site:3 Submitted:331 Active:465 Finished successfully:9 Failed but can retry:101
> Progress: Selecting site:3 Submitted:331 Active:463 Finished successfully:9 Failed but can retry:103
> Progress: Selecting site:3 Submitted:330 Active:447 Finished successfully:9 Failed but can retry:120
> Progress: Selecting site:3 Submitted:329 Active:433 Finished successfully:9 Failed but can retry:135
> Progress: Selecting site:3 Submitted:329 Active:415 Finished successfully:9 Failed but can retry:153
> Progress: Selecting site:3 Submitted:329 Active:399 Finished successfully:9 Failed but can retry:169
> Progress: Selecting site:3 Submitted:329 Active:383 Finished successfully:9 Failed but can retry:185
> Progress: Selecting site:3 Submitted:328 Active:367 Finished successfully:9 Failed but can retry:202
> Progress: Selecting site:3 Submitted:327 Active:351 Finished successfully:9 Failed but can retry:219
> Progress: Selecting site:3 Submitted:326 Active:336 Finished successfully:9 Failed but can retry:235
> Progress: Selecting site:3 Submitted:326 Active:319 Finished successfully:9 Failed but can retry:252
> Progress: Selecting site:3 Submitted:220 Active:408 Finished successfully:9 Failed but can retry:269
> Progress: Selecting site:3 Submitted:219 Active:363 Finished successfully:9 Failed but can retry:315
> Progress: Selecting site:3 Submitted:216 Active:334 Finished successfully:9 Failed but can retry:347
> Progress: Selecting site:3 Submitted:214 Active:303 Finished successfully:9 Failed but can retry:380
> Progress: Selecting site:3 Submitted:214 Active:287 Finished successfully:9 Failed but can retry:396
> Progress: Selecting site:3 Submitted:214 Active:271 Finished successfully:9 Failed but can retry:412
> Progress: Selecting site:3 Submitted:213 Active:255 Finished successfully:9 Failed but can retry:429
> Progress: Selecting site:3 Submitted:213 Active:239 Finished successfully:9 Failed but can retry:445
> Progress: Selecting site:3 Submitted:213 Active:223 Finished successfully:9 Failed but can retry:461
> Progress: Selecting site:3 Submitted:213 Active:207 Finished successfully:9 Failed but can retry:477
> Progress: Selecting site:3 Submitted:212 Active:207 Finished successfully:9 Failed but can retry:478
> Progress: Selecting site:3 Submitted:212 Active:175 Finished successfully:9 Failed but can retry:510
> Progress: Selecting site:3 Submitted:211 Active:143 Finished successfully:9 Failed but can retry:543
> Progress: Selecting site:3 Submitted:211 Active:112 Finished successfully:9 Failed but can retry:574
> Progress: Selecting site:3 Submitted:211 Active:111 Finished successfully:9 Failed but can retry:575
> Progress: Selecting site:3 Submitted:211 Active:96 Finished successfully:9 Failed but can retry:590
> Progress: Selecting site:3 Submitted:211 Active:96 Finished successfully:9 Failed but can retry:590
>
>
>
> -----------------------------------
> Failed to transfer wrapper log from oops-20090428-1642-ils1yrj8/info/a on ranger
> Progress: Submitted:801 Active:44 Finished successfully:61 Failed but can retry:3
> Failed to transfer wrapper log from oops-20090428-1642-ils1yrj8/info/x on ranger
> Failed to transfer wrapper log from oops-20090428-1642-ils1yrj8/info/l on ranger
> Failed to transfer wrapper log from oops-20090428-1642-ils1yrj8/info/v on ranger
> Progress: Stage in:1 Submitted:802 Active:42 Checking status:1 Finished successfully:61 Failed but can retry:2
> Failed to transfer wrapper log from oops-20090428-1642-ils1yrj8/info/3 on ranger
> Execution failed:
> Exception in runramaSpeed:
> Arguments: [input/fasta/T1af7.fasta,
> home/wilde/oops/swift/output/rangeroutdir.20/T1af7/T1af7.ST25.TU200.0000.secseq,
> input/native/T1af7.pdb, input/rama/T1af7.rama_map,
> home/wilde/oops/swift/output/rangeroutdir.20/T1af7//ST25.TU200/0000/01/64/T1af7.ST25.TU200.0000.0164.pdt,
> home/wilde/oops/swift/output/rangeroutdir.20/T1af7//ST25.TU200/0000/01/64/T1af7.ST25.TU200.0000.0164.rmsd,
> 164, DEFAULT_INIT_TEMP_=_25, TEMP_UPDATE_INTERVAL_=_200,
> MAX_NUMBER_OF_ANNEALING_STEPS_=_0, KILL_TIME_=_30]
> Host: ranger
> Directory: oops-20090428-1642-ils1yrj8/jobs/3/runramaSpeed-383qd2aj
> stderr.txt:
>
> stdout.txt:
>
> ----
>
> Caused by:
> Failed to start worker: Worker ended prematurely
> Cleaning up...
> Shutting down service at https://129.114.50.163:49375
> Got channel MetaChannel: 3994917 -> GSSSChannel-null(1)
> - Done
> ------------------------------------------
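
For anyone who wants to pin down exactly where the failures begin in output
like the above, here is a minimal sketch that tabulates the counts from the
quoted Progress lines. It is not part of Swift: the function names
parse_progress and first_failure are made up for illustration, and it assumes
each record starts with "Progress:" and that a mail-wrapped continuation line
consists only of "label:count" pairs. A label that is itself split across a
wrap (for example "Failed but can" / "retry:3") would not be reassembled.

```python
import re

# One "label:count" pair, e.g. "Active:560" or "Failed but can retry:19".
PAIR = re.compile(r"([A-Za-z][A-Za-z ]*?):(\d+)")
# A line made up of nothing but such pairs (a wrapped continuation line).
ONLY_PAIRS = re.compile(r"(?:[A-Za-z][A-Za-z ]*?:\d+\s*)+")


def parse_progress(text):
    """Return one dict of state counts per Progress record (hypothetical helper)."""
    records = []
    for raw in text.splitlines():
        # Drop mail quote markers ("> ") and surrounding whitespace.
        line = re.sub(r"^[>\s]+", "", raw).strip()
        if line.startswith("Progress:"):
            records.append({})
            body = line[len("Progress:"):]
        elif records and ONLY_PAIRS.fullmatch(line):
            # Continuation of the previous record, wrapped by the mail client.
            body = line
        else:
            continue
        for label, count in PAIR.findall(body):
            records[-1][label.strip()] = int(count)
    return records


def first_failure(records, key="Failed but can retry"):
    """Index of the first record with a nonzero failure count, or None."""
    for i, rec in enumerate(records):
        if rec.get(key, 0) > 0:
            return i
    return None


# A few lines taken from the quoted output, in wrapped form.
sample = """\
> Progress: Selecting site:3 Submitted:337 Active:560 Finished successfully:9
> Progress: Selecting site:3 Submitted:337 Active:559 Finished successfully:9
> Failed but can retry:1
> Progress: Selecting site:3 Submitted:335 Active:559 Finished successfully:9
> Failed but can retry:3
"""

records = parse_progress(sample)
for rec in records:
    print(rec.get("Active", 0), rec.get("Failed but can retry", 0))
print("first failing record:", first_failure(records))
```

Running the same functions over the full progress output would give the
Active and failed counts side by side for each record.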