[Swift-devel] reproducible problem running under coasters on ranger from communicado
Glen Max Hocky
hockyg at uchicago.edu
Tue Apr 28 17:19:31 CDT 2009
I hit the following problem this morning and have just reproduced it under Mike's login.
(I was showing him how to run the latest code and wanted to see whether the problem
could be reproduced.)
This is all with the latest svn version.
Hundreds of jobs were in the active state, running on an equivalent number of
CPUs on ranger. All of a sudden, all but about 100 switched to a failed state. The
run then proceeded fairly normally until it crashed with a "coaster failed to start"
error (the logged message is "Failed to start worker: Worker ended prematurely").
Clips of the errors are below.
All logs are in
/home/wilde/oops/swift/output/rangeroutdir.20
Coaster logs are in
/home/wilde/oops/swift/output/rangeroutdir.20/coaster_logs
------------------------------------
Progress: Selecting site:3 Submitted:784 Active:113 Finished successfully:9
Progress: Selecting site:3 Submitted:512 Active:385 Finished successfully:9
Progress: Selecting site:3 Submitted:379 Active:518 Finished successfully:9
Progress: Selecting site:3 Submitted:337 Active:560 Finished successfully:9
Progress: Selecting site:3 Submitted:337 Active:560 Finished successfully:9
Progress: Selecting site:3 Submitted:337 Active:560 Finished successfully:9
Progress: Selecting site:3 Submitted:337 Active:559 Finished successfully:9 Failed but can retry:1
Progress: Selecting site:3 Submitted:335 Active:559 Finished successfully:9 Failed but can retry:3
Progress: Selecting site:3 Submitted:335 Active:543 Finished successfully:9 Failed but can retry:19
Progress: Selecting site:3 Submitted:333 Active:543 Finished successfully:9 Failed but can retry:21
Progress: Selecting site:3 Submitted:333 Active:527 Finished successfully:9 Failed but can retry:37
Progress: Selecting site:3 Submitted:333 Active:495 Finished successfully:9 Failed but can retry:69
Progress: Selecting site:3 Submitted:332 Active:481 Finished successfully:9 Failed but can retry:84
Progress: Selecting site:3 Submitted:332 Active:479 Finished successfully:9 Failed but can retry:86
Progress: Selecting site:3 Submitted:331 Active:465 Finished successfully:9 Failed but can retry:101
Progress: Selecting site:3 Submitted:331 Active:463 Finished successfully:9 Failed but can retry:103
Progress: Selecting site:3 Submitted:330 Active:447 Finished successfully:9 Failed but can retry:120
Progress: Selecting site:3 Submitted:329 Active:433 Finished successfully:9 Failed but can retry:135
Progress: Selecting site:3 Submitted:329 Active:415 Finished successfully:9 Failed but can retry:153
Progress: Selecting site:3 Submitted:329 Active:399 Finished successfully:9 Failed but can retry:169
Progress: Selecting site:3 Submitted:329 Active:383 Finished successfully:9 Failed but can retry:185
Progress: Selecting site:3 Submitted:328 Active:367 Finished successfully:9 Failed but can retry:202
Progress: Selecting site:3 Submitted:327 Active:351 Finished successfully:9 Failed but can retry:219
Progress: Selecting site:3 Submitted:326 Active:336 Finished successfully:9 Failed but can retry:235
Progress: Selecting site:3 Submitted:326 Active:319 Finished successfully:9 Failed but can retry:252
Progress: Selecting site:3 Submitted:220 Active:408 Finished successfully:9 Failed but can retry:269
Progress: Selecting site:3 Submitted:219 Active:363 Finished successfully:9 Failed but can retry:315
Progress: Selecting site:3 Submitted:216 Active:334 Finished successfully:9 Failed but can retry:347
Progress: Selecting site:3 Submitted:214 Active:303 Finished successfully:9 Failed but can retry:380
Progress: Selecting site:3 Submitted:214 Active:287 Finished successfully:9 Failed but can retry:396
Progress: Selecting site:3 Submitted:214 Active:271 Finished successfully:9 Failed but can retry:412
Progress: Selecting site:3 Submitted:213 Active:255 Finished successfully:9 Failed but can retry:429
Progress: Selecting site:3 Submitted:213 Active:239 Finished successfully:9 Failed but can retry:445
Progress: Selecting site:3 Submitted:213 Active:223 Finished successfully:9 Failed but can retry:461
Progress: Selecting site:3 Submitted:213 Active:207 Finished successfully:9 Failed but can retry:477
Progress: Selecting site:3 Submitted:212 Active:207 Finished successfully:9 Failed but can retry:478
Progress: Selecting site:3 Submitted:212 Active:175 Finished successfully:9 Failed but can retry:510
Progress: Selecting site:3 Submitted:211 Active:143 Finished successfully:9 Failed but can retry:543
Progress: Selecting site:3 Submitted:211 Active:112 Finished successfully:9 Failed but can retry:574
Progress: Selecting site:3 Submitted:211 Active:111 Finished successfully:9 Failed but can retry:575
Progress: Selecting site:3 Submitted:211 Active:96 Finished successfully:9 Failed but can retry:590
Progress: Selecting site:3 Submitted:211 Active:96 Finished successfully:9 Failed but can retry:590
-----------------------------------
Failed to transfer wrapper log from oops-20090428-1642-ils1yrj8/info/a on ranger
Progress: Submitted:801 Active:44 Finished successfully:61 Failed but can retry:3
Failed to transfer wrapper log from oops-20090428-1642-ils1yrj8/info/x on ranger
Failed to transfer wrapper log from oops-20090428-1642-ils1yrj8/info/l on ranger
Failed to transfer wrapper log from oops-20090428-1642-ils1yrj8/info/v on ranger
Progress: Stage in:1 Submitted:802 Active:42 Checking status:1 Finished successfully:61 Failed but can retry:2
Failed to transfer wrapper log from oops-20090428-1642-ils1yrj8/info/3 on ranger
Execution failed:
Exception in runramaSpeed:
Arguments: [input/fasta/T1af7.fasta,
home/wilde/oops/swift/output/rangeroutdir.20/T1af7/T1af7.ST25.TU200.0000.secseq,
input/native/T1af7.pdb,
input/rama/T1af7.rama_map,
home/wilde/oops/swift/output/rangeroutdir.20/T1af7//ST25.TU200/0000/01/64/T1af7.ST25.TU200.0000.0164.pdt,
home/wilde/oops/swift/output/rangeroutdir.20/T1af7//ST25.TU200/0000/01/64/T1af7.ST25.TU200.0000.0164.rmsd,
164,
DEFAULT_INIT_TEMP_=_25,
TEMP_UPDATE_INTERVAL_=_200,
MAX_NUMBER_OF_ANNEALING_STEPS_=_0,
KILL_TIME_=_30]
Host: ranger
Directory: oops-20090428-1642-ils1yrj8/jobs/3/runramaSpeed-383qd2aj
stderr.txt:
stdout.txt:
----
Caused by:
Failed to start worker: Worker ended prematurely
Cleaning up...
Shutting down service at https://129.114.50.163:49375
Got channel MetaChannel: 3994917 -> GSSSChannel-null(1)
- Done
------------------------------------------
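In case it helps anyone compare runs, here is a minimal sketch (not part of Swift) that turns the "Progress:" lines in the clips above into per-record counts, so the drop in Active and the rise in "Failed but can retry" can be tabulated. The names parse_progress, PAIR, TAIL and SAMPLE are my own for illustration, and the assumption that a wrapped tail consists only of "state:count" pairs is based solely on the lines shown in this message.

```python
import re

# One "State name:count" pair, e.g. "Active:560" or "Failed but can retry:37".
PAIR = re.compile(r"([A-Za-z][A-Za-z ]*?):(\d+)")
# A line made up only of such pairs: treated as the wrapped tail of a Progress line.
TAIL = re.compile(r"(?:\s*[A-Za-z][A-Za-z ]*?:\d+)+\s*")


def parse_progress(text):
    """Return one {state: count} dict per Progress record found in text."""
    records = []
    in_record = False
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("Progress:"):
            # Start a new record with everything after the "Progress:" prefix.
            records.append(line[len("Progress:"):])
            in_record = True
        elif in_record and TAIL.fullmatch(line):
            # Mail-wrapped continuation: fold it back into the current record.
            records[-1] += " " + line
        else:
            # Any other line (error text, separators) ends the current record.
            in_record = False
    return [{k: int(v) for k, v in PAIR.findall(r)} for r in records]


# Three records copied from the first clip, in their wrapped form.
SAMPLE = """\
Progress: Selecting site:3 Submitted:337 Active:560 Finished successfully:9
Progress: Selecting site:3 Submitted:337 Active:559 Finished successfully:9
Failed but can retry:1
Progress: Selecting site:3 Submitted:211 Active:96 Finished successfully:9
Failed but can retry:590
"""

if __name__ == "__main__":
    for rec in parse_progress(SAMPLE):
        # Columns: active jobs, retryable failures, total jobs accounted for.
        print(rec.get("Active", 0),
              rec.get("Failed but can retry", 0),
              sum(rec.values()))
```

Summing the counts in each record is a quick sanity check that no jobs went unaccounted for while the failures were piling up.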