[Swift-devel] Re: reproducible problem running under coasters on ranger from communicado
Michael Wilde
wilde at mcs.anl.gov
Tue Apr 28 18:08:47 CDT 2009
Should be readable now - sorry.
I think when Glen copied the files they retained their perms from the
originals on Ranger.
Glen - need to make the files readable when you copy 'em.
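Something along these lines should do it. This is only a sketch: `copy_readable` is a made-up helper name, not an existing script, and it assumes a plain local `cp` rather than however the files actually came over from Ranger.

```shell
# copy_readable SRC DST  (hypothetical helper, for illustration only)
# Copy a directory tree, then grant everyone read access on the copy.
# The capital X adds search (execute) permission on directories, and on
# files that already had an execute bit, so plain log files stay
# non-executable while their directories become enterable.
copy_readable() {
    cp -r "$1" "$2" &&
    chmod -R a+rX "$2"
}
```

With that in place, `cd coasters/` should no longer fail with "Permission denied" for other users, since every directory in the copied tree gets read and search permission.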
On 4/28/09 5:31 PM, Mihael Hategan wrote:
> [hategan at communicado coaster_logs]$ pwd
> /home/wilde/oops/swift/output/rangeroutdir.20/coaster_logs
> [hategan at communicado coaster_logs]$ cd coasters/
> -bash: cd: coasters/: Permission denied
>
>
> On Tue, 2009-04-28 at 17:19 -0500, Glen Max Hocky wrote:
>> I had the following problem this morning and just recreated it under Mike's
>> login (I was showing him how to run the latest stuff and wanted to see if
>> the problem could be recreated).
>>
>> This is all with the latest svn version.
>> Hundreds of jobs were in the active state and running on an equivalent number
>> of CPUs on Ranger. All of a sudden, all but 100 switched to a failed state.
>> Then the run proceeded fairly normally until it crashed with a
>> "coaster failed to start" error.
>>
>> clips of errors below
>>
>> all logs in
>> /home/wilde/oops/swift/output/rangeroutdir.20
>> coaster logs in
>> /home/wilde/oops/swift/output/rangeroutdir.20/coaster_logs
>>
>> ------------------------------------
>>
>> Progress: Selecting site:3 Submitted:784 Active:113 Finished successfully:9
>> Progress: Selecting site:3 Submitted:512 Active:385 Finished successfully:9
>> Progress: Selecting site:3 Submitted:379 Active:518 Finished successfully:9
>> Progress: Selecting site:3 Submitted:337 Active:560 Finished successfully:9
>> Progress: Selecting site:3 Submitted:337 Active:560 Finished successfully:9
>> Progress: Selecting site:3 Submitted:337 Active:560 Finished successfully:9
>> Progress: Selecting site:3 Submitted:337 Active:559 Finished successfully:9 Failed but can retry:1
>> Progress: Selecting site:3 Submitted:335 Active:559 Finished successfully:9 Failed but can retry:3
>> Progress: Selecting site:3 Submitted:335 Active:543 Finished successfully:9 Failed but can retry:19
>> Progress: Selecting site:3 Submitted:333 Active:543 Finished successfully:9 Failed but can retry:21
>> Progress: Selecting site:3 Submitted:333 Active:527 Finished successfully:9 Failed but can retry:37
>> Progress: Selecting site:3 Submitted:333 Active:495 Finished successfully:9 Failed but can retry:69
>> Progress: Selecting site:3 Submitted:332 Active:481 Finished successfully:9 Failed but can retry:84
>> Progress: Selecting site:3 Submitted:332 Active:479 Finished successfully:9 Failed but can retry:86
>> Progress: Selecting site:3 Submitted:331 Active:465 Finished successfully:9 Failed but can retry:101
>> Progress: Selecting site:3 Submitted:331 Active:463 Finished successfully:9 Failed but can retry:103
>> Progress: Selecting site:3 Submitted:330 Active:447 Finished successfully:9 Failed but can retry:120
>> Progress: Selecting site:3 Submitted:329 Active:433 Finished successfully:9 Failed but can retry:135
>> Progress: Selecting site:3 Submitted:329 Active:415 Finished successfully:9 Failed but can retry:153
>> Progress: Selecting site:3 Submitted:329 Active:399 Finished successfully:9 Failed but can retry:169
>> Progress: Selecting site:3 Submitted:329 Active:383 Finished successfully:9 Failed but can retry:185
>> Progress: Selecting site:3 Submitted:328 Active:367 Finished successfully:9 Failed but can retry:202
>> Progress: Selecting site:3 Submitted:327 Active:351 Finished successfully:9 Failed but can retry:219
>> Progress: Selecting site:3 Submitted:326 Active:336 Finished successfully:9 Failed but can retry:235
>> Progress: Selecting site:3 Submitted:326 Active:319 Finished successfully:9 Failed but can retry:252
>> Progress: Selecting site:3 Submitted:220 Active:408 Finished successfully:9 Failed but can retry:269
>> Progress: Selecting site:3 Submitted:219 Active:363 Finished successfully:9 Failed but can retry:315
>> Progress: Selecting site:3 Submitted:216 Active:334 Finished successfully:9 Failed but can retry:347
>> Progress: Selecting site:3 Submitted:214 Active:303 Finished successfully:9 Failed but can retry:380
>> Progress: Selecting site:3 Submitted:214 Active:287 Finished successfully:9 Failed but can retry:396
>> Progress: Selecting site:3 Submitted:214 Active:271 Finished successfully:9 Failed but can retry:412
>> Progress: Selecting site:3 Submitted:213 Active:255 Finished successfully:9 Failed but can retry:429
>> Progress: Selecting site:3 Submitted:213 Active:239 Finished successfully:9 Failed but can retry:445
>> Progress: Selecting site:3 Submitted:213 Active:223 Finished successfully:9 Failed but can retry:461
>> Progress: Selecting site:3 Submitted:213 Active:207 Finished successfully:9 Failed but can retry:477
>> Progress: Selecting site:3 Submitted:212 Active:207 Finished successfully:9 Failed but can retry:478
>> Progress: Selecting site:3 Submitted:212 Active:175 Finished successfully:9 Failed but can retry:510
>> Progress: Selecting site:3 Submitted:211 Active:143 Finished successfully:9 Failed but can retry:543
>> Progress: Selecting site:3 Submitted:211 Active:112 Finished successfully:9 Failed but can retry:574
>> Progress: Selecting site:3 Submitted:211 Active:111 Finished successfully:9 Failed but can retry:575
>> Progress: Selecting site:3 Submitted:211 Active:96 Finished successfully:9 Failed but can retry:590
>> Progress: Selecting site:3 Submitted:211 Active:96 Finished successfully:9 Failed but can retry:590
>>
>>
>>
>> -----------------------------------
>> Failed to transfer wrapper log from oops-20090428-1642-ils1yrj8/info/a on ranger
>> Progress: Submitted:801 Active:44 Finished successfully:61 Failed but can retry:3
>> Failed to transfer wrapper log from oops-20090428-1642-ils1yrj8/info/x on ranger
>> Failed to transfer wrapper log from oops-20090428-1642-ils1yrj8/info/l on ranger
>> Failed to transfer wrapper log from oops-20090428-1642-ils1yrj8/info/v on ranger
>> Progress: Stage in:1 Submitted:802 Active:42 Checking status:1 Finished successfully:61 Failed but can retry:2
>> Failed to transfer wrapper log from oops-20090428-1642-ils1yrj8/info/3 on ranger
>> Execution failed:
>> Exception in runramaSpeed:
>> Arguments: [input/fasta/T1af7.fasta,
>> home/wilde/oops/swift/output/rangeroutdir.20/T1af7/T1af7.ST25.TU200.0000.secseq,
>> input/native/T1af7.pdb, input/rama/T1af7.rama_map,
>> home/wilde/oops/swift/output/rangeroutdir.20/T1af7//ST25.TU200/0000/01/64/T1af7.ST25.TU200.0000.0164.pdt,
>> home/wilde/oops/swift/output/rangeroutdir.20/T1af7//ST25.TU200/0000/01/64/T1af7.ST25.TU200.0000.0164.rmsd,
>> 164, DEFAULT_INIT_TEMP_=_25, TEMP_UPDATE_INTERVAL_=_200,
>> MAX_NUMBER_OF_ANNEALING_STEPS_=_0, KILL_TIME_=_30]
>> Host: ranger
>> Directory: oops-20090428-1642-ils1yrj8/jobs/3/runramaSpeed-383qd2aj
>> stderr.txt:
>>
>> stdout.txt:
>>
>> ----
>>
>> Caused by:
>> Failed to start worker: Worker ended prematurely
>> Cleaning up...
>> Shutting down service at https://129.114.50.163:49375
>> Got channel MetaChannel: 3994917 -> GSSSChannel-null(1)
>> - Done
>> ------------------------------------------
>