[Swift-devel] mystery runs on ucanl
skenny at uchicago.edu
Tue Jul 29 15:19:50 CDT 2008
>> because there is still this odd behavior of jobs
>> remaining in the queue even after they've been killed, i'm
>> tempted to blame pbs (gotta blame someone ;) also, i'm getting
>> emails from pbs like this:
>>
>> PBS Job Id: 1759910.tg-master.uc.teragrid.org
>> Job Name: STDIN
>> Exec host: tg-c054/0
>> Aborted by PBS Server
>> Job cannot be executed
>> See Administrator for help
>>
>> and the swift log simply gives "Failed Error code: 271,
>> ProcessDied"
>
>Not the same kind of failures. So we may be dealing with multiple
>issues here.

so, looking back at the pbs notices from a run on 7/23-24, i
actually see about 25 failures naming tg-c054 as the node, so i
may have jumped the gun in saying there was no bum node
involved...i'm also seeing that in the batch i submitted today,
the failed jobs had gone either to 32-bit nodes (which i expect
to fail) or to tg-c054...sooo, 054 is looking like a culprit for
at least some of what we're seeing.
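
fwiw, here's a minimal sketch of how the per-node tally could be done, assuming the pbs notices are saved as plain text in the format quoted above. the function name, the two made-up job ids and the node tg-c001 are hypothetical, just for illustration; nothing here comes from swift or pbs itself.

```python
import re
from collections import Counter

# Matches lines such as "Exec host: tg-c054/0" (optionally prefixed with
# ">" quote marks) and captures the node name before the "/<index>" suffix.
EXEC_HOST = re.compile(r"^[>\s]*Exec host:\s*([^/\s]+)", re.MULTILINE)


def count_exec_hosts(text):
    """Return a Counter mapping node name -> number of notices naming it."""
    return Counter(EXEC_HOST.findall(text))


# Hypothetical sample: the first notice is the one quoted in this thread,
# the other two (ids and tg-c001) are invented for the example.
sample = """\
PBS Job Id: 1759910.tg-master.uc.teragrid.org
Job Name: STDIN
Exec host: tg-c054/0
Aborted by PBS Server

PBS Job Id: 0000001.example
Job Name: STDIN
Exec host: tg-c054/0
Aborted by PBS Server

PBS Job Id: 0000002.example
Job Name: STDIN
Exec host: tg-c001/0
Aborted by PBS Server
"""

if __name__ == "__main__":
    # Print nodes from most to least frequently named in the notices.
    for node, failures in count_exec_hosts(sample).most_common():
        print(node, failures)
```

a node that dominates the tally (as 054 seems to here) is a candidate bad node, though it doesn't rule out the other issues mentioned above.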