[Swift-devel] mystery runs on ucanl

skenny at uchicago.edu skenny at uchicago.edu
Tue Jul 29 15:19:50 CDT 2008


>> because there is still this odd behavior of jobs
>> remaining in the queue even after they've been killed, i'm
>> tempted to blame pbs (gotta blame someone ;) also, i'm getting
>> emails from pbs like this:
>> 
>> PBS Job Id: 1759910.tg-master.uc.teragrid.org
>> Job Name:   STDIN
>> Exec host:  tg-c054/0
>> Aborted by PBS Server 
>> Job cannot be executed
>> See Administrator for help
>> 
>> and the swift log simply gives "Failed Error code: 271,
>> ProcessDied"
>
>Not the same kind of failures. So we may be dealing with
>multiple issues here.

so, looking back at the pbs notices from a run on 7/23-24, i
actually see about 25 failures naming tg-c054 as the node,
so i may have jumped the gun on there not being a bum node
involved...i'm also seeing that in the batch i submitted today
the failures went either to 32-bit nodes (which i expect
to fail) or to tg-c054...sooo, 054 is looking like a
culprit for at least some of what we're seeing.
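
a per-node tally like the one above can be sketched in python, assuming the
pbs notice emails have been saved as plain text. the function name and the
regex are illustrative only (not part of swift or pbs), and the sample is
just the one notice quoted earlier in this thread:

```python
import re
from collections import Counter

# Matches lines like "Exec host:  tg-c054/0" and captures the node name
# (everything before the "/"). Assumes the notice format quoted above.
EXEC_HOST = re.compile(r"Exec host:\s+([^/\s]+)")

def count_failures_by_host(notices_text):
    """Tally PBS abort notices by the node named on each 'Exec host:' line."""
    return Counter(EXEC_HOST.findall(notices_text))

# The single notice quoted in this thread, used as sample input.
sample = """\
PBS Job Id: 1759910.tg-master.uc.teragrid.org
Job Name:   STDIN
Exec host:  tg-c054/0
Aborted by PBS Server
Job cannot be executed
See Administrator for help
"""

if __name__ == "__main__":
    # Nodes with the most abort notices come first.
    for host, n in count_failures_by_host(sample).most_common():
        print(host, n)
```

run over all the saved notices concatenated together, a node that shows up
far more often than the rest would be the first one to ask the admins about.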


