[Swift-devel] mystery runs on ucanl

skenny at uchicago.edu skenny at uchicago.edu
Tue Jul 29 14:34:28 CDT 2008


>> >> yes (see below) and SOME of the jobs in the workflow do
>> >> complete when we submit the whole workflow to ucanl.
>> >
>> >Indeed. It seems like roughly half of them work and the other half
>> >break. Could this be an ia32/ia64 issue? Like python being compiled
>> >for the wrong platform or something?

well, i thought that sounded pretty likely (apparently some
jobs were going to 32-bit machines even though 64-bit was
specified in the sites file). however, i've just sent a batch
to the site and am getting failures on 64-bit nodes as
well (and on varying nodes, so not just 1 or 2 bum
nodes). because there is still this odd behavior of jobs
remaining in the queue even after they've been killed, i'm
tempted to blame pbs (gotta blame someone ;). also, i'm getting
emails from pbs like this:

PBS Job Id: 1759910.tg-master.uc.teragrid.org
Job Name:   STDIN
Exec host:  tg-c054/0
Aborted by PBS Server 
Job cannot be executed
See Administrator for help

and the swift log simply gives "Failed Error code: 271,
ProcessDied"

hence, i'm copying help at teragrid on this...if there are any
other tests i can run to try and narrow down the bug let me
know. i've tried submitting several globusrun-ws jobs but
haven't gotten an error that way as of yet. 


