[Swift-devel] mystery runs on ucanl
skenny at uchicago.edu
Tue Jul 29 14:34:28 CDT 2008
>> >> yes (see below) and SOME of the jobs in the workflow do
>> >> complete when we submit the whole workflow to ucanl.
>> >
>> >Indeed. It seems like roughly half of them work and the other half
>> >break. Could this be an ia32/ia64 issue? Like python being compiled for
>> >the wrong platform or something?
well, i thought that sounded pretty likely (apparently some
jobs were going to 32-bit machines even though 64-bit was
specified in the sites file). however, i've just sent a batch
to the site and am getting failures on 64-bit nodes as
well (and on varying nodes, so it's not just 1 or 2 bum
nodes). because there is still this odd behavior of jobs
remaining in the queue even after they've been killed, i'm
tempted to blame pbs (gotta blame someone ;). also, i'm getting
emails from pbs like this:
PBS Job Id: 1759910.tg-master.uc.teragrid.org
Job Name: STDIN
Exec host: tg-c054/0
Aborted by PBS Server
Job cannot be executed
See Administrator for help
and the swift log simply gives "Failed Error code: 271,
ProcessDied"
hence, i'm copying help at teragrid on this. if there are any
other tests i can run to try and narrow down the bug, let me
know. i've tried submitting several globusrun-ws jobs but
haven't gotten an error that way as of yet.
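
one possible reading of the "Error code: 271" quoted above, as a rough
sketch only: this assumes the number in the swift log is the PBS job exit
status passed through unchanged, and that the site follows the common
PBS/Torque convention where a status of 256 or more means the job was
killed by signal (status - 256). neither assumption is confirmed in this
thread, and the helper name decode_pbs_exit_status is made up for
illustration.

```python
import signal


def decode_pbs_exit_status(status):
    """Interpret a PBS-style job exit status (hypothetical helper).

    Assumes the common PBS/Torque convention: values of 256 or more
    mean the job was killed by signal number (status - 256).
    """
    if status < 0:
        return "negative status: PBS could not run the job"
    if status < 256:
        return f"job process exited with code {status}"
    signum = status - 256
    try:
        name = signal.Signals(signum).name
    except ValueError:
        # Signal number not known on this platform.
        name = "unknown signal"
    return f"job killed by signal {signum} ({name})"


print(decode_pbs_exit_status(271))
```

under those assumptions 271 would correspond to signal 15 (SIGTERM), which
would fit with the jobs being terminated by the batch system rather than
crashing on their own.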