[Swift-devel] mystery runs on ucanl

Ti Leggett leggett at ci.uchicago.edu
Tue Jul 29 15:50:38 CDT 2008


This looks like you're trying to run ia64 code on an ia32 machine.
Double-check that you are in fact requesting the right type of node
(ia64-compute for ia64 and ia32-compute for ia32). If you don't request
a node type, you will arbitrarily be placed on an available node, which
could be either architecture.
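
One way to confirm where a job actually landed is to have it check the
architecture before doing any real work. The following is only a minimal
sketch: the mapping from node type to machine strings is my assumption
(platform.machine() output varies by OS and kernel), and node_matches is
a hypothetical helper, not part of Swift or PBS.

```python
import platform

# ASSUMPTION: machine strings each node type is expected to report.
# Adjust these after checking what `uname -m` prints on the real nodes.
NODE_TYPE_MACHINES = {
    "ia64-compute": {"ia64"},
    "ia32-compute": {"i386", "i486", "i586", "i686"},
}


def node_matches(node_type, machine=None):
    """Return True if the machine string fits the requested node type.

    `machine` defaults to the architecture of the host running this code.
    """
    if machine is None:
        machine = platform.machine()
    return machine.lower() in NODE_TYPE_MACHINES[node_type]


def describe_mismatch(node_type):
    """Build a message naming the host and architecture, for job logs."""
    return "requested %s but running on %s (%s)" % (
        node_type, platform.node(), platform.machine())
```

A job wrapper could call node_matches("ia64-compute") first and, on a
mismatch, log describe_mismatch("ia64-compute") and exit non-zero, so
the failing hosts are easy to pick out of the logs.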

On Jul 29, 2008, at 2:43 PM, Michael Wilde wrote:

>
> On 7/29/08 2:34 PM, skenny at uchicago.edu wrote:
>>>>>> yes (see below) and SOME of the jobs in the workflow do
>>>>>> complete when we submit the whole workflow to ucanl.
>>>>> Indeed. It seems like roughly half of them work and the other half
>>>>> break. Could this be an ia32/ia64 issue? Like python being compiled
>>>>> for the wrong platform or something?
>> well, i thought that sounded pretty likely (apparently some
>> jobs were going to 32-bit machines even though 64 was
>> specified in the sites file).
>
> Is it possible that the property was misspelled? I recall some  
> issues with this profile attribute in the past, when you first  
> started running Swift last Oct-Nov.
>
>> however, i've just sent a batch
>> to the site and am getting failures on 64-bit nodes as
>> well (and on varying nodes, so not just 1 or 2 bum
>> nodes)...because there is still this odd behavior of jobs
>> remaining in the queue even after they've been killed, i'm
>> tempted to blame pbs (gotta blame someone ;) also, i'm getting
>> emails from pbs like this:
>> PBS Job Id: 1759910.tg-master.uc.teragrid.org
>> Job Name:   STDIN
>> Exec host:  tg-c054/0
>> Aborted by PBS Server Job cannot be executed
>> See Administrator for help
>> and the swift log simply gives "Failed Error code: 271,
>> ProcessDied"
>
> I also recall some similar issues on UC Teragrid last Nov (2007) as  
> we were preparing Angle runs for SC07. Ti was involved in that  
> debugging and had given us PBS diagnostic commands to capture log  
> data on the problem at the time.  Ti, can you recall the details?
>
> - Mike
>
>> hence, i'm copying help at teragrid on this...if there are any
>> other tests i can run to try and narrow down the bug let me
>> know. i've tried submitting several globusrun-ws jobs but
>> haven't gotten an error that way as of yet.  