[Swift-devel] mystery runs on ucanl & ncsa--warning very long email, sorry!

Mihael Hategan hategan at mcs.anl.gov
Thu Jul 24 18:02:01 CDT 2008


On Thu, 2008-07-24 at 17:49 -0500, skenny at uchicago.edu wrote:
> >On Thu, 2008-07-24 at 17:32 -0500, skenny at uchicago.edu wrote:
> >> yes (see below) and SOME of the jobs in the workflow do
> >> complete when we submit the whole workflow to ucanl.
> >
> >Indeed. It seems like roughly half of them work and the other half
> >break. Could this be an ia32/ia64 issue? Like python being compiled for
> >the wrong platform or something?
> 
> hmm, not quite sure i follow, since we're only sending to ia64
> on this run...how can i test?

Actually, it would be bash that is failing rather than python, since we
never get as far as the wrapper script. Instead of /bin/hostname, you
could try /bin/bash -c /bin/hostname. Repeatedly. With globusrun-ws.
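
Something along these lines could do the repeating and count how many
submissions fail. It is only a sketch under assumptions: run_repeatedly is
a made-up helper name, the factory contact has to be supplied by you, and
the -submit / -F / -c options are how I recall the GT4 globusrun-ws client,
so check them against your installation.

```python
import subprocess
import sys


def run_repeatedly(cmd, attempts=20, timeout=120):
    """Run cmd `attempts` times.

    Returns (successes, failures), where failures is a list of
    (attempt_index, reason) tuples for every run that did not exit 0.
    """
    successes = 0
    failures = []
    for i in range(attempts):
        try:
            result = subprocess.run(
                cmd, capture_output=True, text=True, timeout=timeout
            )
        except (OSError, subprocess.TimeoutExpired) as exc:
            # The command could not be started, or it hung past the timeout.
            failures.append((i, repr(exc)))
            continue
        if result.returncode == 0:
            successes += 1
        else:
            reason = "exit %d: %s" % (result.returncode, result.stderr.strip())
            failures.append((i, reason))
    return successes, failures


if __name__ == "__main__" and len(sys.argv) > 1:
    # sys.argv[1] is the factory contact for the site under test
    # (an assumption: pass whatever contact string you normally use).
    factory = sys.argv[1]
    # Assumed globusrun-ws invocation; everything after the first -c is
    # the remote command line, i.e. /bin/bash -c /bin/hostname.
    cmd = ["globusrun-ws", "-submit", "-F", factory,
           "-c", "/bin/bash", "-c", "/bin/hostname"]
    ok, bad = run_repeatedly(cmd)
    print("%d succeeded, %d failed" % (ok, len(bad)))
    for index, reason in bad:
        print("attempt %d: %s" % (index, reason))
```

If roughly half of the attempts fail here too, the problem is on the site
side (or in bash on some of the nodes) and not in the wrapper or in python.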

> 
> >> unfortunately i can't test anything on ncsa right now 'cause
> >> it's down. 
> >
> >It being down would generally prevent swift from being able to run jobs
> >there. Which is probably what happened the week before.
> 
> ha ha, what? swift can't run jobs on a site that's down?

As strange as it may sound, it can't.

> lame! heh, actually we've had a couple of runs now where we
> see the behavior i described on ncsa--e.g. a few jobs
> completing but some failing and an eventual decline. though,
> it's true the site's been up and down quite a bit over the
> past few weeks so could be indicative of something else wrong
> entirely. incidentally, i told them a couple weeks
> ago i was having trouble submitting to gram4 so we switched
> back to gram2 and it *seemed* to be working...for a while.
> 
> well, we're trying on yet another site now so if we see more
> of the same we'll know we need to do *something* on our end. 

May I (again) suggest not storing all the eggs in one basket if eggs are
the only food you can have for lunch?




