<div dir="ltr">it's ucanl (not ncsa) that has been completing a few and declining, e.g.<br><br>Progress:  Initializing:73 Selecting site:6922 Executing:5<br>Mediator completed<br>Progress:  Initializing:73 Selecting site:6922 Stage out:4 Finished successfully:1<br>

Mediator completed<br>Mediator completed<br>Mediator completed<br>Mediator completed<br>Progress:  Initializing:73 Selecting site:6916 Executing:5 Finished successfully:5 Failed but can retry:1<br><div dir="ltr">Failed to transfer wrapper log from PermFriedman-20080724-1033-7eg450y8/info/z/ANLUCTERAGRID64<br>
Failed to transfer wrapper log from PermFriedman-20080724-1033-7eg450y8/info/1/ANLUCTERAGRID64<br>Failed to transfer wrapper log from PermFriedman-20080724-1033-7eg450y8/info/3/ANLUCTERAGRID64<br>Progress:  Initializing:73 Selecting site:6918 Executing:2 Finished successfully:5 Failed but can retry:2<br>

Failed to transfer wrapper log from PermFriedman-20080724-1033-7eg450y8/info/2/ANLUCTERAGRID64<br>Progress:  Initializing:73 Selecting site:6919 Executing:2 Finished successfully:5 Failed but can retry:1<br>Failed to transfer wrapper log from PermFriedman-20080724-1033-7eg450y8/info/9/ANLUCTERAGRID64<br>

Progress:  Initializing:73 Selecting site:6919 Executing:2 Finished successfully:5 Failed but can retry:1<br>Failed to transfer wrapper log from PermFriedman-20080724-1033-7eg450y8/info/b/ANLUCTERAGRID64<br>Progress:  Initializing:73 Selecting site:6919 Executing:3 Finished successfully:5<br>

Failed to transfer wrapper log from PermFriedman-20080724-1033-7eg450y8/info/d/ANLUCTERAGRID64<br>Progress:  Initializing:73 Selecting site:6919 Executing:2 Finished successfully:5 Failed but can retry:1<br>Failed to transfer wrapper log from PermFriedman-20080724-1033-7eg450y8/info/f/ANLUCTERAGRID64<br>

Progress:  Initializing:73 Selecting site:6919 Executing:2 Finished successfully:5 Failed but can retry:1<br>Failed to transfer wrapper log from PermFriedman-20080724-1033-7eg450y8/info/h/ANLUCTERAGRID64<br>Progress:  Initializing:73 Selecting site:6919 Executing:2 Finished successfully:5 Failed but can retry:1<br>

Failed to transfer wrapper log from PermFriedman-20080724-1033-7eg450y8/info/j/ANLUCTERAGRID64<br>Progress:  Initializing:73 Selecting site:6919 Executing:3 Finished successfully:5<br>Progress:  Initializing:73 Selecting site:6919 Executing:3 Finished successfully:5<br>
<br><br><br>on ncsa, lately it seems to either work all-out or not work at all.  yesterday i got 73 jobs 'Finished successfully' on there and then it just hung, so i killed it (after letting it hang for a few hours).  today, i couldn't get it to even start executing (i.e. the site is down).  <br>
<br>and this 'new site', it's been sitting at: <br><br>Progress:  Selecting site:6994 Executing:6<br>Progress:  Selecting site:6994 Executing:6<br>Progress:  Selecting site:6994 Executing:6<br><br>since 2pm this afternoon, still with nothing finished, no errors, no indication of what's going on...<br>
woo grid computing! <br></div><br><br><div class="gmail_quote">On Thu, Jul 24, 2008 at 5:49 PM, &lt;<a href="mailto:skenny@uchicago.edu">skenny@uchicago.edu</a>&gt; wrote:<br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
<div class="Ih2E3d">>On Thu, 2008-07-24 at 17:32 -0500, <a href="mailto:skenny@uchicago.edu">skenny@uchicago.edu</a> wrote:<br>
>> yes (see below) and SOME of the jobs in the workflow do<br>
>> complete when we submit the whole workflow to ucanl.<br>
><br>
>Indeed. It seems like roughly half of them work and the other half<br>
>break. Could this be an ia32/ia64 issue? Like python being compiled for<br>
>the wrong platform or something?<br>
<br>
</div>hmm, not quite sure i follow, since we're only sending to ia64<br>
on this run...how can i test?<br>
<div class="Ih2E3d"><br>
>> unfortunately i can't test anything on ncsa right now 'cause<br>
>> it's down.<br>
><br>
>It being down would generally prevent swift from being able to run jobs<br>
>there. Which is probably what happened the week before.<br>
<br>
</div>ha ha, what, swift can't run jobs on a site that's down?<br>
lame! heh, actually we've had a couple of runs now where we<br>
see the behavior i described on ncsa--e.g. a few jobs<br>
completing but some failing and an eventual decline. though,<br>
it's true the site's been up and down quite a bit over the<br>
past few weeks so it could be indicative of something else wrong<br>
entirely. incidentally, i told them a couple weeks<br>
ago i was having trouble submitting to gram4 so we switched<br>
back to gram2 and it *seemed* to be working...for a while.<br>
<br>
well, we're trying on yet another site now so if we see more<br>
of the same we'll know we need to do *something* on our end.<br>
<br>
thanks<br>
sarah<br>
<br>
<br>
</blockquote></div><br></div>