Hi,<br>I've encountered this issue with SwiftR, running release 0.92 from the svn repository. It occurs when GLOBUS::maxWallTime="03:55:00" is set in tc and maxTime is 4 hours in sites.xml. After 5 minutes (or whatever the difference between the two times is), I get the exception copied below. A tarball with the logs, script, etc. is attached; replicate.sh shows how to replicate the issue on PADS.<br>
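<br>For reference, the gap between the two settings can be worked out as in the small sketch below. It is only illustrative arithmetic on the two values quoted above; the helper name parse_hms and the variable names are made up for this example and are not part of Swift or the coaster code.<br>

```python
from datetime import timedelta


def parse_hms(text):
    """Parse an HH:MM:SS walltime string into a timedelta (hypothetical helper)."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


# Values taken from the report: maxWallTime in tc, maxTime in sites.xml.
job_walltime = parse_hms("03:55:00")
block_max_time = timedelta(hours=4)

# The exception shows up after roughly this much time has passed.
gap = block_max_time - job_walltime
print(gap)  # 0:05:00
```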
<br>Assuming that my problem is the same as the others, it would be good if the fix could be merged into release 0.92, as I'm trying to bundle stable Swift releases with SwiftR.<br><br>- Tim<br><br><br>Swift svn swift-r4336 cog-r3096 (cog modified locally)<br>
<br>RunID: 20110526-1317-2c8ybi10<br>Progress:<br>SwiftScript trace: top of loop: rserver waiting for input on, /tmp/nbest/SwiftR/swift.0827/requestpipe<br>Progress: Active:1<br>Progress: Finished successfully:1<br>SwiftScript trace: rserver: got dir, /tmp/nbest/SwiftR/requests.P09626/R0000007<br>
Progress: uninitialized:1 Finished successfully:1<br>Progress: Submitted:1 Finished successfully:1<br>Progress: Active:1 Finished successfully:1<br>Progress: Active:1 Finished successfully:1<br>Progress: Active:1 Finished successfully:1<br>
Progress: Active:1 Finished successfully:1<br>Progress: Active:1 Finished successfully:1<br>Progress: Active:1 Finished successfully:1<br>Progress: Active:1 Finished successfully:1<br>Progress: Active:1 Finished successfully:1<br>
Progress: Active:1 Finished successfully:1<br>queuedsize > 0 but no job dequeued. Queued: {}<br>java.lang.Throwable<br> at org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.requeueNonFitting(BlockQueueProcessor.java:252)<br>
at org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.updatePlan(BlockQueueProcessor.java:520)<br> at org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.run(BlockQueueProcessor.java:109)<br>
queuedsize > 0 but no job dequeued. Queued: {}<br>java.lang.Throwable<br> at org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.requeueNonFitting(BlockQueueProcessor.java:252)<br> at org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.updatePlan(BlockQueueProcessor.java:520)<br>
at org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.run(BlockQueueProcessor.java:109)<br>Progress: Finished successfully:1 Failed but can retry:1<br><br><br><div class="gmail_quote">On Sun, May 22, 2011 at 1:51 PM, Mihael Hategan <span dir="ltr">&lt;<a href="mailto:hategan@mcs.anl.gov">hategan@mcs.anl.gov</a>&gt;</span> wrote:<br>
<blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">The second one looks to me like a coaster problem. Can't say much about<br>
the first issue.<br>
<br>
Can you try with plain pbs if you want to test the pbs provider?<br>
<font color="#888888"><br>
Mihael<br>
</font><div><div></div><div class="h5"><br>
On Sun, 2011-05-22 at 08:39 -0500, ketan wrote:<br>
> I can confirm that the trunk is not usable for pbs provider. I am using<br>
> trunk for submitting jobs on beagle and I see a few unexpected things:<br>
><br>
> 1. The stderr is showing inconsistent messages: The results are getting<br>
> written to the output even though stderr doesn't report any.<br>
> 2. qsub jobs being cancelled inadvertently: I submitted 40 of them<br>
> yesterday, however, only 2 survived today. The log is here:<br>
><br>
> <a href="http://www.ci.uchicago.edu/%7Eketan/files/ftdock-20110521-0337-pokpgg89.log" target="_blank">http://www.ci.uchicago.edu/~ketan/files/ftdock-20110521-0337-pokpgg89.log</a><br>
><br>
> In addition, the ssh-pbs provider does not seem to be working for large<br>
> runs (it worked for a small number of test runs): Getting unexpected<br>
> stdouts. Following is the stdout:<br>
><br>
> <a href="http://www.ci.uchicago.edu/%7Eketan/files/ssh-pbs.stdout" target="_blank">http://www.ci.uchicago.edu/~ketan/files/ssh-pbs.stdout</a><br>
><br>
> Following is the log file for the above run:<br>
><br>
> <a href="http://www.ci.uchicago.edu/%7Eketan/files/ftdock-20110521-1750-b0cot9sa.log" target="_blank">http://www.ci.uchicago.edu/~ketan/files/ftdock-20110521-1750-b0cot9sa.log</a><br>
><br>
><br>
> Ketan<br>
><br>
> On 5/21/11 5:12 PM, Michael Wilde wrote:<br>
> ><br>
> > ----- Original Message -----<br>
> >> On Sat, 2011-05-21 at 17:06 -0400, Glen Hocky wrote:<br>
> >>> as I mentioned, I've been running with Mike's swift which was<br>
> >>> patched<br>
> >>> for beagle. are all the things that make running on beagle work in<br>
> >>> trunk?<br>
> >> No idea.<br>
> >><br>
> >> Mike?<br>
> > Justin, working with Ketan, just applied changes to trunk which should make it work now on Beagle (or any Cray XT5+ or XE). This uses a different set of sites.xml tags than the prototype in the current Beagle swift 0.92.1 module. Justin has a note on this at:<br>
> > <a href="https://sites.google.com/site/swiftdevel/sites/pbs/cray" target="_blank">https://sites.google.com/site/swiftdevel/sites/pbs/cray</a><br>
> ><br>
> > It was working before for one-node worker jobs; now it should work for multi-node worker jobs as well.<br>
> ><br>
> > Justin and Ketan should comment on the state of testing and readiness of this trunk feature. Don't try trunk on Beagle till they give the go-ahead.<br>
> ><br>
> > - Mike<br>
> ><br>
> >>> If so I'll update to the latest and test. I don't think I'm<br>
> >>> using stable...<br>
> >> Ok<br>
> >><br>
> >> Mihael<br>
> _______________________________________________<br>
> Swift-devel mailing list<br>
> <a href="mailto:Swift-devel@ci.uchicago.edu">Swift-devel@ci.uchicago.edu</a><br>
> <a href="http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel" target="_blank">http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel</a><br>
<br>
<br>
</div></div></blockquote></div><br>