<html><head><style type='text/css'>p { margin: 0; }</style></head><body><div style='font-family: Times New Roman; font-size: 12pt; color: #000000'>Tim, in addition: what's the status of the problem of not being able to launch two concurrent R applications on the same compute node?<div><br></div><div>Does the problem below imply that you've resolved that earlier problem? If so, what was the resolution?</div><div><br></div><div>Thanks,</div><div><br></div><div>Mike<br><div><br></div><div><hr id="zwchr"><blockquote style="border-left:2px solid rgb(16, 16, 255);margin-left:5px;padding-left:5px;">Any word on this bug? I have a nice use-case for SwiftR where it would be very handy to take advantage of Swift's dynamic resource procurement.<br><br>- Tim<br><br><div class="gmail_quote">On Thu, May 26, 2011 at 3:41 PM, Mihael Hategan <span dir="ltr"><<a href="mailto:hategan@mcs.anl.gov" target="_blank">hategan@mcs.anl.gov</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">Given that this has now been reported a number of times, it may make<br>
sense to backport the fix from trunk and make a patch release for 0.92.<br>
<br>
Objections?<br>
<div><div></div><div class="h5"><br>
On Thu, 2011-05-26 at 14:59 -0500, Tim Armstrong wrote:<br>
> Hi,<br>
> I've encountered this issue with SwiftR, running release 0.92 from<br>
> the svn repository. The issue occurs when<br>
> GLOBUS::maxWallTime="03:55:00" in tc and maxTime is 4 hours in<br>
> sites.xml. After 5 minutes (or whatever the difference is between the<br>
> two times), I get the exception copied below. A tarball is attached<br>
> with the logs, script, etc. replicate.sh shows how to replicate the<br>
> issue on PADS.<br>
><br>
> Assuming that my problem is the same as the others, it would be good<br>
> if the fix could be merged to release 0.92, as I'm trying to bundle<br>
> stable swift releases with SwiftR.<br>
><br>
> - Tim<br>
><br>
><br>
> Swift svn swift-r4336 cog-r3096 (cog modified locally)<br>
><br>
> RunID: 20110526-1317-2c8ybi10<br>
> Progress:<br>
> SwiftScript trace: top of loop: rserver waiting for input<br>
> on, /tmp/nbest/SwiftR/swift.0827/requestpipe<br>
> Progress: Active:1<br>
> Progress: Finished successfully:1<br>
> SwiftScript trace: rserver: got<br>
> dir, /tmp/nbest/SwiftR/requests.P09626/R0000007<br>
> Progress: uninitialized:1 Finished successfully:1<br>
> Progress: Submitted:1 Finished successfully:1<br>
> Progress: Active:1 Finished successfully:1<br>
> Progress: Active:1 Finished successfully:1<br>
> Progress: Active:1 Finished successfully:1<br>
> Progress: Active:1 Finished successfully:1<br>
> Progress: Active:1 Finished successfully:1<br>
> Progress: Active:1 Finished successfully:1<br>
> Progress: Active:1 Finished successfully:1<br>
> Progress: Active:1 Finished successfully:1<br>
> Progress: Active:1 Finished successfully:1<br>
> queuedsize > 0 but no job dequeued. Queued: {}<br>
> java.lang.Throwable<br>
> at<br>
> org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.requeueNonFitting(BlockQueueProcessor.java:252)<br>
> at<br>
> org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.updatePlan(BlockQueueProcessor.java:520)<br>
> at<br>
> org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.run(BlockQueueProcessor.java:109)<br>
> queuedsize > 0 but no job dequeued. Queued: {}<br>
> java.lang.Throwable<br>
> at<br>
> org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.requeueNonFitting(BlockQueueProcessor.java:252)<br>
> at<br>
> org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.updatePlan(BlockQueueProcessor.java:520)<br>
> at<br>
> org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.run(BlockQueueProcessor.java:109)<br>
> Progress: Finished successfully:1 Failed but can retry:1<br>
><br>
><br>
> On Sun, May 22, 2011 at 1:51 PM, Mihael Hategan <<a href="mailto:hategan@mcs.anl.gov" target="_blank">hategan@mcs.anl.gov</a>><br>
> wrote:<br>
> The second one looks to me like a coaster problem. Can't say<br>
> much about<br>
> the first issue.<br>
><br>
> Can you try with plain pbs if you want to test the pbs<br>
> provider?<br>
><br>
> Mihael<br>
><br>
><br>
> On Sun, 2011-05-22 at 08:39 -0500, ketan wrote:<br>
> > I can confirm that the trunk is not usable for pbs provider.<br>
> I am using<br>
> > trunk for submitting jobs on beagle and I see a few<br>
> unexpected things:<br>
> ><br>
> > 1. The stderr is showing inconsistent messages: The results<br>
> are getting<br>
> > written to the output even though stderr doesn't report any.<br>
> > 2. qsub jobs being cancelled inadvertently: I submitted 40<br>
> of them<br>
> > yesterday, however, only 2 survived today. The log is here:<br>
> ><br>
> ><br>
> <a href="http://www.ci.uchicago.edu/%7Eketan/files/ftdock-20110521-0337-pokpgg89.log" target="_blank">http://www.ci.uchicago.edu/~ketan/files/ftdock-20110521-0337-pokpgg89.log</a><br>
> ><br>
> > In addition, the ssh-pbs provider does not seem to be<br>
> working for large<br>
> > runs (it worked for a small number of test runs): Getting<br>
> unexpected<br>
> > stdouts. Following is the stdout:<br>
> ><br>
> > <a href="http://www.ci.uchicago.edu/%7Eketan/files/ssh-pbs.stdout" target="_blank">http://www.ci.uchicago.edu/~ketan/files/ssh-pbs.stdout</a><br>
> ><br>
> > Following is the log file for the above run:<br>
> ><br>
> ><br>
> <a href="http://www.ci.uchicago.edu/%7Eketan/files/ftdock-20110521-1750-b0cot9sa.log" target="_blank">http://www.ci.uchicago.edu/~ketan/files/ftdock-20110521-1750-b0cot9sa.log</a><br>
> ><br>
> ><br>
> > Ketan<br>
> ><br>
> > On 5/21/11 5:12 PM, Michael Wilde wrote:<br>
> > ><br>
> > > ----- Original Message -----<br>
> > >> On Sat, 2011-05-21 at 17:06 -0400, Glen Hocky wrote:<br>
> > >>> as I mentioned, I've been running with Mike's swift<br>
> which was<br>
> > >>> patched<br>
> > >>> for beagle. are all the things that make running on<br>
> beagle work in<br>
> > >>> trunk?<br>
> > >> No idea.<br>
> > >><br>
> > >> Mike?<br>
> > > Justin, working with Ketan, just applied changes to trunk<br>
> which should make it work now on Beagle (or any Cray XT5+ or<br>
> XE). This uses a different set of sites.xml tags than the<br>
> prototype in the current Beagle swift 0.92.1 module. Justin<br>
> has a note on this at:<br>
> > > <a href="https://sites.google.com/site/swiftdevel/sites/pbs/cray" target="_blank">https://sites.google.com/site/swiftdevel/sites/pbs/cray</a><br>
> > ><br>
> > > It was working before for one-node worker jobs; now it<br>
> should work for multi-node worker jobs as well.<br>
> > ><br>
> > > Justin and Ketan should comment on the state of testing<br>
> and readiness of this trunk feature. Don't try trunk on<br>
> Beagle till they give the go-ahead.<br>
> > ><br>
> > > - Mike<br>
> > ><br>
> > >>> If so i'll update to the latest and test. I don't<br>
> think I'm<br>
> > >>> using stable...<br>
> > >> Ok<br>
> > >><br>
> > >> Mihael<br>
> > _______________________________________________<br>
> > Swift-devel mailing list<br>
> > <a href="mailto:Swift-devel@ci.uchicago.edu" target="_blank">Swift-devel@ci.uchicago.edu</a><br>
> > <a href="http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel" target="_blank">http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel</a><br>
><br>
><br>
><br>
><br>
<br>
<br>
</div></div></blockquote></div><br>
<br>_______________________________________________<br>Swift-devel mailing list<br>Swift-devel@ci.uchicago.edu<br>http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel<br></blockquote><br><span><br><br>-- <br><span name="x"></span>Michael Wilde<br>Computation Institute, University of Chicago<br>Mathematics and Computer Science Division<br>Argonne National Laboratory<br><span name="x"></span><br></span></div></div></div></body></html>