[Swift-devel] recent error on beagle

Michael Wilde wilde at mcs.anl.gov
Thu Jun 2 15:31:51 CDT 2011


Tim, in addition: what's the status of the problem of not being able to launch two concurrent R applications on the same compute node?


Does the problem below imply that you've resolved this prior problem? If so, what was the resolution?


Thanks, 


Mike 



----- Original Message -----


Any word on this bug? I have a nice use-case for SwiftR where it would be very handy to take advantage of Swift's dynamic resource procurement. 

- Tim 


On Thu, May 26, 2011 at 3:41 PM, Mihael Hategan < hategan at mcs.anl.gov > wrote: 


Given that this has now been reported a number of times, it may make 
sense to backport the fix from trunk and make a patch release for 0.92. 

Objections? 




On Thu, 2011-05-26 at 14:59 -0500, Tim Armstrong wrote: 
> Hi, 
> I've encountered this issue with SwiftR, running release 0.92 from 
> the svn repository. The issue occurs when 
> GLOBUS::maxWallTime="03:55:00" in tc and maxTime is 4 hours in 
> sites.xml. After 5 minutes (or whatever the difference is between the 
> two times), I get the exception copied below. A tarball is attached 
> with the logs, script, etc. replicate.sh shows how to replicate the 
> issue on PADS. 
> 
> Assuming that my problem is the same as the others, it would be good 
> if the fix could be merged to release 0.92, as I'm trying to bundle 
> stable swift releases with SwiftR. 
> 
> - Tim 
> 
> 
> Swift svn swift-r4336 cog-r3096 (cog modified locally) 
> 
> RunID: 20110526-1317-2c8ybi10 
> Progress: 
> SwiftScript trace: top of loop: rserver waiting for input on, /tmp/nbest/SwiftR/swift.0827/requestpipe
> Progress: Active:1 
> Progress: Finished successfully:1 
> SwiftScript trace: rserver: got dir, /tmp/nbest/SwiftR/requests.P09626/R0000007
> Progress: uninitialized:1 Finished successfully:1 
> Progress: Submitted:1 Finished successfully:1 
> Progress: Active:1 Finished successfully:1 
> Progress: Active:1 Finished successfully:1 
> Progress: Active:1 Finished successfully:1 
> Progress: Active:1 Finished successfully:1 
> Progress: Active:1 Finished successfully:1 
> Progress: Active:1 Finished successfully:1 
> Progress: Active:1 Finished successfully:1 
> Progress: Active:1 Finished successfully:1 
> Progress: Active:1 Finished successfully:1 
> queuedsize > 0 but no job dequeued. Queued: {}
> java.lang.Throwable
> at org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.requeueNonFitting(BlockQueueProcessor.java:252)
> at org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.updatePlan(BlockQueueProcessor.java:520)
> at org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.run(BlockQueueProcessor.java:109)
> queuedsize > 0 but no job dequeued. Queued: {}
> java.lang.Throwable
> at org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.requeueNonFitting(BlockQueueProcessor.java:252)
> at org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.updatePlan(BlockQueueProcessor.java:520)
> at org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.run(BlockQueueProcessor.java:109)
> Progress: Finished successfully:1 Failed but can retry:1 
> 
> 
> On Sun, May 22, 2011 at 1:51 PM, Mihael Hategan < hategan at mcs.anl.gov > 
> wrote: 
> The second one looks to me like a coaster problem. Can't say much about
> the first issue.
>
> Can you try with plain pbs if you want to test the pbs provider?
> 
> Mihael 
> 
> 
> On Sun, 2011-05-22 at 08:39 -0500, ketan wrote: 
> > I can confirm that the trunk is not usable for pbs provider. I am using
> > trunk for submitting jobs on beagle and I see a few unexpected things:
> > 
> > 1. The stderr is showing inconsistent messages: The results are getting
> > written to the output even though stderr doesn't report any.
> > 2. qsub jobs being cancelled inadvertently: I submitted 40 of them
> > yesterday, however, only 2 survived today. The log is here:
> > 
> > 
> > http://www.ci.uchicago.edu/~ketan/files/ftdock-20110521-0337-pokpgg89.log
> > 
> > In addition, the ssh-pbs provider does not seem to be working for large
> > runs (it worked for a small number of test runs): Getting unexpected
> > stdouts. Following is the stdout:
> > 
> > http://www.ci.uchicago.edu/~ketan/files/ssh-pbs.stdout 
> > 
> > Following is the log file for the above run: 
> > 
> > 
> > http://www.ci.uchicago.edu/~ketan/files/ftdock-20110521-1750-b0cot9sa.log
> > 
> > 
> > Ketan 
> > 
> > On 5/21/11 5:12 PM, Michael Wilde wrote: 
> > > 
> > > ----- Original Message ----- 
> > >> On Sat, 2011-05-21 at 17:06 -0400, Glen Hocky wrote: 
> > >>> as I mentioned, I've been running with Mike's swift which was
> > >>> patched for beagle. are all the things that make running on
> > >>> beagle work in trunk?
> > >> No idea. 
> > >> 
> > >> Mike? 
> > > Justin, working with Ketan, just applied changes to trunk which
> > > should make it work now on Beagle (or any Cray XT5+ or XE). This
> > > uses a different set of sites.xml tags than the prototype in the
> > > current Beagle swift 0.92.1 module. Justin has a note on this at:
> > > https://sites.google.com/site/swiftdevel/sites/pbs/cray 
> > > 
> > > It was working before for one-node worker jobs; now it should work
> > > for multi-node worker jobs as well.
> > > 
> > > Justin and Ketan should comment on the state of testing and
> > > readiness of this trunk feature. Don't try trunk on Beagle till
> > > they give the go-ahead.
> > > 
> > > - Mike 
> > > 
> > >>> If so i'll update to the latest and test. I don't think I'm
> > >>> using stable...
> > >> Ok 
> > >> 
> > >> Mihael 
> > _______________________________________________ 
> > Swift-devel mailing list 
> > Swift-devel at ci.uchicago.edu 
> > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel 
> 
> 







-- 
Michael Wilde 
Computation Institute, University of Chicago 
Mathematics and Computer Science Division 
Argonne National Laboratory 


