[Swift-devel] [alcf-support #60887] Can Cobalt command-line bug on Eureka be fixed?

Paul Rich pmrich at gmail.com
Wed Jun 22 14:39:12 CDT 2011


Michael,

I wanted to let you know that a recent patch to Cobalt on Eureka should allow you to pass command-line arguments into the program supplied to the Cobalt job. Let us know if you encounter any further difficulties, and I am sorry that this took so long to deploy.

Thank you for your patience,

--
Paul Rich
ALCF Operations -- AIG
richp at alcf.anl.gov


----- Original Message -----
From: "Michael Wilde" <wilde at mcs.anl.gov>
To: "Paul M. Rich" <richp at alcf.anl.gov>, "Andrew Cherry" <acherry at alcf.anl.gov>
Cc: "swift-devel" <swift-devel at ci.uchicago.edu>, "Robert Jacob" <jacob at mcs.anl.gov>, support at alcf.anl.gov
Sent: Tuesday, January 11, 2011 7:30:30 PM
Subject: Re: [alcf-support #60887] Can Cobalt command-line bug on Eureka be fixed?

Paul, Andrew,

What I think we're going to do on this from the Swift side is temporarily try to use Eureka in a mode where we manually start Swift workers on the cluster using a batch job.  

We'll wait on testing the Swift Cobalt interface (which is different from the above) until we hear from you that the bug is fixed and ready for testing.

So even though it may be many weeks or more away, we'd like to put in our vote for fixing this issue (realizing that you have many other priorities :)

Thanks,

Mike


----- Original Message -----
> Thanks, Rich and Andrew, for the very fast responses.
> 
> We'll try the work-around, then.
> 
> Regards,
> 
> - Mike
> 
> 
> ----- Original Message -----
> > Michael,
> >
> > Unfortunately a fix for this will, at this point in time, take a minimum
> > of four weeks to deploy to a production resource like Eureka, due to our
> > testing, upgrade and maintenance procedures.
> >
> > As a workaround for this on Eureka, since every job effectively runs in
> > script mode, you should be able to set environment variables within the
> > script that you submit to Cobalt.
> >
> > We apologize for the inconvenience. Let us know if you have any other
> > questions.
> >
> > --
> > Paul Rich
> > ALCF Operations -- AIG
> > richp at alcf.anl.gov
> >
> >
> > On 1/11/11 4:48 PM, Michael Wilde wrote:
> > > User info for wilde at mcs.anl.gov
> > > =================================
> > > Username: wilde
> > > Full Name: Michael Wilde
> > > Projects: HTCScienceApps,JGI-Pilot,MTCScienceApps,OOPS,PTMAP,pilot-wilde
> > > 	     ('*' denotes INCITE projects)
> > > =================================
> > >
> > >
> > > Hi ALCF Team,
> > >
> > > The following known issue in Cobalt is currently preventing us from
> > > running Swift on Eureka:
> > >
> > >    http://trac.mcs.anl.gov/projects/cobalt/ticket/462
> > >
> > > With some additional development effort we can work around this, but
> > > it would be much cleaner and better if this were fixed in Cobalt,
> > > instead, as suggested in ticket 462 above.
> > >
> > > Is there any chance that can be done in the next few days?
> > > If not, please let me know, and we will implement the work-around
> > > instead.
> > >
> > > This is holding up work on the DOE ParVis project (Rob Jacob, PI)
> > > and we've had to move some work we want to run on Eureka to other
> > > platforms in the meantime.
> > >
> > > Thanks very much,
> > >
> > > Mike
> > >
> > > 462 is:
> > >
> > > Ticket #462 (new defect)
> > > Opened 7 months ago
> > > Cobalt on clusters ignores job script arguments
> > >
> > > Reported by: acherry
> > > Priority: major
> > > Component: clients
> > >
> > > Description
> > >
> > > It appears that cobalt-launcher.py does not support running a job
> > > script or executable with command arguments, even though qsub will
> > > accept the arguments, and the man page and help for qsub indicate
> > > that arguments are accepted.
> > >
> > > I'm filing this as a bug rather than a feature request, since the
> > > behavior isn't consistent with the documentation. But I'd rather the
> > > fix for this be adding support for args, rather than changing the
> > > docs to say they aren't accepted. :-)
> > >
> > >
> 
> --
> Michael Wilde
> Computation Institute, University of Chicago
> Mathematics and Computer Science Division
> Argonne National Laboratory

-- 
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory



