[Swift-devel] Re: [alcf-support #60887] Can Cobalt command-line bug on Eureka be fixed?

Michael Wilde wilde at mcs.anl.gov
Tue Jan 11 17:40:31 CST 2011


One workaround we could try here, which may be more valuable than a temporary fix, would be to make a more user-ready script for launching manual coasters (persistent/passive) on any cluster.

We have several such scripts floating around; Sheri could probably use one after only a little polishing.

That would be a good project for you, David.

Such a script would be useful on any cluster, and would need only slight flexibility to specify the batch jobs for various PBS, SGE, Cobalt, and Slurm systems.

It would have all the drawbacks of manual coasters, but manual coasters are a usage mode that some folks like and that we want to support.

Justin, you noted yesterday that it's hard to make such a script general. Maybe splitting the script into two variants (one for clusters and one for sets of workstations) would make the resulting scripts more maintainable and testable?

- Mike


----- Original Message -----
> Thanks, Rich and Andrew, for the very fast responses.
> 
> We'll try the work-around, then.
> 
> Regards,
> 
> - Mike
> 
> 
> ----- Original Message -----
> > Michael,
> >
> > Unfortunately a fix for this will, at this point in time, take a
> > minimum of four weeks to deploy to a production resource like
> > Eureka, due to our testing, upgrade and maintenance procedures.
> >
> > As a workaround for this on Eureka, since every job effectively
> > runs in script mode, you should be able to set environment
> > variables within the script that you submit to Cobalt.
> >
> > We apologize for the inconvenience. Let us know if you have any
> > other questions.
> >
> > --
> > Paul Rich
> > ALCF Operations -- AIG
> > richp at alcf.anl.gov
> >
> >
> > On 1/11/11 4:48 PM, Michael Wilde wrote:
> > > User info for wilde at mcs.anl.gov
> > > =================================
> > > Username: wilde
> > > Full Name: Michael Wilde
> > > Projects:
> > > HTCScienceApps,JGI-Pilot,MTCScienceApps,OOPS,PTMAP,pilot-wilde
> > > 	     ('*' denotes INCITE projects)
> > > =================================
> > >
> > >
> > > Hi ALCF Team,
> > >
> > > The following known issue in Cobalt is currently preventing us
> > > from running Swift on Eureka:
> > >
> > >    http://trac.mcs.anl.gov/projects/cobalt/ticket/462
> > >
> > > With some additional development effort we can work around this,
> > > but it would be much cleaner and better if this were fixed in
> > > Cobalt instead, as suggested in ticket 462 above.
> > >
> > > Is there any chance that can be done in the next few days?
> > > If not, please let me know, and we will implement the work-around
> > > instead.
> > >
> > > This is holding up work on the DOE ParVis project (Rob Jacob, PI)
> > > and we've had to move some work we want to run on Eureka to other
> > > platforms in the meantime.
> > >
> > > Thanks very much,
> > >
> > > Mike
> > >
> > > 462 is:
> > >
> > > Ticket #462 (new defect)
> > > Opened 7 months ago
> > > Cobalt on clusters ignores job script arguments
> > >
> > > Reported by: acherry
> > > Priority: major
> > > Component: clients
> > >
> > > Description
> > >
> > > It appears that cobalt-launcher.py does not support running a job
> > > script or executable with command arguments, even though qsub will
> > > accept the arguments, and the man page and help for qsub indicate
> > > that arguments are accepted.
> > >
> > > I'm filing this as a bug rather than a feature request, since the
> > > behavior isn't consistent with the documentation. But I'd rather
> > > the fix for this be adding support for args, rather than changing
> > > the docs to say they aren't accepted. :-)
> > >
> > >
> 
> --
> Michael Wilde
> Computation Institute, University of Chicago
> Mathematics and Computer Science Division
> Argonne National Laboratory
> 
> _______________________________________________
> Swift-devel mailing list
> Swift-devel at ci.uchicago.edu
> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel

-- 
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory



