[Swift-devel] Manual start script for persistent coasters on Cobalt and other schedulers

Michael Wilde wilde at mcs.anl.gov
Tue Jan 11 18:08:32 CST 2011


was: Re: [Swift-devel] Re: [alcf-support #60887] Can Cobalt command-line bug on Eureka be fixed?

David, the evolving Swift R package has a start-swift command in this directory:

  https://svn.ci.uchicago.edu/svn/vdl2/SwiftApps/SwiftR/Swift/exec

which has the logic needed to start a manual persistent passive coaster pool on both clusters and workstations.

You'll need to pick up the files that start-swift sources from that same directory, and remove the final stage of the script where it actually launches Swift (that part is just for the Swift R service).

You'll want to keep the part where it launches the Swift script "passivate.swift" to force the persistent service into passive mode.

I think that with some cleanup and much testing, this script could be adapted to launch all means of manual coaster configurations.
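As a rough illustration of the per-scheduler flexibility such a script would need, here is a minimal Python sketch that builds the submit command for each batch system. It is not taken from start-swift. The function names, the script name "workers.sh", and the SGE parallel environment name "mpi" are hypothetical, and the flags shown are only the common node-count and walltime options, which individual sites may configure differently.

```python
def walltime_hms(minutes):
    """Convert a walltime in minutes to HH:MM:SS."""
    hours, mins = divmod(minutes, 60)
    return f"{hours:02d}:{mins:02d}:00"


def submit_command(scheduler, script, nodes, minutes, sge_pe="mpi"):
    """Return the argv list that would submit `script` to the given scheduler.

    Assumption: only node count and walltime vary. Real sites usually need
    more (queue, project, account), which this sketch leaves out.
    """
    hms = walltime_hms(minutes)
    if scheduler == "pbs":
        return ["qsub", "-l", f"nodes={nodes}", "-l", f"walltime={hms}", script]
    if scheduler == "sge":
        # The parallel environment name is site-specific; "mpi" is a placeholder.
        return ["qsub", "-pe", sge_pe, str(nodes), "-l", f"h_rt={hms}", script]
    if scheduler == "cobalt":
        # Cobalt's qsub takes the walltime in minutes.
        return ["qsub", "-n", str(nodes), "-t", str(minutes), script]
    if scheduler == "slurm":
        # A bare number passed to sbatch -t is interpreted as minutes.
        return ["sbatch", "-N", str(nodes), "-t", str(minutes), script]
    raise ValueError(f"unknown scheduler: {scheduler}")


if __name__ == "__main__":
    for name in ("pbs", "sge", "cobalt", "slurm"):
        print(name, " ".join(submit_command(name, "workers.sh", 4, 90)))
```

Keeping the scheduler differences in one small function like this would leave the rest of the launch logic (starting the service, passivating it, tearing it down) shared across clusters.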

Justin has expressed the view that perhaps this whole process cannot be scripted cleanly, and that we instead should provide tools for the user to do this manually.

I would like to try, though, to see if this script can be made clean and reliable, and then we could place it in Swift and factor it out of SwiftR.

I'm willing to help you get this set up and tested.
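For the Cobalt workaround quoted below (setting environment variables inside the submitted script, because job script arguments are ignored per ticket 462), one possible approach is to generate a small wrapper script with the settings baked in, and submit that with no arguments. The following Python sketch shows the idea only. The names render_job_script, worker.sh, and SERVICE_URL are hypothetical and do not come from start-swift.

```python
import shlex


def render_job_script(command, env):
    """Return the text of a shell script that exports `env` and runs `command`.

    Everything the job needs is written into the script body, so the
    script can be submitted without any command-line arguments.
    """
    lines = ["#!/bin/sh"]
    for name, value in sorted(env.items()):
        # shlex.quote protects values containing spaces or shell metacharacters.
        lines.append(f"export {name}={shlex.quote(value)}")
    lines.append("exec " + " ".join(shlex.quote(part) for part in command))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical worker command and variable, for illustration only.
    print(render_job_script(["./worker.sh"], {"SERVICE_URL": "http://host:1234"}), end="")
```

The generated text would be written to a file, made executable, and handed to the scheduler's submit command as the job script.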

- Mike

----- Original Message -----
> One workaround we can try here, which may be more valuable than a temp
> fix, would be to make a more user-ready script to launch manual
> coasters (persistent/passive) on any cluster.
> 
> We have several such scripts floating around; probably Sheri could use
> one if it were only slightly polished.
> 
> That would be a good project for you, David.
> 
> Such a script would be useful on any cluster, and would need only
> slight flexibility to specify the batch jobs for various PBS, SGE,
> Cobalt, and Slurm systems.
> 
> It has all the drawbacks of manual coasters (which some folks like)
> and is a usage mode we want to support.
> 
> Justin, you noted yesterday that it's hard to make such a script
> general. Maybe if we split the script into 2 variants (one for
> clusters, and one for sets of workstations) that would make the
> resultant scripts more maintainable and testable?
> 
> - Mike
> 
> 
> ----- Original Message -----
> > Thanks, Rich and Andrew, for the very fast responses.
> >
> > We'll try the work-around, then.
> >
> > Regards,
> >
> > - Mike
> >
> >
> > ----- Original Message -----
> > > Michael,
> > >
> > > Unfortunately a fix for this will, at this point in time, take a
> > > minimum of four weeks to deploy to a production resource like
> > > Eureka, due to our testing, upgrade and maintenance procedures.
> > >
> > > As a workaround for this on Eureka, since every job effectively
> > > runs in script mode, you should be able to set environment
> > > variables within the script that you submit to Cobalt.
> > >
> > > We apologize for the inconvenience. Let us know if you have any
> > > other questions.
> > >
> > > --
> > > Paul Rich
> > > ALCF Operations -- AIG
> > > richp at alcf.anl.gov
> > >
> > >
> > > On 1/11/11 4:48 PM, Michael Wilde wrote:
> > > > User info for wilde at mcs.anl.gov
> > > > =================================
> > > > Username: wilde
> > > > Full Name: Michael Wilde
> > > > Projects: HTCScienceApps,JGI-Pilot,MTCScienceApps,OOPS,PTMAP,pilot-wilde
> > > > 	     ('*' denotes INCITE projects)
> > > > =================================
> > > >
> > > >
> > > > Hi ALCF Team,
> > > >
> > > > The following known issue in Cobalt is currently preventing us
> > > > from running Swift on Eureka:
> > > >
> > > >    http://trac.mcs.anl.gov/projects/cobalt/ticket/462
> > > >
> > > > With some additional development effort we can work around this,
> > > > but it would be much cleaner and better if this were fixed in
> > > > Cobalt, instead, as suggested in ticket 462 above.
> > > >
> > > > Is there any chance that can be done in the next few days?
> > > > If not, please let me know, and we will implement the
> > > > work-around instead.
> > > >
> > > > This is holding up work on the DOE ParVis project (Rob Jacob,
> > > > PI) and we've had to move some work we want to run on Eureka to
> > > > other platforms in the meantime.
> > > >
> > > > Thanks very much,
> > > >
> > > > Mike
> > > >
> > > > 462 is:
> > > >
> > > > Ticket #462 (new defect)
> > > > Opened 7 months ago
> > > > Cobalt on clusters ignores job script arguments
> > > >
> > > > Reported by: acherry
> > > > Priority: major
> > > > Component: clients
> > > >
> > > > Description
> > > >
> > > > It appears that cobalt-launcher.py does not support running a
> > > > job script or executable with command arguments, even though qsub
> > > > will accept the arguments, and the man page and help for qsub
> > > > indicates that arguments are accepted.
> > > >
> > > > I'm filing this as a bug rather than a feature request, since
> > > > the behavior isn't consistent with the documentation. But I'd rather
> > > > the fix for this to be adding support for args, rather than changing
> > > > the docs to say they aren't accepted. :-)
> > > >
> > > >
> >
> > --
> > Michael Wilde
> > Computation Institute, University of Chicago
> > Mathematics and Computer Science Division
> > Argonne National Laboratory
> >
> > _______________________________________________
> > Swift-devel mailing list
> > Swift-devel at ci.uchicago.edu
> > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
> 

-- 
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory



