[Swift-devel] Re: Manual start script for persistent coasters on Cobalt and other schedulers

Michael Wilde wilde at mcs.anl.gov
Wed Jan 12 09:07:03 CST 2011


David, my apologies, I posted the wrong script. The start-swift command no longer starts a coaster service, because it uses a persistent swift command that reads requests from R and runs them. So the coaster service is embedded in the persistent swift.

Let's look at the one Justin posted. I suspect you can merge the logic in the SwiftR start-swift command that starts the workers with Justin's logic that starts the service.
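
As a rough illustration of the pieces such a merged script needs (this is not the actual start-swift code), the sketch below does two things. It bakes the worker's settings into a generated job script as exports, since Cobalt on Eureka ignores job script arguments, and it picks a submit command per scheduler. The names SERVICE_URL, LOG_DIR, ./start-worker.sh and the host name are hypothetical placeholders. The flags are the usual node-count and walltime options for each scheduler and should be checked against each site's documentation. SGE is left out because its parallel environment names are site-specific.

```python
import shlex


def render_job_script(env, command):
    """Return the text of a /bin/sh job script that exports env, then runs command.

    Settings travel inside the script body, so nothing depends on the
    scheduler passing command-line arguments through to the job.
    """
    lines = ["#!/bin/sh"]
    for name, value in sorted(env.items()):
        # shlex.quote keeps values with spaces or shell metacharacters intact
        lines.append("export {}={}".format(name, shlex.quote(value)))
    lines.append(" ".join(shlex.quote(arg) for arg in command))
    return "\n".join(lines) + "\n"


def submit_argv(scheduler, script_path, nodes, minutes):
    """Return the submit command (as an argv list) for one batch job."""
    hhmmss = "{:02d}:{:02d}:00".format(minutes // 60, minutes % 60)
    if scheduler == "pbs":
        return ["qsub", "-l", "nodes={}".format(nodes),
                "-l", "walltime=" + hhmmss, script_path]
    if scheduler == "slurm":
        return ["sbatch", "-N", str(nodes), "-t", str(minutes), script_path]
    if scheduler == "cobalt":
        # Cobalt's qsub takes the walltime in minutes
        return ["qsub", "-n", str(nodes), "-t", str(minutes), script_path]
    raise ValueError("unsupported scheduler: " + scheduler)


# Hypothetical settings for one pool of passive workers
job_text = render_job_script(
    {"SERVICE_URL": "http://headnode.example:50000", "LOG_DIR": "/tmp/my logs"},
    ["./start-worker.sh"],
)
argv = submit_argv("cobalt", "job.sh", 4, 60)
```

Writing job_text to job.sh and running argv (for example with subprocess.run) would submit the job. Both steps are left out so the sketch has no side effects.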

- Mike


----- Original Message -----
> David, let's do a Skype call in a few hours to discuss.
> 
> I *think* this command should "just work" to a large extent if you
> make sure that the helper script is accessible and the "R"-specific
> stuff is commented out.
> 
> I last tested it on SGE but it has worked on PADS/PBS.
> 
> - Mike
> 
> 
> ----- Original Message -----
> > Mike,
> >
> > I will give it a try. Would the configuration for this be similar to
> > the persistent passive coaster configuration used on the MCS
> > machines?
> >
> > For example:
> > <execution provider="coaster-persistent" url="churn.mcs.anl.gov"
> > jobmanager="local:local"/>
> > <profile namespace="globus" key="workerManager">passive</profile>
> >
> > With each of the 4 worker nodes having its own entry? Do you happen
> > to know the names of the workers for Gadzooks?
> >
> > Thanks,
> > David
> >
> > On Tue, Jan 11, 2011 at 7:08 PM, Michael Wilde <wilde at mcs.anl.gov> wrote:
> > > was: Re: [Swift-devel] Re: [alcf-support #60887] Can Cobalt command-line bug on Eureka be fixed?
> > >
> > > David, the evolving Swift R package has a start-swift command in
> > > this directory:
> > >
> > >  https://svn.ci.uchicago.edu/svn/vdl2/SwiftApps/SwiftR/Swift/exec
> > >
> > > which has the logic needed to start a manual persistent passive
> > > coaster pool on both clusters and workstations.
> > >
> > > You'll need to pick up the files that start-swift sources from that
> > > same directory, and remove the final stage of the script where it
> > > actually launches Swift (that part is just for the Swift R service).
> > >
> > > You'll want to keep the part where it launches the Swift script
> > > "passivate.swift" to force the persistent service into passive
> > > mode.
> > >
> > > I think that with some cleanup and much testing, this script could
> > > be adapted to launch all kinds of manual coaster configurations.
> > >
> > > Justin has expressed the view that perhaps this whole process cannot
> > > be scripted cleanly, and that we instead should provide tools
> > > for the user to do this manually.
> > >
> > > I would like to try, though, to see if this script can be made clean
> > > and reliable, and then we could place it in Swift and factor it out
> > > of SwiftR.
> > >
> > > I'm willing to help you get this set up and tested.
> > >
> > > - Mike
> > >
> > > ----- Original Message -----
> > >> One workaround we can try here, which may be more valuable than a temp
> > >> fix, would be to make a more user-ready script to launch manual
> > >> coasters (persistent/passive) on any cluster.
> > >>
> > >> We have several such scripts floating around; probably Sheri could use
> > >> one if it were only slightly polished.
> > >>
> > >> That would be a good project for you, David.
> > >>
> > >> Such a script would be useful on any cluster, and would need only
> > >> slight flexibility to specify the batch jobs for various PBS, SGE,
> > >> Cobalt, and Slurm systems.
> > >>
> > >> It has all the drawbacks of manual coasters (which some folks like)
> > >> and is a usage mode we want to support.
> > >>
> > >> Justin, you noted yesterday that it's hard to make such a script
> > >> general. Maybe if we split the script into 2 variants (one for
> > >> clusters, and one for sets of workstations) that would make the
> > >> resultant scripts more maintainable and testable?
> > >>
> > >> - Mike
> > >>
> > >>
> > >> ----- Original Message -----
> > >> > Thanks, Rich and Andrew, for the very fast responses.
> > >> >
> > >> > We'll try the work-around, then.
> > >> >
> > >> > Regards,
> > >> >
> > >> > - Mike
> > >> >
> > >> >
> > >> > ----- Original Message -----
> > >> > > Michael,
> > >> > >
> > >> > > Unfortunately a fix for this will, at this point in time, take a minimum
> > >> > > of four weeks to deploy to a production resource like Eureka, due to our
> > >> > > testing, upgrade and maintenance procedures.
> > >> > >
> > >> > > As a workaround for this on Eureka, since every job effectively runs in
> > >> > > script mode, you should be able to set environment variables within the
> > >> > > script that you submit to Cobalt.
> > >> > >
> > >> > > We apologize for the inconvenience. Let us know if you have any other
> > >> > > questions.
> > >> > >
> > >> > > --
> > >> > > Paul Rich
> > >> > > ALCF Operations -- AIG
> > >> > > richp at alcf.anl.gov
> > >> > >
> > >> > >
> > >> > > On 1/11/11 4:48 PM, Michael Wilde wrote:
> > >> > > > User info for wilde at mcs.anl.gov
> > >> > > > =================================
> > >> > > > Username: wilde
> > >> > > > Full Name: Michael Wilde
> > >> > > > Projects:
> > >> > > > HTCScienceApps,JGI-Pilot,MTCScienceApps,OOPS,PTMAP,pilot-wilde
> > >> > > >              ('*' denotes INCITE projects)
> > >> > > > =================================
> > >> > > >
> > >> > > >
> > >> > > > Hi ALCF Team,
> > >> > > >
> > >> > > > The following known issue in Cobalt is currently preventing us from
> > >> > > > running Swift on Eureka:
> > >> > > >
> > >> > > >    http://trac.mcs.anl.gov/projects/cobalt/ticket/462
> > >> > > >
> > >> > > > With some additional development effort we can work around this, but
> > >> > > > it would be much cleaner and better if this were fixed in Cobalt,
> > >> > > > instead, as suggested in ticket 462 above.
> > >> > > >
> > >> > > > Is there any chance that can be done in the next few days?
> > >> > > > If not, please let me know, and we will implement the work-around
> > >> > > > instead.
> > >> > > >
> > >> > > > This is holding up work on the DOE ParVis project (Rob Jacob, PI)
> > >> > > > and we've had to move some work we want to run on Eureka to other
> > >> > > > platforms in the meantime.
> > >> > > >
> > >> > > > Thanks very much,
> > >> > > >
> > >> > > > Mike
> > >> > > >
> > >> > > > 462 is:
> > >> > > >
> > >> > > > Ticket #462 (new defect)
> > >> > > > Opened 7 months ago
> > >> > > > Cobalt on clusters ignores job script arguments
> > >> > > >
> > >> > > > Reported by: acherry
> > >> > > > Priority: major
> > >> > > > Component: clients
> > >> > > >
> > >> > > > Description
> > >> > > >
> > >> > > > It appears that cobalt-launcher.py does not support running a job
> > >> > > > script or executable with command arguments, even though qsub will
> > >> > > > accept the arguments, and the man page and help for qsub indicate
> > >> > > > that arguments are accepted.
> > >> > > >
> > >> > > > I'm filing this as a bug rather than a feature request, since the
> > >> > > > behavior isn't consistent with the documentation. But I'd rather the
> > >> > > > fix for this be adding support for args, rather than changing the
> > >> > > > docs to say they aren't accepted. :-)
> > >> > > >
> > >> > > >
> > >> >
> > >> > --
> > >> > Michael Wilde
> > >> > Computation Institute, University of Chicago
> > >> > Mathematics and Computer Science Division
> > >> > Argonne National Laboratory
> > >> >
> > >> > _______________________________________________
> > >> > Swift-devel mailing list
> > >> > Swift-devel at ci.uchicago.edu
> > >> > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
> > >>
> > >
> > >
> 

-- 
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory
