[Swift-devel] Re: Manual start script for persistent coasters on Cobalt and other schedulers

David Kelly dk0966 at cs.ship.edu
Tue Jan 11 20:11:34 CST 2011


Mike,

I will give it a try. Would the configuration for this be similar to
the persistent passive coaster configuration used on the MCS machines?

For example:
    <execution provider="coaster-persistent" url="churn.mcs.anl.gov"
               jobmanager="local:local"/>
    <profile namespace="globus" key="workerManager">passive</profile>

With each of the 4 worker nodes having its own entry? Do you happen
to know the names of the workers for Gadzooks?

Thanks,
David

On Tue, Jan 11, 2011 at 7:08 PM, Michael Wilde <wilde at mcs.anl.gov> wrote:
> was: Re: [Swift-devel] Re:
>  [alcf-support #60887] Can Cobalt command-line bug on Eureka be fixed?
>
> David, the evolving Swift R package has a start-swift command in this directory:
>
>  https://svn.ci.uchicago.edu/svn/vdl2/SwiftApps/SwiftR/Swift/exec
>
> which has the logic needed to start a manual persistent passive coaster pool on both clusters and workstations.
>
> You'll need to pick up the files that start-swift sources from that same directory, and remove the final stage of the script where it actually launches Swift (that part is just for the Swift R service).
>
> You'll want to keep the part where it launches the Swift script "passivate.swift" to force the persistent service into passive mode.
>
> I think that with some cleanup and much testing, this script could be adapted to launch all kinds of manual coaster configurations.
>
> Justin has expressed the view that perhaps this whole process cannot be scripted cleanly, and that we should instead provide tools for the user to do this manually.
>
> I would like to try, though, to see if this script can be made clean and reliable, and then we could place it in Swift and factor it out of SwiftR.
>
> I'm willing to help you get this set up and tested.
>
> - Mike
>
> ----- Original Message -----
>> One workaround we can try here, which may be more valuable than a temp
>> fix, would be to make a more user-ready script to launch manual
>> coasters (persistent/passive) on any cluster.
>>
>> We have several such scripts floating around; probably Sheri could use
>> one if it were only slightly polished.
>>
>> That would be a good project for you, David.
>>
>> Such a script would be useful on any cluster, and would need only
>> slight flexibility to specify the batch jobs for various PBS, SGE,
>> Cobalt, and Slurm systems.
>>
>> It has all the drawbacks of manual coasters (which some folks like)
>> and is a usage mode we want to support.
>>
>> Justin, you noted yesterday that it's hard to make such a script
>> general. Maybe if we split the script into 2 variants (one for
>> clusters, and one for sets of workstations) that would make the
>> resultant scripts more maintainable and testable?
>>
>> - Mike
>>
>>
>> ----- Original Message -----
>> > Thanks, Rich and Andrew, for the very fast responses.
>> >
>> > We'll try the work-around, then.
>> >
>> > Regards,
>> >
>> > - Mike
>> >
>> >
>> > ----- Original Message -----
>> > > Michael,
>> > >
>> > > Unfortunately a fix for this will, at this point in time, take a
>> > > minimum of four weeks to deploy to a production resource like
>> > > Eureka, due to our testing, upgrade and maintenance procedures.
>> > >
>> > > As a workaround for this on Eureka, since every job effectively
>> > > runs in script mode, you should be able to set environment
>> > > variables within the script that you submit to Cobalt.
>> > >
>> > > We apologize for the inconvenience. Let us know if you have any
>> > > other questions.
>> > >
>> > > --
>> > > Paul Rich
>> > > ALCF Operations -- AIG
>> > > richp at alcf.anl.gov
>> > >
>> > >
>> > > On 1/11/11 4:48 PM, Michael Wilde wrote:
>> > > > User info for wilde at mcs.anl.gov
>> > > > =================================
>> > > > Username: wilde
>> > > > Full Name: Michael Wilde
>> > > > Projects:
>> > > > HTCScienceApps,JGI-Pilot,MTCScienceApps,OOPS,PTMAP,pilot-wilde
>> > > >              ('*' denotes INCITE projects)
>> > > > =================================
>> > > >
>> > > >
>> > > > Hi ALCF Team,
>> > > >
>> > > > The following known issue in Cobalt is currently preventing us
>> > > > from running Swift on Eureka:
>> > > >
>> > > >    http://trac.mcs.anl.gov/projects/cobalt/ticket/462
>> > > >
>> > > > With some additional development effort we can work around this,
>> > > > but it would be much cleaner and better if this were fixed in
>> > > > Cobalt, instead, as suggested in ticket 462 above.
>> > > >
>> > > > Is there any chance that can be done in the next few days?
>> > > > If not, please let me know, and we will implement the
>> > > > work-around instead.
>> > > >
>> > > > This is holding up work on the DOE ParVis project (Rob Jacob,
>> > > > PI) and we've had to move some work we want to run on Eureka to
>> > > > other platforms in the meantime.
>> > > >
>> > > > Thanks very much,
>> > > >
>> > > > Mike
>> > > >
>> > > > 462 is:
>> > > >
>> > > > Ticket #462 (new defect)
>> > > > Opened 7 months ago
>> > > > Cobalt on clusters ignores job script arguments
>> > > >
>> > > > Reported by: acherry
>> > > > Priority: major
>> > > > Component: clients
>> > > >
>> > > > Description
>> > > >
>> > > > It appears that cobalt-launcher.py does not support running a
>> > > > job script or executable with command arguments, even though
>> > > > qsub will accept the arguments, and the man page and help for
>> > > > qsub indicate that arguments are accepted.
>> > > >
>> > > > I'm filing this as a bug rather than a feature request, since
>> > > > the behavior isn't consistent with the documentation. But I'd
>> > > > rather the fix for this be adding support for args, rather than
>> > > > changing the docs to say they aren't accepted. :-)
>> > > >
>> > > >
>> >
>> > --
>> > Michael Wilde
>> > Computation Institute, University of Chicago
>> > Mathematics and Computer Science Division
>> > Argonne National Laboratory
>> >
>> > _______________________________________________
>> > Swift-devel mailing list
>> > Swift-devel at ci.uchicago.edu
>> > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
>>
>
>
>
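
The workaround quoted in this thread (setting values inside the script
submitted to Cobalt, because job script arguments are ignored per ticket
462) could be sketched roughly as below. This is only an illustration, not
the start-swift logic: the function name make_cobalt_wrapper, the command
./my-worker, its arguments, and the variable name MY_LOG_LEVEL are all
hypothetical. The idea is to generate a wrapper script with the arguments
and environment baked in, so that nothing has to be passed as an argument
at submission time.

```python
import shlex


def make_cobalt_wrapper(command, args, env=None):
    """Return the text of a shell wrapper script with arguments baked in.

    command, args and env are illustrative inputs, not values taken
    from this thread.
    """
    lines = ["#!/bin/sh"]
    # Export each variable inside the script itself, which is the
    # workaround suggested for Eureka.
    for name, value in sorted((env or {}).items()):
        lines.append("export {}={}".format(name, shlex.quote(value)))
    # Hard-code the arguments on the exec line so the submitted script
    # needs no command-line arguments of its own.
    quoted = " ".join(shlex.quote(part) for part in [command] + list(args))
    lines.append("exec " + quoted)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical worker command, arguments and variable name.
    print(make_cobalt_wrapper(
        "./my-worker",
        ["http://localhost:1984", "block-0"],
        env={"MY_LOG_LEVEL": "DEBUG"},
    ))
```

The generated text would be written to a file, made executable, and
submitted as the job script with no arguments. Values are passed through
shlex.quote so that spaces or shell metacharacters survive intact; the
variable names themselves are not validated in this sketch.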
