[Swift-devel] auto-coaster bootstrap for stampede cluster

Michael Wilde wilde at mcs.anl.gov
Thu Apr 25 09:25:04 CDT 2013


David, this sounds great - nice work.

Can you test with multiple, mixed sites, and with both provider staging
and gridftp staging? Try, e.g., Stampede+trestles(+midway+beagle+kraken).

Also gt2:slurm:slurm might work well.

Please add this all to the site guide (ideally with a diagram).

Mihael, how hard would it be to make ssh-cl:slurm:slurm work? I.e.,
start the coaster service on the remote site as a slurm job instead
of on the login host, which is the objective of this configuration.

Very cool.

- Mike

On 4/24/13, David Kelly <davidk at ci.uchicago.edu> wrote:
> Ketan,
>
>
> I have gram working to Stampede now. Given the restrictions about running
> swift on the head nodes, I think this is the way to go. I'll add this info
> to the site guide, but for now here is a quick overview of what's needed.
>
>
> Get a proxy: myproxy-logon -l username -s myproxy.teragrid.org
>
>
> Make sure you have GLOBUS_HOSTNAME and GLOBUS_TCP_PORT_RANGE defined
> correctly.
>
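
The two environment variables above can be sanity-checked before submitting. This is only a rough sketch: the hostname and port range are placeholder values, check_globus_env is a made-up helper name, and the "min,max" format for the port range is assumed from common Globus usage.

```shell
#!/bin/sh
# Placeholder values (assumptions): use your submit host's public name
# and a port range that your firewall actually allows.
export GLOBUS_HOSTNAME="submit.example.org"
export GLOBUS_TCP_PORT_RANGE="50000,51000"

# Hypothetical helper: fail early if either variable is missing or the
# port range is not of the form "min,max".
check_globus_env() {
    for var in GLOBUS_HOSTNAME GLOBUS_TCP_PORT_RANGE; do
        eval "val=\${$var:-}"
        if [ -z "$val" ]; then
            echo "missing: $var" >&2
            return 1
        fi
    done
    case "$GLOBUS_TCP_PORT_RANGE" in
        *[!0-9,]*|*,*,*|,*|*,)
            echo "bad GLOBUS_TCP_PORT_RANGE" >&2; return 1 ;;
        *,*) ;;
        *)  echo "bad GLOBUS_TCP_PORT_RANGE" >&2; return 1 ;;
    esac
    echo "globus environment looks set"
}

check_globus_env
# Then fetch the proxy with the command given above
# ('username' is a placeholder):
# myproxy-logon -l username -s myproxy.teragrid.org
```

The check only validates the shape of the values; whether the ports are actually reachable from the remote site still has to be confirmed separately.
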
>
> Use something like this for your sites.xml (with work directory, project,
> and throttle adjusted as needed):
> ---
>
>
> <config>
> <pool handle="stampede">
> <execution provider="coaster" jobmanager="gt2:gt2:slurm"
> url="login5.stampede.tacc.utexas.edu:2119/jobmanager-slurm"/>
> <filesystem provider="gsiftp"
> url="gsiftp://gridftp.stampede.tacc.utexas.edu:2811"/>
> <profile namespace="globus" key="jobsPerNode">16</profile>
> <profile namespace="globus" key="ppn">16</profile>
> <profile namespace="globus" key="maxTime">3600</profile>
> <profile namespace="globus" key="maxwalltime">00:05:00</profile>
> <profile namespace="globus" key="lowOverallocation">100</profile>
> <profile namespace="globus" key="highOverallocation">100</profile>
> <profile namespace="globus" key="queue">normal</profile>
> <profile namespace="globus" key="nodeGranularity">1</profile>
> <profile namespace="globus" key="maxNodes">1</profile>
> <profile namespace="globus" key="project">TG-EAR130015</profile>
> <profile namespace="karajan" key="jobThrottle">.3199</profile>
> <profile namespace="karajan" key="initialScore">10000</profile>
> <workdirectory>/scratch/01503/davidkel</workdirectory>
> </pool>
> </config>
> ---
>
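
When adjusting the throttle, it can help to work out how many jobs a given value admits. The sketch below assumes the rule of thumb from the Swift documentation that the number of concurrent jobs is jobThrottle * 100 + 1; max_concurrent_jobs is a made-up helper name, and the slot count is simply jobsPerNode times maxNodes from the file above.

```shell
#!/bin/sh
# Hypothetical helper: concurrent jobs admitted by a given jobThrottle,
# assuming jobs = jobThrottle * 100 + 1 (fractional part truncated).
max_concurrent_jobs() {
    awk -v t="$1" 'BEGIN { printf "%d\n", t * 100 + 1 }'
}

# Worker slots offered by the settings above:
# jobsPerNode (16) * maxNodes (1).
slots=$((16 * 1))

echo "throttle admits: $(max_concurrent_jobs .3199) jobs"
echo "coaster slots:   $slots"
```

Under that assumption the throttle of .3199 admits more jobs than the 16 slots on the single node, so maxNodes rather than jobThrottle would be the limiting setting here.
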
>
> You'll also need the latest version of Swift from SVN. Swift was setting
> some invalid gram RSL attributes that were causing jobs to fail. I added a
> check to verify only valid attributes get set now. I've tested this with a
> simple swift script that calls /bin/hostname and it ran across multiple
> Stampede nodes. I haven't tested it with any larger applications yet -
> please let me know if you run into any problems with it.
>
>
> Thanks,
> David
> ----- Original Message -----
>
>
> From: "Michael Wilde" <wilde at mcs.anl.gov>
> To: "Swift Devel" <swift-devel at ci.uchicago.edu>
> Sent: Wednesday, April 17, 2013 3:51:31 PM
> Subject: [Swift-devel] auto-coaster bootstrap for stampede cluster
>
>
> I'm moving this topic to swift-devel, so others, in particular Mihael, can
> weigh in.
>
> - Mike
>
> ----- Forwarded Message -----
> From: "David Kelly" <davidk at ci.uchicago.edu>
> To: "Ketan Maheshwari" <ketan at mcs.anl.gov>
> Cc: "Wilde" <wilde at mcs.anl.gov>
> Sent: Wednesday, April 17, 2013 3:45:30 PM
> Subject: Fwd: auto-coaster bootstrap for stamped
>
> Hey Ketan,
>
> Mike mentioned that you were interested in running remotely to Stampede via
> ssh-cl. Normally we could use ssh-cl like any other site, but the problem we
> ran into here is that we can't run Swift on the stampede head node. We need
> to ssh-cl AND also start swift on a remote worker node, which is a setup
> that hasn't been tested very much.
>
> I believe you've used start-coaster-service before when we were running on
> ec2. You can use this configuration for Stampede too. Modify
> coaster-service.conf to set WORKER_NODE=slurm,
> WORKER_RELAY_HOST=stampede.tacc.utexas.edu, and it will generate a slurm
> script, scp it to stampede, and remotely start swift on a worker node. I'll
> see if I can find an example config file for this.
>
> With automatic coasters it's a bit more complicated and completely untested
> as far as I know.
>
> You may be able to use gram2. This worked on Ranger, but I haven't tried it
> yet on Stampede.
> Mike mentioned in the email below you may be able to change the ssh-cl
> provider to add some kind of prefix command (srun).
> Maybe you can modify your PATH so the 'ssh' command is actually a wrapper
> you created and does something sneaky.
> You may also be able to add a prefix command to
> cog/modules/provider-coaster/resources/bootstrap.sh.
>
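
The wrapper idea could look roughly like the sketch below. It is untested against Stampede and makes several assumptions: the real ssh lives at /usr/bin/ssh, the first argument is the host with no ssh options before it (the real ssh-cl invocation may well differ), and srun with no extra flags is enough to land on a worker node. REAL_SSH, PREFIX_CMD, and wrap_ssh are made-up names.

```shell
#!/bin/sh
# Hypothetical 'ssh' wrapper, to be placed earlier in PATH than the real ssh.
REAL_SSH="${REAL_SSH:-/usr/bin/ssh}"   # assumed location of the real ssh
PREFIX_CMD="${PREFIX_CMD:-srun}"       # command to put in front of the remote command

# Turn:   ssh <host> <remote command...>
# into:   ssh <host> srun <remote command...>
# Simplification: assumes the first argument is the host.
wrap_ssh() {
    host="$1"
    shift
    "$REAL_SSH" "$host" "$PREFIX_CMD" "$@"
}

# Only act when called with arguments, so the file can also be sourced.
if [ "$#" -gt 0 ]; then
    wrap_ssh "$@"
fi
```

Setting REAL_SSH=echo gives a dry run that prints the rewritten command line instead of connecting, which is a cheap way to see what would be sent to the login host.
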
> Hopefully this can help you get started - let me know if any of this works
> for you, curious to see how we can get it working well.
>
> David
>
> ----- Forwarded Message -----
>
>
> From: "Michael Wilde" <wilde at mcs.anl.gov>
> To: "David Kelly" <davidk at ci.uchicago.edu>
> Sent: Tuesday, April 16, 2013 10:59:22 AM
> Subject: auto-coaster bootstrap for stamped
>
>
> was: Re: Another item for the to-do list
>
> David, thanks for the details.
>
> I'm wondering, for systems like stampede, could automatic coasters work there
> (e.g. from swift.rcc) by adding a sinteractive or srun command into the middle
> of the ssh command generated by the ssh-cl parameter?
>
> i.e., instead of doing: ssh -sshargshere auto-bootstrap-coaster-stuff-here.sh
> do: ssh -sshargshere srun auto-bootstrap-coaster-stuff-here.sh
>
> ?
>
>> This is the only mode that I've been able to test on Stampede so far.
>> I will experiment more with the others when Stampede is back up.
>
> Others meaning GRAM? Perhaps using myproxy-logon? That *should* work out of
> the box, but we've not tested GRAM in ages, so it probably doesn't.
>
> Let's keep this lower on the prio list. I just want to be sure we have a
> ticket for this. Please create one if not - thanks.
>



