[Swift-devel] auto-coaster bootstrap for stampede cluster

Ketan Maheshwari ketancmaheshwari at gmail.com
Wed Apr 24 17:10:34 CDT 2013


Thanks David. This sounds very useful for Stampede. I will try it for the
remaining VASP runs on Stampede.


On Wed, Apr 24, 2013 at 3:51 PM, David Kelly <davidk at ci.uchicago.edu> wrote:

> Ketan,
>
> I have gram working to Stampede now. Given the restrictions about running
> swift on the head nodes, I think this is the way to go. I'll add this
> info to the site guide, but for now here is a quick overview of what's
> needed.
>
> Get a proxy: myproxy-logon -l username -s myproxy.teragrid.org
>
> Make sure you have GLOBUS_HOSTNAME and GLOBUS_TCP_PORT_RANGE defined
> correctly.
>
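
A minimal sketch of the environment setup described above, assuming a Bourne-style shell on the submit host. The port range 50000,51000 is a made-up example (use whatever range your firewall actually leaves open), and GLOBUS_HOSTNAME should be a name or address that Stampede can connect back to:

```shell
# Proxy step from the message above; it prompts for a password, so it is left commented out here.
# myproxy-logon -l username -s myproxy.teragrid.org

# Assumption: the fully qualified name of this host is reachable from Stampede.
export GLOBUS_HOSTNAME="$(hostname -f 2>/dev/null || hostname)"

# Assumption: example range only; Globus expects "min,max".
export GLOBUS_TCP_PORT_RANGE="50000,51000"

# Print both values so a typo is visible before the run starts.
echo "GLOBUS_HOSTNAME=$GLOBUS_HOSTNAME"
echo "GLOBUS_TCP_PORT_RANGE=$GLOBUS_TCP_PORT_RANGE"
```
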
> Use something like this for your sites.xml (with work directory, project,
> and throttle adjusted as needed)
> ---
> <config>
>   <pool handle="stampede">
>     <execution provider="coaster" jobmanager="gt2:gt2:slurm" url="login5.stampede.tacc.utexas.edu:2119/jobmanager-slurm"/>
>     <filesystem provider="gsiftp" url="gsiftp://gridftp.stampede.tacc.utexas.edu:2811"/>
>     <profile namespace="globus"  key="jobsPerNode">16</profile>
>     <profile namespace="globus"  key="ppn">16</profile>
>     <profile namespace="globus"  key="maxTime">3600</profile>
>     <profile namespace="globus"  key="maxwalltime">00:05:00</profile>
>     <profile namespace="globus"  key="lowOverallocation">100</profile>
>     <profile namespace="globus"  key="highOverallocation">100</profile>
>     <profile namespace="globus"  key="queue">normal</profile>
>     <profile namespace="globus"  key="nodeGranularity">1</profile>
>     <profile namespace="globus"  key="maxNodes">1</profile>
>     <profile namespace="globus"  key="project">TG-EAR130015</profile>
>     <profile namespace="karajan" key="jobThrottle">.3199</profile>
>     <profile namespace="karajan" key="initialScore">10000</profile>
>     <workdirectory>/scratch/01503/davidkel</workdirectory>
>   </pool>
> </config>
> ---
>
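
On adjusting the throttle: as I recall it from the Swift user guide, the number of concurrent jobs allowed on a site is roughly jobThrottle * 100 + 1, which is why values such as .3199 show up. The sketch below assumes that formula and assumes the result is truncated to a whole number; max_jobs is a hypothetical helper name, not part of Swift:

```shell
# Hypothetical helper: concurrent jobs implied by a jobThrottle value,
# assuming the "throttle * 100 + 1" rule and truncation to an integer.
max_jobs() {
    awk -v t="$1" 'BEGIN { printf "%d\n", t * 100 + 1 }'
}

max_jobs 0.3199   # the value used in the sites.xml above
max_jobs 0.1599   # a smaller throttle, e.g. one node's worth of 16 jobs
```

Under those assumptions the first call prints 32 and the second prints 16, so the throttle for N jobs would be about (N - 1) / 100 plus a small fraction.
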
> You'll also need the latest version of Swift from SVN. Swift was setting
> some invalid gram RSL attributes that were causing jobs to fail. I added a
> check to verify only valid attributes get set now. I've tested this with a
> simple swift script that calls /bin/hostname and it ran across multiple
> Stampede nodes. I haven't tested it with any larger applications yet -
> please let me know if you run into any problems with it.
>
> Thanks,
> David
>
> ------------------------------
>
> *From: *"Michael Wilde" <wilde at mcs.anl.gov>
> *To: *"Swift Devel" <swift-devel at ci.uchicago.edu>
> *Sent: *Wednesday, April 17, 2013 3:51:31 PM
> *Subject: *[Swift-devel] auto-coaster bootstrap for stampede cluster
>
>
>
> I'm moving this topic to swift-devel, so others, in particular Mihael, can
> weigh in.
>
> - Mike
>
> ----- Forwarded Message -----
> From: "David Kelly" <davidk at ci.uchicago.edu>
> To: "Ketan Maheshwari" <ketan at mcs.anl.gov>
> Cc: "Wilde" <wilde at mcs.anl.gov>
> Sent: Wednesday, April 17, 2013 3:45:30 PM
> Subject: Fwd: auto-coaster bootstrap for stamped
>
> Hey Ketan,
>
> Mike mentioned that you were interested in running remotely to Stampede
> via ssh-cl. Normally we could use ssh-cl like any other site, but the
> problem we ran into here is that we can't run Swift on the stampede head
> node. We need to ssh-cl AND also start swift on a remote worker node, which
> is a setup that hasn't been tested very much.
>
> I believe you've used start-coaster-service before when we were running on
> ec2. You can use this configuration for Stampede too. Modify
> coaster-service.conf to set WORKER_NODE=slurm, WORKER_RELAY_HOST=
> stampede.tacc.utexas.edu, and it will generate a slurm script, scp it to
> stampede, and remotely start swift on a worker node. I'll see if I can find
> an example config file for this.
>
> With automatic coasters it's a bit more complicated and completely untested
> as far as I know.
>
> You may be able to use gram2. This worked on Ranger, but I haven't tried it
> yet on Stampede.
> Mike mentioned in the email below you may be able to change the ssh-cl
> provider to add some kind of prefix command (srun).
> Maybe you can modify your PATH so the 'ssh' command is actually a wrapper
> you created and does something sneaky.
> You may also be able to add a prefix command to
> cog/modules/provider-coaster/resources/bootstrap.sh.
>
> Hopefully this can help you get started - let me know if any of this works
> for you, curious to see how we can get it working well.
>
> David
>
> ----- Forwarded Message -----
>
>
> From: "Michael Wilde" <wilde at mcs.anl.gov>
> To: "David Kelly" <davidk at ci.uchicago.edu>
> Sent: Tuesday, April 16, 2013 10:59:22 AM
> Subject: auto-coaster bootstrap for stamped
>
>
> was: Re: Another item for the to-do list
>
> David, thanks for the details.
>
> I'm wondering, for systems like stampede, could automatic coasters work to
> it (eg from swift.rcc) by adding a sinteractive or srun command into the
> middle of the ssh command generated by the ssh-cl parameter?
>
> i.e. instead of doing: ssh -sshargshere auto-bootstrap-coaster-stuff-here.sh
> do: ssh -sshargshere srun auto-bootstrap-coaster-stuff-here.sh
>
> ?
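
To make the proposal above concrete, here is a dry-run sketch that only prints the two command lines and runs nothing remotely. The script name bootstrap-placeholder.sh and the helper compose_ssh_cmd are hypothetical, and whether srun can be used this way from a Stampede login node is untested:

```shell
# Placeholders: neither value is a tested configuration.
REMOTE_HOST="stampede.tacc.utexas.edu"
BOOTSTRAP_CMD="bootstrap-placeholder.sh"   # hypothetical script name

# Compose the ssh invocation, with an optional prefix in front of the remote command.
compose_ssh_cmd() {
    prefix="$1"
    if [ -n "$prefix" ]; then
        echo "ssh $REMOTE_HOST $prefix $BOOTSTRAP_CMD"
    else
        echo "ssh $REMOTE_HOST $BOOTSTRAP_CMD"
    fi
}

compose_ssh_cmd ""       # the plain ssh-cl style command
compose_ssh_cmd "srun"   # the same command with srun in the middle
```
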
>
> > This is the only mode that I've been able to test on Stampede so far.
> > I will experiment more with the others when Stampede is back up.
>
> Others meaning GRAM? Perhaps using myproxy-logon? That *should* work out
> of the box but we've not tested GRAM in ages so it probably doesn't.
>
> Let's keep this lower on the prio list. I just want to be sure we have a
> ticket for this. Please create one if not - thanks.
>
> _______________________________________________
> Swift-devel mailing list
> Swift-devel at ci.uchicago.edu
> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel
>
>
>


-- 
Ketan