[Swift-devel] auto-coaster bootstrap for stampede cluster
David Kelly
davidk at ci.uchicago.edu
Wed Apr 24 15:51:52 CDT 2013
Ketan,
I have gram working to Stampede now. Given the restrictions about running swift on the head nodes, I think this is the way to go. I'll add this info to the site guide, but for now here is a quick overview of what's needed.
Get a proxy: myproxy-logon -l username -s myproxy.teragrid.org
Make sure you have GLOBUS_HOSTNAME and GLOBUS_TCP_PORT_RANGE defined correctly.
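A minimal sketch of the environment setup for the two steps above. The port range 50000-51000 and the hostname lookup are placeholders, not values from this thread, and `valid_port_range` is a hypothetical helper that only checks the comma-separated MIN,MAX form.

```shell
#!/bin/sh
# Step 1 from the message needs network access and an account, so it is
# left commented out here:
#   myproxy-logon -l username -s myproxy.teragrid.org

# Assumption: this host's name is resolvable and reachable from Stampede.
GLOBUS_HOSTNAME="${GLOBUS_HOSTNAME:-$(hostname -f 2>/dev/null || uname -n)}"
# Assumption: 50000-51000 is a placeholder; use ports your firewall allows inbound.
GLOBUS_TCP_PORT_RANGE="${GLOBUS_TCP_PORT_RANGE:-50000,51000}"
export GLOBUS_HOSTNAME GLOBUS_TCP_PORT_RANGE

# Hypothetical helper: return 0 if $1 looks like "MIN,MAX" with MIN <= MAX.
valid_port_range() {
    case "$1" in
        *[!0-9,]* | *,*,* | ,* | *, | "") return 1 ;;
        *,*) ;;
        *) return 1 ;;
    esac
    min=${1%,*}
    max=${1#*,}
    [ "$min" -le "$max" ]
}

if valid_port_range "$GLOBUS_TCP_PORT_RANGE"; then
    echo "GLOBUS_HOSTNAME=$GLOBUS_HOSTNAME"
    echo "GLOBUS_TCP_PORT_RANGE=$GLOBUS_TCP_PORT_RANGE"
else
    echo "GLOBUS_TCP_PORT_RANGE is not of the form MIN,MAX" >&2
fi
```

Running this before starting Swift catches a malformed port range early, rather than having GRAM callbacks fail later.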
Use something like this for your sites.xml (with the work directory, project, and throttle adjusted as needed):
---
<config>
<pool handle="stampede">
<execution provider="coaster" jobmanager="gt2:gt2:slurm" url="login5.stampede.tacc.utexas.edu:2119/jobmanager-slurm"/>
<filesystem provider="gsiftp" url="gsiftp://gridftp.stampede.tacc.utexas.edu:2811"/>
<profile namespace="globus" key="jobsPerNode">16</profile>
<profile namespace="globus" key="ppn">16</profile>
<profile namespace="globus" key="maxTime">3600</profile>
<profile namespace="globus" key="maxwalltime">00:05:00</profile>
<profile namespace="globus" key="lowOverallocation">100</profile>
<profile namespace="globus" key="highOverallocation">100</profile>
<profile namespace="globus" key="queue">normal</profile>
<profile namespace="globus" key="nodeGranularity">1</profile>
<profile namespace="globus" key="maxNodes">1</profile>
<profile namespace="globus" key="project">TG-EAR130015</profile>
<profile namespace="karajan" key="jobThrottle">.3199</profile>
<profile namespace="karajan" key="initialScore">10000</profile>
<workdirectory>/scratch/01503/davidkel</workdirectory>
</pool>
</config>
---
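To see how the throttle relates to the coaster settings in this sites.xml, here is a rough calculation. It assumes the rule of thumb from the Swift user guide that about jobThrottle * 100 + 1 jobs may be in flight at once; the variable names are only for illustration.

```shell
#!/bin/sh
# Values copied from the sites.xml above.
job_throttle=0.3199
jobs_per_node=16
max_nodes=1

# Assumption: concurrent job limit is int(jobThrottle * 100 + 1).
throttle_limit=$(awk -v t="$job_throttle" 'BEGIN { printf "%d", int(t * 100 + 1) }')
# Worker slots available in a single coaster block of max_nodes nodes.
slots=$((jobs_per_node * max_nodes))

echo "throttle allows up to $throttle_limit concurrent jobs"
echo "one coaster block offers $slots worker slots"
```

If you raise maxNodes or jobsPerNode, rerun the numbers so the throttle does not become the bottleneck.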
You'll also need the latest version of Swift from SVN. Swift was setting some invalid gram RSL attributes that were causing jobs to fail. I added a check to verify only valid attributes get set now. I've tested this with a simple swift script that calls /bin/hostname and it ran across multiple Stampede nodes. I haven't tested it with any larger applications yet - please let me know if you run into any problems with it.
Thanks,
David
----- Original Message -----
From: "Michael Wilde" <wilde at mcs.anl.gov>
To: "Swift Devel" <swift-devel at ci.uchicago.edu>
Sent: Wednesday, April 17, 2013 3:51:31 PM
Subject: [Swift-devel] auto-coaster bootstrap for stampede cluster
I'm moving this topic to swift-devel, so others, in particular Mihael, can weigh in.
- Mike
----- Forwarded Message -----
From: "David Kelly" <davidk at ci.uchicago.edu>
To: "Ketan Maheshwari" <ketan at mcs.anl.gov>
Cc: "Wilde" <wilde at mcs.anl.gov>
Sent: Wednesday, April 17, 2013 3:45:30 PM
Subject: Fwd: auto-coaster bootstrap for stamped
Hey Ketan,
Mike mentioned that you were interested in running remotely to Stampede via ssh-cl. Normally we could use ssh-cl like any other site, but the problem we ran into here is that we can't run Swift on the stampede head node. We need to ssh-cl AND also start swift on a remote worker node, which is a setup that hasn't been tested very much.
I believe you've used start-coaster-service before when we were running on ec2. You can use this configuration for Stampede too. Modify coaster-service.conf to set WORKER_NODE=slurm, WORKER_RELAY_HOST=stampede.tacc.utexas.edu, and it will generate a slurm script, scp it to stampede, and remotely start swift on a worker node. I'll see if I can find an example config file for this.
With automatic coasters it's a bit more complicated and completely untested as far as I know.
You may be able to use gram2. This worked on Ranger, but I haven't tried it yet on Stampede.
Mike mentioned in the email below you may be able to change the ssh-cl provider to add some kind of prefix command (srun).
Maybe you can modify your PATH so the 'ssh' command is actually a wrapper you created and does something sneaky.
You may also be able to add a prefix command to cog/modules/provider-coaster/resources/bootstrap.sh.
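The wrapper idea could look roughly like the sketch below. `ssh_with_prefix` and `show_args` are hypothetical names, and the sketch assumes the remote command arrives as the single final argument of the ssh invocation, which may not match what the ssh-cl provider actually generates. Whether srun can be used this way from a Stampede login node is untested.

```shell
#!/bin/sh
# Hypothetical helper: print each argument on its own line (stands in for ssh
# so the sketch can be run without contacting any host).
show_args() {
    printf '%s\n' "$@"
}

# Hypothetical helper: run REAL_SSH with PREFIX prepended to the last argument.
# usage: ssh_with_prefix PREFIX REAL_SSH [ssh args...] REMOTE_COMMAND
ssh_with_prefix() {
    prefix=$1
    real_ssh=$2
    shift 2
    n=$#
    i=1
    # "$@" is expanded once when the loop starts, so appending rebuilt
    # arguments with set -- inside the loop is safe.
    for arg in "$@"; do
        if [ "$i" -eq "$n" ]; then
            set -- "$@" "$prefix $arg"
        else
            set -- "$@" "$arg"
        fi
        i=$((i + 1))
    done
    # Drop the original arguments, keeping only the rebuilt list.
    shift "$n"
    "$real_ssh" "$@"
}

# Dry run: show what would be passed to ssh (host name is a placeholder).
ssh_with_prefix srun show_args -o BatchMode=yes login.example.org bootstrap.sh
```

For real use, a script named ssh placed earlier in PATH would call something like `ssh_with_prefix srun /usr/bin/ssh "$@"`, where the path to the real ssh binary is an assumption about the local system.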
Hopefully this can help you get started. Let me know if any of this works for you; I'm curious to see how we can get it working well.
David
----- Forwarded Message -----
From: "Michael Wilde" <wilde at mcs.anl.gov>
To: "David Kelly" <davidk at ci.uchicago.edu>
Sent: Tuesday, April 16, 2013 10:59:22 AM
Subject: auto-coaster bootstrap for stamped
was: Re: Another item for the to-do list
David, thanks for the details.
I'm wondering, for systems like stampede, could automatic coasters work to it (e.g. from swift.rcc) by adding a sinteractive or srun command into the middle of the ssh command generated by the ssh-cl parameter?
i.e. instead of doing: ssh -sshargshere auto-bootstrap-coaster-stuff-here.sh
do: ssh -sshargshere srun auto-bootstrap-coaster-stuff-here.sh
?
> This is the only mode that I've been able to test on Stampede so far.
> I will experiment more the others when Stampede is back up.
Others meaning GRAM? Perhaps using myproxy-logon? That *should* work out of the box but we've not tested GRAM in ages so it probably doesn't.
Let's keep this lower on the priority list. I just want to be sure we have a ticket for this. Please create one if not - thanks.
_______________________________________________
Swift-devel mailing list
Swift-devel at ci.uchicago.edu
https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel