[Swift-devel] Re: alcfbgpnat and BG/P compute-node-to-login-host connectivity

Mihael Hategan hategan at mcs.anl.gov
Wed Dec 1 13:34:24 CST 2010


This code snippet may be of relevance:
if (settings.getAlcfbgpnat()) {
	spec.addEnvironmentVariable("ZOID_ENABLE_NAT", "true");
}

So if you start the workers yourself and want NAT, set that environment
variable (ZOID_ENABLE_NAT=true) for the job.
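
To illustrate the effect of that conditional, here is a minimal,
self-contained sketch. It is an assumption-laden stand-in, not the actual
CoG/coaster code: the JobSpec class, the buildSpec method and the
NatEnvSketch name are hypothetical, and only the variable name and value
are taken from the snippet above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class NatEnvSketch {
    /** Hypothetical stand-in for a job specification; not the real CoG class. */
    static class JobSpec {
        private final Map<String, String> env = new LinkedHashMap<>();

        void addEnvironmentVariable(String name, String value) {
            env.put(name, value);
        }

        Map<String, String> getEnvironment() {
            return env;
        }
    }

    /** Builds a job spec, adding ZOID_ENABLE_NAT only when NAT is requested. */
    static JobSpec buildSpec(boolean alcfbgpnat) {
        JobSpec spec = new JobSpec();
        if (alcfbgpnat) {
            // Same variable name and value as in the snippet above.
            spec.addEnvironmentVariable("ZOID_ENABLE_NAT", "true");
        }
        return spec;
    }

    public static void main(String[] args) {
        // With the setting on, the variable is part of the job environment.
        System.out.println(buildSpec(true).getEnvironment());
        // With the setting off, nothing is added.
        System.out.println(buildSpec(false).getEnvironment());
    }
}
```

For a job submitted by hand, the equivalent step would be to put the same
variable into the job's environment by whatever means the submit command
offers.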

Mihael

On Wed, 2010-12-01 at 12:18 -0600, Michael Wilde wrote:
> was: Re: [Swift-devel] coaster-service error on Intrepid
> 
> Mihael, how does "alcfbgpnat" work, and what does that imply for running manual persistent coasters on BG/P with the workers launched from a single qsub job?
> 
> I'm probing on Surveyor at the moment, trying to figure out how worker.pl can reach a persistent coaster service on the login node, and I seem unable to ping login6 from a compute node.
> 
> Does the worker.pl script (or coaster service) do something special when alcfbgpnat is set to enable connectivity?
> 
> - Mike
> 
> 
> ----- Original Message -----
> > Justin, I was experimenting on PADS with the persistent coaster
> > service; that's where I tested Mihael's fix, which enabled the service
> > to be used repeatedly and to remain up for extended periods of time.
> > 
> > I just started yesterday trying to move that to the BG/P - I think for
> > the same reason as you.
> > 
> > My script is in /home/wilde/swift/lab/pecos/start-coasters on
> > Surveyor.
> > 
> > I'll stop by to see if we can get this working, as it will help us
> > both on the CDM runs.
> > 
> > One thing to note: I run one artificial job to put the service into
> > passive mode, which seems necessary to enable externally started
> > workers to connect to it. Ideally we'll soon just make this a command
> > line flag to the service.
> > 
> > - Mike
> > 
> > 
> > ----- Original Message -----
> > > Hello all
> > > 	I'm getting started with the coaster-service on Intrepid. I start
> > > up the service and the first run completes. The second fails with the
> > > trace below. sites.xml is also included below. I'm looking into this,
> > > but I thought I should post it...
> > > Justin
> > >
> > > Intrepid: ~> coaster-service -p 2390 -nosec
> > > Started coaster service: http://140.221.82.115:2390
> > > original callback URI is http://10.40.5.144:32907
> > > callback URI has been overridden to http://172.17.5.144:32907
> > > Failed to send remote log message
> > > org.globus.cog.karajan.workflow.service.channels.ChannelException: Channel died and no contact available
> > >     at org.globus.cog.karajan.workflow.service.channels.ChannelManager.connect(ChannelManager.java:235)
> > >     at org.globus.cog.karajan.workflow.service.channels.ChannelManager.reserveChannel(ChannelManager.java:257)
> > >     at org.globus.cog.karajan.workflow.service.channels.ChannelManager.reserveChannel(ChannelManager.java:227)
> > >     at org.globus.cog.abstraction.coaster.rlog.RemoteLogger.log(RemoteLogger.java:31)
> > >     at org.globus.cog.abstraction.coaster.service.job.manager.Block.start(Block.java:87)
> > >     at org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.addBlock(BlockQueueProcessor.java:213)
> > >     at org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.allocateBlocks(BlockQueueProcessor.java:395)
> > >     at org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.updatePlan(BlockQueueProcessor.java:518)
> > >     at org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.run(BlockQueueProcessor.java:100)
> > >
> > > <pool handle="coasters_alcfbgp">
> > > <filesystem provider="local" />
> > > <execution provider="coaster-persistent"
> > > jobmanager="local:cobalt"
> > > url="http://140.221.82.115:2390"
> > > />
> > > <!-- <profile namespace="swift" key="stagingMethod">local</profile> -->
> > > <profile namespace="globus" key="internalHostname">172.17.5.144</profile>
> > > <profile namespace="globus" key="project">HTCScienceApps</profile>
> > > <profile namespace="globus" key="queue">prod-devel</profile>
> > > <profile namespace="globus" key="kernelprofile">zeptoos</profile>
> > > <profile namespace="globus" key="alcfbgpnat">true</profile>
> > > <profile namespace="karajan" key="jobthrottle">21</profile>
> > > <profile namespace="karajan" key="initialScore">10000</profile>
> > > <profile namespace="globus" key="workersPerNode">1</profile>
> > > <profile namespace="globus" key="slots">1</profile>
> > > <profile namespace="globus" key="maxTime">3300</profile>
> > > <profile namespace="globus" key="nodeGranularity">64</profile>
> > > <profile namespace="globus" key="maxNodes">64</profile>
> > > <profile namespace="globus" key="hookClass">org.globus.swift.data.policy.AllocationHook</profile>
> > > <!-- <scratch>/scratch</scratch> -->
> > > <workdirectory>/home/wozniak/work</workdirectory>
> > > </pool>
> > >
> > >
> > > --
> > > Justin M Wozniak
> > > _______________________________________________
> > > Swift-devel mailing list
> > > Swift-devel at ci.uchicago.edu
> > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
> > 
> > --
> > Michael Wilde
> > Computation Institute, University of Chicago
> > Mathematics and Computer Science Division
> > Argonne National Laboratory
> 




