[Swift-devel] Problems running Swift on BG/P

Jonathan Monette jonmon at mcs.anl.gov
Tue Feb 28 23:09:28 CST 2012


Is the internalHostname variable being set in the sites file? It should be set to the 172.*.* address reported by ifconfig.
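
If it helps to double-check which address that is, here is a rough Java sketch that lists the local IPv4 addresses (roughly what ifconfig shows) and picks the first one starting with "172.". The class and method names are made up for illustration and are not part of Swift or the coaster code; it also assumes the node has only one 172.* interface.

```java
import java.net.Inet4Address;
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;
import java.util.Optional;

// Hypothetical helper, not part of Swift: finds a candidate internalHostname value.
class InternalHostnameFinder {

    // Returns the first address that starts with "172.", if there is one.
    static Optional<String> pickInternal(List<String> addresses) {
        return addresses.stream()
                .filter(a -> a.startsWith("172."))
                .findFirst();
    }

    // Collects the IPv4 addresses of all local network interfaces.
    static List<String> localIpv4Addresses() throws SocketException {
        List<String> result = new ArrayList<>();
        Enumeration<NetworkInterface> nics = NetworkInterface.getNetworkInterfaces();
        if (nics == null) {
            return result; // no interfaces reported on this machine
        }
        for (NetworkInterface nic : Collections.list(nics)) {
            for (InetAddress addr : Collections.list(nic.getInetAddresses())) {
                if (addr instanceof Inet4Address) {
                    result.add(addr.getHostAddress());
                }
            }
        }
        return result;
    }

    public static void main(String[] args) throws SocketException {
        // Prints the candidate address, or a note if none was found.
        System.out.println(
                pickInternal(localIpv4Addresses()).orElse("no 172.* address found"));
    }
}
```

Run it on the login node where the coaster service starts; whatever it prints is the value I would expect to see for internalHostname in the sites file.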

On Feb 28, 2012, at 11:07 PM, Michael Wilde <wilde at mcs.anl.gov> wrote:

> Emalayan and I spent a considerable amount of time debugging Swift on surveyor tonight.
> 
> As far as I can tell, after fixing a few config problems, it seems like the workers are unable to connect to the coaster service. They seem to be trying to connect on the correct address. The workers start and produce logs, but don't seem to make connections.
> 
> I noticed the following email thread:
>  http://lists.ci.uchicago.edu/pipermail/swift-devel/2010-December/007099.html
> 
> which talks about the sites attribute "alcfbgpnat" and states:
> ---
> This code snippet may be of relevance:
> if (settings.getAlcfbgpnat()) {
>    spec.addEnvironmentVariable("ZOID_ENABLE_NAT", "true");
> }
> 
> So you should set that env variable for the job if you want NAT.
> ---
> 
> Is this being done in the current start-coaster-service job? (Presumably needs to be done in the cobalt job?)
> 
> We also noticed that Emalayan was unable to follow the standard recipe for logging into the compute nodes of a running job. He could get to the IOP, but from there, got something like "no route to host" when he tried to telnet (or ping?) to the compute nodes.
> 
> I'll check on the ZOID_ENABLE_NAT setting, but any thoughts?
> 
> Thanks,
> 
> - Mike
> 
> -- 
> Michael Wilde
> Computation Institute, University of Chicago
> Mathematics and Computer Science Division
> Argonne National Laboratory
> 
> _______________________________________________
> Swift-devel mailing list
> Swift-devel at ci.uchicago.edu
> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel


