[Swift-devel] Problems running Swift on BG/P

Michael Wilde wilde at mcs.anl.gov
Tue Feb 28 23:07:35 CST 2012


Emalayan and I spent a considerable amount of time debugging Swift on Surveyor tonight.

As far as I can tell, after we fixed a few config problems, the workers are unable to connect to the coaster service. They seem to be trying to connect to the correct address. The workers start and produce logs, but don't seem to make connections.

I noticed the following email thread:
  http://lists.ci.uchicago.edu/pipermail/swift-devel/2010-December/007099.html

which talks about the sites attribute "alcfbgpnat" and states:
---
This code snippet may be of relevance:
if (settings.getAlcfbgpnat()) {
	spec.addEnvironmentVariable("ZOID_ENABLE_NAT", "true");
}

So you should set that env variable for the job if you want NAT.
---

Is this being done in the current start-coaster-service job? (Presumably it needs to be done in the Cobalt job?)

We also noticed that Emalayan was unable to follow the standard recipe for logging into the compute nodes of a running job. He could get to the IOP, but from there he got something like "no route to host" when he tried to telnet (or ping?) to the compute nodes.

I'll check on the ZOID_ENABLE_NAT setting, but any thoughts?

Thanks,

- Mike

-- 
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory



