[Swift-user] Coasters and PBS resource requests: nodes and ppn

Michael Wilde wilde at mcs.anl.gov
Thu Nov 4 10:06:58 CDT 2010


[long response follows, sorry - I tried to condense but this hits messy issues]

Hi Matthew,

Your question hits issues that we need to resolve and do more testing on.

Most common modes seem to be working, but I have been worried that some bugs remain, and it's possible - but not 100% clear - that we'll need more attribute-setting control.  I think that some node-packing issues Marcin encountered on the Argonne PBS Fusion cluster went unresolved.

Specifically, I've been suspicious that with automated (default) coaster operation, there may be cases with PBS and SGE where we either get too few (1 instead of N) or too many (N^2 instead of N) jobs running per node.

I'll try to cover these by answering your questions, below.

----- Original Message -----
> Good afternoon,
> 
> Is there a way to update PBS resource requests when using coasters to
> supply modified PBS resource strings such as "nodes=1:ppn=8"? (Or
> other arbitrary resource requests, such as node properties?)

Not that I know of.

You can set the number of cores that should be used on each node using the coasters pool attribute "workersPerNode". But see the summary of cases below for some remaining issues.

You can also start coaster workers manually, in which case you can set any scheduler attributes explicitly. We have a growing set of scripts that enable this, but they're not ready for release yet. We hope to integrate this option into the evolving swiftconfig/swiftrun tools that you may have seen discussed on the list; they are in the trunk but not yet documented in the user guide. Let's discuss this possibility in a separate thread if, after reading this, you feel you need it.

> Of course, I'm just trying to get coasters to allocate all of the
> processors on an 8-core node, using either the "gt2:gt2:pbs" or
> "local:pbs" provider. Both submit jobs just fine. I found no
> discernible difference with the "host_types" Globus namespace
> variable, presuming I'm setting it right.

Did you try just setting workersPerNode (in the Globus profile) to 8?  This should be working with coasters on both local:pbs and gt2:gt2:pbs, and I'm pretty sure it is working on TACC Ranger (an SGE machine with N=16). Note that this attribute is in the "Globus" profile set, but that's a misnomer - many attributes in that profile affect coasters and the local providers and are unrelated to Globus operation per se.
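
For example, a sites.xml pool using local:pbs might look roughly like this (an untested sketch - the handle, queue name, and work directory are placeholders you would replace with your site's values):

    <pool handle="mycluster">
      <execution provider="coaster" jobmanager="local:pbs"/>
      <profile namespace="globus" key="workersPerNode">8</profile>
      <profile namespace="globus" key="queue">batch</profile>
      <filesystem provider="local"/>
      <workdirectory>/home/you/swiftwork</workdirectory>
    </pool>

The same workersPerNode profile entry applies when the execution element specifies gt2:gt2:pbs instead.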

> The particular cluster I'm using allows node packing for users that
> run lots of single-processor tasks, so without ppn, it will assume
> nodes=1,ncpus=1 and thus pack 8 jobs on each node before moving on to
> the next node. (I know it won't be an issue at sites that make nodes
> exclusive. On this system, the queue default is "nodes=1:ppn=8", but
> because coasters explicitly specifies the number of nodes in its
> generated resource request, the ppn default seems to get lost!)

You can set "debug=true" in etc/provider-pbs.properties, and then Swift will retain the submit file in $HOME/.globus/scripts, so you can verify the scheduler directives that Swift is setting.
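
Something along these lines (the exact location of the properties file and the names of the generated scripts will differ on your installation):

    # in etc/provider-pbs.properties under your Swift installation:
    debug=true

    # after a run, inspect the retained submit script(s):
    ls -lt $HOME/.globus/scripts/
    grep '#PBS' $HOME/.globus/scripts/<most-recent-script>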

> I see that this has been discussed as far back as 2007, and I found
> Marcin and Mike's previous discussion of the topic at
> 
> http://mail.ci.uchicago.edu/pipermail/swift-user/2010-March/001409.html

Right - that issue is still unresolved. I'll try to push it forward. Can you help us determine the right conventions and then verify that they are working for you?

The issue, I think, is that the user either needs to know whether or not the scheduler does node packing, or needs a way to specify job attributes that makes such knowledge unnecessary.

What I think we need is:

- a set of attributes that forces the scheduler to allocate complete nodes and gives the user control over how many jobs to run per node

- a set of attributes that assumes the scheduler *will* pack nodes, and that does the right thing in that case (see the PBS sketch just below)
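
In PBS terms, the two cases roughly correspond to resource requests like these (illustrative only - directive syntax and defaults vary from site to site):

    # full-node allocation: request all the cores on a node so nothing else
    # is packed onto it, then let coasters run workersPerNode tasks there
    #PBS -l nodes=1:ppn=8

    # node-packing: request a single slot per coaster job and let the
    # scheduler pack several such jobs onto a node; use workersPerNode=1
    #PBS -l nodes=1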

In summary I think the current situation is this:

- when coasters submits a 1-node job:
  -- workersPerNode=1
     o works fine if scheduler packs nodes
     o uses only 1 core if scheduler does not pack nodes
  -- workersPerNode=N
     o runs up to N^2 tasks if scheduler packs nodes
     o works fine if scheduler does not pack nodes
- when coasters submits an N-node job, N>1
  -- workersPerNode=1
     o works fine if scheduler packs nodes
     o uses only 1 core per node if scheduler does not pack nodes
  -- workersPerNode=N
     o runs up to N^2 tasks if scheduler packs nodes
     o works fine if scheduler does not pack nodes

Based on the above cases, it seems that "all is fine" as long as the sites description matches whether or not the scheduler will node-pack, and workersPerNode is set accordingly.

Typically, you want to run either in 1-core packing mode, or N-core full-node-allocation mode.

I *think* that some schedulers (PBS, maybe SGE) may determine packing behavior based on the queue, or in the case of SGE, perhaps on the parallel environment (PE). For now, the user must know how to match the sites.xml spec to the behavior of the target cluster.  We're trying to work out suggested specs for all the clusters in the Argonne/UChicago/TeraGrid environment, and most of Open Science Grid as well.

I am worried that there remains an issue in what we call "multi-node" operation. When a coaster job ("slot") uses more than one node, Swift itself needs to start the coaster agent, worker.pl, on each node in the job. This is done with explicit shell code that Swift places in the submit file.
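
Conceptually, that generated code does something like the following (a simplified sketch, not the actual code Swift emits; it assumes a PBS-style $PBS_NODEFILE and passwordless ssh to the allocated nodes, and SERVICE_URL, BLOCK_ID, and LOG_DIR stand in for values the coaster service supplies):

    # launch one worker.pl per allocated node, pointing back at the coaster service
    for node in $(sort -u "$PBS_NODEFILE"); do
      ssh "$node" perl worker.pl "$SERVICE_URL" "$BLOCK_ID" "$LOG_DIR" &
    done
    wait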

I have argued in the past that we simply need one more attribute (I called it coresPerNode) which tells Swift exactly how many cores per node to request from the scheduler. The two typical values would be 1 (if the user wants to use node packing) and N, the actual number of cores per node, when the user wants to allocate entire nodes.
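
As a sketch of that proposal in sites.xml terms (coresPerNode does not exist today; the name and placement here are hypothetical):

    <!-- hypothetical: ask the scheduler for full 8-core nodes, and run
         8 coaster workers on each one -->
    <profile namespace="globus" key="coresPerNode">8</profile>
    <profile namespace="globus" key="workersPerNode">8</profile>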

I think this may be needed for SGE but possibly not for PBS. I'm pretty sure we have cases in SGE local-scheduler coaster mode where the provider needs this in order to formulate a submit file that SGE will accept.

Mihael did not agree that this was necessary, and we never resolved the issue.

So what we need to discuss, test and resolve is:

- is the Coaster provider correctly handling both "node-packed" and "full-node" mode?

- is it handling these modes correctly with both the local-scheduler parent provider and with GT2?

- with SGE, are we working correctly with all or most processing environments and job-launching programs? How do we know, for a given SGE deployment? Are we starting multi-node jobs correctly on all schedulers in all modes?

- do we have a sufficient way to set scheduler attributes?

- we need to automate the testing of the many mode combinations that are likely to be used.

If you are willing to help define (or even develop) and test improvements, we'd welcome your assistance.

Sorry for the long response. I think with more analysis we can simplify the issue.

Regards,

Mike

> but there didn't seem to be any definitive conclusion. Any suggestions
> would be appreciated!
> 
> Matthew
> 
> 
> 
> 
> _______________________________________________
> Swift-user mailing list
> Swift-user at ci.uchicago.edu
> http://mail.ci.uchicago.edu/mailman/listinfo/swift-user

-- 
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory



