[Swift-devel] Re: [Swift-user] pbs ppn count and stuff
Michael Wilde
wilde at mcs.anl.gov
Tue Feb 1 15:34:55 CST 2011
Hi Mihael,
This issue is very timely - it came up in our meeting on the 0.92 release.
I don't understand the specifics of much of what you say below: which of the many count parameters you are referring to, how this works with coasters, plain PBS, SGE, and the Condor providers, and how the MPI issues fit in.
I think a good step would be to help us (Sarah, Justin, and me) update the User Guide with all that a user needs to know to get node and processor counts specified correctly for the many different configurations of sites and Swift that are possible.
Some of my initial questions are below. Maybe this would be best discussed in a teleconference, but we can start by trying to clarify the issues using this email thread.
> On Mon, 2011-01-24 at 10:46 -0800, Mihael Hategan wrote:
> > So I think some of the problems with ppn are as follows:
> > 1. count in cog means number of processes. count in PBS means number
> > of nodes.
What is "count in cog"? Presumably a pool attribute? How does it get specified both for coasters and non-coasters? Is this related to the xcount parameter in the GLOBUS profile in the Swift User Guide MPI example: GLOBUS::host_xcount=3 ?
> > 2. when the number of nodes requested was 1 but ppn > 1,
Do you mean the number of nodes that Swift requested in the PBS submit file, as in:
#PBS -l nodes=$nodes:ppn=$cores
> > the multiple job scheme was not enabled so, despite having multiple
> > lines in PBS_NODEFILE, only one process would get started. If count
> > was > 1 then PBS would understand that count*ppn lines should be in
> > PBS_NODEFILE, which would result in that number of processes being
> > started. In other words there was no way to tell PBS to start 4 jobs
> > on only one node.
> > So:
> >
> > - I changed this to be consistent with 1. Count means number of
> > processes to be started. This imposes the restriction that
> > count % ppn = 0. If not, the pbs provider will throw an exception.
Is the number of processes to be started the same as the number of workers in the coaster case?
> > - I also added mppnppn if USE_MPPWIDTH is enabled.
Where and how should USE_MPPWIDTH be specified?
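To check my reading of the count % ppn rule quoted above, here is a minimal sketch in Python of the arithmetic as I understand it. It is not the provider code: the function name pbs_resource_line is hypothetical, and the step nodes = count / ppn is my assumption about how the node count for the "#PBS -l" line is derived from count.

```python
def pbs_resource_line(count, ppn):
    """Build a '#PBS -l' resource line for `count` processes, `ppn` per node.

    Hypothetical helper for illustration only. It assumes that count is the
    total number of processes and that nodes = count / ppn.
    """
    if count <= 0 or ppn <= 0:
        raise ValueError("count and ppn must be positive")
    # The restriction described in the thread: count must be a multiple of ppn.
    if count % ppn != 0:
        raise ValueError(
            "count (%d) is not a multiple of ppn (%d)" % (count, ppn)
        )
    nodes = count // ppn  # assumption: node count is derived this way
    return "#PBS -l nodes=%d:ppn=%d" % (nodes, ppn)


# Four processes on a single node, the case that could not be expressed before.
print(pbs_resource_line(4, 4))  # "#PBS -l nodes=1:ppn=4"
```

If this reading is right, a request such as count=6 with ppn=4 would be rejected rather than rounded up to two nodes.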
> >
> > This is in trunk.
Should it be retrofitted to 0.92?
Does it apply to SGE and the associated "pe" parallel environment issues?
How does it relate to workersPerNode and the various coaster settings that control size of node allocations?
How does it relate to whether or not a site does node-packing, and whether or not a user wants to use node-packing (i.e., single-core jobs in most or all cases)?
I apologize that I can't formulate the question cleanly, but I'm finding the terminology and processor-count model across Swift, cog, coasters, and multiple schedulers with multiple modes to be so complex that it requires a more detailed review of this entire issue, with a Swift end-user focus.
Let's start with a voice call and then bring the issue back to the devel list.
- Mike
> >
> > Mihael
--
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory