[Swift-user] Coasters and PBS resource requests: nodes and ppn
Matthew Woitaszek
matthew.woitaszek at gmail.com
Mon Nov 8 11:34:03 CST 2010
Mihael,
I confirm that the "ppn" attribute now gets passed through to PBS, which can
be used to force Torque-based clusters using the local:pbs provider to
allocate the entire node. I tested with 1 and 2 nodes.
This is exactly what I was hoping for -- thank you very much.
* * *
One observational note:
At least on my Torque-scheduled cluster, using
-l nodes=1:ppn=8
puts 8 copies of the hostname in the PBS_NODEFILE.
Since the Coasters multi-node PBS script does a simple cat/loop/ssh over
PBS_NODEFILE, when using PPN > 1, ppn copies of the Perl script get run on
each node. Thus, it's important that workersPerNode be set to 1.
<profile namespace="globus" key="workersPerNode">1</profile>
This works fine for me. I'll defer to the broader discussion of nodes,
workers per node, and the variables that make things work, as to whether
something like NODES=`cat $PBS_NODEFILE | sort | uniq` would be preferred,
so that just one script runs per node and worker count control returns to
workersPerNode:
# could be edited to enforce only one entry per physical node
NODES=`cat $PBS_NODEFILE`
...
for NODE in $NODES; do
...
ssh $NODE /bin/bash -c ...
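As a rough sketch of the one-entry-per-node idea, assuming a plain sort -u over the node file is acceptable, something like the following could work. The hostnames and the DEMO_NODEFILE variable are made up for illustration; in a real job the file would be $PBS_NODEFILE, and the echo stands in for the ssh call in the Coasters script.

```shell
#!/bin/bash
# Hypothetical stand-in for the file that $PBS_NODEFILE points to when
# requesting -l nodes=2:ppn=4; these hostnames are invented for the demo.
DEMO_NODEFILE=$(mktemp)
printf '%s\n' node01 node01 node01 node01 \
              node02 node02 node02 node02 > "$DEMO_NODEFILE"

# Collapse the per-core entries to one entry per physical node.
NODES=$(sort -u "$DEMO_NODEFILE")

for NODE in $NODES; do
    # The real script would ssh to $NODE here; echo keeps the sketch runnable.
    echo "would start one worker script on $NODE"
done

rm -f "$DEMO_NODEFILE"
```

With the list deduplicated this way, each node would be visited once regardless of ppn, and workersPerNode could again decide how many workers start on each node.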
Matthew
On Sun, Nov 7, 2010 at 12:57 PM, Mihael Hategan <hategan at mcs.anl.gov> wrote:
>
> Attributes are not directly copied since there is no one-to-one mapping
> between jobs and coaster blocks. So theoretically some "merge" operation
> needs to exist.
>
> I added "ppn" as one of the attributes that is copied from the first
> job, so the scenario I mentioned should now work.
>
> This is cog r2927/trunk.
>
> Mihael
>
>