[Swift-user] Coasters and PBS resource requests: nodes and ppn

Matthew Woitaszek matthew.woitaszek at gmail.com
Mon Nov 8 11:34:03 CST 2010


Mihael,

I confirm that the "ppn" attribute now gets passed through to PBS, which can
be used to force Torque-based clusters using the local:pbs provider to
allocate the entire node. I tested with 1 and 2 nodes.

This is exactly what I was hoping for -- thank you very much.

* * *

One observational note:

At least on my Torque-scheduled cluster, using
  -l nodes=1:ppn=8
puts 8 copies of the hostname in the PBS_NODEFILE.

Since the Coasters multi-node PBS script does a simple cat/loop/ssh over
PBS_NODEFILE, when ppn > 1, ppn copies of the Perl script get run on
each node. Thus, it's important that workersPerNode be set to 1; otherwise
each node would presumably end up with ppn times workersPerNode workers.

  <profile namespace="globus" key="workersPerNode">1</profile>

This works fine for me. I'll defer to the broader discussion of nodes,
workers per node, and the variables that make things work, in particular
whether something like NODES=`cat $PBS_NODEFILE | sort | uniq` would be
preferred, so that just one script runs per node and worker-count control
returns to workersPerNode:

   # could be edited to enforce only one entry per physical node
   NODES=`cat $PBS_NODEFILE`
   ...
   for NODE in $NODES; do
      ...
      ssh $NODE /bin/bash -c ...
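
To make the one-entry-per-node idea concrete, here is a rough sketch of the
deduplication step only. It is not the actual Coasters script: the
unique_nodes helper, the node01/node02 hostnames, and the temporary node
file are all made up for illustration, and the ssh call is replaced with an
echo so the snippet can be run anywhere.

```shell
# Hypothetical helper (not part of Coasters): print each hostname in a
# PBS node file exactly once.
unique_nodes() {
    sort -u "$1"
}

# Simulate what Torque writes for -l nodes=2:ppn=4 (hostnames are made up).
PBS_NODEFILE=$(mktemp)
for HOST in node01 node01 node01 node01 node02 node02 node02 node02; do
    echo "$HOST"
done > "$PBS_NODEFILE"

# One entry per physical node instead of ppn entries per node.
NODES=$(unique_nodes "$PBS_NODEFILE")

for NODE in $NODES; do
    # The real script would ssh to $NODE here; echo keeps the sketch harmless.
    echo "would start one worker script on $NODE"
done

rm -f "$PBS_NODEFILE"
```

With the list reduced this way, the loop body runs once per node, so the
number of workers on each node would again be governed by workersPerNode
rather than by ppn.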

Matthew


On Sun, Nov 7, 2010 at 12:57 PM, Mihael Hategan <hategan at mcs.anl.gov> wrote:

>
> Attributes are not directly copied since there is no one-to-one mapping
> between jobs and coaster blocks. So theoretically some "merge" operation
> needs to exist.
>
> I added "ppn" as one of the attributes that is copied from the first
> job, so the scenario I mentioned should now work.
>
> This is cog r2927/trunk.
>
> Mihael
>
>