[Swift-user] Job bundles

Ben Clifford benc at hawaga.org.uk
Tue Nov 6 12:57:46 CST 2007


Yeah, I see the same thing, though the TG UC docs suggest it should work.

I can't log into abe to see what happens there, but it would be interesting 
to know.

On Tue, 6 Nov 2007, Ioan Raicu wrote:

> Here is what I get at the UC/ANL TG site:
> qsub -I -l nodes=1:ppn=1:ia32-compute,walltime=0:30:00 -A TG-CCR070008T
> qsub -I -l nodes=1:ppn=2:ia32-compute,walltime=0:30:00 -A TG-CCR070008T
> 
> iraicu at tg-viz-login2:~> showq -u iraicu
> 
> active jobs------------------------
> JOBID              USERNAME      STATE PROCS   REMAINING            STARTTIME
> 
> 1574623              iraicu    Running     2    00:29:55  Tue Nov  6 12:34:23
> 1574621              iraicu    Running     2    00:29:21  Tue Nov  6 12:33:49
> 
> 2 active jobs             4 of 242 processors in use by local jobs (1.65%)
>                         20 of 121 nodes active      (16.53%)
> 
> eligible jobs----------------------
> JOBID              USERNAME      STATE PROCS     WCLIMIT            QUEUETIME
> 
> 
> 0 eligible jobs  
> blocked jobs-----------------------
> JOBID              USERNAME      STATE PROCS     WCLIMIT            QUEUETIME
> 
> 
> 0 blocked jobs  
> Total jobs:  2
> 
> Notice that both jobs have 2 processors allocated!  These same commands on
> TeraPort would have yielded one allocation with 1 processor and another with
> 2 processors.  This is what I meant by "it's a policy thing", because PBS
> can be configured to ignore the ppn field.
> 
> Ioan
> 
> Ben Clifford wrote:
> > That's what the ppn parameter specifies to PBS.
> > 
> > On Tue, 6 Nov 2007, Ioan Raicu wrote:
> > 
> > > Right, it's not that PBS doesn't support it; it's more of a policy
> > > thing.  On the TeraGrid, my experience has been that when PBS (or
> > > whatever LRM is being used) allocates CPUs, it always allocates at the
> > > machine level, not at the CPU level.  That means that if you have an
> > > 8-processor machine and you get 1 processor on that machine, then you
> > > get (and are charged for) the whole machine, as you have exclusive
> > > rights to it for the duration of your reservation.  I have seen this
> > > behave differently in other environments, such as TeraPort, where PBS
> > > was allocating at the processor level, and not the machine level.  This
> > > is why I said that I think Swift would need to somehow handle this in
> > > the worker node scripts, and not rely necessarily on the LRM doing it.
> > > 
> > > Ioan
> > > 
> > > Ben Clifford wrote:
> > > > On Tue, 6 Nov 2007, Ioan Raicu wrote:
> > > > 
> > > > > 2) the LRM allows the partitioning of the SMP machine into smaller
> > > > > pieces; for example, with an 8-processor node, if it lets you
> > > > > submit 8 jobs that only need 1 processor, and it will launch 8
> > > > > different jobs on the same node, then you are fine... the
> > > > > parallelism will be done automatically by the LRM, as long as you
> > > > > ask for only 1 process at a time; on the TG at least, I don't think
> > > > > this is how things work, and when you get a node, regardless of how
> > > > > many processors it has, you get full access to all processors, not
> > > > > just the ones you asked for.
> > > > PBS allows the specification of multiple processes per node, like this
> > > > (grabbed from google)
> > > > 
> > > > > qsub -l walltime=15:00,nodes=1:ppn=1 script.pbs
> > > > It looks like abe runs PBS.
> > > > 
> > > > So I think you could specify a globus profile key in the sites.xml,
> > > > perhaps something like this:
> > > > 
> > > >  <profile namespace="globus" key="ppn">8</profile>
> > > > 
> > > > I haven't tried this myself, but I'd be interested to hear your results.
> 
> 
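To make the suggestion concrete, a hypothetical sites.xml pool entry carrying that profile key might look like the sketch below. Only the profile line itself comes from the thread; the pool handle, contact strings, and work directory are made-up placeholders, and the surrounding element names may differ between Swift versions — check the sites.xml you already have for the exact layout.

```xml
<!-- Hypothetical pool entry; only the <profile> line is from the thread. -->
<pool handle="abe">
  <gridftp url="gsiftp://gridftp-abe.ncsa.teragrid.org"/>
  <jobmanager universe="vanilla"
              url="grid-abe.ncsa.teragrid.org/jobmanager-pbs" major="2"/>
  <!-- The suggested key: ask PBS for 8 processors per node. -->
  <profile namespace="globus" key="ppn">8</profile>
  <workdirectory>/scratch/users/someuser/swiftwork</workdirectory>
</pool>
```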

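On the worker-node-script idea: under PBS, $PBS_NODEFILE lists one line per allocated processor slot (a node's name repeats ppn times), so a worker-side script can count those lines and fan out one worker per slot itself, instead of relying on the LRM to pack single-CPU jobs. The sketch below illustrates that approach only — it is not Swift's actual worker script, and launch_workers and the echo placeholder are made-up names.

```shell
#!/bin/sh
# Sketch: start one worker per processor slot listed in a PBS-style
# nodefile.  In a real job you would pass "$PBS_NODEFILE" and replace
# the echo with the actual worker command run in the background.
launch_workers() {
    nodefile=$1
    nprocs=$(wc -l < "$nodefile")
    i=0
    while [ "$i" -lt "$nprocs" ]; do
        # Placeholder for launching a real per-CPU worker process.
        echo "starting worker $i"
        i=$((i + 1))
    done
}
```

A real version would background each worker and `wait` at the end so the PBS job stays alive until all of them finish.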

