[Swift-user] Job bundles

Ioan Raicu iraicu at cs.uchicago.edu
Tue Nov 6 12:36:48 CST 2007

Here is what I get at the UC/ANL TG site:
qsub -I -l nodes=1:ppn=1:ia32-compute,walltime=0:30:00 -A TG-CCR070008T
qsub -I -l nodes=1:ppn=2:ia32-compute,walltime=0:30:00 -A TG-CCR070008T

iraicu at tg-viz-login2:~> showq -u iraicu

active jobs------------------------
JOBID              USERNAME      STATE PROCS   REMAINING   STARTTIME

1574623              iraicu    Running     2    00:29:55  Tue Nov  6 
1574621              iraicu    Running     2    00:29:21  Tue Nov  6 

2 active jobs             4 of 242 processors in use by local jobs (1.65%)
                         20 of 121 nodes active      (16.53%)

eligible jobs----------------------
JOBID              USERNAME      STATE PROCS     WCLIMIT            

0 eligible jobs  

blocked jobs-----------------------
JOBID              USERNAME      STATE PROCS     WCLIMIT            

0 blocked jobs  

Total jobs:  2

Notice that both jobs have 2 processors allocated!  These same commands 
on TeraPort would have yielded one allocation with 1 processor and 
another with 2 processors.  This is what I meant by "it's a policy 
thing": PBS can be configured to ignore the ppn field.
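To make the "handle this at the worker node scripts" idea concrete, here is a minimal sketch (my own illustration, not actual Swift code; run_packed is an invented name): when the LRM hands you the whole node anyway, a wrapper can pack one task per processor itself instead of relying on the scheduler.

```shell
#!/bin/sh
# Hypothetical wrapper: given an exclusively allocated node, launch one
# copy of a task per processor slot and wait for all of them to finish.
run_packed() {
    nprocs=$1; shift          # number of processor slots on the node
    i=0
    while [ "$i" -lt "$nprocs" ]; do
        "$@" "$i" &           # one backgrounded task per processor
        i=$((i + 1))
    done
    wait                      # block until every packed task completes
}

# e.g. on a 2-processor node, run two single-CPU tasks concurrently:
run_packed 2 echo task
```

This way one qsub allocation is billed once but still keeps every processor busy, which is the behavior the ppn request was meant to buy.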


Ben Clifford wrote:
> That's what the ppn parameter specifies to PBS.
> On Tue, 6 Nov 2007, Ioan Raicu wrote:
>> Right, it's not that PBS doesn't support it; it's more of a policy thing.  On
>> the TeraGrid, my experience has been that when PBS (or whatever LRM is being
>> used) allocates CPUs, it always allocates at the machine level, not at the
>> CPU level.  That means that if you have an 8-processor machine and you get 1
>> processor on that machine, then you get (and are charged for) the whole
>> machine, since you have exclusive rights to it for the duration of your
>> reservation.  I have seen this behave differently in other environments, such
>> as TeraPort, where PBS was allocating at the processor level, not the
>> machine level.  This is why I said that I think Swift would need to somehow
>> handle this in the worker node scripts, rather than relying on the LRM to
>> do it.
>> Ioan
>> Ben Clifford wrote:
>>> On Tue, 6 Nov 2007, Ioan Raicu wrote:
>>>> 2) the LRM allows the partitioning of the SMP machine into smaller
>>>> pieces; for example, with an 8-processor node, if it lets you submit 8
>>>> jobs that each only need 1 processor, and it will launch 8 different
>>>> jobs on the same node, then you are fine... the parallelism will be done
>>>> automatically by the LRM, as long as you ask for only 1 processor at a
>>>> time; on the TG at least, I don't think this is how things work, and
>>>> when you get a node, regardless of how many processors it has, you get
>>>> full access to all processors, not just the ones you asked for.
>>> PBS allows the specification of multiple processes per node, like this
>>> (grabbed from google):
>>>   qsub -l walltime=15:00,nodes=1:ppn=1 script.pbs
>>> It looks like abe runs PBS.
>>> So I think you could specify a globus profile key in the sites.xml, perhaps
>>> something like this:
>>>  <profile namespace="globus" key="ppn">8</profile>
>>> I haven't tried this myself, but I'd be interested to hear your results.
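For context, that profile element would sit inside a pool entry in sites.xml. Below is a rough sketch only: the handle, gridftp URL, jobmanager contact, and work directory are placeholder values I have invented for illustration; the profile line is the only piece taken from this thread, and as noted above its effect on ppn is untested.

```xml
<pool handle="uc-anl-tg">
  <gridftp url="gsiftp://example-gridftp.teragrid.org" />
  <jobmanager universe="vanilla"
              url="example-grid.teragrid.org/jobmanager-pbs" major="2" />
  <profile namespace="globus" key="ppn">8</profile>
  <workdirectory>/home/username/swiftwork</workdirectory>
</pool>
```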

Ioan Raicu
Ph.D. Student
Distributed Systems Laboratory
Computer Science Department
University of Chicago
1100 E. 58th Street, Ryerson Hall
Chicago, IL 60637
Email: iraicu at cs.uchicago.edu
Web:   http://www.cs.uchicago.edu/~iraicu

