[Swift-user] Job bundles

Ioan Raicu iraicu at cs.uchicago.edu
Tue Nov 6 13:10:16 CST 2007


If the docs say that PBS should support this option, maybe write to help at tg
and ask them why it doesn't behave as documented.

Ioan

Ben Clifford wrote:
> Yeah, I see the same, though the TG UC docs suggest it should work.
>
> I can't log into abe to see what happens there but it would be interesting 
> to know.
>
> On Tue, 6 Nov 2007, Ioan Raicu wrote:
>
>   
>> Here is what I get at the UC/ANL TG site:
>> qsub -I -l nodes=1:ppn=1:ia32-compute,walltime=0:30:00 -A TG-CCR070008T
>> qsub -I -l nodes=1:ppn=2:ia32-compute,walltime=0:30:00 -A TG-CCR070008T
>>
>> iraicu at tg-viz-login2:~> showq -u iraicu
>>
>> active jobs------------------------
>> JOBID              USERNAME      STATE PROCS   REMAINING            STARTTIME
>>
>> 1574623              iraicu    Running     2    00:29:55  Tue Nov  6 12:34:23
>> 1574621              iraicu    Running     2    00:29:21  Tue Nov  6 12:33:49
>>
>> 2 active jobs             4 of 242 processors in use by local jobs (1.65%)
>>                         20 of 121 nodes active      (16.53%)
>>
>> eligible jobs----------------------
>> JOBID              USERNAME      STATE PROCS     WCLIMIT            QUEUETIME
>>
>>
>> 0 eligible jobs  
>> blocked jobs-----------------------
>> JOBID              USERNAME      STATE PROCS     WCLIMIT            QUEUETIME
>>
>>
>> 0 blocked jobs  
>> Total jobs:  2
>>
>> Notice that both jobs have 2 processors allocated!  These same commands on
>> TeraPort would have yielded one allocation with 1 processor and another with 2
>> processors.  This is what I meant by "it's a policy thing": PBS can be
>> configured to ignore the ppn field.
>>
>> Ioan
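
To repeat this check on another site, the comparison between the requested
ppn and the PROCS column can be scripted. This is only a minimal Python
sketch, assuming the showq layout pasted above. The function names are
hypothetical, and treating a missing ppn as 1 is an assumption about the PBS
default.

```python
import re

# Job rows in showq look like: JOBID USERNAME STATE PROCS <time> ...
# Requiring a time-like fifth field skips summary lines such as
# "2 active jobs   4 of 242 processors in use by local jobs".
JOB_ROW = re.compile(r"^\s*(\d+)\s+(\S+)\s+(\S+)\s+(\d+)\s+\d+(?::\d+)+\s")

SAMPLE = """\
active jobs------------------------
JOBID              USERNAME      STATE PROCS   REMAINING            STARTTIME

1574623              iraicu    Running     2    00:29:55  Tue Nov  6 12:34:23
1574621              iraicu    Running     2    00:29:21  Tue Nov  6 12:33:49

2 active jobs             4 of 242 processors in use by local jobs (1.65%)
                        20 of 121 nodes active      (16.53%)
"""


def parse_showq_procs(showq_text):
    """Return {job_id: allocated_procs} for every job row in showq output."""
    procs = {}
    for line in showq_text.splitlines():
        match = JOB_ROW.match(line)
        if match:
            procs[match.group(1)] = int(match.group(4))
    return procs


def requested_ppn(resource_list):
    """Extract ppn from a qsub -l resource string (assumed default: 1)."""
    match = re.search(r"ppn=(\d+)", resource_list)
    return int(match.group(1)) if match else 1


if __name__ == "__main__":
    asked = requested_ppn("nodes=1:ppn=1:ia32-compute,walltime=0:30:00")
    for job_id, allocated in parse_showq_procs(SAMPLE).items():
        note = "more than requested" if allocated > asked else "as requested"
        print(job_id, "allocated", allocated, "vs ppn", asked, "->", note)
```

If every job reports the node's full processor count no matter what ppn was
requested, the site is most likely allocating whole nodes.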
>>
>> Ben Clifford wrote:
>>     
>>> That's what the ppn parameter specifies to PBS.
>>>
>>> On Tue, 6 Nov 2007, Ioan Raicu wrote:
>>>
>>>> Right, it's not that PBS doesn't support it; it's more of a policy thing.
>>>> On the TeraGrid, my experience has been that when PBS (or whatever LRM is
>>>> being used) allocates CPUs, it always allocates at the machine level, not
>>>> at the CPU level.  That means that if you have an 8-processor machine and
>>>> you get 1 processor on that machine, then you get (and are charged for)
>>>> the whole machine, as you have exclusive rights to it for the duration
>>>> of your reservation.  I have seen this behave differently in other
>>>> environments, such as TeraPort, where PBS was allocating at the processor
>>>> level and not the machine level.  This is why I said that I think Swift
>>>> would need to handle this somehow in the worker node scripts, and not
>>>> necessarily rely on the LRM doing this.
>>>>
>>>> Ioan
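
To make the "handle it in the worker node scripts" idea concrete, here is a
rough Python sketch of a bundle runner that keeps every processor of an
exclusively allocated node busy. It is not how Swift does it; run_bundle and
its slots parameter are hypothetical names, and it assumes each task is an
independent command given as an argument list.

```python
import os
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_bundle(commands, slots=None):
    """Run each argv list as a subprocess, at most `slots` at a time.

    Returns the exit codes in the same order as `commands`.
    """
    # Default to one slot per processor visible on this node.
    slots = slots or os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=slots) as pool:
        # Each thread just waits on its child process, so threads are enough.
        return list(pool.map(lambda argv: subprocess.run(argv).returncode,
                             commands))


if __name__ == "__main__":
    # Eight trivial tasks standing in for eight single-processor jobs.
    tasks = [[sys.executable, "-c", "pass"] for _ in range(8)]
    print(run_bundle(tasks))
```

With something like this, a single 1-node request can carry a bundle of
single-processor tasks, so the outcome no longer depends on whether the LRM
honors ppn.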
>>>>
>>>> Ben Clifford wrote:
>>>>     
>>>>         
>>>>> On Tue, 6 Nov 2007, Ioan Raicu wrote:
>>>>>
>>>>>> 2) the LRM allows the partitioning of the SMP machine into smaller
>>>>>> pieces; for example, with an 8-processor node, if it lets you submit 8
>>>>>> jobs that each need only 1 processor, and it will launch 8 different
>>>>>> jobs on the same node, then you are fine... the parallelism will be
>>>>>> done automatically by the LRM, as long as you ask for only 1 process
>>>>>> at a time; on the TG at least, I don't think this is how things work,
>>>>>> and when you get a node, regardless of how many processors it has, you
>>>>>> get full access to all processors, not just the ones you asked for.
>>>>>
>>>>> PBS allows the specification of multiple processes per node, like this
>>>>> (grabbed from google)
>>>>>
>>>>>         
>>>>>           
>>>>>> qsub -l walltime=15:00,nodes=1:ppn=1 script.pbs
>>>>>>             
>>>>>>             
>>>>> It looks like abe runs PBS.
>>>>>
>>>>> So I think you could specify a globus profile key in the sites.xml,
>>>>> perhaps something like this:
>>>>>
>>>>>  <profile namespace="globus" key="ppn">8</profile>
>>>>>
>>>>> I haven't tried this myself, but I'd be interested to hear your results.

-- 
============================================
Ioan Raicu
Ph.D. Student
============================================
Distributed Systems Laboratory
Computer Science Department
University of Chicago
1100 E. 58th Street, Ryerson Hall
Chicago, IL 60637
============================================
Email: iraicu at cs.uchicago.edu
Web:   http://www.cs.uchicago.edu/~iraicu
       http://dsl.cs.uchicago.edu/
============================================
============================================
