[Swift-user] Job bundles
Ioan Raicu
iraicu at cs.uchicago.edu
Tue Nov 6 12:36:48 CST 2007
Here is what I get at the UC/ANL TG site:
qsub -I -l nodes=1:ppn=1:ia32-compute,walltime=0:30:00 -A TG-CCR070008T
qsub -I -l nodes=1:ppn=2:ia32-compute,walltime=0:30:00 -A TG-CCR070008T
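For reference, the same requests can be made non-interactively; a minimal
batch-script sketch (untested here, reusing the account and walltime from the
commands above) would be:

#!/bin/sh
#PBS -l nodes=1:ppn=2,walltime=0:30:00
#PBS -A TG-CCR070008T
# On Torque-style PBS, PBS_NODEFILE lists one line per allocated
# processor slot, so this shows exactly what the LRM gave us.
cat $PBS_NODEFILE

The showq output below shows what the two interactive submissions above were
actually allocated: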
iraicu at tg-viz-login2:~> showq -u iraicu
active jobs------------------------
JOBID     USERNAME    STATE  PROCS   REMAINING            STARTTIME

1574623     iraicu  Running      2    00:29:55  Tue Nov  6 12:34:23
1574621     iraicu  Running      2    00:29:21  Tue Nov  6 12:33:49

2 active jobs        4 of 242 processors in use by local jobs (1.65%)
                     20 of 121 nodes active (16.53%)

eligible jobs----------------------
JOBID     USERNAME    STATE  PROCS     WCLIMIT            QUEUETIME

0 eligible jobs

blocked jobs-----------------------
JOBID     USERNAME    STATE  PROCS     WCLIMIT            QUEUETIME

0 blocked jobs

Total jobs: 2
Notice that both jobs were allocated 2 processors! The same commands on
TeraPort would have yielded one allocation with 1 processor and another with
2 processors. This is what I meant by "it's a policy thing": PBS can be
configured to ignore the ppn field.
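For anyone who wants to try Ben's sites.xml suggestion below, a pool entry
might look roughly like this (an untested sketch: the handle, hostnames, and
work directory are placeholders, the exact pool syntax varies across Swift
versions, and only the profile line comes from Ben's mail):

<pool handle="uc-teragrid">
  <!-- placeholder GRAM endpoint for the PBS jobmanager -->
  <jobmanager universe="vanilla" url="tg-grid.uc.teragrid.org/jobmanager-pbs"/>
  <gridftp url="gsiftp://tg-gridftp.uc.teragrid.org"/>
  <workdirectory>/home/iraicu/swiftwork</workdirectory>
  <!-- ppn=2 would match the second qsub test above -->
  <profile namespace="globus" key="ppn">2</profile>
</pool>

Given the policy above, though, the UC/ANL site may still hand out whole
nodes no matter what ppn is set to.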
Ioan
Ben Clifford wrote:
> That's what the ppn parameter specifies to PBS.
>
> On Tue, 6 Nov 2007, Ioan Raicu wrote:
>
>
>> Right, it's not that PBS doesn't support it; it's more of a policy thing.
>> On the TeraGrid, my experience has been that when PBS (or whatever LRM is
>> being used) allocates CPUs, it always allocates at the machine level, not
>> at the CPU level. That means if you have an 8-processor machine and you
>> get 1 processor on that machine, then you get (and are charged for) the
>> whole machine, since you have exclusive rights to it for the duration of
>> your reservation. I have seen this behave differently in other
>> environments, such as TeraPort, where PBS was allocating at the processor
>> level, not the machine level. This is why I said that I think Swift would
>> need to handle this in the worker node scripts, rather than relying on
>> the LRM to do it.
>> Ioan
>>
>> Ben Clifford wrote:
>>
>>> On Tue, 6 Nov 2007, Ioan Raicu wrote:
>>>
>>>
>>>
>>>> 2) the LRM allows partitioning of the SMP machine into smaller pieces;
>>>> for example, with an 8-processor node, if it lets you submit 8 jobs that
>>>> each need only 1 processor, and it launches all 8 jobs on the same node,
>>>> then you are fine... the parallelism will be handled automatically by
>>>> the LRM, as long as you ask for only 1 processor at a time. On the TG at
>>>> least, I don't think this is how things work: when you get a node,
>>>> regardless of how many processors it has, you get full access to all of
>>>> its processors, not just the ones you asked for.
>>>>
>>> PBS allows the specification of multiple processors per node, like this
>>> (grabbed from Google):
>>>
>>>
>>>
>>>> qsub -l walltime=15:00,nodes=1:ppn=1 script.pbs
>>>>
>>>>
>>> It looks like Abe runs PBS.
>>>
>>> So I think you could specify a globus profile key in sites.xml, perhaps
>>> something like this:
>>>
>>> <profile namespace="globus" key="ppn">8</profile>
>>>
>>> I haven't tried this myself, but I'd be interested to hear your results.
>>>
>>>
>>
>
>
--
============================================
Ioan Raicu
Ph.D. Student
============================================
Distributed Systems Laboratory
Computer Science Department
University of Chicago
1100 E. 58th Street, Ryerson Hall
Chicago, IL 60637
============================================
Email: iraicu at cs.uchicago.edu
Web: http://www.cs.uchicago.edu/~iraicu
http://dsl.cs.uchicago.edu/
============================================