[Swift-user] Job bundles
Ben Clifford
benc at hawaga.org.uk
Tue Nov 6 12:57:46 CST 2007
Yeah, I see the same, though the TG UC docs suggest it should work. I
can't log into abe to see what happens there, but it would be interesting
to know.
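For reference, here is a sketch of what a sites.xml entry with the ppn
profile key might look like; the hostnames, jobmanager URL, project, and
work directory below are placeholders, not a tested configuration:

```xml
<pool handle="uc-anl-tg">
  <!-- hypothetical site entry: host, jobmanager, and paths are placeholders -->
  <gridftp url="gsiftp://gridftp.example.teragrid.org" />
  <jobmanager universe="vanilla"
              url="grid.example.teragrid.org/jobmanager-pbs" major="2" />
  <!-- ask PBS for all 8 processors on each node via the ppn field -->
  <profile namespace="globus" key="ppn">8</profile>
  <!-- project/account to charge, as in the qsub -A flag -->
  <profile namespace="globus" key="project">TG-CCR070008T</profile>
  <workdirectory>/home/username/swiftwork</workdirectory>
</pool>
```

Whether the ppn value is honored would still depend on the site's PBS
policy, as discussed below.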
On Tue, 6 Nov 2007, Ioan Raicu wrote:
> Here is what I get at the UC/ANL TG site:
> qsub -I -l nodes=1:ppn=1:ia32-compute,walltime=0:30:00 -A TG-CCR070008T
> qsub -I -l nodes=1:ppn=2:ia32-compute,walltime=0:30:00 -A TG-CCR070008T
>
> iraicu at tg-viz-login2:~> showq -u iraicu
>
> active jobs------------------------
> JOBID USERNAME STATE PROCS REMAINING STARTTIME
>
> 1574623 iraicu Running 2 00:29:55 Tue Nov 6 12:34:23
> 1574621 iraicu Running 2 00:29:21 Tue Nov 6 12:33:49
>
> 2 active jobs 4 of 242 processors in use by local jobs (1.65%)
> 20 of 121 nodes active (16.53%)
>
> eligible jobs----------------------
> JOBID USERNAME STATE PROCS WCLIMIT QUEUETIME
>
>
> 0 eligible jobs
> blocked jobs-----------------------
> JOBID USERNAME STATE PROCS WCLIMIT QUEUETIME
>
>
> 0 blocked jobs
> Total jobs: 2
>
> Notice that both jobs have 2 processors allocated! The same commands on
> TeraPort would have yielded one allocation with 1 processor and another
> with 2 processors. This is what I meant by "it's a policy thing": PBS can
> be configured to ignore the ppn field.
>
> Ioan
>
> Ben Clifford wrote:
> > That's what the ppn parameter specifies to PBS.
> >
> > On Tue, 6 Nov 2007, Ioan Raicu wrote:
> >
> >
> > > Right, it's not that PBS doesn't support it, it's more of a policy
> > > thing. On the TeraGrid, my experience has been that when PBS (or
> > > whatever LRM is being used) allocates CPUs, it always allocates at the
> > > machine level, not at the CPU level. That means, if you have an
> > > 8-processor machine and you get 1 processor on that machine, then you
> > > get (and are charged for) the whole machine, since you have exclusive
> > > rights to it for the duration of your reservation. I have seen this
> > > behave differently in other environments, such as TeraPort, where PBS
> > > was allocating at the processor level, not the machine level. This is
> > > why I said that I think Swift would need to handle this somehow in the
> > > worker node scripts, and not rely on the LRM doing this.
> > >
> > > Ioan
> > >
> > > Ben Clifford wrote:
> > >
> > > > On Tue, 6 Nov 2007, Ioan Raicu wrote:
> > > >
> > > >
> > > > > 2) the LRM allows the partitioning of the SMP machine into smaller
> > > > > pieces; for example, with an 8-processor node, if it lets you
> > > > > submit 8 jobs that each need only 1 processor, and it will launch 8
> > > > > different jobs on the same node, then you are fine... the
> > > > > parallelism will be done automatically by the LRM, as long as you
> > > > > ask for only 1 processor at a time; on the TG at least, I don't
> > > > > think this is how things work, and when you get a node, regardless
> > > > > of how many processors it has, you get full access to all
> > > > > processors, not just the ones you asked for.
> > > > >
> > > > PBS allows the specification of multiple processors per node, like
> > > > this (grabbed from Google):
> > > >
> > > >
> > > > > qsub -l walltime=15:00,nodes=1:ppn=1 script.pbs
> > > > >
> > > > It looks like abe runs PBS.
> > > >
> > > > So I think you could specify a globus profile key in the sites.xml,
> > > > perhaps something like this:
> > > >
> > > > <profile namespace="globus" key="ppn">8</profile>
> > > >
> > > > I haven't tried this myself, but I'd be interested to hear your results.
> > > >
> > >
> >
> >
>
>