[Swift-devel] Re: [Swift-user] swift and fusion

wilde at mcs.anl.gov wilde at mcs.anl.gov
Tue Apr 6 09:29:34 CDT 2010


Thanks, Marcin - that is very helpful. I'm cc'ing to Swift-Devel.

Looks like the regular queue behaves like TeraGrid systems (e.g., Abe and Queenbee) and the shared queue like PADS and TeraPort.

We need to get back to you with more info, but the following sites.xml should (approximately?) work for you on the Fusion regular queue:

 <pool handle="pbs">
    <execution provider="coaster" url="none" jobManager="local:pbs"/>
    <profile namespace="globus" key="maxtime">3600</profile>
    <profile namespace="globus" key="workersPerNode">8</profile>
    <profile namespace="globus" key="slots">8</profile>
    <profile namespace="globus" key="nodeGranularity">1</profile>
    <profile namespace="globus" key="maxNodes">4</profile>
    <profile namespace="karajan" key="jobThrottle">1.27</profile> <!--128 concurrent tasks-->
    <profile namespace="karajan" key="initialScore">10000</profile>
    <profile namespace="globus" key="queue">regular</profile>
    <scratch>/scratch/local/$USER</scratch>
    <filesystem provider="local"/>
    <workdirectory>$HOME/swiftwork</workdirectory>
  </pool>

Notes:
- set maxWallTime in tc.data for your application (see the tc.data sketch below)
- set maxTime above to the largest wall time Swift should request from PBS; it should be at least as large as your biggest maxWallTime
- leave out scratch in your first run; later, set it to a large local-disk directory on the worker nodes. The right path is likely documented on the Fusion doc pages, or you can find it by logging into a node with qsub -I and exploring with df and mount (see the shell sketch below).
- set your workdirectory as you do in your current sites.xml
- you may want or need to adjust the several throttles based on experience and your run profiles; a jobThrottle of t allows roughly 100*t + 1 concurrent tasks, which is where the 1.27 for 128 above comes from. We can discuss more as needed.
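
For reference, a tc.data entry with a per-application wall time might look roughly like the line below; the transformation name and R path are placeholders for illustration, and the 10-minute limit is just an example, not a Fusion-specific value:

  pbs   RunR   /usr/bin/Rscript   INSTALLED   INTEL32::LINUX   GLOBUS::maxwalltime="00:10:00"

The maxwalltime here should stay at or below the maxtime in sites.xml so the task fits inside a coaster worker block.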
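
And a rough sketch of the interactive exploration mentioned above for finding a local scratch directory; the queue name and resource limits are guesses, so adjust them to whatever Fusion actually accepts:

  # ask PBS for a short interactive session on one node
  qsub -I -q shared -l nodes=1,walltime=00:15:00

  # then, on the worker node, look for large local filesystems
  df -h
  mount | grep -i scratch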


Mike


----- "Marcin Hitczenko" <marcin at galton.uchicago.edu> wrote:

> Hi Michael,
> 
> Here is a fragment of an email I was sent from the people running
> fusion. I am not sure if it is sufficient for finding out what you
> want (it kind of sounds like it assigns jobs to hosts, but I am of
> course not sure at all). If this is the case, then I would like to
> learn to use coasters.
> 
> Best,
> 
> Marcin
> 
>   Finally, please note that time on fusion will be charged by
>    core-hour.  Since there are 8 cores on each node in fusion, and
>    since each node is dedicated to a single job (except for the nodes
>    in the shared queue and other special cases), this means a 1 hour
>    job running on 1 node will be charged 8 core-hours.
> 
> 2) Jobs are now prioritized in the job queue based on the priority
>    assigned to the job's project.
> 
> 3) Since jobs on the regular nodes have exclusive access to all the
>    cores (processors) on the nodes, we've configured the resource
>    manager (pbs) to automatically add the "ppn=8" (process(or)s per
>    node) property to the count of nodes given to qsub.  For example,
>    if you submit a job requesting 32 nodes with "qsub -l nodes=32",
>    the resource manager will convert this to "qsub -l nodes=32:ppn=8"
>    to reflect the fact that the job has access to all 256 cores on the
>    allocated nodes.  If you request fewer processors per node (e.g., by
>    using "ppn=4"), the resource manager will change the request to
>    ppn=8.
> 
>    A consequence of having "ppn=8" added to the node count is that the
>    $PBS_NODEFILE created for your job will list each node assigned to
>    your job 8 times.  The default (Hydra) mpiexec used on fusion for
>    running MPI jobs will see this and automatically use all the cores
>    on all the nodes assigned to your job.  Other applications which
>    also use $PBS_NODEFILE should also use all the cores on the nodes.
>    If your job script counts the number of lines in $PBS_NODEFILE to
>    determine the number of nodes, this will actually yield the number
>    of processors; instead, you need to count the number of unique
>    nodes listed in $PBS_NODEFILE in order to determine the number of
>    nodes.  For example, using Bourne-shell syntax, the lines
> 
>      nprocs=`wc -l < $PBS_NODEFILE`
>      nnodes=`sort -u $PBS_NODEFILE | wc -l`
> 
>    will set $nprocs to the number of processors assigned to your job
>    and $nnodes to the number of nodes assigned to your job (for csh,
>    simply add "set " to the start of each of these lines).
> 
>    Jobs submitted to the shared queue won't have the ppn=8 property
>    added, nor will a ppn property submitted with the job be changed.
> 
> > Hi Marcin,
> >
> > We need to do a little research here and then document the findings.
> > The issue is whether Fusion assigns jobs to cores (like TeraPort and
> > PADS) or hosts (like most TeraGrid PBS sites seem to).
> >
> > If Fusion schedules by-core, then PBS will just place your jobs on
> > free cores, which may be on the same host (as the cluster gets
> > filled) or may not be.
> >
> > If it schedules by-host, then you'll need to use coasters to avoid
> > wasting hosts. Mihael: the last I recall, you were uncertain how to
> > determine this. Is it still an open issue? It seems we need to
> > document a query one can pose to PBS to determine how it's
> > configured.
> >
> > For sites that schedule by-host, the only convenient way to use all
> > cores on a host is to use coasters and specify workersPerNode in
> > sites.xml.
> >
> > There is another method that worked OK for uniform-length jobs,
> > which involves job clustering and a tiny mod to the swift clustering
> > script to run all the jobs in a cluster in parallel. That was used
> > effectively on a TeraGrid site, but I'd use it only as a last
> > resort.
> >
> > - Mike
> >
> >
> >
> >
> > ----- "Marcin Hitczenko" <marcin at galton.uchicago.edu> wrote:
> >
> >> Hi,
> >>
> >> I am using swift to submit several R jobs on fusion and am trying
> >> to determine whether or not I am making use of all the available
> >> cores on each node (I believe there are 8 cores on each node). I
> >> submitted 10 really short jobs to try to see if I could determine
> >> what was going on, but I don't really know what to look for.
> >>
> >> In case it is useful, I am attaching the .log file, sites.xml, and
> >> an info file for one of the jobs.
> >>
> >> Thanks for your help,
> >>
> >> Marcin
> >> _______________________________________________
> >> Swift-user mailing list
> >> Swift-user at ci.uchicago.edu
> >> http://mail.ci.uchicago.edu/mailman/listinfo/swift-user
> >
> > --
> > Michael Wilde
> > Computation Institute, University of Chicago
> > Mathematics and Computer Science Division
> > Argonne National Laboratory
> >

-- 
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory



