[Swift-devel] Re: Persistent coasters

Michael Wilde wilde at mcs.anl.gov
Fri Jan 14 08:25:26 CST 2011


Both methods have their pros and cons, mainly depending on what size of job the scheduler favors under a given load.

At the moment there may be an advantage to running multiple workers on a node instead of one, because of the current problem of gaps in how workers dispatch user apps. But I expect that will be resolved.
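
As a rough sketch of what "multiple workers on a node" could look like in the per-node loop: the function below only prints the ssh commands, it does not run them. The function name, the per-node count, the log directory argument and the bare worker.pl path are all made up for illustration; this is not what start-swift does today.

```shell
#!/bin/sh
# Hypothetical helper (not part of Swift): print one ssh command per worker,
# starting PER_NODE workers on every unique host listed in a node file.
# Args: nodefile, workers per node, coaster service URL, log directory.
print_worker_cmds() {
  nodefile=$1
  per_node=$2
  contact=$3
  logdir=$4
  for h in $(sort -u "$nodefile"); do
    i=1
    while [ "$i" -le "$per_node" ]; do
      # Give each worker a distinct id (host plus index) so ids do not collide.
      # "worker.pl" stands in for the real path, e.g. $SWIFTBIN/worker.pl.
      echo "ssh $h /usr/bin/perl worker.pl $contact SwiftR-$h-$i $logdir"
      i=$((i + 1))
    done
  done
}
```

Inside a real job script you would call it with "$PBS_NODEFILE" and run the commands in the background instead of echoing them.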

There are also schedulers, in particular on TeraGrid, that favor mode 1 below: requesting a set of nodes in a single scheduler job (or in a small number of jobs, limited by policy).
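
To make the difference between the two modes concrete, here is a minimal sketch that prints the qsub commands each mode would issue. It assumes a PBS/Torque setup that accepts the nodes=N:ppn=M resource syntax; the function name and arguments are invented for this example, and nothing is actually submitted.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) the qsub commands for each mode.
#   mode 1: one job requesting N nodes with M cores each
#   mode 2: N separate single-core jobs
# Args: mode, N, M, job script.
qsub_cmds() {
  mode=$1
  n=$2
  m=$3
  script=$4
  case "$mode" in
    1)
      echo "qsub -l nodes=$n:ppn=$m $script"
      ;;
    2)
      i=0
      while [ "$i" -lt "$n" ]; do
        echo "qsub -l nodes=1:ppn=1 $script"
        i=$((i + 1))
      done
      ;;
    *)
      echo "unknown mode: $mode" >&2
      return 1
      ;;
  esac
}
```

Mode 1 puts one large request in the queue; mode 2 puts N small ones, which is where scheduler policy (job-count limits, backfill) starts to matter.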

- Mike


----- Original Message -----
> Since PBS on PADS doesn't care whether we request multiple nodes or
> multiple jobs, I just submit multiple jobs of
> 
> worker.pl <coaster-service-url> ... ...
> 
> 2011/1/13 Michael Wilde <wilde at mcs.anl.gov>:
> > You should configure the tools for this mode of operation on PADS
> > (and any PBS system):
> >
> > - run the commands on a login node (but should work on any PADS node
> > that you are ssh'ed into)
> >
> > - use qsub to obtain nodes
> >  -- mode 1: 1 N-node M-core job
> >  -- mode 2: N 1-core jobs
> >
> > Do mode 1 first:
> >
> > The job script (the script you pass as an argument to qsub) should
> > use a for loop to start one worker.pl on each node of the job. You
> > can adapt the code below from Swift R start-swift:
> >
> > make-pbs-submit-file()
> > {
> >  if [ "$queue" != default ]; then
> >    queueDirective="#PBS -q $queue"
> >  else
> >    queueDirective=""
> >  fi
> > cat >pbs.sub <<END
> > #PBS -S /bin/sh
> > #PBS -N SwiftR-workers
> > #PBS -m n
> > #PBS -l nodes=$nodes
> > #PBS -l walltime=$time
> > #PBS -o $HOME
> > #PBS -e $HOME
> > $queueDirective
> > WORKER_LOGGING_ENABLED=true # FIXME: parameterize; fix w PBS -v
> > #cd / && /usr/bin/perl $SWIFTBIN/worker.pl $CONTACT SwiftR-workers $HOME/.globus/coasters $IDLETIMEOUT
> > HOST=\$(echo $CONTACT | sed -e 's,^http://,,' -e 's/:.*//')
> > PORT=\$(echo $CONTACT | sed -e 's,^.*:,,')
> > echo '***' PBS_NODEFILE file: \$PBS_NODEFILE CONTACT:$CONTACT
> > cat \$PBS_NODEFILE
> > echo '***' unique nodes are:
> > sort < \$PBS_NODEFILE|uniq
> > for h in \$(sort < \$PBS_NODEFILE|uniq); do
> >  ssh \$h "echo Swift R startup running on host; hostname; cd /; \
> >    /usr/bin/perl $SWIFTBIN/worker.pl $CONTACT SwiftR-\$h \
> >    $HOME/.globus/coasters $IDLETIMEOUT" &
> > done
> > wait
> > END
> > }
> >
> > then:
> >
> >  make-${server}-submit-file
> >  qsub pbs.sub >$pbsjobidfile
> >
> >
> > Mike
> >
> > ----- Original Message -----
> >> How should I proceed in testing the persistent coasters scripts on
> >> PADS? Should I use workers-ssh from the login node to PADS? Should I
> >> copy the format of workers-cobalt and modify it to use qsub
> >> parameters that work with PBS?
> >>
> >> David
> 
> --
> Allan M. Espinosa <http://amespinosa.wordpress.com>
> PhD student, Computer Science
> University of Chicago <http://people.cs.uchicago.edu/~aespinosa>

-- 
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory



