[Swift-devel] Fwd: Re: globus jobs killed on NCSA's IA64 cluster (mercury)

Mihael Hategan hategan at mcs.anl.gov
Wed Mar 21 15:19:33 CDT 2007


Clustering is done before throttling, so the submit throttle applies to
clusters, not to the individual jobs.
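
A rough sketch of that ordering, in Python rather than Swift's own code. Every name here (Job, cluster_jobs, submission_waves, max_walltime_min) is hypothetical, and the greedy walltime packing is only an assumed clustering rule; the thread does not say how Swift actually decides what to cluster. The point is only that the throttle counts clusters, so with a throttle of 1 the queue can still show 30-minute entries made of 15-minute jobs.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Job:
    name: str
    walltime_min: int


def cluster_jobs(jobs: List[Job], max_walltime_min: int) -> List[List[Job]]:
    """Assumed rule: greedily pack consecutive jobs until the walltime cap is hit."""
    clusters: List[List[Job]] = []
    current: List[Job] = []
    total = 0
    for job in jobs:
        if current and total + job.walltime_min > max_walltime_min:
            clusters.append(current)
            current, total = [], 0
        current.append(job)
        total += job.walltime_min
    if current:
        clusters.append(current)
    return clusters


def submission_waves(clusters: List[List[Job]], submit_throttle: int) -> List[List[List[Job]]]:
    """Simplified model of a concurrency limit: at most submit_throttle
    clusters are being submitted at once. The unit counted is the cluster."""
    return [clusters[i:i + submit_throttle]
            for i in range(0, len(clusters), submit_throttle)]


# Five 15-minute jobs, clustered first, then throttled one submission at a time.
jobs = [Job(f"job{i}", 15) for i in range(5)]
clusters = cluster_jobs(jobs, max_walltime_min=30)
waves = submission_waves(clusters, submit_throttle=1)

# Walltime of each entry as the batch queue would see it.
queue_walltimes = [sum(j.walltime_min for j in c) for c in clusters]
print(queue_walltimes)  # [30, 30, 15]
```

With submit_throttle=1 each wave holds exactly one cluster, so the throttle never splits a cluster back into its individual jobs.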

On Wed, 2007-03-21 at 15:17 -0500, Veronika V. Nefedova wrote:
> I thought clustering happens when the jobs are being submitted at the same
> time (or really close to each other, less than a second apart). Or am I
> mistaken? If I am, when does the clustering happen? Is there a parameter
> that controls it?
> 
> Thanks!
> 
> Nika
> 
> At 03:05 PM 3/21/2007, Mihael Hategan wrote:
> >I'm not sure what clustering has to do with the submit throttle.
> >
> >On Wed, 2007-03-21 at 15:03 -0500, Veronika V. Nefedova wrote:
> > > It looks like setting the submitThrottle to 1 didn't really make any
> > > difference. For example, I have 50 jobs that are 15 minutes each, and I
> > > see that some of them were glued together (shown in the queue as 30- or
> > > 45-minute jobs):
> > >
> > > 912830.tg-master.ncs nefedova dque     STDIN         --      1   1    --  00:30 Q   --
> > > 912832.tg-master.ncs nefedova dque     STDIN         --      1   1    --  00:15 Q   --
> > > 912833.tg-master.ncs nefedova dque     STDIN         --      1   1    --  00:30 Q   --
> > > 912834.tg-master.ncs nefedova dque     STDIN         --      1   1    --  00:15 Q   --
> > > 912835.tg-master.ncs nefedova dque     STDIN         --      1   1    --  00:15 Q   --
> > > 912836.tg-master.ncs nefedova dque     STDIN         --      1   1    --  00:45 Q   --
> > >
> > > Nika
> > >
> > > At 11:00 AM 3/21/2007, Mihael Hategan wrote:
> > > >On Wed, 2007-03-21 at 10:51 -0500, Veronika V. Nefedova wrote:
> > > > > > OK. So if I set this submitThrottle to 1, it will submit jobs one at a
> > > > > > time (and it won't wait for the previous jobs to finish)?
> > > >
> > > >Yes.
> > > >
> > > > > >  What will be the
> > > > > > indication for Swift to go ahead and submit the next jobs (a time delay?)?
> > > >
> > > > >The fact that the submission of the previous job has been completed (i.e.,
> > > > >the job manager has put the job in the queue).
> > > >
> > > > > >  If
> > > > > > that's so, then I think I am OK.
> > > > >
> > > > > Thanks again,
> > > > >
> > > > > Nika
> > > > >
> > > > > At 10:46 AM 3/21/2007, Mihael Hategan wrote:
> > > > > > >I think these should be ok. Unfortunately I can't tell you what a
> > > > > > >solution to "as safe as possible" is because of two things:
> > > > > > >1. The explanation of why your jobs got killed and the solution they
> > > > > > >proposed are ambiguous. They don't explain much. So the proposed
> > > > > > >solution may be insufficient or it may be superfluous.
> > > > > > >2. We don't exactly have submission rate limiters. The closest thing
> > > > > > >is the submission concurrency limiter. Setting this to 1 should work,
> > > > > > >because this will ensure that at most one job manager will do the
> > > > > > >submission dance at a time.
> > > > > >
> > > > > >Mihael
> > > > > >
> > > > > >On Wed, 2007-03-21 at 10:37 -0500, Veronika V. Nefedova wrote:
> > > > > > > > Ok. Hmmm. I am about to submit a large run (50 molecules), which
> > > > > > > > could have as many as 3500 jobs per tier. I would really like to be
> > > > > > > > sure that I do not break TG. I want to play it as safe as possible,
> > > > > > > > so I'd like to make sure that I set all the possible parameters to
> > > > > > > > safeguard the run?
> > > > > > >
> > > > > > > Thanks,
> > > > > > >
> > > > > > > Nika
> > > > > > >
> > > > > > > At 10:31 AM 3/21/2007, Mihael Hategan wrote:
> > > > > > > >On Wed, 2007-03-21 at 10:27 -0500, Veronika V. Nefedova wrote:
> > > > > > > > > Hi, Mihael:
> > > > > > > > >
> > > > > > > > > I have these properties modified in my scheduler.xml file:
> > > > > > > > >
> > > > > > > > >                  <property name="jobThrottle" value="384"/>
> > > > > > > > > <property name="maxSimultaneousJobs" value="384"/>
> > > > > > > > >
> > > > > > > > > Are you suggesting to add also this inside
> > > > <scheduler>...</scheduler> :
> > > > > > > > >
> > > > > > > > > <property name="submitThrottle" value="1"/> ?
> > > > > > > > >
> > > > > > > > > > Do these parameter settings guarantee that:
> > > > > > > > > >
> > > > > > > > > > 1. I have no more than 384 jobs in the queue at any time
> > > > > > > > > > and
> > > > > > > > > > 2. Jobs are submitted to the queue with at least a 1-second delay
> > > > > > > >
> > > > > > > >No. They don't. But they may get you closer to that.
> > > > > > > >
> > > > > > > > >
> > > > > > > > > (these are the requirements from TG NCSA).
> > > > > > > > >
> > > > > > > > > Thanks!
> > > > > > > > >
> > > > > > > > > Nika
> > > > > > > > >
> > > > > > > > > At 09:18 AM 3/17/2007, Mihael Hategan wrote:
> > > > > > > > > > >There is no direct rate limiter, unfortunately. There is a submit
> > > > > > > > > > >throttle, which sets the number of concurrent submissions.
> > > > > > > > > > >Setting that to 1 may work.
> > > > > > > > > >
> > > > > > > > > >On Fri, 2007-03-16 at 17:02 -0500, Veronika V. Nefedova wrote:
> > > > > > > > > > > Hi, Mihael:
> > > > > > > > > > >
> > > > > > > > > > > how do I set this throttling parameter ?
> > > > > > > > > > >
> > > > > > > > > > > Thanks,
> > > > > > > > > > >
> > > > > > > > > > > Nika
> > > > > > > > > > >
> > > > > > > > > > > >Date: Fri, 16 Mar 2007 15:53:57 -0600
> > > > > > > > > > > > >Subject: Re: globus jobs killed on NCSA's IA64 cluster (mercury)
> > > > > > > > > > > >To: nefedova at mcs.anl.gov
> > > > > > > > > > > >From: consult at ncsa.uiuc.edu
> > > > > > > > > > > >
> > > > > > > > > > > >FROM: Arnold, Galen
> > > > > > > > > > > >(Concerning ticket No. 137212)
> > > > > > > > > > > >
> > > > > > > > > > > >Veronika,
> > > > > > > > > > > >
> > > > > > > > > > > > >If you can throttle the job submission so that there's more than
> > > > > > > > > > > > >1 second between them, that would probably help us out.
> > > > > > > > > > > >
> > > > > > > > > > > >-Galen
> > > > > > > > > > > >
> 
> 



