[Swift-devel] jobthrottle value does not correspond to number of parallel jobs on local provider

Michael Wilde wilde at mcs.anl.gov
Tue Oct 23 13:36:42 CDT 2012


Ketan, looking further, I see that your app has a large number of output files, O(100). Depending on their size and the speed of the filesystem on which you are testing, that reinforces my suspicion that the low concurrency you are seeing is due to staging I/O.

If this is a local 32-core host, try running with your input data, output data, and work directory all on a local hard disk (or even /dev/shm if it has sufficient RAM/space). Then try using CDM direct as explained at:

  http://www.ci.uchicago.edu/swift/guides/trunk/userguide/userguide.html#_specific_use_cases

- Mike
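
For reference, the throttle arithmetic from the quoted message below, (jobThrottle * 100) + 1, can be sketched in Python. This is only an illustration: the function names are hypothetical, and truncating a fractional product is an assumption (the thread only confirms that 0.10 gives 11 jobs).

```python
from decimal import Decimal


def expected_parallel_jobs(job_throttle: str) -> int:
    """Parallel jobs for a jobThrottle value: (jobThrottle * 100) + 1.

    The value is taken as a string and parsed with Decimal so that
    inputs such as "0.07" are not distorted by binary floating point.
    Truncation of a fractional product is an assumption of this sketch.
    """
    return int(Decimal(job_throttle) * 100) + 1


def throttle_for_jobs(jobs: int) -> str:
    """Inverse of the formula: the jobThrottle text for `jobs` parallel jobs."""
    if jobs < 1:
        raise ValueError("jobs must be at least 1")
    return f"{(jobs - 1) / 100:.2f}"


if __name__ == "__main__":
    print(expected_parallel_jobs("0.10"))  # 11
    for n in (8, 16, 24, 32):
        print(n, throttle_for_jobs(n))
```

Under this formula, the targets of 8, 16, 24, and 32 jobs correspond to jobThrottle values of 0.07, 0.15, 0.23, and 0.31.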

----- Original Message -----
> From: "Michael Wilde" <wilde at mcs.anl.gov>
> To: "Ketan Maheshwari" <ketancmaheshwari at gmail.com>
> Cc: "Swift Devel" <swift-devel at ci.uchicago.edu>
> Sent: Tuesday, October 23, 2012 12:23:34 PM
> Subject: Re: [Swift-devel] jobthrottle value does not correspond to number of parallel jobs on local provider
> Hi Ketan,
> 
> In the log you attached I see this:
> 
> <profile key="jobThrottle" namespace="karajan">0.10</profile>
> <profile namespace="karajan" key="initialScore">100000</profile>
> 
> You should leave initialScore constant, and set it to a large number,
> no matter what level of manual throttling you want to specify via
> sites.xml. We always use 10000 for this value. Don't attempt to vary
> the initialScore value for manual throttling: just use jobThrottle to
> set what you want.
> 
> A jobThrottle value of 0.10 should run 11 jobs in parallel:
> (jobThrottle * 100) + 1 (the + 1 is there for historical reasons
> related to the automatic throttling algorithm).
> 
> If you are seeing fewer than that, one common cause is that the ratio
> of your input staging times to your job run times is so high that
> Swift cannot keep the expected/desired number of jobs in the active
> state at once.
> 
> I suggest you test the throttle behavior with a simple app script like
> "catsnsleep" (catsn with an artificial sleep to increase job
> duration). If your settings (sites + cf) work for that test, then they
> should work for the real app, within the staging constraints. Using
> CDM "direct" mode is likely what you want here to eliminate
> unnecessary staging on a local cluster.
> 
> In your test, what was this ratio? Can you also post your cf file and
> the progress log from stdout/stderr?
> 
> - Mike
> 
> ----- Original Message -----
> > From: "Ketan Maheshwari" <ketancmaheshwari at gmail.com>
> > To: "Swift Devel" <swift-devel at ci.uchicago.edu>
> > Sent: Tuesday, October 23, 2012 10:34:25 AM
> > Subject: [Swift-devel] jobthrottle value does not correspond to
> > number of parallel jobs on local provider
> > Hi,
> >
> >
> > I am trying to run an experiment on a 32-core machine with the hope of
> > running 8, 16, 24, and 32 jobs in parallel. I am trying to control
> > the number of parallel jobs by setting the Karajan jobThrottle
> > value in sites.xml to 0.07, 0.15, and so on.
> >
> >
> > However, the values do not seem to correspond to what I see in the
> > Swift progress text.
> >
> >
> > Initially, when I set jobThrottle to 0.07, only 2 jobs started in
> > parallel. Then I added a line setting the initialScore value to 10000,
> > which increased the number of parallel jobs to 5. After this, a 10-fold
> > increase in initialScore did not increase the job count.
> >
> >
> > Furthermore, a new batch of 5 jobs gets started only when *all* jobs
> > from the old batch are finished, as opposed to a continuous supply of
> > jobs from the "site selection" to the "stage out" state, which happens
> > with coaster and other providers.
> >
> >
> > The behavior is the same in Swift 0.93.1 and the latest trunk.
> >
> >
> >
> > Thank you for any clues on how to set the expected number of parallel
> > jobs to these values.
> >
> >
> > Please find attached one such log of this run.
> > Thanks, --
> > Ketan
> >
> >
> >
> > _______________________________________________
> > Swift-devel mailing list
> > Swift-devel at ci.uchicago.edu
> > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel
> 
> --
> Michael Wilde
> Computation Institute, University of Chicago
> Mathematics and Computer Science Division
> Argonne National Laboratory
> 

-- 
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory
