[Swift-devel] jobthrottle value does not correspond to number of parallel jobs on local provider

Ketan Maheshwari ketancmaheshwari at gmail.com
Tue Oct 23 14:02:15 CDT 2012


Mike,

Thank you for your answers.

I tried catsnsleep with n=100 and s=10 and indeed the number of parallel
jobs corresponded to the jobthrottle value.
Surprisingly, when I started the mars application immediately after this,
it also started 32 jobs in parallel. However, the run failed with a "too
many open files" error after a while.
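
For anyone hitting the same error, here is a rough Python sketch for
checking the per-process open-file limit against a back-of-the-envelope
estimate. The figure of 100 files per job comes from Mike's O(100) estimate
quoted further down. That every job keeps all of its files open at the same
time is my assumption, not something confirmed for Swift, and the function
names are made up for illustration.

```python
import resource  # Unix-only standard library module


def open_file_limits():
    """Return the (soft, hard) limits on open file descriptors for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def estimated_open_files(parallel_jobs, files_per_job):
    """Worst-case estimate, ASSUMING every job holds all its files open at once."""
    return parallel_jobs * files_per_job


if __name__ == "__main__":
    soft, hard = open_file_limits()
    need = estimated_open_files(parallel_jobs=32, files_per_job=100)
    print(f"soft limit: {soft}, hard limit: {hard}, rough estimate: {need}")
    if need > soft:
        print("The estimate exceeds the soft limit; raising it may be worth a try.")
```

If the estimate is well above the soft limit, that would at least be
consistent with the failure, though it does not prove the cause.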

I am now trying the CDM direct method and will keep you posted.
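
For reference, the throttle arithmetic from Mike's earlier reply (quoted
below) can be sketched in Python. This is only an illustration: the helper
names are hypothetical, rounding to the nearest integer is an assumption
about how Swift treats fractional results, and the profile lines simply
mirror the two entries seen in my log.

```python
def throttle_for(parallel_jobs):
    """jobThrottle expected to allow `parallel_jobs` concurrent jobs.

    Inverts the rule quoted in the thread: jobs = (jobThrottle * 100) + 1.
    """
    if parallel_jobs < 1:
        raise ValueError("parallel_jobs must be at least 1")
    return round((parallel_jobs - 1) / 100, 2)


def expected_jobs(job_throttle):
    """Parallel jobs expected for a given jobThrottle.

    Rounding to the nearest integer is an ASSUMPTION, not confirmed Swift behavior.
    """
    return round(job_throttle * 100) + 1


def profile_lines(parallel_jobs, initial_score=10000):
    """sites.xml profile entries, keeping initialScore fixed as Mike advises."""
    throttle = throttle_for(parallel_jobs)
    return [
        f'<profile namespace="karajan" key="jobThrottle">{throttle:.2f}</profile>',
        f'<profile namespace="karajan" key="initialScore">{initial_score}</profile>',
    ]


if __name__ == "__main__":
    for n in (8, 16, 24, 32):
        print(n, throttle_for(n), expected_jobs(throttle_for(n)))
```

For my target job counts of 8, 16, 24 and 32 this gives throttle values of
0.07, 0.15, 0.23 and 0.31.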

On Tue, Oct 23, 2012 at 2:36 PM, Michael Wilde <wilde at mcs.anl.gov> wrote:

> Ketan, looking further I see that your app has a large number of output
> files, O(100). Depending on their size, and the speed of the filesystem on
> which you are testing, that reinforces my suspicion that the low
> concurrency you are seeing is due to staging I/O.
>
> If this is a local 32-core host, try running with your input and output
> data and work directory all on a local hard disk (or even /dev/shm if it
> has sufficient RAM/space). Then try using CDM direct as explained at:
>
>
> http://www.ci.uchicago.edu/swift/guides/trunk/userguide/userguide.html#_specific_use_cases
>
> - Mike
>
> ----- Original Message -----
> > From: "Michael Wilde" <wilde at mcs.anl.gov>
> > To: "Ketan Maheshwari" <ketancmaheshwari at gmail.com>
> > Cc: "Swift Devel" <swift-devel at ci.uchicago.edu>
> > Sent: Tuesday, October 23, 2012 12:23:34 PM
> > Subject: Re: [Swift-devel] jobthrottle value does not correspond to
> > number of parallel jobs on local provider
> > Hi Ketan,
> >
> > In the log you attached I see this:
> >
> > <profile key="jobThrottle" namespace="karajan">0.10</profile>
> > <profile namespace="karajan" key="initialScore">100000</profile>
> >
> > You should leave initialScore constant, set to a large number, no
> > matter what level of manual throttling you want to specify via
> > sites.xml. We always use 10000 for this value. Don't attempt to vary
> > the initialScore value for manual throttling: just use jobThrottle to
> > set what you want.
> >
> > A jobThrottle value of 0.10 should run 11 jobs in parallel, i.e.
> > (jobThrottle * 100) + 1 (for historical reasons related to the
> > automatic throttling algorithm).
> >
> > If you are seeing fewer than that, one common cause is that the ratio
> > of your input staging times to your job run times is so high that
> > Swift cannot keep the expected/desired number of jobs in the active
> > state at once.
> >
> > I suggest you test the throttle behavior with a simple app script like
> > "catsnsleep" (catsn with an artificial sleep to increase job
> > duration). If your settings (sites + cf) work for that test, then they
> > should work for the real app, within the staging constraints. Using
> > CDM "direct" mode is likely what you want here to eliminate
> > unnecessary staging on a local cluster.
> >
> > In your test, what was this ratio? Can you also post your cf file and
> > the progress log from stdout/stderr?
> >
> > - Mike
> >
> > ----- Original Message -----
> > > From: "Ketan Maheshwari" <ketancmaheshwari at gmail.com>
> > > To: "Swift Devel" <swift-devel at ci.uchicago.edu>
> > > Sent: Tuesday, October 23, 2012 10:34:25 AM
> > > Subject: [Swift-devel] jobthrottle value does not correspond to
> > > number of parallel jobs on local provider
> > > Hi,
> > >
> > >
> > > I am trying to run an experiment on a 32-core machine with the hope of
> > > running 8, 16, 24 and 32 jobs in parallel. I am trying to control
> > > these numbers of parallel jobs by setting the Karajan jobthrottle
> > > values in sites.xml to 0.07, 0.15, and so on.
> > >
> > >
> > > However, it seems that the values do not correspond to what I see
> > > in the Swift progress text.
> > >
> > >
> > > Initially, when I set jobthrottle to 0.07, only 2 jobs started in
> > > parallel. Then I added the line setting the "initialScore" value to
> > > 10000, which raised the number of parallel jobs to 5. After this, a
> > > 10-fold increase in "initialScore" did not increase the job count.
> > >
> > >
> > > Furthermore, a new batch of 5 jobs gets started only when *all* jobs
> > > from the old batch are over, as opposed to a continuous supply of jobs
> > > from the "site selection" to the "stage out" state, which is what
> > > happens with coaster and other providers.
> > >
> > >
> > > The behavior is the same in Swift 0.93.1 and the latest trunk.
> > >
> > >
> > >
> > > Thank you for any clues on how to set the expected number of parallel
> > > jobs to these values.
> > >
> > >
> > > Please find attached one such log of this run.
> > >
> > > Thanks,
> > > --
> > > Ketan
> > >
> > >
> > >
> > > _______________________________________________
> > > Swift-devel mailing list
> > > Swift-devel at ci.uchicago.edu
> > > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel
> >
> > --
> > Michael Wilde
> > Computation Institute, University of Chicago
> > Mathematics and Computer Science Division
> > Argonne National Laboratory
>


-- 
Ketan