[Swift-devel] Concurrent dostagein calls limited to 8?

Mihael Hategan hategan at mcs.anl.gov
Mon Nov 15 00:08:09 CST 2010


On Mon, 2010-11-15 at 00:02 -0600, Michael Wilde wrote:
> I bumped up the thread count to 32*cores. It was 4 * # cores, so maybe there is some 50% allocation factor going on?

There shouldn't be. I had 2*cores on my version though.
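
For reference, here is a purely illustrative sketch (plain Java, not the
actual Karajan code; the class name and numbers are made up) of why
synchronous stage-ins cap out at the worker-thread count: with a fixed pool
of 2*cores threads and each call blocking for the length of the transfer, at
most pool-size transfers are in flight, i.e. 8 on a 4-core node.

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Illustrative only -- not the Karajan scheduler. A fixed pool of 2*cores
// threads running tasks that block for the whole transfer means at most
// pool-size stage-ins are in flight; the other 300-odd just queue.
public class BlockingStageinDemo {
    public static void main(String[] args) throws InterruptedException {
        int cores = Runtime.getRuntime().availableProcessors();
        ExecutorService workers = Executors.newFixedThreadPool(2 * cores);

        for (int i = 0; i < 317; i++) {            // 317 files, as in the test
            final int file = i;
            workers.submit(() -> {
                try {
                    Thread.sleep(45_000);          // stand-in for a 30-60 s transfer
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                System.out.println("staged file " + file);
            });
        }
        workers.shutdown();
        workers.awaitTermination(1, TimeUnit.HOURS);
    }
}

Everything submitted past the first 2*cores calls just sits in the queue,
which is the "limited to 8" behavior in the subject.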

> 
> At any rate, if I reduce the number of files I'm processing from 317 to 100, the entire script seems to work fairly reliably. But I can definitely see the ill effects of the large number of threads I'm tying up waiting on I/O.
> 
> (For one thing, I can't keep my coaster cores busy, and I get the "Canceling job" message from coaster workers shutting down for lack of work).
> 
> This will improve a bit when I enhance the interface to globusonline to wait on individual file transfers rather than on the whole allocation request.
> 
> - Mike
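
On the individual-transfer point: a rough sketch of that idea, assuming a
hypothetical submitTransfer() wrapper around globusonline (not a real API,
just to show the shape). Each file's completion fires its own callback, so
dependent work is released file by file instead of only when the whole
allocation request finishes:

import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.concurrent.CompletableFuture;

// Rough sketch only. submitTransfer() is a hypothetical stand-in for
// whatever the globusonline wrapper ends up exposing -- not a real API.
public class PerFileWait {
    static final Random rng = new Random();

    static CompletableFuture<Void> submitTransfer(String file) {
        return CompletableFuture.runAsync(() -> {
            try {
                Thread.sleep(1000 + rng.nextInt(2000)); // files finish at different times
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
    }

    public static void main(String[] args) {
        List<CompletableFuture<Void>> pending = new ArrayList<>();
        for (String f : List.of("a.dat", "b.dat", "c.dat")) {
            // each file's completion triggers its own continuation
            pending.add(submitTransfer(f)
                .thenRun(() -> System.out.println(f + " staged, job can proceed")));
        }
        // only to keep the demo JVM alive; the per-file callbacks already ran
        CompletableFuture.allOf(pending.toArray(new CompletableFuture[0])).join();
    }
}
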
> 
> 
> ----- Original Message -----
> > That explains a lot - the limited number of Karajan threads is probably
> > also why coasters goes haywire in the larger tests.
> > 
> > Clearly this should be done as a full-fledged provider. But that will
> > take a fair bit more work.
> > 
> > Would there be any ill effects from bumping up the number of Karajan
> > threads to see if I can make this demo work? Where is that set?
> > 
> > Also, when you say "use the local provider or some other scheme that
> > can free the workers while the sub-processes run" - do you have anything
> > "quick and easy" in mind there?
> > 
> > - Mike
> > 
> > 
> > ----- Original Message -----
> > > The cdm functions (externalin, externalout, externalgo) are not
> > > asynchronous. They block the karajan worker threads and therefore,
> > > besides preventing anything else from running in the interpreter,
> > > are also limited to a concurrency equal to the number of karajan
> > > worker threads (2*cores).
> > >
> > > I would suggest changing those functions to use the local provider
> > > or some other scheme that can free the workers while the
> > > sub-processes run.
> > >
> > > Mihael
> > >
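
Regarding "free the workers while the sub-processes run" above: a minimal
sketch of that pattern in plain Java (not the local provider code; the
command line below is invented). Start the process, return a future
immediately, and let the process's exit notification complete it, so the
calling worker thread never blocks:

import java.io.IOException;
import java.util.concurrent.CompletableFuture;

// Sketch of the general "don't hold the worker" pattern, not the actual
// local provider implementation. The external.sh arguments are made up.
public class AsyncExternal {
    static CompletableFuture<Integer> runDetached(String... cmd) throws IOException {
        Process p = new ProcessBuilder(cmd).inheritIO().start();
        // completes on a separate thread when the sub-process exits
        return p.onExit().thenApply(Process::exitValue);
    }

    public static void main(String[] args) throws Exception {
        CompletableFuture<Integer> done = runDetached("sh", "external.sh", "stagein");
        System.out.println("worker thread is free while external.sh runs...");
        System.out.println("exit code: " + done.join());   // join only for the demo
    }
}
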
> > > On Sun, 2010-11-14 at 20:56 -0600, Michael Wilde wrote:
> > > > I'm in a cab - vdl-int.k is in the local fs on:
> > > >
> > > > Login1.pads.ci
> > > > /scratch/local/wilde/swift/src/trunk/...
> > > > Running from dist/swft-svn in that tree
> > > >
> > > > On 11/14/10, Mihael Hategan <hategan at mcs.anl.gov> wrote:
> > > > > On Sun, 2010-11-14 at 17:23 -0600, Michael Wilde wrote:
> > > > >> Some answers from my handheld:
> > > > >> - foreach loop has 317 files so ample parallelism
> > > > >
> > > > > I would have assumed it's > 8. But I suspect, given one of the
> > > > > answers below, that it does not matter.
> > > > >
> > > > >> - throttle in sites entry set to .63 to run 64 jobs at once
> > > > >> - the "active" external.sh is called from the end of dostagein and
> > > > >> dostageout in vdl-int.k (after all files for the job were put in a
> > > > >> list by prior calls to external.sh from within those functions)
> > > > >
> > > > > How is this call actually implemented? I.e. can you post the
> > > > > respective snippet of vdl-int?
> > > > >
> > > > >> - the actual staging op by globusonline takes 30-60 seconds,
> > > > >> sometimes more. I batch them up.
> > > > >
> > > > >
> > > > >
> > > >
> > 
> > --
> > Michael Wilde
> > Computation Institute, University of Chicago
> > Mathematics and Computer Science Division
> > Argonne National Laboratory
> > 
> > _______________________________________________
> > Swift-devel mailing list
> > Swift-devel at ci.uchicago.edu
> > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
> 




