[Swift-devel] Concurrent dostagein calls limited to 8 ?

Michael Wilde wilde at mcs.anl.gov
Mon Nov 15 00:02:06 CST 2010


I bumped the thread count up to 32 * cores. It was 4 * cores, so maybe there is some 50% allocation factor going on?

At any rate, if I reduce the number of files I'm processing from 317 to 100, the entire script seems to work fairly reliably. But I can definitely see the ill effects of the large number of threads I'm tying up waiting on I/O.

(For one thing, I can't keep my coaster cores busy, and I get the "Canceling job" message from coaster workers shutting down for lack of work.)
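
A minimal Python sketch (not Swift or Karajan code) of why synchronous calls cap concurrency at the worker-thread count. The pool size of 8 is taken from the limit in the subject line; the function name, the number of calls, and the sleep that stands in for a blocking staging call are all hypothetical.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

POOL_SIZE = 8    # assumed worker-thread count (the limit in the subject line)
NUM_CALLS = 32   # hypothetical number of staging calls queued at once

lock = threading.Lock()
running = 0      # calls currently holding a worker thread
peak = 0         # highest value of `running` observed


def blocking_stage_in(i):
    """Stand-in for a synchronous call that holds its worker thread."""
    global running, peak
    with lock:
        running += 1
        peak = max(peak, running)
    time.sleep(0.05)  # the thread is stuck here, as if waiting on a transfer
    with lock:
        running -= 1
    return i


with ThreadPoolExecutor(max_workers=POOL_SIZE) as pool:
    results = list(pool.map(blocking_stage_in, range(NUM_CALLS)))

print(f"{len(results)} calls finished, peak concurrency = {peak}")
```

However many calls are queued, the peak never exceeds POOL_SIZE, because each call keeps its thread for the whole wait. Raising the pool size lifts the cap but leaves every waiting call holding a thread.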

This will improve a bit when I enhance the interface to globusonline to wait on individual file transfers rather than on the whole allocation request.
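
As a rough illustration of the difference, here is a Python sketch that reacts to each transfer as it completes instead of waiting for the whole batch. It is only an assumption about the general shape of the change: the file names, the scaled-down durations, and the transfer() function are hypothetical and do not reflect the globusonline interface.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

# Hypothetical file names and transfer times in seconds (scaled down).
TRANSFERS = {"c.dat": 0.30, "a.dat": 0.05, "b.dat": 0.15}


def transfer(name, seconds):
    """Stand-in for one file transfer; not the globusonline interface."""
    time.sleep(seconds)
    return name


released = []  # files whose dependent work could start, in completion order
with ThreadPoolExecutor(max_workers=len(TRANSFERS)) as pool:
    futures = [pool.submit(transfer, n, s) for n, s in TRANSFERS.items()]
    # as_completed yields each future as soon as it finishes, so work that
    # needs only a.dat is not held up by the slower c.dat.
    for fut in as_completed(futures):
        released.append(fut.result())

print(released)
```

Waiting on the whole batch would release nothing until the slowest transfer finished; here the shortest transfer is released first.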

- Mike


----- Original Message -----
> That explains a lot - the limited number of Karajan threads probably
> explains why coasters goes haywire in the larger tests as well.
> 
> Clearly this should be done as a full-fledged provider. But that will
> take a fair bit more work.
> 
> Would there be any ill effects from bumping up the number of karajan
> threads to see if I can make this demo work? Where is that set?
> 
> Also, when you say "use the local provider or some other scheme that
> can free the workers while the sub-processes run" - do you have
> anything "quick and easy" in mind there?
> 
> - Mike
> 
> 
> ----- Original Message -----
> > The cdm functions (externalin, externalout, externalgo) are not
> > asynchronous. They block the karajan worker threads and therefore,
> > besides preventing anything else from running in the interpreter,
> > can run concurrently only up to the number of karajan worker
> > threads (2*cores).
> >
> > I would suggest changing those functions to use the local provider
> > or some other scheme that can free the workers while the
> > sub-processes run.
> >
> > Mihael
> >
> > On Sun, 2010-11-14 at 20:56 -0600, Michael Wilde wrote:
> > > I'm in a cab - vdl-int.k is in local fs on:
> > >
> > > Login1.pads.ci
> > > /scratch/local/wilde/swift/src/trunk/...
> > > Running from dist/swft-svn in that tree
> > >
> > > On 11/14/10, Mihael Hategan <hategan at mcs.anl.gov> wrote:
> > > > On Sun, 2010-11-14 at 17:23 -0600, Michael Wilde wrote:
> > > >> Some answers from my handheld:
> > > >> - foreach loop has 317 files so ample parallelism
> > > >
> > > > > I would have assumed it's > 8. But I suspect, given one of the
> > > > > answers below, that it does not matter.
> > > >
> > > >> - throttle in sites entry set to .63 to run 64 jobs at once
> > > > >> - the "active" external.sh is called from end of dostagein and
> > > > >> dostageout in vdl-int.k (after all files for the job were put
> > > > >> in a list by prior calls to external.sh from within those
> > > > >> functions)
> > > >
> > > > > How is this call actually implemented? I.e., can you post the
> > > > > respective snippet of vdl-int?
> > > >
> > > > >> - the actual staging op by globusonline takes 30-60 seconds,
> > > > >> sometimes more. I batch them up.
> > > >
> > > >
> > > >
> > >
> 
> --
> Michael Wilde
> Computation Institute, University of Chicago
> Mathematics and Computer Science Division
> Argonne National Laboratory




