[Swift-devel] Coaster work underway

Michael Wilde wilde at mcs.anl.gov
Thu May 3 10:07:58 CDT 2012


Mihael,

Thanks very much for your help on the Cray benchmarks and coaster speedup. The numbers were very impressive. We'll post the slides on the Swift and Beagle sites soon. A temporary link is:
  http://www.ci.uchicago.edu/~wilde/Swift-CUG.2012.0503.v10.{pdf,pptx}

I think next steps would be:

- continue to work on bug 690 (provider staging issue)
- see if we can increase efficiency for larger numbers of smaller tasks on Crays
- support Borja's students (Jessica and Reed) in developing a Coaster client C API.

For the efficiency benchmarks, it seems to me that Swift and the coaster service have some headroom in terms of task rates. It looks to me like the coaster worker is having a hard time keeping up and keeping the node saturated. I think we should re-run these tests with real CPU-burning app() calls and see if we can confirm or refute this conjecture by measuring a single worker, e.g., at 32 cores, unless you have other ideas on where the next bottleneck lies.

I think we can get fairly regular test runs on an 18K-core Cray (as batch jobs, which we can test on Raven and then pass to Cray for the benchmark runs). We should review the data we have, and then plan a set of proposed improvements and verifications, ideally measured on some local, repeatable test setup such as the MCS servers, bridled/communicado, or multiple PADS nodes.

I think a good goal would be 95% efficiency for 60 sec tasks at 18K cores.

 100%: 314 tasks/sec
  95%: 298 tasks/sec
  90%: 283 tasks/sec

Coasters seems to be able to do over 330 tasks/sec, so at least hypothetically this should be doable. And the 330/sec is with only 6 cores and an 8GB Java RAM limit. We can try to use a 32-core compute node to run Swift from, with up to 64GB RAM, so we should be able to get even more headroom on the Swift+CoasterService side.
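
For reference, the target rates above can be reproduced with a few lines of Python. This is only a rough sketch: the exact core count of 18,840 is an assumption inferred from 314 tasks/sec x 60 sec (the "18K" figure is all that is stated), and the function and variable names are made up for illustration.

```python
def required_task_rate(cores, task_seconds, efficiency):
    """Task completions per second needed to keep `efficiency` of `cores` busy."""
    return cores * efficiency / task_seconds


CORES = 18_840        # assumption: inferred from 314 tasks/sec * 60 sec, not a stated figure
TASK_SECONDS = 60     # task duration used for the efficiency goal
OBSERVED_RATE = 330   # tasks/sec that Coasters was observed to exceed

# Required completion rate at each efficiency level, rounded to whole tasks/sec.
rates = {
    eff: round(required_task_rate(CORES, TASK_SECONDS, eff))
    for eff in (1.00, 0.95, 0.90)
}

for eff, rate in rates.items():
    print(f"{eff:4.0%}: {rate} tasks/sec")

# Ratio of the observed rate to the rate needed for the 95% goal.
headroom = OBSERVED_RATE / required_task_rate(CORES, TASK_SECONDS, 0.95)
print(f"headroom over the 95% target: {headroom:.2f}x")
```

Under these assumptions the observed 330 tasks/sec sits only about 10% above the 95% target, which is why the extra headroom from a larger Swift node matters.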

Let's slate this for next week or later; until then, it would be good to resolve bug 690.

Regards,

- Mike






-- 
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory



