[Mochi-devel] some regression testing and transport updates

Carns, Philip H. carns at mcs.anl.gov
Thu Sep 6 15:10:26 CDT 2018


Hi all,

We recently revamped our nightly testing at ANL.  The margo regression tests (point-to-point only for now) now live in a separate repository:

https://xgitlab.cels.anl.gov/sds/sds-tests

There are subdirectories under perf-regression for three different systems:

bebop (linux/omnipath)
cooley (linux/infiniband)
theta (cray/aries)

Each one has a build script and a job script that show how to compile and run everything on that system using spack.

One notable change from the past is that we've switched over entirely to libfabric for infiniband/verbs systems.  At this point it performs well and is more stable than cci on modern systems.  We were already relying on libfabric for omnipath and aries support.

Point-to-point performance on all three platforms is pretty good now, with the notable exception of poor round-trip latency on theta.  That particular problem is specific to KNL cores, though; performance is quite good on Haswell and other Xeon-based chips.

All three have some quirks related to getting them to interact cleanly with MPI and/or to how much CPU they use.  The example scripts reflect our current best understanding of how to make them work.

I'm not sure what to recommend for TCP at present.  The viable options are BMI/TCP and libfabric/sockets, but we aren't routinely testing either one.
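
For orientation, here is a rough sketch of how the transport choices above might map onto address prefixes.  The prefix strings (ofi+psm2, ofi+verbs, ofi+gni, bmi+tcp, ofi+sockets) and the helper name address_prefix are assumptions for illustration only; they are not taken from the scripts in the repository, which remain the reference for what actually works on each system.

```python
# Hypothetical mapping from test system to a transport address prefix.
# The prefix strings are assumptions, not copied from the sds-tests scripts.
TRANSPORT_PREFIX = {
    "bebop": "ofi+psm2://",    # linux/omnipath, via libfabric
    "cooley": "ofi+verbs://",  # linux/infiniband, via libfabric
    "theta": "ofi+gni://",     # cray/aries, via libfabric
}

# TCP options mentioned above; neither is routinely tested.
TCP_OPTIONS = ["bmi+tcp://", "ofi+sockets://"]


def address_prefix(system: str) -> str:
    """Return the assumed address prefix for a known test system."""
    key = system.strip().lower()
    if key not in TRANSPORT_PREFIX:
        raise ValueError(f"no transport recorded for system: {system!r}")
    return TRANSPORT_PREFIX[key]


if __name__ == "__main__":
    for name in sorted(TRANSPORT_PREFIX):
        print(name, address_prefix(name))
```

A string like this would typically be what gets handed to the communication layer at initialization time; check the per-system job scripts for the exact form used there.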

thanks,
-Phil





