[codes-ross-users] Load imbalance when running TraceR in parallel

Fri Jun 29 12:10:05 CDT 2018

Hi,
I’m trying to run a TraceR OTF simulation with lots of messages and lots of congestion. This is the first time that I’ve had a big enough simulation that I need to run it in parallel, and I’m having a really hard time getting any sort of parallel speedup. I tried running on 4-8 nodes with --sync=2 and --sync=3, as well as various values of --nkp, and the best I’ve gotten is only a few percent faster than serial. I looked into it and found that the cause appears to be massive load imbalance. I’m attaching a screenshot from hpctraceviewer that shows that rank 0 does almost all the work while the other ranks spend a large amount of time in MPI_Allreduce, waiting for rank 0 to arrive. I don’t know this part of ROSS/CODES very well, but does this mean the LPs are not being distributed evenly? If so, how can I change the distribution?
It wouldn’t surprise me if my traffic pattern caused some load imbalance because there are 4 endpoints that receive way more traffic than the others, but I don’t think the imbalance should be this bad.
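To put a rough number on how much an imbalance like this limits speedup, here is a minimal sketch. It assumes the ranks synchronize at a collective, so the slowest rank sets the wall-clock time. The function names and the per-rank busy times are hypothetical and chosen only for illustration; they are not measurements from this run.

```python
def speedup_bound(busy_seconds):
    """Upper bound on speedup over serial: total work / slowest rank's work.

    Assumes every rank waits at a collective for the slowest one, and
    ignores communication and rollback costs.
    """
    return sum(busy_seconds) / max(busy_seconds)


def imbalance_factor(busy_seconds):
    """Ratio of the busiest rank's work to the mean; 1.0 is perfectly balanced."""
    mean = sum(busy_seconds) / len(busy_seconds)
    return max(busy_seconds) / mean


# Hypothetical busy time per rank, in seconds: rank 0 does almost everything.
skewed = [96.0, 1.0, 1.0, 1.0, 1.0]
# The same total work spread evenly, for comparison.
balanced = [20.0, 20.0, 20.0, 20.0, 20.0]

if __name__ == "__main__":
    for name, busy in (("skewed", skewed), ("balanced", balanced)):
        print(name, round(speedup_bound(busy), 3), round(imbalance_factor(busy), 3))
```

With these made-up numbers the skewed case is capped at about a 4% speedup on five ranks, which is the same order as the "few percent" described above.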
Thank you very much,
Philip Taffet
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Screen Shot 2018-06-26 at 4.16.09 PM[1].png
Type: image/png
Size: 160602 bytes
Desc: Screen Shot 2018-06-26 at 4.16.09 PM[1].png
URL: <http://lists.mcs.anl.gov/pipermail/codes-ross-users/attachments/20180629/5c266850/attachment-0001.png>