<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class="">
<div class="">Glad that’s working now.</div>
<div class=""><br class="">
</div>
It is the put_wait events, and “dur” is the right field. Those units are microseconds.
<div class="">
<div class=""><br class="">
</div>
<div class="">
<div><br class="">
<blockquote type="cite" class="">
<div class="">On Nov 2, 2021, at 1:12 PM, Phil Carns <<a href="mailto:carns@mcs.anl.gov" class="">carns@mcs.anl.gov</a>> wrote:</div>
<br class="Apple-interchange-newline">
<div class="">
<div class="">
<p class="">Thanks Philip, the "= {0};" initialization of that struct got me going.</p>
<p class="">I can run the test case now and it is producing output in the client and server perf dirs. Just to sanity check what to look for, I think the problem should be exhibited in the "put_wait" or maybe "do_put" trace events on the client? For example
on my laptop I see this:</p>
<p class="">carns-x1-7g ~/w/d/d/m/client.perf [SIGINT]> grep do_put trace_events.0.json<br class="">
{"name":"do_put","cat":"CPU","ph":"X","pid":0,"tid":0,"ts":1635872352591977.250000,"dur":350.464053,"args":{"GUID":2,"Parent GUID":0}},<br class="">
{"name":"do_put","cat":"CPU","ph":"X","pid":0,"tid":0,"ts":1635872352593065.000000,"dur":36.858053,"args":{"GUID":4,"Parent GUID":0}},<br class="">
{"name":"do_put","cat":"CPU","ph":"X","pid":0,"tid":0,"ts":1635872352593617.000000,"dur":32.954053,"args":{"GUID":6,"Parent GUID":0}},<br class="">
{"name":"do_put","cat":"CPU","ph":"X","pid":0,"tid":0,"ts":1635872352594193.000000,"dur":36.026053,"args":{"GUID":8,"Parent GUID":0}},<br class="">
{"name":"do_put","cat":"CPU","ph":"X","pid":0,"tid":0,"ts":1635872352594850.750000,"dur":34.404053,"args":{"GUID":10,"Parent GUID":0}},<br class="">
{"name":"do_put","cat":"CPU","ph":"X","pid":0,"tid":0,"ts":1635872352595400.750000,"dur":33.524053,"args":{"GUID":12,"Parent GUID":0}},<br class="">
{"name":"do_put","cat":"CPU","ph":"X","pid":0,"tid":0,"ts":1635872352595927.500000,"dur":34.390053,"args":{"GUID":14,"Parent GUID":0}},<br class="">
{"name":"do_put","cat":"CPU","ph":"X","pid":0,"tid":0,"ts":1635872352596416.000000,"dur":37.922053,"args":{"GUID":16,"Parent GUID":0}},<br class="">
{"name":"do_put","cat":"CPU","ph":"X","pid":0,"tid":0,"ts":1635872352596870.000000,"dur":35.506053,"args":{"GUID":18,"Parent GUID":0}},<br class="">
{"name":"do_put","cat":"CPU","ph":"X","pid":0,"tid":0,"ts":1635872352597287.500000,"dur":34.774053,"args":{"GUID":20,"Parent GUID":0}},<br class="">
carns-x1-7g ~/w/d/d/m/client.perf> grep put_wait trace_events.0.json<br class="">
{"name":"put_wait","cat":"CPU","ph":"X","pid":0,"tid":0,"ts":1635872352592427.750000,"dur":570.428053,"args":{"GUID":3,"Parent GUID":0}},<br class="">
{"name":"put_wait","cat":"CPU","ph":"X","pid":0,"tid":0,"ts":1635872352593122.750000,"dur":429.156053,"args":{"GUID":5,"Parent GUID":0}},<br class="">
{"name":"put_wait","cat":"CPU","ph":"X","pid":0,"tid":0,"ts":1635872352593671.250000,"dur":465.616053,"args":{"GUID":7,"Parent GUID":0}},<br class="">
{"name":"put_wait","cat":"CPU","ph":"X","pid":0,"tid":0,"ts":1635872352594248.500000,"dur":547.054053,"args":{"GUID":9,"Parent GUID":0}},<br class="">
{"name":"put_wait","cat":"CPU","ph":"X","pid":0,"tid":0,"ts":1635872352594906.750000,"dur":428.964053,"args":{"GUID":11,"Parent GUID":0}},<br class="">
{"name":"put_wait","cat":"CPU","ph":"X","pid":0,"tid":0,"ts":1635872352595455.750000,"dur":416.796053,"args":{"GUID":13,"Parent GUID":0}},<br class="">
{"name":"put_wait","cat":"CPU","ph":"X","pid":0,"tid":0,"ts":1635872352595981.250000,"dur":371.040053,"args":{"GUID":15,"Parent GUID":0}},<br class="">
{"name":"put_wait","cat":"CPU","ph":"X","pid":0,"tid":0,"ts":1635872352596485.500000,"dur":334.758053,"args":{"GUID":17,"Parent GUID":0}},<br class="">
{"name":"put_wait","cat":"CPU","ph":"X","pid":0,"tid":0,"ts":1635872352596934.250000,"dur":298.168053,"args":{"GUID":19,"Parent GUID":0}},<br class="">
{"name":"put_wait","cat":"CPU","ph":"X","pid":0,"tid":0,"ts":1635872352597342.250000,"dur":389.624053,"args":{"GUID":21,"Parent GUID":0}},</p>
<p class="">I should look at the "dur" field right? What are the units on that?</p>
<p class="">I'll see if I can run this on a "real" system shortly.<br class="">
</p>
<p class="">thanks!</p>
<p class="">-Phil<br class="">
</p>
<div class="moz-cite-prefix">On 11/2/21 12:11 PM, Philip Davis wrote:<br class="">
</div>
<blockquote type="cite" cite="mid:104B5045-328B-48EE-A32C-09D25817EED7@rutgers.edu" class="">
Hi Phil,
<div class=""><br class="">
</div>
<div class="">Sorry the data structures are like that; I wanted to preserve as much of the RPC size and ordering in case it ended up being important.</div>
<div class=""><br class="">
</div>
<div class="">I’m surprised in.odsc.size is troublesome, as I set in.odsc.size with the line `in.odsc.size = sizeof(odsc);`. I’m not sure what could be corrupting that value in the meantime.</div>
<div class=""><br class="">
</div>
<div class="">I don’t set in.odsc.gdim_size (which was an oversight, since that’s non-zero normally), so I’m less surprised that’s an issue. I thought I initialized `in` to zero, but I see I didn’t do that after all.</div>
<div class=""><br class="">
</div>
<div class="">Maybe change the line `bulk_gdim_t in;` to `bulk_gdim_t in = {0};`</div>
<div class=""><br class="">
</div>
<div class=""><br class="">
</div>
<div class="">
<div class=""><br class="">
<blockquote type="cite" class="">
<div class="">On Nov 2, 2021, at 11:48 AM, Phil Carns <<a href="mailto:carns@mcs.anl.gov" class="moz-txt-link-freetext" moz-do-not-send="true">carns@mcs.anl.gov</a>> wrote:</div>
<br class="Apple-interchange-newline">
<div class="">
<div class="">
<p class="">Awesome, thanks Philip. It came through fine.</p>
<p class="">I started by modifying the job script slightly to just run it on my laptop with sm (I wanted to make sure I understood the test case, and how to use apex, before trying elsewhere). Does in.size needs to be set in client.c? For me there is a random
value in that field and it is causing the encoder on the forward to attempt a very large allocation. The same might be true of gdim_size if it got past that step. I started to alter them but then I wasn't sure what the implications were.<br class="">
</p>
<p class="">(fwiw I needed to include stdlib.h in common.h, but I've hit that a couple of times recently on codes that didn't previously generate warnings; I think something in Ubuntu has gotten strict about that recently).</p>
<p class="">thanks,</p>
<p class="">-Phil<br class="">
</p>
<p class=""><br class="">
</p>
<div class="moz-cite-prefix">On 11/1/21 4:51 PM, Philip Davis wrote:<br class="">
</div>
<blockquote type="cite" cite="mid:67D76603-B4B9-48E5-9452-966D1E4866D2@rutgers.edu" class="">
<div class="" style="word-wrap:break-word;
line-break:after-white-space">
Hi Phil,</div>
<div class="" style="word-wrap:break-word;
line-break:after-white-space">
<div class=""><br class="">
</div>
<div class="">I’ve attached the reproducer. I see the 4th and 8th issue on Frontera, but not Summit. Hopefully it will build and run without too much modification. Let me know if there are any issues with running it (or if the anl listserv eats the tarball,
which I kind of expect).</div>
<div class=""><br class="">
</div>
<div class="">Thanks,</div>
<div class="">Philip</div>
<div class=""><br class="">
<div class=""><br class="">
<blockquote type="cite" class="">
<div class="">On Nov 1, 2021, at 11:14 AM, Phil Carns <<a href="mailto:carns@mcs.anl.gov" class="moz-txt-link-freetext" moz-do-not-send="true">carns@mcs.anl.gov</a>> wrote:</div>
<br class="x_Apple-interchange-newline">
<div class="">
<div class="">
<p class="">Hi Philip,</p>
<p class="">(FYI I think the first image didn't come through in your email, but I think the others are sufficient to get across what you are seeing)</p>
<p class="">I don't have any idea what would cause that. The recently released libfabric 1.13.2 (available in spack from the mochi-spack-packages repo) includes some fixes to the rxm provider that could be relevant to Frontera and Summit, but nothing that
aligns with what you are observing.</p>
<p class="">If it were later in the sequence (much later) I would speculate that memory allocation/deallocation cycles were eventually causing a hiccup. We've seen something like that in the past, and it's a theory that we could then test with alternative
allocators like jemalloc. That's not memory allocation jitter that early in the run though.<br class="">
</p>
<p class="">Please do share your reproducer if you don't mind! We can try a few systems here to at least isolate if it is something peculiar to the InfiniBand path or if there is a more general problem in Margo.</p>
<p class="">thanks,</p>
<p class="">-Phil<br class="">
</p>
<div class="x_moz-cite-prefix">On 10/29/21 3:20 PM, Philip Davis wrote:<br class="">
</div>
<blockquote type="cite" class="">
<div class="">Hello,</div>
<div class=""><br class="">
</div>
<div class="">I apologize in advance for the winding nature of this email. I’m not sure how to ask my question without explaining the story of my results some.</div>
<div class=""><br class="">
</div>
I’m doing some characterization of our server performance under load, and I have a quirk of performance that I wanted to run by people to see if they make sense. My testing so far has been to iteratively send batches of RPCs using margo_iforward, and then measure
the wait time until they are all complete. On the server side, handling the RPC includes a margo_bulk_transfer as a pull initiated on the server to pull (for now) 8 bytes. The payload of the RPC request is about 500 bytes, and the response payload is 4 bytes.
<div class=""><br class="">
</div>
<div class="">I’ve isolated my results down to one server rank and one client rank, because it’s an easier starting point to reason from. Below is a graph of some of my initial results. These results are from Frontera. The median times are good (a single RPC
takes on the order of 10s of microseconds, which seems fantastic). However, the outliers are fairly high (note the log scale of the y-axis). With only one RPC per timestep, for example, there is a 30x spread between the median and the max.</div>
<div class=""><br class="">
</div>
<div class=""><img id="x_126DD6C1-A420-4D70-8D1C-96320B7F54E7" src="cid:5EB5DF63-97AA-48FC-8BBE-E666E933D79F" class="" moz-do-not-send="true"></div>
<div class=""><br class="">
</div>
<div class="">I was hoping (expecting) the first timestep would be where the long replies resided, but that turned out not to be the case. Below are traces from the 1 RPC (blue) and 2 RPC (orange) per timestep cases, 5 trials of 10 timesteps for each case
(normalized to fix the same x-axis):</div>
<div class=""><br class="">
</div>
<div class=""><span id="x_cid:part1.PbJ8iWDr.ROdFhNeX@mcs.anl.gov" class=""><PastedGraphic-6.png></span></div>
<div class=""><br class="">
</div>
<div class="">What strikes me is how consistent these results are across trials. For the 1 RPC per timestep case, the 3rd and 7th timestep are consistently<i class=""> </i><span class="" style="font-style:normal">slow (and the rest are fast). For the 2 RPC
per timestep case, the 2nd and 4th timestep are always slow and sometimes the 10th is. These results are repeatable with very rare variation.</span></div>
<div class=""><span class="" style="font-style:normal"><br class="">
</span></div>
<div class="">For the single RPC case, I recorded some timers on the server side, and attempted to overlay them with the client side (there is some unknown offset, but probably on the order of 10s of microseconds at worst, given the pattern):</div>
<div class=""><br class="">
</div>
<div class=""><span id="x_cid:part2.1KhzfDFv.Yq10fEh8@mcs.anl.gov" class=""><PastedGraphic-7.png></span></div>
<div class=""><br class="">
</div>
<div class="">I blew up the first few timesteps of one of the trials:</div>
<div class=""><span id="x_cid:part3.B6Byp6pi.aa0hKH2Q@mcs.anl.gov" class=""><PastedGraphic-8.png></span></div>
<div class=""><br class="">
</div>
<div class="">The different colors are different segments of the handler, but there doesn’t seem to be anything too interesting going on inside the handler. So it looks like the time is being introduced before the 3rd RPC handler starts, based on the where
the gap appears on the server side.</div>
<div class=""><br class="">
</div>
<div class="">To try and isolate any dataspaces-specific behavior, I created a pure Margo test case that just sends a single rpc of the same size as dataspaces iteratively, whre the server side does an 8-byte bulk transfer initiated by the server, and sends
a response. The results are similar, except that it is now the 4th and 8th timestep that are slow (and the first timestep is VERY long, presumably because rxm communication state is being established. DataSpaces has an earlier RPC in its init that was absorbing
this latency).</div>
<div class=""><br class="">
</div>
<div class="">I got margo profiling results for this test case:</div>
<div class=""><br class="">
</div>
<div class="">```</div>
<div class="">
<div class="">3</div>
<div class="">18446744025556676964,ofi+verbs;ofi_rxm://192.168.72.245:39573</div>
<div class="">0xa2a1,term_rpc</div>
<div class="">0x27b5,put_rpc</div>
<div class="">0xd320,__shutdown__</div>
<div class="">0x27b5 ,0.000206208,10165,18446744027256353016,0,0.041241646,0.000045538,0.025733232,200,18446744073709551615,286331153,0,18446744073709551615,286331153,0</div>
<div class="">0x27b5 ,0;0.041241646,200.000000000, 0;</div>
<div class="">0xa2a1 ,0.000009298,41633,18446744027256353016,0,0.000009298,0.000009298,0.000009298,1,18446744073709551615,286331153,0,18446744073709551615,286331153,0</div>
<div class="">0xa2a1 ,0;0.000009298,1.000000000, 0;</div>
</div>
<div class="">```</div>
<div class=""><br class="">
</div>
<div class="">So I guess my question at this point is, is there any sensible reason why the 4th and 8th RPC sent would have a long response time? I think I’ve cleared my code on the client side and server side, so it appears to be latency being introduced by
Margo, LibFabric, Argobots, or the underlying OS. I do see long timesteps occasionally after this (perhaps every 20-30 timesteps) but these are not consistent.</div>
<div class=""><br class="">
</div>
<div class="">One last detail: this does not happen on Summit. On summit, I see about 5-7x worse single-RPC performance (250-350 microseconds per RPC), but without the intermittent long timesteps.</div>
<div class=""><br class="">
</div>
<div class="">I can provide the minimal test case if it would be helpful. I am using APEX for timing results, and the following dependencies with Spack:</div>
<div class=""><br class="">
</div>
<div class=""><a class=" x_moz-txt-link-abbreviated moz-txt-link-freetext
" href="mailto:argobots@1.1" moz-do-not-send="true">argobots@1.1</a> <a class=" x_moz-txt-link-abbreviated moz-txt-link-freetext
" href="mailto:json-c@0.15" moz-do-not-send="true">json-c@0.15</a>
<a class=" x_moz-txt-link-abbreviated moz-txt-link-freetext
" href="mailto:libfabric@1.13.1" moz-do-not-send="true">libfabric@1.13.1</a> <a class=" x_moz-txt-link-abbreviated moz-txt-link-freetext
" href="mailto:mercury@2.0.1" moz-do-not-send="true">mercury@2.0.1</a>
<a class=" x_moz-txt-link-abbreviated moz-txt-link-freetext
" href="mailto:mochi-margo@0.9.5" moz-do-not-send="true">mochi-margo@0.9.5</a> rdma-core@20</div>
<div class=""><br class="">
</div>
<div class="">Thanks,</div>
<div class="">Philip</div>
<div class=""><br class="">
</div>
<div class=""><br class="">
</div>
<br class="">
<fieldset class="x_mimeAttachmentHeader"></fieldset>
<pre class="x_moz-quote-pre">_______________________________________________
mochi-devel mailing list
<a class="moz-txt-link-freetext x_moz-txt-link-abbreviated" href="mailto:mochi-devel@lists.mcs.anl.gov" moz-do-not-send="true">mochi-devel@lists.mcs.anl.gov</a>
<a class="x_moz-txt-link-freetext" href="https://nam02.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.mcs.anl.gov%2Fmailman%2Flistinfo%2Fmochi-devel&data=04%7C01%7Cphilip.e.davis%40rutgers.edu%7C072a060f86104dba349208d99e23ee3f%7Cb92d2b234d35447093ff69aca6632ffe%7C1%7C0%7C637714700152332372%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=MsfzALh61gWJDkR%2BoE3DW1%2Fc%2FZod8mPoK09NTlo85VQ%3D&reserved=0" originalsrc="https://lists.mcs.anl.gov/mailman/listinfo/mochi-devel" shash="OzxHO9wk+qZFNcy+XwUtQ4SbFnNP/HFP3P4WKfK/AAGTgP5MkLf0lv/ZZEdflBOfx2P/yvu40sY47oH/uO/FrANa4Eh/4mgOZ0ZL4OW4MmylZSIB0EIhQHuF4yLSPJYKBAAFFoatuzWy9ZB2fG8QfB6qbEsBm0svOJTeLDEEuLA=" moz-do-not-send="true">https://lists.mcs.anl.gov/mailman/listinfo/mochi-devel</a>
<a class="x_moz-txt-link-freetext" href="https://nam02.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.mcs.anl.gov%2Fresearch%2Fprojects%2Fmochi&data=04%7C01%7Cphilip.e.davis%40rutgers.edu%7C072a060f86104dba349208d99e23ee3f%7Cb92d2b234d35447093ff69aca6632ffe%7C1%7C0%7C637714700152342375%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=amzR%2ByhZ2gjG8Drk20bR5iB8u5lZeiGmnWr%2FJShdGXA%3D&reserved=0" originalsrc="https://www.mcs.anl.gov/research/projects/mochi" shash="yu7zSIOkKKBhgj62LipCxjlSadBUnDHaRgTdg8mwuud8gYnUUk1z+sUACLs51m02Fjel4nGVO21+hChtgeoTUlRGJ2sg31OW6EA/+vowbtdIZWwoz9nsQlQ2ALdnrvxMNe7WTulpudJ/2cRZfS6BoPy/l125oQRhrshvpv8SMIQ=" moz-do-not-send="true">https://www.mcs.anl.gov/research/projects/mochi</a>
</pre>
</blockquote>
</div>
_______________________________________________<br class="">
mochi-devel mailing list<br class="">
<a href="mailto:mochi-devel@lists.mcs.anl.gov" class="moz-txt-link-freetext" moz-do-not-send="true">mochi-devel@lists.mcs.anl.gov</a><br class="">
<a class="moz-txt-link-freetext" href="https://nam02.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.mcs.anl.gov%2Fmailman%2Flistinfo%2Fmochi-devel&data=04%7C01%7Cphilip.e.davis%40rutgers.edu%7C072a060f86104dba349208d99e23ee3f%7Cb92d2b234d35447093ff69aca6632ffe%7C1%7C0%7C637714700152342375%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=KXjcwZCO2iaU%2BgX9wWnfLDXVRo1rTJSjrngcv2Ipkv4%3D&reserved=0" originalsrc="https://lists.mcs.anl.gov/mailman/listinfo/mochi-devel" shash="pZNKL/H5yLEkFKfKA+idvym2kTyXBqYKTzTb8RKDM9rRehDrMsiEtkwFZK/twLGga27Z83K4NJHvm53A5eQh20+YzIXSI0AZDfbJak4MgI56iqrZmQv65hetjcp6avQXKlQLQmuXP21K1qGGexUpW4ZB0Z/QFDwhqqHsawCqk/A=" moz-do-not-send="true">https://lists.mcs.anl.gov/mailman/listinfo/mochi-devel</a><br class="">
<a class="moz-txt-link-freetext" href="https://nam02.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.mcs.anl.gov%2Fresearch%2Fprojects%2Fmochi&data=04%7C01%7Cphilip.e.davis%40rutgers.edu%7C072a060f86104dba349208d99e23ee3f%7Cb92d2b234d35447093ff69aca6632ffe%7C1%7C0%7C637714700152352373%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=c0Q58r478yn%2FdRB7jt3jcKK6pjWZX%2FAzr7ljBPdwmPc%3D&reserved=0" originalsrc="https://www.mcs.anl.gov/research/projects/mochi" shash="eNDzR5xT2chc97mn821PLiNIUyo0fNHBC/6Vq5QybhaWJOHCiWQWVbyvWBQiJ9NCoX3UhdngSdOh7MLdez1iYPVzLiAGqwEyrGswiRHoJCGogmILgrq1yiq5dlyKUDedcXv28iOAVfrx0O6PJ8F50kOx12LwfQjvgJzhuk3jMG8=" moz-do-not-send="true">https://www.mcs.anl.gov/research/projects/mochi</a><br class="">
</div>
</blockquote>
</div>
<br class="">
</div>
</div>
</blockquote>
</div>
</div>
</blockquote>
</div>
<br class="">
</div>
</blockquote>
</div>
</div>
</blockquote>
</div>
<br class="">
</div>
</div>
</body>
</html>