Awesome, thanks Philip. It came through fine.

I started by modifying the job script slightly to just run it on my laptop with sm (I wanted to make sure I understood the test case, and how to use apex, before trying elsewhere). Does in.size need to be set in client.c? For me there is a random value in that field, and it is causing the encoder on the forward to attempt a very large allocation. The same might be true of gdim_size if it got past that step. I started to alter them, but then I wasn't sure what the implications were.

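For concreteness, this is the kind of initialization I was experimenting with (a minimal sketch; put_in_t, payload_size, and the field layout are my guesses, and only the size and gdim_size names come from client.c):

```
#include <stdint.h>
#include <string.h>

/* Hypothetical shape of the RPC input; only size and gdim_size are
 * taken from the reproducer, the rest is guesswork. */
typedef struct {
    uint64_t size;      /* presumably the byte count the server will pull */
    uint64_t gdim_size; /* global-dimension metadata length? */
    /* ... other fields ... */
} put_in_t;

static void prepare_input(put_in_t *in, uint64_t payload_size)
{
    memset(in, 0, sizeof(*in)); /* avoid encoding whatever garbage is on the stack */
    in->size = payload_size;
    /* gdim_size left at 0, since I wasn't sure what the encoder expects */
}
```
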
(fwiw I needed to include stdlib.h in common.h, but I've hit that a couple of times recently on codes that didn't previously generate warnings; I think something in the Ubuntu toolchain has gotten stricter about implicit declarations lately.)

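For reference, the change was just the one line (assuming malloc/free are what trip it; I didn't dig further):

```
/* common.h */
#include <stdlib.h> /* malloc/free etc.; newer compilers reject implicit declarations */
```
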
thanks,
-Phil

<div class="moz-cite-prefix">On 11/1/21 4:51 PM, Philip Davis wrote:<br>
</div>
<blockquote type="cite" cite="mid:67D76603-B4B9-48E5-9452-966D1E4866D2@rutgers.edu">
<div class="" style="word-wrap:break-word;
line-break:after-white-space">Hi Phil,</div>
<div class="" style="word-wrap:break-word;
line-break:after-white-space">
<div class=""><br class="">
</div>
<div class="">I’ve attached the reproducer. I see the 4th and
8th issue on Frontera, but not Summit. Hopefully it will build
and run without too much modification. Let me know if there
are any issues with running it (or if the anl listserv eats
the tarball, which I kind of expect).</div>
<div class=""><br class="">
</div>
<div class="">Thanks,</div>
<div class="">Philip</div>
<div class=""><br class="">
<div><br class="">
<blockquote type="cite" class="">
<div class="">On Nov 1, 2021, at 11:14 AM, Phil Carns <<a href="mailto:carns@mcs.anl.gov" class="moz-txt-link-freetext" moz-do-not-send="true">carns@mcs.anl.gov</a>>
wrote:</div>
<br class="x_Apple-interchange-newline">
<div class="">
<div class="">
<p class="">Hi Philip,</p>
<p class="">(FYI I think the first image didn't come
through in your email, but I think the others are
sufficient to get across what you are seeing)</p>
<p class="">I don't have any idea what would cause
that. The recently released libfabric 1.13.2
(available in spack from the mochi-spack-packages
repo) includes some fixes to the rxm provider that
could be relevant to Frontera and Summit, but
nothing that aligns with what you are observing.</p>
<p class="">If it were later in the sequence (much
later) I would speculate that memory
allocation/deallocation cycles were eventually
causing a hiccup. We've seen something like that in
the past, and it's a theory that we could then test
with alternative allocators like jemalloc. That's
not memory allocation jitter that early in the run
though.<br class="">
</p>
<p class="">Please do share your reproducer if you
don't mind! We can try a few systems here to at
least isolate if it is something peculiar to the
InfiniBand path or if there is a more general
problem in Margo.</p>
<p class="">thanks,</p>
<p class="">-Phil<br class="">
</p>
<div class="x_moz-cite-prefix">On 10/29/21 3:20 PM,
Philip Davis wrote:<br class="">
</div>
<blockquote type="cite" class="">
<div class="">Hello,</div>
<div class=""><br class="">
</div>
<div class="">I apologize in advance for the winding
nature of this email. I’m not sure how to ask my
question without explaining the story of my
results some.</div>
<div class=""><br class="">
</div>
I'm doing some characterization of our server performance under load, and I have a performance quirk that I wanted to run by people to see if it makes sense. My testing so far has been to iteratively send batches of RPCs using margo_iforward, and then measure the wait time until they all complete. On the server side, handling each RPC includes a margo_bulk_transfer, a pull initiated by the server of (for now) 8 bytes. The payload of the RPC request is about 500 bytes, and the response payload is 4 bytes.

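In outline, each timestep on the client does something like the following (a sketch rather than the actual test code; put_rpc_id, the batch size, and the input struct are stand-ins):

```
#include <margo.h>

/* One timestep: issue `batch` non-blocking forwards, then wait for all
 * of the responses, returning the wall time for the whole batch. */
static double run_timestep(margo_instance_id mid, hg_addr_t svr_addr,
                           hg_id_t put_rpc_id, void *in, int batch)
{
    hg_handle_t handle[batch];
    margo_request req[batch];
    double start = ABT_get_wtime();

    for (int i = 0; i < batch; i++) {
        margo_create(mid, svr_addr, put_rpc_id, &handle[i]);
        margo_iforward(handle[i], in, &req[i]); /* non-blocking send */
    }
    for (int i = 0; i < batch; i++) {
        margo_wait(req[i]); /* block until response i arrives */
        margo_destroy(handle[i]);
    }
    return ABT_get_wtime() - start;
}
```
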
<div class="">I’ve isolated my results down to one
server rank and one client rank, because it’s an
easier starting point to reason from. Below is a
graph of some of my initial results. These results
are from Frontera. The median times are good (a
single RPC takes on the order of 10s of
microseconds, which seems fantastic). However, the
outliers are fairly high (note the log scale of
the y-axis). With only one RPC per timestep, for
example, there is a 30x spread between the median
and the max.</div>
<div class=""><br class="">
</div>
<div class=""><img id="x_126DD6C1-A420-4D70-8D1C-96320B7F54E7" src="cid:5EB5DF63-97AA-48FC-8BBE-E666E933D79F" class="" moz-do-not-send="true"></div>
<div class=""><br class="">
</div>
<div class="">I was hoping (expecting) the first
timestep would be where the long replies resided,
but that turned out not to be the case. Below are
traces from the 1 RPC (blue) and 2 RPC (orange)
per timestep cases, 5 trials of 10 timesteps for
each case (normalized to fix the same x-axis):</div>
<div class=""><br class="">
</div>
<div class=""><span id="x_cid:part1.PbJ8iWDr.ROdFhNeX@mcs.anl.gov"><PastedGraphic-6.png></span></div>
<div class=""><br class="">
</div>
<div class="">What strikes me is how consistent
these results are across trials. For the 1 RPC per
timestep case, the 3rd and 7th timestep are
consistently<i class=""> </i><span class="" style="font-style:normal">slow (and the rest are
fast). For the 2 RPC per timestep case, the 2nd
and 4th timestep are always slow and sometimes
the 10th is. These results are repeatable with
very rare variation.</span></div>
<div class=""><span class="" style="font-style:normal"><br class="">
</span></div>
<div class="">For the single RPC case, I recorded
some timers on the server side, and attempted to
overlay them with the client side (there is some
unknown offset, but probably on the order of 10s
of microseconds at worst, given the pattern):</div>
<div class=""><br class="">
</div>
<div class=""><span id="x_cid:part2.1KhzfDFv.Yq10fEh8@mcs.anl.gov"><PastedGraphic-7.png></span></div>
<div class=""><br class="">
</div>
<div class="">I blew up the first few timesteps of
one of the trials:</div>
<div class=""><span id="x_cid:part3.B6Byp6pi.aa0hKH2Q@mcs.anl.gov"><PastedGraphic-8.png></span></div>
<div class=""><br class="">
</div>
<div class="">The different colors are different
segments of the handler, but there doesn’t seem to
be anything too interesting going on inside the
handler. So it looks like the time is being
introduced before the 3rd RPC handler starts,
based on the where the gap appears on the server
side.</div>
<div class=""><br class="">
</div>
<div class="">To try and isolate any
dataspaces-specific behavior, I created a pure
Margo test case that just sends a single rpc of
the same size as dataspaces iteratively, whre the
server side does an 8-byte bulk transfer initiated
by the server, and sends a response. The results
are similar, except that it is now the 4th and 8th
timestep that are slow (and the first timestep is
VERY long, presumably because rxm communication
state is being established. DataSpaces has an
earlier RPC in its init that was absorbing this
latency).</div>
<div class=""><br class="">
</div>
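The handler in that test case is essentially the following (a sketch with placeholder struct and field names; the real code declares these with the Mercury proc macros):

```
#include <stdint.h>
#include <margo.h>

typedef struct { hg_bulk_t bulk; /* ... ~500 bytes of metadata ... */ } put_in_t;
typedef struct { int32_t ret; } put_out_t;

static void put_rpc_ult(hg_handle_t handle)
{
    margo_instance_id mid = margo_hg_handle_get_instance(handle);
    const struct hg_info *info = margo_get_info(handle);

    put_in_t in;
    margo_get_input(handle, &in);

    /* pull 8 bytes from the client's exposed buffer, initiated here on
     * the server side */
    uint64_t val;
    void *buf = &val;
    hg_size_t len = sizeof(val);
    hg_bulk_t local;
    margo_bulk_create(mid, 1, &buf, &len, HG_BULK_WRITE_ONLY, &local);
    margo_bulk_transfer(mid, HG_BULK_PULL, info->addr, in.bulk, 0,
                        local, 0, len);

    put_out_t out = { .ret = 0 }; /* the 4-byte response payload */
    margo_respond(handle, &out);

    margo_bulk_free(local);
    margo_free_input(handle, &in);
    margo_destroy(handle);
}
DEFINE_MARGO_RPC_HANDLER(put_rpc_ult)
```
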
<div class="">I got margo profiling results for this
test case:</div>
<div class=""><br class="">
</div>
<div class="">```</div>
<div class="">
<div class="">3</div>
<div class="">18446744025556676964,ofi+verbs;ofi_rxm://192.168.72.245:39573</div>
<div class="">0xa2a1,term_rpc</div>
<div class="">0x27b5,put_rpc</div>
<div class="">0xd320,__shutdown__</div>
<div class="">0x27b5
,0.000206208,10165,18446744027256353016,0,0.041241646,0.000045538,0.025733232,200,18446744073709551615,286331153,0,18446744073709551615,286331153,0</div>
<div class="">0x27b5 ,0;0.041241646,200.000000000,
0;</div>
<div class="">0xa2a1
,0.000009298,41633,18446744027256353016,0,0.000009298,0.000009298,0.000009298,1,18446744073709551615,286331153,0,18446744073709551615,286331153,0</div>
<div class="">0xa2a1 ,0;0.000009298,1.000000000,
0;</div>
</div>
<div class="">```</div>
<div class=""><br class="">
</div>
<div class="">So I guess my question at this point
is, is there any sensible reason why the 4th and
8th RPC sent would have a long response time? I
think I’ve cleared my code on the client side and
server side, so it appears to be latency being
introduced by Margo, LibFabric, Argobots, or the
underlying OS. I do see long timesteps
occasionally after this (perhaps every 20-30
timesteps) but these are not consistent.</div>
<div class=""><br class="">
</div>
<div class="">One last detail: this does not happen
on Summit. On summit, I see about 5-7x worse
single-RPC performance (250-350 microseconds per
RPC), but without the intermittent long timesteps.</div>
<div class=""><br class="">
</div>
<div class="">I can provide the minimal test case if
it would be helpful. I am using APEX for timing
results, and the following dependencies with
Spack:</div>
<div class=""><br class="">
</div>
<div class=""><a class="x_moz-txt-link-abbreviated
moz-txt-link-freetext" href="mailto:argobots@1.1" moz-do-not-send="true">argobots@1.1</a> <a class="x_moz-txt-link-abbreviated
moz-txt-link-freetext" href="mailto:json-c@0.15" moz-do-not-send="true">json-c@0.15</a> <a class="x_moz-txt-link-abbreviated
moz-txt-link-freetext" href="mailto:libfabric@1.13.1" moz-do-not-send="true">libfabric@1.13.1</a> <a class="x_moz-txt-link-abbreviated
moz-txt-link-freetext" href="mailto:mercury@2.0.1" moz-do-not-send="true">mercury@2.0.1</a> <a class="x_moz-txt-link-abbreviated
moz-txt-link-freetext" href="mailto:mochi-margo@0.9.5" moz-do-not-send="true">mochi-margo@0.9.5</a>
rdma-core@20</div>
<div class=""><br class="">
</div>
<div class="">Thanks,</div>
<div class="">Philip</div>
<div class=""><br class="">
</div>
<div class=""><br class="">
</div>
<br class="">
<fieldset class="x_mimeAttachmentHeader"></fieldset>
<pre class="x_moz-quote-pre">_______________________________________________
mochi-devel mailing list
<a class="x_moz-txt-link-abbreviated moz-txt-link-freetext" href="mailto:mochi-devel@lists.mcs.anl.gov" moz-do-not-send="true">mochi-devel@lists.mcs.anl.gov</a>
<a class="x_moz-txt-link-freetext" href="https://nam02.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.mcs.anl.gov%2Fmailman%2Flistinfo%2Fmochi-devel&data=04%7C01%7Cphilip.e.davis%40rutgers.edu%7Cd561df6d1b3a415d98a508d99d4a62df%7Cb92d2b234d35447093ff69aca6632ffe%7C1%7C0%7C637713765117067440%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=rqjBzeeOVRXpIymaVQYBc%2Bjc0n%2BrYZUeZHqXDWBr5o8%3D&reserved=0" originalsrc="https://lists.mcs.anl.gov/mailman/listinfo/mochi-devel" shash="ZcdH3ojHEL57oiIw9TLiBe5oA9QOLvpFtb5DOaFcDFKtHdHaum2urK/zj1ucFMpYFctb8nAsuTPqJ9nQfSXdHVdqbvNf3hMUsS/fwX1AHkPVpOORvwlZh68eZzuN8gq9Lx13+FCUH3JZlgk0H/rwWnMeCxN92+OzQWGfWbKyPjM=" moz-do-not-send="true">https://lists.mcs.anl.gov/mailman/listinfo/mochi-devel</a>
<a class="x_moz-txt-link-freetext" href="https://nam02.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.mcs.anl.gov%2Fresearch%2Fprojects%2Fmochi&data=04%7C01%7Cphilip.e.davis%40rutgers.edu%7Cd561df6d1b3a415d98a508d99d4a62df%7Cb92d2b234d35447093ff69aca6632ffe%7C1%7C0%7C637713765117067440%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=NtfbQ069ftcjDe%2F8e2kU0uGlEh5GLQkcqjuy%2FNW9kZ4%3D&reserved=0" originalsrc="https://www.mcs.anl.gov/research/projects/mochi" shash="kyH7FQK59FSodbkjA+g3O8szs0N4tzLZweR2bc6JCcRGFZ21hs1EM8ai09ECwijw5CgUqix8cFGO05FfzZhp0CavA9V0PbEKl+cfoEQRcBaV+oYIOS6q2m2jrTui7BfK8NKKOK1+fOXRYIAHYeUMpEcR4JpGVCN34GBI0YtbiBI=" moz-do-not-send="true">https://www.mcs.anl.gov/research/projects/mochi</a>
</pre>
</blockquote>
</div>
_______________________________________________<br class="">
mochi-devel mailing list<br class="">
<a href="mailto:mochi-devel@lists.mcs.anl.gov" class="moz-txt-link-freetext" moz-do-not-send="true">mochi-devel@lists.mcs.anl.gov</a><br class="">
<a class="moz-txt-link-freetext" href="https://lists.mcs.anl.gov/mailman/listinfo/mochi-devel">https://lists.mcs.anl.gov/mailman/listinfo/mochi-devel</a><br class="">
<a class="moz-txt-link-freetext" href="https://www.mcs.anl.gov/research/projects/mochi">https://www.mcs.anl.gov/research/projects/mochi</a><br class="">
</div>
</blockquote>
</div>
<br class="">
</div>
</div>
</blockquote>
</body>
</html>