Awesome, thanks Philip. It came through fine.

I started by modifying the job script slightly to just run it on my laptop with sm (I wanted to make sure I understood the test case, and how to use apex, before trying elsewhere). Does in.size need to be set in client.c? For me there is a random value in that field, and it is causing the encoder on the forward to attempt a very large allocation. The same might be true of gdim_size if it got past that step. I started to alter them, but then I wasn't sure what the implications were.

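For concreteness, this is the kind of initialization I was experimenting with (a minimal sketch; put_in_t, payload_size, and the field layout are my guesses, and only the size and gdim_size names come from client.c):

```
#include <stdint.h>
#include <string.h>

/* Hypothetical shape of the RPC input; only size and gdim_size are
 * taken from the reproducer, the rest is guesswork. */
typedef struct {
    uint64_t size;      /* presumably the byte count the server will pull */
    uint64_t gdim_size; /* global-dimension metadata length? */
    /* ... other fields ... */
} put_in_t;

static void prepare_input(put_in_t *in, uint64_t payload_size)
{
    memset(in, 0, sizeof(*in)); /* avoid encoding whatever garbage is on the stack */
    in->size = payload_size;
    /* gdim_size left at 0, since I wasn't sure what the encoder expects */
}
```
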
(fwiw I needed to include stdlib.h in common.h, but I've hit that a couple of times recently on codes that didn't previously generate warnings; I think something in the Ubuntu toolchain has gotten stricter about implicit declarations lately.)

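For reference, the change was just the one line (assuming malloc/free are what trip it; I didn't dig further):

```
/* common.h */
#include <stdlib.h> /* malloc/free etc.; newer compilers reject implicit declarations */
```
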
thanks,
-Phil

<div class="moz-cite-prefix">On 11/1/21 4:51 PM, Philip Davis wrote:<br>
</div>
<blockquote type="cite" cite="mid:67D76603-B4B9-48E5-9452-966D1E4866D2@rutgers.edu">
<div class="" style="word-wrap:break-word;
line-break:after-white-space">Hi Phil,</div>
<div class="" style="word-wrap:break-word;
line-break:after-white-space">
<div class=""><br class="">
</div>
<div class="">I’ve attached the reproducer. I see the 4th and
8th issue on Frontera, but not Summit. Hopefully it will build
and run without too much modification. Let me know if there
are any issues with running it (or if the anl listserv eats
the tarball, which I kind of expect).</div>
<div class=""><br class="">
</div>
<div class="">Thanks,</div>
<div class="">Philip</div>
<div class=""><br class="">
<div><br class="">
<blockquote type="cite" class="">
<div class="">On Nov 1, 2021, at 11:14 AM, Phil Carns <<a href="mailto:carns@mcs.anl.gov" class="moz-txt-link-freetext" moz-do-not-send="true">carns@mcs.anl.gov</a>>
wrote:</div>
<br class="x_Apple-interchange-newline">
<div class="">
<div class="">
<p class="">Hi Philip,</p>
<p class="">(FYI I think the first image didn't come
through in your email, but I think the others are
sufficient to get across what you are seeing)</p>
<p class="">I don't have any idea what would cause
that. The recently released libfabric 1.13.2
(available in spack from the mochi-spack-packages
repo) includes some fixes to the rxm provider that
could be relevant to Frontera and Summit, but
nothing that aligns with what you are observing.</p>
<p class="">If it were later in the sequence (much
later) I would speculate that memory
allocation/deallocation cycles were eventually
causing a hiccup. We've seen something like that in
the past, and it's a theory that we could then test
with alternative allocators like jemalloc. That's
not memory allocation jitter that early in the run
though.<br class="">
</p>
<p class="">Please do share your reproducer if you
don't mind! We can try a few systems here to at
least isolate if it is something peculiar to the
InfiniBand path or if there is a more general
problem in Margo.</p>
<p class="">thanks,</p>
<p class="">-Phil<br class="">
</p>
<div class="x_moz-cite-prefix">On 10/29/21 3:20 PM,
Philip Davis wrote:<br class="">
</div>
<blockquote type="cite" class="">
<div class="">Hello,</div>
<div class=""><br class="">
</div>
<div class="">I apologize in advance for the winding
nature of this email. I’m not sure how to ask my
question without explaining the story of my
results some.</div>
<div class=""><br class="">
</div>
I'm doing some characterization of our server performance under load, and I have a performance quirk that I wanted to run by people to see if it makes sense. My testing so far has been to iteratively send batches of RPCs using margo_iforward, and then measure the wait time until they all complete. On the server side, handling each RPC includes a margo_bulk_transfer, a pull initiated by the server of (for now) 8 bytes. The payload of the RPC request is about 500 bytes, and the response payload is 4 bytes.

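In outline, each timestep on the client does something like the following (a sketch rather than the actual test code; put_rpc_id, the batch size, and the input struct are stand-ins):

```
#include <margo.h>

/* One timestep: issue `batch` non-blocking forwards, then wait for all
 * of the responses, returning the wall time for the whole batch. */
static double run_timestep(margo_instance_id mid, hg_addr_t svr_addr,
                           hg_id_t put_rpc_id, void *in, int batch)
{
    hg_handle_t handle[batch];
    margo_request req[batch];
    double start = ABT_get_wtime();

    for (int i = 0; i < batch; i++) {
        margo_create(mid, svr_addr, put_rpc_id, &handle[i]);
        margo_iforward(handle[i], in, &req[i]); /* non-blocking send */
    }
    for (int i = 0; i < batch; i++) {
        margo_wait(req[i]); /* block until response i arrives */
        margo_destroy(handle[i]);
    }
    return ABT_get_wtime() - start;
}
```
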
<div class="">I’ve isolated my results down to one
server rank and one client rank, because it’s an
easier starting point to reason from. Below is a
graph of some of my initial results. These results
are from Frontera. The median times are good (a
single RPC takes on the order of 10s of
microseconds, which seems fantastic). However, the
outliers are fairly high (note the log scale of
the y-axis). With only one RPC per timestep, for
example, there is a 30x spread between the median
and the max.</div>
<div class=""><br class="">
</div>
<div class=""><img id="x_126DD6C1-A420-4D70-8D1C-96320B7F54E7" src="cid:5EB5DF63-97AA-48FC-8BBE-E666E933D79F" class="" moz-do-not-send="true"></div>
<div class=""><br class="">
</div>
<div class="">I was hoping (expecting) the first
timestep would be where the long replies resided,
but that turned out not to be the case. Below are
traces from the 1 RPC (blue) and 2 RPC (orange)
per timestep cases, 5 trials of 10 timesteps for
each case (normalized to fix the same x-axis):</div>
<div class=""><br class="">
</div>
<div class=""><span id="x_cid:part1.PbJ8iWDr.ROdFhNeX@mcs.anl.gov"><PastedGraphic-6.png></span></div>
<div class=""><br class="">
</div>
<div class="">What strikes me is how consistent
these results are across trials. For the 1 RPC per
timestep case, the 3rd and 7th timestep are
consistently<i class=""> </i><span class="" style="font-style:normal">slow (and the rest are
fast). For the 2 RPC per timestep case, the 2nd
and 4th timestep are always slow and sometimes
the 10th is. These results are repeatable with
very rare variation.</span></div>
<div class=""><span class="" style="font-style:normal"><br class="">
</span></div>
<div class="">For the single RPC case, I recorded
some timers on the server side, and attempted to
overlay them with the client side (there is some
unknown offset, but probably on the order of 10s
of microseconds at worst, given the pattern):</div>
<div class=""><br class="">
</div>
<div class=""><span id="x_cid:part2.1KhzfDFv.Yq10fEh8@mcs.anl.gov"><PastedGraphic-7.png></span></div>
<div class=""><br class="">
</div>
<div class="">I blew up the first few timesteps of
one of the trials:</div>
<div class=""><span id="x_cid:part3.B6Byp6pi.aa0hKH2Q@mcs.anl.gov"><PastedGraphic-8.png></span></div>
<div class=""><br class="">
</div>
<div class="">The different colors are different
segments of the handler, but there doesn’t seem to
be anything too interesting going on inside the
handler. So it looks like the time is being
introduced before the 3rd RPC handler starts,
based on the where the gap appears on the server
side.</div>
<div class=""><br class="">
</div>
<div class="">To try and isolate any
dataspaces-specific behavior, I created a pure
Margo test case that just sends a single rpc of
the same size as dataspaces iteratively, whre the
server side does an 8-byte bulk transfer initiated
by the server, and sends a response. The results
are similar, except that it is now the 4th and 8th
timestep that are slow (and the first timestep is
VERY long, presumably because rxm communication
state is being established. DataSpaces has an
earlier RPC in its init that was absorbing this
latency).</div>
<div class=""><br class="">
</div>
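The handler in that test case is essentially the following (a sketch with placeholder struct and field names; the real code declares these with the Mercury proc macros):

```
#include <stdint.h>
#include <margo.h>

typedef struct { hg_bulk_t bulk; /* ... ~500 bytes of metadata ... */ } put_in_t;
typedef struct { int32_t ret; } put_out_t;

static void put_rpc_ult(hg_handle_t handle)
{
    margo_instance_id mid = margo_hg_handle_get_instance(handle);
    const struct hg_info *info = margo_get_info(handle);

    put_in_t in;
    margo_get_input(handle, &in);

    /* pull 8 bytes from the client's exposed buffer, initiated here on
     * the server side */
    uint64_t val;
    void *buf = &val;
    hg_size_t len = sizeof(val);
    hg_bulk_t local;
    margo_bulk_create(mid, 1, &buf, &len, HG_BULK_WRITE_ONLY, &local);
    margo_bulk_transfer(mid, HG_BULK_PULL, info->addr, in.bulk, 0,
                        local, 0, len);

    put_out_t out = { .ret = 0 }; /* the 4-byte response payload */
    margo_respond(handle, &out);

    margo_bulk_free(local);
    margo_free_input(handle, &in);
    margo_destroy(handle);
}
DEFINE_MARGO_RPC_HANDLER(put_rpc_ult)
```
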
<div class="">I got margo profiling results for this
test case:</div>
<div class=""><br class="">
</div>
<div class="">```</div>
<div class="">
<div class="">3</div>
<div class="">18446744025556676964,ofi+verbs;ofi_rxm://192.168.72.245:39573</div>
<div class="">0xa2a1,term_rpc</div>
<div class="">0x27b5,put_rpc</div>
<div class="">0xd320,__shutdown__</div>
<div class="">0x27b5
,0.000206208,10165,18446744027256353016,0,0.041241646,0.000045538,0.025733232,200,18446744073709551615,286331153,0,18446744073709551615,286331153,0</div>
<div class="">0x27b5 ,0;0.041241646,200.000000000,
0;</div>
<div class="">0xa2a1
,0.000009298,41633,18446744027256353016,0,0.000009298,0.000009298,0.000009298,1,18446744073709551615,286331153,0,18446744073709551615,286331153,0</div>
<div class="">0xa2a1 ,0;0.000009298,1.000000000,
0;</div>
</div>
<div class="">```</div>
<div class=""><br class="">
</div>
<div class="">So I guess my question at this point
is, is there any sensible reason why the 4th and
8th RPC sent would have a long response time? I
think I’ve cleared my code on the client side and
server side, so it appears to be latency being
introduced by Margo, LibFabric, Argobots, or the
underlying OS. I do see long timesteps
occasionally after this (perhaps every 20-30
timesteps) but these are not consistent.</div>
<div class=""><br class="">
</div>
<div class="">One last detail: this does not happen
on Summit. On summit, I see about 5-7x worse
single-RPC performance (250-350 microseconds per
RPC), but without the intermittent long timesteps.</div>
<div class=""><br class="">
</div>
<div class="">I can provide the minimal test case if
it would be helpful. I am using APEX for timing
results, and the following dependencies with
Spack:</div>
<div class=""><br class="">
</div>
<div class=""><a class="x_moz-txt-link-abbreviated
moz-txt-link-freetext" href="mailto:argobots@1.1" moz-do-not-send="true">argobots@1.1</a> <a class="x_moz-txt-link-abbreviated
moz-txt-link-freetext" href="mailto:json-c@0.15" moz-do-not-send="true">json-c@0.15</a> <a class="x_moz-txt-link-abbreviated
moz-txt-link-freetext" href="mailto:libfabric@1.13.1" moz-do-not-send="true">libfabric@1.13.1</a> <a class="x_moz-txt-link-abbreviated
moz-txt-link-freetext" href="mailto:mercury@2.0.1" moz-do-not-send="true">mercury@2.0.1</a> <a class="x_moz-txt-link-abbreviated
moz-txt-link-freetext" href="mailto:mochi-margo@0.9.5" moz-do-not-send="true">mochi-margo@0.9.5</a>
rdma-core@20</div>
<div class=""><br class="">
</div>
<div class="">Thanks,</div>
<div class="">Philip</div>
<div class=""><br class="">
</div>
<div class=""><br class="">
</div>
<br class="">
<fieldset class="x_mimeAttachmentHeader"></fieldset>
<pre class="x_moz-quote-pre">_______________________________________________
mochi-devel mailing list
<a class="x_moz-txt-link-abbreviated moz-txt-link-freetext" href="mailto:mochi-devel@lists.mcs.anl.gov" moz-do-not-send="true">mochi-devel@lists.mcs.anl.gov</a>
<a class="x_moz-txt-link-freetext" href="https://nam02.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.mcs.anl.gov%2Fmailman%2Flistinfo%2Fmochi-devel&data=04%7C01%7Cphilip.e.davis%40rutgers.edu%7Cd561df6d1b3a415d98a508d99d4a62df%7Cb92d2b234d35447093ff69aca6632ffe%7C1%7C0%7C637713765117067440%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=rqjBzeeOVRXpIymaVQYBc%2Bjc0n%2BrYZUeZHqXDWBr5o8%3D&reserved=0" originalsrc="https://lists.mcs.anl.gov/mailman/listinfo/mochi-devel" shash="ZcdH3ojHEL57oiIw9TLiBe5oA9QOLvpFtb5DOaFcDFKtHdHaum2urK/zj1ucFMpYFctb8nAsuTPqJ9nQfSXdHVdqbvNf3hMUsS/fwX1AHkPVpOORvwlZh68eZzuN8gq9Lx13+FCUH3JZlgk0H/rwWnMeCxN92+OzQWGfWbKyPjM=" moz-do-not-send="true">https://lists.mcs.anl.gov/mailman/listinfo/mochi-devel</a>
<a class="x_moz-txt-link-freetext" href="https://nam02.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.mcs.anl.gov%2Fresearch%2Fprojects%2Fmochi&data=04%7C01%7Cphilip.e.davis%40rutgers.edu%7Cd561df6d1b3a415d98a508d99d4a62df%7Cb92d2b234d35447093ff69aca6632ffe%7C1%7C0%7C637713765117067440%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=NtfbQ069ftcjDe%2F8e2kU0uGlEh5GLQkcqjuy%2FNW9kZ4%3D&reserved=0" originalsrc="https://www.mcs.anl.gov/research/projects/mochi" shash="kyH7FQK59FSodbkjA+g3O8szs0N4tzLZweR2bc6JCcRGFZ21hs1EM8ai09ECwijw5CgUqix8cFGO05FfzZhp0CavA9V0PbEKl+cfoEQRcBaV+oYIOS6q2m2jrTui7BfK8NKKOK1+fOXRYIAHYeUMpEcR4JpGVCN34GBI0YtbiBI=" moz-do-not-send="true">https://www.mcs.anl.gov/research/projects/mochi</a>
</pre>
</blockquote>
</div>
_______________________________________________<br class="">
mochi-devel mailing list<br class="">
<a href="mailto:mochi-devel@lists.mcs.anl.gov" class="moz-txt-link-freetext" moz-do-not-send="true">mochi-devel@lists.mcs.anl.gov</a><br class="">
<a class="moz-txt-link-freetext" href="https://lists.mcs.anl.gov/mailman/listinfo/mochi-devel">https://lists.mcs.anl.gov/mailman/listinfo/mochi-devel</a><br class="">
<a class="moz-txt-link-freetext" href="https://www.mcs.anl.gov/research/projects/mochi">https://www.mcs.anl.gov/research/projects/mochi</a><br class="">
</div>
</blockquote>
</div>
<br class="">
</div>
</div>
</blockquote>
</body>
</html>