<html><head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
  </head>
  <body>
    <p>I have some more interesting info to share from trying a few
      different configurations.</p>
    <p>sm (on my laptop) and ofi+gni (on theta) do not exhibit this
      behavior; they have consistent performance across RPCs.</p>
    <p>ofi+verbs (on cooley) shows the same thing you were seeing; the
      4th and 8th RPCs are slow.</p>
    <p>Based on the above, it would sound like a problem with the
      libfabric/verbs path.  But on Jerome's suggestion I tried some
      other supported transports on cooley as well.  In particular I ran
      the same benchmark (the same build in fact, I just compiled in
      support for multiple transports and cycling through them in a
      single job script with runtime options) with these combinations:</p>
    <ul>
      <li>ofi+verbs</li>
      <li>ofi+tcp</li>
      <li>ofi+sockets</li>
      <li>bmi+tcp</li>
    </ul>
    <p>All of them show the same thing!  4th and 8th RPCs are at least
      an order of magnitude slower than the other RPCs.  That was a
      surprising result.  The bmi+tcp one isn't even using libfabric at
      all, even though they are all using the same underlying hardware.<br>
    </p>
    <p>I'm not sure what to make of that yet.  Possibly something with
      threading or signalling?<br>
    </p>
    <p>thanks,</p>
    <p>-Phil<br>
    </p>
    <div class="moz-cite-prefix">On 11/2/21 2:37 PM, Philip Davis wrote:<br>
    </div>
    <blockquote type="cite" cite="mid:82DF3EFD-9A7E-414C-86F7-1894000C30E1@rutgers.edu">
      
      I’m glad you were able to reproduce it on a different system,
      thanks for letting me know. I’m not sure what the overlaps between
      Frontera and Cooley are (that aren’t shared by Summit); a quick
      look shows they are both Intel, and both FDR, but there’s probably
      more salient details.
      <div class="">
        <div><br class="">
          <blockquote type="cite" class="">
            <div class="">On Nov 2, 2021, at 2:24 PM, Phil Carns <<a href="mailto:carns@mcs.anl.gov" class="moz-txt-link-freetext" moz-do-not-send="true">carns@mcs.anl.gov</a>>
              wrote:</div>
            <br class="Apple-interchange-newline">
            <div class="">
              <div class="">
                <p class="">Ok.  Interesting.  I didn't see anything
                  unusual in the timing on my laptop with sm (other than
                  it being a bit slow, but I wasn't tuning or worrying
                  about core affinity or anything).  On Cooley, a
                  somewhat older Linux cluster with InfiniBand, I see
                  the 4th and 8th RPC delay you were talking about:</p>
                <p class="">{"name":"put_wait","cat":"CPU","ph":"X","pid":0,"tid":0,"ts":1635877047385464.750000,"dur":33077.620054,"args":{"GUID":3,"Parent
                  GUID":0}},<br class="">
{"name":"put_wait","cat":"CPU","ph":"X","pid":0,"tid":0,"ts":1635877047418850.000000,"dur":458.322054,"args":{"GUID":5,"Parent
                  GUID":0}},<br class="">
{"name":"put_wait","cat":"CPU","ph":"X","pid":0,"tid":0,"ts":1635877047419519.250000,"dur":205.328054,"args":{"GUID":7,"Parent
                  GUID":0}},<br class="">
{"name":"put_wait","cat":"CPU","ph":"X","pid":0,"tid":0,"ts":1635877047419939.500000,"dur":2916.470054,"args":{"GUID":9,"Parent
                  GUID":0}},<br class="">
{"name":"put_wait","cat":"CPU","ph":"X","pid":0,"tid":0,"ts":1635877047423046.750000,"dur":235.460054,"args":{"GUID":11,"Parent
                  GUID":0}},<br class="">
{"name":"put_wait","cat":"CPU","ph":"X","pid":0,"tid":0,"ts":1635877047423426.000000,"dur":208.722054,"args":{"GUID":13,"Parent
                  GUID":0}},<br class="">
{"name":"put_wait","cat":"CPU","ph":"X","pid":0,"tid":0,"ts":1635877047423809.000000,"dur":155.962054,"args":{"GUID":15,"Parent
                  GUID":0}},<br class="">
{"name":"put_wait","cat":"CPU","ph":"X","pid":0,"tid":0,"ts":1635877047424096.250000,"dur":3573.288054,"args":{"GUID":17,"Parent
                  GUID":0}},<br class="">
{"name":"put_wait","cat":"CPU","ph":"X","pid":0,"tid":0,"ts":1635877047427857.000000,"dur":243.386054,"args":{"GUID":19,"Parent
                  GUID":0}},<br class="">
{"name":"put_wait","cat":"CPU","ph":"X","pid":0,"tid":0,"ts":1635877047428328.000000,"dur":154.338054,"args":{"GUID":21,"Parent
                  GUID":0}},<br class="">
                </p>
                <p class="">(assuming the first is high due to
                  connection establishment)</p>
                <p class="">I'll check some other systems/transports,
                  but I wanted to go ahead and share that I've been able
                  to reproduce what you were seeing.</p>
                <p class="">thanks,</p>
                <p class="">-Phil<br class="">
                </p>
                <div class="moz-cite-prefix">On 11/2/21 1:49 PM, Philip
                  Davis wrote:<br class="">
                </div>
                <blockquote type="cite" cite="mid:35E29989-49C6-4AA0-8C41-54D97F1ACBF6@rutgers.edu" class="">
                  <div class="">Glad that’s working now.</div>
                  <div class=""><br class="">
                  </div>
                  It is the put_wait events, and “dur” is the right
                  field. Those units are microseconds.
                  <div class="">
                    <div class=""><br class="">
                    </div>
                    <div class="">
                      <div class=""><br class="">
                        <blockquote type="cite" class="">
                          <div class="">On Nov 2, 2021, at 1:12 PM, Phil
                            Carns <<a href="mailto:carns@mcs.anl.gov" class="moz-txt-link-freetext" moz-do-not-send="true">carns@mcs.anl.gov</a>>
                            wrote:</div>
                          <br class="Apple-interchange-newline">
                          <div class="">
                            <div class="">
                              <p class="">Thanks Philip, the "= {0};"
                                initialization of that struct got me
                                going.</p>
                              <p class="">I can run the test case now
                                and it is producing output in the client
                                and server perf dirs.  Just to sanity
                                check what to look for, I think the
                                problem should be exhibited in the
                                "put_wait" or maybe "do_put" trace
                                events on the client?  For example on my
                                laptop I see this:</p>
                              <p class="">carns-x1-7g
                                ~/w/d/d/m/client.perf [SIGINT]> grep
                                do_put trace_events.0.json<br class="">
{"name":"do_put","cat":"CPU","ph":"X","pid":0,"tid":0,"ts":1635872352591977.250000,"dur":350.464053,"args":{"GUID":2,"Parent
                                GUID":0}},<br class="">
{"name":"do_put","cat":"CPU","ph":"X","pid":0,"tid":0,"ts":1635872352593065.000000,"dur":36.858053,"args":{"GUID":4,"Parent
                                GUID":0}},<br class="">
{"name":"do_put","cat":"CPU","ph":"X","pid":0,"tid":0,"ts":1635872352593617.000000,"dur":32.954053,"args":{"GUID":6,"Parent
                                GUID":0}},<br class="">
{"name":"do_put","cat":"CPU","ph":"X","pid":0,"tid":0,"ts":1635872352594193.000000,"dur":36.026053,"args":{"GUID":8,"Parent
                                GUID":0}},<br class="">
{"name":"do_put","cat":"CPU","ph":"X","pid":0,"tid":0,"ts":1635872352594850.750000,"dur":34.404053,"args":{"GUID":10,"Parent
                                GUID":0}},<br class="">
{"name":"do_put","cat":"CPU","ph":"X","pid":0,"tid":0,"ts":1635872352595400.750000,"dur":33.524053,"args":{"GUID":12,"Parent
                                GUID":0}},<br class="">
{"name":"do_put","cat":"CPU","ph":"X","pid":0,"tid":0,"ts":1635872352595927.500000,"dur":34.390053,"args":{"GUID":14,"Parent
                                GUID":0}},<br class="">
{"name":"do_put","cat":"CPU","ph":"X","pid":0,"tid":0,"ts":1635872352596416.000000,"dur":37.922053,"args":{"GUID":16,"Parent
                                GUID":0}},<br class="">
{"name":"do_put","cat":"CPU","ph":"X","pid":0,"tid":0,"ts":1635872352596870.000000,"dur":35.506053,"args":{"GUID":18,"Parent
                                GUID":0}},<br class="">
{"name":"do_put","cat":"CPU","ph":"X","pid":0,"tid":0,"ts":1635872352597287.500000,"dur":34.774053,"args":{"GUID":20,"Parent
                                GUID":0}},<br class="">
                                carns-x1-7g ~/w/d/d/m/client.perf>
                                grep put_wait trace_events.0.json<br class="">
{"name":"put_wait","cat":"CPU","ph":"X","pid":0,"tid":0,"ts":1635872352592427.750000,"dur":570.428053,"args":{"GUID":3,"Parent
                                GUID":0}},<br class="">
{"name":"put_wait","cat":"CPU","ph":"X","pid":0,"tid":0,"ts":1635872352593122.750000,"dur":429.156053,"args":{"GUID":5,"Parent
                                GUID":0}},<br class="">
{"name":"put_wait","cat":"CPU","ph":"X","pid":0,"tid":0,"ts":1635872352593671.250000,"dur":465.616053,"args":{"GUID":7,"Parent
                                GUID":0}},<br class="">
{"name":"put_wait","cat":"CPU","ph":"X","pid":0,"tid":0,"ts":1635872352594248.500000,"dur":547.054053,"args":{"GUID":9,"Parent
                                GUID":0}},<br class="">
{"name":"put_wait","cat":"CPU","ph":"X","pid":0,"tid":0,"ts":1635872352594906.750000,"dur":428.964053,"args":{"GUID":11,"Parent
                                GUID":0}},<br class="">
{"name":"put_wait","cat":"CPU","ph":"X","pid":0,"tid":0,"ts":1635872352595455.750000,"dur":416.796053,"args":{"GUID":13,"Parent
                                GUID":0}},<br class="">
{"name":"put_wait","cat":"CPU","ph":"X","pid":0,"tid":0,"ts":1635872352595981.250000,"dur":371.040053,"args":{"GUID":15,"Parent
                                GUID":0}},<br class="">
{"name":"put_wait","cat":"CPU","ph":"X","pid":0,"tid":0,"ts":1635872352596485.500000,"dur":334.758053,"args":{"GUID":17,"Parent
                                GUID":0}},<br class="">
{"name":"put_wait","cat":"CPU","ph":"X","pid":0,"tid":0,"ts":1635872352596934.250000,"dur":298.168053,"args":{"GUID":19,"Parent
                                GUID":0}},<br class="">
{"name":"put_wait","cat":"CPU","ph":"X","pid":0,"tid":0,"ts":1635872352597342.250000,"dur":389.624053,"args":{"GUID":21,"Parent
                                GUID":0}},</p>
                              <p class="">I should look at the "dur"
                                field right?  What are the units on
                                that?</p>
                              <p class="">I'll see if I can run this on
                                a "real" system shortly.<br class="">
                              </p>
                              <p class="">thanks!</p>
                              <p class="">-Phil<br class="">
                              </p>
                              <div class="moz-cite-prefix">On 11/2/21
                                12:11 PM, Philip Davis wrote:<br class="">
                              </div>
                              <blockquote type="cite" cite="mid:104B5045-328B-48EE-A32C-09D25817EED7@rutgers.edu" class="">
                                Hi Phil,
                                <div class=""><br class="">
                                </div>
                                <div class="">Sorry the data structures
                                  are like that; I wanted to preserve as
                                  much of the RPC size and ordering in
                                  case it ended up being important.</div>
                                <div class=""><br class="">
                                </div>
                                <div class="">I’m surprised in.odsc.size
                                  is troublesome, as I set in.odsc.size
                                  with the line `in.odsc.size =
                                  sizeof(odsc);`. I’m not sure what
                                  could be corrupting that value in the
                                  meantime.</div>
                                <div class=""><br class="">
                                </div>
                                <div class="">I don’t
                                  set in.odsc.gdim_size (which was an
                                  oversight, since that’s non-zero
                                  normally), so I’m less surprised
                                  that’s an issue. I thought I
                                  initialized `in` to zero, but I see I
                                  didn’t do that after all.</div>
                                <div class=""><br class="">
                                </div>
                                <div class="">Maybe change the line
                                  `bulk_gdim_t in;` to `bulk_gdim_t in =
                                  {0};`</div>
                                <div class=""><br class="">
                                </div>
                                <div class=""><br class="">
                                </div>
                                <div class="">
                                  <div class=""><br class="">
                                    <blockquote type="cite" class="">
                                      <div class="">On Nov 2, 2021, at
                                        11:48 AM, Phil Carns <<a href="mailto:carns@mcs.anl.gov" class="moz-txt-link-freetext" moz-do-not-send="true">carns@mcs.anl.gov</a>>
                                        wrote:</div>
                                      <br class="Apple-interchange-newline">
                                      <div class="">
                                        <div class="">
                                          <p class="">Awesome, thanks
                                            Philip.  It came through
                                            fine.</p>
                                          <p class="">I started by
                                            modifying the job script
                                            slightly to just run it on
                                            my laptop with sm (I wanted
                                            to make sure I understood
                                            the test case, and how to
                                            use apex, before trying
                                            elsewhere).  Does in.size
                                            needs to be set in
                                            client.c?  For me there is a
                                            random value in that field
                                            and it is causing the
                                            encoder on the forward to
                                            attempt a very large
                                            allocation.  The same might
                                            be true of gdim_size if it
                                            got past that step.  I
                                            started to alter them but
                                            then I wasn't sure what the
                                            implications were.<br class="">
                                          </p>
                                          <p class="">(fwiw I needed to
                                            include stdlib.h in
                                            common.h, but I've hit that
                                            a couple of times recently
                                            on codes that didn't
                                            previously generate
                                            warnings; I think something
                                            in Ubuntu has gotten strict
                                            about that recently).</p>
                                          <p class="">thanks,</p>
                                          <p class="">-Phil<br class="">
                                          </p>
                                          <p class=""><br class="">
                                          </p>
                                          <div class="moz-cite-prefix">On
                                            11/1/21 4:51 PM, Philip
                                            Davis wrote:<br class="">
                                          </div>
                                          <blockquote type="cite" cite="mid:67D76603-B4B9-48E5-9452-966D1E4866D2@rutgers.edu" class="">
                                            <div class="" style="word-wrap:break-word;
line-break:after-white-space">
                                              Hi Phil,</div>
                                            <div class="" style="word-wrap:break-word;
line-break:after-white-space">
                                              <div class=""><br class="">
                                              </div>
                                              <div class="">I’ve
                                                attached the reproducer.
                                                I see the 4th and 8th
                                                issue on Frontera, but
                                                not Summit. Hopefully it
                                                will build and run
                                                without too much
                                                modification. Let me
                                                know if there are any
                                                issues with running it
                                                (or if the anl listserv
                                                eats the tarball, which
                                                I kind of expect).</div>
                                              <div class=""><br class="">
                                              </div>
                                              <div class="">Thanks,</div>
                                              <div class="">Philip</div>
                                              <div class=""><br class="">
                                                <div class=""><br class="">
                                                  <blockquote type="cite" class="">
                                                    <div class="">On Nov
                                                      1, 2021, at 11:14
                                                      AM, Phil Carns
                                                      <<a href="mailto:carns@mcs.anl.gov" class="moz-txt-link-freetext" moz-do-not-send="true">carns@mcs.anl.gov</a>>
                                                      wrote:</div>
                                                    <br class="x_Apple-interchange-newline">
                                                    <div class="">
                                                      <div class="">
                                                        <p class="">Hi
                                                          Philip,</p>
                                                        <p class="">(FYI
                                                          I think the
                                                          first image
                                                          didn't come
                                                          through in
                                                          your email,
                                                          but I think
                                                          the others are
                                                          sufficient to
                                                          get across
                                                          what you are
                                                          seeing)</p>
                                                        <p class="">I
                                                          don't have any
                                                          idea what
                                                          would cause
                                                          that.  The
                                                          recently
                                                          released
                                                          libfabric
                                                          1.13.2
                                                          (available in
                                                          spack from the
mochi-spack-packages repo) includes some fixes to the rxm provider that
                                                          could be
                                                          relevant to
                                                          Frontera and
                                                          Summit, but
                                                          nothing that
                                                          aligns with
                                                          what you are
                                                          observing.</p>
                                                        <p class="">If
                                                          it were later
                                                          in the
                                                          sequence (much
                                                          later) I would
                                                          speculate that
                                                          memory
                                                          allocation/deallocation
                                                          cycles were
                                                          eventually
                                                          causing a
                                                          hiccup.  We've
                                                          seen something
                                                          like that in
                                                          the past, and
                                                          it's a theory
                                                          that we could
                                                          then test with
                                                          alternative
                                                          allocators
                                                          like
                                                          jemalloc. 
                                                          That's not
                                                          memory
                                                          allocation
                                                          jitter that
                                                          early in the
                                                          run though.<br class="">
                                                        </p>
                                                        <p class="">Please
                                                          do share your
                                                          reproducer if
                                                          you don't
                                                          mind!  We can
                                                          try a few
                                                          systems here
                                                          to at least
                                                          isolate if it
                                                          is something
                                                          peculiar to
                                                          the InfiniBand
                                                          path or if
                                                          there is a
                                                          more general
                                                          problem in
                                                          Margo.</p>
                                                        <p class="">thanks,</p>
                                                        <p class="">-Phil<br class="">
                                                        </p>
                                                        <div class="x_moz-cite-prefix">On
                                                          10/29/21 3:20
                                                          PM, Philip
                                                          Davis wrote:<br class="">
                                                        </div>
                                                        <blockquote type="cite" class="">
                                                          <div class="">Hello,</div>
                                                          <div class=""><br class="">
                                                          </div>
                                                          <div class="">I
                                                          apologize in
                                                          advance for
                                                          the winding
                                                          nature of this
                                                          email. I’m not
                                                          sure how to
                                                          ask my
                                                          question
                                                          without
                                                          explaining the
                                                          story of my
                                                          results some.</div>
                                                          <div class=""><br class="">
                                                          </div>
                                                          I’m doing some
characterization of our server performance under load, and I have a
                                                          quirk of
                                                          performance
                                                          that I wanted
                                                          to run by
                                                          people to see
                                                          if they make
                                                          sense. My
                                                          testing so far
                                                          has been to
                                                          iteratively
                                                          send batches
                                                          of RPCs using
margo_iforward, and then measure the wait time until they are all
                                                          complete. On
                                                          the server
                                                          side, handling
                                                          the RPC
                                                          includes a
                                                          margo_bulk_transfer
                                                          as a pull
                                                          initiated on
                                                          the server to
                                                          pull (for now)
                                                          8 bytes. The
                                                          payload of the
                                                          RPC request is
                                                          about 500
                                                          bytes, and the
                                                          response
                                                          payload is 4
                                                          bytes.
                                                          <div class=""><br class="">
                                                          </div>
                                                          <div class="">I’ve
                                                          isolated my
                                                          results down
                                                          to one server
                                                          rank and one
                                                          client rank,
                                                          because it’s
                                                          an easier
                                                          starting point
                                                          to reason
                                                          from. Below is
                                                          a graph of
                                                          some of my
                                                          initial
                                                          results. These
                                                          results are
                                                          from Frontera.
                                                          The median
                                                          times are good
                                                          (a single RPC
                                                          takes on the
                                                          order of 10s
                                                          of
                                                          microseconds,
                                                          which seems
                                                          fantastic).
                                                          However, the
                                                          outliers are
                                                          fairly high
                                                          (note the log
                                                          scale of the
                                                          y-axis). With
                                                          only one RPC
                                                          per timestep,
                                                          for example,
                                                          there is a 30x
                                                          spread between
                                                          the median and
                                                          the max.</div>
                                                          <div class=""><br class="">
                                                          </div>
                                                          <div class=""><img id="x_126DD6C1-A420-4D70-8D1C-96320B7F54E7" src="cid:5EB5DF63-97AA-48FC-8BBE-E666E933D79F" class="" moz-do-not-send="true"></div>
                                                          <div class=""><br class="">
                                                          </div>
                                                          <div class="">I
                                                          was hoping
                                                          (expecting)
                                                          the first
                                                          timestep would
                                                          be where the
                                                          long replies
                                                          resided, but
                                                          that turned
                                                          out not to be
                                                          the case.
                                                          Below are
                                                          traces from
                                                          the 1 RPC
                                                          (blue) and 2
                                                          RPC  (orange)
                                                          per timestep
                                                          cases, 5
                                                          trials of 10
                                                          timesteps for
                                                          each case
                                                          (normalized to
                                                          fix the same
                                                          x-axis):</div>
                                                          <div class=""><br class="">
                                                          </div>
                                                          <div class=""><span id="x_cid:part1.PbJ8iWDr.ROdFhNeX@mcs.anl.gov" class=""><PastedGraphic-6.png></span></div>
                                                          <div class=""><br class="">
                                                          </div>
                                                          <div class="">What
                                                          strikes me is
                                                          how consistent
                                                          these results
                                                          are across
                                                          trials. For
                                                          the 1 RPC per
                                                          timestep case,
                                                          the 3rd and
                                                          7th timestep
                                                          are
                                                          consistently<i class=""> </i><span class="" style="font-style:normal">slow
                                                          (and the rest
                                                          are fast). For
                                                          the 2 RPC per
                                                          timestep case,
                                                          the 2nd and
                                                          4th timestep
                                                          are always
                                                          slow and
                                                          sometimes the
                                                          10th is. These
                                                          results are
                                                          repeatable
                                                          with very rare
                                                          variation.</span></div>
                                                          <div class=""><span class="" style="font-style:normal"><br class="">
                                                          </span></div>
                                                          <div class="">For
                                                          the single RPC
                                                          case, I
                                                          recorded some
                                                          timers on the
                                                          server side,
                                                          and attempted
                                                          to overlay
                                                          them with the
                                                          client side
                                                          (there is some
                                                          unknown
                                                          offset, but
                                                          probably on
                                                          the order of
                                                          10s of
                                                          microseconds
                                                          at worst,
                                                          given the
                                                          pattern):</div>
                                                          <div class=""><br class="">
                                                          </div>
                                                          <div class=""><span id="x_cid:part2.1KhzfDFv.Yq10fEh8@mcs.anl.gov" class=""><PastedGraphic-7.png></span></div>
                                                          <div class=""><br class="">
                                                          </div>
                                                          <div class="">I
                                                          blew up the
                                                          first few
                                                          timesteps of
                                                          one of the
                                                          trials:</div>
                                                          <div class=""><span id="x_cid:part3.B6Byp6pi.aa0hKH2Q@mcs.anl.gov" class=""><PastedGraphic-8.png></span></div>
                                                          <div class=""><br class="">
                                                          </div>
                                                          <div class="">The
                                                          different
                                                          colors are
                                                          different
                                                          segments of
                                                          the handler,
                                                          but there
                                                          doesn’t seem
                                                          to be anything
                                                          too
                                                          interesting
                                                          going on
                                                          inside the
                                                          handler. So it
                                                          looks like the
                                                          time is being
                                                          introduced
                                                          before the 3rd
                                                          RPC handler
                                                          starts, based
                                                          on the where
                                                          the gap
                                                          appears on the
                                                          server side.</div>
                                                          <div class=""><br class="">
                                                          </div>
                                                          <div class="">To
                                                          try and
                                                          isolate any
                                                          dataspaces-specific
                                                          behavior, I
                                                          created a pure
                                                          Margo test
                                                          case that just
                                                          sends a single
                                                          rpc of the
                                                          same size as
                                                          dataspaces
                                                          iteratively,
                                                          whre the
                                                          server side
                                                          does an 8-byte
                                                          bulk transfer
                                                          initiated by
                                                          the server,
                                                          and sends a
                                                          response. The
                                                          results are
                                                          similar,
                                                          except that it
                                                          is now the 4th
                                                          and 8th
                                                          timestep that
                                                          are slow (and
                                                          the first
                                                          timestep is
                                                          VERY long,
                                                          presumably
                                                          because rxm
                                                          communication
                                                          state is being
                                                          established.
                                                          DataSpaces has
                                                          an earlier RPC
                                                          in its init
                                                          that was
                                                          absorbing this
                                                          latency).</div>
                                                          <div class=""><br class="">
                                                          </div>
                                                          <div class="">I
                                                          got margo
                                                          profiling
                                                          results for
                                                          this test
                                                          case:</div>
                                                          <div class=""><br class="">
                                                          </div>
                                                          <div class="">```</div>
                                                          <div class="">
                                                          <div class="">3</div>
                                                          <div class="">18446744025556676964,ofi+verbs;ofi_rxm://192.168.72.245:39573</div>
                                                          <div class="">0xa2a1,term_rpc</div>
                                                          <div class="">0x27b5,put_rpc</div>
                                                          <div class="">0xd320,__shutdown__</div>
                                                          <div class="">0x27b5
,0.000206208,10165,18446744027256353016,0,0.041241646,0.000045538,0.025733232,200,18446744073709551615,286331153,0,18446744073709551615,286331153,0</div>
                                                          <div class="">0x27b5
,0;0.041241646,200.000000000, 0;</div>
                                                          <div class="">0xa2a1
,0.000009298,41633,18446744027256353016,0,0.000009298,0.000009298,0.000009298,1,18446744073709551615,286331153,0,18446744073709551615,286331153,0</div>
                                                          <div class="">0xa2a1
,0;0.000009298,1.000000000, 0;</div>
                                                          </div>
                                                          <div class="">```</div>
                                                          <div class=""><br class="">
                                                          </div>
                                                          <div class="">So
                                                          I guess my
                                                          question at
                                                          this point is,
                                                          is there any
                                                          sensible
                                                          reason why the
                                                          4th and 8th
                                                          RPC sent would
                                                          have a long
                                                          response time?
                                                          I think I’ve
                                                          cleared my
                                                          code on the
                                                          client side
                                                          and server
                                                          side, so it
                                                          appears to be
                                                          latency being
                                                          introduced by
                                                          Margo,
                                                          LibFabric,
                                                          Argobots, or
                                                          the underlying
                                                          OS. I do see
                                                          long timesteps
                                                          occasionally
                                                          after this
                                                          (perhaps every
                                                          20-30
                                                          timesteps) but
                                                          these are not
                                                          consistent.</div>
                                                          <div class=""><br class="">
                                                          </div>
                                                          <div class="">One
                                                          last detail:
                                                          this does not
                                                          happen on
                                                          Summit. On
                                                          summit, I see
                                                          about 5-7x
                                                          worse
                                                          single-RPC
                                                          performance
                                                          (250-350
                                                          microseconds
                                                          per RPC), but
                                                          without the
                                                          intermittent
                                                          long
                                                          timesteps.</div>
                                                          <div class=""><br class="">
                                                          </div>
                                                          <div class="">I
                                                          can provide
                                                          the minimal
                                                          test case if
                                                          it would be
                                                          helpful. I am
                                                          using APEX for
                                                          timing
                                                          results, and
                                                          the following
                                                          dependencies
                                                          with Spack:</div>
                                                          <div class=""><br class="">
                                                          </div>
                                                          <div class=""><a class="x_moz-txt-link-abbreviated moz-txt-link-freetext" href="mailto:argobots@1.1" moz-do-not-send="true">argobots@1.1</a>  <a class="x_moz-txt-link-abbreviated
moz-txt-link-freetext" href="mailto:json-c@0.15" moz-do-not-send="true">json-c@0.15</a>
                                                           <a class="x_moz-txt-link-abbreviated
moz-txt-link-freetext" href="mailto:libfabric@1.13.1" moz-do-not-send="true">libfabric@1.13.1</a>
                                                           <a class="x_moz-txt-link-abbreviated
moz-txt-link-freetext" href="mailto:mercury@2.0.1" moz-do-not-send="true">mercury@2.0.1</a>
                                                           <a class="moz-txt-link-freetext
x_moz-txt-link-abbreviated" href="mailto:mochi-margo@0.9.5" moz-do-not-send="true">mochi-margo@0.9.5</a>
                                                           rdma-core@20</div>
                                                          <div class=""><br class="">
                                                          </div>
                                                          <div class="">Thanks,</div>
                                                          <div class="">Philip</div>
                                                          <div class=""><br class="">
                                                          </div>
                                                          <div class=""><br class="">
                                                          </div>
                                                          <br class="">
                                                          <fieldset class="x_mimeAttachmentHeader"></fieldset>
                                                          <pre class="x_moz-quote-pre">_______________________________________________
mochi-devel mailing list
<a class="moz-txt-link-freetext x_moz-txt-link-abbreviated" href="mailto:mochi-devel@lists.mcs.anl.gov" moz-do-not-send="true">mochi-devel@lists.mcs.anl.gov</a>
<a class="x_moz-txt-link-freetext" href="https://nam02.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.mcs.anl.gov%2Fmailman%2Flistinfo%2Fmochi-devel&data=04%7C01%7Cphilip.e.davis%40rutgers.edu%7Ca14ae84bf41946615a7708d99e2e081a%7Cb92d2b234d35447093ff69aca6632ffe%7C1%7C0%7C637714743421137006%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=FsRuBITz1gZaU6RMPLdSfa3TKPCqopMc1VoOZhA1NS8%3D&reserved=0" originalsrc="https://lists.mcs.anl.gov/mailman/listinfo/mochi-devel" shash="GFVAtxlaiWSl+xXyKKr91wIrwOwn1D1FoMGHz6eelXG6C9yaZFTbgcYIfQuY0N7WjQTmJJFGeMdgPlmWL1EdGEbQUIgHem07fd/WzLqnOitfFY3BFNQNQbpHKjx5VLXd5+rXY7eoOIxUmpure/r0Xxis3d9IRWe1+9cncMx5snk=" moz-do-not-send="true">https://lists.mcs.anl.gov/mailman/listinfo/mochi-devel</a>
<a class="x_moz-txt-link-freetext" href="https://nam02.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.mcs.anl.gov%2Fresearch%2Fprojects%2Fmochi&data=04%7C01%7Cphilip.e.davis%40rutgers.edu%7Ca14ae84bf41946615a7708d99e2e081a%7Cb92d2b234d35447093ff69aca6632ffe%7C1%7C0%7C637714743421147002%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=Gb4zp0QRN5yfdkjt%2FpTcM1mrFRK3odkl5O5gmJ1xRuo%3D&reserved=0" originalsrc="https://www.mcs.anl.gov/research/projects/mochi" shash="ZciRqTT7pleGB9mPHXgEdZ0dMMOIHI8GlhCiALRv9/wXzgcTVVIxFBQtNSR1ejiE911XwJnVFE9HdB8vOXo0+PDSgVFC40Z8EDNbscVRuBXoz/f76YZlTV6DhGJFfADBbkehrINprcX55SGQUjl3QQ6lXGv7vlPmvdjCwI6kP8k=" moz-do-not-send="true">https://www.mcs.anl.gov/research/projects/mochi</a>
</pre>
                                                        </blockquote>
                                                      </div>
_______________________________________________<br class="">
                                                      mochi-devel
                                                      mailing list<br class="">
                                                      <a href="mailto:mochi-devel@lists.mcs.anl.gov" class="moz-txt-link-freetext" moz-do-not-send="true">mochi-devel@lists.mcs.anl.gov</a><br class="">
                                                      <a class="moz-txt-link-freetext" href="https://nam02.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.mcs.anl.gov%2Fmailman%2Flistinfo%2Fmochi-devel&data=04%7C01%7Cphilip.e.davis%40rutgers.edu%7Ca14ae84bf41946615a7708d99e2e081a%7Cb92d2b234d35447093ff69aca6632ffe%7C1%7C0%7C637714743421147002%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=fwq0wjfEcW1t1StE5IKFrNy1krIP7jTsGXp7qeWM72s%3D&reserved=0" originalsrc="https://lists.mcs.anl.gov/mailman/listinfo/mochi-devel" shash="NOtCfKwO57mOWFj34qtUn/X2u6JyLivvsUW+0Vhsy7WFcLIbBTZkZqbVlwACROtxqyVQ4744scEoyWjJ06lh10rIWrZhU4TE1DZuQoQ6kaVK20HgK3GvhgP8DGvA0s9v2j1xacdE+gPDdE4AdpL8VxoRcCYi9WaJYjQTuo7N10o=" moz-do-not-send="true">https://lists.mcs.anl.gov/mailman/listinfo/mochi-devel</a><br class="">
                                                      <a class="moz-txt-link-freetext" href="https://nam02.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.mcs.anl.gov%2Fresearch%2Fprojects%2Fmochi&data=04%7C01%7Cphilip.e.davis%40rutgers.edu%7Ca14ae84bf41946615a7708d99e2e081a%7Cb92d2b234d35447093ff69aca6632ffe%7C1%7C0%7C637714743421157001%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=zl1uuK5Hz%2BPXjbvlF0fNkBDOF1lYsaPXI5fF2gtZ7Zw%3D&reserved=0" originalsrc="https://www.mcs.anl.gov/research/projects/mochi" shash="ahsIn8ix6J/ZD7ASdRBxWhQ6a+k9AYaZPn3iMt99FfF9iNFs8jxELzquYPlgZZexR+xCg2/QcektttWmndkdjSbg86nhTOGDcweQNOWQ6Sf81lr0viM8tF9wmYARC0dNhr0kzZ7OloVxrLJSyalHhheSeBzD9tibFsnCFYTfpwU=" moz-do-not-send="true">https://www.mcs.anl.gov/research/projects/mochi</a><br class="">
                                                    </div>
                                                  </blockquote>
                                                </div>
                                                <br class="">
                                              </div>
                                            </div>
                                          </blockquote>
                                        </div>
                                      </div>
                                    </blockquote>
                                  </div>
                                  <br class="">
                                </div>
                              </blockquote>
                            </div>
                          </div>
                        </blockquote>
                      </div>
                      <br class="">
                    </div>
                  </div>
                </blockquote>
              </div>
            </div>
          </blockquote>
        </div>
        <br class="">
      </div>
    </blockquote>
  </body>
</html>