<html><head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
  </head>
  <body>
    <p>Srinivasan Ramesh (U. Oregon) has done some work on fine-grained
      RPC component timing, but it's not in the mainline margo tree so
      we'll need to do a little more work to look at it.</p>
    <p>In the mean time on a hunch I found that I can make the latency
      consistent on Cooley by altering the margo_init() arguments to be
      (..., 0, -1) in server.c (meaning that no additional execution
      streams are used at all; all mercury progress and all rpc handlers
      are executed using user-level threads in the process's primary
      execution stream (OS thread).</p>
    <p>It's expected that there would be some more jitter jumping across
      OS threads for RPC handling, but it shouldn't be that extreme,
      regular, or system-specific.<br>
    </p>
    <p>Thanks again for the test case and the Apex instrumentation; this
      is the sort of thing that's normally really hard to isolate.</p>
    <p>thanks,<br>
    </p>
    <p>-Phil<br>
    </p>
    <div class="moz-cite-prefix">On 11/5/21 10:09 AM, Philip Davis
      wrote:<br>
    </div>
    <blockquote type="cite" cite="mid:B0643541-F015-4EB6-AD30-F9F85B196465@rutgers.edu">
      
      That’s extremely interesting. 
      <div class=""><br class="">
      </div>
      <div class="">Are there any internal timers in Margo that can tell
        what the delay was between the server’s progress thread queueing
        the rpc and the handler thread starting to handle it? If I’m
        understanding <font class="" color="#000000"><span style="caret-color: rgb(0, 0, 0);" class=""><a href="https://mochi.readthedocs.io/en/latest/general/03_rpc_model.html" class="moz-txt-link-freetext" moz-do-not-send="true">https://mochi.readthedocs.io/en/latest/general/03_rpc_model.html</a>
            correctly, it seems to me like that is the most likely place
            for non-deterministic delay to be introduced by argobots in
            the client -> server direction.</span></font></div>
      <div class=""><font class="" color="#000000"><span style="caret-color: rgb(0, 0, 0);" class=""><br class="">
          </span></font></div>
      <div class=""><font class="" color="#000000"><span style="caret-color: rgb(0, 0, 0);" class="">I just ran a
            quick test where I changed the number of handler threads to
            5, and I saw no change in behavior (still 4 and 8, not 5 and
            10).</span></font></div>
      <div class="">
        <div class="">
          <div><br class="">
          </div>
          <div><br class="">
            <blockquote type="cite" class="">
              <div class="">On Nov 4, 2021, at 9:04 PM, Phil Carns <<a href="mailto:carns@mcs.anl.gov" class="moz-txt-link-freetext" moz-do-not-send="true">carns@mcs.anl.gov</a>>
                wrote:</div>
              <br class="Apple-interchange-newline">
              <div class="">
                <div class="">
                  <p class="">I have some more interesting info to share
                    from trying a few different configurations.</p>
                  <p class="">sm (on my laptop) and ofi+gni (on theta)
                    do not exhibit this behavior; they have consistent
                    performance across RPCs.</p>
                  <p class="">ofi+verbs (on cooley) shows the same thing
                    you were seeing; the 4th and 8th RPCs are slow.</p>
                  <p class="">Based on the above, it would sound like a
                    problem with the libfabric/verbs path.  But on
                    Jerome's suggestion I tried some other supported
                    transports on cooley as well.  In particular I ran
                    the same benchmark (the same build in fact, I just
                    compiled in support for multiple transports and
                    cycling through them in a single job script with
                    runtime options) with these combinations:</p>
                  <ul class="">
                    <li class="">ofi+verbs</li>
                    <li class="">ofi+tcp</li>
                    <li class="">ofi+sockets</li>
                    <li class="">bmi+tcp</li>
                  </ul>
                  <p class="">All of them show the same thing!  4th and
                    8th RPCs are at least an order of magnitude slower
                    than the other RPCs.  That was a surprising result. 
                    The bmi+tcp one isn't even using libfabric at all,
                    even though they are all using the same underlying
                    hardware.<br class="">
                  </p>
                  <p class="">I'm not sure what to make of that yet. 
                    Possibly something with threading or signalling?<br class="">
                  </p>
                  <p class="">thanks,</p>
                  <p class="">-Phil<br class="">
                  </p>
                  <div class="moz-cite-prefix">On 11/2/21 2:37 PM,
                    Philip Davis wrote:<br class="">
                  </div>
                  <blockquote type="cite" cite="mid:82DF3EFD-9A7E-414C-86F7-1894000C30E1@rutgers.edu" class="">
                    I’m glad you were able to reproduce it on a
                    different system, thanks for letting me know. I’m
                    not sure what the overlaps between Frontera and
                    Cooley are (that aren’t shared by Summit); a quick
                    look shows they are both Intel, and both FDR, but
                    there’s probably more salient details.
                    <div class="">
                      <div class=""><br class="">
                        <blockquote type="cite" class="">
                          <div class="">On Nov 2, 2021, at 2:24 PM, Phil
                            Carns <<a href="mailto:carns@mcs.anl.gov" class="moz-txt-link-freetext" moz-do-not-send="true">carns@mcs.anl.gov</a>>
                            wrote:</div>
                          <br class="Apple-interchange-newline">
                          <div class="">
                            <div class="">
                              <p class="">Ok.  Interesting.  I didn't
                                see anything unusual in the timing on my
                                laptop with sm (other than it being a
                                bit slow, but I wasn't tuning or
                                worrying about core affinity or
                                anything).  On Cooley, a somewhat older
                                Linux cluster with InfiniBand, I see the
                                4th and 8th RPC delay you were talking
                                about:</p>
                              <p class="">{"name":"put_wait","cat":"CPU","ph":"X","pid":0,"tid":0,"ts":1635877047385464.750000,"dur":33077.620054,"args":{"GUID":3,"Parent
                                GUID":0}},<br class="">
{"name":"put_wait","cat":"CPU","ph":"X","pid":0,"tid":0,"ts":1635877047418850.000000,"dur":458.322054,"args":{"GUID":5,"Parent
                                GUID":0}},<br class="">
{"name":"put_wait","cat":"CPU","ph":"X","pid":0,"tid":0,"ts":1635877047419519.250000,"dur":205.328054,"args":{"GUID":7,"Parent
                                GUID":0}},<br class="">
{"name":"put_wait","cat":"CPU","ph":"X","pid":0,"tid":0,"ts":1635877047419939.500000,"dur":2916.470054,"args":{"GUID":9,"Parent
                                GUID":0}},<br class="">
{"name":"put_wait","cat":"CPU","ph":"X","pid":0,"tid":0,"ts":1635877047423046.750000,"dur":235.460054,"args":{"GUID":11,"Parent
                                GUID":0}},<br class="">
{"name":"put_wait","cat":"CPU","ph":"X","pid":0,"tid":0,"ts":1635877047423426.000000,"dur":208.722054,"args":{"GUID":13,"Parent
                                GUID":0}},<br class="">
{"name":"put_wait","cat":"CPU","ph":"X","pid":0,"tid":0,"ts":1635877047423809.000000,"dur":155.962054,"args":{"GUID":15,"Parent
                                GUID":0}},<br class="">
{"name":"put_wait","cat":"CPU","ph":"X","pid":0,"tid":0,"ts":1635877047424096.250000,"dur":3573.288054,"args":{"GUID":17,"Parent
                                GUID":0}},<br class="">
{"name":"put_wait","cat":"CPU","ph":"X","pid":0,"tid":0,"ts":1635877047427857.000000,"dur":243.386054,"args":{"GUID":19,"Parent
                                GUID":0}},<br class="">
{"name":"put_wait","cat":"CPU","ph":"X","pid":0,"tid":0,"ts":1635877047428328.000000,"dur":154.338054,"args":{"GUID":21,"Parent
                                GUID":0}},<br class="">
                              </p>
                              <p class="">(assuming the first is high
                                due to connection establishment)</p>
                              <p class="">I'll check some other
                                systems/transports, but I wanted to go
                                ahead and share that I've been able to
                                reproduce what you were seeing.</p>
                              <p class="">thanks,</p>
                              <p class="">-Phil<br class="">
                              </p>
                              <div class="moz-cite-prefix">On 11/2/21
                                1:49 PM, Philip Davis wrote:<br class="">
                              </div>
                              <blockquote type="cite" cite="mid:35E29989-49C6-4AA0-8C41-54D97F1ACBF6@rutgers.edu" class="">
                                <div class="">Glad that’s working now.</div>
                                <div class=""><br class="">
                                </div>
                                It is the put_wait events, and “dur” is
                                the right field. Those units are
                                microseconds.
                                <div class="">
                                  <div class=""><br class="">
                                  </div>
                                  <div class="">
                                    <div class=""><br class="">
                                      <blockquote type="cite" class="">
                                        <div class="">On Nov 2, 2021, at
                                          1:12 PM, Phil Carns <<a href="mailto:carns@mcs.anl.gov" class="moz-txt-link-freetext" moz-do-not-send="true">carns@mcs.anl.gov</a>>
                                          wrote:</div>
                                        <br class="Apple-interchange-newline">
                                        <div class="">
                                          <div class="">
                                            <p class="">Thanks Philip,
                                              the "= {0};"
                                              initialization of that
                                              struct got me going.</p>
                                            <p class="">I can run the
                                              test case now and it is
                                              producing output in the
                                              client and server perf
                                              dirs.  Just to sanity
                                              check what to look for, I
                                              think the problem should
                                              be exhibited in the
                                              "put_wait" or maybe
                                              "do_put" trace events on
                                              the client?  For example
                                              on my laptop I see this:</p>
                                            <p class="">carns-x1-7g
                                              ~/w/d/d/m/client.perf
                                              [SIGINT]> grep do_put
                                              trace_events.0.json<br class="">
{"name":"do_put","cat":"CPU","ph":"X","pid":0,"tid":0,"ts":1635872352591977.250000,"dur":350.464053,"args":{"GUID":2,"Parent
                                              GUID":0}},<br class="">
{"name":"do_put","cat":"CPU","ph":"X","pid":0,"tid":0,"ts":1635872352593065.000000,"dur":36.858053,"args":{"GUID":4,"Parent
                                              GUID":0}},<br class="">
{"name":"do_put","cat":"CPU","ph":"X","pid":0,"tid":0,"ts":1635872352593617.000000,"dur":32.954053,"args":{"GUID":6,"Parent
                                              GUID":0}},<br class="">
{"name":"do_put","cat":"CPU","ph":"X","pid":0,"tid":0,"ts":1635872352594193.000000,"dur":36.026053,"args":{"GUID":8,"Parent
                                              GUID":0}},<br class="">
{"name":"do_put","cat":"CPU","ph":"X","pid":0,"tid":0,"ts":1635872352594850.750000,"dur":34.404053,"args":{"GUID":10,"Parent
                                              GUID":0}},<br class="">
{"name":"do_put","cat":"CPU","ph":"X","pid":0,"tid":0,"ts":1635872352595400.750000,"dur":33.524053,"args":{"GUID":12,"Parent
                                              GUID":0}},<br class="">
{"name":"do_put","cat":"CPU","ph":"X","pid":0,"tid":0,"ts":1635872352595927.500000,"dur":34.390053,"args":{"GUID":14,"Parent
                                              GUID":0}},<br class="">
{"name":"do_put","cat":"CPU","ph":"X","pid":0,"tid":0,"ts":1635872352596416.000000,"dur":37.922053,"args":{"GUID":16,"Parent
                                              GUID":0}},<br class="">
{"name":"do_put","cat":"CPU","ph":"X","pid":0,"tid":0,"ts":1635872352596870.000000,"dur":35.506053,"args":{"GUID":18,"Parent
                                              GUID":0}},<br class="">
{"name":"do_put","cat":"CPU","ph":"X","pid":0,"tid":0,"ts":1635872352597287.500000,"dur":34.774053,"args":{"GUID":20,"Parent
                                              GUID":0}},<br class="">
                                              carns-x1-7g
                                              ~/w/d/d/m/client.perf>
                                              grep put_wait
                                              trace_events.0.json<br class="">
{"name":"put_wait","cat":"CPU","ph":"X","pid":0,"tid":0,"ts":1635872352592427.750000,"dur":570.428053,"args":{"GUID":3,"Parent
                                              GUID":0}},<br class="">
{"name":"put_wait","cat":"CPU","ph":"X","pid":0,"tid":0,"ts":1635872352593122.750000,"dur":429.156053,"args":{"GUID":5,"Parent
                                              GUID":0}},<br class="">
{"name":"put_wait","cat":"CPU","ph":"X","pid":0,"tid":0,"ts":1635872352593671.250000,"dur":465.616053,"args":{"GUID":7,"Parent
                                              GUID":0}},<br class="">
{"name":"put_wait","cat":"CPU","ph":"X","pid":0,"tid":0,"ts":1635872352594248.500000,"dur":547.054053,"args":{"GUID":9,"Parent
                                              GUID":0}},<br class="">
{"name":"put_wait","cat":"CPU","ph":"X","pid":0,"tid":0,"ts":1635872352594906.750000,"dur":428.964053,"args":{"GUID":11,"Parent
                                              GUID":0}},<br class="">
{"name":"put_wait","cat":"CPU","ph":"X","pid":0,"tid":0,"ts":1635872352595455.750000,"dur":416.796053,"args":{"GUID":13,"Parent
                                              GUID":0}},<br class="">
{"name":"put_wait","cat":"CPU","ph":"X","pid":0,"tid":0,"ts":1635872352595981.250000,"dur":371.040053,"args":{"GUID":15,"Parent
                                              GUID":0}},<br class="">
{"name":"put_wait","cat":"CPU","ph":"X","pid":0,"tid":0,"ts":1635872352596485.500000,"dur":334.758053,"args":{"GUID":17,"Parent
                                              GUID":0}},<br class="">
{"name":"put_wait","cat":"CPU","ph":"X","pid":0,"tid":0,"ts":1635872352596934.250000,"dur":298.168053,"args":{"GUID":19,"Parent
                                              GUID":0}},<br class="">
{"name":"put_wait","cat":"CPU","ph":"X","pid":0,"tid":0,"ts":1635872352597342.250000,"dur":389.624053,"args":{"GUID":21,"Parent
                                              GUID":0}},</p>
                                            <p class="">I should look at
                                              the "dur" field right? 
                                              What are the units on
                                              that?</p>
                                            <p class="">I'll see if I
                                              can run this on a "real"
                                              system shortly.<br class="">
                                            </p>
                                            <p class="">thanks!</p>
                                            <p class="">-Phil<br class="">
                                            </p>
                                            <div class="moz-cite-prefix">On
                                              11/2/21 12:11 PM, Philip
                                              Davis wrote:<br class="">
                                            </div>
                                            <blockquote type="cite" cite="mid:104B5045-328B-48EE-A32C-09D25817EED7@rutgers.edu" class="">
                                              Hi Phil,
                                              <div class=""><br class="">
                                              </div>
                                              <div class="">Sorry the
                                                data structures are like
                                                that; I wanted to
                                                preserve as much of the
                                                RPC size and ordering in
                                                case it ended up being
                                                important.</div>
                                              <div class=""><br class="">
                                              </div>
                                              <div class="">I’m
                                                surprised in.odsc.size
                                                is troublesome, as I
                                                set in.odsc.size with
                                                the line `in.odsc.size =
                                                sizeof(odsc);`. I’m not
                                                sure what could be
                                                corrupting that value in
                                                the meantime.</div>
                                              <div class=""><br class="">
                                              </div>
                                              <div class="">I don’t
                                                set in.odsc.gdim_size
                                                (which was an oversight,
                                                since that’s non-zero
                                                normally), so I’m less
                                                surprised that’s an
                                                issue. I thought I
                                                initialized `in` to
                                                zero, but I see I didn’t
                                                do that after all.</div>
                                              <div class=""><br class="">
                                              </div>
                                              <div class="">Maybe change
                                                the line `bulk_gdim_t
                                                in;` to `bulk_gdim_t in
                                                = {0};`</div>
                                              <div class=""><br class="">
                                              </div>
                                              <div class=""><br class="">
                                              </div>
                                              <div class="">
                                                <div class=""><br class="">
                                                  <blockquote type="cite" class="">
                                                    <div class="">On Nov
                                                      2, 2021, at 11:48
                                                      AM, Phil Carns
                                                      <<a href="mailto:carns@mcs.anl.gov" class="moz-txt-link-freetext" moz-do-not-send="true">carns@mcs.anl.gov</a>>
                                                      wrote:</div>
                                                    <br class="Apple-interchange-newline">
                                                    <div class="">
                                                      <div class="">
                                                        <p class="">Awesome,
                                                          thanks
                                                          Philip.  It
                                                          came through
                                                          fine.</p>
                                                        <p class="">I
                                                          started by
                                                          modifying the
                                                          job script
                                                          slightly to
                                                          just run it on
                                                          my laptop with
                                                          sm (I wanted
                                                          to make sure I
                                                          understood the
                                                          test case, and
                                                          how to use
                                                          apex, before
                                                          trying
                                                          elsewhere). 
                                                          Does in.size
                                                          needs to be
                                                          set in
                                                          client.c?  For
                                                          me there is a
                                                          random value
                                                          in that field
                                                          and it is
                                                          causing the
                                                          encoder on the
                                                          forward to
                                                          attempt a very
                                                          large
                                                          allocation. 
                                                          The same might
                                                          be true of
                                                          gdim_size if
                                                          it got past
                                                          that step.  I
                                                          started to
                                                          alter them but
                                                          then I wasn't
                                                          sure what the
                                                          implications
                                                          were.<br class="">
                                                        </p>
                                                        <p class="">(fwiw
                                                          I needed to
                                                          include
                                                          stdlib.h in
                                                          common.h, but
                                                          I've hit that
                                                          a couple of
                                                          times recently
                                                          on codes that
                                                          didn't
                                                          previously
                                                          generate
                                                          warnings; I
                                                          think
                                                          something in
                                                          Ubuntu has
                                                          gotten strict
                                                          about that
                                                          recently).</p>
                                                        <p class="">thanks,</p>
                                                        <p class="">-Phil<br class="">
                                                        </p>
                                                        <p class=""><br class="">
                                                        </p>
                                                        <div class="moz-cite-prefix">On
                                                          11/1/21 4:51
                                                          PM, Philip
                                                          Davis wrote:<br class="">
                                                        </div>
                                                        <blockquote type="cite" cite="mid:67D76603-B4B9-48E5-9452-966D1E4866D2@rutgers.edu" class="">
                                                          <div class="" style="word-wrap:break-word;
line-break:after-white-space">Hi Phil,</div>
                                                          <div class="" style="word-wrap:break-word;
line-break:after-white-space">
                                                          <div class=""><br class="">
                                                          </div>
                                                          <div class="">I’ve
                                                          attached the
                                                          reproducer. I
                                                          see the 4th
                                                          and 8th issue
                                                          on Frontera,
                                                          but not
                                                          Summit.
                                                          Hopefully it
                                                          will build and
                                                          run without
                                                          too much
                                                          modification.
                                                          Let me know if
                                                          there are any
                                                          issues with
                                                          running it (or
                                                          if the anl
                                                          listserv eats
                                                          the tarball,
                                                          which I kind
                                                          of expect).</div>
                                                          <div class=""><br class="">
                                                          </div>
                                                          <div class="">Thanks,</div>
                                                          <div class="">Philip</div>
                                                          <div class=""><br class="">
                                                          <div class=""><br class="">
                                                          <blockquote type="cite" class="">
                                                          <div class="">On
                                                          Nov 1, 2021,
                                                          at 11:14 AM,
                                                          Phil Carns
                                                          <<a href="mailto:carns@mcs.anl.gov" class="moz-txt-link-freetext" moz-do-not-send="true">carns@mcs.anl.gov</a>>
                                                          wrote:</div>
                                                          <br class="x_Apple-interchange-newline">
                                                          <div class="">
                                                          <div class="">
                                                          <p class="">Hi
                                                          Philip,</p>
                                                          <p class="">(FYI
                                                          I think the
                                                          first image
                                                          didn't come
                                                          through in
                                                          your email,
                                                          but I think
                                                          the others are
                                                          sufficient to
                                                          get across
                                                          what you are
                                                          seeing)</p>
                                                          <p class="">I
                                                          don't have any
                                                          idea what
                                                          would cause
                                                          that.  The
                                                          recently
                                                          released
                                                          libfabric
                                                          1.13.2
                                                          (available in
                                                          spack from the
mochi-spack-packages repo) includes some fixes to the rxm provider that
                                                          could be
                                                          relevant to
                                                          Frontera and
                                                          Summit, but
                                                          nothing that
                                                          aligns with
                                                          what you are
                                                          observing.</p>
                                                          <p class="">If
                                                          it were later
                                                          in the
                                                          sequence (much
                                                          later) I would
                                                          speculate that
                                                          memory
                                                          allocation/deallocation
                                                          cycles were
                                                          eventually
                                                          causing a
                                                          hiccup.  We've
                                                          seen something
                                                          like that in
                                                          the past, and
                                                          it's a theory
                                                          that we could
                                                          then test with
                                                          alternative
                                                          allocators
                                                          like
                                                          jemalloc. 
                                                          That's not
                                                          memory
                                                          allocation
                                                          jitter that
                                                          early in the
                                                          run though.<br class="">
                                                          </p>
                                                          <p class="">Please
                                                          do share your
                                                          reproducer if
                                                          you don't
                                                          mind!  We can
                                                          try a few
                                                          systems here
                                                          to at least
                                                          isolate if it
                                                          is something
                                                          peculiar to
                                                          the InfiniBand
                                                          path or if
                                                          there is a
                                                          more general
                                                          problem in
                                                          Margo.</p>
                                                          <p class="">thanks,</p>
                                                          <p class="">-Phil<br class="">
                                                          </p>
                                                          <div class="x_moz-cite-prefix">On
                                                          10/29/21 3:20
                                                          PM, Philip
                                                          Davis wrote:<br class="">
                                                          </div>
                                                          <blockquote type="cite" class="">
                                                          <div class="">Hello,</div>
                                                          <div class=""><br class="">
                                                          </div>
                                                          <div class="">I
                                                          apologize in
                                                          advance for
                                                          the winding
                                                          nature of this
                                                          email. I’m not
                                                          sure how to
                                                          ask my
                                                          question
                                                          without
                                                          explaining the
                                                          story of my
                                                          results some.</div>
                                                          <div class=""><br class="">
                                                          </div>
                                                          I’m doing some
characterization of our server performance under load, and I have a
                                                          quirk of
                                                          performance
                                                          that I wanted
                                                          to run by
                                                          people to see
                                                          if they make
                                                          sense. My
                                                          testing so far
                                                          has been to
                                                          iteratively
                                                          send batches
                                                          of RPCs using
margo_iforward, and then measure the wait time until they are all
                                                          complete. On
                                                          the server
                                                          side, handling
                                                          the RPC
                                                          includes a
                                                          margo_bulk_transfer
                                                          as a pull
                                                          initiated on
                                                          the server to
                                                          pull (for now)
                                                          8 bytes. The
                                                          payload of the
                                                          RPC request is
                                                          about 500
                                                          bytes, and the
                                                          response
                                                          payload is 4
                                                          bytes.
                                                          <div class=""><br class="">
                                                          </div>
                                                          <div class="">I’ve
                                                          isolated my
                                                          results down
                                                          to one server
                                                          rank and one
                                                          client rank,
                                                          because it’s
                                                          an easier
                                                          starting point
                                                          to reason
                                                          from. Below is
                                                          a graph of
                                                          some of my
                                                          initial
                                                          results. These
                                                          results are
                                                          from Frontera.
                                                          The median
                                                          times are good
                                                          (a single RPC
                                                          takes on the
                                                          order of 10s
                                                          of
                                                          microseconds,
                                                          which seems
                                                          fantastic).
                                                          However, the
                                                          outliers are
                                                          fairly high
                                                          (note the log
                                                          scale of the
                                                          y-axis). With
                                                          only one RPC
                                                          per timestep,
                                                          for example,
                                                          there is a 30x
                                                          spread between
                                                          the median and
                                                          the max.</div>
                                                          <div class=""><br class="">
                                                          </div>
                                                          <div class=""><img id="x_126DD6C1-A420-4D70-8D1C-96320B7F54E7" src="cid:5EB5DF63-97AA-48FC-8BBE-E666E933D79F" class="" moz-do-not-send="true"></div>
                                                          <div class=""><br class="">
                                                          </div>
                                                          <div class="">I
                                                          was hoping
                                                          (expecting)
                                                          the first
                                                          timestep would
                                                          be where the
                                                          long replies
                                                          resided, but
                                                          that turned
                                                          out not to be
                                                          the case.
                                                          Below are
                                                          traces from
                                                          the 1 RPC
                                                          (blue) and 2
                                                          RPC  (orange)
                                                          per timestep
                                                          cases, 5
                                                          trials of 10
                                                          timesteps for
                                                          each case
                                                          (normalized to
                                                          fix the same
                                                          x-axis):</div>
                                                          <div class=""><br class="">
                                                          </div>
                                                          <div class=""><span id="x_cid:part1.PbJ8iWDr.ROdFhNeX@mcs.anl.gov" class=""><PastedGraphic-6.png></span></div>
                                                          <div class=""><br class="">
                                                          </div>
                                                          <div class="">What
                                                          strikes me is
                                                          how consistent
                                                          these results
                                                          are across
                                                          trials. For
                                                          the 1 RPC per
                                                          timestep case,
                                                          the 3rd and
                                                          7th timestep
                                                          are
                                                          consistently<i class=""> </i><span class="" style="font-style:normal">slow
                                                          (and the rest
                                                          are fast). For
                                                          the 2 RPC per
                                                          timestep case,
                                                          the 2nd and
                                                          4th timestep
                                                          are always
                                                          slow and
                                                          sometimes the
                                                          10th is. These
                                                          results are
                                                          repeatable
                                                          with very rare
                                                          variation.</span></div>
                                                          <div class=""><span class="" style="font-style:normal"><br class="">
                                                          </span></div>
                                                          <div class="">For
                                                          the single RPC
                                                          case, I
                                                          recorded some
                                                          timers on the
                                                          server side,
                                                          and attempted
                                                          to overlay
                                                          them with the
                                                          client side
                                                          (there is some
                                                          unknown
                                                          offset, but
                                                          probably on
                                                          the order of
                                                          10s of
                                                          microseconds
                                                          at worst,
                                                          given the
                                                          pattern):</div>
                                                          <div class=""><br class="">
                                                          </div>
                                                          <div class=""><span id="x_cid:part2.1KhzfDFv.Yq10fEh8@mcs.anl.gov" class=""><PastedGraphic-7.png></span></div>
                                                          <div class=""><br class="">
                                                          </div>
                                                          <div class="">I
                                                          blew up the
                                                          first few
                                                          timesteps of
                                                          one of the
                                                          trials:</div>
                                                          <div class=""><span id="x_cid:part3.B6Byp6pi.aa0hKH2Q@mcs.anl.gov" class=""><PastedGraphic-8.png></span></div>
                                                          <div class=""><br class="">
                                                          </div>
                                                          <div class="">The
                                                          different
                                                          colors are
                                                          different
                                                          segments of
                                                          the handler,
                                                          but there
                                                          doesn’t seem
                                                          to be anything
                                                          too
                                                          interesting
                                                          going on
                                                          inside the
                                                          handler. So it
                                                          looks like the
                                                          time is being
                                                          introduced
                                                          before the 3rd
                                                          RPC handler
                                                          starts, based
                                                          on the where
                                                          the gap
                                                          appears on the
                                                          server side.</div>
                                                          <div class=""><br class="">
                                                          </div>
                                                          <div class="">To
                                                          try and
                                                          isolate any
                                                          dataspaces-specific
                                                          behavior, I
                                                          created a pure
                                                          Margo test
                                                          case that just
                                                          sends a single
                                                          rpc of the
                                                          same size as
                                                          dataspaces
                                                          iteratively,
                                                          whre the
                                                          server side
                                                          does an 8-byte
                                                          bulk transfer
                                                          initiated by
                                                          the server,
                                                          and sends a
                                                          response. The
                                                          results are
                                                          similar,
                                                          except that it
                                                          is now the 4th
                                                          and 8th
                                                          timestep that
                                                          are slow (and
                                                          the first
                                                          timestep is
                                                          VERY long,
                                                          presumably
                                                          because rxm
                                                          communication
                                                          state is being
                                                          established.
                                                          DataSpaces has
                                                          an earlier RPC
                                                          in its init
                                                          that was
                                                          absorbing this
                                                          latency).</div>
                                                          <div class=""><br class="">
                                                          </div>
                                                          <div class="">I
                                                          got margo
                                                          profiling
                                                          results for
                                                          this test
                                                          case:</div>
                                                          <div class=""><br class="">
                                                          </div>
                                                          <div class="">```</div>
                                                          <div class="">
                                                          <div class="">3</div>
                                                          <div class="">18446744025556676964,ofi+verbs;ofi_rxm://192.168.72.245:39573</div>
                                                          <div class="">0xa2a1,term_rpc</div>
                                                          <div class="">0x27b5,put_rpc</div>
                                                          <div class="">0xd320,__shutdown__</div>
                                                          <div class="">0x27b5
,0.000206208,10165,18446744027256353016,0,0.041241646,0.000045538,0.025733232,200,18446744073709551615,286331153,0,18446744073709551615,286331153,0</div>
                                                          <div class="">0x27b5
,0;0.041241646,200.000000000, 0;</div>
                                                          <div class="">0xa2a1
,0.000009298,41633,18446744027256353016,0,0.000009298,0.000009298,0.000009298,1,18446744073709551615,286331153,0,18446744073709551615,286331153,0</div>
                                                          <div class="">0xa2a1
,0;0.000009298,1.000000000, 0;</div>
                                                          </div>
                                                          <div class="">```</div>
                                                          <div class=""><br class="">
                                                          </div>
                                                          <div class="">So
                                                          I guess my
                                                          question at
                                                          this point is,
                                                          is there any
                                                          sensible
                                                          reason why the
                                                          4th and 8th
                                                          RPC sent would
                                                          have a long
                                                          response time?
                                                          I think I’ve
                                                          cleared my
                                                          code on the
                                                          client side
                                                          and server
                                                          side, so it
                                                          appears to be
                                                          latency being
                                                          introduced by
                                                          Margo,
                                                          LibFabric,
                                                          Argobots, or
                                                          the underlying
                                                          OS. I do see
                                                          long timesteps
                                                          occasionally
                                                          after this
                                                          (perhaps every
                                                          20-30
                                                          timesteps) but
                                                          these are not
                                                          consistent.</div>
                                                          <div class=""><br class="">
                                                          </div>
                                                          <div class="">One
                                                          last detail:
                                                          this does not
                                                          happen on
                                                          Summit. On
                                                          summit, I see
                                                          about 5-7x
                                                          worse
                                                          single-RPC
                                                          performance
                                                          (250-350
                                                          microseconds
                                                          per RPC), but
                                                          without the
                                                          intermittent
                                                          long
                                                          timesteps.</div>
                                                          <div class=""><br class="">
                                                          </div>
                                                          <div class="">I
                                                          can provide
                                                          the minimal
                                                          test case if
                                                          it would be
                                                          helpful. I am
                                                          using APEX for
                                                          timing
                                                          results, and
                                                          the following
                                                          dependencies
                                                          with Spack:</div>
                                                          <div class=""><br class="">
                                                          </div>
                                                          <div class=""><a class="moz-txt-link-freetext x_moz-txt-link-abbreviated" href="mailto:argobots@1.1" moz-do-not-send="true">argobots@1.1</a>  <a class="x_moz-txt-link-abbreviated
moz-txt-link-freetext" href="mailto:json-c@0.15" moz-do-not-send="true">json-c@0.15</a>
                                                           <a class="x_moz-txt-link-abbreviated
moz-txt-link-freetext" href="mailto:libfabric@1.13.1" moz-do-not-send="true">libfabric@1.13.1</a>
                                                           <a class="x_moz-txt-link-abbreviated
moz-txt-link-freetext" href="mailto:mercury@2.0.1" moz-do-not-send="true">mercury@2.0.1</a>
                                                           <a class="moz-txt-link-freetext
x_moz-txt-link-abbreviated" href="mailto:mochi-margo@0.9.5" moz-do-not-send="true">mochi-margo@0.9.5</a>
                                                           rdma-core@20</div>
                                                          <div class=""><br class="">
                                                          </div>
                                                          <div class="">Thanks,</div>
                                                          <div class="">Philip</div>
                                                          <div class=""><br class="">
                                                          </div>
                                                          <div class=""><br class="">
                                                          </div>
                                                          <br class="">
                                                          <fieldset class="x_mimeAttachmentHeader"></fieldset>
                                                          <pre class="x_moz-quote-pre">_______________________________________________
mochi-devel mailing list
<a class="moz-txt-link-freetext x_moz-txt-link-abbreviated" href="mailto:mochi-devel@lists.mcs.anl.gov" moz-do-not-send="true">mochi-devel@lists.mcs.anl.gov</a>
<a class="x_moz-txt-link-freetext" href="https://nam02.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.mcs.anl.gov%2Fmailman%2Flistinfo%2Fmochi-devel&data=04%7C01%7Cphilip.e.davis%40rutgers.edu%7Ce2bcb39b6d5b4cf4532708d99ff848e9%7Cb92d2b234d35447093ff69aca6632ffe%7C1%7C0%7C637716711746583958%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=ww5eZQ8j98frK5PUk7A2Vn4pL7Cq06wuK0cYn%2B%2FdCeQ%3D&reserved=0" originalsrc="https://lists.mcs.anl.gov/mailman/listinfo/mochi-devel" shash="U//wRP/ZbiFaVwLlUowAz/GyX5SbAIUR0Qd1rzZQYTR/Cg+FBAB0LV7uSIt0Zs5sW810w9UZIqlcPGWJPN9jWZykoRKiWR7BSjRYfQrIkWpzcTsiwxNs9L+HF7x7Gr8L3rK55061msbIzLgDOEXwx8O6eCpoCG7eb8OkFrn68fE=" moz-do-not-send="true">https://lists.mcs.anl.gov/mailman/listinfo/mochi-devel</a>
<a class="x_moz-txt-link-freetext" href="https://nam02.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.mcs.anl.gov%2Fresearch%2Fprojects%2Fmochi&data=04%7C01%7Cphilip.e.davis%40rutgers.edu%7Ce2bcb39b6d5b4cf4532708d99ff848e9%7Cb92d2b234d35447093ff69aca6632ffe%7C1%7C0%7C637716711746583958%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=PJqXoN2cC1EexuVnsC%2FDd%2F1hBAAx1TRw4dwMn%2BnIRh0%3D&reserved=0" originalsrc="https://www.mcs.anl.gov/research/projects/mochi" shash="ek4lwRtuuqwPAsh7L9n6vI4U2tHbYHOKR9YhXG6A/7Ak16qKccrQNekXWFGEU0KuumEnp81e8yj/nJGPpvbKq1WUQYOah8tkAmhOSaFMBM3QUfZYENeru0ze8LskRJUZEuGnd/AQtNXU0Z4dUBvc9hg+UfB7TGm0yVxeBtDoXCc=" moz-do-not-send="true">https://www.mcs.anl.gov/research/projects/mochi</a>
</pre>
                                                          </blockquote>
                                                          </div>
_______________________________________________<br class="">
                                                          mochi-devel
                                                          mailing list<br class="">
                                                          <a href="mailto:mochi-devel@lists.mcs.anl.gov" class="moz-txt-link-freetext" moz-do-not-send="true">mochi-devel@lists.mcs.anl.gov</a><br class="">
                                                          <a class="moz-txt-link-freetext" href="https://nam02.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.mcs.anl.gov%2Fmailman%2Flistinfo%2Fmochi-devel&data=04%7C01%7Cphilip.e.davis%40rutgers.edu%7Ce2bcb39b6d5b4cf4532708d99ff848e9%7Cb92d2b234d35447093ff69aca6632ffe%7C1%7C0%7C637716711746593953%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=b7VlzzDilEYHP8qT%2FvJikvIt8LXhQcpLLDlgnUdacLI%3D&reserved=0" originalsrc="https://lists.mcs.anl.gov/mailman/listinfo/mochi-devel" shash="Yv6KmtyofDUcby5UrXHw7heCU90fU26cRsWvm5YsW0KtQvfMU/1rDOBggDOfWQQ6fmmJVxRZ2Sph8JHvF4q9RuqGukEZzvoX8CJilaXEar1Su6OxxFnbohr155WH55Y/T/cR3AcBQEk1T7B+T4w6sOBh2g0BYjob41Tc9MkGdiQ=" moz-do-not-send="true">https://lists.mcs.anl.gov/mailman/listinfo/mochi-devel</a><br class="">
                                                          <a class="moz-txt-link-freetext" href="https://nam02.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.mcs.anl.gov%2Fresearch%2Fprojects%2Fmochi&data=04%7C01%7Cphilip.e.davis%40rutgers.edu%7Ce2bcb39b6d5b4cf4532708d99ff848e9%7Cb92d2b234d35447093ff69aca6632ffe%7C1%7C0%7C637716711746593953%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=o6BhPWv%2BJ2ktyAq9CM9jFgzRoVCqvt1Nd0e5tK5iFhw%3D&reserved=0" originalsrc="https://www.mcs.anl.gov/research/projects/mochi" shash="nbyPE2dy7P+FhQi1q13Wq5jpzGL5X6cjz/OAePcVfUs5CFHshqKBCimvmqpU0nqK2mfcxHfEm66CtLwJUOXlaKgmVBdv7A5cafVocPnBG/AayIfEOmI3IR3e1IxkhjED2h4RFfn8amkthcnD2E4pkUQxCL+A0xitnL4BUPDWVS8=" moz-do-not-send="true">https://www.mcs.anl.gov/research/projects/mochi</a><br class="">
                                                          </div>
                                                          </blockquote>
                                                          </div>
                                                          <br class="">
                                                          </div>
                                                          </div>
                                                        </blockquote>
                                                      </div>
                                                    </div>
                                                  </blockquote>
                                                </div>
                                                <br class="">
                                              </div>
                                            </blockquote>
                                          </div>
                                        </div>
                                      </blockquote>
                                    </div>
                                    <br class="">
                                  </div>
                                </div>
                              </blockquote>
                            </div>
                          </div>
                        </blockquote>
                      </div>
                      <br class="">
                    </div>
                  </blockquote>
                </div>
              </div>
            </blockquote>
          </div>
          <br class="">
        </div>
      </div>
    </blockquote>
  </body>
</html>