[Mochi-devel] Margo + sockets provider inexplicably slow
Clement Barthelemy
clement.barthelemy at inria.fr
Thu Sep 2 09:16:15 CDT 2021
OK, that did the trick: with the sockets provider busy-looping, I'm now reaching the same order of magnitude as the other measurements, 5.90e-5 s/RPC.
Is there anything I can do to help fix this? Do you need a bug report or something?
Clément
----- Original Message -----
> From: "Clement Barthelemy" <clement.barthelemy at inria.fr>
> To: "Phil Carns" <carns at mcs.anl.gov>
> Cc: "mochi-devel" <mochi-devel at lists.mcs.anl.gov>
> Sent: Thursday, September 2, 2021 15:53:22
> Subject: Re: [Mochi-devel] Margo + sockets provider inexplicably slow
> Hi Phil,
>
> I wrote a test with a simple RPC that takes an integer argument and returns an
> integer. The client sends it 100,000 times, and I simply time the run with the
> Unix time command. My reasoning was that this would not saturate the bandwidth, so
> I'd be able to see the latency.
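The measurement described above (sequential calls, total wall time divided by the number of calls) can be sketched in C as follows. This is a minimal sketch, not the actual benchmark: fake_rpc() is a hypothetical stand-in for the real Margo round trip, and the helper names are illustrative. Timing only the loop, rather than the whole process with the Unix time command, keeps process startup and initialization out of the per-RPC figure.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <time.h>

/* Hypothetical stand-in for one RPC round trip: in the real benchmark this
 * would be a Margo call that sends an int and waits for an int back.
 * Here it just increments its argument so the sketch is self-contained. */
static int fake_rpc(int x)
{
    return x + 1;
}

/* Monotonic wall-clock time in seconds. */
static double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Issue `count` sequential calls and return the mean seconds per call.
 * Each call uses the previous result, so calls never overlap and the
 * total elapsed time divided by the count approximates per-call latency.
 * If last_result is non-NULL, the final returned value is stored there. */
double mean_latency(long count, int *last_result)
{
    assert(count > 0);
    int value = 0;
    double start = now_seconds();
    for (long i = 0; i < count; i++)
        value = fake_rpc(value);
    double elapsed = now_seconds() - start;
    if (last_result)
        *last_result = value;
    return elapsed / (double)count;
}
```

At the reported 2.08e-2 s/RPC, 100,000 sequential calls take about 2,080 s (roughly 35 minutes); at 5.90e-5 s/RPC they take about 5.9 s.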
>
> Thanks for the advice; I'll try busy polling and report back.
>
> Clément
>
>
> ----- Original Message -----
>> From: "Phil Carns" <carns at mcs.anl.gov>
>> To: "mochi-devel" <mochi-devel at lists.mcs.anl.gov>
>> Sent: Thursday, September 2, 2021 15:15:35
>> Subject: Re: [Mochi-devel] Margo + sockets provider inexplicably slow
>>
>> Hi Clément,
>>
>> What benchmark are you using to generate these numbers?
>>
>> My first guess would be a difference in polling strategy (how frequently
>> HG_Progress() is being called, and with what timeout value), and how well the
>> provider handles that.
>>
>> One quick and dirty way to test this theory would be to set
>> margo_init_info -> hg_init_info -> na_init_info -> progress_mode to
>> NA_NO_BLOCK before initializing margo with margo_init_ext(). There is an
>> example that does this here:
>>
>> https://github.com/mochi-hpc-experiments/mochi-tests/blob/main/perf-regression/margo-p2p-latency.c#L105
>>
>> That's tedious from an API perspective, but the reason it might be informative
>> as a quick hack is that it will force Mercury to busy poll on the underlying
>> transport no matter what margo or other callers are actually asking it to do.
>> It effectively short-circuits any higher-level polling strategy decisions.
>>
>> Depending on what that tells us, we can go from there. I suspect that the
>> sockets provider mechanism for waiting for events (as opposed to just polling
>> for events) might be problematic.
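The distinction between waiting for events and polling for them can be illustrated generically with POSIX poll(2). This sketch is not Mercury's or libfabric's code; wait_readable() and busy_poll() are hypothetical helpers. A zero timeout returns at once, so a caller that loops on it spins on the CPU but notices an event as soon as it is posted (analogous to the busy polling that NA_NO_BLOCK forces in Mercury). A positive timeout lets the kernel put the thread to sleep, so latency then also depends on how promptly the underlying wait mechanism wakes it.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <poll.h>
#include <unistd.h> /* pipe(), write() for callers that create the fd */

/* Check once whether fd has data to read.
 *   timeout_ms == 0 : busy-poll style, return immediately.
 *   timeout_ms  > 0 : blocking style, the kernel may put the thread to
 *                     sleep until data arrives or the timeout expires.
 * Returns 1 if readable, 0 if not, -1 on error. */
int wait_readable(int fd, int timeout_ms)
{
    assert(fd >= 0);
    struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };
    int rc = poll(&pfd, 1, timeout_ms);
    if (rc < 0)
        return -1;
    return (rc > 0 && (pfd.revents & POLLIN)) ? 1 : 0;
}

/* Busy-poll loop: keep checking with a zero timeout, never sleeping,
 * until fd is readable or max_spins checks have been made.
 * Returns the number of unsuccessful checks before success, or -1 if
 * the fd never became readable (or poll() failed). */
long busy_poll(int fd, long max_spins)
{
    for (long i = 0; i < max_spins; i++) {
        int r = wait_readable(fd, 0);
        if (r < 0)
            return -1;
        if (r == 1)
            return i;
    }
    return -1;
}
```

For example, after pipe(fds) and write(fds[1], "x", 1), busy_poll(fds[0], 5) returns 0, because the very first zero-timeout check already sees the byte.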
>>
>> thanks,
>>
>> -Phil
>> On 9/2/21 8:29 AM, Clement Barthelemy wrote:
>>>
>>> Hello all,
>>>
>>> I did some latency measurements to compare Mercury and Margo with different
>>> providers; the results are below:
>>>
>>> Provider          Mercury (s/RPC)   Margo (s/RPC)
>>> ofi+psm2          6.21e-5           5.01e-5
>>> ofi+tcp;ofi_rxm   8.20e-5           9.55e-5
>>> ofi+sockets       7.54e-5           2.08e-2 (!)
>>>
>>> As you can see, Margo with the sockets provider is roughly 250 times slower
>>> than the rest. I first suspected libfabric, but plain Mercury does not show
>>> the problem. Do you know what could be causing this?
>>>
>>> I've tested with Margo 0.9.5, Mercury 2.0.1 and libfabric 1.12.1 & 1.13.1.
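For reference, the slowdown factor can be read off the table by dividing the Margo + ofi+sockets latency by the other measurements:

```latex
\begin{aligned}
\frac{2.08\times 10^{-2}}{7.54\times 10^{-5}} &\approx 276 && \text{(Mercury, ofi+sockets)}\\
\frac{2.08\times 10^{-2}}{9.55\times 10^{-5}} &\approx 218 && \text{(Margo, ofi+tcp;ofi\_rxm)}\\
\frac{2.08\times 10^{-2}}{5.01\times 10^{-5}} &\approx 415 && \text{(Margo, ofi+psm2)}
\end{aligned}
```

Across all five other entries the factor ranges from about 218 to about 415, so "roughly 250 times" is a fair order-of-magnitude summary.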
>>>
>>> Thanks,
>>>
>>> Clément
>>> _______________________________________________
>>> mochi-devel mailing list
>>> mochi-devel at lists.mcs.anl.gov
>>> https://lists.mcs.anl.gov/mailman/listinfo/mochi-devel
>>> https://www.mcs.anl.gov/research/projects/mochi