[Mochi-devel] Margo + sockets provider inexplicably slow

Clement Barthelemy clement.barthelemy at inria.fr
Thu Sep 2 08:53:22 CDT 2021


Hi Phil,

I wrote a test with a simple RPC that takes an integer argument and returns an integer. The client issues the RPC 100 000 times, and I time the whole run with the Unix time command. My reasoning was that this would not saturate the bandwidth, so I would be able to see the latency.
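The measurement described above boils down to dividing a total wall-clock time by the number of calls. Below is a minimal sketch of that arithmetic in C. The function names and the `call` callback are hypothetical placeholders, not part of the original test. Because the time command reports on the whole process, initialization and shutdown are folded into its average; the sketch therefore also shows one way to time only the loop from inside the process.

```c
#include <assert.h>
#include <time.h>

/* Average seconds per RPC, given the total time taken by n_rpcs calls. */
double seconds_per_rpc(double total_seconds, long n_rpcs)
{
    assert(n_rpcs > 0);
    return total_seconds / (double)n_rpcs;
}

/* Difference between two timestamps, in seconds. */
double elapsed_seconds(struct timespec start, struct timespec end)
{
    return (double)(end.tv_sec - start.tv_sec)
           + (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}

/* Hypothetical harness: time n_rpcs invocations of `call` inside the
 * process, so that startup and teardown are excluded from the average.
 * `call` stands in for one blocking RPC round trip. */
double time_rpc_loop(void (*call)(void *), void *arg, long n_rpcs)
{
    struct timespec start, end;

    timespec_get(&start, TIME_UTC); /* C11 wall-clock timestamp */
    for (long i = 0; i < n_rpcs; i++)
        call(arg);
    timespec_get(&end, TIME_UTC);

    return seconds_per_rpc(elapsed_seconds(start, end), n_rpcs);
}
```

For instance, a total of 7.54 s over 100 000 calls corresponds to 7.54e-5 s per RPC.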

Thanks for the advice. I'll try busy polling and report back.

Clément


----- Original Message -----
> From: "Phil Carns" <carns at mcs.anl.gov>
> To: "mochi-devel" <mochi-devel at lists.mcs.anl.gov>
> Sent: Thursday, September 2, 2021 15:15:35
> Subject: Re: [Mochi-devel] Margo + sockets provider inexplicably slow

> Hi Clément,

> What benchmark are you using to generate these numbers?

> My first guess would be a difference in polling strategy (how frequently
> HG_Progress() is being called, and with what timeout value), and how well the
> provider handles that.

> One quick and dirty way to test this theory would be to set
> margo_init_info -> hg_init_info -> na_init_info -> progress_mode to
> NA_NO_BLOCK before initializing margo with margo_init_ext(). There is an
> example that does this here:

> https://github.com/mochi-hpc-experiments/mochi-tests/blob/main/perf-regression/margo-p2p-latency.c#L105

> That's tedious from an API perspective, but the reason it might be informative
> as a quick hack is that it will force Mercury to busy poll on the underlying
> transport no matter what margo or other callers are actually asking it to do.
> It effectively short-circuits any higher-level polling strategy decisions.
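A minimal sketch of the setting described above, written in C. It assumes the field nesting named in the email (margo_init_info -> hg_init_info -> na_init_info -> progress_mode). The helper name request_busy_polling and the USE_REAL_MARGO switch are hypothetical, and the stand-in definitions exist only so the sketch compiles without Margo and Mercury installed; the linked margo-p2p-latency.c example remains the authoritative reference.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#ifdef USE_REAL_MARGO
#include <margo.h> /* real margo_init_info, hg_init_info, NA_NO_BLOCK */
#else
/* Stand-ins (assumption): they mirror only the field nesting named in
 * the email. The real structs have more fields, and the real value of
 * NA_NO_BLOCK may differ from the placeholder used here. */
#define NA_NO_BLOCK 1
struct na_init_info { unsigned progress_mode; };
struct hg_init_info { struct na_init_info na_init_info; };
struct margo_init_info { struct hg_init_info *hg_init_info; };
#endif

/* Hypothetical helper: zero both structs, then ask the NA layer to
 * busy poll. `hii` must stay alive until Margo has been initialized,
 * because `mii` only stores a pointer to it. */
void request_busy_polling(struct margo_init_info *mii,
                          struct hg_init_info *hii)
{
    assert(mii != NULL && hii != NULL);
    memset(hii, 0, sizeof(*hii));
    memset(mii, 0, sizeof(*mii));
    hii->na_init_info.progress_mode = NA_NO_BLOCK;
    mii->hg_init_info = hii;
}
```

With the real headers, the filled-in margo_init_info would then be passed as the last argument to margo_init_ext() in place of the usual initialization call.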

> Depending on what that tells us, we can go from there. I suspect that the
> sockets provider's mechanism for waiting for events (as opposed to just
> polling for events) might be problematic.

> thanks,

> -Phil
> On 9/2/21 8:29 AM, Clement Barthelemy wrote:

>> Hello all,

>> I took some latency measurements to compare Mercury and Margo with different
>> providers. The results are below:

>>                   Mercury (s/RPC)  Margo (s/RPC)
>> ofi+psm2            6.21e-5          5.01e-5
>> ofi+tcp;ofi_rxm     8.20e-5          9.55e-5
>> ofi+sockets         7.54e-5          2.08e-2 (!)

>> As you can see, Margo with the sockets provider is roughly 250 times slower
>> than the rest. I first suspected libfabric, but Mercury does not have the
>> problem. Do you know what could be causing this?

>> I've tested with Margo 0.9.5, Mercury 2.0.1 and libfabric 1.12.1 & 1.13.1.

>> Thanks,

>> Clément
>> _______________________________________________
>> mochi-devel mailing list
>> mochi-devel at lists.mcs.anl.gov
>> https://lists.mcs.anl.gov/mailman/listinfo/mochi-devel
>> https://www.mcs.anl.gov/research/projects/mochi


