[Mochi-devel] Margo + sockets provider inexplicably slow

Phil Carns carns at mcs.anl.gov
Wed Sep 8 12:27:14 CDT 2021


Thanks Clément (both for confirming the behavior and for reaching out to 
the libfabric developers).

I was kind of waiting to see if we got a response there but no such luck 
yet :)

We are kind of stuck here because my impression is that the sockets 
provider is at this point intended mostly as a reference implementation 
(and thus won't get much tuning on fundamental things like threading and 
wait object performance).  On the other hand, the obvious replacement is 
the tcp/rxm provider stack, but the tcp provider has some notable 
deadlock bugs in the 1.13.x libfabric releases.

I know this isn't a very satisfying answer, but I think I would probably 
recommend using tcp/rxm with libfabric 1.12.x for now if you need TCP/IP 
support.

To our knowledge the other providers in the 1.13.x releases are fine.

thanks,

-Phil
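
The measurement Clément describes in the quoted thread (one small RPC with an integer argument and an integer result, sent 100 000 times) can also be timed inside the client instead of with the time command, so that startup and initialization are not counted. What follows is only a minimal sketch under stated assumptions: fake_rpc, now_seconds and mean_latency are hypothetical names, and fake_rpc is a local stand-in that a real test would replace with a function forwarding the RPC through Margo.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* for clock_gettime() */
#endif
#include <assert.h>
#include <time.h>

/* Hypothetical stand-in for one RPC round trip: takes an integer and
 * returns an integer. It does no communication at all; a real test
 * would forward the RPC to a server here. */
static int fake_rpc(int x)
{
    return x + 1;
}

/* Current monotonic (wall-clock) time in seconds. */
static double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Issue `count` calls back to back and return the mean number of
 * seconds per call. The sum of the returned values is stored in
 * *checksum so that the loop has an observable result. */
static double mean_latency(int (*rpc)(int), int count, long long *checksum)
{
    assert(count > 0);

    long long sum = 0;
    double start = now_seconds();
    for (int i = 0; i < count; i++)
        sum += rpc(i);
    double elapsed = now_seconds() - start;

    *checksum = sum;
    return elapsed / count;
}
```

Calling mean_latency(fake_rpc, 100000, &checksum) and printing the result with "%.2e" gives a figure in the same s/RPC unit as the numbers quoted in this thread. With the stand-in it only measures function-call overhead, so the value is meaningful only once a real RPC is plugged in.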


On 9/2/21 10:16 AM, Clement Barthelemy wrote:
> Ok that did the trick, I'm now reaching the same order of magnitude with the sockets provider busy-looping: 5.90e-5 s/RPC.
>
> Is there anything I can do to help fix this? Do you need a bug report or something?
>
> Clément
>
>
> ----- Original message -----
>> From: "Clement Barthelemy" <clement.barthelemy at inria.fr>
>> To: "Phil Carns" <carns at mcs.anl.gov>
>> Cc: "mochi-devel" <mochi-devel at lists.mcs.anl.gov>
>> Sent: Thursday, 2 September 2021 15:53:22
>> Subject: Re: [Mochi-devel] Margo + sockets provider inexplicably slow
>> Hi Phil,
>>
>> I wrote a test with a simple RPC that takes an integer argument and returns an
>> integer. The client sends it 100 000 times and I simply use the unix time
>> command on it. My reasoning was that this would not saturate the bandwidth and
>> I'd be able to see the latency.
>>
>> Thanks for the advice, I'll try the busy-polling and report back.
>>
>> Clément
>>
>>
>> ----- Original message -----
>>> From: "Phil Carns" <carns at mcs.anl.gov>
>>> To: "mochi-devel" <mochi-devel at lists.mcs.anl.gov>
>>> Sent: Thursday, 2 September 2021 15:15:35
>>> Subject: Re: [Mochi-devel] Margo + sockets provider inexplicably slow
>>> Hi Clément,
>>> What benchmark are you using to generate these numbers?
>>> My first guess would be a difference in polling strategy (how frequently
>>> HG_Progress() is being called, and with what timeout value), and how well the
>>> provider handles that.
>>> One quick and dirty way to test this theory would be to set
>>> margo_init_info -> hg_init_info -> na_init_info -> progress_mode to
>>> NA_NO_BLOCK before initializing margo with margo_init_ext(). There is an
>>> example that does this here:
>>> https://github.com/mochi-hpc-experiments/mochi-tests/blob/main/perf-regression/margo-p2p-latency.c#L105
>>> That's tedious from an API perspective, but the reason it might be informative
>>> as a quick hack is that it will force Mercury to busy poll on the underlying
>>> transport no matter what margo or other callers are actually asking it to do.
>>> It effectively short circuits any higher level polling strategy decisions.
>>> Depending on what that tells us, we can go from there. I suspect that the
>>> sockets provider mechanism for waiting for events (as opposed to just polling
>>> for events) might be problematic.
>>> thanks,
>>> -Phil
>>> On 9/2/21 8:29 AM, Clement Barthelemy wrote:
>>>> Hello all,
>>>> I did some latency measurements to compare Mercury and Margo with different
>>>> providers. The results are below:
>>>>                    Mercury (s/RPC)  Margo (s/RPC)
>>>> ofi+psm2            6.21e-5          5.01e-5
>>>> ofi+tcp;ofi_rxm     8.20e-5          9.55e-5
>>>> ofi+sockets         7.54e-5          2.08e-2 (!)
>>>> As you can see, Margo with the sockets provider is roughly 250 times slower
>>>> than the rest. I first suspected libfabric, but Mercury does not have the
>>>> problem. Do you know what could be causing this?
>>>> I've tested with Margo 0.9.5, Mercury 2.0.1 and libfabric 1.12.1 & 1.13.1.
>>>> Thanks,
>>>> Clément
>>>> _______________________________________________
>>>> mochi-devel mailing list
>>>> mochi-devel at lists.mcs.anl.gov
>>>> https://lists.mcs.anl.gov/mailman/listinfo/mochi-devel
>>>> https://www.mcs.anl.gov/research/projects/mochi

