[mpich-discuss] apparent hydra problem
Martin Pokorny
mpokorny at nrao.edu
Fri Mar 2 14:52:02 CST 2012
Dave Goodell wrote:
> On Mar 2, 2012, at 2:26 PM CST, Martin Pokorny wrote:
>
>> Dave Goodell wrote:
>>> This causes the temp context ID to collide with a context ID used
>>> by an internal subcommunicator on half of the intercomm, and
>>> potentially to collide with a random communicator on the other
>>> half. So it's possible to get some "cross talk" between two
>>> otherwise unrelated communicators.
>> That's conceivably applicable in my case because the involved
>> processes can be long-running, and threads with distinct
>> communicators are employed to allow writing multiple files (using
>> MPI-IO) concurrently. Is there some way I might be able to modify
>> the MPIR_Intercomm_merge_impl code to test for a context ID
>> collision (and then report this condition)?
>
> Yes and no. It wouldn't be hard to detect that you've intruded into
> some random comm's context ID space, but it would be very hard to
> detect that it was actually causing a problem. Basically, the only
> crosstalk that can happen is between MPI_Allreduce operations and
> this communicator merge operation. Are you making any allreduce
> calls?
Yes, indeed. Might I simply be able to replace that call with a
reduce/broadcast pair as a possible workaround?
--
Martin Pokorny
Software Engineer - Expanded Very Large Array
National Radio Astronomy Observatory - New Mexico Operations