[mpich-discuss] apparent hydra problem

Martin Pokorny mpokorny at nrao.edu
Fri Mar 2 14:52:02 CST 2012


Dave Goodell wrote:
> On Mar 2, 2012, at 2:26 PM CST, Martin Pokorny wrote:
> 
>> Dave Goodell wrote:
>>> This causes the temp context ID to collide with a context ID used
>>> by an internal subcommunicator on half of the intercomm, and
>>> potentially to collide with a random communicator on the other
>>> half.  So it's possible to get some "cross talk" between two
>>> otherwise unrelated communicators.
>> That's conceivably applicable in my case because the involved
>> processes can be long-running, and threads with distinct
>> communicators are employed to allow writing multiple files (using
>> MPI-IO) concurrently. Is there some way I might be able to modify
>> the MPIR_Intercomm_merge_impl code to test for a context ID
>> collision (and then report this condition)?
> 
> Yes and no.  It wouldn't be hard to detect that you've intruded into
> some random comm's context ID space, but it would be very hard to
> detect that it was actually causing a problem.  Basically, the only
> crosstalk that can happen is between MPI_Allreduce operations and
> this communicator merge operation.  Are you making any allreduce
> calls?

Yes, indeed. Might I simply be able to replace those MPI_Allreduce calls
with reduce/broadcast pairs as a possible workaround?

-- 
Martin Pokorny
Software Engineer - Expanded Very Large Array
National Radio Astronomy Observatory - New Mexico Operations

