[mpich-discuss] MPI-IO and (potentially) unnecessary synchronization

Rob Ross rross at mcs.anl.gov
Thu Sep 2 12:43:55 CDT 2010


Also, for clarification, your interpretation of a collective call as  
one that must "block the progress of each process until at least all  
of the processes have entered the call" is incorrect. There is no such  
constraint.
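
For example (a minimal sketch, not from the original exchange): a
collective such as MPI_Bcast may return on the root before the other
ranks have entered it; only operations described as synchronizing,
such as MPI_Barrier, guarantee that all ranks have arrived.

/* Rank 0 may leave the broadcast as soon as its buffer is safe to
 * reuse, possibly while the other ranks are still sleeping. */
#include <mpi.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    int rank, value = 42;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank != 0)
        sleep(2);             /* non-root ranks enter the collective late */

    MPI_Bcast(&value, 1, MPI_INT, 0, MPI_COMM_WORLD);
    printf("rank %d left MPI_Bcast at t=%.3f\n", rank, MPI_Wtime());

    MPI_Finalize();
    return 0;
}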

Rob

On Sep 2, 2010, at 12:41 PM, Rob Ross wrote:

> Hi,
>
> The short (perhaps snarky) answer is that that is how the standard  
> defines it.
>
> The longer answer is that this provides an opportunity for caches to  
> be flushed and data to be aggregated and written prior to closing.  
> This opportunity isn't taken advantage of very much in current  
> implementations; however, it might be (for example) the place at  
> which final cache flushing is performed in an implementation that  
> performs coordinated caching of write data, even if collective  
> buffering weren't involved (see A. Nisar's recent work in the area  
> for an example).
>
> If you really don't want any collective behavior, open with  
> MPI_COMM_SELF.
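>
> For example, a minimal sketch of that suggestion (the per-rank file
> name and the write are placeholders, not from the original post):
>
> /* Each rank opens its own file on MPI_COMM_SELF, so MPI_File_open and
>  * MPI_File_close are collective only over that single process and
>  * never wait on other ranks. */
> #include <mpi.h>
> #include <stdio.h>
>
> int main(int argc, char **argv)
> {
>     int rank, data;
>     char name[64];
>     MPI_File fh;
>
>     MPI_Init(&argc, &argv);
>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>
>     snprintf(name, sizeof(name), "out.%d", rank);  /* one file per rank */
>     MPI_File_open(MPI_COMM_SELF, name,
>                   MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);
>
>     data = rank;
>     MPI_File_write(fh, &data, 1, MPI_INT, MPI_STATUS_IGNORE);
>
>     MPI_File_close(&fh);    /* involves only this process */
>     MPI_Finalize();
>     return 0;
> }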
>
> Rob
>
> On Sep 2, 2010, at 12:12 PM, burlen wrote:
>
>> Could anyone explain why MPI_File_close must be a collective call  
>> when collective buffering is not used? By collective I mean a call  
>> that must block the progress of each process until at least all of  
>> the processes have entered the call.
>>
>> I realize my first post misunderstands the situation in a number of  
>> ways. To attempt to correct myself: each process that touches the  
>> disk must have its own file descriptor somewhere, so when collective  
>> buffering isn't used, closing the file means each process closes its  
>> own local descriptor. I have noticed that MPI_File_sync is  
>> documented as a collective function but does not behave like one  
>> when collective buffering is not used; that is, it completes before  
>> all processes have entered the call. If MPI_File_sync can behave  
>> this way, why wouldn't MPI_File_close do the same?
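>>
>> A sketch of the kind of measurement behind this observation (file
>> name, offsets, and data are placeholders): time each rank spends
>> inside MPI_File_sync versus MPI_File_close.
>>
>> #include <mpi.h>
>> #include <stdio.h>
>>
>> int main(int argc, char **argv)
>> {
>>     int rank, data;
>>     double t0, t_sync, t_close;
>>     MPI_File fh;
>>
>>     MPI_Init(&argc, &argv);
>>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>     data = rank;
>>
>>     MPI_File_open(MPI_COMM_WORLD, "shared.out",
>>                   MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);
>>     MPI_File_write_at(fh, (MPI_Offset)rank * sizeof(int), &data, 1,
>>                       MPI_INT, MPI_STATUS_IGNORE);
>>
>>     t0 = MPI_Wtime();
>>     MPI_File_sync(fh);      /* observed to return without waiting */
>>     t_sync = MPI_Wtime() - t0;
>>
>>     t0 = MPI_Wtime();
>>     MPI_File_close(&fh);    /* observed to wait for the slowest rank */
>>     t_close = MPI_Wtime() - t0;
>>
>>     printf("rank %d: sync %.3fs  close %.3fs\n", rank, t_sync, t_close);
>>     MPI_Finalize();
>>     return 0;
>> }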
>>
>> burlen wrote:
>>> In benchmarks of very large concurrent writes on Lustre using both  
>>> the collective-buffering (cb) and non-cb APIs, I have observed that  
>>> for the non-cb API asynchronism during the write can be  
>>> advantageous, as it tends to reduce congestion and contention and  
>>> so can increase throughput. However, in this case the  
>>> synchronization that occurs at MPI_File_close is significant for  
>>> many of the processes, as none return until the slowest process  
>>> enters the call. This synchronization at close in effect ruins any  
>>> advantage gained. So I wonder: does MPI_File_close really require a  
>>> collective implementation? For instance, I could imagine a  
>>> reference-counting scheme where one process is designated to manage  
>>> the close operation; the others call MPI_File_sync (which I've  
>>> observed doesn't block unless it has to), post a non-blocking  
>>> 0-byte message to the manager rank, and then continue unimpeded.  
>>> One could perhaps remove the all-to-one communication using some  
>>> sort of hierarchical, structured communication pattern. If I  
>>> understand correctly, such a scheme wouldn't violate consistency,  
>>> because anyone who cares about consistency needs a barrier anyway.
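>>>
>>> A rough sketch of the signalling pattern I have in mind, purely to
>>> illustrate the idea (the tag and structure are made up, and this is
>>> not a valid replacement for MPI_File_close, which the standard
>>> still requires every process that opened the file to call):
>>>
>>> #include <mpi.h>
>>>
>>> #define DONE_TAG 77   /* arbitrary tag chosen for this sketch */
>>>
>>> int main(int argc, char **argv)
>>> {
>>>     int rank, size, i;
>>>     MPI_Request req;
>>>
>>>     MPI_Init(&argc, &argv);
>>>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>>     MPI_Comm_size(MPI_COMM_WORLD, &size);
>>>
>>>     /* ... open, write, and MPI_File_sync would go here ... */
>>>
>>>     if (rank == 0) {
>>>         /* manager: count one 0-byte message from every other rank */
>>>         for (i = 1; i < size; i++)
>>>             MPI_Recv(NULL, 0, MPI_BYTE, MPI_ANY_SOURCE, DONE_TAG,
>>>                      MPI_COMM_WORLD, MPI_STATUS_IGNORE);
>>>         /* every rank has finished its I/O; the manager could now
>>>          * perform whatever final close/release work is needed */
>>>     } else {
>>>         /* worker: signal completion and continue unimpeded */
>>>         MPI_Isend(NULL, 0, MPI_BYTE, 0, DONE_TAG, MPI_COMM_WORLD, &req);
>>>         /* ... continue with other work ... */
>>>         MPI_Wait(&req, MPI_STATUS_IGNORE);
>>>     }
>>>
>>>     MPI_Finalize();
>>>     return 0;
>>> }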
>>>
>>> Have I misunderstood the situation?
>>>
>>> Thanks
>>> Burlen
>>>
>>
>


