[mpich-discuss] MPI-IO and (potentially) unnecessary synchronization

Rob Ross rross at mcs.anl.gov
Thu Sep 2 12:41:42 CDT 2010


Hi,

The short (perhaps snarky) answer is that that is how the standard is  
defined.

The longer answer is that a collective close gives the implementation an
opportunity to flush caches and to aggregate and write data before the
file is closed. Current implementations rarely take advantage of this
opportunity. However, in an implementation that performs coordinated
caching of write data, close might be the place where the final cache
flush is performed, even if collective buffering isn't involved (see
A. Nisar's recent work in this area for an example).

If you really don't want any collective behavior, open with  
MPI_COMM_SELF.

Rob

On Sep 2, 2010, at 12:12 PM, burlen wrote:

> Could anyone explain why MPI_File_close must be a collective call
> when collective buffering is not used? By collective I mean that it
> blocks the progress of each process until at least all of the
> processes have entered the call.
>
> I realize my first post misunderstands the situation in a number of
> ways. To attempt to correct myself: each process that touches the
> disk must have its own file descriptor somewhere. When collective
> buffering isn't used, closing the file means each process has to
> close its local descriptor. I have noticed that MPI_File_sync is
> documented as a collective function but does not behave like one
> when collective buffering is not used. By this I mean that it
> completes before all processes have entered the call. If
> MPI_File_sync can behave this way, why wouldn't MPI_File_close do
> the same?
>
> burlen wrote:
>> In benchmarks of very large concurrent writes on Lustre using both
>> the collective buffering (cb) and non-cb APIs, I have observed that
>> for the non-cb API, asynchronism during writes can be advantageous,
>> as it tends to reduce congestion and contention. This can increase
>> throughput. However, in this case the synchronization time at
>> MPI_File_close is significant for many of the processes, as none
>> return until the slowest process enters the call. In net effect,
>> this synchronization at close ruins any advantage gained. So I
>> wonder: does MPI_File_close really require a collective
>> implementation? For instance, I could imagine a reference-counting
>> scheme where one process is designated to manage the close
>> operation. The others call MPI_File_sync (which I've observed
>> doesn't block unless it has to) and post a non-blocking 0-byte
>> message to the manager rank; then they can continue unimpeded. You
>> could perhaps remove the all-to-one communication using some sort
>> of hierarchical communication pattern. If I understand correctly,
>> such a scheme wouldn't violate consistency, because if one cares
>> about consistency then a barrier is required anyway.
>>
>> Have I misunderstood the situation?
>>
>> Thanks
>> Burlen
>>
>