[mpich-discuss] MPI-IO and (potentially) unnecessary synchronization

Wed Sep 1 12:38:14 CDT 2010

In benchmarks of very large concurrent writes on Lustre using both the cb
and non-cb APIs, I have observed that with the non-cb API, asynchronism
during the write can be advantageous because it tends to reduce congestion
and contention, which can increase throughput. However, in this case the
synchronization time at MPI_File_close is significant for
many of the processes, as none return until the slowest process enters
the call. In net effect, this synchronization at close ruins any
advantage gained. So I wonder: does MPI_File_close really require a
collective implementation? For instance, I could imagine a reference
counting scheme where one process is designated to manage the close
operation; the others call MPI_File_sync (which I've observed doesn't block
unless it has to), post a non-blocking 0-byte message to the manager
rank, and then continue unimpeded. You could perhaps remove the all-to-one
communication by using some sort of hierarchically structured
communication pattern. If I understand correctly, such a scheme wouldn't
violate consistency, because if one cares about it then a barrier is
required anyway.

Have I misunderstood the situation?

Thanks
Burlen