[mpich2-dev] mpich 1.1 beta: details of MPI_Win_fence semantics

Tue Apr 14 12:33:18 CDT 2009

The text on pg 338 of the 2.1 specification is essentially unchanged from
earlier docs, and still seems too vague. It should state explicitly that a
single instance of a call to fence both ends the previous epoch and starts
a new epoch. But supporting this seems problematic.

So, you are saying that it is valid to do the following:

[fence - 0]
[RMA operations]
[fence - 0]
[RMA operations]
[fence - 0]
[RMA operations]
[fence - 0]

Does this mean that one is required to use NOPRECEDE and NOSUCCEED in order
to avoid RMA_SYNC errors when switching to/from another synchronization
method after/before a fence? Or else must implementations do no error
checking on the synchronization primitives? This seems to be forcing
low-quality implementations.

In the new test "mixedsync" it does:

[lock]
[RMA operations]
[unlock]
[fence - 0]
[RMA operations]
[fence - 0]
<repeat>

In this case, the second fence would be followed by a lock (looping back to
the beginning).  Does this mean we must also allow lock-unlock while a
fence epoch is active? It seems that there can be little or no error
checking on synchronization primitives at all, or else a very (overly)
complex internal state model is needed. In the above case, should the
second fence start a "tentative" epoch which is then released/converted
when the lock happens? Or should it allow both lock-rma-unlock and plain
rma within the fence?
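
To make the "tentative epoch" question concrete, here is a minimal sketch of
what such a state model might look like. It is purely illustrative and is not
MPICH2's implementation: the type and function names (win_model, model_fence,
and so on) and the ASSERT_* constants are hypothetical stand-ins, and the
model tracks only the local origin side of one window, ignoring the fact that
a fence also opens an exposure epoch on every process.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-window state; NOT how MPICH2 tracks epochs. */
typedef enum {
    EPOCH_NONE,            /* no epoch open */
    EPOCH_FENCE_TENTATIVE, /* a fence returned; RMA may or may not follow */
    EPOCH_FENCE_ACTIVE,    /* RMA was issued after the last fence */
    EPOCH_LOCK             /* between lock and unlock */
} epoch_state;

/* Hypothetical stand-ins for MPI_MODE_NOPRECEDE / MPI_MODE_NOSUCCEED. */
enum { ASSERT_NOPRECEDE = 1, ASSERT_NOSUCCEED = 2 };

typedef struct { epoch_state state; } win_model;

/* Each function returns true if the call is accepted by the model and
 * false where a checking implementation might report an RMA_SYNC error. */

bool model_fence(win_model *w, int assert_flags)
{
    if (w->state == EPOCH_LOCK)
        return false; /* fence while a lock epoch is open */
    if ((assert_flags & ASSERT_NOPRECEDE) && w->state == EPOCH_FENCE_ACTIVE)
        return false; /* caller promised no preceding RMA, but some was issued */
    /* Without NOSUCCEED the fence only tentatively opens a new epoch. */
    w->state = (assert_flags & ASSERT_NOSUCCEED) ? EPOCH_NONE
                                                 : EPOCH_FENCE_TENTATIVE;
    return true;
}

bool model_rma(win_model *w)
{
    switch (w->state) {
    case EPOCH_FENCE_TENTATIVE:
        w->state = EPOCH_FENCE_ACTIVE; /* the tentative epoch becomes real */
        return true;
    case EPOCH_FENCE_ACTIVE:
    case EPOCH_LOCK:
        return true;
    default:
        return false; /* RMA outside any epoch */
    }
}

bool model_lock(win_model *w)
{
    if (w->state == EPOCH_NONE || w->state == EPOCH_FENCE_TENTATIVE) {
        w->state = EPOCH_LOCK; /* an unused tentative fence epoch is released */
        return true;
    }
    return false; /* lock after fence + RMA with no closing fence */
}

bool model_unlock(win_model *w)
{
    if (w->state != EPOCH_LOCK)
        return false;
    w->state = EPOCH_NONE;
    return true;
}
```

Under this model the "mixedsync" loop above is accepted on every iteration,
while a lock issued after fence + RMA with no closing fence is rejected. Whether
that is the behavior the standard intends is exactly the open question.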

Is there better documentation of this interaction, or better examples
showing how this should work?

thanks,
doug miller

From: "Rajeev Thakur" <thakur at mcs.anl.gov>
Sent by: mpich2-dev-bounces at mcs.anl.gov
Date: 04/14/2009 10:40 AM
To: <mpich2-dev at mcs.anl.gov>
Please respond to: mpich2-dev at mcs.anl.gov
Subject: Re: [mpich2-dev] mpich 1.1 beta: details of MPI_Win_fence semantics

Doug,
     A call to fence both completes the previous epoch (if there was one)
and starts the next epoch, as described on pg 338 of MPI 2.1. In other
words, the sequence fence-put-fence-put-fence is allowed. MPICH2 handles
this case. That is why there are the asserts MPI_MODE_NOPRECEDE and
MPI_MODE_NOSUCCEED for the user to indicate otherwise. Unless the user
passes these asserts or it is the very first fence, the implementation
should assume that a given fence can be preceded by puts/gets and followed
by puts/gets.
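
The rule in the last sentence can be written down as a small decision
function. This is only a sketch of the rule as stated in this message, not
code from MPICH2: the names fence_assumption and assume_for_fence are
hypothetical, and the FENCE_* constants are local stand-ins for
MPI_MODE_NOPRECEDE and MPI_MODE_NOSUCCEED so that the fragment compiles
without mpi.h.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for MPI_MODE_NOPRECEDE / MPI_MODE_NOSUCCEED. */
enum { FENCE_NOPRECEDE = 1, FENCE_NOSUCCEED = 2 };

/* What an implementation has to be prepared for at a given fence call. */
typedef struct {
    bool may_complete_rma; /* puts/gets may have been issued before it */
    bool may_start_rma;    /* puts/gets may be issued after it */
} fence_assumption;

fence_assumption assume_for_fence(bool is_first_fence, int assert_flags)
{
    fence_assumption a;
    /* Nothing can precede the very first fence; otherwise assume RMA did
     * precede unless the caller asserts NOPRECEDE. */
    a.may_complete_rma = !is_first_fence && !(assert_flags & FENCE_NOPRECEDE);
    /* Assume RMA may follow unless the caller asserts NOSUCCEED. */
    a.may_start_rma = !(assert_flags & FENCE_NOSUCCEED);
    return a;
}
```

For a middle fence called with assert 0, both fields come out true, which is
the fence-put-fence-put-fence case described above.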

Rajeev

> -----Original Message-----
> From: mpich2-dev-bounces at mcs.anl.gov
> [mailto:mpich2-dev-bounces at mcs.anl.gov] On Behalf Of Douglas Miller
> Sent: Tuesday, April 14, 2009 9:53 AM
> To: mpich2-dev at mcs.anl.gov
> Subject: [mpich2-dev] mpich 1.1 beta: details of
> MPI_Win_fence semantics
>
>
> Some new tests in mpich 1.1 beta use MPI_Win_fence in an unexpected (to me)
> fashion. They do the following:
>
> [fence - NOPRECEDE]
> [RMA operations]
> [fence - 0]
> [RMA operations]
> [fence - 0]
> [RMA operations]
> [fence - NOSUCCEED]
>
> I was assuming that MPI_Win_fence was *either* starting or completing an
> epoch (i.e. there were always matched pairs of fence calls), not both. But
> this usage implies that there is an expectation that a single call to fence
> can *both* end an epoch and start a new epoch. The specification is vague
> at best.
>
> The problem with the above usage is that the middle calls to fence create a
> situation where the implementation cannot be certain whether it is
> operating within a fence epoch or not. I'm not sure how to implement any
> sort of error checking to cover this case, as the user could follow a fence
> with either RMA calls or some other synchronization primitives (POST-START
> or LOCK) or even protected local access. It was my understanding that the
> ASSERT flags were meant to be hints to the implementation and not required
> by the caller for proper operation.
>
> Can you help clear this up?  Is the test wrong or are we actually required
> to handle this situation?
>
> thanks,
>
> doug miller
>
>