[mpich-discuss] fence vs. lock-unlock

Tue Jul 10 08:51:52 CDT 2012

Can you sketch out the code that you are using when you switch to passive-target mode?  Replacing fence with lock/unlock can be done in a couple of ways.

Are you using EXCLUSIVE or SHARED locks?  Are you providing any assertion hints to the library?

-Dave

On Jul 10, 2012, at 3:06 PM GMT+02:00, Jie Chen wrote:

> What I did was to add "-env MPICH_ASYNC_PROGRESS 1" when calling mpiexec. But I don't see any improvements. Did I miss anything?
> 
> Jie
> 
> 
> Date: Mon, 9 Jul 2012 01:20:39 -0500
> From: Rajeev Thakur <thakur at mcs.anl.gov>
> To: mpich-discuss at mcs.anl.gov
> Subject: Re: [mpich-discuss] fence vs. lock-unlock
> Message-ID: <68DD6AB6-ECD4-4FEE-ADE8-68F08FDEBFCD at mcs.anl.gov>
> 
> That's because, by default, progress on passive-target RMA may require the target to call an MPI function. You can set the environment variable MPICH_ASYNC_PROGRESS to enable asynchronous progress.
> 
> Rajeev
> 
> On Jul 9, 2012, at 1:10 AM, Jie Chen wrote:
> 
>> I am using one-sided communication with lock/unlock and am seeing something that I do not understand.
>> 
>> In the normal case (using fence), my code block looks something like this (rank is my process id, nproc is the total number of processes):
>> 
>> for i = 0 to nproc-1
>> -- fence
>> -- MPI_Get something from process (rank+i)%nproc
>> -- fence
>> -- computation
>> end
>> 
>> The time line for the above operations looks like the following, which is completely normal (- means computation, * means communication including fence and MPI_Get):
>> 
>> proc 0: *--------*--------*--------*--------
>> proc 1: *--------*--------*--------*--------
>> proc 2: *--------*--------*--------*--------
>> proc 3: *--------*--------*--------*--------
>> 
>> The above illustration shows perfect computation load balance, for simplicity.
>> 
>> However, when I replace the two fences with lock and unlock, the time line looks like the following:
>> 
>> proc 0: *--------**********--------*--------**********--------
>> proc 1: *--------*--------**********--------*--------
>> proc 2: *--------*******************--------*--------*--------
>> proc 3: *--------*--------**********--------**********--------
>> 
>> The problem here is that the communication sometimes takes a very long time to finish. In particular, this is attributable to the MPI_Win_unlock call, which does not return until the target process has finished one round of computation.
>> 
>> I do not understand why the unlock (or perhaps the actual data transfer) is so time-consuming. The figures here show a balanced computational work load. When the work load is not balanced, I thought the lock/unlock mechanism would be better than fence because it avoids barriers. But according to my experiments, having barriers appears to be better than having none. Is this caused by the implementation of MPI_Win_unlock or by the hardware?
>> 
>> 
>> 
>> --
>> Jie Chen
>> Mathematics and Computer Science Division
>> Argonne National Laboratory
>> Address: 9700 S Cass Ave, Bldg 240, Lemont, IL 60439
>> Phone: (630) 252-3313
>> Email: jiechen at mcs.anl.gov
>> Homepage: http://www.mcs.anl.gov/~jiechen
>> _______________________________________________
>> mpich-discuss mailing list     mpich-discuss at mcs.anl.gov
>> To manage subscription options or unsubscribe:
>> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss