[mpich-discuss] Using MPI_Put/Get correctly?

Rajeev Thakur thakur at mcs.anl.gov
Thu Dec 16 14:38:16 CST 2010


Puts from window memory may be ok (I am not 100% positive), but that is not the cause of the segfault in any case. You may be able to simplify the code by using a single vector datatype and using the displacement parameter to Put to place it in the right location. You could also try replacing puts with gets and see if you still get an error.
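To illustrate the displacement idea, a rough sketch (not tested; QFBLK, NBLK, and TDISP are hypothetical names, and it assumes each neighbor's target blocks are evenly spaced in the last index of QF so one resized type covers them):

    integer :: qfblk, qfblks, ierr
    integer(kind=MPI_ADDRESS_KIND) :: tdisp
    ! one 6-real block, resized to the 12-real (96-byte) stride of QF(6,2,N)
    call MPI_Type_contiguous(6, MPI_REAL8, qfblk, ierr)
    call MPI_Type_create_resized(qfblk, 0_MPI_ADDRESS_KIND, &
                                 96_MPI_ADDRESS_KIND, qfblks, ierr)
    call MPI_Type_commit(qfblks, ierr)
    do I = 1, neighbors
       ! note: in Fortran, target_disp must be of kind MPI_ADDRESS_KIND
       tdisp = TDISP(I)
       call MPI_Put(QF, NBLK(I), qfblks, NEIGHBOR(I), tdisp, &
                    NBLK(I), qfblks, QFWIN, ierr)
    end do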

Rajeev

On Dec 16, 2010, at 2:27 PM, James Dinan wrote:

> Hi Matt,
> 
> If my understanding is correct, the only time you are allowed to perform direct load/store accesses on local data that is exposed in a window is when no exposure epoch is open under active target, or when you hold an exclusive access epoch under passive target mode.  So I think what you are doing may be invalid even though you can guarantee that accesses do not overlap.  The source for your put will need to be a private buffer; depending on your code this may be easy to arrange, or you might have to copy the data into a private buffer before you post the window and then put() from that buffer.
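> For example, a minimal sketch of the staging-buffer approach (QFBUF, NSEND, and SENDIDX are hypothetical names):
> 
>   real*8 QFBUF(6,NSEND)                        ! private memory, not in any window
>   QFBUF(:,1:NSEND) = QF(:,1,SENDIDX(1:NSEND))  ! gather the QF(1,1,*) blocks
>   call MPI_Win_post(group, 0, QFWIN, ierr)
>   call MPI_Win_start(group, 0, QFWIN, ierr)
>   call MPI_Put(QFBUF, 6*NSEND, MPI_REAL8, ...) ! origin buffer is now private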
> 
> Even though this is outside of the standard, some (many?) MPI implementations may actually allow this on cache-coherent systems (I think MPICH2 on shared memory will allow it).
> 
> I would be surprised if this error is causing your seg fault (more likely it would just corrupt data within the bounds of your buffer).  I would tend to suspect that something is off in your datatypes, most likely the target datatype, since the segfault occurs in wait(), which is when data may be getting unpacked at the target.  Can you run your code through a debugger or valgrind to give us more information on how/when the seg fault occurs?
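> For instance, something along these lines (exact flags will vary with your setup):
> 
>   mpiexec -n 2 valgrind --track-origins=yes ./your_app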
> 
> Cheers,
> ~Jim.
> 
> On 12/16/2010 12:33 PM, Grismer, Matthew J Civ USAF AFMC AFRL/RBAT wrote:
>> I am trying to modify the communication routines in our code to use
>> MPI_Puts instead of sends and receives.  This worked fine for Puts of
>> several variables, but now I have one that is causing seg faults.
>> Reading through the MPI documentation it is not clear to me whether
>> what I am doing is permissible.  Basically, the question is this: if
>> I have defined an entire array as a window on each processor, can I
>> PUT data from that array to remote processes at the same time as the
>> remote processes are PUTing into the local copy, assuming none of the
>> PUTs overlap?
>> 
>> Here are the details if that doesn't make sense.  I have a (Fortran)
>> array QF(6,2,N) on each processor, where N could be a very large number
>> (100,000). I create a window QFWIN on the entire array on all the
>> processors.  I define MPI_Type_indexed "sending" datatypes (QFSND) with
>> block lengths of 6 that send from QF(1,1,*), and MPI_Type_indexed
>> "receiving" datatypes (QFREC) with block lengths of 6 that receive into
>> QF(1,2,*).  Here * is a non-repeating set of integers up to N.  I create
>> groups of processors that communicate, where these groups will all
>> exchange QF data, PUTing local QF(1,1,*) to remote QF(1,2,*).  So,
>> processor 1 is PUTing QF data to processors 2,3,4 at the same time 2,3,4
>> are putting their QF data to 1, and so on.  Processors 2,3,4 are PUTing
>> into non-overlapping regions of QF(1,2,*) on 1, and 1 is PUTing from
>> QF(1,1,*) to 2,3,4, and so on.  So, my calls look like this on each
>> processor:
>> 
>> assertion = 0
>> call MPI_Win_post(group, assertion, QFWIN, ierr)
>> call MPI_Win_start(group, assertion, QFWIN, ierr)
>> 
>> do I=1,neighbors
>>   call MPI_Put(QF, 1, QFSND(I), NEIGHBOR(I), 0, 1, QFREC(I), QFWIN, ierr)
>> end do
>> 
>> call MPI_Win_complete(QFWIN,ierr)
>> call MPI_Win_wait(QFWIN,ierr)
>> 
>> Note I did define QFREC locally on each processor to properly represent
>> where the data was going on the remote processors.  The error value
>> ierr=0 after MPI_Win_post, MPI_Win_start, MPI_Put, and MPI_Win_complete,
>> and the code seg faults in MPI_Win_wait.
>> 
>> I'm using MPICH2 1.3.1 on Mac OS X 10.6.5, built with Intel XE (12.0)
>> compilers, and running on just 2 (internal) processors of my Mac Pro.
>> The code ran normally with this configuration up until the point I put
>> the above in.  Several other communications with MPI_Put similar to the
>> above work fine, though the windows are only on a subset of the
>> communicated array, and the origin data is being PUT from part of the
>> array that is not within the window.
>> 
>> _____________________________________________________
>> Matt
>> 
>> _______________________________________________
>> mpich-discuss mailing list
>> mpich-discuss at mcs.anl.gov
>> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss


