[mpich-discuss] Using MPI_Put/Get correctly?

Rajeev Thakur thakur at mcs.anl.gov
Thu Dec 16 15:07:08 CST 2010


If you could send us a small test program that fails, it would be helpful.

Rajeev

On Dec 16, 2010, at 2:48 PM, Grismer, Matthew J Civ USAF AFMC AFRL/RBAT wrote:

> I actually tried changing to gets yesterday, still got the same error.
> I need the indexed datatypes because the data is irregularly spaced, and
> I'm pretty confident they are set up correctly (the default version of
> the code uses them fine with sends and receives, i.e. gives the
> correct answers).
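> 
> For what it's worth, the point-to-point version that works is roughly
> this (sketched here with the details simplified; REQS is just a
> request array of length 2*neighbors, and the tag and communicator are
> placeholders):
> 
>   do I = 1, neighbors
>      call MPI_Irecv(QF, 1, QFREC(I), NEIGHBOR(I), 0, MPI_COMM_WORLD, &
>                     REQS(2*I-1), ierr)
>      call MPI_Isend(QF, 1, QFSND(I), NEIGHBOR(I), 0, MPI_COMM_WORLD, &
>                     REQS(2*I), ierr)
>   end do
>   call MPI_Waitall(2*neighbors, REQS, MPI_STATUSES_IGNORE, ierr)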
> 
> Matt
> 
> -----Original Message-----
> From: mpich-discuss-bounces at mcs.anl.gov
> [mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of Rajeev Thakur
> Sent: Thursday, December 16, 2010 3:38 PM
> To: mpich-discuss at mcs.anl.gov
> Subject: Re: [mpich-discuss] Using MPI_Put/Get correctly?
> 
> Puts from window memory may be ok (I am not 100% positive), but that is
> not the cause of the segfault in any case. You may be able to simplify
> the code by using a single vector datatype and using the displacement
> parameter to Put to place it in the right location. You could also try
> replacing puts with gets and see if you still get an error.
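> 
> For example, the put loop could be turned into gets along these lines
> (just a sketch; it assumes the window base is QF itself, and that the
> index lists are rebuilt so QFREC describes the local receive layout
> and QFSND the layout of the source data on the remote side):
> 
>   integer(kind=MPI_ADDRESS_KIND) :: disp
>   disp = 0
>   do I = 1, neighbors
>      call MPI_Get(QF, 1, QFREC(I), NEIGHBOR(I), disp, 1, QFSND(I), &
>                   QFWIN, ierr)
>   end do
> 
> Note that the target displacement argument must be an integer of kind
> MPI_ADDRESS_KIND in the Fortran binding.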
> 
> Rajeev
> 
> On Dec 16, 2010, at 2:27 PM, James Dinan wrote:
> 
>> Hi Matt,
>> 
>> If my understanding is correct, the only time you are allowed to
>> perform direct load/store accesses on local data that is exposed in a
>> window is when the window is closed under active target, or when you
>> are in an exclusive-access epoch under passive target.  So I think
>> what you are doing may be invalid even though you are able to
>> guarantee that accesses do not overlap.  The source for your put will
>> need to be a private buffer; you may be able to accomplish this
>> easily in your code, or you might have to copy the data into a
>> private buffer before you post the window and then put() from it.
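>> 
>> Something along these lines might do it (just a sketch; QFSRC and
>> QFSND2 are hypothetical names for a private copy of the QF(:,1,:)
>> plane and an indexed datatype rebuilt relative to that copy, and QF
>> is assumed to be default REAL):
>> 
>>   real, allocatable :: QFSRC(:,:)
>>   allocate(QFSRC(6,N))
>>   QFSRC = QF(:,1,:)               ! copy out before opening the epochs
>> 
>>   call MPI_Win_post(group, 0, QFWIN, ierr)
>>   call MPI_Win_start(group, 0, QFWIN, ierr)
>>   do I = 1, neighbors
>>      call MPI_Put(QFSRC, 1, QFSND2(I), NEIGHBOR(I), &
>>                   0_MPI_ADDRESS_KIND, 1, QFREC(I), QFWIN, ierr)
>>   end do
>>   call MPI_Win_complete(QFWIN, ierr)
>>   call MPI_Win_wait(QFWIN, ierr)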
>> 
>> Even though this is outside of the standard, some (many?) MPI
>> implementations may actually allow this on cache-coherent systems (I
>> think MPICH2 on shared memory will allow it).
>> 
>> I would be surprised if this error is causing your seg fault (more
>> likely it would just result in corrupted data within the bounds of
>> your buffer).  I would tend to suspect that something is off in your
>> datatype, possibly the target datatype, since the segfault occurs in
>> wait(), which is when data might be getting unpacked at the target.
>> Can you run your code through a debugger or valgrind to give us more
>> information on how/when the seg fault occurs?
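>> 
>> For example, something like
>> 
>>   mpiexec -n 2 valgrind --track-origins=yes ./your_program
>> 
>> (with ./your_program as a placeholder for the actual executable)
>> should point at the first bad access.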
>> 
>> Cheers,
>> ~Jim.
>> 
>> On 12/16/2010 12:33 PM, Grismer, Matthew J Civ USAF AFMC AFRL/RBAT wrote:
>>> I am trying to modify the communication routines in our code to use
>>> MPI_Puts instead of sends and receives.  This worked fine for several
>>> variable Puts, but now I have one that is causing seg faults.  Reading
>>> through the MPI documentation it is not clear to me whether what I am
>>> doing is permissible or not.  Basically, the question is this: if I
>>> have defined all of an array as a window on each processor, can I PUT
>>> data from that array to remote processes at the same time as the
>>> remote processes are PUTing into the local copy, assuming none of the
>>> PUTs overlap?
>>> 
>>> Here are the details if that doesn't make sense.  I have a (Fortran)
>>> array QF(6,2,N) on each processor, where N could be a very large
>>> number (100,000).  I create a window QFWIN on the entire array on all
>>> the processors.  I define MPI_Type_indexed "sending" datatypes (QFSND)
>>> with block lengths of 6 that send from QF(1,1,*), and MPI_Type_indexed
>>> "receiving" datatypes (QFREC) with block lengths of 6 that receive
>>> into QF(1,2,*).  Here * is a non-repeating set of integers up to N.  I
>>> create groups of processors that communicate, where these groups will
>>> all exchange QF data, PUTing local QF(1,1,*) to remote QF(1,2,*).  So,
>>> processor 1 is PUTing QF data to processors 2,3,4 at the same time
>>> 2,3,4 are PUTing their QF data to 1, and so on.  Processors 2,3,4 are
>>> PUTing into non-overlapping regions of QF(1,2,*) on 1, and 1 is PUTing
>>> from QF(1,1,*) to 2,3,4, and so on.  So, my calls look like this on
>>> each processor:
>>> 
>>> assertion = 0
>>> call MPI_Win_post(group, assertion, QFWIN, ierr)
>>> call MPI_Win_start(group, assertion, QFWIN, ierr)
>>> 
>>> do I=1,neighbors
>>>   call MPI_Put(QF, 1, QFSND(I), NEIGHBOR(I), 0, 1, QFREC(I), QFWIN, ierr)
>>> end do
>>> 
>>> call MPI_Win_complete(QFWIN,ierr)
>>> call MPI_Win_wait(QFWIN,ierr)
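>>> 
>>> For reference, the window and the "sending" datatypes are set up
>>> along these lines (sketched; SENDIDX, NSEND, BLOCKL, and DISPL are
>>> placeholder names, the communicator may differ, and QF is assumed to
>>> be default REAL):
>>> 
>>> integer(kind=MPI_ADDRESS_KIND) :: winsize
>>> 
>>> call MPI_Type_size(MPI_REAL, sizeofreal, ierr)
>>> winsize = 6_MPI_ADDRESS_KIND * 2 * N * sizeofreal
>>> call MPI_Win_create(QF, winsize, sizeofreal, MPI_INFO_NULL, &
>>>                     MPI_COMM_WORLD, QFWIN, ierr)
>>> 
>>> do k = 1, NSEND(I)
>>>   BLOCKL(k) = 6
>>>   ! QF(1,1,j) sits 12*(j-1) reals past QF(1,1,1)
>>>   DISPL(k)  = 12*(SENDIDX(k,I) - 1)
>>> end do
>>> call MPI_Type_indexed(NSEND(I), BLOCKL, DISPL, MPI_REAL, QFSND(I), ierr)
>>> call MPI_Type_commit(QFSND(I), ierr)
>>> 
>>> The "receiving" datatypes QFREC are built the same way except with
>>> displacements of 12*(j-1) + 6, so the blocks land in QF(1,2,j).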
>>> 
>>> Note I did define QFREC locally on each processor to properly
>>> represent where the data is going on the remote processors.  The
>>> error value ierr=0 after MPI_Win_post, MPI_Win_start, MPI_Put, and
>>> MPI_Win_complete, and the code seg faults in MPI_Win_wait.
>>> 
>>> I'm using MPICH2 1.3.1 on Mac OS X 10.6.5, built with Intel XE (12.0)
>>> compilers, and running on just 2 (internal) processors of my Mac Pro.
>>> The code ran normally with this configuration up until the point I
>>> put the above in.  Several other communications with MPI_Put similar
>>> to the above work fine, though those windows are only on a subset of
>>> the communicated array, and the origin data is being PUT from part of
>>> the array that is not within the window.
>>> 
>>> _____________________________________________________
>>> Matt
>>> 


