[mpich-discuss] Using MPI_Put/Get correctly?

Grismer, Matthew J Civ USAF AFMC AFRL/RBAT Matthew.Grismer at wpafb.af.mil
Thu Dec 16 14:48:25 CST 2010


I actually tried changing to gets yesterday, still got the same error.
I need the indexed datatype because the data is irregularly spaced, and
I'm pretty confident they are set up correctly (the default version of
the code uses them fine with sends and receives, i.e. it gives the
correct answers).

Matt

-----Original Message-----
From: mpich-discuss-bounces at mcs.anl.gov
[mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of Rajeev Thakur
Sent: Thursday, December 16, 2010 3:38 PM
To: mpich-discuss at mcs.anl.gov
Subject: Re: [mpich-discuss] Using MPI_Put/Get correctly?

Puts from window memory may be OK (I am not 100% positive), but that is
not the cause of the segfault in any case. You may be able to simplify
the code by using a single vector datatype and the displacement
parameter of Put to place the data in the right location. You could
also try replacing the puts with gets and see if you still get an error.
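
For example, something along these lines (just a sketch; blk6, nblk,
src_idx, and tgt_idx are made-up names for one neighbor's point lists,
it assumes the window was created with a displacement unit of one
double-precision real, and I've used MPI_Type_contiguous for the single
block type, though a vector type would work the same way):

   ! assumes 'use mpi' (or mpif.h) is in scope
   integer :: blk6, i, ierr, nblk
   integer, allocatable :: src_idx(:), tgt_idx(:)  ! point lists for one neighbor
   integer(kind=MPI_ADDRESS_KIND) :: disp

   ! one 6-real block per Put; the target displacement places it
   call MPI_Type_contiguous(6, MPI_DOUBLE_PRECISION, blk6, ierr)
   call MPI_Type_commit(blk6, ierr)
   do i = 1, nblk
      ! QF(1,2,k) sits 6 + (k-1)*12 reals from the start of QF(6,2,N)
      disp = 6 + (tgt_idx(i) - 1)*12
      call MPI_Put(QF(1,1,src_idx(i)), 1, blk6, NEIGHBOR(1), disp, &
                   1, blk6, QFWIN, ierr)
   end do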

Rajeev

On Dec 16, 2010, at 2:27 PM, James Dinan wrote:

> Hi Matt,
> 
> If my understanding is correct, the only time you are allowed to
> perform direct load/store accesses on local data that is exposed in a
> window is when the window is closed under active target, or when you
> are in an exclusive access epoch under passive target mode.  So I
> think what you are doing may be invalid even though you are able to
> guarantee that accesses do not overlap.  The source for your put will
> need to be a private buffer; you may be able to accomplish this easily
> in your code, or you might have to copy the data into a private buffer
> (before you post the window) before you can put().
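> 
> A minimal sketch of that pattern, with sbuf, npts, and idx as
> placeholder names (and assuming double precision data):
> 
>   double precision, allocatable :: sbuf(:,:)  ! private buffer, not in any window
>   integer, allocatable :: idx(:)              ! points going to this neighbor
>   integer :: i, npts, ierr
>   integer(kind=MPI_ADDRESS_KIND) :: disp0
> 
>   allocate(sbuf(6, npts))
>   do i = 1, npts
>      sbuf(:, i) = QF(:, 1, idx(i))            ! copy before posting the window
>   end do
>   disp0 = 0
>   call MPI_Win_post(group, 0, QFWIN, ierr)
>   call MPI_Win_start(group, 0, QFWIN, ierr)
>   ! origin: 6*npts contiguous reals, matching the 6*npts reals QFREC(1) describes
>   call MPI_Put(sbuf, 6*npts, MPI_DOUBLE_PRECISION, NEIGHBOR(1), disp0, &
>                1, QFREC(1), QFWIN, ierr)
>   call MPI_Win_complete(QFWIN, ierr)
>   call MPI_Win_wait(QFWIN, ierr)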
> 
> Even though this is outside of the standard, some (many?) MPI
> implementations may actually allow this on cache-coherent systems (I
> think MPICH2 on shared memory will allow it).
> 
> I would be surprised if this error is causing your seg fault (more
> likely it would just result in corrupted data within the bounds of
> your buffer).  I would tend to suspect that something is off in your
> datatype, possibly the target datatype, since the segfault occurs in
> wait(), which is when data might be getting unpacked at the target.
> Can you run your code through a debugger or valgrind to give us more
> information on how/when the seg fault occurs?
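> 
> For example, something like this (assuming the executable is ./a.out):
> 
>   mpiexec -n 2 valgrind --track-origins=yes ./a.out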
> 
> Cheers,
> ~Jim.
> 
> On 12/16/2010 12:33 PM, Grismer, Matthew J Civ USAF AFMC AFRL/RBAT wrote:
>> I am trying to modify the communication routines in our code to use
>> MPI_Puts instead of sends and receives.  This worked fine for several
>> variable Puts, but now I have one that is causing seg faults.  Reading
>> through the MPI documentation it is not clear to me whether what I am
>> doing is permissible.  Basically, the question is this: if I have
>> defined all of an array as a window on each processor, can I PUT data
>> from that array to remote processes at the same time as the remote
>> processes are PUTing into the local copy, assuming no overlaps of any
>> of the PUTs?
>> 
>> Here are the details if that doesn't make sense.  I have a (Fortran)
>> array QF(6,2,N) on each processor, where N could be a very large
>> number (100,000).  I create a window QFWIN on the entire array on all
>> the processors.  I define MPI_Type_indexed "sending" datatypes (QFSND)
>> with block lengths of 6 that send from QF(1,1,*), and MPI_Type_indexed
>> "receiving" datatypes (QFREC) with block lengths of 6 that receive
>> into QF(1,2,*).  Here * is a non-repeating set of integers up to N.  I
>> create groups of processors that communicate, where these groups will
>> all exchange QF data, PUTing local QF(1,1,*) to remote QF(1,2,*).  So,
>> processor 1 is PUTing QF data to processors 2,3,4 at the same time
>> 2,3,4 are PUTing their QF data to 1, and so on.  Processors 2,3,4 are
>> PUTing into non-overlapping regions of QF(1,2,*) on 1, and 1 is PUTing
>> from QF(1,1,*) to 2,3,4, and so on.  So, my calls look like this on
>> each processor:
>> 
>> assertion = 0
>> call MPI_Win_post(group, assertion, QFWIN, ierr)
>> call MPI_Win_start(group, assertion, QFWIN, ierr)
>> 
>> do I=1,neighbors
>>    call MPI_Put(QF, 1, QFSND(I), NEIGHBOR(I), 0, 1, QFREC(I), QFWIN, ierr)
>> end do
>> 
>> call MPI_Win_complete(QFWIN,ierr)
>> call MPI_Win_wait(QFWIN,ierr)
>> 
>> Note I did define QFREC locally on each processor to properly
>> represent where the data was going on the remote processors.  The
>> error value ierr=0 after MPI_Win_post, MPI_Win_start, MPI_Put, and
>> MPI_Win_complete, and the code seg faults in MPI_Win_wait.
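>> 
>> In case it helps, the send/receive types are built more or less like
>> this (a sketch with placeholder names; idx holds one neighbor's
>> third-index values, and I'm assuming double precision data here):
>> 
>>   integer :: nblk, ierr
>>   integer, allocatable :: blens(:), sdisp(:), rdisp(:)
>>   nblk = size(idx)
>>   allocate(blens(nblk), sdisp(nblk), rdisp(nblk))
>>   blens = 6                    ! one block of 6 reals per point
>>   sdisp = (idx - 1)*12         ! start of QF(1,1,idx(k)), in units of reals
>>   rdisp = 6 + (idx - 1)*12     ! start of QF(1,2,idx(k))
>>   call MPI_Type_indexed(nblk, blens, sdisp, MPI_DOUBLE_PRECISION, QFSND(I), ierr)
>>   call MPI_Type_commit(QFSND(I), ierr)
>>   call MPI_Type_indexed(nblk, blens, rdisp, MPI_DOUBLE_PRECISION, QFREC(I), ierr)
>>   call MPI_Type_commit(QFREC(I), ierr)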
>> 
>> I'm using MPICH2 1.3.1 on Mac OS X 10.6.5, built with the Intel XE
>> (12.0) compilers, and running on just 2 (internal) processors of my
>> Mac Pro.  The code ran normally with this configuration up until the
>> point I put the above in.  Several other communications with MPI_Put
>> similar to the above work fine, though the windows are only on a
>> subset of the communicated array, and the origin data is being PUT
>> from part of the array that is not within the window.
>> 
>> _____________________________________________________
>> Matt
>> 

_______________________________________________
mpich-discuss mailing list
mpich-discuss at mcs.anl.gov
https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss

