[mpich-discuss] Using MPI_Put/Get correctly?

Grismer, Matthew J Civ USAF AFMC AFRL/RBAT Matthew.Grismer at wpafb.af.mil
Thu Dec 16 15:35:21 CST 2010


I attached to the running processes with gdb, and got the following
error when the code died:

Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason:  13 at address 0x00000000000
0x000000010040a5e5 in MPID_Segment_blkidx_m2m ()

if that is any help at all...

-----Original Message-----
From: mpich-discuss-bounces at mcs.anl.gov
[mailto:mpich-discuss-bounces at mcs.anl.gov] On Behalf Of James Dinan
Sent: Thursday, December 16, 2010 3:28 PM
To: mpich-discuss at mcs.anl.gov
Subject: Re: [mpich-discuss] Using MPI_Put/Get correctly?

Hi Matt,

If my understanding is correct, the only time you are allowed to perform
direct load/store accesses on local data that is exposed in a window is
when the window is closed under active target mode or when you are in an
exclusive access epoch under passive target mode.  So I think what you
are doing may be invalid even though you are able to guarantee that
accesses do not overlap.  The source for your put will need to be a
private buffer.  You may be able to accomplish this easily in your code,
or you might have to copy the data into a private buffer (before you
post the window) before you can put().
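
A minimal sketch of the copy-then-put approach described above, assuming
exactly two ranks and a small N.  The name QFPRIV, the value N = 8, and
the index set (every k from 1 to N) are hypothetical choices made for
illustration; QF, QFSND, QFREC and QFWIN follow the names in the quoted
message.  Copying the whole array keeps the QFSND displacements valid at
the cost of extra memory; a smaller buffer would need its own datatype.

```fortran
program put_from_private_copy
  use mpi
  implicit none
  integer, parameter :: N = 8          ! small stand-in for the real N
  double precision :: QF(6,2,N)        ! exposed through the window
  double precision :: QFPRIV(6,2,N)    ! hypothetical private copy, never in a window
  integer :: blocklens(N), displs(N)
  integer :: QFSND, QFREC, QFWIN
  integer :: world_group, group, ranks(1)
  integer :: rank, nprocs, peer, dblsize, ierr, k
  integer(kind=MPI_ADDRESS_KIND) :: winsize, tdisp

  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
  call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)
  if (nprocs /= 2) call MPI_Abort(MPI_COMM_WORLD, 1, ierr)
  peer = 1 - rank
  QF = dble(rank)

  ! Displacements are in units of the old type (column-major layout):
  ! QF(1,1,k) starts at element 12*(k-1), QF(1,2,k) at 12*(k-1)+6.
  blocklens = 6
  do k = 1, N
     displs(k) = 12*(k-1)
  end do
  call MPI_Type_indexed(N, blocklens, displs, MPI_DOUBLE_PRECISION, QFSND, ierr)
  call MPI_Type_commit(QFSND, ierr)
  do k = 1, N
     displs(k) = 12*(k-1) + 6
  end do
  call MPI_Type_indexed(N, blocklens, displs, MPI_DOUBLE_PRECISION, QFREC, ierr)
  call MPI_Type_commit(QFREC, ierr)

  ! Expose all of QF; the window size is given in bytes.
  call MPI_Type_size(MPI_DOUBLE_PRECISION, dblsize, ierr)
  winsize = int(size(QF), MPI_ADDRESS_KIND) * dblsize
  call MPI_Win_create(QF, winsize, dblsize, MPI_INFO_NULL, &
                      MPI_COMM_WORLD, QFWIN, ierr)

  ! Group containing only the peer rank.
  call MPI_Comm_group(MPI_COMM_WORLD, world_group, ierr)
  ranks(1) = peer
  call MPI_Group_incl(world_group, 1, ranks, group, ierr)

  ! Copy before the window is posted, then put from the private copy.
  QFPRIV = QF

  call MPI_Win_post(group, 0, QFWIN, ierr)
  call MPI_Win_start(group, 0, QFWIN, ierr)
  tdisp = 0    ! the target displacement argument has kind MPI_ADDRESS_KIND
  call MPI_Put(QFPRIV, 1, QFSND, peer, tdisp, 1, QFREC, QFWIN, ierr)
  call MPI_Win_complete(QFWIN, ierr)
  call MPI_Win_wait(QFWIN, ierr)

  ! Once the exposure epoch is closed, QF may be read locally again.
  if (any(QF(:,2,:) /= dble(peer))) print *, 'rank', rank, ': unexpected data'

  call MPI_Win_free(QFWIN, ierr)
  call MPI_Type_free(QFSND, ierr)
  call MPI_Type_free(QFREC, ierr)
  call MPI_Group_free(group, ierr)
  call MPI_Group_free(world_group, ierr)
  call MPI_Finalize(ierr)
end program put_from_private_copy
```

This is meant to be launched on two processes (for example with
mpiexec -n 2); it is only an illustration of the buffer handling, not a
diagnosis of the seg fault.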

Even though this is outside of the standard, some (many?) MPI 
implementations may actually allow this on cache-coherent systems (I 
think MPICH2 on shared memory will allow it).

I would be surprised if this error is causing your seg fault (more
likely it would just result in corrupted data within the bounds of your
buffer).  I would tend to suspect that something is off in your
datatype, possibly the target datatype, since the seg fault occurs in
wait(), which is when data might be getting unpacked at the target.  Can
you run your code through a debugger or valgrind to give us more
information on how/when the seg fault occurs?
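
One possible sanity check on the target datatype, sketched under the
assumption that the target window covers a QF(6,2,N) array of the same
shape as the local one.  The index list idx and the sizes N and nblk are
hypothetical; the check uses MPI_Type_get_extent to compare the bounds
of the datatype against the window size in bytes.

```fortran
program check_type_bounds
  use mpi
  implicit none
  integer, parameter :: N = 8       ! hypothetical third dimension of QF
  integer, parameter :: nblk = 3    ! hypothetical number of blocks
  integer :: blocklens(nblk), displs(nblk), idx(nblk)
  integer :: QFREC, dblsize, ierr, k
  integer(kind=MPI_ADDRESS_KIND) :: lb, extent, winsize

  call MPI_Init(ierr)

  ! Hypothetical non-repeating set of third-dimension indices.
  idx = (/ 2, 5, 8 /)
  blocklens = 6
  do k = 1, nblk
     ! Element offset of QF(1,2,idx(k)) from QF(1,1,1).
     displs(k) = 12*(idx(k)-1) + 6
  end do
  call MPI_Type_indexed(nblk, blocklens, displs, MPI_DOUBLE_PRECISION, &
                        QFREC, ierr)
  call MPI_Type_commit(QFREC, ierr)

  ! Size in bytes of a window covering all of QF(6,2,N).
  call MPI_Type_size(MPI_DOUBLE_PRECISION, dblsize, ierr)
  winsize = int(6*2*N, MPI_ADDRESS_KIND) * dblsize

  ! lb and lb+extent bound the bytes the datatype can touch,
  ! measured from the target displacement (0 here).
  call MPI_Type_get_extent(QFREC, lb, extent, ierr)
  if (lb < 0 .or. lb + extent > winsize) then
     print *, 'QFREC reaches outside the window: lb =', lb, ' extent =', extent
  else
     print *, 'QFREC fits inside the window'
  end if

  call MPI_Type_free(QFREC, ierr)
  call MPI_Finalize(ierr)
end program check_type_bounds
```

A check like this only catches displacements that fall outside the
window; it says nothing about overlaps between blocks or about the
origin datatype.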

Cheers,
  ~Jim.

On 12/16/2010 12:33 PM, Grismer, Matthew J Civ USAF AFMC AFRL/RBAT wrote:
> I am trying to modify the communication routines in our code to use
> MPI_Put's instead of sends and receives.  This worked fine for several
> variable Put's, but now I have one that is causing seg faults. Reading
> through the MPI documentation it is not clear to me if what I am doing
> is permissible or not.  Basically, the question is this - if I have
> defined all of an array as a window on each processor, can I PUT data
> from that array to remote processes at the same time as the remote
> processes are PUTing into the local copy, assuming no overlaps of any of
> the PUTs?
>
> Here are the details if that doesn't make sense.  I have a (Fortran)
> array QF(6,2,N) on each processor, where N could be a very large number
> (100,000).  I create a window QFWIN on the entire array on all the
> processors.  I define MPI_Type_indexed "sending" datatypes (QFSND) with
> block lengths of 6 that send from QF(1,1,*), and MPI_Type_indexed
> "receiving" datatypes (QFREC) with block lengths of 6 that receive into
> QF(1,2,*).  Here * is a non-repeating set of integers up to N.  I create
> groups of processors that communicate, where these groups will all
> exchange QF data, PUTing local QF(1,1,*) to remote QF(1,2,*).  So,
> processor 1 is PUTing QF data to processors 2,3,4 at the same time 2,3,4
> are PUTing their QF data to 1, and so on.  Processors 2,3,4 are PUTing
> into non-overlapping regions of QF(1,2,*) on 1, and 1 is PUTing from
> QF(1,1,*) to 2,3,4, and so on.  So, my calls look like this on each
> processor:
>
> assertion = 0
> call MPI_Win_post(group, assertion, QFWIN, ierr)
> call MPI_Win_start(group, assertion, QFWIN, ierr)
>
> do I=1,neighbors
>    call MPI_Put(QF, 1, QFSND(I), NEIGHBOR(I), 0, 1, QFREC(I), QFWIN, ierr)
> end do
>
> call MPI_Win_complete(QFWIN,ierr)
> call MPI_Win_wait(QFWIN,ierr)
>
> Note I did define QFREC locally on each processor to properly represent
> where the data was going on the remote processors.  The error value
> ierr=0 after MPI_Win_post, MPI_Win_start, MPI_Put, and MPI_Win_complete,
> and the code seg faults in MPI_Win_wait.
>
> I'm using MPICH2 1.3.1 on Mac OS X 10.6.5, built with Intel XE (12.0)
> compilers, and running on just 2 (internal) processors of my Mac Pro.
> The code ran normally with this configuration up until the point I put
> the above in.  Several other communications with MPI_Put similar to the
> above work fine, though the windows are only on a subset of the
> communicated array, and the origin data is being PUT from part of the
> array that is not within the window.
>
> _____________________________________________________
> Matt
>
> _______________________________________________
> mpich-discuss mailing list
> mpich-discuss at mcs.anl.gov
> https://lists.mcs.anl.gov/mailman/listinfo/mpich-discuss
