[MPICH] MPI_REDUCE with MPI_IN_PLACE fails with memory error

Tue Mar 13 15:04:48 CDT 2007

Hi, thanks to all of you for your comments!

On Tue, 13 Mar 2007, Rajeev Thakur wrote:

>In my opinion, there is no problem with the use of MPI_IN_PLACE on non-root
>nodes in the example below, because the recvbuf argument is not significant
>on non-root nodes. You can pass NULL or any garbage or even MPI_IN_PLACE
>there.

Does this mean that mpich will allocate its own buffer regardless of
what is passed as recvbuf?

In another message, Rajeev Thakur wrote:

>Since a reduction operation has two operands, the MPI implementation
>needs to allocate memory to store the second operand received from
>another process. MPI_IN_PLACE only means that the root's contribution
>to the reduction is to be found in its recvbuf instead of sendbuf.

OK, understood.
I rewrote my code: I identified a vector which can easily be swapped
to disk, use this vector as the recvbuf argument, and reread it from
disk afterwards:

#########################
#ifdef PARALLEL
c     use vecf1 as scratch and reread later!
         vecf1 = 0.0d0
         call MPI_Reduce(vecf2(1), vecf1(1),
     $           n*nneue,
     $           MPI_double_precision, MPI_SUM, 0,
     $           MPI_Comm_World, MPIerr)
         if (myid .eq. 0) then
            vecf2 = vecf1
         endif
c     reread vecf1 from disk
      .....
#endif
##########################

I was hoping that this would eliminate the need for allocation of
scratch space by mpich (no opportunity for a large test yet - the
cluster is full and, as always, problems show up in the largest
calculations only ;) But if I interpret the first quote right, this is
not the case, because mpich ignores the recvbuf argument on the
non-root nodes...?

>You can avoid this problem by calling MPI_Reduce a few times on smaller
>amounts of data, not the entire buffer at once. Call it 3 times for 200
>MB each, or whatever works.

As users tend to push everything to the limit, this is not a good
option. The calculation which caused my problem was run by a user and
consumed 800 MB for integrals and 2*600 MB for vectors on a 2 GB node,
so there is no room for allocating an additional buffer...
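
For reference, the chunked variant suggested in the quote above might
look roughly like the following sketch. It is untested at scale and
rests on an assumption taken from the quotes: that the temporary buffer
mpich allocates scales with the count passed to each MPI_Reduce call,
so that smaller blocks need a smaller buffer. The program name, the toy
sizes ntot and nblk, the array vec and the scalar dummy are all made up
for illustration; dummy only serves as recvbuf on the non-root ranks,
where that argument is not significant.

```fortran
      program chunkred
      implicit none
      include 'mpif.h'
c     ntot, nblk, vec and dummy are hypothetical names and toy sizes;
c     in the real code vec would be vecf2 and ntot would be n*nneue.
      integer ntot, nblk
      parameter (ntot = 1000, nblk = 300)
      double precision vec(ntot), dummy
      integer myid, nproc, ierr, ioff, nlen, i

      call MPI_Init(ierr)
      call MPI_Comm_rank(MPI_COMM_WORLD, myid, ierr)
      call MPI_Comm_size(MPI_COMM_WORLD, nproc, ierr)

c     every rank contributes 1.0 in every element
      do i = 1, ntot
         vec(i) = 1.0d0
      enddo

c     reduce the vector block by block; the last block may be shorter
      do ioff = 1, ntot, nblk
         nlen = min(nblk, ntot - ioff + 1)
         if (myid .eq. 0) then
c           root: its contribution is taken from recvbuf, in place
            call MPI_Reduce(MPI_IN_PLACE, vec(ioff), nlen,
     $           MPI_DOUBLE_PRECISION, MPI_SUM, 0,
     $           MPI_COMM_WORLD, ierr)
         else
c           non-root: recvbuf is not significant, pass a dummy
            call MPI_Reduce(vec(ioff), dummy, nlen,
     $           MPI_DOUBLE_PRECISION, MPI_SUM, 0,
     $           MPI_COMM_WORLD, ierr)
         endif
      enddo

c     on the root every element should now equal the number of ranks
      if (myid .eq. 0) then
         print *, 'vec(1) =', vec(1), ' expected ', dble(nproc)
      endif

      call MPI_Finalize(ierr)
      end
```

If the assumption holds, the extra memory per call would be on the
order of nblk*8 bytes instead of the full 600 MB, at the price of more
MPI_Reduce calls. For a step that is executed once every few hours
that overhead may be acceptable.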

   ...martin

>
>> -----Original Message-----
>> From: owner-mpich-discuss at mcs.anl.gov 
>> [mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of Anthony Chan
>> Sent: Tuesday, March 13, 2007 12:15 PM
>> To: Rusty Lusk
>> Cc: Martin Kleinschmidt; mpich-discuss at mcs.anl.gov
>> Subject: Re: [MPICH] MPI_REDUCE with MPI_IN_PLACE fails with 
>> memory error
>> 
>> 
>> The error checking profiling library should be able to detect 
>> the problem.
>> 
>> A.Chan
>> 
>> On Tue, 13 Mar 2007, Rusty Lusk wrote:
>> 
>> > I believe our error-checking profiling library detects this error,
>> > doesn't it?
>> >
>> > On Mar 13, 2007, at 11:08 AM, Anthony Chan wrote:
>> >
>> > >
>> > > We have no access to intel 9.x compilers yet, can't directly
>> > > verify the reported problem.  However you may be using
>> > > MPI_IN_PLACE incorrectly in non-root process.
>> > >
>> > > On Tue, 13 Mar 2007, Martin Kleinschmidt wrote:
>> > >
>> > >> The corresponding lines of code are:
>> > >>
>> > >> #################
>> > >> #ifdef PARALLEL
>> > >>          if (myid .eq. 0) then
>> > >>             call MPI_Reduce(MPI_IN_PLACE, vecf2(1),
>> > >>      $           n*nneue,
>> > >>      $           MPI_double_precision, MPI_SUM, 0,
>> > >>      $           MPI_Comm_World, MPIerr)
>> > >>          else
>> > >>             call MPI_Reduce(vecf2(1),MPI_IN_PLACE,
>> > >>      $           n*nneue,
>> > >>      $           MPI_double_precision, MPI_SUM, 0,
>> > >>      $           MPI_Comm_World, MPIerr)
>> > >>          endif
>> > >> #endif
>> > >
>> > > Try calling MPI_Reduce with MPI_IN_PLACE in send buffer in all
>> > > ranks, i.e.
>> > >
>> > >               call MPI_Reduce(MPI_IN_PLACE, vecf2(1),
>> > >        $           n*nneue,
>> > >        $           MPI_DOUBLE_PRECISION, MPI_SUM, 0,
>> > >        $           MPI_COMM_WORLD, MPIerr)
>> > >
>> > >
>> > > A.Chan
>> > >
>> > >> #################
>> > >> with n*nneue = 76160987, and 76160987*8 = 609287896, about 600 MB
>> > >>
>> > >> The point is: I thought, I could avoid the need for allocating
>> > >> additional memory by using MPI_IN_PLACE, which obviously does not
>> > >> work.
>> > >>
>> > >> - do I use MPI_IN_PLACE in the right way?
>> > >> - why does MPI_IN_PLACE need additional memory?
>> > >> - is it possible to rewrite this code in a way that eliminates
>> > >>   the need for allocating additional memory? This part of the
>> > >>   code is not time-critical - it is executed once every few hours.
>> > >>
>> > >> (I'm using mpich2-1.0.5p2, Intel fortran compiler 9.1.040,
>> > >> Intel C compiler 9.1.045 for compiling both mpich and my code)
>> > >>
>> > >>    ...martin
>> > >>
>> > >>
>> > >
>> >
>> >
>> 
>> 

--