[MPICH] MPI_REDUCE with MPI_IN_PLACE fails with memory error

Rajeev Thakur thakur at mcs.anl.gov
Tue Mar 13 11:17:57 CDT 2007


Martin,
       Since a reduction operation has two operands, the MPI implementation
needs to allocate temporary memory to hold the second operand received from
another process. MPI_IN_PLACE only means that the root's contribution to the
reduction is taken from its recvbuf instead of its sendbuf; it does not
remove the need for that temporary receive buffer.

You can avoid this problem by calling MPI_Reduce a few times on smaller
pieces of the buffer rather than on the entire buffer at once. Call it three
times on about 200 MB each, or whatever works; a sketch follows.
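
For example, something along the lines of this untested sketch (the chunk
size of 25,000,000 doubles, about 200 MB per call, is an arbitrary choice,
and chunk, off and m are hypothetical local integers; the rest follows your
code):

#################
#ifdef PARALLEL
c        sketch: reduce vecf2 in pieces of at most CHUNK elements so
c        that the temporary buffer inside MPI_Reduce stays small
c        (declare locally: integer chunk, off, m)
         chunk = 25000000
         do off = 1, n*nneue, chunk
            m = min(chunk, n*nneue - off + 1)
            if (myid .eq. 0) then
c              root: contribution is taken in place from vecf2(off)
               call MPI_Reduce(MPI_IN_PLACE, vecf2(off), m,
     $              MPI_double_precision, MPI_SUM, 0,
     $              MPI_Comm_World, MPIerr)
            else
c              non-root: the recvbuf argument is not significant here
               call MPI_Reduce(vecf2(off), MPI_IN_PLACE, m,
     $              MPI_double_precision, MPI_SUM, 0,
     $              MPI_Comm_World, MPIerr)
            endif
         end do
#endif
#################

Each call then needs only about chunk*8 bytes of temporary memory at a time
instead of the full 600 MB.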

Rajeev
  

> -----Original Message-----
> From: owner-mpich-discuss at mcs.anl.gov 
> [mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of Martin 
> Kleinschmidt
> Sent: Tuesday, March 13, 2007 4:08 AM
> To: mpich-discuss at mcs.anl.gov
> Subject: [MPICH] MPI_REDUCE with MPI_IN_PLACE fails with memory error
> 
> 
> Hi,
> 
> my mpi program fails with the following error:
> 
> #################
> [cli_0]: aborting job:
> Fatal error in MPI_Reduce: Other MPI error, error stack:
> MPI_Reduce(850).: MPI_Reduce(sbuf=MPI_IN_PLACE, rbuf=0x956c8008,
> count=76160987, MPI_DOUBLE_PRECISION, MPI_SUM, root=0, MPI_COMM_WORLD)
> failed
> MPIR_Reduce(149): Unable to allocate 609287896 bytes of memory for
> temporary buffer (probably out of memory)
> ##################
> 
> which is, of course, quite self-explanatory.
> 
> The corresponding lines of code are:
> 
> #################
> #ifdef PARALLEL
>          if (myid .eq. 0) then
>             call MPI_Reduce(MPI_IN_PLACE, vecf2(1),
>      $           n*nneue,
>      $           MPI_double_precision, MPI_SUM, 0,
>      $           MPI_Comm_World, MPIerr)
>          else
>             call MPI_Reduce(vecf2(1),MPI_IN_PLACE,
>      $           n*nneue,
>      $           MPI_double_precision, MPI_SUM, 0,
>      $           MPI_Comm_World, MPIerr)
>          endif
> #endif
> #################
> with n*nneue = 76160987, and 76160987*8 = 609287896, about 600 MB
> 
> The point is: I thought I could avoid the need for allocating
> additional memory by using MPI_IN_PLACE, which obviously does
> not work.
> 
> - am I using MPI_IN_PLACE in the right way?
> - why does MPI_IN_PLACE still need additional memory?
> - is it possible to rewrite this code in a way that eliminates the
>   need for allocating additional memory? This part of the code is
>   not time-critical - it is executed once every few hours.
> 
> (I'm using mpich2-1.0.5p2, Intel fortran compiler 9.1.040, Intel C
> compiler 9.1.045 for compiling both mpich and my code)
> 
>    ...martin
> 
> 



