[mpich-discuss] Internal memory allocation error?

Peter Diamessis pjd38 at cornell.edu
Mon Oct 20 10:51:28 CDT 2008


Hi Brian,

I'm not sure whether this will be of any help, but everything you
describe is similar to a frustrating problem I've run into. I've been
unable to resolve it and will post something to the list very soon,
once I finish some final tests.

Have you tried running the same task on a different cluster with a
different MPI implementation? I ask because I'm faced with performing
a 2-D data transposition of an array of dimensions Nx * Ny * Nz
(typical values Nx=256, Ny=128 & Nz=175) for the purpose of doing 2-D
FFTs. The data is stored contiguously in each processor with respect
to the i & k indices, and the array is partitioned across processors
along the j-index. After transposition, the array is partitioned along
the i-index and stored contiguously with respect to the j & k indices.
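
To make the layout concrete, here is a bare-bones sketch of the kind of
transposition I mean. This is not my production code: the names are
illustrative, and it assumes the process count divides both Nx and Ny
evenly. Each rank pre-posts one Irecv per rank, then packs and sends
one block per rank, and finally unpacks the received blocks:

program transpose_sketch
  use mpi
  implicit none
  integer, parameter :: nx = 256, ny = 128, nz = 175
  integer :: nprocs, myrank, ierr, p, i, j, k, pos
  integer :: nxloc, nyloc, blk
  integer, allocatable :: reqs(:)
  double precision, allocatable :: a(:,:,:)   ! (nx,nyloc,nz), j-partitioned
  double precision, allocatable :: at(:,:,:)  ! (ny,nxloc,nz), i-partitioned
  double precision, allocatable :: sendbuf(:), recvbuf(:)

  call MPI_Init(ierr)
  call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, myrank, ierr)

  nyloc = ny / nprocs               ! assumes nprocs divides ny ...
  nxloc = nx / nprocs               ! ... and nx evenly
  blk   = nxloc * nyloc * nz        ! block exchanged with each rank

  allocate(a(nx,nyloc,nz), at(ny,nxloc,nz))
  allocate(sendbuf(blk*nprocs), recvbuf(blk*nprocs), reqs(nprocs))
  a = dble(myrank)                  ! dummy data

  ! pre-post one Irecv per rank ...
  do p = 0, nprocs-1
     call MPI_Irecv(recvbuf(p*blk+1), blk, MPI_DOUBLE_PRECISION, p, 0, &
                    MPI_COMM_WORLD, reqs(p+1), ierr)
  end do

  ! ... then pack and send one block per rank
  do p = 0, nprocs-1
     pos = p*blk
     do k = 1, nz
        do j = 1, nyloc
           do i = 1, nxloc
              pos = pos + 1
              sendbuf(pos) = a(p*nxloc+i, j, k)  ! i-range owned by rank p
           end do
        end do
     end do
     call MPI_Send(sendbuf(p*blk+1), blk, MPI_DOUBLE_PRECISION, p, 0, &
                   MPI_COMM_WORLD, ierr)
  end do
  call MPI_Waitall(nprocs, reqs, MPI_STATUSES_IGNORE, ierr)

  ! unpack: the block from rank p carries rank p's j-range of my i-slab
  do p = 0, nprocs-1
     pos = p*blk
     do k = 1, nz
        do j = 1, nyloc
           do i = 1, nxloc
              pos = pos + 1
              at(p*nyloc+j, i, k) = recvbuf(pos)
           end do
        end do
     end do
  end do

  call MPI_Finalize(ierr)
end program transpose_sketch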

I typically have to run about 40K to 250K of these transpositions
during a simulation. During a transposition over P processes, each
processor posts P Irecvs and P Sends, as in the sketch above. I had
run this type of algorithm on a cluster with MPICH2 and Myrinet
interconnects for years without problems. When I migrated the code to
two different clusters, one with Open MPI and InfiniBand and another
with MPICH2 and Myrinet as before, the code failed. I redesigned the
data transposition to use MPI_ALLTOALL, but it still failed. I also
ran with some popular parallel transposition routines designed by
Steve Plimpton at Sandia; these failed as well.
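
For what it's worth, the MPI_ALLTOALL redesign amounted to collapsing
the P Irecvs and P Sends of the sketch above into a single collective.
With the blocks packed in rank order, the call itself reduces to
something like this (again only a sketch, with the same buffer names
as above):

subroutine transpose_alltoall(sendbuf, recvbuf, blk, comm, ierr)
  use mpi
  implicit none
  integer, intent(in)  :: blk, comm
  integer, intent(out) :: ierr
  double precision, intent(in)  :: sendbuf(*)
  double precision, intent(out) :: recvbuf(*)
  ! every rank contributes one block of blk doubles to every rank,
  ! replacing the explicit Irecv/Send loops
  call MPI_Alltoall(sendbuf, blk, MPI_DOUBLE_PRECISION, &
                    recvbuf, blk, MPI_DOUBLE_PRECISION, comm, ierr)
end subroutine transpose_alltoall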

Again, I'm not sure if this is relevant, but there seems to be a
problem when one has to do a large number of Sends/Receives with the
message size greater than a certain length. My impression is that for
certain switch topologies and hardware, some buffer gets overfilled. I
tried doing my transposition by increasing the number of
Sends/Receives while using smaller messages, in the hope that despite
the inefficiency and increased latency my code would work. Again, it
failed.
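
What I tried looked roughly like this, a sketch only: a long transfer
split into fixed-size pieces, relying on MPI's pairwise ordering
guarantee so the receiver can reassemble the chunks in order. The
routine name and the value of chunk_len are illustrative, not taken
from my code:

subroutine send_in_chunks(buf, n, dest, comm, ierr)
  use mpi
  implicit none
  double precision, intent(in) :: buf(*)
  integer, intent(in)  :: n, dest, comm
  integer, intent(out) :: ierr
  integer, parameter :: chunk_len = 4096  ! doubles per message (illustrative)
  integer :: start, m

  start = 1
  do while (start <= n)
     m = min(chunk_len, n - start + 1)
     ! a fixed tag is enough: MPI preserves message order between a
     ! pair of ranks on the same communicator and tag
     call MPI_Send(buf(start), m, MPI_DOUBLE_PRECISION, dest, 0, &
                   comm, ierr)
     start = start + m
  end do
end subroutine send_in_chunks

The receiving side runs the mirror-image loop of Recvs with the same
chunk size.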

I will soon post a more detailed description of my problem, but I
would like to know whether there is a known issue with many large
messages (or many small ones) hanging on a specific cluster, and
whether MPI is designed to be foolproof in this regard, particularly
for calls like MPI_ALLTOALL, which should keep everything "hidden
under the hood".

Cheers,

Pete


Brian Harker wrote:
> Hi Gib-
>
> Well, no dice.  I still get the same error.  Argh.
>
> On Mon, Oct 20, 2008 at 8:35 AM, Brian Harker <brian.harker at gmail.com> wrote:
>   
>> Hi Gib-
>>
>> Thanks for the idea, it's now running by dispatching full columns at a
>> time, so maybe I'll avoid the error, since the number of sends and
>> receives is drastically reduced.  I'll keep you posted.
>>
>> On Sun, Oct 19, 2008 at 7:17 PM, Gib Bogle <g.bogle at auckland.ac.nz> wrote:
>>     
>>> Hi Brian,
>>>
>>> You don't really send the data pixel-by-pixel do you?  If so that would be
>>> horrendously inefficient.  Why not send a row or column at a time?  Have I
>>> misunderstood you?
>>>
>>> Cheers
>>> Gib
>>>
>>>
>>> Brian Harker wrote:
>>>       
>>>> Hi Rajeev-
>>>>
>>>> The code I have posted is pared down quite a bit.  I print the
>>>> "numsent" variable after each send and receive to make sure the
>>>> correct pixel has been dispatched.  I guess I should mention that if I
>>>> run the same program on some other data that I have, which has only
>>>> 68K pixels, it works flawlessly.
>>>>
>>>> On Sun, Oct 19, 2008 at 5:57 PM, Rajeev Thakur <thakur at mcs.anl.gov> wrote:
>>>>         
>>>>> Brian,
>>>>>     This code should not even need an Irecv because the master and slave
>>>>> operate in lock step. And if you call Irecv followed by a Wait
>>>>> immediately,
>>>>> you could as well call Recv.
>>>>>
>>>>> Since it fails with even one slave, run the 1 master 1 slave version and
>>>>> add
>>>>> print statements after the sends and receives to see how many sends and
>>>>> receives are happening.
>>>>>
>>>>> Rajeev
>>>>>
>>>>>
>>>>>           
>>>>>> -----Original Message-----
>>>>>> From: owner-mpich-discuss at mcs.anl.gov
>>>>>> [mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of Brian Harker
>>>>>> Sent: Sunday, October 19, 2008 6:05 PM
>>>>>> To: mpich-discuss at mcs.anl.gov
>>>>>> Subject: Re: [mpich-discuss] Internal memory allocation error?
>>>>>>
>>>>>> Hi list-
>>>>>>
>>>>>> Here's my sample code for those that are wary of opening
>>>>>> email attachments  :)  Didn't think about that until just now.
>>>>>>
>>>>>> PROGRAM mpi_main
>>>>>>
>>>>>> USE subroutines  ! contains subroutine "invert_pixel"
>>>>>> IMPLICIT NONE
>>>>>>
>>>>>> INCLUDE "mpif.h"
>>>>>>
>>>>>> ! total number of pixels
>>>>>> INTEGER, PARAMETER :: np = nx * ny  ! nx, ny specified in module "subroutines"
>>>>>>
>>>>>> ! parameter at each pixel
>>>>>> DOUBLE PRECISION :: par
>>>>>>
>>>>>> ! full-size parameter array
>>>>>> DOUBLE PRECISION, DIMENSION(nx,ny) :: map
>>>>>>
>>>>>> ! pixel coordinates
>>>>>> INTEGER, DIMENSION(2) :: pxl
>>>>>>
>>>>>> ! dummy termination message
>>>>>> INTEGER, PARAMETER :: term = 0
>>>>>>
>>>>>> ! MPI-related
>>>>>> INTEGER :: proc_id
>>>>>> INTEGER :: proc_num
>>>>>> INTEGER :: master
>>>>>> INTEGER :: ierr
>>>>>> INTEGER :: sender
>>>>>> INTEGER :: tag
>>>>>> INTEGER :: numsent
>>>>>> INTEGER :: request
>>>>>> INTEGER, DIMENSION(MPI_STATUS_SIZE) :: status
>>>>>>
>>>>>> ! local indices
>>>>>> INTEGER :: i, j, x, y
>>>>>>
>>>>>> !********************************************************************!
>>>>>> !************************** BEGIN PROGRAM ***************************!
>>>>>> !********************************************************************!
>>>>>> master = 0
>>>>>>
>>>>>> ! initialize MPI libraries...
>>>>>> CALL MPI_Init( ierr )
>>>>>> CALL MPI_Comm_size( MPI_COMM_WORLD, proc_num, ierr )
>>>>>> CALL MPI_Comm_rank( MPI_COMM_WORLD, proc_id, ierr )
>>>>>>
>>>>>> ! --------------------------------------------- !
>>>>>> !             master process code               !
>>>>>> ! --------------------------------------------- !
>>>>>> IF ( proc_id == master ) THEN
>>>>>>
>>>>>>  ! dispatch first group of pixels to slave processes...
>>>>>>  numsent = 0
>>>>>>  DO i = 1, proc_num - 1
>>>>>>
>>>>>>  pxl(1) = INT(numsent/ny) + 1
>>>>>>  pxl(2) = MOD(numsent,ny) + 1
>>>>>>  CALL MPI_Send( pxl, 2, MPI_INTEGER, i, numsent+1, &
>>>>>>                 MPI_COMM_WORLD, ierr )
>>>>>>  numsent = numsent + 1
>>>>>>
>>>>>>  END DO
>>>>>>
>>>>>>  mstr: DO
>>>>>>
>>>>>>        ! receive inferred parameter back from a slave...
>>>>>>        CALL MPI_Recv( par, 1, MPI_DOUBLE_PRECISION, MPI_ANY_SOURCE, &
>>>>>>                       MPI_ANY_TAG, MPI_COMM_WORLD, status, ierr )
>>>>>>        sender = status(MPI_SOURCE)
>>>>>>        tag = status(MPI_TAG)
>>>>>>
>>>>>>        ! assemble into full-size array...
>>>>>>        x = INT(tag/ny) + 1
>>>>>>        y = MOD(tag,ny) + 1
>>>>>>
>>>>>>        map(x,y) = par
>>>>>>
>>>>>>        IF ( tag == np ) THEN
>>>>>>         ! all done, send termination message...
>>>>>>         DO j = 1, proc_num - 1
>>>>>>          CALL MPI_Send( term, 1, MPI_INTEGER, j, 0, &
>>>>>>                         MPI_COMM_WORLD, ierr )
>>>>>>         END DO
>>>>>>         EXIT mstr
>>>>>>        END IF
>>>>>>
>>>>>>        ! send the next available pixel to same slave...
>>>>>>        pxl(1) = INT((numsent+1) / ny) + 1
>>>>>>        pxl(2) = MOD((numsent+1), ny) + 1
>>>>>>
>>>>>>        CALL MPI_Send( pxl, 2, MPI_INTEGER, sender, numsent+1, &
>>>>>>                       MPI_COMM_WORLD, ierr )
>>>>>>        numsent = numsent + 1
>>>>>>
>>>>>>       END DO mstr
>>>>>> ! --------------------------------------------- !
>>>>>> !              slave process code               !
>>>>>> ! --------------------------------------------- !
>>>>>> ELSE
>>>>>>
>>>>>>  slv: DO
>>>>>>
>>>>>>       ! receive pixel coordinates from master...
>>>>>>       CALL MPI_Irecv( pxl, 2, MPI_INTEGER, master, MPI_ANY_TAG, &
>>>>>>                       MPI_COMM_WORLD, request, ierr )
>>>>>>       CALL MPI_Wait( request, status, ierr )
>>>>>>       tag = status(MPI_TAG)
>>>>>>       IF ( tag == 0 ) THEN
>>>>>>        ! all done, exit program...
>>>>>>        EXIT slv
>>>>>>       END IF
>>>>>>
>>>>>>       ! call genetic algorithm for inversion...
>>>>>>       CALL invert_pixel( pxl, par )
>>>>>>
>>>>>>       ! send parameter result back to master...
>>>>>>       CALL MPI_Send( par, 1, MPI_DOUBLE_PRECISION, master, tag, &
>>>>>>                      MPI_COMM_WORLD, ierr )
>>>>>>      END DO slv
>>>>>>
>>>>>> END IF
>>>>>>
>>>>>> ! shut down MPI...
>>>>>> CALL MPI_Finalize( ierr )
>>>>>> WRITE(*,*) ""
>>>>>> IF ( ierr == 0 ) WRITE(*,*) "Goodbye from process",proc_id
>>>>>>
>>>>>> !********************************************************************!
>>>>>> !**************************** END PROGRAM ***************************!
>>>>>> !********************************************************************!
>>>>>>
>>>>>> STOP
>>>>>> END PROGRAM mpi_main
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Cheers,
>>>>>> Brian
>>>>>> brian.harker at gmail.com
>>>>>>
>>>>>>
>>>>>> "In science, there is only physics; all the rest is stamp-collecting."
>>>>>>
>>>>>> -Ernest Rutherford
>>>>>>
>>>>>>
>>>>>>             
>>>>
>>>>         
>>>       
>>
>> --
>> Cheers,
>> Brian
>> brian.harker at gmail.com
>>
>>
>> "In science, there is only physics; all the rest is stamp-collecting."
>>
>> -Ernest Rutherford
>>
>>     
>
>
>
>   

-- 
 
-------------------------------------------------------------
Peter Diamessis
Assistant Professor
Environmental Fluid Mechanics & Hydrology
School of Civil and Environmental Engineering
Cornell University
Ithaca, NY 14853
Phone: (607)-255-1719 --- Fax: (607)-255-9004
pjd38 at cornell.edu
http://www.cee.cornell.edu/faculty/pjd38



