[mpich-discuss] Internal memory allocation error?

Rajeev Thakur thakur at mcs.anl.gov
Mon Oct 20 10:50:42 CDT 2008


Has the slave finished 161,999 pixels at that point? Just want to make sure
they are in lock-step as intended. I can't imagine how this code would fail
because in each iteration the slave waits for a message from the master, and
the master waits for a message from the slave. So one of them can't race
ahead and flood the other with unexpected messages.
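One way to check is to keep a running count in the slave loop and print it,
so you can see exactly how many pixels the slave has completed when the
failing receive is posted. A rough sketch (the counter "ndone" is not in the
posted code; it would be declared with the other locals and initialized to 0):

  ! in the slave loop, after the result is sent back to the master...
  ndone = ndone + 1
  WRITE(*,*) "slave", proc_id, "has completed", ndone, "pixels"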

Rajeev

> -----Original Message-----
> From: owner-mpich-discuss at mcs.anl.gov 
> [mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of Brian Harker
> Sent: Monday, October 20, 2008 9:07 AM
> To: mpich-discuss at mcs.anl.gov
> Subject: Re: [mpich-discuss] Internal memory allocation error?
> 
> Hi Rajeev-
> 
> After the master process sends the coordinates for around the 
> 162,000th pixel, the targeted slave process fails to receive 
> them, and I get the error from my original post.  Thanks!
> 
> On Sun, Oct 19, 2008 at 7:11 PM, Rajeev Thakur 
> <thakur at mcs.anl.gov> wrote:
> > After how many sends and receives does it fail?
> >
> > Rajeev
> >
> >> -----Original Message-----
> >> From: owner-mpich-discuss at mcs.anl.gov 
> >> [mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of Brian Harker
> >> Sent: Sunday, October 19, 2008 7:28 PM
> >> To: mpich-discuss at mcs.anl.gov
> >> Subject: Re: [mpich-discuss] Internal memory allocation error?
> >>
> >> Hi Rajeev-
> >>
> >> The code I have posted is pared down quite a bit.  I print the
> >> "numsent" variable after each send and receive to make sure the
> >> correct pixel has been dispatched.  I guess I should mention that if
> >> I run the same program on some other data that I have, which has only
> >> 68K pixels, it works flawlessly.
> >>
> >> On Sun, Oct 19, 2008 at 5:57 PM, Rajeev Thakur <thakur at mcs.anl.gov> wrote:
> >> > Brian,
> >> >      This code should not even need an Irecv because the master and
> >> > slave operate in lock step. And if you call Irecv followed by a Wait
> >> > immediately, you could as well call Recv.
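> >> > For instance, the Irecv/Wait pair in the slave loop could just be a
> >> > blocking Recv (a minimal sketch using the variables from your code;
> >> > the request handle and the Wait then go away entirely):
> >> >
> >> >   ! receive pixel coordinates from master (blocking)...
> >> >   CALL MPI_Recv( pxl, 2, MPI_INTEGER, master, MPI_ANY_TAG, MPI_COMM_WORLD, status, ierr )
> >> >   tag = status(MPI_TAG)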
> >> >
> >> > Since it fails with even one slave, run the 1 master 1 slave version
> >> > and add print statements after the sends and receives to see how many
> >> > sends and receives are happening.
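> >> > Something as simple as the following after each call would do (a
> >> > sketch; adjust the messages to taste):
> >> >
> >> >   ! master: after numsent is incremented following each send...
> >> >   WRITE(*,*) "master: sends so far =", numsent
> >> >
> >> >   ! slave: after each receive completes...
> >> >   WRITE(*,*) "slave", proc_id, ": receive completed, tag =", tag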
> >> >
> >> > Rajeev
> >> >
> >> >
> >> >> -----Original Message-----
> >> >> From: owner-mpich-discuss at mcs.anl.gov 
> >> >> [mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of Brian Harker
> >> >> Sent: Sunday, October 19, 2008 6:05 PM
> >> >> To: mpich-discuss at mcs.anl.gov
> >> >> Subject: Re: [mpich-discuss] Internal memory allocation error?
> >> >>
> >> >> Hi list-
> >> >>
> >> >> Here's my sample code for those who are wary of opening email
> >> >> attachments  :)  Didn't think about that until just now.
> >> >>
> >> >> PROGRAM mpi_main
> >> >>
> >> >> USE subroutines  ! contains subroutine "invert_pixel"
> >> >> IMPLICIT NONE
> >> >>
> >> >> INCLUDE "mpif.h"
> >> >>
> >> >> ! total number of pixels
> >> >> INTEGER, PARAMETER :: np = nx * ny   ! nx, ny specified in module "subroutines"
> >> >>
> >> >> ! parameter at each pixel
> >> >> DOUBLE PRECISION :: par
> >> >>
> >> >> ! full-size parameter array
> >> >> DOUBLE PRECISION, DIMENSION(nx,ny) :: map
> >> >>
> >> >> ! pixel coordinates
> >> >> INTEGER, DIMENSION(2) :: pxl
> >> >>
> >> >> ! dummy termination message
> >> >> INTEGER, PARAMETER :: term = 0
> >> >>
> >> >> ! MPI-related
> >> >> INTEGER :: proc_id
> >> >> INTEGER :: proc_num
> >> >> INTEGER :: master
> >> >> INTEGER :: ierr
> >> >> INTEGER :: sender
> >> >> INTEGER :: tag
> >> >> INTEGER :: numsent
> >> >> INTEGER :: request
> >> >> INTEGER, DIMENSION(MPI_STATUS_SIZE) :: status
> >> >>
> >> >> ! local indices and assembled pixel coordinates
> >> >> INTEGER :: i, j, x, y
> >> >>
> >> >>
> >> >> !********************************************************************!
> >> >> !************************** BEGIN PROGRAM ***************************!
> >> >> !********************************************************************!
> >> >>
> >> >> master = 0
> >> >>
> >> >> ! initialize MPI libraries...
> >> >> CALL MPI_Init( ierr )
> >> >> CALL MPI_Comm_size( MPI_COMM_WORLD, proc_num, ierr )
> >> >> CALL MPI_Comm_rank( MPI_COMM_WORLD, proc_id, ierr )
> >> >>
> >> >> ! --------------------------------------------- !
> >> >> !             master process code               !
> >> >> ! --------------------------------------------- !
> >> >> IF ( proc_id == master ) THEN
> >> >>
> >> >>  ! dispatch first group of pixels to slave processes...
> >> >>  numsent = 0
> >> >>  DO i = 1, proc_num - 1
> >> >>
> >> >>   pxl(1) = INT(numsent/ny) + 1
> >> >>   pxl(2) = MOD(numsent,ny) + 1
> >> >>   CALL MPI_Send( pxl, 2, MPI_INTEGER, i, numsent+1, MPI_COMM_WORLD, ierr )
> >> >>   numsent = numsent + 1
> >> >>
> >> >>  END DO
> >> >>
> >> >>  mstr: DO
> >> >>
> >> >>         ! receive inferred parameter back from a slave...
> >> >>         CALL MPI_Recv( par, 1, MPI_DOUBLE_PRECISION, MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, status, ierr )
> >> >>         sender = status(MPI_SOURCE)
> >> >>         tag = status(MPI_TAG)
> >> >>
> >> >>         ! assemble into full-size array...
> >> >>         x = INT(tag/ny) + 1
> >> >>         y = MOD(tag,ny) + 1
> >> >>
> >> >>         map(x,y) = par
> >> >>
> >> >>         IF ( tag == np ) THEN
> >> >>          ! all done, send termination message...
> >> >>          DO j = 1, proc_num - 1
> >> >>           CALL MPI_Send( term, 1, MPI_INTEGER, j, 0, MPI_COMM_WORLD, ierr )
> >> >>          END DO
> >> >>          EXIT mstr
> >> >>         END IF
> >> >>
> >> >>         ! send the next available pixel to same slave...
> >> >>         pxl(1) = INT((numsent+1) / ny) + 1
> >> >>         pxl(2) = MOD((numsent+1), ny) + 1
> >> >>
> >> >>         CALL MPI_Send( pxl, 2, MPI_INTEGER, sender, numsent+1, MPI_COMM_WORLD, ierr )
> >> >>         numsent = numsent + 1
> >> >>
> >> >>        END DO mstr
> >> >> ! --------------------------------------------- !
> >> >> !              slave process code               !
> >> >> ! --------------------------------------------- !
> >> >> ELSE
> >> >>
> >> >>  slv: DO
> >> >>
> >> >>        ! receive pixel coordinates from master...
> >> >>        CALL MPI_Irecv( pxl, 2, MPI_INTEGER, master, MPI_ANY_TAG, MPI_COMM_WORLD, request, ierr )
> >> >>        CALL MPI_Wait( request, status, ierr )
> >> >>        tag = status(MPI_TAG)
> >> >>        IF ( tag == 0 ) THEN
> >> >>         ! all done, exit program...
> >> >>         EXIT slv
> >> >>        END IF
> >> >>
> >> >>        ! call genetic algorithm for inversion...
> >> >>        CALL invert_pixel( pxl, par )
> >> >>
> >> >>        ! send parameter result back to master...
> >> >>        CALL MPI_Send( par, 1, MPI_DOUBLE_PRECISION, master, tag, MPI_COMM_WORLD, ierr )
> >> >>       END DO slv
> >> >>
> >> >> END IF
> >> >>
> >> >> ! shut down MPI...
> >> >> CALL MPI_Finalize( ierr )
> >> >> WRITE(*,*) ""
> >> >> IF ( ierr == 0 ) WRITE(*,*) "Goodbye from process",proc_id
> >> >>
> >> >>
> >> >> !********************************************************************!
> >> >> !**************************** END PROGRAM ***************************!
> >> >> !********************************************************************!
> >> >>
> >> >> STOP
> >> >> END PROGRAM mpi_main
> >> >>
> >> >>
> >> >>
> >> >>
> >> >> --
> >> >> Cheers,
> >> >> Brian
> >> >> brian.harker at gmail.com
> >> >>
> >> >>
> >> >> "In science, there is only physics; all the rest is stamp-collecting."
> >> >>
> >> >> -Ernest Rutherford
> >> >>
> >> >>
> >> >
> >> >
> >>
> >>
> >>
> >> --
> >> Cheers,
> >> Brian
> >> brian.harker at gmail.com
> >>
> >>
> >> "In science, there is only physics; all the rest is stamp-collecting."
> >>
> >> -Ernest Rutherford
> >>
> >>
> >
> >
> 
> 
> 
> --
> Cheers,
> Brian
> brian.harker at gmail.com
> 
> 
> "In science, there is only physics; all the rest is stamp-collecting."
> 
> -Ernest Rutherford
> 
> 



