[mpich-discuss] Internal memory allocation error?

Brian Harker brian.harker at gmail.com
Sun Oct 19 17:41:24 CDT 2008


Hi Rajeev and list-

Here's a code sample (attached as driver.f90).  You should be able to
replace my subroutine "invert_pixel" with a dummy subroutine, and the
integer parameters nx and ny (415 and 509 in my code) with something
else.  BTW, I am using MPICH2 1.0.7 with the Intel icc/icpc/ifort
compiler suite.  Thanks a lot!
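
In case the attachment doesn't come through, here is a stripped-down
sketch of the communication pattern (NOT the literal driver.f90: the
master/slave bookkeeping is simplified and invert_pixel is a dummy):

  program driver
    use mpi
    implicit none
    integer, parameter :: nx = 415, ny = 509   ! 211,235 pixels total
    integer, parameter :: WORKTAG = 0, DONETAG = 1
    integer :: rank, nprocs, ierr, ipix, n, w
    integer :: pix(2), status(MPI_STATUS_SIZE)
    real    :: result

    call MPI_Init(ierr)
    call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
    call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)

    if (rank == 0) then              ! master (assumes nprocs-1 <= nx*ny)
       ipix = 0
       do w = 1, nprocs-1            ! prime each slave with one pixel
          pix = (/ mod(ipix,nx)+1, ipix/nx+1 /)
          call MPI_Send(pix, 2, MPI_INTEGER, w, WORKTAG, &
                        MPI_COMM_WORLD, ierr)
          ipix = ipix + 1
       end do
       do n = 1, nx*ny               ! collect one result per pixel
          call MPI_Recv(result, 1, MPI_REAL, MPI_ANY_SOURCE, MPI_ANY_TAG, &
                        MPI_COMM_WORLD, status, ierr)
          if (ipix < nx*ny) then     ! hand that slave the next pixel
             pix = (/ mod(ipix,nx)+1, ipix/nx+1 /)
             call MPI_Send(pix, 2, MPI_INTEGER, status(MPI_SOURCE), &
                           WORKTAG, MPI_COMM_WORLD, ierr)
             ipix = ipix + 1
          else                       ! no pixels left: tell it to stop
             call MPI_Send(pix, 2, MPI_INTEGER, status(MPI_SOURCE), &
                           DONETAG, MPI_COMM_WORLD, ierr)
          end if
       end do
    else                             ! slave: this is the Recv that fails
       do
          call MPI_Recv(pix, 2, MPI_INTEGER, 0, MPI_ANY_TAG, &
                        MPI_COMM_WORLD, status, ierr)
          if (status(MPI_TAG) == DONETAG) exit
          call invert_pixel(pix, result)
          call MPI_Send(result, 1, MPI_REAL, 0, WORKTAG, &
                        MPI_COMM_WORLD, ierr)
       end do
    end if
    call MPI_Finalize(ierr)

  contains

    subroutine invert_pixel(pix, result)  ! dummy stand-in for the real one
      integer, intent(in)  :: pix(2)
      real,    intent(out) :: result
      result = real(pix(1) + pix(2))
    end subroutine invert_pixel

  end program driver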

On Sun, Oct 19, 2008 at 3:59 PM, Rajeev Thakur <thakur at mcs.anl.gov> wrote:
> Can you send us a code fragment that shows exactly what you are doing and
> how many sends/recvs are being issued? You don't need to change sends to
> isends, just the recvs.
>
> Rajeev
>
>> -----Original Message-----
>> From: owner-mpich-discuss at mcs.anl.gov
>> [mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of Brian Harker
>> Sent: Sunday, October 19, 2008 4:39 PM
>> To: mpich-discuss at mcs.anl.gov
>> Subject: Re: [mpich-discuss] Internal memory allocation error?
>>
>> Hello Rajeev and list-
>>
>> Well, I've replaced MPI_Send with MPI_Isend and MPI_Recv with
>> MPI_Irecv, placing the corresponding MPI_Wait calls as late as I
>> possibly can while doing the intermediate calculations, and I
>> still get the error.  The error even comes up when I use only
>> one slave process to do the calculations (in essence the serial
>> version of the algorithm).
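>>
>> The pattern now looks roughly like this (a sketch, not the literal
>> code; the commented line stands in for my intermediate calculations):
>>
>>   integer :: req, ierr, status(MPI_STATUS_SIZE)
>>   integer :: pix(2)
>>
>>   call MPI_Irecv(pix, 2, MPI_INTEGER, 0, MPI_ANY_TAG, &
>>                  MPI_COMM_WORLD, req, ierr)
>>   ! ... intermediate calculations that do not touch pix ...
>>   call MPI_Wait(req, status, ierr)  ! pix is only safe to read after this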
>>
>> Is there a limit on the tag value that accompanies the MPI_Send?
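>>
>> For what it's worth, the standard only guarantees tags up to 32767;
>> the actual limit on a given installation can be queried through the
>> MPI_TAG_UB attribute with a fragment like:
>>
>>   integer :: tag_ub, ierr
>>   logical :: flag
>>
>>   call MPI_Attr_get(MPI_COMM_WORLD, MPI_TAG_UB, tag_ub, flag, ierr)
>>   if (flag) print *, 'largest usable tag:', tag_ub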
>>
>>
>>
>> On Sat, Oct 18, 2008 at 3:39 PM, Rajeev Thakur <thakur at mcs.anl.gov> wrote:
>> > Yes, you do need MPI_Wait or MPI_Waitall but you can call the Irecv
>> > as early as possible and delay the Wait until just before you need
>> > the data.
>> >
>> > Rajeev
>> >
>> >> -----Original Message-----
>> >> From: owner-mpich-discuss at mcs.anl.gov
>> >> [mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of Brian Harker
>> >> Sent: Saturday, October 18, 2008 11:38 AM
>> >> To: mpich-discuss at mcs.anl.gov
>> >> Subject: Re: [mpich-discuss] Internal memory allocation error?
>> >>
>> >> Thanks, Rajeev...since MPI_Irecv is nonblocking, should I pair it up
>> >> with an MPI_Wait to make sure I'm not trying to access a buffer that
>> >> hasn't been written to yet?
>> >>
>> >> On Sat, Oct 18, 2008 at 9:38 AM, Rajeev Thakur <thakur at mcs.anl.gov> wrote:
>> >> > This can happen if the sender does too many sends and the receiver
>> >> > doesn't post receives fast enough. Try using MPI_Irecv and posting
>> >> > enough of them to match the incoming sends.
>> >> >
>> >> > Rajeev
>> >> >
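>> >> > That is, pre-post a pool of receives up front and re-post each one
>> >> > as it completes. Roughly this shape (just a sketch; nreq, the
>> >> > buffer type, and the count are made up):
>> >> >
>> >> >   integer, parameter :: nreq = 64
>> >> >   integer :: reqs(nreq), idx, i, ierr, status(MPI_STATUS_SIZE)
>> >> >   real    :: bufs(nreq)
>> >> >
>> >> >   do i = 1, nreq                   ! pre-post a pool of receives
>> >> >      call MPI_Irecv(bufs(i), 1, MPI_REAL, MPI_ANY_SOURCE, &
>> >> >                     MPI_ANY_TAG, MPI_COMM_WORLD, reqs(i), ierr)
>> >> >   end do
>> >> >   ! consume whichever receive completes first, then re-post it
>> >> >   call MPI_Waitany(nreq, reqs, idx, status, ierr)
>> >> >   call MPI_Irecv(bufs(idx), 1, MPI_REAL, MPI_ANY_SOURCE, &
>> >> >                  MPI_ANY_TAG, MPI_COMM_WORLD, reqs(idx), ierr)
>> >> >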
>> >> >> -----Original Message-----
>> >> >> From: owner-mpich-discuss at mcs.anl.gov
>> >> >> [mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of Brian Harker
>> >> >> Sent: Friday, October 17, 2008 4:19 PM
>> >> >> To: mpich-discuss at mcs.anl.gov
>> >> >> Subject: [mpich-discuss] Internal memory allocation error?
>> >> >>
>> >> >> Hello list-
>> >> >>
>> >> >> I have a Fortran 90 program that loops over pixels in an image
>> >> >> in parallel.  There are 211K total pixels in the field-of-view,
>> >> >> and the code always crashes around the 160,000th pixel, give or
>> >> >> take a hundred or so, with the following message:
>> >> >>
>> >> >> Fatal error in MPI_Recv: Other MPI error, error stack:
>> >> >> MPI_Recv(186).............................:
>> >> >> MPI_Recv(buf=0x82210d0, count=2, MPI_INTEGER, src=0, tag=MPI_ANY_TAG,
>> >> >> MPI_COMM_WORLD, status=0x82210e0) failed
>> >> >> MPIDI_CH3i_Progress_wait(214).............: an error occurred while
>> >> >> handling an event returned by MPIDU_Sock_Wait()
>> >> >> MPIDI_CH3I_Progress_handle_sock_event(436):
>> >> >> MPIDI_EagerContigIsend(567)...............: failure occurred while
>> >> >> allocating memory for a request object[cli_2]: aborting job:
>> >> >>
>> >> >> Now, I have no dynamically allocatable variables in the code, so
>> >> >> does the error mean there is not enough memory in the buffer for
>> >> >> all the communication at this step?  I have increased
>> >> >> MP_BUFFER_MEM from the default 64M to 128M with no change in the
>> >> >> error.  Is it possible that I'm just trying to do too much at
>> >> >> once with my dual-core processor?  I wouldn't think so; I'm only
>> >> >> running the code with 6 processes...and I don't believe this is
>> >> >> a data problem.
>> >> >>
>> >> >> Any ideas would be appreciated, and I can post any other
>> >> >> information anyone wants.  Thanks!



-- 
Cheers,
Brian
brian.harker at gmail.com


"In science, there is only physics; all the rest is stamp-collecting."

-Ernest Rutherford
[Attachment: driver.f90, 3497 bytes, application/octet-stream]
URL: <http://lists.mcs.anl.gov/pipermail/mpich-discuss/attachments/20081019/0e33e71a/attachment.obj>

