[mpich-discuss] Internal memory allocation error?

Gib Bogle g.bogle at auckland.ac.nz
Sun Oct 19 18:31:17 CDT 2008


Hi Brian,

I've just looked quickly at the first few lines of your main program, and I see a couple of odd things.

(1) You say that nx and ny are set in the module subroutines, but I don't see any call to a 
subroutine to set nx and ny before they are used.

(2) Assuming that somehow you initialize nx to 415 and ny to 509, I don't understand these lines:

   pxl(1) = INT(numsent/ny) + 1
   pxl(2) = MOD(numsent,ny) + 1

since numsent < proc_num, which is far smaller than ny, so INT(numsent/ny) = 0.  That makes pxl(1) = 1 and pxl(2) = numsent + 1.  Is this what you want?

Gib

Brian Harker wrote:
> Hi Rajeev and list-
> 
> Here's a code sample.  I'm assuming you could replace my subroutine
> "invert_pixel" with a dummy subroutine, and integer parameters, nx and
> ny (415 and 509 in my code) with something else.  BTW, I am using
> MPICH2 1.0.7 with the Intel icc, icpc, and ifort compiler suite.  Thanks a
> lot!
> 
> On Sun, Oct 19, 2008 at 3:59 PM, Rajeev Thakur <thakur at mcs.anl.gov> wrote:
>> Can you send us a code fragment that shows exactly what you are doing and
>> how many sends/recvs are being issued? You don't need to change sends to
>> isends, just the recvs.
>>
>> Rajeev
>>
>>> -----Original Message-----
>>> From: owner-mpich-discuss at mcs.anl.gov
>>> [mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of Brian Harker
>>> Sent: Sunday, October 19, 2008 4:39 PM
>>> To: mpich-discuss at mcs.anl.gov
>>> Subject: Re: [mpich-discuss] Internal memory allocation error?
>>>
>>> Hello Rajeev and list-
>>>
>>> Well, I've replaced MPI_Send with MPI_Isend and MPI_Recv with
>>> MPI_Irecv, with the corresponding MPI_Wait calls placed as late as
>>> possible relative to the intermediate calculations, and I still get
>>> the error.  The error even comes up when I use only one slave
>>> process to do the calculations (in essence the serial version of
>>> the algorithm).
>>>
>>> Is there a limit on the tag value that accompanies the MPI_Send?
>>>
>>>
>>>
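On the tag question: MPI does bound tags, via the predefined MPI_TAG_UB attribute on MPI_COMM_WORLD, which the standard guarantees is at least 32767 (many implementations allow far more).  A minimal C query, assuming an MPI installation, would look like:

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int *tag_ub;   /* attribute value is returned as a pointer-to-int */
    int  flag;
    MPI_Init(&argc, &argv);
    /* MPI_TAG_UB is a predefined communicator attribute; the MPI
       standard guarantees the value is at least 32767. */
    MPI_Comm_get_attr(MPI_COMM_WORLD, MPI_TAG_UB, &tag_ub, &flag);
    if (flag)
        printf("largest usable tag: %d\n", *tag_ub);
    MPI_Finalize();
    return 0;
}
```

If the pixel counter is being used directly as a tag, values past the upper bound would be a genuine problem, though that would normally surface as an invalid-tag error rather than a memory failure.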
>>> On Sat, Oct 18, 2008 at 3:39 PM, Rajeev Thakur
>>> <thakur at mcs.anl.gov> wrote:
>>>> Yes, you do need MPI_Wait or MPI_Waitall, but you can call the
>>>> Irecv as early as possible and delay the Wait until just before
>>>> you need the data.
>>>> Rajeev
>>>>
>>>>> -----Original Message-----
>>>>> From: owner-mpich-discuss at mcs.anl.gov
>>>>> [mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of Brian Harker
>>>>> Sent: Saturday, October 18, 2008 11:38 AM
>>>>> To: mpich-discuss at mcs.anl.gov
>>>>> Subject: Re: [mpich-discuss] Internal memory allocation error?
>>>>>
>>>>> Thanks Rajeev...since MPI_Irecv is nonblocking, should I pair it
>>>>> up with an MPI_Wait to make sure I'm not trying to access a
>>>>> buffer that hasn't been written to yet?
>>>>>
>>>>> On Sat, Oct 18, 2008 at 9:38 AM, Rajeev Thakur
>>> <thakur at mcs.anl.gov>
>>>>> wrote:
>>>>>> This can happen if the sender does too many sends and the
>>>>>> receiver doesn't post receives fast enough. Try using MPI_Irecv
>>>>>> and posting enough of them to match the incoming sends.
>>>>>>
>>>>>> Rajeev
>>>>>>
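Rajeev's suggestion above (keep a batch of receives pre-posted so eager sends land in user buffers rather than piling up in MPICH2's internal request queue) might look like the following minimal C sketch.  The names and the batch size NPOST are illustrative, not from Brian's code:

```c
#include <mpi.h>

#define NPOST 64   /* how many receives to keep pre-posted (tunable) */
#define COUNT 2    /* integers per message, as in Brian's MPI_Recv call */

/* Post NPOST receives up front, then wait on all of them only once the
   data is actually needed.  Until MPI_Waitall returns, recvbuf must not
   be read or reused. */
void receive_batch(int src, int recvbuf[NPOST][COUNT])
{
    MPI_Request req[NPOST];
    MPI_Status  stat[NPOST];

    for (int i = 0; i < NPOST; i++)
        MPI_Irecv(recvbuf[i], COUNT, MPI_INT, src, MPI_ANY_TAG,
                  MPI_COMM_WORLD, &req[i]);

    /* ... intermediate computation can overlap the transfers here ... */

    MPI_Waitall(NPOST, req, stat);   /* buffers valid only after this */
}
```

The point of the pattern is that each incoming eager message matches an already-posted receive, so the implementation never has to allocate an internal unexpected-message request for it.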
>>>>>>> -----Original Message-----
>>>>>>> From: owner-mpich-discuss at mcs.anl.gov
>>>>>>> [mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of
>>> Brian Harker
>>>>>>> Sent: Friday, October 17, 2008 4:19 PM
>>>>>>> To: mpich-discuss at mcs.anl.gov
>>>>>>> Subject: [mpich-discuss] Internal memory allocation error?
>>>>>>>
>>>>>>> Hello list-
>>>>>>>
>>>>>>> I have a Fortran 90 program that loops over pixels in an image
>>>>>>> in parallel.  There are 211K total pixels in the field-of-view,
>>>>>>> and the code always crashes around the 160K-th pixel, give or
>>>>>>> take a hundred or so, with the following message:
>>>>>>>
>>>>>>> Fatal error in MPI_Recv: Other MPI error, error stack:
>>>>>>> MPI_Recv(186).............................:
>>>>>>> MPI_Recv(buf=0x82210d0, count=2, MPI_INTEGER, src=0,
>>>>>>> tag=MPI_ANY_TAG, MPI_COMM_WORLD, status=0x82210e0) failed
>>>>>>> MPIDI_CH3i_Progress_wait(214).............: an error occurred
>>>>>>> while handling an event returned by MPIDU_Sock_Wait()
>>>>>>> MPIDI_CH3I_Progress_handle_sock_event(436):
>>>>>>> MPIDI_EagerContigIsend(567)...............: failure occurred
>>>>>>> while allocating memory for a request object[cli_2]: aborting job:
>>>>>>>
>>>>>>> Now, I have no dynamically allocated variables in the code, so
>>>>>>> does the error mean there is not enough memory in the buffer
>>>>>>> for all the communication at this step?  I have increased
>>>>>>> MP_BUFFER_MEM from the default 64M to 128M with no change in
>>>>>>> the error.  Is it possible that I'm just trying to do too much
>>>>>>> at once with my dual-core processor?  I wouldn't think so; I'm
>>>>>>> only running the code with 6 processes...and I don't believe
>>>>>>> this is a data problem.
>>>>>>>
>>>>>>> Any ideas would be appreciated, and I can post any other
>>>>>>> information anyone wants.  Thanks!
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Cheers,
>>>>>>> Brian
>>>>>>> brian.harker at gmail.com
>>>>>>>
>>>>>>>
>>>>>>> "In science, there is only physics; all the rest is
>>>>>>> stamp-collecting."
>>>>>>> -Ernest Rutherford
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Cheers,
>>>>> Brian
>>>>> brian.harker at gmail.com
>>>>>
>>>>>
>>>>> "In science, there is only physics; all the rest is
>>>>> stamp-collecting."
>>>>> -Ernest Rutherford
>>>>>
>>>>>
>>>>
>>>
>>>
>>> --
>>> Cheers,
>>> Brian
>>> brian.harker at gmail.com
>>>
>>>
>>> "In science, there is only physics; all the rest is stamp-collecting."
>>>
>>> -Ernest Rutherford
>>>
>>>
>>
> 
> 
> 



