[mpich-discuss] Internal memory allocation error?

Gib Bogle g.bogle at auckland.ac.nz
Sun Oct 19 19:19:23 CDT 2008


OK, sorry for my irrelevant post.

Gib

Brian Harker wrote:
> Hi Gib and list-
> 
> nx and ny are declared as constant parameters in module "subroutines"
> before the "contains" line, so they should be accessible to any scope
> that "use"s the "subroutines" module.  The variable "numsent" is
> updated every time the master process sends new pixel coordinates to a
> slave process.  I have extensively checked the distribution of pixel
> coordinates, and it is as it should be.  Thanks!
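> 
> For reference, the declarations look roughly like this (a trimmed-down
> sketch, not the actual module; the argument list of "invert_pixel" is only
> a guess here):
> 
>   module subroutines
>     implicit none
>     integer, parameter :: nx = 415, ny = 509   ! image dimensions
>   contains
>     subroutine invert_pixel(pxl)
>       integer, intent(in) :: pxl(2)            ! pixel coordinates from the master
>       ! ... per-pixel work goes here ...
>     end subroutine invert_pixel
>   end module subroutines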
> 
> On Sun, Oct 19, 2008 at 5:31 PM, Gib Bogle <g.bogle at auckland.ac.nz> wrote:
>> Hi Brian,
>>
>> I've just looked quickly at the first few lines of your main program, and I
>> see a couple of odd things.
>>
>> (1) You say that nx and ny are set in the module subroutines, but I don't
>> see any call to a subroutine to set nx and ny before they are used.
>>
>> (2) Assuming that somehow you initialize nx to 415 and ny to 509, I don't
>> understand these lines:
>>
>>  pxl(1) = INT(numsent/ny) + 1
>>  pxl(2) = MOD(numsent,ny) + 1
>>
>> since numsent < proc_num, which makes pxl(1) = 1, and pxl(2) = numsent+1.
>>  Is this what you want?
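>>
>> For instance, a quick standalone check (assuming ny = 509 and that numsent
>> runs from 0 to 5 on the first round of sends) gives pxl = (1, numsent+1):
>>
>>  program check_pxl
>>    implicit none
>>    integer, parameter :: ny = 509
>>    integer :: numsent, pxl(2)
>>    do numsent = 0, 5
>>      pxl(1) = INT(numsent/ny) + 1   ! = 1 whenever numsent < ny
>>      pxl(2) = MOD(numsent,ny) + 1   ! = numsent + 1
>>      print *, numsent, pxl
>>    end do
>>  end program check_pxl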
>>
>> Gib
>>
>> Brian Harker wrote:
>>> Hi Rajeev and list-
>>>
>>> Here's a code sample.  I'm assuming you could replace my subroutine
>>> "invert_pixel" with a dummy subroutine, and the integer parameters nx and
>>> ny (415 and 509 in my code) with something else.  BTW, I am using
>>> MPICH2 1.0.7 with the Intel icc/icpc/ifort compiler suite.  Thanks a
>>> lot!
>>>
>>> On Sun, Oct 19, 2008 at 3:59 PM, Rajeev Thakur <thakur at mcs.anl.gov> wrote:
>>>> Can you send us a code fragment that shows exactly what you are doing and
>>>> how many sends/recvs are being issued? You don't need to change sends to
>>>> isends, just the recvs.
>>>>
>>>> Rajeev
>>>>
>>>>> -----Original Message-----
>>>>> From: owner-mpich-discuss at mcs.anl.gov
>>>>> [mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of Brian Harker
>>>>> Sent: Sunday, October 19, 2008 4:39 PM
>>>>> To: mpich-discuss at mcs.anl.gov
>>>>> Subject: Re: [mpich-discuss] Internal memory allocation error?
>>>>>
>>>>> Hello Rajeev and list-
>>>>>
>>>>> Well, I've replaced MPI_Send with MPI_Isend and MPI_Recv with
>>>>> MPI_Irecv, placing the corresponding MPI_Wait calls as late as I
>>>>> possibly can while doing the intermediate calculations, and I
>>>>> still get the error.  The error even comes up when I use only
>>>>> one slave process to do the calculations (in essence the
>>>>> serial version of the algorithm).
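>>>>>
>>>>> In outline, the receive side of each slave now looks like this (heavily
>>>>> trimmed, not the real code; it assumes "use mpi", MPI_Init, and the
>>>>> surrounding slave loop, and the intermediate work is a placeholder):
>>>>>
>>>>>   integer :: req, ierr, pxl(2)
>>>>>   integer :: status(MPI_STATUS_SIZE)
>>>>>
>>>>>   call MPI_IRECV(pxl, 2, MPI_INTEGER, 0, MPI_ANY_TAG, &
>>>>>                  MPI_COMM_WORLD, req, ierr)
>>>>>   ! ... intermediate calculations that do not need pxl ...
>>>>>   call MPI_WAIT(req, status, ierr)   ! pxl is safe to read only after this
>>>>>   call invert_pixel(pxl)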
>>>>>
>>>>> Is there a limit on the tag value that accompanies the MPI_Send?
>>>>>
>>>>>
>>>>>
>>>>> On Sat, Oct 18, 2008 at 3:39 PM, Rajeev Thakur
>>>>> <thakur at mcs.anl.gov> wrote:
>>>>>> Yes, you do need MPI_Wait or MPI_Waitall, but you can call the Irecv as
>>>>>> early as possible and delay the Wait until just before you need the data.
>>>>>>
>>>>>> Rajeev
>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: owner-mpich-discuss at mcs.anl.gov
>>>>>>> [mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of Brian Harker
>>>>>>> Sent: Saturday, October 18, 2008 11:38 AM
>>>>>>> To: mpich-discuss at mcs.anl.gov
>>>>>>> Subject: Re: [mpich-discuss] Internal memory allocation error?
>>>>>>>
>>>>>>> Thanks Rajeev...since MPI_Irecv is nonblocking, should I pair it up
>>>>>>> with an MPI_Wait to make sure I'm not trying to access a buffer that
>>>>>>> hasn't been written to yet?
>>>>>>>
>>>>>>> On Sat, Oct 18, 2008 at 9:38 AM, Rajeev Thakur <thakur at mcs.anl.gov> wrote:
>>>>>>>> This can happen if the sender does too many sends and the receiver
>>>>>>>> doesn't post receives fast enough. Try using MPI_Irecv and posting
>>>>>>>> enough of them to match the incoming sends.
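>>>>>>>>
>>>>>>>> Something along these lines on the receiving side (a rough sketch only;
>>>>>>>> the message count and buffer layout are made up, and "use mpi" plus the
>>>>>>>> usual initialization are assumed):
>>>>>>>>
>>>>>>>>   integer, parameter :: nmsg = 16      ! however many sends will arrive
>>>>>>>>   integer :: reqs(nmsg), statuses(MPI_STATUS_SIZE, nmsg), i, ierr
>>>>>>>>   integer :: bufs(2, nmsg)             ! one slot per expected message
>>>>>>>>
>>>>>>>>   do i = 1, nmsg
>>>>>>>>      call MPI_IRECV(bufs(:,i), 2, MPI_INTEGER, MPI_ANY_SOURCE, &
>>>>>>>>                     MPI_ANY_TAG, MPI_COMM_WORLD, reqs(i), ierr)
>>>>>>>>   end do
>>>>>>>>   ! ... the matching sends can now be absorbed as they arrive ...
>>>>>>>>   call MPI_WAITALL(nmsg, reqs, statuses, ierr)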
>>>>>>>>
>>>>>>>> Rajeev
>>>>>>>>
>>>>>>>>> -----Original Message-----
>>>>>>>>> From: owner-mpich-discuss at mcs.anl.gov
>>>>>>>>> [mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of Brian Harker
>>>>>>>>> Sent: Friday, October 17, 2008 4:19 PM
>>>>>>>>> To: mpich-discuss at mcs.anl.gov
>>>>>>>>> Subject: [mpich-discuss] Internal memory allocation error?
>>>>>>>>>
>>>>>>>>> Hello list-
>>>>>>>>>
>>>>>>>>> I have a Fortran 90 program that loops over pixels in an image in
>>>>>>>>> parallel.  There are 211K total pixels in the field-of-view, and the
>>>>>>>>> code always crashes around the 160K-th pixel, give or take a hundred
>>>>>>>>> or so, with the following message:
>>>>>>>>>
>>>>>>>>> Fatal error in MPI_Recv: Other MPI error, error stack:
>>>>>>>>> MPI_Recv(186).............................:
>>>>>>>>> MPI_Recv(buf=0x82210d0, count=2, MPI_INTEGER, src=0, tag=MPI_ANY_TAG,
>>>>>>>>> MPI_COMM_WORLD, status=0x82210e0) failed
>>>>>>>>> MPIDI_CH3i_Progress_wait(214).............: an error occurred while
>>>>>>>>> handling an event returned by MPIDU_Sock_Wait()
>>>>>>>>> MPIDI_CH3I_Progress_handle_sock_event(436):
>>>>>>>>> MPIDI_EagerContigIsend(567)...............: failure occurred while
>>>>>>>>> allocating memory for a request object[cli_2]: aborting job:
>>>>>>>>>
>>>>>>>>> Now, I have no dynamically allocatable variables in the code, so does
>>>>>>>>> the error mean there is not enough memory in the buffer for all the
>>>>>>>>> communication at this step?  I have increased MP_BUFFER_MEM from the
>>>>>>>>> default 64M to 128M with no change in the error.  Is it possible that
>>>>>>>>> I'm just trying to do too much at once with my dual-core processor?
>>>>>>>>> I wouldn't think so, since I'm only running the code with 6
>>>>>>>>> processes... and I don't believe this is a data problem.
>>>>>>>>>
>>>>>>>>> Any ideas would be appreciated, and I can post any other information
>>>>>>>>> anyone wants.  Thanks!
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Cheers,
>>>>>>>>> Brian
>>>>>>>>> brian.harker at gmail.com
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> "In science, there is only physics; all the rest is
>>>>>>> stamp-collecting."
>>>>>>>>> -Ernest Rutherford
>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Cheers,
>>>>>>> Brian
>>>>>>> brian.harker at gmail.com
>>>>>>>
>>>>>>>
>>>>>>> "In science, there is only physics; all the rest is
>>>>> stamp-collecting."
>>>>>>> -Ernest Rutherford
>>>>>>>
>>>>>>>
>>>>>
>>>>> --
>>>>> Cheers,
>>>>> Brian
>>>>> brian.harker at gmail.com
>>>>>
>>>>>
>>>>> "In science, there is only physics; all the rest is stamp-collecting."
>>>>>
>>>>> -Ernest Rutherford
>>>>>
>>>>>
>>>
>>>
>>
> 
> 
> 



