[mpich-discuss] Overflow in MPI_Aint

Pavan Balaji balaji at mcs.anl.gov
Tue Jul 21 13:55:07 CDT 2009


Joe: I believe all the Aint-related patches have already gone into 1.1. 
Is something still missing?

Christina: If you update to mpich2-1.1, you can try the --with-aint-size 
configure option. Note that this has not been tested on anything other 
than BG/P, but it might be worth a shot.
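
A quick way to confirm that a rebuild picked up the wider Aint is to compare
sizeof(MPI_Aint) against sizeof(void *); with the option in effect the first
should report 8 while the second stays 4 on a 32-bit build. The exact value to
pass is a guess on my part (e.g. --with-aint-size=8 for an 8-byte Aint):

  #include <stdio.h>
  #include <mpi.h>

  int main(int argc, char **argv)
  {
      MPI_Init(&argc, &argv);
      /* Prints the Aint and pointer sizes of the MPI library this program
         is linked against. */
      printf("sizeof(MPI_Aint) = %d, sizeof(void *) = %d\n",
             (int) sizeof(MPI_Aint), (int) sizeof(void *));
      MPI_Finalize();
      return 0;
  }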

  -- Pavan

On 07/21/2009 01:49 PM, Christina Patrick wrote:
> Hi Joe,
> 
> I am attaching my test case in this email. If you run it with any
> number of processes except one, it will give you the SIGFPE error.
> Similarly, if you change the write in this program to a read, you will
> get the same problem.
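> 
> For reference, the test is essentially a collective MPI-IO write through a
> subarray file view, one block of rows of the 32768 x 32768 double array per
> process. The sketch below is only a rough stand-in for the attachment; the
> file name and the even row split are illustrative:
> 
>   #include <mpi.h>
>   #include <stdio.h>
>   #include <stdlib.h>
> 
>   #define N 32768                /* global array is N x N doubles (~8 GB) */
> 
>   int main(int argc, char **argv)
>   {
>       int rank, nprocs;
>       MPI_Init(&argc, &argv);
>       MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>       MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
> 
>       /* Split the rows evenly across processes (assumes N % nprocs == 0). */
>       int gsizes[2] = { N, N };
>       int lsizes[2] = { N / nprocs, N };
>       int starts[2] = { rank * (N / nprocs), 0 };
> 
>       MPI_Datatype filetype;
>       MPI_Type_create_subarray(2, gsizes, lsizes, starts,
>                                MPI_ORDER_C, MPI_DOUBLE, &filetype);
>       MPI_Type_commit(&filetype);
> 
>       /* About 512 MB per rank with 16 processes. */
>       double *buf = calloc((size_t) lsizes[0] * lsizes[1], sizeof(double));
> 
>       MPI_File fh;
>       MPI_File_open(MPI_COMM_WORLD, "pvfs2:/pvfs-mount/testfile",
>                     MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);
>       /* The extent of this filetype is N*N*8 bytes, which does not fit in
>          a 32-bit MPI_Aint. */
>       MPI_File_set_view(fh, 0, MPI_DOUBLE, filetype, "native", MPI_INFO_NULL);
>       MPI_File_write_all(fh, buf, lsizes[0] * lsizes[1], MPI_DOUBLE,
>                          MPI_STATUS_IGNORE);
>       MPI_File_close(&fh);
> 
>       MPI_Type_free(&filetype);
>       free(buf);
>       MPI_Finalize();
>       return 0;
>   }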
> 
> I would certainly appreciate a patch for this problem. If it is not too
> much trouble, could you please send it to me? I could then try making
> the corresponding changes to my setup.
> 
> Thanks and Regards,
> Christina.
> 
> On Tue, Jul 21, 2009 at 2:06 PM, Joe Ratterman<jratt at us.ibm.com> wrote:
>> Christina,
>> Blue Gene/P is a 32-bit platform where we have hit similar problems.  To get
>> around this, we increased the size of MPI_Aint in MPICH2 to be larger than
>> void*, to 64 bits.  I suspect that your test case would work on our system,
>> and I would like to see your test code if that is possible, so that I can
>> make sure we have it correct.
>> If you are interested, we have patches against 1.0.7 and 1.1.0 that you can
>> use (we skipped 1.0.8).  If you can build MPICH2 using those patches, you
>> may be able to run your application.  On the other hand, they may be too
>> specific to our platform.  We have been working with ANL to incorporate our
>> changes into the standard MPICH2 releases, but there isn't a lot of demand
>> for 64-bit MPI-IO on 32-bit machines.
>>
>> Thanks,
>> Joe Ratterman
>> IBM Blue Gene/P Messaging
>> jratt at us.ibm.com
>>
>>
>> On Fri, Jul 17, 2009 at 7:12 PM, Christina Patrick
>> <christina.subscribes at gmail.com> wrote:
>>> Hi Pavan,
>>>
>>> I ran the command
>>>
>>> $ getconf | grep -i WORD
>>> WORD_BIT=32
>>>
>>> So I guess it is a 32-bit system.
>>>
>>> Thanks and Regards,
>>> Christina.
>>>
>>> On Fri, Jul 17, 2009 at 8:06 PM, Pavan Balaji<balaji at mcs.anl.gov> wrote:
>>>> Is it a 32-bit system? MPI_Aint is the size of a (void *), so on 32-bit
>>>> systems it's restricted to 2GB.
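>>>> For the sizes in this thread:
>>>>
>>>>   2^31 - 1          = 2147483647 bytes  (largest signed 32-bit MPI_Aint)
>>>>   32768 * 32768 * 8 = 8589934592 bytes  (the requested file size)
>>>>
>>>> so the file is about four times that limit.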
>>>>
>>>>  -- Pavan
>>>>
>>>> On 07/17/2009 07:04 PM, Christina Patrick wrote:
>>>>> Hi Everybody,
>>>>>
>>>>> I am trying to create an 8 GB file (a 32768 x 32768 array of 8-byte
>>>>> doubles) using 16 MPI processes. However, every time I try doing that,
>>>>> MPI aborts. The backtrace shows a problem in the
>>>>> ADIOI_Calc_my_off_len() function. There is a variable there:
>>>>> MPI_Aint filetype_extent;
>>>>>
>>>>> and the value of the variable is filetype_extent = 0 whenever it
>>>>> executes
>>>>> MPI_Type_extent(fd->filetype, &filetype_extent);
>>>>> Hence, when it reaches the statement:
>>>>>    335  n_filetypes = (offset - flat_file->indices[0]) / filetype_extent;
>>>>> I always get SIGFPE. Is there a solution to this problem? Can I create
>>>>> such a big file?
>>>>> I checked the value of the variable while creating a file of up to 2 GB
>>>>> and it is NOT zero, which makes me conclude that there is an overflow
>>>>> when I am specifying 8 GB.
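>>>>>
>>>>> In fact 32768 * 32768 * 8 = 8589934592 is exactly 2 * 2^32, so when that
>>>>> extent is truncated to a 32-bit MPI_Aint it wraps to exactly 0, and the
>>>>> division above then traps with SIGFPE. A small sketch of the truncation
>>>>> (assuming the usual two's-complement wraparound):
>>>>>
>>>>>   #include <stdio.h>
>>>>>
>>>>>   int main(void)
>>>>>   {
>>>>>       long long extent64 = 32768LL * 32768 * 8;  /* 8589934592 = 2 * 2^32 */
>>>>>       int       extent32 = (int) extent64;       /* wraps to 0 in 32 bits */
>>>>>       printf("%lld -> %d\n", extent64, extent32);
>>>>>       return 0;
>>>>>   }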
>>>>>
>>>>> Thanks and Regards,
>>>>> Christina.
>>>>>
>>>>> PS: I am using the PVFS2 filesystem with mpich2-1.0.8 and pvfs-2.8.0.
>>>> --
>>>> Pavan Balaji
>>>> http://www.mcs.anl.gov/~balaji
>>>>
>>

-- 
Pavan Balaji
http://www.mcs.anl.gov/~balaji

