[mpich-discuss] Overflow in MPI_Aint

Joseph Ratterman jratt at us.ibm.com
Tue Jul 21 15:10:38 CDT 2009


Pavan,

There are a number of MPI_Aint changes that are not in 1.1.  For example, 
there is a block of changes in configure.in that relates to this, and it 
is only in the BGP version.



Christina,

I ran it on our system, and it exited without error after creating an 8GB 
file.  You can find the patches here:
http://dcmf.anl-external.org/patches/patch-delete
http://dcmf.anl-external.org/patches/1.1.0.patch.gz
http://dcmf.anl-external.org/patches/1.0.7.patch.gz

For 1.0.7, the patch should apply normally.  The 1.1.0 version was changed 
to be a lot smaller by not including deleted files; you will have to run 
"patch-delete" against the resultant tree:
zcat 1.1.0.patch.gz | patch -p0 --force --directory=mpich2-1.1
zcat 1.1.0.patch.gz | ./patch-delete --directory=mpich2-1.1

For both, you will also have to do a "developer style build" (see link), 
and use the "--with-aint-size=8" that Pavan mentioned.
http://wiki.mcs.anl.gov/mpich2/index.php/Getting_And_Building_MPICH2
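
Once the patched tree is built, a quick sanity check (just something I 
suggest, not part of the patches) is to confirm that MPI_Aint really came 
out as 8 bytes:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    /* A stock 32-bit build prints 4 (the size of void*); with
       --with-aint-size=8 this should print 8. */
    printf("sizeof(MPI_Aint) = %d\n", (int)sizeof(MPI_Aint));
    MPI_Finalize();
    return 0;
}

Compile it with the mpicc from the patched install; if it still prints 4, 
the configure option did not take effect.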



$ mpirun -np 16 io
(each of the 16 ranks printed the following; the interleaved output is 
deduplicated here)
rows             = 32768
cols             = 32768
Filesize         = 0
Element size     = 8
Coll buf size    = 8388608
Rows in coll buf = 512
Cols in coll buf = 2048
array_size[]     = {32768, 32768}
array_subsize[]  = {32768, 2048}
array_start[]    = {0, N*2048}    (N = 0..15, one value per rank)



$ ls -l testfile 
-rw-r--r-- 1 jratt jratt 8589934592 2009-07-21 14:40 testfile
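
For reference, the parameters printed above amount to each of the 16 ranks 
writing one 32768 x 2048 column block of a 32768 x 32768 array of doubles, 
512 rows (8 MB) per collective call.  The extent of that filetype is 
32768 * 32768 * 8 = 8589934592 bytes, which wraps to exactly 0 in 32-bit 
arithmetic; that is consistent with the filetype_extent = 0 Christina saw 
in ADIOI_Calc_my_off_len().  The sketch below is only my reconstruction of 
that access pattern (guessing at MPI_Type_create_subarray and 
MPI_File_write_all from the labels in the output), not her actual test code:

#include <stdlib.h>
#include <mpi.h>

#define ROWS       32768
#define COLS       32768
#define CHUNK_ROWS 512

int main(int argc, char **argv)
{
    int rank, nprocs, mycols, chunk, i;
    int sizes[2], subsizes[2], starts[2];
    double *buf;
    MPI_Datatype filetype;
    MPI_File fh;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    mycols      = COLS / nprocs;      /* 2048 columns per rank for np=16 */
    sizes[0]    = ROWS;  sizes[1]    = COLS;
    subsizes[0] = ROWS;  subsizes[1] = mycols;
    starts[0]   = 0;     starts[1]   = rank * mycols;  /* {0,0}, {0,2048}, ... */

    /* The extent of this filetype is the whole 8GB array, which does not
       fit in a 32-bit MPI_Aint; that is where the overflow happens. */
    MPI_Type_create_subarray(2, sizes, subsizes, starts,
                             MPI_ORDER_C, MPI_DOUBLE, &filetype);
    MPI_Type_commit(&filetype);

    /* 512 x 2048 doubles = the 8388608-byte collective buffer above */
    buf = malloc((size_t)CHUNK_ROWS * mycols * sizeof(double));
    for (i = 0; i < CHUNK_ROWS * mycols; i++)
        buf[i] = (double)rank;

    MPI_File_open(MPI_COMM_WORLD, "testfile",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);
    MPI_File_set_view(fh, 0, MPI_DOUBLE, filetype, "native", MPI_INFO_NULL);

    /* Each write_all advances through this rank's part of the view, so
       ROWS/CHUNK_ROWS = 64 calls cover the whole column block. */
    for (chunk = 0; chunk < ROWS / CHUNK_ROWS; chunk++)
        MPI_File_write_all(fh, buf, CHUNK_ROWS * mycols, MPI_DOUBLE,
                           MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
    MPI_Type_free(&filetype);
    free(buf);
    MPI_Finalize();
    return 0;
}

With the 8-byte MPI_Aint build the extent is representable, so the division 
in ADIOI_Calc_my_off_len() no longer hits zero.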




Thanks,
Joe Ratterman
IBM Blue Gene/P Messaging
jratt at us.ibm.com






From: Pavan Balaji <balaji at mcs.anl.gov>
To: mpich-discuss at mcs.anl.gov
Cc: Joseph Ratterman/Rochester/IBM at IBMUS
Date: 07/21/09 01:55 PM
Subject: Re: [mpich-discuss] Overflow in MPI_Aint




Joe: I believe all the Aint related patches have already gone into 1.1. 
Is something still missing?

Christina: If you update to mpich2-1.1, you can try the --with-aint-size 
configure option. Note that this has not been tested on anything other 
than BG/P, but it might be worth a shot.

  -- Pavan

On 07/21/2009 01:49 PM, Christina Patrick wrote:
> Hi Joe,
> 
> I am attaching my test case in this email. If you run it with any
> number of processes except one, it will give you the SIGFPE error.
> Similarly if you change the write in this program to a read, you will
> get the same problem.
> 
> I would sure appreciate a patch for this problem. If it is not too
> much trouble, could you please give me the patch? I could try making
> the corresponding changes to my setup.
> 
> Thanks and Regards,
> Christina.
> 
> On Tue, Jul 21, 2009 at 2:06 PM, Joe Ratterman <jratt at us.ibm.com> wrote:
>> Christina,
>> Blue Gene/P is a 32-bit platform where we have hit similar problems.  To
>> get around this, we increased the size of MPI_Aint in MPICH2 to be larger
>> than void*, to 64 bits.  I suspect that your test case would work on our
>> system, and I would like to see your test code if that is possible.  It
>> should run on our system, and I would like to make sure we have it correct.
>> If you are interested, we have patches against 1.0.7 and 1.1.0 that you
>> can use (we skipped 1.0.8).  If you can build MPICH2 using those patches,
>> you may be able to run your application.  On the other hand, they may be
>> too specific to our platform.  We have been working with ANL to
>> incorporate our changes into the standard MPICH2 releases, but there
>> isn't a lot of demand for 64-bit MPI-IO on 32-bit machines.
>>
>> Thanks,
>> Joe Ratterman
>> IBM Blue Gene/P Messaging
>> jratt at us.ibm.com
>>
>>
>> On Fri, Jul 17, 2009 at 7:12 PM, Christina Patrick
>> <christina.subscribes at gmail.com> wrote:
>>> Hi Pavan,
>>>
>>> I ran the command
>>>
>>> $ getconf | grep -i WORD
>>> WORD_BIT=32
>>>
>>> So I guess it is a 32 bit system.
>>>
>>> Thanks and Regards,
>>> Christina.
>>>
>>>> On Fri, Jul 17, 2009 at 8:06 PM, Pavan Balaji <balaji at mcs.anl.gov> wrote:
>>>> Is it a 32-bit system? MPI_Aint is the size of a (void *), so on 32-bit
>>>> systems it's restricted to 2GB.
>>>>
>>>>  -- Pavan
>>>>
>>>> On 07/17/2009 07:04 PM, Christina Patrick wrote:
>>>>> Hi Everybody,
>>>>>
>>>>> I am trying to create an array 32768 x 32768 x 8 bytes (double) = 8GB
>>>>> file using 16 MPI processes. However, every time I try doing that, MPI
>>>>> aborts. The backtrace is showing me that there is a problem in the
>>>>> ADIOI_Calc_my_off_len() function. There is a variable there:
>>>>> MPI_Aint filetype_extent;
>>>>>
>>>>> and the value of the variable is filetype_extent = 0 whenever it
>>>>> executes
>>>>> MPI_Type_extent(fd->filetype, &filetype_extent);
>>>>> Hence, when it reaches the statement:
>>>>>    335             n_filetypes  = (offset - flat_file->indices[0]) /
>>>>> filetype_extent;
>>>>> I always get SIGFPE. Is there a solution to this problem? Can I create
>>>>> such a big file?
>>>>> I checked the value of the variable while creating a file of up to 2G
>>>>> and it is NOT zero which makes me conclude that there is an overflow
>>>>> when I am specifying 8G.
>>>>>
>>>>> Thanks and Regards,
>>>>> Christina.
>>>>>
>>>>> PS: I am using the PVFS2 filesystem with mpich2-1.0.8 and pvfs-2.8.0.
>>>> --
>>>> Pavan Balaji
>>>> http://www.mcs.anl.gov/~balaji
>>>>
>>

-- 
Pavan Balaji
http://www.mcs.anl.gov/~balaji


