[mpich-discuss] Facing problem while using MPI_File_set_view

Christina Patrick christina.subscribes at gmail.com
Sun May 31 00:26:31 CDT 2009


Hi Wei-keng,

I tried the first code change alone on mpich2-1.0.8, but that gave me a
segfault. I then applied both the first and the second code change and
tested on mpich2-1.0.8 and mpich2-1.0.8p1, but with both changes in
place ADIOI_Calc_aggregator() calls MPI_Abort().

I think you may have some other code change in addition to the two
mentioned above. The error messages for both cases are included below.
I am using pvfs-2.8.0. Which version of pvfs are you using?
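
For reference, the first change (quoted further down in this thread)
clamps the length of the first file piece to the size of the request.
A minimal sketch of that idea follows. It is not the ROMIO source:
first_piece_len() is a hypothetical helper, MIN_OFF is a stand-in for
ADIOI_MIN, and the meanings given to st_frd_size and bufsize are
assumptions based on their names.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for ROMIO's ADIOI_MIN macro (assumed to be a plain minimum). */
#define MIN_OFF(a, b) ((a) < (b) ? (a) : (b))

/* Hypothetical helper: length of the first file piece of a request.
 * st_frd_size: bytes remaining in the first file region (assumption).
 * bufsize:     total bytes the caller wants to transfer (assumption). */
int64_t first_piece_len(int64_t st_frd_size, int64_t bufsize)
{
    /* Without the clamp, the first piece could be longer than the
     * whole request whenever the region is bigger than the buffer. */
    return MIN_OFF(st_frd_size, bufsize);
}

/* Small self-check covering both cases. */
void check_first_piece_len(void)
{
    assert(first_piece_len(4096, 1024) == 1024); /* region larger than request */
    assert(first_piece_len(512, 1024) == 512);   /* region smaller than request */
}
```

With the unclamped assignment, the first case would describe a
4096-byte piece for a 1024-byte request.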

mpich2-1.0.8 (Code change 1):
3:  Program received signal SIGSEGV, Segmentation fault.
3:  0x08057c22 in ADIOI_Calc_my_off_len (fd=0x819cb48, bufcount=1048576,
3:      datatype=1275070475, file_ptr_type=101, offset=-2147483648,
3:      offset_list_ptr=0xbff9972c, len_list_ptr=0xbff99720,
3:      start_offset_ptr=0xbff996f8, end_offset_ptr=0xbff996f0,
3:      contig_access_count_ptr=0xbff99740) at ad_read_coll.c:416
3:  416                 offset_list[k] = off;
3:  (gdb) 3:  (gdb) bt
3:  #0  0x08057c22 in ADIOI_Calc_my_off_len (fd=0x819cb48, bufcount=1048576,
3:      datatype=1275070475, file_ptr_type=101, offset=-2147483648,
3:      offset_list_ptr=0xbff9972c, len_list_ptr=0xbff99720,
3:      start_offset_ptr=0xbff996f8, end_offset_ptr=0xbff996f0,
3:      contig_access_count_ptr=0xbff99740) at ad_read_coll.c:416
3:  #1  0x08058ef0 in ADIOI_GEN_ReadStridedColl (fd=0x819cb48, buf=0xb77ed008,
3:      count=1048576, datatype=1275070475, file_ptr_type=101, offset=0,
3:      status=0xbff99858, error_code=0xbff997b8) at ad_read_coll.c:94
3:  #2  0x080543ee in MPIOI_File_read_all (mpi_fh=0x819cb48, offset=0,
3:      file_ptr_type=101, buf=0xb77ed008, count=1048576, datatype=1275070475,
3:      myname=0x8169e58 "MPI_FILE_READ_ALL", status=0xbff99858) at read_all.c:106
3:  #3  0x080544e9 in PMPI_File_read_all (mpi_fh=0x819cb48, buf=0xb77ed008,
3:      count=1048576, datatype=1275070475, status=0xbff99858) at read_all.c:52
3:  #4  0x0804b5a2 in main (argc=1, argv=0xbff99dd4) at row.c:94

mpich2-1.0.8/mpich2-1.0.8p1 (Code change 1+2):
rank 0 in job 1  aum9.cse.psu.edu_58968   caused collective abort of all ranks
  exit status of rank 0: killed by signal 9
0:  Error in ADIOI_Calc_aggregator(): rank_index(-251) >= fd->hints->cb_nodes (4) fd_size=10643444 off=-2147483648
0:  application called MPI_Abort(MPI_COMM_WORLD, 1) - process
0[cli_0]: aborting job:
0:  application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
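
One observation on both traces: the offset value is -2147483648, which
is exactly INT32_MIN. That would be consistent with a file offset of
2^31 bytes passing through a 32-bit signed integer somewhere, although
the logs alone do not prove it. A small sketch of that wraparound
(wrap_to_int32() is a hypothetical helper, not an MPICH function):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: the value a 64-bit offset takes if only its low
 * 32 bits are kept and read back as a two's-complement signed integer.
 * Written with unsigned arithmetic so the result is well defined. */
int32_t wrap_to_int32(int64_t offset)
{
    uint32_t low = (uint32_t)offset;          /* reduce modulo 2^32 */
    if (low <= (uint32_t)INT32_MAX)
        return (int32_t)low;                  /* fits: value unchanged */
    /* The upper half of the unsigned range maps to the negative values. */
    return (int32_t)(low - 2147483648u) + INT32_MIN;
}

/* Small self-check: 2^31 - 1 survives, 2^31 wraps to INT32_MIN. */
void check_wrap_to_int32(void)
{
    assert(wrap_to_int32(INT64_C(2147483647)) == INT32_MAX);
    assert(wrap_to_int32(INT64_C(2147483648)) == INT32_MIN);
}
```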

Thanks and Regards,
Christina.

On Sat, May 30, 2009 at 10:19 PM, Christina Patrick
<christina.subscribes at gmail.com> wrote:
> Thank you very much for giving me the fix. I really appreciate it.
> I will try it out and let you know.
>
> Regards,
> Christina.
>
> On Sat, May 30, 2009 at 3:03 AM, Wei-keng Liao
> <wkliao at ece.northwestern.edu> wrote:
>> Hi, Christina,
>>
>> There is another bug at line 970 in the same file.
>> The problem has been pointed out by the comments above that line.
>> A quick fix is: (starting from line 958)
>>
>> +    if (file_ptr_type == ADIO_INDIVIDUAL)
>> +        fd->fp_ind = file_offsets[file_list_count-1]+file_lengths[file_list_count-1];
>>      ADIOI_Free(file_offsets);
>>      ADIOI_Free(file_lengths);
>>
>> -    /* Other ADIO routines will convert absolute bytes into counts of datatypes */
>> -    /* when incrementing fp_ind, need to also take into account the file type:
>> -     * consider an N-element 1-d subarray with a lb and ub: ( |---xxxxx-----|
>> -     * if we wrote N elements, offset needs to point at beginning of type, not
>> -     * at empty region at offset N+1) */
>> -    if (file_ptr_type == ADIO_INDIVIDUAL) {
>> -        /* this is closer, but still incorrect for the cases where a small
>> -         * amount of a file type is "leftover" after a write */
>> -        fd->fp_ind = disp + flat_file->indices[j] +
>> -            ((ADIO_Offset)n_filetypes)*filetype_extent;
>> -    }
>>
>>
>> So, together with the earlier fix at line 477, I can run your test code
>> even with MPI_Type_create_subarray() and #define COLL_BUFSIZE (8388608)
>>
>> Wei-keng
>>
>>
>>
>> On May 29, 2009, at 11:38 PM, Wei-keng Liao wrote:
>>
>>> Hi, Christina,
>>>
>>> Can you try the following temporary fix? I believe there is a bug
>>> in adio/ad_pvfs2/ad_pvfs2_read.c, line 477. Please replace
>>>               file_lengths[0] = st_frd_size;
>>> with
>>>               file_lengths[0] = ADIOI_MIN(st_frd_size, bufsize);
>>>
>>>
>>> I tested your program with this fix and it ran fine.
>>> (I used mpich2-1.0.8p1)
>>>
>>> Wei-keng
>>>
>>>
>>>
>>> On May 28, 2009, at 1:28 PM, Christina Patrick wrote:
>>>
>>>> Yeah sure. Please go ahead.
>>>> In case you are able to figure out where the problem lies, please
>>>> do let me know so that I can download the patch.
>>>> Also, could you please tell me how to download the CVS version? And,
>>>> is the problem going away with the latest mpich CVS or pvfs CVS
>>>> versions?
>>>>
>>>> Thanks and Regards,
>>>> Christina.
>>>>
>>>>
>>>>
>>>> On Thu, May 28, 2009 at 8:43 AM, Rob Latham <robl at mcs.anl.gov> wrote:
>>>>>
>>>>> On Wed, May 27, 2009 at 06:04:28PM -0400, Christina Patrick wrote:
>>>>>>
>>>>>> Thank you very much. Since MPI_Type_create_subarray() is much
>>>>>> easier to use than the other APIs for creating datatypes in
>>>>>> MPI, I would really appreciate a solution to this problem.
>>>>>> FYI: I am using pvfs 2.8.0.
>>>>>> I will check it out with pvfs2.8.1 and keep you posted.
>>>>>
>>>>> I think 2.8.1 will also crash.  You will probably have to download the
>>>>> CVS version if you want things to work right now.  We can work on
>>>>> figuring out just what changed between 2.8.1 and recent CVS.
>>>>>
>>>>> Can we add your program to the PVFS testsuite?
>>>>>
>>>>> ==rob
>>>>>
>>>>> --
>>>>> Rob Latham
>>>>> Mathematics and Computer Science Division
>>>>> Argonne National Lab, IL USA
>>>>>
>>>>
>>>
>>
>>
>

