[mpich-discuss] Facing problem while using MPI_File_set_view

Christina Patrick christina.subscribes at gmail.com
Sun May 31 12:03:04 CDT 2009


Hi Wei-keng,

I tested the program and it is working with the patch. Thank you very
much for your help. I really appreciate it.

Have a nice day,
Regards,
Christina.

On Sun, May 31, 2009 at 2:23 AM, Wei-keng Liao
<wkliao at ece.northwestern.edu> wrote:
> Christina,
>
> I attached my patch for mpich2-1.0.8p1 and the test code, modified from
> yours with changes only to COLL_BUFSIZE and fname. I ran it with the
> command "mpiexec -l -n 4 a.out" and got no errors at all. From your
> debug printout, it seems you were not running the same code. Can you
> give my code a try first?
>
> I am using pvfs-2.8.1, and I don't think this is a PVFS2 problem.
>
> Wei-keng
>
> On May 31, 2009, at 12:26 AM, Christina Patrick wrote:
>
>> Hi Wei-keng,
>>
>> I tried the first code change alone on mpich2-1.0.8, but that gave
>> me a segfault.
>> So I applied both the first and second code changes and tested them on
>> mpich2-1.0.8 and mpich2-1.0.8p1, but with these changes in place
>> ADIOI_Calc_aggregator() calls MPI_Abort().
>>
>> I think you may have some other code change in addition to the ones
>> mentioned above. The error messages are included below. I am using
>> pvfs-2.8.0. Which version of PVFS are you using?
>>
>> mpich2-1.0.8 (Code change 1):
>> 3:  Program received signal SIGSEGV, Segmentation fault.
>> 3:  0x08057c22 in ADIOI_Calc_my_off_len (fd=0x819cb48, bufcount=1048576,
>> 3:      datatype=1275070475, file_ptr_type=101, offset=-2147483648,
>> 3:      offset_list_ptr=0xbff9972c, len_list_ptr=0xbff99720,
>> 3:      start_offset_ptr=0xbff996f8, end_offset_ptr=0xbff996f0,
>> 3:      contig_access_count_ptr=0xbff99740) at ad_read_coll.c:416
>> 3:  416                 offset_list[k] = off;
>> 3:  (gdb) 3:  (gdb) bt
>> 3:  #0  0x08057c22 in ADIOI_Calc_my_off_len (fd=0x819cb48, bufcount=1048576,
>> 3:      datatype=1275070475, file_ptr_type=101, offset=-2147483648,
>> 3:      offset_list_ptr=0xbff9972c, len_list_ptr=0xbff99720,
>> 3:      start_offset_ptr=0xbff996f8, end_offset_ptr=0xbff996f0,
>> 3:      contig_access_count_ptr=0xbff99740) at ad_read_coll.c:416
>> 3:  #1  0x08058ef0 in ADIOI_GEN_ReadStridedColl (fd=0x819cb48, buf=0xb77ed008,
>> 3:      count=1048576, datatype=1275070475, file_ptr_type=101, offset=0,
>> 3:      status=0xbff99858, error_code=0xbff997b8) at ad_read_coll.c:94
>> 3:  #2  0x080543ee in MPIOI_File_read_all (mpi_fh=0x819cb48, offset=0,
>> 3:      file_ptr_type=101, buf=0xb77ed008, count=1048576, datatype=1275070475,
>> 3:      myname=0x8169e58 "MPI_FILE_READ_ALL", status=0xbff99858) at read_all.c:106
>> 3:  #3  0x080544e9 in PMPI_File_read_all (mpi_fh=0x819cb48, buf=0xb77ed008,
>> 3:      count=1048576, datatype=1275070475, status=0xbff99858) at read_all.c:52
>> 3:  #4  0x0804b5a2 in main (argc=1, argv=0xbff99dd4) at row.c:94
>>
>> mpich2-1.0.8/mpich2-1.0.8p1 (Code change 1+2):
>> rank 0 in job 1  aum9.cse.psu.edu_58968   caused collective abort of all ranks
>>  exit status of rank 0: killed by signal 9
>> 0:  Error in ADIOI_Calc_aggregator(): rank_index(-251) >= fd->hints->cb_nodes (4) fd_size=10643444 off=-2147483648
>> 0:  application called MPI_Abort(MPI_COMM_WORLD, 1) - process
>> 0[cli_0]: aborting job:
>> 0:  application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
>>
>> Thanks and Regards,
>> Christina.
>>
>> On Sat, May 30, 2009 at 10:19 PM, Christina Patrick
>> <christina.subscribes at gmail.com> wrote:
>>>
>>> Thank you very much for giving me the fix. I really appreciate it.
>>> I will try it out and let you know.
>>>
>>> Regards,
>>> Christina.
>>>
>>> On Sat, May 30, 2009 at 3:03 AM, Wei-keng Liao
>>> <wkliao at ece.northwestern.edu> wrote:
>>>>
>>>> Hi, Christina,
>>>>
>>>> There is another bug at line 970 in the same file.
>>>> The problem is described in the comments above that line.
>>>> A quick fix is (starting from line 958):
>>>>
>>>> +    if (file_ptr_type == ADIO_INDIVIDUAL)
>>>> +        fd->fp_ind = file_offsets[file_list_count-1]+file_lengths[file_list_count-1];
>>>>  ADIOI_Free(file_offsets);
>>>>  ADIOI_Free(file_lengths);
>>>>
>>>> -    /* Other ADIO routines will convert absolute bytes into counts of datatypes */
>>>> -    /* when incrementing fp_ind, need to also take into account the file type:
>>>> -     * consider an N-element 1-d subarray with a lb and ub: ( |---xxxxx-----|
>>>> -     * if we wrote N elements, offset needs to point at beginning of type, not
>>>> -     * at empty region at offset N+1) */
>>>> -    if (file_ptr_type == ADIO_INDIVIDUAL) {
>>>> -        /* this is closer, but still incorrect for the cases where a small
>>>> -         * amount of a file type is "leftover" after a write */
>>>> -        fd->fp_ind = disp + flat_file->indices[j] +
>>>> -            ((ADIO_Offset)n_filetypes)*filetype_extent;
>>>> -    }
>>>>
>>>>
>>>> So, together with the earlier fix at line 477, I can run your test code
>>>> even with MPI_Type_create_subarray() and #define COLL_BUFSIZE (8388608).
>>>>
>>>> Wei-keng
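The quick fix above replaces the old fp_ind computation with "one byte past the end of the last accessed piece". Below is a minimal C sketch of the two rules as plain functions over int64_t values, standing in for ROMIO's ADIO_File and ADIO_Offset types; the function names fp_ind_after_access and fp_ind_old_formula are hypothetical, and the old formula is transcribed literally from the removed lines. With assumed pieces (0,100), (400,100), (800,50) from a filetype of extent 400, the new rule gives 850, while the old formula with the assumed inputs indices[j] = 0 and n_filetypes = 2 gives 800: the start of the last block rather than the end of the 50 bytes actually read, which is the kind of "leftover" case the removed comment warns about.

```c
#include <assert.h>
#include <stdint.h>

/* New rule (the quick fix): the individual file pointer ends one byte
 * past the last contiguous file piece that was actually accessed. */
static int64_t fp_ind_after_access(const int64_t *file_offsets,
                                   const int64_t *file_lengths,
                                   int file_list_count)
{
    assert(file_list_count > 0);
    return file_offsets[file_list_count - 1] + file_lengths[file_list_count - 1];
}

/* Removed rule, transcribed from the deleted lines:
 * disp + flat_file->indices[j] + n_filetypes * filetype_extent.
 * index_j stands in for flat_file->indices[j] (a hypothetical parameter). */
static int64_t fp_ind_old_formula(int64_t disp, int64_t index_j,
                                  int64_t n_filetypes, int64_t filetype_extent)
{
    return disp + index_j + n_filetypes * filetype_extent;
}
```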
>>>>
>>>> On May 29, 2009, at 11:38 PM, Wei-keng Liao wrote:
>>>>
>>>>> Hi, Christina,
>>>>>
>>>>> Can you try the following temporary fix? I believe there is a bug
>>>>> in adio/ad_pvfs2/ad_pvfs2_read.c, line 477. Please replace
>>>>>             file_lengths[0] = st_frd_size;
>>>>> with
>>>>>             file_lengths[0] = ADIOI_MIN(st_frd_size, bufsize);
>>>>>
>>>>>
>>>>> I tested your program with this fix and it ran fine
>>>>> (I used mpich2-1.0.8p1).
>>>>>
>>>>> Wei-keng
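To illustrate why clamping the first length to bufsize matters, here is a simplified C model (not the ROMIO code) of splitting a read of bufsize bytes into contiguous file pieces for a strided view. It assumes, as the name suggests, that st_frd_size is the length of the first contiguous file piece starting at the current position; the function build_piece_list and all of its parameters are hypothetical. Without the ADIOI_MIN-style clamp, the first piece alone could be longer than the user buffer, so the offset/length lists would describe more data than the buffer can hold.

```c
#include <assert.h>
#include <stdint.h>

#define MIN64(a, b) ((a) < (b) ? (a) : (b))

/* Hypothetical sketch: split a read of `bufsize` bytes into contiguous file
 * pieces for a view of `blocklen`-byte blocks repeated every `stride` bytes
 * from displacement `disp`, starting `skip` bytes into the first block.
 * Fills offsets[]/lengths[] and returns the number of pieces. */
static int build_piece_list(int64_t disp, int64_t blocklen, int64_t stride,
                            int64_t skip, int64_t bufsize,
                            int64_t *offsets, int64_t *lengths, int max_pieces)
{
    assert(blocklen > 0 && stride >= blocklen);
    assert(skip >= 0 && skip < blocklen && bufsize >= 0);

    int n = 0;
    int64_t remaining = bufsize;

    if (remaining > 0 && n < max_pieces) {
        /* First piece: the rest of the first block (the assumed meaning of
         * st_frd_size), clamped to what the buffer can still hold. */
        int64_t st_frd_size = blocklen - skip;
        offsets[n] = disp + skip;
        lengths[n] = MIN64(st_frd_size, remaining);
        remaining -= lengths[n];
        n++;
    }
    /* Later pieces: whole blocks; the last one may be partial. */
    for (int64_t block = 1; remaining > 0 && n < max_pieces; block++) {
        offsets[n] = disp + block * stride;
        lengths[n] = MIN64(blocklen, remaining);
        remaining -= lengths[n];
        n++;
    }
    return n;
}
```

For example, with blocklen 100, stride 400, skip 30 and bufsize 50, the unclamped first piece would be 70 bytes, but the clamped one is 50.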
>>>>>
>>>>> On May 28, 2009, at 1:28 PM, Christina Patrick wrote:
>>>>>
>>>>>> Yes, sure. Please go ahead.
>>>>>> If you are able to figure out where the problem lies, please let me
>>>>>> know so that I can download the patch.
>>>>>> Also, could you please tell me how to download the CVS version? And
>>>>>> does the problem go away with the latest MPICH CVS or PVFS CVS
>>>>>> version?
>>>>>>
>>>>>> Thanks and Regards,
>>>>>> Christina.
>>>>>>
>>>>>> On Thu, May 28, 2009 at 8:43 AM, Rob Latham <robl at mcs.anl.gov> wrote:
>>>>>>>
>>>>>>> On Wed, May 27, 2009 at 06:04:28PM -0400, Christina Patrick wrote:
>>>>>>>>
>>>>>>>> Thank you very much. Since MPI_Type_create_subarray() is much
>>>>>>>> easier to use than the other MPI datatype constructors, I would
>>>>>>>> really appreciate a solution to this problem.
>>>>>>>> FYI: I am using pvfs 2.8.0.
>>>>>>>> I will try it with pvfs 2.8.1 and keep you posted.
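For reference, a 2-D filetype built with MPI_Type_create_subarray and MPI_ORDER_C selects one contiguous run of bytes per selected row, and the type's extent is the size of the whole array. The hypothetical helper subarray_pieces_2d below lists those runs as offset/length pairs, roughly the kind of flattened offset/length lists the ROMIO code discussed in this thread works with. It is an illustrative sketch, not how ROMIO actually flattens datatypes, and it does not merge adjacent rows.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: list the contiguous byte ranges that a 2-D,
 * row-major (MPI_ORDER_C) subarray selects from a global array of
 * sizes[0] x sizes[1] elements of elem_size bytes each. One piece per
 * selected row; returns the number of pieces. */
static int subarray_pieces_2d(const int sizes[2], const int subsizes[2],
                              const int starts[2], int64_t elem_size,
                              int64_t *offsets, int64_t *lengths,
                              int max_pieces)
{
    assert(starts[0] >= 0 && starts[0] + subsizes[0] <= sizes[0]);
    assert(starts[1] >= 0 && starts[1] + subsizes[1] <= sizes[1]);
    assert(subsizes[0] <= max_pieces);

    for (int r = 0; r < subsizes[0]; r++) {
        int64_t row = (int64_t)starts[0] + r;    /* global row index */
        /* Byte offset of the first selected element in this row. */
        offsets[r] = (row * sizes[1] + starts[1]) * elem_size;
        /* Each selected row is one contiguous run of subsizes[1] elements. */
        lengths[r] = (int64_t)subsizes[1] * elem_size;
    }
    return subsizes[0];
}
```

For example, sizes {4, 6}, subsizes {2, 3}, starts {1, 2} and 4-byte elements give two 12-byte pieces at byte offsets 32 and 56.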
>>>>>>>
>>>>>>> I think 2.8.1 will also crash.  You will probably have to download
>>>>>>> the CVS version if you want things to work right now.  We can work
>>>>>>> on figuring out just what changed between 2.8.1 and recent CVS.
>>>>>>>
>>>>>>> Can we add your program to the PVFS testsuite?
>>>>>>>
>>>>>>> ==rob
>>>>>>>
>>>>>>> --
>>>>>>> Rob Latham
>>>>>>> Mathematics and Computer Science Division
>>>>>>> Argonne National Lab, IL USA
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>
>>
>
>
>

