[mpich-discuss] Facing problem while using MPI_File_set_view

Wei-keng Liao wkliao at ece.northwestern.edu
Sun May 31 01:23:21 CDT 2009


Christina,

I attached my patch for mpich2-1.0.8p1 and the test code modified from
yours (the only changes are to COLL_BUFSIZE and fname). I ran it with
the command "mpiexec -l -n 4 a.out" and saw no errors at all. From your
debug printout, you do not seem to be running the same code. Can you
give my code a try first?

My pvfs2 is pvfs-2.8.1, and I don't think this is a pvfs2 problem.

Wei-keng

-------------- next part --------------
A non-text attachment was scrubbed...
Name: wkl.patch
Type: application/octet-stream
Size: 3162 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/mpich-discuss/attachments/20090531/83b7a7c7/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Christina.c
Type: application/octet-stream
Size: 2440 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/mpich-discuss/attachments/20090531/83b7a7c7/attachment-0001.obj>
-------------- next part --------------




On May 31, 2009, at 12:26 AM, Christina Patrick wrote:

> Hi Wei-keng,
>
> I tried the first code change (alone) on mpich2-1.0.8, but that
> gave me a segfault.
> So I applied both the first and second code changes and tested them
> on mpich2-1.0.8 and mpich2-1.0.8p1, but with those changes in place
> ADIOI_Calc_aggregator() calls MPI_Abort().
>
> I think you may have some other code change in addition to the ones
> mentioned above. I am including the error messages below. I am using
> pvfs-2.8.0. Which version of pvfs are you using?
>
> mpich2-1.0.8 (Code change 1):
> 3:  Program received signal SIGSEGV, Segmentation fault.
> 3:  0x08057c22 in ADIOI_Calc_my_off_len (fd=0x819cb48, bufcount=1048576,
> 3:      datatype=1275070475, file_ptr_type=101, offset=-2147483648,
> 3:      offset_list_ptr=0xbff9972c, len_list_ptr=0xbff99720,
> 3:      start_offset_ptr=0xbff996f8, end_offset_ptr=0xbff996f0,
> 3:      contig_access_count_ptr=0xbff99740) at ad_read_coll.c:416
> 3:  416                 offset_list[k] = off;
> 3:  (gdb) bt
> 3:  #0  0x08057c22 in ADIOI_Calc_my_off_len (fd=0x819cb48, bufcount=1048576,
> 3:      datatype=1275070475, file_ptr_type=101, offset=-2147483648,
> 3:      offset_list_ptr=0xbff9972c, len_list_ptr=0xbff99720,
> 3:      start_offset_ptr=0xbff996f8, end_offset_ptr=0xbff996f0,
> 3:      contig_access_count_ptr=0xbff99740) at ad_read_coll.c:416
> 3:  #1  0x08058ef0 in ADIOI_GEN_ReadStridedColl (fd=0x819cb48, buf=0xb77ed008,
> 3:      count=1048576, datatype=1275070475, file_ptr_type=101, offset=0,
> 3:      status=0xbff99858, error_code=0xbff997b8) at ad_read_coll.c:94
> 3:  #2  0x080543ee in MPIOI_File_read_all (mpi_fh=0x819cb48, offset=0,
> 3:      file_ptr_type=101, buf=0xb77ed008, count=1048576, datatype=1275070475,
> 3:      myname=0x8169e58 "MPI_FILE_READ_ALL", status=0xbff99858) at read_all.c:106
> 3:  #3  0x080544e9 in PMPI_File_read_all (mpi_fh=0x819cb48, buf=0xb77ed008,
> 3:      count=1048576, datatype=1275070475, status=0xbff99858) at read_all.c:52
> 3:  #4  0x0804b5a2 in main (argc=1, argv=0xbff99dd4) at row.c:94
>
> mpich2-1.0.8 / mpich2-1.0.8p1 (Code changes 1+2):
> rank 0 in job 1  aum9.cse.psu.edu_58968   caused collective abort of all ranks
>   exit status of rank 0: killed by signal 9
> 0:  Error in ADIOI_Calc_aggregator(): rank_index(-251) >= fd->hints->cb_nodes (4) fd_size=10643444 off=-2147483648
> 0:  application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0[cli_0]: aborting job:
> 0:  application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
>
> Thanks and Regards,
> Christina.
>
> On Sat, May 30, 2009 at 10:19 PM, Christina Patrick
> <christina.subscribes at gmail.com> wrote:
>> Thank you very much for giving me the fix. I really appreciate it.
>> I will try it out and let you know.
>>
>> Regards,
>> Christina.
>>
>> On Sat, May 30, 2009 at 3:03 AM, Wei-keng Liao
>> <wkliao at ece.northwestern.edu> wrote:
>>> Hi, Christina,
>>>
>>> There is another bug at line 970 in the same file.
>>> The problem is pointed out by the comments above that line.
>>> A quick fix (starting from line 958) is:
>>>
>>> +    if (file_ptr_type == ADIO_INDIVIDUAL)
>>> +        fd->fp_ind = file_offsets[file_list_count-1] + file_lengths[file_list_count-1];
>>>      ADIOI_Free(file_offsets);
>>>      ADIOI_Free(file_lengths);
>>>
>>> -    /* Other ADIO routines will convert absolute bytes into counts of datatypes */
>>> -    /* when incrementing fp_ind, need to also take into account the file type:
>>> -     * consider an N-element 1-d subarray with a lb and ub: ( |---xxxxx-----|
>>> -     * if we wrote N elements, offset needs to point at beginning of type, not
>>> -     * at empty region at offset N+1) */
>>> -    if (file_ptr_type == ADIO_INDIVIDUAL) {
>>> -        /* this is closer, but still incorrect for the cases where a small
>>> -         * amount of a file type is "leftover" after a write */
>>> -        fd->fp_ind = disp + flat_file->indices[j] +
>>> -            ((ADIO_Offset)n_filetypes)*filetype_extent;
>>> -    }
>>>
>>>
>>> So, together with the earlier fix at line 477, I can run your test
>>> code even with MPI_Type_create_subarray() and
>>> #define COLL_BUFSIZE (8388608).
>>>
>>> Wei-keng
>>>
>>>
>>>
>>> On May 29, 2009, at 11:38 PM, Wei-keng Liao wrote:
>>>
>>>> Hi, Christina,
>>>>
>>>> Can you try the following temporary fix? I believe there is a bug
>>>> in adio/ad_pvfs2/ad_pvfs2_read.c, line 477. Please replace
>>>>              file_lengths[0] = st_frd_size;
>>>> with
>>>>              file_lengths[0] = ADIOI_MIN(st_frd_size, bufsize);
>>>>
>>>>
>>>> I tested your program with this fix and it ran fine.
>>>> (I used mpich2-1.0.8p1.)
>>>>
>>>> Wei-keng
>>>>
>>>>
>>>>
>>>> On May 28, 2009, at 1:28 PM, Christina Patrick wrote:
>>>>
>>>>> Yeah sure. Please go ahead.
>>>>> In case you are able to figure out where the problem lies, please
>>>>> do let me know so that I can download the patch.
>>>>> Also, could you please tell me how to download the CVS version?
>>>>> And does the problem go away with the latest mpich CVS or pvfs CVS
>>>>> versions?
>>>>>
>>>>> Thanks and Regards,
>>>>> Christina.
>>>>>
>>>>>
>>>>>
>>>>> On Thu, May 28, 2009 at 8:43 AM, Rob Latham <robl at mcs.anl.gov>  
>>>>> wrote:
>>>>>>
>>>>>> On Wed, May 27, 2009 at 06:04:28PM -0400, Christina Patrick  
>>>>>> wrote:
>>>>>>>
>>>>>>> Thank you very much. Since the API MPI_Type_create_subarray() is
>>>>>>> much easier to use than the other APIs for creating datatypes in
>>>>>>> MPI, I would really appreciate a solution to this problem.
>>>>>>> FYI: I am using pvfs 2.8.0.
>>>>>>> I will check it out with pvfs-2.8.1 and keep you posted.
>>>>>>
>>>>>> I think 2.8.1 will also crash.  You will probably have to  
>>>>>> download the
>>>>>> CVS version if you want things to work right now.  We can work on
>>>>>> figuring out just what changed between 2.8.1 and recent CVS.
>>>>>>
>>>>>> Can we add your program to the PVFS testsuite?
>>>>>>
>>>>>> ==rob
>>>>>>
>>>>>> --
>>>>>> Rob Latham
>>>>>> Mathematics and Computer Science Division
>>>>>> Argonne National Lab, IL USA
>>>>>>
>>>>>
>>>>
>>>
>>>
>>
>



More information about the mpich-discuss mailing list