[MPICH] slow IOR when using fileview

Yu, Weikuan wyu at ornl.gov
Mon Jul 2 16:00:07 CDT 2007


This is Cray's implementation; I do not know the internals.

--
Weikuan Yu


-----Original Message-----
From: Wei-keng Liao [mailto:wkliao at ece.northwestern.edu]
Sent: Mon 7/2/2007 4:41 PM
To: Yu, Weikuan
Cc: mpich-discuss at mcs.anl.gov; Canon, Richard Shane; Hodson, Stephen W.; Renaud, William A.; Vetter, Jeffrey S.
Subject: RE: [MPICH] slow IOR when using fileview
 
Weikuan,

Can you let me know where in ROMIO this fcntl() is called? I wonder why it 
is called when I use a fileview but not when no fileview is used.

Wei-keng



On Mon, 2 Jul 2007, Yu, Weikuan wrote:

> Hi, Wei-keng,
>
> The problem appears to lie with a costly system call on the Cray XT, in
> this case fcntl(). While fcntl() has been avoided for file systems like
> PVFS/PVFS2, and its cost is negligible on a Linux-based Lustre file
> system, it does seem to be costly on the Cray XT. In addition, given
> that the MPI-IO source code for the Cray XT lies with Cray, we need to
> work with Cray to get a possible fix for this as soon as we can. In the
> meantime, you may try to avoid the use of write_all or explore some
> other alternative.
>
> Thanks again for your report,
> --Weikuan
>
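A rough way to see how expensive the call is on a given platform is to time a lock/unlock pair in isolation. The sketch below is not from ROMIO or from the test code in this thread; the file name, the 10 MB lock range, and the iteration count are arbitrary assumptions chosen only to illustrate measuring the per-call cost of fcntl().

/* Minimal timer for fcntl() record locks on a file.
 * The file name, iteration count, and byte range are illustrative
 * assumptions, not taken from ROMIO or from the attached test code. */
#include <stdio.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/time.h>

static double now(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return tv.tv_sec + tv.tv_usec * 1.0e-6;
}

int main(void)
{
    const int niter = 1000;
    int fd = open("lock_test.dat", O_CREAT | O_RDWR, 0644);
    if (fd < 0) { perror("open"); return 1; }

    struct flock lock;
    memset(&lock, 0, sizeof(lock));
    lock.l_whence = SEEK_SET;
    lock.l_start  = 0;
    lock.l_len    = 10 * 1024 * 1024;    /* lock a 10 MB range */

    double t0 = now();
    for (int i = 0; i < niter; i++) {
        lock.l_type = F_WRLCK;           /* acquire write lock */
        fcntl(fd, F_SETLKW, &lock);
        lock.l_type = F_UNLCK;           /* release it */
        fcntl(fd, F_SETLKW, &lock);
    }
    double t1 = now();

    printf("avg lock+unlock: %.2f usec\n", (t1 - t0) / niter * 1.0e6);
    close(fd);
    return 0;
}

Running something along these lines on the compute nodes of each platform would show whether the fcntl() path itself accounts for the gap between the two systems.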
>> -----Original Message-----
>> From: Wei-keng Liao [mailto:wkliao at ece.northwestern.edu]
>> Sent: Monday, July 02, 2007 2:02 PM
>> To: Yu, Weikuan
>> Cc: mpich-discuss at mcs.anl.gov; Canon, Richard Shane; Hodson,
>> Stephen W.; Renaud, William A.
>> Subject: RE: [MPICH] slow IOR when using fileview
>>
>> Weikuan,
>>
>> I found this problem when I ran the IOR benchmark. I extracted a
>> simpler code to reproduce this situation (it is provided in my earlier
>> post on this list). The 10 MB is used by this simpler code in order to
>> show the performance difference I mentioned. It does not mean I used
>> only 10 MB in my IOR runs.
>>
>> Wei-keng
>>
>>
>> On Mon, 2 Jul 2007, Yu, Weikuan wrote:
>>
>>>
>>> Thanks for reporting this. I got a similar report from a ticket you
>>> filed at ORNL. I am following up on this thread to bring it to folks'
>>> collective attention.
>>>
>>> While there are many differences between the Cray XT and other
>>> platforms with a Linux-based Lustre file system or PVFS, such as
>>> caching and the communication library, this performance difference
>>> between write_all and write_at_all does not seem to be directly
>>> related to them. Besides, running collective I/O with IOR may not be
>>> advisable with the intended pattern of 10 MB per file. Could you give
>>> a little more detail on the actual need for, or intention behind, a
>>> file view here, or a brief description of the intended access pattern
>>> in your application?
>>>
>>> --
>>> Weikuan Yu, Ph.D
>>> Future Technologies & Technology Integration
>>> Oak Ridge National Laboratory
>>> Oak Ridge, TN 37831-6173
>>> Email: wyu at ornl.gov
>>> http://ft.ornl.gov/~wyu/
>>>
>>>> -----Original Message-----
>>>> From: owner-mpich-discuss at mcs.anl.gov
>>>> [mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of Wei-keng Liao
>>>> Sent: Saturday, June 30, 2007 3:03 AM
>>>> To: mpich-discuss at mcs.anl.gov
>>>> Subject: [MPICH] slow IOR when using fileview
>>>>
>>>>
>>>> I am experiencing slow IOR performance on a Cray XT3 when using the
>>>> fileview option. I extracted the code into a simpler version
>>>> (attached). The code compares two collective writes:
>>>> MPI_File_write_all and MPI_File_write_at_all. The former uses an MPI
>>>> fileview and the latter uses an explicit file offset. In both cases,
>>>> each process writes 10 MB to a shared file, contiguously,
>>>> non-overlapping, and non-interleaved. On the Cray XT3 with the
>>>> Lustre file system, the former is dramatically slower than the
>>>> latter. Here is the output for a run with 8 processes:
>>>>
>>>> 2: MPI_File_write_all() time = 4.72 sec
>>>> 3: MPI_File_write_all() time = 4.74 sec
>>>> 6: MPI_File_write_all() time = 4.77 sec
>>>> 1: MPI_File_write_all() time = 4.79 sec
>>>> 7: MPI_File_write_all() time = 4.81 sec
>>>> 0: MPI_File_write_all() time = 4.83 sec
>>>> 5: MPI_File_write_all() time = 4.85 sec
>>>> 4: MPI_File_write_all() time = 4.89 sec
>>>> 2: MPI_File_write_at_all() time = 0.02 sec
>>>> 1: MPI_File_write_at_all() time = 0.02 sec
>>>> 3: MPI_File_write_at_all() time = 0.02 sec
>>>> 0: MPI_File_write_at_all() time = 0.02 sec
>>>> 6: MPI_File_write_at_all() time = 0.02 sec
>>>> 4: MPI_File_write_at_all() time = 0.02 sec
>>>> 7: MPI_File_write_at_all() time = 0.02 sec
>>>> 5: MPI_File_write_at_all() time = 0.02 sec
>>>>
>>>> I tried the same code on other machines and different file systems
>>>> (e.g., PVFS), and the timings for the two cases were very close to
>>>> each other. If anyone has access to a Cray XT3 machine, could you
>>>> please try it and let me know?
>>>> Thanks.
>>>>
>>>> Wei-keng
>>>>
>>>
>>
>
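Since the attachment with the simplified test is not preserved in this archive, here is a minimal sketch of the comparison Wei-keng describes, for readers who want to try it. The file names, the 10 MB buffer size, and the use of MPI_BYTE for both the etype and filetype are assumptions based on the description above, not the original code.

/* Sketch of the comparison described in the thread: each rank writes a
 * contiguous, non-overlapping 10 MB block to a shared file, once through
 * a fileview + MPI_File_write_all and once through an explicit offset +
 * MPI_File_write_at_all. This is not the original attachment. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define LEN (10 * 1024 * 1024)   /* 10 MB per process */

int main(int argc, char **argv)
{
    int rank, nprocs;
    MPI_File fh;
    MPI_Offset offset;
    double t;
    char *buf = malloc(LEN);

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
    memset(buf, 'a' + rank % 26, LEN);
    offset = (MPI_Offset)rank * LEN;

    /* Case 1: set a fileview so each rank sees only its own block,
       then write collectively at view-relative offset 0. */
    MPI_File_open(MPI_COMM_WORLD, "testfile.view",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);
    MPI_File_set_view(fh, offset, MPI_BYTE, MPI_BYTE, "native",
                      MPI_INFO_NULL);
    MPI_Barrier(MPI_COMM_WORLD);
    t = MPI_Wtime();
    MPI_File_write_all(fh, buf, LEN, MPI_BYTE, MPI_STATUS_IGNORE);
    t = MPI_Wtime() - t;
    MPI_File_close(&fh);
    printf("%d: MPI_File_write_all() time = %.2f sec\n", rank, t);

    /* Case 2: no fileview; pass the explicit file offset instead. */
    MPI_File_open(MPI_COMM_WORLD, "testfile.at",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);
    MPI_Barrier(MPI_COMM_WORLD);
    t = MPI_Wtime();
    MPI_File_write_at_all(fh, offset, buf, LEN, MPI_BYTE,
                          MPI_STATUS_IGNORE);
    t = MPI_Wtime() - t;
    MPI_File_close(&fh);
    printf("%d: MPI_File_write_at_all() time = %.2f sec\n", rank, t);

    free(buf);
    MPI_Finalize();
    return 0;
}

Both calls move the same bytes to the same file offsets; if the behavior reported above holds, the write_all path with the fileview will be much slower, which points at the code path taken inside the MPI-IO library rather than at the amount of data written.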



