[mpich-discuss] Performance issue when writing large files with MPI-IO/ROMIO and OrangeFS (PVFS)

Louis-Claude Canon louis-claude.canon at inria.fr
Fri Mar 16 03:41:39 CDT 2012


On 03/15/2012 10:16 PM, Rob Latham wrote:
> On Wed, Mar 14, 2012 at 06:18:34PM +0100, Louis-Claude Canon wrote:
>> I am seeing significant variability and low performance when writing
>> large files, whereas it is stable with small ones (the threshold
>> seems to be around 100 MB) with one server and one client.
>> IOR -a MPIIO -i 10 -o pvfs2:/mnt/pvfs2/iortest -t 10000000 -b 10000000
>> # 10 MB
>> Operation  Max (MiB)  Min (MiB)  Mean (MiB)   Std Dev
>> write          92.06      88.43       91.61      1.06
>> IOR -a MPIIO -i 10 -o pvfs2:/mnt/pvfs2/iortest -t 200000000 -b 200000000
>> # 200 MB
>> write          94.67      35.72       68.07     26.17
> Writes are funny operations.  They may or may not get cached.  What
> might be happening is that once you've exceeded a certain size, the
> PVFS servers will start exhausting cache and have to actually write to
> disk.

I also assumed that some specific mechanism kicks in once a limit is 
reached. However, I do not understand why pvfs2-cp is not affected by 
this. And even if it were, why does it take so long to write large 
contiguous files (dd reports a bandwidth of around 500 MB/s)?

I increased and decreased the values in /proc/sys/vm/dirty_* without 
seeing any significant impact.
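For reference, these are the writeback thresholds I varied. The loop below (a minimal sketch; the file names are the standard Linux vm sysctls) only prints the current values — changing them requires root, e.g. `sysctl -w vm.dirty_ratio=10`:

```shell
# Inspect the VFS writeback (dirty page) thresholds. These control when
# the kernel starts flushing cached writes to the backing store.
for f in /proc/sys/vm/dirty_ratio \
         /proc/sys/vm/dirty_background_ratio \
         /proc/sys/vm/dirty_expire_centisecs \
         /proc/sys/vm/dirty_writeback_centisecs; do
    printf '%s = %s\n' "$f" "$(cat "$f")"
done
```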

I plan to repeat these measurements with the POSIX interface and with 
Lustre, and will post the results if anything seems relevant.

> Doesn't IOR report a bandwidth that also includes open/close times?  If
> so, I suspect you will get more consistent values that way.

IOR does indeed include the time to open and close files when computing 
the bandwidth. But those times are quite stable, so the instability must 
come from the call to MPI_File_write. In other measurements, I observed 
instability even with small files when calling MPI_File_sync before 
closing the file.
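The effect of putting the sync inside the timed region can be reproduced with plain dd (a rough analogy to MPI_File_sync, not the MPI code path; /tmp/ddtest is an arbitrary scratch file):

```shell
# Write 100 MB without and with a flush included in dd's own timing.
# With conv=fsync, the reported bandwidth includes flushing the page
# cache to disk, analogous to calling MPI_File_sync before close.
dd if=/dev/zero of=/tmp/ddtest bs=1M count=100 2>&1 | tail -1
dd if=/dev/zero of=/tmp/ddtest bs=1M count=100 conv=fsync 2>&1 | tail -1
rm -f /tmp/ddtest
```

The first number reflects mostly cached writes; the second includes the physical write-out, which is where the variability would show up.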

Louis

>> I am not sure if it is related to OrangeFS or ROMIO. When I use
>> pvfs2-cp with a file of size 2GB, the bandwidth is correct and
>> stable, which suggests that it comes from ROMIO:
>> dd if=/dev/zero of=/tmp/test bs=1000000 count=2000
>> pvfs2-cp -t /tmp/test /mnt/pvfs2/test
>> Wrote 2000000000 bytes in 19.527506 seconds. 97.674973 MB/seconds
>>
>> On the other hand when I enable TroveSyncData or when I put
>> TroveMethod to directio with OrangeFS, the variability disappears.
> This setting further makes me suspect the server-side VFS cache: both
> those settings (in different ways) make the VFS cache irrelevant.
>
> ==rob

