[mpich-discuss] Performance issue when writing large files with MPI-IO/ROMIO and OrangeFS (PVFS)
Louis-Claude Canon
louis-claude.canon at inria.fr
Fri Mar 16 03:41:39 CDT 2012
On 03/15/2012 10:16 PM, Rob Latham wrote:
> On Wed, Mar 14, 2012 at 06:18:34PM +0100, Louis-Claude Canon wrote:
>> With one server and one client, I am seeing significant variability and
>> low performance when writing large files, whereas performance is stable
>> with small files (the threshold seems to be around 100 MB).
>> IOR -a MPIIO -i 10 -o pvfs2:/mnt/pvfs2/iortest -t 10000000 -b 10000000
>> # 10 MB
>> Operation  Max (MiB)  Min (MiB)  Mean (MiB)  Std Dev
>> write          92.06      88.43       91.61     1.06
>> IOR -a MPIIO -i 10 -o pvfs2:/mnt/pvfs2/iortest -t 200000000 -b 200000000
>> # 200 MB
>> write          94.67      35.72       68.07    26.17
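To put a number on the variability, the standard deviation can be expressed as a fraction of the mean (the coefficient of variation). The short Python sketch below only reuses the Mean and Std Dev values from the two IOR runs. The dictionary layout and the function name are illustrative and are not part of IOR.

```python
# Coefficient of variation (std dev / mean) of the write bandwidth,
# using the Mean and Std Dev reported by IOR for the two transfer sizes.
runs = {
    "10 MB": {"mean_mib_s": 91.61, "std_mib_s": 1.06},
    "200 MB": {"mean_mib_s": 68.07, "std_mib_s": 26.17},
}


def coefficient_of_variation(mean: float, std: float) -> float:
    """Relative spread: the std dev expressed as a fraction of the mean."""
    return std / mean


for label, r in runs.items():
    cv = coefficient_of_variation(r["mean_mib_s"], r["std_mib_s"])
    print(f"{label}: CV = {cv:.1%}")
# prints "10 MB: CV = 1.2%" then "200 MB: CV = 38.4%"
```

With these figures the spread grows from about 1% of the mean for 10 MB transfers to about 38% for 200 MB transfers, so the large-file runs are not only slower on average but far less predictable.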
> Writes are funny operations. They may or may not get cached. What
> might be happening is that once you've exceeded a certain size, the
> PVFS servers will start exhausting cache and have to actually write to
> disk.
I had also assumed that some specific mechanism kicks in once a limit is
reached. However, I do not understand why pvfs2-cp is not affected by it.
And even if it were, why does it take so long to write large contiguous
files when dd reports a bandwidth of around 500 MB/s?
I tried both increasing and decreasing the /proc/sys/vm/dirty_* values
without seeing any significant impact.
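For reference, the /proc/sys/vm/dirty_* entries are the Linux writeback settings (for example dirty_ratio, dirty_background_ratio, dirty_expire_centisecs and dirty_writeback_centisecs) that control how much dirty page-cache data may accumulate before it is flushed to disk. A minimal Python sketch to record their current values on a node, assuming a Linux host with /proc mounted; the function name is hypothetical:

```python
from pathlib import Path

VM_DIR = Path("/proc/sys/vm")


def read_dirty_settings(vm_dir: Path = VM_DIR) -> dict[str, str]:
    """Return the current value of every vm dirty_* setting, keyed by name.

    Returns an empty dict when the directory does not exist (e.g. non-Linux).
    """
    settings: dict[str, str] = {}
    if not vm_dir.is_dir():
        return settings
    for path in sorted(vm_dir.glob("dirty_*")):
        try:
            settings[path.name] = path.read_text().strip()
        except OSError:
            # Some entries may be unreadable in restricted environments.
            continue
    return settings


if __name__ == "__main__":
    for name, value in read_dirty_settings().items():
        print(f"vm.{name} = {value}")
```

Changing a value requires root, for example `sysctl -w vm.dirty_ratio=10` (the number is only an example). Since the suspected cache is on the PVFS server side, the server's values are probably the ones that matter most, so they are worth recording there before and after each change.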
I plan to repeat these measurements with the POSIX interface and on
Lustre, and will post the results if anything relevant shows up.
> Doesn't IOR report a bandwidth that also includes open/close times? If
> so, I suspect you will get more consistent values that way.
IOR does indeed include the time to open and close the file when computing
the bandwidth. However, these times are quite stable, so the instability
comes from the call to MPI_File_write. In other measurements, I also
observed instability with small files when calling MPI_File_sync before
closing the file.
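One way to separate the phases, in the spirit of the planned POSIX-interface measurements, is to time open, write, sync and close individually. The Python sketch below does this with plain POSIX calls; os.fsync stands in for the role MPI_File_sync plays in the MPI-IO runs. The function names, chunk size and default path are assumptions for illustration, not a description of what IOR does internally.

```python
import os
import tempfile
import time


def time_write_phases(path: str, size_bytes: int, chunk_bytes: int = 1 << 20,
                      sync: bool = True) -> dict[str, float]:
    """Write size_bytes of zeros to path, timing each phase separately.

    Returns elapsed seconds for "open", "write", "fsync" (0.0 when sync is
    False) and "close", so a noisy phase can be told apart from stable ones.
    """
    buf = bytes(chunk_bytes)  # all-zero buffer, like dd if=/dev/zero
    timings: dict[str, float] = {}

    t0 = time.perf_counter()
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    timings["open"] = time.perf_counter() - t0

    try:
        t0 = time.perf_counter()
        remaining = size_bytes
        while remaining > 0:
            # os.write may write fewer bytes than asked; since the buffer is
            # all zeros, restarting from its beginning keeps the data correct.
            remaining -= os.write(fd, buf[:min(chunk_bytes, remaining)])
        timings["write"] = time.perf_counter() - t0

        timings["fsync"] = 0.0
        if sync:
            t0 = time.perf_counter()
            os.fsync(fd)  # POSIX counterpart of MPI_File_sync before closing
            timings["fsync"] = time.perf_counter() - t0
    finally:
        t0 = time.perf_counter()
        os.close(fd)
        timings["close"] = time.perf_counter() - t0
    return timings


def bandwidth_mib_s(size_bytes: int, timings: dict[str, float]) -> float:
    """Bandwidth over all timed phases (open and close included), in MiB/s."""
    return size_bytes / sum(timings.values()) / 2**20


def default_path() -> str:
    """Hypothetical default target; point it at the file system under test."""
    return os.path.join(tempfile.gettempdir(), "posix_write_test")
```

Calling time_write_phases in a loop, for instance ten times with size_bytes=200_000_000 to mirror the IOR runs, and comparing the spread of each key would show which phase carries the instability. Running it against PVFS assumes the volume is also mounted through the kernel client, so that an ordinary path under /mnt/pvfs2 works; the thread does not confirm this, since both IOR (pvfs2: prefix) and pvfs2-cp bypass the kernel mount.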
Louis
>> I am not sure whether this is related to OrangeFS or to ROMIO. When I use
>> pvfs2-cp with a 2 GB file, the bandwidth is as expected and stable,
>> which suggests that the problem comes from ROMIO:
>> dd if=/dev/zero of=/tmp/test bs=1000000 count=2000
>> pvfs2-cp -t /tmp/test /mnt/pvfs2/test
>> Wrote 2000000000 bytes in 19.527506 seconds. 97.674973 MB/seconds
>>
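A quick unit check helps when comparing the pvfs2-cp figure with IOR: dividing the bytes written by the elapsed time shows whether its "MB/seconds" is decimal or binary. The Python snippet only reuses the two numbers pvfs2-cp printed.

```python
# Convert the pvfs2-cp result to both binary and decimal units to see
# which one its "MB/seconds" actually uses.
bytes_written = 2_000_000_000
seconds = 19.527506

bytes_per_s = bytes_written / seconds
mib_per_s = bytes_per_s / 2**20   # binary mebibytes, as in IOR's (MiB) columns
mb_per_s = bytes_per_s / 10**6    # decimal megabytes

print(f"{mib_per_s:.2f} MiB/s, {mb_per_s:.2f} MB/s")
# prints "97.67 MiB/s, 102.42 MB/s"
```

The reported 97.674973 matches the binary value, so pvfs2-cp effectively reports MiB/s, the same unit as IOR's (MiB) columns, and its single 2 GB run can be compared directly with the IOR means of 91.61 and 68.07 MiB/s.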
>> On the other hand, when I enable TroveSyncData or set TroveMethod to
>> directio in OrangeFS, the variability disappears.
> These settings further make me suspect the server-side VFS cache: both
> of them (in different ways) make the VFS cache irrelevant.
>
> ==rob