[mpich-discuss] Parallel I/O on Lustre: MPI Vs. POSIX

Rajeev Thakur thakur at mcs.anl.gov
Tue Jun 21 12:47:57 CDT 2011


Maybe there is some synchronization in the close.

Rajeev
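
One way to check where the time goes is to time the write and the close separately, with a barrier in between so that stragglers from the write phase are not charged to the close. Below is a minimal sketch using independent MPI I/O with the 32MB-per-rank blocks discussed in this thread; the file name and the missing error checking are placeholders, not the code from the attached benchmarks:

    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        const MPI_Offset blockSize = 32 * 1024 * 1024;   /* 32MB per rank */
        char *buf = (char *)malloc(blockSize);
        memset(buf, rank, blockSize);

        MPI_File fh;
        MPI_File_open(MPI_COMM_WORLD, "shared.dat",
                      MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

        MPI_Barrier(MPI_COMM_WORLD);
        double tWrite = MPI_Wtime();
        MPI_File_write_at(fh, rank * blockSize, buf, (int)blockSize,
                          MPI_BYTE, MPI_STATUS_IGNORE);
        tWrite = MPI_Wtime() - tWrite;

        MPI_Barrier(MPI_COMM_WORLD);     /* every rank has issued its write */
        double tClose = MPI_Wtime();
        MPI_File_close(&fh);             /* collective; may flush/sync here */
        tClose = MPI_Wtime() - tClose;

        double wMax, cMax;
        MPI_Reduce(&tWrite, &wMax, 1, MPI_DOUBLE, MPI_MAX, 0, MPI_COMM_WORLD);
        MPI_Reduce(&tClose, &cMax, 1, MPI_DOUBLE, MPI_MAX, 0, MPI_COMM_WORLD);
        if (rank == 0)
            printf("max write: %g s, max close: %g s\n", wMax, cMax);

        free(buf);
        MPI_Finalize();
        return 0;
    }

If the close is still expensive behind the barrier, the time is being spent in the close itself (e.g. flushing client-side write caches) rather than in waiting for slower ranks.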

On Jun 21, 2011, at 12:38 PM, George Zagaris wrote:

> Hi Rajeev,
> 
> Yes, the previous numbers were with collective MPI I/O.
> Attached is a new figure that includes the non-collective MPI I/O.
> Indeed, these numbers are very close to the POSIX I/O numbers and
> somewhat better than buffered I/O to separate files.
> 
> One thing that puzzles me is why the close is so expensive.
> 
> Thanks again for all your feedback and help.
> 
> Best,
> George
> 
> On Tue, Jun 21, 2011 at 12:09 PM, Rajeev Thakur <thakur at mcs.anl.gov> wrote:
>> Are you still using the _all versions of the MPI functions? The results with MPI read/write and POSIX read/write should be more or less identical in this example. The problem with writing to separate files is that you have to deal with so many of them, and you may need some post-processing to use them when you run the program with a different number of processes.
>> 
>> Rajeev
>> 
>> 
>> On Jun 21, 2011, at 11:02 AM, George Zagaris wrote:
>> 
>>> Dear Rajeev,
>>> 
>>> Thank you very much for your feedback.
>>> 
>>> I followed your suggestion and implemented benchmarks (also attached) with POSIX
>>> open/read/write calls writing to both separate files and a shared file.
>>> 
>>> A summary of the results is also given in the attached chart_2.png.
>>> These measurements were obtained with 32 MPI processes (4 nodes x 8
>>> cores/node), a stripe count (number of OSTs) of 32, and a stripe size
>>> of 32MB. Moreover, each process writes 32MB, so the data is
>>> stripe-aligned, and since the number of OSTs equals the number of I/O
>>> processes I would not expect any performance degradation due to
>>> file-system contention.
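
A shared-file POSIX benchmark of this shape typically boils down to each rank writing its 32MB block at offset rank * 32MB, which keeps the accesses stripe-aligned. The sketch below is only an illustration under those assumptions (the file name and absent error handling are placeholders, not the attached POSIXSharedFile.cxx); the stripe count and size would be set on the file or directory beforehand with lfs setstripe, whose exact flags depend on the Lustre version.

    #include <mpi.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <stdlib.h>
    #include <string.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        const size_t blockSize = 32 * 1024 * 1024;        /* 32MB per rank */
        char *buf = (char *)malloc(blockSize);
        memset(buf, rank, blockSize);

        /* Every rank opens the same file; O_CREAT is harmless if it exists. */
        int fd = open("shared_posix.dat", O_WRONLY | O_CREAT, 0644);

        /* Positioned, unbuffered write: no lseek, no stdio buffering. */
        pwrite(fd, buf, blockSize, (off_t)rank * (off_t)blockSize);

        close(fd);
        free(buf);
        MPI_Finalize();
        return 0;
    }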
>>> 
>>> The results appear to favor unbuffered I/O to separate files as the
>>> best strategy. I am wondering whether this will still hold as the data
>>> size grows. What would be the reasons for not choosing this strategy
>>> for large-scale I/O? Any thoughts?
>>> 
>>> I sincerely thank you for all your time and help.
>>> 
>>> Best Regards,
>>> George
>>> 
>>> 
>>>> Message: 3
>>>> Date: Mon, 20 Jun 2011 16:01:09 -0500
>>>> From: Rajeev Thakur <thakur at mcs.anl.gov>
>>>> Subject: Re: [mpich-discuss] Parallel I/O on Lustre: MPI Vs. POSIX
>>>> To: mpich-discuss at mcs.anl.gov
>>>> Message-ID: <374DEFCF-7C41-43E7-B43F-5D6AD4F8077A at mcs.anl.gov>
>>>> Content-Type: text/plain; charset=us-ascii
>>>> 
>>>> Try using the independent I/O functions MPI_File_write_at and MPI_File_read_at instead of the collective ones for this access pattern (large contiguous blocks). Also, the closest POSIX functions to compare with are open/read/write instead of fopen/fread/fwrite. And you can write to a shared file with POSIX I/O as well (open/read/write) for a more equal comparison.
>>>> 
>>>> Rajeev
>>>> 
>>>> 
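
At the call level, the switch suggested above is just a matter of dropping the _all suffix; a fragment for illustration, assuming fh, buf, rank, and blockSize are set up as in the earlier sketch on this page:

    /* Collective variant: every rank in the communicator must call it, and
       ROMIO may coordinate the ranks (e.g. two-phase I/O). */
    MPI_File_write_at_all(fh, rank * blockSize, buf, (int)blockSize,
                          MPI_BYTE, MPI_STATUS_IGNORE);

    /* Independent variant: each rank writes its own large contiguous block
       with no coordination, which suits this access pattern. */
    MPI_File_write_at(fh, rank * blockSize, buf, (int)blockSize,
                      MPI_BYTE, MPI_STATUS_IGNORE);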
>>> <POSIXSeparateFile.cxx><POSIXSharedFile.cxx><chart_2.png>
>> 
>> 
> <chart_2_2.png>


