[mpich-discuss] Parallel I/O on Lustre: MPI Vs. POSIX
George Zagaris
george.zagaris at kitware.com
Tue Jun 21 13:17:48 CDT 2011
Hi Rajeev,
I agree that there is synchronization in MPI_File_close(). Let me re-phrase
my question.
What I am observing is that when I open a file for writing with:
MPI_File_open(
    MPI_COMM_WORLD, "mpitestdata.dat",
    MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL,
    &fhandle );
MPI_File_close() takes ~0.6s.
In contrast, when I open the file for reading with:
MPI_File_open(
    MPI_COMM_WORLD, "mpitestdata.dat",
    MPI_MODE_RDONLY, MPI_INFO_NULL,
    &fhandle );
MPI_File_close() takes ~0.00499s.
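For reference, the pattern being timed boils down to something like the
sketch below (illustrative only: the 32MB per-rank block, the independent
write call, and the barrier before timing the close are assumptions rather
than the exact benchmark code):

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main( int argc, char **argv )
{
  MPI_Init( &argc, &argv );

  int rank;
  MPI_Comm_rank( MPI_COMM_WORLD, &rank );

  /* Each rank writes one contiguous 32MB block at its own offset. */
  const int count = 32 * 1024 * 1024;
  char *buffer = (char *)malloc( count );
  memset( buffer, rank, count );

  MPI_File fhandle;
  MPI_File_open(
      MPI_COMM_WORLD, "mpitestdata.dat",
      MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL,
      &fhandle );
  MPI_File_write_at( fhandle, (MPI_Offset)rank * count, buffer, count,
                     MPI_BYTE, MPI_STATUS_IGNORE );

  /* Barrier so the close timing is not mixed with slower writers. */
  MPI_Barrier( MPI_COMM_WORLD );
  double start = MPI_Wtime();
  MPI_File_close( &fhandle );
  double end = MPI_Wtime();

  if( rank == 0 )
    printf( "MPI_File_close after write: %f s\n", end - start );

  free( buffer );
  MPI_Finalize();
  return 0;
}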
Any thoughts? Is there a reason that MPI_File_close() on a handle that was
opened for writing would take much longer?
Thank you very much for your input.
Best Regards,
George
On Tue, Jun 21, 2011 at 1:47 PM, Rajeev Thakur <thakur at mcs.anl.gov> wrote:
> Maybe there is some synchronization in the close.
>
> Rajeev
>
> On Jun 21, 2011, at 12:38 PM, George Zagaris wrote:
>
>> Hi Rajeev,
>>
>> Yes, the previous numbers were with collective MPI I/O.
>> Attached is a new figure that includes the non-collective MPI I/O.
>> Indeed, these numbers are very close to the POSIX I/O and relatively
>> better than buffered I/O on separate files.
>>
>> One of the things that is puzzling me is why the close is so expensive.
>>
>> Thanks again for all your feedback and help.
>>
>> Best,
>> George
>>
>> On Tue, Jun 21, 2011 at 12:09 PM, Rajeev Thakur <thakur at mcs.anl.gov> wrote:
>>> Are you still using the _all versions of the MPI functions? The results with MPI read/write and POSIX read/write should be more or less identical in this example. The problem with writing to separate files is that you have to deal with so many of them, and you may need some post-processing to use them when you run the program with a different number of processes.
>>>
>>> Rajeev
>>>
>>>
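(For clarity, the collective vs. non-collective MPI numbers in the attached
figure essentially come down to which write call is used; the sketch below
shows the distinction I mean. The offsets and counts are illustrative, not
the actual benchmark code.)

#include <mpi.h>

/* Write one contiguous block per rank, either collectively or
 * independently. Illustrative only. */
static void write_block( MPI_File fhandle, int rank, char *buffer,
                         int count, int useCollective )
{
  MPI_Offset offset = (MPI_Offset)rank * count;

  if( useCollective )
    {
    /* Collective: every rank in the communicator must call it, and
     * ROMIO may aggregate the requests (two-phase I/O). */
    MPI_File_write_at_all( fhandle, offset, buffer, count, MPI_BYTE,
                           MPI_STATUS_IGNORE );
    }
  else
    {
    /* Independent: each rank writes its own block at an explicit
     * offset, closest in spirit to a POSIX pwrite(). */
    MPI_File_write_at( fhandle, offset, buffer, count, MPI_BYTE,
                       MPI_STATUS_IGNORE );
    }
}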
>>> On Jun 21, 2011, at 11:02 AM, George Zagaris wrote:
>>>
>>>> Dear Rajeev,
>>>>
>>>> Thank you very much for your feedback.
>>>>
>>>> I followed your suggestion and implemented benchmarks (also
>>>> attached) with POSIX open/read/write calls, writing to both
>>>> separate files and a shared file.
>>>>
>>>> A summary of the results is also given in the attached chart_2.png.
>>>> These measurements were obtained with 32 MPI processes
>>>> (4 nodes x 8 cores/node), where the stripe count (number of OSTs)
>>>> is 32 and the stripe size is 32MB. Moreover, each process writes
>>>> 32MB, hence the data is stripe-aligned, and since the number of
>>>> OSTs is the same as the number of I/O processes, I would not
>>>> expect to see any performance degradation due to file-system
>>>> contention.
>>>>
>>>> The results appear to favor unbuffered I/O to separate files as
>>>> the best strategy. I am wondering if this premise will hold as
>>>> the data size grows larger. What would be the reasons for not
>>>> choosing this strategy for large-scale I/O? Any thoughts?
>>>>
>>>> I sincerely thank you for all your time and help.
>>>>
>>>> Best Regards,
>>>> George
>>>>
>>>>
>>>>> Message: 3
>>>>> Date: Mon, 20 Jun 2011 16:01:09 -0500
>>>>> From: Rajeev Thakur <thakur at mcs.anl.gov>
>>>>> Subject: Re: [mpich-discuss] Parallel I/O on Lustre: MPI Vs. POSIX
>>>>> To: mpich-discuss at mcs.anl.gov
>>>>> Message-ID: <374DEFCF-7C41-43E7-B43F-5D6AD4F8077A at mcs.anl.gov>
>>>>> Content-Type: text/plain; charset=us-ascii
>>>>>
>>>>> Try using the independent I/O functions MPI_File_write_at and MPI_File_read_at instead of the collective ones for this access pattern (large contiguous blocks). Also, the closest POSIX functions to compare with are open/read/write instead of fopen/fread/fwrite. And you can write to a shared file with POSIX I/O as well (open/read/write) for a more equal comparison.
>>>>>
>>>>> Rajeev
>>>>>
>>>>>
>>>> <POSIXSeparateFile.cxx><POSIXSharedFile.cxx><chart_2.png>
>>>
>>>
>> <chart_2_2.png>
>
>