Independent write
Roger Ting
rogermht at vpac.org
Sun Mar 28 01:43:50 CST 2004
Hi,
After rereading your email, I find the discussion about overlapping I/O
confusing. Your explanation seems to indicate that multiple processors can
write to a single file independently, provided that the positions they are
writing to are different. This assumes we are in independent mode.
My application returns an MPI_File_write error when I do this. It seems
that in independent mode, multiple processors should not be able to write
to a single file even if they are writing to different positions. Some
references indicate that if you open the file with MPI_COMM_SELF, the file
is exclusive to that processor.
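
Here is roughly what I am trying to do, as a stripped-down, untested sketch
(the file name and record size are just placeholders):

#include <mpi.h>
#include <string.h>

#define RECORD_SIZE 1024              /* placeholder record size */

int main(int argc, char **argv)
{
    int rank;
    char buf[RECORD_SIZE];
    MPI_File fh;
    MPI_Offset offset;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* all processes open the same file on MPI_COMM_WORLD */
    MPI_File_open(MPI_COMM_WORLD, "testfile.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY,
                  MPI_INFO_NULL, &fh);

    /* each rank writes to a disjoint region of the file, so the
       independent writes should not overlap */
    memset(buf, 'a' + rank % 26, RECORD_SIZE);
    offset = (MPI_Offset)rank * RECORD_SIZE;
    MPI_File_write_at(fh, offset, buf, RECORD_SIZE, MPI_BYTE,
                      MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
    MPI_Finalize();
    return 0;
}
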
Please correct me if I am wrong; I am new to parallel I/O.
Regards,
Roger
On Sat, 2004-03-13 at 02:10, Rob Ross wrote:
> On 12 Mar 2004, Roger Ting wrote:
>
> > Does independent writing coordinate all the processors?
>
> Nope, by definition independent writing is not coordinated.
>
> > I mean I have a netCDF file to which each processor appends a new entry
> > at the end. For the append operation I use independent mode write
> > operations. It seems that if processor 1 appends an entry at position i
> > and processor 2 also wants to append another entry to the file, it will
> > overwrite the entry at position i because it doesn't realise the other
> > processor has already appended an entry there.
>
> To implement an append mode, there would have to be some sort of
> communication between processes that kept everyone up to date about what
> entry was the "last" one, and some mechanism for ensuring that only one
> process got to write to that position.
>
> That is a generally difficult thing to implement in a low-overhead,
> scalable way, regardless of the API.
>
> Luckily the netCDF API doesn't include this sort of thing, so we don't
> have to worry about it. What functions were you using to try to do this?
>
> [ Goes and looks at next email. ]
>
> Ok. Even *if* the nfmpi_inq_dimlen were returning the accurate value
> (which it may or may not be), there would be a race condition that is
> unavoidable. Any one of your processes can see the same value and decide
> to write there. That's not a PnetCDF interface deficiency; you're just
> trying to do something that you shouldn't try to do without external
> synchronization.
>
> > Is there a way around this? Ideally, each processor could just append to
> > the file at position i without worrying that another processor has
> > already written to that position.
>
> Again, even if there were a way to do this in PnetCDF (which I do not
> think that there is), it would not be high performance.
>
> I would have to know a little more about your application to know how you
> could better perform this operation, but here are some possible solutions.
>
> If your application has clear I/O and computation phases, I would suggest
> using collective I/O rather than independent I/O. You could have your
> processes communicate with each other regarding the # of records that they
> want to write, partition up the space, and perform a collective write of
> all records without trouble. MPI_Allgather would be a nice way to do the
> communication of # of records in a scalable way.
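
If I understand the collective suggestion correctly, it would look roughly
like the sketch below (my own attempt, untested; the record size and the
helper's name are made up):

#include <mpi.h>
#include <stdlib.h>

#define RECORD_SIZE 512               /* placeholder record size */

/* Gather everyone's record count, compute where my records go by a
   prefix sum, then write all records with a single collective call. */
void append_records_collectively(MPI_File fh, const char *records,
                                 int my_nrecords,
                                 MPI_Offset nrecords_in_file)
{
    int nprocs, rank, i;
    int *counts;
    MPI_Offset start_record = nrecords_in_file;

    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    counts = (int *) malloc(nprocs * sizeof(int));

    /* every process learns how many records each process wants to write */
    MPI_Allgather(&my_nrecords, 1, MPI_INT, counts, 1, MPI_INT,
                  MPI_COMM_WORLD);

    /* my region starts after the existing records plus everything that
       lower-ranked processes will write */
    for (i = 0; i < rank; i++)
        start_record += counts[i];

    /* collective write: every process calls this together */
    MPI_File_write_at_all(fh, start_record * RECORD_SIZE, records,
                          my_nrecords * RECORD_SIZE, MPI_BYTE,
                          MPI_STATUS_IGNORE);

    free(counts);
}
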
>
> If your application truly has lots of independent processes writing to the
> file, I suggest using MPI to pass a token between processes that specifies
> that a given process is done performing I/O and includes the next entry
> number to write to. Then processes could cache values to write, use
> MPI_Irecv to post a nonblocking recv for the token, and MPI_Test to see if
> they've gotten it when they hit convenient points. Not trivial, but it
> would turn the pattern into something deterministic, and you would end up
> with better overall performance from aggregating the writes of the
> records. To get overlap of I/O, a process could immediately pass the
> token on, taking into account the # of records that it has to write, and
> then perform its writes, so the next process doesn't have to wait.
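
And the token-passing idea, if I follow it, might look something like this
(again just a rough, untested sketch with made-up names):

#include <mpi.h>

#define TOKEN_TAG 99                  /* arbitrary tag for the token */
#define RECORD_SIZE 512               /* placeholder record size */

/* The token carries the next free record number.  Each process claims its
   region, forwards the token immediately, and only then does its I/O. */
void append_with_token(MPI_File fh, const char *records, int my_nrecords)
{
    int rank, nprocs, flag = 0;
    long next_record = 0;             /* value carried by the token */
    long forwarded;
    MPI_Request req;

    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    if (rank == 0) {
        flag = 1;                     /* rank 0 starts with the token */
    } else {
        /* nonblocking receive for the token from my predecessor */
        MPI_Irecv(&next_record, 1, MPI_LONG, rank - 1, TOKEN_TAG,
                  MPI_COMM_WORLD, &req);
        /* in a real code we would keep computing and call MPI_Test at
           convenient points; here we simply spin until it arrives */
        while (!flag)
            MPI_Test(&req, &flag, MPI_STATUS_IGNORE);
    }

    /* claim my region and pass the token on right away so the next
       process does not have to wait for my writes */
    if (rank + 1 < nprocs) {
        forwarded = next_record + my_nrecords;
        MPI_Send(&forwarded, 1, MPI_LONG, rank + 1, TOKEN_TAG,
                 MPI_COMM_WORLD);
    }

    /* independent write into the claimed region */
    MPI_File_write_at(fh, (MPI_Offset)next_record * RECORD_SIZE, records,
                      my_nrecords * RECORD_SIZE, MPI_BYTE,
                      MPI_STATUS_IGNORE);
}
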
>
> The collective I/O approach is going to get better performance, especially
> at scale.
>
> There is no way to accurately get the dimension of a variable during
> independent mode because of the rules for use of that function (which I'll
> discuss in response to your next email).
>
> I am happy to further discuss this with you if it would help. I realize
> that the solutions that I have proposed require additional work, and that
> it would be nice if the I/O API just did this stuff for you, but it's just
> not as easy as that. I do think that we can come up with a good solution.
>
> Regards,
>
> Rob
>