Independent write
Rob Ross
rross at mcs.anl.gov
Fri Mar 12 09:10:11 CST 2004
On 12 Mar 2004, Roger Ting wrote:
> Does independent writing coordinate all of the processors?
Nope, by definition independent writing is not coordinated.
> I mean I have a netCDF file to which each processor will append a new entry
> at the end. For the append operation I use an independent-mode write
> operation. It seems that if processor 1 appends an entry at
> position i and processor 2 also wants to append another entry to the
> file, it will overwrite the entry at position i because it doesn't
> realise that another processor has already appended an entry there.
To implement an append mode, there would have to be some sort of
communication between processes that kept everyone up to date about what
entry was the "last" one, and some mechanism for ensuring that only one
process got to write to that position.
That is a generally difficult thing to implement in a low-overhead,
scalable way, regardless of the API.
Luckily the netCDF API doesn't include this sort of thing, so we don't
have to worry about it. What functions were you using to try to do this?
[ Goes and looks at next email. ]
OK. Even *if* nfmpi_inq_dimlen were returning an accurate value
(which it may or may not be), there would be an unavoidable race
condition: two or more of your processes can see the same value and each
decide to write there. That's not a PnetCDF interface deficiency; you're
just trying to do something that you shouldn't try to do without external
synchronization.
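To make the race concrete, the pattern looks roughly like this (C
interface shown; the nfmpi_ calls behave the same way -- this is just an
illustration under assumptions I'm making: a record variable of doubles,
with the file already in independent data mode):

#include <mpi.h>
#include <pnetcdf.h>

/* Illustration of the problem, not a recommendation: each rank asks for
 * the current length of the record dimension and writes one record at
 * that index.  Two ranks can read the same length and then both write
 * the same record, so one silently overwrites the other.
 * Assumes ncmpi_begin_indep_data() has already been called. */
void racy_append(int ncid, int varid, int recdimid, double value)
{
    MPI_Offset len, start[1], count[1];

    ncmpi_inq_dimlen(ncid, recdimid, &len); /* several ranks may see the same len */
    start[0] = len;                         /* ...and so target the same record   */
    count[0] = 1;
    ncmpi_put_vara_double(ncid, varid, start, count, &value); /* independent write */
}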
> Is there a way around this? Ideally, each processor could just append to
> the file at position i without worrying that another processor has
> already written to that position.
Again, even if there were a way to do this in PnetCDF (which I do not
think there is), it would not be high performance.
I would have to know a little more about your application to know how you
could better perform this operation, but here are some possible solutions.
If your application has clear I/O and computation phases, I would suggest
using collective I/O rather than independent I/O. You could have your
processes communicate with each other regarding the # of records that they
want to write, partition up the space, and perform a collective write of
all records without trouble. MPI_Allgather would be a nice, scalable way
to communicate the # of records.
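Very roughly, that could look something like the sketch below (just an
illustration under assumptions I'm making up -- a 1-D record variable of
doubles, ncid/varid already defined, and nrec_before holding the number of
records already in the file; the point is the MPI_Allgather plus the
collective ncmpi_put_vara_double_all call):

#include <stdlib.h>
#include <mpi.h>
#include <pnetcdf.h>

/* Each rank announces how many records it wants to write; every rank then
 * computes its own starting record from the gathered counts and writes
 * its records collectively along the record dimension. */
void append_collectively(int ncid, int varid, MPI_Offset nrec_before,
                         const double *myrecs, int mycount, MPI_Comm comm)
{
    int i, rank, nprocs;
    int *counts;
    MPI_Offset start[1], count[1];

    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &nprocs);

    /* everyone learns how many records every rank will write */
    counts = (int *) malloc(nprocs * sizeof(int));
    MPI_Allgather(&mycount, 1, MPI_INT, counts, 1, MPI_INT, comm);

    /* my starting record = records already in the file
       + records written by lower-ranked processes */
    start[0] = nrec_before;
    for (i = 0; i < rank; i++)
        start[0] += counts[i];
    count[0] = mycount;

    /* collective write; every process calls this, even with mycount == 0 */
    ncmpi_put_vara_double_all(ncid, varid, start, count, myrecs);

    free(counts);
}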
If your application truly has lots of independent processes writing to the
file, I suggest using MPI to pass a token between processes that specifies
that a given process is done performing I/O and includes the next entry
number to write to. Then processes could cache values to write, use
MPI_Irecv to post a nonblocking recv for the token, and MPI_Test to see if
they've gotten it when they hit convenient points. Not trivial, but it
would turn the pattern into something deterministic, and you would end up
with better overall performance from aggregating the writes of the
records. To get overlap of I/O, a process could immediately pass the
token on, taking into account the # of records that it has to write, and
then perform its writes, so the next process doesn't have to wait.
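Here's a rough sketch of the token idea (again, just an illustration:
I'm using a simple rank-0-to-rank-(N-1) chain rather than a ring, a long
integer as the token, and count_cached_records()/write_cached_records()
are made-up placeholders for whatever your application does to buffer and
write its records):

#include <mpi.h>

/* Placeholders for application code that buffers records and writes them
 * (independently) starting at a given record number. */
extern int  count_cached_records(void);
extern void write_cached_records(long start_record);

void append_with_token(MPI_Comm comm)
{
    int rank, nprocs, have_token = 0;
    long token;                   /* next free record number */
    MPI_Request req;

    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &nprocs);

    if (rank == 0) {
        token = 0;                /* rank 0 starts; assumes no records yet */
        have_token = 1;
    } else {
        /* post a nonblocking recv for the token from the previous rank */
        MPI_Irecv(&token, 1, MPI_LONG, rank - 1, 0, comm, &req);
    }

    while (!have_token) {
        /* ... compute, cache more records to write ... */
        MPI_Test(&req, &have_token, MPI_STATUS_IGNORE);
    }

    /* pass the token on right away, bumped by the # of records we'll
       write, so the next process doesn't wait on our I/O */
    if (rank + 1 < nprocs) {
        long next = token + count_cached_records();
        MPI_Send(&next, 1, MPI_LONG, rank + 1, 0, comm);
    }

    /* now perform our (independent) writes starting at our record */
    write_cached_records(token);
}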
The collective I/O approach is going to get better performance, especially
at scale.
There is no way to accurately get the length of a dimension during
independent mode because of the rules for use of that function (which I'll
discuss in response to your next email).
I am happy to further discuss this with you if it would help. I realize
that the solutions that I have proposed require additional work, and that
it would be nice if the I/O API just did this stuff for you, but it's just
not as easy as that. I do think that we can come up with a good solution.
Regards,
Rob