Independent write
Roger Ting
rogermht at vpac.org
Sun Mar 28 01:43:50 CST 2004
Hi,
After rereading your email, I find the discussion about overlapping I/O
confusing. Your explanation seems to indicate that multiple processors can
write to a single file independently, provided that the positions they are
writing to are different. This assumes we are in independent mode.
My application returns an MPI_File_write error when I do this. It seems
that in independent mode, multiple processors should not be able to write
to a single file even if they are writing to different positions. Some
references indicate that if you open the file with MPI_COMM_SELF, the file
is exclusive to that processor.
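
Here is roughly what I am trying to do, as a stripped-down, untested sketch
(the file name and record size are just placeholders):

#include <mpi.h>
#include <string.h>

#define RECORD_SIZE 1024              /* placeholder record size */

int main(int argc, char **argv)
{
    int rank;
    char buf[RECORD_SIZE];
    MPI_File fh;
    MPI_Offset offset;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* all processes open the same file on MPI_COMM_WORLD */
    MPI_File_open(MPI_COMM_WORLD, "testfile.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY,
                  MPI_INFO_NULL, &fh);

    /* each rank writes to a disjoint region of the file, so the
       independent writes should not overlap */
    memset(buf, 'a' + rank % 26, RECORD_SIZE);
    offset = (MPI_Offset)rank * RECORD_SIZE;
    MPI_File_write_at(fh, offset, buf, RECORD_SIZE, MPI_BYTE,
                      MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
    MPI_Finalize();
    return 0;
}
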
Please correct me if I am wrong; I am new to parallel I/O.
Regards,
Roger
On Sat, 2004-03-13 at 02:10, Rob Ross wrote:
> On 12 Mar 2004, Roger Ting wrote:
>
> > Does independent writing coordinate all the processors?
>
> Nope, by definition independent writing is not coordinated.
>
> > I mean I have a netCDF file to which each processor appends a new entry
> > at the end. For the append operation I use independent mode write
> > operations. It seems that if processor 1 appends an entry at position i
> > and processor 2 also wants to append another entry to the file, it will
> > overwrite the entry at position i because it doesn't realise the other
> > processor has already appended an entry there.
>
> To implement an append mode, there would have to be some sort of
> communication between processes that kept everyone up to date about what
> entry was the "last" one, and some mechanism for ensuring that only one
> process got to write to that position.
>
> That is a generally difficult thing to implement in a low-overhead,
> scalable way, regardless of the API.
>
> Luckily the netCDF API doesn't include this sort of thing, so we don't
> have to worry about it. What functions were you using to try to do this?
>
> [ Goes and looks at next email. ]
>
> Ok. Even *if* the nfmpi_inq_dimlen were returning the accurate value
> (which it may or may not be), there would be a race condition that is
> unavoidable. Any one of your processes can see the same value and decide
> to write there. That's not a PnetCDF interface deficiency; you're just
> trying to do something that you shouldn't try to do without external
> synchronization.
>
> > Is there a way around this? Ideally, each processor could just append to
> > the file at position i without worrying that another processor has
> > already written to that position.
>
> Again, even if there were a way to do this in PnetCDF (which I do not
> think that there is), it would not be high performance.
>
> I would have to know a little more about your application to know how you
> could better perform this operation, but here are some possible solutions.
>
> If your application has clear I/O and computation phases, I would suggest
> using collective I/O rather than independent I/O. You could have your
> processes communicate with each other regarding the # of records that they
> want to write, partition up the space, and perform a collective write of
> all records without trouble. MPI_Allgather would be a nice way to do the
> communication of # of records in a scalable way.
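
If I understand the collective suggestion correctly, it would look roughly
like the sketch below (my own attempt, untested; the record size and the
helper's name are made up):

#include <mpi.h>
#include <stdlib.h>

#define RECORD_SIZE 512               /* placeholder record size */

/* Gather everyone's record count, compute where my records go by a
   prefix sum, then write all records with a single collective call. */
void append_records_collectively(MPI_File fh, const char *records,
                                 int my_nrecords,
                                 MPI_Offset nrecords_in_file)
{
    int nprocs, rank, i;
    int *counts;
    MPI_Offset start_record = nrecords_in_file;

    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    counts = (int *) malloc(nprocs * sizeof(int));

    /* every process learns how many records each process wants to write */
    MPI_Allgather(&my_nrecords, 1, MPI_INT, counts, 1, MPI_INT,
                  MPI_COMM_WORLD);

    /* my region starts after the existing records plus everything that
       lower-ranked processes will write */
    for (i = 0; i < rank; i++)
        start_record += counts[i];

    /* collective write: every process calls this together */
    MPI_File_write_at_all(fh, start_record * RECORD_SIZE, records,
                          my_nrecords * RECORD_SIZE, MPI_BYTE,
                          MPI_STATUS_IGNORE);

    free(counts);
}
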
>
> If your application truly has lots of independent processes writing to the
> file, I suggest using MPI to pass a token between processes that specifies
> that a given process is done performing I/O and includes the next entry
> number to write to. Then processes could cache values to write, use
> MPI_Irecv to post a nonblocking recv for the token, and MPI_Test to see if
> they've gotten it when they hit convenient points. Not trivial, but it
> would turn the pattern into something deterministic, and you would end up
> with better overall performance from aggregating the writes of the
> records. To get overlap of I/O, a process could immediately pass the
> token on, taking into account the # of records that it has to write, and
> then perform its writes, so the next process doesn't have to wait.
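
And the token-passing idea, if I follow it, might look something like this
(again just a rough, untested sketch with made-up names):

#include <mpi.h>

#define TOKEN_TAG 99                  /* arbitrary tag for the token */
#define RECORD_SIZE 512               /* placeholder record size */

/* The token carries the next free record number.  Each process claims its
   region, forwards the token immediately, and only then does its I/O. */
void append_with_token(MPI_File fh, const char *records, int my_nrecords)
{
    int rank, nprocs, flag = 0;
    long next_record = 0;             /* value carried by the token */
    long forwarded;
    MPI_Request req;

    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    if (rank == 0) {
        flag = 1;                     /* rank 0 starts with the token */
    } else {
        /* nonblocking receive for the token from my predecessor */
        MPI_Irecv(&next_record, 1, MPI_LONG, rank - 1, TOKEN_TAG,
                  MPI_COMM_WORLD, &req);
        /* in a real code we would keep computing and call MPI_Test at
           convenient points; here we simply spin until it arrives */
        while (!flag)
            MPI_Test(&req, &flag, MPI_STATUS_IGNORE);
    }

    /* claim my region and pass the token on right away so the next
       process does not have to wait for my writes */
    if (rank + 1 < nprocs) {
        forwarded = next_record + my_nrecords;
        MPI_Send(&forwarded, 1, MPI_LONG, rank + 1, TOKEN_TAG,
                 MPI_COMM_WORLD);
    }

    /* independent write into the claimed region */
    MPI_File_write_at(fh, (MPI_Offset)next_record * RECORD_SIZE, records,
                      my_nrecords * RECORD_SIZE, MPI_BYTE,
                      MPI_STATUS_IGNORE);
}
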
>
> The collective I/O approach is going to get better performance, especially
> at scale.
>
> There is no way to accurately get the dimension of a variable during
> independent mode because of the rules for use of that function (which I'll
> discuss in response to your next email).
>
> I am happy to further discuss this with you if it would help. I realize
> that the solutions that I have proposed require additional work, and that
> it would be nice if the I/O API just did this stuff for you, but it's just
> not as easy as that. I do think that we can come up with a good solution.
>
> Regards,
>
> Rob
>