problems writing vars with pnetcdf
Jianwei Li
jianwei at ece.northwestern.edu
Sat Dec 4 11:31:08 CST 2004
Katie,
I found another simple, effective way to work around your trouble:
Whenever a 0-byte I/O is detected, let that process call the
pnetcdf functions (still collective as before) with parameters
(start=0[hardcoded], edge=0[detected], ...). (It should have
no problem even if [dimsize = 0] now.)
Since the process is writing 0 bytes, I assume the value of start doesn't
matter for it. That way, the user takes responsibility for treating
(start=whatever, edge=0[detected], ...) as valid.
Once you agree with that, I think the problem can be solved directly
in the user code, which can then run correctly even with our old pnetcdf
release (that also saves you some work trying to hack the pnetcdf src).
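For example, a minimal sketch of what I mean (untested; I use the vara
flavor of the call for brevity, and ncid/varid/particles together with
my_offset/my_num_particles stand for whatever your code already has):

    /* every process makes the same collective call, even with 0 particles */
    MPI_Offset start[1], count[1];
    int err;

    count[0] = my_num_particles;            /* 0 on some processes           */
    start[0] = (my_num_particles > 0)
               ? my_offset                  /* normal case                   */
               : 0;                         /* 0-byte case: hardcode start=0 */

    err = ncmpi_put_vara_double_all(ncid, varid, start, count, particles);
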
Regards,
Jianwei
=========================================
Jianwei Li ~
~
Northwestern University ~
2145 Sheridan Rd, ECE Dept. ~
Evanston, IL 60208 ~
~
(847)467-2299 ~
=========================================
On Fri, 3 Dec 2004, Katie Antypas wrote:
> Thanks for the email. I'll try to make that fix.
>
> We had one other idea for a fix that currently doesn't work, but let me
> run it by you. Pnetcdf allows you to work in collective and independent
> data modes. Right now we are doing everything in collective mode (i.e.,
> all put calls end in _all). We thought that we could possibly get around
> this bug by writing the particles out in independent mode. That way a
> processor with zero particles wouldn't make the put_vars call at all,
> and the synchronization wouldn't get messed up(?).
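>
> Roughly what I had in mind, if I'm reading the API right (just a sketch;
> I'm assuming the double flavor of the calls, and start/count/particles
> are the same things we already pass to the collective call):
>
>     ncmpi_begin_indep_data(ncid);            /* leave collective data mode */
>     if (my_num_particles > 0)                /* procs with no particles    */
>         ncmpi_put_vara_double(ncid, varid,   /* simply skip the write      */
>                               start, count, particles);
>     ncmpi_end_indep_data(ncid);              /* back to collective mode    */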
>
> This is closer to the way hdf5 works for us: we don't write a
> zero-length array; instead, the processor with zero particles simply
> doesn't make the h5_write call.
>
> I've been reading the bit of documentation on this, which talks very
> briefly about setting MPI_File_set_view on a file handle for collective
> operations and using MPI_COMM_SELF for independent mode.
>
> There is this mysterious line in the documentation though: 'It is difficult if not
> impossible in the general case to ensure consistency of access when a
> collection of processes are using multiple MPI_File handles to access the
> same file with mixed independent and collective operations....'
>
> which sounds like this might be a more complicated fix.
>
> any thoughts? do you think using independent mode could fix this?
>
> Katie
>
>
>
> On Fri, 3 Dec 2004, Jianwei Li wrote:
>
> > Sorry, a few minor corrections below:
> >
> >
> > >Hello, Katie,
> > >
> > >Thank you for pointing this out.
> > >I think you found a hidden bug in our PnetCDF implementation in dealing with
> > >zero size I/O.
> > >
> > >For sub-array access, although underlying MPI/MPI-IO can handle "size=0"
> > ^^^^^^^^^
> > The same holds for strided sub-array access.
> >
> > >gracefully (so can intermediate malloc), the PnetCDF code would check the
> > >(start, edge, dimsize), and it thought that [start+edge > dimsize] was not
> > ^^^^^^^^^^^^^^^^^^^^
> > This should always be invalid,
> > but [start >= dimsize] was
> > handled inappropriately in
> > the coordinate check when
> > [edge==0]
> >
> >
> > >valid even if [edge==0] and returned error like:
> > > "Index exceeds dimension bound".
> > >
> > >Actually, this is also a "bug" in Unidata netCDF-3.5.0, and it returns the same
> > >error message:
> > > "Index exceeds dimension bound"
> > >
> > >Luckily, nobody in the serial netCDF world is interested in reading/writing
> > >zero bytes. (Though we should point this out to the Unidata netCDF developers;
> > >perhaps they are already watching this list.)
> > >
> > >I agree that this case is inevitable in a parallel I/O environment, and I will
> > >fix this bug in the next release. For now, I have the following quick fix for
> > >whoever meets this problem:
> > >
> > > 1. go into the pnetcdf src code: parallel-netcdf/src/lib/mpinetcdf.c
> > > 2. identify all ncmpi_{get/put}_vara[_all], ncmpi_{get/put}_vars[_all]
> > > subroutines. (well, if you only need "vars", you can ignore the
> > > "vara" part for now)
> > > 3. in each of the subroutines, locate the code section between (excluding)
> > >    the set_var{a/s}_fileview and MPI_File_write[_all] function calls:
> > >
> > > set_var{a/s}_fileview
> > >
> > > section{
> > > 4 lines of code calculating nelems/nbytes
> > > other code
> > > }
> > >
> > > MPI_File_write[_all]
> > >
> > > 4. move the 4 lines of nelems/nbytes calculation code from after
> > >    the set_var{a/s}_fileview function call to before it, and move the
> > >    set_var{a/s}_fileview call itself into that section.
> > > 5. After nbytes is calculated, bypass the above section if nbytes==0,
> > >    as in the following pseudo-code (a fleshed-out sketch follows step 6):
> > >
> > > calculating nelems/nbytes
> > >
> > > if (nbytes != 0) {
> > > set_var{a/s}_fileview
> > > section [without calculating nelems/nbytes]
> > > }
> > >
> > > MPI_File_write[_all]
> > >
> > > 6. Rebuild the pnetCDF library.
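> > >
> > >  Fleshed out a little, the patched section would look roughly like the
> > >  following (schematic only; the real identifiers in mpinetcdf.c differ,
> > >  and el_size just stands for however the code obtains the element size):
> > >
> > >     /* compute nelems/nbytes before touching the fileview */
> > >     nelems = 1;
> > >     for (i = 0; i < ndims; i++)
> > >         nelems *= edge[i];
> > >     nbytes = nelems * el_size;
> > >
> > >     if (nbytes != 0) {
> > >         set_vars_fileview(...);   /* only when there is data to move */
> > >         /* rest of the original section, minus the nelems/nbytes lines */
> > >     }
> > >
> > >     /* still called by every process, possibly with a 0-size buffer */
> > >     MPI_File_write_all(...);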
> > >
> > >Note: it will only solve this problem and may make "nc_test" in our test
> > >suite miss some originally-expected errors (hence report failures), because
> > >(start, edge=0, dimsize) was invalid if [start>dimsize] but now it is always
> > ^^^^^^^^^^^^^
> > I meant [start>=dimsize]
> >
> >
> > >valid as we'll bypass the boundary check. Actually it's hard to tell if it's
> > >valid or not after all, but it is at least safe to treat it just as VALID.
> > >
> > >Hope it will work for you and everybody.
> > >
> > >Thanks again for the valuable feedback, and further comments are welcome!
> > >
> > >
> > > Jianwei
> > >
> > >
> > >
> > >>Hi All,
> > >
> > >>
> > >>I'm not sure if this list gets much traffic but here goes. I'm having a
> > >>problem writing out data in parallel for a particular case when there are
> > >>zero elements to write on a given processor.
> > >>
> > >>Let me explain a little better. For a very simple case, a 1-dimensional
> > >>array that we want to write in parallel: we define a dimension, say
> > >>'dim_num_particles', and define a variable, say 'particles', with a
> > >>unique id.
> > >>
> > >>Each processor then writes out its portion of the particles into the
> > >>particles variable with the correct starting position and count. As long
> > >>as each processor has at least one particle to write we have absolutely
> > >>no problems, but quite often in our code there are processors that have
> > >>zero particles for a given checkpoint file and thus have nothing to write
> > >>to file. This is where we hang.
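> > >>
> > >>Schematically, each processor does something like this (simplified; I'm
> > >>using the double/vara flavor of the put call and MPI_Offset start/count
> > >>types just for illustration, and ncid, total_particles, my_offset,
> > >>my_num_particles, and my_particles come from our own code):
> > >>
> > >>    int dimid, varid;
> > >>    MPI_Offset start[1], count[1];
> > >>
> > >>    ncmpi_def_dim(ncid, "dim_num_particles", total_particles, &dimid);
> > >>    ncmpi_def_var(ncid, "particles", NC_DOUBLE, 1, &dimid, &varid);
> > >>    ncmpi_enddef(ncid);
> > >>
> > >>    start[0] = my_offset;          /* this proc's starting position  */
> > >>    count[0] = my_num_particles;   /* 0 on some procs; the bad case  */
> > >>    ncmpi_put_vara_double_all(ncid, varid, start, count, my_particles);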
> > >>
> > >>
> > >>I've tried a couple different hacks to get around this --
> > >>
> > >>* First was to try to write a zero-length array, with the count = zero
> > >>  and the offset or starting point = 'dim_num_particles', but that
> > >>  returned an error message from the put_vars calls.
> > >>  All other offsets I chose returned errors as well, which is
> > >>  understandable.
> > >>
> > >>* The second thing I tried was to not write the data at all if there
> > >>  were zero particles on a proc. But that hung. After talking to some
> > >>  people here, they thought this also made sense, because not all procs
> > >>  would be doing the same task, a problem we've also seen hang hdf5.
> > >>
> > >>-- I can do a really ugly hack by increasing dim_num_particles to leave
> > >>extra room. That way, if a proc has zero particles, it can write out a
> > >>dummy value. The problem is that this messes up our offsets when we need
> > >>to read the checkpoint file back in.
> > >>
> > >>
> > >>Has anyone else seen this problem or know a fix to it?
> > >>
> > >>Thanks,
> > >>
> > >>Katie
> > >>
> > >>
> > >>____________________________
> > >>Katie Antypas
> > >>ASC Flash Center
> > >>University of Chicago
> > >>kantypas at flash.uchicago.edu
> > >>
> >
> >
> > Jianwei
> >
> > =========================================
> > Jianwei Li ~
> > ~
> > Northwestern University ~
> > 2145 Sheridan Rd, ECE Dept. ~
> > Evanston, IL 60208 ~
> > ~
> > (847)467-2299 ~
> > =========================================
> >
>
> --
>