problems writing vars with pnetcdf
Jianwei Li
jianwei at ece.northwestern.edu
Fri Dec 3 18:25:06 CST 2004
Katie,
I think your approach can also avoid the bug. But making the originally
collective I/O independent just because one or two processes have
zero-size I/O will lose some optimizations and also complicate your coding.
Also, switching back and forth between collective and independent I/O in
pnetcdf adds some extra coding and, I think, some performance overhead.
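Just to be concrete, the independent-mode version would look roughly like the
sketch below (untested, with made-up names, the float flavor of the put call,
and the MPI_Offset start/count arrays of the vara interface; note that the
begin/end_indep_data calls are themselves collective):

    MPI_Offset start[1], count[1];
    start[0] = my_offset;          /* this rank's starting index into 'particles' */
    count[0] = my_nparticles;      /* may be 0 on some ranks */

    ncmpi_begin_indep_data(ncid);  /* leave collective data mode (collective call) */
    if (my_nparticles > 0)         /* ranks with nothing to write skip the put */
        ncmpi_put_vara_float(ncid, particles_varid, start, count, particles);
    ncmpi_end_indep_data(ncid);    /* back to collective data mode (collective call) */
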
Anyway, I will incorporate my fix into the next release if it's successful.
As for the new release date, we're still working on derived-datatype
support for the flexible API and on the put/get_varm functions, and we may
need a couple more weeks, so I hope we can put together a 1.0 release by
the end of this year (otherwise a 1.0-pre1 is achievable, I think).
So you can go ahead and code your stuff in the more efficient way, assuming
my suggested fix works for you (and please let me know if it doesn't).
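In that all-collective style, every rank makes the same call and a rank with
nothing to write just passes a zero count, roughly like this (again only a
sketch with made-up names; it assumes the boundary-check fix from my earlier
message below, since the current check still rejects the zero-edge case):

    MPI_Offset start[1], count[1];
    start[0] = my_offset;          /* not used once count is 0 and the check is bypassed */
    count[0] = my_nparticles;      /* 0 on ranks with no particles */

    /* every rank participates in the collective write, even with count == 0 */
    ncmpi_put_vara_float_all(ncid, particles_varid, start, count, particles);
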
Regards,
Jianwei
=========================================
Jianwei Li ~
~
Northwestern University ~
2145 Sheridan Rd, ECE Dept. ~
Evanston, IL 60208 ~
~
(847)467-2299 ~
=========================================
>
>Thanks for the email. I'll try to make that fix.
>
>We had one other idea for a fix that currently doesn't work, but let me
>run it by you. Pnetcdf allows you to work in collective and independent
>data modes. Right now we are doing everything in collective mode (i.e. all
>put calls end in _all). We thought we could possibly get around this bug by
>writing the particles out in independent mode. That way a processor with
>zero particles wouldn't make the put_vars call at all, and the
>synchronization wouldn't get messed up(?).
>
>This is closer to the way hdf5 works for us: we don't write a
>zero-length array; instead, the processor with zero particles simply doesn't
>make the h5_write call.
>
>I've been reading the bit of documentation on this, which talks very
>briefly about calling MPI_File_set_view on a file handle for collective
>operations and using MPI_COMM_SELF as the communicator for independent mode.
>
>There is this mysterious line in the documentation, though: 'It is difficult if not
>impossible in the general case to ensure consistency of access when a
>collection of processes are using multiple MPI_File handles to access the
>same file with mixed independent and collective operations....'
>
>which sounds like this might be a more complicated fix.
>
>Any thoughts? Do you think using independent mode could fix this?
>
>Katie
>
>
>
>On Fri, 3 Dec 2004, Jianwei Li wrote:
>
>> Sorry, a few minor corrections below:
>>
>>
>> >Hello, Katie,
>> >
>> >Thank you for pointing this out.
>> >I think you found a hidden bug in our PnetCDF implementation in dealing with
>> >zero size I/O.
>> >
>> >For sub-array access, although underlying MPI/MPI-IO can handle "size=0"
>> ^^^^^^^^^
>> It's also the same case as stride subarray access.
>>
>> >gracefully (so can intermediate malloc), the PnetCDF code would check the
>> >(start, edge, dimsize), and it thought that [start+edge > dimsize] was not
>> ^^^^^^^^^^^^^^^^^^^^
>> This should be always invalid,
>> but [start >= dimsize] was
>> handled inappropriately in
>> the coordinate check for
>> [edge==0]
>>
>>
>> >valid even if [edge==0], and returned an error like:
>> > "Index exceeds dimension bound".
>> >
>> >Actually, this is also a "bug" in Unidata netCDF-3.5.0, and it returns the same
>> >error message:
>> >    "Index exceeds dimension bound"
>> >
>> >Luckily, nobody in the serial netcdf world has had any interest in reading or
>> >writing zero bytes. (Though we should point this out to the Unidata netcdf
>> >developers, or perhaps they are already watching this list.)
>> >
>> >I agree that this case is inevitable in a parallel I/O environment and I will
>> >fix this bug in the next release, but for now I have the following quick fix for
>> >anyone who has hit this problem:
>> >
>> > 1. go into the pnetcdf src code: parallel-netcdf/src/lib/mpinetcdf.c
>> > 2. identify all ncmpi_{get/put}_vara[_all], ncmpi_{get/put}_vars[_all]
>> > subroutines. (well, if you only need "vars", you can ignore the
>> > "vara" part for now)
>> > 3. in each of the subroutines, locate code section between (excluding)
>> > set_var{a/s}_fileview and MPI_File_write[_all] function calls:
>> >
>> > set_var{a/s}_fileview
>> >
>> > section{
>> > 4 lines of code calculating nelems/nbytes
>> > other code
>> > }
>> >
>> > MPI_File_write[_all]
>> >
>> > 4. move the 4 lines of nelems/nbytes calculation code from after
>> > the set_var{a/s}_fileview function call to before it, and move the
>> > set_var{a/s}_fileview call into that section.
>> > 5. After nbytes is calculated, bypass that section when nbytes==0,
>> > as in the following pseudo-code (a fuller C sketch follows step 6):
>> >
>> > calculating nelems/nbytes
>> >
>> > if (nbytes != 0) {
>> > set_var{a/s}_fileview
>> > section [without calculating nelems/nbytes]
>> > }
>> >
>> > MPI_File_write[_all]
>> >
>> > 6. Rebuild the pnetCDF library.
>> >
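>> >In C, the reshuffled section would look roughly like the sketch below (only
>> >the control flow, not the exact library source; this is the put_vars float
>> >path, and names like varp, fh, and mpistatus stand in for the locals there):
>> >
>> >    /* compute the request size first */
>> >    nelems = 1;
>> >    for (dim = 0; dim < varp->ndims; dim++)
>> >        nelems *= count[dim];
>> >    nbytes = nelems * varp->xsz;      /* size of the external type in bytes */
>> >
>> >    if (nbytes != 0) {
>> >        /* only build the fileview (and run its boundary check)
>> >           when there is really something to write */
>> >        status = set_vars_fileview(...);
>> >        /* ... rest of the original section, minus the size calculation ... */
>> >    }
>> >
>> >    /* the collective write still happens on every process;
>> >       a zero-byte write is harmless */
>> >    MPI_File_write_all(fh, buf, nbytes, MPI_BYTE, &mpistatus);
>> >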
>> >Note: it will only solve this problem and may make "nc_test" in our test
>> >suite miss some originally-expected errors (hence report failures), because
>> >(start, edge=0, dimsize) was invalid if [start>dimsize] but now it is always
>> ^^^^^^^^^^^^^
>> I meant [start>=dimsize]
>>
>>
>> >valid as we'll bypass the boundary check. Actually it's hard to tell if it's
>> >valid or not after all, but it is at least safe to treat it just as VALID.
>> >
>> >Hope it will work for you and everybody.
>> >
>> >Thanks again for the valuable feedback, and further comments are welcome!
>> >
>> >
>> > Jianwei
>> >
>> >
>> >
>> >>Hi All,
>> >>
>> >>
>> >>I'm not sure if this list gets much traffic but here goes. I'm having a
>> >>problem writing out data in parallel for a particular case when there are
>> >>zero elements to write on a given processor.
>> >>
>> >>Let me explain a little better. Take a very simple case: a 1-dimensional
>> >>array that we want to write in parallel. We define a dimension, say
>> >>'dim_num_particles', and define a variable, say 'particles', with a unique
>> >>id.
>> >>
>> >>Each processor then writes its portion of the particles into the
>> >>particles variable with the correct starting position and count. As long
>> >>as each processor has at least one particle to write we have absolutely no
>> >>problems, but quite often in our code there are processors that have zero
>> >>particles for a given checkpoint file and thus have nothing to write.
>> >>This is where we hang.
>> >>
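>> >>Roughly, the write looks like this (simplified and from memory; we actually
>> >>use the strided put_vars form, but the idea is the same, and the names here
>> >>are made up):
>> >>
>> >>    int dimid, varid;
>> >>    MPI_Offset start[1], count[1];
>> >>
>> >>    /* total_particles is the global particle count across all procs */
>> >>    ncmpi_def_dim(ncid, "dim_num_particles", total_particles, &dimid);
>> >>    ncmpi_def_var(ncid, "particles", NC_FLOAT, 1, &dimid, &varid);
>> >>    ncmpi_enddef(ncid);
>> >>
>> >>    start[0] = my_offset;       /* sum of particle counts on lower-rank procs */
>> >>    count[0] = my_nparticles;   /* this proc's particle count */
>> >>    ncmpi_put_vara_float_all(ncid, varid, start, count, particles);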
>> >>
>> >>I've tried a couple different hacks to get around this --
>> >>
>> >>* First was to try to write a zero-length array, with the count = zero
>> >>  and the offset or starting point = 'dim_num_particles', but that
>> >>  returned an error message from the put_vars calls.
>> >>  All other offsets I chose returned errors as well, which is
>> >>  understandable.
>> >>
>> >>* The second thing I tried was to not write the data at all if there
>> >>  were zero particles on a proc. But that hung. After talking to some
>> >>  people here, they thought this also made sense, because all procs would
>> >>  no longer be doing the same task, a problem we've also seen hang hdf5.
>> >>
>> >>-- I can do a really ugly hack by increasing dim_num_particles to leave
>> >>extra room. That way a proc with zero particles could write out a
>> >>dummy value. The problem is that this messes up our offsets when we need
>> >>to read the checkpoint file back in.
>> >>
>> >>
>> >>Has anyone else seen this problem or know a fix to it?
>> >>
>> >>Thanks,
>> >>
>> >>Katie
>> >>
>> >>
>> >>____________________________
>> >>Katie Antypas
>> >>ASC Flash Center
>> >>University of Chicago
>> >>kantypas at flash.uchicago.edu
>> >>
>>
>>
>> Jianwei
>>
>> =========================================
>> Jianwei Li ~
>> ~
>> Northwestern University ~
>> 2145 Sheridan Rd, ECE Dept. ~
>> Evanston, IL 60208 ~
>> ~
>> (847)467-2299 ~
>> =========================================
>>
>
>--