Using parallel write with subset of processors

Tue May 10 10:54:34 CDT 2022

On Mon, 2022-05-09 at 12:51 -0700, Pascale Garaud wrote:
> Is there a best way to do that efficiently? The code currently uses a
> collective  PUT_VAR_ALL to write the 3D dataset to file, but that
> would not work for the slice (and hangs when I try). 

These collective I/O calls require all processes to participate in the call... but not all processes need to have data.

For processes that do not have any data, you can set the 'count' to zero. Consider the common "put_vara_float_all" call as an example:

int ncmpi_put_vara_float_all(int ncid, int varid, const MPI_Offset *start,
                 const MPI_Offset *count, const float *op);

That 'count' parameter can just be an array of zeros (one entry per dimension of the variable) for the processes with no data.
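
As a rough sketch of how each process might fill in 'start' and 'count' for a slice write: the helper below is hypothetical (slice_selection is not part of PnetCDF), and it assumes a decomposition in z and y with the full x extent on every rank, writing into a 2D (y, x) variable. The Offset typedef is only a stand-in for MPI_Offset so the snippet compiles without MPI. The start value of 0 on ranks with no data is just an in-bounds placeholder.

```c
#include <assert.h>

/* Stand-in for MPI_Offset so this sketch compiles without MPI.
 * In real code, include <mpi.h> and <pnetcdf.h> and use MPI_Offset. */
typedef long long Offset;

/*
 * Hypothetical helper: fill start[]/count[] for writing one slice
 * (fixed z index k_slice) into a 2D (y, x) variable.
 *
 * Assumed layout: this rank owns z in [z_lo, z_lo + nz_local) and
 * y in [y_lo, y_lo + ny_local), with the full x extent nx.
 *
 * Returns 1 if this rank holds part of the slice, 0 otherwise.
 */
int slice_selection(Offset z_lo, Offset nz_local, Offset k_slice,
                    Offset y_lo, Offset ny_local, Offset nx,
                    Offset start[2], Offset count[2])
{
    assert(nz_local >= 0 && ny_local >= 0 && nx >= 0);

    if (k_slice >= z_lo && k_slice < z_lo + nz_local) {
        /* This rank intersects the slice: write its y-rows, all of x. */
        start[0] = y_lo;
        start[1] = 0;
        count[0] = ny_local;
        count[1] = nx;
        return 1;
    }

    /* No data here: zero count in every dimension. The rank still
     * makes the collective call; it just contributes nothing. */
    start[0] = 0;
    start[1] = 0;
    count[0] = 0;
    count[1] = 0;
    return 0;
}

/*
 * Every rank then makes the same collective call, for example:
 *
 *   MPI_Offset start[2], count[2];
 *   slice_selection(z_lo, nz_local, k_slice, y_lo, ny_local, nx,
 *                   start, count);
 *   err = ncmpi_put_vara_float_all(ncid, varid, start, count, buf);
 */
```

The point of keeping the call unconditional is that no rank skips the collective, which is what causes the hang; only the contents of 'count' differ between ranks.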

> I could just copy the whole data for the slice into a single
> processor, and then do an "independent" write for that processor, but
> that doesn't seem to be very efficient. 

indeed! please don't do this

> I tried to understand how to use IPUT instead, but I am very confused
> about the syntax / procedure, especially given that all of the
> examples I have seen end up using all processors for the write. 

IPUT is a fun optimization.  Once you get the hang of the "blocking" versions, revisit the "non-blocking" routines, especially if you have writes to multiple variables.

==rob