how to do nonblocking collective i/o

Wei-keng Liao wkliao at ece.northwestern.edu
Mon Jan 28 18:13:58 CST 2013


Hi, Jialin, please see my in-line response below.

On Jan 28, 2013, at 4:05 PM, Liu, Jialin wrote:

> Hi Rob, 
> 
> Thanks for your answer.
> 
>> You're close. I bet by the time I finish writing this email Wei-keng
>> will already have responded.
> 
> You remind me of a previous thread on the pnetcdf mailing list: 
> 'Performance tuning problem with iput_vara_double/wait_all',
> 
> Dr. Liao once mentioned "But concatenating two filetypes will end up with a filetype violating the requirement of monotonic non-decreasing file offsets",
> 
> So I guess that even if my code correctly posts non-blocking collective I/O, it will still fall back to independent I/O, right?
> Is there any way to know this before running a performance test?

This problem has been resolved in SVN r1121 committed on Saturday.
Please give it a try and let me know if you see a problem.


> I have another related question, 
> According to the paper "Combining I/O Operations for Multiple Array Variables in Parallel netCDF", non-blocking collective I/O is designed for accessing multiple variables. But I assume it is also useful for optimizing access to multiple subsets of a single variable, as I am trying to do in my code. Right?

PnetCDF nonblocking APIs can be used to aggregate requests within a single variable as
well as across variables (including a mix of record and non-record variables). A newly
added example program, trunk/examples/column_wise.c, posts multiple nonblocking writes
to a single 2D variable, where each request writes one column of the array.


Wei-keng



> Jialin
> 
> 
> 
>> Here are the codes I wrote:
>> 
>>        float **nb_temp_in = malloc(numcalls * sizeof(float *));
>>        int *request = calloc(numcalls, sizeof(int));
>>        int *status  = calloc(numcalls, sizeof(int));
>>        int varasize;
>>        for (j = 0; j < numcalls; j++)
>>        {
>>          mpi_count[1] = (j > NLVL) ? NLVL : j + 1;
>>          varasize = mpi_count[0] * mpi_count[1] * NLAT * NLON;
>>          nb_temp_in[j] = calloc(varasize, sizeof(float));
>>          ret = ncmpi_iget_vara(ncid, temp_varid,
>>                  mpi_start, mpi_count, nb_temp_in[j],
>>                  varasize, MPI_FLOAT, &request[j]);
>>          if (ret != NC_NOERR) handle_error(ret);
>>        }
>> 
>>        ret = ncmpi_wait_all(ncid, numcalls, request, status);
>>        for (j = 0; j < numcalls; j++)
>>          if (status[j] != NC_NOERR) handle_error(status[j]);
>>      }
>> 
>> I have two questions,
>> 1, in the above code, what is right way to parallelize the program?
>> by decomposing the for loop " for(j=0;j<numcalls;j++)"?
> 
> No "right" way, really. Depends on what the reader needs.  Decomposing
> over numcalls is definitely one way.  Or you can decompose over
> 'mpi_start' and 'mpi_count' -- though I personally have to wrestle
> with block decomposition for a while before it's correct.
> 
>> 2, how to do non-blocking collective I/O? is there a function like
>> 'ncmpi_iget_vara_all'?
> 
> you already did it.
> 
> We've iterated over a few nonblocking-pnetcdf approaches over the
> years, but settled on this way:
> - operations are posted independently.
> - One can collectively wait for completion with "ncmpi_wait_all", as
>  you did.
> - If one needs to wait for completion locally due to the nature of the
>  application, one might not get the best performance, but
>  "ncmpi_wait" is still there if the app needs independent I/O
>  completion.
> 
> ==rob
> 
> --
> Rob Latham
> Mathematics and Computer Science Division
> Argonne National Lab, IL USA


