how to do nonblocking collective i/o
Liu, Jaln
jaln.liu at ttu.edu
Mon Jan 28 16:05:19 CST 2013
Hi Rob,
Thanks for your answer.
>You're close. I bet by the time I finish writing this email Wei-keng
>will already respond.
That reminds me of a previous thread on the pnetcdf mailing list,
'Performance tuning problem with iput_vara_double/wait_all',
where Dr. Liao mentioned: "But concatenating two filetypes will end up with a filetype violating the requirement of monotonic non-decreasing file offsets".
So I guess that even though my code correctly tries to do non-blocking collective I/O, it will still result in individual collective I/O calls, right?
Is there any way to know this before running a performance test?
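One rough way to reason about this ahead of time is sketched below in C. It checks whether a list of requests, taken in the order they are posted, touches the variable at non-decreasing row-major offsets. This only illustrates the monotonicity requirement quoted above. It says nothing about what PnetCDF actually does internally with the requests. The names flat_offset, requests_monotonic and NDIMS are made up for this example, and every count is assumed to be at least 1.

```c
#define NDIMS 2  /* hypothetical number of dimensions, for illustration only */

/* Row-major element offset of index idx[] in an array of shape dims[]. */
static long long flat_offset(const long long dims[], const long long idx[])
{
    long long off = 0;
    for (int d = 0; d < NDIMS; d++)
        off = off * dims[d] + idx[d];
    return off;
}

/* Returns 1 if each request, in posting order, begins at or after the
 * last element touched by the previous request, and 0 otherwise.
 * Assumes every count[r][d] >= 1. */
static int requests_monotonic(const long long dims[], int nreq,
                              long long start[][NDIMS],
                              long long count[][NDIMS])
{
    long long prev_last = -1;
    for (int r = 0; r < nreq; r++) {
        long long last_idx[NDIMS];
        for (int d = 0; d < NDIMS; d++)
            last_idx[d] = start[r][d] + count[r][d] - 1;
        long long first = flat_offset(dims, start[r]);
        long long last = flat_offset(dims, last_idx);
        if (first < prev_last)
            return 0;   /* this request reaches back before the previous one */
        prev_last = last;
    }
    return 1;
}
```

By this check, requests that all share the same start but have growing counts (as in my loop) overlap, so concatenating them in posting order is not monotonic. Requests over disjoint, increasing row ranges pass.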
I have another, related question.
According to the paper "Combining I/O operations for multiple array variables in parallel netCDF", non-blocking collective I/O is designed for access to multiple variables. But I assume it is also useful for optimizing access to multiple subsets of a single variable, just like what I'm trying to do in my code. Right?
Jialin
> Here is the code I wrote:
>
> float ** nb_temp_in=malloc(numcalls*sizeof(float *));
> int * request=calloc(numcalls, sizeof(int));
> int * status=calloc(numcalls,sizeof(int));
> int varasize;
> for(j=0;j<numcalls;j++)
> {
> mpi_count[1]=(j>NLVL)?NLVL:j+1;
> varasize=mpi_count[0]*mpi_count[1]*NLAT*NLON;
> nb_temp_in[j]=calloc(varasize,sizeof(float));
> ret = ncmpi_iget_vara(ncid, temp_varid,
> mpi_start,mpi_count,nb_temp_in[j],
> varasize,MPI_FLOAT,&(request[j]));
> if (ret != NC_NOERR) handle_error(ret);
> }
>
> ret = ncmpi_wait_all(ncid, numcalls, request, status);
> for (j=0; j<numcalls; j++)
> if (status[j] != NC_NOERR) handle_error(status[j]);
> }
>
> I have two questions,
> 1, in the above code, what is the right way to parallelize the program?
> by decomposing the for loop " for(j=0;j<numcalls;j++)"?
No "right" way, really. Depends on what the reader needs. Decomposing
over numcalls is definitely one way. Or you can decompose over
'mpi_start' and 'mpi_count' -- though I personally have to wrestle
with block decomposition for a while before it's correct.
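
For what it's worth, a minimal sketch of the usual 1-D block decomposition in C looks something like the following. The function name block_decompose and its parameters are hypothetical, not part of pnetcdf. It just splits n items across nprocs ranks as evenly as possible, giving the first (n % nprocs) ranks one extra item.

```c
/* Split n items across nprocs ranks as evenly as possible.
 * On return, *start is the first item owned by 'rank' and *count is
 * how many items it owns. Hypothetical helper, not a pnetcdf call. */
static void block_decompose(long long n, int nprocs, int rank,
                            long long *start, long long *count)
{
    long long base = n / nprocs;   /* every rank gets at least this many */
    long long rem  = n % nprocs;   /* the first 'rem' ranks get one more */

    *count = base + (rank < rem ? 1 : 0);
    *start = rank * base + (rank < rem ? rank : rem);
}
```

The resulting start and count would then be assigned to one dimension of 'mpi_start' and 'mpi_count' (for example the latitude dimension), with the other dimensions left spanning their full extent.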
> 2, how to do non-blocking collective I/O? is there a function like
> 'ncmpi_iget_vara_all'?
You already did it.
We've iterated over a few nonblocking-pnetcdf approaches over the
years, but settled on this way:
- operations are posted independently.
- One can collectively wait for completion with "ncmpi_wait_all", as
you did.
- If one needs to wait for completion locally due to the nature of the
application, one might not get the best performance, but
"ncmpi_wait" is still there if the app needs independent I/O
completion.
==rob
--
Rob Latham
Mathematics and Computer Science Division
Argonne National Lab, IL USA