nonblocking write gets stuck

Wei-keng Liao wkliao at eecs.northwestern.edu
Thu Aug 29 11:38:08 CDT 2019


I noticed the following in your code.

grank is obtained from comm1 at line 68:
68           call mpi_comm_rank(comm1, grank, err)

But when creating a new file, comm2 is used.
111           if(grank .eq. 0) then
112             err = nfmpi_create(comm2, filename, cmode, info, ncid)

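All collective I/O subroutines, such as nfmpi_create, require all
processes in the communicator to participate (in this case, all
processes in comm2). Here the call is made only by processes whose
rank in comm1 is 0, so unless those are exactly the members of comm2,
some processes in comm2 never make the call.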
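
For reference, here is a minimal sketch of the pattern I would expect,
not your actual code. The names io_comm, io_rank, wrank and fname, the
even-rank grouping, and the file name are all placeholders I made up for
illustration. The point is only that the rank is taken from the same
communicator that is passed to nfmpi_create, and that every member of
that communicator makes the call.

```fortran
      program collective_create
      implicit none
      include "mpif.h"
      include "pnetcdf.inc"

      character(len=*), parameter :: fname = 'testfile.nc'
      integer ierr, err, wrank, color, cmode
      integer io_comm, io_rank, ncid

      call MPI_Init(ierr)
      call MPI_Comm_rank(MPI_COMM_WORLD, wrank, ierr)

!     Hypothetical grouping: even world ranks form the output group.
!     All other ranks pass MPI_UNDEFINED and receive MPI_COMM_NULL.
      color = MPI_UNDEFINED
      if (mod(wrank, 2) .eq. 0) color = 0
      call MPI_Comm_split(MPI_COMM_WORLD, color, wrank, io_comm, ierr)

      if (io_comm .ne. MPI_COMM_NULL) then
!        Rank comes from the same communicator used for the file.
         call MPI_Comm_rank(io_comm, io_rank, ierr)

!        Collective calls: every member of io_comm makes them,
!        not only the process with io_rank 0.
         cmode = NF_CLOBBER
         err = nfmpi_create(io_comm, fname, cmode, MPI_INFO_NULL, ncid)
         if (err .ne. NF_NOERR) print *, nfmpi_strerror(err)

         err = nfmpi_close(ncid)
         if (err .ne. NF_NOERR) print *, nfmpi_strerror(err)

         call MPI_Comm_free(io_comm, ierr)
      end if

      call MPI_Finalize(ierr)
      end program collective_create
```

If only one process is really meant to create the file, the communicator
passed to nfmpi_create should contain just that process (MPI_COMM_SELF,
for example).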

Please explain what you are trying to do.

Wei-keng

> On Aug 29, 2019, at 9:16 AM, 刘壮 via parallel-netcdf <parallel-netcdf at lists.mcs.anl.gov> wrote:
> 
> Hi:
> 
>  I have a problem when using the nonblocking write functions in PnetCDF. The problem seems
> very strange: my program gets stuck in the function "nfmpi_wait_all". 
>  However, if all the outputting processes are running on one node, the problem goes away. Also,
> I have tested my program on several machines, and only one of them has this problem. 
>  The attached file is a simplified example of my program, which also has this problem. The files
> in the "Start" and "Count" directories are the "starts" and "counts" for the outputting processes. To 
> see this problem, one can use 41 to 49 MPI processes to run this program (if your machine has more 
> than 50 processors on one node, please modify "group_size" to a larger number and run the program 
> using 4*group_size+1 to 5*group_size-1 processors, to make sure that the outputting processes are 
> running on at least two nodes).
>  Any suggestions are appreciated. Thank you very much!
> 
> Best,
> Zhuang
> <test.tar.gz>


