nonblocking write gets stuck
Wei-keng Liao
wkliao at eecs.northwestern.edu
Thu Aug 29 11:38:08 CDT 2019
I noticed the following in your code.
grank is produced from comm1 in line 68
68 call mpi_comm_rank(comm1, grank, err)
But when creating a new file, comm2 is used.
111 if(grank .eq. 0) then
112 err = nfmpi_create(comm2, filename, cmode, info, ncid)
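All collective I/O subroutines, such as nfmpi_create, require all
processes in the communicator to participate (in this case, all
processes in comm2). If comm2 contains any process other than the one
whose grank is 0, the call above is not collective over comm2.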
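
For illustration, here is a minimal sketch of a collective create, in
which every process in comm2 makes the call and no rank test guards it.
This is not taken from the attached program: comm2 is assumed here to be
a duplicate of MPI_COMM_WORLD, and the file name 'testfile.nc' and the
variable rank2 are made up for the example.

```fortran
      program create_collectively
      implicit none
      include 'mpif.h'
      include 'pnetcdf.inc'
      integer comm2, rank2, ierr, err, ncid, cmode, info

      call MPI_Init(ierr)
! Assumption: comm2 is a copy of MPI_COMM_WORLD in this sketch.
! The real program builds comm2 in its own way.
      call MPI_Comm_dup(MPI_COMM_WORLD, comm2, ierr)
! Take the rank from the same communicator that is used for I/O.
      call MPI_Comm_rank(comm2, rank2, ierr)

      cmode = NF_CLOBBER
      info = MPI_INFO_NULL
! Collective call: all processes in comm2 execute it,
! with no "if (rank .eq. 0)" around it.
      err = nfmpi_create(comm2, 'testfile.nc', cmode, info, ncid)
      if (err .ne. NF_NOERR) then
          print *, 'rank', rank2, ': ', trim(nfmpi_strerror(err))
          call MPI_Abort(MPI_COMM_WORLD, 1, ierr)
      end if

! nfmpi_close is collective over the same communicator as well.
      err = nfmpi_close(ncid)
      if (err .ne. NF_NOERR) then
          print *, 'rank', rank2, ': ', trim(nfmpi_strerror(err))
      end if

      call MPI_Comm_free(comm2, ierr)
      call MPI_Finalize(ierr)
      end program create_collectively
```

If the intent is instead for a single process to create and write the
file by itself, one option is to pass MPI_COMM_SELF as the communicator,
so that the communicator contains only the calling process.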
Please explain what you are trying to do.
Wei-keng
> On Aug 29, 2019, at 9:16 AM, 刘壮 via parallel-netcdf <parallel-netcdf at lists.mcs.anl.gov> wrote:
>
> Hi:
>
> I have a problem when using the nonblocking write functions in PnetCDF. The problem seems
> very strange: my program gets stuck in the function "nfmpi_wait_all".
> However, if all the outputting processes are running on one node, the problem goes away. I
> have tested my program on several machines, and only one of them has this problem.
> The attached file is a simplified example of my program, which also has this problem. The files
> in the "Start" and "Count" directories are the "starts" and "counts" for the outputting processes.
> To see this problem, one can use 41 to 49 MPI processes to run this program (if your machine has
> more than 50 processors on one node, please modify "group_size" to a larger number and run the
> program using 4*group_size+1 to 5*group_size-1 processes, to make sure that the outputting
> processes are running on at least two nodes).
> Any suggestions are welcome. Thank you very much!
>
> Best,
> Zhuang
> <test.tar.gz>