nonblocking write gets stuck
刘壮
liuzhuang at lsec.cc.ac.cn
Thu Aug 29 12:22:47 CDT 2019
Hi Wei-keng,
Thanks very much for your reply.
I am trying to use a subset of the MPI processes to do the output for my program.
For example, if "group_size=10" and the total number of running processes
is 41, then I want to use processes "0, 10, 20, 30, 40" to do the output; these
are the ones with "grank=0". To create the nc file, I know that all the output
processes must be in the same communicator, so I split MPI_COMM_WORLD
into comm2, and the output processes "0, 10, 20, 30, 40" are then in the same comm2.
Am I misusing some interface in MPI or PnetCDF?
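A rough sketch of the rank layout described above, in plain Python with no MPI calls. The function names are hypothetical and only for illustration. It assumes that grank equals the world rank modulo group_size, which matches the "0, 10, 20, 30, 40" example but is not taken from the attached code:

```python
# Plain-Python sketch (no MPI calls): which world ranks would do the output?
# Assumption: comm1 groups consecutive world ranks, so grank = rank % group_size.

def output_ranks(nprocs, group_size):
    """Return the world ranks whose rank inside their group (grank) is 0."""
    return [rank for rank in range(nprocs) if rank % group_size == 0]


def split_color(rank, group_size):
    """Hypothetical color for the split that builds comm2.

    Output ranks share color 0. None stands in for MPI_UNDEFINED, meaning
    the rank would receive no usable comm2 (MPI_COMM_NULL) from the split.
    """
    return 0 if rank % group_size == 0 else None


if __name__ == "__main__":
    print(output_ranks(41, 10))  # [0, 10, 20, 30, 40]
```

With this layout, any run with 41 to 49 processes and group_size=10 gives the same five output ranks. These are the ranks that would all have to call the collective routines on comm2.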
Best
> -----Original Message-----
> From: "Wei-keng Liao" <wkliao at eecs.northwestern.edu>
> Sent: 2019-08-30 00:38:08 (Friday)
> To: "刘壮" <liuzhuang at lsec.cc.ac.cn>
> Cc: parallel-netcdf at lists.mcs.anl.gov
> Subject: Re: nonblocking write gets stuck
>
> I noticed the following in your code.
>
> grank is produced from comm1 in line 68
> 68 call mpi_comm_rank(comm1, grank, err)
>
> But when creating a new file, comm2 is used.
> 111 if(grank .eq. 0) then
> 112 err = nfmpi_create(comm2, filename, cmode, info, ncid)
>
> All collective I/O subroutines, such as nfmpi_create, require all
> processes in the communicator to participate (in this case, all
> processes in comm2.)
>
> Please explain what you are trying to do.
>
> Wei-keng
>
> > On Aug 29, 2019, at 9:16 AM, 刘壮 via parallel-netcdf <parallel-netcdf at lists.mcs.anl.gov> wrote:
> >
> > Hi:
> >
> > I have run into a problem when using the nonblocking write functions in PnetCDF. The problem seems
> > very strange: my program gets stuck in the function "nfmpi_wait_all".
> > However, if all the outputting processes are running on one node, the problem goes away. Also,
> > I have tested my program on several machines, and only one of them has this problem.
> > The attached file is a simplified example of my program, which also has this problem. The files
> > in the "Start" and "Count" directories are the "starts" and "counts" for the outputting processes. To
> > see this problem, one can use 41~49 MPI processes to run this program (if your machine has more
> > than 50 processors on one node, please change "group_size" to a larger number and run the program
> > using 4*group_size+1~5*group_size-1 processes, to make sure that the outputting processes are
> > running on at least two nodes).
> > Any suggestions are appreciated. Thank you very much!
> >
> > Best,
> > Zhuang
> > <test.tar.gz>