Problem on Blue Gene/P

Julien Bodart julien.bodart at gmail.com
Mon Jun 15 12:57:45 CDT 2009


Thanks everybody for your help.

I am afraid I don't follow the point that "your code is defining netcdf
variables and attributes in a slightly different way on some MPI processes
than others". Different depending on what?

Another test I could try is to disable the check made by ncmpi_enddef, if
that is possible, and see what kind of output file I get.
I don't know whether this can be done easily without recompiling the
library.

I will try the binary-search debugging anyway.
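
For reference, the binary-search procedure Rob describes below can be sketched
in a few lines. This is only an illustration under stated assumptions: the step
names are made up, `fails` is a hypothetical stand-in for "rebuild with only
these definitions enabled, run, and see whether ncmpi_enddef reports the
mismatch", and exactly one step is assumed to be at fault. In a real code a
variable cannot be defined without its dimensions, so the halves have to be
chosen with those dependencies in mind.

```python
from typing import Callable, List, Sequence


def find_offending_step(
    steps: Sequence[str],
    fails: Callable[[Sequence[str]], bool],
) -> str:
    """Narrow a list of define-mode steps down to the one that fails.

    Assumes exactly one step in `steps` triggers the failure.
    """
    candidates: List[str] = list(steps)
    while len(candidates) > 1:
        mid = len(candidates) // 2
        first_half = candidates[:mid]
        # Keep only the first half enabled; if the problem persists,
        # the culprit is in that half, otherwise it is in the other one.
        if fails(first_half):
            candidates = first_half
        else:
            candidates = candidates[mid:]
    return candidates[0]


# Hypothetical define-mode steps; "var_v" plays the inconsistent one.
steps = ["dim_x", "dim_y", "dim_time", "var_u", "var_v", "att_title"]
print(find_offending_step(steps, lambda enabled: "var_v" in enabled))
```

Each iteration corresponds to one rebuild-and-run cycle, so a define-mode
section with N steps needs roughly log2(N) runs.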


2009/6/15 Rob Latham <robl at mcs.anl.gov>

> On Fri, Jun 12, 2009 at 02:19:33PM +0200, Julien Bodart wrote:
> > While it does not create any problems on small cases, bigger cases stop at
> > the ncmpi_enddef call on some files (randomly, even with synchronisation in
> > between), saying that there is a mismatch between dimensions. After many
> > checks it does not seem that there is anything wrong with the dimensions. I
> > have no idea how to solve the problem. Has anyone had a similar problem?
> > Thanks for your help.
>
> Hi Julien. Wei-keng is right: I know you've checked carefully, but
> some part of your code is defining netcdf variables and attributes in
> a slightly different way on some MPI processes than others.
>
> The main way people debug this is through binary search: comment out
> half of the define-mode portion; if the problem persists, comment out
> half of the remainder, else, try with the other half.
>
> You're not the first to encounter this problem.  Maybe this could be a
> warning and not an error, and maybe we should just have the define
> mode view as rank 0 sees it be the one that wins if there's a
> discrepancy.   I don't know how many people (if any) rely on the
> current behavior to find problems.
>
> ==rob
>
> --
> Rob Latham
> Mathematics and Computer Science Division
> Argonne National Lab, IL USA
>
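
To make "defining things in a slightly different way on some MPI processes"
concrete, here is a small simulation without MPI. Everything in it is
hypothetical (the dimension names, the per-rank cell counts, the factor 6); it
is not taken from the code discussed in this thread. It shows one way a
definition can end up rank-dependent: a dimension length computed from a
rank-local count instead of a global one. When every rank owns the same number
of cells the definitions agree, which is one possible reason a bug like this
can stay hidden on small cases.

```python
from typing import List, Tuple

Definition = Tuple[str, str, int]  # (kind, name, length)


def build_definitions(rank: int, local_counts: List[int]) -> List[Definition]:
    """Hypothetical define-mode sequence as issued by one rank."""
    total = sum(local_counts)
    return [
        # Consistent: every rank computes the same global length.
        ("dim", "n_cells", total),
        # Inconsistent: the length depends on this rank's local count.
        ("dim", "n_faces", 6 * local_counts[rank]),
    ]


def find_mismatches(
    per_rank: List[List[Definition]],
) -> List[Tuple[int, Definition, Definition]]:
    """Compare every rank's definitions against rank 0, entry by entry."""
    reference = per_rank[0]
    mismatches = []
    for rank, defs in enumerate(per_rank[1:], start=1):
        for expected, got in zip(reference, defs):
            if expected != got:
                mismatches.append((rank, expected, got))
    return mismatches


# Three simulated ranks; the last one owns slightly fewer cells.
local_counts = [100, 100, 98]
per_rank = [build_definitions(r, local_counts) for r in range(len(local_counts))]
for rank, expected, got in find_mismatches(per_rank):
    print(f"rank {rank}: rank 0 defines {expected}, this rank defines {got}")
```

In a real run the same idea applies: have each process dump the names and
lengths it is about to define, then compare the dumps against rank 0.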