Problem on Blue Gene/P

Rob Latham robl at mcs.anl.gov
Mon Jun 15 10:59:43 CDT 2009


On Fri, Jun 12, 2009 at 02:19:33PM +0200, Julien Bodart wrote:
> While it does not create any problems in small cases, bigger cases stop at
> the ncmpi_enddef call on some files (randomly, even with synchronisation in
> between), saying that there is a mismatch between dimensions. After many
> checks, it does not seem that anything is wrong with the dimensions. I
> have no idea how to solve the problem. Has anyone had a similar problem?
> Thanks for your help.

Hi Julien. Wei-keng is right: I know you've checked carefully, but
some part of your code is defining netCDF variables and attributes
slightly differently on some MPI processes than on others.
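One way to find which ranks disagree, before calling ncmpi_enddef at all, is to have each rank keep a short text log of every definition it makes and compare a hash of that log across ranks. The sketch below is a hypothetical debugging aid, not part of the parallel-netcdf API: `log_def`, `fnv1a64` and `first_mismatched_rank` are illustrative names, and the record format (e.g. `dim:x=128`) is an assumption. Gathering the per-rank hashes (for instance with `MPI_Allgather`) is left to the caller so the sketch runs without MPI.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* FNV-1a 64-bit hash of a NUL-terminated string. */
uint64_t fnv1a64(const char *s)
{
    uint64_t h = 14695981039346656037ULL; /* FNV offset basis */
    while (*s) {
        h ^= (unsigned char)*s++;
        h *= 1099511628211ULL;            /* FNV prime */
    }
    return h;
}

/* Hypothetical helper: append one definition record, e.g. "dim:x=128",
   plus a ';' separator to a per-rank log. Call it next to each
   define-mode call. Returns 0 on success, -1 if the buffer is too small. */
int log_def(char *buf, size_t cap, const char *record)
{
    size_t used = strlen(buf), add = strlen(record);
    if (used + add + 2 > cap)             /* room for ';' and NUL */
        return -1;
    memcpy(buf + used, record, add);
    buf[used + add] = ';';
    buf[used + add + 1] = '\0';
    return 0;
}

/* Given the log hash from every rank (in a real job collected with
   something like MPI_Allgather), return the first rank whose hash
   differs from rank 0's, or -1 if all ranks agree. */
int first_mismatched_rank(const uint64_t *hashes, int nranks)
{
    assert(hashes != NULL && nranks >= 1);
    for (int r = 1; r < nranks; r++)
        if (hashes[r] != hashes[0])
            return r;
    return -1;
}
```

With a `log_def` call placed beside each `ncmpi_def_dim`, `ncmpi_def_var` and `ncmpi_put_att_*` call, a rank whose hash differs from rank 0's identifies the process to inspect; printing that rank's log next to rank 0's then shows the first differing definition.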

The main way people debug this is through binary search: comment out
half of the define-mode portion; if the problem persists, comment out
half of the remainder; otherwise, try the other half.

You're not the first to encounter this problem. Maybe this could be a
warning rather than an error, and maybe we should just let rank 0's
view of the define-mode state win if there's a discrepancy. I don't
know how many people (if any) rely on the current behavior to find
problems.

==rob

-- 
Rob Latham
Mathematics and Computer Science Division
Argonne National Lab, IL USA

