pnetcdf nfmpi_put_vara_real_all problem

Rob Latham robl at mcs.anl.gov
Wed Sep 18 12:57:14 CDT 2013


On Wed, Sep 18, 2013 at 05:46:42PM +0000, Wong, David wrote:
> Hi Wei-keng,
> 
>      start and count have been declared as:
> 
>           NFMPI_OFFSET :: start(4), count(4)
> 
> I tested it in a smaller version of the code and it worked. After I put it in our numerical model, it crashed. Any other thoughts? Please advise.

We've tried pretty hard to have pnetcdf return errors, not segfault,
in the face of bad inputs.  There may still be work to do on this
front.  

Can you run one or more MPI processes under valgrind and see what it
has to say about that segfault?
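
A minimal sketch of such a run, in case it helps.  Everything named
here is a placeholder rather than something from your setup: ./model
stands in for your executable, 4 for your process count, and mpiexec
for whatever launcher your MPI installation provides.

```shell
# Hypothetical placeholders: adjust to your own job.
NP=4                          # number of MPI processes (assumed)
EXE=./model                   # path to the model executable (assumed)
VG_LOG='valgrind.%p.log'      # valgrind expands %p to each process ID

# Run only if the tools and the binary are actually present.
if command -v mpiexec >/dev/null 2>&1 &&
   command -v valgrind >/dev/null 2>&1 &&
   [ -x "$EXE" ]; then
    # Each process writes its own log, so reports do not interleave.
    mpiexec -n "$NP" valgrind --log-file="$VG_LOG" "$EXE"
else
    echo "mpiexec, valgrind, or $EXE is missing; nothing was run" >&2
fi
```

Compiling with -g should let valgrind attach source line numbers to
its reports.  The first "Invalid read" or "Invalid write" entry in a
log is usually the most useful place to start.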

==rob


> 
> Cheers,
> David
> 
> ________________________________________
> From: Wei-keng Liao <wkliao at ece.northwestern.edu>
> Sent: Wednesday, September 18, 2013 1:01 PM
> To: Wong, David
> Cc: parallel-netcdf at mcs.anl.gov
> Subject: Re: pnetcdf nfmpi_put_vara_real_all problem
> 
> Hi, David,
> 
> Could you send us the code fragment or the program that can reproduce the error?
> 
> Just a reminder. The datatype of start and count must be integer*8.
> Similarly, use integer*8 to define dimensions.
> If you have done that already, there must be something else.
> 
> 
> Wei-keng
> 
> On Sep 18, 2013, at 11:09 AM, David Wong wrote:
> 
> > Hi,
> >
> >    I am able to create a file:
> >
> > netcdf pCTM_CONC_1 {
> > dimensions:
> >         cols = 423 ;
> >         rows = 594 ;
> >         lays = 14 ;
> >         time = UNLIMITED ; // (0 currently)
> >         vars = 142 ;
> > variables:
> >         float NO2(time, lays, rows, cols) ;
> >         float NO(time, lays, rows, cols) ;
> >         float O(time, lays, rows, cols) ;
> >
> > The code crashed with a segmentation fault (the traceback option pointed to the following line):
> >
> >         stat = nfmpi_put_vara_real_all (loc_pos%fileid, loc_pos%var_id(v), start, count, loc_data)
> >
> > The arguments for this call on one of the processors are:
> >
> > loc_pos%fileid = 0
> > loc_pos%var_id(v) = 1
> > start =   1    1    1    1
> > count = 423   50   14    1
> > loc_data (size) = 423     50     14
> >
> > I wonder what the problem is. Please advise.
> >
> > Cheers,
> > David
> >
> >
> 

-- 
Rob Latham
Mathematics and Computer Science Division
Argonne National Lab, IL USA

