segmentation fault

Wei-keng Liao wkliao at ece.northwestern.edu
Fri Aug 3 18:06:42 CDT 2012


Thanks, Jialin. I have fixed those bugs.

As for the NFS issue, please check this link. In short, ROMIO works on NFS only when the file system is mounted with attribute caching disabled (the "noac" mount option); otherwise clients can read stale data.
http://www.mcs.anl.gov/research/projects/mpi/mpich1-old/docs/mpichman-chp4mpd/node44.htm


Wei-keng

On Aug 3, 2012, at 5:15 PM, Liu, Jialin wrote:

> Hi Dr. Liao,
> 
> Thanks, I fixed the problem with your help. I think the source file on the PnetCDF website should also be updated;
> 
> here is the original link:
> https://trac.mcs.anl.gov/projects/parallel-netcdf/browser/trunk/examples/tutorial/pnetcdf-read-standard.c
> 
> Could you give me more information about why using NFS can lead to inconsistent data? I found at the following link
> http://trac.mcs.anl.gov/projects/parallel-netcdf/browser/trunk/README#L19
> that ROMIO can accept both NFS and PVFS2.
> 
> Best Regards,
> Jialin Liu, Ph.D student.
> Computer Science Department
> Texas Tech University
> Phone: 806.742.3513(x241)
> Office:Engineer Center 304
> http://myweb.ttu.edu/jialliu/
> ________________________________________
> From: parallel-netcdf-bounces at lists.mcs.anl.gov [parallel-netcdf-bounces at lists.mcs.anl.gov] on behalf of Wei-keng Liao [wkliao at ece.northwestern.edu]
> Sent: Thursday, August 02, 2012 10:47 PM
> To: parallel-netcdf at mcs.anl.gov
> Subject: Re: segmentation fault
> 
> Hi, Jialin,
> 
> You are mistakenly using the dimension ID as the count:
>            count[j] = dimids[j];
> 
> Shouldn't it be the following?
>            count[j] = dim_sizes[dimids[j]];
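> 
> For context, an untested sketch of the corrected inner loop, reusing the
> dim_sizes[] array your code already fills:
> 
>            for (j=1; j<var_ndims; j++) {
>                start[j] = 0;
>                /* dimids[j] is only an index into dim_sizes[];
>                 * the count must be the dimension length itself */
>                count[j] = dim_sizes[dimids[j]];
>                var_size *= count[j];
>            }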
> 
> Also, please avoid using NFS when doing parallel I/O.
> It can cause inconsistent data and unexpected results.
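> 
> ROMIO picks its file-system driver from the file-name prefix (your code
> uses "nfs:"); if the file lives on a parallel file system such as PVFS2,
> you can select that driver the same way. A rough sketch, where the mount
> point below is only a placeholder for your cluster:
> 
>     char *filename = "pvfs2:/pvfs2/output.nc";
>     ret = ncmpi_open(MPI_COMM_WORLD, filename, NC_NOWRITE,
>                      MPI_INFO_NULL, &ncfile);
>     if (ret != NC_NOERR) handle_error(ret);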
> 
> Wei-keng
> 
> On Aug 2, 2012, at 8:49 PM, Liu, Jialin wrote:
> 
>> Hi,
>> Can anybody help me with the following problem? I appreciate it.
>> 
>> When I run the program on a 16-node cluster, I get a segmentation fault.
>> 
>> The code was downloaded from the PnetCDF website and modified slightly:
>> source:
>> #include <stdlib.h>
>> #include <mpi.h>
>> #include <pnetcdf.h>
>> #include <stdio.h>
>> 
>> static void handle_error(int status)
>> {
>>   fprintf(stderr, "%s\n", ncmpi_strerror(status));
>>   exit(-1);
>> }
>> 
>> 
>> int main(int argc, char **argv) {
>> 
>>   int rank, nprocs;
>>   int ret, varid,ncfile, ndims, nvars, ngatts, unlimited;
>>   int var_ndims, var_natts;
>>   MPI_Offset *dim_sizes, var_size;
>>   MPI_Offset *start, *count;
>> 
>>   char varname[NC_MAX_NAME+1];
>>   int dimids[NC_MAX_VAR_DIMS];
>>   nc_type type;
>> 
>>   int i, j;
>> 
>>   int *data = NULL;  /* initialized so the free() below is safe for non-int variables */
>> 
>>   MPI_Init(&argc, &argv);
>> 
>>   MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>   MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
>>   char *FILE_NAME = "nfs:output.nc";
>>   ret = ncmpi_open(MPI_COMM_WORLD, FILE_NAME, NC_NOWRITE, MPI_INFO_NULL,
>>           &ncfile);
>>   if (ret != NC_NOERR) handle_error(ret);
>> 
>> 
>>   ret = ncmpi_inq(ncfile, &ndims, &nvars, &ngatts, &unlimited);
>>   if (ret != NC_NOERR) handle_error(ret);
>> 
>>   dim_sizes = calloc(ndims, sizeof(MPI_Offset));
>> 
>>   for(i=0; i<ndims; i++)  {
>>       ret = ncmpi_inq_dimlen(ncfile, i, &(dim_sizes[i]) );
>>       if (ret != NC_NOERR) handle_error(ret);
>>   }
>> 
>>   for(i=0; i<nvars; i++) {
>>       ret = ncmpi_inq_var(ncfile, i, varname, &type, &var_ndims, dimids,
>>                           &var_natts);
>>       if (ret != NC_NOERR) handle_error(ret);
>> 
>>       start = calloc(var_ndims, sizeof(MPI_Offset));
>>       count = calloc(var_ndims, sizeof(MPI_Offset));
>> 
>>       start[0] = (dim_sizes[dimids[0]]/nprocs)*rank;
>>       count[0] = (dim_sizes[dimids[0]]/nprocs);
>>       var_size = count[0];
>> 
>>       for (j=1; j<var_ndims; j++) {
>>           start[j] = 0;
>>           count[j] = dimids[j];
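>>           /* ^ the bug identified in the reply above: dimids[j] is a
>>            *   dimension ID, not a dimension length */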
>>           var_size *= count[j];
>>       }
>> 
>>       switch(type) {
>>           case NC_INT:
>>               data = calloc(var_size, sizeof(int));
>> 
>>               ret = ncmpi_get_vara_int_all(ncfile, i, start, count, data);
>> 
>>               if (ret != NC_NOERR) handle_error(ret);
>> 
>>               break;
>>           default:
>>               /* we can do this for all the known netcdf types but this
>>                * example is already getting too long  */
>>               fprintf(stderr, "unsupported NetCDF type \n");
>>       }
>> 
>>       free(start);
>>       free(count);
>>       if (data != NULL) { free(data); data = NULL; }  /* reset to avoid a double free on the next variable */
>> 
>>   }
>>   ret = ncmpi_close(ncfile);
>>   if (ret != NC_NOERR) handle_error(ret);
>> 
>>   MPI_Finalize();
>>   return 0;
>> }
>> 
>> the file output.nc:
>> netcdf output {
>> dimensions:
>>   d1 = 16 ;
>> variables:
>>   int v1(d1) ;
>>   int v2(d1) ;
>> 
>> // global attributes:
>>       :string = "Hello World\n",
>>           "" ;
>> data:
>> 
>> v1 = 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 ;
>> 
>> v2 = 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 ;
>> }
>> 
>> Compile command:
>> mpicc -o preadsegfault preadsegfault.c -lpnetcdf
>> Run the program:
>> mpirun -np 4 ./preadsegfault
>> 
>> 
>> Best Regards,
>> Jialin Liu, Ph.D student.
>> Computer Science Department
>> Texas Tech University
>> Phone: 806.742.3513(x241)
>> Office:Engineer Center 304
>> http://myweb.ttu.edu/jialliu/
>> <preadsegfault.c><output.nc>
> 


