compiling parallel-netcdf with openmpi 1.8 and 1.10

Wei-keng Liao wkliao at eecs.northwestern.edu
Wed Jun 29 14:56:48 CDT 2016


Hi, Jeb

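Thanks for sending the error messages. The error you are seeing is due to
a bug in OpenMPI: your backtrace shows the crash inside ADIOI_Flatten, in
OpenMPI's ROMIO component (mca_io_romio.so), reached from MPI_File_set_view.
The bug was reported by Carl Ponder @ NVIDIA. See this discussion thread,
which includes the bug reports sent to OpenMPI:
http://lists.mcs.anl.gov/pipermail/parallel-netcdf/2016-May/001853.html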

A patch that works around this bug has been added to PnetCDF (it is also
included below):
http://trac.mcs.anl.gov/projects/parallel-netcdf/changeset/2406


--- fill.c	2016-06-29 14:45:31.259724268 -0500
+++ new_fill.c	2016-06-29 14:45:24.466359243 -0500
@@ -653,6 +653,13 @@
     }
     /* j is now the number of valid write segments */
 
+    if (status == NC_NOERR && j == 0) {
+        NCI_Free(noFill);
+        NCI_Free(count);
+        NCI_Free(offset);
+        return NC_NOERR;
+    }
+
     /* allocate one contiguous buffer space for all writes */
     blocklengths = (int*) NCI_Malloc(j * sizeof(int));
     buf = NCI_Malloc(buf_len);



Wei-keng

On Jun 29, 2016, at 1:57 PM, Jeb Baxley wrote:

> here is the error message:
> 
> /usr/global/openmpi/haswell/gcc53/1.10.3/bin/mpicxx -g -O2     -I../../src/lib -I../../src/libcxx -I/tmp/pnetcdf-1.7.0/parallel-netcdf-1.7.0/test/CXX/../common -DHAVE_CONFIG_H  -c /tmp/pnetcdf-1.7.0/parallel-netcdf-1.7.0/test/CXX/test_classic.cpp
> /usr/global/openmpi/haswell/gcc53/1.10.3/bin/mpicxx -g -O2 -o test_classic test_classic.o  -L../common /tmp/pnetcdf-1.7.0/build/src/lib/libpnetcdf.a -ltestutils  
> ./nctst        ./testfile.nc
> [santee-06:43015] *** Process received signal ***
> [santee-06:43015] Signal: Segmentation fault (11)
> [santee-06:43015] Signal code: Address not mapped (1)
> [santee-06:43015] Failing at address: (nil)
> [santee-06:43015] [ 0] /lib64/libpthread.so.0(+0xf7e0)[0x7fc1d4fa27e0]
> [santee-06:43015] [ 1] /usr/global/openmpi/haswell/gcc53/1.10.3/lib/openmpi/mca_io_romio.so(ADIOI_Flatten+0x113f)[0x7fc1ce0bd67f]
> [santee-06:43015] [ 2] /usr/global/openmpi/haswell/gcc53/1.10.3/lib/openmpi/mca_io_romio.so(ADIOI_Flatten_datatype+0xe1)[0x7fc1ce0be6f1]
> [santee-06:43015] [ 3] /usr/global/openmpi/haswell/gcc53/1.10.3/lib/openmpi/mca_io_romio.so(ADIO_Set_view+0x1fd)[0x7fc1ce0b452d]
> [santee-06:43015] [ 4] /usr/global/openmpi/haswell/gcc53/1.10.3/lib/openmpi/mca_io_romio.so(mca_io_romio_dist_MPI_File_set_view+0x299)[0x7fc1ce09a629]
> [santee-06:43015] [ 5] /usr/global/openmpi/haswell/gcc53/1.10.3/lib/libmpi.so.12(MPI_File_set_view+0xdd)[0x7fc1d5a5841d]
> [santee-06:43015] [ 6] ./nctst[0x423634]
> [santee-06:43015] [ 7] ./nctst[0x41ab05]
> [santee-06:43015] [ 8] ./nctst[0x41b993]
> [santee-06:43015] [ 9] ./nctst[0x4134ea]
> [santee-06:43015] [10] ./nctst[0x4703e9]
> [santee-06:43015] [11] ./nctst[0x441244]
> [santee-06:43015] [12] ./nctst[0x40f3c1]
> [santee-06:43015] [13] ./nctst[0x412943]
> [santee-06:43015] [14] /lib64/libc.so.6(__libc_start_main+0xfd)[0x7fc1d4c1dd5d]
> [santee-06:43015] [15] ./nctst[0x40b5c1]
> [santee-06:43015] *** End of error message ***
> make[2]: *** [testing] Segmentation fault
> make[2]: Leaving directory `/tmp/pnetcdf-1.7.0/build/test/CXX'
> make[1]: *** [check-CXX] Error 2
> make[1]: Leaving directory `/tmp/pnetcdf-1.7.0/build/test'
> make: *** [check] Error 2
> 
> Jeb Baxley
> about.me/jeb.baxley
> 
> On Tue, Jun 28, 2016 at 3:38 PM, Wei-keng Liao <wkliao at eecs.northwestern.edu> wrote:
> 
> What error message does the failed test program report?
> 
> Wei-keng
> 
> On Jun 28, 2016, at 1:51 PM, Jeb Baxley wrote:
> 
> > Hey,
> >
> > I attempted to build parallel-netcdf with GCC 5.3.0 and OpenMPI 1.8 and 1.10, but I keep getting segmentation faults in either the CXX tests or the F90 tests.
> >
> > Has anyone built this package with GNU 5.3 and OpenMPI?  Also, I'm running RHEL 6.7.
> >
> >
> >
> > Jeb Baxley
> > about.me/jeb.baxley
> >
> >
> 
> 


