Problem with testing PNETCDF 1.7.0

Rob Latham robl at mcs.anl.gov
Tue Jun 21 12:54:15 CDT 2016



On 06/20/2016 04:28 PM, Kemp, Eric M. (GSFC-606.0)[SCIENCE SYSTEMS AND 
APPLICATIONS INC] wrote:
>
> On further investigation, I traced the problem to a linking error (for
> some reason, I must set LIBS='-lmpi' when running configure with SGI
> MPT).  I found the hint in the README.SGI file.
>
> Sorry for the premature bug report.

I'm glad our big pile of machine-specific README files came in handy.
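For anyone who hits the same tst_f90 failure with SGI MPT, the workaround Eric describes amounts to adding LIBS='-lmpi' to the configure line. Below is a minimal sketch. The install prefix is a placeholder, and any other configure options you normally pass are assumed to stay as they were. README.SGI remains the authoritative reference for the exact settings.

```shell
# Sketch of the SGI MPT workaround: link libmpi explicitly via LIBS.
# The --prefix value is a placeholder; keep your other configure options.
./configure --prefix="$HOME/pnetcdf-1.7.0" LIBS='-lmpi'
make
make testing
```

The straight single quotes matter here. Typographic quotes pasted from a mail client would be passed to configure literally and break the link line.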

==rob

>
> Thanks,
>
> -Eric
>
> Eric M. Kemp (SSAI)
> NASA/GSFC
> Mail Code: 606
> Greenbelt, MD 20771
> 301.286.9768
> eric.kemp at nasa.gov
> eric.kemp at ssaihq.com
>
>
> From: parallel-netcdf-bounces at lists.mcs.anl.gov on behalf of Eric
> Kemp <eric.kemp at nasa.gov>
> Date: Monday, June 20, 2016 at 4:38 PM
> To: parallel-netcdf at mcs.anl.gov
> Subject: Problem with testing PNETCDF 1.7.0
>
>
> Dear PNETCDF developers:
>
> I’m attempting to install PNETCDF 1.7.0 on a Linux cluster running SLES
> 11.3 and GPFS, using SGI MPT 2.12 and Intel 15.0.3.187 compilers.  I get
> the following error when ‘make testing’ runs the tst_f90 program:
>
>     mpiexec_mpt -n 1 ./tst_f90      ./testfile.nc
>     srun.slurm: cluster configuration lacks support for cpu binding
>     MPT ERROR: rank:0, function:MPI_TYPE_CREATE_HVECTOR, Invalid argument
>     MPT: Global rank 0 is aborting with error code 0.
>           Process ID: 17524, Host: borgo015, Program:
>     /gpfsm/dnb32/emkemp/NUWRFLIB/svn/branches/features/external_upgrades/builds/parallel-netcdf-1.7.0/test/F90/tst_f90
>
>     MPT: --------stack traceback-------
>     MPT: Attaching to program: /proc/17524/exe, process 17524
>     MPT: Try: zypper install -C
>     "debuginfo(build-id)=48172710254f4e2549684d7d3e9f9622272d6c66"
>     MPT: (no debugging symbols found)...done.
>     MPT: [Thread debugging using libthread_db enabled]
>     MPT: Using host libthread_db library "/lib64/libthread_db.so.1".
>     MPT: Try: zypper install -C
>     "debuginfo(build-id)=f0721cb50ab9fbdf06314a53bff5af581bbefe64"
>     MPT: (no debugging symbols found)...done.
>     MPT: Try: zypper install -C
>     "debuginfo(build-id)=e2cab3c95cb1189420734b4af264b047355be2e5"
>     MPT: (no debugging symbols found)...done.
>     MPT: Try: zypper install -C
>     "debuginfo(build-id)=732292820e69f70459cb927ade5b49bc56d32b0f"
>     MPT: (no debugging symbols found)...done.
>     MPT: Try: zypper install -C
>     "debuginfo(build-id)=9fdc592b21682a31f460f6f043f50eea8c8b6821"
>     MPT: (no debugging symbols found)...done.
>     MPT: Try: zypper install -C
>     "debuginfo(build-id)=e1a13ecb56367b69b89d1c9ca1a4c42167336030"
>     MPT: (no debugging symbols found)...done.
>     MPT: Try: zypper install -C
>     "debuginfo(build-id)=719375f80fd84b85b905db2c20ec70e8805b36e5"
>     MPT: (no debugging symbols found)...done.
>     MPT: Try: zypper install -C
>     "debuginfo(build-id)=c4ce7f7c226abce4cec56fdbb4ed87e49024868d"
>     MPT: (no debugging symbols found)...done.
>     MPT: 0x00002aaaaacdc3bf in waitpid () from /lib64/libpthread.so.0
>     MPT: (gdb) #0  0x00002aaaaacdc3bf in waitpid () from
>     /lib64/libpthread.so.0
>     MPT: #1  0x00002aaaab3e40ec in mpi_sgi_system (command=<optimized
>     out>) at sig.c:99
>     MPT: #2  MPI_SGI_stacktraceback (header=<optimized out>) at sig.c:319
>     MPT: #3  0x00002aaaab337aea in print_traceback (ecode=0) at abort.c:197
>     MPT: #4  0x00002aaaab337bde in MPI_SGI_abort () at abort.c:85
>     MPT: #5  0x00002aaaab36f042 in errors_are_fatal (comm=<optimized out>,
>     MPT:     code=<optimized out>) at errhandler.c:220
>     MPT: #6  0x00002aaaab36f2f1 in MPI_SGI_error (comm=1, code=13) at
>     errhandler.c:56
>     MPT: #7  0x00002aaaab3edd1d in PMPI_Type_create_hindexed (count=0,
>     blocklens=0x0,
>     MPT:     indices=0x2872760, oldtype=27, newtype=0x7fffffff8a98)
>     MPT:     at type_create_hindexed.c:25
>     MPT: #8  0x00000000006f40fc in fillerup_aggregate..0 ()
>     MPT: #9  0x00000000006ea9fb in ncmpii_NC_enddef ()
>     MPT: #10 0x00000000006eb349 in ncmpii_enddef ()
>     MPT: #11 0x00000000006dedef in ncmpi_enddef ()
>     MPT: #12 0x0000000000406877 in pnetcdf_mp_nf90mpi_enddef_ ()
>     MPT: #13 0x0000000000405173 in MAIN__ ()
>     MPT: #14 0x00000000004049ee in main ()
>     MPT: (gdb) A debugging session is active.
>     MPT:
>     MPT:    Inferior 1 [process 17524] will be detached.
>     MPT:
>     MPT: Quit anyway? (y or n) [answered Y; input not from terminal]
>     MPT: Detaching from program: /proc/17524/exe, process 17524
>     MPT: -----stack traceback ends-----
>     slurmstepd-borgo015: *** STEP 8841377.772 CANCELLED AT
>     2016-06-20T16:18:33 *** on borgo015
>     srun.slurm: Job step aborted: Waiting up to 2 seconds for job step
>     to finish.
>     srun.slurm: error: borgo015: task 0: Exited with exit code 255
>
>
> I’ve traced the trigger to the setting of the “_FillValue” attribute for
> the 3D pressure variable P in tst_f90.f90.  If I change the attribute
> name to “FillValue”, the test program runs without apparent error.
>
> I have not found anything in the underlying PNetCDF Fortran or C codes
> that would explain this.  Do you have any suggestions?
>
> Thanks,
>
> -Eric
>
> Eric M. Kemp (SSAI)
> NASA/GSFC
> Mail Code: 606
> Greenbelt, MD 20771
> 301.286.9768
> eric.kemp at nasa.gov
> eric.kemp at ssaihq.com
>

