Problem with testing PNETCDF 1.7.0
Rob Latham
robl at mcs.anl.gov
Tue Jun 21 12:54:15 CDT 2016
On 06/20/2016 04:28 PM, Kemp, Eric M. (GSFC-606.0)[SCIENCE SYSTEMS AND APPLICATIONS INC] wrote:
>
> On further investigation, I traced the problem to a linking error (for
> some reason, I must set LIBS='-lmpi' when running configure with SGI
> MPT). I found the hint in the README.SGI file.
>
> Sorry for the premature bug report.
I'm glad our big pile of machine-specific README files came in handy.
==rob
>
> Thanks,
>
> -Eric
>
> Eric M. Kemp (SSAI)
> NASA/GSFC
> Mail Code: 606
> Greenbelt, MD 20771
> 301.286.9768
> eric.kemp at nasa.gov
> eric.kemp at ssaihq.com
>
>
> From: <parallel-netcdf-bounces at lists.mcs.anl.gov> on behalf of Eric Kemp <eric.kemp at nasa.gov>
> Date: Monday, June 20, 2016 at 4:38 PM
> To: "parallel-netcdf at mcs.anl.gov" <parallel-netcdf at mcs.anl.gov>
> Subject: Problem with testing PNETCDF 1.7.0
>
>
> Dear PNETCDF developers:
>
> I’m attempting to install PNETCDF 1.7.0 on a Linux cluster running SLES
> 11.3 and GPFS, using SGI MPT 2.12 and Intel 15.0.3.187 compilers. I get
> the following error when ‘make testing’ runs the tst_f90 program:
>
> mpiexec_mpt -n 1 ./tst_f90 ./testfile.nc
> srun.slurm: cluster configuration lacks support for cpu binding
> MPT ERROR: rank:0, function:MPI_TYPE_CREATE_HVECTOR, Invalid argument
> MPT: Global rank 0 is aborting with error code 0.
> Process ID: 17524, Host: borgo015, Program: /gpfsm/dnb32/emkemp/NUWRFLIB/svn/branches/features/external_upgrades/builds/parallel-netcdf-1.7.0/test/F90/tst_f90
>
> MPT: --------stack traceback-------
> MPT: Attaching to program: /proc/17524/exe, process 17524
> MPT: Try: zypper install -C "debuginfo(build-id)=48172710254f4e2549684d7d3e9f9622272d6c66"
> MPT: (no debugging symbols found)...done.
> MPT: [Thread debugging using libthread_db enabled]
> MPT: Using host libthread_db library "/lib64/libthread_db.so.1".
> MPT: Try: zypper install -C "debuginfo(build-id)=f0721cb50ab9fbdf06314a53bff5af581bbefe64"
> MPT: (no debugging symbols found)...done.
> MPT: Try: zypper install -C "debuginfo(build-id)=e2cab3c95cb1189420734b4af264b047355be2e5"
> MPT: (no debugging symbols found)...done.
> MPT: Try: zypper install -C "debuginfo(build-id)=732292820e69f70459cb927ade5b49bc56d32b0f"
> MPT: (no debugging symbols found)...done.
> MPT: Try: zypper install -C "debuginfo(build-id)=9fdc592b21682a31f460f6f043f50eea8c8b6821"
> MPT: (no debugging symbols found)...done.
> MPT: Try: zypper install -C "debuginfo(build-id)=e1a13ecb56367b69b89d1c9ca1a4c42167336030"
> MPT: (no debugging symbols found)...done.
> MPT: Try: zypper install -C "debuginfo(build-id)=719375f80fd84b85b905db2c20ec70e8805b36e5"
> MPT: (no debugging symbols found)...done.
> MPT: Try: zypper install -C "debuginfo(build-id)=c4ce7f7c226abce4cec56fdbb4ed87e49024868d"
> MPT: (no debugging symbols found)...done.
> MPT: 0x00002aaaaacdc3bf in waitpid () from /lib64/libpthread.so.0
> MPT: (gdb) #0 0x00002aaaaacdc3bf in waitpid () from /lib64/libpthread.so.0
> MPT: #1 0x00002aaaab3e40ec in mpi_sgi_system (command=<optimized out>) at sig.c:99
> MPT: #2 MPI_SGI_stacktraceback (header=<optimized out>) at sig.c:319
> MPT: #3 0x00002aaaab337aea in print_traceback (ecode=0) at abort.c:197
> MPT: #4 0x00002aaaab337bde in MPI_SGI_abort () at abort.c:85
> MPT: #5 0x00002aaaab36f042 in errors_are_fatal (comm=<optimized out>,
> MPT: code=<optimized out>) at errhandler.c:220
> MPT: #6 0x00002aaaab36f2f1 in MPI_SGI_error (comm=1, code=13) at errhandler.c:56
> MPT: #7 0x00002aaaab3edd1d in PMPI_Type_create_hindexed (count=0, blocklens=0x0,
> MPT: indices=0x2872760, oldtype=27, newtype=0x7fffffff8a98)
> MPT: at type_create_hindexed.c:25
> MPT: #8 0x00000000006f40fc in fillerup_aggregate..0 ()
> MPT: #9 0x00000000006ea9fb in ncmpii_NC_enddef ()
> MPT: #10 0x00000000006eb349 in ncmpii_enddef ()
> MPT: #11 0x00000000006dedef in ncmpi_enddef ()
> MPT: #12 0x0000000000406877 in pnetcdf_mp_nf90mpi_enddef_ ()
> MPT: #13 0x0000000000405173 in MAIN__ ()
> MPT: #14 0x00000000004049ee in main ()
> MPT: (gdb) A debugging session is active.
> MPT:
> MPT: Inferior 1 [process 17524] will be detached.
> MPT:
> MPT: Quit anyway? (y or n) [answered Y; input not from terminal]
> MPT: Detaching from program: /proc/17524/exe, process 17524
> MPT: -----stack traceback ends-----
> slurmstepd-borgo015: *** STEP 8841377.772 CANCELLED AT 2016-06-20T16:18:33 *** on borgo015
> srun.slurm: Job step aborted: Waiting up to 2 seconds for job step to finish.
> srun.slurm: error: borgo015: task 0: Exited with exit code 255
>
>
> I’ve traced the trigger to the setting of the “_FillValue” attribute for
> the 3D pressure variable P in tst_f90.f90. If I change the attribute
> name to “FillValue”, the test program runs without apparent error.
>
> I have not found anything in the underlying PNetCDF Fortran or C codes
> that would explain this. Do you have any suggestions?
>
> Thanks,
>
> -Eric
>
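For anyone who lands here with the same symptom, here is a minimal sketch of the build Eric describes. It assumes the unpacked parallel-netcdf-1.7.0 source directory is the current directory and that configure already finds SGI MPT's compilers. Only the LIBS='-lmpi' setting comes from this thread; any other options README.SGI may recommend are not shown.

```shell
#!/bin/sh
# Sketch only: assumes the current directory is the unpacked
# parallel-netcdf-1.7.0 source tree and the SGI MPT environment is loaded.
set -e  # stop at the first step that fails

# Link explicitly against SGI MPT's MPI library (the README.SGI hint
# from this thread). Autoconf-generated configure scripts accept
# VAR=value arguments like this one.
./configure LIBS='-lmpi'

make

# 'make testing' is the target that ran tst_f90 in the report above.
make testing
```

Passing LIBS on the configure command line, rather than only exporting it in the shell, lets configure record the value, so it is not lost if configure is re-run automatically later in the build.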