[Darshan-users] Darshan 3.3.1 aborting while generating log file
André R. Carneiro
andre.es at gmail.com
Tue Jul 27 08:19:32 CDT 2021
Hi,
I'm testing Darshan 3.3.1 with different MPI implementations. With newer
versions of OpenMPI (4.X with ROMIO v3.2.1) and Intel MPI (Parallel Studio
XE 2019 and 2020 with ROMIO from MPICH v3.3) everything runs smoothly. But
with the older versions listed below I'm getting the following errors while
Darshan generates the log file on a Lustre filesystem.
I'm only able to generate the log files with those older versions if I
set the environment variable DARSHAN_LOGHINTS to an empty string ("").
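For anyone who wants to reproduce the workaround, here is a minimal sketch of it, assuming the job is started through a small Python wrapper. The helper names and the launch command are hypothetical and only for illustration; the only thing taken from my setup is that DARSHAN_LOGHINTS is set to an empty string. Darshan's documentation lists the default hints as "romio_no_indep_rw=true;cb_nodes=4", but treat that as an assumption for any particular build.

```python
import os
import subprocess
import sys


def env_without_darshan_hints(base_env=None):
    """Return a copy of the environment with DARSHAN_LOGHINTS set to ""."""
    # Copy so the caller's environment is never modified in place.
    env = dict(os.environ if base_env is None else base_env)
    # An empty value asks Darshan not to pass MPI-IO hints when it
    # opens and writes its own log file.
    env["DARSHAN_LOGHINTS"] = ""
    return env


def run_with_empty_hints(command):
    """Run `command` (a list of arguments) with the hints cleared.

    Returns the exit code of the launched process.
    """
    return subprocess.run(command, env=env_without_darshan_hints()).returncode


if __name__ == "__main__":
    # Hypothetical usage:
    #   python run_nohints.py mpirun -np 36 ./your_mpi_app
    if len(sys.argv) > 1:
        sys.exit(run_with_empty_hints(sys.argv[1:]))
```

Exporting the variable in the job script before the MPI launcher has the same effect; the wrapper only makes the setting explicit per run.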
The application I'm testing is BT-IO from NAS NPB v3.3.1.
The Lustre version is 2.12.4.1_cray_139_g0763d21.
======================================================
* OpenMPI 3.X with ROMIO v3.1.4
Program received signal SIGFPE: Floating-point exception - erroneous
arithmetic operation.
Backtrace for this error:
#0 0x2b8da12e03ef in ???
#1 0x2b8da01e8abe in ???
#2 0x2b8da01ead06 in ???
#3 0x2b8da02186c0 in ???
#4 0x2b8da0218ddb in ???
#5 0x2b8da01da6f1 in ???
#6 0x2b8da015592b in ???
#7 0x2b8d9f813c40 in MPI_File_write_at_all
at lib/darshan-mpiio.c:573
#8 0x2b8d9f7f5134 in darshan_log_append
at lib/darshan-core.c:1884
#9 0x2b8d9f7f84bd in darshan_log_write_name_record_hash
at lib/darshan-core.c:1775
#10 0x2b8d9f7f84bd in darshan_core_shutdown
at lib/darshan-core.c:604
#11 0x2b8d9f7f4917 in MPI_Finalize
at lib/darshan-core-init-finalize.c:85
======================================================
* OpenMPI 3.X with OMPIO
Program received signal SIGSEGV: Segmentation fault - invalid memory
reference.
#0 0x2b6fc617627f in ???
#1 0x2b6fc468bcfd in darshan_core_lookup_record_name
at lib/darshan-core.c:2389
#2 0x2b6fc46a5484 in darshan_stdio_lookup_record_name
at lib/darshan-stdio.c:1288
#3 0x2b6fc4694c87 in fileno
at lib/darshan-posix.c:768
#4 0x2b6fc7b68504 in ???
#5 0x2b6fc7b69920 in ???
#6 0x2b6fc506c817 in ???
#7 0x2b6fc500ef1a in ???
#8 0x2b6fc50b4361 in ???
#9 0x2b6fc506e5c8 in ???
#10 0x2b6fc4fbafeb in ???
#11 0x2b6fc4fe8903 in ???
#12 0x2b6fc46a798b in MPI_File_open
at lib/darshan-mpiio.c:345
#13 0x2b6fc468d4a1 in darshan_log_open
at lib/darshan-core.c:1604
#14 0x2b6fc468d4a1 in darshan_core_shutdown
at lib/darshan-core.c:584
#15 0x2b6fc468a917 in MPI_Finalize
at lib/darshan-core-init-finalize.c:85
#16 0x2b6fc4d3b798 in ???
#17 0x4025cc in ???
#18 0x402f39 in ???
#19 0x2b6fc61623d4 in ???
#20 0x401868 in ???
#21 0xffffffffffffffff in ???
======================================================
* Intel PSXE 2018 with ROMIO from MPICH v3.2
forrtl: severe (71): integer divide by zero
Image              PC                Routine            Line     Source
libifcoremt.so.5   00002B4887FFE4CF  for__signal_handl  Unknown  Unknown
libpthread-2.17.s  00002B4887B6F630  Unknown            Unknown  Unknown
libmpi_lustre.so.  00002B488EF5EFDF  ADIOI_LUSTRE_Get_  Unknown  Unknown
libmpi_lustre.so.  00002B488EF59FD9  ADIOI_LUSTRE_Writ  Unknown  Unknown
libmpi.so.12.0     00002B4886FBD15C  Unknown            Unknown  Unknown
libmpi.so.12       00002B4886FBE1D5  PMPI_File_write_a  Unknown  Unknown
libdarshan.so      00002B4886500B07  MPI_File_write_at  Unknown  Unknown
libdarshan.so      00002B48864E088D  Unknown            Unknown  Unknown
libdarshan.so      00002B48864E3BC3  darshan_core_shut  Unknown  Unknown
libdarshan.so      00002B48864E00A8  MPI_Finalize       Unknown  Unknown
libmpifort.so.12.  00002B48867B24DA  pmpi_finalize__    Unknown  Unknown
bt.C.36.mpi_io_fu  0000000000402A35  Unknown            Unknown  Unknown
bt.C.36.mpi_io_fu  0000000000401D92  Unknown            Unknown  Unknown
libc-2.17.so       00002B488A76F545  __libc_start_main  Unknown  Unknown
bt.C.36.mpi_io_fu  0000000000401C99  Unknown            Unknown  Unknown
======================================================
--
Best regards,
André Ramos Carneiro.