[Darshan-users] Darshan 3.3.1 aborting while generating log file

André R. Carneiro andre.es at gmail.com
Tue Jul 27 08:19:32 CDT 2021


Hi,

I'm testing version 3.3.1 with different MPI implementations. With newer
versions of OpenMPI (4.X with ROMIO v3.2.1) and Intel MPI (Parallel Studio XE
2019 and 2020 with ROMIO from MPICH v3.3), everything runs smoothly. But
with older versions I'm getting the errors below while generating the log
file on a Lustre filesystem.

With those older versions, I'm only able to generate the log files if I
set the environment variable DARSHAN_LOGHINTS to an empty string ("").
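As I understand the Darshan runtime documentation, DARSHAN_LOGHINTS is a semicolon-delimited list of key=value MPI-IO hints applied when the log file is opened. The Python sketch below is a hypothetical illustration of that format only, not Darshan's actual C code. The helper name `parse_log_hints` and the example hint values are assumptions chosen for illustration and are not necessarily Darshan's defaults. It shows that an empty string parses to zero hints, which would be consistent with the workaround above, where no log hints reach the MPI-IO layer.

```python
from typing import Dict


def parse_log_hints(hints: str) -> Dict[str, str]:
    """Hypothetical helper: split a 'key=value;key=value' hint string into a dict.

    Illustrates the DARSHAN_LOGHINTS format only; this is not Darshan's own parser.
    """
    parsed: Dict[str, str] = {}
    for item in hints.split(";"):
        item = item.strip()
        if not item:
            # An empty string or a trailing ';' contributes no hint.
            continue
        key, sep, value = item.partition("=")
        if not sep or not key:
            raise ValueError(f"malformed hint: {item!r}")
        parsed[key] = value
    return parsed


# DARSHAN_LOGHINTS="" -> no hints at all
print(parse_log_hints(""))
# Illustrative ROMIO hint values (assumed, not necessarily Darshan's defaults)
print(parse_log_hints("romio_no_indep_rw=true;cb_nodes=4"))
```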

The application I'm testing is the BT-IO from NAS NPB v3.3.1.

The version of the Lustre FS is 2.12.4.1_cray_139_g0763d21

======================================================

* OpenMPI 3.X with ROMIO v3.1.4

Program received signal SIGFPE: Floating-point exception - erroneous
arithmetic operation.
Backtrace for this error:
#0  0x2b8da12e03ef in ???
#1  0x2b8da01e8abe in ???
#2  0x2b8da01ead06 in ???
#3  0x2b8da02186c0 in ???
#4  0x2b8da0218ddb in ???
#5  0x2b8da01da6f1 in ???
#6  0x2b8da015592b in ???
#7  0x2b8d9f813c40 in MPI_File_write_at_all
                  at lib/darshan-mpiio.c:573
#8  0x2b8d9f7f5134 in darshan_log_append
                  at lib/darshan-core.c:1884
#9  0x2b8d9f7f84bd in darshan_log_write_name_record_hash
                  at lib/darshan-core.c:1775
#10  0x2b8d9f7f84bd in darshan_core_shutdown
                  at lib/darshan-core.c:604
#11  0x2b8d9f7f4917 in MPI_Finalize
                  at lib/darshan-core-init-finalize.c:85
======================================================

* OpenMPI 3.X with OMPIO

Program received signal SIGSEGV: Segmentation fault - invalid memory
reference.
#0  0x2b6fc617627f in ???
#1  0x2b6fc468bcfd in darshan_core_lookup_record_name
                  at lib/darshan-core.c:2389
#2  0x2b6fc46a5484 in darshan_stdio_lookup_record_name
                  at lib/darshan-stdio.c:1288
#3  0x2b6fc4694c87 in fileno
                  at lib/darshan-posix.c:768
#4  0x2b6fc7b68504 in ???
#5  0x2b6fc7b69920 in ???
#6  0x2b6fc506c817 in ???
#7  0x2b6fc500ef1a in ???
#8  0x2b6fc50b4361 in ???
#9  0x2b6fc506e5c8 in ???
#10  0x2b6fc4fbafeb in ???
#11  0x2b6fc4fe8903 in ???
#12  0x2b6fc46a798b in MPI_File_open
                  at lib/darshan-mpiio.c:345
#13  0x2b6fc468d4a1 in darshan_log_open
                  at lib/darshan-core.c:1604
#14  0x2b6fc468d4a1 in darshan_core_shutdown
                  at lib/darshan-core.c:584
#15  0x2b6fc468a917 in MPI_Finalize
                  at lib/darshan-core-init-finalize.c:85
#16  0x2b6fc4d3b798 in ???
#17  0x4025cc in ???
#18  0x402f39 in ???
#19  0x2b6fc61623d4 in ???
#20  0x401868 in ???
#21  0xffffffffffffffff in ???

======================================================
* Intel PSXE 2018 with ROMIO from MPICH v3.2

forrtl: severe (71): integer divide by zero
Image              PC                Routine            Line        Source
libifcoremt.so.5   00002B4887FFE4CF  for__signal_handl     Unknown  Unknown
libpthread-2.17.s  00002B4887B6F630  Unknown               Unknown  Unknown
libmpi_lustre.so.  00002B488EF5EFDF  ADIOI_LUSTRE_Get_     Unknown  Unknown
libmpi_lustre.so.  00002B488EF59FD9  ADIOI_LUSTRE_Writ     Unknown  Unknown
libmpi.so.12.0     00002B4886FBD15C  Unknown               Unknown  Unknown
libmpi.so.12       00002B4886FBE1D5  PMPI_File_write_a     Unknown  Unknown
libdarshan.so      00002B4886500B07  MPI_File_write_at     Unknown  Unknown
libdarshan.so      00002B48864E088D  Unknown               Unknown  Unknown
libdarshan.so      00002B48864E3BC3  darshan_core_shut     Unknown  Unknown
libdarshan.so      00002B48864E00A8  MPI_Finalize          Unknown  Unknown
libmpifort.so.12.  00002B48867B24DA  pmpi_finalize__       Unknown  Unknown
bt.C.36.mpi_io_fu  0000000000402A35  Unknown               Unknown  Unknown
bt.C.36.mpi_io_fu  0000000000401D92  Unknown               Unknown  Unknown
libc-2.17.so       00002B488A76F545  __libc_start_main     Unknown  Unknown
bt.C.36.mpi_io_fu  0000000000401C99  Unknown               Unknown  Unknown
======================================================

-- 
Hugs³,
André Ramos Carneiro.