<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<style type="text/css" style="display:none;"> P {margin-top:0;margin-bottom:0;} </style>
</head>
<body dir="ltr">
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
Hi Andre,</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<br>
</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
Thanks for reporting these issues to us!</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<br>
</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
We think the 1st and 3rd issues you mention are caused by a known bug in older versions of ROMIO's Lustre driver. This bug has since been fixed, but we probably do need to offer a workaround in Darshan so that we aren't crashing user codes. I've opened
an issue on our GitHub to track this problem (<a href="https://github.com/darshan-hpc/darshan/issues/424" id="LPlnk304094">https://github.com/darshan-hpc/darshan/issues/424</a>). Our current plan is to offer a configure option that works around this
issue on affected MPI versions by avoiding ROMIO's Lustre driver, which is where the bug occurs.
<br>
</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<br>
</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
I'll have to look more closely into the 2nd issue you reported to see if I can reproduce it on systems that I have access to. I'll keep you posted on whether I'm able to narrow down what's going wrong there.<br>
</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
<br>
</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);">
--Shane</div>
<div id="appendonsend"></div>
<hr style="display:inline-block;width:98%" tabindex="-1">
<div id="divRplyFwdMsg" dir="ltr"><font face="Calibri, sans-serif" style="font-size:11pt" color="#000000"><b>From:</b> Darshan-users &lt;darshan-users-bounces@lists.mcs.anl.gov&gt; on behalf of André R. Carneiro &lt;andre.es@gmail.com&gt;<br>
<b>Sent:</b> Tuesday, July 27, 2021 8:19 AM<br>
<b>To:</b> darshan-users@lists.mcs.anl.gov &lt;darshan-users@lists.mcs.anl.gov&gt;<br>
<b>Subject:</b> [Darshan-users] Darshan 3.3.1 aborting while generating log file</font>
<div> </div>
</div>
<div>
<div dir="ltr">Hi,
<div><br>
</div>
<div>I'm testing Darshan version 3.3.1 with different MPI implementations. With newer versions of OpenMPI (4.X with ROMIO v3.2.1) and Intel MPI (Parallel Studio XE 2019 and 2020 with ROMIO from MPICH v3.3), everything runs smoothly. But with older versions I'm getting
the errors below while generating the log file when using a Lustre filesystem. </div>
<div><br>
</div>
<div>With those older versions, I'm only able to generate the log files if I set the environment variable DARSHAN_LOGHINTS to an empty string (""). </div>
<div><br>
</div>
<div>The application I'm testing is BT-IO from the NAS Parallel Benchmarks (NPB) v3.3.1.</div>
<div><br>
</div>
<div>The version of the Lustre FS is 2.12.4.1_cray_139_g0763d21</div>
<div><br>
</div>
<div>
<div>
<div>======================================================</div>
</div>
<div><br>
</div>
</div>
<div>
<div>* OpenMPI 3.X with ROMIO v3.1.4<br>
<br>
</div>
<div><span class="x_gmail-im" style="color:rgb(80,0,80)">Program received signal SIGFPE: Floating-point exception - erroneous arithmetic operation.<br>
Backtrace for this error:<br>
</span>#0 0x2b8da12e03ef in ???<br>
#1 0x2b8da01e8abe in ???<br>
#2 0x2b8da01ead06 in ???<br>
#3 0x2b8da02186c0 in ???<br>
#4 0x2b8da0218ddb in ???<br>
#5 0x2b8da01da6f1 in ???<br>
#6 0x2b8da015592b in ???<br>
#7 0x2b8d9f813c40 in MPI_File_write_at_all<br>
at lib/darshan-mpiio.c:573<br>
#8 0x2b8d9f7f5134 in darshan_log_append<br>
at lib/darshan-core.c:1884<br>
#9 0x2b8d9f7f84bd in darshan_log_write_name_record_hash<br>
at lib/darshan-core.c:1775<br>
#10 0x2b8d9f7f84bd in darshan_core_shutdown<br>
at lib/darshan-core.c:604<br>
#11 0x2b8d9f7f4917 in MPI_Finalize<br>
at lib/darshan-core-init-finalize.c:85<br>
</div>
<div>======================================================</div>
</div>
<div><br>
</div>
<div>* OpenMPI 3.X with OMPIO<br>
</div>
<div><br>
</div>
<div>Program received signal SIGSEGV: Segmentation fault - invalid memory reference.<br>
</div>
<div>#0 0x2b6fc617627f in ???<br>
#1 0x2b6fc468bcfd in darshan_core_lookup_record_name<br>
at lib/darshan-core.c:2389<br>
#2 0x2b6fc46a5484 in darshan_stdio_lookup_record_name<br>
at lib/darshan-stdio.c:1288<br>
#3 0x2b6fc4694c87 in fileno<br>
at lib/darshan-posix.c:768<br>
#4 0x2b6fc7b68504 in ???<br>
#5 0x2b6fc7b69920 in ???<br>
#6 0x2b6fc506c817 in ???<br>
#7 0x2b6fc500ef1a in ???<br>
#8 0x2b6fc50b4361 in ???<br>
#9 0x2b6fc506e5c8 in ???<br>
#10 0x2b6fc4fbafeb in ???<br>
#11 0x2b6fc4fe8903 in ???<br>
#12 0x2b6fc46a798b in MPI_File_open<br>
at lib/darshan-mpiio.c:345<br>
#13 0x2b6fc468d4a1 in darshan_log_open<br>
at lib/darshan-core.c:1604<br>
#14 0x2b6fc468d4a1 in darshan_core_shutdown<br>
at lib/darshan-core.c:584<br>
#15 0x2b6fc468a917 in MPI_Finalize<br>
at lib/darshan-core-init-finalize.c:85<br>
#16 0x2b6fc4d3b798 in ???<br>
#17 0x4025cc in ???<br>
#18 0x402f39 in ???<br>
#19 0x2b6fc61623d4 in ???<br>
#20 0x401868 in ???<br>
#21 0xffffffffffffffff in ???<br>
</div>
<div><br>
</div>
<div>======================================================<br>
</div>
<div>
<div>* Intel PSXE 2018 with ROMIO from MPICH v3.2</div>
<div><br>
</div>
<div><span class="x_gmail-im" style="color:rgb(80,0,80)">forrtl: severe (71): integer divide by zero<br>
Image PC Routine Line Source <br>
</span>libifcoremt.so.5 00002B4887FFE4CF for__signal_handl Unknown Unknown<br>
libpthread-2.17.s 00002B4887B6F630 Unknown Unknown Unknown<br>
libmpi_lustre.so. 00002B488EF5EFDF ADIOI_LUSTRE_Get_ Unknown Unknown<br>
libmpi_lustre.so. 00002B488EF59FD9 ADIOI_LUSTRE_Writ Unknown Unknown<br>
libmpi.so.12.0 00002B4886FBD15C Unknown Unknown Unknown<br>
libmpi.so.12 00002B4886FBE1D5 PMPI_File_write_a Unknown Unknown<br>
libdarshan.so 00002B4886500B07 MPI_File_write_at Unknown Unknown<br>
libdarshan.so 00002B48864E088D Unknown Unknown Unknown<br>
libdarshan.so 00002B48864E3BC3 darshan_core_shut Unknown Unknown<br>
libdarshan.so 00002B48864E00A8 MPI_Finalize Unknown Unknown<br>
libmpifort.so.12. 00002B48867B24DA pmpi_finalize__ Unknown Unknown<br>
bt.C.36.mpi_io_fu 0000000000402A35 Unknown Unknown Unknown<br>
bt.C.36.mpi_io_fu 0000000000401D92 Unknown Unknown Unknown<br>
libc-2.17.so 00002B488A76F545 __libc_start_main Unknown Unknown<br>
bt.C.36.mpi_io_fu 0000000000401C99 Unknown Unknown Unknown<br>
</div>
<div>======================================================<br>
</div>
</div>
<div><br>
</div>
<div>
<div><br clear="all">
<div><br>
</div>
-- <br>
<div dir="ltr">Hugs³, <br>
André Ramos Carneiro.</div>
</div>
</div>
</div>
</div>
</body>
</html>