[Darshan-users] Darshan v3.2.1 hangs with mvapich2 2.3.3 (IOR)

Snyder, Shane ssnyder at mcs.anl.gov
Tue Sep 22 13:56:06 CDT 2020


Hi Cormac,

Thanks for reporting this bug.

This is the first time I've seen an error like this, and to be honest I'm not sure what could ultimately cause it. It seems Darshan is failing to write its header, which is the last thing written to the log file -- presumably Darshan was able to write the other portions of the log with no problem, which is the odd part.
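
One data point that might help narrow things down: the size of the leftover *.darshan_partial file (the path below is taken from your warning message) gives a rough hint as to whether the earlier regions of the log were actually written out, e.g.:

ls -l /share/home/hpcuser/darshan_logs/*.darshan_partial
# A reasonably large partial file suggests the earlier log regions were written and only
# the final header write failed; an empty or tiny file would point at an earlier problem.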

I'll see if I can reproduce the problem and work out what's going wrong. I'll keep you posted.

--Shane
________________________________
From: Darshan-users <darshan-users-bounces at lists.mcs.anl.gov> on behalf of Cormac Garvey <cormac.t.garvey at gmail.com>
Sent: Friday, September 18, 2020 1:31 PM
To: darshan-users at lists.mcs.anl.gov <darshan-users at lists.mcs.anl.gov>
Subject: [Darshan-users] Darshan v3.2.1 hangs with mvapich2 2.3.3 (IOR)


Hi All,
I am unable to get darshan v3.2.1 to work with IOR v3.2.1 built with mvapich2 v2.3.3.

I am using CentOS 7.7.

module load mpi/mvapich2
spack load darshan-runtime^mvapich2
export DARSHAN_LOG_DIR_PATH=/share/home/hpcuser/darshan_logs
export LD_PRELOAD=${DARSHAN_RUNTIME_DIR}/lib/libdarshan.so

TRANSFER_SIZE=32m
SIZE=2G
IO_API=POSIX
IO_API_ARG="-F"
TYPE_IO="direct_io"
TYPE_IO_ARG="-B"

MPI_OPTS="-np 4 --hostfile $PBS_NODEFILE"

mpirun -genvlist LD_PRELOAD,DARSHAN_LOG_DIR_PATH -bind-to hwthread $MPI_OPTS $IOR_BIN/ior -a $IO_API -v -i 1 $TYPE_IO_ARG -m -d 1 $IO_API_ARG -w -t $TRANSFER_SIZE -b $SIZE -o ${FILESYSTEM}/test -O summaryFormat=$SUMMARY_FORMAT -O summaryFile=ior_${IO_API}_${TYPE_IO}_${TRANSFER_SIZE}_${SIZE}_${HOST}_${NUMPROCS}.out_$$
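
A sanity check along these lines (same launcher options as above) should show whether -genvlist is actually propagating both variables to every rank:

mpirun -genvlist LD_PRELOAD,DARSHAN_LOG_DIR_PATH $MPI_OPTS /usr/bin/env | grep -E 'LD_PRELOAD|DARSHAN_LOG_DIR_PATH'
# Each rank should report both variables; a rank missing LD_PRELOAD would not be
# preloading libdarshan.so at all.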

The IOR job hangs at the end if I export the Darshan LD_PRELOAD (it runs correctly if I remove the LD_PRELOAD).

When I kill the job, I get the following message in the PBS stderr file:

"darshan_library_warning: unable to write header to file /share/home/hpcuser/darshan_logs/hpcuser_ior_id26_9-18-65156-2174583250033636718.darshan_partial."

The IOR benchmark itself completes (i.e. I can see the I/O stats), but it does not appear to exit correctly (it remains in a hung state until I kill it).
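
If a backtrace from one of the hung ranks would help, I can attach gdb to an ior process along these lines:

# <pid> is the PID of one of the hung ior ranks, e.g. from "pgrep ior" on a compute node
gdb -p <pid> -batch -ex "thread apply all bt"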


Running a similar job with IOR+mpich or IOR+OpenMPI works fine with darshan.

Any ideas what I am missing?

Thanks for your support.

Regards,
Cormac.