[Darshan-users] Darshan & IPM results not the same

Harms, Kevin harms at alcf.anl.gov
Mon Nov 8 13:35:40 CST 2021


  A few ideas:
  - Is the I/O done using fread() or similar? Those calls are accounted for under the STDIO module rather than the POSIX module. Can you check what the STDIO module shows?
  - Is the application threaded? There could be an issue with threading, but given the size of the disparity that seems less likely.
  - Darshan may not be intercepting a subset of the calls your application makes. If you look at the file name list, is Darshan obviously missing I/O from some set of files? (This could also be due to files being caught by the exclude list.)
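
One way to run the STDIO check from the first idea is to sum the read counters per module in the text output of darshan-parser. The following is a minimal Python sketch, not an official Darshan tool. It assumes the per-record lines have whitespace-separated columns in the order module, rank, record id, counter, value, and that the counter names are POSIX_READS and STDIO_READS. The sample lines are made up for illustration and should be replaced with the real output for your log.

```python
from collections import defaultdict

# Counters to compare; names assumed from darshan-parser output.
COUNTERS = ("POSIX_READS", "STDIO_READS")


def sum_counters(parser_output, counters=COUNTERS):
    """Sum the selected counters over all records in darshan-parser text output."""
    totals = defaultdict(int)
    for line in parser_output.splitlines():
        # Skip blank lines and the '#' header/comment lines.
        if not line.strip() or line.lstrip().startswith("#"):
            continue
        fields = line.split()
        # Assumed layout: module, rank, record id, counter, value, file name, ...
        if len(fields) >= 5 and fields[3] in counters:
            totals[fields[3]] += int(fields[4])
    return dict(totals)


# Made-up sample lines (hypothetical record ids, paths and values).
SAMPLE = """\
# <module> <rank> <record id> <counter> <value> <file name> <mount pt> <fs type>
POSIX 0 1111 POSIX_READS 120 /scratch/a.nc /scratch lustre
POSIX 1 2222 POSIX_READS 80 /scratch/b.nc /scratch lustre
STDIO 0 3333 STDIO_READS 5000 /scratch/c.dat /scratch lustre
STDIO 0 3333 STDIO_WRITES 7 /scratch/c.dat /scratch lustre
"""

totals = sum_counters(SAMPLE)
print(totals)
```

If STDIO_READS turns out to be much larger than POSIX_READS, that would account for at least part of the gap with IPM.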


From: Darshan-users <darshan-users-bounces at lists.mcs.anl.gov> on behalf of Vineet Soni <vsoni at mercator-ocean.fr>
Sent: Monday, November 8, 2021 4:05 AM
To: darshan-users at lists.mcs.anl.gov
Subject: [Darshan-users] Darshan & IPM results not the same


I am trying to analyze the IO behavior of our codes with Darshan.
Darshan: 3.3.0
Compilers: Intel 2018
MPI: Intel MPI 2018
FS: Lustre (lustre-module disabled in Darshan configuration)
Darshan profiling: LD_PRELOAD

I observe a big difference between the I/O results from Darshan and IPM (v2.0.5) for one of our codes. Could it be that the two profilers are not profiling the same POSIX calls?

The POSIXIO calls profiled in IPM are:

fopen, fdopen, freopen, open, open64
fclose, close
fread, read
fwrite, write
fseek, lseek, lseek64
fgetpos, fsetpos, fgetc, getc, ungetc
truncate, ftruncate, truncate64, ftruncate64

Are the ones profiled by Darshan those in https://github.com/darshan-hpc/darshan/blob/main/darshan-runtime/lib/darshan-posix.c ?
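
To see how the IPM list above would split, here is a small Python sketch that separates it into C stdio stream functions (operating on FILE*) and descriptor- or path-level calls. The grouping follows the C library distinction only. Whether a given Darshan version wraps each call, and in which module, is an assumption to verify against the Darshan source.

```python
# Calls listed in the message as profiled by IPM.
IPM_CALLS = [
    "fopen", "fdopen", "freopen", "open", "open64",
    "fclose", "close",
    "fread", "read",
    "fwrite", "write",
    "fseek", "lseek", "lseek64",
    "fgetpos", "fsetpos", "fgetc", "getc", "ungetc",
    "truncate", "ftruncate", "truncate64", "ftruncate64",
]

# Functions from <stdio.h> that operate on FILE* streams.
STREAM_CALLS = {
    "fopen", "fdopen", "freopen", "fclose", "fread", "fwrite",
    "fseek", "fgetpos", "fsetpos", "fgetc", "getc", "ungetc",
}

stream_level = [c for c in IPM_CALLS if c in STREAM_CALLS]
descriptor_level = [c for c in IPM_CALLS if c not in STREAM_CALLS]

print("stream (FILE*) calls:", stream_level)
print("descriptor/path calls:", descriptor_level)
```

A tool that reports all of these under one POSIX I/O heading and a tool that splits stream calls into a separate module can disagree on totals even when both intercept every call.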

However, the huge difference is in the “read” call, which is profiled by both tools.

|                   |     IPM    |  Darshan  |
| Read (s)          |     324.57 |      6.02 |
| Agg. Read (count) | 34 766 456 | 2 946 271 |
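
For scale, the ratios implied by the table can be worked out directly. This short Python sketch uses only the four numbers reported above. It assumes both time figures are measured on the same basis (the message does not say whether they are per-rank or aggregated), so the per-call means are indicative only.

```python
# Values copied from the table in the message.
ipm_read_s, darshan_read_s = 324.57, 6.02
ipm_reads, darshan_reads = 34_766_456, 2_946_271

time_ratio = ipm_read_s / darshan_read_s
count_ratio = ipm_reads / darshan_reads

# Mean time per read call, in microseconds (assumes the times are totals).
ipm_us = ipm_read_s / ipm_reads * 1e6
darshan_us = darshan_read_s / darshan_reads * 1e6

print(f"time ratio:  {time_ratio:.1f}x")
print(f"count ratio: {count_ratio:.1f}x")
print(f"mean per read: IPM {ipm_us:.2f} us, Darshan {darshan_us:.2f} us")
```

The time gap is several times larger than the count gap, so the calls missing from one report do not look like a uniform sample of the calls present in the other.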

I also tested Darshan and IPM with other codes (which do not read the same files) to check whether they show the same issue, but for those codes the two tools gave the same results.
So I don't understand why the results differ for this application.

Do you have any idea of why this could happen?

Thanks in advance.

PS: The application does a lot of IO, and is expected to spend a significant time in read operations.

Best regards,
